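CVPR 2022 opens on June 22 🎉🎉🎉, and the conference has accepted 2,067 papers. Because there are so many, this list is split into four posts. Click a paper title to open the paper.
📃 Part One, 📃 Part Three, 📃 Part Four.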

【CVPR2022】Paper List and Downloads: Part Two

2. Part Two

Cannot See the Forest for the Trees: Aggregating Multiple Viewpoints To Better Classify Objects in Videos [supp]

Learning Canonical F-Correlation Projection for Compact Multiview Representation [supp]

DIFNet: Boosting Visual Information Flow for Image Captioning

Weakly Supervised Object Localization As Domain Adaption [supp]

Tencent-MVSE: A Large-Scale Benchmark Dataset for Multi-Modal Video Similarity Evaluation

Dynamic Prototype Convolution Network for Few-Shot Semantic Segmentation [supp]

Deep Orientation-Aware Functional Maps: Tackling Symmetry Issues in Shape Matching [supp]

Tree Energy Loss: Towards Sparsely Annotated Semantic Segmentation [supp]

Mr.BiQ: Post-Training Non-Uniform Quantization Based on Minimizing the Reconstruction Error [supp]

MatteFormer: Transformer-Based Image Matting via Prior-Tokens [supp]

Video Shadow Detection via Spatio-Temporal Interpolation Consistency Training [supp]

Ranking Distance Calibration for Cross-Domain Few-Shot Learning [supp]

Robust and Accurate Superquadric Recovery: A Probabilistic Approach [supp]

Zero-Shot Text-Guided Object Generation With Dream Fields [supp]

Learning Pixel Trajectories With Multiscale Contrastive Random Walks

Self-Supervised Correlation Mining Network for Person Image Generation

Grounding Answers for Visual Questions Asked by Visually Impaired People [supp]

Task Adaptive Parameter Sharing for Multi-Task Learning [supp]

Sparse Instance Activation for Real-Time Instance Segmentation

Automatic Color Image Stitching Using Quaternion Rank-1 Alignment [supp]

VisualGPT: Data-Efficient Adaptation of Pretrained Language Models for Image Captioning [supp]

ESCNet: Gaze Target Detection With the Understanding of 3D Scenes [supp]

Can You Spot the Chameleon? Adversarially Camouflaging Images From Co-Salient Object Detection

Finding Badly Drawn Bunnies [supp]

Point2Cyl: Reverse Engineering 3D Objects From Point Clouds to Extrusion Cylinders [supp]

All-Photon Polarimetric Time-of-Flight Imaging [supp]

MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation [supp]

Surface-Aligned Neural Radiance Fields for Controllable 3D Human Synthesis [supp]

Learning From Temporal Gradient for Semi-Supervised Action Recognition [supp]

Towards Implicit Text-Guided 3D Shape Generation [supp]

Audio-Driven Neural Gesture Reenactment With Video Motion Graphs [supp]

SoftCollage: A Differentiable Probabilistic Tree Generator for Image Collage [supp]

Transforming Model Prediction for Tracking [supp]

A Unified Framework for Implicit Sinkhorn Differentiation [supp]

DGECN: A Depth-Guided Edge Convolutional Network for End-to-End 6D Pose Estimation [supp]

Unsupervised Vision-Language Parsing: Seamlessly Bridging Visual Scene Graphs With Language Structures via Dependency Relationships

Open-Vocabulary Instance Segmentation via Robust Cross-Modal Pseudo-Labeling [supp]

Locality-Aware Inter- and Intra-Video Reconstruction for Self-Supervised Correspondence Learning [supp]

A Versatile Multi-View Framework for LiDAR-Based 3D Object Detection With Guidance From Panoptic Segmentation [supp]

Query and Attention Augmentation for Knowledge-Based Explainable Reasoning [supp]

Winoground: Probing Vision and Language Models for Visio-Linguistic Compositionality [supp]

RFNet: Unsupervised Network for Mutually Reinforcing Multi-Modal Image Registration and Fusion [supp]

Progressive Attention on Multi-Level Dense Difference Maps for Generic Event Boundary Detection [supp]

Interactron: Embodied Adaptive Object Detection [supp]

3D Scene Painting via Semantic Image Synthesis [supp]

MeMOT: Multi-Object Tracking With Memory

Revisiting Weakly Supervised Pre-Training of Visual Perception Models [supp]

Semi-Supervised Semantic Segmentation With Error Localization Network

Meta Convolutional Neural Networks for Single Domain Generalization [supp]

Generalizing Gaze Estimation With Rotation Consistency

Anomaly Detection via Reverse Distillation From One-Class Embedding [supp]

Fine-Grained Object Classification via Self-Supervised Pose Alignment [supp]

Spatio-Temporal Gating-Adjacency GCN for Human Motion Prediction [supp]

CellTypeGraph: A New Geometric Computer Vision Benchmark [supp]

Clustering Plotted Data by Image Segmentation

Accelerating Neural Network Optimization Through an Automated Control Theory Lens [supp]

Animal Kingdom: A Large and Diverse Dataset for Animal Behavior Understanding [supp]

Learning To Learn Across Diverse Data Biases in Deep Face Recognition [supp]

Back to Reality: Weakly-Supervised 3D Object Detection With Shape-Guided Label Enhancement [supp]

Long-Tail Recognition via Compositional Knowledge Transfer [supp]

EI-CLIP: Entity-Aware Interventional Contrastive Learning for E-Commerce Cross-Modal Retrieval [supp]

Multi-Dimensional, Nuanced and Subjective – Measuring the Perception of Facial Expressions [supp]

PyMiceTracking: An Open-Source Toolbox for Real-Time Behavioral Neuroscience Experiments

Self-Taught Metric Learning Without Labels [supp]

MeMViT: Memory-Augmented Multiscale Vision Transformer for Efficient Long-Term Video Recognition [supp]

Fine-Grained Temporal Contrastive Learning for Weakly-Supervised Temporal Action Localization

Embracing Single Stride 3D Object Detector With Sparse Transformer [supp]

Multidimensional Belief Quantification for Label-Efficient Meta-Learning [supp]

UTC: A Unified Transformer With Inter-Task Contrastive Learning for Visual Dialog

Relieving Long-Tailed Instance Segmentation via Pairwise Class Balance [supp]

Online Convolutional Re-Parameterization [supp]

Mimicking the Oracle: An Initial Phase Decorrelation Approach for Class Incremental Learning [supp]

RIDDLE: Lidar Data Compression With Range Image Deep Delta Encoding [supp]

RelTransformer: A Transformer-Based Long-Tail Visual Relationship Recognition [supp]

HODEC: Towards Efficient High-Order DEcomposed Convolutional Neural Networks

RigidFlow: Self-Supervised Scene Flow Learning on Point Clouds by Local Rigidity Prior [supp]

Smooth Maximum Unit: Smooth Activation Function for Deep Networks Using Smoothing Maximum Technique [supp]

Learning Invisible Markers for Hidden Codes in Offline-to-Online Photography [supp]

Personalized Image Aesthetics Assessment With Rich Attributes

Task2Sim: Towards Effective Pre-Training and Transfer From Synthetic Data [supp]

Part-Based Pseudo Label Refinement for Unsupervised Person Re-Identification [supp]

Bridging the Gap Between Learning in Discrete and Continuous Environments for Vision-and-Language Navigation [supp]

HDNet: High-Resolution Dual-Domain Learning for Spectral Compressive Imaging

OW-DETR: Open-World Detection Transformer [supp]

Learning Deep Implicit Functions for 3D Shapes With Dynamic Code Clouds [supp]

Reversible Vision Transformers [supp]

Amodal Panoptic Segmentation [supp]

Gravitationally Lensed Black Hole Emission Tomography [supp]

3D-Aware Image Synthesis via Learning Structural and Textural Representations [supp]

Text-to-Image Synthesis Based on Object-Guided Joint-Decoding Transformer [supp]

Correlation Verification for Image Retrieval [supp]

Unsupervised Vision-and-Language Pre-Training via Retrieval-Based Multi-Granular Alignment [supp]

Protecting Facial Privacy: Generating Adversarial Identity Masks via Style-Robust Makeup Transfer [supp]

PONI: Potential Functions for ObjectGoal Navigation With Interaction-Free Learning [supp]

Noise Is Also Useful: Negative Correlation-Steered Latent Contrastive Learning

Temporal Feature Alignment and Mutual Information Maximization for Video-Based Human Pose Estimation

Spatially-Adaptive Multilayer Selection for GAN Inversion and Editing

Self-Supervised Transformers for Unsupervised Object Discovery Using Normalized Cut [supp]

Exploring Structure-Aware Transformer Over Interaction Proposals for Human-Object Interaction Detection [supp]

Towards Robust Adaptive Object Detection Under Noisy Annotations [supp]

Decoupled Multi-Task Learning With Cyclical Self-Regulation for Face Parsing

Pastiche Master: Exemplar-Based High-Resolution Portrait Style Transfer [supp]

Learning To Memorize Feature Hallucination for One-Shot Image Generation

AUV-Net: Learning Aligned UV Maps for Texture Transfer and Synthesis

Open-Vocabulary One-Stage Detection With Hierarchical Visual-Language Knowledge Distillation [supp]

Glass: Geometric Latent Augmentation for Shape Spaces

COAP: Compositional Articulated Occupancy of People [supp]

Counterfactual Cycle-Consistent Learning for Instruction Following and Generation in Vision-Language Navigation

Evading the Simplicity Bias: Training a Diverse Set of Models Discovers Solutions With Superior OOD Generalization [supp]

Assembly101: A Large-Scale Multi-View Video Dataset for Understanding Procedural Activities [supp]

Deterministic Point Cloud Registration via Novel Transformation Decomposition [supp]

Motion-Adjustable Neural Implicit Video Representation

Neural Prior for Trajectory Estimation [supp]

DPICT: Deep Progressive Image Compression Using Trit-Planes [supp]

Rethinking Depth Estimation for Multi-View Stereo: A Unified Representation [supp]

Long-Tailed Recognition via Weight Balancing [supp]

Text to Image Generation With Semantic-Spatial Aware GAN

The Norm Must Go On: Dynamic Unsupervised Domain Adaptation by Normalization [supp]

ShapeFormer: Transformer-Based Shape Completion via Sparse Representation [supp]

PixMix: Dreamlike Pictures Comprehensively Improve Safety Measures [supp]

Eigencontours: Novel Contour Descriptors Based on Low-Rank Approximation [supp]

Generalizable Cross-Modality Medical Image Segmentation via Style Augmentation and Dual Normalization [supp]

Learning Optical Flow With Kernel Patch Attention

Learning To Prompt for Open-Vocabulary Object Detection With Vision-Language Model [supp]

TimeReplayer: Unlocking the Potential of Event Cameras for Video Interpolation [supp]

General Incremental Learning With Domain-Aware Categorical Representations [supp]

Interactive Segmentation and Visualization for Tiny Objects in Multi-Megapixel Images

ActiveZero: Mixed Domain Learning for Active Stereovision With Zero Annotation [supp]

DearKD: Data-Efficient Early Knowledge Distillation for Vision Transformers [supp]

Global-Aware Registration of Less-Overlap RGB-D Scans [supp]

RayMVSNet: Learning Ray-Based 1D Implicit Fields for Accurate Multi-View Stereo [supp]

ContrastMask: Contrastive Learning To Segment Every Thing [supp]

Efficient Deep Embedded Subspace Clustering [supp]

Neural MoCon: Neural Motion Control for Physically Plausible Human Motion Capture [supp]

Revisiting Temporal Alignment for Video Restoration [supp]

Scaling Vision Transformers to Gigapixel Images via Hierarchical Self-Supervised Learning [supp]

Neural Reflectance for Shape Recovery With Shadow Handling [supp]

Rep-Net: Efficient On-Device Learning via Feature Reprogramming [supp]

Surface Representation for Point Clouds [supp]

Implicit Motion Handling for Video Camouflaged Object Detection [supp]

OVE6D: Object Viewpoint Encoding for Depth-Based 6D Object Pose Estimation [supp]

DeepLIIF: An Online Platform for Quantification of Clinical Pathology Slides

Joint Video Summarization and Moment Localization by Cross-Task Sample Transfer [supp]

WALT: Watch and Learn 2D Amodal Representation From Time-Lapse Imagery [supp]

Learning With Twin Noisy Labels for Visible-Infrared Person Re-Identification [supp]

Optical Flow Estimation for Spiking Camera [supp]

MetaFormer Is Actually What You Need for Vision [supp]

GradViT: Gradient Inversion of Vision Transformers [supp]

Spatial-Temporal Space Hand-in-Hand: Spatial-Temporal Video Super-Resolution via Cycle-Projected Mutual Learning

InstaFormer: Instance-Aware Image-to-Image Translation With Transformer [supp]

Revisiting Near/Remote Sensing With Geospatial Attention [supp]

Joint Global and Local Hierarchical Priors for Learned Image Compression [supp]

Knowledge Distillation via the Target-Aware Transformer [supp]

Recurring the Transformer for Video Action Recognition [supp]

Subspace Adversarial Training [supp]

3D-VField: Adversarial Augmentation of Point Clouds for Domain Generalization in 3D Object Detection [supp]

Image Segmentation Using Text and Image Prompts [supp]

AutoMine: An Unmanned Mine Dataset [supp]

Neural Data-Dependent Transform for Learned Image Compression [supp]

Background Activation Suppression for Weakly Supervised Object Localization [supp]

How Many Observations Are Enough? Knowledge Distillation for Trajectory Forecasting [supp]

Evaluation-Oriented Knowledge Distillation for Deep Face Recognition

Improving Subgraph Recognition With Variational Graph Information Bottleneck

Slot-VPS: Object-Centric Representation Learning for Video Panoptic Segmentation [supp]

Motion-From-Blur: 3D Shape and Motion Estimation of Motion-Blurred Objects in Videos [supp]

Efficient Video Instance Segmentation via Tracklet Query and Proposal [supp]

Synthetic Generation of Face Videos With Plethysmograph Physiology

TransRAC: Encoding Multi-Scale Temporal Correlation With Transformers for Repetitive Action Counting [supp]

Hallucinated Neural Radiance Fields in the Wild [supp]

NeuralHDHair: Automatic High-Fidelity Hair Modeling From a Single Image Using Implicit Neural Representations [supp]

The Two Dimensions of Worst-Case Training and Their Integrated Effect for Out-of-Domain Generalization [supp]

Global Tracking Transformers

Backdoor Attacks on Self-Supervised Learning [supp]

Multimodal Token Fusion for Vision Transformers [supp]

Exploring Frequency Adversarial Attacks for Face Forgery Detection

GMFlow: Learning Optical Flow via Global Matching [supp]

Learning Hierarchical Cross-Modal Association for Co-Speech Gesture Generation [supp]

FLAVA: A Foundational Language and Vision Alignment Model [supp]

Signing at Scale: Learning to Co-Articulate Signs for Large-Scale Photo-Realistic Sign Language Production [supp]

Explore Spatio-Temporal Aggregation for Insubstantial Object Detection: Benchmark Dataset and Baseline

OCSampler: Compressing Videos to One Clip With Single-Step Sampling [supp]

Learning Bayesian Sparse Networks With Full Experience Replay for Continual Learning

Graph-Based Spatial Transformer With Memory Replay for Multi-Future Pedestrian Trajectory Prediction

Scanline Homographies for Rolling-Shutter Plane Absolute Pose [supp]

TableFormer: Table Structure Understanding With Transformers [supp]

Exemplar-Based Pattern Synthesis With Implicit Periodic Field Network

Grounded Language-Image Pre-Training [supp]

Spectral Unsupervised Domain Adaptation for Visual Recognition [supp]

AdaInt: Learning Adaptive Intervals for 3D Lookup Tables on Real-Time Image Enhancement [supp]

PatchFormer: An Efficient Point Transformer With Patch Attention

Recurrent Glimpse-Based Decoder for Detection With Transformer [supp]

Generating 3D Bio-Printable Patches Using Wound Segmentation and Reconstruction To Treat Diabetic Foot Ulcers [supp]

SimMIM: A Simple Framework for Masked Image Modeling [supp]

OmniFusion: 360 Monocular Depth Estimation via Geometry-Aware Fusion [supp]

Label Matching Semi-Supervised Object Detection [supp]

RegionCLIP: Region-Based Language-Image Pretraining [supp]

Video Frame Interpolation Transformer

An MIL-Derived Transformer for Weakly Supervised Point Cloud Segmentation [supp]

Fast Light-Weight Near-Field Photometric Stereo [supp]

BCOT: A Markerless High-Precision 3D Object Tracking Benchmark [supp]

Omni-DETR: Omni-Supervised Object Detection With Transformers [supp]

Uniform Subdivision of Omnidirectional Camera Space for Efficient Spherical Stereo Matching [supp]

High-Resolution Image Synthesis With Latent Diffusion Models [supp]

Improving Adversarially Robust Few-Shot Image Classification With Generalizable Representations

Transferable Sparse Adversarial Attack

CREAM: Weakly Supervised Object Localization via Class RE-Activation Mapping

Semi-Weakly-Supervised Learning of Complex Actions From Instructional Task Videos [supp]

APRIL: Finding the Achilles’ Heel on Privacy for Vision Transformers [supp]

Text Spotting Transformers

Mip-NeRF 360: Unbounded Anti-Aliased Neural Radiance Fields [supp]

VALHALLA: Visual Hallucination for Machine Translation [supp]

StyleSDF: High-Resolution 3D-Consistent Image and Geometry Generation [supp]

Incorporating Semi-Supervised and Positive-Unlabeled Learning for Boosting Full Reference Image Quality Assessment [supp]

GLAMR: Global Occlusion-Aware Human Mesh Recovery With Dynamic Cameras [supp]

HINT: Hierarchical Neuron Concept Explainer [supp]

Capturing and Inferring Dense Full-Body Human-Scene Contact [supp]

Advancing High-Resolution Video-Language Representation With Large-Scale Video Transcriptions [supp]

Target-Aware Dual Adversarial Learning and a Multi-Scenario Multi-Modality Benchmark To Fuse Infrared and Visible for Object Detection [supp]

En-Compactness: Self-Distillation Embedding & Contrastive Generation for Generalized Zero-Shot Learning [supp]

Neural Face Identification in a 2D Wireframe Projection of a Manifold Object [supp]

LC-FDNet: Learned Lossless Image Compression With Frequency Decomposition Network [supp]

Nonuniform-to-Uniform Quantization: Towards Accurate Quantization via Generalized Straight-Through Estimation [supp]

Deep Rectangling for Image Stitching: A Learning Baseline [supp]

PCL: Proxy-Based Contrastive Learning for Domain Generalization [supp]

SurfEmb: Dense and Continuous Correspondence Distributions for Object Pose Estimation With Learnt Surface Embeddings

Diverse Plausible 360-Degree Image Outpainting for Efficient 3DCG Background Creation [supp]

Learning 3D Object Shape and Layout Without 3D Supervision [supp]

An Empirical Study of End-to-End Temporal Action Detection [supp]

SimVP: Simpler Yet Better Video Prediction [supp]

Object Localization Under Single Coarse Point Supervision [supp]

Unsupervised Learning of Accurate Siamese Tracking [supp]

Bayesian Nonparametric Submodular Video Partition for Robust Anomaly Detection [supp]

Brain-Supervised Image Editing [supp]

3D Shape Variational Autoencoder Latent Disentanglement via Mini-Batch Feature Swapping for Bodies and Faces [supp]

Unified Transformer Tracker for Object Tracking [supp]

Non-Parametric Depth Distribution Modelling Based Depth Inference for Multi-View Stereo [supp]

Equalized Focal Loss for Dense Long-Tailed Object Detection [supp]

Generating High Fidelity Data From Low-Density Regions Using Diffusion Models [supp]

DeepDPM: Deep Clustering With an Unknown Number of Clusters [supp]

Spiking Transformers for Event-Based Single Object Tracking [supp]

FocalClick: Towards Practical Interactive Image Segmentation

ISDNet: Integrating Shallow and Deep Networks for Efficient Ultra-High Resolution Segmentation [supp]

Unsupervised Domain Adaptation for Nighttime Aerial Tracking [supp]

Balanced Multimodal Learning via On-the-Fly Gradient Modulation [supp]

RestoreFormer: High-Quality Blind Face Restoration From Undegraded Key-Value Pairs [supp]

Understanding Uncertainty Maps in Vision With Statistical Testing [supp]

CAFE: Learning To Condense Dataset by Aligning Features

Causality Inspired Representation Learning for Domain Generalization [supp]

Mask-Guided Spectral-Wise Transformer for Efficient Hyperspectral Image Reconstruction

A Variational Bayesian Method for Similarity Learning in Non-Rigid Image Registration

Not Just Selection, but Exploration: Online Class-Incremental Continual Learning via Dual View Consistency

PPDL: Predicate Probability Distribution Based Loss for Unbiased Scene Graph Generation [supp]

Block-NeRF: Scalable Large Scene Neural View Synthesis [supp]

Coupling Vision and Proprioception for Navigation of Legged Robots [supp]

Fine-Grained Predicates Learning for Scene Graph Generation

Generalized Few-Shot Semantic Segmentation [supp]

Exploiting Rigidity Constraints for LiDAR Scene Flow Estimation [supp]

Neural Head Avatars From Monocular RGB Videos [supp]

B-Cos Networks: Alignment Is All We Need for Interpretability [supp]

EMOCA: Emotion Driven Monocular Face Capture and Animation [supp]

Burst Image Restoration and Enhancement [supp]

What Makes Transfer Learning Work for Medical Images: Feature Reuse & Other Factors [supp]

Towards Diverse and Natural Scene-Aware 3D Human Motion Synthesis [supp]

Quarantine: Sparsity Can Uncover the Trojan Attack Trigger for Free [supp]

Audio-Visual Speech Codecs: Rethinking Audio-Visual Speech Enhancement by Re-Synthesis [supp]

Localized Adversarial Domain Generalization [supp]

X-Trans2Cap: Cross-Modal Knowledge Transfer Using Transformer for 3D Dense Captioning [supp]

How Much Does Input Data Type Impact Final Face Model Accuracy? [supp]

Image-to-Lidar Self-Supervised Distillation for Autonomous Driving Data [supp]

HumanNeRF: Free-Viewpoint Rendering of Moving People From Monocular Video [supp]

PoseKernelLifter: Metric Lifting of 3D Human Pose Using Sound

Which Images To Label for Few-Shot Medical Landmark Detection?

Why Discard if You Can Recycle?: A Recycling Max Pooling Module for 3D Point Cloud Analysis [supp]

Explaining Deep Convolutional Neural Networks via Latent Visual-Semantic Filter Attention [supp]

AlignQ: Alignment Quantization With ADMM-Based Correlation Preservation [supp]

Self-Distillation From the Last Mini-Batch for Consistency Regularization

Interactive Multi-Class Tiny-Object Detection [supp]

Learning From Pixel-Level Noisy Label: A New Perspective for Light Field Saliency Detection [supp]

UBoCo: Unsupervised Boundary Contrastive Learning for Generic Event Boundary Detection [supp]

Multi-View Depth Estimation by Fusing Single-View Depth Probability With Multi-View Geometry [supp]

Learning To Collaborate in Decentralized Learning of Personalized Models

CLIP-NeRF: Text-and-Image Driven Manipulation of Neural Radiance Fields [supp]

ART-Point: Improving Rotation Robustness of Point Cloud Classifiers via Adversarial Rotation [supp]

Ref-NeRF: Structured View-Dependent Appearance for Neural Radiance Fields [supp]

360-Attack: Distortion-Aware Perturbations From Perspective-Views

Targeted Supervised Contrastive Learning for Long-Tailed Recognition [supp]

Both Style and Fog Matter: Cumulative Domain Adaptation for Semantic Foggy Scene Understanding

Ev-TTA: Test-Time Adaptation for Event-Based Object Recognition [supp]

Balanced Contrastive Learning for Long-Tailed Visual Recognition [supp]

Slimmable Domain Adaptation [supp]

Bandits for Structure Perturbation-Based Black-Box Attacks To Graph Neural Networks With Theoretical Guarantees

NODEO: A Neural Ordinary Differential Equation Based Optimization Framework for Deformable Image Registration [supp]

DIP: Deep Inverse Patchmatch for High-Resolution Optical Flow [supp]

Few-Shot Object Detection With Fully Cross-Transformer [supp]

Pyramid Architecture for Multi-Scale Processing in Point Cloud Segmentation

Decoupling Makes Weakly Supervised Local Feature Better [supp]

Cross-Architecture Self-Supervised Video Representation Learning

High-Resolution Image Harmonization via Collaborative Dual Transformations [supp]

Homography Loss for Monocular 3D Object Detection

A Unified Model for Line Projections in Catadioptric Cameras With Rotationally Symmetric Mirrors [supp]

Dynamic Sparse R-CNN

MM-TTA: Multi-Modal Test-Time Adaptation for 3D Semantic Segmentation [supp]

Stable Long-Term Recurrent Video Super-Resolution [supp]

Dual-Generator Face Reenactment

Towards Bidirectional Arbitrary Image Rescaling: Joint Optimization and Cycle Idempotence

Self-Supervised Neural Articulated Shape and Appearance Models [supp]

A Hybrid Quantum-Classical Algorithm for Robust Fitting [supp]

Topology Preserving Local Road Network Estimation From Single Onboard Camera Image [supp]

Eigenlanes: Data-Driven Lane Descriptors for Structurally Diverse Lanes [supp]

Human Instance Matting via Mutual Guidance and Multi-Instance Refinement [supp]

TCTrack: Temporal Contexts for Aerial Tracking [supp]

SpaceEdit: Learning a Unified Editing Space for Open-Domain Image Color Editing [supp]

GAN-Supervised Dense Visual Alignment [supp]

SwinTextSpotter: Scene Text Spotting via Better Synergy Between Text Detection and Text Recognition [supp]

Multi-Level Feature Learning for Contrastive Multi-View Clustering

RendNet: Unified 2D/3D Recognizer With Latent Space Rendering

iPLAN: Interactive and Procedural Layout Planning [supp]

Video Frame Interpolation With Transformer [supp]

GIFS: Neural Implicit Function for General Shape Representation [supp]

Deblur-NeRF: Neural Radiance Fields From Blurry Images [supp]

Egocentric Prediction of Action Target in 3D [supp]

TemporalUV: Capturing Loose Clothing With Temporally Coherent UV Coordinates [supp]

Whose Track Is It Anyway? Improving Robustness to Tracking Errors With Affinity-Based Trajectory Prediction

DoubleField: Bridging the Neural Surface and Radiance Fields for High-Fidelity Human Reconstruction and Rendering [supp]

Towards Real-World Navigation With Deep Differentiable Planners [supp]

An Iterative Quantum Approach for Transformation Estimation From Point Sets [supp]

Video K-Net: A Simple, Strong, and Unified Baseline for Video Segmentation [supp]

UnweaveNet: Unweaving Activity Stories [supp]

Balanced MSE for Imbalanced Visual Regression [supp]

Local Learning Matters: Rethinking Data Heterogeneity in Federated Learning [supp]

PhysFormer: Facial Video-Based Physiological Measurement With Temporal Difference Transformer

Dimension Embeddings for Monocular 3D Object Detection

Look Closer To Supervise Better: One-Shot Font Generation via Component-Based Discriminator [supp]

NeRFReN: Neural Radiance Fields With Reflections [supp]

Blind Image Super-Resolution With Elaborate Degradation Modeling on Noise and Kernel [supp]

Finding Good Configurations of Planar Primitives in Unorganized Point Clouds [supp]

PhyIR: Physics-Based Inverse Rendering for Panoramic Indoor Images [supp]

SCS-Co: Self-Consistent Style Contrastive Learning for Image Harmonization [supp]

Beyond Fixation: Dynamic Window Visual Transformer

Progressive End-to-End Object Detection in Crowded Scenes [supp]

FMCNet: Feature-Level Modality Compensation for Visible-Infrared Person Re-Identification [supp]

Improving GAN Equilibrium by Raising Spatial Awareness [supp]

Neural Convolutional Surfaces [supp]

HyperSegNAS: Bridging One-Shot Neural Architecture Search With 3D Medical Image Segmentation Using HyperNet [supp]

A Comprehensive Study of Image Classification Model Sensitivity to Foregrounds, Backgrounds, and Visual Attributes [supp]

ConDor: Self-Supervised Canonicalization of 3D Pose for Partial Shapes [supp]

Source-Free Domain Adaptation via Distribution Estimation [supp]

Robust Combination of Distributed Gradients Under Adversarial Perturbations [supp]

Exploring Endogenous Shift for Cross-Domain Detection: A Large-Scale Benchmark and Perturbation Suppression Network

VisCUIT: Visual Auditor for Bias in CNN Image Classifier

Automatic Synthesis of Diverse Weak Supervision Sources for Behavior Analysis [supp]

Transferability Estimation Using Bhattacharyya Class Separability [supp]

DirecFormer: A Directed Attention in Transformer Approach to Robust Action Recognition

Hierarchical Self-Supervised Representation Learning for Movie Understanding

Robust Egocentric Photo-Realistic Facial Expression Transfer for Virtual Reality

Does Robustness on ImageNet Transfer to Downstream Tasks? [supp]

Propagation Regularizer for Semi-Supervised Learning With Extremely Scarce Labeled Samples [supp]

Bailando: 3D Dance Generation by Actor-Critic GPT With Choreographic Memory [supp]

Faithful Extreme Rescaling via Generative Prior Reciprocated Invertible Representations [supp]

Distillation Using Oracle Queries for Transformer-Based Human-Object Interaction Detection [supp]

Proto2Proto: Can You Recognize the Car, the Way I Do? [supp]

Learning Local-Global Contextual Adaptation for Multi-Person Pose Estimation [supp]

Learning Video Representations of Human Motion From Synthetic Data [supp]

TVConv: Efficient Translation Variant Convolution for Layout-Aware Visual Processing

Dual Adversarial Adaptation for Cross-Device Real-World Image Super-Resolution

FS6D: Few-Shot 6D Pose Estimation of Novel Objects [supp]

Habitat-Web: Learning Embodied Object-Search Strategies From Human Demonstrations at Scale [supp]

The Probabilistic Normal Epipolar Constraint for Frame-to-Frame Rotation Optimization Under Uncertain Feature Positions [supp]

Vision-Language Pre-Training for Boosting Scene Text Detectors

Reflection and Rotation Symmetry Detection via Equivariant Learning [supp]

BoostMIS: Boosting Medical Image Semi-Supervised Learning With Adaptive Pseudo Labeling and Informative Active Annotation

Simple but Effective: CLIP Embeddings for Embodied AI [supp]

NomMer: Nominate Synergistic Context in Vision Transformer for Visual Recognition [supp]

HOI4D: A 4D Egocentric Dataset for Category-Level Human-Object Interaction

Collaborative Transformers for Grounded Situation Recognition [supp]

DyRep: Bootstrapping Training With Dynamic Re-Parameterization [supp]

Not All Labels Are Equal: Rationalizing the Labeling Costs for Training Object Detection [supp]

CPPF: Towards Robust Category-Level 9D Pose Estimation in the Wild [supp]

Interact Before Align: Leveraging Cross-Modal Knowledge for Domain Adaptive Action Recognition [supp]

Interactive Disentanglement: Learning Concepts by Interacting With Their Prototype Representations [supp]

CDGNet: Class Distribution Guided Network for Human Parsing [supp]

Recall@k Surrogate Loss With Large Batches and Similarity Mixup [supp]

Direct Voxel Grid Optimization: Super-Fast Convergence for Radiance Fields Reconstruction [supp]

Continual Test-Time Domain Adaptation [supp]

URetinex-Net: Retinex-Based Deep Unfolding Network for Low-Light Image Enhancement [supp]

Towards Multi-Domain Single Image Dehazing via Test-Time Training

Vox2Cortex: Fast Explicit Reconstruction of Cortical Surfaces From 3D MRI Scans With Geometric Deep Neural Networks [supp]

Deep Safe Multi-View Clustering: Reducing the Risk of Clustering Performance Degradation Caused by View Increase [supp]

Dynamic MLP for Fine-Grained Image Classification by Leveraging Geographical and Temporal Information [supp]

HP-Capsule: Unsupervised Face Part Discovery by Hierarchical Parsing Capsule Network [supp]

ScanQA: 3D Question Answering for Spatial Scene Understanding [supp]

MuKEA: Multimodal Knowledge Extraction and Accumulation for Knowledge-Based Visual Question Answering [supp]

Class-Incremental Learning by Knowledge Distillation With Adaptive Feature Consolidation [supp]

Learning Program Representations for Food Images and Cooking Recipes

Bending Graphs: Hierarchical Shape Matching Using Gated Optimal Transport [supp]

Transform-Retrieve-Generate: Natural Language-Centric Outside-Knowledge Visual Question Answering [supp]

Federated Learning With Position-Aware Neurons [supp]

Fair Contrastive Learning for Facial Attribute Classification [supp]

MDAN: Multi-Level Dependent Attention Network for Visual Emotion Analysis

Nested Hyperbolic Spaces for Dimensionality Reduction and Hyperbolic NN Design [supp]

BNUDC: A Two-Branched Deep Neural Network for Restoring Images From Under-Display Cameras [supp]

RGB-Depth Fusion GAN for Indoor Depth Completion [supp]

Training Object Detectors From Scratch: An Empirical Study in the Era of Vision Transformer

RCL: Recurrent Continuous Localization for Temporal Action Detection [supp]

C2SLR: Consistency-Enhanced Continuous Sign Language Recognition [supp]

Human Trajectory Prediction With Momentary Observation

FoggyStereo: Stereo Matching With Fog Volume Representation [supp]

Trajectory Optimization for Physics-Based Reconstruction of 3D Human Pose From Monocular Video [supp]

Directional Self-Supervised Learning for Heavy Image Augmentations [supp]

Lifelong Unsupervised Domain Adaptive Person Re-Identification With Coordinated Anti-Forgetting and Adaptation [supp]

No-Reference Point Cloud Quality Assessment via Domain Adaptation

Generating Representative Samples for Few-Shot Classification [supp]

Comprehending and Ordering Semantics for Image Captioning

Dynamic Scene Graph Generation via Anticipatory Pre-Training

A Large-Scale Comprehensive Dataset and Copy-Overlap Aware Evaluation Protocol for Segment-Level Video Copy Detection [supp]

GaTector: A Unified Framework for Gaze Object Prediction [supp]

ELIC: Efficient Learned Image Compression With Unevenly Grouped Space-Channel Contextual Adaptive Coding [supp]

CSWin Transformer: A General Vision Transformer Backbone With Cross-Shaped Windows [supp]

LaTr: Layout-Aware Transformer for Scene-Text VQA [supp]

Label Relation Graphs Enhanced Hierarchical Residual Network for Hierarchical Multi-Granularity Classification [supp]

ITSA: An Information-Theoretic Approach to Automatic Shortcut Avoidance and Domain Generalization in Stereo Matching Networks [supp]

Enhancing Face Recognition With Self-Supervised 3D Reconstruction

HeadNeRF: A Real-Time NeRF-Based Parametric Head Model

FvOR: Robust Joint Shape and Pose Optimization for Few-View Object Reconstruction [supp]

Reduce Information Loss in Transformers for Pluralistic Image Inpainting [supp]

Replacing Labeled Real-Image Datasets With Auto-Generated Contours

Cross-Modal Transferable Adversarial Attacks From Images to Videos [supp]

Few Could Be Better Than All: Feature Sampling and Grouping for Scene Text Detection [supp]

Do Explanations Explain? Model Knows Best [supp]

WebQA: Multihop and Multimodal QA [supp]

Occlusion-Robust Face Alignment Using a Viewpoint-Invariant Hierarchical Network Architecture [supp]

BasicVSR++: Improving Video Super-Resolution With Enhanced Propagation and Alignment [supp]

IDR: Self-Supervised Image Denoising via Iterative Data Refinement [supp]

MogFace: Towards a Deeper Appreciation on Face Detection [supp]

GuideFormer: Transformers for Image Guided Depth Completion [supp]

Multi-Label Iterated Learning for Image Classification With Label Ambiguity [supp]

Region-Aware Face Swapping

Towards Language-Free Training for Text-to-Image Generation [supp]

Learning Affinity From Attention: End-to-End Weakly-Supervised Semantic Segmentation With Transformers [supp]

Pushing the Envelope of Gradient Boosting Forests via Globally-Optimized Oblique Trees [supp]

Physical Simulation Layer for Accurate 3D Modeling [supp]

Deformable Sprites for Unsupervised Video Decomposition [supp]

CamLiFlow: Bidirectional Camera-LiDAR Fusion for Joint Optical Flow and Scene Flow Estimation [supp]

FERV39k: A Large-Scale Multi-Scene Dataset for Facial Expression Recognition in Videos [supp]

Learning To Detect Mobile Objects From LiDAR Scans Without Labels [supp]

BNV-Fusion: Dense 3D Reconstruction Using Bi-Level Neural Volume Fusion [supp]

Probabilistic Representations for Video Contrastive Learning [supp]

EnvEdit: Environment Editing for Vision-and-Language Navigation [supp]

Omnivore: A Single Model for Many Visual Modalities [supp]

Neural Shape Mating: Self-Supervised Object Assembly With Adversarial Shape Priors

Reflash Dropout in Image Super-Resolution [supp]

WildNet: Learning Domain Generalized Semantic Segmentation From the Wild [supp]

Auditing Privacy Defenses in Federated Learning via Generative Gradient Leakage [supp]

DAIR-V2X: A Large-Scale Dataset for Vehicle-Infrastructure Cooperative 3D Object Detection

DECORE: Deep Compression With Reinforcement Learning

Time3D: End-to-End Joint Monocular 3D Object Detection and Tracking for Autonomous Driving [supp]

MonoJSG: Joint Semantic and Geometric Cost Volume for Monocular 3D Object Detection [supp]

Task Discrepancy Maximization for Fine-Grained Few-Shot Classification [supp]

FedDC: Federated Learning With Non-IID Data via Local Drift Decoupling and Correction [supp]

Efficient Classification of Very Large Images With Tiny Objects [supp]

SWEM: Towards Real-Time Video Object Segmentation With Sequential Weighted Expectation-Maximization [supp]

Point-to-Voxel Knowledge Distillation for LiDAR Semantic Segmentation [supp]

Leveling Down in Computer Vision: Pareto Inefficiencies in Fair Deep Classifiers [supp]

Generating Diverse 3D Reconstructions From a Single Occluded Face Image [supp]

RBGNet: Ray-Based Grouping for 3D Object Detection [supp]

Stand-Alone Inter-Frame Attention in Video Models

Uncertainty-Aware Adaptation for Self-Supervised 3D Human Pose Estimation [supp]

Open-Domain, Content-Based, Multi-Modal Fact-Checking of Out-of-Context Images via Online Resources [supp]

Memory-Augmented Deep Conditional Unfolding Network for Pan-Sharpening

Semi-Supervised Wide-Angle Portraits Correction by Multi-Scale Transformer [supp]

Large-Scale Pre-Training for Person Re-Identification With Noisy Labels [supp]

Adiabatic Quantum Computing for Multi Object Tracking [supp]

Feature Erasing and Diffusion Network for Occluded Person Re-Identification

Is Mapping Necessary for Realistic PointGoal Navigation? [supp]

Node-Aligned Graph Convolutional Network for Whole-Slide Image Representation and Classification

Represent, Compare, and Learn: A Similarity-Aware Framework for Class-Agnostic Counting [supp]

Masked Feature Prediction for Self-Supervised Visual Pre-Training [supp]

Critical Regularizations for Neural Surface Reconstruction in the Wild [supp]

EASE: Unsupervised Discriminant Subspace Learning for Transductive Few-Shot Learning [supp]

Object-Relation Reasoning Graph for Action Recognition

Semantic Segmentation by Early Region Proxy [supp]

GIQE: Generic Image Quality Enhancement via Nth Order Iterative Degradation [supp]

Instance Segmentation With Mask-Supervised Polygonal Boundary Transformers

FaceVerse: A Fine-Grained and Detail-Controllable 3D Face Morphable Model From a Hybrid Dataset [supp]

Bring Evanescent Representations to Life in Lifelong Class Incremental Learning [supp]

Single-Stage 3D Geometry-Preserving Depth Estimation Model Training on Dataset Mixtures With Uncalibrated Stereo Data [supp]

LD-ConGR: A Large RGB-D Video Dataset for Long-Distance Continuous Gesture Recognition [supp]

SimVQA: Exploring Simulated Environments for Visual Question Answering [supp]

Thin-Plate Spline Motion Model for Image Animation [supp]

Learning Local Displacements for Point Cloud Completion [supp]

Human Hands As Probes for Interactive Object Understanding [supp]

Understanding and Increasing Efficiency of Frank-Wolfe Adversarial Training [supp]

Certified Patch Robustness via Smoothed Vision Transformers [supp]

Look Back and Forth: Video Super-Resolution With Explicit Temporal Difference Modeling

UCC: Uncertainty Guided Cross-Head Co-Training for Semi-Supervised Semantic Segmentation

HVH: Learning a Hybrid Neural Volumetric Representation for Dynamic Hair Performance Capture [supp]

RADU: Ray-Aligned Depth Update Convolutions for ToF Data Denoising [supp]

Rethinking Visual Geo-Localization for Large-Scale Applications [supp]

Learning Based Multi-Modality Image and Video Compression [supp]
