  1. ADOP: Approximate Differentiable One-Pixel Point Rendering — Rückert et al. — https://paperswithcode.com/paper/adop-approximate-differentiable-one-pixel
  2. The Bayesian Learning Rule — Khan et al. — https://paperswithcode.com/paper/the-bayesian-learning-rule
  3. Program Synthesis with Large Language Models — Austin et al. — https://paperswithcode.com/paper/program-synthesis-with-large-language-models
  4. Masked Autoencoders Are Scalable Vision Learners — He et al. — https://paperswithcode.com/paper/masked-autoencoders-are-scalable-vision
  5. 8-bit Optimizers via Block-wise Quantization — Dettmers et al. — https://paperswithcode.com/paper/8-bit-optimizers-via-block-wise-quantization
  6. Revisiting ResNets: Improved Training and Scaling Strategies — Bello et al. — https://paperswithcode.com/paper/revisiting-resnets-improved-training-and
  7. Image Super-Resolution via Iterative Refinement — Saharia et al. — https://paperswithcode.com/paper/image-super-resolution-via-iterative
  8. Perceiver IO: A General Architecture for Structured Inputs & Outputs — Jaegle et al. — https://paperswithcode.com/paper/perceiver-io-a-general-architecture-for
  9. Do Vision Transformers See Like Convolutional Neural Networks? — Raghu et al. — https://paperswithcode.com/paper/do-vision-transformers-see-like-convolutional
  10. Implicit MLE: Backpropagating Through Discrete Exponential Family Distributions — Niepert et al. — https://paperswithcode.com/paper/implicit-mle-backpropagating-through-discrete
  1. PyTorch Image Models — Ross Wightman — https://github.com/rwightman/pytorch-image-models
  2. Transformers — Hugging Face — https://github.com/huggingface/transformers
  3. PyTorch-GAN — Erik Linder-Norén — https://github.com/eriklindernoren/PyTorch-GAN
  4. MMDetection — OpenMMLab — https://github.com/open-mmlab/mmdetection
  5. Darknet — AlexeyAB — https://github.com/AlexeyAB/darknet
  6. Vision Transformer PyTorch — lucidrains — https://github.com/lucidrains/vit-pytorch
  7. InsightFace — DeepInsight — https://github.com/deepinsight/insightface
  8. Detectron2 — Meta AI — https://github.com/facebookresearch/detectron2
  9. PaddleOCR — PaddlePaddle — https://github.com/PaddlePaddle/PaddleOCR
  10. FairSeq — Meta AI — https://github.com/pytorch/fairseq

Top Datasets - 2021

  1. MATH — Hendrycks et al. — https://paperswithcode.com/dataset/math
  2. UAV-Human — Li et al. — https://paperswithcode.com/dataset/uav-human
  3. UPFD (User Preference-aware Fake News Detection) — Dou et al. — https://paperswithcode.com/dataset/upfd
  4. OGB-LSC (OGB Large-Scale Challenge) — Hu et al. — https://paperswithcode.com/dataset/ogb-lsc
  5. CodeXGLUE — Lu et al. — https://paperswithcode.com/dataset/codexglue
  6. AGORA — Patel et al. — https://paperswithcode.com/dataset/agora
  7. BEIR (Benchmarking IR) — Thakur et al. — https://paperswithcode.com/dataset/beir
  8. WikiGraphs — Wang et al. — https://paperswithcode.com/dataset/wikigraphs
  9. Few-NERD — Ding et al. — https://paperswithcode.com/dataset/few-nerd
  10. PASS (Pictures without humAns for Self-Supervision) — Asano et al. — https://paperswithcode.com/dataset/pass

Papers of 2022

  1. Controllable Animation of Fluid Elements in Still Images
  2. F-SfT: Shape-From-Template With A Physics-Based Deformation Model
  3. TWIST: Two-Way Inter-Label Self-Training for Semi-Supervised 3D Instance Segmentation
  4. Do Learned Representations Respect Causal Relationships?
  5. ZeroCap: Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic
  6. 3D Moments From Near-Duplicate Photos
  7. Exact Feature Distribution Matching for Arbitrary Style Transfer and Domain Generalization
  8. Blind2Unblind: Self-Supervised Image Denoising With Visible Blind Spots
  9. Balanced and Hierarchical Relation Learning for One-Shot Object Detection
  10. NICE-SLAM: Neural Implicit Scalable Encoding for SLAM
  11. Stochastic Trajectory Prediction Via Motion Indeterminacy Diffusion
  12. CLRNet: Cross Layer Refinement Network for Lane Detection
  13. Motion-Aware Contrastive Video Representation Learning Via Foreground-Background Merging
  14. DINE: Domain Adaptation From Single and Multiple Black-Box Predictors
  15. FaceFormer: Speech-Driven 3D Facial Animation With Transformers
  16. Rotationally Equivariant 3D Object Detection
  17. Accelerating DETR Convergence Via Semantic-Aligned Matching
  18. Cloning Outfits From Real-World Images to 3D Characters for Generalizable Person Re-Identification
  19. GeoNeRF: Generalizing NeRF With Geometry Priors
  20. ABPN: Adaptive Blend Pyramid Network for Real-Time Local Retouching of Ultra High-Resolution Photo
  21. Expanding Low-Density Latent Regions for Open-Set Object Detection
  22. Uformer: A General U-Shaped Transformer for Image Restoration
  23. Exploring Dual-Task Correlation for Pose Guided Person Image Generation
  24. Portrait Eyeglasses and Shadow Removal By Leveraging 3D Synthetic Data
  25. Modeling 3D Layout for Group Re-Identification
  26. Toward Fast, Flexible, and Robust Low-Light Image Enhancement
  27. Bridge-Prompt: Towards Ordinal Action Understanding in Instructional Videos
  28. HandOccNet: Occlusion-Robust 3D Hand Mesh Estimation Network
  29. Modular Action Concept Grounding in Semantic Video Prediction
  30. StyleSwin: Transformer-Based GAN for High-Resolution Image Generation
  31. Discrete Cosine Transform Network for Guided Depth Map Super-Resolution
  32. Cerberus Transformer: Joint Semantic, Affordance and Attribute Parsing
  33. TransGeo: Transformer Is All You Need for Cross-View Image Geo-Localization
  34. Contrastive Boundary Learning for Point Cloud Segmentation
  35. Details or Artifacts: A Locally Discriminative Learning Approach to Realistic Image Super-Resolution
  36. CVNet: Contour Vibration Network for Building Extraction
  37. Swin Transformer V2: Scaling Up Capacity and Resolution
  38. Projective Manifold Gradient Layer for Deep Rotation Regression
  39. HCSC: Hierarchical Contrastive Selective Coding
  40. TransRank: Self-Supervised Video Representation Learning Via Ranking-Based Transformation Recognition
  41. DiSparse: Disentangled Sparsification for Multitask Model Compression
  42. Pushing The Limits of Simple Pipelines for Few-Shot Learning: External Data and Fine-Tuning Make A Difference
  43. Towards Efficient and Scalable Sharpness-Aware Minimization
  44. OSSO: Obtaining Skeletal Shape From Outside
  45. A Study on The Distribution of Social Biases in Self-Supervised Learning Visual Models
  46. Self-Supervised Predictive Learning: A Negative-Free Method for Sound Source Localization in Visual Scenes
  47. Comparing Correspondences: Video Prediction With Correspondence-Wise Losses
  48. Towards Fewer Annotations: Active Learning Via Region Impurity and Prediction Uncertainty for Domain Adaptive Semantic Segmentation
  49. CrossPoint: Self-Supervised Cross-Modal Contrastive Learning for 3D Point Cloud Understanding
  50. Few Shot Generative Model Adaption Via Relaxed Spatial Structural Alignment
  51. Enhancing Adversarial Training With Second-Order Statistics of Weights
  52. Dual Temperature Helps Contrastive Learning Without Many Negative Samples: Towards Understanding and Simplifying MoCo
  53. Moving Window Regression: A Novel Approach to Ordinal Regression
  54. Self-Supervised Predictive Convolutional Attentive Block for Anomaly Detection
  55. Robust Optimization As Data Augmentation for Large-Scale Graphs
  56. Robust Structured Declarative Classifiers for 3D Point Clouds: Defending Adversarial Attacks With Implicit Gradients
  57. Improving The Transferability of Targeted Adversarial Examples Through Object-Based Diverse Input
  58. ObjectFolder 2.0: A Multisensory Object Dataset for Sim2Real Transfer
  59. 360MonoDepth: High-Resolution 360° Monocular Depth Estimation
  60. POCO: Point Convolution for Surface Reconstruction
  61. Neural Texture Extraction and Distribution for Controllable Person Image Synthesis
  62. Classification-Then-Grounding: Reformulating Video Scene Graphs As Temporal Bipartite Graphs
  63. DF-GAN: A Simple and Effective Baseline for Text-to-Image Synthesis
  64. ZeroWaste Dataset: Towards Deformable Object Segmentation in Cluttered Scenes
  65. UNIST: Unpaired Neural Implicit Shape Translation Network
  66. APES: Articulated Part Extraction From Sprite Sheets
  67. SPAct: Self-Supervised Privacy Preservation for Action Recognition
  68. De-Rendering 3D Objects in The Wild
  69. Global Sensing and Measurements Reuse for Image Compressed Sensing
  70. Practical Evaluation of Adversarial Robustness Via Adaptive Auto Attack
  71. Cross-View Transformers for Real-Time Map-View Semantic Segmentation
  72. Controllable Dynamic Multi-Task Architectures
  73. FastDOG: Fast Discrete Optimization on GPU
  74. Focal and Global Knowledge Distillation for Detectors
  75. Learning To Prompt for Continual Learning
  76. Human Mesh Recovery From Multiple Shots
  77. Convolution of Convolution: Let Kernels Spatially Collaborate
  78. Make It Move: Controllable Image-to-Video Generation With Text Descriptions
  79. Neural Points: Point Cloud Representation With Neural Fields for Arbitrary Upsampling
  80. Video-Text Representation Learning Via Differentiable Weak Temporal Alignment
  81. Bi-Directional Object-Context Prioritization Learning for Saliency Ranking
  82. Vehicle Trajectory Prediction Works, But Not Everywhere
  83. MonoDTR: Monocular 3D Object Detection With Depth-Aware Transformer
  84. Attribute Surrogates Learning and Spectral Tokens Pooling in Transformers for Few-Shot Learning
  85. Generalized Category Discovery
  86. Contour-Hugging Heatmaps for Landmark Detection
  87. Voxel Field Fusion for 3D Object Detection
  88. DisARM: Displacement Aware Relation Module for 3D Detection
  89. MixFormer: Mixing Features Across Windows and Dimensions
  90. FineDiving: A Fine-Grained Dataset for Procedure-Aware Action Quality Assessment
  91. HEAT: Holistic Edge Attention Transformer for Structured Reconstruction
  92. Mobile-Former: Bridging MobileNet and Transformer
  93. CycleMix: A Holistic Strategy for Medical Image Segmentation From Scribble Supervision
  94. VideoINR: Learning Video Implicit Neural Representation for Continuous Space-Time Super-Resolution
  95. Towards End-to-End Unified Scene Text Detection and Layout Analysis
  96. AutoSDF: Shape Priors for 3D Completion, Reconstruction and Generation
  97. ISNAS-DIP: Image-Specific Neural Architecture Search for Deep Image Prior
  98. End-to-End Referring Video Object Segmentation With Multimodal Transformers
  99. IterMVS: Iterative Probability Estimation for Efficient Multi-View Stereo
  100. Not All Points Are Equal: Learning Highly Efficient Point-Based Detectors for 3D LiDAR Point Clouds
  101. Detecting Camouflaged Object in Frequency Domain
  102. SelfRecon: Self Reconstruction Your Digital Avatar From Monocular Video
  103. Equivariant Point Cloud Analysis Via Learning Orientations for Message Passing
  104. Node Representation Learning in Graph Via Node-to-Neighbourhood Mutual Information Maximization
  105. Semi-Supervised Video Semantic Segmentation With Inter-Frame Feature Reconstruction
  106. Amodal Segmentation Through Out-of-Task and Out-of-Distribution Generalization With A Bayesian Model
  107. How Well Do Sparse ImageNet Models Transfer?
  108. REX: Reasoning-Aware and Grounded Explanation
  109. Canonical Voting: Towards Robust Oriented Bounding Box Detection in 3D Scenes
  110. Object-Aware Video-Language Pre-Training for Retrieval
  111. MAT: Mask-Aware Transformer for Large Hole Image Inpainting
  112. Align and Prompt: Video-and-Language Pre-Training With Entity Prompts
  113. MSG-Transformer: Exchanging Local Spatial Information By Manipulating Messenger Tokens
  114. Cross Modal Retrieval With Querybank Normalisation
  115. Ray3D: Ray-Based 3D Human Pose Estimation for Monocular Absolute 3D Localization
  116. ASM-Loc: Action-Aware Segment Modeling for Weakly-Supervised Temporal Action Localization
  117. Scaling Up Your Kernels to 31×31: Revisiting Large Kernel Design in CNNs
  118. End-to-End Multi-Person Pose Estimation With Transformers
  119. REGTR: End-to-End Point Cloud Correspondences With Transformers
  120. Neural 3D Scene Reconstruction With The Manhattan-World Assumption
  121. V2C: Visual Voice Cloning
  122. Revisiting AP Loss for Dense Object Detection: Adaptive Ranking Pair Selection
  123. MAD: A Scalable Dataset for Language Grounding in Videos From Movie Audio Descriptions
  124. Gait Recognition in The Wild With Dense 3D Representations and A Benchmark
  125. ArtiBoost: Boosting Articulated 3D Hand-Object Pose Estimation Via Online Exploration and Synthesis
  126. QueryDet: Cascaded Sparse Query for Accelerating High-Resolution Small Object Detection
  127. IDEA-Net: Dynamic 3D Point Cloud Interpolation Via Deep Embedding Alignment
  128. BEHAVE: Dataset and Method for Tracking Human Object Interactions
  129. Revisiting Random Channel Pruning for Neural Network Compression
  130. Generating Diverse and Natural 3D Human Motions From Text
  131. E-CIR: Event-Enhanced Continuous Intensity Recovery
  132. Towards Robust Rain Removal Against Adversarial Attacks: A Comprehensive Benchmark Analysis and Beyond
  133. Symmetry and Uncertainty-Aware Object SLAM for 6DoF Object Pose Estimation
  134. AziNorm: Exploiting The Radial Symmetry of Point Cloud for Azimuth-Normalized 3D Perception
  135. Weakly Supervised Rotation-Invariant Aerial Object Detection Network
  136. Surface Reconstruction From Point Clouds By Learning Predictive Context Priors
  137. IRISformer: Dense Vision Transformers for Single-Image Inverse Rendering in Indoor Scenes
  138. DynamicEarthNet: Daily Multi-Spectral Satellite Dataset for Semantic Change Segmentation
  139. Weakly Supervised Temporal Action Localization Via Representative Snippet Knowledge Propagation
  140. E2EC: An End-to-End Contour-Based Method for High-Quality High-Speed Instance Segmentation
  141. BatchFormer: Learning To Explore Sample Relationships for Robust Representation Learning
  142. Self-Supervised Image-Specific Prototype Exploration for Weakly Supervised Semantic Segmentation
  143. Learning Multi-View Aggregation in The Wild for Large-Scale 3D Semantic Segmentation
  144. PIE-Net: Photometric Invariant Edge Guided Network for Intrinsic Image Decomposition
  145. Clothes-Changing Person Re-Identification With RGB Modality Only
  146. Robust Image Forgery Detection Over Online Social Network Shared Images
  147. Representation Compensation Networks for Continual Semantic Segmentation
  148. Tracking People By Predicting 3D Appearance, Location and Pose
  149. Text2Mesh: Text-Driven Neural Stylization for Meshes
  150. C-CAM: Causal CAM for Weakly Supervised Semantic Segmentation on Medical Image
  151. Forward Compatible Few-Shot Class-Incremental Learning
  152. Weakly Supervised Object Localization As Domain Adaption
  153. Tencent-MVSE: A Large-Scale Benchmark Dataset for Multi-Modal Video Similarity Evaluation
  154. Deep Orientation-Aware Functional Maps: Tackling Symmetry Issues in Shape Matching
  155. Tree Energy Loss: Towards Sparsely Annotated Semantic Segmentation
  156. MatteFormer: Transformer-Based Image Matting Via Prior-Tokens
  157. Video Shadow Detection Via Spatio-Temporal Interpolation Consistency Training
  158. Robust and Accurate Superquadric Recovery: A Probabilistic Approach
  159. Grounding Answers for Visual Questions Asked By Visually Impaired People
  160. Sparse Instance Activation for Real-Time Instance Segmentation
  161. VisualGPT: Data-Efficient Adaptation of Pretrained Language Models for Image Captioning
  162. MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation
  163. Surface-Aligned Neural Radiance Fields for Controllable 3D Human Synthesis
  164. Towards Implicit Text-Guided 3D Shape Generation
  165. SoftCollage: A Differentiable Probabilistic Tree Generator for Image Collage
  166. Query and Attention Augmentation for Knowledge-Based Explainable Reasoning
  167. Winoground: Probing Vision and Language Models for Visio-Linguistic Compositionality
  168. Progressive Attention on Multi-Level Dense Difference Maps for Generic Event Boundary Detection
  169. Fine-Grained Object Classification Via Self-Supervised Pose Alignment
  170. Animal Kingdom: A Large and Diverse Dataset for Animal Behavior Understanding
  171. Fine-Grained Temporal Contrastive Learning for Weakly-Supervised Temporal Action Localization
  172. Relieving Long-Tailed Instance Segmentation Via Pairwise Class Balance
  173. Online Convolutional Re-Parameterization
  174. Mimicking The Oracle: An Initial Phase Decorrelation Approach for Class Incremental Learning
  175. RelTransformer: A Transformer-Based Long-Tail Visual Relationship Recognition
  176. Personalized Image Aesthetics Assessment With Rich Attributes
  177. Part-Based Pseudo Label Refinement for Unsupervised Person Re-Identification
  178. HDNet: High-Resolution Dual-Domain Learning for Spectral Compressive Imaging
  179. OW-DETR: Open-World Detection Transformer
  180. Learning Deep Implicit Functions for 3D Shapes With Dynamic Code Clouds
  181. Reversible Vision Transformers
  182. Amodal Panoptic Segmentation
  183. Correlation Verification for Image Retrieval
  184. Temporal Feature Alignment and Mutual Information Maximization for Video-Based Human Pose Estimation
  185. Self-Supervised Transformers for Unsupervised Object Discovery Using Normalized Cut
  186. Exploring Structure-Aware Transformer Over Interaction Proposals for Human-Object Interaction Detection
  187. Decoupled Multi-Task Learning With Cyclical Self-Regulation for Face Parsing
  188. GLASS: Geometric Latent Augmentation for Shape Spaces
  189. DPICT: Deep Progressive Image Compression Using Trit-Planes
  190. Text to Image Generation With Semantic-Spatial Aware GAN
  191. Generalizable Cross-Modality Medical Image Segmentation Via Style Augmentation and Dual Normalization
  192. Learning To Prompt for Open-Vocabulary Object Detection With Vision-Language Model
  193. Interactive Segmentation and Visualization for Tiny Objects in Multi-Megapixel Images
  194. Neural MoCon: Neural Motion Control for Physically Plausible Human Motion Capture
  195. Surface Representation for Point Clouds
  196. Implicit Motion Handling for Video Camouflaged Object Detection
  197. DeepLIIF: An Online Platform for Quantification of Clinical Pathology Slides
  198. Learning With Twin Noisy Labels for Visible-Infrared Person Re-Identification
  199. Optical Flow Estimation for Spiking Camera
  200. GradViT: Gradient Inversion of Vision Transformers
  201. Spatial-Temporal Space Hand-in-Hand: Spatial-Temporal Video Super-Resolution Via Cycle-Projected Mutual Learning
  202. Joint Global and Local Hierarchical Priors for Learned Image Compression
  203. Knowledge Distillation Via The Target-Aware Transformer
  204. Subspace Adversarial Training
  205. 3D-VField: Adversarial Augmentation of Point Clouds for Domain Generalization in 3D Object Detection
  206. Image Segmentation Using Text and Image Prompts
  207. AutoMine: An Unmanned Mine Dataset
  208. Background Activation Suppression for Weakly Supervised Object Localization
  209. Synthetic Generation of Face Videos With Plethysmograph Physiology
  210. Hallucinated Neural Radiance Fields in The Wild
  211. Global Tracking Transformers
  212. Backdoor Attacks on Self-Supervised Learning
  213. GMFlow: Learning Optical Flow Via Global Matching
  214. Learning Hierarchical Cross-Modal Association for Co-Speech Gesture Generation
  215. Explore Spatio-Temporal Aggregation for Insubstantial Object Detection: Benchmark Dataset and Baseline
  216. Graph-Based Spatial Transformer With Memory Replay for Multi-Future Pedestrian Trajectory Prediction
  217. Scanline Homographies for Rolling-Shutter Plane Absolute Pose
  218. AdaInt: Learning Adaptive Intervals for 3D Lookup Tables on Real-Time Image Enhancement
  219. Recurrent Glimpse-Based Decoder for Detection With Transformer
  220. SimMIM: A Simple Framework for Masked Image Modeling
  221. Label Matching Semi-Supervised Object Detection
  222. RegionCLIP: Region-Based Language-Image Pretraining
  223. Video Frame Interpolation Transformer
  224. BCOT: A Markerless High-Precision 3D Object Tracking Benchmark
  225. Omni-DETR: Omni-Supervised Object Detection With Transformers
  226. Transferable Sparse Adversarial Attack
  227. CREAM: Weakly Supervised Object Localization Via Class RE-Activation Mapping
  228. VALHALLA: Visual Hallucination for Machine Translation
  229. HINT: Hierarchical Neuron Concept Explainer
  230. Neural Face Identification in A 2D Wireframe Projection of A Manifold Object
  231. Nonuniform-to-Uniform Quantization: Towards Accurate Quantization Via Generalized Straight-Through Estimation
  232. An Empirical Study of End-to-End Temporal Action Detection
  233. Object Localization Under Single Coarse Point Supervision
  234. Unsupervised Learning of Accurate Siamese Tracking
  235. Non-Parametric Depth Distribution Modelling Based Depth Inference for Multi-View Stereo
  236. Equalized Focal Loss for Dense Long-Tailed Object Detection
  237. DeepDPM: Deep Clustering With An Unknown Number of Clusters
  238. ISDNet: Integrating Shallow and Deep Networks for Efficient Ultra-High Resolution Segmentation
  239. Unsupervised Domain Adaptation for Nighttime Aerial Tracking
  240. RestoreFormer: High-Quality Blind Face Restoration From Undegraded Key-Value Pairs
  241. Mask-Guided Spectral-Wise Transformer for Efficient Hyperspectral Image Reconstruction
  242. A Variational Bayesian Method for Similarity Learning in Non-Rigid Image Registration
  243. Not Just Selection, But Exploration: Online Class-Incremental Continual Learning Via Dual View Consistency
  244. Coupling Vision and Proprioception for Navigation of Legged Robots
  245. Exploiting Rigidity Constraints for LiDAR Scene Flow Estimation
  246. EMOCA: Emotion Driven Monocular Face Capture and Animation
  247. Quarantine: Sparsity Can Uncover The Trojan Attack Trigger for Free
  248. AlignQ: Alignment Quantization With ADMM-Based Correlation Preservation
  249. Interactive Multi-Class Tiny-Object Detection
  250. Learning From Pixel-Level Noisy Label: A New Perspective for Light Field Saliency Detection
  251. Multi-View Depth Estimation By Fusing Single-View Depth Probability With Multi-View Geometry
  252. Slimmable Domain Adaptation
  253. High-Resolution Image Harmonization Via Collaborative Dual Transformations
  254. MM-TTA: Multi-Modal Test-Time Adaptation for 3D Semantic Segmentation
  255. Self-Supervised Neural Articulated Shape and Appearance Models
  256. Topology Preserving Local Road Network Estimation From Single Onboard Camera Image
  257. Eigenlanes: Data-Driven Lane Descriptors for Structurally Diverse Lanes
  258. SwinTextSpotter: Scene Text Spotting Via Better Synergy Between Text Detection and Text Recognition
  259. Deblur-NeRF: Neural Radiance Fields From Blurry Images
  260. Whose Track Is It Anyway? Improving Robustness to Tracking Errors With Affinity-Based Trajectory Prediction
  261. Video K-Net: A Simple, Strong, and Unified Baseline for Video Segmentation
  262. Local Learning Matters: Rethinking Data Heterogeneity in Federated Learning
  263. Blind Image Super-Resolution With Elaborate Degradation Modeling on Noise and Kernel
  264. Faithful Extreme Rescaling Via Generative Prior Reciprocated Invertible Representations
  265. Proto2Proto: Can You Recognize The Car, The Way I Do?
  266. TVConv: Efficient Translation Variant Convolution for Layout-Aware Visual Processing
  267. Dual Adversarial Adaptation for Cross-Device Real-World Image Super-Resolution
  268. Habitat-Web: Learning Embodied Object-Search Strategies From Human Demonstrations at Scale
  269. Simple But Effective: CLIP Embeddings for Embodied AI
  270. NomMer: Nominate Synergistic Context in Vision Transformer for Visual Recognition
  271. Collaborative Transformers for Grounded Situation Recognition
  272. CPPF: Towards Robust Category-Level 9D Pose Estimation in The Wild
  273. Continual Test-Time Domain Adaptation
  274. Dynamic MLP for Fine-Grained Image Classification By Leveraging Geographical and Temporal Information
  275. MuKEA: Multimodal Knowledge Extraction and Accumulation for Knowledge-Based Visual Question Answering
  276. Fair Contrastive Learning for Facial Attribute Classification
  277. Directional Self-Supervised Learning for Heavy Image Augmentations
  278. No-Reference Point Cloud Quality Assessment Via Domain Adaptation
  279. Comprehending and Ordering Semantics for Image Captioning
  280. A Large-Scale Comprehensive Dataset and Copy-Overlap Aware Evaluation Protocol for Segment-Level Video Copy Detection
  281. Label Relation Graphs Enhanced Hierarchical Residual Network for Hierarchical Multi-Granularity Classification
  282. HeadNeRF: A Real-Time NeRF-Based Parametric Head Model
  283. Occlusion-Robust Face Alignment Using A Viewpoint-Invariant Hierarchical Network Architecture
  284. IDR: Self-Supervised Image Denoising Via Iterative Data Refinement
  285. MogFace: Towards A Deeper Appreciation on Face Detection
  286. Learning Affinity From Attention: End-to-End Weakly-Supervised Semantic Segmentation With Transformers
  287. CamLiFlow: Bidirectional Camera-LiDAR Fusion for Joint Optical Flow and Scene Flow Estimation
  288. FERV39k: A Large-Scale Multi-Scene Dataset for Facial Expression Recognition in Videos
  289. Learning To Detect Mobile Objects From LiDAR Scans Without Labels
  290. WildNet: Learning Domain Generalized Semantic Segmentation From The Wild
  291. DAIR-V2X: A Large-Scale Dataset for Vehicle-Infrastructure Cooperative 3D Object Detection
  292. Point-to-Voxel Knowledge Distillation for LiDAR Semantic Segmentation
  293. Generating Diverse 3D Reconstructions From A Single Occluded Face Image
  294. Stand-Alone Inter-Frame Attention in Video Models
  295. Large-Scale Pre-Training for Person Re-Identification With Noisy Labels
  296. Semantic Segmentation By Early Region Proxy
  297. LD-ConGR: A Large RGB-D Video Dataset for Long-Distance Continuous Gesture Recognition
  298. HVH: Learning A Hybrid Neural Volumetric Representation for Dynamic Hair Performance Capture
  299. Rethinking Visual Geo-Localization for Large-Scale Applications
  300. The Principle of Diversity: Training Stronger Vision Transformers Calls for Reducing All Levels of Redundancy
  301. ViM: Out-of-Distribution With Virtual-Logit Matching
  302. Class-Aware Contrastive Semi-Supervised Learning
  303. Ditto: Building Digital Twins of Articulated Objects From Interaction
  304. Adaptive Early-Learning Correction for Segmentation From Noisy Annotations
  305. Cross-Domain Correlation Distillation for Unsupervised Domain Adaptation in Nighttime Semantic Segmentation
  306. RSTT: Real-Time Spatial Temporal Transformer for Space-Time Video Super-Resolution
  307. Partial Class Activation Attention for Semantic Segmentation
  308. Multi-Scale Memory-Based Video Deblurring
  309. A Scalable Combinatorial Solver for Elastic Geometrically Consistent 3D Shape Matching
  310. Geometric Structure Preserving Warp for Natural Image Stitching
  311. GOAL: Generating 4D Whole-Body Motion for Hand-Object Grasping
  312. Conditional Prompt Learning for Vision-Language Models
  313. Graph Sampling Based Deep Metric Learning for Generalizable Person Re-Identification
  314. Undoing The Damage of Label Shift for Cross-Domain Semantic Segmentation
  315. FisherMatch: Semi-Supervised Rotation Regression Via Entropy-Based Filtering
  316. Affine Medical Image Registration With Coarse-To-Fine Vision Transformer
  317. A Differentiable Two-Stage Alignment Scheme for Burst Image Reconstruction With Large Shift
  318. Deformable ProtoPNet: An Interpretable Image Classifier Using Deformable Prototypes
  319. Restormer: Efficient Transformer for High-Resolution Image Restoration
  320. IFRNet: Intermediate Feature Refine Network for Efficient Frame Interpolation
  321. Large Loss Matters in Weakly Supervised Multi-Label Classification
  322. Neural Inertial Localization
  323. GraftNet: Towards Domain Generalized Stereo Matching With A Broad-Spectrum and Task-Oriented Feature
  324. VGSE: Visually-Grounded Semantic Embeddings for Zero-Shot Learning
  325. Catching Both Gray and Black Swans: Open-Set Supervised Anomaly Detection
  326. MLSLT: Towards Multilingual Sign Language Translation
  327. Towards An End-to-End Framework for Flow-Guided Video Inpainting
  328. Contrastive Test-Time Adaptation
  329. MotionAug: Augmentation With Physical Correction for Human Motion Prediction
  330. Modeling Indirect Illumination for Inverse Rendering
  331. TransWeather: Transformer-Based Restoration of Images Degraded By Adverse Weather Conditions
  332. H2FA R-CNN: Holistic and Hierarchical Feature Alignment for Cross-Domain Weakly Supervised Object Detection
  333. P3Depth: Monocular Depth Estimation With A Piecewise Planarity Prior
  334. GEN-VLKT: Simplify Association and Enhance Interaction Understanding for HOI Detection
  335. Simple Multi-Dataset Detection
  336. Proactive Image Manipulation Detection
  337. StyTr²: Image Style Transfer With Transformers
  338. Global Matching With Overlapping Attention for Optical Flow Estimation
  339. Language As Queries for Referring Video Object Segmentation
  340. MViTv2: Improved Multiscale Vision Transformers for Classification and Detection
  341. Audio-Visual Generalised Zero-Shot Learning With Cross-Modal Attention and Language
  342. Rethinking Efficient Lane Detection Via Curve Modeling
  343. Self-Supervised Arbitrary-Scale Point Clouds Upsampling Via Implicit Neural Representation
  344. Co-Advise: Cross Inductive Bias Distillation
  345. AdaMixer: A Fast-Converging Query-Based Object Detector
  346. DTFD-MIL: Double-Tier Feature Distillation Multiple Instance Learning for Histopathology Whole Slide Image Classification
  347. BEVT: BERT Pretraining of Video Transformers
  348. Deep Generalized Unfolding Networks for Image Restoration
  349. VISOLO: Grid-Based Space-Time Aggregation for Efficient Online Video Instance Segmentation
  350. Deep Unlearning Via Randomized Conditionally Independent Hessians
  351. Revisiting Skeleton-Based Action Recognition
  352. Stereo Depth From Events Cameras: Concentrate and Focus on The Future
  353. A Simple Data Mixing Prior for Improving Self-Supervised Learning
  354. Knowledge Distillation As Efficient Pre-Training: Faster Convergence, Higher Data-Efficiency, and Better Transferability
  355. BigDL 2.0: Seamless Scaling of AI Pipelines From Laptops to Distributed Cluster
  356. Attentive Fine-Grained Structured Sparsity for Image Restoration
  357. Learning Fair Classifiers With Partially Annotated Group Labels
  358. NightLab: A Dual-Level Architecture With Hardness Detection for Segmentation at Night
  359. Constrained Few-Shot Class-Incremental Learning
  360. Threshold Matters in WSSS: Manipulating The Activation for The Robust and Accurate Segmentation Model Against Thresholds
  361. TransMVSNet: Global Context-Aware Multi-View Stereo Network With Transformers
  362. DPGEN: Differentially Private Generative Energy-Guided Network for Natural Image Synthesis
  363. The Majority Can Help The Minority: Context-Rich Minority Oversampling for Long-Tailed Classification
  364. IntentVizor: Towards Generic Query Guided Interactive Video Summarization
  365. Shape-Invariant 3D Adversarial Point Clouds
  366. Bootstrapping ViTs: Towards Liberating Vision Transformers From Pre-Training
  367. PubTables-1M: Towards Comprehensive Table Extraction From Unstructured Documents
  368. Meta-Attention for ViT-Backed Continual Learning
  369. DST: Dynamic Substitute Training for Data-Free Black-Box Attack
  370. Unified Contrastive Learning in Image-Text-Label Space
  371. Unsupervised Pre-Training for Temporal Action Localization Tasks
  372. Look Outside The Room: Synthesizing A Consistent Long-Term 3D Scene Video From A Single Image
  373. High-Fidelity Human Avatars From A Single RGB Camera
  374. Multiview Transformers for Video Recognition
  375. How Good Is Aesthetic Ability of A Fashion Model?
  376. Deformation and Correspondence Aware Unsupervised Synthetic-to-Real Scene Flow Estimation for Point Clouds
  377. Sequential Voting With Relational Box Fields for Active Object Detection
  378. Semantic-Aware Auto-Encoders for Self-Supervised Representation Learning
  379. Consistency Learning Via Decoding Path Augmentation for Transformers in Human Object Interaction Detection
  380. Consistent Explanations By Contrastive Learning
  381. Hierarchical Modular Network for Video Captioning
  382. Depth Estimation By Combining Binocular Stereo and Monocular Structured-Light
  383. Salient-to-Broad Transition for Video Person Re-Identification
  384. DeeCap: Dynamic Early Exiting for Efficient Image Captioning
  385. RepMLPNet: Hierarchical Vision MLP With Re-Parameterized Locality
  386. DR.VIC: Decomposition and Reasoning for Video Individual Counting
  387. ARCS: Accurate Rotation and Correspondence Search
  388. Learning To Anticipate Future With Dynamic Context Removal
  389. GCFSR: A Generative and Controllable Face Super Resolution Method Without Facial and GAN Priors
  390. On The Integration of Self-Attention and Convolution
  391. Domain Adaptation on Point Clouds Via Geometry-Aware Implicits
  392. GroupViT: Semantic Segmentation Emerges From Text Supervision
  393. DiffusionCLIP: Text-Guided Diffusion Models for Robust Image Manipulation
  394. BppAttack: Stealthy and Efficient Trojan Attacks Against Deep Neural Networks Via Image Quantization and Contrastive Adversarial Learning
  395. Stacked Hybrid-Attention and Group Collaborative Learning for Unbiased Scene Graph Generation
  396. Towards Better Plasticity-Stability Trade-Off in Incremental Learning: A Simple Linear Connector
  397. Topology-Preserving Shape Reconstruction and Registration Via Neural Diffeomorphic Flow
  398. Segment and Complete: Defending Object Detectors Against Adversarial Patch Attacks With Robust Patch Detection
  399. MAXIM: Multi-Axis MLP for Image Processing
  400. Learning Part Segmentation Through Unsupervised Domain Adaptation From Synthetic Vehicles
  401. PSTR: End-to-End One-Step Person Search With Transformers
  402. NFormer: Robust Person Re-Identification With Neighbor Transformer
  403. Bridging Global Context Interactions for High-Fidelity Image Completion
  404. SwinBERT: End-to-End Transformers With Sparse Attention for Video Captioning
  405. Not All Tokens Are Equal: Human-Centric Visual Analysis Via Token Clustering Transformer
  406. Temporally Efficient Vision Transformer for Video Instance Segmentation
  407. The Devil Is in The Margin: Margin-Based Label Smoothing for Network Calibration
  408. NLX-GPT: A Model for Natural Language Explanations in Vision and Vision-Language Tasks
  409. WarpingGAN: Warping Multiple Uniform Priors for Adversarial 3D Point Cloud Generation
  410. Pseudo-Q: Generating Pseudo Language Queries for Visual Grounding
  411. E2(GO)MOTION: Motion Augmented Event Stream for Egocentric Action Recognition
  412. OoD-Bench: Quantifying and Understanding Two Dimensions of Out-of-Distribution Generalization
  413. OnePose: One-Shot Object Pose Estimation Without CAD Models
  414. Rethinking Minimal Sufficient Representation in Contrastive Learning
  415. Scalable Penalized Regression for Noise Detection in Learning With Noisy Labels
  416. Federated Class-Incremental Learning
  417. Show, Deconfound and Tell: Image Captioning With Causal Inference
  418. MobRecon: Mobile-Friendly Hand Mesh Reconstruction From Monocular Image
  419. Parameter-Free Online Test-Time Adaptation
  420. SIGMA: Semantic-Complete Graph Matching for Domain Adaptive Object Detection
  421. No Pain, Big Gain: Classify Dynamic Point Cloud Sequences With Static Models By Fitting Feature-Level Space-Time Surfaces
  422. HerosNet: Hyperspectral Explicable Reconstruction and Optimal Sampling Deep Network for Snapshot Compressive Imaging
  423. Vision Transformer Slimming: Multi-Dimension Searching in Continuous Optimization Space
  424. Learning To Estimate Robust 3D Human Mesh From In-the-Wild Crowded Scenes
  425. Detecting Deepfakes With Self-Blended Images
  426. Implicit Sample Extension for Unsupervised Person Re-Identification
  427. Energy-Based Latent Aligner for Incremental Learning
  428. Towards Semi-Supervised Deep Facial Expression Recognition With An Adaptive Confidence Margin
  429. Group R-CNN for Weakly Semi-Supervised Object Detection With Points
  430. Weakly-Supervised Action Transition Learning for Stochastic Human Motion Prediction
  431. Hybrid Relation Guided Set Matching for Few-Shot Action Recognition
  432. Cross-Patch Dense Contrastive Learning for Semi-Supervised Segmentation of Cellular Nuclei in Histopathologic Images
  433. Generalized Binary Search Network for Highly-Efficient Multi-View Stereo
  434. SHIFT: A Synthetic Driving Dataset for Continuous Multi-Task Domain Adaptation
  435. FlexIT: Towards Flexible Semantic Image Translation
  436. CRAFT: Cross-Attentional Flow Transformer for Robust Optical Flow
  437. BoxeR: Box-Attention for 2D and 3D Transformers
  438. Neural Architecture Search With Representation Mutual Information
  439. Can Neural Nets Learn The Same Model Twice? Investigating Reproducibility and Double Descent From The Decision Boundary Perspective
  440. Hierarchical Nearest Neighbor Graph Embedding for Efficient Dimensionality Reduction
  441. Multi-View Transformer for 3D Visual Grounding
  442. Structured Sparse R-CNN for Direct Scene Graph Generation
  443. BARC: Learning To Regress 3D Dog Shape From Images By Exploiting Breed Information
  444. PCA-Based Knowledge Distillation Towards Lightweight and Content-Style Balanced Photorealistic Style Transfer Models
  445. Towards Understanding Adversarial Robustness of Optical Flow Networks
  446. Lifelong Graph Learning
  447. Hypergraph-Induced Semantic Tuplet Loss for Deep Metric Learning
  448. Computing Wasserstein-p Distance Between Images With Linear Cost
  449. Unsupervised Representation Learning for Binary Networks By Joint Classifier Learning
  450. Large-Scale Video Panoptic Segmentation in The Wild: A Benchmark
  451. GrainSpace: A Large-Scale Dataset for Fine-Grained and Domain-Adaptive Recognition of Cereal Grains
  452. Learning Modal-Invariant and Temporal-Memory for Video-Based Visible-Infrared Person Re-Identification
  453. MSDN: Mutually Semantic Distillation Network for Zero-Shot Learning
  454. Oriented RepPoints for Aerial Object Detection
  455. Weakly Supervised Temporal Sentence Grounding With Gaussian-Based Contrastive Proposal Learning
  456. Low-Resource Adaptation for Personalized Co-Speech Gesture Generation
  457. Task-Specific Inconsistency Alignment for Domain Adaptive Object Detection
  458. MS2DG-Net: Progressive Correspondence Learning Via Multiple Sparse Semantics Dynamic Graph
  459. Learning To Listen: Modeling Non-Deterministic Dyadic Facial Motion
  460. Capturing Humans in Motion: Temporal-Attentive 3D Human Pose and Shape Estimation From Monocular Video
  461. MixFormer: End-to-End Tracking With Iterative Mixed Attention
  462. Plenoxels: Radiance Fields Without Neural Networks
  463. Selective-Supervised Contrastive Learning With Noisy Labels
  464. SimT: Handling Open-Set Noise for Domain Adaptive Semantic Segmentation
  465. Frequency-Driven Imperceptible Adversarial Attack on Semantic Similarity
  466. Video Demoireing With Relation-Based Temporal Consistency
  467. Industrial Style Transfer With Large-Scale Geometric Warping and Content Preservation
  468. Modeling Image Composition for Complex Scene Generation
  469. Decoupling Zero-Shot Semantic Segmentation
  470. Templates for 3D Object Pose Estimation Revisited: Generalization to New Objects and Robustness to Occlusions
  471. Stochastic Variance Reduced Ensemble Adversarial Attack for Boosting The Adversarial Transferability
  472. IFOR: Iterative Flow Minimization for Robotic Object Rearrangement
  473. Zero Experience Required: Plug & Play Modular Transfer Learning for Semantic Visual Navigation
  474. TopFormer: Token Pyramid Transformer for Mobile Semantic Segmentation
  475. The Wanderings of Odysseus in 3D Scenes
  476. All-in-One Image Restoration for Unknown Corruption
  477. PUMP: Pyramidal and Uniqueness Matching Priors for Unsupervised Learning of Local Descriptors
  478. MixSTE: Seq2seq Mixed Spatio-Temporal Encoder for 3D Human Pose Estimation in Video
  479. RCP: Recurrent Closest Point for Point Cloud
  480. A Dual Weighting Label Assignment Scheme for Object Detection
  481. Hyperbolic Vision Transformers: Combining Improvements in Metric Learning
  482. Instance-Aware Dynamic Neural Network Quantization
  483. Exploring Effective Data for Surrogate Training Towards Black-Box Attack
  484. JRDB-Act: A Large-Scale Dataset for Spatio-Temporal Action, Social Group and Activity Detection
  485. Investigating Top-k White-Box and Transferable Black-Box Attack
  486. Decoupling and Recoupling Spatiotemporal Representation for RGB-D-Based Motion Recognition
  487. A Self-Supervised Descriptor for Image Copy Detection
  488. Negative-Aware Attention Framework for Image-Text Matching
  489. An Image Patch Is A Wave: Phase-Aware Vision MLP
  490. Shunted Self-Attention Via Multi-Scale Token Aggregation
  491. Unified Multivariate Gaussian Mixture for Efficient Neural Image Compression
  492. Recurrent Variational Network: A Deep Learning Inverse Problem Solver Applied to The Task of Accelerated MRI Reconstruction
  493. Surpassing The Human Accuracy: Detecting Gallbladder Cancer From USG Images With Curriculum Learning
  494. Appearance and Structure Aware Robust Deep Visual Graph Matching: Attack, Defense and Beyond
  495. TrackFormer: Multi-Object Tracking With Transformers
  496. 3D Shape Reconstruction From 2D Images With Disentangled Attribute Flow
  497. Feature Statistics Mixing Regularization for Generative Adversarial Networks
  498. OpenTAL: Towards Open Set Temporal Action Localization
  499. Self-Supervised Learning of Adversarial Example: Towards Good Generalizations for Deepfake Detection
  500. Ego4D: Around The World in 3,000 Hours of Egocentric Video
  501. Self-Supervised Pre-Training of Swin Transformers for 3D Medical Image Analysis
  502. Weakly Supervised Semantic Segmentation Using Out-of-Distribution Data
  503. DAD-3DHeads: A Large-Scale Dense, Accurate and Diverse Dataset for 3D Head Alignment From A Single Image
  504. Reconstructing Surfaces for Sparse Point Clouds With On-Surface Priors
  505. VCLIMB: A Novel Video Class Incremental Learning Benchmark
  506. Robust Equivariant Imaging: A Fully Unsupervised Framework for Learning To Image From Noisy and Partial Measurements
  507. ST++: Make Self-Training Work Better for Semi-Supervised Semantic Segmentation
  508. Interacting Attention Graph for Single Image Two-Hand Reconstruction
  509. Rope3D: The Roadside Perception Dataset for Autonomous Driving and Monocular 3D Object Detection Task
  510. Cross-Image Relational Knowledge Distillation for Semantic Segmentation
  511. Towards Layer-Wise Image Vectorization
  512. Scenic: A JAX Library for Computer Vision Research and Beyond
  513. Real-Time Object Detection for Streaming Perception
  514. VisualHow: Multimodal Problem Solving
  515. Spatial Commonsense Graph for Object Localisation in Partial Scenes
  516. OSSGAN: Open-Set Semi-Supervised Image Generation
  517. Bi-Level Alignment for Cross-Domain Crowd Counting
  518. ST-MFNet: A Spatio-Temporal Multi-Flow Network for Frame Interpolation
  519. Efficient Multi-View Stereo By Iterative Dynamic Cost Volume
  520. TransEditor: Transformer-Based Dual-Space GAN for Highly Controllable Facial Editing
  521. Use All The Labels: A Hierarchical Multi-Label Contrastive Learning Framework
  522. SGTR: End-to-End Scene Graph Generation With Transformer
  523. Decoupled Knowledge Distillation
  524. DeepFusion: Lidar-Camera Deep Fusion for Multi-Modal 3D Object Detection
  525. Reusing The Task-Specific Classifier As A Discriminator: Discriminator-Free Adversarial Domain Adaptation
  526. Show Me What and Tell Me How: Video Synthesis Via Multimodal Conditioning
  527. SIMBAR: Single Image-Based Scene Relighting for Effective Data Augmentation for Automated Driving Vision Tasks
  528. Multi-Label Classification With Partial Annotations Using Class-Aware Selective Loss
  529. CADTransformer: Panoptic Symbol Spotting Transformer for CAD Drawings
  530. IntraQ: Learning Synthetic Images With Intra-Class Heterogeneity for Zero-Shot Network Quantization
  531. I M Avatar: Implicit Morphable Head Avatars From Videos
  532. Weakly-Supervised Metric Learning With Cross-Module Communications for The Classification of Anterior Chamber Angle Images
  533. A Text Attention Network for Spatial Deformation Robust Scene Text Image Super-Resolution
  534. Multi-Modal Dynamic Graph Transformer for Visual Grounding
  535. Geometric Transformer for Fast and Robust Point Cloud Registration
  536. UMT: Unified Multi-Modal Transformers for Joint Video Moment Retrieval and Highlight Detection
  537. Demystifying The Neural Tangent Kernel From A Practical Perspective: Can It Be Trusted for Neural Architecture Search Without Training?
  538. The Devil Is in The Details: Window-Based Attention for Image Compression
  539. DiLiGenT102: A Photometric Stereo Benchmark Dataset With Controlled Shape and Material Variation
  540. PolyWorld: Polygonal Building Extraction With Graph Neural Networks in Satellite Images
  541. Lite Pose: Efficient Architecture Design for 2D Human Pose Estimation
  542. Spatio-Temporal Relation Modeling for Few-Shot Action Recognition
  543. Multi-Person Extreme Motion Prediction
  544. B-DARTS: Beta-Decay Regularization for Differentiable Architecture Search
  545. CMT: Convolutional Neural Networks Meet Vision Transformers
  546. KNN Local Attention for Image Restoration
  547. Predict, Prevent, and Evaluate: Disentangled Text-Driven Image Manipulation Empowered By Pre-Trained Vision-Language Model
  548. TransMix: Attend To Mix for Vision Transformers
  549. Inertia-Guided Flow Completion and Style Fusion for Video Inpainting
  550. Long-Tailed Visual Recognition Via Gaussian Clouded Logit Adjustment
  551. Image Animation With Perturbed Masks
  552. Domain Generalization Via Shuffled Style Assembly for Face Anti-Spoofing
  553. OcclusionFusion: Occlusion-Aware Motion Estimation for Real-Time Dynamic 3D Reconstruction
  554. MonoScene: Monocular 3D Semantic Scene Completion
  555. AdaFocus V2: End-to-End Training of Spatial Dynamic Networks for Video Recognition
  556. Continuous Scene Representations for Embodied AI
  557. Beyond 3D Siamese Tracking: A Motion-Centric Paradigm for 3D Single Object Tracking in Point Clouds
  558. Non-Probability Sampling Network for Stochastic Human Trajectory Prediction
  559. ResSFL: A Resistance Transfer Framework for Defending Model Inversion Attack in Split Federated Learning
  560. Human-Aware Object Placement for Visual Environment Reconstruction
  561. X-Pool: Cross-Modal Language-Video Attention for Text-Video Retrieval
  562. RAMA: A Rapid Multicut Algorithm on GPU
  563. Adversarial Parametric Pose Prior
  564. Mask Transfiner for High-Quality Instance Segmentation
  565. It Is Okay To Not Be Okay: Overcoming Emotional Bias in Affective Image Captioning By Contrastive Data Collection
  566. DiRA: Discriminative, Restorative, and Adversarial Learning for Self-Supervised Medical Image Analysis
  567. Event-Based Video Reconstruction Via Potential-Assisted Spiking Neural Network
  568. YouMVOS: An Actor-Centric Multi-Shot Video Object Segmentation Dataset
  569. DAFormer: Improving Network Architectures and Training Strategies for Domain-Adaptive Semantic Segmentation
  570. Joint Distribution Matters: Deep Brownian Distance Covariance for Few-Shot Classification
  571. Self-Supervised Video Transformer
  572. AutoRF: Learning 3D Object Radiance Fields From Single View Observations
  573. Coopernaut: End-to-End Driving With Cooperative Perception for Networked Vehicles
  574. TubeR: Tubelet Transformer for Video Action Detection
  575. MUM: Mix Image Tiles and UnMix Feature Tiles for Semi-Supervised Object Detection
  576. Learning Non-Target Knowledge for Few-Shot Semantic Segmentation
  577. UKPGAN: A General Self-Supervised Keypoint Detector
  578. Raw High-Definition Radar for Multi-Task Learning
  579. Coarse-To-Fine Feature Mining for Video Semantic Segmentation
  580. Compressing Models With Few Samples: Mimicking Then Replacing
  581. PokeBNN: A Binary Pursuit of Lightweight Accuracy
  582. Zoom in and Out: A Mixed-Scale Triplet Network for Camouflaged Object Detection
  583. SOMSI: Spherical Novel View Synthesis With Soft Occlusion Multi-Sphere Images
  584. EMScore: Evaluating Video Captioning Via Coarse-Grained and Fine-Grained Embedding Matching
  585. PoseTriplet: Co-Evolving 3D Human Pose Estimation, Imitation, and Hallucination Under Self-Supervision
  586. Group Contextualization for Video Recognition
  587. Single-Domain Generalized Object Detection in Urban Scene Via Cyclic-Disentangled Self-Distillation
  588. L2G: A Simple Local-to-Global Knowledge Transfer Framework for Weakly Supervised Semantic Segmentation
  589. Self-Augmented Unpaired Image Dehazing Via Density and Depth Decomposition
  590. Neural 3D Video Synthesis From Multi-View Video
  591. SemAffiNet: Semantic-Affine Transformation for Point Cloud Segmentation
  592. Shapley-NAS: Discovering Operation Contribution for Neural Architecture Search
  593. HyperTransformer: A Textural and Spectral Feature Fusion Transformer for Pansharpening
  594. Structure-Aware Flow Generation for Human Body Reshaping
  595. Learning To Answer Questions in Dynamic Audio-Visual Scenarios
  596. Synthetic Aperture Imaging With Events and Frames
  597. MonoGround: Detecting Monocular 3D Objects From The Ground
  598. Deep Visual Geo-Localization Benchmark
  599. StyleGAN-V: A Continuous Video Generator With The Price, Image Quality and Perks of StyleGAN2
  600. LISA: Learning Implicit Shape and Appearance of Hands
  601. Iterative Deep Homography Estimation
  602. Learned Queries for Efficient Local Attention
  603. Colar: Effective and Efficient Online Action Detection By Consulting Exemplars
  604. SoftGroup for 3D Instance Segmentation on Point Clouds
  605. MVS2D: Efficient Multi-View Stereo Via Attention-Driven 2D Convolutions
  606. Beyond Semantic to Instance Segmentation: Weakly-Supervised Instance Segmentation Via Semantic Knowledge Transfer and Self-Refinement
  607. Deep Constrained Least Squares for Blind Image Super-Resolution
  608. EDTER: Edge Detection With Transformer
  609. AirObject: A Temporally Evolving Graph Embedding for Object Identification
  610. From Representation to Reasoning: Towards Both Evidence and Commonsense Reasoning for Video Question-Answering
  611. Semantic-Aware Domain Generalized Segmentation
  612. DanceTrack: Multi-Object Tracking in Uniform Appearance and Diverse Motion
  613. UBnormal: New Benchmark for Supervised Open-Set Video Anomaly Detection
  614. AKB-48: A Real-World Articulated Object Knowledge Base
  615. Stratified Transformer for 3D Point Cloud Segmentation
  616. Aug-NeRF: Training Stronger Neural Radiance Fields With Triple-Level Physically-Grounded Augmentations
  617. Semantic-Shape Adaptive Feature Modulation for Semantic Image Synthesis
  618. Day-to-Night Image Synthesis for Training Nighttime Neural ISPs


