Computer Vision Research Work
When we talk about “vision” capabilities, most people underestimate how much work the brain does to process visual input (light signals). What kind of processing lets us understand color, depth, motion, speed, segments, objects, scenes, art, drawings, culture, and more? Until “computer vision” became a serious field in AI, only neurology researchers, surgeons, and brain specialists had some insight into these processes. Since 2012 (the AlexNet paper), new papers have appeared almost every month, and each one shows how far computer vision has come. This article is a chronology of computer vision research, written for software engineers, computer scientists, AI engineers, and anyone who wants to understand how their phone performs computer vision tasks and appears intelligent.
SNo | Research Name | Short Description of Paper | Month-Year | Organization | URL |
---|---|---|---|---|---|
1 | DeconvNet | Deconvolutional Networks for Feature Learning | Nov 2010 | KAIST | Paper, Blog |
2 | Saliency Propagation | A method for salient object detection that propagates saliency information through optimization | Apr 2014 | Chinese Academy of Sciences | Paper |
3 | SDS | Simultaneous Detection and Segmentation | Jun 2014 | UC Berkeley | Paper |
4 | GoogLeNet | Introduced the Inception module to increase network depth and width efficiently. | Sep 2014 | Google | |
5 | VGGNet | Used small 3x3 convolution filters to increase depth, achieving high accuracy. | Sep 2014 | Oxford University | |
6 | FCN | Fully Convolutional Networks for semantic segmentation | Nov 2014 | UC Berkeley | Paper |
7 | HyperColumn | Multi-scale CNN feature fusion | Nov 2014 | UC Berkeley | Paper |
8 | DeepLab v1 | Semantic Image Segmentation with Deep Convolutional Nets and CRFs | Dec 2014 | | Paper |
9 | U-Net | Convolutional network for biomedical image segmentation | May 2015 | University of Freiburg | Paper, Blog |
10 | Highway Network | Proposed highway layers to enable training of very deep networks. | May 2015 | IDSIA | |
11 | YOLO Series | You Only Look Once: series of real-time object detection systems (v1-v4) | Jun 2015 (v1) - Apr 2020 (v4) | University of Washington, Darknet | Paper, Blog |
12 | CRF-RNN | Conditional Random Fields as Recurrent Neural Networks | Jun 2015 | University of Oxford | Paper |
13 | MR-CNN & S-CNN | Multi-Region CNN and Semantic CNN for object detection | Jun 2015 | University of California, Berkeley | Paper |
14 | DeepMask | Learning to Segment Objects Candidates | Jun 2015 | Facebook AI Research | Paper |
15 | LAPGAN | Laplacian Pyramid of Generative Adversarial Networks for image generation | Jun 2015 | Facebook AI Research | Paper |
16 | CUDMedVision1 | Medical Image Segmentation System 1 | Sep 2015 | Chinese University of Hong Kong | Paper, Blog |
17 | SegNet | Deep Convolutional Encoder-Decoder Architecture for Image Segmentation | Oct 2015 | University of Cambridge | Paper |
18 | DilatedNet | Multi-Scale Context Aggregation by Dilated Convolutions | Nov 2015 | Princeton University | Paper |
19 | CAM | Class Activation Mapping for identifying discriminative regions | Dec 2015 | MIT | Paper |
20 | ParseNet | Looking Wider to See Better for semantic segmentation | Dec 2015 | UNC Chapel Hill | Paper |
21 | MNC | Instance-aware Semantic Segmentation via Multi-task Network Cascades | Dec 2015 | Microsoft Research | Paper |
22 | ResNet | Introduced residual learning to address vanishing gradients in deep networks. | Dec 2015 | Microsoft Research | |
23 | SqueezeNet | AlexNet-level accuracy with 50x fewer parameters | Feb 2016 | UC Berkeley, Stanford | Paper, Blog |
24 | SqueezeNet | Designed to reduce model size while maintaining accuracy, using 1x1 convolutions. | Feb 2016 | DeepScale, UC Berkeley | |
25 | Pre-activation ResNet | Identity Mappings in Deep Residual Networks | Mar 2016 | Microsoft Research | Paper |
26 | SharpMask | Learning to Refine Object Segments | Mar 2016 | Facebook AI Research | Paper |
27 | InstanceFCN | Instance-sensitive Fully Convolutional Networks | Mar 2016 | Microsoft Research | Paper |
28 | MultipathNet | Multiple Path Aggregation Network | Apr 2016 | Facebook AI Research | Paper |
29 | R-FCN | Region-based Fully Convolutional Networks for object detection | May 2016 | Microsoft Research | Paper |
30 | NoC | Networks on Convolutional feature maps for object detection | May 2016 | Microsoft Research | Paper |
31 | DeepLab v2 | Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution | Jun 2016 | | Paper |
32 | DeepSim | Deep Learning Approach for Image Quality Assessment | Jun 2016 | Tsinghua University | Paper |
33 | DIS | Deep Image Smoothing | Jun 2016 | University of Illinois | Paper |
34 | V-Net | Fully Convolutional Neural Network for volumetric medical image segmentation | Jun 2016 | Technical University of Munich | Paper |
35 | 3D U-net | Volumetric Segmentation with 3D U-net | Jun 2016 | University of Freiburg | Paper |
36 | ENet | Efficient Neural Network for Real-time Semantic Segmentation | Jul 2016 | University of Cambridge | Paper |
37 | ResNet38 | Wider or Deeper: Revisiting the ResNet Model | Jul 2016 | University of Adelaide | Paper |
38 | DRRN | Deep Recursive Residual Network for image super-resolution | Jul 2016 | National University of Singapore | Paper |
39 | Multi-Channel | Multi-Channel CNN for medical image analysis | Jul 2016 | University of California, San Diego | Paper |
40 | GCN | Graph Convolutional Networks for processing graph-structured data | Sep 2016 | University of Amsterdam | Paper |
41 | M²FCN | Multi-modal Fully Convolutional Networks for medical imaging | Sep 2016 | Chinese Academy of Sciences | Paper, Blog |
42 | Graph CNN | Graph Convolutional Neural Networks | Sep 2016 | University of Montreal | Paper |
43 | Grad-CAM | Gradient-weighted Class Activation Mapping | Oct 2016 | Georgia Tech | Paper |
44 | ResNeXt | Aggregated Residual Transformations for Deep Neural Networks | Nov 2016 | Facebook AI Research | Paper |
45 | DRN | Dilated Residual Networks | Nov 2016 | Princeton University | Paper |
46 | RefineNet | Multi-Path Refinement Networks for high-resolution semantic segmentation | Nov 2016 | University of Adelaide | Paper |
47 | FractalNet | Ultra-Deep Neural Networks without Residuals | Nov 2016 | University of Chicago, TTI-Chicago | Paper |
48 | SSD | Single Shot MultiBox Detector for real-time object detection | Dec 2016 | | Paper, Blog |
49 | TDM | Top-Down Modulation for object detection | Dec 2016 | Carnegie Mellon University | Paper |
50 | FPN | Feature Pyramid Networks for object detection | Dec 2016 | Facebook AI Research | Paper |
51 | VoxResNet | Deep Voxelwise Residual Networks | Dec 2016 | Chinese Academy of Sciences | Paper |
52 | DSSD | Deconvolutional Single Shot Detector | Jan 2017 | UNC Chapel Hill | Paper |
53 | PolyNet | Better Vision with More Complex Paths | Mar 2017 | Microsoft Research | Paper |
54 | IGCNet | Interleaved Group Convolutions | Mar 2017 | Microsoft Research | Paper |
55 | DCN | Deformable Convolutional Networks | Mar 2017 | Microsoft Research Asia | Paper |
56 | IDW-CNN | Image Dependent Warping CNN | Mar 2017 | Seoul National University | Paper |
57 | FCIS | Fully Convolutional Instance-aware Semantic Segmentation | Mar 2017 | Microsoft Research Asia | Paper |
58 | Residual Attention Network | Attention mechanism for image classification | Apr 2017 | Tsinghua University | Paper |
59 | ResNet-DUC-HDC | Dense Upsampling Convolution and Hybrid Dilated Convolution | Apr 2017 | Tsinghua University | Paper |
60 | MobileNet | Focused on efficient models for mobile and embedded devices using depthwise separable convolutions. | Apr 2017 | Google | |
61 | G-RMI | Google’s large scale object detection system | Jun 2017 | Google Research | Paper |
62 | GraphSAGE | Inductive Representation Learning on Large Graphs | Jun 2017 | Stanford University | Paper |
63 | DPN | Dual Path Networks combining ResNet and DenseNet | Jul 2017 | UCSD, Momenta | Paper |
64 | ERFNet | Efficient Residual Factorized ConvNet for real-time semantic segmentation | Jul 2017 | Universidad de Alcalá | Paper |
65 | Suggestive Annotation | Active Learning for medical image segmentation | Jul 2017 | ETH Zurich | Paper |
66 | RetinaNet | Focal Loss for Dense Object Detection | Aug 2017 | Facebook AI Research | Paper, Blog |
67 | Hide-and-Seek | Training strategy for weakly-supervised object localization that hides random image patches | Aug 2017 | University of California, Davis | Paper |
68 | C3 | Cross-City Cascade for semantic segmentation | Aug 2017 | University of Oxford | Paper |
69 | U-net+Res-net | Combined U-net and Residual Network for medical segmentation | Aug 2017 | Technical University of Munich | Paper |
70 | DenseVoxNet | Dense Voxel Network for 3D medical image segmentation | Sep 2017 | Chinese University of Hong Kong | Paper |
71 | Graph Attention Networks | Self-attention for Graph Data | Oct 2017 | Université de Montréal | Paper |
72 | Light-Head R-CNN | Light-weight object detection architecture | Nov 2017 | Megvii Technology | Paper |
73 | LayerCascade | Instance segmentation via layer cascade | Nov 2017 | University of Washington | Paper |
74 | 3D U-net + ResNet | Combined 3D U-net and ResNet for volumetric segmentation | Nov 2017 | Technical University of Munich | Paper |
75 | Cascade R-CNN | Multi-stage object detection refinement | Dec 2017 | UC San Diego | Paper |
76 | StairNet | Top-down semantic feature refinement | Dec 2017 | Seoul National University | Paper |
77 | MaskLab | Instance Segmentation by Refining Object Detection | Jan 2018 | | Paper |
78 | RU-Net + R2U-Net | Recurrent Residual U-Net variants | Jan 2018 | University of Dayton | Paper |
79 | AmoebaNet | Evolutionary Architecture Search | Feb 2018 | Google Brain | Paper |
80 | SqueezeNext | Hardware-Aware Neural Network Design | Feb 2018 | UC Berkeley | Paper |
81 | ENAS | Efficient Neural Architecture Search | Feb 2018 | Google Brain | Paper |
82 | DeepLab v3+ | Encoder-Decoder with Atrous Separable Convolution | Feb 2018 | | Paper, Blog |
83 | Group Normalization | Alternative to Batch Normalization | Mar 2018 | Facebook AI Research | Paper |
84 | ACoL | Adversarial Complementary Learning for weakly supervised object localization | Mar 2018 | University of Technology Sydney | Paper |
85 | BR²Net | Boundary Refinement and Recurrent Network for semantic segmentation | Mar 2018 | Tsinghua University | Paper |
86 | PANet | Path Aggregation Network for Instance Segmentation | Mar 2018 | Chinese Academy of Sciences | Paper |
87 | MorphNet | Fast & Simple Resource-Constrained Structure Learning | Apr 2018 | Google Research | Paper |
88 | ImageNet Rethinking | Research on ImageNet training strategies | Apr 2018 | Facebook AI Research | Paper |
89 | Attention U-net | Attention Gates for Medical Image Segmentation | Apr 2018 | Imperial College London | Paper |
90 | MegNet | Multi-Evidence Guidance for weakly supervised object detection | Jun 2018 | University of Technology Sydney | Paper |
91 | H-DenseUNet | Hybrid Densely Connected UNet for medical segmentation | Jun 2018 | Chinese University of Hong Kong | Paper |
92 | PNASNet | Progressive Neural Architecture Search | Jul 2018 | Google Brain | Paper |
93 | ShuffleNetV2 | Practical Guidelines for Mobile Network Design | Jul 2018 | Face++ | Paper |
94 | BAM | Bottleneck Attention Module | Jul 2018 | KAIST | Paper |
95 | CBAM | Convolutional Block Attention Module | Jul 2018 | KAIST | Paper |
96 | NetAdapt | Platform-Aware Neural Network Adaptation | Jul 2018 | MIT | Paper |
97 | U-Net++ | Nested U-Net Architecture | Jul 2018 | Arizona State University | Paper |
98 | DU-Net | Deformable U-Net for medical image segmentation | Aug 2018 | Shanghai Jiao Tong University | Paper |
99 | DropBlock | Structured dropout method for convolutional networks | Oct 2018 | Google Brain | Paper |
100 | AutoDeepLab | Neural Architecture Search for Semantic Image Segmentation | Jan 2019 | Google Research | Paper |
101 | ESPNetv2 | Efficient Spatial Pyramid of Dilated Convolutions | Mar 2019 | University of Washington | Paper |
102 | SiamRPN++ | Siamese visual tracking framework that uses a spatial-aware sampling strategy to train very deep backbones and aggregates features across layers | Mar 2019 | Chinese Academy of Sciences | Paper |
103 | Libra R-CNN | Balanced learning framework for object detection that addresses sample level, feature level, and objective level imbalance | Apr 2019 | SenseTime Research | Paper |
104 | FBNet | Hardware-Aware Efficient ConvNet Design | May 2019 | Facebook AI Research | Paper |
105 | SDN | Selective Deep Network for efficient visual recognition | May 2019 | University of Texas | Paper |
106 | MultiResUNet | Multi-Resolution U-Net for medical image segmentation | May 2019 | Bangladesh University | Paper |
107 | EfficientNet | Scaled networks uniformly in depth, width, and resolution for better efficiency. | May 2019 | Google | |
108 | ADL | Attention-based Dropout Layer for weakly supervised object localization | Jun 2019 | KAIST | Paper |
109 | ARMA Convolution | Auto-Regressive Moving Average Graph Filtering | Jun 2019 | Università degli Studi di Modena | Paper |
110 | Panoptic Segmentation | Unified Scene Parsing Framework | Jun 2019 | Facebook AI Research | Paper |
111 | CutMix | Data augmentation method combining cut and mix images | Aug 2019 | Clova AI Research, NAVER | Paper |
112 | SlowFast | Two-pathway network for video recognition that captures both slow and fast motion patterns | Aug 2019 | Facebook AI Research | Paper |
113 | EfficientDet | Scalable object detection architecture using a weighted bidirectional feature pyramid network (BiFPN) and compound scaling | Nov 2019 | Google Research | Paper |
114 | AdderNet | Neural Networks with Only Addition Operations | Dec 2019 | Huawei Noah’s Ark Lab | Paper |
115 | TPN | Temporal Pyramid Network for action detection in videos | Dec 2019 | Microsoft Research Asia | Paper |
116 | ATSS | Adaptive Training Sample Selection for object detection | Dec 2019 | ByteDance AI Lab | Paper |
117 | ACNe | Attentive Context Normalization for robust permutation-equivariant learning | Dec 2019 | KAIST | Paper |
118 | Cascade Cost Volume | Cascade Cost Volume for stereo matching | Dec 2019 | Megvii Technology | Paper |
119 | Yolact++ | Real-time instance segmentation with improved mask quality and inference speed | Jan 2020 | University of California, Davis | Paper |
120 | MCN | Multi-task Collaboration Network | Jan 2020 | Microsoft Research Asia | Paper |
121 | RandLA-Net | Large-scale Point Cloud Semantic Segmentation | Jan 2020 | University of Oxford | Paper |
122 | OccuSeg | 3D instance segmentation approach that handles occlusions in point clouds | Mar 2020 | Stanford University | Paper |
123 | GTAD | Global Temporal Action Detection framework for temporal action localization | Mar 2020 | Sun Yat-sen University | Paper |
124 | Attention-RPN | Visual tracking framework with attention mechanism in Region Proposal Network | Mar 2020 | Chinese Academy of Sciences | Paper |
125 | QSA + QNT | Quantized Squeeze-and-Attention Networks | Mar 2020 | Tsinghua University | Paper |
126 | UNet 3+ | Full-Scale Connected UNet for medical image segmentation | Mar 2020 | Southern Medical University | Paper |
127 | ROAM | Recurrently Optimizing Tracking Model | Mar 2020 | ByteDance AI Lab | Paper |
128 | PF-NET | Point Fractal Network for 3D point cloud completion | Mar 2020 | Simon Fraser University | Paper |
129 | Total3DUnderstanding | 3D Scene Understanding | Mar 2020 | National University of Singapore | Paper |
130 | SG-NN | Scene Graph Neural Networks | Mar 2020 | Georgia Tech | Paper |
131 | SEAN | Semantic Region-Adaptive Normalization | Mar 2020 | ETH Zürich | Paper |
132 | SAOL | Self-Attention Object Localization | Apr 2020 | Seoul National University | Paper |
133 | VGGNet For Covid19 | Modified VGG architecture for COVID-19 detection | Apr 2020 | Multiple Institutions | Paper |
134 | CentripetalNet | Anchor-free object detection with point-based prediction | Apr 2020 | Megvii Technology | Paper |
135 | PointAugment | Auto-Augmentation for 3D Point Cloud | Apr 2020 | National University of Singapore | Paper |
136 | PQ-Net | Learning to Generate 3D Shapes | Apr 2020 | Stanford University | Paper |
137 | Axial-DeepLab | Stand-Alone Axial-Attention for Vision Models | Apr 2020 | Johns Hopkins University | Paper |
138 | SipMask | Spatial Information Preservation for Fast Instance Segmentation | Apr 2020 | Inception Institute of AI | Paper |
139 | SCAN | Learning to Classify Images without Labels | Apr 2020 | KU Leuven, ETH Zürich | Paper |
140 | MutualNet | Adaptive ConvNet via Mutual Learning | Apr 2020 | Microsoft Research Asia | Paper |
141 | DETR | End-to-End Object Detection with Transformers | May 2020 | Facebook AI Research | Paper |
142 | C-Flow | Conditional Normalizing Flows | May 2020 | ETH Zürich | Paper |
143 | PerfectShape | Shape completion using implicit functions | May 2020 | Stanford University | Paper |
144 | UFO² | Unified Framework for Object Detection | May 2020 | Carnegie Mellon University | Paper |
145 | Refinement Network | RGB-D Scene Understanding | May 2020 | Technical University Munich | Paper |
146 | AssembleNet++ | Video Recognition with Learnable Connectivity | May 2020 | Google Research | Paper |
147 | WeightNet | Revisiting Weight Networks | May 2020 | Microsoft Research | Paper |
148 | YOLOv5 | Improved version of YOLO with better speed-accuracy trade-off | Jun 2020 | Ultralytics | Paper |
149 | UCTGAN | Unsupervised Cartoon-to-Real Translation GAN for image translation between cartoon and real-world domains | Jun 2020 | Nanyang Technological University | Paper |
150 | IF-Nets | Implicit Function Neural Networks for 3D reconstruction | Jun 2020 | Max Planck Institute | Paper |
151 | SketchGCN | Sketch Recognition using Graph Convolutional Networks | Jun 2020 | University of British Columbia | Paper |
152 | AABO | Adaptive Anchor Box Optimization | Jun 2020 | Huawei Noah’s Ark Lab | Paper |
153 | Polka Lines | Line Detection using Polar Coordinates | Jun 2020 | Korea University | Paper |
154 | Pose2Mesh | 3D Human Pose and Mesh Recovery | Jun 2020 | Korea University | Paper |
155 | SNE-RoadSeg | Road Segmentation with Synthetic Data | Jun 2020 | Hong Kong University | Paper |
156 | Deep Hough Transform | Line Detection using Deep Learning | Jun 2020 | Chinese Academy of Sciences | Paper |
157 | Non-Local Sparse Attention | Efficient Attention Mechanism | Jun 2020 | Google Research | Paper |
158 | Hit-Detector | Hierarchical Trinity architecture for object detection combining different detection paradigms | Jul 2020 | ByteDance AI Lab | Paper |
159 | Spectral 3D Computer Vision | Graph Neural Network Library | Jul 2020 | Multiple Contributors | Paper |
160 | TIDE | Error Analysis Tool for Object Detection | Jul 2020 | Carnegie Mellon University | Paper |
161 | SimAug | Learning Robust Representations through Simulation | Jul 2020 | Carnegie Mellon University | Paper |
162 | HOTR | End-to-End Human-Object Interaction Detection | Jul 2020 | KAIST | Paper |
163 | ReXNet | Rethinking Channel Dimensions for Efficient Model Design | Jul 2020 | UC Berkeley | Paper |
164 | Keep Eyes on the Lane | Lane Detection with Deep Learning | Jul 2020 | Shanghai Jiao Tong University | Paper |
165 | AdvPC | Adversarial Point Cloud Defense | Jul 2020 | Tsinghua University | Paper |
166 | PD-GAN | Probabilistic Diverse GAN | Jul 2020 | University of Oxford | Paper |
167 | FedDG | Federated Domain Generalization | Jul 2020 | Carnegie Mellon University | Paper |
168 | Dynamic RCNN | Dynamic R-CNN for object detection with improved training and inference | Aug 2020 | ByteDance AI Lab | Paper |
169 | Aug-FPN | Augmented Feature Pyramid Network for object detection with improved multi-scale feature fusion | Aug 2020 | Tsinghua University | Paper |
170 | Instant-teaching | Self-training for Object Detection | Aug 2020 | ByteDance AI Lab | Paper |
171 | Soft-IntroVAE | Soft Introduction of Variational AutoEncoders | Aug 2020 | Tel Aviv University | Paper |
172 | DiNTS | Differentiable Neural Network Transform Search | Aug 2020 | Microsoft Research | Paper |
173 | Eagle Eye | Fast Sub-net Evaluation for Efficient Neural Network Training | Aug 2020 | MIT | Paper |
174 | StyleMapGAN | Exploiting Spatial Dimensions of Latent for Image Manipulation | Aug 2020 | KAIST | Paper |
175 | TediGAN | Text-Guided Diverse Image Generation | Aug 2020 | Microsoft Research Asia | Paper |
176 | Auto-Exposure Fusion | Automatic Exposure Fusion for Photography | Aug 2020 | ETH Zürich | Paper |
177 | Vision Transformer | Transformer architecture adapted for image recognition tasks | Sep 2020 | Google Research | Paper |
178 | IDU | Instance Depth Embedding for RGB-D salient object detection | Sep 2020 | Nankai University | Paper |
179 | VideoMoCo | Contrastive Learning for Video Understanding | Sep 2020 | Microsoft Research Asia | Paper |
180 | MZSR | Meta-Transfer Learning for Zero-Shot Super-Resolution | Nov 2020 | KAIST | Paper |
181 | DeiT | Data-efficient training of image transformers | Dec 2020 | Facebook AI Research | Paper |
182 | Involution | Inverting Convolution for Visual Recognition | Dec 2020 | Shanghai AI Lab | Paper |
183 | Deep Learning on Semantic Segmentation | Comprehensive Survey and Benchmark | Dec 2020 | Chinese Academy of Sciences | Paper |
184 | LiteFlowNet3 | Lightweight Optical Flow Estimation | Dec 2020 | Chinese University of Hong Kong | Paper |
185 | PPDM | Parallel Point Detection and Matching | Dec 2020 | ByteDance AI Lab | Paper |
186 | RepVGG | Making VGG-style ConvNets Great Again | Jan 2021 | MEGVII Technology | Paper |
187 | PSConvolution | Parameter-Sharing Convolution for Deep Learning | Jan 2021 | Tsinghua University | Paper |
188 | PerPixel Classification | Pixel-wise Classification Network | Jan 2021 | ETH Zürich | Paper |
189 | PIPAL | Perceptual Image Quality Assessment | Jan 2021 | Nanyang Technological University | Paper |
190 | ArtGAN | Artwork Synthesis with GAN | Feb 2021 | NVIDIA Research | Paper |
191 | Synthetic to Real | Domain Adaptation for Semantic Segmentation | Feb 2021 | ETH Zürich | Paper |
192 | Spatial-Phase-Shallow-Learning | Phase-Based Feature Learning | Feb 2021 | Peking University | Paper |
193 | DARKGAN | Dark Image Enhancement with GAN | Feb 2021 | Tsinghua University | Paper |
194 | Deep Imbalance Regression | Learning from Imbalanced Data | Feb 2021 | Carnegie Mellon University | Paper |
195 | Room Classification GNN | Graph Neural Network for Room Layout | Feb 2021 | Facebook Research | Paper |
196 | Pyramid Vision Transformer | Hierarchical Vision Transformer | Feb 2021 | KAIST | Paper |
197 | Residual Attention | Attention Mechanism for CNNs | Feb 2021 | Google Research | Paper |
198 | Teachers do more than teach | Multi-teacher approach for image-to-image translation | Mar 2021 | Tel Aviv University | Paper |
199 | Vip-DeepLab | Visual Parsing DeepLab for Panoptic Segmentation | Mar 2021 | Google Research | Paper |
200 | HistoGAN | Histological Image Generation with GAN | Mar 2021 | University of Oxford | Paper |
201 | Anchor-Free Person Search | End-to-End Person Search without Anchors | Mar 2021 | Chinese Academy of Sciences | Paper |
202 | CBNetV2 | Composite Backbone Network | Mar 2021 | Megvii Technology | Paper |
203 | Kaleido-BERT | Vision-Language Pre-training | Mar 2021 | Microsoft Research Asia | Paper |
204 | Elastic Graph Neural Network | Adaptive Graph Structure Learning | Mar 2021 | Stanford University | Paper |
205 | Rank and Sort Loss | Loss Function for Object Detection | Mar 2021 | ByteDance AI Lab | Paper |
206 | EigenGAN | Eigenvalue-Based GAN Architecture | Mar 2021 | MIT | Paper |
207 | DetCo | Unsupervised Detection Pre-training | Mar 2021 | Microsoft Research Asia | Paper |
208 | MG-GAN | Multi-Generator GAN | Mar 2021 | NVIDIA Research | Paper |
209 | AdaAttN | Adaptive Attention for Style Transfer | Mar 2021 | Microsoft Research Asia | Paper |
210 | AirBERT | Vision-Language Model for Aerial Images | Mar 2021 | Chinese Academy of Sciences | Paper |
211 | DeepGCNs | Deep Graph Convolutional Networks | Mar 2021 | KAUST | Paper |
212 | Survey: Instance Segmentation | Comprehensive review of instance segmentation methods | Mar 2021 | Multiple Institutions | Paper |
213 | LoFTR | Local Feature TRansformer for establishing dense correspondences between images | Apr 2021 | Zhejiang University | Paper |
214 | Semantic Image Matting | Matting with Semantic Guidance | Apr 2021 | ByteDance AI Lab | Paper |
215 | EfficientNetV2 | Improved EfficientNet Architecture | Apr 2021 | Google Research | Paper |
216 | Closed-Loop Matters | Dual Regression for Image Generation | Apr 2021 | University of Oxford | Paper |
217 | Mobile-Former | Mobile-Friendly Transformer | Apr 2021 | Microsoft Research | Paper |
218 | GNeRF | Generalizable Neural Radiance Fields | Apr 2021 | UC Berkeley | Paper |
219 | DETR with Modulated Co-Attention | Enhanced DETR Architecture | Apr 2021 | Facebook AI Research | Paper |
220 | Adaptable GAN Encoders | Flexible GAN Inversion | Apr 2021 | Adobe Research | Paper |
221 | Conformer | Local Features Meet Global Dependencies | Apr 2021 | Shanghai AI Lab | Paper |
222 | VMNet | Visual Manipulation Networks | Apr 2021 | Stanford University | Paper |
223 | Battle of Network Structure | Network Architecture Comparison Study | Apr 2021 | Google Research | Paper |
224 | Efficient Person Search | Fast Person Search Framework | Apr 2021 | University of Technology Sydney | Paper |
225 | SLIDE | Smart Learning on Large-Scale Data | Apr 2021 | Carnegie Mellon University | Paper |
226 | SOTR | Transformer for Set Operations | Apr 2021 | Tsinghua University | Paper |
227 | CANet | Class-Agnostic Segmentation Networks | Apr 2021 | UC Berkeley | Paper |
228 | YOLOP | Real-time Driving Perception | May 2021 | Huawei Noah’s Ark Lab | Paper |
229 | InSeGAN | Interactive Segmentation with GAN | May 2021 | Adobe Research | Paper |
230 | GroupFormer | Group-Based Attention | May 2021 | Microsoft Research | Paper |
231 | Super Neuron | Neural Architecture Enhancement | May 2021 | MIT | Paper |
232 | SO-Pose | Self-Occlusion Aware Pose Estimation | May 2021 | NVIDIA Research | Paper |
233 | TxT | Text-driven Text Generation | May 2021 | Google Research | Paper |
234 | OS2D | One-Stage 2D Object Detection | May 2021 | Yandex Research | Paper |
235 | CodeNet | Large-Scale Code Dataset | May 2021 | IBM Research | Paper |
236 | Geometric Deep Learning | Blueprint for designing architectures for geometric data | May 2021 | Imperial College London | Paper |
237 | Oriented R-CNN | Oriented Object Detection | Jun 2021 | Tongji University | Paper |
238 | XVFI | Video Frame Interpolation | Jun 2021 | KAIST | Paper |
239 | Cross Domain Contrastive Learning | Domain Adaptation via Contrastive Learning | Jun 2021 | Microsoft Research | Paper |
240 | PointManifoldCut | Data Augmentation for Point Clouds | Jun 2021 | Stanford University | Paper |
241 | Distance IOU Loss | Improved Loss Function for Object Detection | Jun 2021 | Tsinghua University | Paper |
242 | ConvMLP | Convolutional MLP Architecture | Jul 2021 | University of Oregon | Paper |
243 | Graph-FPN | Feature Pyramid Networks with Graph Neural Networks | Jul 2021 | Carnegie Mellon University | Paper |
244 | WatchOut! | Motion Blur Impact on DNNs | Jul 2021 | ETH Zürich | Paper |
245 | ECA-Net | Efficient Channel Attention Network | Jul 2021 | Tsinghua University | Paper |
246 | ShiftAddNet | Efficient Neural Network Training | Aug 2021 | MIT | Paper |
247 | Deep Imitation Learning | Survey of Imitation Learning Methods | Aug 2021 | DeepMind | Paper |
248 | 3DETR | 3D Object Detection with Transformers | Aug 2021 | Facebook AI Research | Paper |
249 | ByteTrack | Multi-Object Tracking Framework | Aug 2021 | ByteDance AI Lab | Paper |
250 | Neuron Merging | Network Compression via Neuron Merging | Sep 2021 | Microsoft Research | Paper |
251 | Focal Transformer | Vision Transformer with Focal Attention | Sep 2021 | Microsoft Research | Paper |
252 | Non-Deep Networks | Alternative to Deep Neural Networks | Sep 2021 | MIT | Paper |
253 | PytorchVideo | Deep Learning Library for Video Understanding | Sep 2021 | Facebook AI Research | Paper |
254 | HeadGAN | Head Generation and Editing | Oct 2021 | Tel Aviv University | Paper |
255 | StyleGAN3 | Alias-Free Generative Network | Oct 2021 | NVIDIA Research | Paper |
256 | MedMNIST | Medical Image Dataset Collection | Oct 2021 | Stanford University | Paper |
257 | TokenLearner | Dynamic Token Selection in Vision Transformers | Oct 2021 | Google Research | Paper |
258 | Temporal Fusion Transformer | Multi-horizon Forecasting | Oct 2021 | Google Research | Paper |
259 | NeuralProphet | Neural Network based Time-Series Model | Oct 2021 | Stanford University | Paper |
260 | MetNet-2 | Weather Forecasting Model | Oct 2021 | Google Research | Paper |
261 | Plan-then-generate | Controlled Text Generation | Nov 2021 | Microsoft Research | Paper |
262 | ProjectedGAN | Improved GAN Image Quality | Nov 2021 | NVIDIA Research | Paper |
263 | PHALP | Tracking people by Predicting Human Appearance, Location, and Pose in 3D | Nov 2021 | UC Berkeley | Paper |
264 | Semantic Diffusion Guidance | Controlled Image Generation | Nov 2021 | Stanford University | Paper |
265 | GauGAN | Text-to-Image Generation | Nov 2021 | NVIDIA Research | Paper |
266 | NeatNet | Neural Architecture Evolution | Nov 2021 | Google Research | Paper |
267 | DenseULearn | Dense Prediction with Uncertainty | Nov 2021 | ETH Zürich | Paper |
268 | StyleNeRF | Neural Radiance Fields with Style-based Generation | Dec 2021 | NVIDIA Research | Paper |
269 | Colossal-AI | Large-Scale Parallel Training System | Dec 2021 | UC Berkeley | Paper |
270 | EditGAN | Semantic Image Editing with GANs | Dec 2021 | NVIDIA Research | Paper |
271 | PoolFormer | Alternative to Attention-based Transformers | Dec 2021 | Sea AI Lab | Paper |
272 | GLIP | Grounded Language-Image Pre-training | Dec 2021 | Microsoft Research | Paper |
273 | PixMix | Data Augmentation Strategy | Dec 2021 | Google Research | Paper |
274 | GANgealing | GAN-based Image Alignment | Dec 2021 | MIT | Paper |
275 | HiClass | Hierarchical Classification Metrics | Dec 2021 | Microsoft Research | Paper |
276 | MetaFormer | General Architecture for Vision | Dec 2021 | Sea AI Lab | Paper |
277 | SAVi | Slot Attention for Video Understanding | Dec 2021 | DeepMind | Paper |
278 | PARP | Parameter Reduction Technique | Dec 2021 | MIT | Paper |
279 | TransMix | Data Augmentation for Transformers | Dec 2021 | Microsoft Research | Paper |
280 | Stable Long Term Video SR | Long-term Video Super Resolution | Dec 2021 | ETH Zürich | Paper |
281 | Few-Shot Learner | Few-Shot Learning Framework | Dec 2021 | Meta AI Research | Paper |
282 | StyleSwin | StyleGAN with Swin Transformer | Dec 2021 | Microsoft Research | Paper |
283 | 2 Stage U-net | Two-Stage Medical Image Segmentation | Dec 2021 | Stanford University | Paper |
284 | ELSA | Efficient Long-term Semantic Aggregation | Dec 2021 | ETH Zürich | Paper |
285 | GLIDE | Text-Guided Image Generation | Dec 2021 | OpenAI | Paper |
286 | AdaViT | Adaptive Vision Transformers | Jan 2022 | Microsoft Research | Paper |
287 | Exemplar Transformers | Example-based Vision Transformers | Jan 2022 | Google Research | Paper |
288 | RepMLNet | Reprogrammable Multi-Layer Network | Jan 2022 | Tsinghua University | Paper |
289 | Untrained Deep NN | Deep Networks without Training | Jan 2022 | MIT | Paper |
290 | JoJoGAN | One-Shot Face Stylization | Jan 2022 | University of Illinois Urbana-Champaign | Paper |
291 | PRIME | Pre-trained Image Encoders | Jan 2022 | Google Research | Paper |
292 | StyleGAN-V | Video Generation with StyleGAN | Jan 2022 | NVIDIA Research | Paper |
293 | SmoothNet | Motion Smoothing Network | Jan 2022 | ETH Zürich | Paper |
294 | PCACE | Point Cloud Auto-Encoder | Jan 2022 | Tsinghua University | Paper |
295 | Siamese CD | Change Detection with Transformers | Jan 2022 | Wuhan University | Paper |
296 | SASA | Self-Attention Spatial Adaptivity | Jan 2022 | Carnegie Mellon University | Paper |
297 | GCD | Generalized Category Discovery | Jan 2022 | University of Oxford | Paper |
298 | 3D ConvNet Optimization | Optimization Planning for 3D CNNs | Jan 2022 | Google Research | Paper |
299 | SeamlessGAN | Seamless Image Generation | Jan 2022 | Adobe Research | Paper |
300 | HardBoost | Hard Example Mining with Boosting | Jan 2022 | Tsinghua University | Paper |
301 | Q-ViT | Quantized Vision Transformer | Jan 2022 | Meta AI Research | Paper |
302 | GeoFill | Geometry-aware Image Inpainting | Jan 2022 | Adobe Research | Paper |
303 | Detic | Detecting Twenty-thousand Classes using Image-level Supervision | Jan 2022 | Meta AI Research | Paper |
304 | RelTR | Relational Transformer | Jan 2022 | Microsoft Research | Paper |
305 | ResiDualGAN | Residual Dual GAN Architecture | Jan 2022 | NVIDIA Research | Paper |
306 | You Only Cut Once | Single-Shot Instance Segmentation | Jan 2022 | ByteDance AI Lab | Paper |
307 | KFIoU Loss | Kalman Filter IoU Loss Function | Jan 2022 | Tongji University | Paper |
308 | StyleGAN3 Editing | Image and Video Editing Framework | Jan 2022 | NVIDIA Research | Paper |
309 | Block-NeRF | City-scale Neural Radiance Fields using block-based decomposition | Jan 2022 | Waymo/Google Research | Paper |
310 | SeMask | Semantically Masked Transformers | Feb 2022 | NVIDIA Research | Paper |
311 | SLIP | Self-supervision with Language-Image Pre-training | Feb 2022 | UC Berkeley | Paper |
312 | Deformable ViT | Vision Transformer with Deformable Attention | Feb 2022 | Microsoft Research | Paper |
313 | Lawin Transformer | Lightweight Transformer for Segmentation | Feb 2022 | Nanjing University | Paper |
314 | HyperionSolarNet | Solar Panel Detection Network | Feb 2022 | Stanford University | Paper |
315 | KerGNNs | Kernel Graph Neural Networks | Feb 2022 | MIT | Paper |
316 | gDNA | Generative Detailed Neural Avatars | Feb 2022 | ETH Zürich | Paper |
317 | HYDRA | Hybrid Deep Learning Architecture | Feb 2022 | Microsoft Research | Paper |
318 | DDU-Net | Dense Dual-Path U-Net | Feb 2022 | Shanghai Jiao Tong University | Paper |
319 | SPAMs | Spatial Attention Modules | Feb 2022 | Google Research | Paper |
320 | ReLICv2 | Representation Learning via Invariant Causal Mechanisms (v2) | Feb 2022 | DeepMind | Paper |
321 | Momentum Capsules | Dynamic Routing with Momentum | Feb 2022 | Google Research | Paper |
322 | SAR Despeckling | Transformer for SAR Image Denoising | Feb 2022 | Chinese Academy of Sciences | Paper |
323 | VRT | Video Restoration Transformer | Feb 2022 | ETH Zürich | Paper |
324 | StyleGAN-XL | Extra Large Scale StyleGAN | Feb 2022 | NVIDIA Research | Paper |
325 | AlphaCode | Code Generation AI System | Feb 2022 | DeepMind | Paper |
326 | StyleGAN-Human | Human image synthesis using StyleGAN | Apr 2022 | Microsoft Research | Paper |
327 | How Do Vision Transformers Work? | Analysis of internal mechanisms of Vision Transformers | Jun 2022 | Google Research | Paper |
328 | FERV39k | Facial Expression Recognition Dataset with 39k samples | Jun 2022 | South China University of Technology | Paper |
329 | DaViT | Dual Attention Vision Transformers | Jul 2022 | Microsoft Research | Paper |
330 | BEVFormer | Bird’s Eye View Transformer for autonomous driving | Aug 2022 | Shanghai AI Lab | Paper |
331 | TensoRF | Tensorial Radiance Fields for efficient 3D reconstruction | Sep 2022 | Zhejiang University | Paper |
332 | WebFace260M | Large-scale face recognition dataset | Sep 2022 | InsightFace | Paper |
333 | Neighborhood Attention Transformer | Local attention mechanism for vision tasks | Oct 2022 | Meta AI Research | Paper |
334 | Barbershop | Hair editing and synthesis framework | Oct 2022 | Adobe Research | Paper |
335 | Visual Attention Network | Novel attention mechanism for computer vision | Nov 2022 | Meta AI Research | Paper |
336 | MaskGIT | Masked Generative Image Transformer | Nov 2022 | Google Research | Paper |
337 | CenterNet++ | Improved CenterNet for object detection | Nov 2022 | University of Texas | Paper |
338 | Patch-NetVLAD+ | Enhanced visual place recognition using patch-based features | Dec 2022 | Oxford University | Paper |
339 | PENCIL | Probabilistic end-to-end noise correction | Dec 2022 | NTU Singapore | Paper |
340 | CenterSnap | Center-based 3D object pose estimation | Dec 2022 | Intel Labs | Paper |
341 | AGCN | Adaptive Graph Convolutional Network | Dec 2022 | Tsinghua University | Paper |
342 | AutoAvatar | Automated avatar generation from images | Dec 2022 | Tencent AI Lab | Paper |
343 | Balanced MSE | Balanced Mean Squared Error for imbalanced data | Dec 2022 | Carnegie Mellon University | Paper |
344 | ReCLIP | Improved CLIP with region-based features | Dec 2022 | Google Research | Paper |
345 | EditGAN | GAN-based image editing framework | Dec 2022 | NVIDIA Research | Paper |
346 | HuMMan | Human Motion and Manipulation dataset | Dec 2022 | Max Planck Institute | Paper |
347 | BlobGAN | Unsupervised part-aware image generation | Dec 2022 | MIT | Paper |
348 | Deep Spectral Methods | Spectral analysis for deep learning | Dec 2022 | MIT | Paper |
349 | TransformNet | Transformer-based architecture for geometry transformation | Jan 2023 | Carnegie Mellon University | Paper |
350 | Mirror-YOLO | YOLO variant using mirror augmentation for detection | Jan 2023 | Peking University | Paper |
351 | Paying U-Attention to Textures | U-Net based texture synthesis with attention | Jan 2023 | Adobe Research | Paper |
352 | ZippyPoint | Fast point cloud processing architecture | Jan 2023 | ETH Zürich | Paper |
353 | InsetGAN for Full-Body Image Generation | GAN-based full-body image synthesis | Jan 2023 | Max Planck Institute | Paper |
354 | Mixed Differential Privacy | Privacy-preserving vision model training | Jan 2023 | MIT | Paper |
355 | L³U-Net | Lightweight U-Net variant with enhanced learning | Jan 2023 | ETH Zürich | Paper |
356 | RBGNet | Residual Bidirectional Graph Network | Jan 2023 | Peking University | Paper |
357 | TopFormer | Top-down Transformer for vision tasks | Jan 2023 | Microsoft Research | Paper |
358 | CLIP-GEN | CLIP-guided image generation | Jan 2023 | OpenAI | Paper |
359 | DANBO | Dynamic Attention Network for Body Pose | Jan 2023 | Carnegie Mellon University | Paper |
360 | KeypointNeRF | NeRF with keypoint conditioning | Jan 2023 | Stanford University | Paper |
361 | VOS (Visual Object Streaming) | Efficient streaming framework for video object segmentation | Feb 2023 | ETH Zürich | Paper |
362 | ScoreNet | Score-based generative modeling for point cloud generation | Feb 2023 | UC Berkeley | Paper |
363 | GroupViT | Vision Transformer with dynamic grouping mechanism | Feb 2023 | NVIDIA Research | Paper |
364 | TCTrack | Temporal context-aware tracking framework | Feb 2023 | Chinese Academy of Sciences | Paper |
365 | MLSeg | Multi-level semantic segmentation framework | Feb 2023 | Stanford University | Paper |
366 | StyleBabel | Text-guided style transfer using BABEL embeddings | Feb 2023 | NVIDIA Research | Paper |
367 | Mixed DualStyleGAN | Dual-domain style transfer with mixed training | Feb 2023 | NVIDIA | Paper |
368 | StyleT2I | Style-based text-to-image generation | Feb 2023 | Microsoft Research | Paper |
369 | SPAct | Spatial-temporal action recognition | Feb 2023 | University of Oxford | Paper |
370 | JIFF | Joint Image and Feature Fusion | Feb 2023 | Stanford University | Paper |
371 | C3-STISR | Cross-Camera Stereo Image Super-Resolution | Feb 2023 | Tsinghua University | Paper |
372 | IVY | Integrated Vision System | Feb 2023 | Intel Research | Paper |
373 | StyLandGAN | Stylized landscape generation | Feb 2023 | NVIDIA Research | Paper |
374 | NeuralFusion | Neural fusion for 3D reconstruction using implicit representations | Mar 2023 | MIT | Paper |
375 | COLA | Contrastive learning approach for visual recognition | Mar 2023 | Stanford University | Paper |
376 | VLP (Vision-Language Pre-training) | Joint pre-training for vision and language tasks | Mar 2023 | Microsoft Research | Paper |
377 | Level-K to Nash Equilibrium | Game theoretic approach to vision problems | Mar 2023 | DeepMind | Paper |
378 | HyperTransformer | Hypernetwork-based transformer for vision tasks | Mar 2023 | Google Research | Paper |
379 | GrainSpace | Granular spatial representation learning | Mar 2023 | Carnegie Mellon University | Paper |
380 | ROOD-MRI | Robust out-of-distribution detection for medical imaging | Mar 2023 | MIT | Paper |
381 | Bamboo | Framework for efficient neural architecture search | Mar 2023 | Microsoft Research | Paper |
382 | BigDetection | Large-scale object detection framework | Mar 2023 | Facebook AI Research | Paper |
383 | TransEditor | Transformer-based image editing framework | Mar 2023 | Adobe Research | Paper |
384 | Event Transformer | Transformer architecture for event-based vision | Mar 2023 | Intel Labs | Paper |
385 | MVSTER | Multi-view Stereo Transformer | Mar 2023 | ETH Zürich | Paper |
386 | CLIP-Art | CLIP-based artistic image synthesis | Mar 2023 | DeepMind | Paper |
387 | Sequencer | Sequential modeling for vision tasks | Mar 2023 | Google Research | Paper |
388 | GraphWorld | Benchmark for graph neural networks | Mar 2023 | DeepMind | Paper |
389 | F8Net | Lightweight network for efficient feature extraction | Apr 2023 | Tsinghua University | Paper |
390 | LatentFormer | Transformer architecture for latent space manipulation | Apr 2023 | MIT | Paper |