What Are Transformers in AI

[Figure: Transformer architecture]

Background

Whether it is GPT, ChatGPT, DALL-E, Whisper, Stability AI, or almost anything significant you see in the AI world nowadays, it exists because of the Transformer architecture. Transformers are a type of neural network architecture with several properties that make them effective for modeling data with long-range dependencies. They generally feature a combination of multi-headed attention mechanisms, residual connections, layer normalization, feedforward connections, and positional embeddings.
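To make these building blocks concrete, below is a minimal PyTorch sketch of one encoder block; the dimensions and the learned positional embeddings are illustrative choices, not taken from any particular paper.

```python
import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    """One encoder block: multi-head self-attention and a feedforward
    network, each wrapped in a residual connection plus layer norm."""
    def __init__(self, d_model=512, n_heads=8, d_ff=2048, dropout=0.1):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads,
                                          dropout=dropout, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                                nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.drop = nn.Dropout(dropout)

    def forward(self, x):
        attn_out, _ = self.attn(x, x, x)            # multi-headed self-attention
        x = self.norm1(x + self.drop(attn_out))     # residual + layer norm
        x = self.norm2(x + self.drop(self.ff(x)))   # feedforward + residual + norm
        return x

# Token embeddings plus learned positional embeddings (BERT-style).
tokens = torch.randint(0, 30000, (1, 16))            # (batch, seq_len)
tok_emb = nn.Embedding(30000, 512)(tokens)           # (1, 16, 512)
pos_emb = nn.Embedding(512, 512)(torch.arange(16))   # one vector per position
out = TransformerBlock()(tok_emb + pos_emb)          # (1, 16, 512)
```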

The precursors of Transformers were the RNN, LSTM, and GRU architectures. Transformers are based on the 2017 research paper “Attention Is All You Need”.

Initially, Transformers were used for NLP-related tasks. Gradually, researchers started exploring the power of the Transformer architecture, and as of 2023 it is used for hundreds of tasks across AI domains such as:

  • Text Models (NLP, NLU, NLG)
  • Vision Models (Computer Vision)
  • Audio Models (Audio Processing, Classification, Audio Generation)
  • Reinforcement Learning (RL) Models
  • Time-series Models
  • Multimodal: OCR (extract information from scanned documents), video classification, visual QA, table data question answering
  • Graph Models

Since the journey started in 2017, roughly 200 Transformer-based architectures have been proposed by various researchers for various purposes (as of 2023). Using these architectures and various benchmark datasets, thousands of models have been created that deliver SOTA performance on various tasks. Based on your needs, you can choose the architecture that helps you meet your project objective. There is a high chance you will find a pre-trained model you can use with no training (zero-shot) or with a small fine-tuning (one-shot or few-shot) effort. For that, explore Hugging Face and Papers With Code.
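For example, a checkpoint pre-trained for natural language inference can classify text into labels it was never explicitly trained on. The sketch below uses the Hugging Face transformers pipeline API; facebook/bart-large-mnli is just one published checkpoint, so substitute whatever suits your task.

```python
from transformers import pipeline

# Zero-shot classification: no task-specific fine-tuning is required.
classifier = pipeline("zero-shot-classification",
                      model="facebook/bart-large-mnli")
result = classifier(
    "The Transformer architecture powers GPT, DALL-E and Whisper.",
    candidate_labels=["technology", "sports", "cooking"],
)
print(result["labels"][0], result["scores"][0])  # highest-scoring label
```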

This article lists all the major Transformer-related research papers, their creators, their capabilities, and their dates of release.

Tasks That Transformers Can Do

Vision Tasks

  • Image classification
  • Semantic segmentation
  • Video classification
  • Object detection
  • Zero-shot object detection
  • Zero-shot image classification
  • Depth estimation

Multimodal Tasks

  • Image captioning
  • Document Question Answering
  • Image to Text
  • Text to Video
  • Visual Question Answering
  • Text to Image
  • Image to Image
  • Image Generation

Audio Tasks

  • Audio classification
  • Automatic speech recognition
  • Audio to Audio
  • Text to Speech
  • Voice Activity Detection
  • Audio Generation

Text Tasks

  • Text classification
  • Token classification (NER, POS, etc.)
  • Question answering
  • Causal language modeling
  • Masked language modeling
  • Translation
  • Summarization
  • Multiple choice
  • Sentence Similarity
  • Table Question Answering
  • Fill in the blank (mask filling; see the example after this list)
  • Conversation
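As a quick illustration, the sketch below runs two of these text tasks through Hugging Face pipelines; the checkpoint names are example models from the Hub, not the only options.

```python
from transformers import pipeline

# Fill in the blank (mask filling): predict the masked token.
fill = pipeline("fill-mask", model="bert-base-uncased")
print(fill("Transformers are based on the [MASK] mechanism.")[0]["token_str"])

# Token classification (NER): label entity spans in the text.
ner = pipeline("ner", model="dslim/bert-base-NER",
               aggregation_strategy="simple")
print(ner("Hugging Face was founded in New York."))
```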

Frameworks Used for Writing Models

As of May 2023, the following frameworks are used for creating models.

  • TensorFlow
  • Caffe
  • Caffe2
  • PyTorch
  • MXNet
  • Keras
  • Chainer
  • JAX

Number of Models in Model Repositories

There are many model repositories, but the most famous are listed below. These repositories host pre-trained models that you can download and use in your project (see the snippet after this list).

  • Hugging Face: As of May 2023, Hugging Face has 196,000+ models in its repository; as of Sep 2021, there were about 10,000. You can see the exponential growth of the Hugging Face model repository.
  • Another model repository, TensorFlow Hub (tfhub), has around 132,000 models. TF Hub hosts TensorFlow-based models.
  • The Keras Model Zoo hosts around 3,500 models.
  • PyTorch Model Hub
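Using a hosted model typically looks like the sketch below: the first from_pretrained call downloads the checkpoint from the Hugging Face Hub and caches it locally. The sentiment checkpoint named here is just one example from the Hub.

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

name = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(name)   # downloads on first call
model = AutoModelForSequenceClassification.from_pretrained(name)

inputs = tokenizer("Transformers make transfer learning easy.",
                   return_tensors="pt")
logits = model(**inputs).logits
print(model.config.id2label[logits.argmax(-1).item()])  # POSITIVE / NEGATIVE
```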

Summary of 200+ Transformers

Below is a table summarizing these roughly 200 Transformers.

Note: Names starting with * are not Transformers; most of them are pre-Transformer-era architectures.
Help Needed: If you find that any arXiv paper's link is incorrect, let me know via hari.prasad@vedavit-ps.com.

Sno Transformer Date Type Researcher Paper Author
1 *AlexNet Paper Dec, 2012 CNN University of Toronto, Google ImageNet Classification with Deep Convolutional Neural Networks Alex Krizhevsky, Ilya Sutskever, Geoffrey Hinton
2 *VGG16 Paper Sep, 2014 CNN University of Oxford Very Deep Convolutional Networks for Large-Scale Image Recognition Karen Simonyan, Andrew Zisserman
3 *VGG19 Paper Apr, 2015 CNN University of Oxford Very Deep Convolutional Networks for Large-Scale Image Recognition Karen Simonyan, Andrew Zisserman
4 *ResNet Paper Dec, 2015 CNN Microsoft Research Deep Residual Learning for Image Recognition by Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun.
5 *InceptionResNet Paper Aug, 2016 CNN Google Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning Christian Szegedy, Sergey Ioffe, Vincent Vanhoucke, Alex Alemi
6 *ConvNeXt Paper Dec, 2016 CNN Cornell University, Tsinghua University Convolutional Neural Networks with Alternately Updated Clique Gao Huang, Yu Sun, Zhuang Liu, Daniel Sedra, Kilian Weinberger
7 *DenseNet Paper Jan, 2017 CNN Cornell University, Tsinghua University Densely Connected Convolutional Networks Gao Huang, Zhuang Liu, Laurens van der Maaten, Kilian Q. Weinberger
8 *MobileNetV1 Paper Apr, 2017 Autoencoding Google Inc. Efficient Convolutional Neural Networks for Mobile Vision Applications by Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, Hartwig Adam.
9 *Xception Paper Apr, 2017 CNN Google Xception: Deep Learning with Depthwise Separable Convolutions François Chollet
10 EncoderDecoder Paper May, 2017 Sequence-to-Sequence Google Research Leveraging Pre-trained Checkpoints for Sequence Generation Tasks by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
11 *MobileNetV2 Paper Feb, 2018 Autoencoding Google Inc. Inverted Residuals and Linear Bottlenecks by Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, Liang-Chieh Chen.
12 Data2Vec Paper Mar, 2018 Language Model Facebook A General Framework for Self-supervised Learning in Speech, Vision and Language by Alexei Baevski, Wei-Ning Hsu, Qiantong Xu, Arun Babu, Jiatao Gu, Michael Auli.
13 GPT Paper Jun, 2018 Autoregressive OpenAI Improving Language Understanding by Generative Pre-Training by Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever.
14 BERT Paper Oct, 2018 Autoencoding Google Pre-training of Deep Bidirectional Transformers for Language Understanding by Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova.
15 MarianMT Paper Oct, 2018 Autoencoding   Machine translation models trained using OPUS data by Jörg Tiedemann. The Marian Framework is being developed by the Microsoft Translator Team.
16 BiT Paper Jan, 2019 Vision Transformer Google AI General Visual Representation Learning by Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, Joan Puigcerver, Jessica Yung, Sylvain Gelly, Neil Houlsby.
17 Transformer-XL Paper Jan, 2019 Autoregressive Google/CMU Attentive Language Models Beyond a Fixed-Length Context by Zihang Dai, Zhilin Yang, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov.
18 XLM Paper Jan, 2019 BERT-based Facebook Cross-lingual Language Model Pretraining by Guillaume Lample and Alexis Conneau.
19 CTRL Paper Feb, 2019 Autoencoding Salesforce A Conditional Transformer Language Model for Controllable Generation by Nitish Shirish Keskar, Bryan McCann, Lav R. Varshney, Caiming Xiong and Richard Socher.
20 GPT-2 Paper Feb, 2019 Autoregressive OpenAI Language Models are Unsupervised Multitask Learners by Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodeiand Ilya Sutskever.
21 Funnel Transformer Paper Apr, 2019 Autoregressive CMU/Google Brain Filtering out Sequential Redundancy for Efficient Language Processing by Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le.
22 *EfficientNet B0 Paper May, 2019 CNN Google Research EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks Mingxing Tan, Quoc V. Le
23 ALBERT Paper May, 2019 Factorized BERT Google Research and the Toyota Technological Institute at Chicago A Lite BERT for Self-supervised Learning of Language Representations, by Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut.
24 EfficientNet Paper May, 2019 Vision Transformer Google Brain Rethinking Model Scaling for Convolutional Neural Networks by Mingxing Tan, Quoc V. Le.
25 MobileNetV3 Paper May, 2019 Autoencoding Google Searching for MobileNetV3 Andrew Howard, Mark Sandler, Grace Chu, Liang-Chieh Chen, Bo Chen, Mingxing Tan, Weijun Wang, Yukun Zhu, Ruoming Pang, Vijay Vasudevan, Quoc V. Le, Hartwig Adam
26 Nezha Paper May, 2019 Autoencoding Huawei Noah’s Ark Lab Neural Contextualized Representation for Chinese Language Understanding by Junqiu Wei, Xiaozhe Ren, Xiaoguang Li, Wenyong Huang, Yi Liao, Yasheng Wang, Jiashu Lin, Xin Jiang, Xiao Chen and Qun Liu.
27 BART Paper Jun, 2019 Sequence-to-Sequence Facebook Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension by Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov and Luke Zettlemoyer.
28 ERNIE Paper Jun, 2019 Autoencoding Baidu Enhanced Representation through Knowledge Integration by Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian, Hua Wu.
29 ErnieM Paper Jun, 2019 Autoencoding Baidu Enhanced Multilingual Representation by Aligning Cross-lingual Semantics with Monolingual Corpora by Xuan Ouyang, Shuohuan Wang, Chao Pang, Yu Sun, Hao Tian, Hua Wu, Haifeng Wang.
30 FlauBERT Paper Jun, 2019 Autoencoding CNRS Unsupervised Language Model Pre-training for French by Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab.
31 LXMERT Paper Jun, 2019 Autoencoding UNC Chapel Hill Learning Cross-Modality Encoder Representations from Transformers for Open-Domain Question Answering by Hao Tan and Mohit Bansal.
32 Pegasus Paper Jun, 2019 Autoregressive Google Pre-training with Extracted Gap-sentences for Abstractive Summarization by Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu.
33 XLNet Paper Jun, 2019 Autoregressive Google/CMU Generalized Autoregressive Pretraining for Language Understanding by Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le.
34 BioGpt Paper Jul, 2019 Autoregressive Microsoft Research AI4Science generative pre-trained transformer for biomedical text generation and mining by Renqian Luo, Liai Sun, Yingce Xia, Tao Qin, Sheng Zhang, Hoifung Poon and Tie-Yan Liu.
35 Hubert Paper Jul, 2019 Autoencoding Facebook Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units by Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, Abdelrahman Mohamed.
36 REALM Paper Jul, 2019 Hybrid Google Research Retrieval-Augmented Language Model Pre-Training by Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat and Ming-Wei Chang.
37 SpeechToTextTransformer Paper Jul, 2019 Hybrid Facebook, Fast Speech-to-Text Modeling with fairseq by Changhan Wang, Yun Tang, Xutai Ma, Anne Wu, Dmytro Okhonko, Juan Pino.
38 XLM-V Paper Jul, 2019 Multilingual Meta AI Overcoming the Vocabulary Bottleneck in Multilingual Masked Language Models by Davis Liang, Hila Gonen, Yuning Mao, Rui Hou, Naman Goyal, Marjan Ghazvininejad, Luke Zettlemoyer, Madian Khabsa.
39 RoBERTa Paper Aug, 2019 BERT-based Facebook A Robustly Optimized BERT Pretraining Approach by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov.
40 GPT Neo Paper Sep, 2019 Autoregressive EleutherAI EleutherAI/gpt-neo by Sid Black, Stella Biderman, Leo Gao, Phil Wang and Connor Leahy.
41 CamemBERT Paper Oct, 2019 Autoencoding Inria/Facebook/Sorbonne a Tasty French Language Model by Louis Martin, Benjamin Muller, Pedro Javier Ortiz Suárez*, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah and Benoît Sagot.
42 DialoGPT Paper Oct, 2019 Autoregressive Microsoft Research Large-Scale Generative Pre-training for Conversational Response Generation by Yizhe Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, Bill Dolan.
43 DistilBERT Paper Oct, 2019 Autoencoding HuggingFace smaller, faster, cheaper and lighter by Victor Sanh, Lysandre Debut and Thomas Wolf. The same method has been applied to compress GPT2 into DistilGPT2, RoBERTa into DistilRoBERTa, Multilingual BERT into DistilmBERT and a German version of DistilBERT.
44 LiLT Paper Oct, 2019 Autoencoding South China University of Technology A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding by Jiapeng Wang, Lianwen Jin, Kai Ding.
45 LUKE Paper Oct, 2019 Autoencoding Studio Ousia Deep Contextualized Entity Representations with Entity-aware Self-attention by Ikuya Yamada, Akari Asai, Hiroyuki Shindo, Hideaki Takeda, Yuji Matsumoto.
46 MobileBERT Paper Oct, 2019 Autoencoding CMU/Google Brain a Compact Task-Agnostic BERT for Resource-Limited Devices by Zhiqing Sun, Hongkun Yu, Xiaodan Song, Renjie Liu, Yiming Yang, and Denny Zhou.
47 MT5 Paper Oct, 2019 Autoregressive Google AI A massively multilingual pre-trained text-to-text transformer by Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel.
48 RAG Paper Oct, 2019 Hybrid Facebook Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks by Patrick Lewis, Ethan Perez, Aleksandara Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, Douwe Kiela.
49 ConvBERT Paper Nov, 2019 Autoencoding YituTech Improving BERT with Span-based Dynamic Convolution by Zihang Jiang, Weihao Yu, Daquan Zhou, Yunpeng Chen, Jiashi Feng, Shuicheng Yan.
50 Megatron-GPT2 Paper Nov, 2019 Autoregressive NVIDIA Training Multi-Billion Parameter Language Models Using Model Parallelism by Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro.
51 PhoBERT Paper Nov, 2019 BERT-based VinAI Research Pre-trained language models for Vietnamese by Dat Quoc Nguyen and Anh Tuan Nguyen.
52 RoBERTa-PreLayerNorm Paper Nov, 2019 BERT-based Facebook A Fast, Extensible Toolkit for Sequence Modeling by Myle Ott, Sergey Edunov, Alexei Baevski, Angela Fan, Sam Gross, Nathan Ng, David Grangier, Michael Auli.
53 BERTweet Paper Dec, 2019 Autoencoding VinAI Research A pre-trained language model for English Tweets by Dat Quoc Nguyen, Thanh Vu and Anh Tuan Nguyen.
54 mBART Paper Dec, 2019 Autoregressive Facebook Multilingual Denoising Pre-training for Neural Machine Translation by Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer.
55 Megatron-BERT Paper Dec, 2019 Autoregressive NVIDIA Training Multi-Billion Parameter Language Models Using Model Parallelism by Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro.
56 SpeechToTextTransformer2 Paper Dec, 2019 Hybrid Facebook, Large-Scale Self- and Semi-Supervised Learning for Speech Translation by Changhan Wang, Anne Wu, Juan Pino, Alexei Baevski, Michael Auli, Alexis Conneau.
57 BERT For Sequence Generation Paper Feb, 2020 Autoencoding Google Leveraging Pre-trained Checkpoints for Sequence Generation Tasks by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
58 ConvNeXT Paper Mar, 2020 Vision Transformer Facebook AI A ConvNet for the 2020s by Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, Saining Xie.
59 ELECTRA Paper Apr, 2020 Autoencoding Google Research/Stanford University Pre-training text encoders as discriminators rather than generators by Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning.
60 Longformer Paper Apr, 2020 Autoregressive AllenAI The Long-Document Transformer by Iz Beltagy, Matthew E. Peters, Arman Cohan.
61 RegNet Paper Apr, 2020 CNN META Platforms Designing Network Design Space by Ilija Radosavovic, Raj Prateek Kosaraju, Ross Girshick, Kaiming He, Piotr Dollár.
62 SqueezeBERT Paper Apr, 2020 BERT-based Berkeley What can computer vision teach NLP about efficient neural networks? by Forrest N. Iandola, Albert E. Shaw, Ravi Krishna, and Kurt W. Keutzer.
63 LayoutLM Paper May, 2020 Autoencoding Microsoft Research Asia Pre-training of Text and Layout for Document Image Understanding by Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou.
64 MPNet Paper May, 2020 Autoencoding Microsoft Research Masked and Permuted Pre-training for Language Understanding by Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu.
65 VisualBERT Paper May, 2020 BERT-based UCLA NLP A Simple and Performant Baseline for Vision and Language by Liunian Harold Li, Mark Yatskar, Da Yin, Cho-Jui Hsieh, Kai-Wei Chang.
66 Conditional DETR Paper Jun, 2020 Vision Transformer Microsoft Research Asia Conditional DETR for Fast Training Convergence by Depu Meng, Xiaokang Chen, Zejia Fan, Gang Zeng, Houqiang Li, Yuhui Yuan, Lei Sun, Jingdong Wang.
67 GPTBigCode Paper Jun, 2020 Autoregressive BigCode don’t reach for the stars! by Loubna Ben Allal, Raymond Li, Denis Kocetkov, Chenghao Mou, Christopher Akiki, Carlos Munoz Ferrandis, Niklas Muennighoff, Mayank Mishra, Alex Gu, Manan Dey, Logesh Kumar Umapathi, Carolyn Jane Anderson, Yangtian Zi, Joel Lamy Poirier, Hailey Schoelkopf, Sergey Troshin, Dmitry Abulkhanov, Manuel Romero, Michael Lappert, Francesco De Toni, Bernardo García del Río, Qian Liu, Shamik Bose, Urvashi Bhattacharyya, Terry Yue Zhuo, Ian Yu, Paulo Villegas, Marco Zocca, Sourab Mangrulkar, David Lansky, Huu Nguyen, Danish Contractor, Luis Villa, Jia Li, Dzmitry Bahdanau, Yacine Jernite, Sean Hughes, Daniel Fried, Arjun Guha, Harm de Vries, Leandro von Werra.
68 M-CTC-T Paper Jun, 2020 Autoencoding Facebook Pseudo-Labeling For Massively Multilingual Speech Recognition by Loren Lugosch, Tatiana Likhomanenko, Gabriel Synnaeve, and Ronan Collobert.
69 Pix2Struct Paper Jun, 2020 Hybrid Google Screenshot Parsing as Pretraining for Visual Language Understanding by Kenton Lee, Mandar Joshi, Iulia Turc, Hexiang Hu, Fangyu Liu, Julian Eisenschlos, Urvashi Khandelwal, Peter Shaw, Ming-Wei Chang, Kristina Toutanova.
70 ProphetNet Paper Jun, 2020 Autoregressive Microsoft Research Predicting Future N-gram for Sequence-to-Sequence Pre-training by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
71 SEW Paper Jun, 2020 Vision Transformer (ViT) ASAPP Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition by Felix Wu, Kwangyoun Kim, Jing Pan, Kyu Han, Kilian Q. Weinberger, Yoav Artzi.
72 T5 Paper Jun, 2020 Autoregressive Google AI Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
73 DeBERTa Paper Jul, 2020 Autoencoding Microsoft Decoding-enhanced BERT with Disentangled Attention by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
74 Informer Paper Jul, 2020 Autoencoding Beihang University, UC Berkeley, Rutgers University, SEDD Company Beyond Efficient Transformer for Long Sequence Time-Series Forecasting by Haoyi Zhou, Shanghang Zhang, Jieqi Peng, Shuai Zhang, Jianxin Li, Hui Xiong, and Wancai Zhang.
75 LED Paper Jul, 2020 Autoregressive AllenAI The Long-Document Transformer by Iz Beltagy, Matthew E. Peters, Arman Cohan.
76 SwitchTransformers Paper Jul, 2020 Hybrid Google Scaling to Trillion Parameter Models with Simple and Efficient Sparsity by William Fedus, Barret Zoph, Noam Shazeer.
77 Whisper Paper Jul, 2020 Autoregressive OpenAI Robust Speech Recognition via Large-Scale Weak Supervision by Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey, Ilya Sutskever.
78 XLM-ProphetNet Paper Jul, 2020 Hybrid Microsoft Research Predicting Future N-gram for Sequence-to-Sequence Pre-training by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
79 XLM-RoBERTa Paper Jul, 2020 BERT-based Facebook AI, Unsupervised Cross-lingual Representation Learning at Scale by Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer and Veselin Stoyanov.
80 Deformable DETR Paper Aug, 2020 Vision Transformer SenseTime Research Deformable Transformers for End-to-End Object Detection by Xizhou Zhu, Weijie Su, Lewei Lu, Bin Li, Xiaogang Wang, Jifeng Dai.
81 FNet Paper Aug, 2020 Autoencoding Google Research Mixing Tokens with Fourier Transforms by James Lee-Thorp, Joshua Ainslie, Ilya Eckstein, Santiago Ontanon.
82 GPTSAN-japanese Paper Aug, 2020 Autoregressive   released in the repository tanreinama/GPTSAN by Toshiyuki Sakamoto(tanreinama).
83 SEW-D Paper Aug, 2020 Vision Transformer (ViT) ASAPP Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition by Felix Wu, Kwangyoun Kim, Jing Pan, Kyu Han, Kilian Q. Weinberger, Yoav Artzi.
84 CPM Paper Sep, 2020 Sequence-to-Sequence Tsinghua University A Large-scale Generative Chinese Pre-trained Language Model by Zhengyan Zhang, Xu Han, Hao Zhou, Pei Ke, Yuxian Gu, Deming Ye, Yujia Qin, Yusheng Su, Haozhe Ji, Jian Guan, Fanchao Qi, Xiaozhi Wang, Yanan Zheng, Guoyang Zeng, Huanqi Cao, Shengqi Chen, Daixuan Li, Zhenbo Sun, Zhiyuan Liu, Minlie Huang, Wentao Han, Jie Tang, Juanzi Li, Xiaoyan Zhu, Maosong Sun.
85 GIT Paper Sep, 2020 Autoencoding Microsoft Research A Generative Image-to-text Transformer for Vision and Language by Jianfeng Wang, Zhengyuan Yang, Xiaowei Hu, Linjie Li, Kevin Lin, Zhe Gan, Zicheng Liu, Ce Liu, Lijuan Wang.
86 LayoutXLM Paper Sep, 2020 Autoencoding Microsoft Research Asia Multimodal Pre-training for Multilingual Visually-rich Document Understanding by Yiheng Xu, Tengchao Lv, Lei Cui, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Furu Wei.
87 DETR Paper Oct, 2020 Vision Transformer Facebook End-to-End Object Detection with Transformers by Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, Sergey Zagoruyko.
88 GPT NeoX Paper Oct, 2020 Autoregressive EleutherAI An Open-Source Autoregressive Language Model by Sid Black, Stella Biderman, Eric Hallahan, Quentin Anthony, Leo Gao, Laurence Golding, Horace He, Connor Leahy, Kyle McDonell, Jason Phang, Michael Pieler, USVSN Sai Prashanth, Shivanshu Purohit, Laria Reynolds, Jonathan Tow, Ben Wang, Samuel Weinbach
89 RemBERT Paper Oct, 2020 BERT-based Google Research Rethinking embedding coupling in pre-trained language models by Hyung Won Chung, Thibault Févry, Henry Tsai, M. Johnson, Sebastian Ruder.
90 RoCBert Paper Oct, 2020 BERT-based WeChatAI Robust Chinese Bert with Multimodal Contrastive Pretraining by HuiSu, WeiweiShi, XiaoyuShen, XiaoZhou, TuoJi, JiaruiFang, JieZhou.
91 TAPAS Paper Oct, 2020 Hybrid Google AI Weakly Supervised Table Parsing via Pre-training by Jonathan Herzig, Paweł Krzysztof Nowak, Thomas Müller, Francesco Piccinno and Julian Martin Eisenschlos.
92 UPerNet Paper Oct, 2020 Vision Transformer (ViT) Peking University Unified Perceptual Parsing for Scene Understanding by Tete Xiao, Yingcheng Liu, Bolei Zhou, Yuning Jiang, Jian Sun.
93 Vision Transformer (ViT) Paper Oct, 2020 Vision Transformer (ViT) Google AI Transformers for Image Recognition at Scale by Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby.
94 Wav2Vec2 Paper Oct, 2020 Autoregressive Facebook AI A Framework for Self-Supervised Learning of Speech Representations by Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, Michael Auli.
95 PLBart Paper Nov, 2020 Hybrid UCLA NLP Unified Pre-training for Program Understanding and Generation by Wasi Uddin Ahmad, Saikat Chakraborty, Baishakhi Ray, Kai-Wei Chang.
96 DiT Paper Dec, 2020 Vision Transformer Microsoft Research Self-supervised Pre-training for Document Image Transformer by Junlong Li, Yiheng Xu, Tengchao Lv, Lei Cui, Cha Zhang, Furu Wei.
97 DPR Paper Dec, 2020 Sequence-to-Sequence Facebook Dense Passage Retrieval for Open-Domain Question Answering by Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih.
98 GLPN Paper Dec, 2020 Autoencoding KAIST Global-Local Path Networks for Monocular Depth Estimation with Vertical CutDepth by Doyeon Kim, Woonghyun Ga, Pyungwhan Ahn, Donggyu Joo, Sehwan Chun, Junmo Kim.
99 LeViT Paper Dec, 2020 Autoencoding Meta AI A Vision Transformer in ConvNet’s Clothing for Faster Inference by Ben Graham, Alaaeldin El-Nouby, Hugo Touvron, Pierre Stock, Armand Joulin, Hervé Jégou, Matthijs Douze.
100 NAT Paper Dec, 2020 Autoencoding SHI Labs Neighborhood Attention Transformer by Ali Hassani, Steven Walton, Jiachen Li, Shen Li, and Humphrey Shi.
101 TAPEX Paper Dec, 2020 Hybrid Microsoft Research Table Pre-training via Learning a Neural SQL Executor by Qian Liu, Bei Chen, Jiaqi Guo, Morteza Ziyadi, Zeqi Lin, Weizhu Chen, Jian-Guang Lou.
102 VideoMAE Paper Dec, 2020 Hybrid Multimedia Computing Group, Nanjing University Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training by Zhan Tong, Yibing Song, Jue Wang, Limin Wang.
103 Wav2Vec2-Conformer Paper Dec, 2020 Autoregressive Facebook AI Fast Speech-to-Text Modeling with FAIRSEQ by Changhan Wang, Yun Tang, Xutai Ma, Anne Wu, Sravya Popuri, Dmytro Okhonko, Juan Pino.
104 CLIP Paper Jan, 2021 Vision-Language Pretraining OpenAI Learning Transferable Visual Models From Natural Language Supervision by Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever.
105 XLS-R Paper Jan, 2021 Autoregressive Facebook AI Self-supervised Cross-lingual Speech Representation Learning at Scale by Arun Babu, Changhan Wang, Andros Tjandra, Kushal Lakhotia, Qiantong Xu, Naman Goyal, Kritika Singh, Patrick von Platen, Yatharth Saraf, Juan Pino, Alexei Baevski, Alexis Conneau, Michael Auli.
106 Audio Spectrogram Transformer Paper Feb, 2021 Audio Transformer MIT Audio Spectrogram Transformer by Yuan Gong, Yu-An Chung, James Glass.
107 M2M100 Paper Feb, 2021 Autoregressive Facebook Beyond English-Centric Multilingual Machine Translation by Angela Fan, Shruti Bhosale, Holger Schwenk, Zhiyi Ma, Ahmed El-Kishky, Siddharth Goyal, Mandeep Baines, Onur Celebi, Guillaume Wenzek, Vishrav Chaudhary, Naman Goyal, Tom Birch, Vitaliy Liptchinsky, Sergey Edunov, Edouard Grave, Michael Auli, Armand Joulin.
108 MEGA Paper Feb, 2021 Autoencoding Facebook Moving Average Equipped Gated Attention by Xuezhe Ma, Chunting Zhou, Xiang Kong, Junxian He, Liangke Gui, Graham Neubig, Jonathan May, and Luke Zettlemoyer.
109 BEiT Paper Mar, 2021 Vision Transformer Microsoft BERT Pre-Training of Image Transformers by Hangbo Bao, Li Dong, Furu Wei.
110 BigBird-Pegasus Paper Mar, 2021 Sequence-to-Sequence Google Research Transformers for Longer Sequences by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed.
111 BigBird-RoBERTa Paper Mar, 2021 Autoencoding Google Research Transformers for Longer Sequences by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed.
112 CLIPSeg Paper Mar, 2021 Vision-Language Pretraining University of Göttingen Image Segmentation Using Text and Image Prompts by Timo Lüddecke and Alexander Ecker.
113 DPT Paper Mar, 2021 Vision Transformer Intel Labs Vision Transformers for Dense Prediction by René Ranftl, Alexey Bochkovskiy, Vladlen Koltun.
114 Perceiver IO Paper Mar, 2021 Hybrid Deepmind A General Architecture for Structured Inputs & Outputs by Andrew Jaegle, Sebastian Borgeaud, Jean-Baptiste Alayrac, Carl Doersch, Catalin Ionescu, David Ding, Skanda Koppula, Daniel Zoran, Andrew Brock, Evan Shelhamer, Olivier Hénaff, Matthew M. Botvinick, Andrew Zisserman, Oriol Vinyals, João Carreira.
115 Reformer Paper Mar, 2021 Hybrid Google Research The Efficient Transformer by Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya.
116 RoFormer Paper Mar, 2021 Hybrid ZhuiyiTechnology Enhanced Transformer with Rotary Position Embedding by Jianlin Su and Yu Lu and Shengfeng Pan and Bo Wen and Yunfeng Liu.
117 Swin Transformer Paper Mar, 2021 Vision Transformer (ViT) Microsoft Hierarchical Vision Transformer using Shifted Windows by Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, Baining Guo.
118 TrOCR Paper Mar, 2021 Hybrid Microsoft, Transformer-based Optical Character Recognition with Pre-trained Models by Minghao Li, Tengchao Lv, Lei Cui, Yijuan Lu, Dinei Florencio, Cha Zhang, Zhoujun Li, Furu Wei.
119 Wav2Vec2Phoneme Paper Mar, 2021 Autoregressive Facebook AI Simple and Effective Zero-shot Cross-lingual Phoneme Recognition by Qiantong Xu, Alexei Baevski, Michael Auli.
120 X-CLIP Paper Mar, 2021 Hybrid Microsoft Research Expanding Language-Image Pretrained Models for General Video Recognition by Bolin Ni, Houwen Peng, Minghao Chen, Songyang Zhang, Gaofeng Meng, Jianlong Fu, Shiming Xiang, Haibin Ling.
121 XLSR-Wav2Vec2 Paper Mar, 2021 Autoregressive Facebook AI Unsupervised Cross-Lingual Representation Learning For Speech Recognition by Alexis Conneau, Alexei Baevski, Ronan Collobert, Abdelrahman Mohamed, Michael Auli.
122 Blenderbot Paper Apr, 2021 Sequence-to-Sequence Facebook Recipes for building an open-domain chatbot by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
123 BlenderbotSmall Paper Apr, 2021 Sequence-to-Sequence Facebook Recipes for building an open-domain chatbot by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
124 BLIP Paper Apr, 2021 Vision Transformer Salesforce Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation by Junnan Li, Dongxu Li, Caiming Xiong, Steven Hoi.
125 ByT5 Paper Apr, 2021 Sequence-to-Sequence Google Research Towards a token-free future with pre-trained byte-to-byte models by Linting Xue, Aditya Barua, Noah Constant, Rami Al-Rfou, Sharan Narang, Mihir Kale, Adam Roberts, Colin Raffel.
126 CvT Paper Apr, 2021 Vision Transformer Microsoft Introducing Convolutions to Vision Transformers by Haiping Wu, Bin Xiao, Noel Codella, Mengchen Liu, Xiyang Dai, Lu Yuan, Lei Zhang.
127 DeBERTa-v2 Paper Apr, 2021 Autoencoding Microsoft Decoding-enhanced BERT with Disentangled Attention by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
128 DeiT Paper Apr, 2021 Vision Transformer Facebook Training data-efficient image transformers & distillation through attention by Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, Hervé Jégou.
129 GroupViT Paper Apr, 2021 Autoencoding UCSD, NVIDIA Semantic Segmentation Emerges from Text Supervision by Jiarui Xu, Shalini De Mello, Sifei Liu, Wonmin Byeon, Thomas Breuel, Jan Kautz, Xiaolong Wang.
130 LayoutLMv2 Paper Apr, 2021 Autoencoding Microsoft Research Asia Multi-modal Pre-training for Visually-Rich Document Understanding by Yang Xu, Yiheng Xu, Tengchao Lv, Lei Cui, Furu Wei, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Wanxiang Che, Min Zhang, Lidong Zhou.
131 MaskFormer Paper Apr, 2021 Autoencoding Meta and UIUC Per-Pixel Classification is Not All You Need for Semantic Segmentation by Bowen Cheng, Alexander G. Schwing, Alexander Kirillov.
132 SegFormer Paper Apr, 2021 Hybrid NVIDIA Simple and Efficient Design for Semantic Segmentation with Transformers by Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M. Alvarez, Ping Luo.
133 Time Series Transformer Paper Apr, 2021 Hybrid HuggingFace
134 TimeSformer Paper Apr, 2021 Hybrid Facebook Space-Time Attention All You Need for Video Understanding? by Gedas Bertasius, Heng Wang, Lorenzo Torresani.
135 Trajectory Transformer Paper Apr, 2021 Hybrid the University of California at Berkeley Offline Reinforcement Learning as One Big Sequence Modeling Problem by Michael Janner, Qiyang Li, Sergey Levine
136 UniSpeech Paper Apr, 2021 Hybrid Microsoft Research Unified Speech Representation Learning with Labeled and Unlabeled Data by Chengyi Wang, Yu Wu, Yao Qian, Kenichi Kumatani, Shujie Liu, Furu Wei, Michael Zeng, Xuedong Huang.
137 UniSpeechSat Paper Apr, 2021 Hybrid Microsoft Research UNIVERSAL SPEECH REPRESENTATION LEARNING WITH SPEAKER AWARE PRE-TRAINING by Sanyuan Chen, Yu Wu, Chengyi Wang, Zhengyang Chen, Zhuo Chen, Shujie Liu, Jian Wu, Yao Qian, Furu Wei, Jinyu Li, Xiangzhan Yu.
138 ALIGN Paper May, 2021 Vision Transformer Google Research Scaling Up Visual and Vision-Language. Representation Learning With Noisy Text Supervision by Chao Jia, Yinfei Yang, Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu Pham, Quoc V. Le, Yunhsuan Sung, Zhen Li, Tom Duerig.
139 BORT Paper May, 2021 Sequence-to-Sequence Alexa Optimal Subarchitecture Extraction For BERT by Adrian de Wynter and Daniel J. Perry.
140 DePlot Paper May, 2021 Vision Transformer Google AI One-shot visual language reasoning by plot-to-table translation by Fangyu Liu, Julian Martin Eisenschlos, Francesco Piccinno, Syrine Krichene, Chenxi Pang, Kenton Lee, Mandar Joshi, Wenhu Chen, Nigel Collier, Yasemin Altun.
141 DETA Paper May, 2021 Sequence-to-Sequence The University of Texas at Austin NMS Strikes Back by Jeffrey Ouyang-Zhang, Jang Hyun Cho, Xingyi Zhou, Philipp Krähenbühl.
142 DiNAT Paper May, 2021 Vision Transformer SHI Labs Dilated Neighborhood Attention Transformer by Ali Hassani and Humphrey Shi.
143 Jukebox Paper May, 2021 Autoencoding OpenAI A Generative Model for Music by Prafulla Dhariwal, Heewoo Jun, Christine Payne, Jong Wook Kim, Alec Radford, Ilya Sutskever.
144 mBART-50 Paper May, 2021 Autoregressive Facebook Multilingual Translation with Extensible Multilingual Pretraining and Finetuning by Yuqing Tang, Chau Tran, Xian Li, Peng-Jen Chen, Naman Goyal, Vishrav Chaudhary, Jiatao Gu, Angela Fan.
145 Nyströmformer Paper May, 2021 Autoencoding the University of Wisconsin - Madison A Nyström-Based Algorithm for Approximating Self-Attention by Yunyang Xiong, Zhanpeng Zeng, Rudrasis Chakraborty, Mingxing Tan, Glenn Fung, Yin Li, Vikas Singh.
146 ViT Hybrid Paper May, 2021 Hybrid Google AI Transformers for Image Recognition at Scale by Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby.
147 X-MOD Paper May, 2021 Hybrid Meta AI Lifting the Curse of Multilinguality by Pre-training Modular Transformers by Jonas Pfeiffer, Naman Goyal, Xi Lin, Xian Li, James Cross, Sebastian Riedel, Mikel Artetxe.
148 BARTpho Paper Jun, 2021 Autoregressive VinAI Research Pre-trained Sequence-to-Sequence Models for Vietnamese by Nguyen Luong Tran, Duong Minh Le and Dat Quoc Nguyen.
149 BridgeTower Paper Jun, 2021 Vision Transformer Harbin Institute of Technology/Microsoft Research Asia/Intel Labs Building Bridges Between Encoders in Vision-Language Representation Learning by Xiao Xu, Chenfei Wu, Shachar Rosenman, Vasudev Lal, Wanxiang Che, Nan Duan.
150 CodeGen Paper Jun, 2021 Vision Transformer Salesforce A Conversational Paradigm for Program Synthesis by Erik Nijkamp, Bo Pang, Hiroaki Hayashi, Lifu Tu, Huan Wang, Yingbo Zhou, Silvio Savarese, Caiming Xiong.
151 GPT-J Paper Jun, 2021 Autoregressive EleutherAI released in the repository kingoflolz/mesh-transformer-jax by Ben Wang and Aran Komatsuzaki.
152 LLaMA Paper Jun, 2021 Autoencoding The FAIR team of Meta AI Open and Efficient Foundation Language Models by Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, Guillaume Lample.
153 MarkupLM Paper Jun, 2021 Autoencoding Microsoft Research Asia Pre-training of Text and Markup Language for Visually-rich Document Understanding by Junlong Li, Yiheng Xu, Lei Cui, Furu Wei.
154 PoolFormer Paper Jun, 2021 Autoregressive Sea AI Labs MetaFormer is Actually What You Need for Vision by Yu, Weihao and Luo, Mi and Zhou, Pan and Si, Chenyang and Zhou, Yichen and Wang, Xinchao and Feng, Jiashi and Yan, Shuicheng.
155 QDQBert Paper Jun, 2021 BERT-based NVIDIA Principles and Empirical Evaluation by Hao Wu, Patrick Judd, Xiaojie Zhang, Mikhail Isaev and Paulius Micikevicius.
156 ViLT Paper Jun, 2021 Vision Transformer (ViT) NAVER AI Lab/Kakao Enterprise/Kakao Brain Vision-and-Language Transformer Without Convolution or Region Supervision by Wonjae Kim, Bokyung Son, Ildoo Kim.
157 BARThez Paper Jul, 2021 Autoregressive École polytechnique a Skilled Pretrained French Sequence-to-Sequence Model by Moussa Kamal Eddine, Antoine J.-P. Tixier, Michalis Vazirgiannis.
158 Donut Paper Jul, 2021 Time Series Transformer NAVER OCR-free Document Understanding Transformer by Geewook Kim, Teakgyu Hong, Moonbin Yim, Jeongyeon Nam, Jinyoung Park, Jinyeong Yim, Wonseok Hwang, Sangdoo Yun, Dongyoon Han, Seunghyun Park.
159 ImageGPT Paper Jul, 2021 Autoregressive OpenAI Generative Pretraining from Pixels by Mark Chen, Alec Radford, Rewon Child, Jeffrey Wu, Heewoo Jun, David Luan, Ilya Sutskever.
160 OPT Paper Jul, 2021 Hybrid Meta AI Open Pre-trained Transformer Language Models by Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen et al.
161 Splinter Paper Jul, 2021 Hybrid Tel Aviv University, Few-Shot Question Answering by Pretraining Span Selection by Ori Ram, Yuval Kirstain, Jonathan Berant, Amir Globerson, Omer Levy.
162 XGLM Paper Jul, 2021 Hybrid Facebook AI Few-shot Learning with Multilingual Language Models by Xi Victoria Lin, Todor Mihaylov, Mikel Artetxe, Tianlu Wang, Shuohui Chen, Daniel Simig, Myle Ott, Naman Goyal, Shruti Bhosale, Jingfei Du, Ramakanth Pasunuru, Sam Shleifer, Punit Singh Koura, Vishrav Chaudhary, Brian O’Horo, Jeff Wang, Luke Zettlemoyer, Zornitsa Kozareva, Mona Diab, Veselin Stoyanov, Xian Li.
163 YOSO Paper Jul, 2021 Object Detection the University of Wisconsin - Madison You Only Sample (Almost) Once: Linear Cost Self-Attention Via Bernoulli Sampling
164 EfficientFormer Paper Aug, 2021 Vision Transformer Snap Research Vision Transformers at MobileNet Speed by Yanyu Li, Geng Yuan, Yang Wen, Ju Hu, Georgios Evangelidis, Sergey Tulyakov, Yanzhi Wang, Jian Ren.
165 ESM Paper Aug, 2021 Protein Transformer Meta AI ESM-1b. Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences by Alexander Rives, Joshua Meier, Tom Sercu, Siddharth Goyal, Zeming Lin, Jason Liu, Demi Guo, Myle Ott, C. Lawrence Zitnick, Jerry Ma, and Rob Fergus.
166 ESM Paper Aug, 2021   Meta AI ESM-1v was released with the paper Language models enable zero-shot prediction of the effects of mutations on protein function by Joshua Meier, Roshan Rao, Robert Verkuil, Jason Liu, Tom Sercu and Alexander Rives.
167 ESM Paper Aug, 2021   Meta AI ESM-2 and ESMFold were released with the paper Language models of protein sequences at the scale of evolution enable accurate structure prediction by Zeming Lin, Halil Akin, Roshan Rao, Brian Hie, Zhongkai Zhu, Wenting Lu, Allan dos Santos Costa, Maryam Fazel-Zarandi, Tom Sercu, Sal Candido, Alexander Rives.
168 Mask2Former Paper Aug, 2021 Autoencoding FAIR and UIUC Masked-attention Mask Transformer for Universal Image Segmentation by Bowen Cheng, Ishan Misra, Alexander G. Schwing, Alexander Kirillov, Rohit Girdhar.
169 MGP-STR Paper Aug, 2021 Autoencoding Alibaba Research Multi-Granularity Prediction for Scene Text Recognition by Peng Wang, Cheng Da, and Cong Yao.
170 NLLB Paper Aug, 2021 Autoencoding Meta Scaling Human-Centered Machine Translation by the NLLB team.
171 T5v1.1 Paper Aug, 2021 Autoregressive Google AI released in the repository google-research/text-to-text-transfer-transformer by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
172 TVLT Paper Aug, 2021 Hybrid UNC Chapel Hill Textless Vision-Language Transformer by Zineng Tang, Jaemin Cho, Yixin Nie, Mohit Bansal.
173 WavLM Paper Aug, 2021 Autoregressive Microsoft Research Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing by Sanyuan Chen, Chengyi Wang, Zhengyang Chen, Yu Wu, Shujie Liu, Zhuo Chen, Jinyu Li, Naoyuki Kanda, Takuya Yoshioka, Xiong Xiao, Jian Wu, Long Zhou, Shuo Ren, Yanmin Qian, Yao Qian, Jian Wu, Michael Zeng, Furu Wei.
174 XLM-RoBERTa-XL Paper Aug, 2021 BERT-based Facebook AI, Larger-Scale Transformers for Multilingual Masked Language Modeling by Naman Goyal, Jingfei Du, Myle Ott, Giri Anantharaman, Alexis Conneau.
175 Chinese-CLIP Paper Sep, 2021 Vision-Language Pretraining OFA-Sys Contrastive Vision-Language Pretraining in Chinese by An Yang, Junshu Pan, Junyang Lin, Rui Men, Yichang Zhang, Jingren Zhou, Chang Zhou.
176 CLAP Paper Sep, 2021 Vision Transformer LAION-AI Large-scale Contrastive Language-Audio Pretraining with Feature Fusion and Keyword-to-Caption Augmentation (https://arxiv.org/abs/2211.06687)
177 Decision Transformer Paper Sep, 2021 Vision Transformer Berkeley/Facebook/Google Reinforcement Learning via Sequence Modeling by Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Michael Laskin, Pieter Abbeel, Aravind Srinivas, Igor Mordatch.
178 BLIP-2 Paper Oct, 2021 Vision Transformer Salesforce Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models by Junnan Li, Dongxu Li, Silvio Savarese, Steven Hoi.
179 CANINE Paper Oct, 2021 Vision Transformer Google Research Pre-training an Efficient Tokenization-Free Encoder for Language Representation by Jonathan H. Clark, Dan Garrette, Iulia Turc, John Wieting.
180 Graphormer Paper Oct, 2021 Autoencoding Microsoft Do Transformers Really Perform Bad for Graph Representation? by Chengxuan Ying, Tianle Cai, Shengjie Luo, Shuxin Zheng, Guolin Ke, Di He, Yanming Shen, Tie-Yan Liu.
181 I-BERT Paper Oct, 2021 Autoencoding Berkeley Integer-only BERT Quantization by Sehoon Kim, Amir Gholami, Zhewei Yao, Michael W. Mahoney, Kurt Keutzer.
182 MatCha Paper Oct, 2021 Autoencoding Google AI Enhancing Visual Language Pretraining with Math Reasoning and Chart Derendering by Fangyu Liu, Francesco Piccinno, Syrine Krichene, Chenxi Pang, Kenton Lee, Mandar Joshi, Yasemin Altun, Nigel Collier, Julian Martin Eisenschlos.
183 mLUKE Paper Oct, 2021 Autoencoding Studio Ousia The Power of Entity Representations in Multilingual Pretrained Language Models by Ryokan Ri, Ikuya Yamada, and Yoshimasa Tsuruoka.
184 MobileViT Paper Oct, 2021 Autoencoding Apple Light-weight, General-purpose, and Mobile-friendly Vision Transformer by Sachin Mehta and Mohammad Rastegari.
185 OWL-ViT Paper Oct, 2021 Vision Transformer (ViT) Google AI Simple Open-Vocabulary Object Detection with Vision Transformers by Matthias Minderer, Alexey Gritsenko, Austin Stone, Maxim Neumann, Dirk Weissenborn, Alexey Dosovitskiy, Aravindh Mahendran, Anurag Arnab, Mostafa Dehghani, Zhuoran Shen, Xiao Wang, Xiaohua Zhai, Thomas Kipf, and Neil Houlsby.
186 SpeechT5 Paper Oct, 2021 Autoregressive Microsoft Research Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing by Junyi Ao, Rui Wang, Long Zhou, Chengyi Wang, Shuo Ren, Yu Wu, Shujie Liu, Tom Ko, Qing Li, Yu Zhang, Zhihua Wei, Yao Qian, Jinyu Li, Furu Wei.
187 Swin Transformer V2 Paper Oct, 2021 Vision Transformer (ViT) Microsoft Scaling Up Capacity and Resolution by Ze Liu, Han Hu, Yutong Lin, Zhuliang Yao, Zhenda Xie, Yixuan Wei, Jia Ning, Yue Cao, Zheng Zhang, Li Dong, Furu Wei, Baining Guo.
188 ViTMAE Paper Oct, 2021 Vision Transformer (ViT) Meta AI Masked Autoencoders Are Scalable Vision Learners by Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, Ross Girshick.
189 BLOOM Paper Nov, 2021 Autoregressive BigScience Workshop A 176B-Parameter Open-Access Multilingual Language Model by the BigScience Workshop.
190 ConvNeXTV2 Paper Nov, 2021 Vision Transformer Facebook AI Co-designing and Scaling ConvNets with Masked Autoencoders by Sanghyun Woo, Shoubhik Debnath, Ronghang Hu, Xinlei Chen, Zhuang Liu, In So Kweon, Saining Xie.
191 CPM-Ant Paper Nov, 2021 Sequence-to-Sequence OpenBMB    
192 GPT-Sw3 Paper Nov, 2021 Autoregressive AI-Sweden Building the First Large-Scale Generative Language Model for Swedish by Ariel Ekgren, Amaru Cuba Gyllensten, Evangelia Gogoulou, Alice Heiman, Severine Verlinden, Joey Öhman, Fredrik Carlsson, Magnus Sahlgren.
193 LongT5 Paper Nov, 2021 Autoregressive Google AI Efficient Text-To-Text Transformer for Long Sequences by Mandy Guo, Joshua Ainslie, David Uthus, Santiago Ontanon, Jianmo Ni, Yun-Hsuan Sung, Yinfei Yang.
194 OneFormer Paper Nov, 2021 Autoregressive SHI Labs One Transformer to Rule Universal Image Segmentation by Jitesh Jain, Jiachen Li, MangTik Chiu, Ali Hassani, Nikita Orlov, Humphrey Shi.
195 Table Transformer Paper Nov, 2021 Hybrid Microsoft Research Towards Comprehensive Table Extraction From Unstructured Documents by Brandon Smock, Rohith Pesala, Robin Abraham.
196 VAN Paper Nov, 2021 Vision Transformer (ViT) Tsinghua University and Nankai University Visual Attention Network by Meng-Hao Guo, Cheng-Ze Lu, Zheng-Ning Liu, Ming-Ming Cheng, Shi-Min Hu.
197 AltCLIP Paper Dec, 2021 Vision-Language Pretraining BAAI Altering the Language Encoder in CLIP for Extended Language Capabilities by Chen, Zhongzhi and Liu, Guang and Zhang, Bo-Wen and Ye, Fulong and Yang, Qinghong and Wu, Ledell.
198 MVP Paper Dec, 2021 Autoencoding RUC AI Box Multi-task Supervised Pre-training for Natural Language Generation by Tianyi Tang, Junyi Li, Wayne Xin Zhao and Ji-Rong Wen.
199 NLLB-MOE Paper Dec, 2021 Autoencoding Meta Scaling Human-Centered Machine Translation by the NLLB team.
200 PEGASUS-X Paper Dec, 2021 Autoregressive Google Investigating Efficiently Extending Transformers for Long Input Summarization by Jason Phang, Yao Zhao, and Peter J. Liu.
201 Swin2SR Paper Dec, 2021 Vision Transformer (ViT) University of Würzburg SwinV2 Transformer for Compressed Image Super-Resolution and Restoration by Marcos V. Conde, Ui-Jin Choi, Maxime Burchi, Radu Timofte.
202 UL2 Paper Dec, 2021 Hybrid Google Research Unifying Language Learning Paradigms by Yi Tay, Mostafa Dehghani, Vinh Q. Tran, Xavier Garcia, Dara Bahri, Tal Schuster, Huaixiu Steven Zheng, Neil Houlsby, Donald Metzler
203 ViTMSN Paper Dec, 2021 Vision Transformer (ViT) Meta AI Masked Siamese Networks for Label-Efficient Learning by Mahmoud Assran, Mathilde Caron, Ishan Misra, Piotr Bojanowski, Florian Bordes, Pascal Vincent, Armand Joulin, Michael Rabbat, Nicolas Ballas.
204 YOLOS Paper Dec, 2021 Object Detection Huazhong University of Science & Technology Rethinking Transformer in Vision through Object Detection by Yuxin Fang, Bencheng Liao, Xinggang Wang, Jiemin Fang, Jiyang Qi, Rui Wu, Jianwei Niu, Wenyu Liu.
205 FLAN-T5 Paper Feb, 2022 Autoregressive Google AI released in the repository google-research/t5x by Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Eric Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, Albert Webson, Shixiang Shane Gu, Zhuyun Dai, Mirac Suzgun, Xinyun Chen, Aakanksha Chowdhery, Sharan Narang, Gaurav Mishra, Adams Yu, Vincent Zhao, Yanping Huang, Andrew Dai, Hongkun Yu, Slav Petrov, Ed H. Chi, Jeff Dean, Jacob Devlin, Adam Roberts, Denny Zhou, Quoc V. Le, and Jason Wei
206 GPT NeoX Japanese Paper Feb, 2022 Autoregressive ABEJA by Shinya Otani, Takayoshi Makabe, Anuj Arora, and Kyo Hattori.  
207 LayoutLMv3 Paper Mar, 2022 Autoencoding Microsoft Research Asia Pre-training for Document AI with Unified Text and Image Masking by Yupan Huang, Tengchao Lv, Lei Cui, Yutong Lu, Furu Wei.
208 FLAN-UL2 Paper Apr, 2022 Autoregressive Google AI released in the repository google-research/t5x by Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Eric Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, Albert Webson, Shixiang Shane Gu, Zhuyun Dai, Mirac Suzgun, Xinyun Chen, Aakanksha Chowdhery, Sharan Narang, Gaurav Mishra, Adams Yu, Vincent Zhao, Yanping Huang, Andrew Dai, Hongkun Yu, Slav Petrov, Ed H. Chi, Jeff Dean, Jacob Devlin, Adam Roberts, Denny Zhou, Quoc V. Le, and Jason Wei
209 FLAVA Paper Apr, 2022 Autoencoding Facebook AI A Foundational Language And Vision Alignment Model by Amanpreet Singh, Ronghang Hu, Vedanuj Goswami, Guillaume Couairon, Wojciech Galuba, Marcus Rohrbach, and Douwe Kiela.

Conclusion

I hope this article gave you an idea of the Transformer architecture, its variants and types, its birth chronology, and its creators. As we have seen, the Transformer architecture has been a game-changer in natural language processing and computer vision tasks. It has been instrumental in enabling breakthroughs in machine translation, language understanding, and image classification, among other fields.

There are many types of Transformers, such as autoregressive models like GPT, autoencoding models like BERT and its variants, and hybrid models that combine the strengths of both. Additionally, there are many variants of the Transformer architecture, such as XLNet, RoBERTa, and T5, each with its own contributions and improvements.

The Transformer’s birth chronology spans just a few years, from the original paper in 2017 to the latest models that are being developed today. Its creators include some of the most prominent names in the field of AI, such as Google, Facebook, and OpenAI.

As AI technology continues to evolve, we can expect more exciting developments in the field of Transformers, with even more powerful and sophisticated models that can tackle even more complex tasks. The Transformer architecture has shown us that there is still much to explore in the world of deep learning, and we can’t wait to see what the future holds.