Top Ten BERT Alternatives For NLU Projects

The last few years have witnessed wider adoption of the Transformer architecture in natural language processing (NLP) and natural language understanding (NLU). Bidirectional Encoder Representations from Transformers, or BERT, set new benchmarks for NLP when researchers at Google AI introduced it in 2018. The model has paved the way for newer, enhanced models.

Here is a compilation of the top ten alternatives to the popular language model BERT for natural language understanding (NLU) projects.

1| GPT-2 and GPT-3 by OpenAI

In 2019, OpenAI rolled out GPT-2, a Transformer-based language model with 1.5 billion parameters, trained on 8 million web pages. The model comes with a broad set of capabilities, including the ability to generate conditional synthetic text samples of good quality.


OpenAI launched GPT-3 as the successor to GPT-2 in 2020. GPT-3 is an autoregressive language model with 175 billion parameters, ten times more than any previous non-sparse language model. Thanks to its few-shot learning capability, the model can generate human-like text and even write code from minimal text prompts.
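
As a rough illustration of what a few-shot prompt can look like, the sketch below assembles a task description, a handful of worked examples and an unfinished query into one block of text. The `build_few_shot_prompt` helper and the `Input:`/`Output:` layout are assumptions made for this example, not a format prescribed by OpenAI.

```python
def build_few_shot_prompt(task_description, examples, query):
    """Assemble a plain-text prompt: instruction, worked examples, open query."""
    lines = [task_description, ""]
    for source, target in examples:
        lines.append(f"Input: {source}")
        lines.append(f"Output: {target}")
        lines.append("")
    # The last example is left unfinished so the model continues after "Output:".
    lines.append(f"Input: {query}")
    lines.append("Output:")
    return "\n".join(lines)


prompt = build_few_shot_prompt(
    "Translate English to French.",
    [("cheese", "fromage"), ("sea otter", "loutre de mer")],
    "peppermint",
)
print(prompt)
```

The resulting string would be sent to the model as-is; no weights are updated, and the worked examples alone tell the model which task to perform.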

2| XLNet by Carnegie Mellon University and Google

XLNet is a generalised autoregressive pretraining method that learns bidirectional contexts by maximising the expected likelihood over all permutations of the factorisation order. XLNet builds on Transformer-XL, which makes it well suited to language tasks involving long context. According to its developers, the model outperforms BERT on 20 tasks, including sentiment analysis, question answering, document ranking and natural language inference.
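
To make "permutations of the factorisation order" more concrete, here is a minimal sketch that enumerates every order for a three-token sequence and lists the contexts from which one token would be predicted. The helper name `contexts_for_order` and the toy sentence are assumptions for illustration; the real model samples orders rather than enumerating them.

```python
from itertools import permutations


def contexts_for_order(order):
    """Map each target position to the positions it may condition on."""
    contexts = {}
    for step, position in enumerate(order):
        # A token is predicted from the tokens that come before it in the
        # factorisation order, wherever they sit in the actual sentence.
        contexts[position] = sorted(order[:step])
    return contexts


tokens = ["New", "York", "City"]
orders = list(permutations(range(len(tokens))))

# Collect every distinct context in which position 1 ("York") is predicted.
york_contexts = {tuple(contexts_for_order(o)[1]) for o in orders}
for ctx in sorted(york_contexts):
    print([tokens[i] for i in ctx], "-> York")
```

Across all orders, "York" is predicted from no context, from its left neighbour, from its right neighbour and from both, which is how an autoregressive objective ends up seeing context on both sides.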

3| RoBERTa by Facebook

Developed by Facebook, RoBERTa, or Robustly Optimised BERT Pretraining Approach, is an optimised method for pretraining self-supervised NLP systems. The model builds on BERT's language masking strategy, in which the system learns to predict intentionally hidden sections of text within otherwise unannotated language examples. It also modifies key hyperparameters in BERT, removing BERT's next-sentence pretraining objective and training with much larger mini-batches and learning rates.
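
The idea of predicting intentionally hidden text can be sketched in a few lines. The example below is a simplified illustration, not RoBERTa's actual preprocessing: the `mask_tokens` helper, the `[MASK]` placeholder string and the 15% rate are assumptions chosen for the example, and it works on whole words rather than subword tokens.

```python
import random

MASK = "[MASK]"  # placeholder string chosen for this sketch


def mask_tokens(tokens, mask_rate=0.15, seed=None):
    """Hide a random subset of tokens; return the masked list and the answers."""
    if not tokens:
        return [], {}
    rng = random.Random(seed)
    n_to_mask = max(1, round(len(tokens) * mask_rate))
    positions = rng.sample(range(len(tokens)), n_to_mask)
    masked = list(tokens)
    labels = {}
    for pos in positions:
        labels[pos] = tokens[pos]  # what the model must recover
        masked[pos] = MASK
    return masked, labels


sentence = "the quick brown fox jumps over the lazy dog".split()
masked, labels = mask_tokens(sentence, seed=0)
print(masked)
print(labels)
```

Calling `mask_tokens` again with a different seed hides different positions, so the same unannotated sentence can yield many training examples.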

4| ALBERT by Google

ALBERT, or A Lite BERT for Self-Supervised Learning of Language Representations, is an enhanced version of BERT introduced by Google AI researchers. The model incorporates two parameter reduction techniques, factorised embedding parameterisation and cross-layer parameter sharing, to overcome major obstacles in scaling pre-trained models. As a result, it has significantly fewer parameters than a traditional BERT architecture. According to its developers, the success of ALBERT demonstrates the importance of identifying the aspects of a model that give rise to powerful contextual representations.
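
A quick back-of-the-envelope calculation shows why parameter reduction matters. The sketch below covers factorised embedding parameterisation, one of the two techniques described in the ALBERT paper: instead of one vocabulary-by-hidden matrix, the embedding is split into two smaller matrices. The function name and the sizes used (a 30,000-token vocabulary, hidden size 768, embedding size 128) are illustrative assumptions.

```python
def embedding_params(vocab_size, hidden_size, embedding_size=None):
    """Count embedding parameters, optionally with a factorised embedding."""
    if embedding_size is None:
        # BERT-style: one matrix of shape (vocab_size, hidden_size).
        return vocab_size * hidden_size
    # Factorised: (vocab_size, embedding_size) then (embedding_size, hidden_size).
    return vocab_size * embedding_size + embedding_size * hidden_size


V, H, E = 30_000, 768, 128
direct = embedding_params(V, H)
factorised = embedding_params(V, H, E)
print(direct)      # 23040000
print(factorised)  # 3938304
```

With these sizes the embedding block shrinks to roughly a sixth of its original parameter count, before any sharing of weights across layers is taken into account.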

5| DistilBERT by Hugging Face

DistilBERT is a distilled, general-purpose pre-trained version of BERT. It is 40% smaller and 60% faster than BERT while retaining 97% of its language understanding capabilities.
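
Distillation means training a small "student" model to imitate the output distribution of a larger "teacher". The sketch below shows only the generic soft-target term of knowledge distillation, under assumed names (`softmax`, `distillation_loss`) and an assumed temperature of 2.0; it is not DistilBERT's full training objective.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn logits into probabilities; a higher temperature gives a flatter result."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Cross-entropy between the softened teacher and student distributions."""
    teacher_probs = softmax(teacher_logits, temperature)
    student_probs = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(teacher_probs, student_probs))


teacher = [4.0, 1.0, 0.5]
close_student = [3.5, 1.2, 0.4]
far_student = [0.5, 4.0, 1.0]
print(distillation_loss(close_student, teacher))
print(distillation_loss(far_student, teacher))
```

The student whose logits resemble the teacher's receives the lower loss, which is the signal that pushes the smaller model towards the larger model's behaviour.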

6| StructBERT by Alibaba

Developed by researchers at Alibaba, StructBERT is an extended version of the traditional BERT model. StructBERT incorporates language structures into BERT pre-training by proposing two linearisation strategies. In addition to the existing masking strategy, StructBERT extends BERT by leveraging structural information, namely word-level ordering and sentence-level ordering. According to its developers, StructBERT advances the state-of-the-art results on a variety of NLU tasks, including the GLUE benchmark, the SNLI dataset and the SQuAD v1.1 question answering task.
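
The word-level ordering idea can be pictured as corrupting a sentence by shuffling a short run of words and asking the model to restore the original order. The following is a minimal sketch under stated assumptions: the `shuffle_span` helper is hypothetical, the span length of three words is an assumption, and real pre-training would operate on subword tokens alongside masking.

```python
import random


def shuffle_span(tokens, span_length=3, seed=None):
    """Shuffle one contiguous span; return the corrupted tokens and the answer."""
    if len(tokens) < span_length:
        return list(tokens), None
    rng = random.Random(seed)
    start = rng.randrange(len(tokens) - span_length + 1)
    original_span = tokens[start:start + span_length]
    shuffled = list(original_span)
    rng.shuffle(shuffled)  # may occasionally leave the order unchanged
    corrupted = tokens[:start] + shuffled + tokens[start + span_length:]
    # The training target is the original order of the shuffled span.
    return corrupted, (start, original_span)


sentence = "the model learns to put words back in order".split()
corrupted, target = shuffle_span(sentence, seed=1)
print(corrupted)
print(target)
```

A sentence-level counterpart would follow the same pattern, with whole sentences swapped instead of words.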

7| DeBERTa by Microsoft

DeBERTa, or Decoding-enhanced BERT with Disentangled Attention, is a Transformer-based neural language model that improves on the BERT and RoBERTa models using two novel techniques: a disentangled attention mechanism and an enhanced mask decoder. DeBERTa is pre-trained using masked language modelling (MLM).

8| Text-to-Text Transfer Transformer (T5) by Google

Text-to-Text Transfer Transformer (T5) is a unified framework that converts all text-based language problems into a text-to-text format. In contrast to BERT-style models, which can only output either a class label or a span of the input, T5 reframes every NLP task so that the input and output are always text strings. The text-to-text framework allows the use of the same model, loss function and hyperparameters on any NLP task, including machine translation, document summarisation, question answering and classification.
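
In practice, the text-to-text format means each task is identified by a short prefix on the input, and even a class label is written out as a word. The sketch below uses prefixes of the kind shown in the T5 paper; the `to_text_pair` helper and its task names are assumptions made for this example.

```python
def to_text_pair(task, source, target):
    """Render one labelled example as an (input text, target text) pair."""
    prefixes = {
        "translation": "translate English to German: ",
        "summarisation": "summarize: ",
        "acceptability": "cola sentence: ",
    }
    return prefixes[task] + source, str(target)


examples = [
    to_text_pair("translation", "That is good.", "Das ist gut."),
    to_text_pair("summarisation", "<long news article text>", "<short summary text>"),
    to_text_pair("acceptability", "The course is jumping well.", "not acceptable"),
]
for model_input, model_target in examples:
    print(repr(model_input), "->", repr(model_target))
```

Because every pair is just two strings, translation, summarisation and classification can all be fed through the same model and the same loss.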

9| UniLM by Microsoft

Developed by Microsoft, UniLM, or Unified Language Model, is pre-trained using three types of language modelling tasks: unidirectional, bidirectional and sequence-to-sequence prediction. The unified modelling is achieved by employing a shared Transformer network and using specific self-attention masks to control what context each prediction conditions on. The model can be fine-tuned for both natural language understanding and generation tasks. UniLM achieved state-of-the-art results on five natural language generation datasets, including an improved ROUGE-L score on CNN/DailyMail abstractive summarisation.
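
The role of the self-attention masks is easiest to see by writing them out. In the sketch below, `mask[i][j] == 1` means that position `i` may attend to position `j`; the function names and this 0/1 convention are assumptions for illustration rather than UniLM's actual code.

```python
def bidirectional_mask(n):
    """Every token sees every other token, as in BERT."""
    return [[1] * n for _ in range(n)]


def left_to_right_mask(n):
    """Each token sees itself and the tokens to its left only."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]


def seq2seq_mask(source_len, target_len):
    """Source tokens see the whole source; target tokens see source plus left context."""
    n = source_len + target_len
    mask = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if j < source_len:
                mask[i][j] = 1  # the source segment is visible to everyone
            elif i >= source_len and j <= i:
                mask[i][j] = 1  # target tokens see earlier target tokens and themselves
    return mask


for row in seq2seq_mask(2, 2):
    print(row)
```

The same Transformer weights are used in all three cases; only the mask changes, which is what lets one network serve both understanding and generation tasks.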

10| Reformer by Google

Reformer is a Transformer model designed to handle context windows of up to one million words, all on a single accelerator. Introduced by Google AI researchers, the model uses only 16GB of memory and combines two fundamental techniques, locality-sensitive hashing and reversible residual layers, to solve the problems of attention and memory allocation that limit the application of Transformers to long context windows.
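
One of the two techniques described in the Reformer paper is locality-sensitive hashing (LSH), which groups similar vectors into buckets so that attention only needs to be computed within a bucket rather than across every pair of positions. The sketch below shows the bucketing step with random projections in pure Python; the helper names, the vector size and the bucket count are assumptions for illustration, and a real implementation would run on batched tensors.

```python
import random


def make_projections(dim, n_buckets, seed=0):
    """Random matrix of shape (dim, n_buckets // 2) used for hashing."""
    assert n_buckets % 2 == 0, "n_buckets must be even"
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(n_buckets // 2)] for _ in range(dim)]


def lsh_bucket(vector, projections):
    """Bucket id = argmax over the projected vector and its negation."""
    half = len(projections[0])
    projected = [
        sum(vector[d] * projections[d][k] for d in range(len(vector)))
        for k in range(half)
    ]
    scores = projected + [-p for p in projected]
    return max(range(len(scores)), key=scores.__getitem__)


N_BUCKETS = 8
projections = make_projections(dim=4, n_buckets=N_BUCKETS)
query = [0.9, -0.2, 0.4, 0.1]
same_direction = [2 * x for x in query]
opposite = [-x for x in query]
print(lsh_bucket(query, projections))
print(lsh_bucket(same_direction, projections))
print(lsh_bucket(opposite, projections))
```

Vectors pointing in the same direction land in the same bucket while opposite vectors do not, so restricting attention to a bucket keeps the most relevant pairs and skips the rest.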

Ambika Choudhury
A Technical Journalist who loves writing about Machine Learning and Artificial Intelligence. A lover of music, writing and learning something out of the box.