Top Language Models Released In 2021

The most recent advances in language modelling, as described in the year's research papers.

Language models are central to developing natural language processing (NLP) applications. However, building complex NLP language models from scratch is a time-consuming, resource-intensive process.

The top language models for the year 2021 are listed below.


AfriBERTa is a multilingual language model pre-trained on data from 11 African languages totalling less than 1 GB. The researchers demonstrate that this model is competitive with models pre-trained on much larger datasets and even outperforms them for certain languages. Their extensive experiments also highlight critical considerations when pretraining multilingual language models on limited data. More importantly, the researchers discuss the practical advantages of capable language models trained on small datasets. Finally, they release the source code, pretrained models, and dataset to spur additional research on multilingual language models for low-resource languages.

For additional details, refer to the article here.


ByT5, a token-free variant of multilingual T5 (mT5), streamlines the natural language processing (NLP) pipeline by obviating the need for vocabulary construction, text preprocessing, and tokenisation. Despite operating directly on raw bytes, ByT5 is competitive on downstream task quality with parameter-matched mT5 models that rely on the SentencePiece vocabulary.

ByT5 outperforms mT5 in four distinct scenarios: 

(1) for model sizes less than 1 billion parameters, 

(2) for generative tasks, 

(3) for multilingual tasks using in-language labels, and 

(4) for tasks including various sources of noise.
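The "token-free" idea can be illustrated with a minimal sketch: instead of a learned subword vocabulary, the vocabulary is just the 256 possible byte values plus a handful of special tokens. This is an illustrative toy, not the actual ByT5 tokenizer implementation; the special-token ids here are an assumption for the example.

```python
# Byte-level encoding in the spirit of ByT5: no vocabulary to learn,
# every UTF-8 byte maps to one token id, offset past the special tokens.
SPECIAL_TOKENS = {"<pad>": 0, "</s>": 1, "<unk>": 2}
OFFSET = len(SPECIAL_TOKENS)  # byte value b maps to id b + OFFSET

def encode(text: str) -> list[int]:
    """One token id per UTF-8 byte, followed by an end-of-sequence id."""
    return [b + OFFSET for b in text.encode("utf-8")] + [SPECIAL_TOKENS["</s>"]]

def decode(ids: list[int]) -> str:
    """Invert encode(), dropping special tokens."""
    data = bytes(i - OFFSET for i in ids if i >= OFFSET)
    return data.decode("utf-8", errors="ignore")

ids = encode("Héllo")
print(ids)          # seven ids: "é" takes two bytes in UTF-8, plus EOS
print(decode(ids))  # "Héllo"
```

Because the mapping is fixed, there is nothing to train or ship for tokenisation, which is exactly the pipeline simplification the paper claims.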

For additional details, refer to the article here.


The researchers demonstrated how to pre-train an object detection (OD) model for vision-language (VL) problems using a novel formulation. The new model is larger and better optimised for VL tasks.

The researchers validate the new model through a large-scale empirical investigation. The results demonstrate that the new OD model can significantly improve on state-of-the-art (SoTA) results. Furthermore, the study demonstrates that the improvement is primarily attributable to the design choices.

For additional details, refer to the article here.


Google AI released its new NLP model, known as Fine-tuned LAnguage Net (FLAN), which explores a simple technique called instruction fine-tuning, or instruction tuning for short.

FLAN is fine-tuned on a large collection of varied instructions that use simple and intuitive descriptions of each task. Writing such a collection of instructions from scratch to fine-tune the model would require significant resources, so the researchers instead use templates to convert existing datasets into an instructional format.
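The template idea can be sketched as follows. Existing labeled examples are slotted into several natural-language phrasings of the same task; the template wording and field names below are hypothetical, not FLAN's actual templates.

```python
# Sketch of instruction tuning's data preparation: existing labeled
# examples are rewritten into (instruction, target) pairs via templates.
TEMPLATES = [
    "Premise: {premise}\nHypothesis: {hypothesis}\nDoes the premise entail the hypothesis?",
    '{premise}\nBased on the sentence above, is it true that "{hypothesis}"?',
]

def to_instruction(example: dict, template: str) -> dict:
    """Convert one NLI example into an (instruction, target) pair."""
    return {"input": template.format(**example), "target": example["label"]}

example = {
    "premise": "A dog is running.",
    "hypothesis": "An animal is moving.",
    "label": "yes",
}
pair = to_instruction(example, TEMPLATES[0])
print(pair["input"])
print(pair["target"])  # "yes"
```

Using several templates per dataset gives the model many phrasings of each task, which is what lets it follow unseen instructions at inference time.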

For additional details, refer to the article here.


LEXFIT is a procedure for fine-tuning pretrained language models such as BERT into effective decontextualised word encoders via dual-encoder architectures. Experiments showed that LEXFIT can supplement the linguistic knowledge already stored in pretrained LMs with (even small amounts of) external lexical knowledge through additional, inexpensive LEXFITing. Furthermore, the researchers successfully applied LEXFIT to languages lacking human-curated external lexical knowledge. In controlled evaluations, the LEXFIT word embeddings (WEs) outperform "conventional" static WEs (e.g., fastText) across a spectrum of lexical tasks in a variety of languages, directly calling into question the practical utility of standard WE models in modern NLP.
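At inference time, a dual-encoder setup like this reduces to comparing two word vectors produced by the same encoder. The toy sketch below uses made-up 4-dimensional vectors in place of real LEXFIT embeddings, purely to show how lexical similarity is read off cosine similarity.

```python
import math

# Toy dual-encoder scoring: the same encoder maps each word to a vector,
# and lexical relatedness is the cosine similarity of the two vectors.
# These 4-d vectors are invented for illustration only.
EMBEDDINGS = {
    "car":  [0.9, 0.1, 0.0, 0.2],
    "auto": [0.8, 0.2, 0.1, 0.2],
    "tree": [0.0, 0.9, 0.3, 0.1],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def score(w1: str, w2: str) -> float:
    """Dual-encoder similarity: encode both words, compare with cosine."""
    return cosine(EMBEDDINGS[w1], EMBEDDINGS[w2])

print(score("car", "auto"))  # high: near-synonyms
print(score("car", "tree"))  # low: unrelated words
```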

For additional details, refer to the article here.


A Baidu research team published a report on the 3.0 edition of Enhanced Representation through kNowledge IntEgration (ERNIE), a deep-learning model for NLP. The model has 10 billion parameters and outperformed the human baseline score on the SuperGLUE benchmark.

Unlike most other deep-learning NLP models, which are trained exclusively on unstructured text, ERNIE's training data also incorporates structured knowledge-graph data. The model combines a Transformer-XL "backbone", which encodes the input into a latent representation, with two distinct decoder networks. Along with establishing a new top score on SuperGLUE, ERNIE set new state-of-the-art scores on 54 Chinese-language NLP tasks.
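One simple way to feed structured knowledge to a text model is to linearise knowledge-graph triples into token sequences alongside the sentence they support. The sketch below illustrates that general idea; the separators and format are invented for the example and are not ERNIE 3.0's actual input scheme.

```python
# Sketch of combining structured knowledge with unstructured text:
# a knowledge-graph triple (head, relation, tail) is linearised into
# a flat string and concatenated with the raw sentence.
def linearise_triple(head: str, relation: str, tail: str) -> str:
    """Flatten a KG triple into a token sequence."""
    return f"{head} [REL] {relation} [REL] {tail}"

def build_input(sentence: str, triple: tuple) -> str:
    """Concatenate linearised knowledge with the unstructured text."""
    return f"[KG] {linearise_triple(*triple)} [SEP] {sentence}"

text = "Andersen wrote fairy tales."
triple = ("Andersen", "notable_work", "The Little Mermaid")
print(build_input(text, triple))
```

The model then sees both the sentence and the facts that ground it, which is the intuition behind knowledge-enhanced pretraining.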

For additional details, refer to the article here.

Dr. Nivash Jeevanandam
Nivash holds a doctorate in information technology and has been a research associate at a university and a development engineer in the IT industry. Data science and machine learning excite him.
