Top Language Models Released In 2021

The most recent advances in language modelling are described in the research papers accompanying each release.

Language models are essential when developing natural language processing (NLP) applications. However, building sophisticated NLP language models from scratch is a time-consuming process.

The top language models for the year 2021 are listed below.

AfriBERTa

AfriBERTa is a multilingual language model pre-trained on data from 11 African languages totalling less than 1 GB. The researchers demonstrate that the model is competitive with models pre-trained on much larger datasets and even outperforms them on certain languages. Their extensive experiments also highlight critical considerations when pretraining multilingual language models on limited data, and they discuss the practical advantages of capable language models trained on small datasets. Finally, the researchers release the source code, pretrained models, and dataset to spur further research on multilingual language models for low-resource languages.

For additional details, refer to the article here.

ByT5

ByT5, a token-free variant of multilingual T5, streamlines the natural language processing (NLP) pipeline by eliminating the need for vocabulary construction, text preprocessing, and tokenisation. Despite operating directly on raw bytes, ByT5 is competitive with parameter-matched mT5 models that use the SentencePiece vocabulary on downstream task quality.

ByT5 outperforms mT5 in four distinct scenarios: 

(1) for model sizes less than 1 billion parameters, 

(2) for generative tasks, 

(3) for multilingual tasks using in-language labels, and 

(4) for tasks involving various kinds of noise.
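The token-free pipeline can be sketched in a few lines of Python. This is an illustrative sketch, not ByT5's actual tokenizer code: text maps directly to UTF-8 byte values, with the first few IDs reserved for special tokens (here assumed to be pad, EOS, and unknown, following the convention described in the ByT5 paper).

```python
# Minimal sketch of ByT5-style byte-level "tokenisation": no learned
# vocabulary, no preprocessing -- text maps directly to UTF-8 byte values.
# IDs 0-2 are assumed reserved (0 = pad, 1 = EOS, 2 = UNK), so raw byte
# values are offset by 3.

SPECIAL_OFFSET = 3

def encode(text: str) -> list[int]:
    """Map a string to byte-level token IDs (works for any language)."""
    return [b + SPECIAL_OFFSET for b in text.encode("utf-8")]

def decode(ids: list[int]) -> str:
    """Invert encode(), skipping reserved special-token IDs."""
    return bytes(i - SPECIAL_OFFSET for i in ids if i >= SPECIAL_OFFSET).decode("utf-8")

ids = encode("héllo")
print(ids)          # one ID per UTF-8 byte; "é" occupies two bytes
print(decode(ids))  # round-trips back to the original string
```

Because every string decomposes into at most 256 byte values, this scheme needs no vocabulary generation at all, which is the simplification the ByT5 authors exploit.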

For additional details, refer to the article here.

VinVL

The researchers demonstrated how to pre-train an object detection (OD) model for vision-language (VL) tasks using an improved recipe. The new model is larger and better optimised for VL tasks.

The researchers validate the new model through a large-scale empirical study. The results demonstrate that the new OD model can significantly improve on state-of-the-art (SoTA) results, and the study shows that the improvement is primarily due to the model's design choices.

For additional details, refer to the article here.

FLAN

Google AI released its new NLP model, Fine-tuned LAnguage Net (FLAN), which examines a simple technique called instruction fine-tuning, or instruction tuning for short.

FLAN is fine-tuned on a large collection of varied tasks, each phrased as a simple, intuitive instruction describing the task. Creating such a collection of instructions from scratch would require substantial resources, so instead FLAN uses templates to convert existing datasets into an instructional format.
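The template-based conversion can be sketched as below. The templates and label wordings here are illustrative stand-ins, not FLAN's actual ones; the idea is that one labelled NLI example plus a template yields one natural-language (instruction, target) training pair.

```python
# Sketch of FLAN-style instruction-tuning data construction: existing
# labelled datasets are rewritten into natural-language instructions via
# templates. Templates and label verbalisations below are illustrative.

NLI_TEMPLATES = [
    "Premise: {premise}\nHypothesis: {hypothesis}\nDoes the premise entail the hypothesis?",
    "{premise}\nBased on the paragraph above, can we conclude that \"{hypothesis}\"?",
]

# Map integer NLI labels to natural-language answers.
LABELS = {0: "yes", 1: "it is not possible to tell", 2: "no"}

def to_instruction(example: dict, template_id: int = 0) -> dict:
    """Convert one labelled NLI example into an (instruction, target) pair."""
    prompt = NLI_TEMPLATES[template_id].format(
        premise=example["premise"], hypothesis=example["hypothesis"]
    )
    return {"input": prompt, "target": LABELS[example["label"]]}

example = {"premise": "A dog is running in the park.",
           "hypothesis": "An animal is outdoors.",
           "label": 0}
print(to_instruction(example)["target"])  # "yes"
```

Using several templates per dataset, as sketched with the `template_id` argument, varies the surface form of the instructions so the model learns the task rather than one fixed prompt wording.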

For additional details, refer to the article here.

LEXFIT

LEXFIT is a procedure for fine-tuning pretrained language models such as BERT into effective decontextualised word encoders via dual-encoder architectures. The experiments showed that LEXFIT can enrich the linguistic knowledge already stored in pretrained LMs with (even small amounts of) external lexical knowledge through additional, inexpensive LEXFITing. Furthermore, the researchers successfully applied LEXFIT to languages lacking human-curated external lexical knowledge. In controlled evaluations, LEXFIT word embeddings (WEs) outperform “conventional” static WEs (e.g., fastText) across a spectrum of lexical tasks in a variety of languages, directly calling into question the practical utility of standard WE models in modern NLP.
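How decontextualised word embeddings are consumed downstream can be sketched as follows. The vectors here are made-up toy values for illustration; in a real setup each word would be encoded once by the LEXFIT-tuned dual encoder, and lexical tasks such as word similarity then reduce to vector comparisons.

```python
import math

# Toy sketch of a lexical-similarity task using static, decontextualised
# word embeddings (the kind a LEXFIT-style dual encoder produces).
# The 3-d vectors below are invented for illustration only.

EMBEDDINGS = {
    "car":  [0.9, 0.1, 0.2],
    "auto": [0.85, 0.15, 0.25],
    "fish": [0.1, 0.9, 0.3],
}

def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def word_similarity(w1: str, w2: str) -> float:
    """Score a word pair by comparing their static embeddings."""
    return cosine(EMBEDDINGS[w1], EMBEDDINGS[w2])

# Near-synonyms should score higher than unrelated words.
print(word_similarity("car", "auto") > word_similarity("car", "fish"))  # True
```

Because each word maps to a single fixed vector, no model forward pass is needed at query time, which is what makes such static WEs attractive despite coming from a large pretrained LM.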

For additional details, refer to the article here.

ERNIE 3.0

A Baidu research team published a paper on version 3.0 of Enhanced Representation through kNowledge IntEgration (ERNIE), a deep-learning model for NLP. The model comprises 10 billion parameters and outperformed the human baseline score on the SuperGLUE benchmark.

Unlike most deep-learning NLP models, which are trained exclusively on unstructured text, ERNIE's training data also incorporates structured knowledge-graph data. The model consists of a Transformer-XL “backbone” that encodes the input into a latent representation, plus two distinct decoder networks. Along with establishing a new top score on SuperGLUE, ERNIE set new state-of-the-art scores on 54 Chinese-language NLP tasks.

For additional details, refer to the article here.

Dr. Nivash Jeevanandam
Nivash holds a doctorate in information technology and has been a research associate at a university and a development engineer in the IT industry. Data science and machine learning excite him.
