
BERT Is So Popular That Google Has To Release A Website To Collate All Developments

With the advent of transformer-based models, first introduced for machine translation, researchers have achieved state-of-the-art performance across natural language processing (NLP). In 2018, Google open-sourced its groundbreaking technique for NLP pre-training, called Bidirectional Encoder Representations from Transformers, or BERT. With the help of this model, one can train a state-of-the-art NLP model in a few hours using a single GPU or a single Cloud TPU. The power of this model lies in the fact that BERT can be easily fine-tuned on specific downstream tasks to achieve state-of-the-art results.
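The fine-tuning recipe described above can be sketched in miniature: a pretrained encoder is kept (mostly) frozen while a small task head is trained on the downstream data. The NumPy sketch below is purely illustrative — the random projection stands in for a pretrained BERT encoder, and all data and numbers are synthetic.

```python
import numpy as np

# Toy sketch of fine-tuning a task head on top of a frozen pretrained
# encoder. NOTE: the "encoder" here is a fixed random projection standing
# in for BERT, and the dataset is synthetic — illustration only.

rng = np.random.default_rng(1)

# Frozen "pretrained encoder": fixed during fine-tuning.
W_enc = rng.standard_normal((8, 4))

def encode(x):
    # Stand-in for a pretrained feature extractor; weights never update.
    return np.tanh(x @ W_enc)

# Tiny synthetic downstream task: binary classification of 8-dim inputs.
X = rng.standard_normal((200, 8))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

# Task head: logistic regression trained by gradient descent on the
# frozen encoder's features.
H = encode(X)
w, b = np.zeros(4), 0.0
lr = 0.5
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(H @ w + b)))  # sigmoid probabilities
    w -= lr * (H.T @ (p - y)) / len(y)      # gradient step on head only
    b -= lr * (p - y).mean()

acc = (((1.0 / (1.0 + np.exp(-(H @ w + b)))) > 0.5) == y).mean()
print(f"train accuracy after fitting the task head: {acc:.2f}")
```

In real BERT fine-tuning the encoder weights are usually updated too, with a small learning rate; the point of the sketch is only that a cheap, task-specific head plus a pretrained representation goes a long way.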

Driven by the potential of BERT models, developers working in the NLP domain have produced a number of BERT models that are fine-tuned and trained on a specific language, and tested on a particular data domain and task. Google also released a multilingual language model, known as multilingual BERT or mBERT, which supports more than 100 languages, while many language-specific models are trained on particular domains, such as social media posts or newspaper articles. The main aim behind this project is to provide a quick and easy overview of the similarities and differences between language-specific BERT models and the multilingual BERT model.


Behind mBERT

Multilingual BERT is a single language model pre-trained from monolingual Wikipedia corpora in 104 languages. The model allows for zero-shot learning across languages, which means one can train a model on data in a particular language and then apply it to data in some other language. As a result, this model obtained impressive results on a zero-shot cross-lingual natural language inference task.
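The zero-shot idea rests on one property: a multilingual encoder maps sentences from different languages into a shared vector space, so a classifier trained on one language's vectors can be applied to another's. The toy NumPy sketch below illustrates only that mechanism — the "embeddings" are synthetic cluster samples, not real mBERT outputs.

```python
import numpy as np

# Toy sketch of zero-shot cross-lingual transfer. Assumption being
# illustrated: an encoder like mBERT places sentences with the same
# meaning near each other in one shared space, regardless of language.
# All vectors here are synthetic stand-ins, not real model outputs.

rng = np.random.default_rng(0)

# Each "meaning" (class 0 or 1) has a centre in the shared space;
# encoding a sentence in ANY language yields a noisy sample around it.
centres = np.array([[1.0, 1.0, 0.0], [-1.0, -1.0, 0.0]])

def encode(labels, noise=0.2):
    """Fake 'sentence embeddings' for the given class labels."""
    return centres[labels] + noise * rng.standard_normal((len(labels), 3))

# "Train" on language A (e.g. English): estimate class centroids from
# labelled vectors in the shared space.
train_labels = np.array([0, 1] * 50)
train_vecs = encode(train_labels)
est_centres = np.stack(
    [train_vecs[train_labels == c].mean(axis=0) for c in (0, 1)]
)

# Zero-shot "evaluation" on language B: same meanings, same space,
# but no labels from language B were ever seen during training.
test_labels = np.array([0, 1] * 10)
test_vecs = encode(test_labels)  # stand-in for encoding language-B text
dists = np.linalg.norm(test_vecs[:, None, :] - est_centres[None, :, :], axis=2)
preds = dists.argmin(axis=1)

accuracy = (preds == test_labels).mean()
print(f"zero-shot accuracy on the 'other' language: {accuracy:.2f}")
```

The transfer works here precisely because both "languages" share one embedding space; as the researchers note, such shared representations can also gloss over language-specific differences, which is what motivates language-specific models.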

BERT has further been extended and adapted in several directions. Besides Multilingual BERT (mBERT), a number of variants are available, such as A Lite BERT (ALBERT) and RoBERTa. According to the researchers, while the multi- and cross-lingual BERT representations allow zero-shot learning and capture universal semantic structures, they do gloss over language-specific differences. This is the main reason behind the development of the many language-specific BERT models.

To help navigate the constant development of these BERT models, the researchers introduced a website called BertLang. On this website, the researchers have gathered different language-specific models, which have been evaluated on a variety of tasks and datasets.

BertLang Street

The BertLang website currently covers 30 BERT-based models, 18 languages and 28 tasks. According to the researchers, the website is a collaborative resource to help researchers understand and find the best BERT model for a given dataset, task and language. It provides a searchable interface as well as the ability to add new information.

Wrapping Up

With the impressive progress of NLP techniques across various domains, the number of language-specific BERT models developed by NLP researchers has grown steadily, but which model provides the best performance often remains unclear. In this project, the researchers evaluated the potential of mBERT as a universal language model by comparing it against the performance of the language-specific models. The BertLang Street website provided by the researchers will help in evaluating the pros and cons of each language-specific model along different dimensions, such as architecture, data domain and task.

The contributions of this project are as follows:

  • An overall picture of language-specific BERT models from an architectural, task- and domain-related point of view has been presented.
  • The researchers summarised the performance of language-specific BERT models and compared it with that of the multilingual BERT model.
  • A new website known as BERT Lang Street is introduced to interactively explore the state-of-the-art models.

Read the paper here.


Ambika Choudhury
A Technical Journalist who loves writing about Machine Learning and Artificial Intelligence. A lover of music, writing and learning something out of the box.

