With the advent of transformer-based machine translation models, researchers have achieved state-of-the-art performance in natural language processing (NLP). In 2018, Google open-sourced its groundbreaking NLP pre-training technique, Bidirectional Encoder Representations from Transformers, or BERT. With this model, one can train a state-of-the-art NLP model in a few hours on a single GPU or a single Cloud TPU. The power of BERT lies in the fact that it can easily be fine-tuned on specific downstream tasks to achieve state-of-the-art results.
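The fine-tuning step described above can be sketched with the open-source Hugging Face `transformers` library (one common implementation, not Google's original release). This is a minimal single-step sketch; the toy texts, labels and learning rate are illustrative assumptions, not values from the project.

```python
# Minimal fine-tuning sketch: one gradient step on a toy binary task.
# Assumes `torch` and `transformers` are installed; data is illustrative.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-cased", num_labels=2)  # e.g. binary sentiment

texts = ["great movie", "terrible plot"]          # toy examples
labels = torch.tensor([1, 0])
batch = tokenizer(texts, padding=True, return_tensors="pt")

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
outputs = model(**batch, labels=labels)  # loss is computed internally
outputs.loss.backward()                  # backpropagate through BERT
optimizer.step()                         # update all weights
```

In practice one would loop over a full labelled dataset for a few epochs, which is why fine-tuning finishes in hours rather than the days needed for pre-training.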
Driven by the potential of BERT, developers working in the NLP domain have generated a number of BERT models that are fine-tuned for and trained on a specific language, and tested on a certain data domain and task. Recently, the researchers at Google released a multilingual language model known as multilingual BERT, or mBERT, which supports more than 100 languages. The language-specific models, by contrast, are often trained on particular domains, such as social media posts or newspaper articles. The main aim of this project is to provide a quick and easy overview of the similarities and differences between language-specific BERT models and the multilingual BERT model.
Multilingual BERT is a single language model pre-trained on monolingual Wikipedia corpora in 104 languages. The model allows for zero-shot learning across languages: one can train a model on data in one language and then apply it to data in another. This model has obtained impressive results on a zero-shot cross-lingual natural language inference task.
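The zero-shot idea rests on mBERT encoding all 104 languages with one shared vocabulary and encoder, so a classifier trained on [CLS] embeddings of English sentences can be applied directly to other languages. A minimal sketch, assuming the Hugging Face `transformers` library; the example sentences are illustrative:

```python
# Sketch of the shared-encoder property behind zero-shot transfer.
# Assumes `torch` and `transformers` are installed.
from transformers import AutoModel, AutoTokenizer

# One tokenizer and one encoder cover all 104 languages.
tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModel.from_pretrained("bert-base-multilingual-cased")

sentences = [
    "The weather is nice today.",   # English
    "Oggi il tempo è bello.",       # Italian
]
for text in sentences:
    inputs = tokenizer(text, return_tensors="pt")
    outputs = model(**inputs)
    # The [CLS] vector lives in the same space for every language, so a
    # task head trained only on English can score the Italian input too.
    cls_embedding = outputs.last_hidden_state[:, 0, :]
    print(text, tuple(cls_embedding.shape))
```

Because both sentences pass through identical weights, no translation or per-language retraining is needed at inference time.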
BERT has further been extended to cover several languages. Besides Multilingual BERT (mBERT), a number of BERT variants are available, such as A Lite BERT (ALBERT), RoBERTa, etc. According to the researchers, while the multi- and cross-lingual BERT representations allow zero-shot learning and capture universal semantic structures, they gloss over language-specific differences. This is the main reason behind the development of the many language-specific BERT models.
To navigate the constant development of these BERT models, the researchers introduced a website called BertLang. On this website, they have gathered different language-specific models that have been introduced for a variety of tasks and datasets.
The BertLang website currently includes 30 BERT-based models, 18 languages and 28 tasks. According to the researchers, this website is a collaborative resource to help researchers understand and find the best BERT model for a given dataset, task and language. The website provides a searchable interface as well as the possibility to add new information.
With the impressive progress of NLP techniques across domains, the number of language-specific BERT models developed by NLP researchers has grown, but which model provides the best performance remains unclear. In this project, the researchers evaluated the potential of mBERT as a universal language model by comparing its performance with that of language-specific models. The BertLang website will help in evaluating the pros and cons of each language-specific model along different dimensions, such as architecture, data domain and task.
The contributions of this project are as follows:
- An overall picture of language-specific BERT models from an architectural, task- and domain-related point of view has been presented.
- The researchers summarised the performance of language-specific BERT models and compared it with that of the multilingual BERT model.
- A new website, BertLang, is introduced to interactively explore the state-of-the-art models.
Read the paper here.