Neural Machine Translation (NMT), one of the most important topics in deep learning, has gained much attention from industry and academia over the last few years. In an effort to distil complex systems into simpler models, tech giant Google has been innovating in human-to-machine and machine-to-human translation for quite a few years now.
Back in 2017, the tech giant introduced a solution that uses a single Neural Machine Translation (NMT) model to translate between multiple languages, merging 12 language pairs into one model. The researchers categorised multilingual NMT models into three types: many-to-one, one-to-many and many-to-many.
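That 2017 system steered a single model across language pairs by prepending an artificial token to the source sentence indicating the desired target language. The sketch below illustrates the idea only; the token format and helper names are assumptions, not the exact implementation.

```python
def add_target_token(source_sentence: str, target_lang: str) -> str:
    """Prepend an artificial target-language token (e.g. <2es>) so a single
    multilingual NMT model knows which language to translate into.
    The token format here is illustrative."""
    return f"<2{target_lang}> {source_sentence}"

# Many-to-many training data: every direction shares one model,
# differing only in the prepended target token.
examples = [
    (add_target_token("How are you?", "es"), "¿Cómo estás?"),
    (add_target_token("¿Cómo estás?", "en"), "How are you?"),
    (add_target_token("How are you?", "de"), "Wie geht es dir?"),
]
```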
Recently, researchers on the Google AI team built a more advanced neural machine translation (NMT) system and published a paper titled “Massively Multilingual Neural Machine Translation in the Wild: Findings and Challenges”. They built a single massively multilingual NMT model that handles 103 languages.

The researchers used a massive open-domain dataset containing over 25 billion parallel sentences in 103 languages. The goal is to enable a single model to translate between any pair of these languages. To accomplish this, they model a massively multi-way input-output mapping task under strong constraints: a huge number of languages, different writing systems, and heavy data imbalance across languages and domains.
The desired features of this universal Neural Machine Translation (NMT) are mentioned below:
- Maximum throughput in terms of the number of languages considered within a single model.
- Maximum inductive (positive) transfer towards low-resource languages.
- Minimum interference (negative transfer) for high-resource languages.
- Robust multilingual NMT models that perform well in realistic, open-domain settings.
How It Is Different
The researchers at the tech giant claim this is the largest multilingual NMT system to date in terms of the amount of training data and the number of languages covered. That said, the underlying concept is not new; the system is best viewed as an advanced version of the previously proposed multilingual model.
This state-of-the-art model can be used in a one-to-many setting, i.e. one model serving many target languages, which reduces both training and serving costs. In a blog post, Graham Neubig, an Assistant Professor at the Language Technologies Institute of Carnegie Mellon University, said that through this research machine translation researchers and practitioners can gain insights on a number of points, such as the importance of large models and the effect of techniques for choosing how large to make the model vocabularies.
Challenges
- Data and Supervision: The model is limited to 103 languages, a minuscule fraction of the thousands of existing languages, and the approach becomes harder to apply as more languages are included.
- Learning: The heuristic strategy of this model only takes dataset size into account when determining the fraction of per-task samples seen by the model (see the sampling sketch after this list).
- Increasing Capacity: Large multitask networks need sufficient model capacity, and the researchers faced significant trainability challenges when training deep, high-capacity neural networks.
- Architecture and Vocabulary: As the researchers scale up to thousands of languages, vocabulary handling becomes a significantly harder challenge.
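A common heuristic of the kind described in the Learning challenge is temperature-based sampling: each language pair is sampled in proportion to its dataset size raised to the power 1/T, which dampens the dominance of high-resource pairs. The sketch below is a rough illustration under that assumption; the temperature value and dataset sizes are made up for the example.

```python
import random

def sampling_probs(dataset_sizes: dict, temperature: float = 5.0) -> dict:
    """Compute per-language-pair sampling probabilities from dataset sizes.
    temperature=1 reproduces the raw data distribution; higher values move
    towards uniform sampling, giving low-resource pairs more exposure."""
    weights = {pair: size ** (1.0 / temperature) for pair, size in dataset_sizes.items()}
    total = sum(weights.values())
    return {pair: w / total for pair, w in weights.items()}

# Illustrative sizes (number of parallel sentences) for three pairs.
sizes = {"en-fr": 300_000_000, "en-hi": 5_000_000, "en-yo": 200_000}
probs = sampling_probs(sizes, temperature=5.0)

# Draw the next training example's language pair according to these probabilities.
pair = random.choices(list(probs), weights=list(probs.values()), k=1)[0]
```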
Wrapping Up
There are several benefits to using multilingual models. A carefully designed multilingual model can handle all translation directions within a single model, which not only reduces operational costs but also improves performance on low- and zero-resource language pairs and simplifies deployment in production systems.
Read the paper here.