
Google’s New AI Milestone: Neural Machine Translation Engine Can Now Translate 103 Languages

Neural Machine Translation (NMT), one of the most important topics in deep learning, has gained much attention from industry and academia over the last few years. In an effort to distil complex models into simpler ones, tech giant Google has been innovating in human-to-machine and machine-to-human translation for quite a few years now. 

Back in 2017, the tech giant introduced a solution that uses a single Neural Machine Translation (NMT) model to translate between multiple languages, merging 12 language pairs into one model. The researchers categorised multilingual NMT models into three types: many-to-one, one-to-many and many-to-many. 
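Google's 2017 work enabled a single model to cover all three settings with a simple trick: prepending an artificial token to the source sentence that tells the model which target language to produce. Below is a minimal sketch of that idea; the function name and token format are illustrative, not taken from Google's code.

```python
def tag_source(tokens, target_lang):
    """Prepend a target-language token so one shared encoder-decoder
    can translate into any requested language."""
    return [f"<2{target_lang}>"] + tokens

# One-to-many in practice: the same English source, routed to different
# target languages purely by the prepended token.
src = ["How", "are", "you", "?"]
print(tag_source(src, "es"))  # ['<2es>', 'How', 'are', 'you', '?']
print(tag_source(src, "ja"))  # ['<2ja>', 'How', 'are', 'you', '?']
```

Because the routing signal lives in the input rather than in the architecture, the same trained model handles many-to-one, one-to-many and many-to-many directions without any structural change.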

Recently, the researchers at the Google AI team built an enhanced system for neural machine translation (NMT) and published a paper titled “Massively Multilingual Neural Machine Translation in the Wild: Findings and Challenges”. The researchers built a single massively multilingual NMT model that handles 103 languages. 

The researchers used a massive open-domain dataset containing over 25 billion parallel sentences in 103 languages. The goal behind building this model is to enable a single model to translate between arbitrary language pairs. To accomplish this, the researchers model a massively multi-way input-output mapping task under strong constraints: a huge number of languages, different scripting systems, and heavy data imbalance across languages and domains.  

The desired features of this universal Neural Machine Translation (NMT) model are listed below:

  1. Maximum throughput in terms of the number of languages considered within a single model.
  2. Maximum inductive (positive) transfer towards low-resource languages.
  3. Minimum interference (negative transfer) for high-resource languages.
  4. Robust multilingual NMT models that perform well in realistic, open-domain settings. 

How It Is Different

The researchers at the tech giant claimed that this is the largest multilingual NMT system to date in terms of the amount of training data and the number of languages considered. That said, the underlying concept is not new; the model can be seen as an advanced version of the previously proposed multilingual system. 

This state-of-the-art model can be used in a one-to-many setting, i.e. one model serving many target languages, which reduces both training and serving costs. In a blog post, Graham Neubig, an Assistant Professor at the Language Technologies Institute of Carnegie Mellon University, said that through this research machine translation researchers and practitioners can gain insights on a number of points, such as the importance of large models and the effects of techniques for choosing how large to make the model vocabularies.    


The paper also highlights several open challenges:

  • Data and Supervision: The model covers 103 languages, a minuscule fraction of the thousands of languages in existence, and it will be harder to apply as more languages are included.
  • Learning: The heuristic sampling strategy takes only dataset size into account when determining the fraction of per-task samples seen by the model.  
  • Increasing Capacity: Large multitask networks need sufficient model capacity, and the researchers faced significant trainability challenges while training deep, high-capacity neural networks.
  • Architecture and Vocabulary: As the model scales up to thousands of languages, vocabulary handling becomes a significantly harder challenge.
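The sampling issue in the learning challenge above is commonly handled with temperature-based data sampling: each language pair's selection probability is its dataset size raised to the power 1/T, so higher temperatures flatten the distribution toward low-resource languages. The sketch below illustrates the general technique under assumed names; it is not the paper's exact scheme.

```python
def sampling_probs(sizes, temperature=5.0):
    """Map {language_pair: dataset_size} to sampling probabilities.

    With temperature=1.0 sampling is proportional to dataset size;
    larger temperatures upweight low-resource language pairs.
    """
    weights = {pair: n ** (1.0 / temperature) for pair, n in sizes.items()}
    total = sum(weights.values())
    return {pair: w / total for pair, w in weights.items()}

# Heavily imbalanced corpora: a high-resource and a low-resource pair.
sizes = {"en-fr": 1_000_000, "en-yo": 100}
print(sampling_probs(sizes, temperature=1.0))  # en-fr dominates
print(sampling_probs(sizes, temperature=5.0))  # much flatter split
```

The trade-off the paper's challenge points at is visible here: flattening helps low-resource pairs see more updates, but it also means the model spends less of its capacity on the high-resource pairs.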

Wrapping Up

There are several benefits to multilingual models. A carefully designed multilingual model can handle all translation directions within a single model, which not only reduces operational costs but also improves performance on low- and zero-resource language pairs and simplifies deployment in production systems. 

Read the paper here

Ambika Choudhury
A Technical Journalist who loves writing about Machine Learning and Artificial Intelligence. A lover of music, writing and learning something out of the box.
