What if you didn’t need English to translate? Meta’s new open-source AI model, ‘NLLB-200’, can translate between 200 languages without using English as an intermediary!
“Communicating across languages is one superpower that AI provides, but as we keep advancing our AI work it’s improving everything we do—from showing the most interesting content on Facebook and Instagram, to recommending more relevant ads, to keeping our services safe for everyone”, says Mark Zuckerberg, CEO, Meta.
Accessibility through language ensures that the benefits of the advancement of technology reach everyone, no matter what language they may speak.
Tech companies are taking a proactive role in bridging this gap. Machine translation, for instance, is an area of AI research that Meta focuses on. Following the announcement that it is building a ‘universal speech translator’, Meta unveiled its open-source AI model, ‘No Language Left Behind’ (NLLB-200), which it says provides high-quality translations across 200 different languages, validated through extensive evaluations.
The tech giant has also built a dataset, ‘FLORES-200’, to assess NLLB-200’s performance and demonstrate that high-quality translations are provided.
In terms of quality, Meta claims that NLLB-200’s translations are, on average, 44 per cent better than those of its previous model. The model was trained using Meta’s new AI supercomputer, Research SuperCluster.
In a demonstration of its reach, Meta states that some of the languages NLLB-200 translates, such as Kamba and Lao, are not supported well, or at all, by even the best translation tools currently in use.
NLLB-200 supports 55 African languages with high-quality results, adds Meta.
Comparison with other models
Currently, Meta is actively involved in machine translation. In 2018, it open sourced the Language-Agnostic SEntence Representations (LASER) toolkit, which accommodates 90 languages written in 28 different alphabets.
In 2020, Meta unveiled a host of models in machine translation.
M2M-100 was a milestone
M2M-100 was Meta’s first multilingual machine translation (MMT) model that could translate between any pair of 100 languages without using English as an intermediary, and it was also open sourced. According to Meta, the MMT model is trained on a total of 2,200 language directions, ten times more than its English-centric multilingual models. This improves the quality of translations for speakers of low-resource languages and, thereby, their access to information and other content.
FLORES-101 was an earlier Meta initiative aimed at low-resource languages. It is a many-to-many evaluation dataset that covers 101 languages from around the world. FLORES-101 focuses on low-resource languages, such as Amharic, Mongolian and Urdu, that lack datasets for broader NLP research.
Meta claimed that FLORES-101 lets researchers reliably measure translation quality across 10,100 different translation directions.
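The 10,100 figure follows from counting ordered source-target pairs among 101 languages. The sketch below is an illustration only: the function names are hypothetical, and the English-centric count assumes a model that covers 100 languages and translates only into and out of English, which Meta's announcement as quoted here does not spell out.

```python
def many_to_many_directions(num_languages: int) -> int:
    """Count ordered (source, target) pairs where source != target."""
    return num_languages * (num_languages - 1)


def english_centric_directions(num_languages: int) -> int:
    """Count directions into and out of English only.

    Assumption for illustration: English is one of the languages.
    """
    return 2 * (num_languages - 1)


# FLORES-101: every language paired with every other language.
print(many_to_many_directions(101))  # 10100

# Hypothetical comparison over 100 languages.
print(many_to_many_directions(100))     # 9900
print(english_centric_directions(100))  # 198
```

Under that assumption, 198 English-centric directions against the 2,200 reported for M2M-100 is roughly in line with Meta's "ten times more" comparison.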
Google Translate has been in existence since 2006. It started with two languages and now supports 133. In fact, Google added 24 more languages to Translate recently. In the paper titled “Building machine translation systems for the next thousand languages”, researchers describe how they built high-quality monolingual datasets for over 1,000 languages that do not have translation datasets available, and demonstrate how monolingual data alone can be used to train MT models. For the newly added languages, Google created monolingual datasets by developing and using specialised neural language identification models combined with novel filtering approaches.
Google also revealed that adding these new languages is a technical milestone for the company. These are the first languages added using Zero-Shot Machine Translation, where a machine learning model only sees monolingual text and learns to translate into another language without ever seeing an example translation. In March of 2021, Google Translate on Android hit one billion downloads from the Google Play Store.
Microsoft has always been a forerunner in terms of new and upcoming tech. Machine translation systems were first developed by Microsoft Research two decades ago. Back then, the system could translate the entire Microsoft Knowledge Base from English to Spanish, French, German and Japanese. This translated version was then published, “making it the largest public-facing application of raw machine translation on the internet at the time”, claims Microsoft.
Currently, Microsoft Translator supports 103 languages.
With the advancement in AI research, the tech mammoth adopted neural machine translation (NMT) technology and migrated its machine translation systems to neural models based on transformer technology. Later, by using a multilingual transformer architecture, the company could augment training data with material from other languages, often in the same or a related language family, to produce models for languages with only small amounts of data, that is, low-resource languages.
Amazon Translate is a neural machine translation service as well.
In 2019, it added support for 22 new languages, bringing the total to 54 languages and dialects. In 2020, it added 16 more, and it now supports 71 languages and variants, along with 4,970 translation combinations.
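The 4,970 figure matches the number of ordered source-target pairs that 71 languages can form. As a minimal sketch, the snippet below enumerates the pairs for a three-language sample and then counts them for 71 placeholder entries. The helper name and the sample codes are illustrative assumptions, not part of Amazon Translate's API.

```python
from itertools import permutations


def translation_pairs(languages):
    """Return every ordered (source, target) pair of distinct languages."""
    return list(permutations(languages, 2))


# Small sample so the pairs can be read off directly.
sample = ["en", "es", "hi"]
for source, target in translation_pairs(sample):
    print(f"{source} -> {target}")

# 71 placeholder entries stand in for the supported languages and variants.
count = len(translation_pairs(range(71)))
print(count)  # 4970
```

Because each language pairs with every other language in both directions, the number of combinations grows roughly with the square of the number of languages supported.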