What does the curved arrow in Amazon's logo signify? It points from A to Z, suggesting that you can get every product from a single platform. The same idea applies to machine translation, the production of text in one language from another.
To that end, Meta AI has introduced a new multilingual model that outperforms the current state-of-the-art bilingual models in 10 out of 14 language pairs and won the Conference on Machine Translation (WMT), a prestigious MT competition. The model is a step towards building a universal translation system.
The ultimate goal of the machine translation (MT) field is to create a universal translation system that will allow everyone to access information and communicate more effectively. However, some of the existing fundamental limitations need to be resolved for that vision to be a reality in the future.
Presently, many modern MT systems rely on bilingual models, which often require a large number of labelled examples for each language pair and task. Unfortunately, many languages, such as Icelandic and Hausa, have limited training data, so this approach falls short for them. The sheer complexity of maintaining so many models also makes it difficult for a platform like Facebook, where billions of users post every day in hundreds of languages, to scale present models to practical applications.
Meta to the rescue
According to the team at Meta, the MT field needs to shift from bilingual models towards multilingual translation, where a single model can translate many language pairs at once. Better multilingual models stand to benefit both low- and high-resource languages, as they are simple, scalable and efficient.
Last year, Facebook AI (now Meta) introduced M2M-100 as the first multilingual model to translate between any pair of 100 languages without relying on English-centric data. The team used different mining strategies to prepare a translation dataset of 7.5 billion sentences across 100 languages. The researchers then employed a variety of scaling strategies to create a universal model with 15 billion parameters that captures information from related languages and reflects a more diverse range of scripts and morphology. The model proves efficient for low-resource languages. However, it loses performance on high-resource languages.
Building on this previous model, the team made advances in three areas:
- large-scale data mining
- scaling model capacity
- more efficient infrastructure
To train the WMT 2021 model, the team built two multilingual systems: any language to English, and English to any language. They used parallel data mining techniques to build datasets such as CCMatrix, which the company claims is the largest dataset of web-based, high-quality bitexts for training translation models. CCMatrix is more than 50 times larger than the WikiMatrix corpus Facebook released earlier, with over 4.5 billion parallel sentences in 576 language pairs extracted from snapshots of the public CommonCrawl dataset.
Additionally, the model capacity was raised from 15 billion parameters to 52 billion. Large-scale training was made five times faster than for previous models by adding Fully Sharded Data-Parallel, a GPU memory-saving tool from Meta itself. Scaling model size, however, usually results in high computational costs. To keep these in check, the team says it used a Transformer architecture in which the FeedForward block in every alternate Transformer layer is replaced with a Sparsely Gated Mixture-of-Experts layer with top-2 gating, in both the encoder and the decoder. As a result, only a subset of the model’s parameters is used per input sequence.
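To make the idea of top-2 gating more concrete, here is a minimal pure-Python sketch of a sparsely gated Mixture-of-Experts layer. It is an illustration only, not Meta's implementation: the function names, the tiny dimensions, the random weights, and the choice to route a single token vector and renormalise the two gate weights are all assumptions made for this example.

```python
import math
import random

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(matrix, vec):
    """Multiply a matrix (stored as a list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]

def expert_ffn(x, w_in, w_out):
    """One expert: a small feed-forward block (linear, ReLU, linear)."""
    hidden = [max(0.0, h) for h in matvec(w_in, x)]
    return matvec(w_out, hidden)

def top2_moe(x, gate_w, experts):
    """Route x to its two highest-scoring experts and mix their outputs."""
    probs = softmax(matvec(gate_w, x))  # one gate score per expert
    top2 = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:2]
    norm = sum(probs[i] for i in top2)  # renormalise over the chosen pair
    out = [0.0] * len(x)
    for i in top2:                      # every other expert is never evaluated
        w_in, w_out = experts[i]
        y = expert_ffn(x, w_in, w_out)
        out = [o + (probs[i] / norm) * v for o, v in zip(out, y)]
    return out, top2

def random_matrix(rng, rows, cols):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

# Hypothetical toy sizes, chosen only to keep the example small.
rng = random.Random(0)
d_model, d_ff, n_experts = 4, 8, 6
gate_w = random_matrix(rng, n_experts, d_model)
experts = [(random_matrix(rng, d_ff, d_model), random_matrix(rng, d_model, d_ff))
           for _ in range(n_experts)]
token = [rng.uniform(-1, 1) for _ in range(d_model)]

output, chosen = top2_moe(token, gate_w, experts)
print("experts used:", chosen, "out of", n_experts)
```

In this sketch only two of the six experts run for a given input, so the total parameter count can grow with the number of experts while the compute per input stays roughly constant.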
Machine translation has made significant progress in breaking down barriers, but most of it has focused on a small number of commonly spoken languages. Low-resource translation remains MT’s “last mile” dilemma and the subfield’s biggest open challenge today.