Meta AI Takes a Step Towards Building a Universal Translation System

What does the curved arrow in Amazon's logo signify? It suggests that you can get everything from A to Z on a single platform, which makes life easier. The same idea applies to a universal translation system: a single system that produces text in one language from any other.

To that end, Meta AI has introduced a new multilingual model that outperformed the current state-of-the-art bilingual models in 10 out of 14 language pairs and won at the Conference on Machine Translation (WMT), a prestigious MT competition. The model is a step towards building a universal translation system.

The Bottleneck

The ultimate goal of the machine translation (MT) field is to create a universal translation system that allows everyone to access information and communicate more effectively. However, several fundamental limitations must be resolved before that vision can become a reality.

At present, many modern MT systems rely on bilingual models, which often require a large number of labelled examples for each language pair and task. Unfortunately, many languages, such as Icelandic and Hausa, have limited training data, and this shortcoming makes the present approaches inadequate for them. In addition, the sheer complexity of maintaining a separate model per language pair makes it difficult for a platform like Facebook, where billions of users post every day in hundreds of languages, to scale present models to practical applications.

Meta to the rescue

According to the team at Meta, the MT field needs to shift from bilingual models towards multilingual translation, where a single model can translate many language pairs at once. A better multilingual model stands to benefit both low-resource and high-resource languages, because such models are simpler, more scalable and more efficient.


Image: Meta AI

Last year, Facebook AI (now Meta AI) introduced M2M-100, the first multilingual model to translate between any pair of 100 languages without relying on English-centric data. The team used different mining strategies to build a translation dataset of 7.5 billion sentences covering 100 languages. The researchers then employed a variety of scaling strategies to create a universal model with 15 billion parameters that captures information from related languages and reflects a more diverse set of scripts and morphology. The model works well for low-resource languages, but its performance drops on high-resource languages.

Building on this previous model, the team made advancements in three areas:

  • large-scale data mining
  • scaling model capacity
  • more efficient infrastructure

To train the WMT 2021 model, the team built two multilingual systems: one translating from any other language into English, and one translating from English into any other language. They used parallel data mining techniques to build datasets such as CCMatrix, which the company claims is the largest dataset of web-based, high-quality bitexts for training translation models. The CCMatrix dataset is more than 50 times larger than the WikiMatrix corpus Facebook released earlier, with over 4.5 billion parallel sentences in 576 language pairs extracted from snapshots of the public CommonCrawl dataset.
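The article does not describe how parallel sentences are mined, so the following is only a rough sketch of one common approach: score candidate sentence pairs by a ratio margin over sentence-embedding similarities and keep the best match above a threshold. The toy vectors, the function name `mine_pairs`, and the values of `k` and `threshold` are all assumptions made for illustration; a real pipeline would use a multilingual sentence encoder and approximate nearest-neighbour search.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mine_pairs(src_vecs, tgt_vecs, k=2, threshold=1.1):
    """Return (src_index, tgt_index, margin) for each source sentence whose
    best target match clears the margin threshold (hypothetical defaults)."""
    sim = [[cosine(s, t) for t in tgt_vecs] for s in src_vecs]

    def avg_top_k(values):
        top = sorted(values, reverse=True)[:k]
        return sum(top) / len(top)

    # Average similarity of each sentence to its k nearest neighbours
    # on the other side; used to normalise the raw cosine score.
    src_avg = [avg_top_k(row) for row in sim]
    tgt_avg = [
        avg_top_k([sim[i][j] for i in range(len(src_vecs))])
        for j in range(len(tgt_vecs))
    ]

    pairs = []
    for i, row in enumerate(sim):
        j = max(range(len(row)), key=lambda c: row[c])  # best target candidate
        margin = row[j] / ((src_avg[i] + tgt_avg[j]) / 2)
        if margin >= threshold:
            pairs.append((i, j, margin))
    return pairs


# Toy embeddings standing in for encoder output (assumed values).
src_vecs = [[1.0, 0.1, 0.0], [0.0, 1.0, 0.2], [0.3, 0.3, 0.9]]
tgt_vecs = [[0.1, 0.9, 0.2], [0.9, 0.2, 0.1], [0.2, 0.4, 1.0]]

pairs = mine_pairs(src_vecs, tgt_vecs)
for src_i, tgt_j, margin in pairs:
    print(f"source {src_i} <-> target {tgt_j} (margin {margin:.2f})")
```

Dividing by the neighbourhood average, rather than thresholding the raw cosine, penalises sentences that are similar to everything, which is why margin-style scores are often preferred for noisy web data.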

Additionally, the model capacity has been raised from 15 billion parameters to 52 billion. Large-scale training was made five times faster than with previous methods by adding a GPU memory-saving tool, Fully Sharded Data Parallel (FSDP), developed by Meta itself. It is important to note that scaling model size often results in high computational costs. To overcome this, the team says it used a Transformer architecture in which the feed-forward block in every alternate Transformer layer is replaced with a Sparsely Gated Mixture-of-Experts layer with top-2 gating, in both the encoder and the decoder. As a result, only a subset of the model's parameters is used for each input sequence.
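To make the idea of top-2 gating concrete, here is a minimal sketch in plain Python of a sparsely gated Mixture-of-Experts layer for a single token vector. It is not Meta's implementation: the dimensions, the random weights, the ReLU activation, per-token routing, and the helper names (`top2_gate`, `moe_layer`, and so on) are assumptions chosen for illustration. The point it shows is that only two of the experts run for a given input, so most of the layer's parameters stay idle.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def top2_gate(gate_logits):
    """Pick the two highest-scoring experts and renormalise their weights."""
    probs = softmax(gate_logits)
    top2 = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:2]
    norm = probs[top2[0]] + probs[top2[1]]
    return [(i, probs[i] / norm) for i in top2]


def expert_ffn(x, w_in, w_out):
    """One expert: a small feed-forward block with ReLU (assumed activation)."""
    hidden = [max(0.0, h) for h in matvec(w_in, x)]
    return matvec(w_out, hidden)


def moe_layer(x, gate_w, experts):
    """Route x to its top-2 experts and mix their outputs by gate weight."""
    out = [0.0] * len(x)
    used = []
    for idx, weight in top2_gate(matvec(gate_w, x)):
        w_in, w_out = experts[idx]
        y = expert_ffn(x, w_in, w_out)
        out = [o + weight * v for o, v in zip(out, y)]
        used.append(idx)
    return out, used


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


# Hypothetical toy sizes; real models use far larger dimensions.
rng = random.Random(0)
d_model, d_hidden, n_experts = 4, 8, 6
gate_w = random_matrix(n_experts, d_model, rng)
experts = [
    (random_matrix(d_hidden, d_model, rng), random_matrix(d_model, d_hidden, rng))
    for _ in range(n_experts)
]

token = [0.5, -1.0, 0.25, 2.0]
output, used_experts = moe_layer(token, gate_w, experts)
print(f"experts used: {used_experts} of {n_experts}")
```

In this sketch, adding more experts increases the total parameter count without increasing the work done per token, since the loop in `moe_layer` always runs exactly two experts. That is the trade-off that lets capacity grow faster than compute.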

Machine translation has made significant progress in breaking down language barriers, but most of that progress has focused on a small number of widely spoken languages. Low-resource translation remains MT's "last mile" problem and the subfield's biggest open challenge today.


Kumar Gandharv
Kumar Gandharv, PGD in English Journalism (IIMC, Delhi), is setting out on a journey as a tech journalist at AIM. He is a keen observer of national and IR-related news.
