Meta’s machine translation journey

Meta has been driving innovation in machine translation for quite some time now.

There are around 7,000 languages spoken globally, but most translation models focus on English and a handful of other widely spoken languages. This excludes a major part of the world from the benefits of being online: access to content, technologies and services. Tech giants are trying to bridge this gap. Just days back, Meta announced plans for a Universal Speech Translator that would translate speech from one language to another in real time. The announcement is no surprise to anyone who follows the company closely, given how long Meta has been investing in machine translation research.

Let us take a quick look back at the major highlights of its machine translation journey.


Scaling neural machine translation

Meta used neural machine translation (NMT) to automatically translate text in posts and comments. NMT models are effective at learning from large-scale monolingual data, and Meta was able to train an NMT model in 32 minutes, a drastic reduction from the previous 24 hours.
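As a back-of-the-envelope check, the reported drop from 24 hours to 32 minutes of training time works out to a 45x speedup:

```python
# Training-time speedup implied by the figures above:
# 24 hours of training reduced to 32 minutes.
before_minutes = 24 * 60   # 1,440 minutes
after_minutes = 32

speedup = before_minutes / after_minutes
print(speedup)  # 45.0
```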



Open-sourcing LASER

In 2018, Meta also open-sourced the Language-Agnostic SEntence Representations (LASER) toolkit. It works with over 90 languages written in 28 different alphabets. LASER computes multilingual sentence embeddings for zero-shot cross-lingual transfer, and it works on low-resource languages as well. Meta said that “LASER achieves these results by embedding all languages jointly in a single shared space (rather than having a separate model for each).”
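The idea of a single shared space can be illustrated with a toy sketch. The three-dimensional vectors below are hypothetical stand-ins for LASER’s real sentence embeddings; the point is only that sentences meaning the same thing land close together regardless of language, so a nearest-neighbour search gives zero-shot cross-lingual matching:

```python
import math

# Hypothetical toy embeddings standing in for LASER's high-dimensional
# sentence vectors. In a shared multilingual space, "The cat sleeps"
# and "Le chat dort" should sit near each other.
embeddings = {
    ("en", "The cat sleeps"):    [0.90, 0.10, 0.20],
    ("fr", "Le chat dort"):      [0.88, 0.12, 0.19],
    ("en", "Stock prices rose"): [0.10, 0.95, 0.30],
}

def cosine(u, v):
    # Standard cosine similarity between two vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Zero-shot cross-lingual retrieval: find the English sentence closest
# to the French query, with no French-specific model involved.
query = embeddings[("fr", "Le chat dort")]
best = max(
    (k for k in embeddings if k[0] == "en"),
    key=lambda k: cosine(query, embeddings[k]),
)
print(best[1])  # -> The cat sleeps
```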


Wav2vec: unsupervised pre-training for speech recognition

Today, technologies such as GPS navigation and virtual assistants essentially depend on speech recognition. But most such systems support only English, so a major chunk of people who do not speak the language, or who speak it with an unrecognised accent, are excluded from an easy and important way of accessing information and services. Wav2vec set out to solve this. Here, unsupervised pre-training for speech recognition was Meta’s focus: wav2vec is trained on unlabelled audio data.


Meta adds, “The wav2vec model is trained by predicting speech units for masked parts of speech audio. It learns basic units that are 25ms long to enable learning of high-level contextualised representations.”

As a result, Meta has been able to build speech recognition systems that perform far better than the best semi-supervised methods, while using up to 100 times less labelled training data.
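The masked-prediction setup described above can be sketched in miniature. This is a hypothetical illustration, not the real model: the audio is chopped into ~25 ms units, contiguous spans are hidden, and the training targets are the true units at exactly the masked positions:

```python
import random

# Hypothetical sketch of wav2vec-style masked prediction: hide spans of
# ~25 ms speech units and ask the model to predict what was masked.
random.seed(0)

num_units = 20    # ~0.5 s of audio at 25 ms per unit
span_len = 3      # masking is applied over contiguous spans
units = [f"u{i}" for i in range(num_units)]  # stand-ins for speech units

# Pick three span start positions and mask every unit inside each span.
masked = set()
for start in random.sample(range(num_units - span_len), 3):
    masked.update(range(start, start + span_len))

# The training input: masked positions hidden from the model ...
inputs = ["<mask>" if i in masked else u for i, u in enumerate(units)]
# ... and the targets: the true units at exactly those positions.
targets = {i: units[i] for i in sorted(masked)}

print(sum(1 for t in inputs if t == "<mask>"), "positions to predict")
```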


M2M-100: Multilingual machine translation

2020 was an important year for Meta, in which it released several models that advanced machine translation technology. M2M-100 was one of them: a multilingual machine translation (MMT) model that translates between any pair of 100 languages without relying on English as an intermediary. M2M-100 is trained on a total of 2,200 language directions. Meta claimed the model would improve translation quality worldwide, especially for speakers of low-resource languages.
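To put the 2,200 figure in context, here is some simple arithmetic: a fully many-to-many model over 100 languages has 9,900 possible translation directions, while an English-centric model only ever trains on the 198 directions that pass through English. M2M-100’s 2,200 directions sit between those extremes, covering many non-English pairs directly:

```python
# Counting translation directions for a 100-language system.
languages = 100

# Every ordered (source, target) pair with source != target.
all_directions = languages * (languages - 1)   # 9,900

# An English-centric model only sees X->en and en->X.
english_centric = 2 * (languages - 1)          # 198

print(all_directions, english_centric)  # 9900 198
```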

CoVoST: multilingual speech-to-text translation

CoVoST is a multilingual speech-to-text translation corpus covering translation from 11 languages into English. What makes it unique is that CoVoST covers over 11,000 speakers and over 60 accents. Meta claims the model trained on it is “the first end-to-end many-to-one multilingual model for spoken language translation.”


FLORES-101: low-resource languages

Following in M2M-100’s footsteps, in the first half of last year, Meta open-sourced FLORES-101, a many-to-many evaluation dataset covering 101 languages, with a focus on low-resource languages that still lack extensive datasets. Meta added, “FLORES-101 is the missing piece, the tool that enables researchers to rapidly test and improve upon multilingual translation models like M2M-100.”



Data2vec: self-supervised learning across modalities

In 2022, Meta released data2vec, calling it “the first high-performance self-supervised algorithm that works for multiple modalities.” Applied separately to speech, text and images, it outperformed the previous best single-purpose algorithms for computer vision and speech. Data2vec relies neither on contrastive learning nor on reconstructing the input example.
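Instead of contrasting examples or reconstructing raw input, data2vec has a student network regress onto a teacher’s representations of the input, where the teacher’s weights are an exponential moving average (EMA) of the student’s. The sketch below is a minimal, hypothetical illustration of that training signal, using scalar “weights” in place of real network parameters:

```python
# Minimal sketch of the data2vec-style teacher update and loss.
tau = 0.999        # EMA decay; the real model anneals this over training

student_w = 0.5    # stand-in for the student network's parameters
teacher_w = 0.5    # the teacher starts as a copy of the student

def ema_update(teacher, student, tau):
    # teacher <- tau * teacher + (1 - tau) * student
    return tau * teacher + (1 - tau) * student

def regression_loss(student_pred, teacher_target):
    # An L2 regression loss on the (masked) positions' representations.
    return (student_pred - teacher_target) ** 2

# One toy step: the student moves, the teacher trails it smoothly,
# giving stable targets without any contrastive negatives.
student_w = 0.6
teacher_w = ema_update(teacher_w, student_w, tau)
print(round(teacher_w, 6))  # 0.5001
```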

Sreejani Bhattacharyya
I am a technology journalist at AIM. What gets me excited is deep-diving into new-age technologies and analysing how they impact us for the greater good.
