Facebook AI Releases XLS-R, Self-Supervised Model For Speech Tasks

XLS-R substantively improves upon previous multilingual models by training on nearly ten times more public data in more than twice as many languages. 

Facebook recently announced the release of XLS-R, a new self-supervised model for a variety of speech tasks.

XLS-R is based on wav2vec 2.0, Facebook AI’s approach to self-supervised learning of speech representations. It was trained on more than 436,000 hours of publicly available speech recordings, nearly ten times more hours of speech than XLSR-53, the best previous model Facebook AI released last year.

Drawing on speech data from sources ranging from parliamentary proceedings to audiobooks, XLS-R covers 128 languages, nearly two and a half times as many as its predecessor.

XLS-R was evaluated on four major multilingual speech recognition benchmarks, where it outperformed prior work on most of the 37 languages tested: five languages of BABEL, ten languages of CommonVoice, eight languages of MLS, and the 14 languages of VoxPopuli.

The model was also evaluated for speech translation, where audio recordings were directly translated into another language. Facebook has always been interested in models that can perform multiple tasks, so it simultaneously fine-tuned XLS-R on several different translation directions of the CoVoST-2 benchmark. The result is a single model that can translate between English and up to 21 other languages.


The model yields large improvements on low-resource language directions such as Indonesian-to-English translation, where accuracy measured by BLEU doubles on average, a major step forward for the translation of spoken language. A higher BLEU score means the automatic translations overlap more with translations produced by a human for the same task.
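To make the idea of "overlap" concrete, here is a minimal sketch of clipped n-gram precision, the building block of BLEU. It is a simplification: full BLEU combines the precisions for several n-gram sizes (typically 1 to 4) with a geometric mean and applies a brevity penalty, none of which is shown here. The function names and example sentences are illustrative assumptions and do not come from Facebook AI's evaluation code.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_precision(candidate, reference, n):
    """Fraction of candidate n-grams that also occur in the reference.

    Each candidate n-gram is credited at most as many times as it
    appears in the reference ("clipping"), so repeating a matching
    word does not inflate the score.
    """
    cand = Counter(ngrams(candidate.split(), n))
    ref = Counter(ngrams(reference.split(), n))
    total = sum(cand.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
    return overlap / total


# Hypothetical sentences, whitespace-tokenized for simplicity.
reference = "the cat sat on the mat"
close = "the cat sat on a mat"
far = "a dog stood there"

print(clipped_precision(close, reference, 1))  # 5 of 6 unigrams match
print(clipped_precision(close, reference, 2))  # 3 of 5 bigrams match
print(clipped_precision(far, reference, 1))    # no unigrams match
```

A translation that shares more words and word sequences with the human reference gets higher precisions, which is why a rising BLEU score is read as the system output moving closer to human translations.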

XLS-R demonstrates that scaling cross-lingual pretraining can further improve performance for low-resource languages. It improves speech recognition performance and more than doubles the accuracy of foreign-to-English speech translation. Facebook AI describes XLS-R as an important step toward a single model that can understand speech in many different languages, and as the largest effort it knows of to leverage public data for multilingual pretraining.

Victor Dey
Victor is an aspiring Data Scientist & is a Master of Science in Data Science & Big Data Analytics. He is a Researcher, a Data Science Influencer and also an Ex-University Football Player. A keen learner of new developments in Data Science and Artificial Intelligence, he is committed to growing the Data Science community.
