Microsoft’s Turing Language Model Can Now Interpret 94 Languages

Microsoft recently detailed its Turing multilingual language model (T-ULRv2) and announced that the model has taken the top rank on the Google XTREME public leaderboard.

The Cross-lingual TRansfer Evaluation of Multilingual Encoders benchmark, known as XTREME, covers 40 typologically diverse languages spanning 12 language families. It comprises nine tasks that require reasoning about different levels of syntax and semantics.

T-ULRv2 was created by the Microsoft Turing team in collaboration with Microsoft Research. It beat the previous best model, Alibaba's VECO, by 3.5 points in average score.

Saurabh Tiwary, Vice President & Distinguished Engineer at Microsoft, said that to achieve this milestone the team combined the pre-trained model with StableTune, a multilingual fine-tuning technique based on stability training. Other popular language models on the XTREME leaderboard include XLM-R, mBERT and XLM.

Ming Zhou, Assistant Managing Director at Microsoft Research Asia, stated in a blog post that the Microsoft Turing team has long believed that language representation should be universal. Such an approach allows a trained model to be fine-tuned in one language and applied to a different one in a zero-shot fashion.


For a few years now, unsupervised pre-trained language modelling has been the backbone of natural language processing (NLP) models, with transformer-based models at the heart of this innovation. According to Zhou, models of this type can overcome the challenge of requiring labelled data to train the model in every language.

How T-ULRv2 Works

T-ULRv2 is the latest cross-lingual innovation at the tech giant. It incorporates InfoXLM (Information-Theoretic Framework for Cross-Lingual Language Model Pre-Training), a cross-lingual pre-trained model for language understanding and generation, to create a universal model that represents 94 languages in the same vector space.

T-ULRv2 is a transformer architecture with 24 layers and 1,024 hidden states, for a total of 550 million parameters. Its pre-training involves three tasks: multilingual masked language modelling (MMLM), translation language modelling (TLM) and cross-lingual contrast (XLCo).
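As a rough sanity check on these figures, the sketch below estimates the parameter count of a generic BERT-style encoder. It reads "1,024 hidden states" as a hidden size of 1,024 and assumes a feed-forward width of four times the hidden size and a vocabulary of 250,000 tokens. Neither assumption is stated in the article, and the function name is hypothetical.

```python
def estimate_encoder_params(num_layers, hidden_size, vocab_size):
    """Rough parameter count for a BERT-style transformer encoder.

    Ignores biases, layer norms and position embeddings, which are
    small next to the terms counted below.
    """
    # Token embedding table: one hidden_size vector per vocabulary entry.
    embedding = vocab_size * hidden_size
    # Self-attention: query, key, value and output projections.
    attention = 4 * hidden_size * hidden_size
    # Feed-forward block, assuming the conventional 4x inner width.
    feed_forward = 2 * hidden_size * (4 * hidden_size)
    return embedding + num_layers * (attention + feed_forward)


# 24 layers and a hidden size of 1,024 come from the article;
# the 250,000-token vocabulary is an assumption made for illustration.
total = estimate_encoder_params(num_layers=24, hidden_size=1024, vocab_size=250_000)
print(f"{total / 1e6:.0f}M parameters")
```

Under these assumptions the estimate comes to roughly 558 million, in line with the reported 550 million. It also suggests that, with a vocabulary this large, the embedding table would account for a sizeable share of the total.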

According to the blog post, the objective of the multilingual masked language modelling (MMLM) task, also known as the Cloze task, is to predict masked tokens from inputs in different languages. T-ULRv2 uses a multilingual web corpus covering 94 languages for MMLM training.
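To make the Cloze-style objective concrete, here is a minimal sketch of how masked inputs and their prediction targets could be prepared. The `[MASK]` symbol, the masking rate and the `mask_tokens` helper are illustrative assumptions, not details of T-ULRv2's actual pipeline.

```python
import random

MASK = "[MASK]"  # hypothetical mask symbol


def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Return (masked_tokens, labels) for a Cloze-style objective.

    labels[i] holds the original token where a mask was applied and
    None elsewhere, so only masked positions would count in the loss.
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for token in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK)
            labels.append(token)
        else:
            masked.append(token)
            labels.append(None)
    return masked, labels


# The same routine applies to text in any language in the corpus.
sentence = "el modelo representa muchos idiomas".split()
masked, labels = mask_tokens(sentence, mask_prob=0.3, seed=1)
print(masked)
print(labels)
```

The model would then be trained to recover each non-None label from the surrounding unmasked context.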

Like the MMLM task, translation language modelling (TLM) also predicts masked tokens, but the prediction is conditioned on concatenated translation pairs. Lastly, the cross-lingual contrast (XLCo) task uses parallel training data. The objective of this task is to maximise the mutual information between the representations of parallel sentences.
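The difference between TLM and plain masked language modelling lies in the input: a sentence and its translation are joined into one sequence, so a masked word on one side can be recovered using context from the other. The sketch below illustrates this under assumed conventions. The `[SEP]` and `[MASK]` symbols and the `make_tlm_example` helper are hypothetical.

```python
import random

MASK = "[MASK]"  # hypothetical mask symbol
SEP = "[SEP]"    # hypothetical separator between the two sentences


def make_tlm_example(source_tokens, target_tokens, mask_prob=0.15, seed=0):
    """Concatenate a translation pair and mask tokens on both sides."""
    rng = random.Random(seed)
    pair = source_tokens + [SEP] + target_tokens
    masked, labels = [], []
    for token in pair:
        # Never mask the separator; it only marks the sentence boundary.
        if token != SEP and rng.random() < mask_prob:
            masked.append(MASK)
            labels.append(token)
        else:
            masked.append(token)
            labels.append(None)
    return masked, labels


english = "the cat sleeps".split()
french = "le chat dort".split()
masked, labels = make_tlm_example(english, french, mask_prob=0.3, seed=2)
print(masked)
print(labels)
```

If "chat" is masked on the French side, for example, the unmasked "cat" on the English side remains visible as a cue, which is what encourages the model to align the two languages.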

Whereas MMLM and TLM maximise token-sequence mutual information, XLCo targets cross-lingual sequence-level mutual information. T-ULRv2 uses translation parallel data covering 14 language pairs for both the TLM and XLCo tasks.
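One common way to turn a sequence-level mutual information objective into a trainable loss is an InfoNCE-style contrastive loss, which rewards a sentence embedding for being closer to its translation than to other sentences in the batch. The article does not spell out the exact loss used for XLCo, so the in-batch formulation below is a simplified assumption, and the toy embeddings and temperature value are made up for illustration.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def info_nce_loss(source_vecs, target_vecs, temperature=0.1):
    """In-batch contrastive loss over aligned sentence embeddings.

    source_vecs[i] and target_vecs[i] are assumed to be embeddings of a
    sentence and its translation; every other target in the batch acts
    as a negative example.
    """
    losses = []
    for i, src in enumerate(source_vecs):
        logits = [dot(src, tgt) / temperature for tgt in target_vecs]
        # Numerically stable log of the softmax normaliser.
        peak = max(logits)
        log_norm = peak + math.log(sum(math.exp(x - peak) for x in logits))
        # Negative log-probability assigned to the true translation.
        losses.append(log_norm - logits[i])
    return sum(losses) / len(losses)


# Toy 2-d sentence embeddings (made up for illustration).
english = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]
shuffled = [[0.1, 0.9], [0.9, 0.1]]

print(info_nce_loss(english, aligned))   # low: each translation is the nearest target
print(info_nce_loss(english, shuffled))  # high: each translation is the farthest target
```

Minimising this loss pushes parallel sentences towards the same region of the vector space regardless of language, which matches the stated goal of a shared representation.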

Wrapping Up

The Turing family of NLP models has been powering the next generation of AI experiences in Microsoft products. At this year’s Microsoft Ignite conference, the developers at Microsoft announced that the Turing models would be made available for building custom applications as part of a private preview. According to the developers, T-ULRv2 will also be part of this programme.

To perform well on the Google XTREME benchmark, language models must meet certain criteria. The tasks included in XTREME cover a range of paradigms, including sentence classification, structured prediction, sentence retrieval and cross-lingual question answering. To succeed on the benchmark, models must learn representations that generalise to many standard cross-lingual transfer settings.

Ambika Choudhury
A Technical Journalist who loves writing about Machine Learning and Artificial Intelligence. A lover of music, writing and learning something out of the box.
