All About Alexa’s New Language Understanding Model

The new language model from Amazon is a large-scale multilingual model, pre-trained on a set of denoising and Causal Language Modelling (CLM) tasks.

Inspired by OpenAI's GPT-3 model, Amazon has introduced its latest language model, the Alexa Teacher Model (AlexaTM 20B). Unlike most of today's language models, which are decoder-only architectures, it is a sequence-to-sequence (seq2seq) encoder-decoder model.

About AlexaTM 20B

The new language model from Amazon is a large-scale multilingual model pre-trained on a set of denoising and Causal Language Modelling (CLM) tasks. According to the company, this strategy makes the AlexaTM model more efficient at few-shot learning than decoder-only language models.
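The two pre-training objectives can be illustrated with a small sketch. This is not Amazon's actual data pipeline, only a minimal, assumed illustration of how a seq2seq model's training examples differ between a denoising task (reconstruct a corrupted input) and a CLM task (continue a prefix):

```python
import random

def make_denoising_example(tokens, mask_ratio=0.15, mask_token="<mask>"):
    """Span-corruption-style denoising: hide a contiguous span of the
    input and train the decoder to reconstruct the full original sequence."""
    span_len = max(1, int(len(tokens) * mask_ratio))
    start = random.randrange(len(tokens) - span_len + 1)
    corrupted = tokens[:start] + [mask_token] + tokens[start + span_len:]
    return {"encoder_input": corrupted, "decoder_target": tokens}

def make_clm_example(tokens, prefix_ratio=0.2):
    """Causal LM: feed a short prefix to the encoder and train the
    decoder to continue the sequence left to right."""
    cut = max(1, int(len(tokens) * prefix_ratio))
    return {"encoder_input": tokens[:cut], "decoder_target": tokens[cut:]}

tokens = "the quick brown fox jumps over the lazy dog".split()
random.seed(0)
denoise = make_denoising_example(tokens)
clm = make_clm_example(tokens)
```

Mixing both objectives during pre-training is what, per the paper, lets the encoder-decoder model handle in-context (few-shot) prompting that decoder-only models are usually used for.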

The AlexaTM 20B model achieves state-of-the-art performance on 1-shot summarisation tasks and outperforms the far larger decoder-only PaLM model, which has 540 billion parameters. On the Flores-101 dataset, Amazon's model works particularly well for low-resource pairs among the languages it supports: Arabic, English, French, German, Hindi, Italian, Japanese, Marathi, Portuguese, Spanish, Tamil, and Telugu.
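A 1-shot evaluation of this kind works by prepending a single worked demonstration to the query. The exact prompt template AlexaTM 20B uses is not reproduced here; the layout below is a hypothetical, illustrative one:

```python
def build_one_shot_prompt(example_input, example_output, query):
    """Assemble a 1-shot prompt: one demonstration pair followed by the
    query, leaving the final output slot for the model to fill in."""
    return (
        f"Input: {example_input}\n"
        f"Output: {example_output}\n"
        f"Input: {query}\n"
        f"Output:"
    )

prompt = build_one_shot_prompt(
    "Translate to German: Good morning.",
    "Guten Morgen.",
    "Translate to German: Thank you.",
)
```

The same pattern extends to summarisation or translation benchmarks by swapping in task-appropriate demonstration pairs.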

Further, in the zero-shot setting, AlexaTM 20B even outperforms GPT-3 on the SuperGLUE and SQuADv2 datasets. It also delivers state-of-the-art performance on multilingual tasks such as XNLI, XCOPA, PAWS-X, and XWinograd.

The researchers behind AlexaTM 20B describe a model-development pipeline in which transformer-based encoders are pre-trained from scratch on public data, adapted using unlabelled data, distilled in a two-step process, and finally fine-tuned. This contrasts with the usual practice of first distilling production-focused NLU models with 85M-300M parameters and then fine-tuning them, or alternatively training them from scratch on the final labelled dataset. The AlexaTM pipeline instead starts from models with over 2.3 billion parameters and improves upon this paradigm.
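The core of any such distillation step is training the smaller student to match the teacher's output distribution. The sketch below shows the standard soft-target objective (Hinton-style temperature-scaled cross-entropy) in NumPy; it is a generic illustration, and the paper's exact loss weighting and two-step schedule are not reproduced here:

```python
import numpy as np

def softmax(logits, temperature=1.0):
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Cross-entropy between the teacher's softened distribution and the
    student's, scaled by T^2 so gradient magnitudes stay comparable
    across temperatures."""
    p_teacher = softmax(teacher_logits, temperature)
    log_p_student = np.log(softmax(student_logits, temperature) + 1e-12)
    return -(p_teacher * log_p_student).sum(axis=-1).mean() * temperature**2

teacher = np.array([[4.0, 1.0, 0.5]])
matched = distillation_loss(teacher, teacher)          # student agrees
mismatched = distillation_loss(teacher[:, ::-1], teacher)  # student disagrees
```

A student whose logits match the teacher's incurs the minimum possible loss, which is what drives the compressed model toward the teacher's behaviour.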


The AlexaTM 20B model is subject to several constraints that do not generally apply to other language models. Since the model is meant to run on edge devices, such as mobile phones, memory is at a premium and inference must be low latency. Further, the Alexa digital assistant supports multiple languages, and its input arrives in spoken form, which differs markedly from the written text found in typical training datasets.
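The paper does not detail Alexa's on-device compression, but a common way to meet such memory budgets is weight quantisation. As a generic illustration only: symmetric int8 quantisation stores each float32 weight in one byte (a 4x reduction) plus a single scale factor, at the cost of a bounded rounding error:

```python
import numpy as np

def quantize_int8(weights):
    """Symmetric per-tensor int8 quantisation: map the weight range
    onto [-127, 127] using one shared float scale."""
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float32 weights for computation."""
    return q.astype(np.float32) * scale

w = np.random.RandomState(0).randn(4, 4).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
```

The per-element reconstruction error is at most half the scale, which is why quantisation usually costs little accuracy relative to the 4x memory saving.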


Challenges and future work

In the future, the team says it would like to robustly characterise the use of public pre-trained conversational models such as TOD-BERT and ConveRT, evaluate more combinations of teacher and distilled model sizes, and benchmark the model on different public datasets such as MultiATIS or MASSIVE. The team also wants to make greater use of dialogue and user context, try code-switching, examine varying levels of ASR noise, and more.

Further, the team has admitted that, like other large language models, AlexaTM 20B risks perpetuating toxic language, harmful stereotypes, and social biases present in the public online data it is trained on. Against this background, the team recommends that users “conduct a full task-specific fairness-and-bias analysis before using the model to fully understand and address any potential harm that might arise from its use”.

The team also suggests that, depending on the downstream application the model is used for, established techniques may be applied to debias and detoxify it. The authors of the study also reiterate the importance of fairness auditing and emphasise the need for more research on bias mitigation.

Ambient AI

At re:MARS, Amazon's conference on machine learning and robotics held in June 2022, Rohit Prasad, senior vice president and head scientist of Alexa AI, discussed in detail the emerging trend of ambient intelligence. This concept is touted as the future of intelligent computing, where explicit input and output are not required.

Prasad said at the time that ambient intelligence offers the most practical way to achieve generalisable intelligence. “Ambient intelligence is best exemplified by AI services like Alexa, which we use on a daily basis. Customers interact with Alexa billions of times each week. And thanks to predictive and proactive features like Hunches and Routines, more than 30% of smart-home interactions are initiated by Alexa,” he said in an interview.


Shraddha Goled
I am a technology journalist with AIM. I write stories focused on the AI landscape in India and around the world, with a special interest in analysing its long-term impact on individuals and societies.
