
All About Alexa’s New Language Understanding Model

The new language model from Amazon is a large-scale multilingual model, pre-trained on a set of denoising and Causal Language Modelling (CLM) tasks

Inspired by OpenAI's GPT-3 model, Amazon has introduced its latest language model, the Alexa Teacher Model (AlexaTM 20B). It is a sequence-to-sequence (seq2seq) encoder-decoder model, unlike most language models today, which use decoder-only architectures.

About AlexaTM 20B

The new language model from Amazon is a large-scale multilingual model pre-trained on a set of denoising and Causal Language Modelling (CLM) tasks. According to the company, this strategy makes the AlexaTM model more efficient at few-shot learning than decoder-only language models.
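The two pre-training objectives can be sketched in a few lines of plain Python. This is an illustrative toy, not Amazon's actual data pipeline: the function names, the token-dropping corruption, and the prefix fraction are assumptions made for illustration.

```python
import random

def make_denoising_example(tokens, drop_prob=0.15, seed=0):
    """Denoising objective: randomly drop tokens from the input;
    the decoder target is the original, uncorrupted sequence."""
    rng = random.Random(seed)
    corrupted = [t for t in tokens if rng.random() > drop_prob]
    return corrupted, tokens  # (encoder input, decoder target)

def make_clm_example(tokens, prefix_frac=0.2):
    """Causal Language Modelling objective: the encoder sees only a
    prefix; the decoder must continue the sequence from there."""
    cut = max(1, int(len(tokens) * prefix_frac))
    return tokens[:cut], tokens[cut:]  # (encoder input, decoder target)

sentence = "alexa tm is a seq2seq encoder decoder model".split()
src, tgt = make_clm_example(sentence)
```

Training on a mix of both objectives is what, per the paper, lets a seq2seq model act as an in-context few-shot learner the way decoder-only models do.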

The AlexaTM 20B model achieves state-of-the-art performance on 1-shot summarisation tasks, outperforming the much larger PaLM decoder-only model with its 540 billion parameters. Amazon's model works particularly well on the Flores-101 dataset for the low-resource pairs among its supported languages: Arabic, French, English, German, Hindi, Italian, Japanese, Portuguese, Spanish, Marathi, Tamil, and Telugu.
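In the 1-shot setting, the model is shown a single worked example before the new input. A minimal sketch of how such a summarisation prompt might be assembled (the template below is a hypothetical illustration, not the exact format used in the paper):

```python
def build_one_shot_prompt(example_doc, example_summary, query_doc):
    """Build a 1-shot summarisation prompt: one worked
    (document, summary) pair, then the document to summarise."""
    return (
        f"Document: {example_doc}\nSummary: {example_summary}\n\n"
        f"Document: {query_doc}\nSummary:"
    )

prompt = build_one_shot_prompt(
    "Amazon released a 20B-parameter seq2seq model.",
    "Amazon released AlexaTM 20B.",
    "PaLM is a 540B-parameter decoder-only model from Google.",
)
```

The model then generates the text following the final "Summary:" marker, conditioning on the single demonstration.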

Further, in the zero-shot setting, AlexaTM 20B even outperforms GPT-3 on the SuperGLUE and SQuADv2 datasets. It also delivers state-of-the-art performance on multilingual tasks such as XNLI, XCOPA, Paws-X, and XWinograd.

The researchers behind AlexaTM 20B describe a model development pipeline in which transformer-based encoders are pre-trained from scratch on public data, adapted using unlabelled data, distilled in a two-step process, and finally fine-tuned. This contrasts with the usual practice of first distilling production-focused NLU models with 85M-300M parameters and then fine-tuning them, or alternatively training them from scratch on the final labelled dataset. The AlexaTM pipeline starts with models of over 2.3 billion parameters and improves upon this paradigm.
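Distillation, in general, trains a small student to match a large teacher's output distribution. Below is a minimal plain-Python sketch of the standard soft-label distillation loss (temperature-softened KL divergence); this is a generic illustration, and the paper's exact two-step recipe and hyperparameters are not reproduced here.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax with a temperature knob."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence between temperature-softened teacher and student
    distributions -- the soft-label term in knowledge distillation."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
```

A higher temperature flattens both distributions, exposing the teacher's relative preferences over wrong answers, which is where much of the "dark knowledge" transferred during distillation lives.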

The AlexaTM 20B model is subject to several constraints that do not generally apply to other language models. Since the model is intended to run on edge devices, such as mobile phones, memory is at a premium and model inference must be low latency. Further, the Alexa digital assistant supports multiple languages, and its input is spoken-form language, which differs considerably from the written text used in training datasets.
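One common way to meet such on-device memory constraints (a standard technique, not one the article attributes to Amazon) is to quantise weights to 8-bit integers. A minimal sketch of uniform symmetric quantisation:

```python
def quantize_int8(weights):
    """Uniform symmetric int8 quantisation: map floats into [-127, 127]
    using a single per-tensor scale, shrinking 32-bit weights to 8 bits."""
    peak = max(abs(w) for w in weights)
    scale = peak / 127 if peak > 0 else 1.0  # guard against all-zero weights
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 values."""
    return [qi * scale for qi in q]
```

Each recovered weight differs from the original by at most half a quantisation step, which is usually an acceptable trade for a 4x memory reduction.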

Challenges and future work

In the future, the team says it would like to robustly characterise the use of public pre-trained conversational models such as TOD-BERT and ConveRT, evaluate more combinations of teacher and distilled-model sizes, and benchmark the model on public datasets such as MultiATIS and MASSIVE. The team also wants to make greater use of dialogue and user context, try code-switching, examine varying levels of ASR noise, and more.

Further, the team admits that, like other large language models, AlexaTM 20B risks perpetuating the toxic language, harmful stereotypes, and social biases present in the public web data it is trained on. Against this background, the team recommends that users “conduct a full task-specific fairness-and-bias analysis before using the model to fully understand and address any potential harm that might arise from its use”.

The team also suggests that, depending on the downstream application the model is applied to, prescribed techniques may be used to debias and detoxify it. The authors of the study also reiterate the importance of fairness auditing and emphasise the need for more research on bias mitigation.

Ambient AI

At re:MARS, Amazon's conference on machine learning and robotics held in June 2022, Rohit Prasad, senior vice president and head scientist of Alexa AI, discussed in detail the emerging trend of ambient intelligence. This concept is touted as the future of intelligent computing, in which explicit input and output are not required.

Prasad had then said that ambient intelligence offers the most practical way to achieve generalisable intelligence. “Ambient intelligence is best exemplified by AI services like Alexa, which we use on a daily basis. Customers interact with Alexa billions of times each week. And thanks to predictive and proactive features like Hunches and Routines, more than 30% of smart-home interactions are initiated by Alexa,” he said in an interview.

PS: The story was written using a keyboard.

Shraddha Goled

I am a technology journalist with AIM. I write stories focused on the AI landscape in India and around the world with a special interest in analysing its long term impact on individuals and societies. Reach out to me at shraddha.goled@analyticsindiamag.com.