Amazon Making Inroads Into Chip Industry; Now Uses Its Own Machine Learning Chips For Alexa Services

“Amazon’s cloud-based voice service Alexa powers Amazon Echo devices and more than 140,000 models of smart speakers, lights, plugs, smart TVs, and cameras.”

At last year’s re:Invent conference, AWS announced the launch of its Inferentia chips, designed to process machine learning workloads. This week, AWS announced that Alexa services will now be powered by AWS Inferentia, its own chip: the company has migrated the majority of Alexa’s GPU-based ML inference workloads to Amazon Elastic Compute Cloud (EC2) Inf1 instances, which are built around Inferentia.

According to Amazon, tens of millions of customers interact with Alexa every month to control their home devices. The company claims that more than 100 million devices are connected to Alexa, and that migrating to Inferentia chips has made the service even better. Compared to GPU-based instances, Inferentia has delivered 25% lower end-to-end latency and 30% lower cost for Alexa’s text-to-speech (TTS) workloads. The lower latency, says Amazon, has allowed Alexa engineers to try out more complex algorithms and enhance the overall Alexa experience for customers.

How Is Inferentia Helping Alexa?

“Migrating to AWS Inferentia resulted in 25% lower end-to-end latency, and 30% lower cost compared to GPU-based instances for Alexa’s text-to-speech workloads.”

Deploying machine learning models can be very resource-intensive, and inference, which runs on every single user request, is where most of the production compute is spent. AWS Inferentia is designed specifically to handle these ML inference workloads.
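
For teams replicating this kind of setup, the migration is essentially a change of instance type: models are served from Inferentia-backed Inf1 instances instead of GPU instances. The snippet below is a minimal sketch, using boto3, of launching a single Inf1 instance; the AMI ID and region are placeholders, and in practice one would pick an AMI that ships with the AWS Neuron SDK (for example, a Deep Learning AMI).

    import boto3

    # Minimal sketch: launch one Inferentia-backed EC2 Inf1 instance with boto3.
    # The AMI ID below is a placeholder; a real deployment would use an AMI with
    # the AWS Neuron SDK preinstalled.
    ec2 = boto3.client("ec2", region_name="us-east-1")

    response = ec2.run_instances(
        ImageId="ami-0123456789abcdef0",   # placeholder AMI ID
        InstanceType="inf1.xlarge",        # smallest Inf1 size, one Inferentia chip
        MinCount=1,
        MaxCount=1,
    )
    print(response["Instances"][0]["InstanceId"])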


Each AWS Inferentia chip contains four NeuronCores, each equipped with a large on-chip cache. The cache cuts down on external memory accesses, dramatically reducing latency and speeding up typical deep learning operations such as convolutions and transformer layers. Speeding up these operations is critical to Alexa, whose handling of a request goes through three stages:

  • Automatic Speech Recognition (ASR): Alexa first converts the incoming sound to text.
  • Natural Language Understanding (NLU): Alexa then interprets the meaning of what it heard.
  • Text-To-Speech (TTS): Alexa finally generates speech from the response text.

Of Alexa’s three main inference workloads (ASR, NLU, and TTS), the text-to-speech workloads initially ran on GPU-based instances. TTS relies heavily on machine learning models to build phrases that sound natural in pronunciation, rhythm, the connections between words, and intonation.
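
To give a sense of how a model such as a TTS network is targeted at Inferentia’s NeuronCores, here is a minimal sketch using the torch-neuron package from the AWS Neuron SDK for Inf1. The tiny model, layer sizes and input shapes are placeholders for illustration, not Alexa’s actual workload, and compiling and running the artifact assumes an environment with the Neuron SDK installed.

    import torch
    import torch_neuron  # AWS Neuron SDK extension for PyTorch on Inf1

    # Placeholder stand-in for a TTS-style network; Alexa's real models are not public.
    class TinyTTS(torch.nn.Module):
        def __init__(self):
            super().__init__()
            self.encoder = torch.nn.Linear(256, 512)
            self.decoder = torch.nn.Linear(512, 80)  # e.g. 80 mel-spectrogram bins

        def forward(self, phoneme_embeddings):
            return self.decoder(torch.relu(self.encoder(phoneme_embeddings)))

    model = TinyTTS().eval()
    example = torch.rand(1, 100, 256)  # (batch, sequence, embedding) dummy input

    # Compile the traced model for the NeuronCores; operators the Neuron compiler
    # cannot place on the chip fall back to CPU automatically.
    neuron_model = torch.neuron.trace(model, example_inputs=[example])
    neuron_model.save("tts_neuron.pt")

    # On an Inf1 instance, the compiled artifact loads like any TorchScript model.
    loaded = torch.jit.load("tts_neuron.pt")
    print(loaded(example).shape)

How much of the latency gain described above is realised depends on how much of the graph the compiler can keep on the NeuronCores and their on-chip cache rather than falling back to the host CPU.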


Alexa handles billions of inference requests every week. The whole process leans heavily on machine learning to transform sound into phonemes, phonemes into words, words into phrases, and phrases into intents, with multilingual translation layered on top. Some latency is unavoidable at that scale, but Amazon does not want to leave room for complacency, and AWS Inferentia is how it keeps the service top-notch.

Amazon’s Silicon Ambitions & Future Direction

Amazon made its hardware ambitions obvious as early as 2015. Predicting that hardware specialization would be a big deal, the company has had a custom ASIC team focused on AWS ever since. In 2016, James Hamilton, a VP at AWS, demoed the custom ASIC that had been powering AWS servers for years.

Today, AWS has its own custom-built AI chip, Inferentia, and even a custom-built processor, Graviton2. So far, the majority of data centres have been powered by integrated solutions from the likes of Intel, NVIDIA and AMD. With its home-grown silicon, Amazon is gradually moving towards self-reliance, similar to what Apple has been doing with its own silicon efforts. Over the last couple of years, Amazon has steadily increased the role of its own hardware in its services, the latest step being Alexa’s workload migration to Inferentia. The data centre is a huge market for Intel and other chipmakers, and AWS is a giant when it comes to data centres: it leads the cloud segment and boasts a diverse portfolio of customers, including Netflix.

If Amazon decides to run its data centres on its home-grown silicon across the board, it will be a big blow to the chipmakers who rely heavily on selling into them. Google has its TPUs, and now AWS has Inferentia. If cloud service providers can match the performance benchmarks of the top chipmakers, it will mark the beginning of a new wave in the infrastructure-as-a-service industry. For a company like Amazon, which has made inroads into the consumer base, B2B services, AI research and now silicon, there could not be a better time.
