Amazon Making Inroads Into Chip Industry; Now Uses Its Own Machine Learning Chips For Alexa Services

“Amazon’s cloud-based voice service Alexa powers Amazon Echo devices and more than 140,000 models of smart speakers, lights, plugs, smart TVs, and cameras.”

At last year’s re:Invent conference, AWS announced the launch of Inferentia, a chip designed to process machine learning workloads. This week, AWS announced that Alexa services will now be powered by its own AWS Inferentia chips. As a result, the majority of Alexa’s GPU-based ML inference workloads have been migrated to Amazon Elastic Compute Cloud (EC2) Inf1 instances.

According to Amazon, every month, tens of millions of customers interact with Alexa to control their home devices. The company claims that more than 100 million devices are connected to Alexa, and that migrating to Inferentia chips has made Alexa services even better. Compared to GPU-based instances, Inferentia has delivered 25% lower end-to-end latency and 30% lower cost for Alexa’s text-to-speech (TTS) workloads. The lower latency, says Amazon, has allowed Alexa engineers to try out more complex algorithms and enhance the overall Alexa experience for their customers.

How Is Inferentia Helping Alexa?

“Migrating to AWS Inferentia resulted in 25% lower end-to-end latency, and 30% lower cost compared to GPU-based instances for Alexa’s text-to-speech workloads.”

Deploying machine learning models can be very resource-intensive, and inference is where most of the computational work happens once a model is in production. AWS Inferentia is purpose-built to handle these ML inference workloads.


Each AWS Inferentia chip contains four NeuronCores equipped with a large on-chip cache. This cuts down on external memory accesses, dramatically reducing latency and speeding up typical deep learning operations such as convolutions and transformers. Speeding up these operations is critical to Alexa, which relies on three main inference workloads:

  • Automatic Speech Recognition (ASR): First, Alexa converts the sound to text.
  • Natural Language Understanding (NLU): Alexa then tries to understand what it heard.
  • Text-To-Speech (TTS): Finally, Alexa generates voice from text.
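The three stages above form a sequential pipeline. A minimal sketch in Python (the stub functions and their return values are purely illustrative stand-ins for the real ML models, not Alexa's actual implementation):

```python
# Illustrative sketch of a three-stage voice-assistant inference pipeline.
# Each stub stands in for a machine learning model (hypothetical values).

def automatic_speech_recognition(audio: bytes) -> str:
    # ASR model: audio waveform -> transcribed text (stub)
    return "turn on the lights"

def natural_language_understanding(text: str) -> dict:
    # NLU model: transcribed text -> structured intent (stub)
    return {"intent": "TurnOn", "device": "lights"}

def text_to_speech(text: str) -> bytes:
    # TTS model: response text -> synthesized audio (stub)
    return text.encode("utf-8")

def handle_request(audio: bytes) -> bytes:
    # The stages run strictly in sequence, so end-to-end latency is the
    # sum of the per-stage inference latencies.
    text = automatic_speech_recognition(audio)
    intent = natural_language_understanding(text)
    response = f"OK, {intent['intent']} for {intent['device']}"
    return text_to_speech(response)
```

Because the stages are chained, shaving inference latency at any one stage (as Inferentia does for TTS) directly shortens the total time a user waits for a reply.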

Of Alexa’s three main inference workloads (ASR, NLU, and TTS), the TTS workloads initially ran on GPU-based instances. TTS relies heavily on machine learning models to produce phrases that sound natural in pronunciation, rhythm, connection between words, and intonation.

Alexa handles billions of inference requests every week. The whole process relies heavily on machine learning to transform sound into phonemes, phonemes into words, words into phrases, and phrases into intents, with multilingual translations layered on top. Some latency is unavoidable, but Amazon is unwilling to let it degrade the experience, and AWS Inferentia is helping keep the services top-notch.

Amazon’s Silicon Ambitions & Future Direction

Amazon made its hardware ambitions obvious as early as 2015. Predicting that hardware specialization would be a big deal, Amazon has maintained a custom ASIC team focused on AWS ever since. In 2016, James Hamilton, VP at AWS, demoed a custom ASIC that had powered AWS servers for many years.

Today, AWS has its own custom-built AI chip, Inferentia, and even a custom-built processor, Graviton2. So far, the majority of data centres have been powered by integrated solutions from the likes of Intel, NVIDIA, and AMD. With its home-grown silicon, Amazon is gradually moving towards self-reliance, similar to what Apple has been doing with its own silicon efforts. In the last couple of years, Amazon has increasingly backed its services with its own hardware; the latest example is Alexa’s workload migration to Inferentia. The data centre is a huge market for Intel and other chipmakers, and AWS is a giant when it comes to data centres: it leads the cloud segment and flaunts a diverse portfolio of customers such as Netflix.

If Amazon decides to run its data centres on its own home-grown solutions, it will be a big blow to the chipmakers who rely heavily on selling silicon. Google has TPUs, and now AWS has Inferentia. If cloud service providers can match the performance benchmarks of the top chipmakers, it will mark the beginning of a new wave in the infrastructure-as-a-service industry. For a company like Amazon, which has made inroads into the consumer base, B2B services, AI research, and now silicon, there could not be a better time.

Ram Sagar
I have a master's degree in Robotics and I write about machine learning advancements.
