Amazon demoed its latest tech at re:MARS, including CodeWhisperer, the AI pair programmer; Proteus, the fully automated warehousing robot; and Project Kuiper.
During the keynote, Rohit Prasad, senior vice president and head scientist for Alexa at Amazon, spoke about the flagship virtual assistant and ambient intelligence. Rohit said Alexa gets a billion requests every week from millions of devices across 70 countries and in 17 different languages.
He said Alexa is headed to the moon as part of Project Artemis, NASA’s manned mission to the moon.
Ambient intelligence is AI embedded in your environment. It assists and anticipates your needs, but fades into the background when not required. Ambient intelligence offers the most practical route to generalisable intelligence, and the best evidence for that is the difference that Alexa is already making in customers’ lives, Rohit said. Thanks to predictive and proactive features like Hunches and Routines, more than 30 percent of smart-home interactions are initiated by Alexa.
Generalizable intelligence has three key attributes:
- AI generalizes its learning across many different tasks, much as a human adapts to various environments.
- AI continuously adapts to the user’s environment.
- AI learns new concepts through self-supervised learning or minimal external input.
Alexa’s self-learning mechanism automatically corrects tens of millions of defects a week, both customer errors and errors in its own natural language understanding (NLU) models. He said ambient intelligence is the “most practical” route to generalizable intelligence, or GI (the ability of AI entities to understand and learn any intellectual task as humans do).
The researchers at Amazon are using the latest Transformer-based models to improve AI in three foundational areas:
- Multi-tasking intelligence: Through large-scale acoustic neural encoders, the team is efficiently representing the underlying structure of speech across many different tasks such as acoustic event detection, wakeword detection, etc. Such multitask learning will soon be extended to the visual modality on Alexa devices with cameras, so that a device can detect whether the user is talking to it or to someone else.
- Multi-language intelligence: Most Transformer-based large models are typically trained on a single language. Alexa’s model, by contrast, is trained on publicly available corpora in 12 languages. The model is called the Alexa teacher model and can be applied to many different AI tasks within the complex system of Alexa.
- Multi-modal intelligence: The models incorporated in Alexa are multi-modal in nature and have visual scene understanding. For instance, you can ask Alexa whether “the window next to the sliding door is open”.
Call conversation mode
Since the introduction of Alexa Conversations in 2019, developers have been striving to build experiences on Alexa that enable free-form interactions through deep learning. With the call conversation mode, launched earlier this year, you don’t have to use the wakeword to activate the device: the audio and visual input is processed locally on the device and in the cloud to generate the best response.
Rohit demoed Alexa’s new capabilities with an example. First, Alexa distills all the relevant content through neural information retrieval. The automated summarisation feature then ensures that Alexa gives a bite-sized result instead of a list of links. Finally, the AI model learns from the user’s preferences to make custom suggestions and recommendations.
Think before you speak
Amazon researchers have collected and published the largest data set for social common sense and have come up with a generative approach called ‘think before you speak’ for modeling common sense.
The AI first learns to model implicit common sense knowledge (think) by combining large language models with common sense knowledge graphs such as ConceptNet. Then, in the inference step, it uses this knowledge to generate responses (speak).
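The two-step flow described above can be pictured with a toy sketch. Nothing here is Amazon’s code: the tiny hand-written knowledge table, the function names `think` and `speak`, and the template reply are all assumptions made for illustration, standing in for a knowledge graph such as ConceptNet and a large language model.

```python
from typing import Dict, List

# Hypothetical stand-in for a common sense knowledge graph:
# each concept maps to a few plain-language facts about it.
TOY_KNOWLEDGE: Dict[str, List[str]] = {
    "rain": ["rain makes people wet", "people use umbrellas in the rain"],
    "exam": ["exams can be stressful", "people study before an exam"],
}


def think(utterance: str) -> List[str]:
    """Step 1 (think): surface implicit knowledge related to the utterance."""
    words = [w.strip(".,!?").lower() for w in utterance.split()]
    facts: List[str] = []
    for word in words:
        facts.extend(TOY_KNOWLEDGE.get(word, []))
    return facts


def speak(utterance: str, facts: List[str]) -> str:
    """Step 2 (speak): produce a reply conditioned on the surfaced knowledge.

    A real system would generate text with a language model; this toy
    version only fills a template with the first fact it found.
    """
    if not facts:
        return "Tell me more."
    return f"Keeping in mind that {facts[0]}, here is my reply to: {utterance}"


if __name__ == "__main__":
    user_says = "It might rain tomorrow."
    knowledge = think(user_says)
    print(knowledge)
    print(speak(user_says, knowledge))
```

The point of the split is that the knowledge is made explicit before the reply is produced, so the reply can be conditioned on it rather than on the user’s words alone.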
Rohit Prasad demoed Alexa reading The Wizard of Oz in the voice of his late grandmother. The model learned to produce a high-quality voice with less than a minute of training data (Grandma’s voice). The process was made possible by framing the problem as a voice conversion task and not a speech generation task. Prasad said that the desire behind the feature was to build greater trust in the interactions users have with Alexa by adding more “human attributes of empathy and affect”.
“These attributes have become even more important during the ongoing pandemic when so many of us have lost ones that we love. While AI can’t eliminate that pain of loss, it can definitely make their memories last,” Prasad said.