Amazon Demos the Largest Text-to-Speech AI Model, Big Adaptive Streamable TTS with Emergent Abilities

This model sets a new benchmark for speech synthesis.

Amazon has introduced BASE TTS, a text-to-speech model trained on 100,000 hours of public domain speech data, mainly in English but also including German, Dutch, and Spanish. The researchers say it sets a new standard for natural-sounding speech.

The model uses a 1-billion-parameter Transformer and a convolution-based decoder for efficient text-to-speech conversion. It introduces a new approach to analysing speech that helps it distinguish between different voices. It also employs a technique called byte-pair encoding to reduce the size of the speech data, which enhances the model’s efficiency and speed in processing and generating speech.
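To give a feel for how byte-pair encoding can shrink a stream of speech data, here is a minimal sketch in Python. It is not Amazon's implementation: the integer "speech codes", the function names, and the choice of 100 as the first merged token ID are all assumptions made for illustration. The idea is simply to replace the most frequent adjacent pair of codes with a new token, again and again, so the sequence the model has to process gets shorter.

```python
from collections import Counter


def most_frequent_pair(seq):
    """Return the most common adjacent pair, or None if no pair occurs twice."""
    pairs = Counter(zip(seq, seq[1:]))
    if not pairs:
        return None
    pair, count = pairs.most_common(1)[0]
    return pair if count > 1 else None


def merge_pair(seq, pair, new_token):
    """Replace each non-overlapping occurrence of `pair` with `new_token`."""
    out = []
    i = 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(new_token)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def bpe_compress(codes, num_merges, first_new_token):
    """Apply up to `num_merges` merges; return the shorter sequence and the merge table."""
    seq = list(codes)
    merges = {}  # new token -> the pair of tokens it stands for
    next_token = first_new_token
    for _ in range(num_merges):
        pair = most_frequent_pair(seq)
        if pair is None:
            break  # nothing repeats any more, so further merges would not help
        merges[next_token] = pair
        seq = merge_pair(seq, pair, next_token)
        next_token += 1
    return seq, merges


def bpe_expand(seq, merges):
    """Undo the merges, newest first, to recover the original codes."""
    out = list(seq)
    for token in sorted(merges, reverse=True):
        left, right = merges[token]
        expanded = []
        for t in out:
            if t == token:
                expanded.extend((left, right))
            else:
                expanded.append(t)
        out = expanded
    return out


# Hypothetical toy input: a short run of made-up discrete speech codes.
codes = [5, 9, 5, 9, 5, 9, 2, 2, 5, 9]
compressed, merges = bpe_compress(codes, num_merges=3, first_new_token=100)
print(len(codes), "codes reduced to", len(compressed))
```

Because the merge table is kept, the shorter sequence can be expanded back to the original codes with `bpe_expand`, so nothing is lost by the compression. A production system would learn the merge table from a large corpus instead of a single utterance.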

BASE TTS shows new or ‘emergent’ capabilities as it is trained on more data. When trained on more than 10,000 hours of speech, it understands text better, allowing it to produce speech that sounds right for the context. The model can also handle complex language features like compound nouns and emotional expressions, showing its versatility.

One example provided in the paper is the sentence, ‘In the classroom, filled with the chatter of students sharing their holiday stories and the rustling of new textbooks, Mrs. Thompson, excited to embark on a new academic year, prepared a lesson that would challenge and inspire her students.’

BASE TTS grew out of the idea that text-to-speech systems would get better with scale. It not only produces high-quality speech but also shows new skills, like pronouncing difficult texts correctly and using the right emotional tone. It performs better than other large text-to-speech systems, making it a leading model.

In another example, the audio changes tone and drops to a whisper for the sentence, ‘A profound sense of realisation washed over Matty as he whispered, “You’ve been there for me all along, haven’t you? I never truly appreciated you until now.”’

BASE TTS could improve user experiences and support low-resource languages. It can mimic speaker characteristics with little reference audio, offering new ways to create synthetic voices for people who cannot speak. Amazon decided not to share BASE TTS openly to avoid misuse, highlighting ethical considerations in using advanced AI.

BASE TTS demonstrates that capabilities which have eluded speech models until now are within reach. The research team also highlights the importance of diverse speech data in representing different languages, ethnicities, dialects, and genders. They call for more research on how data affects the model and ways to make voice technology more inclusive.

A similar model is MetaVoice, an open-source 1.2B-parameter foundational model for TTS.

K L Krithika

K L Krithika is a tech journalist at AIM. Apart from writing tech news, she enjoys reading sci-fi and pondering the impossible technologies, trying not to confuse it with reality.