Google Introduces Low Bitrate Speech Codec For Smoother Communication

The world has undoubtedly become a small place, thanks to advancements in communication. But our communication systems still fall short at times, and poor network coverage, low bandwidth and high traffic are the major causes of such disruptions. To remedy this, Google has introduced Lyra, a high-quality, very low-bitrate speech codec that makes voice communication possible even on the slowest of networks.

What Is Lyra?

Popular real-time communication frameworks such as WebRTC rely on codecs, compression techniques that encode and decode signals for transmission or storage. Codecs make it possible to transfer heavy audio and video data efficiently. The amount of data encoded per unit of time is called the bitrate.
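To make the idea of bitrate concrete, here is a minimal sketch that computes it for uncompressed audio. The 16 kHz sample rate and 16-bit sample size are illustrative assumptions, not figures from Google's announcement, and `bitrate_kbps` is a hypothetical helper.

```python
def bitrate_kbps(bits_encoded: int, duration_s: float) -> float:
    """Bitrate = bits encoded per second, reported in kilobits per second."""
    return bits_encoded / duration_s / 1000


# Assumed example: uncompressed 16-bit mono PCM sampled at 16 kHz.
SAMPLE_RATE_HZ = 16_000
BITS_PER_SAMPLE = 16

one_second_of_pcm_bits = SAMPLE_RATE_HZ * BITS_PER_SAMPLE
raw_kbps = bitrate_kbps(one_second_of_pcm_bits, 1.0)
print(f"Raw PCM: {raw_kbps:.0f} kbps")  # Raw PCM: 256 kbps
```

Under these assumptions, a codec has to shrink the raw signal by roughly two orders of magnitude to reach single-digit kbps.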

However, codecs struggle to support high-quality, low-latency communication in real time while using little data. Though it might seem counterintuitive, some high-quality speech codecs require a higher bitrate than modern video codecs. Lowering the bitrate of an audio codec typically results in a less intelligible, more robotic-sounding voice.

Lyra is a novel method for compressing and transmitting voice signals. To build it, the researchers combined traditional codec techniques with the latest machine learning methods, using models trained on thousands of hours of data.

Credit: Google AI Blog

Lyra extracts features, or distinctive speech attributes, from the input every 40ms and compresses them before transmission. These features are log mel spectrograms: lists of numbers representing the speech energy in different frequency bands. At the receiving end, a generative model converts the features back into a speech signal.
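The feature extraction step can be sketched in a few lines of dependency-free Python. This is only an illustration of log mel features computed per 40ms frame, not Lyra's implementation: the 16 kHz sample rate, the eight bands, the triangular filters, the naive DFT and all function names are assumptions made for the example, and the compression (quantisation) step is left out.

```python
import cmath
import math

SAMPLE_RATE_HZ = 16_000  # assumption: the article gives no sample rate
FRAME_MS = 40            # from the article: one feature vector every 40 ms
NUM_BANDS = 8            # assumption: illustrative number of mel bands


def hz_to_mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10 ** (mel / 2595.0) - 1.0)


def power_spectrum(frame):
    """Naive DFT power spectrum for bins 0..N/2 (slow, but needs no libraries)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def mel_band_energies(power, sample_rate, num_bands):
    """Sum spectral power under triangular filters spaced evenly on the mel scale."""
    nyquist = sample_rate / 2
    bin_hz = nyquist / (len(power) - 1)
    top_mel = hz_to_mel(nyquist)
    edges_hz = [mel_to_hz(top_mel * i / (num_bands + 1))
                for i in range(num_bands + 2)]
    energies = []
    for b in range(num_bands):
        lo, mid, hi = edges_hz[b], edges_hz[b + 1], edges_hz[b + 2]
        total = 0.0
        for k, p in enumerate(power):
            f = k * bin_hz
            if lo < f <= mid:
                total += p * (f - lo) / (mid - lo)   # rising edge of the triangle
            elif mid < f < hi:
                total += p * (hi - f) / (hi - mid)   # falling edge of the triangle
        energies.append(total)
    return energies


def log_mel_features(samples, sample_rate=SAMPLE_RATE_HZ,
                     frame_ms=FRAME_MS, num_bands=NUM_BANDS):
    """Return one list of log band energies per non-overlapping frame."""
    frame_len = sample_rate * frame_ms // 1000
    features = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        power = power_spectrum(samples[start:start + frame_len])
        energies = mel_band_energies(power, sample_rate, num_bands)
        features.append([math.log(e + 1e-10) for e in energies])
    return features


# 80 ms of a 440 Hz tone gives two 40 ms frames of eight numbers each.
tone = [math.sin(2 * math.pi * 440 * t / SAMPLE_RATE_HZ)
        for t in range(SAMPLE_RATE_HZ * 80 // 1000)]
features = log_mel_features(tone)
print(len(features), len(features[0]))  # 2 8
```

Each frame of 640 samples is reduced to eight numbers here, which shows why transmitting features rather than the waveform needs so little bandwidth.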

Lyra’s new and improved ‘natural-sounding’ generative model keeps the bitrate low while achieving high audio quality, generally on par with the state-of-the-art waveform codecs used on streaming platforms.

However, one drawback of these generative models is their computational complexity. To overcome this, Lyra uses a cheaper variation of WaveRNN, a recurrent generative model. Though it works at a lower rate, it generates multiple signals in parallel in different frequency ranges. These signals are then combined into a single output signal at the desired sample rate. As a result, Lyra can run on cloud servers as well as on mid-range phones, with a processing latency of 90ms. As per Google’s blog, this generative model is trained on thousands of hours of speech data and optimised to output the audio accurately.

In its current form, Lyra operates at 3kbps and can be used where bandwidth is insufficient for higher bitrates. At that rate, its quality is comparable to that of Opus, a popular codec, operating at 8kbps, which amounts to a bandwidth reduction of more than 60 percent.
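The figures quoted here can be checked with simple arithmetic. The sketch below uses only the 3kbps, 8kbps and 40ms numbers from the article; the per-frame bit budget additionally assumes that the whole bitrate is spent on the feature frames, which the article does not state.

```python
LYRA_KBPS = 3   # from the article
OPUS_KBPS = 8   # from the article
FRAME_MS = 40   # feature interval from the article

# Relative bandwidth saving of 3 kbps compared with 8 kbps.
reduction_pct = (OPUS_KBPS - LYRA_KBPS) / OPUS_KBPS * 100

# Assumption: every transmitted bit belongs to a feature frame.
bits_per_frame = LYRA_KBPS * 1000 * FRAME_MS // 1000

print(f"{reduction_pct:.1f}% less bandwidth")     # 62.5% less bandwidth
print(f"{bits_per_frame} bits per 40 ms frame")   # 120 bits per 40 ms frame
```

A saving of 62.5 percent is consistent with the "more than 60 percent" claim, and under the stated assumption each 40ms of speech has to fit in about 120 bits.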

Google has trained Lyra with thousands of hours of audio with speakers in over 70 languages using open-source audio libraries and then verified the audio quality with expert and crowdsourced listeners. Lyra is conceived to ensure universally accessible, high-quality audio experiences, said the Google spokesperson.

Wrapping Up

As per Google, with Lyra, users in the emerging markets will now have access to an efficient low-bitrate codec. Lyra can also be used in cloud environments, meaning users with different networks and device capabilities can chat seamlessly. When used with video compression techniques such as AV1, Lyra can also enable video calls over the internet.

Google is looking at acceleration via GPUs and TPUs to optimise Lyra’s performance and quality. “We are also beginning to research how these technologies can lead to a low-bitrate general-purpose audio codec (i.e., music and other non-speech use cases),” Google said in the blog post.


Shraddha Goled

I am a technology journalist with AIM. I write stories focused on the AI landscape in India and around the world with a special interest in analysing its long term impact on individuals and societies. Reach out to me at shraddha.goled@analyticsindiamag.com.