The world has undoubtedly become a small place, thanks to advancements in communication. But our communication systems tend to fall short at times, with poor network coverage, low bandwidth, or high traffic being the major causes of disruption. To remedy this, Google has introduced Lyra — a high-quality, very low-bitrate speech codec for making voice communication available even on the slowest of networks.
What Is Lyra?
Popular real-time communication frameworks such as WebRTC rely on codecs to encode and decode signals for transmission or storage. Codecs compress heavy data, both audio and video, so it can be transferred efficiently. The amount of data encoded per unit of time is the bitrate.
However, codecs struggle to support high-quality, low-latency communication using less data in real time. Though it might seem counterintuitive, high-quality speech codecs require a higher bitrate than most modern video codecs. A lower bitrate for an audio codec results in less intelligible speech with a robotic texture.
Lyra is a novel method for compressing and transmitting voice signals. To build it, the researchers combined traditional codec techniques with the latest machine learning methods, using models trained on vast amounts of data.
Credit: Google AI Blog
Lyra extracts features, or distinctive speech attributes (a list of numbers representing the speech energy in different frequency bands, called log mel spectrograms), from the input every 40ms and compresses them before transmission. At the receiving end, a generative model converts the features back into a speech signal.
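The feature-extraction step described above can be sketched in a few lines of NumPy. This is an illustrative reconstruction, not Lyra's actual code: the function names, the 40-filter mel setting, and the 16 kHz sample rate are assumptions for the sketch.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sample_rate):
    # Triangular filters evenly spaced on the mel scale
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sample_rate / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sample_rate).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        for k in range(l, c):
            fb[i, k] = (k - l) / max(c - l, 1)   # rising slope
        for k in range(c, r):
            fb[i, k] = (r - k) / max(r - c, 1)   # falling slope
    return fb

def log_mel_features(frame, sample_rate=16000, n_mels=40):
    """One 40 ms speech frame -> log mel energies (the 'list of
    numbers representing speech energy in different frequency bands')."""
    n_fft = len(frame)
    # Power spectrum of the windowed frame
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(n_fft))) ** 2
    fb = mel_filterbank(n_mels, n_fft, sample_rate)
    return np.log(fb @ spectrum + 1e-6)

# 40 ms of audio at 16 kHz = 640 samples
frame = np.random.randn(640)
feats = log_mel_features(frame)
print(feats.shape)  # (40,)
```

The 40-dimensional vector produced per frame is what the encoder would quantise and transmit in place of the raw waveform.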
Lyra’s new and improved ‘natural-sounding’ generative models maintain a low bitrate while achieving high quality, generally on par with the state-of-the-art waveform codecs used in streaming platforms.
However, one drawback of these generative models is their computational complexity. To overcome this, Lyra uses a cheaper variant of WaveRNN, a recurrent generative model. It works at a lower sample rate but generates multiple signals in parallel, each in a different frequency band; these signals are then combined into a single signal at the desired sample rate. As a result, Lyra runs on cloud servers as well as mid-range phones, with a processing latency of 90ms. As per Google’s blog, the generative model is trained on thousands of hours of speech data and optimised to reproduce the audio accurately.
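The recombination idea — N parallel low-rate signals merged into one full-rate signal — can be illustrated with a toy frequency-domain sketch. This is not Lyra's actual filterbank synthesis; the function name, band count, and rates are assumptions for illustration only.

```python
import numpy as np

def combine_subbands(subbands):
    """Toy illustration: each subband signal carries one slice of the
    spectrum. Stacking the slices and inverse-transforming yields a
    signal at N times the subband sample rate."""
    # Keep the lower half of each subband's spectrum (its band content)
    slices = [np.fft.rfft(b)[:len(b) // 2] for b in subbands]
    # Stack the slices into one full-bandwidth spectrum
    full = np.concatenate(slices + [np.zeros(1)])
    return np.fft.irfft(full)

# Four parallel signals at 4 kHz rate -> one signal at 16 kHz rate
bands = [np.random.randn(160) for _ in range(4)]
out = combine_subbands(bands)
print(len(out))  # 640, i.e. 4x the subband length
```

The point of the design is that each parallel generator only has to run at a quarter of the output rate, which is what makes real-time synthesis feasible on mid-range phones.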
In its current form, Lyra operates at 3kbps and can be used where bandwidth conditions are insufficient for higher bitrates. Compared to other codecs in the same category, Lyra offers more than a 60 percent reduction in bandwidth, with quality comparable to Opus, a popular codec, operating at 8kbps.
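The bandwidth figure is easy to verify with back-of-the-envelope arithmetic, using the 3kbps and 8kbps rates quoted above:

```python
# Lyra at 3 kbps vs Opus at 8 kbps, as quoted in the article
lyra_kbps, opus_kbps = 3, 8
reduction = (opus_kbps - lyra_kbps) / opus_kbps
print(f"{reduction:.1%}")  # 62.5%, i.e. "more than 60 percent"
```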
Google trained Lyra on thousands of hours of audio from speakers of over 70 languages, using open-source audio libraries, and then verified the audio quality with expert and crowdsourced listeners. Lyra was conceived to ensure universally accessible, high-quality audio experiences, a Google spokesperson said.
As per Google, Lyra gives users in emerging markets access to an efficient low-bitrate codec. Lyra can also be used in cloud environments, meaning users with different networks and device capabilities can chat seamlessly. Combined with video compression techniques such as AV1, Lyra can also enable video calls over the internet.
Google is looking at acceleration via GPUs and TPUs to optimise Lyra’s performance and quality. “We are also beginning to research how these technologies can lead to a low-bitrate general-purpose audio codec (i.e., music and other non-speech use cases),” Google said in the blog post.