Google Introduces Low Bitrate Speech Codec For Smoother Communication

The world has undoubtedly become a small place, thanks to advancements in communication. But our communication systems still fall short at times. Poor network coverage, low bandwidth, and high traffic are the major causes of such disruptions. To remedy this, Google has introduced Lyra, a high-quality, very low-bitrate speech codec intended to make voice communication available even on the slowest networks.

What Is Lyra?

Popular real-time communication frameworks such as WebRTC use compression techniques called codecs to encode and decode signals for transmission or storage. Codecs make it possible to transfer heavy data, both audio and video, efficiently. The amount of data encoded per unit of time is called the bitrate.
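To make the idea of bitrate concrete, here is a minimal Python sketch that computes the bitrate of uncompressed audio, which is the starting point a codec has to shrink. The 16 kHz sample rate and 16-bit depth are illustrative assumptions (typical for wideband speech), not figures from Google's post, and `pcm_bitrate_bps` is a hypothetical helper name.

```python
def pcm_bitrate_bps(sample_rate_hz: int, bits_per_sample: int, channels: int = 1) -> int:
    """Bitrate of uncompressed PCM audio, in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


# Assumed example: 16 kHz, 16-bit, mono speech (not a figure from the article).
raw_bps = pcm_bitrate_bps(16_000, 16)
print(f"Uncompressed speech: {raw_bps / 1000:.0f} kbps")  # prints "Uncompressed speech: 256 kbps"
```

Under these assumptions, raw speech needs 256 kbps, which is why a codec is required before the audio can travel over a slow link.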

However, codecs struggle to deliver high-quality, low-latency communication in real time while using little data. Though it might seem counterintuitive, some high-quality speech codecs in use today need a higher bitrate than modern video codecs can reach. Lowering the bitrate of an audio codec typically makes the voice less intelligible and more robotic.

Lyra is a novel method for compressing and transmitting voice signals. To build it, the researchers combined traditional codec techniques with recent machine learning methods, using models trained on thousands of hours of data.

Credit: Google AI Blog


Every 40ms, Lyra extracts features, or distinctive speech attributes, from the input and compresses them before transmission. The features are log mel spectrograms: lists of numbers representing the speech energy in different frequency bands. At the receiving end, a generative model converts the features back into a speech signal.
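The feature extraction step can be sketched with NumPy. This is a rough illustration of computing one log mel vector per 40ms frame, not Lyra's actual implementation: the 16 kHz sample rate, the 32 mel bands, the Hann window, the non-overlapping frames, and all function names are assumptions made for the example.

```python
import numpy as np


def hz_to_mel(hz):
    """Convert frequency in Hz to the mel scale (common HTK-style formula)."""
    return 2595.0 * np.log10(1.0 + np.asarray(hz, dtype=float) / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular filters evenly spaced on the mel scale, shape (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    edges_hz = mel_to_hz(
        np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2), n_mels + 2)
    )
    bank = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        lo, mid, hi = edges_hz[i], edges_hz[i + 1], edges_hz[i + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        # Each filter is a triangle: zero outside [lo, hi], peak of 1 at mid.
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank


def log_mel_features(signal, sample_rate=16_000, frame_ms=40, n_mels=32):
    """Return one log mel energy vector per non-overlapping frame."""
    signal = np.asarray(signal, dtype=float)
    frame_len = sample_rate * frame_ms // 1000  # 640 samples at 16 kHz and 40ms
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    window = np.hanning(frame_len)
    power = np.abs(np.fft.rfft(frames * window, axis=1)) ** 2
    bank = mel_filterbank(n_mels, frame_len, sample_rate)
    # Small constant avoids log(0) on silent bands.
    return np.log(power @ bank.T + 1e-10)


# One second of a 440 Hz tone yields 25 frames of 32 numbers each.
t = np.arange(16_000) / 16_000
features = log_mel_features(np.sin(2 * np.pi * 440 * t))
print(features.shape)  # (25, 32)
```

Each row of `features` is the kind of short list of numbers the paragraph describes. A real codec would then quantise these values so they fit the bit budget, a step this sketch leaves out.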

Lyra’s newer, ‘natural-sounding’ generative models let it keep the low bitrate of parametric codecs while achieving high quality, generally on par with the state-of-the-art waveform codecs used in streaming and communication platforms.

However, one drawback of these generative models is their computational complexity. To overcome this, Lyra uses a cheaper variation of WaveRNN, a recurrent generative model. Although this model runs at a lower rate, it generates multiple signals in parallel, each covering a different frequency range. These signals are then combined into a single output signal at the desired sample rate. This lets Lyra run in real time not only on cloud servers but also on mid-range phones, with a processing latency of 90ms. As per Google’s blog, the generative model is trained on thousands of hours of speech data and optimised to reproduce the input audio accurately.

In its current form, Lyra operates at 3kbps and can be used where bandwidth is insufficient for higher bitrates. According to Google, listening tests show that Lyra outperforms other codecs at that bitrate and compares favourably with Opus, a popular codec, running at 8kbps. That amounts to a bandwidth reduction of more than 60 percent.
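The figures above can be checked with a little arithmetic. The sketch below assumes 1 kbps means 1,000 bits per second and pairs the 3kbps bitrate with the 40ms feature interval mentioned earlier. That pairing ignores packet headers and any other overhead, so the per-frame budget is only a rough upper bound. The helper names are hypothetical.

```python
def bits_per_frame(bitrate_bps: int, frame_ms: int) -> float:
    """Bits available for each frame at a given bitrate."""
    return bitrate_bps * frame_ms / 1000


def bandwidth_reduction(low_bps: int, high_bps: int) -> float:
    """Fraction of bandwidth saved by using low_bps instead of high_bps."""
    return 1 - low_bps / high_bps


LYRA_BPS = 3_000  # 3kbps, assuming 1 kbps = 1000 bits/s
OPUS_BPS = 8_000  # 8kbps Opus operating point cited in the article
FRAME_MS = 40     # feature interval cited in the article

budget = bits_per_frame(LYRA_BPS, FRAME_MS)
reduction = bandwidth_reduction(LYRA_BPS, OPUS_BPS)
print(f"{budget:.0f} bits ({budget / 8:.0f} bytes) per frame")  # 120 bits (15 bytes) per frame
print(f"Reduction vs 8kbps: {reduction:.1%}")                   # Reduction vs 8kbps: 62.5%
```

So each 40ms of speech has to fit in roughly 15 bytes, and moving from 8kbps to 3kbps saves 62.5 percent of the bandwidth, consistent with the "more than 60 percent" claim.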

Google has trained Lyra with thousands of hours of audio with speakers in over 70 languages using open-source audio libraries and then verified the audio quality with expert and crowdsourced listeners. Lyra is conceived to ensure universally accessible, high-quality audio experiences, said the Google spokesperson.

Wrapping Up

As per Google, Lyra gives users in emerging markets access to an efficient low-bitrate codec. Lyra can also be used in cloud environments, so users with different networks and device capabilities can chat seamlessly. According to Google’s blog post, pairing Lyra with newer video compression technologies such as AV1 could allow video calls even over very slow connections, such as a 56kbps dial-in modem.

Google is looking at acceleration via GPUs and TPUs to optimise Lyra’s performance and quality. “We are also beginning to research how these technologies can lead to a low-bitrate general-purpose audio codec (i.e., music and other non-speech use cases),” Google said in the blog post.

Shraddha Goled
I am a technology journalist with AIM. I write stories focused on the AI landscape in India and around the world with a special interest in analysing its long term impact on individuals and societies. Reach out to me at shraddha.goled@analyticsindiamag.com.
