
Why Facebook’s New Low Latency Speech Recognition Framework Can Be A Game Changer



Online speech recognition is gaining prominence as people across the world leverage it to control devices and get answers to their queries. According to a report, around 55% of teenagers use voice search. And beyond search, people will increasingly use their voice to command machines to carry out a wide range of tasks. Consequently, automatic speech recognition (ASR) is on the rise, as its potential to streamline workflows is enormous. However, high latency is impeding its adoption among users, since it gets in the way of performing tasks in real time.

With Facebook’s open-source online speech recognition framework, wav2letter@anywhere, developers can build applications that deliver a superior user experience by reducing that latency. Built on top of Facebook’s benchmarked libraries wav2letter and wav2letter++, wav2letter@anywhere is designed for high-speed inference.

Facebook’s Open-Source Online Speech Recognition

Unlike most ASR systems, which use recurrent neural networks (RNNs), wav2letter@anywhere uses convolutional acoustic models. The firm had already benchmarked fully convolutional acoustic models trained with connectionist temporal classification (CTC) and found them faster than RNNs while still achieving a better word error rate (WER). This, in turn, improved throughput by 3x on specific inference models. The idea behind the framework is to support end-to-end speech recognition workflows that can be used in production to build robust applications. To that end, the social media giant focused on supporting concurrent audio streams, empowering developers to scale while improving performance. It also offers APIs to ensure compatibility with various platforms, such as Android and iOS.

Written in C++, wav2letter@anywhere is built for speed from the ground up, allowing developers to make applications that process audio and return results quickly. The framework was developed with streaming requirements in mind. Thus, it uses Facebook General Matrix Multiplication (FBGEMM), a low-precision, high-performance matrix multiplication and convolution library, for its server-side interface.
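The speedup from a low-precision library like FBGEMM comes from quantizing float weights and activations to 8-bit integers and accumulating in int32, trading a small amount of accuracy for much cheaper arithmetic. The sketch below illustrates that quantize-multiply-rescale pattern in plain NumPy; it is a conceptual analogue, not FBGEMM itself:

```python
import numpy as np

# Illustrative int8 quantized matrix product, in the spirit of low-precision
# GEMM libraries such as FBGEMM. Plain NumPy sketch, not the real library.

def quantize(x, num_bits=8):
    """Map float values to signed integers with a per-tensor scale."""
    qmax = 2 ** (num_bits - 1) - 1          # 127 for int8
    scale = float(np.abs(x).max()) / qmax
    if scale == 0.0:
        scale = 1.0                          # avoid division by zero for all-zero input
    q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def int8_matmul(a, b):
    """Quantize both inputs, multiply in int32, then rescale back to float."""
    qa, sa = quantize(a)
    qb, sb = quantize(b)
    acc = qa.astype(np.int32) @ qb.astype(np.int32)  # integer accumulation
    return acc.astype(np.float32) * sa * sb

rng = np.random.default_rng(0)
a = rng.standard_normal((4, 8)).astype(np.float32)
b = rng.standard_normal((8, 3)).astype(np.float32)
err = np.abs(int8_matmul(a, b) - a @ b).max()
print(f"max abs error vs float32 matmul: {err:.3f}")
```

On real hardware, the int8 path wins because integer multiply-accumulate units are wider and cheaper than float ones, which is exactly what a streaming ASR server needs.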

Facebook also used time-depth separable (TDS) convolutions to reduce model size and computational FLOPs while avoiding a hit to accuracy. Further, the firm applied asymmetric padding, placing all of a convolution's padding at the beginning of the input, to reduce the acoustic model's need for future context and thereby decrease latency.
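Asymmetric padding matters for streaming because a symmetrically padded convolution must wait for future audio frames before producing an output, while left-only padding lets each output depend on past samples alone. A toy 1-D example of the idea (illustrative values, not the actual model's kernels):

```python
import numpy as np

# Illustrative 1-D convolution with asymmetric (left-only) padding, showing
# how an output can be produced without waiting for future samples.

def conv1d_asymmetric(signal, kernel):
    """Pad only on the left so output[i] depends on samples up to index i."""
    k = len(kernel)
    padded = np.concatenate([np.zeros(k - 1), signal])  # all padding before the input
    return np.array([padded[i:i + k] @ kernel for i in range(len(signal))])

x = np.arange(1.0, 6.0)          # toy "audio": [1, 2, 3, 4, 5]
w = np.array([0.5, 0.5])         # simple 2-tap averaging kernel
print(conv1d_asymmetric(x, w))   # -> [0.5 1.5 2.5 3.5 4.5]
```

Each output here averages the current and previous sample only, so in a streaming setting the model can emit a result the moment a new frame arrives instead of buffering future context.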

By combining the features of wav2letter++ with modern acoustic and language model architectures in both supervised and semi-supervised settings, Facebook has sped up the process of speech recognition.

Impact On The Speech Recognition Community

Unlike other technologies, where there is a plethora of open-source projects, speech recognition has only a few effective projects that are accessible to all. This slows the adoption of the technology. Currently, various blue-chip companies own robust ASR systems but do not open-source them, as they want to maintain a monopoly in the landscape.

Companies like Google, Amazon, and Microsoft use ASR for their virtual assistants while competing among themselves. Although this has allowed them to narrow the competition, they might lose that edge in the future, as organisations such as Mozilla, and now Facebook, have made low-latency ASR public. This will allow developers to create products that can challenge the blue-chip companies.

Facebook’s open-source online speech recognition will also encourage others to contribute to the project, further improving the framework. Mozilla took the same approach, and it benefited them: contributors like debinat and Carlos Fonseca were central to achieving low latency in its DeepSpeech library. Such success might motivate current leaders in the ASR landscape, like Google and Amazon, to open-source their speech recognition in the future.

Outlook

Today, open source is crucial for any technology to expand rapidly, as collaborative efforts have, over the years, helped the data science ecosystem proliferate. With Facebook’s open-source online speech recognition, developers now have another option besides Mozilla’s DeepSpeech to choose from and apply to their use cases. Such initiatives from Facebook and Mozilla will empower developers, thereby increasing competition in the ASR landscape.


Rohit Yadav

Rohit is a technology journalist and technophile who likes to communicate the latest trends around cutting-edge technologies in a way that is straightforward to assimilate. In a nutshell, he is deciphering technology. Email: rohit.yadav@analyticsindiamag.com