This Mozilla Project Can Be A Game Changer In The Automatic Speech Recognition Landscape


Mozilla continues to build on its open-source initiatives and is working to become a foundation for developers to innovate in the machine learning landscape. The firm has been working on DeepSpeech for a long time, and has now enhanced the automatic speech recognition (ASR) engine.


DeepSpeech is a deep learning-based ASR engine that developers can integrate into their applications through an API. The speech-to-text engine is one of the most popular speech recognition models on GitHub, with 12.2k stars. Its API is intuitive to integrate with, and developers can add speech recognition features using pre-trained English models.

With this new update, Mozilla has optimised the engine to deliver low-latency, privacy-preserving speech recognition.



Low Latency

DeepSpeech v0.6 will expedite workflows for users, as it responds faster than previous versions without breaking developers’ workflows. Since developers do not need to fine-tune their systems, it also helps them reduce their products’ time to market. Thanks to a new streaming decoder, it offers low latency and low memory utilisation, irrespective of the length of the audio.

The engine consists of two main subsystems: an acoustic model and a decoder. The acoustic model is a deep neural network that takes audio features as inputs and outputs character probabilities, while the decoder uses a beam search algorithm to transform those character probabilities into the textual transcripts that the system returns.
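To make the decoder's job concrete, here is a minimal sketch of beam search over per-frame character probabilities. It assumes a CTC-style output with a blank symbol, which the article does not spell out, and it leaves out the language model that a production decoder would normally use. The function name, the `BLANK` symbol and the toy probabilities are all illustrative assumptions, not DeepSpeech's actual API.

```python
from collections import defaultdict

BLANK = "_"  # hypothetical "no character" label (a CTC-style assumption)


def beam_search_decode(probs, alphabet, beam_width=3):
    """Turn per-frame character probabilities into a likely transcript.

    probs:    list of frames; each frame holds one probability per
              symbol in `alphabet` (which must include BLANK).
    Returns the highest-scoring prefix among the surviving beams.
    """
    # Each prefix tracks two probabilities: ending in blank / in a character.
    beams = {"": (1.0, 0.0)}

    for frame in probs:
        candidates = defaultdict(lambda: [0.0, 0.0])
        for prefix, (p_blank, p_char) in beams.items():
            for symbol, p in zip(alphabet, frame):
                if symbol == BLANK:
                    # A blank leaves the prefix unchanged.
                    candidates[prefix][0] += (p_blank + p_char) * p
                elif prefix and symbol == prefix[-1]:
                    # A repeated character merges into the prefix unless
                    # a blank separated the two occurrences.
                    candidates[prefix][1] += p_char * p
                    candidates[prefix + symbol][1] += p_blank * p
                else:
                    candidates[prefix + symbol][1] += (p_blank + p_char) * p

        # Keep only the `beam_width` most probable prefixes.
        ranked = sorted(candidates.items(), key=lambda kv: sum(kv[1]), reverse=True)
        beams = {prefix: tuple(score) for prefix, score in ranked[:beam_width]}

    return max(beams.items(), key=lambda kv: sum(kv[1]))[0]


# Toy input: two frames over the alphabet (blank, "a", "b").
alphabet = [BLANK, "a", "b"]
frames = [[0.4, 0.35, 0.25], [0.4, 0.35, 0.25]]
print(beam_search_decode(frames, alphabet))  # prints "a"
```

In this toy input, picking the single best symbol per frame would choose the blank twice and return an empty transcript. The beam search instead adds up every alignment that collapses to "a" and ranks that prefix highest, which is why a search over prefixes is used rather than a frame-by-frame pick.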

Earlier, only one of the two was capable of streaming, but now both are, which eliminates the need for carefully tuned silence detection algorithms in applications. Besides, DeepSpeech delivers low-latency speech recognition regardless of network conditions, as it runs offline.
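The idea of a streaming decoder can be sketched as an object that accepts frames as they arrive and can report a transcript at any moment, so the application never has to decide where an utterance ends before decoding starts. The class below is a toy illustration only: it uses greedy per-frame decoding instead of beam search for brevity, assumes a CTC-style blank symbol, and its method names (`feed`, `intermediate`, `finish`) are hypothetical rather than DeepSpeech's API.

```python
BLANK = "_"  # hypothetical "no character" label (a CTC-style assumption)


class StreamingGreedyDecoder:
    """Toy streaming decoder that consumes probability frames in chunks."""

    def __init__(self, alphabet):
        self.alphabet = alphabet
        self.previous = BLANK  # best symbol of the last frame seen
        self.chars = []        # transcript decoded so far

    def feed(self, frames):
        """Decode a chunk of frames; only the last symbol is carried over."""
        for frame in frames:
            best_index = max(range(len(frame)), key=frame.__getitem__)
            best = self.alphabet[best_index]
            # Collapse repeats and drop blanks.
            if best != BLANK and best != self.previous:
                self.chars.append(best)
            self.previous = best

    def intermediate(self):
        """Return the transcript so far without ending the stream."""
        return "".join(self.chars)

    def finish(self):
        """Return the final transcript and reset for the next stream."""
        text = self.intermediate()
        self.previous, self.chars = BLANK, []
        return text


decoder = StreamingGreedyDecoder([BLANK, "h", "i"])
decoder.feed([[0.1, 0.8, 0.1], [0.1, 0.8, 0.1]])  # first chunk of audio
print(decoder.intermediate())                      # partial result: "h"
decoder.feed([[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]])  # second chunk
print(decoder.finish())                            # final result: "hi"
```

Because each call to `feed` only needs the state left by the previous chunk, the work per chunk stays the same however long the recording runs.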

The image below demonstrates how the streaming decoder has significantly improved the response time. DeepSpeech is now 73% faster than its previous version.

[Image: Speech Recognition]

Support For TensorFlow Lite

Mozilla has embraced TensorFlow Lite, a version of TensorFlow for mobile and embedded devices. This has reduced the package size from 98 MB to 3.7 MB, resulting in faster start-up time. The English model has also shrunk, from 188 MB to 47 MB. The firm accomplished this by using post-training quantisation, a technique that reduces the size of a model after it has been trained.
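As a rough illustration of post-training quantisation, the sketch below maps already-trained 32-bit float weights to 8-bit integers with a simple affine scheme. This is one common form of the technique and an assumption on our part: the article does not say which scheme was applied, and the helper names are hypothetical. Going from 32-bit to 8-bit storage shrinks the weights by a factor of four, which is in line with the reported drop from 188 MB to 47 MB.

```python
from array import array


def quantise(weights):
    """Map float weights to unsigned 8-bit codes (affine quantisation)."""
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / 255 or 1.0  # fall back to 1.0 if all weights are equal
    codes = array(
        "B",
        (min(255, max(0, round((w - lo) / scale))) for w in weights),
    )
    return codes, scale, lo


def dequantise(codes, scale, lo):
    """Recover approximate float weights from the 8-bit codes."""
    return [lo + c * scale for c in codes]


# Toy "trained" weights, stored as 32-bit floats before quantisation.
weights = [-1.0, -0.25, 0.0, 0.5, 1.0]
float32_bytes = len(array("f", weights).tobytes())

codes, scale, lo = quantise(weights)
int8_bytes = len(codes.tobytes())

print(float32_bytes // int8_bytes)    # size ratio between the two encodings
print(dequantise(codes, scale, lo))   # close to, but not exactly, the originals
```

The recovered weights differ from the originals by at most half a quantisation step, which is the small accuracy cost paid for the smaller model.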

This has drastically reduced the size of the model. As a result, developers can integrate it into applications without trading off performance.

[Image: Speech Recognition Framework Performance]

The smaller model consumes 22 times less memory and starts up over 500 times faster, which will help in democratising the API among speech recognition developers.

What It Means To The Community 

Companies like Amazon and Google have taken the lead in speech recognition but have not open-sourced their packages. While this helps them retain their lead in the speech recognition landscape, they might lose their position to Mozilla if it keeps making breakthroughs with the open-source community. Open source has helped Mozilla achieve numerous feats, thanks to long-term volunteer contributors like dabinat, Carlos Fonseca, and others.

Initiatives like Mozilla’s will help the community get familiar with the technology. They have already allowed developers to innovate and offer speech recognition services.

For example, a Brazilian startup is already using DeepSpeech to offer a transcription service to the healthcare sector, generating medical reports from speech. In this highly competitive landscape, community help goes a long way in enhancing the solutions.


Until now, there was a dearth of high-quality open-source speech recognition technology, so only a few companies were able to deliver speech recognition-based services. However, DeepSpeech’s latest version takes a big leap by giving developers an API to cater to the needs of the market.

With this advancement, Mozilla is now giving blue-chip companies a run for their money. Consequently, they might open-source their speech recognition technologies in the future to retain their lead.


Rohit Yadav
Rohit is a technology journalist and technophile who likes to communicate the latest trends around cutting-edge technologies in a way that is straightforward to assimilate. In a nutshell, he is deciphering technology.
