Mozilla is building on its open-source initiatives and continues to position itself as a foundation for developers to innovate in the machine learning landscape. The firm has been working on DeepSpeech for a long time and has now enhanced the automatic speech recognition (ASR) engine.
DeepSpeech
DeepSpeech is a deep learning-based ASR engine that developers can integrate into their applications through an API. The speech-to-text engine is one of the most popular speech recognition projects on GitHub, with 12.2k stars. Its human-friendly API makes it straightforward to add speech recognition to an application using the pre-trained English models.
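As a rough illustration, a minimal transcription script with the deepspeech Python package might look like the sketch below. The file paths are placeholders, and the exact constructor arguments vary between releases, so treat this as a sketch rather than the definitive usage.

```python
import wave
import numpy as np
from deepspeech import Model

# Placeholder path; substitute the files from the pre-trained English model release
MODEL_PATH = "deepspeech-0.6.0-models/output_graph.pbmm"

# The 0.6-era constructor also takes a beam width; later releases take only the
# model path. Adjust to the version you install.
model = Model(MODEL_PATH, 500)

# Read a 16 kHz, 16-bit mono WAV file, the format the pre-trained model expects
with wave.open("audio.wav", "rb") as wav:
    audio = np.frombuffer(wav.readframes(wav.getnframes()), dtype=np.int16)

print(model.stt(audio))  # returns the transcript as a string
```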
With this new update, Mozilla has optimised the engine to deliver low-latency, privacy-preserving speech recognition capabilities.
Low Latency
DeepSpeech v0.6 responds faster than its previous versions without breaking developers’ existing workflows. Since developers do not need to fine-tune their systems, it also helps reduce a product’s time to market. Thanks to the new streaming decoder, the engine offers consistently low latency and memory utilisation irrespective of the length of the audio being transcribed.
To understand where the latency reduction comes from, it helps to know that DeepSpeech is composed of two main subsystems: an acoustic model and a decoder. The acoustic model is a deep neural network that takes audio features as input and outputs character probabilities, while the decoder uses a beam search algorithm to turn those character probabilities into the textual transcript returned by the system.
Earlier, only the acoustic model was capable of streaming, but now both subsystems can stream, eliminating the need for carefully tuned silence detection algorithms in applications. Besides, DeepSpeech delivers low-latency speech recognition regardless of network conditions, as it can run entirely offline.
The image below demonstrates how the streaming decoder has significantly improved response time: DeepSpeech is now 73% faster than its previous version.
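For context, the streaming flow with the Python binding looks roughly like the sketch below. The paths, beam width, and chunk size are illustrative, and in releases after 0.6 these streaming calls moved from the Model object onto a dedicated stream object.

```python
import wave
import numpy as np
from deepspeech import Model

# Placeholder model path and beam width (0.6-era constructor)
model = Model("deepspeech-0.6.0-models/output_graph.pbmm", 500)

# Feed the audio in small chunks instead of waiting for the full recording
stream = model.createStream()
with wave.open("audio.wav", "rb") as wav:  # 16 kHz, 16-bit mono
    while True:
        frames = wav.readframes(512)
        if not frames:
            break
        chunk = np.frombuffer(frames, dtype=np.int16)
        model.feedAudioContent(stream, chunk)
        # Partial transcripts are available at any point during the stream
        print("partial:", model.intermediateDecode(stream))

# Closing the stream returns the final transcript
print("final:", model.finishStream(stream))
```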
Support For TensorFlow Lite
Mozilla has embraced TensorFlow Lite, the version of TensorFlow built for mobile and embedded devices. This adoption has cut the package size from 98 MB to 3.7 MB, resulting in faster start-up times, and the English model size has also decreased from 188 MB to 47 MB. The firm accomplished this with post-training quantisation, a technique that reduces the size of a model after it has been trained.
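In general terms, post-training quantisation is applied when an already-trained model is converted for TensorFlow Lite, as in the generic sketch below. This is not DeepSpeech’s actual export script, and the directory and file names are placeholders.

```python
import tensorflow as tf

# Convert an already-trained model (placeholder path) to TensorFlow Lite
converter = tf.lite.TFLiteConverter.from_saved_model("trained_model_dir")

# Enable post-training quantisation: weights are stored in a lower-precision
# format, shrinking the file without retraining the network
converter.optimizations = [tf.lite.Optimize.DEFAULT]

tflite_model = converter.convert()
with open("model_quantised.tflite", "wb") as f:
    f.write(tflite_model)
```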
Quantisation has drastically reduced the size of the model, enabling developers to embed it into applications without trading off performance.
The smaller model uses 22 times less memory and starts up over 500 times faster, which should help democratise the API among speech recognition developers.
What It Means To The Community
Companies like Amazon and Google have taken the lead in speech recognition but have not open-sourced their packages. While this helps them retain their lead in the speech recognition landscape, they might lose their position to Mozilla if it keeps making breakthroughs with the open-source community. Open source has helped Mozilla achieve numerous feats thanks to long-term volunteer contributors such as debinat, Carlos Fonseca, and others.
Beyond Mozilla itself, such initiatives help the wider community get familiar with the technology, allowing developers to innovate and offer their own speech recognition services.
For example, a Brazilian startup is already using DeepSpeech to offer a transcription service that generates medical reports from speech for the healthcare sector. In this highly competitive landscape, the community’s help goes a long way in enhancing such solutions.
Outlook
There has long been a dearth of high-quality open-source speech recognition technology, so only a few companies have been able to deliver speech recognition-based services. However, DeepSpeech’s latest version takes a big leap towards giving developers an API that caters to the needs of the market.
With this advancement, Mozilla is now giving blue-chip companies a run for their money. Consequently, they may eventually open-source their own speech recognition technologies to retain their lead.