AssemblyAI releases Conformer-1 API, the SOTA Speech Recognition Model

Taking inspiration from the data scaling laws in DeepMind’s Chinchilla paper, the team adapted them to the ASR domain and curated 650K hours of English audio, making Conformer-1, according to the company, the largest supervised model trained so far.

AssemblyAI, a company that builds speech, voice, and text models, has announced Conformer-1, its latest state-of-the-art speech recognition model. Built on the Conformer architecture and trained on 650K hours of audio data, the model reaches near-human accuracy and makes up to 43% fewer errors on noisy data than other ASR models.
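The article does not say how the errors were counted. ASR systems are commonly scored with word error rate (WER), so the sketch below assumes that metric. It shows how a WER and a relative error reduction could be computed; the function names and the sample WER values in the usage note are hypothetical and are not figures from AssemblyAI.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_error_reduction(baseline_wer, new_wer):
    """Fraction of the baseline's errors that the new model removes."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer
```

With made-up numbers, a baseline WER of 0.10 and a new WER of 0.057 give relative_error_reduction(0.10, 0.057) of roughly 0.43, which is the kind of comparison a claim like "up to 43% fewer errors" usually refers to.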

To improve on the Conformer architecture, the company drew on Efficient Conformer, a modification of the original architecture that uses progressive downsampling, inspired by ContextNet, along with grouped attention. These changes speed up inference by 29% and training by 36%.
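To make the two ideas concrete, here is a minimal pure-Python sketch, not AssemblyAI’s implementation. It assumes progressive downsampling means shortening the sequence in stages, and that grouped attention concatenates neighbouring frames along the feature dimension before attention so that fewer positions attend to each other. The function names, strides and group size are illustrative assumptions.

```python
def stage_lengths(num_frames, strides):
    """Sequence length seen by each encoder stage after its downsampling."""
    lengths = []
    length = num_frames
    for stride in strides:
        length = -(-length // stride)  # ceiling division
        lengths.append(length)
    return lengths


def group_frames(frames, group_size):
    """Concatenate each run of `group_size` neighbouring frames into one
    longer feature vector, zero-padding the tail if needed."""
    dim = len(frames[0])
    padded = list(frames)
    while len(padded) % group_size:
        padded.append([0.0] * dim)
    return [
        [value for frame in padded[i:i + group_size] for value in frame]
        for i in range(0, len(padded), group_size)
    ]


def attention_score_count(num_positions):
    """Entries in the attention score matrix, which grows quadratically."""
    return num_positions * num_positions
```

For example, stage_lengths(1000, [2, 2, 2]) gives 500, 250 and 125 frames for three stages, and grouping 6 frames in threes leaves 2 positions, so the score matrix shrinks from 36 entries to 4. Savings of this kind are the usual motivation for both techniques.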

AssemblyAI took inspiration from the data scaling laws in DeepMind’s Chinchilla paper and adapted them to the ASR domain. It then curated 650K hours of English audio, making Conformer-1, according to the company, the largest supervised model trained so far.
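A back-of-the-envelope version of that reasoning might look like the sketch below. It rests on assumptions that are not in this article: the common rule-of-thumb reading of Chinchilla (about 20 training tokens per model parameter), a hypothetical 300-million-parameter model, and a hypothetical conversion of about 9,576 text tokens per hour of speech. The function names are illustrative as well.

```python
def training_tokens(num_parameters, tokens_per_parameter=20.0):
    """Token budget under the assumed tokens-per-parameter rule of thumb."""
    return num_parameters * tokens_per_parameter


def audio_hours(num_tokens, tokens_per_hour):
    """Hours of transcribed speech needed to supply `num_tokens` tokens."""
    if tokens_per_hour <= 0:
        raise ValueError("tokens_per_hour must be positive")
    return num_tokens / tokens_per_hour


# Hypothetical inputs, chosen only to illustrate the arithmetic.
tokens = training_tokens(300_000_000)                 # 6 billion tokens
hours = audio_hours(tokens, tokens_per_hour=9576.0)   # a little over 600K hours
```

Under these assumed inputs the estimate lands in the same range as the 650K hours quoted above, which shows how a text-based scaling rule could be carried over to audio.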

To tackle noise, one of the biggest problems in speech recognition, the team also implemented a modified version of sparse attention, a pruning method that makes the model’s weights sparse as a form of regularisation. The resulting robustness to noise is one of the model’s greatest achievements.
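The article does not describe how the modified sparse attention works. As a generic illustration only, the sketch below prunes a row of attention weights by keeping the largest few and renormalising them. Top-k selection is a stand-in technique chosen for this example, and the function names and the `keep` parameter are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(score - peak) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def sparsify(weights, keep):
    """Zero all but the `keep` largest weights, then renormalise to sum to 1."""
    if keep <= 0:
        raise ValueError("keep must be at least 1")
    ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    kept = set(ranked[:keep])
    pruned = [w if i in kept else 0.0 for i, w in enumerate(weights)]
    total = sum(pruned)
    return [w / total for w in pruned]


# One query attending over four frames; only the two strongest links survive.
row = sparsify(softmax([2.0, 0.1, 1.0, -1.0]), keep=2)
```

The intuition is that dropping weak links keeps the model from leaning on frames that carry little signal, which is one plausible reason sparsity would help with noisy audio.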


In 2020, Google Brain released the Conformer, a neural network designed for speech recognition. It is based on the Transformer architecture, which is widely used and known for its attention mechanism and parallel processing capabilities. The Conformer architecture enhances the Transformer by incorporating convolutional layers, allowing it to effectively capture both local and global dependencies, while remaining a compact neural network design.
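The split between local and global dependencies can be seen in a toy example. The sketch below is a simplification, not the actual Conformer block: it works on a sequence of single numbers, omits the feed-forward layers and normalisation, and uses hypothetical function names. It only shows that a convolution mixes nearby positions while self-attention mixes all of them.

```python
import math


def conv1d_same(sequence, kernel):
    """Local mixing: each output depends only on len(kernel) nearby inputs."""
    half = len(kernel) // 2
    out = []
    for t in range(len(sequence)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            idx = t + k - half
            if 0 <= idx < len(sequence):
                acc += weight * sequence[idx]
        out.append(acc)
    return out


def self_attention(sequence):
    """Global mixing: scalar dot-product attention over every position."""
    out = []
    for query in sequence:
        scores = [query * key for key in sequence]
        peak = max(scores)
        exps = [math.exp(score - peak) for score in scores]
        total = sum(exps)
        out.append(sum(e / total * value for e, value in zip(exps, sequence)))
    return out


def toy_block(sequence, kernel):
    """Attention followed by convolution, each with a residual connection."""
    attended = [x + a for x, a in zip(sequence, self_attention(sequence))]
    return [x + c for x, c in zip(attended, conv1d_same(attended, kernel))]
```

Feeding a single spike such as [0, 0, 1, 0, 0] through conv1d_same with a three-tap kernel changes only the spike and its two neighbours, whereas self_attention spreads some of it to every position.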

Mohit Pandey
Mohit dives deep into the AI world to bring out information in simple, explainable, and sometimes funny words. He also holds a keen interest in photography, filmmaking, and the gaming industry.
