Amid ChatGPT Hype, OpenAI Silently Releases Second Version of Whisper

This new model is trained for more epochs with regularisation and shows improved performance compared to the previous version.

As ChatGPT (built on the GPT-3.5 architecture) continues to make waves across the globe, OpenAI has quietly released the second version of Whisper, its open-source multilingual speech recognition model.

This new model is trained for more epochs with regularisation and shows improved performance compared to the previous version. However, it has the same architecture as the original large model. The team said that it would be updating its research paper soon.

The source code for Whisper V2 is available in OpenAI's Whisper repository on GitHub.

In September, the AI research and development company OpenAI released Whisper, which can translate and transcribe speech in 97 languages. Whisper was trained on over 680,000 hours of multilingual data collected from the web. However, the training dataset has been kept private.
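For readers who want to try the two tasks mentioned above, here is a minimal sketch using the open-source `whisper` Python package. It assumes the package and ffmpeg are installed and that `large-v2` is the checkpoint name for the new model; the helper name `run_whisper` and the audio file path are hypothetical and only for illustration.

```python
def run_whisper(audio_path: str, task: str = "transcribe",
                model_name: str = "large-v2") -> str:
    """Return the text Whisper produces for one audio file.

    task="transcribe" keeps the spoken language;
    task="translate" produces English text.
    """
    if task not in ("transcribe", "translate"):
        raise ValueError("task must be 'transcribe' or 'translate'")

    # Imported lazily so the argument check above works even on a
    # machine where the openai-whisper package is not installed.
    import whisper

    model = whisper.load_model(model_name)   # downloads weights on first use
    result = model.transcribe(audio_path, task=task)
    return result["text"]
```

Calling `run_whisper("interview.mp3", task="translate")` would, under these assumptions, return an English rendering of the recording. The large checkpoints are heavy, so a smaller model name such as `base` may be more practical on a laptop.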

Whisper’s first version was trained on a comparatively large and diverse dataset and was not fine-tuned to any specific one. As a result, it did not surpass models specialised for LibriSpeech, one of the most widely cited benchmarks for judging speech recognition.
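Benchmarks such as LibriSpeech are typically scored by word error rate (WER): the number of word substitutions, deletions and insertions needed to turn the model's output into the reference transcript, divided by the number of reference words. The sketch below is an illustrative implementation, not OpenAI's evaluation code; the function name and the simple lowercase-and-split normalisation are assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)
```

For example, comparing the reference "the cat sat on the mat" with the output "the cat sit on mat" gives one substitution and one deletion over six reference words, a WER of about 0.33.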

OpenAI in its blog stated that it hoped that Whisper would serve as a foundation for building useful applications and for further research on robust speech processing.

Currently, the company is experimenting across various offerings, including DALL·E 2, which can produce art from text, the latest ChatGPT, and the much-awaited GPT-4. However, using Whisper only to translate and transcribe audio under-utilises what the model could do.


Among the major challenges is that a user’s laptop may not be as powerful as the machines used by professional transcription services. Secondly, installing the model is not very user-friendly. Another disadvantage is that the predicted timestamps are often biased towards integer values.

Users have observed that these integer timestamps tend to be less accurate; blurring the predicted distribution may help, but no conclusive study has been done yet.
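One rough way to check for this bias in your own output is to measure how many predicted timestamps land exactly on a whole second. The helper below is a hypothetical sketch rather than part of Whisper; it assumes you have already collected the timestamps (in seconds) into a list, for instance from the `start` and `end` values of the segments Whisper returns.

```python
def integer_timestamp_fraction(timestamps, tolerance=1e-6):
    """Return the share of timestamps that fall on a whole second."""
    if not timestamps:
        raise ValueError("timestamps must not be empty")
    on_integer = sum(
        1 for t in timestamps if abs(t - round(t)) <= tolerance
    )
    return on_integer / len(timestamps)
```

A fraction far above what the timestamp resolution alone would explain suggests the bias users describe. With `[0.0, 2.0, 3.5, 7.0]`, three of the four values are whole seconds, so the function returns 0.75.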

Potential Risk 

While there are a host of advantages to using the model, there are also potential risks and disadvantages. 

On GitHub, under the ‘Broader Implications’ section of the model card, OpenAI warns that it could be used to automate surveillance or identify individual speakers in a conversation, but the company hopes it will be used “primarily for beneficial purposes”.


Aparna Iyer
Aparna Iyer has covered various sectors spanning education, wildlife, culture and law for close to a decade. She now writes on technology and is keen to unearth its capability for public good.
