Amid ChatGPT Hype, OpenAI Silently Releases Second Version of Whisper

This new model is trained for more epochs with regularisation and shows improved performance compared to the previous version.

As ChatGPT (built on the GPT-3.5 architecture) continues to make waves across the globe, OpenAI has quietly released the second version of Whisper, its open-source multilingual speech recognition model.

This new model is trained for more epochs with regularisation and shows improved performance compared to the previous version. However, it has the same architecture as the original large model. The team said that it would update its research paper soon.

The source code for OpenAI Whisper V2 is available in the Whisper repository on GitHub.

In September 2022, the AI research and development company OpenAI released Whisper, which can transcribe and translate speech from 97 diverse languages. Whisper is trained on 680,000 hours of multilingual data collected from the web. However, the training dataset for Whisper has been kept private.

Because Whisper's first version was trained on a comparatively large and diverse dataset and was not fine-tuned to any specific one, it did not surpass models specialised for LibriSpeech, one of the most widely used benchmarks for judging speech recognition.

OpenAI in its blog stated that it hoped that Whisper would serve as a foundation for building useful applications and for further research on robust speech processing.

Currently, the company is experimenting across various offerings. These include DALL·E 2, which can produce art from text, the latest ChatGPT, and even the much-awaited GPT-4. However, using Whisper only to translate and transcribe audio under-utilises what it could do.

Challenges 

One major challenge is that a typical user's laptop is far less powerful than the hardware used by professional transcription services. Secondly, installing the model is not very user-friendly. Another disadvantage is that the predicted timestamps are often biased towards integer values.

Users have observed that these integer timestamps tend to be less accurate; blurring the predicted distribution may help, but no conclusive study has been done yet.
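
To make the "blurring" idea concrete, here is a minimal Python sketch on toy numbers. It is not taken from Whisper's code: the function names, the Gaussian kernel and the eleven-bin distribution are all illustrative assumptions. The sketch treats the model's output as a probability distribution over timestamp bins, smooths it so that an isolated spike counts for less than a broad cluster of neighbouring bins, and then picks the most likely bin.

```python
import math


def gaussian_kernel(radius, sigma):
    """Return normalised Gaussian weights for offsets -radius..radius."""
    weights = [math.exp(-0.5 * (i / sigma) ** 2) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_distribution(probs, radius=2, sigma=1.0):
    """Smooth a distribution over timestamp bins and renormalise it.

    Hypothetical helper: `radius` and `sigma` are arbitrary choices,
    not values used by any real decoder.
    """
    kernel = gaussian_kernel(radius, sigma)
    n = len(probs)
    blurred = [0.0] * n
    for i in range(n):
        for k, w in enumerate(kernel):
            j = i + k - radius
            if 0 <= j < n:  # ignore neighbours that fall off either end
                blurred[i] += w * probs[j]
    total = sum(blurred)
    return [b / total for b in blurred]


def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)


# Toy distribution (made-up numbers): bin 5 stands in for an "integer"
# timestamp with an isolated spike; bins 7-9 form a broader cluster.
toy = [0.0, 0.0, 0.0, 0.0, 0.01, 0.30, 0.01, 0.22, 0.24, 0.22, 0.0]
blurred = blur_distribution(toy)

print("raw pick:", argmax(toy))
print("blurred pick:", argmax(blurred))
```

On these toy numbers the raw distribution peaks on the isolated spike (bin 5), while the blurred one peaks inside the neighbouring cluster (bin 8). Whether the same effect improves timestamps on real audio remains, as noted above, unproven.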

Potential Risk 

While there are a host of advantages to using the model, there are also potential risks and disadvantages. 

On GitHub, under the ‘Broader Implications’ section of the model card, OpenAI warns that it could be used to automate surveillance or identify individual speakers in a conversation, but the company hopes it will be used “primarily for beneficial purposes”.

Aparna Iyer

Aparna Iyer has covered various sectors spanning education, wildlife, culture and law for close to a decade. She now writes on technology and is keen to unearth its capability for public good.