Gladio Announces Audio Transcription API built on OpenAI Whisper

Gladio’s audio transcription API is built on OpenAI’s Whisper large-v2 model and reports a WER as low as 1%

Jean-Louis Queguiner, the founder of Gladio, a company that works on AI deployment, announced the alpha release of its audio transcription API. Built on OpenAI’s Whisper-Large-v2, the speech-to-text API can transcribe a one-hour file in 10 seconds with a Word Error Rate as low as 1%. It is said to be at least five times more accurate than other products on the market. The company believes this will open up significant opportunities in the audio intelligence space and broaden future AI applications through plug-and-play APIs.

Whisper is a pre-trained model for automatic speech recognition (ASR), trained on 680,000 hours of audio data. It was proposed by Alec Radford and colleagues at OpenAI. The large-v2 model was trained for 2.5 times more epochs than the original large model to improve performance. Whisper generates human-readable transcriptions, meaning its output includes commas, periods, hyphens and other punctuation marks. This yields high-quality transcriptions with a low Word Error Rate (WER).
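For readers unfamiliar with the metric, WER is conventionally defined as the number of word-level substitutions, deletions and insertions needed to turn the system's transcript into the reference transcript, divided by the number of words in the reference. The Python sketch below illustrates that standard definition only; it is not Gladio's evaluation code, and the function name `word_error_rate` and the sample sentences are made up for illustration. It splits on whitespace and does no text normalization, so differences in casing or punctuation count as errors.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative helper (hypothetical name), not taken from Gladio or OpenAI.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # turning i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    hypothesis = "the cat sit on mat"  # one substitution, one deletion
    print(f"WER: {word_error_rate(reference, hypothesis):.3f}")
```

Under this definition, a WER of 1% corresponds to roughly one word error per hundred reference words.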

Integrating the latest NLP and deep learning research, the alpha API relies on neural network optimization, which has made inference around 60 times faster than similar providers on the market. Gladio is currently working on 250 models to create a “holistic intelligence solution” that can perform more than 45 tasks, including translation, summarization, gender detection and sentiment analysis.

Inference speed was the other parameter benchmarked. The baseline was established by measuring the inference speed of other speech-to-text (STT) providers. At a 16 kHz sampling rate and 16-bit encoding, alpha processed one hour of audio in both mono and stereo configurations, and its results were compared with those of other models performing the same task under the same parameters.

Source: Twitter
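To put the reported figures in perspective, the short Python sketch below works out two things from the numbers quoted in this story: how many times faster than real time "one hour in 10 seconds" is, and how much raw (uncompressed PCM) audio one hour represents at 16 kHz and 16 bits. The function names are illustrative, and the assumption that the benchmark files were uncompressed PCM is ours, not something Gladio has stated.

```python
def speedup_over_real_time(audio_seconds: float, processing_seconds: float) -> float:
    """How many seconds of audio are processed per second of compute time."""
    return audio_seconds / processing_seconds


def pcm_size_bytes(duration_seconds: int,
                   sample_rate_hz: int = 16_000,
                   bits_per_sample: int = 16,
                   channels: int = 1) -> int:
    """Size of raw, uncompressed PCM audio (assumed format, for illustration)."""
    bytes_per_sample = bits_per_sample // 8
    return duration_seconds * sample_rate_hz * bytes_per_sample * channels


if __name__ == "__main__":
    one_hour = 3600  # seconds

    # One hour of audio transcribed in 10 seconds, as reported.
    print(speedup_over_real_time(one_hour, 10))            # 360.0

    # Raw size of one hour of audio at 16 kHz, 16-bit.
    print(pcm_size_bytes(one_hour, channels=1) / 1e6)      # 115.2 (MB, mono)
    print(pcm_size_bytes(one_hour, channels=2) / 1e6)      # 230.4 (MB, stereo)
```

In other words, the reported speed corresponds to processing audio about 360 times faster than real time.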

The company also believes that “democratizing access” to AI should not only be about cost; it should also be about reducing the complexity of the tools used.

Vandana Nair

With a rare blend of engineering, MBA, and journalism degrees, Vandana Nair brings a unique combination of technical know-how, business acumen, and storytelling skills to the table. Her insatiable curiosity for all things startups, businesses, and AI technologies ensures that there's always a fresh and insightful perspective to her reporting.