Bengaluru-Based Startup Spext Provides An API-Agnostic Platform To Convert Voice To Text

Deep learning models built on long short-term memory (LSTM) networks have shown remarkable performance in translating long sentences. They are now also being used to convert video to text, combining a deep convolutional neural network (DCNN) with a deep LSTM (DLSTM) for the task.

Bengaluru-based startup Spext provides a text editor for voice content that automatically converts large volumes of recorded conversations to text and accurately aligns the text with the spoken words. Co-founded by Anup Gosavi, Spext is a SaaS platform: the user uploads media, and the voice is automatically converted to text and aligned with the spoken words. Gosavi explains that it is a text-based voice editor in which users can delete portions of the transcript. Just as in a text editor, one can use Ctrl-C and Ctrl-V to create clips, or search for keywords and hear them in context.

Understanding the technology behind Spext Intelligent Media

Accurate Time Coding: The editor is built on a new kind of intelligent media that bundles time-coded information (for example, when a word was spoken, who the speakers are and what the context was) with the media itself. This makes it more granular and richer than traditional media formats such as .mp3 and .mp4.
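To make the idea of time-coded media concrete, here is a minimal sketch of what such a structure might look like. The class and field names (`SpokenWord`, `TimeCodedMedia`, `media_path`) and the sample values are hypothetical illustrations, not Spext's actual format, which the article does not describe.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SpokenWord:
    text: str
    start: float   # seconds from the beginning of the media
    end: float     # seconds from the beginning of the media
    speaker: str   # hypothetical speaker label


@dataclass
class TimeCodedMedia:
    media_path: str                      # the underlying audio or video file
    words: List[SpokenWord] = field(default_factory=list)

    def transcript(self) -> str:
        """Join the words into a plain-text transcript."""
        return " ".join(w.text for w in self.words)

    def speakers(self) -> List[str]:
        """List speakers in order of first appearance."""
        seen: List[str] = []
        for w in self.words:
            if w.speaker not in seen:
                seen.append(w.speaker)
        return seen


# Made-up sample data for illustration only.
sample = TimeCodedMedia(
    media_path="interview.mp3",
    words=[
        SpokenWord("Welcome", 0.00, 0.42, "host"),
        SpokenWord("back", 0.42, 0.71, "host"),
        SpokenWord("Thanks", 1.10, 1.48, "guest"),
    ],
)
```

Because every word carries its own start and end time, an editor working on this structure can map any selection in the text back to an exact range in the recording.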

Serverless Interaction: Spext’s technology lets users interact with the media locally in the browser, with no API calls to any of Spext’s servers. Users are not required to download any software.

Talking about the deep learning (DL) algorithms built for converting video to text, Gosavi says the early-stage startup has built them to optimise the following functions:

Accurately Align Transcript With Spoken Words: Speech-to-text APIs are optimised for captions, so precise timestamps for speech-text alignment are neither expected nor prioritised. Spext’s algorithms align the transcript with the audio at the microsecond level, so that any cuts or pastes of new media feel natural and don’t sound glitchy.
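Word-level alignment is what lets a text edit translate into a media edit. The sketch below shows one simple way a deletion in the transcript could be turned into the audio ranges to keep. It assumes each word is a `(text, start, end)` tuple with times in seconds; the function name `segments_to_keep` is hypothetical and this is not Spext's algorithm.

```python
from typing import List, Set, Tuple

AlignedWord = Tuple[str, float, float]  # (text, start_sec, end_sec)


def segments_to_keep(
    words: List[AlignedWord], deleted: Set[int], total: float
) -> List[Tuple[float, float]]:
    """Return the (start, end) audio ranges that remain after removing
    the words whose indices are in `deleted`. Assumes `words` is sorted
    by start time and `total` is the media duration in seconds."""
    keep: List[Tuple[float, float]] = []
    cursor = 0.0  # start of the next range to keep
    for i, (_, start, end) in enumerate(words):
        if i in deleted:
            if start > cursor:
                keep.append((cursor, start))  # audio before the cut
            cursor = max(cursor, end)         # skip over the deleted word
    if cursor < total:
        keep.append((cursor, total))          # audio after the last cut
    return keep


aligned = [("so", 0.0, 0.3), ("um", 0.3, 0.6), ("hello", 0.6, 1.0)]
kept = segments_to_keep(aligned, deleted={1}, total=1.0)
print(kept)  # [(0.0, 0.3), (0.6, 1.0)]
```

The tighter the timestamps, the cleaner the joins between kept ranges, which is why loosely aligned caption timings tend to produce audible glitches at the cut points.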

Punctuation: Spext works with long-form media, usually over 100 minutes in length, and has its own punctuation algorithms, which are around 85 percent accurate. For a quality recording with minimal background noise, the models achieve an accuracy of 92-96 percent, not far from the 97-98 percent accuracy of human transcription. However, accuracy can be lower if the audio is of poor quality (for example, a low sampling rate), has background noise or features unclear accents.

Enterprise Uses For Spext

The early-stage startup, founded in 2017, is working on the Enterprise version of Spext, which will feature intelligent labeling and automatic tagging in addition to transcription. The startup is also keen to add three Indian companies to its customer advisory group to help it launch by 2019. “We started in 2017 with a focus on the US market but are developing technology to support local languages. We currently support English (Indian accent) and Hindi. But in the future, we want to support all the local languages. The market for voice content in Indian languages is enormous,” said Gosavi. Broadening from its initial use case, podcasts, the startup is now testing its intelligent media technology for enterprise uses around voice content.

Voice Search Inside Media: The user can ask, “Show me when Messi scored a goal,” and instead of just showing a huge list of videos, Spext can find the moment when Messi scored, automatically create a short two-minute clip from the 90-minute game and play it.
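As a rough illustration of how a search hit can become a clip, the sketch below looks up a phrase in a time-coded transcript and returns a padded time window around it. Plain keyword matching stands in here for the natural-language query understanding described above; the function name `find_clip`, the `padding` parameter and the sample timings are assumptions made for the example, not Spext's search.

```python
from typing import List, Optional, Tuple

TimedWord = Tuple[str, float, float]  # (text, start_sec, end_sec)


def find_clip(
    words: List[TimedWord],
    phrase: str,
    padding: float = 60.0,
    duration: Optional[float] = None,
) -> Optional[Tuple[float, float]]:
    """Return a (start, end) window around the first occurrence of
    `phrase`, padded on both sides and clamped to the media bounds."""
    tokens = phrase.lower().split()
    if not tokens:
        return None
    # Normalise transcript words: lowercase, strip trailing punctuation.
    texts = [w[0].lower().strip(".,!?") for w in words]
    for i in range(len(texts) - len(tokens) + 1):
        if texts[i:i + len(tokens)] == tokens:
            start = max(0.0, words[i][1] - padding)
            end = words[i + len(tokens) - 1][2] + padding
            if duration is not None:
                end = min(end, duration)
            return (start, end)
    return None


# Made-up commentary from a 90-minute (5400-second) match.
commentary = [
    ("Messi", 3000.0, 3000.5),
    ("scores!", 3000.5, 3001.0),
]
clip = find_clip(commentary, "messi scores", padding=60.0, duration=5400.0)
print(clip)  # (2940.0, 3061.0)
```

With 60 seconds of padding on each side, the returned window is roughly the two-minute clip mentioned in the example.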

Automatic Media Tagging: A huge amount of media sits on storage solutions, and tagging all of it manually is not practical. Spext can automatically classify, transcribe and tag the media with tags such as objects, when a word was spoken, what questions were asked, who the speakers are and what the context was. The editor will make it easy for employees to review or correct these tags if necessary. This will also save hundreds of thousands of man-hours. “We plan to work with storage solutions so that the media is tagged while it is being stored for the first time,” he said.

Manage Corporate Content Libraries And Company Archives: A lot of stored corporate media is rarely accessed because it sits in large, monolithic files that are not tagged by speaker, context or topic. As a result, it is not repurposed for marketing, training or analysis. Spext tags and classifies this media intelligently and makes the content shareable and clippable, so that archives become accessible and usable.

API-Agnostic Platform

The Spext platform is built on technologies provided by IBM, Google and Amazon. Gosavi explains that because the underlying technology is built on all of these APIs rather than tied to a single one, the platform is API-agnostic. “Depending on the type of file uploaded, we select one of the APIs that we think is most accurate. For example, for a phone call, we might select Amazon Transcribe but for a video lecture, we might select Google. This is done dynamically because the speech-to-text models are usually optimised for a particular type of media,” he explained.
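The routing Gosavi describes can be pictured as a simple dispatch table keyed by media type. The sketch below is an assumption-laden illustration: the backend functions are stubs that return labelled strings rather than calling any real service, the media-type labels are hypothetical, and using IBM as the fallback is an arbitrary choice for the example, not something the article states.

```python
from typing import Callable, Dict


# Stub backends: a real system would call the vendor's speech-to-text
# service here. These only return a labelled string for illustration.
def transcribe_with_amazon(path: str) -> str:
    return f"[amazon] {path}"


def transcribe_with_google(path: str) -> str:
    return f"[google] {path}"


def transcribe_with_ibm(path: str) -> str:
    return f"[ibm] {path}"


# Hypothetical routing table based on the two examples in the quote.
ROUTES: Dict[str, Callable[[str], str]] = {
    "phone_call": transcribe_with_amazon,
    "video_lecture": transcribe_with_google,
}

# Assumed fallback for media types with no explicit route.
DEFAULT_BACKEND: Callable[[str], str] = transcribe_with_ibm


def transcribe(path: str, media_type: str) -> str:
    """Pick a backend for the given media type and run it."""
    backend = ROUTES.get(media_type, DEFAULT_BACKEND)
    return backend(path)


print(transcribe("support_call.wav", "phone_call"))
print(transcribe("lecture01.mp4", "video_lecture"))
```

Keeping the choice of backend behind a single `transcribe` function means the rest of the pipeline does not need to know which vendor produced the text, which is the practical meaning of API-agnostic here.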

The SaaS-based platform is priced monthly. The subscription starts at $39.99 per month for four hours of media upload, with bulk discounts (starting at $7 per hour) on additional hours for users with a lot of media.


Richa Bhatia

Richa Bhatia is a seasoned journalist with six years of experience in reportage and news coverage and has had stints at Times of India and The Indian Express. She is an avid reader, mum to a feisty two-year-old and loves writing about the next-gen technology that is shaping our world.