Last updated February 2, 2021

Understanding Speech: Moving Beyond ASRs

Published on October 31, 2020

by Ambika Choudhury

Deep Learning DevCon 2020 (DLDC 2020) is another of this year's conferences hosted in partnership with Analytics India Magazine. Held on 29 and 30 October, it brought together leading experts in deep learning and machine learning from around the globe.

The first session of Day 1 was presented by Abhinav Tushar, head of AI at Vernacular.ai, a Bengaluru-based conversational AI startup. The session, titled “Understanding Speech: Moving beyond Automatic Speech Recognitions”, centred on one point: although text-based conversational interactions have been common in the industry for a while, speech interactions are still in their infancy.

Tushar began the talk by explaining the importance of speech and the emotions it carries. Speech, he said, is far different from text and much more complex: “Speech is much more than transcriptions. And that should influence how we design conversational agents.”

Tushar noted that the factors influencing responses include content, environment, speaker characteristics and paralinguistics. As an example, he presented several “okays” spoken by different people, each conveying a different emotion.
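To make the “okay” example concrete, here is a toy sketch of why a transcript alone is not enough. It assumes two utterances with the identical transcript “okay” that differ only in pitch movement, and it stands in for real recordings with synthetic pure tones (one tone per 100 ms frame). Pitch is estimated by counting zero crossings, which only works for clean tones; the names `Utterance`, `synth_voice` and `intonation`, the 8 kHz sampling rate and the 20 Hz tolerance are all illustrative assumptions, not part of Vernacular.ai's system.

```python
import math
from dataclasses import dataclass

SAMPLE_RATE = 8000  # Hz; assumed telephony-style sampling rate
FRAME_LEN = 800     # samples per analysis frame (100 ms at 8 kHz)


@dataclass
class Utterance:
    transcript: str
    samples: list[float]


def synth_voice(frame_freqs: list[float], amp: float = 0.5) -> list[float]:
    """Stand-in for recorded audio: one pure tone per frame, with continuous phase."""
    samples, phase = [], 0.0
    for freq in frame_freqs:
        for _ in range(FRAME_LEN):
            samples.append(amp * math.sin(phase))
            phase += 2 * math.pi * freq / SAMPLE_RATE
    return samples


def frame_pitch(frame: list[float]) -> float:
    """Crude pitch estimate: a pure tone crosses zero twice per cycle."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return crossings * SAMPLE_RATE / (2 * len(frame))


def intonation(utt: Utterance, tolerance: float = 20.0) -> str:
    """Label the pitch movement between the first and last frame."""
    contour = [frame_pitch(utt.samples[i:i + FRAME_LEN])
               for i in range(0, len(utt.samples) - FRAME_LEN + 1, FRAME_LEN)]
    change = contour[-1] - contour[0]
    if change > tolerance:
        return "rising"
    if change < -tolerance:
        return "falling"
    return "flat"


# Same words, different delivery (frequencies in Hz, one per frame).
question = Utterance("okay", synth_voice([120, 160, 220]))
statement = Utterance("okay", synth_voice([220, 160, 120]))
print(question.transcript == statement.transcript)  # True
print(intonation(question), intonation(statement))  # rising falling
```

An ASR system would return the same text for both utterances; only the acoustic signal separates a questioning “okay” from a resigned one. A production system would use a robust pitch tracker and learned paralinguistic models rather than zero-crossing counts.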

He then described how the current conversational AI voice bot built at Vernacular.ai works, and how it is catching up in terms of mirroring human behaviour.

The framework works in the following steps:

When a user speaks, the audio first enters a block that extracts the speech from the incoming signal.

The speech then passes to the Automatic Speech Recognition (ASR) system, which comprises an acoustic model, a pronunciation model and a language model.

The resulting text then undergoes frame understanding, which covers steps such as intent classification, pre-processing and entity parsing.

The next step is content management and dialogue management.

In the final step, the response text is converted into speech and sent to the user.
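The data flow through these five steps can be sketched as a chain of functions, one per stage. This is a minimal, hypothetical sketch: the ASR and text-to-speech stages are stubs (a real ASR decodes audio with acoustic, pronunciation and language models, and a real TTS produces audio), the speech extractor is a crude energy gate, and the table-booking intent, function names and thresholds are invented for illustration only.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Frame:
    """Output of frame understanding: an intent plus parsed entities."""
    intent: str
    entities: dict[str, str] = field(default_factory=dict)


@dataclass
class DialogueState:
    """Context carried across turns by the dialogue manager (hypothetical)."""
    history: list[Frame] = field(default_factory=list)


def extract_speech(audio: list[float], threshold: float = 0.05) -> list[float]:
    """Step 1: crude energy gate that drops near-silent samples."""
    return [s for s in audio if abs(s) >= threshold]


def make_stub_asr(hypothesis: str) -> Callable[[list[float]], str]:
    """Step 2 placeholder: returns a fixed hypothesis whenever speech is present."""
    def asr(speech: list[float]) -> str:
        return hypothesis if speech else ""
    return asr


def understand(text: str) -> Frame:
    """Step 3: pre-processing, keyword-based intent classification, entity parsing."""
    tokens = text.lower().split()                      # pre-processing
    if not tokens:
        return Frame("silence")
    if "book" in tokens:                               # intent classification
        sizes = [t for t in tokens if t.isdigit()]     # entity parsing
        return Frame("book_table", {"party_size": sizes[0]} if sizes else {})
    return Frame("fallback")


def manage_dialogue(frame: Frame, state: DialogueState) -> str:
    """Step 4: update the conversation context and choose the response text."""
    state.history.append(frame)
    if frame.intent == "book_table":
        size = frame.entities.get("party_size")
        return f"Booking a table for {size}." if size else "For how many people?"
    if frame.intent == "silence":
        return "Sorry, I didn't catch that."
    return "Could you rephrase that?"


def synthesize(text: str) -> bytes:
    """Step 5 placeholder for text-to-speech: returns encoded text, not audio."""
    return text.encode("utf-8")


def run_turn(audio: list[float], asr: Callable[[list[float]], str],
             state: DialogueState) -> bytes:
    """One user turn through the whole pipeline."""
    speech = extract_speech(audio)
    text = asr(speech)
    frame = understand(text)
    reply = manage_dialogue(frame, state)
    return synthesize(reply)


state = DialogueState()
asr = make_stub_asr("Book a table for 4")
print(run_turn([0.0, 0.3, -0.4, 0.01], asr, state))  # b'Booking a table for 4.'
print(run_turn([0.0, 0.01], asr, state))             # b"Sorry, I didn't catch that."
```

Keeping each stage behind its own function boundary means any stub can be swapped for a trained model without touching the others; note that everything extra-lexical in the audio is discarded at step 2, which is exactly the limitation the talk argues against.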

Tushar then outlined three stages of extra-lexical conversational behaviour: snapshot-based, flow-based and persuasive.

In the Snapshot-based stage, the system understands behavioural snapshots and performs simple actions, such as bailing out on certain cues, or detecting personal characteristics and switching the prompt accordingly.

In the Flow-based stage, the system works across multiple turns and can perform basic repairs. Its features include tracking a consistent expression of discomfort and changing the in-flow experience based on the situation.

In the Persuasive stage, the system persuades the other party and drives the conversation. Its features include understanding and utilising uncertainties and preferences, and modelling situations and manoeuvring within them.
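The difference between the first two stages can be illustrated with a small sketch: a snapshot policy decides from the current turn alone, while a flow tracker acts only when a cue persists across turns. This is a hypothetical illustration, not Vernacular.ai's implementation; it assumes upstream models already emit per-turn cues (`TurnCues`, with an invented discomfort score between 0 and 1), and the window of three turns and the 0.6 threshold are arbitrary. The persuasive stage is not sketched here.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class TurnCues:
    """Hypothetical extra-lexical cues produced for one user turn."""
    asked_for_agent: bool = False        # e.g. an explicit request for a human
    speaker_trait: Optional[str] = None  # e.g. "elderly", from a speaker model
    discomfort: float = 0.0              # 0..1 score from a paralinguistic model


def snapshot_action(cues: TurnCues) -> str:
    """Snapshot-based: react to the current turn only."""
    if cues.asked_for_agent:
        return "bail_out"
    if cues.speaker_trait is not None:
        return f"switch_prompt:{cues.speaker_trait}"
    return "continue"


class FlowTracker:
    """Flow-based: act only when discomfort is consistent across recent turns."""

    def __init__(self, window: int = 3, threshold: float = 0.6):
        self.recent = deque(maxlen=window)
        self.threshold = threshold

    def observe(self, cues: TurnCues) -> str:
        self.recent.append(cues.discomfort)
        full = len(self.recent) == self.recent.maxlen
        if full and min(self.recent) >= self.threshold:
            return "change_flow"
        return "continue"


print(snapshot_action(TurnCues(asked_for_agent=True)))  # bail_out

spiky = FlowTracker()
print([spiky.observe(TurnCues(discomfort=d)) for d in (0.9, 0.1, 0.8)])
# ['continue', 'continue', 'continue']: a single spike is not a pattern

steady = FlowTracker()
print([steady.observe(TurnCues(discomfort=d)) for d in (0.7, 0.8, 0.9)])
# ['continue', 'continue', 'change_flow']
```

Requiring the minimum score over the window to clear the threshold is one simple way to express “consistent”; a smoothed average or a learned sequence model would be reasonable alternatives.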

Tushar concluded that building such intelligent systems requires components like:

Stylistic and semantic models
State tracking
Live experimentation framework
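As one illustration of what a live experimentation framework might involve, the sketch below deterministically assigns each call to a prompt variant by hashing its ID, so a caller keeps the same variant across turns, and tracks a success rate per variant. It is an assumption-laden sketch: the talk does not describe how Vernacular.ai runs experiments, and the function and class names, the experiment key and the notion of “success” are hypothetical.

```python
import hashlib
from collections import defaultdict


def assign_variant(call_id: str, variants: list[str], experiment: str) -> str:
    """Deterministically bucket a call into a variant by hashing its ID."""
    digest = hashlib.sha256(f"{experiment}:{call_id}".encode("utf-8")).digest()
    return variants[int.from_bytes(digest[:8], "big") % len(variants)]


class ExperimentLog:
    """Tracks the outcome rate (e.g. task completion) for each variant."""

    def __init__(self) -> None:
        self.trials: defaultdict[str, int] = defaultdict(int)
        self.successes: defaultdict[str, int] = defaultdict(int)

    def record(self, variant: str, success: bool) -> None:
        self.trials[variant] += 1
        self.successes[variant] += int(success)

    def rate(self, variant: str) -> float:
        n = self.trials[variant]
        return self.successes[variant] / n if n else 0.0


variants = ["formal_prompt", "casual_prompt"]
first = assign_variant("call-0042", variants, "greeting-test")
print(first == assign_variant("call-0042", variants, "greeting-test"))  # True

log = ExperimentLog()
log.record("formal_prompt", True)
log.record("formal_prompt", False)
print(log.rate("formal_prompt"))  # 0.5
```

Salting the hash with the experiment name keeps assignments independent across experiments; a real framework would also need significance testing before acting on the rates.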


Ambika Choudhury

A Technical Journalist who loves writing about Machine Learning and Artificial Intelligence. A lover of music, writing and learning something out of the box.
