Unlocking Value from Speech Data

Today, a large amount of unstructured data is being generated in the form of text, images, videos and speech. This data can contain valuable information that companies can use to make better decisions. In this article, we focus on one such form of unstructured data: speech.

We present a use case in which we analyzed speech from clinical trials to automate a significant part of the operational process, with the potential to cut quality control costs in half.

What Is Speech Analytics?

Speech has several aspects. Some elements, such as words, speech rate, tone and emotion, are readily discernible by humans.


Other elements, such as minor variations in pitch and speech rate, are not so easily identified by humans.

Speech analytics is the characterization of speech based on these factors to derive actionable business insights from the data.

There are several ways in which speech can be analyzed, based on the type of application:

Full transcription

Full transcription involves conversion of speech into text format in applications like Siri or in transcribing meetings (for example, between a doctor and a patient), conferences, etc. Converting speech into text allows it to be searched more easily.

Speaker diarization

Speaker diarization involves separation of certain sections of speech based on the speaker. While transcribing speech with more than one speaker, like a meeting or a conference, it is important to not just convert speech to text but to identify who the speaker is.
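As a rough illustration of the grouping idea only, the sketch below assigns speech sections to one of two speakers using a one-dimensional k-means on a single number per section. The function name `two_speaker_clusters`, the use of mean pitch as the only feature, and the sample values are all assumptions made for this example; practical diarization relies on much richer voice features.

```python
def two_speaker_clusters(pitches, iterations=10):
    """Assign each speech section to cluster 0 or 1 with 1-D k-means.

    pitches: one (hypothetical) mean-pitch value in Hz per speech section.
    """
    # Start the two cluster centers at the extremes of the data.
    centers = [min(pitches), max(pitches)]
    labels = [0] * len(pitches)
    for _ in range(iterations):
        # Assignment step: each section goes to the nearer center.
        labels = [
            0 if abs(p - centers[0]) <= abs(p - centers[1]) else 1
            for p in pitches
        ]
        # Update step: move each center to the mean of its members.
        for c in (0, 1):
            members = [p for p, lab in zip(pitches, labels) if lab == c]
            if members:
                centers[c] = sum(members) / len(members)
    return labels


# Made-up mean pitch per section, in the order the sections were spoken.
section_pitch_hz = [118.0, 121.5, 210.0, 205.5, 119.0, 214.0]
labels = two_speaker_clusters(section_pitch_hz)
```

Here the sections with lower pitch end up in one cluster and the higher-pitched ones in the other, which is the kind of "who spoke when" labeling a transcript needs.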

Keyword detection

Keyword detection entails identifying specific keywords in an audio recording. Customer care centers can detect keywords like “unhappy” and “disappointed” and use them to monitor agent performance.
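A minimal sketch of the counting step, assuming the call has already been transcribed to text (transcription itself is not shown). The function name `count_keywords` and the sample sentence are hypothetical.

```python
import re
from collections import Counter


def count_keywords(transcript, keywords):
    """Count whole-word, case-insensitive occurrences of each keyword."""
    words = re.findall(r"[a-z']+", transcript.lower())
    counts = Counter(words)
    # Counter returns 0 for words that never occur.
    return {kw: counts[kw.lower()] for kw in keywords}


# Made-up transcript of a customer statement.
example = (
    "I am unhappy with the delay. "
    "Honestly, I am very disappointed and unhappy."
)
hits = count_keywords(example, ["unhappy", "disappointed", "refund"])
```

Counts like these could then be aggregated per agent or per call; detecting keywords directly in audio, without a transcript, is a different technique and is not covered by this sketch.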

Speaker authentication/identification (voice fingerprinting)

Speaker authentication/identification (voice fingerprinting) involves identifying the unique characteristics in every speaker’s voice that allow humans to tell speakers apart and recognize them. Some fraud detection applications capture these unique features, create voice fingerprints during customer care interactions and compare them against known blacklists.
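The comparison against a blacklist can be sketched as a similarity search, assuming each voice fingerprint has already been reduced to a fixed-length list of numbers. The cosine-similarity measure, the 0.9 threshold, the function names and the three-element vectors are illustrative assumptions, not a description of any particular product.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def best_blacklist_match(voiceprint, blacklist, threshold=0.9):
    """Return (name, similarity) of the closest blacklist entry if it
    reaches the threshold, otherwise None."""
    scored = (
        (name, cosine_similarity(voiceprint, ref))
        for name, ref in blacklist.items()
    )
    best = max(scored, key=lambda pair: pair[1], default=None)
    if best is not None and best[1] >= threshold:
        return best
    return None


# Made-up fingerprints; real ones would have many more dimensions.
blacklist = {"caller_a": [1.0, 0.0, 0.0], "caller_b": [0.0, 1.0, 0.0]}
match = best_blacklist_match([0.9, 0.1, 0.0], blacklist)
no_match = best_blacklist_match([0.5, 0.5, 0.7], blacklist)
```

The threshold trades false alarms against missed matches and would need to be tuned on real data.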

Emotion detection

Emotion detection involves identification of the emotional state of the speaker. This can help identify irate customers during customer care interactions, among other applications.

Other characteristics of conversation

These include pauses, noise and similar cues. Characteristics like loud noises or long pauses could be indicators of a bad customer care conversation.
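For instance, long pauses can be found from the gaps between speech segments. The sketch below assumes the speech segments are already available as start and end times in seconds; the function name `find_long_pauses`, the 2-second cutoff and the sample timings are hypothetical.

```python
def find_long_pauses(segments, min_pause=2.0):
    """Return (start, end) gaps between consecutive speech segments
    that last at least min_pause seconds.

    segments: list of (start_sec, end_sec) tuples sorted by start time.
    """
    pauses = []
    for (_, prev_end), (next_start, _) in zip(segments, segments[1:]):
        if next_start - prev_end >= min_pause:
            pauses.append((prev_end, next_start))
    return pauses


# Made-up speech segments from one conversation.
speech = [(0.0, 4.5), (5.0, 9.0), (12.5, 15.0)]
long_pauses = find_long_pauses(speech, min_pause=2.0)
```

Only the 3.5-second gap between the second and third segments exceeds the cutoff, so it is the one pause reported.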

Depending on the type of business problem, the analysis framework would have one or more of the above.

Problems Faced During Clinical Trials

Testing the efficacy of drugs for mental illnesses involves the doctor having detailed discussions with the patients to evaluate their mental state at various stages of the treatment.

The clinical trials evaluate both the quality of the interviews and whether or not the drug meets its targets. Interview quality evaluation typically involves experts listening to audio recordings of the interviews and scoring them on various quality metrics. This manual review is quite expensive.

The objective here is to use speech analytics to assist the manual reviewers and significantly cut down the costs associated with review time.

Role of Speech Analytics


Our first step was to remove any background noise so that the spoken dialog could be heard clearly. We then split the files into sections of alternating speech and silence. Following this, we grouped the speech sections into clusters, each representing a different speaker.
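The speech/silence split can be illustrated with a simple energy threshold. This is a minimal sketch under stated assumptions, not the method used in the project: it presumes the audio is already denoised and loaded as a list of amplitude samples, and the function name `split_speech_silence`, the frame length and the threshold are hypothetical choices.

```python
import math


def split_speech_silence(samples, frame_len, threshold):
    """Label fixed-size frames as speech or silence by RMS energy and
    merge consecutive frames that share a label.

    Returns a list of (label, start_index, end_index), end exclusive.
    """
    sections = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        label = "speech" if rms >= threshold else "silence"
        end = start + len(frame)
        if sections and sections[-1][0] == label:
            # Extend the current section instead of starting a new one.
            sections[-1] = (label, sections[-1][1], end)
        else:
            sections.append((label, start, end))
    return sections


# Synthetic signal: quiet stretch, loud stretch, quiet stretch.
signal = [0.01] * 400 + [0.5, -0.5] * 200 + [0.01] * 400
sections = split_speech_silence(signal, frame_len=100, threshold=0.1)
```

On this synthetic signal the result is three sections: silence, speech, silence. Real recordings would need a threshold tuned to the noise floor that remains after cleanup.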

Feature extraction

We then extracted several hundred features from the audio files, ranging from direct features like duration and amplitude to more abstract features like speech rate, frequency-wise energy content and MFCCs (mel-frequency cepstral coefficients). Among other things, these features also helped capture information that was characteristic of a person, similar to how a human would identify a person by their voice.
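A few of the direct features can be computed with nothing more than the raw samples. The sketch below is illustrative only: the function name `basic_features`, the choice of features and the synthetic tone are assumptions, and the more abstract features such as MFCCs would normally come from a signal-processing library and are not shown.

```python
import math


def basic_features(samples, sample_rate):
    """Compute a handful of direct features from raw amplitude samples."""
    n = len(samples)
    # Count sign changes between neighboring samples.
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0)
    )
    return {
        "duration_sec": n / sample_rate,
        "peak_amplitude": max(abs(x) for x in samples),
        "rms": math.sqrt(sum(x * x for x in samples) / n),
        "zero_crossing_rate": crossings / max(n - 1, 1),
    }


# Synthetic input: one second of a 100 Hz sine tone sampled at 8 kHz.
sr = 8000
tone = [0.5 * math.sin(2 * math.pi * 100 * t / sr) for t in range(sr)]
features = basic_features(tone, sr)
```

Each recording, or each speech section within it, would yield one such dictionary, and the dictionaries together form the feature table used for modeling.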


The objective was to predict an interview quality score, a single number constructed by combining several qualitative aspects of interview quality. We computed this score manually for a few audio files and then developed machine learning algorithms to identify inherent patterns and predict the score for all other audio files. We used various supervised machine learning techniques, including logistic regression, boosted trees, random forests and support vector machines. The best performing algorithm improved the accuracy of identifying bad interviews by more than 50% compared to the random baseline, meaning the cost of identifying potentially bad interviews was halved. In other words, in the same amount of time, one could identify and review twice the number of bad interviews and gain rich insights that can eventually improve the quality of clinical trials significantly.
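One way to compare a model against a random baseline is to check how many truly bad interviews appear among the top-ranked ones, relative to picking interviews at random. The sketch below is an assumption about how such a comparison could be made, not the metric used in the project; the function names and all numbers are made up for illustration.

```python
def precision_at_k(scores, is_bad, k):
    """Fraction of truly bad interviews among the k highest-scored ones."""
    ranked = sorted(zip(scores, is_bad), key=lambda pair: pair[0], reverse=True)
    return sum(bad for _, bad in ranked[:k]) / k


def lift_over_random(scores, is_bad, k):
    """Precision at k divided by the overall share of bad interviews,
    which is the precision expected from random selection."""
    base_rate = sum(is_bad) / len(is_bad)
    return precision_at_k(scores, is_bad, k) / base_rate


# Made-up model scores (higher = more likely bad) and true labels (1 = bad).
risk = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
bad = [1, 1, 0, 1, 0, 0, 0, 0]
precision = precision_at_k(risk, bad, k=4)
lift = lift_over_random(risk, bad, k=4)
```

In this toy data, three of the four top-ranked interviews are bad, against three of eight overall, so a reviewer working from the ranked list finds bad interviews at twice the rate of one sampling at random.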


Speech analytics is an area with potential applications in almost all businesses that have any form of verbal interaction, from call centers to classrooms. With the increase in computing power and big data technologies, analyzing large volumes of unstructured speech data is becoming increasingly mainstream. When used appropriately, it can give a company a significant reduction in cost as well as a strong competitive advantage. Some functions like customer care have started incorporating speech analytics, but there is still a long way to go before the full potential is realized.

About the Authors/Tiger Analytics

Patanjali V, the primary author, is a Lead Data Scientist at Tiger Analytics. He leads advanced analytics engagements that involve complex/unstructured data.

Anand Bharadwaj, the co-author, is a Director at Tiger Analytics. He has 18+ years of experience in the consulting industry and loves to ensure business value realization of analytics solutions.

Tiger Analytics provides Big Data and advanced analytics solutions to help businesses make data-driven decisions. We bring deep expertise in data sciences along with an understanding of business needs and state-of-the-art technologies to solve business problems.

Anand Bharadwaj
Anand leads business development for Tiger Analytics, bringing in a strong consulting perspective which puts the success of our clients first. He has led teams at IBM and Cognizant, among others, for 18+ years, building start-ups and key client relationships. He has helped clients solve a variety of problems across verticals using IT, analytics or consulting. He has an MBA from Xavier Institute of Management Bhubaneswar (XIMB), India and has completed executive education programs at London Business School and Carnegie Mellon University.
