New Research Suggests Speech Recognition Technology May Be Racist

Speech recognition is the process by which machines decode human speech. The technique has gained significant traction among big tech companies: with advances in deep learning and natural language processing (NLP), it now powers virtual assistants, hands-free computing, digital dictation platforms, and automated subtitling for video content, among other applications. According to reports, the overall voice and speech recognition market is expected to grow at a CAGR of 17.2% from 2019 to 2025, reaching $26.8 billion.

However, research has revealed that automatic speech recognition (ASR) technology performs unevenly across racial subgroups. According to researchers at Stanford University, ASR systems do not work equally well for all subgroups of the population.

The researchers examined the ability of five state-of-the-art ASR systems, developed by Amazon, Apple, Google, IBM, and Microsoft, to transcribe structured interviews conducted with 42 white speakers and 73 black speakers. All five systems exhibited substantial racial disparities, with an average word error rate (WER) of 0.35 for black speakers compared with 0.19 for white speakers.

Dataset Used

The analysis is based on two datasets of conversational speech. The first is the Corpus of Regional African American Language (CORAAL), a collection of sociolinguistic interviews with black individuals speaking African American Vernacular English (AAVE). The second is Voices of California (VOC), a compilation of interviews recorded in both rural and urban areas of the state. In total, the corpus spans five US cities and consists of 19.8 hours of audio, with snippets matched on the age and gender of the speakers.

The performance of the ASR systems is evaluated in terms of the word error rate (WER). Despite variation in transcription quality across systems, the researchers found that error rates for black speakers were roughly twice as large as those for white speakers in every case.
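For context, WER is the word-level edit distance between the reference transcript and the system's output, divided by the number of words in the reference. A minimal Python sketch of the metric (illustrative only, not the researchers' code) looks like this:

# A minimal sketch of a WER computation (illustrative, not the study's code):
# WER = word-level Levenshtein distance between reference and hypothesis,
# divided by the number of words in the reference.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            dp[i][j] = min(
                dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]),  # substitution / match
                dp[i - 1][j] + 1,                               # deletion
                dp[i][j - 1] + 1,                               # insertion
            )
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

# One deletion and one substitution over six reference words -> WER of about 0.33
print(wer("he is going to the store", "he going to a store"))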

The Analysis

For the analysis, the researchers applied data filtering, standardization, and matching procedures, and compared the ASR systems in several ways.

First, the researchers computed the average word error rates of the machine transcriptions across matched audio snippets of white and black speakers. On this measure, Apple's ASR showed the worst overall performance.
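As an illustration of that comparison, averaging WER per group over matched snippets might look like the sketch below, reusing the wer() helper defined above. The snippets and transcriptions are invented for illustration; they are not taken from CORAAL, VOC, or the study's results.

# Hypothetical matched snippets: (speaker group, human transcript, ASR output).
snippets = [
    ("black", "we were talking about the game last night", "we was taking about the game last night"),
    ("black", "she told me to come back later", "she told me come by later"),
    ("white", "we were talking about the game last night", "we were talking about the game last night"),
    ("white", "she told me to come back later", "she told me to come back late"),
]

group_rates = {}
for group, ref, hyp in snippets:
    group_rates.setdefault(group, []).append(wer(ref, hyp))  # wer() defined above

for group, rates in sorted(group_rates.items()):
    print(f"{group}: average WER = {sum(rates) / len(rates):.2f}")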

To pinpoint the source of the disparity, the researchers examined two mechanisms that could account for the racial gap: a performance gap in the 'language models' (models of lexicon and grammar) underlying modern ASR systems, and a performance gap in the acoustic models underlying these systems. They found evidence of a gap in the acoustic models, but not in the language models. A rough sketch of this kind of language-model check is given below.
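One way to probe a language-model gap is to score comparable utterances under a pretrained language model and compare perplexities. The sketch below uses GPT-2 purely as a publicly available stand-in, since (as noted later) the commercial systems' actual language models are not available; the sentences are invented examples.

# Sketch: compare how "surprised" a generic language model is by comparable
# utterances. GPT-2 is an illustrative stand-in, not the language model used
# inside any of the five commercial ASR systems.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def perplexity(text: str) -> float:
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean cross-entropy per token
    return float(torch.exp(loss))

# Hypothetical utterances: one in AAVE, one in Standard American English.
print(perplexity("he been working there a long time"))
print(perplexity("he has been working there for a long time"))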

The Outcome

The researchers found that all five ASR systems exhibited substantial racial disparities, with an average WER of 0.35 for black speakers compared with 0.19 for white speakers. They note, however, that the exact language models underlying commercial ASR systems are not readily available, so these gaps could only be assessed indirectly.

The findings indicate that the racial disparities arise primarily from a performance gap in the acoustic models, suggesting that the systems are confused by the phonological, phonetic, or prosodic characteristics of African American Vernacular English rather than by its grammatical or lexical characteristics. The researchers suspect the cause is an insufficient amount of audio data from black speakers in the models' training sets.

Wrapping Up

The researchers propose mitigation strategies such as training on more diverse datasets that include African American Vernacular English. Such measures would reduce the performance gap and help make speech recognition technology more inclusive.

Read the paper here.

Ambika Choudhury
A Technical Journalist who loves writing about Machine Learning and Artificial Intelligence. A lover of music, writing and learning something out of the box.
