
Hands-On Guide to Developing a Speech-to-Text Converter Using Python and Google API

In this article, we will build a simple speech-to-text converter with Python and Google's speech recognition API.


“Hey Siri”, “OK Google” and “Alexa” are phrases we say almost every day to get information quickly without having to type into a search box. These devices are good at listening to your voice, understanding it and giving a suitable response. How do they work? They are built on highly efficient speech recognition software that can handle multiple accents and convert speech into text, combined with natural language processing algorithms that interpret that text. But before these smart devices can find the information you asked for, they need to understand what you are saying. Let us implement a speech-to-text converter using Python and a Google API.


What is speech recognition and how does it work?

Speech recognition is the process of translating spoken language into text. Modern systems typically do this with a deep learning model that takes in audio signals, analyses them and converts them into the corresponding text.

Figure: Workflow of the Google speech-to-text API

Above is the workflow of the Google API for converting speech to text. Voice input from the user's device is first sent to core cloud functions, which perform internal processing such as converting the audio input into signals and preprocessing them. The signals are then sent to the Speech-to-Text API, which applies a deep learning model to work out what the user is saying. Finally, the result is passed to AutoML NLP, where the speech understood by the deep learning model is converted into text and the output is displayed.
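Google does not publish the details of this internal preprocessing, so the following is only a generic sketch of two steps that speech systems commonly apply before a model sees the audio: a pre-emphasis filter, y[n] = x[n] - 0.97 * x[n-1], which boosts the higher frequencies, and framing, which cuts the signal into short overlapping windows (here 25 ms frames every 10 ms). The 16 kHz sample rate, the 0.97 coefficient, the frame sizes and the function names are illustrative assumptions, not Google's settings.

```python
import math
from typing import List

SAMPLE_RATE = 16_000  # assumed sample rate in Hz (not Google's documented value)


def pre_emphasis(samples: List[float], coeff: float = 0.97) -> List[float]:
    """Apply y[n] = x[n] - coeff * x[n-1] to boost higher frequencies."""
    if not samples:
        return []
    return [samples[0]] + [
        samples[n] - coeff * samples[n - 1] for n in range(1, len(samples))
    ]


def frame_signal(samples: List[float], frame_ms: float = 25.0,
                 hop_ms: float = 10.0,
                 sample_rate: int = SAMPLE_RATE) -> List[List[float]]:
    """Split the signal into overlapping frames; a trailing partial frame is dropped."""
    frame_len = int(sample_rate * frame_ms / 1000)  # samples per frame
    hop_len = int(sample_rate * hop_ms / 1000)      # samples between frame starts
    return [
        samples[start:start + frame_len]
        for start in range(0, len(samples) - frame_len + 1, hop_len)
    ]


# One second of a synthetic 440 Hz tone stands in for recorded speech.
tone = [math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(SAMPLE_RATE)]
frames = frame_signal(pre_emphasis(tone))
print(len(frames), len(frames[0]))
```

In a real system each frame would usually be turned into spectral features before reaching the recognition model.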

Features of the speech to text API

  1. Streaming speech to text in real time: the API can process real-time audio from the device microphone, or take an audio file as input, and convert it into text.
  2. Different models based on the domain: you can choose from different trained models depending on the requirements of the project. For example, to transcribe audio from a telephone call, the enhanced phone call model can be used.
  3. Adaptation: you can customize the API to recognize rare words, currency values, numbers, etc. by defining them as additional classes.
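The model choice and adaptation features above are set per request. As a sketch, assuming the Cloud Speech-to-Text v1 REST endpoint (POST https://speech.googleapis.com/v1/speech:recognize), the JSON body below selects the enhanced phone call model and uses phrase hints (speechContexts), the simpler form of adaptation, rather than the custom classes mentioned in point 3. The helper name build_recognize_request and the example phrases are hypothetical, and the request is only built here, not sent, because sending it requires Google Cloud credentials. This Cloud API is also not the same service that recognize_google() in the SpeechRecognition code below calls.

```python
import base64
import json
from typing import Dict, List


def build_recognize_request(audio_bytes: bytes, phrases: List[str],
                            sample_rate: int = 8000,
                            language: str = "en-US") -> Dict:
    """Build a v1 speech:recognize JSON body (hypothetical helper; not sent anywhere)."""
    return {
        "config": {
            "encoding": "LINEAR16",          # raw 16-bit PCM samples
            "sampleRateHertz": sample_rate,  # telephone audio is often 8 kHz
            "languageCode": language,
            "model": "phone_call",           # domain-specific model
            "useEnhanced": True,             # ask for the enhanced variant
            # Adaptation via phrase hints: bias recognition toward rare words
            "speechContexts": [{"phrases": phrases}],
        },
        # Inline audio must be base64-encoded inside the JSON body
        "audio": {"content": base64.b64encode(audio_bytes).decode("ascii")},
    }


# One second of 8 kHz, 16-bit silence as placeholder audio.
body = build_recognize_request(b"\x00\x00" * 8000, ["Dayananda Sagar", "Kannada"])
print(json.dumps(body["config"], indent=2))
```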

Implementation with a microphone

Now that we know how the Google API works, we will put it to use: we will activate the system microphone and convert what we say into text.

Installation

Before we get into the implementation, install the SpeechRecognition library with pip. Capturing audio from the microphone with sr.Microphone also requires the PyAudio package.

pip install SpeechRecognition
pip install PyAudio

Next, we will write the code for a real-time speech-to-text converter for English.

Code

We will first import the library and activate our microphone as follows:

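import speech_recognition as sr

recognizer = sr.Recognizer()
# Open the default system microphone (requires PyAudio)
with sr.Microphone() as inputs:
    # Calibrate the energy threshold to the background noise level
    recognizer.adjust_for_ambient_noise(inputs)
    print("Please speak now")
    # Record a single phrase, stopping at the next pause
    listening = recognizer.listen(inputs)
    print("Analysing...")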

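Now that we have the input ready, it is time to call the Google API to recognize the speech and display the text. The recognize_google() method sends the recorded audio to the Google Speech Recognition API and returns the transcript as a string.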

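try:
    # Send the recorded audio to Google's speech recognition service
    print("Did you say: " + recognizer.recognize_google(listening))
except sr.UnknownValueError:  # speech was unintelligible
    print("Please speak again")
except sr.RequestError as error:  # network or API error
    print("Could not reach the Google API:", error)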
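The listen() call decides by itself when you have stopped speaking. According to the SpeechRecognition documentation, it waits until the audio energy rises above recognizer.energy_threshold (300 by default) and stops once it has heard recognizer.pause_threshold seconds of silence. The simplified sketch below imitates that idea on a list of audio chunks; it is not the library's actual code, and the function names and the chunk-counting pause rule are assumptions made for illustration.

```python
import math
from typing import List, Optional, Tuple


def rms(chunk: List[int]) -> float:
    """Root-mean-square energy of a chunk of 16-bit samples."""
    return math.sqrt(sum(s * s for s in chunk) / len(chunk)) if chunk else 0.0


def find_phrase(chunks: List[List[int]], energy_threshold: float = 300.0,
                pause_chunks: int = 3) -> Optional[Tuple[int, int]]:
    """Return (first, last) indices of the loud chunks in the first phrase.

    The phrase starts at the first chunk louder than energy_threshold and
    ends once pause_chunks consecutive quiet chunks follow it. Returns None
    if no chunk is loud enough.
    """
    start: Optional[int] = None
    quiet = 0  # consecutive quiet chunks since the last loud one
    for i, chunk in enumerate(chunks):
        loud = rms(chunk) > energy_threshold
        if start is None:
            if loud:
                start = i  # speech detected: the phrase begins here
            continue
        if loud:
            quiet = 0
        else:
            quiet += 1
            if quiet >= pause_chunks:
                return start, i - quiet  # last loud chunk before the pause
    if start is None:
        return None
    return start, len(chunks) - 1 - quiet  # audio ended mid-phrase


# Synthetic 100-sample chunks: silence has RMS 0, "speech" has RMS 1000.
silence = [0] * 100
speech = [1000, -1000] * 50
chunks = [silence, silence, speech, speech, silence, speech,
          silence, silence, silence, silence]
print(find_phrase(chunks))
```

The single quiet chunk in the middle does not end the phrase, just as a short breath should not cut you off. If listen() stops too early or never stops in a noisy room, recognizer.pause_threshold and recognizer.energy_threshold are the corresponding settings in the real library.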


Another interesting feature is the number of languages the API supports: not only widely spoken world languages but also several Indian languages. Since I speak Kannada, I will make one small change to the code and display the output in Kannada.

Implementation of speech to text in Kannada

To make the API recognize Kannada, pass its language code to recognize_google() through the language parameter:

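import speech_recognition as sr

recognizer = sr.Recognizer()
with sr.Microphone() as inputs:
    print("Please speak now")
    listening = recognizer.listen(inputs)
    print("Analysing...")
try:
    # language takes a language tag; "kn-IN" is Kannada (India)
    print("Did you say: " + recognizer.recognize_google(listening, language="kn-IN"))
except sr.UnknownValueError:  # speech was unintelligible
    print("Please speak again")
except sr.RequestError as error:  # network or API error
    print("Could not reach the Google API:", error)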
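The language argument takes a language tag such as "kn-IN"; when it is omitted, recognize_google() defaults to "en-US". When a project handles several languages, a small lookup table keeps those tags in one place. The sketch below uses a hypothetical table and helper name; the tags are standard BCP-47 codes, but Google's language-support list should be checked before relying on any of them.

```python
from typing import Dict

# Hypothetical lookup table: lowercase display name -> BCP-47 language tag
LANGUAGE_TAGS: Dict[str, str] = {
    "english (us)": "en-US",
    "english (india)": "en-IN",
    "hindi": "hi-IN",
    "kannada": "kn-IN",
    "tamil": "ta-IN",
    "telugu": "te-IN",
}


def language_tag(name: str) -> str:
    """Return the tag for a language name, ignoring case and extra whitespace."""
    key = " ".join(name.split()).lower()
    try:
        return LANGUAGE_TAGS[key]
    except KeyError:
        supported = ", ".join(sorted(LANGUAGE_TAGS))
        raise ValueError(
            f"Unsupported language {name!r}; try one of: {supported}"
        ) from None


print(language_tag("  Kannada "))
```

The result can be passed straight through, for example recognizer.recognize_google(listening, language=language_tag("Kannada")).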


Implementation on an audio file

We saw how to use the API in real time with the microphone for English and Kannada. But what if we have an audio file and want to extract text from it? To do this, we first load the audio file with sr.AudioFile and then pass its contents to the same recognize_google() method.

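import speech_recognition as sr

recognition = sr.Recognizer()
# AudioFile accepts PCM WAV, AIFF/AIFF-C and native FLAC files
with sr.AudioFile('myvoice.wav') as inputs:
    # record() reads the whole file; listen() would stop at the first pause
    file_audio = recognition.record(inputs)
print('Analysing...')
try:
    convert_text = recognition.recognize_google(file_audio)
    print(convert_text)
except sr.UnknownValueError:  # speech was unintelligible
    print('Could not hear')
except sr.RequestError as error:  # network or API error
    print('Could not reach the Google API:', error)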
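Before sending a file, it can help to check its format and, for long recordings, to transcribe it in pieces. The sketch below uses Python's standard wave module to read a WAV file's header and plans (offset, duration) pairs in seconds. The helper names and the 30-second chunk length are assumptions for illustration, not limits documented by Google; to stay self-contained, the example writes a short silent file instead of using myvoice.wav.

```python
import math
import os
import tempfile
import wave
from typing import Dict, List, Tuple


def wav_info(path: str) -> Dict[str, float]:
    """Read format details from a WAV file header using the standard library."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate": rate,
            "duration_s": frames / rate,
        }


def plan_chunks(duration_s: float, chunk_s: float = 30.0) -> List[Tuple[float, float]]:
    """Split a recording into (offset, duration) pairs of at most chunk_s seconds."""
    count = math.ceil(duration_s / chunk_s)
    return [(i * chunk_s, min(chunk_s, duration_s - i * chunk_s)) for i in range(count)]


# Write 75 seconds of 16 kHz, 16-bit mono silence as a stand-in recording.
path = os.path.join(tempfile.mkdtemp(), "sample.wav")
with wave.open(path, "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)  # 2 bytes = 16-bit samples
    wav.setframerate(16_000)
    wav.writeframes(b"\x00\x00" * 16_000 * 75)

info = wav_info(path)
print(info)
print(plan_chunks(info["duration_s"]))
```

Each planned chunk could then be read by opening the file again with sr.AudioFile and calling recognition.record(inputs, offset=offset, duration=duration) before passing the result to recognize_google().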

The contents of the audio file are printed as text.

Conclusion

In this article, we saw how to use the Google API to convert speech to text, both live from the microphone in English and Kannada and from an audio file. This can be useful in natural language processing projects that work with audio files and transcripts.


Bhoomika Madhukar

I am an aspiring data scientist with a passion for teaching. I am a computer science graduate from Dayananda Sagar Institute. I have experience in building models in deep learning and reinforcement learning. My goal is to use AI in the field of education to make learning meaningful for everyone.