
Step By Step Guide To Create Your Own Speech Classifier

Text classification is one of the most common problems in natural language processing. Over the past few years, research in this area has produced many state-of-the-art language models that perform classification tasks with high accuracy. Text classification powers many real-world applications, from simple spam filtering to voice assistants like Alexa, which classify the user’s input to understand the context of the spoken words.

In this article, we will build on the basic idea of giving the machine the power to listen to human speech and classify what the person is talking about.

In one of our previous tutorials, we talked about a library called Simple Transformers that makes it easy to implement many transformer-based state-of-the-art language models. We used the library to classify any number of input texts into one of the following four news sections:

  • Politics
  • Technology
  • Entertainment
  • Business

In this article, we will use the same classifier to classify what you speak into one of those four categories effortlessly with very minor changes in code. 

Setting Up The Project Environment

Creating A Virtual Environment

Pre-requisites:

  • Anaconda

We will use Anaconda to create a Python virtual environment for our project.

You can download and install Anaconda here.

Once you have installed Anaconda, you can activate it in your terminal or command prompt using the command conda activate.

Open up your terminal window and move to a directory where you want your project files to reside.

Note:

If your terminal prompt starts with (base), your conda environment is active by default. You can deactivate it with the conda deactivate command when necessary.

For now we will use the conda environment to create a virtual environment for our project. Make sure the python, pip and virtualenv modules belong to the Conda environment before proceeding. You can do this by typing the following commands:

Activate your Anaconda environment (if not active by default):

conda activate

Display the location of python 

which python

Display the location of pip

which pip

Display the location of virtualenv

which virtualenv
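The same check can be sketched in Python. This is a minimal sketch, not part of the original tutorial: it assumes conda has set the CONDA_PREFIX environment variable (which conda activate does), and the helper names belongs_to_prefix and check_tools are hypothetical. It does not resolve symlinks, so treat the result as a hint rather than proof.

```python
import os
import shutil
from pathlib import Path


def belongs_to_prefix(executable_path, prefix):
    """Return True if executable_path lies somewhere under the prefix directory."""
    return Path(prefix) in Path(executable_path).parents


def check_tools(tools=("python", "pip", "virtualenv")):
    """Map each tool to (path found on PATH, whether it lies under CONDA_PREFIX)."""
    prefix = os.environ.get("CONDA_PREFIX")  # set by `conda activate`; None otherwise
    report = {}
    for tool in tools:
        path = shutil.which(tool)  # Python equivalent of the `which` command
        inside = bool(path and prefix and belongs_to_prefix(path, prefix))
        report[tool] = (path, inside)
    return report


for tool, (path, inside) in check_tools().items():
    print(f"{tool}: {path} (inside conda environment: {inside})")
```

If any tool reports False, its path points outside the active conda environment.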

If all the modules belong to the Anaconda distribution, we can proceed to create our environment. To create a virtual environment, type and execute the following command:

virtualenv your_project_name

Once the process completes, you will find a new directory that contains your working environment. Now we can deactivate the conda environment and activate our project environment. Type and execute the following:

Move into the new directory 

cd your_project_name

Deactivate Conda

conda deactivate

Activate project environment

source bin/activate

See the below image for reference:

Installing Dependencies

We now have a virtual environment, so let’s install our project dependencies. We have four main dependencies, listed below. Copy them, one per line and without the bullets, into a file named requirements.txt in your project folder.

The project requires the following packages and modules:

  • SpeechRecognition==3.8.1
  • PyAudio==0.2.11
  • simpletransformers==0.10.4
  • torch==1.3.1
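For reference, the contents of requirements.txt can be sketched in Python as below. The pins are copied from the list above. The helper name requirements_text is hypothetical and only illustrates the one-requirement-per-line format that pip expects.

```python
# Version pins copied from the list above.
PINNED_REQUIREMENTS = [
    "SpeechRecognition==3.8.1",
    "PyAudio==0.2.11",
    "simpletransformers==0.10.4",
    "torch==1.3.1",
]


def requirements_text(requirements=PINNED_REQUIREMENTS):
    """Return the file contents: one pinned requirement per line."""
    return "\n".join(requirements) + "\n"


print(requirements_text(), end="")
```

To create the file from Python instead of a text editor, write the returned string to disk, for example with pathlib.Path("requirements.txt").write_text(requirements_text()).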

You can find the complete environment requirements here.

To install the above-mentioned modules, type and execute the following command:

pip install -r requirements.txt

Important Note:

Make sure you are in the project directory and the environment is active. When the environment is active, your terminal prompt will start with the environment name (the project directory name).

See the below image for reference:

Great job! 

We can finally build our application now:

Text Classifier To A Speech Classifier

A Walkthrough Of the Text Classifier

In Transformers Simplified: A Hands-on Intro To Text Classification Using Simple Transformers, we discussed in detail how to implement text classification using the RoBERTa model. To build a speech classifier, we will take the model trained there and use it to classify text that arrives as speech input.

Getting The Data

We will use data from MachineHack’s Predict The News Category Hackathon. To get the datasets, head to MachineHack, sign up and start the course.

Dataset Features:

The problem is to classify a given piece of text into one of the four sections given below.

Size of training set: 7,628 records

Size of test set: 2,748 records

FEATURES:

STORY: A part of the main content of the article to be published as a piece of news.

SECTION: The genre/category the STORY falls in.

There are four distinct sections that each story may fall into. The sections are labelled as follows:

Politics: 0

Technology: 1

Entertainment: 2

Business: 3
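Since the classifier works with these integer labels, it helps to keep the mapping in one place. A minimal sketch follows; the names SECTION_LABELS, LABEL_IDS and section_name are hypothetical and not taken from the original notebook.

```python
# Integer labels used in the SECTION column, as listed above.
SECTION_LABELS = {
    0: "Politics",
    1: "Technology",
    2: "Entertainment",
    3: "Business",
}

# Reverse lookup from section name to integer label.
LABEL_IDS = {name: label for label, name in SECTION_LABELS.items()}


def section_name(label):
    """Translate a predicted integer label into its section name."""
    return SECTION_LABELS[int(label)]


print(section_name(2))  # Entertainment
```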

Download the datasets and move the training data into your project directory.

Building A Simple Text Classifier & Saving It

From within your project environment, activate the conda environment as well to get access to Jupyter Notebook. You now have two active environments.

Type and execute jupyter notebook to open up the notebook app. See the image below:

Click on new and create a python 3 notebook as follows:

All the steps have been explained in detail here.

On executing the above notebook, a file called News_Classifier.pkl will be created in your project directory. We will use this for our Speech Classifier.

The Speech Classifier

Finally, we are all set to classify our spoken words. We will implement this in just two simple blocks of code.

Here is what we need to do. First, we will load the saved News_Classifier model. Then we will write a few lines of code to get input from a microphone and convert it into text. Finally, we will classify the text using the loaded model. It is as simple as that, so let’s implement it.

Open a new python3 notebook and fill in the following code blocks:

Loading The Saved Model

Pickle is a Python utility that allows us to save and export Python objects.

import pickle

filename = 'News_Classifier.pkl'

# Open the file in a context manager so it is closed once the model is loaded
with open(filename, 'rb') as f:
    model = pickle.load(f)
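To see how saving and loading fit together, here is a minimal round-trip sketch. It uses a plain dictionary as a stand-in for the trained model, and the helper names save_object and load_object are hypothetical. Note that unpickling can execute arbitrary code, so only load pickle files that you created yourself or that come from a source you trust.

```python
import pickle
import tempfile
from pathlib import Path


def save_object(obj, path):
    """Serialise obj to path with pickle."""
    with open(path, "wb") as f:
        pickle.dump(obj, f)


def load_object(path):
    """Load a pickled object back from path."""
    with open(path, "rb") as f:
        return pickle.load(f)


# A stand-in object; in the tutorial this would be the trained classifier.
stand_in = {"labels": ["Politics", "Technology", "Entertainment", "Business"]}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "stand_in.pkl"
    save_object(stand_in, path)
    restored = load_object(path)

print(restored == stand_in)  # True
```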

Getting The Speech Input & Classifying The Speech

speech_recognition is a very handy library that allows us to use a microphone to record a speech input and convert it into text. You can read more about the library here. The code for this step first declares and initialises the listener, in this case a microphone. The speech_recognition recogniser gets the input from the microphone and converts it into text. We save the text and feed it into our classifier, which then returns a label.
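The steps described above can be sketched as follows. This is a hedged sketch, not the author's original notebook: it assumes Google's free web recogniser (recognize_google), since the article does not name the engine, and it assumes the pickled object is a Simple Transformers classification model whose predict method returns a (predictions, raw_outputs) pair. The function names transcribe_from_microphone, classify_text and main are hypothetical.

```python
import pickle

# Integer labels as defined in the dataset description.
SECTIONS = {0: "Politics", 1: "Technology", 2: "Entertainment", 3: "Business"}


def transcribe_from_microphone():
    """Record one utterance and return its transcript, or None on failure."""
    # Imported here so the rest of the sketch loads without the library.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.Microphone() as source:  # requires PyAudio
        print("Listening...")
        audio = recognizer.listen(source)
    try:
        # Assumption: Google's web recogniser; needs an internet connection.
        return recognizer.recognize_google(audio)
    except sr.UnknownValueError:
        return None  # the speech could not be understood
    except sr.RequestError:
        return None  # the recognition service could not be reached


def classify_text(model, text):
    """Classify one piece of text and return its section name."""
    # Assumption: predict takes a list of texts and returns (predictions, raw_outputs).
    predictions, _raw_outputs = model.predict([text])
    return SECTIONS[int(predictions[0])]


def main(model_path="News_Classifier.pkl"):
    """Load the saved model, listen once, and print the predicted section."""
    with open(model_path, "rb") as f:
        model = pickle.load(f)
    text = transcribe_from_microphone()
    if text is None:
        print("Could not transcribe the speech input.")
        return
    print(f"You said: {text}")
    print(f"Predicted section: {classify_text(model, text)}")
```

Call main() from a notebook cell or at the end of a script to run one listen-and-classify round. Keeping classify_text separate from the microphone code makes it possible to try the classifier on typed text first.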

Complete Code:

You can also run the speech classifier in the terminal by downloading the above notebook as a Python (.py) file and executing the file as shown below:

Here are the complete directory contents for your reference:

All the codes are available here.

Great! You created your first Speech Classifier! You can now experiment with the model and train it to identify many different classes or you may even build a Sentiment classifier effortlessly using the above code.


Amal Nair

A Computer Science Engineer turned Data Scientist who is passionate about AI and all related technologies. Contact: amal.nair@analyticsindiamag.com
