Hands-On Guide To Sign Language Classification Using CNN

Dr. Vaibhav Kumar

Computer vision has many interesting applications, ranging from industrial to social ones. It has also been applied to assistive technology for people with physical disabilities. For people who are deaf or have speech impairments, computer vision can translate sign language symbols into letters of the English alphabet: it recognizes the hand symbol in an image and predicts the corresponding letter through sign language classification.

In this article, we will classify sign language symbols using a Convolutional Neural Network (CNN). Once the CNN model is trained, it will predict the letter that corresponds to a sign language symbol. We will evaluate the classification performance of the model using non-normalized and normalized confusion matrices. Finally, we will obtain the classification accuracy score of the CNN model on this task.

The Data Set

In this article, we use the American Sign Language (ASL) data set known as Sign Language MNIST, which is publicly available on Kaggle. It contains 27,455 training images and 7,172 test images, each 28 x 28 pixels. The class labels run from 0 to 24, corresponding to the letters A to Y. There is no label for Z because its sign involves motion and, according to the dataset description on Kaggle, there are no samples for J (label 9) for the same reason. The dataset is provided in CSV format: the training file has 27,455 rows and 785 columns, where the first column holds the class label of the image and the remaining 784 columns hold the 28 x 28 pixel values. The test file follows the same layout.
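
The row layout described above can be illustrated with a small NumPy sketch. The row here is synthetic (random pixel values and an arbitrary label), not taken from the actual CSV, and the variable names are only illustrative.

```python
import numpy as np

# Synthetic stand-in for one CSV row: 1 label value followed by 784 pixel values.
rng = np.random.default_rng(0)
label = 3  # arbitrary example label
pixels = rng.integers(0, 256, size=784)
row = np.concatenate(([label], pixels))

# The first entry is the class label; the rest reshape into a 28 x 28 image.
row_label = int(row[0])
image = row[1:].reshape(28, 28)

print(row.shape)    # (785,)
print(image.shape)  # (28, 28)
```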

Implementation of Sign Language Classification

This code was implemented in Google Colab and the .py file was downloaded.

# -*- coding: utf-8 -*-

Automatically generated by Colaboratory.

Original file is located at

The training and test CSV files were uploaded to Google Drive, and the drive was mounted in the Colab notebook. The code snippet below is used for that purpose.

#Setting google drive as a directory for dataset
from google.colab import drive
# Mount under /content so that the relative path "gdrive/My Drive/..." used below resolves
drive.mount('/content/gdrive')

The directory of the uploaded CSV files is defined using the line of code below.

dir_path = "gdrive/My Drive/Dataset"

We will verify the contents of the directory using the lines of code below.

import os
for dirname, _, filenames in os.walk(dir_path):
    for filename in filenames:
        print(os.path.join(dirname, filename))

We will display the sign language image that is stored in the same directory as the CSV files.

from IPython.display import Image
Image('gdrive/My Drive/Dataset/amer_sign2.png')

Next, we will import the libraries needed for reading the dataset, preprocessing and visualization.

import pandas as pd
import numpy as np
import random
import matplotlib.pyplot as plt

We will read the training and test CSV files.

train = pd.read_csv('gdrive/My Drive/Dataset/sign_mnist_train.csv')
test = pd.read_csv('gdrive/My Drive/Dataset/sign_mnist_test.csv')

We will check the shape of the training and test data that we have read above.

print(train.shape)
print(test.shape)

We will check the training data to verify class labels and columns representing pixels.

train.head()

For further preprocessing and visualization, we will convert the data frames into arrays. 

# Create training and testing arrays
train_set = np.array(train, dtype = 'float32')
test_set = np.array(test, dtype='float32')

We will specify the class labels for the images.

#Specifying class labels
class_names = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y' ]
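
As a small sketch of how this list is used, an integer class label indexes directly into it. The helper `label_to_letter` below is hypothetical (it is not part of the original notebook), and the list is rebuilt from the standard library only to keep the example self-contained.

```python
import string

# Labels 0..24 index into the first 25 uppercase letters (A..Y).
class_names = list(string.ascii_uppercase[:25])

def label_to_letter(label):
    """Map an integer (or float-encoded) class label to its letter."""
    return class_names[int(label)]

print(label_to_letter(0))     # A
print(label_to_letter(24.0))  # Y
```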

We will check a random image from the training set to verify its class label.

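#See a random image for class label verification
i = random.randint(0, len(train) - 1)  # randint includes both endpoints
label_index = int(train["label"][i])
plt.imshow(train_set[i, 1:].reshape((28, 28)))
plt.title(class_names[label_index])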

Now, we will plot some random images from the training set with their class labels.

# Define the dimensions of the plot grid 
W_grid = 5
L_grid = 5
fig, axes = plt.subplots(L_grid, W_grid, figsize = (10,10))
axes = axes.ravel() # flatten the 5 x 5 grid of axes into an array of 25
n_train = len(train_set) # get the length of the train dataset
for i in np.arange(0, W_grid * L_grid): # one iteration per subplot
    # Select a random index from 0 to n_train - 1
    index = np.random.randint(0, n_train)
    # read and display an image with the selected index    
    axes[i].imshow( train_set[index,1:].reshape((28,28)) )
    label_index = int(train_set[index,0])
    axes[i].set_title(class_names[label_index], fontsize = 8)
    axes[i].axis('off')
plt.subplots_adjust(hspace=0.4)

In the next step, we will preprocess our datasets to prepare them for training. The pixel values are scaled to the range 0 to 1, and the labels are separated from the pixels.

# Prepare the training and testing dataset 
X_train = train_set[:, 1:] / 255
y_train = train_set[:, 0]

X_test = test_set[:, 1:] / 255
y_test = test_set[:,0]
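
Dividing by 255 maps 8-bit pixel intensities onto the range 0 to 1. A minimal sketch with a few made-up pixel values (not taken from the dataset):

```python
import numpy as np

# Synthetic 8-bit pixel values standing in for a few image columns.
raw = np.array([[0, 64, 128, 255]], dtype='float32')
scaled = raw / 255

print(scaled.min(), scaled.max())  # 0.0 1.0
```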

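From the processed training data, we will plot the first 25 images with their class labels.

#Visualize train images
plt.figure(figsize=(10, 10))
for i in range(25):
    plt.subplot(5, 5, i + 1)
    plt.xticks([])
    plt.yticks([])
    plt.imshow(X_train[i].reshape((28, 28)), cmap=plt.cm.binary)
    label_index = int(y_train[i])
    plt.title(class_names[label_index])
plt.show()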

Now, to train the model, we will split the training data into training and validation sets. The test set is kept aside for the final evaluation.

#Split the training data into training and validation sets
from sklearn.model_selection import train_test_split
X_train, X_validate, y_train, y_validate = train_test_split(X_train, y_train, test_size = 0.2, random_state = 12345)
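
The idea behind this hold-out split can be sketched in plain NumPy: shuffle the sample indices once, then slice off a fraction for validation. The helper `holdout_split` is hypothetical and is not how scikit-learn implements `train_test_split`; the data below is synthetic.

```python
import numpy as np

def holdout_split(X, y, validate_fraction=0.2, seed=12345):
    """Shuffle the sample indices once, then slice off a validation part."""
    n = len(X)
    n_validate = int(round(n * validate_fraction))
    order = np.random.default_rng(seed).permutation(n)
    validate_idx, train_idx = order[:n_validate], order[n_validate:]
    return X[train_idx], X[validate_idx], y[train_idx], y[validate_idx]

# Synthetic data: 1000 flattened "images" and labels in 0..24.
X = np.zeros((1000, 784), dtype='float32')
y = np.arange(1000) % 25
X_tr, X_val, y_tr, y_val = holdout_split(X, y)
print(X_tr.shape, X_val.shape)  # (800, 784) (200, 784)
```

Fixing the seed, like `random_state` in the call above, makes the split reproducible between runs.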

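Now, we will check the shape of the training data set.

print(X_train.shape)
print(y_train.shape)

Before training, we will reshape the flat pixel arrays of the training, test and validation sets into 28 x 28 single-channel images, which is the input format the CNN expects.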

# Reshape the flat pixel rows into 28 x 28 single-channel images
X_train = X_train.reshape(X_train.shape[0], *(28, 28, 1))
X_test = X_test.reshape(X_test.shape[0], *(28, 28, 1))
X_validate = X_validate.reshape(X_validate.shape[0], *(28, 28, 1))
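
To see what this reshape does, here is a small sketch on synthetic data (four made-up flattened images). It also checks that pixel order is preserved: pixel k of a flat row ends up at row k // 28, column k % 28.

```python
import numpy as np

# Four synthetic flattened images, 784 values each.
flat = np.arange(4 * 784, dtype='float32').reshape(4, 784)

# Add height, width and a single grayscale channel axis.
images = flat.reshape(flat.shape[0], 28, 28, 1)

print(images.shape)  # (4, 28, 28, 1)
# Row-major reshape keeps pixel order: pixel k lands at (k // 28, k % 28).
assert images[0, 1, 2, 0] == flat[0, 1 * 28 + 2]
```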


Convolutional Neural Network

In the next step, we will define our Convolutional Neural Network (CNN) model. For this purpose, we will first import the required libraries. Make sure that TensorFlow is installed if you are working on your local system.

#Library for CNN Model
import keras
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Dense, Flatten, Dropout
from keras.optimizers import Adam
from keras.callbacks import TensorBoard

#Defining the Convolutional Neural Network
cnn_model = Sequential()

# input_shape is only needed on the first layer
cnn_model.add(Conv2D(32, (3, 3), input_shape = (28,28,1), activation='relu'))
cnn_model.add(MaxPooling2D(pool_size = (2, 2)))

cnn_model.add(Conv2D(64, (3, 3), activation='relu'))
cnn_model.add(MaxPooling2D(pool_size = (2, 2)))

cnn_model.add(Conv2D(128, (3, 3), activation='relu'))
cnn_model.add(MaxPooling2D(pool_size = (2, 2)))

# Flatten the feature maps into a vector before the dense layers
cnn_model.add(Flatten())

cnn_model.add(Dense(units = 512, activation = 'relu'))
cnn_model.add(Dense(units = 25, activation = 'softmax'))
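
The effect of this stack on the image size can be worked out by hand. The sketch below assumes the Keras defaults (3 x 3 convolutions with stride 1 and no padding, 2 x 2 pooling with stride 2) and that the feature maps are flattened before the first dense layer. The helper functions are hypothetical and only do the arithmetic.

```python
def conv_out(size, kernel=3):
    """Output width of a stride-1 convolution without padding."""
    return size - kernel + 1

def pool_out(size, pool=2):
    """Output width of max pooling (stride equals pool size, remainder dropped)."""
    return size // pool

size = 28
sizes = []
for _ in range(3):  # three Conv2D + MaxPooling2D pairs
    size = pool_out(conv_out(size))
    sizes.append(size)
print(sizes)  # [13, 5, 1]

def conv_params(in_ch, out_ch, kernel=3):
    """Weights plus one bias per output channel."""
    return kernel * kernel * in_ch * out_ch + out_ch

def dense_params(n_in, n_out):
    """Weights plus one bias per output unit."""
    return n_in * n_out + n_out

flat_units = sizes[-1] * sizes[-1] * 128
total = (conv_params(1, 32) + conv_params(32, 64) + conv_params(64, 128)
         + dense_params(flat_units, 512) + dense_params(512, 25))
print(total)  # 171545
```

Under these assumptions the last pooling layer leaves a 1 x 1 map with 128 channels, so the dense layers receive a 128-element vector per image.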

After defining our model, we will check the model by its summary.

#CNN Model Summary
cnn_model.summary()

In the next step, we will compile and train the CNN model.

cnn_model.compile(loss ='sparse_categorical_crossentropy', optimizer='adam' ,metrics =['accuracy'])
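
The `sparse_categorical_crossentropy` loss works with integer class labels rather than one-hot vectors: for each sample it takes the negative log of the probability the model assigns to the true class. A minimal NumPy sketch of that definition, with made-up probabilities (this is an illustration, not the Keras implementation):

```python
import numpy as np

def sparse_categorical_crossentropy(y_true, probs):
    """Mean of -log(probability assigned to the true class)."""
    y_true = np.asarray(y_true, dtype=int)
    picked = probs[np.arange(len(y_true)), y_true]
    return float(np.mean(-np.log(picked)))

probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])
labels = [0.0, 1.0]  # integer class ids, float-encoded as in y_train
loss = sparse_categorical_crossentropy(labels, probs)
print(round(loss, 4))
```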

#Training the CNN model
history = cnn_model.fit(X_train, y_train, batch_size = 512, epochs = 50, verbose = 1, validation_data = (X_validate, y_validate))
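
As a rough sanity check on the training settings, the number of weight updates can be estimated by hand. The sketch below assumes that the 80/20 split leaves 21,964 of the 27,455 training rows for fitting and that each epoch runs one step per batch, with a smaller final batch.

```python
import math

n_total = 27455                    # training rows in the CSV
n_validate = n_total * 20 // 100   # 20% held out for validation
n_train = n_total - n_validate
batch_size, epochs = 512, 50

steps_per_epoch = math.ceil(n_train / batch_size)
print(n_train, steps_per_epoch, steps_per_epoch * epochs)  # 21964 43 2150
```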


After successful training, we will visualize the training performance of the CNN model.

#Visualizing the training performance
plt.figure(figsize=(12, 8))

plt.subplot(2, 2, 1)
plt.plot(history.history['loss'], label='Loss')
plt.plot(history.history['val_loss'], label='val_Loss')
plt.legend()
plt.title('Loss evolution')

plt.subplot(2, 2, 2)
plt.plot(history.history['accuracy'], label='accuracy')
plt.plot(history.history['val_accuracy'], label='val_accuracy')
plt.legend()
plt.title('Accuracy evolution')
plt.show()

Once we find the training satisfactory, we will use our trained CNN model to make predictions on the unseen test data.

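#Predictions for the test data
# predict() returns class probabilities; argmax picks the most likely class
# (Sequential.predict_classes was removed in newer TensorFlow releases)
predicted_classes = np.argmax(cnn_model.predict(X_test), axis=1)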
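
The softmax output layer produces one probability per class for each image, and the predicted class is the index of the largest one. A small sketch with made-up probabilities for three classes:

```python
import numpy as np

# Each row: class probabilities for one image (made-up values).
probs = np.array([[0.05, 0.90, 0.05],
                  [0.60, 0.30, 0.10],
                  [0.20, 0.20, 0.60]])
predicted = np.argmax(probs, axis=1)
print(predicted)  # [1 0 2]
```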

The CNN model has predicted the class labels for the test images. These predictions will be visualized for the first 25 test images.

#Visualize predictions
L = 5
W = 5

fig, axes = plt.subplots(L, W, figsize = (12,12))
axes = axes.ravel()

for i in np.arange(0, L * W):  
    axes[i].imshow(X_test[i].reshape(28,28))
    axes[i].set_title(f"Prediction Class = {predicted_classes[i]:0.1f}\n True Class = {y_test[i]:0.1f}")
    axes[i].axis('off')
plt.subplots_adjust(wspace=0.5)

As we can see in the above visualization, the CNN model has predicted the correct class labels for almost all the images. Now we will examine the classification results in more detail using non-normalized and normalized confusion matrices.

from sklearn.metrics import confusion_matrix
from sklearn import metrics
cm = metrics.confusion_matrix(y_test, predicted_classes)

We will define a function to plot the confusion matrix.

#Defining function for confusion matrix plot
def plot_confusion_matrix(y_true, y_pred, classes,
                          normalize=False,
                          title=None,
                          cmap=plt.cm.Blues):
    if not title:
        if normalize:
            title = 'Normalized confusion matrix'
        else:
            title = 'Confusion matrix, without normalization'

    # Computing confusion matrix; passing labels keeps one row and column per
    # entry of `classes`, even for a class with no samples
    y_true = np.asarray(y_true).astype(int)
    y_pred = np.asarray(y_pred).astype(int)
    cm = confusion_matrix(y_true, y_pred, labels=np.arange(len(classes)))
    if normalize:
        # Divide each row by its sample count (rows without samples stay zero)
        cm = cm.astype('float') / np.maximum(cm.sum(axis=1), 1)[:, np.newaxis]
        print("Normalized confusion matrix")
    else:
        print('Confusion matrix, without normalization')

    # Visualizing
    fig, ax = plt.subplots(figsize=(10, 10))
    im = ax.imshow(cm, interpolation='nearest', cmap=cmap)
    ax.figure.colorbar(im, ax=ax)
    # We want to show all ticks...
    ax.set(xticks=np.arange(cm.shape[1]),
           yticks=np.arange(cm.shape[0]),
           # ... and label them with the respective list entries
           xticklabels=classes, yticklabels=classes,
           title=title,
           ylabel='True label',
           xlabel='Predicted label')

    # Rotating the tick labels and setting their alignment.
    plt.setp(ax.get_xticklabels(), rotation=45, ha="right",
             rotation_mode="anchor")
    # Looping over data dimensions and create text annotations.
    fmt = '.2f' if normalize else 'd'
    thresh = cm.max() / 2.
    for i in range(cm.shape[0]):
        for j in range(cm.shape[1]):
            ax.text(j, i, format(cm[i, j], fmt),
                    ha="center", va="center",
                    color="white" if cm[i, j] > thresh else "black")
    return ax
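
Row normalization divides each row of the confusion matrix by the number of true samples in that class, so the diagonal shows the fraction of each class that was predicted correctly. A small sketch on a made-up 3-class matrix; the guard for a class without samples is an added assumption for this example.

```python
import numpy as np

cm = np.array([[8, 2, 0],
               [1, 9, 0],
               [0, 0, 0]])  # third class has no samples
row_sums = cm.sum(axis=1, keepdims=True)
cm_norm = cm / np.maximum(row_sums, 1)  # guard against division by zero
print(cm_norm)
```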

Before plotting the confusion matrix, we will specify the class labels.

#Specifying class labels
class_names = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y' ]

#Non-Normalized Confusion Matrix
plot_confusion_matrix(y_test, predicted_classes, classes = class_names, title='Non-Normalized Confusion matrix')

#Normalized Confusion Matrix
plot_confusion_matrix(y_test, predicted_classes, classes = class_names, normalize=True, title='Normalized Confusion matrix')

As the normalized confusion matrix shows, the CNN model has achieved 100% accuracy for 12 of the classes. Now, we will obtain the overall classification accuracy score.

#Classification accuracy
from sklearn.metrics import accuracy_score
acc_score = accuracy_score(y_test, predicted_classes)
print('Accuracy Score = ',acc_score)
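
The accuracy score is simply the fraction of test images whose predicted label matches the true label. A minimal NumPy sketch with made-up labels (not the model's actual predictions):

```python
import numpy as np

y_true = np.array([0., 1., 2., 2., 1.], dtype='float32')  # float-encoded, as in y_test
y_pred = np.array([0, 1, 2, 1, 1])
accuracy = float(np.mean(y_true == y_pred))
print(accuracy)  # 0.8
```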

Here, we can conclude that the Convolutional Neural Network has performed very well in classifying sign language symbol images. The overall accuracy score of the model is more than 96%, and it may be improved further by tuning the hyperparameters. We trained the model for 50 epochs, and training for more epochs might also improve the accuracy.
