Hand-gesture detection and recognition have been among the most active research topics of the last few decades, and many data scientists and researchers have successfully applied them to interpreters for the blind, augmented reality and hand-controlled robots.
In general terms, a gesture is “a movement of a part of a body like hand or head which intends to express an idea or a meaning”. Research on evolution suggests that manual gestures were the first step towards communication in human history. Even newborns use hand gestures to express their desires, long before they start speaking. Similarly, gestures can be used to communicate with machines and to trigger actions.
Traditional gesture recognition was only possible with external hardware controllers or wired gloves that register the user’s intentions from hand and arm movements. Microsoft’s Kinect, introduced in November 2010, is one of the best-known examples of such hardware, and it set a Guinness World Record for the fastest-selling consumer device when it was launched. The modern approach instead relies heavily on deep learning algorithms and computer vision, without any dedicated hardware devices.
Collectively, this whole process can be called AirGesture, as you don’t have to touch the screen or your keyboard to communicate with the machine.
The workflow for creating the model is:
- Create raw data using a webcam, or use an existing dataset such as:
  - Sign Language MNIST Dataset – https://www.kaggle.com/datamunge/sign-language-mnist
  - 20BN Jester Dataset – https://20bn.com/datasets/jester
  - LeapGestRecog Dataset – https://www.kaggle.com/gti-upm/leapgestrecog
- Pre-process the chosen dataset. Data augmentation can also be used to generate more images and add variety to the data.
- Train the final model, which is built in 3 parts:
  - Create Convolution Network ( ConvNet )
  - Apply Pooling to the layers
  - Apply Flattening to the output of pooling layers
- Visualize and evaluate the model
The following is a basic implementation of a model for predicting hand gestures:
- Data Creation/Using Datasets:
Our data consists of images of different kinds of hand gestures, taken either from a webcam or from an existing dataset.
The hand gesture recognition dataset used here is composed of near-infrared images acquired by the Leap Motion sensor. The database contains 10 different hand gestures performed by 10 different subjects (5 men and 5 women), for a total of 40000 images.
- Data Pre-processing
The images in our dataset are resized to a fixed square, single-channel (grayscale) format of IMG_SIZE x IMG_SIZE x 1, with IMG_SIZE = 150 in the code below, and converted to NumPy arrays so that they are suitable for tensor processing during training. If you have only a small amount of training data, you can additionally use data augmentation.
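As a rough illustration of the data augmentation mentioned above, the sketch below creates extra training images from a batch of grayscale images using only NumPy. The function name `augment_batch`, the shift range and the flip probability are assumptions made for this example and are not part of the original pipeline; Keras also provides its own augmentation utilities, which may be a better fit in practice.

```python
import numpy as np

def augment_batch(images, max_shift=5, flip_prob=0.5, seed=0):
    """Return a randomly shifted (and sometimes mirrored) copy of each image.

    images: array of shape (n, height, width, 1).
    """
    rng = np.random.default_rng(seed)
    augmented = np.empty_like(images)
    for idx, img in enumerate(images):
        # Random shift of up to max_shift pixels per axis.
        # np.roll wraps pixels around the border instead of padding.
        dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
        shifted = np.roll(img, shift=(dy, dx), axis=(0, 1))
        # Optional horizontal mirror; set flip_prob=0 if left/right
        # orientation matters for a gesture.
        if rng.random() < flip_prob:
            shifted = shifted[:, ::-1, :]
        augmented[idx] = shifted
    return augmented

# Example: 8 random stand-in images of size 50x50x1.
batch = np.random.default_rng(1).random((8, 50, 50, 1)).astype("float32")
extra = augment_batch(batch)
x_augmented = np.concatenate([batch, extra], axis=0)
print(x_augmented.shape)  # (16, 50, 50, 1)
```

The labels of the augmented images are the same as those of the originals, so the label array would need to be duplicated in the same order.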
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
import cv2
import os
import pandas as pd
from tensorflow import keras
# Import Keras through TensorFlow consistently; mixing the standalone keras
# package with tensorflow.keras can lead to version conflicts.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dense, Flatten, Dropout
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
from tqdm import tqdm
from random import shuffle
from zipfile import ZipFile
from PIL import Image
Code Snippet – 1 : Import all the dependencies/libraries required
lookup = dict()         # gesture folder name -> integer label
reverselookup = dict()  # integer label -> gesture folder name
count = 0
for j in os.listdir('../input/leapgestrecog/leapGestRecog/00/'):
    if not j.startswith('.'):  # If running this code locally, this is to
                               # ensure you aren't reading in hidden folders
        lookup[j] = count
        reverselookup[count] = j
        count = count + 1
lookup
Code Snippet – 2: Looking up into the dataset
These are the 10 different types of gestures in the dataset.
x_data = []
y_data = []
IMG_SIZE = 150
datacount = 0  # total number of images loaded
for i in range(0, 10):  # the 10 subject folders, 00 to 09
    for j in os.listdir('../input/leapgestrecog/leapGestRecog/0' + str(i) + '/'):
        if not j.startswith('.'):
            count = 0  # number of images for this gesture
            for k in os.listdir('../input/leapgestrecog/leapGestRecog/0' + str(i) + '/' + j + '/'):
                path = '../input/leapgestrecog/leapGestRecog/0' + str(i) + '/' + j + '/' + k
                img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
                img = cv2.resize(img, (IMG_SIZE, IMG_SIZE))
                arr = np.array(img)
                x_data.append(arr)
                count = count + 1
            # One label per image, taken from the gesture folder name
            y_values = np.full((count, 1), lookup[j])
            y_data.append(y_values)
            datacount = datacount + count
x_data = np.array(x_data, dtype='float32')
y_data = np.array(y_data)
y_data = y_data.reshape(datacount, 1)
Code Snippet – 3 : Loading the dataset and preparing it for further processing
Now that the dataset is loaded, the following block of code lets us peek at a few of its images.
fig, ax = plt.subplots(5, 2)
fig.set_size_inches(15, 15)
for i in range(5):
    for j in range(2):
        # np.random.randint excludes the upper bound, so the index is always valid
        l = np.random.randint(0, len(y_data))
        ax[i, j].imshow(x_data[l], cmap='gray')
        ax[i, j].set_title(reverselookup[y_data[l, 0]])
plt.tight_layout()
Code Snippet – 4 : To peek into the dataset
- Dividing dataset into testing and training sets
A training set is used to build the model, while a test set is used to validate the model once it is built. Data points in the training set are excluded from the test set. A dataset is usually divided either into a training set and a test set, or into a training set, a validation set and a test set.
The following block of code reshapes and normalises the dataset and divides it into training and testing sets.
y_data = keras.utils.to_categorical(y_data)  # one-hot encode the integer labels
x_data = x_data.reshape((datacount, IMG_SIZE, IMG_SIZE, 1))
x_data = x_data / 255  # scale pixel values to the range [0, 1]
x_train, x_test, y_train, y_test = train_test_split(x_data, y_data, test_size=0.20, random_state=42)
Code Snippet – 5 : Train – Test – Split Dataset
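To make the one-hot encoding, normalisation and splitting steps above more concrete, here is a small NumPy-only sketch of what they do on toy data. The helper names `one_hot` and `split_train_test` are hypothetical stand-ins written for this illustration; they mimic, but are not, the Keras and scikit-learn functions used in the tutorial.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Turn integer labels of shape (n, 1) or (n,) into one-hot rows."""
    labels = np.asarray(labels).reshape(-1)
    encoded = np.zeros((labels.size, num_classes), dtype="float32")
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

def split_train_test(x, y, test_size=0.20, seed=42):
    """Shuffle the sample indices once, then use the first part for testing."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(x))
    n_test = int(round(len(x) * test_size))
    test_idx, train_idx = order[:n_test], order[n_test:]
    return x[train_idx], x[test_idx], y[train_idx], y[test_idx]

# Toy data: 10 tiny 4x4 "images" with pixel values below 256 and labels 0..9.
x_toy = np.arange(10 * 4 * 4, dtype="float32").reshape(10, 4, 4, 1)
y_toy = np.arange(10).reshape(10, 1)

x_toy = x_toy / 255.0                    # scale pixels to [0, 1]
y_toy = one_hot(y_toy, num_classes=10)   # shape (10, 10)
x_tr, x_te, y_tr, y_te = split_train_test(x_toy, y_toy)
print(x_tr.shape, x_te.shape)  # (8, 4, 4, 1) (2, 4, 4, 1)
```

Because every sample index appears in exactly one of the two index sets, no training sample can leak into the test set.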
- Training of Deep Learning Model
Start by reviewing the packages that are being imported and ensure you have all the dependencies installed.
- Create Convolution Network ( ConvNet )
The main purpose of a convolutional layer is to extract features from the input image while preserving the spatial relationship between pixels, by learning image features from small squares of input data.
The two main hyperparameters are:
- filters – Integer value, the dimensionality of the output space (the number of output filters)
- kernel_size – An integer or tuple/list of 2 integers, specifying the height and width of the 2-D convolution window. Can be a single integer to specify the same value for all spatial dimensions.
Every image can be considered as a matrix of pixel values. Consider a 5×5 image whose pixel values are only 1 and 0. To convolve it with a 3×3 matrix, the 3×3 matrix is slid over the image one position at a time; at each position the overlapping values are multiplied element-wise and summed, which produces a 3×3 output called a feature map.
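The convolution of a 5×5 binary image with a 3×3 matrix can be worked through in a few lines of NumPy. The pixel and kernel values below are chosen for illustration only, and `convolve2d_valid` is a hypothetical helper; like Keras' Conv2D, it slides the kernel without flipping it (strictly speaking a cross-correlation).

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """Slide kernel over image (no padding, stride 1) and sum the products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    output = np.zeros((out_h, out_w), dtype=image.dtype)
    for row in range(out_h):
        for col in range(out_w):
            window = image[row:row + kh, col:col + kw]
            output[row, col] = np.sum(window * kernel)
    return output

# Illustrative 5x5 image containing only 0s and 1s.
image = np.array([[1, 1, 1, 0, 0],
                  [0, 1, 1, 1, 0],
                  [0, 0, 1, 1, 1],
                  [0, 0, 1, 1, 0],
                  [0, 1, 1, 0, 0]])

# Illustrative 3x3 kernel (filter).
kernel = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 0, 1]])

feature_map = convolve2d_valid(image, kernel)
print(feature_map)
# [[4 3 4]
#  [2 4 3]
#  [2 3 4]]
```

The output is 3×3 because a 3×3 window fits into a 5×5 image at 5 - 3 + 1 = 3 positions along each axis.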
- Apply Max Pooling to layers
Max pooling is applied because it reduces the dimensionality of each feature map while retaining the most important information.
- Apply Flattening to the output of the layers
During flattening, the matrix is converted into a linear array so that it can be fed into the nodes of our neural network.
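Both steps can be illustrated on a small feature map. The following is a minimal NumPy sketch, assuming a 2×2 pooling window with stride 2 (the setting used in the model); the helper name `max_pool_2x2` and the input values are made up for this example.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2; an odd trailing row/column is dropped."""
    h, w = feature_map.shape
    trimmed = feature_map[:h - h % 2, :w - w % 2]
    # Group the map into non-overlapping 2x2 blocks and keep each block's maximum.
    blocks = trimmed.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

feature_map = np.array([[1, 3, 2, 4],
                        [5, 6, 1, 2],
                        [7, 2, 9, 0],
                        [1, 8, 3, 4]])

pooled = max_pool_2x2(feature_map)  # 4x4 -> 2x2
flat = pooled.flatten()             # 2x2 -> vector of length 4
print(pooled)
# [[6 4]
#  [8 9]]
print(flat)  # [6 4 8 9]
```

Pooling keeps the strongest response in each 2×2 region, and flattening only changes the shape, not the values, so the dense layers receive every pooled activation.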
- The following is the implementation of the Convolutional Neural Network for the hand gesture recognition model:
model = Sequential()
# First block: 32 filters of size 5x5; 'same' padding keeps the spatial size
model.add(Conv2D(filters=32, kernel_size=(5, 5), padding='same', activation='relu', input_shape=(IMG_SIZE, IMG_SIZE, 1)))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dense(10, activation='softmax'))  # one output per gesture class
Code Snippet – 6: Convolution Neural Network for Hand Gesture Recognition
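To see how the image shrinks on its way through the network above, the layer arithmetic can be traced by hand. The sketch below is plain Python and does not use Keras; the helper names are hypothetical, and it assumes stride-1 convolutions, Keras' default 'valid' padding where none is given, and 2×2 pooling that rounds down.

```python
def conv_output(size, kernel, padding):
    """Spatial size after a stride-1 convolution."""
    return size if padding == "same" else size - kernel + 1

def pool_output(size, pool=2):
    """Spatial size after non-overlapping pooling (rounds down)."""
    return size // pool

IMG_SIZE = 150
layers = [
    ("conv", 5, "same", 32),
    ("pool",),
    ("conv", 3, "valid", 64),
    ("pool",),
    ("conv", 3, "valid", 64),
    ("pool",),
]

size, channels, params = IMG_SIZE, 1, 0
for layer in layers:
    if layer[0] == "conv":
        _, kernel, padding, filters = layer
        # Each filter has kernel*kernel*channels weights plus one bias.
        params += kernel * kernel * channels * filters + filters
        size, channels = conv_output(size, kernel, padding), filters
    else:
        size = pool_output(size)
    print(layer[0], (size, size, channels))

flat_units = size * size * channels
params += flat_units * 128 + 128  # Dense(128)
params += 128 * 10 + 10           # Dense(10)
print("flattened units:", flat_units)       # 18496
print("trainable parameters:", params)      # 2425162
```

Under these assumptions the last pooling layer outputs 17×17×64 values, so the first dense layer holds most of the parameters; calling `model.summary()` on the real model is the way to confirm the figures.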
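Next, we need to compile and fit the model.
# Compiling model
model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.001), loss='categorical_crossentropy', metrics=['accuracy'])
# epochs and batch_size are not defined in the original snippet;
# the values below are assumed placeholders and should be tuned.
epochs = 10
batch_size = 64
# Fitting the model
History = model.fit(x_train, y_train, epochs=epochs, batch_size=batch_size, verbose=1, validation_data=(x_test, y_test))
Code Snippet – 7: Compiling and fitting the model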
After training the model, we visualize the loss as well as the accuracy of the model on the training data and the test data.
plt.plot(History.history['loss'])
plt.plot(History.history['val_loss'])
plt.title('Model Loss')
plt.ylabel('Loss')
plt.xlabel('Epochs')
plt.legend(['train', 'test'])
plt.show()
Code Snippet – 8: Visualization of Loss function
plt.plot(History.history['accuracy'])
plt.plot(History.history['val_accuracy'])
plt.title('Model Accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epochs')
plt.legend(['train', 'test'])
plt.show()
Code Snippet – 9: Visualization of the Accuracy
Finally, we would like to compare the predicted gestures with the true labels on the original images.
def validate_gestures(predictions_array, true_label_array, img_array):
    # Class names in label order, taken from the lookup built in Snippet 2 so
    # that they always match the integer labels used for training.
    class_names = [reverselookup[i] for i in range(len(reverselookup))]
    plt.figure(figsize=(15, 5))
    for i in range(1, 10):
        prediction = predictions_array[i]
        # Labels are one-hot encoded, so recover the class index with argmax
        true_label = np.argmax(true_label_array[i])
        img = img_array[i]
        img = cv2.cvtColor(img, cv2.COLOR_GRAY2RGB)
        plt.subplot(3, 3, i)
        plt.grid(False)
        plt.xticks([])
        plt.yticks([])
        plt.imshow(img, cmap=plt.cm.binary)
        predicted_label = np.argmax(prediction)  # Get index of the predicted label from prediction
        if predicted_label == true_label:
            color = 'blue'
        else:
            color = 'red'
        plt.xlabel("Predicted: {} {:2.0f}% (Actual: {})".format(class_names[predicted_label],
                                                                100 * np.max(prediction),
                                                                class_names[true_label]),
                   color=color)
    plt.show()
Code Snippet – 10: Validate Gestures
And finally, we generate predictions on the test set and call the function we created to validate the gestures.
prediction = model.predict(x_test)  # class probabilities for each test image
validate_gestures(prediction, y_test, x_test)
Code Snippet – 11: Calling the validate_gestures function
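Beyond inspecting a handful of images, the predictions can be summarised in a confusion matrix. The sketch below builds one with NumPy from one-hot labels and predicted probabilities; the function name `confusion_matrix_from_probs` and the toy arrays are assumptions for illustration, standing in for `y_test` and the output of `model.predict`.

```python
import numpy as np

def confusion_matrix_from_probs(y_true_one_hot, y_prob, num_classes):
    """Rows are true classes, columns are predicted classes."""
    true_idx = np.argmax(y_true_one_hot, axis=1)
    pred_idx = np.argmax(y_prob, axis=1)
    matrix = np.zeros((num_classes, num_classes), dtype=int)
    # Add 1 to cell (true, predicted) for every sample.
    np.add.at(matrix, (true_idx, pred_idx), 1)
    return matrix

# Toy example: 4 samples, 3 classes.
y_true = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [0, 0, 1],
                   [0, 1, 0]])
y_prob = np.array([[0.8, 0.1, 0.1],
                   [0.2, 0.7, 0.1],
                   [0.1, 0.6, 0.3],   # true class 2, predicted class 1
                   [0.1, 0.8, 0.1]])

cm = confusion_matrix_from_probs(y_true, y_prob, num_classes=3)
accuracy = np.trace(cm) / cm.sum()
print(cm)
# [[1 0 0]
#  [0 2 0]
#  [0 1 0]]
print(accuracy)  # 0.75
```

Off-diagonal cells show which gestures are confused with which, something a single accuracy figure hides.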
Conclusion
In the code above we implemented a Convolutional Neural Network, which gives us a model trained specifically on multiple hand gestures. Using that model, we can build projects in which hand gestures are used to communicate with machines. The photo above shows one such project, which I named AirGesture because you don’t have to touch the keyboard or screen to play the game ( Battle Tank – 1990 ). Similarly, you can implement it for other games as well, such as Mario, Google Dino, etc.
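One possible way to connect the model's output to a game is to translate a confident prediction into an action. The sketch below is purely illustrative: the mapping `GESTURE_ACTIONS`, the action names, the function `gesture_to_action` and the 0.8 threshold are all hypothetical and are not taken from the AirGesture project.

```python
# Hypothetical mapping from gesture names to game actions.
GESTURE_ACTIONS = {
    "palm": "stop",
    "fist": "fire",
    "l": "turn_left",
    "thumb": "turn_right",
}

def gesture_to_action(probabilities, class_names, threshold=0.8):
    """Return the action for the most likely gesture, or None if unsure."""
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    if probabilities[best] < threshold:
        return None  # prediction not confident enough to act on
    return GESTURE_ACTIONS.get(class_names[best])  # None for unmapped gestures

names = ["palm", "l", "fist"]
print(gesture_to_action([0.05, 0.05, 0.90], names))  # fire
print(gesture_to_action([0.40, 0.35, 0.25], names))  # None
```

Ignoring low-confidence predictions is one simple way to avoid triggering actions while the hand is moving between gestures.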