Convolutional Neural Networks (CNNs) are computationally powerful and can automatically learn important features without human supervision. Compared to plain feed-forward neural networks, CNN models generally achieve higher accuracy on image data and are considered one of the strongest architectures for image classification; on some benchmarks they have even surpassed human performance.
This article covers a Kaggle problem: predicting the finger count in an image while also distinguishing between the left and right hand. We will build a CNN model for this classification task, import the data directly from Kaggle, and implement everything in Google Colab to take advantage of the GPU and TPU runtimes it provides. You can also use a Jupyter notebook or any other IDE to build the network.
What is in this article?
- Downloading dataset from Kaggle
- CNN Model for Finger Count Classification
- Training the CNN model
- Obtaining the accuracy
The dataset that we will download from Kaggle contains 21,600 images of left- and right-hand fingers, all 128 by 128 pixels: 18,000 images in the training set and 3,600 in the testing set. The label is encoded in the last two characters of each file name: 0-5 indicates the number of fingers and L/R indicates the left/right hand.
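To make the encoding concrete, here is a minimal sketch of how such a file name can be parsed into a label. The prefix in the example file name is an invented placeholder, not a real name from the dataset.

```python
# Minimal sketch: parse the finger count and hand from a file name.
# The dataset encodes the label in the two characters before ".png",
# e.g. "..._3L.png" means 3 fingers, left hand. The prefix below is
# an invented placeholder for illustration.
def parse_label(filename):
    stem = filename[:-4]            # strip the ".png" extension
    fingers, hand = stem[-2], stem[-1]
    return int(fingers), hand

print(parse_label("c0ffee12-aaaa_3L.png"))  # (3, 'L')
```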
Implementing CNN For Finger Count Classification With GPU
First, we need to enable the GPU. To do so, go to ‘Runtime’ in Google Colab, click ‘Change runtime type’ and select GPU as shown in the image below.
Once you have set the runtime to GPU, hover over the area where RAM and disk usage are shown to check whether the GPU is enabled. If it is, you will see ‘Connected to Python 3 Google Compute Engine backend (GPU)’ as shown in the image below.
Downloading Dataset from Kaggle
Once it is enabled, we proceed by installing the dependencies. Since we will import the data directly from Kaggle, we need to install the package that supports this, as shown below.
!pip install kaggle
Once you have installed the package, import all the necessary libraries using the code below.
import os, glob
import random
import warnings
import numpy as np
from numpy import asarray
import cv2
import matplotlib.pyplot as plt
import tensorflow as tf
from zipfile import ZipFile
import skimage as sk
from skimage.io import imread
from skimage.transform import resize
from skimage import transform, util, io
from sklearn import metrics, preprocessing
from tqdm.notebook import tqdm
from keras.models import Sequential
from keras.layers import Convolution2D, Dropout, Dense
from keras.layers import BatchNormalization, MaxPooling2D, Flatten, LeakyReLU
from keras.optimizers import Adam, SGD
After importing the libraries, let’s fetch the data from Kaggle. To do so, we first need a kaggle.json file, which you get by creating a new API token on Kaggle: go to ‘My Account’ on Kaggle, scroll down to the API section, and click the option to create a new token. A file named ‘kaggle.json’ will be downloaded. Upload that file to Colab and set its permissions using the code shown below.
from google.colab import files
files.upload()

!mkdir -p ~/.kaggle
!cp kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json
Once this is done, we import the dataset directly into Google Colab using the API command given on the Kaggle challenge page.
!kaggle datasets download -d koryakinp/fingers
Running the above command downloads a zip file of the data. We now need to unzip it using the code below.
from zipfile import ZipFile

file_name = "fingers.zip"
with ZipFile(file_name, 'r') as zip:
    zip.extractall()
    print('Done')
After unzipping the data, we create two lists to store the training images and their corresponding labels, then read the training data into them using the code below.
X_train = []
y_train = []
os.chdir('/content/train')
for i in tqdm(os.listdir()):
    img = cv2.imread(i)
    img = cv2.resize(img, (128, 128))
    X_train.append(img)
    y_train.append(i[-6:-4])
Let’s visualize a few of the training images with their respective labels. Use the code below to display 10 training samples with their labels, as shown in the image.
%matplotlib inline
import matplotlib.pyplot as plt

plt.figure(figsize=(10, 1))
for i in range(10):
    plt.subplot(1, 10, i + 1)
    plt.imshow(X_train[i], cmap="gray")
    plt.axis('off')
plt.show()
print('Label for each of the above images: %s' % (y_train[0:10]))
We create two more lists to store the testing images and their corresponding labels, and read the images and labels into them in the same way.
X_test = []
y_test = []
os.chdir('/content/test')
for i in tqdm(os.listdir()):
    img = cv2.imread(i)
    img = cv2.resize(img, (128, 128))
    X_test.append(img)
    y_test.append(i[-6:-4])   # two-character label, matching the training labels
We then checked the shape of the training and testing images, which comes out to 128 × 128 × 3, and the number of unique labels in y_train and y_test, which is 12 each (6 finger counts × 2 hands).
After this, we transform the labels using LabelEncoder, convert them to categorical form with 12 classes, and turn the images and labels into NumPy arrays. You can refer to the code below for the same.
print("Shape of an image in X_train: ", X_train[0].shape)
print("Shape of an image in X_test: ", X_test[0].shape)
print("Total categories in y_train: ", len(np.unique(y_train)))
print("Total categories in y_test: ", len(np.unique(y_test)))
le = preprocessing.LabelEncoder()
y_train = le.fit_transform(y_train)
y_test = le.transform(y_test)   # reuse the fitted encoder so test labels map identically
y_train = tf.keras.utils.to_categorical(y_train, num_classes=12)
y_test = tf.keras.utils.to_categorical(y_test, num_classes=12)
X_train = np.array(X_train)
y_train = np.array(y_train)
X_test = np.array(X_test)
y_test = np.array(y_test)
Once this is done, we finally check the shapes of the training and testing images and their labels, as shown below.
print("X_train Shape: ", X_train.shape)
print("X_test Shape: ", X_test.shape)
print("y_train Shape: ", y_train.shape)
print("y_test Shape: ", y_test.shape)
CNN Model for Finger Count Classification
We then define a sequential model: a batch normalization layer followed by 4 convolution layers, each paired with a max-pooling layer and using the ReLU activation function, then a flatten layer and fully connected layers. The last fully connected layer has 12 output classes with a softmax activation.
m1 = Sequential()
m1.add(BatchNormalization(input_shape=(128, 128, 3)))
m1.add(Convolution2D(32, (3, 3), activation='relu'))
m1.add(MaxPooling2D(pool_size=2))
m1.add(Convolution2D(filters=6, kernel_size=4, padding='same', activation='relu'))
m1.add(MaxPooling2D(pool_size=2))
m1.add(Convolution2D(filters=128, kernel_size=3, padding='same', activation='relu'))
m1.add(MaxPooling2D(pool_size=2))
m1.add(Convolution2D(filters=128, kernel_size=2, padding='same', activation='relu'))
m1.add(MaxPooling2D(pool_size=2))
m1.add(Flatten())
m1.add(Dense(units=128, activation='relu'))
m1.add(Dense(units=64, activation='relu'))
m1.add(Dense(units=32, activation='relu'))
m1.add(Dense(units=12, activation='softmax'))
After this, we compile the model using Adam as the optimizer, categorical cross-entropy as the loss and accuracy as the metric, as shown below.
m1.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
Training the CNN Model
Once the model is compiled, we fit it on the training data with the test set as validation data and start training. We have set the number of epochs to 30. The training code is shown below. The process iterates for 30 epochs, and four quantities are reported each epoch: training loss, training accuracy, validation loss and validation accuracy.
history = m1.fit(X_train, y_train, epochs=30, validation_data=(X_test, y_test), verbose=1, initial_epoch=0)
Obtaining the Accuracy
Once training is done, we can evaluate the model and compute the loss and accuracy using the code below.
loss_and_metrics = m1.evaluate(X_test, y_test)
print(loss_and_metrics)
Predictions on Finger Counting
We can now use the model to check the predictions against the actual labels for a few randomly chosen images. We generate a random index, predict the corresponding image with the model, and compare the result with the image’s actual label. For both of the randomly selected images the model predicted correctly.
predicted_classes = m1.predict(X_train)
predicted_classes = np.argmax(np.round(predicted_classes), axis=1)

r = np.random.randint(X_train.shape[0])   # e.g. 15614
print("Prediction:", predicted_classes[r])
print("Actual:    ", np.argmax(y_train[r]))   # y_train is one-hot, so take argmax

r = np.random.randint(X_train.shape[0])   # e.g. 14487
print("Prediction:", predicted_classes[r])
print("Actual:    ", np.argmax(y_train[r]))
You can always tune hyperparameters such as the number of epochs and the optimizer to improve model performance. For better accuracy, you can also use pre-trained architectures like ResNet and VGG16 that are trained on the ImageNet dataset.
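As a hedged sketch of what swapping in a pre-trained backbone could look like, the snippet below builds a VGG16-based model for the same 12-class task. The dense layer sizes and the choice to freeze the whole backbone are illustrative assumptions, not tuned values.

```python
# Sketch: transfer learning with a pre-trained VGG16 backbone.
# Dense layer sizes and the frozen backbone are illustrative choices.
from keras.applications import VGG16
from keras.models import Sequential
from keras.layers import Flatten, Dense

def build_transfer_model(weights='imagenet'):
    base = VGG16(weights=weights, include_top=False,
                 input_shape=(128, 128, 3))
    base.trainable = False          # keep the pre-trained filters fixed
    model = Sequential([
        base,
        Flatten(),
        Dense(128, activation='relu'),
        Dense(12, activation='softmax'),
    ])
    model.compile(optimizer='adam',
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model

# Usage: m2 = build_transfer_model(); m2.fit(X_train, y_train, ...) as before.
```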
Read one such article to learn more about implementing these architectures: “Hands-on Guide To Implement ResNet50 in PyTorch with TPU”. We can also use such finger count classification models to count fingers in real time using a webcam and OpenCV.
To conclude, these models can be applied in many different settings, such as operating office coffee machines with finger counts or controlling home-automation appliances, helping us stay contactless in this COVID time. I hope you have understood how to build this kind of finger counting model using a CNN. Now you can explore the different use cases where such models can be applied.