My First Kaggle problem with CNN Model – To Count Fingers And Distinguish Between Left And Right Hand?

Rohit Dwivedi

Convolutional Neural Networks (CNNs) learn the important features of an image automatically, without hand-crafted feature engineering. For image data they usually achieve higher accuracy than plain fully connected networks, and they are considered one of the strongest architectures for image classification. On some benchmark classification tasks, CNN models have matched or even surpassed human accuracy.

This article works through a Kaggle problem: predicting how many fingers are shown in an image and whether the hand is a left or a right hand. We will build a CNN model that classifies the finger count and the hand together. We will import the data directly from Kaggle and use Google Colab for the implementation, so that we can benefit from the GPUs and TPUs that Colab provides. You can also use a Jupyter notebook or any other IDE to build the neural network.

What is in this article?

  • Downloading dataset from Kaggle
  • CNN Model for Finger Count Classification
  • Training the CNN model in this task
  • Obtaining the accuracy

The Dataset

The dataset that we will download from Kaggle contains 21,600 images of left and right hands. All the images are 128 by 128 pixels. The training set has 18,000 images and the testing set has 3,600 images. The label is stored in the last 2 characters of each file name, just before the extension: L/R indicates the left or right hand, and 0, 1, 2, 3, 4 or 5 indicates the number of fingers.
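
The labelling rule above can be sketched as a small helper. This is only an illustration: the function name `parse_label` and the file name `example_3L.png` are hypothetical, and the sketch assumes the digit comes before the L/R letter, which matches the two-character slice used later in this article.

```python
import os

def parse_label(filename):
    """Split a file name such as 'example_3L.png' into (finger_count, hand)."""
    stem, _ext = os.path.splitext(os.path.basename(filename))
    code = stem[-2:]                      # last two characters, e.g. '3L'
    count, hand = int(code[0]), code[1]   # assumption: digit first, then L/R
    if hand not in ("L", "R") or not 0 <= count <= 5:
        raise ValueError(f"unexpected label {code!r} in {filename!r}")
    return count, hand

# Hypothetical file name, used only to show the call
print(parse_label("example_3L.png"))
```

Validating the label while parsing makes it easier to spot stray files before they end up in the training lists.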



Implementing CNN For Finger Count Classification With GPU

First, we need to enable the GPU. In Google Colab, open the ‘Runtime’ menu, click on ‘Change runtime type’ and select GPU as the hardware accelerator.

Once the runtime is set to GPU, hover over the area where RAM and disk usage is shown to check whether the GPU is enabled. If the GPU is working you will see ‘Connected to Python 3 Google Compute Engine backend (GPU)’.
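
The same check can also be done from code. The following is a minimal sketch, assuming TensorFlow 2.x; the helper name `visible_gpus` is hypothetical, and the function simply returns an empty list when TensorFlow is not installed or no GPU is attached.

```python
def visible_gpus():
    """Return the names of the GPUs TensorFlow can see (empty list if none)."""
    try:
        import tensorflow as tf
    except ImportError:
        return []  # TensorFlow is not installed in this environment
    return [device.name for device in tf.config.list_physical_devices("GPU")]

gpus = visible_gpus()
print("GPUs visible to TensorFlow:", gpus if gpus else "none")
```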

Downloading Dataset from Kaggle

Once the GPU is enabled, we install the dependencies. Because we will import the data directly from Kaggle, we need the Kaggle package, which is installed as shown below.

!pip install kaggle

Once the package is installed, we import all the libraries that are required. Use the code below.

import tensorflow as tf
from zipfile import ZipFile
import os, glob
import cv2  # OpenCV, used below to read and resize the images
from skimage.io import imread
from skimage.transform import resize
import matplotlib.pyplot as plt
import random
import warnings
from numpy import ndarray  # recent SciPy versions no longer re-export ndarray
import skimage as sk
from skimage import transform
from skimage import util
from skimage import io
from sklearn import metrics
from sklearn import preprocessing  # provides LabelEncoder, used below
from tqdm.notebook import tqdm  # replaces the deprecated tqdm._tqdm_notebook
import numpy as np
from keras.models import Sequential
from keras.layers import Convolution2D, Dropout, Dense
from keras.layers import BatchNormalization
from keras.layers import MaxPooling2D
from keras.layers import Flatten
from keras.optimizers import Adam  # optimizer classes are capitalized
from keras.optimizers import SGD
from keras.layers import LeakyReLU
from numpy import asarray

After importing the libraries, let's import the data from Kaggle. To do so, we first need a kaggle.json file, which you get by creating a new API token on Kaggle. Go to ‘My Account’ on Kaggle and scroll down to the option for creating a new API token. Clicking it downloads a file named ‘kaggle.json’. Upload that file to Colab, copy it to ~/.kaggle and restrict its permissions using the code shown below.

from google.colab import files
files.upload()

!mkdir -p ~/.kaggle
!cp kaggle.json ~/.kaggle/

!chmod 600 ~/.kaggle/kaggle.json 
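
To confirm that the permission change took effect, a check along the following lines can help. This is a sketch only: the helper name `token_is_private` is hypothetical, and it assumes a POSIX file system such as the one Colab runs on.

```python
import os
import stat

def token_is_private(path):
    """True if the file exists and grants no permissions to group or others."""
    if not os.path.isfile(path):
        return False
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0

token_path = os.path.expanduser("~/.kaggle/kaggle.json")
print("kaggle.json is private:", token_is_private(token_path))
```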

With the token in place, we download the dataset directly into Google Colab using the Kaggle API command below.

!kaggle datasets download -d koryakinp/fingers

Running the above command downloads the data as a zip file. We now need to unzip the file using the code below.

from zipfile import ZipFile
file_name = "fingers.zip"

# Extract the archive into the current working directory
with ZipFile(file_name, 'r') as archive:
    archive.extractall()
    print('Done')
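
After extraction it is worth confirming that the image counts match the figures quoted for the dataset. The sketch below assumes the archive unpacks into `train` and `test` folders under the current directory and that the images are PNG files; the helper name `count_pngs` is hypothetical.

```python
import glob
import os

def count_pngs(folder):
    """Count the .png files directly inside `folder`."""
    return len(glob.glob(os.path.join(folder, "*.png")))

# Expected counts are the ones quoted in the dataset description above
for split, expected in (("train", 18000), ("test", 3600)):
    found = count_pngs(split)
    status = "ok" if found == expected else "unexpected"
    print(f"{split}: {found} images ({status}, expected {expected})")
```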

After unzipping the data file, we create two lists to store the training images and their corresponding labels. We then read every training image and its label into these lists using the code below.

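import cv2  # OpenCV reads and resizes the images

X_train = []
y_train = []
os.chdir('/content/train')
for i in tqdm(os.listdir()):
    img = cv2.imread(i)                # read the image as a 3-channel array
    img = cv2.resize(img, (128, 128))
    X_train.append(img)
    y_train.append(i[-6:-4])           # two characters before the extension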

Let’s visualize the first 10 training images and print their labels using the code below.

%matplotlib inline
import matplotlib.pyplot as plt
plt.figure(figsize=(10, 1))
for i in range(10):
    plt.subplot(1, 10, i+1)
    plt.imshow(X_train[i], cmap="gray")
    plt.axis('off')
plt.show()
print('label for each of the above image: %s' % (y_train[0:10]))




We create two more lists to store the testing images and their corresponding labels, and read the images and labels into them in the same way.

X_test = []
y_test = []
os.chdir('/content/test')
for i in tqdm(os.listdir()):
    img = cv2.imread(i)
    img = cv2.resize(img, (128, 128))
    X_test.append(img)
    y_test.append(i[-6:-4])            # same two-character label as in training

We then check the shape of the training and testing images, which comes out to be 128 * 128 * 3, and the number of distinct labels in y_train and y_test, which is 12 in each.

After this, we encode the labels as integers using LabelEncoder, convert them to one-hot categorical vectors with 12 classes, and turn the images and labels into arrays.

print ("Shape of an image in X_train: ", X_train[0].shape)
print ("Shape of an image in X_test: ", X_test[0].shape)







print("Total categories: ", len(np.unique(y_train)))
print("Total categories: ", len(np.unique(y_test)))





from sklearn import preprocessing

le = preprocessing.LabelEncoder()
y_train = le.fit_transform(y_train)
y_test = le.transform(y_test)  # reuse the mapping learned from the training labels

y_train = tf.keras.utils.to_categorical(y_train, num_classes=12)
y_test = tf.keras.utils.to_categorical(y_test, num_classes=12)

y_train = np.array(y_train)
X_train = np.array(X_train)

y_test = np.array(y_test)
X_test = np.array(X_test)
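
To make the encoding step concrete, here is a NumPy-only sketch of what label encoding followed by one-hot conversion produces. It is an illustration rather than the code used in this article: the sample labels are made up, and it assumes labels of the form digit plus L/R, numbered in sorted order as LabelEncoder does.

```python
import numpy as np

labels = ["3L", "0R", "5R", "3L"]   # illustrative labels, not real data

# All 12 class names in sorted order: '0L', '0R', '1L', ..., '5R'
classes = np.array(sorted(f"{n}{hand}" for n in range(6) for hand in "LR"))

# Integer code of each label is its position in the sorted class list
codes = np.searchsorted(classes, labels)

# One-hot rows: a 1 in the column given by the code, 0 elsewhere
one_hot = np.eye(len(classes), dtype="float32")[codes]

print(codes)
print(one_hot.shape)
```

Each row of `one_hot` has exactly one non-zero entry, which is the format that categorical cross-entropy expects.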

Finally, we check the shape of the training and testing arrays and their labels using the code below.


print("X_train Shape: ", X_train.shape)
print("X_test Shape: ", X_test.shape)
print("y_train Shape: ", y_train.shape)
print("y_test Shape: ", y_test.shape)






CNN Model for Finger Count Classification

We then initialise a sequential model. It starts with a batch normalization layer, followed by 4 convolution layers with ReLU activation, each followed by a max-pooling layer, then a flatten layer and fully connected layers. The last fully connected layer has 12 output units, one per class, with softmax activation.

m1 = Sequential()
# Normalize the raw pixel values before the first convolution
m1.add(BatchNormalization(input_shape=(128, 128, 3)))
# Four convolution blocks, each followed by 2x2 max pooling
m1.add(Convolution2D(32, (3, 3), activation='relu', input_shape=(128, 128, 3)))
m1.add(MaxPooling2D(pool_size=2))
m1.add(Convolution2D(filters=6, kernel_size=4, padding='same', activation='relu'))
m1.add(MaxPooling2D(pool_size=2))
m1.add(Convolution2D(filters=128, kernel_size=3, padding='same', activation='relu'))
m1.add(MaxPooling2D(pool_size=2))
m1.add(Convolution2D(filters=128, kernel_size=2, padding='same', activation='relu'))
m1.add(MaxPooling2D(pool_size=2))
# Flatten the feature maps and classify with fully connected layers
m1.add(Flatten())
m1.add(Dense(units=128, activation='relu'))
m1.add(Dense(units=64, activation='relu'))
m1.add(Dense(units=32, activation='relu'))
# One output unit per class (6 finger counts x 2 hands)
m1.add(Dense(units=12, activation='softmax'))
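
The spatial size of the feature maps can be traced by hand. The sketch below assumes Keras defaults (stride 1 and 'valid' padding for the first convolution, pooling stride equal to the pool size); the helper names `conv_out` and `pool_out` are hypothetical.

```python
def conv_out(size, kernel, padding):
    """Output width of a stride-1 convolution."""
    return size if padding == "same" else size - kernel + 1

def pool_out(size, pool=2):
    """Output width of max pooling whose stride equals the pool size."""
    return size // pool

size = 128
# (kernel size, padding) of the four convolution layers in the model above
layers = [(3, "valid"), (4, "same"), (3, "same"), (2, "same")]
for kernel, padding in layers:
    size = pool_out(conv_out(size, kernel, padding))
    print("after conv + pool:", size)

flat_features = size * size * 128   # the last convolution has 128 filters
print("flattened features:", flat_features)
```

Under these assumptions the maps shrink from 128 to 63, 31, 15 and finally 7 pixels per side, so the flatten layer passes 7 * 7 * 128 = 6272 values to the first dense layer.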

After this, we compile the model with Adam as the optimizer, categorical cross-entropy as the loss and accuracy as the metric, as shown below.

m1.compile(optimizer='adam', loss = 'categorical_crossentropy',metrics = ['accuracy'])
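
For intuition about the loss and metric chosen here, the following NumPy sketch computes categorical cross-entropy and accuracy on a tiny made-up batch. It is a simplified illustration, not the Keras implementation, and the numbers are not results from this model.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean of -sum(y_true * log(y_pred)) over the samples in the batch."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)   # avoid log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))

# Two samples, three classes: one-hot targets and predicted probabilities
y_true = np.array([[0, 1, 0], [1, 0, 0]], dtype=float)
y_pred = np.array([[0.1, 0.8, 0.1], [0.5, 0.3, 0.2]])

loss = categorical_crossentropy(y_true, y_pred)
accuracy = float(np.mean(y_true.argmax(axis=1) == y_pred.argmax(axis=1)))
print(loss, accuracy)
```

Only the probability assigned to the true class contributes to the loss, so a confident correct prediction (0.8) is penalised less than a hesitant one (0.5).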

Training the CNN Model

Once the model is compiled, we fit it on the training data, pass the test set as validation data, and start the training process. We set the number of epochs to 30. In each epoch, four values are computed: training loss, training accuracy, validation loss and validation accuracy.

# fit() returns a History object with the per-epoch metrics
history = m1.fit(X_train, y_train,
                 epochs=30,
                 validation_data=(X_test, y_test),
                 verbose=1,
                 initial_epoch=0)
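
The object returned by `m1.fit` is a Keras History whose `.history` attribute is a dictionary of per-epoch lists. A small helper like the one below can pick out the best epoch. It is a sketch: the name `best_epoch` is hypothetical, the key `val_accuracy` is what recent Keras versions use (older ones use `val_acc`), and the numbers are made up for illustration.

```python
def best_epoch(history_dict, metric="val_accuracy"):
    """Return (1-based epoch number, value) where `metric` was highest."""
    values = history_dict[metric]
    index = max(range(len(values)), key=values.__getitem__)
    return index + 1, values[index]

# Made-up numbers for illustration only; they are not results from this model
example_history = {
    "accuracy": [0.60, 0.85, 0.95],
    "val_accuracy": [0.70, 0.92, 0.90],
}
print(best_epoch(example_history))
```

A validation accuracy that peaks before the final epoch, as in this made-up example, is a common sign that training longer is no longer helping.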








Obtaining the Accuracy

Once training is done, we can evaluate the model and compute the loss and accuracy on the test set using the code below.

loss_and_metrics = m1.evaluate(X_test,y_test)
print(loss_and_metrics)
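
A single accuracy figure hides which classes get confused with each other. As a complement, the sketch below builds a confusion matrix and per-class recall from class indices using NumPy. The helper and the tiny arrays are illustrative assumptions, not output from this model.

```python
import numpy as np

def confusion_matrix(actual, predicted, num_classes):
    """Rows are actual classes, columns are predicted classes."""
    matrix = np.zeros((num_classes, num_classes), dtype=int)
    np.add.at(matrix, (actual, predicted), 1)   # count each (actual, predicted) pair
    return matrix

# Illustrative class indices for six samples and three classes
actual = np.array([0, 0, 1, 2, 2, 2])
predicted = np.array([0, 1, 1, 2, 2, 0])

cm = confusion_matrix(actual, predicted, num_classes=3)
per_class_recall = cm.diagonal() / cm.sum(axis=1)
print(cm)
print(per_class_recall)
```

For this task the class indices would come from applying `np.argmax(..., axis=1)` to the model predictions and to the one-hot test labels, with `num_classes=12`. The recall line assumes every class appears at least once among the actual labels.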





Predictions on Finger Counting

We can now use this model to compare the prediction and the actual label for a few randomly chosen images from the training set. We generate a random number to pick an image, look up the model's prediction for that image, and compare it with the actual label. For both randomly selected images, the model predicted correctly.

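# Predict class probabilities for every training image
predicted_classes = m1.predict(X_train)
# Take the index of the most probable class; rounding first is not needed
predicted_classes = np.argmax(predicted_classes, axis=1)
predicted_classes[0]


import numpy as np
# Pick a random training image
k = X_train.shape[0]
r = np.random.randint(k)
r
# Output of the original run: 15614


print("Prediction:", predicted_classes[r])
# y_train is one-hot encoded, so argmax recovers the class index
print("\nActuals:   ", np.argmax(y_train[r]))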





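# Repeat the check with another random training image
k = X_train.shape[0]
r = np.random.randint(k)
r
# Output of the original run: 14487


print("Prediction:", predicted_classes[r])
print("\nActuals:   ", np.argmax(y_train[r]))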
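
Two details of this step are easy to get wrong: turning softmax probabilities into a class index, and mapping that index back to a readable label. The sketch below illustrates both with a made-up probability vector. It assumes the classes are numbered in sorted order ('0L', '0R', ..., '5R'), which is how LabelEncoder assigns indices.

```python
import numpy as np

# Class names in the sorted order that LabelEncoder uses
classes = sorted(f"{n}{hand}" for n in range(6) for hand in "LR")

# Illustrative softmax output for one image: the largest value is below 0.5
probs = np.zeros(12)
probs[[6, 7, 4]] = [0.45, 0.35, 0.20]

plain = int(np.argmax(probs))               # index of the largest probability
rounded = int(np.argmax(np.round(probs)))   # rounding turns every entry into 0

print("argmax:", plain, "->", classes[plain])
print("argmax after round:", rounded, "->", classes[rounded])
```

When no probability exceeds 0.5, rounding collapses the whole vector to zeros and argmax falls back to index 0, so taking argmax of the raw probabilities is the safer choice.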





You can always tune hyperparameters such as the number of epochs and the optimizer to improve model performance. For better accuracy, you can also use pre-trained architectures such as ResNet and VGG16 that are trained on the ImageNet dataset.

To learn more about implementing such architectures, read “Hands-on Guide To Implement ResNet50 in PyTorch with TPU”. Finger count classification models like this one can also be used to count fingers in real time using a webcam and OpenCV.

Conclusion

These models can be used in many applications, such as operating coffee machines in offices with finger counts, or in home automation to operate appliances with finger counts so as to remain contactless during COVID times. I hope you have understood how to construct this type of finger counting model using a CNN. Now you can explore the different use cases for this type of model.


Copyright Analytics India Magazine Pvt Ltd