Last updated January 13, 2021
In AI Mysteries

My first CNN project – Emotion Detection Using Convolutional Neural Network With TPU

Share

Illustration by SSUCv3H4sIAAAAAAAEAJ1X227jNhB9L9B/EPSctHYc23HfEjfObrHpBkhaoChWBSWNJTYUqfKSxFjk3zukZFsilcTJiy2d0QzncjhDfv/xhyiKU6JoFv8Sfbdv+E4ZM0pLoqngCI+OWhxyqoWkhFnQYs9OEitNtFGgrIkWyoiGAr9twFZ/u87fzXu0FTghaqAovgFRM4iPOhJlUiu5IKk1txU8H71i5QKI0ZufV0SVNoYBa9dUZcAY4SBMx2jz8G0bMSmAZ5t9CDY5Sovsvi6FFt1QUNJbZvQ0exrPfxpPA3Tu0C34bSeOcwmkUppW8Jrh8XQxpJzS4jDHloJnUGuDVRwwUwueT19TX8z6r6O9lebhuUMMCQxIQ4zWYHw6WcwWZyfT42kOZ8enMBsdp4sZOR6no8XsNJvMx+vUmWyciu8fNciqy6KCStb1ML4D4Fio6AoFKllBRRguuY/pJRwJtfGUvPKvDe+ttDI8ueT/ik0FXPsfP5ZU9yoXL4nJkPKER5e65DSjepPsnnz1TCAbC89AgyWfKxvfHWSo+5/p7I74WnBtv1iKqhYKHYjct0EgJOtb/mQqdGuFcNI8fgKSBy6VAHJt+tletmByY9ejDxBdVsJ1Ck+7FhJ7CNU97ZsWTAbdhKdaglJN2+mknWTYdKLLnTRZCVlFX9cRhl0ZzCcZcqAkdU05avQj36JJJ423FWWUF0mwUpAS49cYgeSm3GBnQ8VfQWWS1kPeQJulrnKbueScsehPkZHUMCIDZqS2mVG/DhdbNGmaXZfYDkgCEjiS9zPb8L7lwBXwHKSvthGGFx4FKMtbnfOwimnjT+Bs6FG23SAf3zYKK9eP6fBa2jBKIfIwOIuG3lrJexKBC+uhRHSr5qkwqnUQD2HsIIpZwm/eJvvhO7iZxb396yBH2HZSezqC9xW+csAvpUKW/26qFGQk1i+o3tP3ZddZ/ehi2MR7qr+JkJ5MiHvqcf9LgyV/KPyNboGrcGwwYooyUGzBt3mpbME/SgFTCel3eiGD0EpCg8/sFEA4uSshal4vRB4wlGhs4JmlzbuIndP1GqQdm12tzzynDzTHw4jd5F2O4lyTRGkVeF63pPXGSp/JyQuMxl6my5e2+1Ef7DBvL2nPGnJQ+Je1Hi0N00ZC2D5wmEMWzIDlDk7OpSS8gKHDhRa8MEMj/M4JWneuXXg+nR4B+lm/tUi0skHbn27gh06zR+FGRT8QrqGqmZvFQfQDlDlvor4erNS6Pbr3zwEOC2wXklb+AefKYYfNAcLzjTeBEEn2aQq79BqU3ni98csWDUlboAA50ffQYc7Dw04zPB/csTx/Y8cqv/ygkz3rXjinRBmR/fX+CeY88xh5gYC1LOSuJdq/9jKA+xxE54pITE5F51z/gAcgZs9qk45SLWnWdNKtmtClI97upolXVVF1AC60u3G095Y4J+7IFo8n49HJfH56Nlos5ieL0WKCTj7/D0vUGckCDwAA

Published on July 24, 2020

by Rohit Dwivedi

Computer vision (CV) is the field of study that helps computers to study using different techniques and methods so that it can capture what exists in an image or a video. There are a large number of applications of computer vision that are present today like facial recognition, driverless cars, medical diagnostics, etc. We will discuss one of the interesting applications of CV that is Emotion Detection through facial expressions. CV can recognize and tell you what your emotion is by just looking at your facial expressions. It can detect whether you are angry, happy, sad, etc.

The article demonstrates a computer vision model that we will build using Keras and VGG16 – a variant of Convolutional Neural Network. We will use this model to check the emotions in real-time using OpenCV and webcam. We will be working with Google Colab to build the model as it gives us the GPU and TPU. You can use any other IDE as well.

What is there in this article?

Downloading the data
Building the VGG Model for emotion detection
Training of the VGG model efficiently so that it can recognize the emotion
Testing of the model in real-time using webcam

The Dataset

The name of the data set is fer2013 which is an open-source data set that was made publicly available for a Kaggle competition. It contains 48 X 48-pixel grayscale images of the face. There are seven categories (0=Angry, 1=Disgust, 2=Fear, 3=Happy, 4=Sad, 5=Surprise, 6=Neutral) present in the data. The CSV file contains two columns that are emotion that contains numeric code from 0-6 and a pixel column that includes a string surrounded in quotes for each image.

Implementing VGG16 Network for Classification of Emotions with GPU

First, we need to enable GPU in the Google Colab to get fast processing. We can enable it by going to ‘Runtime’ in Google Colab and then clicking on ‘Change runtime type’ and select GPU. Once it is enabled we will now import the required libraries for building the network. The code for importing the libraries is given below.

import pandas as pd
import numpy as np
from keras.models import Sequential
from keras.layers.core import Flatten, Dense, Dropout
from keras.layers.convolutional import Convolution2D, MaxPooling2D, ZeroPadding2D
from keras.optimizers import SGD
import cv2

We have now imported all the libraries and now we will import the data set. I have already saved it in my drive so I will read it from there. You can give directory in round brackets where your data is stored as shown in the code below. After importing we have printed the data frame as shown in the image.

emotion_data = pd.read_csv('/content/drive/My Drive/Emotion_Detection /fer2013.csv')
print('emotion_data')

View of the data frame

We then create different lists of storing the testing and training image pixels. After this, we check if the pixel belongs to training then we append it into the training list & training labels. Similarly, for pixels belonging to the Public test, we append it to testing lists. The code for the same is shown below.

X_train = []
y_train = []
X_test = []
y_test = []
for index, row in emotion_data.iterrows():
    k = row['pixels'].split(" ")
    if row['Usage'] == 'Training':
        X_train.append(np.array(k)
        y_train.append(row['emotion'])
    elif row['Usage'] == 'PublicTest':
        X_test.append(np.array(k))
        y_test.append(row['emotion'])

Once we have added the pixel to the lists then we convert them into NumPy arrays and reshape X_train, X_test. After doing this we convert the training labels and testing labels into categorical ones. The code of the same is given below.

X_train = np.array(X_train')
y_train = np.array(y_train)
X_test = np.array(X_test)
y_test = np.array(y_test)

X_train = X_train.reshape(X_train.shape[0], 48, 48, 1)
X_test = X_test.reshape(X_test.shape[0], 48, 48, 1)

y_train= np_utils.to_categorical(y_train, num_classes=7)
y_test = np_utils.to_categorical(y_test, num_classes=7)

VGG16 Model for Emotion Detection

Now it’s time to design the CNN model for emotion detection with different layers. We start with the initialization of the model followed by batch normalization layer and then different convents layers with ReLu as an activation function, max pool layers, and dropouts to do learning efficiently. You can also change the architecture by initiating the layers of your choices with different numbers of neurons and activation functions.

model = Sequential()
model.add(ZeroPadding2D((1,1),input_shape=(48,48,1)))
model.add(Convolution2D(64, 3, 3, activation='relu'))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(64, 3, 3, activation='relu'))
model.add(MaxPooling2D((2,2), strides=(2,2)))

model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(128, 3, 3, activation='relu'))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(128, 3, 3, activation='relu'))
model.add(MaxPooling2D((2,2), strides=(2,2)))

model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(256, 3, 3, activation='relu'))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(256, 3, 3, activation='relu'))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(256, 3, 3, activation='relu'))
model.add(MaxPooling2D((2,2), strides=(2,2)))

model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(512, 3, 3, activation='relu'))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(512, 3, 3, activation='relu'))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(512, 3, 3, activation='relu'))
model.add(MaxPooling2D((2,2), strides=(2,2)))


model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(512, 3, 3, activation='relu'))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(512, 3, 3, activation='relu'))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(512, 3, 3, activation='relu'))
model.add(MaxPooling2D((2,2), strides=(2,2)))

model.add(Flatten())
model.add(Dense(4096, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(4096, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(7, activation='softmax'))

After this, we compile the model using Adam as an optimizer, loss as categorical cross-entropy, and metrics as accuracy as shown in the below code.

model.compile(optimizer='sgd',loss='categorical_crossentropy',metrics=['accuracy'])

After compiling the model we then fit the data for training and validation. Here, we are taking the batch size to be 32 with 30 epochs. You can tune them according to your wish.

model.fit(X_train,train_y,batch_size=32,epochs=30,verbose=1,validation_data=(X_test, test_y))

Training of the Network

Once the training has been done we can evaluate the model and compute loss and accuracy using the below code.

loss_and_metrics = model.evaluate(X_test,y_test)
print(loss_and_metrics)

We now serialize the model to JSON and save the model weights in an hd5 file so that we can make use of this file to make predictions rather than training the network again. You can do this task by using the below code.

model_json = model.to_json()
with open("model.json", "w") as json_file:
json_file.write(model_json)
model.save_weights("model.h5")
print("Saved model to disk")

Testing the model in Real-time using OpenCV and WebCam

Now we will test the model that we build for emotion detection in real-time using OpenCV and webcam. To do so we will write a python script. We will use the Jupyter notebook in our local system to make use of a webcam. You can use other IDEs as well. First, we will install a few libraries that are required. Use the below code to import those all.

import os
import cv2
import numpy as np
from keras.models import model_from_json
from keras.preprocessing import image

After importing all the required libraries we will load the model weights that we saved earlier after training. Use the below code to load your saved model. After importing the model weights we have imported a haar cascade file that is designed by open cv to detect the frontal face.

model = model_from_json(open("model.json", "r").read())
model.load_weights('model.h5')
face_haar_cascade = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')

After importing the haar cascade file we will have written a code to detect faces and classify the desired emotions. We have assigned the labels that will be different emotions like angry, happy, sad, surprise, neutral. As soon as you run the code a new window will pop up and your webcam will turn on. It will then detect the face of the person, draw a bounding box over the detected person, and then convert the RGB image into grayscale & classify it in real-time. Please refer to the below code for the same and sample outputs that are shown in the images. To stop the code you need to press ‘b’.

cap=cv2.VideoCapture(0)

while True:
    image=cap.read()

    coverted_image= cv2.cvtColor(test_image, cv2.COLOR_BGR2GRAY)


    faces_detected = face_haar_cascade.detectMultiScale(converted_image)
    for (x,y,w,h) in faces_detected:
    cv2.rectangle(test_image,(x,y), (x+w,y+h), (255,0,0))
    roi_gray=gray_image[y:y+w,x:x+h]
    roi_gray=cv2.resize(roi_gray,(48,48))
    image_pixels = image.image_to_array(roi_gray)
    image_pixels = np.expand_dims(image_pixels, axis = 0)
    image_pixels /= 255


    predictions = model.predict(image_pixels)
    max_index = np.argmax(predictions[0])

    emotion_detection = ('angry', 'disgust', 'fear', 'happy', 'sad', 'surprise', 'neutral')
    emotion_prediction = emotion_detection[max_index]


    cv2.putText(test_image, emotion_prediction, (int(x), int(y))
     cv2.putText(frame,label,label_position,cv2.FONT_HERSHEY_SIMPLEX,2,(0,255,0),3)  


    resize_image = cv2.resize(image, (1000, 700))
    cv2.imshow('Emotion',resized_image)
    if cv2.waitKey(10) == ord('b')
        break


cap.release()
cv2.destroyAllWindows

Conclusion

I conclude the article by hoping that you got a fair idea and understood the whole pipeline on how you can make an emotion detection model using pre-trained architecture like VGG16. We initially defined network and trained it so that it is capable of classifying the correct emotion and then we made use of that trained model to classify the emotions in real-time. The model does not give good accuracy and takes much time to train. You can make use of other pre-trained architecture like ResNet and MobileNet to train your network for better performance of the model.

With rapid advancements in Artificial intelligence, the use cases and applications you can build using AI are increasing rapidly. You can always think of what use cases you can build after reading an article like this using the power of CNN. One more such interesting article you can check on “My First Kaggle problem with CNN Model – To Count Fingers And Distinguish Between Left And Right Hand”.

Access all our open Survey & Awards Nomination forms in one place

Rohit Dwivedi

I am currently enrolled in a Post Graduate Program In Artificial Intelligence and Machine learning. Data Science Enthusiast who likes to draw insights from the data. Always amazed with the intelligence of AI. It's really fascinating teaching a machine to see and understand images. Also, the interest gets doubled when the machine can tell you what it just saw. This is where I say I am highly interested in Computer Vision and Natural Language Processing. I love exploring different use cases that can be build with the power of AI. I am the person who first develops something and then explains it to the whole community with my writings.