The computer vision is being applied in a variety of applications across the domains and thanks to the deep learning that is continuously giving new frameworks to be used in the computer vision space. As of now, there may be more than hundreds of deep learning models that have proven their capabilities in handling millions of images and producing accurate results. Every deep learning model has a specific architecture and is trained in that specific way. Convolutional neural networks are one of the popular deep learning models that have a wide range of applications in the field of computer vision.
There is a variety of Convolutional Neural Network (CNN) architectures. AlexNet is one of the variants of CNN which is also referred to as a Deep Convolutional Neural Network. In this article, we will discuss the architecture and implementation of AlexNet using Keras library without using transfer learning approach. In the end, we will evaluate the performance of this model in classification.
AlexNet is a deep learning model and it is a variant of the convolutional neural network. This model was proposed by Alex Krizhevsky as his research work. His work was supervised by Geoffery E. Hinton, a well-known name in the field of deep learning research. Alex Krizhevsky competed in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC2012) in the year 2012 where he used the AlexNet model and achieved a top-5 error of 15.3%, more than 10.8 percentage points lower than that of the runner up.
Architecture of AlexNet
The AlexNet proposed by Alex Krizhevsky in his work has eight layers including five convolutional layers followed by three fully connected layers. Some of the convolutional layers of the model are followed by max-pooling layers. As an activation function, the ReLU function is used by the network which shows improved performance over sigmoid and tanh functions.
(Source: Original research paper)
The network consists of a kernel or filters with size 11 x 11, 5 x 5, 3 x 3, 3 x 3 and 3 x 3 for its five convolutional layers respectively. The rest of the parameters of the network can be tuned depending on the training performances.
The AlexNet employing the transfer learning which uses weights of the pre-trained network on ImageNet dataset has shown exceptional performance. But in this article, we will not use the pre-trained weights and simply define the CNN according to the proposed architecture.
Implementing in Keras
Here, we will implement the Alexnet in Keras as per the model description given in the research work, Please note that we will not use it a pre-trained model.
This code was implemented in Google Colab and the .py file was downloaded.
# -*- coding: utf-8 -*- """AlexNet.ipynb Automatically generated by Colaboratory. Original file is located at https://colab.research.google.com/drive/14eAKHD0zCHJpw5uxxxxxxxxxxxxx """
In the first step, we will define the AlexNet network using Keras library. The parameters of the network will be kept according to the above descriptions, that is 5 convolutional layers with kernel size 11 x 11, 5 x 5, 3 x 3, 3 x 3 respectively, 3 fully connected layers, ReLU as an activation function at all layers except at the output layer. Since we will test this model in CIFAR10 classification, at the output layer we will define a Dense layer with 10 nodes.
#Importing library import keras from keras.models import Sequential from keras.layers import Dense, Activation, Dropout, Flatten, Conv2D, MaxPooling2D from keras.layers.normalization import BatchNormalization import numpy as np np.random.seed(1000) #Instantiation AlexNet = Sequential() #1st Convolutional Layer AlexNet.add(Conv2D(filters=96, input_shape=(32,32,3), kernel_size=(11,11), strides=(4,4), padding='same')) AlexNet.add(BatchNormalization()) AlexNet.add(Activation('relu')) AlexNet.add(MaxPooling2D(pool_size=(2,2), strides=(2,2), padding='same')) #2nd Convolutional Layer AlexNet.add(Conv2D(filters=256, kernel_size=(5, 5), strides=(1,1), padding='same')) AlexNet.add(BatchNormalization()) AlexNet.add(Activation('relu')) AlexNet.add(MaxPooling2D(pool_size=(2,2), strides=(2,2), padding='same')) #3rd Convolutional Layer AlexNet.add(Conv2D(filters=384, kernel_size=(3,3), strides=(1,1), padding='same')) AlexNet.add(BatchNormalization()) AlexNet.add(Activation('relu')) #4th Convolutional Layer AlexNet.add(Conv2D(filters=384, kernel_size=(3,3), strides=(1,1), padding='same')) AlexNet.add(BatchNormalization()) AlexNet.add(Activation('relu')) #5th Convolutional Layer AlexNet.add(Conv2D(filters=256, kernel_size=(3,3), strides=(1,1), padding='same')) AlexNet.add(BatchNormalization()) AlexNet.add(Activation('relu')) AlexNet.add(MaxPooling2D(pool_size=(2,2), strides=(2,2), padding='same')) #Passing it to a Fully Connected layer AlexNet.add(Flatten()) # 1st Fully Connected Layer AlexNet.add(Dense(4096, input_shape=(32,32,3,))) AlexNet.add(BatchNormalization()) AlexNet.add(Activation('relu')) # Add Dropout to prevent overfitting AlexNet.add(Dropout(0.4)) #2nd Fully Connected Layer AlexNet.add(Dense(4096)) AlexNet.add(BatchNormalization()) AlexNet.add(Activation('relu')) #Add Dropout AlexNet.add(Dropout(0.4)) #3rd Fully Connected Layer AlexNet.add(Dense(1000)) AlexNet.add(BatchNormalization()) AlexNet.add(Activation('relu')) #Add Dropout AlexNet.add(Dropout(0.4)) #Output Layer AlexNet.add(Dense(10)) AlexNet.add(BatchNormalization()) AlexNet.add(Activation('softmax')) #Model Summary AlexNet.summary()
Once the model is defined, we will compile this model and use Adam as an optimizer. We could use stochastic gradient descent (sgd) as well.
# Compiling the model AlexNet.compile(loss = keras.losses.categorical_crossentropy, optimizer= 'adam', metrics=['accuracy'])
Now, as we are ready with our model, we will check its performance in classification. For the same, we will use the CIFAR10 dataset that is a popular benchmark in image classification. The CIFAR-10 dataset is a publically available image data set provided by the Canadian Institute for Advanced Research (CIFAR). It consists of 60000 32×32 colour images in 10 classes, with 6000 images per class. The 10 different classes represent airplanes, cars, birds, cats, deer, dogs, frogs, horses, ships, and trucks. There are 50000 training images and 10000 test images in this dataset.
For more information on the CIFAR10 dataset and its preprocessing for a convolutional neural network, please read my article ‘Transfer Learning for Multi-Class Image Classification Using Deep Convolutional Neural Network’.
#Keras library for CIFAR dataset from keras.datasets import cifar10 (x_train, y_train),(x_test, y_test)=cifar10.load_data() #Train-validation-test split from sklearn.model_selection import train_test_split x_train,x_val,y_train,y_val=train_test_split(x_train,y_train,test_size=.3) #Dimension of the CIFAR10 dataset print((x_train.shape,y_train.shape)) print((x_val.shape,y_val.shape)) print((x_test.shape,y_test.shape)) #Onehot Encoding the labels. from sklearn.utils.multiclass import unique_labels from keras.utils import to_categorical #Since we have 10 classes we should expect the shape of y_train,y_val and y_test to change from 1 to 10 y_train=to_categorical(y_train) y_val=to_categorical(y_val) y_test=to_categorical(y_test) #Verifying the dimension after one hot encoding print((x_train.shape,y_train.shape)) print((x_val.shape,y_val.shape)) print((x_test.shape,y_test.shape)) #Image Data Augmentation from keras.preprocessing.image import ImageDataGenerator train_generator = ImageDataGenerator(rotation_range=2, horizontal_flip=True,zoom_range=.1 ) val_generator = ImageDataGenerator(rotation_range=2, horizontal_flip=True,zoom_range=.1) test_generator = ImageDataGenerator(rotation_range=2, horizontal_flip= True,zoom_range=.1) #Fitting the augmentation defined above to the data train_generator.fit(x_train) val_generator.fit(x_val) test_generator.fit(x_test)
After preprocessing the CIFAR10 dataset, we are ready now to train our defined AlexNet model. We will use the learning rate annealer in this experiment. The learning rate annealer decreases the learning rate after a certain number of epochs if the error rate does not change. Here, through this technique, we will monitor the validation accuracy and if it seems to be a plateau in 3 epochs, it will reduce the learning rate by 0.01.
#Learning Rate Annealer from keras.callbacks import ReduceLROnPlateau lrr= ReduceLROnPlateau( monitor='val_acc', factor=.01, patience=3, min_lr=1e-5)
To train the model, we will define below the number of epochs, the number of batches and the learning rate.
#Defining the parameters batch_size= 100 epochs=100 learn_rate=.001
Now, we will train our defined AlexNet model.
#Training the model AlexNet.fit_generator(train_generator.flow(x_train, y_train, batch_size=batch_size), epochs = epochs, steps_per_epoch = x_train.shape//batch_size, validation_data = val_generator.flow(x_val, y_val, batch_size=batch_size), validation_steps = 250, callbacks = [lrr], verbose=1) #After successful training, we will visualize its performance. import matplotlib.pyplot as plt #Plotting the training and validation loss f,ax=plt.subplots(2,1) #Creates 2 subplots under 1 column #Assigning the first subplot to graph training loss and validation loss ax.plot(AlexNet.history.history['loss'],color='b',label='Training Loss') ax.plot(AlexNet.history.history['val_loss'],color='r',label='Validation Loss') #Plotting the training accuracy and validation accuracy ax.plot(AlexNet.history.history['accuracy'],color='b',label='Training Accuracy') ax.plot(AlexNet.history.history['val_accuracy'],color='r',label='Validation Accuracy') plt.legend()
We will see the classification performance using a non-normalized and a normalized confusion matrices. For this purpose, first, we will define a function through which the confusion matrices will be plotted.
#Defining function for confusion matrix plot def plot_confusion_matrix(y_true, y_pred, classes, normalize=False, title=None, cmap=plt.cm.Blues): if not title: if normalize: title = 'Normalized confusion matrix' else: title = 'Confusion matrix, without normalization' # Compute confusion matrix cm = confusion_matrix(y_true, y_pred) if normalize: cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis] print("Normalized confusion matrix") else: print('Confusion matrix, without normalization') #Print Confusion matrix fig, ax = plt.subplots(figsize=(7,7)) im = ax.imshow(cm, interpolation='nearest', cmap=cmap) ax.figure.colorbar(im, ax=ax) # We want to show all ticks... ax.set(xticks=np.arange(cm.shape), yticks=np.arange(cm.shape), xticklabels=classes, yticklabels=classes, title=title, ylabel='True label', xlabel='Predicted label') # Rotate the tick labels and set their alignment. plt.setp(ax.get_xticklabels(), rotation=45, ha="right", rotation_mode="anchor") # Loop over data dimensions and create text annotations. fmt = '.2f' if normalize else 'd' thresh = cm.max() / 2. for i in range(cm.shape): for j in range(cm.shape): ax.text(j, i, format(cm[i, j], fmt), ha="center", va="center", color="white" if cm[i, j] > thresh else "black") fig.tight_layout() return ax np.set_printoptions(precision=2)
In the next step, we will predict the class labels for the test images using the trained AlexNet model.
#Making prediction y_pred=AlexNet.predict_classes(x_test) y_true=np.argmax(y_test,axis=1) #Plotting the confusion matrix from sklearn.metrics import confusion_matrix confusion_mtx=confusion_matrix(y_true,y_pred) class_names=['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck'] # Plotting non-normalized confusion matrix plot_confusion_matrix(y_true, y_pred, classes = class_names,title = 'Confusion matrix, without normalization') # Plotting normalized confusion matrix plot_confusion_matrix(y_true, y_pred, classes=class_names, normalize=True, title='Normalized confusion matrix')
The average accuracy score in classifying the unseen test data will be obtained now.
#Classification accuracy from sklearn.metrics import accuracy_score acc_score = accuracy_score(y_true, y_pred) print('Accuracy Score = ', acc_score)
As we can see above, by analyzing the confusion matrices and the accuracy score, the performance of AlexNet is not very good and the average accuracy score is 64.8%. This is because we did not use the transfer learning approach. Our main purpose in this article was to demonstrate the architecture of the AlexNet model and how it can be defined using Keras library. In the next article, we will use the AlexNet model where transfer learning is applied using the pre-trained weights.