There are many convolutional neural network (CNN) architectures, each with its own advantages. These models have shown strong performance in a number of computer vision applications. EfficientNet is one such variant of the convolutional neural network.
In this article, we will discuss the EfficientNet model and its implementation. First, we will discuss its architecture and how it works; then we will implement the model as a transfer learning framework for classifying CIFAR-10 images. Finally, we will evaluate its performance and compare it with other popular transfer learning models.
EfficientNet
The EfficientNet model was proposed by Mingxing Tan and Quoc V. Le of the Google Research Brain team in their research paper ‘EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks’. The paper was presented at the International Conference on Machine Learning, 2019. The researchers studied model scaling and found that carefully balancing the depth, width, and resolution of the network can lead to better performance.
Based on this observation, they proposed a new scaling method that uniformly scales the depth, width, and resolution of the network. They used neural architecture search to design a new baseline network and scaled it up to obtain a family of deep learning models, called EfficientNets, which achieve much better accuracy and efficiency than previous convolutional neural networks.
Scaling
The researchers used the compound scaling method to scale the dimensions of the network. They applied a grid search to find the relationship between the different scaling dimensions of the baseline network under a fixed resource constraint. Using this strategy, they could find an appropriate scaling coefficient for each dimension to be scaled up. Using these coefficients, the baseline network was then scaled to the desired size.
(Image Source: Original Research Paper)
The researchers claimed in their work that this compound scaling method improved the model’s accuracy and efficiency.
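To make the compound scaling rule concrete, here is a minimal sketch. It assumes the coefficients reported in the original paper for the baseline network (α = 1.2 for depth, β = 1.1 for width, γ = 1.15 for resolution, chosen so that α·β²·γ² is roughly 2). The function name `compound_scale` is illustrative and not part of any library.

```python
# Coefficients reported in the EfficientNet paper for the baseline network.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15

def compound_scale(phi):
    """Return (depth, width, resolution) multipliers for compound coefficient phi."""
    return ALPHA ** phi, BETA ** phi, GAMMA ** phi

# FLOPS grow roughly with depth * width^2 * resolution^2,
# so each unit increase of phi roughly doubles the compute budget.
flops_factor = ALPHA * BETA ** 2 * GAMMA ** 2

for phi in range(4):
    depth, width, resolution = compound_scale(phi)
    print(f"phi={phi}: depth x{depth:.2f}, width x{width:.2f}, resolution x{resolution:.2f}")
print(f"FLOPS factor per unit of phi: {flops_factor:.2f}")
```

The released B1 to B7 models use rounded coefficients, so the multipliers computed this way should be read as an approximation of the idea rather than the exact published configurations.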
EfficientNet Architecture
The researchers first designed a baseline network using neural architecture search, a technique for automating the design of neural networks. The search optimizes both accuracy and efficiency, with efficiency measured in floating-point operations (FLOPS). The resulting architecture uses the mobile inverted bottleneck convolution (MBConv). The researchers then scaled up this baseline network to obtain a family of deep learning models, called EfficientNets. The architecture is shown in the diagram below.
(Image Source: Original Research Paper)
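As a rough illustration of why the MBConv block is cheap, the sketch below counts convolution weights for one block: a 1x1 expansion, a depthwise k x k convolution, and a 1x1 projection. It ignores batch normalization, biases, and the squeeze-and-excitation part. The channel sizes and the helper name `mbconv_weight_count` are hypothetical choices for illustration only.

```python
def mbconv_weight_count(c_in, c_out, expand_ratio=6, kernel_size=3):
    """Count convolution weights in a simplified MBConv block."""
    c_exp = c_in * expand_ratio
    expand = c_in * c_exp                          # 1x1 expansion convolution
    depthwise = kernel_size * kernel_size * c_exp  # one k x k filter per channel
    project = c_exp * c_out                        # 1x1 projection convolution
    return {"expand": expand, "depthwise": depthwise, "project": project,
            "total": expand + depthwise + project}

counts = mbconv_weight_count(c_in=40, c_out=40)
print(counts)

# For comparison: a regular 3x3 convolution at the expanded width (240 -> 240 channels).
regular = 3 * 3 * 240 * 240
print("regular 3x3 convolution at expanded width:", regular)
```

In this example the depthwise step contributes only a small share of the weights, which is the main reason the block stays light even though it works at an expanded channel width.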
They also compared EfficientNet’s performance on the ImageNet dataset with that of other powerful transfer learning models. The comparison shows that the largest variant, EfficientNet-B7, achieves the highest accuracy among the compared models while using fewer parameters.
(Image Source: Google AI Blog)
Implementing EfficientNet
In this experiment, we will apply EfficientNet to multi-class image classification on the CIFAR-10 dataset. To use it as a transfer learning model, we have chosen the EfficientNet-B5 version, as B6 and B7 do not support the ImageNet weights when using Keras. The CIFAR-10 dataset is a publicly available image dataset provided by the Canadian Institute for Advanced Research (CIFAR). It consists of 60000 32×32 colour images in 10 classes, with 6000 images per class. The 10 classes represent airplanes, cars, birds, cats, deer, dogs, frogs, horses, ships, and trucks. There are 50000 training images and 10000 test images in this dataset. For more information on the CIFAR-10 dataset and its preprocessing for a convolutional neural network, please read my article ‘Transfer Learning for Multi-Class Image Classification Using Deep Convolutional Neural Network’.
In the first step, we will download the data set and import the required libraries.
# Keras library for CIFAR dataset
from keras.datasets import cifar10

# Downloading the CIFAR dataset
(x_train, y_train), (x_test, y_test) = cifar10.load_data()

# Importing other required libraries
import numpy as np
import pandas as pd
from sklearn.utils.multiclass import unique_labels
import os
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import seaborn as sns
import itertools
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
from keras import Sequential
from keras.preprocessing.image import ImageDataGenerator
from keras.optimizers import SGD, Adam
from keras.callbacks import ReduceLROnPlateau
from keras.layers import Flatten, Dense, BatchNormalization, Activation, Dropout
from keras.utils import to_categorical
After importing the libraries, we will split the dataset into training, validation, and test sets and preprocess it as we have done in the previous articles.
# Train-validation-test split
from sklearn.model_selection import train_test_split
x_train, x_val, y_train, y_val = train_test_split(x_train, y_train, test_size=.3)

# Dimension of the CIFAR10 dataset
print((x_train.shape, y_train.shape))
print((x_val.shape, y_val.shape))
print((x_test.shape, y_test.shape))

# One-hot encoding the labels
from sklearn.utils.multiclass import unique_labels
from keras.utils import to_categorical

# Since we have 10 classes, we should expect the shape[1] of y_train, y_val and y_test to change from 1 to 10
y_train = to_categorical(y_train)
y_val = to_categorical(y_val)
y_test = to_categorical(y_test)

# Verifying the dimension after one-hot encoding
print((x_train.shape, y_train.shape))
print((x_val.shape, y_val.shape))
print((x_test.shape, y_test.shape))
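For readers unfamiliar with one-hot encoding, the following pure-Python sketch shows the transformation that `to_categorical` performs on the labels. The helper `one_hot` is a hypothetical name used only for illustration; it is not the Keras implementation.

```python
def one_hot(labels, num_classes):
    """Convert integer class ids into one-hot rows (illustration only)."""
    rows = []
    for label in labels:
        row = [0.0] * num_classes
        row[label] = 1.0   # mark the position of the true class
        rows.append(row)
    return rows

encoded = one_hot([3, 0, 9], num_classes=10)
for row in encoded:
    print(row)
```

Each label becomes a row of length 10 with a single 1.0 at the index of its class, which is the target format expected by the categorical cross-entropy loss used later.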
# Image data augmentation
from keras.preprocessing.image import ImageDataGenerator

train_generator = ImageDataGenerator(rotation_range=2, horizontal_flip=True, zoom_range=.1)
val_generator = ImageDataGenerator(rotation_range=2, horizontal_flip=True, zoom_range=.1)
test_generator = ImageDataGenerator(rotation_range=2, horizontal_flip=True, zoom_range=.1)

# Fitting the augmentation defined above to the data
# (fit() only computes statistics for featurewise normalization or ZCA whitening,
# so it has no effect on the settings used here)
train_generator.fit(x_train)
val_generator.fit(x_val)
test_generator.fit(x_test)
We will use a learning rate annealer in this experiment. The annealer reduces the learning rate when the monitored metric stops improving for a given number of epochs. Here, we will monitor the validation accuracy; if it plateaus for 3 epochs, the learning rate will be multiplied by a factor of 0.01, down to a minimum of 1e-5.
# Learning rate annealer
from keras.callbacks import ReduceLROnPlateau

# The monitored name must match the metric key logged during training;
# 'val_accuracy' matches the history keys plotted later (older Keras versions log 'val_acc')
lrr = ReduceLROnPlateau(monitor='val_accuracy', factor=.01, patience=3, min_lr=1e-5)
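The behaviour of the annealer can be sketched in plain Python. This is a simplified model of the plateau logic, assuming no cooldown and no minimum improvement threshold; it is not the Keras source code, and the function name and the accuracy values are made up for illustration.

```python
def simulate_plateau_schedule(val_accuracies, lr=1e-3, factor=0.01,
                              patience=3, min_lr=1e-5):
    """Return the learning rate in effect after each epoch (simplified logic)."""
    best = float("-inf")
    wait = 0
    schedule = []
    for acc in val_accuracies:
        if acc > best:
            best, wait = acc, 0          # improvement: reset the counter
        else:
            wait += 1                    # no improvement this epoch
            if wait >= patience and lr > min_lr:
                lr = max(lr * factor, min_lr)
                wait = 0
        schedule.append(lr)
    return schedule

# Hypothetical validation accuracies: improvement stalls after the second epoch.
schedule = simulate_plateau_schedule([0.50, 0.60, 0.60, 0.59, 0.60, 0.61])
print(schedule)
```

Note that with an initial learning rate of 0.001 and a factor of 0.01, a single reduction already lands on the floor of 1e-5, so with these settings the annealer can act only once.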
In the next step, we need to install the EfficientNet package and import the model as follows.
!pip install keras_efficientnets
from keras_efficientnets import EfficientNetB5
Here, we will define the EfficientNet-B5 using the following code snippets.
# Defining the model
base_model = EfficientNetB5(include_top=False, weights="imagenet", input_shape=(32, 32, 3), classes=y_train.shape[1])

# Adding the final layers to the base model; the actual classification is done in the dense layers
model = Sequential()
model.add(base_model)
model.add(Flatten())

# Model summary
model.summary()

# Adding the Dense layers with ReLU activations
# (the input size is inferred from the Flatten output, so no input_dim is needed)
model.add(Dense(1024, activation='relu'))
model.add(Dense(512, activation='relu'))
model.add(Dense(256, activation='relu'))
# model.add(Dropout(.3))
model.add(Dense(128, activation='relu'))
# model.add(Dropout(.2))
model.add(Dense(10, activation='softmax'))

# Checking the final model summary
model.summary()
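To get a feel for the size of the classification head, the sketch below counts its trainable parameters by hand. It assumes the flattened output of the base model has 2048 features for a 32×32 input; that number is an assumption and should be checked against the output of `model.summary()`.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

# Assumed feature size after Flatten, followed by the Dense widths used above.
layer_sizes = [2048, 1024, 512, 256, 128, 10]

per_layer = [dense_params(n_in, n_out)
             for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
total = sum(per_layer)

for (n_in, n_out), count in zip(zip(layer_sizes[:-1], layer_sizes[1:]), per_layer):
    print(f"Dense {n_in} -> {n_out}: {count:,} parameters")
print(f"Total head parameters: {total:,}")
```

Under this assumption, most of the head's parameters sit in the first dense layer, so reducing its width is the quickest way to shrink the head.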
To train the model, we will define the batch size, the number of epochs, and the learning rate below.
# Defining the parameters
batch_size = 100
epochs = 50
learn_rate = .001
We will define the Stochastic Gradient Descent as the optimizer.
sgd=SGD(lr=learn_rate,momentum=.9,nesterov=False)
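The update rule behind SGD with (non-Nesterov) momentum can be sketched on a single scalar weight. This is a minimal illustration of the rule, not the Keras optimizer itself; the function name and the toy objective f(w) = w² are assumptions made for the example.

```python
def sgd_momentum_step(w, grad, velocity, lr=0.001, momentum=0.9):
    """One SGD step with classical momentum on a scalar weight."""
    velocity = momentum * velocity - lr * grad   # decayed history minus new gradient step
    return w + velocity, velocity

# Toy example: minimize f(w) = w^2, whose gradient is 2w.
w, v = 1.0, 0.0
for step in range(5):
    w, v = sgd_momentum_step(w, 2.0 * w, v)
    print(f"step {step + 1}: w = {w:.6f}, velocity = {v:.6f}")
```

The velocity term accumulates past gradients, so consecutive steps in the same direction grow larger than the plain learning rate alone would allow.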
We will now compile and train the model.
# Compiling the model
model.compile(optimizer=sgd, loss='categorical_crossentropy', metrics=['accuracy'])

# Training the model
# (validation_steps is set to one full pass over the validation split)
model.fit_generator(train_generator.flow(x_train, y_train, batch_size=batch_size),
                    epochs=epochs,
                    steps_per_epoch=x_train.shape[0] // batch_size,
                    validation_data=val_generator.flow(x_val, y_val, batch_size=batch_size),
                    validation_steps=x_val.shape[0] // batch_size,
                    callbacks=[lrr],
                    verbose=1)
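The step counts follow directly from the split sizes and the batch size. The short calculation below assumes the 70/30 split of the 50000 training images and the batch size of 100 used in this article; the variable names are illustrative.

```python
n_train_full = 50000      # CIFAR-10 training images before the split
val_percent = 30          # share held out for validation (test_size=.3)
batch_size = 100

n_val = n_train_full * val_percent // 100
n_train = n_train_full - n_val

steps_per_epoch = n_train // batch_size      # batches in one pass over the training split
validation_steps = n_val // batch_size       # batches in one pass over the validation split

print(f"training images: {n_train}, validation images: {n_val}")
print(f"steps per epoch: {steps_per_epoch}, validation steps for one pass: {validation_steps}")
```

Because the generators loop indefinitely, requesting more steps than this simply draws additional augmented batches from the same images.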
After successful training, we will visualize its performance.
import matplotlib.pyplot as plt

# Plotting the training and validation loss
f, ax = plt.subplots(2, 1)  # Creates 2 subplots under 1 column

# Assigning the first subplot to graph training loss and validation loss
ax[0].plot(model.history.history['loss'], color='b', label='Training Loss')
ax[0].plot(model.history.history['val_loss'], color='r', label='Validation Loss')
ax[0].legend()

# Plotting the training accuracy and validation accuracy
ax[1].plot(model.history.history['accuracy'], color='b', label='Training Accuracy')
ax[1].plot(model.history.history['val_accuracy'], color='r', label='Validation Accuracy')
ax[1].legend()
plt.show()
We will examine the classification performance using both a non-normalized and a normalized confusion matrix. For this purpose, we will first define a function that plots the confusion matrices.
# Defining function for confusion matrix plot
def plot_confusion_matrix(y_true, y_pred, classes,
                          normalize=False,
                          title=None,
                          cmap=plt.cm.Blues):
    if not title:
        if normalize:
            title = 'Normalized confusion matrix'
        else:
            title = 'Confusion matrix, without normalization'

    # Compute confusion matrix
    cm = confusion_matrix(y_true, y_pred)
    if normalize:
        cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
        print("Normalized confusion matrix")
    else:
        print('Confusion matrix, without normalization')

    # Plot the confusion matrix
    fig, ax = plt.subplots(figsize=(7, 7))
    im = ax.imshow(cm, interpolation='nearest', cmap=cmap)
    ax.figure.colorbar(im, ax=ax)

    # We want to show all ticks...
    ax.set(xticks=np.arange(cm.shape[1]),
           yticks=np.arange(cm.shape[0]),
           xticklabels=classes, yticklabels=classes,
           title=title,
           ylabel='True label',
           xlabel='Predicted label')

    # Rotate the tick labels and set their alignment.
    plt.setp(ax.get_xticklabels(), rotation=45, ha="right",
             rotation_mode="anchor")

    # Loop over data dimensions and create text annotations.
    fmt = '.2f' if normalize else 'd'
    thresh = cm.max() / 2.
    for i in range(cm.shape[0]):
        for j in range(cm.shape[1]):
            ax.text(j, i, format(cm[i, j], fmt),
                    ha="center", va="center",
                    color="white" if cm[i, j] > thresh else "black")
    fig.tight_layout()
    return ax

np.set_printoptions(precision=2)
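The normalization step in the function above divides each row by its sum, so every row shows the fraction of a true class assigned to each predicted class. A small pure-Python sketch with a made-up two-class matrix illustrates this; the helper name `normalize_rows` is hypothetical.

```python
def normalize_rows(cm):
    """Divide every entry by its row total (rows correspond to true classes)."""
    return [[count / sum(row) for count in row] for row in cm]

# Made-up counts: rows are true classes, columns are predicted classes.
cm = [[8, 2],
      [1, 9]]

normalized = normalize_rows(cm)
for row in normalized:
    print([round(value, 2) for value in row])
```

The diagonal of the normalized matrix is then the per-class recall. A class with no test samples would give a zero row sum, which this sketch does not handle.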
In the next step, we will predict the class labels for the test images using the trained EfficientNet model.
# Making predictions (argmax over the softmax outputs gives the predicted class index)
y_pred = np.argmax(model.predict(x_test), axis=1)
y_true = np.argmax(y_test, axis=1)

# Plotting the confusion matrix
from sklearn.metrics import confusion_matrix
confusion_mtx = confusion_matrix(y_true, y_pred)

class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']

# Plotting non-normalized confusion matrix
plot_confusion_matrix(y_true, y_pred, classes=class_names, title='Confusion matrix, without normalization')

# Plotting normalized confusion matrix
plot_confusion_matrix(y_true, y_pred, classes=class_names, normalize=True, title='Normalized confusion matrix')
Next, we will obtain the average accuracy score in classifying the unseen test data.
# Classification accuracy
from sklearn.metrics import accuracy_score
acc_score = accuracy_score(y_true, y_pred)
print('Accuracy Score = ', acc_score)
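The accuracy score is simply the share of test images whose predicted class matches the true class. The pure-Python sketch below shows the calculation on made-up labels; it illustrates the definition and is not the scikit-learn implementation.

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the predicted label equals the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

# Made-up labels for five images.
true_labels = [0, 1, 2, 2, 1]
predicted_labels = [0, 1, 2, 1, 1]

score = accuracy(true_labels, predicted_labels)
print("Accuracy Score =", score)
```

Equivalently, the accuracy is the sum of the diagonal of the non-normalized confusion matrix divided by the total number of test images.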
As the confusion matrices and the accuracy score show, the performance of EfficientNet-B5 is satisfactory, with an average accuracy score of 78.39%. This accuracy could likely be improved by training for more epochs, say 100 or 200, since the accuracy was still improving during training. In this article, we discussed the architecture and implementation of EfficientNet, which should help readers apply this model to similar applications with some hyperparameter tuning.