Convolutional Neural Networks have proven their advantage as a deep learning model in a variety of applications. When handling large data sets to extract features and make predictions, CNN models have consistently shown their competency. In the majority of applications, a single CNN model is applied. However, there is also scope to use a group of CNN models on the same task as an ensemble learning approach. In one of our earlier articles, we discussed the customization of ensemble learning models and saw their increased efficiency. Now, let us extend this ensembling approach to CNN models. If successful, it can be applied to tasks where a single CNN model has given lower accuracy than expected.
In this article, we will create an ensemble of convolutional neural networks. In this experiment, we will build an ensemble of 10 CNN models and apply it to multi-class prediction on the MNIST handwritten digits data set. First, we will define the individual CNN models and train them in sequence. On the test data, every individual model will give its prediction, and the final prediction of the ensemble will be the most frequent prediction across all the individual CNN models. The same strategy is used when creating ensembles by max voting for classification.
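As a rough sketch of this max-voting idea, the most frequent class per sample can be picked as follows. The prediction arrays below are made up purely for illustration and are not outputs of the models built later in the article.

import numpy as np

# Hypothetical class predictions of 3 models on 4 test samples (one row per model)
preds = np.array([[3, 0, 7, 1],
                  [3, 0, 9, 1],
                  [8, 0, 7, 1]])

# Max voting: for each sample (column), keep the most frequently predicted class
final_pred = np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, preds)
print(final_pred)   # [3 0 7 1]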
Implementation
The code below was implemented in Google Colab and then downloaded as a .py file.
First, we need to import the required libraries.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import seaborn as sns
import itertools
import math
from sklearn.metrics import confusion_matrix, mean_squared_error
from sklearn.model_selection import train_test_split, KFold
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten, Conv2D, MaxPool2D
from keras.layers import BatchNormalization
from keras.utils.np_utils import to_categorical # convert to one-hot-encoding
from keras.optimizers import Adam, RMSprop, Adagrad
After importing the required libraries, we will read the MNIST handwritten digits data set that is publicly available in Google Colab as sample data.
#Reading the data
train = pd.read_csv("sample_data/mnist_train_small.csv")
test = pd.read_csv("sample_data/mnist_test.csv")
Now, we need to specify the training and test sets. This is done using the lines of code below. First, we will check the header, and then we will select the required columns.
#Training data head
train.head()

#Specifying train and test data
train_X = train.iloc[:,1:]
train_y = train.iloc[:,0]
test = test.iloc[:,1:]

#Shape of the specified data
print(train_X.shape)
print(train_y.shape)
print(test.shape)
Now, we will normalize the training and test data.
#Normalize the data
train_X = train_X / 255.0
test = test / 255.0
For compatibility with the CNN model, we need to reshape the data.
#Reshape image in 3 dimensions (with 1 channel)
train_X = train_X.values.reshape(-1,28,28,1)
test = test.values.reshape(-1,28,28,1)
Since the output consists of 10 classes, we need to one-hot encode the labels of the data set.
#Encode labels to one hot vectors
train_y = to_categorical(train_y, num_classes = 10)
We will visualize one sample image from the training data set.
#Sample image
plt.imshow(train_X[0][:,:,0])
Ensemble Of Convolutional Neural Networks
In the next step, we will define 10 CNN models compatible with our data set.
# Define 10 CNN models
nets = 10
model = [0] * nets

for j in range(nets):
    model[j] = Sequential()

    #First block
    model[j].add(Conv2D(32, kernel_size = 3, activation='relu', input_shape = (28, 28, 1)))
    model[j].add(BatchNormalization())
    model[j].add(Conv2D(32, kernel_size = 3, activation='relu'))
    model[j].add(BatchNormalization())
    model[j].add(Conv2D(32, kernel_size = 5, strides=2, padding='same', activation='relu'))
    model[j].add(BatchNormalization())
    model[j].add(Dropout(0.4))

    #Second block
    model[j].add(Conv2D(64, kernel_size = 3, activation='relu'))
    model[j].add(BatchNormalization())
    model[j].add(Conv2D(64, kernel_size = 3, activation='relu'))
    model[j].add(BatchNormalization())
    model[j].add(Conv2D(64, kernel_size = 5, strides=2, padding='same', activation='relu'))
    model[j].add(BatchNormalization())
    model[j].add(Dropout(0.4))

    #Third block
    model[j].add(Conv2D(128, kernel_size = 4, activation='relu'))
    model[j].add(BatchNormalization())
    model[j].add(Flatten())
    model[j].add(Dropout(0.4))

    #Output layer
    model[j].add(Dense(10, activation='softmax'))

    # Compile each model
    model[j].compile(optimizer='adam', loss="categorical_crossentropy", metrics=["accuracy"])

print('All Models Defined')
We will use a learning rate annealer in this experiment. The learning rate annealer reduces the learning rate when the monitored metric stops improving for a given number of epochs. Here, we monitor the validation accuracy: if it plateaus for 3 epochs, the learning rate is halved, down to a minimum of 0.00001.
#LR Reduction Callback
from keras.callbacks import ReduceLROnPlateau

learning_rate_reduction = ReduceLROnPlateau(monitor='val_accuracy',
                                            patience=3,
                                            verbose=0,
                                            factor=0.5,
                                            min_lr=0.00001)
In the next step, we will train the models that we have defined above.
# train for 20 epochs
history = [0] * nets
epochs = 20

datagen = ImageDataGenerator(rotation_range=13,
                             zoom_range=0.11,
                             width_shift_range=0.1,
                             height_shift_range=0.1)
datagen.fit(train_X)

for j in range(nets):
    print(f'Individual Net : {j+1}')
    X_train2, X_val2, Y_train2, Y_val2 = train_test_split(train_X, train_y, test_size = 0.1)
    history[j] = model[j].fit_generator(datagen.flow(X_train2, Y_train2, batch_size=64),
                                        epochs = epochs,
                                        steps_per_epoch = X_train2.shape[0]//64,
                                        validation_data = (X_val2, Y_val2),
                                        callbacks=[learning_rate_reduction],
                                        verbose=0)
    print("CNN Model {0:d}: Epochs={1:d}, Training accuracy={2:.5f}, Validation accuracy={3:.5f}".format(
        j+1, epochs, max(history[j].history['accuracy']), max(history[j].history['val_accuracy'])))
After training, each of the 10 models predicts class probabilities on the test data. In the code below, these probabilities are summed across models and the class with the highest combined score becomes the final prediction, which is the probability-averaging form of the voting strategy described earlier.
#Result
results = np.zeros( (test.shape[0], 10) )
for j in range(nets):
    results = results + model[j].predict(test)
results = np.argmax(results, axis = 1)
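If a strict most-frequent (hard) vote is preferred over summing probabilities, a sketch of that variant, reusing the model list and test array defined above, could look like this; the name all_preds is introduced here only for illustration.

# Hard (majority) voting variant: take each model's argmax, then keep the most frequent class per sample
all_preds = np.array([np.argmax(model[j].predict(test), axis=1) for j in range(nets)])
voted = np.apply_along_axis(lambda col: np.bincount(col, minlength=10).argmax(), 0, all_preds)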
Now we will check the prediction on one sample image.
#Test on result
plt.imshow(test[0][:,:,0])
plt.title(results[0])
Finally, we will check the predicted labels on a few more test images.
L = 4
W = 4
fig, axes = plt.subplots(L, W, figsize = (12,12))
axes = axes.ravel()

for i in np.arange(0, L * W):
    axes[i].imshow(test[i].reshape(28,28))
    axes[i].set_title(results[i])
    axes[i].axis('off')

plt.subplots_adjust(wspace=0.5)
So, as we can see above, our ensemble model has given correct predictions for the 16 test images shown. We can check the model with more images. This ensemble of convolutional neural networks can also be applied to a larger image data set to check its performance; in that case, we can increase the number of individual CNN models and train them for more epochs.
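To go beyond spot-checking individual digits, the ensemble accuracy could also be measured against the true test labels. Note that the label column was dropped earlier (test = test.iloc[:,1:]); the minimal sketch below assumes it was saved beforehand in a hypothetical test_labels variable.

from sklearn.metrics import accuracy_score

# test_labels is assumed to have been stored before the label column was dropped, e.g.:
# test_labels = pd.read_csv("sample_data/mnist_test.csv").iloc[:, 0]
print("Ensemble test accuracy:", accuracy_score(test_labels, results))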