Regularization is the process of fine-tuning a neural network by adding a penalty term to the loss function, so that the model converges reliably, keeps the loss low during testing, and performs better on unseen data. A regularized model is more generic and more robust to changes in the patterns of the data and to any uncertainties that come with them. In this article, let us see how kernel regularizers work with neural networks and at which layers of the network they are useful for obtaining an optimal model.
Table of Contents
- What is Kernel Regularization
- Need for Kernel Regularization
- Case study of kernel regularizers with neural networks
- Key Outcomes of kernel regularizers with neural networks
- Summary
What is Kernel Regularization
Regularization is the process of adding penalty factors to the network layers so as to alter the way the weights are updated during training, which helps the model converge optimally. There are mainly two types of penalties that can be enforced on the network layers: L1 regularization, which penalizes the absolute values of the layer weights, and L2 regularization, which penalizes the squares of the weights.
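As a quick illustration of the two penalties (the layers below are only for demonstration and are not part of the case-study model built later), this is how each one can be attached to a Keras layer:

from tensorflow.keras import layers, regularizers

# L1: adds 0.001 * sum(|w|) over the layer's weights to the training loss
dense_l1 = layers.Dense(64, activation='relu',
                        kernel_regularizer=regularizers.l1(0.001))

# L2: adds 0.001 * sum(w**2) over the layer's weights to the training loss
dense_l2 = layers.Dense(64, activation='relu',
                        kernel_regularizer=regularizers.l2(0.001))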
The L1 penalty is robust and drives less useful weights toward zero, and it is the penalty used throughout this article. Regularization can be applied at different layers according to need, and kernel regularization is one such technique: the penalty term is added to the kernel of a layer, that is, to the weights of the neural network, while the bias component remains unaltered.
Need for Kernel Regularization
The primary need for regularization techniques in neural networks is to prevent complex networks from overfitting and to help them converge with sound weight updates during training. Among the various regularization techniques, kernel regularization is the one in which a penalty factor is added to the weights of the network, so every weight update is computed against the penalized loss and the weights passed on to the next update are kept in check. A kernel regularizer does not add a penalty factor to the bias component, which helps in obtaining lighter and better-converging models. Because the bias of the network is left unaltered and only the weights are penalized, the model generally does not overfit, and we obtain better-performing models in the testing phase.
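As a minimal sketch of this distinction (the layer below is illustrative and not part of the case study), Keras exposes separate arguments for penalizing the weights and the bias of a layer, and kernel regularization uses only the former:

from tensorflow.keras import layers, regularizers

# kernel_regularizer penalizes only the weight matrix of the layer;
# the bias vector is deliberately left unregularized, as described above.
regularized_layer = layers.Dense(
    32,
    activation='relu',
    kernel_regularizer=regularizers.l1(0.001),  # penalty on the weights
    bias_regularizer=None                       # bias remains unaltered
)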

Let us understand how Kernel Regularizers work with neural networks through a case study.
Case study of kernel regularizers with neural networks
For this case study, a binary image classification problem was taken up in which African and Asian elephants have to be classified.
Once the dataset was acquired, sample images of both classes were visualized using the Matplotlib module.
import matplotlib.pyplot as plt
from tensorflow.keras.preprocessing.image import load_img

train_path = '/content/drive/MyDrive/Colab notebooks/Kernel Regularizers with NN/dataset/train'
test_path = '/content/drive/MyDrive/Colab notebooks/Kernel Regularizers with NN/dataset/test'

# Sample image from the African elephant class
plt.figure(figsize=(15, 5))
img = load_img(train_path + "/African/af_tr109.jpg")
plt.imshow(img)
plt.axis("off")
plt.title("African Elephant Image")
plt.show()

# Sample image from the Asian elephant class
plt.figure()
img = load_img(train_path + "/Asian/as_tr114.jpg")
plt.imshow(img)
plt.axis("off")
plt.title("Asian Elephant Image")
plt.show()
Once the sample images in the dataset were visualized, a Sequential TensorFlow model was built with the layers shown below and compiled with an appropriate loss function and metrics for evaluation.
Model without Kernel Regularization
import tensorflow as tf
from tensorflow.keras.layers import Dense, MaxPooling2D, Conv2D, Flatten
from tensorflow.keras.models import Sequential

img_row = 150
img_col = 150

model1 = Sequential()
model1.add(Conv2D(64, (5, 5), activation='relu', input_shape=(img_row, img_col, 3)))
model1.add(MaxPooling2D(pool_size=(2, 2)))
model1.add(Conv2D(32, (5, 5), activation='relu'))
model1.add(MaxPooling2D(pool_size=(2, 2)))
model1.add(Conv2D(16, (5, 5), activation='relu'))
model1.add(MaxPooling2D(pool_size=(2, 2)))
model1.add(Flatten())
model1.add(Dense(126, activation='relu'))
model1.add(Dense(52, activation='relu'))
model1.add(Dense(1, activation='sigmoid'))

model1.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
As we are working with an image dataset, suitable preprocessing was taken up using the ImageDataGenerator module, as shown below.
from tensorflow.keras.preprocessing.image import ImageDataGenerator, load_img

# Rescale and augment the training images; only rescale the test images
train_datagen = ImageDataGenerator(rescale=1./255, shear_range=0.2, zoom_range=0.2, horizontal_flip=True)
test_datagen = ImageDataGenerator(rescale=1./255)

train_set = train_datagen.flow_from_directory(train_path, target_size=(img_row, img_col),
                                              batch_size=64, class_mode='binary')
test_set = test_datagen.flow_from_directory(test_path, target_size=(img_row, img_col),
                                            batch_size=64, class_mode='binary')
With the data suitably preprocessed, the compiled model was now fit on the split data for 50 epochs and later evaluated for training and testing loss and accuracy.
# fit_generator is deprecated in recent TensorFlow releases; model.fit accepts the generators directly
model1_res = model1.fit(train_set, steps_per_epoch=840//64, epochs=50,
                        validation_data=test_set, validation_steps=188//64)
model1.evaluate(train_set)  ## training loss and training accuracy

model1.evaluate(test_set) ## testing loss and accuracy

Model-1 can be considered the base model, and the metrics obtained from it are used to validate the models in which kernel regularization is applied.
Applying Kernel Regularization before Flattening layer
For the same model architecture, let us apply kernel regularization just before the flattening layer and observe the model's performance by comparing it with the base model.
from tensorflow.keras import regularizers

img_row = 150
img_col = 150

model3 = Sequential()
model3.add(Conv2D(64, (3, 3), activation='relu', input_shape=(img_row, img_col, 3)))
model3.add(MaxPooling2D(pool_size=(2, 2)))
model3.add(Conv2D(32, (3, 3), activation='relu'))
model3.add(MaxPooling2D(pool_size=(2, 2)))
# L1 kernel regularizer on the last convolutional layer, just before Flatten
model3.add(Conv2D(16, (3, 3), activation='relu', kernel_regularizer=regularizers.l1(0.001)))
model3.add(MaxPooling2D(pool_size=(2, 2)))
model3.add(Flatten())
model3.add(Dense(126, activation='relu'))
model3.add(Dense(52, activation='relu'))
model3.add(Dense(1, activation='sigmoid'))
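As a quick sanity check (a sketch assuming the model3 defined above), the penalty contributed by the kernel regularizer can be inspected through the model's losses attribute, which Keras adds into the training loss:

# Each kernel regularizer contributes a scalar penalty tensor to model.losses
print(model3.losses)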
The model architecture is now compiled and fitted for 50 epochs, then evaluated for training and testing loss and accuracy, as shown below.
model3.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model3_res = model3.fit(train_set, steps_per_epoch=840//64, epochs=50,
                        validation_data=test_set, validation_steps=188//64)
model3.evaluate(train_set)  ## training loss and training accuracy

model3.evaluate(test_set) ## testing loss and testing accuracy

Comparing the base model with the kernelized model, the testing loss drops from 0.644 to 0.612, and the gap between training and testing accuracy closes: the base model scores 0.805 on training data but only 0.728 on test data, while the kernelized model's training and testing accuracies (0.656 and 0.696) sit close together. So by using a kernel regularizer just before the flattening layer, the base model's overfitting is curbed and the resulting model generalizes better, even though its raw training accuracy is lower.
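One convenient way to visualize this comparison (assuming the History objects model1_res and model3_res returned by the fit calls above) is to plot the validation-accuracy curves of the two models:

import matplotlib.pyplot as plt

# Validation accuracy per epoch for the base and kernelized models
plt.plot(model1_res.history['val_accuracy'], label='Base model')
plt.plot(model3_res.history['val_accuracy'], label='Kernel regularizer before Flatten')
plt.xlabel('Epoch')
plt.ylabel('Validation accuracy')
plt.legend()
plt.show()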
Using Kernel Regularization at two layers
Here kernel regularization is used at two layers: an early convolutional layer and the dense layer just before the output layer. Below is the model architecture; let us compile it with an appropriate loss function and metrics.
img_row = 150
img_col = 150

model5 = Sequential()
model5.add(Conv2D(64, (3, 3), activation='relu', input_shape=(img_row, img_col, 3)))
model5.add(MaxPooling2D(pool_size=(2, 2)))
# First L1 kernel regularizer, on an early convolutional layer
model5.add(Conv2D(32, (3, 3), activation='relu', kernel_regularizer=regularizers.l1(0.001)))
model5.add(MaxPooling2D(pool_size=(2, 2)))
model5.add(Conv2D(16, (3, 3), activation='relu'))
model5.add(MaxPooling2D(pool_size=(2, 2)))
model5.add(Flatten())
model5.add(Dense(126, activation='relu'))
# Second L1 kernel regularizer, on the dense layer just before the output layer
model5.add(Dense(52, activation='relu', kernel_regularizer=regularizers.l1(0.001)))
model5.add(Dense(1, activation='sigmoid'))

model5.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
The model was fit on the split data for 50 epochs and later evaluated for train and test loss and accuracy.
model5_res = model5.fit(train_set, steps_per_epoch=840//64, epochs=50,
                        validation_data=test_set, validation_steps=188//64)
model5.evaluate(train_set)  ## training loss and training accuracy

model5.evaluate(test_set)  ## testing loss and testing accuracy

Comparing the base model with the model kernelized at two layers, the testing loss is slightly lower (0.638 against 0.644), and the training and testing accuracies are nearly equal (0.694 and 0.686). This small gap between training and testing performance is the sign of a reliable model that holds up on unseen data during testing.
Key Outcomes of kernel regularizers with neural networks
| Model Names | Training Loss | Training Accuracy | Testing Loss | Testing Accuracy |
|---|---|---|---|---|
| Base Model | 0.406 | 0.805 | 0.644 | 0.728 |
| Kernelized Model-1 | 0.621 | 0.656 | 0.612 | 0.696 |
| Kernelized Model-2 | 0.603 | 0.694 | 0.638 | 0.686 |
- Regularization is not required for every neural network architecture; it fits best for complex and deeper networks.
- A kernel regularizer applied just before the flattening layer lowered the testing loss and closed the gap between training and testing accuracy, reducing the base model's overfitting.
- Using kernel regularizers at multiple layers of a complex network reduces overfitting to a large extent and helps yield a reliable model without large fluctuations between the training and testing metrics.
- Kernel regularizers used on relatively simple datasets or shallow architectures may not show improvement in any of the metrics, because with fewer layers the weight-update process is simpler and the model is less prone to overfit in the first place.
Summary
Fine-tuning complex neural networks speeds up the training process, helps them converge faster, and yields a more generic model. Among the various fine-tuning techniques, kernel regularization is one that suits complex or deep architectures: a penalty term is added to the weights of a layer while the bias is left unaltered, which addresses the overfitting of neural networks and helps yield reliable models that perform better on unseen data or in changing testing environments.