Last updated February 28, 2024

How do Kernel Regularizers work with neural networks?

Want to know how kernel regularizers add penalty terms to network weights and optimize performance? Here is the answer.


Published on June 25, 2022

by Darshan M


Regularization is the process of fine-tuning neural network models by adding a penalty term to the loss function, so that the model generalizes: it should reach a low loss on unseen test data, not only on the training data. Regularization yields a more general and reliable model that copes well with changes in data patterns and other uncertainties. In this article, let us see how kernel regularizers work with neural networks and at which layers of a network they are useful for obtaining optimal models.

What is Kernel Regularization
Need for Kernel Regularization
Case study of kernel regularizers with neural networks
Key Outcomes of kernel regularizers with neural networks
Summary

What is Kernel Regularization

Regularization is the process of adding penalty terms to the network's layers that constrain the weights learned during training, which helps the model converge to a solution that generalizes well. There are two main types of penalties that can be applied to the layers: L1 regularization penalizes the absolute values of the weights, while L2 regularization penalizes the squares of the weights.
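To make the difference concrete, here is a minimal pure-Python sketch of both penalties. It assumes the Keras convention, where regularizers.l1(lam) adds lam * sum(|w|) and regularizers.l2(lam) adds lam * sum(w**2) to the loss; the function names and weight values are illustrative only.

```python
def l1_penalty(weights, lam):
    """L1 penalty: lam times the sum of absolute weight values."""
    return lam * sum(abs(w) for w in weights)


def l2_penalty(weights, lam):
    """L2 penalty: lam times the sum of squared weight values."""
    return lam * sum(w * w for w in weights)


weights = [0.5, -1.0, 2.0, 0.0]  # hypothetical weights of one layer
print(l1_penalty(weights, 0.01))  # 0.01 * (0.5 + 1.0 + 2.0 + 0.0)
print(l2_penalty(weights, 0.01))  # 0.01 * (0.25 + 1.0 + 4.0 + 0.0)
```

Note how the two differ for small weights: a weight of 0.1 costs 0.1 * lam under L1 but only 0.01 * lam under L2, which is why L1 keeps pushing small weights all the way to zero.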


Because of its robustness and the sparsity it encourages (it tends to drive many weights to exactly zero), L1 regularization is the technique used in this article. Regularization can be applied at different layers depending on the needs of the model. Kernel regularization is one such technique: the penalty is applied to a layer's kernel, that is, its weight matrix, while the bias component is left unaltered. In Keras this corresponds to the kernel_regularizer argument of layers such as Conv2D and Dense; a separate bias_regularizer argument exists for the bias.
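The kernel-only behaviour can be sketched in a few lines of plain Python. This is an illustration, not Keras code: a layer is modelled as a hypothetical dictionary with a "kernel" and a "bias", and the penalty function reads only the kernel.

```python
def kernel_l1_penalty(layer, lam):
    """L1 penalty over every entry of layer["kernel"]; layer["bias"] is never read."""
    return lam * sum(abs(w) for row in layer["kernel"] for w in row)


# Two hypothetical dense layers with identical kernels but very different biases.
layer_a = {"kernel": [[0.2, -0.4], [1.0, 0.0]], "bias": [0.0, 0.0]}
layer_b = {"kernel": [[0.2, -0.4], [1.0, 0.0]], "bias": [5.0, -5.0]}

print(kernel_l1_penalty(layer_a, 0.001))
print(kernel_l1_penalty(layer_b, 0.001))  # identical: the bias is not penalized
```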

Need for Kernel Regularization

The primary purpose of regularization in neural networks is to prevent complex networks from overfitting and to help them converge during training. Kernel regularization does this by adding a penalty on the weights of the network to the loss. Because large weights are penalized, every weight update also pulls the weights slightly toward zero, so the network cannot rely on a few very large weights to memorize the training data. The kernel regularizer adds no penalty to the bias component: biases are a small fraction of a network's parameters and contribute little to overfitting, so leaving them unpenalized avoids over-constraining the model, which could otherwise cause underfitting.
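The effect on weight updates can be illustrated with a single gradient step. This sketch assumes plain SGD (the case study itself uses the Adam optimizer) and uses sign(w) as the L1 subgradient; all names and numbers are hypothetical.

```python
def sign(x):
    """Sign of x: 1, -1, or 0 (the subgradient of |x| chosen at 0)."""
    return (x > 0) - (x < 0)


def sgd_step(kernel, bias, grad_kernel, grad_bias, lr, lam):
    """One plain SGD step with an L1 kernel penalty.

    Kernel weights receive the data gradient plus lam * sign(w);
    the bias receives only its data gradient.
    """
    new_kernel = [w - lr * (g + lam * sign(w)) for w, g in zip(kernel, grad_kernel)]
    new_bias = [b - lr * g for b, g in zip(bias, grad_bias)]
    return new_kernel, new_bias


kernel, bias = [0.5, -0.3, 0.0], [0.2]
# With zero data gradients only the penalty acts: nonzero weights shrink
# toward zero, while the bias stays exactly where it was.
new_kernel, new_bias = sgd_step(kernel, bias, [0.0] * 3, [0.0], lr=0.1, lam=0.01)
print(new_kernel, new_bias)
```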

Let us understand how Kernel Regularizers work with neural networks through a case study.

Case study of kernel regularizers with neural networks

For this case study, a binary image classification problem was taken up: classifying images of African and Asian elephants.

Once the dataset was acquired, sample images of both classes were visualized using Matplotlib.

import matplotlib.pyplot as plt
from tensorflow.keras.preprocessing.image import load_img  # used by the plotting code below
train_path='/content/drive/MyDrive/Colab notebooks/Kernel Regularizers with NN/dataset/train'
test_path='/content/drive/MyDrive/Colab notebooks/Kernel Regularizers with NN/dataset/test'
 
plt.figure(figsize=(15,5))
img=load_img(train_path + "/African/af_tr109.jpg")
plt.imshow(img)
plt.axis("off")
plt.title("African Elephant Image")
plt.show()
 
plt.figure()
 
img=load_img(train_path + "/Asian/as_tr114.jpg")
plt.imshow(img)
plt.axis("off")
plt.title("Asian Elephant Image")
plt.show()

Once the sample images were visualized, a Sequential TensorFlow model was built with the layers shown below and compiled with a binary cross-entropy loss, the Adam optimizer and accuracy as the evaluation metric.

Model without Kernel Regularization

import tensorflow as tf
from tensorflow.keras.layers import Dense,MaxPooling2D,Conv2D,Flatten
from tensorflow.keras.models import Sequential
 
img_row=150
img_col=150
 
model1=Sequential()
model1.add(Conv2D(64,(5,5),activation='relu',input_shape=(img_row,img_col,3)))
model1.add(MaxPooling2D(pool_size=(2,2)))
model1.add(Conv2D(32,(5,5),activation='relu'))
model1.add(MaxPooling2D(pool_size=(2,2)))
model1.add(Conv2D(16,(5,5),activation='relu'))
model1.add(MaxPooling2D(pool_size=(2,2)))
model1.add(Flatten())
model1.add(Dense(126,activation='relu'))
model1.add(Dense(52,activation='relu'))
model1.add(Dense(1,activation='sigmoid'))
model1.compile(loss='binary_crossentropy',optimizer='adam',metrics=['accuracy'])
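Since none of the Conv2D layers sets padding, they use Keras's default 'valid' padding, so each Conv2D/MaxPooling2D pair shrinks the feature map. The pure-Python sketch below traces the spatial size down to the Flatten layer, assuming stride-1 convolutions and 2x2 pooling with stride 2 (the Keras defaults for these layers); the helper names are illustrative, not Keras APIs.

```python
def conv_valid(size, kernel):
    """Output size of a stride-1 convolution with 'valid' padding."""
    return size - kernel + 1


def pool2(size):
    """Output size of 2x2 max pooling with stride 2 and 'valid' padding."""
    return size // 2


def flatten_size(img_size, kernel, filters=(64, 32, 16)):
    """Length of the vector produced by Flatten after the conv/pool stack."""
    size = img_size
    for _ in filters:
        size = pool2(conv_valid(size, kernel))
    return size * size * filters[-1]


print(flatten_size(150, 5))  # base model, 5x5 kernels: 3600
print(flatten_size(150, 3))  # 3x3 kernels, as in the regularized models: 4624
```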

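Since the dataset consists of images, it was preprocessed with Keras's ImageDataGenerator class as shown below: training images are rescaled to the range [0, 1] and augmented with random shearing, zooming and horizontal flips, while test images are only rescaled.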

from tensorflow.keras.preprocessing.image import ImageDataGenerator,load_img

train_datagen=ImageDataGenerator(rescale=1./255,shear_range=0.2,zoom_range=0.2,horizontal_flip=True)
test_datagen=ImageDataGenerator(rescale=1./255)
train_set=train_datagen.flow_from_directory(train_path,target_size=(img_row,img_col),
                                           batch_size=64,class_mode='binary')
test_set=test_datagen.flow_from_directory(test_path,target_size=(img_row,img_col),
                                           batch_size=64,class_mode='binary')
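Two of these transforms are easy to show on a toy image. The sketch below is plain Python on a hypothetical 1x3 RGB image; it only mirrors what rescale=1./255 and a horizontal flip do to pixel values. ImageDataGenerator applies flips at random, and shear and zoom are not reproduced here.

```python
def rescale(image, factor=1.0 / 255):
    """Multiply every pixel value by `factor`, mapping 0..255 to 0..1."""
    return [[[c * factor for c in pixel] for pixel in row] for row in image]


def horizontal_flip(image):
    """Mirror the image left to right by reversing each row of pixels."""
    return [list(reversed(row)) for row in image]


# Hypothetical 1x3 RGB image: a red, a green and a blue pixel.
image = [[[255, 0, 0], [0, 255, 0], [0, 0, 255]]]
print(rescale(image))          # channel values now lie in [0, 1]
print(horizontal_flip(image))  # blue pixel first, red pixel last
```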

With the data preprocessed, the compiled model was fit for 50 epochs, validated on the test set, and then evaluated for training and testing loss and accuracy.

# fit() accepts generators directly; fit_generator() is deprecated in TF 2.x
model1_res=model1.fit(train_set,steps_per_epoch=840//64,
                      epochs=50,validation_data=test_set,
                      validation_steps=188//64)
model1.evaluate(train_set)  ## training loss and training accuracy

model1.evaluate(test_set)  ## testing loss and accuracy
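The floor divisions in the fit call decide how much data each epoch actually sees. Assuming 840 and 188 are the numbers of training and test images (the article does not state this explicitly), the arithmetic works out as follows:

```python
import math

train_images, test_images, batch_size = 840, 188, 64  # values from the fit call

steps_per_epoch = train_images // batch_size               # 13 batches per epoch
validation_steps = test_images // batch_size               # 2 validation batches
batches_available = math.ceil(train_images / batch_size)   # 14; the last holds 8 images

print(steps_per_epoch * batch_size)   # at most 832 training images per epoch
print(validation_steps * batch_size)  # at most 128 of the 188 test images per validation pass
```

In other words, each validation pass during training looks at only part of the test set, so the validation curves and the final evaluate() results need not match exactly.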

Model-1 serves as the base model: its loss and accuracy are the reference against which the kernel-regularized models are compared.

Applying Kernel Regularization before the Flattening Layer

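Next, let us apply kernel regularization (L1, with a factor of 0.001) to the last convolutional layer, just before the flattening layer, and compare the model's performance with the base model. Note that, unlike the base model, this model uses 3x3 convolution kernels instead of 5x5 ones.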

from tensorflow.keras import regularizers  # provides regularizers.l1

img_row=150
img_col=150
 
model3=Sequential()
model3.add(Conv2D(64,(3,3),activation='relu',input_shape=(img_row,img_col,3)))
model3.add(MaxPooling2D(pool_size=(2,2)))
model3.add(Conv2D(32,(3,3),activation='relu'))
model3.add(MaxPooling2D(pool_size=(2,2)))
model3.add(Conv2D(16,(3,3),activation='relu',kernel_regularizer=regularizers.l1(0.001)))
model3.add(MaxPooling2D(pool_size=(2,2)))
model3.add(Flatten())
model3.add(Dense(126,activation='relu'))
model3.add(Dense(52,activation='relu'))
model3.add(Dense(1,activation='sigmoid'))
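To see exactly which parameters the regularizer touches, count them for the regularized layer. A Keras Conv2D kernel has shape (kernel_height, kernel_width, input_channels, filters) plus one bias per filter; the helper below is an illustrative sketch of that count, not a Keras function.

```python
def conv2d_param_counts(kernel_h, kernel_w, in_channels, filters):
    """Return (kernel weights, bias terms) of a Conv2D layer with use_bias=True."""
    kernel_weights = kernel_h * kernel_w * in_channels * filters
    bias_terms = filters
    return kernel_weights, bias_terms


# Third Conv2D of model3: 3x3 kernels, 32 input channels (from the previous
# Conv2D), 16 filters. kernel_regularizer acts on the first number only.
penalized, unpenalized = conv2d_param_counts(3, 3, 32, 16)
print(penalized, unpenalized)  # 4608 16
```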

The model is then compiled, fit for 50 epochs, and evaluated for training and testing loss and accuracy, as shown below.

model3.compile(loss='binary_crossentropy',optimizer='adam',metrics=['accuracy'])
model3_res=model3.fit(train_set,steps_per_epoch=840//64,epochs=50,
                      validation_data=test_set,
                      validation_steps=188//64)
model3.evaluate(train_set)  ## training loss and training accuracy

model3.evaluate(test_set)  ## testing loss and testing accuracy

Compared with the base model, this regularized model has a higher training loss (0.621 vs 0.406) but a lower testing loss (0.612 vs 0.644), and its training and testing metrics are much closer together. Its testing accuracy, however, is slightly lower (0.696 vs 0.728). Applying a kernel regularizer to the last convolutional layer, just before the flattening layer, therefore reduced the overfitting of the base model rather than improving raw test accuracy.

Using Kernel Regularization at two layers

Here kernel regularization is applied at two layers: the second convolutional layer and the dense layer just before the output layer. The model architecture is shown below, compiled with the same loss function and metrics.

from tensorflow.keras import regularizers  # provides regularizers.l1

img_row=150
img_col=150
 
model5=Sequential()
model5.add(Conv2D(64,(3,3),activation='relu',input_shape=(img_row,img_col,3)))
model5.add(MaxPooling2D(pool_size=(2,2)))
model5.add(Conv2D(32,(3,3),activation='relu',kernel_regularizer=regularizers.l1(0.001)))
model5.add(MaxPooling2D(pool_size=(2,2)))
model5.add(Conv2D(16,(3,3),activation='relu'))
model5.add(MaxPooling2D(pool_size=(2,2)))
model5.add(Flatten())
model5.add(Dense(126,activation='relu'))
model5.add(Dense(52,activation='relu',kernel_regularizer=regularizers.l1(0.001)))
model5.add(Dense(1,activation='sigmoid'))

model5.compile(loss='binary_crossentropy',optimizer='adam',metrics=['accuracy'])
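With two regularized layers, the penalties simply add up. In Keras each layer's regularization term is collected in model.losses, and during training these terms are added to the compiled loss. The pure-Python sketch below mimics that with tiny, made-up stand-ins for the two regularized kernels and a hypothetical data-loss value.

```python
def l1(weights, lam):
    """L1 penalty for one layer's flattened kernel weights."""
    return lam * sum(abs(w) for w in weights)


conv_kernel = [0.3, -0.2, 0.1]  # hypothetical stand-in for the 32-filter Conv2D kernel
dense_kernel = [0.5, -0.5]      # hypothetical stand-in for the Dense(52) kernel
data_loss = 0.6                 # hypothetical binary cross-entropy value

layer_penalties = [l1(conv_kernel, 0.001), l1(dense_kernel, 0.001)]
total_loss = data_loss + sum(layer_penalties)  # what the optimizer minimizes
print(layer_penalties, total_loss)
```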

The model was then fit for 50 epochs and evaluated for training and testing loss and accuracy.

model5_res=model5.fit(train_set,steps_per_epoch=840//64,epochs=50,
                      validation_data=test_set,
                      validation_steps=188//64)

model5.evaluate(train_set)  ## training loss and training accuracy

model5.evaluate(test_set)  ## testing loss and testing accuracy

Compared with the base model, the model regularized at two layers has a slightly lower testing loss (0.638 vs 0.644), and its training and testing accuracies (0.694 and 0.686) are almost equal. This points to a reliable model that behaves consistently on unseen data, even though its absolute accuracy is below that of the base model.

Key Outcomes of kernel regularizers with neural networks

Model	Training Loss	Training Accuracy	Testing Loss	Testing Accuracy
Base Model (model1)	0.406	0.805	0.644	0.728
Kernelized Model-1 (model3, regularizer before flattening)	0.621	0.656	0.612	0.696
Kernelized Model-2 (model5, regularizers at two layers)	0.603	0.694	0.638	0.686
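One way to read the table is to compare each model's training and testing metrics. The sketch below only reuses the numbers reported in the table above; the two "gap" measures are an illustrative addition, not something computed in the original notebook.

```python
# Metrics from the table: (train loss, train acc, test loss, test acc).
results = {
    "Base Model": (0.406, 0.805, 0.644, 0.728),
    "Kernelized Model-1": (0.621, 0.656, 0.612, 0.696),
    "Kernelized Model-2": (0.603, 0.694, 0.638, 0.686),
}


def generalization_gaps(metrics):
    """Return {name: (accuracy gap, loss gap)}; values near 0 mean train and test agree."""
    gaps = {}
    for name, (train_loss, train_acc, test_loss, test_acc) in metrics.items():
        gaps[name] = (train_acc - test_acc, test_loss - train_loss)
    return gaps


for name, (acc_gap, loss_gap) in generalization_gaps(results).items():
    print(f"{name}: accuracy gap {acc_gap:+.3f}, loss gap {loss_gap:+.3f}")
```

The base model's accuracy gap of about 0.077 and loss gap of about 0.238 are the largest of the three, which is the overfitting signature the regularized models shrink.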

Regularization is not required for every neural network architecture; it is most useful for complex, deeper networks.
A kernel regularizer applied to the last convolutional layer, just before the flattening layer, lowered the testing loss and narrowed the gap between training and testing metrics, reducing overfitting, although testing accuracy dropped slightly.
Using kernel regularizers at multiple layers of a complex network reduced overfitting and yielded a reliable model whose training and testing metrics are close, without large fluctuations between them.
On relatively simple datasets and shallow architectures, kernel regularizers may bring no measurable improvement, since such models have less capacity to overfit in the first place.

Summary

Fine-tuning complex neural networks helps them converge during training and generalize better. Among the various fine-tuning techniques, kernel regularization is well suited to complex or deep architectures: it adds a penalty term on the weights of a layer without altering the bias, which reduces overfitting and helps yield reliable models that behave consistently on unseen data or in changing test conditions.


Darshan M

Darshan holds a Master's degree in Data Science and Machine Learning and is an everyday learner of the latest trends in Data Science and Machine Learning. He is keen to learn new things, implement them, and curate rich content on Data Science, Machine Learning, NLP and AI.