
Hands-On Guide To Implement Batch Normalization in Deep Learning Models


Batch normalization is one of the important features we can add to our model: it acts as a regularizer, normalizes the inputs to each layer, helps during the backpropagation process, and can be adapted to most models to make them converge better. Here, in this article, we are going to discuss the batch normalization technique in detail. We will see the effectiveness of using batch normalization by comparing two models – one without batch normalization and the other with batch normalization – in the task of image classification.

Topics we cover in this article

  • How Does Batch Normalization work
  • Use of Batch Normalization
  • Defining and Training a Model without Batch Normalization
  • Defining and Training a Model with Batch Normalization

How Does Batch Normalization work?

Batch normalization helps in building an efficient model. It is a layer that we add between the layers of the neural network: it takes the output of the previous layer and normalizes it before passing it to the next layer. This has the effect of stabilizing the neural network during training. Batch normalization is also used to maintain the distribution of the data as it flows through the network.



The problem we face in neural networks is the internal covariate shift. While we are training the network, the distribution of the data flowing through its layers keeps changing, and the model trains more slowly. To maintain a similar distribution of data across layers, batch normalization normalizes the layer outputs to zero mean and unit standard deviation (μ=0, σ=1) over each mini-batch. By using this technique, the model trains faster, and its accuracy also increases compared to a model that does not use batch normalization.
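To make the normalization step concrete, here is a minimal NumPy sketch (the batch_norm helper and the sample values are illustrative only, not part of Keras): every feature in a mini-batch is shifted to zero mean and scaled to unit variance, and the layer then applies a learnable scale γ and shift β.

import numpy as np

# Minimal sketch of batch normalization for one mini-batch (illustrative only)
def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-3):
    mu = x.mean(axis=0)                    # per-feature mean over the batch
    var = x.var(axis=0)                    # per-feature variance over the batch
    x_hat = (x - mu) / np.sqrt(var + eps)  # normalize to mean 0, std 1
    return gamma * x_hat + beta            # learnable scale and shift

# Example mini-batch: 4 samples, 2 features on very different scales
batch = np.array([[1.0, 200.0],
                  [2.0, 220.0],
                  [3.0, 240.0],
                  [4.0, 260.0]])
print(batch_norm(batch))  # each column now has mean ~0 and std ~1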

When to use Batch Normalization?

We can use Batch Normalization in Convolutional Neural Networks, Recurrent Neural Networks, and Artificial Neural Networks. In practical coding, we add Batch Normalization either after or before the activation function of a layer. Most researchers have found good results when implementing Batch Normalization after the activation layer, as the sketch below shows.
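For illustration, here is a small sketch of the two placements using a Keras Sequential model (the layer sizes and the 784-feature input shape are arbitrary choices for this example):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, BatchNormalization, Activation

# Placement 1: Batch Normalization after the activation function
model_after = Sequential([
    Dense(64, activation='relu', input_shape=(784,)),
    BatchNormalization(),
    Dense(10, activation='softmax'),
])

# Placement 2: Batch Normalization before the activation function
model_before = Sequential([
    Dense(64, input_shape=(784,)),
    BatchNormalization(),
    Activation('relu'),
    Dense(10, activation='softmax'),
])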

By using Batch Normalization we can set higher learning rates, which speeds up the training process. Because the mean and variance are recomputed for every mini-batch, it provides better learning and increases the accuracy of the model.
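As a sketch of this idea (the value 0.01 is only an example, not a recommendation), a higher learning rate can simply be passed to the optimizer when compiling the model:

from tensorflow.keras.optimizers import Adam

# With batch normalization in the network, a learning rate larger than the
# default 0.001 can often be used; 0.01 here is only an illustrative value
optimizer = Adam(learning_rate=0.01)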

Batch Normalization also acts as a regularization technique, though it does not work exactly like L1, L2 or dropout regularization. By reducing the internal covariate shift and the instability in the distributions of layer activations in deeper networks, it can reduce the effect of overfitting and helps the model generalize better. So we can use Batch Normalization as a regularization technique.

Implementation of Batch Normalization in Keras

The BatchNormalization layer in Keras comes with default values for all of its parameters; we can change the value of any parameter and pass our own customized arguments.

keras.layers.BatchNormalization(axis=-1, momentum=0.99, epsilon=0.001, center=True, scale=True, beta_initializer='zeros', gamma_initializer='ones', moving_mean_initializer='zeros', moving_variance_initializer='ones', beta_regularizer=None, gamma_regularizer=None, beta_constraint=None, gamma_constraint=None)
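For example (a small sketch; the overridden values are arbitrary), only the parameters we want to change need to be passed:

from tensorflow.keras.layers import BatchNormalization

# Keep most defaults but override momentum and epsilon (example values only)
bn_layer = BatchNormalization(momentum=0.9, epsilon=1e-3)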

In the below code snippet we will import the required libraries 

from tensorflow.keras.datasets import mnist
import tensorflow
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.layers import Conv2D, MaxPooling2D
from tensorflow.keras.layers import BatchNormalization

In the below code snippet we will set the hyperparameters

# Model configuration
batch_size = 250
no_epochs = 25
no_classes = 10
validation_split = 0.2
verbosity = 1

Data Pre-Processing

Now we will work on defining a deep learning model for classifying the MNIST Dataset. Here, we will add Batch Normalization between the layers of the deep learning network model. The MNIST dataset taken here has 10 classes with handwritten digits.



Dividing the data into train and test and preprocessing the dataset

# Load MNIST dataset
(input_train, target_train), (input_test, target_test) =mnist.load_data()

# Shape of the input sets
input_train_shape = input_train.shape
input_test_shape = input_test.shape 

# Keras layer input shape
input_shape = (input_train_shape[1], input_train_shape[2], 1)

# Reshape the training data to include channels
input_train = input_train.reshape(input_train_shape[0], input_train_shape[1], input_train_shape[2], 1)

input_test = input_test.reshape(input_test_shape[0], input_test_shape[1], input_test_shape[2], 1)

# Parse numbers as floats
input_train = input_train.astype('float32')
input_test = input_test.astype('float32')

# Normalize input data
input_train = input_train / 255
input_test = input_test / 255

Building a Model Without Batch Normalization

Here we are going to define the deep learning model without adding the batch normalization between the layers. 

model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=input_shape))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(64, kernel_size=(3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(256, activation='relu'))
model.add(Dense(no_classes, activation='softmax'))

# Compile the model
model.compile(loss=tensorflow.keras.losses.sparse_categorical_crossentropy,
              optimizer=tensorflow.keras.optimizers.Adam(),
              metrics=['accuracy'])

# Fit data to model
history = model.fit(input_train, target_train,
            batch_size=batch_size,
            epochs=no_epochs,
            verbose=verbosity,
            validation_split=validation_split)

# Generate generalization metrics
score = model.evaluate(input_test, target_test, verbose=0)
print(f'Test loss: {score[0]} / Test accuracy: {score[1]}')
[Screenshot: training output of the model without batch normalization]

We can see from the above training that we achieved an accuracy of about 0.97 when batch normalization is not added.

Building a Model With Adding Batch Normalization

Now we will define the same model again, this time adding batch normalization between the layers.

model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=input_shape))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(BatchNormalization())
model.add(Conv2D(64, kernel_size=(3, 3), activation='relu'))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(BatchNormalization())
model.add(Flatten())
model.add(Dense(256, activation='relu'))
model.add(BatchNormalization())
model.add(Dense(no_classes, activation='softmax'))
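To train this model, we can compile and fit it with the same settings used for the baseline (a sketch; the variable name history_bn is our own choice so that the baseline history stays available for comparison):

# Compile and train the batch-normalized model with the same settings
model.compile(loss=tensorflow.keras.losses.sparse_categorical_crossentropy,
              optimizer=tensorflow.keras.optimizers.Adam(),
              metrics=['accuracy'])

history_bn = model.fit(input_train, target_train,
                       batch_size=batch_size,
                       epochs=no_epochs,
                       verbose=verbosity,
                       validation_split=validation_split)

# Evaluate on the held-out test set
score = model.evaluate(input_test, target_test, verbose=0)
print(f'Test loss: {score[0]} / Test accuracy: {score[1]}')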


As we can see above in the training epochs, by using the batch normalization technique we have achieved an accuracy of 0.992. Compared with the previous model without batch normalization, the accuracy has increased by about 2%.

Now, we will compare the accuracy plots of both training runs.

import matplotlib.pyplot as plt 
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='lower right')
plt.show()
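The snippet above can be run once for each model. Alternatively, as a sketch (assuming the history_bn variable from the training snippet above), both validation curves can be drawn on the same axes for a direct comparison:

# Compare the validation accuracy of both runs on one plot
plt.plot(history.history['val_accuracy'], label='without BatchNorm')
plt.plot(history_bn.history['val_accuracy'], label='with BatchNorm')
plt.title('validation accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(loc='lower right')
plt.show()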
Without using Batch Normalization:

[Accuracy plot for the model without batch normalization]

With Batch Normalization:

[Accuracy plot for the model with batch normalization]

As we can see in the above training plots, the accuracy is better when we add batch normalization. The convergence is also better compared to the model without batch normalization.

Conclusion

By analyzing the training performance and accuracy plots of both approaches – without and with batch normalization – we can conclude that adding batch normalization between the layers of the network improves the accuracy of the model and helps avoid overfitting as well. So batch normalization should always be considered when working with deep learning models.
