
Complete Tutorial On LeNet-5 | Guide To Begin With CNNs

In deep learning, Convolutional Neural Networks (CNNs or ConvNets) play a major role. CNNs are widely used in computer vision problems, natural language processing, time series analysis and recommendation systems. A ConvNet architecture mainly has three kinds of layers – convolutional layers, pooling layers and fully connected layers. These layers extract features from the input by detecting patterns through mathematical operations. Like other neural network architectures, CNNs are trained with the backpropagation algorithm; what sets them apart is the use of small, shared filters that exploit the spatial structure of the input.

To start with CNNs, LeNet-5 is a good model to learn first, as its architecture is simple and basic. In this article, I’ll be discussing the architecture of LeNet-5, one of the earliest convolutional neural networks ever built.

What is LeNet-5?

LeNet-5 was developed by Yann LeCun, one of the pioneers of deep learning, in his 1998 paper ‘Gradient-Based Learning Applied to Document Recognition’. It was used by banks to recognize handwritten digits on cheques and was trained on the MNIST dataset. Fully connected layers and activation functions were already known in neural networks; LeNet-5 introduced convolutional and pooling layers. It is widely regarded as the base on which later ConvNets were built.

Source – Yann LeCun’s website showing LeNet-5 demo

A convolution is a linear operation: the convolutional layer does the major job by sliding a small weight matrix (the kernel or filter) over the input, multiplying it element-wise with each patch and summing the results to produce a feature map.
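To make this concrete, here is a minimal NumPy sketch of a valid (no-padding) 2D convolution; the 5 × 5 input and the vertical-edge kernel are made up purely for illustration:

import numpy as np

def conv2d(image, kernel):
    # slide the kernel over the image and sum the element-wise products
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    output = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            output[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return output

image = np.arange(25, dtype=float).reshape(5, 5)    # toy 5 x 5 input
kernel = np.array([[1., 0., -1.],
                   [1., 0., -1.],
                   [1., 0., -1.]])                  # toy vertical-edge filter
print(conv2d(image, kernel).shape)                  # (3, 3) feature map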

A pooling layer generally comes after a convolutional layer. It reduces the spatial dimensions of the feature maps produced by the convolutional layers, which helps curb overfitting.
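A rough sketch of 2 × 2 average pooling on a single feature map (non-overlapping windows, which is the Keras AveragePooling2D default) could look like this:

import numpy as np

def average_pool(feature_map, pool=2):
    # non-overlapping pooling: each output value is the mean of a pool x pool window
    h, w = feature_map.shape
    trimmed = feature_map[:h - h % pool, :w - w % pool]
    return trimmed.reshape(h // pool, pool, w // pool, pool).mean(axis=(1, 3))

print(average_pool(np.arange(16, dtype=float).reshape(4, 4)))  # 2 x 2 output, each value the mean of a 2 x 2 block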

Architecture

LeNet-5 consists of 7 layers – two convolutional layers alternating with two average pooling layers, followed by two fully connected layers and the output layer (a softmax layer in modern implementations, including the one below).

Original Image of LeNet-5 architecture

1) MNIST images are 28 × 28 pixels, but they are zero-padded to 32 × 32 pixels and normalized before being fed into the network. The feature maps shrink as the signal moves deeper into the network.

2) In the average pooling layers, each neuron computes the mean of its inputs, multiplies the result by a learnable coefficient, adds a learnable bias term and finally applies the activation function.

3) Most neurons in the third convolutional layer (C3) are connected to only three or four of the feature maps of the second average pooling layer (S2), rather than to all six.

4) In the original output layer, each neuron outputs the square of the Euclidean distance between its input vector and its weight vector. Each output measures how much the image belongs to a particular digit class; modern implementations prefer a softmax output trained with the cross-entropy cost function, as sketched below.
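For reference, here is a sketch of the original architecture in Keras (32 × 32 inputs, 5 × 5 kernels, tanh activations, with a softmax output in place of the paper’s Euclidean RBF units); note that the implementation later in this article deviates slightly by using 28 × 28 inputs and 3 × 3 kernels:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, AveragePooling2D, Flatten, Dense

lenet5_original = Sequential([
    Conv2D(6, (5, 5), activation='tanh', input_shape=(32, 32, 1)),  # C1: 6 maps of 28 x 28
    AveragePooling2D(pool_size=(2, 2)),                             # S2: 6 maps of 14 x 14
    Conv2D(16, (5, 5), activation='tanh'),                          # C3: 16 maps of 10 x 10
    AveragePooling2D(pool_size=(2, 2)),                             # S4: 16 maps of 5 x 5
    Conv2D(120, (5, 5), activation='tanh'),                         # C5: 120 maps of 1 x 1
    Flatten(),
    Dense(84, activation='tanh'),                                   # F6
    Dense(10, activation='softmax')                                 # output layer
])
lenet5_original.summary()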

Implementation of LeNet-5

We implement LeNet-5 on the MNIST dataset for handwritten digit recognition.

Importing libraries:

import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten, Conv2D, AveragePooling2D

Loading MNIST and splitting into training and testing datasets

mnist = tf.keras.datasets.mnist
(x_train, y_train),(x_test, y_test) = mnist.load_data()

Reshaping the images to add the single channel dimension

x_train = x_train.reshape(x_train.shape[0], 28, 28, 1)
x_test = x_test.reshape(x_test.shape[0], 28, 28, 1)

Normalization

x_train = tf.keras.utils.normalize(x_train, axis=1)  
x_test = tf.keras.utils.normalize(x_test, axis=1) 
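Note that tf.keras.utils.normalize applies L2 normalization along the given axis. A common alternative (not used here) is simply to rescale the raw pixel values from [0, 255] to [0, 1]:

# alternative preprocessing: scale pixel intensities to [0, 1]
# x_train, x_test = x_train / 255.0, x_test / 255.0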

Model Building

model = Sequential()
model.add(Conv2D(filters=6, kernel_size=(3, 3), activation='tanh', input_shape=(28,28,1)))
model.add(AveragePooling2D())
model.add(Conv2D(filters=16, kernel_size=(3, 3), activation='tanh'))
model.add(AveragePooling2D())
model.add(Flatten())
model.add(Dense(units=128, activation='tanh'))
model.add(Dense(units=10, activation = 'softmax'))
model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_4 (Conv2D)            (None, 26, 26, 6)         60        
_________________________________________________________________
average_pooling2d_4 (Average (None, 13, 13, 6)         0         
_________________________________________________________________
conv2d_5 (Conv2D)            (None, 11, 11, 16)        880       
_________________________________________________________________
average_pooling2d_5 (Average (None, 5, 5, 16)          0         
_________________________________________________________________
flatten_2 (Flatten)          (None, 400)               0         
_________________________________________________________________
dense_4 (Dense)              (None, 128)               51328     
_________________________________________________________________
dense_5 (Dense)              (None, 10)                1290      
=================================================================
Total params: 53,558
Trainable params: 53,558
Non-trainable params: 0

Model compilation

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

Fitting the model

model.fit(x_train, y_train, 
          epochs=10, 
          validation_data=(x_test, y_test))
Epoch 1/10
1875/1875 [==============================] - 28s 15ms/step - loss: 0.2933 - accuracy: 0.9136 - val_loss: 0.1394 - val_accuracy: 0.9579
Epoch 2/10
1875/1875 [==============================] - 25s 13ms/step - loss: 0.1209 - accuracy: 0.9637 - val_loss: 0.1100 - val_accuracy: 0.9677
Epoch 3/10
1875/1875 [==============================] - 25s 13ms/step - loss: 0.0829 - accuracy: 0.9746 - val_loss: 0.0799 - val_accuracy: 0.9752
Epoch 4/10
1875/1875 [==============================] - 25s 14ms/step - loss: 0.0625 - accuracy: 0.9811 - val_loss: 0.0612 - val_accuracy: 0.9810
Epoch 5/10
1875/1875 [==============================] - 26s 14ms/step - loss: 0.0510 - accuracy: 0.9841 - val_loss: 0.0609 - val_accuracy: 0.9804
Epoch 6/10
1875/1875 [==============================] - 25s 13ms/step - loss: 0.0417 - accuracy: 0.9875 - val_loss: 0.0531 - val_accuracy: 0.9832
Epoch 7/10
1875/1875 [==============================] - 25s 13ms/step - loss: 0.0355 - accuracy: 0.9890 - val_loss: 0.0518 - val_accuracy: 0.9826
Epoch 8/10
1875/1875 [==============================] - 25s 14ms/step - loss: 0.0300 - accuracy: 0.9906 - val_loss: 0.0585 - val_accuracy: 0.9809
Epoch 9/10
1875/1875 [==============================] - 26s 14ms/step - loss: 0.0257 - accuracy: 0.9919 - val_loss: 0.0503 - val_accuracy: 0.9844
Epoch 10/10
1875/1875 [==============================] - 30s 16ms/step - loss: 0.0210 - accuracy: 0.9937 - val_loss: 0.0515 - val_accuracy: 0.9836
<tensorflow.python.keras.callbacks.History at 0x7fcf73ee0ef0>

Model prediction

predictions = model.predict(x_test)
print(np.argmax(predictions[0]))

OUTPUT : 7

plt.imshow(x_test[0].reshape(28, 28), cmap=plt.cm.binary)
plt.show()
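To put a single number on the test-set performance, model.evaluate reports the same metrics tracked during training:

test_loss, test_acc = model.evaluate(x_test, y_test, verbose=0)
print('Test accuracy:', test_acc)   # roughly 0.98 for the run shown above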

So our model has performed well, reaching about 98% validation accuracy, and the first test image is correctly predicted as 7.

Note that I have strictly followed the architecture and created the model with the specified layers and activation functions; these can be tweaked and experimented with. For example, the ReLU activation function could be used in place of tanh, as sketched below.
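As a sketch, a ReLU variant of the same model (keeping softmax at the output) would look like this:

model_relu = Sequential()
model_relu.add(Conv2D(filters=6, kernel_size=(3, 3), activation='relu', input_shape=(28, 28, 1)))
model_relu.add(AveragePooling2D())
model_relu.add(Conv2D(filters=16, kernel_size=(3, 3), activation='relu'))
model_relu.add(AveragePooling2D())
model_relu.add(Flatten())
model_relu.add(Dense(units=128, activation='relu'))
model_relu.add(Dense(units=10, activation='softmax'))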

Conclusion

LeNet-5 is an excellent model for beginners learning about convolutional neural networks. It gives a basic understanding of how CNNs work, and the functionality of convolutional, pooling and fully connected layers is well illustrated by this network.

The complete code of the above implementation is available in AIM’s GitHub repositories. Please visit this link to find the code.

Jayita Bhattacharyya

Machine learning and data science enthusiast. Eager to learn new technology advances. A self-taught techie who loves to do cool and worthwhile things with technology, for fun.