In deep learning, Convolutional Neural Networks (CNNs or ConvNets) play a major role. CNNs are widely used in computer vision, natural language processing, time series analysis and recommendation systems. A ConvNet architecture mainly has three types of layers – convolutional, pooling and fully connected layers. These layers extract features from the input by detecting patterns through mathematical operations. Like other neural network architectures, CNNs are trained with the backpropagation algorithm.
To start with CNNs, LeNet-5 is a good model to learn first, as its architecture is simple and basic. In this article, I’ll discuss the architecture of LeNet-5, one of the earliest convolutional neural networks ever built.
What is LeNet-5?
LeNet-5 was developed by Yann LeCun, one of the pioneers of deep learning, and presented in his 1998 paper ‘Gradient-Based Learning Applied to Document Recognition’. LeNet was used by banks to recognize handwritten digits on cheques, and it was trained and evaluated on the MNIST dataset. Fully connected layers and activation functions were already well known in neural networks; LeNet-5 introduced convolutional and pooling layers on top of them. LeNet-5 is widely regarded as the foundation on which later ConvNets were built.

Source – Yann LeCun’s website showing LeNet-5 demo
A convolution is a linear operation: the convolutional layer slides a weight matrix (kernel/filter) over the input, multiplying and summing at each position to produce a feature map.
A pooling layer generally comes after a convolutional layer. It downsamples the feature maps produced by the convolutional layers, reducing their high dimensionality and helping to curb overfitting.
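To make these two operations concrete, here is a minimal sketch (my addition, not part of the original article) that applies one convolution and one average pooling step to a tiny tensor in TensorFlow; the image and kernel values are arbitrary.

import numpy as np
import tensorflow as tf

# A 4x4 single-channel "image" and a 2x2 kernel (filter); values are arbitrary.
image = tf.constant(np.arange(16, dtype=np.float32).reshape(1, 4, 4, 1))
kernel = tf.constant(np.array([[1., 0.], [0., -1.]], dtype=np.float32).reshape(2, 2, 1, 1))

# Convolution: slide the kernel over the image, multiplying and summing at each position.
conv_out = tf.nn.conv2d(image, kernel, strides=1, padding='VALID')          # shape (1, 3, 3, 1)

# Average pooling: replace each 2x2 patch with its mean, halving the spatial size.
pool_out = tf.nn.avg_pool2d(conv_out, ksize=2, strides=2, padding='VALID')  # shape (1, 1, 1, 1)

print(conv_out.shape, pool_out.shape)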
Architecture
LeNet-5 consists of 7 layers – two convolutional layers alternating with two average pooling layers, followed by two fully connected layers and the output layer (implemented below with a softmax activation).
Original Image of LeNet-5 architecture
1) MNIST images are 28 × 28 pixels, but in the original network they are zero-padded to 32 × 32 pixels and normalized before being fed to the network (see the padding sketch after this list). The input shrinks as it moves deeper into the network.
2) In the average pooling layers, each neuron computes the mean of its inputs, multiplies the result by a learnable coefficient, adds a learnable bias term, and finally applies the activation function.
3) Most neurons in the third convolutional layer are connected to neurons in only three or four of the feature maps of the second average pooling layer.
4) In the output layer of the original LeNet-5, each neuron outputs the square of the Euclidean distance between its input vector and its weight vector; the smaller the distance, the more likely the image belongs to that neuron’s digit class. In the implementation below, this is replaced by a softmax output trained with the cross-entropy cost function.
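Point 1 refers to the original paper, where the 28 × 28 MNIST digits are zero-padded to 32 × 32. The Keras implementation below feeds 28 × 28 images directly, but if you want to reproduce the original input size, a sketch like the following (my addition, not part of the article’s code) would work:

import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

# Add a channel dimension, then zero-pad 2 pixels on each side: 28x28 -> 32x32,
# matching the input size used in the original LeNet-5 paper.
x_train = x_train.reshape(-1, 28, 28, 1)
x_train_padded = tf.pad(x_train, paddings=[[0, 0], [2, 2], [2, 2], [0, 0]])
print(x_train_padded.shape)  # (60000, 32, 32, 1)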
Implementation of LeNet-5
We implement LeNet-5 on the MNIST dataset for handwritten digit recognition.
Importing libraries:
# numpy and matplotlib are used later for prediction and plotting
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten, Conv2D, AveragePooling2D
Loading MNIST and splitting into training and testing datasets
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
Reshaping image dimensions
x_train = x_train.reshape(x_train.shape[0], 28, 28, 1)
x_test = x_test.reshape(x_test.shape[0], 28, 28, 1)
Normalization
x_train = tf.keras.utils.normalize(x_train, axis=1)
x_test = tf.keras.utils.normalize(x_test, axis=1)
Model Building
model = Sequential()
model.add(Conv2D(filters=6, kernel_size=(3, 3), activation='tanh', input_shape=(28, 28, 1)))
model.add(AveragePooling2D())
model.add(Conv2D(filters=16, kernel_size=(3, 3), activation='tanh'))
model.add(AveragePooling2D())
model.add(Flatten())
model.add(Dense(units=128, activation='tanh'))
model.add(Dense(units=10, activation='softmax'))
model.summary()
Model: "sequential" _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= conv2d_4 (Conv2D) (None, 26, 26, 6) 60 _________________________________________________________________ average_pooling2d_4 (Average (None, 13, 13, 6) 0 _________________________________________________________________ conv2d_5 (Conv2D) (None, 11, 11, 16) 880 _________________________________________________________________ average_pooling2d_5 (Average (None, 5, 5, 16) 0 _________________________________________________________________ flatten_2 (Flatten) (None, 400) 0 _________________________________________________________________ dense_4 (Dense) (None, 128) 51328 _________________________________________________________________ dense_5 (Dense) (None, 10) 1290 ================================================================= Total params: 53,558 Trainable params: 53,558 Non-trainable params: 0
Model compilation
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
Fitting the model
model.fit(x_train, y_train, epochs=10, validation_data=(x_test, y_test))
Epoch 1/10
1875/1875 [==============================] - 28s 15ms/step - loss: 0.2933 - accuracy: 0.9136 - val_loss: 0.1394 - val_accuracy: 0.9579
Epoch 2/10
1875/1875 [==============================] - 25s 13ms/step - loss: 0.1209 - accuracy: 0.9637 - val_loss: 0.1100 - val_accuracy: 0.9677
Epoch 3/10
1875/1875 [==============================] - 25s 13ms/step - loss: 0.0829 - accuracy: 0.9746 - val_loss: 0.0799 - val_accuracy: 0.9752
Epoch 4/10
1875/1875 [==============================] - 25s 14ms/step - loss: 0.0625 - accuracy: 0.9811 - val_loss: 0.0612 - val_accuracy: 0.9810
Epoch 5/10
1875/1875 [==============================] - 26s 14ms/step - loss: 0.0510 - accuracy: 0.9841 - val_loss: 0.0609 - val_accuracy: 0.9804
Epoch 6/10
1875/1875 [==============================] - 25s 13ms/step - loss: 0.0417 - accuracy: 0.9875 - val_loss: 0.0531 - val_accuracy: 0.9832
Epoch 7/10
1875/1875 [==============================] - 25s 13ms/step - loss: 0.0355 - accuracy: 0.9890 - val_loss: 0.0518 - val_accuracy: 0.9826
Epoch 8/10
1875/1875 [==============================] - 25s 14ms/step - loss: 0.0300 - accuracy: 0.9906 - val_loss: 0.0585 - val_accuracy: 0.9809
Epoch 9/10
1875/1875 [==============================] - 26s 14ms/step - loss: 0.0257 - accuracy: 0.9919 - val_loss: 0.0503 - val_accuracy: 0.9844
Epoch 10/10
1875/1875 [==============================] - 30s 16ms/step - loss: 0.0210 - accuracy: 0.9937 - val_loss: 0.0515 - val_accuracy: 0.9836
<tensorflow.python.keras.callbacks.History at 0x7fcf73ee0ef0>
Model prediction
predictions = model.predict(x_test)
print(np.argmax(predictions[0]))
OUTPUT : 7
# reshape back to 28x28 so imshow can render the single-channel image
plt.imshow(x_test[0].reshape(28, 28), cmap=plt.cm.binary)
plt.show()
So our model has performed well, reaching about 98% validation accuracy, and it predicts the first test image correctly.
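To quantify this, the trained model can be evaluated on the held-out test set with model.evaluate; a minimal sketch (my addition, not part of the original article):

# Evaluate the trained model on the test set; returns the loss and the
# accuracy metric passed to model.compile.
test_loss, test_acc = model.evaluate(x_test, y_test, verbose=0)
print(f"Test accuracy: {test_acc:.4f}")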
Note that the model above follows the overall LeNet-5 structure with the specified layers and activation functions; these can be tweaked and experimented with. For example, the ‘tanh’ activation function could be replaced with ‘ReLU’, as in the sketch below.
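Here is one such variant (a sketch, not the article’s original model): the same layer structure, but with ReLU in the convolutional and hidden dense layers; everything else, including the softmax output and the compilation step, stays the same.

# Same structure as above, with ReLU replacing tanh in the hidden layers.
model_relu = Sequential()
model_relu.add(Conv2D(filters=6, kernel_size=(3, 3), activation='relu', input_shape=(28, 28, 1)))
model_relu.add(AveragePooling2D())
model_relu.add(Conv2D(filters=16, kernel_size=(3, 3), activation='relu'))
model_relu.add(AveragePooling2D())
model_relu.add(Flatten())
model_relu.add(Dense(units=128, activation='relu'))
model_relu.add(Dense(units=10, activation='softmax'))
model_relu.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])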
Conclusion
LeNet-5 is an excellent model for beginners learning about convolutional neural networks, as it gives a basic understanding of how CNNs work. The roles of convolutional, pooling and fully connected layers are well illustrated by this network.
The complete code of the above implementation is available in AIM’s GitHub repository. Please visit this link to find the code.