Many classification tasks involve images and videos, and in such tasks the convolutional layer plays a key role. “In mathematics, convolution is a mathematical operation of two functions such that it produces a third function that expresses how another function modifies the shape of one function.”
Applied to a CNN, this definition means that an input image and a filter, both represented as matrices, are multiplied to produce an output that is used to extract features from the image. Applying a filter to one region of the input yields a single activation. Applying the same filter repeatedly across the whole image yields a map of activations called a feature map, which indicates the location and strength of a detected feature in the input image.
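For reference, the quoted definition can be written out as follows. This is a sketch of the standard continuous and discrete forms, with the symbols chosen here for illustration: f stands for the input and g for the filter. The last line is the unflipped variant (cross-correlation), which is what deep learning libraries typically compute even though they call it convolution.

```latex
(f * g)(t) = \int_{-\infty}^{\infty} f(\tau)\, g(t - \tau)\, d\tau

(f * g)[n] = \sum_{k} f[k]\, g[n - k]

(f \star g)[n] = \sum_{k} f[n + k]\, g[k]
```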
In this article, we will learn:
- The intuition of convolution in CNN.
- How can filters be handcrafted?
- How to calculate feature map from 1D and 2D data?
1. Intuition of convolution in CNN:
A CNN is a special type of neural network designed to work on image data, which can be one-dimensional, two-dimensional, and sometimes three-dimensional. Applications include image and video recognition, image classification, medical image analysis, computer vision, and natural language processing.
In the context of a CNN, convolution is a linear operation that multiplies a set of weights with the input image, represented as a matrix, much as a traditional neural network multiplies weights with its inputs. Here the array of weights is called a filter or kernel.
The stride defines how the filter moves; with stride=1, which is the default value, the kernel moves one step at a time.
Usually, the filter is smaller than the input data, and the type of multiplication applied between the filter and a filter-sized patch of the input is the dot product. Here the dot product is an element-wise multiplication between the filter weights and the filter-sized patch of the input, summed into a single value.
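The dot product described above can be sketched in a few lines of NumPy. The patch and kernel values below are made up for illustration and do not come from a real image.

```python
import numpy as np

# A filter-sized patch of the input and a filter of the same shape.
# Both arrays are hypothetical values chosen for illustration.
patch = np.array([[0., 1., 0.],
                  [0., 1., 0.],
                  [0., 1., 0.]])
kernel = np.array([[0., 1., 0.],
                   [0., 1., 0.],
                   [0., 1., 0.]])

# Element-wise multiplication followed by a sum gives one scalar.
response = np.sum(patch * kernel)
print(response)  # 3.0
```

The result is large because the patch contains exactly the pattern the kernel looks for; a patch of zeros would give 0.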
The filter is deliberately chosen to be smaller than the input so that the same set of filter weights can be multiplied with the input array many times, at different positions on the image. In simple words, the filter is applied systematically to each filter-sized patch of the input, from left to right and top to bottom.
This systematic application of the same filter across the image is used to detect a specific type of feature in the input data. As mentioned earlier, a single dot product of the filter with one patch of the input yields a single scalar value. Because the filter is applied many times across the input image, the result is a two-dimensional output array that represents a filtered version of the input. This two-dimensional output array is called a feature map, and it is then passed through a non-linearity such as ReLU.
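The sliding procedure described above can be sketched as a pair of loops. This is a minimal illustration, not how Keras implements it; the function names `feature_map_2d` and `relu`, the toy image, and the 2 x 2 kernel are all assumptions made for this example.

```python
import numpy as np

def feature_map_2d(image, kernel, stride=1):
    """Slide the kernel over the image (no padding) and collect dot products."""
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):            # top to bottom
        for j in range(out_w):        # left to right
            r, c = i * stride, j * stride
            patch = image[r:r + kh, c:c + kw]
            out[i, j] = np.sum(patch * kernel)
    return out

def relu(x):
    """Keep positive responses and set negative ones to zero."""
    return np.maximum(x, 0)

# Hypothetical 4x4 image: a bright band in the two middle columns.
image = np.tile(np.array([0., 1., 1., 0.]), (4, 1))
# Hypothetical 2x2 kernel that responds to a dark-to-bright step.
kernel = np.array([[-1., 1.],
                   [-1., 1.]])

fmap = feature_map_2d(image, kernel)         # each row is [2, 0, -2]
activated = relu(fmap)                       # each row is [2, 0, 0]
fmap_stride2 = feature_map_2d(image, kernel, stride=2)  # shape (2, 2)
```

With stride=1 the 4 x 4 input gives a 3 x 3 feature map; with stride=2 the filter skips every other position and the map shrinks to 2 x 2.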
2. How can filters be handcrafted?
Previously, filters were designed by hand by computer vision experts and then applied to an input image to produce a feature map.
Some examples of 3 x 3 filters:
Horizontal line detector
array([[[0., 0., 0.], [1., 1., 1.], [0., 0., 0.]]])
Vertical line detector
array([[[0., 1., 0.], [0., 1., 0.], [0., 1., 0.]]])
Applying one of these filters to an image produces a feature map that contains only the horizontal or only the vertical lines from the input image. The main idea of the convolution operation in a neural network, however, is that the filter weights are learned by the network during training.
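To see the two handcrafted detectors respond differently, here is a small NumPy sketch. The 5 x 5 toy image and the helper `apply_filter` are assumptions made for illustration; `sliding_window_view` requires NumPy 1.20 or later.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

horizontal = np.array([[0., 0., 0.],
                       [1., 1., 1.],
                       [0., 0., 0.]])
vertical = np.array([[0., 1., 0.],
                     [0., 1., 0.],
                     [0., 1., 0.]])

# Hypothetical 5x5 image containing a single horizontal line in row 2.
image = np.zeros((5, 5))
image[2, :] = 1.0

def apply_filter(image, kernel):
    """Dot product of the kernel with every kernel-sized patch (no padding)."""
    # For a 5x5 image and a 3x3 kernel, patches has shape (3, 3, 3, 3).
    patches = sliding_window_view(image, kernel.shape)
    return np.einsum("ijkl,kl->ij", patches, kernel)

h_map = apply_filter(image, horizontal)
v_map = apply_filter(image, vertical)
print(h_map)  # middle row is [3. 3. 3.], other rows are zero
print(v_map)  # every entry is 1.
```

The horizontal detector gives a strong response (3) exactly where the line sits and nothing elsewhere, while the vertical detector gives only a weak, uniform response (1) because each patch crosses the line in a single pixel.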
A convolutional neural network does not learn a single filter; it learns multiple filters, and therefore multiple feature maps, for a given input image. For example, if you set the number of filters to 30, the network extracts features from the input image in 30 different ways.
As for the channels of the input image, the filter must have the same number of channels as the input.
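The shapes involved can be sketched as follows, assuming a three-channel input and a bank of 30 filters. The random values and the sizes are illustrative only; the kernel layout (height, width, input channels, number of filters) mirrors the layout Keras uses for Conv2D weights.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(0)
image = rng.random((6, 6, 3))         # rows, columns, channels (hypothetical RGB input)
filters = rng.random((3, 3, 3, 30))   # kernel rows, kernel columns, channels, filters

# Windows over the two spatial axes only.
# Resulting shape: (4, 4, 3, 3, 3) = (out rows, out cols, channels, kernel rows, kernel cols)
patches = sliding_window_view(image, (3, 3), axis=(0, 1))

# Each filter spans all input channels and produces one feature map.
feature_maps = np.einsum("ijckl,klcf->ijf", patches, filters)
print(feature_maps.shape)  # (4, 4, 30)
```

Each of the 30 filters covers all three input channels at once, so the output has one channel per filter, not one per input channel.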
3. How to calculate feature map from 1D and 2D data
Here we work through the convolution operation in code and see how feature maps are extracted.
We can define one-dimensional and two-dimensional data as below:
data_1D = [0, 1, 0, 1, 1, 0]
data_2D = [[0, 1, 0, 1, 1, 0],
           [0, 1, 0, 1, 1, 0],
           [0, 1, 0, 1, 1, 0],
           [0, 1, 0, 1, 1, 0],
           [0, 1, 0, 1, 1, 0],
           [0, 1, 0, 1, 1, 0]]
The input to the Keras Conv1D layer must be three-dimensional, and the input to Conv2D must be four-dimensional. In both cases the first dimension is the number of samples; here we have only one. For 1D data, the second dimension is the length of each sample (six here) and the third is the number of channels of each sample (one here). For 2D data, the second and third dimensions are the number of rows and the number of columns (six each here).
The fourth dimension in 2D is the number of channels of each sample (one here).
Therefore the input shape for Conv1D must be [samples, length of sample, channels], which in our case is [1,6,1], and for Conv2D it must be [samples, rows, columns, channels], which in our case is [1,6,6,1].
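Convert the data into arrays and reshape them:
import numpy as np

# Conv1D input: (samples, length, channels)
data_1D = np.array(data_1D)
data_1D = data_1D.reshape(1, 6, 1)
# Conv2D input: (samples, rows, columns, channels)
data_2D = np.array(data_2D)
data_2D = data_2D.reshape(1, 6, 6, 1)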
Now we define a Sequential model consisting of a single Conv1D layer. Excluding the sample dimension, the layer expects an input shape of (6, 1), that is, six steps with one channel each. The model has one filter with a kernel size of two, in other words two elements wide. The same will be done for Conv2D.
from keras.models import Sequential
from keras.layers import Conv1D, Conv2D

# One filter, two elements wide; each sample has 6 steps and 1 channel
model1 = Sequential()
model1.add(Conv1D(1, kernel_size=2, input_shape=(6, 1)))
Here we set the filter weights explicitly rather than letting the network learn them: the kernel is [0, 1] and the bias is zero.
# Kernel of shape (kernel_size, in_channels, filters) = (2, 1, 1), followed by the bias
weights = [np.array([[[0]], [[1]]]), np.array([0.0])]
model1.set_weights(weights)
Finally, we pass the input data to the model with the predict method to see the result of the convolution operation.
model1.predict(data_1D)
Output:
array([[[1.], [0.], [1.], [1.], [0.]]], dtype=float32)
Now let us look at what exactly happened in the convolution operation.
First, the two elements of the filter, [0,1], are applied to the first two elements of the input, [0,1], and their dot product is 1. The filter then moves one step to the right and the same operation is repeated until the last two values of the input are reached.
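The walk-through above can be checked without Keras. The sketch below recomputes the same feature map in NumPy, once with an explicit sliding dot product and once with `np.correlate`, which in "valid" mode applies the kernel without flipping it.

```python
import numpy as np

data_1D = np.array([0, 1, 0, 1, 1, 0], dtype=float)
kernel = np.array([0, 1], dtype=float)

# Explicit version: dot product of the kernel with each window of two elements.
manual = np.array([np.dot(data_1D[i:i + 2], kernel)
                   for i in range(len(data_1D) - len(kernel) + 1)])

# Same computation with NumPy's built-in sliding dot product.
feature_map = np.correlate(data_1D, kernel, mode="valid")

print(manual)       # [1. 0. 1. 1. 0.]
print(feature_map)  # [1. 0. 1. 1. 0.]
```

Both versions reproduce the values returned by the model's predict call.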
Note that the length of the feature map is 5, whereas the input has a length of 6. This follows from how the filter is applied: a filter of width 2 fits into a sequence of length 6 at only 6 - 2 + 1 = 5 positions. You can change this by setting padding = 'same' in the Conv1D layer, which pads the input so that the feature map has the same length as the input sequence.
In a similar way, you can calculate the feature map for 2D data, as shown below:
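The relation between input length, kernel size, and output length can be captured in a small helper. The function name `conv_output_length` is an assumption made for this sketch; the two formulas are the usual ones for "valid" (no padding) and "same" padding.

```python
import math

def conv_output_length(input_length, kernel_size, stride=1, padding="valid"):
    """Length of the feature map along one axis."""
    if padding == "valid":
        # The kernel must fit entirely inside the input.
        return (input_length - kernel_size) // stride + 1
    if padding == "same":
        # The input is padded so that only the stride shrinks the output.
        return math.ceil(input_length / stride)
    raise ValueError("padding must be 'valid' or 'same'")

print(conv_output_length(6, 2))                  # 5
print(conv_output_length(6, 2, padding="same"))  # 6
```

For the example in this article, a length-6 input with a kernel of size 2 gives 5 values without padding and 6 values with padding = 'same'.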
model2 = Sequential()
model2.add(Conv2D(1, kernel_size=(3, 3), input_shape=(6, 6, 1), padding='same'))
detectors = [[[[1]], [[0]], [[0]]],
             [[[1]], [[0]], [[0]]],
             [[[0]], [[0]], [[2]]]]
weights = [np.array(detectors), np.array([0.0])]
model2.set_weights(weights)
model2.predict(data_2D)
Output:
array([[[[2.], [0.], [3.], [2.], [1.], [1.]],
        [[2.], [0.], [4.], [2.], [2.], [2.]],
        [[2.], [0.], [4.], [2.], [2.], [2.]],
        [[2.], [0.], [4.], [2.], [2.], [2.]],
        [[2.], [0.], [4.], [2.], [2.], [2.]],
        [[0.], [0.], [2.], [0.], [2.], [2.]]]], dtype=float32)
Because we set padding = 'same', the feature map has the same shape as the input data.
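The 2D output above can be reproduced by hand in NumPy. This sketch assumes that, for a 3 x 3 kernel with stride 1, 'same' padding amounts to one ring of zeros around the input; the loop then takes the dot product of the kernel with every 3 x 3 patch.

```python
import numpy as np

# Same 6x6 input as in the article: every row is [0, 1, 0, 1, 1, 0].
data_2D = np.tile(np.array([0, 1, 0, 1, 1, 0], dtype=float), (6, 1))
kernel = np.array([[1., 0., 0.],
                   [1., 0., 0.],
                   [0., 0., 2.]])

# One ring of zeros turns the 6x6 input into 8x8.
padded = np.pad(data_2D, 1, mode="constant")

out = np.zeros_like(data_2D)
for i in range(6):
    for j in range(6):
        out[i, j] = np.sum(padded[i:i + 3, j:j + 3] * kernel)

print(out.shape)  # (6, 6)
print(out[0])     # [2. 0. 3. 2. 1. 1.]
```

The first and last rows differ from the middle rows because part of the kernel falls on the zero padding there.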
Link for Google Colab Notebook
Conclusion:
In this article, we mainly discussed the intuition of convolution in convolutional neural networks. We saw how the filter is multiplied with the input data and how the feature map is built from the results. We then looked at what a filter is and how to apply custom filters to data to detect features. Finally, with the help of Python code, we verified the theory in practice.