
Max Pooling in Convolutional Neural Network and Its Features


The primary task of a deep neural network, especially in image recognition and video processing, is to extract features in a systematic way: early layers identify edges and gradients, textures form on top of those, and deeper convolutional layers assemble parts of objects and finally whole objects that summarize the features in an input image.

In this process, maintaining the same image size throughout the network means stacking many layers, which is not sustainable given the computing resources it demands. At the same time, we need enough convolutions to extract meaningful features.

Let's suppose we are trying to identify a cat. For this, we need to perform convolutions on the image by passing kernels over it.

[Image: selective focus photo of a gray tabby cat]

To gain a better understanding of this, let us split the image into parts. Consider two crops of the image: one contains the head of the cat along with the background space, while the other contains only the head. The head alone is enough for us to identify it as a cat; we don't need the background. In addition, we want the predominant features, such as the cat's eye, to be extracted, since they act as differentiators in identifying the image.

The feature maps produced by convolution layers are sensitive to the location of the features in the input. This can be addressed by downsampling the feature maps. So, to maintain a balance between computing resources and extracting meaningful features, downsampling should be done at proper intervals.

In order to achieve this, we use a concept called Pooling. Pooling provides an approach to downsample feature maps by summarizing the presence of features in the feature maps. 

The most commonly used Pooling methods are “Max Pooling” and “Average Pooling”.

Here we shall discuss Max Pooling.

Max Pooling is a sliding-window operation in which a kernel moves over the feature map and extracts the maximum value of the area it covers. Max Pooling effectively tells the Convolutional Neural Network that we will carry forward only the information with the largest amplitude in each region.

Max pooling on a 4*4 channel using a 2*2 kernel and a stride of 2: since we slide a 2*2 kernel over the channel, the first 2*2 window it covers contains the four values 8, 3, 4 and 7. Max pooling picks the maximum value from that window, which is 8.

[Figure: max pooling on a 4*4 channel with a 2*2 kernel and stride of 2: https://anilbhatt1tech.files.wordpress.com/2020/02/maxpool-2x2_p1.png?w=1024]
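The same step can be reproduced in PyTorch. The following is a minimal sketch: only the first 2*2 window (8, 3, 4, 7) comes from the example above; the remaining values are made up for illustration.

import torch
import torch.nn.functional as F

# 4x4 channel; only the top-left 2x2 window (8, 3, 4, 7) is taken from the
# example in the text, the other values are illustrative.
x = torch.tensor([[8., 3., 1., 2.],
                  [4., 7., 6., 5.],
                  [2., 0., 9., 3.],
                  [1., 5., 4., 8.]]).reshape(1, 1, 4, 4)  # (N, C, H, W)

# 2x2 max pooling with stride 2: each 2x2 window is reduced to its maximum.
out = F.max_pool2d(x, kernel_size=2, stride=2)
print(out.squeeze())
# tensor([[8., 6.],
#         [5., 9.]])  -> the first window (8, 3, 4, 7) contributes 8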

Here, in our context, the network learns a kernel that amplifies the cat's eye strongly enough that the predominant information is not lost even after Max Pooling. When Max Pooling then discards pixels, the 25% that remain are enough to retain the information about the cat (a 2*2 pooling with stride 2 keeps one value out of every four). So there will be at least one channel, or feature map, that carries the information of the cat's eye, while we get the benefit of dropping 75% of the pixels. Put another way, we filter out information we don't want by building kernels that let the required information pass through Max Pooling.

When Should You Perform Max Pooling in Your Network?

Analyze your image. Say it is 28*28 pixels. If, at some layer, the network has reached a receptive field of 5*5 on this image, some features become visible. Once some features can be extracted, it is advisable to do Max Pooling. It is not advisable to do Max Pooling in the initial stages of the Convolutional Neural Network, since the kernels there are still at the stage of extracting edges and gradients.

Code Illustration (in PyTorch)

We take an image of size 28*28. A convolution (Layer 1) with a 3*3 kernel is performed on it, giving a receptive field of 3*3. A second convolution (Layer 2) is performed, and the receptive field grows to 5*5. As a 5*5 receptive field is enough to identify features on a 28*28 image, Max Pooling is performed in the transition block, as shown in the sketch below.
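Below is a minimal sketch of that layout. The channel counts (8 and 16), the padding, and the ReLU activations are assumptions chosen for illustration; only the 3*3 kernels, the two convolution layers, and the max pooling transition come from the description above.

import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    def __init__(self):
        super().__init__()
        # Input: 1 x 28 x 28 image
        self.conv1 = nn.Conv2d(1, 8, kernel_size=3, padding=1)   # Layer 1, receptive field 3x3
        self.conv2 = nn.Conv2d(8, 16, kernel_size=3, padding=1)  # Layer 2, receptive field 5x5
        self.relu = nn.ReLU()
        # Transition block: downsample once a 5x5 receptive field is reached
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)         # 28x28 -> 14x14

    def forward(self, x):
        x = self.relu(self.conv1(x))
        x = self.relu(self.conv2(x))
        return self.pool(x)

x = torch.randn(1, 1, 28, 28)
print(ConvBlock()(x).shape)   # torch.Size([1, 16, 14, 14])

Running a 1*28*28 tensor through this block gives a 16*14*14 output: the transition block halves the spatial size while the channels carry the extracted features forward.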

Features of Max Pooling

Max Pooling adds a degree of shift invariance, rotational invariance and scale invariance.

A slight change or shift in the input does not change the output, because we still pick the maximum value from each 2*2 window. This is called shift invariance. Similarly, Max Pooling is slightly rotation- and scale-invariant.
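The following minimal sketch, using made-up values, shows what "slight" invariance means here: moving a strong activation by one pixel within its 2*2 pooling window leaves the pooled output unchanged, whereas a shift that crosses a window boundary could still change it.

import torch
import torch.nn.functional as F

# A strong activation (9) sits inside one 2x2 pooling window; moving it
# by one pixel within that window leaves the pooled output unchanged.
a = torch.tensor([[9., 1., 0., 2.],
                  [1., 2., 3., 1.],
                  [0., 1., 4., 0.],
                  [2., 0., 1., 5.]]).reshape(1, 1, 4, 4)

b = a.clone()
b[0, 0, 0, 0] = 1.   # remove the peak from its original position
b[0, 0, 0, 1] = 9.   # place it one pixel to the right, same 2x2 window

print(F.max_pool2d(a, 2, 2).squeeze())
print(F.max_pool2d(b, 2, 2).squeeze())
# Both print tensor([[9., 3.], [2., 5.]]) -> the small shift is absorbed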


This article is presented by AIM Expert Network (AEN), an invite-only thought leadership platform for tech experts.
