In machine learning, we usually preprocess or clean the data before modelling. The goal of these steps is to make the data ready for modelling and easier to analyze and process computationally. Which preprocessing steps are needed depends on the problem we are solving and the type of data we are working with. In this article, we will discuss a few common and useful preprocessing steps for image data. The major points covered in this article are listed below.
Table of Contents
- Need of Image Processing
- Image Thresholding
- Simple Thresholding
- Adaptive Thresholding
- Image Pyramids
- Image Blending
- Histogram of an Image
- Fourier Transform
- Foreground Extraction
Need of Image Processing
Raw data is usually messy and comes from different sources and distributions. Before it is fed into a machine learning model, it needs to be standardized and cleaned up. Moreover, these pre-processing steps often reduce complexity and improve the performance of the applied algorithm or technique. Image pre-processing may also decrease model training time and speed up model inference.
Image Thresholding
Image thresholding is a simple and effective way of partitioning an image into foreground and background. The process divides the pixels of the image into two or more classes, e.g. foreground and background. It is often used as a step in tasks such as removing noise from an image, which makes objects easier to localize.
To obtain a thresholded image, we convert the original image into a grayscale image and then apply a thresholding technique. This approach is also called binarization because the image is converted into a binary form, i.e. if the pixel value does not exceed the threshold it is set to 0 (black), and otherwise it is set to the maximum value (white).
Simple Thresholding
In simple thresholding, the same threshold value is applied to every pixel, and the output values are calculated accordingly. For this, OpenCV provides the function cv.threshold, which returns two values: the first is the threshold that was used and the second is the thresholded image. OpenCV comes with different types of thresholding: binary, inverted binary, truncated, threshold to zero, and inverted threshold to zero.
See the OpenCV documentation for the mathematical definition of each type. Below you can see the output of these thresholding types.
Adaptive Thresholding
In simple thresholding, the threshold is a global value, which means every pixel is compared against the same value. This does not work well in all cases. For example, if different regions of the image are lit differently, a single global value cannot separate the objects of interest everywhere, and adaptive thresholding is used instead. Here the algorithm determines the threshold for a pixel based on a small region around it, so different regions of the same image get different thresholds. Adaptive thresholding is very useful where varying illumination is present.
Image Pyramids
An image pyramid is a representation of an image at multiple scales (resolutions). Using image pyramids, we can find objects at different scales in an image, and when we combine a pyramid with a sliding window, we can also find objects at various locations in the image.
At the bottom of the pyramid we have the original image at its original size. In each subsequent layer, the image is smoothed with a Gaussian filter and downsampled to a smaller size. This is repeated until a stopping criterion is met, normally that a minimum size has been reached, after which no further sampling is carried out.
This set of images with different resolutions is called an image pyramid because, when the images are stacked with the largest at the bottom and the smallest at the top, the stack looks like a pyramid.
Below you can see an example of an image pyramid:
Image Blending using Pyramids
One of the applications of pyramids is image blending, which is mixing the corresponding pixel values of two images to create a new composite image. In image stitching, we need to stack two images together. Simply stacking them will not look good, because the discontinuity between the two distinct images is clearly visible. Blending with pyramids gives us a seamless result without losing much data. Below you can see the classic example of this blending technique.
Histogram of an Image
The histogram of an image gives an overall idea of the distribution of pixel intensities throughout the image. It is a plot with pixel values, ranging from 0 to 255, on the x-axis and the number of pixels with each value on the y-axis. By looking at the histogram of an image we can easily get an intuition about its contrast, brightness, intensity distribution, etc.
In the image below, the darker pixels are plotted on the left side of the histogram, while the farther objects, which are mostly white pixels, are not considered.
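The counts behind such a plot are easy to compute. This is a small NumPy sketch on a synthetic dark image; the image itself and the split at intensity 128 are assumptions made for illustration.

```python
import numpy as np

# Synthetic low-key image: intensities clustered around a dark value.
rng = np.random.default_rng(0)
dark_image = np.clip(rng.normal(60, 20, size=(100, 100)), 0, 255).astype(np.uint8)

# One bin per intensity value from 0 to 255.
hist = np.bincount(dark_image.ravel(), minlength=256)

# Simple summaries that can be read off the histogram.
levels = np.arange(256)
mean_intensity = (hist * levels).sum() / hist.sum()  # overall brightness
dark_fraction = hist[:128].sum() / hist.sum()        # share of pixels in the darker half
```

For a dark image like this one, almost all of the counts fall in the left half of the histogram.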
Fourier Transform
In signal processing, if the amplitude of a periodic signal varies quickly, its frequency is high, and if it varies slowly, its frequency is low. The same idea can be applied to image processing. In images, pixel amplitude varies drastically at edges and in noise, so edges and noise are the high-frequency components of an image.
The Fourier transform is one of the more advanced topics in image processing. It is used to analyze the frequency characteristics of various filters. For images, the 2D discrete Fourier transform (DFT) is used to find the frequency-domain representation.
The Fourier transform decomposes a given function into sine and cosine components, and the same can be done for images. The output of the transformation represents the image in the frequency domain, in which each point represents a particular frequency contained in the original image.
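As a rough illustration of the frequency-domain view, the sketch below uses NumPy's FFT routines on a synthetic white square and then removes the high frequencies. The image and the size of the low-pass mask are assumed values for demonstration only.

```python
import numpy as np

# Synthetic image: a white square on black; its sharp edges are high-frequency content.
image = np.zeros((128, 128), dtype=np.float64)
image[48:80, 48:80] = 1.0

# 2D DFT, with the zero-frequency term shifted to the center of the array.
spectrum = np.fft.fftshift(np.fft.fft2(image))
magnitude = 20 * np.log(np.abs(spectrum) + 1)  # log scale, convenient for display

# Low-pass mask: keep only a small symmetric block of frequencies around the center.
rows, cols = image.shape
cr, cc = rows // 2, cols // 2
mask = np.zeros_like(image)
mask[cr - 8:cr + 9, cc - 8:cc + 9] = 1

# Back to the spatial domain; the edges of the square come out blurred.
restored = np.fft.ifft2(np.fft.ifftshift(spectrum * mask))
low_pass = restored.real
```

Keeping only the low frequencies softens the edges, which matches the idea that edges live in the high-frequency part of the spectrum. Inverting the mask would keep mostly the edges instead.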
Foreground Extraction
Foreground extraction is any technique that allows the foreground of an image to be extracted for further processing such as object detection, tracking, etc.
As an example of foreground extraction, here I am showing the GrabCut algorithm, which was designed at Microsoft Research, Cambridge, UK. To use it, we first draw a rectangle around the foreground region of the image; the algorithm then segments the image iteratively to get the result. The segmentation is not good in all cases, though. Sometimes the algorithm marks foreground pixels as background, as you can see in the image below.
For more techniques related to Foreground Detection, you can follow this article.
In this article, we have seen several commonly used methods for image processing and pre-processing. Using the techniques above, we can prepare our own customized data for image classification, object detection, etc. Just as we use a histogram to check the distribution of a tabular dataset, we can use one to check the intensity distribution of an image. If the histogram is roughly normally distributed, this may suggest that the image is free of heavy edges and noise components, although a histogram alone cannot confirm it because it carries no spatial information.