Background subtraction is a widely used approach for detecting moving objects in a sequence of frames from static cameras. The idea is to detect moving objects from the difference between the current frame and a reference frame, often called the ‘Background Image’ or ‘Background Model’. Pixels that differ enough from the reference are marked as foreground, so foreground detection is the main task of the whole approach.
Many applications do not need to know the whole contents of the sequence; further analysis focuses on some part of it because interest lies in particular objects in the foreground. Once the preprocessing steps, such as denoising and morphological processing, are complete, object localisation is carried out, and this is where foreground detection is used.
All current detection techniques are based on modelling the background of the image, i.e. setting the background and detecting the changes that occur. Defining a proper background can be very difficult when it contains shapes, shadows and moving objects. When defining the background, all the techniques assume that stationary objects may vary in color and intensity over time.
Below is an example of the foreground removal of video sequences.
A good foreground detection system should be able to build a robust background model that is immune to lighting changes, repetitive movements such as leaves and waves, shadows, and long-term changes.
What are the methods involved?
Background subtraction is generally based on a static background hypothesis, which often does not hold in real-world situations. In indoor scenes, reflections or animated images on screens lead to background changes. To deal with such issues, the following methods are used, depending on the application.
Temporal average filter:
This filter estimates the background model from the median of each pixel over a number of previous frames. It uses a buffer holding the pixel values of the most recent frames to update the median for each new image. To model the background, the system examines all the frames in a given period, called the training time, during which the median is calculated pixel by pixel across all of those frames.
After the training time, each pixel value of each new frame is compared with the background value previously calculated for that pixel. If the input pixel is within the threshold limit, it is classified as a background pixel; otherwise it is classified as a foreground pixel.
This method is not very efficient: it relies on a frame buffer, which carries a high computational cost, and it has no rigorous statistical basis.
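The training and classification steps of the median filter can be sketched with NumPy as follows. This is a minimal illustration rather than a reference implementation: the function names, the threshold of 25 and the synthetic 8x8 frames are all assumptions made for the example.

```python
import numpy as np

def median_background(training_frames):
    """Per-pixel median over a stack of grayscale training frames (T, H, W)."""
    stack = np.asarray(training_frames, dtype=np.float32)
    return np.median(stack, axis=0)

def foreground_mask(frame, background, threshold=25.0):
    """True where a pixel differs from the background by more than the threshold."""
    diff = np.abs(frame.astype(np.float32) - background)
    return diff > threshold

# Synthetic training period: a flat background of intensity 100 with small noise.
rng = np.random.default_rng(0)
training = 100 + rng.integers(-3, 4, size=(20, 8, 8))
background = median_background(training)

# A new frame in which a bright 2x2 "object" has entered the scene.
new_frame = np.full((8, 8), 100, dtype=np.uint8)
new_frame[2:4, 2:4] = 200
mask = foreground_mask(new_frame, background)
```

Only the four pixels covered by the bright square exceed the threshold, so they are the only ones marked as foreground. Note that the whole training stack has to be kept in memory to compute the median, which is the buffer cost mentioned above.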
Any robust background subtraction model should be able to handle changes in light intensity, repetitive motion, and long-term scene changes. Mathematically, a video sequence can be modelled as a function P(x,y,t), where t is the time dimension and x and y give the pixel location. For example, P(4,5,2) is the pixel intensity at pixel location (4,5) of the image at t=2 in the video sequence.
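As a small illustration of this notation, a video held as a NumPy array is usually indexed as (frame, row, column), which corresponds to (t, y, x) rather than (x, y, t). The array below is hypothetical data, and the helper P is only an assumed wrapper that restores the argument order used in the text.

```python
import numpy as np

# Hypothetical video: 3 frames, each with 10 rows (y) and 8 columns (x).
video = np.arange(3 * 10 * 8).reshape(3, 10, 8)

def P(x, y, t):
    """Pixel intensity at column x, row y of frame t."""
    return video[t, y, x]

value = P(4, 5, 2)  # intensity at pixel location (4, 5) in the frame at t = 2
```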
The approaches described next make this clearer:
- Frame Difference:
Mathematically it can be modelled as:
$|\text{Frame}_i - \text{Frame}_{i-1}| > \text{Threshold}$
With the Frame Difference approach, the estimated background is simply the previous frame, and a pixel is marked as foreground when the rule above holds. This approach can be used to segment moving objects such as cars and pedestrians.
It evidently works only under particular conditions of object speed and frame rate.
It is also very sensitive to the threshold value. So, depending on object structure, speed, frame rate and the global threshold, this approach has limited use cases. Below are some cases of this approach based on threshold values.
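The sensitivity to the threshold can be seen in a small NumPy sketch. The function name, the frame contents and the two threshold values are assumptions chosen for illustration only.

```python
import numpy as np

def frame_difference_mask(prev_frame, curr_frame, threshold):
    """Mark pixels whose absolute change between consecutive frames exceeds threshold."""
    # Cast to a signed type first so that uint8 subtraction cannot wrap around.
    diff = np.abs(curr_frame.astype(np.int16) - prev_frame.astype(np.int16))
    return diff > threshold

prev_frame = np.full((6, 6), 50, dtype=np.uint8)
curr_frame = prev_frame.copy()
curr_frame[1:3, 1:3] = 90   # a moving object: intensity change of 40
curr_frame[4, 4] = 60       # slight flicker: intensity change of 10

low = frame_difference_mask(prev_frame, curr_frame, threshold=5)
high = frame_difference_mask(prev_frame, curr_frame, threshold=25)
```

With the low threshold the flickering pixel is reported as foreground along with the object, while the higher threshold keeps only the four object pixels. Raising the threshold further would eventually discard the object as well.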
- The Mixture of Gaussians:
The Mixture of Gaussians, or MoG, models each background pixel as a mixture of k Gaussian distributions, with k typically between 3 and 5. The authors of the method assume that the different distributions represent different background and foreground colors. The weight of each distribution in the model is proportional to the amount of time that color stays on the pixel. Therefore, when the weight of the distribution that a pixel value falls into is low, the pixel is classified as a foreground pixel.
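The weighting idea can be sketched for a single grayscale pixel as below. This is a deliberately simplified illustration, not the published algorithm or OpenCV's implementation: the class name PixelMoG, the learning rate, the variance values and the weight cut-off of 0.2 are all assumptions.

```python
import numpy as np

class PixelMoG:
    """Simplified mixture of k Gaussians for ONE grayscale pixel (illustrative only)."""

    def __init__(self, k=3, alpha=0.05, init_var=36.0, min_var=4.0,
                 match_sigmas=2.5, bg_weight=0.2):
        self.alpha = alpha                # learning rate (assumed value)
        self.init_var = init_var          # variance given to a new component
        self.min_var = min_var            # floor that stops a variance collapsing
        self.match_sigmas = match_sigmas  # match radius in standard deviations
        self.bg_weight = bg_weight        # below this weight a match is foreground
        self.means = np.zeros(k)
        self.vars = np.full(k, init_var)
        self.weights = np.zeros(k)        # zero weight means "component unused"

    def update(self, value):
        """Feed one intensity; return True if it is classified as foreground."""
        dist = np.abs(value - self.means)
        matches = (dist <= self.match_sigmas * np.sqrt(self.vars)) & (self.weights > 0)
        if matches.any():
            # Use the matching component with the largest weight.
            i = int(np.argmax(np.where(matches, self.weights, -1.0)))
            is_foreground = bool(self.weights[i] < self.bg_weight)
            matched = np.zeros_like(self.weights)
            matched[i] = 1.0
            self.weights = (1 - self.alpha) * self.weights + self.alpha * matched
            self.means[i] += self.alpha * (value - self.means[i])
            self.vars[i] += self.alpha * ((value - self.means[i]) ** 2 - self.vars[i])
            self.vars[i] = max(self.vars[i], self.min_var)
        else:
            # No component explains the value: replace the weakest one.
            i = int(np.argmin(self.weights))
            self.weights *= (1 - self.alpha)
            self.means[i] = value
            self.vars[i] = self.init_var
            self.weights[i] = self.alpha
            is_foreground = True
        self.weights /= self.weights.sum()
        return is_foreground

model = PixelMoG()
background_phase = [model.update(100) for _ in range(50)]  # stable background color
object_appears = model.update(200)                         # sudden new color
back_to_background = model.update(100)                     # background color returns
```

In this sketch the long-lived color at intensity 100 accumulates a high weight and is treated as background, while the new color at 200 starts with a low weight and is reported as foreground. If the new color persisted, its weight would grow with every frame until it too crossed the cut-off.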
Apart from these conventional approaches, OpenCV provides ready-made implementations of background subtraction based on MOG2 and KNN, exposed as cv2.createBackgroundSubtractorMOG2() and cv2.createBackgroundSubtractorKNN(), which generate the foreground mask.
The result of this method, along with the input, is shown below.
Input test video:
Result by MOG2:
Result by KNN:
In the above outputs, the grey regions are shadows detected by the algorithms.
In this article, we have seen what background subtraction is and how objects are detected by the different detection algorithms discussed above. Of these, conventional methods such as frame differencing and the Mixture of Gaussians are chosen according to the needs of the application.
Vijaysinh is an enthusiast in machine learning and deep learning. He is skilled in ML algorithms, data manipulation, handling and visualization, model building.