Object detection is a computer vision technique for locating and classifying object instances in images or videos. Despite significant progress in computer vision, object detection is still a complex process and comes with its own set of challenges.
Object detection applications include traffic management, sports training, and video surveillance systems. It also forms the foundation of many downstream computer vision tasks, such as image segmentation, image captioning, and object tracking.
Here are some of the major challenges facing object detection today:
- Object localisation
Object detection has two goals: classifying an object and determining its position (the latter is referred to as the object localisation task). Handling both at once is a major challenge. To address it, researchers often use a multi-task loss function that penalises both misclassifications and localisation errors.
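A multi-task loss of this kind can be sketched as a classification term plus a weighted localisation term. The example below is a minimal illustration, not any particular detector's implementation: the function names, the choice of cross-entropy and smooth L1, and the `loc_weight` parameter are all assumptions made for the sketch. Real detectors typically also skip the box term for background regions, which is omitted here.

```python
import math


def cross_entropy(class_probs, true_class):
    """Classification term: negative log-probability of the true class."""
    return -math.log(max(class_probs[true_class], 1e-12))


def smooth_l1(pred_box, true_box, beta=1.0):
    """Localisation term: smooth L1 summed over the four box coordinates."""
    total = 0.0
    for p, t in zip(pred_box, true_box):
        diff = abs(p - t)
        if diff < beta:
            total += 0.5 * diff * diff / beta  # quadratic near zero
        else:
            total += diff - 0.5 * beta  # linear for large errors
    return total


def multi_task_loss(class_probs, true_class, pred_box, true_box, loc_weight=1.0):
    """Hypothetical combined loss: classification + loc_weight * localisation."""
    cls_term = cross_entropy(class_probs, true_class)
    loc_term = smooth_l1(pred_box, true_box)
    return cls_term + loc_weight * loc_term
```

Raising `loc_weight` makes box errors cost more relative to label errors, which is one way to trade off the two goals.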
- Viewpoint variation
Objects viewed from different angles can look entirely different. For example, the top view of a cup looks completely different from a side view. Because many models are trained and tested on images captured under ideal conditions, recognising objects from unfamiliar viewpoints is an uphill task for detectors.
- Multiple aspect ratios and spatial sizes
Objects vary in aspect ratio and size. Detection algorithms should therefore be able to identify objects across different shapes and scales, which can be difficult to achieve.
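One common way to cover several sizes and aspect ratios is to place a set of reference boxes (often called anchors) at each image location. The sketch below assumes that approach. The function name `make_anchors` and the default scales and ratios are illustrative choices, not values taken from a specific model.

```python
import math


def make_anchors(cx, cy, scales=(32, 64, 128), aspect_ratios=(0.5, 1.0, 2.0)):
    """Return (x_min, y_min, x_max, y_max) boxes centred on (cx, cy).

    Each scale s gives a box of area s * s; aspect ratio is width / height.
    """
    anchors = []
    for s in scales:
        for r in aspect_ratios:
            w = s * math.sqrt(r)  # wider when r > 1
            h = s / math.sqrt(r)  # taller when r < 1
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


# Three scales times three aspect ratios gives nine candidate boxes per location.
boxes = make_anchors(100, 100)
```

Keeping the area fixed per scale while varying only the ratio means each scale is represented by a tall, a square, and a wide box.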
- Deformation
Objects of interest may be flexible and can deform in many ways. For example, an object detector trained to recognise a person sitting, standing, or walking may find it difficult to detect the same person in contorted positions.
- Occlusion
An object that is only partly visible can also be difficult to detect. For example, in a picture of a person holding a cup or a phone, the detector may struggle to recognise the cup or the phone because a large part of it is hidden by the person’s hands.
- Lighting
How an object is illuminated can have a significant effect at the pixel level. The same object can exhibit different colours under different types of lighting, and the less illuminated it is, the less visible it will be. This can affect the detector’s effectiveness.
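The pixel-level effect can be illustrated with a crude gain and gamma adjustment applied to a single 8-bit RGB pixel. This is only a toy sketch, not a physical lighting model. The function name `relight` and its parameters are hypothetical and chosen for illustration.

```python
def relight(pixel, gain=1.0, gamma=1.0):
    """Apply a simple gain and gamma change to an 8-bit (R, G, B) pixel."""
    out = []
    for channel in pixel:
        value = ((channel / 255.0) ** gamma) * gain
        value = min(max(value, 0.0), 1.0)  # clip to the valid range
        out.append(round(value * 255))
    return tuple(out)


surface = (200, 100, 50)                # the same surface colour...
dim = relight(surface, gain=0.5)        # ...under weaker illumination
harsh = relight(surface, gain=10.0)     # ...and overexposed, where detail is clipped
```

The detector only ever sees the adjusted values, so the same surface produces quite different inputs depending on the lighting.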
- Cluttered or textured background
If the background of an image is cluttered or textured, there’s a risk of the objects of interest blending into it. For example, a cat sitting on a rug that resembles its fur may be camouflaged well enough that the detector fails to locate it. Similarly, a cluttered image with many items will make it difficult for the detector to recognise individual items of interest.
- Intra-class variation
Objects within the same class could have completely different shapes and sizes. For example, different kinds of furniture and houses can look completely different. Ideally, a good detector should be able to identify these objects of interest as belonging to the same class despite their variations—while remaining sensitive to inter-class variations.
- Real-time detection speed
Object detection in videos adds a further difficulty: speed. To keep up with real-time video processing, the algorithm must accurately classify and localise moving objects within the time available for each frame.
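The speed requirement can be made concrete as a per-frame time budget: at a given frame rate, the detector has 1000 / fps milliseconds to process each frame. The helper names and the 30 fps default below are assumptions for illustration only.

```python
def frame_budget_ms(target_fps):
    """Time available per frame, in milliseconds, at the target frame rate."""
    return 1000.0 / target_fps


def meets_real_time(per_frame_latency_ms, target_fps=30.0):
    """True if a detector with this per-frame latency keeps up with the video."""
    return per_frame_latency_ms <= frame_budget_ms(target_fps)


# At 25 fps each frame must be handled within 40 ms.
budget = frame_budget_ms(25)
```

A detector that is accurate but takes longer than this budget will fall behind the video stream or be forced to drop frames.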
- Limited data
Another significant problem facing object detection is the limited amount of annotated data. Detection datasets remain substantially smaller in scale and vocabulary than image classification datasets despite many data collection efforts.