The Reality About Object Detection As A Computer Vision Task

Computer vision, as its name suggests, is a field focused on the study and automation of visual perception tasks. Sounds logical and obvious, right?

Despite how specific this subarea of artificial intelligence is, the range of problems it covers is quite extensive. What looks like simply identifying elements in a photograph is made up of countless threads, moving parts, tasks and specific stages that must work together so that a visual scene is interpreted consistently.

How Is an Image Described in Computer Vision?

Simple: With a list of numbers. But these numbers cannot be arbitrary. They must have a meaning, which varies depending on the aspect of the image that we want to describe.

They must also be quantifiable and comparable. If one trait is more prominent in one image than in another, this list of numbers should ideally reflect that fact. Likewise, for the description to be effective and, even more, useful from a practical point of view, we must be able to use these numbers to measure the similarity between two images based on their feature vectors.
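As a rough illustration of describing an image with a list of numbers and comparing two such lists, here is a minimal sketch in plain Python. It assumes a colour histogram as the descriptor and cosine similarity as the comparison. Both are common choices, but the article does not prescribe them. The function names and the toy "images" (flat lists of RGB tuples) are hypothetical.

```python
import math

def colour_histogram(pixels, bins=4):
    """Describe an image as a normalized histogram over quantized RGB colours.

    pixels: iterable of (r, g, b) tuples with channel values in 0..255.
    Returns a list of bins**3 numbers that sum to 1 (the feature vector).
    """
    hist = [0.0] * (bins ** 3)
    count = 0
    for r, g, b in pixels:
        # Map each channel to a bin index in 0..bins-1.
        ri, gi, bi = r * bins // 256, g * bins // 256, b * bins // 256
        hist[ri * bins * bins + gi * bins + bi] += 1
        count += 1
    return [h / count for h in hist] if count else hist

def cosine_similarity(u, v):
    """Similarity of two feature vectors; for histograms it lies in 0..1."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Toy images: illustrative pixel lists, not real photographs.
sunset = [(250, 120, 30)] * 90 + [(40, 20, 60)] * 10
dusk = [(240, 110, 40)] * 70 + [(30, 30, 70)] * 30
forest = [(20, 140, 40)] * 100

sim_sunset_dusk = cosine_similarity(colour_histogram(sunset), colour_histogram(dusk))
sim_sunset_forest = cosine_similarity(colour_histogram(sunset), colour_histogram(forest))
print(sim_sunset_dusk > sim_sunset_forest)
```

In this toy setup the two orange-dominated images score as more similar to each other than either does to the green one, which is the behaviour we want from a comparable descriptor.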

What Can We Describe in an Image?

Countless things, really. Below is a non-exhaustive list of the aspects of an image that we can describe:


  • Colour.
  • Texture.
  • Shape.
  • Content.

We do not necessarily have to limit ourselves to describing a single aspect. Depending on our objectives and the scope of our projects, the combination of two or more descriptors can lead to great results.

For example, if we wanted to create a landscape classifier, a colour descriptor along with a texture descriptor would be a good place to start: although some places share the same colour palette, the texture of their terrain varies.
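The idea of combining descriptors can be sketched as simple concatenation of feature vectors. The example below is only an illustration under stated assumptions: the colour descriptor is the mean RGB value, the texture descriptor is a crude "roughness" score (the mean intensity jump between horizontal neighbours), and the two tiny landscapes are made up. None of these specifics come from the article.

```python
def mean_colour(image):
    """Colour descriptor: average R, G, B of the image, each scaled to 0..1."""
    pixels = [p for row in image for p in row]
    n = len(pixels)
    return [sum(p[c] for p in pixels) / (255.0 * n) for c in range(3)]

def roughness(image):
    """Texture descriptor: mean absolute grey-level jump between
    horizontally adjacent pixels, scaled to 0..1."""
    jumps = []
    for row in image:
        grey = [sum(p) / 3.0 for p in row]
        jumps.extend(abs(a - b) for a, b in zip(grey, grey[1:]))
    return sum(jumps) / (255.0 * len(jumps)) if jumps else 0.0

def describe(image):
    """Combined descriptor: colour vector followed by the texture score."""
    return mean_colour(image) + [roughness(image)]

# Two hypothetical 4x4 landscapes with the same average colour.
SAND, DARK, LIGHT = (200, 170, 120), (160, 130, 80), (240, 210, 160)
flat_beach = [[SAND] * 4 for _ in range(4)]
rocky_desert = [[DARK, LIGHT, DARK, LIGHT] for _ in range(4)]

print(mean_colour(flat_beach) == mean_colour(rocky_desert))  # colour alone cannot separate them
print(describe(flat_beach) != describe(rocky_desert))        # the combined descriptor can
```

Colour alone gives the two scenes identical vectors here, so only the added texture component lets a classifier tell them apart.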

In this way, the multiple phases of vision each warrant their own study. For example, one way to understand the content of a photograph is to classify it. If we want to go further, once an object has been categorized we can locate it in the image’s own coordinate system. Subsequently, based on this knowledge, we can alter it, improve it, synthesize it or turn it into an input for a larger or mixed system, such as one that generates textual descriptions of what happens in a digital photo.

One of the most useful, albeit substantially more difficult, applications of machine vision is, therefore, object detection.

Classification vs. Detection

Although these terms are often used interchangeably in the context of computer vision, they are not synonyms: they point to two different tasks.

On the one hand, in classification the goal is to label the content of an image, nothing more. Is there a cat in this photo? What animal is this? What climate does this place belong to? What breed is this dog?

Detection, for its part, goes one step further. It says not only what is in the photo, but also where the object of interest is located.

The difference will be clearer with an example. Given an image of a pencil, a properly trained classifier will return a positive prediction for the pencil class, possibly with a more than decent degree of confidence. What it won’t give us is the location of the instrument.

The detector will also let us know of the presence of a pencil. In addition to this information, it will give us its location through some localization mechanism, such as the coordinates of the smallest rectangle (the bounding box) that encloses the region corresponding to the pencil.
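To make the contrast concrete, the sketch below computes the smallest axis-aligned rectangle around a region and shows how a classifier's output and a detector's output might differ in shape. It rests on several assumptions: the object region is given as a binary mask, the rectangle is axis-aligned, and the dictionary layout and confidence value are purely illustrative rather than the output of any real model.

```python
def bounding_box(mask):
    """Return (x_min, y_min, x_max, y_max) of the smallest axis-aligned
    rectangle that contains every non-zero cell of a 2D mask,
    or None if the mask is empty."""
    rows = [y for y, row in enumerate(mask) if any(row)]
    cols = [x for row in mask for x, value in enumerate(row) if value]
    if not rows:
        return None
    return (min(cols), rows[0], max(cols), rows[-1])

# Hypothetical mask: 1 marks pixels that belong to the pencil.
pencil_mask = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 0],
    [0, 0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

# A classifier answers "what"; a detector answers "what" and "where".
# The confidence value is made up for illustration.
classification = {"label": "pencil", "confidence": 0.97}
detection = {"label": "pencil", "confidence": 0.97, "box": bounding_box(pencil_mask)}

print(detection["box"])  # (2, 1, 4, 3)
```

Real detectors predict such boxes directly from the image instead of deriving them from a mask, but the information they return has this general form.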

A Brief Definition of ‘Object’

An object is any visually representable element whose shape or physical characteristics do not have a wide range of variability. Therefore, an object must be semi-rigid.

Where Is Object Detection Used?

In countless applications of computer vision. Perhaps the most common and handy object detection demo is the face detection built into most contemporary smartphones.

Every time we take a photo of a person, be it a selfie or a normal portrait, our camera uses a face detection algorithm to ensure that the faces in the image are not distorted or out of focus. Variations of this same algorithm are what support tagging on social platforms such as Facebook and Instagram.

It is also used in autonomous vehicles to identify obstacles and elements of interest in the environment, such as pedestrians, traffic signals or other vehicles.

Why Is It a Difficult Task to Handle?

Object detection is a superset of object classification. In simpler terms, it is a more demanding task that includes a classification component. For this reason, it suffers from the same difficulties as the latter, including:

  • Occlusion.
  • Blur.
  • Colour variations.
  • Lighting variations.
  • Angle variations.

One of the most characteristic problems is intraclass variation, for example the great variability in appearance among different dog breeds.

These are the most important points to keep in mind about computer vision, and especially about object detection. As with most subfields of computer vision, it is a constantly evolving area where the boundaries of the possible are expanding daily, which is a compelling reason to stay informed about developments in the sector.

Get to know why you should learn computer vision now https://analyticsindiamag.com/why-you-should-learn-computer-vision-now/

Dr. Raul V. Rodriguez
Dean at Woxsen School of Business. He is a registered expert in Artificial intelligence, Intelligent Systems, Multi-agent Systems at the European Commission, and has been nominated for the Forbes 30 Under 30 Europe 2020 list.
