Last updated September 9, 2020

The Reality About Object Detection As A Computer Vision Task


Published on May 13, 2020

by Dr. Raul V. Rodriguez

Computer vision, as its name suggests, is a field focused on the study and automation of visual perception tasks. Sounds logical and obvious, right?

Despite how specific this subarea of artificial intelligence is, the range of problems it covers is extensive. What looks like simply identifying elements in a photograph actually involves countless threads, moving parts, tasks and specific stages that must work in harmony to produce a consistent interpretation of a visual scene.

How is an Image described in Computer Vision?

Simple: With a list of numbers. But these numbers cannot be arbitrary. They must have a meaning, which varies depending on the aspect of the image that we want to describe.

They must also be quantifiable and comparable. If one trait is more prominent in one image than in another, this list of numbers should ideally reflect that fact. Likewise, for the description to be effective and, even more, useful from a practical point of view, we must be able to use these numbers to measure the similarity between two images based on their feature vectors.
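As a minimal sketch of this idea, the snippet below describes an image with a normalised colour histogram and compares two descriptions with Euclidean distance. The function names, the number of bins and the synthetic images are illustrative assumptions, not something prescribed by this article; many other descriptors and distance measures would serve equally well.

```python
import numpy as np

def colour_histogram(image, bins=8):
    """Describe an RGB image (H x W x 3, values 0-255) as a list of numbers.

    Each channel is summarised by a histogram with `bins` buckets; the three
    histograms are concatenated and normalised so that the entries sum to 1.
    """
    channels = []
    for c in range(3):
        hist, _ = np.histogram(image[..., c], bins=bins, range=(0, 256))
        channels.append(hist)
    descriptor = np.concatenate(channels).astype(float)
    return descriptor / descriptor.sum()

def euclidean_distance(a, b):
    """Smaller values mean the two descriptors are more alike."""
    return float(np.linalg.norm(a - b))

# Three synthetic, single-colour images (hypothetical test data).
reddish = np.zeros((32, 32, 3), dtype=np.uint8)
reddish[..., 0] = 200
dark_red = np.zeros((32, 32, 3), dtype=np.uint8)
dark_red[..., 0] = 180
bluish = np.zeros((32, 32, 3), dtype=np.uint8)
bluish[..., 2] = 200

d_similar = euclidean_distance(colour_histogram(reddish), colour_histogram(dark_red))
d_different = euclidean_distance(colour_histogram(reddish), colour_histogram(bluish))
print(d_similar < d_different)  # the two red images are closer to each other
```

The descriptor is quantifiable (a fixed-length vector) and comparable (a distance can be computed between any two vectors), which are the two properties described above.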

What Can We Describe in an Image?

Countless things, really. Below is a non-exhaustive list of the aspects of an image that we can describe:

Colour.
Texture.
Shape.
Content.

We do not necessarily have to limit ourselves to describing a single aspect. Depending on our objectives and the scope of our projects, the combination of two or more descriptors can lead to great results.

For example, if we wanted to create a landscape classifier, a colour descriptor along with a texture descriptor would be a good place to start: although some places share the same colour palette, the texture of their terrain varies.

Each stage of visual understanding therefore warrants its own study. For example, one way to understand the content of a photograph is to classify it. If we want to go further, once a certain object has been categorized we can locate it in the image’s own coordinate system. Based on this knowledge, we can then alter it, improve it, synthesize it or feed it into a larger or hybrid system, such as one that generates textual descriptions of what happens in a digital photo.

One of the most useful, albeit substantially more difficult, applications of machine vision is, therefore, object detection.

Classification vs. Detection

Although these terms are often used interchangeably in the context of computer vision, they are not synonyms: they refer to two different tasks.

In classification, the goal is to label the content of an image, nothing more. Is there a cat in this photo? What animal is this? What climate does this place belong to? What breed is this dog?

Detection, for its part, goes one step further. It does not only say what is in the photo, but also where the object of interest is located.

The difference will be clearer with an example.

Given an image containing a pencil, a properly trained classifier will return a positive prediction for the pencil class, possibly with a high degree of confidence. What it won’t give us is the location of the pencil.

A detector will also tell us that a pencil is present. In addition, it will give us the pencil's location through some localization mechanism, such as the coordinates of the smallest rectangle (the bounding box) that encloses the region corresponding to the pencil.
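To make the idea of the minimum enclosing rectangle concrete, here is a small sketch that computes it from a binary mask marking which pixels belong to the object. The mask, the function name and the (x_min, y_min, x_max, y_max) convention are assumptions made for illustration; real detectors usually predict the box directly rather than deriving it from a mask.

```python
import numpy as np

def bounding_box(mask):
    """Return (x_min, y_min, x_max, y_max) of the smallest axis-aligned
    rectangle enclosing every True pixel, or None if the mask is empty.
    Coordinates are inclusive pixel indices."""
    rows = np.any(mask, axis=1)  # rows that contain at least one object pixel
    cols = np.any(mask, axis=0)  # columns that contain at least one object pixel
    if not rows.any():
        return None
    row_idx = np.where(rows)[0]
    col_idx = np.where(cols)[0]
    return int(col_idx[0]), int(row_idx[0]), int(col_idx[-1]), int(row_idx[-1])

# Hypothetical 10 x 12 mask with an L-shaped "object".
mask = np.zeros((10, 12), dtype=bool)
mask[2:5, 3:9] = True  # rows 2-4, columns 3-8
mask[4:7, 8] = True    # rows 4-6, column 8

box = bounding_box(mask)
print(box)  # (3, 2, 8, 6)
```

A classifier would stop at the label ("pencil"); a detector would report the label together with a box such as the one computed here.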

A Brief Definition of ‘Object’

An object is any visually representable element whose shape or physical characteristics do not vary widely. In other words, an object must be semi-rigid.

Where Is Object Detection Used?

In countless applications of computer vision. Perhaps the most common and accessible demonstration of object detection is the face detection built into most contemporary smartphones.

Every time we take a photo of a person, be it a selfie or a normal portrait, our camera uses a face detection algorithm to ensure that the faces in the image are not distorted or out of focus. Variations of this same algorithm are what support tagging on social platforms such as Facebook and Instagram.

It is also used in autonomous vehicles to identify obstacles and elements of interest in the environment, such as pedestrians, traffic signals or other vehicles.

Why is it a Difficult Task to Handle?

Object detection subsumes object classification. In simpler terms, it is a broader task that includes a classification component. For this reason, it suffers from the same difficulties as classification, including:

Occlusion.
Blur.
Colour variations.
Lighting variations.
Angle variations.

One of the most characteristic problems is intraclass variation, i.e. the great variability among members of the same class, such as the differences in appearance between dog breeds.

These are the most important points to keep in mind about computer vision, and especially object detection. As with most subfields of computer vision, it is a constantly evolving area where the boundaries of the possible expand daily, which is a compelling reason to stay informed about developments in the sector.

Get to know why you should learn computer vision now https://analyticsindiamag.com/why-you-should-learn-computer-vision-now/
