Image annotation techniques with implementation in OpenCV

Image annotation is the labelling of objects in an image
Image annotation is important in computer vision, which is the technique that allows computers to obtain high-level comprehension from digital images or videos and to observe and interpret visual information in the same way that humans do. Annotation, often known as picture labelling or tagging, is a crucial stage in the development of most computer vision models. This article will be focusing on creating these annotations with the help of OpenCV. Following are the topics to be covered.

Table of contents

  1. The Image annotation
  2. Need for image annotation
  3. Types of image annotation
  4. Implementing Image annotation with OpenCV


The higher the quality of your annotations, the better your machine learning models perform. Let’s understand image annotations.

The Image annotation

The process of labelling, tagging or specifying images in a particular dataset to train machine learning models is known as image annotation. When the manual annotation is finished, the tagged pictures are processed by a machine learning or deep learning model, which learns to repeat the annotations without the need for human intervention.

As a result, picture annotation is utilised to indicate the aspects that your system needs to recognise. Supervised Learning is the process of training an ML model given labelled data.

Image annotation establishes the criteria that the model attempts to duplicate, thus any errors in the labelling are also repeated. As a result, correct picture annotation creates the groundwork for training neural networks, making annotation one of the most critical jobs in computer vision. 

Image annotation can be done manually or with the help of an automatic annotation tool. Auto annotation tools are often pre-trained algorithms that can label photos accurately. They are especially useful for complex annotation jobs such as constructing segmentation masks, which take time to generate by hand.

Need for image annotation

Labelling pictures is required for functional datasets because it informs the training model about the relevant aspects of the image (classes), which it can then use to identify those classes in fresh, never-before-seen images.

Image annotation generates training data from which supervised AI models may learn. The way the images are annotated determines how the AI will behave after viewing and learning from them. As a result, poor annotation is frequently reproduced in training, resulting in models that make bad predictions.

Annotated data is very important when tackling a unique challenge and using AI in a new domain. For typical tasks like image classification and segmentation, pre-trained models are frequently available, and they may be customised to specific use cases using Transfer Learning with minimum input.

Training a comprehensive model from scratch, on the other hand, frequently necessitates a massive quantity of annotated data divided into train, validation, and test sets, which is difficult and time-consuming to generate. Unsupervised algorithms, by contrast, do not need annotated data and may be trained directly on raw data.
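The division into train, validation, and test sets can be sketched in a few lines of Python. This is only an illustration: the file names, the labels, the 70/15/15 proportions and the `split_dataset` helper are all assumptions made for the example, not something prescribed by the article.

```python
import random

# Hypothetical annotated dataset: (file name, class label) pairs.
annotated = [
    (f"image_{i:03d}.jpg", "soda" if i % 2 == 0 else "no_soda")
    for i in range(100)
]


def split_dataset(samples, train_frac=0.7, val_frac=0.15, seed=42):
    """Shuffle the samples and cut them into train, validation and test lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains becomes the test set
    return train, val, test


train_set, val_set, test_set = split_dataset(annotated)
print(len(train_set), len(val_set), len(test_set))
```

Shuffling before cutting keeps each subset from inheriting the order in which the pictures were collected.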

Types of image annotation

There are three prevalent methods of image annotation, and the one you choose for your use case will be determined by the project’s complexity. The more high-quality picture data used for each kind, the more accurately the AI will forecast.


Classification

Classification is the simplest and quickest approach to image annotation since it simply assigns one tag to a picture. For example, you might want to go through and categorise a collection of photographs of grocery store shelves to determine which ones contain soda and which do not.

This approach is ideal for capturing abstract information, such as the example above, the time of day, or whether there are automobiles in the image, and for filtering out photographs that do not satisfy the criteria from the start. While classification is the quickest at providing a single, high-level label, it is also the most ambiguous of the three types we cover since it does not identify where the item is inside the image.

Object Detection

In object detection, annotators are given particular things to label in a picture. So, if a picture is labelled as having ice cream in it, object detection goes a step further by indicating where the ice cream is inside the image, or where specifically the cocoa ice cream is. Object detection may be accomplished using a variety of approaches, including:

  • Bounding Boxes: Annotators use rectangles and squares to define the position of target objects in 2D. This is one of the most often used picture annotation approaches. Cuboids, also known as 3D Bounding Boxes, are used by annotators to specify the location and depth of a target object.
  • Polygonal Segmentation: Annotators employ complicated polygons to specify the position of target items that are asymmetrical and do not simply fit inside a box.
  • Lines: Annotators detect essential boundary lines and curves in a picture to distinguish sections using lines and splines. Annotators may, for example, name the numerous lanes of a highway for a self-driving car picture annotation project.

This approach is still not the most exact since object detection allows for overlap in the use of boxes or lines. What it does offer is the general position of the item while remaining a pretty quick annotating procedure.

Semantic Segmentation

Semantic segmentation overcomes the overlap problem in object detection by ensuring that each component of an image belongs to just one class. This approach, which is usually done at the pixel level, requires annotators to assign a category (such as pedestrian, automobile, or sign) to each pixel. This aids in teaching an AI model how to detect and categorise certain items, even when they are obscured. For example, if a shopping cart is obscuring a portion of the image, semantic segmentation may be used to define what cocoa ice cream looks like down to the pixel level, allowing the model to know that it is still, in fact, cocoa ice cream.
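One common way to store such pixel-level labels is an integer mask with the same height and width as the image, where each value is a class ID. The sketch below assumes a tiny 6 x 8 mask, an arbitrary ID-to-name mapping and an arbitrary colour palette, all chosen purely for illustration.

```python
import numpy as np

# Hypothetical mapping from class ID to class name.
CLASS_NAMES = {0: "background", 1: "pedestrian", 2: "automobile", 3: "sign"}

# One class ID per pixel: every pixel belongs to exactly one class.
seg_mask = np.zeros((6, 8), dtype=np.uint8)
seg_mask[1:5, 1:3] = 1  # pedestrian region
seg_mask[3:6, 4:8] = 2  # automobile region
seg_mask[0:2, 6:8] = 3  # sign region

# Count how many pixels were assigned to each class.
ids, counts = np.unique(seg_mask, return_counts=True)
pixel_counts = {CLASS_NAMES[int(i)]: int(c) for i, c in zip(ids, counts)}
print(pixel_counts)

# Turn the ID mask into a colour image for visual inspection (BGR order).
palette = np.array(
    [[0, 0, 0], [0, 0, 255], [0, 255, 0], [255, 0, 0]], dtype=np.uint8
)
color_mask = palette[seg_mask]  # shape (6, 8, 3)
```

Because a single array holds one ID per pixel, two classes can never claim the same pixel, which is the property that removes the overlap seen with boxes.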

Implementing Image annotation with OpenCV

In this article, we will be using Bounding boxes and the colour segmentation method for the image annotation. 

In the bounding boxes method, we will manually draw different bounding shapes around the object and add some text to it.

In colour segmentation, we will group the colours of the objects in the query image into ‘K’ clusters. Although this article refers to the method as KNN, the code follows OpenCV's k-means clustering workflow, so ‘K’ is the number of colour clusters rather than a number of nearest neighbours. Each segmented portion of the image can then be treated as an annotated part.

Bounding Boxes method

Import necessary libraries

import cv2 
import numpy as np
import matplotlib.pyplot as plt

Read the query image

Query image

Since we are using a coloured image in this article, we pass the ‘cv2.IMREAD_COLOR’ flag when reading it, which instructs OpenCV to load a colour picture. Any transparency in the picture will be ignored. This is the default setting, and we may also pass the integer value 1 for this flag.

Draw a line on the object

# Draw on a copy so that the original image stays untouched
image_line = input_img.copy()
cv2.line(image_line, (900,150), (1100,150), (0,255,255), thickness=5, lineType=cv2.LINE_AA)

The cv2.line function takes the coordinates of the start and end points of the line, followed by the colour (in BGR order), the thickness and the line type.

Analytics India Magazine

Draw a circle around the object

image_circle = input_img.copy()
cv2.circle(image_circle, (1030,340), 200, (0,255,255), thickness=5, lineType=cv2.LINE_AA)

The ‘cv2.circle’ function takes the centre coordinates and the radius of the circle as input. The rest is the same as for the line function discussed earlier.

Draw a rectangle around the object

image_rect = input_img.copy()
cv2.rectangle(image_rect, (900,150), (1100,530), (0,0,255), thickness=5, lineType=cv2.LINE_AA)

It takes the coordinates of the top-left corner and the bottom-right corner for drawing the rectangle.

KNN method for segmentation

Import necessary libraries

import cv2 
import numpy as np
import matplotlib.pyplot as plt

Reading and preprocessing 

img = cv2.cvtColor(input_img,cv2.COLOR_BGR2RGB)
image_reshape = img.reshape((-1,3))
image_2d = np.float32(image_reshape)

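We change the order of the colour channels because OpenCV reads an image as Blue, Green and Red (BGR), while Matplotlib expects Red, Green and Blue (RGB) for display. The image is then reshaped into a two-dimensional array with one row per pixel and three colour values per row, and converted to 32-bit floats for the clustering step.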

Applying the KNN

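# Stop after 100 iterations or once the accuracy (epsilon) of 1.0 is reached
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 100, 1.0)
K = 4
# 10 attempts, as described below; the KMEANS_RANDOM_CENTERS flag is an assumption
ret, label, center = cv2.kmeans(image_2d, K, None, criteria, 10, cv2.KMEANS_RANDOM_CENTERS)
center = np.uint8(center)
res = center[label.flatten()]
result_image = res.reshape((img.shape))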

Since the image is high-resolution, there are a lot of data points to go through, and it would take time if the number of iterations were high. We have limited the number of iterations to 100 and set the epsilon (accuracy) value to 1.0. The value of K is set to 4, with the number of attempts set to 10.

The algorithm has segmented the colours quite well. The blues, whites, greys, and browns can be seen separately. One could mask the image and further tune the algorithm.


One of the most time-consuming aspects of dealing with data is data gathering and annotation. Nonetheless, it serves as the foundation for training algorithms and must be executed with the greatest precision feasible. Proper annotation frequently saves a significant amount of time later in the pipeline when the model is being created. With this article, we have understood different types of annotations and their implementations.


Sourabh Mehta
Sourabh has worked as a full-time data scientist for an ISP organisation, experienced in analysing patterns and their implementation in product development. He has a keen interest in developing solutions for real-time problems with the help of data both in this universe and metaverse.
