Image annotation plays an important role in computer vision, the field that enables computers to obtain high-level understanding from digital images or videos and to observe and interpret visual information much as humans do. Annotation, also known as image labelling or tagging, is a crucial stage in the development of most computer vision models. This article focuses on creating these annotations with the help of OpenCV. The following are the topics to be covered.
Table of contents
- The Image annotation
- Need for image annotation
- Types of image annotation
- Implementing Image annotation with OpenCV
The better the quality of your annotations, the better your machine learning models will perform. Let’s understand image annotations.
The Image annotation
The process of labelling or tagging images in a dataset to train machine learning models is known as image annotation. Once the manual annotation is finished, the tagged images are processed by a machine learning or deep learning model to reproduce the annotations without the need for human intervention.
As a result, image annotation is used to indicate the features that your system needs to recognise. Supervised learning is the process of training an ML model on labelled data.
Image annotation establishes the criteria that the model attempts to reproduce, so any errors in the labelling are repeated as well. Correct image annotation therefore lays the groundwork for training neural networks, making annotation one of the most critical tasks in computer vision.
Image annotations can be done manually or with the help of an automatic annotation tool. Auto-annotation tools are often pre-trained algorithms that can accurately label images. They are especially useful for complex annotation tasks such as constructing segmentation masks, which take time to create by hand.
Need for image annotation
Labelling pictures is required for functional datasets because it informs the training model about the relevant aspects of the image (classes), which it can then use to identify those classes in fresh, never-before-seen images.
Image annotation generates the training data from which supervised AI models learn. The way images are annotated determines how the AI will behave after viewing and learning from them. As a result, poor annotation is frequently reflected in training, resulting in models that make poor predictions.
Annotated data is very important when tackling a unique challenge and using AI in a new domain. For typical tasks like image classification and segmentation, pre-trained models are frequently available, and they may be customised to specific use cases using Transfer Learning with minimum input.
Training a comprehensive model from scratch, on the other hand, frequently necessitates a massive quantity of annotated data divided into train, validation and test sets, which is difficult and time-consuming to generate. Unsupervised algorithms do not need annotated data and may be trained directly on raw data.
Types of image annotation
There are three prevalent methods of image annotation, and the one you choose for your use case will be determined by the project’s complexity. The more high-quality image data used for each kind, the more accurate the AI’s predictions will be.
Classification
Classification is the simplest and quickest approach for image annotation since it simply assigns one tag to a picture. For example, you could wish to go through and categorise a collection of photographs of grocery store shelves to determine which ones contain soda and which do not.
This approach is ideal for capturing abstract information, such as the example above, the time of day, or whether there are automobiles in the image, or for filtering out photographs that do not satisfy the criteria from the start. While classification is the quickest at providing a single, high-level label, it is also the most ambiguous of the three types we highlight since it does not identify where the item is within the image.
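As a rough illustration of how classification annotations are commonly stored, the snippet below maps image file names to a single tag each; the file names and tags are purely hypothetical.
# Hypothetical classification annotations: one tag per image.
classification_labels = {
    "shelf_001.jpg": "soda",
    "shelf_002.jpg": "no_soda",
    "shelf_003.jpg": "soda",
}
# Supervised training pipelines typically consume such labels as (image, tag) pairs.
samples = list(classification_labels.items())
print(samples[0])  # ('shelf_001.jpg', 'soda')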
Object Detection
Annotators are given particular objects to label in an image when using object detection. So, where classification would only say that a picture has ice cream in it, object detection goes a step further by indicating where the ice cream is inside the image, or where a specific kind, such as cocoa ice cream, is located. Object detection may be accomplished using a variety of approaches, including:
- Bounding Boxes: Annotators use rectangles and squares to define the position of target objects in 2D. This is one of the most commonly used image annotation approaches. Cuboids, also known as 3D bounding boxes, are used by annotators to specify the location and depth of a target object.
- Polygonal Segmentation: Annotators employ complex polygons to specify the position of target items that are irregularly shaped and do not fit neatly inside a box.
- Lines: Annotators detect essential boundary lines and curves in a picture to distinguish sections using lines and splines. Annotators may, for example, name the numerous lanes of a highway for a self-driving car picture annotation project.
This approach is still not the most precise since object detection allows boxes or lines to overlap. What it does offer is the general position of the object while remaining a fairly quick annotation procedure. A concrete example of such an annotation is sketched below.
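To make the bounding box idea above concrete, a single object detection annotation is usually just a class name plus box coordinates. The dictionary format below is only an illustrative sketch with made-up values, not a standard annotation format.
# Hypothetical bounding box annotation for one object in one image.
# (x, y) are pixel coordinates of the top-left and bottom-right corners.
annotation = {
    "image": "shelf_001.jpg",
    "label": "ice_cream",
    "box": {"x_min": 220, "y_min": 140, "x_max": 380, "y_max": 300},
}
box = annotation["box"]
width = box["x_max"] - box["x_min"]   # 160 pixels
height = box["y_max"] - box["y_min"]  # 160 pixels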
Semantic Segmentation
Semantic segmentation overcomes the overlap problem in object detection by ensuring that each component of an image belongs to just one class. This approach, which is usually done at the pixel level, requires annotators to assign a category (such as pedestrian, automobile or sign) to each pixel. This helps teach an AI model how to detect and categorise certain items even when they are obscured. For example, if a shopping cart is obscuring part of the image, semantic segmentation can be used to define what cocoa ice cream looks like down to the pixel level, allowing the model to know that it is still, in fact, cocoa ice cream.
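A semantic segmentation annotation is usually a mask the same size as the image, in which every pixel holds a class index. The tiny NumPy sketch below, with made-up class indices, shows the idea.
import numpy as np

# Hypothetical class indices: 0 = background, 1 = pedestrian, 2 = car.
mask = np.zeros((4, 6), dtype=np.uint8)  # a tiny 4x6 "image" for illustration
mask[1:3, 1:3] = 1                       # pixels belonging to a pedestrian
mask[2:4, 4:6] = 2                       # pixels belonging to a car
print(mask)                              # every pixel belongs to exactly one class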
Implementing Image annotation with OpenCV
In this article, we will be using bounding boxes and a colour segmentation method for image annotation.
In the bounding boxes method, we will manually draw different bounding shapes around the object and add some text to them.
In colour segmentation, we will be using the k-means clustering algorithm to segment the colours of the objects in the query image. The colours are segmented based on the value of ‘K’, which is the number of clusters, and the segmented portions of the image can be treated as the annotated parts.
Bounding Boxes method
Import necessary libraries
import cv2
import numpy as np
import matplotlib.pyplot as plt
Read the query image
input_img=cv2.imread('annotation_image.jpg',cv2.IMREAD_COLOR)

As we are using a coloured image in this article, we need to use the ‘cv2.IMREAD_COLOR’ flag, which instructs OpenCV to load a colour image. Any image transparency will be ignored. It is the default setting, and we may also pass the integer value 1 for this flag.
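For reference, OpenCV also provides flags for grayscale and unchanged loading; the snippet below simply lists the three common flags, whose integer values are 1, 0 and -1 respectively.
img_colour = cv2.imread('annotation_image.jpg', cv2.IMREAD_COLOR)         # 3 channels (BGR), the default
img_gray = cv2.imread('annotation_image.jpg', cv2.IMREAD_GRAYSCALE)       # single channel
img_unchanged = cv2.imread('annotation_image.jpg', cv2.IMREAD_UNCHANGED)  # keeps any alpha channel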
Draw a line on the object
image_line = input_img.copy()
cv2.line(image_line, (900,150), (1100,150), (0,255,255), thickness=5, lineType=cv2.LINE_AA)
plt.figure(figsize=(10,10))
plt.imshow(image_line[:,:,::-1])
plt.show()
cv2.line takes the coordinates of the start and end points of the line along with its colour, thickness and line type.

Draw a circle around the object
image_circle = input_img.copy()
cv2.circle(image_circle, (1030,340), 200, (0,255,255), thickness=5, lineType=cv2.LINE_AA)
plt.figure(figsize=(10,10))
plt.imshow(image_circle[:,:,::-1])
plt.show()
The ‘cv2.circle’ function takes the centre coordinates and the radius of the circle as input. The rest is the same as the line function discussed earlier.

Draw a rectangle around the object
image_rect = input_img.copy()
cv2.rectangle(image_rect, (900,150), (1100,530), (0,0,255), thickness=5, lineType=cv2.LINE_AA)
plt.figure(figsize=(10,10))
plt.imshow(image_rect[:,:,::-1])
plt.show()
It takes the coordinates of the top-left corner and the bottom-right corner for drawing the rectangle.
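Since we mentioned adding some text to the bounding shapes, a label can be drawn next to the rectangle with cv2.putText; the label string and its position below are just an example.
image_text = image_rect.copy()
# Place a hypothetical class label just above the rectangle drawn earlier.
cv2.putText(image_text, 'object', (900, 130), cv2.FONT_HERSHEY_SIMPLEX,
            1.5, (0, 0, 255), thickness=3, lineType=cv2.LINE_AA)
plt.figure(figsize=(10,10))
plt.imshow(image_text[:,:,::-1])
plt.show()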

K-means method for segmentation
Import necessary libraries
import cv2
import numpy as np
import matplotlib.pyplot as plt
Reading and preprocessing
img = cv2.cvtColor(input_img, cv2.COLOR_BGR2RGB)
image_reshape = img.reshape((-1,3))
image_2d = np.float32(image_reshape)
We change the order of the colour channels since OpenCV reads an image as Blue, Green and Red (BGR), whereas Red, Green and Blue (RGB) is required here. The image is then reshaped into a two-dimensional array of pixels and converted to float32 for clustering.
Applying k-means
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 100, 1.0)
K = 4
attempts = 10
ret, label, center = cv2.kmeans(image_2d, K, None, criteria, attempts, cv2.KMEANS_PP_CENTERS)
center = np.uint8(center)
res = center[label.flatten()]
result_image = res.reshape((img.shape))
Since the image is high resolution, there are a lot of data points to go through, and the clustering would take a long time if the number of iterations were high. We have therefore limited the number of iterations to 100 and set the epsilon (accuracy) value to 1.0. The number of clusters K is set to 4, with the number of attempts set to 10.
plt.figure(figsize=(10,10))
plt.imshow(result_image)
plt.show()

The algorithm has segmented the colours quite well; the blues, whites, greys and browns can be seen separated. One could mask the image and further tune the algorithm, as sketched below.
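As a rough sketch of the masking idea mentioned above, the labels returned by cv2.kmeans can be reshaped into an image-sized map and used to isolate a single cluster; cluster index 0 is chosen arbitrarily here.
cluster_id = 0                                      # arbitrary cluster to keep
label_map = label.flatten().reshape(img.shape[:2])  # per-pixel cluster indices
mask = np.uint8(label_map == cluster_id) * 255      # binary mask for that cluster
masked_img = cv2.bitwise_and(img, img, mask=mask)   # keep only the masked pixels
plt.figure(figsize=(10,10))
plt.imshow(masked_img)
plt.show()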
Conclusions
One of the most time-consuming aspects of dealing with data is data gathering and annotation. Nonetheless, it serves as the foundation for training algorithms and must be executed with the greatest precision feasible. Proper annotation frequently saves a significant amount of time later in the pipeline when the model is being created. With this article, we have understood different types of annotations and their implementations.