Getting Started With Object Detection Using TensorFlow

Object detection is the task of classifying and locating objects in an image using a deep learning model. It is crucial in autonomous computer vision applications such as robot navigation, self-driving vehicles, sports analytics and virtual reality.

Locating objects is done mostly with bounding boxes. Instance segmentation masks and keypoints are also used to locate objects, either on their own or along with bounding boxes. A bounding box is a simple rectangle that bounds an object. The representation of bounding boxes is standardized for interchangeability and reproducibility of object detection datasets and models. One widely used bounding box format is the COCO format, introduced by the “Common Objects in Context” dataset, a huge collection of annotated images prepared for object detection. This format describes a bounding box with four parameters: [top_left_x, top_left_y, width, height].
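As a standalone illustration (not from the original article), converting a COCO-format box to corner coordinates is a few lines of arithmetic; the helper name below is hypothetical:

```python
def coco_to_corners(box):
    """Convert a COCO-format box [top_left_x, top_left_y, width, height]
    to corner format [x_min, y_min, x_max, y_max]."""
    x, y, w, h = box
    return [x, y, x + w, y + h]

# A 100x50 box whose top-left corner sits at (20, 30):
print(coco_to_corners([20, 30, 100, 50]))  # [20, 30, 120, 80]
```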



In this article, we discuss how to perform object detection with a pre-trained EfficientDet model using TensorFlow. Google’s EfficientDet is one of the well-known object detection models, trained on the popular COCO 2017 dataset. This dataset has around 160,000 images spanning 80 object classes. The EfficientDet model’s training checkpoints are available open source and can be readily used in custom detection models via transfer learning.

This article assumes that the readers understand the fundamentals of deep learning, computer vision, image segmentation, and transfer learning. Nevertheless, the following articles may instantly fulfill the prerequisites:

  1. Getting Started With Deep Learning Using TensorFlow Keras
  2. Getting Started With Computer Vision Using TensorFlow Keras
  3. Exploring Transfer Learning Using TensorFlow Keras
  4. Getting Started with Semantic Segmentation Using TensorFlow Keras

Let’s dive deeper into hands-on learning. 

Object Detection API

TensorFlow’s Object Detection API is a useful tool for pre-processing and post-processing data in object detection pipelines. Its visualization module is built on top of Matplotlib and can draw images along with their coloured bounding boxes, class labels, keypoints and instance segmentation masks with fine control. Here, we use this API to post-process the inference results and visualize them.

Download the API from its source repository.

!git clone --depth 1 https://github.com/tensorflow/models.git



This clone brings many TensorFlow models at once. Install Object Detection API and its dependencies using the following commands.

 sudo apt install -y protobuf-compiler
 # change directory
 cd models/research/
 protoc object_detection/protos/*.proto --python_out=.
 cp object_detection/packages/tf2/setup.py .
 # install dependencies
 python -m pip install . 


Create the environment 

Import necessary libraries, frameworks and modules.

 import matplotlib
 import matplotlib.pyplot as plt
 import cv2
 import numpy as np
 import tensorflow as tf
 import tensorflow_hub as hub

 from object_detection.utils import label_map_util
 from object_detection.utils import visualization_utils as viz_utils
 from object_detection.utils import ops as utils_ops
 %matplotlib inline 

Prepare EfficientDet Model

Load the pre-trained model with weights from the TensorFlow Hub.

 # EfficientDet-D0 from TensorFlow Hub (512 x 512 input)
 model_url = 'https://tfhub.dev/tensorflow/efficientdet/d0/1'
 efficientdet = hub.load(model_url) 

The model is completely ready for deployment or inference. TensorFlow Hub has a great collection of ready-to-deploy pre-trained models. Models and their checkpoints can be loaded with a single line of code. 

Prepare some Data for Inference

Download some image data to perform inference. The following data source contains open-source images, each containing multiple objects suitable for detection. Clone the source and download the data onto the local (or virtual) machine.

!git clone



Read the images and save them in the required format in a list. EfficientDet receives input images in the shape [1, 512, 512, 3]. It does not support batching; it processes images one at a time. The shape supported by our version is 512 by 512 pixels; other versions of EfficientDet support 640 by 640, 768 by 768, 1024 by 1024, and so on. Each image should have 3 colour channels. A grayscale image must first be converted to this shape.
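A grayscale image can be brought to the required shape by repeating its single channel three times and adding a batch dimension; a minimal NumPy sketch with a synthetic image (not part of the original article):

```python
import numpy as np

# A hypothetical 512x512 grayscale image with pixel values in 0..255
gray = np.random.randint(0, 256, size=(512, 512), dtype=np.uint8)

# Repeat the single channel to obtain 3 colour channels ...
rgb = np.stack([gray, gray, gray], axis=-1)   # shape (512, 512, 3)

# ... and add a leading batch dimension of 1
batched = rgb[np.newaxis, ...]                # shape (1, 512, 512, 3)

print(batched.shape)  # (1, 512, 512, 3)
```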

 images = []
 # Read 10 images from the downloaded dataset
 for i in range(1,11):
     path = './dataset/Images/%03d.jpg'%i
     img = cv2.imread(path)
     # cv2 reads images in BGR format
     # convert BGR into RGB
     img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
     # EfficientDet expects 512 by 512
     img = tf.image.resize(img, (512,512))
     # EfficientDet expects uint8
     img = tf.cast(img, tf.uint8)
     # EfficientDet expects [1,512,512,3]
     img = tf.expand_dims(img, axis=0)
     # collect the pre-processed image
     images.append(img)

Check the image shape. Each entry in the images list should now have the shape [1, 512, 512, 3].



Sample an image and visualize it.

 img = images[0].numpy().reshape(512,512,3)
 plt.imshow(img)
 plt.show() 

The pixel values range from 0 to 255. EfficientDet expects images that are not scaled or normalized. Hence, our data is ready for inference.
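A quick sanity check on the value range, sketched here with a mock image rather than the downloaded data:

```python
import numpy as np

# A hypothetical pre-processed image: raw uint8 pixels, no scaling applied
img = np.random.randint(0, 256, size=(512, 512, 3), dtype=np.uint8)

# EfficientDet expects raw uint8 pixels: no /255 scaling, no mean subtraction
assert img.dtype == np.uint8
assert img.min() >= 0 and img.max() <= 255
print('ready for inference')
```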

Visualize all our test images. There are 10 images in our test data.

 for i in range(10):
     img = images[i].numpy().reshape(512,512,3)
     plt.imshow(img)
     plt.show() 



Inference – Object Detection

Perform inference with the EfficientDet Model on the pre-processed image data.

 results = []
 # infer and save results in a list
 for i in range(10):
     res = efficientdet(images[i])
     results.append(res)
 # what results do we obtain?
 print(results[0].keys())


Out of these results, we are interested in detection_boxes, detection_classes, and detection_scores, which are required to visualize the detections.
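As a standalone illustration (with a mock result in the same general layout, not the model’s actual output), detections are typically filtered by score before visualization:

```python
import numpy as np

# A mock result: one batch entry, boxes in [y_min, x_min, y_max, x_max] order
result = {
    'detection_boxes':   np.array([[[0.1, 0.1, 0.5, 0.5],
                                    [0.2, 0.3, 0.9, 0.8],
                                    [0.0, 0.0, 0.1, 0.1]]]),
    'detection_classes': np.array([[1.0, 18.0, 3.0]]),
    'detection_scores':  np.array([[0.95, 0.62, 0.08]]),
}

# Keep only detections scoring above a threshold
keep = result['detection_scores'][0] >= 0.4
boxes   = result['detection_boxes'][0][keep]
classes = result['detection_classes'][0][keep].astype(int)
print(classes.tolist())  # [1, 18]
```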

During inference, detected objects are reported as class numbers (integers). The following helper builds a mapping from these class numbers back to the original class names of the dataset the model was trained on.

 label = './models/research/object_detection/data/mscoco_label_map.pbtxt'
 category = label_map_util.create_category_index_from_labelmap(label,
                                                use_display_name=True)
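The category index is a plain dictionary keyed by class id; a mock with a few COCO entries (illustrative values, not read from the label map file):

```python
# Structure mirrors what create_category_index_from_labelmap returns
category = {
    1:  {'id': 1,  'name': 'person'},
    2:  {'id': 2,  'name': 'bicycle'},
    18: {'id': 18, 'name': 'dog'},
}

# Look up the display name for a detected class id
class_id = 18
print(category[class_id]['name'])  # dog
```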

Visualize Object Detection Results

Define a helper function that displays the input images and results on top of them. The bounding boxes, locations, class name, and colour are extracted from the results and displayed as images. 

 def display_detections(image, result):
     result = {key:val.numpy() for key,val in result.items()}
     viz_utils.visualize_boxes_and_labels_on_image_array(
         image,
         result['detection_boxes'][0],
         result['detection_classes'][0].astype(int),
         result['detection_scores'][0],
         category,
         use_normalized_coordinates=True,
         min_score_thresh=0.4)
     plt.imshow(image)
     plt.show()

Display the input images along with inferences made on them.

 for i in range(10):
     img = images[i].numpy().copy()[0]
     res = results[i]
     display_detections(img, res) 



This notebook contains the above code implementation.

Wrapping Up

In this article, we have discussed object detection and its standard data formats. We have walked through an object detection implementation with the well-known EfficientDet model pre-trained on the COCO 2017 dataset. We have learnt to perform object detection by loading a pre-trained model and its checkpoints, inferencing test images, post-processing the results and visualizing the detections with bounding boxes. 

Interested readers can choose a different version of the EfficientDet model or a different model (such as CenterNet, Faster R-CNN, Mask R-CNN and SSD), preprocess the data according to the model’s requirements and perform inference on their own data.



Rajkumar Lakshmanamoorthy
A geek in Machine Learning with a Master's degree in Engineering and a passion for writing and exploring new things. Loves reading novels, cooking, practicing martial arts, and occasionally writing novels and poems.
