Guide To Simple Object Detection Using InceptionResnet_v2

Object detection is considered one of the most challenging tasks in computer vision, a subfield of artificial intelligence, because it involves both classifying objects and localising them within an image or video. It detects objects such as animals, people, cars and buildings, and has been widely applied in video surveillance, self-driving cars and object tracking.

In standard image classification, we present an input image to a neural network and obtain a single class label for the most dominant object in the image, along with an associated probability score. Object detection builds on image classification: it additionally localises each object with a bounding box and reports a probability/confidence score for each detected class.
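To make the contrast concrete, the sketch below models the two kinds of output as plain Python data structures. The names `Classification`, `Detection` and `keep_confident`, as well as all the values, are hypothetical and used only for illustration; they do not come from any library.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Classification:
    """Image classification: one label and one score for the whole image."""
    label: str
    score: float


@dataclass
class Detection:
    """Object detection: a label, a score and a bounding box per object."""
    label: str
    score: float
    # Box as (ymin, xmin, ymax, xmax), normalised to the range [0, 1].
    box: Tuple[float, float, float, float]


def keep_confident(detections: List[Detection], min_score: float = 0.5) -> List[Detection]:
    """Return detections whose score is at least min_score, best first."""
    kept = [d for d in detections if d.score >= min_score]
    return sorted(kept, key=lambda d: d.score, reverse=True)


# Made-up values for illustration only.
classification = Classification(label="dog", score=0.91)
detections = [
    Detection("person", 0.40, (0.05, 0.60, 0.95, 0.90)),
    Detection("dog", 0.88, (0.30, 0.10, 0.90, 0.55)),
    Detection("ball", 0.67, (0.80, 0.45, 0.92, 0.55)),
]
confident = keep_confident(detections, min_score=0.5)
print([d.label for d in confident])
```

A classifier returns one `Classification` per image, whereas a detector returns a list of `Detection` records that is usually filtered by a confidence threshold before display.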

Several architectures are commonly used for object detection, such as the R-CNN family, SSD and YOLO.

In this article, we will perform object detection using transfer learning, with InceptionResnet_v2 as our pre-trained model.

Brief Introduction to the InceptionResnet_v2 Architecture

Deep convolutional networks have been central to image-based tasks. The Inception family of networks has shown that very high accuracy can be achieved at a relatively low computational cost. He et al. introduced residual connections and argued that they are inherently important for training very deep networks. Since Inception networks tend to be very deep, it is natural to replace the filter concatenation stage of the Inception architecture with residual connections. The authors of Inception-ResNet suggest that this allows Inception to reap the benefits of the residual approach while retaining its computational efficiency.
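The idea of wrapping a residual connection around an Inception-style block can be sketched in a few lines of NumPy. This is a toy illustration rather than the actual Inception-ResNet implementation: the branch functions, the projection matrix and the `scale` value are assumptions chosen for readability, and dense matrix products stand in for convolutions.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def inception_residual_block(x, branches, projection, scale=0.2):
    """Toy residual block: relu(x + scale * projection(concat(branches(x)))).

    x:          array of shape (n, d)
    branches:   list of functions mapping (n, d) -> (n, k_i)
    projection: array of shape (sum(k_i), d) that restores the input width,
                standing in for a linear 1x1 convolution
    scale:      factor applied to the residual before the addition (assumed value)
    """
    concatenated = np.concatenate([branch(x) for branch in branches], axis=1)
    residual = concatenated @ projection
    return relu(x + scale * residual)


rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
w1 = rng.normal(size=(8, 3))
w2 = rng.normal(size=(8, 5))
branches = [lambda t: relu(t @ w1), lambda t: relu(t @ w2)]
projection = rng.normal(size=(3 + 5, 8))

out = inception_residual_block(x, branches, projection)
print(out.shape)  # same shape as the input, so blocks can be stacked
```

Because the block's output has the same shape as its input, the input can be added back unchanged, which is what lets many such blocks be stacked.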

To understand these architectures in more depth, you can check these papers: 1, 2.

Now it’s time to implement object detection with InceptionResnet using Python.

Code Implementation of InceptionResnet_v2 Model

The following code is based on the official implementation.

Importing all dependencies:          
 import tensorflow as tf
 import tensorflow_hub as hub
 # for downloading and displaying image
 import matplotlib.pyplot as plt
 import tempfile
 from six.moves.urllib.request import urlopen
 from six import BytesIO
 # for dataframe
 import pandas as pd
 # for drawing onto the image
 import numpy as np
 from PIL import Image,ImageColor,ImageDraw,ImageFont,ImageOps
 import time 
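Helper functions to download, visualise and draw on the image:
 def disp_ima(image):
   # Show the image in a large figure without grid lines.
   fig = plt.figure(figsize=(18, 13))
   plt.grid(False)
   plt.imshow(image)
 def get_and_reshape_img(url, width=250, height=250, display=False):
   # Download the image, fit it to (width, height) and save it as a temporary JPEG.
   ruff, name = tempfile.mkstemp(suffix=".jpg")
   response = urlopen(url)
   image_data = response.read()
   image_data = BytesIO(image_data)
   pil_ima = Image.open(image_data)
   # Image.LANCZOS is the same filter as the older Image.ANTIALIAS alias, which Pillow 10 removed.
   pil_ima = ImageOps.fit(pil_ima, (width, height), Image.LANCZOS)
   pil_ima_rgb = pil_ima.convert("RGB")
   pil_ima_rgb.save(name, format="JPEG", quality=90)
   print("Image downloaded to %s." % name)
   if display:
     disp_ima(pil_ima)
   return name 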
 def boxes_on_image(image,ymin,xmin,ymax,xmax,color,font,thickness=4,display_str_list=()):
   # Draw one bounding box, given in normalised coordinates, plus its label strings.
   draw = ImageDraw.Draw(image)
   width, height = image.size
   (left, right, top, bottom) = (xmin * width, xmax * width,
                                 ymin * height, ymax * height)
   draw.line([(left, top), (left, bottom), (right, bottom), (right, top),
              (left, top)],
             width=thickness,
             fill=color)
   # font.getbbox returns (left, top, right, bottom); it replaces font.getsize, which Pillow 10 removed.
   display_heights = [font.getbbox(ds)[3] for ds in display_str_list]
   # Each display_str has a top and bottom margin of 0.05x.
   total_height = (1 + 2 * 0.05) * sum(display_heights)
   # If the label does not fit above the box, place it below the top edge instead.
   if top > total_height:
     text_bottom = top
   else:
     text_bottom = top + total_height
   # Reverse list and print from bottom to top.
   for display_str in display_str_list[::-1]:
     bbox = font.getbbox(display_str)
     text_width, text_height = bbox[2], bbox[3]
     margin = np.ceil(0.05 * text_height)
     draw.rectangle([(left, text_bottom - text_height - 2 * margin),
                     (left + text_width, text_bottom)],
                    fill=color)
     draw.text((left + margin, text_bottom - text_height - margin),
               display_str,
               fill="black",
               font=font)
     text_bottom -= text_height - 2 * margin 
 def drawing_boxes(image, boxes, class_names, scores, max_boxes=10, min_score=0.1):
   # Overlay labelled boxes for up to max_boxes detections scoring at least min_score.
   colors = list(ImageColor.colormap.values())
   try:
     font = ImageFont.truetype("/usr/share/fonts/truetype/liberation/LiberationSansNarrow-Regular.ttf",25)
   except IOError:
     print("Font not found, using default font.")
     font = ImageFont.load_default()
   for i in range(min(boxes.shape[0], max_boxes)):
     if scores[i] >= min_score:
       ymin, xmin, ymax, xmax = tuple(boxes[i])
       display_str = "{}: {}%".format(class_names[i].decode("ascii"),
                                      int(100 * scores[i]))
       color = colors[hash(class_names[i]) % len(colors)]
       image_pil = Image.fromarray(np.uint8(image)).convert("RGB")
       boxes_on_image(image_pil, ymin, xmin, ymax, xmax, color, font,
                      display_str_list=[display_str])
       np.copyto(image, np.array(image_pil))
   return image 
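The box-drawing helper relies on the detector returning boxes in normalised `(ymin, xmin, ymax, xmax)` form, which it multiplies by the image size to get pixel positions. A minimal, standalone sketch of that conversion follows; `to_pixel_box` is a hypothetical name used only here, and rounding to whole pixels is an added assumption.

```python
def to_pixel_box(box, image_width, image_height):
    """Convert a normalised (ymin, xmin, ymax, xmax) box to pixel
    coordinates in (left, top, right, bottom) order."""
    ymin, xmin, ymax, xmax = box
    left = round(xmin * image_width)
    right = round(xmax * image_width)
    top = round(ymin * image_height)
    bottom = round(ymax * image_height)
    return left, top, right, bottom


pixel_box = to_pixel_box((0.1, 0.2, 0.5, 0.6), image_width=1000, image_height=500)
print(pixel_box)
```

Note the change of order: the normalised box lists the y coordinates first, while drawing code typically wants x first.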
Download and display the image from a URL:
 img_url= ""
 web_img = get_and_reshape_img(img_url, 1250, 850, True) 
Running inference with the model:

Load the model from TensorFlow Hub

 module= "" 
 model = hub.load(module).signatures['default'] 

User-defined functions for loading the image and running the model

 def load_image(path):
   # Read the JPEG file from disk and decode it into a uint8 tensor with 3 channels.
   img = tf.io.read_file(path)
   img = tf.image.decode_jpeg(img, channels=3)
   return img 
 def run_model(model, path):
   img = load_image(path)
   # Convert to float32 in [0, 1] and add a batch dimension.
   converted_img  = tf.image.convert_image_dtype(img, tf.float32)[tf.newaxis, ...]
   start = time.time()
   result = model(converted_img)
   end = time.time()
   # Convert every output tensor to a NumPy array.
   result = {key:value.numpy() for key,value in result.items()}
   print("Found %d objects." % len(result["detection_scores"]))
   print("Inference time: ", end-start)
   image_with_boxes = drawing_boxes(
       img.numpy(), result["detection_boxes"],
       result["detection_class_entities"], result["detection_scores"])
   disp_ima(image_with_boxes)

run_model(model, web_img)
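Before inference, `run_model` converts the decoded `uint8` image to `float32` and adds a leading batch axis. The NumPy sketch below mimics that preprocessing step on a synthetic image so that the resulting shape and value range are easy to inspect. `to_model_input` is a hypothetical helper, and NumPy stands in for the TensorFlow ops here.

```python
import numpy as np


def to_model_input(image_uint8):
    """Scale a (H, W, 3) uint8 image to float32 in [0, 1] and add a batch axis."""
    if image_uint8.dtype != np.uint8:
        raise ValueError("expected a uint8 image")
    scaled = image_uint8.astype(np.float32) / 255.0
    return scaled[np.newaxis, ...]


# Synthetic 4x6 RGB image used only for illustration.
image = np.zeros((4, 6, 3), dtype=np.uint8)
image[0, 0] = (255, 128, 0)
batch = to_model_input(image)
print(batch.shape, batch.dtype)
```

The leading axis of size 1 is there because the model expects a batch of images, even when only a single image is passed.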

With the help of pandas, we can check the score of each identified object; here, we will look at the top 10 objects.

 image = load_image(web_img)
 converted_image  = tf.image.convert_image_dtype(image, tf.float32)[tf.newaxis, ...]
 result = model(converted_image) 
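One way to tabulate the scores with pandas is sketched below. It assumes the model output has already been converted to NumPy arrays, as `run_model` does, and that it contains the `detection_class_entities` and `detection_scores` keys used earlier. The helper name `top_detections` and the sample values are made up for illustration.

```python
import numpy as np
import pandas as pd


def top_detections(result, n=10):
    """Return the n highest-scoring detections as a DataFrame."""
    frame = pd.DataFrame({
        "entity": [name.decode("ascii") for name in result["detection_class_entities"]],
        "score": result["detection_scores"],
    })
    return frame.sort_values("score", ascending=False).head(n).reset_index(drop=True)


# Made-up arrays standing in for real model output.
sample_result = {
    "detection_class_entities": np.array([b"Tree", b"Person", b"Car"]),
    "detection_scores": np.array([0.35, 0.92, 0.71], dtype=np.float32),
}
table = top_detections(sample_result, n=2)
print(table)
```

With real model output, the same call would be `top_detections({key: value.numpy() for key, value in result.items()})`, leaving `n` at its default of 10.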


In this article, we have seen how combining residual connections with the Inception architecture improves the overall performance of the network and yields accurate bounding boxes for each entity in the given images. Furthermore, with this minimal code, the system can be deployed to web and Android platforms.

