
Guide To Simple Object Detection Using InceptionResnet_v2


Object detection is considered one of the most challenging tasks in computer vision, a subfield of artificial intelligence, because it involves both classifying objects and localising them within an image or video. It detects objects such as animals, people, cars and buildings, and it has been widely applied in video surveillance, self-driving cars and object tracking.

In standard image classification, we present an input image to the neural network and obtain a single class or label for the most dominant object in the image, together with an associated probability score. Object detection builds on image classification: it also localises each object with a bounding box and assigns a probability/confidence score to each detected class.
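To make the difference concrete, here is a minimal Python sketch of the two kinds of output. The names `classify`, `Detection` and `keep_confident` are hypothetical and not part of any library; the (ymin, xmin, ymax, xmax) normalised box layout and the 0.1 threshold mirror the drawing helpers used later in the article.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

import numpy as np


def classify(probs: Sequence[float], labels: Sequence[str]) -> Tuple[str, float]:
    """Classification: a single label (the most probable class) and its score."""
    i = int(np.argmax(probs))
    return labels[i], float(probs[i])


@dataclass
class Detection:
    """Detection (hypothetical type): a label, a confidence score and a box per object."""
    label: str
    score: float
    box: Tuple[float, float, float, float]  # (ymin, xmin, ymax, xmax), normalised to [0, 1]


def keep_confident(detections: List[Detection], min_score: float = 0.1) -> List[Detection]:
    """Keep only detections whose score reaches the threshold."""
    return [d for d in detections if d.score >= min_score]
```

A classifier answers "what is the main object?", so `classify([0.1, 0.7, 0.2], ["cat", "dog", "car"])` yields `("dog", 0.7)`, while a detector returns a list of `Detection` records, one per object found.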



Several architectures are commonly used for object detection.

In this article, we perform object detection using transfer learning, with InceptionResnet_v2 as our pre-trained model.

Brief introduction to the InceptionResnet_v2 architecture:

Deep convolutional networks have been central to image-based tasks. The Inception family of networks has shown that it can achieve very high accuracy at a relatively low computational cost. He et al. introduced residual connections to deep learning and argued that they are inherently important for training very deep networks. Because Inception networks are very deep, it is natural to replace the filter concatenation stage of the Inception block with a residual connection. The authors of Inception-ResNet reasoned that this would let Inception reap the benefits of the residual approach while retaining its computational efficiency.
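The idea can be illustrated with a simplified NumPy sketch. It is not the real InceptionResnet_v2 block: to keep it short, every branch is a single 1x1 convolution (written as a per-pixel matrix product), and there is no batch normalisation. What it does show is the key change: the concatenated branch outputs are projected back to the input depth, scaled down by a small factor (0.1 here, an assumed value) and added to the block input. The function names are hypothetical.

```python
import numpy as np


def conv1x1(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """1x1 convolution as a per-pixel matrix product over channels.
    x: (H, W, C_in), w: (C_in, C_out) -> (H, W, C_out)."""
    return x @ w


def relu(x: np.ndarray) -> np.ndarray:
    return np.maximum(x, 0.0)


def inception_resnet_block(x, branch_weights, proj_weight, scale=0.1):
    """Simplified Inception-ResNet block (1x1 branches only, hypothetical sketch)."""
    # parallel Inception-style branches, each applied to the same input
    branches = [relu(conv1x1(x, w)) for w in branch_weights]
    # filter concatenation along the channel axis
    mixed = np.concatenate(branches, axis=-1)
    # linear 1x1 projection back to the input depth C_in
    residual = conv1x1(mixed, proj_weight)
    # residual (shortcut) connection: add the scaled branch output to the input
    return relu(x + scale * residual)
```

Because the block adds to its input rather than replacing it, the output keeps the input's shape, and with a zero projection the block reduces to `relu(x)`, which is what lets very deep stacks of such blocks train easily.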

For a deeper understanding of the architectures, see these papers: 1, 2.

Now it’s time to implement object detection with InceptionResnet_v2 in Python.

Code Implementation of InceptionResnet_v2 Model

The following code is adapted from the official implementation.

Importing all dependencies:          
 import tensorflow as tf
 import tensorflow_hub as hub
 # for downloading and displaying image
 import matplotlib.pyplot as plt
 import tempfile
 from six.moves.urllib.request import urlopen
 from six import BytesIO
 # for dataframe
 import pandas as pd
 # for drawing onto the image
 import numpy as np
 from PIL import Image,ImageColor,ImageDraw,ImageFont,ImageOps
 import time 
Helper functions to download, visualise and draw on images:
 def disp_ima(image):
   fig = plt.figure(figsize=(18, 13))
   plt.grid(False)
   plt.imshow(image)
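 def get_and_reshape_img(url, width=250, height=250, display=False):
   # download the image into a temporary .jpg file
   _, name = tempfile.mkstemp(suffix=".jpg")
   response = urlopen(url)
   image_data = response.read()
   image_data = BytesIO(image_data)
   pil_ima = Image.open(image_data)
   # resize and crop to exactly (width, height); LANCZOS is the filter formerly named ANTIALIAS
   pil_ima = ImageOps.fit(pil_ima, (width, height), Image.LANCZOS)
   pil_ima_rgb = pil_ima.convert("RGB")
   pil_ima_rgb.save(name, format="JPEG", quality=90)
   print("Image downloaded to %s." % name)
   if display:
     disp_ima(pil_ima)
   return name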
 def boxes_on_image(image,ymin,xmin,ymax,xmax,color,font,thickness=4,display_str_list=()):
   draw = ImageDraw.Draw(image)
   width, height = image.size
   # box coordinates are normalised to [0, 1]; convert them to pixels
   (left, right, top, bottom) = (xmin * width, xmax * width,
                                 ymin * height, ymax * height)
   draw.line([(left, top), (left, bottom), (right, bottom), (right, top),
              (left, top)], width=thickness, fill=color)
   # font.getsize() was removed in Pillow 10; getbbox() returns (left, top, right, bottom)
   display_heights = [font.getbbox(ds)[3] for ds in display_str_list]
   # Each display_str has a top and bottom margin of 0.05x.
   total_height = (1 + 2 * 0.05) * sum(display_heights)
   # put the labels above the box if they fit, otherwise just inside it
   if top > total_height:
     text_bottom = top
   else:
     text_bottom = top + total_height
   # Reverse list and print from bottom to top.
   for display_str in display_str_list[::-1]:
     _, _, text_width, text_height = font.getbbox(display_str)
     margin = np.ceil(0.05 * text_height)
     draw.rectangle([(left, text_bottom - text_height - 2 * margin),
                     (left + text_width, text_bottom)], fill=color)
     draw.text((left + margin, text_bottom - text_height - margin),
               display_str, fill="black", font=font)
     # move up by one label (text height plus both margins)
     text_bottom -= text_height + 2 * margin
 def drawing_boxes(image, boxes, class_names, scores, max_boxes=10, min_score=0.1):
   colors = list(ImageColor.colormap.values())
   try:
     font = ImageFont.truetype("/usr/share/fonts/truetype/liberation/LiberationSansNarrow-Regular.ttf",25)
   except IOError:
     print("Font not found, using default font.")
     font = ImageFont.load_default()
   # assumes detections are sorted by descending score, so the first max_boxes are the most confident
   for i in range(min(boxes.shape[0], max_boxes)):
     if scores[i] >= min_score:
       ymin, xmin, ymax, xmax = tuple(boxes[i])
       display_str = "{}: {}%".format(class_names[i].decode("ascii"),
                                      int(100 * scores[i]))
       color = colors[hash(class_names[i]) % len(colors)]
       image_pil = Image.fromarray(np.uint8(image)).convert("RGB")
       boxes_on_image(image_pil, ymin, xmin, ymax, xmax, color, font,
                      display_str_list=[display_str])
       np.copyto(image, np.array(image_pil))
   return image
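The drawing helpers rely on two small pieces of arithmetic: converting a normalised (ymin, xmin, ymax, xmax) box into pixel coordinates, and deciding whether the label fits above the box or must be drawn just inside it. The sketch below pulls that logic out of `boxes_on_image` into pure functions so it can be checked without TensorFlow or Pillow; `to_pixel_box` and `label_bottom` are hypothetical names.

```python
from typing import Sequence, Tuple


def to_pixel_box(box: Tuple[float, float, float, float], width: int, height: int):
    """Normalised (ymin, xmin, ymax, xmax) -> pixel (left, right, top, bottom)."""
    ymin, xmin, ymax, xmax = box
    return xmin * width, xmax * width, ymin * height, ymax * height


def label_bottom(top: float, text_heights: Sequence[float], margin: float = 0.05) -> float:
    """Bottom edge of the label stack: at the top of the box if the labels
    fit above it, otherwise pushed down so they are drawn inside the box."""
    total_height = (1 + 2 * margin) * sum(text_heights)
    return top if top > total_height else top + total_height
```

For example, on an 800x400 image the box (0.25, 0.5, 0.75, 1.0) spans x from 400 to 800 and y from 100 to 300; a box touching the top edge gets its label drawn inside it instead of off-image.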
Download and display the image from a URL:
 img_url= ""
 web_img = get_and_reshape_img(img_url, 1250, 850, True) 
Running inference with the model:

Load the model from TensorFlow Hub:

 module= "" 
 model = hub.load(module).signatures['default'] 

User-defined functions for loading the image and running the model:

 def load_image(path):
   # read the raw file bytes and decode them into a uint8 tensor of shape (H, W, 3)
   img = tf.io.read_file(path)
   img = tf.image.decode_jpeg(img, channels=3)
   return img
 def run_model(model, path):
   img = load_image(path)
   # scale pixels to float32 in [0, 1] and add a batch dimension: (1, H, W, 3)
   converted_img = tf.image.convert_image_dtype(img, tf.float32)[tf.newaxis, ...]
   start = time.time()
   result = model(converted_img)
   end = time.time()
   result = {key: value.numpy() for key, value in result.items()}
   print("Found %d objects." % len(result["detection_scores"]))
   print("Inference time: ", end - start)
   image_with_boxes = drawing_boxes(
       img.numpy(), result["detection_boxes"],
       result["detection_class_entities"], result["detection_scores"])
   disp_ima(image_with_boxes)

 run_model(model, web_img)
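The input-preparation line in `run_model` does two things: `tf.image.convert_image_dtype` turns uint8 pixels (0 to 255) into float32 values in [0, 1], and `[tf.newaxis, ...]` adds a leading batch dimension, since the model takes a batch of images. A NumPy sketch of the same transformation follows; `to_model_input` is a hypothetical helper, and its values match TensorFlow's only up to floating-point rounding.

```python
import numpy as np


def to_model_input(img_uint8: np.ndarray) -> np.ndarray:
    """NumPy analogue of
    tf.image.convert_image_dtype(img, tf.float32)[tf.newaxis, ...]
    for a uint8 image of shape (H, W, 3)."""
    if img_uint8.dtype != np.uint8:
        raise TypeError("expected a uint8 image")
    # rescale 0..255 integers to 0.0..1.0 floats
    scaled = img_uint8.astype(np.float32) / 255.0
    # add the batch axis: (H, W, 3) -> (1, H, W, 3)
    return scaled[np.newaxis, ...]
```

Skipping the rescaling step would feed the model values up to 255 instead of 1, which is a common source of silently poor detections.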

With the help of pandas, we can check the score of each object identified; here, we check the top 10 objects.

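 image = load_image(web_img)
 converted_image = tf.image.convert_image_dtype(image, tf.float32)[tf.newaxis, ...]
 result = {key: value.numpy() for key, value in model(converted_image).items()}
 entities = [name.decode("ascii") for name in result["detection_class_entities"]]
 df = pd.DataFrame({"Entity": entities, "Score": result["detection_scores"]})
 print(df.head(10))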
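Taking the first ten rows gives the top 10 only if the model returns detections ordered by score. A hypothetical helper, `top_detections`, that filters and sorts explicitly with pandas avoids depending on that ordering; it accepts class names as bytes (as the model returns them) or as strings.

```python
import pandas as pd


def top_detections(entities, scores, k=10, min_score=0.0):
    """Tabulate detections and return the k highest-scoring ones.

    entities: class names as bytes or str; scores: matching confidences."""
    names = [e.decode("ascii") if isinstance(e, bytes) else e for e in entities]
    df = pd.DataFrame({"Entity": names, "Score": scores})
    # drop low-confidence detections, then keep the k best, highest first
    df = df[df["Score"] >= min_score]
    return df.nlargest(k, "Score").reset_index(drop=True)
```

Applied to the model output, this would be called as `top_detections(result["detection_class_entities"], result["detection_scores"], k=10)`.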


In this article, we have seen how combining residual connections with the Inception architecture enhances the overall performance of the network and gives accurate bounding boxes for the entities in the given images. Because the pre-trained model is loaded from TensorFlow Hub with only a few lines of code, the same approach can serve as a starting point for deployment to web and Android platforms.



Vijaysinh Lendave
Vijaysinh is an enthusiast in machine learning and deep learning. He is skilled in ML algorithms, data manipulation, handling and visualization, and model building.
