Guide To Simple Object Detection Using InceptionResnet_v2


Object detection is a computer vision technique that detects objects such as animals, people, cars and buildings in images or video. It is considered one of the most challenging tasks in computer vision, a subfield of artificial intelligence, because it involves both classifying objects and localising them within the image or video frame. It has been widely applied in video surveillance, self-driving cars and object tracking.

In standard image classification, we present an input image to the neural network and obtain a single class label for the most dominant object in the image, along with its probability score. Object detection builds on image classification: it also localises each object with a bounding box and reports a probability/confidence score for each detected class.
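To make the difference in outputs concrete, here is a minimal sketch in plain Python/NumPy. The `Detection` record, the `classify` and `detect` functions, the 0.5 score threshold and all input values are hypothetical, chosen only for illustration; the (ymin, xmin, ymax, xmax) box order with coordinates normalised to [0, 1] matches the order the drawing code in this article unpacks.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

import numpy as np


@dataclass
class Detection:
    label: str
    score: float
    box: Tuple[float, ...]  # (ymin, xmin, ymax, xmax), normalised to [0, 1]


def classify(probs: Sequence[float], labels: Sequence[str]) -> Tuple[str, float]:
    """Image classification: a single label (the most probable class) for the whole image."""
    i = int(np.argmax(probs))
    return labels[i], float(probs[i])


def detect(boxes: np.ndarray, scores: np.ndarray, labels: Sequence[str],
           min_score: float = 0.5) -> List[Detection]:
    """Object detection: one (label, score, box) record per object above a score threshold."""
    return [
        Detection(label, float(score), tuple(float(c) for c in box))
        for box, score, label in zip(boxes, scores, labels)
        if score >= min_score
    ]


# Hypothetical model outputs, for illustration only.
print(classify([0.1, 0.7, 0.2], ["cat", "dog", "car"]))
boxes = np.array([[0.1, 0.2, 0.5, 0.6], [0.4, 0.5, 0.9, 0.95], [0.0, 0.0, 0.1, 0.1]])
scores = np.array([0.92, 0.81, 0.12])
for d in detect(boxes, scores, ["dog", "person", "car"]):
    print(d)
```

The classifier collapses everything to one answer, while the detector keeps every sufficiently confident object together with where it is.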

A few architectures are commonly used for object detection.

Today in this article, we are going to perform object detection using a transfer learning method. We will be using InceptionResnet_v2 as our pre-trained model for this task.

Brief Introduction to the InceptionResnet_v2 architecture:

Deep convolutional networks have been central to image-based tasks. Inception networks have been shown to achieve very high accuracy at a relatively low computational cost. K. He et al. introduced residual connections to deep learning and demonstrated their importance for training very deep networks. Since Inception networks are themselves very deep, it is natural to replace the filter concatenation stage of the Inception block with a residual connection. The authors argue that this lets Inception reap all the benefits of the residual approach while retaining its computational efficiency.
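The core idea, adding the output of the Inception branches back onto the block's input instead of passing the concatenated filters straight on, can be sketched in NumPy. This is a deliberately simplified toy under stated assumptions, not the actual InceptionResnet_v2 block: every branch is a single 1×1 convolution (a per-pixel matrix multiply over channels), the weights are random, and `conv1x1` and `inception_resnet_block` are hypothetical names. The concatenated branch outputs are projected back to the input's channel count by a linear 1×1 convolution, scaled down (the Inception-ResNet paper scales residuals by a small factor, roughly 0.1 to 0.3, to stabilise training) and added to the input.

```python
import numpy as np


def conv1x1(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """1x1 convolution as a per-pixel linear map: (H, W, C_in) @ (C_in, C_out) -> (H, W, C_out)."""
    return x @ w


def relu(x: np.ndarray) -> np.ndarray:
    return np.maximum(x, 0.0)


def inception_resnet_block(x, branch_weights, proj_weight, scale=0.1):
    """Toy Inception-ResNet block. Simplification (assumption): every branch is one 1x1 conv;
    real blocks also use larger (e.g. 3x3) convolutions and batch normalisation."""
    # 1. Inception part: parallel branches over the same input.
    branches = [relu(conv1x1(x, w)) for w in branch_weights]
    # 2. Filter concatenation along the channel axis.
    mixed = np.concatenate(branches, axis=-1)
    # 3. Linear 1x1 projection back to the input's channel count (no activation here).
    residual = conv1x1(mixed, proj_weight)
    # 4. Residual connection: the scaled residual is added to the unchanged input.
    return relu(x + scale * residual)


rng = np.random.default_rng(0)
H, W, C = 8, 8, 32
x = rng.standard_normal((H, W, C))
branch_weights = [0.1 * rng.standard_normal((C, 16)) for _ in range(3)]  # 3 branches, 16 filters each
proj_weight = 0.1 * rng.standard_normal((3 * 16, C))                     # 48 -> 32 channels
y = inception_resnet_block(x, branch_weights, proj_weight)
print(x.shape, "->", y.shape)
```

Because the projection restores the input's channel count, the output has the same shape as the input, which is what allows such blocks to be stacked many times; with plain filter concatenation the channel count would instead become the sum of the branch widths (48 here).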

To learn more about these architectures, refer to these papers: 1, 2.

Now it’s time to implement object detection with InceptionResnet_v2 in Python.

Code Implementation of InceptionResnet_v2 Model

The following code is adapted from the official TensorFlow Hub object detection example.

Importing all dependencies:
 import tensorflow as tf
 import tensorflow_hub as hub
 # for downloading and displaying the image
 import matplotlib.pyplot as plt
 import tempfile
 from urllib.request import urlopen
 from io import BytesIO
 # for the dataframe
 import pandas as pd
 # for drawing onto the image
 import numpy as np
 from PIL import Image, ImageColor, ImageDraw, ImageFont, ImageOps
 # for timing inference
 import time
Helper functions to download, display and draw on the image:
 def disp_ima(image):
   fig = plt.figure(figsize=(18, 13))
   plt.grid(False)
   plt.imshow(image)
   plt.show()
 def get_and_reshape_img(url, width=250, height=250, display=False):
   # temporary .jpg file that will hold the downloaded image
   _, name = tempfile.mkstemp(suffix=".jpg")
   response = urlopen(url)
   image_data = response.read()
   image_data = BytesIO(image_data)
   pil_ima = Image.open(image_data)
   # crop to the target aspect ratio and resize (Image.ANTIALIAS was removed in Pillow 10; LANCZOS is the same filter)
   pil_ima = ImageOps.fit(pil_ima, (width, height), Image.LANCZOS)
   pil_ima_rgb = pil_ima.convert("RGB")
   pil_ima_rgb.save(name, format="JPEG", quality=90)
   print("Image downloaded to %s." % name)
   if display:
     disp_ima(pil_ima)
   return name
 def boxes_on_image(image, ymin, xmin, ymax, xmax, color, font, thickness=4, display_str_list=()):
   """Draw one box (normalised coordinates) and its labels onto a PIL image."""
   draw = ImageDraw.Draw(image)
   width, height = image.size
   (left, right, top, bottom) = (xmin * width, xmax * width,
                                 ymin * height, ymax * height)
   draw.line([(left, top), (left, bottom), (right, bottom), (right, top),
              (left, top)],
             width=thickness, fill=color)
   # font.getbbox() returns (left, top, right, bottom); font.getsize() was removed in Pillow 10
   display_heights = [font.getbbox(ds)[3] for ds in display_str_list]
   # Each display_str has a top and bottom margin of 0.05x.
   total_height = (1 + 2 * 0.05) * sum(display_heights)
   # Place the labels above the box if they fit, otherwise just inside its top edge.
   if top > total_height:
     text_bottom = top
   else:
     text_bottom = top + total_height
   # Reverse list and print from bottom to top.
   for display_str in display_str_list[::-1]:
     _, _, text_width, text_height = font.getbbox(display_str)
     margin = np.ceil(0.05 * text_height)
     draw.rectangle([(left, text_bottom - text_height - 2 * margin),
                     (left + text_width, text_bottom)],
                    fill=color)
     draw.text((left + margin, text_bottom - text_height - margin),
               display_str, fill="black", font=font)
     # move up by the full label height, including both margins
     text_bottom -= text_height + 2 * margin
 def drawing_boxes(image, boxes, class_names, scores, max_boxes=10, min_score=0.1):
   """Overlay labelled boxes on a uint8 image array (modified in place and returned)."""
   colors = list(ImageColor.colormap.values())
   try:
     font = ImageFont.truetype("/usr/share/fonts/truetype/liberation/LiberationSansNarrow-Regular.ttf", 25)
   except IOError:
     print("Font not found, using default font.")
     font = ImageFont.load_default()
   for i in range(min(boxes.shape[0], max_boxes)):
     if scores[i] >= min_score:
       ymin, xmin, ymax, xmax = tuple(boxes[i])
       display_str = "{}: {}%".format(class_names[i].decode("ascii"),
                                      int(100 * scores[i]))
       color = colors[hash(class_names[i]) % len(colors)]
       image_pil = Image.fromarray(np.uint8(image)).convert("RGB")
       boxes_on_image(image_pil, ymin, xmin, ymax, xmax, color, font,
                      display_str_list=[display_str])
       np.copyto(image, np.array(image_pil))
   return image
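The drawing helpers rely on box coordinates normalised to [0, 1] in (ymin, xmin, ymax, xmax) order: `boxes_on_image` scales them by the image size and places the label stack above the box when there is room, otherwise just inside its top edge. As a minimal sketch, the same arithmetic can be pulled out into two pure functions and checked without TensorFlow or an image; `to_pixel_box` and `label_bottom` are hypothetical names introduced only for this illustration, and the box values are made up.

```python
from typing import Sequence, Tuple


def to_pixel_box(box: Tuple[float, float, float, float],
                 width: int, height: int) -> Tuple[float, float, float, float]:
    """Normalised (ymin, xmin, ymax, xmax) -> pixel (left, right, top, bottom), as in boxes_on_image."""
    ymin, xmin, ymax, xmax = box
    return xmin * width, xmax * width, ymin * height, ymax * height


def label_bottom(top: float, label_heights: Sequence[float], margin_frac: float = 0.05) -> float:
    """y-coordinate of the bottom of the label stack: above the box if it fits, else inside it."""
    total_height = (1 + 2 * margin_frac) * sum(label_heights)
    return top if top > total_height else top + total_height


# A hypothetical box on a 1250x850 image, with one 25-pixel-high label.
left, right, top, bottom = to_pixel_box((0.1, 0.2, 0.5, 0.6), width=1250, height=850)
print(left, right, top, bottom)
print(label_bottom(top, [25]))   # enough room above the box
print(label_bottom(10.0, [25]))  # too close to the top edge: label goes inside the box
```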
Download and display the image from the URL:
 img_url = ""
 web_img = get_and_reshape_img(img_url, 1250, 850, display=True)
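The resize step in `get_and_reshape_img` is assumed here to use Pillow's `ImageOps.fit` (the imported `ImageOps` module is not used anywhere else), which crops the image around its centre to the target aspect ratio before scaling, rather than stretching it as `Image.resize` does. The self-contained sketch below illustrates the difference on a synthetic striped image; it is not part of the original code.

```python
from PIL import Image, ImageOps

# Synthetic 400x200 image with red | green | blue vertical stripes.
img = Image.new("RGB", (400, 200), (0, 255, 0))
img.paste((255, 0, 0), (0, 0, 100, 200))    # left stripe
img.paste((0, 0, 255), (300, 0, 400, 200))  # right stripe

# ImageOps.fit: centre-crop to the target aspect ratio, then resize (no distortion).
fitted = ImageOps.fit(img, (100, 100), Image.LANCZOS)
# Image.resize: scale each axis independently (the 2:1 image is squashed to 1:1).
squashed = img.resize((100, 100), Image.LANCZOS)

print(fitted.size, fitted.getpixel((50, 50)))
print(squashed.size, squashed.getpixel((5, 50)))
```

The fitted image keeps only the central green band at its original proportions, while the squashed one still shows all three stripes, horizontally compressed.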
Running inference:

Load the model from TensorFlow Hub:

 # handle (URL) of the detection model on TensorFlow Hub
 module = ""
 model = hub.load(module).signatures['default']

User-defined functions for loading the image and running the model:

 def load_image(path):
   # read the file and decode it into a uint8 tensor of shape (height, width, 3)
   img = tf.io.read_file(path)
   img = tf.image.decode_jpeg(img, channels=3)
   return img
 def run_model(model, path):
   img = load_image(path)
   # scale pixel values to float32 in [0, 1] and add a batch dimension
   converted_img = tf.image.convert_image_dtype(img, tf.float32)[tf.newaxis, ...]
   start = time.time()
   result = model(converted_img)
   end = time.time()
   result = {key: value.numpy() for key, value in result.items()}
   print("Found %d objects." % len(result["detection_scores"]))
   print("Inference time: ", end - start)
   image_with_boxes = drawing_boxes(
       img.numpy(), result["detection_boxes"],
       result["detection_class_entities"], result["detection_scores"])
   disp_ima(image_with_boxes)

 run_model(model, web_img)
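Inside `run_model`, the decoded JPEG (a `uint8` tensor of shape height × width × 3) is converted to `float32` in [0, 1] and given a leading batch dimension before it is passed to the detector. For a `uint8` image, that conversion amounts to dividing by 255 (up to floating-point rounding). The NumPy sketch below mirrors it so the shapes and value range can be inspected without loading the model; `to_model_input` is a hypothetical name, and the all-black test image is made up.

```python
import numpy as np


def to_model_input(img_uint8: np.ndarray) -> np.ndarray:
    """NumPy counterpart (for uint8 input) of
    tf.image.convert_image_dtype(img, tf.float32)[tf.newaxis, ...]."""
    if img_uint8.dtype != np.uint8:
        raise TypeError("expected a uint8 image")
    scaled = img_uint8.astype(np.float32) / 255.0  # [0, 255] -> [0.0, 1.0]
    return scaled[np.newaxis, ...]                 # (H, W, 3) -> (1, H, W, 3)


img = np.zeros((850, 1250, 3), dtype=np.uint8)  # same size as the downloaded image
img[0, 0] = [255, 128, 0]
batch = to_model_input(img)
print(batch.shape, batch.dtype, batch[0, 0, 0])
```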

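With the help of pandas, we can check the score of each identified object; here, we look at the top 10.

 image = load_image(web_img)
 converted_image = tf.image.convert_image_dtype(image, tf.float32)[tf.newaxis, ...]
 result = {key: value.numpy() for key, value in model(converted_image).items()}
 entities = [n.decode("ascii") for n in result["detection_class_entities"]]  # bytes -> str
 data = pd.DataFrame({"Entity": entities, "Score": result["detection_scores"]})
 data.sort_values("Score", ascending=False).head(10)  # top 10 detections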
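The model's output dictionary can also be tabulated and filtered offline. The sketch below builds a pandas table from the `detection_class_entities` and `detection_scores` keys, keeps rows above an optional score threshold, sorts them best first and counts detections per class. The function `detections_table` and its `min_score` parameter are hypothetical additions, and `dummy_result` holds placeholder values standing in for the model output, not real detections.

```python
import numpy as np
import pandas as pd


def detections_table(result: dict, top_k: int = 10, min_score: float = 0.0) -> pd.DataFrame:
    """Tabulate a detector's output dict as (entity, score) rows, highest score first."""
    df = pd.DataFrame({
        # Class names come back as bytes; decode them for display.
        "entity": [name.decode("ascii") for name in result["detection_class_entities"]],
        "score": result["detection_scores"],
    })
    df = df[df["score"] >= min_score]
    return df.sort_values("score", ascending=False).head(top_k).reset_index(drop=True)


# Placeholder values standing in for the model output (not real detections).
dummy_result = {
    "detection_class_entities": np.array([b"Car", b"Person", b"Wheel", b"Car"]),
    "detection_scores": np.array([0.61, 0.93, 0.47, 0.88], dtype=np.float32),
}
table = detections_table(dummy_result, top_k=3)
print(table)
print(table.groupby("entity").size())  # detections per class among the kept rows
```

Sorting explicitly keeps the sketch correct whether or not the detector already returns its detections in score order.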


In this article, we have seen how combining residual connections with the Inception architecture enhances the overall performance of the network and yields accurate bounding boxes for each entity in the given images. Furthermore, because the code is minimal, the system can be deployed to web and Android platforms with relatively little effort.



Vijaysinh Lendave
Vijaysinh is an enthusiast in machine learning and deep learning. He is skilled in ML algorithms, data manipulation, handling and visualization, model building.
