OneNet: Introduction to End-to-End One-Stage Object Detection

OneNet object detection output

Object detection is one of the most talked-about subjects in the artificial intelligence domain. It can be applied to images or video, and multiple objects can be detected in a single shot using YOLO or other models such as Google's EfficientDet and CenterNet. Across all of these approaches, the goal is to achieve maximum accuracy with less computation.

One of the most challenging problems in object detectors is label assignment: deciding which samples are positive and which are negative, and labelling them as object (positive) or background (negative) accordingly. Detection is primarily done by enumerating sliding windows over the image grid, and whether a sample counts as positive usually depends on an IoU (intersection over union) threshold. In previous one-stage detectors, label assignment made little use of classification information.

Arxiv: OneNet

Previously, labels were assigned using only the location cost between a sample and its corresponding ground truth. As a result, further post-processing with NMS (non-maximum suppression) was required. NMS is a key post-processing technique for removing redundant boxes around the same object.

Figure 1: Our classifier initially detected six bounding boxes, but by applying non-maximum suppression, we are left with only one (the correct) bounding box.
NMS (non-maximum suppression) on a detected image: source
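
To make the NMS step concrete, here is a minimal sketch of greedy NMS in plain Python. The function names `iou` and `nms`, the `(x1, y1, x2, y2)` box format, and the 0.5 overlap threshold are assumptions chosen for illustration; they are not taken from the OneNet code.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the best box, drop boxes overlapping it, repeat."""
    # Indices sorted by descending score.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Discard remaining boxes that overlap the kept box too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


# Two near-duplicate boxes around one object, plus one separate object.
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
print(kept)  # indices of the boxes that survive
```

In this toy input the second box overlaps the first heavily and is suppressed, so only the first and third boxes remain. This is the extra post-processing step that OneNet aims to make unnecessary.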

Label assignment Methods:

Several earlier label assignment techniques are used extensively together with NMS post-processing. Let's discuss them and find out why OneNet uses the Minimum Cost Assignment technique and what makes it efficient.

1. Box assignment

Modern object detectors pre-define thousands of anchor boxes on the image grid and perform object classification on them. We call this approach "Box Assignment."

Box Assignment

Box assignment, shown in the figure above, has been used for years in many object detection frameworks. If the IoU between a box and the ground-truth box is greater than the high threshold, the box is labelled positive (green); if it is smaller than the threshold value, it is labelled negative (red).
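
The thresholding rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names, the `(x1, y1, x2, y2)` box format, the threshold values 0.6 and 0.4, and the "ignored" band between them follow common practice and are not taken from the OneNet paper.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def box_assignment(anchors, gt_box, high=0.6, low=0.4):
    """Label each anchor by its IoU with the ground-truth box.

    `high` and `low` are hypothetical thresholds; anchors falling
    between them are ignored (an assumption, not from the article).
    """
    labels = []
    for anchor in anchors:
        overlap = iou(anchor, gt_box)
        if overlap >= high:
            labels.append("positive")
        elif overlap < low:
            labels.append("negative")
        else:
            labels.append("ignored")
    return labels


gt_box = (20, 20, 60, 60)
anchors = [
    (20, 20, 60, 60),      # perfect match
    (22, 22, 62, 62),      # slightly shifted
    (28, 28, 68, 68),      # partial overlap
    (100, 100, 140, 140),  # far away
]
labels = box_assignment(anchors, gt_box)
print(labels)
```

Note that two anchors end up positive for the same ground-truth box, which is the redundancy that later has to be cleaned up.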

2. Point assignment

Point assignment is used in many object detection frameworks to eliminate the complex box computation. It directly treats the grid points in the feature map as object candidates and predicts the offsets from each grid point to the object box boundaries. Label assignment is much simpler with this method.

Point assignment
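
As a rough sketch of the idea, the snippet below computes the four offsets from grid points to the sides of a ground-truth box. It assumes that every grid point falling inside the box counts as positive, a grid spacing of 16 pixels, and the helper names `point_offsets` and `is_inside`; all of these are illustrative choices rather than details from the article.

```python
def point_offsets(point, box):
    """Distances (left, top, right, bottom) from a point to the box sides."""
    px, py = point
    x1, y1, x2, y2 = box
    return (px - x1, py - y1, x2 - px, y2 - py)


def is_inside(point, box):
    """A point is inside the box when all four distances are positive."""
    return all(d > 0 for d in point_offsets(point, box))


gt_box = (20, 20, 60, 60)

# Hypothetical grid of points with a spacing of 16 pixels.
grid_points = [(x, y) for y in range(0, 80, 16) for x in range(0, 80, 16)]

# Assumption: every grid point inside the box is a positive sample.
positives = [p for p in grid_points if is_inside(p, gt_box)]

# Regression target for each positive point.
targets = {p: point_offsets(p, gt_box) for p in positives}
print(positives)
print(targets[(32, 32)])
```

Four grid points land inside the single ground-truth box here, so point assignment also yields several positives per object.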

3. Minimum cost assignment (method proposed by OneNet)

Both of the label assignment techniques discussed above share a major problem: many-to-one assignment. They produce more than one positive sample for each ground-truth box, which leads to redundant results and makes NMS (non-maximum suppression) post-processing necessary.

In minimum cost assignment, each ground-truth box is assigned exactly one positive sample, the one with the minimum cost, and all other samples are negative.

Minimum cost Assignment

It is a pretty straightforward method: for each ground truth, the positive samples are restricted to a single one. No handcrafted heuristic rules are involved, and there is no need for NMS (non-maximum suppression).
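
A simplified sketch of this rule is shown below. It assumes the cost is the sum of a classification term (one minus the predicted probability of the ground-truth class) and an L1 distance between normalized boxes; the exact cost terms and weights used by OneNet are not reproduced here, and the names `minimum_cost_assignment`, `cls_weight`, and `loc_weight` are hypothetical.

```python
def l1_distance(a, b):
    """Sum of absolute coordinate differences between two boxes."""
    return sum(abs(x - y) for x, y in zip(a, b))


def minimum_cost_assignment(pred_boxes, pred_probs, gt_boxes, gt_classes,
                            cls_weight=1.0, loc_weight=1.0):
    """Return, for each ground-truth box, the index of its one positive sample.

    Simplified cost (an assumption for illustration):
        cls_weight * (1 - p[gt_class]) + loc_weight * L1(box, gt_box)
    Every sample not returned here is treated as negative.
    """
    positives = []
    for gt_box, gt_cls in zip(gt_boxes, gt_classes):
        costs = [
            cls_weight * (1.0 - probs[gt_cls])
            + loc_weight * l1_distance(box, gt_box)
            for box, probs in zip(pred_boxes, pred_probs)
        ]
        positives.append(min(range(len(costs)), key=costs.__getitem__))
    return positives


# Boxes are normalized to [0, 1]; each sample has scores for two classes.
pred_boxes = [(0.20, 0.20, 0.60, 0.60),
              (0.22, 0.22, 0.62, 0.62),
              (0.70, 0.70, 0.90, 0.90)]
pred_probs = [(0.30, 0.10), (0.90, 0.05), (0.10, 0.80)]
gt_boxes = [(0.20, 0.20, 0.60, 0.60)]
gt_classes = [0]

with_cls = minimum_cost_assignment(pred_boxes, pred_probs, gt_boxes, gt_classes)
location_only = minimum_cost_assignment(pred_boxes, pred_probs, gt_boxes,
                                        gt_classes, cls_weight=0.0)
print(with_cls, location_only)
```

With the classification term included, the confident second sample is chosen; with location cost only, the first sample wins even though its class score is low. Either way, exactly one sample per ground-truth box is positive.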

OneNet

Today we are going to discuss a recently released object detector: OneNet. OneNet is an end-to-end one-stage object detector that proposes new techniques for object detection, such as Minimum Cost Assignment. Its latest paper was published on 10 Dec 2020 by Peize Sun, Yi Jiang, Enze Xie, Zehuan Yuan, Changhu Wang, and Ping Luo, as a collaboration between The University of Hong Kong and ByteDance AI Lab.

OneNet is a fully convolutional one-stage detector that needs no post-processing technique such as NMS (non-maximum suppression).

Techniques such as minimum cost assignment remove NMS entirely, as we saw in the label assignment discussion above. OneNet has other advantages too:

  • It achieves 35.0 AP at 80 FPS using ResNet-50 on the COCO dataset.
  • No NMS (non-maximum suppression) or max pooling for post-processing.
  • The whole network is fully convolutional.
  • End-to-end training.
  • No RoI operations.
  • Label assignment is based on the minimum cost, which includes classification cost, rather than on complex bipartite matching.

Architecture of OneNet

Official Research paper

The pipeline of OneNet is as follows:

  • The input is an image of H x W x 3 (three channels).
  • The backbone generates a feature map of H/4 x W/4 x C.
  • The head produces a classification prediction of H/4 x W/4 x K, where K is the number of categories,
  • and a regression prediction of H/4 x W/4 x 4.
  • The final output is the top-k scoring boxes.
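
To check the tensor shapes in this pipeline, the small helper below computes them for a given input size. The function name `onenet_shapes` and the channel count of 64 are hypothetical; the stride of 4 and the shape layout come from the list above, and 80 is the number of COCO categories.

```python
def onenet_shapes(height, width, channels, num_classes, stride=4):
    """Shapes of the main tensors in the pipeline described above."""
    fh, fw = height // stride, width // stride
    return {
        "input": (height, width, 3),
        "feature_map": (fh, fw, channels),
        "classification": (fh, fw, num_classes),
        "regression": (fh, fw, 4),
    }


# channels=64 is an illustrative value, not taken from the paper.
shapes = onenet_shapes(512, 512, channels=64, num_classes=80)
for name, shape in shapes.items():
    print(name, shape)
```

For a 512 x 512 input, the head therefore makes predictions at 128 x 128 grid points.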

1. Backbone

The backbone combines a bottom-up and a top-down structure:

  • The bottom-up component is ResNet, which OneNet uses to produce multi-scale feature maps.
  • The top-down component generates the final feature map used for object recognition.

2. Head

The head performs classification and localization at each grid point of the image feature map using two parallel convolutional layers.

  • The classification layer predicts the object probability for each of the K object categories.
  • The location layer predicts the offsets from the grid point to the 4 boundaries of the ground-truth box.
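
The way predicted offsets turn back into a box can be sketched as follows. This assumes that a feature-map cell at `(col, row)` maps to the image point `(col * stride, row * stride)` with a stride of 4, and that offsets are given as (left, top, right, bottom) in pixels; the real implementation may place the grid point or scale the offsets differently, and `decode_box` is a hypothetical helper.

```python
def decode_box(col, row, offsets, stride=4):
    """Convert a grid cell and its four predicted offsets into a box.

    Assumption: the grid point sits at (col * stride, row * stride)
    in image coordinates.
    """
    px, py = col * stride, row * stride
    left, top, right, bottom = offsets
    return (px - left, py - top, px + right, py + bottom)


# A grid cell at column 8, row 8 with offsets to the four box sides.
box = decode_box(col=8, row=8, offsets=(12, 12, 28, 28))
print(box)  # (x1, y1, x2, y2) in image coordinates
```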

3. Training

As already discussed above, OneNet uses Minimum Cost Assignment for label assignment.

4. Inference

No max pooling or NMS is used in OneNet, so the final output is simply the top-k (e.g. 100) scoring boxes.
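
A minimal sketch of such a top-k selection is shown below, using nested Python lists in place of tensors. The data layout (`score_map[row][col][cls]` and `box_map[row][col]`) and the function name `top_k_detections` are assumptions for illustration only.

```python
import heapq


def top_k_detections(score_map, box_map, k=100):
    """Pick the k highest-scoring (score, class, box) triples.

    score_map[row][col][cls] is the predicted probability of class `cls`
    at a grid point; box_map[row][col] is the box predicted there.
    No NMS or max pooling is applied.
    """
    candidates = (
        (score, cls, box_map[r][c])
        for r, row in enumerate(score_map)
        for c, cell in enumerate(row)
        for cls, score in enumerate(cell)
    )
    return heapq.nlargest(k, candidates, key=lambda item: item[0])


# Toy 2 x 2 grid with two classes per grid point.
score_map = [[[0.1, 0.9], [0.2, 0.05]],
             [[0.7, 0.3], [0.01, 0.02]]]
box_map = [[(0, 0, 10, 10), (4, 0, 14, 10)],
           [(0, 4, 10, 14), (4, 4, 14, 14)]]

top2 = top_k_detections(score_map, box_map, k=2)
print(top2)
```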

Implementation

Let’s see how to get started with the implementation of OneNet. It provides two models, dcn and nodcn: one for accuracy and the other for easy deployment.

Installation

The OneNet code is based on Facebook’s Detectron2 and DETR. The requirements are Linux or macOS with Python 3.6+, PyTorch 1.5+, and torchvision.

The steps to install, train, evaluate, and visualize are as follows:

  1. Install PyTorch for your CUDA version from here. On Google Colab, install torch and torchvision using the command below:
!pip install torch===1.7.1+cu110 torchvision===0.8.2+cu110 torchaudio===0.7.2 -f https://download.pytorch.org/whl/torch_stable.html
  2. Clone and install from source (in a notebook, use %cd rather than !cd so that the directory change persists for later cells)
!git clone https://github.com/PeizeSun/OneNet.git
%cd OneNet
!python setup.py build develop
  3. Link the COCO dataset path to datasets/coco inside the cloned OneNet repo
!mkdir -p datasets/coco
!ln -s /path_to_coco_dataset/annotations datasets/coco/annotations
!ln -s /path_to_coco_dataset/train2017 datasets/coco/train2017
!ln -s /path_to_coco_dataset/val2017 datasets/coco/val2017
  4. Train
!python projects/OneNet/train_net.py --num-gpus 8 \
    --config-file projects/OneNet/configs/onenet.res50.dcn.yaml
  5. Evaluate
!python projects/OneNet/train_net.py --num-gpus 8 \
    --config-file projects/OneNet/configs/onenet.res50.dcn.yaml \
    --eval-only MODEL.WEIGHTS path/to/model.pth
  6. Visualize
!python demo/demo.py \
    --config-file projects/OneNet/configs/onenet.res50.dcn.yaml \
    --input path/to/images --output path/to/save_images --confidence-threshold 0.4 \
    --opts MODEL.WEIGHTS path/to/model.pth

Comparison to CenterNet

CenterNet is one of the most popular one-stage detectors and was a follow-up to CornerNet. CornerNet uses corner keypoints to overcome the limitations of anchor-based methods, but it had major accuracy flaws in small object detection. CenterNet tried to overcome the restrictions encountered in CornerNet by using a triplet of keypoints to localize objects.

OneNet is comparable to CenterNet in both speed and detection accuracy.

Conclusion

Using minimum cost assignment and dropping NMS proved a great success for OneNet. We have seen how previous label assignment methods required more computation, and how the OneNet pipeline performs object detection efficiently. We have also seen how the classification cost is the key to the success of end-to-end one-stage object detection.

The above demonstration of OneNet is based on the research paper published by the OneNet team on arXiv. The source code for OneNet is open source on GitHub. The coding tutorial is available at: https://github.com/mmaithani/data-science/blob/main/OneNet.ipynb

Mohit Maithani

Mohit is a Data & Technology Enthusiast with good exposure to solving real-world problems in various avenues of IT and Deep learning domain. He believes in solving human's daily problems with the help of technology.