Guide to MMDetection: An Object Detection Python Toolbox

MMDetection is a Python toolbox dedicated to object detection and instance segmentation tasks. It is implemented in PyTorch with a modular design and collects object detection and instance segmentation methods from many well-established models in a single codebase, which makes both training and inference quick to set up. The toolbox also ships weights for more than 200 pre-trained networks, so it can serve as a ready-made starting point in the object detection domain.

MMDetection also serves as a benchmark, with the flexibility to reimplement existing methods or to develop a new detector from the available modules. Its main feature is that it breaks a typical object detection framework into simple modular components, from which one can build custom pipelines or custom models. Building a new detector on top of an existing framework and comparing its performance is straightforward with the toolbox's benchmarking capabilities.
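
One common way to get this kind of modularity is a registry that maps a type name in a config dictionary to a class. MMDetection configs name components through a `type` key in this spirit. The sketch below is a deliberately simplified, hypothetical registry written for illustration; `Registry`, `BACKBONES` and `ToyBackbone` here are stand-ins and not the toolbox's actual implementation.

```python
# Hypothetical, simplified registry for illustration only.
# It is NOT the implementation used by MMDetection.
class Registry:
    def __init__(self, name):
        self.name = name
        self._modules = {}

    def register_module(self):
        """Return a decorator that stores a class under its own name."""
        def decorator(cls):
            self._modules[cls.__name__] = cls
            return cls
        return decorator

    def build(self, cfg):
        """Instantiate the class named by cfg['type'] with the remaining keys."""
        args = dict(cfg)  # copy so the caller's config is not mutated
        cls = self._modules[args.pop('type')]
        return cls(**args)


BACKBONES = Registry('backbone')


@BACKBONES.register_module()
class ToyBackbone:
    def __init__(self, depth):
        self.depth = depth


# A config dictionary selects the component by name and passes its arguments.
backbone = BACKBONES.build(dict(type='ToyBackbone', depth=50))
print(type(backbone).__name__, backbone.depth)
```

With this pattern, swapping one component for another only requires changing the `type` value in the config, not the code that assembles the model.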

MMDetection and its Architecture

Since MMDetection is a toolbox containing many pre-built models and each model has its own architecture, this toolbox defines a general architecture that can adapt to any model. This general architecture comprises the following parts:

  1. Backbone
  2. Neck
  3. DenseHead (AnchorHead/AnchorFreeHead)
  4. RoIExtractor
  5. RoIHead (BBoxHead/MaskHead)

A Backbone is the part of the architecture that transforms the input images into raw feature maps. A Neck connects the Backbone with the heads and refines or reconfigures the raw feature maps so that the heads can process them further. A DenseHead operates on the dense locations of the feature maps fed by the Neck. An RoIExtractor extracts RoI-wise (region of interest) features from the feature maps. An RoIHead takes the RoI features as its input and makes task-specific predictions, such as bounding box classification and regression or mask prediction.

Figure: Framework of a single-stage detector without an RoIHead (Source)
Figure: Framework of a two-stage detector with an RoIHead (Source)
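
The way the five parts described above chain together can be sketched in a few lines. This is only a data-flow illustration under simplifying assumptions: `ToyDetector` and the stand-in stage functions are hypothetical, and each stage just appends its name to a trace instead of computing real feature maps.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ToyDetector:
    """Hypothetical container that wires the stages in order."""
    backbone: Callable
    neck: Callable
    dense_head: Callable
    roi_extractor: Optional[Callable] = None
    roi_head: Optional[Callable] = None

    def forward(self, image):
        feats = self.neck(self.backbone(image))
        dense_out = self.dense_head(feats)
        if self.roi_head is None:
            # Single-stage detector: the dense head output is the prediction.
            return dense_out
        # Two-stage detector: dense output acts as proposals for the RoI stage.
        roi_feats = self.roi_extractor(feats, dense_out)
        return self.roi_head(roi_feats)


# Stand-in stages that only record the order in which they run.
def backbone(image):
    return ['backbone']

def neck(feats):
    return feats + ['neck']

def dense_head(feats):
    return feats + ['dense_head']

def roi_extractor(feats, proposals):
    return proposals + ['roi_extractor']

def roi_head(roi_feats):
    return roi_feats + ['roi_head']


single_stage = ToyDetector(backbone, neck, dense_head)
two_stage = ToyDetector(backbone, neck, dense_head, roi_extractor, roi_head)
print(single_stage.forward('image'))
print(two_stage.forward('image'))
```

The only structural difference between the two variants in this sketch is whether the RoIExtractor and RoIHead are present.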

The whole network is built as a series of pipelines, which keeps end-to-end training simple for any kind of network. During training, each iteration traverses the whole network in the forward direction and then in the backward direction.

Figure: A typical training pipeline in the MMDetection architecture (Source)
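
The forward and backward traversal over iterations can be illustrated with a toy example. The sketch below assumes a one-parameter linear model and plain gradient descent on a mean squared error, which is far simpler than a detector but follows the same loop shape; the function name `train` and its arguments are illustrative only.

```python
def train(xs, ys, lr=0.1, iterations=50):
    """Fit y = w * x with gradient descent on the mean squared error."""
    w = 0.0
    for _ in range(iterations):
        # Forward pass: compute predictions with the current weight.
        preds = [w * x for x in xs]
        # Backward pass: gradient of the mean squared error with respect to w.
        grad = sum(2 * (p - y) * x for p, y, x in zip(preds, ys, xs)) / len(xs)
        # Parameter update.
        w -= lr * grad
    return w


# The data follow y = 2 * x, so the learned weight should approach 2.
w = train([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
print(w)
```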

MMDetection contains high-quality codebases for many popular models and task-oriented modules. Find below the list of fully-built models and custom-adaptable methods that the MMDetection toolbox supports. The list grows continuously with the inclusion of new models and methods.

  1. Fast R-CNN 
  2. Faster R-CNN 
  3. Mask R-CNN 
  4. RetinaNet
  5. DCN
  6. DCNv2
  7. Cascade R-CNN
  8. Mask Scoring R-CNN
  9. FCOS
  10. SSD
  11. R-FCN 
  12. M2Det 
  13. GHM 
  14. ScratchDet 
  15. Double-Head R-CNN 
  16. Grid R-CNN 
  17. FSAF 
  18. Libra R-CNN 
  19. GCNet 
  20. HRNet
  21. Mixed Precision Training 
  22. Weight Standardization 
  23. Hybrid Task Cascade 
  24. Guided Anchoring 
  25. Generalized Attention 

Inference with MMDetection

MMDetection runs best on a CUDA-enabled GPU runtime with PyTorch. The following code follows the official MMDetection tutorial and uses notebook syntax, in which lines starting with ! or % are shell commands and magics. First, check the NVIDIA CUDA compiler and GCC versions with the following commands.

 # Check nvcc version
 !nvcc -V
 # Check GCC version
 !gcc --version 

Install dependencies required to create the environment.

 !pip install -U torch==1.5.1+cu101 torchvision==0.6.1+cu101 -f
 # Install mmcv-full so that we can use the CUDA operators
 !pip install mmcv-full 

Install MMDetection from the source repository.

 # Install mmdetection
 !rm -rf mmdetection
 !git clone
 %cd mmdetection
 !pip install -e .
 # Reinstall Pillow 7.0.0 to avoid a bug in Colab
 !pip install Pillow==7.0.0 

Verify the installation by importing the packages and printing their versions.

 # Check PyTorch installation
 import torch, torchvision
 print(torch.__version__, torch.cuda.is_available())
 # Check MMDetection installation
 import mmdet
 print(mmdet.__version__)
 # Check mmcv installation
 from mmcv.ops import get_compiling_cuda_version, get_compiler_version
 print(get_compiling_cuda_version())
 print(get_compiler_version())


Download the checkpoint of a Mask R-CNN model, pre-trained on the COCO dataset, from the official website.

 !mkdir checkpoints
 !wget -c \
       -O checkpoints/mask_rcnn_r50_caffe_fpn_mstrain-poly_3x_coco_bbox_mAP-0.408__segm_mAP-0.37_20200504_163245-42aa3d00.pth 

Load a checkpoint of the pre-trained model and initialize the detector.

 from mmdet.apis import inference_detector, init_detector, show_result_pyplot
 # choose to use a config and initialize the detector
 config = 'configs/mask_rcnn/'
 # setup a checkpoint file to load
 checkpoint = 'checkpoints/mask_rcnn_r50_caffe_fpn_mstrain-poly_3x_coco_bbox_mAP-0.408__segm_mAP-0.37_20200504_163245-42aa3d00.pth'
 # initialize the detector
 model = init_detector(config, checkpoint, device='cuda:0') 

Run inference on a sample outdoor image using the loaded detector.

 # Use the detector to do inference
 img = 'demo/demo.jpg'
 result = inference_detector(model, img) 

Plot the result using the in-built plotting method.

 # Let's plot the result
 show_result_pyplot(model, img, result, score_thr=0.3) 
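
To make the effect of a score threshold such as `score_thr=0.3` concrete, the sketch below filters detections by confidence. It assumes the per-class bounding box layout used by MMDetection 2.x, where the result holds one entry per class and each row is `[x1, y1, x2, y2, score]` (for mask models the bounding boxes are the first element of the returned tuple). The helper `filter_detections`, the class names and the toy numbers are illustrative stand-ins, not real model output.

```python
def filter_detections(bbox_result, class_names, score_thr=0.3):
    """Keep detections whose score is at or above score_thr.

    bbox_result: one sequence of rows per class, each row being
                 [x1, y1, x2, y2, score]. Plain lists and NumPy arrays
                 both work because only row iteration is used.
    """
    kept = []
    for class_name, rows in zip(class_names, bbox_result):
        for x1, y1, x2, y2, score in rows:
            if score >= score_thr:
                box = (float(x1), float(y1), float(x2), float(y2))
                kept.append((class_name, float(score), box))
    return kept


# Toy stand-in for a detector output with three classes.
toy_result = [
    [[10.0, 20.0, 110.0, 220.0, 0.92], [15.0, 25.0, 60.0, 90.0, 0.12]],
    [],  # no detections for the second class
    [[200.0, 50.0, 320.0, 140.0, 0.45]],
]
kept = filter_detections(toy_result, ('person', 'bicycle', 'car'))
for class_name, score, box in kept:
    print(class_name, score, box)
```

Raising the threshold trades recall for fewer false positives in the plotted output.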


End-to-End Training on Custom Dataset

Let’s build a model and train it on the KITTI_tiny dataset.

 # download, decompress the data
 !unzip > /dev/null 

Each image comes with a label annotation file that lists the objects present in the image along with their locations. Read the annotation file corresponding to a sample image.

 # Check the label of a single image
 !cat kitti_tiny/training/label_2/000000.txt 


The first column indicates the class of the object, and the 5th to 8th columns indicate the bounding boxes.
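
The column layout just described can be checked with a small parser. The sketch below assumes space-separated fields, with the box given as left, top, right and bottom pixel coordinates as in the KITTI label format. The helper name `parse_kitti_line` and the sample line are illustrative and not necessarily the content of the file printed above.

```python
def parse_kitti_line(line):
    """Return (class_name, [left, top, right, bottom]) from one label line."""
    fields = line.strip().split(' ')
    class_name = fields[0]                   # 1st column: object class
    bbox = [float(v) for v in fields[4:8]]   # 5th to 8th columns: bounding box
    return class_name, bbox


# Illustrative line in KITTI label format.
sample = ('Pedestrian 0.00 0 -0.20 712.40 143.00 810.73 307.92 '
          '1.89 0.48 1.20 1.84 1.47 8.41 0.01')
class_name, bbox = parse_kitti_line(sample)
print(class_name, bbox)
```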

Define a dataset class that converts the annotations into the middle format that MMDetection expects for training and inference.

 import os.path as osp
 import mmcv
 import numpy as np
 from mmdet.datasets.builder import DATASETS
 from mmdet.datasets.custom import CustomDataset
 # Register the class so that configs can refer to it by name
 @DATASETS.register_module()
 class KittiTinyDataset(CustomDataset):
     CLASSES = ('Car', 'Pedestrian', 'Cyclist')
     def load_annotations(self, ann_file):
         cat2label = {k: i for i, k in enumerate(self.CLASSES)}
         # load image list from file
         image_list = mmcv.list_from_file(self.ann_file)
         data_infos = []
         # convert annotations to middle format
         for image_id in image_list:
             filename = f'{self.img_prefix}/{image_id}.jpeg'
             image = mmcv.imread(filename)
             height, width = image.shape[:2]
             data_info = dict(filename=f'{image_id}.jpeg', width=width, height=height)
             # load annotations
             label_prefix = self.img_prefix.replace('image_2', 'label_2')
             lines = mmcv.list_from_file(osp.join(label_prefix, f'{image_id}.txt'))
             content = [line.strip().split(' ') for line in lines]
             bbox_names = [x[0] for x in content]
             bboxes = [[float(info) for info in x[4:8]] for x in content]
             gt_bboxes = []
             gt_labels = []
             gt_bboxes_ignore = []
             gt_labels_ignore = []
             # filter 'DontCare': classes outside CLASSES go to the ignore lists
             for bbox_name, bbox in zip(bbox_names, bboxes):
                 if bbox_name in cat2label:
                     gt_labels.append(cat2label[bbox_name])
                     gt_bboxes.append(bbox)
                 else:
                     gt_labels_ignore.append(-1)
                     gt_bboxes_ignore.append(bbox)
             # np.int64 is used instead of np.long, which newer NumPy releases removed
             data_anno = dict(
                 bboxes=np.array(gt_bboxes, dtype=np.float32).reshape(-1, 4),
                 labels=np.array(gt_labels, dtype=np.int64),
                 bboxes_ignore=np.array(gt_bboxes_ignore,
                                        dtype=np.float32).reshape(-1, 4),
                 labels_ignore=np.array(gt_labels_ignore, dtype=np.int64))
             # attach the annotations and collect the record
             data_info.update(ann=data_anno)
             data_infos.append(data_info)
         return data_infos 

Modify the model configurations to suit fast training on the prepared dataset.

 from mmcv import Config
 from mmdet.apis import set_random_seed
 cfg = Config.fromfile('./configs/faster_rcnn/')
 # Modify dataset type and path
 cfg.dataset_type = 'KittiTinyDataset'
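 cfg.data_root = 'kitti_tiny/'
 # NOTE: the attribute names on the left-hand side were lost in the source;
 # the cfg.data.{test,train,val} keys below are assumed from the
 # MMDetection 2.x config layout.
 cfg.data.test.type = 'KittiTinyDataset'
 cfg.data.test.data_root = 'kitti_tiny/'
 cfg.data.test.ann_file = 'train.txt'
 cfg.data.test.img_prefix = 'training/image_2'
 cfg.data.train.type = 'KittiTinyDataset'
 cfg.data.train.data_root = 'kitti_tiny/'
 cfg.data.train.ann_file = 'train.txt'
 cfg.data.train.img_prefix = 'training/image_2'
 cfg.data.val.type = 'KittiTinyDataset'
 cfg.data.val.data_root = 'kitti_tiny/'
 cfg.data.val.ann_file = 'val.txt'
 cfg.data.val.img_prefix = 'training/image_2'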
 # modify num classes of the model in box head
 cfg.model.roi_head.bbox_head.num_classes = 3
 # We can still use the pre-trained Mask RCNN model though we do not need to
 # use the mask branch
 cfg.load_from = 'checkpoints/mask_rcnn_r50_caffe_fpn_mstrain-poly_3x_coco_bbox_mAP-0.408__segm_mAP-0.37_20200504_163245-42aa3d00.pth'
 # Set up working dir to save files and logs.
 cfg.work_dir = './tutorial_exps'
 # The original learning rate (LR) is set for 8-GPU training.
 # We divide it by 8 since we only use one GPU.
 # (The assignment target was lost in the source; cfg.optimizer.lr is assumed.)
 cfg.optimizer.lr = 0.02 / 8
 cfg.lr_config.warmup = None
 cfg.log_config.interval = 10
 # Change the evaluation metric since we use customized dataset.
 cfg.evaluation.metric = 'mAP'
 # We can set the evaluation interval to reduce the evaluation times
 cfg.evaluation.interval = 12
 # We can set the checkpoint saving interval to reduce the storage cost
 cfg.checkpoint_config.interval = 12
 # Set the seed so that the results are more reproducible
 cfg.seed = 0
 set_random_seed(0, deterministic=False)
 cfg.gpu_ids = range(1) 
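
The comments in the config above note that the reference learning rate targets 8-GPU training and is divided by 8 for a single GPU. That adjustment is a linear scaling of the learning rate with the number of GPUs, sketched below under the assumption that the per-GPU batch size stays unchanged. The helper name `scale_lr` is hypothetical.

```python
def scale_lr(base_lr, base_gpus, gpus):
    """Scale a reference learning rate linearly with the number of GPUs.

    Assumes the per-GPU batch size is the same in both setups, so the
    total batch size changes in proportion to the GPU count.
    """
    return base_lr * gpus / base_gpus


# Reference setup from the comments above: LR 0.02 on 8 GPUs, now 1 GPU.
single_gpu_lr = scale_lr(0.02, base_gpus=8, gpus=1)
print(single_gpu_lr)
```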

Train a new detector model with the preprocessed dataset and modified configurations.

 import os.path as osp
 import mmcv
 from mmdet.datasets import build_dataset
 from mmdet.models import build_detector
 from mmdet.apis import train_detector
 # Build dataset
 # (The argument was lost in the source; cfg.data.train is assumed.)
 datasets = [build_dataset(cfg.data.train)]
 # Build the detector
 model = build_detector(
     cfg.model, train_cfg=cfg.get('train_cfg'), test_cfg=cfg.get('test_cfg'))
 # Add an attribute for visualization convenience
 model.CLASSES = datasets[0].CLASSES
 # Create work_dir (this call was lost in the source and is assumed)
 mmcv.mkdir_or_exist(osp.abspath(cfg.work_dir))
 train_detector(model, datasets, cfg, distributed=False, validate=True) 

Test the fully-trained model on a test image.

 img = mmcv.imread('kitti_tiny/training/image_2/000068.jpeg')
 model.cfg = cfg
 result = inference_detector(model, img)
 show_result_pyplot(model, img, result) 


Find the notebook with the above code implementation here.

Performance of MMDetection

With so many competing models, choosing the right one for a given requirement is difficult. MMDetection serves as a benchmarking platform that compares different models under identical conditions.

Figure: Benchmarking popular models on a bounding box prediction task (Source).
Figure: Benchmarking popular models on an object masking task (Source).
Figure: Benchmarking popular commercial GPUs with MMDetection on three models (Source).

In the reported comparison, the MMDetection toolbox outperforms other recent codebases, namely maskrcnn-benchmark, Detectron and SimpleDet, in efficiency and performance, and it offers a large model collection.

Figure: Comparison of MMDetection with competing codebases based on training, inference, memory usage and evaluation metrics (Source).

Rajkumar Lakshmanamoorthy
A geek in Machine Learning with a Master's degree in Engineering and a passion for writing and exploring new things. Loves reading novels, cooking, practicing martial arts, and occasionally writing novels and poems.
