MMDetection is an open-source, PyTorch-based Python toolbox dedicated to object detection and instance segmentation. It is built in a modular fashion and collects numerous methods from well-acclaimed detection and segmentation models, enabling quick training and inference without sacrificing quality. In addition, the toolbox ships weights for more than 200 pre-trained networks, making it an off-the-shelf solution in the object detection domain.
MMDetection also serves as a benchmark, with the flexibility to reimplement existing methods or to develop a new detector from the available modules. Its major strength is that it decomposes a typical object detection framework into simple modular components from which custom pipelines or custom models can be assembled. Building a new detector on top of an existing framework and comparing its performance is straightforward thanks to the toolbox's benchmarking capabilities.
MMDetection and its Architecture
Since MMDetection is a toolbox containing many pre-built models and each model has its own architecture, this toolbox defines a general architecture that can adapt to any model. This general architecture comprises the following parts:
- Backbone
- Neck
- DenseHead (AnchorHead/AnchorFreeHead)
- RoIExtractor
- RoIHead (BBoxHead/MaskHead)
A Backbone transforms the input image into raw feature maps. A Neck connects the Backbone to the heads, reconfiguring and refining the raw feature maps so that the heads can process them further. A DenseHead operates on the dense locations of the feature maps produced by the Neck. An RoIExtractor identifies regions of interest (RoIs) and extracts RoI features from the feature maps. An RoIHead takes the RoI features as input and makes the final predictions, such as bounding box classification and regression or mask prediction, depending on the task.
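For concreteness, the sketch below shows how these parts map onto an MMDetection model config for a two-stage detector such as Faster R-CNN. It is a minimal illustration, not a complete config: the field values are abbreviated and the exact arguments vary between models and toolbox versions.

# Minimal sketch (abbreviated, not a complete runnable config) of how
# MMDetection composes a two-stage detector from the parts described above.
model = dict(
    type='FasterRCNN',
    # Backbone: turns the input image into raw feature maps
    backbone=dict(type='ResNet', depth=50, out_indices=(0, 1, 2, 3)),
    # Neck: fuses and refines the backbone features (here a feature pyramid)
    neck=dict(type='FPN', in_channels=[256, 512, 1024, 2048], out_channels=256, num_outs=5),
    # DenseHead: predicts object proposals densely over feature-map locations
    rpn_head=dict(type='RPNHead', in_channels=256, feat_channels=256),
    # RoIHead wraps the RoIExtractor and the per-RoI bounding-box head
    roi_head=dict(
        type='StandardRoIHead',
        bbox_roi_extractor=dict(type='SingleRoIExtractor', out_channels=256, featmap_strides=[4, 8, 16, 32]),
        bbox_head=dict(type='Shared2FCBBoxHead', num_classes=80)))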
The whole network is built as a series of pipelines, so end-to-end training remains simple for any kind of network. During training, the network is traversed in the forward and backward directions over successive iterations.
Popular Models included in MMDetection
MMDetection contains high-quality codebases for many popular models and task-oriented modules. Find below the list of fully-built models and custom-adaptable methods that the MMDetection toolbox supports. The list grows continuously with the inclusion of new models and methods.
- Fast R-CNN
- Faster R-CNN
- Mask R-CNN
- RetinaNet
- DCN
- DCNv2
- Cascade R-CNN
- Mask Scoring R-CNN
- FCOS
- SSD
- R-FCN
- M2Det
- GHM
- ScratchDet
- Double-Head R-CNN
- Grid R-CNN
- FSAF
- Libra R-CNN
- GCNet
- HRNet
- Mixed Precision Training
- Weight Standardization
- Hybrid Task Cascade
- Guided Anchoring
- Generalized Attention
Inference with MMDetection
MMDetection runs best with a CUDA-enabled GPU runtime, since its PyTorch implementation relies on CUDA operators. The following code is based on the official MMDetection tutorial. First, check for the NVIDIA CUDA compiler and GCC with the following commands.
# Check nvcc version
!nvcc -V
# Check GCC version
!gcc --version
Install dependencies required to create the environment.
!pip install -U torch==1.5.1+cu101 torchvision==0.6.1+cu101 -f https://download.pytorch.org/whl/torch_stable.html
# install mmcv-full so that we can use CUDA operators
!pip install mmcv-full
Install MMDetection from the source repository.
# Install mmdetection
!rm -rf mmdetection
!git clone https://github.com/open-mmlab/mmdetection.git
%cd mmdetection
!pip install -e .

# install Pillow 7.0.0 back in order to avoid a bug in Colab
!pip install Pillow==7.0.0
Create the environment by importing necessary packages.
# Check PyTorch installation
import torch, torchvision
print(torch.__version__, torch.cuda.is_available())

# Check MMDetection installation
import mmdet
print(mmdet.__version__)

# Check mmcv installation
from mmcv.ops import get_compiling_cuda_version, get_compiler_version
print(get_compiling_cuda_version())
print(get_compiler_version())
Output:
Load a Mask R-CNN model, pre-trained on the COCO dataset, from the official model zoo.
!mkdir checkpoints
!wget -c http://download.openmmlab.com/mmdetection/v2.0/mask_rcnn/mask_rcnn_r50_caffe_fpn_mstrain-poly_3x_coco/mask_rcnn_r50_caffe_fpn_mstrain-poly_3x_coco_bbox_mAP-0.408__segm_mAP-0.37_20200504_163245-42aa3d00.pth \
      -O checkpoints/mask_rcnn_r50_caffe_fpn_mstrain-poly_3x_coco_bbox_mAP-0.408__segm_mAP-0.37_20200504_163245-42aa3d00.pth
Load a checkpoint of the pre-trained model and initialize the detector.
from mmdet.apis import inference_detector, init_detector, show_result_pyplot

# choose a config to use
config = 'configs/mask_rcnn/mask_rcnn_r50_caffe_fpn_mstrain-poly_3x_coco.py'
# set up a checkpoint file to load
checkpoint = 'checkpoints/mask_rcnn_r50_caffe_fpn_mstrain-poly_3x_coco_bbox_mAP-0.408__segm_mAP-0.37_20200504_163245-42aa3d00.pth'
# initialize the detector
model = init_detector(config, checkpoint, device='cuda:0')
Infer the predictions on a sample outdoor image using the loaded detector.
# Use the detector to do inference
img = 'demo/demo.jpg'
result = inference_detector(model, img)
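Before plotting, the raw result can be inspected directly. The structure below assumes an MMDetection 2.x Mask R-CNN, where inference_detector typically returns a (bbox_results, segm_results) tuple; other models or versions may return a different structure.

# For a Mask R-CNN in MMDetection 2.x, `result` is typically a tuple of
# (bbox_results, segm_results); adjust this inspection for other models.
bbox_results, segm_results = result
# bbox_results holds one (N, 5) array per class: x1, y1, x2, y2, score
for class_id, bboxes in enumerate(bbox_results):
    if len(bboxes) > 0:
        print(f'{model.CLASSES[class_id]}: {len(bboxes)} detections, '
              f'best score {bboxes[:, 4].max():.2f}')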
Plot the result using the in-built plotting method.
# Let's plot the result
show_result_pyplot(model, img, result, score_thr=0.3)
Output:
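To write the visualization to disk instead of (or in addition to) displaying it, the detector's show_result method accepts an out_file argument in MMDetection 2.x; the output path below is an arbitrary example.

# Save the visualization to a file; 'demo_result.jpg' is an example path
model.show_result(img, result, score_thr=0.3, out_file='demo_result.jpg')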
End-to-End Training on Custom Dataset
Let’s build a model and train it on the KITTI_tiny dataset.
# download, decompress the data
!wget https://download.openmmlab.com/mmdetection/data/kitti_tiny.zip
!unzip kitti_tiny.zip > /dev/null
Each image comes with a label annotation file that lists the objects present in the image along with their locations. Read the annotation file corresponding to one image sample.
# Check the label of a single image
!cat kitti_tiny/training/label_2/000000.txt
Output:
The first column indicates the class of the object, and the 5th to 8th columns give the bounding box coordinates (left, top, right and bottom, in pixels).
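As a quick illustration of this layout, the snippet below parses one label line by hand; the sample values are made up for illustration and are not taken from the actual file.

# Hedged sketch: parse a single KITTI-style label line by hand.
# The values below are illustrative, not from the real annotation file.
line = 'Car 0.00 0 1.85 387.63 181.54 423.81 203.12 1.67 1.87 3.69 -16.53 2.39 58.49 1.57'
fields = line.split(' ')
class_name = fields[0]                  # object class, e.g. 'Car'
bbox = [float(v) for v in fields[4:8]]  # left, top, right, bottom (pixels)
print(class_name, bbox)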
Develop a Python dataset class that converts the data into the middle format MMDetection expects for training and inference.
import os.path as osp

import mmcv
import numpy as np

from mmdet.datasets.builder import DATASETS
from mmdet.datasets.custom import CustomDataset


@DATASETS.register_module()
class KittiTinyDataset(CustomDataset):

    CLASSES = ('Car', 'Pedestrian', 'Cyclist')

    def load_annotations(self, ann_file):
        cat2label = {k: i for i, k in enumerate(self.CLASSES)}
        # load image list from file
        image_list = mmcv.list_from_file(self.ann_file)

        data_infos = []
        # convert annotations to middle format
        for image_id in image_list:
            filename = f'{self.img_prefix}/{image_id}.jpeg'
            image = mmcv.imread(filename)
            height, width = image.shape[:2]

            data_info = dict(filename=f'{image_id}.jpeg', width=width, height=height)

            # load annotations
            label_prefix = self.img_prefix.replace('image_2', 'label_2')
            lines = mmcv.list_from_file(osp.join(label_prefix, f'{image_id}.txt'))

            content = [line.strip().split(' ') for line in lines]
            bbox_names = [x[0] for x in content]
            bboxes = [[float(info) for info in x[4:8]] for x in content]

            gt_bboxes = []
            gt_labels = []
            gt_bboxes_ignore = []
            gt_labels_ignore = []

            # filter 'DontCare'
            for bbox_name, bbox in zip(bbox_names, bboxes):
                if bbox_name in cat2label:
                    gt_labels.append(cat2label[bbox_name])
                    gt_bboxes.append(bbox)
                else:
                    gt_labels_ignore.append(-1)
                    gt_bboxes_ignore.append(bbox)

            data_anno = dict(
                bboxes=np.array(gt_bboxes, dtype=np.float32).reshape(-1, 4),
                labels=np.array(gt_labels, dtype=np.long),
                bboxes_ignore=np.array(gt_bboxes_ignore, dtype=np.float32).reshape(-1, 4),
                labels_ignore=np.array(gt_labels_ignore, dtype=np.long))

            data_info.update(ann=data_anno)
            data_infos.append(data_info)

        return data_infos
Modify the model configurations to suit fast training on the prepared dataset.
from mmcv import Config
from mmdet.apis import set_random_seed

cfg = Config.fromfile('./configs/faster_rcnn/faster_rcnn_r50_caffe_fpn_mstrain_1x_coco.py')

# Modify dataset type and path
cfg.dataset_type = 'KittiTinyDataset'
cfg.data_root = 'kitti_tiny/'

cfg.data.test.type = 'KittiTinyDataset'
cfg.data.test.data_root = 'kitti_tiny/'
cfg.data.test.ann_file = 'train.txt'
cfg.data.test.img_prefix = 'training/image_2'

cfg.data.train.type = 'KittiTinyDataset'
cfg.data.train.data_root = 'kitti_tiny/'
cfg.data.train.ann_file = 'train.txt'
cfg.data.train.img_prefix = 'training/image_2'

cfg.data.val.type = 'KittiTinyDataset'
cfg.data.val.data_root = 'kitti_tiny/'
cfg.data.val.ann_file = 'val.txt'
cfg.data.val.img_prefix = 'training/image_2'

# modify num classes of the model in box head
cfg.model.roi_head.bbox_head.num_classes = 3
# We can still use the pre-trained Mask RCNN model though we do not need to
# use the mask branch
cfg.load_from = 'checkpoints/mask_rcnn_r50_caffe_fpn_mstrain-poly_3x_coco_bbox_mAP-0.408__segm_mAP-0.37_20200504_163245-42aa3d00.pth'

# Set up working dir to save files and logs.
cfg.work_dir = './tutorial_exps'

# The original learning rate (LR) is set for 8-GPU training.
# We divide it by 8 since we only use one GPU.
cfg.optimizer.lr = 0.02 / 8
cfg.lr_config.warmup = None
cfg.log_config.interval = 10

# Change the evaluation metric since we use a customized dataset.
cfg.evaluation.metric = 'mAP'
# We can set the evaluation interval to reduce the evaluation times
cfg.evaluation.interval = 12
# We can set the checkpoint saving interval to reduce the storage cost
cfg.checkpoint_config.interval = 12

# Set seed so that the results are more reproducible
cfg.seed = 0
set_random_seed(0, deterministic=False)
cfg.gpu_ids = range(1)
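Before training, it can help to print the fully merged configuration and confirm the changes took effect; mmcv's Config exposes this through its pretty_text property.

# Inspect the final merged config before training
print(f'Config:\n{cfg.pretty_text}')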
Train a new detector model with the preprocessed dataset and modified configurations.
from mmdet.datasets import build_dataset
from mmdet.models import build_detector
from mmdet.apis import train_detector

# Build dataset
datasets = [build_dataset(cfg.data.train)]

# Build the detector
model = build_detector(
    cfg.model, train_cfg=cfg.get('train_cfg'), test_cfg=cfg.get('test_cfg'))
# Add an attribute for visualization convenience
model.CLASSES = datasets[0].CLASSES

# Create work_dir
mmcv.mkdir_or_exist(osp.abspath(cfg.work_dir))
train_detector(model, datasets, cfg, distributed=False, validate=True)
Test the trained model on a sample image.
# load a sample image and run inference with the newly trained detector
img = mmcv.imread('kitti_tiny/training/image_2/000068.jpeg')
model.cfg = cfg  # attach the config to the model for inference
result = inference_detector(model, img)
show_result_pyplot(model, img, result)
Output:
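Beyond spot-checking a single image, the trained checkpoint can also be evaluated on the validation split with MMDetection's tools/test.py script. This is a sketch under two assumptions: the modified config must first be dumped to disk, and the runner is assumed to have written a latest.pth checkpoint into the work_dir configured above.

# Sketch: evaluate the trained checkpoint on the validation split.
# Assumes the runner saved 'latest.pth' under cfg.work_dir ('./tutorial_exps').
cfg.dump('./tutorial_exps/kitti_tiny_cfg.py')
!python tools/test.py ./tutorial_exps/kitti_tiny_cfg.py ./tutorial_exps/latest.pth --eval mAP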
Find the notebook with the above code implementation here.
Performance of MMDetection
With so many competing models available, users can struggle to choose the right one for their requirements. MMDetection serves as a benchmarking platform, comparing different models under identical conditions.
In the benchmarks reported by its authors, the MMDetection toolbox compares favorably with other recent codebases, namely maskrcnn-benchmark, Detectron and SimpleDet, training faster while reaching similar or better accuracy, and it offers a far larger collection of models and methods than any of them.