Last updated April 6, 2021

What is GSDT: GNNs for Simultaneous Detection and Tracking

GSDT is an end-to-end trainable joint Multi-Object Tracking approach that uses Graph Neural Networks for Simultaneous Detection and Tracking.

Published on April 6, 2021

by Rajkumar Lakshmanamoorthy

Multi-Object Tracking (MOT) is the detection and tracking of multiple moving objects at the same time in a dynamic environment. It has important applications in autonomous vehicles, robot navigation, security surveillance, medical imaging and sports analysis. Multi-Object Tracking comprises two key tasks: object detection and data association. Object detection is performed by a neural network that looks for the objects of interest, whereas data association is performed by a temporally aware neural network that looks for correspondences between the same object in two different frames. Traditional multi-object tracking approaches train the object detection network and the data association network separately, optimizing each for its own part of the job. This strategy fails to handle object detection and data association end-to-end, even though the two tasks depend on each other, and it limits how far performance can improve.

A few recent approaches introduced joint multi-object tracking to tackle this problem. Some tracked each object individually and independently, which sidesteps the data association problem but introduces a new one: object-object relationships are ignored. These relationships are crucial in identifying relative patterns among objects. Other approaches did model object-object relationships, but they required the object detector to be trained separately.

To this end, Yongxin Wang, Kris Kitani and Xinshuo Weng of the Robotics Institute, Carnegie Mellon University have developed an end-to-end trainable joint Multi-Object Tracking architecture using Graph Neural Networks, named GSDT, short for GNNs for Simultaneous Detection and Tracking. GSDT models object-object relationships for both data association and object detection. It follows the joint multi-object tracking strategy, so it can be trained and optimized as a whole. It employs Graph Neural Networks to obtain more discriminative features. At the time of publication, the model achieved state-of-the-art results on public multi-object tracking benchmarks including MOT15, MOT16, MOT17 and MOT20.

MOT 20 — A sample Multi-object tracking on MOT20 dataset using the GSDT model (source)

How GSDT differs from competing models

GSDT strategy — The training strategy of GSDT compared to the previous works (source)

In GSDT, two images from successive frames and tracklets from the previous frame are given to the model as inputs. The model attempts to detect the objects in the current frame with these inputs and associate those detected objects with the tracklets of the previous frame. By associating the tracklets to the objects, the model decides iteratively whether to continue using a specific tracklet or to discontinue it or to initiate a new tracklet at the current frame.
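
The continue/discontinue/initiate decision described above can be illustrated with a small sketch. It assumes each tracklet and each detection is summarized by a feature vector, and it uses greedy matching on cosine similarity with a fixed threshold. The function names, the threshold value and the greedy strategy are illustrative assumptions, not the association procedure of the GSDT code.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0

def associate(tracklets, detections, threshold=0.5):
    """Greedily match previous-frame tracklets to current-frame detections.

    tracklets:  dict mapping tracklet id -> feature vector
    detections: list of feature vectors
    threshold:  hypothetical minimum similarity for a valid match
    Returns (continued, discontinued, initiated).
    """
    # Score every (tracklet, detection) pair and visit the best pairs first.
    pairs = sorted(
        ((cosine_similarity(feat, det), tid, j)
         for tid, feat in tracklets.items()
         for j, det in enumerate(detections)),
        key=lambda p: p[0],
        reverse=True,
    )
    continued, used = {}, set()
    for score, tid, j in pairs:
        if score < threshold:
            break  # all remaining pairs are too dissimilar
        if tid in continued or j in used:
            continue  # tracklet or detection is already matched
        continued[tid] = j
        used.add(j)
    # Unmatched tracklets are discontinued; unmatched detections start new ones.
    discontinued = [tid for tid in tracklets if tid not in continued]
    initiated = [j for j in range(len(detections)) if j not in used]
    return continued, discontinued, initiated

tracklets = {1: [1.0, 0.0], 2: [0.0, 1.0], 3: [-1.0, 0.0]}
detections = [[0.9, 0.1], [0.1, 0.9], [0.7, -0.7]]
continued, discontinued, initiated = associate(tracklets, detections)
print(continued, discontinued, initiated)
```

In this toy input, tracklets 1 and 2 continue, tracklet 3 finds no sufficiently similar detection, and the third detection starts a new tracklet.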

An overview of the GSDT Architecture (source)

An object detector and a re-identification module are used in GSDT to detect multiple objects and associate them simultaneously. In addition, graph neural networks are used to extract and learn features that improve both object detection and data association performance. In short, the GSDT architecture is composed of four modules: a GNN-based feature extraction module, a node feature aggregation module, an object detection module and a data association module.
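
To give a feel for what node feature aggregation does, here is a minimal sketch of one round of attention-weighted aggregation on a small graph whose nodes are tracklets and detections. It is loosely modeled on cosine-similarity attention of the kind used by PyTorch Geometric's AGNNConv layer, which the evaluation script names later in this article refer to. The graph layout, the `beta` value and the plain-Python implementation are assumptions for illustration, not the GSDT module itself.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0

def aggregate(node_feats, neighbors, beta=1.0):
    """One round of attention-weighted neighbor aggregation.

    node_feats: list of feature vectors, one per graph node
    neighbors:  list of neighbor-index lists, one per node
    beta:       attention temperature (a fixed value here, for illustration)
    """
    updated = []
    for i, feat in enumerate(node_feats):
        idx = [i] + list(neighbors[i])  # include a self-loop
        logits = [beta * cosine(feat, node_feats[j]) for j in idx]
        peak = max(logits)  # subtract the max for a numerically stable softmax
        exps = [math.exp(v - peak) for v in logits]
        total = sum(exps)
        weights = [e / total for e in exps]
        # New feature = weighted average of own and neighbor features.
        updated.append([
            sum(w * node_feats[j][d] for w, j in zip(weights, idx))
            for d in range(len(feat))
        ])
    return updated

# Nodes 0-1 stand for tracklets, nodes 2-3 for detections.
# Edges connect tracklets to detections only (a bipartite layout).
feats = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.2, 0.8]]
neighbors = [[2, 3], [2, 3], [0, 1], [0, 1]]
new_feats = aggregate(feats, neighbors, beta=5.0)
for old, new in zip(feats, new_feats):
    print(old, "->", [round(v, 3) for v in new])
```

Each node pulls in information mostly from the neighbors it resembles, so a tracklet's updated feature moves toward its likely matching detection.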

Functional overview of node feature aggregation (source)

Python implementation of GSDT

GSDT requires a PyTorch environment with a CUDA-enabled GPU runtime. Download the source code from the official repository.

!git clone https://github.com/yongxinw/GSDT.git

Change into the downloaded GSDT directory and list its contents.

 %cd /content/GSDT/
 !ls -p

GSDT works well with the Anaconda3 distribution. Download and install it if the local machine does not have a conda environment.

 !wget https://repo.anaconda.com/archive/Anaconda3-2020.02-Linux-x86_64.sh
 !bash Anaconda3-2020.02-Linux-x86_64.sh

Start a bash shell so that the conda commands are available.

!bash

Inside conda’s base environment, create a Python 3.6 environment named dev with the following command.

conda create -n dev python=3.6

Activate the dev environment using the following command, and run all remaining steps inside it.

conda activate dev

Install the dependencies in the development environment by pointing pip at the requirements file with the -r option.

pip install -r requirements.txt

Install PyTorch version 1.7.0, which is compatible with CUDA version 10.2. The default PyTorch 1.7.0 pip package for Linux is built against CUDA 10.2.

pip install torch==1.7.0

Install the PyTorch Geometric package for the same CUDA version.

bash install_pyg.sh CUDA_version=cu102
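
Before building the remaining components, it can help to confirm that the packages installed so far import cleanly and that a GPU is visible. The following is a small sanity-check sketch; the helper name `check_environment` is hypothetical and not part of the GSDT repository. It relies only on `importlib` and on `torch.cuda.is_available()`.

```python
import importlib

def check_environment(packages=("torch", "torch_geometric")):
    """Report the version of each importable package and whether CUDA is visible."""
    report = {}
    for name in packages:
        try:
            module = importlib.import_module(name)
            report[name] = getattr(module, "__version__", "unknown")
        except Exception:
            # Missing package, or a broken install (e.g. mismatched CUDA build).
            report[name] = None
    cuda = False
    if report.get("torch"):
        import torch
        cuda = torch.cuda.is_available()
    report["cuda_available"] = cuda
    return report

print(check_environment())
```

A `None` entry means the package could not be imported in the active environment, and `cuda_available` being `False` suggests the runtime has no usable GPU.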

Build Deformable Convolutional Networks version 2 (DCNv2) from source by running the following commands in order.

 cd ./src/lib/models/networks/DCNv2
 bash make.sh

Download the MOT15 and MOT20 challenge datasets. Once the data is in place, the following commands generate the label files for the annotated objects.

 cd src
 python gen_labels_15.py
 python gen_labels_20.py
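
As a rough idea of what label generation involves, the sketch below converts one row of a MOT ground-truth file (frame, id, left, top, width, height, ...) into a label with a normalized box center and size. The output layout (`class id cx cy w h`) is an assumption based on the convention used by related trackers; the actual gen_labels scripts may differ, and the function name is hypothetical.

```python
def mot_row_to_label(row, img_width, img_height):
    """Convert one MOT ground-truth row into (frame, normalized label string).

    row: comma-separated 'frame,id,left,top,width,height,...'
    The 'class id cx cy w h' output layout is an assumed convention.
    """
    fields = row.strip().split(",")
    frame, track_id = int(fields[0]), int(fields[1])
    left, top, width, height = (float(v) for v in fields[2:6])
    # Move from top-left corner to box center, then scale to [0, 1].
    cx = (left + width / 2) / img_width
    cy = (top + height / 2) / img_height
    label = "0 {:d} {:.6f} {:.6f} {:.6f} {:.6f}".format(
        track_id, cx, cy, width / img_width, height / img_height)
    return frame, label

frame, label = mot_row_to_label("1,7,100,200,50,100,1,1,1.0", 1920, 1080)
print(frame, label)
```

Normalizing by the image size keeps the labels valid when frames are resized during training.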

Download the pre-trained model weights for the MOT15 and MOT20 datasets and move them to /content/GSDT/experiments. Run a sample evaluation on two frames from each dataset using the following commands in order.

 cd ./experiments
 bash track_gnn_mot_AGNNConv_RoIAlign_mot15.sh model_mot15
 bash track_gnn_mot_AGNNConv_RoIAlign_mot20.sh model_mot20

Performance of GSDT

GSDT has been evaluated on the open challenges MOT15, MOT16, MOT17 and MOT20. Its authors submitted the model to the official leaderboard of the MOT challenge for comparison with competing models. Models are evaluated on standard metrics including MOTA (Multiple Object Tracking Accuracy), IDF1 (identity F1 score), MT (mostly tracked targets), ML (mostly lost targets) and IDS (identity switches).
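
Two of these metrics have compact standard definitions: MOTA = 1 - (FN + FP + IDS) / GT, and IDF1 = 2·IDTP / (2·IDTP + IDFP + IDFN). The sketch below computes both from raw counts. The counts in the example are made-up numbers for illustration, not results from GSDT.

```python
def mota(false_negatives, false_positives, id_switches, num_gt):
    """Multiple Object Tracking Accuracy from error counts and ground-truth total."""
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt

def idf1(idtp, idfp, idfn):
    """Identity F1 score from identity-level true/false positives and false negatives."""
    denom = 2 * idtp + idfp + idfn
    return 2 * idtp / denom if denom else 0.0

# Illustrative counts only.
example_mota = mota(false_negatives=120, false_positives=60, id_switches=20, num_gt=1000)
example_idf1 = idf1(idtp=700, idfp=150, idfn=150)
print(round(example_mota, 4), round(example_idf1, 4))
```

MOTA can be negative when the number of errors exceeds the number of ground-truth objects, while IDF1 always lies between 0 and 1.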

Sample MOT — A sample multi-object tracking on MOT20 dataset using the GSDT model (Source)

GSDT outperforms most of the well-known models, including DMT, LIF_TsimInt, MDP_SubCNN, CDA_DDAL, MPNTrack, EAMTT, AP_HWDPL, NOMTwSDP, RAR15, Tube_TK, CTrackerV1, CTTrack17, SORT20 and POI. GSDT was recognized as the state-of-the-art in the MOT challenge at the time of its publication.