What is GSDT: GNNs for Simultaneous Detection and Tracking

GSDT is an end-to-end trainable joint Multi-Object Tracking approach that uses Graph Neural Networks for Simultaneous Detection and Tracking.

Multi-Object Tracking (MOT) is the detection and tracking of multiple moving objects at the same time in a dynamic environment. Its applications include autonomous vehicles, robot navigation, security surveillance, medical imaging and sports analysis. Multi-Object Tracking comprises two key tasks: object detection and data association. Object detection is performed by a neural network that looks for the objects of interest, whereas data association is performed by a temporally aware neural network that looks for correspondences between the same object in two different frames. Traditional multi-object tracking approaches train the object detection network and the data association network separately, optimizing each for its own part of the job. This strategy fails to handle object detection and data association end-to-end, even though the two tasks depend on each other, and it limits how far overall performance can improve.

A few recent approaches introduced joint multi-object tracking to tackle this problem. Some track each object individually and independently, which simplifies data association but introduces a new problem: object-object relationships are ignored. These relationships are crucial in identifying relative patterns among objects. Other approaches do model object-object relationships, but they require the object detector to be trained separately.


To this end, Yongxin Wang, Kris Kitani and Xinshuo Weng of the Robotics Institute, Carnegie Mellon University, have developed an end-to-end trainable joint Multi-Object Tracking architecture based on Graph Neural Networks (GNNs). It is named GSDT, short for GNNs for Simultaneous Detection and Tracking. GSDT models object-object relationships for both data association and object detection. Because it follows the joint multi-object tracking strategy, it can be trained and optimized as a whole. It employs Graph Neural Networks to obtain more discriminative features. At the time of its publication, the model achieved state-of-the-art results on public multi-object tracking datasets, including MOT15, MOT16, MOT17 and MOT20.

A sample multi-object tracking result on the MOT20 dataset using the GSDT model (source)

How GSDT differs from competing models

The training strategy of GSDT compared to the previous works (source)

In GSDT, two images from successive frames and the tracklets from the previous frame are given to the model as inputs. With these inputs, the model detects the objects in the current frame and associates them with the tracklets of the previous frame. Through this association, the model decides at each frame whether to continue a specific tracklet, discontinue it, or initiate a new tracklet.
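
The continue / discontinue / initiate decision can be illustrated with a minimal sketch. It is not the authors' implementation: the greedy matching, the cosine-similarity score, the `sim_threshold` and `max_missed` parameters, and every name below are assumptions chosen only to make the tracklet lifecycle concrete.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def update_tracklets(tracklets, detections, next_id, sim_threshold=0.5, max_missed=2):
    """One association step (hypothetical helper, not part of GSDT).

    tracklets:  dict id -> {"feat": [...], "missed": int}
    detections: list of feature vectors for the current frame
    Returns (updated tracklets, next unused id).
    """
    # Score every tracklet/detection pair and visit the best pairs first.
    pairs = sorted(
        ((cosine(t["feat"], d), tid, j)
         for tid, t in tracklets.items()
         for j, d in enumerate(detections)),
        reverse=True,
    )
    matched_t, matched_d = set(), set()
    updated = {}
    for score, tid, j in pairs:
        if score < sim_threshold:
            break  # all remaining pairs are too dissimilar
        if tid in matched_t or j in matched_d:
            continue
        # Continue: the tracklet takes over the matched detection's feature.
        updated[tid] = {"feat": detections[j], "missed": 0}
        matched_t.add(tid)
        matched_d.add(j)
    for tid, t in tracklets.items():
        if tid in matched_t:
            continue
        # Discontinue after too many consecutive misses, otherwise keep waiting.
        if t["missed"] + 1 <= max_missed:
            updated[tid] = {"feat": t["feat"], "missed": t["missed"] + 1}
    for j, d in enumerate(detections):
        if j not in matched_d:
            # Initiate: an unmatched detection starts a new tracklet.
            updated[next_id] = {"feat": d, "missed": 0}
            next_id += 1
    return updated, next_id
```

In this sketch a tracklet that stays unmatched for more than `max_missed` frames is dropped, and every unmatched detection receives a fresh identity. GSDT itself learns the association jointly with detection rather than using a fixed similarity rule.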

An overview of the GSDT Architecture (source)

An object detector and a re-identification module are used in GSDT to detect multiple objects and associate them simultaneously. In addition, graph neural networks are used to extract and learn features that improve both object detection and data association performance. In short, the GSDT architecture is composed of four modules: a GNN-based feature extraction module, a node feature aggregation module, an object detection module and a data association module.

Functional overview of node feature aggregation (source)
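
To give a feel for what node feature aggregation means, here is a generic sketch of a single round of mean-neighbour aggregation on a small graph. It is an illustration only and does not reproduce the GNN layers used in GSDT: the graph layout (one tracklet node linked to two detection nodes), the averaging rule and the function name `aggregate` are all assumptions.

```python
def aggregate(features, edges):
    """One round of mean-neighbour aggregation (illustrative, not GSDT's layer).

    features: dict node -> list of floats
    edges:    iterable of undirected (u, v) pairs
    Returns a new dict in which each node's feature is the average of its own
    feature and the mean of its neighbours' features.
    """
    neighbours = {node: [] for node in features}
    for u, v in edges:
        neighbours[u].append(v)
        neighbours[v].append(u)

    out = {}
    for node, feat in features.items():
        nbrs = neighbours[node]
        if not nbrs:
            out[node] = list(feat)  # an isolated node keeps its feature
            continue
        mean = [sum(features[m][k] for m in nbrs) / len(nbrs) for k in range(len(feat))]
        out[node] = [(own + nb) / 2 for own, nb in zip(feat, mean)]
    return out


# Hypothetical toy graph: tracklet "t1" is linked to detections "d1" and "d2".
features = {"t1": [1.0, 0.0], "d1": [0.0, 1.0], "d2": [0.0, 0.0]}
edges = [("t1", "d1"), ("t1", "d2")]
updated = aggregate(features, edges)
```

After one round, each detection node carries part of the tracklet's feature and vice versa. That mixing is the intuition behind using graph layers to share information between objects across frames.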

Python implementation of GSDT

  1. GSDT requires a PyTorch environment with a CUDA-enabled GPU runtime. Download the source code from the official repository.
 !git clone https://github.com/yongxinw/GSDT.git

  2. Change the directory to the downloaded GSDT repository and explore its contents.
 %cd /content/GSDT/
 !ls -p

  3. GSDT works well with the Anaconda3 distribution. Download and install it if the local machine does not have a conda environment.
 !wget https://repo.anaconda.com/archive/Anaconda3-2020.02-Linux-x86_64.sh
 !bash Anaconda3-2020.02-Linux-x86_64.sh

  4. Enable the conda environment. Inside conda's base environment, create a development environment named dev with Python 3.6 using the following command.
 conda create -n dev python=3.6

  5. Activate conda's development environment using the following command, and run all the remaining steps inside this environment only.
 conda activate dev

  6. Install the dependencies listed in the requirements file by running pip with the -r option.
 pip install -r requirements.txt

  7. Install PyTorch version 1.7.0 in a build compatible with CUDA 10.2, the CUDA version assumed by the next step.
 pip install torch==1.7.0

  8. Install the PyTorch Geometric package for the same CUDA version.
 bash install_pyg.sh CUDA_version=cu102

  9. Build Deformable Convolutional Networks version 2 (DCNv2) from source by running the following commands in order from the repository root.
 cd ./src/lib/models/networks/DCNv2
 bash make.sh

  10. Download the datasets of the MOT15 and MOT20 challenges. Once the datasets are ready, run the following commands from the repository root to generate the labels corresponding to the objects.
 cd src
 python gen_labels_15.py
 python gen_labels_20.py

  11. Download the pre-trained models and weights for the MOT15 and MOT20 datasets and move them to /content/GSDT/experiments. Perform a sample evaluation on two frames from the datasets by running the following commands in order from the repository root. Here, model_mot15 and model_mot20 stand for the downloaded models.
 cd ./experiments
 bash track_gnn_mot_AGNNConv_RoIAlign_mot15.sh model_mot15
 bash track_gnn_mot_AGNNConv_RoIAlign_mot20.sh model_mot20
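
Before running the evaluation scripts, it can help to confirm that the environment built in the steps above is complete. The following is a small, hypothetical helper (not part of the GSDT repository) that only uses the Python standard library. It reports whether the `torch` and `torch_geometric` packages can be found and whether the `conda` and `nvidia-smi` tools are on the PATH, without importing anything heavy.

```python
import importlib.util
import shutil


def check_environment(packages=("torch", "torch_geometric"), tools=("conda", "nvidia-smi")):
    """Report which Python packages are installed and which tools are on PATH."""
    report = {}
    for name in packages:
        # find_spec returns None when a top-level package cannot be located.
        report[name] = importlib.util.find_spec(name) is not None
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH.
        report[tool] = shutil.which(tool) is not None
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```

A missing entry points back to the step that installs it. For example, a missing `torch_geometric` suggests that the PyTorch Geometric installation step did not complete inside the dev environment.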

Performance of GSDT

GSDT has been evaluated on the open challenges MOT15, MOT16, MOT17 and MOT20. For comparison with competing models, its authors submitted the model to the official leaderboard of the MOT challenge. Models are evaluated on standard metrics including MOTA, IDF1, MT, ML and IDS.
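
Two of these metrics have compact standard definitions that are easy to sketch. MOTA combines false negatives, false positives and identity switches (IDS) relative to the number of ground-truth objects, and IDF1 is the F1 score of identity-consistent matches. The function names and the counts in the example call are hypothetical; official scores come from the MOT challenge evaluation tools, not from this snippet.

```python
def mota(false_negatives, false_positives, id_switches, num_gt):
    """MOTA = 1 - (FN + FP + IDS) / GT, with all counts summed over all frames."""
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt


def idf1(idtp, idfp, idfn):
    """IDF1 = 2 * IDTP / (2 * IDTP + IDFP + IDFN)."""
    denom = 2 * idtp + idfp + idfn
    return 2 * idtp / denom if denom else 0.0


# Hypothetical counts, for illustration only.
example_mota = mota(false_negatives=10, false_positives=5, id_switches=5, num_gt=100)
example_idf1 = idf1(idtp=80, idfp=10, idfn=30)
```

MOTA can become negative when the number of errors exceeds the number of ground-truth objects. A higher MOTA and IDF1 and a lower IDS indicate better tracking.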

A sample multi-object tracking result on the MOT20 dataset using the GSDT model (Source)
A sample multi-object tracking result on the MOT17 dataset using the GSDT model (Source)

GSDT outperforms most of the well-acclaimed models, including DMT, LIF_TsimInt, MDP_SubCNN, CDA_DDAL, MPNTrack, EAMTT, AP_HWDPL, NOMTwSDP, RAR15, Tube_TK, CTrackerV1, CTTrack17, SORT20 and POI. It was recognized as the state of the art in the MOT challenge at the time of its publication.

A sample multi-object tracking on MOT17 dataset using the GSDT model (Source)

Rajkumar Lakshmanamoorthy
A geek in Machine Learning with a Master's degree in Engineering and a passion for writing and exploring new things. Loves reading novels, cooking, practicing martial arts, and occasionally writing novels and poems.
