What is GSDT: GNNs for Simultaneous Detection and Tracking

GSDT is an end-to-end trainable joint Multi-Object Tracking approach that uses Graph Neural Networks for Simultaneous Detection and Tracking.

Multi-Object Tracking (MOT) is the task of detecting and following multiple moving objects at the same time in a dynamic environment. It has important applications in autonomous vehicles, robot navigation, security surveillance, medical imaging and sports analysis. Multi-Object Tracking comprises two key challenges: object detection and data association. Object detection is performed by a neural network that looks for the objects of interest, whereas data association is performed by a time-aware neural network that looks for correspondences between the same object in two different frames. Traditional multi-object tracking approaches train the object detection network and the data association network separately, optimizing each one for its own part of the job. This strategy cannot model object detection and data association end-to-end, even though the two tasks depend heavily on each other, and that limits how far performance can improve.

A few recent approaches introduced joint multi-object tracking to tackle this problem. Some tracked each object individually and independently, which simplified data association but introduced a new problem: by tracking objects in isolation, they ignored object-object relationships, which are crucial for identifying relative patterns among objects. Other approaches did model object-object relationships, but they still required the object detector to be trained separately.

To this end, Yongxin Wang, Kris Kitani and Xinshuo Weng of the Robotics Institute, Carnegie Mellon University, have developed an end-to-end trainable joint Multi-Object Tracking architecture based on Graph Neural Networks. It is named GSDT, short for GNNs for Simultaneous Detection and Tracking. GSDT models object-object relationships for both data association and object detection. Because it follows the joint multi-object tracking strategy, it can be trained and optimized as a whole. It employs Graph Neural Networks to obtain more discriminative features. At the time of its publication, the model achieved state-of-the-art results on several public multi-object tracking benchmarks, including MOT15, MOT16, MOT17 and MOT20.

Sample multi-object tracking on the MOT20 dataset using the GSDT model (source)

How GSDT differs from competing models

The training strategy of GSDT compared with previous works (source)

In GSDT, the model takes two images from successive frames and the tracklets from the previous frame as inputs. With these inputs, it detects the objects in the current frame and associates the detected objects with the tracklets of the previous frame. By associating tracklets with objects, the model decides at each frame whether to continue a specific tracklet, discontinue it, or initiate a new tracklet.
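The continue, discontinue and initiate decisions can be illustrated with a small sketch. This is not GSDT's code: the function name `update_tracklets`, the greedy matching, the similarity `threshold` and the `max_age` rule are all assumptions chosen to keep the example short, and the similarity scores are made up.

```python
def update_tracklets(tracklets, detections, similarity, threshold=0.5, max_age=2):
    """Greedy association of previous-frame tracklets with current detections.

    tracklets:  dict {track_id (int): {"age": frames since last match}}
    detections: list of detection ids found in the current frame
    similarity: dict {(track_id, det_id): score in [0, 1]}  (hypothetical scores)
    Returns (matches, updated_tracklets).
    """
    # Visit candidate pairs from most to least similar.
    pairs = sorted(similarity.items(), key=lambda kv: kv[1], reverse=True)
    matches, used_tracks, used_dets = {}, set(), set()
    for (tid, did), score in pairs:
        if score < threshold:
            break  # every remaining pair is too dissimilar
        if tid in used_tracks or did in used_dets:
            continue
        matches[tid] = did
        used_tracks.add(tid)
        used_dets.add(did)

    updated = {}
    for tid, state in tracklets.items():
        if tid in matches:
            updated[tid] = {"age": 0}                 # continue the tracklet
        elif state["age"] + 1 <= max_age:
            updated[tid] = {"age": state["age"] + 1}  # unmatched, keep for now
        # otherwise the tracklet is discontinued (not copied over)

    next_id = max(tracklets, default=-1) + 1
    for did in detections:
        if did not in used_dets:
            updated[next_id] = {"age": 0}             # initiate a new tracklet
            matches[next_id] = did
            next_id += 1
    return matches, updated


# Tracklet 0 is continued, tracklet 1 has been unmatched too long and is
# dropped, and detection "b" starts a new tracklet with id 2.
tracks = {0: {"age": 0}, 1: {"age": 2}}
scores = {(0, "a"): 0.9, (0, "b"): 0.2, (1, "a"): 0.6, (1, "b"): 0.1}
matches, tracks = update_tracklets(tracks, ["a", "b"], scores)
print(matches)  # {0: 'a', 2: 'b'}
```

In GSDT the association scores come from learned features rather than a hand-written table, so this sketch shows only the bookkeeping around the decision.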

An overview of the GSDT Architecture (source)

GSDT uses an object detector and a re-identification module to detect multiple objects and associate them simultaneously. In addition, graph neural networks extract and learn features that improve both object detection and data association performance. In short, the GSDT architecture is composed of four modules: a GNN-based feature extraction module, a node feature aggregation module, an object detection module and a data association module.

Functional overview of node feature aggregation (source)
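To give a feel for what node feature aggregation does, here is a minimal pure-Python sketch of one attention-weighted aggregation round over a small graph. The evaluation scripts in the repository mention AGNNConv, so the sketch follows that general idea: each node averages its neighbours' features, weighted by a softmax over cosine similarity. Everything else is an assumption made for illustration, including the graph layout (tracklet nodes linked to detection nodes), the function names and the fixed `beta`. The real model uses PyTorch Geometric layers with learned parameters.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def aggregate(features, edges, beta=1.0):
    """One round of attention-weighted neighbour aggregation.

    features: list of feature vectors, one per node
    edges:    list of undirected (i, j) index pairs
    beta:     temperature applied to the similarity (assumed fixed here)
    """
    n = len(features)
    # Self-loops let each node keep part of its own feature.
    neighbours = {i: {i} for i in range(n)}
    for i, j in edges:
        neighbours[i].add(j)
        neighbours[j].add(i)

    updated = []
    for i in range(n):
        nbrs = sorted(neighbours[i])
        scores = [beta * cosine(features[i], features[j]) for j in nbrs]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        dim = len(features[i])
        updated.append([
            sum(w / total * features[j][d] for w, j in zip(weights, nbrs))
            for d in range(dim)
        ])
    return updated


# Node 0 plays the role of a tracklet, nodes 1 and 2 of candidate detections.
feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
print(aggregate(feats, edges=[(0, 1), (0, 2)]))
```

After aggregation, the tracklet node's feature is pulled more strongly towards the detection that resembles it, which is the kind of object-object information the detection and association modules can then use.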

Python implementation of GSDT

  1. GSDT requires a PyTorch environment with a CUDA-enabled GPU runtime. Download the source code from the official repository.
!git clone https://github.com/yongxinw/GSDT.git

  2. Change into the downloaded GSDT directory and list its contents. The path below assumes a Google Colab runtime.
%cd /content/GSDT/
!ls -p

  3. GSDT works well with the Anaconda3 distribution. Download and install it if the local machine does not already have conda.
!wget https://repo.anaconda.com/archive/Anaconda3-2020.02-Linux-x86_64.sh
!bash Anaconda3-2020.02-Linux-x86_64.sh

  4. From conda's base environment, create a development environment with the following command.
conda create -n dev python=3.6

  5. Activate the development environment and run all of the remaining steps inside it.
conda activate dev

  6. Install the dependencies listed in requirements.txt with pip.
pip install -r requirements.txt

  7. Install PyTorch 1.7.0, which is compatible with CUDA 10.2. On Linux, the default pip wheel for this version is built against CUDA 10.2.
pip install torch==1.7.0

  8. Install the PyTorch Geometric package with the install_pyg.sh script.
bash install_pyg.sh CUDA_version=cu102

  9. Build Deformable Convolutional Networks version 2 (DCNv2) from source by running the following commands in order.
cd ./src/lib/models/networks/DCNv2
bash make.sh

  10. Download the datasets from the MOT15 and MOT20 challenges. Once the data is ready, return to the repository root and generate the labels for the objects with the following commands.
cd src
python gen_labels_15.py
python gen_labels_20.py

  11. Download the pre-trained models and weights for the MOT15 and MOT20 datasets and move them to /content/GSDT/experiments. Then, from the repository root, run a sample evaluation on each dataset with the following commands.
cd ./experiments
bash track_gnn_mot_AGNNConv_RoIAlign_mot15.sh model_mot15
bash track_gnn_mot_AGNNConv_RoIAlign_mot20.sh model_mot20
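Before building DCNv2 or running the evaluation scripts, it can help to confirm that the environment sees PyTorch, PyTorch Geometric and a GPU. The helper below is a small sketch and not part of the GSDT repository; the name `check_environment` is an assumption. It uses only the standard library plus `torch.__version__` and `torch.cuda.is_available()`, and it skips the PyTorch checks if PyTorch is not installed.

```python
import importlib.util


def check_environment(packages=("torch", "torch_geometric")):
    """Report which packages can be imported and, if PyTorch is present,
    its version and whether a CUDA device is visible."""
    report = {name: importlib.util.find_spec(name) is not None for name in packages}
    if report.get("torch"):
        import torch  # imported lazily so the check also runs without PyTorch

        report["torch_version"] = torch.__version__
        report["cuda_available"] = torch.cuda.is_available()
    return report


print(check_environment())
```

If `cuda_available` comes back `False`, the DCNv2 build and the tracking scripts are unlikely to run as described, so it is worth fixing the runtime first.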

Performance of GSDT

GSDT has been evaluated on the open challenges MOT15, MOT16, MOT17 and MOT20. Its authors submitted the model to the official leaderboard of the MOT challenge, where it is compared with competing models. Models are evaluated on several standard metrics, including MOTA (Multiple Object Tracking Accuracy), IDF1 (identity F1 score), MT (mostly tracked), ML (mostly lost) and IDS (identity switches).
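For readers unfamiliar with the two headline metrics, the sketch below computes them from error counts using their standard definitions: MOTA = 1 - (FN + FP + IDS) / GT, and IDF1 = 2·IDTP / (2·IDTP + IDFP + IDFN). The function names and the counts in the usage example are made up for illustration. They are not GSDT results, and official scores come from the MOT challenge evaluation server.

```python
def mota(false_negatives, false_positives, id_switches, num_gt):
    """Multiple Object Tracking Accuracy from error counts summed over all frames.

    num_gt is the total number of ground-truth boxes over all frames.
    """
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt


def idf1(idtp, idfp, idfn):
    """Identity F1 score from identity-level true/false positives and false negatives."""
    denom = 2 * idtp + idfp + idfn
    return 2 * idtp / denom if denom else 0.0


# Hypothetical counts, chosen only to show the arithmetic.
print(f"MOTA = {mota(100, 50, 10, 1000):.3f}")  # MOTA = 0.840
print(f"IDF1 = {idf1(800, 100, 100):.3f}")      # IDF1 = 0.889
```

MOTA can be negative when the tracker makes more errors than there are ground-truth boxes, whereas IDF1 always lies between 0 and 1.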

Sample multi-object tracking on the MOT20 dataset using the GSDT model (Source)
Sample multi-object tracking on the MOT17 dataset using the GSDT model (Source)

GSDT outperforms many well-regarded models, including DMT, LIF_TsimInt, MDP_SubCNN, CDA_DDAL, MPNTrack, EAMTT, AP_HWDPL, NOMTwSDP, RAR15, Tube_TK, CTrackerV1, CTTrack17, SORT20 and POI. At the time of its publication, GSDT was recognized as the state-of-the-art in the MOT challenge.

A sample multi-object tracking on MOT17 dataset using the GSDT model (Source)

Rajkumar Lakshmanamoorthy
A geek in Machine Learning with a Master's degree in Engineering and a passion for writing and exploring new things. Loves reading novels, cooking, practicing martial arts, and occasionally writing novels and poems.
