
Explained: Didi Chuxing’s Prize-Winning Real-Time 2D Object Detection Framework

Automobile firms like Waymo and Tesla are doubling down on fully self-driving systems in pursuit of Level 5 autonomy. Chinese researchers from Didi Chuxing and Tianjin University have now introduced a real-time method to detect 2D objects in images.

The proposed detection framework came second in the Waymo Open Dataset (WOD) Challenges with 75 percent L1 mAP and 69.72 percent L2 mAP in the real-time 2D detection track, at a latency of 45.8 ms per frame on an NVIDIA Tesla V100 GPU.



The recently concluded Waymo Open Dataset 2021 challenge had participants from all around the world. In August 2019, when the company first launched the challenge, the perception dataset comprised high-resolution sensor data and labels for 1,950 segments. Last March, Waymo expanded its Open Dataset to include a motion dataset containing object trajectories and corresponding 3D maps for 103,354 segments.

Example of images in Waymo Open Dataset (Source: arXiv) 

Last year’s first-place solution at the WOD challenge, from researchers at TuSimple, trained multiple models on the union of the training and validation sets, including ResNet50-v1d, ResNet50-v1b, Res2Net-v1b, HRNetv2p-W32 and HRNetv2p-W48. All the models were pre-trained on ImageNet, and the Linear-Reweight strategy was used for the final ensemble. The final model achieved 74.43 at Level 2 on ALL_NS in the WOD test set.

“Compared with the 2D detection challenge in 2020, the real-time 2D detection challenge requests this year suggest that the submitted model must run faster than 70 ms per frame on an NVIDIA Tesla V100 GPU. We conducted a SOTA 2D object detection framework in this challenge,” said the researchers. 
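To put the 70 ms limit in perspective, here is a minimal sketch of checking a detector against a per-frame latency budget. The `fake_detector` function, the warm-up count and the toy frames are placeholders assumed for illustration; the challenge's own timing procedure is not described in this article.

```python
import statistics
import time

LATENCY_BUDGET_MS = 70.0  # per-frame limit quoted for the real-time track


def fake_detector(frame):
    """Hypothetical stand-in for a real model's forward pass."""
    return [sum(frame)]


def mean_latency_ms(detector, frames, warmup=3):
    """Return the mean per-frame wall-clock latency in milliseconds."""
    for frame in frames[:warmup]:  # warm-up runs are not timed
        detector(frame)
    timings = []
    for frame in frames:
        start = time.perf_counter()
        detector(frame)
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(timings)


def within_budget(latency_ms, budget_ms=LATENCY_BUDGET_MS):
    """A model qualifies only if it runs strictly faster than the budget."""
    return latency_ms < budget_ms


def frames_per_second(latency_ms):
    """Convert a per-frame latency into throughput."""
    return 1000.0 / latency_ms


frames = [[0.0] * 1000 for _ in range(20)]  # toy inputs, not real images
measured = mean_latency_ms(fake_detector, frames)
print(f"measured: {measured:.3f} ms, within budget: {within_budget(measured)}")
print(f"45.8 ms per frame is about {frames_per_second(45.8):.1f} frames per second")
```

On a GPU, a real measurement would also need to wait for the device to finish each frame before stopping the timer, which this CPU-only sketch does not model.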

The process

In the research paper ‘2nd Place Solution for Waymo Open Dataset Challenge - Real-time 2D Object Detection,’ the authors from Tianjin University and Didi Chuxing (Yueming Zhang, Xiaolin Song, Bing Bai, Tengfei Xing, Chao Liu, Xin Gao, Zhihui Wang, Yawei Wen, Haojin Liao, Guoshan Zhang and Pengfei Xu) proposed a method to detect 2D objects in real time.

The researchers aggregated multiple one-stage object detectors and trained models with different input strategies independently, aiming for more accurate multi-scale detection in each category, especially for small objects. “For model acceleration, we leveraged TensorRT to optimise the inference time of our detection pipeline,” Zhang said. 

Deep learning-based object detectors fall into two groups: one-stage detectors and two-stage detectors. The team aggregated popular one-stage object detectors with different settings. “Considering the runtime, we chose the popular one-stage detector YOLOR, which is one of the upgraded versions from YOLOv1, and achieved the state-of-the-art of real-time object detection on COCO,” said Song. 

Song added that, on the COCO dataset, YOLOR's mAP is 3.8 percent higher than that of PP-YOLOv2 at the same inference speed, and its inference speed is 88 percent higher than that of Scaled-YOLOv4-P7 at the same accuracy. “We choose YOLOR as our detector,” added Zhang. 

The pipeline of their solution (Source: arXiv) 

Additionally, the researchers used scale enhancement techniques and K-Means clustering to generate the hyperparameters for real-time 2D detection on the Waymo Open Dataset, and TensorRT for model acceleration. TensorRT is a high-performance deep learning inference optimiser that can significantly reduce inference time while preserving accuracy. 
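The article does not say which hyperparameters were clustered. In YOLO-family detectors, K-Means is commonly used to pick anchor box sizes from the training labels, so the sketch below assumes that use. The function names, the toy box sizes and the choice of `1 - IoU` as the distance are illustrative assumptions, not the authors' code.

```python
import random


def iou_wh(box, centroid):
    """IoU of two boxes given as (width, height), aligned at a shared corner."""
    inter = min(box[0], centroid[0]) * min(box[1], centroid[1])
    union = box[0] * box[1] + centroid[0] * centroid[1] - inter
    return inter / union


def kmeans_anchors(boxes, k, iters=50, seed=0):
    """Cluster (width, height) pairs, treating higher IoU as 'closer'."""
    rng = random.Random(seed)
    centroids = rng.sample(boxes, k)  # initialise from k distinct boxes
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for box in boxes:
            # Assign each box to the centroid it overlaps most.
            best = max(range(k), key=lambda i: iou_wh(box, centroids[i]))
            clusters[best].append(box)
        new_centroids = []
        for i, members in enumerate(clusters):
            if not members:  # keep the old centroid if a cluster empties
                new_centroids.append(centroids[i])
                continue
            w = sum(b[0] for b in members) / len(members)
            h = sum(b[1] for b in members) / len(members)
            new_centroids.append((w, h))
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    return sorted(centroids, key=lambda c: c[0] * c[1])  # smallest area first


# Toy label sizes in pixels (made up): small, medium and large objects.
gen = random.Random(1)
toy_boxes = []
for base_w, base_h in [(12, 24), (60, 40), (200, 150)]:
    for _ in range(30):
        toy_boxes.append((base_w * gen.uniform(0.8, 1.2),
                          base_h * gen.uniform(0.8, 1.2)))

anchors = kmeans_anchors(toy_boxes, k=3)
for w, h in anchors:
    print(f"anchor: {w:.1f} x {h:.1f}")
```

Using IoU rather than Euclidean distance keeps large boxes from dominating the clustering, which matters when small objects are the priority.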

They also used a self-learning method in which the model automatically cleaned the dataset during training, which improved model performance. 


Starting from a baseline model built on the YOLOR detector, the researchers carried out five improvement experiments. After applying all of them (auto data cleaning, multi-scale training, scale enhancement, independent threshold-NMS and model ensemble), their mAP increased from 61.54 percent to 66.67 percent, a gain of 5.13 percentage points. 
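The article names independent threshold-NMS and model ensembling without describing them. One plausible reading, sketched below, pools the detections from several models and then runs greedy non-maximum suppression separately for each class, each with its own IoU threshold. The class names, boxes, scores and threshold values are made up for illustration and are not taken from the paper.

```python
def iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def per_class_nms(detections, iou_thresholds, default_threshold=0.5):
    """Greedy NMS per class; detections are (box, score, label) tuples."""
    kept = []
    for label in sorted({d[2] for d in detections}):
        threshold = iou_thresholds.get(label, default_threshold)
        candidates = sorted((d for d in detections if d[2] == label),
                            key=lambda d: d[1], reverse=True)
        survivors = []
        for det in candidates:
            # Keep a box only if it does not overlap a stronger kept box.
            if all(iou(det[0], s[0]) <= threshold for s in survivors):
                survivors.append(det)
        kept.extend(survivors)
    return kept


# Hypothetical outputs from two models for the same frame.
model_a = [((100, 100, 200, 200), 0.9, "vehicle"),
           ((300, 100, 320, 160), 0.7, "pedestrian")]
model_b = [((105, 102, 203, 198), 0.8, "vehicle"),
           ((310, 100, 330, 160), 0.6, "pedestrian")]

pooled = model_a + model_b  # simplest possible ensemble: concatenate
thresholds = {"vehicle": 0.6, "pedestrian": 0.5}  # assumed values
result = per_class_nms(pooled, thresholds)
for box, score, label in result:
    print(label, score, box)
```

In this toy frame the two vehicle boxes overlap heavily and collapse into one, while the two pedestrian boxes overlap only slightly and both survive. Lowering the pedestrian threshold would merge them, which is why tuning the threshold per class can matter for crowded categories.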

 (Source: GitHub/Waymo)

To achieve robust detection results, the researchers aggregated popular one-stage object detectors with a scale enhancement strategy to better detect small objects. With these techniques in place, their detection framework took second place in the real-time 2D detection track of the WOD challenges, while researchers from LeapMotor took first place. 


Amit Raja Naik
Amit Raja Naik is a seasoned technology journalist who covers everything from data science to machine learning and artificial intelligence for Analytics India Magazine, where he examines the trends, challenges, ideas, and transformations across the industry.
