The library addresses a weakness of most trackers: objects with low detection scores, such as occluded objects, are simply thrown away, which causes a non-negligible number of missed true objects and fragmented trajectories. ByteTrack uses a simple, effective and generic association method that tracks by associating every detection box instead of only the high-score ones. For the low-score detection boxes, it uses their similarity to existing tracklets to recover true objects and filter out background detections.
ByteTrack's association method is called BYTE. Traditional methods keep only the high-score detection boxes; BYTE keeps every detection box and separates them into high-score and low-score groups. It first associates the high-score detection boxes with the tracklets. Some tracklets remain unmatched because no appropriate high-score detection box fits them, which usually happens under occlusion, motion blur or a change in size. BYTE then associates the low-score detection boxes with these unmatched tracklets, recovering the objects in the low-score boxes and filtering out the background at the same time.
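The two-stage association described above can be sketched in Python. This is a minimal illustration, not ByteTrack's actual implementation: the names `Detection`, `greedy_match` and `byte_associate`, the threshold values, and the use of greedy IoU matching as the similarity and assignment step are all assumptions made for this example. Track boxes are assumed to be already predicted for the current frame.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


@dataclass
class Detection:
    box: Box
    score: float


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_match(tracks: Dict[int, Box], dets: List[Detection], min_iou: float):
    """Greedy IoU matching (an assumed stand-in for an optimal assignment)."""
    pairs = sorted(
        ((iou(box, d.box), tid, i) for tid, box in tracks.items() for i, d in enumerate(dets)),
        reverse=True,
    )
    matched: Dict[int, Detection] = {}
    used = set()
    for score, tid, i in pairs:
        if score < min_iou:
            break  # remaining pairs overlap even less
        if tid in matched or i in used:
            continue
        matched[tid] = dets[i]
        used.add(i)
    unmatched_tracks = {tid: box for tid, box in tracks.items() if tid not in matched}
    unmatched_dets = [d for i, d in enumerate(dets) if i not in used]
    return matched, unmatched_tracks, unmatched_dets


def byte_associate(
    tracks: Dict[int, Box],
    detections: List[Detection],
    high_thresh: float = 0.6,  # assumed value for illustration
    low_thresh: float = 0.1,   # assumed value for illustration
    min_iou: float = 0.3,      # assumed value for illustration
):
    # Keep every detection box, split into high-score and low-score groups.
    high = [d for d in detections if d.score >= high_thresh]
    low = [d for d in detections if low_thresh <= d.score < high_thresh]

    # First association: high-score boxes against all tracklets.
    first, remaining, new_candidates = greedy_match(tracks, high, min_iou)

    # Second association: low-score boxes against the unmatched tracklets.
    # Low-score boxes that match nothing are treated as background and dropped.
    second, lost, _ = greedy_match(remaining, low, min_iou)

    return {**first, **second}, lost, new_candidates
```

With two tracks, one clearly visible object (score 0.9), one occluded object (score 0.3) and one spurious low-score box, the occluded object is recovered in the second pass and the spurious box is discarded. Unmatched high-score boxes are returned separately so that a caller can start new tracklets from them.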
The input of BYTE is a video sequence V, along with an object detector and a Kalman filter. ByteTrack pairs the BYTE association method with a high-performance detector, YOLOX. YOLOX switches the YOLO series of detectors to an anchor-free design and adds other advanced detection techniques, including decoupled heads, strong data augmentations such as Mosaic and Mixup, and the effective label assignment strategy SimOTA, to achieve state-of-the-art performance on object detection.
ByteTrack was evaluated on the half validation set of MOT17 using different combinations of training data. When trained only on the half training set of MOT17, it achieves 75.8 MOTA, outperforming most methods. This is because training uses strong augmentations such as Mosaic and Mixup. When CrowdHuman, CityPersons and ETHZ are added for training, it achieves 76.7 MOTA and 79.7 IDF1.
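For readers unfamiliar with the two metrics quoted above, the short sketch below shows how MOTA and IDF1 are conventionally computed from error counts. The function names and the counts in the usage note are hypothetical and are not ByteTrack results; benchmark scores are usually reported as these values multiplied by 100.

```python
def mota(false_negatives: int, false_positives: int, id_switches: int, num_gt: int) -> float:
    """MOTA = 1 - (FN + FP + IDSW) / GT, where GT is the number of ground-truth boxes."""
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt


def idf1(idtp: int, idfp: int, idfn: int) -> float:
    """IDF1 = 2 * IDTP / (2 * IDTP + IDFP + IDFN), the F1 score over identity matches."""
    denom = 2 * idtp + idfp + idfn
    return 2 * idtp / denom if denom else 0.0
```

With made-up counts, `mota(10, 5, 5, 100)` is 0.8 and `idf1(80, 10, 30)` is 0.8. MOTA mainly reflects detection errors, while IDF1 reflects how consistently identities are preserved along trajectories.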
ByteTrack is very robust to occlusion thanks to its accurate detection and its association of low-score detection boxes. The model also sheds light on how to make the best use of detection results to enhance multi-object tracking. The research team hopes that ByteTrack's high accuracy, fast speed and simplicity will make it attractive and effective in real applications.