Computer vision is one of the buzziest fields of AI, and major companies are dedicating massive resources to launching the next big thing in the area. One project that has truly stood out in recent years is YOLO – You Only Look Once. First introduced in 2015 by Joseph Redmon et al. in the paper “You Only Look Once: Unified, Real-Time Object Detection,” it is considered a breakthrough in the field.
Over the years, this model has undergone several iterations and advancements. Version 2 was released in 2016 (YOLO9000: Better, Faster, Stronger), followed by YOLOv3 (YOLOv3: An Incremental Improvement) in 2018, YOLOv4 (YOLOv4: Optimal Speed and Accuracy of Object Detection) in April 2020, and YOLOv5 in May 2020.
YOLOv6 was recently introduced by the Chinese company Meituan. It is not part of the official YOLO series but was named so because its authors were heavily inspired by the original one-stage YOLO architecture; Meituan accordingly uses the prefix MT, calling it MT-YOLOv6.
YOLOv6 is an object detection framework dedicated to industrial applications. As per the company’s release, the most widely used YOLO detection frameworks – YOLOv5, YOLOX, and PP-YOLOE – leave a lot of room for improvement in terms of speed and accuracy. Recognising these ‘flaws’, Meituan introduced MT-YOLOv6 by studying and building further on the existing technologies in the industry. The MT-YOLOv6 framework supports the entire chain of industrial application requirements, such as model training, inference, and multi-platform deployment. According to the team, MT-YOLOv6 carries improvements and optimisations at the algorithmic level, such as training strategies and network structure, and has displayed impressive results in both accuracy and speed when tested on the COCO dataset.
Unlike YOLOv5/YOLOX, which are based on CSPNet with a multi-branch approach and residual structure, Meituan redesigned the Backbone and Neck following the idea of hardware-aware neural network design. According to the team, this helps overcome the challenges of latency and bandwidth utilisation, since the design is guided by the characteristics of the target hardware and of the inference/compilation framework. Meituan introduced two redesigned detection components: the EfficientRep Backbone and the Rep-PAN Neck.
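The ‘Rep’ in EfficientRep and Rep-PAN refers to RepVGG-style structural re-parameterisation: a block trains with parallel 3×3, 1×1 and identity branches, which are then fused into a single 3×3 convolution for inference – the single branch is cheaper and friendlier to hardware. The sketch below illustrates only that fusion idea, on a single channel with no bias or batch norm; it is not Meituan’s implementation.

```python
# Sketch of RepVGG-style re-parameterisation: fold parallel 3x3, 1x1 and
# identity branches into one 3x3 kernel. Single channel, no bias/BN.
import numpy as np

def conv2d_same(x, k):
    """Naive 2-D cross-correlation with zero padding, stride 1."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * k)
    return out

def fuse(k3, k1):
    """Fold a 1x1 branch and the identity branch into one 3x3 kernel."""
    fused = k3.astype(float).copy()
    fused[1, 1] += k1[0, 0]   # the 1x1 branch lands on the centre tap
    fused[1, 1] += 1.0        # identity is a centre tap of weight 1
    return fused

rng = np.random.default_rng(0)
x = rng.standard_normal((5, 5))
k3 = rng.standard_normal((3, 3))
k1 = rng.standard_normal((1, 1))

# Because convolution is linear in the kernel, the three training-time
# branches and the single fused inference-time branch give identical outputs.
multi_branch = conv2d_same(x, k3) + conv2d_same(x, k1) + x
single_branch = conv2d_same(x, fuse(k3, k1))
assert np.allclose(multi_branch, single_branch)
```

The fused kernel is mathematically exact, so the speed-up at inference comes for free once training is done.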
Further, the researchers at Meituan adopted a decoupled head structure, balancing the representation ability of the operators against the computing overhead on the hardware. Using a hybrid strategy, they redesigned a more efficient decoupled head. The team observed that this increased accuracy by 0.2 per cent and speed by 6.8 per cent.
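To make the decoupled-head idea concrete, the NumPy sketch below contrasts a coupled head (one projection emitting class scores and box offsets together) with a decoupled one (a separate branch per task). The layer sizes, the 80 classes and the 4 box parameters are illustrative assumptions, not Meituan’s actual configuration.

```python
# Sketch: coupled vs decoupled detection head over the same neck features.
# Dimensions are illustrative (80 COCO-style classes, 4 box parameters).
import numpy as np

rng = np.random.default_rng(0)
feat = rng.standard_normal((100, 64))   # 100 spatial locations, 64-dim features

# Coupled head: a single projection emits class scores and box offsets at once.
w_coupled = rng.standard_normal((64, 80 + 4))
coupled_out = feat @ w_coupled          # (100, 84): tasks share one output

# Decoupled head: classification and regression each get their own branch,
# so the two tasks no longer compete for the same weights.
w_cls_h, w_cls = rng.standard_normal((64, 64)), rng.standard_normal((64, 80))
w_reg_h, w_reg = rng.standard_normal((64, 64)), rng.standard_normal((64, 4))

cls_scores = np.maximum(feat @ w_cls_h, 0) @ w_cls   # (100, 80) class logits
box_deltas = np.maximum(feat @ w_reg_h, 0) @ w_reg   # (100, 4) box offsets
```

The trade-off the team balanced is visible even here: the decoupled variant costs extra parameters and compute per task, which is why the branch widths matter on real hardware.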
In terms of training, Meituan adopted three strategies:
Anchor-free paradigm: This strategy has been widely used in recent years due to its strong generalisation ability and simple code logic. Compared to other methods, the team found that the Anchor-free detector had a 51 per cent improvement in speed.
SimOTA Tag Assignment Policy: To obtain high-quality positive samples, the team used the SimOTA algorithm that dynamically allocates positive samples to improve detection accuracy.
SIoU bounding box regression loss: YOLOv6 adopts the SIoU bounding box regression loss function to supervise the learning of the network. SIoU redefines the distance loss by introducing the vector angle between the required regressions, which improves regression accuracy and, in turn, detection accuracy.
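As a rough sketch of what an IoU-family regression loss supervises, the snippet below implements the plain IoU loss on axis-aligned (x1, y1, x2, y2) boxes; SIoU additionally adds angle, distance and shape penalty terms, which are omitted here for brevity.

```python
# Baseline IoU regression loss on axis-aligned boxes (x1, y1, x2, y2).
# SIoU extends this with angle/distance/shape penalties (not shown).

def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda b: (b[2] - b[0]) * (b[3] - b[1])
    union = area(box_a) + area(box_b) - inter
    return inter / union if union > 0 else 0.0

def iou_loss(pred, target):
    """Loss is 0 for a perfect box and approaches 1 as overlap vanishes."""
    return 1.0 - iou(pred, target)

print(iou_loss((0, 0, 2, 2), (1, 1, 3, 3)))  # partial overlap -> 1 - 1/7 ~ 0.857
```

One known weakness of the plain version, which the SIoU terms address, is that non-overlapping boxes all receive the same loss of 1 regardless of how far apart they are.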
YOLOv5 vs MT-YOLOv6
According to the benchmarking performed by Meituan’s team, YOLOv6 outperforms YOLOv5 and other YOLO models in terms of accuracy and speed on the COCO dataset. YOLOv6-nano achieved 35 per cent AP on COCO and reached 1,242 FPS; compared to YOLOv5-nano, accuracy was up by 7 per cent AP and speed by 85 per cent. YOLOv6-tiny recorded 41.3 per cent AP on COCO; compared to YOLOv5-s, accuracy increased by 3.9 per cent and speed by 29.4 per cent. Finally, YOLOv6-s obtained 43.1 per cent AP on COCO at 520 FPS; compared to YOLOX-s, its accuracy is 2.6 per cent AP better and its speed is up by 38.6 per cent.
As per a few discussion threads and blogs, YOLOv6 is not a straight upgrade of YOLOv5 from Ultralytics. Observers note that while MT-YOLOv6 detects smaller objects more reliably, its predictions flicker more than YOLOv5’s, and it struggles with close-up objects. In other words, MT-YOLOv6 trades some stability for impressive small-object detection in densely packed environments. In terms of flexibility, YOLOv5 defines models in YAML, while YOLOv6 defines the model parameters directly in Python; YOLOv5 was found to be the more customisable of the two.
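To make the configuration difference concrete: where YOLOv5 describes a model in a YAML file, a YOLOv6 config is ordinary Python code. The snippet below is a loose, hypothetical illustration of that style – the keys and values are assumptions for illustration, not a real YOLOv6 config file.

```python
# Hypothetical Python-style model config, illustrating the YOLOv6 approach of
# defining model parameters directly in Python rather than in YAML.
# Keys and values are illustrative assumptions, not an actual YOLOv6 config.
model = dict(
    type='YOLOv6s',
    backbone=dict(type='EfficientRep', depth_multiple=0.33, width_multiple=0.50),
    neck=dict(type='RepPAN'),
    head=dict(type='DecoupledHead', num_classes=80, anchors=0),  # anchor-free
)
```

A Python config can be manipulated programmatically (loops, conditionals, imports), whereas YAML is declarative and arguably easier to diff and template, which is one reason reviewers found YOLOv5 more customisable.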
What future does YOLOv6 hold?
Meituan’s team wants to further improve the full range of models and advance their detection performance. The team said the model will support ARM platform deployment and full-chain adaptation such as quantisation and distillation. They would also like to explore the generalisation performance of YOLOv6 in different business scenarios.