Real-time object detection is the task of detecting objects in video as it plays. Many network architectures have been proposed over the years; in our previous article we discussed EfficientDet, which has already been outperformed by YOLOv4. Today we are going to discuss YOLOv5.
YOLO, short for “You Only Look Once”, is one of the most versatile and famous object detection models. For real-time object detection work, YOLO is often the first choice of data scientists and machine learning engineers. YOLO algorithms divide each input image into an S×S grid. Each grid cell is responsible for detecting the objects whose center falls inside it, and predicts the bounding boxes for those objects. Every box has five main attributes: x and y for the center coordinates, w and h for the width and height of the object, and a confidence score for the probability that the box contains an object.
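To make the grid idea concrete, here is a minimal sketch of how a ground-truth box could be assigned to a grid cell and turned into the (x, y, w, h) attributes described above. The function names, the default S=7, and the exact normalization (offsets within the cell, sizes as fractions of the image) are assumptions for illustration in the style of YOLOv1, not code from any YOLO release.

```python
def responsible_cell(x_center, y_center, img_w, img_h, S=7):
    """Return (row, col) of the S x S grid cell that contains the box center."""
    col = min(int(x_center / img_w * S), S - 1)  # clamp centers on the right edge
    row = min(int(y_center / img_h * S), S - 1)  # clamp centers on the bottom edge
    return row, col


def encode_box(x_center, y_center, w, h, img_w, img_h, S=7):
    """Encode a pixel-space box as (row, col, x, y, w, h).

    x and y are offsets of the center inside its cell, in [0, 1];
    w and h are fractions of the full image size (illustrative convention).
    """
    row, col = responsible_cell(x_center, y_center, img_w, img_h, S)
    x = x_center / img_w * S - col
    y = y_center / img_h * S - row
    return row, col, x, y, w / img_w, h / img_h
```

For a 640×480 image, a box centered at (320, 240) lands in the middle cell of a 7×7 grid, with its center halfway across that cell.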
YOLOv1 was introduced in 2016 by Joseph Redmon et al. in a research paper called “You Only Look Once: Unified, Real-Time Object Detection”. This initial paper revolutionized the industry and completely changed real-time object detection methods.
By looking at the image just once, it detects objects at 45 fps (frames per second). Another YOLOv1 variant, Fast YOLO, reached 155 fps with slightly lower accuracy.
It used the Darknet framework, with a network pretrained on the ImageNet 1000-class dataset. But YOLOv1 has several limitations:
- it can’t detect objects properly when they are small
- it also generalizes poorly when the image has different dimensions
YOLOv2, the second version, was released in 2017 by Joseph Redmon and Ali Farhadi. This time Redmon collaborated with Farhadi on major fixes and accuracy improvements. The paper they published was “YOLO9000: Better, Faster, Stronger”, and the second version is also known as YOLO9000. The major competitors of YOLO9000 were Faster R-CNN, an object detection algorithm that uses a Region Proposal Network, and the Single Shot MultiBox Detector (SSD), both of which identify multiple objects in an image.
Some of the features of YOLOv2 are:
- Batch normalization: added to the convolutional layers, it normalizes each layer's inputs over the mini-batch, which improves convergence.
- Higher-resolution input: the input size was increased from 224×224 to 448×448.
- Anchor boxes.
- Multi-Scale training.
- Darknet-19 architecture with 19 convolutional layers and 5 max-pooling layers.
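As an illustration of the anchor box item in the list above, the sketch below decodes one raw prediction into a box using the formulation from the YOLO9000 paper: the center is a sigmoid offset from the cell's top-left corner, and the size scales an anchor (prior) box exponentially. The function and argument names are assumptions made for this example, and all values are in grid-cell units.

```python
import math


def sigmoid(t):
    """Logistic function, squashes t into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-t))


def decode_box(raw, cell, anchor):
    """Decode raw outputs (tx, ty, tw, th) relative to a cell and an anchor.

    cell   -- (cx, cy), top-left corner of the grid cell
    anchor -- (pw, ph), width and height of the anchor (prior) box
    Returns (bx, by, bw, bh): box center and size in grid-cell units.
    """
    tx, ty, tw, th = raw
    cx, cy = cell
    pw, ph = anchor
    bx = sigmoid(tx) + cx   # center stays inside the predicting cell
    by = sigmoid(ty) + cy
    bw = pw * math.exp(tw)  # anchor is scaled, never made negative
    bh = ph * math.exp(th)
    return bx, by, bw, bh
```

With all raw outputs at zero, the decoded box sits at the center of its cell and has exactly the anchor's size, which is why well-chosen anchors give the network a useful starting point.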
YOLOv2 performance on MS COCO dataset
A year later, on March 25, Joseph Redmon and Ali Farhadi came up with another version of YOLO and a research paper called “YOLOv3: An Incremental Improvement”.
At 320×320, YOLOv3 runs in 22 ms at 28.2 mAP, as shown in the above video. That is as accurate as SSD but three times faster, and nearly four times faster than RetinaNet at similar AP50.
YOLOv3 followed the methodology of the previous version, YOLOv2 (YOLO9000). In this version, Redmon used the Darknet-53 architecture, a significantly improved backbone with 53 convolutional layers.
Some of the new, improved features in YOLOv3 were:
- Class predictions
- Feature Pyramid Networks (FPN)
- Darknet-53 architecture
Because Redmon had stopped working on computer vision, a new team of three developers released YOLOv4: Alexey Bochkovskiy, Chien-Yao Wang, and Hong-Yuan Mark Liao. Alexey is the one who earlier developed the Windows version of YOLO.
Some of the new features of YOLOv4 are:
- Anyone with a 1080 Ti or 2080 Ti GPU can run the YOLOv4 model easily.
- YOLOv4 includes CBN (Cross-Iteration Batch Normalization) and PAN (Path Aggregation Network) methods.
- Cross-Stage-Partial (CSP) connections, used in a new backbone to enhance the CNN (convolutional neural network).
- Self-Adversarial Training (SAT): a new data augmentation technique.
- DropBlock regularization.
Shortly after the release of YOLOv4, on 27 May 2020, YOLOv5 was released by Glenn Jocher (founder and CEO of Ultralytics). It was publicly released on GitHub (https://github.com/ultralytics/yolov5). Glenn took a PyTorch-based approach: YOLOv5 is written in the PyTorch framework.
It is the newest, state-of-the-art version of the YOLO object detection series. With continuous effort from 58 open-source contributors, YOLOv5 has set the benchmark for object detection models very high; as shown below, it already beats EfficientDet and the previous YOLO versions.
No official paper has been released yet, and there is ongoing controversy about its name. Now let’s look at a coding example that was published alongside the code on GitHub for learning purposes.
PyTorch inference is so fast that, before YOLOv5 was released, many AI practitioners often converted YOLOv3 and YOLOv4 weights into Ultralytics PyTorch weights.
We are going to follow a starter tutorial on YOLOv5 by Ultralytics and detect some objects in a given image. Remember to change your runtime to GPU inside Colab. The full notebook is available here.
- First, clone the YOLOv5 repo from GitHub into the Google Colab environment using the command below.
!git clone https://github.com/ultralytics/yolov5 # clone repo
- Install the dependencies using the pip command
%cd yolov5
%pip install -qr requirements.txt # install dependencies
- Import modules such as torch and the IPython display helpers to show the output image inside the notebook.
import torch
from IPython.display import Image, clear_output
- Download this custom image from here for testing
- Test using the command below. detect.py runs inference on a variety of sources and automatically downloads the latest model from here.
!python detect.py --weights yolov5s.pt --img 640 --conf 0.25 --source data/images/
Image(filename='runs/detect/exp5/1.jpg', width=600)
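The output folder in the path above (exp5) reflects how many times detect.py had already been run, so it will likely differ in your session. A small helper along the following lines could locate the most recent run folder instead of hard-coding the name. The function name latest_run_dir is hypothetical, and the sketch assumes results are written to sibling folders named exp, exp2, exp3, and so on under runs/detect.

```python
from pathlib import Path


def latest_run_dir(root="runs/detect"):
    """Return the most recently modified exp* folder under root, or None."""
    root = Path(root)
    if not root.is_dir():
        return None
    runs = [p for p in root.glob("exp*") if p.is_dir()]
    if not runs:
        return None
    # The newest folder by modification time is taken to be the latest run.
    return max(runs, key=lambda p: p.stat().st_mtime)
```

In the notebook, the returned path can be joined with the image name (1.jpg in this tutorial) and passed to Image as a string.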
You can also load the YOLOv5 model through PyTorch Hub.
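A minimal sketch of the PyTorch Hub route is shown below. It assumes the ultralytics/yolov5 hub entry point and the yolov5s model name used earlier in this tutorial; the helper names load_yolov5, confident_rows, and detect, and the image_path argument, are made up for this example. Loading the model needs network access the first time, so nothing here runs until detect is called.

```python
def load_yolov5(name="yolov5s"):
    """Load a pretrained YOLOv5 model from PyTorch Hub (downloads on first use)."""
    import torch  # imported here so the helper below stays usable without torch
    return torch.hub.load("ultralytics/yolov5", name, pretrained=True)


def confident_rows(rows, min_conf=0.25):
    """Keep detections [x1, y1, x2, y2, conf, cls] with conf >= min_conf."""
    return [row for row in rows if row[4] >= min_conf]


def detect(image_path, min_conf=0.25):
    """Run inference on one image and return (label, confidence) pairs."""
    model = load_yolov5()
    results = model(image_path)      # accepts a file path, URL, PIL image or array
    rows = results.xyxy[0].tolist()  # one row per detection in the first image
    return [
        (model.names[int(row[5])], row[4])
        for row in confident_rows(rows, min_conf)
    ]
```

Calling detect with the path of any test image, for example one from the data/images/ folder of the cloned repo, should return the detected class names with their confidence scores.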
We have gone through the history of YOLO object detection models and worked through a simple tutorial to try out this architecture. It is pretty awesome and fast, and there are many other tutorials on the internet that go into the depth of YOLOv5. If you want to explore YOLOv5 further, here are some tutorials you can refer to: