An Accenture Analyst’s Affair with the Beloved Vision AI YOLOv5

YOLO models are the most important algorithms in terms of the ratio between mean average precision and inference time.


Published on January 17, 2023

by Poulomi Chatterjee


For Alessandro Mondin, the shift towards AI/ML research hasn’t exactly been seamless. Mondin started out as a market analyst and later began participating in hackathons of his own accord. He currently works as a digital transformation analyst with Accenture Song (previously known as Accenture Interactive), and he has come far while continuing to work on AI/ML projects outside of his professional role.

Analytics India Magazine spoke to Mondin about his descriptive work with the YOLOv5 model architecture, the future of computer vision and what motivates his independent research.

AIM: We wanted to understand more about your article explaining YOLOv5. Can you tell us how long it took you and what went into it?

Alessandro: Since I’m not a professional researcher in the industry, this is a side project I have been working on to build a portfolio that can eventually help me. I decided to work with YOLOv5 because a friend of mine working in the industry told me that everyone uses it but there wasn’t a lot of explanatory work around it. Most of the existing material just describes how to use YOLOv5 without explaining how the model architecture works.

It took me a long time, especially because I had no expertise with the YOLO architectures. I had to understand them from scratch, and they’re quite tricky. Overall, it took me more than four months. Building the architecture itself was quite fast and took me between three weeks and a month. But the training pipeline is the really tricky and time-consuming part. The loss functions and the data loader are the parts that take really, really long, especially if you’re doing them from scratch. The model takes up about 20% of the time.

AIM: What is the importance of the YOLO series in computer vision?

Alessandro: I was trying to deepen my knowledge of object detection, and YOLO models are the most important algorithms in terms of the ratio between mean average precision and inference time. They are very accurate and the inference speed is incredible. Additionally, you can run them on a CPU. They are single-shot detectors and strike a great balance between average precision and speed.

For YOLOv5, there are no articles that dive as deep into the details as mine. I tried to look around and I found one article, but even that lacked sufficient examples. Most articles will tell you how to use it but not how it works. This kind of explanation is extremely useful for other researchers and engineers who want to understand, at a granular level, how YOLOv5 performs inference.

AIM: Can you describe what your current role entails?

Alessandro: I’m currently working as a digital transformation analyst and consultant with Accenture Song, and I’m trying to build a strong resume because I have a deep interest in AI/ML. I come from a non-technical background, so HR teams obviously end up choosing engineers, who typically come from STEM. So, I am trying to prove that my interest in the area outweighs my atypical background.

AIM: How has the transition from a market analyst to data science and now AI/ML research been for you? What were the challenges you faced?

Alessandro: I’m not going to lie, it is tough. You really need to put in the work. But if you’re really passionate and if you’re willing to put in a lot of hours, hundreds of hours, you end up learning a lot. And that’s the most important thing in the beginning.

As an amateur, my knowledge and my points of view are not those of professionals who have worked in the industry for ten years. Everything I have picked up has been on my own.

AIM: What are some of the biggest trends in computer vision that you are looking forward to right now?

Alessandro: The problem with working on side projects is that you cannot really dive into the most exciting topics because you lack GPUs. I know that the tech industry is investing heavily in action recognition right now. There’s a lot of work with vision transformers. With regard to papers, I’m really interested in the touch points between computer vision and natural language processing, such as image captioning. I want to work more in these areas now.

AIM: While you were implementing YOLOv5 from scratch, what were some of the biggest lessons that you learned from that project?

Alessandro: First off, you have to understand that you are not doing anything new. You are not implementing new algorithms or discovering something; you are learning about something that already exists, and this was the main reason for me to do it. I asked my friend what the best algorithm would be to explain so that it could be helpful to someone, and YOLO was the one.

It could be useful for anyone who already uses a YOLO-based architecture and would like to understand how the model performs detections, because there are plenty of ML engineers and computer vision engineers who just use the repository. So, essentially, the main lesson is that you have to be very interested in the subject matter itself without looking for any kind of prize in the end. Your goal has to be to learn.
