“Machine learning is to businesses today what petrol engines were to horses back then.”
For this week’s ML practitioner’s series, Analytics India Magazine got in touch with Eugene Khvedchenya from Ukraine. Eugene is a Kaggle master and is currently ranked 104th on the global leaderboard. He has more than 10 years of experience in developing computer vision applications, and in this interview, he shared a few valuable insights from his decade-long journey.
Eugene started programming at a young age, ever since he watched his father assemble an Orion 128, a popular DIY PC in the early 90s. “I was fascinated by painting lines and circles on the CRT display with code and made programming my hobby,” said Eugene. “At that time, we didn’t have the internet, and the only source of documentation was a photocopied hard copy of the Turbo C documentation book. Since then, I have had the zeal to dive deep into any problem.”
Eugene has a Master’s degree in Computer Engineering, and most of his programming expertise is self-taught. “I learned C++, C#, Python, Java and the .NET stack by following the ‘learning by doing’ principle,” said Eugene.
Eugene’s introduction to machine learning happened at a university-level Olympiad in Khmelnytskyi, Ukraine, where he had to build an image classification algorithm that classified vegetables based on an image’s statistical properties (mean colour, deviation, shape, etc.).
Eugene topped the Olympiad without realising that he was actually solving a proper machine learning problem. Later, using the same strategy of learning from first principles, Eugene mastered the fundamentals of machine learning. He solved real-world problems and acquired a solid understanding of computer vision.
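The Olympiad task described above, classifying vegetables from an image’s statistical properties, can be illustrated with a minimal sketch. It assumes each image is already a flat list of RGB pixels, uses only the mean colour per channel and the standard deviation of brightness as features (shape features are left out), and swaps in a simple nearest-centroid classifier. The function names and toy data are hypothetical; this is not Eugene’s actual Olympiad solution.

```python
import math
from statistics import mean, pstdev

# Hypothetical representation: an image is a flat list of (R, G, B) pixels.
Pixel = tuple[int, int, int]


def extract_features(pixels: list[Pixel]) -> list[float]:
    """Statistical features: mean of each colour channel plus brightness deviation."""
    reds = [r for r, _, _ in pixels]
    greens = [g for _, g, _ in pixels]
    blues = [b for _, _, b in pixels]
    brightness = [(r + g + b) / 3 for r, g, b in pixels]
    return [mean(reds), mean(greens), mean(blues), pstdev(brightness)]


def fit_centroids(samples: dict[str, list[list[Pixel]]]) -> dict[str, list[float]]:
    """Average the feature vectors of each class's training images."""
    centroids = {}
    for label, images in samples.items():
        features = [extract_features(img) for img in images]
        # zip(*features) groups the i-th feature of every image together.
        centroids[label] = [mean(column) for column in zip(*features)]
    return centroids


def predict(centroids: dict[str, list[float]], pixels: list[Pixel]) -> str:
    """Return the label whose centroid is closest in Euclidean distance."""
    features = extract_features(pixels)
    return min(centroids, key=lambda label: math.dist(features, centroids[label]))


if __name__ == "__main__":
    # Toy training data: reddish "tomatoes" and greenish "cucumbers".
    training = {
        "tomato": [[(200, 30, 30)] * 4, [(210, 40, 35)] * 4],
        "cucumber": [[(40, 160, 50)] * 4, [(50, 170, 60)] * 4],
    }
    model = fit_centroids(training)
    print(predict(model, [(190, 35, 40)] * 4))
```

Run as a script, this prints `tomato`, because the query’s mean colour lies far closer to the reddish centroid. Real images would need richer features (texture, contour-based shape descriptors), but the pipeline of hand-crafted statistics followed by a simple classifier is the idea the Olympiad task describes.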
He started his career as a Research Engineer and was part of a team that developed a monocular SLAM (Simultaneous Localisation and Mapping) system for the automotive industry. Back then, there were few scientific publications on SLAM, so he and his team had to develop many photogrammetry algorithms from scratch in C++. “I remember my shock when I saw the papers and math formulas that we were working with for the first time. They were on a whole new level compared to what we had studied at university,” said Eugene about his first job.
“But even scientific papers can be split into smaller pieces and digested one by one. So eventually, I got used to reading and understanding scientific papers.” Today, Eugene has more than 10 years of experience in computer vision and provides consulting services to companies that want to apply ML/AI in their businesses. He is also a Kaggle competitions master and part of the core team of the popular Albumentations library.
About Kaggle Days
“To me, grinding the Kaggle rating is not the goal. I am curious by nature and love challenges.”
Eugene currently ranks 104th in the global Kaggle rankings, with four gold, two silver and three bronze medals from only 19 competitions. “I think two factors played a role here: I am curious by nature, and I love challenges. Kaggle offers something that I cannot resist: new challenges, new domains,” said Eugene.
Thanks to his experience with computer vision, Eugene gets a head start in most of the competitions. Moreover, he uses these challenges as an opportunity to try new deep learning model architectures and other advancements.
Over the years, Kaggle has become one of the most lucrative platforms for data scientists across the world. As its popularity has grown, so has the complexity of the contests. “It has become harder and harder to compete on Kaggle,” admits Eugene. “The overall strength of the community keeps growing. Frameworks and libraries are becoming more and more powerful. Three years ago, one could win an image segmentation challenge with a single U-Net model. These days, you have to be very creative with your solution and constantly study new research papers to compete in the silver or gold zone.”
During the past two years of active Kaggling, Eugene has built boilerplate templates for image classification, segmentation and object detection that he reuses again and again. Whenever he enters a new competition, he relies on these templates, starting with the code for reading data. Eugene strongly encourages everyone to build similar templates of their own. “But here’s the catch: you don’t want to take ‘starter kits’ from public Kaggle Kernels. Only when you write the code yourself, make mistakes, fix them and cover it with tests do you grow as a professional. You learn only from your own mistakes. Why would you want someone else to make the mistakes in your pipeline?” said Eugene.
“I love the challenge beneath every new challenge. Some Kaggle challenges require a deep dive into the domain.”
What fascinates Eugene about Kaggle competitions is the challenge beneath every new challenge. Some Kaggle challenges even require participants to dive deep into a specific domain. The knowledge gained from these challenges is invaluable, which makes them a win-win for participants. Participants can also use the Kaggle platform to get in touch with domain experts who are more than willing to share their expertise with the community.
When it comes to libraries and frameworks, Eugene uses PyTorch and Catalyst as his main workhorse frameworks for deep learning tasks. He has also tried other frameworks, such as TensorFlow, CNTK and Keras, but was won over by PyTorch and its API.
And, of course, there is Albumentations, one of the most popular CV libraries for image augmentation, which provides a diverse set of transformations. Eugene not only uses it extensively but is also part of the core team that wrote the library.
Eugene has also developed the pytorch-toolbelt library, which makes it convenient to reuse code from previous Kaggle competitions. It is an open-source Python package with many utility functions, loss functions, models and callbacks that he reuses in almost every competition.
“To me, it is the understanding that no machine learning model works in a vacuum. I believe that ‘engineer’ is the central part of the ‘machine learning engineer’ title. So we have to think not only about the nuances of ‘model.train()’ but also about how we write, test, deploy and monitor our model. What is the performance? What are the hardware limitations? How do we detect concept drift and anomalies in the production environment? It’s tempting to say that this is the duty of DevOps, a data analyst or someone else. But the truth is, if you don’t think outside your scope, your solution could be sub-optimal,” advised Eugene.
For newcomers to the ML domain, Eugene recommends the following books:
- Multiple View Geometry in Computer Vision by Richard Hartley and Andrew Zisserman
- Pattern Recognition and Machine Learning by Christopher M. Bishop
According to Eugene, these books are rich in information, concise and dense, and every chapter is demanding. “It was not easy reading, but you have to get comfortable with the feeling of ‘not understanding’. It is normal if you don’t grasp the idea of a paper crystal-clear after reading it. You may have to re-read a paper tens of times before all the puzzle pieces fall into place,” said Eugene.
On The Future Of AI
Talking about the hype around AI and ML, Eugene said that the demand for machine learning would not decrease; on the contrary, ML will end up touching many lives. Scepticism and denial existed even in the early days of petrol engines, and they are still here today; only the domain has changed. The amount of data generated is growing exponentially, and without ML assistance, we can’t even sift through related posts on social media. Going forward, Eugene thinks that we shall soon witness the dawn of autonomous driving and flying, along with tremendous developments in the healthcare industry.
“What we see now is not hype but rather the practical realisation of theoretical studies that are 20 to 30 years old. No one could have dreamt of it just ten years ago. Thanks to GPU and TPU computing power, training models with billions of parameters in a reasonable amount of time has become possible,” said Eugene.
As ML becomes more prevalent, Eugene believes that its future will be more regulated. “There will probably be a profession of ‘AI-advocate’ or ‘ML-lawyer’, someone who will ensure that ML-powered systems are unbiased and can explain their predictions,” speculated Eugene.
However, Eugene also warns against getting too excited about AI itself. In a nutshell, he said, it’s just a tool to help businesses make more money. To him, machine learning will transform businesses in the same way petrol engines transformed transportation.
On a concluding note, he has one piece of advice for aspirants: demand the most of yourself, and try to become a better specialist and a better person every day. Study. Write code. Participate in Kaggle. But keep a healthy work-life balance. Eugene wrapped up the interview by quoting MIT researcher Lex Fridman: “Do things you don’t want to do in the first order. Build that mental muscle.”