“I did not have lines on my resume that showed my ML expertise. I did not have data science industry experience or relevant papers. Recruiters ignored my resume.”
For this week’s ML practitioner’s series, Analytics India Magazine got in touch with Vladimir Iglovikov, a former Spetsnaz serviceman, theoretical physicist and Kaggle Grandmaster. In this exclusive interview, he shares valuable insights from his journey in the world of data science.
On Initial Struggle
After a brief stint in the Russian special forces, Iglovikov enrolled in the Master’s programme in theoretical physics at St. Petersburg State University, whose distinguished alumni include President Vladimir Putin.
In September 2010, Iglovikov moved to California to pursue a PhD in physics at UC Davis, and on completing the degree, he moved to Silicon Valley in the summer of 2015. Currently, Iglovikov works as a Senior Software Engineer at Lyft, a ride-sharing company that operates in the United States and Canada. His work centres on building robust machine learning models for autonomous vehicles at Lyft Level 5.
Post PhD, Iglovikov had two options: pursue a postdoc, or enter the industry as a software engineer. He picked neither. His career took a new turn when a friend introduced him to the world of data science.
“I attended a lecture where the presenter talked about Data Science as the 4th paradigm of scientific discovery. University taught me theoretical science and the method of numerical simulations. Mastering a new paradigm felt like seeing the world in more dimensions, and it was the turning point,” said Iglovikov.
Talking about the challenges he faced in the initial days, he said that he had neither a formal ML education nor a mentor or friends interested in the field. He was self-taught, and learning became easier after he discovered ods.ai.
But theory takes one only so far; one needs to show expertise to land an ML job. “I did not have lines on my resume that showed my ML expertise. I did not have data science industry experience or relevant papers. Recruiters ignored my resume,” remembered Iglovikov. Soon after he landed his first job, recruiters started swarming him on LinkedIn with new opportunities.
“If you have high-quality code, it does not mean that your company will do well, but most likely, if it does not, technical debt will kill it.”
Highlighting the gap between academic practices and that of industry, Iglovikov spoke about the inadequacy of his own coding skills. “I was writing code in graduate school, and it worked, I published papers, wrote a thesis, and graduated. But the code was inefficient, hard to read, hard to maintain,” he said. “In industry, you need machine learning and strong software engineering skills. I did not have them at the beginning of my journey.”
Building a competitive, scalable tech company requires employees who write high-quality code. High-quality code does not guarantee that a company will do well, but at least the technical debt is taken care of.
Like data science, his foray into Kaggle was also accidental. One of the lectures in an early Coursera course mentioned Kaggle. He immediately joined a competition, and within a couple of years, he made it to the top 20 on the global leaderboard.
On Kaggle Journey
“The tradition to describe your winning solution at the Kaggle forum in detail was not enforced; it was born within a community.”
Iglovikov looks at Kaggle as an excellent place to develop machine learning muscles. He likened the top Kagglers to powerlifters. Furthermore, drawing similarities between sports and Kaggle, he underlined the importance of hard work, mentorship and teammates.
“The world of machine learning is competitive. The number of participants is huge; the number of top places is limited. You compete with those who are skilled, have more hardware, have studied the topic of the competition for many years at university, or make a living from it. To be better than others, you need to change the way you think, the way you study, how you write your code, and how you deal with failures. Your final placing is directly related to the number of ideas that you check during the competition. The main difference between a top Kaggler and a newcomer is experience,” said Iglovikov.
Though he won his first gold medal on Kaggle going solo, he warns that it might not be the most effective way. The earlier one finds passionate ML people, the better. The standard practice, recommends Iglovikov, is to look at the leaderboard for people with a similar standing. That said, he also warned of the dangers of teaming up with the wrong person.
“If you form a team with a person who has not made a submission themselves, most likely they have overestimated their excitement and skill set. They will stop being engaged with the problem in the middle of the competition. And there is no way to kick such a person off the team once it is formed,” warned Iglovikov.
Talking about his fondness for Kaggle, Iglovikov pointed out the scale at which the platform operates. Typical ML competitions barely have ten solid teams, whereas Kaggle draws a huge crowd for every competition.
Added to this are the unlimited learning resources that the platform offers. “People share code in kernels, and information about their approaches on the forum. The tradition to describe your winning solution at the Kaggle forum in detail was not enforced; it was born within a community,” said Iglovikov. For instance, last year he helped organise the Lyft 3D object detection challenge on Kaggle, which drew more than 500 teams from around the world.
“To write good code, you first need to write 100500 lines of bad code. Machine learning is similar.”
When asked about his machine learning tool stack, Iglovikov said that his main deep learning framework is PyTorch. He uses Catalyst and PyTorch Lightning for training. For image augmentations, he uses Albumentations, a popular library to which he is also a key contributor.
Iglovikov’s hardware setup looks as follows:
- GPU: 4x2080Ti
- AMD Ryzen Threadripper 3970x
- 128GB RAM
- 20+ TB of various SSD and HDD drives
This setup is good enough for prototyping, and when he needs something beefier, he prefers cloud platforms such as AWS or GCP. “Sometimes, I try smaller hosting providers; for example, I had a positive experience with Hostkey. Recently, I had a conversation with the CEO of Q Blocks. They have an initiative to give free compute to active Kagglers. Feel free to reach out to them,” said Iglovikov.
Tools become irrelevant if a machine learning engineer is out of touch with proper ML pipeline practices. Iglovikov touched upon this topic in detail. “The most important thing that you need to do in the beginning is to build an end-to-end pipeline that maps the data into a cross-validation score,” he said. “It can be hackish, the code could be of bad quality, but such a pipeline will unveil issues with the data, hardware, or models that you would never guess.”
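Such an end-to-end pipeline, however hackish, can be only a few lines. A minimal sketch using scikit-learn, with a synthetic dataset standing in for the real data (model and metric chosen purely for illustration):

```python
# Minimal end-to-end pipeline: data in, cross-validation score out.
from sklearn.datasets import make_classification
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Stand-in for loading and preparing the real competition data.
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# Wrap preprocessing and the model together so cross-validation
# exercises the whole chain, not just the final estimator.
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

scores = cross_val_score(pipeline, X, y, cv=5, scoring="accuracy")
print(f"CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Running even a crude version of this early surfaces exactly the data, hardware, and modelling issues Iglovikov describes, and every later idea can be checked against the same score.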
(Image source: The Lean Startup by Eric Ries)
Once the pipeline that maps the data to cross-validation score and the leaderboard score is ready, the next step is to tinker with new ideas. “You can get these ideas from the literature, Kaggle forum, your rich imagination, or any other source. It was fine in the previous step to have a low-quality code. At this step, you will fix it. It is the time of massive refactoring. Your code should become more modular with every idea you implement,” advised Iglovikov.
“To be better than others, you need to change the way you think, the way you study, how you write your code, and how you deal with failures.”
“It is essential to be a good programmer. Hence, I would recommend focusing on software engineering skills. They will not go out of fashion for many decades. The better your code is, the more productive you are. The best way is to join a company with high code standards and learn from your colleagues. But if you are still in academia, you may follow the advice from my article Nine Simple Steps For A Better-Looking Python Code. It could be an excellent first step.”
On The Future Of ML
Five years ago, Iglovikov continued, it was all about research advancements. It was a phase of active exploration, the era of hype. Companies were interested in researchers. Today, the hype is fading away; machine learning as a field is becoming mature.
“People who are not used to ML do not understand that it is not deterministic but probabilistic. In software engineering, we implement A, and it will be able to do B. It is a commitment, and partner teams can plan their actions based on this information. In machine learning, we will implement A, and it may work with accuracy B, but we are not 100% sure. It may not work at all. We will try approaches C and D; they may work,” he explained.
The question is not what ML can do, asserted Iglovikov, but how to use it to make an impact. Companies are now more interested in machine learning engineers than researchers, because engineers with strong software engineering skills are the people who build solutions that bring value to businesses.
“I believe the most widely used algorithm in 10 years will be the same as today, and the same as 10 years ago. And this algorithm is logistic regression,” quipped Iglovikov. That said, he also hopes that reinforcement learning will find industrial use soon.
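The algorithm he names is also simple enough to write from scratch: a sigmoid over a linear model, fitted by gradient descent on the log loss. A self-contained sketch on toy data (all parameters illustrative):

```python
import numpy as np

def sigmoid(z):
    # Map a linear score to a probability in (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=500):
    # Plain gradient descent on the mean log loss.
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = sigmoid(X @ w + b)           # predicted probabilities
        grad_w = X.T @ (p - y) / len(y)  # gradient w.r.t. weights
        grad_b = np.mean(p - y)          # gradient w.r.t. bias
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Toy linearly separable data: label is 1 when x0 + x1 > 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

w, b = fit_logistic(X, y)
preds = (sigmoid(X @ w + b) > 0.5).astype(float)
print("train accuracy:", (preds == y).mean())
```

In practice one would reach for a library implementation, but the smallness of this model is exactly why it remains the industry workhorse: it is fast, interpretable, and hard to get wrong.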
Reiterating the need for communication, Iglovikov urged beginners to write blog posts explaining whatever they have learnt. He also listed a few popular reads:
- Clean Code
- Clean Coder
- The Pragmatic Programmer
- Refactoring: Improving the Design of Existing Code
- Designing Data-Intensive Applications
Apart from these, the only book about machine learning he has ever read is Deep Learning by Ian Goodfellow; he prefers reading blog posts and papers about ML instead. On a concluding note, he recommended Mastery by Robert Greene, which traces the similar paths to the top taken by the likes of Leonardo da Vinci and Albert Einstein.