“Whenever you compete, you have to accept simple rules – someone wins, someone loses, and usually the winner takes it all.”
For this week’s ML practitioner’s series, Analytics India Magazine got in touch with Oleg Yaroshevskiy from Ukraine. Oleg is currently ranked 24th on the Kaggle leaderboard. In this interview, he shares his experiences from his journey to the top 20 in one of the toughest data science competitions in the world.
On His Early Days
Oleg majored in maths and statistics at the Cybernetics Faculty of Taras Shevchenko National University of Kyiv, which was co-founded by Victor Glushkov, one of the cybernetics pioneers who played a key role in the advancement of theoretical computer science, including artificial intelligence.
Oleg was formally introduced to machine learning (ML) during his university days, when he studied neural networks and took Andrew Ng’s popular course on Coursera back in 2013.
But what really propelled him into the world of AI was the famous article, “The Unreasonable Effectiveness of Recurrent Neural Networks,” by Andrej Karpathy, which made him switch from software engineering to machine learning. For his first hands-on experience with deep learning algorithms, he trained a recurrent language model for two days on his laptop to generate some fancy poems!
So far in his career, Oleg has mostly worked on classic NLP problems, including speech processing and machine translation. In his current job at a stealth ML company, he and his team solve various problems using the most recent approaches.
“I think almost every ML Engineer passes this “jedi path” of leaky validation, exploding gradients, GPU OOM (out-of-memory), inconsistent experiments and so on.”
Talking about the challenges he faced while switching to ML, Oleg said that there wasn’t a well-formed community or knowledge base back when he started. However, he also admits that he was aware of TensorFlow at the time, so he says he cannot complain much about the challenges.
Oleg says that the following courses, books and MOOCs helped him transition to an ML career:
- Machine Learning specialisation from Yandex and Udacity Deep Learning, to learn Python from scratch.
- Deep Learning book by Ian Goodfellow
- Pattern Recognition and Machine Learning by Bishop
- NLTK book for NLP.
- Stanford’s “Computer Vision” (CS231n) and “Natural language processing with deep learning” (CS224n)
- mlcourse.ai or Kaggle data science courses.
Apart from these, Oleg suggests that aspirants make good use of the free YouTube lectures, which are plentiful these days.
The reputation that Kaggle has garnered over the past few years has been unprecedented. It won’t be an exaggeration to say that Kaggle produced a new wave of data scientists. But how significant is Kaggle when it comes to landing a job, given that aspirants have to spend an incredible amount of time to compete at a high level?
“Yes, for my recent career, Kaggle has played a significant role.”
Touching upon the same, Oleg says that solving real industry problems takes months and years, and many of those things can’t be learnt from Kaggle. However, even this limited experience equips one with an arsenal of ML tools and intuition that is priceless.
On His Kaggle Journey
“What I like the most on Kaggle is its competitive spirit.”
Oleg’s yearning for competitions was planted back in his school days, when he used to participate in science olympiads. The problem-solving and competitive nature of olympiads appealed to him tremendously and followed him into the later stages of his career.
His tryst with ML contests began a couple of years ago when one of his friends invited him to join a hackathon, which was similar in format to Kaggle, where he took third place. He followed this up with Kaggle’s TalkingData competition, and within a month’s time, he already had his first Kaggle gold. Today, Oleg is ranked 24th on the global leaderboard and has already featured in the top 20.
When asked about his successful ascent to the top, Oleg modestly admitted that he still doesn’t consider himself to be at the top and he explained how his contemporaries from companies like RAPIDS AI or H2O.ai have been fetching gold in almost every competition. That said, Oleg also understands the fact that with every passing day, it gets harder to crack all ML problems as the knowledge base grows exponentially.
Oleg advises newcomers to accept that Kaggle is a competitive platform: there will be wins and losses, and usually the winner takes it all. And if one is determined to win, they need to accept that it can be extremely stressful and can involve ambiguities like the recent Deepfake competition fiasco, where the initial winners went home empty-handed.
On Approaching A Problem
While approaching an ML problem, Oleg says that he always tries to build a working prototype as early as possible in order to leave some time for error analysis. Then, once he has a working estimator and its outputs, he proceeds to exploratory data analysis (EDA) with confidence. The whole procedure involves understanding the input data, the out-of-fold results and the target metric, and identifying the real problem that needs to be solved.
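The prototype-first workflow described above can be illustrated with a small sketch. It is not Oleg’s own code: the synthetic dataset, the logistic regression baseline and the five-fold split are all assumptions chosen for brevity. The idea is simply to get out-of-fold predictions from a quick model and then look at the rows it gets most confidently wrong.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_predict

# Synthetic stand-in data (an assumption; the interview names no dataset).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# A quick baseline "prototype" model (also an assumption).
model = LogisticRegression(max_iter=1000)

# Out-of-fold probabilities: every row is scored by a model that never saw it.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
oof_proba = cross_val_predict(model, X, y, cv=cv, method="predict_proba")[:, 1]
oof_pred = (oof_proba >= 0.5).astype(int)

# Error analysis: collect misclassified rows, most confident mistakes first.
wrong_idx = np.flatnonzero(oof_pred != y)
confidence = np.abs(oof_proba[wrong_idx] - 0.5)
worst_first = wrong_idx[np.argsort(-confidence)]

oof_accuracy = float((oof_pred == y).mean())
print(f"OOF accuracy: {oof_accuracy:.3f}, errors: {len(wrong_idx)}")
print("Most confident mistakes (row indices):", worst_first[:10])
```

Inspecting the rows in `worst_first` by hand is where the EDA would start in this sketch: confident mistakes often point to label noise or a quirk in the data rather than a weakness of the model.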
“Dealing and “understanding” with data at real business tasks takes much more time than modeling itself.”
For instance, during one of his competitions on Tweet sentiment extraction, Oleg’s error analysis showed that the model failed to detect label noise, but the noise itself was deterministic to some extent. As a result, simple post-processing or pre-processing boosted the score significantly. Oleg firmly believes that building an architecture or tuning hyperparameters is just not enough.
“Many deep learning competitions look like 1) pre-processing/pre-training 2) modeling 3) post-processing 4) semi-supervised learning/pseudo labeling etc. Most people concentrate on modeling, and they spend so much time to get additional 0.0001 running everything from their GPUs. But I’d recommend you to concentrate on other items, there are less competitors on those tracks,” explains Oleg.
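Of the less crowded tracks Oleg lists, pseudo-labeling is the easiest to show in a few lines. The sketch below is only an illustration under stated assumptions: the synthetic data, the logistic regression model and the 0.9 confidence threshold (`THRESHOLD`) are hypothetical choices, not details from the interview. A model trained on the labeled rows labels the unlabeled ones, and only its confident guesses are added back to the training set.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

THRESHOLD = 0.9  # hypothetical confidence cut-off for accepting a pseudo-label

# Synthetic data: 20% is treated as labeled, the rest as unlabeled (assumption).
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_lab, X_unlab, y_lab, _ = train_test_split(X, y, train_size=0.2, random_state=0)

# Step 1: fit a first model on the labeled rows only.
base = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)

# Step 2: predict on the unlabeled rows and keep only confident predictions.
proba = base.predict_proba(X_unlab)
confident = proba.max(axis=1) >= THRESHOLD
pseudo_y = base.classes_[proba.argmax(axis=1)][confident]

# Step 3: retrain on the labeled rows plus the confident pseudo-labeled rows.
X_aug = np.vstack([X_lab, X_unlab[confident]])
y_aug = np.concatenate([y_lab, pseudo_y])
final = LogisticRegression(max_iter=1000).fit(X_aug, y_aug)

print(f"Added {int(confident.sum())} pseudo-labeled rows to {len(y_lab)} labeled rows")
```

Whether the retrained model is actually better has to be checked on held-out labeled data; pseudo-labels can just as easily reinforce the first model’s mistakes.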
Here are some of the tools Oleg uses frequently:
- Language: Python
- Framework: PyTorch
- Libraries: NumPy, sklearn, scipy
- Computation: Google Cloud, AWS
- ML models: seq2seq models (both recurrent and convolutional) and self-attention transformers.
“I started with having only a laptop with integrated GPU. For toy problems I think Google Colab is okay, also 10$/month for Colab Pro might be a reasonable price for TPU access. But I recommend ones to build their own devbox once they already have income from ML and no need to buy the latest hardware, you can start with one or two used GPUs and some simple CPU-motherboard configuration. I estimate it around $2-2.2k. Pick a power adapter carefully!” advises Oleg.
“Start now. Install Anaconda or whatever. Copy-paste some Github code. Run it. Tomorrow you’ll learn why it works but start now.”
For Oleg, his secret sauce to success can be distilled down to working hard, teaming up with the right people, investing in coding skills, reading papers and most importantly perseverance; being patient with failures. “Not only Kaggle but research work means a lot of experiments, week after week, month after month, and usually with no positive outcome. Sometimes when I’m asked about my strengths, I reply that I found ten thousand architectures that didn’t work. And that’s not far from the truth,” says Oleg.
On Black Boxes And Gates’ Law
“If you ask me what will stand in 10 years, I won’t answer you. But, in 100 years, linear regression, for sure!”
When asked about the hype around machine learning, Oleg began by quoting Gates’ law, which states that we overestimate the impact of technology in the short term and underestimate the effect in the long run. Though machine learning has been widely adopted over the past couple of years, some skepticism still remains within the community regarding its resource-intensive nature. Addressing the same, Oleg explained how the tools open-sourced by big companies are a win-win for all.
“They [big companies] also made available many pre-trained models, that dramatically reduced time and cost for small agents. And right now community also contributes much building datasets, fine tuning and training models (ex: Hugging Face) or developing software (for ex: fast.ai) and so on. I think that’s win-win game,” says Oleg.
He is optimistic that with unsupervised learning, many business problems can be solved with fewer resources. Given the increased adoption of AI, Oleg underlines the importance of moving towards interpretable and explainable AI. One of the reasons is to extend the current limits of application, as we want AI systems to be reliable assistants and not “black boxes”.
That said, he also admits that he is not sure which technique will flourish in the next 10 years, but if pressed, he says he would bet on linear regression still being around 100 years from now.
“On a very high level, AI owes much of its connotation to mass culture and newsmakers (and that’s probably a nice way to sell your product nowadays) but people sometimes get disappointed when they first hear you are doing AI and then you “only” detect text sentiment. That doesn’t sound like HAL 9000 or J.A.R.V.I.S. But that makes our job a little mysterious, doesn’t it? And we are at the very beginning,” ponders Oleg.