In the developer series, Behind The Code, we reach out to developers from the community to gain insights into how their journeys in data science began, the tools and skills they use, and what's essential for their day-to-day work.
For this week’s column, Analytics India Magazine got in touch with Abhishek Thakur, the chief data scientist at boost.ai, to get a glimpse of his journey to becoming the world’s first Kaggle triple grandmaster, and much more.
How It All Began
Abhishek Thakur, the world’s first Kaggle triple grandmaster and chief data scientist at boost.ai, has become one of the most popular contributors in the ML community. However, his journey into this field has not been straightforward.
On being asked whether his foray into the world of algorithms was an accident, Abhishek spoke about how his first stint with algorithms came during his graduation days, in the form of image processing algorithms. This later laid the foundation for his future endeavours and also fetched him an internship at the University of Warwick to work on medical image processing.
After completing his Bachelor’s in Electronics Engineering from NIT Surat, Abhishek went on to pursue a master’s degree at the University of Bonn, Germany. His fascination with computers made him choose computer science for his master’s.
While doing his Master’s, he equipped himself with practical knowledge by working at Fraunhofer, where he implemented OCR algorithms on microcontrollers.
After graduating from the University of Bonn, he worked as a data scientist in Berlin, a stint that didn’t last long. His unquenched curiosity to learn more and do research introduced him to the sophisticated world of machine learning.
“I tried taking lectures on data mining and machine learning at the university but failed miserably. Most of my machine learning has been self-taught,” admits Abhishek, speaking about the early days of his journey into AI.
Self-learning requires a lot of dedication and practice. Abhishek would dedicate some time to his thesis and the rest to machine learning practice. To squeeze more out of a day, he would even spend sleepless nights at Fraunhofer. But it was totally worth it, insists Abhishek.
“I was more interested in the applied side after gaining the theoretical knowledge, but the lectures limited themselves to theory,” says Abhishek, talking about his craving for hands-on experience.
Unknowingly, Abhishek was already on the path which would later fetch him the Kaggle crown.
Making Of A Kaggle Grandmaster
A Kaggle triple grandmaster is one who has achieved grandmaster status in competitions, kernels and discussions on Kaggle. Being among the top 10 in a Kaggle competition is a decent achievement in itself. One can also contribute relevant kernels or participate in discussions. Topping all three is no mean feat, and few had even thought it possible until Abhishek Thakur showed how it is done.
“I did not use any books. Just internet, research papers, blogs and YouTube videos to understand the concepts and Kaggle to apply what I learned”
The much-needed nudge came from a friend, who told him about a promising platform that hosts machine learning competitions and goes by the name ‘Kaggle’.
His first competition on Kaggle was on facial recognition. The timing was just right for Abhishek: his hands-on machine learning practice was coming to fruition, and here was a Kaggle competition on image processing, a field that had coincidentally been part of his projects since his undergraduate days.
In that competition, participants were tasked with finding features like the angle between the eyes and lips to recognise emotions, along with other tasks.
However, Abhishek failed miserably at his first competition and ended up with a low rank.
“I couldn’t do any machine learning,” laments Abhishek remembering his first Kaggle attempt.
Instead of succumbing to the bitter aftertaste of failure, Abhishek skimmed through the winners’ solutions, read relevant papers and started implementing them on his own.
He developed a healthy routine of solving previous Kaggle competitions on his own, checking the winning solutions and getting to the bottom of the approaches with the help of Google. This went on for almost 10 months, during which he finished his thesis and also landed a job as a data scientist in Berlin.
“I first try to understand the problem and then build basic algorithms to solve that problem. This way, I build a ‘benchmark’ and then try to improve on the benchmark,” advised Abhishek when asked about how he would proceed with a problem.
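This benchmark-first workflow can be sketched in a few lines of scikit-learn. The dataset and models below are illustrative stand-ins, not Abhishek’s actual setup: a trivial majority-class model sets the benchmark, and any real model must beat it to count as progress.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Toy dataset standing in for a competition problem
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Step 1: a trivial "benchmark" model that always predicts the majority class
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
baseline_score = baseline.score(X_test, y_test)

# Step 2: a basic real model; anything beating the benchmark is progress
model = LogisticRegression(max_iter=5000).fit(X_train, y_train)
model_score = model.score(X_test, y_test)

print(f"benchmark: {baseline_score:.3f}, model: {model_score:.3f}")
```

The benchmark also acts as a sanity check: if a fancier model cannot beat the majority-class score, something is wrong with the features or the validation setup.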
His consistent efforts and discipline won him his first Kaggle gold 6 years ago. After 18 more golds and many other medals, he achieved a world rank of 3 and then went on to become the world’s first triple grandmaster earlier this year.
“A right proportion of hard work, dedication, persistence, never giving up attitude and luck are the most important ingredients that helped me,” adds Abhishek.
After 6 fruitful years in Germany, Abhishek decided to take a giant leap into the world of data science with boost.ai.
Boost.ai is a Norwegian company that has been developing AI-powered direct messaging since 2016.
He realised that at boost.ai people were building something really extraordinary and decided to move to Norway permanently, where he has been working as Chief Data Scientist since 2017.
As Chief Data Scientist, Abhishek builds the Natural Language Processing and Natural Language Understanding components and the deep learning models. He is currently tasked with improving the algorithms that provide answers in the bot and developing next-generation conversational AI platforms.
About The Current State Of NLP
“I think now is the right time for Natural Language Processing/Understanding (NLP) and that’s what is happening,” posits Abhishek when asked about the current state of NLP. He likens the current surge in NLP research and new benchmarks to what computer vision experienced half a decade ago.
The NLP community witnessed the rise and rise of BERT ever since its introduction last year. It has even beaten all the benchmarks on GLUE (General Language Understanding Evaluation).
“The benchmarks set by BERT and its contemporaries are great to solve almost all kinds of NLP problems,” opines Abhishek on the success of BERT and its variants.
However, he also warns about the resource-hungry nature of these algorithms and the challenges that follow them when deployed for production.
What Makes A Good Data Scientist
“For me, every new dataset or problem is an adventure,” says Abhishek. A time-series dataset, for instance, has to be processed differently from a regular tabular dataset. His fascination with data science comes from playing around with different algorithms and improving existing ones.
He is a frequent user of TensorFlow for NLP problems, whereas he prefers PyTorch for image problems.
When it comes to favourite Python libraries, he stresses the significance of scikit-learn and how it provides many of the components necessary to put a model into production.
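One reason scikit-learn lends itself to production is its Pipeline abstraction, sketched below with an illustrative dataset and model (not boost.ai’s actual stack): preprocessing and the model are bundled into a single object, so the exact transformations fitted at training time are reapplied at prediction time.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Bundle preprocessing and model so training and serving stay consistent
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])
pipe.fit(X, y)

# A single fitted object can now be serialised and shipped to production
preds = pipe.predict(X[:5])
print(preds)
```

Keeping the scaler inside the pipeline avoids a classic production bug: scaling training data with one set of statistics and live data with another.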
There is no dearth of libraries or frameworks these days for building machine learning or deep learning models. It’s all good as long as one understands what is happening in the background.
Along with scikit-learn and pandas, xgboost and lightgbm have also been part of many of Abhishek’s machine learning models.
When it comes to cloud services, Abhishek is fond of Amazon AWS while he also confesses that he is slowly developing a liking for Google Cloud Platform.
“Most of the wannabe data scientists think that this field is only about creating cool graphs and making models that can be shown off to a bunch of people,” says Abhishek when asked about the hype around data scientists.
He expresses his utter disbelief at the latest obsession with presentations rather than implementations.
“If it’s in the presentations, it’s fine, if it’s in production, it’s usable,” quips Abhishek.
He also highlights how challenging data collection can be, and how useless a data scientist is in the absence of data. Not all problems need the expertise of a data scientist; a few can be solved with conventional approaches. The realisation of this, warns Abhishek, will derail the AI hype train at the enterprise level.
A Word For The Beginners
Being a self-taught machine learning engineer, Abhishek stands firm on the idea of utilising the resource-rich internet for learning the fundamental concepts.
“I have seen that a lot of beginners tend to give up too quickly.”
He points to courses such as the ones by Andrew Ng as the place to start for those who are serious about making it big in the field of ML.
He also advises aspirants to supplement online courses by reading blogs and papers on arXiv, and skimming through discussion forums, to fortify the theoretical foundations of the subject.
For hands-on experience, not so surprisingly, Abhishek suggests that newcomers take up Kaggle challenges.
“Once you have solved a few problems, it will become very easy for you to start approaching machine learning problems just by looking at the data. Once that happens, you are no longer a beginner,” advises the Kaggle grandmaster.
Abhishek bets big on perseverance and laments its lack among upcoming aspirants. He is very passionate about the practical aspects of this job and recommends that beginners be the same.