Real-world challenges are more complex, and involve more competing demands, than an online competition. Hackathons might not paint the exact picture, and success at these competitions should not be mistaken for industry-level expertise. However, Kaggle, one of the world’s finest platforms for data scientists, gives aspirants the best possible introduction to the tricky world of data. Analytics India Magazine has been exclusively covering the stories of top Kagglers, and today we compile a few nuggets of wisdom from those interviews that can guide an aspirant.
“A right proportion of hard work, dedication, persistence, never giving up attitude and luck are the most important ingredients that helped me,” said Abhishek Thakur when asked about his Kaggle success and what made him the world’s first 4x grandmaster.
Abhishek made a habit of solving previous Kaggle competitions on his own, studying the successful solutions and getting to the bottom of the approaches with the help of Google.
When asked what it takes to get to the top, Darragh, a Kaggle grandmaster, recalled Jeremy Howard’s observation that the best practitioners in machine learning all share one trait: they’re very, very tenacious.
Putting tenacity into practice requires investing more time in learning a new technology, getting good at it and then quickly moving on to new approaches. Another Kaggle master, Arthur Llau, said that he spent around 4-8 hours per day for over a month on the contests that fetched him gold. Arthur believes that being a top Kaggler is a full-time job.
Do Not Reinvent The Wheel
With all the advancements happening around AI, one might lose track of the abundance of tools it has created. A few traditional methods still hold their ground when it comes to crunching data, yet many people fall for the fad that “NEW IS GOOD”.
Arthur recommends that newcomers develop the habit of exploring the data to find what is not evident, and that they not hesitate to try classic methods. Mathurin, a Kaggle master ranked in the top 20, asserts the need to try more and fail fast. He underlines the importance of reusing previous code and learning how to optimise for the competition metric. Although every problem has its challenges, he recommends having a good cross-validation scheme and confidence in it. He also urges one to trust local results over results on the public leaderboard.
Tri Duc, a Kaggle grandmaster, insists on the importance of knowing the fundamentals, which involve mathematics. He strongly believes that mathematics helps one get familiar with algorithms, which in turn prepares aspirants and practitioners for new concepts introduced in books or advanced courses.
However, in a real-world project or Kaggle competition, observes Duc, the role of mathematics is rarely tangible, and one barely touches it while building ML pipelines.
Don’t Fall In Love With The Tools
Given the pace at which new tools are released, there is no doubt that the abilities of machine learning are sometimes blown out of proportion. The attention it gets sometimes clouds the ground truth and nudges people into believing that it is the holy grail of solutions.
For instance, Abhishek uses TensorFlow for NLP problems and PyTorch for image problems. There is no dearth of libraries or frameworks one can use these days, and he firmly believes that it’s all good as long as one understands what is happening in the background.
A common trait that can be seen across the top players is the flexibility in their approach towards problem-solving. They pick tools that do the job and do not waste time battling over languages and frameworks.
For instance, Kaggle master Arthur prefers Python, and sometimes C++ for operations research tasks. He switches between the Keras and PyTorch frameworks while using a handful of very useful libraries: albumentations for image augmentation, eli5 and lofo for feature selection, missingno and seaborn for visualisation, and imblearn for imbalanced data. For parameter optimisation, Arthur prefers Optuna, along with skopt for its Bayesian module.
Be An Eternal Student
Although the democratisation of ML through new frameworks and libraries has made launching deep learning algorithms relatively straightforward, almost all the masters believe that it is of utmost importance to have a grip on the fundamentals.
Konstantin Yakovlev, a Kaggle grandmaster, advises newcomers to read voraciously and encourages aspirants to be lifelong students of the craft. For beginners, he recommends working on coding ability in Python, R, etc. and developing a knack for drawing insights from results through visualisations.
Mathurin has participated in more than 200 competitions where he mostly competed solo. However, he admits that all his top medals were won as a team. He believes that choosing a highly skilled partner assists in keeping the learning curve steep.
As an industry insider, Mathurin says that building a machine learning pipeline at an organisation needs skills that extend beyond what Kaggle demands, and he warns newcomers not to mistake a Kaggle competition for the end goal.
For job seekers, he stresses the importance of being consistently curious, which he believes to be more valuable than one’s Kaggle achievements because at the end of the day, companies are looking for those who are readily deployable with the least amount of training.
Let’s conclude with a few rules of thumb:
- read the forum discussions carefully,
- re-read the top solutions from similar past contests,
- read new papers to get ideas,
- run experiments, and always perform K-fold cross-validation to evaluate the gap between local validation and the public leaderboard.
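
The last rule, along with Mathurin's advice to trust local results, can be illustrated with a minimal sketch of K-fold cross-validation. It is not taken from any of the interviewed Kagglers. It uses only the Python standard library, and the helper names (`k_fold_indices`, `cross_validate`, `fit_line`, `mean_abs_error`), the synthetic data and the straight-line model are all assumptions made for illustration. In practice one would more likely reach for a library splitter such as scikit-learn's `KFold`.

```python
import random
import statistics


def k_fold_indices(n_samples, k, seed=0):
    """Split range(n_samples) into k shuffled validation folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most 1.
    return [indices[i::k] for i in range(k)]


def cross_validate(xs, ys, fit, score, k=5, seed=0):
    """Train on k-1 folds, score on the held-out fold, once per fold."""
    scores = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held_out_set = set(held_out)
        train = [i for i in range(len(xs)) if i not in held_out_set]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        scores.append(
            score(model, [xs[i] for i in held_out], [ys[i] for i in held_out])
        )
    return scores


def fit_line(xs, ys):
    """Hypothetical toy model: ordinary least-squares line through 1-D data."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_abs_error(model, xs, ys):
    """Mean absolute error of the fitted line on the given points."""
    slope, intercept = model
    return statistics.fmean(
        [abs(slope * x + intercept - y) for x, y in zip(xs, ys)]
    )


# Synthetic data (an assumption for illustration): y = 3x + 1 plus Gaussian noise.
rng = random.Random(42)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [3.0 * x + 1.0 + rng.gauss(0, 1) for x in xs]

fold_scores = cross_validate(xs, ys, fit_line, mean_abs_error, k=5)
local_cv = statistics.fmean(fold_scores)
fold_spread = statistics.stdev(fold_scores)
print(f"local CV MAE: {local_cv:.3f} (std across folds: {fold_spread:.3f})")
```

After a submission, the public leaderboard score can be compared against `local_cv`. A gap much larger than `fold_spread` suggests that the validation scheme does not mirror the test data, or that the leaderboard subset is too small to trust.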