ML Tools Used By The Kaggle Experts

There isn’t a dearth of ML tools today. For a beginner, however, knowing the tool stack of those who consistently win Kaggle competitions is a great help; one can later go ahead and pick the tool of their choice. In the sections below, we look at the top tools, frameworks, cloud services and libraries used by Kaggle Masters and Grandmasters, which they revealed to us in our exclusive interviews. That said, we have to admit that all of these top Kagglers are of the opinion that one should not fall in love with tools: any tool is fine as long as it gets the job done right!

Abhishek Thakur

Abhishek Thakur, a 4x Kaggle Grandmaster, says that he frequently finds himself using TensorFlow for NLP problems and PyTorch for computer vision problems.

When it comes to favourite Python libraries, Thakur singles out Scikit-learn, praising how many of the components needed to put a model into production the library provides.
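As an illustration of that point (a minimal sketch, not Thakur's own code), scikit-learn's `Pipeline` bundles preprocessing and a model into one object that can be serialised whole for deployment:

```python
import pickle

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Scaling and the classifier travel together as one object,
# so the exact same preprocessing runs again at inference time.
model = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(X, y)

# Serialise the whole pipeline for deployment, then restore it.
blob = pickle.dumps(model)
restored = pickle.loads(blob)
print(restored.predict(X[:3]))
```

The design choice this shows is why the library matters for production: the fitted scaler is not a separate artefact that can drift out of sync with the model.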


Thakur, however, believes that there isn’t a shortage of libraries or frameworks one can use these days, and it’s all good as long as one understands what is happening in the background.

Arthur Llau

Arthur says that a basic laptop sometimes suffices. Depending on the competition, however, he rents GPUs on Google Cloud Platform using Kaggle vouchers.


Here is what Arthur’s toolkit looks like:

  • Hardware: MacBook Pro (2019, i7, 16GB), a desktop (i7, 32GB, GTX 1070 Ti), or GCP
  • Language: Python and C++
  • Framework: Keras and PyTorch
  • Augmentation library: albumentations
  • Feature selection library: eli5 and lofo
  • Visualization: Missingno and seaborn
  • Imbalanced data: imblearn
  • Parameter optimization: Optuna and skopt
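Parameter-optimisation libraries such as Optuna and skopt automate the search loop sketched below. This toy version uses plain random search over a made-up objective function (a stand-in for illustration, not Arthur's actual setup) just to show what such libraries do under the hood:

```python
import random

random.seed(0)

def objective(lr, depth):
    # Hypothetical validation score peaking at lr=0.1, depth=6.
    # A real objective would train a model and return its metric.
    return -(lr - 0.1) ** 2 - (depth - 6) ** 2 / 100

best_score, best_params = float("-inf"), None
for _ in range(200):
    # Sample a candidate from the search space, much as Optuna's
    # trial.suggest_* calls would.
    params = {"lr": random.uniform(0.001, 0.3),
              "depth": random.randint(2, 10)}
    score = objective(**params)
    if score > best_score:
        best_score, best_params = score, params

print(best_params, round(best_score, 4))
```

Optuna and skopt improve on this by sampling new candidates based on past results (TPE, Bayesian optimisation) and by pruning unpromising trials early.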

Mathurin

A Kaggle Master ranked in the top 20 on the competitions leaderboard, Mathurin says that he prefers Python to R, though he used R until 2015. Mathurin has been in this field for over a decade and a half, and a renewed interest in algorithms made him switch to Python gradually.

A look at Mathurin’s toolkit, which he keeps coming back to:

  • Algorithms: lightgbm, xgboost, catboost
  • Cloud services: Google Colab and Kaggle kernels.
  • Packages: scikit-learn, pandas, numpy
  • Frameworks: Keras, TensorFlow, PyTorch and Fastai
  • AutoML tools: Prevision.io, H2O, TPOT, auto sklearn

Tri Duc Nguyen Tang

Duc, who is ranked in the world’s top 50 and is also chief data engineer and co-founder of the Vietnamese AI startup Palexy, says that he and his team usually use one server with two 1080 Ti GPUs along with a Kaggle kernel. For a competition like DeepFake, he prefers renting a server with four 1080 Tis or using AWS.

Talking about frequently used tools, Duc said that he usually finds himself using Keras-TensorFlow, OpenCV, albumentations, LightGBM and scikit-learn. A data engineer by profession, Duc says that a data engineer’s role is to collect data and prepare the data pipeline; to build the necessary infrastructure and architecture for data generation, his team uses SQL, MySQL, Spark, Hadoop, Hive, etc.
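To make the SQL side of that pipeline work concrete, here is a minimal, hypothetical aggregation step using Python's built-in sqlite3 module (a stand-in for the SQL/MySQL systems mentioned; the table and column names are invented for illustration):

```python
import sqlite3

# In-memory database standing in for a production SQL store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(1, 10.0), (1, 5.0), (2, 7.5)],
)

# A typical pipeline step: aggregate raw events into per-user
# features that downstream modelling code can consume.
rows = conn.execute(
    "SELECT user_id, SUM(amount) FROM events "
    "GROUP BY user_id ORDER BY user_id"
).fetchall()
print(rows)  # [(1, 15.0), (2, 7.5)]
```

At scale the same GROUP BY logic would run on Spark or Hive rather than a single machine, but the shape of the transformation is the same.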

A data scientist, on the other hand, is responsible for obtaining insights from data and formulating those insights into a model to communicate with clients; for that, data scientists use statistics, visualisation (matplotlib, seaborn), modelling (scikit-learn, TensorFlow, PyTorch), etc.

Darragh Hanley

An AI engineer and a Grandmaster, Darragh usually runs code from the command line and the Spyder IDE. He mainly leverages AWS and prototypes on his MacBook Pro, which he believes is enough to check whether a pipeline is working well before deploying. As for frameworks, Darragh has expressed a liking for PyTorch over the others for the freedom it offers to experiment.

Ram Sagar
I have a master's degree in Robotics and I write about machine learning advancements.
