Underrated Kaggle notebooks every data science enthusiast must know

Kaggle is synonymous with competitions and hackathons in the world of data science, but it is also a great resource for learning more about the field through community-driven notebooks. In contrast to textbooks and lectures, Kaggle notebooks, or kernels, provide data scientists with hands-on tutorials written in their own language. They are essentially Jupyter notebooks that run in the browser free of charge, with no need to set up a local Jupyter environment. Users can also explore and run machine learning code and browse a vast repository of public, open-source work.

While there are hundreds of thousands of notebooks on Kaggle, here are the top eight underrated notebooks that every data enthusiast must read.

Kannada MNIST: Choosing the Right Optimiser by ILM

This Kannada MNIST notebook is a comprehensive and well-structured overview of essential deep learning optimisers implemented in Keras. These include BGD, MBGD, AdaMax, Adam, SGD, Nadam and more. The notebook explains them with theory, graphs and applications, and supplements the maths with comparisons of their TensorFlow/Keras implementations. The kernel walks through loading and preparing the data, building a model, training a network, checking performance on the Dig-MNIST data and making submission files.
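
As a rough illustration of how two of the optimisers named above differ, the sketch below implements the plain gradient-descent update (the rule behind SGD) and the Adam update in NumPy on a toy quadratic. It is not taken from the notebook: the function names, the loss f(w) = (w - 3)^2 and the hyperparameters are all assumptions chosen for illustration.

```python
import numpy as np

def grad(w):
    # Gradient of the toy loss f(w) = (w - 3)^2 (an assumed example loss).
    return 2.0 * (w - 3.0)

def run_gradient_descent(w0, lr=0.1, steps=500):
    # Plain update used by SGD: step against the gradient, scaled by lr.
    # The gradient here is exact, so there is no mini-batch noise.
    w = w0
    for _ in range(steps):
        w -= lr * grad(w)
    return w

def run_adam(w0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    # Adam keeps running averages of the gradient (m) and its square (v),
    # corrects their start-up bias, and scales each step by 1 / sqrt(v_hat).
    w, m, v = w0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * g * g
        m_hat = m / (1.0 - beta1 ** t)
        v_hat = v / (1.0 - beta2 ** t)
        w -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return w

sgd_w = run_gradient_descent(0.0)
adam_w = run_adam(0.0)
print("gradient descent:", sgd_w)
print("Adam:", adam_w)
```

Both runs start from w = 0 and should end close to the minimum at w = 3. In Keras the same choice is usually made by passing an optimiser object to `model.compile`, which is the level at which the notebook compares them.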

Learn more here.

Notebook9298460840: An Inference Notebook for the winning solution in the Jigsaw Toxic Severity by Guanshuo Xu

This inference notebook is based on the winning solution to the Jigsaw Toxic Severity competition. The competition ranks comments by the severity of their toxicity, with more toxic comments receiving a higher numerical value than less toxic ones. Guanshuo Xu is one of the world’s top Kagglers and has published the training code for the solution to a competition that, unusually, provided no training data.

Learn more here.

Transformers Course – Chapter 2 – TF & Torch by Darien Schettler

This collection of notebooks takes users through the components of the Hugging Face Transformers Course: using transformers, fine-tuning pretrained models, sharing models and tokenisers, the Datasets library, the Tokenizers library and the main NLP tasks. This notebook is the second part of the series, covering eight aspects of using a transformer, and shows both the TensorFlow and PyTorch code.

Learn more here.

How to Create Award-Winning Data Visualisations by Andrew Sinek

Andrew Sinek won third prize in Kaggle’s Machine Learning and Data Science Survey Competition for a notebook surveying jobs on Kaggle vs Glassdoor. An important factor in the award was his presentation. The How to Create Award-Winning Data Visualisations notebook illustrates his thought process and gives step-by-step instructions for building effective visualisations. The thought process is explained with precision and covers aspects of visualisation such as preparing the data, data to plot, first plot, choosing a plot, closing the lines, creating a hierarchy, decluttering, telling a story, adding a meaningful title and more.

Learn more here.

Comprehensive data exploration with Python by Pedro Marcelino

Data analysis can be time-consuming, and data scientists can easily miss the early but important steps in the long process. This notebook is an in-depth tutorial on data analysis principles and steps, based on ‘Examining your Data’ by Hair et al. (2013). The author has also shared code, examples and illustrations of applying these principles to his own problem. The tutorial covers understanding the problem, univariate study, multivariate study, basic cleaning and testing assumptions.

Learn more here.

Understanding and Improving CycleGANs – Tutorial by Jesper Sören Dramsch

Part 2 of a beginner’s GAN tutorial, this notebook teaches users to understand and improve CycleGANs. The objective of the notebook is to explain baseline models in data science and to create Monets with GANs. The tutorial studies data augmentation, neural network architectures, CycleGAN architectures and better loss functions in detail, with examples of what to do and what not to do. The contents include loading the data, building the DCGAN, training the CycleGAN, visualising photos and creating submission files.

Learn more here.

Data Heroines – Saving the World Through Data

Rather than a conventional notebook, Data Heroines is a Kaggle Survey story written by three authors about the incredible women who make up the field of data. The authors Datana Scientists, Datana Analystus, Datana Engineers and Machina Learnerum take readers on a story-format journey through data, focusing on the women in data and how they dealt with COVID–19 and its impact on data. The story is told in a comic format, followed by code and graphs in each section.

Learn more here.

Visualising Convolution Filters by Hongnan G

Initially created for a Petfinder competition, this notebook is a tutorial on the inner workings of a convolution filter, told through examples and illustrations. It teaches how convolution is performed by a kernel sliding across an entire image and calculating a dot product with each window along the way. It also looks at the abstract features of convolutions, as well as vertical and horizontal features.
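
The sliding-window idea described above can be sketched in a few lines of NumPy. This is a minimal illustration, not code from the notebook: the function name, the toy 5x5 image and the two edge kernels are assumptions. As in most deep learning libraries, the kernel is not flipped, so strictly speaking this is cross-correlation.

```python
import numpy as np

def convolve2d_valid(image, kernel):
    # Slide the kernel over the image (stride 1, no padding) and take the
    # dot product between the kernel and each window it covers.
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(window * kernel)
    return out

# Toy image (assumed): two dark columns then three bright ones,
# which gives a single vertical edge and no horizontal edges.
image = np.array([[0, 0, 1, 1, 1]] * 5, dtype=float)

# Simple edge kernels (assumed): one responds to left-to-right changes,
# its transpose responds to top-to-bottom changes.
vertical_kernel = np.array([[-1, 0, 1],
                            [-1, 0, 1],
                            [-1, 0, 1]], dtype=float)
horizontal_kernel = vertical_kernel.T

vertical_response = convolve2d_valid(image, vertical_kernel)
horizontal_response = convolve2d_valid(image, horizontal_kernel)
print(vertical_response)
print(horizontal_response)
```

The vertical kernel gives a non-zero response only where its window straddles the dark-to-bright boundary, while the horizontal kernel gives zeros everywhere because every row of the toy image is identical.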

Learn more here.

Further resource for more underrated Kaggle notebooks: Hidden Gems Collection

Avi Gopani
Avi Gopani is a technology journalist at Analytics India Magazine who seeks to analyse industry trends and developments from an interdisciplinary perspective. Her articles chronicle cultural, political and social stories, curated with a focus on the evolving technologies of artificial intelligence and data analytics.
