Underrated Kaggle notebooks every data science enthusiast must know

Kaggle is synonymous with competitions and hackathons in the world of data science, but it is also a great resource for learning more about the field through community-driven notebooks. In contrast to textbooks and lectures, Kaggle notebooks, or kernels, provide data scientists with tutorials in their own language. These are essentially Jupyter notebooks that run in the browser free of charge, with no need to set up a local Jupyter environment. In addition, the notebooks let users explore and run machine learning code and discover a vast range of public, open-source repositories.

While there are hundreds of thousands of notebooks on Kaggle, here are eight underrated ones that every data enthusiast should read.

Kannada MNIST: Choosing the Right Optimiser by ILM

This Kannada MNIST notebook is a comprehensive and well-structured overview of essential deep learning optimisers implemented in Keras. These include BGD, MBGD, AdaMax, Adam, SGD, Nadam and more. The notebook explains them with theory, graphs and applications. Additionally, the maths is supplemented with comparisons of TensorFlow/Keras implementations. The kernel covers loading and preparing data, building a model, training a network, checking the performance on Dig-MNIST data and making submission files.
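
To give a flavour of what such a comparison involves, here is a minimal sketch of two of the update rules named above, plain gradient descent and Adam, written in pure Python rather than Keras. It is not taken from the notebook: the toy loss, the learning rates and the step counts are all assumptions chosen for illustration.

```python
import math

def grad(w):
    """Gradient of the toy loss f(w) = (w - 3)^2, which is minimised at w = 3."""
    return 2.0 * (w - 3.0)

def sgd(w, lr=0.1, steps=100):
    """Plain gradient descent: step directly against the gradient."""
    for _ in range(steps):
        w -= lr * grad(w)
    return w

def adam(w, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    """Adam: scale each step by running estimates of the gradient moments."""
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g       # first moment (running mean)
        v = beta2 * v + (1 - beta2) * g * g   # second moment (running mean of squares)
        m_hat = m / (1 - beta1 ** t)          # bias correction for the zero start
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w

# Both optimisers start from w = 0 (an arbitrary, assumed starting point).
w_sgd = sgd(0.0)
w_adam = adam(0.0)
print(f"SGD:  {w_sgd:.4f}")
print(f"Adam: {w_adam:.4f}")
```

On a one-dimensional toy problem like this both rules end up near the minimum; the differences the notebook explores only become visible on real networks and data.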

Learn more here.

Notebook9298460840: An Inference Notebook for the winning solution in the Jigsaw Toxic Severity competition by Guanshuo Xu

This inference notebook is based on the winning solution at the Jigsaw Toxic Severity competition. The competition asks participants to rank comments by severity of toxicity, so that more toxic comments receive a higher numerical value than less toxic ones. Guanshuo Xu is one of the world’s top Kagglers and has published the training code for his solution to the competition, which provided no training data.

Learn more here.

Transformers Course – Chapter 2 – TF & Torch by Darien Schettler

This collection of notebooks takes users through the various components of the Hugging Face Transformers Course, which covers using transformers, fine-tuning pretrained models, sharing models and tokenisers, the Datasets library, the Tokenizers library and the main NLP tasks. This notebook is the second part of the series, covering eight aspects of using a transformer. In addition, the notebook shows both TensorFlow and PyTorch code.

Learn more here.

How to Create Award-Winning Data Visualisations by Andrew Sinek

Andrew Sinek won third prize in Kaggle’s Machine Learning and Data Science Survey Competition for a notebook surveying jobs on Kaggle vs Glassdoor. An important aspect of the award was his presentation. The How to Create Award-Winning Data Visualisations notebook illustrates his thought process and gives step-by-step instructions for building effective visualisations. The thought process is explained with precision and covers all aspects of visualisation, such as preparing the data, data to plot, first plot, choosing a plot, closing the lines, creating a hierarchy, decluttering, telling a story, adding a meaningful title and more.

Learn more here.

Comprehensive data exploration with Python by Pedro Marcelino

Data analysis can be time-consuming, and data scientists can easily miss the initial but important steps in the long process. This notebook is an in-depth tutorial on data analysis principles and steps based on ‘Examining your Data’ by Hair et al. (2013). The author has also shared code, examples and illustrations of applying these principles to his own problem. The tutorial covers understanding the problem, univariate study, multivariate study, basic cleaning and testing assumptions.
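
As a rough illustration of those steps, the sketch below runs a basic cleaning check (missing values), a univariate check (skewness) and a multivariate check (correlation) using only the Python standard library. The data and the column names `area` and `price` are hypothetical and do not come from the notebook.

```python
import math
import statistics

# Hypothetical toy data; None marks a missing value.
rows = [
    {"area": 50, "price": 150},
    {"area": 65, "price": 200},
    {"area": 80, "price": 240},
    {"area": 95, "price": None},
    {"area": 120, "price": 390},
    {"area": 200, "price": 900},
]

def missing_ratio(rows, column):
    """Basic cleaning: share of rows where `column` is missing."""
    return sum(r[column] is None for r in rows) / len(rows)

def skewness(values):
    """Univariate study: population skewness (third standardised moment)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)

def pearson(xs, ys):
    """Multivariate study: Pearson correlation between two variables."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    norm = math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return cov / norm

ratio = missing_ratio(rows, "price")

# Drop incomplete rows before computing statistics.
complete = [r for r in rows if r["price"] is not None]
areas = [r["area"] for r in complete]
prices = [r["price"] for r in complete]

skew = skewness(prices)
r = pearson(areas, prices)
print(f"missing price: {ratio:.0%}, price skewness: {skew:.2f}, corr(area, price): {r:.2f}")
```

A strongly positive skewness is the kind of finding that would prompt a closer look at the distribution when testing assumptions.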

Learn more here.

Understanding and Improving CycleGANs – Tutorial by Jesper Sören Dramsch

Part 2 of a beginner’s GAN tutorial, this notebook teaches users to understand and improve GANs. The objective of the notebook is to explain baseline models in data science and create Monets with a GAN. The tutorial studies data augmentation, neural network architectures, CycleGAN architectures and better loss functions in detail, with examples of what to do and what not to do. The contents of the tutorial include loading the data, building the DCGAN, training the CycleGAN, visualising photos and creating submission files.
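
One loss that is central to any CycleGAN is the cycle-consistency term: translating an image to the other domain and back should return something close to the original. The sketch below is a toy illustration of that idea, not the notebook's code. The two "generators" are hypothetical stand-ins (simple brightness shifts on a list of pixel values), where a real CycleGAN would use neural networks.

```python
def l1_distance(a, b):
    """Mean absolute difference between two equally sized pixel lists."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def cycle_consistency_loss(x, y, g, f):
    """L1 reconstruction error after a full round trip in both directions."""
    forward = l1_distance(f(g(x)), x)    # photo -> painting -> photo
    backward = l1_distance(g(f(y)), y)   # painting -> photo -> painting
    return forward + backward

# Hypothetical stand-in generators: brighten or darken, clipped to [0, 1].
def photo_to_painting(img):
    return [min(1.0, p + 0.2) for p in img]

def painting_to_photo(img):
    return [max(0.0, p - 0.2) for p in img]

photo = [0.1, 0.5, 0.9]
painting = [0.2, 0.6, 1.0]

# Clipping loses information for the brightest photo pixel, so the loss is non-zero.
loss = cycle_consistency_loss(photo, painting, photo_to_painting, painting_to_photo)
print(f"cycle-consistency loss: {loss:.4f}")
```

In training, this term is added to the usual adversarial losses so that the generators cannot ignore the content of the input image.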

Learn more here.

Data Heroines – Saving the World Through Data

Rather than a typical notebook, Data Heroines is a Kaggle Survey story written by three authors looking at various incredible women who make up the field of data. The authors Datana Scientists, Datana Analystus, Datana Engineers and Machina Learnerum have explored a story-format journey of data, focusing on the women in data and how they dealt with COVID-19 and its impact on data. The story is told through a comic format followed by code and graphs in each section.

Learn more here.

Visualising Convolution Filters by Hongnan G

Initially created for a Petfinder competition, this notebook is a tutorial about the inner workings of a convolution filter through examples and illustrations. It teaches how convolution can be achieved through a kernel moving across an entire image and calculating dot products with each window along the way. It also looks at the abstract features of convolutions, as well as vertical and horizontal features.
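
The sliding-window idea can be sketched in a few lines of pure Python. This is an illustrative example under stated assumptions, not code from the notebook: the 4x4 image and the two 3x3 edge kernels are made up, and the kernel is applied without flipping, which is the convention deep learning libraries use.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and take the
    dot product of the kernel with each window along the way."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            row.append(sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw)
            ))
        output.append(row)
    return output

# Assumed toy image: dark left half, bright right half (one vertical edge).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Assumed kernels that respond to vertical and horizontal edges respectively.
vertical_kernel = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]
horizontal_kernel = [[-1, -1, -1], [0, 0, 0], [1, 1, 1]]

vertical_response = convolve2d(image, vertical_kernel)
horizontal_response = convolve2d(image, horizontal_kernel)
print(vertical_response)    # strong response: the image contains a vertical edge
print(horizontal_response)  # no response: the image has no horizontal edge
```

The vertical kernel lights up because pixel values change from left to right, while the horizontal kernel sees identical rows and returns zeros.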

Learn more here.

Further resource for more underrated Kaggle notebooks: Hidden Gems Collection

Avi Gopani
Avi Gopani is a technology journalist who seeks to analyse industry trends and developments from an interdisciplinary perspective at Analytics India Magazine. Her articles chronicle cultural, political and social stories that are curated with a focus on the evolving technologies of artificial intelligence and data analytics.
