
Underrated Kaggle notebooks every data science enthusiast must know


Kaggle is synonymous with competitions and hackathons in the world of data science, but it is also a great resource for learning more about the field through community-driven notebooks. In contrast to textbooks and lectures, Kaggle notebooks, or kernels, give data scientists tutorials in their own language. They are essentially Jupyter notebooks that run in the browser free of charge, with no need to set up a local Jupyter environment. In addition, these notebooks let users explore and run machine learning code and discover vast public, open-source repositories.

While there are hundreds of thousands of notebooks on Kaggle, here are eight underrated notebooks that every data enthusiast must read.


Kannada MNIST: Choosing the Right Optimiser by ILM

This Kannada MNIST notebook is a comprehensive and well-structured overview of essential deep learning optimisers implemented in Keras. These include BGD, MBGD, AdaMax, Adam, SGD, Nadam and more. The notebook explains them with theory, graphs and applications. Additionally, the maths is supplemented with comparisons of the TensorFlow/Keras implementations. The kernel covers loading and preparing data, building a model, training a network, checking performance on the Dig-MNIST data and making submission files.
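To give a flavour of what comparing optimisers means in practice, here is a minimal sketch in plain Python rather than Keras. It is not taken from the notebook: the toy loss, the function names (`grad`, `run_gd`, `run_adam`) and the hyperparameter values are all assumptions chosen for illustration. It contrasts a plain gradient descent update with the standard Adam update on a one-dimensional problem.

```python
import math


def grad(w):
    # Gradient of the toy loss f(w) = (w - 3)^2, which is minimised at w = 3.
    # The loss itself is an assumption made for this illustration.
    return 2.0 * (w - 3.0)


def run_gd(w, lr=0.1, steps=100):
    # Plain gradient descent: step against the gradient with a fixed rate.
    for _ in range(steps):
        w -= lr * grad(w)
    return w


def run_adam(w, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    # Adam: keep running averages of the gradient (m) and of its square (v),
    # correct their start-up bias, then scale the step by sqrt(v_hat).
    m = 0.0
    v = 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w


w_gd = run_gd(0.0)
w_adam = run_adam(0.0)
print(f"gradient descent: {w_gd:.4f}, Adam: {w_adam:.4f}")
```

Both runs start from `w = 0.0` and should end close to the minimum at 3. In Keras the same choice is made by passing a different optimiser when compiling the model, which is the kind of comparison the notebook walks through.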

Learn more here.

Notebook9298460840: An Inference Notebook for the winning solution in the Jigsaw Toxic Severity by Guanshuo Xu

This inference notebook is based on the winning solution to the Jigsaw Toxic Severity competition. The competition ranks comments by severity of toxicity, with more toxic comments receiving a higher numerical value than less toxic ones. Guanshuo Xu is one of the world’s top Kagglers and has published the training code for his solution to the competition, which unusually provided no training data.

Learn more here.

Transformers Course – Chapter 2 – TF & Torch by Darien Schettler

This collection of notebooks takes users through the components of the Hugging Face Transformers Course, which covers using transformers, fine-tuning pretrained models, sharing models and tokenisers, the Datasets library, the Tokenizers library and the main NLP tasks. This notebook is the second part of the series, covering eight aspects of using a transformer. In addition, the notebook shows both the TensorFlow and PyTorch code.

Learn more here.

How to Create Award-Winning Data Visualisations by Andrew Sinek

Andrew Sinek won third prize in Kaggle’s Machine Learning and Data Science Survey Competition for a notebook surveying jobs on Kaggle vs Glassdoor. An important aspect of the award was his presentation. The How to Create Award-Winning Data Visualisations notebook illustrates his thought process and gives step-by-step instructions for building effective visualisations. The thought process is explained with precision and covers all aspects of visualisation, such as preparing the data, choosing the data to plot, making a first plot, choosing a plot, closing the lines, creating a hierarchy, decluttering, telling a story, adding a meaningful title and more.

Learn more here.

Comprehensive data exploration with Python by Pedro Marcelino

Data analysis can be time-consuming, and data scientists can easily miss the initial but important steps in the long process. This notebook is an in-depth tutorial on data analysis principles and steps, based on ‘Examining your Data’ by Hair et al. (2013). The author has also shared code, examples and illustrations of applying these principles to a problem of his own. The tutorial covers understanding the problem, univariate study, multivariate study, basic cleaning and testing assumptions.

Learn more here.

Understanding and Improving CycleGANs – Tutorial by Jesper Sören Dramsch

Part 2 of a beginner’s GAN tutorial, this notebook teaches users to understand and improve GANs. The objective of the notebook is to explain baseline models in data science and to create Monets with a GAN. The tutorial studies data augmentation, neural network architectures, CycleGAN architectures and better loss functions in detail, with examples of what to do and what not to do. The contents of the tutorial include loading the data, building the DCGAN, training the CycleGAN, visualising photos and creating submission files.

Learn more here.

Data Heroines – Saving the World Through Data

Rather than a single notebook, Data Heroines is a Kaggle Survey story written by three authors about the incredible women who make up the field of data. Through the characters Datana Scientists, Datana Analystus, Datana Engineers and Machina Learnerum, the authors explore a story-format journey of data, focusing on the women in data and how they dealt with COVID-19 and its impact on data. The story is told in comic format, followed by code and graphs in each section.

Learn more here.

Visualising Convolution Filters by Hongnan G

Initially created for a Petfinder competition, this notebook is a tutorial on the inner workings of a convolution filter, told through examples and illustrations. It teaches how convolution is achieved by moving a kernel across an entire image and calculating the dot product with each window along the way. It also looks at the abstract features that convolutions pick up, as well as vertical and horizontal features.
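The sliding-window idea described above can be sketched in a few lines of NumPy. This is an illustration written for this article, not code from the notebook: the helper name `convolve2d_valid`, the toy image and the two edge kernels are all assumptions. As in most deep learning libraries, the kernel is applied without flipping (strictly speaking, cross-correlation), with no padding and a stride of 1.

```python
import numpy as np


def convolve2d_valid(image, kernel):
    """Slide `kernel` over `image` and take one dot product per window."""
    image = np.asarray(image, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # The window is the patch of the image currently under the kernel.
            window = image[i:i + kh, j:j + kw]
            # Element-wise product summed up = dot product of window and kernel.
            out[i, j] = np.sum(window * kernel)
    return out


# Toy 4x4 image: dark on the left, bright on the right (one vertical edge).
image = np.array([
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
])

# Hypothetical 3x3 filters that respond to vertical and horizontal edges.
vertical_edge = np.array([
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
])
horizontal_edge = vertical_edge.T

vertical_response = convolve2d_valid(image, vertical_edge)
horizontal_response = convolve2d_valid(image, horizontal_edge)
print(vertical_response)
print(horizontal_response)
```

On this toy image the vertical filter responds with 3 in every position of its 2x2 output, while the horizontal filter returns all zeros, because the image contains a vertical edge and no horizontal one.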

Learn more here.

Further resource for more underrated Kaggle notebooks: Hidden Gems Collection


Avi Gopani

Avi Gopani is a technology journalist at Analytics India Magazine who seeks to analyse industry trends and developments from an interdisciplinary perspective. Her articles chronicle cultural, political and social stories, curated with a focus on the evolving technologies of artificial intelligence and data analytics.