Top Ten Kaggle Notebooks For Data Science Enthusiasts In 2021

A Kaggle Notebook is a cloud computational environment that enables reproducible and collaborative analysis. Notebooks, previously known as kernels, let users explore and run machine learning code. They also give access to Kaggle's vast repository of public, open-source and reproducible code for data science and machine learning projects.

Currently, there are more than 460,000 public notebooks available on Kaggle. There are two types of Notebooks on Kaggle. The first type is a script, which executes everything as code sequentially. The other type is the Jupyter notebook, which consists of a sequence of cells, where each cell is written either in Markdown or in a programming language.

Here is a list of the top ten Kaggle Notebooks a data science enthusiast must know.

1| Comprehensive Data Exploration With Python 

About: This notebook offers a comprehensive analysis of data. It walks through understanding the problem by looking at each variable, studying the dependent variable, examining how the dependent and independent variables relate, cleaning the dataset (handling missing data, outliers and categorical variables) and, lastly, checking whether the data meets the assumptions required by multivariate techniques.
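
As a rough illustration of the missing-data step mentioned above, the sketch below counts the share of missing values per column. It uses only the Python standard library; the function name `missing_ratio`, the sample rows and the column names are hypothetical and are not taken from the notebook, which may handle missing data differently.

```python
from collections import Counter


def missing_ratio(rows, columns):
    """Return the fraction of missing (None) values for each column."""
    total = len(rows)
    if total == 0:
        return {col: 0.0 for col in columns}
    missing = Counter()
    for row in rows:
        for col in columns:
            # A value counts as missing if the key is absent or set to None.
            if row.get(col) is None:
                missing[col] += 1
    return {col: missing[col] / total for col in columns}


# Hypothetical sample data, made up for this example.
rows = [
    {"area": 120.0, "garage": "attached", "price": 250000},
    {"area": None, "garage": None, "price": 180000},
    {"area": 95.0, "garage": None, "price": 150000},
    {"area": 140.0, "garage": "detached", "price": 310000},
]
ratios = missing_ratio(rows, ["area", "garage", "price"])
print(ratios)
```

Columns with a high ratio are usually the first candidates for dropping or imputing, though the right threshold depends on the dataset.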

2| Titanic Data Science Solutions 

About: The notebook presents a typical workflow for solving data science competitions on sites like Kaggle. It proceeds step by step, explaining each step and the rationale for every decision a data scientist needs to take during solution development. It involves analysing and visualising data, both before and after wrangling, among other steps.

3| Hello, Python

About: This notebook covers the key Python skills you need to start using Python for data science. It helps you learn and level up basic Python skills. The notebook includes a brief overview of Python syntax, variable assignment, and arithmetic operators.
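
To give a flavour of the topics listed above, here is a minimal sketch of variable assignment and arithmetic operators in Python. The variable names and values are made up for illustration and do not come from the notebook.

```python
# Variable assignment: a name is bound to a value with "=".
total_minutes = 135

# Arithmetic operators.
hours = total_minutes // 60    # floor division keeps the whole hours
minutes = total_minutes % 60   # modulo gives the remaining minutes
half = total_minutes / 2       # true division always returns a float
squared = hours ** 2           # exponentiation

print(hours, minutes, half, squared)
```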

4| Introduction to Ensembling/Stacking in Python 

About: This notebook serves as a primer for ensembling (combining) base learning models, in particular the variant of ensembling known as Stacking. The notebook introduces ensembling in an intuitive and concise manner, and uses the Titanic dataset.

5| A Data Science Framework: To Achieve 99% Accuracy

About: In this Notebook, you will learn data science approaches such as following a data science framework, gathering and preparing data, performing exploratory analysis, tuning models with hyper-parameters, feature selection and more.

6| Exploring Survival on the Titanic

About: This notebook focuses on illustrative data visualisations using the Titanic dataset. It includes topics like feature engineering, predictive imputation, building a machine learning model, variable importance, sensible value imputation, etc.

7| Credit Fraud || Dealing with Imbalanced Datasets

About: In this notebook, you will learn how to use various predictive models to determine whether a transaction is legitimate or fraudulent. The goals of the notebook are to understand the distribution of the data, create a sub-dataframe with a 50/50 ratio of “Fraud” and “Non-Fraud” transactions, determine which classifiers to use, and build a neural network and compare its accuracy with that of the best classifier.
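
The 50/50 split described above can be approximated by random undersampling of the majority class. The sketch below is one possible way to do it with the standard library only; the `undersample` helper, the `is_fraud` key and the synthetic transactions are assumptions made for this example, not code from the notebook.

```python
import random


def undersample(rows, label_key, seed=0):
    """Return a shuffled subset with equal numbers of each of the two classes."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    fraud = [r for r in rows if r[label_key] == 1]
    legit = [r for r in rows if r[label_key] == 0]
    # Keep as many rows per class as the smaller class can provide.
    n = min(len(fraud), len(legit))
    balanced = rng.sample(fraud, n) + rng.sample(legit, n)
    rng.shuffle(balanced)
    return balanced


# Synthetic data: 10 fraudulent and 90 legitimate transactions.
transactions = [
    {"amount": float(i), "is_fraud": 1 if i % 10 == 0 else 0}
    for i in range(100)
]
balanced = undersample(transactions, "is_fraud")
print(len(balanced), sum(r["is_fraud"] for r in balanced))
```

Undersampling discards most of the majority class, so any model trained on the balanced subset should still be evaluated on data with the original class distribution.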

8| Coronavirus (COVID-19) Visualization & Prediction

About: This notebook aims to explore COVID-19 through data analysis and projections. It covers global coronavirus cases, cases from different countries, prediction of worldwide confirmed cases, US testing data, mobility data from hotspots, etc.

9| Approaching (Almost) Any NLP Problem on Kaggle

About: This notebook discusses approaches to natural language processing problems on Kaggle. You will learn how to use the data to create a very basic first model and then improve it using different features. It includes topics like logistic regression, Naive Bayes, SVM, XGBoost, grid search, word vectors, LSTM, and more.

10| Interactive Intro to Dimensionality Reduction 

About: This Notebook discusses the merits of dimensionality reduction methods. It aims to provide an introductory exposition of three methods: PCA (Principal Component Analysis), LDA (Linear Discriminant Analysis) and t-SNE (t-Distributed Stochastic Neighbour Embedding). The notebook also includes visualisations built with the Plotly visualisation library. The dataset used in this Notebook is the popular MNIST (Modified National Institute of Standards and Technology) computer vision digit dataset.
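
To make the idea behind PCA concrete, the sketch below reduces a handful of 2-D points to one dimension by projecting them onto the direction of largest variance. It is a toy example under stated assumptions: it handles only two features, uses the closed-form eigenvector of a 2x2 covariance matrix, and the function name and sample points are hypothetical. It is not how the notebook processes MNIST.

```python
import math


def first_principal_component(points):
    """Return the unit direction of largest variance and the 1-D projections."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Entries of the 2x2 covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum((x - mean_x) ** 2 for x, _ in points) / n
    syy = sum((y - mean_y) ** 2 for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points) / n
    # Largest eigenvalue of a symmetric 2x2 matrix, in closed form.
    largest = (sxx + syy) / 2 + math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    if sxy == 0:
        # Features are uncorrelated: the axis with more variance wins.
        direction = (1.0, 0.0) if sxx >= syy else (0.0, 1.0)
    else:
        vx, vy = sxy, largest - sxx  # eigenvector for the largest eigenvalue
        norm = math.hypot(vx, vy)
        direction = (vx / norm, vy / norm)
    # Centre each point and project it onto the principal direction.
    projected = [
        (x - mean_x) * direction[0] + (y - mean_y) * direction[1]
        for x, y in points
    ]
    return direction, projected


# Points lying exactly on the line y = 2x, so one dimension captures everything.
points = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
direction, projected = first_principal_component(points)
print(direction, projected)
```

For real datasets with many features, a numerical library is the practical choice; this version only shows the geometry of the projection.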

Ambika Choudhury
A Technical Journalist who loves writing about Machine Learning and Artificial Intelligence. A lover of music, writing and learning something out of the box.
