Kaggle Notebooks is a cloud computational environment that enables reproducible and collaborative analysis. Notebooks, previously known as kernels, help in exploring and running machine learning code. They also give access to a vast repository of public, open-source and reproducible code for data science and machine learning projects.
Currently, there are more than 460,000 public notebooks available on Kaggle. There are two types of Notebooks on Kaggle. The first type is a script, which executes everything as code sequentially; the other type is the Jupyter notebook, which consists of a sequence of cells, where each cell is formatted in either Markdown or a programming language.
Here is a list of ten top Kaggle Notebooks a data science enthusiast must know.
1| Comprehensive Data Exploration With Python
About: This notebook offers a comprehensive exploratory analysis of a dataset. It walks through understanding the problem by looking at each variable, focusing on the dependent variable, examining how the dependent and independent variables relate, cleaning the dataset (handling missing data, outliers and categorical variables) and, lastly, checking whether the data meets the assumptions required by multivariate techniques.
Know more here.
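To make the cleaning step concrete, here is a minimal sketch of filling in missing values and flagging outliers using only the Python standard library. It is not taken from the notebook: the price values are made up for illustration, and median imputation and the 1.5 x IQR rule are assumed here as one common choice among several.

```python
from statistics import median, quantiles

# Hypothetical prices for illustration only; None marks a missing value.
prices = [208500, 181500, None, 140000, 250000, 755000, None, 143000]

# Impute missing values with the median of the observed values.
observed = [p for p in prices if p is not None]
fill_value = median(observed)
imputed = [fill_value if p is None else p for p in prices]

# Flag outliers with the 1.5 * IQR rule (an assumption, not the notebook's method).
q1, _, q3 = quantiles(imputed, n=4)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [p for p in imputed if p < lower or p > upper]

print("fill value:", fill_value)
print("outliers:", outliers)
```

On a real dataset, the choice between dropping, imputing or keeping such points depends on what the variable means, which is why the notebook studies each variable first.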
2| Titanic Data Science Solutions
About: The notebook presents a typical workflow for solving data science competitions on sites like Kaggle. It follows a step-by-step approach, explaining each step and the rationale for every decision a data scientist needs to take during solution development. It involves analysing and visualising the data, both before and after wrangling.
Know more here.
3| Hello, Python
About: This notebook covers the key Python skills you need to start using Python for data science. It helps in understanding as well as levelling up the basic Python skills. The notebook includes a brief overview of Python syntax, variable assignment, and arithmetic operators.
Know more here.
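As a taste of the topics this notebook covers, the short sketch below shows variable assignment and the basic arithmetic operators. The variable names are illustrative and are not claimed to match the notebook's own examples.

```python
# Variable assignment: a name is bound to a value, with no type declaration.
spam_amount = 0
spam_amount = spam_amount + 4  # reassignment that uses the previous value

# Arithmetic operators
total = 7 + 3        # addition
true_div = 7 / 2     # true division always returns a float
floor_div = 7 // 2   # floor division rounds down to an integer
remainder = 7 % 2    # modulus: remainder after division
power = 2 ** 10      # exponentiation

print(spam_amount, total, true_div, floor_div, remainder, power)
```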
4| Introduction to Ensembling/Stacking in Python
About: This notebook serves as a primer for ensembling (combining) base learning models, in particular the variant of ensembling known as stacking. The notebook introduces ensembling in an intuitive and concise manner, using the Titanic dataset.
Know more here.
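The idea behind stacking can be sketched in plain Python: base models produce out-of-fold predictions, and a second-level model is trained on those predictions. The sketch below is a toy illustration under loud assumptions. The base learners are one-feature threshold rules and the meta-learner is a majority-vote lookup table, both hypothetical stand-ins chosen to keep the example dependency-free, and the data is invented. It is not the notebook's code.

```python
from collections import Counter, defaultdict

def fit_stump(X, y, feature):
    """Fit a one-feature threshold rule (a deliberately weak, hypothetical base learner)."""
    best = None
    for threshold in sorted({row[feature] for row in X}):
        preds = [1 if row[feature] >= threshold else 0 for row in X]
        accuracy = sum(p == t for p, t in zip(preds, y)) / len(y)
        if best is None or accuracy > best[0]:
            best = (accuracy, threshold)
    threshold = best[1]
    return lambda row: 1 if row[feature] >= threshold else 0

def fit_stacking(X, y, n_folds=2):
    n_features = len(X[0])
    # 1. Out-of-fold predictions: each row is predicted by base models
    #    that did not see it during training.
    meta_rows = [None] * len(X)
    for fold in range(n_folds):
        train_idx = [i for i in range(len(X)) if i % n_folds != fold]
        test_idx = [i for i in range(len(X)) if i % n_folds == fold]
        X_train = [X[i] for i in train_idx]
        y_train = [y[i] for i in train_idx]
        models = [fit_stump(X_train, y_train, f) for f in range(n_features)]
        for i in test_idx:
            meta_rows[i] = tuple(m(X[i]) for m in models)
    # 2. Meta-learner: majority label for each combination of base predictions.
    votes = defaultdict(Counter)
    for key, label in zip(meta_rows, y):
        votes[key][label] += 1
    table = {key: counts.most_common(1)[0][0] for key, counts in votes.items()}
    default = Counter(y).most_common(1)[0][0]
    # 3. Refit the base models on all data for use at prediction time.
    base = [fit_stump(X, y, f) for f in range(n_features)]

    def predict(row):
        key = tuple(m(row) for m in base)
        return table.get(key, default)

    return predict

# Invented toy data: the second feature separates the classes, the first is noise.
X = [(3, 10), (1, 20), (2, 90), (4, 80), (5, 15), (6, 95), (8, 85), (7, 5)]
y = [0, 0, 1, 1, 0, 1, 1, 0]
predict = fit_stacking(X, y)
print([predict(row) for row in X])
```

Training the meta-learner on out-of-fold predictions, rather than on predictions for rows the base models have already seen, is what keeps the second level from simply memorising overfitted base outputs.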
5| A Data Science Framework: To Achieve 99% Accuracy
About: In this Notebook, you will learn a framework for approaching data science problems, covering gathering and preparing data, performing exploratory analysis, tuning models with hyper-parameters, feature selection and more.
Know more here.
6| Exploring Survival on the Titanic
About: This notebook focuses on illustrative data visualisations using the Titanic dataset. It includes topics like feature engineering, predictive imputation, building a machine learning model, variable importance, sensible value imputation, etc.
Know more here.
7| Credit Fraud || Dealing with Imbalanced Datasets
About: In this notebook, you will learn how to use various predictive models to determine whether a transaction is legitimate or fraudulent. The goals of this notebook are to understand the distribution of the data, create a sub-dataframe with a 50/50 ratio of “Fraud” and “Non-Fraud” transactions, determine which classifiers to use, and create a neural network and compare its accuracy with that of the best classifier.
Know more here.
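The 50/50 sub-sample mentioned above can be built with random under-sampling: keep every fraud row and draw an equal number of non-fraud rows at random. The following is a minimal sketch, not the notebook's code. The `undersample` helper, the `Class` and `Amount` field names and the synthetic rows are all assumptions made for illustration.

```python
import random

def undersample(rows, label_key="Class", seed=42):
    """Return a shuffled 50/50 sample; assumes fraud (label 1) is the minority class."""
    rng = random.Random(seed)
    fraud = [r for r in rows if r[label_key] == 1]
    non_fraud = [r for r in rows if r[label_key] == 0]
    # Draw as many non-fraud rows as there are fraud rows, without replacement.
    sampled = rng.sample(non_fraud, k=len(fraud))
    balanced = fraud + sampled
    rng.shuffle(balanced)
    return balanced

# Synthetic transactions for illustration: 20 fraud rows out of 1,000.
rows = [{"Amount": float(i), "Class": 1 if i % 50 == 0 else 0} for i in range(1000)]
balanced = undersample(rows)
print(len(balanced), sum(r["Class"] for r in balanced))
```

Under-sampling discards most of the majority class, so a model trained this way should still be evaluated on data with the original class distribution.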
8| Coronavirus (COVID-19) Visualization & Prediction
About: This notebook aims to explore COVID-19 through data analysis and projections. It includes exploring global coronavirus cases, exploring the cases from different countries, prediction of worldwide confirmed cases, US testing data, mobility data from hotspots, etc.
Know more here.
9| Approaching (Almost) Any NLP Problem on Kaggle
About: This notebook discusses approaches to natural language processing problems on Kaggle. You will learn how to use the data to create a very basic first model and then improve it using different features. It includes topics like logistic regression, Naive Bayes, SVM, XGBoost, grid search, word vectors, LSTM, and more.
Know more here.
10| Interactive Intro to Dimensionality Reduction
About: This Notebook discusses the merits of dimensionality reduction methods. It aims to provide an introductory exposition of three methods: PCA (Principal Component Analysis), LDA (Linear Discriminant Analysis) and t-SNE (t-Distributed Stochastic Neighbour Embedding). The notebook also provides visualisations via the Plotly visualisation library. The dataset used in this Notebook is the popular MNIST (Modified National Institute of Standards and Technology) computer vision digit dataset.
Know more here.
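To illustrate what PCA does, here is a minimal, dependency-free sketch that finds the first principal component of a small set of points by power iteration on the covariance matrix, then projects the points onto it. The function name and the toy points are assumptions for illustration; the notebook itself works on MNIST with library implementations, not this code.

```python
import math

def pca_first_component(points, iterations=100):
    """Return the first principal direction and the 1-D projections of the points.

    Assumes the data is not constant; otherwise the normalisation would divide by zero.
    """
    n = len(points)
    dims = len(points[0])
    # Centre the data on the mean of each dimension.
    means = [sum(p[d] for p in points) / n for d in range(dims)]
    centered = [[p[d] - means[d] for d in range(dims)] for p in points]
    # Sample covariance matrix.
    cov = [[sum(row[i] * row[j] for row in centered) / (n - 1)
            for j in range(dims)] for i in range(dims)]
    # Power iteration converges to the eigenvector with the largest eigenvalue.
    vec = [1.0] * dims
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(dims)) for i in range(dims)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    # Project each centred point onto the principal direction (2-D reduced to 1-D).
    projections = [sum(row[d] * vec[d] for d in range(dims)) for row in centered]
    return vec, projections

# Toy points lying roughly along the line y = 2x.
points = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2), (5.0, 9.8)]
component, projected = pca_first_component(points)
print(component)
print(projected)
```

Because the points lie close to a line, a single component keeps almost all of their variance, which is the property PCA exploits when compressing high-dimensional data such as digit images.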