Top Ten Kaggle Notebooks For Data Science Enthusiasts In 2021

Kaggle Notebooks is a cloud computational environment that enables reproducible and collaborative analysis. Notebooks, previously known as kernels, help in exploring and running machine learning code. They also give access to a vast repository of public, open-source and reproducible code for data science and machine learning projects.

Currently, there are more than 460,000 public notebooks available on Kaggle. There are two types of notebooks on Kaggle. The first type is a script, which executes everything as code sequentially. The second type is a Jupyter notebook, which consists of a sequence of cells, each formatted either in Markdown or in a programming language.

Here is a list of the top ten Kaggle notebooks every data science enthusiast should know.

1| Comprehensive Data Exploration With Python 

About: This notebook offers a comprehensive analysis of data. It walks through understanding the problem by looking at each variable, studying the dependent variable, examining how the dependent and independent variables relate, cleaning the dataset (handling missing data, outliers and categorical variables) and, lastly, checking whether the data meets the assumptions required by multivariate techniques.
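Two of the steps listed above, measuring missing data and relating each independent variable to the dependent variable, can be sketched with the Python standard library alone. This is a minimal illustration, not code from the notebook: the rows and the column names (`area`, `quality`, `price`) are invented for the example.

```python
from math import sqrt

# Hypothetical toy rows (not the notebook's dataset); None marks a missing value.
rows = [
    {"area": 50, "quality": 5, "price": 100},
    {"area": 80, "quality": None, "price": 160},
    {"area": 65, "quality": 6, "price": 130},
    {"area": 120, "quality": 9, "price": 240},
    {"area": None, "quality": 7, "price": 150},
]

def missing_ratio(rows, column):
    """Fraction of rows in which `column` is missing."""
    return sum(r[column] is None for r in rows) / len(rows)

def pearson(rows, x, y):
    """Pearson correlation between columns x and y, skipping rows with gaps."""
    pairs = [(r[x], r[y]) for r in rows if r[x] is not None and r[y] is not None]
    n = len(pairs)
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    var_x = sum((a - mean_x) ** 2 for a, _ in pairs)
    var_y = sum((b - mean_y) ** 2 for _, b in pairs)
    return cov / sqrt(var_x * var_y)

# "price" plays the role of the dependent variable in this sketch.
for col in ("area", "quality"):
    print(col, missing_ratio(rows, col), round(pearson(rows, col, "price"), 3))
```

In practice a library such as pandas would handle both steps more conveniently; the plain-Python version is used here only to keep the calculation visible.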

Know more here.

2| Titanic Data Science Solutions 

About: The notebook presents a typical workflow for solving data science competitions on sites like Kaggle. It proceeds step by step, explaining each step and the rationale for every decision a data scientist needs to take during solution development. It involves analysing and visualising data, analysing data before and after wrangling, etc.

Know more here.

3| Hello, Python

About: This notebook covers the key Python skills you need to start using Python for data science. It helps you learn and level up basic Python skills. The notebook includes a brief overview of Python syntax, variable assignment, and arithmetic operators.
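The three topics named above fit in a few lines. The following is a minimal sketch written for this article, not an excerpt from the notebook, and the variable names are invented for illustration.

```python
# Variable assignment: a name is bound to a value, with no type declaration.
total_minutes = 135

# Arithmetic operators (all names here are illustrative).
hours = total_minutes // 60     # floor division
minutes = total_minutes % 60    # remainder after division
half = total_minutes / 2        # true division always returns a float
squared = hours ** 2            # exponentiation

print(hours, minutes, half, squared)
```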

Know more here.

4| Introduction to Ensembling/Stacking in Python 

About: This notebook serves as a primer for ensembling (combining) base learning models, in particular the variant of ensembling known as stacking. The notebook explains ensembling in an intuitive and concise manner and uses the Titanic dataset.

Know more here.

5| A Data Science Framework: To Achieve 99% Accuracy

About: In this notebook, you will learn a framework for data science projects that covers gathering and preparing data, performing exploratory analysis, tuning models with hyperparameters, feature selection and more.

Know more here.

6| Exploring Survival on the Titanic

About: This notebook focuses on illustrative data visualisations using the Titanic dataset. It includes topics like feature engineering, predictive imputation, building a machine learning model, variable importance, sensible value imputation, etc.

Know more here.

7| Credit Fraud || Dealing with Imbalanced Datasets

About: In this notebook, you will learn how to use various predictive models to determine whether a transaction is legitimate or fraudulent. The goals of this notebook are to understand the distribution of the data, create a sub-dataframe with a 50/50 ratio of “Fraud” and “Non-Fraud” transactions, choose the classifiers to use, build a neural network and compare its accuracy with that of the best classifier, etc.
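One simple way to build a 50/50 subset like the one described above is random undersampling: keep every row of the rare class and draw an equally sized random sample from the common class. The sketch below assumes that approach; the function name `undersample`, the `is_fraud` field and the toy records are hypothetical and do not come from the notebook.

```python
import random

def undersample(records, label_key="is_fraud", seed=0):
    """Return a shuffled 50/50 subset: every minority-class row plus an
    equally sized random sample of majority-class rows."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    fraud = [r for r in records if r[label_key] == 1]
    normal = [r for r in records if r[label_key] == 0]
    minority, majority = sorted((fraud, normal), key=len)
    balanced = minority + rng.sample(majority, len(minority))
    rng.shuffle(balanced)
    return balanced

# Hypothetical data: 5 fraudulent transactions among 1,000.
records = [{"amount": i, "is_fraud": int(i % 200 == 0)} for i in range(1000)]
balanced = undersample(records)
print(len(balanced), sum(r["is_fraud"] for r in balanced))
```

Undersampling discards most of the majority class, so a model trained on the balanced subset should still be evaluated on data with the original class ratio.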

Know more here.

8| Coronavirus (COVID-19) Visualization & Prediction

About: This notebook aims to explore COVID-19 through data analysis and projections. It covers global coronavirus cases, cases from different countries, predictions of worldwide confirmed cases, US testing data, mobility data from hotspots, etc.

Know more here.

9| Approaching (Almost) Any NLP Problem on Kaggle

About: This notebook discusses approaches to natural language processing problems on Kaggle. You will learn how to use the data to create a very basic first model and then improve it using different features. It includes topics like logistic regression, Naive Bayes, SVM, XGBoost, grid search, word vectors, LSTM, and more.
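To give a feel for what a "very basic first model" can look like, here is a minimal multinomial Naive Bayes text classifier in plain Python with add-one smoothing. It is a sketch for illustration, not the notebook's code, which is not reproduced here. The helper names and the four-sentence corpus are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(docs):
    """docs: list of (text, label) pairs. Returns the counts used for prediction."""
    word_counts = defaultdict(Counter)  # label -> word -> count
    label_counts = Counter()            # label -> number of documents
    vocab = set()
    for text, label in docs:
        words = text.lower().split()
        word_counts[label].update(words)
        label_counts[label] += 1
        vocab.update(words)
    return word_counts, label_counts, vocab

def predict(model, text):
    """Return the label with the highest log-probability (add-one smoothing)."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total_docs)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.lower().split():
            score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy corpus, not the competition data.
docs = [
    ("the ship sank in the storm", "sea"),
    ("waves crashed over the deck", "sea"),
    ("the model overfit the training data", "ml"),
    ("tune the learning rate of the model", "ml"),
]
model = train_naive_bayes(docs)
print(predict(model, "the storm hit the ship"))
print(predict(model, "training the model"))
```

Whitespace tokenisation and raw word counts are the simplest possible features; richer features are where the improvements mentioned above would come from.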

Know more here.

10| Interactive Intro to Dimensionality Reduction 

About: This notebook discusses the merits of dimensionality reduction methods. It aims to provide an introductory exposition of three methods: PCA (Principal Component Analysis), LDA (Linear Discriminant Analysis) and t-SNE (t-Distributed Stochastic Neighbour Embedding). The notebook also includes visualisations built with the Plotly visualisation library. The dataset used in this notebook is the popular MNIST (Modified National Institute of Standards and Technology) computer vision digit dataset.

Know more here.

Ambika Choudhury
A Technical Journalist who loves writing about Machine Learning and Artificial Intelligence. A lover of music, writing and learning something out of the box.
