Tutorial On Missingno – Python Tool To Visualize Missing Values

The purpose of this article is to get a better understanding of missing data by visualizing it with Missingno.

Anyone working in data science understands the importance of data: it is the fuel for a machine learning model. But raw, real-world data cannot be used until it has been pre-processed into a usable format. One of the most common problems with real-world data is missing values: some entries in the rows and columns simply do not exist. For good model training, we need the data to be as clean as possible.

Missing values are generally represented as NaN, which stands for Not a Number. Although the pandas library provides methods to impute these missing values, we first need to understand how many NaN values there are and where they are distributed in the dataset. The open-source Python library Missingno was created for this purpose.


What is Missingno?

Missingno is a Python library that helps you understand the distribution of missing values through informative visualizations, such as matrix plots, heatmaps and bar charts. With this library, it is possible to see where the missing values occur and to check how missingness in one column correlates with missingness in another. Missing values are better handled once the dataset has been fully explored. Let us now put the library to work and find out how it helps us pre-process the data better.

Implementation of Missingno

The first step in implementing this is to install the library using the pip command as follows:

pip install missingno

Once it is installed, let us select a dataset that contains missing values. I have selected the Life Expectancy dataset from Kaggle, which is used to estimate average human life expectancy based on geographical location, health expenditure, disease and so on. The dataset can be downloaded from Kaggle.

Loading the dataset

Let us now import some of the libraries and load our dataset. 

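from google.colab import drive
import numpy as np
import pandas as pd

# Mount Google Drive so that the CSV path under /content/gdrive exists (Colab only)
drive.mount("/content/gdrive")
life_expentancy = pd.read_csv("/content/gdrive/My Drive/Life Expectancy Data.csv")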

Now, let us count the missing values in each column using the isnull method of pandas.
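The counting step can be sketched as follows. This is a minimal illustration on a small made-up DataFrame (the values and the name `df` are assumptions standing in for the Life Expectancy data), not the original notebook cell.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the Life Expectancy CSV
df = pd.DataFrame({
    "Country": ["A", "B", "C", "D"],
    "GDP": [1200.0, np.nan, 560.0, np.nan],
    "Population": [3.1e6, 8.4e6, np.nan, 1.2e6],
})

# isnull() marks every missing cell as True; sum() counts the Trues per column
missing_per_column = df.isnull().sum()
print(missing_per_column)
```

On the real dataset, the same call would be `life_expentancy.isnull().sum()`.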



We can now see that some values are missing. It is time to visualize them using the library.

Visualization of missing values

  1. Matrix

import missingno as msno
msno.matrix(life_expentancy)


The matrix spans all 2938 rows of the dataset. The white lines indicate the missing values in each column. The Hepatitis B, population and GDP columns appear to have the most missing values. On closer inspection, you can also notice a few patterns. For example, rows that are missing a value in the BMI column are also missing a value in the thinness 1-19 years column. Similarly, when values are missing from the GDP column, the income column is also missing them for the same rows. These patterns give an idea of how missingness in the features is related. But to get a better idea of these correlations, we need to use a heatmap.
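A pattern spotted by eye in the matrix can also be checked numerically. The sketch below uses a small made-up DataFrame in which two columns are missing on the same rows. The column names mirror the article, but the values and the helper `show_matrix` are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: "BMI" and "thinness" are missing on the same rows
df = pd.DataFrame({
    "BMI": [21.5, np.nan, 19.0, np.nan, 24.1],
    "thinness": [1.2, np.nan, 0.8, np.nan, 2.0],
    "GDP": [1200.0, 900.0, np.nan, 450.0, 3000.0],
})

# Rows where BMI is missing, and how many of those also lack thinness
bmi_missing = df["BMI"].isnull()
both_missing = (bmi_missing & df["thinness"].isnull()).sum()
print(f"BMI missing: {bmi_missing.sum()}, thinness also missing: {both_missing}")


def show_matrix(frame):
    """Draw the nullity matrix; white gaps mark missing cells."""
    # Imported here so the check above runs without the plotting stack
    import missingno as msno
    return msno.matrix(frame)
```

Calling `show_matrix(df)` in a notebook draws the matrix for the toy data, where the gaps in the two columns line up.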

  2. Heatmap



In the heatmap, blue indicates a positive correlation: the darker the shade of blue, the stronger the correlation. Note that the correlation here is between the missingness of two columns, not between their values. The map shows that total expenditure and alcohol have the highest correlation, 0.9. It also confirms our initial intuition that the GDP and income columns are positively correlated, so how we treat the missing values in these two columns can affect predictions of the target.
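To make the idea of correlated missingness concrete, the sketch below computes it directly with pandas by correlating the True/False missing masks of each column. As far as I understand, this is the quantity the Missingno heatmap visualizes, but treat that as an assumption. The toy values and the helper `show_heatmap` are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: GDP and income are missing on the same rows
df = pd.DataFrame({
    "GDP": [1200.0, np.nan, 560.0, np.nan, 3000.0, 720.0],
    "income": [0.6, np.nan, 0.4, np.nan, 0.8, 0.5],
    "Alcohol": [np.nan, 2.1, 0.5, 1.4, np.nan, 3.3],
})

# Correlate the missingness masks, column against column
nullity_corr = df.isnull().corr()
print(nullity_corr.round(2))


def show_heatmap(frame):
    """Draw the Missingno heatmap for the given DataFrame."""
    # Imported here so the calculation above runs without the plotting stack
    import missingno as msno
    return msno.heatmap(frame)
```

A value near 1 means two columns tend to be missing on the same rows, while a negative value means that when one is missing the other tends to be present.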

Another way to visualize the data for missing values is by using bar plots. 

  3. Bar Plot


The height of each bar is proportional to the number of non-missing values in that column, and the count of values present is also shown. Since the dataset has 2938 data points in total, any column with fewer than 2938 values contains missing data.
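The counts behind the bars can be reproduced with pandas. The following is a minimal sketch on made-up data, where the values and the helper `show_bar` are hypothetical. It lists the columns whose non-missing count falls below the total number of rows.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the Life Expectancy CSV
df = pd.DataFrame({
    "Year": [2012, 2013, 2014, 2015],
    "GDP": [1200.0, np.nan, 560.0, np.nan],
    "Population": [3.1e6, 8.4e6, np.nan, 1.2e6],
})

total_rows = len(df)
# Number of values actually present in each column
non_missing = df.notnull().sum()
# Columns with fewer values than rows must contain missing data
has_gaps = non_missing[non_missing < total_rows].index.tolist()
print(non_missing)
print("Columns with missing values:", has_gaps)


def show_bar(frame):
    """Draw the Missingno bar plot for the given DataFrame."""
    # Imported here so the counts above run without the plotting stack
    import missingno as msno
    return msno.bar(frame)
```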


In this article, we saw how to visualize missing data graphically and understand the relationships that exist among the different columns. Missingno helps in understanding the structure of a dataset with very few lines of code. With this information, it becomes easier and more efficient to use pandas to either impute the missing values or drop them, which can increase the overall accuracy of the model.
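As a closing illustration, the two pandas options mentioned above might look like this. It is a sketch on made-up data, and median imputation is just one possible choice, not a recommendation from the article.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with gaps in both columns
df = pd.DataFrame({
    "GDP": [1200.0, np.nan, 560.0, 3000.0],
    "Population": [3.1e6, 8.4e6, np.nan, 1.2e6],
})

# Option 1: impute each numeric column with its own median
imputed = df.fillna(df.median(numeric_only=True))

# Option 2: drop every row that has at least one missing value
dropped = df.dropna()

print(imputed)
print(dropped)
```

Which option is appropriate depends on how much data is missing and on whether the gaps follow a pattern, which is what the plots above help reveal.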

Bhoomika Madhukar
I am an aspiring data scientist with a passion for teaching. I am a computer science graduate from Dayananda Sagar Institute. I have experience in building models in deep learning and reinforcement learning. My goal is to use AI in the field of education to make learning meaningful for everyone.
