
Hands-On Tutorial On ExploriPy: Effortless Target Based EDA Tool

In this article, we will explore ExploriPy to perform EDA on a dataset and derive useful insights.


Exploratory Data Analysis (EDA) is the first step to perform on a dataset in order to understand the properties of its attributes. EDA tells us which columns the data has, what values those columns hold, what their datatypes are, and so on. EDA also helps in visualizing the relationships between different columns/attributes in the dataset.

Exploratory data analysis is an approach to analyzing datasets in order to summarize their main characteristics with both statistical and visual methods. To perform EDA in Python, we generally use several libraries such as pandas, NumPy, and matplotlib to examine different properties of the dataset. Using all these libraries for EDA takes a lot of time and effort.
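To give a sense of the manual effort involved, here is a minimal sketch of a hand-written summary using pandas alone. The data frame, its values, and the helper name manual_summary are made up for illustration and are not part of the dataset or of any library.

```python
import pandas as pd

# Illustrative data only: a tiny stand-in, not the dataset used in this article.
df = pd.DataFrame({
    "TV": [100.0, 50.0, 20.0, 150.0],
    "Radio": [30.0, 40.0, 45.0, None],
    "Channel": ["online", "retail", "retail", "online"],
})


def manual_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Collect dtype, missing-value count and unique-value count per column."""
    return pd.DataFrame({
        "dtype": frame.dtypes.astype(str),
        "missing": frame.isnull().sum(),
        "unique": frame.nunique(),
    })


summary = manual_summary(df)
print(summary)

# Descriptive statistics for the numeric columns only.
print(df.describe())
```

Each further question (distributions, correlations, plots) needs more code of this kind, which is the repetition an automated report is meant to remove.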

ExploriPy is an open-source Python library that makes EDA much easier. It automates the process and saves time that can be spent on other tasks. It works in just a few lines of code, so no extensive coding experience is required to use it.


Implementation of ExploriPy

Like any other library, we will start by installing ExploriPy using pip install ExploriPy.

  1. Importing required libraries

We will use pandas to load the dataset and ExploriPy for the EDA, so we import both.

from ExploriPy import EDA

import pandas as pd

  2. Loading the dataset

In this article, we will use an advertisement dataset of an MNC in which Sales is the target variable, which depends on features such as ‘TV’, ‘Radio’, etc.

df = pd.read_csv('Advertisement.csv')

df.head()
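Before running a target-based analysis, it can help to confirm that the target column was actually loaded and is numeric. The sketch below assumes a CSV with TV, Radio, Newspaper, and Sales columns; the inline CSV text and its values are hypothetical stand-ins for Advertisement.csv so that the example runs on its own.

```python
import io

import pandas as pd

# Hypothetical CSV contents standing in for Advertisement.csv.
csv_text = """TV,Radio,Newspaper,Sales
100.0,30.0,60.0,20.0
50.0,40.0,45.0,10.0
20.0,45.0,70.0,9.0
"""

df = pd.read_csv(io.StringIO(csv_text))

target = "Sales"

# Fail early if the target column is missing from the file.
if target not in df.columns:
    raise KeyError(f"Target column {target!r} not found in {list(df.columns)}")

# A continuous target such as Sales should be read as a numeric dtype.
target_is_numeric = pd.api.types.is_numeric_dtype(df[target])

# Every other column is treated as a candidate feature.
features = [col for col in df.columns if col != target]

print(df.shape, features, target_is_numeric)
```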

  3. Exploratory Data Analysis

For exploratory data analysis, we will use ExploriPy. Since Sales is the target variable in the advertising dataset, we pass it as the target.

# Continuous feature columns (listed for reference; not passed to EDA below)
ContinuousFeatures = ['Radio', 'Newspaper', 'TV']

analysis = EDA(df, title='EDA for Sales Data')

analysis.TargetAnalysis('Sales')

Running the above commands generates an EDA report with Sales as the target variable.

Now let us explore different sections of the EDA report.

  • Home Page (Target Specific EDA)

This page lists the attributes of the dataset along with their datatypes. It also shows a pie chart of how the attributes split between categorical and continuous variables.
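The split shown in the pie chart can be approximated by hand. This is a rough sketch that treats numeric columns as continuous and everything else as categorical; that rule is an assumption for illustration, and ExploriPy may classify columns differently. The data frame and the Region column are made up.

```python
import pandas as pd

# Illustrative data only; "Region" is a hypothetical categorical column.
df = pd.DataFrame({
    "TV": [100.0, 50.0, 20.0, 150.0],
    "Radio": [30.0, 40.0, 45.0, 10.0],
    "Region": ["north", "south", "south", "north"],
    "Sales": [20.0, 10.0, 9.0, 18.0],
})

# Assumed rule: numeric dtype means continuous, anything else is categorical.
continuous = df.select_dtypes(include="number").columns.tolist()
categorical = df.select_dtypes(exclude="number").columns.tolist()

# The counts a pie chart of variable types would be drawn from.
type_counts = pd.Series(
    {"continuous": len(continuous), "categorical": len(categorical)}
)
print(type_counts)
```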

  • Null Values

This segment of the report shows which attributes have missing (null) data and represents it as bar charts. The dataset we used has no missing data, so it shows Null Percentage = 0%.
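A null percentage per column can be computed directly in pandas, as in the sketch below. The data frame is made up and deliberately contains missing values so that the result is not all zeros, unlike the dataset used in this article.

```python
import pandas as pd

# Illustrative data with deliberately missing entries.
df = pd.DataFrame({
    "TV": [100.0, None, 20.0, 150.0],
    "Radio": [30.0, 40.0, 45.0, 10.0],
    "Sales": [20.0, 10.0, None, None],
})

# isnull() marks missing cells as True; the column mean of those
# booleans is the fraction of missing rows, scaled here to percent.
null_pct = df.isnull().mean() * 100
print(null_pct)
```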

  • Statistics of Target Variable

Here we can see the statistical properties of the target variable, including skewness and kurtosis, along with the number of records it contains.
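Comparable statistics can be obtained from pandas, as sketched below on made-up values. Note that pandas reports excess kurtosis (a normal distribution scores 0); whether ExploriPy uses the same convention is not confirmed here.

```python
import pandas as pd

# Illustrative target values with one large value on the right.
sales = pd.Series([9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 30.0], name="Sales")

# count, mean, std, min, quartiles and max.
stats = sales.describe()

# Skewness: positive when the right tail is longer.
stats["skew"] = sales.skew()

# Kurtosis as computed by pandas (excess kurtosis, normal = 0).
stats["kurtosis"] = sales.kurt()

print(stats)
```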

  • Distribution of Data in Target Variable

In this section, we can see how the Sales data is distributed with the help of three visualizations: a box plot, a KDE plot, and a histogram.
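The numbers behind two of these plots can be reproduced without drawing anything. The sketch below, on made-up values, computes histogram bin counts with NumPy and the quartiles and 1.5 × IQR fences that a conventional box plot uses to flag outliers. The KDE plot is left out, and the bin count of 3 is an arbitrary choice for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative target values with one large value on the right.
sales = pd.Series([2.0, 4.0, 4.0, 5.0, 6.0, 7.0, 8.0, 20.0], name="Sales")

# Histogram: number of values falling in each of 3 equal-width bins.
counts, edges = np.histogram(sales, bins=3)

# Box plot: quartiles, interquartile range and the usual outlier fences.
q1, median, q3 = sales.quantile([0.25, 0.5, 0.75])
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = sales[(sales < lower_fence) | (sales > upper_fence)]

print(counts, edges)
print(q1, median, q3, outliers.tolist())
```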

  • Distribution of Feature Variable

This is the last section of the report, which shows the distribution of the feature variables.

Similarly, we can generate Target Specific EDA reports for different datasets.

Conclusion

In this article, we created an EDA report using ExploriPy, a Python library, and explored the sections of the report covering the distribution and spread of the data. ExploriPy is easy to use and generates reports quickly, which saves time.


Himanshu Sharma

An aspiring Data Scientist currently Pursuing MBA in Applied Data Science, with an Interest in the financial markets. I have experience in Data Analytics, Data Visualization, Machine Learning, Creating Dashboards and Writing articles related to Data Science.