Exploratory Data Analysis (EDA) is the process of inspecting data in order to understand what it contains. It is largely a visual method: we create plots and graphs to find the patterns, anomalies, and outliers in the data. It is an important step because it helps us analyze the relationships between attributes as well as the properties of each individual attribute of the dataset.
Although it is an important step, Exploratory Data Analysis generally takes a lot of time because we need to write code to analyze and visualize the data. What if we could automate this process?
Pandas Visual Analysis is an open-source Python library for visually analyzing data in a single line of code. It creates a user interface in which we can build different plots and graphs from the attributes of a dataset. It supports a wide variety of plots, and all of them are created with Plotly, so they are interactive, visually appealing, and easy to download.
In this article, we will see how to automate the process of visual analysis using Pandas Visual Analysis.
Implementation:
We will start by installing Pandas Visual Analysis with pip: pip install pandas-visual-analysis
- Importing Required Libraries
We will import VisualAnalysis from pandas_visual_analysis for the analysis itself and pandas for loading the data. We will also import seaborn, because one of the datasets we use, named tips, ships with it.
import pandas as pd
from pandas_visual_analysis import VisualAnalysis
import seaborn as sns
- Loading the dataset
We will explore Pandas Visual Analysis using two different datasets. The first, named tips, is loaded from seaborn and contains restaurant data with attributes like 'total_bill', 'tip', etc. The second is the sales data of a multinational company (MNC), read from a CSV file, with attributes like 'Sales', 'TV', etc.
df1 = sns.load_dataset('tips')
df1
df2 = pd.read_csv('Advertising.csv')
df2
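Before opening the interface, it can help to confirm that each DataFrame loaded with the columns and types we expect. The following is a minimal sketch of such a check. It uses a small hand-made frame instead of the real datasets so that it runs without any downloads; the values are made up and the helper name overview is hypothetical, not part of pandas or Pandas Visual Analysis.

```python
import pandas as pd

# Illustrative stand-in for a loaded dataset; the values are made up.
sample = pd.DataFrame(
    {
        "total_bill": [16.99, 10.34, 21.01, 23.68],
        "tip": [1.01, 1.66, 3.50, None],
        "day": ["Sun", "Sun", "Sat", "Sat"],
    }
)


def overview(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column: its dtype, missing count and distinct count."""
    return pd.DataFrame(
        {
            "dtype": df.dtypes.astype(str),
            "missing": df.isna().sum(),
            "distinct": df.nunique(),  # NaN is not counted as a distinct value
        }
    )


report = overview(sample)
print(report)
```

The same helper could be called as overview(df1) or overview(df2) once the real data is loaded.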
- Visual Analysis
This is the final step. When run in a Jupyter notebook cell, it loads our data into a graphical user interface that offers a variety of plots and lets us select which attributes to visualize.
VisualAnalysis(df1)
Here you can see that we have created an interface with different sections for analyzing and visualizing the dataset. Even though the dataset is multivariate, Pandas Visual Analysis built the interface from a single call. Let us look at the different sections.
- Statistical Analysis
The first section helps us analyze the statistical properties of the data. It shows metrics like the mean, median, and quartiles for all the numerical attributes.
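For comparison, similar summary statistics can be computed directly in pandas with describe(). This is only a sketch of the idea, not how Pandas Visual Analysis computes its panel; the frame below is a made-up stand-in that borrows two column names from the tips dataset.

```python
import pandas as pd

# Made-up values; only the column names mirror the tips dataset.
tips_like = pd.DataFrame(
    {
        "total_bill": [10.0, 20.0, 30.0, 40.0],
        "tip": [1.0, 2.0, 3.0, 6.0],
    }
)

# describe() returns count, mean, std, min, quartiles and max
# for every numerical column.
summary = tips_like.describe()

# Keep only the metrics mentioned above: mean, quartiles and median (50%).
print(summary.loc[["mean", "25%", "50%", "75%"]])
```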
- Distribution using Scatter Plot
In this section, we can pick two attributes and use a scatter plot to examine their joint distribution and the relationship between them.
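A relationship that looks linear in a scatter plot can be backed up with a number. The sketch below computes the Pearson correlation between two columns with pandas. It is a complement to the plot rather than something the interface is known to report, and the data is made up for illustration.

```python
import pandas as pd

# Made-up values in which the tip grows roughly with the bill.
df = pd.DataFrame(
    {
        "total_bill": [10.0, 20.0, 30.0, 40.0, 50.0],
        "tip": [1.5, 3.0, 4.0, 6.5, 7.0],
    }
)

# Series.corr uses the Pearson correlation coefficient by default.
r = df["total_bill"].corr(df["tip"])
print(f"Pearson correlation between total_bill and tip: {r:.3f}")
```

A value close to 1 or -1 suggests a strong linear relationship, while a value near 0 suggests little or no linear relationship.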
- Distribution Using Histogram
In this section, we can analyze the distribution of a single attribute using a histogram.
These are the different views we can use to analyze a dataset with Pandas Visual Analysis.
Similarly, we can create the same interface for the second dataset and explore its sections using the same VisualAnalysis call.
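A histogram counts how many values fall into each of a set of bins. The sketch below reproduces that counting step with pandas so the idea is concrete. The bin edges and the values are chosen purely for illustration, and the bins Plotly picks in the interface may differ.

```python
import pandas as pd

# Made-up tip amounts.
tips = pd.Series([1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 5.0, 9.0], name="tip")

# Three equal-width bins; pd.cut uses right-closed intervals by default,
# i.e. (0, 3], (3, 6] and (6, 9].
bins = [0, 3, 6, 9]
counts = pd.cut(tips, bins=bins).value_counts(sort=False)

# Each row is one bar of the histogram: the interval and its height.
print(counts)
```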
VisualAnalysis(df2)
Conclusion:
In this article, we saw how to analyze and visualize different properties of data using Pandas Visual Analysis. We created an interface for plotting the attributes of a dataset against each other, and we examined the statistical properties of its numerical attributes. Pandas Visual Analysis can reduce the time and effort needed for Exploratory Data Analysis.