Insane Exploratory Data Analysis Libraries

To perform EDA on a dataset, one usually needs to be well versed in Python visualization libraries such as seaborn, matplotlib and plotly, building graphs to draw insights from the data. Finding those insights is a preliminary step of any data science or machine learning project, because the next step, feature selection, depends on the results derived from EDA. EDA therefore plays a crucial role in determining the accuracy of such projects.

In this blog, we shall find easier ways of performing EDA on any dataset by using some automated libraries.


1. Dtale

The first step is to install the library by running the command

!pip install dtale

in the Anaconda prompt or in the console itself (the leading ! is only needed inside a notebook cell). Once the dtale library is installed, import the usual working libraries and load the titanic dataset from seaborn using .load_dataset.

Once the dataset is loaded, pass it to dtale with dtale.show(df); this displays the data frame in a dtale window. Clicking the dropdown button opens a menu of operations, ranging from Pearson's correlation between columns to 2D and 3D plots and outlier detection, which covers most common EDA tasks.
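The loading and display steps described above can be sketched roughly as follows. This is a minimal sketch, not the original notebook code: the helper names load_titanic and open_in_dtale are made up for illustration, and the imports sit inside the functions only so the snippet can be loaded before the libraries are installed.

```python
def load_titanic():
    """Load the titanic sample dataset through seaborn (downloads it on first use)."""
    import seaborn as sns
    return sns.load_dataset("titanic")


def open_in_dtale(df):
    """Start a D-Tale session for df and return the running instance."""
    import dtale
    # dtale.show starts a local web server and returns a handle to it;
    # in a notebook, displaying this handle embeds the grid in the cell output.
    return dtale.show(df)


# Typical use in a notebook cell (hypothetical variable names):
#   df = load_titanic()
#   session = open_in_dtale(df)
#   session.open_browser()  # optional: open the grid in a new browser tab
```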

One especially useful feature of this library is that the source code for an operation can be copied through the Code Export option. For example, the code behind the Pearson's correlation can be viewed by clicking the <>Code Export button.
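The exact code that dtale exports is not reproduced here. As a point of comparison, a Pearson correlation matrix can be computed in plain pandas along these lines; the function name pearson_matrix and the tiny example frame are assumptions made for illustration.

```python
import pandas as pd


def pearson_matrix(df: pd.DataFrame) -> pd.DataFrame:
    """Return the Pearson correlation matrix of the numeric columns of df."""
    # Keep only numeric columns; text columns such as names cannot be correlated.
    numeric = df.select_dtypes(include="number")
    return numeric.corr(method="pearson")


# Small made-up frame: b is exactly twice a, so their correlation is 1.
example = pd.DataFrame({"a": [1, 2, 3], "b": [2, 4, 6], "label": ["x", "y", "z"]})
print(pearson_matrix(example))
```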


2. Pandas Profiling

Pandas Profiling is another automated library that can perform EDA, although its feature set and performance are more limited than dtale's. To install this library, run the command

!pip install pandas-profiling

either in the console or in the Anaconda prompt. Once the installation is done, load the tips dataset from seaborn and check its columns.

Then import ProfileReport from the installed pandas profiling library, build a report from the data frame, and save the output as an HTML file. The HTML file is saved in the working directory. Opening it shows the report, and the various EDA views can be explored by navigating through the available options.
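A rough sketch of those steps is given below. The helper name build_profile_report, the report title and the default file name output.html are assumptions, not taken from the original notebook. Note also that newer releases of this library are published as ydata-profiling (imported as ydata_profiling); the import here follows the package name used in this article.

```python
def build_profile_report(df, output_path="output.html"):
    """Profile df with pandas-profiling and write the report to an HTML file."""
    from pandas_profiling import ProfileReport
    # Build the report object; the title is only a label shown at the top of the page.
    profile = ProfileReport(df, title="Tips dataset profile")
    # A relative path is written to the current working directory.
    profile.to_file(output_path)
    return output_path


# Typical use (assumes seaborn is installed and can download the sample data):
#   import seaborn as sns
#   df = sns.load_dataset("tips")
#   print(df.columns)
#   build_profile_report(df)
```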


3. Sweetviz

Sweetviz is another handy automated EDA library. Here we again load the titanic dataset from seaborn. Before that, the sweetviz library needs to be installed, which can be done by running the

!pip install sweetviz

command in the console or in the anaconda prompt.

Running two lines of code, one to analyze the data frame and one to render the result, opens an HTML page named output_report.html.

Like the other automated EDA libraries, the output_report.html report is capable of high-level EDA on the whole dataset.
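The two lines in question probably look something like the body of the function below. This is a sketch under that assumption; the wrapper name build_sweetviz_report is made up, while analyze and show_html are the sweetviz calls for building and rendering a report.

```python
def build_sweetviz_report(df, output_path="output_report.html"):
    """Analyze df with sweetviz and render the result as an HTML page."""
    import sweetviz as sv
    # Line one: compute the per-column summaries and associations.
    report = sv.analyze(df)
    # Line two: write the HTML file (by default it is also opened in the browser).
    report.show_html(output_path)
    return output_path


# Typical use (assumes seaborn is installed and can download the sample data):
#   import seaborn as sns
#   build_sweetviz_report(sns.load_dataset("titanic"))
```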


4. AutoViz

As with the other libraries, we first need to install it

!pip install autoviz

and then run it on a dataset. Here we are again using the titanic dataset for performing EDA.

AutoViz_Class is imported from the autoviz package and instantiated as the object AV.

Since I'm loading the dataset locally, I have assigned its path to filename; it could also be loaded from seaborn. Running this code produces a series of basic EDA charts.
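Put together, the AutoViz steps might look like the sketch below. The wrapper name run_autoviz and the file name titanic.csv are placeholders; the actual local path used in the notebook is not given in this post.

```python
def run_autoviz(filename):
    """Run AutoViz on a local CSV file and return whatever AutoViz returns."""
    from autoviz.AutoViz_Class import AutoViz_Class
    # Instantiate the class as AV, following the naming used in this article.
    AV = AutoViz_Class()
    # Passing only the file path keeps every other option at its default.
    return AV.AutoViz(filename)


# Typical use (the path is a placeholder for a local copy of the dataset):
#   filename = "titanic.csv"
#   run_autoviz(filename)
```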

You can go through my Jupyter notebook here and try out the different automated EDA libraries yourself. Share the conclusions you draw from them in the comments, and if my own approach missed any useful insights, do share those too.

V. Nanda Gopal
Final year undergraduate at NIT Raipur with a keen interest in data science, machine learning, and deep learning. Loves learning new skills and solving real-life problems using AI/ML.
