Insane Exploratory Data Analysis Libraries

To perform EDA on any dataset, one must be well versed in some of the Python visualization libraries, such as seaborn, matplotlib and plotly, which produce the graphs used to find insights in the data. Finding insights is a preliminary step of any data science or machine learning project, because the step that follows it, feature selection, depends on the results derived from EDA. EDA therefore plays a crucial role in determining the accuracy of any data science or machine learning project.

In this blog, we shall look at easier ways of performing EDA on any dataset using some automated libraries.


1. Dtale

The first step is to install the library by running the command


!pip install dtale




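in a Jupyter notebook or console, or in the Anaconda prompt (there, drop the leading !, which is notebook syntax for running a shell command). Once the dtale library is installed, import the common libraries (pandas, seaborn and so on) and load the titanic dataset from seaborn using its load_dataset function.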
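A minimal sketch of this loading step, assuming seaborn is installed. seaborn.load_dataset downloads the example CSV from the seaborn-data repository the first time it is used, so an internet connection may be needed. The loader is wrapped in a small function (load_titanic, a name chosen here for illustration) so that seaborn is only imported when the function is called.

```python
def load_titanic():
    """Return seaborn's example titanic dataset as a pandas DataFrame."""
    # Imported inside the function so this cell runs even before seaborn is installed.
    import seaborn as sns

    # load_dataset fetches the CSV on first use and caches it locally by default.
    return sns.load_dataset("titanic")


# Usage in a notebook:
# df = load_titanic()
# df.head()
```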

Once the dataset is loaded, pass it to dtale with dtale.show(df), which displays the DataFrame in a D-Tale window. The dropdown menu offers a wide range of EDA operations, from computing Pearson's correlation between columns to plotting 2D and 3D charts and detecting outliers in the data.
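The embedding step might look like the sketch below, assuming dtale is installed and df is a pandas DataFrame such as the titanic data. The helper name explore_with_dtale is hypothetical; dtale.show and the open_browser method of the instance it returns are part of D-Tale's Python API.

```python
def explore_with_dtale(df):
    """Open a pandas DataFrame in the D-Tale UI and return the D-Tale instance."""
    import dtale  # installed with: pip install dtale

    d = dtale.show(df)  # starts a local D-Tale server that hosts df
    d.open_browser()    # opens the D-Tale window in the default web browser
    return d
```

In a Jupyter notebook, leaving dtale.show(df) as the last expression of a cell also renders the D-Tale grid inline, so the open_browser call is optional there.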

One standout feature of this library is that the source code behind any operation can be copied through its Code Export option. For example, the code behind the Pearson's correlation can be found by clicking the <> Code Export button.
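For comparison, a Pearson correlation matrix can also be computed directly in pandas. This is not the code D-Tale exports, only a sketch of the same calculation; the helper name pearson_matrix is hypothetical. Only numeric columns are kept, since Pearson's coefficient is defined for numeric data.

```python
import pandas as pd


def pearson_matrix(df: pd.DataFrame) -> pd.DataFrame:
    """Pairwise Pearson correlation between the numeric columns of df."""
    # select_dtypes drops text, categorical and boolean columns.
    numeric = df.select_dtypes(include="number")
    return numeric.corr(method="pearson")
```

On the titanic data, pearson_matrix(df) would return a square table over the numeric columns, such as age and fare.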


2. Pandas Profiling

Pandas Profiling is another automated library that can perform EDA, but it is more limited: compared to dtale, it offers fewer operations and lower performance. To install this library, run the command

!pip install pandas-profiling

either in the console or in the Anaconda prompt. Once the installation is done, load the tips dataset from seaborn and inspect its columns.

Next, we import ProfileReport from the installed pandas profiling library and save the report as an HTML file, which is written to the current working directory. Various EDA views can then be explored by navigating through the options available in the report.
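A sketch of the profiling step, assuming seaborn and the profiling package are installed. The pandas-profiling project has since been renamed ydata-profiling, so the import below tries the new package name first and falls back to the old one; both expose ProfileReport with a to_file method. The function name profile_tips and the output file name tips_report.html are hypothetical.

```python
def profile_tips(output_path="tips_report.html"):
    """Build a profiling report for seaborn's tips dataset and save it as HTML."""
    import seaborn as sns

    # Newer releases ship as ydata-profiling, older ones as pandas-profiling.
    try:
        from ydata_profiling import ProfileReport
    except ImportError:
        from pandas_profiling import ProfileReport

    df = sns.load_dataset("tips")
    print(list(df.columns))  # e.g. total_bill, tip, sex, smoker, day, time, size

    profile = ProfileReport(df, title="Tips dataset profile")
    profile.to_file(output_path)  # relative paths resolve against the working directory
    return profile
```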


3. Sweetviz

Sweetviz is another handy automated EDA library. Here we again load the titanic dataset from seaborn, but first the sweetviz library needs to be installed, which can be done by running the

!pip install sweetviz

command in the console or in the anaconda prompt.

After running the two lines of code, an HTML page named output_report.html pops up.
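The two lines are most likely an analyze call followed by show_html; a sketch under that assumption, with sweetviz and seaborn installed. sv.analyze builds the report, and show_html writes it to the given path and, by default, opens it in the browser. The wrapper name sweetviz_report is hypothetical.

```python
def sweetviz_report(output_path="output_report.html"):
    """Analyze the titanic dataset with Sweetviz and write an HTML report."""
    import seaborn as sns
    import sweetviz as sv

    df = sns.load_dataset("titanic")
    report = sv.analyze(df)        # summarizes every column of the DataFrame
    report.show_html(output_path)  # writes the HTML file and opens it in the browser
    return report
```

To relate every feature to survival, sv.analyze also accepts a target_feat argument, for example target_feat="survived".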

As with the other automated EDA libraries, the resulting output_report.html is capable of high-level EDA.


4. AutoViz

As with the other libraries, we first need to install it:

!pip install autoviz

Then run the code; here, too, we use the titanic dataset for performing EDA.

From autoviz, the AutoViz_Class class is imported and instantiated as the object AV.

Since I'm loading the dataset from a local file, I have passed its path as the filename argument; alternatively, it can be loaded from seaborn. Running this code produces a series of basic EDA charts.
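A sketch of the AutoViz step, assuming autoviz is installed. The local path titanic.csv and the function name autoviz_titanic are hypothetical. To skip the file, an in-memory DataFrame (for example the titanic data from seaborn) can be passed through the dfte argument with an empty filename instead. The import path matches the package's AutoViz_Class module; some releases also allow from autoviz import AutoViz_Class.

```python
def autoviz_titanic(filename="titanic.csv", df=None):
    """Run AutoViz on the titanic data, from a CSV path or from a DataFrame."""
    from autoviz.AutoViz_Class import AutoViz_Class

    AV = AutoViz_Class()
    if df is not None:
        # An in-memory DataFrame goes through dfte; filename is left empty.
        return AV.AutoViz("", dfte=df)
    # Otherwise AutoViz reads the (hypothetical) local CSV file itself.
    return AV.AutoViz(filename, sep=",")
```

AutoViz also takes a depVarName argument naming a target column (such as "survived"), which should make it add charts relating the other features to that target.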

You can go through my Jupyter notebook, try out the different automated EDA libraries, and share in the comments what conclusions you could draw from them, or any useful insights I failed to capture in my own approach.

V. Nanda Gopal
Final-year undergraduate at NIT Raipur with a keen interest in data science, machine learning, and deep learning. Loves learning new skills and solving real-life problems using AI/ML.
