
Insane Exploratory Data Analysis Libraries

To perform EDA on a dataset, one normally needs to be comfortable with Python visualization libraries such as seaborn, matplotlib, and plotly in order to build graphs that reveal insights in the data. Finding those insights is a preliminary step in any data science or machine learning project, because the next step, feature selection, depends on the results of EDA. EDA therefore plays a crucial role in the accuracy of such projects.

In this blog, we shall find easier ways of performing EDA on any dataset by using some automated libraries.


1. Dtale

The first step is to install the library by running the command

!pip install dtale

in a notebook cell; in the Anaconda prompt, drop the leading "!". Once the dtale library is installed, import the usual libraries and load the titanic dataset from seaborn using load_dataset.

Once the dataset is loaded, pass it to dtale with dtale.show(df); this opens the data frame in a dtale window. The dropdown menu gives access to the usual EDA operations, ranging from Pearson's correlation between columns to 2D and 3D plots and outlier detection.
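The step described above can be sketched roughly as follows. This is a minimal sketch rather than the exact notebook code: the helper name open_in_dtale is made up for illustration, and it assumes dtale is installed.

```python
def open_in_dtale(df):
    """Start a dtale session for df and return the running instance."""
    # Imported here so dtale is only required when the function is called.
    import dtale

    # dtale.show() starts a local web server and returns an instance;
    # in a notebook, displaying that instance renders the grid inline.
    return dtale.show(df)


# Typical use in a notebook (load_dataset downloads the data on first use):
#   import seaborn as sns
#   df = sns.load_dataset("titanic")
#   d = open_in_dtale(df)
#   d.open_browser()  # open the session in a new browser tab
```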

One amazing feature of this library is that the source code behind an operation can be copied using the Code Export option. For example, the code behind the Pearson's correlation can be obtained by clicking the <>Code Export button.
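For comparison, a Pearson correlation can also be computed directly in pandas. The snippet below is a plain pandas equivalent, not the code that dtale exports, and the four sample rows are hypothetical stand-ins for two numeric columns of the titanic data.

```python
import pandas as pd

# Hypothetical sample rows standing in for two numeric titanic columns.
df = pd.DataFrame({
    "age": [22.0, 38.0, 26.0, 35.0],
    "fare": [7.25, 71.28, 7.92, 53.10],
})

# Pairwise Pearson correlation between the selected numeric columns.
corr = df[["age", "fare"]].corr(method="pearson")
print(corr)
```

The result is a symmetric matrix with 1.0 on the diagonal, which is the same kind of table the correlation view in dtale displays.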


2. Pandas Profiling

Pandas profiling is another automated library that can perform EDA, but it is more limited: its range of operations and its performance are lower than dtale's. To install this library, run the command

!pip install pandas-profiling

in a notebook cell; in the Anaconda prompt, drop the leading "!". Once the installation is done, load a dataset and generate the report.

In this example the tips dataset is loaded from seaborn and its columns are inspected.

Next, ProfileReport is imported from the installed pandas profiling library and the output is saved as an HTML file in the current working directory. Opening that file shows the report, and the various EDA views can be explored through its navigation options.
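A minimal sketch of the report step is given here, assuming pandas-profiling is installed. The helper name build_profile_report, the default file name output.html, and the title are illustrative choices, not taken from the original notebook.

```python
def build_profile_report(df, output_path="output.html", title="Profiling Report"):
    """Profile df and write the result to an HTML file."""
    # Imported lazily so the package is only needed when the function runs.
    # Newer releases of the project are published as ydata-profiling
    # (from ydata_profiling import ProfileReport).
    from pandas_profiling import ProfileReport

    profile = ProfileReport(df, title=title)
    profile.to_file(output_path)  # path is relative to the working directory
    return output_path


# Typical use in a notebook:
#   import seaborn as sns
#   df = sns.load_dataset("tips")
#   print(df.columns)
#   build_profile_report(df)
```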


3. Sweet Viz

Sweetviz is also a handy automated EDA library. Here we again load the titanic dataset from seaborn. Before that, the sweetviz library needs to be installed, which can be done by running

!pip install sweetviz

in a notebook cell; in the Anaconda prompt, drop the leading "!".

On running the two lines of code, an HTML page named output_report.html opens.

Similar to the other automated EDA libraries, output_report.html lets you perform high-level EDA.
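The two lines mentioned above most likely correspond to sweetviz's analyze and show_html calls. The sketch below wraps them in a helper whose name, build_sweetviz_report, is an assumption made for illustration.

```python
def build_sweetviz_report(df, output_path="output_report.html"):
    """Analyze df with sweetviz and write the HTML report to output_path."""
    # Imported lazily so sweetviz is only needed when the function runs.
    import sweetviz as sv

    report = sv.analyze(df)        # compute per-column summaries
    report.show_html(output_path)  # write the report; opens it in a browser by default
    return output_path


# Typical use in a notebook:
#   import seaborn as sns
#   df = sns.load_dataset("titanic")
#   build_sweetviz_report(df)
```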


4. AutoViz

As with the other libraries, AutoViz first needs to be installed

!pip install autoviz

and then the code can be run. Here we are using the titanic dataset for performing EDA.

AutoViz_Class is imported from autoviz and instantiated as the object AV.

Since I'm loading the dataset from a local file, I have assigned its path to filename; it can also be loaded from seaborn. Running this code produces a series of basic EDA charts.
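A rough sketch of this step is given here, assuming autoviz is installed. The helper name run_autoviz and the file name titanic.csv are hypothetical; the original only says that a local file path was assigned to filename.

```python
def run_autoviz(filename="", df=None):
    """Run AutoViz on a CSV file path or on an already loaded DataFrame."""
    # Imported lazily so autoviz is only needed when the function runs.
    from autoviz.AutoViz_Class import AutoViz_Class

    AV = AutoViz_Class()
    # Pass a file path as filename, or leave it empty and pass the
    # DataFrame through the dfte argument instead.
    return AV.AutoViz(filename, dfte=df)


# Typical use:
#   run_autoviz("titanic.csv")                       # hypothetical local file
#   run_autoviz(df=sns.load_dataset("titanic"))      # or a seaborn DataFrame
```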

You can go through my Jupyter notebook here and try out the different automated EDA libraries. Share the conclusions you draw from them, and if I missed any useful insights in my own approach, do share that too in the comments.


V. Nanda Gopal
Final year undergraduate at NIT Raipur with a keen interest in data science, machine learning, and deep learning. Loves learning new skills and solving real-life problems using AI/ML.
