Whenever we work with a dataset, the first step is generally to understand what the data is about. To explore the data, we start with Exploratory Data Analysis (EDA), which means analyzing the data with statistical techniques and visualizations to get a clear idea of what we are dealing with. In EDA we analyze the different attributes and their statistical properties, and we visualize the data using different graphs and plots.
EDA is a necessary step that we cannot skip, but it is generally time-consuming because we need to write separate code for the statistical properties and for each type of visualization. Several Python libraries can reduce the effort and time EDA takes by offering simple, easy-to-use functions. Lens is one such library.
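To see the effort involved, here is a minimal sketch of the manual route in plain pandas, on a hypothetical toy DataFrame (the column names and values are invented for illustration). Each statistical property needs its own call, and every chart needs more code on top.

```python
import pandas as pd

# Hypothetical toy data standing in for an advertising dataset;
# the numbers are invented purely for illustration.
df = pd.DataFrame({
    "TV": [100.0, 200.0, 50.0, 150.0, 250.0],
    "Radio": [20.0, 30.0, 10.0, 40.0, 25.0],
    "Sales": [10.0, 20.0, 6.0, 15.0, 24.0],
})

# Each statistical property needs its own call.
stats = df.describe()       # count, mean, std, min, quartiles, max per column
missing = df.isna().sum()   # number of missing values per column
corr = df.corr()            # pairwise Pearson correlation between columns

print(stats)
print(missing)
print(corr)

# Every chart needs yet more code (these calls need matplotlib installed):
# df.hist()
# df.plot.scatter(x="TV", y="Sales")
```

Libraries like Lens aim to collapse these separate steps into a couple of calls.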
Lens is an open-source Python library for fast calculation of summary statistics and correlations in a dataset. It lets us explore the properties of the dataset's attributes with a single line of code and creates different types of visualizations for them. It works on both numerical and categorical data, and it is fast and easy to use.
In this article, we will see how to perform EDA with Lens and save time and effort.
We will start by installing Lens using pip:
pip install lens
- Importing Required Libraries
We will load the dataset with pandas, so we import pandas, and we import lens for the data analysis and visualizations.
import pandas as pd
import lens
- Loading the Dataset
The dataset we will use here is an advertising dataset from an MNC, which contains attributes such as ‘Sales’, ‘TV’, etc. We will load this dataset using pandas.
df = pd.read_csv('Advertising.csv')
- Statistical Analysis of Data
Now that we have loaded the dataset, we will display its statistical properties. We use the summarise function to compute the summary statistics and the explore function to display them.
data = lens.summarise(df)  # compute summary statistics for every column
exp = lens.explore(data)   # wrap the summary in an explorer for tables and plots
Similarly, we can use these functions to display the properties of a single column.
- Correlation in the Dataset
Analyzing and visualizing correlations is easy in Lens; we just need to write a single line of code.
- Data Visualization
We can easily visualize the different attributes of the dataset using plots that are already built into Lens. Let us look at some of these visualizations.
Lens also has an attractive interactive function, ‘interactive_explore’, which creates a user interface where we can select different attributes and the type of plot to draw for them. Let us visualize this interface.
Here you can see that we can select different attributes and display different types of plots and graphs for them. Let us look at some other plots as well.
In this article, we learned about Lens, which helps in fast calculation of summary statistics and correlations. We saw how to use Lens to analyze the statistical properties of a whole dataset as well as of single columns. We also saw the different types of visualizations that Lens provides and created some of these plots. Finally, we saw the interactive function, which creates a user interface for selecting different graphs and plots for different attributes. Lens makes the process of data analysis and visualization simpler and less laborious.