Hands-On Tutorial On Lens: Python Tool For Swift Statistical Analysis

Lens is an open-source Python library for quickly computing summary statistics and correlations in a dataset. It lets us explore the properties of a dataset's attributes in a single line of code.
Lens Data Analysis

Whenever we work with a dataset, the first step is generally to understand what the data is about. We do this through Exploratory Data Analysis (EDA): analyzing the data with statistical techniques and visualizations to get a clear idea of what we are dealing with. In EDA we analyze the different attributes and their statistical properties, and we visualize the data using different graphs and plots.

EDA is a necessary step that we cannot neglect, but it is generally time-consuming, because we need to write separate code for the statistical properties and for each type of visualization. Several Python libraries and modules help reduce the time and effort EDA takes by offering simple, easy-to-use code. Lens is one such library.

Lens computes summary statistics and correlations for a dataset and lets us explore the properties of its attributes in a single line of code. It also creates different types of visualizations for all the attributes in the data, works on both numerical and categorical data, and is fast and easy to use.

In this article, we will explore how we can perform EDA using Lens and save time and effort.

Implementation:

We will start by installing Lens using pip:

pip install lens
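
Before moving on, it can help to confirm that the package is visible to the Python interpreter you are actually using. The following is a minimal sketch using only the standard library; the helper name lens_available is hypothetical and not part of Lens.

```python
import importlib.util


def lens_available():
    """Return True if a top-level package named 'lens' is on the import path."""
    # find_spec looks the package up without importing it.
    return importlib.util.find_spec("lens") is not None


if not lens_available():
    print("lens is not installed in this environment; run: pip install lens")
```

Note that this only checks that some package named lens can be found; it does not verify its version.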

  1. Importing Required Libraries

We will load the dataset with pandas and use Lens for the data analysis and visualizations, so we import both.

import pandas as pd

import lens

  2. Loading the Dataset

The dataset we will use here is an advertising dataset of an MNC, which contains attributes such as ‘Sales’, ‘TV’, etc. We will load this dataset using pandas.

df = pd.read_csv('Advertising.csv')

df

Dataset Used

  3. Statistical Analysis of Data

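Now that we have loaded the dataset, we will display its statistical properties. We use the summarise function to compute a summary of the DataFrame and the explore function to create an explorer object from that summary. The explorer's describe method then displays the statistical properties of the dataset.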

data = lens.summarise(df)

exp = lens.explore(data)

exp.describe()

Dataset Summary
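
To make concrete what kind of numbers a per-column summary contains, here is a minimal sketch that computes a few common statistics with Python's standard library. It is not how Lens computes its summary, the helper name summarise_column is hypothetical, and the values are arbitrary illustrative numbers rather than rows from Advertising.csv.

```python
import statistics


def summarise_column(values):
    """Return a few common summary statistics for a numeric column."""
    ordered = sorted(values)
    return {
        "count": len(ordered),
        "min": ordered[0],
        "max": ordered[-1],
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "stdev": statistics.stdev(ordered),  # sample standard deviation
    }


# Arbitrary illustrative values, not taken from the advertising dataset.
sales = [10.0, 12.0, 14.0, 16.0, 18.0]
summary = summarise_column(sales)
print(summary)
```

A summary like this is cheap to compute once and reuse, which is the same idea behind computing the Lens summary first and exploring it afterwards.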

Similarly, we can use the explorer to display the properties of a single column.

exp.column_details('Sales')

Column Summary

  4. Correlation in Dataset

Analyzing and visualizing the correlation between attributes is easy in Lens; each takes a single line of code.

exp.correlation()

Correlation Matrix

exp.correlation_plot()

Correlation Plot
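
For readers who want to see what a correlation matrix contains, the sketch below computes the Pearson coefficient for every pair of columns by hand. Pearson is one common choice and is an assumption here; the article does not state which coefficient Lens reports. The helper names and the tiny columns are made up for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Nested dict holding the coefficient for every pair of named columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Made-up columns chosen so the expected relationships are obvious.
columns = {
    "TV": [1, 2, 3, 4, 5],
    "Sales": [2, 4, 6, 8, 10],     # rises with TV
    "Newspaper": [5, 4, 3, 2, 1],  # falls as TV rises
}
matrix = correlation_matrix(columns)
print(matrix["TV"]["Sales"], matrix["TV"]["Newspaper"])
```

Values close to 1 or -1 indicate a strong linear relationship, while values near 0 indicate little linear relationship.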

  5. Visualization

We can easily visualize the attributes of the dataset using plots that are already defined in Lens. Let us look at some of them.

exp.distribution_plot('Sales')

Distribution Plot
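
A distribution plot of a numeric column is typically built from binned counts. The sketch below shows the idea with equal-width bins in plain Python; the binning Lens uses may differ, and the function name histogram and the sample values are assumptions made for illustration.

```python
def histogram(values, bins):
    """Count values in equal-width bins; return (bin_edges, counts)."""
    if bins < 1:
        raise ValueError("bins must be at least 1")
    if not values:
        raise ValueError("values must not be empty")
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        if width == 0:
            index = 0  # all values identical: everything goes in the first bin
        else:
            # The maximum value would land one past the end, so clamp it
            # into the last bin.
            index = min(int((v - lo) / width), bins - 1)
        counts[index] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return edges, counts


# Arbitrary illustrative values.
edges, counts = histogram([1, 2, 3, 4, 5, 6, 7, 8], bins=4)
for left, right, count in zip(edges, edges[1:], counts):
    print(f"{left:.2f} to {right:.2f}: {'#' * count}")
```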

exp.cdf_plot('Newspaper')

CDF Plot
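
A CDF plot shows, for each value, the fraction of observations that are less than or equal to it. As a minimal sketch of the underlying computation (the empirical CDF, written in plain Python and not taken from Lens; the function name is hypothetical):

```python
def empirical_cdf(values):
    """Return (value, cumulative fraction) pairs sorted by value."""
    ordered = sorted(values)
    n = len(ordered)
    # After sorting, the i-th value (0-based) has i + 1 observations at or
    # below it. Tied values yield several points with the same x, which is
    # fine for drawing a step plot.
    return [(v, (i + 1) / n) for i, v in enumerate(ordered)]


# Arbitrary illustrative values.
points = empirical_cdf([3, 1, 2, 4])
for value, fraction in points:
    print(value, fraction)
```

The curve always ends at 1.0, and steep sections mark ranges where many observations are concentrated.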

Lens also has a function named interactive_explore, which creates a user interface where users can select different attributes and different types of plots. Let us visualize this interface.

lens.interactive_explore(data)

Distribution Plot

Here we can select different attributes and view different types of plots and graphs for them. Let us look at some other plots as well.

Density Plot
CDF Plot

Conclusion:

In this article, we learned about Lens, which helps in the fast calculation of summary statistics and correlation. We saw how to use Lens to analyze the statistical properties of a whole dataset as well as of single columns. We also looked at the different types of visualization that Lens provides and created some of the plots. Finally, we saw the interactive_explore function, which creates a user interface for selecting different graphs and plots for different attributes. Lens makes the process of data analysis and visualization simpler and less effortful.

Himanshu Sharma
An aspiring Data Scientist currently Pursuing MBA in Applied Data Science, with an Interest in the financial markets. I have experience in Data Analytics, Data Visualization, Machine Learning, Creating Dashboards and Writing articles related to Data Science.
