With advances in technology, we are generating huge amounts of data in many different ways. It is often said that more data has been generated in the last two decades than in all of prior history. Data is of little use until we extract insights from it, so we need to preprocess it and find the trends it contains.
In machine learning projects, a large share of the time (often quoted as around 70%) is spent preprocessing data. A pictorial representation of data helps us understand it quickly and easily, which is why visualization tools and libraries were created, and they are very useful during preprocessing.
In this article, we will explore two popular Python data-visualization libraries, Plotly and Seaborn, and compare them using the following basic types of visualization:
- Box-plot
- Bar-plot
- Pair plot
Box-plot
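A box plot is a standardized way of displaying the distribution of data: the box spans the first quartile (Q1) to the third quartile (Q3), a line marks the median, and whiskers extend to the most extreme values that are not outliers. Outliers are values that lie unusually far from the rest of the data; by default, both seaborn and Plotly draw the whiskers out to at most 1.5 times the interquartile range (IQR = Q3 - Q1) beyond the box and plot any point past them individually. Box plots also make it easy to compare the distribution of data between different groups or datasets. Now let's visualize a box plot using Plotly and seaborn.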
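To make the outlier rule concrete, here is a minimal pure-Python sketch of the statistics a box plot draws, assuming the common 1.5 × IQR convention. The helper names `quantile` and `box_plot_summary` are hypothetical, the sample values are made up, and the quartiles use linear interpolation (NumPy's default rule); individual plotting libraries may compute quartiles slightly differently.

```python
def quantile(sorted_vals, q):
    """Quantile by linear interpolation between the two nearest ranks."""
    pos = (len(sorted_vals) - 1) * q
    lower = int(pos)
    upper = min(lower + 1, len(sorted_vals) - 1)
    frac = pos - lower
    return sorted_vals[lower] + (sorted_vals[upper] - sorted_vals[lower]) * frac


def box_plot_summary(values, whis=1.5):
    """Return the quartiles, whisker ends and outliers a box plot would show."""
    vals = sorted(values)
    q1, median, q3 = (quantile(vals, q) for q in (0.25, 0.5, 0.75))
    iqr = q3 - q1
    # Points outside [Q1 - whis*IQR, Q3 + whis*IQR] are treated as outliers
    low_fence, high_fence = q1 - whis * iqr, q3 + whis * iqr
    inside = [v for v in vals if low_fence <= v <= high_fence]
    outliers = [v for v in vals if v < low_fence or v > high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        # Whiskers stop at the most extreme non-outlier values
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,
    }


# Made-up tip amounts with one unusually large value
tips = [1.0, 1.5, 2.0, 2.0, 2.5, 3.0, 3.0, 3.5, 4.0, 10.0]
print(box_plot_summary(tips))
# {'q1': 2.0, 'median': 2.75, 'q3': 3.375, 'whisker_low': 1.0, 'whisker_high': 4.0, 'outliers': [10.0]}
```

Here the upper fence is 3.375 + 1.5 × 1.375 = 5.4375, so only the value 10.0 would be drawn as a separate dot.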
Using Seaborn
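import seaborn as sns
import matplotlib.pyplot as plt
# Load the built-in "tips" example dataset (downloaded on first use)
df = sns.load_dataset("tips")
# Horizontal box plot of tip amounts for each sex
sns.boxplot(data=df, x="tip", y="sex", palette="Accent")
plt.show()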
Using Plotly
import plotly.express as px
df = px.data.tips()
fig = px.box(df, x="day", y="total_bill", color="smoker")
fig.show()
Output: In both outputs, the dots beyond the whiskers represent outliers. Plotly additionally displays the median and quartile values when you hover over a box, whereas seaborn produces a static image: it shows whether the data has outliers, but the exact values have to be read off the axes.
Bar-plot
Bar plots are among the most common types of plot used for visualization. They display the relationship between a categorical variable and a numerical value, with each category represented by a rectangular bar whose length corresponds to the value. For example, if you need to find which country in a dataset has the highest population, a bar plot lets you see it at a glance.
Using Seaborn
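import seaborn as sns
import matplotlib.pyplot as plt
df = sns.load_dataset("tips")
# Each bar shows the mean total_bill per sex; the black line is a 95% confidence interval
sns.barplot(data=df, x="sex", y="total_bill")
plt.show()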
Using Plotly
import plotly.express as px
df = px.data.tips()
# Every row becomes its own segment; segments are stacked and colored by day
fig = px.bar(df, x="sex", y="total_bill", color="day")
fig.show()
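Output: In the seaborn output, each bar shows the average total bill for male and female customers, with an error bar for the 95% confidence interval. In the Plotly output, the individual bills are stacked and colored by day, so each bar shows the combined total bill for each sex broken down by day of the week. Plotly therefore packs more information into a single chart.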
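The difference between the two bar charts comes down to aggregation. The pure-Python sketch below uses a handful of made-up rows (not the real tips data) and hypothetical helpers to compute what each library draws, assuming default settings: seaborn's bar height is the mean per category, while Plotly Express stacks one segment per row, so its bar height is the sum.

```python
from collections import defaultdict

# Made-up (sex, day, total_bill) rows, for illustration only
rows = [
    ("Male", "Sun", 20.0),
    ("Male", "Sat", 10.0),
    ("Female", "Sun", 15.0),
    ("Female", "Sat", 5.0),
    ("Male", "Sun", 30.0),
]


def mean_by_sex(rows):
    """Bar heights for seaborn's barplot with its default estimator (the mean)."""
    totals, counts = defaultdict(float), defaultdict(int)
    for sex, _day, bill in rows:
        totals[sex] += bill
        counts[sex] += 1
    return {sex: totals[sex] / counts[sex] for sex in totals}


def stacked_by_sex_and_day(rows):
    """Segment sizes for px.bar(x="sex", y="total_bill", color="day"):
    rows with the same sex and day are stacked, so their bills add up."""
    segments = defaultdict(float)
    for sex, day, bill in rows:
        segments[(sex, day)] += bill
    return dict(segments)


print(mean_by_sex(rows))  # {'Male': 20.0, 'Female': 10.0}
print(stacked_by_sex_and_day(rows))
# {('Male', 'Sun'): 50.0, ('Male', 'Sat'): 10.0, ('Female', 'Sun'): 15.0, ('Female', 'Sat'): 5.0}
```

With these rows, seaborn's "Male" bar would be 20.0 high, while Plotly's "Male" bar would reach 60.0, split into a 50.0 Sunday segment and a 10.0 Saturday segment.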
Pair-plot
A pair plot visualizes the relationship between every pair of variables in a dataset. The columns are placed along both the x-axis and the y-axis, forming a grid of plots, so we can get insights into the entire dataset at once. For example, suppose we have data on cars and need to build a model that predicts mileage. During exploratory data analysis, a pair plot shows which variables influence the mileage; it might depend on weight, speed or fuel type, for instance, and a pair plot lets us see these relationships side by side.
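Each off-diagonal cell of a pair plot shows one pair of columns, so a handy numeric companion is the correlation coefficient for every pair. Below is a minimal pure-Python sketch; the car columns and values are made up to mirror the mileage example, and `pearson` is a hypothetical helper (Python 3.10+ also provides `statistics.correlation`).

```python
import math
from itertools import combinations

# Made-up car data: each key is a column, like one axis of a pair plot
cars = {
    "weight": [1.0, 2.0, 3.0, 4.0],
    "speed": [60.0, 80.0, 60.0, 80.0],
    "mileage": [40.0, 30.0, 20.0, 10.0],
}


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length columns."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# One entry per pair of columns (the upper triangle of the pair-plot grid)
correlations = {
    (a, b): pearson(cars[a], cars[b]) for a, b in combinations(cars, 2)
}
for (a, b), r in correlations.items():
    print(f"{a} vs {b}: r = {r:+.2f}")
```

With these made-up values, weight and mileage come out perfectly negatively correlated (r = -1.00), which in a pair plot would appear as points lying on a straight downward line.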
Using Seaborn
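import seaborn as sns
import matplotlib.pyplot as plt
df = sns.load_dataset("tips")
# Scatter plots for every pair of numeric columns, histograms on the diagonal
sns.pairplot(df)
plt.show()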
Using Plotly
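import plotly.express as px
df = px.data.tips()
# Restrict to the numeric columns so the grid matches seaborn's default pair plot
fig = px.scatter_matrix(df, dimensions=["total_bill", "tip", "size"])
fig.show()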
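Output: Comparing the two outputs, the seaborn pair plot is easier to read, partly because it draws a histogram of each variable along the diagonal. Plotly's scatter matrix fills every cell, including the diagonal, with scatter plots, which makes it harder to get insights when there are many graphs.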
Conclusion
Through the above demonstration, we can conclude that both Plotly and seaborn serve the same visualization purposes, but Plotly stands out for its customization and interactive interface. Hovering the mouse over a graph displays the values at each point, and the built-in toolbar lets us zoom, pan, select regions and download the graph as an image. This makes Plotly a user-friendly visualization tool, and it is popular in the data science community.