Data visualization represents the patterns, outliers and anomalies in data as graphs and charts, which makes them much easier to spot. It is a powerful tool because the human eye is drawn to colours and patterns and starts looking for structure as soon as it sees a chart. Python offers several visualization libraries, and Seaborn is one of the most widely used for statistical data visualization.
It can be used to build most common statistical charts. It is built on top of matplotlib, which is also a visualization library, and it gives plots attractive default styling. Seaborn is easy to use, and its dataset-oriented plotting functions work on both data frames and arrays. It extends matplotlib with higher-level statistical plots, so a chart that would take several matplotlib calls is often a single function call in Seaborn.
Through this article, we will discuss the following points in detail:
- How to use Seaborn
- Visualizing different statistical charts
- Various plotting functions in Seaborn
- Different parameters for Seaborn visualization
Before using Seaborn, we need to install it with pip install seaborn.
Visualization Implementations in Seaborn
Here, we will use a dataset named “tips”, which is fetched from Seaborn’s online data repository by the load_dataset() function. The dataset contains attributes such as total_bill, tip, smoker, etc.
Let us start by importing the important libraries and the dataset.
import pandas as pd
import seaborn as sns

df = sns.load_dataset("tips")
df
Plotting different statistical graphs:
‘lmplot’ draws a scatter plot of two attributes, one on the x-axis and one on the y-axis, together with a fitted linear regression line, and is used to see the correlation between them. Here we will plot tip against total_bill.
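Seaborn also lets you set properties such as the height and colour palette of the generated plot. The palette only takes effect when a hue column is given, so here we colour the points by smoker.
sns.lmplot(x="total_bill", y="tip", hue="smoker", data=df, height=4, palette="dark")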
A Kernel Density Estimate (KDE) plot is used to visualize the probability density of univariate data. In simple terms, it shows how the data is spread out.
Scatterplots are similar to lmplot; the difference is that they show only the scatter of the two attributes, without a trendline. They are also used for finding the relation between two attributes.
sns.scatterplot(x="total_bill", y="tip", data=df)
Distplot is a convenient way of visualizing the distribution of a variable and its skewness. It combines a histogram with a KDE curve. Note that distplot has been deprecated since Seaborn 0.11 in favour of histplot and displot.
Barplots are one of the most common types of visualization and are mostly used for showing the relationship between numeric and categorical data. By default, each bar shows the mean of the numeric variable for one category. Barplots can be plotted horizontally or vertically as required.
sns.barplot(x="sex", y="total_bill", data=df)
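The call above draws vertical bars. As a small sketch of the horizontal form, swapping the x and y arguments is enough, because Seaborn infers the orientation from which variable is categorical. The six rows below are made up for illustration (an assumption, not values from the tips data).

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import pandas as pd
import seaborn as sns

# Made-up rows for illustration only (assumption, not the real tips data).
sample = pd.DataFrame({
    "sex": ["Male", "Female", "Male", "Female", "Male", "Female"],
    "total_bill": [20.0, 15.0, 30.0, 25.0, 10.0, 14.0],
})

# Numeric variable on x, categorical on y: the bars are drawn horizontally.
ax = sns.barplot(x="total_bill", y="sex", data=sample)
```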
FacetGrids are used to draw multiple instances of the same plot on different subsets of the dataset. A FacetGrid takes a data frame as input, along with the names of the variables that define the rows and columns of the grid.
To map a matplotlib function such as plt.hist onto the grid, we need to import matplotlib as well. Let us visualize the dataset using a FacetGrid of histograms.
import matplotlib.pyplot as plt

grid = sns.FacetGrid(df, col="time", row="sex")
grid.map(plt.hist, "total_bill")
We use box plots to display data graphically according to its quartiles. With a box plot, we can easily identify the median, the range of the data points and any outliers.
Here we will visualize the tips paid on different days of the week.
sns.boxplot(x="day", y="tip", data=df)
8. Violin Plots
Violin plots combine a KDE plot and a box plot. They are used to visualize the distribution of numerical data.
sns.violinplot(x="day", y="total_bill", data=df)
Heatmaps are used to display the correlations between the numerical attributes in a dataset. The colour scheme plays an important role in showing whether a relationship is positive or negative.
To create a heatmap, we compute a correlation matrix and pass it to the heatmap function. We also set the annot parameter to True so that the correlation value is visible in each cell.
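The steps just described can be sketched as follows. The frame is made up (an assumption standing in for the tips data) so the snippet runs offline, and the cmap, vmin and vmax values are illustrative choices rather than anything the article prescribes. Only numeric columns are kept before calling corr(), since categorical columns such as day cannot be correlated directly.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import numpy as np
import pandas as pd
import seaborn as sns

# Made-up stand-in for the tips data (assumption): tip loosely follows total_bill.
rng = np.random.default_rng(0)
total_bill = rng.gamma(4.0, 5.0, size=100)
sample = pd.DataFrame({
    "total_bill": total_bill,
    "tip": 0.15 * total_bill + rng.normal(0, 1, size=100),
    "size": rng.integers(1, 7, size=100),
    "day": rng.choice(["Thur", "Fri", "Sat", "Sun"], size=100),
})

# Correlation matrix of the numeric columns only ("day" is dropped).
corr = sample.select_dtypes("number").corr()

# annot=True writes each correlation value into its cell; a diverging
# colour map pinned to [-1, 1] makes the sign of each relationship visible.
ax = sns.heatmap(corr, annot=True, cmap="coolwarm", vmin=-1, vmax=1)
```

With the real dataset, replacing sample with df in the corr line should give the correlation matrix of total_bill, tip and size.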
Jointplots are useful when we want to visualize the relationship between two variables as well as the univariate distribution of each. Jointplots come in several types, and we choose one by passing its name to the “kind” parameter.
sns.jointplot(x="total_bill", y="tip", data=df, kind="reg")
These are some of the basic plots that we can create using Seaborn, and they are helpful in data analysis. Seaborn also makes it easy to customize graphs and plots through different functions and parameters. Some of the important ones are:
- set_style: A function (sns.set_style) that sets the aesthetic style of the plots; it mainly affects the properties of the grid and axes.
- hue: It is used for deciding which column of the dataset will be used for colour encoding.
- palette: It is used to select the colour palette which we want to use for our visualization. Among others, Seaborn provides six variations of its default palette: “deep”, “muted”, “pastel”, “bright”, “dark” and “colorblind”.
- height: It sets the height, in inches, of each facet in figure-level plots such as lmplot.
Beyond these, each plotting function has its own parameters that can be adjusted as needed.
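The options listed above can be combined in one plot. The following is a minimal sketch, assuming a made-up frame with total_bill, tip and smoker columns in place of the tips data; the specific style and palette names are illustrative choices.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import numpy as np
import pandas as pd
import seaborn as sns

# Made-up stand-in for the tips data (assumption).
rng = np.random.default_rng(0)
total_bill = rng.gamma(4.0, 5.0, size=120)
sample = pd.DataFrame({
    "total_bill": total_bill,
    "tip": 0.15 * total_bill + rng.normal(0, 1, size=120),
    "smoker": rng.choice(["Yes", "No"], size=120),
})

# set_style is called once and applies to the plots created afterwards.
sns.set_style("whitegrid")

# hue picks the column used for colour, palette picks the colours,
# and height sets the facet height in inches.
g = sns.lmplot(x="total_bill", y="tip", hue="smoker",
               palette="muted", height=4, data=sample)
```

Changing "whitegrid" to another style such as "dark" or "ticks", or "muted" to another palette name, is a quick way to see what each option controls.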
In this article, we saw how to create informative and visually appealing charts and graphs using Seaborn, and what each of these visualizations is used for. We also saw that Seaborn is easy to use, since most of our graphs were created with a line or two of code.