Yellowbrick is designed to visualize and diagnose machine learning models. It is a visualization suite built on top of scikit-learn and Matplotlib that supports model selection, hyperparameter tuning, and algorithm selection.
Yellowbrick's core API object is the visualizer, a scikit-learn estimator that learns from data in order to produce a visualization of the modeling workflow. These visualizations help us draw insights during the model selection process.
In this article, we will explore different types of visualizations that are provided by Yellowbrick and how we can create them according to our requirements.
Implementation:
Yellowbrick is built on scikit-learn and Matplotlib, so we install those two first and then Yellowbrick itself. The commands for installing all three libraries are given below:
pip install scikit-learn
pip install matplotlib
pip install yellowbrick
1. Feature Analysis Visualization
We will import different functions defined in yellowbrick and scikit-learn for model selection as and when required. We will start by visualizing an advertising dataset that contains 3 features and one target variable ‘Sales’.
a. Loading the Dataset
import pandas as pd
df = pd.read_csv('Advertising.csv')
df
b. Defining Target and Feature variables
x = df[['TV', 'Radio', 'Newspaper']]
y = df['Sales']
c. Visualizing Features
from yellowbrick.features import Rank1D
visual = Rank1D()
visual.fit(x, y)
visual.transform(x)
visual.show()
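Rank1D scores each feature on its own, and its default ranking algorithm is the Shapiro-Wilk test, which measures how closely a feature's values follow a normal distribution. As a rough sketch of the numbers behind the bar chart, the snippet below computes the same statistic directly with scipy.stats.shapiro. The two synthetic columns are an assumption made for illustration and are not the advertising features.

```python
import numpy as np
from scipy.stats import shapiro

# Hypothetical stand-in features (not the Advertising.csv columns)
rng = np.random.default_rng(0)
features = {
    "normal_like": rng.normal(loc=50.0, scale=10.0, size=200),
    "skewed": rng.exponential(scale=10.0, size=200),
}

# The Shapiro-Wilk W statistic is close to 1 for normally distributed data
scores = {name: float(shapiro(values)[0]) for name, values in features.items()}

for name, w in scores.items():
    print(f"{name}: W = {w:.3f}")
```

A feature with a lower W value departs more strongly from normality, which is what a shorter bar in the Rank1D plot indicates.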
2. Linear Regression Visualization
We will fit a linear regression model with scikit-learn and then visualize it using Yellowbrick.
a. Creating the model
We first split the data into training and test sets and fit the model on the training set.
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
x_train, x_test, y_train, y_test = train_test_split(x,y, random_state=1)
model = LinearRegression().fit(x_train, y_train)
model_pred = model.predict(x_test)
b. Visualizing the Model
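We now use Yellowbrick to visualize the fitted model.
from yellowbrick.regressor import PredictionError, ResidualsPlot
# Wrap the fitted model in a prediction error visualizer
visual = PredictionError(model).fit(x_train, y_train)
# score() evaluates the model on the test set and draws the test points
visual.score(x_test, y_test)
# show() is the current name for poof(), which was deprecated in Yellowbrick 1.0
visual.show()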
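The prediction error plot compares the actual targets with the model's predictions, and score() reports the R² of the regressor on the held-out data. The following sketch computes those same quantities without plotting. It uses synthetic data, which is an assumption standing in for Advertising.csv, and the coefficients are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Hypothetical data: three features and a noisy linear target
rng = np.random.default_rng(1)
X = rng.uniform(0, 100, size=(200, 3))
y = 3.0 + 0.05 * X[:, 0] + 0.2 * X[:, 1] + rng.normal(0, 1.0, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)
model = LinearRegression().fit(X_train, y_train)

# Points in a prediction error plot are (actual, predicted) pairs
y_pred = model.predict(X_test)
# Residuals are what a residuals plot places on its vertical axis
residuals = y_test - y_pred
# R^2 on the test set, the same value model.score() returns
r2 = r2_score(y_test, y_pred)

print(f"R^2 on test data: {r2:.3f}")
print(f"Largest absolute residual: {np.abs(residuals).max():.3f}")
```

If the points lie close to the identity line and the residuals show no pattern, the linear model is a reasonable fit.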
3. Model Selection Visualization
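Model selection visualizers help us inspect cross-validation performance and hyperparameter tuning.
Let us visualize feature importance using a Random Forest Classifier and Yellowbrick. Note that a classifier expects discrete class labels, so x and y here should come from a classification dataset rather than the continuous 'Sales' target used above.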
from sklearn.ensemble import RandomForestClassifier
from yellowbrick.model_selection import FeatureImportances
model = RandomForestClassifier()
viz = FeatureImportances(model)
viz.fit(x, y)
viz.show()
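FeatureImportances draws its bars from the fitted estimator's feature_importances_ attribute (or coef_ for linear models). Since a classifier needs discrete class labels, the sketch below uses scikit-learn's bundled iris dataset, which is an assumption and not part of the advertising example, to show the values that would be plotted.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# Assumed example data: iris has four features and three discrete classes
data = load_iris()
X, y = data.data, data.target

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Impurity-based importances, one per feature, normalized to sum to 1
importances = dict(zip(data.feature_names, model.feature_importances_))

for name, value in sorted(importances.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {value:.3f}")
```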
Similarly, we can visualize feature importance using Logistic Regression and yellowbrick.
from sklearn.linear_model import LogisticRegression
model = LogisticRegression(multi_class="auto", solver="liblinear")
visual = FeatureImportances(model, stack=False, relative=False)
visual.fit(x, y)
visual.show()
4. Textual Data Visualization
Yellowbrick can also help us analyze the properties of textual data. We can read a text file with Python's open function and visualize word frequencies using the Frequency Distribution Visualizer.
a. Importing Library and loading dataset
from sklearn.feature_extraction.text import CountVectorizer
from yellowbrick.text import FreqDistVisualizer
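# Each line of the file is treated as one document
with open('text.txt', 'r') as f:
    corpus = f.readlines()
vectorizer = CountVectorizer()
docs = vectorizer.fit_transform(corpus)
# get_feature_names() was removed in scikit-learn 1.2
features = vectorizer.get_feature_names_out()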
b. Visualizing the word frequencies
visualizer = FreqDistVisualizer(features=features, orient='v')
visualizer.fit(docs)
visualizer.show()
5. Anscombe’s Quartet
Finally, let us visualize Anscombe's Quartet, a collection of four datasets that have nearly identical summary statistics but look very different when plotted. Anscombe's Quartet is a classic example of why visualization is important for machine learning.
import yellowbrick as yb
import matplotlib.pyplot as plt
ans = yb.anscombe()
plt.show()
We can clearly see how different these four datasets are despite their similar statistical properties.
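To make the "similar statistical properties" concrete, here is a small sketch that computes the mean of x, the mean of y, and the Pearson correlation for each of the four datasets using only the standard library. The values are the published Anscombe data typed in by hand, and the helper pearson is a name introduced here for illustration.

```python
from math import sqrt
from statistics import mean

x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# One (mean of x, mean of y, correlation) tuple per dataset
summary = {name: (mean(xs), mean(ys), pearson(xs, ys)) for name, (xs, ys) in quartet.items()}

for name, (mx, my, r) in summary.items():
    print(f"{name}: mean x = {mx:.2f}, mean y = {my:.2f}, r = {r:.3f}")
```

The printed rows are almost indistinguishable, which is exactly why the plots are needed to tell the datasets apart.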
Conclusion:
In this article, we learned about Yellowbrick, a visualization library for machine learning models and algorithms. We saw how to create different visualizations for different purposes using Yellowbrick. This is only an introduction to its capabilities; it has many more helpful features and functions.