Yellowbrick Hands-On Guide – A Python Tool for Machine Learning Visualizations

In this article, we will explore different types of visualizations that are provided by Yellowbrick and how we can create them according to our requirements.

Yellowbrick is designed mainly to visualize and diagnose machine learning models. It is a visualization suite built on top of scikit-learn and Matplotlib that helps with model selection, hyperparameter tuning, and algorithm selection.

Yellowbrick's API is built around the visualizer, a scikit-learn estimator that learns from data in order to produce a visualization of the model selection workflow. These visualizations allow us to draw insights during the model selection process.

Implementation: 

Yellowbrick is based on scikit-learn and Matplotlib, so we need to install both and then install Yellowbrick. The commands for installing all three libraries are given below:

pip install scikit-learn

pip install matplotlib

pip install yellowbrick
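To confirm that the three packages installed correctly, one option is to query their versions through Python's standard importlib.metadata module. The sketch below is only illustrative; the helper name installed_version is our own and is not part of any of these libraries.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Report the status of each library used in this article.
for name in ("scikit-learn", "matplotlib", "yellowbrick"):
    found = installed_version(name)
    print(f"{name}: {found if found else 'not installed'}")
```

A package reported as "not installed" can be added with the corresponding pip command above.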

  1. Feature Analysis Visualization

We will import the required functions from Yellowbrick and scikit-learn as and when they are needed. We will start by visualizing an advertising dataset that contains three features (TV, Radio, and Newspaper) and one target variable, ‘Sales’.

a. Loading the Dataset

import pandas as pd

df = pd.read_csv('Advertising.csv')

df

b. Defining Target and Feature variables

x = df[['TV', 'Radio', 'Newspaper']]

y = df['Sales']

c. Visualizing Features

from yellowbrick.features import Rank1D

visual = Rank1D()

visual.fit(x, y)

visual.transform(x)

visual.show() 
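By default, Rank1D scores each feature with the Shapiro-Wilk statistic, which measures how closely the feature's values follow a normal distribution (values near 1 indicate near-normality). The following is a minimal sketch of that ranking computed directly with SciPy, assuming SciPy and NumPy are installed. The synthetic two-column data and the helper name shapiro_rank are hypothetical and merely stand in for the advertising features.

```python
import numpy as np
from scipy.stats import shapiro


def shapiro_rank(columns):
    """Map each feature name to its Shapiro-Wilk W statistic."""
    return {name: float(shapiro(values)[0]) for name, values in columns.items()}


# Synthetic stand-in data: one roughly normal feature and one skewed feature.
rng = np.random.default_rng(0)
columns = {
    "bell_shaped": rng.normal(loc=50.0, scale=10.0, size=200),
    "skewed": rng.exponential(scale=10.0, size=200),
}

scores = shapiro_rank(columns)

# Print features from most to least normal-looking.
for name, w in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: W = {w:.3f}")
```

The bar chart drawn by Rank1D is essentially this ranking shown graphically, one bar per feature.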

  2. Linear Regression Visualization

We will create a linear regression model using scikit-learn and then visualize it using Yellowbrick.

a. Creating the model

We split the data into training and test sets and fit a linear regression model on the training set.

from sklearn.model_selection import train_test_split

from sklearn.linear_model import LinearRegression

x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=1)

model = LinearRegression().fit(x_train, y_train)

model_pred = model.predict(x_test)

b. Visualizing the Model

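We use Yellowbrick's PredictionError visualizer to plot the predicted values against the actual values.

from yellowbrick.regressor import PredictionError, ResidualsPlot

visual = PredictionError(model).fit(x_train, y_train)

visual.score(x_test, y_test)

# show() replaces poof(), which is deprecated in Yellowbrick 1.0 and later
visual.show()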
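For a scikit-learn regressor, the value returned by score is the coefficient of determination (R²), and a residuals plot such as ResidualsPlot displays the differences between actual and predicted values. As a rough sketch of both quantities in plain NumPy: the arrays below are made-up values rather than the advertising data, the function names are our own, and the sign convention (actual minus predicted) is an assumption that may differ between libraries.

```python
import numpy as np


def residuals(y_true, y_pred):
    """Residuals as actual minus predicted (sign convention assumed here)."""
    return np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    ss_res = np.sum(residuals(y_true, y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Hypothetical actual and predicted values, for illustration only.
y_actual = np.array([22.1, 10.4, 9.3, 18.5, 12.9])
y_predicted = np.array([20.5, 12.3, 12.0, 17.6, 13.2])

print("residuals:", residuals(y_actual, y_predicted))
print("R^2:", round(r_squared(y_actual, y_predicted), 3))
```

A perfect model gives R² equal to 1, while a model that always predicts the mean of the target gives 0.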

  3. Model Selection Visualization

The model selection visualizers help us inspect the performance of cross-validation and hyperparameter tuning.

Let us visualize feature importance using a Random Forest Classifier and Yellowbrick. A classifier needs a categorical target, whereas Sales is continuous, so we first convert it into two classes: above the median or not.

from sklearn.ensemble import RandomForestClassifier

from yellowbrick.model_selection import FeatureImportances

# Binary target for the classifiers: 1 if Sales is above its median, else 0
y_class = (y > y.median()).astype(int)

model = RandomForestClassifier()

viz = FeatureImportances(model)

viz.fit(x, y_class)

viz.show()

Similarly, we can visualize feature importance using Logistic Regression and Yellowbrick.

from sklearn.linear_model import LogisticRegression

model = LogisticRegression(solver="liblinear")

visual = FeatureImportances(model, stack=False, relative=False)

visual.fit(x, y_class)

visual.show()
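FeatureImportances reads the importances from the fitted model, typically its feature_importances_ or coef_ attribute, and by default (relative=True) expresses them as a percentage of the largest one. As an approximation of that scaling, assuming hypothetical coefficient values for the three advertising features and a helper name of our own:

```python
import numpy as np


def relative_importances(values):
    """Scale importances so that the largest absolute value becomes 100."""
    values = np.asarray(values, dtype=float)
    return 100.0 * values / np.abs(values).max()


feature_names = ["TV", "Radio", "Newspaper"]
coefficients = np.array([0.046, 0.189, -0.001])  # hypothetical, for illustration only

scaled = relative_importances(coefficients)

# List features from least to most important by absolute value.
for i in np.argsort(np.abs(scaled)):
    print(f"{feature_names[i]}: {scaled[i]:.1f}%")
```

Passing relative=False, as in the Logistic Regression example, keeps the raw coefficient values on the axis instead of percentages.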

  4. Textual Data Visualization

Yellowbrick can also help us analyze the properties of textual data. We can read any text file using Python's open function and visualize the frequency of its words using the Frequency Distribution Visualizer.

a. Importing Library and loading dataset

from sklearn.feature_extraction.text import CountVectorizer

from yellowbrick.text import FreqDistVisualizer

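# Read the file line by line; each line is treated as one document
with open('text.txt', 'r') as f:
    corpus = f.readlines()

vectorizer = CountVectorizer()

docs = vectorizer.fit_transform(corpus)

# get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out()
features = vectorizer.get_feature_names_out()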

b. Visualizing the frequency of words

visualizer = FreqDistVisualizer(features=features, orient='v')

visualizer.fit(docs)

visualizer.show()
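Conceptually, the frequency distribution plot shows how often each token occurs across the whole corpus. A minimal standard-library sketch of that count follows. The three-line corpus is invented for illustration, and the regular expression mirrors CountVectorizer's default behaviour of lowercasing the text and keeping tokens of two or more word characters.

```python
import re
from collections import Counter

# Same idea as CountVectorizer's default token pattern: two or more word characters.
TOKEN_PATTERN = re.compile(r"\b\w\w+\b")


def token_frequencies(documents):
    """Count how often each lowercased token appears across all documents."""
    counts = Counter()
    for doc in documents:
        counts.update(TOKEN_PATTERN.findall(doc.lower()))
    return counts


# Invented example corpus; each string plays the role of one line of text.txt.
corpus = [
    "Yellowbrick makes model visualization simple",
    "Visualization helps model selection",
    "A model is only as good as its data",
]

freqs = token_frequencies(corpus)
for token, count in freqs.most_common(3):
    print(token, count)
```

Single-character tokens such as "a" are dropped, which matches what the vectorizer does before the visualizer ever sees the counts.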

  5. Anscombe’s Quartet

Finally, let us visualize Anscombe’s Quartet, a collection of four datasets that have nearly identical summary statistics but look very different when plotted. It is a classic example of why visualization is important in machine learning.

import yellowbrick as yb

import matplotlib.pyplot as plt

ans = yb.anscombe()

plt.show()

The plots make clear how different these four datasets are despite their nearly identical statistical properties.
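To see how similar the numbers really are, the summary statistics can be computed by hand. The sketch below uses only the standard library and the published values of the quartet; the helper name pearson and the dictionary layout are our own choices.

```python
from math import sqrt
from statistics import mean, variance

# The first three datasets share the same x values.
X_SHARED = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]

QUARTET = {
    "I": (X_SHARED, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_SHARED, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_SHARED, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    spread = sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return cov / spread


# Mean and sample variance of x and y, plus their correlation, per dataset.
summary = {
    name: (mean(xs), variance(xs), mean(ys), variance(ys), pearson(xs, ys))
    for name, (xs, ys) in QUARTET.items()
}

for name, (mx, vx, my, vy, r) in summary.items():
    print(f"{name}: mean_x={mx:.2f} var_x={vx:.2f} mean_y={my:.2f} var_y={vy:.2f} r={r:.3f}")
```

The four rows agree to about two decimal places, which is exactly why the scatter plots are needed to tell the datasets apart.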

Conclusion:

In this article, we have learned about Yellowbrick, a visualization library for machine learning models and algorithms. We saw how to create different visualizations for different purposes using Yellowbrick. This is just an introduction to its capabilities; it has many more features and functions that are very helpful.

Himanshu Sharma

An aspiring Data Scientist currently Pursuing MBA in Applied Data Science, with an Interest in the financial markets. I have experience in Data Analytics, Data Visualization, Machine Learning, Creating Dashboards and Writing articles related to Data Science.