
Demonstration Of What-If Tool For Machine Learning Model Investigation

In this article, we will learn what the What-If Tool is, what its features are, and how to use it, with a walkthrough on a sample dataset.


The machine learning era has reached the stage of interpretability, where developing models and making predictions is simply not enough any more. To make a real impact and get good results, it is important to investigate and probe the dataset and the models. A good model investigation involves digging deep into a model's behaviour to find insights and inconsistencies. This task usually involves writing a lot of custom functions, but tools like the What-If Tool make probing easy and save programmers time and effort.

In this article we will learn about:

  1. What is the What-If tool?
  2. What are the features of this tool?
  3. Walkthrough with a sample dataset. 

What is the What-If tool?

The What-If Tool (WIT) is a visualization tool designed to interactively probe machine learning models. It lets users understand classification, regression and deep neural network models by providing methods to evaluate, analyse and compare them. It is user friendly and can easily be used not only by developers but also by researchers and non-programmers.

WIT was developed by Google under the People + AI Research (PAIR) program, which brings together researchers across Google to study and redesign the ways people interact with AI systems. The tool itself is open source.

What are the Features?

This tool provides multiple features and advantages for users investigating a model. Some of these features are:

  1. Visualizing the results of inference
  2. Arranging data points by similarity
  3. Editing data points to see how the model reacts to changes
  4. Comparing multiple machine learning models
  5. Comparing counterfactuals of data points
  6. Viewing confusion matrices and ROC curves
  7. Evaluating model performance by feature values
  8. Testing algorithmic constraints

WIT can be used within a Google Colab or Jupyter notebook. It can also be used with TensorBoard.

Walkthrough with a Sample Dataset

Let us take a sample dataset to understand the different features of WIT. I will use the forest fires dataset, which is available for download on Kaggle. The goal is to predict the area affected by forest fires given the temperature, month, amount of rain and other attributes.

I will implement this tool on Google Colaboratory. Before we load the dataset and process it, we first need to install WIT:

!pip install witwidget

Importing the libraries and loading the data

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.ensemble import GradientBoostingRegressor

# Mount Google Drive so the CSV can be read from it
from google.colab import drive
drive.mount('/content/gdrive')

df = pd.read_csv('/content/gdrive/My Drive/forestfires (2).csv')
df

Data splitting and pre-processing

features = df.drop('area', axis=1)
target = df[['area']]
X_train, X_test, y_train, y_test = train_test_split(features, target, test_size=0.2)

Once we have split the data, we can encode the categorical month and day columns as integers using a label encoder.

# Fit each encoder on the training split and reuse it on the test split,
# so train and test share the same integer coding
label_encoder = LabelEncoder()
X_train['month'] = label_encoder.fit_transform(X_train['month'])
X_test['month'] = label_encoder.transform(X_test['month'])

label_encoder = LabelEncoder()
X_train['day'] = label_encoder.fit_transform(X_train['day'])
X_test['day'] = label_encoder.transform(X_test['day'])
X_train

Now we can build our model. I will use scikit-learn's ensemble module and fit a gradient boosting regressor.

params = {'n_estimators': 200, 'max_depth': 10,
          'learning_rate': 0.1, 'loss': 'ls', 'random_state': 0}  # 'ls' = least squares; newer scikit-learn versions name it 'squared_error'
reg_mod = GradientBoostingRegressor(**params)
reg_mod.fit(X_train, y_train.values.ravel())  # ravel() flattens the single-column target

Now that the model is trained, we will write a prediction function, since the widget needs one to call.

def adjust_prediction(z):
    # WIT passes a list of examples ordered like the column list given to
    # WitConfigBuilder below (the features followed by 'area'), so rebuild a
    # DataFrame and drop the target column before predicting
    testing_data = pd.DataFrame(z, columns=X_test.columns.tolist() + ['area'])
    return reg_mod.predict(testing_data[X_test.columns.tolist()])

Next, we will write the code to call the widget. 

from witwidget.notebook.visualization import WitConfigBuilder
from witwidget.notebook.visualization import WitWidget

num_data = 2000     # number of test examples to load into the widget
tool_height = 1000  # height of the widget in pixels

# Attach the target column to the features so WIT can display ground truth
test_examples = np.hstack((X_test[:num_data].values, y_test[:num_data].values))
config_builder = (WitConfigBuilder(test_examples.tolist(), X_test.columns.tolist() + ['area'])
  .set_custom_predict_fn(adjust_prediction)
  .set_target_feature('area')
  .set_model_type('regression'))
WitWidget(config_builder, height=tool_height)

This opens an interactive widget with two panels.


On the left is a panel for selecting the technique to apply to the data; on the right are the data points.

In the right panel, there are options to plot features of the dataset along the X-axis and Y-axis. I will set these values and check the graphs.


Here I have set FFMC along the X-axis and area as the target. Keep in mind that these points are displayed after the regression has been performed.
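If you want to reproduce this view outside the widget, a minimal matplotlib sketch (reusing the reg_mod, X_test and y_test objects from above) could look like this:

import matplotlib.pyplot as plt
# Predicted vs actual area plotted against FFMC, similar to the widget's scatter
preds = reg_mod.predict(X_test)
plt.scatter(X_test['FFMC'], y_test['area'], alpha=0.5, label='actual area')
plt.scatter(X_test['FFMC'], preds, alpha=0.5, label='predicted area')
plt.xlabel('FFMC')
plt.ylabel('area')
plt.legend()
plt.show()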

Let us now explore each of the options provided to us. 

Data Exploration Tab

Editing and Viewing the Dataset

You can select a random data point, and the selected point is highlighted. You can also change the values of the data point and observe immediately how the prediction changes.

As you can see, changing the values changes the predicted outcome. You can change multiple values and experiment with the model's behaviour.
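The same experiment can be done programmatically. The sketch below (reusing reg_mod and X_test from earlier; the +5 degree edit to temp is an arbitrary illustration) perturbs one feature of a single test example and compares the predictions:

# Take one test example and see how the prediction reacts to an edited feature
point = X_test.iloc[[0]].copy()      # double brackets keep it a DataFrame
print('original prediction:', reg_mod.predict(point)[0])
point['temp'] = point['temp'] + 5.0  # raise the temperature by 5 degrees
print('edited prediction:', reg_mod.predict(point)[0])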

Finding Counterfactuals

Another way to understand the behaviour of a model is through counterfactuals. A counterfactual is the most similar data point that receives a noticeably different prediction, showing what small changes would flip the model's decision.

By clicking the toggle shown below, we can identify the counterfactual, which gets highlighted in green.
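WIT finds the counterfactual for us, but the idea can be sketched by hand: among the other test points, find the nearest one (here by L1 distance) whose prediction differs noticeably from the selected point's. The min_pred_gap threshold below is an arbitrary illustration, not WIT's internal setting:

# Nearest point (L1 distance) whose prediction differs by at least min_pred_gap
def nearest_counterfactual(idx, X, model, min_pred_gap=1.0):
    preds = model.predict(X)
    dists = np.abs(X.values - X.values[idx]).sum(axis=1)  # L1 distance to every point
    dists[idx] = np.inf                                   # exclude the point itself
    candidates = np.where(np.abs(preds - preds[idx]) >= min_pred_gap)[0]
    return candidates[np.argmin(dists[candidates])] if len(candidates) else None

cf = nearest_counterfactual(0, X_test, reg_mod)
if cf is not None:
    print(X_test.iloc[[0]])   # the selected point
    print(X_test.iloc[[cf]])  # its counterfactual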


Partial Dependence Plots

These plots show the effect each feature has on the trained model's predictions.

As shown below, we can see how each feature relates to the predicted target value.
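scikit-learn can produce comparable plots directly. A minimal sketch follows (PartialDependenceDisplay needs scikit-learn >= 1.0; older versions expose the same idea as plot_partial_dependence, and the three features chosen here are arbitrary):

import matplotlib.pyplot as plt
from sklearn.inspection import PartialDependenceDisplay

# Partial dependence of the prediction on a few chosen features
PartialDependenceDisplay.from_estimator(reg_mod, X_train, features=['temp', 'FFMC', 'RH'])
plt.show()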


Performance Tab

This tab lets us look at overall model performance. You can evaluate performance with respect to one or more features, and there are multiple options available for analysing it.

I have selected two features, FFMC and temp, against the area to understand performance using mean error.

If multiple models are loaded, their performance can be compared here.
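Outside the widget, a comparable mean-error check can be sketched with scikit-learn metrics. The random forest below is just a hypothetical second model for illustration, not something the article or WIT requires:

from sklearn.metrics import mean_absolute_error
from sklearn.ensemble import RandomForestRegressor

# Mean absolute error of the gradient boosting model on the test set
print('GBR MAE:', mean_absolute_error(y_test, reg_mod.predict(X_test)))

# A second model trained the same way, for comparison
rf_mod = RandomForestRegressor(n_estimators=200, random_state=0)
rf_mod.fit(X_train, y_train.values.ravel())
print('RF MAE:', mean_absolute_error(y_test, rf_mod.predict(X_test)))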

Features Tab

The Features tab shows statistics for each feature in the dataset, displayed as histograms or quantile charts. It also lets us look into the distribution of values for each feature and highlights the features that are most non-uniform compared to the others.
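The same statistics can be pulled straight from pandas, as a rough stand-in for this tab:

import matplotlib.pyplot as plt

# Count, mean, std and quantiles for every numeric feature
print(df.describe())

# Histogram of every numeric column
df.hist(figsize=(12, 10), bins=30)
plt.show()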


Identifying non-uniform features is a good way to detect and reduce bias in the model.

Conclusion

WIT is a very useful tool for analysing model performance. The ability to inspect models in a simple, no-code environment is of great help, especially from a business perspective.

It also gives insight into factors beyond training, such as why the model behaves the way it does and how well the dataset fits it.


Bhoomika Madhukar

I am an aspiring data scientist with a passion for teaching. I am a computer science graduate from Dayananda Sagar Institute. I have experience in building models in deep learning and reinforcement learning. My goal is to use AI in the field of education to make learning meaningful for everyone.