Machine learning models are widely regarded as black boxes despite their broad adoption. It is therefore essential to understand how particular predictions are made and how a model weighs the parameters it has learned. Models are usually assessed with evaluation metrics on a given test dataset. Real-world data, however, often differs from the test data, so an evaluation metric may not accurately reflect how well the model serves the product's purpose.
Alongside such metrics, examining individual predictions and their justifications is a practical way to improve a model. In this article, we will discuss debugging and visualizing machine learning models using ELI5, a Python tool that visualizes and debugs various machine learning models through a unified API. The major points to be covered in this article are given below.
Table of Contents
- Explainability and Interpretability in Machine Learning
- ELI5 (Explain Like I’m 5)
- XGBoost with ELI5
- ELI5 with Keras Implementation
- Advantages and Usage of ELI5
Now, let us start with understanding explainability and interpretability.
Explainability and Interpretability in Machine Learning
Explainability and interpretability are terms used frequently in machine learning and artificial intelligence. Although they are closely related, it is worth exploring the differences, if only to show how difficult things can get once you start looking inside machine learning systems. Interpretability is the extent to which cause and effect can be observed within a system. To put it another way, it is your ability to predict what will happen in response to a change in input or computational parameters. It is the ability to look at an algorithm and see what is going on there.
Meanwhile, explainability refers to how well the internal mechanics of a machine learning or deep learning system can be communicated in human terms. The distinction from interpretability is subtle and easy to overlook, but think of it this way: interpretability is about being able to understand the mechanics without necessarily knowing why, while explainability is the ability to explain what is happening in detail. Simple models (like linear or logistic regression) can be used to explain findings for a sample data set. Often, however, these models are insufficient, and we must turn to deep learning models, which deliver great performance but remain a mystery to the majority of data science practitioners. Machine learning models are currently used to make a variety of critical decisions, including fraud detection, credit rating, self-driving, and patient examination.
Improving the interpretability and explainability of models is therefore crucial for every practitioner, and doing it well can set your work apart. By understanding how the algorithms work, we can address the issues and goals of the problem statement correctly.
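To make the idea of interpretability concrete, here is a minimal sketch using only NumPy. It fits a linear model on synthetic data (the features, weights, and noise level are all made up for illustration) and checks that changing one input by one unit shifts the prediction by exactly that feature's learned weight, which is the "predict what will happen in response to a change in input" property described above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: y depends linearly on two made-up features plus small noise.
X = rng.normal(size=(200, 2))
true_w = np.array([3.0, -1.5])
y = X @ true_w + 0.5 + rng.normal(scale=0.01, size=200)

# Ordinary least squares; the extra column of ones learns the intercept.
X_design = np.column_stack([X, np.ones(len(X))])
coef, *_ = np.linalg.lstsq(X_design, y, rcond=None)
weights, intercept = coef[:2], coef[2]


def predict(x):
    """Prediction of the fitted linear model for one sample."""
    return x @ weights + intercept


# Raise feature 0 by one unit and observe the change in the prediction.
x = np.array([0.2, 1.0])
x_bumped = x + np.array([1.0, 0.0])
delta = predict(x_bumped) - predict(x)

print("learned weights:", weights)
print("change in prediction after bumping feature 0:", delta)
```

For a deep network no such simple relationship between one input and the output exists, which is why tools like ELI5 are needed.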
ELI5 (Explain Like I’m 5)
ELI5 is a Python toolkit that uses a uniform API to visualize and debug diverse machine learning models. It works with scikit-learn estimators (which expose the familiar fit() and predict() methods). It includes built-in support for several ML frameworks and lets you explain white-box models (linear regression, decision trees) as well as black-box models (Keras, XGBoost, LightGBM). It is applicable to both regression and classification models.
Now we are going to see how ELI5 interprets and explains a model using its eli5.show_weights and eli5.show_prediction APIs. The practical demo is divided into two parts. First, we will interpret and explain an XGBoost model. Following that, we will do the same for a Keras application.
XGBoost with ELI5
!pip install eli5

from xgboost import XGBClassifier
import eli5
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
import pandas as pd
import numpy as np
The data set we use here is scikit-learn's built-in breast cancer data set. We create a pandas data frame for it with proper column headers, because ELI5 retrieves feature names from the model when it runs.
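data = load_breast_cancer()
df = pd.DataFrame(data.data)
df.columns = data.feature_names
df['target'] = data.target

# Split features and target into train and test sets
# (test_size and random_state are assumed values; the original split is not shown)
x_train, x_test, y_train, y_test = train_test_split(
    df.drop(columns='target'), df['target'], test_size=0.2, random_state=42)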
Build a classifier:
model = XGBClassifier()
model.fit(x_train, y_train)
Now we need just two simple ELI5 API calls, as shown below.
eli5.show_weights(model, top=30)
# Explain a single test instance (row 0 is an assumed choice; the original index is not shown)
eli5.explain_prediction_xgboost(model, x_test.iloc[0])
The left side shows the weights assigned to each feature, and the right side shows the prediction for one instance.
The first table shows how XGBoost weighted each feature based on the training data. The second shows, for one particular instance, how each feature contributed to reaching a probability of 0.981 for class 1.
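The link between per-feature contributions and the final probability can be sketched with a little arithmetic. This assumes the usual setup for a binary XGBoost classifier, in which the contributions (including the bias term) add up to a raw score in log-odds and the logistic function turns that score into a probability. Under that assumption, the 0.981 reported above implies a raw score of roughly 3.94.

```python
import math


def sigmoid(score):
    """Map a raw score (log-odds) to a probability."""
    return 1.0 / (1.0 + math.exp(-score))


def logit(probability):
    """Inverse of sigmoid: the raw score that yields a given probability."""
    return math.log(probability / (1.0 - probability))


reported_probability = 0.981  # value quoted in the text above

# The raw score that the feature contributions would have to sum to.
implied_score = logit(reported_probability)
recovered_probability = sigmoid(implied_score)

print(f"implied raw score: {implied_score:.3f}")
print(f"probability recovered from that score: {recovered_probability:.3f}")
```

Reading the table this way helps when comparing instances: a feature that adds 1.0 to the score matters far more near a probability of 0.5 than near 0.98, because the logistic curve flattens at the extremes.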
Next, we will look at the same kind of interpretation for a Keras application.
ELI5 with Keras Implementation
If we have a model that takes an image as input and returns class scores (probabilities that a specific object is present in the image), we can use ELI5 to see what was in the image that caused the model to predict a specific class score.
For the Keras demo, we use a pre-trained VGG16 network and interpret its prediction for a sample image.
from tensorflow.keras.applications import VGG16
import keras

vgg16 = VGG16(include_top=True, weights='imagenet', classes=1000)

# load image
im = keras.preprocessing.image.load_img('/content/HAL-TEDBF-Fighter-Jet-With-Vikrant-Aircraft-Carrier-Art.jpg', target_size=(224, 224))
doc = keras.preprocessing.image.img_to_array(im)
doc = np.expand_dims(doc, axis=0)
doc = keras.applications.vgg16.preprocess_input(doc)

# visualize the image (array_to_img expects a single 3-D image, so drop the batch axis)
keras.preprocessing.image.array_to_img(doc[0])

# explain
eli5.show_prediction(vgg16, doc)
As you can see, ELI5 highlights the regions of the image that VGG16 relied on when classifying it.
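ELI5's Keras support is based on the Grad-CAM technique, and the core of that computation is small enough to sketch in NumPy. The arrays below are random stand-ins, not real network outputs: in an actual run the activations would come from the last convolutional layer of VGG16 (14 x 14 x 512 for a 224 x 224 input) and the gradients from backpropagating the chosen class score to that layer. This is an illustration of the idea, not ELI5's own implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the real tensors.
activations = rng.random((14, 14, 512))       # feature maps, H x W x channels
gradients = rng.normal(size=(14, 14, 512))    # d(class score) / d(activations)

# 1. One importance weight per channel: the gradient averaged over all positions.
channel_weights = gradients.mean(axis=(0, 1))             # shape (512,)

# 2. Weighted sum of the feature maps, keeping only positive evidence (ReLU).
heatmap = np.maximum(activations @ channel_weights, 0.0)  # shape (14, 14)

# 3. Scale to [0, 1] so the map can be resized and overlaid on the image.
peak = heatmap.max()
if peak > 0:
    heatmap = heatmap / peak

print(heatmap.shape, float(heatmap.min()), float(heatmap.max()))
```

The resulting low-resolution map is then upsampled to the input size and blended with the original image, which produces the highlighted regions seen in the explanation.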
Advantages and Usage of ELI5
ELI5 works with existing models and produces well-formatted results. It also allows code to be reused across different machine learning frameworks, and it handles many of the minor inconsistencies between them.
ELI5 can be used to inspect basic model parameters and to figure out how a model behaves on a global scale. It can also be used to examine specific predictions made by a single model, as well as the decisions behind them.
We tend to try many models and algorithms for a problem and choose the one that performs better than the others. Evaluating each such model in practice is a tedious task that can slow down development. By using ELI5 and mastering it for a variety of algorithms, we can more easily choose the model best suited to our task. In this article, we have seen how interpretability and explainability play an important role in that choice.