A guide to explaining feature importance in neural networks using SHAP

SHAP (SHapley Additive exPlanations) is a useful tool for understanding complex neural network models, as well as other machine learning models such as decision trees and random forests.


Published on March 22, 2022

by Waqqas Ansari

SHAP values (SHapley Additive exPlanations) help you understand complex neural network models and other machine learning models such as decision trees and random forests. In short, they show visually which features matter most when the model makes a prediction. In this article, we will look at what SHAP values are and why they are an important tool for interpreting neural network models, and at the end we will compute SHAP values to interpret a neural network. The major points to be covered in this article are listed below.

Introduction to SHAP value
Importance of SHAP values
Calculating SHAP values of Neural networks

Let’s first understand the features of SHAP.

Introduction to SHAP value

SHAP is a real breakthrough in machine learning interpretation. It works on both regression and classification problems, and it can be applied to many kinds of machine learning models, including logistic regression, SVMs, tree-based models, and deep learning models such as neural networks.

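Even when the features are correlated, as often happens in regression problems, SHAP values still distribute the prediction among the features in a consistent way, although strongly correlated features can share credit and should be read with care. This makes SHAP a useful tool for any ML developer who needs to explain model outcomes.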
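To make the idea behind the name more concrete, here is a minimal sketch of an exact Shapley value computation on a tiny hand-written function. It is not how the shap package computes its estimates, and `toy_model`, `shapley_values`, and the zero baseline are all hypothetical choices for illustration. Each feature's value is its average marginal contribution over all subsets of the other features, with "missing" features replaced by a baseline.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x; absent features take the baseline value."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                present = set(subset)
                without_i = [x[j] if j in present else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                # Marginal contribution of feature i to this coalition
                phi[i] += weight * (f(with_i) - f(without_i))
    return phi


# Hypothetical toy model with an interaction term (not the article's network)
def toy_model(v):
    return 2.0 * v[0] + v[1] * v[2]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)

print(phi)
# The contributions add up to the change in the model output
print(sum(phi), toy_model(x) - toy_model(baseline))
```

The two features in the interaction term split its effect equally, and the values sum to the difference between the prediction and the baseline prediction. This additivity is what the "Additive" in the name refers to. The exact computation grows exponentially with the number of features, which is why practical explainers rely on approximations.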


Importance of SHAP values

After building a machine learning model, the next step is to analyze it. SHAP values help us see which features are important and which contribute little, by plotting them as graphs. SHAP became popular in a very short period of time because earlier interpretation methods mostly presented results in tabular form, which made them tricky to read, whereas a visual representation of feature importance lets us grasp the result at a glance.

Calculating SHAP values of Neural networks

In this section we will implement a neural network and then calculate its SHAP values.

First of all, install the shap package in your environment.

(Note: the implementation was done in Google Colab.)

!pip install shap

Load the essential libraries that we need for building the neural network, plotting graphs, and numerical computation.

#load libraries
import tensorflow as tf
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')
import seaborn as sns
import pandas as pd
import numpy as np
import sklearn
import shap

# This sets a common size for all the figures we will draw.
plt.rcParams['figure.figsize'] = [10, 7]

Now we will import our data, taken from MachineHack’s Weekend hackathon, into a pandas DataFrame. Our target variable is “IsUnderRisk”, where 1 means under risk and 0 means not under risk.

df = pd.read_csv('/content/Train.csv')
df

The data has no issues. I deliberately chose this dataset because preprocessing is not our focus here, so I directly convert the DataFrame into an array that can be passed to the neural network.

#convert the data into array
dataset = df.values
dataset

Select X and y values

X = dataset[:, 0:7]
y = dataset[:, 7]

Store all feature names in an array and save it into the “features” variable

features = ['City', 'Location_Score', 'Internal_Audit_Score',
       'External_Audit_Score', 'Fin_Score', 'Loss_score', 'Past_Results'
       ]
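Slicing by position, as in `dataset[:, 0:7]`, depends on the column order in the CSV file. As a hedged alternative, the sketch below selects the columns by name. It assumes the CSV column names match the `features` list and the “IsUnderRisk” target exactly; `toy_df` is a hypothetical two-row stand-in for Train.csv with made-up values.

```python
import pandas as pd

features = ['City', 'Location_Score', 'Internal_Audit_Score',
            'External_Audit_Score', 'Fin_Score', 'Loss_score', 'Past_Results']
target = 'IsUnderRisk'

# Hypothetical stand-in for Train.csv; the values are made up.
toy_df = pd.DataFrame(
    [[2, 8.0, 13, 12, 9, 4, 0, 1],
     [31, 77.4, 8, 3, 3, 5, 1, 0]],
    columns=features + [target],
)

# Select inputs and target by column name instead of by position
X_by_name = toy_df[features].to_numpy(dtype=float)
y_by_name = toy_df[target].to_numpy(dtype=float)

print(X_by_name.shape, y_by_name.shape)
```

Selecting by name also guarantees that the order of the columns in the array matches the order of the names later passed to the SHAP plots.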

Standardize the feature values so that each feature has zero mean and unit variance

#process the data
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_scale = scaler.fit_transform(X)
X_scale

Split the data into training and testing sets in an 80:20 ratio

#split the data 80% training and 20% testing
X_train, X_test, y_train, y_test = train_test_split(X_scale, y, test_size=0.2, random_state = 4)
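The code above fits the scaler on the whole dataset before splitting, so the test rows influence the mean and standard deviation used for scaling. A common alternative, sketched here on synthetic data, is to split first and fit the scaler on the training rows only. The `*_demo` and `*_raw` names and the random data are hypothetical and stand in for the article's X and y.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the article's X and y (values are made up)
rng = np.random.default_rng(0)
X_demo = rng.normal(loc=5.0, scale=2.0, size=(200, 7))
y_demo = rng.integers(0, 2, size=200)

# Split first, on the raw values
X_train_raw, X_test_raw, y_train_demo, y_test_demo = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=4)

# Fit the scaler on the training rows only, then reuse it for the test rows
scaler_demo = StandardScaler()
X_train_demo = scaler_demo.fit_transform(X_train_raw)
X_test_demo = scaler_demo.transform(X_test_raw)

print(X_train_demo.shape, X_test_demo.shape)
```

For a small, clean dataset the difference is usually minor, but this ordering keeps the test set untouched until evaluation.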

Model building and compiling

#Build the model
model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(256, activation='relu'),
    tf.keras.layers.Dense(256, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid') # 1 because it's a binary classification
])

#Compile the model
model.compile(
    loss = tf.keras.losses.binary_crossentropy,
    optimizer = tf.keras.optimizers.Adam(learning_rate = 0.02),
    metrics = [
        tf.keras.metrics.BinaryAccuracy(name='accuracy'),
        tf.keras.metrics.Precision(name='precision'),
        tf.keras.metrics.Recall(name='recall')
    ]
)

hist = model.fit(X_train, y_train, epochs=100)

Now we will start calculating the SHAP values. The first line defines an explainer, and the second line calculates the SHAP values. Note that SHAP value calculation is extremely slow, so be prepared for it to take a long time. My test data has only 109 rows, so I don’t have to worry. After running the code below you will see a progress bar.

e = shap.KernelExplainer(model, X_train)
shap_values = e.shap_values(X_test)
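Much of the cost comes from the background data: KernelExplainer evaluates the model on every perturbed sample combined with every background row, so passing the full training set is expensive. One way to reduce this, sketched below with NumPy only, is to pass a random subset of rows as the background. The helper `sample_background` and the array `X_train_stand_in` are hypothetical. The shap package also ships its own helpers for this purpose (`shap.sample` and `shap.kmeans`), which may be preferable in practice.

```python
import numpy as np


def sample_background(X, n_rows=100, seed=0):
    """Return up to n_rows randomly chosen rows of X, without replacement."""
    X = np.asarray(X)
    if len(X) <= n_rows:
        return X
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=n_rows, replace=False)
    return X[idx]


# Made-up array standing in for the scaled training data
X_train_stand_in = np.random.default_rng(1).normal(size=(400, 7))
background = sample_background(X_train_stand_in, n_rows=100)
print(background.shape)  # (100, 7)

# The smaller array would then replace X_train in the explainer call, e.g.
# e = shap.KernelExplainer(model, background)
```

A smaller background trades some accuracy of the estimated SHAP values for a large reduction in runtime, so the number of rows is worth tuning.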

We can use the Shapley values to interpret our model.

shap.initjs()
# visualize the first prediction's explanation with a force plot
shap.force_plot(e.expected_value[0], shap_values[0][0], feature_names = features)

The ‘force_plot’ shows how each feature influences the output for this prediction. ‘External_Audit_Score’, ‘Internal_Audit_Score’, and ‘Fin_Score’ are the biggest contributors to the prediction.
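Both plotting calls in this article index `shap_values` with `[0]`, which assumes the explainer returned a list with one array per model output. Depending on the installed shap version, the result may instead come back as a single array with the outputs on the last axis. The helper below is a hypothetical sketch that handles either layout; the zero-filled arrays are made up and only stand in for real explainer output.

```python
import numpy as np


def shap_matrix_for_output(shap_values, output_index=0):
    """Return a 2-D (n_samples, n_features) array for one model output."""
    if isinstance(shap_values, list):
        # List layout: one (n_samples, n_features) array per output
        return np.asarray(shap_values[output_index])
    arr = np.asarray(shap_values)
    if arr.ndim == 3:
        # Array layout: (n_samples, n_features, n_outputs)
        return arr[:, :, output_index]
    # Already (n_samples, n_features)
    return arr


# Made-up arrays standing in for real explainer output
as_list = [np.zeros((109, 7))]
as_array = np.zeros((109, 7, 1))

print(shap_matrix_for_output(as_list).shape)
print(shap_matrix_for_output(as_array).shape)
```

With such a helper, the matrix passed to the summary plot has one row per test sample and one column per feature regardless of which layout the explainer produced.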

shap.summary_plot(shap_values[0], X_test, feature_names=features)

The color bar on the side shows the value of each feature, from low to high. The x-axis shows the SHAP value, that is, the feature's impact on the predicted risk: points on the positive side push the prediction toward risk (class 1), and points on the negative side push it toward no risk (class 0).

So a low value of “Internal_Audit_Score” pushes the prediction toward no risk, and a high value pushes it toward risk. For “Location_Score” the relationship is reversed: a high value pushes the prediction toward no risk, and a low value pushes it toward risk. The features are sorted by their overall importance, with the most important at the top. We can see that “Internal_Audit_Score” is the most important feature.
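The ordering in the summary plot can also be reproduced as numbers. The sketch below ranks features by their mean absolute SHAP value, which gives the same kind of ordering the plot uses. The helper `rank_features` and the small `demo_shap` matrix are hypothetical, with made-up values rather than results from the model above.

```python
import numpy as np


def rank_features(shap_matrix, feature_names):
    """Rank features by mean absolute SHAP value, most important first."""
    importance = np.abs(np.asarray(shap_matrix)).mean(axis=0)
    order = np.argsort(importance)[::-1]
    return [(feature_names[i], float(importance[i])) for i in order]


# Made-up SHAP matrix: 2 samples, 3 features
demo_names = ['feature_a', 'feature_b', 'feature_c']
demo_shap = np.array([[0.1, -0.5, 0.0],
                      [-0.3, 0.7, 0.1]])

for name, score in rank_features(demo_shap, demo_names):
    print(f"{name}: {score:.3f}")
```

Taking the absolute value matters here: a feature that pushes some predictions up and others down would otherwise average out to near zero and look unimportant.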

Final words

We started with an introduction to SHAP values and then looked at why this tool is so important for interpreting ML models. Finally, we saw in practice how SHAP values make interpreting ML models much easier.


Waqqas Ansari

Waqqas Ansari is a data science guy with a math background. He likes solving challenging business problems through predictive modelling, descriptive modelling, and machine learning algorithms. He is fascinated by new technologies, especially those relating to machine learning.