Concept-based interpretability tools assist artificial intelligence researchers and engineers in designing, developing and debugging AI models. They also help explain how AI models work, enabling businesses to assess whether the models deliver accurate results and reflect their values. One such interpretability tool is Facebook’s Captum.
Captum is a powerful and flexible interpretability library for PyTorch, which makes interpretability algorithms readily accessible to the PyTorch community. Captum supports model interpretability across modalities, including vision and text, and allows researchers to add new algorithms and benchmark their work against existing algorithms in the library. Finally, it offers tools to help developers uncover vulnerabilities using metrics and adversarial attacks.
Recently, Facebook released its latest version of Captum — Captum 0.4, which has new functionality for model understanding. Facebook AI has added tools to evaluate model robustness, improvements to its existing attribution models, and new attribution methods in the latest version.
Detecting Statistical Biases
Captum 0.4 adds Testing with Concept Activation Vectors (TCAV), which allows researchers and engineers to assess how different user-defined concepts affect a model’s prediction. It can also be used to check for algorithmic and label bias that might be embedded in networks.
Additionally, TCAV’s capabilities expand beyond currently available attribution methods, enabling researchers to quantify the importance of various inputs and measure the impact of concepts like gender and race on a model’s prediction.
Captum 0.4 comes with a generic implementation of TCAV, allowing users to define custom concepts with example inputs for different modalities, such as vision and text.
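The mechanics behind a TCAV score can be illustrated with a self-contained sketch (pure NumPy with made-up activations, not Captum’s actual API): a linear classifier is trained to separate a concept’s activations from random activations, its normal vector is the concept activation vector (CAV), and the TCAV score is the fraction of class examples whose gradient has a positive component along the CAV.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer activations (8 units): examples of the concept
# cluster in one direction, random counterexamples in another.
concept_acts = rng.normal(loc=1.0, size=(50, 8))
random_acts = rng.normal(loc=-1.0, size=(50, 8))

# Train a linear separator with a few perceptron passes; its normal
# vector is the concept activation vector (CAV).
X = np.vstack([concept_acts, random_acts])
y = np.array([1] * 50 + [-1] * 50)
w = np.zeros(8)
for _ in range(20):
    for xi, yi in zip(X, y):
        if yi * (xi @ w) <= 0:
            w += yi * xi
cav = w / np.linalg.norm(w)

# Gradient of a toy class score sum(relu(a) * v) w.r.t. activations a:
# equal to v where a > 0, and zero elsewhere. The TCAV score is the
# fraction of class examples whose gradient points along the CAV.
v = rng.normal(size=8)
class_acts = rng.normal(size=(30, 8))
grads = (class_acts > 0) * v
tcav_score = float(np.mean(grads @ cav > 0))
print(tcav_score)  # a value in [0, 1]; higher = concept more influential
```

In Captum itself, the concept and counterexample datasets, the layers to probe, and the classifier used to fit the CAV are all configurable; the sketch above fixes each of these to the simplest possible choice.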
Source: Facebook AI
The graphs above showcase visualised distributions of TCAV scores for the sentiment analysis model implemented in a Captum tutorial. As a dataset, Facebook AI researchers used movie ratings with positive sentiment. The graphs visualise TCAV scores for the positive adjectives concept along with five sets of neutral terms concepts. The positive adjectives concept is more important than the neutral concepts for both convolutional layers across all five neutral concept sets, indicating the importance of positive adjectives in predicting positive sentiment.
Robust AI Models
Deep learning techniques are often vulnerable to adversarial inputs that can fool an AI model while remaining imperceptible to humans. Captum 0.4 comes with tooling to support an improved understanding of a model’s limitations and vulnerabilities. This helps developers anticipate unforeseen issues and make the changes necessary to avoid harming or otherwise negatively affecting people.
Captum 0.4 also comes with tools to understand the robustness of the model, including implementations of adversarial attacks and robustness metrics to evaluate the impact of different attacks or perturbations on a model. The robustness metrics included in its latest version are:
- Attack Comparator: It allows users to quantify the impact of input perturbations, including text augmentations and torchvision transforms. It also helps quantify adversarial attacks on a model and compare the impact of the various attacks.
- Minimal Perturbation: It identifies the minimum perturbation required to cause a model to misclassify the perturbed input.
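The idea behind a minimal-perturbation search can be sketched with a toy linear classifier (pure NumPy; the weights, input and attack direction below are made up for illustration and are not Captum’s API): binary-search the smallest step along an adversarial direction that flips the model’s prediction.

```python
import numpy as np

# Toy linear classifier (hypothetical weights): class 1 when w.x + b > 0.
w = np.array([2.0, -1.0])
b = -0.5

def predict(x):
    return int(w @ x + b > 0)

x = np.array([1.0, 0.5])   # logit = 2.0 - 0.5 - 0.5 = 1.0, so class 1
orig = predict(x)

# FGSM-style adversarial direction: step against the sign of the
# logit's gradient, which for a linear model is just w.
direction = -np.sign(w)

# Binary-search the smallest step size that flips the prediction.
lo, hi = 0.0, 2.0          # hi is a step size known to flip the class
for _ in range(50):
    mid = (lo + hi) / 2
    if predict(x + mid * direction) != orig:
        hi = mid
    else:
        lo = mid
min_eps = hi
print(min_eps)  # ~0.3333: the logit 1 - 3*eps crosses zero at eps = 1/3
```

Captum’s tooling generalises this idea to arbitrary perturbation functions (noise, text augmentation, image transforms) rather than a single attack direction.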
This new tooling enables developers to better understand potential model vulnerabilities and to analyse counterfactual examples to comprehend a model’s decision boundary.
Source: Facebook AI
Relevance Propagation & Attribution
Facebook AI has implemented a new attribution algorithm, in collaboration with Technische Universität Berlin, to offer a new perspective for explaining model predictions. Captum 0.4 adds both LRP and a layer-attribution variant, Layer LRP.
The Layer-wise Relevance Propagation (LRP) algorithm is based on a backward-propagation mechanism applied sequentially to all layers of the model. The model’s output score represents the initial relevance, which is then decomposed into values for each neuron of the underlying layers.
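One backward step of this decomposition can be sketched for a single linear layer using the common epsilon rule (pure NumPy with made-up weights, not Captum’s implementation): each output’s relevance is shared among the input neurons in proportion to their contributions, and the total relevance is approximately conserved as it flows backwards.

```python
import numpy as np

# One LRP-epsilon step through a linear layer z = W @ a + b (toy numbers).
W = np.array([[1.0, -2.0],
              [0.5, 1.5]])
b = np.array([0.1, -0.1])
a = np.array([2.0, 1.0])   # input activations
z = W @ a + b              # layer outputs

R_out = z                  # initial relevance: the output scores themselves
eps = 1e-9                 # stabiliser to avoid division by zero

# Epsilon rule: share each output's relevance R_out[k] among the inputs
# in proportion to their contributions a[j] * W[k, j] to z[k].
s = R_out / (z + eps * np.sign(z))
R_in = a * (W.T @ s)

print(R_in)                     # relevance attributed to each input neuron
print(R_in.sum(), R_out.sum())  # sums match, up to the bias's share
```

Repeating this step layer by layer, from the output back to the input, yields the per-input relevance scores that LRP reports as attributions.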
Additionally, in Captum 0.4, the Facebook AI team has added tutorials, improvements and bug fixes to the existing attribution methods.