“Are easy-to-interpret neurons actually necessary? It might be like studying automobile exhaust to understand automobile propulsion.”
Understanding the underlying mechanisms of deep neural networks (DNNs) typically relies on building intuition by emphasising the sensory or semantic features of individual examples. Knowing what a model understands, and why, is crucial for reproducing and improving AI systems.
Class selectivity is commonly used to analyse the properties of individual neurons and, through them, the network as a whole. The researchers define “class selectivity” as the preference of an individual neuron for a specific class of image, for example a neuron that activates for images of cats but not for other types of images. Selectivity is intuitive and easy to understand. However, it remains unknown whether it is necessary and/or sufficient for DNNs to learn class selectivity in individual units. To explore whether easy-to-interpret neurons actually hinder the learning process in deep learning models, researchers at Facebook AI published three works covering interpretability and selectivity.
Why Class Selectivity Can Be Bad For AI
Despite the widespread association of class selectivity with interpretability in deep neural networks, the researchers noted, there has been surprisingly little research into whether easy-to-interpret neurons are actually necessary. Such research has only recently begun, and the results so far are conflicting, wrote the researchers at Facebook AI.
“…class selectivity in individual units is neither necessary nor sufficient for convolutional neural networks (CNNs).”
The AI community has been hit by a new wave of interpretability research. More researchers have started focussing on these challenges and on making AI more understandable and intuitive. According to the FB team, this has led to an over-reliance on intuition-based methods, which, they warn, can be misleading if not rigorously tested and verified. Models should not just be intuitive but also empirically grounded.
To test the role of class selectivity directly, the researchers developed a technique that acts like a knob to increase or decrease the class selectivity of neurons. While training a network to classify images, they added an incentive to decrease (or increase) the amount of class selectivity in its neurons. They did this by adding a class selectivity term to the loss function used to train the networks, and they controlled the importance of class selectivity to the network with a single parameter.
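The idea of a loss term weighted by a single parameter can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the researchers' implementation: the article does not give the formula, so the selectivity index used here (the gap between a unit's highest class-mean activation and the average of its other class means, normalised by their sum) is an assumed definition, and the names `class_selectivity_index`, `regularized_loss` and `alpha` are hypothetical. The sign convention is also an assumption: a negative `alpha` penalises selectivity and a positive `alpha` rewards it.

```python
import numpy as np

def class_selectivity_index(activations, labels, num_classes, eps=1e-7):
    """Per-unit selectivity in [0, 1] (assumed definition, see lead-in).

    activations: (n_samples, n_units), non-negative (e.g. post-ReLU).
    labels: (n_samples,) integer class ids; every class must appear.
    """
    # Mean activation of each unit for each class: (num_classes, n_units)
    class_means = np.stack(
        [activations[labels == c].mean(axis=0) for c in range(num_classes)]
    )
    mu_max = class_means.max(axis=0)  # most-preferred class
    mu_rest = (class_means.sum(axis=0) - mu_max) / (num_classes - 1)
    return (mu_max - mu_rest) / (mu_max + mu_rest + eps)

def cross_entropy(logits, labels):
    """Mean softmax cross-entropy, computed in a numerically stable way."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def regularized_loss(logits, activations, labels, num_classes, alpha):
    """Classification loss minus alpha times mean selectivity (the 'knob')."""
    si = class_selectivity_index(activations, labels, num_classes).mean()
    return cross_entropy(logits, labels) - alpha * si

# Toy batch: unit 0 fires only for class 0, unit 1 fires equally for both.
labels = np.array([0, 0, 1, 1])
activations = np.array([[1.0, 0.5], [1.0, 0.5], [0.0, 0.5], [0.0, 0.5]])
logits = np.array([[2.0, 0.0], [2.0, 0.0], [0.0, 2.0], [0.0, 2.0]])

si = class_selectivity_index(activations, labels, num_classes=2)
loss_discourage = regularized_loss(logits, activations, labels, 2, alpha=-1.0)
loss_encourage = regularized_loss(logits, activations, labels, 2, alpha=1.0)
```

In this toy batch the first unit is almost perfectly selective and the second is not selective at all. With `alpha=0.0` the loss reduces to ordinary cross-entropy, so the single parameter alone determines how strongly selectivity is pushed up or down during training.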
This work underlines the importance of falsifiability, the foundation of all scientific endeavours, in the domain of AI. The researchers showed how we can falter even in the process of making things more understandable, and why it is necessary to correctly interpret the intent behind ML interpretability.
In one of the related papers, the authors discussed how interpretability research suffers from an over-reliance on intuition-based approaches that in some cases have led people to an illusion of progress.
Furthermore, to keep interpretability claims falsifiable, the authors recommend remembering the “human” in “human explainability”.
The experiments conducted by the FB AI team led to the following results:
- Class selectivity is not integral to DNN function and can sometimes have a negative effect.
- Reduced class selectivity can make neural networks more robust against noise.
- Decreasing class selectivity also makes neural networks more vulnerable to targeted attacks in which images are intentionally manipulated in order to fool the networks.
More broadly, concluded the researchers, these results warn against focusing on the properties of single neurons as the key to understanding how DNNs function. Researchers also hope that future work will address the question: Why do networks learn class selectivity if it’s not necessary for good performance?
Read the paper on class selectivity here.