With AI and machine learning making inroads into several domains, including critical ones like healthcare, finance and policy, there is a growing demand to make these models interpretable to ML practitioners and domain experts. AI suffers from a black-box problem, which is a major hurdle to gaining public trust and acceptance. To assess the reliability of these models, their systematic errors and the biases ingrained in them, practitioners seek to understand how the models behave. The field devoted to this is called explainable AI.
To this end, many techniques have been proposed to explain complex AI models in a post hoc manner, including LIME, SHAP, gradient times input, SmoothGrad, Integrated Gradients, etc.
That said, explainable AI has come under scrutiny for multiple reasons. Critics have often pointed to the ambiguous nature of the term ‘explainability’, going as far as calling it virtue signalling that is severely disconnected from reality. While this is a much broader issue that is still being resolved, researchers from Harvard University, Carnegie Mellon University and other institutions have come together to analyse the specific ‘disagreement problem’ of explainable AI.
The disagreement problem
Post hoc methods in explainable AI are gaining popularity, owing mainly to their generality. They are being used in critical fields like medicine, law, policymaking and finance, which makes it essential that the explanations they produce are reliable. Prior research on the behaviour of these explanation methods has found that several important aspects of them remain unexplored. One of the major challenges observed is inconsistency: when multiple explanation methods are applied to the same model, the explanations they generate often disagree with each other. For example, the top-k most important features output by different methods may vary. This drawback has been termed the ‘disagreement problem’.
When such disagreements arise, practitioners need to handle them carefully, lest misleading explanations lead to catastrophic consequences. There is no general-purpose evaluation metric for ascertaining and comparing the quality of explanations, which poses a major hurdle in addressing the disagreement problem.
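To make the top-k example concrete, here is a minimal Python sketch of how the overlap between two explanations could be measured. The function names and the attribution values are hypothetical and chosen only for illustration; this is one simple way to quantify the idea, not necessarily the measure used in the study.

```python
def top_k_features(attributions, k):
    """Return the indices of the k features with the largest absolute attribution."""
    ranked = sorted(
        range(len(attributions)),
        key=lambda i: abs(attributions[i]),
        reverse=True,
    )
    return set(ranked[:k])


def top_k_overlap(attr_a, attr_b, k):
    """Fraction of top-k features shared by two explanations (1.0 means full agreement)."""
    if len(attr_a) != len(attr_b):
        raise ValueError("both explanations must cover the same features")
    if not 1 <= k <= len(attr_a):
        raise ValueError("k must be between 1 and the number of features")
    shared = top_k_features(attr_a, k) & top_k_features(attr_b, k)
    return len(shared) / k


# Hypothetical feature attributions from two methods for the same prediction.
method_a = [0.40, -0.30, 0.05, 0.20, -0.01]
method_b = [0.10, -0.35, 0.30, 0.02, 0.25]

overlap = top_k_overlap(method_a, method_b, k=3)
print(f"Top-3 overlap: {overlap:.2f}")
```

With these made-up values, the two methods share only one of their three most important features, which is the kind of mismatch the disagreement problem describes.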
Practitioner’s perspective of the disagreement problem
The authors of the study titled ‘The Disagreement Problem in Explainable Machine Learning: A Practitioner’s Perspective’ set out to characterise the disagreement problem, gauge its extent and scope in the real world, and examine how it is resolved in practice.
To this end, the authors first obtained inputs from ML practitioners and data scientists who regularly work with explainability tools on what really constitutes the disagreement problem and how frequently it is encountered in the day-to-day workflow. The work mainly focused on local explanation methods like LIME and SHAP that output feature attributions.
Using the inputs obtained from these experts, the authors derived insights and proposed a novel evaluation framework that quantitatively measures the disagreement between two explanation methods. The framework supports empirical analysis on real-world data to determine how much the methods disagree.
The authors experimented with four real-world datasets, six state-of-the-art explanation methods and several popular predictive models, including tree-based models, deep neural networks, LSTM and ResNet. An online user study was conducted to gauge which explanation method experts would rely on in case of disagreement. The participants were asked to describe in detail the strategies they use to resolve explanation disagreements in their daily workflow.
The researchers found that state-of-the-art explanation methods often disagree in the explanations they produce, a finding confirmed by the empirical analysis. There is also no principled approach that ML practitioners can use to resolve these disagreements. Further, 86 per cent of the online user study responses showed that ML practitioners either use arbitrary heuristics or are completely unaware of how to solve the disagreement problem.
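As a rough illustration of what quantifying disagreement between two explanation methods can look like, the Python sketch below computes two simple scores over a pair of feature-attribution vectors: a Spearman rank correlation over importance ranks and the fraction of features whose attributions share a sign. These are assumptions made for illustration and are not presented as the framework's actual metrics; the function names and attribution values are hypothetical, and the rank correlation formula assumes no ties in absolute attribution.

```python
def importance_ranks(attributions):
    """Rank features by absolute attribution; rank 0 is the most important.

    Assumes no two features have the same absolute attribution.
    """
    order = sorted(
        range(len(attributions)),
        key=lambda i: abs(attributions[i]),
        reverse=True,
    )
    ranks = [0] * len(attributions)
    for rank, feature_index in enumerate(order):
        ranks[feature_index] = rank
    return ranks


def rank_correlation(attr_a, attr_b):
    """Spearman rank correlation between two importance rankings (no-ties formula)."""
    n = len(attr_a)
    if n != len(attr_b) or n < 2:
        raise ValueError("need two explanations over the same 2+ features")
    ranks_a = importance_ranks(attr_a)
    ranks_b = importance_ranks(attr_b)
    squared_diffs = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * squared_diffs / (n * (n * n - 1))


def sign_agreement(attr_a, attr_b):
    """Fraction of features to which both explanations assign the same sign."""
    if len(attr_a) != len(attr_b) or not attr_a:
        raise ValueError("need two non-empty explanations over the same features")
    same = sum(
        1
        for a, b in zip(attr_a, attr_b)
        if (a > 0) == (b > 0) and (a < 0) == (b < 0)
    )
    return same / len(attr_a)


# Hypothetical attributions from two explanation methods for one prediction.
explanation_a = [0.40, -0.30, 0.05, 0.20, -0.01]
explanation_b = [0.10, -0.35, 0.30, 0.02, 0.25]

rho = rank_correlation(explanation_a, explanation_b)
signs = sign_agreement(explanation_a, explanation_b)
print(f"rank correlation: {rho:.2f}, sign agreement: {signs:.2f}")
```

A rank correlation near 1 would mean the two methods order the features almost identically, while values near or below 0 indicate that their rankings have little in common. The sign score captures a separate kind of disagreement: whether a feature is said to push the prediction up or down.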