Recently, MIT researchers published a paper, The Need for Interpretable Features: Motivation and Taxonomy, highlighting the importance of improving feature interpretability in machine learning models so that their outputs can be better understood.
In machine learning, interpretability refers to the degree to which a model can be understood in human terms. Interpretable ML models help decision-makers understand why the model predicted a certain outcome, such as the creditworthiness of a customer. In addition, interpretability can be a useful debugging tool for detecting bias in machine learning models.
In their formal literature review, the researchers found that although there is widespread agreement that interpretable features are important for various reasons, there is little work formalizing what makes a feature interpretable, and almost no work quantifying the interpretability of features. The team found that an “interpretable feature” is often defined simply as one that was human-generated or worded in human-readable language.
Lessons from real-world domains
To understand machine-learning usability challenges, the researchers worked with experts (who generally do not have ML expertise) in five real-world domains. “We found that out in the real world, even though we were using state-of-the-art ways of explaining machine learning models, there is still a lot of confusion stemming from the features, not from the model itself,” said Alexandra Zytek, lead author of the paper.
In one case, the researchers conducted formal user studies in which child welfare screeners were provided with feature contribution explanations alongside the predicted risk score to aid their screening decisions. The team found that most confusion and distrust in the model arose from the features, despite those features being interpretable by the usual definitions — they were hand-selected by humans and presented in readable, natural language. The main challenges arose from features that were worded confusingly, or that seemed, to users, unrelated to the prediction target.
In another case, the researchers built ML models using Electronic Health Record (EHR) data to predict the probability of patients facing complications after cardiac surgeries. In this case, although the model considered features like the trend of the patient’s heart rate over time to make the prediction, doctors were interested to know how strongly the patient’s heart rate data influenced that prediction.
“With interpretability, one size doesn’t fit all. When you go from area to area, there are different needs. And interpretability itself has many levels,” said Kalyan Veeramachaneni, co-author and principal research scientist in the Laboratory for Information and Decision Systems (LIDS).
Taxonomy of ML features
The study highlights the need for an interpretable feature space that closely aligns with real-world concepts and human cognition. An interpretable feature space enables users with no ML expertise to make decisions. Based on real-world experiences across domains, the researchers suggested a taxonomy of feature properties that can make features more or less interpretable for different decision makers. They also highlighted which properties are likely most important to particular users.
For example, if the goal is to improve model performance, then features should be in a form that is model compatible and statistically correlated to the target variable. However, if the goal is to enable better decision making, the features should be described so that users can understand and draw conclusions from them.
The researchers identified five categories of users: developers, theorists, ethicists, decision makers, and subjects impacted by a machine-learning model’s predictions. Then, they defined the properties that make features interpretable for each category. They also provided guidelines on how developers can transform features into simpler formats that laypersons can understand, using what they call “interpretable transforms”.
Traditionally, most work on feature engineering has focused on the model-ready feature state. These transforms generally convert the data to a form that is compatible with the model, or improve performance by factoring in an understanding of the domain. Model-ready transforms will often reduce the interpretability of features. For example, in the case of child welfare, one-hot encoding (a technique that converts a categorical feature into a set of binary indicator columns an ML algorithm can process) reduces the “human-wordedness” of the features. In such cases, interpretable transforms come into play.
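As a minimal sketch of this trade-off (the “housing_status” feature here is hypothetical, not from the paper), one-hot encoding in pandas turns a single readable column into several model-ready indicator columns:

```python
import pandas as pd

# Hypothetical child-welfare feature: one human-readable column
df = pd.DataFrame({"housing_status": ["stable", "temporary", "homeless", "stable"]})

# One-hot encoding yields model-compatible 0/1 indicator columns, but the
# single readable feature is now scattered across several columns
encoded = pd.get_dummies(df, columns=["housing_status"])
print(encoded.columns.tolist())
# ['housing_status_homeless', 'housing_status_stable', 'housing_status_temporary']
```

The model can now consume the feature, but a screener reading an explanation sees three abstract indicator columns instead of one familiar concept.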
The screening task requires features that are meaningful, understandable, and human-worded. These properties can be achieved in part through feature selection and semantic binning, while ensuring the features presented are categorical and unstandardized. Converting data into categorical form reduced the cardinality of the features. In addition, semantic binning improved the interpretability of the features by allowing users to refer to age categories when making their decisions; the age categories were binned based on stages of child development such as infant, toddler, and teenager.
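A hedged sketch of semantic binning with pandas (the bin edges below are illustrative assumptions, not the paper's exact cutoffs): a high-cardinality numeric age is replaced with a small set of categories screeners already reason about.

```python
import pandas as pd

# Hypothetical ages in years
ages = pd.Series([0.5, 2, 7, 14, 17])

# Semantic binning: map a continuous feature onto named stages of
# child development (illustrative bin edges)
stages = pd.cut(
    ages,
    bins=[0, 1, 4, 12, 18],
    labels=["infant", "toddler", "child", "teenager"],
)
print(stages.tolist())
# ['infant', 'toddler', 'child', 'teenager', 'teenager']
```

Unlike arbitrary equal-width bins, the category labels carry domain meaning, so an explanation such as “age category: infant” is directly actionable for the screener.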
Similarly, in the case of healthcare, features need to be simulatable so that doctors can understand how strongly the patient’s heart rate data influenced the ML model’s prediction and provide treatment accordingly. To meet this need, the researchers connected the features back to the raw data, i.e., they used a patient’s pulse signal rather than the aggregated feature MEAN(pulse).
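A small sketch of why the raw signal matters (the pulse readings below are made up for illustration): a single aggregate like MEAN(pulse) can hide exactly the event a doctor would act on.

```python
import numpy as np

# Hypothetical raw pulse readings (beats per minute) sampled over time
pulse = np.array([72, 75, 71, 110, 74, 73])

# Model-ready feature: one aggregate value the model consumes
mean_pulse = pulse.mean()
print(round(mean_pulse, 1))  # 79.2 — looks unremarkable on its own

# Linking the feature back to the raw signal reveals the transient
# spike that the aggregate smooths away
print(int(pulse.argmax()), int(pulse.max()))  # sample 3 spiked to 110 bpm
```

Exposing the raw signal alongside the aggregate lets a doctor mentally “simulate” the feature: they can see what the model summarized and judge whether that summary is clinically meaningful.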
Although a model may require interpretable features to be useful, transforming to the interpretable feature space always carries some risk. Some transformations can bias the explanation. At times, incorporating only interpretable features can prevent the model from detecting unexpected patterns that may nevertheless be real. Therefore, developers need to weigh these trade-offs consciously when selecting features.