Learning contextually is associated with intelligence. In machines, this characteristic has so far been elusive. However, a group of researchers, in their recent work, claim that machines often reach their training targets in an unintended way, through what they call shortcut learning. The emergence of shortcut learning coincides with the increased usage of deep neural networks, which are fed vast amounts of data.
Identifying Shortcut Learning
For instance, consider a network that is trained to classify cows in images. If the network fumbles when it is shown a cow on a beach, it is likely that the model learned to associate cows with their typical surroundings, such as grass or mountains. This is dangerous because misclassification due to a change in setting is a typical display of bias; here, the bias is towards location. Now imagine a similar situation in self-driving cars or medical diagnosis. Such models cannot afford to deviate from the target variable.
Many failure cases, the authors observe, are not independent phenomena but are connected: DNNs follow unintended ‘shortcut’ strategies.
If cows happen to be on grassland in most of the training data, detecting grass instead of cows becomes a successful strategy for solving the classification problem in an unintended way. Worse yet, in one case a machine classifier successfully detected pneumonia in X-ray scans from several hospitals, but its performance was surprisingly low on scans from new hospitals: the model had learned to identify particular hospital systems with near-perfect accuracy, for instance by detecting a hospital-specific metal token on the scan.
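The cow-on-grass failure mode is easy to reproduce in miniature. The sketch below (our own illustration, not the authors' code) fits a trivial linear classifier on synthetic data in which a "background" feature correlates with the label during training, then breaks that correlation at test time; all features and numbers are invented:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

def features(labels, background):
    """Object feature is genuine but noisy; background feature is spurious."""
    obj = labels + rng.normal(0.0, 0.8, labels.size)
    return np.column_stack([obj, background, np.ones(labels.size)])

# Training data: the 'grass' background agrees with the 'cow' label 95% of the time.
y_train = rng.integers(0, 2, n)
bg_train = np.where(rng.random(n) < 0.95, y_train, 1 - y_train) + rng.normal(0.0, 0.1, n)
X_train = features(y_train, bg_train)

# Least-squares linear classifier, a stand-in for a 'lazy' DNN.
w, *_ = np.linalg.lstsq(X_train, y_train - 0.5, rcond=None)

def accuracy(X, y):
    return float(np.mean((X @ w > 0) == y))

# OOD test data: cows on a beach, so the background cue no longer tracks the label.
y_ood = rng.integers(0, 2, n)
bg_ood = rng.normal(0.5, 0.1, n)
X_ood = features(y_ood, bg_ood)

print(f"in-distribution accuracy: {accuracy(X_train, y_train):.2f}")
print(f"OOD accuracy:             {accuracy(X_ood, y_ood):.2f}")
```

Because the background cue is cleaner than the genuine object cue, the least-squares fit leans on it, and accuracy drops as soon as the correlation is broken.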
These so-called dataset biases have long been known to be problematic for machine learning algorithms.
Shortcut learning currently occurs across deep learning, causing machines to fail unexpectedly. Many individual elements of shortcut learning were identified long ago by parts of the machine learning community, and some have already seen substantial progress. Still, a variety of approaches is currently being explored without a commonly accepted strategy. The authors outline three actionable steps towards diagnosing and analysing shortcut learning.
When changes to images, such as rotation or the addition of noise, interact with the shortcut features that DNNs are sensitive to, they completely derail neural network predictions.
This highlights that generalisation failures are neither a failure to learn nor a failure to generalise at all, but a failure to generalise in the intended direction: generalisation and robustness can be considered the flip side of shortcut learning.
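One simple way to probe this sensitivity is a consistency check: perturb each input and count how often the predicted label survives. The "model" and perturbation below are placeholders we made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

def prediction_consistency(predict, X, perturb, trials=10):
    """Fraction of inputs whose predicted label survives a random perturbation."""
    base = predict(X)
    agree = np.zeros(len(X))
    for _ in range(trials):
        agree += predict(perturb(X)) == base
    return float(np.mean(agree / trials))

# Toy 'model': thresholds the mean pixel intensity of an 8x8 'image'.
def predict(X):
    return (X.mean(axis=(1, 2)) > 0.5).astype(int)

X = rng.random((100, 8, 8))

# Additive-noise perturbation of the kind the paper discusses.
def add_noise(X):
    return np.clip(X + rng.normal(0.0, 0.2, X.shape), 0.0, 1.0)

consistency = prediction_consistency(predict, X, add_noise)
print(f"prediction consistency under noise: {consistency:.2f}")
```

If the model relied only on robust features, consistency would stay near 1.0; scores well below that indicate predictions riding on fragile, shortcut-like cues.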
So far, we have been talking about shortcuts in computer vision tasks. The authors, however, also shed light on how shortcut learning keeps surfacing in other domains, such as NLP and reinforcement learning. Here are a few examples:
- NLP: The widely used language model BERT has been found to rely on superficial cue words. For instance, within a dataset of natural-language arguments, it learned that the mere presence of ‘not’ was sufficient to perform above chance at finding the correct line of argumentation.
- Reinforcement learning: When an RL agent was tasked to play Tetris, instead of learning how to play the game, the algorithm learned to pause it to evade losing. There is also the curious case of reward tampering in RL systems: agents in an RL environment may find a way to obtain a reward without doing the task, which is also called reward hacking.
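The BERT finding above can be caricatured with a cue-only baseline: a "classifier" that looks at nothing but the presence of ‘not’. The tiny dataset below is invented for illustration; in the actual study the cue emerged from a real argument-reasoning corpus:

```python
# Hypothetical toy dataset: by construction, 'not' co-occurs with label 1,
# mimicking the spurious cue found in the real corpus.
data = [
    ("the evidence does not support the claim", 1),
    ("this is not a valid warrant", 1),
    ("the conclusion does not follow", 1),
    ("the warrant is not convincing", 1),
    ("the premise supports the conclusion", 0),
    ("the claim follows from the evidence", 0),
    ("the argument holds", 0),
    ("the reasoning is sound", 0),
]

def not_cue_baseline(sentence):
    """Predict label 1 iff the word 'not' appears -- no understanding required."""
    return int("not" in sentence.split())

accuracy = sum(not_cue_baseline(s) == y for s, y in data) / len(data)
print(accuracy)  # 1.0 on this toy set, far above the 0.5 chance level
```

A model that matches such a baseline may have learned nothing more than the cue itself, which is exactly why cue-only baselines are a useful diagnostic.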
Avoiding Shortcut Learning
The authors, in their work, double down on the importance of making out-of-distribution (OOD) generalisation tests a standard practice. The ability to recognise data that is anomalous or significantly different from the training distribution is key to deploying a successful ML model. This is particularly important for deep neural network classifiers, which might classify such OOD inputs into in-distribution classes with high confidence, and critically so when these predictions inform real-world decisions.
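One common OOD-detection baseline that such tests can build on is thresholding the maximum softmax probability: confidently peaked outputs are treated as in-distribution, flat ones as suspect. The logits and threshold below are invented for demonstration:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def flag_ood(logits, threshold=0.9):
    """Flag inputs whose maximum softmax probability falls below the threshold."""
    return softmax(logits).max(axis=-1) < threshold

# In-distribution inputs tend to yield peaked logits; OOD inputs flatter ones.
logits_in = np.array([[4.0, 0.1, 0.2],
                      [0.3, 5.0, 0.1]])
logits_ood = np.array([[0.9, 1.0, 1.1]])

print(flag_ood(logits_in))   # [False False]
print(flag_ood(logits_ood))  # [ True]
```

This baseline is far from perfect, which is precisely the paper's point: shortcut-driven models can be confidently wrong on OOD inputs, so explicit OOD test sets are still needed.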
Avoiding reliance on unintended cues, the authors write, can be achieved by designing architectures and data-augmentation strategies that discourage the learning of shortcut features. For example, if the orientation of an object does not matter for its category, either data augmentation or hard-coded rotation invariance can be applied. This strategy can be applied to almost any well-understood transformation of the inputs and probably finds its most general form in AutoAugment as an augmentation strategy.
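The rotation example can be sketched as a simple data-augmentation step. The batch shapes below are arbitrary stand-ins for real image data:

```python
import numpy as np

rng = np.random.default_rng(2)

def augment_rotations(images, labels):
    """If orientation is irrelevant to the class, add 90-degree rotated copies
    so the model cannot use orientation as a shortcut feature."""
    rotated = [np.rot90(images, k=k, axes=(1, 2)) for k in range(4)]
    return np.concatenate(rotated), np.tile(labels, 4)

images = rng.random((10, 8, 8))          # batch of 10 toy 8x8 'images'
labels = rng.integers(0, 2, 10)
aug_images, aug_labels = augment_rotations(images, labels)

print(aug_images.shape, aug_labels.shape)  # (40, 8, 8) (40,)
```

Training on the augmented batch makes the four orientations equally likely per class, so orientation carries no label information for the model to latch onto.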
The authors conclude by warning about the pitfalls of sustained shortcut learning in ML systems. They firmly believe that the presence of shortcut learning adds to the growing challenges of establishing a fair and robust ML ecosystem.