Overfitting is a common phenomenon in deep learning in which a model learns the training data too well. In other words, the model becomes far less useful once it is applied to data that differs even slightly from the training set, which is often the case in real-world problems. Machine learning engineers can afford neither too many errors nor zero training error. With growing data volumes and model complexity, they often walk a tightrope.
To address this, a team from the University of Tokyo and their collaborators has come up with a novel yet simple mechanism, a mathematical expression, that tackles the problem of overfitting. They call the method ‘flooding’.
In their recent work, they discuss flooding in detail and show how it can help recover the performance that degrades with overfitting.
To identify whether overfitting is happening, the authors highlight two indicators:
- When both the training and test losses are decreasing, but the former is shrinking faster than the latter and
- When the training loss is decreasing, but the test loss is increasing
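The two indicators above can be checked mechanically from per-epoch loss histories. The following is a minimal sketch, not the authors' code: the function name `overfitting_indicator` and its return labels are hypothetical, and it only compares the last two epochs, which would be noisy on real training curves.

```python
from typing import Sequence


def overfitting_indicator(train_losses: Sequence[float],
                          test_losses: Sequence[float]) -> str:
    """Label the most recent epoch using the two overfitting indicators.

    Hypothetical helper for illustration: it looks only at the change
    between the last two recorded epochs.
    """
    if len(train_losses) != len(test_losses) or len(train_losses) < 2:
        raise ValueError("need two loss histories of equal length >= 2")

    d_train = train_losses[-1] - train_losses[-2]  # negative means decreasing
    d_test = test_losses[-1] - test_losses[-2]

    # Indicator 2: training loss decreases while test loss increases.
    if d_train < 0 and d_test > 0:
        return "test loss rising"
    # Indicator 1: both decrease, but the training loss shrinks faster.
    if d_train < 0 and d_test < 0 and d_train < d_test:
        return "gap widening"
    return "none"
```

In practice one would likely smooth the curves or compare over a window of epochs before acting on either label.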
The objective of this work is to make the training loss float around a small constant value so that it never approaches zero. As the name suggests, flooding is analogous to filling an empty tank with water so that there is a new surface between the atmosphere and the bottom of the vessel. In short, it maintains a threshold. When training neural networks, the flooding level is a single value, a threshold that tells the algorithm when the training loss has dropped below it.
The authors' idea is to use this technique to force the training loss to stay positive. As long as the flooding level is not too large, this does not necessarily mean that the training error will also be positive.
As illustrated in the picture above, three different concepts related to overfitting can be distinguished:
- [A] — the generalisation gap increases, while training and test losses decrease
- [B] — the test loss starts to rise
- [C] — the training loss nears zero
Flooding avoids [C] by holding the training loss around a constant level, and this leads to a decreasing test loss once again.
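The algorithm of flooding is simple and can be written as follows:
$$\tilde{J}(\theta) = |J(\theta) - b| + b$$
Here J denotes the original learning objective, $\tilde{J}$ is the flooded objective that is actually optimised, b > 0 is the flooding level specified by the user, and θ is the model parameter.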
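To see why this objective keeps the training loss near b, it helps to differentiate it. The short derivation below restates the flooded objective with J, b and θ as defined here; the case split is a standard property of the absolute value (the objective is not differentiable exactly at J(θ) = b).

```latex
\begin{align*}
\tilde{J}(\theta) &= \lvert J(\theta) - b \rvert + b \\
\nabla_\theta \tilde{J}(\theta)
  &= \operatorname{sign}\bigl(J(\theta) - b\bigr)\,\nabla_\theta J(\theta)
   = \begin{cases}
       +\nabla_\theta J(\theta) & \text{if } J(\theta) > b \\
       -\nabla_\theta J(\theta) & \text{if } J(\theta) < b
     \end{cases}
\end{align*}
```

Above the flooding level, the update is ordinary gradient descent on J. Below it, the sign flips and the update becomes gradient ascent on J, which pushes the training loss back up toward b.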
Flooding can be implemented using PyTorch as follows:
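outputs = model(inputs)
loss = criterion(outputs, labels)
flood = (loss - b).abs() + b  # flooding in just one line
optimizer.zero_grad()
flood.backward()  # backpropagate the flooded loss instead of the raw loss
optimizer.step()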
For the experiments, the authors chose six benchmark datasets: MNIST, Fashion-MNIST, Kuzushiji-MNIST, CIFAR-10, CIFAR-100 and SVHN. The results on these datasets show that flooding gives better accuracy in most cases. The other key findings from this work can be summarised as follows:
- Flooding prevents further reduction of the training loss when it reaches a reasonably small value, and the flooding level corresponds to the level of training loss that the user wants to keep.
- Flooding, when combined with early stopping, or with both early stopping and weight decay, may lead to even better accuracy in some cases.
- During flooding, the training loss fluctuates below and above the flooding level, so the model continues to “random walk” at roughly the same non-zero training loss. The authors therefore expect it to drift into an area with a flat loss landscape, which leads to better generalisation.
- Since setting up flooding requires domain expertise, the authors recommend treating the flooding level as a hyper-parameter.
- Flooding on its own tends to be about as good as early stopping: the accuracy of flooding combined with early stopping is often close to that of early stopping.
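The claim that the training loss fluctuates below and above the flooding level can be seen on a toy problem. The sketch below is an illustration under stated assumptions, not the authors' experiment: it uses a hypothetical one-parameter objective J(θ) = θ², plain gradient descent, and a made-up function name `train_with_flooding`.

```python
def train_with_flooding(b, steps=200, lr=0.1, theta=3.0):
    """Gradient descent on the toy objective J(theta) = theta**2 with flooding.

    Returns the list of raw (unflooded) losses J(theta), one per step.
    All names and default values are illustrative assumptions.
    """
    history = []
    for _ in range(steps):
        loss = theta ** 2          # raw objective J(theta)
        grad = 2.0 * theta         # dJ/dtheta
        # Gradient of |J - b| + b is sign(J - b) * dJ/dtheta:
        # descend when the loss is above b, ascend otherwise.
        sign = 1.0 if loss > b else -1.0
        theta -= lr * sign * grad
        history.append(loss)
    return history


if __name__ == "__main__":
    flooded = train_with_flooding(b=0.1)
    plain = train_with_flooding(b=0.0)
    print("last flooded losses:", [round(l, 4) for l in flooded[-5:]])
    print("last plain loss:", plain[-1])
```

With b = 0.1 the loss settles into a band around the flooding level and keeps crossing it, whereas with b = 0 the same loop drives the loss toward zero. Treating b as a hyper-parameter would amount to repeating such a run for several candidate values and comparing validation performance.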