Encountering bugs in a model or software you have been building for the last few days, weeks, or even months is probably one of the most disheartening things. Developers have to trace back through the whole system to find a bug that can be as small as a missing colon.
But what if the model or algorithm you are working with is too resistant to bugs? Researchers from Meta, Nvidia, and Microsoft raised this issue about deep learning architectures trained with gradient descent, sparking a debate over whether it is a feature or a problem.
Gradient descent is an optimisation algorithm that trains machine learning models by iteratively minimising the error between expected and actual results. It is also used to train neural networks, which consist of interconnected neurons, each with a weight, a bias, and an activation function.
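To make the idea concrete, here is a minimal sketch of gradient descent fitting a single weight and bias to a handful of points. The data, learning rate, and function name are illustrative assumptions, not taken from any of the researchers' pipelines.

```python
def gradient_descent(xs, ys, lr=0.05, steps=2000):
    """Fit y ≈ w * x + b by minimising the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Error between the actual prediction and the expected result.
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error with respect to w and b.
        grad_w = 2.0 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2.0 * sum(errors) / n
        # Step against the gradient to reduce the error.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Toy data generated from y = 2x + 1 (an assumption for illustration).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
w, b = gradient_descent(xs, ys)
print(f"w = {w:.4f}, b = {b:.4f}")
```

Each step moves the parameters a little in the direction that shrinks the error, which is the same mechanism that lets larger models quietly compensate for mistakes elsewhere in a pipeline.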
“Machine learning is the process by which bugs become features,” said Thomas Dietterich.
J. F. Puget, a machine learning researcher at Nvidia, found that a model he was developing produced great results even when there were major bugs in the pipeline. Muzaffer Kal replied that deep learning algorithms work around the bugs. Users might welcome this automatic handling of bugs, but model developers may find it annoying, since it becomes hard to figure out where the code went wrong, or whether it went wrong at all.
Several users offered another angle: deep learning is powerful precisely because robustness to bugs and errors is a desirable feature, and developers would otherwise shift to other solutions. Some added that developers should build models for debugging and fixing code.
Sebastien Bubeck, former lead of the ML Foundations team at Microsoft Research, asked the Twitter community how gradient descent is so immune to traps and bugs. Yann LeCun, Chief AI Scientist at Meta, replied that this can be regarded as fault tolerance, a property that lets the model absorb faults and keep working. Puget countered that it should be considered an error, since it stops developers from achieving the best performance within their time and compute budget.
Twitter user alth0u said that in this era of machine learning, engineers new to the field are also developing working models, even with bugs in the system. As Andrej Karpathy from Tesla said in 2017, “Gradient descent can write code better than you. I’m sorry.”
The confusion arises because, in gradient descent, a bug often ends up solving some other issue in the model. For example, Aurelien Lucchi from ETH Zurich pointed to the saddle point issue. A saddle point is a point on the loss surface that looks like a minimum along one dimension and a maximum along another. Gaussian noise can smooth out such irregularities in the landscape, even though the noise itself is actually a bug.
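The saddle point effect can be sketched on a toy surface. The function below, f(x, y) = x² + y⁴/4 − y²/2, is a hypothetical example chosen for illustration: it has a saddle at (0, 0) and minima at (0, ±1). Reading Lucchi's remark as "noise helps the optimiser leave a saddle" is an assumption on our part; the code adds Gaussian noise to the gradient, as a stray bug might.

```python
import random


def grad(x, y):
    """Gradient of the toy surface f(x, y) = x**2 + y**4 / 4 - y**2 / 2."""
    return 2.0 * x, y ** 3 - y


def descend(noise_std, lr=0.1, steps=500, seed=0):
    """Run gradient descent, optionally adding Gaussian noise to each gradient."""
    rng = random.Random(seed)
    x, y = 1.0, 0.0  # start on the line y = 0, which leads straight into the saddle
    for _ in range(steps):
        gx, gy = grad(x, y)
        gx += rng.gauss(0.0, noise_std)  # the "bug": noisy gradients
        gy += rng.gauss(0.0, noise_std)
        x -= lr * gx
        y -= lr * gy
    return x, y


plain_x, plain_y = descend(noise_std=0.0)
noisy_x, noisy_y = descend(noise_std=0.1)
print("without noise:", plain_x, plain_y)
print("with noise:   ", noisy_x, noisy_y)
```

Without noise, the y-gradient is exactly zero along the starting line, so the run settles on the saddle. With noise, small perturbations in y get amplified and the run drifts towards one of the two minima.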
Bubeck noted that an overparameterized model remains differentiable, still generalizes well through its other parameters, and produces valid solutions. Twitter users gave mixed responses on whether this “coping” with bugs is something to be concerned about or just a correction of a developer’s mistake.
Puget added that fixing a bug often degrades the end product, because the pipeline has already been tuned around the bug. This forces developers to tune the pipeline again or restart the project from scratch. Oriol Vinyals of DeepMind coined a word for it: “BugPropagation”.
Overlooking bugs creates several problems. Developers cannot tell whether their model is as efficient as it could be. Since the bugs remain undetected, training can be unstable and has to be patched without anyone understanding the core problem. Many developers use overparameterized models that yield results because gradient descent points them in the right direction, even though they are filled with bugs and errors.
Users of a model may never notice the bugs in it, but developers struggle with the problem. CS researchers have been working on resilient systems that at least make bugs easy to track without altering the entire pipeline. Apart from reviewing the entire code again, developers can try a few alternative methods to find a bug:
- Overfit the model to a small dataset to prove that the network can achieve low training error.
- Visualise the activations or the weights. An activation function decides whether the input data is important for the process or not.
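The first check can be sketched in a few lines. The example below is a hedged illustration, not a method from any of the researchers quoted here: it trains a tiny linear model on four points, once correctly and once with a hypothetical bug (the bias update is skipped, controlled by the made-up `update_bias` flag), and compares the final training error.

```python
def train_on_tiny_batch(xs, ys, update_bias=True, lr=0.05, steps=2000):
    """Fit y ≈ w * x + b on a tiny dataset and return the final training MSE."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2.0 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2.0 * sum(errors) / n
        w -= lr * grad_w
        if update_bias:  # hypothetical bug: with False, the bias is never trained
            b -= lr * grad_b
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / n


# Four points from y = 2x + 1 (an assumption for illustration).
tiny_xs = [0.0, 1.0, 2.0, 3.0]
tiny_ys = [1.0, 3.0, 5.0, 7.0]

healthy_loss = train_on_tiny_batch(tiny_xs, tiny_ys)
buggy_loss = train_on_tiny_batch(tiny_xs, tiny_ys, update_bias=False)
print(f"healthy pipeline: {healthy_loss:.6f}")
print(f"buggy pipeline:   {buggy_loss:.6f}")
```

A training error that refuses to approach zero on data this small suggests something in the pipeline is broken. The check is not foolproof: a model with enough spare capacity may still memorise the data despite a bug, which is exactly the behaviour the debate is about.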
In December 2021, Microsoft Research released an AI tool called BugLab that could understand code and figure out specific bugs in the system. Its models were trained by playing “hide and seek”: one model intentionally inserted bugs into code, and the other tried to detect and fix them. Although the tool repairs code with human-level accuracy, it still only identifies the kinds of bugs it was trained on, not arbitrarily complex ones.