This year at the NeurIPS conference, AI pioneer Geoffrey Hinton and his students from the University of Toronto received the Test of Time Award for their benchmark paper, ‘ImageNet Classification with Deep Convolutional Neural Networks’, published a decade ago. The paper showed that Hinton et al had built the first deep convolutional neural network to demonstrate state-of-the-art results on the ImageNet database. The research was a leap forward for deep learning and started a revolution in image classification and detection.
In his keynote speech at the same conference, Hinton presented another new research paper to the NeurIPS crowd. Titled ‘The Forward-Forward Algorithm: Some Preliminary Investigations’, the paper explores what the future of machine learning might look like if backpropagation were replaced. Dubbed the Forward-Forward algorithm, the proposal could potentially spark the beginnings of another revolution in deep learning.
History of Backpropagation
Deep learning has dominated machine learning over the past decade, with little questioning of the effectiveness of performing stochastic gradient descent over a massive number of parameters and huge amounts of data. These gradients are normally computed using backpropagation, a technique that Hinton himself popularised.
Introduced initially in the 1960s, backpropagation re-emerged almost 30 years later after Hinton, along with Rumelhart and Williams, published the paper titled ‘Learning representations by back-propagating errors’. Soon enough, the algorithm became the most fundamental building block in neural networks—if deep learning was the body, backpropagation was the spine.
Backpropagation trains neural networks by applying the chain rule of calculus. In layman’s terms, after each forward pass through the network, backpropagation performs a backward pass while adjusting the model’s parameters, such as the weights and biases. This repetitive process reduces the difference between the network’s actual output vector and the desired output vector. Essentially, backpropagation takes the error associated with the wrong guess made by the neural network and uses that error to adjust the network’s parameters in the direction of less error.
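The forward-then-backward cycle described above can be sketched in a few lines of numpy. This is a minimal illustration on a tiny two-layer network with made-up shapes, data, and hyperparameters, not an implementation from any of the papers discussed:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 1))          # input vector (illustrative)
target = np.array([[1.0], [0.0]])    # desired output vector

W1 = rng.normal(size=(3, 4)) * 0.5   # first-layer weights
W2 = rng.normal(size=(2, 3)) * 0.5   # second-layer weights

losses = []
lr = 0.05
for _ in range(200):
    # Forward pass: compute the network's actual output.
    h = np.tanh(W1 @ x)              # hidden activations
    y = W2 @ h                       # output layer
    err = y - target                 # error vs. the desired output
    losses.append(float((err ** 2).sum()))
    # Backward pass: the chain rule propagates the error layer by layer.
    dW2 = 2 * err @ h.T
    dh = W2.T @ (2 * err)
    dW1 = (dh * (1 - h ** 2)) @ x.T  # tanh'(z) = 1 - tanh(z)^2
    # Adjust parameters in the direction of less error.
    W2 -= lr * dW2
    W1 -= lr * dW1

print(losses[0], losses[-1])         # loss shrinks over iterations
```

Note that computing `dW1` requires knowing exactly what happened in the forward pass (here, that the hidden layer used `tanh`), which is the dependency Hinton objects to later in the article.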
What is wrong with Backpropagation?
Even with its prevalence, backpropagation is not without flaws. If neural networks are meant to mimic the workings of the human brain, backpropagation does not really fit with how the brain actually works. Hinton argues in his paper that the cortex does not explicitly propagate errors or store information for later use in a subsequent backward pass. Backpropagation also works in a bottom-up direction, as opposed to the top-down direction in which the visual system actually operates.
The brain instead forms loops in which neural activity passes through about half a dozen layers of the cortex before returning to where it began. The brain handles the constant stream of sensory data without frequent pauses by arranging sensory input in a pipeline and putting it through successive stages of processing. Data in the later stages of the pipeline may provide top-down information that goes on to influence the earlier stages. But the brain continuously infers from input and keeps learning in real time, without pausing for anything like backpropagation.
Besides, backpropagation requires complete knowledge of the computation performed in the forward pass in order to compute the correct derivatives. If there is a black box or any ‘noise’ in the forward pass, backpropagation becomes impossible.
Hinton’s proposed Forward-Forward Algorithm
According to Hinton, the Forward-Forward algorithm is a better representation of the brain’s processes. The FF algorithm replaces backpropagation’s forward and backward passes with two forward passes that operate in exactly the same way but on different data and with opposite objectives: a positive pass that adjusts weights to increase the ‘goodness’ in every hidden layer, and a negative pass that adjusts weights to decrease it. The FF algorithm thus works in a push-and-pull manner, driving goodness high for positive data and low for negative data.
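The push-and-pull idea can be sketched for a single layer. Below is a toy illustration in which goodness is the sum of squared ReLU activities, pushed above a threshold for positive data and below it for negative data; the threshold, learning rate, and sample vectors are assumptions for the sketch, not values from Hinton’s paper, and a real network would stack several such locally trained layers:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 4)) * 0.5   # one layer's weights (illustrative)
theta, lr = 2.0, 0.05               # goodness threshold and step size (assumed)

def goodness(x):
    y = np.maximum(0.0, W @ x)          # layer activity (ReLU)
    return y, float((y ** 2).sum())     # goodness = sum of squared activities

def forward_pass(x, positive):
    """One local update: raise goodness on positive data, lower it on negative.
    No backward pass through other layers is needed."""
    global W
    y, g = goodness(x)
    p = 1.0 / (1.0 + np.exp(-(g - theta)))   # sigmoid: P(data is positive)
    coeff = (p - 1.0) if positive else p     # d(-log-likelihood)/d(goodness)
    W -= lr * coeff * 2.0 * np.outer(y, x)   # local gradient step

x_pos = np.array([1.0, 0.0, 0.0, 1.0])   # stand-in for a real sample
x_neg = np.array([0.0, 1.0, 1.0, 0.0])   # stand-in for a corrupted sample

for _ in range(200):
    forward_pass(x_pos, positive=True)    # positive pass
    forward_pass(x_neg, positive=False)   # negative pass

print(goodness(x_pos)[1], goodness(x_neg)[1])
```

After training, the layer reports high goodness for the positive sample and low goodness for the negative one, using only quantities available locally during the two forward passes.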
The study’s experiments found that the FF algorithm achieved a 1.4% test error rate on the MNIST dataset, making it about as effective as backpropagation. The two were also comparable on the CIFAR-10 image dataset, which contains 50,000 training images and is commonly used in computer vision and other ML research.
The FF algorithm, Hinton says, could potentially train neural networks with a trillion parameters on only a few watts of power, making compute much lighter and training faster.
In his closing speech at the conference, Hinton also spoke about how the AI community ‘has been slow to realise the implications of deep learning for how computers are built’. “What I think is that we’re going to see a completely different type of computer, not for a few years, but there’s every reason for investigating this completely different type of computer,” he said. This union between software and hardware paradigms, Hinton suggested, would save computational power, and the FF algorithm would be perfectly suited to this type of hardware.