“PonderNet tries to find a sweet spot between training prediction accuracy, computational cost, and generalisation.”
As humans, we often think before we speak. Can we expect the same from machines? Last week, DeepMind introduced PonderNet, a new algorithm that allows artificial neural networks to learn to think for a while before answering. Pausing to think is very familiar to humans. In machines, the goal is usually to find the most efficient route in less time and with less compute. DeepMind's new model addresses a more fundamental problem by introducing halting steps into the model. Most machine learning algorithms, wrote the researchers at DeepMind, do not adjust their computational budget based on the complexity of the task they are learning to solve. Arguably, such adaptation is made manually by the machine learning practitioner. This adaptation is known as pondering.
Modern-day AI architectures are specialists. A two-dimensional residual network may be a good choice for processing images, but at best it is a loose fit for other kinds of data, such as the Lidar signals used in self-driving cars or the torques used in robotics. The architecture of PonderNet, by contrast, has a new component: a halting node that predicts the probability of halting at the current step, conditional on not having halted before. The overall probability of halting at each step is then computed as a generalised geometric distribution. PonderNet is not regularised to explicitly minimise the number of computing steps, but to incentivise exploration instead. According to the researchers, PonderNet is probabilistic in terms of both the number of computational steps and the prediction produced by the network.
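The relationship between the halting node's conditional output and the overall halting probability can be sketched in a few lines of Python. This is a minimal illustration, not DeepMind's code: the function name `halting_distribution` and the example values are assumptions made for this article.

```python
from typing import List


def halting_distribution(lambdas: List[float]) -> List[float]:
    """Turn conditional halting probabilities into unconditional ones.

    lambdas[n] is the (hypothetical) halting node output at step n: the
    probability of halting at that step given no halt at earlier steps.
    """
    probs = []
    not_halted = 1.0  # probability that the network is still running
    for lam in lambdas:
        probs.append(not_halted * lam)  # halt exactly at this step
        not_halted *= 1.0 - lam         # survive to the next step
    return probs


# Illustrative values only; the last step halts with certainty,
# so the probabilities sum to one.
p = halting_distribution([0.1, 0.5, 1.0])

# With a constant halting probability this reduces to an ordinary
# geometric distribution.
g = halting_distribution([0.25, 0.25, 0.25])
```

When the halting node outputs the same value at every step, the result is a plain geometric distribution; letting the value vary from step to step is what makes the distribution learnable.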
To validate the performance of PonderNet, the researchers picked three tasks:
- Parity, from the Adaptive Computation Time (ACT) paper
- bAbI Q&A dataset
- Paired Associative Inference
PonderNet builds on previous work such as Adaptive Computation Time (ACT) while addressing its shortcomings. The figure illustrates the performance of PonderNet on the parity task.
Top: accuracy for both PonderNet (blue) and ACT (orange).
Bottom: number of ponder steps at evaluation time.
Error bars are calculated over 10 random seeds. The researchers demonstrated on the parity task that a neural network equipped with PonderNet can increase its computation to extrapolate beyond the data seen during training.
The bAbI question answering dataset, which consists of 20 different tasks, was chosen because it has proved difficult for standard neural network architectures. According to the researchers, PonderNet can match state-of-the-art results, but it achieves them faster and with a lower average error. The comparison with Universal Transformer is interesting because it uses the same transformer architecture as PonderNet, but its compute time is optimised with ACT. To solve the 20 tasks, Universal Transformer takes 10,161 steps, whereas PonderNet takes only 1,658, roughly a sixth as many, implying that PonderNet uses less compute.
On the Paired Associative Inference (PAI) task, which is designed to capture the essence of reasoning, the results show that performance benefits from the addition of adaptive computation.
PonderNet optimises a novel objective function that combines prediction accuracy with a regularisation term that incentivises exploration over the pondering time. The methods used in this work achieved the highest accuracy in complex domains such as question answering and multi-step reasoning. Additionally, the experiments show that enabling neural networks to adapt their computational complexity also improves their performance (beyond reducing computational requirements) when they are evaluated outside the training distribution. Poor performance outside that distribution is one of the limiting factors in applying neural networks to real-world problems.
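One way to picture such an objective is as an expected prediction loss, weighted by the probability of halting at each step, plus a penalty that pulls the halting distribution towards a prior. The sketch below assumes the penalty is a KL divergence to a truncated geometric prior; that choice, along with the names `ponder_loss`, `geometric_prior`, `lambda_p` and `beta` and their default values, is an assumption for illustration rather than something stated in this article.

```python
import math
from typing import List


def geometric_prior(lambda_p: float, n_steps: int) -> List[float]:
    """Geometric distribution truncated to n_steps and renormalised."""
    raw = [lambda_p * (1.0 - lambda_p) ** n for n in range(n_steps)]
    total = sum(raw)
    return [r / total for r in raw]


def ponder_loss(step_losses: List[float], halt_probs: List[float],
                lambda_p: float = 0.2, beta: float = 0.01) -> float:
    """Expected prediction loss plus a weighted KL penalty (assumed form)."""
    # Accuracy term: each step's loss weighted by the chance of halting there.
    expected_loss = sum(p * l for p, l in zip(halt_probs, step_losses))
    # Regularisation term: KL(halt_probs || prior), skipping zero-mass steps.
    prior = geometric_prior(lambda_p, len(halt_probs))
    kl = sum(p * math.log(p / q)
             for p, q in zip(halt_probs, prior) if p > 0.0)
    return expected_loss + beta * kl


# If the halting distribution equals the prior, the penalty vanishes.
prior = geometric_prior(0.5, 3)
loss_at_prior = ponder_loss([1.0, 1.0, 1.0], prior, lambda_p=0.5, beta=1.0)

# Any other halting distribution pays a positive penalty.
penalty_only = ponder_loss([0.0, 0.0, 0.0], [0.1, 0.45, 0.45],
                           lambda_p=0.5, beta=1.0)
```

Because the prior spreads probability over several steps rather than concentrating it on the first, a penalty of this kind rewards trying different pondering times instead of simply minimising the step count.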
So far, discussions of machine general intelligence have largely been restricted to concepts such as reward maximisation in reinforcement learning, which is thought to be the best shot we have at AGI. One can explain agents’ success in perception, generalisation and even imitation through the lens of reward maximisation. However, there are doubts about whether the reward that rationalises an act also rewards the agent for learning to perform that act, and about whether machines will have enough processing power.
With PonderNet, the researchers have taken the road less travelled. It enables neural networks to adapt their computational complexity to the task they are trying to solve. Neural networks achieve state-of-the-art results in a wide range of applications, including natural language processing, reinforcement learning and computer vision. Currently, they require a great deal of time, energy and expensive hardware to train and deploy. They also often fail to generalise and extrapolate to conditions beyond their training. PonderNet expands the capabilities of neural networks by letting them decide to ponder for an indefinite amount of time. This can be used to reduce the amount of compute and energy needed at inference time, which makes it particularly well suited to platforms with limited resources, such as mobile phones.
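At inference time, this kind of pondering can be pictured as a loop that runs one step, asks the halting node for a probability, and flips a biased coin to decide whether to stop. The following is a hedged sketch under stated assumptions: `ponder_inference`, `step_fn`, `max_steps` and the toy step function are hypothetical names invented here, and the step cap is a practical safeguard added for the example.

```python
import random
from typing import Any, Callable, Optional, Tuple


def ponder_inference(step_fn: Callable[[Any], Tuple[Any, Any, float]],
                     state: Any, max_steps: int = 20,
                     rng: Optional[random.Random] = None) -> Tuple[Any, int]:
    """Run step_fn until a sampled halt or until max_steps is reached.

    step_fn(state) must return (new_state, prediction, halt_probability).
    Returns the last prediction and the number of steps used.
    """
    if rng is None:
        rng = random.Random()
    prediction = None
    for n in range(1, max_steps + 1):
        state, prediction, lam = step_fn(state)
        # Bernoulli draw: halt with probability lam at this step.
        if rng.random() < lam:
            return prediction, n
    return prediction, max_steps  # safety cap reached


def toy_step(state: int) -> Tuple[int, int, float]:
    """Toy stand-in for a network step: certain to halt at the third step."""
    state = state + 1
    lam = 1.0 if state >= 3 else 0.0
    return state, state * 10, lam


prediction, steps_used = ponder_inference(toy_step, 0, rng=random.Random(0))
```

An easy input would produce a high halting probability early and stop after a step or two, while a harder one keeps the loop running, which is where the compute savings on limited hardware would come from.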
“We believe that biasing neural network architectures to behave more like algorithms, and less like ‘flat’ mappings, will help develop deep learning methods to their full potential,” concluded the researchers.
Read the original PonderNet paper here.