“Intelligence, and its associated abilities, can be understood as subserving the maximisation of reward by an agent acting in its environment.” (Silver et al., “Reward is enough”)
Reinforcement learning, a sub-domain of AI, aims to provide the underlying infrastructure for intelligence that arises from seeking goals through rewards. In their most rudimentary form, these rewards are scalar mathematical signals that nudge an algorithm towards its target. Rewards are sufficient to express a wide variety of goals, but are they enough to create artificial general intelligence (AGI)?
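To make reward maximisation concrete, here is a minimal, illustrative sketch — it is not taken from the paper, and the environment, reward scheme and hyperparameters are all invented for illustration. A tabular Q-learning agent lives in a five-cell corridor and receives reward 1 only on reaching the rightmost cell; maximising that single scalar signal is its entire objective.

```python
import random

# Toy illustration (not from the paper): an agent in a 5-cell corridor
# earns reward 1 only upon reaching the rightmost cell.
N_STATES = 5            # cells 0..4; cell 4 is terminal and rewarding
ACTIONS = [+1, -1]      # move right or left

def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        s = 0
        while s != N_STATES - 1:
            # epsilon-greedy: mostly exploit the current estimates,
            # occasionally explore a random action
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda act: q[(s, act)])
            s2 = min(max(s + a, 0), N_STATES - 1)
            r = 1.0 if s2 == N_STATES - 1 else 0.0
            # standard Q-learning update towards the reward-maximising value
            best_next = max(q[(s2, b)] for b in ACTIONS)
            q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
            s = s2
    return q

q = q_learning()
# The greedy policy learned from reward alone: move right in every cell.
policy = [max(ACTIONS, key=lambda act: q[(s, act)]) for s in range(N_STATES - 1)]
print(policy)  # [1, 1, 1, 1]
```

Nothing in the code tells the agent *how* to reach the goal; purposeful behaviour falls out of repeatedly adjusting value estimates towards the maximum attainable reward — the dynamic the paper generalises from.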
To address this question, researchers Richard Sutton and David Silver, along with their peers from DeepMind, recently published a paper titled “Reward is enough”, in which the authors hypothesise that maximising a reward signal is sufficient for intelligence to emerge. Sutton and Silver are giants of reinforcement learning, and their paper, which repackages intelligence into a concept called “reward maximisation”, has garnered the curiosity of sceptics and futurists alike.
About Sutton and Silver’s Paper
Maximisation of reward as a path to AGI might sound like evolution for dummies, but the authors argue that intelligence and its associated abilities can surface through this strategy. Consider DeepMind’s AlphaZero, which has pipped human experts at the game of Go: its specialised abilities were not engineered separately but emerged innately integrated into a unified whole. The authors state that maximising wins proved to be enough, in an environment as simple as Go, to drive behaviour exhibiting various specialised abilities. “We argue that maximising rewards in richer environments – more comparable in complexity to the natural world faced by animals and humans – could yield further, and perhaps ultimately all abilities associated with intelligence.”
The hypothesis looks at success or survival through the lens of reward maximisation. According to the authors, success, as measured by maximising reward, demands a variety of abilities associated with intelligence. “In such environments, any behaviour that maximises reward must necessarily exhibit those abilities. In this sense, the generic objective of reward maximisation contains within it many or possibly even all the goals of intelligence.”
In the reward-is-enough hypothesis, the researchers at DeepMind postulate that intelligence, and its associated abilities, can be understood as subserving the maximisation of reward by an agent acting in its environment. In the paper’s illustration, a squirrel acts so as to maximise its consumption of food (its reward depicted by an acorn symbol), while a kitchen robot acts to maximise cleanliness (its reward depicted by a bubble symbol). To achieve these goals, complex behaviours are required that exhibit a wide variety of abilities associated with intelligence, which the figure depicts as a projection from an agent’s stream of experience onto the set of abilities expressed within that experience.
Similarly, if artificial agents become capable of maximising a variety of reward signals in future environments, the authors believe this could give rise to new forms of intelligence with abilities as distinct as laser-based navigation, communication by email, or robotic manipulation.
Throughout the paper, the authors stress that most of these abilities, be it in a squirrel or a kitchen robot, serve the singular goal of maximising the agent’s reward within its environment. According to the authors, pursuing one goal may generate complex behaviour that exhibits multiple abilities associated with intelligence. Indeed, such reward-maximising behaviour may often be consistent with specific behaviours derived from the pursuit of separate goals associated with each ability. The squirrel’s brain must be able to identify good food (nuts in this case), move towards it, cache it, and later relocate that stash for future use. This is intelligence in a nutshell! The behaviour of the squirrel may thus be understood as maximising a cumulative reward towards the minimisation of hunger. “Reward is enough to drive behaviour that exhibits abilities studied in natural and artificial intelligence, including knowledge, learning, perception, social intelligence, language, generalisation, and imitation,” explained the authors.
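The “cumulative reward” referred to here is, in standard reinforcement-learning notation, the discounted return: immediate rewards count fully, and each later reward is scaled down by a discount factor. A tiny sketch (our notation, not the paper’s):

```python
# Illustrative sketch (not from the paper): the cumulative reward an agent
# maximises is conventionally the discounted return
#   G = r_0 + gamma*r_1 + gamma^2*r_2 + ...
def discounted_return(rewards, gamma=0.9):
    g = 0.0
    # accumulate from the last reward backwards so each earlier step
    # discounts everything that follows it by one more factor of gamma
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# e.g. a squirrel-like agent that finds food (reward 1) on its third step:
print(discounted_return([0.0, 0.0, 1.0]))  # 0.9**2, i.e. about 0.81
```

The discount factor is why the squirrel’s caching and relocating behaviours matter: actions taken now are valued by the reward they unlock later, not only by their immediate payoff.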
“This hypothesis may startle because the sheer diversity of abilities associated with intelligence seems to be at odds with any generic objective,” warned the authors. And it did startle quite a few. Samim Winiger, an AI researcher in Berlin, told CNBC that DeepMind’s “reward is enough” view is a “somewhat fringe philosophical position, misleadingly presented as hard science.” According to Winiger, the path to general AI poses countless challenges, and he finds statements such as “all you need is reward” to be “grandiose” and “totalitarian”.
The criticism is understandable given the reputation of the authors and the redundancy of the premise they are exploring. Moreover, a theory of intelligence still eludes many experts in the field. Rewards need not be what triggers every ability: some abilities can be embedded in the organism thanks to millions of years of the evolutionary lottery. One argument in favour of reward maximisation is that it is the best shot we have at AGI: the success of agents in perception, generalisation and even imitation can be explained through the lens of reward maximisation. The question now is, how deep does the tree of rewards run? Can the reward that rationalises an act also reward the agent for learning the ability to perform that act? And will machines have enough processing power to consider all these branches of rewards in real time without getting baked?