DeepMind has been trying to bridge the gap between AI and biology for quite some time now. All of its endeavours revolve around solving the problem of intelligence in machines. Tasks that are straightforward, even trivial, for humans can be extremely sophisticated, almost impossible, for machines.
While human brains come hardwired with millions of years of evolutionary learning, machines face many limitations when it comes to data. They can only be fed data that has been documented or prepared by humans, the magnitude of which is insignificant compared to what evolution has provided us. But that didn't discourage the researchers at DeepMind from carrying out experiments to decode how the brain works and to investigate what implications these findings would have for learning algorithms.
In late 2018, DeepMind stunned the world with its research on protein folding. Today, it is back with something even more alluring. In a recent blog post, DeepMind claims to have demystified the functioning of dopamine, the neurotransmitter that drives reward-seeking behaviour in humans.
In the context of reinforcement learning, this can be thought of as agents acting to maximise rewards. Humans, likewise, predict future rewards (expectations) and use those predictions to guide their current behaviour.
Study hard. Get good grades, and to get good grades, you need to study hard.
Though the fallacy of the above statement is beyond the scope of this article, it is one of the most popular strategies that illustrates how we humans rely on reward prediction.
Up there in that three-pound organ, these reward signals are found to be directly related to dopamine release in the midbrain.
So, keeping this fact in mind, DeepMind, with the help of a Harvard lab, analysed dopamine cells in mice, recording their activity as the mice received rewards while learning a task. The researchers then checked these recordings for how consistent the activity of the dopamine neurons was with the standard temporal difference (TD) algorithm.
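The core idea of temporal difference learning can be sketched in a few lines. The following toy example (illustrative only, not DeepMind's code) shows the essential loop: a value estimate is nudged toward each observed reward by the prediction error, the same quantity that dopamine neurons are thought to signal.

```python
import random

def td_learn(rewards, alpha=0.1):
    """Learn a single value estimate from a stream of rewards.

    This is the simplest case of TD learning (one state, no future
    states); the full TD update uses r + gamma * V(s') - V(s).
    """
    value = 0.0
    for r in rewards:
        delta = r - value       # prediction error: actual minus expected reward
        value += alpha * delta  # move the estimate toward the observed reward
    return value

random.seed(0)
# Rewards drawn around a mean of 5: the estimate converges toward 5.
rewards = [random.gauss(5.0, 1.0) for _ in range(2000)]
print(round(td_learn(rewards), 1))
```

Because the update averages over noisy rewards, the estimate settles near the mean reward, which is exactly what a "classical" (non-distributional) TD learner represents.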
Reinforcement Learning On Dopamine
via DeepMind blog
In the real world, the amount of future reward that will result from a particular action is not a perfectly-known quantity but instead involves some randomness.
This is where the researchers thought of drawing parallels between distributed reinforcement learning algorithms and dopamine-related activities.
But why distributional reinforcement learning algorithms?
Although this is still an active topic of research, a key ingredient is that learning about the distribution of rewards gives the neural network a more powerful signal for shaping its representation in a way that’s robust to changes in the environment or changes in the policy.
Dopamine cells, according to the blog, change their firing rate to indicate a prediction error: they fire more when an animal receives more reward than it expected, and less when it receives less.
For each dopamine cell, the researchers determined the reward size for which it didn’t change its baseline firing rate. They call this the cell’s “reversal point”.
They then investigated whether these reversal points differed between cells, and found marked differences: some cells predicted very large amounts of reward, while others predicted very little.
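A toy sketch can show how such a spread of reversal points could arise. In the distributional account, each unit weights positive and negative prediction errors asymmetrically (the learning rates below are illustrative assumptions, not measured values): "optimistic" units settle at high reward values, "pessimistic" ones at low values, so a population of units together encodes the whole reward distribution rather than just its mean.

```python
import random

def learn_reversal_point(rewards, alpha_plus, alpha_minus):
    """Toy distributional-TD unit with asymmetric learning rates.

    alpha_plus scales positive prediction errors, alpha_minus negative
    ones. The value the unit settles at is its "reversal point": the
    reward size at which its prediction error averages out to zero.
    """
    value = 0.0
    for r in rewards:
        delta = r - value
        alpha = alpha_plus if delta > 0 else alpha_minus
        value += alpha * delta
    return value

random.seed(0)
# Bimodal reward distribution: small (1.0) or large (9.0), equally likely.
rewards = [random.choice([1.0, 9.0]) for _ in range(5000)]

for a_plus, a_minus in [(0.02, 0.18), (0.10, 0.10), (0.18, 0.02)]:
    rp = learn_reversal_point(rewards, a_plus, a_minus)
    print(f"alpha+={a_plus:.2f}, alpha-={a_minus:.2f} -> reversal point ~{rp:.1f}")
```

The pessimistic unit ends up near the low reward, the symmetric unit near the mean, and the optimistic unit near the high reward, mirroring the diversity of reversal points the researchers observed across dopamine cells.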
The researchers were also able to reconstruct the reward distribution from the recordings. This relied on interpreting the firing rates of dopamine cells as the reward prediction errors of a distributional temporal difference (TD) model and performing inference to determine what distribution the model had learned.
The researchers believe that this work raises a number of important questions, such as:
- What are the effects of optimistic and pessimistic dopamine neurons on the brain?
- What is the relation between representations in the brain and distributional learning?
- How is this representation of the reward distribution used downstream?
- How do dopamine cells relate to other known forms of diversity in the brain?
So far, AI has been blamed for dumbing down neuroscience with its neural network analogy. Now, AI researchers have put some credibility back into the analogy by directly relating artificial algorithms to natural processes.
Read more about this work here.