In the early 1990s, a series of experiments set out to find how a living organism learns a task. In one such experiment, rats ran along a single corridor or circular track, so that researchers could easily determine which neurons coded for each position along it.
To the researchers' surprise, these same cells also fired while the animals were at rest. During rest, the cells sometimes fired spontaneously in rapid sequences tracing the same path the animal had run earlier, but at a greatly accelerated speed. These sequences are called replay.
To test the significance of replay for learning a task, the researchers disrupted brain activity during these replay events and found that doing so impaired the animals' ability to learn the task.
By measuring memory retrieval directly in the brain, neuroscientists have noticed something remarkable: spontaneous recollections often occur as very fast sequences of multiple memories. These so-called 'replay' sequences play out in a fraction of a second, so fast that we're not necessarily aware of the sequence.
How Significant Is Learning Through Replay?
Imagine walking into a garden and coming across an apple on the ground, under a single apple tree. That it has fallen from that tree is obvious, and the possibility of the apple having come out of the ground doesn't even cross our minds. The reason why the apple doesn't fly back to the tree is beyond the scope of this article.
Most people make the connection between the tree and the fruit without effort. One theory, the imagination theory, holds that replay works in a similar way: it doesn't literally rehearse events in the order they were experienced, as a movie would. Instead, it infers or imagines the real relationships between events, and synthesises sequences that make sense given an understanding of how the world works.
In AI terms, these replay sequences are generated using a learned model of the environment.
The imagination theory makes a different prediction from a literal, movie-like playback about how replay will look: when you rest on the couch, your brain should replay the sequence "tree, apple, ground". You know from past experience that apples are more likely to fall from a tree than to show up from the ground, and this knowledge can be used to reorganise experience into a more meaningful order.
In deep RL, the large majority of agents have used movie-like replay, because it is easy to implement (the system can simply store events in memory, and play them back as they happened).
Meanwhile, in neuroscience, classic theories of replay postulated that movie replay would be useful to strengthen the connections between neurons that represent different events or locations in the order they were experienced.
The most compelling observation is that even when rats only experienced two arms of a maze separately, subsequent replay sequences sometimes followed trajectories from one arm into the other.
To take this line of research further, DeepMind, in collaboration with Oxford and UCL, conducted a series of experiments. In these experiments, the subjects were shown a few scenes, then shown them as a scrambled sequence, and then given five minutes to rest while sitting in an MEG (magnetoencephalography) brain scanner.
To find how the brain builds sequences, the researchers played a new sequence for participants. In this sequence, they walk into a factory and see spilt oil on the floor. They then see a knocked-over oil barrel. Finally, they turn to see a guilty robot.
Common sense implies that the robot spilt the oil, even though nobody witnessed the act itself. This inference might come from experiences such as having a kid or a dog at home who spills water and then stands in a corner.
The researchers found evidence of abstract codes shared between the two narratives. These abstract codes, which incorporate the conceptual knowledge that lets people unscramble the sequences, may help the brain to retrieve the correct item for the next slot in the replay sequence.
This paints an interesting picture of a system where the brain slots new information into an abstract framework built from past experiences, keeping it organised using precise relative timings within very fast replay sequences.
Can Machines Benefit From Replaying Experience?
Research into experience replay has unfolded along parallel tracks in artificial intelligence and neuroscience, with each field providing ideas and inspiration for the other.
Incorporating replay into machines has helped advance AI. Deep learning often depends upon a ready supply of large datasets. In reinforcement learning, these data come through direct interaction with the environment, which takes time.
The technique of experience replay allows the agent to repeatedly rehearse previous interactions, making the most of each interaction.
Experience replay was originally proposed in "Reinforcement Learning for Robots Using Neural Networks" in 1993. Experience replay stores experiences, including state transitions, rewards and actions, which are the data needed to perform Q-learning.
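To make the idea concrete, here is a minimal sketch of a replay memory in Python. It is not taken from the 1993 work or from any particular library; the names `Transition` and `ReplayBuffer`, the `done` flag and the capacity value are all assumptions chosen for illustration. It stores transitions as they happen and hands back uniformly random batches for rehearsal.

```python
import random
from collections import deque, namedtuple

# One stored interaction. The field names are illustrative assumptions.
Transition = namedtuple(
    "Transition", ["state", "action", "reward", "next_state", "done"]
)


class ReplayBuffer:
    """Fixed-size memory of past transitions with uniform random sampling."""

    def __init__(self, capacity):
        # A deque with maxlen drops the oldest transition once it is full.
        self.memory = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.memory.append(Transition(state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement from the stored transitions.
        return random.sample(list(self.memory), batch_size)

    def __len__(self):
        return len(self.memory)


# Usage: store five toy transitions in a buffer that only holds three.
buffer = ReplayBuffer(capacity=3)
for t in range(5):
    buffer.push(state=t, action=0, reward=1.0, next_state=t + 1, done=False)

batch = buffer.sample(2)  # two of the three most recent transitions
```

Because each stored transition can be drawn many times, the agent learns from an interaction more than once instead of discarding it after a single update.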
Q-learning is a model-free reinforcement learning algorithm. The goal of Q-learning is to learn a policy, which tells an agent what action to take under what circumstances.
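As a rough illustration, the standard tabular Q-learning update can be sketched as below. The function names, the toy states `"s0"` and `"s1"`, and the values of the learning rate `alpha` and discount `gamma` are assumptions made for this example, and terminal states are not handled.

```python
from collections import defaultdict


def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.99):
    """Apply one Q-learning update to the table q and return the new value."""
    # Value of the best action available in the next state.
    best_next = max(q[(next_state, a)] for a in actions)
    # Target: immediate reward plus discounted future value.
    td_target = reward + gamma * best_next
    # Move the current estimate a fraction alpha towards the target.
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


def greedy_action(q, state, actions):
    """The learned policy: pick the action with the highest Q-value."""
    return max(actions, key=lambda a: q[(state, a)])


# Usage: one rewarded step from "s0" to "s1" in a two-action toy problem.
q = defaultdict(float)  # unseen (state, action) pairs start at 0.0
actions = ["left", "right"]
new_value = q_learning_update(q, "s0", "right", reward=1.0, next_state="s1",
                              actions=actions, alpha=0.5, gamma=0.9)
policy_choice = greedy_action(q, "s0", actions)
```

The update needs only a state, an action, a reward and the next state, which is exactly what a replay memory stores, and no model of the environment is required.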
This method proved crucial for combining deep neural networks with reinforcement learning in the DQN agent that first mastered multiple Atari games.
Deep Q-Network (DQN) was introduced in two papers: "Playing Atari with Deep Reinforcement Learning", presented at NIPS in 2013, and "Human-level control through deep reinforcement learning", published in 2015.
Since the introduction of DQN, the efficiency of replay has been improved by preferentially replaying the most salient experiences from memory, rather than simply choosing experiences at random for replay.
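One simple way to replay salient experiences more often is to sample in proportion to a priority score. The sketch below is a simplified illustration and not the method of any specific paper: it assumes that salience is measured by the size of the learning error (`td_error`), and the class name, `alpha` and `eps` are hypothetical choices. It also leaves out refinements that a full implementation would typically need, such as correcting for the bias that non-uniform sampling introduces.

```python
import random


class PrioritizedReplay:
    """Replay memory that samples transitions in proportion to a priority."""

    def __init__(self, capacity, alpha=0.6, eps=1e-3):
        self.capacity = capacity
        self.alpha = alpha  # 0 gives uniform sampling; larger values favour high priorities
        self.eps = eps      # keeps every transition's priority above zero
        self.items = []
        self.priorities = []

    def push(self, transition, td_error):
        if len(self.items) >= self.capacity:
            # Memory is full: discard the oldest transition.
            self.items.pop(0)
            self.priorities.pop(0)
        self.items.append(transition)
        self.priorities.append((abs(td_error) + self.eps) ** self.alpha)

    def sample(self, batch_size):
        # Weighted sampling with replacement: surprising transitions recur more often.
        return random.choices(self.items, weights=self.priorities, k=batch_size)


# Usage: one unsurprising and one surprising transition.
random.seed(0)
memory = PrioritizedReplay(capacity=2)
memory.push("boring", td_error=0.0)
memory.push("surprising", td_error=10.0)
batch = memory.sample(1000)
surprising_count = batch.count("surprising")
```

In this toy run the surprising transition dominates the batch, while the small `eps` term means the boring one still has a chance of being replayed.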
Further improvements in agent performance have come from combining experiences across multiple agents, learning about a variety of different behaviours from the same set of experiences.
Experiments such as these show that a carefully constructed AI algorithm can reach superhuman performance on specific tasks, and they add to the optimism that AI may one day enter the coveted arena of AGI.