DeepMind Is About To Change How Reinforcement Learning Works. Here’s How.


Google’s DeepMind has proposed a new approach to how reinforcement learning (RL) agents complete tasks and maximise rewards. To improve the way RL learns from experience, the researchers tried to mimic the human capacity for mental “time travel”: recalling past events and linking them to later outcomes. This allows machine learning models to make decisions based on potential future outcomes.

The Motivation

Humans learn about various events and make decisions based on past experiences. The idea is to teach machines to do the same, without having to rely on trial and error on every occasion, which is how reinforcement learning typically works.

When humans do something, they can judge whether it was the right or wrong decision based on past experience. For instance, if we place a glass too close to the edge of a table, we realise that it might fall to the floor a moment later. Predicting such outcomes before disaster strikes is something humans still do better than machines. Over the years, we build up such cognitive intuitions to make better decisions. ML models, however, usually have to go through trial and error to determine the best action.



But what if a machine could anticipate a long-term consequence before even going through the various experiences? Such a capability could eventually help humans make better decisions about careers, lifestyles and even monetary investments. This is the problem Google’s DeepMind team focused on, testing its ideas within a game.

Temporal Value Transport

DeepMind’s method is called Temporal Value Transport (TVT). It is a way of sending lessons back from the future. In other words, it takes the long-term consequences of various choices into account so that the agent makes the right decisions in the present. In a nutshell, the researchers are using memory within games to make informed actions.

However, this does not mean that they are creating a memory or recreating what happens in the human mind. Instead, they are offering a mechanistic description of behaviour that can inspire models in neuroscience, psychology, and behavioural economics. The memory agent uses several objectives to learn to store and retrieve a record of past states as a kind of memory.
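
To make "store and retrieve a record of past states" more concrete, here is a minimal sketch of a content-based memory read, the general kind of lookup used in memory-augmented neural networks. It is an illustration only, not DeepMind's implementation: the function names, the `sharpness` parameter and the toy vectors are all assumptions made for this example.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def read_memory(memory, query, sharpness=5.0):
    """Soft lookup: weight each stored row by its similarity to the query.

    memory    : list of stored state vectors (one per past time step)
    query     : vector describing what the agent is looking for
    sharpness : hypothetical scaling factor; larger values focus the read
    Returns (weights, readout), where weights sum to 1.
    """
    scores = [sharpness * cosine(row, query) for row in memory]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    readout = [
        sum(w * row[i] for w, row in zip(weights, memory))
        for i in range(len(query))
    ]
    return weights, readout

# Three toy "past states"; the query most resembles the first one.
memory = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, readout = read_memory(memory, query=[1.0, 0.0])
print(weights)
print(readout)
```

The weights show which past time steps the read focuses on. A strongly peaked weight on one old entry is the kind of signal that tells the agent a particular past event matters now.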


Long-term credit assignment is the ability to judge the usefulness of an action based on consequences that may arrive much later; discounted utility is the related idea of valuing delayed rewards less than immediate ones. A similar action-and-reward scheme is used in reinforcement learning, but it has notable limitations because it struggles to connect actions with outcomes that are far apart in time.
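
A small worked example shows why standard discounting makes long-term connections hard. The sketch below assumes the usual discounted return, in which a reward arriving `delay` steps in the future is multiplied by `gamma ** delay`. The discount factor of 0.99 and the delays are arbitrary values chosen for illustration.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**t * rewards[t] over one episode, computed backwards."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total

# A single reward of 1.0 that only arrives after `delay` uneventful steps.
for delay in (10, 100, 1000):
    rewards = [0.0] * delay + [1.0]
    print(delay, discounted_return(rewards, gamma=0.99))
```

With these numbers the reward keeps most of its weight after 10 steps, shrinks to roughly a third after 100, and is almost invisible after 1,000. An action whose payoff lies that far ahead therefore receives very little learning signal.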

A lot of human learning happens without immediate reward or direct feedback. To replicate this ability, DeepMind uses TVT to send reward signals backwards from far in the future to the earlier actions that led to them, thereby creating a feedback loop. The researchers built on the Neural Turing Machine (NTM), which DeepMind introduced in 2014. Back then, it was designed to let a neural network read from and write to memory records in a way that can be trained by gradient descent. In this research, they used it to retrieve memories of past actions. The technique uses NTM-style memory to handle storing and retrieving memories, hence the name Reconstructive Memory Agent (RMA).
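
The idea of sending a reward signal backwards can be sketched in a few lines. The following is a heavily simplified illustration, not the algorithm as published: it assumes that whenever the agent strongly recalls an earlier time step, part of the value it predicts for the next step is added to the reward at that earlier step. The function name, `threshold`, `alpha`, `min_gap` and the toy episode are all hypothetical.

```python
def transport_value(rewards, values, read_weights,
                    threshold=0.5, alpha=0.9, min_gap=2):
    """Add value from later steps to the rewards of strongly recalled past steps.

    rewards[t]      : reward received at step t
    values[t]       : the agent's value estimate at step t
    read_weights[t] : dict {past_step: weight} for the memory read at step t
    Returns a new reward list; the input list is left unchanged.
    """
    shaped = list(rewards)
    last = len(rewards) - 1
    for t, reads in enumerate(read_weights):
        if t >= last:
            continue  # no value estimate follows the final step
        for past, weight in reads.items():
            # Only strong reads of sufficiently distant steps transport value.
            if weight > threshold and t - past >= min_gap:
                shaped[past] += alpha * weight * values[t + 1]
    return shaped

# Toy episode: the only real reward comes at the last step, but at step 4
# the agent strongly recalls what it saw at step 0.
rewards = [0.0, 0.0, 0.0, 0.0, 0.0, 1.0]
values = [0.0, 0.0, 0.0, 0.0, 0.8, 1.0]
read_weights = [{}, {}, {}, {}, {0: 0.9}, {}]
print(transport_value(rewards, values, read_weights))
```

In this toy run, step 0 ends up with an extra reward of about 0.81 even though nothing was paid out at the time. The early action is credited directly instead of waiting for a heavily discounted signal to trickle back.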

The researchers also mentioned that such techniques had been adopted in the past to enhance the capabilities of reinforcement learning. But, according to them, this is the first time that memories of past events have been encoded in this way. This is somewhat similar to the way a variational autoencoder, a type of generative neural network, encodes its inputs.


The results of the research have drawn attention around the world, as this approach performed better than traditional RL models. However, as all of this was carried out in simulated game environments, one cannot yet expect the same performance in the real world.


Rohit Yadav
Rohit is a technology journalist and technophile who likes to communicate the latest trends around cutting-edge technologies in a way that is straightforward to assimilate. In a nutshell, he is deciphering technology.
