
How DeepMind’s Reinforcement Agent Dreamer Can Predict The Future 


Among the many applications of reinforcement learning, DeepMind is now using RL to train its new agent, Dreamer, to predict how an object will behave based on its current state. In short: its immediate future.

Reinforcement learning is the area of machine learning in which an agent learns to take actions that maximise its cumulative reward.
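To make that loop concrete, here is a minimal sketch in plain Python. The environment and policy (`ToyEnv`, `random_policy`) are hypothetical stand-ins for illustration, not anything from DeepMind's code:

```python
import random

class ToyEnv:
    """A hypothetical one-dimensional environment: reward is earned at position 0."""
    def __init__(self):
        self.position = 5

    def step(self, action):
        # The agent moves left (-1) or right (+1); reaching 0 yields a reward.
        self.position += action
        reward = 1.0 if self.position == 0 else 0.0
        return self.position, reward

def random_policy(state):
    # Placeholder policy; a real agent would learn to maximise reward.
    return random.choice([-1, 1])

env = ToyEnv()
total_reward = 0.0
for step in range(100):          # the agent acts, observes, and accumulates reward
    action = random_policy(env.position)
    state, reward = env.step(action)
    total_reward += reward
print("cumulative reward:", total_reward)
```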

Generally, a reinforcement learning agent can learn complex behaviour from a world model trained on high-dimensional sensory inputs, and there are many potential ways of deriving behaviours from such models. Dreamer solves long-horizon tasks from images purely by latent imagination: put simply, the agent selects future actions by imagining their long-term outcomes. This approach improves data efficiency and computation time as well as final performance.

Meet Agent Dreamer

Every reinforcement learning agent needs a model to predict rewards from observations and actions. In this context, Dreamer learns a latent dynamics model to predict rewards: a model that learns from image inputs and plans actions in order to gather new experience.
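As a rough sketch of that idea, the snippet below compresses an image into a small latent state, steps it forward under an action, and predicts a reward from the result. All weights are random placeholders chosen for illustration; this shows the structure, not DeepMind's trained networks:

```python
import numpy as np

rng = np.random.default_rng(0)
IMG, LATENT, ACT = 64 * 64, 32, 4            # image pixels, latent size, action size

W_enc = rng.normal(0.0, 0.01, (LATENT, IMG))          # encoder: image -> latent
W_tr = rng.normal(0.0, 0.01, (LATENT, LATENT + ACT))  # transition: (latent, action) -> latent
w_rew = rng.normal(0.0, 0.01, LATENT)                 # reward head: latent -> scalar

def encode(image):
    return np.tanh(W_enc @ image.ravel())

def transition(latent, action):
    return np.tanh(W_tr @ np.concatenate([latent, action]))

def predict_reward(latent):
    return float(w_rew @ latent)

image = rng.random((64, 64))                 # a stand-in camera frame
z = encode(image)
z_next = transition(z, np.eye(ACT)[0])       # imagine taking action 0
print("predicted reward one step ahead:", predict_reward(z_next))
```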

A Quick Overview of How It Works

Latent imagination works better here than traditional image prediction because latent states have a small memory footprint, which enables the parallel imagination of thousands of trajectories. The word "latent" refers to a short sequence of hidden states; these can represent quantities that are not directly visible in the images, such as the positions and velocities of objects.

Information from the input images is integrated into these latent states by an encoder component, after which the hidden states are projected forward in time to anticipate future images and rewards.

Above: Dreamer completes a pendulum swing-up task. The middle row shows 45-step predictions.
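The memory advantage is easy to see in a sketch: a 64x64 frame holds 4,096 values, while a 32-number latent state is over a hundred times smaller, so thousands of imagined rollouts fit in one batched matrix operation. The weights below are again random placeholders, purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
LATENT, ACT = 32, 4                          # 32 latent values vs. 64*64 = 4096 pixels per frame
N, H = 2000, 15                              # number of imagined trajectories, imagination horizon

W_tr = rng.normal(0.0, 0.01, (LATENT + ACT, LATENT))   # transition weights
w_rew = rng.normal(0.0, 0.01, LATENT)                  # reward head

latents = rng.normal(0.0, 1.0, (N, LATENT))  # 2,000 starting states, rolled out in parallel
returns = np.zeros(N)
for t in range(H):
    actions = np.eye(ACT)[rng.integers(0, ACT, N)]     # random candidate actions per rollout
    latents = np.tanh(np.concatenate([latents, actions], axis=1) @ W_tr)
    returns += latents @ w_rew               # accumulate predicted rewards for every rollout

print("best imagined return over", N, "trajectories:", returns.max())
```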

Dreamer's latent dynamics model is a multi-part structure consisting of a representation model, a transition model and a reward model. The representation model encodes the agent's observations and actions into latent states. The transition model anticipates future latent states without seeing the corresponding observations. The reward model predicts the reward of each of those states. On top of these, an action model learns a policy that aims to solve the imagined environment, while a value model estimates the expected rewards the action model achieves from each state. Finally, the observation model provides the feedback signals for learning. A toy sketch of how these components interact appears below.

Above: Dreamer playing an Atari game (Boxing). The middle row shows 45-step predictions.
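Here is a hypothetical skeleton of those five components as toy linear maps, showing the data flow during imagination: encode an observation once, then roll forward with the action and transition models, accumulating predicted rewards and bootstrapping with the value model. None of this is DeepMind's code; in the real system each component is a learned neural network:

```python
import numpy as np

rng = np.random.default_rng(0)
IMG, LATENT, ACT, H = 64 * 64, 32, 4, 15

# One random placeholder weight per component named in the text.
P = {
    "repr":   rng.normal(0.0, 0.01, (LATENT, IMG)),           # observation -> latent
    "trans":  rng.normal(0.0, 0.01, (LATENT, LATENT + ACT)),  # (latent, action) -> latent
    "reward": rng.normal(0.0, 0.01, LATENT),                  # latent -> predicted reward
    "action": rng.normal(0.0, 0.01, (ACT, LATENT)),           # latent -> action scores
    "value":  rng.normal(0.0, 0.01, LATENT),                  # latent -> expected return
}

observation = rng.random(IMG)                     # a stand-in camera frame
latent = np.tanh(P["repr"] @ observation)         # representation model: encode once

imagined_return = 0.0
for t in range(H):                                # imagine H steps, no new observations needed
    action = np.eye(ACT)[int(np.argmax(P["action"] @ latent))]       # action model picks an action
    latent = np.tanh(P["trans"] @ np.concatenate([latent, action]))  # transition model steps ahead
    imagined_return += float(P["reward"] @ latent)                   # reward model scores the state
imagined_return += float(P["value"] @ latent)     # value model bootstraps beyond the horizon

print("imagined return:", imagined_return)
```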

Experimental Evaluation

The Dreamer agent was evaluated experimentally on a variety of control tasks. The researchers designed the following experiments to compare Dreamer against the current best methods:

Control Tasks: Dreamer was put through 20 visual control tasks from the DeepMind Control Suite. These tasks pose many different challenges to the agent, including contact dynamics, sparse rewards and 3D scenes.

Implementation: DeepMind trained the agent on an Nvidia V100 GPU with ten processor cores for each training run. Each training run took 9 hours per 10^6 environment steps, considerably quicker than Google's PlaNet, which took 17 hours. The world models were trained by reconstruction unless specified otherwise.

Performance: The Dreamer agent was compared with state-of-the-art reinforcement learning agents such as D4PG and PlaNet. Dreamer exceeds D4PG's final performance while using far fewer environment steps, and surpasses PlaNet's data efficiency, showing that a small amount of experience is enough to learn a world model that generalises. Dreamer demonstrates that learning behaviours by latent imagination in a world model is more efficient than the experience-replay learning used by the top model-free methods.

Outlook

With reinforcement learning becoming one of the most widely used branches of machine learning, it has now found an application in predicting the future from high-dimensional inputs. Dreamer's use of latent imagination means faster computation and more data-efficient learning for real-world tasks. The potential of latent imagination has yet to be scaled up; in the future, agents might be tested in environments of greater visual complexity. The researchers plan to present their work at NeurIPS 2019 in Vancouver this week.

Sameer Balaganur

Sameer is an aspiring content writer. He occasionally writes poems, loves food, and is head over heels in love with basketball.