Among the many applications of reinforcement learning (RL), DeepMind is now using it to train a new agent, Dreamer, that learns to predict how an object will behave based on its current state. In short: its immediate future.
Reinforcement learning is the area of machine learning in which an agent learns to take actions that maximise its reward.
Generally, a reinforcement learning agent learns complex behaviours from a learned world model built from high-dimensional sensory input. There are many potential ways of deriving behaviours from such models, and the Dreamer agent is capable of solving long-horizon tasks from images purely by latent imagination. Put simply, the agent selects actions by imagining their long-term outcomes. The researchers report gains in data efficiency, computation time and final performance.
Meet Agent Dreamer
A model-based reinforcement learning agent uses a model to predict rewards from observations and actions. In this context, Dreamer learns a latent dynamics model to predict rewards. The latent dynamics model learns from image inputs and plans actions to gather new experience.
Quick Overview of How It Works
Latent imagination works better here than traditional image prediction because latent states have a small memory footprint, which allows thousands of trajectories to be imagined in parallel. The word "latent" refers to the short sequence of hidden states the model works with; these states allow it to learn abstract representations such as the positions and velocities of objects.
An encoder component integrates the information from the input images into these latent states, which are then projected forward in time to anticipate images and rewards.
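To make the idea of imagining in latent space concrete, here is a minimal sketch. It is not Dreamer's implementation: the `encode`, `transition` and `reward` functions are hypothetical hand-written stand-ins for learned networks, and the latent size of 4 is an arbitrary assumption. It only shows that, once an image has been compressed into a small vector, a trajectory can be rolled forward without generating any images.

```python
import random

LATENT_DIM = 4   # assumed latent size, chosen only for illustration
HORIZON = 45     # imagined steps, matching the 45-step predictions in the captions

def encode(image):
    """Hypothetical encoder: average pixel chunks into a small latent vector."""
    chunk = len(image) // LATENT_DIM
    return [sum(image[i * chunk:(i + 1) * chunk]) / chunk for i in range(LATENT_DIM)]

def transition(state, action):
    """Hypothetical latent dynamics: next latent state from state and action."""
    return [0.9 * s + 0.1 * action for s in state]

def reward(state):
    """Hypothetical reward model: predicts a reward from the latent state alone."""
    return -sum(s * s for s in state)

def imagine(start_state, policy, horizon=HORIZON):
    """Roll a trajectory forward purely in latent space; no images are produced."""
    state, states, rewards = start_state, [], []
    for _ in range(horizon):
        action = policy(state)
        state = transition(state, action)
        states.append(state)
        rewards.append(reward(state))
    return states, rewards

rng = random.Random(0)
image = [rng.random() for _ in range(64 * 64)]  # stand-in for one grayscale frame
start = encode(image)

# Each latent state is only LATENT_DIM floats, so many rollouts are cheap to hold in memory.
policies = [lambda s, a=a: a for a in (-1.0, 0.0, 1.0)]  # three constant-action policies
rollouts = [imagine(start, p) for p in policies]
```

Here a 4,096-pixel frame is reduced to four numbers before any prediction happens, which is the property that makes imagining many trajectories side by side affordable.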
Dreamer completes a pendulum swing task. The middle shows 45-step predictions.
The Dreamer latent dynamics model has several parts: a representation model, a transition model and a reward model. The representation model encodes the agent's observations and actions. The transition model anticipates future latent states without seeing the corresponding observations. The reward model predicts the reward for each model state. Given the anticipated states, the action model aims to predict actions that solve the imagined environment, and a value model estimates the expected rewards that the action model achieves. Finally, an observation model provides feedback signals.
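The way these components fit together can be sketched roughly as follows. This is a toy illustration under loud assumptions: every function is a hypothetical one-line stand-in for a learned network, the latent state is a single number, the discount factor is assumed, and the return is a plain discounted sum with a value bootstrap, which may well differ from the estimator the researchers actually use.

```python
GAMMA = 0.99  # assumed discount factor

def transition_model(state, action):
    """Toy latent dynamics (assumption): predicts the next state without an observation."""
    return 0.9 * state + action

def reward_model(state):
    """Toy reward model (assumption): states near zero are rewarded most."""
    return -abs(state)

def action_model(state):
    """Toy policy (assumption): pushes the state toward zero."""
    return -0.5 * state

def value_model(state):
    """Toy value model (assumption): rough guess of rewards beyond the horizon."""
    return -2.0 * abs(state)

def imagined_return(state, horizon):
    """Discounted sum of imagined rewards, plus a value estimate at the final state."""
    total, discount = 0.0, 1.0
    for _ in range(horizon):
        action = action_model(state)             # action model proposes an action
        state = transition_model(state, action)  # transition model imagines the next state
        total += discount * reward_model(state)  # reward model scores that state
        discount *= GAMMA
    return total + discount * value_model(state) # value model covers what lies beyond

score = imagined_return(1.0, horizon=15)
```

The value model is what lets a short imagined rollout stand in for a long-horizon outcome: rewards past the last imagined step are summarised by a single estimate.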
Above: Dreamer playing an Atari game (Boxing). The middle shows 45-step predictions.
Experimental Evaluation
The Dreamer agent was experimentally evaluated on a variety of control tasks. The researchers designed the following experiments to compare Dreamer with the current best methods:
Control Tasks: Dreamer was evaluated on 20 visual control tasks from the DeepMind Control Suite. These tasks pose many different challenges to the agent, including contact dynamics, sparse rewards and 3D scenes.
Implementation: DeepMind trained the agent on an Nvidia V100 GPU and ten CPU cores for each training run. Each training run took 9 hours per 10^6 environment steps. This is considerably quicker than Google's PlaNet, which took 17 hours for the same number of steps. The world models are learned via reconstruction unless specified otherwise.
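As a quick back-of-the-envelope check of those figures, the snippet below converts the reported hours per 10^6 environment steps into throughput and a relative speedup. The only inputs are the two numbers quoted above; the helper name `steps_per_second` is ours.

```python
STEPS = 10**6          # environment steps per reported measurement
DREAMER_HOURS = 9      # reported time for Dreamer
PLANET_HOURS = 17      # reported time for PlaNet

def steps_per_second(hours, steps=STEPS):
    """Average environment steps processed per second of wall-clock time."""
    return steps / (hours * 3600)

dreamer_rate = steps_per_second(DREAMER_HOURS)
planet_rate = steps_per_second(PLANET_HOURS)
speedup = PLANET_HOURS / DREAMER_HOURS

print(f"Dreamer: {dreamer_rate:.1f} steps/s")   # Dreamer: 30.9 steps/s
print(f"PlaNet:  {planet_rate:.1f} steps/s")    # PlaNet:  16.3 steps/s
print(f"Speedup: {speedup:.2f}x")               # Speedup: 1.89x
```

In other words, the reported numbers put Dreamer at a little under twice PlaNet's training speed.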
Performance: The Dreamer agent was compared with state-of-the-art reinforcement learning agents such as D4PG and PlaNet. Dreamer outperforms D4PG while taking fewer environment steps, and it also exceeds PlaNet's data efficiency, which suggests that a small amount of experience is enough for the world model to generalise. Dreamer shows that learning behaviours by latent imagination in a world model can be more efficient than the top methods based on experience replay.
Outlook
With reinforcement learning becoming one of the most widely used areas of machine learning, it has now found an application in predicting the future from high-dimensional inputs. Dreamer's use of latent imagination means faster computation and more efficient use of data, which matters for real-world tasks. The potential of latent imagination has yet to be explored at scale, and future agents might be tested in environments of greater visual complexity. The researchers plan to present their work at NeurIPS 2019 in Vancouver this week.