Mastering Atari with Discrete World Models: DreamerV2

In collaboration with DeepMind and the University of Toronto, Google has released DreamerV2, the first reinforcement learning agent to achieve human-level Atari performance by learning behaviors entirely inside a separately trained world model. The paper, Mastering Atari with Discrete World Models, is authored by Danijar Hafner, Timothy Lillicrap, Mohammad Norouzi, and Jimmy Ba. Reinforcement learning methods have made considerable progress in a short time: these approaches have beaten human world champions at their respective games, using methods ranging from model-free to model-based learning.

DreamerV2 is a model-based method, in which the agent predicts the outcomes of potential actions in order to make informed decisions in new scenarios. It builds on the Dreamer agent from DreamerV1 with a few adjustments. Using a single GPU and a single environment instance, DreamerV2 outperforms top single-GPU model-free agents within the same computational budget and training time.

The Model Architecture of DreamerV2

DreamerV2 mainly consists of three components, repeated in a loop:

  1. Learn a world model from a dataset of past experience.
  2. Learn an actor and a critic from imagined sequences of compact model states.
  3. Execute the actor in the environment to grow the experience dataset.

DreamerV2 is built upon the Recurrent State-Space Model (RSSM), the backbone of the first step. The training images are encoded with a CNN: each image is turned into a stochastic representation (z1 – z3), which is incorporated into a recurrent state (h1 – h3). From the recurrent state and the stochastic representations, the model tries to reconstruct the same input image, in order to learn general representations, and to predict the reward for the actions performed (a1 – a2), as sketched below.
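
To make the flow concrete, here is a minimal sketch of a single RSSM update in TensorFlow. The state size, the use of a plain GRU cell, and the name rssm_step are illustrative assumptions, not the repository's API:

 import tensorflow as tf

 # Hypothetical size for illustration; the real model is defined in the
 # DreamerV2 repository.
 gru = tf.keras.layers.GRUCell(200)

 def rssm_step(h, z, a):
     # Fold the previous stochastic representation z and the action a
     # into the recurrent state h, as the RSSM does at each time step.
     x = tf.concat([z, a], axis=-1)
     h_next, _ = gru(x, [h])
     return h_next

 # Example: batch of 16, 32x32 one-hot latents, 6 discrete actions.
 h = tf.zeros([16, 200])
 h = rssm_step(h, tf.zeros([16, 32 * 32]), tf.zeros([16, 6]))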

Unlike the DreamerV1 agent, DreamerV2 represents each image with categorical variables (which can capture multimodal distributions) rather than normal (continuous) variables, which is why it is called a discrete world model. The encoder converts each image into 32 distributions over 32 classes each, and the meanings of these classes are learned automatically by the world model. One-hot vectors sampled from these distributions are concatenated into a sparse representation that the model passes to the recurrent state. To backpropagate through the samples, DreamerV2 uses straight-through gradients, which are easy to implement with automatic differentiation. The second difference lies in the loss function: KL balancing trains the prior (the prediction) faster while regularizing how much information the posterior (the stochastic representation) incorporates from the image. This regularization increases robustness to novel inputs and encourages reusing existing information from past steps to predict rewards and reconstruct images, thus learning long-term dependencies. Both ideas are sketched below.
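
Here is a minimal sketch of both ideas with TensorFlow Probability, assuming the 32 x 32 categorical layout described above; the function names are ours, not the repository's API (the paper uses a mixing weight of alpha = 0.8):

 import tensorflow as tf
 import tensorflow_probability as tfp

 tfd = tfp.distributions

 def sample_straight_through(logits):
     # logits: (batch, 32, 32) -- 32 distributions over 32 classes.
     dist = tfd.OneHotCategorical(logits=logits, dtype=tf.float32)
     sample = dist.sample()                  # hard one-hot, no gradient
     probs = tf.nn.softmax(logits, axis=-1)  # differentiable surrogate
     # Forward pass keeps the hard sample; the backward pass flows
     # through the probabilities (straight-through gradients).
     sample = sample + probs - tf.stop_gradient(probs)
     return tf.reshape(sample, [-1, 32 * 32])  # sparse vector for the RSSM

 def kl_balancing(post_logits, prior_logits, alpha=0.8):
     # Mix two KL terms with stopped gradients: the first trains the
     # prior toward the fixed posterior, the second regularizes the
     # posterior toward the fixed prior.
     dist = lambda l: tfd.Independent(tfd.OneHotCategorical(logits=l), 1)
     sg = tf.stop_gradient
     return (alpha * tfd.kl_divergence(dist(sg(post_logits)), dist(prior_logits)) +
             (1 - alpha) * tfd.kl_divergence(dist(post_logits), dist(sg(prior_logits))))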

For the second step, DreamerV2 learns behaviors purely from imagined sequences of compact model states, using actor-critic learning: the actor chooses actions, and the critic judges them by estimating the sum of future rewards. Finally, the trained actor is executed in the real environment to grow the experience dataset, and the loop repeats. A minimal sketch of the imagination rollout follows.
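
Below is a minimal sketch of the imagination rollout with lambda-returns. All networks (actor, critic, reward_head, dynamics) are tiny hypothetical stand-ins for the agent's real networks, while the hyperparameters (horizon 15, discount 0.995, lambda 0.95) follow the paper's Atari setting:

 import tensorflow as tf
 import tensorflow_probability as tfp

 STATE, ACTIONS = 64, 6  # hypothetical sizes for illustration
 actor = tf.keras.layers.Dense(ACTIONS)
 critic = tf.keras.layers.Dense(1)
 reward_head = tf.keras.layers.Dense(1)
 dynamics = tf.keras.layers.Dense(STATE)  # stand-in for the RSSM prior

 def imagine(start_state, horizon=15, discount=0.995, lam=0.95):
     state, rewards, values = start_state, [], []
     for _ in range(horizon):
         # The actor acts on compact model states, never on images.
         action = tfp.distributions.OneHotCategorical(
             logits=actor(state), dtype=tf.float32).sample()
         # The learned world model predicts the next compact state.
         state = dynamics(tf.concat([state, action], axis=-1))
         rewards.append(reward_head(state))
         values.append(critic(state))
     # Lambda-returns, computed backwards from the bootstrap value,
     # serve as training targets for the critic and the actor.
     returns = [values[-1]]
     for t in reversed(range(horizon - 1)):
         returns.insert(0, rewards[t] + discount * (
             (1 - lam) * values[t + 1] + lam * returns[0]))
     return returns

 # Example: imagine ahead from a batch of 16 compact model states.
 targets = imagine(tf.random.normal([16, STATE]))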

More details about the architecture can be found in the paper.

Performance of DreamerV2

The picture below shows video predictions of the DreamerV2 world model: the top row shows frames from an actual game episode, and the bottom row shows the corresponding predictions from the model.

Requirements & Installation

Install all the dependencies of the proposed method via pip.

 %%bash
 pip install --user tensorflow==2.3.1
 pip install --user tensorflow_probability==0.11.1
 pip install --user pandas
 pip install --user matplotlib
 pip install --user ruamel.yaml
 pip install --user 'gym[atari]' 

Clone the repository with git:

 !git clone https://github.com/danijar/dreamerv2.git
 %cd dreamerv2 

Train your Dreamer

Train the DreamerV2 model on a single GPU in a Colab notebook. The command is given below:

 !python dreamer.py --logdir ~/logdir/atari_pong/dreamerv2/1 \
     --configs defaults atari --task atari_pong 

You can further monitor the results using TensorBoard:

%load_ext tensorboard
%tensorboard --logdir ~/logdir

Generate plots with:

!python plotting.py --indir ~/logdir --outdir ~/plots --xaxis step --yaxis eval_return --bins 1e6

Conclusion

In this article, we have given a short introduction to the DreamerV2 model, the first reinforcement learning agent to achieve human-level performance on the Atari benchmark by learning entirely inside a world model, outperforming many model-free methods.

Reference material is as follows:

  1. Mastering Atari with Discrete World Models by Danijar Hafner, Timothy Lillicrap, Mohammad Norouzi, Jimmy Ba (arXiv:2010.02193)
  2. DreamerV2 code repository: https://github.com/danijar/dreamerv2
