Now Reading
Hands-on Alchemy: A Structured Task Distribution for Meta-Reinforcement Learning

Hands-on Alchemy: A Structured Task Distribution for Meta-Reinforcement Learning


This post is in continuation with our previous article about Alchemy, the very first benchmark on meta-Reinforcement Learning. Deepmind with the University of London has released an open-source benchmark environment for meta-RL :  Alchemy: A structured task distribution for meta-reinforcement learning by Jane X. Wang, Michael King, Nicolas Porcel, Zeb Kurth-Nelson, Tina Zhu, Charlie Deck, Peter Choy, Mary Cassin, Malcolm Reynolds, Francis Song, Gavin Buttimore, David P. Reichert, Neil Rabinowitz, Loic Matthey, Demis Hassabis, Alex Lerchner, and Matthew Botvinick.

Meta-Learning or Learning-to-learn requires a series of tasks that provide both complete knowledge of task(accessibility) and shared structure from the real-world point of view(interesting). Existing work on meta-RL offered either of interest ability or accessibility but Alchemy extended the boundaries by providing both. Alchemy is a single-player 3D video game, whose environment is implemented in Unity. Here is a demonstration video of how Alchemy works. 


  • Docker
  • Python >= 3.6.1
  • x86-64 CPU with SSE4.2 support
  • Alchemy is fully supported on Linux but can be run on Windows via WSL.


The following installation procedure is for Linux 20.04. 

  1. Install docker from here. Run commands 

docker run hello-world and docker run -d

To check the proper installation of the docker.

  1. Install Alchemy by cloning the git repository
 !git clone
 !pip install wheel
 !pip install --upgrade setuptools
 !pip install ./dm_alchemy 

Getting Started with Alchemy

First step is to create a 3D environment and observe it.

  1. Import all the required packages and modules.
 import os
 import matplotlib.pyplot as plt
 import numpy as np
 import seaborn as sns
 import dm_alchemy
 from dm_alchemy import io
 from dm_alchemy import symbolic_alchemy
 from dm_alchemy import symbolic_alchemy_bots
 from dm_alchemy import symbolic_alchemy_trackers
 from dm_alchemy import symbolic_alchemy_wrapper
 from dm_alchemy.encode import chemistries_proto_conversion
 from dm_alchemy.encode import symbolic_actions_proto_conversion
 from dm_alchemy.encode import symbolic_actions_pb2
 from dm_alchemy.types import stones_and_potions
 from dm_alchemy.types import unity_python_conversion
 from dm_alchemy.types import utils 
  1. Create an environment.
 #an image of size 240 X 200
 width, height = 240, 200
 #define the task, description can be found here:
 level_name = 'alchemy/perceptual_mapping_randomized_with_rotation_and_random_bottleneck'
 seed = 1023
 settings = dm_alchemy.EnvironmentSettings(
     seed=seed, level_name=level_name, width=width, height=height)
 env = dm_alchemy.load_from_docker(settings) 
  1. Check the observations via env.observation_spec() whose description can be found here.
  1. Alchemy provides many actions to the agent like to move left, right, up, down, etc, you can find the whole list here or via this command env.action_spec().
  1. Start a new episode and plot it.
 #plot the new episode
 timestep = env.reset()
  1. From the image above, it is visible that the person is far from the table so to get closer to it, let’s take some actions.
 for _ in range(38):
   timestep = env.step({'MOVE_BACK_FORWARD': 1.0})
 for _ in range(6):
   timestep = env.step({'LOOK_LEFT_RIGHT': -1.0})
 for _ in range(3):
   timestep = env.step({'LOOK_DOWN_UP': 1.0})

Now we can see the potions of different colours which can be used to transform stones to increase their value before placing them in the big white cauldron to get reward.

Create a Symbolic Environment : Representing the above challenge in symbolic form which eliminates the complex perceptual challenges, more difficult action space and longer timescales of the 3d environment. Full details of the symbolic environment can be found in the Appendix of the Research Paper.

  1. Create an environment.

env = symbolic_alchemy.get_symbolic_alchemy_level(level_name, seed=314)

  1. Check the observations via env.observation_spec(). Here the symbolic environment is concatenated into a single array of length 39 with 5 features for each stone. Details can be found here.
  1. Total actions provided by alchemy for symbolic link are 40 (0 – 39) where 0 represents nothing and other integers represents putting stone in a potion or cauldron.


  1. Start the observation with stone 0 being purple and small but in mid way it found round and pointy. Hence, its reward indicator shown -1
 timestep = env.reset()
  1. After step 2, symbolic representation can be found as :
 timestep = env.step(2)

If we put stone 0 into potion 0 and look at the resulting observation we can see that stone 0 has become purple, large and halfway between round and pointy and its reward indicator shows -3. From this an agent which understands the task and the generative process can deduce things.       

See Also
A Deep Reinforcement Learning Model Outperforms Humans In Gran Turismo Sport

Combined Environment    

This environment combines 3D environment and symbolic environment via wrapper around 3D environment. The wrapper listens to the events output and uses them to initialize and take actions in a symbolic environment so as to sync symbolic environment with 3D environment.

 seed = 1023
 settings = dm_alchemy.EnvironmentSettings(
     seed=seed, level_name=level_name, width=width, height=height)
 env3d = dm_alchemy.load_from_docker(settings)
 env = symbolic_alchemy_wrapper.SymbolicAlchemyWrapper(
     env3d, level_name=level_name, see_symbolic_observation=True) 

Here, the agent takes actions in the 3D environment and receives observations from the 3D environment. Apart from this, it can also receive observations generated from symbolic environments.

  1. Create a new episode using the same seed 
 timestep = env.reset()
  1. Take some actions in a 3D environment.
 for _ in range(37):
   timestep = env.step({'MOVE_BACK_FORWARD': 1.0})
 for _ in range(4):
   timestep = env.step({'LOOK_LEFT_RIGHT': -1.0})
 for _ in range(3):
   timestep = env.step({'LOOK_DOWN_UP': 1.0})
  1. Now, with symbolic observation, we see that the output matches the symbolic environment, even the color of the potions are the same.
 for i in range(0, 15, 5):
   print(timestep.observation['symbolic_obs'][i:i + 5])
 for i in range(15, 39, 2):
   potion_index = int(round(timestep.observation['symbolic_obs'][i] * 3 + 3))
   potion = stones_and_potions.perceived_potion_from_index(potion_index)

Additional Information about the combined environment can be found here.


In this post, we have covered the basic usage of Alchemy. Reference material can be found here: 

Note: The above implementation is verified on Linux (Ubuntu LTS 20.04) operating system.

What Do You Think?

Join Our Telegram Group. Be part of an engaging online community. Join Here.

Subscribe to our Newsletter

Get the latest updates and relevant offers by sharing your email.

Copyright Analytics India Magazine Pvt Ltd

Scroll To Top