Recently, a team of researchers from DeepMind and University College London released Alchemy, a principled benchmark for meta-reinforcement learning (meta-RL) research. The benchmark combines structural richness with structural transparency.
As an approach for increasing the flexibility and sample efficiency of reinforcement learning, meta-RL has gained momentum over the last few years. Meta-RL is defined as any process that yields faster learning, on average, with each new draw from a task distribution.
According to the researchers, whereas standard deep reinforcement learning requires only a single task, meta-RL needs a task distribution: a large set of tasks that share a common structure. However, research in this area faces obstacles such as a scarcity of adequate benchmark tasks and task distributions whose structure is too ill-defined to support principled analysis. The researchers created the new meta-RL benchmark to address these hurdles.
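To make the distinction between a task and a task distribution concrete, here is a toy Python sketch that is not taken from the paper: every task is a two-armed bandit in which exactly one arm pays off often, and only the identity of that arm changes between draws. The names `BanditTask`, `sample_task`, `win_stay_lose_shift` and `mean_over_tasks` are hypothetical, and the hand-written win-stay, lose-shift rule merely stands in for the kind of structure-exploiting behaviour a meta-learner would ideally discover; it is not itself a meta-RL algorithm.

```python
import random
from dataclasses import dataclass


@dataclass
class BanditTask:
    """One task drawn from the (hypothetical) distribution: a two-armed bandit."""
    good_arm: int
    p_good: float = 0.9
    p_bad: float = 0.1

    def pull(self, arm, rng):
        p = self.p_good if arm == self.good_arm else self.p_bad
        return 1 if rng.random() < p else 0


def sample_task(rng):
    # Shared structure: every task has exactly one good arm; only which one varies.
    return BanditTask(good_arm=rng.randrange(2))


def win_stay_lose_shift(task, rng, steps=20):
    """Exploits the shared structure: keep a paying arm, switch after a miss."""
    arm, total = 0, 0
    for _ in range(steps):
        reward = task.pull(arm, rng)
        total += reward
        if reward == 0:
            arm = 1 - arm
    return total / steps


def random_policy(task, rng, steps=20):
    """Ignores the structure entirely."""
    return sum(task.pull(rng.randrange(2), rng) for _ in range(steps)) / steps


def mean_over_tasks(policy, n_tasks=2000, seed=0):
    """Average per-step reward over many fresh draws from the task distribution."""
    rng = random.Random(seed)
    return sum(policy(sample_task(rng), rng) for _ in range(n_tasks)) / n_tasks
```

Because every task shares the one-good-arm structure, `mean_over_tasks(win_stay_lose_shift)` comes out at roughly 0.8 reward per step, compared with about 0.5 for `random_policy`.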
The DeepMind Alchemy environment is a meta-reinforcement learning benchmark that presents tasks sampled from a task distribution with deep underlying structure. Alchemy is a 3D, first-person perspective video game implemented in the Unity game engine. According to the researchers, the benchmark was created to test agents' ability to reason and plan via latent state inference, as well as their capacity for useful exploration and experimentation.
Alchemy is highly structured, with a non-trivial latent causal structure that is resampled procedurally from episode to episode, that is, every time the game is played. This resampling affords structure learning, online inference, hypothesis testing and action sequencing based on abstract domain knowledge, and solving the task requires knowledge-based experimentation and strategic action sequencing.
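Hypothesis testing of this kind can be framed as choosing the experiment whose outcome is most uncertain. The sketch below assumes a hypothetical toy chemistry, not Alchemy's actual generative process, in which each of three potions flips exactly one of three binary stone features, so a hypothesis is a permutation mapping potions to features. Assuming a uniform prior and deterministic outcomes, the expected information gain of testing a potion equals the entropy of its predicted outcome over the hypotheses still consistent with past observations. `outcome_entropy` and `most_informative_potion` are illustrative names.

```python
import itertools
import math

FEATURES = 3  # hypothetical toy: 3 potions, 3 binary stone features
# Each hypothesis h maps potion p to the feature h[p] that it flips.
HYPOTHESES = list(itertools.permutations(range(FEATURES)))


def outcome_entropy(hypotheses, potion):
    """Entropy (bits) of which feature `potion` flips, under a uniform posterior."""
    counts = {}
    for h in hypotheses:
        counts[h[potion]] = counts.get(h[potion], 0) + 1
    n = len(hypotheses)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def most_informative_potion(hypotheses, potions):
    # With deterministic outcomes, expected information gain = outcome entropy.
    return max(potions, key=lambda p: outcome_entropy(hypotheses, p))


# After observing that potion 0 flips feature 2, keep only consistent hypotheses.
remaining = [h for h in HYPOTHESES if h[0] == 2]
print(most_informative_potion(remaining, range(FEATURES)))  # prints 1 (ties with 2; max keeps the first)
```

Once potion 0 is known to flip feature 2, only two hypotheses remain; retesting potion 0 is then uninformative (entropy 0), while a single 1-bit test of potion 1 or potion 2 resolves the remaining uncertainty.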
The researchers stated, “Because Alchemy levels are procedurally created based on a fully accessible generative process with a well-defined parameterisation, we are able to implement a Bayesian ideal observer as a gold standard for performance.”
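Because the generative process is fully known, an ideal observer can maintain an exact posterior over the latent structure. The following sketch uses the same kind of hypothetical toy chemistry (each potion flips one binary feature; a hypothesis is a potion-to-feature permutation) with exact `Fraction` arithmetic. The function names are illustrative only, and Alchemy's actual ideal observer works over the benchmark's own, much richer generative process.

```python
import itertools
from fractions import Fraction

FEATURES = 3  # hypothetical toy chemistry: 3 potions, 3 binary features
HYPOTHESES = list(itertools.permutations(range(FEATURES)))  # h[p] = feature potion p flips


def uniform_prior():
    return {h: Fraction(1, len(HYPOTHESES)) for h in HYPOTHESES}


def predict(h, stone, potion):
    """Stone that hypothesis h predicts after applying `potion`."""
    s = list(stone)
    s[h[potion]] ^= 1
    return tuple(s)


def update(posterior, stone, potion, result):
    """Bayes' rule with a deterministic likelihood (1 if h predicts `result`, else 0)."""
    unnormalised = {h: p for h, p in posterior.items()
                    if predict(h, stone, potion) == result}
    evidence = sum(unnormalised.values())
    if evidence == 0:
        raise ValueError("observation is inconsistent with every hypothesis")
    return {h: p / evidence for h, p in unnormalised.items()}


def prob_potion_flips(posterior, potion, feature):
    """Posterior predictive probability that `potion` flips `feature`."""
    return sum((p for h, p in posterior.items() if h[potion] == feature), Fraction(0))


post = update(uniform_prior(), (0, 0, 0), 0, (0, 0, 1))  # potion 0 flipped feature 2
print(prob_potion_flips(post, 1, 0))  # 1/2
post = update(post, (0, 0, 0), 1, (1, 0, 0))  # potion 1 flipped feature 0
print(post)  # {(2, 0, 1): Fraction(1, 1)}
```

Observing that potion 0 turns stone (0, 0, 0) into (0, 0, 1) leaves two hypotheses at probability 1/2 each; a second observation that potion 1 flips feature 0 collapses the posterior onto the single permutation (2, 0, 1).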
How It Works
The Alchemy environment is played in a series of ‘trials’, which fit together into ‘episodes’. Within each trial, the goal is to use a set of potions to transform each stone in a collection of visually distinctive stones into more valuable forms, collecting points when the stones are dropped into a central cauldron. The value of each stone is tied to its perceptual features, but this relationship changes from episode to episode. Hence, the implicit challenge within each episode is to diagnose the current chemistry within the available time and then leverage this diagnosis to manufacture the most valuable stones possible.
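The trial and episode structure can be sketched as a small simulation. Everything here is an illustrative assumption rather than Alchemy's real mechanics: stones have three binary features, each potion flips one feature, and a stone's value is a per-feature sum whose signs are resampled, together with the potion mapping, once per episode. `ToyChemistry`, `run_episode` and `oracle_policy` are hypothetical names; `oracle_policy` cheats by reading the latent chemistry, which is exactly the diagnosis a real agent has to perform through its own experiments.

```python
import random

FEATURES = 3  # hypothetical: three binary perceptual features per stone


class ToyChemistry:
    """Hypothetical latent 'chemistry', resampled once per episode."""

    def __init__(self, rng):
        # potion_effect[p] is the stone feature that potion p flips.
        self.potion_effect = rng.sample(range(FEATURES), FEATURES)
        # signs[f] decides whether feature f raises or lowers value this episode.
        self.signs = [rng.choice([-1, 1]) for _ in range(FEATURES)]

    def apply(self, stone, potion):
        s = list(stone)
        s[self.potion_effect[potion]] ^= 1
        return tuple(s)

    def value(self, stone):
        # Each feature contributes +1 or -1 depending on this episode's signs.
        return sum(sign * (1 if bit else -1) for sign, bit in zip(self.signs, stone))


def run_episode(policy, rng, n_trials=10, stones_per_trial=3):
    chemistry = ToyChemistry(rng)  # fixed for every trial in this episode
    total = 0
    for _ in range(n_trials):
        stones = [tuple(rng.randrange(2) for _ in range(FEATURES))
                  for _ in range(stones_per_trial)]
        for stone in stones:
            for potion in policy(stone, chemistry):
                stone = chemistry.apply(stone, potion)
            total += chemistry.value(stone)  # dropping the stone in the cauldron banks its value
    return total


def oracle_policy(stone, chemistry):
    """Cheats by reading the latent chemistry; a real agent would have to infer it."""
    target = [1 if sign == 1 else 0 for sign in chemistry.signs]
    return [chemistry.potion_effect.index(f)
            for f in range(FEATURES) if stone[f] != target[f]]


def do_nothing_policy(stone, chemistry):
    return []
```

With the oracle, every stone reaches the maximum value of 3, so `run_episode(oracle_policy, random.Random(0))` returns 90 (10 trials × 3 stones × 3 points); an agent that must first infer the chemistry will fall short of this bound early in each episode.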
Benefits Of Alchemy
The researchers said Alchemy brings two desirable features:
- Structural Interestingness: It demands experimentation, structured inference and strategic action sequencing.
- Structural Accessibility: Alchemy's accessibility is conferred by its explicitly defined generative process, which furnishes an interpretable prior and supports the construction of a Bayes-optimal reference policy.
As a validation of the 3D environment, the researchers evaluated two powerful reinforcement learning agents on Alchemy and found that in both cases, despite mastering the basic mechanical aspects of the task, neither agent showed any appreciable signs of meta-learning.
According to the researchers, these results establish Alchemy as a challenging benchmark for meta-RL that should be useful to the wider community. They open-sourced both the full 3D and symbolic versions of the Alchemy benchmark environment, along with a suite of benchmark policies, analysis tools, and episode logs, on GitHub.
To use this benchmark environment, you need Docker, Python 3.6.1 and an x86-64 CPU with SSE4.2 support. The benchmark is intended to run on Linux and is not officially supported on Mac or Windows.
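Before installing, the stated prerequisites can be checked programmatically. The helper below, `check_alchemy_requirements`, is hypothetical and is not part of the Alchemy release; it only checks the prerequisites listed above, and it looks for the `sse4_2` flag in `/proc/cpuinfo`, a file that exists only on Linux.

```python
import platform
import shutil
import sys


def check_alchemy_requirements(cpuinfo_path="/proc/cpuinfo"):
    """Hypothetical helper: list unmet prerequisites (empty list = all checks passed)."""
    problems = []
    # Python version stated in the article.
    if sys.version_info < (3, 6, 1):
        problems.append(f"Python {platform.python_version()} is older than 3.6.1")
    # Linux is the only officially supported OS.
    if platform.system() != "Linux":
        problems.append(f"{platform.system()} is not officially supported (Linux expected)")
    # x86-64 CPU: platform.machine() reports 'x86_64' on Linux, 'AMD64' on Windows.
    if platform.machine().lower() not in ("x86_64", "amd64"):
        problems.append(f"CPU architecture {platform.machine()!r} is not x86-64")
    # Docker must be installed and on PATH.
    if shutil.which("docker") is None:
        problems.append("docker executable not found on PATH")
    # SSE4.2: the Linux kernel lists it as the 'sse4_2' CPU flag.
    try:
        with open(cpuinfo_path) as f:
            if "sse4_2" not in f.read():
                problems.append("CPU does not report SSE4.2 support")
    except OSError:
        problems.append(f"could not read {cpuinfo_path} to check for SSE4.2")
    return problems


if __name__ == "__main__":
    issues = check_alchemy_requirements()
    for issue in issues:
        print("Missing prerequisite:", issue)
    if not issues:
        print("All listed prerequisites appear to be met.")
```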