DeepMind Introduces A New Benchmark For Meta Reinforcement Learning

Recently, a team of researchers from DeepMind and University College London released Alchemy, a principled benchmark for meta-reinforcement learning (meta-RL) research. The benchmark is designed to combine structural richness with structural transparency.

Meta-RL has picked up momentum in the last few years as an approach for increasing the flexibility and sample efficiency of reinforcement learning. The researchers define it as any process that yields faster learning, on average, with each new draw from the task distribution.

As per the researchers, whereas deep reinforcement learning requires a single task, meta-RL needs a task distribution: a large set of tasks with a shared structure. However, research in this area faces hurdles: adequate benchmark tasks are scarce, and existing ones are often too ill-defined to support principled analysis. The researchers built the new meta-RL benchmark to address these hurdles.
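To make the idea of a task distribution concrete, here is a minimal sketch in Python. It is a toy two-armed bandit, not anything from Alchemy: the function name `sample_task` and the payout probabilities are assumptions chosen purely for illustration. Every task shares the same structure (one good arm, one bad arm), and only the identity of the good arm changes between draws.

```python
import random


def sample_task(rng):
    """Draw one task from a toy task distribution (hypothetical example).

    Shared structure: exactly one arm pays out with probability 0.9 and
    the other with probability 0.1. Only which arm is the good one
    varies from task to task.
    """
    good_arm = rng.randrange(2)
    return tuple(0.9 if arm == good_arm else 0.1 for arm in range(2))


rng = random.Random(0)
tasks = [sample_task(rng) for _ in range(5)]
```

A learner that picks up on the shared structure only has to work out which arm is good in each new task, which is the kind of speed-up the definition of meta-RL above refers to.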


Behind Alchemy

The DeepMind Alchemy environment is a meta-reinforcement learning benchmark that presents tasks sampled from a task distribution with deep underlying structure. Alchemy is a 3D, first-person perspective video game implemented in the Unity game engine. According to the researchers, the benchmark was created to test agents' ability to reason and plan via latent-state inference, and to explore and experiment usefully.

Alchemy is highly structured: its non-trivial latent causal structure is resampled procedurally from episode to episode. Doing well therefore requires knowledge-based experimentation, structure learning, online inference, hypothesis testing and action sequencing based on abstract domain knowledge.

The researchers stated, “Because Alchemy levels are procedurally created based on a fully accessible generative process with a well-defined parameterisation, we are able to implement a Bayesian ideal observer as a gold standard for performance.”
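The idea behind a Bayesian ideal observer can be illustrated with a small Python sketch. This is not the researchers' implementation. The potion names, the two possible effects and the `update` function are all assumptions made for the example. The observer starts with a uniform prior over every candidate hidden rule set and applies Bayes' rule after each observed potion effect.

```python
from itertools import product

# Hypothetical toy setup: each potion either lowers or raises a stone's value.
POTIONS = ("red", "green")
EFFECTS = (-1, +1)

# Every hypothesis assigns one effect to each potion (4 hypotheses in total).
HYPOTHESES = [
    dict(zip(POTIONS, effects))
    for effects in product(EFFECTS, repeat=len(POTIONS))
]


def update(posterior, potion, observed_effect):
    """Bayes' rule with a deterministic likelihood.

    A hypothesis gets likelihood 1 if it predicts the observed effect
    for this potion and 0 otherwise, then the result is renormalised.
    """
    unnormalised = [
        p * (1.0 if hypothesis[potion] == observed_effect else 0.0)
        for hypothesis, p in zip(HYPOTHESES, posterior)
    ]
    total = sum(unnormalised)
    if total == 0.0:
        raise ValueError("observation is inconsistent with every hypothesis")
    return [u / total for u in unnormalised]


prior = [1.0 / len(HYPOTHESES)] * len(HYPOTHESES)
after_red = update(prior, "red", +1)          # two hypotheses remain
after_green = update(after_red, "green", -1)  # one hypothesis remains
```

In this toy, one experiment per potion is enough to pin down the hidden rules. An explicit generative process is what makes this kind of exact posterior computation possible.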

How It Works

The Alchemy environment is played in a series of ‘trials’, which fit together into ‘episodes’. Within each trial, the goal is to use a set of potions to transform each of a collection of visually distinctive stones into more valuable forms, collecting points when the stones are dropped into a central cauldron. The value of each stone is tied to its perceptual features, but this relationship changes from episode to episode. Hence, the implicit challenge within each episode is to diagnose the current chemistry within the available time, and then to use this diagnosis to manufacture the most valuable stones possible.
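The trial and episode structure described above can be sketched as a toy symbolic loop in Python. This is only an illustration and does not use the released Alchemy code. The function names, the integer stone values, the fixed potion policy and the number of trials are all assumptions. The key point it shows is that the hidden chemistry is sampled once per episode and then stays fixed across that episode's trials.

```python
import random

POTIONS = ("red", "green", "blue")  # hypothetical potion names


def sample_chemistry(rng):
    """Resample the hidden potion effects; called once per episode."""
    return {potion: rng.choice((-1, +1)) for potion in POTIONS}


def run_trial(chemistry, stones, choose_potion):
    """Apply one chosen potion to each stone, then score the results."""
    score = 0
    for value in stones:
        potion = choose_potion(value)
        # Dropping the transformed stone in the cauldron adds its value.
        score += value + chemistry[potion]
    return score


def run_episode(rng, num_trials=3, choose_potion=lambda value: "red"):
    """Play several trials under one fixed, hidden chemistry."""
    chemistry = sample_chemistry(rng)
    return [
        run_trial(chemistry, [0, 1, 2], choose_potion)
        for _ in range(num_trials)
    ]


scores = run_episode(random.Random(0))
```

With the fixed "always red" policy used here, every trial in an episode scores the same. An agent that meta-learns would instead use early trials to work out the chemistry and score higher in later ones.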

Benefits Of Alchemy

The researchers said Alchemy brings two desirable features:

  • Structural Interestingness: It demands experimentation, structured inference and strategic action sequencing.
  • Structural Accessibility: This is conferred by Alchemy's explicitly defined generative process, which furnishes an interpretable prior and supports the construction of a Bayes-optimal reference policy.

Wrapping Up

To validate the 3D environment, the researchers evaluated two powerful reinforcement learning agents on Alchemy. In both cases, despite mastering the basic mechanical aspects of the task, neither agent showed any appreciable signs of meta-learning.

Alchemy proved to be a challenging benchmark for meta-RL and will be useful to the larger community. The researchers open-sourced both the full 3D and symbolic versions of the Alchemy benchmark environment, along with a suite of benchmark policies, analysis tools, and episode logs on GitHub.

To use this benchmark environment, one needs Docker, Python 3.6.1 or later, and an x86-64 CPU with SSE4.2 support. The benchmark is intended to be run on Linux and is not officially supported on Mac and Windows.

Ambika Choudhury
A Technical Journalist who loves writing about Machine Learning and Artificial Intelligence. A lover of music, writing and learning something out of the box.
