
DeepMind Launches Evaluation Suite For Multi-Agent Reinforcement Learning

Multi-agent reinforcement learning explores how artificial agents interact with one another and their environment.



The gap between the environment in which a technology is developed and the environment in which it is deployed could explain why most technology fails. Generalisation is difficult to achieve, especially for multi-agent systems. The two main reasons are variation in the physical environment and variation in the social environment. While the former has been studied extensively, the challenge of social environment variation has largely been ignored.

Multi-agent reinforcement learning explores how artificial agents interact with one another and their environment. This class of algorithms would benefit from social generalisation abilities. However, there has been no systematic evaluation benchmark for assessing such abilities.

To this end, DeepMind has introduced a scalable evaluation suite for multi-agent reinforcement learning called Melting Pot. 

What is Melting Pot?

Melting Pot is a new evaluation technique that assesses generalisation to novel social situations involving both familiar and unfamiliar individuals. It can test a broad range of social interactions, such as cooperation, deception, competition, trust, reciprocation and stubbornness.

Single-agent reinforcement learning (SARL) has a diverse set of benchmarks suited to different purposes, whereas multi-agent reinforcement learning (MARL) lacks a broadly accepted benchmark. MARL also has a less favourable evaluation landscape than other machine learning subfields.


Melting Pot offers a set of 21 multi-agent games, or ‘substrates’, on which to train agents, and more than 85 unique test scenarios for evaluating them.

A central equation, Substrate + Background Population = Scenario, captures the essence of the Melting Pot technique. The term substrate refers to a partially observable general-sum Markov game, that is, a game of imperfect information in which each player may hold information that is unknown to its co-players. A substrate includes the layout of the map, where objects are located, and how they move. The background population is the part of the simulation that has agency but is not part of the focal population, the agents being tested. Finally, a scenario is a multi-agent environment used only for testing; agents are not trained in it.
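The equation above can be sketched as a small data structure. This is an illustrative sketch only, not the Melting Pot API: the class names, fields, and the assumption that each background bot occupies exactly one player slot are all hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Substrate:
    """Hypothetical stand-in for a game: a name and a number of player slots."""
    name: str
    num_players: int


@dataclass(frozen=True)
class BackgroundPopulation:
    """Hypothetical stand-in for the bots that fill the non-focal slots."""
    bot_names: Tuple[str, ...]


@dataclass(frozen=True)
class Scenario:
    """Substrate + Background Population = Scenario (test-time only)."""
    substrate: Substrate
    background: BackgroundPopulation

    def __post_init__(self) -> None:
        # Assumption: one slot per background bot, and at least one slot
        # must remain for the focal agents under test.
        if len(self.background.bot_names) >= self.substrate.num_players:
            raise ValueError("a scenario needs at least one focal slot")

    @property
    def num_focal_players(self) -> int:
        # Slots left over for the agents being evaluated.
        return self.substrate.num_players - len(self.background.bot_names)


toy = Scenario(Substrate("toy_game", 4), BackgroundPopulation(("bot_a", "bot_b")))
print(toy.num_focal_players)
```

In this toy case, a four-player substrate with two background bots leaves two slots for the focal agents.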

The Melting Pot research assesses and compares multi-agent reinforcement learning algorithms and is concerned only with test-time evaluation. In other words, a developer has access to each test’s substrate for training, and Melting Pot does not dictate how it is used.

The Melting Pot suite contains a collection of zero-shot test scenarios that preserve a familiar substrate while substituting an unfamiliar background population. The DeepMind team has included purely competitive games, team-based competitive games, different kinds of mixed-motivation games, coordination games, and games of pure common interest. The number of players in each game ranges from two to 16. The researchers provided benchmark results on Melting Pot for different MARL models and found that maximising collective reward produced policies that are less robust to novel social situations. The reverse is true for policies obtained by maximising individual rewards.
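The difference between the two training objectives mentioned above can be illustrated with a minimal sketch. The function names and the reward values are hypothetical; this only shows what each objective measures and is not how the researchers trained their agents.

```python
from typing import Sequence


def individual_objective(rewards: Sequence[float], agent: int) -> float:
    """The agent is credited only with its own reward."""
    return rewards[agent]


def collective_objective(rewards: Sequence[float], agent: int) -> float:
    """Every agent is credited with the group's total reward.

    The `agent` argument is unused; it is kept so that both objectives
    share the same signature.
    """
    return sum(rewards)


# Hypothetical per-agent rewards for a single step of a three-player game.
step_rewards = [1.0, 0.0, 3.0]
print(individual_objective(step_rewards, agent=0))
print(collective_objective(step_rewards, agent=0))
```

Under the collective objective, agent 0 is credited with the whole group's reward even though it contributed only part of it.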

The agents’ performance on the test scenarios answers the following questions:

  • Do they perform well across social situations where individuals are interdependent?
  • Do they interact effectively with unfamiliar individuals not observed during training?
  • Do they pass a universalisation test?

Answers to these questions can be used to rank different multi-agent reinforcement learning algorithms by their ability to generalise.
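One way such a ranking could be computed is sketched below. The scoring scheme (min-max normalisation within each scenario, then averaging across scenarios) and the example numbers are assumptions made for illustration; the article does not specify how scores are aggregated.

```python
from typing import Dict, List


def rank_algorithms(scores: Dict[str, Dict[str, float]]) -> List[str]:
    """Rank algorithms by mean min-max-normalised score across scenarios.

    `scores[algorithm][scenario]` is a hypothetical raw test score. Every
    algorithm is expected to have a score for every scenario.
    """
    scenarios = sorted({s for per_algo in scores.values() for s in per_algo})
    if not scenarios:
        return sorted(scores)
    totals = {name: 0.0 for name in scores}
    for scenario in scenarios:
        values = [scores[name][scenario] for name in scores]
        low, high = min(values), max(values)
        span = high - low
        for name in scores:
            # Normalise so that scenarios with large raw scores do not dominate.
            if span > 0:
                totals[name] += (scores[name][scenario] - low) / span
    means = {name: totals[name] / len(scenarios) for name in totals}
    # Best first; ties are broken alphabetically for a stable ordering.
    return sorted(means, key=lambda name: (-means[name], name))


# Made-up scores for three hypothetical algorithms on two scenarios.
example = {
    "algo_a": {"scenario_1": 1.0, "scenario_2": 5.0},
    "algo_b": {"scenario_1": 3.0, "scenario_2": 4.0},
    "algo_c": {"scenario_1": 2.0, "scenario_2": 3.0},
}
print(rank_algorithms(example))
```

Normalising within each scenario is a design choice: it keeps a single scenario with a large reward scale from deciding the overall ranking.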

Wrapping up

Melting Pot is an open-source project. The team used reinforcement learning to reduce the human labour involved in environment design; one demonstration of this was the creation of more than 85 different test scenarios. As learning systems improve, so does the effectiveness of the bots used in test scenarios. Melting Pot will be further improved by incorporating the latest agent technology into new background populations and test scenarios. “We hope Melting Pot will become a standard benchmark for multi-agent reinforcement learning. We plan to maintain it, and will be extending it in the coming years to cover more social interactions and generalisation scenarios,” the team said in a blog post.
Read the full paper here.


Shraddha Goled

I am a technology journalist with AIM. I write stories focused on the AI landscape in India and around the world with a special interest in analysing its long term impact on individuals and societies. Reach out to me at shraddha.goled@analyticsindiamag.com.