The huge gap between the environment in which technology is developed and the environment in which it is deployed could explain why so much technology fails. Generalisation is difficult to achieve, especially for multi-agent systems. Two main reasons for this are physical environment variation and social environment variation. While the former has been studied extensively, the latter has been largely ignored.
Multi-agent reinforcement learning explores how artificial agents interact with one another and their environment. This class of algorithms would benefit from social generalisation abilities, yet there has been no systematic benchmark for assessing such abilities.
To this end, DeepMind has introduced a scalable evaluation suite for multi-agent reinforcement learning called Melting Pot.
What is Melting Pot?
Melting Pot is an evaluation suite that assesses generalisation to novel social situations involving both familiar and unfamiliar individuals. It can test a broad range of social interactions, such as cooperation, competition, deception, trust, reciprocation, and stubbornness.
Unlike multi-agent reinforcement learning (MARL), which lacks a broadly accepted benchmark, single-agent reinforcement learning (SARL) has a diverse set of benchmarks suited to different purposes. MARL thus has a relatively less favourable evaluation landscape than other machine learning subfields.
Melting Pot offers a set of 21 multi-agent games, or 'substrates', to train agents on, and more than 85 unique test scenarios for evaluating them.
A central equation, Substrate + Background Population = Scenario, captures the essence of the Melting Pot technique. A substrate is a partially observable general-sum Markov game: a game of imperfect information in which each player holds information unknown to their co-players. The substrate defines the layout of the map, where objects are located, and how they move. The background population is the part of the simulation that has agency, excluding the focal population of agents being tested. Finally, a scenario is a multi-agent environment used only for testing, never for training.
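The composition above can be sketched as a small data model. This is a conceptual illustration only, with hypothetical class and substrate names, not Melting Pot's actual API; the player counts are illustrative:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Substrate:
    # The physical environment: map layout, object placement, dynamics.
    name: str
    num_players: int

@dataclass
class BackgroundPopulation:
    # Pretrained bots filling the non-focal player slots.
    bot_names: List[str]

@dataclass
class Scenario:
    # Scenario = Substrate + Background Population: a held-out test
    # environment pairing a known substrate with unfamiliar co-players.
    substrate: Substrate
    background: BackgroundPopulation

    @property
    def num_focal_players(self) -> int:
        # Slots left over for the focal (tested) agents.
        return self.substrate.num_players - len(self.background.bot_names)

# A seven-player substrate where three slots are taken by background bots
# leaves four slots for the focal population under test.
clean_up = Substrate(name="clean_up", num_players=7)
bots = BackgroundPopulation(bot_names=["cleaner_bot"] * 3)
scenario = Scenario(substrate=clean_up, background=bots)
print(scenario.num_focal_players)  # 4
```

The key design point the sketch captures is that the focal agents never see the background bots during training; they only meet them inside a scenario at test time.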
The Melting Pot research assesses and compares multi-agent reinforcement learning algorithms and is concerned only with test-time evaluation. That is, a developer has access to each test's substrate but cannot dictate how it will be used at test time.
The Melting Pot suite contains a collection of zero-shot test scenarios that hold a familiar substrate fixed and substitute an unfamiliar background population. The DeepMind team has included purely competitive games, team-based competitive games, games with various mixed motivations, coordination games, and games of pure common interest. The number of players in each game ranges from two to 16. For the experiment, the researchers provided benchmark results on Melting Pot for different MARL models and found that maximising collective reward produced policies that were less robust to novel social situations, whereas the reverse held for policies obtained by maximising individual reward.
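The zero-shot protocol described above can be sketched as follows: the focal agents' policies are frozen, they play held-out episodes alongside a background population they never saw during training, and performance is summarised as the average per-capita focal reward. The function name and data layout here are illustrative assumptions, not Melting Pot's real interface:

```python
from statistics import mean

def evaluate_zero_shot(scenario_episodes):
    """Average per-capita focal reward across held-out episodes.

    `scenario_episodes` is a list of episodes; each episode is a list of
    total rewards, one per focal agent. No learning happens here: the
    focal policies are frozen, which is what makes the test zero-shot.
    """
    per_episode = [mean(rewards) for rewards in scenario_episodes]
    return mean(per_episode)

# Toy example: two episodes with three focal agents each.
episodes = [[1.0, 2.0, 3.0], [2.0, 2.0, 2.0]]
score = evaluate_zero_shot(episodes)
print(score)  # 2.0
```

Averaging per capita rather than summing keeps scores comparable across scenarios with different numbers of focal players.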
The agents’ performance on the test scenarios answers the following questions:
- Do they perform well across social situations where individuals are interdependent?
- Do they interact effectively with unfamiliar individuals not observed during training?
- Do they pass a universalisation test?
Answers to these questions can be used to rank different multi-agent reinforcement learning algorithms by their ability to generalise.
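One simple way to turn per-scenario results into such a ranking is to average each algorithm's score across all test scenarios and sort. This is a sketch of the idea, not the paper's exact aggregation; the algorithm and scenario names are placeholders:

```python
def rank_algorithms(scores):
    """Rank algorithms by mean score across test scenarios.

    `scores` maps algorithm name -> {scenario name -> score}.
    Returns algorithm names, best first.
    """
    mean_scores = {
        algo: sum(per_scenario.values()) / len(per_scenario)
        for algo, per_scenario in scores.items()
    }
    return sorted(mean_scores, key=mean_scores.get, reverse=True)

# Placeholder scores for two hypothetical baselines on two scenarios.
scores = {
    "baseline_a": {"scenario_0": 0.4, "scenario_1": 0.6},  # mean 0.5
    "baseline_b": {"scenario_0": 0.7, "scenario_1": 0.5},  # mean 0.6
}
print(rank_algorithms(scores))  # ['baseline_b', 'baseline_a']
```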
Melting Pot is an open-source project. The team used reinforcement learning itself to reduce the human labour of environment design; the 85 test scenarios were created this way. As the performance of learning systems improves, so does the effectiveness of the bots populating the test scenarios, and Melting Pot will be further improved by incorporating the latest agent technology into new background populations and test scenarios. “We hope Melting Pot will become a standard benchmark for multi-agent reinforcement learning. We plan to maintain it, and will be extending it in the coming years to cover more social interactions and generalisation scenarios,” the team said in a blog.
Read the full paper here.