Intel AI Proposes Novel RL For Teaching Robots Teamwork

Intel AI has proposed and developed MERL (Multiagent Evolutionary Reinforcement Learning), a scalable, data-efficient method for training a team of agents to solve a coordination task jointly. In short, agents learn not only to maximise their own rewards through self-interested strategies, but also to make decisions that benefit the team as a whole.

Advances in computer vision and reinforcement learning (RL) have improved perception and decision-making, the two key aspects of autonomous systems. Recent RL methods leverage these capabilities to let agents interact with their environment more efficiently and make better decisions.

However, the complexity increases when multiple agents must be trained in the same environment. Take soccer: forward agents are trained to score goals, yet the game sometimes requires even a forward to set aside its goal-scoring prerogative and help defend the team's lead in the final minutes.

In the proposed method, a team of agents is represented as a multi-headed neural network with a common trunk. The researchers split the learning objective into two optimisation processes that run simultaneously: a policy gradient method optimises each agent's dense local rewards, while the sparser team objective is optimised with an evolutionary method similar to the team's earlier approach in CERL.
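The shared-trunk idea can be illustrated with a minimal sketch. This is not Intel's actual architecture; the class name, layer sizes, and activations are illustrative assumptions. The point is that every agent's head reads from one common feature extractor, so trunk parameters are shared across the team while heads stay agent-specific.

```python
import numpy as np

# Hypothetical sketch of a team as one multi-headed network with a
# shared trunk (shapes and activations are assumptions for illustration).
class MultiHeadTeam:
    def __init__(self, obs_dim, hidden_dim, act_dim, n_agents, seed=0):
        rng = np.random.default_rng(seed)
        # Shared trunk: one weight matrix used by every agent.
        self.trunk = rng.standard_normal((obs_dim, hidden_dim)) * 0.1
        # One output head per agent, on top of the shared features.
        self.heads = [rng.standard_normal((hidden_dim, act_dim)) * 0.1
                      for _ in range(n_agents)]

    def act(self, agent_idx, obs):
        """Action for one agent: shared trunk -> agent-specific head."""
        features = np.tanh(obs @ self.trunk)
        return np.tanh(features @ self.heads[agent_idx])

team = MultiHeadTeam(obs_dim=8, hidden_dim=16, act_dim=2, n_agents=3)
obs = np.zeros(8)
action = team.act(0, obs)
print(action.shape)  # (2,)
```

In a full implementation, policy gradients would update both the trunk and the acting agent's head on that agent's dense local reward.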

Earlier this year, Intel AI researchers presented CERL — a novel framework that allowed agents to learn challenging continuous control problems – e.g., training a 3D humanoid model to walk from scratch.

This enables the team to optimise both objectives simultaneously without explicitly mixing them. The researchers construct a population of teams, each evaluated on its performance on the actual task. After each evaluation, strong teams are retained, weak teams are eliminated, and new teams are formed through genetic operations such as mutation and crossover on the elite survivors. Periodically, agents trained using policy gradients are inserted into the evolutionary population to provide building blocks for the search. At any given time, the team with the highest task score is considered the champion team.
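The population loop described above can be sketched as a generic evolutionary step: evaluate, keep the elites, and refill the population via crossover and mutation. This is a toy illustration under stated assumptions, not Intel's implementation: teams are flat parameter vectors, the fitness function is invented, and the periodic migration of policy-gradient agents is omitted.

```python
import random

# Hedged sketch of one evolutionary generation: selection of elites,
# then crossover + Gaussian mutation to refill the population.
def evolve(population, fitness_fn, elite_frac=0.5, mutation_std=0.1, seed=0):
    rng = random.Random(seed)
    scored = sorted(population, key=fitness_fn, reverse=True)
    n_elite = max(2, int(len(population) * elite_frac))
    elites = scored[:n_elite]  # strong teams are retained
    children = []
    while len(elites) + len(children) < len(population):
        # Crossover: average the parameters of two elite parents.
        a, b = rng.sample(elites, 2)
        child = [(x + y) / 2 for x, y in zip(a, b)]
        # Mutation: perturb each parameter with Gaussian noise.
        child = [x + rng.gauss(0, mutation_std) for x in child]
        children.append(child)
    return elites + children

# Toy usage: a "team" is a 4-parameter vector; fitness favours values near 1.
pop = [[random.Random(i).uniform(-1, 1) for _ in range(4)] for i in range(8)]
fit = lambda team: -sum((x - 1.0) ** 2 for x in team)
for _ in range(20):
    pop = evolve(pop, fit)
champion = max(pop, key=fit)  # highest-scoring team is the champion
```

Because elites survive each generation unchanged, the champion's fitness never decreases, which mirrors how the best-performing team is tracked across generations.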

One can read the entire methodology here.

Additionally, Intel recently announced Intel Arc, its upcoming consumer graphics brand, covering hardware, software, and services. Under Arc, Intel introduced its first generation of GPUs based on the Xe HPG microarchitecture, formerly called DG2 and now code-named Alchemist. Intel also revealed that future hardware generations under Arc would carry the code names Battlemage, Celestial, and Druid.

Kumar Gandharv
Kumar Gandharv, PGD in English Journalism (IIMC, Delhi), is setting out on a journey as a tech journalist at AIM. A keen observer of national and IR-related news.
