
Robots Do Not Need A Centralized Authority Anymore

Hierarchical predictive planning is a decentralized, model-based reinforcement learning system that enables agents to align their goals on the fly.


Interaction between robots and their environment is an exciting area of study. In a lab or a controlled environment, multiple robots could be coordinated through a centralized planner. However, in real-life applications, a centralized planner may not be feasible.

Consider, for example, a rendezvous task, in which a group of wheeled robots must agree on a time and place to meet without explicitly communicating with each other. During the task, the individual robots must maintain network connectivity and avoid collisions. A decentralized rendezvous task poses two main challenges: the obstacles in the environment, and the unknown policies and dynamics of the other agents. An individual robot must be able to model its own and the other agents' motion, and adapt to diverging intentions while using only limited information.

Google’s research team has proposed hierarchical predictive planning (HPP), a decentralized model-based reinforcement learning system that enables agents to align their goals on the fly. The team demonstrated that HPP is effective at predicting and aligning trajectories, avoiding miscoordination, and transferring to the real world without additional fine-tuning.

Hierarchical predictive planning

HPP was first introduced at the Conference on Robot Learning 2020 in a paper titled ‘Model-based Reinforcement Learning for Decentralized Multiagent Rendezvous’.

Image credit: Google AI Blog

The learning system consists of three modules: prediction, planning, and control.

The prediction module consists of learned motion predictors that each agent uses to forecast its own future position from its ego state and the other agents' future positions from its LiDAR observations. The output of the prediction module is an essential input to the planning module, which evaluates different goal locations and maintains a belief distribution over where the team should converge. This belief distribution is periodically updated using the evaluations provided by the prediction module.

The control module of each agent is equipped with a pre-trained navigation policy that steers the robot to a given location in an obstacle-ridden environment. The selected goal is given as input to the agent’s control module, and the control policy then determines the best course of action to move the robot towards it.
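
As a rough illustration of the interface such a navigation policy might expose, the toy stand-in below steers towards the goal and veers away from nearby LiDAR returns. The policy used in the paper is learned; the class name, observation layout, and parameters here are assumptions made purely for the sketch.

```python
import numpy as np

# Toy stand-in for a pre-trained navigation policy: head for the goal and
# push away from the closest obstacle reported by the LiDAR. All names and
# the (position, ranges, angles) observation layout are assumptions.
class GoToGoalPolicy:
    def __init__(self, max_speed=0.5, safety_range=0.6):
        self.max_speed = max_speed
        self.safety_range = safety_range

    def act(self, observation, goal):
        position, lidar_ranges, lidar_angles = observation  # assumed layout
        heading = goal - position
        if lidar_ranges.min() < self.safety_range:
            # Veer away from the direction of the nearest LiDAR return.
            i = int(np.argmin(lidar_ranges))
            obstacle_dir = np.array([np.cos(lidar_angles[i]),
                                     np.sin(lidar_angles[i])])
            heading = heading - 2.0 * obstacle_dir
        norm = np.linalg.norm(heading)
        return self.max_speed * heading / norm if norm > 1e-6 else np.zeros(2)
```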

Additionally, the approach proposed by Google’s team closes the loop between the control and the planning module for decentralized multiagent systems by using a sensor-informed prediction module.
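
To make the split between the three modules concrete, here is a minimal Python sketch of one planning step. The class name, method signatures, and the softmax-style belief re-weighting are simplifications assumed for illustration; the paper itself uses a cross-entropy-method update, sketched further below.

```python
import numpy as np

# Illustrative sketch of one prediction -> planning -> control step.
# Names and interfaces are hypothetical, not taken from the HPP code.
class HPPAgent:
    def __init__(self, predictors, navigation_policy, candidate_goals):
        self.predictors = predictors        # one learned motion predictor per agent, incl. self
        self.policy = navigation_policy     # pre-trained obstacle-avoiding navigation policy
        self.goals = candidate_goals        # candidate meeting points, shape (G, 2)
        self.belief = np.full(len(candidate_goals), 1.0 / len(candidate_goals))

    def plan_and_act(self, observation, agent_positions):
        # agent_positions: the ego agent's current estimates of every agent's position.
        # Prediction + planning: score each candidate goal by how tightly the
        # predicted agent positions cluster around it.
        scores = []
        for goal in self.goals:
            predicted = [p(pos, goal) for p, pos in zip(self.predictors, agent_positions)]
            spread = np.mean([np.linalg.norm(q - goal) for q in predicted])
            scores.append(-spread)          # task reward: smaller spread is better
        scores = np.array(scores)

        # Belief update: a softmax re-weighting stands in for the CEM update here.
        weights = np.exp(scores - scores.max())
        self.belief = 0.5 * self.belief + 0.5 * weights / weights.sum()

        # Control: hand the currently most-believed goal to the navigation policy.
        goal = self.goals[int(np.argmax(self.belief))]
        return self.policy.act(observation, goal)
```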

Training the prediction models

HPP trains its motion predictors in simulation; however, these models have no access to other agents’ observations or control policies, so the predictors are trained through self-supervision. First, to collect the training data, all the agents and the obstacles are placed in an environment. Each agent is given a random goal, and as the agents move towards their respective destinations, each agent records its own sensor observations and the poses of all the agents.
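
A minimal sketch of that data-collection phase might look like the following. The environment API (`sample_random_goal`, `agent_poses`, `step`) and the record layout are hypothetical placeholders, not the actual simulation interface used by the team.

```python
import numpy as np

# Hypothetical sketch of the self-supervised data collection described above:
# every agent gets a random goal, rolls out its navigation policy, and logs
# its own sensor observations together with the poses of all agents.
def collect_rendezvous_data(env, agents, episodes=100, horizon=200):
    dataset = []                                              # per-timestep records
    for _ in range(episodes):
        goals = [env.sample_random_goal() for _ in agents]    # one random goal per agent
        obs = env.reset()
        for t in range(horizon):
            dataset.append({
                "time": t,
                "observations": [o.copy() for o in obs],      # ego sensor readings (e.g. LiDAR)
                "poses": env.agent_poses().copy(),            # poses of all agents
                "goals": np.array(goals),
            })
            actions = [agent.policy.act(o, g)
                       for agent, o, g in zip(agents, obs, goals)]
            obs = env.step(actions)
    return dataset
```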

Next, using these recorded observations, each agent learns a separate predictor for every agent, including itself; the goals and prediction labels are derived from the recorded experience. Conditioned on the target agent’s goal, each model predicts where that agent will be at a future time step given its present position, so the predictions respect temporal causality. Predictor training relies only on the information available to the agents at runtime, and takes place in environments independent of the deployment environments.
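
Under the assumptions of the data-collection sketch above, each predictor reduces to a supervised regression problem: map (current position, goal) to the position k steps later. The sketch below uses a plain least-squares model as a stand-in for whatever architecture the paper actually trains.

```python
import numpy as np

# Illustrative supervised setup for one motion predictor: given a target
# agent's current position and goal, predict its position k steps later.
def build_training_pairs(dataset, agent_idx, k=10):
    inputs, targets = [], []
    for i in range(len(dataset) - k):
        now, later = dataset[i], dataset[i + k]
        if later["time"] != now["time"] + k:    # skip pairs that cross episode boundaries
            continue
        current_pos = now["poses"][agent_idx][:2]
        goal = now["goals"][agent_idx]
        inputs.append(np.concatenate([current_pos, goal]))
        targets.append(later["poses"][agent_idx][:2])
    return np.array(inputs), np.array(targets)

def fit_predictor(inputs, targets):
    # Linear model with a bias term, fitted by least squares; a stand-in for
    # the learned predictor, not the model used in the paper.
    X = np.hstack([inputs, np.ones((len(inputs), 1))])
    W, *_ = np.linalg.lstsq(X, targets, rcond=None)
    return lambda pos, goal: np.concatenate([pos, goal, [1.0]]) @ W
```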

At deployment time, a model-based RL planner for each agent uses the learned predictors to guide the agent to the common meeting point. This process simulates a centralized planner for fictitious agents: the prediction models are used to roll out the trajectories the agents would follow if they were all moving towards a fixed goal.

For goal selection, each available goal option is evaluated by rolling out the anticipated system state and scoring it with the task reward, which favours goals that bring the agents closer together. The cross-entropy method is then used to convert these goal evaluations into updates of the belief distribution. Finally, the agent’s planner selects a goal from this belief and passes it to the agent’s control module.
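
Putting the last two paragraphs together, a simplified version of goal evaluation and the CEM-style belief update could look like the sketch below. The Gaussian belief over goals, the reward (spread of the predicted positions around their centroid), and the elite fraction are all assumptions made for illustration rather than details from the paper.

```python
import numpy as np

def evaluate_goal(goal, agent_positions, predictors):
    # Imagine all (fictitious) agents heading to this goal and use the learned
    # predictors to anticipate where each of them would end up.
    predicted = np.array([p(pos, goal) for p, pos in zip(predictors, agent_positions)])
    # Task reward: goals that bring the predicted positions close together score higher.
    centroid = predicted.mean(axis=0)
    return -np.mean(np.linalg.norm(predicted - centroid, axis=1))

def cem_belief_update(goal_mean, goal_cov, agent_positions, predictors,
                      samples=64, elite_frac=0.2):
    # Cross-entropy method: sample candidate goals from the current belief
    # (a 2D Gaussian here), keep the top-scoring elites, and refit the belief.
    candidates = np.random.multivariate_normal(goal_mean, goal_cov, size=samples)
    scores = np.array([evaluate_goal(g, agent_positions, predictors) for g in candidates])
    elites = candidates[np.argsort(scores)[-int(samples * elite_frac):]]
    return elites.mean(axis=0), np.cov(elites, rowvar=False) + 1e-6 * np.eye(2)
```

In this simplified picture, an agent would call `cem_belief_update` periodically and pass the current mean of the belief to its control module as the selected goal.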

“There are two main takeaways from our results. One is that HPP enables agents to predict and align trajectories, avoiding miscoordinations. The second takeaway is that HPP transfers directly into the real world without additional training,” the team said in a blog.
