Google AI Introduces Pathdreamer, A World Model for Indoor Navigation

Pathdreamer is an indoor navigation world model that generates high-resolution 360º visual observations of areas of a building unseen by an agent.


Google AI recently introduced a world model for reinforcement learning agents. A world model encapsulates rich and meaningful information about an agent's surroundings, which enables the agent to predict the outcomes of its actions within the environment.

The model, known as Pathdreamer, is built for indoor navigation. It generates high-resolution 360º visual observations of areas of a building unseen by an agent, using only limited seed observations and a proposed navigation trajectory.



The Pathdreamer model can synthesize an immersive scene from a single viewpoint, predicting what an agent might see if it moved to a new viewpoint or even a completely unseen area, such as around a corner. Beyond potential applications in video editing and bringing photos to life, solving this task promises to codify knowledge about human environments to benefit robotic agents navigating in the real world. 

Image Source: Google

World models such as Pathdreamer can also increase the amount of training data available to agents, since agents can be trained inside the model itself.

The inputs and predictions both consist of RGB, semantic segmentation, and depth images. Internally, Pathdreamer uses a 3D point cloud to represent surfaces in the environment. Points in the cloud are labelled with both their RGB colour value and their semantic segmentation class, such as wall, chair or table.
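As a rough illustration of such a labelled point cloud, the sketch below lifts a single depth panorama into 3D points that each carry an RGB value and a semantic class. It is not Pathdreamer's code: the `LabelledPoint` type, the `backproject_panorama` function, the equirectangular layout and the axis convention are all assumptions made for this example.

```python
import math
from dataclasses import dataclass


@dataclass
class LabelledPoint:
    """One surface point with its colour and semantic class (hypothetical type)."""
    x: float
    y: float
    z: float
    rgb: tuple       # (r, g, b)
    semantic: str    # e.g. "wall", "chair", "table"


def backproject_panorama(depth, rgb, semantic, origin=(0.0, 0.0, 0.0)):
    """Lift an equirectangular depth panorama into labelled 3D points.

    depth, rgb and semantic are equally sized grids (lists of rows).
    Assumed convention: y points up, and columns sweep the full 360 degrees.
    """
    height, width = len(depth), len(depth[0])
    points = []
    for row in range(height):
        # Elevation runs from +pi/2 (top row) down to -pi/2 (bottom row).
        elevation = math.pi / 2 - (row + 0.5) / height * math.pi
        for col in range(width):
            d = depth[row][col]
            if d <= 0:
                continue  # treat non-positive depth as "no measurement"
            azimuth = (col + 0.5) / width * 2 * math.pi - math.pi
            x = origin[0] + d * math.cos(elevation) * math.sin(azimuth)
            y = origin[1] + d * math.sin(elevation)
            z = origin[2] + d * math.cos(elevation) * math.cos(azimuth)
            points.append(
                LabelledPoint(x, y, z, rgb[row][col], semantic[row][col])
            )
    return points


# Tiny 2x4 "panorama": every surface is 2 m away, one pixel has no depth.
depth = [[2.0, 2.0, 0.0, 2.0], [2.0, 2.0, 2.0, 2.0]]
rgb = [[(200, 200, 200)] * 4, [(90, 60, 30)] * 4]
semantic = [["wall"] * 4, ["table"] * 4]
cloud = backproject_panorama(depth, rgb, semantic)
```

In this toy case the cloud holds seven points, because the pixel without a depth reading is skipped.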

To predict visual observations in a new location, the point cloud is first re-projected into 2D at the new location to provide ‘guidance’ images, from which Pathdreamer generates realistic high-resolution RGB, semantic segmentation and depth images. As the model ‘moves’, new observations (either real or predicted) are accumulated in the point cloud.
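The re-projection step can be sketched as follows. This is a minimal illustration rather than Pathdreamer's implementation: the `reproject` function, the tuple layout of each point, the equirectangular output and the nearest-point (z-buffer) rule are assumptions chosen for the example. Pixels that no point lands on stay `None`, and those are the gaps a generative model would have to fill in.

```python
import math


def reproject(points, viewpoint, height, width):
    """Project labelled 3D points into sparse equirectangular guidance images.

    points: iterable of (x, y, z, rgb, semantic) tuples (hypothetical layout).
    Returns (rgb_img, sem_img, depth_img) as grids of size height x width.
    """
    rgb_img = [[None] * width for _ in range(height)]
    sem_img = [[None] * width for _ in range(height)]
    depth_img = [[None] * width for _ in range(height)]
    for x, y, z, rgb, semantic in points:
        dx, dy, dz = x - viewpoint[0], y - viewpoint[1], z - viewpoint[2]
        dist = math.sqrt(dx * dx + dy * dy + dz * dz)
        if dist == 0:
            continue  # a point exactly at the camera has no direction
        azimuth = math.atan2(dx, dz)                         # [-pi, pi]
        elevation = math.asin(max(-1.0, min(1.0, dy / dist)))
        col = min(int((azimuth + math.pi) / (2 * math.pi) * width), width - 1)
        row = min(int((math.pi / 2 - elevation) / math.pi * height), height - 1)
        # z-buffer: when several points hit one pixel, keep the nearest.
        if depth_img[row][col] is None or dist < depth_img[row][col]:
            rgb_img[row][col] = rgb
            sem_img[row][col] = semantic
            depth_img[row][col] = dist
    return rgb_img, sem_img, depth_img


# A chair 1 m ahead hides a wall 2 m ahead when viewed from the origin.
cloud = [
    (0.0, 0.0, 2.0, (255, 0, 0), "wall"),
    (0.0, 0.0, 1.0, (0, 255, 0), "chair"),
]
_, sem_before, _ = reproject(cloud, (0.0, 0.0, 0.0), height=2, width=4)

# After 'moving' past the chair, the wall becomes visible in that direction.
_, sem_after, depth_after = reproject(cloud, (0.0, 0.0, 1.5), height=2, width=4)

# Accumulating a new observation (real or predicted) is just appending points.
cloud.append((1.0, 0.0, 1.5, (0, 0, 255), "table"))
```

From the origin only the chair is recorded at the shared pixel. From the second viewpoint the wall occupies that pixel and the chair falls behind the camera.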

Image Source: Google

Pathdreamer is trained with images and 3D environment reconstructions from Matterport3D and can synthesize realistic images as well as continuous video sequences. For regions of high uncertainty, it can generate multiple diverse and plausible images.

Google aims to apply Pathdreamer to several embodied navigation tasks, such as Object-Nav, continuous VLN (vision-and-language navigation), and street-level navigation. Pathdreamer's code has been released as open source for readers who want to try it themselves.


Victor Dey
Victor is an aspiring Data Scientist & is a Master of Science in Data Science & Big Data Analytics. He is a Researcher, a Data Science Influencer and also an Ex-University Football Player. A keen learner of new developments in Data Science and Artificial Intelligence, he is committed to growing the Data Science community.
