Google AI recently introduced a world model that encapsulates rich and meaningful information about an agent's surroundings, enabling the agent to make specific predictions about actionable outcomes within the environment.
The world model, known as Pathdreamer, is an indoor navigation world model that generates high-resolution 360º visual observations of areas of a building unseen by an agent, using only limited seed observations and a proposed navigation trajectory.
The Pathdreamer model can synthesize an immersive scene from a single viewpoint, predicting what an agent might see if it moved to a new viewpoint or even a completely unseen area, such as around a corner. Beyond potential applications in video editing and bringing photos to life, solving this task promises to codify knowledge about human environments to benefit robotic agents navigating in the real world.
Image Source: Google
World models such as Pathdreamer can also augment the training data available to navigation agents, since agents can be trained inside the model itself rather than only in the real environment.
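The idea of training inside a world model can be sketched as follows. This is a minimal, hypothetical illustration, not Pathdreamer's actual API: `world_model` and `agent` are stand-in callables, and the observations here are toy integers rather than images.

```python
# Hypothetical sketch: collecting extra training data by rolling an
# agent out inside a learned world model instead of the real world.
# `world_model` and `agent` are illustrative stand-ins, not real APIs.

def dream_rollout(world_model, agent, seed_obs, steps=5):
    """Collect a synthetic trajectory entirely inside the world model."""
    obs, trajectory = seed_obs, []
    for _ in range(steps):
        action = agent(obs)             # the policy picks the next move
        obs = world_model(obs, action)  # the model predicts the next view
        trajectory.append((action, obs))
    return trajectory

# Toy stand-ins: observations are integers, each action adds 1.
toy_agent = lambda obs: 1
toy_model = lambda obs, action: obs + action
print(len(dream_rollout(toy_model, toy_agent, seed_obs=0)))  # prints 5
```

In practice the predicted observations would be the RGB, segmentation, and depth images Pathdreamer generates, and the synthetic trajectories would supplement real ones during training.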
The inputs and predictions both consist of RGB, semantic segmentation, and depth images. Internally, Pathdreamer uses a 3D point cloud to represent surfaces in the environment. Points in the cloud are labelled with both their RGB colour value and their semantic segmentation class, such as wall, chair or table.
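The labelled point cloud described above can be represented very simply in code. The structured dtype and field names below are illustrative assumptions, not Pathdreamer's internal data structures:

```python
import numpy as np

# A minimal sketch of a labelled point cloud of the kind Pathdreamer
# maintains internally: each surface point carries a 3D position, an
# RGB colour, and a semantic class (e.g. wall, chair, table).
# The dtype and helper below are illustrative, not Pathdreamer's API.
point_dtype = np.dtype([
    ("xyz", np.float32, 3),  # 3D position in world coordinates
    ("rgb", np.uint8, 3),    # colour observed at that point
    ("sem", np.uint8),       # semantic class id, e.g. 0=wall, 1=chair
])

def make_point(xyz, rgb, sem):
    """Pack one labelled point into the structured array format."""
    p = np.zeros((), dtype=point_dtype)
    p["xyz"], p["rgb"], p["sem"] = xyz, rgb, sem
    return p

cloud = np.stack([
    make_point([1.0, 0.2, 2.5], [200, 180, 160], 0),  # a wall point
    make_point([0.5, 0.0, 1.0], [90, 60, 40], 1),     # a chair point
])
print(cloud["sem"])  # semantic labels of every point in the cloud
```

Storing colour and semantics per point is what lets the model later render both an RGB guidance image and a segmentation guidance image from the same cloud.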
To predict visual observations in a new location, the point cloud is first re-projected into 2D at the new location to provide ‘guidance’ images, from which Pathdreamer generates realistic high-resolution RGB, semantic segmentation and depth. As the model ‘moves’, new observations (either real or predicted) are accumulated in the point cloud.
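The re-projection step can be illustrated with a simple pinhole-camera z-buffer render. Pathdreamer itself operates on 360º panoramas, so the function below is only a simplified sketch of the idea, and all names are assumptions:

```python
import numpy as np

def project_to_guidance(cloud_xyz, cloud_rgb, pose, K, hw=(64, 64)):
    """Re-project labelled 3D points into a 2D 'guidance' image at a new
    camera pose. Simplified pinhole model for illustration only:
    Pathdreamer renders 360-degree panoramas, not perspective crops."""
    h, w = hw
    R, t = pose                    # world-to-camera rotation, translation
    cam = cloud_xyz @ R.T + t      # transform points into the camera frame
    guidance = np.zeros((h, w, 3), dtype=np.uint8)
    depth = np.full((h, w), np.inf)
    for p, rgb in zip(cam, cloud_rgb):
        if p[2] <= 0:              # skip points behind the camera
            continue
        uvw = K @ p                # pinhole projection
        u, v = int(uvw[0] / uvw[2]), int(uvw[1] / uvw[2])
        if 0 <= u < w and 0 <= v < h and p[2] < depth[v, u]:
            depth[v, u] = p[2]     # z-buffer: keep the closest surface
            guidance[v, u] = rgb
    return guidance, depth

# A single red point 2 m ahead of the camera lands at the image centre.
K = np.array([[32.0, 0.0, 32.0], [0.0, 32.0, 32.0], [0.0, 0.0, 1.0]])
pose = (np.eye(3), np.zeros(3))
g, d = project_to_guidance(np.array([[0.0, 0.0, 2.0]]),
                           np.array([[255, 0, 0]]), pose, K)
print(g[32, 32], d[32, 32])  # prints [255   0   0] 2.0
```

The resulting sparse guidance image is what the generative half of the model fills in to produce dense, realistic RGB, segmentation, and depth outputs.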
Image Source: Google
Pathdreamer is trained on images and 3D environment reconstructions from Matterport3D, and can synthesize realistic individual images as well as continuous video sequences. For regions of high uncertainty, it can generate multiple diverse and plausible images.
Google aims to apply Pathdreamer to several embodied navigation tasks such as Object-Nav, continuous VLN, and street-level navigation. For further details, you can try out Pathdreamer yourself using its open-source code.