Researchers from UC Berkeley, Waymo and Google Research have proposed Block-NeRF, a grid-based variant of Neural Radiance Fields (NeRF) for representing large environments. In the paper, "Block-NeRF: Scalable Large Scene Neural View Synthesis", the researchers demonstrated that when scaling NeRF to render city-scale scenes spanning multiple blocks, it is vital to decompose the scene into individually trained NeRFs.
Block-NeRF builds on NeRF and the recently introduced mip-NeRF extension, a multiscale representation that reduces the aliasing artefacts that hurt NeRF performance when the input images observe a scene from different distances. The team also incorporates techniques from NeRF in the Wild (NeRF-W), which was designed to handle inconsistent scene appearance when applying NeRF to landmarks from the Photo Tourism dataset. The proposed Block-NeRF can thus combine many NeRFs to reconstruct a coherent large environment from millions of images.
Decomposing the scene into individually trained NeRFs decouples rendering time from scene size, enables rendering to scale to arbitrarily large environments, and allows the environment to be updated one block at a time. The team adopted several architectural changes to make NeRF robust to data captured over months under different environmental conditions. Specifically, they added appearance embeddings, learned pose refinement, and controllable exposure to each individual NeRF, and introduced a procedure for aligning appearance between adjacent NeRFs so that they can be seamlessly combined.
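To make the idea of combining per-block renders concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the `Block` class, the circular coverage radius, the fixed placeholder colour standing in for a real NeRF render, and the `power` value are all hypothetical. It picks the blocks whose coverage contains the camera and blends their outputs with inverse-distance weights, which is one simple way to get smooth transitions between neighbouring blocks.

```python
import math
from dataclasses import dataclass


@dataclass
class Block:
    """Hypothetical stand-in for one independently trained NeRF."""
    name: str
    center: tuple[float, float]          # block origin on the ground plane
    radius: float                        # assumed circular coverage area
    color: tuple[float, float, float]    # placeholder for a rendered pixel


def visible_blocks(blocks, camera):
    """Keep only the blocks whose coverage area contains the camera."""
    return [b for b in blocks if math.dist(b.center, camera) <= b.radius]


def blend_weights(blocks, camera, power=4.0, eps=1e-6):
    """Inverse-distance weights, normalised to sum to 1.

    `power` is an assumed value; larger values favour the nearest block.
    """
    raw = [1.0 / max(math.dist(b.center, camera), eps) ** power for b in blocks]
    total = sum(raw)
    return [w / total for w in raw]


def render_pixel(blocks, camera, power=4.0):
    """Blend the (placeholder) renders of all blocks that cover the camera."""
    nearby = visible_blocks(blocks, camera)
    if not nearby:
        raise ValueError("camera is outside every block")
    weights = blend_weights(nearby, camera, power)
    return tuple(
        sum(w * b.color[i] for w, b in zip(weights, nearby)) for i in range(3)
    )


blocks = [
    Block("a", (0.0, 0.0), 100.0, (1.0, 0.0, 0.0)),
    Block("b", (150.0, 0.0), 100.0, (0.0, 0.0, 1.0)),
    Block("c", (1000.0, 0.0), 100.0, (0.0, 1.0, 0.0)),
]

# Halfway between blocks "a" and "b": both contribute equally.
print(render_pixel(blocks, (75.0, 0.0)))
```

Because only nearby blocks are evaluated, the cost of rendering a view depends on how many blocks overlap the camera, not on how many blocks the whole city contains. Retraining block "c" in this sketch would not touch "a" or "b", which mirrors the per-block update property described above.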
Researchers used San Francisco’s Alamo Square neighbourhood as the target area and the city’s Mission Bay District as the baseline. The training dataset was derived from 13.4 hours of driving time sourced from 1,330 different data collection runs for a total of 2,818,745 training images.