Guide to Intel’s Stable View Synthesis – A State-of-Art 3D Photorealistic Framework


Rajkumar Lakshmanamoorthy
Some of the synthesized 3D photorealistic images with Stable View Synthesis

Stable View Synthesis achieves state-of-the-art performance in photorealistic view synthesis, outperforming the earlier approaches it was compared against. It was developed by Gernot Riegler and Vladlen Koltun of Intel Labs and was recently published as a research paper. Photorealistic view synthesis is the task of rendering a subject from a new viewpoint by learning from real images of that subject captured from different positions and orientations with identical camera settings.

Photorealistic view synthesis can help in space exploration and in other fields where real photography is hardly possible. Stable View Synthesis builds a representation of a scene and lets one view that scene from almost any viewpoint; the rendered views can be played back as a sequence of images. The input can be a short video captured by moving the camera around the subject while keeping the subject in focus.

Stable View Synthesis, or SVS for short, first runs structure-from-motion (SfM) on the input images to estimate each image's camera pose (position and orientation) and camera settings. These poses are then used in multi-view stereo to generate a dense 3D point cloud. A 3D geometric scaffold of the scene is constructed by meshing these points. In parallel, an encoder convolutional neural network encodes each input image into a feature tensor.
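To make the role of an estimated camera pose concrete, here is a minimal sketch of a pinhole projection: given a pose and camera settings, a 3D point on the scaffold is mapped to a pixel in that image. The `Camera` class, the `project` function, and the world-to-camera convention (`Xc = R X + t`) are assumptions chosen for illustration; they are not taken from the SVS code base.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Camera:
    """Hypothetical container for what SfM estimates per image."""
    R: List[List[float]]  # 3x3 world-to-camera rotation
    t: Vec3               # world-to-camera translation
    fx: float             # focal lengths in pixels
    fy: float
    cx: float             # principal point in pixels
    cy: float


def project(cam: Camera, X: Vec3) -> Optional[Tuple[float, float, float]]:
    """Project world point X to (u, v, depth); None if it lies behind the camera."""
    # Camera-space coordinates: Xc = R @ X + t
    xc = [sum(cam.R[i][j] * X[j] for j in range(3)) + cam.t[i] for i in range(3)]
    if xc[2] <= 0:
        return None
    # Perspective division followed by the intrinsics
    u = cam.fx * xc[0] / xc[2] + cam.cx
    v = cam.fy * xc[1] / xc[2] + cam.cy
    return u, v, xc[2]


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
cam = Camera(R=identity, t=(0.0, 0.0, 0.0), fx=500.0, fy=500.0, cx=320.0, cy=240.0)
print(project(cam, (0.5, -0.25, 2.0)))  # (445.0, 177.5, 2.0)
```

The returned depth is what a multi-view stereo stage would compare across images; the pixel coordinates say where in the image the scaffold point appears.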

Photorealistic geometric scaffolding in Stable View Synthesis: an example
Encoding of input images into feature tensors in Stable View Synthesis
Decoding synthesized feature tensors into an output 3D image in Stable View Synthesis

To synthesize a new view, each point on the geometric scaffold that is visible from the target view is located in the original images that also see it. Each of those images contributes a feature vector, read from its feature tensor along the ray from its camera to the surface point. SVS then performs on-surface aggregation: a differentiable set network processes these per-image feature vectors to produce the feature vector of the target ray.
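The idea of on-surface aggregation can be sketched with a simple permutation-invariant pooling. This is not the SVS network: the learned set network is replaced here by a fixed, hand-written weighting function (`direction_weight`, a hypothetical name) followed by a mean, purely to show that the result depends on the set of source rays and not on their order.

```python
import math
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def normalize(v: Sequence[float]) -> List[float]:
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]


def direction_weight(src_dir: Sequence[float], tgt_dir: Sequence[float]) -> float:
    """Stand-in for a learned network: favors source rays aligned with the target ray."""
    cos = dot(normalize(src_dir), normalize(tgt_dir))
    return math.exp(4.0 * (cos - 1.0))  # in (0, 1]; the factor 4.0 is arbitrary


def aggregate(features: Sequence[Sequence[float]],
              src_dirs: Sequence[Sequence[float]],
              tgt_dir: Sequence[float]) -> List[float]:
    """Pool the per-source feature vectors of one surface point into one vector."""
    if not features or len(features) != len(src_dirs):
        raise ValueError("need one direction per feature vector, and at least one")
    pooled = [0.0] * len(features[0])
    for f, d in zip(features, src_dirs):
        w = direction_weight(d, tgt_dir)
        for i, value in enumerate(f):
            pooled[i] += w * value
    # Mean over the set, so the order of the source rays does not matter
    return [p / len(features) for p in pooled]


feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
dirs = [[0.0, 0.0, 1.0], [0.3, 0.0, 1.0], [0.0, -0.5, 1.0]]
target = [0.1, 0.0, 1.0]
print(aggregate(feats, dirs, target))
```

In a trained system the weighting and the per-element transform would be learned end to end, which is why the aggregation has to be differentiable.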

Surface aggregation of different input rays in Stable View Synthesis

To render the output image, a depth map of the geometric scaffold is first computed for the target view from its camera pose and settings. The depth map determines how far each target pixel must be unprojected to reach its point on the scaffold. Output-view-dependent feature vectors are then generated for these points and assembled into a feature tensor. Finally, the already-trained convolutional neural network transforms this feature tensor into the output image of the new view.
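Unprojection of a target pixel by its depth can be sketched as follows. The function name `unproject`, the pinhole model, and the world-to-camera convention (`Xc = R X + t`, hence `X = R^T (Xc - t)`) are assumptions made for this illustration rather than details taken from the SVS implementation.

```python
from typing import List, Sequence, Tuple


def unproject(u: float, v: float, depth: float,
              fx: float, fy: float, cx: float, cy: float,
              R: List[List[float]], t: Sequence[float]) -> Tuple[float, ...]:
    """Lift pixel (u, v) with the given depth to a 3D point in world coordinates."""
    # Camera-space point at the given depth along the pixel's ray
    xc = ((u - cx) / fx * depth, (v - cy) / fy * depth, depth)
    # Invert the world-to-camera transform: X = R^T @ (Xc - t)
    d = [xc[i] - t[i] for i in range(3)]
    return tuple(sum(R[j][i] * d[j] for j in range(3)) for i in range(3))


# Camera rotated 90 degrees about the z axis and shifted along z
R = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
t = (0.0, 0.0, 1.0)
print(unproject(445.0, 177.5, 2.0, 500.0, 500.0, 320.0, 240.0, R, t))
# (-0.25, -0.5, 1.0)
```

Applying this to every pixel of the depth map yields the scaffold points whose feature vectors are assembled into the target feature tensor.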

A few sampled images in a sequence capturing a playground scene from the Tanks and Temples dataset are shown below.

Sampled frames from the playground scene of the Tanks and Temples dataset

Coding Stable View Synthesis in Python

To install Stable View Synthesis and its dependencies on your local machine, run the following commands. Note that Stable View Synthesis can be trained and run only on a CUDA-capable GPU, so users working in notebook environments should enable a CUDA GPU runtime before installing and training the system.

 # install necessary libraries
 sudo apt-add-repository universe
 sudo apt-get install libeigen3-dev
 pip install torchvision
 pip install torch-scatter
 pip install torch-sparse
 pip install torch-geometric
 pip install open3d
 # the OpenCV bindings are published on PyPI as opencv-python
 pip install opencv-python
 pip install ninja
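Because a CUDA GPU is required, a quick environment check can save a failed run later. The sketch below is a heuristic: the helper name `cuda_environment_report` is made up for this example, the presence of `nvidia-smi` on the PATH only suggests that an NVIDIA driver is installed, and `torch.cuda.is_available()` is consulted only if PyTorch can be imported.

```python
import importlib.util
import shutil


def cuda_environment_report() -> dict:
    """Collect a few hints about whether a CUDA GPU is usable from Python."""
    report = {
        # Driver tools on PATH: a hint, not proof, that a GPU is present
        "nvidia_smi_on_path": shutil.which("nvidia-smi") is not None,
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "torch_sees_cuda": None,  # stays None when PyTorch is not installed
    }
    if report["torch_installed"]:
        import torch
        report["torch_sees_cuda"] = torch.cuda.is_available()
    return report


print(cuda_environment_report())
```

If `torch_sees_cuda` is `False` in a notebook environment, the runtime type most likely still needs to be switched to a GPU runtime.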

To obtain the necessary source files, clone the GitHub repository and update its submodules.

 git clone
 cd StableViewSynthesis
 git submodule update --init --recursive --remote 

Build the preprocessing and PyTorch extensions.

 cd StableViewSynthesis/ext/preprocess
 cmake -DCMAKE_BUILD_TYPE=Release .
 # cmake only configures the build; make compiles it
 make
 cd ../mytorch
 python setup.py build_ext --inplace

Change to the experiments directory and run the evaluation with the following commands. This loads the pretrained model and evaluates it on four sampled sequences from the Tanks and Temples dataset.

 cd StableViewSynthesis/experiments
 python --net resunet3.16_penone.dirs.avg.seq+9+1+unet+5+2+16.single+mlpdir+mean+3+64+16 --cmd eval --iter last --eval-dsets tat-subseq 

The whole model can also be retrained with the following command:

 python --net resunet3.16_penone.dirs.avg.seq+9+1+unet+5+2+16.single+mlpdir+mean+3+64+16 --cmd retrain 

Stable View Synthesis outperforms, both qualitatively and quantitatively, well-established approaches such as Free View Synthesis (FVS), Local Light Field Fusion (LLFF), Neural Radiance Fields (NeRF), an improved NeRF (NeRF++), Extreme View Synthesis (EVS), and Neural Point-Based Graphics (NPBG).

Note: The illustrations in this article are taken from the Tanks and Temples dataset, the FVS dataset, and the original research paper.

Some useful references:

Official GitHub code repository

Original research paper

Performance analysis of SVS

View Synthesis – Wiki

Free View Synthesis – Research paper

Tanks and Temples dataset

FVS dataset

Copyright Analytics India Magazine Pvt Ltd
