Guide To Infinite Nature For Perpetual View Generation

Infinite Nature, aka perpetual view generation, allows you to take an image and fly into it as a bird would do

Published on June 19, 2021

by Mudit Rustagi

GANs have made it much easier to generate images, text and other content, and new applications appear every month. This article covers one striking application of deep learning: synthesizing new scenes from a single picture. More precisely, the model produces a sequence of progressive frames that continue from the original image.

Infinite Nature, aka perpetual view generation, lets you take an image and fly into it as a bird would, exploring the landscape as you go. The model generates a long sequence of novel views (new images that progress from the original one), corresponding to an arbitrarily long camera trajectory such as the flight path of a bird. All this from a single image!

This is a challenging problem, because it goes far beyond the capabilities of current view synthesis models. Those models work only for a limited range of viewpoints around the base image (the image from which synthesis starts). Beyond that range, their output degenerates quickly or the generated frames barely change.

The technique discussed in this article addresses these problems with a hybrid approach that combines image synthesis and geometry in an iterative framework: render, refine and repeat. This allows long-range generation that covers large distances and stays usable even after hundreds of frames. The model is trained on a set of monocular video sequences without any manual annotation, which saves a lot of time.

The key point here is that the authors use the geometry of the image. First, a disparity map (a map of inverse depth that shows how depth varies across the image) is created with a state-of-the-art network called MiDaS, which tells the model about the depths inside the image.
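
To make the idea of disparity concrete, here is a minimal sketch that treats disparity as inverse depth, so nearby pixels get large values and distant pixels get values close to zero. The depth values are made up, and `depth_to_disparity` is a hypothetical helper for illustration, not part of the Infinite Nature code.

```python
import numpy as np

def depth_to_disparity(depth, eps=1e-6):
    """Convert a depth map to disparity (inverse depth).

    eps guards against division by zero; it is an assumption of this sketch.
    """
    depth = np.asarray(depth, dtype=np.float32)
    return 1.0 / np.maximum(depth, eps)

# Made-up depths in arbitrary units: top-left is close, bottom-right is far.
depth = np.array([[1.0, 2.0],
                  [4.0, 100.0]], dtype=np.float32)
disparity = depth_to_disparity(depth)
print(disparity)
```

The far pixel (depth 100) ends up with the smallest disparity, which is why distant sky and horizon regions appear dark in a disparity map.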

The goal of the renderer is to generate a new view from the previous one. It uses a 3D mesh to render the image from a novel viewpoint, and it is differentiable so that backpropagation can be used for training. Another state-of-the-art network, SPADE, handles conditional image synthesis and refines the rendered frame. This process repeats over and over, producing new frames that go deeper into the scene.
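
The render, refine and repeat cycle can be sketched as a simple loop. This is only a toy illustration under stated assumptions: `render` and `refine` are hypothetical stand-ins for the mesh renderer and the SPADE-based network, frames are plain dictionaries, and poses are integers rather than camera matrices.

```python
def render(frame, next_pose):
    """Stand-in for the mesh renderer: moves the frame to the new camera pose."""
    return {"pixels": frame["pixels"], "pose": next_pose, "refined": False}

def refine(rendered):
    """Stand-in for the refinement network: marks the frame as cleaned up."""
    return {**rendered, "refined": True}

def perpetual_view_generation(start_frame, poses):
    """Feed each refined frame back in as the input for the next step."""
    frames = [start_frame]
    frame = start_frame
    for next_pose in poses:
        frame = refine(render(frame, next_pose))
        frames.append(frame)
    return frames

start = {"pixels": "input image", "pose": 0, "refined": True}
frames = perpetual_view_generation(start, poses=[1, 2, 3])
print([f["pose"] for f in frames])  # [0, 1, 2, 3]
```

The important design point is the feedback: every output becomes the next input, which is what lets the trajectory continue for an arbitrary number of steps.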

Code Implementation

Below are the instructions for running the model locally.

Install the libraries listed in the requirements file.

 #installing dependencies with help of requirements file
 pip3 install -r requirements.txt

As mentioned earlier, the model uses a 3D mesh renderer built on TensorFlow. The authors build this library with GCC instead of following the Bazel instructions given on GitHub.

The TensorFlow mesh renderer was originally written for TensorFlow versions below 2.x, but the authors have prepared a small patch that updates its functions to work on version 2.x.

 #downloading the TensorFlow 3D mesh renderer from GitHub
 source download_tf_mesh_renderer.sh

Now, download the required data and pre-trained checkpoints.

 #downloading the tar archive containing the model checkpoints
 wget https://storage.googleapis.com/gresearch/infinite_nature_public/ckpt.tar.gz
 #extracting the archive
 tar xvf ckpt.tar.gz

Next, download the sample autocruise inputs provided by the authors.

 #sample inputs from the authors, as mentioned in the paper and the official GitHub repository
 wget https://storage.googleapis.com/gresearch/infinite_nature_public/autocruise_input1.pkl
 wget https://storage.googleapis.com/gresearch/infinite_nature_public/autocruise_input2.pkl
 wget https://storage.googleapis.com/gresearch/infinite_nature_public/autocruise_input3.pkl

Inside the pickle files is a dictionary with entries containing nature scenes and respective disparity maps predicted by MiDaS.
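
As a rough sketch of how such an entry can be taken apart, the snippet below splits an RGBD array into its colour and disparity parts. It assumes the array has shape [H, W, 4] with RGB in the first three channels and disparity in the last one, which matches how the Colab code later in this article slices `[..., :3]`. `split_rgbd` is a hypothetical helper, and the array here is random data standing in for a real pickle entry.

```python
import numpy as np

def split_rgbd(rgbd):
    """Split an [H, W, 4] RGBD array into RGB [H, W, 3] and disparity [H, W, 1]."""
    rgbd = np.asarray(rgbd)
    rgb = rgbd[..., :3]
    disparity = rgbd[..., 3:]
    return rgb, disparity

# Random stand-in for a real entry (the size 160x256 is an assumption).
fake_rgbd = np.random.rand(160, 256, 4).astype(np.float32)
rgb, disparity = split_rgbd(fake_rgbd)
print(rgb.shape, disparity.shape)
```

With a downloaded file, the same helper could be applied to `pickle.load(open("autocruise_input1.pkl", "rb"))['input_rgbd']`, the key used in the Colab section of this article.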

Run Infinite Nature for 100 steps using autocruise, saving the frames to the output folder.

 #running the model for 100 steps; the frames will be stored in the given output folder
 python -m autocruise --output_folder=autocruise --num_steps=100

That covers running the pre-trained model on a local machine.

Now let's look at running Infinite Nature on Google Colab.

Importing Libraries

 #imageio for image manipulation, IPython for showing images in the notebook
 import imageio
 import IPython
 #numpy for arrays, pickle for reading the sample input files
 import numpy as np
 import pickle
 #importing the authors' modules: config, infinite_nature_lib and fly_camera
 import config
 import infinite_nature_lib
 import fly_camera
 import tensorflow as tf
 import tensorflow_hub as hub

Setting Up Paths

 #os and sys for environment variables and module search paths
 import os
 import sys
 import tensorflow as tf
 #making sure dynamic linking is able to find the tensorflow libraries
 os.system('ldconfig ' + tf.sysconfig.get_lib())
 #so that python can find the libraries defined by the authors
 sys.path.append('infinite_nature')
 sys.path.append('infinite_nature/tf_mesh_renderer/mesh_renderer')
 #the mesh renderer library needs to know where to load its .so file from
 os.environ['TEST_SRCDIR'] = 'infinite_nature'

NOTE: The following snippet is taken from the official GitHub repository of Infinite Nature, which contains the links and the exact procedure. Run it before the imports and path setup above, since they rely on the code it downloads.

 %%shell
 echo Fetching code from github...
 #subversion is used to export just the infinite_nature folder from the repository
 apt install subversion
 svn export --force https://github.com/google-research/google-research/trunk/infinite_nature
 #fetching the sample inputs and the checkpoint archive
 echo
 echo Fetching trained model weights...
 rm -f autocruise_input*.pkl
 rm -f ckpt.tar.gz
 rm -rf ckpt
 wget https://storage.googleapis.com/gresearch/infinite_nature_public/autocruise_input1.pkl
 wget https://storage.googleapis.com/gresearch/infinite_nature_public/autocruise_input2.pkl
 wget https://storage.googleapis.com/gresearch/infinite_nature_public/autocruise_input3.pkl
 wget https://storage.googleapis.com/gresearch/infinite_nature_public/ckpt.tar.gz
 tar -xf ckpt.tar.gz
 #installing specific versions of libraries
 echo
 echo Installing required dependencies...
 pip install -r infinite_nature/requirements.txt
 #fetching and compiling the 3D mesh renderer
 echo
 echo Fetching tf_mesh_renderer and compiling kernels...
 cd infinite_nature
 rm -rf tf_mesh_renderer
 source download_tf_mesh_renderer.sh
 echo Done.

Build Model

 config.set_training(False)
 #model path, which points to a ckpt checkpoint file
 mod_path = "ckpt/model.ckpt-6935893"
 #load the render-and-refine function and the style encoder
 render_refiner, style_encod = infinite_nature_lib.load_model(mod_path)
 #initial RGBD frames are taken from the sample files
 initial_rgbds = [
     pickle.load(open("autocruise_input1.pkl", "rb"))['input_rgbd'],
     pickle.load(open("autocruise_input2.pkl", "rb"))['input_rgbd'],
     pickle.load(open("autocruise_input3.pkl", "rb"))['input_rgbd']]
 '''
 The state that we need to remember while flying
 Code for an autopilot demo. 
 We expose two functions that will be invoked
 from an HTML/JS frontend: reset and step.
 '''
 state = {
   'intrinsics': None,
   'pose': None,
   'rgbd': None,
   'start_rgbd': None,
   'style_noise': None,
   'next_pose_function': None,
   #setting offset none for controlling with mouse
   'direction_offset': None, 
 }
 def current_image_png():
   img_data = tf.image.encode_png(
       tf.image.convert_image_dtype(state['rgbd'][..., :3], dtype=tf.uint8))
   return IPython.display.Image(data=img_data.numpy())

Reset Function

 #function to reset the state to a starting rgbd frame, d is the depth (disparity) channel
 def reset(rgbd=None):
   #fall back to the stored starting frame when no new input is given
   if rgbd is None:
     rgbd = state['start_rgbd']
   ht, w, _ = rgbd.shape
   aspect_ratio = w / float(ht)
   #resizing the frame to 160x256
   rgbd = tf.image.resize(rgbd, [160, 256])
   state['rgbd'] = rgbd
   #remember the starting frame so that a later reset() can return to it
   state['start_rgbd'] = rgbd
   state['pose'] = np.array(
       [[1.0, 0.0, 0.0, 0.0],
        [0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0]],
       dtype=np.float32)
   #0.8 focal_x corresponds to a FOV (field of view) of ~64 degrees.
   state['intrinsics'] = np.array(
       [0.8, 0.8 * aspect_ratio, .5, .5],
       dtype=np.float32)
   #no movement from self, defined by mouse or autopilot
   state['direction_offset'] = (0.0, 0.0)
   state['style_noise'] = style_encod(rgbd)
   #function that computes the next pose from the current image
   state['next_pose_function'] = fly_camera.fly_dynamic(
     state['intrinsics'],
     state['pose'],
     #turn the camera where the mouse points
     turn_function=(lambda _: state['direction_offset']))
   return current_image_png()
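
The comment about a focal_x of 0.8 giving roughly 64 degrees can be checked with a little trigonometry. This sketch assumes the focal length is expressed as a fraction of the image width, so half the image spans 0.5 units; `horizontal_fov_degrees` is a hypothetical helper, not a function from the authors' library.

```python
import math

def horizontal_fov_degrees(focal_x):
    """Horizontal field of view for a focal length normalized by image width."""
    return math.degrees(2.0 * math.atan(0.5 / focal_x))

fov = horizontal_fov_degrees(0.8)
print(round(fov, 1))  # 64.0
```

A smaller focal length widens the view: under the same assumption, a focal_x of 0.5 would give a 90 degree field of view.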

Step Function

 #function to take one step and generate a new frame in the given direction
 def step(offx, offy):
   state['direction_offset'] = (offx, offy)
   #ask the stored pose function for the next camera pose
   next_pose = state['next_pose_function'](state['rgbd'])
   #render the current frame at the new pose and refine it
   next_rgbd = render_refiner(
        state['rgbd'], state['style_noise'],
        state['pose'], state['intrinsics'],
        next_pose, state['intrinsics'])
   state['pose'] = next_pose
   state['rgbd'] = next_rgbd
   return current_image_png()

MiDaS Disparity

 #to run on user-supplied images, use MiDaS v2 to obtain the initial disparity
 midas_mod = hub.load('https://tfhub.dev/intel/midas/v2/2', tags=['serve'])
 def midas_dis(rgb):
   """Computes MiDaS v2 disparity on an RGB input image.
   Arguments:
     rgb: [H, W, 3] Range [0.0, 1.0].
   Function outputs:
     [H, W, 1] MiDaS disparity resized to the input size and in the range
     [0.0, 1.0]
   """
   size = rgb.shape[:2]
   resized_img = tf.image.resize(rgb, [384, 384], tf.image.ResizeMethod.BICUBIC)
   #the MiDaS network wants [1, C, H, W]
   midas_in = tf.transpose(resized_img, [2, 0, 1])[tf.newaxis]
   pred = midas_mod.signatures['serving_default'](midas_in)['default'][0]
   #rescale the prediction to the range [0.0, 1.0]
   pred_min = tf.reduce_min(pred)
   pred_max = tf.reduce_max(pred)
   pred = (pred - pred_min) / (pred_max - pred_min)
   return tf.image.resize(
       pred[..., tf.newaxis], size, method=tf.image.ResizeMethod.AREA)
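
The rescaling step inside the function above is ordinary min-max normalization. Here is a minimal NumPy sketch of the same arithmetic on made-up values. `normalize_to_unit_range` is a hypothetical helper, and the `eps` guard for constant inputs is an addition of this sketch that the original code does not have.

```python
import numpy as np

def normalize_to_unit_range(values, eps=1e-8):
    """Rescale values linearly so the minimum maps to 0.0 and the maximum to 1.0."""
    values = np.asarray(values, dtype=np.float32)
    lo, hi = values.min(), values.max()
    # eps avoids division by zero when every value is identical.
    return (values - lo) / max(hi - lo, eps)

# Made-up raw network outputs.
raw = np.array([[2.0, 4.0],
                [6.0, 10.0]], dtype=np.float32)
norm = normalize_to_unit_range(raw)
print(norm)
```

Because the scaling is relative to each image's own minimum and maximum, the resulting disparity is only meaningful within that image, not across different images.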

Load Function

 #load one of the sample rgbd frames as the starting frame
 def load_initial(i):
   return reset(rgbd=initial_rgbds[i])
 def load_image(data):
   '''
   The data arrives from JavaScript as a string, so it
   needs to be converted back to bytes
   using Latin-1 encoding (maps 0-255 to 0-255).
   '''
   data = data.encode('Latin-1')
   #decoding the image bytes into a 3-channel float image
   rgb = tf.image.decode_image(data, channels=3, dtype=tf.float32)
   #resizing to the 160x256 resolution used by reset()
   resized = tf.image.resize(rgb, [160, 256], tf.image.ResizeMethod.AREA)
   #concatenation with the MiDaS disparity map from the previous function
   rgbd = tf.concat([resized, midas_dis(resized)], axis=-1)
   return reset(rgbd=rgbd)
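
The reason Latin-1 is used for the string-to-bytes conversion is that it maps each of the code points 0 to 255 to exactly one byte. The following standalone sketch shows the round trip on every possible byte value; the variable names are illustrative only.

```python
# Every possible byte value, standing in for the raw bytes of an image file.
raw_bytes = bytes(range(256))

# Decoding as Latin-1 turns each byte into the character with the same code point.
as_text = raw_bytes.decode('latin-1')

# Encoding back with Latin-1 recovers the original bytes exactly.
recovered = as_text.encode('latin-1')
print(len(as_text), recovered == raw_bytes)  # 256 True

# UTF-8 would not round-trip: characters 128-255 take two bytes each.
print(len(as_text.encode('utf-8')))  # 384
```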

Output

The frontend for this application is an HTML/JS script. The code below assumes it has already been loaded into a string named html.

 #output lets the frontend's JavaScript call Python functions in Colab
 from google.colab import output
 #displaying the frontend; html must hold the frontend script as a string
 display(IPython.display.HTML(html))
 #load one of the sample starting images
 output.register_callback('initial', load_initial)
 #load a user-supplied image
 output.register_callback('image', load_image)
 #reset to the starting frame
 output.register_callback('reset', reset)
 #take one step and generate the next frame
 output.register_callback('step', step)

EndNote

With this, we have successfully built a frontend application for the Infinite Nature model. A dataset I recommend trying is ACID (Aerial Coastline Imagery Dataset).

You can also experiment with changing the camera position aggressively in the application. It is also worth reading about earlier approaches to this problem.

Mudit Rustagi

Mudit is experienced in machine learning and deep learning. He is an undergraduate in Mechatronics and has worked as a team lead (ML team) on several projects. He has a strong interest in doing SOTA ML projects and writing blogs on data science and machine learning.