Guide To Infinite Nature For Perpetual View Generation

GANs have made it much easier to generate images, text and other content, and new applications appear every month. This article covers one striking application of deep learning: synthesizing new scenes from a single picture. To be precise, the output is a sequence of progressive frames that continue from the original image.

Infinite Nature, a method for perpetual view generation, lets you take an image and fly into it as a bird would, exploring the landscape. It generates a long sequence of novel views (new images that follow on from the original), corresponding to an arbitrarily long camera trajectory such as a bird's flight path. All this from a single image!

This is a challenging problem because it goes far beyond the capabilities of current view synthesis models. Those models work only for a limited range of viewpoints around the base image (the image from which synthesis starts). Their results also degenerate quickly when pushed further, or the generated frames show only minimal changes.


The technique discussed in this article addresses these problems with a hybrid approach that integrates image synthesis and geometry in an iterative framework: render, refine and repeat. This allows long-range generation that covers large distances, even after hundreds of frames. The model is trained on a set of monocular video sequences without any manual annotation, which saves a lot of time.

The key point is that the authors use the geometry of the image. First, a disparity map (a map of how depth varies across the image) is created using a state-of-the-art network called MiDaS, which gives the model depth information about the scene.
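
To make the idea of a disparity map concrete: disparity is commonly treated as inverse depth, so nearby points get large values and distant points get small ones. The following is a minimal sketch, assuming a made-up 2x3 depth grid in metres and a hypothetical helper `depth_to_disparity` (neither comes from the Infinite Nature code). It scales the result to the range [0, 1].

```python
import numpy as np

def depth_to_disparity(depth):
    """Convert a depth map to a disparity map scaled to [0, 1].

    Hypothetical helper for illustration; not part of Infinite Nature.
    """
    depth = np.asarray(depth, dtype=np.float32)
    disparity = 1.0 / depth  # inverse depth: near -> large, far -> small
    d_min, d_max = disparity.min(), disparity.max()
    return (disparity - d_min) / (d_max - d_min)

# Toy depth grid in metres (made-up values, not real data).
toy_depth = [[1.0, 2.0, 4.0],
             [2.0, 4.0, 8.0]]
toy_disparity = depth_to_disparity(toy_depth)
print(toy_disparity)
```

The nearest point (1 m) maps to the largest disparity and the farthest point (8 m) to the smallest.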

The renderer then generates a new view from the previous one: the image and its disparity are turned into a 3D mesh, which is rendered from a novel viewpoint. The renderer is differentiable, so backpropagation can be used for training. A second network based on SPADE, a state-of-the-art architecture for conditional image synthesis, refines the rendered image. The process then repeats over and over, producing new frames ever deeper into the scene.
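
The loop described above can be sketched with toy stand-ins. This is only an illustration of the control flow, not the authors' implementation: a "frame" here is a plain list of numbers, and `render`, `refine` and `render_refine_repeat` are hypothetical names. Rendering shifts the frame and leaves a hole where no source content exists, and refining fills the hole.

```python
def render(frame, step_size):
    """Stand-in for the renderer: shift the frame as if the camera moved.

    Positions with no source content become holes (None).
    """
    return frame[step_size:] + [None] * step_size

def refine(frame):
    """Stand-in for the refinement network: fill holes with the last known value."""
    last_known = next(v for v in reversed(frame) if v is not None)
    return [last_known if v is None else v for v in frame]

def render_refine_repeat(start_frame, num_steps, step_size=1):
    """Generate num_steps new frames, each one built from the previous frame."""
    frames = [start_frame]
    for _ in range(num_steps):
        warped = render(frames[-1], step_size)
        frames.append(refine(warped))
    return frames

frames = render_refine_repeat([1, 2, 3, 4], num_steps=3)
print(frames)
```

Each new frame depends only on the one before it, which is why the real model can keep going for hundreds of steps from a single starting image.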

Code Implementation

Below are the instructions for running the model locally.

Install the required libraries using the requirements file given here.

 #installing dependencies with help of requirements file
 pip3 install -r requirements.txt

As mentioned earlier, the model uses a 3D mesh renderer built on TensorFlow. The authors build the library with GCC instead of following the Bazel instructions given on the TensorFlow GitHub.

The TensorFlow mesh renderer was originally written for TensorFlow versions below 2.x, but the authors have prepared a small patch that can be downloaded to make its functions work on version 2.x.

 #downloading the tensorflow 3D mesh from github

Now, download the required data and pre-trained checkpoints.

 #downloading the zip file containing model and checkpoints
 #unzip the file 
 tar xvf ckpt.tar.gz 

Sample autocruise inputs can be obtained from here.

 #sample inputs by authors mentioned in paper and Official github

Each pickle file holds a dictionary whose entries contain a nature scene and the corresponding disparity map predicted by MiDaS.
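
A minimal sketch of reading such a file is shown below. Instead of a real download it builds a stand-in dictionary in memory; the `input_rgbd` key matches the one used in the Colab code later in this article, while the random pixel values and the 160x256 size are assumptions made for illustration.

```python
import io
import pickle

import numpy as np

# Stand-in for one sample input file: random values, assumed shape [H, W, 4].
fake_entry = {'input_rgbd': np.random.rand(160, 256, 4).astype(np.float32)}
buffer = io.BytesIO()
pickle.dump(fake_entry, buffer)
buffer.seek(0)

# Load the dictionary and split colour from disparity.
entry = pickle.load(buffer)
rgbd = entry['input_rgbd']
rgb = rgbd[..., :3]        # first three channels: colour
disparity = rgbd[..., 3:]  # last channel: disparity
print(rgb.shape, disparity.shape)
```

With a real file, `buffer` would be replaced by `open("autocruise_input1.pkl", "rb")`.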

Run Infinite Nature for 100 steps using autocruise, saving the frames to an output folder.

 #running the model for 100 frames, which will be stored in the given output folder
 python -m autocruise --output_folder=autocruise --num_steps=100 

That covers running the pre-trained model on a local machine.

Now let's look at running Infinite Nature on Google Colab.

Importing Dependencies
 #os and sys for paths and environment, tensorflow for the model
 #(run these imports after the download step below has fetched the code)
 import os
 import sys
 import tensorflow as tf
 #making sure dynamic linking is able to find tensorflow libraries
 os.system('ldconfig ' + tf.sysconfig.get_lib())
 #making sure python can find the libraries defined by the authors
 sys.path.append('infinite_nature')
 sys.path.append('infinite_nature/tf_mesh_renderer/mesh_renderer')
 #the mesh renderer library should know where to load its .so file from
 os.environ['TEST_SRCDIR'] = 'infinite_nature'
 #imageio for image manipulation, IPython for showing images in the notebook
 import imageio
 import IPython
 #numpy for arrays, pickle for the sample input files
 import numpy as np
 import pickle
 #config, infinite_nature_lib and fly_camera come from the authors' code
 import config
 import infinite_nature_lib
 import fly_camera
 import tensorflow_hub as hub
Downloading model weights, sample data

NOTE: The following snippet is taken from the official GitHub repository of Infinite Nature, which contains the links and the exact procedure.

 echo Fetching code from github...
 #subversion is used to export the project folder from GitHub
 apt install subversion
 svn export --force
 #fetching the weights , checkpoint files in form of zip files
 echo Fetching trained model weights...
 rm -f autocruise_input*.pkl
 rm -f ckpt.tar.gz
 rm -rf ckpt
 tar -xf ckpt.tar.gz
 #installing specific versions of libraries
 echo Installing required dependencies...
 pip install -r infinite_nature/requirements.txt
 #fetching the 3D mesh renderer from GitHub and compiling its kernels
 echo Fetching tf_mesh_renderer and compiling kernels...
 cd infinite_nature
 rm -rf tf_mesh_renderer
 echo Done. 
Build Model
 #path to the pre-trained checkpoint
 mod_path = "ckpt/model.ckpt-6935893"
 #load the render-refine function and the style encoder from the checkpoint
 render_refiner, style_encod = infinite_nature_lib.load_model(mod_path)
 #initial RGBD frames (colour plus disparity) are taken from the sample files
 initial_rgbds = [
     pickle.load(open("autocruise_input1.pkl", "rb"))['input_rgbd'],
     pickle.load(open("autocruise_input2.pkl", "rb"))['input_rgbd'],
     pickle.load(open("autocruise_input3.pkl", "rb"))['input_rgbd']]
 #The state that we need to remember while flying.
 #Code for an autopilot demo.
 #We expose two functions that will be invoked
 #from an HTML/JS frontend: reset and step.
 state = {
   'intrinsics': None,
   'pose': None,
   'rgbd': None,
   'start_rgbd': None,
   'style_noise': None,
   'next_pose_function': None,
   #direction offset, controlled with the mouse
   'direction_offset': None,
 }
 #encode the RGB channels of the current frame as a PNG for display
 def current_image_png():
   img_data = tf.image.encode_png(
       tf.image.convert_image_dtype(state['rgbd'][..., :3], dtype=tf.uint8))
   return IPython.display.Image(data=img_data.numpy())
Reset Function
 #function to reset the state to a starting RGBD frame (d is for depth)
 def reset(rgbd=None):
   #fall back to the stored starting frame when no new input is given
   if rgbd is None:
     rgbd = state['start_rgbd']
   ht, w, _ = rgbd.shape
   aspect_ratio = w / float(ht)
   #resizing the frame to the 160x256 resolution used throughout the notebook
   rgbd_channel = tf.image.resize(rgbd, [160, 256])
   state['rgbd'] = rgbd_channel
   #default rgbd channel
   state['start_rgbd'] = rgbd_channel
   #identity camera pose as a 3x4 matrix
   state['pose'] = np.array(
       [[1.0, 0.0, 0.0, 0.0],
        [0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0]],
       dtype=np.float32)
   #0.8 focal_x corresponds to a FOV (field of view) of ~64 degrees.
   state['intrinsics'] = np.array(
       [0.8, 0.8 * aspect_ratio, .5, .5],
       dtype=np.float32)
   #no turning at the start; later set by mouse or autopilot
   state['direction_offset'] = (0.0, 0.0)
   state['style_noise'] = style_encod(rgbd_channel)
   #function that produces the next pose after the current image
   state['next_pose_function'] = fly_camera.fly_dynamic(
     state['intrinsics'],
     state['pose'],
     #turn the camera where the mouse points
     turn_function=(lambda _: state['direction_offset']))
   return current_image_png()
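
The "~64 degrees" figure in the comment can be checked with a little trigonometry. This sketch assumes the focal length in the intrinsics is expressed as a fraction of the image width, so that half the image spans 0.5 units; `fov_degrees` is a hypothetical helper, not part of the authors' code.

```python
import math

def fov_degrees(normalized_focal):
    """Field of view for a focal length given as a fraction of the image size."""
    return math.degrees(2.0 * math.atan(0.5 / normalized_focal))

horizontal_fov = fov_degrees(0.8)
print(round(horizontal_fov, 1))
```

A shorter focal length gives a wider view: under the same assumption, a value of 0.5 would correspond to 90 degrees.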
Step Function
 #function that takes the direction offset and generates the new frame
 def step(offx, offy):
   state['direction_offset'] = (offx, offy)
   #ask the pose function for the next camera pose
   next_pose = state['next_pose_function'](state['rgbd'])
   #render and refine the new rgbd frame at the next pose
   next_rgbd = render_refiner(
        state['rgbd'], state['style_noise'],
        state['pose'], state['intrinsics'],
        next_pose, state['intrinsics'])
   state['pose'] = next_pose
   state['rgbd'] = next_rgbd
   return current_image_png()
MiDaS Disparity
 #for user-supplied images, MiDaS v2 is used to obtain the initial disparity
 midas_mod = hub.load('', tags=['serve'])
 def midas_dis(rgb):
   """Computes MiDaS v2 disparity on an RGB input image.
   Function inputs:
     rgb: [H, W, 3] Range [0.0, 1.0].
   Function outputs:
     [H, W, 1] MiDaS disparity resized to the input size and in the range
     [0.0, 1.0]
   """
   size = rgb.shape[:2]
   resized_img = tf.image.resize(rgb, [384, 384], tf.image.ResizeMethod.BICUBIC)
   #the MiDaS network wants [1, C, H, W]
   midas_in = tf.transpose(resized_img, [2, 0, 1])[tf.newaxis]
   pred = midas_mod.signatures['serving_default'](midas_in)['default'][0]
   #scale the prediction to the range [0.0, 1.0]
   pred_min = tf.reduce_min(pred)
   pred_max = tf.reduce_max(pred)
   pred = (pred - pred_min) / (pred_max - pred_min)
   return tf.image.resize(
       pred[..., tf.newaxis], size,  method=tf.image.ResizeMethod.AREA)
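
The transpose in the middle of this function changes the memory layout from height-width-channel to batch-channel-height-width. The same operation is sketched here in NumPy on a blank stand-in image, with one marked pixel to show where it ends up; the array contents are made up for illustration.

```python
import numpy as np

# Blank stand-in image in [H, W, C] layout with a single marked value.
image_hwc = np.zeros((384, 384, 3), dtype=np.float32)
image_hwc[10, 20, 2] = 1.0  # row 10, column 20, channel 2

# Move channels first, then add a leading batch dimension: [1, C, H, W].
image_nchw = np.transpose(image_hwc, (2, 0, 1))[np.newaxis]
print(image_nchw.shape)
print(image_nchw[0, 2, 10, 20])
```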
Load Function
 #load one of the sample rgbd frames as the starting frame
 def load_initial(i):
   return reset(rgbd=initial_rgbds[i])
 def load_image(data):
   """Loads a user-supplied image as the starting frame.
   Data coming from JavaScript ends up as a string, so it
   needs to be converted to byte format
   using Latin-1 encoding (maps 0-255 to 0-255).
   """
   data = data.encode('Latin-1')
   #decoding the image bytes into three colour channels
   rgb = tf.image.decode_image(data, channels=3, dtype=tf.float32)
   #resizing to the resolution expected by reset
   resized = tf.image.resize(rgb, [160, 256], tf.image.ResizeMethod.AREA)
   #concatenation with the midas disparity map from the previous function
   rgbd = tf.concat([resized, midas_dis(resized)], axis=-1)
   return reset(rgbd=rgbd)
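
Latin-1 is used here because it maps each of the 256 possible byte values to exactly one character and back, so binary image data survives a trip through a string. A minimal check, using every byte value once as made-up test data:

```python
# Every possible byte value, standing in for raw image data.
raw = bytes(range(256))

# Bytes -> string (as the data would arrive from the frontend) -> bytes again.
as_text = raw.decode('latin-1')
restored = as_text.encode('latin-1')
print(restored == raw)
```

An encoding such as UTF-8 would not work for this, since many byte sequences are not valid UTF-8.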

The frontend for this application in HTML is given here.

 #displaying the frontend made by the HTML script provided above
 #output lets the frontend call python functions by name in Colab
 from google.colab import output
 #load one of the initial sample images
 output.register_callback('initial', load_initial)
 #load a user-supplied image
 output.register_callback('image', load_image)
 #reset to the starting rgbd frame
 output.register_callback('reset', reset)
 #step forward to generate a new frame
 output.register_callback('step', step)


The output can be viewed here; the result is a working frontend application for the Infinite Nature model. A dataset I recommend trying is ACID (Aerial Coastline Imagery Dataset).

Try changing the camera position vigorously in the application. An earlier approach to this problem can be read here.


Mudit Rustagi
Mudit is experienced in machine learning and deep learning. He is an undergraduate in Mechatronics and worked as a team lead (ML team) for several Projects. He has a strong interest in doing SOTA ML projects and writing blogs on data science and machine learning.
