Python Guide to Neural Body: Converting 2D images to 3D

Neural Body employs sparse cameras to capture the poses of a dynamic human body and renders integrated, high-quality 3D views and scenes.

Novel view synthesis finds interesting applications in movie production, sports broadcasting and telepresence. It is the process of generating a 3D view, or sometimes an entire 3D scene, from available 2D images captured under different poses, orientations and illuminations. View synthesis of the human body is one of the most challenging problems, especially when the body is in motion. Present view synthesis methods employ either image-based rendering or implicit neural representation to develop the 3D view.

However, the major hindrance in these view synthesis approaches is hardware complexity. View synthesis requires either a dense array of cameras to capture the object from different views and orientations or a few high-definition depth sensors. Such hardware requirements make the system expensive, or impossible to set up due to spatial constraints and strict configuration requirements. The dense camera requirement can be relaxed by employing fewer cameras or sensors, but the reduced number of cameras causes sparsity in view continuity. This makes representation learning of views ill-posed and results in poor view rendering. An approach to novel view synthesis with a limited number of cameras or sensors has therefore become a necessity.



To this end, Sida Peng, Yuanqing Zhang, Qing Shuai, Hujun Bao and Xiaowei Zhou of Zhejiang University, Yinghao Xu of The Chinese University of Hong Kong, and Qianqian Wang of Cornell University introduced a powerful approach named Neural Body that employs sparse cameras to capture the poses of a dynamic human body and renders high-quality 3D views as well as 3D scenes of the original human body.

Neural Body performs 3D reconstruction and Novel view synthesis from a sparse multi-view video captured with limited RGB cameras (Source).

This approach assumes that the implicit neural representations learnt at different video frames share the same set of structured latent codes, anchored to the vertices of a deformable mesh. The sparse captures can thus be integrated into a continuous 3D view representation. The deformable mesh can be deformed to any possible human position based on the input pose. Neural Body synthesizes photorealistic novel views of a human performer in complex motions and under varying illumination from sparse multi-view video frames. Moreover, this framework needs no pre-trained networks to learn the representations.
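The shared set of structured latent codes can be pictured as a small learnable embedding table with one code per mesh vertex: the same codes are reused in every frame, and only the vertex positions move with the pose. The following PyTorch sketch is purely illustrative; the 16-dimensional code size and the use of `nn.Embedding` are assumptions, not the paper's exact implementation (SMPL meshes do have 6,890 vertices).

```python
import torch
import torch.nn as nn

# One learnable latent code per SMPL vertex, shared across all frames.
NUM_SMPL_VERTICES = 6890
CODE_DIM = 16  # illustrative size, not the paper's exact value

latent_codes = nn.Embedding(NUM_SMPL_VERTICES, CODE_DIM)

# For a given frame, posed vertices would come from the SMPL model (N, 3);
# each vertex keeps its own code no matter where the pose moved it.
posed_vertices = torch.rand(NUM_SMPL_VERTICES, 3)  # placeholder pose
codes = latent_codes(torch.arange(NUM_SMPL_VERTICES))  # (6890, 16)
```

Because the embedding table is independent of the frame, the codes learnt from one pose directly transfer to any other pose of the same performer.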

Neural Body generates different implicit 3D representations of a human body based on the input poses from a common structured latent code anchored to a deformable mesh (Source).

While training the Neural Body framework, the structured latent codes are fed into a sparse convolutional network (SparseConvNet) that outputs a latent code volume, giving a 3D representation of the space around the body. The latent code for any query 3D point is then obtained by trilinear interpolation of the neighbouring points in the latent code volume. Once the latent code is obtained for a query point, it is fed into feed-forward networks for colour and density regression.
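The two steps above can be sketched in PyTorch: `F.grid_sample` on a 5-D tensor performs exactly the trilinear interpolation described, and two small feed-forward heads regress density and colour. All shapes, layer sizes and the dense (rather than sparse) code volume here are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

CODE_DIM = 16

# A dense stand-in for the latent code volume produced by SparseConvNet:
# (batch, channels, depth, height, width).
code_volume = torch.randn(1, CODE_DIM, 32, 32, 32)

def query_latent_code(points):
    """Trilinearly interpolate latent codes at 3D points in [-1, 1]^3.

    points: (N, 3) -> (N, CODE_DIM)
    """
    grid = points.view(1, 1, 1, -1, 3)           # (1, 1, 1, N, 3)
    codes = F.grid_sample(code_volume, grid,     # 'bilinear' on a 5-D input
                          mode='bilinear',       # is trilinear interpolation
                          align_corners=True)    # -> (1, C, 1, 1, N)
    return codes.view(CODE_DIM, -1).t()          # (N, C)

# Simple feed-forward heads for density and view-dependent RGB colour.
density_head = nn.Sequential(nn.Linear(CODE_DIM, 64), nn.ReLU(),
                             nn.Linear(64, 1))
color_head = nn.Sequential(nn.Linear(CODE_DIM + 3, 64), nn.ReLU(),
                           nn.Linear(64, 3))

points = torch.rand(1024, 3) * 2 - 1             # query points in [-1, 1]^3
view_dirs = F.normalize(torch.randn(1024, 3), dim=-1)

codes = query_latent_code(points)                # (1024, 16)
sigma = density_head(codes)                      # (1024, 1) densities
rgb = torch.sigmoid(color_head(torch.cat([codes, view_dirs], dim=-1)))
```

A volume renderer would then integrate `sigma` and `rgb` along each camera ray to produce the final pixel colours.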

The deformable mesh is built by attaching the structured latent codes to its vertices. For this, the well-known SMPL (Skinned Multi-Person Linear) model is employed, which is governed by shape parameters and pose parameters. By anchoring the latent codes to this SMPL model, a dynamic mesh of the human body is obtained. This design enables quick inference for 3D reconstruction and novel view synthesis.
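The mechanism SMPL uses to deform the mesh under the pose parameters is linear blend skinning: each vertex moves as a weighted blend of per-joint rigid transforms. The toy sketch below (five vertices, three joints instead of SMPL's 6,890 vertices and 24 joints) shows only this core idea, not the full SMPL model with its shape blend shapes.

```python
import torch

# Minimal linear blend skinning sketch with toy sizes.
V, J = 5, 3                                   # vertices, joints
rest_vertices = torch.rand(V, 3)
skin_weights = torch.softmax(torch.rand(V, J), dim=-1)   # rows sum to 1
joint_transforms = torch.eye(4).expand(J, 4, 4).clone()  # identity pose

# Blend the 4x4 joint transforms per vertex, then apply them to the
# rest-pose vertices in homogeneous coordinates.
blended = torch.einsum('vj,jab->vab', skin_weights, joint_transforms)
homo = torch.cat([rest_vertices, torch.ones(V, 1)], dim=-1)   # (V, 4)
posed = torch.einsum('vab,vb->va', blended, homo)[:, :3]      # (V, 3)
```

With identity joint transforms the blended transform is also the identity (the weights sum to one), so `posed` equals `rest_vertices`; rotating a joint's transform would drag its weighted vertices along, and the latent codes anchored to those vertices move with them.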

Neural Body on Novel view synthesis and 3D reconstruction (Source)

Python Implementation

Neural Body requires Python 3.6+, CUDA 10.0, PyTorch 1.4.0 and a GPU runtime. The following commands install PyTorch 1.4.0 compatible with CUDA 10.0.

!pip install torch==1.4.0+cu100 -f https://download.pytorch.org/whl/torch_stable.html


The following command downloads source code to the local machine.

!git clone https://github.com/zju3dv/neuralbody.git


Verify the downloaded contents by exploring the directory.

!ls neuralbody


Change the current directory to /content/neuralbody/ by providing the line-magic command.

%cd neuralbody/

Download the Anaconda-3 installer using the following command, if the local machine does not have a conda environment. The installer version shown here is an example; pick the latest release from the Anaconda archive.

 !wget https://repo.anaconda.com/archive/Anaconda3-2020.02-Linux-x86_64.sh

Install the downloaded Anaconda-3 package using the following command; the file name must match the installer you downloaded.

 !bash Anaconda3-2020.02-Linux-x86_64.sh -b -p ~/anaconda3
Enable the conda directory to run further commands,

 %cd /content/neuralbody/
 !export PATH=~/anaconda3/bin:$PATH
 !exec bash 

and create and activate the NeuralBody conda environment by providing the following commands.

 conda create -n neuralbody python=3.7
 conda activate neuralbody 

Install the dependencies using the following command.

!pip install -r requirements.txt

Install spconv library and build its wheels using the following commands.

 git clone --recursive https://github.com/traveller59/spconv.git
 cd spconv
 git checkout abf0acf30f5526ea93e687e3f424f62d9cd8313a
 export CUDA_HOME="/usr/local/cuda-10.0"
 python setup.py bdist_wheel
 cd dist
 pip install spconv-1.2.1-cp36-cp36m-linux_x86_64.whl  # wheel name in dist/ depends on your Python version

Download the datasets from the official data page to the directory /content/neuralbody. It should be noted that the datasets exceed 30 GB. Once downloaded, the datasets can be prepared using the following commands.

 ROOT=/content/neuralbody
 cd $ROOT/data
 ln -s /content/neuralbody/people_snapshot people_snapshot
 # OR
 ln -s /content/neuralbody/zju_mocap zju_mocap 

Download the pre-trained models from the official models page to a newly created data/ directory, then enable one of the models and run it using the following commands.

 python run.py --type visualize --cfg_file configs/snapshot_f3c_demo.yaml exp_name female3c
 python run.py --type visualize --cfg_file configs/snapshot_f3c_perform.yaml exp_name female3c
 python run.py --type visualize --cfg_file configs/snapshot_f3c_mesh.yaml exp_name female3c train.num_workers 0
 # start training
 python train_net.py --cfg_file configs/snapshot_f3c.yaml exp_name female3c resume False
 # distributed training based on GPU availability
 python -m torch.distributed.launch --nproc_per_node=4 train_net.py --cfg_file configs/snapshot_f3c.yaml exp_name female3c resume False gpus "0, 1, 2, 3" distributed True 

It should be noted that training may take several hours depending on memory availability and device configuration.

Performance of Neural Body

The Neural Body framework is trained on complex human motions such as twirling, Taichi, arm swings, warmups, punching and kicking. The complex human motions are captured by a multi-camera system of 21 synchronized cameras. Inputs from 4 evenly distributed cameras are chosen for training and the rest for testing. Training and testing of Neural Body and recent state-of-the-art methods, namely NeRF (Neural Radiance Fields), NV (Neural Volumes), COLMAP, DVR (Differentiable Volumetric Rendering), People-Snapshot and PIFuHD, are carried out under identical conditions. Neural Body outperforms all of these models on the PSNR (Peak Signal-to-Noise Ratio) and SSIM (Structural Similarity Index Measure) scales.
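For reference, PSNR measures how close a rendered view is to the ground-truth image on a log scale, with higher values meaning less error. A minimal NumPy sketch of the metric (assuming images normalised to [0, 1]; the toy images are illustrative, not the paper's data):

```python
import numpy as np

def psnr(rendered, ground_truth, max_val=1.0):
    """Peak Signal-to-Noise Ratio in decibels; higher is better."""
    mse = np.mean((rendered - ground_truth) ** 2)
    if mse == 0:
        return float('inf')  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

gt = np.zeros((4, 4, 3))
pred = np.full((4, 4, 3), 0.1)   # uniform error of 0.1 everywhere
print(round(psnr(pred, gt), 2))  # mse = 0.01 -> 10 * log10(1/0.01) = 20.0
```

SSIM complements PSNR by comparing local luminance, contrast and structure rather than raw pixel error; implementations are available, for example, in scikit-image.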

Qualitative comparison of Neural Body with other models in Novel view synthesis (Source).
Qualitative comparison of Neural Body with People-Snapshot on 3D reconstruction from monocular videos (Source).


Rajkumar Lakshmanamoorthy
