Python Guide to Neural Body: Converting 2D images to 3D

Neural Body employs sparse cameras to capture the poses of a dynamic human body and renders integrated, high-quality 3D views and scenes.

Novel view synthesis finds interesting applications in movie production, sports broadcasting and telepresence. It is the process of generating a 3D view, or sometimes an entire 3D scene, from available 2D images captured under different poses, orientations and illuminations. View synthesis of the human body is one of the most challenging problems, especially when the body is in motion. Present view synthesis methods employ either image-based rendering or implicit neural representation to develop the 3D view.

However, the major hindrance in these view synthesis approaches is hardware complexity. View synthesis requires either a dense array of cameras to capture the object from different views and orientations or a few high-definition depth sensors. Such hardware requirements make the system expensive, or impossible to set up due to spatial constraints and strict configuration requirements. The dense camera requirement can be relaxed by employing fewer cameras or sensors, but the reduced number of cameras causes sparsity in view continuity. This makes representation learning of views ill-posed and results in poor view rendering. An approach to novel view synthesis with a limited number of cameras or sensors has therefore become a necessity.



To this end, Sida Peng, Yuanqing Zhang, Qing Shuai, Hujun Bao and Xiaowei Zhou of Zhejiang University, Yinghao Xu of The Chinese University of Hong Kong, and Qianqian Wang of Cornell University introduced a powerful approach named Neural Body that employs sparse cameras to capture the poses of a dynamic human body and renders high-quality 3D views as well as 3D scenes of the original human body.

Neural Body performs 3D reconstruction and Novel view synthesis from a sparse multi-view video captured with limited RGB cameras (Source).

This approach assumes that the implicit neural representations learnt at different video frames share the same set of structured latent codes, anchored to the vertices of a deformable mesh. The sparse captures can thus be integrated into a continuous 3D view representation. The deformable mesh can be deformed to any possible human position based on the input pose. Neural Body synthesizes photorealistic novel views of a human performer in complex motions and under varying illumination from sparse multi-view video frames. Moreover, this framework needs no pre-trained networks to learn the representations.
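The shared set of structured latent codes can be pictured as a small learnable embedding table with one code per mesh vertex: the same codes are reused in every frame, and only the vertex positions move with the pose. The following PyTorch sketch is purely illustrative; the 16-dimensional code size and the use of `nn.Embedding` are assumptions, not the paper's exact implementation (SMPL meshes do have 6,890 vertices).

```python
import torch
import torch.nn as nn

# One learnable latent code per SMPL vertex, shared across all frames.
NUM_SMPL_VERTICES = 6890
CODE_DIM = 16  # illustrative size, not the paper's exact value

latent_codes = nn.Embedding(NUM_SMPL_VERTICES, CODE_DIM)

# For a given frame, posed vertices would come from the SMPL model (N, 3);
# each vertex keeps its own code no matter where the pose moved it.
posed_vertices = torch.rand(NUM_SMPL_VERTICES, 3)  # placeholder pose
codes = latent_codes(torch.arange(NUM_SMPL_VERTICES))  # (6890, 16)
```

Because the embedding table is independent of the frame, the codes learnt from one pose directly transfer to any other pose of the same performer.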

Neural Body generates different implicit 3D representations of a human body based on the input poses from a common structured latent code anchored to a deformable mesh (Source).

While training the Neural Body framework, the structured latent codes are fed into a sparse convolutional network (SparseConvNet) that outputs a latent code volume, giving a 3D representation of the space around the body. The latent code for any query 3D point is then obtained by trilinear interpolation of the neighbouring points in the latent code volume. Once the latent code is obtained for a query point, it is fed into feed-forward networks for colour and density regression.
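The two steps above can be sketched in PyTorch: `F.grid_sample` on a 5-D tensor performs exactly the trilinear interpolation described, and two small feed-forward heads regress density and colour. All shapes, layer sizes and the dense (rather than sparse) code volume here are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

CODE_DIM = 16

# A dense stand-in for the latent code volume produced by SparseConvNet:
# (batch, channels, depth, height, width).
code_volume = torch.randn(1, CODE_DIM, 32, 32, 32)

def query_latent_code(points):
    """Trilinearly interpolate latent codes at 3D points in [-1, 1]^3.

    points: (N, 3) -> (N, CODE_DIM)
    """
    grid = points.view(1, 1, 1, -1, 3)           # (1, 1, 1, N, 3)
    codes = F.grid_sample(code_volume, grid,     # 'bilinear' on a 5-D input
                          mode='bilinear',       # is trilinear interpolation
                          align_corners=True)    # -> (1, C, 1, 1, N)
    return codes.view(CODE_DIM, -1).t()          # (N, C)

# Simple feed-forward heads for density and view-dependent RGB colour.
density_head = nn.Sequential(nn.Linear(CODE_DIM, 64), nn.ReLU(),
                             nn.Linear(64, 1))
color_head = nn.Sequential(nn.Linear(CODE_DIM + 3, 64), nn.ReLU(),
                           nn.Linear(64, 3))

points = torch.rand(1024, 3) * 2 - 1             # query points in [-1, 1]^3
view_dirs = F.normalize(torch.randn(1024, 3), dim=-1)

codes = query_latent_code(points)                # (1024, 16)
sigma = density_head(codes)                      # (1024, 1) densities
rgb = torch.sigmoid(color_head(torch.cat([codes, view_dirs], dim=-1)))
```

A volume renderer would then integrate `sigma` and `rgb` along each camera ray to produce the final pixel colours.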

The deformable mesh is built by attaching the structured latent codes to its vertices. For this, the well-known SMPL (Skinned Multi-Person Linear) model is employed, which is governed by shape parameters and pose parameters. By anchoring the latent codes to this SMPL model, a dynamic mesh of the human body is obtained. This design enables quick inference for 3D reconstruction and novel view synthesis.
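The mechanism SMPL uses to deform the mesh under the pose parameters is linear blend skinning: each vertex moves as a weighted blend of per-joint rigid transforms. The toy sketch below (five vertices, three joints instead of SMPL's 6,890 vertices and 24 joints) shows only this core idea, not the full SMPL model with its shape blend shapes.

```python
import torch

# Minimal linear blend skinning sketch with toy sizes.
V, J = 5, 3                                   # vertices, joints
rest_vertices = torch.rand(V, 3)
skin_weights = torch.softmax(torch.rand(V, J), dim=-1)   # rows sum to 1
joint_transforms = torch.eye(4).expand(J, 4, 4).clone()  # identity pose

# Blend the 4x4 joint transforms per vertex, then apply them to the
# rest-pose vertices in homogeneous coordinates.
blended = torch.einsum('vj,jab->vab', skin_weights, joint_transforms)
homo = torch.cat([rest_vertices, torch.ones(V, 1)], dim=-1)   # (V, 4)
posed = torch.einsum('vab,vb->va', blended, homo)[:, :3]      # (V, 3)
```

With identity joint transforms the blended transform is also the identity (the weights sum to one), so `posed` equals `rest_vertices`; rotating a joint's transform would drag its weighted vertices along, and the latent codes anchored to those vertices move with them.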

Neural Body on Novel view synthesis and 3D reconstruction (Source)

Python Implementation

Neural Body requires Python 3.6+, CUDA 10.0, PyTorch 1.4.0 and a GPU runtime. The following commands install PyTorch 1.4.0 compatible with CUDA 10.0.

!pip install torch==1.4.0+cu100 -f https://download.pytorch.org/whl/torch_stable.html


The following command downloads source code to the local machine.

!git clone https://github.com/zju3dv/neuralbody.git


Verify the downloaded contents by exploring the directory.

!ls neuralbody


Change the current directory to /content/neuralbody/ by providing the line-magic command.

%cd neuralbody/

Download the Anaconda-3 installer using the following command, if the local machine does not have a conda environment. The installer version shown here is an example; pick the latest release from the Anaconda archive.

 !wget https://repo.anaconda.com/archive/Anaconda3-2020.02-Linux-x86_64.sh

Install the downloaded Anaconda-3 package using the following command; the file name must match the installer you downloaded.

 !bash Anaconda3-2020.02-Linux-x86_64.sh -b -p ~/anaconda3
Enable the conda directory to run further commands,

 %cd /content/neuralbody/
 !export PATH=~/anaconda3/bin:$PATH
 !exec bash 

and create and activate the NeuralBody conda environment by providing the following commands.

 conda create -n neuralbody python=3.7
 conda activate neuralbody 

Install the dependencies using the following command.

!pip install -r requirements.txt

Install spconv library and build its wheels using the following commands.

 git clone --recursive https://github.com/traveller59/spconv.git
 cd spconv
 git checkout abf0acf30f5526ea93e687e3f424f62d9cd8313a
 export CUDA_HOME="/usr/local/cuda-10.0"
 python setup.py bdist_wheel
 cd dist
 pip install spconv-1.2.1-cp36-cp36m-linux_x86_64.whl  # wheel name in dist/ depends on your Python version

Download the datasets from the official data page to the directory /content/neuralbody. It should be noted that the datasets exceed 30 GB. Once downloaded, the datasets can be prepared using the following commands.

 ROOT=/content/neuralbody
 cd $ROOT/data
 ln -s /content/neuralbody/people_snapshot people_snapshot
 # OR
 ln -s /content/neuralbody/zju_mocap zju_mocap 

Download the pre-trained models from the official models page to a newly created data/ directory, then enable one of the models and run it using the following commands.

 python run.py --type visualize --cfg_file configs/snapshot_f3c_demo.yaml exp_name female3c
 python run.py --type visualize --cfg_file configs/snapshot_f3c_perform.yaml exp_name female3c
 python run.py --type visualize --cfg_file configs/snapshot_f3c_mesh.yaml exp_name female3c train.num_workers 0
 # start training
 python train_net.py --cfg_file configs/snapshot_f3c.yaml exp_name female3c resume False
 # distributed training based on GPU availability
 python -m torch.distributed.launch --nproc_per_node=4 train_net.py --cfg_file configs/snapshot_f3c.yaml exp_name female3c resume False gpus "0, 1, 2, 3" distributed True 

It should be noted that training may take several hours depending on memory availability and device configuration.

Performance of Neural Body

The Neural Body framework is trained on complex human motions such as twirling, Taichi, arm swings, warmups, punching and kicking. The complex human motions are captured by a multi-camera system of 21 synchronized cameras. Inputs from 4 evenly distributed cameras are chosen for training and the rest for testing. Training and testing of Neural Body and recent state-of-the-art methods, namely NeRF (Neural Radiance Fields), NV (Neural Volumes), COLMAP, DVR (Differentiable Volumetric Rendering), People-Snapshot and PIFuHD, are carried out under identical conditions. Neural Body outperforms all of these models on the PSNR (Peak Signal-to-Noise Ratio) and SSIM (Structural Similarity Index Measure) scales.
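For reference, PSNR measures how close a rendered view is to the ground-truth image on a log scale, with higher values meaning less error. A minimal NumPy sketch of the metric (assuming images normalised to [0, 1]; the toy images are illustrative, not the paper's data):

```python
import numpy as np

def psnr(rendered, ground_truth, max_val=1.0):
    """Peak Signal-to-Noise Ratio in decibels; higher is better."""
    mse = np.mean((rendered - ground_truth) ** 2)
    if mse == 0:
        return float('inf')  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

gt = np.zeros((4, 4, 3))
pred = np.full((4, 4, 3), 0.1)   # uniform error of 0.1 everywhere
print(round(psnr(pred, gt), 2))  # mse = 0.01 -> 10 * log10(1/0.01) = 20.0
```

SSIM complements PSNR by comparing local luminance, contrast and structure rather than raw pixel error; implementations are available, for example, in scikit-image.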

Qualitative comparison of Neural Body with other models in Novel view synthesis (Source).
Qualitative comparison of Neural Body with People-Snapshot on 3D reconstruction from monocular videos (Source).


Rajkumar Lakshmanamoorthy
