Hands-on Guide to PyTorch 3D – A Library for Deep Learning with 3D Data

PyTorch 3D

Facebook AI’s PyTorch 3D is a Python library for deep learning with 3D data. It is built on PyTorch tensors and is a modular, flexible, efficient and optimized framework, which makes it easier for researchers to experiment with large-scale 3D data. PyTorch 3D contains a set of 3D operators, batching techniques and loss functions (for 3D data) that can be easily integrated with existing deep learning systems through its fast and differentiable APIs. The key features of PyTorch 3D are as follows:

  • Operations of PyTorch 3D are implemented using PyTorch tensors.
  • Provides the functionality to use GPU for acceleration.
  • PyTorch 3D is capable of handling mini-batches of heterogeneous data
Source : Official Video Tutorial

You can read about the theoretical aspects of PyTorch 3D in our previous article on PyTorch 3D. In this article, we will walk through some Python demos of PyTorch 3D.

Core Components in CodeBase



An overview of the components in the codebase is shown below. The foundation layer consists of data structures for 3D data, data loading utilities and composable transforms. The data structures in particular enable the operators and loss functions in the second layer to efficiently support heterogeneous batching.

Source : Official Video Tutorial
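To make heterogeneous batching concrete, here is a minimal NumPy-only sketch of the two layouts such a batch can take: a packed array that concatenates all vertices, and a zero-padded array with one row per mesh. PyTorch 3D's Meshes structure exposes comparable views (verts_packed and verts_padded); the helper name pack_and_pad and the toy meshes below are assumptions made for illustration, not part of the library.

```python
import numpy as np

def pack_and_pad(verts_list):
    """Return packed (sum V_i, 3), per-mesh start offsets, and padded (N, max V_i, 3) arrays."""
    counts = [len(v) for v in verts_list]
    # Packed layout: all vertices stacked one after another
    packed = np.concatenate(verts_list, axis=0)
    # first_idx[i] is the row of `packed` where mesh i begins
    first_idx = np.cumsum([0] + counts[:-1])
    # Padded layout: every mesh is zero-padded up to the largest vertex count
    padded = np.zeros((len(verts_list), max(counts), 3), dtype=packed.dtype)
    for i, v in enumerate(verts_list):
        padded[i, : len(v)] = v
    return packed, first_idx, padded

# Two hypothetical meshes with 3 and 5 vertices
mesh_a = np.ones((3, 3))
mesh_b = 2.0 * np.ones((5, 3))
packed, first_idx, padded = pack_and_pad([mesh_a, mesh_b])
print(packed.shape, first_idx.tolist(), padded.shape)
```

The packed layout wastes no memory and suits per-vertex operators, while the padded layout gives a regular tensor at the cost of unused entries.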


Install PyTorch 3D with the commands below (written for a notebook environment):

import os
!curl -LO https://github.com/NVIDIA/cub/archive/1.10.0.tar.gz
!tar xzf 1.10.0.tar.gz
# point the build to the downloaded CUB headers via the CUB_HOME environment variable
os.environ["CUB_HOME"] = os.getcwd() + "/cub-1.10.0"
!pip install 'git+https://github.com/facebookresearch/pytorch3d.git@stable'

Demo – Deform source mesh to target mesh

In this demo, we will deform an initial generic shape (a sphere) to fit a target shape (a dolphin). The demo is divided into four main parts:

  1. Import all the required packages and libraries. The code snippet is available here. Now, download the target object and save it locally.
!wget https://dl.fbaipublicfiles.com/pytorch3d/data/dolphin/dolphin.obj
# Load the dolphin mesh.
trg_obj = os.path.join('dolphin.obj')

Now, load the target mesh via load_obj. It returns a tensor of vertices (verts), a faces object holding the vertex indices of each face, and aux. Then center and scale-normalize the vertices, and create a mesh with the Meshes data structure available in PyTorch 3D.

# We read the target 3D model using load_obj, which sets verts to be a (V, 3) tensor
# of vertices and faces.verts_idx to be an (F, 3) tensor of the vertex indices of each
# of the corners of the faces. Faces which are not triangles will be split into triangles.
# aux is an object which may contain normals, uv coordinates, material colors and textures
# if they are present, and faces may additionally contain indices into these normals,
# textures and materials in its NamedTuple structure.
verts, faces, aux = load_obj(trg_obj)

# verts is a FloatTensor of shape (V, 3) where V is the number of vertices in the mesh
# faces is an object which contains the following LongTensors: verts_idx, normals_idx and textures_idx
# For this tutorial, normals and textures are ignored.
faces_idx = faces.verts_idx.to(device)
verts = verts.to(device)

# We center and scale-normalize the target mesh to fit in a sphere of radius 1 centered at (0,0,0).
# (scale, center) will be used to bring the predicted mesh back to its original center and scale.
# Note that normalizing the target mesh speeds up the optimization but is not necessary!
center = verts.mean(0)
verts = verts - center
scale = max(verts.abs().max(0)[0])
verts = verts / scale

# We construct a Meshes structure (a PyTorch3D data structure) for the target mesh
trg_mesh = Meshes(verts=[verts], faces=[faces_idx])
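The centering and scaling step, and the way (scale, center) can later map a prediction back to the original frame, can be sketched in plain NumPy as follows. The function names and the four toy vertices are assumptions for illustration only. Note that dividing by the largest absolute coordinate guarantees every coordinate lies in [-1, 1].

```python
import numpy as np

def normalize_verts(verts):
    """Center vertices at the origin and scale so the largest |coordinate| is 1."""
    center = verts.mean(axis=0)
    centered = verts - center
    scale = np.abs(centered).max()
    return centered / scale, center, scale

def denormalize_verts(verts_norm, center, scale):
    """Undo normalize_verts: bring vertices back to the original center and scale."""
    return verts_norm * scale + center

# Hypothetical toy vertices, shape (V, 3)
verts = np.array([[0.0, 0.0, 0.0],
                  [4.0, 0.0, 0.0],
                  [4.0, 2.0, 0.0],
                  [0.0, 2.0, 6.0]])
verts_norm, center, scale = normalize_verts(verts)
restored = denormalize_verts(verts_norm, center, scale)
print(center, scale)
```

Keeping center and scale around is what allows the optimized mesh to be exported at the target's original size and position.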

Now, initialize the source shape to be a sphere of radius 1.

# We initialize the source shape to be a sphere of radius 1

# ico_sphere creates verts and faces for a unit ico-sphere, with all faces oriented consistently.
# The integer argument specifies the number of iterations for subdivision of the mesh faces.
# Each additional level will result in four new faces per face.
src_mesh = ico_sphere(4, device)
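To get a feel for how quickly the sphere grows with the subdivision level, the sketch below counts vertices and faces, assuming the mesh starts from a regular icosahedron (12 vertices, 30 edges, 20 faces) and that each level splits every face into four by inserting one new vertex per edge. The function name is hypothetical; the counts follow from that assumption rather than from inspecting the library.

```python
def ico_sphere_counts(level):
    """Vertex and face counts after `level` midpoint subdivisions of an icosahedron."""
    verts, edges, faces = 12, 30, 20  # regular icosahedron
    for _ in range(level):
        # One new vertex per edge; every edge splits in two and every face
        # contributes three interior edges; every face becomes four faces.
        verts, edges, faces = verts + edges, 2 * edges + 3 * faces, 4 * faces
    return verts, faces

for level in range(5):
    print(level, ico_sphere_counts(level))
```

Under these assumptions, level 4 yields 2562 vertices and 5120 faces, which is the number of offset vectors the optimization has to learn.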
  2. Now, visualize the source and target mesh. The code snippet is available here.
  3. Now, create deform_verts, a tensor of zeros with the same shape as the vertices of the source mesh. We will deform the mesh by offsetting its vertices.
# We will learn to deform the source mesh by offsetting its vertices
# The shape of the deform parameters is equal to the total number of vertices in src_mesh
verts_shape = src_mesh.verts_packed().shape
# Create a tensor of shape verts_shape filled with 0.0
deform_verts = torch.full(verts_shape, 0.0, device=device, requires_grad=True)

Then, initialize a stochastic gradient descent (SGD) optimizer.

# The optimizer: SGD with momentum over deform_verts,
# with a learning rate of 1.0
optimizer = torch.optim.SGD([deform_verts], lr=1.0, momentum=0.9)
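For readers unfamiliar with momentum, here is a minimal pure-Python sketch of the update rule that SGD with momentum is generally understood to apply (velocity = momentum * velocity + gradient, then parameter -= lr * velocity). It is an illustration on a one-dimensional quadratic, with a hypothetical helper name and a smaller learning rate than the demo, not a reimplementation of torch.optim.SGD.

```python
def sgd_momentum_minimize(grad_fn, x0, lr=0.1, momentum=0.9, steps=300):
    """Minimize a 1-D function given its gradient, using SGD with momentum."""
    x, velocity = x0, 0.0
    for _ in range(steps):
        g = grad_fn(x)
        # Accumulate a running direction, then step along it
        velocity = momentum * velocity + g
        x = x - lr * velocity
    return x

# Gradient of f(x) = 0.5 * (x - 3)^2, whose minimum is at x = 3
x_final = sgd_momentum_minimize(lambda x: x - 3.0, x0=0.0)
print(x_final)
```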

Now, we will run a loop to learn the offset of each vertex in the mesh so that the predicted mesh gets closer to the target mesh at each optimization step. The loss functions used here are as follows:

  • chamfer_distance, the distance between the predicted (deformed) and target mesh, computed between point clouds sampled from the two surfaces. It takes every point into account: for each point in each cloud, chamfer_distance finds the nearest point in the other cloud and accumulates the squared distances.
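The nearest-neighbour idea behind the chamfer distance can be sketched in a few lines of NumPy. This version averages the squared distances in each direction and adds the two terms; the exact reduction used by PyTorch 3D's chamfer_distance is treated here as an assumption, and the function name and toy point clouds are hypothetical.

```python
import numpy as np

def chamfer_distance_np(x, y):
    """Symmetric chamfer distance between point clouds x (P1, 3) and y (P2, 3)."""
    # Pairwise squared distances, shape (P1, P2)
    d2 = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    # For each point in x, squared distance to its nearest neighbour in y (and vice versa)
    x_to_y = d2.min(axis=1).mean()
    y_to_x = d2.min(axis=0).mean()
    return x_to_y + y_to_x

x = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
y = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 0.0], [3.0, 0.0, 0.0]])
value = chamfer_distance_np(x, y)
print(value)
```

This brute-force version builds the full P1 x P2 distance matrix, which is fine for a few thousand points but is not how an optimized library would compute it.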

However, minimizing only the chamfer distance between the predicted and the target mesh will lead to a non-smooth shape. Hence, we add shape regularizers to the objective to enforce smoothness:

  • mesh_edge_loss, which minimizes the length of the edges in the predicted mesh.
  • mesh_normal_consistency, which enforces consistency across the normals of neighbouring faces.
  • mesh_laplacian_smoothing, which is the Laplacian regularizer.
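As a rough illustration of what two of these regularizers measure, the NumPy sketch below computes a mean squared edge length and a uniform Laplacian term (how far each vertex sits from the average of its neighbours). The function names are hypothetical and the normalization is an assumption; the library's own implementations may weight or reduce these terms differently.

```python
import numpy as np

def unique_edges(faces):
    """Undirected, de-duplicated edges (E, 2) of a triangle mesh with faces (F, 3)."""
    e = np.concatenate([faces[:, [0, 1]], faces[:, [1, 2]], faces[:, [2, 0]]], axis=0)
    return np.unique(np.sort(e, axis=1), axis=0)

def edge_length_loss(verts, faces, target_length=0.0):
    """Mean squared deviation of edge lengths from target_length."""
    edges = unique_edges(faces)
    lengths = np.linalg.norm(verts[edges[:, 0]] - verts[edges[:, 1]], axis=1)
    return ((lengths - target_length) ** 2).mean()

def uniform_laplacian_loss(verts, faces):
    """Mean distance between each vertex and the centroid of its neighbours."""
    edges = unique_edges(faces)
    n = len(verts)
    adj = np.zeros((n, n))
    adj[edges[:, 0], edges[:, 1]] = 1.0
    adj[edges[:, 1], edges[:, 0]] = 1.0
    deg = np.maximum(adj.sum(axis=1, keepdims=True), 1.0)  # guard isolated vertices
    neighbour_mean = adj @ verts / deg
    return np.linalg.norm(neighbour_mean - verts, axis=1).mean()

# A single 3-4-5 right triangle as a toy mesh
verts = np.array([[0.0, 0.0, 0.0], [3.0, 0.0, 0.0], [0.0, 4.0, 0.0]])
faces = np.array([[0, 1, 2]])
print(edge_length_loss(verts, faces), uniform_laplacian_loss(verts, faces))
```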

Initialize the number of iterations and weight of each loss function and then start a loop. 

# Number of optimization steps
Niter = 2000
# Weight for the chamfer loss
w_chamfer = 1.0 
# Weight for mesh edge loss
w_edge = 1.0 
# Weight for mesh normal consistency
w_normal = 0.01 
# Weight for mesh laplacian smoothing
w_laplacian = 0.1 
# Plot period for the losses
plot_period = 250

Now, start the loop. At each iteration, reset the optimizer's gradients and offset the vertices of the source mesh by deform_verts to get a new source mesh. Next, sample 5000 points from each of the new source mesh and the target mesh, calculate all the loss functions and combine them into a final loss as a weighted sum. Finally, compute the gradient of the loss and update the parameters, as shown in the code below.

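# Progress bar over the optimization steps
# (assumes tqdm was imported with the other packages in step 1)
loop = tqdm(range(Niter))

# Per-term loss histories, used for plotting after the optimization
chamfer_losses = []
edge_losses = []
normal_losses = []
laplacian_losses = []

for i in loop:
    # Reset the gradients accumulated in the previous step
    optimizer.zero_grad()
    # Deform the mesh
    new_src_mesh = src_mesh.offset_verts(deform_verts)
    # We sample 5k points from the surface of each mesh 
    sample_trg = sample_points_from_meshes(trg_mesh, 5000)
    sample_src = sample_points_from_meshes(new_src_mesh, 5000)
    # We compare the two sets of pointclouds by computing (a) the chamfer loss
    loss_chamfer, _ = chamfer_distance(sample_trg, sample_src)
    # and (b) the edge length of the predicted mesh
    loss_edge = mesh_edge_loss(new_src_mesh)
    # mesh normal consistency
    loss_normal = mesh_normal_consistency(new_src_mesh)
    # mesh laplacian smoothing
    loss_laplacian = mesh_laplacian_smoothing(new_src_mesh, method="uniform")
    # Weighted sum of the losses
    loss = loss_chamfer * w_chamfer + loss_edge * w_edge + loss_normal * w_normal + loss_laplacian * w_laplacian
    # Print the losses
    loop.set_description('total_loss = %.6f' % loss)
    # Save the losses for plotting
    chamfer_losses.append(float(loss_chamfer.detach().cpu()))
    edge_losses.append(float(loss_edge.detach().cpu()))
    normal_losses.append(float(loss_normal.detach().cpu()))
    laplacian_losses.append(float(loss_laplacian.detach().cpu()))
    # Plot mesh (plot_pointcloud is the plotting helper from the full demo)
    if i % plot_period == 0:
        plot_pointcloud(new_src_mesh, title="iter: %d" % i)
    # Optimization step: backpropagate and update deform_verts
    loss.backward()
    optimizer.step()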
The output at every 250 iterations is shown below.

  4. Visualize all the loss functions with respect to the number of iterations.

You can check the full demo here.

Demo – Bundle Adjustments

Bundle adjustment is a state estimation technique for jointly estimating the locations of points in the environment and the poses of the cameras that observed them. The points are estimated from camera images, and we want to recover not only where those points are in the world but also where each camera was, and where it was looking, when the image was taken. In short, we estimate the points and the cameras jointly so that the re-projection error, the gap between where points are observed and where the estimates project them, is minimized. The problem can be visualized as follows:

The picture below depicts the situation at the beginning of our optimization. The ground truth cameras are plotted in purple while the randomly initialized estimated cameras are plotted in orange: 

We seek to align the estimated (orange) cameras with the ground truth (purple) cameras, by minimizing the difference between pairs of relative cameras. Thus, the solution to the problem should look as follows: 

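Mathematically, the problem in this demo can be defined as minimizing the discrepancy between each given relative camera and the relative camera composed from the estimated absolute cameras:

$$g_1, \dots, g_N = \arg\min_{g_1, \dots, g_N} \sum_{g_{ij}} d\left(g_{ij},\; g_i^{-1} g_j\right)$$

where:

$g_1, g_2, \dots, g_N$ are the extrinsics (position and orientation in the world) of the N cameras.

$g_{ij}$ are the relative positions that map between the coordinate frames of randomly selected pairs of cameras $(i, j)$.

$d(g_i, g_j)$ is a suitable metric that compares the extrinsics of cameras $g_i$ and $g_j$.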
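The objective, comparing every observed relative transform with the one composed from the current absolute estimates, can be sketched with 4x4 rigid transforms in NumPy. The sketch assumes a row-vector convention (points are multiplied on the left of the matrix, translation in the last row) and uses a squared Frobenius norm as a stand-in for the metric d; all function names and camera values are hypothetical.

```python
import numpy as np

def rot_z(theta):
    """Rotation by theta radians about the z axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def make_transform(R, T):
    """4x4 rigid transform acting on row vectors: [x, 1] @ M = [x @ R + T, 1]."""
    M = np.eye(4)
    M[:3, :3] = R
    M[3, :3] = T
    return M

def relative(g_i, g_j):
    """Undo g_i, then apply g_j (written g_i^{-1} g_j in row-vector convention)."""
    return np.linalg.inv(g_i) @ g_j

def objective(g_abs, edges, g_rel):
    """Sum over edges of a stand-in metric d: squared Frobenius distance."""
    return sum(((relative(g_abs[i], g_abs[j]) - g_rel[k]) ** 2).sum()
               for k, (i, j) in enumerate(edges))

# Three hypothetical ground-truth cameras and the relative transforms between them
g_true = [make_transform(rot_z(0.0), [0.0, 0.0, 0.0]),
          make_transform(rot_z(0.5), [1.0, 0.0, 0.0]),
          make_transform(rot_z(1.0), [0.0, 2.0, 0.0])]
edges = [(0, 1), (1, 2), (0, 2)]
g_rel = [relative(g_true[i], g_true[j]) for i, j in edges]

loss_true = objective(g_true, edges, g_rel)

# Moving the whole world frame leaves every relative transform unchanged
shift = make_transform(rot_z(0.3), [5.0, -1.0, 2.0])
loss_shifted = objective([shift @ g for g in g_true], edges, g_rel)

# Perturbing a single camera is penalized
g_bad = list(g_true)
g_bad[1] = make_transform(rot_z(0.9), [1.0, 0.0, 0.0])
loss_bad = objective(g_bad, edges, g_rel)
print(loss_true, loss_shifted, loss_bad)
```

Because a global rigid motion of all cameras does not change the loss, relative measurements alone cannot fix the world frame; pinning one camera, as the camera_mask in the optimization code does for the first camera, is one way to remove that ambiguity.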

In this demo, we will learn to initialize a batch of Structure from Motion (SfM) cameras, set up loss functions for bundle adjustment and run an optimization loop using the cameras, transforms and so3 APIs of PyTorch 3D. The steps are as follows:

  1. Import all the required libraries and packages. The code snippet is available here.
  2. Fetch the utility Python scripts for plotting and the SE3 graph of cameras. The code snippet for this is available here.
  3. In practice, the camera extrinsics gij and gi are represented using objects from the SfMPerspectiveCameras class, initialized with the corresponding rotation and translation matrices R_absolute and T_absolute that define the extrinsic parameters g = (R, T); R ∈ SO(3); T ∈ R3. To ensure that R_absolute is a valid rotation matrix, we represent it using an exponential map (implemented with so3_exponential_map) of the axis-angle representation of the rotation, log_R_absolute. The code shown below loads the camera data: the ground-truth absolute cameras and the relative positions.
# load the SE3 graph of relative/absolute camera positions
camera_graph_file = './data/camera_graph.pth'
(R_absolute_gt, T_absolute_gt), \
    (R_relative, T_relative), \
    relative_edges = \
        torch.load(camera_graph_file)

# create the relative cameras
cameras_relative = SfMPerspectiveCameras(
    R = R_relative.to(device),
    T = T_relative.to(device),
    device = device,
)

# create the absolute ground truth cameras
cameras_absolute_gt = SfMPerspectiveCameras(
    R = R_absolute_gt.to(device),
    T = T_absolute_gt.to(device),
    device = device,
)

# the number of absolute camera positions
N = R_absolute_gt.shape[0]
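To see why an exponential map of an axis-angle vector always produces a valid rotation, the NumPy sketch below implements Rodrigues' formula for a single 3-vector. It is an illustration only: the function names are hypothetical, and it makes no attempt to match the batching or the matrix convention of PyTorch 3D's so3_exponential_map.

```python
import numpy as np

def hat(v):
    """Skew-symmetric matrix such that hat(v) @ w equals the cross product v x w."""
    x, y, z = v
    return np.array([[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]])

def so3_exp(log_rot, eps=1e-8):
    """Rodrigues' formula: axis-angle vector (3,) to rotation matrix (3, 3)."""
    theta = np.linalg.norm(log_rot)
    K = hat(log_rot)
    if theta < eps:
        # First-order approximation for very small rotations
        return np.eye(3) + K
    return (np.eye(3)
            + (np.sin(theta) / theta) * K
            + ((1.0 - np.cos(theta)) / theta ** 2) * (K @ K))

# Any unconstrained 3-vector maps to an orthogonal matrix with determinant 1
R = so3_exp(np.array([0.3, -1.2, 0.7]))
R_quarter_turn = so3_exp(np.array([0.0, 0.0, np.pi / 2]))
print(np.round(R_quarter_turn, 6))
```

This is what makes the parameterization convenient for gradient descent: the optimizer can update the three axis-angle numbers freely, and the resulting matrix never leaves SO(3).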
  4. Next, we will define the functions used in the optimization: one that calculates the camera distance and one that computes the relative cameras. The two functions are:

calc_camera_distance compares a pair of cameras. This function is important as it defines the loss that we are minimizing. The method utilizes the so3_relative_angle function from the SO3 API.

get_relative_camera computes the parameters of a relative camera that maps between a pair of absolute cameras. Here we utilize the compose and inverse class methods from the PyTorch3D Transforms API.

The code for it is shown below:

def calc_camera_distance(cam_1, cam_2):
    """
    Calculates the divergence of a batch of pairs of cameras cam_1, cam_2.
    The distance is composed of the cosine of the relative angle between 
    the rotation components of the camera extrinsics and the l2 distance
    between the translation vectors.
    """
    # rotation distance
    R_distance = (1.-so3_relative_angle(cam_1.R, cam_2.R, cos_angle=True)).mean()
    # translation distance
    T_distance = ((cam_1.T - cam_2.T)**2).sum(1).mean()
    # the final distance is the sum
    return R_distance + T_distance

def get_relative_camera(cams, edges):
    """
    For each pair of indices (i,j) in "edges" generate a camera
    that maps from the coordinates of the camera cams[i] to 
    the coordinates of the camera cams[j]
    """

    # first generate the world-to-view Transform3d objects of each 
    # camera pair (i, j) according to the edges argument
    trans_i, trans_j = [
        SfMPerspectiveCameras(
            R = cams.R[edges[:, i]],
            T = cams.T[edges[:, i]],
            device = device,
        ).get_world_to_view_transform()
        for i in (0, 1)
    ]
    # compose the relative transformation as g_i^{-1} g_j
    trans_rel = trans_i.inverse().compose(trans_j)
    # generate a camera from the relative transform
    matrix_rel = trans_rel.get_matrix()
    cams_relative = SfMPerspectiveCameras(
        R = matrix_rel[:, :3, :3],
        T = matrix_rel[:, 3, :3],
        device = device,
    )
    return cams_relative
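The camera distance used as the loss can be illustrated for a single pair of cameras in NumPy. The rotation term relies on the identity trace(R1 R2^T) = 1 + 2 cos(angle) for the relative rotation; the function names and the two toy cameras are assumptions for illustration, and the sketch handles one pair rather than a batch.

```python
import numpy as np

def rot_z(theta):
    """Rotation by theta radians about the z axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def relative_cos_angle(R1, R2):
    """Cosine of the rotation angle between R1 and R2."""
    return (np.trace(R1 @ R2.T) - 1.0) / 2.0

def camera_distance(R1, T1, R2, T2):
    """(1 - cos of relative angle) plus the squared l2 distance of the translations."""
    rotation_term = 1.0 - relative_cos_angle(R1, R2)
    translation_term = ((T1 - T2) ** 2).sum()
    return rotation_term + translation_term

# Two hypothetical cameras that differ by 0.5 rad about z and by (1, 2, 0) in translation
R1, T1 = rot_z(0.2), np.array([0.0, 0.0, 0.0])
R2, T2 = rot_z(0.7), np.array([1.0, 2.0, 0.0])
dist = camera_distance(R1, T1, R2, T2)
print(dist)
```

Using 1 - cos(angle) rather than the angle itself keeps the rotation term smooth near zero, which is convenient for gradient-based optimization.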
  5. Now, start the optimization of the absolute cameras. We use a stochastic gradient descent (SGD) optimizer with momentum and optimize over T_absolute and log_R_absolute. The code for this process is shown below.
# randomly initialize the absolute log-rotations and translations;
# the first camera is fixed to the identity pose
log_R_absolute = torch.randn(N, 3, dtype=torch.float32, device=device)
T_absolute = torch.randn(N, 3, dtype=torch.float32, device=device)
log_R_absolute[0, :] = 0.
T_absolute[0, :] = 0.
log_R_absolute.requires_grad = True
T_absolute.requires_grad = True

# mask that excludes the (already correct) first camera from the optimization
camera_mask = torch.ones(N, 1, dtype=torch.float32, device=device)
camera_mask[0] = 0.

# init the optimizer
optimizer = torch.optim.SGD([log_R_absolute, T_absolute], lr=.1, momentum=0.9)

# run the optimization
n_iter = 2000  # fix the number of iterations
for it in range(n_iter):
    # re-init the optimizer gradients
    optimizer.zero_grad()

    # compute the absolute camera rotations as 
    # an exponential map of the logarithms (=axis-angles)
    # of the absolute rotations
    R_absolute = so3_exponential_map(log_R_absolute * camera_mask)

    # get the current absolute cameras
    cameras_absolute = SfMPerspectiveCameras(
        R = R_absolute,
        T = T_absolute * camera_mask,
        device = device,
    )

    # compute the relative cameras as a composition of the absolute cameras
    cameras_relative_composed = \
        get_relative_camera(cameras_absolute, relative_edges)

    # compare the composed cameras with the ground truth relative cameras
    # camera_distance corresponds to $d$ from the description
    camera_distance = \
        calc_camera_distance(cameras_relative_composed, cameras_relative)

    # our loss function is the camera_distance
    camera_distance.backward()

    # apply the gradients
    optimizer.step()

    # plot and print status message
    if it % 200==0 or it==n_iter-1:
        status = 'iteration=%3d; camera_distance=%1.3e' % (it, camera_distance)
        plot_camera_scene(cameras_absolute, cameras_absolute_gt, status)

print('Optimization finished.')

You can check the full demo here.


In this article, we looked at PyTorch 3D through two demos: deforming a source mesh into a target mesh using the Meshes data structure, and optimizing camera positions with bundle adjustment. The demos are available at:

You can check other libraries dealing with 3D data, here.

Codes, Docs and Tutorials are available at:


Aishwarya Verma
A data science enthusiast and a post-graduate in Big Data Analytics. Creative and organized with an analytical bent of mind.
