There has been a surge of advancements in the automated analysis of 3D data, driven by affordable LiDAR sensors, more efficient photogrammetry algorithms, and new neural network architectures. So much so that the number of papers on 3D data presented at vision conferences is now on par with those on images. While this rapid methodological development benefits the young field of deep learning for 3D, its fast pace comes with several shortcomings:
- Adding new datasets, tasks, or neural architectures to existing approaches is a complicated endeavour, sometimes equivalent to reimplementing from scratch.
- Handling large 3D datasets requires a significant time investment and is prone to many implementation pitfalls.
- There is no standard approach for inference schemes and performance metrics, which makes assessing and reproducing new algorithms’ intrinsic performance difficult.
Torch-Points3D aims to solve these issues. It is an open-source framework designed to make it easier to build deep neural networks for point cloud-based computer vision. It provides an intuitive interface to most open-access 3D datasets, implementations of many state-of-the-art networks, data augmentation schemes, and validated performance metrics.
Torch-Points3D has a modular design, and its highly customizable components can be plugged into one another through a unified system of configuration files. It aims to make experiments easy to standardize, ensure reproducibility, and help evaluate the performance of different approaches fairly. As the developers put it, “the purpose of our framework is to become for 3D point clouds what torchvision or PyTorch-geometric have become for images and graphs respectively”. The framework is built on PyTorch Geometric and Facebook Hydra. Like PyTorch’s data loaders, Torch-Points3D uses background processes to speed up data processing: it off-loads the radius search and subsampling operations to background workers running on the CPU.
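To give a feel for this configuration-driven design, the sketch below composes a small config with OmegaConf, the library underlying Hydra. The group and key names here are purely illustrative, not the framework's exact schema; the real ShapeNet configuration used later in this article follows the same pattern.

from omegaconf import OmegaConf

# Hypothetical, simplified configuration: each component (dataset, model,
# training loop) is declared in YAML and can be swapped without touching code.
config = OmegaConf.create("""
data:
  class: shapenet.ShapeNetDataset
  task: segmentation
model:
  name: KPConv
  architecture: unet
training:
  batch_size: 16
  num_workers: 4        # background CPU workers for subsampling / neighbour search
""")

# Individual values can be overridden in code or from the command line.
config.training.batch_size = 32
print(OmegaConf.to_yaml(config))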
Benchmark figure: throughput in thousands of points processed per second (kpts/s)
Functionalities/operations supported by Torch-Points3D
You can check out all supported tasks and algorithms here.

Supported datasets
Torch-Points3D supports multiple 3D datasets, with automatic data download, pre-processing, and result submission.
You can find a comprehensive list of all supported datasets here.
Installation and Requirements
Requirements
- CUDA 10 or higher (if you want the GPU version)
- Python 3.7 or higher + headers (python-dev)
- PyTorch 1.7 or higher
- A sparse convolution backend (optional), such as torchsparse
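Before installing, it can help to confirm that your environment meets these requirements. The following is a minimal sanity check using only standard Python and PyTorch calls (nothing Torch-Points3D specific):

import sys
import torch

# Python 3.7+, PyTorch 1.7+ and a CUDA-enabled build are expected.
print("Python:", sys.version.split()[0])
print("PyTorch:", torch.__version__)
print("CUDA build:", torch.version.cuda)
print("GPU available:", torch.cuda.is_available())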
Run the following code before installing Torch-Points3D to ensure that you don’t run into a CUDA version mismatch error.
import torch

def format_pytorch_version(version):
    return version.split('+')[0]

TORCH_version = torch.__version__
TORCH = format_pytorch_version(TORCH_version)

def format_cuda_version(version):
    return 'cu' + version.replace('.', '')

CUDA_version = torch.version.cuda
CUDA = format_cuda_version(CUDA_version)

!pip install torch-scatter -f https://pytorch-geometric.com/whl/torch-${TORCH}+${CUDA}.html
!pip install torch-sparse -f https://pytorch-geometric.com/whl/torch-${TORCH}+${CUDA}.html
!pip install torch-cluster -f https://pytorch-geometric.com/whl/torch-${TORCH}+${CUDA}.html
!pip install torch-spline-conv -f https://pytorch-geometric.com/whl/torch-${TORCH}+${CUDA}.html
!pip install torch-geometric
Install Torch-Points3D from PyPI
!pip install torch-points3d
For instructions on how to install using other methods see this.
Install PyVista for visualizing point clouds
!pip install pyvista
Creating a KPConv Segmentation model with Torch-Points3D
- Import necessary libraries
import os

# OmegaConf is used for dealing with config files.
from omegaconf import OmegaConf
import pyvista as pv
import torch
import numpy as np
- We are going to use the Torch-Points3D version of ShapeNet. Create the config for the dataset and download it using the torch_points3d.datasets.segmentation.ShapeNetDataset class.
DIR = os.getcwd()  # root folder used for the dataset download (adjust as needed)
CATEGORY = "All"
USE_NORMALS = True

shapenet_yaml = """
class: shapenet.ShapeNetDataset
task: segmentation
dataroot: %s
normal: %r                        # Use normal vectors as features
first_subsampling: 0.02           # Grid size of the input data
pre_transforms:                   # Offline transforms, done only once
  - transform: NormalizeScale
  - transform: GridSampling3D
    params:
      size: ${first_subsampling}
train_transforms:                 # Data augmentation pipeline
  - transform: RandomNoise
    params:
      sigma: 0.01
      clip: 0.05
  - transform: RandomScaleAnisotropic
    params:
      scales: [0.9, 1.1]
  - transform: AddOnes
  - transform: AddFeatsByKeys
    params:
      list_add_to_x: [True]
      feat_names: ["ones"]
      delete_feats: [True]
test_transforms:
  - transform: AddOnes
  - transform: AddFeatsByKeys
    params:
      list_add_to_x: [True]
      feat_names: ["ones"]
      delete_feats: [True]
""" % (os.path.join(DIR, "data"), USE_NORMALS)

params = OmegaConf.create(shapenet_yaml)
if CATEGORY != "All":
    params.category = CATEGORY

from torch_points3d.datasets.segmentation import ShapeNetDataset
dataset = ShapeNetDataset(params)
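Before visualizing anything, it can be useful to confirm what was downloaded. Here is a small inspection sketch that relies only on attributes used elsewhere in this article (train_dataset and class_to_segments):

# Number of training samples and the category -> part-label mapping.
print("Train samples:", len(dataset.train_dataset))
print("Categories:", list(dataset.class_to_segments.keys()))

# Peek at a single sample: point positions and per-point part labels.
first_sample = dataset.train_dataset[0]
print(first_sample.pos.shape, first_sample.y.shape)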
- Visualize some random point clouds from the dataset using pyvista.
objectid_1 = 9
objectid_2 = 82
objectid_3 = 95
samples = [objectid_1, objectid_2, objectid_3]

p = pv.Plotter(notebook=True, shape=(1, len(samples)), window_size=[1024, 412])
for i in range(len(samples)):
    p.subplot(0, i)
    sample = dataset.train_dataset[samples[i]]
    point_cloud = pv.PolyData(sample.pos.numpy())
    point_cloud['y'] = sample.y.numpy()
    p.add_points(point_cloud, show_scalar_bar=False, point_size=3)
    p.camera_position = [-1, 5, -10]
p.show()
- Create a multi-headed segmentation module to use with the KPConv network.
from torch_points3d.core.common_modules import MLP, UnaryConv

class MultiHeadClassifier(torch.nn.Module):
    """Allows segregated segmentation in case the category of an object is known.
    This is the case in ShapeNet for example.

    Parameters
    ----------
    in_features :
        size of the input channel
    cat_to_seg :
        category-to-segment map, for example {'Airplane': [0, 1, 2], 'Table': [3, 4]}
    """

    def __init__(self, in_features, cat_to_seg, dropout_proba=0.5, bn_momentum=0.1):
        super().__init__()
        self._cat_to_seg = {}
        self._num_categories = len(cat_to_seg)
        self._max_seg_count = 0
        self._max_seg = 0
        self._shifts = torch.zeros((self._num_categories,), dtype=torch.long)
        for i, seg in enumerate(cat_to_seg.values()):
            self._max_seg_count = max(self._max_seg_count, len(seg))
            self._max_seg = max(self._max_seg, max(seg))
            self._shifts[i] = min(seg)
            self._cat_to_seg[i] = seg

        self.channel_rasing = MLP(
            [in_features, self._num_categories * in_features], bn_momentum=bn_momentum, bias=False
        )
        if dropout_proba:
            self.channel_rasing.add_module("Dropout", torch.nn.Dropout(p=dropout_proba))

        self.classifier = UnaryConv((self._num_categories, in_features, self._max_seg_count))
        self._bias = torch.nn.Parameter(torch.zeros(self._max_seg_count,))

    def forward(self, features, category_labels, **kwargs):
        assert features.dim() == 2
        self._shifts = self._shifts.to(features.device)
        in_dim = features.shape[-1]
        features = self.channel_rasing(features)
        features = features.reshape((-1, self._num_categories, in_dim))
        features = features.transpose(0, 1)                 # [num_categories, num_points, in_dim]
        features = self.classifier(features) + self._bias   # [num_categories, num_points, max_seg]

        ind = category_labels.unsqueeze(-1).repeat(1, 1, features.shape[-1]).long()
        logits = features.gather(0, ind).squeeze(0)
        softmax = torch.nn.functional.log_softmax(logits, dim=-1)

        output = torch.zeros(logits.shape[0], self._max_seg + 1).to(features.device)
        cats_in_batch = torch.unique(category_labels)
        for cat in cats_in_batch:
            cat_mask = category_labels == cat
            seg_indices = self._cat_to_seg[cat.item()]
            probs = softmax[cat_mask, : len(seg_indices)]
            output[cat_mask, seg_indices[0] : seg_indices[-1] + 1] = probs
        return output
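As a quick sanity check of the classifier on its own, here is a hypothetical sketch using dummy features and the toy category-to-segment map from the docstring (the shapes and values are made up purely for illustration):

# Toy example: 6 points, 8-dimensional features, two known categories.
cat_to_seg = {"Airplane": [0, 1, 2], "Table": [3, 4]}
head = MultiHeadClassifier(in_features=8, cat_to_seg=cat_to_seg).eval()

features = torch.rand(6, 8)                          # [num_points, in_features]
category_labels = torch.tensor([0, 0, 0, 1, 1, 1])   # category index for each point

with torch.no_grad():
    out = head(features, category_labels)
print(out.shape)  # [num_points, max_seg + 1] -> torch.Size([6, 5])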
- Create a KPConv backbone model using the KPConv method. You can learn more about the available models here.
from torch_points3d.applications.kpconv import KPConv

class PartSegKPConv(torch.nn.Module):
    def __init__(self, cat_to_seg):
        super().__init__()
        self.unet = KPConv(
            architecture="unet",
            input_nc=USE_NORMALS * 3,
            num_layers=4,
            in_grid_size=0.02
        )
        self.classifier = MultiHeadClassifier(self.unet.output_nc, cat_to_seg)

    @property
    def conv_type(self):
        """Needed by the dataset to infer which batch collate should be used."""
        return self.unet.conv_type

    def get_batch(self):
        return self.batch

    def get_output(self):
        """Needed by the tracker to get access to the outputs of the network."""
        return self.output

    def get_labels(self):
        """Needed by the tracker in order to access ground truth labels."""
        return self.labels

    def get_current_losses(self):
        """Entry point for the tracker to grab the loss."""
        return {"loss_seg": float(self.loss_seg)}

    def forward(self, data):
        self.labels = data.y
        self.batch = data.batch

        # Forward through unet and classifier
        data_features = self.unet(data)
        self.output = self.classifier(data_features.x, data.category)

        # Set loss for the backward pass
        self.loss_seg = torch.nn.functional.nll_loss(self.output, self.labels)
        return self.output

    def get_spatial_ops(self):
        return self.unet.get_spatial_ops()

    def backward(self):
        self.loss_seg.backward()

model = PartSegKPConv(dataset.class_to_segments)
- Create the data loaders and enable the CPU pre-computation of spatial operations by setting the precompute_multi_scale parameter to True.
NUM_WORKERS = 4
BATCH_SIZE = 16

dataset.create_dataloaders(
    model,
    batch_size=BATCH_SIZE,
    num_workers=NUM_WORKERS,
    shuffle=True,
    precompute_multi_scale=True
)

sample = next(iter(dataset.train_dataloader))
sample.keys
- The sample contains the pre-computed spatial information in the multiscale (encoder side) and upsample (decoder side) attributes. sample.multiscale holds 10 different versions of the input batch; each version contains the location of the points in pos as well as the neighbors of these points in the previous point cloud.
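Before plotting, a quick way to see the effect of the successive subsamplings is to print the number of points at each pre-computed scale. This small sketch relies only on the pos attribute used in the visualization code below:

# Number of points per pre-computed scale, for the whole batch.
for i, scale in enumerate(sample.multiscale):
    print("Scale {}: {} points".format(i, scale.pos.shape[0]))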
Let’s take a look at the points coming out of each downsampling layer.
sample_in_batch = 0
ms_data = sample.multiscale
num_downsize = int(len(ms_data) / 2)

p = pv.Plotter(notebook=True, shape=(1, num_downsize), window_size=[1024, 256])
for i in range(0, num_downsize):
    p.subplot(0, i)
    pos = ms_data[2 * i].pos[ms_data[2 * i].batch == sample_in_batch].numpy()
    point_cloud = pv.PolyData(pos)
    point_cloud['y'] = pos[:, 1]
    p.add_points(point_cloud, show_scalar_bar=False, point_size=3)
    p.add_text("Layer {}".format(i + 1), font_size=10)
    p.camera_position = [-1, 5, -10]
p.show()
- Train the model
from tqdm.auto import tqdm
import time

class Trainer:
    def __init__(self, model, dataset, num_epoch=50, device=torch.device('cuda')):
        self.num_epoch = num_epoch
        self._model = model
        self._dataset = dataset
        self.device = device

    def fit(self):
        self.optimizer = torch.optim.Adam(self._model.parameters(), lr=0.001)
        self.tracker = self._dataset.get_tracker(False, True)

        for i in range(self.num_epoch):
            print("=========== EPOCH %i ===========" % i)
            time.sleep(0.5)
            self.train_epoch()
            self.tracker.publish(i)
            self.test_epoch()
            self.tracker.publish(i)

    def train_epoch(self):
        self._model.to(self.device)
        self._model.train()
        self.tracker.reset("train")
        train_loader = self._dataset.train_dataloader
        iter_data_time = time.time()
        with tqdm(train_loader) as tq_train_loader:
            for i, data in enumerate(tq_train_loader):
                t_data = time.time() - iter_data_time
                iter_start_time = time.time()
                self.optimizer.zero_grad()
                data.to(self.device)
                self._model.forward(data)
                self._model.backward()
                self.optimizer.step()
                if i % 10 == 0:
                    self.tracker.track(self._model)

                tq_train_loader.set_postfix(
                    **self.tracker.get_metrics(),
                    data_loading=float(t_data),
                    iteration=float(time.time() - iter_start_time),
                )
                iter_data_time = time.time()

    def test_epoch(self):
        self._model.to(self.device)
        self._model.eval()
        self.tracker.reset("test")
        test_loader = self._dataset.test_dataloaders[0]
        iter_data_time = time.time()
        with tqdm(test_loader) as tq_test_loader:
            for i, data in enumerate(tq_test_loader):
                t_data = time.time() - iter_data_time
                iter_start_time = time.time()
                data.to(self.device)
                self._model.forward(data)
                self.tracker.track(self._model)

                tq_test_loader.set_postfix(
                    **self.tracker.get_metrics(),
                    data_loading=float(t_data),
                    iteration=float(time.time() - iter_start_time),
                )
                iter_data_time = time.time()

trainer = Trainer(model, dataset)
trainer.fit()
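Once training finishes, you will likely want to keep the learned weights. Here is a minimal sketch using plain PyTorch serialization; the checkpoint file name is arbitrary, and Torch-Points3D's own training scripts provide their own checkpointing if you use them instead.

# Save the trained weights with standard PyTorch serialization.
checkpoint_path = "part_seg_kpconv.pt"   # arbitrary file name, chosen for illustration
torch.save(model.state_dict(), checkpoint_path)

# Later, rebuild the model and reload the weights for inference.
model = PartSegKPConv(dataset.class_to_segments)
model.load_state_dict(torch.load(checkpoint_path))
model.eval()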
Last Epoch (Endnote)
In this article, we discussed Torch-Points3D, a flexible and powerful framework that aims to make deep learning on 3D data both more accessible and reproducible. It is built on PyTorch Geometric and Facebook Hydra, has a modular design that facilitates easy experimentation, and comes with many datasets and models built in. As per the paper, the developers are currently working on a high-level API for pre-trained, self-supervised, self-trained, and unsupervised deep learning approaches operating on 3D point clouds.
For the official code, documentation, papers, and tutorials, see: