
Hands-On Guide to PyTorch Geometric (With Python Code)


Released under the MIT license and built on PyTorch, PyTorch Geometric (PyG) is a Python framework for deep learning on irregular structures such as graphs, point clouds and manifolds, a.k.a. Geometric Deep Learning, and contains many relational learning and 3D data processing methods. The Graph Neural Network (GNN) is one of the most widely used representation learning methods, but implementing it efficiently is challenging because GPU throughput has to be achieved on highly sparse and irregular data of varying sizes. PyG overcomes this bottleneck by providing dedicated CUDA kernels for sparse data and mini-batch handlers for inputs of varying size. Methods implemented in the PyG framework are supported on both CPU and GPU.

PyTorch Geometric was presented as a workshop paper at ICLR 2019, titled "Fast Graph Representation Learning with PyTorch Geometric". The framework was developed by Matthias Fey and Jan Eric Lenssen from TU Dortmund University.

Overview of PyTorch Geometric

In PyG, a graph is represented as G = (X, (I, E)), where X ∈ ℝ^(N×F) is the node feature matrix, N is the number of nodes and F the number of features per node. The tuple (I, E) is the sparse adjacency tuple of E edges: I ∈ ℕ^(2×E) encodes the edge indices in COOrdinate (COO) format and E ∈ ℝ^(E×D) holds the D-dimensional edge features. All user-facing APIs are inspired by the PyTorch framework itself, so that using PyG feels familiar.
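
To make this notation concrete, the following minimal sketch (a toy graph with N = 3 nodes, E = 4 edges, F = 1 node feature and D = 2 edge features, all values chosen arbitrarily) shows how X, I and E map onto the fields of PyG's Data object. The same object is built step by step in the Basics section below.

 import torch
 from torch_geometric.data import Data
 # X: node feature matrix of shape [N, F] (N = 3 nodes, F = 1 feature each)
 x = torch.tensor([[-1.0], [0.0], [1.0]])
 # I: edge indices in COO format, shape [2, E] (E = 4 edges)
 edge_index = torch.tensor([[0, 1, 1, 2],
                            [1, 0, 2, 1]], dtype=torch.long)
 # E: edge feature matrix of shape [E, D] (D = 2 arbitrary features per edge)
 edge_attr = torch.rand(4, 2)
 graph = Data(x=x, edge_index=edge_index, edge_attr=edge_attr)
 print(graph)  # Data(x=[3, 1], edge_index=[2, 4], edge_attr=[4, 2])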

Functionalities provided by PyG:

  • Neighbourhood Aggregation
  • Global Pooling
  • Hierarchical Pooling
  • Mini-Batch Handling
  • Processing of Datasets

You can check all the algorithms supported by PyTorch Geometric here.
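
As a rough illustration of the first of these functionalities, neighbourhood aggregation, the sketch below builds a hypothetical layer on top of PyG's MessagePassing base class. It simply replaces every node's features with the mean of its neighbours' features; it is not one of the built-in operators.

 import torch
 from torch_geometric.nn import MessagePassing
 class MeanNeighbourhood(MessagePassing):
     # hypothetical example layer: each node's output is the mean of its neighbours' features
     def __init__(self):
         super(MeanNeighbourhood, self).__init__(aggr='mean')  # mean neighbourhood aggregation
     def forward(self, x, edge_index):
         # x: [num_nodes, num_features], edge_index: [2, num_edges] in COO format
         return self.propagate(edge_index, x=x)
     def message(self, x_j):
         # x_j holds the features of the source node of every edge
         return x_j
 # usage on a small toy graph
 edge_index = torch.tensor([[0, 1, 1, 2],
                            [1, 0, 2, 1]], dtype=torch.long)
 x = torch.tensor([[-1.0], [0.0], [1.0]])
 print(MeanNeighbourhood()(x, edge_index))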

Requirements & Installation

Install all the requirements of PyTorch Geometric and then install it via PyPI.

  • PyTorch >= 1.4.0

    To check the installed PyTorch version, run:

!python -c "import torch; print(torch.__version__)"

  • Check the CUDA version your PyTorch installation was built with:

!python -c "import torch; print(torch.version.cuda)"

  • Install the dependencies:

Replace ${TORCH} with the PyTorch version and ${CUDA} with the CUDA version you are using (a fully substituted example appears at the end of this installation list). Installation might take some time.

 !pip install torch-scatter -f https://pytorch-geometric.com/whl/torch-${TORCH}+${CUDA}.html
 !pip install torch-sparse -f https://pytorch-geometric.com/whl/torch-${TORCH}+${CUDA}.html
 !pip install torch-cluster -f https://pytorch-geometric.com/whl/torch-${TORCH}+${CUDA}.html
 !pip install torch-spline-conv -f https://pytorch-geometric.com/whl/torch-${TORCH}+${CUDA}.html 
  • Install PyG:

!pip install torch-geometric

For installing from other sources, refer here.
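
For instance, assuming (purely as an example) PyTorch 1.8.0 with CUDA 10.2 as reported by the two checks above, ${TORCH} becomes 1.8.0 and ${CUDA} becomes cu102 (use cpu for a CPU-only install), so the first command expands to:

 !pip install torch-scatter -f https://pytorch-geometric.com/whl/torch-1.8.0+cu102.html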

Basics of PyTorch Geometric

  1. Data Handling

 The first example creates an unweighted, undirected graph with three nodes and four edges. Each node carries exactly one feature, as shown below:

 #import the libraries
 import torch
 from torch_geometric.data import Data
 #making the edges
 #edge_index defines the source and target nodes of all edges; note it is not a list of index tuples
 edge_index = torch.tensor([[0, 1, 1, 2],
                            [1, 0, 2, 1]], dtype=torch.long)
 #making nodes
 #Node feature matrix with shape [num_nodes, num_node_features]
 x = torch.tensor([[-1], [0], [1]], dtype=torch.float)
 data = Data(x=x, edge_index=edge_index) 

    If you prefer to define edge_index as a list of index tuples instead, write it as:

 edge_index = torch.tensor([[0, 1],
                            [1, 0],
                            [1, 2],
                            [2, 1]], dtype=torch.long)

    and then transpose it and call contiguous() when passing it to the Data constructor, as shown below:

    data = Data(x=x, edge_index=edge_index.t().contiguous())

    You can check out all the utilities of data handling here.
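
Besides storing the tensors, the Data object exposes several convenience utilities. A minimal sketch applied to the graph created above (the values in the comments follow from that three-node example):

 print(data.keys)                        # attributes stored in the object, e.g. ['x', 'edge_index']
 print(data.num_nodes)                   # 3
 print(data.num_edges)                   # 4
 print(data.num_node_features)           # 1
 print(data.contains_isolated_nodes())   # False
 print(data.is_undirected())             # True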

  2. Common Benchmark Datasets

PyG contains many benchmark datasets, e.g., all Planetoid datasets (Cora, Citeseer, Pubmed), all graph classification datasets from http://graphkernels.cs.tu-dortmund.de together with their cleaned versions, the QM7 and QM9 datasets, and 3D mesh/point cloud datasets such as FAUST, ModelNet10/40 and ShapeNet. An example of loading a benchmark dataset is shown below:

 from torch_geometric.datasets import TUDataset
 dataset = TUDataset(root='/tmp/ENZYMES', name='ENZYMES') 

    You can check all the functionalities of benchmark datasets in PyG here  or here.
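
Once loaded, the dataset behaves like a regular Python sequence of Data objects. The short sketch below shows how to query its size and classes and how to shuffle and split it (the numbers in the comments are the values reported for ENZYMES in the PyG documentation):

 print(len(dataset))          # 600 graphs
 print(dataset.num_classes)   # 6
 print(dataset.num_node_features)
 print(dataset[0])            # the first graph of the dataset
 dataset = dataset.shuffle()
 train_dataset = dataset[:540]
 test_dataset = dataset[540:]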

  3. Mini-Batches

PyG provides torch_geometric.data.DataLoader for merging data objects into a mini-batch. An example is shown below:

 from torch_geometric.datasets import TUDataset
 from torch_geometric.data import DataLoader
 dataset = TUDataset(root='/tmp/ENZYMES', name='ENZYMES', use_node_attr=True)
 loader = DataLoader(dataset, batch_size=32, shuffle=True)
 for batch in loader:
     print(batch)
     print(batch.num_graphs) 

You can learn more about it from here.
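
Internally, the DataLoader stacks the graphs of a mini-batch into one large, disconnected graph and adds a batch vector that assigns every node to its graph. Graph-level pooling operators consume this vector; a minimal sketch using the built-in global_mean_pool to obtain one embedding per graph:

 from torch_geometric.nn import global_mean_pool
 for batch in loader:
     # batch.x: node features of all graphs in the mini-batch stacked along dimension 0
     # batch.batch: vector mapping every node to its graph index in [0, num_graphs - 1]
     graph_embeddings = global_mean_pool(batch.x, batch.batch)
     print(graph_embeddings.size())  # [batch.num_graphs, num_node_features]
     break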

  4. Data Transforms

PyG provides a data transformation utility whose input is a Data object and whose output is a transformed Data object. Transforms can be chained via torch_geometric.transforms.Compose and are applied either before a processed dataset is saved to disk (pre_transform) or before a graph in the dataset is accessed (transform).

As an example, we take the ShapeNet dataset and generate nearest-neighbour graphs from its point clouds via a pre_transform:

 import torch_geometric.transforms as T
 from torch_geometric.datasets import ShapeNet
 dataset = ShapeNet(root='/tmp/ShapeNet', categories=['Airplane'],
                     pre_transform=T.KNNGraph(k=6))
 dataset[0] 
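
Since the paragraph above mentions Compose, here is a hedged variant of the same call that chains two pre-processing transforms and additionally applies a per-access transform; NormalizeScale and RandomTranslate are used purely as illustrative choices. Note that if the dataset was already processed with a different pre_transform, PyG will not re-process it unless the root folder is cleared.

 import torch_geometric.transforms as T
 from torch_geometric.datasets import ShapeNet
 # pre_transform: applied once, before the processed dataset is saved to disk
 # transform: applied every time a graph is accessed
 dataset = ShapeNet(root='/tmp/ShapeNet', categories=['Airplane'],
                    pre_transform=T.Compose([T.NormalizeScale(), T.KNNGraph(k=6)]),
                    transform=T.RandomTranslate(0.01))
 dataset[0]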

You can learn more about transforms here.

  5. Learning Methods on Graphs

This section builds a graph neural network out of simple Graph Convolutional Network (GCN) layers. The whole experiment is based on the Cora dataset.

  1. Import the Cora dataset.
 from torch_geometric.datasets import Planetoid
 dataset = Planetoid(root='/tmp/Cora', name='Cora')
 print(f'Dataset: {dataset}:')
 print('======================')
 print(f'Number of graphs: {len(dataset)}')
 print(f'Number of features: {dataset.num_features}')
 print(f'Number of classes: {dataset.num_classes}') 
  2. Calculate statistics on the dataset and visualize it.
 data = dataset[0]
 # Gather some statistics about the graph.
 print(f'Number of nodes: {data.num_nodes}')
 print(f'Number of edges: {data.num_edges}')
 print(f'Average node degree: {data.num_edges / data.num_nodes:.2f}')
 print(f'Number of training nodes: {data.train_mask.sum()}')
 print(f'Training node label rate: {int(data.train_mask.sum()) / data.num_nodes:.2f}')
 print(f'Contains isolated nodes: {data.contains_isolated_nodes()}')
 print(f'Contains self-loops: {data.contains_self_loops()}')
 print(f'Is undirected: {data.is_undirected()}')
 from torch_geometric.utils import to_networkx
 G = to_networkx(data, to_undirected=True)
 # visualize() is a helper plotting function; see the Colab notebook mentioned in the endnotes
 visualize(G, color=data.y) 

The output is a plot of the Cora graph with nodes coloured by their class labels.

  3. Create a two-layer GCN network.
 import torch
 import torch.nn.functional as F
 from torch_geometric.nn import GCNConv
 class Net(torch.nn.Module):
     def __init__(self):
         super(Net, self).__init__()
         self.conv1 = GCNConv(dataset.num_node_features, 16)
         self.conv2 = GCNConv(16, dataset.num_classes)
     def forward(self, data):
         x, edge_index = data.x, data.edge_index
         x = self.conv1(x, edge_index)
         x = F.relu(x)
         x = F.dropout(x, training=self.training)
         x = self.conv2(x, edge_index)
         return F.log_softmax(x, dim=1) 

The network above contains two GCNConv layers, which are applied in the forward pass of the model. Here, we use ReLU as the intermediate non-linearity and finally output a log-softmax distribution over the number of classes.

  4. Let's train this model on the training nodes for 200 epochs.
 device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
 model = Net().to(device)
 data = dataset[0].to(device)
 optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=5e-4)
 model.train()
 for epoch in range(200):
     optimizer.zero_grad()
     out = model(data)
     loss = F.nll_loss(out[data.train_mask], data.y[data.train_mask])
     loss.backward()
     optimizer.step() 
  5. Evaluate the model on the test data.
 model.eval()
 _, pred = model(data).max(dim=1)
 correct = int(pred[data.test_mask].eq(data.y[data.test_mask]).sum().item())
 acc = correct / int(data.test_mask.sum())
 print('Accuracy: {:.4f}'.format(acc)) 

    You can learn more about creating Graph Neural Networks in PyTorch Geometric here.

You can check out other examples here.

Conclusion

This post discussed PyTorch Geometric, a framework for fast representation learning on graphs, point clouds, and manifolds. It is built upon PyTorch, is easy to use, and provides a wide range of Geometric Deep Learning methods, an easy-to-use mini-batch loader, multi-GPU support, benchmark datasets, and data transforms for arbitrary graphs and point clouds.

Official code, documentation and tutorials are available here.


Aishwarya Verma

A data science enthusiast and a post-graduate in Big Data Analytics. Creative and organized with an analytical bent of mind.