A guide to self-supervised learning with graph data

Graph-structured data carries rich additional information, such as node attributes and node label information. Using this information, we have unprecedented opportunities to design advanced self-supervised pretext tasks.

In recent years, we have witnessed the emergence of self-supervised learning with various advancements. Many works have shown that self-supervised learning techniques perform strongly on image and natural language data. Self-supervised learning settings can also be used with graph data to generate accurate results in specific cases. In this article, we will discuss how and why we can apply self-supervised learning to graph data. The major points to be discussed in this article are listed below.

Table of Contents

  1. Why is Graph Data Used?
  2. Downstream Tasks with Graph Data 
  3. Self-Supervised Learning Strategies Using Graph Data
  4. Categorization of Self-Supervised Learning Methods for Graph Data

Let’s begin the discussion by understanding why and where the graph data is used.

Why is Graph Data Used?

Many studies find that graph-structured data is more complex than other data types. Image data lies on a fixed grid and text data is a simple sequence, but graph data is not restricted to such rigid structures. Each node in a graph can be considered a standalone instance, since nodes are usually associated with their own attributes and topological structure, whereas in other data types the entire structure is treated as a single data sample.

In terms of distribution, the nodes of a graph are interlinked and dependent on each other, while text or image samples are independent and identically distributed. Given all these complexities, adopting graph data for self-supervised learning is challenging, but approached in the right direction, the complexities become an opportunity: a rich source of information for designing pretext tasks from various perspectives.

Focusing on the nodes, we can exploit their features and topology. Since nodes depend on each other and on the graph, this dependence gives us new aspects to inspect, such as relations over a node pair or even a set of nodes. Unlike other data types, graphs carry much additional information, such as node attributes, structural information, and label information for labelled nodes. Using these sources of information, we have unprecedented opportunities to design advanced self-supervised pretext tasks.

Several works have attempted to adapt self-supervised learning to graph neural networks, showing how graph data can be used in self-supervised settings. Before applying graph data in any such setting, we need to know which tasks can be performed with it.

Downstream Tasks with Graph Data 

When talking about the downstream tasks using the graph data, we can divide them into three categories:

  • Node level tasks: Tasks focused on the properties of individual nodes. Node classification is a typical example: in a self-supervised learning setting, a few nodes are known along with their labels, and these are used to predict the labels of the remaining nodes.
  • Link level tasks: Tasks focused on learning representations of node pairs or properties of edges. Link prediction is an example: given two nodes, the task is to predict whether an edge exists between them.
  • Graph level tasks: Tasks focused on learning from multiple graphs. Graph regression is an example: in a self-supervised learning setting, a small number of graphs with known properties is used to train an encoder, which then infers the properties of graphs whose labels are unknown.
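As a toy illustration of a link-level task, a simple common-neighbours heuristic can score a candidate edge. The edge set and scoring function below are hypothetical examples for illustration, not a method from any specific paper.

```python
# Toy graph as a set of undirected edges (hypothetical example)
edges = {(0, 1), (1, 2), (2, 3), (0, 2)}

def neighbors(n):
    """Return the set of nodes adjacent to n."""
    return {v for u, v in edges if u == n} | {u for u, v in edges if v == n}

def common_neighbor_score(a, b):
    """Link-level score: how many neighbours do a and b share?"""
    return len(neighbors(a) & neighbors(b))

print(common_neighbor_score(0, 3))  # nodes 0 and 3 share neighbour 2 -> 1
```

A higher score suggests the two nodes are more likely to be connected; real link-prediction models replace this heuristic with learned node representations.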

We have discussed various downstream tasks which can be performed using the graph data. To perform these tasks, we are required to have a neural network that can deal with the graph data. Neural networks which can deal with graph data are called graph neural networks.

Graph Neural Network (GNN)

From the family of neural networks, those that operate on graph representations of data are graph neural networks. The basic computation of a graph neural network consists of two main components:

  • Aggregate Operation: Aggregate information from neighbourhoods.
  • Update Operation: Update node representation using the aggregated information from the aggregate operation.

Using the above components, each node receives information from its neighbourhood and updates its own representation; these updates over the nodes and edges of the graph can be considered the learning part of the graph neural network. Using a GNN, we can train a network on graph-structured data. To apply this in self-supervised learning settings, we need strategies that, when followed, yield good results.
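The aggregate and update operations can be sketched as a single mean-aggregation message-passing layer. The adjacency matrix, one-hot features, and ReLU update below are illustrative assumptions; real GNN layers vary in aggregation and normalization choices.

```python
import numpy as np

def gnn_layer(A, X, W):
    deg = A.sum(axis=1, keepdims=True)   # number of neighbours per node
    deg[deg == 0] = 1                    # avoid division by zero for isolated nodes
    agg = (A @ X) / deg                  # aggregate: mean over neighbour features
    return np.maximum(0.0, agg @ W)      # update: linear transform + ReLU

# Toy graph: node 0 connected to nodes 1 and 2
A = np.array([[0., 1., 1.],
              [1., 0., 0.],
              [1., 0., 0.]])
X = np.eye(3)          # one-hot node features
W = np.ones((3, 2))    # hypothetical weight matrix
H = gnn_layer(A, X, W)
print(H.shape)  # (3, 2)
```

Stacking several such layers lets information propagate beyond immediate neighbours, which is how deeper GNNs capture larger graph context.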

Self-Supervised Learning Strategies Using Graph Data

So far, we have discussed why we should use graph data with SSL and what kind of neural network can handle it. To obtain good results from self-supervised learning with graph data, we need a strong working strategy. Looking at different works in this field, we can divide the strategies into three categories.

  • Pre-training and fine-tuning: This can be considered the basic strategy for any self-supervised learning setting. The model is trained in two stages. In the first stage, an encoder is trained on a predefined pretext task, and the resulting pre-trained parameters are used to initialize the encoder. In the second, fine-tuning stage, the pre-trained encoder is fine-tuned together with a prediction head on the known downstream task. The below image is a representation of this strategy.

Image source

  • Joint learning: When a specific pretext task and downstream task are both known, we can train the encoder together with the prediction head on both tasks at once. This falls under multi-task learning, and jointly training the encoder and the prediction head is what makes it joint learning. The below image is a representation of this learning strategy.

Image source

  • Unsupervised Representation Learning: This strategy is similar to pre-training and fine-tuning because the first stage is the same. Here, however, the pre-trained parameters are frozen, so that during the downstream task only the model on top of the frozen representations is trained. The below image is a representation of this strategy.

Image source
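The three strategies differ mainly in when the encoder is trained and whether it stays trainable during the downstream task. The helper below is a hypothetical schedule summary, not real training code; the step strings stand in for actual training phases.

```python
# Hypothetical summary of the three training schedules.
def run_strategy(strategy):
    if strategy == "pretrain_finetune":
        return ["train encoder on pretext task",
                "fine-tune encoder + prediction head on downstream task"]
    if strategy == "joint":
        return ["train encoder + prediction head on pretext and downstream tasks jointly"]
    if strategy == "unsupervised_representation":
        return ["train encoder on pretext task",
                "freeze encoder",
                "train prediction head on downstream task only"]
    raise ValueError(f"unknown strategy: {strategy}")

print(run_strategy("unsupervised_representation"))
```

The key contrast: joint learning has a single stage, while the other two are staged, and only unsupervised representation learning keeps the encoder frozen downstream.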

In this section, we have seen the three major strategies that can be followed to apply self-supervised learning to graph data. Looking at prior work in this field, we find that methods using graph data under self-supervised learning settings fall into specific categories, which we discuss in the next section.

Categorization of Self-Supervised Learning Methods for Graph Data

Various works have addressed self-supervised learning with graph data. Surveying them, we can divide the methods in this field into three categories:

  • Contrastive Learning: Much work on contrastive learning exists in computer vision and natural language processing, and some of it also covers graph data. Generalizing over these works, the method generates multiple views of every instance through data augmentation of the graph. Two views of the same instance are treated as a positive pair, and two views of different instances as a negative pair. Contrastive learning then aims for strong agreement between positive pairs and weak agreement between negative pairs.

For example, on a given graph we can apply different transformations to obtain multiple views, then apply a set of encoders to generate a representation of each view; contrastive learning aims to maximize the mutual information between the two views. A summary view of this method is shown in the following image.

Image source 
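An InfoNCE-style objective is one common way to express "strong agreement between positive pairs". The implementation below is a minimal NumPy sketch, assuming z1[i] and z2[i] are embeddings of two augmented views of the same node; the temperature value and toy embeddings are illustrative.

```python
import numpy as np

def info_nce(z1, z2, tau=0.5):
    """InfoNCE-style loss; z1[i] and z2[i] are two views of the same node."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    sim = z1 @ z2.T / tau                        # cosine similarity of every pair
    sim = sim - sim.max(axis=1, keepdims=True)   # numerical stability
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))           # positives sit on the diagonal

z = np.array([[1., 0.], [0., 1.], [1., 1.]])
aligned = info_nce(z, z)                       # matched positive pairs
shuffled = info_nce(z, np.roll(z, 1, axis=0))  # mismatched positives
```

As expected, the loss is lower when the positive pairs are correctly matched than when the views are shuffled, which is exactly the agreement the objective enforces.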

  • Generative learning: Generative methods are based on traditional generative models, where embeddings that hold rich information are treated as natural self-supervision. In various studies, the prediction head works as a graph decoder, which can also be used for graph reconstruction. A summary view of this method is shown in the following image.

Image source

Works on generative methods focus on the information embedded in the graph, generally through tasks such as reconstruction, which exploit the properties of graph data as self-supervision signals.
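As a concrete sketch of reconstruction as a pretext task, a graph auto-encoder style decoder can score each edge as sigmoid(z_i · z_j) and compare the result against the adjacency matrix with binary cross-entropy. The shapes, toy graph, and embeddings below are illustrative assumptions.

```python
import numpy as np

def reconstruction_loss(Z, A, eps=1e-9):
    """Binary cross-entropy between decoded edge probabilities and adjacency A."""
    probs = 1.0 / (1.0 + np.exp(-(Z @ Z.T)))   # sigmoid(z_i . z_j) for every pair
    return -np.mean(A * np.log(probs + eps) + (1 - A) * np.log(1 - probs + eps))

A = np.array([[1., 1.],
              [1., 1.]])                 # fully connected toy graph (with self-loops)
Z_good = np.array([[2., 0.], [2., 0.]])  # similar embeddings -> high edge probability
Z_bad = np.array([[2., 0.], [-2., 0.]])  # dissimilar embeddings -> poor reconstruction
```

Minimizing this loss pushes connected nodes toward similar embeddings, so the reconstruction objective itself supplies the supervision signal.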

  • Predictive Learning: Many works focus on self-generating informative labels from the data as supervision. Labels can be obtained in the following ways:
  1. Node Property Prediction: Pre-computed node properties are used as self-supervised labels for prediction.
  2. Context-Based Prediction: Contextual information from the graph can be extracted to perform self-supervised learning. Information can be local or global.    
  3. Self Training: The predictive model is trained on pseudo-labels obtained from clustering or random assignment in an earlier stage.
  4. Domain Knowledge Based Prediction: Subject matter experts or tools can be used in advance to analyze the graph data so that informative labels can be obtained on the graph.
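Node property prediction is the easiest of these to make concrete: a pre-computed structural property such as node degree can serve as a free label. The toy edge list below is a hypothetical example.

```python
# Hypothetical toy edge list; node degree is a structural property we can
# compute for free and then use as a predictive pretext label.
edges = [(0, 1), (0, 2), (0, 3), (1, 2)]

def degree_labels(edge_list, n_nodes):
    """Return each node's degree, usable as a self-supervised label."""
    deg = [0] * n_nodes
    for u, v in edge_list:
        deg[u] += 1
        deg[v] += 1
    return deg

labels = degree_labels(edges, 4)  # [3, 2, 2, 1]
```

An encoder trained to predict these labels must learn structural features of the graph, which is the point of the pretext task.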

A summary view of this method is shown in the following image.

Image source

Final Words

In this article, we discussed how graph data can be used in a self-supervised learning setting, since graph data carries information that can help self-supervised learning in various respects. We also covered the downstream tasks that can be performed with graph data, along with the methods and strategies for applying self-supervised learning to it.

Yugesh Verma
Yugesh is a graduate in automobile engineering and worked as a data analyst intern. He completed several Data Science projects. He has a strong interest in Deep Learning and writing blogs on data science and machine learning.
