In recent years, self-supervised learning has advanced rapidly. Many studies report that self-supervised techniques perform strongly on image and natural language data. Self-supervised learning can also be applied to graph data in some settings to produce accurate results. In this article, we will discuss how and why self-supervised learning can be applied to graph data. The major points to be discussed in this article are listed below.
Table of Contents
- Why is Graph Data Used?
- Downstream Tasks with Graph Data
- Self-Supervised Learning Strategies Using Graph Data
- Categorization of Self-Supervised Learning Methods for Graph Data
Let’s begin the discussion by understanding why and where the graph data is used.
Why is Graph Data Used?
In many studies, graph-structured data is described as more complex than other types of data. Image data lies on a fixed grid and text data is a simple sequence, whereas graph data is not restricted to such rigid structures. Each node in a graph can be treated as a standalone data instance, because nodes usually come with their own attributes and topological structure. In other data types, by contrast, the entire structure (a whole image or a whole sentence) is treated as a single data sample.
In terms of distribution, the nodes (instances) in graph data are interlinked and dependent on each other, whereas text or image samples are usually assumed to be independent and identically distributed. These complexities make adopting self-supervised learning for graph data challenging, but they are also an opportunity: they are a rich source of information that can be used to design pretext tasks from various perspectives.
At the node level, we can focus on properties such as features and topology. Since nodes depend on each other and on the graph as a whole, this dependency gives us further aspects to inspect, such as the relationship within a node pair or even a set of nodes. Unlike other data types, graphs carry much additional information, such as node attributes, structural information, and the labels of labelled nodes. These sources of information offer many opportunities to design advanced self-supervised pretext tasks.
Below are a few attempts to adapt self-supervised learning to graph neural networks:
- Multi-Stage Self-Supervised Learning for Graph Convolutional Networks on Graphs with Few Labels: This work presents a training algorithm for graph convolutional networks (GCNs) that is combined with a self-supervised learning approach. Its major focus is on improving the performance of GCNs on graphs with very few labelled nodes.
- Self-Supervised Graph Representation Learning via Global Context Prediction: This work presents a self-supervised approach for graph data in which representations are learned by exploiting supervision provided by the data itself. It investigates the global context of a graph as a source of supervisory signals for learning useful node representations.
The works discussed above show how graph data can be used in self-supervised learning settings. Before applying graph data in any setting, we need to know which tasks can be performed with it.
Downstream Tasks with Graph Data
Downstream tasks on graph data can be divided into three categories:
- Node level tasks: Tasks that focus on the properties of nodes are node-level tasks. Node classification is an example. In node classification with a self-supervised learning setting, the labels of only a few nodes are known, and we use them to predict the labels of the remaining nodes.
- Link level tasks: Tasks that focus on learning representations of node pairs or properties of edges are link-level tasks. Link prediction is an example: given two nodes, the task is to decide whether there is an edge between them.
- Graph level tasks: Tasks that focus on learning from multiple graphs are graph-level tasks. Graph regression is an example. In a self-supervised learning setting, an encoder trained on a small number of graphs with known properties is used to infer the properties of graphs for which they are unknown.
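The three task levels differ mainly in what a prediction is attached to: a single node, a node pair, or a whole graph. The following is a minimal sketch of that difference, assuming node embeddings have already been produced by some encoder. The function names, the inner-product link score, and the mean-pooling readout are illustrative assumptions, not choices prescribed by the works discussed here.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equally sized vectors."""
    return sum(x * y for x, y in zip(a, b))


def node_level_score(embeddings: List[Vector], node: int, weights: Vector) -> float:
    """Node-level task: one prediction per node (here a linear score)."""
    return dot(embeddings[node], weights)


def link_level_score(embeddings: List[Vector], u: int, v: int) -> float:
    """Link-level task: one prediction per node pair (here an inner product)."""
    return dot(embeddings[u], embeddings[v])


def graph_level_readout(embeddings: List[Vector]) -> Vector:
    """Graph-level task: pool all node embeddings into a single graph vector."""
    count = len(embeddings)
    dim = len(embeddings[0])
    return [sum(e[d] for e in embeddings) / count for d in range(dim)]


# Hypothetical embeddings for a graph with three nodes.
embeddings = [[1.0, 0.0], [0.5, 0.5], [0.0, 2.0]]
weights = [1.0, -1.0]

print(node_level_score(embeddings, 0, weights))
print(link_level_score(embeddings, 0, 1))
print(graph_level_readout(embeddings))
```

In practice the linear score and the pooled vector would each feed a trained prediction head; the sketch only shows where each kind of task reads its input from.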
We have discussed various downstream tasks that can be performed on graph data. To perform these tasks, we need a neural network that can handle graph data. Such networks are called graph neural networks.
Graph Neural Network (GNN)
Within the family of neural networks, those that operate on a graph representation of the data are called graph neural networks. The basic computation of a graph neural network consists of two main components:
- Aggregate Operation: Aggregate information from neighbourhoods.
- Update Operation: Update node representation using the aggregated information from the aggregate operation.
Using these two components, we collect information and update the representations of graph elements such as nodes and edges. These updates form the learning part of the graph neural network. Using a GNN, we can train a network on graph-structured data. To apply it in self-supervised learning settings, we need a training strategy that leads to good results.
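The aggregate and update operations can be sketched as a single message-passing layer. This is a simplified illustration under stated assumptions: aggregation is a mean over neighbours, and the update mixes a node's own features with the aggregated ones using two scalar weights (`w_self` and `w_neigh`, both hypothetical) followed by a ReLU. Real GNN layers typically use learned weight matrices instead of scalars.

```python
from typing import Dict, List

Vector = List[float]


def aggregate(features: Dict[int, Vector], neighbours: Dict[int, List[int]], node: int) -> Vector:
    """Aggregate operation: mean of the neighbours' feature vectors."""
    dim = len(features[node])
    nbrs = neighbours[node]
    if not nbrs:
        # An isolated node receives no messages.
        return [0.0] * dim
    return [sum(features[n][d] for n in nbrs) / len(nbrs) for d in range(dim)]


def update(own: Vector, aggregated: Vector, w_self: float, w_neigh: float) -> Vector:
    """Update operation: weighted mix of own and aggregated features, then ReLU."""
    return [max(0.0, w_self * o + w_neigh * a) for o, a in zip(own, aggregated)]


def gnn_layer(features: Dict[int, Vector], neighbours: Dict[int, List[int]],
              w_self: float = 0.5, w_neigh: float = 0.5) -> Dict[int, Vector]:
    """One round of message passing over every node in the graph."""
    return {
        node: update(features[node], aggregate(features, neighbours, node), w_self, w_neigh)
        for node in features
    }


# A path graph 0 - 1 - 2 with two-dimensional node features.
features = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
neighbours = {0: [1], 1: [0, 2], 2: [1]}

hidden = gnn_layer(features, neighbours)
print(hidden)
```

Stacking several such layers lets information travel further: after k layers, a node's representation depends on nodes up to k hops away.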
Self-Supervised Learning Strategies Using Graph Data
So far, we have discussed why we should use graph data with self-supervised learning (SSL) and what kind of neural network can handle graph data. To obtain good results from self-supervised learning on graph data, we need a sound training strategy. Looking at different works in this field, we can divide the strategies into three categories.
- Pre-training and fine-tuning: This is the basic strategy that can be followed in any self-supervised learning setting. The model is trained in two stages. In the first stage, the encoder is trained on a predefined pretext task, and the pre-trained parameters are then used to initialize the encoder for the second stage. The second stage is fine-tuning, where the pre-trained encoder is fine-tuned together with a prediction head on the downstream task. The image below is a representation of this strategy.
- Joint learning: When a specific pretext task and the downstream task are both known, we can train the encoder together with the prediction head on both tasks at once. This strategy is a form of multitask learning; training the encoder and the prediction head jointly is what makes it joint learning. The image below is a representation of this learning strategy.
- Unsupervised Representation Learning: This strategy is similar to pre-training and fine-tuning, because the first stage is the same in both. Here, however, the pre-trained encoder parameters are frozen, so that only the downstream model is trained on top of the frozen representations. The image below is a representation of this strategy.
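The practical difference between the three strategies is which parameter groups are updated in which stage, and which loss drives the update. The sketch below makes that explicit with a toy setup. All names (`STRATEGIES`, `sgd_step`, `joint_loss`, the `alpha` weight) are hypothetical, the parameters are single floats standing in for whole networks, and the gradients are placeholders rather than values computed from a real loss.

```python
from typing import Dict, List, Set, Tuple

# For each strategy: an ordered list of (objective, trainable parameter groups).
STRATEGIES: Dict[str, List[Tuple[str, Set[str]]]] = {
    "pretrain_finetune": [("pretext", {"encoder"}),
                          ("downstream", {"encoder", "head"})],
    "joint": [("pretext+downstream", {"encoder", "head"})],
    "unsupervised_representation": [("pretext", {"encoder"}),
                                    ("downstream", {"head"})],  # encoder frozen
}


def joint_loss(downstream_loss: float, pretext_loss: float, alpha: float = 0.5) -> float:
    """Joint learning optimizes one weighted sum of both objectives."""
    return downstream_loss + alpha * pretext_loss


def sgd_step(params: Dict[str, float], grads: Dict[str, float],
             lr: float, trainable: Set[str]) -> Dict[str, float]:
    """Gradient step that leaves every non-trainable (frozen) group untouched."""
    return {
        name: value - lr * grads[name] if name in trainable else value
        for name, value in params.items()
    }


def apply_strategy(name: str, params: Dict[str, float],
                   grads: Dict[str, float], lr: float = 0.1) -> Dict[str, float]:
    """Run one placeholder update per stage of the chosen strategy."""
    for _objective, trainable in STRATEGIES[name]:
        params = sgd_step(params, grads, lr, trainable)
    return params


params = {"encoder": 1.0, "head": 1.0}
grads = {"encoder": 1.0, "head": 1.0}  # placeholder gradients

for strategy in STRATEGIES:
    print(strategy, apply_strategy(strategy, params, grads))
```

With these placeholder gradients, the encoder moves twice under pre-training and fine-tuning but only once under unsupervised representation learning, because it is frozen in the second stage.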
In this section, we have seen the three major strategies for applying self-supervised learning to graph data. Looking at earlier work in this field, we find that the methods fall into specific categories. In the next section, we will discuss these categories of existing methods.
Categorization of Self-Supervised Learning Methods for Graph Data
Many works have addressed self-supervised learning on graph data. Surveying them, we can divide the methods in this field into three categories:
- Contrastive Learning: Contrastive learning has been studied extensively in computer vision and natural language processing, and some of these works also apply it to graph data. Generalizing them, the method works by generating multiple views of every instance through data augmentation of the graph. Two views of the same instance are treated as a positive pair, and two views of different instances as a negative pair. Contrastive learning then aims to maximize the agreement between positive pairs and minimize the agreement between negative pairs.
For example, we can apply different transformations to a given graph to obtain multiple views, then apply a set of encoders to these views to generate a representation of each view; contrastive learning then aims to maximize the mutual information between the two views. A summary view of this method can be represented by the following image.
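One common way to turn "agreement between positive pairs" into a loss is an InfoNCE-style objective, which is known to give a lower bound on the mutual information between views. The sketch below assumes this particular loss, cosine similarity as the agreement score, and a `temperature` parameter; none of these is specified by the text above, and the embeddings are hand-written stand-ins for encoder outputs on two augmented views. It is one-directional: each row of the first view is contrasted against all rows of the second view.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity, defined as 0 when either vector has zero norm."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def info_nce(view_a: List[Vector], view_b: List[Vector], temperature: float = 0.5) -> float:
    """Mean InfoNCE loss: row i of view_a is positive with row i of view_b,
    and negative with every other row of view_b."""
    total = 0.0
    for i, anchor in enumerate(view_a):
        logits = [cosine(anchor, other) / temperature for other in view_b]
        # Log-sum-exp with the maximum subtracted for numerical stability.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_denominator - logits[i]
    return total / len(view_a)


# Hypothetical node embeddings from two views of the same two-node graph.
view_1 = [[1.0, 0.0], [0.0, 1.0]]
aligned_view_2 = [[1.0, 0.0], [0.0, 1.0]]   # positives agree
swapped_view_2 = [[0.0, 1.0], [1.0, 0.0]]   # positives disagree

print(info_nce(view_1, aligned_view_2))
print(info_nce(view_1, swapped_view_2))
```

The loss is small when each node's two views are more similar to each other than to the views of other nodes, which is the behaviour the encoder is trained toward.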
- Generative learning: Generative methods are based on traditional generative models, where the rich information embedded in the data itself is treated as natural self-supervision. In various studies, the prediction head works as a graph decoder, which can be used for graph reconstruction. A summary view of this method can be represented by the following image.
Works on generative methods focus on the information embedded in the graph and are generally based on pretext tasks such as reconstruction, which exploit the properties of the graph data itself as self-supervision signals.
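As an illustration of reconstruction as a pretext task, the sketch below scores how well node embeddings reproduce the graph's edges. It assumes an inner-product decoder with a sigmoid, in the style of graph autoencoders, and a binary cross-entropy loss over all node pairs. This is one possible decoder and loss, chosen for brevity, and the embeddings are hand-written rather than produced by a trained encoder.

```python
import math
from typing import List, Tuple

Vector = List[float]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def reconstruction_loss(embeddings: List[Vector], edges: List[Tuple[int, int]]) -> float:
    """Mean binary cross-entropy between decoded edge probabilities and the
    true (undirected) adjacency, taken over all unordered node pairs."""
    edge_set = {frozenset(edge) for edge in edges}
    n = len(embeddings)
    total, pairs = 0.0, 0
    for u in range(n):
        for v in range(u + 1, n):
            score = sum(a * b for a, b in zip(embeddings[u], embeddings[v]))
            # Clamp to keep the logarithms finite.
            p = min(max(sigmoid(score), 1e-12), 1.0 - 1e-12)
            target = 1.0 if frozenset((u, v)) in edge_set else 0.0
            total -= target * math.log(p) + (1.0 - target) * math.log(1.0 - p)
            pairs += 1
    return total / pairs


# Three nodes with a single edge between node 0 and node 1.
edges = [(0, 1)]
consistent = [[2.0, 0.0], [2.0, 0.0], [-2.0, 0.0]]    # 0 and 1 are close
inconsistent = [[2.0, 0.0], [-2.0, 0.0], [2.0, 0.0]]  # 0 and 2 are close

print(reconstruction_loss(consistent, edges))
print(reconstruction_loss(inconsistent, edges))
```

Minimizing this loss pushes connected nodes toward similar embeddings and unconnected nodes apart, so the graph structure itself acts as the supervision signal.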
- Predictive Learning: Many works focus on self-generated informative labels derived from the data as supervision. These labels can be obtained in the following ways:
- Node Property Prediction: Node properties that are calculated in advance, for example node degree, are used as self-supervised labels for prediction.
- Context-Based Prediction: Contextual information, either local or global, is extracted from the graph to perform self-supervised learning.
- Self Training: Here, the predictive model learns from pseudo-labels obtained from clustering or random assignment.
- Domain Knowledge Based Prediction: Subject matter experts or tools are used in advance to analyze the graph data so that informative labels can be obtained on the graph.
A summary view of this method can be represented by the following image.
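Two of these label sources can be computed directly from the graph structure. The sketch below derives node degree as an example of a pre-calculated node property, and the hop distance between two nodes as an example of context-based information. Both choices, and all function names, are illustrative assumptions; the labels would then serve as targets for a prediction head on top of the encoder.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple


def build_adjacency(num_nodes: int, edges: List[Tuple[int, int]]) -> Dict[int, List[int]]:
    """Adjacency lists for an undirected graph with nodes 0..num_nodes-1."""
    adjacency: Dict[int, List[int]] = {node: [] for node in range(num_nodes)}
    for u, v in edges:
        adjacency[u].append(v)
        adjacency[v].append(u)
    return adjacency


def degree_labels(adjacency: Dict[int, List[int]]) -> List[int]:
    """Node property labels: the degree of each node, in node order."""
    return [len(adjacency[node]) for node in sorted(adjacency)]


def hop_distance(adjacency: Dict[int, List[int]], source: int, target: int) -> Optional[int]:
    """Context-based label: shortest-path length found by breadth-first search.
    Returns None when the target cannot be reached."""
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for nbr in adjacency[node]:
            if nbr not in seen:
                seen.add(nbr)
                queue.append((nbr, dist + 1))
    return None


# A star graph: node 0 is connected to nodes 1, 2 and 3.
edges = [(0, 1), (0, 2), (0, 3)]
adjacency = build_adjacency(4, edges)

print(degree_labels(adjacency))       # [3, 1, 1, 1]
print(hop_distance(adjacency, 1, 2))  # 2
```

Neither label requires human annotation, which is what makes them usable as self-supervision.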
In this article, we have discussed how graph data can be used in self-supervised learning settings: graphs carry information that lets self-supervised methods learn from many perspectives. Along the way, we discussed the downstream tasks that can be performed on graph data, as well as the strategies and methods that can be followed to perform self-supervised learning on graph data.