Self-supervised learning on graph-structured data has recently attracted interest as a way to learn generalizable, transferable, and robust representations from unlabeled graphs. A Graph Contrastive Learning (GCL) technique first generates numerous graph views through stochastic augmentations of the input and then learns representations by contrasting positive samples against negative ones. In this article, we discuss the theoretical aspects of Graph Contrastive Learning. Following are the topics to be covered.
Table of contents
- What is Graph Contrastive Learning (GCL)?
- When to use GCL?
- Self Supervised Learning
- How does it work?
- Benefits and Drawbacks of GCL
- Applications of GCL
Let’s start with an introduction to Graph Contrastive Learning and see why it doesn’t need any human annotations.
What is Graph Contrastive Learning?
Graph Contrastive Learning (GCL), as the name implies, contrasts graph samples: those belonging to the same distribution are pushed toward each other in the embedding space, while those belonging to different distributions are pushed apart.
The backbone of this method is contrastive learning, which employs proxy tasks to guide representation learning. The proxy task is designed to predict one part of the input from any other observed part.
- Contrastive learning is a form of self-supervised learning in which unlabeled data points are compared against one another, so the model learns which points are similar and which are different.
A general contrastive pattern characterizes the design space along four dimensions:
- Data augmentation functions
- Contrasting modes
- Contrastive objectives
- Negative mining strategies
When to use GCL?
GCL aims to learn graph representations with the help of contrastive learning. It is used where there is a need to learn low-dimensional node embeddings that capture both structural and attributive information.
For instance, suppose we are developing sign-language software for deaf users, and we have data consisting of graphical representations of hand gestures. Each node of the representation has to be learned so the system can understand the signs and reply in signs.
These embeddings are called low-dimensional because they have far fewer features to be learned than the raw observations. So, to learn the structural and attributive information of each graphical representation, we can use GCL as a pre-training model, with a neural network as the target model.
Since Graph Contrastive Learning uses Contrastive Learning which is a self-supervised technique, let’s understand the basics of self-supervised learning.
Self Supervised Learning
In self-supervised learning, the system learns to predict a portion of its input from other portions of its input. In other words, part of the input is used as a supervisory signal for a predictor that is fed the remainder of the data. With self-supervised learning, neural networks may learn in two phases:
- A pretext task with automatically generated (possibly noisy) labels is used to initialize the weights of the network.
- The process’s real task is then completed via either supervised or unsupervised learning.
Why is self-supervised learning used in GCL?
Self-supervised learning is used instead of supervised learning for three major reasons.
- Scalability: To forecast the result of unknown data, the supervised learning approach requires labeled data. However, building models that generate good predictions may need a vast dataset. Manual data labeling takes time and is often impractical. This is where self-supervised learning comes in handy since it automates the process even when dealing with massive volumes of data.
- Improved capabilities: self-supervised learning offers a wide range of applications in computer vision, including colourization, 3D rotation, depth completion, and context filling. Another area where self-supervised learning excels is speech recognition.
- Human intervention: Labels are generated automatically by self-supervised learning without the need for human interaction.
Let’s dive deeper into the concept of Contrastive Learning and how it is used in the graph domain.
How does it work?
The goal of GCL is to create different samples of the input graph and train the model to learn the low-dimensional attributes of the graphical representation, with samples categorized into positive and negative subsets.
An input graph has two primary components: the node set and the edge set. During training, no class information for nodes or graphs is supplied, in keeping with unsupervised representation learning. The goal is to train a GNN encoder that takes graph features and structure as input and produces low-dimensional node embeddings.
We may also generate a graph-level representation of the input graph by aggregating the node-level embeddings for graph-oriented tasks. These representations can be employed in downstream tasks like node/graph classification and community detection.
Stochastic augmentations are applied at each training cycle to produce different views of the input graph. Two augmentation functions are sampled from the pool of possible transformation functions to build two graph views. A shared GNN encoder then produces node representations for the two views, and a readout function may be applied to obtain a graph-level representation for each view if desired.
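The augmentation and readout steps above can be sketched in a few lines. This is a minimal toy example, not code from any specific GCL library: the ring graph, the drop/mask probabilities, and the helper names (`drop_edges`, `mask_features`, `readout`) are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def drop_edges(adj, p=0.2):
    """Edge dropping: remove each existing undirected edge with probability p."""
    adj = adj.copy()
    r, c = np.triu_indices_from(adj, k=1)   # upper triangle, so edges stay symmetric
    for i, j in zip(r, c):
        if adj[i, j] > 0 and rng.random() < p:
            adj[i, j] = adj[j, i] = 0
    return adj

def mask_features(x, p=0.3):
    """Attribute masking: zero out each feature column with probability p."""
    return x * (rng.random(x.shape[1]) >= p)

def readout(h):
    """Mean-pooling readout: one graph-level vector from node embeddings."""
    return h.mean(axis=0)

# Toy ring graph with 4 nodes and 3-dimensional node features.
adj = np.array([[0, 1, 0, 1],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [1, 0, 1, 0]], dtype=float)
x = rng.normal(size=(4, 3))

# Two stochastic views of the same input graph.
view1 = (drop_edges(adj), mask_features(x))
view2 = (drop_edges(adj), mask_features(x))
```

In a real pipeline both views would then pass through the shared GNN encoder; here `readout` only stands in for the optional graph-level aggregation mentioned above.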
The contrasting mode defines a positive set and a negative set for each anchor instance, e.g. a node embedding “V”. In a purely unsupervised setting, the positive set consists of the embeddings in the two augmented graph views that correspond to the same node or graph. Note that when label supervision is available, the positive set may be augmented with samples from the same class.
Furthermore, negative mining strategies may enhance the negative sample set by taking the relative similarity of negative samples into account. Finally, the defined positive and negative pairs are scored using a contrastive objective.
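The scoring step can be illustrated with the widely used InfoNCE objective. This numpy version is a toy sketch, assuming a local-local mode; the embedding sizes, the temperature `tau`, and the perturbed second view are illustrative assumptions, not values from the article.

```python
import numpy as np

def info_nce(z1, z2, tau=0.5):
    """InfoNCE loss over node embeddings of two views: row i of z1 and z2
    embed the same node (the positive pair); every other row of z2 serves
    as a negative for anchor i."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    sim = z1 @ z2.T / tau                    # cosine similarities / temperature
    # log-softmax along each row; the positive sits on the diagonal
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))

# Toy node embeddings for two augmented views of a 4-node graph.
rng = np.random.default_rng(0)
z1 = rng.normal(size=(4, 8))
z2 = z1 + 0.1 * rng.normal(size=(4, 8))      # second view: mild perturbation
loss = info_nce(z1, z2)
```

When the two views are well aligned, the diagonal (positive) similarities dominate, so the loss falls below the uniform baseline of log N.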
So, the whole process could be divided into four major parts.
- The goal of data augmentation is to produce congruent, identity-preserving positive samples from a given network. The majority of GCL work entails bi-level augmentation approaches such as structural transformation and feature transformation.
- For a given anchor, contrasting modes determine the positive and negative sets at various granularities of the graph. Three contrasting modes are regularly used in mainstream work.
- Local-Local CL contrasts node-level representations across the two views. For an anchor node embedding “V”, the positive sample is its congruent counterpart “U” in the other view; embeddings other than “U” are naturally picked as negatives.
- Global-Local CL ensures that node- and graph-level embeddings are consistent. For example, if a global embedding is the anchor instance, the positive samples are its node embeddings throughout the graph. If the readout function is expressive enough, the global-local scheme can serve as a surrogate for local-local CL.
- Global-Global CL ensures that the graph embeddings of the two augmented views of the same graph are consistent. The positive sample for a graph embedding “S1” is the embedding “S2” of the other augmented view; other graph embeddings in the batch are deemed negative samples.
- The choice of contrasting mode is constrained by the downstream task. For node datasets, only local-local and global-local CL are appropriate, while graph datasets can employ all three modes.
- To train the encoder, contrastive objectives are employed to maximize the agreement between positive samples and the disparity between negatives.
- The embeddings of nodes or graphs other than the anchor are distinct from it and are therefore considered negatives. It follows that larger batch/sample sizes help contrastive learning succeed, since they include more negatives and thus deliver more useful training signals. Deciding which negatives to emphasize is referred to as the negative mining strategy.
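A similarity-based negative mining strategy can be sketched as follows. The cosine-similarity ranking and the toy vectors are illustrative assumptions; real implementations mine negatives from the encoder's embeddings during training.

```python
import numpy as np

def hard_negatives(anchor, candidates, k=2):
    """Rank candidate negatives by cosine similarity to the anchor and
    keep the k most similar ('hardest') ones -- the negatives that
    carry the strongest training signal."""
    a = anchor / np.linalg.norm(anchor)
    c = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    sims = c @ a                       # cosine similarity of each candidate
    return np.argsort(-sims)[:k]       # indices of the k most similar

# The candidate most aligned with the anchor is the hardest negative.
anchor = np.array([1.0, 0.0])
cands = np.array([[0.9, 0.1],     # nearly parallel to the anchor
                  [-1.0, 0.0],    # opposite direction
                  [0.0, 1.0]])    # orthogonal
picked = hard_negatives(anchor, cands, k=2)
```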
Benefits and Drawbacks of GCL
Let’s have a look at the benefits and drawbacks of Graph Contrastive Learning.
Benefits
- It is self-supervised, so there is no need for any human annotations.
- By maximizing mutual information across multiple views of the same graph, GCL learns useful representations without labels.
- It does not depend heavily on data quality and can operate on low-quality data.
Drawbacks
- Data augmentation incurs some information loss.
- There is a high probability of adversarial attacks due to the discrete nature of edges and nodes in graphs.
- The learner risks overfitting to the augmentation scheme.
Applications of GCL
- In the discipline of botany, GCL aids in understanding the molecular structure of various specimens, which can then be categorized further.
- Graph Contrastive Learning aids image colourization, i.e. adding plausible colors to a given picture.
- GCL aids context-aware prediction: it helps complete the context when certain elements are missing.
- GCL can be utilised as a pre-training model for a target model. This is a form of knowledge transfer, and it works because the target can then operate with smaller amounts of labeled data.
- GCL is used to determine the relationship between several patches in a picture.
Graph Contrastive Learning is a self-supervised technique that augments the data and, based on those augmentations, learns different attributes of the data at the root level. In this article, we discussed the functionality of GCL, which helps in understanding its inner workings and where to implement this kind of learner.