
How to Build Sequential Recommendation Systems with Graph Convolutional Networks?


In one of our previous articles, we discussed sequential recommendation systems in detail. Sequential recommendation systems recommend items to users by capturing their sequence of interactions. Even when a user interacts with the system in sequential order, there are several dependencies and relations within those interactions that need to be modelled. For such use cases, Jianxin Chang and his team proposed a solution in which graph convolutional networks are used to build effective sequential recommendation systems. In this article, we take a close look at this approach by going through the following important points.

Table of Contents

  1. Traditional Recommendation Systems 
  2. Sequential Recommendation Systems
  3. Graph Convolutional Networks
  4. Sequential Recommendation With Graph Convolutional Networks
  5. The Methodology 

Let’s begin with a brief comparison of traditional and sequential recommendation systems.

Traditional Recommendation Systems

Traditional recommendation systems generally fall into two categories: collaborative filtering systems and content-based filtering systems. Collaborative filtering systems predict your preferences based on the similar preferences of other users. Content-based filtering systems, in contrast, predict your preferences based only on your own interaction history. Both approaches treat user interactions as static, so they fail to predict the user's next move.
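To make the distinction concrete, here is a minimal sketch of user-based collaborative filtering. The ratings matrix and numbers are invented purely for illustration and are not from the article:

```python
import numpy as np

# Toy user-item ratings matrix (rows: users, columns: items); 0 = unrated.
ratings = np.array([[5, 4, 0, 1],
                    [4, 5, 3, 2],
                    [1, 0, 5, 4]], dtype=float)

def most_similar_user(ratings, user):
    """Find the user with the most similar ratings (cosine similarity)."""
    norms = np.linalg.norm(ratings, axis=1)
    sims = ratings @ ratings[user] / (norms * norms[user])
    sims[user] = -1.0                    # exclude the user themselves
    return int(np.argmax(sims))

# Recommend the item user 0 has not rated, using the most similar
# user's rating as the predicted preference.
user = 0
neighbour = most_similar_user(ratings, user)            # -> 1
unrated = np.where(ratings[user] == 0)[0]
print(unrated[np.argmax(ratings[neighbour, unrated])])  # -> 2
```

Content-based filtering would instead compare item features against the items in the user's own history; in both cases the model ignores the order of interactions.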

Sequential Recommendation Systems

Sequential recommendation systems try to understand user input over time and model it in sequential order; the user's interactions are essentially sequence-dependent. Such systems are capable of capturing the user's diverse behaviour as trends change.

The image above illustrates the sequential behaviour of a user: the user buys a few items, such as a dress, a purse, and shoes, and the recommendation system predicts the next item to show.

Graph Convolutional Networks

A graph convolutional network is a class of neural networks for processing data represented by graph data structures. Graph convolutional networks are similar to convolutional neural networks in that they learn features from neighbouring nodes. The main difference between them is that a convolutional neural network is built to operate on Euclidean, regularly structured data, whereas a graph neural network is built to operate on non-Euclidean, irregularly structured data.
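As a rough illustration, a single graph convolutional layer computes H′ = 𝜎(ÂHW), where Â is the adjacency matrix normalized after adding self-loops. Below is a minimal PyTorch sketch (not from the article; the function name is our own):

```python
import torch

def gcn_layer(A, H, W):
    """One graph convolutional layer: H' = ReLU(A_norm @ H @ W).

    A: (n, n) adjacency matrix, H: (n, d_in) node features,
    W: (d_in, d_out) trainable weight matrix.
    """
    A_hat = A + torch.eye(A.size(0))           # add self-loops
    d_inv_sqrt = A_hat.sum(dim=1).pow(-0.5)    # D^{-1/2} as a vector
    A_norm = d_inv_sqrt[:, None] * A_hat * d_inv_sqrt[None, :]
    return torch.relu(A_norm @ H @ W)          # propagate, then transform
```

Each node's new feature is a learned transformation of its neighbours' features, which is exactly the "learning from neighbouring nodes" described above.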

The image above shows a convolutional neural network on the left and a graph convolutional network on the right.

Sequential Recommendation with Graph Convolutional Networks

Challenges

User behaviour in long historical sequences reflects implicit and noisy preference signals. Users may interact with many items that are not actually of interest to them; such interactions do not reflect their true preferences and instead act as noise that worsens the model's performance.

User preferences also drift over time because of their diversity. Whether social trends change slowly or quickly, preferences change with them: some remain active while others fade. If the model extracts signals from preferences that are no longer active, or that the user no longer likes, performance again suffers.

To address these challenges, a graph-based method is used: graph convolutional networks extract the implicit preference signals, and dynamic graph pooling then captures the dynamic preferences. Given a user's long historical sequence, we first convert the loose item sequence into a tight item graph and design an attentive graph convolutional network that gathers weak signals into strong ones that accurately reflect the user's preferences. A dynamic graph pooling technique then adaptively reserves the activated core preferences for predicting the user's next behaviour.

Problem Formulation

Let us give a formal definition of the sequential recommendation problem. Suppose we have a set of items denoted by X, where 𝑥 ∈ X is an item and |X| is the number of items. A user's sequential interactions are represented as {𝑥1, 𝑥2, . . . , 𝑥𝑛}, in chronological order, where 𝑥1 is the first interaction and 𝑥𝑛 the last. We have to predict 𝑥𝑛+1, the next item the user is likely to prefer. The sequential recommendation task can be formulated as follows:

Input: the user's historical interaction sequence {𝑥1, 𝑥2, . . . , 𝑥𝑛}.

Output: the recommendation system estimates the probability of interaction for each candidate item and predicts 𝑥𝑛+1.
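In code, this formulation reduces to next-item prediction over item IDs; the toy values below are invented for illustration:

```python
# The user's historical interactions x_1 ... x_n, in chronological order.
history = [12, 7, 33, 7, 91]

# A trained model would assign each candidate item a probability of being
# x_{n+1}; these scores are made up for this example.
scores = {5: 0.12, 91: 0.07, 18: 0.81}

next_item = max(scores, key=scores.get)   # predicted x_{n+1} -> 18
```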

The Methodology

The proposed framework consists of the following components:

  • Interest Graph Construction
  • Interest-fusion Graph Convolutional Layer
  • Interest-extraction Graph Pooling Layer
  • Prediction Layer

Interest Graph Construction: here, loose item sequences are re-constructed into tight item-item interest graphs based on metric learning. A key challenge is that co-occurrence relations are too sparse to generate a connected graph for every sequence.

We first perform raw graph construction, building an undirected graph G = {V, E, 𝐴} to represent each interaction sequence, where E denotes the set of edges to be learned and 𝐴 ∈ R𝑛×𝑛 is the corresponding adjacency matrix. Each vertex 𝑣𝑖 ∈ V, with |V| = 𝑛, corresponds to an interacted item. Since we need a prior graph in which neighbouring nodes are similar, node similarity metric learning is used; a good similarity metric function should be expressive yet of acceptable complexity. The metric function is formulated as a weighted cosine similarity,

M𝑖𝑗 = cos(𝑤 ⊙ h𝑖, 𝑤 ⊙ h𝑗),

where ⊙ denotes the Hadamard product, 𝑤 is a trainable weight vector, and h𝑖 and h𝑗 are item embeddings.

To stabilize the learning process and increase expressive power, the similarity metric can be extended to a multi-head metric,

M𝛿𝑖𝑗 = cos(𝑤𝛿 ⊙ h𝑖, 𝑤𝛿 ⊙ h𝑗),   M𝑖𝑗 = (1/𝜙) Σ𝛿 M𝛿𝑖𝑗,

where 𝜙 is the number of heads and M𝛿𝑖𝑗 computes the similarity between two item embeddings under head 𝛿. Each head captures a different perspective of the semantics.
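A minimal PyTorch sketch of this multi-head weighted cosine metric follows; the function and variable names are our own, not the paper's:

```python
import torch
import torch.nn.functional as F

def multi_head_metric(h, head_weights):
    """Multi-head weighted cosine similarity between all item pairs.

    h: (n, d) item embeddings, head_weights: (phi, d), one trainable
    weight vector per head. Returns the (n, n) metric matrix M.
    """
    heads = []
    for w in head_weights:               # one similarity "view" per head
        hw = F.normalize(w * h, dim=-1)  # Hadamard product, unit-norm rows
        heads.append(hw @ hw.T)          # cosine similarity matrix M^delta
    return torch.stack(heads).mean(0)    # average over the phi heads
```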

Graph sparsification via 𝜀-sparseness: typically, the elements of an adjacency matrix should be non-negative, but the cosine value lies in the range [-1, 1]. We therefore keep only the strongest connections as edges:

A𝑖𝑗 = 1 if M𝑖𝑗 ≥ Rank𝜀𝑛2(M), and 0 otherwise,

where Rank𝜀𝑛2(M) returns the 𝜀𝑛2-th largest value in the metric matrix M.
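The sparsification step can then be sketched as follows, reusing the metric matrix M from above:

```python
import torch

def epsilon_sparsify(M, eps):
    """Keep only the eps * n^2 largest entries of the metric matrix as
    edges, yielding a non-negative 0/1 adjacency matrix (a sketch).
    """
    n = M.size(0)
    k = max(1, int(eps * n * n))
    threshold = torch.topk(M.flatten(), k).values[-1]  # Rank_{eps*n^2}(M)
    return (M >= threshold).float()                    # A_ij in {0, 1}
```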

Interest-fusion Graph Convolutional Layer

In this layer, important behaviours are dynamically strengthened and noisy behaviours are weakened.

Interest fusion via graph attentive convolution: an alignment score E𝑖𝑗 is computed to measure the importance of source node 𝑣𝑗's information to target node 𝑣𝑖, and each node aggregates its neighbours' messages,

h𝑖′ = 𝜎(W · Aggregate(E𝑖𝑗 · h𝑗 | 𝑗 ∈ N(𝑖))),

where 𝜎 is a nonlinearity function and Aggregate can be mean, sum, max, etc. A few further steps, such as cluster-aware and query-aware attention, are added; finally, the target node's cluster score and the source node's query score are summed to obtain the updated weight of source node 𝑗 towards target node 𝑖.
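The sketch below shows the general pattern of attention-weighted neighbour aggregation; it deliberately omits the paper's cluster- and query-aware scoring and uses illustrative names:

```python
import torch

def attentive_aggregation(A, H, att_w):
    """Attention-weighted neighbour aggregation (a simplified sketch, not
    the paper's exact interest-fusion layer).

    A: (n, n) 0/1 adjacency matrix, H: (n, d) node embeddings,
    att_w: (2*d,) trainable attention vector.
    """
    n, d = H.shape
    A = ((A + torch.eye(n)) > 0).float()          # ensure self-loops
    # Alignment score E_ij from each (target i, source j) embedding pair.
    pairs = torch.cat([H.unsqueeze(1).expand(n, n, d),
                       H.unsqueeze(0).expand(n, n, d)], dim=-1)
    E = torch.tanh(pairs @ att_w)                 # (n, n) raw scores
    E = E.masked_fill(A == 0, float('-inf'))      # attend only to neighbours
    alpha = torch.softmax(E, dim=1)               # normalize over sources j
    return torch.relu(alpha @ H)                  # fused node embeddings
```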

Interest-extraction Graph Pooling Layer: every user has different preferences at different moments, so a dynamic graph pooling operation is conducted to adaptively reserve the dynamically activated preferences. This involves several steps: interest extraction via graph pooling, assignment regularization (including same mapping regularization and single affiliation regularization), and graph readout. At the graph readout step we have a tightly coarsened graph that represents the user's strong interest signals, and we perform a weighted readout on the raw graph's embeddings,

h𝑔 = Readout({𝛾𝑖 · h𝑖 : 𝑣𝑖 ∈ G}),

where 𝛾𝑖 is the weight score of each node before pooling.
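A weighted readout is only a couple of lines, assuming node embeddings H and weight scores 𝛾 are given as tensors:

```python
import torch

def weighted_readout(H, gamma):
    """Aggregate node embeddings into one graph-level embedding using
    per-node weight scores (a minimal sketch of a weighted readout).

    H: (n, d) node embeddings, gamma: (n,) weight score per node.
    """
    return (gamma.unsqueeze(-1) * H).sum(dim=0)   # (d,) graph embedding
```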

Prediction Layer: after pooling, the coarsened graph is flattened into a reduced sequence. In interest evolution modelling, since users' preferences change over time, it is necessary to consider the chronological relationship between the user's interests in order to supply the final interest representation with more historical information. A sequential model is applied,

where AUGRU is a GRU with an attentional update gate.
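A minimal sketch of such a cell, with the attention score scaling the update gate, is shown below; this illustrates the general AUGRU idea rather than the paper's exact implementation:

```python
import torch

class AUGRUCell(torch.nn.Module):
    """GRU cell whose update gate is scaled by an attention score."""

    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.gates = torch.nn.Linear(input_size + hidden_size, 2 * hidden_size)
        self.cand = torch.nn.Linear(input_size + hidden_size, hidden_size)

    def forward(self, x, h_prev, att):
        # x: (d_in,), h_prev: (d_h,), att: scalar attention score in [0, 1].
        z = torch.cat([x, h_prev])
        u, r = torch.sigmoid(self.gates(z)).chunk(2)   # update, reset gates
        u = att * u                                    # attentional update gate
        h_tilde = torch.tanh(self.cand(torch.cat([x, r * h_prev])))
        return (1 - u) * h_prev + u * h_tilde
```

Interests that receive a low attention score barely update the hidden state, so the final state emphasizes the interests relevant to the current prediction.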

Prediction: we take the output of the interest evolution layer as the user's current interest and concatenate it with the graph-level representation from the interest-extraction layer.

The negative log-likelihood is used as the loss function, which the optimizer minimizes, and L2 regularization is applied to prevent over-fitting.

Here Θ is the set of trainable parameters, 𝜆 controls the penalty strength, O is the training set, and |O| is the number of training instances.
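A sketch of this objective in PyTorch, assuming binary interaction labels and an illustrative value for 𝜆:

```python
import torch
import torch.nn.functional as F

def nll_with_l2(y_pred, y_true, params, lam=1e-4):
    """Negative log-likelihood with L2 regularization (a minimal sketch).

    y_pred: (|O|,) predicted interaction probabilities in (0, 1),
    y_true: (|O|,) binary labels as floats, params: trainable tensors,
    lam: the penalty strength (lambda).
    """
    nll = F.binary_cross_entropy(y_pred, y_true)   # averaged over |O|
    l2 = sum((p ** 2).sum() for p in params)       # ||Theta||_2^2
    return nll + lam * l2
```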

Final Words

In this article, we discussed sequential recommendation systems and how they differ from traditional recommendation systems. We also looked at what graph convolutional networks are and why they are combined with sequential recommendation systems.

Reference

Jianxin Chang et al., "Sequential Recommendation with Graph Neural Networks," SIGIR 2021.
