“This makes Twitter perhaps one of the largest producers of graph-structured data in the world, second perhaps only to the Large Hadron Collider.”
Graphs are popular in fields such as biology, quantum chemistry, and high-energy physics. Social media platforms like Twitter are also leveraging graph-based machine learning (ML) for their services. Before we get into how social media platforms can benefit from graphs, let’s briefly look at what separates GraphML from traditional deep learning methodologies.
Overview Of GraphML
Graphs can be looked at as mathematical abstractions of relations within a complex system. A graph consists of nodes or vertices with pairwise connections (edges).
The idea behind representation learning approaches is to learn a mapping that embeds nodes, or entire graphs, as points in a low-dimensional vector space. On a social platform, for example, the nodes can be users and the edges the conversations between them. Deep learning on graphs is also known as geometric deep learning. The goal is to make sure that the geometric relationships in this learned space reflect the structure of the original graph. These learned embeddings can then be used as inputs to a machine learning model.
Like convolutional neural networks (CNNs) in computer vision, graph neural networks rely on well-designed local operations with shared weights: the same operation is applied to every node and its neighbours. If two users share an edge or like the same post, a machine learning model can use that signal as an input to generate recommendations.
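As a rough illustration of such a local operation, the sketch below runs one round of neighbourhood aggregation on a toy graph. The graph, the feature values, and the names `aggregate` and `shared_weight` are hypothetical and chosen only for illustration. Real graph neural networks use learned weight matrices rather than a single hand-picked scalar.

```python
# Toy undirected graph: each user maps to the set of users they interact with.
# All names and numbers here are made up for illustration.
graph = {
    "alice": {"bob", "carol"},
    "bob": {"alice"},
    "carol": {"alice", "dave"},
    "dave": {"carol"},
}

# One scalar feature per node (for example, a normalised activity score).
features = {"alice": 1.0, "bob": 2.0, "carol": 3.0, "dave": 4.0}


def aggregate(graph, features, shared_weight):
    """One round of mean-neighbour aggregation.

    The same `shared_weight` is applied at every node, which is the graph
    analogue of a CNN reusing one filter at every image location.
    """
    updated = {}
    for node, neighbours in graph.items():
        neighbour_mean = sum(features[n] for n in neighbours) / len(neighbours)
        updated[node] = features[node] + shared_weight * neighbour_mean
    return updated


updated = aggregate(graph, features, shared_weight=0.5)
for node in sorted(updated):
    print(node, updated[node])
```

After one round, each node's value mixes in information from its direct neighbours. Repeating the step lets information travel further across the graph.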
A significant difference from classical deep neural networks is that operations on graphs must be permutation-invariant, i.e. independent of the order of neighbour nodes, because there is no canonical way to arrange the nodes. The prediction task is application dependent. In node-wise problems, properties of individual nodes are predicted, say, whether an account in a network is a spammer. In graph-wise problems, a prediction is made about the entire graph.
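Permutation invariance can be checked directly on a small example. The sketch below compares a sum over neighbour features, which ignores ordering, with a position-weighted sum, which does not. The function names and the feature values are hypothetical and serve only to illustrate the property.

```python
from itertools import permutations


def sum_readout(neighbour_features):
    """Permutation-invariant: a plain sum ignores the order of its inputs."""
    return sum(neighbour_features)


def order_sensitive_readout(neighbour_features):
    """Not permutation-invariant: each position gets a different weight."""
    return sum((i + 1) * x for i, x in enumerate(neighbour_features))


neighbours = [1, 2, 3, 4]
reordered = list(reversed(neighbours))

# The sum gives the same answer for every possible ordering.
invariant = all(
    sum_readout(list(p)) == sum_readout(neighbours)
    for p in permutations(neighbours)
)

print(invariant)
print(order_sensitive_readout(neighbours), order_sensitive_readout(reordered))
```

Sum, mean, and max are common order-independent choices for aggregating over neighbours. A function that depends on position, like the second one here, would give different predictions for the same graph depending on how its nodes happened to be listed.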
Researchers at Stanford have described machine learning on graphs as finding a way to incorporate information about the structure of the graph into the machine learning model. Social media giants like Twitter are both applying graph ML in their products and conducting their own research on it.
How Twitter Uses It
Twitter interactions can be likened to very large-scale complex graphs where the nodes model users and Tweets, while the edges model the interactions such as replies, Retweets, or favs. Twitter handles hundreds of millions of Tweets and Retweets every day. “This makes Twitter perhaps one of the largest producers of graph-structured data in the world, second perhaps only to the Large Hadron Collider,” says Michael Bronstein, head of graph learning research at Twitter.
These millions of retweets and likes translate to hundreds of millions of nodes and billions of edges. Moreover, these applications are time-constrained: users want to see real-time trends and customised recommendations. Current research in graph network models, writes Bronstein, only deals with modestly sized graphs, and the resulting architectures and training algorithms are inadequate for large-scale settings. The key to successful usage of graph neural networks is to make the right tradeoff between performance, computational complexity, memory footprint, and training and inference time.
Furthermore, existing literature doesn’t address the problem of dynamic graphs. On platforms like Twitter, real-world interactions between people are dynamic by nature, and we witness trends, topics, and interests emerge and fade out all the time.
To handle these dynamics, Twitter’s graph takes shape by feeding on a stream of asynchronous events such as new users, their following lists, likes, tweets and retweets.
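A minimal sketch of this idea is shown below: a graph that grows as events arrive, one at a time. The event format, the `DynamicGraph` class, and the sample events are assumptions made for illustration and do not describe Twitter's actual systems.

```python
from collections import defaultdict

# Hypothetical event records: (timestamp, event_type, source, target).
events = [
    (1, "new_user", "alice", None),
    (2, "new_user", "bob", None),
    (3, "follow", "alice", "bob"),
    (4, "tweet", "bob", "tweet_1"),
    (5, "like", "alice", "tweet_1"),
    (6, "retweet", "alice", "tweet_1"),
]


class DynamicGraph:
    """A graph that is updated incrementally from a stream of events."""

    def __init__(self):
        self.nodes = set()
        # source -> list of (target, event_type, timestamp)
        self.edges = defaultdict(list)

    def apply(self, event):
        timestamp, event_type, source, target = event
        self.nodes.add(source)
        if target is not None:
            # Events with a target add a typed, timestamped edge.
            self.nodes.add(target)
            self.edges[source].append((target, event_type, timestamp))


graph = DynamicGraph()
for event in events:  # processed in arrival order
    graph.apply(event)

print(sorted(graph.nodes))
print(graph.edges["alice"])
```

Keeping the timestamp and event type on every edge means the same structure can later be filtered by time window or by interaction type, for example to look only at recent likes.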
Twitter users generate graphs in different forms. For instance, the follow graph represents the social network of the users, and the engagement graph captures how people interact with Tweets. The Integrity Data Science team also uses graphs that relate users to the devices and IP addresses from which they access the service, in order to detect malicious behaviour and violations.
The graph learning team at Twitter believes that deep learning on graphs has a lot of untapped potential. For example, the majority of graph neural networks operate on nodes and edges only, but higher-order structures such as motifs, graphlets, or simplicial complexes are known to be important in complex networks. These structures offer more expressive power to graph-based models but, as mentioned earlier, there is a tradeoff with computational cost.
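To make "higher-order structure" concrete, the sketch below counts triangles, one of the simplest motifs, in a toy undirected graph. The graph and the `count_triangles` helper are hypothetical and chosen for illustration. Enumerating neighbour pairs like this is a straightforward approach that also hints at the tradeoff, since the work grows quickly with the number of neighbours per node.

```python
from itertools import combinations

# Toy undirected graph as an adjacency mapping (made-up data).
graph = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c"},
}


def count_triangles(graph):
    """Count triangles by checking every pair of neighbours of every node."""
    closed_pairs = 0
    for node, neighbours in graph.items():
        for u, v in combinations(sorted(neighbours), 2):
            if v in graph[u]:
                closed_pairs += 1
    # Each triangle is found once from each of its three corners.
    return closed_pairs // 3


triangles = count_triangles(graph)
print(triangles)
```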