Search

# How This Graph-Powered Real Time Engine Handles A Billion Recommendations

Recommendation systems are the cornerstone of a majority of the modern day billion dollar industries. From Amazon to Netflix to Pinterest. Amazon recommends commodities for its e-commerce website and Netflix has a limited number of movies to recommend but in the case of Pinterest, the numbers are not that small. It has more than 100 billion ideas saved and has to deliver to more than 200 million of its users in real time.

Pinterest opted for a graph-based system for this high performance task. They do this with their custom built technique named Pixie.

##### Join our editors every weekday evening as they steer you through the most significant news of the day, introduce you to fresh perspectives, and provide unexpected moments of joy

Pixie’s ease of generalisation to different types of tasks makes this possible, whether it’s creating many types of graphs, recommending different objects, or tuning the graphs to capture the absolute right content. Now with Pixie and recent optimisations, Pinterest is streamlined to serve relevant ideas in real time.

The above diagram illustrates the graphs connecting the boards and pins (images saved by the user). Here each edge shows a saved pin to the board. As the same pins can be saved to different boards for example, a bike can be saved into both ‘bikes’ board or ‘royal enfield’ or both. This eventually increases the number of edges to more than 100 billion. These graphs can be loaded on to AWS machines and deployed at scale.

The Pixie algorithm examines the portion of the graph nearest to the relevant nodes by using biased random walk algorithm.

A biased  random walk is an approach for statistical analysis of graphs. They are used to extract symmetries when the graph contains complex networks.

At each step, it selects a random neighbor and visits the node, incrementing node visit counts as it visits more random neighbors. The probability Alpha, set at 0.5, to restart at node Q so our walks do not stray too far. And then neighboring nodes and boards undergo randomly sampling for 100,000 steps.

So, in the case of the bike example, if a user opens and checks bike images 3 times then this information will be given to Pixie and it comes up with thousands of similar images.

The nodes that have been visited 14 and 16 times are the ones that are most closely related to the query node.

Pixie continuously repeats this process in real-time as the data grows, so the users are always able to keep narrowing down their searches and find the exact ideas they’re looking for to pursue their goal.

The top 1,000 most visited nodes are retrieved , to avoid random walks through the graphs  for complete 100,000 steps every time. To accomplish this, walking is done until the rank 1,000 candidate gets at least 20 visits.  Along with this the engineering team at Pinterest also created Graph Pruning to cut down edges.

Though the recommendation systems have evolved over time, real-time(< 100 milliseconds) large scale application is still a challenging task for the online platforms. Pinterest’s Pixie is one of the best solutions available out today. Pixie uses 3 billion nodes and 17 billion edges and the results show that the user interaction has increased up to 50% when compared to Hadoop-based production system.

### Key Takeaways Of Pixie

• 3 billion nodes are pruned and fit to 120 GB main memory. This reduction in memory improves performance.
• Graphs enable parallel execution hence lesser time(less than 60 milliseconds).
• Pixie can be scaled up by simply adding more machines(AWS) to the cluster.

I have a master's degree in Robotics and I write about machine learning advancements.

### Telegram group

Discover special offers, top stories, upcoming events, and more.

### Discord Server

Stay Connected with a larger ecosystem of data science and ML Professionals

### AI Assists Production in Indian Film Industry

Implementing AI in pre-production can bring down storyboarding process time by 50-80% and reduce the

### Is GPT-4 Really Better than Radiologists?

“Radiology report summaries created by GPT-4 are comparable, and in some cases, even preferred over

### TSMC: The Wizard Behind AI’s Curtain

TSMC anticipates a substantial CAGR of nearly 50% in the AI sector from 2022 to 2027.

Not really.

### Google Gemini To Arrive Sooner Than Expected

This is after announcing the AI at the Google I/O 2023, the company had postponed

### ByteDance to Launch Platform to Build Custom Chatbots

This comes just a few days after OpenAI had delayed its plan to launch a

### This New AI tool Could Mark the Beginning of the End for TikTok and Instagram Influencers

Alibaba Group announces a model framework that can transform still images into dynamic character videos

### Embracing Identity: The Journey of Sujoy Das

“Why is it that corporate diversity efforts are often limited to specific times of the

### The Biggest Data Breaches of 2023

The most significant breaches that impacted the global landscape in 2023.

### NVIDIA Planning Big Expansions in Japan

Prime Minister Fumio Kishida has extended billions of dollars in financial support to bolster TSMC