MITB Banner

How This Graph-Powered Real Time Engine Handles A Billion Recommendations

Share

Recommendation systems are the cornerstone of a majority of the modern day billion dollar industries. From Amazon to Netflix to Pinterest. Amazon recommends commodities for its e-commerce website and Netflix has a limited number of movies to recommend but in the case of Pinterest, the numbers are not that small. It has more than 100 billion ideas saved and has to deliver to more than 200 million of its users in real time.

Source: Pinterest

Pinterest opted for a graph-based system for this high performance task. They do this with their custom built technique named Pixie.

Pixie’s ease of generalisation to different types of tasks makes this possible, whether it’s creating many types of graphs, recommending different objects, or tuning the graphs to capture the absolute right content. Now with Pixie and recent optimisations, Pinterest is streamlined to serve relevant ideas in real time.

The above diagram illustrates the graphs connecting the boards and pins (images saved by the user). Here each edge shows a saved pin to the board. As the same pins can be saved to different boards for example, a bike can be saved into both ‘bikes’ board or ‘royal enfield’ or both. This eventually increases the number of edges to more than 100 billion. These graphs can be loaded on to AWS machines and deployed at scale.

Source:Pinterest

The Pixie algorithm examines the portion of the graph nearest to the relevant nodes by using biased random walk algorithm.

A biased  random walk is an approach for statistical analysis of graphs. They are used to extract symmetries when the graph contains complex networks.

At each step, it selects a random neighbor and visits the node, incrementing node visit counts as it visits more random neighbors. The probability Alpha, set at 0.5, to restart at node Q so our walks do not stray too far. And then neighboring nodes and boards undergo randomly sampling for 100,000 steps.

So, in the case of the bike example, if a user opens and checks bike images 3 times then this information will be given to Pixie and it comes up with thousands of similar images.

The nodes that have been visited 14 and 16 times are the ones that are most closely related to the query node.

Pixie continuously repeats this process in real-time as the data grows, so the users are always able to keep narrowing down their searches and find the exact ideas they’re looking for to pursue their goal.

The top 1,000 most visited nodes are retrieved , to avoid random walks through the graphs  for complete 100,000 steps every time. To accomplish this, walking is done until the rank 1,000 candidate gets at least 20 visits.  Along with this the engineering team at Pinterest also created Graph Pruning to cut down edges.

Though the recommendation systems have evolved over time, real-time(< 100 milliseconds) large scale application is still a challenging task for the online platforms. Pinterest’s Pixie is one of the best solutions available out today. Pixie uses 3 billion nodes and 17 billion edges and the results show that the user interaction has increased up to 50% when compared to Hadoop-based production system.

Key Takeaways Of Pixie

  • 3 billion nodes are pruned and fit to 120 GB main memory. This reduction in memory improves performance.
  • Graphs enable parallel execution hence lesser time(less than 60 milliseconds).
  • Pixie can be scaled up by simply adding more machines(AWS) to the cluster.

Read more here

Share
Picture of Ram Sagar

Ram Sagar

I have a master's degree in Robotics and I write about machine learning advancements.
Related Posts

CORPORATE TRAINING PROGRAMS ON GENERATIVE AI

Generative AI Skilling for Enterprises

Our customized corporate training program on Generative AI provides a unique opportunity to empower, retain, and advance your talent.

Upcoming Large format Conference

May 30 and 31, 2024 | 📍 Bangalore, India

Download the easiest way to
stay informed

Subscribe to The Belamy: Our Weekly Newsletter

Biggest AI stories, delivered to your inbox every week.

AI Courses & Careers

Become a Certified Generative AI Engineer

AI Forum for India

Our Discord Community for AI Ecosystem, In collaboration with NVIDIA. 

Flagship Events

Rising 2024 | DE&I in Tech Summit

April 4 and 5, 2024 | 📍 Hilton Convention Center, Manyata Tech Park, Bangalore

MachineCon GCC Summit 2024

June 28 2024 | 📍Bangalore, India

MachineCon USA 2024

26 July 2024 | 583 Park Avenue, New York

Cypher India 2024

September 25-27, 2024 | 📍Bangalore, India

Cypher USA 2024

Nov 21-22 2024 | 📍Santa Clara Convention Center, California, USA

Data Engineering Summit 2024

May 30 and 31, 2024 | 📍 Bangalore, India

Subscribe to Our Newsletter

The Belamy, our weekly Newsletter is a rage. Just enter your email below.