Inside Netflix’s Recommendation Engine

Netflix began as a humble DVD rental service in 1997 and has since grown into a major over-the-top (OTT) streaming giant with more than 207 million paid subscribers worldwide. At the height of the pandemic, Netflix US added up to 500 new shows and reeled in over 36 million new subscribers.

Despite its relatively high subscription prices, Netflix continues to gain traction. One of the major factors behind this growth is its data science strategy.

Recommendation system

Netflix first started using analytics tools in 2000 to recommend DVDs to users. Two decades later, the Netflix recommendation system is one of the most sophisticated in the business. Its personalised recommendation algorithms drive customer retention, helping Netflix pocket profits to the tune of $1 billion annually. More than 80 percent of the shows people watch on Netflix are discovered through the platform’s recommendation system.

Credit: Netflix

In an earlier interview, Todd Yellin, Netflix’s vice president of product innovation, said, “The three legs of this stool would be Netflix members; taggers who understand everything about the content; and our machine learning algorithms that take all of the data and put things together.”

The Netflix homepage on most devices is structured as a two-dimensional layout, with videos organised into coherent rows. Personalisation hinges on which rows are displayed and how they are ordered. Netflix uses several ranking algorithms to build these rows, including a personalised video ranker, a Top-N video ranker, a trending now ranker, a continue watching ranker, and a video-to-video similarity ranker. The output of each algorithm then goes through a row generation process that finds the videos fitting each row.

Credit: Netflix
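As a rough illustration of how different rankers can each feed their own row, here is a minimal Python sketch. The catalogue, viewing history, tag-based similarity measure, and function names are all hypothetical; Netflix's production rankers are not described in this level of detail and are certainly far more elaborate.

```python
from collections import Counter

# Hypothetical toy data: video id -> descriptive tags.
CATALOG = {
    "v1": {"drama", "crime"},
    "v2": {"drama", "romance"},
    "v3": {"comedy"},
    "v4": {"crime", "thriller"},
}
HISTORY = ["v1"]                               # videos this member has watched
RECENT_PLAYS = ["v3", "v3", "v4", "v3", "v2"]  # recent plays across all members


def similarity(a, b):
    """Jaccard overlap between the tag sets of two videos."""
    tags_a, tags_b = CATALOG[a], CATALOG[b]
    return len(tags_a & tags_b) / len(tags_a | tags_b)


def because_you_watched(seed, n=2):
    """Video-to-video similarity ranker: the n titles most similar to a seed."""
    others = [v for v in CATALOG if v != seed]
    return sorted(others, key=lambda v: similarity(seed, v), reverse=True)[:n]


def trending_now(n=2):
    """Trending ranker: the n most-played titles in the recent window."""
    return [video for video, _ in Counter(RECENT_PLAYS).most_common(n)]


def build_rows():
    """Row generation: each ranker fills the row it is responsible for."""
    rows = {"Trending Now": trending_now()}
    for seed in HISTORY:
        rows[f"Because you watched {seed}"] = because_you_watched(seed)
    return rows


if __name__ == "__main__":
    for title, videos in build_rows().items():
        print(title, videos)
```

Each ranker here only decides what goes inside a row. Which rows make it onto the page, and in what order, is a separate decision.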

To select which of the thousands of candidate rows to display on a user’s homepage, Netflix uses one of three main approaches:

  • Row-ranking approach: This uses existing recommendation or learning-to-rank techniques to score each row and then ranks the rows accordingly.
  • Stage-wise approach: Here, rows are selected sequentially, starting from the first. After each selection, the remaining rows are re-scored to account for their relationship with the rows and items already chosen for the page.
  • Machine learning approach: This approach aims to create a scoring function by training a model on historical information about the homepages.

Personalisation is also applied to the images Netflix chooses for the shows it promotes. These images are called artwork. Computer vision algorithms scan each show and pick out the image that best matches the user’s preferences.
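The difference between the first two approaches can be sketched in a few lines of Python. The rows, scores, and the fixed overlap penalty below are illustrative assumptions rather than Netflix's actual scoring; the machine learning approach is left out because it would need real historical homepage data to train on.

```python
# Hypothetical candidate rows: name -> (relevance score, videos in the row).
ROWS = {
    "Trending Now": (0.9, ["a", "b", "c"]),
    "Top Picks": (0.8, ["a", "b", "d"]),
    "Crime Dramas": (0.6, ["e", "f", "g"]),
}


def row_ranking(rows, k):
    """Row-ranking: score every row independently and keep the k best."""
    return sorted(rows, key=lambda name: rows[name][0], reverse=True)[:k]


def stage_wise(rows, k, penalty=0.2):
    """Stage-wise: pick rows one at a time, re-scoring the rest after each pick.

    A row loses `penalty` (an assumed constant) for every video that is
    already shown in a row chosen earlier.
    """
    chosen, shown = [], set()
    remaining = dict(rows)
    while remaining and len(chosen) < k:

        def adjusted(name):
            score, videos = remaining[name]
            return score - penalty * len(shown.intersection(videos))

        best = max(remaining, key=adjusted)
        chosen.append(best)
        shown.update(remaining.pop(best)[1])
    return chosen


if __name__ == "__main__":
    print(row_ranking(ROWS, 2))
    print(stage_wise(ROWS, 2))
```

With this toy data, row-ranking keeps the two highest-scoring rows even though they share two videos, while the stage-wise version swaps the second row for one with no overlap. That is the trade-off in miniature: independent scoring is simple, but it cannot see duplication across rows.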

Use of Python

The platform relies heavily on Python. Three main areas of application include:

Open Connect: This is Netflix’s purpose-built content delivery network (CDN), responsible for serving the video traffic. The software systems required for the CDN’s infrastructure are written in Python. Python applications also keep track of the network hardware, including device models and hardware components.

Demand Engineering: This team is responsible for regional failovers, capacity operations, traffic management, and fleet efficiency of the Netflix cloud, and its tooling is built on Python.

CORE: Netflix’s alerting and statistical analysis team depends on libraries such as NumPy, SciPy, and Pandas to automate the analysis of thousands of related signals. Python helps in automation, data exploration, cleaning, and visualisation.
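To give a feel for what automated analysis of many signals can look like, here is a minimal sketch that flags signals whose latest reading sits far from their recent history. The z-score rule, the threshold, and the sample data are assumptions made for illustration, and the sketch uses only the standard library's `statistics` module rather than NumPy or Pandas so that it runs without extra dependencies.

```python
from statistics import mean, stdev


def anomalous_signals(signals, threshold=3.0):
    """Return the names of signals whose latest value deviates from history.

    `signals` maps a signal name to a list of numeric samples, oldest first.
    A signal is flagged when its last sample lies more than `threshold`
    sample standard deviations from the mean of the earlier samples.
    """
    flagged = []
    for name, samples in signals.items():
        history, latest = samples[:-1], samples[-1]
        if len(history) < 2:
            continue  # not enough data to estimate the spread
        spread = stdev(history)
        if spread == 0:
            continue  # flat history, so the z-score is undefined
        z_score = (latest - mean(history)) / spread
        if abs(z_score) > threshold:
            flagged.append(name)
    return flagged


if __name__ == "__main__":
    # Hypothetical readings for two signals.
    readings = {
        "error_rate": [1.0, 1.2, 0.9, 1.1, 1.0, 5.0],
        "latency_ms": [100, 102, 98, 101, 99, 100],
    }
    print(anomalous_signals(readings))
```

The same loop would be a vectorised one-liner over a DataFrame in Pandas; the point here is only the shape of the check, not the tooling.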

Polynote

Developed by Netflix, Polynote is an open-source polyglot notebook with support for Scala. It allows smooth integration of JVM-based machine learning platforms with Python. Its features include:

  • Insight into running tasks and kernel status
  • Simple dependency and configuration management
  • IDE-like features such as auto-complete, error highlighting, improved editing, and data visualisation

Metacat

Netflix has numerous data stores with different data formats and large data volumes. The Netflix data warehouse consists of many data sets stored in Amazon S3, Elasticsearch, Druid, Redshift, and MySQL. The platform also supports Spark, Presto, Hive, and Pig for producing, processing, and consuming data sets.

Credit: Netflix

Netflix introduced Metacat so that the data platform can interoperate across these diverse data stores as one single warehouse. The metadata service makes data easy to discover, process, and manage.
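The idea of one metadata layer sitting in front of many stores can be sketched as a small in-memory catalogue. This is purely illustrative: the `TableInfo` and `MetadataCatalog` classes, their methods, and the naming scheme are hypothetical and are not Metacat's actual API.

```python
from dataclasses import dataclass, field


@dataclass
class TableInfo:
    """Store-independent description of one data set (hypothetical schema)."""

    store: str      # backing system, e.g. "hive", "redshift", "mysql"
    database: str
    table: str
    columns: dict   # column name -> type name
    tags: set = field(default_factory=set)

    @property
    def qualified_name(self):
        return f"{self.store}/{self.database}/{self.table}"


class MetadataCatalog:
    """A single lookup point for tables that live in different stores."""

    def __init__(self):
        self._tables = {}

    def register(self, info):
        self._tables[info.qualified_name] = info

    def get(self, qualified_name):
        return self._tables[qualified_name]

    def search(self, tag):
        """Discover data sets by tag, regardless of where they are stored."""
        return sorted(
            name for name, info in self._tables.items() if tag in info.tags
        )


if __name__ == "__main__":
    catalog = MetadataCatalog()
    catalog.register(
        TableInfo("hive", "prod", "plays", {"user_id": "bigint"}, {"viewing"})
    )
    catalog.register(
        TableInfo("redshift", "bi", "daily_plays", {"day": "date"}, {"viewing"})
    )
    print(catalog.search("viewing"))
```

Consumers query the catalogue by name or tag and never need to know which engine holds the data, which is the interoperability property the paragraph above describes.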

Shraddha Goled
I am a technology journalist with AIM. I write stories focused on the AI landscape in India and around the world with a special interest in analysing its long term impact on individuals and societies. Reach out to me at shraddha.goled@analyticsindiamag.com.
