Inside Netflix’s Recommendation Engine

Netflix began as a humble DVD rental platform in 1997 and has since grown into a major over-the-top (OTT) streaming giant with more than 207 million paid subscribers worldwide. At the height of the pandemic, Netflix US added up to 500 new shows and reeled in over 36 million new subscribers.

Despite its relatively high subscription prices, Netflix continues to gain traction. One of the major factors behind this growth is its data science strategy.

Recommendation system

Netflix first started using analytics tools in 2000 to recommend DVDs to users. Two decades later, the Netflix recommendation system is one of the most sophisticated in the business. The personalised recommendation algorithms drive customer retention, helping Netflix save an estimated $1 billion annually. More than 80 percent of the shows people watch on Netflix are discovered through the platform’s recommendation system.


In an earlier interview, Todd Yellin, Netflix’s vice president of product innovation, said, “The three legs of this stool would be Netflix members; taggers who understand everything about the content; and our machine learning algorithms that take all of the data and put things together.”

The Netflix homepage on most devices is structured as a two-dimensional layout, with videos organised into coherent rows. The key to personalisation is how these rows are chosen and displayed on the homepage. Netflix uses several ranking algorithms to build them, including a personalised video ranker, a Top-N video ranker, a trending now ranker, a continue watching ranker, and a video-to-video similarity ranker. The output of each algorithm then goes through a row generation process that finds the videos that fit the appropriate row.
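To make the idea of a video-to-video similarity ranker concrete, here is a minimal sketch in Python. It is not Netflix's implementation: the tag vectors, the titles, and the `similar_videos` helper are all hypothetical, and cosine similarity over tags is only one of many possible similarity measures.

```python
import math

# Hypothetical tag weights per title; a real system would use far richer signals.
VIDEO_TAGS = {
    "Space Drama": {"sci-fi": 1.0, "drama": 0.8},
    "Robot Uprising": {"sci-fi": 1.0, "action": 0.9},
    "Courtroom Story": {"drama": 1.0, "legal": 0.9},
}


def cosine_similarity(a, b):
    """Cosine similarity between two sparse tag vectors stored as dicts."""
    dot = sum(weight * b.get(tag, 0.0) for tag, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similar_videos(title, catalog, top_n=2):
    """Rank every other title in the catalog by similarity to `title`."""
    target = catalog[title]
    scored = [
        (other, cosine_similarity(target, tags))
        for other, tags in catalog.items()
        if other != title
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]
```

Calling `similar_videos("Space Drama", VIDEO_TAGS)` ranks "Robot Uprising" ahead of "Courtroom Story", because the shared "sci-fi" tag carries more weight than the shared "drama" tag.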


To select which of these thousands of rows to display on the user’s homepage, Netflix uses one of three main approaches:

  • Row-ranking approach: This uses existing recommendation or learning-to-rank techniques to score each row independently and then ranks the rows by score.
  • Stage-wise approach: Here, rows are selected sequentially, starting from the first. The scores of the remaining rows are then recomputed to account for their relationship with the rows and items already chosen for the page.
  • Machine learning approach: This approach aims to create a scoring function by training a model on historical information about the homepages shown to users.
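The difference between the first two approaches can be sketched in a few lines of Python. This is an illustrative sketch only: the row structure, the precomputed `score` field, and the `overlap_penalty` parameter are assumptions made for the example, not details from Netflix. The machine learning approach is not shown, since it would replace the fixed score with a trained model.

```python
def rank_rows(rows, score_row, page_size):
    """Row-ranking: score each row independently and keep the best ones."""
    return sorted(rows, key=score_row, reverse=True)[:page_size]


def select_rows_stagewise(rows, score_row, page_size, overlap_penalty=0.5):
    """Stage-wise: pick rows one at a time, penalising videos already on the page."""
    remaining = list(rows)
    page, shown = [], set()
    while remaining and len(page) < page_size:

        def adjusted(row):
            # Re-score each candidate against what has already been placed.
            repeats = sum(1 for video in row["videos"] if video in shown)
            return score_row(row) - overlap_penalty * repeats

        best = max(remaining, key=adjusted)
        page.append(best)
        shown.update(best["videos"])
        remaining.remove(best)
    return page


def score(row):
    """Hypothetical relevance score, assumed to be precomputed per row."""
    return row["score"]


rows = [
    {"name": "Trending Now", "videos": ["A", "B", "C"], "score": 0.9},
    {"name": "Top Picks", "videos": ["A", "B", "D"], "score": 0.8},
    {"name": "Documentaries", "videos": ["E", "F", "G"], "score": 0.6},
]

row_ranked = [row["name"] for row in rank_rows(rows, score, 2)]
stagewise = [row["name"] for row in select_rows_stagewise(rows, score, 2)]
```

With this sample data, plain row ranking picks "Trending Now" and "Top Picks" even though they share two videos, while the stage-wise selection swaps the second row for "Documentaries" to avoid the duplicates.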

Personalisation is also applied to the image Netflix shows for each title it promotes. These images are called artwork. Computer vision algorithms scan each show to pick out candidate images, and the one that best matches the user’s preferences is displayed.
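One way to think about choosing among candidate artworks is as an explore-and-exploit problem. The sketch below uses a simple epsilon-greedy strategy as a stand-in; it does not cover the computer vision step that extracts candidate images, and the `ArtworkSelector` class, its parameters, and the idea of keeping one selector per audience segment are assumptions made for illustration.

```python
import random


class ArtworkSelector:
    """Epsilon-greedy choice among candidate artworks for one title."""

    def __init__(self, artworks, epsilon=0.1, rng=None):
        self.epsilon = epsilon
        self.rng = rng or random.Random()
        self.shows = {art: 0 for art in artworks}  # times each image was displayed
        self.plays = {art: 0 for art in artworks}  # times a display led to a play

    def play_rate(self, art):
        shown = self.shows[art]
        return self.plays[art] / shown if shown else 0.0

    def choose(self):
        # Explore a random image occasionally; otherwise exploit the best so far.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.shows))
        return max(self.shows, key=self.play_rate)

    def record(self, art, played):
        self.shows[art] += 1
        if played:
            self.plays[art] += 1
```

In use, a selector would call `choose()` when rendering the title, then `record()` with whether the member went on to play it, so that images with higher play rates are shown more often over time.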

Use of Python

The platform relies heavily on Python. Three main areas of application include:

Open Connect: This is Netflix’s purpose-built content delivery network (CDN), responsible for serving the video traffic. Many of the software systems required for the CDN’s infrastructure are written in Python. Python applications also manage the network devices, tracking their models and hardware components.

Demand Engineering: Built on Python, it is responsible for regional failovers, capacity operations, traffic management, and fleet efficiency of the Netflix cloud.

CORE: The CORE team uses Python for alerting and statistical analysis, depending on libraries such as NumPy, SciPy, and pandas to automate the analysis of thousands of related signals. Python helps in automation, data exploration, cleaning, and visualisation.
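As a rough illustration of what automated analysis of many related signals can look like with NumPy, the sketch below flags signals whose latest reading deviates sharply from their own history. The `flag_anomalous_signals` function, the z-score method, and the threshold are assumptions for this example and are not taken from Netflix's tooling.

```python
import numpy as np


def flag_anomalous_signals(signals, threshold=3.0):
    """Return indices of signals whose latest value deviates strongly from history.

    `signals` is a 2-D array: one row per signal, one column per time step.
    """
    signals = np.asarray(signals, dtype=float)
    history, latest = signals[:, :-1], signals[:, -1]
    mean = history.mean(axis=1)
    std = history.std(axis=1)
    # Avoid division by zero for perfectly flat signals.
    std = np.where(std == 0, 1e-9, std)
    z_scores = np.abs(latest - mean) / std
    return np.flatnonzero(z_scores > threshold)


readings = [
    [10, 11, 9, 10, 10],  # steady signal
    [10, 11, 9, 10, 50],  # sudden spike in the latest reading
]
anomalies = flag_anomalous_signals(readings).tolist()
```

Because the whole computation is vectorised, the same call works unchanged whether there are two signals or several thousand.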

Polynote

Developed by Netflix, Polynote is an open-source polyglot notebook with support for Scala. It allows smooth integration of JVM-based machine learning platforms with Python. Its key features include:

  • Insight into running tasks and kernel status
  • Simple dependency and configuration management
  • IDE-like features such as auto-complete, error highlighting, editing improvements, and data visualisation

Metacat

Netflix has numerous data stores with different data formats and large data volumes. The Netflix data warehouse consists of many data sets stored in Amazon S3, Elasticsearch, Druid, Redshift, and MySQL. It also supports Spark, Presto, Hive, and Pig for producing, processing, and consuming data sets.


Netflix has introduced Metacat to ensure the data platform can interoperate across diverse data sets as one single warehouse. The metadata service makes data easy to discover, process, and manage.
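The idea of a single metadata layer over several stores can be sketched as a small in-memory catalogue. This is a conceptual illustration only and does not reflect Metacat's actual API: the `TableMetadata` and `MetadataCatalog` classes, the `store/database/table` naming scheme, and the sample tables are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TableMetadata:
    store: str  # backing system, e.g. "redshift" or "mysql"
    database: str
    table: str
    columns: dict = field(default_factory=dict)  # column name -> type

    @property
    def qualified_name(self):
        return f"{self.store}/{self.database}/{self.table}"


class MetadataCatalog:
    """Single lookup point for table metadata across several data stores."""

    def __init__(self):
        self._tables = {}

    def register(self, metadata):
        self._tables[metadata.qualified_name] = metadata

    def get(self, qualified_name):
        return self._tables[qualified_name]

    def search(self, column_name):
        """Find every table, in any store, that has the given column."""
        return sorted(
            name
            for name, meta in self._tables.items()
            if column_name in meta.columns
        )


catalog = MetadataCatalog()
catalog.register(TableMetadata("redshift", "analytics", "plays", {"user_id": "bigint"}))
catalog.register(TableMetadata("mysql", "accounts", "members", {"user_id": "bigint"}))
catalog.register(TableMetadata("mysql", "accounts", "plans", {"plan_id": "int"}))
tables_with_user_id = catalog.search("user_id")
```

The point of the sketch is that a consumer asks one service where a column or table lives, rather than querying each store's own metadata separately.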


Shraddha Goled
I am a technology journalist with AIM. I write stories focused on the AI landscape in India and around the world with a special interest in analysing its long term impact on individuals and societies. Reach out to me at shraddha.goled@analyticsindiamag.com.
