Inside Netflix’s Recommendation Engine

Netflix began as a humble DVD rental service in 1997 and has since grown into an over-the-top (OTT) streaming giant with more than 207 million paid subscribers worldwide. At the height of the pandemic, Netflix US added up to 500 new shows and reeled in over 36 million new subscribers.

Despite its relatively high subscription prices, Netflix continues to gain traction. One of the major factors behind this growth is its data science strategy.

Recommendation system

Netflix first started using analytics tools in 2000 to recommend DVDs to users. Two decades later, the Netflix recommendation system is one of the most sophisticated in the business. The personalised recommendation algorithms drive customer retention, which Netflix estimates saves it about $1 billion annually. More than 80 percent of the shows people watch on Netflix are discovered through the platform’s recommendation system.

In an earlier interview, Todd Yellin, Netflix’s vice president of product innovation, said, “The three legs of this stool would be Netflix members; taggers who understand everything about the content; and our machine learning algorithms that take all of the data and put things together.”

The Netflix homepage on most devices is structured as a two-dimensional layout, with videos organised into coherent rows. The key to personalisation is which rows appear on the homepage and in what order. Netflix uses several ranking algorithms to populate these rows, including a personalised video ranker, a Top-N video ranker, a trending now ranker, a continue watching ranker, and a video-to-video similarity ranker. The output of each algorithm then goes through a row generation process that finds the videos fitting the appropriate row.
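The article does not say how the video-to-video similarity ranker is computed. As a rough illustration only, the sketch below ranks titles by cosine similarity over tag vectors; the titles, the tag columns, and the `most_similar` helper are all hypothetical and are not Netflix's method.

```python
import numpy as np

# Hypothetical catalogue: one row per title, one column per tag weight.
# Illustrative values only, not real Netflix data.
titles = ["Title A", "Title B", "Title C", "Title D"]
features = np.array([
    [1.0, 0.0, 1.0],
    [1.0, 0.0, 0.8],
    [0.0, 1.0, 0.0],
    [0.2, 1.0, 0.1],
])


def most_similar(index, features, top_n=2):
    """Return indices of the top_n titles most similar to features[index]."""
    # Normalise each row so a dot product equals cosine similarity.
    unit = features / np.linalg.norm(features, axis=1, keepdims=True)
    scores = unit @ unit[index]
    scores[index] = -np.inf  # never recommend the title itself
    order = np.argsort(scores)[::-1][:top_n]
    return [int(i) for i in order]


similar = most_similar(0, features)
print([titles[i] for i in similar])
```

Cosine similarity is used here because it ignores how many tags a title carries and compares only the direction of the tag vector; a production system would likely combine many more signals.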

To select which of the thousands of candidate rows to display on the user’s homepage, Netflix uses one of three main approaches:

  • Row-ranking approach: This uses existing recommendation or learning-to-rank techniques to score each row, and then ranks the rows by score.
  • Stage-wise approach: Here, rows are selected one at a time, starting from the first. The scores of the remaining rows are then recomputed to account for their relationship with the rows and items already chosen for the page.
  • Machine learning approach: This approach creates a scoring function by training a model on historical information about the homepages.
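The difference between the first two approaches can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the `Row` class, the relevance scores, and the `overlap_penalty` parameter are hypothetical, and the overlap-based rescoring is just one simple way to make a row's score depend on what is already on the page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    name: str
    relevance: float   # hypothetical per-user relevance score
    videos: frozenset  # ids of the videos the row would show


def rank_rows(rows, k):
    """Row-ranking: score every row independently and keep the top k."""
    return sorted(rows, key=lambda r: r.relevance, reverse=True)[:k]


def select_stagewise(rows, k, overlap_penalty=0.5):
    """Stage-wise: pick one row at a time, rescoring the rest each step."""
    remaining = list(rows)
    page, shown = [], set()
    while remaining and len(page) < k:
        def adjusted(row):
            # Penalise rows whose videos already appear on the page.
            overlap = len(row.videos & shown) / max(len(row.videos), 1)
            return row.relevance - overlap_penalty * overlap

        best = max(remaining, key=adjusted)
        page.append(best)
        shown |= best.videos
        remaining.remove(best)
    return page


rows = [
    Row("Trending Now", 0.90, frozenset({"a", "b", "c"})),
    Row("Top Picks", 0.85, frozenset({"a", "b", "d"})),
    Row("Continue Watching", 0.70, frozenset({"e", "f"})),
]

print([r.name for r in rank_rows(rows, 2)])
print([r.name for r in select_stagewise(rows, 2)])
```

With this toy data, row-ranking keeps the two highest-scoring rows even though they share most of their videos, while the stage-wise pass swaps the second one for a less relevant row that adds new titles.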

Personalisation also extends to the image, called artwork, that Netflix displays for each show it promotes. Computer vision algorithms scan the shows and pick out the image that best matches the user’s preferences.

Use of Python

The platform relies heavily on Python. Three main areas of application are:

Open Connect: This is Netflix’s purpose-built content delivery network (CDN), responsible for serving video traffic. The software systems that the CDN’s infrastructure requires are written in Python. Python applications also manage the models and hardware components.

Demand Engineering: Built on Python, it is responsible for regional failovers, capacity operations, traffic management, and fleet efficiency of the Netflix cloud.

CORE: Netflix’s alerting and statistical analysis team depends on libraries such as NumPy, SciPy, and pandas to automate the analysis of thousands of related signals. Python helps with automation, data exploration, cleaning, and visualisation.
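To give a flavour of this kind of signal analysis, here is a small pandas sketch that flags outliers in several metrics at once. It is an assumption-laden illustration, not the CORE team's tooling: the signal names, the synthetic data, and the z-score rule with its `threshold` are all made up for the example.

```python
import numpy as np
import pandas as pd

# Synthetic, deterministic signals (hypothetical metric names).
t = np.linspace(0.0, 20.0, 200)
signals = pd.DataFrame({
    "requests_per_sec": 100.0 + 10.0 * np.sin(t),
    "latency_ms": 50.0 + 5.0 * np.cos(t),
})
signals.loc[150, "latency_ms"] = 400.0  # inject one artificial spike


def flag_outliers(frame, threshold=4.0):
    """Mark samples more than `threshold` standard deviations from the mean."""
    z_scores = (frame - frame.mean()) / frame.std()
    return z_scores.abs() > threshold


outliers = flag_outliers(signals)
print(outliers.sum())                       # outlier count per signal
print(signals[outliers.any(axis=1)].index)  # rows containing any outlier
```

Because the arithmetic is vectorised over the whole DataFrame, the same function works unchanged whether the frame holds two signals or thousands.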

Polynote

Developed by Netflix, Polynote is an open-source polyglot notebook with first-class support for Scala. It allows smooth integration of JVM-based machine learning platforms with Python. Its features include:

  • Provides insight into tasks in execution and kernel status
  • Simple dependency and configuration management
  • Provides IDE-like features such as auto-complete, error highlighting, editing improvements, and data visualisation

Metacat

Netflix has numerous data stores with different data formats and large data volumes. The Netflix data warehouse consists of many data sets stored in Amazon S3, Elasticsearch, Druid, Redshift, and MySQL. The platform also supports Spark, Presto, Hive, and Pig for producing, processing, and consuming data sets.

Netflix introduced Metacat so that the data platform can interoperate across these diverse data stores as a single data warehouse. This metadata service makes data easy to discover, process, and manage.

Shraddha Goled
I am a technology journalist with AIM. I write stories focused on the AI landscape in India and around the world with a special interest in analysing its long term impact on individuals and societies. Reach out to me at shraddha.goled@analyticsindiamag.com.
