Netflix began as a humble DVD rental service in 1997 and has since grown into an over-the-top (OTT) streaming giant with more than 207 million paid subscribers worldwide. At the height of the pandemic, Netflix US added up to 500 new shows and reeled in over 36 million new subscribers.
Despite its relatively high subscription prices, Netflix continues to gain traction. One of the major factors behind this growth is its data science strategy.
Netflix first started using analytics tools in 2000 to recommend DVDs to users. Two decades later, the Netflix recommendation system is one of the most sophisticated in the business. The personalised recommendation algorithms drive customer retention, helping Netflix pocket profits to the tune of $1 billion annually. More than 80 percent of the shows people watch on Netflix are discovered through the platform’s recommendation system.
In an earlier interview, Todd Yellin, Netflix’s vice president of product innovation, said, “The three legs of this stool would be Netflix members; taggers who understand everything about the content; and our machine learning algorithms that take all of the data and put things together.”
The Netflix homepage on most devices is structured as a two-dimensional layout, with videos organised into coherent rows. The key to personalisation is how these rows are chosen and displayed on the homepage. Netflix uses several ranking algorithms to populate the rows, including a personalised video ranker, a Top-N video ranker, a trending now ranker, a continue watching ranker, and a video-to-video similarity ranker. The output of each algorithm then goes through a row generation process that finds the videos that fit the appropriate row.
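As a rough illustration of one of these rankers, the sketch below builds a video-to-video similarity row by comparing tag weights with cosine similarity. The catalogue, the tag weights, and the function names are invented for this example; it is not Netflix's actual data or implementation, which the article does not describe at this level.

```python
import math

# Hypothetical tag weights per title (assumed data, for illustration only).
CATALOGUE = {
    "Space Drama": {"sci-fi": 1.0, "drama": 0.8},
    "Robot Wars": {"sci-fi": 0.9, "action": 0.7},
    "Baking Night": {"cooking": 1.0, "reality": 0.6},
}


def cosine(a, b):
    """Cosine similarity between two sparse tag-weight dictionaries."""
    dot = sum(weight * b.get(tag, 0.0) for tag, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similar_row(seed, catalogue, n=2):
    """Return the n titles most similar to `seed`, most similar first."""
    seed_tags = catalogue[seed]
    scored = [
        (cosine(seed_tags, tags), title)
        for title, tags in catalogue.items()
        if title != seed
    ]
    # Sort by descending similarity, then by title to keep ties stable.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in scored[:n]]


print(similar_row("Space Drama", CATALOGUE))
```

Here "Robot Wars" ranks first because it shares the "sci-fi" tag with the seed title, while "Baking Night" shares no tags and scores zero.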
To select which of the thousands of candidate rows to display on a user’s homepage, Netflix uses one of three main approaches:
- Row-ranking approach: Existing recommendation or learning-to-rank techniques score each row, and the rows are then ranked by score.
- Stage-wise approach: Rows are selected sequentially, starting from the first. The remaining rows are then re-scored to take into account their relationship with the rows and items already chosen for the page.
- Machine learning approach: A scoring function is learned by training a model on historical homepage data.
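The difference between the first two approaches can be sketched in a few lines. The example below is a simplified illustration under stated assumptions: the row names, the per-video `relevance` scores, and the `duplicate_weight` discount are all hypothetical, and a row's score is simply the sum of its videos' relevance. In a machine learning approach, the hand-written `row_score` function would be replaced by a model trained on historical data.

```python
def row_score(videos, relevance, shown=frozenset(), duplicate_weight=1.0):
    """Sum video relevance, discounting videos already shown on the page."""
    return sum(
        relevance.get(video, 0.0) * (duplicate_weight if video in shown else 1.0)
        for video in videos
    )


def rank_rows(rows, relevance, k):
    """Row-ranking: score every row independently and keep the top k."""
    ordered = sorted(
        rows, key=lambda name: row_score(rows[name], relevance), reverse=True
    )
    return ordered[:k]


def stagewise_rows(rows, relevance, k, duplicate_weight=0.0):
    """Stage-wise: pick one row at a time, re-scoring the rest after each pick."""
    page, shown = [], set()
    remaining = set(rows)
    while remaining and len(page) < k:
        # sorted() makes tie-breaking deterministic (alphabetical by row name).
        best = max(
            sorted(remaining),
            key=lambda name: row_score(
                rows[name], relevance, shown, duplicate_weight
            ),
        )
        page.append(best)
        shown.update(rows[best])
        remaining.remove(best)
    return page


# Hypothetical data: per-video relevance for one member and candidate rows.
relevance = {"a": 0.9, "b": 0.8, "c": 0.5, "d": 0.4, "e": 0.3}
rows = {
    "Trending Now": ["a", "b"],
    "Top Picks": ["a", "b", "c"],
    "Continue Watching": ["d", "e"],
}

print(rank_rows(rows, relevance, 2))       # ['Top Picks', 'Trending Now']
print(stagewise_rows(rows, relevance, 2))  # ['Top Picks', 'Continue Watching']
```

Row-ranking places two heavily overlapping rows next to each other, whereas the stage-wise pass discounts videos already on the page and picks a row with fresh titles instead. The price is that every remaining row must be re-scored after each pick.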
Personalisation also extends to the images Netflix chooses for the shows it promotes. These images are called artwork. Computer vision algorithms scan the shows and pick out the image that best matches the user’s preferences.
Use of Python
The platform relies heavily on Python. Three main areas of application include:
Open Connect: Netflix’s purpose-built content delivery network (CDN), responsible for serving the video traffic. The software systems required for the CDN’s infrastructure are written in Python. Python applications also manage the models and hardware components.
Demand Engineering: Built on Python, it is responsible for regional failovers, capacity operations, traffic management, and fleet efficiency of the Netflix cloud.
CORE: Netflix’s alerting and statistical analysis team depends on libraries such as NumPy, SciPy, and pandas to automate the analysis of thousands of related signals. Python helps in automation, data exploration, cleaning, and visualisation.
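To give a sense of what automating the analysis of many signals can look like, here is a minimal sketch that flags signals whose latest reading deviates sharply from their recent history using a z-score. The signal names, sample values, and threshold are assumptions made up for the example, and the code uses only the standard library to stay dependency-free; a team working at scale would more likely vectorise this with NumPy or pandas.

```python
from statistics import mean, stdev


def anomalous_signals(history, latest, threshold=3.0):
    """Return {signal: z-score} for signals whose latest value is an outlier.

    history: signal name -> list of past readings
    latest:  signal name -> most recent reading
    """
    flagged = {}
    for name, values in history.items():
        if name not in latest or len(values) < 2:
            continue  # not enough data to estimate spread
        sigma = stdev(values)
        if sigma == 0:
            continue  # constant signal, z-score undefined
        z = (latest[name] - mean(values)) / sigma
        if abs(z) >= threshold:
            flagged[name] = round(z, 2)
    return flagged


# Hypothetical readings for two signals.
history = {
    "errors_per_min": [10, 12, 11, 9, 10],
    "latency_ms": [200, 210, 190, 205, 195],
}
latest = {"errors_per_min": 40, "latency_ms": 204}

print(anomalous_signals(history, latest))
```

With these sample values only `errors_per_min` is flagged, since its latest reading sits far outside its historical spread while `latency_ms` stays within about half a standard deviation of its mean.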
Developed by Netflix, Polynote is an open-source polyglot notebook with support for Scala. It allows smooth integration of JVM-based machine learning platforms with Python. Its features include:
- Provides insight into tasks in execution and kernel status
- Simple dependency and configuration management
- Provides IDE-like features such as auto-complete, error highlighting, editing improvements, and data visualisation
Netflix has numerous data stores with different data formats and large data volumes. The Netflix data warehouse consists of many data sets stored in Amazon S3, Elasticsearch, Druid, Redshift, and MySQL. The platform also supports Spark, Presto, Hive, and Pig for producing, processing, and consuming data sets.
Netflix has introduced Metacat to ensure the data platform can interoperate across diverse data sets as one single warehouse. The metadata service makes data easy to discover, process, and manage.
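The idea of a single metadata layer over many stores can be illustrated with a toy in-memory catalogue. This is only a conceptual sketch: the `TableInfo` and `MetadataCatalogue` classes, the naming scheme, and the sample tables are hypothetical and do not reflect Metacat's actual API or data model.

```python
from dataclasses import dataclass, field


@dataclass
class TableInfo:
    """Metadata for one data set, wherever it is physically stored."""
    store: str      # e.g. "hive", "redshift", "mysql"
    database: str
    table: str
    columns: dict = field(default_factory=dict)  # column name -> type
    tags: set = field(default_factory=set)

    @property
    def qualified_name(self):
        # One naming scheme across every backing store (assumed format).
        return f"{self.store}/{self.database}/{self.table}"


class MetadataCatalogue:
    """Hypothetical registry that makes data sets discoverable by name or tag."""

    def __init__(self):
        self._tables = {}

    def register(self, info):
        self._tables[info.qualified_name] = info

    def get(self, qualified_name):
        return self._tables[qualified_name]

    def find_by_tag(self, tag):
        return sorted(
            name for name, info in self._tables.items() if tag in info.tags
        )


catalogue = MetadataCatalogue()
catalogue.register(
    TableInfo("hive", "prod", "viewing_history", {"member_id": "bigint"}, {"playback"})
)
catalogue.register(
    TableInfo("redshift", "reports", "daily_plays", {"plays": "int"}, {"playback"})
)
catalogue.register(
    TableInfo("mysql", "billing", "invoices", {"amount": "decimal"}, {"finance"})
)

print(catalogue.find_by_tag("playback"))
```

The point of the sketch is that a consumer looks data sets up through one interface and one naming scheme, without needing to know in advance which store holds them.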