
All About Netflix’s Data Management Systems

Netflix utilises Alpakka-Kafka for its stream processing solutions.


Behind every blockbuster Netflix binge is a carefully managed data platform. It is hard to envision all the data and the systems that work to ensure the speed at which the platform updates series and TV shows for you, but Netflix’s engineering blogs give us a glimpse into the world of managing data at this scale.

In this article, we break down the company’s operations. 

Device Management Platform

The Netflix team has built a reliable device management tool at scale in support of two ongoing efforts. First, ensuring that the latest releases deliver the same quality of Netflix experience across different device types. Second, maintaining that quality bar while working with its partners to port the Netflix SDK to their devices. Netflix’s Device Management Platform is the infrastructural foundation of the Netflix Test Studio (NTS). It comprises a computing environment called the Reference Automation Environment (RAE) and its complementary software in the cloud.

The features:

  1. A service-level abstraction for controlling devices and their environments.
  2. Collection and aggregation of information and state updates for all devices attached to the RAEs in the fleet.

The Device Management Platform sources regular device update events through the control plane to the cloud, ensuring that the NTS stays current with information about the devices available for testing. The challenge it faces is ingesting and processing these events in a scalable manner.

Architecture

A Stream Processing Framework

Netflix utilises Alpakka-Kafka for its stream processing solutions. It provides advanced control over the streaming process, satisfies the system requirements (including the Netflix Spring integration), and the framework is lightweight, with less verbose code.

The construction of the Alpakka-based Kafka processing pipeline
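To make the pattern concrete, here is a minimal Alpakka-Kafka consumer sketch in Scala. The broker address, topic, group id and processEvent handler are illustrative stand-ins rather than Netflix’s actual code; the sketch only shows the committable-source pipeline style the framework encourages, where offsets are committed after processing.

```scala
import akka.Done
import akka.actor.ActorSystem
import akka.kafka.scaladsl.Consumer.DrainingControl
import akka.kafka.scaladsl.{Committer, Consumer}
import akka.kafka.{CommitterSettings, ConsumerSettings, Subscriptions}
import org.apache.kafka.common.serialization.StringDeserializer

import scala.concurrent.Future

object DeviceEventProcessor extends App {
  implicit val system: ActorSystem = ActorSystem("device-event-processor")
  import system.dispatcher

  // Hypothetical handler for one device update event.
  def processEvent(value: String): Future[Done] = Future.successful(Done)

  val consumerSettings =
    ConsumerSettings(system, new StringDeserializer, new StringDeserializer)
      .withBootstrapServers("localhost:9092") // assumption: a local broker
      .withGroupId("device-management")

  val control =
    Consumer
      .committableSource(consumerSettings, Subscriptions.topics("device-events"))
      .mapAsync(parallelism = 4) { msg =>
        // Process first, then surface the offset, so commits happen
        // only after processing succeeds (at-least-once delivery).
        processEvent(msg.record.value()).map(_ => msg.committableOffset)
      }
      .toMat(Committer.sink(CommitterSettings(system)))(DrainingControl.apply)
      .run()
}
```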

The Alpakka-Kafka-based processor has excelled at the three indicators of consumption performance: the fetch request rate, the max consumer lag, and the commit rate.

Fetch Request Metrics

Before the Alpakka-Kafka-based processor was deployed, the number of fetch calls remained unchanged across burst events but was otherwise quite unstable over time. After the deployment, the fetch calls followed a 1:1 correspondence with the Kafka topic’s message publication rate and remained stable over time.

The Alpakka-Kafka-based processor scales its Kafka consumption appropriately, ensuring that the system neither under- nor over-consumes Kafka messages.

Max Consumer Lag

The Kafka consumer lag metrics showed a significant improvement over the previous lag, which floated long-term at around 60,000 records; that delayed information updates long enough for users to notice. The Alpakka-Kafka-based processor has driven the average max consumer lag down to zero outside the burst event windows and roughly 20,000 records inside them.

Commit Rate  

Kafka consumers can perform manual or automatic offset commits when they fetch records. With auto commits, messages are acknowledged as ‘received’ as soon as they are fetched, irrespective of whether they have been processed. The Alpakka-Kafka-based processor lowered the commit rate from 7 kbytes/sec to 50 bytes/sec.
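Alpakka-Kafka makes that kind of reduction possible by batching offset commits rather than acknowledging every fetch. A sketch of the relevant settings (the batch size and interval are illustrative values, not Netflix’s):

```scala
import akka.kafka.CommitterSettings
import scala.concurrent.duration._

// Batch up to 1,000 offsets or flush every 10 seconds, whichever comes first.
// Fewer, larger commits mean far less commit traffic than per-record auto-commit.
val committerSettings =
  CommitterSettings(system) // the ActorSystem from the earlier sketch
    .withMaxBatch(1000L)
    .withMaxInterval(10.seconds)
```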

The Data Explorer

Netflix uses its Data Explorer to give engineers fast and safe access to data stored in Cassandra and Dynomite/Redis data stores.

Features of Data Explorer: 

  • Multi-Cluster Access

The Data Explorer gives users a single web portal for all of their data stores, increasing productivity. In a production environment with hundreds of clusters, this tool helps narrow the visible data stores to those the user is authorised to access.

  • Schema Designer

The schema designer for Cassandra lets users drag and drop their way to a new table instead of writing ‘CREATE TABLE’ statements, which users have found intimidating. With the schema designer, users can create a new table using any collection data type, then designate the partition key and clustering columns.
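For context, this is roughly the statement the schema designer spares users from writing by hand. The keyspace, table and columns are hypothetical; the sketch issues the CQL through the DataStax Java driver from Scala:

```scala
import com.datastax.oss.driver.api.core.CqlSession

val session = CqlSession.builder().build() // assumes a reachable local Cassandra node

// A hypothetical table: device_id is the partition key and
// event_time the clustering column, newest events first.
session.execute(
  """CREATE TABLE IF NOT EXISTS studio.device_events (
    |  device_id  text,
    |  event_time timestamp,
    |  payload    text,
    |  PRIMARY KEY ((device_id), event_time)
    |) WITH CLUSTERING ORDER BY (event_time DESC)""".stripMargin)
```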

  • Explore Netflix Data

Explore mode lets users execute point queries against their clusters, export result sets to CSV, or download them as CQL INSERT statements.
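A point query in this sense reads a single partition by its full key. A hypothetical example against the table sketched above, using a positional bind value:

```scala
import com.datastax.oss.driver.api.core.cql.SimpleStatement

// A single-partition "point" read, keyed on the full partition key.
val stmt = SimpleStatement.newInstance(
  "SELECT event_time, payload FROM studio.device_events WHERE device_id = ? LIMIT 1",
  "device-42")
val row = session.execute(stmt).one() // session from the earlier sketch
```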

  • Query IDE

While explore mode supports efficient point queries, Query mode goes one step further, providing a powerful CQL IDE with autocomplete and helpful snippets.

  • Dynomite and Redis Features

Along with Cassandra (C*), the Data Explorer has facilities for Dynomite and Redis users as well.

  • Key Scanning

Because Redis is an in-memory data store, the Data Explorer uses incremental SCAN operations across all nodes in the cluster so that key browsing does not strain the clusters.
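SCAN walks the keyspace a small batch at a time instead of blocking a node the way KEYS would. A minimal cursor loop with the Jedis client (host, pattern and batch size are illustrative):

```scala
import redis.clients.jedis.Jedis
import redis.clients.jedis.params.ScanParams

// Assumption: a single local node; a cluster-wide scan repeats this per node.
val jedis = new Jedis("localhost", 6379)

var cursor = ScanParams.SCAN_POINTER_START // "0"
do {
  // Each SCAN call returns a small batch of keys plus the next cursor,
  // so the server is never blocked on a full keyspace traversal.
  val result = jedis.scan(cursor, new ScanParams().`match`("device:*").count(100))
  result.getResult.forEach(key => println(key))
  cursor = result.getCursor
} while (cursor != ScanParams.SCAN_POINTER_START)
```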

The team has also codified their best practices to support various OSS environments and built several adapter layers into the product so that custom implementations can be slotted in. In addition, they enabled OSS use by introducing seams where users can provide their own implementations for discovery, access control, and data store-specific connection settings.

The Data Explorer has an overridable configuration that provides mechanisms to override the defaults and specify custom values for different production environments. The CLI setup tool provides a series of prompts to improve the experience of creating a configuration file. “The CLI tool is the recommended approach for building your configuration files, and you can re-run the tool at any point to create a new configuration,” the team wrote in a blog post.

Data Mesh and Data Movement in Netflix Studio

Netflix is moving more and more towards creating all of its original content through Netflix Studio; after a pitch, a series goes through several phases. This poses the challenge of providing visibility of Studio data across all the stages.

The Data Mesh Platform

Data Mesh is a fully managed, streaming data pipeline built to enable Change Data Capture (CDC) use cases. Data Mesh allows users to create sources and construct pipelines. Its drag-and-drop, self-service user interface lets users explore sources and create pipelines without having to manage and scale complex data streaming infrastructure.

This platform enables data movement in Netflix Studio through a configuration-driven approach, decreasing the lead time when creating a new pipeline. Its offerings also include end-to-end schema evolution, a self-serve UI, and secure data access.

The Data Mesh platform powers the data movement across Netflix Studio applications, which expose GraphQL queries via Studio Edge. A Change Data Capture (CDC) source connector reads from the studio applications’ database transaction logs and emits the change events. These events are passed on to a Data Mesh processor, which issues GraphQL queries to Studio Edge and lands the data in Iceberg tables in the Netflix Data Warehouse. From there, the tables support ad-hoc or scheduled querying and reporting.
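As a rough sketch of that enrichment step: a processor receives a CDC event carrying an entity id and asks Studio Edge for the full entity via GraphQL. The endpoint, query shape and field names below are invented for illustration; only the mechanics of posting a GraphQL query over HTTP are generic:

```scala
import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}

// Hypothetical enrichment step: look up the full entity for the id
// carried by a CDC change event. Endpoint, query and fields are invented.
def enrich(entityId: String): String = {
  val gql  = "query($id: ID!) { movie(id: $id) { title productionPhase } }"
  val body = s"""{"query": "$gql", "variables": {"id": "$entityId"}}"""
  val request = HttpRequest.newBuilder()
    .uri(URI.create("https://studio-edge.example.com/graphql")) // placeholder URL
    .header("Content-Type", "application/json")
    .POST(HttpRequest.BodyPublishers.ofString(body))
    .build()
  // The JSON response would then be written onward, e.g. into an Iceberg table.
  HttpClient.newHttpClient()
    .send(request, HttpResponse.BodyHandlers.ofString())
    .body()
}
```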

Data Consumption

Netflix Studio partners rely on data for decision making and collaboration during the production phases, for which the Studio Tech Solutions team has to provide real-time reports.

Genesis is a semantic data layer created by the team to map the data points in Data Source Definitions and generate the SQL for the trackers. Genesis joins, aggregates, formats and filters data based on what is available in the Data Source Definitions. Genesis currently powers 240+ trackers.
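To make the idea concrete, here is a deliberately simplified sketch of how a semantic layer of this kind might render a tracker’s SQL from a declarative definition. The case classes, fields and table are all invented, and Genesis itself is far richer (joins, aggregation, formatting):

```scala
// Hypothetical, minimal model of a data source definition.
case class FieldDef(name: String, expr: String)
case class DataSourceDef(table: String, fields: Seq[FieldDef], filter: Option[String])

// Render the tracker's SELECT statement from the definition.
def renderSql(d: DataSourceDef): String = {
  val select = d.fields.map(f => s"${f.expr} AS ${f.name}").mkString(", ")
  val where  = d.filter.map(f => s" WHERE $f").getOrElse("")
  s"SELECT $select FROM ${d.table}$where"
}

val trackerDef = DataSourceDef(
  table  = "studio.production_schedule",
  fields = Seq(FieldDef("title", "show_title"), FieldDef("phase", "production_phase")),
  filter = Some("production_phase = 'post_production'")
)

// Prints: SELECT show_title AS title, production_phase AS phase
//         FROM studio.production_schedule WHERE production_phase = 'post_production'
println(renderSql(trackerDef))
```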

The generated queries are used for multiple trackers in Workflow Definitions to create data movement workflows managed through the Netflix Big Data Scheduler, powered by Titus. The scheduler executes the queries and moves the results to a data tool – a Google Sheets tab, an Airtable base, or a Tableau dashboard.


Now, do you finally comprehend the price of binge-watching? Tons and tons of data management programmes.
