“As opposed to tables and columns, Airbnb’s platform concentrates on metrics and dimensions.”
For the past couple of years, Airbnb has exemplified what a successful data-driven company should look like. It switched from Chronos to Apache Airflow to orchestrate its workflows and invested in developing a collection of critical data tables known as ‘core_data’. With ‘core_data’ serving as the cornerstone, analytics at Airbnb blossomed. Airbnb built and scaled its experimentation platform so that it could introduce A/B testing across the company. Furthermore, a Data portal was established to help organise and document the company’s data as it prepared to launch Apache Superset, an open-source project. To serve internal users’ end-to-end needs, however, the company needed a comprehensive data platform. This need led to the construction of Minerva, Airbnb’s metric platform.
Building on its experience of managing a metric repository for experimentation, the development team at Airbnb adopted six design principles for Minerva: it should be Standardised, Declarative, Scalable, Consistent, Highly Available, and Well Tested.
How Minerva was conceived
As part of its commitment to a single-source-of-truth metric platform, Airbnb invested significantly in developing Minerva, a platform that standardises the way metrics are created, computed, served, and consumed. While ‘core_data’ brought step-change improvements to Airbnb’s data capabilities, they came at a substantial cost. By adopting ‘core_data’ as the standard, users could easily discover which tables they needed, which led to uniform table usage. On the other hand, it placed a heavy burden on the centralised data engineering team, which had to vet and onboard an endless supply of new datasets, even when the tables an application needed were already in place. Pipelines also multiplied downstream of ‘core_data’, which led to metric proliferation and divergence. As a result, data scientists and engineers spent countless hours tracking down and fixing the resulting errors.
This is where Minerva came in. As opposed to tables and columns, Minerva concentrates on metrics and dimensions. By acting as the interface between upstream data engineering teams and downstream consumption teams, the Minerva API allows core tables to be updated more flexibly while still supporting many downstream consumers.
When a metric is defined in Minerva, for example, its author must supply the necessary metadata. Every configuration file requires information such as ownership, lineage, and a description of the metric. In the past, such metadata lived only as internal institutional knowledge or in chart definitions scattered across a variety of business intelligence (BI) tools. In Minerva, every definition is treated as a version-controlled piece of code.
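To make the idea of mandatory metadata concrete, here is a minimal sketch of the kind of static check such a definition could go through. It is not Minerva's actual configuration format: the field names, the `validate_metric` helper, and the sample values are all hypothetical, chosen only to mirror the ownership, lineage, and description requirements mentioned above.

```python
from typing import Dict, List

# Hypothetical required fields, modelled on the metadata named in the text.
REQUIRED_FIELDS = ("name", "owner", "description", "lineage")


def validate_metric(config: Dict[str, object]) -> List[str]:
    """Return a list of problems; an empty list means the config passes."""
    problems = []
    for key in REQUIRED_FIELDS:
        # Absent keys and empty values (None, "", []) are treated alike.
        if not config.get(key):
            problems.append(f"missing required field: {key}")
    return problems


# Illustrative definition only; every name and value here is invented.
bookings = {
    "name": "bookings",
    "owner": "analytics-team",
    "description": "Number of confirmed bookings.",
    "lineage": ["core_data.bookings"],
}

print(validate_metric(bookings))              # complete definition
print(validate_metric({"name": "bookings"}))  # owner, description, lineage missing
```

Because a definition like this is plain data kept under version control, a check of this sort can run automatically on every proposed change before it is reviewed.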
Advantages of Minerva
Minerva is built on top of open-source projects. It uses Apache Airflow, a workflow management framework released under the Apache licence, to build, monitor, and adjust its data pipelines, and all workflows on the platform are written in Python. Computation runs on Apache Hive and Apache Spark, while consumption goes through Presto and Apache Druid. Minerva manages the entire life cycle of a metric, from creation to computation, serving, consumption, and eventual deprecation. It keeps critical business KPIs, dimensions, and other metadata in a central GitHub repository that is accessible to anybody within the company.
Its development workflow enforces data engineering best practices such as code review, static validation, and test runs. Its computation pipeline has built-in data quality checks and can automatically self-heal after job failures. Minerva denormalises data efficiently by maximising the reuse of data and intermediate joined results. It provides a unified data API that serves both aggregated and raw data on demand. Minerva also uses version control to manage data definitions, data management, and data retention.
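The combination of quality checks and self-healing can be pictured with a small sketch. This is a generic retry-with-validation pattern, not a description of how Minerva's pipeline is implemented: `run_with_retries`, `flaky_job`, and the quality check are hypothetical names invented for illustration.

```python
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def run_with_retries(
    job: Callable[[], T],
    check: Callable[[T], bool],
    max_attempts: int = 3,
) -> Tuple[T, int]:
    """Run `job` until it succeeds and its output passes `check`.

    Returns the output together with the number of attempts used.
    """
    last_error: Exception = RuntimeError("job was never attempted")
    for attempt in range(1, max_attempts + 1):
        try:
            result = job()
        except Exception as exc:  # a failed run triggers another attempt
            last_error = exc
            continue
        if check(result):
            return result, attempt
        last_error = ValueError("data quality check failed")
    raise RuntimeError(f"gave up after {max_attempts} attempts") from last_error


# Simulated job that fails once and then succeeds.
calls = {"count": 0}


def flaky_job():
    calls["count"] += 1
    if calls["count"] == 1:
        raise IOError("simulated job failure")
    return [3, 5, 8]


output, attempts_used = run_with_retries(flaky_job, check=lambda r: len(r) > 0)
print(output, attempts_used)
```

The point of the pattern is that a transient failure or a bad output does not need a person to step in; the run is simply repeated until the result passes the check or the attempt budget runs out.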
Through Minerva’s API, Minerva data is made available to Airbnb’s in-house R and Python clients, which simplifies analysis in a notebook environment. The notebook environment produces and exposes the same data as Superset and Metric Explorer. What makes this data API noteworthy is that it allows internal tools to be built on top of it and then made available to the entire company.
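As a rough illustration of what working with metrics and dimensions (rather than tables and columns) looks like from a notebook, consider the sketch below. It does not use Minerva's real client, whose interface is not described here: `aggregate_metric` and the sample records are hypothetical and run entirely in memory.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Sequence, Tuple


def aggregate_metric(
    rows: Iterable[Mapping[str, object]],
    metric: str,
    dimensions: Sequence[str],
) -> Dict[Tuple[object, ...], float]:
    """Sum `metric` over `rows`, grouped by the requested dimensions."""
    totals: Dict[Tuple[object, ...], float] = defaultdict(float)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        totals[key] += float(row[metric])
    return dict(totals)


# Invented sample records standing in for data served by a metrics API.
events = [
    {"country": "US", "stay_type": "long_term", "bookings": 3},
    {"country": "US", "stay_type": "short_term", "bookings": 2},
    {"country": "FR", "stay_type": "long_term", "bookings": 4},
]

by_country = aggregate_metric(events, "bookings", ["country"])
by_stay_type = aggregate_metric(events, "bookings", ["stay_type"])
print(by_country)
print(by_stay_type)
```

The caller names a metric and the dimensions to cut it by, and never needs to know which underlying tables or joins produce the numbers.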
When COVID-19 froze global travel in March 2020, Airbnb suffered considerably. Reservations fell quickly and cancellations rose. With Minerva, Airbnb was able to drastically cut the time from data curation to insight. Analysis of Minerva data also helped the company anticipate and plan confidently for the rapidly shifting market landscape, and respond to the shift in demand towards local and long-term stays.