Why big tech firms open source their feature stores

Feature stores are a management layer for machine learning; they enable the sharing and discovery of features, helping users build better machine learning pipelines.

Earlier this month, LinkedIn open-sourced Feathr, its feature store built to simplify machine learning feature management and improve developer productivity. With this announcement, LinkedIn’s Feathr joins Hopsworks, Feast and other popular open-source feature stores.

For the uninitiated, feature stores are a management layer for machine learning. They enable teams to share and discover features, which helps users build better machine learning pipelines and deploy machine learning applications faster.


In a positive development, a growing number of companies are open sourcing their feature stores. What does this mean for the machine learning community?

Why do we need feature stores?

At large companies like LinkedIn, hundreds of machine learning models are running at any given point in time. Maintaining the feature preparation pipelines behind them is difficult and resource-intensive. In the long run, this hampers productivity, innovation and further improvement of the application. Feature stores offer three advantages in particular.

Firstly, they enable the reuse of features across the company. As LinkedIn noted in its blog announcing Feathr, the cost of building and maintaining feature pipelines is borne redundantly across different teams in the organisation, since each of them has its own pipelines. The complexity of each pipeline then increases as new features and capabilities are added.

Without a common abstraction for features, there is no uniform way to name features across models, no common type system for features, and no standard way to deploy and serve features in production. This brings us to the second advantage of feature stores – they make it simple to standardise feature definitions and naming conventions.

Lastly, feature stores help businesses achieve consistency between the features computed offline, when models are built, and those served online, when models are finally deployed. Apart from managing features, the other main task of a feature store is to serve them to models, either online (for real-time prediction) or offline (during batch predictions or training). The offline mode requires generating a large dataset of many records across different features and providing it to the model for training or prediction, while the online mode requires generating features for one or a few records for real-time predictions. While these two use cases may require different architectures, it is the job of the feature store to abstract away this complexity and offer a simple serving layer.
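The idea of one feature definition backing both serving modes can be sketched in a few lines of plain Python. This is a hypothetical, minimal illustration (the class and feature names below are invented, not from any real feature store): a single registry of feature transforms serves the online path (one record, low latency) and the offline path (a batch for training), which is what keeps the two consistent.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class Feature:
    name: str
    # Transform that computes the feature value from a raw record.
    transform: Callable[[Dict[str, Any]], Any]


class MiniFeatureStore:
    def __init__(self) -> None:
        self._features: Dict[str, Feature] = {}

    def register(self, feature: Feature) -> None:
        # A single registered definition is reused offline and online,
        # so the same logic produces training and serving values.
        self._features[feature.name] = feature

    def get_online_features(self, record: Dict[str, Any], names: List[str]) -> Dict[str, Any]:
        # Online path: compute features for one record at prediction time.
        return {n: self._features[n].transform(record) for n in names}

    def get_offline_features(self, records: List[Dict[str, Any]], names: List[str]) -> List[Dict[str, Any]]:
        # Offline path: apply the very same definitions over a batch for training.
        return [self.get_online_features(r, names) for r in records]


store = MiniFeatureStore()
store.register(Feature("total_spend", lambda r: sum(r["purchases"])))
store.register(Feature("num_orders", lambda r: len(r["purchases"])))

online = store.get_online_features({"purchases": [10, 20]}, ["total_spend", "num_orders"])
print(online)  # {'total_spend': 30, 'num_orders': 2}
```

Real feature stores add storage, time-travel and low-latency lookup on top of this, but the core guarantee – one definition, two serving paths – is the same.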

Open source feature stores

Two of the most popular open-source feature stores are Feast and Hopsworks.

Feast was developed jointly by GO-JEK and Google Cloud for teams to manage, store and discover features for use in machine learning projects. By open-sourcing Feast, GO-JEK and Google Cloud aimed to solve a set of common challenges faced by machine learning engineering teams by providing an open, extensible and uniform platform for feature storage. Feast provides a centralised platform to standardise the definition, access and storage of features for training and serving, and acts as a bridge between data engineering and machine learning.
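To give a flavour of what standardised feature definitions look like, the sketch below follows the declarative style of Feast's Python API. It is a hypothetical feature repository fragment: the source path, entity and field names are invented for illustration, and the exact class signatures vary between Feast versions.

```python
from datetime import timedelta

from feast import Entity, FeatureView, Field, FileSource
from feast.types import Float32, Int64

# Hypothetical batch source; the parquet path and timestamp column are examples.
driver_stats_source = FileSource(
    path="data/driver_stats.parquet",
    timestamp_field="event_timestamp",
)

# The entity is the join key that online lookups and offline joins share.
driver = Entity(name="driver_id")

# One named, typed definition serves both training and online prediction.
driver_hourly_stats = FeatureView(
    name="driver_hourly_stats",
    entities=[driver],
    ttl=timedelta(days=1),
    schema=[
        Field(name="avg_trip_rating", dtype=Float32),
        Field(name="trips_today", dtype=Int64),
    ],
    source=driver_stats_source,
)
```

Because the definition names the features, fixes their types and identifies the entity, any team in the organisation can discover and reuse them instead of rebuilding its own pipeline.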


Towards the end of 2020, Tecton (an enterprise feature store platform founded by developers of Uber’s Michelangelo) announced that it would become a core contributor to Feast’s open-source platform. In the blog announcing this, the folks at Tecton wrote that feature stores have largely been accessible only to large, sophisticated technology companies with the skills and resources to build their own infrastructure, while for the rest of the industry they have remained out of reach. “Our objective is to change this, and to put feature stores in reach of every organisation, regardless of their ML maturity and available commercial resources,” they wrote. Other major companies contributing to or running Feast include Salesforce, IBM, Redis and Shopify, among others.

The other leading open-source feature store is Hopsworks. It started as a collaborative project between KTH University, RISE and Logical Clocks. It allows developers to develop and run Python, Spark and Flink applications. Users can build production pipelines with the bundled Airflow, train models on as many GPUs as are installed in a Hopsworks cluster, and share them with other users. Several funding bodies have contributed to its development, including the European Commission (FP7, H2020), EIT, SSF, Vinnova and Celtic-Next.


Other companies to follow suit?

Many other leading companies have invested in developing their own feature stores, but unlike LinkedIn’s, these remain proprietary. Interestingly, it was Uber that first introduced the concept of a feature store in 2017. Its feature store is currently called Michelangelo Palette (along the lines of its machine learning platform, Michelangelo). Other popular proprietary feature stores include Vertex AI Feature Store from Google, Databricks Feature Store from Databricks, and SageMaker Feature Store from AWS.

Given how open source is ruling the roost, one may expect the above-mentioned organisations to open source their feature stores soon.

Shraddha Goled
I am a technology journalist with AIM. I write stories focused on the AI landscape in India and around the world with a special interest in analysing its long term impact on individuals and societies. Reach out to me at shraddha.goled@analyticsindiamag.com.
