All About Lyft’s ML Architecture

Launched three years after Uber, Lyft expanded from 60 US cities in 2016 to more than 600 cities in 2021. With about 25 percent of the US ride-hailing market, Lyft is the second most used ride-hailing app in the country.

Machine learning forms the app’s backbone: it drives driver-passenger matching, pricing, rider incentives, fraud detection, route planning and automated support.


The ML models for these use cases need features that are computed through batch jobs on the data warehouse or from event streams. For instance, a ‘cancel model’ predicts whether a user is likely to cancel a ride after requesting it on the app. One of its inputs, the user’s cancellation history, is computed by a batch job on the Hive data warehouse.

Lyft’s Feature Service comprises Feature Definitions, Feature Ingestion & Processing, and Retrieval.

Source: Lyft

‘We decided on SQL as the language for feature definitions. It is expected to have one column designated to be an entity ID. The rest of the columns are features whose complexity ranges from querying a single table to a few 1000s of lines of SQL comprising complex joins across multiple tables’, said Vinay Kakade, a machine learning advisor at Lyft.  For ease of managing a large number of features, multiple features could be grouped into a feature group, he added.
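To make the shape of such a feature definition concrete, here is a minimal sketch of a feature group for the cancel model described earlier. It is only an illustration: the table `ride_requests`, its columns, and the feature names are hypothetical, and an in-memory SQLite database stands in for the Hive warehouse. What it does follow from the description above is the contract: one query, one column designated as the entity ID, and every other column treated as a feature.

```python
import sqlite3

# Hypothetical feature group. Table, column and feature names are assumptions
# made for illustration; they are not Lyft's actual schema.
FEATURE_GROUP_SQL = """
SELECT
    rider_id           AS entity_id,        -- the designated entity ID column
    COUNT(*)           AS rides_requested,  -- feature
    SUM(was_cancelled) AS rides_cancelled,  -- feature
    AVG(was_cancelled) AS cancel_rate       -- feature
FROM ride_requests
GROUP BY rider_id
ORDER BY rider_id
"""


def compute_feature_group(conn):
    """Run the feature definition and return {entity_id: {feature: value}}."""
    cursor = conn.execute(FEATURE_GROUP_SQL)
    columns = [description[0] for description in cursor.description]
    features = {}
    for row in cursor:
        record = dict(zip(columns, row))
        entity_id = record.pop("entity_id")  # everything left is a feature
        features[entity_id] = record
    return features


def build_demo_connection():
    """Create a tiny in-memory table standing in for the data warehouse."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ride_requests (rider_id TEXT, was_cancelled INTEGER)"
    )
    conn.executemany(
        "INSERT INTO ride_requests VALUES (?, ?)",
        [("r1", 1), ("r1", 0), ("r1", 0), ("r1", 1), ("r2", 0)],
    )
    return conn


if __name__ == "__main__":
    for entity_id, feats in compute_feature_group(build_demo_connection()).items():
        print(entity_id, feats)
```

Grouping the three columns in one query mirrors the idea of a feature group: features that share an entity and a source are defined and computed together.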


In 2020, Lyft open-sourced Flyte, a cloud-native machine learning and data processing platform. This workflow automation platform handles production model training and data processing across teams.

The platform takes care of concerns such as hardware provisioning, scheduling, data storage and monitoring for Lyft. Flyte manages over 10,000 unique workflows at Lyft, totalling over 1,000,000 executions every month, 20 million tasks and 40 million containers. The platform uses protocol buffers as the specification language for workflows and functions. In addition, Flyte comes with Flytekit, a Python SDK for developing applications on Flyte, be it authoring workflows or tasks.

Flyte is built on top of Kubernetes, which brings benefits such as portability, scalability and reliability. In addition, all entities in Flyte are immutable, so every change is explicitly captured as a new version, making it easy and efficient for developers to iterate, experiment and roll back workflows.
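The versioning idea can be illustrated with a small sketch in plain Python. This is not Flyte’s API: `WorkflowVersion` and `WorkflowRegistry` are hypothetical names invented here to show why immutable entities make rollback cheap. A registered version can never be edited or overwritten, so rolling back is just a matter of pointing at an older version that is guaranteed to be unchanged.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class WorkflowVersion:
    """An immutable workflow definition (hypothetical, for illustration)."""
    name: str
    version: str
    steps: Tuple[str, ...]


class WorkflowRegistry:
    """Stores every version ever registered and tracks the active one."""

    def __init__(self) -> None:
        self._versions: Dict[Tuple[str, str], WorkflowVersion] = {}
        self._active: Dict[str, str] = {}

    def register(self, name: str, version: str, steps: Iterable[str]) -> WorkflowVersion:
        key = (name, version)
        if key in self._versions:
            # A change must be captured as a new version, never an overwrite.
            raise ValueError(f"{name}:{version} already exists")
        workflow = WorkflowVersion(name, version, tuple(steps))
        self._versions[key] = workflow
        self._active[name] = version
        return workflow

    def rollback(self, name: str, version: str) -> WorkflowVersion:
        if (name, version) not in self._versions:
            raise KeyError(f"{name}:{version} was never registered")
        self._active[name] = version
        return self._versions[(name, version)]

    def active(self, name: str) -> WorkflowVersion:
        return self._versions[(name, self._active[name])]


if __name__ == "__main__":
    registry = WorkflowRegistry()
    registry.register("train_cancel_model", "v1", ["load", "train"])
    registry.register("train_cancel_model", "v2", ["load", "clean", "train"])
    print(registry.active("train_cancel_model").version)
    registry.rollback("train_cancel_model", "v1")
    print(registry.active("train_cancel_model").steps)
```

Because `WorkflowVersion` is a frozen dataclass, any attempt to modify a registered version raises an error, which is what lets experiments on a new version proceed without putting earlier ones at risk.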

Lyft also runs its Apache Flink streaming applications on Kubernetes. “With Kubernetes, we deploy each application in its separate cluster. It has its own dedicated job manager pods and task manager pods. We built something called a Flink Kubernetes Operator at Lyft, which is available in open-source. With Flink Kubernetes Operator, you can think of it as a cron job,” said Sherin Thomas, senior software engineer at Lyft, at the QCon London 2020 conference.

Lyft also uses AWS to detect anomalies in application metrics that operate on multiple timescales. In addition, it uses Anodot, an AI-powered time-series analytics solution built on AWS, to identify potential problems and detect incidents.
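The article does not describe how these systems work internally, so the following is only a generic sketch of what anomaly detection over multiple timescales can look like. It uses a rolling z-score, a common baseline technique chosen here for illustration and not the method used by AWS or Anodot. The function names, window sizes and threshold are all assumptions.

```python
from statistics import mean, pstdev


def rolling_zscore_anomalies(values, window, threshold=3.0):
    """Return indices whose value deviates from the preceding `window`
    points by more than `threshold` standard deviations."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = pstdev(history)
        if sigma == 0:
            # Flat history: any change at all counts as a deviation.
            if values[i] != mu:
                anomalies.append(i)
            continue
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


def detect_multiscale(values, windows=(5, 8), threshold=3.0):
    """Run the same detector over several window sizes (timescales)."""
    return {w: rolling_zscore_anomalies(values, w, threshold) for w in windows}


if __name__ == "__main__":
    # Hypothetical metric, e.g. requests per minute, with one obvious spike.
    series = [10, 11, 10, 9, 10, 11, 10, 9, 10, 50, 10, 11]
    for window, indices in detect_multiscale(series).items():
        print(f"window={window}: anomalous indices {indices}")
```

Short windows react quickly to sudden spikes, while longer windows give a more stable baseline; comparing the two is one simple way to view the same metric on different timescales.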

Source: Lyft

Lyft vs Uber

While Uber has 91 million monthly riders, Lyft comes second with 30 million riders. Uber has become synonymous with ride-hailing, with a presence in 63 countries, whereas Lyft operates only in the US and Canada.

Lyft uses Amazon Web Services, whereas Uber takes a classic hybrid cloud approach, with multiple vendors and co-located facilities.

Uber uses data science and algorithms to solve pain points ranging from menus to automatic driver license approval, crash detection and improved GPS. Uber’s AI predicts supply and demand by evaluating distance, time, weather and traffic. Real-time algorithms power its payment, routing and marketing technologies across Uber and its Eats and Freight services.

Lyft’s ML mines historical data to offer competitive prices during high demand, predict driver availability, smooth the ride experience for customers and optimise routes.

On the open-source ML tooling side, Uber’s counterpart is Manifold, a model-agnostic visual debugging tool for machine learning. Unlike Flyte, which orchestrates workflows, Manifold is used to identify performance issues across ML models.

Lyft is bullish on autonomous driving with a two-pronged AI strategy. The company has also acquired computer vision technology developer Blue Vision Labs to push its self-driving ambitions.

Copyright Analytics India Magazine Pvt Ltd
