Databricks Releases Open Source Machine Learning Platform MLflow Aimed To Standardize ML Workflows

San Francisco company Databricks launched MLflow to simplify ML lifecycle

San Francisco-headquartered Databricks, which provides a unified analytics platform, has released MLflow, a new open source project that aims to standardize the complex processes machine learning engineers face while building, testing, and deploying machine learning models. Announcing the release, CTO Matei Zaharia, who is also the creator of Apache Spark, noted that although a number of open source tools cover each phase of the machine learning lifecycle, such as data preparation and model training, it remains hard to track experiments and reproduce results.

In a keynote address, Zaharia observed that the machine learning development lifecycle is highly complex and that developers face many issues that are usually absent from a traditional software development lifecycle.

Zaharia Listed A Few Pain Points Developers Face In Building ML Models:

  • The number of tools has grown: Zaharia noted that, unlike traditional software development, where teams select one tool for each phase, machine learning engineers end up testing every available algorithm to see whether it improves results. As a result, developers use dozens of libraries.
  • Reproducing results: Reproducing a machine learning workflow by retracing its steps is extremely difficult. For example, if you have to debug a problem, it can be hard to go back to past work.
  • Productionizing ML models: A big challenge developers face is moving a model to production, because there is no set way to move models from a library to any of the available deployment tools, Zaharia emphasises in the post.

MLflow Open Source Project Provides A Standardized Format For Training & Deployment

MLflow, currently in alpha, is designed to manage the entire machine learning lifecycle and lets developers work with any machine learning library. It offers three components: MLflow Tracking, to record and query experiments; MLflow Projects, a standardized format for packaging reusable code; and MLflow Models, a format for packaging models for deployment. Speaking on the sidelines of the release at Databricks’ Spark and AI Summit in San Francisco, Zaharia observed that MLflow standardises the training and deployment loop. “As long as developers work within the platform, if you are building models with these tools, you can deploy and productionize it thereby saving a lot of time,” he said.
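To make "record and query experiments" concrete, here is a minimal sketch of the idea using only the Python standard library. It is not MLflow's API: the `RunLog` class, its methods, the JSON-lines file layout, and the sample parameters and metrics are all hypothetical, chosen only to illustrate how logging parameters and metrics per run makes past work retraceable.

```python
import json
import tempfile
import time
import uuid
from pathlib import Path


class RunLog:
    """Hypothetical tracker (not MLflow): one JSON record per run, appended to a file."""

    def __init__(self, path):
        self.path = Path(path)

    def record(self, params, metrics):
        # Store parameters and metrics together so a run can be retraced later.
        run = {
            "run_id": uuid.uuid4().hex,
            "timestamp": time.time(),
            "params": params,
            "metrics": metrics,
        }
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(run) + "\n")
        return run["run_id"]

    def runs(self):
        # Read every recorded run back from disk.
        if not self.path.exists():
            return []
        with self.path.open(encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]

    def best(self, metric, higher_is_better=True):
        # Query: return the run with the best value for one metric, or None.
        candidates = [r for r in self.runs() if metric in r["metrics"]]
        if not candidates:
            return None
        pick = max if higher_is_better else min
        return pick(candidates, key=lambda r: r["metrics"][metric])


def demo():
    # Illustrative values only; they are not results from any real experiment.
    with tempfile.TemporaryDirectory() as tmp:
        log = RunLog(Path(tmp) / "runs.jsonl")
        log.record({"model": "logreg", "C": 1.0}, {"accuracy": 0.81})
        log.record({"model": "forest", "trees": 100}, {"accuracy": 0.86})
        return log.best("accuracy")["params"]


if __name__ == "__main__":
    print(demo())
```

Because every run's parameters sit next to its metrics, a question such as "which settings produced the best accuracy?" becomes a lookup rather than an exercise in memory, which is the pain point a tracking component is meant to address.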

Since it is an open source platform, developers from across the globe can contribute to MLflow and, if they choose to open source their code, share workflows and ML models. The platform’s open interface is a key feature here: it is built around REST APIs and simple data formats, instead of relying on a small set of built-in functionality. This means developers can easily add MLflow to their existing ML code and, whatever ML library they use, share code that others in the company can run.

Need For Standardized Open Source ML Platform

Besides open source ML libraries such as Keras and Theano, companies have developed internal ML platforms to manage the development lifecycle. For example, last year Uber Engineering unveiled Michelangelo, a machine learning-as-a-service system for building and deploying models; Facebook developed FBLearner; and Google has TFX, an end-to-end general-purpose machine learning platform, also unveiled last year. Google has already open sourced some TFX libraries. According to Zaharia, most machine learning platforms support only a small set of built-in algorithms or a single ML library, and are also tied to each company’s infrastructure. As a result, developers cannot easily use other machine learning libraries.

Outlook

The platform is currently offered in a hosted version, but if it takes off it could help startups and companies consolidate their ML workflows and become a big hit with businesses. However, it faces stiff competition from TensorFlow, which, thanks to Google’s backing, is set to become the industry standard for machine learning researchers and developers. TensorFlow is backed by Jeff Dean and gets continued support from the tech giant. It is also used in daily operations and provides a visualization tool called TensorBoard, which most frameworks lack. ML practitioners also point out that a recent version of TensorFlow introduced a new feature called eager execution. Databricks’ MLflow project is currently hosted on GitHub and also integrates with the company’s Unified Analytics Platform.

Richa Bhatia

Richa Bhatia is a seasoned journalist with six years’ experience in reportage and news coverage, and has had stints at Times of India and The Indian Express. She is an avid reader, mum to a feisty two-year-old, and loves writing about the next-gen technology that is shaping our world.