After releasing Polynote in October to simplify workflows, Netflix has now open-sourced Metaflow, a Python framework that helps data scientists manage real-life projects from prototyping through to production. The framework was originally developed to improve productivity when writing production-level code.
Unlike in the past, when data scientists’ primary objective was to analyse data and help solve business challenges, their day-to-day activity today is focused on developing ML-based applications. Since such development involves a wide range of dependencies, they need to embrace best practices to manage their workflows effectively.
Metaflow For Data Science
Metaflow is a Python library that provides a unified API for helping data scientists execute projects while effortlessly handling the infrastructure stack.
Typically, data scientists depend on several layers of infrastructure, such as a data warehouse, compute resources, a job scheduler, architecture and versioning, alongside their domain-related model development and feature engineering. Leveraging this infrastructure and streamlining development around it requires specific skills. However, many data scientists are not proficient in these development processes, and in some cases they prefer not to get involved in organising and orchestrating multiple units of work.
Spending time on software architecture often slows data scientists down, because instead of building and improving models, they end up focusing on project dependencies. This is where Metaflow comes into play: it provides intuitive ways to navigate the stack, thereby increasing the efficiency and productivity of data scientists.
Being an off-the-shelf package, Metaflow handles the lower levels of the stack and does not significantly alter how data scientists work at the top of the stack. Developers can use Metaflow with their preferred machine learning libraries, such as PyTorch and TensorFlow, among others. On top of that, it is integrated with Amazon Web Services, so those cloud services can be used on the go.
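To make the idea of a workflow built from small units of work concrete, here is a minimal sketch in plain Python. It is not Metaflow's API: the names `run_flow`, `load_data`, `train` and `report` are hypothetical, and the "model" is just the mean of some toy data. It only illustrates how steps can share state while a runner takes care of ordering and bookkeeping.

```python
from typing import Callable, Dict, List

# Hypothetical names throughout; this is a toy illustration, not Metaflow's API.
State = Dict[str, object]
Step = Callable[[State], None]


def load_data(state: State) -> None:
    # Stand-in for reading from a data warehouse.
    state["data"] = [1.0, 2.0, 3.0, 4.0]


def train(state: State) -> None:
    # Stand-in for model training: the "model" is just the mean of the data.
    data = state["data"]
    state["model"] = sum(data) / len(data)


def report(state: State) -> None:
    # Summarise the result produced by the previous step.
    state["summary"] = f"mean={state['model']:.2f}"


def run_flow(steps: List[Step]) -> State:
    """Run each step in order, passing shared state and recording history."""
    state: State = {}
    history: List[str] = []
    for step in steps:
        step(state)
        history.append(step.__name__)
    state["history"] = history
    return state


result = run_flow([load_data, train, report])
print(result["summary"])
```

In a real framework the runner would also look after scheduling, storage and versioning; here it only records the order in which the steps ran.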
- Focused on a wide variety of ML use cases: The library does not target specialised large-scale projects such as self-driving cars or real-time bidding. Instead, the idea is to support numerous small and medium-sized ML use cases.
- Collaboration: Metaflow enables several types of collaboration by removing hindrances, allowing large as well as small teams to work together across departments.
- Support for prototyping and production: One of the desirable features is that it supports iterative development; one can start prototyping with a straightforward script and later improve it to handle more demanding tasks.
- Scalability: As projects scale, they become bulky. The framework can help optimise performance through compilers like Numba. It also offers hassle-free distributed learning through AWS SageMaker for vertical and horizontal scalability.
- Failures: With Metaflow, developers can monitor and detect errors before things fail catastrophically. It also provides ways to diagnose and fix several problems during development, resulting in cost savings.
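As a rough illustration of the failure-handling point in the list above, the sketch below retries a flaky step and records each error so that it can be diagnosed later. The `retry` decorator and the `flaky_step` function are hypothetical and written with the standard library only; they are not taken from Metaflow.

```python
import functools

# Hypothetical helper for illustration; not part of Metaflow.
def retry(times):
    """Re-run a function up to `times` times, collecting error messages."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            errors = []
            for attempt in range(1, times + 1):
                try:
                    return func(*args, **kwargs)
                except Exception as exc:
                    # Keep a record of each failure for later diagnosis.
                    errors.append(f"attempt {attempt}: {exc}")
            # All attempts failed: surface the full error history at once.
            raise RuntimeError("; ".join(errors))
        return wrapper
    return decorator


calls = {"count": 0}


@retry(times=3)
def flaky_step():
    # Simulates a step that fails twice before succeeding.
    calls["count"] += 1
    if calls["count"] < 3:
        raise ValueError("transient failure")
    return "ok"


outcome = flaky_step()
print(outcome, calls["count"])
```

If every attempt fails, the wrapper raises a single error that lists what went wrong on each attempt, which is the kind of record that makes a failed run easier to diagnose.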
Will It Be Helpful?
A library can be one of many options for streamlining development, but it requires changes to algorithms, and data scientists usually do not like to alter their algorithm workflows. Therefore, they embrace integrated development environments (IDEs) to manage dependencies, as these do not require transforming the code. Even Netflix says that it does not expect the current version of Metaflow to be perfect, as it is still under active development.
In fact, data scientists are adopting IDEs like JupyterLab to mitigate the pain points of development. IDEs allow them to add extensions and manage architectures.
Metaflow has the potential to optimise the way data scientists develop applications, but it might only see adoption among developers who are flexible enough to transform their code. However, adoption may rise because the library is very Pythonic, so developers might not feel that their workflows are being overhauled.
Rohit is a technology journalist and technophile who likes to communicate the latest trends around cutting-edge technologies in a way that is straightforward to assimilate. In a nutshell, he is deciphering technology. Email: firstname.lastname@example.org