Top 10 Papers to Learn About MLOps

Your new favourite go-to resource about MLOps!

The past few years have witnessed remarkable advances in machine learning. Machine learning operations (MLOps) has therefore become integral to implementing data science projects. Through this practice, companies can generate long-term value and reduce the risk associated with AI/ML.

MLOps refers to a set of practices and tools for deploying and maintaining ML models in production. Here are 10 papers to serve as your new favourite go-to resources on MLOps.
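For readers new to the area, a minimal, hypothetical sketch of the core pattern MLOps tooling automates may help: train a model, persist it as an artifact, and serve it behind an HTTP endpoint. The scikit-learn, joblib, and FastAPI choices below are illustrative stand-ins for whatever stack a team actually uses, not recommendations drawn from the papers that follow.

```python
# Toy "train, package, serve" loop -- the core pattern that MLOps tooling automates.
# Assumes scikit-learn, joblib, and FastAPI are installed; all names are illustrative.
import joblib
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from fastapi import FastAPI

# 1. Train and persist a model artifact (in practice: tracked, versioned, tested).
X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000).fit(X, y)
joblib.dump(model, "model.joblib")

# 2. Serve the artifact (in practice: containerised, monitored, rolled out gradually).
app = FastAPI()
model = joblib.load("model.joblib")

@app.post("/predict")
def predict(features: list[float]) -> dict:
    # Wrap the single sample in a batch of one for scikit-learn's predict().
    return {"prediction": int(model.predict([features])[0])}
```

In a real setting, each of these steps would sit behind automated pipelines, with experiment tracking, CI/CD for models, and production monitoring; that gap between the toy loop and a reliable system is exactly what the papers below address.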

Let’s dive in!

  1. Machine Learning: The High-Interest Credit Card of Technical Debt 

Author(s): D. Sculley et al.

Machine learning offers a powerful toolkit for building complex systems quickly. However, this paper argues that these quick wins do not come for free. Using the framework of technical debt, the researchers note that it is remarkably easy to incur massive ongoing maintenance costs at the system level when applying ML.

This paper aims to highlight ML-specific risk factors and patterns to avoid. These include boundary erosion, entanglement, hidden feedback loops, undeclared consumers, and a variety of system-level anti-patterns. 

Read the full paper here

  2. Machine Learning Operations (MLOps): Overview, Definition, and Architecture 

Author(s): Dominik Kreuzberger et al.

MLOps is still a vaguely defined term, and its consequences for researchers and practitioners are ambiguous. To address this gap, the authors conducted mixed-method research to provide an aggregated overview of the necessary principles, components and roles, along with the associated architecture and workflows.

The paper guides ML researchers and practitioners who want to automate and operate ML products with a set of technologies.

Read the full paper here

  3. Operationalizing Machine Learning: An Interview Study

Author(s): Shreya Shankar et al. 

Organisations rely on machine learning engineers (MLEs) to deploy and maintain ML pipelines in production. In semi-structured, ethnographic interviews with 18 MLEs working across many applications, the researchers try to understand the unaddressed challenges and the implications for tool builders. 

The researchers summarised common practices for successful ML experimentation, deployment, and sustaining production performance. Furthermore, they discuss interviewees’ pain points and anti-patterns, with implications for tool design.

Read the full paper here

  4. How to avoid machine learning pitfalls: a guide for academic researchers

Author(s): Michael A. Lones

The paper provides a concise outline of some common errors that occur in the use of ML techniques and ways in which they can be avoided. It is intended primarily as a guide for research students. It focuses on issues of particular concern within academic research, such as the need to make rigorous comparisons and reach valid conclusions.

Read the full paper here

  5. Quality issues in Machine Learning Software Systems

Author(s): Pierre-Olivier Côté, Amin Nikanjam, Rached Bouchoucha, Foutse Khomh

Machine learning models are implemented as software components and deployed in Machine Learning Software Systems (MLSSs). Quality assurance of these MLSSs is therefore essential, because poor-quality decisions can lead to the malfunction of other systems and significant financial losses. 

This paper investigates the characteristics of real quality issues in MLSSs from the practitioner’s viewpoint. Through interviews with ML practitioners, the paper identifies a list of bad practices related to poor quality in MLSSs. 

Read the full paper here

  6. Training Transformers Together

Author(s): Alexander Borzunov et al.

Training state-of-the-art models is often so expensive that only large corporations and institutions can afford it.

In this demonstration, the researchers collaboratively trained a text-to-image transformer similar to OpenAI’s DALL-E, pooling hardware from multiple independent parties. They showed that the resulting model generates images of reasonable quality on a number of prompts.

Read the full paper here

  7. A Large-Scale Comparison of Python Code in Jupyter Notebooks and Scripts

Author(s): Konstantin Grotov, Sergey Titov et al.

In this work, the researchers compare Python code written in Jupyter Notebooks with code written in traditional Python scripts. The objective is to identify notebook-specific problems that should be addressed by dedicated notebook tooling, and to provide insights useful for building such tools.

Read the full paper here

  8. Understanding Data Storage and Ingestion for Large-Scale Deep Recommendation Model Training

Author(s): Mark Zhao, Niket Agarwal, Aarti Basant et al.

This paper presents Meta’s end-to-end data storage and ingestion (DSI) pipeline, composed of a central data warehouse built on distributed storage and a Data PreProcessing Service that eliminates data stalls. 

The researchers characterise how multiple models are collaboratively trained across data centres via continuous training. They quantify the substantial network, memory, and compute resources required by each training job to pre-process samples during training. The paper’s key takeaways include the following:

  • Identifying hardware bottlenecks.
  • Discussing opportunities for DSI hardware.
  • Sharing lessons learned from deploying and optimising DSI infrastructure.

Read the full paper here
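As a side note, here is a small, hypothetical sketch, not taken from the paper, of the "data stall" pattern such a pre-processing service addresses: if samples are transformed inside the training loop itself, the accelerator idles while the CPU works; preparing batches in a background thread hides that cost. The sleep() calls stand in for real transform and training-step times.

```python
# Illustrative only: overlap data pre-processing with training to avoid "data stalls".
import queue
import threading
import time

def preprocess(sample):
    time.sleep(0.01)          # pretend this is decoding / feature transforms on CPU
    return sample * 2

def producer(samples, q):
    for s in samples:
        q.put(preprocess(s))  # prepares batches ahead of the training loop
    q.put(None)               # sentinel: no more batches

def train(q):
    while (batch := q.get()) is not None:
        time.sleep(0.01)      # pretend this is one optimisation step on the accelerator

q = threading and queue.Queue(maxsize=8)   # bounded buffer of ready-to-train batches
threading.Thread(target=producer, args=(range(100), q), daemon=True).start()
train(q)
```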

  9. The Deep Learning Revolution and Its Implications for Computer Architecture and Chip Design

Author(s): Jeffrey Dean

This paper discusses machine learning advancements and their implications for the kinds of computational devices we need to build, especially in the post-Moore’s Law era. It also discusses how machine learning may help with aspects of the circuit design process. 

It also sketches at least one promising direction: large-scale multi-task models that are sparsely activated and employ more dynamic, example- and task-based routing than today’s machine learning models.

Read the full paper here

  10. Asset Management in Machine Learning: A Survey

Author(s): Samuel Idowu, Daniel Strüber, Thorsten Berger

The paper presents a survey of 17 tools with ML asset-management support, identified through a systematic search. The authors give an overview of these tools’ features for managing the different types of assets used when engineering ML-based systems and running experiments. 

The survey concludes that most asset-management support relies on traditional version control systems, and that only a few tools offer an asset granularity that differentiates between essential ML assets such as datasets and models.

Read the full paper here


Tasmia Ansari

Tasmia is a tech journalist at AIM, looking to bring a fresh perspective to emerging technologies and trends in data science, analytics, and artificial intelligence.
