Last updated September 6, 2020

10 Best ML Engineering Practices

Published on February 24, 2020
by Ram Sagar

Machine learning is one of the hottest topics in the industry, and ML engineers are consequently among its highest-paid professionals. ML and its services are only going to extend their influence and push the boundaries of the technology revolution. However, deploying ML comes with great responsibility. Although black box modelling is shedding its black box reputation, it is still crucial to establish trust with both in-house teams and stakeholders.

This can be done by practising a few routines that have been tested at the heart of Google's AI research departments. Here are a few best practices that can help ML engineers build models without hassle:

It’s Okay To Have A Simple Model

First impressions last, so pick a simple model for the first launch to avoid infrastructure issues. Before exporting your fancy new machine learning system, it is important to determine how to get examples to your learning algorithm. A simple model provides the team with baseline metrics and baseline behaviour that can be used to test more complex models.
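
As a minimal sketch of such a baseline, assuming a binary classification task with labels 0 and 1: the names `fit_majority_baseline` and `accuracy` are hypothetical, the data is made up, and a majority-class predictor merely stands in for whatever simple model a team would actually choose.

```python
from collections import Counter


def fit_majority_baseline(labels):
    """Return the most frequent training label; this is the whole 'model'."""
    if not labels:
        raise ValueError("need at least one label")
    return Counter(labels).most_common(1)[0][0]


def accuracy(predictions, labels):
    """Fraction of predictions that match the labels."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# Illustrative data, not taken from a real system.
train_labels = [0, 0, 1, 0, 1, 0]
test_labels = [0, 1, 0, 0]

majority = fit_majority_baseline(train_labels)
baseline_acc = accuracy([majority] * len(test_labels), test_labels)
print(f"baseline accuracy: {baseline_acc:.2f}")
```

Any more complex model would then have to beat `baseline_acc` on the same held-out data to justify its extra infrastructure.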

Keep The Infrastructure Testable

Machine learning has an element of uncertainty, so make sure to test the code that creates examples in training and serving. To keep infrastructure issues in check:

Test getting data into the algorithm and, if possible, compare statistics in the pipeline with statistics for the same data processed elsewhere.
Test getting models out of the training algorithm and make sure that the model in the training environment delivers the same score as the model in the serving environment.
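
The two checks above could be sketched roughly as follows. This is only an illustration under stated assumptions: the helper names (`summarize`, `stats_match`, `scores_match`) and the tolerances are hypothetical, and the lists stand in for values read from a real pipeline and a real serving system.

```python
import math
import statistics


def summarize(values):
    """Basic statistics used to compare two copies of the same data."""
    return {
        "count": len(values),
        "mean": statistics.fmean(values),
        "stdev": statistics.pstdev(values),
    }


def stats_match(a, b, rel_tol=1e-6):
    """True if two summaries agree on count and, within tolerance, on mean and stdev."""
    return (
        a["count"] == b["count"]
        and math.isclose(a["mean"], b["mean"], rel_tol=rel_tol, abs_tol=1e-12)
        and math.isclose(a["stdev"], b["stdev"], rel_tol=rel_tol, abs_tol=1e-12)
    )


def scores_match(train_scores, serving_scores, abs_tol=1e-6):
    """True if the training and serving models give every example the same score."""
    return len(train_scores) == len(serving_scores) and all(
        math.isclose(t, s, abs_tol=abs_tol)
        for t, s in zip(train_scores, serving_scores)
    )


# Illustrative values only.
pipeline_feature = [1.0, 2.0, 3.0, 4.0]
reference_feature = [4.0, 3.0, 2.0, 1.0]  # same data, processed elsewhere
data_ok = stats_match(summarize(pipeline_feature), summarize(reference_feature))
parity_ok = scores_match([0.20, 0.90], [0.20, 0.90])
print("data check:", data_ok, "| train/serve parity:", parity_ok)
```

Checks like these are cheap enough to run as unit tests on every change to the pipeline.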

Unused features create technical debt. If a feature is not being used, and combining it with other features is not working, then drop it from the infrastructure.

Check The Freshness Of Your Model

Experts suggest monitoring how the model's quality degrades over time. If quality drops when the model has not been updated for a day, then round-the-clock engineering support is necessary. For instance, if the ML model for Google Play Search is not updated, it can have a negative impact within a month.
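
A freshness check can be as small as comparing the model's training timestamp against an agreed maximum age. The sketch below assumes the timestamp is available from model metadata; `model_age`, `is_stale`, and the one-day threshold are hypothetical choices for illustration.

```python
from datetime import datetime, timedelta, timezone


def model_age(last_trained, now=None):
    """Time elapsed since the model was last trained."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - last_trained


def is_stale(last_trained, max_age, now=None):
    """True if the model is older than the age the team has agreed to tolerate."""
    return model_age(last_trained, now) > max_age


# Illustrative timestamps; a real check would read them from model metadata.
last_trained = datetime(2020, 2, 20, tzinfo=timezone.utc)
check_time = datetime(2020, 2, 24, tzinfo=timezone.utc)

if is_stale(last_trained, max_age=timedelta(days=1), now=check_time):
    print("alert: model is", model_age(last_trained, check_time).days, "days old")
```

The right `max_age` depends on how quickly quality degrades for the product in question, which is exactly what the monitoring is meant to reveal.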

Don’t Export Models In A Hurry

Run a sanity check before exporting and serving a model to the customer: make sure that its performance on held-out data is reasonable, and do not export the model if it is not. It is good practice to check the area under the ROC curve (AUC) before exporting.
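
A minimal sketch of such an export gate, assuming binary labels (0 and 1) and model scores on a held-out set. The function names and the 0.7 threshold are hypothetical, and the AUC is computed by direct pair counting, which is fine for small evaluation sets but slow for large ones.

```python
def roc_auc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("held-out data must contain both classes")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def safe_to_export(labels, scores, min_auc):
    """Gate the export on held-out AUC; min_auc is a team-chosen threshold."""
    return roc_auc(labels, scores) >= min_auc


# Illustrative held-out labels and scores.
held_out_labels = [0, 0, 1, 1]
held_out_scores = [0.1, 0.4, 0.35, 0.8]

auc = roc_auc(held_out_labels, held_out_scores)
print("AUC:", auc, "| export:", safe_to_export(held_out_labels, held_out_scores, 0.7))
```

Here three of the four positive/negative pairs are ranked correctly, so the AUC is 0.75 and the hypothetical 0.7 gate passes.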

Stick To Simple Metrics Initially

There are many metrics for evaluating a model's performance, and with such abundance, engineers can end up chasing their tails while choosing among them. It is advisable to stick to something simple that satisfies the first objective.

The ML objective should be something that is easy to measure and is a proxy for the “true” objective.

Indirect effects make great metrics that can be used during A/B testing and during launch decisions.

Keep Models Interpretable

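If predictions are interpretable, it becomes easier to debug. Linear regression, logistic regression and Poisson regression are directly motivated by a probabilistic model, so each prediction can be read as a probability or an expected value. This makes them easier to debug than models that use objectives (zero-one loss, various hinge losses, and so on) that try to directly optimise classification accuracy or ranking performance.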
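
One way to see why interpretable predictions help with debugging: when an output is a probability, the average prediction can be compared directly with the observed rate of positives. The sketch below assumes a logistic regression model; the weights, data, and function names are all made up for illustration.

```python
import math


def predict_probability(weights, bias, features):
    """Logistic regression: the output can be read as P(label = 1)."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def calibration_gap(probabilities, labels):
    """Mean predicted probability minus the observed positive rate."""
    return sum(probabilities) / len(probabilities) - sum(labels) / len(labels)


# Hypothetical model parameters and examples.
weights, bias = [1.5, -2.0], 0.0
examples = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
labels = [1, 0, 0, 1]

probabilities = [predict_probability(weights, bias, x) for x in examples]
gap = calibration_gap(probabilities, labels)
print(f"calibration gap: {gap:+.3f}")
```

A gap far from zero on fresh data points to a concrete problem to investigate, which a raw margin from a hinge-loss model would not reveal as directly.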

Launch Models Regularly

There are three basic reasons to launch new models:

You are coming up with new features
You are tuning regularisation and combining old features in new ways
You are tuning the objective

It is essential to think about how easy it is to add, remove or recombine features, and whether it is easy to create a fresh copy of the pipeline and verify its correctness. Launching models regularly in this way can keep the quality consistent.

Having Specific Features Is Good

With a plethora of data, it is simpler to learn millions of simple features than a few complex features. For generalisation, it helps to have groups of features in which each feature applies to only a very small fraction of the data, provided the group as a whole covers most of it.
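
To make the idea of coverage concrete, the sketch below measures how much of the data a single feature touches and how much a whole group touches. It assumes each example is a dictionary of sparse features; the feature names and helper functions are hypothetical.

```python
def feature_coverage(examples, feature):
    """Fraction of examples in which a single feature is present."""
    return sum(feature in example for example in examples) / len(examples)


def group_coverage(examples, features):
    """Fraction of examples in which at least one feature of the group is present."""
    return sum(
        any(f in example for f in features) for example in examples
    ) / len(examples)


# Illustrative sparse examples: each maps feature name to value.
examples = [
    {"query=cat": 1},
    {"query=dog": 1},
    {"query=cat": 1},
    {},
]
query_group = ["query=cat", "query=dog"]

single = feature_coverage(examples, "query=dog")
group = group_coverage(examples, query_group)
print("single feature:", single, "| whole group:", group)
```

In this toy data each individual feature applies to a minority of examples, while the group together reaches most of them.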

Reuse Code

Experts at Google urge everyone to reuse code between the training pipeline and the serving pipeline whenever possible. Also, try not to use two different programming languages for training and serving, as that decision will make it nearly impossible to share code.
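
In practice, sharing code often means keeping feature computation in one function that both pipelines call. The following is a sketch only: `extract_features`, the record layout, and the two features are hypothetical.

```python
def extract_features(raw):
    """Single source of truth for feature computation, shared by both pipelines."""
    title = raw["title"].strip().lower()
    return {
        "title_length": len(title),
        "has_free": int("free" in title.split()),
    }


def build_training_example(raw, label):
    """Training pipeline: features plus the logged label."""
    return extract_features(raw), label


def features_for_request(raw):
    """Serving pipeline: the same features, computed by the same code."""
    return extract_features(raw)


# Illustrative record; the same input must yield the same features on both paths.
record = {"title": "  Free Chess Game "}
train_features, _ = build_training_example(record, label=1)
serve_features = features_for_request(record)
print(train_features == serve_features)
```

Because both paths call the same function, a change to the feature logic cannot silently reach one pipeline and not the other.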

Keep Ensembles Simple

An ensemble of models is a “model” which combines the scores of other models to perform better.

To keep things simple, each model should either be an ensemble only taking the input of other models, or a base model taking many features, but not both.

If models are stacked on top of other models that are trained separately, then combining them can result in bad behaviour.

Use a simple model for ensembling that takes only the outputs of the “base” models as inputs. It also helps if the incoming models are semantically interpretable, so that changes in the underlying models do not confuse the ensemble model.
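
A minimal sketch of such an ensemble, assuming each base model outputs a score between 0 and 1. The weighted average and the requirement of non-negative weights are illustrative choices, not something prescribed by the guidelines; `ensemble_score` is a hypothetical name.

```python
def ensemble_score(base_scores, weights):
    """Weighted average of base-model scores; sees no raw features at all."""
    if len(base_scores) != len(weights):
        raise ValueError("one weight per base model is required")
    if any(w < 0 for w in weights):
        # Assumption: non-negative weights, so that a higher base score
        # can never lower the ensemble score.
        raise ValueError("weights must be non-negative")
    total = sum(weights)
    if total == 0:
        raise ValueError("at least one weight must be positive")
    return sum(w * s for w, s in zip(weights, base_scores)) / total


# Illustrative scores from two hypothetical base models for one example.
base_scores = [0.2, 0.8]
combined = ensemble_score(base_scores, weights=[1.0, 1.0])
print("ensemble score:", combined)
```

Since the ensemble consumes only base-model outputs, swapping or retraining a base model changes its inputs in a way that remains easy to reason about.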

Most of the problems in ML are, in fact, engineering problems, and the tips above are just some of the many principles that one can refer to while setting up ML pipelines from scratch.


Ram Sagar

I have a master's degree in Robotics and I write about machine learning advancements.
