
MIT’s New Open Source Tool Lets You See Behind The Scenes Of Black Box Modeling




A machine learning model can have many dependencies. To make sure that all components and features are available both offline and online for deployment, this information is stored in a central repository.

The main objective of having a proper pipeline for any ML model is to exercise control over it. A well-organised pipeline makes the implementation more flexible. It is like having an exploded view of a car engine, where you can pick out the faulty pieces and replace them; in our case, that means replacing a chunk of code.

A pipeline consists of a sequence of components, each of which is a collection of computations. Data is sent through these components and transformed by those computations.
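The idea of data flowing through a sequence of components can be sketched in a few lines of Python. This is only an illustration: the component names (drop_missing, scale_to_unit) and the run_pipeline helper are hypothetical and do not come from any particular library.

```python
from typing import Callable, List, Optional, Sequence

# A component is any function that takes data and returns transformed data.
Component = Callable[[list], list]


def drop_missing(values: Sequence[Optional[float]]) -> List[float]:
    """Cleaning component: discard missing (None) readings."""
    return [v for v in values if v is not None]


def scale_to_unit(values: Sequence[float]) -> List[float]:
    """Feature component: rescale values linearly into the range [0, 1]."""
    if not values:
        return []
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so there is no spread to rescale.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def run_pipeline(data: list, components: Sequence[Component]) -> list:
    """Pass the data through each component in order."""
    for component in components:
        data = component(data)
    return data


result = run_pipeline([2.0, None, 4.0, 6.0], [drop_missing, scale_to_unit])
print(result)  # [0.0, 0.5, 1.0]
```

Because each component is an independent function, a faulty one can be swapped out by changing a single entry in the list passed to run_pipeline.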

Pipelines, unlike what the name suggests, are not one-way flows. They are cyclic in nature and enable iteration, which improves the scores of the machine learning algorithms and makes the model scalable.

A typical machine learning pipeline would consist of the following processes:

  • Data collection
  • Data cleaning
  • Feature extraction (labelling and dimensionality reduction)
  • Model validation
  • Visualisation

Data collection and cleaning are the primary tasks of any machine learning engineer who wants to make meaning out of data. But getting data, and especially getting the right data, is an uphill task in itself.

Data quality and its accessibility are two main challenges one will come across in the initial stages of building a pipeline.

The captured data should be pulled and put together, and the benefits of collection should outweigh the costs of collection and analysis.

But there can be problems associated with the model and the information fed into it, such as:

  • an incorrect model gets pushed
  • incoming data is corrupted
  • incoming data changes and no longer resembles datasets used during training
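The last two problems in the list can be guarded against with simple checks on each incoming batch. The sketch below is a minimal illustration using only the Python standard library; the function names and the threshold of three standard deviations are assumptions chosen for the example, not an established rule.

```python
import math
from statistics import mean, stdev


def is_corrupted(batch):
    """Return True if any value is missing, NaN or infinite."""
    return any(v is None or not math.isfinite(v) for v in batch)


def has_drifted(training, batch, max_shift=3.0):
    """Return True if the batch mean lies more than `max_shift` training
    standard deviations away from the training mean.

    `max_shift` is an illustrative threshold, not a recommended value.
    """
    spread = stdev(training)
    if spread == 0:
        # Training data had no variation, so any change in mean counts as drift.
        return mean(batch) != mean(training)
    return abs(mean(batch) - mean(training)) / spread > max_shift


# Made-up numbers, purely for illustration.
training = [10.0, 11.0, 9.0, 10.5, 9.5]
print(is_corrupted([10.2, float("nan")]))          # True
print(has_drifted(training, [10.2, 9.8, 10.1]))    # False
print(has_drifted(training, [15.0, 16.0, 14.0]))   # True
```

A comparison of means is a deliberately crude drift signal; it would miss changes in the shape of the distribution that leave the mean untouched.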

Researchers from MIT and elsewhere have developed an interactive tool that, for the first time, lets users see and control how increasingly popular automated machine-learning (AutoML) systems work.

The tool, ATMSeer, generates a user-friendly interface that shows in-depth information about a chosen model’s performance, as well as the selection of algorithms and parameters, all of which can be adjusted.

At The Heart Of ATMSeer

This new tool, ATMSeer, is built around ‘Auto-Tuned Models’ (ATM). What ATM does differently from other automated machine learning systems is that it catalogues all the results as it tries to fit models to data.

ATM randomly selects an algorithmic approach, be it neural networks or decision trees, along with the model’s hyperparameters, such as the size of the tree or the number of layers in a network.

ATM repeats this process of choosing and tuning hyperparameters while assessing performance. The results then become a determining factor in choosing the next, better model. Finally, it displays all the results along with the models that suit a particular task.
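The select, score and catalogue loop described above can be sketched roughly as follows. This is not ATM's code or API. It is a plain random search in which earlier results do not influence later choices, the search space is hypothetical, and toy_score is a made-up stand-in for actually training and validating a model.

```python
import random

# Hypothetical search space: algorithm -> hyperparameter -> candidate values.
SEARCH_SPACE = {
    "decision_tree": {"max_depth": [2, 4, 8, 16]},
    "neural_network": {"num_layers": [1, 2, 3, 4]},
}


def toy_score(algorithm, params):
    """Stand-in for training and validating a model.

    Returns a made-up score in [0, 1]; a real system would fit the
    model to data and measure its performance here.
    """
    if algorithm == "decision_tree":
        return 1.0 - abs(params["max_depth"] - 8) / 16
    return 1.0 - abs(params["num_layers"] - 2) / 4


def search(trials, seed=0):
    """Randomly pick an algorithm and hyperparameters, recording every result."""
    rng = random.Random(seed)
    catalogue = []
    for _ in range(trials):
        algorithm = rng.choice(sorted(SEARCH_SPACE))
        params = {
            name: rng.choice(values)
            for name, values in SEARCH_SPACE[algorithm].items()
        }
        catalogue.append(
            {
                "algorithm": algorithm,
                "params": params,
                "score": toy_score(algorithm, params),
            }
        )
    return catalogue


def top_results(catalogue, top=3):
    """Return the best-scoring entries, highest score first."""
    return sorted(catalogue, key=lambda r: r["score"], reverse=True)[:top]


catalogue = search(trials=20)
for entry in top_results(catalogue):
    print(entry)
```

Keeping every trial in the catalogue, rather than only the current best, is what makes the search inspectable afterwards.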

The interface of ATMSeer via MITNews

The ATMSeer interface consists of a control panel that allows users to upload datasets and an AutoML system, and to start or pause the search process. There is also a “leaderboard” of top-performing models in descending order. With these intuitive visualisations, even a non-expert can decipher the performance of the various models.

ATMSeer includes an “AutoML Profiler,” with panels containing in-depth information about the algorithms and hyperparameters, which can all be adjusted. One panel represents all algorithm classes as histograms: bar charts that show the distribution of each algorithm’s performance scores, on a scale of 0 to 10, depending on their hyperparameters.
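To make the histogram panel concrete, here is a minimal sketch of how scores on a 0 to 10 scale could be binned per algorithm class. The score_histogram helper and the sample scores are invented for illustration and are not taken from ATMSeer.

```python
def score_histogram(scores, bins=10, low=0.0, high=10.0):
    """Count how many scores fall into each equal-width bin over [low, high]."""
    width = (high - low) / bins
    counts = [0] * bins
    for score in scores:
        if not low <= score <= high:
            raise ValueError(f"score {score} is outside [{low}, {high}]")
        # The top edge (score == high) is placed in the last bin.
        index = min(int((score - low) / width), bins - 1)
        counts[index] += 1
    return counts


# Made-up scores for two algorithm classes, purely for illustration.
scores_by_algorithm = {
    "decision_tree": [6.2, 7.8, 7.1, 9.0],
    "neural_network": [4.5, 8.3, 8.9],
}

for algorithm, scores in scores_by_algorithm.items():
    print(algorithm, score_histogram(scores))
```

Each list holds one count per unit-wide bin, so comparing two algorithm classes amounts to comparing where their counts cluster along the scale.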

“We let users pick and see how the AutoML systems works,” says Kalyan Veeramachaneni, a principal research scientist in the MIT Laboratory for Information and Decision Systems (LIDS), who leads the Data to AI group.

Whether it is a market crash or a wrong diagnosis, the after-effects of a faulty model can be irreversible. Tracking the development of a machine learning algorithm throughout its life cycle, therefore, becomes crucial.



Ram Sagar
I have a master's degree in Robotics and I write about machine learning advancements.
