MIT’s New Open Source Tool Lets You See Behind The Scenes Of Black Box Modeling

A machine learning model can have many dependencies. To make sure all of its features are available both offline and online for deployment, the components and related information are stored in a central repository.

The main objective of having a proper pipeline for any ML model is to exercise control over it. A well-organised pipeline makes the implementation more flexible. It is like having an exploded view of a car engine, where you can pick out the faulty pieces and replace them; in our case, that means replacing a chunk of code.

A pipeline consists of a sequence of components, each of which is a collection of computations. Data is sent through these components and transformed by the computations along the way.
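As a rough illustration of this idea, the sketch below treats each component as a plain function and the pipeline as an ordered list of such functions. The names `drop_missing`, `min_max_scale` and `run_pipeline` are hypothetical and chosen only for this example; they do not come from any particular library.

```python
from typing import Callable, List, Optional

# A component is any function that takes data and returns transformed data.
Component = Callable[[list], list]


def drop_missing(values: List[Optional[float]]) -> List[float]:
    """Cleaning component: remove missing (None) entries."""
    return [v for v in values if v is not None]


def min_max_scale(values: List[float]) -> List[float]:
    """Feature component: rescale values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so there is no spread to scale.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def run_pipeline(data: list, components: List[Component]) -> list:
    """Send the data through each component in order."""
    for component in components:
        data = component(data)
    return data


pipeline = [drop_missing, min_max_scale]
result = run_pipeline([2.0, None, 4.0, 6.0], pipeline)
print(result)  # [0.0, 0.5, 1.0]
```

Because the pipeline is just a list, swapping out a faulty component means replacing one entry rather than rewriting the whole flow.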

Pipelines, contrary to what the name suggests, are not one-way flows. They are cyclic in nature: they enable iteration to improve the scores of the machine learning algorithms and make the model scalable.

A typical machine learning pipeline would consist of the following processes:

  • Data collection
  • Data cleaning
  • Feature extraction (labelling and dimensionality reduction)
  • Model validation
  • Visualisation

Data collection and cleaning are the primary tasks of any machine learning engineer who wants to make meaning out of data. But getting data, and especially getting the right data, is an uphill task in itself.

Data quality and its accessibility are two main challenges one will come across in the initial stages of building a pipeline.

The captured data should be pulled and put together, and the benefits of collecting it should outweigh the costs of collection and analysis.

But there can be problems with what gets deployed into the model, such as:

  • an incorrect model gets pushed
  • incoming data is corrupted
  • incoming data changes and no longer resembles datasets used during training
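The last two problems in the list can be checked for automatically. The sketch below is one simple way to do it, under stated assumptions: "corrupted" is taken to mean missing or NaN values, and "no longer resembles the training data" is approximated by the incoming mean lying more than a few training standard deviations from the training mean. The function names and the threshold of 3.0 are illustrative choices, not part of any tool discussed here.

```python
import math
from statistics import mean, stdev


def count_invalid(values):
    """Count entries that are missing (None) or NaN."""
    return sum(
        1 for v in values
        if v is None or (isinstance(v, float) and math.isnan(v))
    )


def has_drifted(training, incoming, threshold=3.0):
    """Flag drift when the incoming mean lies more than `threshold`
    training standard deviations away from the training mean."""
    mu = mean(training)
    sigma = stdev(training)
    if sigma == 0:
        # No spread in training data: any change in mean counts as drift.
        return mean(incoming) != mu
    return abs(mean(incoming) - mu) / sigma > threshold


training = [10.0, 11.0, 9.0, 10.5, 9.5]

print(count_invalid([10.1, None, float("nan"), 9.9]))  # 2
print(has_drifted(training, [10.2, 9.8, 10.1]))        # False
print(has_drifted(training, [25.0, 26.0, 24.0]))       # True
```

A production check would usually compare full distributions rather than just means, but the principle is the same: compare incoming data against a summary of what the model was trained on.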

Researchers from MIT and elsewhere have developed an interactive tool that, for the first time, lets users see and control how increasingly popular automated machine-learning (AutoML) systems work.

The tool, ATMSeer, generates a user-friendly interface that shows in-depth information about a chosen model’s performance, as well as the selection of algorithms and parameters, all of which can be adjusted.

At The Heart Of ATMSeer

ATMSeer is built around ‘Auto-Tuned Models’ (ATM). What sets ATM apart from other automated machine learning systems is that it catalogues all the results as it tries to fit models to data.

ATM randomly selects an algorithm type, be it a neural network or a decision tree, along with the model’s hyperparameters, such as the size of the tree or the number of layers in a network.

The system repeats this cycle of choosing and tuning hyperparameters while assessing performance. The results of each run become a determining factor in choosing the next, better model. Finally, it displays all the results along with the models that suit a particular task.
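The search loop described above can be sketched in a few lines. This is a simplified random search written for illustration, not ATM’s actual code: the search space, the `toy_score` function (a stand-in for training and validating a real model) and all names are assumptions, and the sketch leaves out the feedback step in which earlier scores steer the next choice.

```python
import random

# Hypothetical search space: algorithm name -> hyperparameter ranges.
SEARCH_SPACE = {
    "decision_tree": {"max_depth": (1, 20)},
    "neural_network": {"num_layers": (1, 8)},
}


def toy_score(algorithm, params):
    """Stand-in for training and validating a model.

    Returns a score in (0, 1] that peaks at an arbitrary 'ideal'
    hyperparameter value for each algorithm.
    """
    ideal = {"decision_tree": 6, "neural_network": 3}[algorithm]
    value = next(iter(params.values()))
    return 1.0 / (1.0 + abs(value - ideal))


def random_search(n_trials, seed=0):
    """Pick an algorithm and hyperparameters at random, n_trials times."""
    rng = random.Random(seed)
    history = []  # every trial is catalogued, not just the winner
    for _ in range(n_trials):
        algorithm = rng.choice(sorted(SEARCH_SPACE))
        params = {
            name: rng.randint(lo, hi)
            for name, (lo, hi) in SEARCH_SPACE[algorithm].items()
        }
        history.append({
            "algorithm": algorithm,
            "params": params,
            "score": toy_score(algorithm, params),
        })
    return history


history = random_search(n_trials=25)
best = max(history, key=lambda trial: trial["score"])
print(best)
```

Keeping the full `history` rather than only `best` is what makes it possible to inspect the search afterwards.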

[Image: The interface of ATMSeer, via MIT News]

The ATMSeer interface consists of a control panel that allows users to upload datasets and an AutoML system, and to start or pause the search process. There is also a “leaderboard” of top-performing models in descending order. With these intuitive visualisations, a non-expert can decipher the performance of various models.
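A leaderboard of this kind boils down to sorting the recorded trials by score in descending order and keeping the top few. The sketch below assumes each trial is a dictionary with a `model` name and a `score`; the entries and their scores are made-up placeholder values, not results from ATMSeer.

```python
def leaderboard(trials, top_k=3):
    """Return the top_k trials, best score first."""
    return sorted(trials, key=lambda t: t["score"], reverse=True)[:top_k]


# Placeholder values for illustration only.
trials = [
    {"model": "decision_tree", "score": 0.81},
    {"model": "neural_network", "score": 0.92},
    {"model": "knn", "score": 0.74},
    {"model": "svm", "score": 0.88},
]

top = leaderboard(trials)
for rank, trial in enumerate(top, start=1):
    print(rank, trial["model"], trial["score"])
# 1 neural_network 0.92
# 2 svm 0.88
# 3 decision_tree 0.81
```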

ATMSeer includes an “AutoML Profiler,” with panels containing in-depth information about the algorithms and hyperparameters, which can all be adjusted. One panel represents each algorithm class as a histogram: a bar chart that shows the distribution of the algorithm’s performance scores, on a scale of 0 to 10, depending on its hyperparameters.
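To make the histogram idea concrete, the sketch below bins a list of performance scores on the 0 to 10 scale into ten equal-width bars. The function name, the number of bins and the sample scores are assumptions made for this example; how ATMSeer itself computes its bars is not described here.

```python
from collections import Counter


def score_histogram(scores, n_bins=10, lo=0.0, hi=10.0):
    """Count scores per equal-width bin on [lo, hi].

    A score equal to `hi` is placed in the last bin.
    """
    width = (hi - lo) / n_bins
    counts = Counter(
        min(int((s - lo) / width), n_bins - 1) for s in scores
    )
    return [counts.get(i, 0) for i in range(n_bins)]


# Placeholder scores for one algorithm class under different hyperparameters.
scores = [2.5, 7.1, 7.9, 8.4, 10.0]
bars = score_histogram(scores)
print(bars)  # [0, 0, 1, 0, 0, 0, 0, 2, 1, 1]
```

One such list of bar heights per algorithm class is enough to show at a glance whether an algorithm performs consistently well or only under a narrow set of hyperparameters.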

“We let users pick and see how the AutoML systems works,” says Kalyan Veeramachaneni, a principal research scientist in the MIT Laboratory for Information and Decision Systems (LIDS), who leads the Data to AI group.

Whether it is a market crash or a wrong diagnosis, the after-effects of a faulty model can be irreversible. Tracking the development of a machine learning algorithm throughout its life cycle therefore becomes crucial.

Ram Sagar

I have a master's degree in Robotics and I write about machine learning advancements.