OpenAI & Uber AI Proposed A New Approach To Neural Architecture Search


Recently, OpenAI collaborated with Uber AI to propose a new approach, the Synthetic Petri Dish, for accelerating the most expensive step of Neural Architecture Search (NAS). The researchers explored whether the computational efficiency of NAS can be improved by creating a new kind of surrogate, one that can benefit from miniaturised training and still generalise beyond the observed distribution of ground-truth evaluations.

For several years now, deep neural networks have been successfully applied to business challenges such as speech recognition, image recognition and machine translation, among others.

According to the researchers, Neural Architecture Search (NAS) explores a large space of architectural motifs and is a compute-intensive process that typically involves a ground-truth evaluation of each motif: instantiating it within a large network, then training and evaluating that network on thousands or more data samples. By motif, the researchers mean a small building block that recurs throughout a larger neural network blueprint, such as the design of a recurrent cell or an activation function.
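To make the term concrete, here is a minimal PyTorch sketch of the kind of motif the paper uses as its running example: a sigmoid activation whose slope is the design parameter being searched over. The module name, layer sizes and slope value below are illustrative assumptions, not taken from the paper's code.

```python
import torch
import torch.nn as nn

class SlopedSigmoid(nn.Module):
    """An activation-function motif: a sigmoid whose slope is the
    architectural design choice being searched over."""
    def __init__(self, slope: float):
        super().__init__()
        self.slope = slope

    def forward(self, x):
        return torch.sigmoid(self.slope * x)

# Ground-truth evaluation instantiates the motif inside a full network
# (here, a hypothetical MNIST classifier) and trains it end to end --
# the expensive step that the Synthetic Petri Dish tries to avoid
# repeating for every candidate motif.
full_network = nn.Sequential(
    nn.Linear(784, 100), SlopedSigmoid(1.0),
    nn.Linear(100, 100), SlopedSigmoid(1.0),
    nn.Linear(100, 10),
)
```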

Behind Synthetic Petri Dish

In this work, the researchers took inspiration from an idea in biology: in a petri dish, a phenomenon is extracted from its natural context and studied in a simplified, controlled environment. They materialised this idea with machine learning as the Synthetic Petri Dish, which aims to identify high-performing architectural motifs. The approach proposed in this research thus attempts to algorithmically recreate this kind of scientific process for the purpose of finding better neural network motifs.

According to the researchers, the aim of the Synthetic Petri Dish is to create a microcosm training environment such that the performance of a small-scale motif trained within it reliably predicts the performance of the fully-expanded motif in the ground-truth evaluation.

How It Works

The paper's overview figure illustrates the inner-loop and outer-loop training of the Synthetic Petri Dish procedure. The motifs (in this example, activation functions) are extracted from the full network (e.g. a two-layer, 100-neuron-wide MLP) and instantiated in separate, much smaller motif-networks (e.g. a two-layer, single-neuron MLP).

The motif-networks are then trained in the inner loop with the synthetic training data and evaluated on synthetic validation data. In the outer loop, a mean squared error (MSE) loss is computed between the normalised Petri dish validation losses and the corresponding normalised ground-truth losses. The synthetic training and validation data are then optimised by taking gradient steps with respect to this outer-loop loss.
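A minimal PyTorch sketch of this two-level procedure follows. It is an illustration under stated assumptions, not the authors' code: the ground-truth losses, data sizes, learning rates and step counts are placeholders, and the inner loop is written functionally so that gradients can flow from the outer-loop loss back into the synthetic data.

```python
import torch

def sloped_sigmoid(x, slope):
    # Candidate motif: a sigmoid whose slope is the searched parameter.
    return torch.sigmoid(slope * x)

def petri_dish_val_loss(slope, x_tr, y_tr, x_va, y_va, steps=20, lr=0.5):
    """Inner loop: train a two-layer, single-neuron motif-network on the
    synthetic data, keeping the graph intact (create_graph=True) so the
    outer loop can differentiate through the whole training run."""
    params = [torch.randn(1, 1).requires_grad_(), torch.zeros(1).requires_grad_(),
              torch.randn(1, 1).requires_grad_(), torch.zeros(1).requires_grad_()]
    for _ in range(steps):
        h = sloped_sigmoid(x_tr @ params[0] + params[1], slope)
        loss = ((h @ params[2] + params[3] - y_tr) ** 2).mean()
        grads = torch.autograd.grad(loss, params, create_graph=True)
        params = [p - lr * g for p, g in zip(params, grads)]
    h = sloped_sigmoid(x_va @ params[0] + params[1], slope)
    return ((h @ params[2] + params[3] - y_va) ** 2).mean()

# A few motifs whose ground-truth losses are already known. The numbers
# here are illustrative placeholders, not results from the paper.
slopes = torch.tensor([0.5, 1.0, 2.0])
gt_losses = torch.tensor([0.30, 0.20, 0.25])

# Outer loop: the synthetic data itself is the learned parameter.
synth = [torch.randn(8, 1, requires_grad=True) for _ in range(4)]
x_tr, y_tr, x_va, y_va = synth
opt = torch.optim.Adam(synth, lr=1e-2)

def normalise(v):
    return (v - v.mean()) / (v.std() + 1e-8)

for _ in range(100):
    opt.zero_grad()
    dish = torch.stack([petri_dish_val_loss(s, x_tr, y_tr, x_va, y_va)
                        for s in slopes])
    # MSE between normalised Petri dish and ground-truth losses.
    outer_loss = ((normalise(dish) - normalise(gt_losses)) ** 2).mean()
    outer_loss.backward()
    opt.step()

# At search time, new candidate motifs are ranked by the Petri dish
# validation loss they achieve on the optimised synthetic data.
```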

How It Differs From Other Neural Network-Based Models

According to the researchers, unlike other neural network-based prediction models that parse the structure of the motif to estimate its performance, the Synthetic Petri Dish predicts the performance of the motif by training the actual motif in an artificial setting, thus deriving predictions from its true intrinsic properties.

The researchers compared the Synthetic Petri Dish against a control: a neural network surrogate model trained to predict performance as a function of the sigmoid slope. This NN-based surrogate is a 2-layer, 10-neuron-wide feedforward network that takes the sigmoid slope value as input and predicts the corresponding MNIST network validation accuracy as its output.

Unlike this neural network-based model, which predicts the performance of new motifs from their scalar slope value alone, the Synthetic Petri Dish trains and evaluates each new motif independently with synthetic data. It actually uses an NN with a particular sigmoidal slope in a small experiment, and should therefore have better information about how well that slope performs.
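For contrast, a sketch of that structure-blind control might look as follows. The paper's description gives only the shape ("2-layer, 10-neuron-wide"); the ReLU activation and the reading of "2-layer" as one hidden layer plus an output layer are assumptions here.

```python
import torch.nn as nn

# Control surrogate: maps the scalar sigmoid slope directly to a
# predicted MNIST validation accuracy, without ever training the motif.
# The ReLU and the one-hidden-layer reading of "2-layer" are assumptions.
nn_surrogate = nn.Sequential(
    nn.Linear(1, 10),   # input: the sigmoid slope, a single scalar
    nn.ReLU(),
    nn.Linear(10, 1),   # output: predicted validation accuracy
)
```

Such a surrogate is fitted by regression on (slope, ground-truth accuracy) pairs, so it can only interpolate over slope values it has seen, whereas the Petri dish actually runs each new slope in a small experiment.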

Key Takeaways From This Research:

  • The Synthetic Petri Dish can predict the performance of new motifs with significantly higher accuracy, especially when insufficient ground-truth data is available.
  • According to the researchers, this work can inspire a new research direction: studying the performance of extracted components of models in a synthetic diagnostic setting optimised to provide informative evaluations.
  • The researchers stated that by approaching architecture search in this way, as a kind of question-answering problem on how certain motifs or factors impact final results, they gained the intriguing advantage that the prediction model is no longer a black box.

Read the paper here.


Ambika Choudhury

A Technical Journalist who loves writing about Machine Learning and Artificial Intelligence. A lover of music, writing and learning something out of the box.