Facebook recently introduced few-shot neural architecture search (NAS). The new approach combines the accurate network ranking of vanilla NAS with the speed and minimal computing cost of one-shot NAS. Few-shot NAS enables users to quickly design a powerful customised model for their tasks using just a few GPUs.
Few-shot NAS can effectively design numerous SOTA models, from convolutional neural networks (CNNs) for image recognition to generative adversarial networks (GANs) for image generation. The source code of Latent Action Monte Carlo Tree Search (LA-MCTS), alongside its application to NAS, is available on GitHub.
Why few-shot NAS?
Of late, NAS has become an exciting area in deep learning research, offering promising results in computer vision, especially when specialised models need to be found under different resources and platform constraints (for example, on-device models in virtual reality (VR) headsets).
For years, researchers have used vanilla NAS. It utilises search techniques to accurately explore the search space and evaluate new architectures by training them from scratch. However, this approach requires thousands of GPU hours, leading to high computing costs.
Meanwhile, one-shot NAS significantly lowers the computing cost by using a supernet: a single large network whose compound edges contain every candidate type of edge connection in the search space. Once pre-trained, a supernet can approximate the accuracy of neural architectures in the search space without each one having to be trained from scratch.
While one-shot NAS reduces GPU requirements, its search can be hampered by inaccurate predictions from the supernet, making it hard to identify suitable architectures. Facebook researchers Yiyang Zhao, Linnan Wang, Yuandong Tian, Rodrigo Fonseca and Tian Guo came up with few-shot NAS to address the issue.
Compared to one-shot NAS, few-shot NAS improves performance estimation by first partitioning the search space into independent regions and then employing multiple sub-supernets, one to cover each region. “To partition the search space in a meaningful way, we choose to leverage the structure of the original supernet. By picking each type of edge connection individually, we choose a way to split the search space that is consistent with how the supernet is constructed,” the researchers said.
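To make the partitioning idea concrete, here is a minimal, hypothetical sketch in plain Python (not the authors' code): a toy search space in which each of four compound edges picks one of three made-up operations, and one split step fixes the first compound edge, producing sub-supernets that cover disjoint regions of the space.

```python
from itertools import product

# Hypothetical toy search space (illustrative only): 4 compound edges,
# each choosing one of 3 candidate operations, so the full supernet
# covers 3**4 = 81 candidate architectures.
OPS = ("conv3x3", "conv5x5", "skip")
NUM_EDGES = 4

def split_supernet(fixed_ops=()):
    """Split a (sub-)supernet by fixing its next compound edge to each
    candidate operation in turn, yielding one sub-supernet per choice."""
    return [fixed_ops + (op,) for op in OPS]

def region(sub):
    """All architectures covered by a sub-supernet, i.e. every way of
    completing its prefix of fixed edge operations."""
    free = NUM_EDGES - len(sub)
    return {sub + rest for rest in product(OPS, repeat=free)}

subs = split_supernet()                 # one split -> 3 sub-supernets
regions = [region(s) for s in subs]
print(len(subs))                        # 3
print(sum(len(r) for r in regions))     # 81: the regions are disjoint
                                        # and together cover the space
```

Each sub-supernet is smaller than the original supernet, so it can estimate the accuracy of the architectures in its own region more faithfully, which is the trade-off the article describes.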
In other words, few-shot NAS is a trade-off between the accuracy of vanilla NAS and the low computing cost of one-shot NAS.
Here’s how few-shot NAS works
According to Facebook, the innovation offered by few-shot NAS arises from the observation that a supernet can be regarded as a representation of search space, and that we can enumerate every neural architecture by recursively splitting each supernet’s compound edges.
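The recursive-splitting view above can be sketched with a short, self-contained Python example (a toy illustration under assumed edge and operation names, not the authors' implementation): splitting every compound edge of a hypothetical supernet until each leaf of the recursion is a single concrete architecture.

```python
# Hypothetical toy supernet: 4 compound edges, 3 candidate operations
# per edge (the operation names are made up for illustration).
OPS = ("conv3x3", "conv5x5", "skip")
NUM_EDGES = 4

def enumerate_architectures(fixed_ops=()):
    """Recursively split the next compound edge; once every edge is
    fixed, the leaf is one concrete architecture."""
    if len(fixed_ops) == NUM_EDGES:
        return [fixed_ops]
    archs = []
    for op in OPS:  # one recursive split per candidate operation
        archs.extend(enumerate_architectures(fixed_ops + (op,)))
    return archs

archs = enumerate_architectures()
print(len(archs))   # 81 = 3**4: every architecture reached exactly once
```

Stopping the recursion early (after fixing only some edges) yields the sub-supernets of few-shot NAS; running it to the leaves recovers the exhaustive enumeration of vanilla NAS.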
The researchers tested the idea with multiple sub-supernets, checking whether the approach could offer the best aspects of both one-shot NAS and vanilla NAS. “To investigate this idea, we designed a search space containing 1,296 networks. First, we trained the networks in order to rank them according to their true accuracies on the CIFAR10 dataset. We then predicted the 1,296 networks using 6, 36, and 216 sub-supernets and compared the predicted ranking with the true ranking,” wrote Tian in a blog post.
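A common way to quantify how well predicted accuracies preserve a true ranking is a rank-correlation statistic such as Kendall's tau. The sketch below (with made-up accuracy numbers, not the paper's data) shows how a noisier predictor degrades the correlation while a more faithful one preserves the ranking.

```python
from itertools import combinations

def kendall_tau(true_scores, pred_scores):
    """Rank correlation between ground-truth and predicted scores:
    +1 means an identical ranking, -1 a fully reversed one."""
    pairs = list(combinations(range(len(true_scores)), 2))
    concordant = sum(
        1 for i, j in pairs
        if (true_scores[i] - true_scores[j])
           * (pred_scores[i] - pred_scores[j]) > 0
    )
    return (2 * concordant - len(pairs)) / len(pairs)

# Illustrative (made-up) accuracies for five architectures,
# listed from best to worst by ground truth.
true_acc      = [94.1, 93.5, 92.8, 91.9, 90.2]
one_shot_pred = [88.0, 89.5, 87.0, 88.5, 86.0]  # noisier single supernet
few_shot_pred = [90.5, 90.1, 89.4, 88.8, 87.9]  # multiple sub-supernets

print(kendall_tau(true_acc, one_shot_pred))  # 0.4: ranking partly scrambled
print(kendall_tau(true_acc, few_shot_pred))  # 1.0: ranking fully preserved
```

Note that the predictor does not need to match the true accuracies in absolute terms; for guiding a search, only the relative ordering of candidates matters.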
Interestingly, the researchers found that the ranking improved substantially even after adding just a few sub-supernets. With multiple sub-supernets, the predicted accuracy matched the ground-truth accuracy well, and both the ranking prediction and the final search performance improved, as depicted in the image below:
The researchers also tested their idea on real-world tasks and found that, compared to one-shot NAS, few-shot NAS improved the accuracy of architecture evaluation with only a slight increase in evaluation cost. For example, with the seven sub-supernets used in the experiments, few-shot NAS established new SOTA results:
- On ImageNet, few-shot NAS found models that reach 80.5 percent top-1 accuracy at 600 MFLOPs (millions of floating-point operations) and 77.5 percent top-1 accuracy at 238 MFLOPs.
- On CIFAR10, it reached 98.72 percent top-1 accuracy without using extra data or transfer learning.
- In AutoGAN, few-shot NAS outperformed the previously published results by up to 20 percent.
- Extensive experiments showed that few-shot NAS significantly improved various one-shot methods, including four gradient-based and six search-based strategies, on three tasks in NAS-Bench-201 and NAS-Bench-1Shot1.
“Overall, our work demonstrates that few-shot NAS is a simple yet highly effective advance over the ability of one-shot NAS to improve ranking prediction. It is also widely applicable to all existing NAS methods,” shared Facebook researchers.
Facebook believes the latest technique will help researchers develop broad applications, particularly when a candidate architecture needs to be evaluated quickly in search of better architectures.