Meta-learning was introduced to enable machine learning models to learn new skills and adapt to ever-changing environments from a limited number of training examples. The main objective of this approach is to find model-agnostic solutions.
One highly successful meta-learning algorithm has been Model-Agnostic Meta-Learning (MAML). This algorithm, with deep neural networks as the underlying model, has been highly influential, with significant follow-on work such as first-order variants, probabilistic extensions and augmentation with generative modelling.
The above figure illustrates the training of a simple model with a task-agnostic algorithm, Model-Agnostic Meta-Learning (MAML). The model’s parameters are trained so that a small number of gradient updates leads to fast learning on a new task.
Model-Agnostic Meta-Learning optimizes for a set of parameters such that when a gradient step is taken for a specific task, say i, the parameters are close to the optimal parameters θ*(i) for task i.
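To make this objective concrete, here is a minimal sketch in plain Python. It assumes a toy setting that is not taken from the original work: each task i has a single scalar parameter and the quadratic loss 0.5 * (θ - c_i)^2, so the optimal parameter for task i is simply c_i. The names `inner_step`, `meta_gradient` and `maml`, and the step sizes `alpha` and `beta`, are illustrative choices.

```python
def inner_step(theta, target, alpha):
    """One gradient step on the toy task loss 0.5 * (theta - target)**2."""
    grad = theta - target
    return theta - alpha * grad


def meta_gradient(theta, targets, alpha):
    """Gradient of the mean post-adaptation loss with respect to theta."""
    total = 0.0
    for target in targets:
        adapted = inner_step(theta, target, alpha)
        # Chain rule through the inner step: d(adapted)/d(theta) = 1 - alpha
        # for this quadratic loss (this is the second-order term in MAML).
        total += (adapted - target) * (1.0 - alpha)
    return total / len(targets)


def maml(targets, alpha=0.1, beta=0.5, steps=200, theta=0.0):
    """Outer loop: move theta so that one inner step works well on every task."""
    for _ in range(steps):
        theta -= beta * meta_gradient(theta, targets, alpha)
    return theta


# Hypothetical task optima c_i for three toy tasks.
targets = [-1.0, 0.5, 3.5]
theta_meta = maml(targets)
print(theta_meta)
```

In this toy case the meta-learned θ settles at the mean of the task optima, which is the starting point that minimizes the average loss after one inner gradient step. With a real neural network the same two-level structure applies, but the gradients would come from an automatic differentiation library rather than a hand-derived formula.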
Model-agnostic meta-learning, like any machine learning model, eventually runs into issues such as a shortage of labeled data. The model is starved of data and forced to learn from only a few examples. In this scenario, two approaches are widely considered:
- Rapid learning
- Feature reuse
Evaluating Rapid Learning And Feature Reuse
Rapid learning is the use of large, efficient changes in the representations, whereas feature reuse involves a meta-initialization that already contains high-quality features.
In rapid learning, large representational and parameter changes occur during adaptation to each new task as a result of favorable weight conditioning from the meta-initialization. In feature reuse, the meta-initialization already contains highly useful features that can mostly be reused as is for new tasks, so little task-specific adaptation occurs.
As can be seen in the above figure, in Rapid Learning, outer loop training leads to a parameter setting that is well-conditioned for fast learning, and inner loop updates result in significant task specialization. In Feature Reuse, the outer loop leads to parameter values corresponding to reusable features, from which the parameters do not move significantly in the inner loop.
To find out whether MAML benefits more from rapid learning or feature reuse, researchers from MIT, Google and Cornell University collaborated to evaluate the two hypotheses.
They performed two sets of experiments:
- Evaluating few-shot learning performance when parameters are frozen after MAML training, without test-time inner loop adaptation.
- Using representational similarity tools to directly analyze how much the network features and representations change through the inner loop.
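The second experiment can be illustrated with a small sketch. Linear centered kernel alignment (CKA) is one commonly used representational similarity measure; it is shown here as an example of such a tool, not as a reproduction of the authors' analysis. The feature matrices below are made-up values (rows are examples, columns are features), and the function names are illustrative.

```python
import math


def center_columns(mat):
    """Subtract each column's mean so every feature has zero mean."""
    n = len(mat)
    means = [sum(col) / n for col in zip(*mat)]
    return [[v - m for v, m in zip(row, means)] for row in mat]


def cross_gram_sq_norm(a, b):
    """Squared Frobenius norm of a^T b for row-major matrices a and b."""
    total = 0.0
    for col_a in zip(*a):
        for col_b in zip(*b):
            dot = sum(x * y for x, y in zip(col_a, col_b))
            total += dot * dot
    return total


def linear_cka(x, y):
    """Linear CKA between two sets of features computed on the same examples."""
    x, y = center_columns(x), center_columns(y)
    denom = math.sqrt(cross_gram_sq_norm(x, x) * cross_gram_sq_norm(y, y))
    return cross_gram_sq_norm(x, y) / denom


# Hypothetical layer activations before inner loop adaptation.
before = [[1.0, 2.0], [3.0, 1.0], [0.0, 4.0], [2.0, 2.0]]
# Same features up to a rescaling: CKA treats these as unchanged.
after_reused = [[2.0 * v for v in row] for row in before]
# Substantially different features, as rapid learning would produce.
after_changed = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]

print(linear_cka(before, after_reused))
print(linear_cka(before, after_changed))
```

A score near 1 between pre-adaptation and post-adaptation features is what feature reuse would predict, while a clearly lower score would point toward rapid learning.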
The MiniImageNet dataset was used for the experiments. The results show feature reuse to be the dominant factor in MAML’s effectiveness.
The authors in their paper state that for all layers except the head of the neural network, the meta-initialization learned by the outer loop of MAML results in very good features that can be reused as is on new tasks.
Inner loop adaptation also does not significantly change the representations of these layers, even from early on in training. The authors therefore suggest a simplification of the MAML algorithm: the ANIL (Almost No Inner Loop) algorithm, which removes the inner loop updates for all layers except the head of the network.
The researchers claim that the ANIL algorithm significantly speeds up both training and inference.
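The difference between the two inner loops can be sketched on a toy model. The example below assumes a hypothetical two-parameter network, prediction = head * (body * x), trained with mean squared error; it shows only the inner loop adaptation step, not the outer loop meta-training. The `adapt_body` flag is an illustrative switch: with it on, the loop behaves like MAML's inner loop, and with it off, only the head is updated, as in ANIL.

```python
def predict(body, head, x):
    """Toy network: a one-weight 'body' (features) followed by a one-weight head."""
    return head * (body * x)


def mse(body, head, data):
    """Mean squared error over (x, y) pairs."""
    return sum((predict(body, head, x) - y) ** 2 for x, y in data) / len(data)


def grads(body, head, data):
    """Gradients of the mean squared error with respect to body and head."""
    g_body = g_head = 0.0
    for x, y in data:
        err = predict(body, head, x) - y
        g_body += 2.0 * err * head * x
        g_head += 2.0 * err * body * x
    n = len(data)
    return g_body / n, g_head / n


def inner_loop(body, head, data, alpha=0.05, steps=5, adapt_body=True):
    """Task adaptation; adapt_body=False updates the head only (ANIL-style)."""
    for _ in range(steps):
        g_body, g_head = grads(body, head, data)
        if adapt_body:
            body -= alpha * g_body
        head -= alpha * g_head
    return body, head


# Hypothetical support set for a new task, drawn from y = 3x.
support = [(1.0, 3.0), (2.0, 6.0)]
maml_body, maml_head = inner_loop(1.0, 1.0, support, adapt_body=True)
anil_body, anil_head = inner_loop(1.0, 1.0, support, adapt_body=False)

print(maml_body, maml_head, mse(maml_body, maml_head, support))
print(anil_body, anil_head, mse(anil_body, anil_head, support))
```

In the head-only variant the body stays exactly at its meta-initialized value, so no gradients need to flow through the feature layers during adaptation, which is consistent with the speedup the researchers report.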
Key Takeaways
- Researchers find that feature reuse is the dominant component in MAML’s efficacy
- They introduce the ANIL (Almost No Inner Loop) algorithm, a simplification of MAML that matches its performance on standard image classification and reinforcement learning benchmarks
- Results show that the features learned by the lower layers of a network are sufficient for few-shot classification at test time, without relying on the final layer
The applications of meta-learning are not limited to semi-supervised tasks; it can also be used to advantage in tasks such as item recommendation, density estimation, and reinforcement learning.