Optimising machine learning models to do specific tasks by manipulating hyperparameters involves significant inputs from the user, which makes it a tedious process. This burden was eased with the introduction of meta-learning, which, in a way, has automated the optimisation part of the process. However, current meta-learning approaches still need manual inputs.
To address this, researchers from UC Berkeley, Carnegie Mellon and Stanford University collaborated to introduce unsupervised meta-learning, in which machine learning algorithms propose their own task distributions.
Unsupervised meta-learning, say the researchers, would further reduce the amount of human supervision required to solve tasks, potentially inserting a new rung on this ladder of abstraction.
Take the example of a self-driving car: the relevant tasks range from lane discipline to smoothness, safety and efficiency, each tuned to the individual rider. To design a reward function for these tasks, one has to collect input from the riders, which is cumbersome for the developers.
When we define task distributions for meta-learning, we do so with some prior knowledge in mind. Without this prior information, tuning a learning procedure is challenging because some configuration might enable learning on some tasks while ignoring other tasks.
If designing task distributions is the bottleneck in applying meta-learning algorithms, why not have meta-learning algorithms propose their own tasks?
There is a catch, however: the No Free Lunch Theorem suggests that it is impossible to propose useful tasks without some additional knowledge.
So, the researchers took cues from reinforcement learning, where a robot can interact with its environment without receiving any reward, knowing that downstream tasks will be constructed by defining reward functions for this very environment (i.e. the real world).
Seen from this perspective, the recipe for unsupervised meta-learning becomes clear: given unlabeled data or an unlabeled environment, construct task distributions from it, and then meta-learn to quickly solve these self-proposed tasks.
Even though the tasks constructed can be random, the resulting task distribution is not random, because all tasks share the underlying unlabeled data.
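As a rough illustration of how tasks could be proposed from unlabeled data, the sketch below splits one shared dataset with random hyperplanes to produce several binary classification tasks. This is a minimal sketch under stated assumptions, not the researchers' method: the function name `propose_tasks`, the hyperplane scheme and the synthetic data are all hypothetical choices made for illustration.

```python
import random


def propose_tasks(unlabeled, num_tasks, seed=0):
    """Propose binary labelings of one shared dataset (hypothetical helper).

    Each task projects every point onto a random direction and labels the
    upper half of the projections 1 and the lower half 0.
    """
    rng = random.Random(seed)
    dim = len(unlabeled[0])
    tasks = []
    for _ in range(num_tasks):
        # A random direction defines one self-proposed task.
        direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        scores = [sum(w * x for w, x in zip(direction, point))
                  for point in unlabeled]
        # Threshold at the middle score so both classes are populated.
        threshold = sorted(scores)[len(scores) // 2]
        tasks.append([1 if s >= threshold else 0 for s in scores])
    return tasks


# Synthetic stand-in for an unlabeled dataset: 200 points in 5 dimensions.
data_rng = random.Random(42)
data = [[data_rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(200)]

tasks = propose_tasks(data, num_tasks=3)
print(len(tasks), len(tasks[0]))
```

Every proposed task labels the same 200 points, so although each labeling is drawn at random, the tasks remain tied together by the structure of the underlying data.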
To make this precise, the researchers first define an optimal meta-learner as one that achieves the largest expected reward, averaged across the distribution of tasks. The regret of any other meta-learner is then the gap between its expected reward and that of the optimal one.
So, an unsupervised meta-learner can be thought of as a meta-learner that can achieve minimum worst-case ‘regret’ across all possible task distributions within an environment.
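The worst-case regret idea can be made concrete with a toy calculation. The sketch below is only an illustration under loud assumptions: the three learner names and every reward value are invented, and the "optimal" reward for each task distribution is approximated by the best of the three candidates rather than by a true optimum.

```python
# Hypothetical expected rewards. Each list holds one learner's reward on
# three candidate task distributions; all numbers are made up.
expected_reward = {
    "specialist": [0.9, 0.2, 0.3],
    "generalist": [0.7, 0.6, 0.6],
    "from_scratch": [0.4, 0.4, 0.4],
}
num_distributions = 3

# Stand-in for the optimal learner: the best candidate on each distribution.
best_per_distribution = [
    max(rewards[j] for rewards in expected_reward.values())
    for j in range(num_distributions)
]


def worst_case_regret(rewards):
    """Largest gap to the best achievable reward over all distributions."""
    return max(best - r for best, r in zip(best_per_distribution, rewards))


regrets = {name: worst_case_regret(r) for name, r in expected_reward.items()}

# The minimax choice is the learner whose worst-case regret is smallest.
minimax_learner = min(regrets, key=regrets.get)
print(minimax_learner)
```

In this toy table the specialist is the best learner on the first distribution but falls far behind on the second, so the learner that stays reasonably close to the best everywhere ends up with the smallest worst-case regret.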
Experimental results show that the unsupervised meta-RL algorithm, which does not require manual task design, substantially improves on learning from scratch and is competitive with supervised meta-RL approaches on benchmark tasks.
Meta-learning algorithms with memory, believe the researchers, may perform the best with different task proposal mechanisms.
Meta-learning has emerged as one of the most promising techniques that could lead the community towards artificial general intelligence (AGI). With unsupervised meta-learning, this goal may be one step closer, as it further reduces the amount of human supervision required to solve tasks.
The key findings in this work can be summarised as follows:
- This work analyses how exactly we might achieve unsupervised meta-learning
- The researchers define an optimal unsupervised meta-learner through a regret criterion that can be leveraged to enable self-proposed tasks
- Scaling unsupervised meta-learning to leverage large-scale datasets and complex tasks holds the promise of acquiring learning procedures that solve real-world problems more efficiently than our current learning procedures
- For a variety of robotic control tasks, unsupervised meta-RL can effectively acquire RL procedures
Unsupervised learning is closely connected to unsupervised meta-learning: the former uses unlabeled data to learn features, while the latter uses unlabeled data to tune the learning procedure. So, the researchers put the question back to the community, asking whether there can be a unifying treatment of both approaches.