Most people can learn how to do a task after seeing someone else do it just once. Robots intended to learn by copying people, by contrast, usually require several human examples before they can successfully repeat the desired behaviour. Researchers have claimed to teach robots new activities from a single human example using meta-learning techniques. However, these learning strategies frequently require collecting real-world data, which can be time-consuming. To address this issue, a team of researchers at Imperial College London has created a novel method for one-shot imitation learning in robots.
Task-embedded control networks (TecNets) allow artificial agents to learn how to accomplish tasks from one or more demonstrations, together with synthetically generated training data. During the robot’s training, the researchers’ method does not require any interaction with real humans. Instead, the system uses TecNets to infer control policies: a human example conditions a given control policy, which in turn enables one-shot imitation learning.
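The core idea of conditioning a policy on a task embedding can be sketched in a few lines. This is a minimal illustration, not the researchers’ actual architecture: the dimensions are arbitrary, and the randomly initialised matrices stand in for trained networks. A single demonstration is pooled into a fixed-size, unit-norm task embedding, which is then concatenated with each new observation and fed to the policy.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions, for illustration only.
OBS_DIM, EMB_DIM, ACT_DIM = 8, 4, 2

# Randomly initialised weights stand in for trained networks.
W_embed = rng.standard_normal((OBS_DIM, EMB_DIM))
W_policy = rng.standard_normal((OBS_DIM + EMB_DIM, ACT_DIM))

def embed_task(demo):
    """Map one demonstration (T x OBS_DIM frames) to a unit-norm task embedding."""
    z = (demo @ W_embed).mean(axis=0)   # per-frame embeddings, pooled over time
    return z / np.linalg.norm(z)

def policy(obs, task_embedding):
    """A control policy conditioned on the task embedding."""
    return np.concatenate([obs, task_embedding]) @ W_policy

# One-shot conditioning: a single demo yields the task embedding,
# which then conditions the policy on new observations.
demo = rng.standard_normal((20, OBS_DIM))   # one recorded demonstration
z = embed_task(demo)
action = policy(rng.standard_normal(OBS_DIM), z)
print(action.shape)  # (2,)
```

The point of the embedding is that the same policy weights serve every task; only the conditioning vector changes when a new demonstration arrives.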
What is Imitation Learning?
Imitation learning (IL) aims to imitate human behaviour in a given activity. By learning a mapping between observations and actions, an agent (a learning machine) can be taught to accomplish a task from demonstrations. The concept of teaching by imitation has been around for a long time, but because of recent breakthroughs in computing and sensing, and a rising need for intelligent applications, the field is gaining traction. It makes it easier to teach complicated activities to users who have only a rudimentary knowledge of the skills involved.
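In its simplest form, this mapping from observations to actions is just a supervised learning problem (often called behavioural cloning). The toy sketch below assumes a hypothetical “expert” whose actions are a fixed linear function of the observation; the learner recovers that mapping purely from recorded (observation, action) pairs.

```python
import numpy as np

rng = np.random.default_rng(1)

# A hypothetical "expert" whose actions are a fixed linear function of
# the observation; the learner never sees this matrix directly.
W_expert = rng.standard_normal((5, 2))
expert_policy = lambda obs: obs @ W_expert

# Demonstrations: (observation, action) pairs recorded from the expert.
obs = rng.standard_normal((200, 5))
acts = expert_policy(obs)

# Behavioural cloning = supervised regression from observations to actions.
W_learned, *_ = np.linalg.lstsq(obs, acts, rcond=None)

# The cloned policy now imitates the expert on unseen observations.
test_obs = rng.standard_normal(5)
print(np.allclose(test_obs @ W_learned, expert_policy(test_obs)))  # True
```

Real tasks need nonlinear function approximators and far more data, but the structure of the problem — regress actions on observations — is the same.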
Without the need for explicit programming or task-specific reward functions, generic imitation learning methods can reduce the problem of teaching a task to that of presenting demonstrations. Modern sensors can quickly collect and transmit large amounts of data, and processors with tremendous computing capacity can quickly analyse the sensory input and convert it to actions. This opens the door to a wide range of AI applications that require real-time observation and reaction, including humanoid robots, self-driving cars, human-computer interaction, and computer games, to name a few. On the other hand, learning by imitation requires specialised algorithms to learn models successfully and reliably.
Benefits of IL
Using deep neural networks (DNNs), several recent algorithms, such as generative adversarial imitation learning, have successfully learned human decision-making strategies from behavioural data. However, such DNN-based models are “black box” models: it is difficult to explain what information they have learnt from humans and how they make their decisions, a question that has received little attention in the imitation learning literature.
Generic imitation learning approaches could reduce the difficulty of teaching a task to that of providing examples, without requiring explicit programming or the design of task-specific reward mechanisms. In a limited number of use cases, generative adversarial imitation learning (GAIL) has shown “tremendous effectiveness, especially when paired with neural network parameterisation,” according to several research articles. Unlike reinforcement learning, GAIL learns a policy, and an implicit reward signal, directly from demonstrations by a human expert.
GAIL is closely related to Inverse Reinforcement Learning (IRL) and, as the name suggests, is built on generative adversarial networks (GANs). It is a model-free imitation learning algorithm. Compared with other model-free methods for imitating complicated behaviours, it has shown substantial performance advantages, especially in large, high-dimensional environments. Despite these promising results, the theory behind GAIL remains poorly understood. Indeed, it appears to presume that all of the demonstrations come from a single expert, and it does not extract an explicit reward function that could be reused elsewhere.
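The adversarial loop behind GAIL can be illustrated on a deliberately tiny 1-D problem. This is a sketch of the idea only, with none of the machinery of the real algorithm (no neural networks, no trust-region policy optimisation): a logistic-regression discriminator is trained to tell expert actions from policy actions, and the policy then moves in the direction the discriminator scores as more expert-like.

```python
import numpy as np

rng = np.random.default_rng(2)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy 1-D problem: the "expert" acts near +1, the policy starts near -1.
expert_actions = rng.normal(1.0, 0.1, size=500)
policy_mean = -1.0

for _ in range(100):
    policy_actions = rng.normal(policy_mean, 0.1, size=500)

    # Train a fresh logistic-regression discriminator D(a) = sigmoid(w*a + b)
    # to label expert samples 1 and policy samples 0.
    w, b = 0.0, 0.0
    for _ in range(50):
        for a, label in ((expert_actions, 1.0), (policy_actions, 0.0)):
            p = sigmoid(w * a + b)
            w -= 0.1 * np.mean((p - label) * a)
            b -= 0.1 * np.mean(p - label)

    # Policy step: move toward actions the discriminator rates as
    # expert-like (a crude stand-in for a policy-gradient update).
    policy_mean += 0.05 * np.tanh(w)

print(policy_mean)  # drifts close to the expert mean of 1.0
```

As the policy's actions become indistinguishable from the expert's, the discriminator loses its signal and the update stalls — the same equilibrium argument that motivates GANs.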
Conclusion
All things considered, adversarial approaches have been shown to be unstable and slow to converge when only small amounts of data are available. The authors who pioneered Adversarial Inverse Reinforcement Learning (AIRL) argue that GAIL “fails to generalise to varied environments’ dynamics.” According to their findings, AIRL is more robust to changes in the dynamics of the environment. The Imperial College researchers are now aiming to investigate other actions that robots could be taught using their method. Human actions can be transferred from simulation to reality in a variety of ways, and much more is likely in the near future. These developments will rely heavily on the efforts of Indian researchers.