Reinforcement learning & imitation learning: A comparative analysis

Reinforcement learning powers DeepMind’s MuZero, AlphaStar and Agent57, among others, while imitation learning is at the heart of Waymo’s self-driving cars. But what exactly are these two training methods, and how do they stack up against each other? Let’s find out.

What is reinforcement learning? 

Reinforcement learning (RL) refers to how a model learns to perform a task through repeated trial-and-error interactions with a dynamic environment. The system learns to make decisions based on a reward function, without human intervention or explicit programming. Some researchers consider RL a viable path to artificial general intelligence (AGI), in part because it does not depend on historical data sets. To that end, tech companies like Facebook, Google, DeepMind, Amazon, and Microsoft have committed substantial resources to pushing the frontiers of RL. The goal of RL is to learn an optimal policy, one that maximises the long-term cumulative reward.
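To make the trial-and-error loop concrete, here is a minimal sketch of tabular Q-learning, one common RL algorithm, on a toy five-state corridor. The environment, the `step` function and all hyperparameters are assumptions made up for illustration; they do not come from any of the systems mentioned in this article.

```python
import random

N_STATES = 5          # hypothetical corridor: states 0..4, state 4 is the goal
ACTIONS = (-1, +1)    # step left or step right

def step(state, action):
    """Toy environment: a reward of 1.0 is given only on reaching the goal."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn action values by trial and error with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                a = rng.randrange(2)                     # explore
            else:
                best = max(q[state])                     # exploit, ties broken at random
                a = rng.choice([i for i in range(2) if q[state][i] == best])
            next_state, reward, done = step(state, ACTIONS[a])
            # Q-learning update: move towards reward plus discounted future value
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q

def greedy_policy(q):
    """Pick the highest-valued action in every non-goal state."""
    return [ACTIONS[row.index(max(row))] for row in q[:-1]]

q_table = train()
rl_policy = greedy_policy(q_table)
print(rl_policy)
```

No one tells the agent that moving right is correct. With these settings the greedy policy should come to favour stepping right (+1) in every non-goal state, purely because that maximises the discounted cumulative reward.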

What is imitation learning?

Imitation learning (IL) is a training method where the computer imitates human behaviour. In IL, instead of a reward function, an expert, usually a human, provides the agent with a set of demonstrations. The agent then tries to learn the optimal policy by following and imitating the expert’s decisions. In effect, the agent learns a mapping from observations to actions based on the demonstrations.
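The observation-to-action mapping can be sketched with behavioural cloning, the simplest form of imitation learning, which treats the demonstrations as supervised training data. The example below is a deliberately tiny, table-based version. The observation names, the actions and the `default` fallback are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict

def fit_behavioural_cloning(demonstrations):
    """Map each observation to the action the expert chose most often for it."""
    counts = defaultdict(Counter)
    for observation, action in demonstrations:
        counts[observation][action] += 1
    return {obs: c.most_common(1)[0][0] for obs, c in counts.items()}

def act(policy, observation, default="brake"):
    """Look up the cloned action; fall back to a default for unseen observations."""
    return policy.get(observation, default)

# Hypothetical expert demonstrations as (observation, action) pairs
demos = [
    ("red_light", "brake"),
    ("green_light", "accelerate"),
    ("green_light", "accelerate"),
    ("curve_left", "steer_left"),
    ("green_light", "brake"),      # an occasional inconsistent demonstration
]

cloned_policy = fit_behavioural_cloning(demos)
print(act(cloned_policy, "green_light"))   # accelerate
print(act(cloned_policy, "pedestrian"))    # brake (fallback, never demonstrated)
```

The second call shows the method's weak spot: for an observation the expert never demonstrated, the agent has nothing to imitate and can only fall back on a default. Practical systems replace the lookup table with a learned model such as a neural network, but the training signal is still the expert's actions rather than a reward.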

Benefits

Reinforcement learning does not need large datasets or historical data to train the agent. Hence, RL bypasses the challenges of data labelling and the pitfalls of biased and incorrect data. The method allows the agent to be innovative and design solutions humans may not have thought of, furthering its adaptability.

Imitation learning sidesteps training issues such as the lack of a well-defined reward function and the need for explicit programming. Research shows generative adversarial imitation learning has ‘tremendous effectiveness, especially when paired with neural network parameterisation’ in some use cases.

Limitations

RL comes with its own set of challenges. Agents can be very hard to train in environments with sparse or no rewards. RL is also sample-inefficient: a system needs a large number of interactions before it performs well. For instance, DeepMind’s AlphaGo Zero played close to five million games of Go against itself before it surpassed the version of AlphaGo that had beaten world champion Lee Sedol. Poor reproducibility and agents that underperform in real-life scenarios are other major limitations.
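The sparse-reward problem can be illustrated with a small simulation. The sketch below assumes a hypothetical corridor in which only the far end is rewarded, and counts how many steps an untrained agent acting uniformly at random needs before it sees its first reward. The function names, corridor lengths and trial count are illustrative choices.

```python
import random

def steps_until_first_reward(corridor_length, rng):
    """Count the steps a uniformly random agent takes to reach the rewarded end."""
    position, steps = 0, 0
    while position < corridor_length:
        # Move left or right at random; the wall at position 0 blocks further left moves
        position = max(position + rng.choice((-1, 1)), 0)
        steps += 1
    return steps

def average_steps(corridor_length, trials=200, seed=0):
    """Average the step count over several independent trials."""
    rng = random.Random(seed)
    total = sum(steps_until_first_reward(corridor_length, rng) for _ in range(trials))
    return total / trials

short_avg = average_steps(4)
long_avg = average_steps(16)
print(short_avg, long_avg)
```

Until the first reward arrives, the agent has no signal to learn from. In this toy setting, the number of steps before that happens grows much faster than the corridor length, which hints at why sparse rewards make training slow.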

Imitation learning is data-driven, which means a model built on biased historical data can inherit those biases and pose ethical problems. IL also tends to generalise poorly, because the demonstrations it is fed cover only a sample of the situations the agent may encounter. This is arguably why models like GPT-3, which have billions of parameters and are trained on large text corpora, tend to go rogue.

Learning efficiency

Since reinforcement learning is based on a reward mechanism, the trainer has to define the rules by which the agent is rewarded. RL works best when the action space of the model is different from that of the expert, allowing the model to learn and innovate based on the problem. However, given the sparse nature of rewards and the constant learning and re-learning, reinforcement learning requires many training episodes.

Imitation learning is efficient when the action spaces of the model and the trainer overlap. For instance, in a self-driving scenario, the action space of both the model and the human driver consists of the same brake, steering and accelerator controls. Therefore, imitation learning typically doesn’t require as many training episodes.

Use cases

Reinforcement learning is used for text summarisation, chatbots, self-driving cars, online stock trading, automating data centre cooling, and recommendation systems. It is also used in games like Pac-Man. DeepMind’s AlphaGo Zero is another example, where the model learns to play Go from scratch by playing against itself.

ALVINN, one of the earliest self-driving systems, is a classic example of imitation learning. The car was fitted with sensors, and a neural network had to learn to map the sensor inputs to steering angles so that the car could drive autonomously. Today companies like Tesla and Waymo leverage imitation learning for their self-driving cars. DeepMind has also leveraged the technique in its model MIA.

Avi Gopani

Avi Gopani is a technology journalist who seeks to analyse industry trends and developments from an interdisciplinary perspective at Analytics India Magazine. Her articles chronicle cultural, political and social stories that are curated with a focus on the evolving technologies of artificial intelligence and data analytics.