“Beyond the cost of a robot, there are many design choices in choosing how to set-up the algorithm and the robot.”
Levine et al.
From Atari and chess to poker and a single robotic hand solving a Rubik’s Cube, deep reinforcement learning has demonstrated remarkable progress on a wide variety of challenging tasks.
Like humans, DeepRL agents adopt strategies to generate long-term rewards. The reward-driven paradigm of learning by trial-and-error is known as reinforcement learning (RL). DeepRL has emerged at the confluence of deep learning and RL, geared to achieve human-level performance across challenging domains.
Applying reinforcement learning requires setting up an environment, modelling a reward function and more, and you may have to start every task from scratch. RL methods can be data-hungry, and starting from scratch for every new problem makes them impractical in real-world situations. For instance, RL algorithms can require millions of stochastic gradient descent (SGD) steps to train policies that accomplish complex tasks, and the number of training steps tends to increase with model size. The usefulness of what the agent learns also depends on the quality of the data provided.
Overview Of DeepRL
Deep RL algorithms use the representational power of deep learning to tackle the reinforcement learning problem, with rewards chosen to steer learning. Reward functions are carefully crafted to guide the agent in the desired direction. For example, consider teaching a robotic arm to reach a target on its own, or an AI to play a strategic game like Go or chess.
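To make the idea of a crafted reward concrete, here is a minimal sketch of a reward for the reach-a-target example. The function name `reach_reward`, the `tolerance` and the `success_bonus` values are illustrative assumptions, not something taken from the survey; real robotic setups typically need far more careful design.

```python
import math


def reach_reward(effector_pos, target_pos, tolerance=0.05, success_bonus=10.0):
    """Hypothetical reward for a reaching task.

    Returns (reward, reached). The reward is the negative Euclidean
    distance to the target, plus a bonus once the end effector is
    within `tolerance` of it. All constants are illustrative.
    """
    distance = math.dist(effector_pos, target_pos)
    reached = distance <= tolerance
    reward = -distance
    if reached:
        reward += success_bonus
    return reward, reached


# Usage: far from the target the reward is just the negative distance.
reward, reached = reach_reward((0.0, 0.0, 0.0), (3.0, 4.0, 0.0))
```

The distance term gives the agent a signal at every step, while the bonus marks actual success; a reward with only the bonus would be much sparser and usually harder to learn from.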
Key Concepts in DeepRL
- On-policy vs off-policy
- Exploration strategies
- Generalization
- Reward Shaping
Exploration algorithms in deep RL can be based on randomized value functions, unsupervised policy learning or intrinsic motivation. Memory-based exploration strategies, in contrast, aim to offset the disadvantages of reward-based approaches, because rewards in varying environments can be inadequate in real-time scenarios.
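As one simple illustration of intrinsic motivation, the sketch below adds a count-based novelty bonus to the environment reward. This is a textbook-style simplification chosen for brevity, not the method of any particular paper mentioned here; the class name `CountBonus` and the `scale` parameter are hypothetical, and states are assumed to be hashable (for example, discretised tuples).

```python
import math
from collections import defaultdict


class CountBonus:
    """Hypothetical count-based exploration bonus: scale / sqrt(visits)."""

    def __init__(self, scale=1.0):
        self.scale = scale
        self.counts = defaultdict(int)

    def bonus(self, state):
        # Record the visit, then reward rarely seen states more.
        self.counts[state] += 1
        return self.scale / math.sqrt(self.counts[state])


def total_reward(extrinsic, state, explorer):
    """Environment reward plus the intrinsic novelty bonus."""
    return extrinsic + explorer.bonus(state)
```

The bonus shrinks as a state is revisited, so the agent is nudged toward unfamiliar states even when the environment reward is zero.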
When it comes to deploying DeepRL in real-world robotics, collecting high-quality data becomes challenging, which in turn makes generalization difficult. In RL, generalization typically refers to transferring what was learned between tasks. Unlike computer vision, where humans can label the data, achieving generalization in robotics requires reinforcement learning algorithms that take advantage of vast amounts of prior data. DeepRL agents struggle to transfer their experience to new environments, and according to OpenAI researchers, generalizing between tasks remains difficult for state-of-the-art DeepRL algorithms.
In a recent survey, researcher Sergey Levine and his peers examine how deep RL fares in a robotics context. They address many key challenges in RL and offer a new perspective on major challenges that remain to be solved.
Addressing The Challenges
The researchers considered robotic activities such as locomotion and grasping, and explored the current solutions and outstanding challenges holding these applications back.
For example, the researchers observed that grasping remains one of the significant open problems in robotics. Teaching a robot to grasp requires complex interaction with previously unseen objects, closed-loop vision-based control to react to unforeseen dynamics or situations and, in some cases, pre-manipulation to isolate the object to be grasped.
The researchers concluded:
- To learn generalizable grasping, we need unattended data collection and a scalable RL pipeline.
- To get large and varied data, we need to leverage all of the previously collected offline data, and we need a framework that makes this easy.
- To achieve maximal performance, combine offline data with a small amount of online data; this raised grasp success from 86% to 96%.
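The last conclusion, combining offline data with a small amount of online data, can be pictured as drawing each training batch from two pools. The sketch below is an assumption-laden illustration and not the pipeline the researchers used: the function name `sample_mixed_batch` and the default `online_fraction=0.25` are hypothetical, and transitions are treated as opaque items.

```python
import random


def sample_mixed_batch(offline_data, online_data, batch_size,
                       online_fraction=0.25, rng=None):
    """Hypothetical sampler mixing offline and online transitions.

    `offline_data` must be non-empty. Online items are drawn without
    replacement (capped by how many exist); the rest of the batch is
    filled from offline data, sampled with replacement.
    """
    rng = rng or random.Random()
    n_online = min(int(batch_size * online_fraction), len(online_data))
    n_offline = batch_size - n_online
    batch = rng.choices(offline_data, k=n_offline)
    if n_online:
        batch += rng.sample(online_data, n_online)
    rng.shuffle(batch)
    return batch
```

Keeping the online share small reflects the idea that most of the learning signal comes from previously collected data, with fresh experience used to correct for what the offline data missed.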
Another bottleneck in robotic learning is the autonomous and safe collection of a large amount of data. Learning algorithms that perform well in the popular “Gym” environments may not work well on real robots. This is where simulation comes into the picture. The researchers note that simulation can run orders of magnitude faster than real time, and that many instances can be started simultaneously. “Combining with sim-to-real transfer techniques, simulators allow us to learn policies that can be deployed in the real world with a minimal amount of real world interaction,” the authors explained.
Deep RL algorithms are notoriously difficult to use in practice. Performance depends on careful hyperparameter settings and often varies substantially between runs. According to researchers at Berkeley, any effective data-driven method for DeepRL should be able to use data to pre-train offline and then improve with online fine-tuning. Offline pre-training helps the agent learn about the dynamics of the world and the task being solved.
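Because results can vary substantially between runs, a common precaution is to repeat training with several random seeds and report the spread rather than a single number. The helper below is a minimal sketch of that bookkeeping; `summarize_runs` is a hypothetical name, and the input is assumed to be one final return per seed, produced by whatever training code is in use.

```python
import statistics


def summarize_runs(final_returns):
    """Summarise the final returns from repeated runs (one per seed)."""
    spread = statistics.stdev(final_returns) if len(final_returns) > 1 else 0.0
    return {
        "mean": statistics.mean(final_returns),
        "stdev": spread,  # sample standard deviation across seeds
        "min": min(final_returns),
        "max": max(final_returns),
    }
```

A large standard deviation relative to the mean suggests that a reported improvement may come from a lucky seed or hyperparameter setting rather than from the method itself.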
Key Takeaways
The researchers covered DeepRL broadly from a robotics perspective. Here are a few takeaways:
- Current deep RL methods are not as inefficient as often believed.
- Training without persistent human oversight is itself a significant engineering challenge.
- A suitable goal for robotic deep reinforcement learning research would be to make robotic RL as natural and scalable as the learning performed by humans and animals.