Simulation enables engineers to prototype rapidly and with minimal human effort. In robotics, physics simulations provide a safe, low-cost virtual playground in which robots can acquire physical skills through Deep Reinforcement Learning (DRL). However, simulators rely on hand-derived physics models that may not accurately capture reality, so policies trained in them often struggle when tested on real hardware. This challenge is termed the “sim-to-real gap” or the domain adaptation problem. GAN-based approaches such as RL-CycleGAN and RetinaGAN have been used to bridge the sim-to-real gap for perception-focused tasks such as vision-based grasping. However, a gap remains in the dynamics of robotic systems. This raises the question of whether a more accurate physics simulator can be learnt from a handful of real robot trajectories. If so, the improved simulator could be used to refine the robot's controller so that it has a higher chance of succeeding in the real world.
In a paper presented at ICRA 2021, titled SimGAN: Hybrid Simulator Identification for Domain Adaptation via Adversarial Reinforcement Learning, the researchers proposed treating the physics simulator as a learnable component that is trained by DRL with a special reward function. This reward penalises discrepancies between the trajectories (that is, the movement of the robot over time) generated in simulation and a small number of trajectories collected on real robots.
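As a point of reference, the most direct way to penalise trajectory differences is a hand-designed distance between a simulated trajectory and a time-aligned real one. The minimal sketch below assumes states are plain lists of floats and uses a negative squared Euclidean distance per time step; the function names and data are hypothetical. SimGAN replaces this kind of manual loss with a learnt GAN loss, so treat this only as a baseline illustration, not the paper's reward.

```python
from typing import List, Sequence

State = Sequence[float]


def step_discrepancy_reward(sim_state: State, real_state: State) -> float:
    """Hand-designed reward: negative squared Euclidean distance between two states."""
    if len(sim_state) != len(real_state):
        raise ValueError("states must have the same dimension")
    return -sum((s - r) ** 2 for s, r in zip(sim_state, real_state))


def trajectory_discrepancy_rewards(sim_traj: List[State],
                                   real_traj: List[State]) -> List[float]:
    """Per-step rewards for two time-aligned trajectories of equal length."""
    if len(sim_traj) != len(real_traj):
        raise ValueError("trajectories must have the same length")
    return [step_discrepancy_reward(s, r) for s, r in zip(sim_traj, real_traj)]


# Hypothetical (position, velocity) states at three time steps.
sim = [[0.0, 0.0], [0.1, 0.9], [0.3, 1.8]]
real = [[0.0, 0.0], [0.1, 1.0], [0.25, 1.7]]
rewards = trajectory_discrepancy_rewards(sim, real)
```

A loss like this needs time-aligned real and simulated trajectories plus a hand-picked distance and state weighting, which is precisely the manual design burden a learnt loss aims to remove.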
According to the researchers, reinforcement learning (RL) policies trained on simulation data can give robots a more diverse repertoire of behaviours. Although learning-based methods have made controller design in simulation far more automatic, transferring a trained policy from simulation to real hardware still typically involves considerable manual work. For example, when simulation parameters are randomised to account for the mismatch between simulation and the real world, the randomisation range must be wide enough to include all of the unmodelled differences, yet not so wide that it impedes the policy's performance.
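The range trade-off can be made concrete with a small domain-randomisation sketch. It assumes two hypothetical simulator parameters, a friction coefficient and a motor gain, each drawn uniformly from a (low, high) range at the start of every training episode; the names and ranges are illustrative only and are not taken from the paper.

```python
import random
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class RandomizationRange:
    low: float
    high: float

    def sample(self, rng: random.Random) -> float:
        # Uniform draw between low and high.
        return rng.uniform(self.low, self.high)


def sample_episode_params(ranges: Dict[str, RandomizationRange],
                          rng: random.Random) -> Dict[str, float]:
    """Draw one set of simulator parameters for a single training episode."""
    return {name: r.sample(rng) for name, r in ranges.items()}


# Hypothetical ranges: too narrow and they may miss the real robot's values;
# too wide and the policy must cope with dynamics it will never meet.
ranges = {
    "friction": RandomizationRange(0.5, 1.5),
    "motor_gain": RandomizationRange(0.8, 1.2),
}
rng = random.Random(0)
episodes = [sample_episode_params(ranges, rng) for _ in range(100)]
```

If the real robot's friction were, say, 2.0, no episode would ever cover it; widening the range to include it would also train the policy on many implausible settings. Tuning such ranges by hand is part of the manual effort the researchers set out to reduce.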
The researchers, therefore, focused their contributions on:
- A novel formulation of simulator identification as an adversarial RL problem
- A learnt GAN loss that reduces reliance on manual loss design and on carefully chosen excitation trajectories, requiring only limited, set-level supervision
- An expressive hybrid simulator parameterisation that reduces the need for a carefully defined parameter set
A conventional physics simulator simulates the movement and interaction of objects in a virtual world by solving differential equations. However, given the complexity of the situations robots may encounter in the real world, modelling them all by hand would be arduous (or even impossible), so it is helpful to employ a machine-learning approach instead. Although a simulator can be learnt entirely from data, a learnt simulator may violate the laws of physics when it has to model situations that its training data did not cover. Hence, a robot trained in such a limited simulator is more likely to fail in the real world.
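To ground the idea of "solving differential equations", here is a minimal sketch of a conventional simulator step for a hypothetical one-dimensional block driven by a motor against linear (viscous) damping, a simplified stand-in for friction. The model, parameter names and values are illustrative assumptions; the point is that the damping and motor gain are constants fixed by hand, and the equation of motion is integrated with semi-implicit Euler.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class BlockState:
    position: float  # metres
    velocity: float  # metres per second


def conventional_step(state: BlockState, command: float, dt: float = 0.01,
                      mass: float = 1.0, damping: float = 0.5,
                      motor_gain: float = 2.0) -> BlockState:
    """Advance m * dv/dt = motor_gain * command - damping * v by one time step.

    Semi-implicit Euler: update the velocity first, then the position using
    the new velocity. damping and motor_gain are hand-set constants.
    """
    acceleration = (motor_gain * command - damping * state.velocity) / mass
    velocity = state.velocity + dt * acceleration
    position = state.position + dt * velocity
    return BlockState(position, velocity)


def rollout(initial: BlockState, commands: Iterable[float]) -> List[BlockState]:
    """Return the states visited, starting with the initial state."""
    states = [initial]
    for u in commands:
        states.append(conventional_step(states[-1], u))
    return states


# A constant command drives the velocity towards motor_gain / damping = 4.0 m/s.
trajectory = rollout(BlockState(0.0, 0.0), [1.0] * 2000)
```

If the real block's damping differed from 0.5, every rollout would drift from reality in the same systematic way, which is the kind of unmodelled mismatch behind the sim-to-real gap.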
To address these complications, the researchers built a hybrid simulator that combines neural networks with physics equations. In particular, they replace the parameters that are usually defined by hand in a simulator, namely contact parameters and motor parameters, with a learnable simulation parameter function, because unmodelled details of contact and motor dynamics are major causes of the sim-to-real gap. Unlike in a conventional simulator, where these parameters are constants, the learnt parameters are state-dependent: they can change according to the state of the robot.
The physics-equation component of the hybrid simulator ensures that the simulation obeys fundamental physical laws, such as conservation of energy, bringing it closer to the real world and narrowing the sim-to-real gap.
(Source: ResearchGate – SimGAN: Hybrid Simulator Identification for Domain Adaptation via Adversarial Reinforcement Learning)
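The hybrid idea can be sketched under the same kind of toy assumptions: a small neural network (a hypothetical `ParameterNet` with random, untrained weights) maps the robot's state to damping and motor-gain values, which are then plugged into an ordinary Newtonian update. The network only supplies parameters; the equation of motion itself stays fixed. The paper's actual network architecture, inputs and parameter set are not reproduced here.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class BlockState:
    position: float
    velocity: float


def softplus(x: float) -> float:
    # Numerically stable log(1 + exp(x)); the result is always positive.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


class ParameterNet:
    """Tiny MLP mapping a state to (damping, motor_gain).

    Weights are random here; in a hybrid simulator they would be learnt
    from real-robot trajectories.
    """

    def __init__(self, hidden: int = 8, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.w1 = [[rng.gauss(0.0, 0.5) for _ in range(2)] for _ in range(hidden)]
        self.w2 = [[rng.gauss(0.0, 0.5) for _ in range(hidden)] for _ in range(2)]

    def __call__(self, state: BlockState) -> Tuple[float, float]:
        x = (state.position, state.velocity)
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in self.w1]
        out = [sum(w * hi for w, hi in zip(row, h)) for row in self.w2]
        # softplus keeps both physical parameters strictly positive.
        return softplus(out[0]), softplus(out[1])


def hybrid_step(state: BlockState, command: float, net: ParameterNet,
                dt: float = 0.01, mass: float = 1.0) -> BlockState:
    """Newtonian update as in a conventional simulator, but with
    state-dependent parameters supplied by the network."""
    damping, motor_gain = net(state)
    acceleration = (motor_gain * command - damping * state.velocity) / mass
    velocity = state.velocity + dt * acceleration
    position = state.position + dt * velocity
    return BlockState(position, velocity)


net = ParameterNet()
states: List[BlockState] = [BlockState(0.0, 0.0)]
for _ in range(100):
    states.append(hybrid_step(states[-1], 1.0, net))
```

Because the parameters pass through softplus and are only ever used inside the physics update, the network can reshape the dynamics from state to state without producing, for example, a negative damping coefficient.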
The researchers then designed experiments to test whether their method could:
- Improve domain adaptation for robots with different morphologies
- Deal with dynamics mismatches that occur during sim-to-real transfer
- Handle dynamics discrepancies that do not map intuitively onto the list of parameters the model identifies, but that can be absorbed by the state-dependence of the learnt parameters
(Source: Google AI Blog)
Learning the simulation parameter functions means getting the hybrid simulator to generate trajectories similar to those collected on the real robot, which in turn requires a metric of trajectory similarity. GANs, originally created to generate synthetic images that share the distribution, or “style”, of a limited number of real images, can likewise be used to generate synthetic trajectories that are indistinguishable from real ones.
(Source: Google AI Blog)
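The learnt similarity metric can be illustrated with a deliberately tiny discriminator. This sketch swaps in plain logistic regression for a neural-network discriminator and classifies single transitions, described by velocity, motor command and observed acceleration, as real or simulated. The "real" and "simulated" data are synthetic, generated from toy dynamics that differ only in a hypothetical damping coefficient; none of this reflects the paper's data or architecture.

```python
import math
import random
from typing import List, Sequence, Tuple

Transition = Tuple[float, float, float]  # (velocity, command, observed acceleration)


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_transitions(damping: float, n: int, rng: random.Random,
                     motor_gain: float = 2.0) -> List[Transition]:
    """Synthetic transitions from toy dynamics: accel = motor_gain * u - damping * v."""
    data = []
    for _ in range(n):
        v, u = rng.uniform(0.0, 1.0), rng.uniform(0.0, 1.0)
        data.append((v, u, motor_gain * u - damping * v))
    return data


class Discriminator:
    """Logistic-regression stand-in for a GAN discriminator: D(x) = P(x is real)."""

    def __init__(self, dim: int = 3) -> None:
        self.w = [0.0] * dim
        self.b = 0.0

    def __call__(self, x: Sequence[float]) -> float:
        return sigmoid(sum(wi * xi for wi, xi in zip(self.w, x)) + self.b)

    def fit(self, real: List[Transition], fake: List[Transition],
            lr: float = 0.5, epochs: int = 300) -> None:
        # Full-batch gradient ascent on the log-likelihood (real = 1, fake = 0).
        samples = [(x, 1.0) for x in real] + [(x, 0.0) for x in fake]
        n = len(samples)
        for _ in range(epochs):
            grad_w = [0.0] * len(self.w)
            grad_b = 0.0
            for x, label in samples:
                err = label - self(x)
                for i, xi in enumerate(x):
                    grad_w[i] += err * xi
                grad_b += err
            self.w = [wi + lr * g / n for wi, g in zip(self.w, grad_w)]
            self.b += lr * grad_b / n


rng = random.Random(0)
real = make_transitions(damping=1.0, n=200, rng=rng)  # stand-in for real-robot data
sim = make_transitions(damping=0.2, n=200, rng=rng)   # stand-in for simulator output
disc = Discriminator()
disc.fit(real, sim)
mean_real = sum(disc(x) for x in real) / len(real)
mean_sim = sum(disc(x) for x in sim) / len(sim)
```

Here `mean_real` should come out higher than `mean_sim`: the discriminator learns which features separate the two dynamics without anyone specifying a distance or weighting by hand. In an adversarial setup, the simulator's parameter function would then be updated so that its transitions look real to this discriminator.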
The researchers therefore formulate learning the simulation parameter function as an RL problem. A neural network learns state-dependent contact and motor parameters from only a small number of real-world trajectories, and is trained so that the trajectories produced by the simulation deviate as little as possible from the real ones. This error must be minimised over an extended period of time rather than a single step, because a simulation that accurately predicts further into the future leads to a better control policy. RL suits this objective because it optimises reward accumulated over time.
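Why an accumulated objective matters can be shown with a few lines that turn discriminator outputs into per-step rewards and sum them over a rollout. The reward log D(x) is one common GAN-style choice and is an assumption here, as are the discount factor and the hypothetical discriminator outputs; the paper's exact reward and RL algorithm are not reproduced.

```python
import math
from typing import List


def gan_reward(d_prob: float, eps: float = 1e-8) -> float:
    """Per-step reward log D(x): higher when the discriminator judges x as real."""
    return math.log(max(d_prob, eps))


def discounted_return(rewards: List[float], gamma: float = 0.99) -> float:
    """Sum of gamma**t * r_t, accumulated backwards from the last step."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# Hypothetical discriminator outputs along two simulated rollouts:
# sim_a matches the first step best but drifts; sim_b stays plausible throughout.
d_sim_a = [0.9, 0.6, 0.3, 0.1, 0.05]
d_sim_b = [0.7, 0.7, 0.7, 0.7, 0.7]

rewards_a = [gan_reward(p) for p in d_sim_a]
rewards_b = [gan_reward(p) for p in d_sim_b]

one_step_prefers_a = rewards_a[0] > rewards_b[0]
return_prefers_b = discounted_return(rewards_b) > discounted_return(rewards_a)
```

Judged on the first step alone, `sim_a` looks better; judged on the discounted return, the consistently plausible `sim_b` wins, which is the long-horizon behaviour the RL formulation is meant to reward.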
One of the significant impediments preventing robots from harnessing the power of reinforcement learning is the sim-to-real gap. The researchers addressed this problem by developing a simulator that more accurately replicates real-world dynamics while requiring only a modest quantity of real-world data. They plan to extend this framework to other robot learning tasks, such as navigation and manipulation.