Research in reinforcement learning (RL) has accelerated markedly in recent years. The field is no longer restricted to the classic problem of robots being rewarded or punished for their actions and then correcting them.

Although the robot problem formed the basis for many self-learning applications, RL has since moved to a new level. It is used in virtual environments and in games, where it powers formidable virtual opponents for players. RL can now even emulate human movements and natural human behaviour.

We will discuss one such research study, DeepMimic, by academics at the University of California, Berkeley, who have used RL methods to simulate acrobatic movements precisely.

Computer Graphics For Accurate Movement Visualisation

Xue Bin Peng, an author of the DeepMimic paper, says that the inspiration for simulating acrobatic movement through RL came from computer graphics, which offers precise visualisation of real-world physics and the ability to model it. The possibilities that animation presents can aid studies that analyse human body movements for simulation.

The challenge, however, lies in building physics-based simulation models that carry over to other applications. Many studies have created such models, but they run into setbacks with optimisation or with the dynamics of motion. Recent developments in motion simulation focus on online models that are easier to represent. However, these fall short when it comes to richer dynamics and to incorporating more motions into the model.

This is what drove Peng and his team to build a single simulation model capable of incorporating a large number of motions, including acrobatics. The model also encapsulates RL policies efficiently. It means that RL and physics-based animation go hand in hand, which is a huge improvement.

DeepMimic - A Powerful Model For Motion Imitation

As mentioned earlier, building physics-based models that incorporate many movements and actions is quite challenging. Even when such models are created with all these considerations in mind, they may fail in practice to achieve significant results for ML. With DeepMimic, this is not the case. Along with capturing acrobatic movements, this RL model takes a data-driven approach to them, says Peng.

“An alternative is to take a data-driven approach, where reference motion capture of humans provides examples of natural motions. The character can then be trained to produce more natural behaviours by imitating the reference motions. Imitating motion data in simulation has a long history in computer animation and has seen some recent demonstrations with deep RL. While the results do appear more natural, they are still far from being able to faithfully reproduce a wide variety of motions.”

Video: https://www.youtube.com/watch?v=vppFvq2quQ0

In the context of standard RL, a policy is trained for each acrobatic movement through a motion imitation task. Each movement is represented as a sequence of ‘target poses’, one for each timestep. This makes it possible to reproduce complex acrobatic movements smoothly in the model.

Characters And Tasks

DeepMimic works with four characters:

 3D humanoid
 Atlas robot model
 T-Rex
 Dragon

These are modelled as articulated rigid bodies whose links are connected by joints with three degrees of freedom (DOF), except for the knees and elbows, which have one DOF. Physical characteristics such as mass and height are also specified. Together, these form the body structure that carries out the acrobatic movements. RL policies are trained for these characters.

Apart from the characters, several tasks are defined and assigned to these rigid bodies.
The tasks classified by DeepMimic’s researchers are given below.

 Target Heading
 Strike
 Throw
 Terrain traversal

Based on these task categories, a total of 30 skills are designed for the simulation and trained with RL. Some of the skills are also integrated to perform multi-skill actions; for example, running, jumping and flipping motions are combined into a single action.

Training For Simulation

After the components of the RL problem were finalised, namely the policy states, actions, rewards and the neural network that maps between them, the model was trained. The policy and value functions are set up first, and training then proceeds batch by batch over states drawn from the reference movements.

To imitate a desired acrobatic motion, the RL policy should capture every phase of the motion incrementally over time. This is done through the choice of initial state distribution: each episode starts from a state sampled from the reference motion, so the agent gets to practise every phase of the movement rather than only its beginning. For cyclic motions such as backflips and frontflips, another strategy called ‘early termination’ is used, which ends an episode as soon as the character fails, for example by falling. (Details of the RL formulation and the training strategies can be found in the DeepMimic paper.)

Conclusion

The results after training show that RL emulates the motions accurately. The integration of multiple movements also fares very well in visualisation. The physics side of DeepMimic is where the model makes its mark, opening up the possibility of emulating a wide variety of movements. As more and more unusual movements are captured, RL can vastly improve self-learning applications that rely on motion.
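
As a rough illustration of the two training strategies described under Training For Simulation, the Python sketch below shows how an episode might pick its starting point and when it might stop early. It is a toy under stated assumptions, not DeepMimic's implementation: no physics is simulated, falls are drawn at random, and every name in it (`sample_initial_phase`, `should_terminate_early`, `run_episode`, the link names and the `fall_probability` parameter) is hypothetical.

```python
import random


def sample_initial_phase(rng):
    """Pick a random starting phase in [0, 1) along the reference motion.

    Starting episodes at random phases lets the agent practise every part
    of the motion instead of always beginning at the first frame.
    """
    return rng.random()


def should_terminate_early(contact_links, allowed_links=("left_foot", "right_foot")):
    """Return True if any link other than the allowed ones touches the ground.

    The link names are hypothetical; a real character would define its own.
    """
    return any(link not in allowed_links for link in contact_links)


def run_episode(rng, num_frames=30, max_steps=100, fall_probability=0.05):
    """Run one toy episode and return (start_phase, steps_survived).

    No physics is simulated: a fall is a random event standing in for the
    character losing balance.
    """
    phase = sample_initial_phase(rng)
    frame = int(phase * num_frames)  # reference frame the episode starts from
    steps = 0
    while steps < max_steps:
        contacts = ["left_foot"]
        if rng.random() < fall_probability:
            contacts.append("torso")  # stand-in for the character falling over
        if should_terminate_early(contacts):
            break  # end the episode now rather than collecting useless data
        frame = (frame + 1) % num_frames  # advance along the (cyclic) reference
        steps += 1
    return phase, steps


if __name__ == "__main__":
    rng = random.Random(0)
    for episode in range(3):
        start_phase, survived = run_episode(rng)
        print(f"episode {episode}: start phase {start_phase:.2f}, steps {survived}")
```

In a real setup the contact list would come from the physics engine and the sampled phase would be used to set the character's pose and velocity from the reference motion. The sketch only shows where the two decisions sit in the episode loop.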