Despite the success of reinforcement learning (RL) algorithms, a few challenges remain pervasive. Rewards, which drive much of an RL system's behaviour, are tricky to design, and a better-designed reward generally leads to better outcomes.

In the context of reinforcement learning, a reward is the bridge that connects the motivations of the model with the task objective.

Reward design largely determines how robust an RL system is. There are few restrictions on how a reward function can be written, and developers are free to formulate their own. The challenge, however, is that a poorly chosen reward can leave the policy stuck in a local optimum.

Reward functions are peppered with clues that nudge the system, model or machine in a certain direction. The clues in this context are mathematical expressions written with efficient convergence in mind.

Automating Reward Design

Machine learning practitioners, especially those who work with reinforcement learning algorithms, face a common challenge: making the agent realise that one task is more rewarding than another. To do this, they use reward shaping.

During the course of learning, the reward is edited based on feedback generated on completion of tasks. This information is used to retrain the RL policy, and the process is repeated until the agent performs the desired actions.

The need to repeatedly retrain policies and observe them over long durations raises the question of whether reward design can be automated, and whether there can be a proxy reward that promotes learning while still meeting the task objective.

In an attempt to automate reward design, the robotics team at Google introduced AutoRL, a method that automates RL reward design by using evolutionary optimisation over a given objective (a simplified sketch of such a search is shown after the key findings below).

To measure its effectiveness, the team applied AutoRL's evolutionary reward search to four continuous control benchmarks from OpenAI Gym:

Ant
Walker2D
HumanoidStandup
Humanoid

These were run with two RL algorithms: off-policy Soft Actor-Critic (SAC) and on-policy Proximal Policy Optimisation (PPO).

To assess AutoRL's ability to reduce reward engineering while maintaining the quality of existing metrics, the team considered both task objectives and standard returns.

Task objectives measure task achievement for continuous control: distance travelled for Ant, Walker and Humanoid, and height achieved for Stand Up. Standard returns, on the other hand, are the metrics by which these tasks are normally evaluated.

Key Findings

The authors list the following findings in their paper:

Evolving rewards trains better policies than hand-tuned baselines, and on complex problems it outperforms hyperparameter-tuned baselines, showing a 489% gain over hyperparameter tuning on a single-task objective for SAC on the Humanoid task.
Optimisation over simpler single-task objectives produces results comparable to carefully hand-tuned standard returns, reducing the need for manual tuning of multi-objective tasks.
Under the same training budget, reward tuning produces higher-quality policies faster than tuning the learning hyperparameters.
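To make this concrete, below is a minimal sketch of evolutionary reward search in the spirit of AutoRL. It is a toy illustration only: the reward terms, weights, mutation scheme and the stand-in train_and_evaluate function are assumptions made here for clarity, not Google's implementation, which trains a full SAC or PPO agent for every candidate reward.

import random

def proxy_reward(weights, features):
    # Proxy reward the agent would actually be trained on: a weighted sum of
    # hand-picked terms, e.g. [forward velocity, -energy cost, alive bonus].
    return sum(w * f for w, f in zip(weights, features))

def train_and_evaluate(weights):
    # Stand-in for "train a policy with proxy_reward(weights, ...) and then
    # score it on the task objective, e.g. distance travelled".
    # This toy version simply assumes weights near (1.0, 0.1, 0.5) work best.
    target = (1.0, 0.1, 0.5)
    return -sum((w - t) ** 2 for w, t in zip(weights, target))

def mutate(weights, scale=0.1):
    # Perturb the reward parameters to create a child candidate.
    return tuple(w + random.gauss(0.0, scale) for w in weights)

# Evolutionary loop over reward parameters, not over the policy itself.
population = [tuple(random.uniform(0.0, 1.0) for _ in range(3)) for _ in range(8)]
for generation in range(20):
    ranked = sorted(population, key=train_and_evaluate, reverse=True)
    parents = ranked[:2]  # the best-scoring proxy rewards survive
    population = parents + [mutate(random.choice(parents)) for _ in range(6)]

best = max(population, key=train_and_evaluate)
print("best reward weights:", best)
print("proxy reward on an example feature vector:", proxy_reward(best, [2.0, 1.5, 1.0]))

In the real setting, each call to train_and_evaluate is an entire RL training run scored against the task objective, which is why the search is expensive and why the training-budget comparison in the findings above matters.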
The complexity of reward design has also led to the development of RL systems with alternative reward mechanisms. These systems assess sequential social dilemmas in multi-agent environments to monitor the influence of one agent's actions on another. Such approaches have shown promising results in scenarios where there is very little scope for designing a reward by hand.

The design of reinforcement learning systems receives special attention owing to its implications in the physical world. Unlike many other machine learning models, RL systems are of great use in domains such as robotics.

Be it a pick-and-place robot learning to set down fragile objects or a surgical tool that needs to make micro-cuts, the outcomes are usually critical in nature.
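As a toy illustration of why reward design is so delicate in such settings, here is what a hand-written reward for a pick-and-place task might look like. The terms, names and weights below are purely illustrative assumptions, not taken from any particular system.

def pick_and_place_reward(distance_to_target, energy_used,
                          object_dropped, object_broken):
    # Hand-picked terms and weights; small changes here can produce very
    # different learned behaviour, which is exactly the tuning burden that
    # automated approaches such as AutoRL try to remove.
    reward = -1.0 * distance_to_target   # move the object towards the goal
    reward += -0.01 * energy_used        # discourage wasteful, jerky motion
    if object_dropped:
        reward -= 1.0                    # dropping the object is bad...
    if object_broken:
        reward -= 100.0                  # ...breaking a fragile one is far worse
    return reward

print(pick_and_place_reward(0.2, 3.0, False, False))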