One of the most exciting areas in machine learning right now is reinforcement learning. It is applied across a diverse set of sectors, including data processing, robotics, manufacturing, recommender systems, energy, and games.
What makes reinforcement learning (RL) different from most other kinds of algorithms is that it does not depend on historical datasets; instead, it learns through trial and error, much as humans do.
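That trial-and-error loop can be made concrete with a minimal tabular Q-learning sketch. The corridor environment, reward and hyperparameters below are illustrative toys, not tied to any system discussed in this article:

```python
import random

# Toy 1-D corridor: states 0..4, start at 0, reward only at the goal state.
N_STATES, GOAL = 5, 4
ACTIONS = [-1, +1]  # move left / move right

def step(state, action):
    next_state = min(max(state + action, 0), GOAL)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL

# Q-table: one value per (state, action) pair, learned from scratch --
# no historical dataset, only the agent's own experience.
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, epsilon = 0.5, 0.9, 0.1

random.seed(0)
for episode in range(300):
    s, done = 0, False
    while not done:
        # Trial and error: mostly exploit current estimates, sometimes explore.
        if random.random() < epsilon:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        s2, r, done = step(s, a)
        # Update the estimate from the observed outcome alone.
        best_next = max(Q[(s2, b)] for b in ACTIONS)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s2

# After training, the greedy policy heads right from every non-goal state.
policy = {s: max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(N_STATES)}
print(policy)
```

The agent starts with no knowledge, acts, observes rewards, and incrementally improves its value estimates, which is the essence of what distinguishes RL from learning on a fixed dataset.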
Recognising its importance, researchers have accelerated the pace of understanding and improving RL over the last few years. Think of any big name in tech, be it Facebook, Google, DeepMind, Amazon or Microsoft: they are all investing significant time, money and effort in bringing out innovations in RL.
For robots to be useful to mankind, they need to perform a variety of tasks. But even training for one task using offline reinforcement learning takes a massive amount of time and computational expenditure.
To address this issue, Google came out with MT-Opt and Actionable Models. MT-Opt is a multi-task RL system that automates data collection, gathering episodes of various tasks on real robots, and performs multi-task RL training; Actionable Models builds on the collected data and demonstrates a successful application of multi-task RL. Together, they help robots learn new tasks more quickly.
A leader in the reinforcement learning space, DeepMind gave us some unique innovations this year. It released RGB-stacking as a benchmark for vision-based robotic manipulation. Here, DeepMind used reinforcement learning to train a robotic arm to balance and stack objects of different shapes.
The diversity of objects used and the number of empirical evaluations performed made this reinforcement learning-based project unique. The learning pipeline was divided into three stages: training in simulation with an off-the-shelf RL algorithm, training a new policy in simulation using only realistic observations, and lastly, collecting data with this policy on real robots and deriving an improved policy from that data.
The implementation of sequential decision processes is crucial for those working in reinforcement learning. To simplify such processes, social media giant Facebook (now Meta) came out with “SaLinA” just a month back. Built as an extension of PyTorch, it can handle both supervised and unsupervised settings and is compatible with multiple CPUs and GPUs. It is aimed at systems involving large-scale training use cases.
IBM, too, has been active in the reinforcement learning segment in 2021. It released a text-based gaming environment called TextWorld Commonsense (TWC) to work on the problem of infusing RL agents with commonsense knowledge. The method was used to train and evaluate RL agents with specific commonsense knowledge about objects, their attributes and affordances. It tackled the issue of sequential decision making by introducing several baseline RL agents.
New methodologies also emerged in the self-supervised learning area. Google released an approach called Reversibility-Aware RL, which adds a separate reversibility estimation component to the self-supervised RL procedure. Google said this method increases the performance of RL agents on several tasks, including the Sokoban puzzle game.
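The core intuition behind reversibility estimation can be illustrated with a small self-supervised sketch: train a classifier to guess which of two observations from the same trajectory came first; transitions whose order it predicts confidently are likely irreversible. Everything below (the toy environment, the features, the tiny logistic-regression classifier) is a hypothetical illustration of that idea, not Google's actual code:

```python
import math, random

random.seed(1)

# Toy environment (illustrative): a counter the agent moves up and down
# reversibly, plus a "break" flag that is irreversible -- once it flips
# to 1, it never returns to 0.
def rollout(length=30):
    counter, broken, traj = 0, 0, []
    for _ in range(length):
        traj.append((counter, broken))
        if broken == 0 and random.random() < 0.1:
            broken = 1          # irreversible event
        else:
            counter = max(0, counter + random.choice([-1, 1]))
    return traj

# Self-supervised task: given two observations from one trajectory,
# predict which came first.  Features: per-component differences.
def make_pairs(trajs, gap=5):
    data = []
    for tr in trajs:
        for t in range(len(tr) - gap):
            a, b = tr[t], tr[t + gap]
            if random.random() < 0.5:
                data.append(((b[0] - a[0], b[1] - a[1]), 1))  # true order
            else:
                data.append(((a[0] - b[0], a[1] - b[1]), 0))  # swapped
    return data

# Tiny logistic-regression temporal-order classifier, trained by SGD.
w = [0.0, 0.0]
def predict(x):
    z = w[0] * x[0] + w[1] * x[1]
    return 1.0 / (1.0 + math.exp(-z))

data = make_pairs([rollout() for _ in range(200)])
for _ in range(5):
    random.shuffle(data)
    for x, y in data:
        p = predict(x)
        for i in range(2):
            w[i] += 0.1 * (y - p) * x[i]

# Reversible change (counter moved): the classifier stays close to 0.5.
# Irreversible change (flag 0 -> 1): it is confident about the order --
# the signal a reversibility-aware agent can use to penalise such actions.
print(predict((1, 0)), predict((0, 1)))
```

Confident order predictions flag transitions the environment can never undo, which is the kind of signal an agent can use to avoid irreversible moves in a puzzle like Sokoban.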
Deep RL in Gaming
As reinforcement learning has a significant impact on games, in the middle of 2021 we saw DeepMind training agents to play games without human intervention using reinforcement learning mechanisms. Though previous DeepMind innovations like AlphaZero beat world-champion programs in chess, shogi and Go, they were trained separately on each game and could not learn a new one without repeating the RL procedure from the beginning.
Through this method, however, the agents were able to react to new conditions and adapt flexibly to new environments. The core of this research relied on how deep RL can play a role in training the agents' neural networks.
Google has been working on using RL in the gaming domain. In early 2021, it released “Evolving Reinforcement Learning Algorithms”, which showed how to learn analytically interpretable and generalisable RL algorithms by using a graph representation and applying optimisation techniques from the AutoML community.
It used Regularized Evolution to evolve a population of computational graphs over a set of simple training environments. This helped produce better RL algorithms for complex environments with visual observations, such as Atari games.
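Regularized Evolution itself (sometimes called aging evolution) is simple to sketch. In the minimal version below, the "genome" is just a bit-string and the fitness a toy bit-matching objective, stand-ins for the computational graphs and RL-training scores used in the actual work:

```python
import random
from collections import deque

random.seed(0)

GENOME_LEN, POP_SIZE, SAMPLE_SIZE, CYCLES = 20, 30, 5, 500
TARGET = [random.randint(0, 1) for _ in range(GENOME_LEN)]

def fitness(genome):
    # Toy objective: how many bits match a hidden target pattern.
    # In the real system, this would be an RL agent's training score.
    return sum(g == t for g, t in zip(genome, TARGET))

def mutate(genome):
    child = list(genome)
    child[random.randrange(GENOME_LEN)] ^= 1  # flip one bit
    return child

# The population is a queue: children enter on the right and the OLDEST
# individual (not the worst) is evicted -- this aging rule is the
# "regularization" in Regularized Evolution.
population = deque()
for _ in range(POP_SIZE):
    g = [random.randint(0, 1) for _ in range(GENOME_LEN)]
    population.append((g, fitness(g)))

for _ in range(CYCLES):
    # Tournament selection: sample a few individuals, pick the fittest
    # as the parent, and add a mutated child to the population.
    sample = random.sample(list(population), SAMPLE_SIZE)
    parent = max(sample, key=lambda ind: ind[1])[0]
    child = mutate(parent)
    population.append((child, fitness(child)))
    population.popleft()  # evict the oldest individual

best = max(population, key=lambda ind: ind[1])
print("best fitness:", best[1], "/", GENOME_LEN)
```

Evicting by age rather than by fitness keeps the population turning over, so no individual survives indefinitely on an early lucky score, which is what makes the method robust when fitness evaluations are noisy, as RL training runs are.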
Growing interest in RL
With so much happening in the RL space, interest in this area is bound to grow among students and the professional community. To cater to the growing demand, Microsoft organised the Reinforcement Learning (RL) Open Source Fest to introduce students to open source reinforcement learning programs and software development.
Researchers from DeepMind teamed up with University College London (UCL) to offer students a comprehensive introduction to modern reinforcement learning. The course intended to give students a detailed understanding of topics like Markov Decision Processes, sample-based learning algorithms, deep reinforcement learning, etc.
Reinforcement learning still has a long way to go, but there has been major progress in the last couple of years. Its usage can be a game-changer for certain industries. With more and more research coming out in RL, we can expect to see major breakthroughs in the near future.
Sreejani Bhattacharyya is a journalist with a postgraduate degree in economics. When not writing, she is found reading on geopolitics, economy and philosophy. She can be reached at email@example.com