Reinforcement learning, which trains agents through a continuous cycle of rewards and penalties, has come a long way since its early days. While the technique has taken time to mature and is not the simplest to apply, it is behind some of the most important advances in AI, from self-driving software in autonomous vehicles to AI racking up wins in games like poker. Reinforcement learning systems such as AlphaGo and AlphaZero learned to excel at Go largely by playing the game against themselves. Despite the challenges involved, reinforcement learning is the method that comes closest to human cognitive learning. Fortunately, beyond the competitive, cutting-edge gaming domain, a growing number of reinforcement learning frameworks are now publicly available.
DeepMind’s OpenSpiel
DeepMind is one of the most active contributors to open-source deep learning stacks. Back in 2019, Alphabet's DeepMind introduced OpenSpiel, a game-oriented reinforcement learning framework. The framework bundles a collection of environments and algorithms to support research in general reinforcement learning, especially in the context of games. OpenSpiel provides tools for search and planning in games, as well as for analysing learning dynamics and other common evaluation metrics.
The framework supports more than 20 single- and multi-agent game types, including collaborative games, zero-sum games, one-shot games and sequential games. These span strictly turn-taking games, auction games, matrix games and simultaneous-move games, as well as perfect information games (where players are fully informed of all the events that have previously occurred when making a decision) and imperfect information games (where decisions are made simultaneously or with hidden information).
The developers kept simplicity and minimalism as the main ethos while building OpenSpiel, which is why it favours reference implementations over fully optimised, high-performance code. The framework also has minimal dependencies and keeps its install footprint small, reducing the chance of compatibility issues. It is easy to install, understand and extend.

Source: Research Paper, Game implementations in OpenSpiel
Games in OpenSpiel are written in C++, while some custom RL environments are available in Python.
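The Python bindings are exposed through the pyspiel module. The following is a minimal sketch, assuming a standard pip or source install of OpenSpiel, of loading a bundled game and playing it out with random moves; tic-tac-toe is used purely as an illustrative choice of game.

```python
# Sketch: load an OpenSpiel game via the pyspiel Python bindings and
# play random legal actions until the game terminates.
import random
import pyspiel

game = pyspiel.load_game("tic_tac_toe")
state = game.new_initial_state()

while not state.is_terminal():
    # Pick uniformly at random from the legal actions in the current state.
    action = random.choice(state.legal_actions())
    state.apply_action(action)

# Final returns for each player (e.g. +1 / -1 / 0 in a zero-sum game).
print("Returns:", state.returns())
```

The same loop works for any of the bundled games because they all share the game/state interface, which is what makes it straightforward to write search or learning algorithms once and run them across games.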
OpenAI’s Gym
Source: OpenAI Gym, Sample of a Gym environment for a game
OpenAI created Gym as a toolkit for developing and comparing reinforcement learning algorithms. It is a Python library containing a large number of test environments that share a common interface, so users can write general-purpose agent algorithms and test them across environments without modification. Gym is organised in an agent-environment style: the user's agent performs actions in an environment, and after each action Gym returns an observation and a reward in response.
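A minimal sketch of that loop is shown below, using the classic CartPole task as an illustrative environment and a random agent standing in for a learned policy (note that older Gym releases return the four-tuple shown here from step()).

```python
# Sketch: the basic Gym agent-environment interaction loop.
import gym

env = gym.make("CartPole-v1")
observation = env.reset()

for _ in range(1000):
    action = env.action_space.sample()          # random agent, for illustration only
    observation, reward, done, info = env.step(action)  # environment reacts to the action
    if done:
        observation = env.reset()               # start a new episode when this one ends

env.close()
```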
The environments that Gym offers include algorithmic tasks, Atari games, classic control and toy text problems, and 2D and 3D robotics. Gym was created to fill a gap in the standardisation of environments used across publications: a small tweak in the problem definition, such as the reward or the action space, can significantly change the difficulty of a task. There was also a need for better benchmarks, as the pre-existing open-source RL frameworks were not diverse enough.
TensorFlow’s TF-Agents
TensorFlow’s TF-Agents was built as an open-source infrastructure paradigm for building parallel RL algorithms on TensorFlow. The framework provides components that correspond to the main parts of an RL problem, helping users design and implement algorithms easily.
Instead of making singular observations, the platform simulates multiple environments in parallel and performs the neural network computation on a batch. This removes the need for manual synchronisation and allows the TensorFlow engine to parallelise computation. Each environment within the framework runs in a separate Python process.
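A minimal sketch of that batching setup, assuming TF-Agents' Gym suite is available and using CartPole only as an example task, might look like this:

```python
# Sketch: wrap several Python environments so TF-Agents batches their observations.
from tf_agents.environments import suite_gym, parallel_py_environment, tf_py_environment

num_envs = 2  # each environment runs in its own Python process

# ParallelPyEnvironment takes callables that construct the environments.
parallel_env = parallel_py_environment.ParallelPyEnvironment(
    [lambda: suite_gym.load("CartPole-v0")] * num_envs)

# TFPyEnvironment exposes the batched environment to the TensorFlow graph.
tf_env = tf_py_environment.TFPyEnvironment(parallel_env)

time_step = tf_env.reset()
print(time_step.observation.shape)  # leading dimension is the batch of environments
```

The networks then see one batched observation per step, which is what lets TensorFlow parallelise the computation instead of stepping environments one at a time.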
Meta AI’s ReAgent
Source: MetaAI, ReAgent’s serving platform
Meta AI released ReAgent in 2019 as a toolkit for building models that can guide decision-making in real-life situations. The name combines the terms ‘reasoning’ and ‘agents,’ and the framework is currently used by Facebook to make millions of decisions each day.
ReAgent offers three major resources: models that make decisions based on feedback, an offline module to evaluate how models perform before they go into production, and a platform that deploys models at scale, collects feedback and iterates on the models quickly.
ReAgent was built on Horizon, the first open-source end-to-end RL platform designed to optimise systems at scale. While Horizon could only be applied to models that were still in development rather than to existing ones, ReAgent was created as a small C++ library that can be embedded into any application.
Uber AI’s Fiber
Source: Uber engineering, How Fiber works on a computer cluster
As machine learning tasks have multiplied, so has the demand for computation power. To help address this, Uber AI released Fiber, a Python-based library that works with computer clusters. Fiber was initially developed to power large-scale parallel computing projects within Uber itself.
Fiber is comparable to ipyparallel (IPython for parallel computing), Spark and the standard Python multiprocessing library. Research conducted by Uber AI showed that Fiber outperformed these alternatives when tasks were shorter. To run on different types of cluster management systems, Fiber is divided into three layers: the API layer, the backend layer and the cluster layer.
Fiber is also adept at error handling in pools. When a new pool is created, an associated task queue, result queue and pending table are created with it. Every fresh task is added to the task queue, which is shared between the master and worker processes. A worker process grabs a task from the queue and runs the functions within it. Each time a task is removed from the task queue, an entry is added to the pending table, so that if a worker fails mid-task, the pending entry allows the task to be put back on the queue and retried.
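Because Fiber mirrors the standard multiprocessing API, pool-based code looks much the same whether it runs locally or on a cluster. A minimal sketch, assuming Fiber is installed and using a hypothetical square() helper purely for illustration:

```python
# Sketch: Fiber's multiprocessing-style Pool; workers may be scheduled as
# containers on a cluster backend rather than as local processes.
from fiber import Pool

def square(x):
    # Hypothetical task function used only to demonstrate pool.map().
    return x * x

if __name__ == "__main__":
    pool = Pool(processes=4)            # creates the task queue, result queue and pending table
    results = pool.map(square, range(10))
    print(results)                      # [0, 1, 4, 9, ...]
```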