An agent interacts with its environment to generate the training data for most reinforcement learning (RL) and sequential decision-making algorithms; these interactions are what drive the system toward optimal performance. But the interactions themselves can be expensive to generate, for instance when collecting data with a real robot or interacting with a human expert, and the online approach then becomes inefficient.
To address this bottleneck, Google AI has built a new RL ecosystem designed to generate, share, and reuse datasets efficiently.
What is RLDS?
Reinforcement Learning Datasets, or RLDS, is a suite of tools for recording, replaying, manipulating, annotating, and sharing data for sequential decision-making, including offline RL, learning from demonstrations, and imitation learning. RLDS makes it easy to share datasets without loss of information: it preserves the sequence of interactions rather than randomising them, and stays agnostic to the underlying original format. This lets users quickly test new algorithms on a wider range of tasks. RLDS also provides tools for collecting data generated by synthetic agents (EnvLogger) or by humans (RLDS Creator), and for inspecting and manipulating the collected data. Finally, RLDS’ integration with TensorFlow Datasets (TFDS) makes it easy to share RL datasets with the research community for further exploration.
RLDS makes the data format explicit by defining the contents and the meaning of each of the fields of the generated dataset and provides methods that can help re-align and transform the data to fit the format required by any algorithm implementation. To define the data format, RLDS makes use of the inherently standard structure of RL datasets, i.e., sequences, also known as episodes, of interactions (steps) between agents and environments. Here, the agents can be rule-based/automation controllers, formal planners, humans, animals, or a combination of any of these.
Each step contains the current observation, the action applied given that observation, the reward obtained for applying the action, and the discount that accompanies the reward. Steps also include flags indicating whether the step is the first or last of the episode, and whether the observation corresponds to a terminal state. Each step and episode may additionally carry custom metadata, which can be used to store environment-related or model-related data.
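The nested episode/step structure can be sketched in plain Python. In practice RLDS stores this as a `tf.data.Dataset` of episodes, each containing a nested dataset of steps; the step field names below follow the RLDS standard, while the metadata keys are hypothetical examples.

```python
# Plain-Python sketch of an RLDS episode. Real RLDS data lives in
# tf.data.Dataset objects; this illustrates the fields, not the API.
episode = {
    "steps": [
        {
            "observation": [0.1, 0.2],  # what the agent saw
            "action": 1,                # action applied given that observation
            "reward": 0.0,              # reward obtained for the action
            "discount": 1.0,            # discount accompanying the reward
            "is_first": True,           # first step of the episode
            "is_last": False,
            "is_terminal": False,       # does the observation end the episode?
        },
        {
            "observation": [0.3, 0.4],
            "action": 0,
            "reward": 1.0,
            "discount": 0.0,
            "is_first": False,
            "is_last": True,            # last step of the episode
            "is_terminal": True,
        },
    ],
    # Optional episode-level custom metadata (hypothetical keys):
    "episode_id": 42,
    "agent_id": "human-demonstrator",
}

# Because the temporal order is preserved, per-episode quantities such as
# the undiscounted return are easy to compute:
episode_return = sum(step["reward"] for step in episode["steps"])
```

Keeping the steps in order, rather than flattening them immediately, is what lets the same recording serve both episode-consuming and step-consuming algorithms later.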
To keep the data useful, raw data is ideally stored in a lossless format: all the information produced is recorded and the temporal relation between data items is kept, without making any assumption about how the dataset will be used in the future. The RLDS ecosystem includes a web-based tool called RLDS Creator, which provides a universal interface to any human-controllable environment that can be accessed through a browser. Users can interact with different environments, such as playing Atari games online, to generate data for agents to learn from. The interactions are recorded and stored so that they can later be loaded back with RLDS for analysis or to train agents.
The RLDS ecosystem is further integrated with TensorFlow Datasets (TFDS), an existing library for sharing datasets within the machine learning community. Once a dataset becomes part of TFDS, it is indexed in the global TFDS catalogue, making it accessible to any researcher via tfds.load(name_of_dataset), which loads the data in TensorFlow or NumPy formats. With TFDS, users keep ownership and full control over their data, and every dataset includes a citation crediting the dataset authors.
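The consumption pattern is a single call followed by iteration over episodes. The stub below is a hypothetical stand-in for what `tfds.load` returns for an RLDS dataset (an iterable of episode records, each holding a nested "steps" sequence); `fake_rlds_load` and the dataset name are illustrative, not TFDS APIs.

```python
# Hypothetical stand-in for `tfds.load("<dataset_name>")` on an RLDS
# dataset: an iterable of episodes, each with a nested "steps" sequence.
def fake_rlds_load(name):
    """Return a tiny in-memory dataset shaped like an RLDS TFDS dataset."""
    return [
        {"steps": [{"observation": 0, "action": 1, "reward": 0.5},
                   {"observation": 1, "action": 0, "reward": 1.5}]},
        {"steps": [{"observation": 2, "action": 1, "reward": 1.0}]},
    ]

ds = fake_rlds_load("cartpole_demos")  # hypothetical dataset name

# Iterate episode by episode, exactly as one would over the loaded dataset:
returns = [sum(step["reward"] for step in ep["steps"]) for ep in ds]
```

With a real catalogued dataset, only the loading line changes; the episode/step iteration stays the same, which is what makes analysis code reusable across datasets.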
Consuming the generated data
Researchers can use these generated datasets to analyse, visualise, or train a variety of machine learning algorithms, and the data can be consumed in formats different from how it was stored. For example, algorithms like R2D2 or R2D3 consume full episodes, while others, such as Behavioral Cloning or ValueDice, consume batches of randomised steps. To enable this, RLDS provides a library of transformations tailored to RL scenarios. Users can implement high-level functionality on top of these optimised transformations, and the resulting pipelines are reusable across all RLDS datasets.
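The episode-versus-step distinction can be sketched in plain Python. RLDS itself ships optimised `tf.data` transformations; the helper below is a hypothetical illustration of flattening episode-structured data into shuffled step batches for step-consuming algorithms such as Behavioral Cloning.

```python
import random

def episodes_to_step_batches(episodes, batch_size, seed=0):
    """Flatten episode-structured data into shuffled fixed-size step batches.

    A plain-Python illustration of the kind of transformation RLDS provides
    as optimised tf.data pipelines; not the RLDS API itself.
    """
    # Flatten all episodes into a single list of steps...
    steps = [step for episode in episodes for step in episode["steps"]]
    # ...randomise their order (seeded here for reproducibility)...
    random.Random(seed).shuffle(steps)
    # ...and slice into fixed-size batches.
    return [steps[i:i + batch_size] for i in range(0, len(steps), batch_size)]

episodes = [
    {"steps": [{"obs": 0, "act": 1}, {"obs": 1, "act": 0}]},
    {"steps": [{"obs": 2, "act": 1}]},
]
batches = episodes_to_step_batches(episodes, batch_size=2)
# Episode-consuming algorithms (e.g. R2D2) would instead iterate over
# `episodes` directly, keeping each sequence intact.
```

Because the stored data keeps full episodes, both consumption styles are derivable from the same dataset; the reverse (recovering episodes from pre-shuffled steps) would not be possible.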