DeepMind Wants To Change How Reinforcement Learning Agents ‘Collect & Infer’

The main idea of the ‘Collect & Infer’ paradigm is to rethink data-efficient reinforcement learning by clearly separating data collection and exploitation into two distinct but connected processes.

Reinforcement learning (RL) is one of the main machine learning paradigms, alongside supervised and unsupervised learning and the less common self-supervised and semi-supervised learning. RL focuses on a controlled learning process in which an agent is given a set of possible actions and learns, by trial and error, which of them lead to the best outcomes (rewards).

From a data-efficiency perspective, several methods have been proposed, including online learning and replay buffers that store experience in a transition memory. In recent years, off-policy actor-critic algorithms have gained prominence, and offline RL methods can learn from limited, fixed datasets entirely without further interaction with the environment.
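Storing experience in a transition memory (a replay buffer) is the simplest of these ideas: each transition is stored once and can be reused in many updates. The sketch below assumes a fixed capacity with oldest-first eviction and uniform sampling; the class and field names are illustrative, not taken from the paper.

```python
import random
from collections import deque
from typing import NamedTuple


class Transition(NamedTuple):
    state: int
    action: int
    reward: float
    next_state: int
    done: bool


class ReplayBuffer:
    """Fixed-size transition memory; the oldest transitions are evicted first."""

    def __init__(self, capacity: int, seed: int = 0):
        self.memory = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def add(self, transition: Transition) -> None:
        self.memory.append(transition)

    def sample(self, batch_size: int) -> list[Transition]:
        # Uniform sampling without replacement from the stored transitions.
        return self.rng.sample(list(self.memory), batch_size)

    def __len__(self) -> int:
        return len(self.memory)


buffer = ReplayBuffer(capacity=3)
for t in range(5):
    buffer.add(Transition(state=t, action=0, reward=1.0, next_state=t + 1, done=False))

print(len(buffer))                         # 3: only the newest transitions are kept
print([tr.state for tr in buffer.memory])  # [2, 3, 4]
batch = buffer.sample(2)                   # a training batch drawn from memory
```

Every sampled batch reuses stored experience, so each environment interaction can contribute to many learning updates.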


Despite advances in the field, several challenges still hold reinforcement learning back, chiefly how to use the data already collected, how to grow the dataset, and how to build the most effective datasets. Data efficiency is critical in many real-world scenarios where gathering data is the main bottleneck, especially in robotics.

Extrapolating from these developments, DeepMind researchers have proposed splitting the reinforcement learning process into two distinct sub-processes, data collection and inference of knowledge, which they say improves data efficiency and enhances the capabilities of the next generation of RL agents. The researchers call this the ‘Collect and Infer (C&I)’ paradigm.

Introducing Collect and Infer 

In a paper, ‘Collect & Infer – a fresh look at data-efficient Reinforcement Learning,’ the DeepMind researchers explain how this works and give a lightweight overview of the core concepts and implications of the C&I paradigm. 

The C&I method assumes two sub-processes, acting (data collection) and learning (inference), that are decoupled but connected through a transition memory: all data resulting from environment interaction is stored there and later drawn on for learning. Viewing RL as two separate processes offers additional flexibility in algorithm design, and the researchers emphasise that these processes can and should be optimised independently.
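To make the decoupling concrete, here is a minimal sketch in which a collector thread writes transitions into a shared, lock-protected memory while a learner thread only ever reads from it. The environment (an unbounded random walk), the reward (1 for stepping right), the thread structure and the placeholder "learning" step are all hypothetical choices made for brevity; the paper does not prescribe this implementation.

```python
import random
import threading
import time


class TransitionMemory:
    """Shared store: the only channel between collection and inference."""

    def __init__(self):
        self._data = []
        self._lock = threading.Lock()

    def add(self, transition):
        with self._lock:
            self._data.append(transition)

    def snapshot(self):
        # Copy under the lock so the learner always reads a consistent batch.
        with self._lock:
            return list(self._data)


def collect(memory, num_steps, seed=0):
    """Process 1: act in a toy random-walk environment and store every transition."""
    rng = random.Random(seed)
    state = 0
    for _ in range(num_steps):
        action = rng.choice((-1, 1))  # behaviour policy: uniform random
        next_state = state + action
        reward = 1.0 if action == 1 else 0.0
        memory.add((state, action, reward, next_state))
        state = next_state


def infer(memory, results, stop_event):
    """Process 2: learn only from what is in memory; never touches the environment."""
    while True:
        finished = stop_event.is_set()  # read before the snapshot so the last pass sees all data
        data = memory.snapshot()
        rewards_right = [r for (_, a, r, _) in data if a == 1]
        if rewards_right:
            # Placeholder "learning": estimate the mean reward of action +1.
            results["mean_reward_right"] = sum(rewards_right) / len(rewards_right)
        results["transitions_seen"] = len(data)
        if finished:
            break
        time.sleep(0.001)


memory, results, stop = TransitionMemory(), {}, threading.Event()
learner = threading.Thread(target=infer, args=(memory, results, stop))
collector = threading.Thread(target=collect, args=(memory, 1000))
learner.start()
collector.start()  # here collection runs alongside inference
collector.join()
stop.set()
learner.join()
print(results)
```

Because the learner touches only `memory`, either side can be swapped (a different exploration scheme, a different offline learner) without changing the other, which is the flexibility the researchers emphasise.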

The image below shows the ‘collect and infer’ agent. The top part depicts experience collection and the lower part depicts inference; the two parts share a policy pool and the transition memory.

Collect and Infer agent (Source: arXiv)

Here’s how it works

DeepMind researchers said that the key idea of the C&I paradigm is to separate reinforcement learning into two processes and to optimise each of them independently.

  • Process 1: Collects data into a transition memory by interacting with the environment
  • Process 2: Infers knowledge about the environment by learning from the data in the transition memory

Further, the team set one objective for optimising each process:

  • Optimal inference: given a fixed data batch, what is the correct learning setup to get to the maximally performing policy? 
  • Optimal collection: given an inference process, what is the minimal set of data required to get to a maximally performing policy? 
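One informal way to write the two objectives, in notation chosen here for illustration rather than taken from the paper: D is a dataset held in the transition memory, I is an inference procedure that maps a dataset to a policy, and J(π) is the expected return of a policy π.

```latex
\begin{aligned}
\text{Optimal inference:}\quad & \mathcal{I}^{\star} = \arg\max_{\mathcal{I}} \; J\big(\mathcal{I}(\mathcal{D})\big) \quad \text{for a fixed dataset } \mathcal{D},\\
\text{Optimal collection:}\quad & \min_{\mathcal{D}} \; |\mathcal{D}| \quad \text{s.t.} \quad J\big(\mathcal{I}(\mathcal{D})\big) = \max_{\pi} J(\pi) \quad \text{for a given } \mathcal{I}.
\end{aligned}
```

Here |D| counts stored transitions. The first line holds the data fixed and optimises the learner; the second holds the learner fixed and optimises the data, which is exactly the split between the two processes.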

The researchers also spelled out what these objectives imply for algorithm design:

  • Learning is done offline (in a batch setting), assuming fixed data, as suggested by ‘optimal inference.’ The data may have been collected by a behaviour policy different from the one that is the learning target. This allows the same data to be used to optimise for multiple objectives simultaneously, and it aligns with current interest in offline reinforcement learning.
  • Data collection is a process that needs to be optimised in its own right. Naive exploration policies that apply simple random perturbations to a task policy (such as epsilon-greedy) are likely to be insufficient. The behaviour that is optimal for data collection, in the sense of ‘optimal collection,’ may be quite different from the optimal behaviour for the task of interest.
  • Treating data collection as a separate process provides novel ways to integrate known methods, such as skills, innovative exploration schemes, or model-based approaches, into the learning process without biasing the final task solution.
  • Data collection may happen concurrently with inference or be conducted separately.
  • ‘Collect and Infer’ suggests a different focus for evaluation than the usual regret-based frameworks for exploration. C&I does not aim to optimise task performance during collection; instead, it distinguishes between a learning phase and a deployment phase.
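The first bullet can be illustrated with a small example of offline, off-policy learning: a batch is collected once by a uniform-random behaviour policy that ignores every task, then frozen, and two different tasks are learned from it. The stored transitions carry no reward; each task's reward is computed from the stored next state at learning time. The five-state chain, the two tasks, and the use of tabular Q-learning swept repeatedly over the batch are assumptions for illustration, not the learner used in the paper.

```python
import random

N_STATES, GAMMA = 5, 0.8
LEFT, RIGHT = 0, 1


def step(state, action):
    """Hypothetical chain with walls at both ends and no termination."""
    return max(state - 1, 0) if action == LEFT else min(state + 1, N_STATES - 1)


def collect_batch(num_episodes=50, horizon=20, seed=0):
    """Behaviour policy: uniform random; it stores no task reward at all."""
    rng = random.Random(seed)
    batch = []
    for _ in range(num_episodes):
        state = N_STATES // 2
        for _ in range(horizon):
            action = rng.choice((LEFT, RIGHT))
            next_state = step(state, action)
            batch.append((state, action, next_state))
            state = next_state
    return batch


# Each task is just a reward function applied to stored transitions at learning time.
TASKS = {
    "reach_left": lambda s2: 1.0 if s2 == 0 else 0.0,
    "reach_right": lambda s2: 1.0 if s2 == N_STATES - 1 else 0.0,
}


def offline_q_learning(batch, reward_fn, sweeps=200, lr=0.5):
    """Tabular Q-learning repeated over a fixed batch; no new environment interaction."""
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(sweeps):
        for s, a, s2 in batch:
            target = reward_fn(s2) + GAMMA * max(q[s2])
            q[s][a] += lr * (target - q[s][a])
    return q


batch = collect_batch()  # collected once, then frozen
policies = {}
for name, reward_fn in TASKS.items():
    q = offline_q_learning(batch, reward_fn)
    # Greedy target policy: the best learned action in every state.
    policies[name] = [max((LEFT, RIGHT), key=lambda a: q[s][a]) for s in range(N_STATES)]

print(policies)
```

Neither learned policy was ever executed during collection, which mirrors the separation between a learning phase and a deployment phase in the last bullet.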


Regarding implications, C&I suggests alternative solutions to several problems that will become prominent as reinforcement learning is applied to more challenging tasks, including multi-task, transfer, or life-long learning. 

Further, the team discussed the use of C&I in robotics: how existing algorithms can be interpreted from the C&I perspective, and where that perspective suggests changes or improvements. In one example, SAC-X, which follows basic C&I principles, learns to solve complex tasks such as putting two items in a box after opening the lid. The example highlighted the flexibility of the C&I paradigm. It suggested an interpolation between purely offline and more conventional online learning, and fits naturally with the growing interest in data-driven approaches, where large datasets of experience are built up over time and enable rapid learning of new behaviours with only small amounts of online experience.

DeepMind researchers said that decoupling acting and learning, along with emphasising off-policy learning, gives greater flexibility when designing exploration or other actively optimised data-collection strategies. This includes schemes for unsupervised reinforcement learning and unsupervised skill discovery. Using data as a vehicle for knowledge transfer also enables new algorithms for multi-task and transfer scenarios.
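Optimising collection in its own right can be as simple as replacing random perturbations with a novelty bonus. The sketch compares state coverage on a hypothetical 30-state chain for two collectors with the same step budget: epsilon-greedy around a fixed task policy that always moves left, and a count-based collector that moves toward the less-visited neighbouring state. Count-based novelty is one well-known exploration heuristic, used here only as an example; it is not the scheme proposed in the paper, and it assumes the collector can predict the next state.

```python
import random

N_STATES, BUDGET = 30, 200
LEFT, RIGHT = 0, 1


def step(state, action):
    """Hypothetical deterministic chain with walls at both ends."""
    return max(state - 1, 0) if action == LEFT else min(state + 1, N_STATES - 1)


def epsilon_greedy_collector(epsilon=0.1, seed=0):
    """Naive collection: random perturbations of a fixed task policy ('always move left')."""
    rng = random.Random(seed)
    state, visited = 0, {0}
    for _ in range(BUDGET):
        action = rng.choice((LEFT, RIGHT)) if rng.random() < epsilon else LEFT
        state = step(state, action)
        visited.add(state)
    return len(visited)


def count_based_collector(seed=0):
    """Collection optimised for coverage: move toward the less-visited neighbour."""
    rng = random.Random(seed)
    counts = [0] * N_STATES
    state = 0
    counts[state] += 1
    for _ in range(BUDGET):
        # Visit count of the state each action would lead to (known, deterministic chain).
        novelty = {a: counts[step(state, a)] for a in (LEFT, RIGHT)}
        least = min(novelty.values())
        action = rng.choice([a for a, c in novelty.items() if c == least])  # random tie-break
        state = step(state, action)
        counts[state] += 1
    return sum(c > 0 for c in counts)


print("epsilon-greedy coverage:", epsilon_greedy_collector(), "of", N_STATES)
print("count-based coverage:", count_based_collector(), "of", N_STATES)
```

The count-based collector is not trying to solve any task; in the C&I view, its data would simply be handed to an offline learner afterwards.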

Wrapping up 

According to DeepMind, the main idea of the ‘Collect & Infer’ paradigm is to rethink data-efficient reinforcement learning by clearly separating data collection and exploitation into two distinct but connected processes. It also aims to exploit the flexibility of off-policy reinforcement learning in agent design for problems as diverse as online RL, offline RL, and life-long learning.

The team believes that C&I will become a go-to option for data-efficient learning agents that treat data as a resource, transformed into different types of representations that are either used for action selection (policies) or may facilitate future learning problems (models, skills, or perceptual representations).

Amit Raja Naik
Amit Raja Naik is a seasoned technology journalist who covers everything from data science to machine learning and artificial intelligence for Analytics India Magazine, where he examines the trends, challenges, ideas, and transformations across the industry.
