A team of researchers from IIT Bombay and Microsoft Research has used reinforcement learning to find the right lockdown policies during an epidemic. The researchers have published their work in a paper titled ‘Optimizing Lockdown Policies for Epidemic Control using Reinforcement Learning’.
Reinforcement learning is a branch of AI that deals with teaching machines to teach themselves. Just as a person learns to ride a bicycle by falling, a reinforcement learning algorithm contains agents that learn from rewards. These rewards are mathematical expressions designed to make the RL agent accomplish the desired task in as few steps as possible.
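The reward-driven learning described above can be sketched with a deliberately tiny example, unrelated to the paper's setup: an agent with two possible actions learns, purely from rewards, which one is desirable.

```python
import random

# Toy illustration (not the paper's setup): a single-state problem where
# action 1 yields reward 1 and action 0 yields reward 0. The agent's value
# estimates (q) are nudged toward the rewards it actually observes.
q = [0.0, 0.0]   # estimated value of each action
alpha = 0.1      # learning rate
epsilon = 0.2    # exploration probability

random.seed(0)
for step in range(500):
    # epsilon-greedy: mostly exploit the best-known action, sometimes explore
    if random.random() < epsilon:
        action = random.randrange(2)
    else:
        action = max(range(2), key=lambda a: q[a])
    reward = 1.0 if action == 1 else 0.0
    # incremental update toward the observed reward
    q[action] += alpha * (reward - q[action])

best = max(range(2), key=lambda a: q[a])
print(best)  # the agent settles on the rewarded action
```

After enough trials the value estimate for the rewarded action dominates, so the greedy choice becomes the desirable one without the rule ever being programmed in explicitly.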
Overview Of The Algorithm
To find optimal lockdown policies, the authors used a Deep Q-Network (DQN), a reinforcement learning (RL) algorithm that runs a large number of simulations of the spread of the disease while searching for the optimal lockdown policy. The goal, the authors wrote, is to quantify the cost of each outcome of the simulation.
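The simulate-and-learn loop this describes can be sketched roughly as follows. Everything here is an illustrative assumption, not the paper's model: a one-node toy "epidemic" stands in for the network simulator, and a linear value function stands in for the deep Q-network.

```python
import random

# Rough DQN-style loop: repeatedly simulate an outbreak, act
# epsilon-greedily, and update value estimates from the incurred costs.
# All dynamics and numbers below are illustrative assumptions.
ACTIONS = [0, 1]  # 0 = stay open, 1 = lockdown

def simulate_step(frac_infected, action):
    """Toy dynamics: lockdown shrinks infections but adds an economic cost."""
    growth = 0.8 if action == 1 else 1.2
    frac_infected = min(1.0, frac_infected * growth)
    econ_cost = 0.2 if action == 1 else 0.0
    return frac_infected, -(frac_infected + econ_cost)  # reward = negative cost

# Q(s, a) approximated as w[a] * s + b[a], a stand-in for the neural network
w = {a: 0.0 for a in ACTIONS}
b = {a: 0.0 for a in ACTIONS}
alpha, gamma, epsilon = 0.05, 0.95, 0.1

random.seed(1)
for episode in range(300):   # many simulated outbreaks
    s = 0.05                 # initial fraction infected
    for t in range(40):
        if random.random() < epsilon:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda x: w[x] * s + b[x])
        s_next, r = simulate_step(s, a)
        # one-step Q-learning update toward reward + discounted best next value
        target = r + gamma * max(w[x] * s_next + b[x] for x in ACTIONS)
        td = target - (w[a] * s + b[a])
        w[a] += alpha * td * s
        b[a] += alpha * td
        s = s_next
```

The paper's version replaces the toy simulator with a network epidemic model, the linear Q with a neural network, and the per-step cost with the authors' own cost definitions.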
The following features were considered to train the RL algorithm:
- Population of own node,
- Fraction infected (symptomatic) in own node,
- Fraction infected (symptomatic) in overall population,
- Fraction of population recovered in own node,
- Fraction dead in own node,
- Potential infectors from rest of population,
- Fraction increase in the symptomatic population in own node over the last few days, and the corresponding overall increase.
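The features listed above can be assembled into a per-node state vector along the following lines. The field names and numbers are hypothetical; the paper does not specify this exact data layout.

```python
# Hypothetical illustration of building the RL agent's observation for one
# node of the network from the listed features. All names are assumptions.
def node_state(node, total_population, total_symptomatic, external_contacts,
               recent_node_increase, recent_overall_increase):
    """Return the feature vector observed for a single node."""
    return [
        node["population"],                        # population of own node
        node["symptomatic"] / node["population"],  # fraction infected in own node
        total_symptomatic / total_population,      # fraction infected overall
        node["recovered"] / node["population"],    # fraction recovered in own node
        node["dead"] / node["population"],         # fraction dead in own node
        external_contacts,                         # potential infectors from rest of population
        recent_node_increase,                      # recent rise in own node
        recent_overall_increase,                   # recent rise overall
    ]

example = node_state(
    {"population": 1000, "symptomatic": 50, "recovered": 100, "dead": 5},
    total_population=10_000, total_symptomatic=300,
    external_contacts=12, recent_node_increase=0.02, recent_overall_increase=0.01,
)
print(example)
```

Normalizing counts to fractions, as the feature list suggests, keeps the inputs on comparable scales regardless of node size.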
The collected memory is discarded after every simulation episode. The terminal reward R is computed as soon as the number of active infections in the whole network drops to zero.
Training was carried out after each simulation for five epochs using the stochastic gradient descent (SGD) optimizer in Keras, with a learning rate of 0.001 and a momentum of 0.8.
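For intuition on what those hyperparameters do, here is the update rule behind SGD with momentum applied to a toy quadratic loss, using the stated values (learning rate 0.001, momentum 0.8). The loss is an illustrative stand-in for the network's training loss, not anything from the paper.

```python
# Sketch of SGD with momentum using the reported hyperparameters.
# The quadratic loss L(w) = (w - 3)^2 is an illustrative stand-in.
lr, momentum = 0.001, 0.8

def grad(w):
    """Gradient of the toy loss L(w) = (w - 3)^2."""
    return 2.0 * (w - 3.0)

w, velocity = 0.0, 0.0
for _ in range(5000):  # many SGD steps
    # momentum accumulates a decaying average of past gradient steps
    velocity = momentum * velocity - lr * grad(w)
    w += velocity

print(round(w, 3))  # approaches the minimum at w = 3
```

The momentum term lets the optimizer keep moving in a consistent gradient direction, which effectively amplifies the small learning rate of 0.001 by roughly 1/(1 − 0.8) on smooth losses.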
The algorithm then proposes policies as a function of parameters such as the infectiousness of the disease, its incubation period, the duration of symptoms, and the probability of it being fatal. Characteristics of the population, such as density, were also considered.
Since lockdowns are imperfect in both nature and execution, the authors tried to capture realistic scenarios to offer the best policy possible. The results of the experiments show that the policy obtained using reinforcement learning is a viable quantitative approach to lockdowns.
The above illustration compares infection rates across several policies. The reinforcement learning approach has the smallest peak among the policies with a smooth evolution; the policies with 5% and 10% lockdowns have lower peaks, but the plot takes a turn if those lockdowns are ended prematurely, indicating a second wave of infections.
The objective of this work can be summarized as follows:
- To compute the optimal lockdown/release policy using reinforcement learning for each node in a network, given disease characteristics and network properties.
- The approach can be used without any knowledge of optimization.
- The recommended policies are realistic and consider both health and economic costs, with the weight on each factor specified by the user’s preferences.
- The same algorithm can be used to compute the policy for any changes in network data, disease parameters, and cost definitions. Only the relevant input values need to be updated by the user.
- This approach also supports explainability in ML models, as users can trace decisions back to their underlying reasons, in a way alleviating the black-box nature of ML algorithms.
Know more about this work here.