The Societal Implications of Deep Reinforcement Learning

Machine learning (ML) can handle far more complex tasks than outputting single decisions based on a labelled training dataset. Reinforcement learning (RL), a subset of ML, trains an agent to learn through interaction with its environment, using trial and error to alter its behaviour based on feedback. While a simple RL model can make decisions by looking up values in a large table, modern RL applications are far too complex for the tabular approach to suffice. Deep learning (DL), another subset of ML, comes in handy here: it uses large matrices of numerical values to produce an output through the repeated application of mathematical operations.
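
To make the tabular approach concrete, here is a minimal sketch of Q-learning, one common tabular RL method, on a made-up five-cell corridor. The environment, the function names (`step`, `train`, `greedy_policy`) and the hyperparameters are assumptions chosen for illustration, not something taken from the study discussed in this article.

```python
import random

# Hypothetical toy environment: a corridor of 5 cells. The agent starts in
# cell 0 and receives a reward of 1.0 only when it reaches cell 4.
N_STATES = 5
ACTIONS = (-1, +1)  # move left, move right


def step(state, action):
    """Return (next_state, reward, done) for the toy corridor."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning; returns the learned table of action values."""
    rng = random.Random(seed)
    # The "large table": one value per (state, action) pair.
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Trial and error: explore sometimes, otherwise use the table
            # (ties between equal values are broken at random).
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))
            else:
                best = max(q[state])
                a = rng.choice([i for i, v in enumerate(q[state]) if v == best])
            next_state, reward, done = step(state, ACTIONS[a])
            # Feedback alters behaviour: nudge the stored estimate toward
            # the reward plus the discounted value of the next state.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


def greedy_policy(q):
    """Pick the highest-valued action in every state."""
    return [ACTIONS[max(range(len(ACTIONS)), key=lambda i: row[i])] for row in q]
```

Calling `greedy_policy(train())` gives the action the agent prefers in each cell. The table has only ten entries here; an environment with images or continuous sensor readings as states would need far more entries than could ever be stored or visited, which is the gap a neural network is meant to fill.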

A deep reinforcement learning (DRL) model combines RL and DL: RL helps the network learn which actions to take based on its inputs and rewards, and DL scales the approach to more complex environments. A DRL system works best when it has a well-defined environment, a clear reward function, plenty of data, and lots of computing power. Most DRL applications do not meet all of these conditions, which limits the widespread deployment of DRL.


Despite these limitations, DRL models enjoy a good track record. At the same time, they are raising a few concerns. According to a study published in the Journal of AI Research, these concerns will grow as DRL makes strides.

Human Oversight

According to the Ethics Guidelines for Trustworthy AI, published by the European Commission, AI systems must allow human oversight. DRL’s complex applications aim to bring more autonomy to machines and could potentially be used in systems that make hundreds of decisions in a short period. This poses a challenge, since it is unclear what human oversight could look like in such a context.

DRL systems that can learn continually after deployment pose another challenge. These systems will learn at a pace that is difficult for humans to keep up with. The question of how humans can maintain oversight of continually learning systems has not been addressed in existing ethics and governance proposals.

Humans should at least be able to impose constraints when the system is being designed or review decisions after they have been made.
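
One way to picture both options is a thin wrapper around a learned policy that applies a human-written rule before any action is executed and keeps a log for later review. The sketch below is only an illustration under stated assumptions: the class name `OverseenPolicy`, the numeric state and action, and the fallback action are all hypothetical and do not come from the guidelines or the study.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class OverseenPolicy:
    """Wraps a learned policy with a design-time constraint and an audit log."""
    policy: Callable[[float], float]            # learned policy: state -> proposed action
    is_allowed: Callable[[float, float], bool]  # human-written constraint on (state, action)
    fallback: float = 0.0                       # action used when the constraint is violated
    log: List[Tuple[float, float, float]] = field(default_factory=list)

    def act(self, state: float) -> float:
        proposed = self.policy(state)
        # Constraint imposed at design time: disallowed actions never execute.
        final = proposed if self.is_allowed(state, proposed) else self.fallback
        # Every decision is recorded so that humans can review it afterwards.
        self.log.append((state, proposed, final))
        return final

    def overridden(self) -> List[Tuple[float, float, float]]:
        """Return the logged decisions where the constraint replaced the action."""
        return [entry for entry in self.log if entry[1] != entry[2]]


# Hypothetical usage: a stand-in "policy" that doubles its input, and a rule
# that forbids actions with magnitude above 1.0.
agent = OverseenPolicy(policy=lambda s: 2.0 * s,
                       is_allowed=lambda s, a: abs(a) <= 1.0)
first = agent.act(0.25)   # proposed 0.5 is allowed and executed
second = agent.act(0.75)  # proposed 1.5 is blocked; the fallback 0.0 is used
```

A wrapper like this only covers constraints that can be written down in advance, so it does not resolve the harder question of overseeing a system that keeps learning after deployment.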

Safety and Reliability

RL is the core of DRL and is trained through trial and error, which means an application learns from its mistakes. In some scenarios, mistakes are unaffordable. For instance, a cobot (collaborative robot) working alongside humans that moves its arm in the wrong direction could have fatal repercussions.

Engineers need to develop new methods, such as training algorithms in simulation, to make models robust enough to avoid harm. However, this will not be sufficient for DRL systems that continue to learn from data after deployment. Continual monitoring of such systems, along with continuous testing, evaluation, and validation, is critical. Explainable AI models that are inherently easier to understand could help in these situations.

Harms From Reward Function Design

RL is based on rewards and can pose a threat when a poorly specified reward function in a DRL system results in unintended behaviour. For example, an RL agent was trained to play a boat race game. The intention was for the boat to finish the race, but the agent ended up knocking into its opponents to gain points and never actually finished.
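
The mechanism behind this kind of failure can be shown with a few lines of arithmetic. The sketch below is not the boat game itself: the two plans, the point values and the function name `rollout` are invented for illustration, on the assumption that the game pays a one-off bonus for finishing and a smaller, repeatable bonus for some other action.

```python
def rollout(plan, horizon=20, track_length=5, finish_bonus=10.0, hit_points=3.0):
    """Simulate one episode and return (score, finished).

    All numbers are made up: finishing pays a one-off bonus, while the
    "loop" plan collects a small repeatable bonus on every step.
    """
    position, score = 0, 0.0
    for _ in range(horizon):
        if plan == "race":
            position += 1
            if position >= track_length:
                # Crossing the line ends the episode with the finish bonus.
                return score + finish_bonus, True
        elif plan == "loop":
            # Repeatable points, no progress along the track.
            score += hit_points
        else:
            raise ValueError(f"unknown plan: {plan}")
    return score, False


# An agent that maximises the score prefers the plan the designer did not want.
results = {plan: rollout(plan) for plan in ("race", "loop")}
preferred = max(results, key=lambda plan: results[plan][0])
```

With these numbers, racing scores 10 points and finishes, while looping scores 60 points and never finishes, so `preferred` is `"loop"`. The agent is doing exactly what the reward asks for; the reward simply does not encode what the designer meant.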

Sometimes, even when the behaviour matches the objective, there can be broader consequences. For instance, social media content-selection algorithms are optimised to show the content a user is most likely to click. But the ubiquitous use of these algorithms has instead ended up changing users’ preferences. The algorithms start to traffic in extreme, polarising content, leading to filter bubbles, echo chambers, the resurgence of fascism and the crumbling of economies.

Such consequences are a result of companies focusing on specific objectives rather than individual or societal interests. To address this issue, DRL systems will require a broader sense of what responsible and ethical reward functions should look like.

Incentives For Data Collection

DRL is dependent on huge amounts of data. Up-to-date and expansive data collection of people and societies presents several concerns regarding data privacy violations or mass surveillance. 

Take, for instance, DRL systems used in smart cities. As cities get digitised, effective DRL systems could help with better resource management, but they could also intrude on people’s privacy by collecting sensitive data. Lack of transparency in data use can further lead to a lack of accountability, and the collected data could be used for purposes other than those originally intended. India’s Aadhaar card is a good case in point. Also, the collection of personal data could disproportionately affect vulnerable communities.

Security And Potential Misuse

DRL systems will live up to their potential when deployed in real-life, time-sensitive applications like self-driving cars. In such cases, these systems will have to be robust, not only to the various situations they could face in the real world but also to adversarial inputs, like a fake traffic sign. Research has shown that adversarial attacks are harder to detect in DRL systems than in other ML approaches, since the training data is continuously changing.

DRL systems are also at risk of misuse. For instance, they could be used to spread disinformation, and DRL algorithms could make this faster and more accurately targeted.

Automation And Future Of Work

Several studies suggest low-wage or manual jobs are unlikely to be automated because of the skill and mobility such jobs require. However, with advances in DRL and other technologies like robotics, SLAM or embedded vision systems, the automation of these jobs could be accelerated. On the flip side, DRL systems could also ease the burden of human labour. They could, for example, automate tedious and dangerous aspects of manual work.

Kashyap Raibagi
Kashyap currently works as a Tech Journalist at Analytics India Magazine (AIM).
