Duality: A New Approach to Reinforcement Learning

The environment and its underlying dynamics of all the reinforcement learning problems are typically abstracted as a Markov decision process (MDP). Because MDPs are useful for studying optimisation problems solved via dynamic programming and reinforcement learning. Today, most of the effective existing reinforcement learning (RL) algorithms are rooted in this dynamic programming paradigm.

An alternative paradigm for RL is based on linear programming (LP). The researchers at Google, try to generalise this LP approach and have tried to demonstrate how relevant it is to RL.

A Brief Of Overview Of Duality

Duality or the duality principle is associated with the optimisation theory, which posits that optimisation problems can be perceived as the primal problem or the dual problem.

Subscribe to our Newsletter

Join our editors every weekday evening as they steer you through the most significant news of the day, introduce you to fresh perspectives, and provide unexpected moments of joy
Your newsletter subscriptions are subject to AIM Privacy Policy and Terms and Conditions.

The solution to the dual problem provides a lower bound to the solution of the primal (minimisation) problem. The Fenchel-Rockafellar duality theorem (FRDT) is one of the many theorems on duality, named after mathematicians Werner Fenchel and R.T Rockafellar. This theorem is considered to be one of the most powerful tools in all of the convex analysis.

The concept of duality is a basic and powerful tool in optimisation and machine learning, especially in the field of convex analysis, allowing a researcher to easily reformulate optimisation problems in alternative ways that are potentially more tractable.

The paper states that the Fenchel-Rockafellar duality and its variants are at the heart of a number of recent RL algorithms, although many of these were originally presented through less generalisable derivations.

Duality Applied to RL

via GoogleAI

In practical scenarios such as robotic navigation in warehouses or on highways, there are multiple constraints that are difficult to keep track of. The researchers at Google with the help of duality try to address this question of how to learn to satisfy the tremendous number of constraints associated with arbitrary input? 

Traditionally, RL implements Q-learning and actor-critic, which, the authors write, often find function approximation, bootstrapping, and off-policy learning to be challenging and the conventional methods obscure them through a series of rough approximations. This detaches the practical implementations of these algorithms from their mathematical foundations.

So, to test their new approach of duality, the researchers implemented a duality-based training on a navigational agent. The objective of the agent is to start at one corner of a multi-room map and navigate to the opposite corner. Based on the performance in this objective, the authors compared their algorithm to an actor-critic approach. 

Although the underlying mathematical problem is the same in both cases, actor-critic uses a number of approximations due to the infeasibility of satisfying a large number of constraints.

Whereas, the algorithm based on duality, the authors claim, is better suited for practical implementation. According to the experimental results, the duality-based implementation achieved significantly higher reward compared to actor-critic.

Depending on the exact form of regularisation, wrote the authors, application of Fenchel-Rockafellar duality leads to objectives which hint at connections to actor-critic via squared Bellman error minimisation as well as max-likelihood policy learning.

According to the authors, the techniques of this new approach can be summarised as follows: 

  • When presented with a problem that appears difficult to solve, consider writing the problem as a constrained convex optimisation and solving its Fenchel-Rockafellar

dual, or its Lagrangian form

  • If the dual is still difficult to solve (e.g. when the primal objective is linear, yielding

a dual with constraints), consider modifying the original objective, e.g., by applying

an appropriate convex regulariser.

This work attempts to formulate the well-known reinforcement learning problem as a mathematical objective with constraints. And, when convex duality is applied repeatedly in combination with a regulariser, an equivalent problem without constraints is obtained. When the problem becomes unconstrained then it enables easier implementation in real world settings. 

The researchers conclude that their algorithms are not only more mathematically principled than existing RL methods, but they also often yield better practical performance, showing the value of unifying mathematical principles with practical implementation.

Link to paper

Ram Sagar
I have a master's degree in Robotics and I write about machine learning advancements.

Download our Mobile App


AI Hackathons, Coding & Learning

Host Hackathons & Recruit Great Data Talent!

AIM Research

Pioneering advanced AI market research

Request Customised Insights & Surveys for the AI Industry

The Gold Standard for Recognizing Excellence in Data Science and Tech Workplaces

With Best Firm Certification, you can effortlessly delve into the minds of your employees, unveil invaluable perspectives, and gain distinguished acclaim for fostering an exceptional company culture.

AIM Leaders Council

World’s Biggest Community Exclusively For Senior Executives In Data Science And Analytics.

3 Ways to Join our Community

Telegram group

Discover special offers, top stories, upcoming events, and more.

Discord Server

Stay Connected with a larger ecosystem of data science and ML Professionals

Subscribe to our Daily newsletter

Get our daily awesome stories & videos in your inbox

Subscribe to Our Newsletter

The Belamy, our weekly Newsletter is a rage. Just enter your email below.