
Duality: A New Approach to Reinforcement Learning


In reinforcement learning (RL), the environment and its underlying dynamics are typically abstracted as a Markov decision process (MDP), because MDPs are useful for studying optimisation problems solved via dynamic programming. Today, most effective RL algorithms are rooted in this dynamic programming paradigm.
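For reference, an MDP is conventionally specified as a tuple (the notation below is the standard textbook one, used here for illustration rather than copied from the paper):

\mathcal{M} = \langle S, A, R, T, \mu_0, \gamma \rangle,

where S is the state space, A the action space, R(s, a) the reward function, T(s' \mid s, a) the transition distribution, \mu_0 the initial-state distribution, and \gamma \in (0, 1) the discount factor. The RL objective is to find a policy \pi maximising the normalised expected discounted return

\rho(\pi) = (1 - \gamma)\, \mathbb{E}\big[ \textstyle\sum_{t=0}^{\infty} \gamma^{t} R(s_t, a_t) \big].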

An alternative paradigm for RL is based on linear programming (LP). The researchers at Google generalise this LP approach and demonstrate its relevance to a wide range of RL problems.
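The LP view of MDPs is classical, going back to linear-programming formulations of dynamic programming from the 1960s. In one standard rendering (a textbook form, not necessarily the paper's exact statement), policy optimisation is a pair of linear programs, one over value functions and one over state-action occupancies:

\min_{V}\; (1 - \gamma)\, \mathbb{E}_{s \sim \mu_0}[V(s)] \quad \text{s.t.} \quad V(s) \ge R(s, a) + \gamma\, \mathbb{E}_{s' \sim T(\cdot \mid s, a)}[V(s')] \quad \forall (s, a),

\max_{d \ge 0}\; \sum_{s, a} d(s, a)\, R(s, a) \quad \text{s.t.} \quad \sum_{a} d(s, a) = (1 - \gamma)\, \mu_0(s) + \gamma \sum_{s', a'} T(s \mid s', a')\, d(s', a') \quad \forall s.

Note the structural difficulty: the objectives are linear, but there is one constraint per state-action pair (or per state), which becomes intractable in large or continuous spaces.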

A Brief Overview Of Duality

Duality, or the duality principle, is a concept from optimisation theory which posits that an optimisation problem can be viewed from two perspectives: as the primal problem or as the dual problem.

The solution to the dual problem provides a lower bound to the solution of the primal (minimisation) problem. The Fenchel-Rockafellar duality theorem, named after mathematicians Werner Fenchel and R. T. Rockafellar, is one of many theorems on duality, and it is considered one of the most powerful tools in all of convex analysis.

The concept of duality is a basic and powerful tool in optimisation and machine learning, especially in convex analysis, as it allows a researcher to reformulate an optimisation problem in alternative ways that are potentially more tractable.
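Concretely, in one common statement (sign conventions vary across references), the Fenchel-Rockafellar duality theorem relates two convex functions f and g and a linear operator A as

\min_{x}\; f(x) + g(Ax) \;=\; \max_{y}\; -f^{*}(-A^{\top} y) - g^{*}(y),

where f^{*}(y) := \max_{x} \langle x, y \rangle - f(x) denotes the convex (Fenchel) conjugate, and the equality holds under mild regularity conditions. Without those conditions the right-hand side still lower-bounds the left, which is exactly the weak-duality property mentioned above.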

The paper states that the Fenchel-Rockafellar duality and its variants are at the heart of a number of recent RL algorithms, although many of these were originally presented through less generalisable derivations.

Duality Applied to RL


In practical scenarios such as robotic navigation in warehouses or on highways, there are multiple constraints that are difficult to keep track of. With the help of duality, the researchers at Google try to address the question of how an agent can learn to satisfy the tremendous number of constraints associated with arbitrary inputs.

Traditional RL relies on methods such as Q-learning and actor-critic which, the authors write, often find function approximation, bootstrapping, and off-policy learning challenging; conventional methods work around these difficulties through a series of rough approximations. This detaches the practical implementations of these algorithms from their mathematical foundations.

So, to test their duality-based approach, the researchers applied duality-based training to a navigational agent. The agent's objective is to start at one corner of a multi-room map and navigate to the opposite corner. Based on performance on this task, the authors compared their algorithm to an actor-critic approach.

Although the underlying mathematical problem is the same in both cases, actor-critic uses a number of approximations due to the infeasibility of satisfying a large number of constraints.

The duality-based algorithm, the authors claim, is better suited for practical implementation. According to the experimental results, the duality-based implementation achieved significantly higher reward than actor-critic.

Depending on the exact form of regularisation, the authors write, applying Fenchel-Rockafellar duality leads to objectives which hint at connections to actor-critic via squared Bellman error minimisation, as well as to max-likelihood policy learning.
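As a rough sketch of how such a connection arises (simplified notation, in the spirit of the paper's derivations rather than a verbatim reproduction): if the linear occupancy objective is regularised by a divergence from an offline distribution d^{D}, the Bellman constraints move into the objective as a penalty on the Bellman residual. With a squared (\chi^2-type) regulariser one obtains, schematically,

\min_{Q}\; (1 - \gamma)\, \mathbb{E}_{s_0 \sim \mu_0,\, a_0 \sim \pi}\big[ Q(s_0, a_0) \big] + \tfrac{1}{2}\, \mathbb{E}_{(s, a) \sim d^{D}}\Big[ \big( R(s, a) + \gamma\, \mathbb{E}[Q(s', a')] - Q(s, a) \big)^{2} \Big],

an unconstrained objective whose second term is a squared Bellman error; a KL-type regulariser instead produces log-sum-exp terms reminiscent of maximum-likelihood policy learning.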

According to the authors, the techniques of this new approach can be summarised as follows: 

  • When presented with a problem that appears difficult to solve, consider writing the problem as a constrained convex optimisation and solving its Fenchel-Rockafellar dual, or its Lagrangian form.

  • If the dual is still difficult to solve (e.g. when the primal objective is linear, yielding a dual with constraints), consider modifying the original objective, e.g., by applying an appropriate convex regulariser.

This work formulates the well-known reinforcement learning problem as a mathematical objective with constraints. When convex duality is applied repeatedly, in combination with a regulariser, an equivalent problem without constraints is obtained, and an unconstrained problem is far easier to implement in real-world settings.
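The primal-dual equivalence underlying this formulation can be checked numerically on a small example. Below is a minimal sketch (our own illustration, not the paper's code, using an invented toy MDP and SciPy's LP solver; the paper itself targets large-scale settings with function approximation, where exact LP solving is infeasible):

# Toy check of MDP LP duality: the value-function LP (primal) and the
# state-action occupancy LP (dual) reach the same optimum.
import numpy as np
from scipy.optimize import linprog

nS, nA, gamma = 3, 2, 0.9
rng = np.random.default_rng(0)

# Random transition tensor T[s, a, s'] (rows normalised) and rewards R[s, a].
T = rng.random((nS, nA, nS))
T /= T.sum(axis=2, keepdims=True)
R = rng.random((nS, nA))
mu0 = np.full(nS, 1.0 / nS)  # uniform initial-state distribution

# Primal LP over values V:  min (1-gamma) * mu0 . V
# s.t. V(s) >= R(s,a) + gamma * sum_s' T(s'|s,a) V(s')  for all (s, a).
A_ub, b_ub = [], []
for s in range(nS):
    for a in range(nA):
        row = gamma * T[s, a]   # + gamma * T(.|s,a) . V  (new array, safe to edit)
        row[s] -= 1.0           # - V(s)
        A_ub.append(row)
        b_ub.append(-R[s, a])
primal = linprog(c=(1 - gamma) * mu0, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                 bounds=[(None, None)] * nS, method="highs")

# Dual LP over occupancies d(s,a) >= 0:  max sum_{s,a} d(s,a) R(s,a)
# s.t. sum_a d(s,a) = (1-gamma) mu0(s) + gamma sum_{s',a'} T(s|s',a') d(s',a').
A_eq = np.zeros((nS, nS * nA))
for s in range(nS):
    for sp in range(nS):
        for ap in range(nA):
            A_eq[s, sp * nA + ap] = (1.0 if sp == s else 0.0) - gamma * T[sp, ap, s]
dual = linprog(c=-R.ravel(), A_eq=A_eq, b_eq=(1 - gamma) * mu0,
               bounds=[(0, None)] * (nS * nA), method="highs")

# Strong duality: both optima equal the optimal normalised return.
print("primal optimum:", primal.fun)   # (1-gamma) * mu0 . V*
print("dual optimum:  ", -dual.fun)    # sum_{s,a} d*(s,a) R(s,a)
assert np.isclose(primal.fun, -dual.fun, atol=1e-5)

Both optima coincide at the optimal normalised return, and the one-constraint-per-state (or per state-action pair) structure makes plain why exact solutions do not scale, and why the paper replaces hard constraints with convex regularisation.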

The researchers conclude that their algorithms are not only more mathematically principled than existing RL methods, but they also often yield better practical performance, showing the value of unifying mathematical principles with practical implementation.

Link to paper


Ram Sagar

I have a master's degree in Robotics and I write about machine learning advancements.