# Duality: A New Approach to Reinforcement Learning

In reinforcement learning, the environment and its underlying dynamics are typically abstracted as a Markov decision process (MDP), a framework well suited to optimisation problems solved via dynamic programming. Today, most effective reinforcement learning (RL) algorithms are rooted in this dynamic programming paradigm.
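For reference, an MDP is conventionally written as a tuple (this is the standard textbook notation, not drawn from the paper itself):

```latex
\mathcal{M} = \langle \mathcal{S}, \mathcal{A}, P, r, \gamma, \mu_0 \rangle
```

where $\mathcal{S}$ is the state space, $\mathcal{A}$ the action space, $P(s' \mid s, a)$ the transition kernel, $r(s, a)$ the reward function, $\gamma \in (0, 1)$ the discount factor, and $\mu_0$ the initial state distribution.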

An alternative paradigm for RL is based on linear programming (LP). Researchers at Google generalise this LP approach and demonstrate its broad relevance to RL.

### A Brief Overview Of Duality

Duality, or the duality principle, is a concept in optimisation theory which posits that any optimisation problem can be viewed from two perspectives: as the primal problem or as the dual problem.

The solution to the dual problem provides a lower bound to the solution of the primal (minimisation) problem. The Fenchel-Rockafellar duality theorem, named after mathematicians Werner Fenchel and R. T. Rockafellar, is one of many duality theorems, and is considered one of the most powerful tools in all of convex analysis.

The concept of duality is a basic and powerful tool in optimisation and machine learning, especially in the field of convex analysis, allowing a researcher to easily reformulate optimisation problems in alternative ways that are potentially more tractable.
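In its standard form (notation ours, following the convex-analysis literature rather than copied from the paper), the Fenchel-Rockafellar theorem relates a primal problem and its dual through the convex conjugates $f^{*}, g^{*}$:

```latex
\min_{x} \; f(x) + g(Ax)
\;=\;
\max_{y} \; -f^{*}(-A^{\top} y) - g^{*}(y),
\qquad
f^{*}(y) := \sup_{x} \, \langle x, y \rangle - f(x),
```

where $f$ and $g$ are convex functions and $A$ is a linear operator. The dual optimum always lower-bounds the primal, with equality (strong duality) holding under mild regularity conditions.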

The paper states that the Fenchel-Rockafellar duality and its variants are at the heart of a number of recent RL algorithms, although many of these were originally presented through less generalisable derivations.

### Duality Applied to RL

In practical scenarios such as robotic navigation in warehouses or on highways, there are multiple constraints that are difficult to keep track of. Using duality, the researchers at Google address the question of how an agent can learn to satisfy the tremendous number of constraints associated with arbitrary inputs.

Traditional RL relies on algorithms such as Q-learning and actor-critic, which, the authors write, often struggle with function approximation, bootstrapping, and off-policy learning. Conventional methods obscure these difficulties through a series of rough approximations, detaching the practical implementations of these algorithms from their mathematical foundations.
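To make the bootstrapping the authors refer to concrete, here is a minimal tabular Q-learning sketch in the conventional dynamic-programming style. The three-state chain MDP is our own toy example, not the paper's navigation task:

```python
import numpy as np

# Toy chain MDP (our illustration): action 0 stays, action 1 moves
# right; the last state absorbs. Reward 1 only on first reaching
# the last state.
n_states, n_actions, gamma = 3, 2, 0.9

def step(s, a):
    s_next = min(s + a, n_states - 1)
    reward = 1.0 if (s_next == n_states - 1 and s != n_states - 1) else 0.0
    return s_next, reward

Q = np.zeros((n_states, n_actions))
rng = np.random.default_rng(0)
for _ in range(2000):
    s = int(rng.integers(n_states))
    a = int(rng.integers(n_actions))
    s_next, r = step(s, a)
    # Bootstrapped temporal-difference update: the target reuses the
    # current estimate max_a' Q(s', a') -- the Bellman backup.
    Q[s, a] += 0.5 * (r + gamma * Q[s_next].max() - Q[s, a])

greedy = Q.argmax(axis=1)  # greedy policy: move right in states 0 and 1
```

Each update regresses `Q[s, a]` toward a target built from `Q` itself; this self-referential bootstrapping is exactly what becomes fragile once function approximation and off-policy data enter the picture.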

To test the new approach, the researchers applied duality-based training to a navigational agent. The agent's objective is to start at one corner of a multi-room map and navigate to the opposite corner. On this task, the authors compared their algorithm to an actor-critic approach.

Although the underlying mathematical problem is the same in both cases, actor-critic uses a number of approximations due to the infeasibility of satisfying a large number of constraints.

The duality-based algorithm, the authors claim, is better suited to practical implementation. According to the experimental results, the duality-based implementation achieved significantly higher reward than actor-critic.

Depending on the exact form of regularisation, the authors write, applying Fenchel-Rockafellar duality leads to objectives that hint at connections to actor-critic via squared Bellman error minimisation, as well as to maximum-likelihood policy learning.

According to the authors, the techniques of this new approach can be summarised as follows:

• When presented with a problem that appears difficult to solve, consider writing the problem as a constrained convex optimisation and solving its Fenchel-Rockafellar dual, or its Lagrangian form.

• If the dual is still difficult to solve (e.g. when the primal objective is linear, yielding a dual with constraints), consider modifying the original objective, e.g., by applying an appropriate convex regulariser.
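The recipe can be seen on a toy problem (our own illustration, not the paper's code): maximising a linear reward over a probability simplex is a linear program whose dual carries constraints, but adding an entropy regulariser, as the second step suggests, yields an unconstrained dual, the scalar log-sum-exp:

```python
import numpy as np

# Primal LP:  max_{d >= 0, sum(d) = 1}  <d, r>  -- optimum sits at a
# vertex, and the dual has inequality constraints.
# Entropy-regularised primal:  max_d  <d, r> - <d, log d>  -- its
# Fenchel dual is the unconstrained scalar log-sum-exp(r).
r = np.array([1.0, 2.0, 0.5])            # rewards over three actions

lp_value = r.max()                        # unregularised LP optimum
reg_value = np.log(np.exp(r).sum())       # unconstrained dual value

# The regularised primal solution is recovered in closed form as
# softmax weights; plugging it back in matches the dual value.
d_star = np.exp(r) / np.exp(r).sum()
primal_value = d_star @ r - (d_star * np.log(d_star)).sum()
```

By strong duality, `primal_value` equals `reg_value` exactly; the regulariser smooths the LP's vertex solution into softmax weights, mirroring how regularisation turns the constrained RL linear program into an unconstrained objective.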

This work formulates the well-known reinforcement learning problem as a mathematical objective with constraints. When convex duality is applied repeatedly in combination with a regulariser, an equivalent problem without constraints is obtained, and an unconstrained problem enables easier implementation in real-world settings.

The researchers conclude that their algorithms are not only more mathematically principled than existing RL methods, but they also often yield better practical performance, showing the value of unifying mathematical principles with practical implementation.
