
Antisymmetric RNN: A New Take On Recurrent Neural Networks


Recurrent Neural Networks (RNNs) have found widespread use across a variety of domains, from language modeling and machine translation to speech recognition and recommendation systems.

RNNs are built on a recursive formula in which the new hidden state is a function of the previous state and the current input, which makes them well suited to handling time-series data.
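
In its simplest (vanilla) form, this recurrence can be written as the standard update below, where h_t is the hidden state at step t, x_t is the input, and W_h, W_x and b are learned parameters; the notation here is generic rather than specific to the paper:

h_t = tanh(W_h h_{t-1} + W_x x_t + b)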

However, when it comes to training these networks, a few challenges surface.

The Need To Revisit And Revamp RNNs

Though RNNs have become immensely popular for NLP tasks, their reputation for succumbing to exploding and vanishing gradients has put them in the back seat.

The main difficulty arises because the error signal back-propagated through time (BPTT) suffers from exponential growth or decay, a dilemma commonly referred to as the exploding or vanishing gradient problem.

The exploding gradient problem refers to a large increase in the norm of the gradient during training. Such events are caused by the explosion of the long-term components of the gradient, which can grow exponentially faster than the short-term ones.
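
Concretely, backpropagation through time multiplies one Jacobian per time step, so the gradient that reaches an early step t involves the product

∂h_T/∂h_t = ∂h_T/∂h_{T-1} · ∂h_{T-1}/∂h_{T-2} · ... · ∂h_{t+1}/∂h_t

which tends to explode when these Jacobians consistently have norms larger than one and to vanish when they are smaller than one.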

Modelling complex temporal dependencies in sequential data using RNNs, especially the long-term dependencies, remains an open challenge.

Gated variants of RNNs, such as long short-term memory (LSTM) networks and gated recurrent units (GRUs), were introduced to alleviate these issues.

Identity and orthogonal initialization are other proposed solutions to the exploding or vanishing gradient problem in deep neural networks.
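
As a rough illustration (these are standard PyTorch initializers, not code from the paper), identity and orthogonal initialization simply constrain how the recurrent weight matrix starts out:

```python
import torch
import torch.nn as nn

# Illustrative only: standard initializers for a 128-unit recurrent weight matrix.
W_identity = torch.empty(128, 128)
nn.init.eye_(W_identity)           # identity initialization

W_orthogonal = torch.empty(128, 128)
nn.init.orthogonal_(W_orthogonal)  # orthogonal initialization
```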

However, some of these approaches come with significant computational overhead and reportedly hinder the representation power of these models. Moreover, orthogonal weight matrices alone do not prevent exploding and vanishing gradients, due to the nonlinear nature of deep neural networks.

To address these drawbacks of recurrent neural networks, a new framework connecting neural networks and ordinary differential equations (ODEs) has been exploited to introduce AntisymmetricRNN. By exploiting the underlying differential equation, the researchers at Google Brain try to capture long-term dependencies.
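
Roughly, the idea is to treat the hidden state as the solution of an ODE, h'(t) = f(h(t), x(t)), and to recover a recurrent update by discretizing it, for example with the forward Euler method and step size ε:

h_t = h_{t-1} + ε f(h_{t-1}, x_t)

Properties of the underlying ODE, such as stability, then carry over to the discrete recurrence.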

Antisymmetric RNNs

In numerical analysis, stability theory addresses the stability of solutions of ODEs under small perturbations of the initial conditions.

An ODE solution is stable if the long-term behaviour of the system does not depend significantly on the initial conditions.
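
A standard sufficient condition for this kind of stability is that the eigenvalues of the Jacobian of f have non-positive real parts, i.e. max_i Re(λ_i(J(t))) ≤ 0. AntisymmetricRNN meets this criterion by construction: an antisymmetric matrix (one satisfying W^T = -W) has purely imaginary eigenvalues, and the paper's recurrent update takes roughly the form

h_t = h_{t-1} + ε tanh((W - W^T - γI) h_{t-1} + V x_t + b)

where ε is the discretization step size and γ is a small damping constant added for numerical stability of the Euler scheme.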

The performance of the proposed antisymmetric networks is evaluated on four image classification tasks with long-range dependencies. 

The classification is done by feeding the pixels of the images as a sequence to the RNN and passing its last hidden state through a fully-connected layer and a softmax function.

Cross-entropy loss is used, with stochastic gradient descent (SGD) with momentum and Adagrad as the optimizers.
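
A minimal PyTorch-style sketch of this setup might look as follows; a standard RNN cell stands in for the antisymmetric recurrence, and all sizes and hyperparameter values are illustrative rather than taken from the paper:

```python
import torch
import torch.nn as nn

class PixelByPixelClassifier(nn.Module):
    """Feed image pixels as a sequence; classify from the last hidden state."""
    def __init__(self, input_size=1, hidden_size=128, num_classes=10):
        super().__init__()
        # Stand-in recurrence; the paper replaces this with the antisymmetric update.
        self.rnn = nn.RNN(input_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, x):        # x: (batch, seq_len, input_size)
        _, h_n = self.rnn(x)     # h_n: (1, batch, hidden_size)
        return self.fc(h_n[-1])  # logits; softmax is applied inside the loss

model = PixelByPixelClassifier()
criterion = nn.CrossEntropyLoss()  # cross-entropy loss
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

# One illustrative training step on dummy data shaped like 28x28 images
# flattened into a 784-step pixel sequence.
x = torch.randn(32, 784, 1)
y = torch.randint(0, 10, (32,))
loss = criterion(model(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```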

In this work, the authors draw connections between RNNs and ordinary differential equation theory and design new recurrent architectures by discretizing ODEs. This new view opens up possibilities to exploit the computational and theoretical successes of dynamical systems to understand and improve the trainability of RNNs.

AntisymmetricRNN is a discretization of an ODE. Besides its appealing theoretical properties, the model delivers competitive performance against strong recurrent baselines on a comprehensive set of benchmark tasks.
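
For concreteness, here is a minimal sketch of a single step of that discretized update, assuming the formulation given above; parameter names, initial scales and hyperparameter values are illustrative, not the authors' code:

```python
import torch
import torch.nn as nn

class AntisymmetricRNNCell(nn.Module):
    """One step of the Euler-discretized antisymmetric update (sketch)."""
    def __init__(self, input_size, hidden_size, eps=0.01, gamma=0.01):
        super().__init__()
        self.W = nn.Parameter(0.01 * torch.randn(hidden_size, hidden_size))
        self.V = nn.Parameter(0.01 * torch.randn(hidden_size, input_size))
        self.b = nn.Parameter(torch.zeros(hidden_size))
        self.eps, self.gamma = eps, gamma
        self.register_buffer("eye", torch.eye(hidden_size))

    def forward(self, x, h):
        # W - W^T is antisymmetric (purely imaginary eigenvalues);
        # the -gamma*I term adds slight damping to stabilize the Euler step.
        A = self.W - self.W.t() - self.gamma * self.eye
        return h + self.eps * torch.tanh(h @ A.t() + x @ self.V.t() + self.b)
```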

Key Takeaways

  • Existing approaches to improving RNN trainability often incur significant computational overhead. In comparison, AntisymmetricRNN achieves the same goal by design.
  • AntisymmetricRNN exhibits much more predictable dynamics. 
  • It outperforms regular LSTM models on tasks requiring long-term memory and matches their performance on tasks where short-term dependencies dominate, despite being much simpler.

By establishing a link between recurrent networks and ordinary differential equations, the authors believe that this work will inspire future research. For example, future work could explore other stable ordinary differential equations and numerical methods that might lead to novel and well-conditioned recurrent architectures.
