
Top Research Papers On Recurrent Neural Networks For NLP Enthusiasts


Illustration source: Google Cloud blog

Recurrent Neural Networks (RNNs) have become the de facto neural network architecture for Natural Language Processing (NLP) tasks. Over the last few years, recurrent architectures have advanced considerably across NLP tasks, from Named Entity Recognition to Language Modeling and Machine Translation. Their success can be ascribed to their ability to handle sequential data, thanks to their “memory”: whereas a feed-forward Artificial Neural Network (ANN) has no notion of time and only considers the current example it is fed, an RNN considers both the current input and a “context unit” built from what it has seen previously.
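To make the “context unit” idea concrete, here is a minimal sketch of a single RNN step in NumPy; the weight names and sizes are illustrative only and are not taken from any of the papers below.

```python
import numpy as np

# Minimal RNN step: the new hidden state (the "context unit") mixes the current
# input with the previous hidden state. Sizes and names are illustrative only.
np.random.seed(0)
input_size, hidden_size = 8, 16
W_xh = np.random.randn(hidden_size, input_size) * 0.1   # input -> hidden
W_hh = np.random.randn(hidden_size, hidden_size) * 0.1  # hidden -> hidden
b_h = np.zeros(hidden_size)

def rnn_step(x_t, h_prev):
    # h_t depends on both the current input x_t and the previous state h_prev
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

h = np.zeros(hidden_size)          # empty "memory" before any input arrives
x = np.random.randn(input_size)    # one input vector (e.g. a word embedding)
h = rnn_step(x, h)
```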

Understanding Recurrent Neural Networks

ML practitioner Denny Britz says that in a traditional neural network, it is generally assumed that all inputs (and outputs) are independent of each other. However, in NLP, if one wants to predict the next word in a sentence, it is best to know which words came before it. RNNs are called recurrent because they perform the same task for every element of a sequence, with the output depending on the previous computations. Another way to put it is that RNNs have a “memory” that captures information about what has been computed so far. In theory, RNNs can make use of information in arbitrarily long sequences, but in practice they are limited to looking back only a few steps. Here is what a typical RNN architecture looks like:

Image: Nature

In an article, Britz explains how the diagram shows a recurrent neural network being unrolled into a full network. For example, if the sequence is a sentence of five words, the network unfolds into a five-layer neural network, one layer for each word. In terms of disadvantages, the vanilla RNN is not the go-to network for all NLP tasks, since it performs poorly on sentences with many words. Instead, advanced recurrent architectures such as the LSTM (Long Short-Term Memory) and the GRU (Gated Recurrent Unit), which tend to outperform conventional RNNs, are used.
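As a rough illustration of that unrolling, the sketch below runs a toy RNN cell over a five-word sentence; the same weights are reused at every step, and random vectors stand in for real word embeddings.

```python
import numpy as np

# Unrolling an RNN over a five-word sentence: the same three weight matrices are
# applied at every timestep, and each step passes its hidden state to the next.
np.random.seed(0)
embed_size, hidden_size, seq_len = 8, 16, 5
W_xh = np.random.randn(hidden_size, embed_size) * 0.1
W_hh = np.random.randn(hidden_size, hidden_size) * 0.1
W_hy = np.random.randn(embed_size, hidden_size) * 0.1    # hidden -> output scores

sentence = np.random.randn(seq_len, embed_size)          # stand-in word embeddings
h = np.zeros(hidden_size)
outputs = []
for x_t in sentence:                                     # one unrolled "layer" per word
    h = np.tanh(W_xh @ x_t + W_hh @ h)
    outputs.append(W_hy @ h)
```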

Top Must-Read Papers on Recurrent Neural Networks

Speech Recognition With Deep Recurrent Neural Networks: This 2013 paper provides an overview of deep recurrent neural networks and showcases the multiple levels of representation that have proved effective in deep networks. When trained end-to-end with suitable regularisation, the authors find that deep LSTM RNNs achieve a test set error of 17.7 percent on the TIMIT phoneme recognition benchmark, which was the best recorded score at the time.
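A minimal sketch of the “multiple levels of representation” idea, i.e. stacking recurrent layers so that each layer’s hidden-state sequence feeds the next. The paper stacks (bidirectional) LSTM layers; plain tanh cells and random features are used here purely for brevity.

```python
import numpy as np

# Deep (stacked) RNN sketch: the hidden-state sequence of one recurrent layer is
# the input sequence of the next. Sizes are illustrative only.
np.random.seed(0)
feat, hidden, depth, T = 12, 16, 3, 20                # e.g. T acoustic frames
layers = [
    (np.random.randn(hidden, feat if l == 0 else hidden) * 0.1,   # W_x per layer
     np.random.randn(hidden, hidden) * 0.1)                       # W_h per layer
    for l in range(depth)
]

seq = np.random.randn(T, feat)                        # stand-in acoustic features
for W_x, W_h in layers:                               # each layer re-reads the whole sequence
    h, outs = np.zeros(hidden), []
    for x_t in seq:
        h = np.tanh(W_x @ x_t + W_h @ h)
        outs.append(h)
    seq = np.array(outs)                              # becomes input to the next layer
```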

Training RNNs as Fast as CNNs: This 2017 paper tackled the main complaint against RNNs, that their computation is inherently sequential, by introducing the Simple Recurrent Unit (SRU). In an SRU, the heavy matrix multiplications depend only on the current input rather than on the previous hidden state, so they can be computed for all timesteps in parallel, much like a convolution; only a lightweight elementwise recurrence remains sequential. The authors report training speeds comparable to convolutional networks while matching or exceeding LSTM accuracy on a range of NLP tasks.
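A minimal sketch of that idea, assuming the simplified SRU equations without the paper’s highway scaling and other details: the matrix products are computed for all timesteps at once, and only a cheap elementwise loop remains sequential.

```python
import numpy as np

# SRU-style recurrence: matrix multiplications depend only on the inputs and are
# batched over all timesteps; the remaining recurrence is elementwise and cheap.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

np.random.seed(0)
d, T = 16, 50                                   # hidden size, sequence length
W, W_f, W_r = (np.random.randn(d, d) * 0.1 for _ in range(3))

X = np.random.randn(T, d)                       # input sequence
X_tilde = X @ W.T                               # candidate values, all steps in parallel
F = sigmoid(X @ W_f.T)                          # forget gates, all steps in parallel
R = sigmoid(X @ W_r.T)                          # reset gates, all steps in parallel

c = np.zeros(d)
H = np.zeros((T, d))
for t in range(T):                              # only this elementwise loop is sequential
    c = F[t] * c + (1.0 - F[t]) * X_tilde[t]
    H[t] = R[t] * np.tanh(c) + (1.0 - R[t]) * X[t]
```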

Long Short-Term Memory: This 1997 paper, authored by Sepp Hochreiter of the Technical University of Munich and Jürgen Schmidhuber of IDSIA in Switzerland, introduced what is now the industry standard in NLP tasks: a gradient-based method called Long Short-Term Memory (LSTM). It was benchmarked against other tried-and-tested methods such as Recurrent Cascade-Correlation, neural sequence chunking, Backpropagation Through Time and Real-Time Recurrent Learning. A key result was that LSTM can bridge time lags in excess of 1,000 discrete timesteps, even on noisy, long input sequences.
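For reference, here is one LSTM step in its now-standard gated form (the forget gate was added to the 1997 design in later work); sizes and initialisation are arbitrary.

```python
import numpy as np

# One LSTM step: gates decide what to write to, erase from, and read out of the
# cell state, which is what lets gradients survive across long time lags.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

np.random.seed(0)
n_in, n_hid = 8, 16
W = np.random.randn(4 * n_hid, n_in + n_hid) * 0.1   # all four gates in one matrix
b = np.zeros(4 * n_hid)

def lstm_step(x_t, h_prev, c_prev):
    z = W @ np.concatenate([x_t, h_prev]) + b
    i, f, o, g = np.split(z, 4)
    i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
    c = f * c_prev + i * g            # cell state: the long-term "memory"
    h = o * np.tanh(c)                # hidden state exposed to the next layer
    return h, c

h = c = np.zeros(n_hid)
h, c = lstm_step(np.random.randn(n_in), h, c)
```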

Quasi-Recurrent Neural Networks: As the title suggests, this 2016 paper delves into RNNs, which have been panned for the dependence of each timestep’s computation on the previous timestep’s output, making them unsuitable for long sequences. The researchers introduced quasi-recurrent neural networks (QRNNs), which alternate convolutional layers that apply in parallel across timesteps with a minimalist recurrent pooling function that applies in parallel across channels. The paper reported better results than LSTMs, and the researchers posited that, thanks to increased parallelism, QRNNs were up to 16 times faster at both training and test time. Dubbed building blocks for a range of sequence tasks, QRNNs fared well on tasks such as sentiment classification and language modelling.
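A compressed sketch of the QRNN recipe, assuming a filter width of two and toy dimensions: causal convolutions produce candidates and gates for every timestep in parallel, and only the elementwise “fo-pooling” recurrence runs step by step.

```python
import numpy as np

# QRNN sketch: gate and candidate values come from causal convolutions over the
# input (parallel across timesteps); only fo-pooling is sequential.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

np.random.seed(0)
d, T, k = 16, 50, 2                                   # hidden size, length, filter width
Wz, Wf, Wo = (np.random.randn(d, k * d) * 0.1 for _ in range(3))

X = np.random.randn(T, d)
X_pad = np.vstack([np.zeros((k - 1, d)), X])          # pad so the convolution is causal
windows = np.stack([X_pad[t:t + k].ravel() for t in range(T)])  # (T, k*d)

Z = np.tanh(windows @ Wz.T)                           # candidates, all steps at once
F = sigmoid(windows @ Wf.T)                           # forget gates
O = sigmoid(windows @ Wo.T)                           # output gates

c = np.zeros(d)
H = np.zeros((T, d))
for t in range(T):                                    # fo-pooling: elementwise, sequential
    c = F[t] * c + (1.0 - F[t]) * Z[t]
    H[t] = O[t] * c
```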

Google’s Neural Machine Translation System: Finally, Google’s Neural Machine Translation system (GNMT) consists of a deep LSTM network with eight encoder and eight decoder layers, using attention and residual connections. Pegged as an end-to-end automated translation system, it provided a single method to translate between multiple languages; until then, models had been built for a single language pair, and there was no model to translate among multiple languages. The model that powers Google Translate consists of two RNNs: an encoder that reads the source sentence and a decoder that generates the translation. A recent article in a tech magazine indicates that research in NMT is seeing a spike, with big tech companies publishing 76 papers in the short period from February to April 2018. From Facebook to IBM and Amazon, companies are investing significantly in improving existing NMT models for better translation.
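A heavily compressed sketch of the two-RNN idea, with a single-layer encoder and decoder and dot-product attention standing in for GNMT’s eight-layer LSTMs, residual connections and wordpiece vocabulary; all names and sizes are illustrative.

```python
import numpy as np

# Encoder-decoder sketch: one RNN reads the source sentence, a second RNN
# generates the target while attending over all encoder hidden states.
np.random.seed(0)
d, src_len, tgt_len = 16, 6, 5
W_enc_x, W_enc_h = (np.random.randn(d, d) * 0.1 for _ in range(2))
W_dec_x, W_dec_h, W_dec_c = (np.random.randn(d, d) * 0.1 for _ in range(3))

src = np.random.randn(src_len, d)                 # stand-in source embeddings
tgt = np.random.randn(tgt_len, d)                 # stand-in target embeddings (teacher forcing)

# Encoder: one RNN over the source sentence
h, enc_states = np.zeros(d), []
for x_t in src:
    h = np.tanh(W_enc_x @ x_t + W_enc_h @ h)
    enc_states.append(h)
enc_states = np.array(enc_states)                 # (src_len, d)

# Decoder: a second RNN that attends over all encoder states at each step
s = enc_states[-1]
for y_t in tgt:
    scores = enc_states @ s                       # dot-product attention scores
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                      # softmax over source positions
    context = weights @ enc_states                # weighted sum of encoder states
    s = np.tanh(W_dec_x @ y_t + W_dec_h @ s + W_dec_c @ context)
```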

 


Richa Bhatia

Richa Bhatia is a seasoned journalist with six-years experience in reportage and news coverage and has had stints at Times of India and The Indian Express. She is an avid reader, mum to a feisty two-year-old and loves writing about the next-gen technology that is shaping our world.