Complete Guide To Exploding Gradient Problem

Exploding Gradient Problem

Neural networks have surely saved us many a time; the way we have used them across different use cases is simply phenomenal. The concept of deep learning was talked about for decades, but because of computational limitations it was sidelined for years. Deep learning has got its hype back, and many think it arrived only a few years ago, but that isn't true.

Alongside the computational issues, neural networks had many other problems too.


One such problem is the exploding gradient.

In this article you will learn about:

  1. What is the exploding gradient problem and what issues does it cause?
  2. How to identify it?
  3. How to rectify it?

What is the exploding gradient problem and how does it hamper us?

The problem is easiest to understand in the context of a recurrent neural network (RNN). For those who don't know what a recurrent neural network is, it can be intuited as a neural network that feeds information back into itself after every iteration. Here, feedback means the changing of the weights.

Figure: an unrolled recurrent neural network (Source: ResearchGate).

In gradient descent, we try to find the global minimum of the cost function, which is the optimal solution for us.
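To make that idea concrete, here is a minimal sketch of gradient descent on a toy one-dimensional cost function; the cost function, learning rate and step count below are illustrative assumptions, not values from the article.

```python
# A minimal sketch of gradient descent on the toy cost J(w) = (w - 3)^2,
# whose global minimum sits at w = 3 (all values here are illustrative).

def cost(w):
    return (w - 3.0) ** 2

def grad(w):
    return 2.0 * (w - 3.0)  # derivative of the cost with respect to w

w = 0.0              # arbitrary starting point
learning_rate = 0.1

for step in range(50):
    w = w - learning_rate * grad(w)   # move against the gradient

print(w, cost(w))    # w ends up close to 3, the global minimum
```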

In the figure, information flows from x1 to y3; in between we see h0, h1 and so on, which are the hidden layers. These hidden layers apply weights, referred to as W, and add biases. When the error is propagated back from y3 towards x1, it has to pass through these hidden layers, and with every iteration the weights are updated. In an RNN, each hidden layer also applies a weight to its own previous state; that term is called Wrec, which stands for the recurrent weight.

The output value at y3 is multiplied by the weights of h2 and passed to h1, where it is multiplied by the weights of h1, and so on down the chain. The thing to understand here is that if the weights being multiplied are less than 1, the value diminishes over time; similarly, if the weights are greater than 1, the value eventually becomes exponentially larger than usual.

So, for the value to remain unchanged, the weights would have to be exactly equal to one.

So, in the situation where the weights are larger than 1, the problem is called an exploding gradient, because it hampers the gradient descent algorithm: the gradient grows exponentially larger as it is propagated back, which hinders accuracy and destabilises model training. When the weights are less than 1, it is called a vanishing gradient, because the value of the gradient becomes considerably small over time. A network with the exploding gradient problem won't be able to learn from its training data, which makes this a serious problem.
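A few lines of plain Python show how drastic this repeated multiplication becomes over many time steps; the recurrent weight values and the number of steps below are illustrative assumptions.

```python
# Repeatedly multiplying a signal by the same recurrent weight either blows it up
# (exploding gradient) or shrinks it towards zero (vanishing gradient).
timesteps = 50

for w_rec in (1.5, 1.0, 0.5):
    signal = 1.0
    for _ in range(timesteps):
        signal *= w_rec
    print(f"w_rec = {w_rec}: factor after {timesteps} steps = {signal:.3e}")

# w_rec = 1.5 -> ~6.4e+08 (explodes)
# w_rec = 1.0 -> 1.0      (stays stable)
# w_rec = 0.5 -> ~8.9e-16 (vanishes)
```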

How to identify exploding gradients?

There are a few signs by which you can tell whether your model is suffering from exploding gradients. They are listed below, followed by a short monitoring sketch:

  1. The model weights become unexpectedly large during training.
  2. The model loss is consistently poor or unstable.
  3. The model reports a NaN loss whilst training.
  4. The error gradient stays above 1.0 across subsequent training iterations.
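As a rough illustration of how you might watch for these signs, here is a minimal sketch assuming a PyTorch training loop; the framework, the helper name log_gradient_health and the 1.0 threshold are assumptions for illustration, not from the article.

```python
import math

def log_gradient_health(model, loss, step):
    # Sum the squared L2 norms of all parameter gradients, then take the root
    # to get the total gradient norm for this training step.
    total_norm = 0.0
    for p in model.parameters():
        if p.grad is not None:
            total_norm += p.grad.detach().norm(2).item() ** 2
    total_norm = math.sqrt(total_norm)

    # Flag the symptoms listed above: a NaN loss or a gradient norm above 1.0.
    if math.isnan(loss.item()) or total_norm > 1.0:
        print(f"step {step}: loss = {loss.item():.4f}, grad norm = {total_norm:.4f}")

# Inside the training loop, call it after loss.backward() and before optimizer.step():
#   loss.backward()
#   log_gradient_health(model, loss, step)
#   optimizer.step()
```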

How to deal with an exploding gradient?

  1. Use LSTMs (Long Short-Term Memory networks)

LSTMs store information and weigh it against the values of the previous iterations through gates. What happens here is that the recurrent weight Wrec is effectively kept equal to 1, so it no longer distorts the gradient.

The symbol sigma stands for the sigmoid activation function, and tanh for the hyperbolic tangent activation function. The value that comes out at ht is the final output value.

Xt is the value that is fed into the system at each step, i.e. the input vector.

There is a lot more to LSTMs, which you can read about here.
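For a concrete starting point, here is a minimal sketch of using an LSTM layer, assuming PyTorch; the layer sizes and tensor shapes are illustrative assumptions, not from the article.

```python
import torch
import torch.nn as nn

# An LSTM layer: 10 input features per time step, a hidden state of size 20.
lstm = nn.LSTM(input_size=10, hidden_size=20, num_layers=1, batch_first=True)

x = torch.randn(32, 15, 10)      # a batch of 32 sequences, each 15 steps long
output, (h_n, c_n) = lstm(x)     # output: (32, 15, 20); h_n and c_n: (1, 32, 20)

# h_n is the last hidden state; c_n is the cell state that carries information
# across time steps, gated by the sigmoid and tanh functions mentioned above.
```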

  2. Gradient Clipping

In really simple terms, it can be understood as clipping the size of the gradient by limiting it to a certain range of acceptable values. 

This is a process that is done before the gradient descent step takes place.
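As a sketch of how this looks in practice, assuming PyTorch (the model, loss, data and the clipping threshold of 1.0 are illustrative assumptions), the gradients are rescaled after backpropagation and before the optimizer step:

```python
import torch
from torch import nn

model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.MSELoss()

x, y = torch.randn(8, 10), torch.randn(8, 1)

optimizer.zero_grad()
loss = criterion(model(x), y)
loss.backward()

# Cap the total gradient norm at 1.0 before the weights are updated.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)

optimizer.step()
```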

You can read more about gradient clipping from the research paper here.

  3. Weight Regularization

What we do here is penalise the network's loss function by adding a regularisation term on the weights.

We use L1 regularisation, which adds the absolute values of the weights to the loss, or L2 regularisation, which adds their squares.
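Here is a minimal sketch of L2 weight regularization, assuming PyTorch (the weight_decay coefficient below is an illustrative value): the optimizer adds an L2 penalty on the weights to every update, which discourages the large weights that drive gradients to explode.

```python
import torch
from torch import nn

model = nn.Linear(10, 1)

# weight_decay is the L2 penalty coefficient applied to the weights at each update.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-4)
```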

These regularisation techniques, L1 and L2, can be used for controlling exploding gradients. You can read more in the research paper here.

Conclusion

This article aimed to discuss an issue we may face while training a neural network during the backpropagation step. The issue is called an exploding gradient when the recurrent weight is greater than 1, and a vanishing gradient when the recurrent weight is less than 1. We also discussed how to identify the problem of exploding gradients, namely by observing the loss and the weights of the model.

We then closed the article with a few solutions to the problem. LSTMs are one of the most prominently used solutions, and apart from that we discussed gradient clipping and regularisation techniques.

Hope you liked the article.
