# How ML Frameworks Like TensorFlow And PyTorch Handle Gradient Descent

Optimisation is a central component of machine learning algorithms: it is how a model reduces its error and improves the accuracy of its solution to a problem. Gradient Descent is one such optimisation algorithm. Here we take a deeper look at what Gradient Descent is and how it helps in optimisation.

Gradient Descent is the most common optimisation strategy used in ML frameworks. It is an iterative algorithm that searches for a local or global minimum of a function. In simple words, Gradient Descent repeatedly evaluates a function and adjusts its parameters until it reaches a minimum. The gradient of a function is the vector of its partial derivatives with respect to its inputs; it points in the direction of steepest increase, so taking a step against it decreases the function. In ML, the gradient measures how much the error changes in response to a small change in each weight.
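To make the update rule concrete, here is a minimal sketch in plain Python, assuming a one-variable function f(x) = (x - 3)² whose derivative 2(x - 3) is written out by hand. The function names, starting point and learning rate are illustrative choices, not taken from any framework. Each iteration moves x a small step against the gradient: x ← x − η·f′(x), where η is the learning rate.

```python
def f(x):
    """Function to minimise: a parabola with its minimum at x = 3."""
    return (x - 3) ** 2


def df(x):
    """Derivative of f, worked out by hand."""
    return 2 * (x - 3)


def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from x0."""
    x = x0
    for _ in range(steps):
        x = x - learning_rate * grad(x)
    return x


x_min = gradient_descent(df, x0=10.0)
print(x_min, f(x_min))
```

Frameworks such as TensorFlow and PyTorch compute the derivative automatically instead of requiring it by hand, but the update step has the same shape.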

Let us visualise this with a simple example. Consider the following image of a curve:

For better understanding, visualise a two-dimensional section of the curve. We will get something like this:

Now imagine a ball being rolled from the topmost end of the curve. The objective is to reach the lowest point. The ball rolls down, overshoots and climbs part of the way up the other side, going back and forth until it comes to rest at the lowest point. This is how Gradient Descent works: the algorithm repeatedly adjusts its parameters or coefficients, each time moving in the direction of steepest descent, until it reaches the lowest point.
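The back-and-forth motion of the ball corresponds to overshooting: if the step is too large, an update jumps past the minimum to the other side of the curve. The sketch below, assuming a parabola f(x) = (x − 3)² and three illustrative learning rates, records where gradient descent lands after each step. For this particular function every update multiplies the distance from the minimum by (1 − 2η), so η = 0.1 approaches smoothly, η = 0.9 oscillates around the minimum while still converging, and η = 1.1 overshoots by more each time and diverges.

```python
def trajectory(learning_rate, x0=10.0, steps=6):
    """Return the positions visited by gradient descent on f(x) = (x - 3)**2."""
    xs = [x0]
    x = x0
    for _ in range(steps):
        grad = 2 * (x - 3)          # derivative of (x - 3)**2
        x = x - learning_rate * grad
        xs.append(x)
    return xs


for lr in (0.1, 0.9, 1.1):
    # Signed distance from the minimum at x = 3 after each step
    distances = [round(x - 3, 3) for x in trajectory(lr)]
    print(f"lr={lr}: {distances}")
```

A sign change in the printed distances means the update crossed over to the other side of the minimum, which is the "roll down and then up" behaviour in the analogy.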

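In the ML context, Gradient Descent is used to minimise the error by adjusting the weights. In its basic (batch) form, the weights are updated once after passing through all the samples in the training set. If the weights are instead updated after each individual sample, it is called Stochastic Gradient Descent; updating after a fixed-size subset of samples is called mini-batch Gradient Descent, which in practice is often also referred to as SGD. The size of each update is proportional to the gradient: the steeper the slope, the larger the step and the faster the model can learn. When the slope is zero, the updates stop and the model stops learning. This is what happens at a minimum, although a flat region of the error surface can stall learning in the same way.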
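A minimal sketch of the difference, assuming a toy linear model y ≈ w·x + b fitted with a squared-error loss to four points that lie exactly on y = 2x + 1. The hypothetical `batch_gd` averages the gradient over the whole training set before each update, while `sgd` updates after every individual sample, visiting the samples in a shuffled order each epoch. The data, learning rate and epoch count are illustrative assumptions; updating after a small subset of samples (mini-batch) would sit between the two.

```python
import random

# Toy training set lying exactly on y = 2x + 1 (illustrative data)
DATA = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]


def grad_single(w, b, x, y):
    """Gradient of the squared error (w*x + b - y)**2 with respect to w and b."""
    err = w * x + b - y
    return 2 * err * x, 2 * err


def batch_gd(data, lr=0.05, epochs=500):
    """One update per pass: average the gradient over every sample first."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        gw = gb = 0.0
        for x, y in data:
            dw, db = grad_single(w, b, x, y)
            gw += dw
            gb += db
        w -= lr * gw / len(data)
        b -= lr * gb / len(data)
    return w, b


def sgd(data, lr=0.05, epochs=500, seed=0):
    """One update per sample, visiting samples in a shuffled order each epoch."""
    rng = random.Random(seed)
    samples = list(data)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        rng.shuffle(samples)
        for x, y in samples:
            dw, db = grad_single(w, b, x, y)
            w -= lr * dw
            b -= lr * db
    return w, b


print("batch:", batch_gd(DATA))
print("sgd:  ", sgd(DATA))
```

On this noise-free data both variants should end up close to w = 2 and b = 1. On real, noisy data the per-sample updates of SGD keep the parameters fluctuating around the minimum unless the learning rate is reduced over time.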

With this basic understanding, let us now look at how popular ML packages like TensorFlow and PyTorch implement Gradient Descent.

Consider the simplest example that illustrates the usage of TensorFlow's GradientDescentOptimizer class (tf.train.GradientDescentOptimizer in TensorFlow 1.x, available as tf.compat.v1.train.GradientDescentOptimizer in TensorFlow 2). The highlighted part is where the GradientDescentOptimizer is invoked. It is called with a learning rate (step size) of 0.01, a common starting value rather than a fixed standard. The minimize() function minimises the value of the variable error, which is defined as the squared difference between the actual and predicted values.

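The minimize() function is a combination of two functions:

• compute_gradients() : This is the first part of minimize(). It returns a list of (gradient, variable) pairs.

• apply_gradients() : This is the second part of minimize(). It returns an Operation that applies gradients.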
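The division of work between these two functions can be mimicked in plain Python. The sketch below is not TensorFlow code: the class ToyGradientDescentOptimizer, the hand-written gradient function and the toy data are hypothetical stand-ins that only borrow the method names. Here compute_gradients returns a list of (gradient, variable name) pairs, apply_gradients moves each variable one step against its gradient, and minimize simply calls one after the other. The loss is the sum of squared differences between actual and predicted values, and the learning rate is 0.01.

```python
# Hypothetical toy optimiser: mimics the *shape* of TensorFlow 1.x's
# minimize() = compute_gradients() + apply_gradients(); it is not TF code.

XS = [1.0, 2.0, 3.0, 4.0]
YS = [3.0, 5.0, 7.0, 9.0]          # toy targets generated from y = 2x + 1


def loss(params):
    """Sum of squared differences between actual and predicted values."""
    return sum((params["w"] * x + params["b"] - y) ** 2 for x, y in zip(XS, YS))


def loss_gradients(params):
    """Hand-derived partial derivatives of the loss with respect to w and b."""
    errs = [params["w"] * x + params["b"] - y for x, y in zip(XS, YS)]
    return {
        "w": sum(2 * e * x for e, x in zip(errs, XS)),
        "b": sum(2 * e for e in errs),
    }


class ToyGradientDescentOptimizer:
    def __init__(self, learning_rate):
        self.learning_rate = learning_rate

    def compute_gradients(self, grad_fn, params):
        """First part: return a list of (gradient, variable_name) pairs."""
        grads = grad_fn(params)
        return [(grads[name], name) for name in params]

    def apply_gradients(self, grads_and_vars, params):
        """Second part: move each variable one step against its gradient."""
        for grad, name in grads_and_vars:
            params[name] -= self.learning_rate * grad

    def minimize(self, grad_fn, params):
        """Convenience wrapper: compute the gradients, then apply them."""
        self.apply_gradients(self.compute_gradients(grad_fn, params), params)


params = {"w": 0.0, "b": 0.0}
optimizer = ToyGradientDescentOptimizer(learning_rate=0.01)
for _ in range(2000):
    optimizer.minimize(loss_gradients, params)
print(params, loss(params))
```

Keeping the two halves separate is what lets a training loop inspect or modify the (gradient, variable) pairs, for example to clip them, before they are applied. In TensorFlow itself the gradients are derived automatically from the loss rather than written by hand.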

PyTorch uses the class torch.optim.SGD to implement Stochastic Gradient Descent.

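Consider the following illustration. The lr parameter stands for the learning rate, or step size, of Gradient Descent, and model.parameters() returns the model's learnable parameters, which the optimiser updates as it learns from the data. The function optimizer.zero_grad() sets the gradient buffers to zero once in every training iteration. This is needed because PyTorch accumulates gradients across calls to backward(), so without the reset the gradients computed on the previous data batch would be added to the new ones.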
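In PyTorch a training iteration typically calls optimizer.zero_grad(), then loss.backward(), then optimizer.step(). To show why the reset matters, the sketch below imitates the accumulate-then-step behaviour in plain Python so it runs without PyTorch installed. ToyParam, ToySGD and backward are hypothetical stand-ins, not the torch API, and the model (a single weight w fitted to one example from y = 3x) is an illustrative assumption.

```python
class ToyParam:
    """Hypothetical stand-in for a tensor parameter with a .grad buffer."""

    def __init__(self, value):
        self.value = value
        self.grad = 0.0


class ToySGD:
    """Hypothetical stand-in for an SGD optimiser over a list of parameters."""

    def __init__(self, params, lr):
        self.params = list(params)
        self.lr = lr

    def zero_grad(self):
        for p in self.params:
            p.grad = 0.0

    def step(self):
        for p in self.params:
            p.value -= self.lr * p.grad


def backward(w, x, y):
    """Add d/dw of (w*x - y)**2 to w.grad, accumulating like backward() does."""
    w.grad += 2 * (w.value * x - y) * x


def train(reset_each_step, steps=50, lr=0.05):
    w = ToyParam(0.0)
    optimizer = ToySGD([w], lr=lr)
    x, y = 1.0, 3.0                 # one training example from y = 3x
    for _ in range(steps):
        if reset_each_step:
            optimizer.zero_grad()   # clear the gradient left by the last step
        backward(w, x, y)
        optimizer.step()
    return w.value


print("with zero_grad:   ", train(True))
print("without zero_grad:", train(False))
```

With the reset, w moves steadily towards 3. Without it, every stale gradient keeps contributing to each new step, and in this toy case the weight keeps swinging back and forth around 3 instead of settling.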

A Computer Science Engineer turned Data Scientist who is passionate about AI and all related technologies. Contact: amal.nair@analyticsindiamag.com