While building a deep learning model, there are many different things we need to define. First, we build the model: an input layer, followed by one or more dense layers, and finally the output layer. Once the model structure is ready, we compile it by specifying an optimizer, a loss function, and a metric to measure the model's performance. Loss functions compute the error between the actual and predicted values, whereas optimizers adjust the model's weights to reduce this error so that the model performs better and gives good results. There are several loss functions and optimizers, each suited to different situations. Optimizers are implementations of optimization algorithms.

In this article, we will discuss optimizers in more detail, focusing on one of the most commonly used, gradient descent. We will explore how it works and implement it in Python.

**What will we learn from this article?**

- What are Optimizers?
- What is Gradient Descent? How does it work?
- How to implement gradient descent in Python?
- How to use it while compiling a deep learning model?

**What are Optimizers?**

Optimizers are algorithms that update a model's parameters (its weights and biases) to reduce the loss, that is, the error the deep learning model makes. The lower the error, the better the model performs. Several types of optimizers can be used when compiling a model, including gradient descent, stochastic gradient descent (SGD), and Adam. All of them aim to improve the model's performance by minimizing the loss. They are specified after the model structure has been defined, when the model is compiled. Refer to the below code to understand more about defining these.

`model.compile(loss='binary_crossentropy', optimizer='sgd', metrics=['accuracy'])`
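To make the optimizer's job concrete without depending on Keras, here is a minimal sketch in plain Python. It is our own illustration, not what Keras does internally: a one-weight model `y_pred = w * x` is fitted to hypothetical toy data generated from `y = 2x`, and a bare gradient-descent update plays the role of the optimizer. The data, the helper names `mse_loss` and `mse_grad`, and the learning rate are assumptions chosen for illustration.

```python
# Hypothetical toy data generated from y = 2x (so the ideal weight is 2.0)
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

def mse_loss(w):
    """Mean squared error of the one-weight model y_pred = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def mse_grad(w):
    """Derivative of mse_loss with respect to w: mean of 2 * (w*x - y) * x."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

w = 0.0      # initial weight
lr = 0.01    # learning rate (assumed value)
losses = [mse_loss(w)]
for step in range(100):
    w -= lr * mse_grad(w)        # optimizer step: move w against the gradient
    losses.append(mse_loss(w))   # track the loss after each update

print(f"w = {w:.4f}, loss went from {losses[0]:.2f} to {losses[-1]:.2e}")
```

The loss shrinks at every step and `w` approaches 2, which is exactly what "the optimizer reduces the loss" means in practice.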

**What is Gradient Descent? How does it work?**

Gradient descent is one of the most widely used algorithms for optimizing deep learning models, and popular optimizers such as SGD and Adam build on it. It finds a minimum of a function (here, the loss) iteratively: at each step it computes the derivative (gradient) of the function at the current point and takes a step in the opposite direction, downhill. In symbols, each update is `x_new = x - lr * f'(x)`. The size of each step is controlled by the learning rate `lr`. If the learning rate is too large, the updates can overshoot the minimum and may even diverge, so gradient descent misses the target; if it is too small, the algorithm needs many steps and takes a long time to reach the minimum.
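The effect of the learning rate is easy to see on a simple function. The sketch below is our own illustration: the function f(x) = x² (derivative 2x), the starting point 5, and the three learning rates are arbitrary choices. It runs 20 gradient-descent steps with a small, a moderate, and an overly large learning rate.

```python
def descend(lr, x0=5.0, steps=20):
    """Run gradient descent on f(x) = x**2 (derivative 2x) and return all iterates."""
    xs = [x0]
    x = x0
    for _ in range(steps):
        x = x - lr * 2 * x   # update: x_new = x - lr * f'(x)
        xs.append(x)
    return xs

for lr in (0.001, 0.1, 1.1):
    path = descend(lr)
    print(f"lr={lr}: first step -> {path[1]:.4g}, after 20 steps -> {path[-1]:.4g}")
```

With `lr=0.001` the iterate has barely moved from 5 after 20 steps, with `lr=0.1` it ends close to the minimum at 0, and with `lr=1.1` each step jumps to the other side of the minimum and lands farther away than before, so the iterates diverge.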

The derivative tells us how the loss (or cost) function changes when a weight changes: if the derivative is positive, increasing the weight increases the loss, so the weight should be decreased; if it is negative, the weight should be increased. This is how optimization algorithms decide in which direction to update each weight. A neural network cannot be trained without a loss function to minimize and an optimizer to minimize it, which is why both are specified when compiling a deep learning model.
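The sign rule can be checked numerically. This sketch rests on our own assumptions: a hypothetical one-weight loss `loss(w) = (w - 3)**2` with its minimum at w = 3, and a central finite difference (`numerical_derivative`, a helper defined here) in place of an analytic derivative. It estimates the derivative at a few weights and reports which way gradient descent would move each one.

```python
def loss(w):
    # Hypothetical one-weight loss with its minimum at w = 3
    return (w - 3) ** 2

def numerical_derivative(fn, w, h=1e-6):
    # Central finite-difference approximation of fn'(w)
    return (fn(w + h) - fn(w - h)) / (2 * h)

for w in (0.0, 3.0, 5.0):
    d = numerical_derivative(loss, w)
    if d > 1e-6:
        direction = "decrease w"          # loss rises as w rises
    elif d < -1e-6:
        direction = "increase w"          # loss falls as w rises
    else:
        direction = "stay (at the minimum)"
    print(f"w={w}: dL/dw ~ {d:.3f} -> {direction}")
```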

**How to implement Gradient Descent in Python?**

Now we will see how gradient descent can be implemented in Python. We will start by importing the libraries used for numerical calculations and for plotting the graphs. Refer to the below code for the same.

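```python
import numpy as np
import matplotlib.pyplot as plt
```

Now we will define the function f as a quadratic, `f(x) = a[2]*x**2 + a[1]*x + a[0]`, together with a function that computes its gradient, `f'(x) = 2*a[2]*x + a[1]`. Refer to the below code for the same.

```python
def funct(x, a):
    # Quadratic f(x) = a[2]*x^2 + a[1]*x + a[0]
    f = a[2]*x*x + a[1]*x + a[0]
    return f

def grad(x, a):
    # Gradient (derivative) f'(x) = 2*a[2]*x + a[1]
    g = 2*a[2]*x + a[1]
    return g
```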
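Before relying on `grad`, it is worth confirming that it really is the derivative of the quadratic. A common sanity check, often called gradient checking, compares the analytic gradient with a central finite-difference estimate. The sketch below is self-contained, so it repeats the two functions; the test points, the step `h`, and the tolerance are arbitrary choices, and the coefficients `a = [-3, -2, 3]` are the ones used for the plot in this article.

```python
import numpy as np

def funct(x, a):
    # Quadratic f(x) = a[2]*x^2 + a[1]*x + a[0]
    return a[2] * x * x + a[1] * x + a[0]

def grad(x, a):
    # Analytic derivative f'(x) = 2*a[2]*x + a[1]
    return 2 * a[2] * x + a[1]

a = np.array([-3, -2, 3])
h = 1e-5
xs = np.linspace(-3, 6, 10)   # test points spanning the plotted range
numeric = (funct(xs + h, a) - funct(xs - h, a)) / (2 * h)   # central differences
analytic = grad(xs, a)

print(np.max(np.abs(numeric - analytic)))   # tiny: only floating-point round-off
print(np.allclose(numeric, analytic, atol=1e-4))
```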

Before computing its minimum, let's plot the function. Use the below code to do the same.

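```python
# Sample points on the x-axis and coefficients a = [a0, a1, a2]
x = np.array([-3, -2, -1, 0, 1, 2, 3, 4, 5, 6])
a = np.array([-3, -2, 3])   # f(x) = 3x^2 - 2x - 3
f = funct(x, a)
plt.scatter(x, f)
plt.plot(x, f)
plt.xlabel('X')
plt.ylabel('f(X)')
plt.show()
```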
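Because f is a simple quadratic, its minimum can also be found analytically, which gives a target to compare the gradient-descent result against. Setting the derivative to zero with the coefficients a[0] = -3, a[1] = -2, a[2] = 3:

```latex
\begin{aligned}
f(x) &= 3x^2 - 2x - 3, \qquad f'(x) = 6x - 2,\\
f'(x^*) &= 0 \;\Rightarrow\; x^* = -\frac{a_1}{2a_2} = \frac{2}{6} = \frac{1}{3} \approx 0.333,\\
f(x^*) &= 3\cdot\frac{1}{9} - 2\cdot\frac{1}{3} - 3 = -\frac{10}{3} \approx -3.333.
\end{aligned}
```

Since a[2] = 3 > 0, the parabola opens upward, so x* = 1/3 is its minimum.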

The plot shows x on the X-axis and f(x) on the Y-axis. Now let's use gradient descent to find the minimum. We first define the starting point, the learning rate, and the stopping criteria: a tolerance on how much x changes in one step (if the change is smaller, the algorithm stops) and a maximum number of iterations.

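```python
x = 8                # starting point
lr = 0.001           # learning rate (step size)
change = 1e-5        # stop if x changes by less than this in one step
max_iteration = 500  # stop after this many iterations
```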

We define the variable `series` to record how the value of x changes. Inside the loop, we evaluate the function f at the current point x, compute its gradient, and obtain the new value of x by subtracting the product of the learning rate and the gradient from the current x. The loop stops when x changes by less than `change` in one step or when the number of iterations exceeds `max_iteration`. Every `max_iteration // 10` iterations, the current point is plotted so we can follow its progress. Refer to the below code for the same.

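```python
series = [x]   # record every value x takes
iterations = 1
while True:
    f = funct(x, a)
    g = grad(x, a)
    new_x = x - lr * g              # step against the gradient
    if abs(new_x - x) < change:     # x has (almost) stopped changing
        break
    if iterations > max_iteration:  # iteration budget used up
        break
    if iterations % (max_iteration // 10) == 0:
        plt.scatter(x, f, marker='*')   # mark progress every max_iteration // 10 iterations
    iterations += 1
    x = new_x
    series.append(x)
series = np.array(series)
plt.xlabel('X')
plt.ylabel('f(X)')
plt.show()
```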

Now let us check the smallest value of x reached after the iterations by printing the minimum of the `series` we recorded.

`print(series.min())`
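It can help to compare the printed value with the analytic minimum x* = 1/3 of f(x) = 3x² - 2x - 3. The sketch below wraps the same update rule and stopping tests in a function (the name `gradient_descent` and its parameters are our own, not a library API) and runs it with the article's settings and with a larger learning rate of 0.1, an arbitrary value chosen for comparison. Plotting is left out.

```python
def gradient_descent(grad_fn, x0, lr, change=1e-5, max_iteration=500):
    """Minimal 1-D gradient descent mirroring the loop above.

    Returns the list of visited x values. Stops when an update would move x
    by less than `change`, or after `max_iteration` updates.
    """
    x = x0
    series = [x]
    for _ in range(max_iteration):
        new_x = x - lr * grad_fn(x)
        if abs(new_x - x) < change:
            break
        x = new_x
        series.append(x)
    return series

a = [-3, -2, 3]                 # f(x) = 3x^2 - 2x - 3

def grad(x):
    return 2 * a[2] * x + a[1]  # f'(x) = 6x - 2

x_star = -a[1] / (2 * a[2])     # analytic minimum, 1/3

slow = gradient_descent(grad, x0=8, lr=0.001)
fast = gradient_descent(grad, x0=8, lr=0.1)
print(f"lr=0.001: {len(slow) - 1} updates, x = {slow[-1]:.4f}")
print(f"lr=0.1:   {len(fast) - 1} updates, x = {fast[-1]:.6f}")
print(f"analytic minimum: {x_star:.6f}")
```

With the article's settings the loop uses all 500 updates and stops near x ≈ 0.71, still some way from x* ≈ 0.333: since f'(x) = 6(x - x*), each update shrinks the distance to the minimum by the factor 1 - 0.001 × 6 = 0.994, and 0.994^500 is only about 0.05. With lr = 0.1 the factor is 0.4, and the change tolerance is met after about 15 updates, essentially at x*. This is the learning-rate trade-off in action.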

**Conclusion**

This article showed how a neural network is compiled by specifying a loss function and an optimizer. We also discussed what gradient descent is and how it works, and finished with a Python implementation of it. In practice, you do not need to write this code yourself: optimizers such as SGD and Adam are already implemented in Keras and can be imported from it or passed by name to `model.compile()`, as shown in the first section. Different optimizers can be used to train a neural network, and the choice of optimizer can change the model's performance.
