Gradient Descent – Everything You Need To Know With Implementation In Python

In this article, we discuss optimizers and, in particular, the most commonly used one, gradient descent. We explore how it works and walk through its implementation in Python.

While building a deep learning model, there are a lot of different things we need to define. First we build the model, with an input layer followed by dense layers and finally the output layer. Once the model structure is ready, we compile it with an optimizer, a loss function, and a metric to measure the performance of the model. Loss functions compute the error between the actual value and the predicted value, whereas optimizers reduce this error so that the model performs better. There are several loss functions and optimizers, each suited to different situations. Optimizers are built on optimization algorithms.

What will we learn from this article?

  • What are Optimizers?
  • What is Gradient Descent? How does it work?
  • How to implement gradient descent in Python?
  • How to use it while compiling a deep learning model?
  1. What are Optimizers? 

Optimizers are used to reduce the loss, that is, the error made by a deep learning model. The lower the error, the better the model performs. Several types of optimizers can be used when compiling a model, including gradient descent, stochastic gradient descent, Adam, and others. All of them serve to optimize the performance of the model. They are usually specified after the model structure has been defined. Refer to the below code to see how an optimizer is set.

model.compile(loss='binary_crossentropy', optimizer='sgd', metrics=['accuracy'])
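
To make the loss argument above more concrete, here is a minimal pure-Python sketch of what a binary cross-entropy loss measures. It is an illustration only, not the Keras implementation: the helper name `binary_crossentropy`, its `eps` clipping parameter, and the sample labels and predictions are all assumptions made for this example.

```python
import math

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        # Clip predictions away from 0 and 1 so that log() stays finite.
        p = min(max(p, eps), 1.0 - eps)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(y_true)

# Hypothetical labels and two sets of predicted probabilities.
labels = [1, 0, 1, 1]
good_predictions = [0.9, 0.1, 0.8, 0.7]
poor_predictions = [0.4, 0.6, 0.3, 0.5]

good_loss = binary_crossentropy(labels, good_predictions)
poor_loss = binary_crossentropy(labels, poor_predictions)
print(good_loss, poor_loss)  # the better predictions give the smaller loss
```

The optimizer's job is to adjust the model's weights so that a number like this goes down over the course of training.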

  2. What is Gradient Descent? How does it work?

Gradient descent is among the most commonly used optimizers for deep learning models. It is an optimization algorithm that reduces the error by searching for the minimum of a function. To do so, it uses the derivative of the function: starting from some point, it repeatedly takes a step in the direction in which the function decreases. The size of each step is set by the learning rate. If the learning rate is too large, a step may overshoot the minimum, and the algorithm can keep missing the target or even move away from it. If it is very small, each step makes little progress and reaching the minimum takes a long time.
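
The effect of the learning rate can be seen in a small sketch. The example below is an illustration under assumed settings, not part of the article's implementation: it runs a fixed number of gradient descent steps on f(x) = x**2, whose minimum is at x = 0, and the function name `descend`, the starting point, and the three learning rates are chosen only for this demonstration.

```python
def descend(lr, start=5.0, steps=20):
    """Run `steps` gradient descent updates on f(x) = x**2 and return the final x."""
    x = start
    for _ in range(steps):
        gradient = 2.0 * x      # derivative of x**2
        x = x - lr * gradient   # step against the gradient
    return x

# A very small, a moderate, and a too-large learning rate.
for lr in (0.001, 0.1, 1.1):
    print(lr, descend(lr))
```

With the very small rate, x has barely moved from its starting point after 20 steps. With the moderate rate it ends close to 0. With the too-large rate every step overshoots the minimum by more than the previous one, so x ends farther from 0 than where it started.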

The role of derivatives in optimization algorithms is to decide whether each weight should be increased or decreased so that the loss function (also called the cost function) goes down. We cannot train a neural network without defining an optimizer and a loss function. Both must be set when compiling a deep learning model.
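
As a rough illustration of how the sign of the derivative decides the direction of a weight update, consider a model with a single weight that predicts w * x and is scored by squared error. Everything in this sketch (the function names, the single training example, and the learning rate) is assumed for the example and does not come from a particular library.

```python
def loss(w, x, y):
    """Squared error of a one-weight model that predicts w * x."""
    return (w * x - y) ** 2

def loss_gradient(w, x, y):
    """Derivative of the squared error with respect to w."""
    return 2.0 * (w * x - y) * x

x, y = 2.0, 6.0   # one hypothetical training example; the ideal weight is 3.0
lr = 0.05         # learning rate

def step(w):
    """One gradient descent update of the weight."""
    return w - lr * loss_gradient(w, x, y)

w_low, w_high = 1.0, 5.0
print(loss_gradient(w_low, x, y), step(w_low))    # negative gradient: the weight increases
print(loss_gradient(w_high, x, y), step(w_high))  # positive gradient: the weight decreases
```

In both cases the update moves the weight toward 3.0, and the loss after the step is lower than before it.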

  3. How to implement Gradient Descent in Python?

Now we will see how gradient descent can be implemented in Python. We start by importing the libraries used for numerical calculation and for plotting graphs. Refer to the below code for the same.

import numpy as np
import matplotlib.pyplot as plt

Now we will define a quadratic function f and a function that computes its gradient. Refer to the below code for the same.

def funct(x,a):
    # f(x) = a[2]*x^2 + a[1]*x + a[0]
    f = a[2]*x*x + a[1]*x + a[0]
    return f

def grad(x,a):
    # derivative of f with respect to x
    g = 2*a[2]*x + a[1]
    return g
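
One way to gain confidence in a hand-written gradient is to compare it with a finite-difference estimate. The sketch below restates funct and grad so that it runs on its own. The helper `numerical_grad`, its step size `h`, and the test points are assumptions added for this check, and the coefficients are the ones the article uses later.

```python
def funct(x, a):
    """Quadratic f(x) = a[2]*x^2 + a[1]*x + a[0]."""
    return a[2] * x * x + a[1] * x + a[0]

def grad(x, a):
    """Analytic derivative of funct with respect to x."""
    return 2 * a[2] * x + a[1]

def numerical_grad(x, a, h=1e-6):
    """Central-difference estimate of the derivative of funct at x."""
    return (funct(x + h, a) - funct(x - h, a)) / (2 * h)

a = [-3, -2, 3]  # f(x) = 3x^2 - 2x - 3
for x in (-2.0, 0.0, 8.0):
    # The two derivative values should agree closely at every point.
    print(x, grad(x, a), numerical_grad(x, a))

# For a quadratic with a[2] > 0, the gradient is zero at x = -a[1] / (2*a[2]).
x_min = -a[1] / (2 * a[2])
print(x_min, grad(x_min, a))
```

Setting the gradient to zero also gives the exact minimiser of this quadratic, x = 1/3, which is a useful reference value for judging the result of gradient descent.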

Now we will plot this function before we compute its minimum. Use the below code to do the same.

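x = np.array([-3,-2,-1,0,1,2,3,4,5,6])
a = np.array([-3, -2, 3])  # a[0], a[1], a[2], so f(x) = 3x^2 - 2x - 3
f = funct(x,a)
plt.plot(x, f)  # x on the horizontal axis, f(x) on the vertical axis
plt.show()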

The plot has the values of x on the x-axis and f(x) on the y-axis. Now let's define how to use gradient descent to find the minimum. We first define the starting point, the learning rate, and the stopping parameters: a maximum number of iterations, and a threshold on the change in x below which the loop should stop. Use the below code for the same.

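x = 8                # starting point
lr = 0.001           # learning rate
change = 1e-5        # stop when x changes by less than this
max_iteration = 500  # stop after this many iterations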

We keep a variable, series, to track how the value of x changes. Inside the loop, we evaluate the function f at the current point (x, a), compute its gradient, and obtain the new value of x by subtracting the product of the learning rate and the gradient from the current value of x. The loop stops when the change in x falls below the change threshold or when the number of iterations exceeds max_iteration, both of which were defined previously. Along the way, we plot the current point at regular intervals. Refer to the below code for the same.

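series = [x]      # every value of x visited, starting point included
iterations = 1
while True:
    f = funct(x,a)
    g = grad(x,a)
    new_x = x - lr * g    # step against the gradient
    if np.sum(abs(new_x - x)) < change:
        break             # x has effectively stopped changing
    if iterations > max_iteration:
        break             # iteration budget used up
    if iterations % (max_iteration/10) == 0:
        plt.scatter(x, f, marker='*')    # mark the current point
    iterations += 1
    x = new_x
    series = np.concatenate((series,[x]))
plt.show()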
Now let us see the smallest value of x reached after the iterations. We check this by printing the minimum of the series we defined before.
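
The code for this step is not shown, so the following is a hedged reconstruction rather than the original listing. It wraps the loop described above in a hypothetical helper, `gradient_descent`, written in plain Python so that it runs on its own (no plotting), and then prints the smallest value in the series alongside the exact minimiser of the quadratic.

```python
def funct(x, a):
    """Quadratic f(x) = a[2]*x^2 + a[1]*x + a[0]."""
    return a[2] * x * x + a[1] * x + a[0]

def grad(x, a):
    """Derivative of funct with respect to x."""
    return 2 * a[2] * x + a[1]

def gradient_descent(start, a, lr=0.001, change=1e-5, max_iteration=500):
    """Return the list of x values visited, using the article's stopping rules."""
    x = start
    series = [x]
    iterations = 1
    while True:
        new_x = x - lr * grad(x, a)
        if abs(new_x - x) < change:      # x has effectively stopped changing
            break
        if iterations > max_iteration:   # iteration budget used up
            break
        iterations += 1
        x = new_x
        series.append(x)
    return series

a = [-3, -2, 3]
series = gradient_descent(start=8, a=a)
print("smallest x visited:", min(series))
print("f at that x:", funct(min(series), a))
print("exact minimiser:", -a[1] / (2 * a[2]))
```

With the article's settings (learning rate 0.001 and at most 500 iterations), this sketch stops at the iteration limit while x is still some distance above the exact minimiser 1/3. Raising the learning rate, for example to 0.1, or allowing more iterations brings the result much closer.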



This article aimed to demonstrate how we compile a neural network by defining a loss function and an optimizer. We also discussed what gradient descent is and how it works, and finally implemented it in Python. Although we wrote our own implementation, we do not have to use this code in practice. These optimizers are already defined in Keras and can be used directly, as shown in section 1. Different optimizers can be used while training a neural network, and the performance of the model changes with the optimizer chosen.

Also, check this article on monitoring the loss and accuracy while training a deep learning model: “Tensorboard Tutorial – Visualize the Model Performance During Training”

Rohit Dwivedi
I am currently enrolled in a Post Graduate Program in Artificial Intelligence and Machine Learning. I am a data science enthusiast who likes to draw insights from data and who is always amazed by the intelligence of AI. It's really fascinating to teach a machine to see and understand images, and the interest doubles when the machine can tell you what it just saw. That is why I am highly interested in Computer Vision and Natural Language Processing. I love exploring the different use cases that can be built with the power of AI. I am the person who first develops something and then explains it to the whole community through my writing.
