Gradient Ascent: When to use it in machine learning? 

Gradient ascent maximizes an objective function, such as a model's log-likelihood

In gradient descent, to find a local minimum of a function, one takes steps proportional to the negative of the function's gradient (or an approximation of it) at the current point. If one instead takes steps proportional to the positive gradient, one approaches a local maximum of the function; this method is known as gradient ascent. This article explains gradient ascent (GA) and covers the following topics.
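Written as update rules, with a step size η > 0 (the symbols here are our own notation, not the article's), the two methods differ only in the sign of the step:

```latex
\begin{aligned}
\text{gradient descent:}\quad & x_{k+1} = x_k - \eta\,\nabla f(x_k) \\
\text{gradient ascent:}\quad  & x_{k+1} = x_k + \eta\,\nabla f(x_k)
\end{aligned}
```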

Table of contents

  1. Mathematics behind the Gradient Ascent
  2. When to use gradient ascent
  3. Implementing gradient ascent in logistic regression

Unlike gradient descent, which minimizes a loss, gradient ascent maximizes an objective such as a likelihood. Let's start with the mathematics behind GA.


Mathematics behind the Gradient Ascent

Gradient ascent locates a maximum of a function by repeatedly moving in the direction of the function's gradient.

For a differentiable function of two variables f(x, y), the gradient is the vector of its partial derivatives, ∇f = (∂f/∂x, ∂f/∂y). The partial derivative with respect to x measures how fast f changes in the x-direction, and the partial derivative with respect to y measures how fast it changes in the y-direction.
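As a quick illustration, the sketch below computes both partial derivatives of an example function f(x, y) = -(x - 1)² - (y + 2)², chosen here for illustration and not taken from the article. It does so once analytically and once with central differences, where each derivative nudges one coordinate while holding the other fixed. The helper names are hypothetical.

```python
import numpy as np


def f(x, y):
    # Assumed example function with a single peak at (1, -2)
    return -(x - 1) ** 2 - (y + 2) ** 2


def grad_f(x, y):
    # Analytic partial derivatives (df/dx, df/dy)
    return np.array([-2 * (x - 1), -2 * (y + 2)])


def numerical_grad(func, x, y, h=1e-6):
    # Central differences: vary only x for df/dx, then only y for df/dy
    dfdx = (func(x + h, y) - func(x - h, y)) / (2 * h)
    dfdy = (func(x, y + h) - func(x, y - h)) / (2 * h)
    return np.array([dfdx, dfdy])


print(grad_f(0.0, 0.0))             # analytic gradient at (0, 0): (2, -4)
print(numerical_grad(f, 0.0, 0.0))  # numerical estimate, approximately the same
```

At (0, 0) the gradient points toward larger x and smaller y, which is exactly the direction of the peak at (1, -2).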

The function must be defined and differentiable in the neighbourhood of the points where it is evaluated. Each step of gradient ascent moves in the direction of the gradient, which always points in the direction of steepest increase. The length of each step is controlled by a parameter, the step size (or learning rate), which multiplies the gradient. The steps are repeated until a set number of iterations has been reached or the change falls within a given tolerance.


The gradient is first evaluated at the starting point P0, and the method moves to the next point, P1. The gradient is then re-evaluated at P1 and the method moves on to P2. This loop continues until a stopping condition is met. Because the gradient points in the direction of steepest local increase, each step moves in the locally best direction.
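The P0 → P1 → P2 loop can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the gradient_ascent helper, the example function (peak at (1, -2)), the learning rate of 0.1 and the tolerance are illustrative choices, not the article's code. The function returns every visited point, so the list holds P0, P1, P2 and so on.

```python
import numpy as np


def gradient_ascent(grad, p0, learning_rate=0.1, max_steps=1000, tol=1e-8):
    """Climb from p0 along the gradient; return all visited points P0, P1, ..."""
    points = [np.asarray(p0, dtype=float)]
    for _ in range(max_steps):          # stopping condition 1: step budget
        p = points[-1]
        step = learning_rate * grad(p)  # move *with* the gradient (ascent)
        points.append(p + step)
        if np.linalg.norm(step) < tol:  # stopping condition 2: tolerance
            break
    return points


# Assumed example: f(x, y) = -(x - 1)^2 - (y + 2)^2, maximized at (1, -2)
def grad_f(p):
    x, y = p
    return np.array([-2 * (x - 1), -2 * (y + 2)])


path = gradient_ascent(grad_f, p0=[0.0, 0.0])
print(path[0], path[1], path[2])  # P0, P1, P2
print(len(path) - 1, "steps; final point", path[-1])
```

With these settings P1 = (0.2, -0.4) and P2 = (0.36, -0.72); the steps shrink as the gradient flattens near the peak, and the loop stops once a step is shorter than the tolerance.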


When to use gradient ascent

Gradient ascent works like gradient descent, with one exception: its objective is to maximize a function rather than minimize it. The distinction matters because some problems are naturally stated as maximization, for example maximizing the distance between a separating hyperplane and the nearest observations, or maximizing the likelihood of a model's parameters.

In this sense, for every function "f" minimized with gradient descent there is a mirrored function "-f" that can be maximized with gradient ascent, and with the same step size both take identical steps.

This means that any problem solved with gradient descent can also be solved with gradient ascent by negating the objective, that is, by reflecting its graph across the axis of the independent variable.
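To make the mirroring concrete, here is a small sketch (the example function and the ascend/descend helpers are assumptions for illustration) showing that gradient ascent on f(x) = -(x - 3)² and gradient descent on -f(x) = (x - 3)² produce the same iterates from the same starting point.

```python
def ascend(grad, x0, lr=0.1, steps=50):
    x = float(x0)
    for _ in range(steps):
        x = x + lr * grad(x)  # gradient ascent step
    return x


def descend(grad, x0, lr=0.1, steps=50):
    x = float(x0)
    for _ in range(steps):
        x = x - lr * grad(x)  # gradient descent step
    return x


def grad_f(x):
    # Gradient of f(x) = -(x - 3)^2, which has its maximum at x = 3
    return -2 * (x - 3)


def grad_neg_f(x):
    # Gradient of -f(x) = (x - 3)^2, which has its minimum at x = 3
    return 2 * (x - 3)


x_max = ascend(grad_f, x0=0.0)
x_min = descend(grad_neg_f, x0=0.0)
print(x_max, x_min)  # the same number, close to 3
```

Since x + η·(-g) and x - η·g are the same update, the two runs agree at every step, not just at the end.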

Implementing gradient ascent in logistic regression

In this article, we use gradient ascent to fit a logistic regression model on a social network advertising dataset. The model predicts whether a customer purchases the product based on different features, such as their gender.

For this purpose, we have to build a custom logistic regression algorithm. Let’s start with importing the necessary libraries.

import numpy as np
import pandas as pd

Reading and preprocessing the data

data=pd.read_csv('Social_Network_Ads.csv')
data[:5]
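from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.model_selection import train_test_split
enc = LabelEncoder()
data['Gender_enc'] = enc.fit_transform(data['Gender'])  # encode Gender as 0/1
X = data.drop(['Gender', 'Purchased', 'User ID'], axis=1)
y = data['Purchased'].to_numpy()
# Hold out a test set (the split ratio and seed are assumed values)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
# Standardize features so gradient steps act on comparable scales (fit on training data only)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)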

Custom algorithm

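    def sigmoid(self, inX):
        # Logistic function: maps any real score to a probability in (0, 1)
        return 1 / (1 + np.exp(-inX))
    
    def log_likelihood(self, y_true, y_pred):
        # Clip probabilities into [eps, 1 - eps] so np.log stays finite
        y_pred = np.clip(y_pred, self.eps, 1 - self.eps)
        likelihood = y_true*np.log(y_pred) + (1 - y_true)*np.log(1 - y_pred)
        return np.mean(likelihood)
    
    def fit(self, X, y):
        # Plain NumPy arrays avoid pandas index alignment in the arithmetic below
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        n = X.shape[1]
        self.weights = np.zeros(n)  # one weight per feature, starting at 0
 
        for i in range(self.max_iterations):
            # Probabilities from the dot product of features and weights
            y_pred = self.sigmoid(np.dot(X, self.weights))
            # Gradient of the mean log-likelihood with respect to the weights
            gradient = np.mean((y - y_pred)*X.T, axis=1)
            # Ascent step: move along the gradient to increase the likelihood
            self.weights += self.learning_rate*gradient
            self.likelihoods.append(self.log_likelihood(y, y_pred))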

The sigmoid function converts each data point's linear score into a probability between 0 and 1, and these probabilities are used to compute the gradient. The mean log-likelihood is the objective (cost) function of the algorithm, and the goal of gradient ascent here is to maximize it.
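For reference, the objective that log_likelihood computes and the gradient used in fit can be written as follows, with m training rows, feature vectors x_i, labels y_i ∈ {0, 1}, weights w, learning rate η and σ the sigmoid (the notation is ours). The gradient follows from σ'(z) = σ(z)(1 - σ(z)):

```latex
\begin{aligned}
\ell(w) &= \frac{1}{m}\sum_{i=1}^{m}\Big[\,y_i \log \sigma(w^\top x_i) + (1 - y_i)\log\big(1 - \sigma(w^\top x_i)\big)\Big],
\qquad \sigma(z) = \frac{1}{1 + e^{-z}} \\
\nabla_w \ell(w) &= \frac{1}{m}\sum_{i=1}^{m}\big(y_i - \sigma(w^\top x_i)\big)\,x_i \\
w &\leftarrow w + \eta\,\nabla_w \ell(w)
\end{aligned}
```

The second line is what np.mean((y - y_pred)*X.T, axis=1) computes, and the plus sign in the update is what makes this ascent rather than descent.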

In the fit function, the weights are initialized as an array of zeros with one entry per column of the independent variables. Predictions are obtained by applying the sigmoid to the dot product of the features and the weights. The gradient, scaled by the learning rate, is then added to the weights, and the log-likelihood of the current predictions is stored in likelihoods so the progress of the ascent can be inspected.
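One way to convince yourself that the gradient expression in fit is right is a numerical gradient check. The sketch below is standalone and uses small synthetic data (an assumption, not the article's dataset); the helper names are hypothetical. It compares the (y - y_pred)·x formula with central differences of the mean log-likelihood, then confirms that a small step along the gradient increases the likelihood.

```python
import numpy as np


def mean_log_likelihood(w, X, y):
    p = 1 / (1 + np.exp(-X @ w))
    return np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))


def analytic_gradient(w, X, y):
    # Same expression as in fit(): mean of (y - y_pred) * x over the rows
    p = 1 / (1 + np.exp(-X @ w))
    return np.mean((y - p) * X.T, axis=1)


def numeric_gradient(w, X, y, h=1e-6):
    # Central difference on each weight in turn
    grad = np.zeros_like(w)
    for j in range(w.size):
        e = np.zeros_like(w)
        e[j] = h
        grad[j] = (mean_log_likelihood(w + e, X, y)
                   - mean_log_likelihood(w - e, X, y)) / (2 * h)
    return grad


# Small synthetic problem (assumed data for illustration)
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = (rng.random(50) < 0.5).astype(float)
w = rng.normal(size=3)

g = analytic_gradient(w, X, y)
print("max |analytic - numeric|:", np.max(np.abs(g - numeric_gradient(w, X, y))))

# A small step along the gradient (an ascent step) should raise the likelihood
before = mean_log_likelihood(w, X, y)
after = mean_log_likelihood(w + 0.01 * g, X, y)
print(after > before)
```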

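Because gradient ascent is an iterative method for locating a local maximum of a differentiable function, we repeat the update for a fixed number of iterations (max_iterations, 500 by default). At every iteration the weights move in the direction of the gradient of the log-likelihood; in this implementation, the only halting condition is reaching the iteration limit. Since the logistic log-likelihood is concave, a local maximum is also a global one.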
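If you want a tolerance-based halting rule in addition to the iteration cap, one possible variant is sketched below. The fit_until_converged function, the tolerance value and the synthetic data are assumptions for illustration; the article's class stops only after max_iterations. Like the class, it records the log-likelihood of the current predictions before each update.

```python
import numpy as np


def fit_until_converged(X, y, learning_rate=0.1, max_iterations=500, tol=1e-6):
    """Gradient ascent for logistic regression that also stops early once the
    mean log-likelihood improves by less than tol (an assumed extra rule)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.zeros(X.shape[1])
    eps = 1e-7
    history = []
    for _ in range(max_iterations):
        p = 1 / (1 + np.exp(-X @ w))
        p_safe = np.clip(p, eps, 1 - eps)  # clipped only for the log
        history.append(np.mean(y * np.log(p_safe) + (1 - y) * np.log(1 - p_safe)))
        if len(history) > 1 and history[-1] - history[-2] < tol:
            break  # improvement too small: treat as converged
        w += learning_rate * np.mean((y - p) * X.T, axis=1)
    return w, history


# Synthetic data drawn from a known logistic model (assumed for illustration)
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
true_w = np.array([2.0, -1.0])
y = (rng.random(200) < 1 / (1 + np.exp(-X @ true_w))).astype(float)

w, history = fit_until_converged(X, y)
print(len(history), "iterations; weights", w)
```

With a small enough learning rate the recorded likelihoods never decrease, which makes "improvement below tol" a reasonable convergence signal.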

Final code

class customclassification:
    def __init__(self, learning_rate=0.01, max_iterations=500):
        self.learning_rate  = learning_rate
        self.max_iterations = max_iterations
        self.likelihoods    = []
        self.eps = 1e-7
 
    def sigmoid(self, inX):
        sig = (1/(1+np.exp(-inX))) 
        return sig
    
    def log_likelihood(self, y_true, y_pred):
        # Clip probabilities into [eps, 1 - eps] so np.log stays finite
        y_pred = np.clip(y_pred, self.eps, 1 - self.eps)
        likelihood = y_true*np.log(y_pred) + (1 - y_true)*np.log(1 - y_pred)
        return np.mean(likelihood)
    
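    def fit(self, X, y):
        # Plain NumPy arrays avoid pandas index alignment in the arithmetic below
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        n = X.shape[1]
        self.weights = np.zeros(n)  # one weight per feature, starting at 0
 
        for i in range(self.max_iterations):
            z = np.dot(X, self.weights)
            y_pred = self.sigmoid(z)
            # Gradient of the mean log-likelihood; adding it is the ascent step
            gradient = np.mean((y - y_pred)*X.T, axis=1)
            self.weights += self.learning_rate*gradient
            likelihood = self.log_likelihood(y, y_pred)
            self.likelihoods.append(likelihood)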
    
    def predict_prob(self,X):             
        z = np.dot(X,self.weights)
        probabilities = self.sigmoid(z)
        return probabilities
    
    def predict(self, X, threshold):
        # Label 1 when the predicted probability exceeds the threshold, else 0
        return (self.predict_prob(X) > threshold).astype(int)

Use custom algorithm

LogRegres = customclassification()
LogRegres.fit(X_train, y_train)
y_hat = LogRegres.predict(X_test, 0.5)
# Accuracy: percentage of test predictions equal to the true labels
accuracy = np.mean(y_hat == np.asarray(y_test)) * 100
accuracy

This accuracy could be improved further with other data wrangling techniques or by switching to stochastic gradient ascent; we leave that to you.
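As a starting point for that exercise, here is one possible sketch of stochastic gradient ascent for logistic regression. It is an illustration under assumptions (the sga_logistic helper, learning rate, epoch count and synthetic data are all ours): instead of averaging the gradient over all rows before each update, it updates the weights after every single example, visiting the rows in a shuffled order each epoch.

```python
import numpy as np


def sga_logistic(X, y, learning_rate=0.05, epochs=20, seed=0):
    """Stochastic gradient ascent: one weight update per training example."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for i in rng.permutation(len(y)):  # shuffled pass over the rows
            p_i = 1 / (1 + np.exp(-X[i] @ w))
            # Per-example gradient of the log-likelihood: (y_i - p_i) * x_i
            w += learning_rate * (y[i] - p_i) * X[i]
    return w


# Synthetic data (assumed, for illustration only)
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 2))
y = (X @ np.array([1.5, -2.0]) > 0).astype(float)

w = sga_logistic(X, y)
# A score above 0 is the same as a predicted probability above 0.5
accuracy = np.mean(((X @ w) > 0) == (y == 1)) * 100
print(w, accuracy)
```

Per-example updates are noisier than the full-batch steps in fit, but each one is cheap, which is why the stochastic variant is often preferred on large datasets.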

Conclusion

The gradient is the vector of all partial derivatives of a function at a given point. Gradient descent is suited to minimizing convex functions, and gradient ascent to maximizing concave ones. Gradient descent moves toward a local minimum, whereas gradient ascent moves toward a local maximum. Because negating the objective turns one problem into the other, either form of optimization can be used for the same problem. In this article, we covered the mathematics behind gradient ascent, when to use it, and how to apply it to logistic regression.


Sourabh Mehta
Sourabh has worked as a full-time data scientist for an ISP organisation, experienced in analysing patterns and their implementation in product development. He has a keen interest in developing solutions for real-time problems with the help of data both in this universe and metaverse.
