Gradient descent finds a local minimum of a function by taking steps proportional to the negative of the function's gradient (or an approximation of the gradient) at the current point. If the steps are instead taken proportional to the positive of the gradient, the procedure approaches a local maximum of the function; this method is known as gradient ascent. This article explains Gradient Ascent (GA). The following topics are covered.
Table of contents
- Mathematics behind the Gradient Ascent
- When to use gradient ascent
- Implementing gradient ascent in logistic regression
Gradient ascent maximizes the objective function of an algorithm, such as a likelihood, rather than minimizing a loss. Let's start with understanding the mathematics behind GA.
Mathematics behind the Gradient Ascent
Gradient ascent locates the greatest point on a function by repeatedly moving in the direction of the gradient.
For a function of two variables x and y, the gradient is the vector of the function's partial derivatives with respect to x and y. The partial derivative with respect to x gives the rate of change in the x-direction; similarly, the partial derivative with respect to y gives the rate of change in the y-direction.
The function must be defined and differentiable around the points where it is evaluated. At each point, the gradient ascent technique takes a step in the gradient's direction, because the gradient always points in the direction of steepest increase. The magnitude of the step, or step size, is controlled by a parameter (the learning rate). This step is repeated until a defined number of steps is reached or the algorithm is within a particular tolerance margin.
The gradient ascent method advances in the direction of the gradient at each step. The gradient is evaluated at the starting point P0, and the algorithm moves to the next point, P1. The gradient is then re-evaluated at P1, and the algorithm advances to P2. This loop continues until a stopping condition is fulfilled. Following the gradient ensures that each step is taken in the direction of steepest local increase.
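The step described above can be written compactly. The symbols below are notation chosen for this sketch rather than taken from the article: \theta_k is the current point, \alpha the step size, \varepsilon the tolerance and K_{\max} the step limit. Stopping on the gradient norm is one common way to express the tolerance margin, not the only one.

```latex
\nabla f(x, y) = \left( \frac{\partial f}{\partial x},\ \frac{\partial f}{\partial y} \right)

\theta_{k+1} = \theta_k + \alpha \, \nabla f(\theta_k),
\qquad \text{stop when } \lVert \nabla f(\theta_k) \rVert < \varepsilon \ \text{ or } \ k = K_{\max}
```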
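The P0, P1, P2 loop can be sketched in a few lines of Python. This is a minimal illustration, not the article's implementation: the objective `f`, its gradient `grad_f`, and the default `step_size`, `max_steps` and `tol` values are all hypothetical choices made for the example.

```python
import math

def f(x, y):
    # Hypothetical concave objective with its single maximum at (1, -2).
    return -(x - 1.0) ** 2 - (y + 2.0) ** 2

def grad_f(x, y):
    # Partial derivatives of f with respect to x and y.
    return (-2.0 * (x - 1.0), -2.0 * (y + 2.0))

def gradient_ascent(start, step_size=0.1, max_steps=1000, tol=1e-8):
    """Follow the gradient uphill from `start`; return every visited point."""
    x, y = start
    path = [(x, y)]  # P0
    for _ in range(max_steps):
        gx, gy = grad_f(x, y)
        # Stopping condition: the gradient is (almost) zero.
        if math.hypot(gx, gy) < tol:
            break
        # Step in the direction of the gradient, scaled by the step size.
        x, y = x + step_size * gx, y + step_size * gy
        path.append((x, y))  # P1, P2, ...
    return path

path = gradient_ascent(start=(0.0, 0.0))
x_max, y_max = path[-1]
print(f"stopped after {len(path) - 1} steps at ({x_max:.6f}, {y_max:.6f})")
```

With a step size that is too large the iterates can overshoot the maximum, which is why the step size is exposed as a parameter here.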
When to use gradient ascent
Gradient ascent operates in the same way as gradient descent, with one exception: its objective is to maximise a function rather than to minimise it. The distinction matters because some problems are naturally stated as maximisation; for example, maximising the distance between a separating hyperplane and the observations.
In this sense, for each function “f” on which gradient descent is applied, there is a symmetric function “-f” on which gradient ascent may be applied.
This means that a problem solved using gradient descent can also be solved using gradient ascent if we reflect the function across the axis of the independent variable, that is, replace f with -f.
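The symmetry between "f" and "-f" can be checked directly. In this small sketch the function f(x) = -(x - 3)^2 and the helper names are hypothetical, chosen only for illustration; ascent on f and descent on -f are run side by side from the same starting point.

```python
def ascent_step(x, grad, step_size):
    # Move along the gradient to increase the function.
    return x + step_size * grad(x)

def descent_step(x, grad, step_size):
    # Move against the gradient to decrease the function.
    return x - step_size * grad(x)

def grad_f(x):
    # Derivative of f(x) = -(x - 3)^2, which has its maximum at x = 3.
    return -2.0 * (x - 3.0)

def grad_neg_f(x):
    # Derivative of -f(x) = (x - 3)^2, which has its minimum at x = 3.
    return 2.0 * (x - 3.0)

x_up = x_down = 0.0
for _ in range(200):
    x_up = ascent_step(x_up, grad_f, 0.1)
    x_down = descent_step(x_down, grad_neg_f, 0.1)

print(x_up, x_down)  # both sequences follow the same path towards x = 3
```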
Implementing gradient ascent in logistic regression
For this article, we will use gradient ascent to fit a logistic regression on a social media marketing dataset. The algorithm will predict whether a customer will purchase the product based on different features.
For this purpose, we have to build a custom logistic regression algorithm. Let’s start with importing the necessary libraries.
import numpy as np
import pandas as pd
Reading and preprocessing the data
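from sklearn.preprocessing import LabelEncoder

# `data` is assumed to be a pandas DataFrame that has already been loaded
# from the dataset; the loading step is not shown here.
enc = LabelEncoder()
data['Gender_enc'] = enc.fit_transform(data['Gender'])

# Features: everything except the raw gender column, the target and the ID
X = data.drop(['Gender', 'Purchased', 'User ID'], axis=1)
y = data['Purchased']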
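def sigmoid(self, inX):
    # Logistic function: maps any real value into (0, 1)
    sig = 1 / (1 + np.exp(-inX))
    return sig

def log_likelihood(self, y_true, y_pred):
    # Clip predictions away from 0 and 1 so the logarithms stay finite
    y_pred = np.clip(y_pred, self.eps, 1 - self.eps)
    likelihood = y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred)
    return np.mean(likelihood)

def fit(self, X, y):
    # Work on plain float arrays so the arithmetic below is positional
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    m, n = X.shape  # m observations, n features
    self.weights = np.zeros(n)
    for i in range(self.max_iterations):
        z = np.dot(X, self.weights)
        y_pred = self.sigmoid(z)
        # Gradient of the mean log-likelihood with respect to the weights
        gradient = np.mean((y - y_pred) * X.T, axis=1)
        # Ascent step: move the weights along the gradient
        self.weights += self.learning_rate * gradient
        likelihood = self.log_likelihood(y, y_pred)
        self.likelihoods.append(likelihood)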
The sigmoid function is applied to all the data points at once, and its output is used to compute the gradient. The log-likelihood is the objective function for the algorithm; the goal is to maximise it using gradient ascent.
In the fit function, the weights are initialised as an array of zeros with one entry per column of the independent variables. The predictions are computed by passing the weighted sum of the features through the sigmoid function. The gradient, scaled by the learning rate, gives the update that is added to the weights at each iteration.
Gradient ascent is an iterative optimization approach for locating local maxima of a differentiable function, so we will iterate the steps for 500 cycles. The method advances in the direction of the gradient computed at each point of the objective function's curve until the stopping requirement is met.
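For reference, the quantity being maximised and its gradient can be written out. The notation (m observations x_i with labels y_i, weights w, step size \alpha) is chosen for this sketch; the formulas are the standard mean log-likelihood for logistic regression, which appears to be what log_likelihood and the gradient line in fit compute.

```latex
\sigma(z) = \frac{1}{1 + e^{-z}}

\ell(w) = \frac{1}{m} \sum_{i=1}^{m}
  \Big[ y_i \log \sigma(x_i^\top w) + (1 - y_i) \log\big(1 - \sigma(x_i^\top w)\big) \Big]

\nabla_w \ell(w) = \frac{1}{m} \sum_{i=1}^{m} \big( y_i - \sigma(x_i^\top w) \big)\, x_i

w \leftarrow w + \alpha \, \nabla_w \ell(w)
```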
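A single update can be traced by hand on a tiny example. The sketch below uses plain Python and a made-up four-row dataset (both are assumptions for illustration, not the article's data) to show zero-initialised weights, the gradient of the mean log-likelihood, and one ascent step.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(weights, X, y):
    # Mean log-likelihood of the labels under the current weights.
    total = 0.0
    for row, label in zip(X, y):
        p = sigmoid(sum(w * v for w, v in zip(weights, row)))
        total += label * math.log(p) + (1 - label) * math.log(1 - p)
    return total / len(X)

def gradient(weights, X, y):
    # Mean over observations of (label - prediction) * feature value.
    grad = [0.0] * len(weights)
    for row, label in zip(X, y):
        p = sigmoid(sum(w * v for w, v in zip(weights, row)))
        for j, v in enumerate(row):
            grad[j] += (label - p) * v / len(X)
    return grad

# Hypothetical toy data: 4 observations, 2 features.
X = [[0.5, 1.0], [1.5, -0.5], [-1.0, 0.3], [-0.7, -1.2]]
y = [1, 1, 0, 0]

weights = [0.0, 0.0]   # zero initialisation, one weight per feature
learning_rate = 0.1

before = log_likelihood(weights, X, y)
grad = gradient(weights, X, y)
# Ascent step: add the scaled gradient to the weights.
weights = [w + learning_rate * g for w, g in zip(weights, grad)]
after = log_likelihood(weights, X, y)

print(before, after)  # the log-likelihood rises after the step
```

With all weights at zero every prediction is 0.5, so the starting log-likelihood is log(0.5); the step moves the weights in the direction that raises it.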
class customclassification:
    def __init__(self, learning_rate=0.01, max_iterations=500):
        self.learning_rate = learning_rate
        self.max_iterations = max_iterations
        self.likelihoods = []  # log-likelihood recorded at each iteration
        self.eps = 1e-7        # keeps predictions away from exactly 0 or 1

    def sigmoid(self, inX):
        # Logistic function: maps any real value into (0, 1)
        sig = 1 / (1 + np.exp(-inX))
        return sig

    def log_likelihood(self, y_true, y_pred):
        # Clip predictions so the logarithms stay finite
        y_pred = np.clip(y_pred, self.eps, 1 - self.eps)
        likelihood = y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred)
        return np.mean(likelihood)

    def fit(self, X, y):
        # Work on plain float arrays so the arithmetic below is positional
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        m, n = X.shape  # m observations, n features
        self.weights = np.zeros(n)
        for i in range(self.max_iterations):
            z = np.dot(X, self.weights)
            y_pred = self.sigmoid(z)
            # Gradient of the mean log-likelihood with respect to the weights
            gradient = np.mean((y - y_pred) * X.T, axis=1)
            # Ascent step: move the weights along the gradient
            self.weights += self.learning_rate * gradient
            likelihood = self.log_likelihood(y, y_pred)
            self.likelihoods.append(likelihood)

    def predict_prob(self, X):
        z = np.dot(X, self.weights)
        probabilities = self.sigmoid(z)
        return probabilities

    def predict(self, X, threshold):
        # Label 1 when the predicted probability exceeds the threshold
        predictions = (self.predict_prob(X) > threshold).astype(int)
        return predictions
Use custom algorithm
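# X_train, X_test, y_train and y_test are assumed to come from a
# train/test split of X and y; that step is not shown here.
LogRegres = customclassification()
LogRegres.fit(X_train, y_train)
y_hat = LogRegres.predict(X_test, 0.5)

# A difference of zero means the prediction matches the true label
y_cap = y_hat - y_test
count = np.count_nonzero(y_cap == 0)
accuracy = (count / len(y_hat)) * 100
print(accuracy)
This accuracy could be further improved by applying different data wrangling techniques and by using stochastic gradient ascent; that is left as an exercise for the reader.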
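As a starting point for the stochastic variant, here is one possible sketch in plain Python. It updates the weights after each individual observation instead of averaging the gradient over the whole dataset. The function names, the toy data, and the `learning_rate`, `epochs` and `seed` values are all hypothetical; this is not the article's implementation and it has not been run on the article's dataset.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def stochastic_gradient_ascent(X, y, learning_rate=0.1, epochs=50, seed=0):
    """Logistic regression weights fitted one observation at a time."""
    rng = random.Random(seed)
    weights = [0.0] * len(X[0])
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the observations in a new random order
        for i in order:
            p = sigmoid(sum(w * v for w, v in zip(weights, X[i])))
            error = y[i] - p
            # Ascent step using the gradient of a single observation.
            weights = [w + learning_rate * error * v
                       for w, v in zip(weights, X[i])]
    return weights

def mean_log_likelihood(weights, X, y):
    total = 0.0
    for row, label in zip(X, y):
        p = sigmoid(sum(w * v for w, v in zip(weights, row)))
        total += label * math.log(p) + (1 - label) * math.log(1 - p)
    return total / len(X)

# Hypothetical toy data: 4 observations, 2 features.
X = [[0.5, 1.0], [1.5, -0.5], [-1.0, 0.3], [-0.7, -1.2]]
y = [1, 1, 0, 0]

weights = stochastic_gradient_ascent(X, y)
start = mean_log_likelihood([0.0, 0.0], X, y)
end = mean_log_likelihood(weights, X, y)
print(start, end)  # the fitted weights give a higher log-likelihood
```

Each update is cheaper than a full-batch step but noisier, so the learning rate and number of epochs usually need tuning for a real dataset.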
The gradient is a vector that contains all the partial derivatives of a function at a given position. Gradient descent is suited to convex functions and gradient ascent to concave functions. Gradient descent finds the function's nearest minimum, whereas gradient ascent seeks the function's nearest maximum. If the objective function can be flipped, either style of optimization can be used for the same problem. With this article, we have understood gradient ascent.