# Gradient Ascent: When to use it in machine learning?

Gradient ascent maximizes an objective function, such as the log-likelihood of a model.

Gradient descent finds a local minimum of a function by taking steps proportional to the negative of the function’s gradient (or an approximation of it) at the current point. If we instead take steps proportional to the positive gradient, we approach a local maximum of the function; this method is known as gradient ascent. This article explains gradient ascent (GA) and covers the following topics.

1. Mathematics behind the Gradient Ascent
2. When to use gradient ascent
3. Implementing gradient ascent in logistic regression

Let’s start with the mathematics behind GA.

## Mathematics behind the Gradient Ascent

Gradient ascent is based on the principle of moving repeatedly in the direction of the gradient in order to locate the highest point (a maximum) of a function.

The gradient of a function of x and y is the vector of its partial derivatives. Differentiating the function with respect to x gives its rate of change in the x-direction; differentiating with respect to y gives its rate of change in the y-direction.
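As a minimal sketch of what differentiating with respect to x and y means in practice, the snippet below approximates both partial derivatives with central differences and compares them with the analytic gradient. The function `f` and the helpers `analytic_gradient` and `numerical_gradient` are illustrative assumptions, not part of the original article.

```python
import numpy as np

def f(p):
    """Illustrative concave function with its maximum at (1, -2)."""
    x, y = p
    return -(x - 1.0) ** 2 - (y + 2.0) ** 2

def analytic_gradient(p):
    """Partial derivatives of f with respect to x and y."""
    x, y = p
    return np.array([-2.0 * (x - 1.0), -2.0 * (y + 2.0)])

def numerical_gradient(func, p, h=1e-6):
    """Approximate each partial derivative with a central difference."""
    p = np.asarray(p, dtype=float)
    grad = np.zeros_like(p)
    for i in range(p.size):
        step = np.zeros_like(p)
        step[i] = h  # perturb one coordinate at a time
        grad[i] = (func(p + step) - func(p - step)) / (2.0 * h)
    return grad

point = np.array([0.0, 0.0])
grad_exact = analytic_gradient(point)       # partial derivatives at (0, 0)
grad_approx = numerical_gradient(f, point)  # should agree closely
```

Both vectors point from (0, 0) toward the maximum at (1, -2), which is the uphill direction that gradient ascent follows.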

The function must be defined and differentiable around the points where it is evaluated. At each iteration, gradient ascent takes a step in the direction of the gradient, which always points in the direction of steepest increase. The step size is set by a parameter, usually called the learning rate, which scales the gradient. This step is repeated until a fixed number of steps has been taken or the algorithm is within a chosen tolerance.
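Written as a formula, one step of this procedure can be sketched as follows. The symbols are notation assumed here for illustration: θ for the current point, α for the step size, K for the maximum number of steps and ε for the tolerance.

```latex
\theta_{k+1} = \theta_k + \alpha \, \nabla f(\theta_k),
\qquad \text{stop when } k = K \ \text{ or } \ \lVert \nabla f(\theta_k) \rVert < \varepsilon
```

Gradient descent uses the same rule with a minus sign in front of the gradient term.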

The gradient ascent method advances in the direction of the gradient at each step. The gradient is evaluated at the starting point P0, and the algorithm moves to the next point, P1. The gradient is then re-evaluated at P1, and the algorithm advances to P2. This loop continues until a stopping condition is met. Because the gradient points in the direction of steepest increase, each step moves in the locally best uphill direction.
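The loop over P0, P1, P2 and so on can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function `grad_f`, the helper `gradient_ascent`, and the default values for `learning_rate`, `max_steps` and `tol` are all hypothetical choices, not taken from the article.

```python
import numpy as np

def grad_f(p):
    """Gradient of the illustrative function f(x, y) = -(x - 1)^2 - (y + 2)^2."""
    x, y = p
    return np.array([-2.0 * (x - 1.0), -2.0 * (y + 2.0)])

def gradient_ascent(grad, start, learning_rate=0.1, max_steps=1000, tol=1e-8):
    """Follow the gradient uphill until it is nearly zero or max_steps is hit."""
    point = np.asarray(start, dtype=float)
    path = [point.copy()]  # P0, P1, P2, ...
    for _ in range(max_steps):
        g = grad(point)
        if np.linalg.norm(g) < tol:  # tolerance-based stopping condition
            break
        point = point + learning_rate * g  # step in the gradient's direction
        path.append(point.copy())
    return point, path

maximum, path = gradient_ascent(grad_f, start=[0.0, 0.0])
```

Here `path[0]` plays the role of P0, `path[1]` of P1, and the final point approaches the maximum of the illustrative function at (1, -2).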


## When to use gradient ascent

Gradient ascent works like gradient descent, with one exception: its objective is to maximise a function rather than minimise it. This matters because some problems are naturally posed as maximisation; for example, maximising the distance between a separating hyperplane and the observations.

In this sense, for each function “f” on which gradient descent is applied, there is a symmetric function “-f” on which gradient ascent may be applied.

In other words, any problem solved with gradient descent can also be solved with gradient ascent by negating the objective, which mirrors its graph across the axis of the independent variable.
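The symmetry between f and -f can be checked numerically. The sketch below, using an illustrative one-dimensional function chosen for this example, runs gradient ascent on f and gradient descent on -f from the same starting point; the names `grad_f`, `grad_neg_f` and the learning rate are assumptions.

```python
def grad_f(x):
    """Derivative of the illustrative concave function f(x) = -(x - 3)^2."""
    return -2.0 * (x - 3.0)

def grad_neg_f(x):
    """Derivative of -f(x) = (x - 3)^2, the mirrored (convex) function."""
    return -grad_f(x)

learning_rate = 0.1
x_ascent = 0.0   # iterate of gradient ascent on f
x_descent = 0.0  # iterate of gradient descent on -f
for _ in range(200):
    x_ascent = x_ascent + learning_rate * grad_f(x_ascent)
    x_descent = x_descent - learning_rate * grad_neg_f(x_descent)
```

Both iterates follow the same sequence and settle at x = 3, the maximum of f and the minimum of -f.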

## Implementing gradient ascent in logistic regression

For this article, we will use gradient ascent to fit a logistic regression on a dataset of social network ads. The model will predict whether a customer purchases the product based on several features.

For this purpose, we have to build a custom logistic regression algorithm. Let’s start with importing the necessary libraries.

```python
import numpy as np
import pandas as pd
```

Reading and preprocessing the data

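```python
data = pd.read_csv('Social_Network_Ads.csv')
data[:5]  # preview the first five rows
```
```python
from sklearn.preprocessing import LabelEncoder

# Encode the categorical Gender column as integers.
enc = LabelEncoder()
data['Gender_enc'] = enc.fit_transform(data['Gender'])
```
```python
from sklearn.model_selection import train_test_split

# Features: every column except the raw Gender, the target and the ID.
X = data.drop(['Gender', 'Purchased', 'User ID'], axis=1)
y = data['Purchased']

# Assumption: the original does not show the split that produces the
# X_train/X_test used later; test_size and random_state are illustrative.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
```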

Custom algorithm

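```python
# These methods belong to the customclassification class shown in the final code.
def sigmoid(self, inX):
    # Map raw scores to probabilities in (0, 1).
    sig = 1 / (1 + np.exp(-inX))
    return sig

def log_likelihood(self, y_true, y_pred):
    # Clip predictions away from 0 and 1 so the logarithms stay finite.
    y_pred = np.clip(y_pred, self.eps, 1 - self.eps)
    likelihood = y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred)
    return np.mean(likelihood)

def fit(self, X, y):
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    m, n = X.shape  # m samples, n features
    self.weights = np.zeros(n)

    for i in range(self.max_iterations):
        y_pred = self.sigmoid(np.dot(X, self.weights))
        gradient = np.mean((y - y_pred) * X.T, axis=1)
        # Ascent step: move the weights along the gradient.
        self.weights = self.weights + self.learning_rate * gradient
        likelihood = self.log_likelihood(y, y_pred)
        self.likelihoods.append(likelihood)
```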

The sigmoid function converts the linear score of every data point into a probability, which is then used to compute the gradient. The log-likelihood is the objective function of the algorithm, and gradient ascent is used to maximise it.
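For reference, the objective and its gradient can be written out as below. The notation is assumed here for illustration: w is the weight vector, x_i and y_i are the features and label of sample i, and m is the number of samples. The gradient expression corresponds to the `np.mean((y-y_pred)*X.T, axis=1)` line in the code.

```latex
\ell(w) = \frac{1}{m} \sum_{i=1}^{m} \Big[ y_i \log \sigma(w^\top x_i) + (1 - y_i) \log\big(1 - \sigma(w^\top x_i)\big) \Big],
\qquad \sigma(z) = \frac{1}{1 + e^{-z}}

\nabla_w \ell(w) = \frac{1}{m} \sum_{i=1}^{m} \big( y_i - \sigma(w^\top x_i) \big) \, x_i
```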

In the fit function, the weights are initialised as an array of zeros with one entry per column of the independent variables. Predictions are computed with the sigmoid function, and the gradient scaled by the learning rate is added to the weights at every iteration.

Gradient ascent is an iterative optimization approach for locating local maxima of a differentiable function, so we repeat these steps for 500 iterations. The method advances in the direction of the gradient computed at each point of the objective function until the stopping condition is met.
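To see the effect of the 500 iterations, the following self-contained sketch runs the same kind of update on synthetic data and records the mean log-likelihood at every step. The data, the `true_w` vector and the standalone helper names are assumptions made for this illustration; they are not the article's dataset or class.

```python
import numpy as np

rng = np.random.default_rng(0)
X_toy = rng.normal(size=(200, 2))            # synthetic features (assumption)
true_w = np.array([2.0, -1.0])               # hypothetical "true" weights
y_toy = (X_toy @ true_w > 0).astype(float)   # synthetic binary labels

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mean_log_likelihood(y_true, y_pred, eps=1e-7):
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # avoid log(0)
    return np.mean(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))

weights = np.zeros(X_toy.shape[1])
learning_rate = 0.01
history = []
for _ in range(500):
    y_pred = sigmoid(X_toy @ weights)
    history.append(mean_log_likelihood(y_toy, y_pred))
    gradient = X_toy.T @ (y_toy - y_pred) / len(y_toy)
    weights = weights + learning_rate * gradient  # ascent: add the gradient
```

With zero initial weights every prediction starts at 0.5, so the first recorded value is log(0.5); later values in `history` are higher as the weights move uphill.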

Final code

```python
class customclassification:
    def __init__(self, learning_rate=0.01, max_iterations=500):
        self.learning_rate = learning_rate
        self.max_iterations = max_iterations
        self.likelihoods = []
        self.eps = 1e-7

    def sigmoid(self, inX):
        sig = 1 / (1 + np.exp(-inX))
        return sig

    def log_likelihood(self, y_true, y_pred):
        # Clip predictions away from 0 and 1 so the logarithms stay finite.
        y_pred = np.clip(y_pred, self.eps, 1 - self.eps)
        likelihood = y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred)
        return np.mean(likelihood)

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        m, n = X.shape  # m samples, n features
        self.weights = np.zeros(n)

        for i in range(self.max_iterations):
            z = np.dot(X, self.weights)
            y_pred = self.sigmoid(z)
            gradient = np.mean((y - y_pred) * X.T, axis=1)
            # Ascent step: move the weights along the gradient.
            self.weights = self.weights + self.learning_rate * gradient
            likelihood = self.log_likelihood(y, y_pred)
            self.likelihoods.append(likelihood)

    def predict_prob(self, X):
        z = np.dot(np.asarray(X, dtype=float), self.weights)
        probabilities = self.sigmoid(z)
        return probabilities

    def predict(self, X, threshold):
        # Label a sample 1 when its probability exceeds the threshold.
        predictions = (self.predict_prob(X) > threshold).astype(int)
        return predictions
```

Use custom algorithm

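```python
LogRegres = customclassification()
LogRegres.fit(X_train, y_train)
y_hat = LogRegres.predict(X_test, 0.5)
# A zero difference means the prediction matches the true label.
y_cap = y_hat - y_test
count = np.count_nonzero(y_cap == 0)
accuracy = (count / len(y_hat)) * 100  # accuracy in percent
accuracy
```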

The accuracy could be improved further with additional data wrangling and with stochastic gradient ascent; both are left to you as an exercise.
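As a starting point for that exercise, here is a hedged sketch of stochastic gradient ascent, which updates the weights from one sample at a time instead of the full dataset. The function `stochastic_gradient_ascent`, its default parameters, and the synthetic demo data are assumptions for illustration and not the article's implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def stochastic_gradient_ascent(X, y, learning_rate=0.05, epochs=50, seed=0):
    """Update the weights from one randomly ordered sample at a time."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    weights = np.zeros(X.shape[1])
    for _ in range(epochs):
        for i in rng.permutation(len(y)):  # visit samples in random order
            error = y[i] - sigmoid(X[i] @ weights)
            weights = weights + learning_rate * error * X[i]  # per-sample ascent step
    return weights

# Synthetic data on a standard scale (an assumption for illustration only).
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(300, 2))
y_demo = (X_demo[:, 0] - X_demo[:, 1] > 0).astype(float)

w_sgd = stochastic_gradient_ascent(X_demo, y_demo)
train_accuracy = np.mean((sigmoid(X_demo @ w_sgd) > 0.5) == (y_demo == 1.0))
```

Each update uses the gradient of a single sample's log-likelihood, so the steps are noisier than the batch version but much cheaper per step.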

## Conclusion

The gradient is a vector that contains all partial derivatives of a function at a given position. Gradient descent is suited to convex functions and gradient ascent to concave ones. Gradient descent moves toward a nearby local minimum of the function, whereas gradient ascent moves toward a nearby local maximum. Because flipping the sign of the objective turns one problem into the other, either style of optimization can be used for the same problem. In this article, we covered how gradient ascent works and how to apply it to logistic regression.

Sourabh has worked as a full-time data scientist for an ISP organisation, experienced in analysing patterns and their implementation in product development. He has a keen interest in developing solutions for real-time problems with the help of data both in this universe and metaverse.
