# Gradient Ascent: When to use it in machine learning?

Gradient ascent maximizes the objective function of an algorithm, such as a likelihood

To find a local minimum of a function, gradient descent takes steps proportional to the negative of the function’s gradient (or an approximation of it) at the current point. Taking steps proportional to the positive gradient instead approaches a local maximum of the function; this method is known as gradient ascent. This article explains gradient ascent (GA) and covers the following topics.

1. Mathematics behind the Gradient Ascent
2. When to use gradient ascent
3. Implementing gradient ascent in logistic regression

Gradient ascent maximizes an objective function rather than minimizing a loss. Let’s start with the mathematics behind GA.

## Mathematics behind the Gradient Ascent

Gradient ascent finds a maximum of a function by repeatedly moving in the direction of the gradient.

For a function of two variables x and y, the gradient is the vector of partial derivatives with respect to x and y. The partial derivative with respect to x gives the rate of change in the x-direction, and the partial derivative with respect to y gives the rate of change in the y-direction.

The function must be defined and differentiable around the points where it is evaluated. At each iteration, gradient ascent takes a step in the direction of the gradient, which always points in the direction of steepest increase. The size of the step is controlled by a step-size parameter (the learning rate). The iteration continues for a fixed number of steps or until the algorithm is within a chosen tolerance.
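The step described above can be written as an update rule. The symbols are notation introduced here for illustration: $\theta_k$ is the current point, $\alpha > 0$ is the step size, $\varepsilon$ is the tolerance, and $k_{\max}$ is the maximum number of steps. Stopping on a small gradient norm is one common choice of tolerance test, assumed here.

$$
\theta_{k+1} = \theta_k + \alpha \, \nabla f(\theta_k),
\qquad \text{stop when } \lVert \nabla f(\theta_k) \rVert < \varepsilon \ \text{ or } \ k = k_{\max}
$$

Gradient descent uses the same rule with a minus sign in front of $\alpha$.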

The gradient ascent method advances in the direction of the gradient at each step. The gradient is evaluated at the starting point P0, and the algorithm moves to the next point, P1. The gradient is then re-evaluated at P1, and the algorithm moves to P2. This loop continues until a stopping condition is met. Because the gradient points in the direction of steepest increase, each step moves in the locally best direction.
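The loop from P0 to P1 to P2 can be sketched in a few lines of NumPy. This is a minimal illustration, not code from the article: the example function `f`, the starting point, the step size and the tolerance are all assumptions chosen so that the maximum is known in advance.

```python
import numpy as np

def f(p):
    """Concave example function with its maximum at (1, -2)."""
    x, y = p
    return -(x - 1.0) ** 2 - (y + 2.0) ** 2

def grad_f(p):
    """Gradient of f: the partial derivatives with respect to x and y."""
    x, y = p
    return np.array([-2.0 * (x - 1.0), -2.0 * (y + 2.0)])

def gradient_ascent(grad, p0, step_size=0.1, max_steps=1000, tol=1e-8):
    """Repeat p <- p + step_size * grad(p) until the gradient is tiny."""
    p = np.asarray(p0, dtype=float)
    for _ in range(max_steps):
        g = grad(p)
        if np.linalg.norm(g) < tol:   # stopping condition: tolerance reached
            break
        p = p + step_size * g         # step in the direction of the gradient
    return p

p_max = gradient_ascent(grad_f, p0=[0.0, 0.0])
print(p_max, f(p_max))
```

Starting from (0, 0), the iterates move towards (1, -2), where the gradient vanishes and the loop stops.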


## When to use gradient ascent

Gradient ascent operates in the same way as gradient descent, with one exception: its objective is to maximise a function rather than to minimise it. The distinction matters because some problems are naturally stated as maximisation; for example, maximising the distance between a separating hyperplane and the observations.

In this sense, for each function “f” on which gradient descent is applied, there is a symmetric function “-f” on which gradient ascent may be applied.

This suggests that a problem solved using gradient descent can also be solved using gradient ascent if we mirror it on the axis of the independent variable.
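A small sketch makes this symmetry concrete. The function f(x) = -(x - 3)^2, the starting point and the step size are assumptions picked for illustration; gradient ascent on f and gradient descent on -f produce the same sequence of points.

```python
def grad_f(x):
    # f(x) = -(x - 3)^2 is concave with its maximum at x = 3
    return -2.0 * (x - 3.0)

def grad_neg_f(x):
    # -f(x) = (x - 3)^2 is convex with its minimum at x = 3
    return 2.0 * (x - 3.0)

step_size = 0.1
x_ascent = 0.0    # iterate of gradient ascent on f
x_descent = 0.0   # iterate of gradient descent on -f

for _ in range(200):
    x_ascent = x_ascent + step_size * grad_f(x_ascent)
    x_descent = x_descent - step_size * grad_neg_f(x_descent)

print(x_ascent, x_descent)
```

Both variables end up at the same value near 3, because adding the gradient of f is the same operation as subtracting the gradient of -f.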

## Implementing gradient ascent in logistic regression

In this article, we will use gradient ascent to fit a logistic regression on a social network advertising dataset. The model will predict whether a customer purchases the product based on different features.

For this purpose, we have to build a custom logistic regression algorithm. Let’s start with importing the necessary libraries.

```python
import numpy as np
import pandas as pd
```

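```python
# Load the dataset and preview the first five rows
data = pd.read_csv('Social_Network_Ads.csv')
data[:5]
```

```python
# Encode the categorical 'Gender' column as integers
from sklearn.preprocessing import LabelEncoder
enc = LabelEncoder()
data['Gender_enc'] = enc.fit_transform(data['Gender'])
```

```python
# Separate the features from the target
X = data.drop(['Gender', 'Purchased', 'User ID'], axis=1)
y = data['Purchased']
# Assumed split (not in the original): defines X_train/X_test used later
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
```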

Custom algorithm

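```python
    # Methods of the customclassification class (indentation restored)
    def sigmoid(self, inX):
        # Logistic function, applied element-wise
        sig = 1 / (1 + np.exp(-inX))
        return sig

    def log_likelihood(self, y_true, y_pred):
        # Clip predictions away from 0 and 1 so the logarithms stay finite
        y_pred = np.clip(y_pred, self.eps, 1 - self.eps)
        likelihood = y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred)
        return np.mean(likelihood)

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        m = X.shape[0]
        n = X.shape[1]
        self.weights = np.zeros(n)

        for i in range(self.max_iterations):
            # Matrix product, not the element-wise X*self.weights
            y_pred = self.sigmoid(np.dot(X, self.weights))
            likelihood = self.log_likelihood(y, y_pred)
            self.likelihoods.append(likelihood)
            # Ascent step (absent from the original listing)
            gradient = np.dot(X.T, y - y_pred) / m
            self.weights = self.weights + self.learning_rate * gradient
```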

The sigmoid function maps the linear score of every data point to a probability, which is then used to compute the gradient. The log-likelihood is the objective function of the algorithm, and the goal is to maximize it using gradient ascent.
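For the mean log-likelihood of logistic regression, the gradient with respect to the weights has the standard closed form X^T (y - p) / m, where p holds the sigmoid predictions and m is the number of rows. The sketch below checks that formula against central finite differences. The data is synthetic and made up for the check, and the helper names are hypothetical, not part of the article's class.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mean_log_likelihood(w, X, y, eps=1e-7):
    # Same objective as log_likelihood above, written as a function of w
    p = np.clip(sigmoid(X @ w), eps, 1 - eps)
    return np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def analytic_gradient(w, X, y):
    # Gradient of the mean log-likelihood: X^T (y - p) / m
    p = sigmoid(X @ w)
    return X.T @ (y - p) / X.shape[0]

def numeric_gradient(func, w, h=1e-6):
    # Central finite differences, one coordinate at a time
    g = np.zeros_like(w)
    for j in range(w.size):
        step = np.zeros_like(w)
        step[j] = h
        g[j] = (func(w + step) - func(w - step)) / (2 * h)
    return g

# Synthetic data, used only to compare the two gradients
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = (rng.random(50) < 0.5).astype(float)
w = rng.normal(size=3)

g_analytic = analytic_gradient(w, X, y)
g_numeric = numeric_gradient(lambda v: mean_log_likelihood(v, X, y), w)
print(np.max(np.abs(g_analytic - g_numeric)))
```

The printed difference should be very small, which supports using `X.T @ (y - p) / m` as the direction of the ascent step.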

The fit function initializes the weights as an array of zeros with one entry per column of the independent variables. Predictions are computed by applying the sigmoid function to the product of the features and the weights. The gradient scaled by the learning rate gives the update that is added to the weights at each iteration.

Gradient ascent is an iterative optimization approach for locating local maxima of a differentiable function, so we will repeat these steps for 500 iterations. The method advances in the direction of the gradient computed at each point of the objective function until the stopping condition is met.

Final code

```python
class customclassification:
    def __init__(self, learning_rate=0.01, max_iterations=500):
        self.learning_rate = learning_rate
        self.max_iterations = max_iterations
        self.likelihoods = []
        self.eps = 1e-7

    def sigmoid(self, inX):
        sig = 1 / (1 + np.exp(-inX))
        return sig

    def log_likelihood(self, y_true, y_pred):
        # Clip predictions away from 0 and 1 so the logarithms stay finite
        y_pred = np.clip(y_pred, self.eps, 1 - self.eps)
        likelihood = y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred)
        return np.mean(likelihood)

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        m = X.shape[0]
        n = X.shape[1]
        self.weights = np.zeros(n)

        for i in range(self.max_iterations):
            z = np.dot(X, self.weights)
            y_pred = self.sigmoid(z)
            likelihood = self.log_likelihood(y, y_pred)
            self.likelihoods.append(likelihood)
            # Ascent step, absent from the original listing: move the weights
            # along the gradient of the mean log-likelihood
            gradient = np.dot(X.T, y - y_pred) / m
            self.weights = self.weights + self.learning_rate * gradient

    def predict_prob(self, X):
        z = np.dot(X, self.weights)
        probabilities = self.sigmoid(z)
        return probabilities

    def predict(self, X, threshold):
        # 1 where the predicted probability exceeds the threshold, else 0
        predictions = (self.predict_prob(X) > threshold).astype(int)
        return predictions
```

Use custom algorithm

```python
LogRegres = customclassification()
LogRegres.fit(X_train, y_train)
y_hat = LogRegres.predict(X_test, 0.5)
# Count the test rows where prediction and label agree
y_cap = y_hat - y_test
count = np.count_nonzero(y_cap == 0)
accuracy = (count / len(y_hat)) * 100
accuracy
```

This accuracy could be further improved with different data wrangling techniques and with stochastic gradient ascent; that is left to you as an exercise.
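As a starting point for that exercise, here is one possible sketch of both ideas: standardizing the features and updating the weights on random mini-batches. It is not the article's implementation. The function names `standardize` and `sga_fit`, the hyperparameters and the demo data are all assumptions; the data is synthetic and only mimics the scale of an age and a salary column.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def standardize(X):
    """Scale each column to zero mean and unit variance."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0)

def sga_fit(X, y, learning_rate=0.1, epochs=50, batch_size=16, seed=0):
    """Mini-batch stochastic gradient ascent on the mean log-likelihood."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    m, n = X.shape
    weights = np.zeros(n)
    for _ in range(epochs):
        order = rng.permutation(m)                 # new random row order per epoch
        for start in range(0, m, batch_size):
            idx = order[start:start + batch_size]
            error = y[idx] - sigmoid(X[idx] @ weights)
            gradient = X[idx].T @ error / len(idx)
            weights += learning_rate * gradient    # ascent: step along the gradient
    return weights

# Synthetic, made-up data with two features on very different scales
rng = np.random.default_rng(1)
X_demo = rng.normal(loc=[40.0, 70000.0], scale=[10.0, 30000.0], size=(400, 2))
X_scaled = standardize(X_demo)
y_demo = (X_scaled[:, 0] + X_scaled[:, 1] > 0).astype(int)

w = sga_fit(X_scaled, y_demo)
accuracy = np.mean((sigmoid(X_scaled @ w) > 0.5) == y_demo) * 100
print(accuracy)
```

Standardizing matters here because features on very different scales make a single learning rate hard to choose and can push the sigmoid into overflow. Each mini-batch gives a noisy but cheap estimate of the full gradient.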

## Conclusion

The gradient is a vector that contains all partial derivatives of a function at a given position. On a convex function, gradient descent could be used, and on a concave function, gradient ascent could be used. Gradient descent finds the function’s nearest minimum, whereas gradient ascent seeks the function’s nearest maximum. If the objective function is negated, either style of optimization can be used for the same problem. This article covered the mathematics of gradient ascent, when to use it, and how to apply it to logistic regression.

Sourabh has worked as a full-time data scientist for an ISP organisation, experienced in analysing patterns and their implementation in product development. He has a keen interest in developing solutions for real-time problems with the help of data both in this universe and metaverse.
