Logistic regression models and evaluates the relationship between a categorical dependent variable and continuous or discrete explanatory variables. Although it is used for classification, it is called regression because it models the log-odds of the outcome as a linear function of the explanatory variables. The model can be fitted with several optimization methods, one of which is the Newton-Raphson method, explained in this article. The following topics are covered.
Table of contents
- About Logistic Regression
- What is the Newton Raphson method?
- Quadratic approximation in Python
Let’s start with a brief introduction to logistic regression.
About Logistic Regression
A logistic regression analysis reveals the relationship between a categorical dependent variable and a set of independent variables. Logistic regression does not assume that the independent variables are normally distributed. In addition to the regression equation, a typical report includes odds ratios, confidence limits, likelihood and deviance. As part of a comprehensive residual analysis, a logistic regression model can also produce diagnostic residual reports and plots.
A subset selection search can be performed to find the best regression model with fewer explanatory variables. ROC curves help determine the best cutoff point for classification, and confidence intervals can be provided on predicted values. The results can be validated by classifying rows that were not used in the analysis. Logistic regression makes the following assumptions:
- There should not be a high correlation between the independent variables.
- The independent variables should be linearly related to the log odds. If you’re not familiar with log-odds, we’ve included a brief explanation below.
- The larger the sample size, the more reliable (and powerful) you can expect the results of your analysis to be.
Logistic regression uses log-odds, which are an alternative way of expressing probabilities. Odds and probabilities are not the same thing:
- Probability is the ratio of an event happening to everything that could happen, while odds are the ratio of the event happening to the event not happening.
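The difference between probability, odds and log-odds can be illustrated with a minimal sketch. The function names and the example value of 0.8 are chosen here for illustration only and do not come from any library.

```python
import math

def probability_to_odds(p):
    """Odds = P(event happens) / P(event does not happen)."""
    return p / (1.0 - p)

def log_odds(p):
    """Log-odds (logit) of a probability p strictly between 0 and 1."""
    return math.log(probability_to_odds(p))

def sigmoid(z):
    """Inverse of the logit: maps log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

p = 0.8                          # illustrative value: 8 chances in 10
odds = probability_to_odds(p)    # 0.8 / 0.2, roughly 4 to 1
z = log_odds(p)                  # natural logarithm of the odds
print("odds:", odds)
print("log-odds:", z)
print("back to probability:", sigmoid(z))
```

A probability of 0.5 corresponds to odds of 1 and log-odds of 0, which is why the sign of the log-odds indicates which class is more likely.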
Logistic regression maximizes the log-likelihood of the Bernoulli distribution, where the probability of each outcome is given by the sigmoid function, also called the logistic function. Newton’s method works with the gradient of this log-likelihood together with the Hessian, the square matrix of its second derivatives. Moving forward, we will see how the Newton-Raphson method is used to find roots and thereby maximize the likelihood. Let’s understand how log-odds are used to estimate the likelihood.
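The quantities mentioned above can be written out as a short derivation. The notation is an assumption made here for illustration: $x_i$ is the feature vector of row $i$ (with a leading 1 for the intercept), $y_i \in \{0, 1\}$ is its label, $\beta$ is the coefficient vector, $X$ is the matrix of all rows and $W$ is a diagonal matrix.

```latex
\begin{align}
p_i &= \sigma(x_i^\top \beta) = \frac{1}{1 + e^{-x_i^\top \beta}},
\qquad
\log\frac{p_i}{1 - p_i} = x_i^\top \beta \\
\ell(\beta) &= \sum_{i=1}^{n} \Big[ y_i \log p_i + (1 - y_i) \log (1 - p_i) \Big] \\
\nabla \ell(\beta) &= \sum_{i=1}^{n} (y_i - p_i)\, x_i = X^\top (y - p) \\
H(\beta) &= -\sum_{i=1}^{n} p_i (1 - p_i)\, x_i x_i^\top = -X^\top W X,
\qquad W = \operatorname{diag}\big(p_i (1 - p_i)\big) \\
\beta^{(t+1)} &= \beta^{(t)} - H^{-1} \nabla \ell
= \beta^{(t)} + (X^\top W X)^{-1} X^\top (y - p)
\end{align}
```

The first line shows that the log-odds are linear in the features, and the last line is the Newton-Raphson update applied to the gradient of the log-likelihood.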
What is the Newton Raphson method?
Newton’s method is an iterative method for finding the roots of a differentiable function. The log-likelihood of logistic regression is concave, so any maximum it has is a global maximum, and that maximum lies where the derivative is zero. To maximize the log-likelihood, Newton’s method is therefore applied to its derivative. At each iteration the function is replaced by its second-order (quadratic) approximation around the current estimate, and the estimate moves to the maximum of that approximation, which is why the procedure is also called a quadratic approximation. In this respect Newton’s method differs from gradient descent, which minimizes the cost function using only its first derivative.
In the above representation, the working of Newton’s method is shown: over multiple iterations, successive quadratic approximations move the estimate closer to the root of the derivative.
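The iterations can be sketched in plain Python for a model with one feature and an intercept. This is a minimal illustration under stated assumptions, not the solver used by sklearn: the function name `newton_raphson_logistic` and the eight data points are made up for this example, and the data is deliberately not perfectly separable so that a finite maximum exists.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def newton_raphson_logistic(xs, ys, max_iter=25, tol=1e-10):
    """Fit p = sigmoid(b0 + b1 * x) by Newton-Raphson on the log-likelihood."""
    b0, b1 = 0.0, 0.0
    for iteration in range(1, max_iter + 1):
        # Gradient of the log-likelihood: sum of (y - p) * [1, x]
        g0 = g1 = 0.0
        # Entries of X^T W X (the negated Hessian), with W = diag(p * (1 - p))
        h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        # Solve the 2x2 system (X^T W X) * step = gradient in closed form
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        # Stop once the Newton step no longer changes the estimate
        if max(abs(step0), abs(step1)) < tol:
            break
    return b0, b1, iteration

# Hypothetical toy data: the classes overlap, so the estimate stays finite
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [0, 0, 1, 0, 1, 0, 1, 1]

b0, b1, n_iter = newton_raphson_logistic(xs, ys)
print("intercept:", b0, "slope:", b1, "iterations:", n_iter)
```

Each pass computes the gradient and the Hessian at the current estimate and jumps to the maximum of the local quadratic approximation, so no learning rate has to be chosen.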
Let’s see how this method is implemented in Python.
Quadratic approximation in Python
The LogisticRegression class is imported from the sklearn library. It has a parameter called ‘solver’, which specifies the optimization algorithm used to fit the model. We will use “newton-cg”, a Newton-type solver that uses conjugate gradient to compute each Newton step.
Syntax
class sklearn.linear_model.LogisticRegression(penalty='l2', *, dual=False, tol=0.0001, C=1.0, fit_intercept=True, intercept_scaling=1, class_weight=None, random_state=None, solver='lbfgs', max_iter=100, multi_class='auto', verbose=0, warm_start=False, n_jobs=None, l1_ratio=None)
Importing libraries and packages
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
Reading the data and pre-processing:
df = pd.read_csv("/content/drive/MyDrive/Datasets/cancer.csv")
# Drop the empty trailing column and the row identifier
df.drop(['Unnamed: 32', "id"], axis=1, inplace=True)
# Encode the target: malignant ("M") as 1, everything else as 0
df.diagnosis = [1 if each == "M" else 0 for each in df.diagnosis]
df.head()
The data relates to the diagnosis of breast cancer, in which “diagnosis” is encoded as 1 for malignant and 0 for benign. The data has a total of 569 records and 31 columns, including the dependent variable.
Splitting the data:
X = df.drop('diagnosis', axis=1)
y = df['diagnosis']
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42)
print("Shape of independent variable train", "\nno of rows=", X_train.shape[0], "\nno of columns=", X_train.shape[1])
print("-----------------------------------------------")
print("Shape of dependent variable train", "\nno of rows=", y_train.shape[0], "\nno of columns=1")
print("-----------------------------------------------")
print("Shape of independent variable test", "\nno of rows=", X_test.shape[0], "\nno of columns=", X_test.shape[1])
print("-----------------------------------------------")
print("Shape of dependent variable test", "\nno of rows=", y_test.shape[0], "\nno of columns=1")
The data is split into train and test sets with a ratio of 70:30.
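The row counts printed by the split can be worked out by hand. The sketch below assumes, as scikit-learn does for a fractional test_size, that the test share is rounded up and the remaining rows go to training; the variable names are illustrative.

```python
import math

n_samples = 569      # number of records stated for this dataset
test_size = 0.30     # fraction passed to train_test_split

# Assumption: the test share is rounded up, the rest is used for training
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

print("train rows:", n_train)
print("test rows:", n_test)
```

With 569 records this gives 398 training rows and 171 test rows, so the split is close to, but not exactly, 70:30.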
Fitting the data:
lr = LogisticRegression(max_iter=100, solver='newton-cg')
lr.fit(X_train, y_train)
lr_pred = lr.predict(X_test)
log_pred = lr.predict_proba(X_test)[:, 1].round(2)
# Attach the outputs by position. X_test keeps its shuffled original index,
# so merging on index with a fresh 0..n-1 DataFrame would misalign the rows.
df_final = X_test.assign(prediction=lr_pred, probability=log_pred)
df_final.head()
The logistic regression model is fitted on the training dataset with the solver set to ‘newton-cg’. It then predicts the class of each row in the test dataset, along with the probability of that row belonging to class 1.
Let’s plot the prediction and the probability against a feature and observe the function.
Visualizing the regression model:
fig, axes = plt.subplots(1, 2, figsize=(15, 5), sharex=True)
fig.suptitle('Plot for probability and prediction of data points in "Area_mean" feature')
# Left: predicted probability against area_mean
sns.regplot(x="area_mean", y='probability', data=df_final, logistic=True, ci=None, ax=axes[0],
            scatter_kws={"color": "black"}, line_kws={"color": "red"})
# Right: predicted class against area_mean
sns.regplot(x="area_mean", y='prediction', data=df_final, logistic=True, ci=None, ax=axes[1],
            scatter_kws={"color": "black"}, line_kws={"color": "red"})
plt.show()
In the above plot, the data points of the area_mean feature are plotted together with a fitted logistic curve. The left subplot shows the predicted probability of each data point belonging to class 1, and the right subplot shows the class predicted by the model. Data points whose predicted probability is above 0.5 are classified as 1, as the prediction plot shows, and those below are classified as 0.
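The link between the two subplots can be sketched in a few lines, assuming the usual 0.5 cutoff for a binary problem. The function name `classify` and the four probabilities are hypothetical and only mimic the form of the rounded ‘probability’ column.

```python
def classify(probabilities, cutoff=0.5):
    """Label a point 1 when its predicted probability exceeds the cutoff."""
    return [1 if p > cutoff else 0 for p in probabilities]

# Hypothetical rounded probabilities, not taken from the dataset
example_probabilities = [0.02, 0.47, 0.51, 0.98]
print(classify(example_probabilities))
```

Moving the cutoff away from 0.5 trades false positives against false negatives, which is the choice an ROC curve helps with.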
Final Verdict
Logistic regression can be fitted with the quadratic approximation method, which typically needs fewer iterations than gradient descent, although each iteration is more expensive. The Newton-Raphson method maximizes the log-likelihood to estimate the model coefficients, which are then used to classify the data points. With the hands-on implementation in this article, we have seen how quadratic approximation can be used in logistic regression.