Comprehensive Guide To Regression For Dummies

In this guide to regression, we go through the basic principles behind regression analysis and discuss a few of its variants.

Regression is a set of statistical approaches used for approximating the relationship between a dependent variable and one or more independent variables. The term “regression” was coined by Francis Galton to describe the phenomenon of the heights of descendants of tall ancestors regressing towards the normal average, i.e., regression to the mean. However, regression as a method was created and employed earlier by Legendre and Gauss, who used least squares to determine the orbits of celestial bodies around the Sun.

Today, regression is mainly used for two purposes. First, it is used for prediction and forecasting problems. Second, it is used to map the causality of factors, i.e., to infer the cause-and-effect relationship between the dependent and independent variables. But aren’t those two the same thing? Not exactly. On its own, regression can only infer the causal relationships between the independent and dependent variables within a limited dataset. To use regression for prediction, data scientists need to show that the relationships inferred from a sample, the dataset, have predictive power for a new context, the global population. This can be accomplished by following a series of statistical methods that test whether the dataset belongs to the population’s distribution.

Read more (about regression)

Linear Regression

The most widely used variant of regression is linear regression, which finds a line (or a hyperplane, in higher dimensions) that most closely fits the data. This is illustrated in the plot below.

Linear Regression
Source: Gist

This line allows us to estimate the value of the dependent variable when the independent variables take on a given set of values. It can be formulated as:

y = w0 + w1X1 + w2X2 + … + wnXn

Where wi represents the coefficients of the independent variables Xi and w0 is the intercept. But there could be infinitely many candidate lines, so how do we decide which one is the best? For the purposes of modelling/training, the performance of machine learning methods is measured using loss functions. In the case of linear regression, the loss function indicates how well the line fits the data. Let’s take the example of the mean square error, which averages the squared differences between the actual values and the predicted values.

Mean square error loss function: MSE = (1/n) Σi (yi − ŷi)²
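To make this concrete, here is a minimal Python/NumPy sketch (the numbers are made up purely for illustration) that computes the mean square error between actual values and the predictions of a candidate line:

import numpy as np

# Actual observations and the predictions of a candidate line
y_true = np.array([3.1, 4.9, 7.2, 9.0, 11.1])
y_pred = np.array([3.0, 5.0, 7.0, 9.0, 11.0])

# Mean square error: average of the squared differences
mse = np.mean((y_true - y_pred) ** 2)
print(mse)  # ≈ 0.014

The smaller this value, the better the line fits the data; a perfect fit would give an MSE of zero.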

Read more (about loss functions) 

Okay, so now we have a way to formulate our objective, the linear function, and a means to check how well it performs, the loss function. But how do we find the optimal parameters, w0 and wi, that correspond to the best-fitted line? The brute-force approach would be to try out all possible combinations, but the coefficients are real numbers and can take an infinite number of values. To overcome this problem, we use the gradient descent algorithm, which uses the derivatives of the loss function to find its minimum.

Source: Gist

Visually speaking, let’s say that the loss function plotted in 3D space looks like the plot above. Gradient descent starts at a random point on the loss function’s surface and uses the derivatives to determine the direction of the minimum, moving towards it step by step. It can be thought of as a ball rolling down the loss function’s surface.

Source: https://alykhantejani.github.io/images/gradient_descent_line_graph.gif
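For concreteness, here is a minimal NumPy sketch of gradient descent for a one-variable linear regression; the toy data, learning rate, and iteration count are arbitrary choices for illustration:

import numpy as np

# Toy data roughly following y = 2x + 1
X = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

w0, w1 = 0.0, 0.0   # start from an arbitrary point on the loss surface
lr = 0.01           # learning rate (step size)

for _ in range(5000):
    y_pred = w0 + w1 * X
    error = y_pred - y
    # Partial derivatives of the MSE loss with respect to w0 and w1
    grad_w0 = 2 * error.mean()
    grad_w1 = 2 * (error * X).mean()
    # Step in the direction opposite to the gradient, i.e. towards the minimum
    w0 -= lr * grad_w0
    w1 -= lr * grad_w1

print(w0, w1)  # should approach the best-fit intercept and slope

Each iteration nudges w0 and w1 a little further down the loss surface, much like the ball in the animation above.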

Read more (about gradient descent)

Polynomial Regression

More often than not, the dependent and independent variables will have a non-linear relationship. In such use-cases, linear regression fails to fit the data. To overcome this problem, a non-linear function is used instead of a linear function. This variant of regression is called polynomial regression.

Source: Gist

Polynomial regression improves on the linear model by introducing extra predictors obtained by raising the original predictors to a certain power. For instance, a quadratic regression would have two terms, X and X², as predictors for each independent variable. This enables the model to learn non-linear relationships.

Polynomial Regression
Source: Gist
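A minimal scikit-learn sketch of this idea might look as follows; the quadratic toy data and the degree of the polynomial are arbitrary choices for illustration:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Toy data with a quadratic relationship: y ≈ x²
X = np.linspace(-3, 3, 20).reshape(-1, 1)
y = X.ravel() ** 2 + np.random.normal(scale=0.2, size=20)

# Degree-2 polynomial features add X² as an extra predictor for the linear model
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(X, y)

print(model.predict([[2.0]]))  # should be close to 4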

Ridge and Lasso Regression

So far, we have only been concerned with fitting the data well, but there’s more to a good machine learning model than just that. The other issue we need to address is overfitting; this is when the model performs really well on the training dataset, the sample, but not as well on the test data, the actual population. Regularization techniques offer an easy workaround to this issue. As far as regression tasks are concerned, there are two prominent regularized variants of regression: Ridge and Lasso. Both of these impose a constraint on the coefficients of the independent variables; reducing the magnitude of the coefficients helps reduce the model’s complexity and multi-collinearity.

Ridge regression, also known as L2 regularization, modifies the cost function by adding a penalty equivalent to the square of the magnitude of the coefficients:

Loss = Σi (yi − ŷi)² + λ Σj wj²

The lambda term regularizes the coefficients so that the loss function is penalized if the coefficients take large values. Lasso (Least Absolute Shrinkage and Selection Operator) regression is very similar to ridge regression; the only difference is that it penalizes the absolute magnitude of the coefficients instead of their squares:

Loss = Σi (yi − ŷi)² + λ Σj |wj|

This type of regularization (L1) often leads to zero coefficients, i.e. some of the independent variables are completely ignored. Therefore, not only does lasso regression help reduce overfitting, but it also doubles as a feature selection technique.
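As a rough sketch of how the two penalties behave in practice, scikit-learn exposes both variants, with the alpha parameter playing the role of the lambda term above (the synthetic data below is purely illustrative):

import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
# Only the first two of the ten features actually matter
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=100)

ridge = Ridge(alpha=1.0).fit(X, y)   # L2 penalty: shrinks all coefficients
lasso = Lasso(alpha=0.1).fit(X, y)   # L1 penalty: drives some coefficients to exactly zero

print(ridge.coef_.round(2))
print(lasso.coef_.round(2))          # the irrelevant features end up at 0.0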

Read more (about lasso and ridge regression)

Spline Regression 

In addition to using polynomial regression with regularization, there’s another approach for fitting non-linear data using regression – binning. Instead of considering the whole dataset at once, spline regression divides the dataset into bins and creates separate models for each bin. 

Dividing the data into separate pieces allows the model to fit linear or low-degree polynomial functions to each piece. The points where the data is split are called knots, and the resulting sections are called splines. The functions used for modelling each bin are called piecewise functions. A code sketch follows below.

Read more (about spline regression)
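One way to sketch this in code is scikit-learn’s SplineTransformer (available from scikit-learn 1.0 onwards), which builds piecewise polynomial basis functions from a chosen number of knots; the knot count and degree below are arbitrary illustrative choices:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import SplineTransformer

# Non-linear toy data: a noisy sine wave
X = np.linspace(0, 10, 100).reshape(-1, 1)
y = np.sin(X).ravel() + np.random.normal(scale=0.1, size=100)

# 6 knots split the range into sections; degree-3 piecewise polynomials are fit across them
model = make_pipeline(SplineTransformer(n_knots=6, degree=3), LinearRegression())
model.fit(X, y)

print(model.predict([[5.0]]))  # should be close to sin(5) ≈ -0.96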

Logistic Regression

Alright, so regression can be used to estimate continuous dependent variables, but can it work in classification tasks? Yes, but not on its own. Instead of regressing to the best fit for the data, the line regresses to the optimal decision boundary separating two classes. Logistic regression does this by composing the line equation with the logistic/sigmoid function:

h = g(w0 + Σi wi Xi)

Where

g(z) = 1 / (1 + e^(−z))

is the sigmoid function.

The goal remains the same: to find the best w0 and wi for the data. Logistic regression predicts the probability of the default class. If we consider the example of a binary classification problem, the output h is the predicted probability that example Xi belongs to the positive class, given by:

h = P(y = 1 | X)

When this probability is greater than 0.5, we can classify the example as the default class. The probability is greater than 0.5 when g(H) is greater than 0.5, and this is true when H = w0 + Σi wi Xi ≥ 0. This makes the hyperplane w0 + Σi wi Xi = 0 the decision boundary.
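To make the decision boundary concrete, here is a minimal sketch using scikit-learn’s LogisticRegression on a toy binary classification problem; predict_proba returns the predicted probability h, and predict applies the 0.5 threshold described above:

import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy binary data: class 1 tends to have larger feature values
X = np.array([[0.5], [1.0], [1.5], [3.0], [3.5], [4.0]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = LogisticRegression().fit(X, y)

print(clf.predict_proba([[2.2]]))  # [P(y = 0 | X), P(y = 1 | X)]
print(clf.predict([[2.2]]))        # class label after applying the 0.5 threshold
print(clf.intercept_, clf.coef_)   # w0 and wi, which define the decision boundary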

Read more (about logistic regression)

All clear on the theory and itching to write some code? Here are a few more posts to help you get started with implementing regression models using various tools and languages:

Aditya Singh
A machine learning enthusiast with a knack for finding patterns. In my free time, I like to delve into the world of non-fiction books and video essays.
