Hands-On Tutorial on ElasticNet Regression

Elastic Net is a regularized regression model that combines the l1 and l2 penalties, i.e., those of lasso and ridge regression. Regularization helps address the overfitting problem in models.

Elastic Net is a regression method that performs variable selection and regularization simultaneously. Regularization is the main concept behind the elastic net, and it comes into the picture when a model is overfitted. Overfitting is a problem that occurs when a model performs well on the training dataset but gives errors on the test dataset; in this situation, regularization is a technique that reduces the errors by adding penalty terms to the function fitted on the training dataset.
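To make this concrete, here is a minimal sketch (on synthetic data, not this tutorial's dataset; all names and values are illustrative assumptions) where an unregularized high-degree polynomial fit typically shows a large gap between training and test scores, while an l2-regularized fit narrows it:

import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.RandomState(0)
X = rng.uniform(-1, 1, (40, 1))
y = X.ravel() ** 2 + 0.1 * rng.randn(40)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# degree-15 polynomial: flexible enough to overfit 30 training points
plain = make_pipeline(PolynomialFeatures(15), LinearRegression()).fit(X_tr, y_tr)
reg = make_pipeline(PolynomialFeatures(15), Ridge(alpha=1.0)).fit(X_tr, y_tr)

print(plain.score(X_tr, y_tr), plain.score(X_te, y_te))  # typically a large train/test gap
print(reg.score(X_tr, y_tr), reg.score(X_te, y_te))      # typically a smaller gap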

There are two types of penalties: l1 and l2. A model that uses the l1 penalty for regularization is called the lasso regression model, and a model that uses the l2 penalty is called the ridge regression model. Lasso regression adds the absolute value of the magnitude of each coefficient as a penalty term to the loss function, while ridge regression adds the squared magnitude of each coefficient as the penalty.

Lasso stands for least absolute shrinkage and selection operator. As the name suggests, lasso regression shrinks coefficients toward zero and can set them exactly to zero, which effectively eliminates those predictors from the model. Ridge regression does not eliminate coefficients from the model: it does not differentiate between important and less important predictor variables, and it includes all of them, shrinking each coefficient by penalizing its squared magnitude.
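The contrast is easy to see in code. The following minimal sketch (synthetic data and parameter values are assumptions for illustration) fits both models on data whose true coefficients are mostly zero:

import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.RandomState(0)
X = rng.randn(100, 5)
true_coef = np.array([3.0, 0.0, 0.0, 1.5, 0.0])  # only two informative features
y = X @ true_coef + 0.1 * rng.randn(100)

lasso = Lasso(alpha=0.1).fit(X, y)
ridge = Ridge(alpha=0.1).fit(X, y)
print(lasso.coef_)  # l1: the irrelevant coefficients are driven to (near) exact zero
print(ridge.coef_)  # l2: all coefficients shrink, but none becomes exactly zero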


Mathematically, we can represent the ridge objective as follows:

$$\min_{\beta} \sum_{i=1}^{n} \left( y_i - x_i^\top \beta \right)^2 + \lambda \sum_{j=1}^{p} \beta_j^2$$

And the lasso objective can be represented as:

$$\min_{\beta} \sum_{i=1}^{n} \left( y_i - x_i^\top \beta \right)^2 + \lambda \sum_{j=1}^{p} |\beta_j|$$

where the second term in each objective is the penalty added by the model and $\lambda \ge 0$ controls its strength.
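As a quick numeric check of the two penalty terms (the coefficient vector below is an arbitrary example):

import numpy as np

beta = np.array([0.5, -2.0, 0.0, 1.0])
l1_penalty = np.sum(np.abs(beta))  # lasso term: 0.5 + 2.0 + 0.0 + 1.0 = 3.5
l2_penalty = np.sum(beta ** 2)     # ridge term: 0.25 + 4.0 + 0.0 + 1.0 = 5.25
print(l1_penalty, l2_penalty)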

But these models have certain limitations. Ridge regression decreases the complexity of the model but does not eliminate any variables, so on a dataset with many predictors it can improve accuracy only up to a point. Lasso regression, on the other hand, can select at most as many predictors as there are observations, and when predictors are highly correlated it tends to pick one of them arbitrarily. These limitations are handled by the elastic net regression model, which includes both kinds of penalties (l1 and l2).

What is Elastic Net?

Elastic Net is a regularized regression model that combines the l1 and l2 penalties, i.e., those of lasso and ridge regression. We have discussed the limitations of lasso regression, notably its incapability of choosing more predictors than observations. The elastic net adds the l2 (ridge) penalty to the l1 penalty of lasso; with either penalty used in isolation, it reduces to ridge or lasso regression respectively. In the (naive) elastic net procedure, we first find the ridge regression coefficients, and then perform a lasso-type shrinkage on those coefficients.

In other words, the ridge step first shrinks the coefficients of all the variables in the dataset, and the lasso step then performs selection on those shrunken coefficients.

Mathematically, we can represent the elastic net objective as follows:

$$\min_{\beta} \sum_{i=1}^{n} \left( y_i - x_i^\top \beta \right)^2 + \lambda_1 \sum_{j=1}^{p} |\beta_j| + \lambda_2 \sum_{j=1}^{p} \beta_j^2$$
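In sklearn's parameterization, the two penalty weights are expressed through alpha and l1_ratio: the objective is (1 / (2 * n_samples)) * ||y - Xw||² + alpha * l1_ratio * ||w||₁ + 0.5 * alpha * (1 - l1_ratio) * ||w||². A small sketch of how the extremes of l1_ratio recover the two base models (the alpha value here is an arbitrary choice):

from sklearn.linear_model import ElasticNet

enet_as_lasso = ElasticNet(alpha=0.1, l1_ratio=1.0)  # pure l1 penalty: equivalent to lasso
enet_as_ridge = ElasticNet(alpha=0.1, l1_ratio=0.0)  # pure l2 penalty: ridge-like
enet_mixed = ElasticNet(alpha=0.1, l1_ratio=0.5)     # an equal mix of both penalties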

Implementing ElasticNet Regression

We can perform elastic net regression in our analysis using Python's sklearn library, where the linear_model package provides the ElasticNet module for regularization and variable selection. Next in the article, I will compare lasso and elastic net regression on the California housing data provided by sklearn. The data contains 20640 samples with eight features. For a more detailed description of the data, the reader can refer to this link.

Let’s start with loading the data.

from sklearn.datasets import fetch_california_housing

X_data, y_data = fetch_california_housing(return_X_y=True)

Splitting the data for training and testing purposes:

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X_data, y_data, test_size=0.3)

Checking the shape of the data:

print('shape of X :', X_data.shape, 'shape of Y :', y_data.shape)
print('shape of X-train :', X_train.shape, 'shape of Y-train :', y_train.shape)
print('shape of X-test :', X_test.shape, 'shape of Y-test :', y_test.shape)

Output:

Here we can see the structure of the data: 20640 samples with eight features, split into 14448 training and 6192 test samples.

Importing the lasso model and defining a model object:

from sklearn.linear_model import Lasso

alpha= 0.1
model_lasso = Lasso(alpha=alpha)

print(model_lasso)

Output:

Fitting lasso model:

model_lasso.fit(X_train, y_train)

pred_lasso = model_lasso.predict(X_test)

Checking for the R-Squared value:

from sklearn.metrics import r2_score

print("r^2 of lasso on test data : %f" % r2_score(y_test, pred_lasso))

Output:

Here we can see the R-squared value for the model. It is quite good but can be improved. Next in the article, we will try to improve the performance using an elastic net regression model, comparing on the R-squared value.

Importing the model and defining object for it:

from sklearn.linear_model import ElasticNet

model_enet = ElasticNet(alpha=alpha, l1_ratio=0.3)

print(model_enet)

Output:

Training the model:

model_enet.fit(X_train, y_train)
#Testing the model:
pred_enet = model_enet.predict(X_test)
print("r^2 on test data : %f" % r2_score(y_test, pred_enet))

Output:

Here we can see that we have improved the R-squared value using ElasticNet regression. We can also visualize the performance of the models.

To visualize the coefficients, we first construct a "true" coefficient vector with decreasing magnitudes and alternating signs:

import numpy as np

# decreasing coefficients with alternated signs, for visualization
idx = np.arange(8)  # the data has eight features
coef = (-1) ** idx * np.exp(-idx / 10)

y = np.dot(X_data, coef)
print(y)

Output:

Plotting the comparison graph of the sparse coefficients:

import matplotlib.pyplot as plt

m, s, _ = plt.stem(np.where(model_enet.coef_)[0], model_enet.coef_[model_enet.coef_ != 0],
                   markerfmt='bo', label='Elastic net coefficients')
plt.setp([m, s], color='green')

m, s, _ = plt.stem(np.where(model_lasso.coef_)[0], model_lasso.coef_[model_lasso.coef_ != 0],
                   markerfmt='x', label='Lasso coefficients')
plt.setp([m, s], color='red')

plt.stem(np.where(coef)[0], coef[coef != 0], label='true coefficients', markerfmt='bx')
plt.legend()
plt.title("Lasso R^2: %.3f, Elastic Net R^2: %.3f"
          % (r2_score(y_test, pred_lasso), r2_score(y_test, pred_enet)))
plt.show()

Output:

Here we can see the coefficients estimated by both models and compare them. The lasso performed almost equally to the elastic net, but in some cases the elastic net performed better, which is the reason behind the improved R-squared value of the elastic net model.

In this article, we have seen how we can improve the performance of regression models by using elastic net regression. We discussed the limitations of ridge and lasso regression and compared the performance scores of lasso and elastic net. Many parameters can cause drastic changes in performance, and cross-validation can help tune them. I encourage you to try those methods with the model as well to get more accurate results.
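As a hedged sketch of that tuning step (the candidate grids below are illustrative assumptions, not values from this tutorial), sklearn's ElasticNetCV can cross-validate alpha and l1_ratio in one call:

from sklearn.linear_model import ElasticNetCV

# search over a small grid of penalty strengths and l1/l2 mixes
cv_model = ElasticNetCV(l1_ratio=[0.1, 0.3, 0.5, 0.7, 0.9],
                        alphas=[0.01, 0.05, 0.1, 0.5, 1.0],
                        cv=5)
cv_model.fit(X_train, y_train)

print('best alpha   :', cv_model.alpha_)
print('best l1_ratio:', cv_model.l1_ratio_)
print('test R^2     :', cv_model.score(X_test, y_test))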

