Hands-On Tutorial on ElasticNet Regression

Elastic Net is a regularized regression model that combines l1 and l2 penalties, i.e., lasso and ridge regression. Regularization helps address overfitting in models.

Elastic Net is a regression method that performs variable selection and regularization simultaneously. Regularization is the main concept behind the elastic net, and it comes into the picture when a model overfits. Overfitting occurs when a model performs well on the training dataset but makes large errors on the test dataset. Regularization reduces these errors by adding a term to the loss function that discourages overly complex fits to the training data. These added terms are called penalties.

There are two common types of penalty, l1 and l2. A linear regression model that uses the l1 penalty is called lasso regression, and one that uses the l2 penalty is called ridge regression. Lasso regression adds the absolute value of the magnitude of each coefficient to the loss function as a penalty term. Ridge regression adds the squared magnitude of each coefficient as a penalty on the loss function.
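
As a minimal sketch of the two penalty terms described above, the functions below compute them for a list of coefficients. The names `l1_penalty`, `l2_penalty`, and `lam` (the regularization strength) are illustrative assumptions and do not come from any library.

```python
def l1_penalty(coefs, lam):
    """Lasso-style penalty: lam times the sum of absolute coefficient values."""
    return lam * sum(abs(c) for c in coefs)


def l2_penalty(coefs, lam):
    """Ridge-style penalty: lam times the sum of squared coefficient values."""
    return lam * sum(c * c for c in coefs)


# Hypothetical coefficient vector and regularization strength.
coefs = [3.0, -0.5, 0.0, 2.0]
lam = 0.1

print("l1 penalty:", l1_penalty(coefs, lam))
print("l2 penalty:", l2_penalty(coefs, lam))
```

Because the l2 penalty squares each coefficient, it penalizes large coefficients much more heavily than small ones, while the l1 penalty grows at the same rate for every coefficient.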

Lasso stands for least absolute shrinkage and selection operator. As the name suggests, lasso regression shrinks the coefficients towards zero and can set some of them exactly to zero, which eliminates the corresponding variables from the model. Ridge regression does not eliminate coefficients from the model: it shrinks all of them through the l2 penalty on their squared magnitude but keeps every predictor, so it does not separate important from less important predictive variables.
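
The difference in shrinkage behaviour can be sketched in a simplified setting. The example assumes orthonormal predictors, a lasso objective of the form 1/2 * squared error + lam * l1 penalty, and a ridge objective of the form 1/2 * squared error + (lam / 2) * l2 penalty. Under these assumptions each coefficient can be obtained from its ordinary least squares value on its own. The function names and the starting coefficients are hypothetical.

```python
def lasso_shrink(b, lam):
    """Soft-thresholding: move b towards zero by lam and stop at exactly zero."""
    if b > lam:
        return b - lam
    if b < -lam:
        return b + lam
    return 0.0


def ridge_shrink(b, lam):
    """Proportional shrinkage: scale b towards zero without ever reaching it."""
    return b / (1.0 + lam)


# Hypothetical least squares coefficients for four predictors.
ols_coefs = [2.0, -1.0, 0.3, -0.05]
lam = 0.5

lasso_coefs = [lasso_shrink(b, lam) for b in ols_coefs]
ridge_coefs = [ridge_shrink(b, lam) for b in ols_coefs]

print("lasso:", lasso_coefs)
print("ridge:", ridge_coefs)
```

In this sketch the two small coefficients become exactly zero under the lasso rule, while the ridge rule keeps all four coefficients non-zero.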



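Mathematically, we can represent the ridge function as follows.

$$\hat{\beta}^{\text{ridge}} = \arg\min_{\beta} \sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j \Big)^2 + \lambda \sum_{j=1}^{p} \beta_j^2$$

And the lasso function can be represented as:

$$\hat{\beta}^{\text{lasso}} = \arg\min_{\beta} \sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j \Big)^2 + \lambda \sum_{j=1}^{p} |\beta_j|$$

In both expressions, the final term, scaled by $\lambda$, is the penalty added by the model.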

But these models have certain limitations. Ridge regression reduces the complexity of the model but never removes variables, so irrelevant predictors remain in the model, and on a dataset with many predictors it can improve accuracy only up to a point. Lasso regression can select at most as many predictors as there are observations, regardless of how many predictors are present in the data. The elastic net regression model addresses these limitations by including both kinds of penalties (l1 and l2) in the model.

What is Elastic Net?

Elastic Net is a regularized regression model that combines the l1 and l2 penalties, i.e., those of lasso and ridge regression. We have discussed the limitation of lasso regression in the number of predictors it can select. The elastic net includes the penalties of both lasso and ridge regression; when only the l2 penalty is used, it becomes ridge regression, and when only the l1 penalty is used, it becomes lasso regression. The regularization procedure of the elastic net can be described in two stages: first, we find the ridge regression coefficients; after this, we apply lasso-type shrinkage to those coefficients.

[Figure: diagram of the elastic net procedure]

In this procedure, after the ridge regression step, the lasso step takes part and considers all the variables from the dataset.

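Mathematically, we can represent the elastic net as follows.

$$\hat{\beta}^{\text{enet}} = \arg\min_{\beta} \sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j \Big)^2 + \lambda_1 \sum_{j=1}^{p} |\beta_j| + \lambda_2 \sum_{j=1}^{p} \beta_j^2$$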

Implementing ElasticNet Regression

We can use ElasticNet in our analysis through Python's sklearn library, where the linear_model module provides the ElasticNet class for regularization and variable selection. Next, I will compare lasso and elastic net regression on the California housing data provided by sklearn. The data has 20640 samples with eight features. For a more detailed description of the data, the reader can refer to the dataset's documentation.

Let’s start with loading the data.

from sklearn.datasets import fetch_california_housing

X_data, y_data = fetch_california_housing(return_X_y=True)

Splitting the data for training and testing purposes:

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X_data, y_data, test_size=0.3)

Checking the shape of the data:

print('shape of X :', X_data.shape, 'shape of Y :', y_data.shape)
print('shape of X-train :', X_train.shape, 'shape of Y-train :', y_train.shape)
print('shape of X-test :', X_test.shape, 'shape of Y-test :', y_test.shape)


Here we can see the structure of the data.

Importing the lasso model and defining a model object:

from sklearn.linear_model import Lasso

alpha = 0.1
model_lasso = Lasso(alpha=alpha)



Fitting lasso model:

model_lasso.fit(X_train, y_train)

pred_lasso = model_lasso.predict(X_test)

Checking for the R-Squared value:

from sklearn.metrics import r2_score

print("r^2 of lasso on test data : %f" % r2_score(y_test, pred_lasso))


Here we can see the R-squared value for the model. It leaves room for improvement. Next, we will try to improve on it using an elastic net regression model, again judged by the R-squared value.
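
For readers who want to see what the R-squared score measures, the sketch below computes it by hand as one minus the ratio of the residual sum of squares to the total sum of squares, which is the definition used by `r2_score`. The function name `r_squared` and the sample values are hypothetical.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    # Squared error of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Squared error of always predicting the mean of y_true.
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical targets and predictions.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 7.0, 9.5]

print("r^2 :", r_squared(y_true, y_pred))
```

A score of 1 means perfect predictions, and a score of 0 means the model does no better than always predicting the mean of the targets.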

Importing the model and defining an object for it:

from sklearn.linear_model import ElasticNet

model_enet = ElasticNet(alpha=alpha, l1_ratio=0.3)
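
To make the roles of `alpha` and `l1_ratio` concrete, the sketch below evaluates the objective that scikit-learn documents for `ElasticNet`: the squared error divided by twice the number of samples, plus `alpha * l1_ratio` times the l1 penalty, plus `0.5 * alpha * (1 - l1_ratio)` times the l2 penalty. The function `elastic_net_objective` is a hypothetical helper written in plain Python, it ignores the intercept for simplicity, and the tiny dataset is made up for illustration.

```python
def elastic_net_objective(X, y, w, alpha, l1_ratio):
    """Elastic net objective in scikit-learn's parameterization (no intercept)."""
    n = len(y)
    # Residual of each sample under the linear model X @ w.
    residuals = [
        yi - sum(xij * wj for xij, wj in zip(xi, w))
        for xi, yi in zip(X, y)
    ]
    squared_error = sum(r * r for r in residuals) / (2 * n)
    l1 = sum(abs(wj) for wj in w)
    l2 = sum(wj * wj for wj in w)
    return (squared_error
            + alpha * l1_ratio * l1
            + 0.5 * alpha * (1 - l1_ratio) * l2)


# Made-up data that the weights [1, 2] fit exactly, so only the penalty remains.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [1.0, 2.0, 3.0]
w = [1.0, 2.0]

print("mixed penalty :", elastic_net_objective(X, y, w, alpha=0.1, l1_ratio=0.3))
print("l1 only       :", elastic_net_objective(X, y, w, alpha=0.1, l1_ratio=1.0))
print("l2 only       :", elastic_net_objective(X, y, w, alpha=0.1, l1_ratio=0.0))
```

With `l1_ratio=1.0` only the lasso penalty remains, and with `l1_ratio=0.0` only the ridge penalty remains, so `l1_ratio=0.3` weights the mix towards the l2 term.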



Training the model:

model_enet.fit(X_train, y_train)
# Testing the model:
pred_enet = model_enet.predict(X_test)
print("r^2 on test data : %f" % r2_score(y_test, pred_enet))


Here we can see that we have improved the R-squared value using ElasticNet regression. We can also visualize the performance of the models.

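Defining decreasing coefficients with alternating signs for visualization:

import numpy as np

# Synthetic reference coefficients (not the true coefficients of the housing data)
idx = np.arange(8)  # one index per feature

coef = (-1) ** idx * np.exp(-idx / 10)
coef[10:] = 0  # no effect here: there are only 8 coefficients

y = np.dot(X_data, coef)  # synthetic target; not used in the plot below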


Plotting a comparison of the sparse coefficients:

import matplotlib.pyplot as plt

# Elastic net: stems at the indices of the non-zero coefficients
m, s, _ = plt.stem(np.where(model_enet.coef_)[0],
                   model_enet.coef_[model_enet.coef_ != 0],
                   markerfmt='bo', label='Elastic net coefficients')
plt.setp([m, s], color="green")

# Lasso: stems at the indices of the non-zero coefficients
m, s, _ = plt.stem(np.where(model_lasso.coef_)[0],
                   model_lasso.coef_[model_lasso.coef_ != 0],
                   markerfmt='x', label='Lasso coefficients')
plt.setp([m, s], color='red')

# Synthetic reference coefficients, shown only for visual comparison
plt.stem(np.where(coef)[0], coef[coef != 0],
         label='reference coefficients', markerfmt='bx')
plt.legend()
plt.title("Lasso R^2: %.3f, Elastic Net R^2: %.3f"
          % (r2_score(y_test, pred_lasso), r2_score(y_test, pred_enet)))
plt.show()


Here we can see the coefficients estimated by both models and compare them. The lasso performed almost as well as the ElasticNet, but for some coefficients the elastic net did better, which is the reason behind the improved R-squared value of the elastic net model.

In this article, we have seen how elastic net regression can improve the performance of regression models. We discussed the limitations of ridge and lasso regression and compared the performance scores of lasso and ElasticNet. Parameters such as alpha and l1_ratio can change the performance drastically, and cross-validation methods can be used to check and tune them. I encourage you to apply those methods to the model as well to get more accurate results.
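
One possible way to apply cross-validation is scikit-learn's `ElasticNetCV`, which searches over `alpha` and a list of `l1_ratio` values. The sketch below is only an illustration: it uses synthetic data from `make_regression` instead of the housing data so that it runs without a download, and the grid of `l1_ratio` values, the number of folds, and the random seeds are arbitrary assumptions.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNetCV
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the housing data (assumption made for this sketch).
X, y = make_regression(n_samples=300, n_features=8, n_informative=5,
                       noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Candidate mixes between the l2 penalty (near 0) and the l1 penalty (1.0).
l1_ratios = [0.1, 0.3, 0.5, 0.7, 0.9, 1.0]

# 5-fold cross-validation over an automatic alpha grid for each l1_ratio.
model_cv = ElasticNetCV(l1_ratio=l1_ratios, cv=5, random_state=0)
model_cv.fit(X_train, y_train)

print("chosen alpha    :", model_cv.alpha_)
print("chosen l1_ratio :", model_cv.l1_ratio_)
print("r^2 on test data:", model_cv.score(X_test, y_test))
```

The selected `alpha_` and `l1_ratio_` depend on the data and on the grid, so values found on synthetic data should not be carried over to the housing data.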



Yugesh Verma
Yugesh is a graduate in automobile engineering and worked as a data analyst intern. He completed several Data Science projects. He has a strong interest in Deep Learning and writing blogs on data science and machine learning.
