Elastic Net is a regression method that performs variable selection and regularization simultaneously. Regularization is the main concept behind the elastic net, and it comes into the picture when a model overfits. Overfitting occurs when a model performs well on the training dataset but gives large errors on the test dataset. Regularization reduces this problem by adding a term to the loss function that discourages large coefficients, so the fitted function follows the training data less closely. These added terms are called penalties.
There are two common types of penalties, l1 and l2. A model that uses the l1 penalty for regularization is called a lasso regression model, and a model that uses the l2 penalty is called a ridge regression model. Lasso regression adds the sum of the absolute values of the coefficients to the loss function as a penalty term. Ridge regression adds the sum of the squared coefficients as a penalty on the loss function.
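As a small worked example of the two penalties described above, the sketch below computes them for one coefficient vector. The function names and the coefficient values are made up for illustration and are not part of any library.

```python
def l1_penalty(coefs):
    """Sum of absolute coefficient values (the lasso penalty before weighting)."""
    return sum(abs(c) for c in coefs)


def l2_penalty(coefs):
    """Sum of squared coefficient values (the ridge penalty before weighting)."""
    return sum(c * c for c in coefs)


# Hypothetical coefficients, chosen only to make the arithmetic easy to follow.
coefs = [0.5, -2.0, 0.0, 1.5]

print("l1 penalty:", l1_penalty(coefs))  # 0.5 + 2.0 + 0.0 + 1.5 = 4.0
print("l2 penalty:", l2_penalty(coefs))  # 0.25 + 4.0 + 0.0 + 2.25 = 6.5
```

Note how the l2 penalty is dominated by the largest coefficient, while the l1 penalty grows linearly with each coefficient.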
Lasso stands for least absolute shrinkage and selection operator. As the name suggests, lasso regression shrinks the coefficients toward zero, and when the penalty is strong enough it sets some of them exactly to zero, which eliminates those predictors from the model. Ridge regression does not eliminate coefficients from the model: the l2 penalty shrinks every coefficient toward zero, but not exactly to zero, so the model keeps all predictive variables, whether they are important or not.
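Mathematically, we can represent the ridge loss function as follows.
$$L_{ridge}(\beta) = \sum_{i=1}^{n} \left( y_i - \sum_{j=1}^{p} x_{ij} \beta_j \right)^2 + \lambda \sum_{j=1}^{p} \beta_j^2$$
And the lasso loss function can be represented as:
$$L_{lasso}(\beta) = \sum_{i=1}^{n} \left( y_i - \sum_{j=1}^{p} x_{ij} \beta_j \right)^2 + \lambda \sum_{j=1}^{p} |\beta_j|$$
Where the last term of each formula, weighted by $\lambda$, represents the penalty added by the model.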
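To see why lasso can eliminate a coefficient while ridge cannot, it helps to look at the simplest possible case. The sketch below assumes a single coefficient whose unpenalized least-squares estimate is b, so the lasso objective is (b - beta)^2 + lam * |beta| and the ridge objective is (b - beta)^2 + lam * beta^2. The function names are illustrative, and the closed forms hold only under this one-coefficient assumption.

```python
def lasso_1d(b, lam):
    """Minimizer of (b - beta)**2 + lam * abs(beta): soft-thresholding at lam / 2."""
    shrunk = max(abs(b) - lam / 2.0, 0.0)
    if shrunk == 0.0:
        return 0.0  # small estimates are set exactly to zero
    return shrunk if b > 0 else -shrunk


def ridge_1d(b, lam):
    """Minimizer of (b - beta)**2 + lam * beta**2: proportional shrinkage."""
    return b / (1.0 + lam)


lam = 1.0
for b in (2.0, 0.3, -0.4):
    print("b = %5.2f  lasso = %5.2f  ridge = %5.2f" % (b, lasso_1d(b, lam), ridge_1d(b, lam)))
```

With lam = 1.0, the lasso estimate is exactly zero for the two small values of b, while the ridge estimate is only halved and never reaches zero.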
But these models have certain limitations. Ridge regression reduces the complexity of the model but does not eliminate any variables, so on a dataset with many irrelevant predictors it can improve accuracy only up to a point, and a model that retains all of those variables can stop performing well. Lasso regression, when there are more predictors than observations, can select at most as many predictors as there are observations, regardless of how many predictors are actually relevant. These limitations can be handled by the elastic net regression model, which includes both kinds of penalties (l1 and l2).
What is Elastic Net?
Elastic Net is a regularized regression model that combines the l1 and l2 penalties, i.e., those of lasso and ridge regression. We have discussed the limitation of lasso regression, namely its inability to select more predictors than there are observations. The elastic net adds the ridge penalty to the lasso penalty: when the l2 weight is zero it reduces to lasso regression, and when the l1 weight is zero it reduces to ridge regression. One way to view regularization with an elastic net is as a two-stage procedure: first, we find the ridge regression coefficients, and then we apply a lasso-type shrinkage to those coefficients.
The following diagram makes this easier to understand.
Here we can see that the lasso step follows the ridge step, and that the procedure considers all the variables in the dataset.
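Mathematically, we can represent the elastic net loss function as follows, where $\lambda_1$ and $\lambda_2$ weight the l1 and l2 penalties.
$$L_{enet}(\beta) = \sum_{i=1}^{n} \left( y_i - \sum_{j=1}^{p} x_{ij} \beta_j \right)^2 + \lambda_1 \sum_{j=1}^{p} |\beta_j| + \lambda_2 \sum_{j=1}^{p} \beta_j^2$$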
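The elastic net loss can be minimized one coefficient at a time. The sketch below is a minimal pure-Python coordinate descent for the objective RSS + lam1 * sum(|beta_j|) + lam2 * sum(beta_j^2), with no intercept. It is an illustration under those assumptions, not how any particular library implements the method; the function names and the toy data are made up, and the data are built so that y = 3 * x0 exactly.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t, returning exactly 0.0 when |z| <= t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def enet_objective(X, y, beta, lam1, lam2):
    """Residual sum of squares plus the weighted l1 and l2 penalties."""
    rss = sum((yi - sum(x * b for x, b in zip(row, beta))) ** 2
              for row, yi in zip(X, y))
    return rss + lam1 * sum(abs(b) for b in beta) + lam2 * sum(b * b for b in beta)


def enet_coordinate_descent(X, y, lam1, lam2, n_iter=200):
    """Cycle through the coefficients, solving each one-dimensional problem exactly."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0   # correlation of feature j with the partial residual
            norm = 0.0  # squared norm of feature j
            for row, yi in zip(X, y):
                pred_others = sum(row[k] * beta[k] for k in range(p) if k != j)
                rho += row[j] * (yi - pred_others)
                norm += row[j] ** 2
            # l1 part: soft-thresholding; l2 part: extra term in the denominator.
            beta[j] = soft_threshold(rho, lam1 / 2.0) / (norm + lam2)
    return beta


# Toy data (hypothetical): only the first feature matters, y = 3 * x0.
X = [[1, 0, 1], [0, 1, 1], [1, 1, 0], [2, 1, 1], [1, 2, 0], [0, 1, 2]]
y = [3, 0, 3, 6, 3, 0]

print("no penalty  :", enet_coordinate_descent(X, y, lam1=0.0, lam2=0.0))
print("elastic net :", enet_coordinate_descent(X, y, lam1=4.0, lam2=1.0))
print("strong l1   :", enet_coordinate_descent(X, y, lam1=1000.0, lam2=1.0))
```

Without penalties the solver should recover coefficients close to 3, 0, 0, and with a very large lam1 every coefficient is thresholded to exactly zero. The update line shows both penalties at work: the soft-threshold comes from the l1 term and the enlarged denominator from the l2 term.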
Implementing ElasticNet Regression
We can perform ElasticNet in our analysis using Python's sklearn library, whose linear_model module provides the ElasticNet class for regularization and variable selection. Next, I will compare lasso and elastic net regression on the California housing data provided by sklearn. The data has 20640 samples with eight features. For a more detailed description of the data, the reader can refer to this link.
Let’s start with loading the data.
from sklearn.datasets import fetch_california_housing

X_data, y_data = fetch_california_housing(return_X_y=True)
Splitting the data for training and testing purposes:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X_data, y_data, test_size=0.3)
Checking the shape of the data:
print('shape of X :', X_data.shape, 'shape of Y :', y_data.shape)
print('shape of X-train :', X_train.shape, 'shape of Y-train :', y_train.shape)
print('shape of X-test :', X_test.shape, 'shape of Y-test :', y_test.shape)
Here we can see the structure of the data.
Importing the lasso model and instantiating a model object:
from sklearn.linear_model import Lasso

alpha = 0.1
model_lasso = Lasso(alpha=alpha)
print(model_lasso)
Fitting lasso model:
model_lasso.fit(X_train, y_train)
pred_lasso = model_lasso.predict(X_test)
Checking for the R-Squared value:
from sklearn.metrics import r2_score

print("r^2 of lasso on test data : %f" % r2_score(y_test, pred_lasso))
Here we can see the R-squared value for the lasso model. It is reasonably good but can be improved. Next, we will try to improve the R-squared value using an elastic net regression model.
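For readers who want to see what the R-squared score measures, the sketch below computes it by hand as 1 - SS_res / SS_tot, which is the standard definition of the coefficient of determination. The function name r2 and the sample values are illustrative only.

```python
def r2(y_true, y_pred):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical targets and three sets of predictions.
y_true = [3.0, 5.0, 7.0, 9.0]

print(r2(y_true, [3.0, 5.0, 7.0, 9.0]))  # perfect predictions give 1.0
print(r2(y_true, [6.0, 6.0, 6.0, 6.0]))  # always predicting the mean gives 0.0
print(r2(y_true, [9.0, 7.0, 5.0, 3.0]))  # worse than the mean gives a negative score (-3.0)
```

A score of 1.0 is the best possible, 0.0 matches a model that always predicts the mean, and negative values mean the model does worse than that baseline.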
Importing the model and defining an object for it:
from sklearn.linear_model import ElasticNet
model_enet = ElasticNet(alpha=alpha, l1_ratio=0.3)
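It may help to see how the alpha and l1_ratio arguments translate into penalty weights. According to the scikit-learn documentation, ElasticNet minimizes 1 / (2 * n_samples) * ||y - Xw||^2_2 + alpha * l1_ratio * ||w||_1 + 0.5 * alpha * (1 - l1_ratio) * ||w||^2_2. The helper below is a hypothetical convenience function, not part of scikit-learn, that simply evaluates those two weights.

```python
def sklearn_enet_weights(alpha, l1_ratio):
    """Return (l1_weight, l2_weight) as they appear in scikit-learn's ElasticNet objective.

    Hypothetical helper for illustration only; scikit-learn does not expose it.
    """
    l1_weight = alpha * l1_ratio
    l2_weight = 0.5 * alpha * (1.0 - l1_ratio)
    return l1_weight, l2_weight


# The settings used in this article: alpha = 0.1, l1_ratio = 0.3.
print(sklearn_enet_weights(0.1, 0.3))

# l1_ratio = 1.0 leaves only the l1 term, i.e. the lasso penalty.
print(sklearn_enet_weights(0.1, 1.0))
```

With l1_ratio=0.3 the l1 weight is about 0.03 and the l2 weight about 0.035, so the model in this article leans slightly toward the ridge side of the mix.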
Training the model:
model_enet.fit(X_train, y_train)

# Testing the model:
pred_enet = model_enet.predict(X_test)
print("r^2 on test data : %f" % r2_score(y_test, pred_enet))
Here we can see that the R-squared value improved with the ElasticNet regression. We can also visualize the performance of the models.
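Generating decreasing coefficients with alternating signs for visualization:
import numpy as np

# Synthetic reference coefficients: decaying magnitude, alternating sign
idx = np.arange(8)
coef = (-1) ** idx * np.exp(-idx / 10)
coef[10:] = 0  # sparsify coef (no effect here, since coef has only 8 entries)
y = np.dot(X_data, coef)
print(y)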
Plotting the comparison graph of the sparse coefficients:
import matplotlib.pyplot as plt
import numpy as np

# np.where returns a tuple of index arrays; [0] selects the indices of the non-zero coefficients
m, s, _ = plt.stem(np.where(model_enet.coef_)[0], model_enet.coef_[model_enet.coef_ != 0],
                   markerfmt='bo', label='Elastic net coefficients')
plt.setp([m, s], color="green")
m, s, _ = plt.stem(np.where(model_lasso.coef_)[0], model_lasso.coef_[model_lasso.coef_ != 0],
                   markerfmt='x', label='Lasso coefficients')
plt.setp([m, s], color='red')
# coef is the synthetic reference vector defined earlier, not a fitted result
plt.stem(np.where(coef)[0], coef[coef != 0], label='true coefficients', markerfmt='bx')
plt.legend()
plt.title("Lasso R^2: %.3f, Elastic Net R^2: %.3f"
          % (r2_score(y_test, pred_lasso), r2_score(y_test, pred_enet)))
plt.show()
Here we can see the coefficients estimated by both models and compare them. The lasso performed almost equally to the ElasticNet, but for some coefficients the elastic net performed better than the lasso, which is the reason behind the improved R-squared value of the elastic net model.
In this article, we have seen how we can improve the performance of regression models by using elastic net regression. We discussed the limitations of ridge and lasso regression and compared the performance scores of lasso and ElasticNet. Hyperparameters such as alpha and l1_ratio can change the performance drastically, and cross-validation methods can be used to check and tune them. I encourage you to apply those methods to the model as well to get more accurate results.
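As a starting point for the cross-validation suggested above, here is a minimal sketch using scikit-learn's ElasticNetCV, which searches over alpha and a list of l1_ratio values. It uses synthetic data from make_regression so that it runs without downloading anything; the grid of l1_ratio values, the data sizes, and the random seeds are arbitrary choices for illustration, not recommendations.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNetCV
from sklearn.model_selection import train_test_split

# Synthetic regression problem: 20 features, only 5 of them informative.
X, y = make_regression(n_samples=300, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Candidate mixes between ridge-like (small values) and lasso (1.0).
l1_grid = [0.1, 0.3, 0.5, 0.7, 0.9, 1.0]

# 5-fold cross-validation over the l1_ratio grid and an automatic alpha path.
model_cv = ElasticNetCV(l1_ratio=l1_grid, cv=5, random_state=0)
model_cv.fit(X_train, y_train)

print("chosen alpha    :", model_cv.alpha_)
print("chosen l1_ratio :", model_cv.l1_ratio_)
print("r^2 on test data:", model_cv.score(X_test, y_test))
```

To apply the same idea to the California housing data, the X_train and y_train arrays from the earlier split can be passed to fit in place of the synthetic ones.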