Excel sheets were so far used for storing small to medium-sized datasets either as CSV or in XLS formats and Pandas were used to read them. But what if I told you, you can now build machine learning models with excel itself? Wouldn’t that make things easy? You can store your data as CSV and apply the machine learning algorithm directly to the dataset. What is even better if you don’t have to code anything. Models like linear regression can be easily applied to the data through Microsoft excel.
In this article, we will learn about how to implement a predictive model using MS excel and implement a linear regression algorithm.
Getting the ToolPack in Excel.
In order to build models like linear regression, we need to first locate the packages to do these. The packages are available under Analysis ToolPack add-in. To access this, first, go to file→ options.
Sign up for your weekly dose of what's up in emerging technology.
Upon selecting the options you will see the following display.
Here, you will have to select the ‘Add-ins’ option and then select ok. Once you have selected this, you will see a number of packages under the add-ins.
Download our Mobile App
Among these packages, you will be able to locate the Analysis ToolPack. Select the package and click on GO.
Then, another pop-up is displayed in front of you.
Here, select the first package option and select ok. Now you have the package ready to be used.
Getting the Dataset
Since the analysis ToolPack is a great tool for regression algorithms, we will select a dataset that is suitable for linear regression. The dataset chosen for this project is the Boston housing dataset. The aim here is to predict a house price in Boston based on the features like the number of rooms, area etc. I downloaded the dataset from Kaggle. You can click here to download the dataset here. Download the data and open it with excel.
Here we have the features and the medv column is the target for us.
Linear Regression with Excel
Now that we have the dataset with us and the packages required, let us now start the linear regression modelling. To do this, first, go to the taskbar and select an option called Data.
The last option is called data analysis which contains the package for performing linear regression. Select this option and then click on regression.
As you can see apart from regression analysis, there is covariance, statistics and other options for performing data analysis.
After selecting regression you will see that you now have to select the Y and X ranges for the analysis to take place.
Here, you will have to select the range of data points in the Y and X axes. Click on the small arrow beside the column and since our target column is medv, drag and select how many data points you want. After this tick the residuals box and click on ok.
These points get selected as a target for training and repeat the same for even the feature column as well. I have selected the LSTAT as the feature to be used. Once you have made this selection just click on okay and almost immediately you will see the result of your analysis on your screen.
Here there are three result tables before us. The first one is the regression statistics. This contains the value of R square, that indicates the goodness of the fit. Since I chose only 154 data points, we have got an output of 0.48 which is a decent value.
The next table is the ANOVA which stands for analysis of variance. The important feature here is the significance F column which indicates whether the model is significant statistically or not. This is an indication that our results are not random and have a relationship. This value is considered good if it lies below 0.05 p-value. Ours lies below that and can be considered to be a good model.
The last table gives the coefficient components of regression. It indicates the relationship between the x value and the intercept in the equation y=mx+c.
Finally, you will see the residual output values below.
This value indicates the predicted values and the residual value is by how much it varies from the actual value. This can be useful in getting insights about model performance.
In this article, we saw how to use excel to build a linear regression model and analyse the model without actually writing any code at all. The ToolPack is a package with a wide range of options for data analysis and can help data scientists to get some basic information about the data before applying complex machine learning models on the data.