# Comprehensive Guide To Demand Forecasting Using ML.NET

Time series forecasting in machine learning is the task of fitting a model to historical data, analyzing the patterns in it, and predicting future trends or observations. In conventional statistical terms, making such predictions is called ‘extrapolation’, while modern domains refer to it as ‘forecasting’. In this article, we create a .NET Core console application that forecasts bike-sharing demand using a time series forecasting method and the ML.NET framework.

Not familiar with the fundamentals of ML.NET? Refer to our previous article before proceeding!

## Problem statement

Bike-sharing systems let users rent a bike at one location and return it at another on an as-needed basis. The whole process of obtaining a membership, renting bikes and returning them is automated via a network of kiosks throughout a city. The task here is to forecast future bike-sharing demand by studying time series data comprising counts of bikes rented through the Capital Bikeshare program in Washington, D.C.

Here, we use ‘univariate’ time series forecasting, which means that a single numerical observation, i.e. the count of bikes rented at specific intervals over a period of time, is taken into account for analysis.

## Dataset used

The Bike Sharing dataset used here comes from the UCI Machine Learning Repository. It is a univariate dataset with 17,389 instances and 16 attributes. It contains the hourly and daily counts of bikes rented during 2011-2012 in the Capital Bikeshare system, along with the corresponding weather and seasonal information. Condense the dataset so that it contains only the following three columns required for time series forecasting:

• dteday: date of observation.
• year: encoded year of observation (‘0’ for 2011, ‘1’ for 2012).
• cnt: total number of bike rentals for that day.
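One way to condense the file is a small helper that keeps only named columns from CSV text. The sketch below is an illustration rather than part of the original walkthrough: the class and method names are hypothetical, and it assumes a plain comma-separated file with a header row and no quoted fields.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CsvCondenser
{
    // Keeps only the named columns (in the order given) from CSV lines
    // whose first line is a header. Assumes no quoted or escaped commas.
    public static List<string> SelectColumns(IEnumerable<string> lines, params string[] keep)
    {
        var rows = lines.ToList();
        var result = new List<string>();
        if (rows.Count == 0) return result;

        // Locate each requested column in the header row
        string[] header = rows[0].Split(',');
        int[] indexes = keep.Select(name => Array.IndexOf(header, name)).ToArray();
        if (indexes.Any(i => i < 0))
            throw new ArgumentException("A requested column is missing from the header.");

        // Project every row (header included) onto the requested columns
        foreach (string row in rows)
        {
            string[] cells = row.Split(',');
            result.Add(string.Join(",", indexes.Select(i => cells[i])));
        }
        return result;
    }
}
```

It could be called with the lines of the downloaded file (for example via File.ReadLines) and the three header names as they appear in your copy of the data; both the file path and the exact header spellings are assumptions to check against the file itself.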

## Algorithm used

We use the Singular Spectrum Analysis (SSA) algorithm here. SSA decomposes a time series into a set of principal components. These components can be interpreted as the parts of a signal that correspond to factors such as trend, seasonality and noise. They are then reconstructed and used to forecast future values.
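To make the decomposition step more concrete: the first stage of SSA, usually called embedding, arranges overlapping windows of the series into a ‘trajectory matrix’, which is the object that gets decomposed afterwards. The following is a minimal sketch of that stage only, with hypothetical names; it is not ML.NET’s internal implementation, which may differ.

```csharp
using System;

public static class SsaSketch
{
    // Builds the trajectory (Hankel) matrix of a series.
    // For a series of length N and a window of length L, the result has
    // L rows and K = N - L + 1 columns, where column j holds the window
    // series[j .. j + L - 1].
    public static double[,] Embed(double[] series, int window)
    {
        if (window < 1 || window > series.Length)
            throw new ArgumentOutOfRangeException(nameof(window));

        int columns = series.Length - window + 1;
        var matrix = new double[window, columns];
        for (int i = 0; i < window; i++)
        {
            for (int j = 0; j < columns; j++)
            {
                // Entry (i, j) depends only on i + j, so anti-diagonals are constant
                matrix[i, j] = series[i + j];
            }
        }
        return matrix;
    }
}
```

For a daily series, a window of 7 means each column of the matrix covers one week of consecutive observations.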

## Prerequisites for the implementation

• Use Visual Studio 2019 or a later version
• Or use Visual Studio 2017 version 15.6 or later with the “.NET Core cross-platform development” workload installed
• Use a SQL Server database for data storage, access and retrieval

Create your C# .NET Core console application and then install the Microsoft.ML NuGet package.

The code also uses the System.Data.SqlClient and Microsoft.ML.TimeSeries NuGet packages, so install them as well.

Open the Program.cs file and replace the ‘using’ statements with the following ones:

```
using System;
using System.Collections.Generic;
using System.Data.SqlClient;
using System.IO;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;
using Microsoft.ML.Transforms.TimeSeries;
```

## Data Preparation

Map the data into an SQL database table as follows:

```
CREATE TABLE [RentalData] (
    [Dt] DATE NOT NULL,
    [Yr] INT NOT NULL,
    [TotalRentals] INT NOT NULL
);
```

The data table should look something like this:

Define the model input schema below the ‘Program’ class by creating a class, say ‘Input’ as follows:

```
public class Input
{
    public DateTime Dt { get; set; }        //date of observation
    public float Yr { get; set; }           //year of observation
    public float TotalRentals { get; set; } //number of rentals of that day
}
```

Then define the output schema as follows:

```
public class Output
{
    //predicted values for forecasted period
    public float[] PredRentals { get; set; }

    //minimum predicted values for forecasted period
    public float[] MinPred { get; set; }

    //maximum predicted values for forecasted period
    public float[] MaxPred { get; set; }
}
```

## Path definitions and variable initialization

Inside the Main method, store path locations of your data and trained model and define the connection string.

```
//root directory
string root = Path.GetFullPath(Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "../../../"));

//data location
string dbPath = Path.Combine(root, "Data", "DailyDemand.mdf");

//path to save trained model
string modelPath = Path.Combine(root, "MLModel.zip");

//connection string
var conStr = $"Data Source=(LocalDB)\\MSSQLLocalDB;AttachDbFilename={dbPath};Integrated Security=True;Connect Timeout=40;";
```

Instantiate the MLContext class, then create a DatabaseLoader that loads records of type ‘Input’.

`MLContext myContext = new MLContext();`

`DatabaseLoader dbLoader = myContext.Data.CreateDatabaseLoader<Input>();`

Define the query that loads the data from the database:

`string query = "SELECT Dt, CAST(Yr as REAL) as Yr, CAST(TotalRentals as REAL) as TotalRentals FROM RentalData";`

ML.NET algorithms expect numeric data of type Single, i.e. single-precision floating-point values. Numerical values that are not of type REAL in the database therefore need to be converted to REAL using SQL’s built-in CAST function. The integer columns ‘Yr’ and ‘TotalRentals’ are converted to REAL in the query above.

Create a DatabaseSource for connecting the database and executing the query

```
DatabaseSource dbSrc = new DatabaseSource(SqlClientFactory.Instance, conStr, query);
```

Load the data into an IDataView

`IDataView myData = dbLoader.Load(dbSrc);`

The RentalData table has data for two years (2011 and 2012). Form a separate IDataView for each year. The 2011 data is used for model training and the 2012 data for model testing.

```
/* Create an IDataView of the first year’s (2011) data. The ‘Yr’ column is 0 for 2011,
   so set upperBound to 1 to keep only rows with Yr < 1 */
IDataView fy = myContext.Data.FilterRowsByColumn(myData, "Yr", upperBound: 1);

/* Create an IDataView of the second year’s (2012) data. The ‘Yr’ column is 1 for 2012,
   so set lowerBound to 1 to keep only rows with Yr >= 1 */
IDataView sy = myContext.Data.FilterRowsByColumn(myData, "Yr", lowerBound: 1);
```
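FilterRowsByColumn keeps the rows whose value falls in a half-open range: the lower bound is inclusive and the upper bound is exclusive. The sketch below mimics that behavior on a plain list with LINQ, purely to illustrate why a bound of 1 separates the two years; the class and method names are hypothetical and no ML.NET types are involved.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RangeFilterSketch
{
    // Keeps values v with lowerBound <= v < upperBound.
    // The defaults leave the corresponding side of the range open.
    public static List<double> FilterByRange(
        IEnumerable<double> values,
        double lowerBound = double.NegativeInfinity,
        double upperBound = double.PositiveInfinity)
    {
        return values.Where(v => v >= lowerBound && v < upperBound).ToList();
    }
}
```

With year codes of 0 and 1, calling FilterByRange(codes, upperBound: 1) keeps only the 0 entries (2011), while FilterByRange(codes, lowerBound: 1) keeps only the 1 entries (2012).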

## Define time-series analysis pipeline

```
var pipeline = myContext.Forecasting.ForecastBySsa(
    outputColumnName: "PredRentals",
    inputColumnName: "TotalRentals",
    windowSize: 7,
    seriesLength: 30,
    trainSize: 400,
    horizon: 5,
    confidenceLevel: 0.95f,
    //column names match the property names of the ‘Output’ class
    confidenceLowerBoundColumn: "MinPred",
    confidenceUpperBoundColumn: "MaxPred");
```

The pipeline above uses 400 training samples and splits the data into monthly intervals (since ‘seriesLength’ is 30). Each sample is analyzed through a weekly window (as ‘windowSize’ is 7), i.e. the previous week’s data is used to forecast the next period’s demand. Because ‘horizon’ is set to 5, the model forecasts the next 5 periods. A ‘confidenceLevel’ of 0.95f means that each forecast comes with a 95% confidence interval, whose lower and upper bounds are written to the two confidence-bound columns.

## Model training

Train the model by calling the Fit method on the pipeline with the ‘fy’ IDataView (2011’s data, used as the training set).

`SsaForecastingTransformer forecaster = pipeline.Fit(fy);`

## Model evaluation

Create a utility method, say ‘Eval’ below the Main method as follows:

```
static void Eval(IDataView testData, ITransformer model, MLContext mlCon)
{
    //use Transform method to forecast 2012’s data
    IDataView pred = model.Transform(testData);

    //use CreateEnumerable method to get actual values from data
    IEnumerable<float> actualVal = mlCon.Data.CreateEnumerable<Input>(testData, true)
        .Select(observed => observed.TotalRentals);

    //use CreateEnumerable method to get forecast values from data
    IEnumerable<float> forVal = mlCon.Data.CreateEnumerable<Output>(pred, true)
        .Select(prediction => prediction.PredRentals[0]);

    //error calculation i.e. difference between actual and forecast value
    var diff = actualVal.Zip(forVal, (actual, forecast) => actual - forecast);

    //compute Mean Absolute Error
    var mae = diff.Average(error => Math.Abs(error));

    //compute Root Mean Squared Error
    var rmse = Math.Sqrt(diff.Average(error => Math.Pow(error, 2)));

    //output the metrics calculated above; F3 means 3 digits after the decimal point
    Console.WriteLine($"Mean Absolute Error: {mae:F3}\n");
    Console.WriteLine($"Root Mean Squared Error: {rmse:F3}\n");
}
```

Call the Eval method from the Main method to forecast values for ‘sy’ (2012’s data)

`Eval(sy, forecaster, myContext);`
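The two metrics are simple to compute by hand: the Mean Absolute Error averages the absolute differences between actual and forecast values, and the Root Mean Squared Error is the square root of the average squared difference, so it penalizes large misses more heavily. As a standalone sketch on plain arrays (hypothetical class and method names, independent of ML.NET):

```csharp
using System;
using System.Linq;

public static class ErrorMetrics
{
    // MAE: mean of |actual - forecast| over all paired observations
    public static double MeanAbsoluteError(float[] actual, float[] forecast) =>
        actual.Zip(forecast, (a, f) => Math.Abs((double)a - f)).Average();

    // RMSE: square root of the mean of (actual - forecast)^2
    public static double RootMeanSquaredError(float[] actual, float[] forecast) =>
        Math.Sqrt(actual.Zip(forecast, (a, f) => Math.Pow((double)a - f, 2)).Average());
}
```

Both metrics are expressed in the same unit as the data, here the number of rentals per day, which makes them easy to compare against typical daily counts.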

## Save the model

Create a TimeSeriesPredictionEngine inside Main method

`var forEng = forecaster.CreateTimeSeriesEngine<Input, Output>(myContext);`

Use the CheckPoint method to save the model at the location specified by ‘modelPath’ earlier

`forEng.CheckPoint(myContext, modelPath);`

## Use the model for demand forecasting

Create another utility method, say ‘Pred’ below the Eval method

```
static void Pred(IDataView testData, int horizon, TimeSeriesPredictionEngine<Input, Output> forecaster, MLContext mlCon)
{
    //forecast the number of periods set by the pipeline’s ‘horizon’
    Output forecast = forecaster.Predict();

    //align forecasted and actual values
    IEnumerable<string> forOutput =
        mlCon.Data.CreateEnumerable<Input>(testData, reuseRowObject: false)
            .Take(horizon)
            .Select((Input rental, int index) =>
            {
                string rentalDate = rental.Dt.ToShortDateString();
                float actual = rental.TotalRentals;
                //a rental count cannot be negative, so clamp the lower bound at 0
                float min = Math.Max(0, forecast.MinPred[index]);
                float estimate = forecast.PredRentals[index];
                float max = forecast.MaxPred[index];
                return $"Date: {rentalDate}\n" +
                       $"Actual Rentals: {actual}\n" +
                       $"Lower Estimate: {min}\n" +
                       $"Forecast: {estimate}\n" +
                       $"Upper Estimate: {max}\n";
            });

    //iterate through ‘forOutput’ and display the forecasted values on the console
    foreach (var predVal in forOutput)
    {
        Console.WriteLine(predVal);
    }
}
```

Call the Pred method from the Main method

```
Pred(sy, 5, forEng, myContext);
//test data, horizon, prediction engine and MLContext provided as arguments;
//the horizon passed here should not exceed the ‘horizon’ set in the pipeline (5)
```

## Output

Run the application. The obtained output will look like the following:

```
Mean Absolute Error: 726.416
Root Mean Squared Error: 987.658
Date: 1/1/2012
Actual Rentals: 2294
Lower Estimate: 1197.842
Forecast: 2334.443
Upper Estimate: 3471.044
```

Note: This is only a condensed sample of the output. The figures and the length of the actual output may vary.