Comprehensive Guide To Demand Forecasting Using ML.NET

Time series forecasting in machine learning refers to fitting a model to historical data, analyzing the patterns in it, and predicting future trends or observations. In conventional statistical terms, making such predictions is called ‘extrapolation’, while modern domains refer to it as ‘forecasting’. In this article, we create a .NET Core console application that forecasts bike-sharing demand using time series forecasting and the ML.NET framework.

Not aware of the fundamentals of ML.NET? Refer to our previous article before proceeding!

We have already covered two use cases of this cross-platform library, viz. image classification and product recommendation. The application of concern in this article is demand forecasting.

Visit this page to learn about time series forecasting in detail.

Problem statement

Bike-sharing systems let users rent a bike at one place and return it at a different location on an as-needed basis. The whole process of obtaining a membership, renting the bikes and returning them is automated via a network of kiosk locations throughout a city. The task here is to forecast future bike-sharing demand by studying time series data comprising counts of bikes rented through the Capital Bikeshare program in Washington D.C.

Here, we use ‘univariate’ time series forecasting, which means a single numerical observation, i.e. the count of bikes rented at specific intervals over a period of time, is taken into account for analysis.

Dataset used

The Bike Sharing dataset used here comes from the UCI Machine Learning Repository. It is a univariate dataset with 17,389 instances and 16 attributes. It contains the hourly and daily counts of bikes rented during 2011-2012 in the Capital Bikeshare system, along with the corresponding weather and seasonal information. Condense the daily data so that it contains only the following three columns required for time series forecasting:

  • dteday:  date of observation.
  • yr: encoded year of observation (‘0’ for 2011, ‘1’ for 2012).
  • cnt:  total number of bike rentals for that day 

Here is the web link to download the dataset. Find the germane research paper here.
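
The condensing step can be done in a spreadsheet, but it can also be sketched in a few lines of C#. The snippet below is only an illustration: ‘DatasetCondenser’ and ‘KeepColumns’ are hypothetical names, the sample rows are made up, and the simple comma split assumes that no field contains a quoted comma.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DatasetCondenser
{
    // Keeps only the named columns (matched against the header row),
    // in the order in which they are requested.
    public static List<string> KeepColumns(IEnumerable<string> csvLines, params string[] wanted)
    {
        List<string> lines = csvLines.ToList();
        string[] header = lines[0].Split(',');

        // Position of each wanted column in the header; fail loudly if one is missing.
        int[] positions = wanted.Select(name =>
        {
            int i = Array.IndexOf(header, name);
            if (i < 0) throw new ArgumentException($"Column '{name}' not found.");
            return i;
        }).ToArray();

        // Project every line (header included) onto the wanted positions.
        return lines
            .Select(line => line.Split(','))
            .Select(cells => string.Join(",", positions.Select(i => cells[i])))
            .ToList();
    }

    public static void Main()
    {
        // Made-up rows in a shortened layout; the real file has 16 columns.
        var sample = new[]
        {
            "instant,dteday,season,yr,cnt",
            "1,2011-01-01,1,0,100",
            "2,2011-01-02,1,0,200"
        };

        foreach (string row in KeepColumns(sample, "dteday", "yr", "cnt"))
            Console.WriteLine(row);
    }
}
```

For the real file, the lines could come from File.ReadLines and the result could be written back with File.WriteAllLines before importing it into the database.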

Algorithm used

We have used the Singular Spectrum Analysis (SSA) algorithm here. SSA decomposes a time series into a set of principal components. These components can be considered as the parts of a signal that correspond to several factors such as trends, noise, seasonality, and so on. They are then reconstructed and used to forecast future values.
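
The first and last steps of that decomposition can be sketched with plain arrays. This is an illustrative sketch only, not ML.NET's internal implementation: ‘SsaSketch’, ‘Embed’ and ‘DiagonalAverage’ are hypothetical names, and the singular value decomposition and grouping steps that sit between the two are left out. Embedding turns the series into a matrix of lagged windows, and diagonal averaging turns a matrix back into a series.

```csharp
using System;

public static class SsaSketch
{
    // Embedding: slide a window of length L over the series to build the
    // L x K trajectory matrix, where K = N - L + 1 and entry (i, j) = series[i + j].
    public static double[,] Embed(double[] series, int window)
    {
        int k = series.Length - window + 1;
        var matrix = new double[window, k];
        for (int i = 0; i < window; i++)
            for (int j = 0; j < k; j++)
                matrix[i, j] = series[i + j];
        return matrix;
    }

    // Diagonal averaging: turn an L x K matrix back into a series of length
    // L + K - 1 by averaging the entries on each anti-diagonal (i + j constant).
    public static double[] DiagonalAverage(double[,] matrix)
    {
        int l = matrix.GetLength(0), k = matrix.GetLength(1);
        var sum = new double[l + k - 1];
        var count = new int[l + k - 1];
        for (int i = 0; i < l; i++)
            for (int j = 0; j < k; j++)
            {
                sum[i + j] += matrix[i, j];
                count[i + j]++;
            }
        for (int t = 0; t < sum.Length; t++)
            sum[t] /= count[t];
        return sum;
    }

    public static void Main()
    {
        double[] series = { 1, 2, 3, 4, 5 };
        double[,] trajectory = Embed(series, window: 3);

        // Averaging the untouched trajectory matrix gives the series back.
        Console.WriteLine(string.Join(" ", DiagonalAverage(trajectory)));
    }
}
```

In full SSA, the trajectory matrix is split into components before diagonal averaging, so each component yields its own reconstructed series (trend, seasonality, noise and so on).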

Click here to get in-depth knowledge of SSA.

Prerequisites for the implementation 

  • Use Visual Studio 2019 or higher version
  • Or use Visual Studio 2017 version 15.6 or higher with the “.NET Core cross-platform development” workload installed
  • Use SQL Database for data storage, access and retrieval

Create your C# .NET Core console application and then install the Microsoft.ML NuGet package. Click here for its installation.

The code below also relies on the System.Data.SqlClient and Microsoft.ML.TimeSeries packages, so install them and get to know them as well.

Open the Program.cs file and replace the ‘using’ statements with the following ones:

 using System;
 using System.Collections.Generic;
 using System.Data.SqlClient;
 using System.IO;
 using System.Linq;
 using Microsoft.ML;
 using Microsoft.ML.Data;
 using Microsoft.ML.Transforms.TimeSeries; 

Data Preparation

Map the data into an SQL database table as follows:

 CREATE TABLE [RentalData] (
     [Dt] DATE NOT NULL, 
     [Yr] INT NOT NULL,
     [TotalRentals] INT NOT NULL
 ); 

The data table should look something like this:

 Dt          Yr   TotalRentals
 1/3/2011    0    234
 1/23/2011   0    675
 2/14/2012   1    27

Define the model input schema below the ‘Program’ class by creating a class, say ‘Input’ as follows:

 public class Input
 {
     public DateTime Dt { get; set; }   //date of observation
     public float Yr { get; set; }  //year of observation
     public float TotalRentals { get; set; }  //number of rentals of that day
 } 

Then define the output schema as follows:

 public class Output
 {
     
     //predicted values for forecasted period
     public float[] PredRentals { get; set; } 

     //minimum predicted values for forecasted period 
     public float[] MinPred { get; set; } 

     //maximum predicted values for forecasted period 
     public float[] MaxPred { get; set; }  
 } 

Path definitions and variable initialization

Inside the Main method, store path locations of your data and trained model and define the connection string.

 //root directory
 string root = Path.GetFullPath(Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "../../../"));

 //data location
 string dbPath = Path.Combine(root, "Data", "DailyDemand.mdf");

 //path to save trained model
 string modelPath = Path.Combine(root, "MLModel.zip");

 //connection string
 var conStr = $"Data Source = (LocalDB)\\MSSQLLocalDB;AttachDbFilename = {dbPath};Integrated Security = True;Connect Timeout = 40;";
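
As an alternative to assembling the string by hand, the same settings can be expressed through SqlConnectionStringBuilder from System.Data.SqlClient. This is a minimal sketch: ‘ConnectionStrings’ and ‘ForLocalDb’ are hypothetical helper names, and the values simply mirror the hand-written connection string.

```csharp
using System;
using System.Data.SqlClient;

public static class ConnectionStrings
{
    // Builds a LocalDB connection string that attaches the given .mdf file.
    public static string ForLocalDb(string mdfPath)
    {
        var builder = new SqlConnectionStringBuilder
        {
            DataSource = @"(LocalDB)\MSSQLLocalDB",
            AttachDBFilename = mdfPath,
            IntegratedSecurity = true,
            ConnectTimeout = 40 // seconds
        };
        return builder.ConnectionString;
    }

    public static void Main()
    {
        // The path is a placeholder; pass the dbPath computed earlier instead.
        Console.WriteLine(ForLocalDb("DailyDemand.mdf"));
    }
}
```

The builder takes care of key names and escaping, which avoids typos such as a stray space inside an interpolated variable name.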

Instantiate MLContext class.

MLContext myContext = new MLContext();

Data Loading

Create a DatabaseLoader object to load records of type ‘Input’

DatabaseLoader dbLoader = myContext.Data.CreateDatabaseLoader<Input>();

Fire a query to load the data from the database

string query = "SELECT Dt, CAST(Yr as REAL) as Yr, CAST(TotalRentals as REAL) as TotalRentals FROM RentalData";

ML.NET algorithms expect data to be of type Single, i.e. a single-precision floating-point value. Numerical values that are not of type Real in the database therefore need to be converted to Real using SQL's built-in CAST function. The integer columns ‘Yr’ and ‘TotalRentals’ have thus been converted to the Real type in the query above.

Create a DatabaseSource for connecting the database and executing the query

 DatabaseSource dbSrc = new DatabaseSource(SqlClientFactory.Instance,
                                           conStr, query); 

Load the data into an IDataView

IDataView myData = dbLoader.Load(dbSrc);

The RentalData table has data for two years (2011 and 2012). Form a separate IDataView for each year. The 2011 data will be used for model training and the 2012 data for model testing.

/*Create IDataView of the first year (2011)’s data. ‘Yr’ column has value 0 for 2011 so set upperBound to 1 so that data having Yr<1 will be extracted*/
 IDataView fy = myContext.Data.FilterRowsByColumn(myData, "Yr", upperBound: 1);

/*Create IDataView of the second year (2012)’s data. ‘Yr’ column has value 1 for 2012 so set lowerBound to 1 so that data having Yr>=1 will be extracted*/
 IDataView sy = myContext.Data.FilterRowsByColumn(myData, "Yr", lowerBound: 1);
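
FilterRowsByColumn keeps the rows whose column value lies between its lowerBound (inclusive) and upperBound (exclusive) parameters. The split can be tried on a few in-memory rows without a database, as in the sketch below. ‘YearRow’, ‘FilterDemo’ and ‘CountRows’ are hypothetical names and the rental counts are made up.

```csharp
using System;
using System.Linq;
using Microsoft.ML;

public class YearRow
{
    public float Yr { get; set; }
    public float TotalRentals { get; set; }
}

public static class FilterDemo
{
    // Counts the rows for which lower <= Yr < upper.
    public static int CountRows(MLContext ml, IDataView data, double lower, double upper)
    {
        IDataView filtered = ml.Data.FilterRowsByColumn(data, "Yr", lowerBound: lower, upperBound: upper);
        return ml.Data.CreateEnumerable<YearRow>(filtered, reuseRowObject: false).Count();
    }

    public static void Main()
    {
        var ml = new MLContext();
        var rows = new[]
        {
            new YearRow { Yr = 0, TotalRentals = 10 },
            new YearRow { Yr = 0, TotalRentals = 20 },
            new YearRow { Yr = 1, TotalRentals = 30 }
        };
        IDataView data = ml.Data.LoadFromEnumerable(rows);

        // Rows with Yr < 1 (two of them) and rows with Yr >= 1 (one of them).
        Console.WriteLine(CountRows(ml, data, double.NegativeInfinity, 1));
        Console.WriteLine(CountRows(ml, data, 1, double.PositiveInfinity));
    }
}
```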

Define time-series analysis pipeline

 var pipeline = myContext.Forecasting.ForecastBySsa(
     outputColumnName: "PredRentals",
     inputColumnName: "TotalRentals",
     windowSize: 7,
     seriesLength: 30,
     trainSize: 365,
     horizon: 7,
     confidenceLevel: 0.95f,
     confidenceLowerBoundColumn: "MinPred",
     confidenceUpperBoundColumn: "MaxPred");

The pipeline above takes the 365 daily samples of the first year (‘trainSize’) and splits the series into monthly intervals (since ‘seriesLength’ is 30). Each interval is analyzed through a weekly window (as ‘windowSize’ is 7), i.e. the previous week’s data is used to forecast the next period’s demand. As the ‘horizon’ parameter is set to 7, the model forecasts the next 7 periods, which matches the 7 days requested from the Pred method later. A ‘confidenceLevel’ of 0.95f means each forecast comes with a 95% confidence interval. Its bounds are written to the ‘MinPred’ and ‘MaxPred’ columns, whose names match the properties of the ‘Output’ class.
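
To see how these parameters shape the result without setting up a database, the same kind of pipeline can be fitted to a synthetic in-memory series. The sketch below is only an illustration: ‘DayCount’, ‘Forecast’ and ‘SsaPipelineDemo’ are hypothetical names, the series is generated rather than real rental data, and the Microsoft.ML and Microsoft.ML.TimeSeries packages are assumed to be installed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Transforms.TimeSeries;

public class DayCount
{
    public float TotalRentals { get; set; }
}

public class Forecast
{
    public float[] PredRentals { get; set; }
    public float[] MinPred { get; set; }
    public float[] MaxPred { get; set; }
}

public static class SsaPipelineDemo
{
    // Fits an SSA forecaster on 120 synthetic days and returns the forecast.
    public static Forecast Run(int horizon)
    {
        var ml = new MLContext(seed: 0);

        // Synthetic series: a weekly cycle on top of a slow upward trend.
        IEnumerable<DayCount> series = Enumerable.Range(0, 120).Select(d =>
            new DayCount { TotalRentals = 100f + d + 20f * (float)Math.Sin(2 * Math.PI * d / 7) });
        IDataView data = ml.Data.LoadFromEnumerable(series);

        var pipeline = ml.Forecasting.ForecastBySsa(
            outputColumnName: "PredRentals",
            inputColumnName: "TotalRentals",
            windowSize: 7,
            seriesLength: 30,
            trainSize: 120,
            horizon: horizon,
            confidenceLevel: 0.95f,
            confidenceLowerBoundColumn: "MinPred",
            confidenceUpperBoundColumn: "MaxPred");

        SsaForecastingTransformer model = pipeline.Fit(data);
        var engine = model.CreateTimeSeriesEngine<DayCount, Forecast>(ml);
        return engine.Predict();
    }

    public static void Main()
    {
        Forecast forecast = Run(horizon: 7);

        // Each output array holds one value per forecast period.
        Console.WriteLine(forecast.PredRentals.Length);
        Console.WriteLine(forecast.MinPred.Length);
        Console.WriteLine(forecast.MaxPred.Length);
    }
}
```

The length of each output array equals the ‘horizon’, so any code that indexes into the forecast should not read more entries than the horizon the model was built with.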

Model training

Train the model by calling the Fit method of the pipeline on the ‘fy’ IDataView (2011’s data, used as the training set).

SsaForecastingTransformer forecaster = pipeline.Fit(fy);

Model evaluation

Create a utility method, say ‘Eval’ below the Main method as follows:

 static void Eval(IDataView testData, ITransformer model, MLContext mlCon)
 {
     //use Transform method to forecast 2012’s data
     IDataView pred = model.Transform(testData);

     //use CreateEnumerable method to get actual values from data
     IEnumerable<float> actualVal =
         mlCon.Data.CreateEnumerable<Input>(testData, true)
             .Select(observed => observed.TotalRentals);

     //use CreateEnumerable method to get forecast values from data
     IEnumerable<float> forVal =
         mlCon.Data.CreateEnumerable<Output>(pred, true)
             .Select(prediction => prediction.PredRentals[0]);

     //error calculation i.e. difference between actual and forecast value
     var diff = actualVal.Zip(forVal, (actual, forecast) => actual - forecast);

     //compute Mean Absolute Error
     var mae = diff.Average(error => Math.Abs(error));

     //compute Root Mean Squared Error
     var rmse = Math.Sqrt(diff.Average(error => Math.Pow(error, 2)));

     //Output the metrics calculated above
     Console.WriteLine($"Mean Absolute Error: {mae:F3}\n");
     Console.WriteLine($"Root Mean Squared Error: {rmse:F3}\n");
     //F3 means 3 digits after decimal point
 }
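
The two metrics computed in Eval can be checked by hand on a tiny example. The sketch below uses made-up numbers and hypothetical names (‘ErrorMetrics’, ‘Mae’, ‘Rmse’). The errors are 1, -1, 7 and -7, so the Mean Absolute Error is (1 + 1 + 7 + 7) / 4 = 4 and the Root Mean Squared Error is the square root of (1 + 1 + 49 + 49) / 4 = 25, i.e. 5.

```csharp
using System;
using System.Linq;

public static class ErrorMetrics
{
    // Mean Absolute Error: the average size of the errors, ignoring their sign.
    public static double Mae(float[] actual, float[] forecast) =>
        actual.Zip(forecast, (a, f) => (double)Math.Abs(a - f)).Average();

    // Root Mean Squared Error: the square root of the average squared error.
    public static double Rmse(float[] actual, float[] forecast) =>
        Math.Sqrt(actual.Zip(forecast, (a, f) => Math.Pow(a - f, 2)).Average());

    public static void Main()
    {
        float[] actual = { 10, 20, 30, 40 };
        float[] forecast = { 9, 21, 23, 47 };

        Console.WriteLine($"Mean Absolute Error: {Mae(actual, forecast):F3}");       // MAE is 4
        Console.WriteLine($"Root Mean Squared Error: {Rmse(actual, forecast):F3}");  // RMSE is 5
    }
}
```

RMSE exceeds MAE here because squaring gives the two large errors more weight, which is why the two metrics are usually reported together.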

Call the Eval method from the Main method to forecast values for ‘sy’ (2012’s data)

Eval(sy, forecaster, myContext);

Save the model

Create a TimeSeriesPredictionEngine inside the Main method

var forEng = forecaster.CreateTimeSeriesEngine<Input, Output>(myContext);

Use the CheckPoint method to save the model at the location specified by ‘modelPath’ earlier

forEng.CheckPoint(myContext, modelPath);
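
A saved model can later be loaded back and wrapped in a fresh prediction engine, for example in a separate run of the application. The sketch below assumes the file was written by CheckPoint as above. ‘ModelReload’ and ‘LoadEngine’ are hypothetical names, and the ‘Input’ and ‘Output’ classes are repeated from earlier in the article only to keep the snippet self-contained.

```csharp
using System;
using System.IO;
using Microsoft.ML;
using Microsoft.ML.Transforms.TimeSeries;

public class Input
{
    public DateTime Dt { get; set; }
    public float Yr { get; set; }
    public float TotalRentals { get; set; }
}

public class Output
{
    public float[] PredRentals { get; set; }
    public float[] MinPred { get; set; }
    public float[] MaxPred { get; set; }
}

public static class ModelReload
{
    // Loads a checkpointed model and creates a new prediction engine from it.
    public static TimeSeriesPredictionEngine<Input, Output> LoadEngine(MLContext ml, string modelPath)
    {
        ITransformer model;
        using (FileStream file = File.OpenRead(modelPath))
        {
            // The input schema stored with the model is not needed here, so it is discarded.
            model = ml.Model.Load(file, out DataViewSchema _);
        }
        return model.CreateTimeSeriesEngine<Input, Output>(ml);
    }
}
```

Calling Predict on the returned engine then produces forecasts without retraining.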

Use the model for demand forecasting

Create another utility method, say ‘Pred’ below the Eval method

 static void Pred(IDataView testData, int horizon, TimeSeriesPredictionEngine<Input, Output> forecaster, MLContext mlCon)
 {
     //forecast as many periods as the horizon of the model
     Output forecast = forecaster.Predict();

     //Align forecasted and actual values
     IEnumerable<string> forOutput =
         mlCon.Data.CreateEnumerable<Input>(testData, reuseRowObject: false)
             .Take(horizon)
             .Select((Input rental, int index) =>
             {
                 string rentalDate = rental.Dt.ToShortDateString();
                 float actual = rental.TotalRentals;
                 //a rental count cannot be negative, so clamp the lower bound at 0
                 float min = Math.Max(0, forecast.MinPred[index]);
                 float estimate = forecast.PredRentals[index];
                 float max = forecast.MaxPred[index];
                 return $"Date: {rentalDate}\n" +
                        $"Actual Rentals: {actual}\n" +
                        $"Lower Estimate: {min}\n" +
                        $"Forecast: {estimate}\n" +
                        $"Upper Estimate: {max}\n";
             });

     //Iterate through ‘forOutput’ and display the forecasted values on the console
     foreach (var predVal in forOutput)
     {
         Console.WriteLine(predVal);
     }
 }

Call the Pred method from the Main method

 Pred(sy, 7, forEng, myContext); 
 //test data, horizon, prediction engine and MLContext provided as arguments 

Output

Run the application. The obtained output will look like the following:

 Mean Absolute Error: 726.416
 Root Mean Squared Error: 987.658
 Date: 1/1/2012
 Actual Rentals: 2294
 Lower Estimate: 1197.842
 Forecast: 2334.443
 Upper Estimate: 3471.044 

Note: This is just a sample of the condensed output. The figures and length of the actual output may vary.
