# A Guide to Different Evaluation Metrics for Time Series Forecasting Models

Measuring the performance of any machine learning model is very important, not only from the technical point of view but also from the business perspective. When business decisions depend on the insights generated by forecasting models, knowing how accurate those models are becomes vital. Machine learning uses different evaluation metrics depending on the model and the results it generates, and time-series forecasting models likewise have their own set of metrics. In this post, we will discuss the evaluation metrics used to measure the performance of a time series model, along with their importance and applicability. The major points to be covered in this article are listed below.

1. Measuring Time Series Forecasting Performance
2. Evaluation Metrics to Measure Performance
    1. R-Squared
    2. Mean Absolute Error
    3. Mean Absolute Percentage Error
    4. Mean Squared Error
    5. Root Mean Squared Error
    6. Normalized Root Mean Squared Error
    7. Weighted Absolute Percentage Error
    8. Weighted Mean Absolute Percentage Error
3. Summary

Let’s start the discussion by understanding why measuring the performance of a time series forecasting model is necessary.

#### Measuring Time Series Forecasting Performance

What distinguishes forecasting from other modelling tasks is that the future is wholly unknown and can only be estimated from what has already occurred. A time series forecasting model is therefore judged by how well it predicts the future. This focus often comes at the expense of being able to explain why a particular prediction was made, of confidence intervals, and even of a deeper understanding of the problem’s underlying causes.

Time series performance measures provide a summary of the forecast model’s skill and capability in making forecasts. There are numerous performance metrics to pick from, and knowing which metric to use and how to interpret its value can be difficult.

Moving further, we will see different performance measures that can be applied to evaluate the forecasting model under different circumstances.

#### Evaluation Metrics to Measure Performance

Now, let us have a look at the popular evaluation metrics used to measure the performance of a time-series forecasting model.

#### R-Squared

The stationary R-squared is used in time series forecasting as a measure that compares the stationary part of the model to a simple mean model. It is defined as

$$R^2 = 1 - \frac{SS_{res}}{SS_{tot}}$$

where $SS_{res}$ denotes the sum of squared residuals (the squared differences between the actual and predicted values) and $SS_{tot}$ denotes the sum of squared deviations from the dependent variable’s sample mean. It denotes the proportion of the dependent variable’s variance that can be explained by the independent variables. A high $R^2$ value shows that the model’s predictions track the variation in the true values closely, whereas a low $R^2$ value suggests that the two are not strongly related.

The most important thing to remember about R-squared is that it does not indicate whether the model is capable of making accurate future predictions. It shows whether the model is a good fit for the observed values, and how good that fit is. A high $R^2$ indicates a strong association between the observed and fitted values.
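The definition above can be sketched in a few lines of plain Python. This is only an illustration: the function name `r_squared` and the sample values are assumptions made for the example, not part of the original article.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_actual = sum(actual) / len(actual)
    # Sum of squared residuals between actual and predicted values
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Sum of squared deviations of the actual values from their mean
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R-squared is undefined when the actual values are constant")
    return 1 - ss_res / ss_tot


# Hypothetical sample data
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.5, 5.5, 6.5, 9.5]
print(r_squared(actual, predicted))

# Predicting the mean of the actual values everywhere gives R-squared = 0,
# which is the "simple mean model" baseline.
print(r_squared(actual, [6.0] * 4))
```

A model that does worse than the mean baseline yields a negative value, so the score is not bounded below by zero.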

#### Mean Absolute Error (MAE)

The MAE is defined as the average of the absolute differences between the forecasted and true values:

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} |y_i - x_i|$$

where $y_i$ is the forecasted value, $x_i$ is the actual value, and $n$ is the total number of values in the test set.

The MAE shows how large an error we should expect from the forecast on average, expressed in the original units of the forecasted values.

The lower the MAE value, the better the model; a value of zero indicates that the forecast is error-free. In other words, when comparing several models, the one with the lowest MAE is deemed superior.

However, because MAE does not reveal the size of the error relative to the values being forecast, it can be difficult to tell large errors from small ones. It can be combined with other measures to check whether some errors are much larger than others (see Root Mean Squared Error below). Furthermore, MAE might obscure issues related to low data volume; for more information, see the weighted percentage error metrics at the end of this article.
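As a rough illustration of the MAE calculation, here is a minimal sketch in plain Python. The function name and the sample demand figures are made up for the example.

```python
def mean_absolute_error(actual, forecast):
    """Average of the absolute differences between forecast and actual values."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


# Hypothetical sample data: absolute errors are 10, 5, 5 and 10
actual = [100, 120, 90, 110]
forecast = [110, 115, 95, 100]
print(mean_absolute_error(actual, forecast))  # 7.5, in the same units as the data
```

Because the result is in the units of the data, an MAE of 7.5 means very different things for a series averaging 100 and a series averaging 10,000, which is the scale limitation described above.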

#### Mean Absolute Percentage Error (MAPE)

MAPE is the average of the absolute differences between the forecasted and true values, each divided by the true value and expressed as a percentage:

$$\mathrm{MAPE} = \frac{100\%}{n}\sum_{t=1}^{n}\left|\frac{A_t - F_t}{A_t}\right|$$

where $F_t$ is the forecasted value, $A_t$ is the true value, and $n$ is the total number of values in the test set.

Because the true value appears in the denominator, MAPE works best with data that is free of zeros and extreme values. A true value of zero makes the metric undefined, and a true value close to zero inflates MAPE dramatically.

The lower the MAPE, the better the model. MAPE, like MAE, understates the impact of big but rare errors caused by extreme values.

Mean Squared Error can be used to address this issue. MAPE may also obscure issues related to low data volume; for more information, see the weighted percentage error metrics at the end of this article.
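The sketch below shows one way to compute MAPE in plain Python, and how a single small true value can dominate the result. The function name, the choice to raise an error on zero actuals, and the sample values are all assumptions made for illustration.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, returned as a percentage."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        # Division by a zero actual value is undefined, so fail loudly
        raise ValueError("MAPE is undefined when an actual value is zero")
    ratios = [abs((a - f) / a) for a, f in zip(actual, forecast)]
    return 100 * sum(ratios) / len(ratios)


# Hypothetical sample data: percentage errors are 10%, 5%, 10% and 20%
print(mape([100, 200, 50, 10], [110, 190, 55, 12]))

# A tiny actual value inflates the metric: an absolute error of 2 on an
# actual value of 1 counts as a 200% error.
print(mape([100, 200, 50, 1], [110, 190, 55, 3]))
```

Other implementations handle zeros differently, for example by clamping the denominator to a small positive number, so results near zero are not always comparable between libraries.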

#### Mean Squared Error (MSE)

MSE is defined as the average of the squared errors:

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n} (y_i - y'_i)^2$$

where $y'$ denotes the predicted value, $y$ denotes the actual value, and $n$ is the total number of values in the test set. MSE is widely used to evaluate the quality of a forecasting model or predictor, and it reflects both variance (the spread of the predicted values) and bias (the distance of the predicted values from the true values). MSE is never negative, and lower values are preferable. Because of the square term, this measure penalizes large errors or outliers more than minor errors.

The closer MSE is to zero, the better. MSE avoids MAPE’s division-by-zero problem and, unlike MAE and MAPE, does not understate large errors, but its heavy penalty on outliers can be undesirable in some cases. When dealing with low data volume, this statistic may hide issues; to address this, see Weighted Absolute Percentage Error and Weighted Mean Absolute Percentage Error.
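To make the effect of the square term concrete, the following sketch compares two hypothetical forecasts that have the same average absolute error but very different MSE. The function name and all numbers are illustrative assumptions.

```python
def mean_squared_error(actual, forecast):
    """Average of the squared differences between actual and forecast values."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)


actual = [10, 12, 14, 16]
steady = [12, 14, 16, 18]   # every forecast is off by 2
spiky = [10, 12, 14, 24]    # three perfect forecasts, one off by 8

# Both forecasts have a mean absolute error of 2, but the single large
# miss dominates once the errors are squared.
print(mean_squared_error(actual, steady))  # 4.0
print(mean_squared_error(actual, spiky))   # 16.0
```

Whether the spiky forecast really is four times worse depends on the application, which is why the heavier penalty on outliers is not always desirable.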

#### Root Mean Squared Error (RMSE)

This measure is an extension of MSE, defined as the square root of the mean squared error:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (y_i - y'_i)^2}$$

where $y'$ denotes the predicted value, $y$ denotes the actual value, and $n$ is the total number of values in the test set. This statistic, like MSE, penalizes larger errors more heavily.

RMSE is likewise never negative, with lower values indicating better performance. An advantage of RMSE is that it is expressed in the same unit as the forecasted values, which makes it easier to interpret than MSE.

The RMSE can also be compared to the MAE to see whether there are any substantial but uncommon inaccuracies in the forecast. The wider the gap between RMSE and MAE, the more erratic the error size. This statistic can mask issues with low data volume.
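The comparison between RMSE and MAE described above can be sketched as follows. The helper names and sample values are hypothetical; both forecasts are constructed to have the same MAE so that only the RMSE differs.

```python
import math


def mae(actual, forecast):
    """Mean absolute error."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    """Square root of the mean squared error."""
    mse = sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)
    return math.sqrt(mse)


actual = [50, 60, 70, 80, 90]
even = [53, 57, 73, 77, 93]    # every forecast is off by 3
spiky = [50, 60, 70, 80, 105]  # four perfect forecasts, one off by 15

for name, forecast in [("even", even), ("spiky", spiky)]:
    gap = rmse(actual, forecast) - mae(actual, forecast)
    print(name, "MAE:", mae(actual, forecast), "RMSE - MAE gap:", round(gap, 3))
```

When all errors have the same size, RMSE equals MAE and the gap is zero; the gap grows as the error sizes become more uneven.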

#### Normalized Root Mean Squared Error (NRMSE)

NRMSE is an extension of RMSE obtained by normalizing it. The two most common normalizing factors are the mean of the actual values and their range (the difference between the maximum and minimum values). Using the range,

$$\mathrm{NRMSE} = \frac{\mathrm{RMSE}}{y_{max} - y_{min}}$$

where $y_{max}$ is the largest true value and $y_{min}$ is the smallest true value.

NRMSE is frequently used to compare datasets or forecasting models measured on different scales (units sold and gross revenue, for example). The smaller the value, the better the model’s performance. When working with small amounts of data, this metric can be misleading; Weighted Absolute Percentage Error and Weighted Mean Absolute Percentage Error can help in that case.
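A minimal sketch of both normalization choices is shown below. The function names, the `method` parameter, and the sample series are assumptions for illustration; the revenue series is simply the unit series multiplied by a hypothetical price of 1000.

```python
import math


def rmse(actual, forecast):
    """Square root of the mean squared error."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


def nrmse(actual, forecast, method="range"):
    """RMSE divided by the range or the mean of the actual values."""
    if method == "range":
        scale = max(actual) - min(actual)
    elif method == "mean":
        scale = sum(actual) / len(actual)
    else:
        raise ValueError("method must be 'range' or 'mean'")
    if scale == 0:
        raise ValueError("cannot normalize by a scale of zero")
    return rmse(actual, forecast) / scale


units_actual = [10, 20, 30, 40]
units_forecast = [12, 18, 33, 37]
revenue_actual = [a * 1000 for a in units_actual]
revenue_forecast = [f * 1000 for f in units_forecast]

# RMSE differs by a factor of 1000 between the two series,
# while NRMSE is essentially the same for both.
print(rmse(units_actual, units_forecast), rmse(revenue_actual, revenue_forecast))
print(nrmse(units_actual, units_forecast), nrmse(revenue_actual, revenue_forecast))
print(nrmse(units_actual, units_forecast, method="mean"))
```

The two normalizations give different numbers for the same forecast, so NRMSE values are only comparable when the same method is used throughout.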

#### Weighted Mean Absolute Percentage Error (WMAPE)

WMAPE (sometimes written wMAPE) stands for Weighted Mean Absolute Percentage Error. It is a measure of a forecasting method’s prediction accuracy, and a variant of MAPE in which errors are weighted by the actual values (e.g. in sales forecasting, errors are weighted by sales volume):

$$\mathrm{WMAPE} = \frac{\sum_{t=1}^{n} |A_t - F_t|}{\sum_{t=1}^{n} |A_t|}$$

where $A$ is the actual data vector and $F$ is the forecast. This metric has an advantage over MAPE in that it avoids the ‘infinite error’ problem that arises when an individual actual value is zero.

The lower the WMAPE, the better the model’s performance. When evaluating forecasting models, this metric is useful for low-volume data where observations differ in priority: observations with a higher priority receive a larger weight, so the WMAPE grows as the error in high-priority forecast values grows.
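The sketch below computes WMAPE in plain Python. The optional `weights` argument is an assumption added to illustrate the priority weighting described above: it multiplies both the error and the actual value of each observation, which is one common formulation and not necessarily the one the author had in mind. The sample values are hypothetical.

```python
def wmape(actual, forecast, weights=None):
    """Sum of (weighted) absolute errors divided by the sum of (weighted) absolute actuals."""
    if weights is None:
        # Without explicit priorities, every observation gets the same weight
        weights = [1.0] * len(actual)
    if not (len(actual) == len(forecast) == len(weights)) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total_error = sum(w * abs(a - f) for a, f, w in zip(actual, forecast, weights))
    total_actual = sum(w * abs(a) for a, w in zip(actual, weights))
    if total_actual == 0:
        raise ValueError("WMAPE is undefined when all weighted actual values are zero")
    return total_error / total_actual


# A zero in the actual values would break MAPE but is fine here,
# because only the total of the actual values appears in the denominator.
actual = [0, 5, 100]
forecast = [1, 4, 90]
print(wmape(actual, forecast))

# Hypothetical priorities: the last observation counts double
print(wmape(actual, forecast, weights=[1, 1, 2]))
```

The result is a fraction; multiply by 100 to report it as a percentage.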

#### Summary

Let’s have a quick summary of all the above-mentioned measures.

• $R^2$ is used when we want to know how closely the forecasted values track the actual values.
• MAE is useful when the absolute error must be measured. It is simple to understand, but it understates the impact of large, rare errors in data with extreme values. MAPE is also simple to understand and, because it is a percentage, can be used to compare different forecast models or datasets. It shares MAE’s weakness with extreme values and additionally breaks down when true values are zero or close to zero.
• MSE is useful when the spread of the errors matters and larger errors must be penalized. However, because it is a squared value, this metric is often difficult to interpret.
• RMSE (and NRMSE) is also useful when the spread matters and larger errors need to be penalized. Compared to MSE, RMSE is easier to interpret because it is on the same scale as the forecasted values.
• WMAPE is useful when dealing with low-volume data. It uses the weight (priority value) of each observation to take that priority into account.

#### Conclusion

In this post, we have seen different performance evaluation metrics used in time series forecasting and the scenarios they suit. Most of the measures above are available in the sklearn.metrics module, or can be implemented from scratch with NumPy and the math module.

Vijaysinh is an enthusiast in machine learning and deep learning. He is skilled in ML algorithms, data manipulation, handling and visualization, model building.
