Machine learning is widely used for classification and forecasting problems on time series data. Time series forecasting comes into the picture when a predictive model has to predict an unknown target (dependent) variable and time acts as the independent variable.
A predicted value can be anything, from the salary of a potential employee to the credit score of a bank's account holder. Any data science aspirant with a formal introduction to statistics will have come across confidence intervals, which quantify the uncertainty in a model's estimates.
Current models produce a large number of false positives, and as the characteristics of a time series change, these models require additional training.
Time series analysis is done to predict the future values of the series using current information from the dataset.
Time series are used in statistics, signal processing, pattern recognition, econometrics, mathematical finance, weather forecasting, earthquake prediction, and astronomy: in short, in almost any domain that involves temporal measurements.
Most time series patterns can be described in terms of two basic classes of components: trend and seasonality.
Trend represents a general systematic linear or nonlinear component that changes over time and does not repeat within the time range captured by the data. Seasonality is formally similar in nature, but it repeats itself at systematic intervals over time. Trend and seasonality can also coexist.
Univariate vs Multivariate Time Series
The term "univariate time series" refers to a time series that consists of single (scalar) observations recorded sequentially over equal time increments. Some examples are monthly CO2 concentrations and the southern oscillation index used to predict El Niño effects.
Multivariate time series models, by contrast, are designed to capture the dynamics of multiple time series simultaneously and to leverage dependencies across these series for more reliable predictions.
When predicting the temperature of a room every second, univariate analysis is preferred since only one quantity is changing.
To calculate the altitude of a rocket from the time of its launch, however, multivariate time series analysis comes in handy because other quantities, such as the remaining fuel, also change with time.
In economics, multivariate time series are used to understand how a policy change to one variable, for example an interest rate, may affect other variables over different horizons.
The data ingested for analysis often contains non-linearities and fluctuations that have to be smoothed out before the data can be interpreted.
Usually, time series models are adequately approximated by a linear function; if there is a clear monotonic nonlinear component, the data first needs to be transformed to remove the nonlinearity. Logarithmic, exponential, or polynomial functions are typically used for this.
The following are a few methods for implementing multivariate time series analysis with Python:
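As a small illustration of such a transformation, the sketch below takes a series with exponential growth, applies a logarithm so that the trend becomes linear, and fits a straight line to the transformed values. The growth rate, starting level, and noise scale are assumed values chosen only for the example.

```python
import numpy as np

# Synthetic series with exponential growth and multiplicative noise (illustrative values)
rng = np.random.default_rng(1)
t = np.arange(1, 51)
series = 100.0 * np.exp(0.05 * t) * np.exp(rng.normal(scale=0.02, size=t.size))

# The log transform turns exponential growth into a straight line:
# log(series) = log(100) + 0.05 * t + noise
log_series = np.log(series)

# Fit a linear trend to the transformed data
slope, intercept = np.polyfit(t, log_series, deg=1)

# Map the fitted line back to the original scale
fitted = np.exp(intercept + slope * t)

print(f"estimated growth rate: {slope:.4f}")
print(f"estimated starting level: {np.exp(intercept):.2f}")
```

The same pattern applies to other monotonic transformations: transform, fit a linear model, then invert the transformation to return to the original units.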
Vector Autoregression (VAR)
Vector Autoregression (VAR) models each series as a linear function of the past values of all the series. It is the generalization of the autoregressive (AR) model to multiple parallel time series.
from statsmodels.tsa.vector_ar.var_model import VAR
Vector Autoregression Moving-Average (VARMA)
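It is the generalization of ARMA to multiple parallel time series, i.e. multivariate time series. In statsmodels, a VARMA model is fit with the VARMAX class by leaving out the exogenous variables.
from statsmodels.tsa.statespace.varmax import VARMAX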
Vector Autoregression Moving-Average with Exogenous Regressors (VARMAX)
The Vector Autoregression Moving-Average with Exogenous Regressors (VARMAX) is an extension of the VARMA model that also includes the modelling of exogenous variables. It is a multivariate version of the ARMAX method.
Holt-Winters Exponential Smoothing (HWES)
Holt-Winters Exponential Smoothing (HWES), also called triple exponential smoothing, models the next time step as an exponentially weighted linear function of observations at prior time steps, taking trend and seasonality into account. Unlike the vector methods above, it is applied to a single series.
from statsmodels.tsa.holtwinters import ExponentialSmoothing
The components of a time series are as complex and sophisticated as the data itself. As time passes, more data is collected. More data does not always mean more information, but a larger sample reduces the error that arises from random sampling.
“The scale of the data revolution is extraordinary: the past two years alone have witnessed the creation of 90% of all data that exists in the world today, and by 2020, each of the 7.7 billion people worldwide is expected to produce 1.7 MB of new information every second of every day. On the other hand, back in 2012, only 0.5% of all data was ever analyzed and used, whereas 33% is deemed to have value by 2020. The gap between data availability and usage is likely to narrow quickly as global investments in analytics are set to rise beyond $210 billion by 2020, while the value creation potential is a multiple higher,” observes the author of the book Hands-On Machine Learning for Algorithmic Trading.