How To Do Multivariate Time Series Forecasting Using LSTM

This is the 21st century, and it has so far been revolutionary for the development of machines, enabling us to perform tasks once thought impossible; predicting the future was one of them. With advanced computational power and the tremendous progress in artificial intelligence and machine learning, predicting the future has become far simpler and faster. Some of the major applications of this field are image recognition, speech recognition, traffic prediction, self-driving cars, virtual personal assistants, and the list continues.

Time series forecasting is also an important area of machine learning. However, it is often neglected because of its complexity, which comes from time components such as trend, seasonality, the base level of the series, and noise. As with other ML techniques, time series forecasting involves fitting a model on historical data and using the fitted model to predict future data. The main difference between a simple prediction model and a forecasting model is that in forecasting the future values are completely unavailable and can only be estimated from what has already happened.

This article discusses a deep learning technique for forecasting with multiple input variables and one target variable. The technique is taken from the book ‘Hands on Time series analysis using Python’. The author used a Bidirectional LSTM based network with customized data preparation, and the forecast is expected to follow the trend of the actual series.

Let’s check the result in practice using Python.

Code implementation: Multivariate Time Series Forecasting Using LSTM

Import all dependencies:
 import pandas as pd
 import numpy as np
 import matplotlib.pyplot as plt
 import plotly.express as px # to plot the time series plot
 from sklearn import metrics # for the evaluation
 from sklearn.preprocessing import LabelEncoder,MinMaxScaler
 import tensorflow as tf 

The dataset describes Metro interstate traffic; it comprises nine variables and the target variable, with samples covering six years from 2012 to 2018. First, let’s have a look at the data frame.

 data = pd.read_csv('metro data.csv')

Plot the target variable against the date with Plotly to check the trend; the target variable here is traffic_volume, shown for one year.

Some of the variables are categorical, so we use LabelEncoder to convert them into numbers and MinMaxScaler to scale the values down. A neural network tends to converge sooner, and often gives better accuracy, when all features are on the same scale.

 for i in data.select_dtypes('object').columns:
   le = LabelEncoder().fit(data[i])
   data[i] = le.transform(data[i]) 
 X_scaler = MinMaxScaler()
 Y_scaler = MinMaxScaler()
 X_data = X_scaler.fit_transform(data[['holiday', 'temp', 'rain_1h', 'snow_1h', 'clouds_all', 'weather_main', 'weather_description', 'traffic_volume']])
 Y_data = Y_scaler.fit_transform(data[['traffic_volume']]) 
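
To make the scaling step concrete, here is a minimal sketch of what min-max scaling does to a single column, written in plain Python rather than with scikit-learn. The helper names `min_max_fit`, `min_max_transform`, and `min_max_inverse` and the sample values are hypothetical; they only mirror the idea behind MinMaxScaler with its default 0 to 1 range.

```python
def min_max_fit(values):
    """Learn the minimum and maximum from the training values."""
    return min(values), max(values)


def min_max_transform(values, lo, hi):
    """Map values into [0, 1]; assumes hi > lo (a constant column is not handled)."""
    span = hi - lo
    return [(v - lo) / span for v in values]


def min_max_inverse(scaled, lo, hi):
    """Undo the scaling to get back to the original units."""
    return [s * (hi - lo) + lo for s in scaled]


# Hypothetical traffic volumes, not taken from the dataset
traffic = [0, 1820, 3640, 7280]
lo, hi = min_max_fit(traffic)
scaled = min_max_transform(traffic, lo, hi)
restored = min_max_inverse(scaled, lo, hi)

print(scaled)    # [0.0, 0.25, 0.5, 1.0]
print(restored)  # [0.0, 1820.0, 3640.0, 7280.0]
```

Keeping a separate scaler for the target, as Y_scaler does above, is what later allows the predictions to be converted back to traffic volumes with inverse_transform.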

Below is the user-defined function that arranges the data into input windows and target horizons suitable for forecasting.

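 def custom_ts_multi_data_prep(dataset, target, start, end, window, horizon):
     X = []
     y = []
     start = start + window
     if end is None:
         end = len(dataset) - horizon
     for i in range(start, end):
         # Input: the `window` rows that precede position i
         indices = range(i-window, i)
         X.append(dataset[indices])
         # Target: the `horizon` values that follow position i
         indicey = range(i+1, i+1+horizon)
         y.append(target[indicey])
     return np.array(X), np.array(y) 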

As we are doing multi-step forecasting, let’s allow the model to see the past 48 hours of data and forecast the following 10 hours; for that, we set the history window to 48 and the horizon to 10.

 hist_window = 48
 horizon = 10
 TRAIN_SPLIT = 30000
 x_train, y_train = custom_ts_multi_data_prep(X_data, Y_data, 0, TRAIN_SPLIT, hist_window, horizon)
 x_vali, y_vali = custom_ts_multi_data_prep(X_data, Y_data, TRAIN_SPLIT, None, hist_window, horizon) 
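
To see what shapes this kind of windowing produces, the following is a small sketch on toy data. The function `make_windows` is a hypothetical, simplified stand-in for the function above (it has no start and end arguments), and the array sizes are chosen only for illustration.

```python
import numpy as np


def make_windows(features, target, window, horizon):
    """Pair each block of `window` feature rows with the `horizon` targets that follow."""
    X, y = [], []
    for i in range(window, len(features) - horizon):
        X.append(features[i - window:i])         # rows i-window .. i-1
        y.append(target[i + 1:i + 1 + horizon])  # rows i+1 .. i+horizon
    return np.array(X), np.array(y)


# Toy data: 12 time steps, 2 feature columns, 1 target column
features = np.arange(24, dtype=float).reshape(12, 2)
target = np.arange(12, dtype=float).reshape(12, 1)

X, y = make_windows(features, target, window=4, horizon=3)
print(X.shape)       # (5, 4, 2) -> samples, time steps, features
print(y.shape)       # (5, 3, 1) -> samples, horizon, 1
print(y[0].ravel())  # [5. 6. 7.]
```

Note that, as in the function above, the target starts at position i+1, so the value at position i itself belongs to neither the input window nor the target.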

Each input window should contain the eight scaled columns, and each target should contain the next ten observations of traffic volume.

 print ('Multiple window of past history\n')
 print (x_train[0])
 print ('\n Target horizon\n')
 print (y_train[0]) 


 Multiple window of past history
 [[0.63636364 0.92972555 0.         0.         0.4        0.1
   0.7        0.76167582]
  [0.63636364 0.93320863 0.         0.         0.75       0.1
   0.06666667 0.62032967]
  [0.63636364 0.93391815 0.         0.         0.9        0.1
   0.56666667 0.65480769]
  [0.63636364 0.93569194 0.         0.         0.9        0.1
   0.56666667 0.69038462]
  [0.63636364 0.93894927 0.         0.         0.75       0.1
   0.06666667 0.67554945]
  [0.63636364 0.94081981 0.         0.         0.01       0.
   0.73333333 0.71167582]
  [0.63636364 0.94549618 0.         0.         0.01       0.
   0.73333333 0.76703297]
  [0.63636364 0.94772148 0.         0.         0.01       0.
   0.73333333 0.82623626]
  [0.63636364 0.9486245  0.         0.         0.2        0.1
   0.13333333 0.79546703]
  [0.63636364 0.94527042 0.         0.         0.2        0.1
   0.13333333 0.65521978]
  [0.63636364 0.93840101 0.         0.         0.2        0.1
   0.13333333 0.48612637]
  [0.63636364 0.93327313 0.         0.         0.01       0.
   0.73333333 0.38241758]
  [0.63636364 0.93078982 0.         0.         0.01       0.
   0.73333333 0.32431319]
  [0.63636364 0.92611346 0.         0.         0.01       0.
   0.73333333 0.21002747]]
  Target horizon

Prepare the training and validation data using the TensorFlow tf.data API, which is a faster and more efficient way to feed data for training.

 batch_size = 256
 buffer_size = 150
 train_data = tf.data.Dataset.from_tensor_slices((x_train, y_train))
 train_data = train_data.cache().shuffle(buffer_size).batch(batch_size).repeat()
 val_data = tf.data.Dataset.from_tensor_slices((x_vali, y_vali))
 val_data = val_data.batch(batch_size).repeat() 

Build and compile the model

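 lstm_model = tf.keras.models.Sequential([
     tf.keras.layers.Input(shape=x_train.shape[-2:]),  # (time steps, features)
     tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(200, return_sequences=True)),
     tf.keras.layers.Dense(20, activation='tanh'),
     # Assumption: the listing is truncated, so this layer (and its size) is added to collapse the time axis
     tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(150)),
     tf.keras.layers.Dense(20, activation='tanh'),
     tf.keras.layers.Dense(20, activation='tanh'),
     tf.keras.layers.Dense(units=horizon),  # assumed output layer: one unit per forecast step
 ])
 lstm_model.compile(optimizer='adam', loss='mse')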

Configure the callbacks and start training with early stopping and checkpointing. Early stopping halts training when the monitored loss has not improved for the number of epochs given by patience, and the checkpoint saves the model each time the monitored loss reaches a new minimum.

 model_path = 'Bidirectional_LSTM_Multivariate.h5'
 early_stopings = tf.keras.callbacks.EarlyStopping(monitor='val_loss', min_delta=0, patience=10, verbose=1, mode='min')
 checkpoint =  tf.keras.callbacks.ModelCheckpoint(model_path, monitor='val_loss', save_best_only=True, mode='min', verbose=0)
 callbacks = [early_stopings, checkpoint]

 history = lstm_model.fit(train_data, epochs=150, steps_per_epoch=100, validation_data=val_data, validation_steps=50, verbose=1, callbacks=callbacks)

Early stopping has done its job; the model stopped training after 32 of the 150 epochs.
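
The patience rule itself can be sketched in a few lines of plain Python. This is a simplified illustration of the idea, not the Keras implementation; the function name `early_stopping_epoch` and the loss values are hypothetical.

```python
def early_stopping_epoch(val_losses, patience, min_delta=0.0):
    """Return (stop_epoch, best_epoch), both counted from 1.

    Training stops once `patience` consecutive epochs fail to improve
    on the best loss seen so far by more than `min_delta`.
    """
    best = float("inf")
    best_epoch = 0
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            # New best loss: remember it and reset the counter
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    # Patience never ran out: training used every epoch
    return len(val_losses), best_epoch


# Hypothetical validation losses, one per epoch
losses = [0.90, 0.60, 0.50, 0.52, 0.51, 0.53, 0.55]
stop_epoch, best_epoch = early_stopping_epoch(losses, patience=3)
print(stop_epoch, best_epoch)  # 6 3
```

In this toy run the best loss occurs at epoch 3 and training stops at epoch 6, which is why a checkpoint that keeps only the best model is useful alongside early stopping.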

Check the loss curve for training and validation. 

 plt.plot(history.history['loss'])
 plt.plot(history.history['val_loss'])
 plt.title('Model loss')
 plt.legend(['train loss', 'validation loss']) 

Prepare the test input from the last 48 hrs of available history, forecast the next 10 hrs, and check the prediction against the actual values by plotting both. Finally, evaluate the result with standard performance metrics.

 cols = ['holiday', 'temp', 'rain_1h', 'snow_1h', 'clouds_all', 'weather_main', 'weather_description', 'traffic_volume']
 # Assumption: the hold-out step is missing from the listing, so the last `horizon` rows serve as ground truth
 validate = data[cols].tail(horizon)
 # Model input: the 48 hrs just before the hold-out, scaled with the already fitted scaler (no refit)
 data_val = X_scaler.transform(data[cols].iloc[-(hist_window + horizon):-horizon])
 val_rescaled = data_val.reshape(1, data_val.shape[0], data_val.shape[1])
 pred = lstm_model.predict(val_rescaled)
 pred_Inverse = Y_scaler.inverse_transform(pred)

 def timeseries_evaluation_metrics_func(y_true, y_pred):
     def mean_absolute_percentage_error(y_true, y_pred): 
         y_true, y_pred = np.array(y_true), np.array(y_pred)
         return np.mean(np.abs((y_true - y_pred) / y_true)) * 100
     print('Evaluation metric results:-')
     print(f'MSE is : {metrics.mean_squared_error(y_true, y_pred)}')
     print(f'MAE is : {metrics.mean_absolute_error(y_true, y_pred)}')
     print(f'RMSE is : {np.sqrt(metrics.mean_squared_error(y_true, y_pred))}')
     print(f'MAPE is : {mean_absolute_percentage_error(y_true, y_pred)}')
     print(f'R2 is : {metrics.r2_score(y_true, y_pred)}',end='\n\n') 

 timeseries_evaluation_metrics_func(validate['traffic_volume'], pred_Inverse[0])

 plt.plot(list(validate['traffic_volume']))
 plt.plot(list(pred_Inverse[0]))
 plt.title("Actual vs Predicted")
 plt.ylabel("Traffic volume")
 plt.legend(('Actual', 'Predicted'))
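
As a small worked example of these metrics, the sketch below computes MSE, MAE, RMSE, and MAPE by hand on made-up numbers. The function name `regression_metrics` and the values are hypothetical. Unlike the MAPE helper above, this version skips actual values of zero, where the percentage error is undefined; that is a design choice of the sketch, not something taken from the book.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute MSE, MAE, RMSE and MAPE for two equal-length sequences."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    # Percentage errors only where the actual value is non-zero
    pct = [abs(e / t) for e, t in zip(errors, y_true) if t != 0]
    mape = 100 * sum(pct) / len(pct) if pct else float("nan")
    return {"MSE": mse, "MAE": mae, "RMSE": math.sqrt(mse), "MAPE": mape}


# Made-up actual and predicted volumes; the last actual value is zero
actual = [100.0, 200.0, 400.0, 0.0]
predicted = [110.0, 180.0, 400.0, 10.0]
result = regression_metrics(actual, predicted)
print(result)
```

Here the errors are -10, 20, 0, and -10, so the MSE is 150 and the MAE is 10. An hour with zero traffic would make a plain MAPE divide by zero, which is worth keeping in mind when reading the MAPE reported for traffic data.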


We have seen how time series forecasting differs from other prediction techniques and how components such as trend and seasonality affect the analysis. We have discussed only one technique from the book, where the author covers many more techniques for single-step and multi-step analysis. The discussed method is quite impressive and can be adapted to real-time situations.


Vijaysinh Lendave
Vijaysinh is an enthusiast in machine learning and deep learning. He is skilled in ML algorithms, data manipulation, handling and visualization, model building.
