Building a transfer learning model for time series forecasting

This article is about the transfer learning technique and how to use it in time series forecasting problems.
Transfer learning saves effort when training large machine learning or deep learning models: features already learned on one task are reused instead of being relearned from scratch. Many pretrained models exist for computer vision. In this article, we look at how to apply the same idea to time series forecasting, in particular with a deep learning model such as an LSTM. We build a model for one forecasting task and then reuse it as a pretrained model for a different but similar forecasting task, without spending much effort on training.

Table of Contents

  1. What is transfer learning?
  2. Building transfer learning model for time series forecasting
  3. Using a pretrained time series forecasting model
  4. Summary

What is transfer learning?

Transfer learning means taking the readily available, pretrained weights of a model trained on a similar task and reusing them in our own task to get good results with less training. The TensorFlow framework provides various pretrained models, but they are mostly suited to image classification.

This article therefore walks through a case study of transfer learning on time series data: a first model is built on one portion of the data, and a second model that reuses it is then developed on another portion of the same kind of data and used to obtain predictions.

Building Transfer Learning Model

For the case study in this article, we use time series data to forecast household power consumption.

The data comes as a semicolon-separated text (txt) file. It was read with the read_csv() function of pandas, using the parse_dates parameter to combine the Date and Time columns into a single datetime index suitable for time series forecasting. The steps to follow are shown below.

import pandas as pd

# '?' and 'nan' mark missing readings; Date and Time are merged into the 'dt' index
df = pd.read_csv('/content/drive/MyDrive/Colab notebooks/Transfer learning with time series data/household_power_consumption.txt', sep=';',
                parse_dates={'dt' : ['Date', 'Time']}, infer_datetime_format=True,
                low_memory=False, na_values=['nan','?'], index_col='dt')
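
The dict form of parse_dates and the infer_datetime_format flag may emit deprecation warnings in recent pandas releases. A minimal sketch of an alternative is given here, which reads the columns as text and builds the datetime index explicitly. The sample rows, the reduced column set, and the day/month/year date format are assumptions made for illustration, not taken from the actual file.

```python
import io
import pandas as pd

# Tiny made-up sample in the layout the read_csv call above implies:
# ';' separated, separate Date and Time columns, '?' marking missing values.
# The day/month/year date format is an assumption about the real file.
sample = io.StringIO(
    "Date;Time;Global_active_power;Voltage\n"
    "16/12/2006;17:24:00;4.216;234.84\n"
    "16/12/2006;17:25:00;?;233.63\n"
    "16/12/2006;17:26:00;5.374;233.29\n"
)

raw = pd.read_csv(sample, sep=";", na_values=["nan", "?"])

# Combine the two text columns into one timestamp and use it as the index.
raw["dt"] = pd.to_datetime(raw["Date"] + " " + raw["Time"],
                           format="%d/%m/%Y %H:%M:%S")
df_sample = raw.drop(columns=["Date", "Time"]).set_index("dt")

print(df_sample)
print(df_sample.isna().sum())
```

Stating the format explicitly avoids ambiguity between day-first and month-first dates, at the cost of having to know the layout of the file in advance.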

Once the data was loaded, the first few observations were inspected using the head() function of pandas as shown below.

df.head()

The data was then split using the train_test_split() function of scikit-learn, with 20% of the available data set aside for validation. Note that train_test_split() shuffles the rows by default, so the two parts are not contiguous in time. The steps to follow are shown below.

from sklearn.model_selection import train_test_split

main,val=train_test_split(df,test_size=0.2)
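
For time series, a chronological split is a common alternative to a shuffled one, since it keeps each part contiguous in time. The following is a minimal sketch of that idea on made-up data; it is not the split used in this article, and the names demo, main_demo and val_demo are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the household DataFrame: 100 hourly readings.
idx = pd.date_range("2021-01-01", periods=100, freq="60min")
demo = pd.DataFrame({"Global_active_power": np.arange(100, dtype=float)},
                    index=idx)

# Keep the first 80% of the timeline for the base model and the last 20%
# for the later transfer learning step.
split_at = int(len(demo) * 0.8)
main_demo = demo.iloc[:split_at]
val_demo = demo.iloc[split_at:]

print(len(main_demo), len(val_demo))
print(main_demo.index.max(), "<", val_demo.index.min())
```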

Here the “main” data was used to build the first LSTM model. The main dataset was first visualized to look for trend and seasonality: selected features were resampled at daily and monthly frequencies using aggregates such as sum and mean.

The feature named Global_active_power was resampled to a daily sum and a daily mean to visualize its distribution as shown below.

import matplotlib.pyplot as plt

main.Global_active_power.resample('D').sum().plot(title='Resampling for sum')
plt.tight_layout()
plt.show()

main.Global_active_power.resample('D').mean().plot(title='Resampling for mean', color='red')
plt.tight_layout()
plt.show()

In the same way, any feature of the dataset can be resampled to check its distribution under different aggregate functions, and at different time series frequencies. Sample code for resampling one of the features monthly and plotting it is given below.

main['Voltage'].resample('M').mean().plot(kind='bar', color='red')
plt.xticks(rotation=60)
plt.ylabel('Voltage')
plt.title('Voltage per month (averaged over month)')
plt.show()
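
To make the difference between the two aggregates concrete, here is a small sketch on a made-up series (the constant values are purely illustrative). With 24 hourly readings per day, the daily sum grows with the number of readings, while the daily mean stays on the scale of a single reading.

```python
import pandas as pd

# Made-up series: 48 hourly readings, all equal to 2.0, covering two full days.
idx = pd.date_range("2021-01-01", periods=48, freq="60min")
power = pd.Series(2.0, index=idx, name="Global_active_power")

daily_sum = power.resample("D").sum()    # total per day
daily_mean = power.resample("D").mean()  # average per day

print(daily_sum.tolist())   # [48.0, 48.0]
print(daily_mean.tolist())  # [2.0, 2.0]
```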

The features lie on different scales and have to be normalized to a common scale. For this purpose, the MinMaxScaler class of scikit-learn was used, and the scaled data was then reframed into a supervised learning format for model fitting, as shown below.

from sklearn.preprocessing import MinMaxScaler
# df_resample (the data resampled to hourly frequency) and series_to_supervised()
# are used here, but their definitions are not shown in this article
values = df_resample.values
scaler = MinMaxScaler(feature_range=(0, 1))
scaled = scaler.fit_transform(values)
reframed = series_to_supervised(scaled, 1, 1)
 
# drop columns we don't want to predict
reframed.drop(reframed.columns[[8,9,10,11,12,13]], axis=1, inplace=True)
print(reframed.head())
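
The snippet above calls series_to_supervised(), which is not defined in this article, and uses df_resample, which is presumably the hourly resampled frame (for example df.resample('h').mean(), an assumption). A minimal sketch of what such a helper could look like is given here: it places the values at earlier time steps next to the values at the current step, so each row becomes one input/output pair. The column naming and the demo data are assumptions for illustration.

```python
import numpy as np
import pandas as pd

def series_to_supervised(data, n_in=1, n_out=1, dropnan=True):
    """Frame a multivariate series as lagged inputs plus current/future outputs."""
    df = pd.DataFrame(data)
    n_vars = df.shape[1]
    cols, names = [], []
    # Input columns: values at t-n_in, ..., t-1
    for i in range(n_in, 0, -1):
        cols.append(df.shift(i))
        names += [f"var{j + 1}(t-{i})" for j in range(n_vars)]
    # Output columns: values at t, t+1, ..., t+n_out-1
    for i in range(n_out):
        cols.append(df.shift(-i))
        suffix = "(t)" if i == 0 else f"(t+{i})"
        names += [f"var{j + 1}{suffix}" for j in range(n_vars)]
    framed = pd.concat(cols, axis=1)
    framed.columns = names
    # Shifting leaves NaN rows at the edges; drop them by default.
    return framed.dropna() if dropnan else framed

# Hypothetical stand-in for the scaled hourly data: 5 rows, 7 features.
scaled_demo = np.arange(35, dtype=float).reshape(5, 7)
reframed_demo = series_to_supervised(scaled_demo, 1, 1)

# Keep the 7 lagged inputs plus var1(t); drop the other 6 current-step columns.
reframed_demo = reframed_demo.drop(
    columns=reframed_demo.columns[[8, 9, 10, 11, 12, 13]])
print(reframed_demo.shape)  # (4, 8)
print(list(reframed_demo.columns))
```

With seven features, one lag and one output step, the framed table has 14 columns; dropping columns 8 to 13 leaves the seven lagged inputs and the current value of the first feature as the target.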

The scaled values are now split into train and test sets and reshaped for model building. The steps involved are shown below.

# split into train and test sets
values = reframed.values
 
# first year of hourly data (365 days * 24 hours) for training, the rest for testing
n_train_time = 365*24
train = values[:n_train_time, :]
test = values[n_train_time:, :]
# split into input and outputs
train_X, train_y = train[:, :-1], train[:, -1]
test_X, test_y = test[:, :-1], test[:, -1]
# reshape input to be 3D [samples, timesteps, features]
train_X = train_X.reshape((train_X.shape[0], 1, train_X.shape[1]))
test_X = test_X.reshape((test_X.shape[0], 1, test_X.shape[1]))
print(train_X.shape, train_y.shape, test_X.shape, test_y.shape)
# We reshaped the input into the 3D format as expected by LSTMs, namely [samples, timesteps, features].

Now that the data is split, we proceed with model building, where a recurrent neural network is built. But first, let’s import the necessary libraries as shown below.

# only the Keras classes used by the model below are imported
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import LSTM
from tensorflow.keras.layers import Dropout

So now the model is built with the layers as shown below.

model = Sequential()
model.add(LSTM(100, input_shape=(train_X.shape[1], train_X.shape[2])))
model.add(Dropout(0.2))
model.add(Dense(1))

The model is now compiled as shown below, with mean squared error as the loss. For evaluation, the root mean squared error (RMSE) is computed afterwards on the predictions; it is expressed in the same units as the target, which makes it a convenient measure for time series forecasts.

model.compile(loss='mean_squared_error', optimizer='adam')

Now the model is fitted to the split data as shown below.

history = model.fit(train_X, train_y, epochs=20, batch_size=70, validation_data=(test_X, test_y), verbose=2, shuffle=False)

Now let’s use this model to obtain predictions. As the model was compiled with mean_squared_error, we evaluate it on the same basis, using RMSE. Before the error is computed, the predictions and the targets have to be transformed back to the original scale, and the steps for the same are shown below.

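import numpy as np
from sklearn.metrics import mean_squared_error

# make a prediction
ypred = model.predict(test_X)
# flatten the inputs back to 2D: [samples, 7 features]
test_X = test_X.reshape((test_X.shape[0], 7))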
# invert scaling for forecasted values
inv_ypred = np.concatenate((ypred, test_X[:, -6:]), axis=1)
inv_ypred = scaler.inverse_transform(inv_ypred)
inv_ypred = inv_ypred[:,0]
# invert scaling for actual values
test_y = test_y.reshape((len(test_y), 1))
inv_yact = np.concatenate((test_y, test_X[:, -6:]), axis=1)
inv_yact = scaler.inverse_transform(inv_yact)
inv_yact = inv_yact[:,0]
# calculate RMSE
rmse = np.sqrt(mean_squared_error(inv_yact, inv_ypred))
print('Test RMSE: %.3f' % rmse)
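
The concatenation in the code above is needed because the scaler was fitted on all seven columns, so inverse_transform() expects seven columns back; the six extra columns only serve as padding around the target. The following is a minimal sketch of the same idea in plain NumPy, inverting the min-max scaling for the target column only and computing RMSE in the original units. The data and the pretend predictions are made up for illustration.

```python
import numpy as np

# Made-up data: 4 rows, 3 features; column 0 plays the role of the target.
values = np.array([[1.0, 10.0, 100.0],
                   [2.0, 20.0, 300.0],
                   [3.0, 30.0, 200.0],
                   [5.0, 40.0, 400.0]])

# Min-max scaling to [0, 1], column by column.
col_min = values.min(axis=0)
col_max = values.max(axis=0)
scaled = (values - col_min) / (col_max - col_min)

# Scaled targets, and pretend model outputs on the same scaled range.
y_true_scaled = scaled[:, 0]
y_pred_scaled = np.array([0.1, 0.2, 0.5, 0.9])

def invert_target(y_scaled):
    """Undo the min-max scaling for the target column only."""
    return y_scaled * (col_max[0] - col_min[0]) + col_min[0]

y_true = invert_target(y_true_scaled)
y_pred = invert_target(y_pred_scaled)

# RMSE: square root of the mean squared difference, in original units.
rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
print("RMSE in original units: %.3f" % rmse)
```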

For the model developed, we obtain a test RMSE of 0.622.

Using a pretrained time series forecasting model

Now let’s save the model, including its architecture and weights, in the h5 format as shown below.

model.save('lstm_model_new.h5')

In a new session, for a similar kind of data, the saved model can be loaded into the working environment as shown below.

from tensorflow.keras.models import load_model
loaded_model=load_model('/content/lstm_model_new.h5')

The layers of the loaded model can be obtained as shown below.

loaded_model.layers

Recall that we had set aside a part of the data for validation. For the new model, this validation data was preprocessed in the same way as described above, and a Sequential model was built by reusing and freezing layers of the loaded model to facilitate transfer learning. The steps to follow are shown below.

# new model that will reuse the pretrained layers
model1 = Sequential()

# extract all the layers from base model except the last layer
for layer in loaded_model.layers[:-1]:
    model1.add(layer)

# Freeze all the layers of base model
for layer in loaded_model.layers:
    layer.trainable=False

# adding new layers
model1.add(Dense(50))
model1.add(Dropout(0.1))
model1.add(Dense(1))

Once the reused layers are frozen, the model was compiled in the same way and fitted on the split validation data. Like the pretrained model, model1 was evaluated using root mean squared error; it performs well, yielding almost the same result as the pretrained model.

The RMSE obtained for the new model was 0.621.
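
The compile and fit steps for model1 are not shown in this article. The following is a minimal end-to-end sketch of how they might look, under stated assumptions: the data is random, the base model is trained in memory instead of being loaded from an h5 file, and the layer sizes and epoch counts are kept small so that it runs quickly. It also checks that the frozen LSTM weights do not change while the new layers are trained.

```python
import numpy as np
from tensorflow.keras import Input
from tensorflow.keras.layers import LSTM, Dense, Dropout
from tensorflow.keras.models import Sequential

rng = np.random.default_rng(0)

# Made-up data shaped like the article's inputs: [samples, 1 timestep, 7 features].
X_main, y_main = rng.random((64, 1, 7)), rng.random((64, 1))
X_val, y_val = rng.random((32, 1, 7)), rng.random((32, 1))

# Stand-in for the pretrained model (the article loads it from an .h5 file).
base = Sequential([Input(shape=(1, 7)), LSTM(8), Dropout(0.2), Dense(1)])
base.compile(loss="mean_squared_error", optimizer="adam")
base.fit(X_main, y_main, epochs=1, batch_size=16, verbose=0)

# Reuse every layer except the old output layer, and freeze what is reused.
model1 = Sequential()
model1.add(Input(shape=(1, 7)))
for layer in base.layers[:-1]:
    layer.trainable = False
    model1.add(layer)

# New trainable layers, mirroring the ones added in the article.
model1.add(Dense(50))
model1.add(Dropout(0.1))
model1.add(Dense(1))

# Compile after freezing so that the trainable flags take effect.
model1.compile(loss="mean_squared_error", optimizer="adam")

lstm_before = [w.copy() for w in model1.layers[0].get_weights()]
model1.fit(X_val, y_val, epochs=2, batch_size=16, verbose=0)
lstm_after = model1.layers[0].get_weights()

unchanged = all(np.array_equal(b, a) for b, a in zip(lstm_before, lstm_after))
print("LSTM weights unchanged:", unchanged)
```

Only the three new layers are updated during fit(); the reused LSTM acts as a fixed feature extractor, which is what keeps the second training run cheap.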

This is how transfer learning can be implemented for time series data: a model pretrained on a similar kind of data can be reused to obtain predictions with less effort.

Note

Time series data can be highly variable and carries components such as trend and seasonality. So it is good practice to visualize the series first and to use models that were pretrained on similar kinds of data.

Summary

Transfer learning is one technique for producing effective models, but the right pretrained model depends on the data. Time series data involves factors such as stationarity, seasonality, and trend, so it is important to choose a pretrained model that matches the type of data.

Darshan M
Darshan holds a Master's degree in Data Science and Machine Learning and is an everyday learner of the latest trends in the field. He is keen on learning new things, implementing them, and curating rich content on Data Science, Machine Learning, NLP and AI.