Transfer learning is an approach that saves effort when training large machine learning or deep learning models, since it avoids re-learning features from scratch for every new task. Various pretrained models are used in computer vision to facilitate transfer learning. In this article, we will learn how to leverage transfer learning in time series forecasting problems, especially when we use a deep learning model such as an LSTM for predictions. We will build a model for one time series forecasting task and reuse it as a pretrained model in a different but similar time series forecasting application, without spending much effort on training.
Table of Contents
- What is transfer learning?
- Building transfer learning model for time series forecasting
- Using a pretrained time series forecasting model
- Summary
What is transfer learning?
Transfer learning is a technique that takes readily available, pretrained weights from models trained on similar tasks and reuses them in our own task to produce results efficiently. The TensorFlow framework provides various pretrained models, but most of them are suited to image classification.
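For instance, these image backbones are typically loaded through tf.keras.applications; a minimal sketch (MobileNetV2 is just an illustrative choice here, not the model used later in this article) looks like this:

import tensorflow as tf

# Load a pretrained image backbone without its classification head.
# Models like this target image tasks, not time series forecasting.
base = tf.keras.applications.MobileNetV2(
    weights='imagenet',          # pretrained ImageNet weights
    include_top=False,           # drop the final classifier layer
    input_shape=(224, 224, 3)
)
base.trainable = False           # freeze the backbone for transfer learning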
So this article includes a case study of how to implement the transfer learning technique for time series data: first, a model is built on one dataset, and then a second model for the same kind of data is derived from it and used to obtain predictions.
Building transfer learning model for time series forecasting
For the case study in this article, we use time series forecasting techniques to forecast household power consumption.
The data was originally acquired as a text (txt) file, which was preprocessed with pandas into a comma-separated values (CSV) layout; the parse_dates argument of read_csv() was used to merge the date and time columns into an index suitable for time series forecasting. The steps to follow are shown below.
import pandas as pd

df = pd.read_csv(
    '/content/drive/MyDrive/Colab notebooks/Transfer learning with time series data/household_power_consumption.txt',
    sep=';',
    parse_dates={'dt': ['Date', 'Time']},  # merge Date and Time into one index
    infer_datetime_format=True,
    low_memory=False,
    na_values=['nan', '?'],
    index_col='dt'
)
Once the appropriate preprocessing was done, the first few observations were visualized using the head() function of pandas as shown below.
df.head()

Once the data was visualized, it was split with scikit-learn's train_test_split() function, setting aside 20% of the available data for validation. The steps to follow are shown below.
from sklearn.model_selection import train_test_split

# note: train_test_split shuffles by default; pass shuffle=False if a strictly
# chronological hold-out is required for the time series
main, val = train_test_split(df, test_size=0.2)
Here the “main” data was used to build the first LSTM model. This split was first visualized for trend and seasonality, resampling certain features over time and aggregating them with functions such as sum and mean.
The feature named Global_active_power was resampled daily and aggregated by sum and by mean to visualize its distribution, as shown below.
import matplotlib.pyplot as plt

main.Global_active_power.resample('D').sum().plot(title='Resampling for sum')
plt.tight_layout()
plt.show()

main.Global_active_power.resample('D').mean().plot(title='Resampling for mean', color='red')
plt.tight_layout()
plt.show()

In a similar manner, any feature of the dataset can be resampled to check its distribution under various aggregate functions, and at various frequencies of the time series. Sample code that resamples one of the features monthly and visualizes it is given below.
main['Voltage'].resample('M').mean().plot(kind='bar', color='red')
plt.xticks(rotation=60)
plt.ylabel('Voltage')
plt.title('Mean voltage per month')
plt.show()

As we have seen earlier, there are various features that have to be normalized to a common scale. For this purpose, the MinMaxScaler class of scikit-learn was used, and the data was preprocessed for model fitting as shown below.
from sklearn.preprocessing import MinMaxScaler

# To train on the hourly-resampled data, use the values below
values = df_resample.values
scaler = MinMaxScaler(feature_range=(0, 1))
scaled = scaler.fit_transform(values)
reframed = series_to_supervised(scaled, 1, 1)
# drop the columns we don't want to predict
reframed.drop(reframed.columns[[8, 9, 10, 11, 12, 13]], axis=1, inplace=True)
print(reframed.head())
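Note that series_to_supervised() is not defined in the snippet above. A minimal version of this widely used helper, which shifts the columns of a (scaled) multivariate series to build input/output pairs for supervised learning, could look like the sketch below; the exact signature and naming are assumptions.

import pandas as pd

def series_to_supervised(data, n_in=1, n_out=1, dropnan=True):
    """Frame a multivariate series as a supervised learning problem."""
    n_vars = 1 if isinstance(data, list) else data.shape[1]
    df = pd.DataFrame(data)
    cols, names = [], []
    # input sequence (t-n_in, ..., t-1)
    for i in range(n_in, 0, -1):
        cols.append(df.shift(i))
        names += ['var%d(t-%d)' % (j + 1, i) for j in range(n_vars)]
    # forecast sequence (t, ..., t+n_out-1)
    for i in range(0, n_out):
        cols.append(df.shift(-i))
        if i == 0:
            names += ['var%d(t)' % (j + 1) for j in range(n_vars)]
        else:
            names += ['var%d(t+%d)' % (j + 1, i) for j in range(n_vars)]
    agg = pd.concat(cols, axis=1)
    agg.columns = names
    if dropnan:
        # rows at the edges have NaNs introduced by shifting
        agg.dropna(inplace=True)
    return agg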
Now the scaled values are split into train and test sets and reshaped to facilitate model building. The steps involved are shown below.
# split into train and test sets
values = reframed.values
n_train_time = 365 * 24
train = values[:n_train_time, :]
test = values[n_train_time:, :]
# split into inputs and outputs
train_X, train_y = train[:, :-1], train[:, -1]
test_X, test_y = test[:, :-1], test[:, -1]
# reshape input into the 3D format expected by LSTMs: [samples, timesteps, features]
train_X = train_X.reshape((train_X.shape[0], 1, train_X.shape[1]))
test_X = test_X.reshape((test_X.shape[0], 1, test_X.shape[1]))
print(train_X.shape, train_y.shape, test_X.shape, test_y.shape)
Now that the data is split successfully, we proceed with building a recurrent neural network. But first, let's import the necessary libraries as shown below.
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, LSTM
from tensorflow.keras.callbacks import EarlyStopping
So now the model is built with the layers as shown below.
model = Sequential()
model.add(LSTM(100, input_shape=(train_X.shape[1], train_X.shape[2])))
model.add(Dropout(0.2))
model.add(Dense(1))
The model is compiled as shown below with mean squared error as the loss; it is later evaluated with root mean squared error (RMSE), a more relevant metric for time series data.
model.compile(loss='mean_squared_error', optimizer='adam')
Now the model is fitted to the split data as shown below.
history = model.fit(train_X, train_y, epochs=20, batch_size=70, validation_data=(test_X, test_y), verbose=2, shuffle=False)
Now let's obtain predictions with this model; since it was compiled with mean_squared_error, we evaluate it on the same grounds. For time series predictions, the scaling applied earlier must be inverted on both the predicted and the actual values, and the steps for this are shown below.
import numpy as np
from sklearn.metrics import mean_squared_error

# make a prediction
ypred = model.predict(test_X)
test_X = test_X.reshape((test_X.shape[0], 7))
# invert scaling for forecasted values
inv_ypred = np.concatenate((ypred, test_X[:, -6:]), axis=1)
inv_ypred = scaler.inverse_transform(inv_ypred)
inv_ypred = inv_ypred[:, 0]
# invert scaling for actual values
test_y = test_y.reshape((len(test_y), 1))
inv_yact = np.concatenate((test_y, test_X[:, -6:]), axis=1)
inv_yact = scaler.inverse_transform(inv_yact)
inv_yact = inv_yact[:, 0]
# calculate RMSE
rmse = np.sqrt(mean_squared_error(inv_yact, inv_ypred))
print('Test RMSE: %.3f' % rmse)
For the model developed, we obtain a test RMSE of 0.622.
Using a pretrained time series forecasting model
Now let's save the model's weights and parameters in HDF5 (h5) format as shown below.
model.save('lstm_model_new.h5')
Now, in a new session working with a similar kind of data, the saved model can be loaded into the working environment as shown below.
from tensorflow.keras.models import load_model

loaded_model = load_model('/content/lstm_model_new.h5')
The layers of the loaded model can be obtained as shown below.
loaded_model.layers

Recall that a portion of the data was kept aside for validation. For the new model, this validation data has to be preprocessed in the same way as above, and a Sequential model is then built by freezing certain layers of the loaded model to facilitate transfer learning. The steps to follow are shown below.
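First, a minimal sketch of preparing the held-out “val” split; this assumes a hypothetical val_resample (the val split resampled just like df_resample above) and reuses the fitted scaler and the series_to_supervised() helper:

# Apply the SAME preprocessing pipeline to the validation split.
# val_resample is assumed to be 'val' resampled exactly like df_resample.
val_values = val_resample.values
val_scaled = scaler.transform(val_values)          # reuse the fitted scaler
val_reframed = series_to_supervised(val_scaled, 1, 1)
val_reframed.drop(val_reframed.columns[[8, 9, 10, 11, 12, 13]],
                  axis=1, inplace=True)

vals = val_reframed.values
val_X, val_y = vals[:, :-1], vals[:, -1]
# reshape to the 3D [samples, timesteps, features] layout LSTMs expect
val_X = val_X.reshape((val_X.shape[0], 1, val_X.shape[1]))

With the validation tensors ready, the transfer model itself is assembled by reusing and freezing the pretrained layers: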
model1 = Sequential()

# extract all the layers from the base model except the last layer
for layer in loaded_model.layers[:-1]:
    model1.add(layer)

# freeze all the layers of the base model
for layer in loaded_model.layers:
    layer.trainable = False

# add new layers on top
model1.add(Dense(50, input_dim=1))
model1.add(Dropout(0.1))
model1.add(Dense(1))
Once the necessary layers were frozen, the model was compiled in the same way as before and fitted to the validation split. Like the pretrained model, model1 was evaluated for root mean squared error; it shows excellent performance, yielding almost the same results as the pretrained model.
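The article does not reproduce this code; a minimal sketch, assuming the val_X and val_y tensors prepared above and the same inverse-scaling steps used for the base model, could look like this:

# Compile and fit the transfer model exactly like the base model.
model1.compile(loss='mean_squared_error', optimizer='adam')
history1 = model1.fit(val_X, val_y, epochs=20, batch_size=70,
                      verbose=2, shuffle=False)

# Evaluate with RMSE after inverting the scaling, as done above.
ypred1 = model1.predict(val_X)
val_X_flat = val_X.reshape((val_X.shape[0], 7))
inv_pred = scaler.inverse_transform(
    np.concatenate((ypred1, val_X_flat[:, -6:]), axis=1))[:, 0]
inv_act = scaler.inverse_transform(
    np.concatenate((val_y.reshape(-1, 1), val_X_flat[:, -6:]), axis=1))[:, 0]
print('Val RMSE: %.3f' % np.sqrt(mean_squared_error(inv_act, inv_pred)))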
The root mean squared error of the new model was 0.621.
This is how we can implement transfer learning for time series data, where models pretrained on similar kinds of data can be reused to obtain predictions with far less effort.
Note
Time series data are highly uncertain and exhibit properties such as trend and seasonality. It is therefore best practice to visualize the series first and to use models pretrained on similar kinds of data.
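As one way of performing this check, a quick decomposition with statsmodels (an assumption here; any decomposition tool works) separates the trend, seasonal, and residual components:

import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import seasonal_decompose

# Decompose a daily-resampled series into trend, seasonal and residual parts;
# the weekly period (7) is an assumed seasonality for household power data.
series = df['Global_active_power'].resample('D').mean().dropna()
result = seasonal_decompose(series, model='additive', period=7)
result.plot()
plt.show()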
Summary
Transfer learning is an effective technique for producing models efficiently, but the pretrained model to use varies with the data: time series data involves technical factors such as stationarity, seasonality, and trend, so it becomes important to choose the right pretrained model for the right type of data.