Time series refers to data points plotted in sequential time order. Those data points can be almost anything: an athlete's performance over a season, a cricket player's runs across one-day matches, a weather reading taken every month, or the daily closing price of a company's stock. Time series analysis uses the same term, but it is concerned with taking those data points and cleaning, understanding, and forecasting them using statistical tools or programming languages. A time series is sometimes discussed alongside panel data: panel data is the general class of multidimensional datasets, while a time series dataset is a one-dimensional panel.
Let's talk about time series forecasting. We already know that time series analysis is about analyzing time series data and extracting meaningful insights from it. Time series forecasting goes a step further: it uses models to predict future values based on previously observed, cleaned, and processed time series data.
Components of Time Series
A time series has four categories of components: trend, seasonal variation, cyclic variation, and random or irregular movements.
- Trend: shows whether the values in a dataset are moving towards higher or lower levels over the long term.
- Periodic fluctuations: patterns that repeat in the visualization over a period of time. They are of two types:
- Seasonal Variations: periodic fluctuations that repeat over a regular period, where the change happens within less than a year (a short-term change).
- Cyclic Variations: periodic fluctuations that repeat over a cycle longer than one year.
- Random Movements or Noise: here the data points are unpredictable, and it is hard to forecast this kind of data because patterns cannot be found easily.
Real-world data, before cleaning, almost always contains some combination of noise, trend, and seasonality, as the small sketch below illustrates.
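As a quick illustration of these components, the snippet below builds a toy daily series by adding a linear trend, a yearly seasonal cycle, and random noise. The series, its length, and the magnitudes are made up purely for demonstration; they are not taken from any real dataset.
import numpy as np
import matplotlib.pyplot as plt

t = np.arange(3 * 365)                            # three years of daily points
trend = 0.05 * t                                  # slow upward trend
seasonal = 10 * np.sin(2 * np.pi * t / 365)       # repeats roughly every year
noise = np.random.normal(scale=2, size=t.shape)   # irregular movement

series = trend + seasonal + noise                 # the observed series
plt.plot(t, series)
plt.xlabel('Day')
plt.ylabel('Value')
plt.title('Trend + seasonality + noise')
plt.show()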
TensorFlow models for forecasting
Time series forecasting (or predictive modeling) can be done with any framework. TensorFlow provides a few different styles of models for it, such as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs). You can forecast a single time step using a single feature, or you can forecast multiple steps and make all the predictions at once using a single-shot model.
Setup
These are the modules you need to import to get started; they will help you with modelling, visualization, file handling, and data exploration.
import os
import datetime
import IPython
import IPython.display
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
import tensorflow as tf
mpl.rcParams['figure.figsize'] = (8, 6)
mpl.rcParams['axes.grid'] = False
Let's take the weather dataset recorded by the Max Planck Institute for Biogeochemistry. This dataset contains 14 different features, such as air temperature, humidity, and atmospheric pressure. Starting in 2003, these data points were collected every 10 minutes. Let's explore the dataset:
# Download the zip file of the dataset
file_path = tf.keras.utils.get_file(
    origin='https://storage.googleapis.com/tensorflow/tf-keras-datasets/jena_climate_2009_2016.csv.zip',
    fname='jena_climate_2009_2016.csv.zip',
    extract=True)
csv_path, _ = os.path.splitext(file_path)

# Explore the dataset, keeping one reading per hour (every 6th row)
df = pd.read_csv(csv_path)
df = df[5::6]
date_time = pd.to_datetime(df.pop('Date Time'), format='%d.%m.%Y %H:%M:%S')
df.head()
#Data visualization over the years with some features
plot_cols = ['T (degC)', 'p (mbar)', 'rho (g/m**3)']
plot_features = df[plot_cols]
plot_features.index = date_time
_ = plot_features.plot(subplots=True)
plot_features = df[plot_cols][:480]
plot_features.index = date_time[:480]
_ = plot_features.plot(subplots=True)
Let's clean the data for better modelling and visualization:
# Replace the -9999.0 error values in the wind velocity columns with zeros
wv = df['wv (m/s)']
bad_wv = wv == -9999.0
wv[bad_wv] = 0.0

max_wv = df['max. wv (m/s)']
bad_max_wv = max_wv == -9999.0
max_wv[bad_max_wv] = 0.0

df['wv (m/s)'].min()

# Convert the wind direction and velocity columns into a wind vector
wv = df.pop('wv (m/s)')
max_wv = df.pop('max. wv (m/s)')

# Convert the direction to radians
wd_rad = df.pop('wd (deg)') * np.pi / 180

# Wind x and y components
df['Wx'] = wv * np.cos(wd_rad)
df['Wy'] = wv * np.sin(wd_rad)

# Max wind x and y components
df['max Wx'] = max_wv * np.cos(wd_rad)
df['max Wy'] = max_wv * np.sin(wd_rad)

# Let's plot the distribution of wind vectors
plt.hist2d(df['Wx'], df['Wy'], bins=(50, 50), vmax=400)
plt.colorbar()
plt.xlabel('Wind X [m/s]')
plt.ylabel('Wind Y [m/s]')
ax = plt.gca()
ax.axis('tight')
Let's convert the date-time values into seconds and then encode the time of day and time of year as sine and cosine signals:
timestamp_s = date_time.map(datetime.datetime.timestamp)

day = 24 * 60 * 60
year = 365.2425 * day

df['Day sin'] = np.sin(timestamp_s * (2 * np.pi / day))
df['Day cos'] = np.cos(timestamp_s * (2 * np.pi / day))
df['Year sin'] = np.sin(timestamp_s * (2 * np.pi / year))
df['Year cos'] = np.cos(timestamp_s * (2 * np.pi / year))
Plot the time-of-day sine and cosine signals:
plt.plot(np.array(df['Day sin'])[:25])
plt.plot(np.array(df['Day cos'])[:25])
plt.xlabel('Time [h]')
plt.title('Time of day signal')
Split the data for time series forecasting
column_indices = {name: i for i, name in enumerate(df.columns)}

# 70% training, 20% validation, 10% test
n = len(df)
train_df = df[0:int(n*0.7)]
val_df = df[int(n*0.7):int(n*0.9)]
test_df = df[int(n*0.9):]

num_features = df.shape[1]
Data normalization is a crucial step before training your neural network. For normalization, we subtract the mean and divide by the standard deviation of the training data.
train_mean = train_df.mean()
train_std = train_df.std()

train_df = (train_df - train_mean) / train_std
val_df = (val_df - train_mean) / train_std
test_df = (test_df - train_mean) / train_std
Let's draw a violin plot of all the features to see how the normalized data is distributed and whether any feature is skewed:
df_std = (df - train_mean) / train_std
df_std = df_std.melt(var_name='Column', value_name='Normalized')

plt.figure(figsize=(12, 6))
ax = sns.violinplot(x='Column', y='Normalized', data=df_std)
_ = ax.set_xticklabels(df.keys(), rotation=90)
Data Windowing
In TensorFlow, we window the input dataframe so that it can be fed into several different models and we can compare which one forecasts better. The rest of this section defines a WindowGenerator class. This class contains all the logic for the input and label indices.
It also handles the indices and offsets, splits each window into an (inputs, labels) pair, and plots the contents of the resulting windows. Finally, this class will generate batches of these windows from the training, validation, and test data, using tf.data.Dataset.
class WindowGenerator():
    def __init__(self, input_width, label_width, shift,
                 train_df=train_df, val_df=val_df, test_df=test_df,
                 label_columns=None):
        # Store the raw data.
        self.train_df = train_df
        self.val_df = val_df
        self.test_df = test_df

        # Work out the label column indices.
        self.label_columns = label_columns
        if label_columns is not None:
            self.label_columns_indices = {name: i for i, name in
                                          enumerate(label_columns)}
        self.column_indices = {name: i for i, name in
                               enumerate(train_df.columns)}

        # Work out the window parameters.
        self.input_width = input_width
        self.label_width = label_width
        self.shift = shift

        self.total_window_size = input_width + shift

        self.input_slice = slice(0, input_width)
        self.input_indices = np.arange(self.total_window_size)[self.input_slice]

        self.label_start = self.total_window_size - self.label_width
        self.labels_slice = slice(self.label_start, None)
        self.label_indices = np.arange(self.total_window_size)[self.labels_slice]

    def __repr__(self):
        return '\n'.join([
            f'Total window size: {self.total_window_size}',
            f'Input indices: {self.input_indices}',
            f'Label indices: {self.label_indices}',
            f'Label column name(s): {self.label_columns}'])
With the help of the above code you can create a window of your choice. Let's create a demo window:
w1 = WindowGenerator(input_width=6, label_width=1, shift=1,
                     label_columns=['T (degC)'])
w1
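Each window of consecutive rows also needs to be split into an (inputs, labels) pair; the make_dataset function defined in the next step relies on a split_window method for this. That method is not shown elsewhere in this article, so here is a sketch that closely follows the WindowGenerator.split_window method from the official TensorFlow tutorial:
def split_window(self, features):
    # Slice each window into its input part and its label part.
    inputs = features[:, self.input_slice, :]
    labels = features[:, self.labels_slice, :]
    if self.label_columns is not None:
        labels = tf.stack(
            [labels[:, :, self.column_indices[name]]
             for name in self.label_columns],
            axis=-1)

    # Slicing doesn't preserve static shape information, so set it manually.
    inputs.set_shape([None, self.input_width, None])
    labels.set_shape([None, self.label_width, None])

    return inputs, labels

WindowGenerator.split_window = split_window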
Create a TensorFlow dataset using the tf.data.Dataset utilities: the make_dataset function below takes a time series dataframe and converts it into a dataset of windowed (input, label) pairs.
def make_dataset(self, data):
data = np.array(data, dtype=np.float32)
ds = tf.keras.preprocessing.timeseries_dataset_from_array(
data=data,
targets=None,
sequence_length=self.total_window_size,
sequence_stride=1,
shuffle=True,
batch_size=32,)
ds = ds.map(self.split_window)
return ds
WindowGenerator.make_dataset = make_dataset
The WindowGenerator can now turn the training, validation, and test dataframes into windowed datasets. Let's proceed further towards training.
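Later code accesses these datasets through properties such as single_step_window.val and wide_window.example, but the accessors are not defined anywhere in this article. Here is a sketch of them following the official TensorFlow tutorial:
@property
def train(self):
    return self.make_dataset(self.train_df)

@property
def val(self):
    return self.make_dataset(self.val_df)

@property
def test(self):
    return self.make_dataset(self.test_df)

@property
def example(self):
    """Get and cache an example batch of `inputs, labels` for plotting."""
    result = getattr(self, '_example', None)
    if result is None:
        # No example batch was found, so get one from the `.train` dataset.
        result = next(iter(self.train))
        self._example = result
    return result

WindowGenerator.train = train
WindowGenerator.val = val
WindowGenerator.test = test
WindowGenerator.example = example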
Using a TensorFlow single-step model
inputs (t=0) -> | Model | -> prediction (t=1), compared against label (t=1)
This model is used for the simplest kind of forecast: given the current input, it returns a single predicted value one step (one hour) into the future.
As we have already set up the WindowGenerator object, let's configure a window that produces single-step (input, label) pairs for this model, as shown below.
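A window with one input time step, one label time step, and a shift of one step into the future does this. The single_step_window below, which the baseline evaluation further down relies on, follows the official TensorFlow tutorial:
single_step_window = WindowGenerator(
    input_width=1, label_width=1, shift=1,
    label_columns=['T (degC)'])
single_step_window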
Create a Baseline class, which simply returns the current value as its prediction, to compare your model outputs against:
class Baseline(tf.keras.Model):
    def __init__(self, label_index=None):
        super().__init__()
        self.label_index = label_index

    def call(self, inputs):
        # With no label index, return all inputs unchanged as the prediction.
        if self.label_index is None:
            return inputs
        result = inputs[:, :, self.label_index]
        return result[:, :, tf.newaxis]
Compile and evaluate the baseline model:
baseline = Baseline(label_index=column_indices['T (degC)'])

baseline.compile(loss=tf.losses.MeanSquaredError(),
                 metrics=[tf.metrics.MeanAbsoluteError()])

val_performance = {}
performance = {}
val_performance['Baseline'] = baseline.evaluate(single_step_window.val)
performance['Baseline'] = baseline.evaluate(single_step_window.test, verbose=0)
Let's create a wider WindowGenerator that generates windows of 24 hours of consecutive inputs and labels:
wide_window = WindowGenerator(
    input_width=24, label_width=24, shift=1,
    label_columns=['T (degC)'])
wide_window
print('Input shape:', wide_window.example[0].shape)
print('Output shape:', baseline(wide_window.example[0]).shape)
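The wide_window.plot(baseline) call below relies on a plot helper that is not defined earlier in this article. Here is a sketch of it, closely following the WindowGenerator.plot method from the official TensorFlow tutorial:
def plot(self, model=None, plot_col='T (degC)', max_subplots=3):
    inputs, labels = self.example
    plt.figure(figsize=(12, 8))
    plot_col_index = self.column_indices[plot_col]
    max_n = min(max_subplots, len(inputs))
    for n in range(max_n):
        plt.subplot(max_n, 1, n + 1)
        plt.ylabel(f'{plot_col} [normed]')
        # Inputs as a blue line.
        plt.plot(self.input_indices, inputs[n, :, plot_col_index],
                 label='Inputs', marker='.', zorder=-10)

        if self.label_columns:
            label_col_index = self.label_columns_indices.get(plot_col, None)
        else:
            label_col_index = plot_col_index

        if label_col_index is None:
            continue

        # True label values as green dots.
        plt.scatter(self.label_indices, labels[n, :, label_col_index],
                    edgecolors='k', label='Labels', c='#2ca02c', s=64)
        if model is not None:
            # Model predictions as orange crosses.
            predictions = model(inputs)
            plt.scatter(self.label_indices, predictions[n, :, label_col_index],
                        marker='X', edgecolors='k', label='Predictions',
                        c='#ff7f0e', s=64)

        if n == 0:
            plt.legend()

    plt.xlabel('Time [h]')

WindowGenerator.plot = plot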
Plot the baseline model's forecasts:
wide_window.plot(baseline)
- The blue “inputs” line shows the input temperature at each time step.
- Green "Labels" dots show the target (actual) values the model should predict.
- Orange "Predictions" crosses are the model's outputs for those time steps.
Conclusion
We discussed time series, time series analysis, the components of a time series, and a code example of time series forecasting on a weather dataset with a single-step model, whose results were reasonably close to the actual values. There are many other models you can use for time series forecasting, such as a linear model (a layers.Dense layer with no activation is called a linear model), dense, multi-step dense, convolutional neural networks, and recurrent neural networks.
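For instance, here is a minimal sketch of that linear model (a single Dense layer with no activation) trained on the same single_step_window; the Adam optimizer and the 20 epochs are only illustrative defaults, not tuned values from this article:
# A "linear" single-step model: one Dense layer with no activation.
linear = tf.keras.Sequential([
    tf.keras.layers.Dense(units=1)
])

linear.compile(loss=tf.losses.MeanSquaredError(),
               optimizer=tf.optimizers.Adam(),
               metrics=[tf.metrics.MeanAbsoluteError()])

history = linear.fit(single_step_window.train, epochs=20,
                     validation_data=single_step_window.val)

val_performance['Linear'] = linear.evaluate(single_step_window.val)
performance['Linear'] = linear.evaluate(single_step_window.test, verbose=0)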
We couldn't cover the whole tutorial here, since a fully demonstrated explanation is not possible in a single article. Please refer to the official TensorFlow website here now that you have a basic understanding of what time series forecasting is all about! An extended version of the code is available here.