Guide to Time Series Forecasting using TensorFlow Core

A time series is a sequence of data points recorded in time order. The data points could be an athlete’s performance over a season, a cricket player’s runs in one-day matches, monthly weather readings, or the daily closing price of a company’s stock. Time series analysis is a related term: it is concerned with taking those data points and cleaning, understanding, and forecasting them using tools or programming languages. Time series data is sometimes grouped with panel data. Panel data is the general class, a multidimensional dataset, whereas a time series dataset is a one-dimensional panel.

Now let’s talk about time series forecasting. As we already know, time series analysis is about analyzing time series data and extracting meaningful insights from it.

Time series forecasting goes one step further: it uses models to predict future values from previously observed time series data that has been cleaned and processed.

Components of Time Series

A time series has four components: trend, seasonal variation, cyclic variation, and random or irregular movement. Seasonal and cyclic variations are both periodic, and seasonal changes play out over the shorter time span.

Image source: https://www.toppr.com/guides/business-mathematics-and-statistics/time-series-analysis/components-of-time-series/
  1. Trend: the long-term tendency of the data to move higher or lower.
  2. Periodic fluctuations: patterns that repeat over time. They are of two types:
     1. Seasonal variations: fluctuations that repeat over a regular period of less than a year.
     2. Cyclic variations: fluctuations that repeat over a cycle longer than one year.
  3. Random movement, or noise: the data points are unpredictable, and it is hard to forecast this kind of data because no pattern is easy to find.

Real-world data, before cleaning, usually contains some mix of noise, trend, and seasonality.
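
To make the components concrete, here is a minimal sketch that builds a synthetic hourly series as the sum of a trend, a seasonal cycle, and random noise. The function name and every parameter value (slope, period, amplitude, noise level) are illustrative assumptions, not values taken from the weather dataset used later.

```python
import math
import random


def synthetic_series(n_steps, slope=0.05, period=24, amplitude=2.0,
                     noise_std=0.3, seed=0):
    """Return (series, trend, seasonal, noise); all parameters are illustrative."""
    rng = random.Random(seed)
    # Trend: a steady upward drift.
    trend = [slope * t for t in range(n_steps)]
    # Seasonal variation: a sine wave that repeats every `period` steps.
    seasonal = [amplitude * math.sin(2 * math.pi * t / period)
                for t in range(n_steps)]
    # Random movement: Gaussian noise with no pattern to learn.
    noise = [rng.gauss(0.0, noise_std) for _ in range(n_steps)]
    series = [a + b + c for a, b, c in zip(trend, seasonal, noise)]
    return series, trend, seasonal, noise


series, trend, seasonal, noise = synthetic_series(72)
```

Plotting `series` next to its three parts shows why cleaning matters: the noise hides a pattern that is easy to see in `trend` and `seasonal` on their own.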

TensorFlow models for forecasting

Time series forecasting, or predictive modeling, can be done with any framework. TensorFlow offers several styles of model for it, such as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs). You can forecast a single time step from a single feature, or you can forecast multiple steps and make all the predictions at once with a single-shot model.

Setup

Import the modules below to get started. They help with modelling, visualization, file handling, and data exploration.

import os
import datetime
import IPython
import IPython.display
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
import tensorflow as tf
mpl.rcParams['figure.figsize'] = (8, 6)
mpl.rcParams['axes.grid'] = False

Let’s take the weather dataset from the Max Planck Institute for Biogeochemistry. It contains 14 different features, such as air temperature, humidity, and atmospheric pressure, collected every 10 minutes beginning in 2003. The file used here covers 2009 to 2016. Let’s explore the dataset:

# Download the zip file of the dataset
file_path = tf.keras.utils.get_file(
    origin='https://storage.googleapis.com/tensorflow/tf-keras-datasets/jena_climate_2009_2016.csv.zip',
    fname='jena_climate_2009_2016.csv.zip',
    extract=True)
csv_path, _ = os.path.splitext(file_path)
# Explore the dataset
df = pd.read_csv(csv_path)
# Keep one reading per hour: start at index 5 and take every 6th 10-minute row
df = df[5::6]
date_time = pd.to_datetime(df.pop('Date Time'), format='%d.%m.%Y %H:%M:%S')
df.head()
#Data visualization over the years with some features
plot_cols = ['T (degC)', 'p (mbar)', 'rho (g/m**3)']
plot_features = df[plot_cols]
plot_features.index = date_time
_ = plot_features.plot(subplots=True)
plot_features = df[plot_cols][:480]
plot_features.index = date_time[:480]
_ = plot_features.plot(subplots=True)
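
The `[5::6]` slice in the loading code is what turns 10-minute readings into hourly ones. The sketch below illustrates this on made-up timestamps; it assumes, for illustration only, that the first reading is stamped 00:10, so the row at index 5 falls exactly on the hour.

```python
from datetime import datetime, timedelta

# Hypothetical 10-minute timestamps; the 00:10 start is an assumption.
start = datetime(2009, 1, 1, 0, 10)
stamps = [start + timedelta(minutes=10 * i) for i in range(36)]

# Start at index 5 (the 01:00 reading) and keep every 6th row after it.
hourly = stamps[5::6]

for stamp in hourly:
    print(stamp.strftime('%d.%m.%Y %H:%M:%S'))
```

Under that assumption every kept row lands on a full hour, which is why the later plots can label the x axis in hours.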

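Let’s clean the data for better modelling and visualization. The value -9999.0 in the wind velocity columns is treated as erroneous and replaced with zero, and the wind direction and velocity are then combined into a wind vector: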

wv = df['wv (m/s)']
bad_wv = wv == -9999.0
wv[bad_wv] = 0.0
max_wv = df['max. wv (m/s)']
bad_max_wv = max_wv == -9999.0
max_wv[bad_max_wv] = 0.0
df['wv (m/s)'].min()
# Convert the wind direction and velocity columns into a wind vector
wv = df.pop('wv (m/s)')
max_wv = df.pop('max. wv (m/s)')
# Convert degrees to radians
wd_rad = df.pop('wd (deg)')*np.pi / 180
# Wind x and y components
df['Wx'] = wv*np.cos(wd_rad)
df['Wy'] = wv*np.sin(wd_rad)
# Max wind x and y components
df['max Wx'] = max_wv*np.cos(wd_rad)
df['max Wy'] = max_wv*np.sin(wd_rad)
# Plot the distribution of the wind vectors
plt.hist2d(df['Wx'], df['Wy'], bins=(50, 50), vmax=400)
plt.colorbar()
plt.xlabel('Wind X [m/s]')
plt.ylabel('Wind Y [m/s]')
ax = plt.gca()
ax.axis('tight')
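
One likely reason for converting direction and speed into x and y components is that raw angles wrap around: 359° and 1° are almost the same direction but look far apart as numbers. The sketch below, with a hypothetical helper name and made-up readings, shows that the vector form keeps such readings close together.

```python
import math


def wind_to_vector(speed, direction_deg):
    """Convert a speed and a direction in degrees to (x, y) components."""
    rad = math.radians(direction_deg)
    return speed * math.cos(rad), speed * math.sin(rad)


# Two hypothetical readings with nearly identical directions.
x1, y1 = wind_to_vector(5.0, 359.0)
x2, y2 = wind_to_vector(5.0, 1.0)

# Distance between the two vectors is small, unlike the 358-degree gap.
gap = math.hypot(x1 - x2, y1 - y2)
print(round(gap, 3))
```

A calm reading with zero speed maps to the origin whatever its direction, which is another property the raw angle column does not have.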

Next, convert the date-time values to seconds and turn them into periodic sine and cosine signals for the time of day and the time of year:

timestamp_s = date_time.map(datetime.datetime.timestamp)
day = 24*60*60
year = (365.2425)*day
df['Day sin'] = np.sin(timestamp_s * (2 * np.pi / day))
df['Day cos'] = np.cos(timestamp_s * (2 * np.pi / day))
df['Year sin'] = np.sin(timestamp_s * (2 * np.pi / year))
df['Year cos'] = np.cos(timestamp_s * (2 * np.pi / year))
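
The sine and cosine pair is useful because it places each time of day on a circle, so times on either side of midnight end up next to each other. This is a small sketch with hypothetical helper names that compares 23:00, 01:00, and 12:00.

```python
import math

DAY = 24 * 60 * 60  # seconds per day


def time_of_day_signal(seconds):
    """Map a time in seconds to a (sin, cos) point on the unit circle."""
    angle = 2 * math.pi * (seconds % DAY) / DAY
    return math.sin(angle), math.cos(angle)


def distance(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])


late = time_of_day_signal(23 * 3600)   # 23:00
early = time_of_day_signal(1 * 3600)   # 01:00
noon = time_of_day_signal(12 * 3600)   # 12:00

# 23:00 is closer to 01:00 than to noon in this encoding.
print(distance(late, early) < distance(late, noon))
```

With a plain hour number, 23 and 1 would look 22 hours apart; in the encoded form they are only two hours apart on the circle.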

Plot the sine and cosine time-of-day signals:

plt.plot(np.array(df['Day sin'])[:25])
plt.plot(np.array(df['Day cos'])[:25])
plt.xlabel('Time [h]')
plt.title('Time of day signal')

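Split the data into training (70%), validation (20%), and test (10%) sets for time series forecasting. The data is not shuffled before splitting, so each set covers a consecutive stretch of time: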

column_indices = {name: i for i, name in enumerate(df.columns)}
n = len(df)
train_df = df[0:int(n*0.7)]
val_df = df[int(n*0.7):int(n*0.9)]
test_df = df[int(n*0.9):]
num_features = df.shape[1]
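
The same slicing can be tried on a toy list to check that the three parts stay in time order and do not overlap. The function name and the 100-row input are illustrative assumptions.

```python
def chronological_split(rows, train_end_frac=0.7, val_end_frac=0.9):
    """Split rows in order: first 70% train, next 20% validation, last 10% test."""
    n = len(rows)
    train_end = int(n * train_end_frac)
    val_end = int(n * val_end_frac)
    return rows[:train_end], rows[train_end:val_end], rows[val_end:]


rows = list(range(100))  # stand-in for 100 consecutive hourly readings
train_rows, val_rows, test_rows = chronological_split(rows)
print(len(train_rows), len(val_rows), len(test_rows))
```

Because nothing is shuffled, every validation row is later in time than every training row, so the evaluation is done on data from after the period the model was trained on.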

Data normalization is a crucial step before training a neural network. Here we subtract the mean and divide by the standard deviation of each feature, using statistics computed on the training data only:

train_mean = train_df.mean()
train_std = train_df.std()

train_df = (train_df - train_mean) / train_std
val_df = (val_df - train_mean) / train_std
test_df = (test_df - train_mean) / train_std
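
The code above reuses the training mean and standard deviation for the validation and test sets. A minimal sketch of that idea on made-up numbers, using only the standard library (the helper names are hypothetical):

```python
import statistics


def fit_standardizer(train_values):
    """Compute the mean and sample standard deviation from training data only."""
    mean = statistics.fmean(train_values)
    std = statistics.stdev(train_values)  # sample std, like the pandas default
    return mean, std


def standardize(values, mean, std):
    return [(v - mean) / std for v in values]


train_values = [10.0, 12.0, 14.0, 16.0, 18.0]  # made-up training readings
val_values = [20.0, 22.0]                       # made-up later readings

mean, std = fit_standardizer(train_values)
train_scaled = standardize(train_values, mean, std)
val_scaled = standardize(val_values, mean, std)
```

The scaled training values have mean 0 and standard deviation 1, while the validation values are allowed to fall outside that range. Fitting the statistics on the training split alone keeps information from the later splits out of the model’s inputs.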

Let’s draw a violin plot of all the features to check whether any of them is skewed:

df_std = (df - train_mean) / train_std
df_std = df_std.melt(var_name='Column', value_name='Normalized')
plt.figure(figsize=(12, 6))
ax = sns.violinplot(x='Column', y='Normalized', data=df_std)
_ = ax.set_xticklabels(df.keys(), rotation=90)

Data Windowing

In TensorFlow, the input dataframe has to be split into windows of consecutive samples so that the same data can be fed to several models and we can see which one forecasts better. The rest of this section defines a WindowGenerator class that contains all the logic for the input and label indices.

The class also handles the indexes and offsets, splits windows of features into (features, labels) pairs, and plots the content of the resulting windows. It generates batches of these windows from the training, validation, and test data using tf.data.Dataset.

class WindowGenerator():
  def __init__(self, input_width, label_width, shift,
               train_df=train_df, val_df=val_df, test_df=test_df,
               label_columns=None):
    # Store the raw data.
    self.train_df = train_df
    self.val_df = val_df
    self.test_df = test_df

    # Work out the label column indices.
    self.label_columns = label_columns
    if label_columns is not None:
      self.label_columns_indices = {name: i for i, name in
                                    enumerate(label_columns)}
    self.column_indices = {name: i for i, name in
                           enumerate(train_df.columns)}

    # Work out the window parameters.
    self.input_width = input_width
    self.label_width = label_width
    self.shift = shift

    self.total_window_size = input_width + shift

    self.input_slice = slice(0, input_width)
    self.input_indices = np.arange(self.total_window_size)[self.input_slice]

    self.label_start = self.total_window_size - self.label_width
    self.labels_slice = slice(self.label_start, None)
    self.label_indices = np.arange(self.total_window_size)[self.labels_slice]

  def __repr__(self):
    return '\n'.join([
        f'Total window size: {self.total_window_size}',
        f'Input indices: {self.input_indices}',
        f'Label indices: {self.label_indices}',
        f'Label column name(s): {self.label_columns}'])

With the code above you can create a window of any shape. Let’s create a demo window:

w1 = WindowGenerator(input_width=6, label_width=1, shift=1,
                     label_columns=['T (degC)'])
w1
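
The index arithmetic inside `__init__` can be checked without TensorFlow. The sketch below repeats the same calculation with plain lists; the function name is hypothetical.

```python
def window_indices(input_width, label_width, shift):
    """Mirror the index logic of WindowGenerator.__init__ with plain lists."""
    total_window_size = input_width + shift
    all_indices = list(range(total_window_size))
    input_indices = all_indices[:input_width]
    label_start = total_window_size - label_width
    label_indices = all_indices[label_start:]
    return total_window_size, input_indices, label_indices


total, inputs, labels = window_indices(input_width=6, label_width=1, shift=1)
print(total, inputs, labels)  # prints: 7 [0, 1, 2, 3, 4, 5] [6]
```

For the demo window, six hours of history (indices 0 to 5) are used to predict the single value at index 6, one hour after the last input.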

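Next, create a make_dataset function that takes a time series dataframe and converts it into a tf.data.Dataset of (input_window, label_window) pairs. It relies on a split_window method, which separates each window into inputs and labels:

def split_window(self, features):
  # Follows the split_window method of the official TensorFlow tutorial.
  inputs = features[:, self.input_slice, :]
  labels = features[:, self.labels_slice, :]
  if self.label_columns is not None:
    labels = tf.stack([labels[:, :, self.column_indices[name]]
                       for name in self.label_columns], axis=-1)
  inputs.set_shape([None, self.input_width, None])
  labels.set_shape([None, self.label_width, None])
  return inputs, labels
WindowGenerator.split_window = split_window

def make_dataset(self, data):
  data = np.array(data, dtype=np.float32)
  ds = tf.keras.preprocessing.timeseries_dataset_from_array(
      data=data,
      targets=None,
      sequence_length=self.total_window_size,
      sequence_stride=1,
      shuffle=True,
      batch_size=32,)
  ds = ds.map(self.split_window)
  return ds
WindowGenerator.make_dataset = make_dataset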
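
Each window that make_dataset produces is separated into an inputs part and a labels part. The sketch below shows that step on one small window held in plain Python lists. It is a simplified stand-in (one window, no batching, no tensors), and the function signature and the two-feature layout are assumptions made for illustration.

```python
def split_window(window, input_width, label_width, label_col_indices=None):
    """Split one window (a list of time-step rows) into (inputs, labels)."""
    # Inputs: the first `input_width` time steps, with every feature.
    inputs = window[:input_width]
    # Labels: the last `label_width` time steps.
    labels = window[len(window) - label_width:]
    if label_col_indices is not None:
        # Keep only the label columns, e.g. the temperature.
        labels = [[row[i] for i in label_col_indices] for row in labels]
    return inputs, labels


# Hypothetical window: 7 time steps, 2 features per step (temperature, pressure).
window = [[float(t), 1000.0 + t] for t in range(7)]
inputs, labels = split_window(window, input_width=6, label_width=1,
                              label_col_indices=[0])
print(len(inputs), labels)  # prints: 6 [[6.0]]
```

The inputs keep both features for the six history steps, while the label holds only the temperature at the final step.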

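WindowGenerator now holds the training, validation, and test data. Add properties that expose them as tf.data.Dataset objects, then proceed to training:

# Dataset accessors, following the official TensorFlow tutorial.
@property
def train(self):
  return self.make_dataset(self.train_df)

@property
def val(self):
  return self.make_dataset(self.val_df)

@property
def test(self):
  return self.make_dataset(self.test_df)

@property
def example(self):
  # Cache one batch of (inputs, labels) from the training set.
  if getattr(self, '_example', None) is None:
    self._example = next(iter(self.train))
  return self._example

WindowGenerator.train, WindowGenerator.val = train, val
WindowGenerator.test, WindowGenerator.example = test, example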

Using a TensorFlow single-step model

Inputs (t=0) –> Model –> Prediction (t=1), compared with Label (t=1)
Flow chart single step models

This is the simplest kind of forecasting model: it takes the current conditions as input and returns a single predicted value, one time step (1 hour) into the future.

Since the WindowGenerator class is already set up, let’s configure it to produce single-step (input, label) pairs:

single_step_window = WindowGenerator(
    input_width=1, label_width=1, shift=1,
    label_columns=['T (degC)'])
single_step_window

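Create a baseline class to compare your models against. This one simply returns the current value of the label column as the prediction, that is, it predicts “no change”: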

class Baseline(tf.keras.Model):
  def __init__(self, label_index=None):
    super().__init__()
    self.label_index = label_index

  def call(self, inputs):
    if self.label_index is None:
      return inputs
    result = inputs[:, :, self.label_index]
    return result[:, :, tf.newaxis]

Evaluate the model:

baseline = Baseline(label_index=column_indices['T (degC)'])
baseline.compile(loss=tf.losses.MeanSquaredError(),
                 metrics=[tf.metrics.MeanAbsoluteError()])
val_performance = {}
performance = {}
val_performance['Baseline'] = baseline.evaluate(single_step_window.val)
performance['Baseline'] = baseline.evaluate(single_step_window.test, verbose=0)
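
The Baseline model returns the current temperature as the forecast for the next hour, and the evaluation reports the mean absolute error of that forecast. The same calculation can be sketched on a handful of made-up readings (the helper names and temperatures are hypothetical):

```python
def persistence_forecast(series):
    """Predict each next value as equal to the current one ("no change")."""
    return series[:-1]


def mean_absolute_error(predictions, targets):
    return sum(abs(p - t) for p, t in zip(predictions, targets)) / len(targets)


temps = [10.0, 10.5, 11.5, 11.0, 12.0]     # hypothetical hourly temperatures
predictions = persistence_forecast(temps)  # forecasts for hours 1 to 4
targets = temps[1:]                        # what actually happened
mae = mean_absolute_error(predictions, targets)
print(mae)  # prints: 0.75
```

Any trained model should reach a lower error than this on the same data, which is what makes the baseline a useful reference point.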

Let’s create a wider WindowGenerator that generates windows of 24 hours of consecutive inputs and labels at a time:

wide_window = WindowGenerator(
    input_width=24, label_width=24, shift=1,
    label_columns=['T (degC)'])

wide_window
# One prediction 1h into the future, every hour.
print('Input shape:', wide_window.example[0].shape)
print('Output shape:', baseline(wide_window.example[0]).shape)

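Plot the baseline model’s forecasts. This uses the plot method of WindowGenerator, which is defined in the official TensorFlow tutorial and not reproduced here: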

wide_window.plot(baseline)
  1. The blue “Inputs” line shows the input temperature at each time step.
  2. The green “Labels” dots show the target values, that is, the true temperatures the model should predict.
  3. The orange “Predictions” crosses show the model’s forecasts for each output time step.

Conclusion

We discussed time series, time series analysis, and the components of a time series, and walked through a code example that sets up single-step forecasting on a weather dataset and evaluates a baseline model, whose predictions came out fairly close to the labels. There are many other models you can use for time series forecasting, such as a linear model (a tf.keras.layers.Dense layer with no activation), a dense network, a multi-step dense network, a convolutional neural network, and a recurrent neural network.

One article cannot cover the whole tutorial. For a fully worked explanation, please refer to the official TensorFlow website, now that you have a basic understanding of what time series forecasting is all about. An extended version of the code is available here.

Mohit Maithani
Mohit is a Data & Technology Enthusiast with good exposure to solving real-world problems in various avenues of IT and Deep learning domain. He believes in solving human's daily problems with the help of technology.
