# A Beginner’s Guide to Time Series Modelling Using PyCaret

When it comes to determining whether a business will succeed or fail, time is the most important factor. Building a robust forecasting model from scratch takes time: pre-processing, trend normalization and, most importantly, cross-checking all the available algorithms. A variety of AutoML tools on the market let us model raw data with just a few lines of code, which saves that time. In this article, we concentrate on one of them, PyCaret. Time series modelling needs special treatment because of the components a series carries, such as trend and seasonality, and PyCaret provides for this. The main points discussed in this article are listed in the table of contents below.

1. What is Time Series Modelling?
2. All About PyCaret
3. Quick Start Modelling with PyCaret

Let’s start the discussion by understanding what time series modelling is.

#### What is Time Series Modelling?

A time series is a quantity that is measured sequentially over time. In its broadest sense, time series analysis attempts to forecast what will happen in the future by inferring patterns from what happened to a set of data points in the past. We will, however, approach time series quantitatively by assuming that they are realizations of sequences of random variables.

To put it another way, we will assume that our time series are generated by an underlying generating process based on one or more statistical distributions from which these variables are drawn. The goal of time series analysis is to learn about the past while also predicting the future.

Time series data is naturally organized chronologically. Time series analysis differs from cross-sectional research in which the observations are not naturally ordered (for example, explaining people’s earnings by reference to their educational degrees, where the individuals’ data might be input in any order).

Time series analysis also differs from spatial data analysis, in which the observations are typically linked to specific physical places (e.g. accounting for house prices by location as well as by the intrinsic characteristics of the houses).

In general, a stochastic model for a time series will represent the fact that observations near in time are more tightly related than ones further away. Furthermore, time series models commonly use the inherent one-way ordering of time to express values for a given period as derived from past rather than future values.
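The idea that nearby observations are more closely related than distant ones can be checked numerically. The following is a minimal sketch on a synthetic series, not on the dataset used later in this article: the autoregressive coefficient `phi`, the series length, and the helper `autocorr` are all assumptions chosen for illustration.

```python
import random


def autocorr(series, lag):
    """Sample autocorrelation of `series` at a positive lag."""
    n = len(series)
    mean = sum(series) / n
    variance = sum((x - mean) ** 2 for x in series)
    covariance = sum(
        (series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n)
    )
    return covariance / variance


# Synthetic AR(1) process: each value depends on the previous one plus noise.
random.seed(0)
phi = 0.8  # assumed strength of the dependence on the previous value
series = [0.0]
for _ in range(1999):
    series.append(phi * series[-1] + random.gauss(0, 1))

lag1 = autocorr(series, 1)    # neighbouring observations
lag30 = autocorr(series, 30)  # observations 30 steps apart

print(f"lag 1 autocorrelation:  {lag1:.2f}")
print(f"lag 30 autocorrelation: {lag30:.2f}")
```

On a series like this, the lag 1 value should come out clearly larger than the lag 30 value, which is the behaviour a stochastic time series model is meant to capture.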

#### All About PyCaret

PyCaret is a low-code machine learning library and end-to-end model management solution for automating machine learning workflows that are created in Python. It is quite popular due to its ease of use, simplicity, and ability to quickly and efficiently construct and deploy end-to-end machine learning prototypes.

PyCaret is a low-code alternative that can replace hundreds of lines of code with just a few. As a result, the experiment cycle becomes much faster and more efficient.

PyCaret is simple and easy to use. All of PyCaret’s operations are saved in a Pipeline, which is fully automated and ready for deployment. PyCaret automates everything from missing-value imputation and one-hot encoding to categorical data transformation, feature engineering, and hyperparameter tuning.

#### Quick Start Modelling with PyCaret

In this section, we will leverage the power of PyCaret to model time series data. The dataset contains climate parameters such as temperature, humidity, wind speed, and atmospheric pressure for the city of Delhi. The observations were recorded from 2013 to 2017, and the data is taken from this Kaggle repository.

To work with PyCaret, you need to install it with a simple pip command such as `!pip install pycaret[full]`, which installs the package together with its optional dependencies. Now let’s take a look at the dataset.

```python
import pandas as pd
data = pd.read_csv('/content/DailyDelhiClimateTrain (2).csv')
data
```

Time series forecasting can be framed as a regression problem, so we can use PyCaret’s regression module to deal with it.

The PyCaret Regression Module is a supervised machine learning module that estimates the relationship between one or more independent variables and a dependent variable (also known as the “outcome variable”, or “target”). Regression is used to predict continuous values such as sales amount, quantity, temperature, and customer count. The setup function in PyCaret provides a number of pre-processing options for preparing data for modelling across all modules.

The default settings of PyCaret’s Regression module are not ideal for time series data because they involve a few data preparation steps that are not valid for ordered data (data with a sequence such as time series data).

Time series data needs a different type of cross-validation because the order of dates must be preserved. When evaluating models, the PyCaret regression module uses random k-fold cross-validation by default, which is not suitable for time series data. In addition, because algorithms cannot deal with dates directly, we extract some simple features from the dates, such as month and year, and remove the original date column.
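The difference between random k-fold and order-preserving validation is easier to see on a toy index. The sketch below is a plain-Python illustration of expanding-window splitting, the kind of scheme that scikit-learn's `TimeSeriesSplit` implements. The helper name `expanding_window_splits` and the 12-sample series are assumptions made for illustration; this is not PyCaret's internal code.

```python
def expanding_window_splits(n_samples, n_splits):
    """Yield (train_indices, test_indices) pairs that respect time order.

    Each fold trains on everything before its test block, so the model
    is never validated on data older than what it was trained on.
    """
    test_size = n_samples // (n_splits + 1)
    first_test_start = n_samples - n_splits * test_size
    for test_start in range(first_test_start, n_samples, test_size):
        train_idx = list(range(0, test_start))
        test_idx = list(range(test_start, test_start + test_size))
        yield train_idx, test_idx


# Hypothetical series of 12 ordered observations, split into 3 folds.
splits = list(expanding_window_splits(n_samples=12, n_splits=3))
for fold, (train_idx, test_idx) in enumerate(splits, start=1):
    print(f"fold {fold}: train={train_idx} test={test_idx}")
```

Every test block lies strictly after its training block, and the training window grows from fold to fold. A random k-fold split gives no such guarantee, which is why it can leak future information into training.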

Here we use only one variable from our data, the mean temperature. To predict with the other variables as inputs, we would also need to supply realistic future values for them, and those are hard to simulate. The approach can still be extended to multiple features in the usual way by holding out a test split from the data, but make sure you do not shuffle the data, because this is a time series problem.

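```python
# Parse the date strings so that .month, .year, etc. are available
data['date'] = pd.to_datetime(data['date'])
data['month'] = [i.month for i in data['date']]
data['year'] = [i.year for i in data['date']]
data['day_of_week'] = [i.dayofweek for i in data['date']]
data['day_of_year'] = [i.dayofyear for i in data['date']]
# Keep only the target and the date-derived features, dropping the date column
data = data[['meantemp', 'month', 'year', 'day_of_week', 'day_of_year']]
```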

Let’s split the data into training and test sets.

```python
train = data[data['year'] < 2016]
test = data[data['year'] >= 2016]
```

Now we will explicitly pass the training data, the test data, and the cross-validation strategy (through the `fold_strategy` parameter) to the setup function.

```python
from pycaret.regression import *
# initialize setup; the target is passed as a column name (string)
Setup_ = setup(data = train, test_data = test, target = 'meantemp',
               fold_strategy = 'timeseries',
               numeric_features = ['year','month','day_of_week','day_of_year'],
               fold = 3, transform_target = True, session_id = 123)
```

Now let’s train and validate all the available models:

`best = compare_models(sort = 'MAE')`

Like other AutoML tools, PyCaret provides a detailed report of all the models. We have sorted the results by Mean Absolute Error (MAE), and on that basis the Gradient Boosting Regressor outperforms the rest, not only on MAE but on all the other metrics as well.
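For readers unfamiliar with the metric used for sorting, MAE is the average of the absolute differences between actual and predicted values. The following is a small worked sketch; the function is written by hand for illustration, and the temperature values are hypothetical rather than taken from the dataset or from PyCaret's report.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical daily mean temperatures and model predictions.
actual = [10.0, 12.0, 15.0, 11.0]
predicted = [11.0, 12.0, 13.0, 12.0]

# Absolute errors are 1, 0, 2 and 1, so the MAE is 4 / 4 = 1.0.
mae = mean_absolute_error(actual, predicted)
print(mae)
```

Because MAE is expressed in the same unit as the target, a value of 1.0 here would mean the predictions are off by one degree on average.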

Technically speaking, we have trained these models only on features derived from the date. In order to make a future prediction, we need to generate the same four attributes for the future dates.

As you have seen, the last observation in our data is on 2017-01-01. We are now going to create future instances up to 2019-01-01, which is nearly two years ahead. We expect the model to predict the same pattern it has seen previously.

```python
future_dates = pd.date_range(start = '2017-01-02', end = '2019-01-01', freq = 'D')
future_df = pd.DataFrame()
future_df['month'] = [i.month for i in future_dates]
future_df['year'] = [i.year for i in future_dates]
future_df['day_of_week'] = [i.dayofweek for i in future_dates]
future_df['day_of_year'] = [i.dayofyear for i in future_dates]
```

Now let’s finalize the model and make the predictions.

```python
final_best = finalize_model(best)
predictions_future = predict_model(final_best, data=future_df)
```

Now we will visualize the result using Plotly Express. Plotly is an open-source library that is widely used to create attractive and insightful visualizations. To know more about Plotly visualization, you can refer to this article.

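```python
import plotly.express as px

# Stack the historical data on top of the future predictions
concat_df = pd.concat([data,predictions_future], axis=0)
concat_df_i = pd.date_range(start='2013-01-01', end = '2019-01-01', freq = 'D')
concat_df.set_index(concat_df_i, inplace=True)
# 'Label' is the prediction column in the PyCaret version used here;
# other releases may name it differently
fig = px.line(concat_df, x=concat_df.index, y=["meantemp", "Label"])
fig.show()
```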
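The `set_index` call above only works if the combined frame has exactly as many rows as there are dates in the new index. A quick sanity check, sketched here with the standard library, counts the days in each range. It assumes the historical data is daily with no gaps from 2013-01-01 to 2017-01-01, which is an assumption based on the dates quoted in this article; the helper `n_days_inclusive` is introduced only for illustration.

```python
from datetime import date


def n_days_inclusive(start, end):
    """Number of calendar days from start to end, counting both ends."""
    return (end - start).days + 1


# Assumed span of the historical data (daily, no missing days).
history_rows = n_days_inclusive(date(2013, 1, 1), date(2017, 1, 1))
# Span of the simulated future dates.
future_rows = n_days_inclusive(date(2017, 1, 2), date(2019, 1, 1))
# Span of the date index assigned to the combined frame.
index_rows = n_days_inclusive(date(2013, 1, 1), date(2019, 1, 1))

print(history_rows, future_rows, index_rows)
print(history_rows + future_rows == index_rows)
```

If the loaded data has gaps or duplicate dates, the row counts will not match and the index assignment will raise an error, so it is worth comparing `len(concat_df)` with `len(concat_df_i)` before plotting.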

What do you think of the above result? I would say it is one of the most beautiful plots I have ever seen, because the model has identified most of the major patterns in the data, especially the peaks and the downtrends.

#### Conclusion

Technically, we have done univariate modelling, but it resembles multivariate modelling because we have used four date-derived features: month, year, day of the week, and day of the year. As we discussed earlier, for multivariate modelling we can pass multiple features to the setup function through the `numeric_features` parameter.

Through this post, we have discussed what time series modelling is and seen in practice how, with a low-code approach, an AutoML tool like PyCaret can be used to build a capable forecasting model.

Vijaysinh is an enthusiast in machine learning and deep learning. He is skilled in ML algorithms, data manipulation, handling and visualization, model building.