Time series data is data recorded over time. A company's sales figures do not stay the same every year: sometimes they are higher than the previous year and sometimes lower. Similarly, stock prices change constantly.
Time series are hard to predict because they depend on many factors, but Python offers a range of statistical and machine learning models that can be used to analyze and forecast them.
PyFlux is a library for time series analysis and prediction. We can choose from a flexible range of modeling and inference options and use the output for forecasting. PyFlux ships with many common time series models, such as ARIMA and GARCH, predefined; we only need to call the model we want to use.
In this article, we will explore PyFlux and the features that are present in PyFlux for time series analysis and prediction.
Implementation
We will start by installing PyFlux with pip install pyflux.
- Importing required libraries
To explore PyFlux we will analyze stock data, which we will download from ‘Yahoo’ using Pandas DataReader and the ticker of the stock we are interested in. Let us import the required libraries.
from datetime import datetime
import numpy as np
import pandas as pd
from pandas_datareader.data import DataReader
import matplotlib.pyplot as plt
import pyflux as pf
- Downloading the data
We will be using Microsoft stock data for this article. We can download it from Yahoo using Pandas DataReader. The stock symbol for Microsoft is MSFT.
msft = DataReader('MSFT', 'yahoo', datetime(2000,6,1), datetime(2020,6,1))
msft.head()
- Calculating the Stock Returns
We need to calculate the log return of the share, that is, the difference between consecutive log prices, and store it in a data frame named returns. This data frame will contain a single ‘Returns’ column, indexed by date.
# Finding the returns
returns = pd.DataFrame(np.diff(np.log(msft['Adj Close'].values)))
# Setting the dates as index (the first date has no return, so it is dropped)
returns.index = msft.index[1:]
returns.columns = ["Returns"]
returns.head()
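To make the return calculation concrete, here is a minimal sketch on a made-up price list. The prices and the helper name log_returns are illustrative assumptions, not part of the Microsoft data. It uses only the standard library and computes the same quantity as np.diff(np.log(...)): the difference between consecutive log prices.

```python
import math

def log_returns(prices):
    """Return log(p[t] / p[t-1]) for each consecutive pair of prices."""
    return [math.log(curr / prev) for prev, curr in zip(prices, prices[1:])]

# Hypothetical adjusted closing prices, for illustration only
prices = [100.0, 102.0, 101.0, 105.0]
rets = log_returns(prices)

# There is one fewer return than there are prices, which is why
# the returns data frame is indexed from the second date onwards.
print(len(prices), len(rets))

# Log returns add up over time: their sum is the log of the total price change.
print(sum(rets), math.log(prices[-1] / prices[0]))
```

The additive property shown in the last line is one common reason for working with log returns rather than simple percentage changes.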
- Visualizing the Data
We will plot the returns using the Matplotlib visualization library.
plt.figure(figsize=(15, 5))
plt.ylabel("Returns")
plt.plot(returns)
plt.show()
Similarly, we will use PyFlux to visualize the ACF (autocorrelation function) plot.
pf.acf_plot(returns.values.T[0])
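As a rough illustration of what an autocorrelation plot displays, the sketch below computes the textbook sample autocorrelation for a few lags. The function name sample_acf and the toy series are assumptions made for this example; PyFlux's own plot may normalize or handle details differently.

```python
def sample_acf(series, max_lag):
    """Textbook sample autocorrelation for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    # Lag-0 autocovariance (times n), used to normalize every lag
    denom = sum(c * c for c in centered)
    acf = []
    for lag in range(max_lag + 1):
        num = sum(centered[t] * centered[t - lag] for t in range(lag, n))
        acf.append(num / denom)
    return acf

# Hypothetical alternating series: each value is the opposite of the previous one
series = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
acf = sample_acf(series, max_lag=2)
print(acf)
```

For this toy series the lag-1 value is strongly negative and the lag-2 value strongly positive, reflecting the alternating pattern. For stock returns the values at non-zero lags are usually much closer to zero.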
- Return analysis using different Models
Now we will create and analyze different models and predict returns accordingly. PyFlux supports different models but we will mainly focus on GARCH and ARIMA.
- GARCH Model
Generalized Autoregressive Conditional Heteroskedasticity (GARCH) is a model widely used for financial data. It treats the variance of a series as changing over time and is typically used to estimate the volatility of quantities such as stock returns.
We will start by creating a GARCH model which is predefined in PyFlux.
gar_model = pf.GARCH(p=1, q=1, data=returns)
The above statement defines our model, where ‘p’ is the number of autoregressive lags (lagged conditional variance terms) and ‘q’ is the number of ARCH terms (lagged squared shocks).
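To show what a model with one autoregressive lag and one ARCH term does, here is a minimal sketch of the standard GARCH(1,1) variance recursion, sigma2[t] = omega + alpha * r[t-1]^2 + beta * sigma2[t-1]. The parameter values, the toy returns and the function name garch11_variance are assumptions chosen for illustration; they are not estimates from the Microsoft data, and the sketch assumes zero-mean returns and alpha + beta < 1.

```python
def garch11_variance(returns, omega, alpha, beta):
    """Conditional variance path of a zero-mean GARCH(1,1) model."""
    # Start from the long-run (unconditional) variance; requires alpha + beta < 1
    sigma2 = [omega / (1.0 - alpha - beta)]
    for r in returns[:-1]:
        # alpha weights the last squared shock (ARCH term),
        # beta weights the last conditional variance (autoregressive lag)
        sigma2.append(omega + alpha * r ** 2 + beta * sigma2[-1])
    return sigma2

# Hypothetical daily returns and parameters, for illustration only
toy_returns = [0.01, -0.02, 0.05, 0.0]
sigma2 = garch11_variance(toy_returns, omega=1e-5, alpha=0.1, beta=0.8)
print(sigma2)
```

The variance estimate rises after the large 0.05 return, which is the volatility clustering behaviour that GARCH is designed to capture.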
Now we will fit the model and use the summary function to display the summary of the Model.
gar = gar_model.fit()
gar.summary()
The next step is visualizing the fit over a chart.
gar_model.plot_fit(figsize=(15,5))
Here we can compare the model's estimate with the actual data and see how well it picks up the volatility in the returns.
The next step is visualizing the model's prediction. Here we pass the ‘h’ parameter, which defines how many steps ahead to forecast.
gar_model.plot_predict(h=20, figsize=(15,5))
The resulting plot shows the volatility forecast produced by the GARCH model.
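To give some intuition for a multi-step forecast, the following sketch iterates the standard GARCH(1,1) expected-variance formula h steps ahead. It is not PyFlux's implementation; the function name garch11_forecast and all numbers are assumptions for illustration, with alpha + beta < 1.

```python
def garch11_forecast(last_sigma2, last_return, omega, alpha, beta, h):
    """Expected conditional variance for steps 1..h ahead under GARCH(1,1)."""
    # Step 1 uses the last observed return and variance directly
    forecasts = [omega + alpha * last_return ** 2 + beta * last_sigma2]
    # Beyond step 1 the future squared return is unknown, so it is replaced
    # by its expectation, which equals the forecast variance itself
    for _ in range(h - 1):
        forecasts.append(omega + (alpha + beta) * forecasts[-1])
    return forecasts

# Hypothetical starting point after a volatile day
forecast = garch11_forecast(last_sigma2=4e-4, last_return=0.03,
                            omega=1e-5, alpha=0.1, beta=0.8, h=20)
long_run = 1e-5 / (1.0 - 0.1 - 0.8)
print(forecast[0], forecast[-1], long_run)
```

With these numbers the forecast starts above the long-run variance and decays towards it as the horizon grows, which is the typical shape of a GARCH volatility forecast.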
- ARIMA Model
ARIMA stands for AutoRegressive Integrated Moving Average. It is a class of models that predicts future values of a time series from its own past values and past forecast errors. It is predefined in PyFlux, so we only need to call it.
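The idea can be sketched with a one-step-ahead prediction that combines recent values (the autoregressive part) with recent forecast errors (the moving average part). The coefficients, the toy history and the function name arma_one_step are assumptions for illustration; a fitted model would estimate the coefficients from data. No differencing step is shown, since the returns are already differences of log prices.

```python
def arma_one_step(y_hist, eps_hist, const, ar_coefs, ma_coefs):
    """One-step-ahead ARMA prediction from the most recent values and errors."""
    # ar_coefs[0] multiplies the latest value, ar_coefs[1] the one before, etc.
    ar_part = sum(phi * y_hist[-(i + 1)] for i, phi in enumerate(ar_coefs))
    # ma_coefs[0] multiplies the latest forecast error, and so on
    ma_part = sum(theta * eps_hist[-(i + 1)] for i, theta in enumerate(ma_coefs))
    return const + ar_part + ma_part

# Hypothetical recent returns and forecast errors (oldest first)
y_hist = [0.01, -0.02]
eps_hist = [0.005, -0.01]
pred = arma_one_step(y_hist, eps_hist, const=0.0,
                     ar_coefs=[0.5, 0.2], ma_coefs=[0.3])
print(pred)
```

Here the sketch uses two autoregressive lags and one moving average lag; the model in this article uses four of each.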
Let us create the ARIMA model by defining the number of autoregressive lags (ar) and moving average lags (ma). The family is the distribution assumed for the time series, for which we will use pf.Normal(). We are trying to predict ‘Returns’, so our target is ‘Returns’.
arm_model = pf.ARIMA(data=returns, ar=4, ma=4, target='Returns', family=pf.Normal())
As with the GARCH model, we will fit this model to our data and inspect the result using the summary function. The argument passed to fit selects the inference method, which can be, for example, ‘M-H’ or ‘MLE’; we will use ‘MLE’.
arm = arm_model.fit("MLE")
arm.summary()
Now, following the same steps as for the GARCH model, we will plot the fit and the predicted values.
arm_model.plot_fit(figsize=(15,8))
We will forecast 20 steps ahead (h=20) and show the last 50 observations alongside the forecast (past_values=50).
arm_model.plot_predict(h=20,past_values=50,figsize=(15,5))
Here we can clearly analyze the forecasting of the returns on the Microsoft Stock using the ARIMA Model defined under PyFlux.
Conclusion:
In this article, we have learned about PyFlux, an open-source Python library for time series prediction. We saw how PyFlux makes it easier to select different models and analyze the results they produce. Here we have discussed the GARCH and ARIMA models; PyFlux also contains a variety of other models that we can use for time series analysis and prediction.