Guide To Pastas: A Python Framework For Hydrogeological Time Series Analysis

pastas

Introduction to Pastas

Pastas is an open-source Python framework designed for processing, simulation and analysis of hydrogeological time series models. It has built-in tools for statistically analyzing, visualizing and optimizing such models. It was introduced by Raoul A. Collenteur, Mark Bakker, Ruben Calje, Stijn A. Klop and Frans Schaars in an article published by the National Groundwater Association (NGWA) in July 2019.

Before going into Pastas’ details, let us first understand the core term ‘hydrogeology’ – an area in which the Pastas library marks its significant contribution.

What is hydrogeology?

The word ‘hydrogeology’ is a combination of two terms: ‘hydro’ meaning water and ‘geology’ which means studying the Earth. Hydrogeology is thus a branch of geology dealing with movement and distribution of groundwater in soil and rocks of Earth’s crust (also known as ‘aquifers’). You can read in-depth about this domain here

Why Pastas?

The two major objectives of this state-of-the-art library are:

  • To provide a scientific software package for the development and testing of new hydrogeological methods using a few lines of Python code
  • To provide an efficient and easy-to-handle library for groundwater practitioners.

Pastas employ Transfer Function Noise (TFN) time series models using predefined response functions that define how groundwater level responds to evaporation, rainfall etc..

Installation of Pastas

Pastas can be installed using pip or conda command (using Anaconda Distribution of Python is recommended for the installation step)

pip install pastas   -or-  conda install pastas

If the library has already been installed, it can be updated to use its latest version as:

pip install pastas --upgrade

Prerequisite: Python 3.6 or higher version

Dependencies:

  • jupyter >= 1.0.0
  • numpy >=1.16.5
  • pandas >= 1.0
  • matplotlib >= 1.0
  • scipy >= 1.1 (
  • lmfit >= 1.0.0 (for non-linear optimizations and curve fitting problems in Python)
  • numba (a JIT (Just-In-Time) compiler that converts Python code into optimized machine code)

These dependencies get automatically installed when you instal pastas using pip install manager.

Practical implementation

Here, we have shown two use cases of Pastas library. The demonstrations have been coded in Google colab with Python version 3.7.10.

Step-wise explanation of both the examples is as follows:

Time series analysis using Pastas

  1. Install pastas library

!pip install pastas

  1. Import required libraries 
 import pandas as pd #for data manipulation
 import pastas as ps
 import matplotlib.pyplot as plt  #for visualization 
  1. Check the versions of the dependencies using show_versions().

ps.show_versions()

Sample output:

 Python version: 3.7.10 (default, Feb 20 2021, 21:17:23) 
 [GCC 7.5.0]
 Numpy version: 1.19.5
 Scipy version: 1.4.1
 Pandas version: 1.1.5
 Pastas version: 0.16.0
 Matplotlib version: 3.2.2 
  1. Load head observations

The term ‘head’ here refers to the hydraulic head (also known as ‘static level’) of groundwater flow i.e. the elevation to which water will naturally rise. The file containing the head observations can be downloaded from here.

 head_data = pd.read_csv('https://raw.githubusercontent.com/pastas/
 pastas/master/examples/data/head_nb1.csv', parse_dates=['date'],  
 index_col='date', squeeze=True) 

‘parse_dates’ enables parsing of the ‘date’ column into datetime. ‘index_col’ sets the ‘date’ column as index and ‘squeeze’ parameter squeezes the head observations into a pandas Series object.

 5. Check the initial records and datatype of head_data.

print(‘Data type of head_data:’,type(head_data))

Output: Data type of head_data: <class 'pandas.core.series.Series'>

head(head_data)

Output:

 date
 1985-11-14    27.61
 1985-11-28    27.73
 1985-12-14    27.91
 1985-12-28    28.13
 1986-01-13    28.32
 Name: head, dtype: float64 

6. Plot the head observations

 head_data.plot(style='.', figsize=(12, 4))  #Size of plot
 plt.ylabel('Head (metres)');  #Y-axis label
 plt.xlabel('Year');  #X-axis label 

Output:

7. Load the stresses

The causes of head variations are believed to be rain and evaporation of water. We load the .csv files having data of measured rainfall and measured potential evaporation.

 #Load rain data
 rain = pd.read_csv('https://raw.githubusercontent.com/pastas/
 pastas/master/examples/data/rain_nb1.csv', parse_dates=['date'], index_col='date', squeeze=True)
 #Initial records
 rain.head() 

Output:

 date
 1980-01-01    0.0033
 1980-01-02    0.0025
 1980-01-03    0.0003
 1980-01-04    0.0075
 1980-01-05    0.0080
 Name: rain, dtype: float64 
 #Load evaporation data
 evap = pd.read_csv('https://raw.githubusercontent.com/pastas/
 pastas/master/examples/data/evap_nb1.csv', parse_dates=['date'], index_col='date', squeeze=True)
 #Initial records
 evap.head() 

Output:

 date
 1980-01-01    0.0002
 1980-01-02    0.0003
 1980-01-03    0.0002
 1980-01-04    0.0001
 1980-01-05    0.0001
 Name: evap, dtype: float64 

8. Plot the rain and evaporation time series

 plt.figure(figsize=(12, 4)) #Size of plot
 rain.plot(label='rain') #Title for rain plot
 evap.plot(label='evap') #Title for evaporation plot
 plt.xlabel(‘Year') #X-axis label
 plt.ylabel('Rainfall/Evaporation (m/d)') #Y-axis label
 plt.legend(loc='best'); #Location of legend 

Output:

9. Calculate and plot recharge

The term ‘groundwater recharge’ (also known as ‘deep percolation’ or ‘deep drainage’) refers to the hydraulic process in which water moves downwards from surface water to groundwater.

We approximate recharge as the difference between rain and evaporation level.

 recharge = rain - evap
 #Plot the recharge time series
 plt.figure(figsize=(12, 4)) #Size of plot
 recharge.plot()
 plt.xlabel('Year') #X-axis label
 plt.ylabel('Recharge (m/d)') #Y-axis label 

10. Time series model creation

Initiate a pastas time series model using Model class

model = ps.Model(head_data)

Create a stress model which translates input time series (here #‘recharge’) into a contribution explaining part of the output time series

stressModel = ps.StressModel(recharge, ps.Gamma, name='groundwater recharge', settings='prec')

For each stress, a StressModel object is created. The first parameter is the name of the time series of the stress. The second argument is the response function used, ‘name’ specifies just a name given to the series and ‘prec’(stands for ‘precipitation’) is the kind of series.

Add the stress model to the time series model

model.add_stressmodel(stressModel)

Solve model for specified time period (tmin to tmax).Different available solvers can be found here.

model.solve(tmin='1990', tmax='2015')

Sample output:

11. Plot the model

  • Code source
  • Google colab notebook of above implementation can be found here.

Analyze effect of pumping well using Pastas

Here, a TFN model is created first with the net groundwater recharge as the single stress explaining the observed heads.Then, the model is extended to include the effect of adding a pumping well on the heads by adding another stress model. A comparison is made between the simulated heads in presence and absence of the pumping well.The addition of the pumping well is clearly found to have improved the simulation of the heads.

(1)-(3) Perform steps (1) to (3) as explained in the above time series analysis demo first. Then, proceed as follows:

4. Load the datasets containing information about head observations, rainfall, evaporation and   pumping extraction rate available here.

 #Head observations
 head = pd.read_csv("https://raw.githubusercontent.com/pastas/
 pastas/master/examples/notebooks/data_notebook_5/head_wellex.csv",
 index_col="Date", parse_dates=True, squeeze=True)
 #Rainfall data
 rain =  pd.read_csv("https://raw.githubusercontent.com/pastas/
 pastas/master/examples/notebooks/data_notebook_5/prec_wellex.csv",
 index_col="Date", parse_dates=True)
 #Evaporation data
 evap =  pd.read_csv("https://raw.githubusercontent.com/pastas/
 pastas/master/examples/notebooks/data_notebook_5/evap_wellex.csv",
 index_col="Date", parse_dates=True)
 #Pumping well extraction rate
 well =  pd.read_csv("https://raw.githubusercontent.com/pastas/
 pastas/master/examples/notebooks/data_notebook_5/well_wellex.csv",
 index_col="Date", parse_dates=True) 
  1. Plot the four time series defined in the previous step
 fig, ax = plt.subplots(4, 1, sharex=True, figsize=(10,5));
 ax[0].plot(head, label=head.name, linestyle=" ", marker=".", markersize=2)
 ax[0].legend()
 ax[1].plot(rain, label="rain")
 ax[1].legend()
 ax[2].plot(evap, label="evap")
 ax[2].legend()
 ax[3].plot(well, label="well")
 ax[3].legend() 

Output:

  1. Pastas time series model creation

model = ps.Model(head, name="groundwater")

Add the stress model for the net groundwater recharge simulated using RechargeModel class

rechargeModel = ps.RechargeModel(rain, evap, name="groundwater recharge",rfunc=ps.Exponential, recharge=ps.rch.Linear())

‘rfunc’ is the response function used. ps.rch.Linear() is the linear model used for computing precipitation excess.

 model.add_stressmodel(rechargeModel)
 model.solve(noise=True) #Solve the model
 model.plot(figsize=(10,4)) #Plot the model 

Output plot:

  1. Add the stress model for pumping well
 stressModel = ps.StressModel(well/1e6, rfunc=ps.Gamma, name="pumping well",  
 settings="well", up=False)
 model.add_stressmodel(stressModel) 

Solve the model 

model.solve()

Plot the decomposition of the time series in various stresses

 axes = model.plots.decomposition(figsize=(10,8))
 axes[0].plot(simulation) # Add the previously simulated values to the plot 

Output plot:

  1. Residuals analysis

Compute residuals time series using residuals() method

model.residuals().plot(figsize=(20, 8))

Plot the residuals series

 residual.plot() 
 plt.legend(["With well", "Without well"]) #Legend of plot 

Output:

References

To dive deeper into the Pastas framework, refer to the following sources:

Download our Mobile App

Nikita Shiledarbaxi
A zealous learner aspiring to advance in the domain of AI/ML. Eager to grasp emerging techniques to get insights from data and hence explore realistic Data Science applications as well.

Subscribe to our newsletter

Join our editors every weekday evening as they steer you through the most significant news of the day.
Your newsletter subscriptions are subject to AIM Privacy Policy and Terms and Conditions.

Our Recent Stories

Our Upcoming Events

3 Ways to Join our Community

Telegram group

Discover special offers, top stories, upcoming events, and more.

Discord Server

Stay Connected with a larger ecosystem of data science and ML Professionals

Subscribe to our Daily newsletter

Get our daily awesome stories & videos in your inbox
MOST POPULAR

Can OpenAI Save SoftBank? 

After a tumultuous investment spree with significant losses, will SoftBank’s plans to invest in OpenAI and other AI companies provide the boost it needs?

Oracle’s Grand Multicloud Gamble

“Cloud Should be Open,” says Larry at Oracle CloudWorld 2023, Las Vegas, recollecting his discussions with Microsoft chief Satya Nadella last week. 

How Generative AI is Revolutionising Data Science Tools

How Generative AI is Revolutionising Data Science Tools

Einblick Prompt enables users to create complete data workflows using natural language, accelerating various stages of data science and analytics. Einblick has effectively combined the capabilities of a Jupyter notebook with the user-friendliness of ChatGPT.