Petroleum Exploration Can Now Be Fuelled By Machine Learning Algorithms

Geology has evolved from conventional techniques such as remote sensing, geographic information systems, relative dating and radiometric dating to include machine learning (ML) and statistical analysis. These newer technologies are especially useful for work involving imagery and natural resources, because they help oil and gas extraction companies make the best use of resources.

In this article, we explore a specific method in petroleum exploration: estimated ultimate recovery (EUR) determination for horizontal shale wells, developed by Abhishek Gaurav at Texas Standard Oil LLC, which integrates ML into the process. The goal is to improve production when extracting oil or gas from shale wells. In addition, a type-curve determination is carried out, and the production parameters are checked against the existing literature to see whether they hold good.



Estimated Ultimate Recovery (EUR) For Improving Production

The term ‘Estimated Ultimate Recovery’ (EUR) refers to an estimate of how much oil or gas can ultimately be extracted from a well or a reserve. For this work, the EUR was determined for horizontal shale wells in the Wolfcamp formation of the Permian Basin in Texas, which is approximately 1,000 feet thick and is divided into layers (called ‘benches’). The author describes the methodology as follows:

“The objective is to select only those wells that have optimized completions for each bench and potentially develop separate type-curves for each bench. The first step is determining in advance which well parameters may be important. This involved consideration of other researchers’ work and the general direction in which the industry is headed (more proppant, longer laterals, and landing depths). The second step is identifying pre-existing correlations between well parameters—for instance, ppg and fluid/ft should ideally be correlated. These pre-existing correlations deteriorate the quality of identified relationships between the performance and variables. The ranges of selected variables were evaluated for quality-control purposes.

If there were obvious outliers that could have been a result of an error in reporting, they were removed. Because of the unavailability of petrophysical-properties maps at the time of initial assessment of the assets, they were not included in this data analysis. However, once the identified pattern revealed a relationship between better production and landing depths, the local specialized shale logs were evaluated to determine the difference in petrophysical properties among different benches.”
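The screening steps in the quote (checking for pre-existing correlations between well parameters, and removing values outside a plausible range) can be sketched roughly as below. This is only an illustration: the column names, the 0.8 threshold and the sample values are assumptions made for the example and do not come from the study.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def correlated_pairs(columns, threshold=0.8):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold."""
    names = sorted(columns)
    flagged = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(columns[a], columns[b])
            if abs(r) >= threshold:
                flagged.append((a, b, r))
    return flagged

def drop_out_of_range(rows, key, low, high):
    """Keep only rows whose value for `key` lies within [low, high]."""
    return [row for row in rows if low <= row[key] <= high]

# Hypothetical well parameters (made-up values, not data from the study).
wells = {
    "proppant_lb_per_ft": [1000, 1500, 2000, 2500, 3000],
    "fluid_bbl_per_ft": [20, 31, 39, 52, 60],
    "landing_depth_ft": [9800, 9500, 10100, 9700, 9900],
}
flagged = correlated_pairs(wells)
for name_a, name_b, r in flagged:
    print(f"{name_a} and {name_b} are strongly correlated (r = {r:.2f})")
```

A pair flagged this way would be a candidate for keeping only one of the two variables in a later regression, since strongly correlated inputs make individual effects hard to separate.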

For this study, a total of 250 shale wells were considered to observe patterns. Multivariate analysis was used because many variables are involved, such as well area, well thickness and production quantity. The performance measure was six-month production output, normalized to a lateral length of 10,500 feet; this forms the ‘Y’ (dependent variable) for the multilinear regression.
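A multilinear regression of this kind can be sketched as follows. The sketch assumes (this is our reading, not something the source confirms) that six-month production is scaled linearly to a 10,500-foot lateral and used as the response. The function names and the toy data are hypothetical, and the fit uses plain ordinary least squares rather than whatever software the author used.

```python
REFERENCE_LATERAL_FT = 10_500  # lateral length quoted in the article

def normalize_production(six_month_production, lateral_length_ft):
    """Scale production linearly to the reference lateral length (assumed)."""
    return six_month_production * REFERENCE_LATERAL_FT / lateral_length_ft

def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def fit_multilinear(features, targets):
    """Ordinary least squares via the normal equations.

    Returns [intercept, coef_1, ..., coef_k].
    """
    design = [[1.0] + list(row) for row in features]
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(design, targets))
           for i in range(k)]
    return solve_linear_system(xtx, xty)

# Toy data built from y = 2 + 3*x1 - 1*x2 (illustrative only).
features = [(1, 0), (2, 1), (3, 1), (4, 3), (5, 2)]
targets = [5, 7, 10, 11, 15]
intercept, coef_1, coef_2 = fit_multilinear(features, targets)
print(intercept, coef_1, coef_2)
```

In practice each row of `features` would hold one well's parameters and each target would be that well's normalized production, for example `normalize_production(production, lateral_length)`.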

Machine Learning In EUR

Preliminary statistical analysis, in the form of hypothesis testing, is performed before applying machine learning algorithms. The parameter data is plotted as a multivariate plot for visual inspection. This helps in establishing patterns among the wells and in seeing which ones produce more or less shale oil and gas. The ML algorithms are applied to all the wells, which are classified into five tiers. These algorithms give quicker results and identify the wells suited to higher shale oil/gas production with respect to the plot. Except for a few wells that fell under two different tiers, most wells were assigned cleanly to a single tier.
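The article does not say which hypothesis test was used or how the five tiers were defined. As a rough illustration only, the sketch below computes a Welch t-statistic for comparing production between two groups of wells, and assigns tiers by ranking wells and splitting them into five near-equal groups. Both choices, the function names and the sample values are assumptions made for this example.

```python
import math
import statistics

def welch_t_statistic(group_a, group_b):
    """Welch's t-statistic for the difference in means of two samples."""
    mean_a = statistics.mean(group_a)
    mean_b = statistics.mean(group_b)
    # statistics.variance is the sample variance (n - 1 denominator).
    se = math.sqrt(statistics.variance(group_a) / len(group_a)
                   + statistics.variance(group_b) / len(group_b))
    return (mean_a - mean_b) / se

def assign_tiers(values, n_tiers=5):
    """Rank wells by value and split into near-equal tiers; tier 1 is best."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    tiers = [0] * len(values)
    for rank, idx in enumerate(order):
        tiers[idx] = rank * n_tiers // len(values) + 1
    return tiers

# Hypothetical normalized production values for ten wells.
production = [10, 50, 30, 20, 40, 60, 70, 80, 90, 100]
tiers = assign_tiers(production)
print(tiers)

# Hypothetical comparison of two benches.
t_stat = welch_t_statistic([2, 4, 6], [1, 2, 3])
print(round(t_stat, 3))
```

The t-statistic alone does not give a p-value; turning it into one needs the t-distribution with the Welch degrees of freedom, which is left out here to keep the sketch short.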


Although this is a new line of study in geology, particularly in petroleum exploration, ML and statistics are yet to achieve significant success in the field. It should be noted that these methods provide useful results and assess performance only as long as the data and variables stay within the operating range. Furthermore, this study concentrated on a specific area of an oil-rich region, and it may not yield the same EUR in other areas of the same region. The approach might also not transfer to other countries, since they might not have the same resources, and the variables considered in this study may not apply there.



Abhishek Sharma
I research and cover latest happenings in data science. My fervent interests are in latest technology and humor/comedy (an odd combination!). When I'm not busy reading on these subjects, you'll find me watching movies or playing badminton.
