Here’s Why Data Scientists Shouldn’t Rely Too Much On P-Values In Machine Learning Experiments



Statistical concepts go hand-in-hand with machine learning, but they do not always translate into better machine learning models. At times, applying a statistical revision to a model does not improve its performance at all. Then again, this is open to interpretation and depends on the problem the ML algorithm is meant to solve. In this article, we look at one specific statistical tool, the p-value, and discuss how it affects machine learning in general.

What Do P-Values Signify

In statistics, hypothesis tests are conducted to check whether an inference about a population, drawn from a sample, holds good in an experiment. Every hypothesis test involves two competing statements: the null hypothesis and the alternate (sometimes called the alternative) hypothesis. The null hypothesis, which is the foundation for any statistical experiment, states that there is no effect or relationship among the observations collected in the sample or the population. The null hypothesis is retained unless the data provide sufficient evidence against it. The alternate hypothesis is the statement that gains support when the null hypothesis is rejected. In simple words, it is the alternative statement to the null hypothesis.
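
To make the two hypotheses concrete, here is a minimal sketch of a two-sided permutation test (our choice of test; the article does not prescribe one). It assumes we are comparing hypothetical accuracy scores from repeated runs of two model variants. The null hypothesis says the variant labels make no difference, so shuffling them should produce a gap as large as the observed one fairly often; the alternate hypothesis says the means genuinely differ. The function name `permutation_test` and the scores are illustrative only.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_resamples=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    H0 (null): the group labels do not matter, so any relabelling of the
    pooled data is as likely as the observed one.
    H1 (alternate): the two groups have different means.
    Returns a Monte Carlo estimate of the p-value.
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    as_extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # relabel the data as if H0 were true
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            as_extreme += 1
    # The +1 terms count the observed labelling itself, keeping p above zero
    return (as_extreme + 1) / (n_resamples + 1)


# Hypothetical validation accuracies from repeated runs of two model variants
model_a = [0.81, 0.79, 0.83, 0.80, 0.82, 0.78]
model_b = [0.84, 0.86, 0.85, 0.83, 0.87, 0.85]
p_value = permutation_test(model_a, model_b)
print(f"p-value: {p_value:.4f}")
```

Because random relabellings rarely reproduce a gap this large, the estimated p-value comes out small, which counts as evidence against the null hypothesis.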

Now, the p-value is used to assess the strength of the evidence against the null hypothesis. Formally, it is the probability, assuming the null hypothesis is true, of observing data at least as extreme as the data actually collected. P-values are decimal numbers between 0 and 1, and they are sometimes expressed as percentages. Typically, a small p-value (less than 0.05) suggests that the null hypothesis should be rejected, while a large p-value (greater than 0.05) means the null hypothesis cannot be rejected, because the data offer too little evidence against it. Values equal to or near 0.05 are borderline, and experimenters have to make their own call.
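
The 0.05 rule can be seen in a small worked example. The sketch below assumes a hypothetical coin flipped 20 times and computes an exact two-sided binomial p-value for the null hypothesis that the coin is fair. `binomial_p_value` is an illustrative helper, not a library function.

```python
from math import comb


def binomial_p_value(k, n, p0=0.5):
    """Exact two-sided binomial test of H0: success probability == p0.

    The p-value sums the probability, under H0, of every outcome that is
    no more likely than the observed count k.
    """
    pmf = [comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    # Small tolerance so equally likely outcomes are not lost to rounding
    return min(1.0, sum(p for p in pmf if p <= observed * (1 + 1e-7)))


alpha = 0.05  # conventional significance level
for heads in (14, 15):
    p = binomial_p_value(heads, 20)
    decision = "reject H0" if p < alpha else "fail to reject H0"
    print(f"{heads}/20 heads: p = {p:.4f} -> {decision}")
# 14/20 heads: p = 0.1153 -> fail to reject H0
# 15/20 heads: p = 0.0414 -> reject H0
```

Fifteen heads out of 20 falls just under the threshold while fourteen does not, so a single extra head changes the decision. This is why borderline values call for judgement rather than a mechanical cut-off.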

Contradiction In P-Values

Many times, p-values are wrongly interpreted. They are sometimes treated as the probability that the null hypothesis is true, or as a standalone probability for the experiment, without taking the hypothesis-testing setup into account. This will certainly lead to incorrect conclusions for the experiment in the statistical context. Another instance arises when several variables are studied in one project, for example in regression analysis: if the variables are not correctly selected, the p-values attached to them, and the analysis built on them, would be void.
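
A quick simulation shows why a p-value is not the probability that the null hypothesis is true. The sketch assumes a two-sided z-test with a known standard deviation and simulates data so that the null hypothesis holds in every single experiment. Roughly 5% of experiments still produce p < 0.05, and the p-values spread evenly between 0 and 1. The helper `z_test_p_value` and the simulation settings are illustrative choices.

```python
import random
from statistics import NormalDist, fmean

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def z_test_p_value(sample, sigma=1.0):
    """Two-sided z-test of H0: population mean == 0, assuming sigma is known."""
    z = fmean(sample) / (sigma / len(sample) ** 0.5)
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))


rng = random.Random(42)
# In every simulated experiment the data really have mean 0, so H0 is TRUE.
p_values = [
    z_test_p_value([rng.gauss(0, 1) for _ in range(30)])
    for _ in range(5_000)
]
share_significant = sum(p < 0.05 for p in p_values) / len(p_values)
print(f"share of p < 0.05 although H0 is always true: {share_significant:.3f}")
```

A p-value below 0.05 says the data would be unusual if the null hypothesis held; it does not say the null hypothesis has less than a 5% chance of being true.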

When analysing with p-values, experimenters should decide what is to be tested ahead of time. Once p-values have been computed, changing the hypotheses or the analysis afterwards means the p-values no longer carry the same statistical meaning. In fact, the assumptions behind the theory used to derive p-values can themselves be misleading at times, which makes p-values less suitable for machine learning applications, where the statistical significance they report may be diminished.
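
Choosing what to test after looking at the data is where the trouble starts. The sketch below assumes a hypothetical project that screens 20 candidate features, all of which are pure noise, and checks how often at least one of them looks "significant" at the 0.05 level. For independent tests the analytic rate is 1 - 0.95^20, roughly 0.64, far above the 5% a single pre-planned test would give. The names and settings are illustrative.

```python
import random
from statistics import NormalDist, fmean

STD_NORMAL = NormalDist()


def z_test_p_value(sample, sigma=1.0):
    """Two-sided z-test of H0: population mean == 0, assuming sigma is known."""
    z = fmean(sample) / (sigma / len(sample) ** 0.5)
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))


ALPHA = 0.05
N_TESTS = 20       # hypothetical: 20 candidate features examined after the fact
N_STUDIES = 1_000  # number of simulated projects
rng = random.Random(7)

studies_with_a_hit = 0
for _ in range(N_STUDIES):
    # Every feature is noise, so H0 holds for all 20 tests in this project
    p_values = [
        z_test_p_value([rng.gauss(0, 1) for _ in range(25)])
        for _ in range(N_TESTS)
    ]
    if min(p_values) < ALPHA:
        studies_with_a_hit += 1

family_rate = studies_with_a_hit / N_STUDIES
analytic_rate = 1 - (1 - ALPHA) ** N_TESTS  # assumes independent tests
print(f"simulated: {family_rate:.3f}, analytic: {analytic_rate:.3f}")
```

Reporting only the feature that "worked" hides the other 19 tests, which is exactly how a p-value loses its meaning when the analysis is chosen after the fact.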

In a journal paper, Mark Schervish, a professor at Carnegie Mellon University, argues that p-values are logically flawed when they are used informally, without much thought given to the underlying statistical considerations. In the paper, titled P Values: What They Are And What They Are Not, Schervish argues that p-values vary continuously with the hypothesis being tested, rather than being a single definite value attached to one hypothesis (as the description above might suggest). The study examines point-null and one-sided hypotheses to show that their p-values are continuous limits of the p-values of interval hypotheses. He asserts, “Just as the point-null and one-sided hypotheses are limits of interval hypotheses, so too are their P values limits of the P values of the interval hypotheses for every data value. This observation allows us to think of point-null hypotheses as approximations to interval hypotheses”.
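
The limiting argument can be checked numerically in the simplest setting. The sketch below is our own illustration, not code from the paper. It assumes a single observation X ~ N(theta, 1), tests the interval hypothesis |theta - c| <= delta with the statistic |x - c|, and takes the p-value as the largest tail probability over the interval, which is reached at its edges. As delta shrinks, the interval p-value approaches the ordinary two-sided p-value for the point null theta = c. The observed value 2.18 is hypothetical.

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF


def interval_p_value(x, center, half_width):
    """p-value for H: |theta - center| <= half_width, given X ~ N(theta, 1).

    Uses the statistic |x - center|; its tail probability is largest at the
    edges of the null interval, theta = center +/- half_width.
    """
    t = abs(x - center)
    return (1 - PHI(t - half_width)) + PHI(-t - half_width)


def point_null_p_value(x, center):
    """Ordinary two-sided p-value for H: theta == center."""
    return 2 * (1 - PHI(abs(x - center)))


x = 2.18  # hypothetical observed (standardised) statistic
for delta in (0.5, 0.1, 0.01, 0.001):
    print(f"delta={delta}: p = {interval_p_value(x, 0.0, delta):.5f}")
print(f"point null: p = {point_null_p_value(x, 0.0):.5f}")
```

Setting `half_width` to zero recovers the point-null formula exactly, which is the sense in which a point null approximates a narrow interval hypothesis.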

Also, p-values do not act as a coherent measure of statistical support, because they do not behave consistently when several related hypotheses are compared: a broad hypothesis can receive a smaller p-value than a narrower hypothesis it contains. Therefore, p-values are not well suited to machine learning models, where data keeps arriving and can change the statistical inferences made in the models.
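
A standard illustration of this incoherence, assuming a single observation X ~ N(theta, 1) and a hypothetical observed value of 2.18: the point null theta = 0 is contained in the one-sided hypothesis theta <= 0, so any sensible measure of support should rate the one-sided hypothesis at least as highly. Its p-value, however, is half as large.

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF
x = 2.18  # hypothetical observed statistic, X ~ N(theta, 1)

# Narrow hypothesis H_point: theta == 0 (two-sided test)
p_point = 2 * (1 - PHI(abs(x)))
# Broader hypothesis H_one_sided: theta <= 0 (one-sided test); it contains theta == 0
p_one_sided = 1 - PHI(x)

print(f"p(theta == 0) = {p_point:.4f}")
print(f"p(theta <= 0) = {p_one_sided:.4f}")
# If p-values measured support, the broader hypothesis could never get less
# support than a hypothesis it contains. Here it does.
assert p_one_sided < p_point
```

At a 0.02 level, for example, an analyst would reject theta <= 0 while keeping theta = 0, even though the second statement implies the first.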

Using Bayesian Approach

In a machine learning environment, the Bayesian approach works well because it deals with probability distributions rather than framing a hypothesis and then testing it. Unlike p-values, the Bayesian approach is openly subjective: the experimenter states, and can justify, the probability distribution chosen as a starting point (the prior), and then updates it as data from the experiment comes in. On top of that, the resulting distributions are easy to depict visually, which conveys more information than a single number.
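
The simplest version of this prior-then-update workflow is the Beta-Binomial model. The sketch below assumes we want to learn an unknown success rate (for instance a hypothetical model's hit rate), starts from a Beta(2, 2) prior chosen purely for illustration, and updates it with hypothetical counts of 15 successes and 5 failures. `BetaBelief` is a made-up helper class, and the tail probability is a Monte Carlo approximation.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaBelief:
    """Beta(alpha, beta) distribution describing belief about a success rate."""
    alpha: float
    beta: float

    def update(self, successes: int, failures: int) -> "BetaBelief":
        # Beta prior + binomial likelihood gives a Beta posterior (conjugacy)
        return BetaBelief(self.alpha + successes, self.beta + failures)

    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)

    def prob_greater_than(self, threshold: float, draws: int = 20_000, seed: int = 0) -> float:
        # Monte Carlo approximation of P(rate > threshold) under this belief
        rng = random.Random(seed)
        hits = sum(rng.betavariate(self.alpha, self.beta) > threshold for _ in range(draws))
        return hits / draws


prior = BetaBelief(2, 2)  # hypothetical prior: rate probably near 0.5, loosely held
posterior = prior.update(successes=15, failures=5)  # hypothetical data
print(posterior)
print(f"posterior mean = {posterior.mean():.3f}")
print(f"P(rate > 0.5) = {posterior.prob_greater_than(0.5):.3f}")
```

The result is a whole distribution rather than a reject/retain verdict, so it can be plotted directly (for example as a histogram of `betavariate` draws). Updating in two batches, `prior.update(10, 2).update(5, 3)`, lands on the same posterior as a single update, which is what makes incremental updating natural as new data arrives.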


Machine learning projects generally take care of all the statistics before the models are deployed into practice. A host of techniques, such as dimensionality reduction methods like principal component analysis (PCA), among others, deal with the assumptions about the data in machine learning. Layering statistical concepts such as hypothesis testing and p-values on top of an already well-set machine learning model might only make the data harder to interpret. Ultimately, p-values are meaningful mainly when only a few parameters or variables are involved in an experiment or project.

Abhishek Sharma
I research and cover the latest happenings in data science. My fervent interests are in the latest technology and humor/comedy (an odd combination!). When I'm not busy reading on these subjects, you'll find me watching movies or playing badminton.
