Last updated November 20, 2021
Myths & Misconceptions About P-Value

Despite its seemingly simple definition, the interpretation of p-value is considered to be extraordinarily difficult because it is not a part of any formal system of statistical inference.

Published on November 22, 2021

by Shraddha Goled

P-value, or probability value, is an important concept in statistics. It is a number that describes how likely it is that data at least as extreme as those observed would have occurred by chance alone, that is, if the null hypothesis were true. Despite its seemingly simple definition, the p-value is considered extraordinarily difficult to interpret because it is not a part of any formal system of statistical inference. As a result, its inferential meaning is widely misconstrued. One of the most damaging consequences of these widely circulated misconceptions is the false belief that the probability of a conclusion being erroneous can be calculated from the data of a single experiment, without considering external evidence or the underlying mechanism.

Steven Goodman is the Associate Dean of Clinical and Translational Research at Stanford University. He also co-founded the Meta-research Innovation Center at Stanford (METRICS), a group dedicated to improving biomedical research. Goodman is also known for coining the term ‘p-value fallacy’, where effects and interactions are classified as noise or real based solely on whether the p-value is greater or less than .05. Such a blunt classification can lead to misreading the evidence an experiment offers.

Credit: Steven Goodman

In 2008, Goodman authored the paper “A Dirty Dozen: Twelve P-Value Misconceptions”, which is highly cited even today. This article walks through these misconceptions.

Misconception 1

The first misconception is that if P = 0.05, the null hypothesis has only a 5 per cent chance of being true. As per Goodman, this is the ‘most pervasive and pernicious of the many misconceptions about the P value’. It rests on the belief that data alone can tell us how likely it is that a conclusion is right or wrong. Goodman explains why this is incorrect: the P-value is calculated under the assumption that the null hypothesis is true, so it cannot simultaneously be the probability that the null hypothesis is false.
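
A rough sketch can show why the probability of the null depends on more than the data. The example below combines a P-value with a prior probability for the null. It assumes the minimum Bayes factor bound exp(-z²/2) for a normally distributed test statistic, which is not spelled out in this article; the function names and the prior values are purely illustrative.

```python
import math
from statistics import NormalDist

def min_bayes_factor(p_two_sided):
    """Smallest Bayes factor (null vs. best-supported alternative) for a
    two-sided P-value. Assumption: normal test statistic, BF_min = exp(-z^2 / 2)."""
    z = NormalDist().inv_cdf(1 - p_two_sided / 2)
    return math.exp(-z * z / 2)

def min_posterior_null(prior_null, p_two_sided):
    """Lower bound on P(null | data) for a given prior probability of the null."""
    prior_odds = prior_null / (1 - prior_null)
    posterior_odds = prior_odds * min_bayes_factor(p_two_sided)
    return posterior_odds / (1 + posterior_odds)

# Illustrative priors only; none of these values come from the article.
for prior in (0.75, 0.50, 0.10):
    bound = min_posterior_null(prior, 0.05)
    print(f"prior P(null) = {prior:.2f} -> P(null | P = 0.05) is at least {bound:.3f}")
```

Under these assumptions the bound moves with the prior, so the same P = 0.05 corresponds to different probabilities for the null hypothesis.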

Misconception 2

The second misconception is that a nonsignificant difference means there is no difference between the groups. In fact, a nonsignificant difference only means that a null effect is statistically consistent with the observed results, as is every other effect inside the confidence interval. Regardless of significance, the effect best supported by the data of an experiment is always the observed effect.
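
As a small illustration, consider a hypothetical study whose result is not significant. The sketch below assumes a normally distributed estimate with a known standard error; the numbers 0.5 and 0.4 are invented for the example.

```python
import math

def two_sided_p(z):
    """Two-sided P-value for a standard normal test statistic z."""
    return math.erfc(abs(z) / math.sqrt(2))

def ci95(estimate, std_error):
    """Approximate 95% confidence interval using the normal critical value 1.96."""
    half_width = 1.96 * std_error
    return estimate - half_width, estimate + half_width

# Hypothetical study: observed difference 0.5 with standard error 0.4.
estimate, std_error = 0.5, 0.4
p = two_sided_p(estimate / std_error)
low, high = ci95(estimate, std_error)
print(f"P = {p:.3f}, 95% CI = ({low:.2f}, {high:.2f})")
```

The interval contains zero, so the result is nonsignificant, but it also contains effects more than twice as large as the observed one. The data do not say "no difference"; they say the estimate is imprecise.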

Misconception 3

It is untrue that a statistically significant finding is necessarily clinically important. Firstly, the difference is often too small to be of any clinical significance: the P-value carries no information about the magnitude of an effect, which is captured by the effect estimate and the confidence interval. Secondly, as in the case of surrogate outcomes, the endpoint itself might not be important.

Misconception 4

The fourth misconception is that studies with P values on either side of 0.05 are conflicting. Goodman explains that even when the estimates of treatment benefit are identical, studies can differ in significance simply because their sample sizes, and therefore the precision of their estimates, differ.
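
The effect of sample size can be sketched with a simple z-test. The example assumes a difference in means with a known common standard deviation and equal group sizes; the effect of 0.3 and the sample sizes are hypothetical.

```python
import math

def two_sample_z_p(diff, sd, n_per_group):
    """Two-sided P-value for a difference in means, assuming a known common SD
    and equal group sizes (a z-test)."""
    std_error = sd * math.sqrt(2 / n_per_group)
    z = diff / std_error
    return math.erfc(abs(z) / math.sqrt(2))

# The same hypothetical treatment benefit (0.3 SD units) in every study;
# only the sample size changes.
for n in (30, 100, 300):
    print(f"n per group = {n:3d}: P = {two_sample_z_p(0.3, 1.0, n):.4f}")
```

With identical estimates, the smallest study lands above 0.05 and the larger ones below it, so the studies agree on the effect and differ only in precision.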

Misconception 5

Another misconception is that studies with the same P-value provide the same evidence against the null hypothesis. In fact, dramatically different observed effects can have the same P-value. Goodman attributes this seeming incongruity to the fact that the P-value defines “evidence” relative to only one hypothesis, the null. There is no notion of positive evidence: if data with P = .05 are evidence against the null, he asks, what are they evidence for?
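
A minimal sketch of two hypothetical studies with very different observed effects and the same P-value, again assuming a z-test with a known common standard deviation (all numbers are invented for illustration):

```python
import math

def summarize(diff, sd, n_per_group):
    """Return (z, two-sided P, 95% CI) for a difference in means,
    assuming a known common SD and equal group sizes."""
    std_error = sd * math.sqrt(2 / n_per_group)
    z = diff / std_error
    p = math.erfc(abs(z) / math.sqrt(2))
    ci = (diff - 1.96 * std_error, diff + 1.96 * std_error)
    return z, p, ci

# Small effect in a large study vs. large effect in a small study.
small_effect = summarize(diff=0.2, sd=1.0, n_per_group=200)
large_effect = summarize(diff=1.0, sd=1.0, n_per_group=8)

for label, (z, p, ci) in (("small effect", small_effect), ("large effect", large_effect)):
    print(f"{label}: z = {z:.2f}, P = {p:.4f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

Both studies produce the same test statistic and therefore the same P-value, yet their confidence intervals point to very different ranges of plausible effects.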

Misconception 6

Another common misconception is that P = 0.05 means the observed data would occur only 5 per cent of the time under the null hypothesis. It is often taken to be true because a loose reading of the definition of the P-value suggests so. In reality, the P-value is the probability of the observed result plus all more extreme results, that is, the whole tail area that defines it.
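
The gap between "the observed data" and "the tail area" is easy to see with a coin-flip sketch. The example assumes a fair-coin null hypothesis and a hypothetical observation of 60 heads in 100 flips.

```python
from math import comb

def binom_pmf(k, n, p=0.5):
    """Probability of exactly k heads in n flips."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def two_sided_tail(k_obs, n):
    """Probability, under a fair coin, of a result at least as far from n/2
    as the observed count. This tail area is the two-sided P-value."""
    distance = abs(k_obs - n / 2)
    return sum(binom_pmf(k, n) for k in range(n + 1) if abs(k - n / 2) >= distance)

n, k_obs = 100, 60
print(f"P(exactly {k_obs} heads)        = {binom_pmf(k_obs, n):.4f}")
print(f"P(result at least as extreme) = {two_sided_tail(k_obs, n):.4f}")
```

The probability of the exact observed result is several times smaller than the tail area, and only the tail area is what the P-value reports.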

Misconception 7

Another misconception is that P = 0.05 and P ≤ 0.05 mean the same thing. There is a big difference between the two in terms of weight of evidence, but because the same number is associated with each, that difference is very difficult to communicate. It can be calculated and seen clearly only with a Bayesian evidence metric.

Misconception 8

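The next misconception in the list is that P values are properly written as inequalities. This confusion arises from the combination of hypothesis tests and P values. In a hypothesis test, a pre-set rejection threshold is established, and all that matters is which side of the threshold the result falls on, so an inequality is adequate. When the P-value is used as a measure of evidence, however, its exact value matters and should be reported.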

Misconception 9

Misconception number 9 is that P = .05 means that if one rejects the null hypothesis, the probability of a type I error is just 5 per cent. Goodman calls this logical quicksand. A type I error is the rejection of a null hypothesis that is true, so saying there is a 5% chance of a false rejection is the same as saying that there is a 5% chance that the null hypothesis is true, which again is Misconception 1.
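
A simplified calculation shows how far the chance of a false rejection can drift from 5%. The sketch below conditions on a rejection at the 0.05 threshold rather than on an exact P-value, and it assumes values for the prior probability of the null and for the power of the study; none of these numbers come from the article.

```python
def prob_null_given_rejection(prior_null, alpha=0.05, power=0.8):
    """P(null is true | null was rejected), by Bayes' rule.
    alpha: rejection rate when the null is true.
    power: rejection rate when the null is false (an assumed value)."""
    false_rejections = alpha * prior_null
    true_rejections = power * (1 - prior_null)
    return false_rejections / (false_rejections + true_rejections)

# Illustrative priors: the null is unlikely, a toss-up, or very likely.
for prior in (0.1, 0.5, 0.9):
    print(f"prior P(null) = {prior:.1f} -> "
          f"P(type I error | rejection) = {prob_null_given_rejection(prior):.3f}")
```

Under these assumptions the chance that a rejection is a type I error ranges from well under 1% to more than a third, depending on how plausible the null was to begin with.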

Misconception 10

The next misconception is similar to the previous one. It says that with a P = 0.05 threshold for significance, the chance of a type I error will be 5%. The only difference between the two misconceptions is that here we are considering the chance of a type I error before the experiment and not after the rejection. Either way, the chance depends on whether the null hypothesis is actually true: it is 5% only if the null is true, and zero if the null is false.
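
A small simulation can illustrate the pre-experiment view. It is a sketch under stated assumptions: each experiment yields a normally distributed z statistic, the null is true in a chosen fraction of experiments, and the effect size `effect_z` for false nulls is a hypothetical value.

```python
import random

def simulate_type_i_rate(prior_null, effect_z=2.8, n_experiments=20000, seed=0):
    """Fraction of all simulated experiments that end in a type I error
    (a true null rejected at the two-sided 0.05 threshold)."""
    rng = random.Random(seed)
    type_i_errors = 0
    for _ in range(n_experiments):
        null_is_true = rng.random() < prior_null
        # z is centred on 0 under the null, on effect_z otherwise.
        z = rng.gauss(0.0 if null_is_true else effect_z, 1.0)
        rejected = abs(z) > 1.96
        if null_is_true and rejected:
            type_i_errors += 1
    return type_i_errors / n_experiments

for prior in (1.0, 0.5, 0.0):
    print(f"P(null true) = {prior:.1f}: type I error rate = {simulate_type_i_rate(prior):.4f}")
```

The rate is close to 5% only when the null is always true. When the null is never true, a type I error cannot happen at all.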

Misconception 11

The eleventh misconception is that one should use a one-sided P-value when a result in one direction is of no concern or a difference in that direction is not possible. This question has received much technical discussion. The operational effect of using a one-sided P value is to increase the apparent strength of evidence for a result based on considerations not found in the data.
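
The operational effect is easy to quantify for a normally distributed test statistic. In the sketch below, the value z = 1.8 is hypothetical.

```python
import math

def p_values(z):
    """Return (one-sided, two-sided) P-values for a standard normal statistic z.
    The one-sided value is P(Z >= z), i.e. the test looks only at the upper tail."""
    one_sided = 0.5 * math.erfc(z / math.sqrt(2))
    two_sided = math.erfc(abs(z) / math.sqrt(2))
    return one_sided, two_sided

one_sided, two_sided = p_values(1.8)
print(f"one-sided P = {one_sided:.4f}, two-sided P = {two_sided:.4f}")
```

For a positive z, the one-sided value is exactly half the two-sided one, so the same data cross the 0.05 line purely because of the choice of test.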

Misconception 12

A very common myth is that a scientific conclusion should be based on whether or not the P-value is significant. This is the same as saying that the magnitude of the effect is not relevant and that the only evidence relevant to a scientific conclusion is in the experiment at hand. “To justify actions, we must incorporate the seriousness of errors flowing from the actions together with the chance that the conclusions are wrong,” Goodman writes.

Read the full paper here.

Shraddha Goled

I am a technology journalist with AIM. I write stories focused on the AI landscape in India and around the world with a special interest in analysing its long term impact on individuals and societies. Reach out to me at shraddha.goled@analyticsindiamag.com.