A Bayesian perspective on interpreting statistical significance


Interpreting low (and high) P values is trickier than it looks.

Imagine that you are screening drugs to see if they lower blood pressure. Based on the amount of scatter you expect to see and the minimum change you would care about, you've chosen the sample size for each experiment to have 80% power to detect the difference you are looking for with a P value less than 0.05.

If you do get a P value less than 0.05, what is the chance that the drug truly works?

The answer is: It depends.

It depends on the context of your experiment. Let's look at the same experiment performed in three alternative scenarios. In scenario A, you know a bit about the pharmacology of the drugs and expect 10% of the drugs to be active. In this case, the prior probability is 10%. In scenario B, you know a lot about the pharmacology of the drugs and expect 80% to be active. In scenario C, the drugs were selected at random, and you expect only 1% to be active in lowering blood pressure.

What happens when you perform 1000 experiments in each of these contexts? The details of the calculations are shown on pages 143-145 of Intuitive Biostatistics, by Harvey Motulsky (Oxford University Press, 1995). Since the power is 80%, you expect 80% of truly effective drugs to yield a P value less than 0.05 in your experiment. Since you set the definition of statistical significance to 0.05, you expect 5% of ineffective drugs to yield a P value less than 0.05. Putting these calculations together creates these tables.

 

A. Prior probability=10%

                              Drug really works   Drug really doesn't work   Total
P<0.05, “significant”                        80                         45     125
P>0.05, “not significant”                    20                        855     875
Total                                       100                        900    1000

 

B. Prior probability=80%

                              Drug really works   Drug really doesn't work   Total
P<0.05, “significant”                       640                         10     650
P>0.05, “not significant”                   160                        190     350
Total                                       800                        200    1000

 

C. Prior probability=1%

                              Drug really works   Drug really doesn't work   Total
P<0.05, “significant”                         8                         50      58
P>0.05, “not significant”                     2                        940     942
Total                                        10                        990    1000

 

The totals at the bottom of each column are determined by the prior probability, which reflects the context of your experiment. The prior probability equals the fraction of the experiments that are in the leftmost column. To compute the number of experiments in each row, use the definitions of power and alpha. Of the drugs that really work, you won't obtain a P value less than 0.05 in every case. You chose a sample size to obtain a power of 80%, so 80% of the truly effective drugs yield “significant” P values and 20% yield “not significant” P values. Of the drugs that really don't work (middle column), you won't get “not significant” results in every case. Since you defined statistical significance to be “P<0.05” (alpha=0.05), you will see a “statistically significant” result in 5% of experiments performed with drugs that are really inactive and a “not significant” result in the other 95%.
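The row-and-column arithmetic described above can be sketched in a few lines of Python. This is only an illustration under the stated assumptions (1000 experiments, 80% power, alpha of 0.05); the function name `expected_counts` and its parameter names are hypothetical and do not come from Prism.

```python
def expected_counts(prior, power=0.80, alpha=0.05, n_experiments=1000):
    """Expected counts for the 2x2 table of truth versus test result.

    Each value is a pair: (drug really works, drug really doesn't work).
    """
    # Column totals come from the prior probability.
    works = prior * n_experiments
    does_not_work = n_experiments - works
    return {
        # Power applies to effective drugs, alpha to ineffective ones.
        "significant": (power * works, alpha * does_not_work),
        "not_significant": ((1 - power) * works, (1 - alpha) * does_not_work),
        "total": (works, does_not_work),
    }


# Print the unrounded expected counts for the three prior probabilities.
for label, prior in [("A", 0.10), ("B", 0.80), ("C", 0.01)]:
    print(label, expected_counts(prior))
```

The function returns unrounded expected counts. With a prior of 1%, the expected number of ineffective drugs that give P<0.05 is 49.5, which table C shows as 50.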

If the P value is less than 0.05, so the results are “statistically significant”, what is the chance that the drug is, in fact, active? The answer is different for each experiment.

 

                            Experiments with P<0.05 and...
Prior probability           Drug really works   Drug really doesn't work   Fraction of experiments with P<0.05 where drug really works
A. Prior probability=10%                   80                         45   80/125 = 64%
B. Prior probability=80%                  640                         10   640/650 = 98%
C. Prior probability=1%                     8                         50   8/58 = 14%

 

For scenario A, the chance that the drug is really active is 80/125 or 64%. If you observe a statistically significant result, there is a 64% chance that the difference is real and a 36% chance that the difference simply arose in the course of random sampling. For scenario B, there is a 98.5% chance (640/650) that the difference is real. In contrast, if you observe a significant result in scenario C, there is only a 14% chance that the result is real and an 86% chance that it is due to random sampling. For scenario C, the vast majority of “significant” results are due to chance.
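The same fractions follow from Bayes' rule without building the tables. As a sketch, write \pi for the prior probability, 1-\beta for the power, and \alpha for the significance threshold. These symbols are introduced here for illustration and do not appear in the original text.

```latex
\[
P(\text{drug works} \mid P<0.05)
  = \frac{(1-\beta)\,\pi}{(1-\beta)\,\pi + \alpha\,(1-\pi)}
\]

% The three prior probabilities, with power 0.80 and alpha 0.05:
\[
\text{A: } \frac{0.80 \times 0.10}{0.80 \times 0.10 + 0.05 \times 0.90}
  = \frac{0.080}{0.125} = 0.64
\]
\[
\text{B: } \frac{0.80 \times 0.80}{0.80 \times 0.80 + 0.05 \times 0.20}
  = \frac{0.64}{0.65} \approx 0.985
\]
\[
\text{C: } \frac{0.80 \times 0.01}{0.80 \times 0.01 + 0.05 \times 0.99}
  = \frac{0.0080}{0.0575} \approx 0.139
\]
```

The value for C differs slightly from 8/58 (about 0.138) because the table rounds 49.5 false positives up to 50. Both round to 14%.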

You can't interpret a P value in a vacuum. Your interpretation depends on the context of the experiment. Interpreting results requires common sense, intuition, and judgment.



Copyright (c) 2007 GraphPad Software Inc. All rights reserved.
URL: http://www.graphpad.com/help/Prism5/Prism5Help.html?a_bayesian_perspective_on_interpreting_statistical_significance.htm