p-values & Significance

What a p-value is -- and what it is not


Definition

A p-value is the probability of observing results at least as extreme as yours, assuming the null hypothesis is true.

p = P( data at least as extreme as observed | H₀ true )

Common misunderstandings

  • Not: "probability H₀ is true"
  • Not: "clinical importance"
  • Not: "replication guarantee"

Interactive: p-value from a z-score

Move the z-score and see the two-tailed p-value as shaded tail area.

[Interactive slider: z-score, with 0, 1.96, and 4.0 marked; the readout shows the two-tailed p-value]
Rule-of-thumb: z≈1.96 ↔ p≈0.05
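
For readers without the interactive slider, the same z-to-p conversion can be sketched in a few lines of Python. This is a minimal illustration, not the page's own implementation; the function name two_tailed_p is an assumption made for the example, and it relies only on the standard-library math.erfc.

```python
import math

def two_tailed_p(z: float) -> float:
    """Two-tailed p-value for a z-score under the standard normal (H0)."""
    # P(|Z| >= |z|) equals erfc(|z| / sqrt(2)) for a standard normal Z.
    return math.erfc(abs(z) / math.sqrt(2))

# Print the shaded two-tail area for a few slider positions.
for z in (0.0, 1.0, 1.96, 3.0, 4.0):
    print(f"z = {z:4.2f}  two-tailed p = {two_tailed_p(z):.4f}")
```

At z = 1.96 the function returns a value just under 0.05, which matches the rule of thumb above.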

Clinical note

A tiny p-value can happen with a trivial effect if n is huge. Always pair p-values with effect sizes and CIs.
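
The clinical note can be made concrete with a small sketch. The numbers below are hypothetical (a fixed difference of 0.05 units with a standard deviation of 1.0), and the helper name is an assumption for illustration. The effect size never changes; only n grows.

```python
import math

def two_tailed_p_for_mean_diff(diff: float, sd: float, n: int) -> float:
    """Two-tailed z-test p-value for a one-sample mean difference."""
    se = sd / math.sqrt(n)          # standard error shrinks as n grows
    z = diff / se
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical values: the same trivial difference tested at growing n.
diff, sd = 0.05, 1.0
for n in (100, 1_000, 10_000, 100_000):
    p = two_tailed_p_for_mean_diff(diff, sd, n)
    print(f"n = {n:>7}  effect size d = {diff / sd:.2f}  p = {p:.4g}")
```

The standardized effect stays at 0.05 in every row, yet the p-value moves from clearly non-significant to far below 0.001. This is why the p-value alone cannot indicate how large or important an effect is.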

Standard normal (H₀)

Shaded tails correspond to the two-tailed p-value.

Dental Scenario: Whitening Gel Effectiveness

Testing if a new whitening gel significantly improves shade scores

The study: A dental researcher tests a new whitening gel on n = 36 patients. The known population mean shade improvement with the standard gel is μ₀ = 2.0 shades. The new gel produces a sample mean of x̄ = 2.8 shades with s = 1.5. Does the new gel perform significantly better?

1. State the Hypotheses

H₀: μ = 2.0 (The new gel is no better than the standard)
H₁: μ > 2.0 (The new gel improves shade scores more)
One-tailed test, α = 0.05

2. Compute the z-Score

SE = s / √n = 1.5 / √36 = 1.5 / 6 = 0.25
z = (x̄ − μ₀) / SE = (2.8 − 2.0) / 0.25 = 0.8 / 0.25 = 3.20
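
The arithmetic in this step can be checked with a short sketch. The variable names are chosen for this example only. Note that the calculation uses the sample standard deviation s in place of a known population value, which treats the z-test as a large-sample approximation at n = 36.

```python
import math

# Values from the whitening gel study described above.
n = 36       # number of patients
mu0 = 2.0    # mean shade improvement with the standard gel (H0 value)
xbar = 2.8   # sample mean with the new gel
s = 1.5      # sample standard deviation

se = s / math.sqrt(n)      # standard error of the mean
z = (xbar - mu0) / se      # test statistic

print(f"SE = {se:.2f}, z = {z:.2f}")
```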
3. Locate z = 3.20 on the Normal Curve & Shade the Tail

The shaded right tail represents the probability of observing z ≥ 3.20 under H₀.

4. Interpret: p-value vs α

p-value = P(Z ≥ 3.20) ≈ 0.0007
Since p = 0.0007 < α = 0.05, we reject H₀.
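
The right-tail area and the decision rule can be reproduced as follows. This is a minimal sketch under the same normal approximation; upper_tail_p is a name invented for the example.

```python
import math

def upper_tail_p(z: float) -> float:
    """One-tailed (right-tail) p-value: P(Z >= z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

z = 3.20
alpha = 0.05

p = upper_tail_p(z)
decision = "reject H0" if p < alpha else "fail to reject H0"

print(f"p = {p:.4f} -> {decision}")
```

Because H₁ is directional (μ > 2.0), only the right tail is counted. A two-tailed test on the same data would double this p-value.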

Test statistic: z = 3.20
One-tailed p-value: p = 0.0007
Decision: Reject H₀ (significant at α = 0.05)

Conclusion: There is strong statistical evidence that the new whitening gel produces greater shade improvement than the standard gel (z = 3.20, p = 0.0007). However, clinicians should also consider whether a 0.8-shade difference is clinically meaningful to patients.

Real Dental Scenario

Interpreting p = 0.03 in a Toothpaste Comparison

The study: Researchers compared Brand A vs Brand B toothpaste on plaque reduction in 200 patients over 6 months. The study found a statistically significant difference with p = 0.03.

Significance Meter

Scale: p = 1.0 (no evidence) → p = 0.05 → p = 0.001 → p ≈ 0 (strong evidence)
Zones: Not significant, Borderline, Significant, Highly significant
This study: p = 0.03

Correct: Evidence against H₀

If H₀ were true (no difference), there is only a 3% chance of seeing a difference this large or larger. This is moderate evidence against H₀.

Correct: Statistically significant at α = 0.05

Since p = 0.03 < 0.05, we reject H₀ at the conventional 5% significance level.

Wrong: "There is a 97% probability Brand A is better"

p-values do NOT give the probability that one treatment is better. They are computed assuming H₀ is true and measure how extreme the observed data would be under that assumption.

Wrong: "The difference is clinically meaningful"

Statistical significance does NOT equal clinical significance. A tiny plaque reduction may be statistically significant with a large sample but clinically trivial.

Wrong: "If we repeat the study, 97% of the time we'll get the same result"

The p-value is NOT a replication probability. Replication depends on effect size, sample size, and other factors.

Wrong: "There is only a 3% chance H₀ is true"

The p-value is NOT the probability that H₀ is true. It is the probability of the data (or more extreme) given H₀. These are fundamentally different.
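
To see why p = 0.03 does not imply a 97% replication rate, consider a rough sketch. It rests on strong assumptions that are not part of the toothpaste study: the true effect is taken to equal the observed effect exactly, the replication uses the same design and sample size, and only significance in the same direction counts. The function name replication_probability is hypothetical.

```python
from statistics import NormalDist

def replication_probability(p_two_tailed: float, alpha: float = 0.05) -> float:
    """Chance an identical repeat study reaches p < alpha in the same direction.

    Assumption (illustrative only): the true effect equals the observed one,
    so the repeat study's z-score is distributed Normal(z_obs, 1).
    """
    std_normal = NormalDist()
    # z-score corresponding to the observed two-tailed p-value
    z_obs = std_normal.inv_cdf(1 - p_two_tailed / 2)
    # critical z for a two-tailed test at level alpha
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # probability that the repeat study's z exceeds the critical value
    return 1 - std_normal.cdf(z_crit - z_obs)

print(f"Replication probability for p = 0.03: {replication_probability(0.03):.2f}")
```

Under these assumptions the result is roughly 0.58, far from 97%. With a smaller true effect or a smaller sample the figure would be lower still, which is why replication has to be judged from effect size and sample size rather than read off the p-value.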

Dental example

Suppose a new sealant reduces mean caries score by 0.2 units with p = 0.01. The evidence against H₀ is strong, but you still need to judge whether a 0.2-unit reduction is clinically meaningful, and whether it generalizes.