Hypothesis testing

The basic question

Hypothesis testing is a formal procedure for deciding whether data provides enough evidence to reject a specific claim about the world. It does not prove things — it measures how surprising the data would be if a particular assumption were true.

The two hypotheses

Every test starts with two competing claims:

  • Null hypothesis ($H_0$): the default assumption, usually "no effect," "no difference," or "nothing interesting is happening." Example: the new drug has no effect on blood pressure.
  • Alternative hypothesis ($H_1$ or $H_a$): the claim you are trying to find evidence for. Example: the drug does lower blood pressure.

The test asks: is the data sufficiently inconsistent with $H_0$ to warrant rejecting it?

The logic

Hypothesis testing borrows the structure of a proof by contradiction, with "very unlikely" standing in for "impossible":

  1. Assume $H_0$ is true.
  2. Compute how likely the observed data (or something more extreme) would be under that assumption.
  3. If the data is very unlikely under $H_0$, reject $H_0$ in favor of $H_1$.

You never "accept" $H_0$: you either reject it or fail to reject it. Failing to reject just means the data is not inconsistent enough with $H_0$ to rule it out; it does not show that $H_0$ is true.
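
The three steps can be sketched as a simulation: assume $H_0$, generate many datasets under it, and check how often they look at least as extreme as the real one. This is a minimal Python sketch, assuming a fair-coin null and borrowing the 62-heads-in-100-flips data from the worked example later in this section; the function name `simulated_p_value`, the number of simulated trials, and the seed are illustrative choices, not part of the course material.

```python
import random

def simulated_p_value(heads: int, n: int, p0: float = 0.5,
                      trials: int = 20_000, seed: int = 0) -> float:
    """Estimate a two-tailed p-value by simulating experiments under H0."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    expected = n * p0
    observed_gap = abs(heads - expected)
    extreme = 0
    for _ in range(trials):
        # Step 1: assume H0, so each flip is heads with probability p0.
        sim_heads = sum(rng.random() < p0 for _ in range(n))
        # Step 2: count simulated outcomes at least as far from the
        # expected count as the real one.
        if abs(sim_heads - expected) >= observed_gap:
            extreme += 1
    return extreme / trials

# Step 3: reject H0 if data this extreme is rare under H0.
p = simulated_p_value(heads=62, n=100)
print(f"estimated p-value: {p:.3f}")
print("reject H0" if p < 0.05 else "fail to reject H0")
```

The estimate will not match the normal-approximation p-value from the worked example exactly, because the simulation uses the discrete coin-flip distribution directly and carries some Monte Carlo error that shrinks as `trials` grows.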

The test statistic and p-value

A test statistic is a number computed from the data that summarizes how far the observation is from what $H_0$ predicts. Common examples include the z-statistic, the t-statistic, and the chi-squared statistic.

The p-value is the probability of observing a test statistic at least as extreme as the one computed, assuming $H_0$ is true. A small p-value means that data this extreme would rarely occur if $H_0$ were true.
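
To make the definition concrete, here is a minimal sketch for a coin-flip setting, assuming $H_0: p = 0.5$ and taking the test statistic to be the distance between the number of heads and its expected value $n/2$. Because the distribution under $H_0$ is known exactly, the p-value can be computed by adding up the probabilities of every outcome at least as extreme as the observed one. The function name `binomial_two_tailed_p` is hypothetical, and this is an exact binomial calculation rather than the z-test used in the worked example.

```python
from math import comb

def binomial_two_tailed_p(heads: int, n: int) -> float:
    """Exact two-tailed p-value for a fair-coin null, H0: p = 0.5.

    The test statistic is |X - n/2|, the distance between the head count X
    and its expected value under H0. The p-value sums P(X = k) over every
    outcome k at least that far from n/2.
    """
    observed = abs(heads - n / 2)
    favorable = 0
    for k in range(n + 1):
        if abs(k - n / 2) >= observed:
            favorable += comb(n, k)  # number of flip sequences with k heads
    # Under p = 0.5, every sequence of n flips has probability 0.5**n.
    return favorable / 2 ** n

print(binomial_two_tailed_p(4, 4))  # 0.125: only 0 or 4 heads are as extreme
```

With 4 flips and 4 heads, only the outcomes "0 heads" and "4 heads" are at least as far from 2, each with probability $1/16$, so the p-value is $2/16 = 0.125$.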

A worked example

You want to test whether a coin is fair ($H_0: p = 0.5$). You flip it 100 times and get 62 heads.

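The z-statistic for a proportion measures how many standard errors the observed proportion $\hat{p}$ lies from the proportion $p_0$ assumed under $H_0$, for a sample of $n$ flips: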

$$z = \frac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}} = \frac{0.62 - 0.5}{\sqrt{0.5 \times 0.5 / 100}} = \frac{0.12}{0.05} = 2.4$$

From a standard normal table, $P(Z \geq 2.4) \approx 0.0082$. A two-tailed test counts both tails, so the p-value for $z = 2.4$ is approximately $2 \times 0.0082 \approx 0.016$.
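
The same calculation can be done without a table. This minimal sketch uses only Python's standard library and relies on the identity $P(|Z| \geq |z|) = \operatorname{erfc}(|z|/\sqrt{2})$ for a standard normal $Z$; the helper name `proportion_z_test` is hypothetical.

```python
from math import erfc, sqrt

def proportion_z_test(successes: int, n: int, p0: float) -> tuple[float, float]:
    """Return (z, two-tailed p-value) for a one-sample proportion z-test."""
    p_hat = successes / n
    standard_error = sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / standard_error
    # Two-tailed p-value: P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    p_value = erfc(abs(z) / sqrt(2))
    return z, p_value

z, p = proportion_z_test(62, 100, 0.5)
print(f"z = {z:.2f}, p = {p:.4f}")  # z = 2.40, p = 0.0164
```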

The significance level $\alpha$

Before conducting the test, you choose a significance level $\alpha$, the threshold the p-value must fall below to reject $H_0$. The most common choice is $\alpha = 0.05$.

  • If $p\text{-value} < \alpha$: reject $H_0$. The result is statistically significant.
  • If $p\text{-value} \geq \alpha$: fail to reject $H_0$.

In the coin example, $0.016 < 0.05$, so we reject $H_0$. The evidence suggests the coin is not fair.
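
The decision rule is short enough to write down directly. In this minimal sketch, `decide` is a hypothetical helper name; the second call uses a stricter $\alpha = 0.01$ only to show that the same p-value can lead to a different conclusion under a different threshold, which is why $\alpha$ is fixed before the test rather than after seeing the p-value.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Apply the significance-level rule: reject H0 only when p < alpha."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must be strictly between 0 and 1")
    return "reject H0" if p_value < alpha else "fail to reject H0"

# Coin example: the z-test gave a p-value of about 0.016.
print(decide(0.016))              # reject H0
print(decide(0.016, alpha=0.01))  # fail to reject H0
```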

One-tailed vs. two-tailed tests

  • Two-tailed: tests for a difference in either direction ($H_1: p \neq 0.5$). Use this unless you have a strong prior reason to care about only one direction.
  • One-tailed: tests for a difference in a specific direction ($H_1: p > 0.5$). It has more power in that direction but is only valid if you committed to the direction before seeing the data.
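
For the coin example's $z = 2.4$, the sketch below compares the p-value each choice of alternative would produce, using standard normal tail probabilities computed with `math.erfc`. The one-tailed p-value in the predicted direction ($H_1: p > 0.5$) comes out at roughly 0.008, half the two-tailed value of roughly 0.016, which is where the extra power comes from; testing the opposite direction ($H_1: p < 0.5$) gives a p-value near 1. The function name `p_values_from_z` and its dictionary keys are hypothetical.

```python
from math import erfc, sqrt

def p_values_from_z(z: float) -> dict[str, float]:
    """One- and two-tailed p-values for a z-statistic under a standard normal null."""
    upper = 0.5 * erfc(z / sqrt(2))       # H1: p > p0, P(Z >= z)
    lower = 0.5 * erfc(-z / sqrt(2))      # H1: p < p0, P(Z <= z)
    two_tailed = erfc(abs(z) / sqrt(2))   # H1: p != p0, P(|Z| >= |z|)
    return {"upper": upper, "lower": lower, "two_tailed": two_tailed}

for name, value in p_values_from_z(2.4).items():
    print(f"{name:>10}: {value:.4f}")
```

Halving the p-value is only legitimate when the direction was chosen in advance; picking the tail after seeing that the coin landed heads 62 times would understate how surprising the data really is.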