Hypothesis testing
The basic question
Hypothesis testing is a formal procedure for deciding whether data provides enough evidence to reject a specific claim about the world. It does not prove things — it measures how surprising the data would be if a particular assumption were true.
The two hypotheses
Every test starts with two competing claims:
- Null hypothesis ($H_0$): the default assumption — usually "no effect," "no difference," or "nothing interesting is happening." Example: the new drug has no effect on blood pressure.
- Alternative hypothesis ($H_1$ or $H_a$): the claim you are trying to find evidence for. Example: the drug does lower blood pressure.
The test asks: is the data sufficiently inconsistent with $H_0$ to warrant rejecting it?
The logic
Hypothesis testing follows a proof-by-contradiction structure:
- Assume $H_0$ is true.
- Compute how likely the observed data (or something more extreme) would be under that assumption.
- If the data is very unlikely under $H_0$, reject $H_0$ in favor of $H_1$.
You never "accept" $H_0$ — you either reject it or fail to reject it. Failing to reject just means the data is not inconsistent enough with $H_0$ to rule it out.
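The three steps above can be sketched as a Monte Carlo simulation, assuming a hypothetical fair-coin null hypothesis (the function name and parameters are illustrative):

```python
import random

def simulated_tail_probability(observed_heads, n_flips=100, trials=10_000, seed=0):
    """Estimate how often a fair coin (H0) produces a result at least as
    far from the expected count as the one observed."""
    rng = random.Random(seed)
    expected = n_flips / 2
    observed_deviation = abs(observed_heads - expected)
    extreme = 0
    for _ in range(trials):
        # Step 1: assume H0 (a fair coin) and generate data under it.
        heads = sum(rng.random() < 0.5 for _ in range(n_flips))
        # Step 2: count simulated outcomes at least as extreme as the observation.
        if abs(heads - expected) >= observed_deviation:
            extreme += 1
    return extreme / trials
```

Step 3 is then a judgment call: if the returned fraction is very small, the observed data is surprising under $H_0$.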
The test statistic and p-value
A test statistic is a number computed from the data that summarizes how far the observation is from what $H_0$ predicts. Common examples: the z-statistic, t-statistic, and chi-squared statistic.
The p-value is the probability of observing a test statistic as extreme as the one computed (or more extreme), assuming $H_0$ is true. A small p-value means the data is unlikely under $H_0$.
A worked example
You want to test whether a coin is fair ($H_0\colon p = 0.5$). You flip it 100 times and get 62 heads.
The z-statistic for a proportion is:

$$z = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}} = \frac{0.62 - 0.5}{\sqrt{0.5 \cdot 0.5 / 100}} = \frac{0.12}{0.05} = 2.4$$

From a standard normal table, the two-tailed p-value for $z = 2.4$ is approximately $0.016$.
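The same arithmetic can be checked with a few lines of standard-library Python (the function name is illustrative; `math.erf` gives the standard normal CDF up to rescaling):

```python
from math import erf, sqrt

def z_test_proportion(successes, n, p0):
    """Two-tailed z-test for a proportion under the normal approximation."""
    p_hat = successes / n                      # observed proportion
    se = sqrt(p0 * (1 - p0) / n)               # standard error under H0
    z = (p_hat - p0) / se
    normal_cdf = lambda x: 0.5 * (1 + erf(x / sqrt(2)))
    p_value = 2 * (1 - normal_cdf(abs(z)))     # two-tailed
    return z, p_value

z, p = z_test_proportion(62, 100, 0.5)
# z = 2.4, p ≈ 0.016
```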
The significance level
Before conducting the test, you choose a significance level $\alpha$ — the threshold for how small the p-value must be to reject $H_0$. The most common choice is $\alpha = 0.05$.
- If $p \le \alpha$: reject $H_0$. The result is statistically significant.
- If $p > \alpha$: fail to reject $H_0$.
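The decision rule is mechanical once $\alpha$ is fixed; a minimal sketch (names illustrative):

```python
def decide(p_value, alpha=0.05):
    """Apply the significance-level decision rule."""
    return "reject H0" if p_value <= alpha else "fail to reject H0"

decide(0.016)  # the coin example's p-value falls below alpha = 0.05
```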
In the coin example: $p \approx 0.016 < 0.05$, so we reject $H_0$. The evidence suggests the coin is not fair.
One-tailed vs. two-tailed tests
- Two-tailed: tests for a difference in either direction ($H_1\colon p \ne p_0$). Use this unless you have a strong prior reason to only care about one direction.
- One-tailed: tests for a difference in a specific direction ($H_1\colon p > p_0$ or $H_1\colon p < p_0$). Has more power in that direction but is only valid if you committed to the direction before seeing the data.
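For the coin example, the two choices give different p-values from the same z-statistic. A standard-library sketch showing that, for a symmetric null distribution, the one-tailed p-value is half the two-tailed one:

```python
from math import erf, sqrt

def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + erf(x / sqrt(2)))

z = 2.4  # z-statistic from the coin example

p_two_tailed = 2 * (1 - normal_cdf(abs(z)))  # H1: p != 0.5
p_one_tailed = 1 - normal_cdf(z)             # H1: p > 0.5, committed in advance
# Halving the p-value is where the one-tailed test's extra power
# in its chosen direction comes from.
```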