The ROC Curve and AUC
Threshold-independent evaluation
All the metrics from the previous lesson — accuracy, precision, recall, F1 — depend on a chosen classification threshold. Change the threshold and all the numbers change. This makes it hard to compare two models without committing to a threshold first.
The ROC curve and AUC evaluate the model across all possible thresholds simultaneously, giving a threshold-independent view of performance.
The ROC curve
ROC stands for Receiver Operating Characteristic — a term from signal detection theory. The ROC curve is a plot with:
- x-axis: False Positive Rate (FPR) = FP / (FP + TN) — the fraction of actual negatives incorrectly predicted as positive.
- y-axis: True Positive Rate (TPR) = TP / (TP + FN) — recall; the fraction of actual positives correctly identified.
Both axes range from 0 to 1.
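The two rates can be sketched in a few lines of plain Python. The labels, scores, and the helper name `tpr_fpr` below are illustrative, not from any particular library:

```python
def tpr_fpr(y_true, y_score, threshold):
    """Return (TPR, FPR) for the rule: predict positive when score >= threshold."""
    tp = fp = fn = tn = 0
    for y, s in zip(y_true, y_score):
        pred = s >= threshold
        if y == 1 and pred:
            tp += 1          # actual positive, caught
        elif y == 1:
            fn += 1          # actual positive, missed
        elif pred:
            fp += 1          # actual negative, falsely flagged
        else:
            tn += 1          # actual negative, correctly rejected
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr

# Made-up example: three positives, three negatives
y_true = [1, 1, 1, 0, 0, 0]
y_score = [0.9, 0.7, 0.4, 0.6, 0.3, 0.1]
print(tpr_fpr(y_true, y_score, 0.5))  # (0.666..., 0.333...): 2/3 positives caught, 1/3 negatives flagged
```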
How the curve is constructed
- Sort all test examples by predicted probability, from highest to lowest.
- Start with a threshold above the highest score: predict everything negative. TPR = 0, FPR = 0. This is the bottom-left point (0, 0).
- Lower the threshold one step at a time. Each time a positive example crosses the threshold (TPR rises) or a negative example crosses it (FPR rises), plot the new point.
- At a threshold of 0: predict everything positive. TPR = 1, FPR = 1. This is the top-right point (1, 1).
The resulting curve connects (0, 0) to (1, 1) through a series of steps.
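The sweep above can be sketched directly. This is a minimal illustration (the function name `roc_points` is made up, and it ignores tied scores for simplicity — real implementations move all tied examples together):

```python
def roc_points(y_true, y_score):
    """Return the list of (FPR, TPR) points from the threshold sweep."""
    # Sort examples by score, highest first
    pairs = sorted(zip(y_score, y_true), reverse=True)
    P = sum(y_true)              # number of actual positives
    N = len(y_true) - P          # number of actual negatives
    tp = fp = 0
    points = [(0.0, 0.0)]        # threshold above every score: all negative
    for _, y in pairs:
        if y == 1:
            tp += 1              # a positive crosses the threshold: TPR rises
        else:
            fp += 1              # a negative crosses the threshold: FPR rises
        points.append((fp / N, tp / P))
    return points                # last point is (1.0, 1.0): all positive

print(roc_points([1, 0, 1, 0], [0.8, 0.6, 0.4, 0.2]))
# [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
```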
Interpreting the ROC curve
- Perfect classifier: the curve goes straight up the left edge (TPR = 1 at FPR = 0), then right along the top. Hugs the top-left corner.
- Random classifier: the curve follows the diagonal from (0, 0) to (1, 1). A random guess gives TPR = FPR at every threshold.
- Worse than random: the curve falls below the diagonal. This means the model's ranking is systematically wrong — flipping all predictions would improve it.
A model's ROC curve should be as far above the diagonal as possible, bowing toward the top-left corner.
AUC: Area Under the Curve
AUC (Area Under the ROC Curve) summarizes the entire ROC curve as a single number between 0 and 1:
| AUC value | Interpretation |
|---|---|
| 1.0 | Perfect classifier |
| 0.9–1.0 | Excellent |
| 0.8–0.9 | Good |
| 0.7–0.8 | Fair |
| 0.5 | Random classifier (no skill) |
| < 0.5 | Worse than random |
Probabilistic interpretation
AUC has an elegant interpretation: it equals the probability that the model will rank a randomly chosen positive example higher than a randomly chosen negative example:

AUC = P(score(x⁺) > score(x⁻)), for a randomly chosen positive x⁺ and negative x⁻.

This makes AUC a direct measure of the model's ranking quality, independent of any threshold.
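The probabilistic interpretation can be checked directly by counting pairs. A small sketch (the name `auc_pairwise` and the data are illustrative; ties conventionally count as half a win):

```python
def auc_pairwise(y_true, y_score):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    # A pair is a "win" if the positive outscores the negative; ties score 0.5
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y_true = [1, 1, 1, 0, 0, 0]
y_score = [0.9, 0.7, 0.4, 0.6, 0.3, 0.1]
print(auc_pairwise(y_true, y_score))  # 8 of 9 pairs ranked correctly: 0.888...
```

For large datasets this O(P·N) pair count is impractical; production implementations get the same number as the area under the stepwise ROC curve.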
AUC vs. F1: when to use which
| | AUC | F1 |
|---|---|---|
| Threshold required | No | Yes |
| Measures | Ranking quality | Classification quality at one threshold |
| Sensitive to class imbalance | Less so | More so |
| Best for | Comparing models | Operational decisions |
Use AUC when comparing models or when the threshold will be tuned later. Use F1 (or precision/recall) when you have a specific threshold and operational requirements.
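The threshold-dependence contrast is easy to demonstrate: on fixed scores, F1 moves as the threshold moves, while the ranking (and hence AUC) never changes. A sketch with made-up data and an illustrative helper `f1_at`:

```python
def f1_at(y_true, y_score, t):
    """F1 for the rule: predict positive when score >= t."""
    tp = sum(1 for y, s in zip(y_true, y_score) if y == 1 and s >= t)
    fp = sum(1 for y, s in zip(y_true, y_score) if y == 0 and s >= t)
    fn = sum(1 for y, s in zip(y_true, y_score) if y == 1 and s < t)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

y_true = [1, 1, 1, 0, 0, 0]
y_score = [0.9, 0.7, 0.4, 0.6, 0.3, 0.1]
for t in (0.3, 0.5, 0.8):
    print(f"threshold {t}: F1 = {f1_at(y_true, y_score, t):.3f}")
# threshold 0.3: F1 = 0.750
# threshold 0.5: F1 = 0.667
# threshold 0.8: F1 = 0.500
# The scores' ranking, and therefore AUC, is identical in all three cases.
```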
Precision-Recall curve
For highly imbalanced datasets (e.g. 1% positive examples), the ROC curve can look optimistic because FPR is naturally low when negatives are abundant. The Precision-Recall (PR) curve — plotting precision on the y-axis against recall on the x-axis — is more informative in this setting.
The area under the PR curve (Average Precision, AP) is the analogous summary metric. When positive examples are rare, prefer AP over AUC-ROC as your headline metric.
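One common formulation of AP is the step-wise sum AP = Σₖ (Rₖ − Rₖ₋₁) · Pₖ over the ranked examples, where Pₖ and Rₖ are precision and recall at rank k. A minimal sketch under that formulation (the name `average_precision` is illustrative, and tied scores are ignored for simplicity):

```python
def average_precision(y_true, y_score):
    """AP = sum over positives of (recall increase) * (precision at that rank)."""
    pairs = sorted(zip(y_score, y_true), reverse=True)  # highest score first
    P = sum(y_true)              # total actual positives
    tp = 0
    ap = 0.0
    prev_recall = 0.0
    for rank, (_, y) in enumerate(pairs, start=1):
        if y == 1:               # recall only increases at positives
            tp += 1
            precision = tp / rank
            recall = tp / P
            ap += (recall - prev_recall) * precision
            prev_recall = recall
    return ap

print(average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.6]))  # 0.8333...
```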