The ROC Curve and AUC
Threshold-independent evaluation
All the metrics from the previous lesson — accuracy, precision, recall, F1 — depend on a chosen classification threshold. Change the threshold and all the numbers change. This makes it hard to compare two models without committing to a threshold first.
The ROC curve and AUC evaluate the model across all possible thresholds simultaneously, giving a threshold-independent view of performance.
The ROC curve
ROC stands for Receiver Operating Characteristic — a term from signal detection theory. The ROC curve is a plot with:
- x-axis: False Positive Rate (FPR) = FP / (FP + TN) — the fraction of actual negatives incorrectly predicted as positive.
- y-axis: True Positive Rate (TPR) = TP / (TP + FN) — recall; the fraction of actual positives correctly identified.
Both axes range from 0 to 1.
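The two rates can be sketched in a few lines of plain Python. The labels, scores, and the helper name `tpr_fpr` below are illustrative, not from any particular library:

```python
def tpr_fpr(y_true, y_score, threshold):
    """Return (TPR, FPR) for the rule: predict positive when score >= threshold."""
    tp = fp = fn = tn = 0
    for y, s in zip(y_true, y_score):
        pred = s >= threshold
        if y == 1 and pred:
            tp += 1          # actual positive, caught
        elif y == 1:
            fn += 1          # actual positive, missed
        elif pred:
            fp += 1          # actual negative, falsely flagged
        else:
            tn += 1          # actual negative, correctly rejected
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr

# Made-up example: three positives, three negatives
y_true = [1, 1, 1, 0, 0, 0]
y_score = [0.9, 0.7, 0.4, 0.6, 0.3, 0.1]
print(tpr_fpr(y_true, y_score, 0.5))  # (0.666..., 0.333...): 2/3 positives caught, 1/3 negatives flagged
```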
How the curve is constructed
- Sort all test examples by predicted probability, from highest to lowest.
- Start with a threshold above the highest score: predict everything negative. TPR = 0, FPR = 0. This is the bottom-left point (0, 0).
- Lower the threshold one step at a time. Each time a positive example crosses the threshold (TPR rises) or a negative example crosses it (FPR rises), plot the new point.
- At a threshold of 0: predict everything positive. TPR = 1, FPR = 1. This is the top-right point (1, 1).
The resulting curve connects (0, 0) to (1, 1) through a series of steps.
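The sweep above can be sketched directly. This is a minimal illustration (the function name `roc_points` is made up, and it ignores tied scores for simplicity — real implementations move all tied examples together):

```python
def roc_points(y_true, y_score):
    """Return the list of (FPR, TPR) points from the threshold sweep."""
    # Sort examples by score, highest first
    pairs = sorted(zip(y_score, y_true), reverse=True)
    P = sum(y_true)              # number of actual positives
    N = len(y_true) - P          # number of actual negatives
    tp = fp = 0
    points = [(0.0, 0.0)]        # threshold above every score: all negative
    for _, y in pairs:
        if y == 1:
            tp += 1              # a positive crosses the threshold: TPR rises
        else:
            fp += 1              # a negative crosses the threshold: FPR rises
        points.append((fp / N, tp / P))
    return points                # last point is (1.0, 1.0): all positive

print(roc_points([1, 0, 1, 0], [0.8, 0.6, 0.4, 0.2]))
# [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
```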
Interpreting the ROC curve
- Perfect classifier: the curve goes straight up the left edge (TPR = 1 at FPR = 0), then right along the top. Hugs the top-left corner.
- Random classifier: the curve follows the diagonal from (0, 0) to (1, 1). A random guess gives TPR = FPR at every threshold.
- Worse than random: the curve falls below the diagonal. This means the model's ranking is systematically wrong — flipping all predictions would improve it.
A model's ROC curve should be as far above the diagonal as possible, bowing toward the top-left corner.
AUC: Area Under the Curve
AUC (Area Under the ROC Curve) summarizes the entire ROC curve as a single number between 0 and 1:
| AUC value | Interpretation |
|---|---|
| 1.0 | Perfect classifier |
| 0.9–1.0 | Excellent |
| 0.8–0.9 | Good |
| 0.7–0.8 | Fair |
| 0.5 | Random classifier (no skill) |
| < 0.5 | Worse than random |
Probabilistic interpretation
AUC has an elegant interpretation: it equals the probability that the model will rank a randomly chosen positive example higher than a randomly chosen negative example:

AUC = P(score(x⁺) > score(x⁻)), for a randomly chosen positive x⁺ and negative x⁻.

This makes AUC a direct measure of the model's ranking quality, independent of any threshold.
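The probabilistic interpretation can be checked directly by counting pairs. A small sketch (the name `auc_pairwise` and the data are illustrative; ties conventionally count as half a win):

```python
def auc_pairwise(y_true, y_score):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    # A pair is a "win" if the positive outscores the negative; ties score 0.5
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y_true = [1, 1, 1, 0, 0, 0]
y_score = [0.9, 0.7, 0.4, 0.6, 0.3, 0.1]
print(auc_pairwise(y_true, y_score))  # 8 of 9 pairs ranked correctly: 0.888...
```

For large datasets this O(P·N) pair count is impractical; production implementations get the same number as the area under the stepwise ROC curve.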
AUC vs. F1: when to use which
| | AUC | F1 |
|---|---|---|
| Threshold required | No | Yes |
| Measures | Ranking quality | Classification quality at one threshold |
| Sensitive to class imbalance | Less so | More so |
| Best for | Comparing models | Operational decisions |
Use AUC when comparing models or when the threshold will be tuned later. Use F1 (or precision/recall) when you have a specific threshold and operational requirements.
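The threshold-dependence contrast is easy to demonstrate: on fixed scores, F1 moves as the threshold moves, while the ranking (and hence AUC) never changes. A sketch with made-up data and an illustrative helper `f1_at`:

```python
def f1_at(y_true, y_score, t):
    """F1 for the rule: predict positive when score >= t."""
    tp = sum(1 for y, s in zip(y_true, y_score) if y == 1 and s >= t)
    fp = sum(1 for y, s in zip(y_true, y_score) if y == 0 and s >= t)
    fn = sum(1 for y, s in zip(y_true, y_score) if y == 1 and s < t)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

y_true = [1, 1, 1, 0, 0, 0]
y_score = [0.9, 0.7, 0.4, 0.6, 0.3, 0.1]
for t in (0.3, 0.5, 0.8):
    print(f"threshold {t}: F1 = {f1_at(y_true, y_score, t):.3f}")
# threshold 0.3: F1 = 0.750
# threshold 0.5: F1 = 0.667
# threshold 0.8: F1 = 0.500
# The scores' ranking, and therefore AUC, is identical in all three cases.
```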
Precision-Recall curve
For highly imbalanced datasets (e.g. 1% positive examples), the ROC curve can look optimistic because FPR is naturally low when negatives are abundant. The Precision-Recall (PR) curve — plotting precision on the y-axis against recall on the x-axis — is more informative in this setting.
The area under the PR curve (Average Precision, AP) is the analogous summary metric. When positive examples are rare, prefer AP over AUC-ROC as your headline metric.
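One common formulation of AP is the step-wise sum AP = Σₖ (Rₖ − Rₖ₋₁) · Pₖ over the ranked examples, where Pₖ and Rₖ are precision and recall at rank k. A minimal sketch under that formulation (the name `average_precision` is illustrative, and tied scores are ignored for simplicity):

```python
def average_precision(y_true, y_score):
    """AP = sum over positives of (recall increase) * (precision at that rank)."""
    pairs = sorted(zip(y_score, y_true), reverse=True)  # highest score first
    P = sum(y_true)              # total actual positives
    tp = 0
    ap = 0.0
    prev_recall = 0.0
    for rank, (_, y) in enumerate(pairs, start=1):
        if y == 1:               # recall only increases at positives
            tp += 1
            precision = tp / rank
            recall = tp / P
            ap += (recall - prev_recall) * precision
            prev_recall = recall
    return ap

print(average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.6]))  # 0.8333...
```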