Bias-variance tradeoff
Two sources of prediction error
When a machine learning model makes predictions, its errors come from two distinct sources: bias and variance. Understanding both — and the tension between them — is one of the most fundamental ideas in ML.
Bias
Bias is the error from incorrect assumptions in the model. A high-bias model is too simple: it misses the true patterns in the data regardless of how much training data you give it.
Imagine trying to fit a curved relationship with a straight line. No matter how much data you collect, the line will never capture the curve. The model is systematically wrong — it under-predicts in some regions and over-predicts in others. This is underfitting.
Examples of high-bias models: linear regression applied to non-linear data, a decision tree with depth 1.
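The straight-line-through-a-curve picture above can be checked numerically. Below is a minimal sketch in pure Python (toy quadratic data and noise levels chosen for illustration): a closed-form least-squares line is fit to y = x² data, and the systematic error at x = 1 stays large no matter how big the training set gets.

```python
import random

random.seed(0)

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b (closed form)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    b = my - a * mx
    return a, b

# True relationship is quadratic; a line cannot represent it.
def true_f(x):
    return x * x

for n in (20, 20000):  # more data does not shrink the bias
    xs = [random.uniform(-1, 1) for _ in range(n)]
    ys = [true_f(x) + random.gauss(0, 0.05) for x in xs]
    a, b = fit_line(xs, ys)
    # Systematic error at x = 1: the line predicts a + b, the truth is 1.
    print(n, round(abs((a * 1 + b) - true_f(1)), 2))
```

On symmetric data the best line is nearly flat (slope ≈ 0, intercept ≈ the mean of y), so its error at the edges of the range is large for both the small and the huge dataset: that persistent gap is the bias.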
Variance
Variance is the error from sensitivity to fluctuations in the training data. A high-variance model learns the training data too specifically — including its noise — and falls apart on new data.
If you trained the same model on two slightly different datasets and got very different predictions, that is high variance. The model is memorizing quirks of the training set rather than the underlying pattern. This is overfitting.
Examples of high-variance models: an unpruned decision tree, a nearest-neighbor classifier with a very small k (for example, k = 1).
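The "two slightly different datasets, very different predictions" test can be run directly. This sketch (toy 1-D regression data, 1-nearest-neighbor, all numbers illustrative) draws two training sets from the same distribution and compares their predictions at the same query points:

```python
import random

random.seed(1)

def nn_predict(train, x):
    """1-nearest-neighbor regression: return the label of the closest point."""
    return min(train, key=lambda p: abs(p[0] - x))[1]

def make_dataset():
    # Same underlying pattern (y = x), different noise draws.
    xs = [random.uniform(0, 1) for _ in range(30)]
    return [(x, x + random.gauss(0, 0.3)) for x in xs]

train_a = make_dataset()
train_b = make_dataset()

# Two samples of the same distribution give noticeably different
# predictions at identical query points: that spread is the variance.
queries = [0.1, 0.5, 0.9]
spread = [abs(nn_predict(train_a, q) - nn_predict(train_b, q)) for q in queries]
print(spread)
```

Because 1-NN copies a single noisy label verbatim, the prediction inherits all of that point's noise, so swapping training sets moves the predictions around substantially.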
The tradeoff
Bias and variance pull in opposite directions as model complexity changes:
- Increase complexity (more parameters, deeper trees, more features): bias decreases, variance increases.
- Decrease complexity (simpler model, regularization): variance decreases, bias increases.
The total expected squared error decomposes as:

    expected error = bias² + variance + irreducible noise
The irreducible noise is the inherent randomness in the data — no model can eliminate it. The goal of model selection and tuning is to find the complexity sweet spot that minimizes bias² + variance.
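The decomposition can be estimated by simulation: retrain the model on many independent training sets, then measure how far the *average* prediction sits from the truth (bias²) and how much individual predictions scatter around that average (variance). A sketch with a deliberately high-bias toy model (predict the training mean everywhere; data and constants are illustrative):

```python
import random

random.seed(2)

def true_f(x):
    return x * x

def fit_mean(ys):
    """A very simple (high-bias) model: predict the training mean everywhere."""
    return sum(ys) / len(ys)

# Retrain on many independent training sets; record the prediction at x0.
x0 = 0.9
preds = []
for _ in range(2000):
    xs = [random.uniform(-1, 1) for _ in range(50)]
    ys = [true_f(x) + random.gauss(0, 0.1) for x in xs]
    preds.append(fit_mean(ys))

avg = sum(preds) / len(preds)
bias_sq = (avg - true_f(x0)) ** 2                            # (E[prediction] - truth)²
variance = sum((p - avg) ** 2 for p in preds) / len(preds)   # scatter of predictions
print(round(bias_sq, 3), round(variance, 4))
```

For this model the bias² term dominates: the average prediction is far from the truth at x0, while retraining barely changes the prediction. A flexible model would show the opposite pattern.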
Visualizing the tradeoff
Think of a U-shaped curve with model complexity on the x-axis and test error on the y-axis. On the left (too simple): high bias dominates. On the right (too complex): high variance dominates. The bottom of the U is the optimal complexity.
Training error, by contrast, keeps falling as complexity increases — it never shows the right side of the U. This is why you need a validation set: to see the full U-shaped curve.
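Both curves can be traced with k-nearest-neighbor regression, where small k means high complexity and large k means a very simple model. A sketch (pure Python, toy quadratic data, noise level illustrative) that prints training and validation error at three complexity levels:

```python
import random

random.seed(3)

def knn_predict(train, x, k):
    """k-NN regression: average the labels of the k closest training points."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k

def mse(data, train, k):
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in data) / len(data)

def noisy(x):
    return x * x + random.gauss(0, 0.2)

train = [(x, noisy(x)) for x in [random.uniform(0, 1) for _ in range(40)]]
valid = [(x, noisy(x)) for x in [random.uniform(0, 1) for _ in range(40)]]

# Small k = high complexity (right side of the U), large k = high bias (left side).
for k in (1, 5, 40):
    print(k, round(mse(train, train, k), 3), round(mse(valid, train, k), 3))
```

At k = 1 the training error is exactly zero (each point is its own nearest neighbor), which is precisely why training error alone cannot reveal the right side of the U; only the validation column shows it.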
Practical implications
- If your model has high training error and high test error: likely high bias — try a more complex model, add features.
- If your model has low training error but high test error: likely high variance — regularize, get more data, simplify.
- More data primarily helps with variance, not bias. If the model is fundamentally misspecified (high bias), more data will not fix it.
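The first two bullets amount to a small decision rule. A toy heuristic (function name and thresholds are invented for illustration, not standard values):

```python
def diagnose(train_err, test_err, target_err=0.05, gap_tol=0.02):
    """Toy heuristic mapping error levels to the likely failure mode.
    Thresholds are illustrative, not standard values."""
    if train_err > target_err:
        return "high bias: try a more complex model or add features"
    if test_err - train_err > gap_tol:
        return "high variance: regularize, simplify, or get more data"
    return "looks balanced"

print(diagnose(0.20, 0.22))  # poor fit even on training data
print(diagnose(0.01, 0.15))  # memorizing the training set
```

In practice "acceptable" error and "large" gap depend on the problem, so the thresholds would come from a baseline or domain knowledge rather than fixed constants.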