RMSE / MAE
Measuring prediction error in original units
Regression models predict continuous values. The most natural way to evaluate them is to measure how far the predictions are from the true values — the residuals.
Two metrics dominate: Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE). Both are in the same units as the target variable, making them directly interpretable.
Mean Absolute Error (MAE)
MAE is the average absolute error — how far off the model is, on average, ignoring the direction.
Interpretation: an MAE of 12 means the model's predictions are off by 12 units on average.
Key property: MAE treats all errors proportionally. A prediction that is off by 100 contributes exactly 100 times more than a prediction off by 1. There is no extra penalty for large errors.
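As a minimal sketch (the arrays below are illustrative values, not from the text), MAE is just the mean of the absolute residuals:

```python
import numpy as np

y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

# MAE: average of the absolute residuals
mae = np.mean(np.abs(y_true - y_pred))
print(mae)  # 0.75 — the model is off by 0.75 units on average
```

The same value is available as `sklearn.metrics.mean_absolute_error(y_true, y_pred)` if scikit-learn is installed.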
Root Mean Squared Error (RMSE)
RMSE takes the square root of the mean squared error, returning units to the original scale.
Interpretation: an RMSE of 15 means the typical prediction error is around 15 units — but with extra weight on larger errors.
Key property: RMSE penalizes large errors disproportionately. Squaring the errors means a prediction off by 10 contributes 100 to the average, while one off by 1 contributes 1. This makes RMSE more sensitive to outliers than MAE.
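A matching sketch for RMSE, using the same illustrative arrays: square the residuals, average, then take the square root to return to the target's units.

```python
import numpy as np

y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

# RMSE: square root of the mean squared residual
rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
print(rmse)  # ~0.935, slightly above the MAE of 0.75 for the same data
```

RMSE is never smaller than MAE on the same data; the gap widens as the error distribution gets more uneven.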
Comparing the two
| | MAE | RMSE |
|---|---|---|
| Units | Same as target | Same as target |
| Outlier sensitivity | Low | High |
| When to prefer | Outliers are common, errors roughly equal cost | Large errors are especially costly |
| Optimization target | L1 loss | L2 loss (MSE) |
| Interpretability | More intuitive | Harder to interpret directly |
When MAE is preferable: if your training data has outliers and you do not want a few extreme examples to dominate the metric, MAE gives a fairer picture of typical performance. A classic example is house price prediction where the dataset contains a few unusually priced mansions.
When RMSE is preferable: when large errors are particularly bad and you want the metric to reflect this. Forecasting energy demand where large misses cause grid instability, for example — a single 1000 MW miss is much worse than ten separate 100 MW misses.
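The outlier sensitivity difference is easy to demonstrate. In this sketch (synthetic residuals, chosen for illustration), nine predictions are off by 1 unit and one is off by 100:

```python
import numpy as np

def mae(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))

def rmse(y_true, y_pred):
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

# nine small residuals and one large outlier
y_true = np.zeros(10)
y_pred = np.array([1.0] * 9 + [100.0])

print(mae(y_true, y_pred))   # 10.9  — the outlier shifts MAE moderately
print(rmse(y_true, y_pred))  # ~31.6 — RMSE is dominated by the single large error
```

The typical prediction here is off by 1 unit, yet RMSE reports roughly 31.6 — almost entirely driven by one point. MAE's 10.9 is also inflated, but far less dramatically.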
Mean Absolute Percentage Error (MAPE)
When the target spans multiple orders of magnitude, or when relative error matters more than absolute error, MAPE is useful:

MAPE = (100% / n) × Σ |y_i − ŷ_i| / |y_i|
An MAPE of 8% means predictions are off by 8% of the true value on average. This is scale-independent — a 10-unit error on a target of 100 and a 1000-unit error on a target of 10,000 are treated equally.
Watch out: MAPE is undefined when a true value is exactly zero and becomes unstable when true values are near zero. It also penalizes over-predictions more heavily than under-predictions of the same absolute size: an under-prediction's percentage error is capped at 100% (predicting zero), while an over-prediction's percentage error is unbounded.
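A minimal MAPE sketch that guards against the zero-denominator problem. The `eps` clipping is one common workaround (an assumption here, not a universal convention — other options include dropping zero-valued targets or switching to a symmetric variant like sMAPE):

```python
import numpy as np

def mape(y_true, y_pred, eps=1e-8):
    """Mean Absolute Percentage Error, in percent.

    eps clips near-zero true values so the division stays finite;
    entries where y_true is exactly 0 are otherwise undefined.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    denom = np.maximum(np.abs(y_true), eps)
    return 100.0 * np.mean(np.abs(y_true - y_pred) / denom)

# scale independence: a 10-unit miss on 100 and a 1000-unit miss
# on 10,000 are both 10% errors
print(mape([100, 10_000], [110, 9_000]))  # 10.0
```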
A practical note
RMSE is the most widely reported regression metric in papers and competitions. MAE is often more useful in practice for communicating to stakeholders. Report both when in doubt — they highlight different aspects of the model's error distribution.