Ordinary Least Squares

What is OLS?

Ordinary Least Squares (OLS) is the classical method for fitting a linear regression model. It finds the parameter values $\theta_0$ and $\theta_1$ that minimize the sum of squared residuals — exactly the MSE cost function from the previous lesson — by solving a system of equations analytically (with a formula) rather than iteratively.

Because it gives you the exact minimum in one step, OLS is often called a closed-form solution.

Deriving the OLS formulas

To minimize $J(\theta_0, \theta_1)$, take the partial derivatives with respect to each parameter, set them to zero, and solve.

Starting from:

$$J = \frac{1}{2m} \sum_{i=1}^{m} (\theta_0 + \theta_1 x^{(i)} - y^{(i)})^2$$

Setting $\frac{\partial J}{\partial \theta_0} = 0$ yields:

$$\theta_0 = \bar{y} - \theta_1 \bar{x}$$

where $\bar{x} = \frac{1}{m}\sum x^{(i)}$ and $\bar{y} = \frac{1}{m}\sum y^{(i)}$ are the sample means.
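For completeness, here is the intermediate step for the intercept, using the same symbols as above:

$$\frac{\partial J}{\partial \theta_0} = \frac{1}{m}\sum_{i=1}^{m}\left(\theta_0 + \theta_1 x^{(i)} - y^{(i)}\right) = 0 \;\Longrightarrow\; \theta_0 + \theta_1 \bar{x} - \bar{y} = 0 \;\Longrightarrow\; \theta_0 = \bar{y} - \theta_1 \bar{x}$$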

Setting $\frac{\partial J}{\partial \theta_1} = 0$ and substituting the expression for $\theta_0$ yields:

$$\theta_1 = \frac{\sum_{i=1}^{m}(x^{(i)} - \bar{x})(y^{(i)} - \bar{y})}{\sum_{i=1}^{m}(x^{(i)} - \bar{x})^2}$$

These two formulas together are the OLS estimators. Plug in your data, compute the means, and you immediately have the best-fit slope and intercept.
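The two estimator formulas translate directly into code. Below is a minimal sketch in plain Python (the function name `ols_fit` is just for illustration):

```python
def ols_fit(xs, ys):
    """Return (theta0, theta1) minimizing the sum of squared residuals."""
    m = len(xs)
    x_bar = sum(xs) / m
    y_bar = sum(ys) / m
    # Slope: sum of cross-deviations over sum of squared x-deviations
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    theta1 = num / den
    # Intercept: theta0 = y_bar - theta1 * x_bar
    theta0 = y_bar - theta1 * x_bar
    return theta0, theta1
```

No iteration, no learning rate: two passes over the data and the fit is done.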

Worked example

Suppose you have four training examples:

| x (size, sq ft) | y (price, $000) |
| --- | --- |
| 1000 | 200 |
| 1500 | 260 |
| 2000 | 330 |
| 2500 | 380 |

Step 1 — compute means:

$$\bar{x} = \frac{1000+1500+2000+2500}{4} = 1750, \qquad \bar{y} = \frac{200+260+330+380}{4} = 292.5$$

Step 2 — compute $\theta_1$:

$$\theta_1 = \frac{(1000-1750)(200-292.5)+(1500-1750)(260-292.5)+\cdots}{(1000-1750)^2+(1500-1750)^2+\cdots}$$

Numerator: $(-750)(-92.5)+(-250)(-32.5)+(250)(37.5)+(750)(87.5) = 69375+8125+9375+65625 = 152500$

Denominator: $562500+62500+62500+562500 = 1250000$

$$\theta_1 = \frac{152500}{1250000} = 0.122$$

Step 3 — compute $\theta_0$:

$$\theta_0 = 292.5 - 0.122 \times 1750 = 292.5 - 213.5 = 79.0$$

Result: $\hat{y} = 79.0 + 0.122\,x$
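The arithmetic above is easy to double-check with a few lines of throwaway Python:

```python
xs = [1000, 1500, 2000, 2500]
ys = [200, 260, 330, 380]
x_bar = sum(xs) / len(xs)   # 1750.0
y_bar = sum(ys) / len(ys)   # 292.5
num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))  # 152500.0
den = sum((x - x_bar) ** 2 for x in xs)                       # 1250000.0
theta1 = num / den            # 0.122
theta0 = y_bar - theta1 * x_bar
# Predict the price of a hypothetical 1800 sq ft house with the fitted line
price_1800 = theta0 + theta1 * 1800
```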

Properties of OLS estimators

Under standard assumptions (covered in the Assumptions lesson), OLS estimators come with a strong statistical guarantee, the Gauss–Markov theorem: they are the Best Linear Unbiased Estimators (BLUE). This means:

  • Unbiased: on average, $\theta_1$ equals the true slope.
  • Minimum variance: no other linear unbiased estimator has lower variance.
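Unbiasedness is easy to see in a small simulation. The sketch below (all numbers here are arbitrary choices for illustration) repeatedly draws noisy datasets from a known line and refits the slope; averaged over many draws, the estimates center on the true value:

```python
import random

random.seed(0)

def fit_slope(xs, ys):
    # OLS slope: sum of cross-deviations over sum of squared x-deviations
    m = len(xs)
    x_bar, y_bar = sum(xs) / m, sum(ys) / m
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den

true_intercept, true_slope = 5.0, 2.0
xs = [float(i) for i in range(20)]

# Draw 2000 noisy datasets from y = 5 + 2x + Gaussian noise and refit each
slopes = []
for _ in range(2000):
    ys = [true_intercept + true_slope * x + random.gauss(0, 3) for x in xs]
    slopes.append(fit_slope(xs, ys))

avg = sum(slopes) / len(slopes)
print(avg)  # typically within about 0.01 of the true slope 2.0
```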

Limitations of the simple formulas

The formulas above work perfectly for simple linear regression (one feature). When you have multiple features, OLS generalizes to the Normal Equation in matrix form — covered in the Multiple Features lesson.
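As a small preview of that matrix form (a sketch, not the full treatment), the same worked example can be solved by stacking the data into a design matrix $X$ with a column of ones for the intercept and solving the Normal Equation $X^\top X \theta = X^\top y$:

```python
import numpy as np

# Design matrix: column of ones (intercept) plus the feature column
X = np.array([[1.0, 1000.0], [1.0, 1500.0], [1.0, 2000.0], [1.0, 2500.0]])
y = np.array([200.0, 260.0, 330.0, 380.0])

# Normal Equation: solve (X^T X) theta = X^T y
theta = np.linalg.solve(X.T @ X, X.T @ y)
print(theta)  # approximately [79.0, 0.122], matching the worked example
```

Solving the linear system with `np.linalg.solve` is numerically safer than explicitly inverting $X^\top X$.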

Key takeaway

OLS gives you the exact best-fit parameters in one calculation. There is no approximation, no iteration, and no learning rate to tune. For small to medium datasets with a modest number of features, it is the preferred approach.