Ordinary Least Squares

What is OLS?

Ordinary Least Squares (OLS) is the classical method for fitting a linear regression model. It finds the parameter values $\theta_0$ and $\theta_1$ that minimize the sum of squared residuals — exactly the MSE cost function from the previous lesson — by solving a system of equations analytically (with a formula) rather than iteratively.

Because it gives you the exact minimum in one step, OLS is often called a closed-form solution.

Deriving the OLS formulas

To minimize $J(\theta_0, \theta_1)$, take the partial derivatives with respect to each parameter, set them to zero, and solve.

Starting from:

$$J = \frac{1}{2m} \sum_{i=1}^{m} (\theta_0 + \theta_1 x^{(i)} - y^{(i)})^2$$

Setting $\frac{\partial J}{\partial \theta_0} = 0$ yields:

$$\theta_0 = \bar{y} - \theta_1 \bar{x}$$

where $\bar{x} = \frac{1}{m}\sum x^{(i)}$ and $\bar{y} = \frac{1}{m}\sum y^{(i)}$ are the sample means.
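For completeness, here is the intermediate step for the intercept, using the same symbols as above:

$$\frac{\partial J}{\partial \theta_0} = \frac{1}{m}\sum_{i=1}^{m}\left(\theta_0 + \theta_1 x^{(i)} - y^{(i)}\right) = 0 \;\Longrightarrow\; \theta_0 + \theta_1 \bar{x} - \bar{y} = 0 \;\Longrightarrow\; \theta_0 = \bar{y} - \theta_1 \bar{x}$$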

Setting $\frac{\partial J}{\partial \theta_1} = 0$ and substituting the expression for $\theta_0$ yields:

$$\theta_1 = \frac{\sum_{i=1}^{m}(x^{(i)} - \bar{x})(y^{(i)} - \bar{y})}{\sum_{i=1}^{m}(x^{(i)} - \bar{x})^2}$$

These two formulas together are the OLS estimators. Plug in your data, compute the means, and you immediately have the best-fit slope and intercept.
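The two estimator formulas translate directly into code. Below is a minimal sketch in plain Python (the function name `ols_fit` is just for illustration):

```python
def ols_fit(xs, ys):
    """Return (theta0, theta1) minimizing the sum of squared residuals."""
    m = len(xs)
    x_bar = sum(xs) / m
    y_bar = sum(ys) / m
    # Slope: sum of cross-deviations over sum of squared x-deviations
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    theta1 = num / den
    # Intercept: theta0 = y_bar - theta1 * x_bar
    theta0 = y_bar - theta1 * x_bar
    return theta0, theta1
```

No iteration, no learning rate: two passes over the data and the fit is done.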

Worked example

Suppose you have four training examples:

| x (size, sq ft) | y (price, $000) |
| --- | --- |
| 1000 | 200 |
| 1500 | 260 |
| 2000 | 330 |
| 2500 | 380 |

Step 1 — compute means:

$$\bar{x} = \frac{1000+1500+2000+2500}{4} = 1750, \qquad \bar{y} = \frac{200+260+330+380}{4} = 292.5$$

Step 2 — compute $\theta_1$:

$$\theta_1 = \frac{(1000-1750)(200-292.5)+(1500-1750)(260-292.5)+\cdots}{(1000-1750)^2+(1500-1750)^2+\cdots}$$

Numerator: $(-750)(-92.5)+(-250)(-32.5)+(250)(37.5)+(750)(87.5) = 69375+8125+9375+65625 = 152500$

Denominator: $562500+62500+62500+562500 = 1250000$

$$\theta_1 = \frac{152500}{1250000} = 0.122$$

Step 3 — compute $\theta_0$:

$$\theta_0 = 292.5 - 0.122 \times 1750 = 292.5 - 213.5 = 79.0$$

Result: $\hat{y} = 79.0 + 0.122\,x$
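The arithmetic above is easy to double-check with a few lines of throwaway Python:

```python
xs = [1000, 1500, 2000, 2500]
ys = [200, 260, 330, 380]
x_bar = sum(xs) / len(xs)   # 1750.0
y_bar = sum(ys) / len(ys)   # 292.5
num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))  # 152500.0
den = sum((x - x_bar) ** 2 for x in xs)                       # 1250000.0
theta1 = num / den            # 0.122
theta0 = y_bar - theta1 * x_bar
# Predict the price of a hypothetical 1800 sq ft house with the fitted line
price_1800 = theta0 + theta1 * 1800
```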

Properties of OLS estimators

Under standard assumptions (covered in the Assumptions lesson), OLS estimators come with a strong statistical guarantee, the Gauss–Markov theorem: they are the Best Linear Unbiased Estimators (BLUE). This means:

  • Unbiased: on average, $\theta_1$ equals the true slope.
  • Minimum variance: no other linear unbiased estimator has lower variance.
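Unbiasedness is easy to see in a small simulation. The sketch below (all numbers here are arbitrary choices for illustration) repeatedly draws noisy datasets from a known line and refits the slope; averaged over many draws, the estimates center on the true value:

```python
import random

random.seed(0)

def fit_slope(xs, ys):
    # OLS slope: sum of cross-deviations over sum of squared x-deviations
    m = len(xs)
    x_bar, y_bar = sum(xs) / m, sum(ys) / m
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den

true_intercept, true_slope = 5.0, 2.0
xs = [float(i) for i in range(20)]

# Draw 2000 noisy datasets from y = 5 + 2x + Gaussian noise and refit each
slopes = []
for _ in range(2000):
    ys = [true_intercept + true_slope * x + random.gauss(0, 3) for x in xs]
    slopes.append(fit_slope(xs, ys))

avg = sum(slopes) / len(slopes)
print(avg)  # typically within about 0.01 of the true slope 2.0
```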

Limitations of the simple formulas

The formulas above work perfectly for simple linear regression (one feature). When you have multiple features, OLS generalizes to the Normal Equation in matrix form — covered in the Multiple Features lesson.
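As a small preview of that matrix form (a sketch, not the full treatment), the same worked example can be solved by stacking the data into a design matrix $X$ with a column of ones for the intercept and solving the Normal Equation $X^\top X \theta = X^\top y$:

```python
import numpy as np

# Design matrix: column of ones (intercept) plus the feature column
X = np.array([[1.0, 1000.0], [1.0, 1500.0], [1.0, 2000.0], [1.0, 2500.0]])
y = np.array([200.0, 260.0, 330.0, 380.0])

# Normal Equation: solve (X^T X) theta = X^T y
theta = np.linalg.solve(X.T @ X, X.T @ y)
print(theta)  # approximately [79.0, 0.122], matching the worked example
```

Solving the linear system with `np.linalg.solve` is numerically safer than explicitly inverting $X^\top X$.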

Key takeaway

OLS gives you the exact best-fit parameters in one calculation. There is no approximation, no iteration, and no learning rate to tune. For small to medium datasets with a modest number of features, it is the preferred approach.