The Decision Boundary

From probability to prediction

Logistic regression outputs a probability $\hat{p} \in (0, 1)$. To make a concrete class prediction, a threshold $t$ is applied:

$$\hat{y} = \begin{cases} 1 & \text{if } \hat{p} \geq t \\ 0 & \text{if } \hat{p} < t \end{cases}$$

The default threshold is $t = 0.5$, which corresponds to predicting whichever class is more probable. The decision boundary is the set of points where exactly $\hat{p} = t$: the dividing line between predicted class 0 and predicted class 1.
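As a minimal sketch of this thresholding step (the probability values are made up for illustration):

```python
# Turning predicted probabilities into class labels with a threshold.
# The probabilities below are illustrative, not model output.
probs = [0.12, 0.47, 0.50, 0.85]

t = 0.5  # default threshold
preds = [1 if p >= t else 0 for p in probs]
print(preds)  # p >= t maps to class 1, including the boundary case p == 0.5
```

Note that the boundary case $\hat{p} = 0.5$ is conventionally assigned to class 1, matching the $\geq$ in the rule above.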

The linear decision boundary

With threshold $t = 0.5$, the decision boundary occurs where $\hat{p} = 0.5$, which means $\sigma(z) = 0.5$, which means $z = 0$. So the boundary is defined by:

$$\theta_0 + \theta_1 x_1 + \cdots + \theta_n x_n = 0$$

This is a linear equation — it defines a straight line (in 2D), a plane (in 3D), or a hyperplane (in higher dimensions). This is why logistic regression is called a linear classifier: it separates classes with a flat boundary.
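A quick numerical check of the key step, using the standard logistic function:

```python
import math

def sigmoid(z):
    # The logistic function: sigma(z) = 1 / (1 + e^(-z))
    return 1.0 / (1.0 + math.exp(-z))

# At z = 0 the sigmoid returns exactly 0.5, so the p-hat = 0.5 boundary
# coincides with the hyperplane theta . x = 0.
print(sigmoid(0.0))                           # 0.5
print(sigmoid(2.0) > 0.5, sigmoid(-2.0) < 0.5)  # True True: sign of z decides the class
```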

Two-feature example

Suppose a model with two features $x_1$ (study hours) and $x_2$ (sleep hours) predicts whether a student passes an exam:

$$\hat{p} = \sigma(-6 + 0.8\,x_1 + 0.5\,x_2)$$

The decision boundary is where:

$$-6 + 0.8\,x_1 + 0.5\,x_2 = 0 \implies x_2 = 12 - 1.6\,x_1$$

Points above this line (more sleep relative to study hours) are predicted to pass ($\hat{y} = 1$); points below are predicted to fail ($\hat{y} = 0$).
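The example model can be checked numerically. The point $(x_1, x_2) = (5, 4)$ lies exactly on the boundary line, and moving it up or down flips the prediction:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def p_pass(study, sleep):
    # The model from the text: p-hat = sigma(-6 + 0.8*study + 0.5*sleep)
    return sigmoid(-6 + 0.8 * study + 0.5 * sleep)

# On the boundary: x1 = 5 gives x2 = 12 - 1.6*5 = 4, so z = 0 and p-hat = 0.5.
print(p_pass(5, 4))          # 0.5
# One more hour of sleep moves the point above the line -> predicted pass:
print(p_pass(5, 5) > 0.5)    # True
# One less hour of sleep moves it below -> predicted fail:
print(p_pass(5, 3) > 0.5)    # False
```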

Interpreting the coefficients

Each coefficient $\theta_j$ has a direct interpretation in terms of log-odds:

  • Increasing $x_j$ by one unit multiplies the odds of the positive class by $e^{\theta_j}$, holding all other features fixed.

For the example above, $\theta_1 = 0.8$: one additional study hour multiplies the odds of passing by $e^{0.8} \approx 2.23$, more than doubling the odds, all else equal.

This log-odds interpretation makes logistic regression highly interpretable, especially compared to black-box models.
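The odds-ratio interpretation can be verified with the example model: increasing study hours from 5 to 6 (holding sleep fixed at 4, values chosen for illustration) multiplies the odds by exactly $e^{0.8}$:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def odds(p):
    # Odds of the positive class: p / (1 - p)
    return p / (1 - p)

theta_1 = 0.8  # study-hours coefficient from the example model

p_before = sigmoid(-6 + theta_1 * 5 + 0.5 * 4)  # 5 study hours, 4 sleep hours
p_after  = sigmoid(-6 + theta_1 * 6 + 0.5 * 4)  # one additional study hour

ratio = odds(p_after) / odds(p_before)
print(ratio, math.exp(theta_1))  # both ~2.2255: odds multiplied by e^0.8
```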

Adjusting the threshold

The 0.5 threshold is not always optimal. The right threshold depends on the costs of different errors:

  • A false negative (predicting 0 when truth is 1): e.g. missing a disease.
  • A false positive (predicting 1 when truth is 0): e.g. unnecessary treatment.

If false negatives are very costly (medical diagnosis), lower the threshold to predict positive more readily. If false positives are costly (spam filtering where legitimate emails matter), raise the threshold.

Changing the threshold moves the decision boundary but does not retrain the model — the underlying probabilities stay the same.
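A small sketch, with illustrative probabilities, showing that re-thresholding changes predictions without touching the model:

```python
# Illustrative predicted probabilities from a single trained model.
probs = [0.15, 0.35, 0.55, 0.75]

def classify(probs, t):
    return [1 if p >= t else 0 for p in probs]

# Same model outputs, three different operating points:
print(classify(probs, 0.5))   # default threshold
print(classify(probs, 0.3))   # lowered: flags more positives (fewer false negatives)
print(classify(probs, 0.7))   # raised: predicts positive only when confident
```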

Non-linear decision boundaries

The decision boundary of logistic regression is always linear in the features it is given. However, you can create non-linear boundaries in the original input space by adding engineered features such as:

  • Polynomial features: $x_1^2$, $x_2^2$, $x_1 x_2$
  • Interaction terms: products of pairs of features
  • Transformations: $\log x_1$, $\sqrt{x_2}$

For example, adding $x_1^2$ and $x_2^2$ allows the decision boundary to be an ellipse. The model remains logistic regression; only the feature set has changed.
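A minimal sketch with hypothetical coefficients: feeding squared features into the same sigmoid, with $z = 4 - x_1^2 - x_2^2$, produces a circular boundary of radius 2 in the original $(x_1, x_2)$ space:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def p_hat(x1, x2):
    # Hypothetical model using squared features: z = 4 - x1^2 - x2^2.
    # The z = 0 boundary is the circle x1^2 + x2^2 = 4.
    return sigmoid(4 - x1**2 - x2**2)

print(p_hat(0, 0) > 0.5)    # inside the circle  -> class 1 (True)
print(p_hat(2, 0))          # on the boundary    -> exactly 0.5
print(p_hat(3, 0) > 0.5)    # outside the circle -> class 0 (False)
```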

Linear separability

If a hyperplane can perfectly separate the two classes, the data is linearly separable. In this case, gradient descent will try to push the boundary further and further from all training points, driving coefficients toward $\pm\infty$, a phenomenon called perfect separation or the complete separation problem, discussed further in the Assumptions lesson.
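This runaway behavior is easy to reproduce. The sketch below runs gradient ascent on the log-likelihood for a hypothetical two-point, trivially separable 1-D dataset (no intercept) and watches the weight grow without bound:

```python
import math

# Trivially separable data: x = -1 labeled 0, x = +1 labeled 1.
X = [-1.0, 1.0]
y = [0, 1]

w = 0.0    # single weight, no intercept
lr = 0.5   # learning rate
snapshots = []
for step in range(1, 2001):
    # Gradient of the log-likelihood: sum_i (y_i - sigma(w * x_i)) * x_i.
    grad = sum((yi - 1 / (1 + math.exp(-w * xi))) * xi for xi, yi in zip(X, y))
    w += lr * grad  # gradient ascent on the log-likelihood
    if step in (10, 100, 2000):
        snapshots.append(w)

print(snapshots)  # the weight keeps increasing; it never converges to a finite optimum
```

The gradient here is always strictly positive for separable data, so training pushes $w$ ever larger (though more and more slowly), which is exactly the perfect-separation pathology described above.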