The Decision Boundary

From probability to prediction

Logistic regression outputs a probability $\hat{p} \in (0, 1)$. To make a concrete class prediction, a threshold $t$ is applied:

$$\hat{y} = \begin{cases} 1 & \text{if } \hat{p} \geq t \\ 0 & \text{if } \hat{p} < t \end{cases}$$

The default threshold is $t = 0.5$, which corresponds to predicting whichever class is more probable. The decision boundary is the set of points where exactly $\hat{p} = t$: the dividing line between predicted class 0 and predicted class 1.
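As a minimal sketch of this thresholding step (the probability values are made up for illustration):

```python
# Turning predicted probabilities into class labels with a threshold.
# The probabilities below are illustrative, not model output.
probs = [0.12, 0.47, 0.50, 0.85]

t = 0.5  # default threshold
preds = [1 if p >= t else 0 for p in probs]
print(preds)  # p >= t maps to class 1, including the boundary case p == 0.5
```

Note that the boundary case $\hat{p} = 0.5$ is conventionally assigned to class 1, matching the $\geq$ in the rule above.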

The linear decision boundary

With threshold $t = 0.5$, the decision boundary occurs where $\hat{p} = 0.5$, which means $\sigma(z) = 0.5$, which means $z = 0$. So the boundary is defined by:

$$\theta_0 + \theta_1 x_1 + \cdots + \theta_n x_n = 0$$

This is a linear equation — it defines a straight line (in 2D), a plane (in 3D), or a hyperplane (in higher dimensions). This is why logistic regression is called a linear classifier: it separates classes with a flat boundary.
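A quick numerical check of the key step, using the standard logistic function:

```python
import math

def sigmoid(z):
    # The logistic function: sigma(z) = 1 / (1 + e^(-z))
    return 1.0 / (1.0 + math.exp(-z))

# At z = 0 the sigmoid returns exactly 0.5, so the p-hat = 0.5 boundary
# coincides with the hyperplane theta . x = 0.
print(sigmoid(0.0))                           # 0.5
print(sigmoid(2.0) > 0.5, sigmoid(-2.0) < 0.5)  # True True: sign of z decides the class
```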

Two-feature example

Suppose a model with two features $x_1$ (study hours) and $x_2$ (sleep hours) predicts whether a student passes an exam:

$$\hat{p} = \sigma(-6 + 0.8\,x_1 + 0.5\,x_2)$$

The decision boundary is where:

$$-6 + 0.8\,x_1 + 0.5\,x_2 = 0 \implies x_2 = 12 - 1.6\,x_1$$

Points above this line (more sleep relative to study hours) are predicted to pass ($\hat{y} = 1$); points below are predicted to fail ($\hat{y} = 0$).
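The example model can be checked numerically. The point $(x_1, x_2) = (5, 4)$ lies exactly on the boundary line, and moving it up or down flips the prediction:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def p_pass(study, sleep):
    # The model from the text: p-hat = sigma(-6 + 0.8*study + 0.5*sleep)
    return sigmoid(-6 + 0.8 * study + 0.5 * sleep)

# On the boundary: x1 = 5 gives x2 = 12 - 1.6*5 = 4, so z = 0 and p-hat = 0.5.
print(p_pass(5, 4))          # 0.5
# One more hour of sleep moves the point above the line -> predicted pass:
print(p_pass(5, 5) > 0.5)    # True
# One less hour of sleep moves it below -> predicted fail:
print(p_pass(5, 3) > 0.5)    # False
```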

Interpreting the coefficients

Each coefficient $\theta_j$ has a direct interpretation in terms of log-odds:

  • Increasing $x_j$ by one unit multiplies the odds of the positive class by $e^{\theta_j}$, holding all other features fixed.

For the example above, $\theta_1 = 0.8$: one additional study hour multiplies the odds of passing by $e^{0.8} \approx 2.23$, more than doubling the odds, all else equal.

This log-odds interpretation makes logistic regression highly interpretable, especially compared to black-box models.
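The odds-ratio interpretation can be verified with the example model: increasing study hours from 5 to 6 (holding sleep fixed at 4, values chosen for illustration) multiplies the odds by exactly $e^{0.8}$:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def odds(p):
    # Odds of the positive class: p / (1 - p)
    return p / (1 - p)

theta_1 = 0.8  # study-hours coefficient from the example model

p_before = sigmoid(-6 + theta_1 * 5 + 0.5 * 4)  # 5 study hours, 4 sleep hours
p_after  = sigmoid(-6 + theta_1 * 6 + 0.5 * 4)  # one additional study hour

ratio = odds(p_after) / odds(p_before)
print(ratio, math.exp(theta_1))  # both ~2.2255: odds multiplied by e^0.8
```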

Adjusting the threshold

The 0.5 threshold is not always optimal. The right threshold depends on the costs of different errors:

  • A false negative (predicting 0 when truth is 1): e.g. missing a disease.
  • A false positive (predicting 1 when truth is 0): e.g. unnecessary treatment.

If false negatives are very costly (medical diagnosis), lower the threshold to predict positive more readily. If false positives are costly (spam filtering where legitimate emails matter), raise the threshold.

Changing the threshold moves the decision boundary but does not retrain the model — the underlying probabilities stay the same.
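A small sketch, with illustrative probabilities, showing that re-thresholding changes predictions without touching the model:

```python
# Illustrative predicted probabilities from a single trained model.
probs = [0.15, 0.35, 0.55, 0.75]

def classify(probs, t):
    return [1 if p >= t else 0 for p in probs]

# Same model outputs, three different operating points:
print(classify(probs, 0.5))   # default threshold
print(classify(probs, 0.3))   # lowered: flags more positives (fewer false negatives)
print(classify(probs, 0.7))   # raised: predicts positive only when confident
```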

Non-linear decision boundaries

The decision boundary of logistic regression is always linear in the features it is given. However, you can create non-linear boundaries in the original input space by adding engineered features such as:

  • Polynomial features: $x_1^2$, $x_2^2$, $x_1 x_2$
  • Interaction terms: products of pairs of features
  • Transformations: $\log x_1$, $\sqrt{x_2}$

For example, adding $x_1^2$ and $x_2^2$ allows the decision boundary to be an ellipse. The model remains logistic regression; only the feature set has changed.
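A minimal sketch with hypothetical coefficients: feeding squared features into the same sigmoid, with $z = 4 - x_1^2 - x_2^2$, produces a circular boundary of radius 2 in the original $(x_1, x_2)$ space:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def p_hat(x1, x2):
    # Hypothetical model using squared features: z = 4 - x1^2 - x2^2.
    # The z = 0 boundary is the circle x1^2 + x2^2 = 4.
    return sigmoid(4 - x1**2 - x2**2)

print(p_hat(0, 0) > 0.5)    # inside the circle  -> class 1 (True)
print(p_hat(2, 0))          # on the boundary    -> exactly 0.5
print(p_hat(3, 0) > 0.5)    # outside the circle -> class 0 (False)
```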

Linear separability

If a hyperplane can perfectly separate the two classes, the data is linearly separable. In this case, gradient descent will try to push the boundary further and further from all training points, driving coefficients toward $\pm\infty$, a phenomenon called perfect separation or the complete separation problem, discussed further in the Assumptions lesson.
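This runaway behavior is easy to reproduce. The sketch below runs gradient ascent on the log-likelihood for a hypothetical two-point, trivially separable 1-D dataset (no intercept) and watches the weight grow without bound:

```python
import math

# Trivially separable data: x = -1 labeled 0, x = +1 labeled 1.
X = [-1.0, 1.0]
y = [0, 1]

w = 0.0    # single weight, no intercept
lr = 0.5   # learning rate
snapshots = []
for step in range(1, 2001):
    # Gradient of the log-likelihood: sum_i (y_i - sigma(w * x_i)) * x_i.
    grad = sum((yi - 1 / (1 + math.exp(-w * xi))) * xi for xi, yi in zip(X, y))
    w += lr * grad  # gradient ascent on the log-likelihood
    if step in (10, 100, 2000):
        snapshots.append(w)

print(snapshots)  # the weight keeps increasing; it never converges to a finite optimum
```

The gradient here is always strictly positive for separable data, so training pushes $w$ ever larger (though more and more slowly), which is exactly the perfect-separation pathology described above.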