What Is Logistic Regression?

From regression to classification

In the linear regression course, the goal was to predict a continuous number — a house price, a temperature, a revenue figure. Many real-world problems instead ask for a category: is this email spam or not? Does this patient have the disease? Will this customer churn?

These are classification problems, and logistic regression is one of the most widely used algorithms for solving them.

Despite its name, logistic regression is a classification algorithm, not a regression algorithm. The name is historical: it builds on the same linear combination of features as linear regression, but passes the result through a function that squashes it into a probability.

Binary classification

The simplest case is binary classification: the target variable y takes exactly two values, coded as 0 and 1.

  • y = 1: the positive class (e.g. spam, disease present, will churn).
  • y = 0: the negative class (e.g. not spam, disease absent, will not churn).

The goal of logistic regression is to estimate the probability that a given input belongs to the positive class:

\hat{p} = P(y = 1 \mid \mathbf{x})

Once you have a probability, a prediction is made by applying a threshold (usually 0.5):

\hat{y} = \begin{cases} 1 & \text{if } \hat{p} \geq 0.5 \\ 0 & \text{if } \hat{p} < 0.5 \end{cases}
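The thresholding step is a one-liner in code. As a minimal sketch (the function name `predict_label` is ours, and we assume \hat{p} has already been computed by the model):

```python
def predict_label(p_hat, threshold=0.5):
    """Convert a predicted probability into a hard 0/1 class label."""
    return 1 if p_hat >= threshold else 0

print(predict_label(0.73))  # above the cutoff -> positive class (1)
print(predict_label(0.20))  # below the cutoff -> negative class (0)
```

Note that 0.5 is only the default: in applications where false negatives are costlier than false positives (e.g. disease screening), the threshold is often lowered.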

Why not use linear regression for classification?

It is tempting to apply linear regression directly: fit a line to 0/1 targets and threshold at 0.5. This fails for several reasons:

  1. Unbounded outputs. Linear regression can predict values far outside [0, 1] — probabilities of -0.3 or 2.7 are meaningless.
  2. Poor fit. The relationship between features and a binary outcome is inherently non-linear; a straight line is a poor model.
  3. Violated assumptions. The residuals from fitting a line to 0/1 data are not normally distributed and exhibit severe heteroscedasticity.

Logistic regression solves this by replacing the linear output with a function that always produces a value between 0 and 1.
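The unbounded-output problem is easy to demonstrate. The sketch below (with made-up illustrative data) fits an ordinary least-squares line to 0/1 targets and evaluates it at inputs outside the training range — the "probabilities" it produces are not in [0, 1]:

```python
import numpy as np

# Tiny illustrative 1-D dataset: feature x, binary labels y.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

# Ordinary least-squares fit of a line theta_0 + theta_1 * x to the 0/1 targets.
A = np.column_stack([np.ones_like(x), x])
theta, *_ = np.linalg.lstsq(A, y, rcond=None)

# Predictions at extreme inputs escape [0, 1] -- they are not valid probabilities.
print(theta[0] + theta[1] * 10.0)   # > 1
print(theta[0] + theta[1] * (-2.0)) # < 0
```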

The logistic regression model

Logistic regression computes a linear score (called the log-odds or logit) and passes it through the sigmoid function \sigma:

\hat{p} = \sigma(\theta_0 + \theta_1 x_1 + \cdots + \theta_n x_n)

The sigmoid function and its properties are the subject of the next lesson.
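To make the formula concrete, here is a minimal sketch of the forward computation. The helper names (`sigmoid`, `predict_proba`) and the example coefficients are ours, chosen purely for illustration; how \sigma works and why is deferred to the next lesson:

```python
import math

def sigmoid(z):
    """Squash any real-valued score into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(theta, x):
    """theta = [theta_0, theta_1, ..., theta_n]; x = [x_1, ..., x_n]."""
    z = theta[0] + sum(t * xi for t, xi in zip(theta[1:], x))  # the log-odds / logit
    return sigmoid(z)

# Hypothetical coefficients and features: z = -1.0 + 0.5*2.0 + 0.25*4.0 = 1.0
print(predict_proba([-1.0, 0.5, 0.25], [2.0, 4.0]))  # sigmoid(1.0) ~= 0.731
```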

Applications

Logistic regression is used across many domains:

| Domain    | Example task                    |
|-----------|---------------------------------|
| Email     | Spam vs. not spam               |
| Medicine  | Disease present vs. absent      |
| Finance   | Loan default vs. repayment      |
| Marketing | Customer churns vs. stays       |
| NLP       | Sentiment positive vs. negative |

Key advantages

  • Interpretable: each coefficient \theta_j has a clear meaning in terms of log-odds (covered in the Decision Boundary lesson).
  • Probabilistic output: gives a calibrated probability, not just a hard label.
  • Efficient: fast to train even on large datasets.
  • Strong baseline: often hard to beat with more complex models on tabular data.

Key terms

  • Binary classification: predicting one of two classes (0 or 1).
  • Positive class (y = 1): the class the model is trained to detect.
  • Predicted probability (\hat{p}): the model's estimate of P(y = 1 \mid \mathbf{x}).
  • Threshold: the cutoff applied to \hat{p} to produce a hard class prediction.
  • Log-odds / logit: the linear combination \theta_0 + \theta_1 x_1 + \cdots + \theta_n x_n before the sigmoid is applied.