Assumptions and Common Pitfalls

Assumptions of logistic regression

1. Linearity of the log-odds

Logistic regression assumes the log-odds of the positive class are a linear function of the features:

\log\frac{p}{1-p} = \theta_0 + \theta_1 x_1 + \cdots + \theta_n x_n

This does not require a linear relationship between x_j and p directly, only between x_j and the log-odds. However, severely non-linear log-odds relationships will cause poor fit.

How to check: Plot the log-odds of y against each feature (using binned empirical probabilities) and look for non-linearity. Consider adding polynomial or interaction terms if needed.
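The binned check above can be sketched with NumPy. This is a minimal illustration on simulated data where the log-odds truly are linear; `binned_log_odds` is a hypothetical helper name, not a library function.

```python
import numpy as np

def binned_log_odds(x, y, n_bins=10):
    """Empirical log-odds of y within quantile bins of feature x.

    A roughly linear trend of log-odds against bin centers supports
    the linearity-of-log-odds assumption.
    """
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1))
    idx = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, n_bins - 1)
    centers, log_odds = [], []
    for b in range(n_bins):
        mask = idx == b
        if mask.sum() == 0:
            continue
        # Add 0.5 to both counts (Haldane-Anscombe correction) to avoid log(0)
        pos = y[mask].sum() + 0.5
        neg = (mask.sum() - y[mask].sum()) + 0.5
        centers.append(x[mask].mean())
        log_odds.append(np.log(pos / neg))
    return np.array(centers), np.array(log_odds)

# Simulated data where the true log-odds are 0.5 + 1.5x
rng = np.random.default_rng(0)
x = rng.normal(size=5000)
y = rng.binomial(1, 1 / (1 + np.exp(-(0.5 + 1.5 * x))))
centers, lo = binned_log_odds(x, y)
# Plotting lo against centers should show an approximately straight line
# with slope near the true coefficient 1.5.
```

In practice you would scatter-plot `lo` against `centers` for each feature and look for curvature.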

2. Independence of observations

Training examples must be independent. If observations are clustered (students within schools), repeated measures (multiple readings per patient), or sequential (time series), standard logistic regression is misspecified.

Remedy: Use mixed-effects logistic regression or account for the structure in feature engineering.

3. Little or no multicollinearity

Highly correlated features inflate coefficient variance and make estimates unstable, just as in linear regression.

How to check: Compute VIF (Variance Inflation Factor) for each feature. Remedy: Remove redundant features or apply L2 regularization.
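VIF can be computed from scratch (statsmodels also provides `variance_inflation_factor`). The sketch below regresses each feature on the others and uses VIF_j = 1 / (1 − R²_j); the data are simulated so that the first two features are nearly collinear.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (n_samples, n_features).

    VIF_j = 1 / (1 - R^2_j), where R^2_j comes from regressing column j
    on all the other columns (with an intercept).
    """
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ beta
        r2 = 1 - resid.var() / y.var()
        out[j] = 1 / (1 - r2)
    return out

rng = np.random.default_rng(1)
x1 = rng.normal(size=1000)
x2 = x1 + rng.normal(scale=0.1, size=1000)  # nearly collinear with x1
x3 = rng.normal(size=1000)                  # independent
vifs = vif(np.column_stack([x1, x2, x3]))
# vifs[0] and vifs[1] are large (collinear pair); vifs[2] is near 1.
```

A common rule of thumb treats VIF above 5–10 as a sign of problematic collinearity.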

4. Large sample size

Logistic regression relies on maximum likelihood estimation, whose desirable statistical properties are asymptotic: they hold only with large samples. A common rule of thumb is at least 10–20 events (positive examples) per feature included in the model. With too few events per variable, coefficient estimates are biased and confidence intervals are unreliable (overfitting in the small-sample regime).
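The events-per-variable rule is a one-line calculation; `events_per_variable` below is a hypothetical helper, shown only to make the arithmetic concrete.

```python
def events_per_variable(y, n_features):
    """Events per variable: count of the rarer outcome class divided by
    the number of model features. Rule of thumb: aim for EPV >= 10-20.
    """
    n_events = min(sum(y), len(y) - sum(y))
    return n_events / n_features

# 60 positives among 1000 rows with 8 candidate features:
y = [1] * 60 + [0] * 940
epv = events_per_variable(y, 8)  # 7.5, below the 10-20 guideline
```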

5. No extreme outliers

Unlike linear regression (where outliers affect the line via squared errors), logistic regression is affected by outliers in the feature space (high-leverage points). An extreme feature value can exert disproportionate influence on the decision boundary.

How to check: Inspect Cook's distance or examine standardized residuals for influential points.
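The leverage part of this diagnostic can be computed directly from the GLM hat matrix (Cook's distance additionally folds in the residual; statsmodels' `GLMResults.get_influence()` exposes both). A sketch on simulated data, using scikit-learn with a very large `C` as a stand-in for an unregularized fit:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))

model = LogisticRegression(C=1e6).fit(X, y)  # near-unregularized fit
p = model.predict_proba(X)[:, 1]

# Leverage (hat) values for the logistic GLM:
#   h_i = w_i * x_i' (X'WX)^{-1} x_i,  with W = diag(p_i (1 - p_i))
# and X including an intercept column. A common heuristic flags
# points with h_i > 2k/n, k = number of parameters.
Xd = np.column_stack([np.ones(len(X)), X])
w = p * (1 - p)
XtWX_inv = np.linalg.inv(Xd.T @ (Xd * w[:, None]))
h = w * np.einsum("ij,jk,ik->i", Xd, XtWX_inv, Xd)
flagged = np.where(h > 2 * Xd.shape[1] / len(X))[0]
```

Because the hat matrix is a projection, the leverages sum to the number of parameters, which is a useful sanity check on the computation.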

Common pitfalls

Complete separation

Complete separation occurs when the training data is perfectly linearly separable — some linear combination of features perfectly predicts the outcome. In this case, the maximum likelihood estimate does not exist: the algorithm will try to push coefficients toward ±∞ to drive all predicted probabilities to 0 or 1.

Symptoms include: extremely large coefficients, huge standard errors, and warnings from the optimizer about non-convergence.

Remedies: Apply L2 regularization (keeps coefficients finite), collect more data, or use Firth logistic regression (a bias-reduction method).
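The effect of L2 regularization on a separable dataset is easy to demonstrate with scikit-learn. A huge `C` approximates the unregularized fit (in scikit-learn, `C` is the inverse of the regularization strength):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Perfectly separable 1-D data: every negative x is class 0, every positive x is class 1
x = np.array([-3.0, -2.0, -1.0, 1.0, 2.0, 3.0]).reshape(-1, 1)
y = np.array([0, 0, 0, 1, 1, 1])

# Nearly unregularized: the optimizer inflates the coefficient, chasing
# predicted probabilities of exactly 0 and 1.
big = LogisticRegression(C=1e10, max_iter=10_000).fit(x, y)

# L2-regularized (scikit-learn's default C=1.0): the coefficient stays modest.
reg = LogisticRegression(C=1.0).fit(x, y)
```

Comparing `big.coef_` and `reg.coef_` shows the regularized coefficient staying finite and small while the near-unregularized one grows until the optimizer's tolerance cuts it off.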

Class imbalance

When one class is much rarer than the other (e.g. 1% fraud among transactions), the model can achieve high accuracy by always predicting the majority class. The learned decision boundary may be poor for the minority class.

Remedies:

  • Adjust the classification threshold (lower it to predict positive more often).
  • Use class-weighted loss (class_weight='balanced' in scikit-learn), which up-weights minority-class errors.
  • Oversample the minority class (SMOTE) or undersample the majority class.
  • Evaluate with AUC or F1 rather than accuracy.
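The class-weighting remedy is a one-argument change in scikit-learn. A sketch on simulated imbalanced data (roughly 7% positives), comparing minority-class recall at the default 0.5 threshold:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000
X = rng.normal(size=(n, 2))
# Rare positive class, driven by the first feature
logits = -4.0 + 2.0 * X[:, 0]
y = rng.binomial(1, 1 / (1 + np.exp(-logits)))

plain = LogisticRegression().fit(X, y)
balanced = LogisticRegression(class_weight="balanced").fit(X, y)

# Balanced weighting up-weights minority-class errors, which shifts the
# intercept so the model predicts the minority class far more often.
recall_plain = plain.predict(X)[y == 1].mean()
recall_bal = balanced.predict(X)[y == 1].mean()
```

The trade-off is more false positives on the majority class, which is why threshold choice should follow from the application's costs.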

Ignoring feature scaling

Gradient-based optimization of logistic regression is sensitive to feature scale. Unscaled features lead to slow convergence and may make regularization behave incorrectly (features on larger scales are penalized less by the regularizer in absolute terms).

Always standardize features before fitting logistic regression, especially when using regularization.
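Putting the scaler inside a pipeline is the idiomatic way to do this in scikit-learn, since it guarantees the training-set statistics are reused at prediction time. A sketch with two features on wildly different scales (the data here are synthetic):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# One feature in [0, 1], one in the tens of thousands
X = np.column_stack([rng.random(1000), rng.random(1000) * 50_000])
y = (X[:, 0] + X[:, 1] / 50_000 + rng.normal(scale=0.3, size=1000) > 1).astype(int)

# StandardScaler is fit on the training data inside the pipeline, so
# both features contribute on comparable scales and the L2 penalty
# treats them symmetrically.
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X, y)
acc = model.score(X, y)
```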

Treating predicted probabilities as perfectly calibrated

Logistic regression probabilities are generally well-calibrated on the training distribution, but can be poorly calibrated on new distributions or after regularization. If you need reliable probability estimates (e.g. for expected value calculations), use Platt scaling or isotonic regression to calibrate the model's output.
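scikit-learn wraps Platt scaling (and isotonic regression) in `CalibratedClassifierCV`. A sketch on simulated data, using a heavily regularized base model whose raw probabilities are shrunk toward 0.5:

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 3))
y = rng.binomial(1, 1 / (1 + np.exp(-X @ np.array([1.0, -1.0, 0.5]))))

# Strong L2 regularization (small C) shrinks coefficients and pushes
# predicted probabilities toward 0.5, hurting calibration.
base = LogisticRegression(C=0.01)

# Platt scaling: fit a sigmoid on held-out scores via cross-validation
calibrated = CalibratedClassifierCV(base, method="sigmoid", cv=5).fit(X, y)
proba = calibrated.predict_proba(X)[:, 1]
```

Use `method="isotonic"` instead when you have enough data for a non-parametric fit; isotonic regression can overfit on small samples.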

Extrapolation

Like linear regression, logistic regression should not be used to make predictions far outside the range of the training data. The sigmoid function saturates, so extrapolation will confidently (but wrongly) predict extreme probabilities.
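The saturation is easy to see numerically. With a coefficient of 1.5 (a hypothetical value for illustration), a point far outside the training range gets a near-certain probability purely from the shape of the sigmoid:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

theta0, theta1 = 0.0, 1.5
# Suppose the model was trained on x roughly in [-2, 2]. Extrapolating
# to x = 10 yields an extreme probability with no data behind it:
p_far = sigmoid(theta0 + theta1 * 10)  # ~0.9999997
```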

Summary checklist

Before deploying a logistic regression model, verify:

  1. Log-odds vs. features are approximately linear (check binned log-odds plots).
  2. Observations are independent.
  3. No severe multicollinearity (check VIF).
  4. Sufficient sample size relative to number of features (at least 10–20 events per variable).
  5. No complete separation (check for extremely large coefficients).
  6. Class imbalance handled if present.
  7. Features are scaled.
  8. Probabilities calibrated if needed.