What Is Linear Regression?
The core idea
Linear regression is one of the most fundamental tools in machine learning and statistics. Its goal is simple: given some input data, find a straight-line relationship that lets you predict a numerical output.
For example:
- Predict a house's sale price from its size in square feet.
- Estimate a person's weight from their height.
- Forecast monthly revenue from advertising spend.
In each case, you have one or more input variables (also called features or predictors) and one output variable (also called the target or response). Linear regression assumes there is an approximately linear relationship between them.
Supervised learning
Linear regression is a supervised learning algorithm. This means you train it on a dataset of labeled examples — pairs of inputs and known outputs. The algorithm learns from these examples so it can predict outputs for new, unseen inputs.
For instance, if you have a dataset of 500 houses with known sizes and prices, you can train a linear regression model on that data. Once trained, the model can estimate the price of a new house given only its size.
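To make this concrete, here is a minimal sketch of that workflow using NumPy's `polyfit`. The house sizes and prices below are made-up numbers, not real data:

```python
import numpy as np

# Hypothetical training set: house sizes (sq ft) and known sale prices ($).
sizes = np.array([1000, 1500, 2000, 2500, 3000], dtype=float)
prices = np.array([200_000, 290_000, 410_000, 500_000, 590_000], dtype=float)

# "Training": fit a straight line (degree-1 polynomial) through the examples.
slope, intercept = np.polyfit(sizes, prices, deg=1)

# "Prediction": estimate the price of a new, unseen house from its size alone.
new_size = 1800
predicted_price = slope * new_size + intercept
print(f"Predicted price for {new_size} sq ft: ${predicted_price:,.0f}")
```

The five labeled examples play the role of the 500 houses in the text; swapping in a larger dataset changes nothing about the code.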
Why "linear"?
The word linear refers to the shape of the relationship the model learns. It fits a straight line (in two dimensions) or a flat hyperplane (in higher dimensions) through the data.
This is a simplifying assumption. Real data is rarely perfectly linear. But a linear approximation is often surprisingly useful, especially when:
- The true relationship is close to linear over the range of interest.
- You want an interpretable, fast model.
- You have limited training data.
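With more than one feature, the fitted "line" becomes a flat hyperplane. A short sketch using NumPy's least-squares solver shows the same idea with two made-up features per house (size and bedroom count are illustrative values, not real data):

```python
import numpy as np

# Hypothetical houses: [size in sq ft, number of bedrooms].
X = np.array([[1000, 2], [1500, 3], [2000, 3], [2500, 4], [3000, 4]], dtype=float)
y = np.array([200_000, 290_000, 410_000, 500_000, 590_000], dtype=float)

# Prepend a column of ones so the intercept is learned alongside the weights.
X_aug = np.column_stack([np.ones(len(X)), X])

# Least squares finds the flat hyperplane that passes closest to the points.
coeffs, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
intercept, w_size, w_beds = coeffs

# The fitted hyperplane reproduces the training prices closely.
residuals = y - X_aug @ coeffs
```

Geometrically nothing changes: in two dimensions the model is a line, in three a plane, and beyond that a hyperplane you can no longer draw but can still compute with.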
Regression vs. classification
Machine learning prediction tasks fall into two broad categories:
| Task | Output | Example |
|---|---|---|
| Regression | Continuous number | Predict house price: $342,000 |
| Classification | Discrete category | Predict if email is spam: yes/no |
Linear regression is a regression algorithm — it predicts a continuous number, not a category.
A quick visual intuition
Imagine plotting data points on a graph: house size on the x-axis, price on the y-axis. Each house is a dot. Linear regression finds the single straight line that passes as close as possible to all those dots.
Once you have that line, prediction is easy: pick a size on the x-axis, follow it up to the line, and read off the predicted price on the y-axis.
Key terms to remember
- Feature (x): an input variable used to make a prediction.
- Target (y): the output variable you are trying to predict.
- Training set: the dataset of (x, y) pairs used to fit the model.
- Model: the learned function that maps inputs to predicted outputs, written ŷ = f(x).
- Parameters: the numbers inside the model (slope and intercept) that are adjusted during training.
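The terms above can be tied together in a few lines of Python. The parameter values here are hypothetical, standing in for whatever training would produce:

```python
def predict(x, w, b):
    """The model: a learned function mapping an input feature x
    to a predicted output, using parameters w (slope) and b (intercept)."""
    return w * x + b

# Hypothetical learned parameters for the house-price example.
w, b = 150.0, 50_000.0

# Predict the target for a feature value of 2000 sq ft.
print(predict(2000, w, b))  # 150.0 * 2000 + 50_000.0 = 350000.0
```

Training is exactly the process of choosing w and b so that `predict` comes as close as possible to the known targets in the training set.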
In the next lesson, you will see exactly what that line looks like as a mathematical equation and what the parameters mean.