Perceptron
The origin
The perceptron, proposed by Frank Rosenblatt in 1958, is the simplest possible model of a neuron and the historical ancestor of modern neural networks. Understanding it builds the foundation for everything that follows.
What a perceptron does
A perceptron takes binary or continuous inputs x_1, ..., x_n, computes a weighted sum, and outputs a binary decision:

y = 1 if w_1 x_1 + ... + w_n x_n + b > 0, otherwise y = 0

The parameters are the weights w_1, ..., w_n (one per input) and the bias b. The weights control how much each input influences the decision; the bias shifts the threshold.
This is essentially a linear classifier — it draws a hyperplane through the input space and assigns each side a class label.
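The weighted-sum-and-threshold decision can be sketched in a few lines of Python (the function name and the example weights are illustrative, not from the text):

```python
# A single perceptron decision: weighted sum plus bias, then a hard 0/1 threshold.
def perceptron(x, w, b):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

# With weights (1, 1) and bias -1.5, this unit computes logical AND:
print(perceptron((1, 1), (1, 1), -1.5))  # 1
print(perceptron((0, 1), (1, 1), -1.5))  # 0
```

The bias of -1.5 places the separating line so that only the point (1, 1) falls on the positive side, illustrating the hyperplane view of the classifier.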
The perceptron learning rule
The perceptron has a simple learning algorithm:
- Initialize all weights to zero (or small random values).
- For each training example, make a prediction.
- If the prediction is correct, do nothing.
- If the prediction is wrong, update weights:
- If predicted 0, true is 1: add the input to the weights (w_i ← w_i + x_i, b ← b + 1).
- If predicted 1, true is 0: subtract the input (w_i ← w_i − x_i, b ← b − 1).
- Repeat until all examples are classified correctly (or a maximum number of iterations).
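The steps above can be sketched in Python (the names `predict` and `train` and the AND-gate data are illustrative, not from the text):

```python
def predict(w, b, x):
    """Hard-threshold output: 1 if the weighted sum exceeds 0, else 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

def train(examples, n_inputs, max_iters=100):
    """Perceptron learning rule: update weights only on mistakes."""
    w, b = [0.0] * n_inputs, 0.0            # step 1: initialize to zero
    for _ in range(max_iters):
        mistakes = 0
        for x, y in examples:               # step 2: predict each example
            if predict(w, b, x) != y:       # steps 3-4: update only when wrong
                sign = 1 if y == 1 else -1  # add input if predicted 0, subtract if predicted 1
                w = [wi + sign * xi for wi, xi in zip(w, x)]
                b += sign
                mistakes += 1
        if mistakes == 0:                   # step 5: stop once everything is correct
            break
    return w, b

# AND is linearly separable, so the rule converges to a separating hyperplane.
and_data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train(and_data, n_inputs=2)
assert all(predict(w, b, x) == y for x, y in and_data)
```

Because the updates happen only on mistakes, a pass with zero mistakes means every example is classified correctly and training can stop early.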
The perceptron convergence theorem guarantees that if the training data is linearly separable, this algorithm will find a separating hyperplane in a finite number of steps.
The fundamental limitation
The perceptron can only learn linearly separable problems. The most famous example of its failure is XOR: the two classes (input pairs that produce 0 and pairs that produce 1) cannot be separated by any straight line. Minsky and Papert proved this limitation formally in 1969, which temporarily dampened enthusiasm for neural networks.
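The XOR failure is easy to see empirically. The sketch below (helper names illustrative) runs the learning rule for far more passes than any separable problem would need; because no line separates the two XOR classes, accuracy never reaches 100%:

```python
def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

xor_data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

w, b = [0.0, 0.0], 0.0
for _ in range(1000):                        # far beyond what a separable problem needs
    for x, y in xor_data:
        err = y - predict(w, b, x)           # +1, -1, or 0
        w = [wi + err * xi for wi, xi in zip(w, x)]
        b += err

accuracy = sum(predict(w, b, x) == y for x, y in xor_data) / 4
print(accuracy)  # stays below 1.0 — the weights cycle and never separate XOR
```

Instead of converging, the weights oscillate forever: every update that fixes one example breaks another, which is exactly what the convergence theorem's linear-separability condition rules out.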
The solution — combining multiple perceptrons in layers — took another decade to develop fully.
The perceptron vs. logistic regression
A perceptron is a hard threshold classifier: it outputs 0 or 1. Logistic regression uses the sigmoid function instead of a hard threshold, producing a probability and allowing gradient-based learning. Modern neural networks use logistic regression-style units (soft threshold, continuous output) rather than true perceptrons.
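The contrast between the two output functions can be sketched directly (a minimal illustration, not from the text):

```python
import math

def step(z):
    """Perceptron output: hard 0/1 threshold, no useful gradient."""
    return 1 if z > 0 else 0

def sigmoid(z):
    """Logistic regression output: smooth, differentiable, a probability."""
    return 1 / (1 + math.exp(-z))

z = 0.5                          # the same weighted sum fed to both units
print(step(z))                   # 1 — a bare class label
print(round(sigmoid(z), 3))      # 0.622 — a probability, with a gradient to learn from
```

The step function's derivative is zero almost everywhere, so gradient descent has nothing to work with; the sigmoid's smooth slope is what makes gradient-based learning possible.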
The perceptron's learning rule also differs from gradient descent: it updates only when a prediction is wrong, rather than computing a gradient from a smooth loss. Without a differentiable loss there is nothing to backpropagate, which is why the rule does not extend to multi-layer networks.
Why it still matters
The perceptron establishes the core structure that persists through all of deep learning: a unit takes weighted inputs, applies a function, and produces an output. That output is the building block of every neural network. The key innovation of deep learning is stacking many such units in layers and learning the weights with gradient descent — but the unit itself has not fundamentally changed since 1958.