Derivatives
The rate of change
A derivative measures how much a function's output changes in response to a small change in its input. It is the instantaneous rate of change — the slope of the function at a specific point.
For a function $f$, the derivative at a point $x$ is:

$$f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}$$
You nudge the input by a tiny amount $h$, measure how much the output changes, and divide by $h$. As $h$ shrinks to zero, this ratio settles at the derivative.
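The limit definition above suggests a direct numerical check: pick a small $h$ and compute the difference quotient. A minimal sketch (the function $x^2$ and the step size are illustrative choices):

```python
def derivative(f, x, h=1e-6):
    # Forward difference: nudge the input by h, measure the output change,
    # and divide by h -- the ratio from the limit definition.
    return (f(x + h) - f(x)) / h

print(derivative(lambda x: x**2, 3.0))  # close to 6, since d/dx x^2 = 2x
```

Shrinking `h` further does not keep improving the estimate indefinitely: floating-point rounding in `f(x + h) - f(x)` eventually dominates.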
Geometric interpretation: the derivative is the slope of the tangent line to the curve at the point $(x, f(x))$.
- $f'(x) > 0$: the function is increasing at $x$; moving right increases $f(x)$.
- $f'(x) < 0$: the function is decreasing at $x$.
- $f'(x) = 0$: the function is flat at $x$; a potential minimum, maximum, or saddle point.
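The three sign cases above can be seen numerically on a single function. A small sketch (using $x^2$ as an illustrative example, with a finite-difference slope):

```python
def derivative(f, x, h=1e-6):
    # Forward-difference estimate of the slope at x.
    return (f(x + h) - f(x)) / h

f = lambda x: x**2  # slope is 2x: negative left of 0, zero at 0, positive right

print(derivative(f, -1.0))  # close to -2: decreasing here
print(derivative(f, 0.0))   # close to 0: flat (the minimum)
print(derivative(f, 1.0))   # close to 2: increasing here
```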
Common derivatives
| Function | Derivative |
|---|---|
| $c$ (constant) | $0$ |
| $x^n$ | $n x^{n-1}$ |
| $e^x$ | $e^x$ |
| $\ln x$ | $\frac{1}{x}$ |
| $\sigma(x) = \frac{1}{1 + e^{-x}}$ | $\sigma(x)(1 - \sigma(x))$ |
The sigmoid derivative is particularly important in ML — it appears in backpropagation whenever sigmoid activations are used.
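The identity $\sigma'(x) = \sigma(x)(1 - \sigma(x))$ is cheap to verify: compare the closed form against a finite-difference estimate. A minimal sketch (the sample points are arbitrary):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_prime(x):
    # The closed-form derivative: sigma'(x) = sigma(x) * (1 - sigma(x)).
    s = sigmoid(x)
    return s * (1.0 - s)

# Cross-check against a central finite difference at a few points.
for x in (-2.0, 0.0, 1.5):
    numeric = (sigmoid(x + 1e-6) - sigmoid(x - 1e-6)) / 2e-6
    print(x, sigmoid_prime(x), numeric)
```

Note that $\sigma'(x)$ peaks at $0.25$ when $x = 0$ and shrinks toward zero for large $|x|$, which is why saturated sigmoid units pass almost no gradient during backpropagation.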
Key rules
Sum rule: $(f + g)' = f' + g'$

Product rule: $(fg)' = f'g + fg'$

Chain rule: $(f(g(x)))' = f'(g(x)) \, g'(x)$ — covered fully in its own lesson.
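These rules can be sanity-checked numerically. A sketch verifying the product rule for two illustrative functions, $f(x) = x^3$ and $g(x) = \sin x$:

```python
import math

def derivative(fn, x, h=1e-6):
    # Central difference: a more accurate finite-difference slope estimate.
    return (fn(x + h) - fn(x - h)) / (2 * h)

f,  g  = lambda x: x**3,    lambda x: math.sin(x)
fp, gp = lambda x: 3 * x**2, lambda x: math.cos(x)  # known derivatives

x = 0.7
lhs = derivative(lambda t: f(t) * g(t), x)  # numeric (fg)'
rhs = fp(x) * g(x) + f(x) * gp(x)           # product rule: f'g + fg'
print(lhs, rhs)  # the two values agree to several decimal places
```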
Finding minima
A critical property: at a local minimum or maximum, the derivative is zero ($f'(x) = 0$). The function is momentarily flat. This is the mathematical foundation for optimization — to find where a function is minimized, find where its derivative is zero.
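As a worked example (the quadratic is an illustrative choice), setting the derivative to zero and solving:

$$f(x) = (x - 2)^2, \qquad f'(x) = 2(x - 2), \qquad f'(x) = 0 \implies x = 2.$$

The only flat point is $x = 2$, and since the parabola opens upward it is the minimum.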
For a function of one variable, you check the sign of the second derivative to distinguish minima from maxima:
- $f''(x) > 0$: local minimum (the function is concave up, like a bowl).
- $f''(x) < 0$: local maximum (concave down, like a hill).
In ML, we do not solve for zero derivatives analytically — the functions are too complex. Instead, we use gradient descent: repeatedly step opposite the sign of the derivative (downhill) until the derivative is near zero.
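The loop is only a few lines. A minimal sketch (the target function $(x - 3)^2$, learning rate, and step count are illustrative choices):

```python
def gradient_descent(grad, x0, lr=0.1, steps=100):
    # Step opposite the derivative's sign: subtracting lr * grad(x)
    # moves downhill whether the slope is positive or negative.
    x = x0
    for _ in range(steps):
        x = x - lr * grad(x)
    return x

# Minimize f(x) = (x - 3)^2, whose derivative is 2(x - 3).
x_min = gradient_descent(lambda x: 2 * (x - 3.0), x0=0.0)
print(x_min)  # converges toward 3, where the derivative is zero
```

Each step shrinks the distance to the minimum by a constant factor here; too large a learning rate would instead overshoot and diverge.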