Derivatives

The rate of change

A derivative measures how much a function's output changes in response to a small change in its input. It is the instantaneous rate of change — the slope of the function at a specific point.

For a function $f(x)$, the derivative at a point $x$ is:

$$f'(x) = \frac{df}{dx} = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}$$

You nudge the input by a tiny amount $h$, measure how much the output changes, and divide by $h$. As $h$ shrinks to zero, this ratio settles at the derivative.
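The limit definition can be sketched numerically by fixing a small $h$ instead of taking the limit. The helper name and the example function $f(x) = x^2$ are illustrative choices, not part of the lesson:

```python
# Numerical sketch of the limit definition: approximate f'(x) with a small h.

def numerical_derivative(f, x, h=1e-6):
    """Forward-difference approximation of f'(x) = (f(x + h) - f(x)) / h."""
    return (f(x + h) - f(x)) / h

f = lambda x: x ** 2  # example function; its true derivative is 2x

# At x = 3 the true derivative is 6; the approximation should be very close.
print(numerical_derivative(f, 3.0))  # close to 6
```

Shrinking $h$ further initially improves the approximation, mirroring how the ratio settles at the derivative as $h \to 0$.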

Geometric interpretation: the derivative is the slope of the tangent line to the curve $y = f(x)$ at the point $x$.

  • $f'(x) > 0$: the function is increasing at $x$; moving right increases $f$.
  • $f'(x) < 0$: the function is decreasing at $x$.
  • $f'(x) = 0$: the function is flat at $x$ (a potential minimum, maximum, or saddle point).
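The three cases above can be checked concretely with $f(x) = x^2$, whose derivative is $f'(x) = 2x$ (an example function chosen for illustration):

```python
# Sign of the derivative of f(x) = x**2, i.e. f'(x) = 2x, at a few points.

def f_prime(x):
    return 2 * x

print(f_prime(-1.0))  # -2.0 → f is decreasing at x = -1
print(f_prime(0.0))   #  0.0 → f is flat at x = 0 (here, a minimum)
print(f_prime(2.0))   #  4.0 → f is increasing at x = 2
```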

Common derivatives

| Function $f(x)$ | Derivative $f'(x)$ |
| --- | --- |
| $c$ (constant) | $0$ |
| $x^n$ | $nx^{n-1}$ |
| $e^x$ | $e^x$ |
| $\ln x$ | $1/x$ |
| $\sin x$ | $\cos x$ |
| $\sigma(x) = \frac{1}{1+e^{-x}}$ | $\sigma(x)(1 - \sigma(x))$ |

The sigmoid derivative is particularly important in ML — it appears in backpropagation whenever sigmoid activations are used.
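The identity $\sigma'(x) = \sigma(x)(1 - \sigma(x))$ can be verified against a finite-difference approximation; the helper names below are mine, not from the lesson:

```python
import math

# Check sigma'(x) = sigma(x) * (1 - sigma(x)) against a numerical derivative.

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)  # the closed-form derivative from the table

x, h = 0.5, 1e-6
numeric = (sigmoid(x + h) - sigmoid(x)) / h
print(abs(numeric - sigmoid_grad(x)) < 1e-5)  # True: the identity holds
```

This reuse of $\sigma(x)$ itself is why the sigmoid derivative is cheap to compute during backpropagation: the forward-pass output is all you need.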

Key rules

Sum rule: $\frac{d}{dx}[f(x) + g(x)] = f'(x) + g'(x)$

Product rule: $\frac{d}{dx}[f(x)g(x)] = f'(x)g(x) + f(x)g'(x)$

Chain rule: $\frac{d}{dx}[f(g(x))] = f'(g(x)) \cdot g'(x)$ (covered fully in its own lesson).
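As a sanity check, the product rule can be confirmed numerically. The example pair $f(x) = x^2$, $g(x) = \sin x$ is an illustrative choice:

```python
import math

# Numerically sanity-check the product rule on f(x) = x**2, g(x) = sin(x).

def product(x):
    return x ** 2 * math.sin(x)

def product_rule(x):
    # f'(x)g(x) + f(x)g'(x), with f'(x) = 2x and g'(x) = cos(x)
    return 2 * x * math.sin(x) + x ** 2 * math.cos(x)

x, h = 1.2, 1e-6
numeric = (product(x + h) - product(x)) / h
print(abs(numeric - product_rule(x)) < 1e-4)  # True
```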

Finding minima

A critical property: at a local minimum or maximum, the derivative is zero ($f'(x) = 0$). The function is momentarily flat. This is the mathematical foundation for optimization: to find where a function is minimized, find where its derivative is zero.

For a function of one variable, you check the sign of the second derivative $f''(x)$ to distinguish minima from maxima:

  • $f''(x) > 0$: local minimum (the function is concave up, like a bowl).
  • $f''(x) < 0$: local maximum (concave down, like a hill).
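The second-derivative test can be sketched numerically with a central-difference approximation of $f''(x)$. The bowl $f(x) = x^2$ and the hill $g(x) = -x^2$, both flat at $x = 0$, are example functions chosen for illustration:

```python
# Second-derivative test at a flat point, using a central-difference estimate.

def second_derivative(f, x, h=1e-4):
    """Central-difference approximation of f''(x)."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / h ** 2

bowl = lambda x: x ** 2    # f'(0) = 0 and f''(0) = 2 > 0
hill = lambda x: -x ** 2   # f'(0) = 0 and f''(0) = -2 < 0

print(second_derivative(bowl, 0.0) > 0)  # True → local minimum
print(second_derivative(hill, 0.0) < 0)  # True → local maximum
```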

In ML, we do not solve $f'(x) = 0$ analytically because the functions are too complex. Instead, we use gradient descent: repeatedly step in the direction opposite the derivative (downhill) until the derivative is near zero.
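A minimal gradient-descent sketch makes this concrete. The objective $f(x) = (x - 3)^2$, the learning rate, and the step count are all illustrative choices:

```python
# Minimal gradient descent on f(x) = (x - 3)**2, whose minimum is at x = 3.

def grad(x):
    return 2 * (x - 3)  # derivative of (x - 3)**2

x = 0.0     # starting point
lr = 0.1    # learning rate (step size)
for _ in range(100):
    x -= lr * grad(x)  # step opposite the derivative (downhill)

print(round(x, 4))  # 3.0 — converged to the minimizer
```

Each update shrinks the distance to the minimum by a constant factor here; for real loss functions the same rule is applied to the gradient, the vector of partial derivatives.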