Expectation & variance


Summarizing a distribution

A probability distribution tells you everything about a random variable — but sometimes you just want a few key numbers that summarize its behavior. The two most important summaries are the expected value (where the distribution is centered) and the variance (how spread out it is).

Expected value

The expected value E[X] — also called the mean or expectation — is the long-run average value of X over many repetitions.

For a discrete random variable:

E[X] = \sum_x x \cdot P(X = x)

For a continuous random variable:

E[X] = \int_{-\infty}^{\infty} x \cdot f(x)\, dx

Example: a game pays $10 with probability 0.2, $0 with probability 0.5, and −$5 with probability 0.3. The expected payout is:

E[X] = 10(0.2) + 0(0.5) + (-5)(0.3) = 2 + 0 - 1.5 = \$0.50

On average, the game pays 50 cents per play.
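The discrete formula is easy to check by hand or in code. A minimal Python sketch, using the game's payouts from the example above (the `pmf` dict is just an illustrative encoding of that distribution):

```python
# The game's distribution: $10 w.p. 0.2, $0 w.p. 0.5, -$5 w.p. 0.3
pmf = {10: 0.2, 0: 0.5, -5: 0.3}

# E[X] = sum over outcomes of x * P(X = x)
expected = sum(x * p for x, p in pmf.items())
print(expected)  # 0.5
```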

Properties of expectation

Expectation is linear, which gives two rules that make calculations much easier:

E[aX + b] = aE[X] + b

E[X + Y] = E[X] + E[Y]

The second rule holds whether or not X and Y are independent. This is surprisingly powerful — you can break complex expectations into simpler pieces.
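Both rules can be verified directly on small distributions. In this sketch, `a` and `b` are arbitrary constants chosen for illustration, and `joint` is a made-up joint pmf in which Y clearly depends on X — yet additivity still holds:

```python
pmf = {10: 0.2, 0: 0.5, -5: 0.3}  # the game from the example above
a, b = 3, 7                       # arbitrary illustrative constants

E_X = sum(x * p for x, p in pmf.items())
E_aXb = sum((a * x + b) * p for x, p in pmf.items())  # E[aX + b] computed directly
print(E_aXb, a * E_X + b)  # both 8.5

# Additivity needs no independence: in this made-up joint pmf,
# Y = 1 only when X = 1, yet E[X + Y] = E[X] + E[Y] still holds.
joint = {(0, 0): 0.4, (1, 0): 0.3, (1, 1): 0.3}
EX = sum(x * p for (x, y), p in joint.items())
EY = sum(y * p for (x, y), p in joint.items())
E_sum = sum((x + y) * p for (x, y), p in joint.items())
print(E_sum, EX + EY)  # both 0.9
```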

Variance

The variance \text{Var}(X) measures how much X tends to deviate from its mean. It is the expected squared deviation:

\text{Var}(X) = E\left[(X - E[X])^2\right] = E[X^2] - (E[X])^2

The second formula — often easier to compute — says: variance equals the mean of the square minus the square of the mean.

Example (continued): for the game above, E[X] = 0.5.

E[X^2] = 100(0.2) + 0(0.5) + 25(0.3) = 20 + 0 + 7.5 = 27.5

\text{Var}(X) = 27.5 - (0.5)^2 = 27.5 - 0.25 = 27.25
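Both forms of the variance formula give the same answer, which is easy to confirm on the game's pmf (a minimal sketch using the same distribution as above):

```python
pmf = {10: 0.2, 0: 0.5, -5: 0.3}
E_X = sum(x * p for x, p in pmf.items())

# Definition: expected squared deviation from the mean
var_def = sum((x - E_X) ** 2 * p for x, p in pmf.items())

# Shortcut: E[X^2] - (E[X])^2
E_X2 = sum(x ** 2 * p for x, p in pmf.items())
var_short = E_X2 - E_X ** 2

print(var_def, var_short)  # both 27.25
```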

Standard deviation

The standard deviation \sigma = \sqrt{\text{Var}(X)} is the square root of variance, returning units to the same scale as X. For the game: \sigma = \sqrt{27.25} \approx 5.22.

The standard deviation is more interpretable than variance: it tells you roughly how far a typical outcome strays from the mean.
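In code this is a one-liner on top of the variance already computed for the game:

```python
import math

variance = 27.25              # Var(X) for the game, from the example above
sigma = math.sqrt(variance)   # back to dollar units
print(round(sigma, 2))  # 5.22
```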

Properties of variance

\text{Var}(aX + b) = a^2\,\text{Var}(X)

Adding a constant shifts the distribution but does not change its spread. Multiplying by a scales the spread by |a| — and because variance is in squared units, it scales by a^2.

For independent X and Y:

\text{Var}(X + Y) = \text{Var}(X) + \text{Var}(Y)

Note: this only holds when X and Y are independent. In general, \text{Var}(X + Y) = \text{Var}(X) + \text{Var}(Y) + 2\,\text{Cov}(X, Y).
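Both variance properties can be checked numerically. A sketch under stated assumptions: `var` is a small helper defined here (not from any library), the scaling check reuses the game's pmf with arbitrary constants a = 3, b = 7, and the independence check uses a fair six-sided die as an added example:

```python
pmf = {10: 0.2, 0: 0.5, -5: 0.3}  # the game from the example above

def var(dist):
    """Variance of a discrete pmf given as {outcome: probability}."""
    mean = sum(x * p for x, p in dist.items())
    return sum((x - mean) ** 2 * p for x, p in dist.items())

# Var(aX + b) = a^2 Var(X): transform each outcome, probabilities unchanged
a, b = 3, 7
pmf_ab = {a * x + b: p for x, p in pmf.items()}
print(var(pmf_ab), a ** 2 * var(pmf))  # both ~245.25

# Var(X + Y) = Var(X) + Var(Y) for independent X, Y:
# a fair die (illustrative), with the joint pmf built as a product
die = {x: 1 / 6 for x in range(1, 7)}
joint = {(x, y): px * py for x, px in die.items() for y, py in die.items()}
pmf_sum = {}
for (x, y), p in joint.items():
    pmf_sum[x + y] = pmf_sum.get(x + y, 0.0) + p
print(var(pmf_sum), 2 * var(die))  # both ~5.833 (= 35/6)
```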

Higher moments

Expectation is the first moment and variance is the second central moment. Two higher moments are worth knowing:

  • Skewness (standardized third moment): measures asymmetry. Positive skew means a longer right tail; negative skew means a longer left tail.
  • Kurtosis (standardized fourth moment): measures tail heaviness. High kurtosis means extreme values are more common than a normal distribution would predict.
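Both are standardized moments: expectations of powers of (X − μ)/σ. A sketch with a hypothetical helper (`standardized_moment` is a name introduced here, not a library function), applied to the game's pmf from earlier:

```python
def standardized_moment(dist, k):
    """k-th standardized moment E[((X - mu)/sigma)^k] of a discrete pmf."""
    mu = sum(x * p for x, p in dist.items())
    variance = sum((x - mu) ** 2 * p for x, p in dist.items())
    sigma = variance ** 0.5
    return sum(((x - mu) / sigma) ** k * p for x, p in dist.items())

pmf = {10: 0.2, 0: 0.5, -5: 0.3}  # the game from the example above
skew = standardized_moment(pmf, 3)
kurt = standardized_moment(pmf, 4)
print(skew)  # ~0.854: positive, so a longer right tail (toward the $10 payout)
print(kurt)  # ~2.56: below 3, so lighter tails than a normal distribution
```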