Introduction to Probability (Part 3)

Expectation and variance explained clearly. Learn how probability distributions are summarised, how averages differ from spread, and why these concepts matter in statistics, ML, and probabilistic modelling.

Updated Feb 2026
4 min

Expectation and Variance

So far, probability distributions have helped us describe uncertainty in detail. They tell us how likely different outcomes are, and how probability is spread across possible values. In practice, however, we often want something simpler.

Instead of using the full distribution, we often use a few summary quantities. Expectation and variance are the most common.

Expectation

The expectation of a random variable is its probability-weighted average value. It tells us the long-run average outcome if we repeat the random process many times.

For a discrete random variable X with distribution P(X = x), the expectation is defined as

\mathbb{E}[X] = \sum_x x \, P(x).

Each value of X contributes to the expectation according to its probability. Values with higher probability affect the average more.
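To make this concrete, here is a minimal Python sketch of the weighted sum; the fair six-sided die is chosen purely as an illustration.

```python
# Expectation of a discrete random variable: a probability-weighted sum.
# Example: a fair six-sided die, where E[X] = sum_x x * P(x) = 3.5.
values = [1, 2, 3, 4, 5, 6]
probs = [1 / 6] * 6  # each outcome equally likely

expectation = sum(x * p for x, p in zip(values, probs))
print(expectation)  # 3.5
```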

For a continuous random variable with density p(x), we use an integral instead of a sum.

\mathbb{E}[X] = \int x \, p(x) \, dx.

In both cases, expectation is the long-run average value from repeated trials. It is not always the most likely value; it is the value the sample average settles towards over many repetitions.
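This long-run-average reading is easy to check by simulation. A minimal sketch, assuming NumPy is available, with a uniform distribution on [0, 1] (expectation 0.5) chosen purely as an example:

```python
import numpy as np

rng = np.random.default_rng(0)

# The sample mean over many repetitions approaches the expectation.
# For X ~ Uniform(0, 1), E[X] = 0.5.
samples = rng.uniform(0.0, 1.0, size=1_000_000)
print(samples.mean())  # close to 0.5
```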

Expectation of functions

We can also take the expectation of a function of a random variable. If g is a function and X is a random variable, then g(X) has its own expectation. For a discrete X, it is computed by weighting g(x) rather than x by each probability,

\mathbb{E}[g(X)] = \sum_x g(x) \, P(x).

For example, we can consider \mathbb{E}[X^2] or \mathbb{E}[\exp(X)]. This idea will be important later, especially when working with variance.
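Continuing the die example, here is a short sketch of this weighting for g(x) = x^2:

```python
# Expectation of a function of a random variable:
# E[g(X)] = sum_x g(x) * P(x) for discrete X.
values = [1, 2, 3, 4, 5, 6]
probs = [1 / 6] * 6

e_x = sum(x * p for x, p in zip(values, probs))       # E[X] = 3.5
e_x2 = sum(x**2 * p for x, p in zip(values, probs))   # E[X^2] = 91/6 ≈ 15.17
print(e_x, e_x2)
print(e_x2 != e_x**2)  # True: E[X^2] is not (E[X])^2 in general
```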

Linearity of expectation

One of the most useful properties of expectation is linearity. For any random variables X and Y, and constants a and b,

\mathbb{E}[aX + bY] = a\,\mathbb{E}[X] + b\,\mathbb{E}[Y].

This holds even if X and Y are not independent. Linearity lets us break problems into smaller parts.

Note that expectation behaves differently from probability here: the probability of a joint event depends on how the variables are related, but the expectation of a sum does not.
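A quick simulation makes this concrete. A minimal sketch, assuming NumPy; here Y is deliberately constructed to depend on X:

```python
import numpy as np

rng = np.random.default_rng(0)

# X ~ Uniform(0, 1); Y = X**2 depends strongly on X.
x = rng.uniform(0.0, 1.0, size=1_000_000)
y = x**2
a, b = 2.0, 3.0

# Linearity holds regardless of the dependence between X and Y.
lhs = (a * x + b * y).mean()        # estimate of E[aX + bY]
rhs = a * x.mean() + b * y.mean()   # a E[X] + b E[Y]
print(lhs, rhs)  # both close to 2 * 0.5 + 3 * (1/3) = 2.0
```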

Variance

While expectation tells us the average value of a random variable, it does not tell us how variable the outcomes are. Two distributions can have the same expectation, but behave very differently. The variance of a random variable measures how far values typically deviate from the expectation.

Formally, the variance of X is defined as

\mathrm{Var}[X] = \mathbb{E}\big[(X - \mathbb{E}[X])^2\big].

This definition can be read as determining how far X is from its mean, squaring that deviation, and then averaging the squared deviations. Squaring ensures that positive and negative deviations do not cancel each other out, and that larger deviations count more heavily.

Another common formula for variance is

\mathrm{Var}[X] = \mathbb{E}[X^2] - \big(\mathbb{E}[X]\big)^2.

Both formulas describe the same quantity: expanding the square in the first definition and applying linearity of expectation gives the second.
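Both forms are easy to verify numerically. A sketch using the die distribution from earlier:

```python
values = [1, 2, 3, 4, 5, 6]
probs = [1 / 6] * 6

e_x = sum(x * p for x, p in zip(values, probs))  # E[X] = 3.5

# Definition: average squared deviation from the mean.
var_def = sum((x - e_x) ** 2 * p for x, p in zip(values, probs))

# Alternative form: E[X^2] - (E[X])^2.
e_x2 = sum(x**2 * p for x, p in zip(values, probs))
var_alt = e_x2 - e_x**2

print(var_def, var_alt)  # both 35/12 ≈ 2.9167
```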

Standard deviation

Variance is measured in the square of the variable's units, which is often hard to interpret. We usually use the standard deviation instead, which is written as

\sigma_X = \sqrt{\mathrm{Var}[X]}.

The standard deviation has the same units as the random variable itself and can be interpreted as a typical scale of deviation from the mean.
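Continuing the die example, whose variance is 35/12, a one-line check using Python's standard math module:

```python
import math

# Standard deviation of a fair die: square root of the variance above.
variance = 35 / 12
sigma = math.sqrt(variance)
print(sigma)  # ≈ 1.708, in the same units as the die faces
```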

Variance and independence

Variance does not behave as simply as expectation when combining variables. In general,

\mathrm{Var}[X + Y] \neq \mathrm{Var}[X] + \mathrm{Var}[Y].

However, if X and Y are independent, then

\mathrm{Var}[X + Y] = \mathrm{Var}[X] + \mathrm{Var}[Y].

This distinction is important in modelling and inference. Independence affects how uncertainty adds up.
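A simulation illustrates the difference. A minimal sketch, assuming NumPy, with standard normal variables chosen purely as an example:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Independent case: Var[X + Y] = Var[X] + Var[Y].
x = rng.normal(0.0, 1.0, size=n)
y = rng.normal(0.0, 1.0, size=n)
print((x + y).var(), x.var() + y.var())  # both close to 2

# Dependent case: Y = X, so Var[X + Y] = Var[2X] = 4 Var[X].
y_dep = x
print((x + y_dep).var(), x.var() + y_dep.var())  # ≈ 4 vs ≈ 2
```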

Why expectation and variance matter

Expectation and variance provide a compact summary of a distribution. Expectation captures where probability is centred, while variance captures how spread out it is.

These quantities help us compare distributions, reason about uncertainty, and make predictions without using the full probability model.