Introduction to Probability (Part 3)

Expectation and variance explained clearly. Learn how probability distributions are summarised, how averages differ from spread, and why these concepts matter in statistics, ML, and probabilistic modelling.

Updated Feb 2026
4 min

Expectation and Variance

So far, probability distributions have helped us describe uncertainty in detail. They tell us how likely different outcomes are, and how probability is spread across possible values. In practice, however, we often want something simpler.

Instead of using the full distribution, we often use a few summary quantities. Expectation and variance are the most common.

Expectation

The expectation of a random variable is its probability-weighted average value. It tells us the long-run average outcome if we repeat the random process many times.

For a discrete random variable X with distribution P(X = x), the expectation is defined as

\mathbb{E}[X] = \sum_x x \, P(x).

Each value of X contributes to the expectation according to its probability. Values with higher probability affect the average more.
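To make this concrete, here is a minimal Python sketch of the weighted sum; the fair six-sided die is chosen purely as an illustration.

```python
# Expectation of a discrete random variable: a probability-weighted sum.
# Example: a fair six-sided die, where E[X] = sum_x x * P(x) = 3.5.
values = [1, 2, 3, 4, 5, 6]
probs = [1 / 6] * 6  # each outcome equally likely

expectation = sum(x * p for x, p in zip(values, probs))
print(expectation)  # 3.5
```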

For a continuous random variable with density p(x), we use an integral instead of a sum.

\mathbb{E}[X] = \int x \, p(x) \, dx.

In both cases, expectation is the long-run average value from repeated trials. It is not always the most likely value; it is the value the sample average settles towards over many repetitions.
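This long-run-average reading is easy to check by simulation. A minimal sketch, assuming NumPy is available, with a uniform distribution on [0, 1] (expectation 0.5) chosen purely as an example:

```python
import numpy as np

rng = np.random.default_rng(0)

# The sample mean over many repetitions approaches the expectation.
# For X ~ Uniform(0, 1), E[X] = 0.5.
samples = rng.uniform(0.0, 1.0, size=1_000_000)
print(samples.mean())  # close to 0.5
```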

Expectation of functions

We can also take the expectation of a function of a random variable. If g is a function and X is a random variable, then g(X) has its own expectation. For a discrete X, it is computed by weighting g(x) rather than x by each probability,

\mathbb{E}[g(X)] = \sum_x g(x) \, P(x).

For example, we can consider \mathbb{E}[X^2] or \mathbb{E}[\exp(X)]. This idea will be important later, especially when working with variance.
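Continuing the die example, here is a short sketch of this weighting for g(x) = x^2:

```python
# Expectation of a function of a random variable:
# E[g(X)] = sum_x g(x) * P(x) for discrete X.
values = [1, 2, 3, 4, 5, 6]
probs = [1 / 6] * 6

e_x = sum(x * p for x, p in zip(values, probs))       # E[X] = 3.5
e_x2 = sum(x**2 * p for x, p in zip(values, probs))   # E[X^2] = 91/6 ≈ 15.17
print(e_x, e_x2)
print(e_x2 != e_x**2)  # True: E[X^2] is not (E[X])^2 in general
```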

Linearity of expectation

One of the most useful properties of expectation is linearity. For any random variables X and Y, and constants a and b,

\mathbb{E}[aX + bY] = a\,\mathbb{E}[X] + b\,\mathbb{E}[Y].

This holds even if X and Y are not independent. Linearity lets us break problems into smaller parts.

Note that expectation behaves differently from probability here: the probability of a joint event depends on how the variables are related, but the expectation of a sum does not.
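A quick simulation makes this concrete. A minimal sketch, assuming NumPy; here Y is deliberately constructed to depend on X:

```python
import numpy as np

rng = np.random.default_rng(0)

# X ~ Uniform(0, 1); Y = X**2 depends strongly on X.
x = rng.uniform(0.0, 1.0, size=1_000_000)
y = x**2
a, b = 2.0, 3.0

# Linearity holds regardless of the dependence between X and Y.
lhs = (a * x + b * y).mean()        # estimate of E[aX + bY]
rhs = a * x.mean() + b * y.mean()   # a E[X] + b E[Y]
print(lhs, rhs)  # both close to 2 * 0.5 + 3 * (1/3) = 2.0
```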

Variance

While expectation tells us the average value of a random variable, it does not tell us how variable the outcomes are. Two distributions can have the same expectation, but behave very differently. The variance of a random variable measures how far values typically deviate from the expectation.

Formally, the variance of X is defined as

\mathrm{Var}[X] = \mathbb{E}\big[(X - \mathbb{E}[X])^2\big].

This definition can be read as determining how far X is from its mean, squaring that deviation, and then averaging the squared deviations. Squaring ensures that positive and negative deviations do not cancel each other out, and that larger deviations count more heavily.

Another common formula for variance is

\mathrm{Var}[X] = \mathbb{E}[X^2] - \big(\mathbb{E}[X]\big)^2.

Both formulas describe the same quantity: expanding the square in the first definition and applying linearity of expectation gives the second.
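Both forms are easy to verify numerically. A sketch using the die distribution from earlier:

```python
values = [1, 2, 3, 4, 5, 6]
probs = [1 / 6] * 6

e_x = sum(x * p for x, p in zip(values, probs))  # E[X] = 3.5

# Definition: average squared deviation from the mean.
var_def = sum((x - e_x) ** 2 * p for x, p in zip(values, probs))

# Alternative form: E[X^2] - (E[X])^2.
e_x2 = sum(x**2 * p for x, p in zip(values, probs))
var_alt = e_x2 - e_x**2

print(var_def, var_alt)  # both 35/12 ≈ 2.9167
```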

Standard deviation

Variance is measured in the square of the variable's units, which is often hard to interpret. We usually use the standard deviation instead, which is written as

\sigma_X = \sqrt{\mathrm{Var}[X]}.

The standard deviation has the same units as the random variable itself and can be interpreted as a typical scale of deviation from the mean.
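Continuing the die example, whose variance is 35/12, a one-line check using Python's standard math module:

```python
import math

# Standard deviation of a fair die: square root of the variance above.
variance = 35 / 12
sigma = math.sqrt(variance)
print(sigma)  # ≈ 1.708, in the same units as the die faces
```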

Variance and independence

Variance does not behave as simply as expectation when combining variables. In general,

\mathrm{Var}[X + Y] \neq \mathrm{Var}[X] + \mathrm{Var}[Y].

However, if X and Y are independent, then

\mathrm{Var}[X + Y] = \mathrm{Var}[X] + \mathrm{Var}[Y].

This distinction is important in modelling and inference. Independence affects how uncertainty adds up.
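A simulation illustrates the difference. A minimal sketch, assuming NumPy, with standard normal variables chosen purely as an example:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Independent case: Var[X + Y] = Var[X] + Var[Y].
x = rng.normal(0.0, 1.0, size=n)
y = rng.normal(0.0, 1.0, size=n)
print((x + y).var(), x.var() + y.var())  # both close to 2

# Dependent case: Y = X, so Var[X + Y] = Var[2X] = 4 Var[X].
y_dep = x
print((x + y_dep).var(), x.var() + y_dep.var())  # ≈ 4 vs ≈ 2
```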

Why expectation and variance matter

Expectation and variance provide a compact summary of a distribution. Expectation captures where probability is centred, while variance captures how spread out it is.

These quantities help us compare distributions, reason about uncertainty, and make predictions without using the full probability model.