Expectation, Variance & Moments

From a whole distribution to one number

In the last guide a random variable gave us a whole distribution — the complete list of what can happen and how likely each outcome is. That picture is honest but unwieldy. A pricing committee cannot reason with a hundred-bar histogram; they want a number. The art of this guide is squeezing a distribution down to a few honest summaries without quietly throwing away the truth that matters.

The first and most famous summary is the expected value, written E[X]. Forget the formula for a moment and hold the picture: if you could replay the random experiment forever, the expected value is the long-run average of the outcomes. It is the balance point of the distribution — the spot where the histogram would sit level on a fingertip. For an actuary this is not a curiosity; it is the seed of every premium.

Expectation: the long-run average

Computing expectation is gentler than it sounds: take each possible value, weight it by how likely it is, and add the pieces up. The likely outcomes pull the average toward themselves; rare outcomes barely tug. Roll one fair die. Each face 1 through 6 has probability one-sixth, so the expected value is (1+2+3+4+5+6)/6 = 3.5. Notice the punchline: 3.5 is a number the die can never actually show. The expectation is a long-run average, not a prediction of any single roll.
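The weight-and-add recipe can be sketched in a few lines of Python. This is only an illustration: the helper name expected_value and the dictionary layout (outcome mapped to probability) are assumptions made here, not anything prescribed by the guide.

```python
from fractions import Fraction

def expected_value(distribution):
    """Probability-weighted sum of outcomes for a discrete distribution.

    `distribution` maps each possible outcome to its probability
    (a hypothetical layout chosen for this sketch).
    """
    return sum(value * prob for value, prob in distribution.items())

# One fair die: faces 1..6, each with probability 1/6.
# Fractions keep the arithmetic exact instead of using floats.
fair_die = {face: Fraction(1, 6) for face in range(1, 7)}

die_mean = expected_value(fair_die)
print(die_mean)         # 7/2
print(float(die_mean))  # 3.5
```

Using exact fractions makes the punchline visible in the output: the answer is 7/2, a value that is not a key of fair_die.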

Now bend the die into an insurance shape. Imagine a one-year policy with a 5% chance of a claim, where any claim is always exactly 2,000; otherwise the loss is zero. The expected loss is 0.05 × 2,000 + 0.95 × 0 = 100. That 100 is the pure premium: the expected cost of the risk before any expenses or profit margin. Almost everything an actuary prices begins as an expected value of some carefully chosen random variable.
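To see the "replay the experiment forever" picture behind that 100, here is a small simulation sketch. The function name simulate_average_loss, the seed value, and the chosen numbers of years are all arbitrary assumptions for illustration; only the 5% probability and the 2,000 claim size come from the example above.

```python
import random

CLAIM_PROBABILITY = 0.05  # from the policy in the text
CLAIM_SIZE = 2_000        # from the policy in the text

def simulate_average_loss(n_years, seed=12345):
    """Average annual loss over n_years simulated policy years."""
    rng = random.Random(seed)  # fixed, arbitrary seed so runs are repeatable
    total = 0
    for _ in range(n_years):
        # Each year is an independent 5% chance of a 2,000 claim.
        if rng.random() < CLAIM_PROBABILITY:
            total += CLAIM_SIZE
    return total / n_years

# The probability-weighted calculation from the text.
pure_premium = CLAIM_PROBABILITY * CLAIM_SIZE + (1 - CLAIM_PROBABILITY) * 0

for n_years in (100, 10_000, 1_000_000):
    print(n_years, simulate_average_loss(n_years))
print("pure premium:", pure_premium)
```

With only 100 years the simulated average can sit well away from 100; as the number of years grows it should settle close to the pure premium. No single simulated year ever shows 100 itself.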

Variance: the spread the actuary actually fears

The expected value tells you where the distribution sits, but says nothing about how wildly it swings. Two risks can share an expected loss of 100 yet feel utterly different: a book of small dental claims that lands near 100 every year, and a single satellite policy that pays nothing most years but occasionally explodes into millions. The average is identical; the danger is not. Capturing that danger is the job of the variance.

The variance measures spread by asking: on average, how far does an outcome land from the expected value? We look at each gap from the mean, square it (so that a shortfall and an overshoot both count as distance rather than cancelling, and so big gaps are punished harder than small ones), and then average those squared gaps. Because we squared, the units are squared too, and dollars-squared is meaningless to a human, so we take the square root and call it the standard deviation. That brings us back to honest dollars: a typical distance from the average.
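The gap, square, average, square-root sequence can be sketched as follows, using the fair die from earlier. The helper names are hypothetical, and the distribution is again assumed to be a dictionary from outcome to probability.

```python
import math

def expected_value(distribution):
    """Probability-weighted average of the outcomes."""
    return sum(value * prob for value, prob in distribution.items())

def variance(distribution):
    """Probability-weighted average of squared gaps from the mean."""
    mean = expected_value(distribution)
    return sum(prob * (value - mean) ** 2 for value, prob in distribution.items())

def standard_deviation(distribution):
    """Square root of the variance, back in the original units."""
    return math.sqrt(variance(distribution))

fair_die = {face: 1 / 6 for face in range(1, 7)}

print(variance(fair_die))            # about 2.9167 (exactly 35/12)
print(standard_deviation(fair_die))  # about 1.7078
```

A standard deviation near 1.7 around a mean of 3.5 matches intuition: a typical roll lands a face or two away from the centre.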

Back to the 5%-of-2,000 policy. Its expected loss is 100, but its standard deviation works out near 436, more than four times the average. That single number screams the truth a price tag of 100 hides: in any given year you pay either 0 or 2,000, never anything near 100. The whole reason insurance is hard, and the whole reason actuaries hold capital, is that real risks have large spread around a modest mean.

Higher moments: skewness and the tail

Expectation and variance are the first two of a whole family. The pattern that produces them keeps going: take a power of the outcome, or of its deviation from the mean, and average it. These are the moments of a distribution. The first moment, taken about zero, locates it (the mean); the second, taken about the mean, describes its spread (the variance); and the third and fourth start to describe its shape.

The third moment, scaled, gives skewness — whether the distribution leans. Insurance losses are almost always right-skewed: a long thin tail of rare, enormous claims stretching to the right of a hump of ordinary ones. The fourth gives kurtosis, a measure of how heavy that tail is — how much probability hides far from the centre. For an actuary these are not academic. Skewness and a fat tail mean the genuine danger lives precisely where the mean and even the variance look reassuringly calm.
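As a rough sketch of what "scaled" means here, the snippet below divides the third and fourth central moments by the matching power of the standard deviation. It assumes the common convention in which kurtosis is reported without subtracting 3; the helper names are hypothetical. It compares the symmetric fair die with the 5%-of-2,000 policy.

```python
import math

def central_moment(distribution, k):
    """Average of the k-th power of the deviation from the mean."""
    mean = sum(value * prob for value, prob in distribution.items())
    return sum(prob * (value - mean) ** k for value, prob in distribution.items())

def skewness(distribution):
    """Third central moment divided by the standard deviation cubed."""
    return central_moment(distribution, 3) / central_moment(distribution, 2) ** 1.5

def kurtosis(distribution):
    """Fourth central moment divided by the variance squared (no -3 adjustment)."""
    return central_moment(distribution, 4) / central_moment(distribution, 2) ** 2

fair_die = {face: 1 / 6 for face in range(1, 7)}
policy = {0: 0.95, 2_000: 0.05}

print(skewness(fair_die), kurtosis(fair_die))  # about 0 and about 1.73
print(skewness(policy), kurtosis(policy))      # about 4.13 and about 18.05
```

The die is symmetric, so its skewness is essentially zero. The policy leans hard to the right, and its kurtosis is roughly ten times that of the die, which is the numerical trace of the rare 2,000 claim.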

This is the honest warning to carry forward. A summary number is a lossy compression of reality. Quote only the mean and you have hidden the spread; quote mean and variance and you may still have hidden a monstrous tail. A catastrophe portfolio can post a friendly average and a tame observed variance for nineteen quiet years, and bankrupt its insurer in the twentieth. Moments are tools, not truth, and the higher you climb, the more they whisper rather than shout.

The moment generating function: one function, all the moments

If the moments matter, it would be handy to have one object that carries them all at once. That object is the moment generating function, or MGF, written M(t) = E[e^(tX)]. Do not be intimidated by the exponent. The MGF is best thought of as a fingerprint: a single function of a helper variable t that, provided it is finite for t in some interval around zero, uniquely identifies the whole distribution. Every moment can then be extracted by a mechanical recipe: differentiate M(t) k times with respect to t, set t = 0, and the result is E[X^k].
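As a worked check of that recipe, which is to differentiate M(t) and evaluate at t = 0, take the 5%-of-2,000 policy from earlier. The derivation below is a sketch for that one example; it recovers the same mean of 100 found above and a variance whose square root is the standard deviation of about 436.

```latex
\begin{aligned}
M(t)   &= E\!\left[e^{tX}\right] = 0.95\,e^{0} + 0.05\,e^{2000t} = 0.95 + 0.05\,e^{2000t} \\
M'(t)  &= 0.05 \cdot 2000\,e^{2000t}   &&\Rightarrow\; M'(0)  = 100 = E[X] \\
M''(t) &= 0.05 \cdot 2000^{2}\,e^{2000t} &&\Rightarrow\; M''(0) = 200{,}000 = E[X^{2}] \\
\operatorname{Var}[X] &= E[X^{2}] - \left(E[X]\right)^{2} = 200{,}000 - 100^{2} = 190{,}000
\end{aligned}
```

Each extra derivative pulls down one more factor of 2,000 from the exponent, which is why the k-th derivative at zero delivers E[X^k].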

Two of its powers make the MGF genuinely useful rather than a party trick. First, because it is a fingerprint, if two random variables share the same MGF they have the same distribution — a clean way to prove what something is. Second, and the reason actuaries love it: the MGF of a sum of independent risks is just the product of their MGFs. Adding up a thousand independent policies, which would be a nightmare with histograms, becomes a multiplication. This is the engine behind several of the named distributions you will meet next, and behind the central limit theorem at the end of this rung.
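The product rule for independent risks can be checked numerically on a tiny case: two independent copies of the 5%-of-2,000 policy. This is a sketch under stated assumptions; the helper names, the dictionary layout, and the test point t = 0.0001 are all arbitrary choices made for illustration.

```python
import math
from itertools import product

policy = {0: 0.95, 2_000: 0.05}

def mgf(distribution, t):
    """M(t) = E[e^(tX)] for a discrete distribution."""
    return sum(prob * math.exp(t * value) for value, prob in distribution.items())

def sum_of_independent(dist_a, dist_b):
    """Distribution of A + B for independent A and B (the 'histogram' route)."""
    total = {}
    for (a, prob_a), (b, prob_b) in product(dist_a.items(), dist_b.items()):
        # Independence lets us multiply the two probabilities.
        total[a + b] = total.get(a + b, 0.0) + prob_a * prob_b
    return total

two_policies = sum_of_independent(policy, policy)

t = 0.0001  # arbitrary small test point
mgf_of_sum = mgf(two_policies, t)       # built from the combined distribution
product_of_mgfs = mgf(policy, t) ** 2   # built by multiplying the two MGFs

print(two_policies)
print(mgf_of_sum, product_of_mgfs)  # the two values agree up to rounding
```

For two policies the histogram route is still easy. For a thousand policies, repeatedly combining distributions becomes laborious, while the MGF of the total is simply the single-policy MGF raised to the thousandth power.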

Policy X:  P(loss=0)=0.95,  P(loss=2000)=0.05
E[X]      = 0.05*2000              = 100        (pure premium)
Var[X]    = 0.05*(2000-100)^2 + 0.95*(0-100)^2 = 190000
SD[X]     = sqrt(190000)           ~ 436        (>4x the mean!)
MGF: M(t) = 0.95 + 0.05*e^(2000t)

The tiny claim example in numbers: a modest mean of 100 hides a standard deviation over four times larger — exactly the spread that makes the risk worth insuring.

Putting it to work

These three summaries are not a hierarchy of importance — they answer different questions, and a good actuary keeps all of them on the table at once. Here is the order of thinking they tend to fall into in practice.

Find the expected value — the pure premium, the long-run average cost. This answers: what does this risk cost on average?
Find the variance and standard deviation — the spread. This answers: how badly could a single year, or this whole portfolio, deviate from that average? This drives how much capital must stand behind the promise.
Check the shape — skewness and the tail. This answers: is the real danger hidden far out where the average can't see it? If so, the mean and variance alone are dangerously comforting.
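The three steps above can be sketched as one small routine. The two risks below are invented purely for illustration (they are not data from any real book of business); they are built to share an expected loss of 100, echoing the steady-book versus rare-catastrophe contrast drawn earlier.

```python
import math

def summarize(distribution):
    """Return mean, standard deviation and skewness of a discrete distribution."""
    # Step 1: expected value (the pure premium).
    mean = sum(value * prob for value, prob in distribution.items())
    # Step 2: variance and standard deviation (the spread).
    var = sum(prob * (value - mean) ** 2 for value, prob in distribution.items())
    sd = math.sqrt(var)
    # Step 3: shape, via the scaled third central moment.
    third = sum(prob * (value - mean) ** 3 for value, prob in distribution.items())
    skew = third / sd ** 3 if sd > 0 else 0.0
    return {"mean": mean, "sd": sd, "skewness": skew}

# Hypothetical risks, both with an expected loss of 100.
steady_book = {80: 0.25, 100: 0.50, 120: 0.25}
rare_catastrophe = {0: 0.9999, 1_000_000: 0.0001}

for name, risk in (("steady book", steady_book), ("rare catastrophe", rare_catastrophe)):
    summary = summarize(risk)
    print(name, {key: round(value, 2) for key, value in summary.items()})
```

Step 1 cannot tell these two risks apart. Step 2 separates them by a factor of several hundred in standard deviation, and step 3 flags that the second risk's danger sits almost entirely in one far-right outcome.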

With expectation, variance, and moments in hand, you can finally describe any uncertain quantity with a few trustworthy numbers — and, just as important, know what those numbers leave out. Next you will meet the handful of named distributions whose moments and MGFs are already worked out for you, the off-the-shelf shapes actuaries reach for to model how often claims arrive and how large they grow.