From outcomes to numbers
In the last guide we built the stage: a sample space of every possible outcome, with probability spread across it. But outcomes are often clumsy things — "the policyholder files a fire claim and the kitchen burns but the garage survives." To price risk, we need to *measure* outcomes, not just list them. A random variable is the tool that does this: it is simply a rule that attaches a number to every outcome.
Imagine one motor policy over a year. The outcome could be "no accident," "one fender-bender," "a serious crash," and so on. Define X to be *the number of claims this policy files*. Now "no accident" maps to X = 0, "one fender-bender" to X = 1, and so on. We have not changed the world — we have only chosen a lens that reports a number. The capital letter X is the variable; a particular value it lands on (like 2) is written with a small letter, x.
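The "rule that attaches a number" can be pictured as a plain lookup. The sketch below is only an illustration: the dictionary name `claims_filed` and the function `X` are hypothetical, and it covers just the two outcomes the text assigns a number to.

```python
# A random variable is a rule from outcomes to numbers.
# Only the two outcomes the text gives a value for are included here.
claims_filed = {
    "no accident": 0,
    "one fender-bender": 1,
}


def X(outcome: str) -> int:
    """Return the number of claims attached to an outcome."""
    return claims_filed[outcome]


x = X("one fender-bender")  # a realised value, written with a small x
print(x)  # 1
```

The world (the outcome string) is unchanged; the function only reports a number about it.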
Two flavours: counting and measuring
Random variables come in two great families, and the distinction between discrete and continuous runs through everything an actuary does. A discrete random variable takes values you can list one by one — usually whole numbers. The *count* of claims on a policy is discrete: 0, 1, 2, 3 claims, never 2.4. A continuous random variable can land anywhere in a range, on a smooth dial with no gaps. The *size* of a claim in dollars is continuous: it could be $1,203.77 or $1,203.78 or anything between.
This split is not academic: it decides which mathematics you reach for. Discrete variables are summed; continuous ones are integrated (the smooth cousin of summing). Insurance pricing lives at exactly this seam. How *often* losses happen is usually counted with a discrete model like the Poisson distribution, while how *big* each loss is gets measured with a continuous one like the exponential or lognormal distribution, neither of which can produce a negative claim. You will meet this frequency-and-severity pairing again and again.
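To make the frequency-and-severity pairing concrete, here is a minimal simulation sketch using only the Python standard library. The claim rate (0.12 per year) and mean claim size ($1,000) are made-up settings, and the helper names `sample_poisson` and `simulate_annual_loss` are hypothetical. The Poisson draw uses Knuth's multiplication method, which suits small rates like this one.

```python
import math
import random


def sample_poisson(rng: random.Random, mean: float) -> int:
    """Draw a Poisson count using Knuth's multiplication method."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_annual_loss(rng: random.Random, claim_rate: float, mean_claim_size: float):
    """Simulate one policy-year: a discrete count, then a continuous size per claim."""
    n_claims = sample_poisson(rng, claim_rate)          # how often (discrete)
    sizes = [rng.expovariate(1.0 / mean_claim_size)     # how big (continuous)
             for _ in range(n_claims)]
    return n_claims, sizes


rng = random.Random(42)  # fixed seed so the run is repeatable
n_claims, sizes = simulate_annual_loss(rng, claim_rate=0.12, mean_claim_size=1000.0)
print(n_claims, round(sum(sizes), 2))
```

The count comes first and the sizes second, so a year with no claims produces an empty list and a total loss of zero.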
Where the probability sits: mass and density
Knowing X is discrete or continuous is not enough; we must say *how much probability* sits on each value. For a discrete X this is easy and literal. The probability mass function, written p(x), gives the actual chance of each value — and because *something* must happen, all the masses add up to exactly 1. For our motor policy we might have p(0) = 0.90, p(1) = 0.08, p(2) = 0.018, p(3) = 0.002. Each is a genuine probability you could bet on.
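The motor-policy masses above can be checked in a few lines. This is only a sketch; the variable names are illustrative.

```python
import math

# Probability mass function for the motor policy in the text.
pmf = {0: 0.90, 1: 0.08, 2: 0.018, 3: 0.002}

# Each mass is a genuine probability, and together they account for everything.
assert all(0.0 <= p <= 1.0 for p in pmf.values())
total = sum(pmf.values())
assert math.isclose(total, 1.0)

# Probability of at least one claim: add the masses on 1, 2 and 3.
p_at_least_one = sum(p for x, p in pmf.items() if x >= 1)
print(round(p_at_least_one, 3))  # 0.1
```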
Continuous variables hide a beautiful subtlety. Ask "what is the probability the claim is *exactly* $1,203.770000…?" The answer is zero: the probability is spread smoothly over a continuum of possible amounts, so any single one carries no weight. Probability lives in *ranges*, not points. So instead of mass we describe a continuous X with a probability density function f(x). Density is not a probability; it is probability *per unit of x*, a measure of how thickly the chance is spread. To get an actual probability you take the area under the curve between two values. The whole area under f(x) equals 1.
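A small numerical sketch shows the "area, not height" idea. It assumes, purely for illustration, that claim sizes follow an exponential density with a mean of $1,000; the function names `density` and `area` are hypothetical, and the area is approximated with a simple midpoint rule.

```python
import math

MEAN_CLAIM = 1000.0  # hypothetical mean claim size in dollars


def density(x: float) -> float:
    """Exponential density f(x) = (1/m) * exp(-x/m) for x >= 0, else 0."""
    return math.exp(-x / MEAN_CLAIM) / MEAN_CLAIM if x >= 0 else 0.0


def area(a: float, b: float, steps: int = 10_000) -> float:
    """Approximate the area under the density between a and b (midpoint rule)."""
    width = (b - a) / steps
    return sum(density(a + (i + 0.5) * width) for i in range(steps)) * width


p_range = area(1000.0, 2000.0)            # P(1000 <= X <= 2000)
exact = math.exp(-1.0) - math.exp(-2.0)   # closed form, for comparison
p_point = area(1203.77, 1203.77)          # a range of zero width

print(round(p_range, 4))  # 0.2325
print(p_point)            # 0.0
```

The density at $1,203.77 is a positive number, yet the probability of landing exactly there is zero because the range has no width.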
The CDF: one function that works for everyone
Mass and density describe discrete and continuous variables in different dialects. The cumulative distribution function, written F(x), speaks one language for both. It answers a single running question: "what is the probability that X comes out at most x?" — that is, F(x) = P(X ≤ x). As you sweep x from far left to far right, F climbs from 0 up to 1, never going back down, accumulating all the probability it passes.
The shape of F quietly reveals the flavour of X. For a discrete variable F is a staircase: flat, then a sudden jump at each value, where the height of the jump *is* that value's mass. For our motor policy, F sits at 0.90 from x = 0 up to just below x = 1, then leaps by 0.08 to 0.98 at x = 1. For a continuous variable F is a smooth ramp, with no jumps, because no single point carries weight. This is why actuaries love the CDF: it handles the awkward mix that real data shows, with a fat spike of probability at "zero claims" sitting beside a smooth spread of positive claim sizes.
x    P(X = x)    F(x)
0    0.90        0.90
1    0.08        0.98
2    0.018       0.998
3    0.002       1.000

P(at least 1 claim) = 1 - F(0) = 0.10
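The staircase can be sketched in Python by accumulating the masses. This is one possible construction, not the only one; the names `values`, `running` and `cdf` are illustrative.

```python
from bisect import bisect_right
from itertools import accumulate

pmf = {0: 0.90, 1: 0.08, 2: 0.018, 3: 0.002}
values = sorted(pmf)
running = list(accumulate(pmf[v] for v in values))  # F at each listed value


def cdf(x: float) -> float:
    """F(x) = P(X <= x): flat between listed values, jumping at each one."""
    k = bisect_right(values, x)  # how many listed values are <= x
    return 0.0 if k == 0 else running[k - 1]


for x in (-1, 0, 0.5, 1, 2, 3):
    print(x, round(cdf(x), 3))

p_at_least_one = 1 - cdf(0)  # complement of "no claims"
```

Evaluating at 0.5 returns the same height as at 0, which is the flat tread of the staircase between two jumps.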
A distribution is the whole story
Put the pieces together and you arrive at the central idea: the distribution of X is the complete description of the uncertain quantity. Give me the mass function for a discrete X, or the density for a continuous one, or the CDF for either. Any one is enough, since the CDF can be built from the mass or density and they can be recovered from the CDF in turn, and with it I know everything that can be known about X *before* nature decides. There is no further hidden fact about the claim count waiting to be revealed; the distribution already encodes the full menu of possibilities and their weights.
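For the discrete case, recovering one description from another is just differencing: each mass is the jump in F at that value. The sketch below starts from the motor-policy CDF values; the function name `masses_from_cdf` is hypothetical. (For a continuous X the analogous step is differentiation, since the density is the slope of the CDF.)

```python
# CDF values at the listed claim counts, taken from the motor-policy example.
cdf_at = {0: 0.90, 1: 0.98, 2: 0.998, 3: 1.000}


def masses_from_cdf(cdf_at: dict) -> dict:
    """Recover each mass as the jump in F at that value."""
    masses, previous = {}, 0.0
    for x in sorted(cdf_at):
        masses[x] = cdf_at[x] - previous
        previous = cdf_at[x]
    return masses


recovered = masses_from_cdf(cdf_at)
print({x: round(p, 3) for x, p in recovered.items()})
# {0: 0.9, 1: 0.08, 2: 0.018, 3: 0.002}
```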
This is why the next guide can talk about expectation — the long-run average value of X — as something we *compute from the distribution*. Once you hold the whole distribution, the average claim count, the spread, the chance of a bad year: all of it follows. A great deal of actuarial work is, at bottom, choosing a sensible distribution for a risk and then squeezing out the numbers premiums and reserves depend on.
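As a preview of "it all follows from the distribution", the sketch below computes three summary numbers from the motor-policy masses alone. Treating two or more claims as a "bad year" is an assumption made here for illustration, not a definition from the text.

```python
pmf = {0: 0.90, 1: 0.08, 2: 0.018, 3: 0.002}

# Long-run average claim count: each value weighted by its mass.
mean = sum(x * p for x, p in pmf.items())

# Spread: probability-weighted squared distance from the mean.
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())

# Chance of a "bad year" (assumed here to mean two or more claims).
p_bad_year = sum(p for x, p in pmf.items() if x >= 2)

print(round(mean, 3))        # 0.122
print(round(variance, 6))    # 0.155116
print(round(p_bad_year, 3))  # 0.02
```

Nothing beyond `pmf` was needed as input, which is the sense in which the distribution is the whole story.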
Be honest about one thing, though: the distribution is a *model*, not the world. We never observe a policy's true distribution; we choose one, say a Poisson for counts, and estimate its settings (its parameters) from data, knowing the fit is imperfect. The model is a useful map, not the territory. Whole later rungs are devoted to choosing distributions well, checking them against reality, and staying humble about the far tail, where rare catastrophes hide and where many a confident model has been wrong.