A small toolkit for a big world
By now you know what a random variable is, that its distribution is the whole story, and how to squeeze that story into an expectation and a variance. But a distribution can have any shape at all. Does an actuary invent a fresh one for every risk? No. In practice a handful of named distributions show up over and over, because they arise from the same simple stories that real risks keep telling. Learn that handful and you can recognise most of what you will ever meet.
The neatest way to organise them is the great split you have already met: *counting* versus *measuring*. Some questions ask how many — how many claims will this policy file, how many storms will hit this year? Those are answered by discrete distributions over the whole numbers 0, 1, 2, … Other questions ask how big — given that a claim happens, how many dollars is it? Those are answered by continuous distributions over the positive numbers. Almost every distribution in this guide falls cleanly into one camp or the other.
Counting things: Bernoulli, binomial, Poisson
Start with the simplest possible random variable. A single yes/no trial — a claim happens or it doesn't, a coin lands heads or tails — is a Bernoulli experiment. The Bernoulli distribution is just one number p, the probability of "yes," with 1 − p for "no." Its mean is p and its variance is p(1 − p). Tiny as it is, it is the atom from which the counting distributions are built.
Now run the same yes/no trial n independent times and count the yeses. That count follows the binomial distribution: out of n policies each with claim-probability p, how many will claim? Its mean is np, which is wonderfully intuitive: 100 policies each with a 3% chance gives, on average, 3 claims. The binomial assumes a *fixed* number of trials, each independent, each with the *same* p. When those assumptions hold it is exactly right. When they don't (say claims cluster because one flood soaks a whole neighbourhood), the average may still be right, but the count swings far more widely than the binomial predicts, so the model quietly understates the risk.
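To make the binomial concrete, here is a minimal sketch that computes its probabilities for the 100-policy, 3% example above. The helper name `binomial_pmf` is made up for illustration, and the variance formula np(1 − p) is the standard binomial result rather than something stated above.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(exactly k yeses in n independent trials, each with probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 100, 0.03  # the 100 policies at 3% from the text

# Probability of every possible claim count, 0 through n.
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

# Mean and variance computed directly from the probabilities.
mean = sum(k * q for k, q in enumerate(pmf))
variance = sum((k - mean) ** 2 * q for k, q in enumerate(pmf))

print(f"mean     = {mean:.4f}   (n*p       = {n * p:.4f})")
print(f"variance = {variance:.4f}   (n*p*(1-p) = {n * p * (1 - p):.4f})")
print(f"P(no claims at all)  = {pmf[0]:.4f}")
print(f"P(6 or more claims)  = {sum(pmf[6:]):.4f}")
```

The last two lines are the kind of question the mean alone cannot answer: how likely is a quiet year, and how likely is a year with twice the average number of claims?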
Often, though, there is no natural "number of trials." How many claims will arrive at a busy insurer next month? Events trickle in continuously from a huge pool of policies, each individually unlikely to claim. In that limit the binomial slides into the Poisson distribution, the workhorse for claim frequency. The Poisson has a single parameter λ that is *both* its mean and its variance — so it carries a built-in assumption: that the variance equals the mean. Real claim counts are often more erratic than that, a clue we will return to. For now, picture λ as the average number of events per period: if a portfolio averages 8 claims a week, a Poisson with λ = 8 describes the wobble around that average.
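The slide from binomial to Poisson can be watched numerically. The sketch below holds the average at λ = 8 (the weekly figure above) while the pool of policies n grows and each policy's claim probability p = λ/n shrinks. The pool sizes and the helper names are illustrative assumptions.

```python
from math import comb, exp, factorial


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(exactly k claims from n independent policies, each with probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k: int, lam: float) -> float:
    """P(exactly k events) when events average lam per period."""
    return exp(-lam) * lam**k / factorial(k)


lam = 8.0  # average claims per week, as in the text

# Largest gap between the two distributions over counts 0..20,
# for pools of increasing size (pool sizes chosen for illustration).
gaps = {}
for n in (20, 200, 20_000):
    p = lam / n
    gaps[n] = max(
        abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam)) for k in range(21)
    )
    print(f"n = {n:>6}, p = {p:.5f}: largest gap = {gaps[n]:.6f}")

# The Poisson's built-in assumption: mean and variance are both lambda.
# Counts above 59 carry negligible probability when lambda is 8.
ks = range(60)
poisson_mean = sum(k * poisson_pmf(k, lam) for k in ks)
poisson_var = sum((k - poisson_mean) ** 2 * poisson_pmf(k, lam) for k in ks)
print(f"Poisson mean = {poisson_mean:.4f}, variance = {poisson_var:.4f}")
```

If observed claim counts show a variance well above their mean, that is a sign the Poisson's equal-mean-and-variance assumption does not fit the data.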
Measuring sizes: normal, exponential, gamma, lognormal
Once a claim happens, how big is it? Now we are measuring dollars, so we switch to continuous distributions. The most famous is the normal distribution — the symmetric bell curve, set by a mean and a standard deviation. It earns its fame honestly: when many small independent effects add up, their total tends to look normal (you will see exactly why in the next guide). But a single claim amount is rarely normal. Claim sizes can't go negative, they pile up near small values, and they trail off with a long right arm — the bell's tidy symmetry is the wrong picture for a loss.
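One quick way to see the mismatch is to ask a normal model how often a claim would be negative. The sketch below uses Python's `statistics.NormalDist`. The mean of $5,000 and standard deviation of $4,000 are hypothetical figures chosen only to resemble a skewed book of claims.

```python
from statistics import NormalDist

# Hypothetical claim-size model: mean $5,000, standard deviation $4,000.
claim_size = NormalDist(mu=5_000, sigma=4_000)

# Probability the normal model assigns to an impossible outcome:
# a claim amount below zero.
p_negative = claim_size.cdf(0)

# The bell is symmetric, so it puts exactly as much weight
# above $10,000 as it puts below $0.
p_above_10k = 1 - claim_size.cdf(10_000)

print(f"P(claim < $0)      = {p_negative:.4f}")
print(f"P(claim > $10,000) = {p_above_10k:.4f}")
```

A model that spends a visible share of its probability on negative claims, and the same share on large ones, is telling you its symmetry is the wrong shape for the data.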
So for claim *sizes* actuaries reach for right-skewed, positive distributions. The exponential distribution is the gentlest: it describes a quantity with no memory — the chance the loss exceeds another $100 is the same no matter how big it already is. It is simple and a fine first sketch, but its tail decays fast. The gamma distribution generalises it with an extra shape parameter, so you can bend the curve to fit data that humps up before falling away. The lognormal distribution tells a multiplicative story: if a loss is the product of many random factors (think compounding percentage effects), its logarithm is normal, and the loss itself is lognormal — skewed, always positive, with a heavier tail than the gamma.
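The "no memory" property can be checked directly from the survival function, the probability that a loss exceeds a given amount. The sketch below compares an exponential with a lognormal, both given a mean of $5,000. That mean, the lognormal's sigma of 1.0, and the function names are assumptions made for illustration.

```python
from math import exp, log
from statistics import NormalDist

MEAN = 5_000.0  # assumed average claim size for both models
SIGMA = 1.0     # assumed lognormal shape parameter
MU = log(MEAN) - SIGMA**2 / 2  # makes the lognormal mean equal MEAN


def exponential_survival(x: float) -> float:
    """P(loss > x) for an exponential with mean MEAN."""
    return exp(-x / MEAN)


def lognormal_survival(x: float) -> float:
    """P(loss > x) for a lognormal: log(loss) is Normal(MU, SIGMA)."""
    return 1 - NormalDist(MU, SIGMA).cdf(log(x))


def chance_of_another(survival, already: float, extra: float = 100.0) -> float:
    """P(loss > already + extra, given that loss > already)."""
    return survival(already + extra) / survival(already)


for already in (1_000.0, 20_000.0):
    e = chance_of_another(exponential_survival, already)
    ln = chance_of_another(lognormal_survival, already)
    print(f"already ${already:>8,.0f}: exponential {e:.4f}, lognormal {ln:.4f}")
```

The exponential column prints the same number on both rows, which is memorylessness in action. The lognormal column changes with the size already reached, because that distribution does remember.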
Frequency times severity
Here is where the two camps join hands, and it is the single most useful idea in the guide. To model the total cost of claims from a book of business, actuaries split the problem in two, a move called the frequency–severity decomposition. Frequency is *how many* claims occur, modelled by a counting distribution. Severity is *how big* each one is, modelled by a sizing distribution. The total cost is the sum of the individual claim sizes, and as long as claim sizes do not depend on how many claims there are, its *expected* value is expected frequency multiplied by expected severity.
A concrete sketch makes it click. Say a portfolio's claim count is Poisson with mean 200 claims a year, and each claim's size is lognormal with an average of $5,000. Then the *expected* total cost is simply 200 × $5,000 = $1,000,000. That expected figure is the heart of the pure premium — the part of everyone's premium that pays for losses, before expenses and profit are added on top. The decomposition is powerful because the two halves are estimated separately: a winter cold snap might raise *frequency* without changing *severity*, while medical inflation pushes *severity* up while *frequency* holds steady. Splitting them lets you see, and price, each force on its own.
    expected count           lambda = 200 claims / year   (Poisson)
    expected severity        mean   = $5,000 / claim      (lognormal)
    expected total cost             = 200 × $5,000 = $1,000,000
    pure premium per policy (1,000 policies) = $1,000,000 / 1,000 = $1,000
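The arithmetic above gives only the expected total. A small simulation shows the year-to-year wobble around it. This is a minimal sketch under stated assumptions: the lognormal's sigma of 1.0, the seed, and the 2,000 simulated years are arbitrary choices, and the Poisson sampler is Knuth's simple multiplication method, written out by hand because Python's standard library does not include one.

```python
import math
import random

LAMBDA = 200              # expected claims per year (from the text)
MEAN_SEVERITY = 5_000.0   # expected size per claim (from the text)
SIGMA = 1.0               # assumed lognormal shape; not given in the text
# A lognormal's mean is exp(mu + sigma^2 / 2), so solve for mu.
MU = math.log(MEAN_SEVERITY) - SIGMA**2 / 2
N_POLICIES = 1_000


def sample_poisson(lam: float, rng: random.Random) -> int:
    """Draw one Poisson(lam) count using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_year(rng: random.Random) -> float:
    """Total claim cost for one year: a random count of random sizes."""
    n_claims = sample_poisson(LAMBDA, rng)
    return sum(rng.lognormvariate(MU, SIGMA) for _ in range(n_claims))


rng = random.Random(42)  # fixed seed so the run is repeatable
totals = sorted(simulate_year(rng) for _ in range(2_000))
average_total = sum(totals) / len(totals)

print(f"expected total (formula)   = ${LAMBDA * MEAN_SEVERITY:,.0f}")
print(f"average total (simulated)  = ${average_total:,.0f}")
print(f"pure premium per policy    = ${average_total / N_POLICIES:,.2f}")
print(f"worst 1-in-100 year (est.) = ${totals[int(0.99 * len(totals))]:,.0f}")
```

The simulated average should land close to the $1,000,000 from the formula, while the last line hints at how much capital a bad year can demand beyond the pure premium.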
A first warning about heavy tails
There is one more sizing distribution every actuary must respect, because it behaves unlike the rest: the Pareto distribution. Its tail does not fade away exponentially. It dwindles only as a power of the loss size, so genuinely enormous claims stay stubbornly possible. This is the textbook example of a heavy tail, and lines like liability, earthquake, and pandemic insurance live in this world. Under a Pareto, the single largest claim in a year can dwarf the sum of all the others, a pattern that light-tailed models such as the exponential and gamma essentially never produce. (The lognormal sits in between: its tail is heavier than the gamma's but still thinner than a Pareto's power law.)
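A rough way to feel the difference is to simulate many years of claims and ask what share of each year's total came from its single largest claim. The sketch below does this for a Pareto and an exponential. The Pareto shape of 1.1, the 1,000 claims per year, the 200 simulated years, and the seed are all illustrative assumptions.

```python
import random

N_CLAIMS = 1_000   # claims per simulated year (assumed)
N_YEARS = 200      # number of simulated years (assumed)
ALPHA = 1.1        # assumed Pareto shape; smaller means a heavier tail


def largest_claim_share(claims: list[float]) -> float:
    """Fraction of the year's total cost due to the single biggest claim."""
    return max(claims) / sum(claims)


rng = random.Random(7)  # fixed seed so the run is repeatable

pareto_shares = [
    largest_claim_share([rng.paretovariate(ALPHA) for _ in range(N_CLAIMS)])
    for _ in range(N_YEARS)
]
expo_shares = [
    largest_claim_share([rng.expovariate(1.0) for _ in range(N_CLAIMS)])
    for _ in range(N_YEARS)
]

avg_pareto = sum(pareto_shares) / N_YEARS
avg_expo = sum(expo_shares) / N_YEARS

print(f"Pareto:      average share {avg_pareto:.1%}, worst year {max(pareto_shares):.1%}")
print(f"exponential: average share {avg_expo:.1%}, worst year {max(expo_shares):.1%}")
```

Under the exponential, the largest of a thousand claims is a rounding error in the total. Under the Pareto, some years are essentially one claim plus noise.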
Why does this matter so much? Because heavy tails quietly break the comfortable intuitions you have been building. The average can be dominated by a single freak loss, so it stabilises slowly and is a treacherous guide. In extreme cases the variance is mathematically *infinite*, and in the most extreme cases so is the mean, at which point no finite premium is high enough on a per-policy basis. A heavy-tailed model is also fragile to fit: a handful of giant historical claims can swing the estimate wildly. Choosing the wrong tail is one of the classic ways an insurer goes broke despite looking perfectly profitable in the calm years.
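For the Pareto, "infinite" has a precise trigger. Using the standard formulas for a Pareto with shape alpha and minimum loss x_min (symbols introduced here, not above), the mean is finite only when alpha is greater than 1 and the variance only when alpha is greater than 2. A minimal sketch with hypothetical function names:

```python
import math


def pareto_mean(alpha: float, x_min: float = 1.0) -> float:
    """Mean of a Pareto(alpha, x_min); infinite when alpha <= 1."""
    if alpha <= 1:
        return math.inf
    return alpha * x_min / (alpha - 1)


def pareto_variance(alpha: float, x_min: float = 1.0) -> float:
    """Variance of a Pareto(alpha, x_min); infinite when alpha <= 2."""
    if alpha <= 2:
        return math.inf
    return x_min**2 * alpha / ((alpha - 1) ** 2 * (alpha - 2))


# Shape values chosen for illustration, from mild to extreme.
for alpha in (3.0, 2.5, 1.5, 0.9):
    print(
        f"alpha = {alpha}: mean = {pareto_mean(alpha):.3f}, "
        f"variance = {pareto_variance(alpha):.3f}"
    )
```

The middle case, a finite mean with an infinite variance, is the treacherous one: a pure premium exists, but sample averages converge on it far more slowly than experience with lighter tails would suggest.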
Keep this honest caveat close as you go on. A distribution is a chosen model, and the tail is the part we have the least data on and the most to lose from. The bell curve and the exponential lull you into thinking extremes are negligible; the Pareto reminds you they are not. The two theorems in the next guide — the law of large numbers and the central limit theorem — are what make insurance possible at all, but they lean on assumptions that heavy tails can quietly violate. Knowing *which* distribution you are standing on, and how trustworthy its tail is, is half of being an actuary.