How Sure Are We? Confidence & Standard Error

Two samples, two answers

In the last guide we used point estimation to squeeze a single best number out of data — say, an average claim of $4,200 from a year's worth of motor policies. That number feels solid. But here is the uncomfortable truth at the heart of statistics: if you had collected a *different* year's policies, you would have gotten a different average. Maybe $4,050. Maybe $4,380. The data you happened to see is just one draw from a much larger population, and another draw would tell a slightly different story.

This wobble is called sampling variation, and it is not a mistake or sloppy data collection — it is built into the very act of looking at a sample instead of the whole world. The estimate you compute is itself a random variable: feed it a fresh sample and it jumps around. So the real question is no longer just "what is the average claim?" but "how much would my answer wobble if I could repeat the whole exercise?" An actuary who reports only the point estimate, and stays silent about that wobble, has told half the story.

The standard error: measuring the wobble

Imagine you could draw a fresh sample, compute its average, write it down, and repeat thousands of times. You would get a whole cloud of averages scattered around the true value. That cloud has its own distribution — the sampling distribution of the estimate. The **standard error** is simply the standard deviation of that cloud: a single number that says how far a typical estimate strays from the true value. A small standard error means your estimate is tightly pinned down; a large one means it could easily have come out quite different.
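The thought experiment above can be run on a computer. What follows is a minimal sketch, not a pricing model. It assumes a hypothetical claim-size distribution (a gamma distribution tuned to a mean of $4,200 and a standard deviation of $3,000, chosen purely for illustration), draws many fresh samples of 100 claims, and measures the spread of the resulting averages. The names `draw_claim` and `simulate_sample_means` are invented for this example.

```python
import random
import statistics

# Hypothetical claim model (an assumption for illustration only):
# gamma distribution with mean 4200 and standard deviation 3000.
CLAIM_MEAN = 4200.0
CLAIM_SD = 3000.0
SHAPE = (CLAIM_MEAN / CLAIM_SD) ** 2   # gamma shape parameter
SCALE = CLAIM_SD ** 2 / CLAIM_MEAN     # gamma scale parameter


def draw_claim(rng):
    """Draw one claim size from the assumed gamma model."""
    return rng.gammavariate(SHAPE, SCALE)


def simulate_sample_means(n, repeats, seed=0):
    """Draw `repeats` fresh samples of size n and return the mean of each."""
    rng = random.Random(seed)
    return [
        statistics.fmean([draw_claim(rng) for _ in range(n)])
        for _ in range(repeats)
    ]


means = simulate_sample_means(n=100, repeats=5000)

# The standard error is the standard deviation of this cloud of averages.
cloud_centre = statistics.fmean(means)
standard_error = statistics.stdev(means)
print(f"centre of the cloud: {cloud_centre:.0f}")
print(f"spread of the cloud (standard error): {standard_error:.0f}")
```

In real work only one sample is available, so this kind of simulation is a way to build intuition rather than a way to measure uncertainty on live data.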

We cannot actually repeat the year thousands of times — but we do not have to. For a sample mean, the standard error is the data's own standard deviation divided by the square root of the sample size: SE = s ÷ √n. That square root is the quiet engine of the whole field. It says that to halve your uncertainty you must *quadruple* your data; precision is expensive, and it gets more expensive the more of it you want. With 100 motor policies whose claim sizes scatter with a standard deviation of $3,000, the standard error of the mean is 3000 ÷ √100 = $300.

sample size n = 100      SE = 3000 / sqrt(100)  = 300
sample size n = 400      SE = 3000 / sqrt(400)  = 150
sample size n = 1600     SE = 3000 / sqrt(1600) =  75

Quadrupling the data only halves the standard error — the √n law. Precision does not come cheap.
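The table above can be reproduced in a few lines. This is a small sketch using the formula SE = s ÷ √n from the text; the function names `standard_error_of_mean` and `sample_size_for` are illustrative choices, and the second function simply inverts the formula to show how much data a target precision would demand.

```python
import math


def standard_error_of_mean(sample_sd, n):
    """Standard error of a sample mean: s / sqrt(n)."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    return sample_sd / math.sqrt(n)


def sample_size_for(sample_sd, target_se):
    """Smallest sample size whose standard error is at most target_se."""
    return math.ceil((sample_sd / target_se) ** 2)


for n in (100, 400, 1600):
    print(n, standard_error_of_mean(3000, n))   # 300.0, 150.0, 75.0

print(sample_size_for(3000, 300))   # 100 policies for an SE of 300
print(sample_size_for(3000, 150))   # 400 policies to halve it
```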

Why the wobble is bell-shaped

Here is the small miracle that makes the standard error usable. Claim sizes are wildly un-bell-shaped: most are small, a few are enormous, and the histogram is lopsided. Yet the central limit theorem tells us that the *average* of enough independent claims behaves like a smooth normal distribution, almost regardless of how ugly the individual claims look (provided their variance is finite). The cloud of possible sample means is, to good approximation, a tidy bell centred on the truth, with a standard deviation equal to the standard error.

That bell shape is what lets us turn a standard error into a statement about chances. For a normal curve, about 68% of the cloud sits within one standard error of the centre, and about 95% within roughly two. So if our estimate is $4,200 with a standard error of $300, we can say a sample mean would land within about $600 of the truth roughly 95% of the time. The standard error sets the scale; the normal shape lets us read percentages off it.
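The 68% and 95% rules of thumb can be checked by simulation even when individual claims are far from normal. The sketch below assumes a hypothetical right-skewed claim model (a gamma distribution with mean $4,200 and standard deviation $3,000, an illustrative choice) and counts how often the mean of 100 claims lands within one and two standard errors of the true mean.

```python
import math
import random
import statistics

# Hypothetical skewed claim model (illustrative assumption):
# gamma with mean 4200 and standard deviation 3000.
TRUE_MEAN, CLAIM_SD, N = 4200.0, 3000.0, 100
SHAPE = (TRUE_MEAN / CLAIM_SD) ** 2
SCALE = CLAIM_SD ** 2 / TRUE_MEAN
SE = CLAIM_SD / math.sqrt(N)  # 300 for this model

rng = random.Random(1)
repeats = 5000
within_one = within_two = 0
for _ in range(repeats):
    sample = [rng.gammavariate(SHAPE, SCALE) for _ in range(N)]
    distance = abs(statistics.fmean(sample) - TRUE_MEAN)
    within_one += distance <= SE        # True counts as 1
    within_two += distance <= 2 * SE

share_within_one = within_one / repeats
share_within_two = within_two / repeats
print(f"within 1 SE: {share_within_one:.1%}")
print(f"within 2 SE: {share_within_two:.1%}")
```

For a moderately skewed model like this one the shares should sit close to the normal-curve figures. A much heavier-tailed model would be a different matter.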

Be honest about the limits, though. The central limit theorem needs *enough* data and reasonably independent observations, and it leans on the average, not on rare extremes. For the heavy-tailed losses that haunt catastrophe and liability work — where a single claim can dwarf a thousand others — the bell approximation for the mean can be slow to arrive and dangerous to trust. The standard-error machinery is a faithful guide for ordinary spread, and a poor guide for the far tail. We will return to that tail repeatedly; it is where confident models go to die.

The confidence interval: an honest range

Rather than reporting a lone point, we report a range that openly carries our uncertainty: a **confidence interval**. The recipe is short. Take the point estimate, then step out a fixed number of standard errors on each side. For the familiar 95% interval that multiplier is about 1.96 — call it 2 in your head. With our estimate of $4,200 and a standard error of $300, you step out 1.96 × 300 ≈ $588 either way: the 95% interval runs from about $3,612 to $4,788. In words, it is simply "the estimate, give or take roughly two standard errors."
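The recipe translates directly into code. This is a minimal sketch of the normal-approximation interval described above; the function name `confidence_interval` is an illustrative choice.

```python
def confidence_interval(estimate, standard_error, multiplier=1.96):
    """Return (low, high): the estimate give or take `multiplier` standard errors."""
    margin = multiplier * standard_error
    return estimate - margin, estimate + margin


low, high = confidence_interval(4200, 300)
print(f"95% interval: {low:,.0f} to {high:,.0f}")   # 3,612 to 4,788
```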

A wider net catches the truth more often: a 99% interval uses about 2.58 standard errors and so is broader, a 90% interval uses about 1.64 and is tighter. Notice the trade-off built right in. You can be more confident, or more precise, but not both at once from the same data — the only way to win on both fronts is to collect more data and shrink the standard error itself. That tension between confidence and precision is something an actuary lives with on every estimate.
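The multipliers quoted above are quantiles of the standard normal distribution, so they need not be memorised. The sketch below recovers them with `statistics.NormalDist` from the Python standard library; the helper name `z_multiplier` is an illustrative choice, and the standard error of 300 is the running example from the text.

```python
from statistics import NormalDist


def z_multiplier(confidence):
    """Two-sided normal multiplier: leaves (1 - confidence) / 2 in each tail."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)


standard_error = 300
for level in (0.90, 0.95, 0.99):
    z = z_multiplier(level)
    print(f"{level:.0%}: multiplier {z:.3f}, half-width {z * standard_error:.0f}")
```

The half-width grows with the confidence level while the standard error stays fixed, which is the trade-off between confidence and precision in numbers.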

Reading it honestly — and the trap most people fall into

Now the part almost everyone gets wrong. It is tempting to say "there is a 95% probability the true mean lies between $3,612 and $4,788." That sentence is, strictly, false. The true mean is a fixed number; it is either inside this particular interval or it is not — there is no probability about it once the interval is drawn. The 95% describes the *procedure*, not this one interval: if you repeated the whole sampling-and-interval recipe over and over, about 95% of the intervals you build would capture the true mean. This one is either a hit or a miss; you just do not know which.
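The "procedure, not this one interval" reading can be made concrete by simulation. The sketch below assumes a hypothetical claim model with a known true mean (a gamma distribution with mean $4,200 and standard deviation $3,000, chosen only for illustration), builds a 95% interval from each of many fresh samples, and counts how many intervals capture the truth. The function names are invented for this example.

```python
import math
import random
import statistics

# Hypothetical claim model with a known true mean (illustrative assumption).
TRUE_MEAN, CLAIM_SD = 4200.0, 3000.0
SHAPE = (TRUE_MEAN / CLAIM_SD) ** 2
SCALE = CLAIM_SD ** 2 / TRUE_MEAN


def interval_from_sample(sample, multiplier=1.96):
    """Sample mean give or take `multiplier` estimated standard errors."""
    mean = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return mean - multiplier * se, mean + multiplier * se


def coverage(n, repeats, seed=2):
    """Share of simulated intervals that contain the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(repeats):
        sample = [rng.gammavariate(SHAPE, SCALE) for _ in range(n)]
        low, high = interval_from_sample(sample)
        hits += low <= TRUE_MEAN <= high   # each interval is a hit or a miss
    return hits / repeats


hit_rate = coverage(n=100, repeats=2000)
print(f"share of intervals that captured the true mean: {hit_rate:.1%}")
```

The hit rate should land close to 95%, though not necessarily exactly on it, because the 1.96 multiplier relies on the normal approximation and the claims here are skewed.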

Two more honest cautions. First, the stated confidence covers only sampling variation, the luck of which policies you drew. It says nothing about a biased sample, a mismeasured field, or a wrongly assumed model. A beautifully narrow 95% interval built on the wrong data is precisely, confidently wrong. Second, the interval's width shrinks in proportion to 1/√n, so with enough data it can shrink to a razor while still sitting around an estimate that is systematically off. Width measures noise, not truth; never let a tight interval lull you into thinking the underlying number must be right.
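A little arithmetic shows how a tight interval can be confidently wrong. The sketch below is purely hypothetical: it supposes the true mean is $4,200 but the data source is biased so that the estimate sits at $3,900 no matter how many policies are collected. All three constants are invented for illustration.

```python
import math

TRUE_MEAN = 4200.0         # the truth, normally unknown
BIASED_ESTIMATE = 3900.0   # hypothetical estimate that runs 300 too low
SAMPLE_SD = 3000.0


def interval(estimate, sample_sd, n, multiplier=1.96):
    """95% interval around an estimate, using SE = s / sqrt(n)."""
    margin = multiplier * sample_sd / math.sqrt(n)
    return estimate - margin, estimate + margin


for n in (100, 1600, 10000):
    low, high = interval(BIASED_ESTIMATE, SAMPLE_SD, n)
    captured = low <= TRUE_MEAN <= high
    print(f"n={n:>6}  interval {low:,.0f} to {high:,.0f}  captures truth: {captured}")
```

With 100 policies the interval is wide enough to cover the truth by luck; with 10,000 it is narrow, looks authoritative, and misses. More data shrinks the noise but does nothing to the bias.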

This is also the quiet bridge to hypothesis testing, the next guide. Asking "could the true mean plausibly be $4,500?" is, for a two-sided test at the matching 5% level, the same as asking "does $4,500 fall inside my 95% confidence interval?" The interval and the test are two views of one idea: a range of values the data cannot rule out. For now, hold onto the habit that defines the profession: an actuary never quotes a point without quoting its uncertainty. A reserve of "$50 million" means little; "$50 million, with a 95% interval of $44 million to $58 million" tells a board what it actually needs to know to decide.
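The link between interval and test can be sketched in a few lines, using the running example of an estimate of $4,200 with a standard error of $300. The function name `is_plausible` is invented for this illustration, and the check assumes the normal approximation with a two-sided 95% interval.

```python
def is_plausible(candidate, estimate, standard_error, multiplier=1.96):
    """True if `candidate` lies within `multiplier` standard errors of the estimate."""
    z = (candidate - estimate) / standard_error   # distance measured in standard errors
    return abs(z) <= multiplier


print(is_plausible(4500, 4200, 300))   # True: one standard error away
print(is_plausible(5000, 4200, 300))   # False: well beyond 1.96 standard errors
```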