Confidence Intervals: How Sure Is Your Mean?

Your mean wobbles too

You weighed the coin five times and got a mean of 4.010 g. But here's an unsettling thought: if a colleague weighed it five times too, their mean would be a little different — maybe 4.008, maybe 4.013. The single number 4.010 is itself just one draw from a slightly wobbly process. The honest question is not 'what is my mean?' but 'how far might my mean be from the true value?'

The good news: means wobble much less than individual readings. Random highs and lows partly cancel inside each average, so a mean of five readings is steadier than a single weighing. The number that captures this steadiness is the standard error of the mean.
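A quick simulation makes the cancellation visible. It is only a sketch under assumed conditions. It treats 4.010 g as the coin's true mass and 0.0158 g as the spread of a single weighing (both figures borrowed from the coin example), and it assumes Gaussian scatter. It then compares the spread of single readings with the spread of five-reading means.

```python
import random
import statistics

# Assumed values for illustration only, borrowed from the coin example
TRUE_MASS = 4.010   # g, treated as the coin's true mass
SPREAD = 0.0158     # g, assumed Gaussian scatter of one weighing
N_READINGS = 5      # readings averaged into each mean
N_TRIALS = 10_000   # number of simulated repeats ("colleagues")

rng = random.Random(42)  # fixed seed so the run is repeatable

def weigh() -> float:
    """One simulated weighing: the true mass plus Gaussian noise."""
    return rng.gauss(TRUE_MASS, SPREAD)

# Spread of individual readings
singles = [weigh() for _ in range(N_TRIALS)]

# Spread of means, each averaging N_READINGS fresh readings
means = [statistics.fmean([weigh() for _ in range(N_READINGS)])
         for _ in range(N_TRIALS)]

sd_single = statistics.stdev(singles)
sd_mean = statistics.stdev(means)

print(f"spread of single readings: {sd_single:.4f} g")
print(f"spread of 5-reading means: {sd_mean:.4f} g")
print(f"ratio: {sd_single / sd_mean:.2f} (compare sqrt(5) = {5 ** 0.5:.2f})")
```

The ratio should come out close to √5 ≈ 2.24, which is the shrink factor the standard error captures.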

Standard error: spread of the mean itself

The standard error of the mean is the standard deviation divided by the square root of the number of readings: s ÷ √n. For the coin: 0.0158 ÷ √5 = 0.0158 ÷ 2.236 ≈ 0.0071 g. Notice it's smaller than s itself — averaging shrinks the wobble.
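The same arithmetic as a tiny Python function. This is a sketch that takes s and n as inputs; the name `standard_error` is illustrative, not from any library.

```python
import math

def standard_error(s: float, n: int) -> float:
    """Standard error of the mean: sample standard deviation divided by sqrt(n)."""
    if n < 2:
        # s itself needs at least two readings to be estimated
        raise ValueError("need at least two readings to estimate s")
    return s / math.sqrt(n)

# Coin example: s = 0.0158 g from n = 5 weighings
se = standard_error(0.0158, 5)
print(f"standard error = {se:.4f} g")  # prints: standard error = 0.0071 g
```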

The √n is the key insight — and a warning. To halve your standard error you need four times as many measurements; to cut it by ten you need a hundred times as many. Precision bought by sheer repetition gets expensive fast, which is why chemists also work hard to make each individual reading better.
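Turning that scaling around gives a planning rule: to divide the standard error by a factor k, you need about k² times as many readings. A minimal sketch, assuming s stays roughly the same as more readings arrive; the function name `readings_needed` is hypothetical.

```python
import math

def readings_needed(n_now: int, shrink_factor: float) -> int:
    """Readings needed to divide the standard error by shrink_factor.

    Because SE = s / sqrt(n), dividing SE by k means multiplying n by k**2
    (assuming s stays about the same as more data arrive).
    """
    return math.ceil(n_now * shrink_factor ** 2)

print(readings_needed(5, 2))   # halve the SE: 20 readings
print(readings_needed(5, 10))  # cut it by ten: 500 readings
```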

Wrapping a 'plus or minus' around the mean

A confidence interval turns the standard error into a stated range: 'mean ± (a multiplier) × standard error'. A 95% confidence interval is built so that, if you repeated the whole experiment many times, about 95% of such intervals would contain the true value. It is the formal way to state measurement uncertainty out loud (see measurement uncertainty).
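The '95% of such intervals' wording can be checked by brute force. The sketch below assumes, for illustration only, a true mass of 4.010 g and Gaussian scatter of 0.0158 g per reading. It repeats a five-reading experiment many times, builds mean ± multiplier × standard error each time, and counts how often the interval catches the true value. The multiplier 2.776 is the standard table t-value for four degrees of freedom, the value this section rounds to 2.78 below.

```python
import math
import random

# Assumed values for the simulation only (borrowed from the coin example)
TRUE_MASS = 4.010  # g, treated as the known true value
SPREAD = 0.0158    # g, Gaussian scatter of one weighing
N = 5              # readings per simulated experiment

def coverage(multiplier: float, trials: int = 20_000, seed: int = 1) -> float:
    """Fraction of simulated experiments whose interval
    mean +/- multiplier * (s / sqrt(N)) contains TRUE_MASS."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        readings = [rng.gauss(TRUE_MASS, SPREAD) for _ in range(N)]
        mean = sum(readings) / N
        # sample standard deviation with the n - 1 divisor
        s = math.sqrt(sum((x - mean) ** 2 for x in readings) / (N - 1))
        if abs(mean - TRUE_MASS) <= multiplier * s / math.sqrt(N):
            hits += 1
    return hits / trials

print(f"multiplier 2.776: {coverage(2.776):.3f}")
print(f"multiplier 1.96:  {coverage(1.96):.3f}")
```

With 2.776 the hit rate should land close to 0.95. With 1.96 it drops below 0.90 for five readings, because s estimated from so few readings is itself unreliable.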

Why the multiplier isn't just 1.96

If you knew the true spread of the population, the 95% multiplier from the Gaussian (normal) distribution would be 1.96. But you don't: you estimated s from a tiny handful of readings, so s itself is uncertain. To stay honest, we use a larger multiplier, called a t-value (from Student's t-distribution), that depends on how many readings you had.

That count is captured by degrees of freedom, which for one set of repeats is n − 1 (the same n − 1 from the standard-deviation formula). With few readings, the t-value is well above 1.96 — for five readings (4 degrees of freedom) the 95% t-value is about 2.78. As you gather more data, t shrinks back toward 1.96. Small samples are punished with wider intervals, exactly as fairness demands.
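In practice t-values come from a table or a stats library (SciPy's `scipy.stats.t.ppf`, for example). To show where they come from and how they shrink toward 1.96, the sketch below computes them with only the standard library: it integrates the t-distribution's density with Simpson's rule and finds the 97.5th percentile by bisection. Treat it as an illustration, not a production routine. For 4 degrees of freedom it should land on about 2.776, the 2.78 quoted above.

```python
import math
from statistics import NormalDist

def t_cdf(x: float, df: int, steps: int = 1000) -> float:
    """P(T <= x) for x >= 0, by Simpson's rule on the t density from 0 to x."""
    norm = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(u: float) -> float:
        return norm * (1 + u * u / df) ** (-(df + 1) / 2)

    h = x / steps  # steps is even, as Simpson's rule requires
    total = pdf(0.0) + pdf(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 + total * h / 3  # density is symmetric, so add the left half

def t_value_95(df: int) -> float:
    """Two-sided 95% multiplier: the x with P(T <= x) = 0.975, via bisection."""
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < 0.975:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

for df in (1, 2, 4, 9, 29, 99):
    print(f"df = {df:>2}: t = {t_value_95(df):.3f}")
print(f"Gaussian limit: {NormalDist().inv_cdf(0.975):.3f}")
```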

Working the coin through

Mean = 4.010 g, s = 0.0158 g, n = 5, so degrees of freedom = 4.
Standard error = s ÷ √n = 0.0158 ÷ 2.236 ≈ 0.0071 g.
Look up the 95% t-value for 4 degrees of freedom: 2.78.
Half-width = 2.78 × 0.0071 ≈ 0.020 g, so report 4.010 ± 0.020 g (95% confidence).
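The four steps above collapse into a few lines of Python. This sketch just replays the coin's summary numbers (mean, s, n) and the table t-value 2.78; it does not look the t-value up itself. It should print a standard error of 0.0071 g, the report line 4.010 ± 0.020 g, and an interval from 3.990 g to 4.030 g.

```python
import math

# Summary numbers from the coin example
mean = 4.010   # g
s = 0.0158     # g, sample standard deviation
n = 5          # number of weighings
t_95 = 2.78    # 95% t-value for n - 1 = 4 degrees of freedom (table value)

se = s / math.sqrt(n)       # standard error of the mean
half_width = t_95 * se      # the "±" part of the interval

print(f"standard error: {se:.4f} g")
print(f"report: {mean:.3f} ± {half_width:.3f} g (95% confidence)")
print(f"interval: {mean - half_width:.3f} g to {mean + half_width:.3f} g")
```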

Now your result speaks honestly: not a bare '4.010', but '4.010 ± 0.020 g, and here's the confidence behind the ±'. That single habit — always attaching a confidence interval — separates a number you can trust from a number you merely wrote down.