Limited Fluctuation (Classical) Credibility

An old idea with a humble goal

The previous guide left you with one clean line — the credibility-weighted estimate, your own data times Z plus the benchmark times (1 minus Z) — and one nagging question: where does the weight Z actually come from? This guide gives the first historical answer. Born in workers'-compensation ratemaking around 1914, limited fluctuation credibility (also called *classical* credibility) was the profession's original attempt to pin Z down with a rule rather than a hunch.

Its goal is deliberately modest, and that modesty is the key to understanding everything that follows. It does not try to find the *most accurate* estimate. It asks only one thing of your own data: do not let the estimate jump around too much from period to period. The name says it outright — we want to *limit the fluctuation* of the number we quote. The reasoning is human and easy to defend to a regulator: a premium that lurches up and down every year is unfair, hard to explain, and probably reflects luck more than signal. So the rule is, roughly, 'trust your own experience as much as you can without letting random noise throw the answer around'.

Full credibility: how much data earns total trust

Start at the easy end. Suppose your own data are so plentiful that the noise has all but vanished — the estimate barely wobbles. Then you may as well set Z = 1, throw away the benchmark, and quote your own number. The amount of data at which you are willing to do this is the standard for full credibility. It is the bar you must clear; clear it, and your data are 'fully credible'.

Where does the bar come from? From two choices you make up front, plus the central limit theorem you already met. First, a *tolerance*: how close do you want the observed experience to land to its true mean — say within plus or minus 5 percent? Second, a *probability*: how often do you want to be that close — say 90 percent of the time? The CLT tells you that an average of many independent observations is roughly normal, so it converts those two choices into a required sample size. For a claim count that follows a Poisson pattern, estimating the frequency this well needs about (1.645 / 0.05) squared, which works out to roughly 1,082 expected claims. That single number — around a thousand-odd claims — is the famous classical full-credibility standard for frequency.

Full-credibility standard for frequency (Poisson counts):

  N_full = ( z_p / k )^2
     z_p = normal score for the chosen probability p
     k   = chosen tolerance (relative)

  '90% within +/- 5%' :  z_p = 1.645,  k = 0.05
     N_full = (1.645 / 0.05)^2 = (32.9)^2  ~= 1,082 expected claims

  Demand more -> bar climbs fast:
  '99% within +/- 1%' :  N_full = (2.576 / 0.01)^2 ~= 66,400 claims

The standard is just two judgement calls run through the normal curve. Tighter tolerance or higher confidence inflates it sharply — note the jump from ~1,082 to ~66,400.

Two honest footnotes. First, that 1,082 is for frequency *only* — if the *size* of each claim also varies (it always does), the standard rises to cover that extra noise, often substantially. Second, and more importantly, nothing here is a law of nature. Change the tolerance from 5 percent to 2.5 percent, or the confidence from 90 to 95 percent, and the bar moves. The full-credibility standard is a chosen convention — sometimes blessed by a regulator — not a fact about the world. Two careful actuaries can defensibly pick different standards for the very same book.

Partial credibility and the square-root rule

Here is the catch that makes credibility a daily craft rather than a curiosity: almost no real book ever reaches a thousand-odd claims. A single auto territory, a mid-size employer's group plan, a new product in its third year — they all fall short. Does falling short mean your data are worthless and you must surrender entirely to the benchmark? Of course not. Below the bar you earn partial credibility: a fractional weight Z, strictly between 0 and 1, that grows as your data grow.

Classical credibility sets that fractional weight with the celebrated square-root rule: Z equals the square root of (the data you actually have, divided by the data you would need for full credibility), capped at 1. The square root is the whole point — and it surprises beginners. If you have only a *quarter* of the data you need, your Z is not a quarter; it is the square root of a quarter, which is *one half*. Weight grows along a curve, not a straight line, rising steeply at first and then leveling off as you approach full credibility.

Why a square root, and not something simpler? Because the random error of an average shrinks in proportion to the square root of the sample size — the same square-root law you met with the law of large numbers. To *halve* the noise, you need *four times* the data. So if you set Z by the square root of your data ratio, the leftover wobble in the partially-credible estimate exactly meets the same safety target that full credibility meets on the nose. The rule is not arbitrary fitting; it is the fluctuation goal, carried consistently down below the full-credibility bar.

A worked example, end to end

Picture a small region of an auto insurer. Over its history it has produced 270 claims; the full-credibility standard for this line is 1,082. Its own data say the loss cost per car is 850; the countrywide benchmark — the *complement of credibility* it will blend toward — is 1,000. We have everything we need to set Z and quote a rate. Walk through it.

Compare your data to the bar: you have 270 claims against a full-credibility standard of 1,082, a ratio of about 0.25 — a quarter of the way there.
Apply the square-root rule: Z = sqrt(270 / 1,082) = sqrt(0.25) which is about 0.50. A quarter of the data, but half the weight.
Blend with the credibility-weighted formula: rate = 0.50 x 850 + 0.50 x 1,000 = 925, landing neatly between your own cost and the benchmark.

Notice how undramatic the arithmetic is once Z is in hand — the whole credibility factor Z does its work in a single multiplication. Now watch the square-root curve bite. To push this region's Z from 0.50 up to a respectable 0.75, you do not need 50 percent more data; you need to *more than double* it, to roughly 608 claims, because 0.75 squared times 1,082 is about 608. And to reach full credibility you would need all 1,082 — quadrupling your current evidence to roughly double the weight you started with. Credibility is earned slowly, and ever more slowly the closer you get.

The appeal — and the honest blind spot

It is easy to see why classical credibility endured for a century. It is *transparent*: a regulator or a policyholder can follow every step. It is *cheap*: two judgement calls and a square root, no heavy estimation. And it is *defensible*: 'we trust your data once it is stable enough to be within 5 percent of the truth nine years in ten' is a sentence anyone can grasp. It is still baked into rating manuals and regulatory formulas around the world, and for many books it is simply good enough.

But now the honesty the profession owes itself. Read back over everything we did and notice what never once appeared: any measure of *how different this region truly is from the countrywide average*. Classical credibility looks only inward, at the noise in your own data, and treats the benchmark as if it were exactly the right thing for you to fall back on. It never asks whether the countrywide 1,000 is even close to this region's true cost. Z is built entirely from the *variance of your own experience* — never from the genuine spread between risks.

That single missing ingredient — a measure of real between-risk difference — is exactly what the next guide supplies. Bühlmann credibility weighs the noise *within* a risk against the genuine spread *between* risks, and from that balance derives a Z that aims not merely to limit fluctuation but to be as accurate as possible. Classical credibility is the honest, intuitive first draft; Bühlmann is the principled revision. You needed the first to truly appreciate the second.