Greatest-Accuracy (Bühlmann) Credibility

From a rule of thumb to a principle

In the previous guide you met limited-fluctuation credibility: you decide on a standard for full credibility — say 1,082 claims — and below that you blend your own experience with the manual rate using a weight Z. It works, and for a century it was how the job got done. But be honest about what it is: a recipe. It answers "how much data do I need before random noise is small enough?" It never asks the deeper question — *how different are the risks I am trying to tell apart in the first place?* If every driver in my book were secretly identical, no amount of their own data would tell me anything new about them; I should lean entirely on the average. If drivers differed wildly, even a little of their own data would be precious.

Greatest-accuracy credibility — usually called Bühlmann credibility, after Hans Bühlmann's 1967 paper — takes that deeper question seriously. Instead of starting from a noise tolerance, it starts from a goal: find the weighted blend of "your own data" and "the overall average" that, on average, lands closest to the truth — that minimises the expected squared error. It turns out the best possible *linear* answer has a stunningly clean form, and every quantity in it is something you can actually estimate from data. No magic number 1,082 imported from a table; the weight emerges from the structure of the risks themselves.

Two kinds of variance hiding in your data

The whole engine rests on a single act of seeing: when you look at a pile of numbers from many different risks, the spread you observe is made of *two* completely different things mixed together. Picture a hundred restaurants, each filing fire-claim costs for several years. The numbers jump around for two reasons. First, even one fixed restaurant has good years and bad years — pure luck, the roll of the dice. Second, the restaurants are genuinely not the same: a fryer-heavy diner is riskier than a salad bar, year in and year out. Bühlmann's insight is to name and measure these two separately, because the *ratio* between them is precisely what should govern how much you trust one restaurant's own record.

The first piece is the expected process variance, the EPV. Fix a single risk (one restaurant, with its own true underlying claim rate) and ask how much its yearly numbers bounce around *just from luck*. That within-risk wobble is the process variance. Different restaurants have different amounts of it, so we take the expectation across all of them: the EPV is the average within-risk noise. High EPV means even a known, fixed risk produces wildly swinging results, so its own data is mostly noise and hard to read.

The second piece is the variance of the hypothetical means, the VHM. Imagine you could magically know each restaurant's *true* long-run average cost — its hypothetical mean, the number it would settle on after infinitely many years. Those true means are not all equal; the fryer diner's is genuinely higher than the salad bar's. The VHM is the variance of those true means across the population. High VHM means the risks are really, deeply different from one another — so knowing *which* one you are looking at matters enormously, and its own data is worth listening to.
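To make the two variances concrete, here is a minimal Python sketch under an assumed toy model: three hypothetical restaurant types, each with a population share, a true (hypothetical) mean annual cost and a process variance. All numbers are invented for illustration. The EPV is the share-weighted average of the process variances; the VHM is the share-weighted variance of the hypothetical means; by the law of total variance the two add up to the total variance of one randomly chosen restaurant-year.

```python
from dataclasses import dataclass

@dataclass
class RiskType:
    share: float        # fraction of the population of this type (hypothetical)
    mean: float         # hypothetical mean: long-run average annual cost
    process_var: float  # within-risk, year-to-year variance

def epv_and_vhm(types: list[RiskType]) -> tuple[float, float]:
    """Return (EPV, VHM) for a discrete population of risk types."""
    assert abs(sum(t.share for t in types) - 1.0) < 1e-12, "shares must sum to 1"
    # EPV: average of the within-risk variances across the population
    epv = sum(t.share * t.process_var for t in types)
    # Grand mean: population average of the hypothetical means
    grand_mean = sum(t.share * t.mean for t in types)
    # VHM: spread of the true means around the grand mean
    vhm = sum(t.share * (t.mean - grand_mean) ** 2 for t in types)
    return epv, vhm

def total_variance(types: list[RiskType]) -> float:
    """Variance of one restaurant-year drawn at random from the whole population."""
    grand_mean = sum(t.share * t.mean for t in types)
    second_moment = sum(t.share * (t.process_var + t.mean ** 2) for t in types)
    return second_moment - grand_mean ** 2

# Invented population: salad bars, ordinary diners, fryer-heavy diners
population = [
    RiskType(share=0.5, mean=300.0, process_var=40_000.0),
    RiskType(share=0.3, mean=500.0, process_var=90_000.0),
    RiskType(share=0.2, mean=900.0, process_var=250_000.0),
]
epv, vhm = epv_and_vhm(population)
print(epv, vhm, epv + vhm, total_variance(population))
```

The last two printed numbers agree: the total spread in the pile of numbers is exactly the within-risk part plus the between-risk part.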

The formula, and why it reads like a sentence

Now the payoff. Define a single number, the Bühlmann k, as the ratio of the two variances: k = EPV / VHM. Then for a risk you have watched for n periods, the credibility factor is Z = n / (n + k). Your final estimate, the credibility premium, is the blend Z times your own observed mean, plus (1 − Z) times the grand mean. That is the entire method: three lines, with two estimable variances taking the place of a full-credibility standard looked up in a table.

k = EPV / VHM
Z = n / (n + k)
estimate = Z * (your own mean) + (1 - Z) * (grand mean)

The complete Bühlmann recipe. k is the noise-to-signal ratio; Z grows toward 1 as you gather more years n.
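The three-line recipe translates directly into code. The sketch below is a minimal Python version; the function name `buhlmann_estimate` and its argument names are our own, not taken from any library, and EPV and VHM are assumed to be already known, with VHM strictly positive.

```python
def buhlmann_estimate(own_mean: float, grand_mean: float, n: int,
                      epv: float, vhm: float) -> tuple[float, float, float]:
    """Return (k, Z, credibility estimate) for one risk observed for n periods.

    Assumes epv >= 0 and vhm > 0 are known; estimating them is a separate step.
    """
    if vhm <= 0:
        raise ValueError("VHM must be positive for k = EPV / VHM to be finite")
    k = epv / vhm                  # noise-to-signal ratio
    z = n / (n + k)                # credibility factor, between 0 and 1
    estimate = z * own_mean + (1 - z) * grand_mean
    return k, z, estimate

# Hypothetical inputs: EPV four times the VHM (k = 4), four years of data
k, z, est = buhlmann_estimate(own_mean=1_200.0, grand_mean=1_000.0,
                              n=4, epv=400.0, vhm=100.0)
print(k, z, est)  # prints: 4.0 0.5 1100.0
```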

Read Z = n / (n + k) slowly and it almost speaks. The k acts like a number of "phantom" years of average experience that you always carry on your back. If k = 4, then standing next to your n real years of data are 4 invisible years that quietly insist on the grand mean. With n = 4 of your own years you get Z = 4 / 8 = 0.5 — a perfect tie, half your data, half the average. Pile up more years and your real evidence outvotes the phantom: n = 36 gives Z = 36 / 40 = 0.9. As n grows without bound, Z climbs toward 1 and the average fades away — exactly as it should, because with enough of your own data you no longer need to borrow strength from anyone else.
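The "phantom years" reading is more than a metaphor. Multiplying out Z·(own mean) + (1 − Z)·(grand mean) with Z = n / (n + k) gives (n·own mean + k·grand mean) / (n + k): a plain average in which your n real years sit beside k imaginary years at the grand mean. A small Python sketch, with k = 4 as in the text and made-up means, checks both the Z values quoted above and the equivalence of the two forms.

```python
def credibility_z(n: float, k: float) -> float:
    """Bühlmann credibility factor Z = n / (n + k)."""
    return n / (n + k)

def blended(own_mean: float, grand_mean: float, n: float, k: float) -> float:
    z = credibility_z(n, k)
    return z * own_mean + (1 - z) * grand_mean

def phantom_average(own_mean: float, grand_mean: float, n: float, k: float) -> float:
    # n real years at own_mean plus k phantom years at grand_mean, simply averaged
    return (n * own_mean + k * grand_mean) / (n + k)

k = 4.0
for n in (1, 4, 36, 400, 4_000):
    print(f"n={n:>5}  Z={credibility_z(n, k):.4f}")

# The two formulas agree (up to floating-point rounding) for any n
own, grand = 750.0, 500.0  # hypothetical means
for n in (1, 4, 36):
    assert abs(blended(own, grand, n, k) - phantom_average(own, grand, n, k)) < 1e-9
```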

Why homogeneous risks earn more credibility

Here is the part most worth carrying home. Because k = EPV / VHM, the credibility you assign is governed by a tug-of-war between noise and signal. EPV is the noise inside one risk; VHM is the signal of true differences between risks. When VHM is large — the risks are heterogeneous, genuinely far apart — k is small, so Z is large: your own data wins, because telling the risks apart really matters and your record really does pin down which one you are. When EPV is large — each risk's numbers are wild and luck-driven — k is large, Z is small: your own data is mostly noise, so you fall back on the broad average.
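Because only the ratio EPV / VHM enters Z, the tug-of-war can be explored by moving one side at a time. This Python sketch uses illustrative numbers with n fixed at 5 years; it shows Z rising as the VHM grows, falling as the EPV grows, and staying put when noise and signal are scaled together.

```python
def credibility_z(n: int, epv: float, vhm: float) -> float:
    """Z = n / (n + k) with k = EPV / VHM; assumes vhm > 0."""
    k = epv / vhm
    return n / (n + k)

n = 5
base_epv, base_vhm = 100.0, 50.0  # illustrative only: k = 2

# More signal (bigger true differences between risks) -> higher Z
for vhm in (10.0, 50.0, 250.0):
    print(f"VHM={vhm:>6}: Z={credibility_z(n, base_epv, vhm):.3f}")

# More noise (wilder luck within each risk) -> lower Z
for epv in (20.0, 100.0, 500.0):
    print(f"EPV={epv:>6}: Z={credibility_z(n, epv, base_vhm):.3f}")

# Scaling noise and signal together leaves k, and therefore Z, unchanged
assert credibility_z(n, 2 * base_epv, 2 * base_vhm) == credibility_z(n, base_epv, base_vhm)
```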

Now the headline claim — *more-homogeneous risks earn more credibility* — needs a careful word, because "homogeneous" can mean two opposite things and only one of them is right here. It does NOT mean a pool where every member is identical; in that pool VHM is zero, so k is infinite and Z collapses to zero — you should trust nobody's individual data, because there is nothing individual to learn. What earns high credibility is a risk that is *internally* stable and steady — low EPV, its own numbers calm and repeatable — sitting in a population where risks *differ from each other* — high VHM. Steady-within, varied-between: that is the sweet spot where one risk's own quiet, consistent record is gold.
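The two meanings of "homogeneous" show up as the edge cases of the formula. In this Python sketch (the scenario labels and numbers are hypothetical), a VHM of zero is treated as an infinite k, so Z = 0; an EPV of zero gives k = 0, so even a few observed years earn Z = 1.

```python
def credibility_z(n: int, epv: float, vhm: float) -> float:
    """Bühlmann Z = n / (n + EPV / VHM), with the degenerate VHM = 0 case handled."""
    if vhm == 0:
        # Every risk has the same true mean: k is infinite, nothing individual to learn
        return 0.0
    k = epv / vhm
    return n / (n + k)

n = 3
scenarios = {
    "identical risks (VHM = 0)":         (100.0, 0.0),
    "noisy within, similar between":     (400.0, 20.0),
    "steady within, varied between":     (20.0, 400.0),
    "perfectly steady within (EPV = 0)": (0.0, 400.0),
}
for label, (epv, vhm) in scenarios.items():
    print(f"{label:<36} Z = {credibility_z(n, epv, vhm):.3f}")
```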

A small worked example, and honest limits

Let us put numbers to it. Suppose a study of many factory accounts tells you the grand mean claim cost is 500 per year, the EPV (average within-account luck) is 90,000, and the VHM (true spread between accounts) is 30,000. Then k = 90,000 / 30,000 = 3. One factory you have insured for n = 6 years has averaged 800 per year — well above the book. Its credibility is Z = 6 / (6 + 3) = 2/3. Its credibility premium is (2/3)(800) + (1/3)(500) = 533 + 167 = 700. You believe most, but not all, of its bad record — you nudge it two-thirds of the way from the average toward its own experience.
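The factory example can be reproduced in a few lines of Python. Exact fractions from the standard `fractions` module are used so that the 2/3 and the 700 come out without rounding; the variable names are our own.

```python
from fractions import Fraction

grand_mean = Fraction(500)  # book-wide mean claim cost per year
epv = Fraction(90_000)      # average within-account (process) variance
vhm = Fraction(30_000)      # variance of the true account means
n = 6                       # years of this factory's own experience
own_mean = Fraction(800)    # this factory's observed average per year

k = epv / vhm               # 3
z = n / (n + k)             # 6/9 = 2/3
premium = z * own_mean + (1 - z) * grand_mean

print(k, z, premium)  # prints: 3 2/3 700
```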

Two honest cautions. First, in real life nobody hands you EPV and VHM: you must *estimate* them from the data, and that is its own craft (the unbiased estimators, and the empirical-Bayes machinery that extends Bühlmann to risks of unequal size, are the subject of the next guides). The usual VHM estimate can even come out negative, which is nonsense for a variance and must be floored at zero, which in turn sends Z to zero; treat your k with the same humility you give any estimate. Second, Bühlmann is the best *linear* rule, not the best rule full stop. If you are willing to assume a complete probability model (a prior over the risks and a likelihood for the data), then full Bayesian credibility gives the posterior mean, the optimal estimate under squared-error loss, and Bühlmann is its straight-line shadow: the best linear approximation to that posterior mean. For some models, such as Poisson claim counts with a gamma prior, the two coincide exactly.
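As a preview of the estimation step, here is a minimal Python sketch of the standard nonparametric estimators for the simplest balanced case: r risks, each observed for the same n periods with equal exposure. It illustrates the negative-estimate problem and is not meant as the full procedure. The pooled EPV estimate is a sum of squares and cannot go negative, but the VHM estimate subtracts EPV / n from the spread of the risk means and can; flooring it at zero then forces Z to zero. The helper names and the data are hypothetical.

```python
from statistics import fmean

def estimate_epv_vhm(data: list[list[float]]) -> tuple[float, float]:
    """Balanced nonparametric estimates of (EPV, VHM).

    data[i][j] is the result of risk i in period j. Assumes at least two risks,
    each observed for the same n >= 2 periods with equal exposure.
    """
    r = len(data)
    if r < 2:
        raise ValueError("need at least two risks")
    n = len(data[0])
    if n < 2 or any(len(row) != n for row in data):
        raise ValueError("every risk needs the same number (>= 2) of periods")
    risk_means = [fmean(row) for row in data]
    grand_mean = fmean(risk_means)
    # EPV: pooled within-risk sample variance (a sum of squares, never negative)
    epv = sum((x - m) ** 2 for row, m in zip(data, risk_means) for x in row) / (r * (n - 1))
    # VHM: spread of the risk means minus the share that process noise alone explains
    vhm_raw = sum((m - grand_mean) ** 2 for m in risk_means) / (r - 1) - epv / n
    return epv, max(vhm_raw, 0.0)  # floor a negative VHM estimate at zero

def credibility_z(n: int, epv: float, vhm: float) -> float:
    # A floored VHM of zero means k is infinite, so Z = 0
    return 0.0 if vhm == 0 else n / (n + epv / vhm)

# Made-up data: two risks with overlapping results -> raw VHM estimate is -0.5
noisy = [[1.0, 3.0], [2.0, 4.0]]
epv, vhm = estimate_epv_vhm(noisy)
print(epv, vhm, credibility_z(2, epv, vhm))

# Made-up data: two clearly different risks -> Z close to 1
distinct = [[1.0, 1.0, 2.0], [5.0, 6.0, 5.0]]
epv, vhm = estimate_epv_vhm(distinct)
print(epv, vhm, credibility_z(3, epv, vhm))
```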

Even with those caveats, this is the idea that makes credibility *theory* rather than custom. Limited fluctuation answered "is my data big enough?" with a threshold for full credibility and a rule of thumb below it. Bühlmann answers the better question ("given how noisy each risk is and how different the risks truly are, which linear mix of my data and the average is provably closest to the truth in expected squared error?"), and its answer is one short, self-checking formula that an actuary can defend in front of a regulator. That is why, when people say credibility is pure actuarial DNA found nowhere else, this is the piece they have in mind.