Why frequency gets its own model
In the previous guide you met the master pattern of non-life insurance: the frequency–severity decomposition. Total claim cost splits cleanly into *how often* claims happen and *how big* each one is. This guide zooms all the way in on the first half — the frequency, the random count of claims a policy or a portfolio produces in a period. We will leave severity for later and treat the count, on its own, as a random variable taking the whole-number values 0, 1, 2, 3, …
Why bother modelling the count separately at all? Because the forces that drive *how often* are usually quite different from those that drive *how big*. A harsh winter, a new safety law, or a fraud crackdown changes the claim frequency without touching the size of any single claim; medical inflation does the reverse. Modelling frequency on its own — a claim-frequency distribution — lets you see, estimate, and stress-test each force in isolation, then recombine them into the total. Almost all of pricing, reserving, and risk theory is built on this discipline.
Three counting distributions
An actuary's whole counting toolkit really comes down to three named distributions, each telling a slightly different story. The binomial counts successes in a *fixed* number of independent trials — natural when there is a hard ceiling, like "each of my 40 trucks either crashes this year or doesn't." The count can never exceed the number of trials, and crucially its variance is *smaller* than its mean. That last fact will turn out to matter a great deal.
The Poisson is the workhorse, the default first choice for claim counts. Its story is *rare events from a large pool*: many policies, each individually unlikely to claim, with events trickling in independently at a steady average rate λ. The Poisson's signature property is that its variance *equals* its mean: both are λ. So a portfolio averaging 200 claims a year, modelled as Poisson, is implicitly assumed to fluctuate around 200 with a variance of 200 (a standard deviation of about 14). It is clean, has just one parameter, and adds up beautifully: combine two independent Poisson books and the total is again Poisson, with rates summed.
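Both properties are easy to check numerically. The following is a minimal sketch using only the Python standard library; the helper name `poisson_pmf`, the truncation of the table at 600, and the split of the 200-claim book into two books with rates 120 and 80 are illustrative assumptions, not part of the guide.

```python
import math

def poisson_pmf(k, lam):
    """P(N = k) for a Poisson with mean lam, computed in log space for stability."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

lam = 200.0
ks = range(0, 600)  # assumed truncation point; the mass beyond 600 is negligible here
pmf = [poisson_pmf(k, lam) for k in ks]

# Mean and variance computed directly from the probability table
mean = sum(k * p for k, p in zip(ks, pmf))
var = sum((k - mean) ** 2 * p for k, p in zip(ks, pmf))
sd = math.sqrt(var)

# Additivity: convolve two independent Poisson books (hypothetical rates 120 and 80)
# and compare with a single Poisson(200) at one count value.
k_check = 190
conv = sum(poisson_pmf(j, 120.0) * poisson_pmf(k_check - j, 80.0)
           for j in range(k_check + 1))
direct = poisson_pmf(k_check, 200.0)

print(f"mean = {mean:.4f}, variance = {var:.4f}, sd = {sd:.2f}")
print(f"convolved = {conv:.8f}, direct = {direct:.8f}")
```

The mean and variance both come out at 200, and the convolved probability matches the direct Poisson(200) probability, which is the "rates summed" property in action.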
The negative binomial is the one you reach for when the Poisson looks too tame. The negative binomial frequency model has *two* parameters, which buys it a variance that is always *larger* than its mean. There are two equivalent stories behind it: counting failures before a fixed number of successes, and, far more useful to an actuary, a Poisson whose own rate λ is itself random (gamma-distributed, to be precise), varying from policy to policy. Hold that second story; it is the key to the most important idea in this guide.
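To see the extra variance concretely, here is a small sketch that tabulates a negative binomial and computes its mean and variance. It assumes the (r, beta) parameterisation in which the mean is r·beta and the variance is r·beta·(1 + beta); the function name `negbin_pmf` and the values r = 2, beta = 1.5 are made up for illustration.

```python
import math

def negbin_pmf(k, r, beta):
    """P(N = k) for a negative binomial with parameters r and beta (log-space)."""
    log_p = (math.lgamma(r + k) - math.lgamma(r) - math.lgamma(k + 1)
             - r * math.log1p(beta)
             + k * (math.log(beta) - math.log1p(beta)))
    return math.exp(log_p)

r, beta = 2.0, 1.5          # hypothetical parameter values
ks = range(0, 400)          # assumed truncation point; the tail beyond is negligible
pmf = [negbin_pmf(k, r, beta) for k in ks]

mean = sum(k * p for k, p in zip(ks, pmf))
var = sum((k - mean) ** 2 * p for k, p in zip(ks, pmf))

print(f"mean = {mean:.4f}  (r*beta = {r * beta})")
print(f"variance = {var:.4f}  (r*beta*(1+beta) = {r * beta * (1 + beta)})")
```

The variance exceeds the mean by the factor 1 + beta, so the second parameter is precisely the dial that controls how much extra spread the model allows.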
One elegant rule: the (a,b,0) class
Here is the beautiful surprise that ties the three together. Those three distributions — binomial, Poisson, negative binomial — and *only* those three (plus the geometric, a special negative binomial) obey one simple recursive rule. The probability of seeing exactly k claims relates to the probability of seeing k − 1 by a ratio that is a *straight line* in 1/k. Write that ratio as a + b/k, and you have defined the entire (a,b,0) class. The "0" marks that the recursion starts from the probability at zero claims.
    (a,b,0) class:  P(k) / P(k-1) = a + b/k,  for k = 1, 2, 3, ...

    Poisson(lambda):             a = 0,              b = lambda
    Binomial(n, q):              a = -q/(1-q),       b = (n+1)*q/(1-q)         (a < 0)
    Negative binomial(r, beta):  a = beta/(1+beta),  b = (r-1)*beta/(1+beta)   (a > 0)

    Sign of a tells the whole story: a < 0 binomial, a = 0 Poisson, a > 0 negative binomial
This is not just tidy bookkeeping; it is a genuine working tool. Because every (a,b,0) distribution shares the same recursive skeleton, a single algorithm can generate the whole probability table for any of them: start with P(0), then march upward one step at a time, multiplying by a + b/k at each step. That same recursion is the engine behind Panjer's recursion, which (in a later guide) lets you compute the distribution of *total* claim cost directly, without simulation, and exactly when claim sizes sit on a discrete grid. Learning the (a,b,0) family is therefore an investment that pays off twice.
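The march-upward algorithm can be sketched in a few lines of Python. The function name `ab0_table` and all parameter values below are illustrative assumptions; the formulas for a, b and P(0) are the ones from the (a,b,0) table above.

```python
import math

def ab0_table(a, b, p0, k_max):
    """Return [P(0), ..., P(k_max)] from the recursion P(k) = (a + b/k) * P(k-1)."""
    probs = [p0]
    for k in range(1, k_max + 1):
        probs.append((a + b / k) * probs[-1])
    return probs

# Poisson(lambda = 2): a = 0, b = lambda, P(0) = exp(-lambda)
lam = 2.0
poisson = ab0_table(0.0, lam, math.exp(-lam), 10)

# Binomial(n = 40, q = 0.05): a = -q/(1-q), b = (n+1)q/(1-q), P(0) = (1-q)^n
n, q = 40, 0.05
binomial = ab0_table(-q / (1 - q), (n + 1) * q / (1 - q), (1 - q) ** n, n)

# Negative binomial(r = 2, beta = 1.5): a = beta/(1+beta),
# b = (r-1)beta/(1+beta), P(0) = (1+beta)^(-r)
r, beta = 2.0, 1.5
negbin = ab0_table(beta / (1 + beta), (r - 1) * beta / (1 + beta),
                   (1 + beta) ** (-r), 10)

for name, table in [("Poisson", poisson), ("binomial", binomial), ("neg. binomial", negbin)]:
    print(name, [round(p, 4) for p in table[:4]])
```

One function serves all three families; only the triple (a, b, P(0)) changes. For the binomial the table stops at k = n, since no count above the number of trials is possible.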
Bending the zero: the (a,b,1) class
Real claim data has an awkward habit the (a,b,0) class can't quite match: the number of policies with *zero* claims is often very different from what these distributions predict. Most policies never claim at all, so the spike at zero can be much taller — or, in data collected only *after* a claim, much shorter — than the recursion wants. The fix is the (a,b,1) class: keep the exact same a + b/k recursion for k = 2, 3, 4, …, but *cut it loose at zero* and set the probability of zero claims freely. The "1" signals that the recursion now starts from k = 1 instead of k = 0.
This little freedom unlocks two genuinely useful shapes. A zero-truncated distribution forces the probability of zero to be exactly nil, which suits data where you only ever observe policies that *did* claim (a policy with a count of zero never enters the data). A zero-modified distribution, the more general case, lets you dial the zero probability up or down to whatever the data shows, then rescales the remaining probabilities so that everything still sums to one. With it you can model, say, a motor book where 92% of drivers never claim but the counts of those who do follow a zero-truncated negative binomial pattern.
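As a rough sketch of how this works in practice, the snippet below builds a negative binomial table with the a + b/k recursion and then resets its zero probability. The helper names `negbin_table` and `zero_modify`, the parameters r = 2 and beta = 1.5, and the table length of 300 are assumptions chosen for illustration; only the 92% figure comes from the example above.

```python
def negbin_table(r, beta, k_max):
    """Negative binomial probabilities P(0..k_max) via the a + b/k recursion."""
    a = beta / (1 + beta)
    b = (r - 1) * beta / (1 + beta)
    probs = [(1 + beta) ** (-r)]
    for k in range(1, k_max + 1):
        probs.append((a + b / k) * probs[-1])
    return probs

def zero_modify(base, p0_new):
    """Set P(0) to p0_new and rescale P(1), P(2), ... so the table sums to one.

    Assumes base is an (almost) complete probability table starting at k = 0.
    """
    scale = (1.0 - p0_new) / (1.0 - base[0])
    return [p0_new] + [scale * p for p in base[1:]]

base = negbin_table(r=2.0, beta=1.5, k_max=300)   # hypothetical parameters
zero_truncated = zero_modify(base, 0.0)           # only policies that claimed
motor_book = zero_modify(base, 0.92)              # 92% of drivers never claim

print("base       ", [round(p, 4) for p in base[:4]])
print("truncated  ", [round(p, 4) for p in zero_truncated[:4]])
print("motor book ", [round(p, 4) for p in motor_book[:4]])
```

Because every probability from k = 1 upward is multiplied by the same constant, the ratios P(k)/P(k-1) for k = 2, 3, … are untouched, which is exactly what keeps the modified distribution inside the (a,b,1) class.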
Over-dispersion: when claims cluster
Now the most important honest warning of the guide. The Poisson's tidy variance-equals-mean assumption is a *modelling choice*, not a law of nature — and real claim counts routinely violate it. Far more often than not, the observed variance is bigger than the mean. This is over-dispersion, and it is the rule, not the exception, in non-life data. When you see it, the Poisson is lying to you: it will tell you the count is calmer and more predictable than it truly is.
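A first diagnostic is simply the ratio of sample variance to sample mean, sometimes called the dispersion index. The sketch below uses the standard-library `statistics` module; the function name `dispersion_index` and the ten claim counts are made up for illustration and are not real data.

```python
from statistics import mean, variance

def dispersion_index(counts):
    """Sample variance divided by sample mean; close to 1 for Poisson-like data."""
    return variance(counts) / mean(counts)

# Hypothetical claim counts for ten policies over one period (invented numbers)
counts = [0, 0, 0, 1, 0, 2, 0, 0, 5, 0]
index = dispersion_index(counts)

print(f"mean = {mean(counts):.2f}, variance = {variance(counts):.2f}, index = {index:.2f}")
if index > 1:
    print("Variance exceeds the mean: the data look over-dispersed.")
```

For these invented counts the index comes out a little above 3. With only ten observations that is suggestive rather than conclusive; on a real portfolio you would want far more data, or a formal test, before ruling the Poisson in or out.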
Why does over-dispersion happen? Two everyday reasons. First, heterogeneity: policyholders are not identical. A careful driver and a reckless one are both in the book, so the true rate λ differs from policy to policy. If you blend many Poissons with different rates, the mixture is over-dispersed, and (recall the second story from earlier) when those rates follow a gamma distribution it comes out as exactly a *negative binomial*. Second, contagion or clustering: one hailstorm triggers a thousand claims at once, so claims are not independent the way the Poisson demands. Either way, the variance of the count balloons past its mean.
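The heterogeneity story can be checked numerically: average the Poisson probabilities over a gamma-distributed rate and compare with the negative binomial formula. This is a sketch under stated assumptions: the gamma has shape r and scale beta, the integral is approximated with a plain midpoint rule, and the names `mixed_poisson_pmf`, `upper` and `steps`, along with r = 2 and beta = 1, are choices made here for illustration.

```python
import math

def poisson_pmf(k, lam):
    """P(N = k | rate lam) for a Poisson."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def gamma_pdf(lam, r, beta):
    """Gamma density with shape r and scale beta, evaluated at lam > 0."""
    return math.exp((r - 1) * math.log(lam) - lam / beta
                    - math.lgamma(r) - r * math.log(beta))

def negbin_pmf(k, r, beta):
    """Negative binomial P(N = k) in the (r, beta) parameterisation."""
    return math.exp(math.lgamma(r + k) - math.lgamma(r) - math.lgamma(k + 1)
                    - r * math.log1p(beta)
                    + k * (math.log(beta) - math.log1p(beta)))

def mixed_poisson_pmf(k, r, beta, upper=60.0, steps=60000):
    """Integrate Poisson(k | lam) * gamma(lam) over lam with the midpoint rule."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        lam = (i + 0.5) * h          # midpoint, so lam is never exactly zero
        total += poisson_pmf(k, lam) * gamma_pdf(lam, r, beta)
    return total * h

r, beta = 2.0, 1.0                   # hypothetical gamma shape and scale
mixture = [mixed_poisson_pmf(k, r, beta) for k in range(6)]
exact = [negbin_pmf(k, r, beta) for k in range(6)]

for k in range(6):
    print(k, round(mixture[k], 6), round(exact[k], 6))
```

The two columns agree to within the integration error. The upper limit of 60 is adequate for these parameters only; a heavier-tailed gamma would need a wider integration range.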
The consequence is concrete and costly. Pricing and especially capital depend on the *spread* of outcomes, not just the average. If the real variance is double what your Poisson assumes, the probability of a very bad year is far higher than the model admits — and the capital you hold against that bad year is too thin. This is exactly why the negative binomial is the practitioner's frequent default: it earns its extra parameter by letting the variance speak honestly. The discipline to remember is simple — always check the variance against the mean before you trust a Poisson. A frequency model is a chosen description of the world, never the world itself; over-dispersion is the world reminding you of the difference.
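To put a number on "far higher", the sketch below compares the probability of a bad year under a Poisson with mean 200 and under a negative binomial with the same mean but double the variance (r = 200, beta = 1, so the variance is 400). The threshold of 250 claims for a "very bad year" and the helper names are assumptions made for this illustration.

```python
import math

def poisson_pmf(k, lam):
    """P(N = k) for a Poisson with mean lam."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def negbin_pmf(k, r, beta):
    """P(N = k) for a negative binomial with mean r*beta, variance r*beta*(1+beta)."""
    return math.exp(math.lgamma(r + k) - math.lgamma(r) - math.lgamma(k + 1)
                    - r * math.log1p(beta)
                    + k * (math.log(beta) - math.log1p(beta)))

def tail_prob(pmf, threshold):
    """P(N > threshold), computed as one minus the cumulative probability."""
    return 1.0 - sum(pmf(k) for k in range(threshold + 1))

threshold = 250   # hypothetical definition of a very bad year
poisson_tail = tail_prob(lambda k: poisson_pmf(k, 200.0), threshold)
negbin_tail = tail_prob(lambda k: negbin_pmf(k, 200.0, 1.0), threshold)
ratio = negbin_tail / poisson_tail

print(f"Poisson  P(N > {threshold}) = {poisson_tail:.6f}")
print(f"Neg. bin P(N > {threshold}) = {negbin_tail:.6f}")
print(f"ratio = {ratio:.1f}")
```

Both models agree on the average year, yet the negative binomial assigns the bad year a probability many times larger. That gap is what ends up missing from the capital figure when an over-dispersed book is priced as Poisson.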