Modern Pricing: GLMs & the Rating Plan

Why one-way tables quietly double-count

In the last guide you built a class plan the old-fashioned way: take each rating variable in turn, line up the loss experience by its levels, and read off a relativity — young drivers cost 1.4 times the base, sports cars cost 1.3 times the base, and so on. This is called a one-way analysis, because you look down one variable at a time. It is intuitive and it was the whole craft for decades. It also has a flaw that hides in plain sight.

The flaw is that rating variables overlap. Suppose young drivers really do crash more — but young drivers also disproportionately drive fast cars. When you build the age table, the high losses you blame on youth are partly caused by the cars they happen to own; when you build the car-type table, the high losses you blame on sports cars are partly caused by the young people who happen to own them. Each one-way table soaks up some of the other's effect. Charge a young driver 1.4 for age and 1.3 for car and you have counted the same underlying badness twice — the customer is overcharged, and a savvy competitor will pick them off.

This is just correlation between rating variables, the same gremlin you met back in statistics, now wearing a pricing costume. A one-way table cannot see it, because it never looks at two variables together. What you want is the effect of age holding car type fixed, and the effect of car type holding age fixed — every variable's true contribution after the others have had their say.
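To see the double-count in numbers, here is a small sketch on a made-up book of four cells. Everything in it is an assumption chosen for illustration: the exposures, the base frequency of 0.10, and the "true" relativities of 1.30 for young drivers and 1.25 for sports cars. The data is generated from those true values with no random noise, so any gap between the one-way relativity and the true one comes purely from the overlap between age and car type.

```python
# Hypothetical book of business: exposure in car-years per (age, car) cell.
# Young drivers are deliberately concentrated in sports cars.
exposure = {
    ("mature", "standard"): 6000,
    ("mature", "sports"): 1000,
    ("young", "standard"): 1000,
    ("young", "sports"): 2000,
}

# Assumed "true" multiplicative claim frequency, used only to generate data.
BASE_FREQ = 0.10
TRUE_AGE = {"mature": 1.00, "young": 1.30}
TRUE_CAR = {"standard": 1.00, "sports": 1.25}

# Expected claim counts per cell under the true model (no random noise).
claims = {
    (age, car): n * BASE_FREQ * TRUE_AGE[age] * TRUE_CAR[car]
    for (age, car), n in exposure.items()
}


def one_way_relativity(position, level, base_level):
    """Frequency of `level` divided by frequency of `base_level`,
    looking at one variable only (position 0 = age, 1 = car)."""

    def frequency(lv):
        cells = [cell for cell in exposure if cell[position] == lv]
        return sum(claims[c] for c in cells) / sum(exposure[c] for c in cells)

    return frequency(level) / frequency(base_level)


age_one_way = one_way_relativity(0, "young", "mature")
car_one_way = one_way_relativity(1, "sports", "standard")

print(f"one-way age relativity: {age_one_way:.3f} (true {TRUE_AGE['young']:.2f})")
print(f"one-way car relativity: {car_one_way:.3f} (true {TRUE_CAR['sports']:.2f})")
print(f"young + sports, one-way product: {age_one_way * car_one_way:.3f}")
print(f"young + sports, true multiplier: {TRUE_AGE['young'] * TRUE_CAR['sports']:.3f}")
```

On this invented book the one-way tables read about 1.46 for age and 1.44 for car type, against true values of 1.30 and 1.25. Multiplied together they charge the young sports-car driver roughly 2.11 times the base when the true multiplier is 1.625.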

The GLM untangles everything at once

You already met the tool that fixes this. Back in the statistics rung you saw the generalized linear model: regression's weighted sum of drivers, fitted with a distribution that matches insurance data and a log link that turns each factor into a multiplier. Here is its quiet superpower for pricing. A GLM estimates every variable's relativity simultaneously, so each coefficient is the effect of that variable after the model has accounted for all the other variables it includes. The double-counting between modelled variables dissolves; what survives is each rating variable's own contribution to risk with the others held fixed. The model can only adjust for variables it was given, so an effect that belongs to an omitted variable can still leak into the ones that remain.
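A minimal sketch of "everything at once", using only the standard library. In practice you would fit this with a GLM routine from a statistics package. As a stand-in, the loop below iterates the marginal-balance equations (fitted claims must equal actual claims for every level of every variable), which are the same equations a Poisson GLM with a log link and main effects solves. The four-cell book is hypothetical: its claim counts were generated from an assumed base frequency of 0.10 and assumed true relativities of 1.30 (young) and 1.25 (sports), with young drivers concentrated in sports cars.

```python
# Hypothetical book: exposure (car-years) and expected claim counts per cell.
exposure = {
    ("mature", "standard"): 6000,
    ("mature", "sports"): 1000,
    ("young", "standard"): 1000,
    ("young", "sports"): 2000,
}
claims = {
    ("mature", "standard"): 600.0,
    ("mature", "sports"): 125.0,
    ("young", "standard"): 130.0,
    ("young", "sports"): 325.0,
}
ages = ["mature", "young"]
cars = ["standard", "sports"]

# Multiplicative model: fitted claims = exposure * age_factor * car_factor.
age_factor = {a: 1.0 for a in ages}
car_factor = {c: 1.0 for c in cars}

for _ in range(200):
    # Rebalance each age level so fitted claims match actual claims,
    # given the current car factors.
    for a in ages:
        actual = sum(claims[a, c] for c in cars)
        fitted_per_unit = sum(exposure[a, c] * car_factor[c] for c in cars)
        age_factor[a] = actual / fitted_per_unit
    # Then rebalance each car level given the updated age factors.
    for c in cars:
        actual = sum(claims[a, c] for a in ages)
        fitted_per_unit = sum(exposure[a, c] * age_factor[a] for a in ages)
        car_factor[c] = actual / fitted_per_unit

# Express each factor relative to its base level.
age_relativity = age_factor["young"] / age_factor["mature"]
car_relativity = car_factor["sports"] / car_factor["standard"]

print(f"multivariate age relativity: {age_relativity:.3f}")
print(f"multivariate car relativity: {car_relativity:.3f}")
```

Because the data was built from a multiplicative model with no noise, the fit returns the assumed relativities of 1.30 and 1.25, where a one-way reading of the same book gives about 1.46 and 1.44. Real data is noisy, so a real fit would only approximate the underlying values.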

Concretely, the modern recipe fits two GLMs over the same multivariate data, echoing the frequency–severity split: a Poisson model for how often claims arrive, and a gamma model for how big they are. Each variable gets one fair relativity in each model, untangled from its neighbours. Multiply the frequency prediction by the severity prediction and you have a pure premium — the expected loss cost — for any combination of risk characteristics, not just the cells you happened to observe a lot of.
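The multiplication can be sketched in a few lines. The two dictionaries below stand in for fitted frequency and severity models; every base value and relativity in them is hypothetical, chosen only so that the first risk multiplies out to the 534.6 modelled loss cost used in the rating example later in this guide. The `predict` helper is likewise an illustration, not any library's API.

```python
import math

# Hypothetical output of a frequency GLM (claims per car-year).
frequency_model = {
    "base": 0.10,
    "relativities": {
        ("age", "22"): 1.50,
        ("vehicle", "sports"): 1.00,
        ("territory", "urban"): 1.10,
    },
}

# Hypothetical output of a severity GLM (average cost per claim).
severity_model = {
    "base": 3000.0,
    "relativities": {
        ("age", "22"): 0.90,
        ("vehicle", "sports"): 1.20,
        ("territory", "urban"): 1.00,
    },
}


def predict(model, risk):
    """Base value times the relativity of each characteristic of the risk.
    Levels not listed in the model are treated as base (relativity 1.0)."""
    return model["base"] * math.prod(
        model["relativities"].get(item, 1.0) for item in risk.items()
    )


def pure_premium(risk):
    """Expected loss cost = expected frequency x expected severity."""
    return predict(frequency_model, risk) * predict(severity_model, risk)


urban_sports = {"age": "22", "vehicle": "sports", "territory": "urban"}
rural_standard = {"age": "22", "vehicle": "standard", "territory": "rural"}

pp_urban_sports = pure_premium(urban_sports)
pp_rural_standard = pure_premium(rural_standard)

print(f"22, sports, urban:   {pp_urban_sports:.1f}")
print(f"22, standard, rural: {pp_rural_standard:.1f}")
```

The second risk shows the point about unobserved cells: the models price any combination of levels from the same small set of relativities, whether or not that exact combination appeared often in the data.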

From model to rating plan: the loss-cost multiplier

A GLM gives you the pure premium — the expected losses — but a customer's bill is not just losses. Recall the fundamental insurance equation: premium must also pay for expenses and a fair profit. The bridge from the model's loss cost to the final price is the loss-cost multiplier (LCM). It is one number that grosses the pure premium up to cover everything outside losses, so the actuary who built the model and the company that loads on expenses can work independently.

The arithmetic is small and worth seeing. Say fixed expenses are 10% of premium, commission and other variable expenses are another 15%, and the company wants a 5% underwriting profit. Then losses must fit inside the remaining 70% of every premium dollar — that 70% is the permissible loss ratio. The loss-cost multiplier is simply one divided by that fraction, 1 ÷ 0.70 ≈ 1.43. Every dollar of modelled loss cost becomes about $1.43 of premium.

Permissible loss ratio = 1 - 0.10 - 0.15 - 0.05 = 0.70
Loss-cost multiplier   = 1 / 0.70           = 1.4286

RATING ALGORITHM (one policy)
  Base loss cost                       300
  x  Age relativity   (driver 22)    x 1.35
  x  Vehicle relativity (sports)     x 1.20
  x  Territory relativity (urban)    x 1.10
  = Modelled loss cost  300*1.35*1.20*1.10 = 534.6
  x  Loss-cost multiplier            x 1.4286
  = Indicated premium                = 763.7
  +  Policy fee                      + 25
  = Final premium                    = 788.7

The rating algorithm is the exact, reproducible sequence: start from a base, multiply the GLM relativities for this risk, gross up by the loss-cost multiplier, then add flat fees. Anyone can rerun it and get the same number — that reproducibility is what makes a price filable.

A real rating algorithm is usually longer than this example. It is the precise, ordered set of multiplications, additions, caps, and minimums that turns a base rate into the number on the customer's renewal notice. The GLM supplies the relativities; the rating algorithm is how they are assembled, plus the bits a GLM does not model: flat policy fees, increased-limits factors for higher coverage, deductible credits, and rules that no premium fall below a floor. The whole plan, meaning every base rate, factor, and rule, is what an insurer files with the regulator.
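The worked example above can be written as a short, rerunnable function. This is a sketch under stated assumptions rather than a production rater: the function names are invented here, the `minimum_premium` floor is a hypothetical extra (the example policy does not use one), and applying the floor after the policy fee is a design choice of this sketch, since a real plan specifies the order in its filed rules.

```python
import math


def loss_cost_multiplier(fixed_expense, variable_expense, profit):
    """One divided by the permissible loss ratio.
    All three inputs are fractions of premium, e.g. 0.10 for 10%."""
    permissible_loss_ratio = 1.0 - fixed_expense - variable_expense - profit
    if permissible_loss_ratio <= 0:
        raise ValueError("expenses and profit leave no room for losses")
    return 1.0 / permissible_loss_ratio


def rate_policy(base_loss_cost, relativities, lcm,
                policy_fee=0.0, minimum_premium=0.0):
    """Base x relativities x LCM, plus flat fee, subject to a floor."""
    modelled_loss_cost = base_loss_cost * math.prod(relativities.values())
    indicated_premium = modelled_loss_cost * lcm
    final_premium = max(indicated_premium + policy_fee, minimum_premium)
    return {
        "modelled_loss_cost": modelled_loss_cost,
        "indicated_premium": indicated_premium,
        "final_premium": final_premium,
    }


# Inputs taken from the worked example: 10% fixed, 15% variable, 5% profit.
lcm = loss_cost_multiplier(0.10, 0.15, 0.05)

result = rate_policy(
    base_loss_cost=300.0,
    relativities={"age_22": 1.35, "vehicle_sports": 1.20, "territory_urban": 1.10},
    lcm=lcm,
    policy_fee=25.0,
)

print(f"loss-cost multiplier: {lcm:.4f}")
for step, value in result.items():
    print(f"{step}: {value:.1f}")
```

Because the function carries the multiplier unrounded, its intermediate figures can differ in later decimal places from a hand calculation that uses 1.4286; to one decimal place the results agree with the worked example.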

Fair, adequate, and explainable

A sharper rating plan is not automatically a better one. Three standards travel with every rate, and you have met them: it must be adequate (enough to pay claims and stay solvent), not excessive (not gouging), and not unfairly discriminatory. That last phrase is the subtle one — rate equity does not mean charging everyone the same. It means price differences must rest on genuine, allowed cost differences. A GLM that charges a sports car more because sports cars genuinely cost more to insure is fair; one that charges more because of a variable the law forbids, or a variable that merely stands in for one it forbids, is not.

This is where the proxy problem bites. Drop a forbidden variable like race from the model and you have not necessarily removed its influence — a permitted variable such as postcode or occupation may quietly carry it, because the data was generated in a society where those things correlate. A model can be perfectly objective in its math and still encode a historical injustice it learned from the data. This is exactly the data quality and ethics warning from the statistics rung, now with real money and real people attached. The actuary, not the algorithm, owns the question of whether a variable is a fair cause or an unfair proxy.

Transparency is the price of admission. In many regulated markets an insurer must submit its plan through rate and form filing and defend it to a regulator, who can reject it. This is a large part of why the explainable GLM, not a higher-accuracy black box, remains the industry workhorse for filed rates. A regulator can read a table of relativities and ask why drivers under 25 pay 1.35; a neural network is far harder to interrogate the same way. Accuracy that you cannot explain, or that hides an unfair proxy, is accuracy you cannot use.

The arms race, and where the actuary stands

There is a competitive edge to all this that is worth naming plainly. If your rival prices multivariately and you price one-way, they can identify which of your customers you are overcharging and quote them less, while leaving you the ones you are undercharging. Over time you keep the bad risks and lose the good ones — the same adverse selection spiral from the very first rung, now driven by who has the better model. Segmentation, once a refinement, has become a survival skill.
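A toy version of that contest, under deliberately crude assumptions: a hypothetical four-cell market, an Insurer A that multiplies one-way relativities and balances its base rate to break even on the whole market, an Insurer B that prices every cell at its true expected cost, no expenses or profit loads, and customers who always take the cheaper quote. The true costs are invented (base 300, young 1.30, sports 1.25).

```python
# Hypothetical market: exposure (car-years) and true expected loss cost
# per car-year in each (age, car) cell.
exposure = {
    ("mature", "standard"): 6000,
    ("mature", "sports"): 1000,
    ("young", "standard"): 1000,
    ("young", "sports"): 2000,
}
true_cost = {
    ("mature", "standard"): 300.0,
    ("mature", "sports"): 375.0,
    ("young", "standard"): 390.0,
    ("young", "sports"): 487.5,
}


def one_way_relativity(position, level, base_level):
    """Average loss cost of `level` over that of `base_level`, one variable
    at a time (position 0 = age, 1 = car)."""

    def average_cost(lv):
        cells = [c for c in exposure if c[position] == lv]
        losses = sum(exposure[c] * true_cost[c] for c in cells)
        return losses / sum(exposure[c] for c in cells)

    return average_cost(level) / average_cost(base_level)


age_rel = {"mature": 1.0, "young": one_way_relativity(0, "young", "mature")}
car_rel = {"standard": 1.0, "sports": one_way_relativity(1, "sports", "standard")}

# Insurer A: one-way relativities multiplied together, with the base rate
# set so total premium equals total expected loss across the whole market.
total_loss = sum(exposure[c] * true_cost[c] for c in exposure)
base_a = total_loss / sum(
    exposure[a, c] * age_rel[a] * car_rel[c] for (a, c) in exposure
)
price_a = {(a, c): base_a * age_rel[a] * car_rel[c] for (a, c) in exposure}

# Insurer B: prices each cell at its true expected cost.
price_b = dict(true_cost)

# Customers take the cheaper quote, so A keeps only the cells it undercuts.
kept_by_a = [cell for cell in exposure if price_a[cell] <= price_b[cell]]
premium_a = sum(exposure[c] * price_a[c] for c in kept_by_a)
losses_a = sum(exposure[c] * true_cost[c] for c in kept_by_a)
loss_ratio_a = losses_a / premium_a

for cell in exposure:
    print(cell, f"A quotes {price_a[cell]:.0f}, B quotes {price_b[cell]:.0f}")
print("A keeps:", kept_by_a)
print(f"A's loss ratio on what it keeps: {loss_ratio_a:.2f}")
```

Insurer A's prices were balanced to a loss ratio of 1.00 on the full market, yet after customers shop around it holds only the segment it undercharges, and its loss ratio on that business rises to roughly 1.11. Nothing about the risks changed; only the mix did.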

That closes the Ratemaking & Pricing rung. You began with the fundamental insurance equation — premium = losses + expenses + profit — chose between the pure-premium and loss-ratio roads to an overall indication, trended and developed raw history into a fair view of the future, broke the rate down by class, and now you have seen modern pricing knit those classes together into one coherent, multivariate, filable plan. The thread back to statistics is now unmistakable: ratemaking is applied probability and regression, governed by professional judgement. Next the ladder turns to reinsurance — how an insurer, having priced its risk, hands part of that risk on.