Five Sigma: When Is It a Discovery?

A bump is not yet a discovery

By now you know how a hidden particle announces itself. You plot the reconstructed invariant mass of millions of events, and a heavy parent betrays itself as a little hill rising above the smooth slope of background — a bump where extra events pile up at one mass. The trouble is that the background is never perfectly smooth. It jitters. Even with nothing new in the data, random counting wobbles will throw up little hills and valleys all on their own, and from across the room a lucky cluster of background events can look exactly like the start of a real peak.

So when an experiment sees an excess, it faces a sharp, honest question before it dares say the word "discovery." Is this a new particle — or is it just the background having an unusually good day? The whole machinery of this guide exists to answer that one question with a number, not a hunch. The key idea is to ask: if there were truly nothing new here, how *often* would chance alone serve up a bump at least this big? If the answer is "all the time," shrug. If the answer is "almost never," you may be looking at something real.

How surprising is a wobble? Sigma and the p-value

To turn the question into a number you first need a yardstick for ordinary wobble. Counting is governed by a simple rule of thumb: if you expect about N events of background in some mass window, the random scatter around that expectation is roughly the square root of N. So if you expect 100 background events, you should not be startled to find 90 or 110 — a swing of about ten, which is the square root of 100. That square root is your natural unit of surprise, written with the Greek letter sigma. An excess of one sigma is utterly humdrum; the background does that constantly. The further an excess climbs above its expected scatter, the harder it is to wave away as luck.

Sigma is really a probability in disguise. Behind it sits the p-value: the chance that background alone, with no new physics whatsoever, would fake an excess at least as large as the one you saw. A small p-value means a big surprise. Translating between the two is the everyday language of the field: an upward fluctuation of one sigma or more happens roughly one time in six, two sigma about one time in forty-four, three sigma about one in 740. Each extra sigma is not a small step but a steep cliff: the odds of a fake plummet as sigma climbs. That steepness is exactly why physicists set the bar where they do.

expected background N      ->  natural scatter  ~  sqrt(N)
significance (sigma)       =   (observed - expected) / sqrt(N)

1 sigma  ~ 1 in 6
2 sigma  ~ 1 in 44
3 sigma  ~ 1 in 740
4 sigma  ~ 1 in 31,600
5 sigma  ~ 1 in 3,500,000

A rough significance is how far the excess rises above the expected scatter, in units of that scatter. Each step in sigma is a cliff, not a stair: the probability of a fake drops by huge factors with every additional sigma. (Real analyses use fuller statistical methods than this back-of-envelope ratio, but the spirit is exactly this.)
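
If you would like to check these numbers yourself, here is a minimal Python sketch of the back-of-envelope ratio and of the sigma-to-odds translation. It assumes the background wobble is well approximated by a Gaussian and uses the one-sided tail (upward fluctuations only). The function names and the counts of 130 observed on 100 expected are hypothetical, chosen purely for illustration.

```python
import math


def naive_significance(observed, expected_background):
    """Back-of-envelope ratio: (observed - expected) / sqrt(expected)."""
    return (observed - expected_background) / math.sqrt(expected_background)


def one_sided_p_value(n_sigma):
    """Chance that a Gaussian variable fluctuates upward by at least n_sigma."""
    return 0.5 * math.erfc(n_sigma / math.sqrt(2.0))


# Hypothetical counts: 130 events seen where 100 background events were expected.
z = naive_significance(observed=130, expected_background=100)
print(f"excess of 30 over 100 expected: {z:.1f} sigma")

# Translate whole numbers of sigma into "1 in N" odds of a pure-background fake.
for n_sigma in range(1, 6):
    p = one_sided_p_value(n_sigma)
    print(f"{n_sigma} sigma: p = {p:.3g}, about 1 in {1 / p:,.0f}")
```

The complementary error function is used rather than subtracting a cumulative probability from one, because the tail at five sigma is tiny and the direct form keeps its precision.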

Why five, and not three?

Here is the convention that rules the field. To claim evidence for something, particle physicists want about three sigma — a roughly one-in-740 chance of a fluke. But to claim an outright discovery, the bar is [[statistical-significance-five-sigma|five sigma]]: an excess so large that pure background would fake it only about once in three and a half million tries. That is a deliberately, almost absurdly, stringent demand. Why so harsh? Three sigma sounds rare, but three-sigma bumps appear and then evaporate with embarrassing regularity in this field — promising hills that melt away once more data arrives. The history of the subject is littered with three-sigma ghosts.

There are three honest reasons for setting the bar so high. First, physicists run an enormous number of searches, in countless mass windows and channels, so even rare flukes are bound to crop up somewhere — a point so important it gets its own section below. Second, the stakes are huge: a claimed discovery rewrites textbooks and steers a field of thousands, so the cost of a false alarm is severe. Third — and most subtly — the simple sigma calculation only counts random scatter, and it quietly trusts that the background was modelled perfectly. It never is. The extra cushion of five sigma is partly insurance against the imperfections in that model, the systematic uncertainties we will meet shortly.

The look-elsewhere effect: a thousand lottery tickets

Now the trap that catches the unwary. Suppose you scan a mass spectrum from end to end, hunting for a bump anywhere along it. At each spot the chance of a big random wobble is small — but you are not looking at one spot, you are looking at hundreds. It is the lottery: one ticket almost never wins, but buy a thousand and the chance that *some* ticket wins climbs sharply. A three-sigma bump at one specific mass you predicted in advance is genuinely surprising; the very same bump appearing *somewhere* in a wide spectrum you scanned freely is far less so, because you gave chance a thousand spots to throw one up. This inflation of apparent significance is the [[look-elsewhere-effect|look-elsewhere effect]].

Physicists handle this with two honest bookkeeping terms. The local significance is how surprising the bump is at the exact place it appeared, as if you had aimed there in advance. The global significance is how surprising it is once you fairly account for all the places you could have found a bump. The global number is always the smaller, more sober one — and it is the global figure that must reach five sigma for a discovery. Many a tantalising local four-sigma excess has quietly faded once the look-elsewhere correction dragged it down toward the unremarkable.
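
To get a feel for how hard the correction can bite, here is a rough Python sketch that turns a local significance into a global one. It rests on a crude assumption that is not how real analyses work in detail: the spectrum is treated as a fixed number of independent mass windows, each with the same chance of a fluke. The window counts and function names are hypothetical.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def p_from_sigma(n_sigma):
    """One-sided chance of an upward fluctuation of at least n_sigma."""
    return 1.0 - STANDARD_NORMAL.cdf(n_sigma)


def sigma_from_p(p_value):
    """Inverse of p_from_sigma: the sigma that corresponds to a p-value."""
    return STANDARD_NORMAL.inv_cdf(1.0 - p_value)


def global_significance(local_sigma, n_independent_windows):
    """Significance after giving chance n_independent_windows tries.

    Assumption: the windows are independent and equally likely to fluctuate.
    """
    p_local = p_from_sigma(local_sigma)
    # Chance that at least one of the windows fakes a bump this large.
    p_global = 1.0 - (1.0 - p_local) ** n_independent_windows
    return sigma_from_p(p_global)


for local in (3.0, 4.0):
    for windows in (1, 10, 100):  # hypothetical scan widths
        corrected = global_significance(local, windows)
        print(f"local {local:.1f} sigma over {windows:>3} windows "
              f"-> global {corrected:.2f} sigma")
```

With one window the global and local figures coincide, as they should for a mass you predicted in advance. Every extra window you scan drags the global number further down.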

Two kinds of uncertainty: the jitter and the bias

Behind every significance lies a measurement, and every measurement carries error bars of two profoundly different kinds. The first is statistical uncertainty: the random scatter you have already met, the square-root-of-N jitter of finite counting. Its defining feature is a generous one: relative to the size of the sample, it shrinks as you collect more data. Run the collider longer, gather four times the collisions, and the relative uncertainty roughly halves. Statistical error is the part of your ignorance that patience and more luminosity will cure.

The second kind is the dangerous one. Systematic uncertainty is not random scatter but a bias — a way your whole experiment might be consistently off in one direction. Maybe your calorimeter's energy scale is calibrated half a percent high, so every energy reads slightly large. Maybe your simulation of the background is subtly imperfect, so you misjudge how many ordinary events to expect. Crucially, this kind of error does *not* shrink with more data: take a billion measurements with a miscalibrated scale and you simply get a billion measurements that are all wrong by the same amount, with great precision. Think of a bathroom scale that reads two kilograms heavy — stepping on it a thousand times averages away the wobble but never the two-kilogram lie.
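
The bathroom scale is easy to simulate. The Python sketch below averages more and more readings from a scale that reads two kilograms heavy. The true weight of 70 kg, the half-kilogram jitter, and the random seed are all hypothetical values picked for illustration; only the two-kilogram bias comes from the example above.

```python
import random
import statistics


def average_reading(true_value, bias, jitter, n_readings, rng):
    """Average n_readings, each one shifted by a fixed bias plus random jitter."""
    readings = [true_value + bias + rng.gauss(0.0, jitter)
                for _ in range(n_readings)]
    return statistics.fmean(readings)


TRUE_WEIGHT_KG = 70.0   # hypothetical true value
SCALE_BIAS_KG = 2.0     # systematic: the scale always reads 2 kg heavy
JITTER_KG = 0.5         # statistical: random wobble on each reading

rng = random.Random(12345)  # fixed seed so the run is repeatable

for n in (10, 1_000, 100_000):
    mean = average_reading(TRUE_WEIGHT_KG, SCALE_BIAS_KG, JITTER_KG, n, rng)
    stat_error = JITTER_KG / n ** 0.5  # expected scatter of the average
    print(f"n = {n:>7}: average = {mean:.3f} kg, "
          f"statistical error ~ {stat_error:.4f} kg, "
          f"offset from truth = {mean - TRUE_WEIGHT_KG:+.3f} kg")
```

The statistical error column shrinks as the readings pile up, while the offset from the truth settles ever more precisely on the two-kilogram bias instead of going away.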

Five sigma in the wild: the discipline of a discovery

Put it all together and you can see why a real discovery is a slow, disciplined act, not a flash of insight. The 2012 Higgs discovery is the textbook case: two independent experiments each scanned for a bump, each watched a modest excess grow as the data piled up over months, each corrected for the look-elsewhere effect, each wrestled its systematic uncertainties to the ground — and only when both independently crossed five sigma, *at the same mass*, did the field allow itself to say the word. Two separate teams reaching the same answer is its own powerful check, far stronger than either alone.

Underpinning all of it is a quiet discipline you will meet again in the next guide: [[blind-analysis-combination|blind analysis]]. Because humans see what they hope to see, experiments fix every selection rule and every cut *before* anyone is allowed to look at the bump region — so the analysis cannot be unconsciously tuned to flatter a hopeful wobble. Only when the method is frozen do they unblind and read the answer. That habit, together with five sigma and the look-elsewhere correction, is the immune system that keeps the field honest.

One last honest word, to keep the whole standard in proportion. Five sigma tells you that you have found *something* real beyond the background, a genuine excess that is very unlikely to be a fluke. It does not tell you *what* that something is. The discovery announced in 2012 was, strictly, a new boson of about the right mass; calling it *the* Higgs took further years of measuring how it decayed and how strongly it coupled to other particles, checking each property against the prediction. And five sigma is no guarantee of immortality: a result can still be overturned by a discovered flaw in the analysis, which is exactly why independent confirmation matters so much. The standard is not a magic wand. It is a sober, hard-won discipline for telling the rare truth from the common lie, and a fitting capstone to everything this rung has taught about turning collisions into knowledge. Next we will watch that discipline play out across the great discoveries themselves.