The Design of Experiments
Randomise the experiment, and let the data risk proving you wrong.
A statistician, a summer afternoon, and a single cup of tea became the moment science learned how to run a fair experiment.
The idea, unpacked
Before Fisher, experiments were often a muddle: change something, see what happens, then argue about whether the result was real. Fisher sharpened the question. Instead of asking 'is my idea true?', he asked 'could pure chance have produced a result this striking?' You begin by stating a deliberately dull claim — the null hypothesis, that there is no real effect and it is all luck — and then build an experiment that gives the facts a fair chance to overthrow it.
His second move was randomisation. By deciding the order and layout of an experiment by chance — a coin, a shuffle — you turn luck from your enemy into your measuring stick. Now you can calculate exactly how often chance alone would fool you, and choose to believe an effect only when it beats those odds.
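The reasoning above can be made concrete with a small sketch. It assumes a toy trial of eight units, four of them treated; the function names and the outcome numbers are hypothetical, invented for illustration, and are not Fisher's data. The idea is to let chance choose the layout, then count how many of the equally likely layouts would have produced a difference at least as large as the one observed.

```python
import random
from itertools import combinations


def randomise_layout(n_units, n_treated, rng):
    """Pick which units receive the treatment purely by chance."""
    units = list(range(n_units))
    rng.shuffle(units)
    return sorted(units[:n_treated])


def mean_difference(outcomes, treated):
    """Mean outcome of the treated units minus mean outcome of the rest."""
    chosen = set(treated)
    treated_vals = [y for i, y in enumerate(outcomes) if i in chosen]
    control_vals = [y for i, y in enumerate(outcomes) if i not in chosen]
    return (sum(treated_vals) / len(treated_vals)
            - sum(control_vals) / len(control_vals))


def randomisation_p_value(outcomes, treated):
    """Share of all possible layouts whose difference is at least as large
    as the observed one, assuming the treatment does nothing (the null)."""
    observed = mean_difference(outcomes, treated)
    layouts = combinations(range(len(outcomes)), len(treated))
    diffs = [mean_difference(outcomes, layout) for layout in layouts]
    # Small tolerance so floating-point rounding cannot hide an exact tie.
    as_extreme = sum(1 for d in diffs if d >= observed - 1e-12)
    return as_extreme / len(diffs)


if __name__ == "__main__":
    rng = random.Random(1935)  # fixed seed only so the sketch is repeatable
    print("treated units:", randomise_layout(8, 4, rng))

    # Hypothetical outcomes for units 0..7, and a hypothetical layout.
    outcomes = [5.1, 4.8, 6.0, 6.3, 5.0, 6.1, 4.9, 6.2]
    treated = [2, 3, 5, 7]
    print("p-value:", randomisation_p_value(outcomes, treated))
```

In this made-up example every treated unit outscores every untreated one, so only one of the 70 possible layouts does as well as the observed one and the p-value is 1/70.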
Where it came from
The scene is the Rothamsted agricultural research station in England in the 1920s. A colleague, the algae scientist Muriel Bristol, insisted that a cup of tea tasted different depending on whether the milk or the tea was poured in first. Fisher thought it nonsense; another scientist present, William Roach, said simply: let's put her to the test.
Fisher had spent years at Rothamsted wrestling messy crop-trial data into shape, inventing the analysis of variance and the rules of good experimental design as he went. In that tea party he saw an entire philosophy of experiment in miniature, and in his 1935 book The Design of Experiments he used it to open his account of the principles of experimentation.
Why it mattered
It handed every science a shared, honest procedure for telling a real signal from a lucky fluke, and a shared language — null hypothesis, significance, p-value — to argue in. Above all it placed randomisation at the heart of credible evidence. The randomised controlled trial, the reason we can trust that a medicine truly works rather than merely seeming to, is Fisher's cup of tea grown all the way up.
A sharpened coin-toss
Suppose a friend claims they can call a coin before it lands. One correct call proves nothing, because anyone is right half the time. But ten correct calls in a row? Chance would manage that only once in 1,024 tries, so now you start to believe them. Fisher's tea test is exactly this, made precise: with eight cups, four of each kind, there are seventy ways to split them into two fours, so a perfect sorting happens by luck only one time in seventy, which is rare enough to take seriously.
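The arithmetic behind both figures can be checked in a few lines. This is a minimal sketch; the function names are illustrative, and it assumes the taster knows there are four cups of each kind and always names exactly four as milk-first.

```python
from fractions import Fraction
from math import comb


def coin_streak_probability(n_calls):
    """Chance of calling a fair coin correctly n_calls times in a row."""
    return Fraction(1, 2) ** n_calls


def tea_distribution(cups_per_kind=4):
    """Chance, under pure guessing, of picking k of the true milk-first cups
    when naming cups_per_kind cups out of 2 * cups_per_kind."""
    total = comb(2 * cups_per_kind, cups_per_kind)  # ways to choose the cups
    return {
        k: Fraction(
            comb(cups_per_kind, k) * comb(cups_per_kind, cups_per_kind - k),
            total,
        )
        for k in range(cups_per_kind + 1)
    }


if __name__ == "__main__":
    print("ten calls in a row:", coin_streak_probability(10))  # 1/1024
    print("ways to split eight cups:", comb(8, 4))             # 70
    for k, p in tea_distribution().items():
        print(f"{k} of 4 milk-first cups identified: {p}")
```

A perfect sorting has probability 1/70, while three right out of four happens by luck 16 times in 70, which is why only a flawless performance counts as striking in a test this small.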
What came before and after
Fisher built on the older mathematics of probability, including Thomas Bayes's essay of 1763, but turned it in a new, practical direction: not updating beliefs, but designing experiments. His significance testing was soon challenged by Jerzy Neyman and Egon Pearson, whose rival 'error-rate' framework still competes with it today, and by the Bayesians. Yet the randomised experiment he championed became the gold standard of evidence, from medicine to economics. When you read that a finding is 'statistically significant', you are reading Fisher.
A lady declares that by tasting a cup of tea made with milk she can discriminate whether the milk or the tea infusion was first added to the cup.
... the null hypothesis is never proved or established, but is possibly disproved, in the course of experimentation. Every experiment may be said to exist only in order to give the facts a chance of disproving the null hypothesis.