That One Weird Reading: Outliers and the Q-Test

The reading you wish you hadn't taken

You titrate a sample four times and get 10.12, 10.15, 10.13, and 10.31 mL. Three results huddle together near 10.13; the fourth sits far out at 10.31. A result that lies suspiciously far from the rest is called an outlier. The temptation is obvious: delete the ugly one and the average looks lovely. But deleting data you simply dislike is how honest science quietly becomes fiction.

There are two possible explanations, and they call for opposite actions. Either a real blunder happened (you misread the burette, an air bubble slipped through, you recorded the wrong digit), which is a gross error and genuinely deserves deletion. Or the point is just an unlucky tail of ordinary random error, a legitimate reading you must keep. The hard part is that the number alone looks the same either way.

The honest first move: investigate, don't delete

The cleanest reason to drop a point is a documented physical cause. So before any statistics, check your notebook: did you note a bubble, a spill, a re-zeroed instrument? If you find a recorded blunder, remove the point and say why. If you find nothing, you are not allowed to delete on a hunch — and that's where a fair statistical test earns its keep.

The Q-test: a fair ruler for suspects

The Q-test (Dixon's Q) gives a simple, defensible rule. Its idea: compare the gap between the suspect and its nearest neighbour against the total spread of the data. If that gap is a big fraction of the whole range, the point is too far away to be ordinary scatter; if it's a small fraction, the point is just part of the crowd.

Sort the values: 10.12, 10.13, 10.15, 10.31. The suspect is 10.31.
Gap to nearest neighbour = 10.31 − 10.15 = 0.16. Total range = 10.31 − 10.12 = 0.19.
Q(calculated) = gap ÷ range = 0.16 ÷ 0.19 = 0.842.
Look up Q(critical) for 4 values at 95% confidence: 0.829. Since 0.842 > 0.829, you may reject 10.31.

Calculated Q beats the critical value, so the test gives you permission to discard 10.31 as an outlier and average the remaining three. Had Q come out below the critical value, you would have been obliged to keep every point — wishful thinking is not a statistic.
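
If you would rather let a few lines of code do the arithmetic, the worked example above can be sketched as follows. This is a minimal illustration, not a standard library routine: the function name dixon_q and the table name Q_CRIT_95 are made up for this example, and only the n = 4 entry (0.829) comes from the text. The other critical values are assumed from commonly published 95% tables, so check them against your own reference before relying on them.

```python
# Critical values of Dixon's Q at 95% confidence, keyed by number of values.
# Only the n = 4 entry (0.829) is quoted in the text; the rest are assumed
# from commonly published tables and should be checked before use.
Q_CRIT_95 = {3: 0.970, 4: 0.829, 5: 0.710, 6: 0.625, 7: 0.568}


def dixon_q(values):
    """Return (suspect, Q) for the most isolated end value of the data."""
    data = sorted(values)
    if len(data) < 3:
        raise ValueError("need at least three values")
    data_range = data[-1] - data[0]
    if data_range == 0:
        raise ValueError("all values are identical, so Q is undefined")
    gap_low = data[1] - data[0]      # gap between lowest value and its neighbour
    gap_high = data[-1] - data[-2]   # gap between highest value and its neighbour
    # The suspect is whichever end value sits further from its neighbour.
    if gap_high >= gap_low:
        return data[-1], gap_high / data_range
    return data[0], gap_low / data_range


readings = [10.12, 10.15, 10.13, 10.31]
suspect, q = dixon_q(readings)
q_crit = Q_CRIT_95[len(readings)]

print(f"suspect = {suspect}, Q = {q:.3f}, Q(critical) = {q_crit}")
print("reject" if q > q_crit else "keep")
```

The function only reports the suspect and its Q. The decision to reject stays a separate, visible comparison against the critical value, which mirrors the order of the steps above.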

Handle with care

The Q-test has real limits. With only three or four points it is weak — it can't tell a genuine blunder from bad luck very reliably. Never apply it twice to throw out a second point. And it judges only one value at a time, so it's blind to two outliers hiding on the same side. The honest fix for a shaky dataset is almost always to measure more replicates, not to test harder.
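
To see the "two outliers on the same side" blindness in numbers, here is a small sketch. The fifth reading of 10.30 is hypothetical and added purely for illustration, and the helper name q_for_largest is invented for this example.

```python
def q_for_largest(values):
    """Dixon's Q for the largest value: gap to its neighbour over the range."""
    data = sorted(values)
    return (data[-1] - data[-2]) / (data[-1] - data[0])


one_high = [10.12, 10.13, 10.15, 10.31]
# Hypothetical: a second stray reading lands right next to the first one.
two_high = [10.12, 10.13, 10.15, 10.30, 10.31]

q_one = q_for_largest(one_high)
q_two = q_for_largest(two_high)

print(f"one high value:  Q = {q_one:.2f}")
print(f"two high values: Q = {q_two:.2f}")
```

With the second stray value sitting beside the first, the gap shrinks to 0.01 while the range stays at 0.19, so Q collapses to about 0.05. The test waves both points through, even though they look more suspicious together than 10.31 did alone.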

And remember what rejection actually changes. With 10.31 gone, the mean drops from 10.178 to 10.133 and the standard deviation shrinks from about 0.089 to about 0.015, so your result looks roughly six times more precise. That's exactly why the rule must be objective and decided before you peek at the answer, never bent to flatter the number you were hoping for.
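
You can check the size of that effect yourself. The sketch below uses Python's standard statistics module and the four readings from the example. The variable names are illustrative, and stdev here is the sample standard deviation (dividing by n − 1), which is an assumption about which definition you want.

```python
from statistics import mean, stdev

readings = [10.12, 10.15, 10.13, 10.31]
kept = [x for x in readings if x != 10.31]  # drop the rejected value

# stdev() is the sample standard deviation (n - 1 in the denominator).
print(f"all four readings: mean = {mean(readings):.4f}, s = {stdev(readings):.4f}")
print(f"without 10.31:     mean = {mean(kept):.4f}, s = {stdev(kept):.4f}")
```

One rejected point moves the mean by only about 0.04 mL but cuts the standard deviation several-fold, which is why the decision to reject has to be made by rule rather than by taste.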