Same or Different? t-Tests, F-Tests, and Correlation

The skeptic's starting point

A new, cheaper method gives a mean of 9.87% iron; the trusted reference method gives 9.91%. They differ by 0.04 — but both methods scatter, so any two means will differ a little by pure chance. The grown-up question is: is 0.04 bigger than what chance alone would produce? To answer it, statisticians start from a deliberately boring assumption.

That assumption is the null hypothesis: 'there is no real difference; the gap is just random scatter.' We don't believe it or disbelieve it at the start; we put it on trial. A test then asks: if the null hypothesis were true, how surprising would my data be? If the data would be very surprising under 'no difference', we reject the null and conclude the difference is real. If they would not be surprising, we fail to reject it, which means the data show no evidence of a difference, not that the two methods are proven identical.

How surprising is 'too surprising'? You set that threshold in advance: the significance level, usually 5% (written α = 0.05). It is the chance you're willing to accept of crying 'difference!' when there really isn't one, in other words a false alarm. A smaller α makes false alarms rarer, but it also makes a genuine difference easier to miss.
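The meaning of α can be checked by simulation. The sketch below is illustrative only: it draws two sets of five results from the same hypothetical population (mean 9.91, standard deviation 0.02, both invented), so the null hypothesis is true by construction, and counts how often the t ratio described in the next section still crosses the tabulated 5% threshold. The function names, the seed, and the number of trials are assumptions made for this example.

```python
import math
import random
import statistics

# Two-tailed critical t for alpha = 0.05 and 8 degrees of freedom (standard table value).
T_CRITICAL_DF8 = 2.306


def pooled_t(a, b):
    """t ratio for two samples, using a pooled standard deviation."""
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * statistics.variance(a)
                  + (n2 - 1) * statistics.variance(b)) / (n1 + n2 - 2)
    uncertainty = math.sqrt(pooled_var) * math.sqrt(1 / n1 + 1 / n2)
    return abs(statistics.mean(a) - statistics.mean(b)) / uncertainty


def false_alarm_rate(trials=4000, n=5, seed=1):
    """Fraction of trials that 'detect' a difference when none exists."""
    rng = random.Random(seed)
    alarms = 0
    for _ in range(trials):
        # Both samples come from the SAME population: the null is true.
        a = [rng.gauss(9.91, 0.02) for _ in range(n)]
        b = [rng.gauss(9.91, 0.02) for _ in range(n)]
        if pooled_t(a, b) > T_CRITICAL_DF8:
            alarms += 1
    return alarms / trials


rate = false_alarm_rate()
print(f"false-alarm rate with no real difference: {rate:.3f}")
```

The printed rate should land close to 0.05: with nothing to find, roughly one comparison in twenty still raises the alarm, which is exactly what choosing α = 0.05 signs you up for.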

The t-test: do two means differ?

The t-test compares two means while accounting for their scatter. Its logic is a ratio: the difference between the means, divided by the uncertainty in that difference. A big difference riding on tiny scatter gives a large t, which is convincing. A small difference drowning in big scatter gives a small t, which is not. You compare your calculated t against a critical t from a table, just as with the Q-test; the critical value depends on the significance level you chose and on the degrees of freedom. If the calculated t exceeds the critical t, you reject the null hypothesis.

When the two methods share a similar scatter, we combine their two standard deviations into one better estimate called the pooled standard deviation. Pooling makes sense because both data sets are telling you about the same underlying random error, so merging them gives more degrees of freedom and a sharper test than either set alone.
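As a minimal sketch of the pooled t-test, the example below uses invented replicate results; the text gives only the two means, so the individual values, the five replicates per method, and the function names are all assumptions made for illustration. The values were chosen so that the means come out at 9.87 and 9.91.

```python
import math
import statistics

# Hypothetical replicate results (% iron), invented for this example.
new_method = [9.85, 9.89, 9.86, 9.88, 9.87]   # mean 9.87
reference = [9.90, 9.92, 9.91, 9.90, 9.92]    # mean 9.91


def pooled_std(a, b):
    """Combine two standard deviations, weighting each variance by its degrees of freedom."""
    n1, n2 = len(a), len(b)
    var1, var2 = statistics.variance(a), statistics.variance(b)
    return math.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))


def t_statistic(a, b):
    """Difference between the means divided by the uncertainty in that difference."""
    n1, n2 = len(a), len(b)
    difference = abs(statistics.mean(a) - statistics.mean(b))
    uncertainty = pooled_std(a, b) * math.sqrt(1 / n1 + 1 / n2)
    return difference / uncertainty


# Two-tailed critical t for alpha = 0.05 and 5 + 5 - 2 = 8 degrees of freedom
# (standard table value).
T_CRITICAL = 2.306

t_calc = t_statistic(new_method, reference)
reject_null = t_calc > T_CRITICAL
print(f"t = {t_calc:.2f}, critical t = {T_CRITICAL}, reject null: {reject_null}")
```

For these made-up numbers t comes out at about 4.78, well above 2.306, so the 0.04 gap would count as a real difference at the 5% level. With noisier replicates the same 0.04 gap could easily fail the test.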

The F-test: do two spreads differ?

Sometimes the interesting question isn't about the means but about the spreads. Is the cheap new method less precise than the reference — does it scatter more? The F-test compares two precisions by taking the ratio of their variances (the squares of the standard deviations), always putting the larger variance on top so F ≥ 1.

If the two methods are equally precise, their variances are about equal and F sits near 1. If one scatters far more, F climbs well above 1; once it passes the critical F from the table, you conclude the precisions genuinely differ. The F-test is also the gatekeeper for the t-test: you usually check with F whether the two scatters are similar enough to justify pooling their standard deviations in the first place.
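The F-test can be sketched the same way. The replicate values below are again hypothetical, as is the function name. The critical value assumes five replicates per method (4 and 4 degrees of freedom) and a two-sided question at α = 0.05, 'do the precisions differ in either direction?'; for the one-sided question 'is the new method less precise?' the tabulated value would be 6.39 instead.

```python
import statistics

# Hypothetical replicate results (% iron), invented for this example.
new_method = [9.85, 9.89, 9.86, 9.88, 9.87]
reference = [9.90, 9.92, 9.91, 9.90, 9.92]


def f_statistic(a, b):
    """Ratio of the two sample variances, larger on top so that F >= 1."""
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    return max(var_a, var_b) / min(var_a, var_b)


# Upper 2.5% point of F with 4 and 4 degrees of freedom (standard table value),
# used here for a two-sided test at alpha = 0.05.
F_CRITICAL = 9.60

f_calc = f_statistic(new_method, reference)
precisions_differ = f_calc > F_CRITICAL
print(f"F = {f_calc:.2f}, critical F = {F_CRITICAL}, precisions differ: {precisions_differ}")
```

Here F is 2.5, comfortably below the critical value, so these invented data give no reason to think the two scatters differ, and pooling the standard deviations for a t-test would be justified.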

Correlation: do two things move together?

A different family of question: as one quantity rises, does another rise with it? Plot instrument signal against known concentration and you hope for a straight line. The correlation coefficient (r) measures how tightly points hug a straight line. It runs from −1 to +1: r near +1 means a tight upward line, r near −1 a tight downward line, and r near 0 means a shapeless cloud with no linear trend.
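A short sketch of how r is computed, using the standard Pearson formula: the sum of the products of the deviations in x and y, divided by the square root of the product of their summed squares. The calibration data and the function name are hypothetical, chosen only to show a tight upward line.

```python
import math

# Hypothetical calibration data, invented for this example.
concentration = [0.0, 2.0, 4.0, 6.0, 8.0]    # known concentration (arbitrary units)
signal = [0.02, 0.41, 0.79, 1.22, 1.60]      # instrument signal


def pearson_r(x, y):
    """Correlation coefficient: how tightly the points hug a straight line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_yy = sum((yi - mean_y) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * s_yy)


r_up = pearson_r(concentration, signal)
# Reversing the signal order turns the same points into a downward line.
r_down = pearson_r(concentration, signal[::-1])
print(f"upward line: r = {r_up:.4f}; downward line: r = {r_down:.4f}")
```

For these numbers r is about +0.9998 for the upward line and about −0.9998 once the signals are reversed. Because r only measures straight-line behaviour, a strongly curved relationship can still give an r near 0.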

These tools share one quiet discipline. Choose your test and your significance level before you see the result, state the null hypothesis plainly, and report the verdict whether or not it's the one you hoped for. That discipline is what turns a pile of numbers into a conclusion someone else can trust.