Uncorrelated Is Not Independent

Two claims that look like one

In the last guide you built covariance and the correlation coefficient, and you saw that independent variables always have Cov(X, Y) = 0. It is tempting to read that sentence backwards and conclude that zero covariance means independence. It does not. The forward claim and its reverse are two different statements, and only the forward one is true: independence implies uncorrelated, but uncorrelated does not imply independence. This guide is about that one-way street and why it matters.

It helps to remember what each word actually demands. Independence is a statement about the whole joint distribution: it requires P(X in A and Y in B) to factor as P(X in A) times P(Y in B) for every pair of events, so knowing Y tells you absolutely nothing about X, ever. Uncorrelated is a single-number condition, Cov(X, Y) = 0, which is the same as E[XY] = E[X]E[Y]. Independence is an infinite list of conditions; being uncorrelated is just one consequence of that list. One number cannot possibly capture the whole joint behaviour, so passing the covariance test is far weaker than passing them all.
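
For reference, here are the two conditions side by side, followed by the short argument for the forward direction. This is a sketch for the discrete case only, assuming the expectations exist; p, p_X and p_Y are notation introduced here for the joint and marginal pmfs.

```latex
\begin{align*}
\text{independent:}\quad
  & P(X \in A,\ Y \in B) = P(X \in A)\,P(Y \in B)
    \quad \text{for all events } A, B \\
\text{uncorrelated:}\quad
  & \operatorname{Cov}(X, Y) = E[XY] - E[X]\,E[Y] = 0 \\[6pt]
\text{if } p(x, y) = p_X(x)\,p_Y(y):\quad
  & E[XY] = \sum_{x}\sum_{y} x\,y\,p_X(x)\,p_Y(y)
          = \Big(\sum_{x} x\,p_X(x)\Big)\Big(\sum_{y} y\,p_Y(y)\Big)
          = E[X]\,E[Y]
\end{align*}
```

The last line uses factorization at every point (x, y). Nothing in it runs in reverse: knowing only that the two sides agree after summing tells you nothing about the individual cells.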

The point of a U: a tiny counterexample

The cleanest way to feel the gap is a counterexample so small you can check it by hand. Let X take the values -1, 0, +1, each with probability 1/3, and define Y = X^2 exactly. So Y is a deterministic function of X: when X = -1, Y = 1; when X = 0, Y = 0; when X = +1, Y = 1. These variables are about as dependent as two variables can be — fix X and Y is completely pinned down. Yet, as we will compute, they are perfectly uncorrelated.

Find E[X]. By symmetry of -1, 0, +1 each at 1/3, E[X] = (-1 + 0 + 1)/3 = 0.
Find E[XY] = E[X times X^2] = E[X^3]. Since X^3 is -1, 0, +1 with equal weight, E[X^3] = (-1 + 0 + 1)/3 = 0.
Form the covariance: Cov(X, Y) = E[XY] - E[X]E[Y] = 0 - (0)(E[Y]) = 0, whatever E[Y] is (here E[Y] = 2/3). So X and Y are uncorrelated.
Now test independence and watch it fail: P(X = 1 and Y = 0) = 0 (X = 1 forces Y = 1), but P(X = 1) times P(Y = 0) = (1/3)(1/3) = 1/9, which is not 0. The factorization breaks, so X and Y are NOT independent.
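
If you would rather let a machine do the arithmetic, the four steps above can be replayed with exact fractions. This is a minimal sketch; the names p_x, joint and expect are choices made for this illustration, not notation from the earlier guides.

```python
from fractions import Fraction

# pmf of X: the values -1, 0, +1, each with probability 1/3.
p_x = {x: Fraction(1, 3) for x in (-1, 0, 1)}

# Joint pmf of (X, Y) with Y = X^2: all mass sits on the points (x, x^2).
joint = {(x, x * x): p for x, p in p_x.items()}


def expect(g):
    """E[g(X, Y)] under the joint pmf."""
    return sum(p * g(x, y) for (x, y), p in joint.items())


e_x = expect(lambda x, y: x)        # E[X]
e_y = expect(lambda x, y: y)        # E[Y]
e_xy = expect(lambda x, y: x * y)   # E[XY], which is E[X^3] here
cov = e_xy - e_x * e_y

# One cell is enough to break the factorization.
p_joint = joint.get((1, 0), Fraction(0))                 # P(X = 1, Y = 0)
p_x1 = sum(p for (x, _), p in joint.items() if x == 1)   # P(X = 1)
p_y0 = sum(p for (_, y), p in joint.items() if y == 0)   # P(Y = 0)

print("E[X] =", e_x, " E[Y] =", e_y, " E[XY] =", e_xy, " Cov =", cov)
print("P(X=1, Y=0) =", p_joint, " vs  P(X=1) P(Y=0) =", p_x1 * p_y0)
```

Exact fractions are used so that "equals zero" means exactly zero rather than "small up to rounding".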

Why the trick works every time

That was not luck; it was symmetry. Covariance, through the bilinearity you met in the last guide, measures only the straight-line, signed tendency of Y to rise as X rises. Our relationship Y = X^2 is a U: as X goes from -1 to 0, Y falls; as X goes from 0 to +1, Y rises. The down-slope on the left and the up-slope on the right are mirror images, so the positive and negative contributions to the covariance cancel exactly. Covariance sees the net tilt of the cloud of points, and a symmetric U has no net tilt at all.
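
To watch the cancellation happen, you can split the covariance into the contribution of each point, using Cov(X, Y) = sum of p (x - E[X]) (y - E[Y]). A small sketch under the same setup as the counterexample; the variable names are illustrative.

```python
from fractions import Fraction

points = [(-1, 1), (0, 0), (1, 1)]   # the three (x, y) points, with y = x^2
p = Fraction(1, 3)                   # each point carries probability 1/3

mean_x = sum(p * x for x, _ in points)
mean_y = sum(p * y for _, y in points)

# Contribution of each point to Cov(X, Y), keyed by its x value.
contributions = {x: p * (x - mean_x) * (y - mean_y) for x, y in points}

for x, c in contributions.items():
    print(f"x = {x:+d}: contribution {c}")
print("total covariance:", sum(contributions.values()))
```

The left arm and the right arm contribute equal amounts with opposite signs, and the middle point contributes nothing because it sits at the mean of X.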

This reveals the real lesson. The correlation coefficient rho is a measure of *linear* association only, bounded between -1 and +1, and it equals plus or minus 1 exactly when Y is a perfect straight-line function of X with nonzero slope. A perfect parabola, a perfect circle, a perfect sine wave: all can yield rho = 0 while being perfectly predictable. Zero correlation says 'no straight-line trend,' which is genuinely useful information, but it is silent about curves, clusters, and any pattern that is not a line. This is the heart of the uncorrelated-is-not-independent phenomenon.
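
A quick numerical check of that claim: the sketch below computes the sample correlation for an exact line, an exact parabola on a symmetric grid, and points spaced evenly around a circle. The pearson helper and the particular grids are assumptions made for this illustration; the symmetry of the grids is what drives the zeros.

```python
import math


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# A symmetric grid of x values on [-1, 1].
xs = [i / 50 for i in range(-50, 51)]

rho_line = pearson(xs, [2 * x + 1 for x in xs])   # exact straight line
rho_parabola = pearson(xs, [x * x for x in xs])   # exact parabola

# Points evenly spaced around the unit circle, so x^2 + y^2 = 1.
angles = [2 * math.pi * k / 360 for k in range(360)]
rho_circle = pearson([math.cos(a) for a in angles],
                     [math.sin(a) for a in angles])

print(f"line:     {rho_line:.6f}")
print(f"parabola: {rho_parabola:.6f}")
print(f"circle:   {rho_circle:.6f}")
```

The line scores 1 and the two curves score 0 up to floating-point rounding, even though in all three cases the points obey an exact equation.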

The one place where they do coincide

There is a famous and important exception, and knowing it precisely keeps you from over-applying it. If (X, Y) follow a bivariate normal distribution together, then zero correlation *does* imply independence. For jointly Gaussian variables, the single correlation parameter controls the entire dependence structure, so killing the correlation really does sever every link. This is why, in the comfortable world of the bell curve, people sometimes blur the two ideas — and inside that world the blur is harmless.
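
The reason is visible in the density itself. The following is a sketch assuming the standard non-degenerate bivariate normal density, with means, standard deviations and correlation written as mu, sigma and rho (with |rho| < 1):

```latex
\begin{align*}
f(x, y) &= \frac{1}{2\pi\,\sigma_X \sigma_Y \sqrt{1 - \rho^2}}
  \exp\!\left( -\frac{1}{2(1 - \rho^2)} \left[
    \frac{(x - \mu_X)^2}{\sigma_X^2}
    - \frac{2\rho\,(x - \mu_X)(y - \mu_Y)}{\sigma_X \sigma_Y}
    + \frac{(y - \mu_Y)^2}{\sigma_Y^2}
  \right] \right) \\[6pt]
\rho = 0 \;\Longrightarrow\;
f(x, y) &= \frac{1}{\sqrt{2\pi}\,\sigma_X}
  \exp\!\left( -\frac{(x - \mu_X)^2}{2\sigma_X^2} \right)
  \cdot
  \frac{1}{\sqrt{2\pi}\,\sigma_Y}
  \exp\!\left( -\frac{(y - \mu_Y)^2}{2\sigma_Y^2} \right)
  = f_X(x)\, f_Y(y)
\end{align*}
```

The cross term is the only place where x and y interact, and rho multiplies it. Set rho to zero and the exponent splits into an x part and a y part, so the joint density factors into the two marginals.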

But the condition that earns this gift is sharper than it sounds. It is not enough that X is normal and Y is normal *separately* — they must be *jointly* normal, a property of the whole joint distribution. You can build two variables that are each perfectly normal on their own yet whose joint shape is not the smooth Gaussian hill, and such a pair can be uncorrelated without being independent. So the safe statement is narrow: under genuine joint normality, uncorrelated and independent coincide; in general they do not. Treat the Gaussian case as a special privilege, not a default.
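
One standard way to build such a pair is a random sign flip: take X standard normal, take S equal to +1 or -1 with probability 1/2 each, independent of X, and set Y = S times X. By symmetry Y is again standard normal, and Cov(X, Y) = E[S] E[X^2] = 0, yet |Y| always equals |X|. The simulation below is a hedged sketch of that construction; the seed, the sample size and the pearson helper are arbitrary choices for this illustration.

```python
import math
import random

random.seed(0)   # fixed seed so the run is repeatable
n = 100_000

xs = [random.gauss(0.0, 1.0) for _ in range(n)]          # X ~ N(0, 1)
signs = [random.choice((-1.0, 1.0)) for _ in range(n)]   # S, drawn separately from X
ys = [s * x for s, x in zip(signs, xs)]                  # Y = S * X


def pearson(us, vs):
    """Sample Pearson correlation of two equal-length sequences."""
    m = len(us)
    mu, mv = sum(us) / m, sum(vs) / m
    suv = sum((u - mu) * (v - mv) for u, v in zip(us, vs))
    suu = sum((u - mu) ** 2 for u in us)
    svv = sum((v - mv) ** 2 for v in vs)
    return suv / math.sqrt(suu * svv)


rho_xy = pearson(xs, ys)                                        # close to 0
rho_abs = pearson([abs(x) for x in xs], [abs(y) for y in ys])   # close to 1

print(f"corr(X, Y)     = {rho_xy:.4f}")
print(f"corr(|X|, |Y|) = {rho_abs:.4f}")
```

The pair is not jointly normal: X + Y is exactly zero about half the time, which no normal distribution with positive variance allows. That is the loophole the correlation test falls through.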

What to do, and where this points next

So how do you actually check independence rather than mere uncorrelatedness? You go back to the definition and the factorization criterion: X and Y are independent exactly when the joint pmf or pdf factors as the product of the two marginals, f(x, y) = f_X(x) times f_Y(y), for all x and y. If you cannot factor it — if there is even one corner of the joint where the product fails, as with P(X = 1, Y = 0) above — they are not independent, no matter how the covariance comes out. Factorization is the real test; zero correlation is only a necessary symptom, never the diagnosis.
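
For a discrete joint pmf, the factorization test is mechanical enough to automate. The sketch below assumes the joint is stored as a dictionary mapping (x, y) to an exact probability; the function names marginals, failing_cells and is_independent are hypothetical helpers written for this guide.

```python
from fractions import Fraction
from itertools import product


def marginals(joint):
    """Marginal pmfs of X and Y from a joint pmf {(x, y): probability}."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0) + p
        py[y] = py.get(y, 0) + p
    return px, py


def failing_cells(joint):
    """Every (x, y) where P(X=x, Y=y) differs from P(X=x) P(Y=y)."""
    px, py = marginals(joint)
    return [(x, y) for x, y in product(px, py)
            if joint.get((x, y), 0) != px[x] * py[y]]


def is_independent(joint):
    """True only if the joint factors at every cell of the grid."""
    return not failing_cells(joint)


third = Fraction(1, 3)
parabola = {(-1, 1): third, (0, 0): third, (1, 1): third}   # Y = X^2

# For contrast: two fair coins tossed independently.
coins = {(x, y): Fraction(1, 4) for x in (0, 1) for y in (0, 1)}

print("parabola independent?", is_independent(parabola))
print("cells that fail:", failing_cells(parabola))
print("coins independent?", is_independent(coins))
```

The comparison is exact because the probabilities are fractions; with floating-point probabilities you would compare within a tolerance instead.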

If you want a single number that, unlike correlation, *does* vanish only under genuine independence, there is one: mutual information. It measures how much knowing one variable reduces uncertainty about the other, and it equals zero if and only if X and Y are independent — catching curved and tangled dependence that correlation walks straight past. It is heavier machinery and lives more in information theory than in a first course, but it is the honest answer to 'is there any relationship at all?', whereas correlation only answers 'is there a straight-line one?'
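
For a small discrete pmf, mutual information is easy to compute directly from its definition, I(X; Y) = sum of p(x, y) log2 [ p(x, y) / (p_X(x) p_Y(y)) ]. The following sketch measures it in bits; the function name and the dictionary layout are assumptions of this example.

```python
import math


def mutual_information(joint):
    """I(X; Y) in bits for a joint pmf given as {(x, y): probability}."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    # Cells with zero probability contribute nothing to the sum.
    return sum(p * math.log2(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)


parabola = {(-1, 1): 1 / 3, (0, 0): 1 / 3, (1, 1): 1 / 3}     # Y = X^2
coins = {(x, y): 0.25 for x in (0, 1) for y in (0, 1)}        # independent

mi_parabola = mutual_information(parabola)
mi_coins = mutual_information(coins)

print(f"I(X; Y) for Y = X^2:       {mi_parabola:.4f} bits")
print(f"I(X; Y) for two fair coins: {mi_coins:.4f} bits")
```

For the parabola the value works out to log2(3) - 2/3, roughly 0.918 bits, which is strictly positive even though the covariance is exactly zero. For the independent coins it is zero.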

Carry one practical habit out of this guide. When variances must add cleanly — the next guide's topic — what you truly need is zero covariance, not full independence; uncorrelated is exactly the right, weaker condition there. But whenever you need to factor a joint probability, simulate a system, or claim that one variable carries no information about another, only genuine independence will do, and a passing correlation test will quietly betray you. Match the assumption to the job: never spend 'independent' when 'uncorrelated' is all you have earned.
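
The counterexample from earlier makes the first half of that habit concrete. X and Y = X^2 are dependent, yet their variances still add, because Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y) and the covariance is zero. A minimal check with exact fractions; the helper names are illustrative.

```python
from fractions import Fraction

third = Fraction(1, 3)
joint = {(-1, 1): third, (0, 0): third, (1, 1): third}   # Y = X^2


def expect(g):
    """E[g(X, Y)] under the joint pmf."""
    return sum(p * g(x, y) for (x, y), p in joint.items())


def var(g):
    """Var[g(X, Y)] = E[g^2] - (E[g])^2."""
    return expect(lambda x, y: g(x, y) ** 2) - expect(g) ** 2


var_x = var(lambda x, y: x)
var_y = var(lambda x, y: y)
var_sum = var(lambda x, y: x + y)

print("Var(X) =", var_x, " Var(Y) =", var_y)
print("Var(X + Y) =", var_sum, " Var(X) + Var(Y) =", var_x + var_y)
```

Zero covariance was all the variance formula needed. The same pair would still fail any calculation that multiplies P(X = x) by P(Y = y).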