Independence, Product Measures, and the Zero-One Law

Independence, re-read through the measure lens

Back on the first rungs of this ladder we wrote that two events are independent when P(A and B) = P(A)P(B), and that this is not the same thing as being mutually exclusive: disjoint events of positive probability are about as *dependent* as events get, since knowing one happened tells you the other did not. That definition was correct, but it floated free of any machinery. Now that the earlier guides in this rung have rebuilt a probability space as a genuine triple (Omega, F, P) with a sigma-algebra and a probability measure, we can finally see what independence *is*: a statement that a measure factorizes.

The mature definition climbs from events to sigma-algebras. Two sub-sigma-algebras G and H of F are independent when P(A and B) = P(A)P(B) holds for *every* A in G and every B in H at once — not just for a single chosen pair. Two random variables, which the third guide of this rung taught us to read as measurable functions, are then declared independent exactly when the sigma-algebras they generate are independent. This sounds like an upgrade in bureaucracy, but it is really an upgrade in honesty: it forces the factorization to hold for the whole information carried by each variable, not for one convenient event.
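To make 'every A in G and every B in H at once' concrete, here is a minimal sketch on a finite space where the check can be done exhaustively. The two-dice sample space and the helper names (`generated_sigma_algebra`, `independent`) are illustrative assumptions, not part of the earlier guides; on a finite space the sigma-algebra generated by a variable is simply every event of the form {variable in S}.

```python
from fractions import Fraction
from itertools import combinations, product

# Sample space: two fair dice under the uniform measure (an illustrative choice).
OMEGA = list(product(range(1, 7), repeat=2))

def prob(event):
    """P(event) under the uniform measure on OMEGA."""
    return Fraction(len(event), len(OMEGA))

def generated_sigma_algebra(variable):
    """All events {variable in S}, with S ranging over subsets of the variable's range."""
    values = sorted({variable(w) for w in OMEGA})
    events = []
    for r in range(len(values) + 1):
        for subset in combinations(values, r):
            events.append(frozenset(w for w in OMEGA if variable(w) in subset))
    return events

def independent(sigma_g, sigma_h):
    """Check P(A and B) = P(A) P(B) for every A in sigma_g and every B in sigma_h."""
    return all(prob(a & b) == prob(a) * prob(b) for a in sigma_g for b in sigma_h)

first = generated_sigma_algebra(lambda w: w[0])          # information in the first die
second = generated_sigma_algebra(lambda w: w[1])         # information in the second die
total = generated_sigma_algebra(lambda w: w[0] + w[1])   # information in the sum

print(independent(first, second))  # all 64 x 64 pairs of events are checked
print(independent(first, total))   # the sum carries information about the first die
```

The second call fails as soon as it meets a pair such as {first die = 1} and {sum = 2}, which is the point of the definition: one factorizing pair is not enough, and one failing pair is fatal.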

The pi-system shortcut that makes independence checkable

There is an obvious worry hiding in that definition. The sigma-algebra generated by a single continuous random variable X is enormous: it contains the event {X in B} for every Borel set B, uncountably many of them. Must we really verify P(A and B) = P(A)P(B) for all of them to call two variables independent? Mercifully, no, and the rescue is a tool the second guide of this rung introduced for exactly this kind of bootstrapping: the pi-system / lambda-system machinery.

A pi-system is just a family of sets closed under intersection — for a real random variable X, the half-lines {X <= x} form one, and they generate its whole sigma-algebra. Dynkin's pi-lambda theorem says that if two probability measures agree on a pi-system, they agree on the entire sigma-algebra it generates. Apply it here and the moral is enormous: to check X and Y are independent, it is enough to verify P(X <= x and Y <= y) = P(X <= x)P(Y <= y) for all real x and y — that is, the joint cdf factors into the product of the marginal cdfs. The unwieldy 'for every Borel event' collapses to a condition on a small generating family. This is the same spirit as the factorization criterion you met earlier, now justified rigorously rather than asserted.
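The cdf criterion can be tried out exactly on small discrete examples. The sketch below assumes a joint law given as a dictionary of point masses (the names `cdf_factorizes`, `pmf_factorizes` and both example laws are hypothetical). For finitely supported variables the cdfs are step functions, so checking the factorization at the support points is enough.

```python
from fractions import Fraction

def support(pmf):
    """Sorted x-values and y-values that appear in the joint pmf."""
    return sorted({x for x, _ in pmf}), sorted({y for _, y in pmf})

def cdf_factorizes(pmf):
    """True if F(x, y) = F_X(x) * F_Y(y) at every pair of support points."""
    xs, ys = support(pmf)

    def joint_cdf(x, y):
        return sum(p for (a, b), p in pmf.items() if a <= x and b <= y)

    # The marginal cdfs are the joint cdf with the other coordinate pushed to its maximum.
    return all(
        joint_cdf(x, y) == joint_cdf(x, ys[-1]) * joint_cdf(xs[-1], y)
        for x in xs for y in ys
    )

def pmf_factorizes(pmf):
    """True if every point mass is the product of its two marginal masses."""
    xs, ys = support(pmf)
    p_x = {x: sum(p for (a, _), p in pmf.items() if a == x) for x in xs}
    p_y = {y: sum(p for (_, b), p in pmf.items() if b == y) for y in ys}
    return all(pmf.get((x, y), 0) == p_x[x] * p_y[y] for x in xs for y in ys)

# An independent pair, built as a product of two marginals (illustrative numbers).
marg_x = {0: Fraction(1, 3), 1: Fraction(2, 3)}
marg_y = {0: Fraction(1, 4), 1: Fraction(1, 4), 2: Fraction(1, 2)}
independent_pmf = {(x, y): marg_x[x] * marg_y[y] for x in marg_x for y in marg_y}

# A maximally dependent pair: Y is a copy of a fair coin X.
dependent_pmf = {(0, 0): Fraction(1, 2), (1, 1): Fraction(1, 2)}

for name, pmf in [("independent", independent_pmf), ("dependent", dependent_pmf)]:
    print(name, cdf_factorizes(pmf), pmf_factorizes(pmf))
```

In both examples the two tests agree, which is what the pi-lambda argument predicts: the check on half-lines decides the question for the whole sigma-algebra.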

Product measures: building independence from scratch

So far we have *recognized* independence inside a given space. But where does an independent pair come from in the first place? The construction is the product measure. Given two probability spaces (Omega_1, F_1, P_1) and (Omega_2, F_2, P_2), there is a unique probability measure P_1 x P_2 on the product Omega_1 x Omega_2 whose value on a rectangle is the product of the side measures: (P_1 x P_2)(A x B) = P_1(A) P_2(B). The rectangles form a pi-system that generates the product sigma-algebra, so by the same Dynkin argument at most one measure can obey that rule; that such a measure exists at all is a separate extension step (Caratheodory's construction). Together they give one and only one measure on the whole product sigma-algebra. Independence is not assumed here; it is *manufactured*, baked into the very definition of the joint space.
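On finite spaces the whole construction fits in a few lines, which makes a useful sanity check. The sketch below assumes two small made-up spaces (`p1`, `p2`) and defines the product measure through its point masses, which are the rectangle rule applied to singletons. It then confirms the rule on every rectangle and evaluates the measure on a set that is not a rectangle.

```python
from fractions import Fraction
from itertools import chain, combinations

p1 = {"H": Fraction(1, 3), "T": Fraction(2, 3)}                  # a biased coin (illustrative)
p2 = {1: Fraction(1, 2), 2: Fraction(1, 4), 3: Fraction(1, 4)}   # a three-point space (illustrative)

# Point masses of the product measure: the rectangle rule on singletons {a} x {b}.
product_mass = {(a, b): p1[a] * p2[b] for a in p1 for b in p2}

def measure(event):
    """(P_1 x P_2)(event) for any set of pairs, rectangle or not."""
    return sum(product_mass[w] for w in event)

def subsets(space):
    """Every subset of a finite space, as tuples."""
    items = list(space)
    return chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))

def rectangle_rule_holds():
    """Check (P_1 x P_2)(A x B) = P_1(A) * P_2(B) for all A and B."""
    for A in subsets(p1):
        for B in subsets(p2):
            rectangle = {(a, b) for a in A for b in B}
            if measure(rectangle) != sum(p1[a] for a in A) * sum(p2[b] for b in B):
                return False
    return True

print(rectangle_rule_holds())
print(measure(product_mass))             # total mass of the product space
print(measure({("H", 1), ("T", 2)}))     # a non-rectangle still gets a definite value
```

The last line is the finite shadow of the uniqueness claim: once the rectangles are fixed, sets that are not rectangles have no freedom left.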

Rectangle rule:   (P_1 x P_2)(A x B) = P_1(A) * P_2(B)

Fubini / Tonelli (integrating over the product):

   E[ g(X, Y) ] = integral integral g(x, y) dP_1(x) dP_2(y)
                = integral [ integral g(x, y) dP_2(y) ] dP_1(x)

   -- swap the order of integration freely when either
      g >= 0  (Tonelli)  or  E[ |g(X, Y)| ] < infinity  (Fubini)

Consequence, for independent X, Y:

   E[ f(X) g(Y) ] = E[ f(X) ] * E[ g(Y) ]
   (expectations factor whenever E|f(X)| and E|g(Y)| are both finite, or f, g >= 0)

The product measure factors on rectangles; Fubini-Tonelli lets you integrate one variable at a time and is what makes 'expectations of independent things multiply' a theorem.
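The factorization of expectations can be checked with exact arithmetic on a finite product law. The marginals `p_x`, `p_y` and the test functions `f`, `g` below are made-up illustrations; the integrals become finite sums, so both iterated orders and the product of the two means can be compared directly.

```python
from fractions import Fraction

# Marginal laws of two independent variables (illustrative numbers).
p_x = {-1: Fraction(1, 4), 0: Fraction(1, 4), 2: Fraction(1, 2)}
p_y = {0: Fraction(1, 3), 3: Fraction(2, 3)}

def f(x):
    return x * x

def g(y):
    return y - 1   # takes both signs, so this is a Fubini-style (signed) integrand

# Iterated sums against the product law, in both orders.
y_inner = sum(p_x[x] * sum(p_y[y] * f(x) * g(y) for y in p_y) for x in p_x)
x_inner = sum(p_y[y] * sum(p_x[x] * f(x) * g(y) for x in p_x) for y in p_y)

# Product of the two separate expectations.
product_of_means = sum(p_x[x] * f(x) for x in p_x) * sum(p_y[y] * g(y) for y in p_y)

print(y_inner, x_inner, product_of_means)   # 9/4 9/4 9/4

# Contrast: with Y = X (fully dependent) the factorization fails.
mean_x = sum(p_x[x] * x for x in p_x)
mean_x_squared = sum(p_x[x] * x * x for x in p_x)
print(mean_x_squared, mean_x * mean_x)      # 9/4 9/16
```

With finitely many values every sum is automatically finite, so the integrability hypothesis is invisible here; it only starts to bite on infinite spaces.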

The companion to the product measure is the Fubini-Tonelli theorem, and it is exactly why integrals over a product can be done one coordinate at a time. Recall from the fourth guide of this rung that expectation *is* the Lebesgue integral against P; on a product space that integral splits into an inner and an outer integral, in either order. Two honest fences guard the swap. Tonelli lets you exchange the order freely when the integrand is non-negative; Fubini permits it for a signed or complex integrand only after you have checked the double integral of its absolute value is finite. Skip that finiteness check and you can get two different 'answers' from the two orders — the swap is a theorem with hypotheses, not a free move.
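A standard counterexample shows what goes wrong when the finiteness check is skipped. It is stated here as an assumption-laden sketch: it uses counting measure on pairs of non-negative integers rather than a probability measure, so the 'double integral' is a double sum. The array `a` is +1 on the diagonal and -1 one step to its right. Each row and each column has at most two nonzero entries, so the inner sums below are exact; only the outer sum is cut off at an arbitrary `N`.

```python
def a(m, n):
    """Signed array: +1 on the diagonal, -1 just to the right of it, 0 elsewhere."""
    if n == m:
        return 1
    if n == m + 1:
        return -1
    return 0

def sum_over_n(m):
    """Inner sum over n for fixed m: the only nonzero entries are at n = m and n = m + 1."""
    return a(m, m) + a(m, m + 1)

def sum_over_m(n):
    """Inner sum over m for fixed n: the only nonzero entries are at m = n and m = n - 1."""
    return a(n, n) + (a(n - 1, n) if n >= 1 else 0)

N = 1000  # number of outer terms added up (arbitrary cutoff)

n_first = sum(sum_over_n(m) for m in range(N))   # sum over n, then over m
m_first = sum(sum_over_m(n) for n in range(N))   # sum over m, then over n
abs_partial = sum(abs(a(m, n)) for m in range(N) for n in (m, m + 1))

print(n_first)      # 0
print(m_first)      # 1
print(abs_partial)  # 2000, and it grows like 2N
```

The two orders disagree for every cutoff, and the partial sums of the absolute values grow without bound, which is exactly the hypothesis Fubini demands and this array violates.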

One product makes a pair; to do real probability we need an *infinite* sequence of independent variables — an unending stream of coin flips, say. That is delivered by the Kolmogorov extension theorem, which stitches together a consistent family of finite-dimensional product distributions into a single measure on the space of infinite sequences. It is what guarantees that the phrase 'let X_1, X_2, X_3, ... be i.i.d.' actually refers to something — a bona fide probability space exists on which all of them live at once.

Tail events and the zero-one law

Now the prize. With an infinite independent sequence X_1, X_2, ... in hand, some events depend only on the *long-run, asymptotic* behavior of the sequence and not on any finite chunk of its beginning. Formally these are the tail events: the tail sigma-algebra is the intersection over all n of sigma(X_{n+1}, X_{n+2}, ...), so an event is a tail event if, for every n, it can be decided from X_{n+1}, X_{n+2}, ... alone and is therefore unchanged by altering the first n terms. Examples: 'the series sum of X_k converges', 'X_n exceeds 100 infinitely often', 'the running average X-bar_n converges to a limit'. Each of these is decided by the infinite future and shrugs off any finite prefix you scribble over.
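A simulation cannot decide a tail event, since it only ever sees a finite stretch, but it can show how a finite prefix loses its grip on the running average. The sketch below is an illustration under made-up choices (uniform draws, a prefix of 50 terms overwritten with the value 100, a fixed seed). The gap between the two running averages at time N is at most n * 100 / N, which goes to 0 as N grows.

```python
import random

random.seed(0)      # fixed seed so the run is repeatable
N = 100_000         # length of the simulated stretch (arbitrary)
n = 50              # length of the prefix we overwrite (arbitrary)

original = [random.random() for _ in range(N)]   # i.i.d. uniform draws on [0, 1)
altered = [100.0] * n + original[n:]             # scribble over the first n terms

def running_average(seq, upto):
    """Average of the first `upto` terms of seq."""
    return sum(seq[:upto]) / upto

def gap(upto):
    """Absolute difference between the two running averages at time `upto`."""
    return abs(running_average(altered, upto) - running_average(original, upto))

for upto in (1_000, 10_000, 100_000):
    print(upto, gap(upto))
```

Both sequences share every term after the prefix, so whether the average converges, and to what, is the same question for each of them.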

Kolmogorov's zero-one law makes a startling claim about such events: for an independent sequence, every tail event has probability exactly 0 or exactly 1, never anything strictly in between. There is no tail event with probability 0.5. Asymptotic questions about independent sequences are not gambles at all; they are settled, almost-sure facts that merely happen to be dressed in probabilistic language. Either the series converges with probability 1 or it diverges with probability 1; either the average has a limit almost surely or it almost surely does not.

Why is it true? The proof is a small marvel of the structure we just built, and it leans on independence twice. A tail event T is decided by X_{n+1}, X_{n+2}, ... alone, so it is independent of the sigma-algebra of (X_1, ..., X_n) for *every* n. Those finite sigma-algebras grow with n, their union is a pi-system, and that union generates the sigma-algebra of the whole sequence, so by the pi-lambda theorem T ends up independent of *everything* the sequence can express, including itself. An event independent of itself satisfies P(T) = P(T and T) = P(T)P(T), so P(T) = P(T)^2, whose only solutions are P(T) = 0 and P(T) = 1.

Fix a tail event T. For every n it belongs to sigma(X_{n+1}, X_{n+2}, ...), which is independent of the sigma-algebra generated by (X_1, ..., X_n) because the X_k are independent. So T is independent of that finite-prefix sigma-algebra, for every n.
The union of these finite-prefix sigma-algebras is a pi-system (they are nested, so any two of its sets lie in a common one) whose generated sigma-algebra is the entire sequence's sigma-algebra. By the pi-lambda theorem, independence extends from the pi-system to that whole sigma-algebra.
But T itself lives inside that whole sigma-algebra. So T is independent of a sigma-algebra that contains T — in particular T is independent of itself.
Self-independence forces P(T) = P(T and T) = P(T)^2, and x = x^2 has only the roots 0 and 1. Hence every tail event has probability 0 or 1.

What the zero-one law buys you

The law is not a curiosity; it changes how you reason. Suppose you want to know whether the running average of an i.i.d. sequence converges. The event 'X-bar_n converges' is a tail event, so before computing anything you already know its probability is 0 or 1; there is no halfway. That reduces a hard question to deciding *which* of the two it is, which is far easier than estimating a number in between. Combined with the earlier limit theorems, this dichotomy frames the strong law of large numbers: the zero-one law rules out every value strictly between 0 and 1, and the strong law (for a finite mean mu) settles which one it is, so the average converges to mu not 'with some probability' but with probability exactly 1, almost surely.

A close cousin sharpens the picture further. The Borel-Cantelli lemmas decide the tail event 'A_n happens infinitely often'. The first lemma says that if the probabilities sum to a finite number, sum of P(A_n) < infinity, then almost surely only finitely many A_n occur — no independence needed. The second says that *if the A_n are independent* and the probabilities sum to infinity, then almost surely infinitely many occur. Together they pin a sharp 0-or-1 verdict on 'infinitely often', echoing the zero-one law: a divergent sum of independent-event probabilities forces the event to recur forever; a convergent one forces it to stop.
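The contrast between the two lemmas can be glimpsed, though not proved, in a finite simulation. The sketch below assumes independent uniform draws U_n and two hypothetical families of events built from them: A_n = {U_n < 1/n^2}, whose probabilities are summable, and B_n = {U_n < 1/n}, whose probabilities are not. A finite run cannot show 'infinitely often' either way; it can only show the summable family drying up early while the other keeps turning up.

```python
import random

random.seed(1)   # fixed seed so the run is repeatable
N = 100_000      # number of trials simulated (arbitrary)

uniforms = [random.random() for _ in range(N)]   # U_1, ..., U_N, independent

# Indices n (starting at 1) at which each event occurs.
hits_summable = [n for n, u in enumerate(uniforms, start=1) if u < 1 / n**2]
hits_divergent = [n for n, u in enumerate(uniforms, start=1) if u < 1 / n]

# Expected number of occurrences up to N: bounded in one case, growing like log N in the other.
expected_summable = sum(1 / n**2 for n in range(1, N + 1))
expected_divergent = sum(1 / n for n in range(1, N + 1))

print("summable :", len(hits_summable), "hits, last at", hits_summable[-1],
      "| expected count", round(expected_summable, 3))
print("divergent:", len(hits_divergent), "hits, last at", hits_divergent[-1],
      "| expected count", round(expected_divergent, 3))
```

Raising `N` leaves the expected count in the summable case below pi^2 / 6, while the divergent case keeps growing, which is the quantitative face of 'a convergent sum forces it to stop'.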