Independence of Random Variables

From independent events to independent variables

Two rungs ago you met independent events: A and B are independent when P(A and B) = P(A) * P(B), which is just a tidy way of saying that learning that B happened leaves the chance of A untouched. This guide lifts that same idea from single events up to whole random variables. The previous guide showed how a joint distribution packages two variables X and Y together and how each one's marginal distribution is recovered by summing or integrating the other away. Independence of random variables asks the natural follow-up question: when does the pair carry no extra information beyond the two pieces on their own?

The clean definition says X and Y are independent if, for every pair of values (or ranges) x and y, the joint probability splits into a product of the marginals: P(X = x and Y = y) = P(X = x) * P(Y = y) in the discrete case. Picture rolling a red die and a blue die. The chance the red shows 3 AND the blue shows 5 is just (1/6) * (1/6) = 1/36, because the blue die has never heard of the red one. That factorization, holding for every cell at once, is the whole content of independence.

The factorization criterion: split the joint into a product

The most useful working test is the factorization criterion: X and Y are independent exactly when their joint pmf or pdf factors as a product of a function of x alone times a function of y alone, across the whole plane. For a joint pmf, p(x, y) = p_X(x) * p_Y(y) for all x, y; for a joint density, f(x, y) = f_X(x) * f_Y(y) for all x, y. The point is that nothing in the joint description ties the two variables together — the formula for X never mentions Y, and vice versa.

Two honest cautions make this test trustworthy. First, "for all x and y" is non-negotiable: the product rule must hold in every cell, not just on average or for one convenient pair. If even one cell breaks it, the variables are dependent. Second, watch the support, meaning the set of (x, y) where the joint is positive. If that region is not a rectangle (more precisely, not a set of allowed x-values crossed with a set of allowed y-values), as when the pair can only take values with X <= Y, then knowing X already constrains Y, so independence fails no matter how nice the formula looks elsewhere. A non-rectangular support is an instant tell of dependence.
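
The support caution can be turned into a quick screen. The sketch below is only an illustration: the helper name support_is_rectangle and the two small pmfs (triangle and grid) are made up for this example, and the joint pmf is assumed to be stored as a dict from (x, y) to a probability.

```python
from itertools import product

def support_is_rectangle(joint):
    """True if the set of cells with positive probability is a full grid
    (every observed x paired with every observed y)."""
    support = {xy for xy, p in joint.items() if p > 0}
    xs = {x for x, _ in support}
    ys = {y for _, y in support}
    return support == set(product(xs, ys))

# Hypothetical pmf: uniform over the pairs with x <= y on {1, 2, 3}.
triangle = {(x, y): 1 / 6 for x in (1, 2, 3) for y in (1, 2, 3) if x <= y}

# Hypothetical pmf: uniform over the full 3-by-3 grid.
grid = {(x, y): 1 / 9 for x in (1, 2, 3) for y in (1, 2, 3)}

print(support_is_rectangle(triangle))  # False
print(support_is_rectangle(grid))      # True
```

A False here settles the question: the variables are dependent. A True settles nothing by itself, because a rectangular support is necessary for independence but not sufficient; the cell-by-cell product test still has to pass.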

Joint pmf p(x, y) for X = red die hi/lo, Y = blue die parity

            Y = even   Y = odd  |  row sum p_X(x)
  X = low      1/4       1/4    |     1/2
  X = high     1/4       1/4    |     1/2
  ------------------------------+------------------
  col p_Y(y)   1/2       1/2    |      1

  Test factorization in each cell:
    p(low, even)  = 1/4 ?= p_X(low) * p_Y(even) = 1/2 * 1/2 = 1/4   OK
    every other cell checks the same way                            OK
  => X and Y are INDEPENDENT (each cell = row sum * col sum)

  Contrast (dependent):  if p(low, even) were 1/2 and p(high, even) were 0
  (same marginals), then 1/2 != p_X(low)*p_Y(even) = 1/4, so one broken
  cell => DEPENDENT.

Independence shows up as every inner cell equaling its row total times its column total; a single cell that disobeys breaks it.
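
The cell-by-cell test in the table can be sketched in a few lines. This is an illustration, not a library routine: the helper names marginals and is_independent are invented here, exact fractions are used so that equality checks are safe, and the contrast table fills in the two cells the text leaves unstated (0 and 1/2) on the assumption that the marginals stay at 1/2 each.

```python
from fractions import Fraction as F

def marginals(joint):
    """Row sums p_X and column sums p_Y of a joint pmf stored as {(x, y): p}."""
    p_x, p_y = {}, {}
    for (x, y), p in joint.items():
        p_x[x] = p_x.get(x, 0) + p
        p_y[y] = p_y.get(y, 0) + p
    return p_x, p_y

def is_independent(joint):
    """True only if EVERY cell equals its row sum times its column sum."""
    p_x, p_y = marginals(joint)
    return all(joint.get((x, y), 0) == p_x[x] * p_y[y]
               for x in p_x for y in p_y)

# The table from the text: every cell is 1/4.
table = {("low", "even"): F(1, 4), ("low", "odd"): F(1, 4),
         ("high", "even"): F(1, 4), ("high", "odd"): F(1, 4)}

# The contrast case, completed under the assumption of unchanged marginals.
contrast = {("low", "even"): F(1, 2), ("low", "odd"): F(0),
            ("high", "even"): F(0), ("high", "odd"): F(1, 2)}

print(is_independent(table))     # True
print(is_independent(contrast))  # False
```

Both tables have the same marginals (1/2 everywhere), which is a useful reminder that the marginals alone cannot tell you whether two variables are independent.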

Independence as conditional indifference

There is a second, more intuitive face of the same fact, built from the conditional distributions you met in the last guide. Recall that a conditional pmf is p(y given x) = p(x, y) / p_X(x), defined wherever p_X(x) > 0: the slice of the joint at a fixed x, renormalized to sum to 1. If X and Y are independent, then p(y given x) = [p_X(x) * p_Y(y)] / p_X(x) = p_Y(y). In words: conditioning on X changes nothing. Every slice of the joint, no matter which x you fix, has the very same shape: the plain marginal of Y.

This is the deepest way to feel independence: it is conditional indifference. Tell me X is 3, or X is 100, or X is anything at all, and my best guess for the distribution of Y does not budge. That is exactly why independence makes prediction pointless across the pair — there is no leverage, no information to borrow. The moment the slices start to differ as you slide x, you have dependence, and that difference is precisely the information one variable carries about the other.
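
Conditional indifference is easy to see numerically. In the sketch below, the marginals p_X and p_Y, the two joint tables, and the helper conditional_of_y are all hypothetical, picked only so the arithmetic stays small; the dependent table was built to have the same marginals as the independent one.

```python
from fractions import Fraction as F

def conditional_of_y(joint, x):
    """Slice the joint at a fixed x and renormalize:
    p(y given x) = p(x, y) / p_X(x). Assumes p_X(x) > 0."""
    p_x = sum(p for (xi, _), p in joint.items() if xi == x)
    return {y: p / p_x for (xi, y), p in joint.items() if xi == x}

# Hypothetical marginals, chosen only for illustration.
p_X = {1: F(1, 5), 2: F(4, 5)}
p_Y = {"a": F(1, 3), "b": F(2, 3)}

# Independent pair: every cell is the product of its marginals.
independent_joint = {(x, y): p_X[x] * p_Y[y] for x in p_X for y in p_Y}

# Dependent pair with the SAME marginals but different cells.
dependent_joint = {(1, "a"): F(1, 5), (1, "b"): F(0),
                   (2, "a"): F(2, 15), (2, "b"): F(2, 3)}

for x in p_X:
    print(x, conditional_of_y(independent_joint, x) == p_Y)  # True for both x

print(conditional_of_y(dependent_joint, 1)
      == conditional_of_y(dependent_joint, 2))               # False
```

In the independent table every slice is the marginal of Y; in the dependent one the slice at x = 1 puts all its weight on "a" while the slice at x = 2 does not, and that difference is the information X carries about Y.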

Why independence is the workhorse assumption

Independence is prized because it makes hard joint problems collapse into easy one-variable problems. The clearest payoff is for sums: when X and Y are independent, the distribution of their sum is found by a convolution — you slide one distribution across the other and add up the overlapping probability. The Bernoulli atoms of the previous rung become a binomial precisely because each trial is independent of the rest; you literally add independent copies. Without independence, you would need the full joint table, which is far larger and rarely known.
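
Here is a minimal sketch of that convolution for discrete variables, assuming each pmf is stored as a dict from value to probability. The function name convolve is made up for this example. It is applied to two fair dice and then to three fair Bernoulli trials, which should reproduce the binomial probabilities for n = 3, p = 1/2.

```python
from collections import defaultdict
from fractions import Fraction as F

def convolve(p_x, p_y):
    """pmf of X + Y for INDEPENDENT X and Y: each pair (x, y) contributes
    p_X(x) * p_Y(y) to the total at x + y."""
    p_sum = defaultdict(F)  # F() is the fraction 0
    for x, px in p_x.items():
        for y, py in p_y.items():
            p_sum[x + y] += px * py
    return dict(p_sum)

die = {k: F(1, 6) for k in range(1, 7)}
two_dice = convolve(die, die)
print(two_dice[7])   # 1/6
print(two_dice[2])   # 1/36

bernoulli = {0: F(1, 2), 1: F(1, 2)}
three_trials = convolve(convolve(bernoulli, bernoulli), bernoulli)
print(three_trials[1])  # 3/8
```

The product px * py inside the loop is where independence is used. For dependent variables that product would have to be replaced by the actual joint probability of the pair, which is exactly the full table the text says you rarely have.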

Independence also has a precious algebraic gift for expectations. Linearity, E[aX + bY] = a E[X] + b E[Y], holds for ANY variables, dependent or not; that one never needs independence. The multiplicative rule E[XY] = E[X] * E[Y] is different: independence guarantees it, but dependence can break it. (Strictly, the rule holds exactly when X and Y are uncorrelated, which is a weaker condition than independence.) That single factorization of the mean of a product is the seed from which covariance, correlation, and the variance of a sum all grow in the next guides, and it is not something you may assume once the variables depend on each other.
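
Both expectation rules can be checked exactly on a small case. The sketch below uses two hypothetical joint pmfs: two independent fair dice, and a fully dependent pair in which Y is simply a copy of X. The helper names expect and product_gap are invented for this illustration.

```python
from fractions import Fraction as F

faces = range(1, 7)

# Two independent fair dice: every one of the 36 cells has probability 1/36.
joint_indep = {(x, y): F(1, 36) for x in faces for y in faces}

# A fully dependent pair: Y is a copy of X, so only the diagonal has mass.
joint_copy = {(x, x): F(1, 6) for x in faces}

def expect(joint, g):
    """E[g(X, Y)] under a joint pmf stored as {(x, y): p}."""
    return sum(p * g(x, y) for (x, y), p in joint.items())

def product_gap(joint):
    """E[XY] - E[X] E[Y]; zero when the multiplicative rule holds."""
    return (expect(joint, lambda x, y: x * y)
            - expect(joint, lambda x, y: x) * expect(joint, lambda x, y: y))

# Linearity holds in BOTH cases: E[2X + 3Y] = 2 * 7/2 + 3 * 7/2 = 35/2.
print(expect(joint_indep, lambda x, y: 2 * x + 3 * y))  # 35/2
print(expect(joint_copy, lambda x, y: 2 * x + 3 * y))   # 35/2

# The product rule holds for the independent dice and fails for the copy.
print(product_gap(joint_indep))  # 0
print(product_gap(joint_copy))   # 35/12
```

The gap of 35/12 in the copied case is the variance of one fair die, a first glimpse of how this quantity turns into covariance.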

A common pattern names this directly: a sequence of variables that are independent and identically distributed, or i.i.d. — independent of one another and all sharing the same distribution. Repeated coin tosses, repeated measurements under fixed conditions, a clean random sample: these are the i.i.d. settings on which the law of large numbers and the central limit theorem (coming in a later rung) are built. The independence half is what lets the average settle down; the identically-distributed half is what gives it a single target to settle toward.

Traps, limits, and honest fine print

First trap, the most famous: independent trials have no memory. The gambler's fallacy is the belief that after five reds at roulette, black is "due." It is not. If the spins are independent, the wheel cannot remember the past, so the next spin is exactly as likely to be red as ever. Independence forbids any such pull toward balance on the next trial. The long-run frequencies do steady out, but not because individual results compensate for one another: new independent trials dilute the early imbalance rather than erase it.
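
The no-memory claim can be checked by brute-force counting rather than simulation. The sketch below makes two simplifying assumptions that are not in the text: the wheel is treated as a fair red/black coin (the green pocket is ignored), and six independent spins are modeled by making all 64 red/black sequences equally likely.

```python
from fractions import Fraction as F
from itertools import product

# All sequences of six spins; independence plus fairness makes them equally likely.
spins = list(product("RB", repeat=6))

# Keep only the sequences that open with five reds in a row.
after_five_reds = [s for s in spins if s[:5] == ("R",) * 5]

# Chance of red on spin six, given five reds so far.
p_red_given_streak = F(sum(s[5] == "R" for s in after_five_reds),
                       len(after_five_reds))

# Chance of red on spin six with no information about the past.
p_red_plain = F(sum(s[5] == "R" for s in spins), len(spins))

print(p_red_given_streak)  # 1/2
print(p_red_plain)         # 1/2
```

The two numbers agree, which is conditional indifference again: the streak is information about the past, and under independence the past says nothing about the next spin.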

Second trap, a forward warning: independence is strictly stronger than being uncorrelated. Independent variables are always uncorrelated (whenever their correlation is defined), but the reverse can fail: there exist variables that are uncorrelated yet still dependent, because correlation only sees straight-line association while dependence can hide in curves. This gap between zero correlation and independence is important enough to get its own guide later in this rung; for now just hold the one-way arrow: independent implies uncorrelated, but uncorrelated does not imply independent.
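
A standard small example makes the one-way arrow concrete. In this sketch, chosen for illustration, X is uniform on {-1, 0, 1} and Y = X squared, so Y is completely determined by X and yet the covariance comes out to zero.

```python
from fractions import Fraction as F

# X uniform on {-1, 0, 1}, Y = X**2: a curved, fully deterministic link.
joint = {(x, x * x): F(1, 3) for x in (-1, 0, 1)}

def expect(g):
    """E[g(X, Y)] under the joint pmf above."""
    return sum(p * g(x, y) for (x, y), p in joint.items())

covariance = (expect(lambda x, y: x * y)
              - expect(lambda x, y: x) * expect(lambda x, y: y))

# One cell is enough to break factorization: compare p(0, 0) with p_X(0) * p_Y(0).
p_x0 = sum(p for (x, _), p in joint.items() if x == 0)
p_y0 = sum(p for (_, y), p in joint.items() if y == 0)
p_00 = joint[(0, 0)]

print(covariance)          # 0
print(p_00, p_x0 * p_y0)   # 1/3 1/9
```

Zero covariance means the pair is uncorrelated, but the cell (0, 0) has probability 1/3 rather than 1/9, so the factorization test fails and the variables are dependent.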

A final piece of honest fine print, important once you handle three or more variables: pairwise independence is weaker than mutual independence. Three variables can be independent in every pair yet still jointly constrained — knowing two of them can pin down the third even though no single pair shows any link. Full (mutual) independence demands that the joint distribution of ALL of them factor at once, not merely two at a time. So when a problem says "independent," check whether it means each pair or the whole collection; the difference quietly matters.
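
The classic illustration uses two fair bits and their exclusive-or. In the sketch below, X and Y are independent fair bits and Z = X XOR Y; the helper names marginal, pair_independent, and mutually_independent are made up for this example.

```python
from fractions import Fraction as F
from itertools import product

# Joint pmf of (X, Y, Z) with Z = X XOR Y: four outcomes, each with probability 1/4.
xor_joint = {(x, y, x ^ y): F(1, 4) for x, y in product((0, 1), repeat=2)}

def marginal(joint, keep):
    """Sum out every coordinate except the positions listed in `keep`."""
    out = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in keep)
        out[key] = out.get(key, 0) + p
    return out

def pair_independent(joint, i, j):
    """Factorization test for coordinates i and j only."""
    p_ij = marginal(joint, (i, j))
    p_i, p_j = marginal(joint, (i,)), marginal(joint, (j,))
    return all(p_ij.get((a, b), 0) == p_i[(a,)] * p_j[(b,)]
               for (a,) in p_i for (b,) in p_j)

def mutually_independent(joint):
    """Factorization test for all three coordinates at once."""
    m = [marginal(joint, (i,)) for i in range(3)]
    return all(joint.get((a, b, c), 0) == m[0][(a,)] * m[1][(b,)] * m[2][(c,)]
               for a, b, c in product((0, 1), repeat=3))

print([pair_independent(xor_joint, i, j)
       for i, j in ((0, 1), (0, 2), (1, 2))])  # [True, True, True]
print(mutually_independent(xor_joint))          # False
```

Every pair passes, yet the triple fails: the outcome (0, 0, 0) has probability 1/4, while the product of the three marginals is 1/8. Knowing any two of the bits pins down the third.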