The Law of the Unconscious Statistician

The averaging problem one rung up

Guide 1 of this rung built expectation as a long-run average: a weighted average of the values a random variable can take, each weighted by its probability. For a discrete X you summed value times probability, E[X] = sum of x times P(X = x); for a continuous one you integrated x against the density f, [[expectation-continuous-case|E[X] = integral of x f(x) dx]]. That answers "what is the average of X?" But almost nothing interesting asks for the average of X itself. We want the average of X^2 to measure spread, the average of e^(tX) to build a moment generating function, the average of a payoff g(X). The question quietly changed: not "average X", but "average some function of X".

Here is the obvious-but-painful route. If Y = g(X), then Y is itself a random variable, so by the definition of expectation E[Y] = sum of y times P(Y = y). To use that you must first work out the whole distribution of Y — figure out which y-values Y can hit and with what probabilities, which can mean inverting g, tracking overlaps where several x map to one y, and (for continuous variables) a change-of-variables with a Jacobian. That is a real chunk of work, and you only wanted a single number at the end. There has to be a shortcut.

LOTUS: average without re-deriving

The shortcut is the Law of the Unconscious Statistician, or LOTUS. It says: to average g(X), do not touch the distribution of Y at all. Keep X's own probabilities and just apply g to each value before averaging. For a discrete X, E[g(X)] = sum of g(x) times P(X = x); for a continuous X, E[g(X)] = integral of g(x) f(x) dx. You reweight the function's outputs by X's original weights — nothing about Y's distribution is ever needed.
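Why the discrete formula is legitimate can be sketched in a few lines. The derivation below is a standard argument rather than something taken from this guide: it starts from the definition of E[Y], splits each P(Y = y) into the x-values that land on y, and regroups. It assumes the sum converges absolutely so that the regrouping is allowed.

```latex
\begin{align*}
E[g(X)] = E[Y] &= \sum_{y} y \, P(Y = y) \\
               &= \sum_{y} y \sum_{x :\, g(x) = y} P(X = x) \\
               &= \sum_{y} \sum_{x :\, g(x) = y} g(x) \, P(X = x) \\
               &= \sum_{x} g(x) \, P(X = x).
\end{align*}
```

The third line uses y = g(x) inside the inner sum; the last line holds because every x belongs to exactly one group.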

Watch the laziness pay off. Let X be a fair die, values 1 to 6 each with probability 1/6, and ask for E[X^2]. The painful route first lists the distribution of Y = X^2 (the values 1, 4, 9, 16, 25, 36, each with probability 1/6) and only then averages. LOTUS skips that bookkeeping and reads straight off X: E[X^2] = (1/6)(1 + 4 + 9 + 16 + 25 + 36) = 91/6 ~ 15.17. Same answer, but you never paused to think about "the distribution of X^2". Here g is one-to-one on the die's values, so the saving is modest. For a function that is not one-to-one the savings are dramatic, because LOTUS never makes you untangle which x's collide onto the same y.
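To see the collision case concretely, here is a minimal Python sketch. The variable X, uniform on {-2, -1, 0, 1, 2}, is a hypothetical example chosen for illustration (it is not from this guide), and the names pmf_x, pmf_y and g are likewise made up. The code computes E[X^2] twice: once by building the distribution of Y = X^2, once by LOTUS.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical example: X uniform on {-2, -1, 0, 1, 2}, and g(x) = x^2.
pmf_x = {x: Fraction(1, 5) for x in range(-2, 3)}


def g(x):
    return x * x


# Painful route: build the pmf of Y = g(X), merging x-values that collide.
pmf_y = defaultdict(Fraction)  # Fraction() is 0, so missing keys start at 0
for x, p in pmf_x.items():
    pmf_y[g(x)] += p
e_via_y = sum(y * p for y, p in pmf_y.items())

# LOTUS: weight g(x) by X's own probabilities; Y's pmf is never built.
e_via_lotus = sum(g(x) * p for x, p in pmf_x.items())

print(dict(pmf_y))           # Y takes 0, 1, 4 with probabilities 1/5, 2/5, 2/5
print(e_via_y, e_via_lotus)  # both routes give 2
```

The first route has to notice that -2 and 2 both land on 4, and that -1 and 1 both land on 1. The LOTUS line never needs to know.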

LOTUS builds variance and the moments

LOTUS is not a party trick — it is the machinery under the rest of this rung. The moments of X are just expectations of powers: the k-th moment is E[X^k], computed by LOTUS with g(x) = x^k. The first moment is the mean. The second moment E[X^2] feeds straight into spread. And the moment generating function, the next guide's star, is E[e^(tX)] — another LOTUS expectation, this time with g(x) = e^(tx). Every one of these is the same move: pick the function, reweight by X's probabilities, sum or integrate.
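The "same move" can be made literal in code. The sketch below is an illustration for the fair die only; the helper names expect and mgf are assumptions introduced here. One function carries out the LOTUS sum, and the moments and the moment generating function come from it by swapping g.

```python
import math
from fractions import Fraction

# Fair die: values 1..6, each with probability 1/6.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}


def expect(g, pmf):
    """LOTUS for a discrete variable: sum of g(x) * P(X = x)."""
    return sum(g(x) * p for x, p in pmf.items())


# k-th moments: g(x) = x^k, kept exact with Fractions.
moments = {k: expect(lambda x, k=k: x**k, pmf) for k in (1, 2, 3)}


def mgf(t):
    """Moment generating function at t: g(x) = e^(t x), returned as a float."""
    return expect(lambda x: math.exp(t * x), pmf)


print(moments[1], moments[2], moments[3])  # 7/2 91/6 147/2

# A central-difference slope of the mgf at t = 0 should sit close to the mean.
h = 1e-5
slope_at_zero = (mgf(h) - mgf(-h)) / (2 * h)
print(slope_at_zero)
```

Only the function handed to expect changes between the three uses; the weights stay those of X throughout.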

Take the most important example: variance, the average squared distance from the mean. Its definition is itself a LOTUS expectation, Var(X) = E[(X - mu)^2] with mu = E[X] and g(x) = (x - mu)^2. Computing it from the definition means averaging (x - mu)^2 over X's values. But LOTUS plus the linearity you will meet in guide 3 hands you the famous shortcut [[variance-computational-formula|Var(X) = E[X^2] - (E[X])^2]]: expand the square, average term by term, and the cross term collapses. So variance needs only two LOTUS averages — E[X] and E[X^2] — and a subtraction.
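Written out, the expansion runs as follows. It uses only linearity of expectation and the fact that mu = E[X] is a constant.

```latex
\begin{align*}
\operatorname{Var}(X) &= E\big[(X - \mu)^2\big] \\
                      &= E\big[X^2 - 2\mu X + \mu^2\big] \\
                      &= E[X^2] - 2\mu\,E[X] + \mu^2 \\
                      &= E[X^2] - 2\mu^2 + \mu^2 \\
                      &= E[X^2] - (E[X])^2.
\end{align*}
```

The cross term contributes -2 mu^2, which combines with the +mu^2 from the constant to leave a single -mu^2.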

Variance of the fair die via LOTUS, in two averages:

  E[X]   = (1/6)(1+2+3+4+5+6)        = 21/6 = 3.5
  E[X^2] = (1/6)(1+4+9+16+25+36)     = 91/6 ~ 15.1667    (LOTUS, g(x)=x^2)

  Var(X) = E[X^2] - (E[X])^2
         = 91/6 - (3.5)^2
         = 15.1667 - 12.25
         = 2.9167                    (= 35/12)

  SD(X)  = sqrt(2.9167) ~ 1.71

Two LOTUS averages plus a subtraction give the variance — no distribution of X^2 ever constructed.
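The same arithmetic can be checked in a few lines of Python. This is a sketch for the fair die only, with exact fractions to avoid rounding. It also computes the variance directly from the definition to confirm that the two routes agree.

```python
import math
from fractions import Fraction

# Fair die: values 1..6, each with probability 1/6.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

mean = sum(x * p for x, p in pmf.items())              # E[X]   = 7/2
second_moment = sum(x**2 * p for x, p in pmf.items())  # E[X^2] = 91/6

# Shortcut: two LOTUS averages and a subtraction.
var_shortcut = second_moment - mean**2

# Definition: LOTUS with g(x) = (x - mu)^2.
var_definition = sum((x - mean) ** 2 * p for x, p in pmf.items())

sd = math.sqrt(var_shortcut)

print(mean, second_moment)           # 7/2 91/6
print(var_shortcut, var_definition)  # 35/12 35/12
print(round(sd, 2))                  # 1.71
```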

A bend in the road: Jensen's inequality

LOTUS makes one fact unmistakable, and it trips up nearly everyone the first time: averaging and applying a function do not commute. E[g(X)] is usually not g(E[X]). The mean of the squares is not the square of the mean — indeed the variance formula above is exactly the gap E[X^2] - (E[X])^2, which is positive whenever X has any spread at all. Plug a random variable into a curved function and the average output drifts away from "the function of the average input". This is not an error in the arithmetic; it is a structural feature of curvature.

Jensen's inequality turns this drift into a precise, signed rule. If g is convex (curving upward, like x^2 or e^x), then E[g(X)] >= g(E[X]). If g is concave (curving downward, like ln x or sqrt x on positive values), the inequality flips: E[g(X)] <= g(E[X]). Equality holds when X is a constant or when g is a straight line over the values X actually takes; for a strictly curved g it holds only when X is constant. So any genuine randomness through any genuine curve creates a definite gap in a known direction. The picture is a smile-shaped curve: average the inputs and you sit at one point on the curve; average the outputs and the curve's bowl pulls you upward, above that point.
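A quick numerical check makes the direction of the gap visible. The sketch below uses the fair die as X and four functions picked for illustration; the names expect and checks are assumptions introduced here. For each function it compares E[g(X)], computed by LOTUS, with g(E[X]).

```python
import math

values = range(1, 7)  # fair die
p = 1 / 6


def expect(g):
    """LOTUS: average g over the die's values with the die's own weights."""
    return sum(g(x) * p for x in values)


mean = expect(lambda x: x)

# Each entry: (shape of g, E[g(X)], g(E[X])).
checks = {
    "x^2": ("convex", expect(lambda x: x**2), mean**2),
    "exp": ("convex", expect(math.exp), math.exp(mean)),
    "ln": ("concave", expect(math.log), math.log(mean)),
    "sqrt": ("concave", expect(math.sqrt), math.sqrt(mean)),
}

for name, (shape, avg_of_g, g_of_avg) in checks.items():
    side = ">=" if avg_of_g >= g_of_avg else "<="
    print(f"{name:5s} {shape:8s} E[g(X)] = {avg_of_g:9.4f} {side} g(E[X]) = {g_of_avg:9.4f}")
```

The convex rows come out with E[g(X)] on the larger side and the concave rows on the smaller side, as Jensen predicts.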

Using LOTUS without slipping

In practice LOTUS reduces almost any expectation to a fixed recipe. The art is only in choosing g and remembering you weight by X's original probabilities, never by anything about g(X).

Name the function. Write the quantity as E[g(X)] and read off g — for variance g(x) = (x - mu)^2, for the k-th moment g(x) = x^k, for the mgf g(x) = e^(tx).
Keep X's own weights. Use P(X = x) for a discrete X or the density f(x) for a continuous X. Do not derive the distribution of Y = g(X); that is the whole point of LOTUS.
Sum or integrate. Compute E[g(X)] = sum of g(x) P(X = x), or the integral of g(x) f(x) dx over the support of X.
Sanity-check the answer. Confirm it is finite (heavy tails can make E[X^2] or the mean infinite), and recall E[g(X)] need not equal g(E[X]) — use Jensen to predict which side the gap falls on.
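The four steps above can be sketched as two small helpers. The function names lotus_discrete and lotus_continuous are hypothetical, and the continuous version uses a plain midpoint rule over a bounded support. That is an assumption made to keep the example free of dependencies; it is not a recommendation for serious numerical work.

```python
def lotus_discrete(g, pmf):
    """Steps 1-3 for a discrete X: sum of g(x) * P(X = x)."""
    return sum(g(x) * p for x, p in pmf.items())


def lotus_continuous(g, density, lo, hi, n=100_000):
    """Steps 1-3 for a continuous X on [lo, hi].

    Approximates the integral of g(x) * f(x) dx with a midpoint rule
    on n equal subintervals.
    """
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * h  # midpoint of the i-th subinterval
        total += g(x) * density(x)
    return total * h


# Discrete: fair die, g(x) = x^2. Exact value is 91/6.
die = {x: 1 / 6 for x in range(1, 7)}
second_moment_die = lotus_discrete(lambda x: x**2, die)

# Continuous: X uniform on [0, 1], density f(x) = 1, g(x) = x^2. Exact value is 1/3.
second_moment_uniform = lotus_continuous(lambda x: x**2, lambda x: 1.0, 0.0, 1.0)

print(second_moment_die)
print(second_moment_uniform)
```

In both calls the only information about the random variable is its own pmf or density; the distribution of g(X) never appears.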

Two honest cautions before you lean on it. First, LOTUS only delivers a number when the sum or integral converges absolutely; for a heavy-tailed law the relevant expectation can be infinite or undefined, and the formula then has no value to report rather than a wrong one. Second, LOTUS hands you E[g(X)] and nothing more: one number that deliberately throws away the shape of g(X). If you need a probability like P(g(X) > 5) or a quantile of g(X), the average alone cannot supply it, and you really do have to work out how g(X) is distributed; LOTUS averages, it does not describe. (The variance of g(X) is a different matter: it equals E[g(X)^2] - (E[g(X)])^2, so two more LOTUS averages supply it.) Keep those two limits in view and it becomes one of the most-used tools in the subject.
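The convergence caution is easy to see numerically. The sketch below uses a hypothetical heavy-tailed law chosen for illustration, P(X = 2^k) = 2^(-k) for k = 1, 2, 3, ..., and the helper name partial_expectation is made up. It prints partial LOTUS sums for g(x) = x, which grow without bound, and for g(x) = sqrt(x), which settle near 1 + sqrt(2).

```python
import math


def partial_expectation(g, terms):
    """Partial LOTUS sum over k = 1..terms for the law P(X = 2**k) = 2**-k."""
    return sum(g(2.0**k) * 2.0**-k for k in range(1, terms + 1))


for terms in (10, 20, 40):
    mean_partial = partial_expectation(lambda x: x, terms)  # each term adds exactly 1
    sqrt_partial = partial_expectation(math.sqrt, terms)    # geometric series, converges
    print(terms, mean_partial, round(sqrt_partial, 6))
```

For this law E[X] has no finite value, because the partial sums simply equal the number of terms, while E[sqrt(X)] is finite. Whether LOTUS reports a number depends on g as well as on X.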