The Distribution of a Function: The CDF Method

A function of a random variable is a random variable

By now you are fluent with a single random variable X: its mass or density, its cdf, its expectation. But the questions that matter rarely stop at X itself. An engineer measures a random voltage X and cares about the power, which goes like X^2. A physicist clocks a random speed and wants the kinetic energy. A trader holds a random log-return and needs the actual price, which is the starting price times e raised to that return. In every case you start with something whose distribution you know, push it through a function g, and ask: what is the distribution of the new quantity Y = g(X)?

The first thing to settle is that there is nothing exotic here. Y = g(X) is itself a perfectly ordinary random variable: feed it the same random outcome, apply g, read off a number. So Y has its own cdf, its own density, its own expectation — and this whole rung is about the machinery for finding them. This guide covers the distribution of a function by the most reliable route of all, the cdf method. Later guides specialize it into the slick change-of-variables formula, into convolution for sums, and into the order statistics; but the cdf method is the bedrock they all rest on, and the one to reach for when you are unsure.

Chase the probability, not the formula

Here is the whole idea in one sentence. To find the distribution of Y = g(X), do not try to transform the density directly — instead translate every question about Y back into a question about X, where you already know the answer. The cleanest place to do this is the cumulative distribution function, because the cdf only ever asks one kind of question: "what is the probability of being at most y?" Write F_Y(y) = P(Y <= y), then substitute Y = g(X) and solve the inequality g(X) <= y for X. You convert an event about Y into an event about X, evaluate it with the cdf of X you already hold, and the answer that drops out is F_Y, the cdf of Y.

Once you have F_Y in hand, getting the density is the move you already learned: differentiate. Recovering the density from the cdf is just f_Y(y) = the derivative of F_Y(y). So the full method is a tidy three-step loop, and notice it works whether g goes up or down, whether it is one-to-one or folds two inputs onto one output — because at the cdf level you are only ever managing an inequality, and inequalities handle all of that honestly. That robustness is exactly why this is the method of last resort that never lets you down.

The CDF method, in three steps:

  1.  F_Y(y) = P(Y <= y) = P( g(X) <= y )
  2.  solve g(X) <= y  ->  an event about X
                          ->  evaluate with F_X
  3.  f_Y(y) = d/dy F_Y(y)        (differentiate)

Set up the cdf of Y, translate the event into one about X, then differentiate to get the density.

A worked example: the square of a uniform

Let X be uniform on [0, 1] — flat density 1 on that interval — and let Y = X^2. Intuitively, squaring numbers between 0 and 1 *shrinks* them and bunches the results toward 0 (since 0.5^2 = 0.25, 0.1^2 = 0.01), so we expect Y to pile up near the low end. The cdf method makes that intuition exact. For a target y between 0 and 1, write F_Y(y) = P(X^2 <= y). Since X is never negative here, X^2 <= y is the same event as X <= sqrt(y). That is the crucial translation: a question about Y has become a question about X.

Set up: F_Y(y) = P(Y <= y) = P(X^2 <= y) for y in [0, 1].
Translate the event: because X >= 0, the inequality X^2 <= y is exactly X <= sqrt(y). So F_Y(y) = P(X <= sqrt(y)) = F_X(sqrt(y)).
Use what you know: the uniform cdf is F_X(x) = x on [0, 1], so F_X(sqrt(y)) = sqrt(y). Hence F_Y(y) = sqrt(y).
Differentiate for the density: f_Y(y) = d/dy of sqrt(y) = 1 / (2 sqrt(y)) for y in (0, 1].
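
A quick simulation can back up the result F_Y(y) = sqrt(y). The sketch below is only an illustration, not part of the derivation: the seed, the sample size, the check points, and the helper name empirical_cdf are all arbitrary choices made here. It draws X uniformly, squares it, and compares the fraction of samples at or below y with sqrt(y).

```python
import math
import random

def empirical_cdf(samples, y):
    """Fraction of the samples that are <= y (an estimate of the cdf at y)."""
    return sum(1 for s in samples if s <= y) / len(samples)

rng = random.Random(0)   # fixed seed so the run is repeatable (arbitrary choice)
n = 100_000              # sample size (arbitrary choice)

# Y = X^2 with X uniform; random() returns values in [0, 1)
ys = [rng.random() ** 2 for _ in range(n)]

max_gap = 0.0
for y in [0.01, 0.25, 0.5, 0.81]:
    emp = empirical_cdf(ys, y)
    exact = math.sqrt(y)             # F_Y(y) = sqrt(y) from the cdf method
    max_gap = max(max_gap, abs(emp - exact))
    print(f"y={y:4.2f}  empirical={emp:.4f}  sqrt(y)={exact:.4f}")

print(f"largest gap: {max_gap:.4f}")
```

With this many samples the empirical and exact columns should agree to roughly two decimal places; the remaining gap is ordinary sampling noise.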

Read what the answer is telling you. The density f_Y(y) = 1 / (2 sqrt(y)) blows up as y approaches 0 (it is huge near the bottom and falls to 1/2 at y = 1), which is exactly the "pile up near zero" we guessed. It also confirms a lesson from the previous rung: a density is not a probability. f_Y can shoot off toward infinity near 0 and that is fine, because the probability of any single point is still zero, and only the *area* under f_Y over an interval is a real probability. (As a sanity check, the total area is the integral of 1 / (2 sqrt(y)) from 0 to 1, which is sqrt(1) - sqrt(0) = 1, so this is a genuine distribution.)
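
To see the difference between density and probability in numbers, the following sketch evaluates both for Y = X^2 with X uniform on [0, 1]. The function names cdf_y, density_y and prob_interval are made up for this illustration. Probabilities of intervals come from differences of the cdf, which is the same as the area under the density.

```python
import math

def cdf_y(y):
    """cdf of Y = X^2 for X uniform on [0, 1]: 0 below 0, sqrt(y) on [0, 1], 1 above."""
    if y <= 0.0:
        return 0.0
    if y >= 1.0:
        return 1.0
    return math.sqrt(y)

def density_y(y):
    """Density of Y on (0, 1]: 1 / (2 sqrt(y))."""
    return 1.0 / (2.0 * math.sqrt(y))

def prob_interval(a, b):
    """P(a < Y <= b), i.e. the area under the density between a and b."""
    return cdf_y(b) - cdf_y(a)

# The density is far above 1 near zero, yet every probability stays in [0, 1].
print("density at y = 1e-6:     ", density_y(1e-6))
print("P(Y <= 1e-6):            ", prob_interval(0.0, 1e-6))

# Pile-up near zero: a short interval at the bottom carries about as much
# probability as a much longer interval at the top.
print("P(Y <= 0.01):            ", prob_interval(0.0, 0.01))
print("P(0.81 < Y <= 1):        ", prob_interval(0.81, 1.0))

# Total area under the density.
print("P(0 < Y <= 1):           ", prob_interval(0.0, 1.0))
```

The interval [0, 0.01] and the interval (0.81, 1] each carry probability of about 0.1, even though the second is nineteen times longer.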

When g goes the wrong way, or folds

The square example had g increasing on the relevant range, so the inequality carried over to X without changing direction. Two complications come up constantly, and the cdf method swallows both because it only ever reasons about events. First, a decreasing g reverses the inequality. If Y = -X, say, then Y <= y means -X <= y, i.e. X >= -y, so F_Y(y) = P(X >= -y) = 1 - F_X(-y) when X is continuous (in general P(X >= -y) = 1 - P(X < -y), and the two agree whenever the single point -y carries no probability). Whenever you divide by a negative number or invert a decreasing function, the inequality sign flips, and getting that flip right is the single most common slip. Monotonicity of probability (bigger sets carry at least as much probability) gives a quick check: as y grows the event Y <= y grows, so whatever you derive for F_Y must be nondecreasing in y.
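
Here is a small check of the flipped inequality for Y = -X. As an assumption for illustration only, X is taken to be uniform on [0, 1], so that 1 - F_X(-y) works out to 1 + y for y between -1 and 0. The helper names, seed, and sample size are arbitrary.

```python
import random

def uniform_cdf(x):
    """cdf of X uniform on [0, 1] (the example distribution assumed here)."""
    return min(1.0, max(0.0, x))

def cdf_of_negation(cdf_x, y):
    """F_Y(y) for Y = -X with X continuous: P(X >= -y) = 1 - F_X(-y)."""
    return 1.0 - cdf_x(-y)

rng = random.Random(1)   # fixed seed (arbitrary choice)
n = 100_000              # sample size (arbitrary choice)
ys = [-rng.random() for _ in range(n)]   # samples of Y = -X

max_gap = 0.0
for y in [-0.9, -0.5, -0.2]:
    emp = sum(1 for v in ys if v <= y) / n
    exact = cdf_of_negation(uniform_cdf, y)
    wrong = uniform_cdf(-y)              # what you get if the flip is forgotten
    max_gap = max(max_gap, abs(emp - exact))
    print(f"y={y:5.2f}  empirical={emp:.4f}  1-F_X(-y)={exact:.4f}  F_X(-y)={wrong:.4f}")
```

The last column shows the unflipped version. It decreases as y increases, which no cdf can do, so the monotonicity check catches the slip immediately.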

Second, and more interesting, g may fold: two different inputs land on the same output, so g is not one-to-one. Squaring a variable that can be negative does this: both x and -x give the same x^2. Suppose X is continuous and can range over the whole line, and Y = X^2. Then Y <= y (for y > 0) means X^2 <= y, which is the *two-sided* event -sqrt(y) <= X <= sqrt(y). So F_Y(y) = F_X(sqrt(y)) - F_X(-sqrt(y)): you must collect both branches that fold onto y. (For y < 0 the event is impossible, so F_Y(y) = 0.) This is precisely where the cdf method earns its keep: the later change-of-variables formula assumes g is one-to-one, but the cdf method needs no such promise. It just gathers every piece of the X-line that g sends to a value at most y.
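
The folded case can be checked the same way. In the sketch below X is assumed to be standard normal, purely as a convenient example of a variable that takes both signs; its cdf is written with math.erf. The names normal_cdf and cdf_of_square, the seed, and the sample size are choices made for this illustration.

```python
import math
import random

def normal_cdf(x):
    """cdf of a standard normal, expressed through the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def cdf_of_square(cdf_x, y):
    """F_Y(y) for Y = X^2 with X continuous: F_X(sqrt(y)) - F_X(-sqrt(y))."""
    if y < 0.0:
        return 0.0               # X^2 <= y is impossible for negative y
    r = math.sqrt(y)
    return cdf_x(r) - cdf_x(-r)  # collect both branches of the fold

rng = random.Random(2)   # fixed seed (arbitrary choice)
n = 100_000              # sample size (arbitrary choice)
ys = [rng.gauss(0.0, 1.0) ** 2 for _ in range(n)]

max_gap = 0.0
for y in [0.1, 1.0, 4.0]:
    emp = sum(1 for v in ys if v <= y) / n
    exact = cdf_of_square(normal_cdf, y)
    one_branch = normal_cdf(math.sqrt(y))   # forgetting the negative branch
    max_gap = max(max_gap, abs(emp - exact))
    print(f"y={y:3.1f}  empirical={emp:.4f}  two-sided={exact:.4f}  one-branch={one_branch:.4f}")
```

The one-branch column is what you would get by treating X^2 <= y as X <= sqrt(y) alone; it overshoots because it also counts every large negative X.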

Why this method, and where it leads

It is worth naming why the cdf carries the load and the density does not. The cdf is a probability — F_Y(y) really is P(Y <= y), a genuine number between 0 and 1 — so substituting and manipulating it never breaks any rule. A density is a rate, not a probability, and you cannot simply "plug g into" a density; the change-of-variables formula in the next guide exists precisely to repair what goes wrong when you try, by inserting a stretching factor (the Jacobian) to account for how g compresses or expands the axis. The cdf method sidesteps that entirely: because it works at the level of events, the stretching is taken care of automatically when you differentiate at the very end.
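
The claim that the stretching factor appears on its own when you differentiate can be checked numerically. The sketch below assumes, for illustration only, that X is exponential with rate 1 and Y = X^2, so that F_Y(y) = F_X(sqrt(y)). It differentiates F_Y with a central difference and compares the result with f_X(sqrt(y)) times 1 / (2 sqrt(y)), where 1 / (2 sqrt(y)) is the derivative of the inverse map sqrt(y). All function names and the step size h are choices made here.

```python
import math

def cdf_x(x):
    """cdf of X, exponential with rate 1 (an assumed example)."""
    return 1.0 - math.exp(-x) if x > 0.0 else 0.0

def pdf_x(x):
    """Density of X, exponential with rate 1."""
    return math.exp(-x) if x > 0.0 else 0.0

def cdf_y(y):
    """cdf of Y = X^2 by the cdf method: F_Y(y) = F_X(sqrt(y)) for y > 0."""
    return cdf_x(math.sqrt(y)) if y > 0.0 else 0.0

def numeric_density(cdf, y, h=1e-6):
    """Central-difference approximation to the derivative of a cdf at y."""
    return (cdf(y + h) - cdf(y - h)) / (2.0 * h)

for y in [0.25, 1.0, 4.0]:
    from_cdf = numeric_density(cdf_y, y)       # step 3 of the cdf method
    stretch = 1.0 / (2.0 * math.sqrt(y))       # derivative of sqrt(y)
    with_factor = pdf_x(math.sqrt(y)) * stretch
    naive = pdf_x(math.sqrt(y))                # "plugging g into" the density
    print(f"y={y:4.2f}  d/dy F_Y={from_cdf:.6f}  f_X*stretch={with_factor:.6f}  naive={naive:.6f}")
```

The first two columns agree, while the naive column does not: differentiating the cdf produces the factor through the chain rule without anyone having to remember it.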

With this one method you can already reach surprising destinations. Apply the cdf method with g chosen as a continuous variable's *own* cdf and you get the probability integral transform, the magic fact that F(X) is uniform on [0, 1]; it is the engine behind simulating any distribution, and the subject of guide 4. Apply it to the maximum or minimum of several variables and you get the order statistics of guide 5. Feed it a sum of two independent variables and the cdf method produces convolution, guide 3. Every later guide in this rung is, at heart, the same three steps you just learned: write the cdf of the new quantity, translate the event back to what you know, and differentiate.
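
Two of these destinations are easy to preview by simulation. The sketch below makes its own assumptions, which the later guides may set up differently: X is exponential with rate 1 for the probability integral transform, and the maximum is taken over k = 3 independent uniforms, for which the cdf method gives P(max <= y) = P(all three <= y) = y^3. Seed, sample size, and names are arbitrary.

```python
import math
import random

def empirical_cdf(samples, t):
    """Fraction of the samples that are <= t."""
    return sum(1 for s in samples if s <= t) / len(samples)

rng = random.Random(3)   # fixed seed (arbitrary choice)
n = 100_000              # sample size (arbitrary choice)

# Probability integral transform: push X through its own cdf F(x) = 1 - exp(-x).
us = [1.0 - math.exp(-rng.expovariate(1.0)) for _ in range(n)]
pit_gap = 0.0
for u in [0.1, 0.5, 0.9]:
    emp = empirical_cdf(us, u)
    pit_gap = max(pit_gap, abs(emp - u))       # a uniform has cdf u on [0, 1]
    print(f"P(F(X) <= {u}) ~ {emp:.4f}   uniform cdf = {u}")

# Maximum of k independent uniforms: the event max <= y means every one is <= y.
k = 3
maxima = [max(rng.random() for _ in range(k)) for _ in range(n)]
max_gap = 0.0
for y in [0.3, 0.6, 0.9]:
    emp = empirical_cdf(maxima, y)
    exact = y ** k
    max_gap = max(max_gap, abs(emp - exact))
    print(f"P(max <= {y}) ~ {emp:.4f}   y^{k} = {exact:.4f}")
```

In both cases the derivation behind the exact column is the three-step loop: write the cdf of the new quantity, translate the event back to the original variables, and evaluate.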