The same old random variable, with one new clause
When you first met a random variable X, the picture was already exactly right: X is a function that takes an outcome omega and returns a real number X(omega). The randomness is never in X itself — X is a fixed deterministic rule — it is in which outcome the experiment serves up. Roll two dice, let X be the sum, and X((3,5)) = 8 with no uncertainty about the rule at all. Nothing about that story changes in this rung. We are not replacing the idea; we are checking its license.
Here is the gap the previous guides forced into the open. In guide 1 you saw why naive probability breaks: on a continuous space you cannot assign a sensible size to every subset, so a non-measurable set has no probability at all. In guide 2 you fixed this by living on a measurable space (Omega, F) and only ever assigning probability to events in the sigma-algebra F. But that fix has a consequence for random variables that elementary courses quietly ignore: when we write P(X <= 3), we are asking for the probability of the set {omega : X(omega) <= 3}. If that set is not in F, the question is not hard — it is meaningless. So X cannot be just any function; it has to be one whose questions always land on legal events.
Measurability, precisely: preimages must be events
Here is the clean definition. Fix a probability space (Omega, F, P). A function X: Omega -> R is a random variable — a measurable function — if for every Borel set B on the real line, the preimage X^(-1)(B) = {omega : X(omega) in B} belongs to F. The preimage is the set of all outcomes that X sends into B; saying it is in F says that set is a legal event we can measure. Notice the direction carefully: we pull sets B from the line *back* to Omega and demand the result is an event. Measurability is a statement about preimages going the right way, not about images going forward.
Quantifying over every Borel set B sounds like an impossible amount of checking, but measure theory hands you a shortcut. It is enough to verify a single family: {omega : X(omega) <= x} is an event for every real number x. Why so little? Because the half-lines (-infinity, x] generate the Borel sigma-algebra: it is the smallest sigma-algebra that contains all of them. Preimages respect the sigma-algebra operations (the preimage of a complement is the complement of the preimage, and the preimage of a countable union is the union of the preimages), so the collection of sets B whose preimage X^(-1)(B) lies in F is itself a sigma-algebra. If that collection contains every half-line, it must contain the smallest sigma-algebra that does, which is all of the Borel sets. This is the same generate-from-a-small-family trick you saw used to build the Borel sets in the first place.
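On a finite sample space, both the definition and the half-line shortcut can be checked by brute force. The following is a minimal sketch, assuming two coin tosses with X counting heads. The helper names and the coarse sigma-algebra `F_first` (which records only the first toss) are hypothetical, introduced purely for illustration.

```python
from itertools import chain, combinations

# Two coin tosses; outcomes are strings such as "HT".
OMEGA = frozenset({"HH", "HT", "TH", "TT"})

def powerset(s):
    """All subsets of a finite set, as frozensets."""
    items = list(s)
    subsets = chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))
    return {frozenset(c) for c in subsets}

def preimage(X, omega, B):
    """X^(-1)(B): the outcomes that X sends into B (B is a predicate on values)."""
    return frozenset(w for w in omega if B(X(w)))

def is_measurable(X, omega, F):
    """Shortcut test: {X <= x} must be in F. On a finite space only attained x matter."""
    values = {X(w) for w in omega}
    return all(preimage(X, omega, lambda v, x=x: v <= x) in F for x in values)

def is_measurable_all_sets(X, omega, F):
    """Full test: the preimage of every subset of the attained values must be in F."""
    values = {X(w) for w in omega}
    return all(preimage(X, omega, lambda v, B=B: v in B) in F for B in powerset(values))

def num_heads(w):
    return w.count("H")

# Hypothetical sigma-algebras for illustration.
F_all = powerset(OMEGA)  # every subset is an event
F_first = {frozenset(), frozenset({"HH", "HT"}), frozenset({"TH", "TT"}), OMEGA}

print(is_measurable(num_heads, OMEGA, F_all))     # True
print(is_measurable(num_heads, OMEGA, F_first))   # False: {TT} is not in F_first
print(is_measurable_all_sets(num_heads, OMEGA, F_first))  # False, agreeing with the shortcut
```

The two tests agree, as the generator argument predicts. The failure against `F_first` shows that measurability is a property of X relative to a chosen F, not of X alone.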
Two coin tosses: Omega = {HH, HT, TH, TT}, F = all subsets
X = number of heads.

B on the line    X^(-1)(B) (pull back to Omega)    in F?
-------------    ------------------------------    -----
{1}              {HT, TH}                          yes
(-inf, 0]        {TT}                              yes
(-inf, 1]        {HT, TH, TT}                      yes
[2, 5]           {HH}                              yes

Every set B pulls back to a legal event, so X is measurable, and, for fair tosses,
P(X = 1) = P({HT, TH}) = 1/2.

Why this is not just bureaucracy
It is tempting to file measurability under boring fine print, but it is load-bearing. Recall from the earlier random-variable rung that the cumulative distribution function F_X(x) = P(X <= x) (written F_X here to keep it apart from the sigma-algebra F) was the one description that worked for discrete, continuous, and mixed variables alike. That definition only makes sense if {X <= x} is an event for every x, which is precisely the measurability condition. So measurability is not an add-on bolted onto random variables; it is the minimal condition under which the cdf, and therefore the entire distribution, exists at all.
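To make the dependence concrete, here is a small sketch that computes a cdf by literally measuring the event {X <= x}. It assumes two fair dice with X the sum, as in the opening example. The uniform weights and the function names are choices made for this illustration.

```python
from fractions import Fraction

# Two fair dice: 36 equally likely outcomes (an assumption of this sketch).
OMEGA = [(a, b) for a in range(1, 7) for b in range(1, 7)]
P = {w: Fraction(1, 36) for w in OMEGA}

def X(w):
    """The random variable: sum of the two dice."""
    return w[0] + w[1]

def cdf(x):
    """P({omega : X(omega) <= x}), computed by building the event and measuring it."""
    event = [w for w in OMEGA if X(w) <= x]
    return sum((P[w] for w in event), Fraction(0))

print(cdf(1), cdf(2), cdf(7), cdf(12))  # 0 1/36 7/12 1
```

Every value of the cdf is the probability of a set of outcomes, so the function is only defined because each such set is an event.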
So when does a function fail to be measurable? In the standard setting, where Omega is the real line or R^n with its Borel sets, or a countable space with all subsets as events, almost never in practice, and this is the genuinely reassuring news. Every function you can build there from formulas, sums, products, compositions, limits, and continuous operations is automatically measurable. Continuous functions are measurable; sums and products of measurable functions are measurable; a pointwise limit of measurable functions is measurable, which is exactly why the convergence theorems in guide 4 will not constantly throw you off a cliff. The functions that fail are pathological constructions, such as those built using the axiom of choice, the same machinery that produced the non-measurable sets of guide 1. You will never accidentally write one down. The one caveat is that measurability is always relative to F: against a deliberately coarse sigma-algebra, even a simple function can fail.
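As one worked instance of these closure rules, here is a standard argument (not spelled out in this guide) for why X + Y is measurable whenever X and Y are. It uses strict inequalities {X < q}, which generate the Borel sets just as the closed half-lines do, and it relies on the rationals being countable and dense in the line.

```latex
\{\omega : X(\omega) + Y(\omega) < a\}
  \;=\; \bigcup_{q \in \mathbb{Q}}
  \Bigl( \{\omega : X(\omega) < q\} \,\cap\, \{\omega : Y(\omega) < a - q\} \Bigr)
  \qquad \text{for every real } a .
```

If X(omega) + Y(omega) < a, then X(omega) < a - Y(omega), and some rational q sits strictly between the two, which puts omega in the q-th set on the right. Conversely, X(omega) < q and Y(omega) < a - q add up to X(omega) + Y(omega) < a. The right-hand side is a countable union of intersections of events, so it is an event.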
The payoff: pushing probability forward to the line
Now the reward. Once X is measurable, every preimage X^(-1)(B) is a genuine event, so we can measure it, and that lets us define a brand-new probability measure that lives entirely on the real line. For each Borel set B, set P_X(B) = P(X^(-1)(B)) = P(X in B): the weight P_X gives to a region B is just the original probability of all the outcomes X maps into B. This transported measure P_X is the pushforward of P along X, and it is what we have been calling the law (or distribution) of X all along. We push the probability forward from the abstract Omega out onto R.
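The definition P_X(B) = P(X in B) can be computed directly on a finite space. The sketch below assumes two fair dice with X the sum. Representing B as a predicate on values and storing the law as a dictionary of point masses are conveniences of this illustration, not part of the definition.

```python
from collections import defaultdict
from fractions import Fraction

# Two fair dice: 36 equally likely outcomes (an assumption of this sketch).
OMEGA = [(a, b) for a in range(1, 7) for b in range(1, 7)]
P = {w: Fraction(1, 36) for w in OMEGA}

def X(w):
    return w[0] + w[1]

def pushforward(P, X):
    """Point masses of P_X: each value v of X mapped to P(X = v)."""
    law = defaultdict(Fraction)  # Fraction() is zero
    for w, p in P.items():
        law[X(w)] += p           # move the weight of w to the point X(w) on the line
    return dict(law)

def P_X(law, B):
    """P_X(B) = P(X in B), where B is given as a predicate on real values."""
    return sum((p for v, p in law.items() if B(v)), Fraction(0))

law = pushforward(P, X)
print(law[7])                               # 1/6
print(P_X(law, lambda v: 2.5 <= v <= 4.5))  # P(X in {3, 4}) = 5/36
print(sum(law.values()))                    # 1, so P_X is again a probability measure
```

After `pushforward` runs, `P_X` never looks at `OMEGA` again: the law on the line answers every question about the values of X.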
This is the precise reason you are allowed to forget the underlying sample space in everyday work. When you say X ~ Normal(mu, sigma^2), you are not describing any Omega at all; you are naming the pushforward measure on R directly. Two random variables with the same pushforward are statistically identical for every question you can ask about the values of either one on its own, even if one was built from dice and the other from radioactive decay. The cdf, the density or mass function when one exists, the expectation E[X], the variance Var(X), and every moment are all computed from P_X alone. The messy Omega did its job carrying us here, and now it can quietly retire.
- Start with the experiment as a probability space (Omega, F, P) — the abstract world of raw outcomes.
- Pick a measurable X: Omega -> R; measurability guarantees each {X in B} is a real event.
- Push forward: define P_X(B) = P(X in B), a probability measure that lives only on the real line.
- Work entirely with P_X from here (cdf, density, E[X], Var(X)); Omega is no longer needed for any question about X alone.
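The four steps above can be sketched end to end. This illustration again assumes two fair dice with X the sum. It computes E[X] and Var(X) from P_X alone, then cross-checks the mean by summing over Omega directly.

```python
from fractions import Fraction

# Step 1: the experiment as a finite probability space (fair dice assumed).
OMEGA = [(a, b) for a in range(1, 7) for b in range(1, 7)]
P = {w: Fraction(1, 36) for w in OMEGA}

# Step 2: a random variable; with F = all subsets, any function is measurable.
def X(w):
    return w[0] + w[1]

# Step 3: push forward to a measure on the line, stored as point masses.
law = {}
for w, p in P.items():
    law[X(w)] = law.get(X(w), Fraction(0)) + p

# Step 4: work with P_X only. Omega does not appear in these two lines.
mean = sum(v * p for v, p in law.items())
var = sum((v - mean) ** 2 * p for v, p in law.items())

# Cross-check: the same mean computed upstream, outcome by outcome.
mean_on_omega = sum(X(w) * p for w, p in P.items())

print(mean, var)              # 7 35/6
print(mean == mean_on_omega)  # True
```

The agreement of the two means is the finite version of the change-of-variables rule: integrating X against P on Omega gives the same number as integrating the identity against P_X on the line.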
Honest fine print, and a bridge ahead
Two honest cautions keep the picture true. First, equal distributions do not mean equal random variables. X and -X for a standard normal share the exact same Normal(0,1) law — the same pushforward — yet on any given outcome they take opposite values and are never the same function. The distribution forgets the outcome; it remembers only the spread of values. Second, the pushforward really does discard information: from P_X alone you cannot recover Omega, nor how X relates to a second variable Y on the same experiment. Joint behavior and dependence live upstream on Omega, which is why we cannot push everything forward and walk away forever.
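Both cautions can be seen on the smallest possible example. The sketch below swaps the standard normal for a discrete stand-in, a fair coin with X = +1 or -1, because it has the same symmetry and can be enumerated exactly. The names and the choice of stand-in are assumptions of this illustration.

```python
from fractions import Fraction

# One fair coin toss (a discrete stand-in for the symmetric normal example).
OMEGA = ["H", "T"]
P = {"H": Fraction(1, 2), "T": Fraction(1, 2)}

def X(w):
    return 1 if w == "H" else -1

def Y(w):
    return -X(w)

def law(Z):
    """Pushforward of P along Z, as a dict from values of Z to probabilities."""
    out = {}
    for w, p in P.items():
        out[Z(w)] = out.get(Z(w), Fraction(0)) + p
    return out

# Caution 1: same law, different functions.
same_law = law(X) == law(Y)
ever_equal = any(X(w) == Y(w) for w in OMEGA)
print(same_law, ever_equal)  # True False

# Caution 2: the marginal laws cannot tell the pair (X, X) from the pair (X, Y).
joint_XX = law(lambda w: (X(w), X(w)))
joint_XY = law(lambda w: (X(w), Y(w)))
print(joint_XX == joint_XY)  # False
```

The pairs (X, X) and (X, Y) have identical marginals but different joint laws, so the dependence between two variables cannot be recovered from their individual pushforwards.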
That second caution is exactly the hand-off to the rest of the rung. Because every {X in B} is a genuine event, the collection of all such events forms its own sub-sigma-algebra of F — the information carried by X. Guide 4 will use measurability to define E[X] as a Lebesgue integral against P, a single construction that handles discrete sums and continuous integrals at once and comes with limit theorems that the old Riemann integral could not give. Guide 5 will then ask when two random variables carry *separate* information — independence and product measures — culminating in the zero-one law. Measurability is the quiet hinge all of it turns on.
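The sub-sigma-algebra carried by X can be listed explicitly on a finite space. This is a minimal sketch, assuming two coin tosses with X counting heads. The function name `sigma_generated_by` is hypothetical, and enumerating subsets of the attained values only works because the space is finite.

```python
from itertools import combinations

# Two coin tosses, with every subset of OMEGA available as an event.
OMEGA = frozenset({"HH", "HT", "TH", "TT"})

def num_heads(w):
    return w.count("H")

def sigma_generated_by(X, omega):
    """All events of the form {X in B}, with B ranging over subsets of the attained values."""
    values = sorted({X(w) for w in omega})
    events = set()
    for r in range(len(values) + 1):
        for B in combinations(values, r):
            events.add(frozenset(w for w in omega if X(w) in B))
    return events

sigma_X = sigma_generated_by(num_heads, OMEGA)
print(len(sigma_X))                         # 8 events, out of 16 subsets of OMEGA
print(frozenset({"HT", "TH"}) in sigma_X)   # True: this is the event {X = 1}
print(frozenset({"HT"}) in sigma_X)         # False: X cannot tell HT from TH
```

The events missing from `sigma_X` are exactly the questions that knowing X does not answer, which is the sense in which this collection is the information carried by X.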