From events to numbers
In the rungs behind you, probability lived on a sample space of raw outcomes, and you assigned numbers to events — to whole collections of outcomes — using the Kolmogorov axioms. That is powerful, but it is also clumsy for everyday questions. Roll two dice and the raw outcome is a pair like (3, 5). Most of the time you do not care about the pair; you care about the total, here 8, or whether the total is at least 10, or how the totals behave on average. You want to work with numbers, not with bags of outcomes.
This is the whole reason a random variable exists. It is the bridge that carries you from the world of outcomes into the world of numbers, where calculus and arithmetic can finally help. Everything in this rung — the mass function, the density, the cumulative distribution function, quantiles — is just a different way of describing the numbers a random variable produces. So getting the core idea exactly right now pays off in every guide that follows.
A random variable is a function, not a number
Here is the precise idea, and it surprises almost everyone the first time. A random variable is a function that takes an outcome and returns a real number. We usually write it with a capital letter, say X, and write X(omega) for the number it assigns to the outcome omega. The randomness does not live in X — the function itself is a fixed, deterministic rule. The randomness lives in which outcome omega the experiment happens to produce. X just reads a number off whatever shows up.
Take the two-dice example. The sample space is all 36 ordered pairs. Define X = the sum of the two faces. Then X is the rule "add the two numbers": X((3, 5)) = 8, X((6, 6)) = 12, X((1, 1)) = 2. Before you roll, you cannot say what value X will take — but X itself was never uncertain. It was decided the moment you said "the sum." This is why the name is a little misleading: a random variable is neither especially random nor a variable in the algebra sense. "Random function on outcomes" would be more honest, but the traditional name has stuck.
omega      X(omega) = sum
-----      --------------
(1,1)      2
(3,5)      8
(5,3)      8
(6,6)      12

X maps each of the 36 outcomes to a number on the real line.
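To see that X really is just a fixed rule, here is a minimal Python sketch of the two-dice example. The names sample_space and X are illustrative choices, not anything the course prescribes.

```python
from itertools import product

# Sample space: all 36 ordered pairs of faces from two six-sided dice.
sample_space = list(product(range(1, 7), repeat=2))

def X(omega):
    """The random variable 'sum of the two faces': a fixed, deterministic rule."""
    first, second = omega
    return first + second

print(len(sample_space))                 # 36
print(X((3, 5)), X((6, 6)), X((1, 1)))   # 8 12 2
```

Nothing in X calls a random number generator. Any randomness would come from how an outcome omega is picked out of sample_space, not from the function.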
How a random variable inherits its probabilities
Once X turns outcomes into numbers, every statement about X is secretly a statement about an event. When we write P(X = 8), we mean the probability of the event {all outcomes omega with X(omega) = 8}. For the dice sum, those outcomes are (2,6), (3,5), (4,4), (5,3), (6,2) — five of the 36 equally likely pairs — so P(X = 8) = 5/36. Likewise P(X >= 10) is the probability of the event {outcomes whose sum is 10, 11, or 12}. The random variable does not invent new probabilities; it pulls them back from the events it points at.
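The pull-back can be written out directly. This sketch assumes the 36 pairs are equally likely, as in the text, and uses a hypothetical helper prob that counts outcomes; exact fractions avoid rounding.

```python
from fractions import Fraction
from itertools import product

sample_space = list(product(range(1, 7), repeat=2))

def X(omega):
    """Sum of the two faces."""
    return omega[0] + omega[1]

def prob(event):
    """Probability of an event (a collection of outcomes), all 36 pairs equally likely."""
    return Fraction(len(event), len(sample_space))

# The events that the statements "X = 8" and "X >= 10" point at.
event_sum_is_8 = [omega for omega in sample_space if X(omega) == 8]
event_sum_at_least_10 = [omega for omega in sample_space if X(omega) >= 10]

print(event_sum_is_8)               # [(2, 6), (3, 5), (4, 4), (5, 3), (6, 2)]
print(prob(event_sum_is_8))         # 5/36
print(prob(event_sum_at_least_10))  # 1/6
```

In both cases the probability is computed on outcomes. X only decides which outcomes belong to the event.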
This pull-back gives the random variable its own probability bookkeeping, called the law (or distribution) of X: the recipe that says, for any set of numbers B, how much probability X places on B, that is, P(X is in B). There is even a technical name for the requirement that makes this always work: X must be a measurable function, meaning the set {X is in B} is always a genuine event we can assign probability to. For the ordinary variables in this course that condition holds automatically, so you can hold it lightly for now; just know it is the honest fine print under the hood.
Two flavors: counts versus measurements
Random variables come in two broad styles, and the split decides which tools you reach for. A discrete random variable takes values in a list you can count, typically whole numbers: the dice sum (2 through 12), the number of heads in ten flips, the number of emails arriving in an hour. A continuous random variable takes values across a whole interval of the real line: a person's exact height, the time until a bulb burns out, a measurement read off a dial. This split between discrete and continuous random variables is the subject of the next two guides, which give each style its own chapter.
For a discrete variable, you describe it by listing how much probability sits on each value: P(X = 2) = 1/36, P(X = 3) = 2/36, and so on. That table of point-probabilities is the probability mass function, the pmf. For a continuous variable you cannot do this, for a reason that genuinely matters: a single exact value has probability zero. The chance that a height is *exactly* 170.000... cm, with infinitely many trailing digits all zero, is 0. So instead you describe a continuous variable with a density, which measures probability per unit length rather than at a point.
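For the dice sum, the whole pmf can be tabulated by counting outcomes. This is a minimal sketch that assumes equally likely pairs; the dictionary name pmf is just a convenient label.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

sample_space = list(product(range(1, 7), repeat=2))

# Count how many outcomes produce each sum, then divide by 36.
counts = Counter(a + b for a, b in sample_space)
pmf = {value: Fraction(count, len(sample_space))
       for value, count in sorted(counts.items())}

for value, p in pmf.items():
    print(value, p)

print(sum(pmf.values()))  # 1
```

Fraction reduces to lowest terms, so P(X = 3) prints as 1/18 rather than 2/36. The point-probabilities add up to 1, as any pmf must.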
One description to rule them all: the cdf
The pmf works for discrete variables and the density works for continuous ones, but there is a single description that works for both, and for the messy in-between cases too: the cumulative distribution function, or cdf. It is defined as F(x) = P(X <= x): the probability that X comes out at or below the threshold x. As you slide x from far left to far right, F climbs from 0 up to 1, sweeping up probability as it goes. Because it answers "how much probability is to the left of here?", the cdf always makes sense, whether the probability sits on points, smears across an interval, or does both.
That universality is why the cdf gets its own guide (guide 4) and why it is the deepest of the four descriptions. It also carries a beautiful fact worth previewing: the cdf is the full fingerprint of the variable. It determines the distribution completely, so two random variables with the same cdf are statistically indistinguishable in every question you could ask about their values, even if one is the dice sum and the other comes from a totally different experiment. When two variables share a law like this, we call them identically distributed.
- Discrete: describe X by its pmf, a table of point-masses P(X = value), and the cdf jumps up at each value.
- Continuous: describe X by its density, where probability is area under the curve, and the cdf rises smoothly.
- Either way: the cdf F(x) = P(X <= x) works, climbing from 0 to 1 and encoding the full distribution.
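As a concrete instance of the discrete case, here is a sketch of the cdf of the dice sum, built by adding up the point-masses at or below x. The function name cdf is an illustrative choice.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

sample_space = list(product(range(1, 7), repeat=2))
counts = Counter(a + b for a, b in sample_space)
pmf = {value: Fraction(count, len(sample_space)) for value, count in counts.items()}

def cdf(x):
    """F(x) = P(X <= x): total point-mass sitting at or below x."""
    return sum((p for value, p in pmf.items() if value <= x), Fraction(0))

print(cdf(1))    # 0
print(cdf(2))    # 1/36
print(cdf(7))    # 7/12
print(cdf(7.5))  # 7/12
print(cdf(12))   # 1
```

F is flat between possible values (cdf(7) equals cdf(7.5)) and jumps exactly where the pmf puts mass, which is the staircase shape described in the first bullet.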
Reading the distribution backwards: quantiles and survival
The cdf asks "given a value x, how much probability is below it?" Often you want the reverse question: "given a probability, which value sits at that level?" That inverse is a quantile. The most familiar quantile is the median, the value with half the probability on each side. For a continuous variable that means P(X <= m) = 0.5; for a discrete variable the cdf can jump past 0.5, so a common convention takes the smallest m with P(X <= m) >= 0.5. Percentiles are the same idea in hundredths: the 90th percentile of an exam is the score that 90 percent of people fall at or below. Quantiles are how you talk about typical and extreme values without committing to a single average, and they are the subject of guide 5.
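For the dice sum, reading the cdf backwards can be sketched as follows. The convention used here, "smallest value whose cdf reaches p", is one common choice for discrete variables and is an assumption of this example; guide 5 may settle the definition differently.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

sample_space = list(product(range(1, 7), repeat=2))
counts = Counter(a + b for a, b in sample_space)
pmf = {value: Fraction(count, len(sample_space)) for value, count in counts.items()}

def quantile(p):
    """Smallest value x with P(X <= x) >= p (assumed convention)."""
    cumulative = Fraction(0)
    for value in sorted(pmf):
        cumulative += pmf[value]
        if cumulative >= p:
            return value
    raise ValueError("p must be between 0 and 1")

print(quantile(Fraction(1, 2)))   # 7  (the median)
print(quantile(Fraction(9, 10)))  # 10 (the 90th percentile)
```

The median comes out as 7 because P(X <= 6) = 15/36 falls short of one half while P(X <= 7) = 21/36 reaches it.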
There is also a flipped cousin of the cdf that shows up everywhere in reliability and survival analysis: the survival function S(x) = P(X > x) = 1 - F(x), the probability of lasting *beyond* x. If X is the lifetime of a machine, S(x) is the chance it is still running after time x. The survival function and its relative rate of decay (the hazard rate) are the natural language for "how long until something fails," and they round out guide 5. Notice that neither is new information: the survival function and the hazard rate are both derived from the cdf, read from the other end.
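Here is a small sketch of the survival function for a machine lifetime. The exponential model and the value of RATE are assumptions made purely for illustration; the text does not commit to any particular lifetime distribution. The hazard is estimated numerically as the negative derivative of log S.

```python
import math

RATE = 0.5  # hypothetical failure rate per year (an assumption for this example)

def cdf(x):
    """F(x) = P(X <= x) for an exponential lifetime with the assumed rate."""
    return 1.0 - math.exp(-RATE * x) if x >= 0 else 0.0

def survival(x):
    """S(x) = P(X > x) = 1 - F(x)."""
    return 1.0 - cdf(x)

def hazard(x, h=1e-6):
    """Numerical hazard rate: -(d/dx) log S(x), by a central difference."""
    return -(math.log(survival(x + h)) - math.log(survival(x - h))) / (2 * h)

print(round(survival(2.0), 4))  # 0.3679
print(round(hazard(2.0), 4))    # 0.5
```

Both survival and hazard are computed from cdf alone. For this assumed model the hazard comes out constant and equal to RATE, but other lifetime models would give a hazard that changes with x.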
Step back and you can see the architecture of the whole rung. A random variable is one function from outcomes to numbers; the support is the set of values it can actually take; the pmf, the density, the cdf, the quantile function, and the survival function are five different windows onto the *same* underlying distribution. Choosing the right window for the question — area for an interval, the cdf for "at most," a quantile for "which value," survival for "how long" — is most of the practical skill you are building here.