Discrete Variables and the Probability Mass Function

From the random variable to its values

In the previous guide we met the random variable as a rule that turns each outcome of an experiment into a number — formally a function from the sample space Omega to the real line. This guide zooms in on the friendliest kind: a discrete random variable, one whose possible values form a separated list you could read out loud, like 0, 1, 2, 3, ... or {1, 2, 3, 4, 5, 6}. A die face, the number of heads in ten tosses, the count of emails before lunch — all discrete.

The contrast, drawn fully in the next guide, is with a continuous random variable, whose values fill an entire interval with no gaps, like "the exact time the next bus arrives." The honest dividing line is not "are the numbers whole?" but "can you list the values, one by one, even if the list never ends?" If you can, the variable is discrete and the tool of this guide applies. If the values form a smear in which no value has a next neighbour, you need a density instead.

The set of values a discrete variable actually lands on with positive probability has a name: the support of the distribution. For a fair die the support is {1, 2, 3, 4, 5, 6}; for the count of heads in three tosses it is {0, 1, 2, 3}. Everything outside the support gets probability exactly zero, and we usually do not even bother to list those values. Pinning down the support is the quiet first step of describing any discrete variable.

The probability mass function: a value, a weight

The complete description of a discrete variable X is shockingly simple: just say, for each value x, how likely X is to equal exactly that value. That rule is the probability mass function, written p(x) = P(X = x). The word mass is the right picture: imagine the real line as a long thin rod, and you are sprinkling a fixed total of one kilogram of probability onto it, dropping discrete lumps only at the support points. The pmf tells you the weight of the lump sitting on each value.

A function p(x) earns the name pmf exactly when it obeys two honest rules, both inherited straight from the axioms of probability. First, no lump can be negative: p(x) >= 0 for every x, because a probability can never be below zero. Second, all the lumps must add up to the whole kilogram: the sum of p(x) over every value in the support equals 1, because something in the support must happen. Any non-negative function whose values sum to 1 is a legitimate pmf, and every pmf is such a function — there is nothing more to it.
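The two rules translate directly into a mechanical check. The sketch below assumes a pmf is stored as a Python dict mapping each support value to its weight, and it uses exact fractions so the total can be compared with 1 without rounding. The name `is_pmf` and the example tables are illustrative, not taken from any library.

```python
from fractions import Fraction

def is_pmf(table):
    """Return True when `table` (value -> weight) obeys both pmf rules."""
    weights = list(table.values())
    non_negative = all(w >= 0 for w in weights)  # rule 1: no negative lump
    sums_to_one = sum(weights) == 1              # rule 2: total mass is exactly 1
    return non_negative and sums_to_one

# Fair die: six equal lumps of 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}

# Sums to 1 but breaks rule 1 (one lump is negative).
negative_lump = {0: Fraction(3, 2), 1: Fraction(-1, 2)}

# Non-negative but breaks rule 2 (total mass is only 3/4).
missing_mass = {0: Fraction(1, 2), 1: Fraction(1, 4)}

print(is_pmf(die))            # True
print(is_pmf(negative_lump))  # False
print(is_pmf(missing_mass))   # False
```

With floating-point weights the exact comparison `== 1` would be too strict; a tolerance check such as `math.isclose` is the usual substitute.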

X = number of heads in 3 fair coin tosses

   x   |   0     1     2     3
  p(x) |  1/8   3/8   3/8   1/8        <- the pmf (a table of lumps)

  check non-negative:  every entry >= 0          OK
  check total mass:    1/8 + 3/8 + 3/8 + 1/8 = 8/8 = 1   OK

  P(X = 2)        = p(2)         = 3/8
  P(X >= 2)       = p(2) + p(3)  = 3/8 + 1/8 = 4/8 = 1/2
  P(1 <= X <= 2)  = p(1) + p(2)  = 3/8 + 3/8 = 6/8 = 3/4

A pmf is just a table of weights that sum to 1; the probability of any event is found by adding up the relevant lumps.

Once you have the pmf, you can answer every probability question about X by addition. The chance that X falls in some set of values is just the total mass sitting on those values: P(X in A) = sum of p(x) over the x in A. There is no calculus, no subtlety — discrete probability is bookkeeping, the careful adding-up of lumps. That simplicity is exactly what breaks down in the continuous case, where single points carry no mass and sums must become integrals.
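The bookkeeping can be sketched in a few lines. The example below reuses the three-coin table and assumes an event is given as a true/false test on the value; the helper name `prob` is hypothetical.

```python
from fractions import Fraction

# pmf of X = number of heads in 3 fair coin tosses.
pmf = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

def prob(pmf, event):
    """P(X in A): add the lumps on every support value where event(x) is true."""
    return sum((weight for x, weight in pmf.items() if event(x)), Fraction(0))

p_exactly_2 = prob(pmf, lambda x: x == 2)
p_at_least_2 = prob(pmf, lambda x: x >= 2)
p_between_1_and_2 = prob(pmf, lambda x: 1 <= x <= 2)

print(p_exactly_2, p_at_least_2, p_between_1_and_2)  # 3/8 1/2 3/4
```

The three printed values match the hand calculations in the table: every event probability is a sum over the support values the event selects.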

Four pmfs worth knowing by sight

Most of the discrete distributions you will ever meet are dressed-up versions of a few stories. The simplest is the Bernoulli distribution: a single yes/no trial that succeeds with probability p and fails with probability 1 - p. Its pmf is just two lumps, p(1) = p and p(0) = 1 - p. A coin that lands heads with chance p, a free throw that goes in, a customer who does or does not buy — every binary event is a Bernoulli variable, and it is the atom from which richer counts are built.

Stack n independent Bernoulli trials, each with success probability p, and count the successes: that count follows the binomial distribution. Its pmf, p(k) = C(n, k) * p^k * (1 - p)^(n - k), reads like a sentence — choose which k of the n trials succeeded (the binomial coefficient C(n, k)), multiply by the probability of k successes (p^k) and n - k failures ((1 - p)^(n - k)). The three-coin table above is exactly the binomial with n = 3 and p = 1/2, which is why those C(3, k) counts 1, 3, 3, 1 appeared.
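As a quick check of that formula, the sketch below evaluates it for n = 3 and p = 1/2 and confirms it reproduces the three-coin table. It assumes Python 3.8 or later for `math.comb`; the function name `binomial_pmf` is illustrative.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials with success probability p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 3, Fraction(1, 2)
table = {k: binomial_pmf(k, n, p) for k in range(n + 1)}

print([str(table[k]) for k in range(n + 1)])  # ['1/8', '3/8', '3/8', '1/8']
print(sum(table.values()) == 1)               # True
```

Passing `p` as a `Fraction` keeps every lump exact; a float works too, at the cost of small rounding error in the total.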

Even gentler is the discrete uniform distribution (the fair die), where every value in a finite support of size n gets the same lump, p(x) = 1/n. It is the discrete cousin of "equally likely outcomes" from the foundations rung. A different and important shape arrives when you count rare events over a fixed window, like typos on a page or calls in an hour: that is the Poisson distribution, whose single parameter lambda is both the mean count and the variance, so one number sets where the mass centres and how widely it spreads. We unpack each of these properly in the very next rung.

From mass to running totals: the cdf

There is a second, equivalent way to package the same information: instead of asking how much mass sits exactly at x, ask how much mass lies at or below x. That running total is the cumulative distribution function, F(x) = P(X <= x). For a discrete variable, F is built by sweeping left to right and adding lumps as you pass them, so its graph is a staircase: flat between support points, then jumping straight up by exactly p(x) at each value where mass sits.

The staircase and the table of lumps carry exactly the same content: you can rebuild either from the other. The size of the jump in F at a value x is precisely the pmf there: p(x) = F(x) - F(x-), where F(x-) is the value F approaches as its argument climbs toward x from the left. So a discrete variable shows itself as a cdf with genuine vertical jumps, whereas a continuous variable's cdf climbs smoothly with no jumps at all. This is the cleanest litmus test for which world you are in, and it is the heart of the dedicated cdf guide later in this rung.
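The round trip between lumps and staircase can be sketched for the three-coin example. The code below builds the running totals with `itertools.accumulate`, then recovers each lump as the difference between the step height at x and the height just to the left of x. The names `cdf_steps`, `F`, and `recovered` are illustrative.

```python
from fractions import Fraction
from itertools import accumulate

pmf = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

xs = sorted(pmf)                                # support, left to right
running = list(accumulate(pmf[x] for x in xs))  # running totals of the lumps
cdf_steps = dict(zip(xs, running))              # height of F at each support point

def F(x):
    """F(x) = P(X <= x): total mass on support values at or below x."""
    return sum((pmf[v] for v in xs if v <= x), Fraction(0))

# Rebuild the pmf from the jumps of the staircase.
recovered = {}
previous = Fraction(0)                          # F is 0 to the left of the support
for x in xs:
    recovered[x] = cdf_steps[x] - previous      # jump size at x
    previous = cdf_steps[x]

print([str(cdf_steps[x]) for x in xs])  # ['1/8', '1/2', '7/8', '1']
print(F(1.5) == F(1))                   # True: flat between support points
print(recovered == pmf)                 # True
```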

What the pmf is, and what it is not

The pmf is the variable's full identity card. Two different experiments that yield the same table of lumps give variables that are, for every probability question about the variable on its own, interchangeable: they are identically distributed even if their underlying outcomes are wildly different. The number of heads in two fair tosses and the number of even results in two die rolls have the very same pmf {0: 1/4, 1: 1/2, 2: 1/4}; once you know the pmf you know everything probability can tell you about that variable, and the original sample space quietly drops out of sight.
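That claim about tosses and die rolls can be checked by brute-force enumeration. The sketch below assumes every outcome in each sample space is equally likely (fair coins, fair dice) and counts how often each value of the variable occurs; the helper `pmf_of` is hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def pmf_of(outcomes, statistic):
    """pmf of statistic(outcome) when all listed outcomes are equally likely."""
    counts = Counter(statistic(o) for o in outcomes)
    total = len(outcomes)
    return {value: Fraction(count, total) for value, count in counts.items()}

tosses = list(product("HT", repeat=2))        # 4 equally likely outcomes
rolls = list(product(range(1, 7), repeat=2))  # 36 equally likely outcomes

heads = pmf_of(tosses, lambda o: o.count("H"))
evens = pmf_of(rolls, lambda o: sum(1 for face in o if face % 2 == 0))

print(heads == evens)  # True
```

The two sample spaces have different sizes (4 versus 36 outcomes), yet the resulting tables are equal, which is exactly what "identically distributed" means.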

Two honest cautions before the next guide. First, a pmf value really is a probability — p(2) = 3/8 means a genuine 37.5 percent chance — and this is exactly the property a continuous density does not share: a density value is not a probability, and a single point in the continuous world carries probability zero. Hold that distinction; it is the headline of the next guide. Second, p(x) must stay between 0 and 1 because it is a probability, whereas the heights of a density can legitimately exceed 1 — another sign that mass and density are different beasts.

There is also a tidy bridge between the two worlds: a pmf and a density are two answers to the same question — how is the one unit of probability spread out? — given for the two kinds of support. Discrete variables hang it in lumps you add; continuous variables smear it as a height you integrate. And some variables do both at once (a rainfall total with a real chance of being exactly zero, then a smear over positive amounts), a mixed case the cdf handles gracefully because a staircase and a smooth climb can simply coexist in one graph.