Why the pmf trick breaks down
In the previous guide we tamed a discrete variable with the probability mass function: a list that assigns a chunk of probability to each value the variable can take, with all the chunks summing to 1. That works beautifully when the values are separated, like the 1, 2, 3, 4, 5 or 6 dots on a die, so you can give each one its own positive slice. Now picture a variable that can land on ANY real number in a range: the exact time in seconds you wait for a bus, the precise height of the next person through the door, the angle a spun pointer settles at. This is a continuous random variable, the other half of the discrete-versus-continuous split.
Here the pmf trick collapses, and the reason is worth feeling in your bones. Suppose the bus is equally likely to arrive at any instant in the next 10 minutes. How much probability sits exactly at 'wait = 3.000000... minutes', to infinite decimal places? There are uncountably many such instants crammed into the interval, and 'equally likely' means each must carry the same slice. If that slice were positive, however tiny, infinitely many copies of it would add up to infinity, not to 1. The only consistent answer is that each EXACT value carries probability zero. The same holds for every continuous variable, not just the uniform one: P(X = c) = 0 for every single c.
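The counting argument can be checked with exact arithmetic. The sketch below is illustrative only: the function name `instants_needed_to_exceed_one` and the sample slice sizes are assumptions, not part of the guide. It suggests that however small an equal positive slice is, a finite number of instants already pushes the total past 1, and the interval holds far more instants than that.

```python
from fractions import Fraction
import math


def instants_needed_to_exceed_one(slice_size):
    """Smallest number of instants whose equal slices total more than 1."""
    return math.floor(1 / slice_size) + 1


# Hypothetical slice sizes; Fraction keeps the arithmetic exact.
for slice_size in (Fraction(1, 10), Fraction(1, 1000), Fraction(1, 10**9)):
    count = instants_needed_to_exceed_one(slice_size)
    total = count * slice_size
    print(f"slice {slice_size}: {count} instants already total {float(total)}")
```

No positive slice survives this test, which is why the only slice left is zero.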
From mass to density: spread it out
If no single point can hold any mass, where did the probability go? We smear it out. Instead of asking how much sits AT a point, we ask how thickly probability is packed NEAR a point — its density. The tool that records this is the probability density function, written f(x). It is exactly the same move as physics: a thin wire has zero mass at any single point, yet it has a mass-per-centimetre at each point, and you recover the mass of a stretch by integrating that density over the stretch. Probability density plays the identical role, with 'probability' standing in for 'mass'.
So probabilities become AREAS. The probability that X lands in an interval [a, b] is the area under the density curve between a and b — that is, the integral P(a <= X <= b) = integral from a to b of f(x) dx. This is why a single point gives zero: the 'interval' from c to c has no width, and a region of zero width has zero area, so P(X = c) = 0 falls right out. A happy by-product: for a continuous variable the endpoints never matter, P(a <= X <= b) = P(a < X < b), because the two extra points contribute zero area each.
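As a rough numerical sketch of "probability is area", the snippet below estimates the integral with a midpoint rule. The helper `area_under`, the function `bus_density`, and the chosen limits are assumptions made for illustration; the density is the 10-minute bus wait from earlier, with height 1/10.

```python
def area_under(density, a, b, steps=10_000):
    """Midpoint-rule estimate of the integral of density over [a, b]."""
    width = (b - a) / steps
    return sum(density(a + (i + 0.5) * width) for i in range(steps)) * width


def bus_density(x):
    """Bus wait, uniform on [0, 10] minutes: height 1/10 inside, 0 outside."""
    return 0.1 if 0.0 <= x <= 10.0 else 0.0


p_interval = area_under(bus_density, 2.0, 5.0)  # P(2 <= X <= 5), close to 0.3
p_point = area_under(bus_density, 3.0, 3.0)     # zero width, so zero area
print(p_interval, p_point)
```

The second call is the P(X = c) = 0 statement in code form: the strips have zero width, so the area is zero whatever the height.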
A density cannot be just any curve; it must obey two rules that mirror the pmf's rules. First, it can never dip below zero, f(x) >= 0 everywhere — you cannot have negative probability anywhere. Second, the TOTAL area underneath must equal 1: the integral over the whole line of f(x) dx = 1, because X must land somewhere with certainty. The set of x where f(x) > 0 is the support of the distribution; outside it the density is flat zero, and the variable simply never goes there.
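The two rules can be spot-checked numerically. This is a sketch under stated assumptions: `looks_like_density` is a hypothetical helper, it assumes the candidate is zero outside [low, high], and it samples only a finite grid, so it can miss a narrow negative dip. The three candidate functions are made up for the example.

```python
def looks_like_density(density, low, high, steps=10_000, tol=1e-6):
    """Spot-check f(x) >= 0 on a grid over [low, high] and total area near 1.

    Assumes the candidate density is zero outside [low, high].
    """
    width = (high - low) / steps
    midpoints = [low + (i + 0.5) * width for i in range(steps)]
    non_negative = all(density(x) >= 0.0 for x in midpoints)
    total_area = sum(density(x) for x in midpoints) * width
    return non_negative and abs(total_area - 1.0) < tol


flat_ok = looks_like_density(lambda x: 0.1, 0.0, 10.0)          # area 1, never negative
too_much = looks_like_density(lambda x: 1.0, 0.0, 10.0)         # area 10
dips_below = looks_like_density(lambda x: 3 * x - 0.5, 0.0, 1.0)  # area 1 but negative near 0
print(flat_ok, too_much, dips_below)
```

The third candidate is the instructive one: its total area is 1, yet it fails because it goes negative, so both rules are needed.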
The headline: a density is not a probability
Now the single most important point of this whole guide, the one in the title: a density value f(x) is not a probability. It trips up almost everyone at first. The number f(x) is a HEIGHT, not a chance. Probabilities live in the AREA under the curve, not in the curve's elevation at a point. You can read off f(x) at a value and learn how likely that neighbourhood is relative to others, but f(x) itself is not P(anything).
The cleanest proof that a density is not a probability is that it can exceed 1. Take a uniform distribution on the short interval from 0 to 1/2. The total area must be 1, and that area is height times width = height times (1/2), so the height must be 2. The density is f(x) = 2 over that interval: a perfectly valid density with a value of 2, far above any probability, which can never exceed 1. No contradiction: 2 is a height, and the area it makes over a width of 1/2 is exactly 2 times 1/2 = 1, as required. Every probability is at most 1; a density value need not be, so a density value cannot be a probability.
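The height-times-width bookkeeping is easy to tabulate. In this small sketch the function name `uniform_height` and the three intervals are assumptions chosen for illustration; the point is only that the height grows as the interval shrinks while the area stays at 1.

```python
def uniform_height(low, high):
    """Height of the flat density on [low, high]: total area 1 divided by width."""
    return 1.0 / (high - low)


for low, high in [(0.0, 1.0), (0.0, 0.5), (0.0, 0.1)]:
    height = uniform_height(low, high)
    area = height * (high - low)
    print(f"uniform on [{low}, {high}]: height {height:g}, area {area:g}")
```

Squeeze the interval to width 1/10 and the height climbs to 10, still a legal density.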
So how should you read a density value at all? Through a TINY interval. The probability that X lands in a sliver of width dx around the point x is approximately f(x) times dx — height times width, an area. So f(x) tells you probability PER UNIT of x, a rate, just like speed is distance per unit time. Doubling f(x) at a point means probability piles up there twice as fast, so a small window there is twice as likely — but you only get an actual probability once you multiply by a width and form an area.
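To see how good the sliver approximation is, the sketch below compares f(x) times dx with the exact area. It assumes, purely for illustration, the density f(x) = 2x on [0, 1], whose exact sliver probability is (x + dx)^2 - x^2; the function names are hypothetical.

```python
def density(x):
    """Illustrative density f(x) = 2x on [0, 1], zero elsewhere."""
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0


def exact_probability(x, dx):
    """Exact P(x < X < x + dx) for this density (valid while x and x + dx stay in [0, 1])."""
    return (x + dx) ** 2 - x ** 2


x = 0.5
for dx in (0.1, 0.01, 0.001):
    approx = density(x) * dx          # height times width
    exact = exact_probability(x, dx)  # true area of the sliver
    print(f"dx={dx:g}: f(x)*dx={approx:.6f}, exact={exact:.6f}, ratio={approx / exact:.4f}")
```

The gap between the two is dx squared here, so the ratio drifts toward 1 as the sliver narrows; that is the sense in which f(x) dx is "approximately" the probability.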
discrete:   P(X = x) = pmf(x)   (an actual probability)
continuous: P(X = x) = 0   (every single point)
            P(a <= X <= b) = integral from a to b of f(x) dx   (an AREA)
            P(x < X < x+dx) ~ f(x) dx   (height times width)
rules:      f(x) >= 0 and integral over all x of f(x) dx = 1
note:       f(x) is a height / rate, NOT a probability; it can exceed 1
A worked sliver
Let us make it concrete with the simplest continuous model. Spin a fair pointer so the resting angle X is uniform on [0, 1] (think of it as a fraction of a full turn). Its density is flat: f(x) = 1 for x in [0, 1], and 0 outside. Check the rules: f is never negative, and the total area is height 1 times width 1 = 1. Good. Now find the probability the pointer lands in the first quarter, between 0 and 0.25. It is the area of a rectangle: height 1 times width 0.25 = 0.25. Here the height happens to be 1, a number that could pass for a probability, but that is a coincidence of the uniform-on-[0, 1] case and it does not generalise.
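A quick simulation gives the same first-quarter answer from the other direction. This is a sketch, not a proof: the function name `fraction_in_interval`, the trial count, and the seed are arbitrary assumptions, and the result is an estimate that wobbles slightly around the true area.

```python
import random


def fraction_in_interval(a, b, trials=100_000, seed=0):
    """Share of simulated uniform spins on [0, 1) that land in [a, b]."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(trials) if a <= rng.random() <= b)
    return hits / trials


estimate = fraction_in_interval(0.0, 0.25)
print(f"simulated {estimate:.4f} versus area 1 * 0.25 = 0.25")
```

The long-run share of spins landing in the window tracks the area over the window, not the height of the curve.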
- Take a non-flat density to break the coincidence: f(x) = 2x on [0, 1], zero elsewhere (more probability piled toward 1).
- Confirm it is a legal density: it is never negative on [0, 1], and the area is the integral of 2x from 0 to 1, which equals x^2 evaluated from 0 to 1 = 1. Good.
- Read the height at x = 0.5: f(0.5) = 2 times 0.5 = 1. That is the value 1, but it is a HEIGHT, not a probability — the chance of exactly 0.5 is still zero.
- Find a real probability by integrating, e.g. P(0 <= X <= 0.5) = integral of 2x from 0 to 0.5 = 0.5^2 = 0.25. That area is the answer.
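The steps in the list above can be sketched numerically. The helper `area_under` is a hypothetical midpoint-rule integrator written for this example; only the density f(x) = 2x comes from the text.

```python
def area_under(density, a, b, steps=10_000):
    """Midpoint-rule estimate of the integral of density over [a, b]."""
    width = (b - a) / steps
    return sum(density(a + (i + 0.5) * width) for i in range(steps)) * width


def density(x):
    """f(x) = 2x on [0, 1], zero elsewhere."""
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0


total_area = area_under(density, 0.0, 1.0)  # legality check: should be close to 1
height_at_half = density(0.5)               # a height, not a probability
left_half = area_under(density, 0.0, 0.5)   # P(0 <= X <= 0.5), close to 0.25
print(total_area, height_at_half, left_half)
```

The height at 0.5 and the probability of the whole left half come out as different numbers, which is the contrast the worked example is built to show.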
Notice the punchline: at x = 0.5 the density was 1, yet the probability of being anywhere in the entire left HALF of the range was only 0.25. The height and the probability are different numbers carrying different meanings, and only the integral converts one into the other. Whenever someone shows you a curve and asks for a probability, your reflex should be 'where are my limits, and what is the area?' — never just 'read off the height'.
Where the density fits, and what comes next
Step back and the picture is symmetric. A discrete variable is described by a pmf, where probability sits in heights and you SUM; a continuous variable is described by a pdf, where probability sits in area and you INTEGRATE. The most famous bell-shaped curve, the normal distribution, and the lifetime model, the exponential distribution, are both just particular densities — particular shapes of f(x) whose areas you integrate to get probabilities. Learning their stories is the work of the next rung; for now, every one of them lives or dies by the same rule: area, not height.
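The same area rule applies unchanged to the two named curves. The sketch below assumes the standard normal (mean 0, standard deviation 1) and an exponential with rate 1; those parameter choices, the intervals, and the helper `area_under` are illustrative assumptions rather than anything fixed by this guide.

```python
import math


def area_under(density, a, b, steps=10_000):
    """Midpoint-rule estimate of the integral of density over [a, b]."""
    width = (b - a) / steps
    return sum(density(a + (i + 0.5) * width) for i in range(steps)) * width


def normal_density(x):
    """Standard normal density (mean 0, standard deviation 1)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def exponential_density(x, rate=1.0):
    """Exponential density with the given rate; zero for negative x."""
    return rate * math.exp(-rate * x) if x >= 0.0 else 0.0


p_normal = area_under(normal_density, -1.0, 1.0)           # within one standard deviation
p_exponential = area_under(exponential_density, 0.0, 1.0)  # lifetime of at most 1 unit
print(f"normal: {p_normal:.4f}, exponential: {p_exponential:.4f}")
```

Only the shape of f(x) changed between the two calls; the integration step that turns height into probability is identical.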
There is also a clean bridge waiting in the next guide. Both the pmf and the pdf can be folded into one object that works for discrete and continuous variables alike: the cumulative distribution function F(x) = P(X <= x), the running total of probability up to x. For a continuous variable it is the running AREA, F(x) = integral from minus-infinity to x of f(t) dt. So the cdf is the accumulated density, and by the fundamental theorem of calculus the density is the slope of the cdf, f(x) = F'(x), at every point where f is continuous. That single function will then hand us quantiles, medians, and the survival function in the guides that follow.
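As a preview of that bridge, here is a minimal sketch using the worked density f(x) = 2x on [0, 1], whose cdf is F(x) = x^2 on that range. The function names and the central-difference step size are assumptions made for illustration.

```python
def density(x):
    """f(x) = 2x on [0, 1], zero elsewhere."""
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0


def cdf(x):
    """F(x) = P(X <= x) for this density: 0 below 0, x^2 on [0, 1], 1 above 1."""
    if x < 0.0:
        return 0.0
    if x > 1.0:
        return 1.0
    return x * x


def slope(function, x, h=1e-6):
    """Central-difference estimate of the derivative of function at x."""
    return (function(x + h) - function(x - h)) / (2.0 * h)


for x in (0.25, 0.5, 0.75):
    print(f"x={x}: F(x)={cdf(x):.4f}, slope of F={slope(cdf, x):.4f}, f(x)={density(x):.4f}")
```

At each sampled point the estimated slope of F lines up with the density, while F itself is a genuine probability that never leaves the range from 0 to 1.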