The Uniform and Exponential Distributions

From bars to a smooth curve

You arrive at this rung already comfortable with discrete distributions, where a probability mass function piles a definite chunk of probability on each value. Continuous random variables are different in one disorienting way: any single exact value has probability zero. Ask for the probability that a randomly chosen waiting time is exactly 3.0000... seconds and the answer is 0, because there are uncountably many real numbers competing for the same finite total of 1. Probability now lives not on points but on intervals, and it is described by a [[probability-density-function|probability density function]], written f(x).

Two rules keep a density honest. It must never go negative (you cannot have negative probability anywhere), and its total area must equal exactly 1 (some value certainly occurs). The set of x where f(x) > 0 is the support of the distribution — the region where outcomes can actually land. With this picture in hand we can meet the two friendliest continuous shapes: a flat plateau and a smooth downhill slide.
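
The two rules can be checked numerically. The sketch below is illustrative only: the density f(x) = 2x on [0, 1] and the helper names are hypothetical choices made for this example, and the midpoint rule stands in for exact integration.

```python
def midpoint_area(f, lo, hi, n=10_000):
    """Approximate the area under f between lo and hi with the midpoint rule."""
    width = (hi - lo) / n
    return sum(f(lo + (i + 0.5) * width) for i in range(n)) * width


def triangle_density(x):
    """Hypothetical example density: f(x) = 2x on [0, 1], zero elsewhere."""
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0


# Rule 1: the density is never negative (spot-checked on a grid of points).
never_negative = all(triangle_density(k / 100) >= 0 for k in range(-50, 151))

# Rule 2: the total area over the support [0, 1] is 1.
total_area = midpoint_area(triangle_density, 0.0, 1.0)

# Probability lives on intervals: P(0.5 <= X <= 1) is an area, here 0.75.
upper_half = midpoint_area(triangle_density, 0.5, 1.0)

# A single point is an interval of width zero, so its probability is 0.
single_point = midpoint_area(triangle_density, 0.3, 0.3)

print(never_negative, round(total_area, 6), round(upper_half, 6), single_point)
```

Note that the height f(0.3) = 0.6 is not zero even though the probability of landing exactly on 0.3 is.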

The uniform: perfectly fair over an interval

The [[continuous-uniform-distribution|continuous uniform distribution]] on an interval [a, b], written X ~ Uniform(a, b), is the continuous version of "every outcome equally likely". Its density is flat: a constant height across [a, b] and zero outside. Since the total area must be 1 and the base has width b - a, the height is forced to be 1/(b - a). There is nothing to memorise here beyond "a rectangle of area 1". A spinner that can stop anywhere on a dial, or the rounding error when a clock reports whole seconds, are both well modelled by a uniform.

Because the density is flat, probabilities are just lengths. For Uniform(0, 10), the chance the value lands in [2, 5] is the area of a rectangle of width 3 and height 1/10, namely 3/10; you can read it off without integrating. The cumulative distribution function F(x) = P(X <= x) climbs as a straight ramp, F(x) = (x - a)/(b - a), from 0 at x = a up to 1 at x = b. The mean sits exactly in the middle by symmetry, E[X] = (a + b)/2, and a short integral gives the variance Var(X) = (b - a)^2 / 12. The wider the interval, the more spread out, just as your gut expects.
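
The uniform facts above fit in a few lines of code. This is a minimal sketch; the function names are chosen for this example and are not from any library.

```python
def uniform_pdf(x, a, b):
    """Density of Uniform(a, b): constant 1/(b - a) on [a, b], zero elsewhere."""
    if b <= a:
        raise ValueError("need a < b")
    return 1.0 / (b - a) if a <= x <= b else 0.0


def uniform_cdf(x, a, b):
    """P(X <= x): 0 left of a, a straight ramp on [a, b], 1 right of b."""
    if b <= a:
        raise ValueError("need a < b")
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)


def uniform_interval_prob(lo, hi, a, b):
    """P(lo <= X <= hi), computed as a difference of CDF values."""
    return uniform_cdf(hi, a, b) - uniform_cdf(lo, a, b)


def uniform_mean(a, b):
    return (a + b) / 2


def uniform_var(a, b):
    return (b - a) ** 2 / 12


# The worked example from the text: Uniform(0, 10) landing in [2, 5].
p = uniform_interval_prob(2, 5, 0, 10)
print(round(p, 6), uniform_mean(0, 10), round(uniform_var(0, 10), 4))
```

The interval probability is the width 3 times the height 1/10, which is the same number the CDF difference returns.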

The exponential: the law of waiting

Now bend the flat plateau into a slide. The [[prob-exponential-distribution|exponential distribution]], X ~ Exponential(lambda), models the waiting time until something happens: the time until the next phone call, the next radioactive decay, the next customer at a counter. Its density lives on the non-negative half-line and decays smoothly, f(x) = lambda * e^(-lambda x) for x >= 0, and 0 for x < 0. The single parameter lambda > 0 is a [[rate-intensity-parameter|rate]] — events per unit time. A larger lambda means events arrive faster, so you wait less on average; a smaller lambda means long, patient waits.

It pays to derive the two facts you will actually use, rather than memorising them. The cumulative distribution function comes from integrating the density: F(x) = P(X <= x) = 1 - e^(-lambda x) for x >= 0. Far more useful in practice is its complement, the survival function, the probability you are still waiting at time x: P(X > x) = e^(-lambda x). The mean waiting time turns out to be the reciprocal of the rate, E[X] = 1/lambda, which reads beautifully: if calls arrive at lambda = 3 per minute, you wait on average 1/3 of a minute. The variance is Var(X) = 1/lambda^2.

X ~ Exponential(lambda),   lambda > 0,   x >= 0

  density   f(x) = lambda * e^(-lambda x)
  CDF       F(x) = 1 - e^(-lambda x)
  survival  P(X > x) = e^(-lambda x)

  E[X]   = 1 / lambda          (mean wait = 1 / rate)
  Var(X) = 1 / lambda^2

The exponential distribution in full: one rate parameter lambda controls everything.
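
For readers who want the derivations written out, here is one way to obtain the CDF, the mean and the variance from the density. It is a sketch using integration by parts; the value of E[X^2] is quoted from applying the same technique twice.

```latex
\begin{align*}
F(x) &= \int_0^x \lambda e^{-\lambda t}\,dt
      = \left[-e^{-\lambda t}\right]_0^x
      = 1 - e^{-\lambda x}, \qquad x \ge 0,\\
E[X] &= \int_0^\infty x\,\lambda e^{-\lambda x}\,dx
      = \left[-x e^{-\lambda x}\right]_0^\infty + \int_0^\infty e^{-\lambda x}\,dx
      = 0 + \frac{1}{\lambda},\\
E[X^2] &= \int_0^\infty x^2\,\lambda e^{-\lambda x}\,dx = \frac{2}{\lambda^2},\\
\operatorname{Var}(X) &= E[X^2] - \left(E[X]\right)^2
      = \frac{2}{\lambda^2} - \frac{1}{\lambda^2}
      = \frac{1}{\lambda^2}.
\end{align*}
```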

A tiny worked number anchors it. Suppose a help desk receives calls at lambda = 2 per hour, so the wait until the next call is Exponential(2) measured in hours. The probability of waiting more than 30 minutes (x = 0.5 hours) is P(X > 0.5) = e^(-2 * 0.5) = e^(-1), which is about 0.37. The average wait is 1/2 hour = 30 minutes, yet the chance of exceeding that average is only 37%, not 50%: the long thin tail drags the mean to the right of the median. That asymmetry is a hallmark of the exponential, and a useful reminder that for a skewed distribution the mean and the typical value are not the same thing.
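
The help-desk numbers can be reproduced directly from the formulas. This is a small sketch with function names invented for the example. The median is not stated in the text; it comes from solving e^(-lambda m) = 1/2, which gives m = ln(2)/lambda.

```python
import math


def exp_pdf(x, lam):
    """Density lambda * e^(-lambda x) for x >= 0, zero for x < 0."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0


def exp_cdf(x, lam):
    """P(X <= x) = 1 - e^(-lambda x) for x >= 0."""
    return 1.0 - math.exp(-lam * x) if x >= 0 else 0.0


def exp_survival(x, lam):
    """P(X > x) = e^(-lambda x): the chance of still waiting at time x."""
    return math.exp(-lam * x) if x >= 0 else 1.0


def exp_median(lam):
    """Solve e^(-lambda m) = 1/2 for m."""
    return math.log(2) / lam


lam = 2.0                                   # calls per hour, as in the text
p_over_half_hour = exp_survival(0.5, lam)   # e^(-1), about 0.37
mean_wait = 1.0 / lam                       # 0.5 hours
median_wait = exp_median(lam)               # ln(2)/2 hours, less than the mean

print(round(p_over_half_hour, 4), mean_wait, round(median_wait, 4))
```

The median falls below the mean, which is the skew the paragraph describes: half of all waits are shorter than the median, while a few long waits pull the average up.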

Why the exponential, and not some other curve?

The exponential is not picked for convenience; it is forced on us by a natural assumption. Recall the discrete Poisson distribution from the previous rung, the law of rare events that counts how many things happen in a fixed window. The continuous companion is the Poisson process: events that occur at a constant average rate lambda, independently, with no two happening at exactly the same instant. The exponential is precisely the distribution of the time between consecutive events of such a process. Counting events gives Poisson; timing the gaps between them gives exponential. They are two views of the same underlying randomness.

Start with the survival function: "still waiting at time x" means no event has occurred in [0, x].
In a Poisson process the number of events in a window of length x is Poisson with mean lambda*x, so the probability of zero events is e^(-lambda x).
Therefore P(X > x) = e^(-lambda x); the waiting time exceeds x exactly when the count in [0, x] is zero.
The CDF is the complement of the survival function, F(x) = 1 - P(X > x) = 1 - e^(-lambda x); differentiate it to recover the density f(x) = lambda*e^(-lambda x). The exponential was inevitable once you assumed a constant rate.
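
The argument above can be tested by simulation without ever sampling from an exponential. The sketch below approximates a Poisson process by chopping time into tiny steps of length dt and letting an event occur in each step with probability lambda*dt. That discretisation, the step size, the seed and the function name are all assumptions of this example. The gaps between events should then have mean close to 1/lambda and survival close to e^(-lambda x).

```python
import math
import random


def simulate_gaps(lam, total_time, dt=0.001, seed=0):
    """Gaps between events of a constant-rate process, built from tiny time steps.

    Each step of length dt independently contains an event with probability
    lam * dt (an approximation that improves as dt shrinks).
    """
    rng = random.Random(seed)
    p_event = lam * dt
    gaps = []
    last_event = 0.0
    for i in range(1, round(total_time / dt) + 1):
        if rng.random() < p_event:
            t = i * dt
            gaps.append(t - last_event)
            last_event = t
    return gaps


lam = 2.0
gaps = simulate_gaps(lam, total_time=2000.0)

mean_gap = sum(gaps) / len(gaps)                       # theory: 1 / lam
frac_over_half = sum(g > 0.5 for g in gaps) / len(gaps)  # theory: e^(-lam * 0.5)

print(len(gaps), round(mean_gap, 3), round(frac_over_half, 3))
print("theory:", 1 / lam, round(math.exp(-lam * 0.5), 3))
```

Nothing in the simulation mentions the exponential; it only assumes a constant event rate, and the exponential shape of the gaps emerges from that assumption.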

This origin story is also the source of the exponential's most famous and most counterintuitive property: memorylessness. If you have already waited 10 minutes for a call, the distribution of the *remaining* wait is exactly the same as it was at the start; the process does not 'warm up' or 'become overdue'. Believing that a call is 'due' after a long wait is the continuous cousin of the gambler's fallacy: an Exponential(lambda) waiting time has no memory of how long you have stood there. Memorylessness is a genuine feature of constant-rate randomness, not a paradox, and the very next guide is devoted to pinning down exactly what it does and does not mean.
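
Memorylessness can be read straight off the survival function: P(X > s + t | X > s) = e^(-lambda (s + t)) / e^(-lambda s) = e^(-lambda t). A quick numerical check follows, with a rate of 0.2 calls per minute chosen purely for illustration.

```python
import math


def survival(x, lam):
    """P(X > x) for X ~ Exponential(lam), x >= 0."""
    return math.exp(-lam * x)


def conditional_survival(s, t, lam):
    """P(X > s + t | X > s) = P(X > s + t) / P(X > s)."""
    return survival(s + t, lam) / survival(s, lam)


lam = 0.2            # assumed rate: calls per minute
already_waited = 10  # minutes already spent waiting
extra = 5            # further minutes in question

fresh = survival(extra, lam)                                  # starting from scratch
after_waiting = conditional_survival(already_waited, extra, lam)

print(round(fresh, 6), round(after_waiting, 6))
```

The two printed values agree up to floating-point rounding: the 10 minutes already waited do not change the chance of waiting 5 more.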

Honest fine print and traps

Both models earn their simplicity by making strong assumptions, and using them where those assumptions fail is the usual mistake. The uniform assumes truly equal likelihood across the whole interval — but real 'random' choices are rarely flat. Human-picked numbers, file sizes, and city populations are all famously non-uniform, so reaching for Uniform(a, b) just because you 'have no information' can be quietly wrong. The exponential assumes a constant rate that never changes with age. That is excellent for radioactive decay and random arrivals, but a poor fit for machine parts or living organisms, whose failure rate climbs as they wear out.

The fix for that last failure points ahead. When the rate is allowed to change with age (captured by the hazard rate, a tool the rest of this rung develops), the exponential generalises into richer families. Summing k independent exponentials that share the same rate gives the gamma distribution (the wait for the k-th event, not just the first), and bending the constant rate into a power of time gives the Weibull, the standard model of wear-out. The exponential is the keystone: the special memoryless case sitting at the centre of a whole family of waiting-time laws you will explore in guide 5.
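
As a preview of the gamma connection, the wait for the k-th event can be simulated by adding k independent exponential gaps with the same rate. Because means add, and variances add for independent terms, the sum should have mean k/lambda and variance k/lambda^2. The sketch uses `random.expovariate` from Python's standard library; the seed, sample size and parameter values are arbitrary choices for this example.

```python
import random


def wait_for_kth_event(k, lam, rng):
    """Sum of k independent Exponential(lam) gaps: the time of the k-th event."""
    return sum(rng.expovariate(lam) for _ in range(k))


rng = random.Random(1)
k, lam, n = 3, 2.0, 20_000

samples = [wait_for_kth_event(k, lam, rng) for _ in range(n)]
sample_mean = sum(samples) / n
sample_var = sum((s - sample_mean) ** 2 for s in samples) / (n - 1)

print(round(sample_mean, 3), "theory:", k / lam)
print(round(sample_var, 3), "theory:", k / lam ** 2)
```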

Finally, a reminder of the bedrock idea this guide introduced, because it underlies everything that follows. For a continuous variable, the value f(x) is not the answer to any probability question on its own; only an area is. When someone reports 'the probability of exactly 4.2 seconds', they are speaking loosely — that probability is 0, and what they almost always mean is the probability of a small interval around 4.2, which is approximately f(4.2) times the interval's width. Keep height and area in separate drawers, and the continuous world stops being confusing.
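
The height-versus-area distinction is easy to see numerically. The sketch below assumes an exponential waiting time with a hypothetical rate of 0.25 per second and compares the exact probability of a small interval around 4.2 with the shortcut f(4.2) times the width.

```python
import math


def exp_pdf(x, lam):
    """Density lambda * e^(-lambda x) for x >= 0."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0


def exp_cdf(x, lam):
    """P(X <= x) = 1 - e^(-lambda x) for x >= 0."""
    return 1.0 - math.exp(-lam * x) if x >= 0 else 0.0


lam = 0.25            # assumed rate per second, for illustration only
x, width = 4.2, 0.1   # a 0.1-second window centred on 4.2 seconds

exact = exp_cdf(x + width / 2, lam) - exp_cdf(x - width / 2, lam)  # an area
approx = exp_pdf(x, lam) * width                                   # height * width
at_a_point = exp_cdf(x, lam) - exp_cdf(x, lam)                     # zero-width interval

print(round(exact, 6), round(approx, 6), at_a_point)

# A density is a height, not a probability, so it may exceed 1.
print(exp_pdf(0.0, 3.0))
```

The approximation is good because the window is narrow; it degrades as the width grows and the density changes noticeably across the interval.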