The Poisson Distribution: The Law of Rare Events

A binomial pushed to its limit

In the first guide of this rung you met the binomial distribution: fix n independent trials, each succeeding with probability p, and count the successes. Now imagine a strange kind of binomial, one with a HUGE number of trials, each almost certain to fail. A big city has hundreds of thousands of people; in any given hour, each one has a tiny chance of calling 999. A page of a book has thousands of character positions; each has a tiny chance of holding a typo. A square metre of night sky has countless directions a meteor could come from, each vanishingly unlikely. In every case n is enormous and p is minuscule, yet the AVERAGE number of events, n times p, is a sensible, moderate number.

Here is the magic. As you let n run off to infinity and p shrink to zero, but hold their product fixed at some number we call lambda (so lambda = n times p), the binomial pmf stops depending on n and p separately. It converges to one clean formula that depends ONLY on lambda. That limiting shape is the Poisson distribution, and lambda is its single dial — the binomial-to-Poisson limit in action. We write X ~ Poisson(lambda). It is the distribution you reach for whenever you are counting how many of a great many rare, independent chances actually happened.
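To watch that limit happen numerically, here is a minimal Python sketch. It holds n times p fixed at lambda = 2 (an illustrative value) and reports how far the binomial pmf sits from the Poisson pmf as n grows. The helper names binomial_pmf, poisson_pmf and largest_gap are made up for this example, and the comparison only looks at k = 0 to 10.

```python
import math

def binomial_pmf(k, n, p):
    """Exact Binomial(n, p) probability of exactly k successes."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    """Poisson(lam) probability of exactly k events."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def largest_gap(n, lam, k_max=10):
    """Biggest |binomial - Poisson| difference over k = 0..k_max, with p = lam / n."""
    p = lam / n  # keeps n * p fixed at lam
    return max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam))
               for k in range(k_max + 1))

lam = 2.0  # illustrative choice, not special
for n in (10, 100, 1000, 10000):
    print(f"n = {n:>5}: largest gap over k = 0..10 is {largest_gap(n, lam):.6f}")
```

The gap should shrink by roughly a factor of ten each time n grows tenfold, which is the convergence described above seen in numbers.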

The formula and what each piece means

The Poisson pmf gives the probability of seeing exactly k events: P(X = k) = e^(-lambda) times lambda^k / k!, for k = 0, 1, 2, 3, and on forever. Notice there is no upper limit — unlike the binomial, where you can have at most n successes, a Poisson count can in principle be any non-negative whole number. Let us read the formula in pieces. The lambda^k term grows as k grows; the k! in the denominator eventually crushes it back down; and the e^(-lambda) out front is the normalising constant that makes all the probabilities sum to exactly 1.

P(X = k) = e^(-lambda) * lambda^k / k!      k = 0, 1, 2, 3, ...

Example: lambda = 2 (avg 2 events per interval)
  P(X=0) = e^(-2) * 1 / 1   = 0.1353
  P(X=1) = e^(-2) * 2 / 1   = 0.2707
  P(X=2) = e^(-2) * 4 / 2   = 0.2707
  P(X=3) = e^(-2) * 8 / 6   = 0.1804
  P(X=4) = e^(-2) * 16 / 24 = 0.0902
  ... (these and the rest sum to 1)

The Poisson pmf, evaluated at lambda = 2. The most likely counts cluster near lambda.
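The table can be rebuilt in a few lines of Python. This sketch uses the fact that each Poisson probability is the previous one multiplied by lambda / k, which follows directly from the formula and avoids computing large factorials. The function name poisson_table is an assumption made for this example.

```python
import math

def poisson_table(lam, k_max):
    """Return [P(X=0), ..., P(X=k_max)] for X ~ Poisson(lam)."""
    probs = [math.exp(-lam)]               # P(X=0) = e^(-lam)
    for k in range(1, k_max + 1):
        probs.append(probs[-1] * lam / k)  # P(X=k) = P(X=k-1) * lam / k
    return probs

for k, prob in enumerate(poisson_table(2.0, 4)):
    print(f"P(X={k}) = {prob:.4f}")
```

The ratio lambda / k is above 1 while k is below lambda and below 1 afterwards, so the probabilities rise, peak near lambda, and then fall away.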

A worked picture helps. Suppose a small bakery sells, on average, 2 birthday cakes a day, and sales are spread out and independent. Then daily cake sales are roughly Poisson(2). The table above says: about 14% of days see zero cakes, about 27% see exactly one, another 27% see exactly two, and the long tail (5 or more in a day) is rare but never impossible. The probability of a really busy day, P(X >= 5), is best found by the complement: 1 minus P(0) minus P(1) minus P(2) minus P(3) minus P(4), which here is about 0.053 — roughly one bumper day every three weeks.
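The complement trick is easy to check in Python. The sketch below assumes the bakery really is Poisson(2); poisson_pmf and poisson_tail are names chosen for this illustration.

```python
import math

def poisson_pmf(k, lam):
    """Poisson(lam) probability of exactly k events."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def poisson_tail(k_min, lam):
    """P(X >= k_min), found as 1 minus the finite sum P(0) + ... + P(k_min - 1)."""
    return 1.0 - sum(poisson_pmf(k, lam) for k in range(k_min))

busy = poisson_tail(5, 2.0)
print(f"P(X >= 5) = {busy:.4f}")
print(f"roughly one such day in every {1 / busy:.0f}")
```

Summing five terms and subtracting from 1 replaces an infinite sum over k = 5, 6, 7, and so on, which is why the complement is the usual route for upper tails.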

Mean and variance are both lambda

The Poisson has a signature property that is easy to remember and surprisingly useful: its mean and its variance are the SAME number, lambda. That is, E[X] = lambda and Var(X) = lambda. You can see why from the binomial heritage. A Binomial(n, p) has mean n times p and variance n times p times (1 - p). Push to the Poisson limit: the mean n times p is exactly lambda, and the variance n times p times (1 - p) becomes lambda times (1 - p), but p has gone to zero, so (1 - p) goes to 1 and the variance becomes lambda too. The two quantities meet in the limit.
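A numerical check of that property, as a rough Python sketch: sum k times P(X = k) and k squared times P(X = k) straight from the pmf, then compare both results with lambda. The function name poisson_moments and the cut-off k_max = 100 are assumptions of this example; the cut-off is generous for moderate lambda but would need raising for very large rates.

```python
import math

def poisson_moments(lam, k_max=100):
    """Approximate (mean, variance) of Poisson(lam) by summing the pmf up to k_max."""
    prob = math.exp(-lam)      # P(X=0)
    mean = 0.0
    second_moment = 0.0
    for k in range(k_max + 1):
        mean += k * prob
        second_moment += k * k * prob
        prob *= lam / (k + 1)  # step from P(X=k) to P(X=k+1)
    return mean, second_moment - mean**2

for lam in (0.5, 2.0, 7.3):
    mean, variance = poisson_moments(lam)
    print(f"lambda = {lam}: mean = {mean:.6f}, variance = {variance:.6f}")
```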

Be careful not to read more into lambda than it carries. Lambda is a rate or an average count per fixed window — 2 cakes per day, 3 typos per page, 1.2 calls per minute. Doubling the window doubles the average: if cakes are Poisson(2) per day, then over two days they are Poisson(4), because lambda scales with the size of the interval. But lambda is NOT a probability — it can be any positive number, even larger than 1, and the probability of any single count is always e^(-lambda) lambda^k / k!, never lambda itself.
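The window scaling can be sketched in Python by treating lambda as rate times window length. The parameter names rate and window are introduced here for illustration, and the sketch assumes the rate stays constant across the whole window.

```python
import math

def poisson_pmf(k, rate, window=1.0):
    """P(exactly k events) when events arrive at `rate` per unit window."""
    lam = rate * window  # lambda scales with the size of the window
    return math.exp(-lam) * lam**k / math.factorial(k)

rate_per_day = 2.0  # the bakery's 2 cakes per day
print(f"no cakes in one day:  {poisson_pmf(0, rate_per_day, window=1.0):.4f}")
print(f"no cakes in two days: {poisson_pmf(0, rate_per_day, window=2.0):.4f}")
print(f"no cakes in half a day: {poisson_pmf(0, rate_per_day, window=0.5):.4f}")
```

Over two days lambda is 4, which is larger than 1 and plainly not a probability, while every value the function returns still lies between 0 and 1. The two-day figure also equals the one-day figure squared, as it should if the days are independent.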

The three conditions for rare events

The name law of rare events is the heart of the matter. The Poisson earns its place precisely when events are individually rare but there are so many opportunities that some happen anyway. To trust the model, three conditions should roughly hold over your counting window. Each one is an honest assumption you can check — and when one breaks, the Poisson breaks with it.

Independence: one event happening should not change the chance of another. If your local 999 calls all spike together because of one car crash, that single cause links them, and the independence assumption fails.
Constant rate: the average rate lambda per unit window should not drift. Typos per page should not get heavier toward the end of the book; if they do, a single lambda no longer describes the whole thing.
No simultaneity: in a tiny enough sub-window, at most one event can occur — events do not arrive two-at-once. Counting raindrops works; counting hailstones that fall in clumps does not.

When all three hold, the same lambda governs events scattered across time or space, and the count over any window is Poisson. This is the seed of a much bigger idea you will meet later in the ladder — the Poisson process, which is just 'a Poisson count in every window, all stitched together consistently'. For now, the practical upshot is the law of rare events itself: lots of unlikely-but-independent chances, summed up, behave like Poisson(lambda) with lambda equal to the expected total.
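One way to see a broken condition in data is to simulate it. The Python sketch below is an illustration under stated assumptions, not a formal test: it draws daily counts at a steady rate of 2, then at a hypothetical drifting rate that flips between 0.5 and 3.5 (still averaging 2), and compares the variance-to-mean ratio of each sample. The sampler uses Knuth's multiplication method so that only the standard library is needed; the names sample_poisson and dispersion are made up for this example.

```python
import math
import random

def sample_poisson(lam, rng):
    """Draw one Poisson(lam) count using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def dispersion(counts):
    """Variance-to-mean ratio; close to 1 when the counts are Poisson."""
    mean = sum(counts) / len(counts)
    variance = sum((c - mean) ** 2 for c in counts) / len(counts)
    return variance / mean

rng = random.Random(0)  # fixed seed so the sketch is repeatable
days = 20000

steady = [sample_poisson(2.0, rng) for _ in range(days)]
# Hypothetical drifting rate: quiet days (0.5) and busy days (3.5), average still 2.
drifting = [sample_poisson(rng.choice((0.5, 3.5)), rng) for _ in range(days)]

print(f"steady rate:   variance/mean = {dispersion(steady):.2f}")
print(f"drifting rate: variance/mean = {dispersion(drifting):.2f}")
```

With a steady rate the ratio should sit near 1, matching the mean-equals-variance property. With the drifting rate the mean is unchanged but the variance is noticeably larger, a sign that a single lambda no longer describes the data.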

Using Poisson to approximate a binomial

Because the Poisson is the limit of the binomial, you can run the logic backwards as a calculation shortcut. If you have a genuine binomial with large n and small p, the binomial pmf is a pain: those binomial coefficients and the p^k (1-p)^(n-k) factors get ugly fast. The Poisson approximation lets you replace it with a Poisson(lambda) where lambda = n times p, and just use e^(-lambda) lambda^k / k! instead. The answer is slightly rougher but far easier to compute, and it is remarkably close when n is large and p is small.

A concrete check. Suppose a factory makes 1000 lightbulbs and each is independently defective with probability 0.002. The exact count of defects is Binomial(1000, 0.002). Set lambda = 1000 times 0.002 = 2 and approximate by Poisson(2). The chance of zero defects is e^(-2) = 0.1353 by Poisson; the exact binomial value is 0.998^1000 = 0.1351. The chance of exactly two defects is 0.2707 by Poisson versus 0.2709 exact. The two are nearly indistinguishable — and the Poisson took one short formula instead of a clumsy binomial coefficient.
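The lightbulb numbers can be reproduced with a short Python sketch. It relies on math.comb for the binomial coefficient, which needs Python 3.8 or later; the function names are chosen for this illustration.

```python
import math

def binomial_pmf(k, n, p):
    """Exact Binomial(n, p) probability of exactly k successes."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    """Poisson(lam) probability of exactly k events."""
    return math.exp(-lam) * lam**k / math.factorial(k)

n, p = 1000, 0.002  # 1000 bulbs, each defective with probability 0.002
lam = n * p         # Poisson parameter for the approximation

for k in (0, 2):
    exact = binomial_pmf(k, n, p)
    approx = poisson_pmf(k, lam)
    print(f"k = {k}: binomial {exact:.4f}, Poisson {approx:.4f}")
```

Changing n and p while keeping their product at 2 is a quick way to see where the approximation starts to strain: try n = 10 and p = 0.2.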

Where the Poisson sits in the zoo

Step back and place the Poisson among its neighbours. The binomial counts successes in a FIXED number of trials; the Poisson counts events in a fixed window of time or space with no natural upper bound. The geometric you met in the previous guide counts trials until the FIRST success and is memoryless; the Poisson is the matching count over an interval, and the waiting times between its events turn out to follow the exponential distribution, the continuous, memoryless cousin of the geometric. These are not separate animals so much as one family seen from different angles.

One more elegant fact worth carrying forward: Poissons ADD. If X ~ Poisson(lambda_1) counts emails and Y ~ Poisson(lambda_2) counts texts, and the two are independent, then X + Y ~ Poisson(lambda_1 + lambda_2) counts all messages. The rates simply sum. This stability under addition is rare and beautiful — most distributions change shape when you add them — and it is what makes the Poisson the natural language for combining independent streams of rare events. Deciding when this distribution, rather than its cousins, is the right choice is exactly the job of the last guide in this rung, on choosing the discrete model.
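The addition rule can be checked numerically. The Python sketch below computes the distribution of X + Y the long way, by summing over every split of the total between emails and texts, and compares it with a single Poisson at the summed rate. The rates 3.0 and 1.5 are hypothetical values picked for the example, and sum_pmf is a made-up helper name.

```python
import math

def poisson_pmf(k, lam):
    """Poisson(lam) probability of exactly k events."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def sum_pmf(total, lam_1, lam_2):
    """P(X + Y = total) for independent X ~ Poisson(lam_1), Y ~ Poisson(lam_2)."""
    return sum(poisson_pmf(j, lam_1) * poisson_pmf(total - j, lam_2)
               for j in range(total + 1))

lam_emails, lam_texts = 3.0, 1.5  # hypothetical rates per hour
for total in range(6):
    convolved = sum_pmf(total, lam_emails, lam_texts)
    direct = poisson_pmf(total, lam_emails + lam_texts)
    print(f"P(total = {total}): split-by-split {convolved:.6f}, Poisson(4.5) {direct:.6f}")
```

The two columns should agree to within floating-point error, which is the addition rule in action. Note that the multiplication inside sum_pmf is only valid because X and Y are assumed independent.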