Why a map, and not just five formulas
Across the last four guides you met the whole discrete cast: the Bernoulli atom and its sum the binomial, the geometric waiting time and its generalization the negative binomial, the Poisson law of rare events, and the hypergeometric for sampling without replacement. Knowing each one in isolation is necessary but not sufficient. The real skill — the one this final guide builds — is the [[choosing-the-discrete-model|choosing the discrete model]] step: standing in front of a fresh word problem and recognising which distribution it is, the way a birdwatcher names a bird in two seconds from its silhouette and how it moves.
The good news is that the choice is almost never a guess. Each distribution answers one specific kind of question and rests on a specific set of assumptions. Misnaming the model is rarely a calculation error; it is an assumption error — counting with a binomial what was really sampled without replacement, or modelling a waiting time with a binomial. So the procedure is to interrogate the problem, not to pattern-match the prose. Three or four sharp questions sort almost every elementary count into its right box.
The decision questions
Here is the questionnaire I run in my head. The very first split is the most important: are you counting how many successes happen in a fixed number of trials, or counting how many trials it takes until some successes happen? That single distinction separates the binomial-and-hypergeometric family (fixed n, random count) from the geometric-and-negative-binomial family (fixed count of successes, random number of trials). Get that backwards and every later step is wasted.
- Is the number of trials fixed in advance, or is it random because you stop when you reach a target number of successes? Fixed trials -> binomial / hypergeometric family. Stop-on-success -> geometric / negative binomial family.
- Do the trials stay independent with a constant success probability p? Yes (sampling with replacement, or a huge population) -> binomial / geometric. No, because each draw changes the odds (sampling without replacement from a finite pool) -> hypergeometric.
- In the waiting-time family, are you waiting for the FIRST success (geometric) or for the r-th success (negative binomial)? r = 1 is just the geometric special case.
- Are you instead counting how many rare events fall in a fixed window of time, space, or volume, with no natural n and no upper bound? -> Poisson, with its single rate lambda = expected count.
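The questionnaire above can be sketched as a small decision function. This is only an illustration, not a standard API: the name `choose_model`, its string labels, and its parameters are hypothetical, and it covers only the cases named in the list.

```python
def choose_model(counting: str, constant_p: bool = True, r: int = 1) -> str:
    """Return the model name suggested by the decision questions.

    counting:   what the random quantity is (hypothetical labels):
                "successes_in_fixed_trials", "trials_until_successes",
                or "events_in_window"
    constant_p: True if trials are independent with a constant p
    r:          target number of successes in the waiting-time family
    """
    if counting == "events_in_window":
        # Rare events in a fixed window, no natural n and no upper bound.
        return "Poisson"
    if counting == "trials_until_successes":
        if not constant_p:
            # Not covered by the questionnaire in this guide.
            raise ValueError("waiting times with changing p are outside this sketch")
        return "Geometric" if r == 1 else "Negative binomial"
    if counting == "successes_in_fixed_trials":
        # Constant p -> binomial; odds that shift with each draw -> hypergeometric.
        return "Binomial" if constant_p else "Hypergeometric"
    raise ValueError(f"unknown counting mode: {counting!r}")


print(choose_model("successes_in_fixed_trials", constant_p=False))
print(choose_model("trials_until_successes", r=3))
```

The function takes the answers to the questions as inputs rather than parsing the problem text, which mirrors the advice to interrogate the problem instead of pattern-matching its prose.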
A side-by-side cheat sheet
It helps to see the whole cast lined up by what they count, their parameters, and their mean. Notice how the mean encodes the story: n p is "n trials each worth p"; 1/p is "on average one success every 1/p trials"; lambda is simply the rate you were given. Reading a mean back into words is a great sanity check after you have chosen a model.
model             counts                         params       E[X]
----------------- ------------------------------ ------------ --------
Bernoulli(p)      success in ONE trial           p            p
Binomial(n,p)     successes in n indep. trials   n, p         n p
Geometric(p)      trials until 1st success       p            1/p
NegBinom(r,p)     trials until r-th success      r, p         r/p
Poisson(lambda)   rare events in a fixed window  lambda       lambda
Hypergeom(N,K,n)  successes in n draws, no repl. N, K, n      n K / N
Two boundary facts in that table are worth carrying. Bernoulli is just Binomial(1, p) — the atom is the n = 1 case, not a separate species. And the geometric is just NegBinom(1, p): waiting for the first success is waiting for the r-th success with r = 1. Seeing these as special cases rather than five unrelated formulas is exactly the shift this guide wants you to make; the family tree has fewer roots than it first appears.
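The two special cases, and two of the means from the table, can be checked numerically. The sketch below is a minimal check using only the standard library; the helper names and the value p = 0.3 are illustrative choices, and the negative binomial here counts total trials, matching the table.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(k successes in n independent trials)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def negbinom_pmf(t, r, p):
    """P(the r-th success lands exactly on trial t), for t = r, r+1, ..."""
    return comb(t - 1, r - 1) * p**r * (1 - p) ** (t - r)


def geometric_pmf(t, p):
    """P(the first success lands exactly on trial t), for t = 1, 2, ..."""
    return (1 - p) ** (t - 1) * p


p = 0.3  # arbitrary illustrative value

# Bernoulli(p) is Binomial(1, p): P(0) = 1 - p and P(1) = p.
bernoulli_ok = (abs(binomial_pmf(0, 1, p) - (1 - p)) < 1e-12
                and abs(binomial_pmf(1, 1, p) - p) < 1e-12)

# Geometric(p) is NegBinom(1, p), checked trial by trial.
geometric_ok = all(abs(negbinom_pmf(t, 1, p) - geometric_pmf(t, p)) < 1e-12
                   for t in range(1, 100))

# Means recovered by summing value * probability.
binomial_mean = sum(k * binomial_pmf(k, 10, p) for k in range(11))     # expect n p
negbinom_mean = sum(t * negbinom_pmf(t, 3, p) for t in range(3, 500))  # expect r / p (truncated sum)

print(bernoulli_ok, geometric_ok, round(binomial_mean, 6), round(negbinom_mean, 6))
```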
How they turn into one another
These distributions are not five islands — they are connected by limits, and the connections are themselves modelling guidance. The cleanest bridge is from hypergeometric to binomial. Sampling without replacement from a population of N is hypergeometric, but if N is enormous compared with the sample size n, removing a few items barely moves the odds, so p = K/N stays almost constant and the hypergeometric is well approximated by Binomial(n, K/N). Practically: when the population dwarfs your sample (a common rule of thumb is n less than about 5% of N), you may use the simpler binomial and lose almost nothing.
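To see the hypergeometric-to-binomial bridge in numbers, one can compare the two probability functions for a small pool and a large pool with the same K/N. This is a rough sketch; the pool sizes and the helper name `max_gap` are illustrative assumptions.

```python
from math import comb


def hypergeom_pmf(k, N, K, n):
    """P(k successes in n draws without replacement from N items, K of them successes)."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def binomial_pmf(k, n, p):
    """P(k successes in n independent trials)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def max_gap(N, K, n):
    """Largest pointwise gap between Hypergeom(N, K, n) and Binomial(n, K/N)."""
    p = K / N
    return max(abs(hypergeom_pmf(k, N, K, n) - binomial_pmf(k, n, p))
               for k in range(n + 1))


# Same proportion K/N = 0.24 and same sample size n = 6, very different pool sizes.
gap_small_pool = max_gap(N=50, K=12, n=6)          # sample is 12% of the pool
gap_large_pool = max_gap(N=50_000, K=12_000, n=6)  # sample is a tiny fraction of the pool

print(f"small pool gap: {gap_small_pool:.5f}")
print(f"large pool gap: {gap_large_pool:.7f}")
```

The gap for the large pool should come out far smaller than for the small pool, which is the practical content of the rule of thumb.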
The second bridge runs from binomial to Poisson. If n is large and p is tiny while the mean n p stays moderate, the binomial collapses onto Poisson(lambda) with lambda = n p. This is the Poisson approximation to the binomial, the formal face of the law of rare events from guide 3. It is why you can model the count of typos on a page, or radioactive decays in a second, with a single rate: there are a vast number of tiny independent chances, each almost never firing. Chaining the two bridges, a hypergeometric with huge N, a sample size n that is large yet still small against N, and tiny K/N is first approximately Binomial(n, K/N), then approximately Poisson(n K / N).
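The binomial-to-Poisson bridge can be watched numerically by holding lambda = n p fixed and letting n grow. A minimal sketch, with lambda = 2 and the three values of n chosen only for illustration:

```python
from math import comb, exp, factorial


def binomial_pmf(k, n, p):
    """P(k successes in n independent trials)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """P(k events) when the expected count is lam."""
    return exp(-lam) * lam**k / factorial(k)


lam = 2.0  # fixed mean n p


def max_gap(n):
    """Largest pointwise gap between Binomial(n, lam/n) and Poisson(lam) over k = 0..min(n, 30)."""
    p = lam / n
    return max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam))
               for k in range(min(n, 30) + 1))


gaps = {n: max_gap(n) for n in (10, 100, 10_000)}
for n, gap in gaps.items():
    print(f"n = {n:>6}, p = {lam / n:.4f}, max gap = {gap:.6f}")
```

The gap shrinks as n grows and p shrinks, even though the mean never changes.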
Worked diagnoses
Let us run the questionnaire on a few problems and watch the right model surface. (a) "A factory line is 2% defective; in a box of 100 items, how many defects?" Fixed n = 100, constant p = 0.02, items assumed independent -> Binomial(100, 0.02). Since n is large and p tiny with n p = 2, you may also approximate by Poisson(2). (b) "A call centre gets on average 5 calls a minute; chance of exactly 8 next minute?" A rate in a fixed window, no n -> Poisson(5). (c) "You keep dialing a busy number, each attempt connects with probability 0.3; how many tries until you get through?" Waiting for the first success, constant p -> Geometric(0.3), with mean 1/0.3, about 3.33 tries.
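Once the model is named, the numbers follow from its formula. The sketch below evaluates problems (a) to (c) with the standard library. The specific questions "exactly 2 defects" and "connect within 3 tries" are added here only as illustrations; they are not part of the original problems.

```python
from math import comb, exp, factorial

# (a) Binomial(100, 0.02): chance of exactly 2 defects, and its Poisson(2) approximation.
p_two_defects_binomial = comb(100, 2) * 0.02**2 * 0.98**98
p_two_defects_poisson = exp(-2) * 2**2 / factorial(2)

# (b) Poisson(5): chance of exactly 8 calls in the next minute.
p_eight_calls = exp(-5) * 5**8 / factorial(8)

# (c) Geometric(0.3): mean number of tries, and chance of connecting within 3 tries.
p_connect = 0.3
mean_tries = 1 / p_connect
p_within_three = 1 - (1 - p_connect) ** 3  # complement of three straight failures

print(f"(a) binomial {p_two_defects_binomial:.4f} vs Poisson {p_two_defects_poisson:.4f}")
print(f"(b) P(exactly 8 calls) = {p_eight_calls:.4f}")
print(f"(c) mean tries = {mean_tries:.2f}, P(within 3 tries) = {p_within_three:.3f}")
```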
Now the trickier ones, where the without-replacement and waiting-for-r structure decides it. (d) "A bag has 50 marbles, 12 red; you grab 6 without looking; how many red?" Fixed sample n = 6, but drawing from a small finite pool with no replacement, so p changes each grab -> Hypergeom(N=50, K=12, n=6), mean 6*12/50 = 1.44. Here the binomial would be a poor fit because 6 is 12% of 50, well above the 5% rule of thumb. (e) "You flip a fair coin until you have collected 3 heads; how many flips total?" Waiting for the r-th success with r = 3, constant p -> NegBinom(3, 0.5), mean r/p = 6. Each diagnosis came straight from the four questions, not from memorising the problems.
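A quick simulation is one way to sanity-check diagnoses (d) and (e) against their stated means of 1.44 and 6. This is a Monte Carlo sketch, not an exact calculation; the seed, the number of repetitions, and the helper name `flips_until_heads` are arbitrary illustrative choices.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable
TRIALS = 50_000

# (d) Grab 6 marbles without replacement from a bag of 50 with 12 red.
bag = ["red"] * 12 + ["other"] * 38
red_counts = [rng.sample(bag, 6).count("red") for _ in range(TRIALS)]
mean_red = sum(red_counts) / TRIALS  # should sit near 6 * 12 / 50 = 1.44


# (e) Flip a fair coin until 3 heads have been collected; count total flips.
def flips_until_heads(target, p=0.5):
    flips = heads = 0
    while heads < target:
        flips += 1
        if rng.random() < p:
            heads += 1
    return flips


mean_flips = sum(flips_until_heads(3) for _ in range(TRIALS)) / TRIALS  # should sit near r / p = 6

print(f"(d) simulated mean red marbles: {mean_red:.3f}")
print(f"(e) simulated mean flips:       {mean_flips:.3f}")
```

Because the simulation only encodes the story (draw without replacement, stop at the third head), agreement with the formula means is a check on the choice of model, not just on the arithmetic.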
Two traps to close on, because they catch careful people. First, do not let independence quietly fail: counting aces in a 5-card poker hand feels binomial but is hypergeometric, because the deck shrinks as you draw. Second, remember that independent trials have no memory — after nine straight reds at roulette, the tenth spin is unchanged; the geometric's defining memorylessness is the precise statement that your past failures do not bring success any closer. Choosing the right model is not only picking the formula; it is honestly checking that the assumptions behind that formula actually hold for your problem.
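Both traps can be made concrete with a few lines of arithmetic. The sketch below is illustrative only: p = 0.3 and the values m = 9, n = 4 are arbitrary, and the helper names are hypothetical.

```python
from math import comb


def hypergeom_pmf(k, N, K, n):
    """P(k successes in n draws without replacement from N items, K of them successes)."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def binomial_pmf(k, n, p):
    """P(k successes in n independent trials)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def geometric_survival(n, p):
    """P(X > n): the first n trials are all failures."""
    return (1 - p) ** n


# Trap 1: aces in a 5-card hand. A deck has only 4 aces, so five aces is impossible,
# yet a binomial with p = 4/52 would give it a small positive probability.
five_aces_hypergeom = hypergeom_pmf(5, N=52, K=4, n=5)
five_aces_binomial = binomial_pmf(5, 5, 4 / 52)
two_aces_hypergeom = hypergeom_pmf(2, N=52, K=4, n=5)
two_aces_binomial = binomial_pmf(2, 5, 4 / 52)

# Trap 2: memorylessness. After m failures, the chance of needing more than
# n further trials equals the chance a fresh start needs more than n trials.
p, m, n = 0.3, 9, 4
conditional = geometric_survival(m + n, p) / geometric_survival(m, p)
fresh = geometric_survival(n, p)

print(f"five aces: hypergeometric {five_aces_hypergeom}, binomial {five_aces_binomial:.2e}")
print(f"two aces:  hypergeometric {two_aces_hypergeom:.4f}, binomial {two_aces_binomial:.4f}")
print(f"P(X > m+n | X > m) = {conditional:.6f}, P(X > n) = {fresh:.6f}")
```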