Hypergeometric and Negative Binomial

Where these two fit in the zoo

By now you have met the headliners of the discrete world. The binomial counts how many successes appear in a fixed number of independent trials, each with the same success probability p. The geometric flips the question around and asks how long you wait for the first success. The Poisson handles rare events scattered over time or space. This guide adds the last two regulars, and each one is born by relaxing exactly one assumption you took for granted before.

The hypergeometric distribution keeps the binomial's question — how many successes in a fixed-size sample — but drops the binomial's quiet assumption that the trials are independent with a constant p. It is what you get when you draw without replacement from a finite pool. The negative binomial distribution keeps the geometric's waiting-game spirit but drops the restriction that you stop at the very first success; instead you wait for the r-th success. So one is a binomial that learned to draw from a small bag honestly, and the other is a geometric that learned patience.

The hypergeometric: drawing without replacement

Here is the concrete picture. You have an urn with N marbles, of which K are red (the "successes") and N - K are blue. You scoop out a handful of n marbles all at once — equivalently, draw them one by one without putting any back — and you ask: how many of my n marbles are red? Call that count X. Because each marble you remove changes what is left, the draws are not independent, and the chance the next one is red shifts every time. That is the whole difference from the binomial.

The probability of getting exactly k red marbles is a pure counting ratio, built straight from the combinations you learned in the counting rung. Choose which k of the K reds you got, choose which n - k of the N - K blues you got, and divide by the number of ways to choose any n marbles from all N. There are no powers of p anywhere — the model has no p, only the actual contents of the urn.

P(X = k) = [ C(K, k) * C(N-K, n-k) ] / C(N, n),   for max(0, n - (N-K)) <= k <= min(n, K)

Example: deck of 52, K = 4 aces, draw n = 5 cards.
P(exactly 2 aces) = [ C(4,2) * C(48,3) ] / C(52,5)
                  = [ 6 * 17296 ] / 2598960
                  = 103776 / 2598960  ~ 0.0399

The hypergeometric pmf, with a five-card poker draw worked out: about a 4% chance of exactly two aces.
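If you would like to check the counting ratio yourself, here is a minimal Python sketch. The function name hypergeom_pmf is just an illustrative choice, and the only library call is math.comb from the standard library (Python 3.8 or later).

```python
from math import comb

def hypergeom_pmf(k, N, K, n):
    """P(X = k): k successes in n draws without replacement
    from a pool of N items, K of which are successes."""
    # Outside the feasible range of k the probability is zero.
    if k < max(0, n - (N - K)) or k > min(n, K):
        return 0.0
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

# Worked example from the text: exactly 2 aces in a 5-card hand.
p_two_aces = hypergeom_pmf(2, N=52, K=4, n=5)
print(round(p_two_aces, 4))  # 0.0399

# Sanity check: the probabilities over k = 0..5 should add up to 1.
total = sum(hypergeom_pmf(k, 52, 4, 5) for k in range(6))
print(total)
```

Because every piece is an exact integer count until the final division, the result carries no rounding error beyond that one floating-point division.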

The mean is friendly and worth holding onto: E[X] = n * (K / N). Read it as "sample size times the fraction of the pool that is red," which is exactly what your intuition expects: on average a sample mirrors the urn's composition. Writing p = K / N for that fraction, the variance is the binomial's n * p * (1 - p) multiplied by an extra factor (N - n) / (N - 1), the finite population correction, so Var(X) = n * p * (1 - p) * (N - n) / (N - 1). That factor is less than 1 whenever you draw more than one marble, so sampling without replacement always gives a tighter spread than sampling with replacement (only slightly tighter when n is small next to N, and no spread at all when n = N, since then you have taken the whole urn). Intuitively, each draw uses up information about the urn, leaving less room for surprise.
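The mean and variance formulas can be checked against the pmf directly. The sketch below reuses the ace example (N = 52, K = 4, n = 5) and computes both moments by brute-force summation. The variable names are illustrative only.

```python
from math import comb

def hypergeom_pmf(k, N, K, n):
    """Hypergeometric pmf for k inside its feasible range."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

N, K, n = 52, 4, 5
p = K / N  # fraction of the pool that counts as a success
ks = range(max(0, n - (N - K)), min(n, K) + 1)  # feasible values of k

# Moments computed straight from the definition.
mean_from_pmf = sum(k * hypergeom_pmf(k, N, K, n) for k in ks)
var_from_pmf = sum((k - mean_from_pmf) ** 2 * hypergeom_pmf(k, N, K, n) for k in ks)

# Closed forms quoted in the text.
fpc = (N - n) / (N - 1)  # finite population correction
mean_formula = n * p
var_formula = n * p * (1 - p) * fpc

print(mean_from_pmf, mean_formula)
print(var_from_pmf, var_formula)
print(fpc)  # below 1, so the spread is tighter than the binomial's
```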

When the urn is huge: hypergeometric becomes binomial

Now for a beautiful and practical bridge. If the urn is enormous compared to your handful, removing a few marbles barely changes the red fraction, so the draws are almost independent and the success chance barely budges. In that regime the hypergeometric is essentially the binomial with p = K / N. This is the hypergeometric-to-binomial limit: as N and K grow with the sample size n fixed and K / N held at p, the hypergeometric pmf converges to the binomial pmf.

This is exactly why pollsters and quality inspectors can usually ignore the without-replacement subtlety. When you survey 1000 people out of a country of 50 million, you are technically drawing without replacement, but the finite population correction (N - n) / (N - 1) is so close to 1 that the binomial is a fine model. A common rule of thumb is that if your sample is less than about 5% of the population, treating it as binomial is harmless. The honest version is that it is always an approximation, and the correction is there if you need it.
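To watch the approximation improve, you can compare the two pmfs as the pool grows while the red fraction stays put. This is a rough sketch. The values n = 10 and p = 0.3 and the list of pool sizes are arbitrary choices made for illustration, not anything prescribed by the theory.

```python
from math import comb

def hypergeom_pmf(k, N, K, n):
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.3
gaps = []
for N in (20, 200, 2000, 20000):
    K = 3 * N // 10  # keeps K / N equal to 0.3 exactly
    # Largest pointwise difference between the two pmfs.
    gap = max(abs(hypergeom_pmf(k, N, K, n) - binom_pmf(k, n, p))
              for k in range(n + 1))
    fpc = (N - n) / (N - 1)
    gaps.append(gap)
    print(N, round(fpc, 4), gap)

# The polling example from the text: 1000 people out of 50 million.
fpc_poll = (50_000_000 - 1000) / (50_000_000 - 1)
print(fpc_poll)
```

The gap should shrink as N grows, and the correction factor for the poll sits within a hair of 1, which is the numerical face of the rule of thumb.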

The negative binomial: waiting for the r-th success

Switch back to independent trials with a fixed success probability p, exactly the world of the geometric. The geometric distribution answered "how many trials until the first success?" The negative binomial answers the natural sequel: "how many trials until the r-th success?" Picture a basketball player who keeps shooting free throws, each independently sunk with probability p, and you stop the clock the moment she makes her 3rd basket. The number of shots taken is negative binomial with r = 3.

The shape of its pmf has a clean story. To take exactly x shots and stop on the r-th success, two things must hold: the very last shot is a make (probability p), and among the first x - 1 shots there are exactly r - 1 makes scattered in any order. That scattering is counted by a binomial coefficient, and the remaining shots are misses. So P(X = x) = C(x - 1, r - 1) * p^r * (1 - p)^(x - r), for x = r, r + 1, r + 2, and so on.
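The formula translates almost word for word into code. In this sketch the name negbinom_pmf and the shooting percentage p = 0.7 are assumptions made purely for illustration. The text does not give the player a specific p.

```python
from math import comb

def negbinom_pmf(x, r, p):
    """P(X = x): the r-th success lands exactly on trial x,
    where x counts all trials (successes and failures)."""
    if x < r:
        return 0.0  # you cannot have r successes in fewer than r trials
    # r - 1 successes among the first x - 1 trials, then a success on trial x.
    return comb(x - 1, r - 1) * p**r * (1 - p) ** (x - r)

r, p = 3, 0.7  # hypothetical shooter waiting for her 3rd make

p_three_shots = negbinom_pmf(3, r, p)  # three makes in a row: p**3
p_five_shots = negbinom_pmf(5, r, p)   # C(4, 2) * p**3 * (1 - p)**2

# The pmf over x = r, r + 1, ... should sum to 1 (truncated far out).
total = sum(negbinom_pmf(x, r, p) for x in range(r, 200))

print(p_three_shots, p_five_shots, total)
```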

There is a deeper way to see it that makes the mean and variance fall out for free. Waiting for the r-th success is just waiting for the 1st success, then resetting and waiting for the next, and so on, r times. Each wait is an independent geometric variable, so a negative binomial is a sum of r independent geometrics. By the linearity of expectation, the mean is r times the geometric mean: E[X] = r / p. The variance adds too (the waits are independent): Var(X) = r * (1 - p) / p^2. Setting r = 1 recovers the geometric exactly, which is the honest sanity check.
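The sum-of-geometrics story is easy to test by simulation. The sketch below builds each negative binomial draw literally as r geometric waits and compares the sample mean and variance with r / p and r * (1 - p) / p^2. The values r = 3, p = 0.4, the seed, and the sample count are arbitrary illustrative choices.

```python
import random

def geometric_wait(p, rng):
    """Trials up to and including the first success."""
    trials = 1
    while rng.random() >= p:  # a failure happens with probability 1 - p
        trials += 1
    return trials

def negbinom_wait(r, p, rng):
    """Trials until the r-th success, built as r back-to-back geometric waits."""
    return sum(geometric_wait(p, rng) for _ in range(r))

rng = random.Random(0)  # fixed seed so the run is repeatable
r, p = 3, 0.4
samples = [negbinom_wait(r, p, rng) for _ in range(50_000)]

sample_mean = sum(samples) / len(samples)
sample_var = sum((s - sample_mean) ** 2 for s in samples) / (len(samples) - 1)

print(sample_mean, r / p)               # theory: 7.5
print(sample_var, r * (1 - p) / p**2)   # theory: 11.25
```

A simulation only ever agrees with theory up to sampling noise, so expect the printed pairs to be close rather than identical.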

Two cautions that trip almost everyone

First, the negative binomial has rival definitions, and they are all correct as long as you say which one you mean. Some authors count the total number of trials X (so X starts at r, as above), while others count only the number of failures Y before the r-th success (so Y starts at 0, and Y = X - r). The pmf's support and the mean both shift by that constant r, so E[Y] = r / p - r = r * (1 - p) / p, while the variance is identical under both conventions because subtracting a constant does not change spread. Software packages differ too, so when you see "negative binomial" always check: is it counting trials, or counting failures? A formula that looks wrong is often just the other convention.
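A quick numerical check makes the relationship between the two conventions concrete. The function names pmf_trials and pmf_failures are hypothetical labels for this sketch, and r = 3, p = 0.4 are arbitrary. As one real-world data point, SciPy's scipy.stats.nbinom is documented as counting failures, but check the documentation of whatever package you actually use.

```python
from math import comb

def pmf_trials(x, r, p):
    """Convention 1: X = total trials until the r-th success (x >= r)."""
    if x < r:
        return 0.0
    return comb(x - 1, r - 1) * p**r * (1 - p) ** (x - r)

def pmf_failures(y, r, p):
    """Convention 2: Y = failures before the r-th success (y >= 0)."""
    if y < 0:
        return 0.0
    return comb(y + r - 1, r - 1) * p**r * (1 - p) ** y

r, p = 3, 0.4
cutoff = 400  # truncation point; the tail beyond it is negligible here

# Same probabilities, just indexed with an offset of r.
same_pmf = all(pmf_failures(y, r, p) == pmf_trials(y + r, r, p) for y in range(50))

mean_trials = sum(x * pmf_trials(x, r, p) for x in range(r, cutoff + r))
mean_failures = sum(y * pmf_failures(y, r, p) for y in range(cutoff))
var_trials = sum((x - mean_trials) ** 2 * pmf_trials(x, r, p) for x in range(r, cutoff + r))
var_failures = sum((y - mean_failures) ** 2 * pmf_failures(y, r, p) for y in range(cutoff))

print(same_pmf)
print(mean_trials, mean_failures)  # differ by r
print(var_trials, var_failures)    # match
```

In this check the two means differ by r, while the two variances agree, which is what you expect when one variable is the other minus a constant.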

Second, do not let the negative binomial revive the geometric's most famous misconception in disguise. The trials inside it are still independent, so they have no memory; a run of misses does not make the next shot "due." The negative binomial does not describe a system that gets luckier the longer it waits — it simply records how many independent, memoryless trials it took to accumulate r successes. Believing otherwise is the gambler's fallacy wearing a fancier name.
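If the "due" intuition still nags at you, a small simulation can put it to rest. The sketch below generates a long run of independent shots and looks only at the shots that come right after three straight misses. The seed, p = 0.4, the streak length of 3, and the number of shots are all arbitrary choices for illustration.

```python
import random

rng = random.Random(1)  # fixed seed so the run is repeatable
p = 0.4
shots = [rng.random() < p for _ in range(200_000)]  # True means a make

streak = 3
# Keep only the shots that immediately follow `streak` consecutive misses.
after_streak = [shots[i] for i in range(streak, len(shots))
                if not any(shots[i - streak:i])]

freq_after_misses = sum(after_streak) / len(after_streak)
overall_freq = sum(shots) / len(shots)

print(overall_freq)       # close to p
print(freq_after_misses)  # also close to p: no shot is "due"
```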

Lining the four up: a unified view

Step back and the discrete zoo organizes itself along two axes. The first axis is what is fixed and what is random. Binomial and hypergeometric fix the number of trials (the sample size) and let the count of successes be random. Geometric and negative binomial flip that: they fix the number of successes you want and let the number of trials be random. The second axis is whether trials are independent with constant p. Binomial, geometric, and negative binomial assume yes; the hypergeometric does not, because it is the without-replacement version of the binomial.

Lay the four out in a little two-by-two grid. Across the top, the columns are "fixed number of trials, random count of successes" versus "fixed number of successes, random waiting time." Down the side, the rows are "independent trials with constant p" versus "drawn without replacement." The binomial sits in the top-left, the negative binomial (with the geometric as its r = 1 corner) in the top-right, and the hypergeometric in the bottom-left as the without-replacement cousin of the binomial. The fourth cell, a without-replacement waiting model known as the negative hypergeometric, exists too but is rarely needed at this level.

Ask what is fixed. Fixed sample size with a random success count points to binomial or hypergeometric; a fixed target number of successes with random waiting points to geometric or negative binomial.
Ask whether items are replaced. Drawing without replacement from a finite pool means hypergeometric; independent trials with constant p mean binomial.
If you are counting successes and the pool is huge relative to the sample, you may approximate the hypergeometric by the binomial with p = K / N.
If you are waiting for several successes, use the negative binomial; if for just one, it collapses to the geometric.
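The checklist above can be written down as a tiny decision helper. This is only a sketch of the reasoning, not a substitute for thinking about the mechanism. The function name suggest_model and its parameters are hypothetical, invented for this illustration.

```python
def suggest_model(fixed, without_replacement=False, target_successes=1):
    """Map the two questions from the checklist to a model name.

    fixed: "trials" if the sample size is fixed and the success count is random,
           "successes" if the target number of successes is fixed and the
           waiting time is random.
    """
    if fixed == "trials":
        # Second question: are items replaced (or trials independent)?
        return "hypergeometric" if without_replacement else "binomial"
    if fixed == "successes":
        if without_replacement:
            # The fourth cell of the grid, outside the scope of this guide.
            return "negative hypergeometric (not covered here)"
        return "geometric" if target_successes == 1 else "negative binomial"
    raise ValueError("fixed must be 'trials' or 'successes'")

print(suggest_model("trials", without_replacement=True))  # hypergeometric
print(suggest_model("trials"))                            # binomial
print(suggest_model("successes", target_successes=3))     # negative binomial
print(suggest_model("successes"))                         # geometric
```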

That two-axis map is the real payoff of this rung, and the next guide, choosing the right discrete model, turns it into a practiced reflex. Memorizing five pmf formulas is the shallow version of this knowledge; recognizing which mechanism is generating your data — fixed trials or fixed successes, with or without replacement, common or rare — is the deep version, and it is what lets you reach for the right tool the moment a problem lands on your desk.