The capstone: where every tool in this rung comes home
You now hold the whole conditional toolkit. You know that learning something happened means shrinking the sample space to the cases consistent with it; you know Bayes' theorem flips P(evidence given cause) into P(cause given evidence); you know that independence is a special, fragile situation and not the default. This final guide does no new theory at all. Instead it points everything you have at the two puzzles that fool almost everyone — including, famously, professional mathematicians — and watches the confusion evaporate.
Why end a serious rung on "puzzles"? Because these are not parlour tricks. The Monty Hall problem is a clean lesson in how the *way* information arrives changes what it tells you, and the base-rate fallacy is the single most expensive reasoning error in medicine, law, and security — billions of dollars and real prison sentences turn on getting it right. If you can feel *why* these answers are correct rather than merely accept them, you have understood conditional probability in your bones.
Monty Hall: three doors and a host who knows
Here is the Monty Hall problem. Three doors; behind one is a car, behind the other two are goats. You pick a door — say door 1 — but it stays shut. The host, who *knows* where the car is, opens one of the other two doors to reveal a goat — say door 3 — and offers you a swap to door 2. Should you switch? The gut screams "it's 50/50 now, two doors, who cares." The gut is wrong. Switching wins 2/3 of the time; staying wins only 1/3.
The cleanest way to see it is to refuse the temptation to recompute from scratch after the door opens, and instead ask what your *original* pick was worth. When you first chose door 1, you had a 1/3 chance of landing on the car and a 2/3 chance of landing on a goat — nothing the host does later changes that first number, because your door is never the one he opens. So 2/3 of the time the car is behind one of the *other* two doors. The host then helpfully removes the empty one of those two. That means: whenever the car was over there (probability 2/3), it is now sitting behind the single remaining other door. Switching collects that whole 2/3.
Make it conditional to be sure. Let the car be uniformly behind any door, so each prior is 1/3. You picked door 1; the host opened door 3 to show a goat. What is the likelihood of *that observation* under each hidden truth? If the car is behind 1, the host could open 2 or 3 and picks 3 with probability 1/2. If the car is behind 2, he is forced to open 3, probability 1. If the car is behind 3, he can never open it, probability 0. Feed those into Bayes' theorem and the posterior on door 2 comes out exactly 2/3.
Prior P(car) each door = 1/3
You pick door 1. Host opens door 3 showing a goat.
P(host opens 3 | car behind 1) = 1/2 (free choice of 2 or 3)
P(host opens 3 | car behind 2) = 1 (forced: can't open 1 or 2)
P(host opens 3 | car behind 3) = 0 (never reveals the car)
Bayes (door 2):
P(car 2 | open 3) = (1 * 1/3) / (1/2*1/3 + 1*1/3 + 0*1/3)
= (1/3) / (1/2) = 2/3
P(car 1 | open 3) = (1/2 * 1/3) / (1/2) = 1/3
=> Switch to door 2. Win probability 2/3.The hidden hinge: an informed host versus a clumsy one
Why does this feel so wrong? Because intuition quietly assumes the open door is just neutral scenery, when in fact it is a *message from someone who knows the answer*. Picture the contrast. Suppose instead a clumsy assistant who does not know where the car is bumps a door open at random, and it happens to show a goat. Now switching and staying really are 50/50 — because in that world the assistant sometimes reveals the car by accident, and the very fact that he revealed a goat carries different information. Same opened door, same goat, completely different probability. The information lives in the *process*, not the picture.
If three doors still feel slippery, scale up — this is the trick that converts almost everyone. Imagine 100 doors. You pick one (1/100 chance of the car). The knowing host then opens 98 of the remaining 99, every one a goat, leaving exactly one other door shut. Do you cling to your lonely 1/100 first guess, or switch to the door the host pointedly left standing after sweeping away 98 wrong ones? Now switching obviously wins 99/100 of the time. The three-door version is the same effect, just too small for the gut to feel.
The base-rate fallacy: the test is good, you are probably fine
Now the puzzle with real stakes. A disease affects 1 in 1000 people. A test is 99% accurate: it flags 99% of sick people (sensitivity), and it correctly clears 99% of healthy people (so a 1% false-positive rate). You test positive. How worried should you be? Most people, doctors included, blurt "99%." The honest answer is about 9%. You are still far more likely healthy than sick. The error is the base-rate fallacy: ignoring how rare the disease is to begin with — the prior, the base rate.
The fastest cure is to count actual people instead of juggling percentages. Imagine 100,000 people taking the test. About 100 of them are genuinely sick (1 in 1000), and the test flags 99 of those — the true positives. But 99,900 are healthy, and 1% of *them* test positive too: that is 999 false alarms. So among everyone who sees a positive result, 99 are sick and 999 are not. Your chance of really being sick is 99 / (99 + 999) = 99 / 1098 ≈ 0.09. The healthy group is so enormous that even its tiny 1% error rate buries the few true cases.
- Write the prior (base rate): P(sick) = 0.001, so P(healthy) = 0.999. This is the number everyone forgets.
- Write the likelihoods: P(positive given sick) = 0.99 and P(positive given healthy) = 0.01.
- Total-probability for the denominator: P(positive) = 0.99 * 0.001 + 0.01 * 0.999 = 0.00099 + 0.00999 = 0.01098.
- Bayes for the posterior: P(sick given positive) = 0.00099 / 0.01098 ≈ 0.090 — about 9%, not 99%.
Same trap, many faces — and how to stay out of it
The base-rate fallacy wears disguises. In a courtroom it becomes the prosecutor's fallacy: "the chance this DNA matches an innocent person is one in a million, so the defendant is surely guilty." That confuses P(match given innocent) with P(innocent given match) — the exact flip Bayes' theorem exists to correct. In a city of ten million, a one-in-a-million match means about ten innocent people would also match, so a bare match alone leaves the posterior of guilt far below certainty. Same arithmetic as the medical test, far higher stakes.
There is a tidy way to keep all this straight, which the earlier guide on odds hinted at: think in odds and likelihood ratios. Start with the prior odds — for the disease, 1-to-999 against. The evidence multiplies those odds by the likelihood ratio P(positive given sick) / P(positive given healthy) = 0.99 / 0.01 = 99. So the posterior odds are 99-to-999, which is about 1-to-10 — roughly a 9% chance, matching our count exactly. Evidence does not *set* your belief; it *multiplies* your prior odds. Forget the prior and you have thrown away half the calculation.
And that closes the rung. Look back at what united every guide: a probability is not a fixed label on an event but a number that *should move* when information arrives — by conditioning, which shrinks the sample space; by total probability, which assembles the whole from its parts; by Bayes, which reverses the arrow of inference. Monty Hall and the base-rate fallacy are simply what happens when people update on the wrong thing, or forget to update on the base rate at all. You now have the machinery to do it right — and to spot, instantly, when someone else has not.