The question Bayes answers: reversing the conditional
You already know that conditional probability reshapes the world: learning B happened shrinks the sample space down to B, and you recompute probabilities inside that smaller world. But conditioning has a direction, and the two directions are not the same. P(A given B) and P(B given A) ask genuinely different questions. P(disease given positive test) is what a worried patient cares about; P(positive test given disease) is what the lab measured. Bayes' theorem is the bridge between the two — it lets you compute the direction you want from the direction you know.
The derivation is almost embarrassingly short, and it is worth doing once so the formula never feels like magic. The multiplication rule says the joint probability P(A and B) can be written two ways: as P(A given B) P(B), and equally as P(B given A) P(A). They describe the same overlap, so they are equal: P(A given B) P(B) = P(B given A) P(A). Divide both sides by P(B), which requires P(B) > 0, and you have Bayes' theorem. That is the whole trick: one symmetric fact about P(A and B), rearranged.
Written out, that single rearrangement is the whole theorem: P(A given B) = P(B given A) P(A) / P(B). Each of the four pieces will earn a name in the next section — the answer P(A given B) is the posterior, P(B given A) is the likelihood, P(A) is the prior, and the denominator P(B) is the evidence. Keep the two-way reading of the joint probability in mind, because that symmetry is literally all that Bayes' theorem is.
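To see the two-way reading of the joint probability in action, here is a minimal sketch. The joint distribution is hypothetical (the four numbers are invented and only need to sum to 1); the point is that P(A given B) computed straight from the definition matches the value recovered from P(B given A) through Bayes' theorem, even though the two conditionals themselves differ.

```python
# Hypothetical joint distribution over two binary events A and B.
# Keys are (A happened, B happened); the numbers are made up for illustration.
joint = {
    (True, True): 0.10,    # P(A and B)
    (True, False): 0.20,   # P(A and not B)
    (False, True): 0.30,   # P(not A and B)
    (False, False): 0.40,  # P(not A and not B)
}

# Marginals: sum the joint over the other event.
p_a = sum(p for (a, _), p in joint.items() if a)
p_b = sum(p for (_, b), p in joint.items() if b)

# Both conditionals, straight from the definition.
p_a_and_b = joint[(True, True)]
p_a_given_b = p_a_and_b / p_b
p_b_given_a = p_a_and_b / p_a

# Bayes' theorem: recover P(A given B) from the opposite direction.
p_a_given_b_via_bayes = p_b_given_a * p_a / p_b

print(f"P(A given B) direct:    {p_a_given_b:.4f}")
print(f"P(B given A):           {p_b_given_a:.4f}")
print(f"P(A given B) via Bayes: {p_a_given_b_via_bayes:.4f}")
```

The first and third printed values agree, while P(B given A) is a different number: the two directions of conditioning really are different questions.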
Prior, likelihood, posterior: the anatomy of an update
The real power of Bayes' theorem is not the algebra but the story it tells about learning. Read the formula as a verb: it turns one belief into another in light of evidence. The prior P(A) is what you believed about A before seeing the evidence. The likelihood P(B given A) is how well the hypothesis A predicts the evidence B you actually observed. The posterior P(A given B) is your revised belief after the evidence. The denominator P(B) is just a normalizing constant that makes the updated probabilities sum to 1.
Four quick links to the things you have already met. The prior is your starting point, often the base rate of A in the population. The likelihood is a conditional probability you read off the problem. Note that with the observed data held fixed it is read as a function of the hypothesis, so the likelihoods across different hypotheses need not sum to 1. The posterior is the answer you wanted all along. The denominator P(B) is almost always computed by the law of total probability from the previous guide, summing the likelihood times prior over every hypothesis.
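The anatomy above can be sketched as one small function. This is an illustration under stated assumptions, not a library routine: the name bayes_update is made up, the hypotheses are assumed to be mutually exclusive and exhaustive, and the three-hypothesis numbers are invented purely to show the mechanics.

```python
def bayes_update(priors, likelihoods):
    """Return posteriors for mutually exclusive, exhaustive hypotheses.

    priors:      dict mapping hypothesis name -> P(hypothesis)
    likelihoods: dict mapping hypothesis name -> P(evidence given hypothesis)
    """
    # Evidence P(B): law of total probability over every hypothesis.
    evidence = sum(likelihoods[h] * priors[h] for h in priors)
    if evidence == 0:
        raise ValueError("the evidence has probability 0 under every hypothesis")
    # Posterior = likelihood * prior / evidence, one hypothesis at a time.
    return {h: likelihoods[h] * priors[h] / evidence for h in priors}


# Hypothetical example: the numbers are invented for illustration.
priors = {"H1": 0.5, "H2": 0.3, "H3": 0.2}       # must sum to 1
likelihoods = {"H1": 0.1, "H2": 0.4, "H3": 0.7}  # need not sum to 1

posteriors = bayes_update(priors, likelihoods)
for name, p in posteriors.items():
    print(f"P({name} given evidence) = {p:.4f}")
print(f"sum of posteriors = {sum(posteriors.values()):.4f}")
```

Notice that the likelihoods add up to 1.2, which is fine, while the posteriors add up to 1 because dividing by the evidence normalizes them.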
A worked update: the famous medical test
Here is the example that converts most people to Bayesian thinking, because the honest answer is so far from the gut answer. A disease affects 1 person in 1000. A test is 99% sensitive (it catches 99% of true cases) and 95% specific (it correctly clears 95% of healthy people, so it has a 5% false-positive rate). You test positive. What is the probability you actually have the disease? Most people guess around 95%. The truth is under 2%. Let us see exactly why, step by step.
- Name the pieces. Let D = has disease, Pos = tests positive. The prior is P(D) = 0.001, so P(no D) = 0.999. The likelihoods are P(Pos given D) = 0.99 and P(Pos given no D) = 0.05.
- Compute the evidence P(Pos) by the law of total probability: P(Pos) = P(Pos given D) P(D) + P(Pos given no D) P(no D) = 0.99 * 0.001 + 0.05 * 0.999 = 0.00099 + 0.04995 = 0.05094.
- Apply Bayes: P(D given Pos) = P(Pos given D) P(D) / P(Pos) = 0.00099 / 0.05094, which is about 0.0194, or roughly 1.9%.
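The three steps above can be reproduced in a few lines. This is a minimal sketch using only the numbers stated in the example; the variable names are chosen here for readability.

```python
# Step 1: name the pieces (values taken from the worked example).
prior_disease = 0.001        # P(D)
sensitivity = 0.99           # P(Pos given D)
false_positive_rate = 0.05   # P(Pos given no D) = 1 - specificity

# Step 2: evidence P(Pos) by the law of total probability.
p_pos = (sensitivity * prior_disease
         + false_positive_rate * (1 - prior_disease))

# Step 3: apply Bayes' theorem.
p_disease_given_pos = sensitivity * prior_disease / p_pos

print(f"P(Pos)         = {p_pos:.5f}")
print(f"P(D given Pos) = {p_disease_given_pos:.4f}")
```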
Why is the answer so small? Because the disease is rare. Out of 100000 people, only about 100 have it (99 of whom test positive), while the other 99900 are healthy, and 5% of those, 4995 people, also test positive by sheer error. So among everyone who tests positive, the true cases (99) are vastly outnumbered by the false alarms (nearly 5000). Your positive test is real information: it raised your probability from 0.1% to about 1.9%, nearly a twentyfold jump. But it lands you nowhere near certainty, because the small prior anchors the posterior down. Ignoring that anchor, and answering with the test's accuracy alone, is exactly the base-rate fallacy.
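The head-count argument can be checked with whole numbers instead of probabilities. The sketch below assumes a population of exactly 100000 in which the stated rates hold exactly, which is an idealization for illustration.

```python
# Idealized population in which the stated rates hold exactly.
population = 100_000

sick = population // 1000              # 1 in 1000 has the disease -> 100
healthy = population - sick            # 99900

true_positives = sick * 99 // 100      # 99% sensitivity -> 99
false_positives = healthy * 5 // 100   # 5% false-positive rate -> 4995

all_positives = true_positives + false_positives
share_truly_sick = true_positives / all_positives

print(f"true positives:  {true_positives}")
print(f"false positives: {false_positives}")
print(f"share of positives who are sick: {share_truly_sick:.4f}")
```

The share comes out to the same roughly 1.9% as the formula, because counting heads is just Bayes' theorem with the common factor of the population size left in.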
The odds form: updating made effortless
There is a cleaner way to run an update that strips away the annoying denominator. Instead of probabilities, work in odds: the ratio of the probability that a hypothesis is true to the probability that it is false. In odds form, Bayes' theorem becomes a simple multiplication: posterior odds = prior odds * likelihood ratio. The likelihood ratio is P(B given A) / P(B given not A), how much more probable the evidence is under the hypothesis than against it. The normalizing P(B) cancels out entirely because it is the same denominator in both P(A given B) and P(not A given B), so it drops out when you take their ratio.
Odds form (no denominator needed):
    posterior odds = prior odds * likelihood ratio

Medical test redone in odds:
    prior odds of disease = 0.001 / 0.999 ~= 1 : 999
    likelihood ratio      = 0.99 / 0.05 = 19.8
    posterior odds        = 19.8 / 999 ~= 1 : 50.5
    posterior probability = 1 / (1 + 50.5) ~= 0.0194 (same 1.9%)
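The same odds calculation as a short sketch. The helper names prob_to_odds and odds_to_prob are chosen here for illustration; odds are stored as a single number (probability for divided by probability against) rather than as a "1 : n" pair.

```python
def prob_to_odds(p):
    """Convert a probability to odds in favor (p must be less than 1)."""
    return p / (1 - p)


def odds_to_prob(odds):
    """Convert odds in favor back to a probability."""
    return odds / (1 + odds)


prior_odds = prob_to_odds(0.001)   # 0.001 / 0.999, i.e. 1 : 999
likelihood_ratio = 0.99 / 0.05     # P(Pos given D) / P(Pos given no D)

# The whole update is one multiplication; P(Pos) is never computed.
posterior_odds = prior_odds * likelihood_ratio
posterior_prob = odds_to_prob(posterior_odds)

print(f"likelihood ratio      = {likelihood_ratio:.1f}")
print(f"posterior odds        = 1 : {1 / posterior_odds:.1f}")
print(f"posterior probability = {posterior_prob:.4f}")
```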
The odds form makes a deep idea visible: belief updating is multiplicative and accumulates. If a second test also comes back positive, and its result is independent of the first once you know whether the person has the disease, you do not start over. You take the posterior odds from the first test as the new prior odds and multiply by the likelihood ratio again. Today's posterior becomes tomorrow's prior. This is the engine behind Bayesian inference: evidence arrives in pieces, and each piece nudges your odds up or down by its likelihood ratio. A likelihood ratio above 1 supports the hypothesis, below 1 argues against it, and exactly 1 means the evidence is uninformative and leaves your belief unchanged.
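Here is a minimal sketch of that accumulation. It rests on assumptions that go beyond the worked example: a hypothetical second test with the same 99% sensitivity and 5% false-positive rate as the first, whose result is independent of the first given the true disease status. The function name update_odds is made up for illustration.

```python
def update_odds(prior_odds, likelihood_ratios):
    """Multiply prior odds by each likelihood ratio in turn.

    Assumes the pieces of evidence are conditionally independent
    given the hypothesis, so their likelihood ratios simply multiply.
    """
    odds = prior_odds
    for lr in likelihood_ratios:
        odds *= lr  # today's posterior odds become tomorrow's prior odds
    return odds


prior_odds = 0.001 / 0.999      # 1 : 999, from the worked example
lr_positive = 0.99 / 0.05       # likelihood ratio of one positive result

# Hypothetical scenario: two positive results in a row.
odds_after_one = update_odds(prior_odds, [lr_positive])
odds_after_two = update_odds(prior_odds, [lr_positive, lr_positive])

prob_after_one = odds_after_one / (1 + odds_after_one)
prob_after_two = odds_after_two / (1 + odds_after_two)

print(f"after one positive:  {prob_after_one:.4f}")
print(f"after two positives: {prob_after_two:.4f}")
```

Under these assumptions the second positive lifts the probability from about 1.9% to about 28%: a large move, yet still short of even odds, because the prior was so small to begin with.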
Honest caveats: garbage priors, and what "belief" means
Bayes' theorem is a theorem — it is exactly true, a rearrangement of definitions, with no approximation hiding inside it. But its output is only as trustworthy as its inputs. If your prior is badly chosen or your likelihoods are wrong, the posterior will be confidently wrong: garbage in, garbage out. The formula is an honest bookkeeper, not an oracle. It tells you how to combine a prior and evidence coherently; it cannot tell you that your prior was sensible in the first place. Where the prior comes from — a known base rate, a previous experiment, or a frank judgement call — is a real modelling decision you must own.
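To make "only as trustworthy as its inputs" concrete, the sketch below reruns the medical-test update with the test characteristics held fixed and only the prior changed. The priors 0.01 and 0.1 are hypothetical alternatives introduced here for comparison; only 0.001 comes from the worked example.

```python
def posterior_given_positive(prior, sensitivity=0.99, false_positive_rate=0.05):
    """P(D given Pos) for a given prior P(D), via Bayes' theorem."""
    evidence = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / evidence


# 0.001 is the base rate from the worked example; the other two
# priors are hypothetical, chosen only to show the sensitivity.
candidate_priors = [0.001, 0.01, 0.1]
results = {p: posterior_given_positive(p) for p in candidate_priors}

for prior, posterior in results.items():
    print(f"prior {prior:<6} -> posterior {posterior:.4f}")
```

The same positive test yields roughly 2%, 17%, or 69% depending on the prior alone, which is why the choice of prior is a modelling decision you have to be able to defend.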
There is also a quiet philosophical commitment worth naming. To put a prior probability on a hypothesis like "this coin is biased", you are treating probability as a degree of belief, not just a long-run frequency. A strict frequentist would object that the coin either is or is not biased: there is no repeatable experiment in which the hypothesis is true 30% of the time. The Bayesian reply is that this is the same conditional-probability machinery, applied to beliefs, and that the discipline of starting from base rates keeps it grounded. The two viewpoints often agree on numbers when data is plentiful and disagree most when data is scarce and the prior does heavy lifting; neither is universally "correct", and being clear about which one you are using is part of being honest.