What Does a Probability Even Mean?

A number we can compute but cannot quite point at

By now you can do real work. You can lay out a sample space, carve it into events, check the axioms, and when the outcomes are equally likely you can divide favourable by total and read off a number. A fair die shows a six with probability P(six) = 1/6 ≈ 0.167. That much is settled. But here is the question the previous four guides quietly stepped over: when you write P(six) = 1/6, *what are you saying is true?* The die is sitting on the table right now, not rolling. Where is the 1/6?

This is not a trick question, and it is not idle philosophy. The statements P(rain tomorrow) = 0.7, P(this coin lands heads) = 0.5, and P(the defendant is guilty) = 0.9 all use the same symbol and look identical on the page, yet they seem to be about wildly different kinds of thing: a one-off future, a repeatable experiment, and a fact that is already either true or false. A probability is not something you can see or weigh. So before we build the whole subject on top of it, it is worth asking honestly what the number *means*. It turns out there are three respectable answers, and the grown-up view is that they are three lenses on one idea, not three rivals fighting to the death.

Answer one: counting the symmetric cases

The oldest answer is the classical one you already used. If a setup breaks into outcomes that are symmetric, meaning interchangeable by the very design of the experiment, then each deserves the same probability, and P(A) is just the fraction of outcomes that make A happen. A fair die has six faces, and nothing in its design favours one over another, so each gets 1/6. This is beautiful when it applies: you need no experiments at all, just a careful count, which is why the classical definition powers every dice, card, and lottery calculation.
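To make the counting concrete, here is a minimal sketch of the classical recipe in Python. The helper name `classical_probability` is made up for this illustration, and the code simply assumes that every outcome you hand it is equally likely; it does nothing to check that assumption.

```python
from fractions import Fraction
from itertools import product


def classical_probability(outcomes, event):
    """Fraction of the (assumed equally likely) outcomes for which event is true."""
    outcomes = list(outcomes)
    favourable = sum(1 for outcome in outcomes if event(outcome))
    return Fraction(favourable, len(outcomes))


# One fair die: six symmetric faces.
die = range(1, 7)
p_six = classical_probability(die, lambda face: face == 6)

# Two fair dice: 36 symmetric ordered pairs.
two_dice = product(die, repeat=2)
p_sum_seven = classical_probability(two_dice, lambda pair: sum(pair) == 7)

print(p_six)        # 1/6
print(p_sum_seven)  # 1/6
```

Using exact fractions keeps the answer as a count over a count, which is all the classical definition ever is.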

But be honest about the catch: this answer leans on a phrase that smuggles probability back in. "Equally likely" already *is* a probability statement. So the classical definition cannot define probability from scratch; it needs symmetry handed to it for free. And most of the world is not symmetric. There is no set of equally likely faces for "rain tomorrow," no interchangeable cases for whether a new drug works. The classical lens is sharp but narrow: it sees beautifully into games of chance designed to be symmetric, and goes blind the moment symmetry runs out. We need answers that work when nothing is symmetric.

Answer two: the long-run frequency

The frequentist answer says: P(A) is the fraction of times A would happen if you could repeat the experiment forever under the same conditions. P(heads) = 0.5 means that over a huge number of tosses, roughly half land heads, and the *roughly* tightens as the number of tosses grows. This is the meaning that feels most physical, because you can almost see it. Toss a coin 10 times and you might get 7 heads (70%); toss it 10,000 times and the proportion will almost certainly sit much closer to 50%. The probability is the value the running proportion is settling toward.

And this is not a hope; it is a theorem. The law of large numbers proves that over independent repeats of the same experiment, the proportion of repeats in which A occurs converges to P(A). That is precisely what makes the frequentist picture more than wishful thinking: the long-run frequency really does home in on a fixed number. So when symmetry fails, you can still pin down P(A) the hard way, by repeating and watching. Insurance companies, casinos, and quality-control labs live in exactly this world.
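You can watch the running proportion settle with a short simulation. This is only a sketch: the function name `running_proportion`, the checkpoints, and the seed are choices made for this illustration, and a pseudo-random generator stands in for a real coin.

```python
import random


def running_proportion(n_tosses, p_heads=0.5, seed=0):
    """Simulate coin tosses and record the proportion of heads at a few checkpoints."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    checkpoints = (10, 100, 1_000, 10_000, 100_000)
    heads = 0
    proportions = {}
    for n in range(1, n_tosses + 1):
        if rng.random() < p_heads:
            heads += 1
        if n in checkpoints:
            proportions[n] = heads / n
    return proportions


for n, proportion in running_proportion(100_000).items():
    print(f"{n:>7} tosses: proportion of heads = {proportion:.4f}")
```

The early checkpoints can wander well away from 0.5, while the later ones should hug it more and more tightly. Change the seed and the early values will change, but the late ones should barely move.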

The frequentist lens has its own honest limit, though. It only makes sense for things you can repeat "under the same conditions." What is the long-run frequency of *this specific* election, or of life existing on a particular planet? There is only one of each; you cannot rerun it. Pushed hard, the frequentist must either refuse to assign a probability to one-off events, or appeal to an imaginary infinite sequence that never actually happens. That gap is exactly where the third answer steps in.

Answer three: a measured degree of belief

The subjective or Bayesian answer says: P(A) is *your* degree of belief that A is true, measured on a scale from 0 (certain it is false) to 1 (certain it is true). This is the only answer that comfortably handles P(it rained in this city on this exact date last year) — a fact already fixed, unknown to you. The event is not random; *your information* is incomplete, and the probability measures that incompleteness. When a forecaster says 70% chance of rain, this is the honest reading: given everything the models know, a 0.7 confidence is warranted.

"Belief" sounds dangerously loose, so here is the discipline that saves it. Your beliefs are only allowed to count as probabilities if they obey the very same Kolmogorov axioms. The justification is a *Dutch book*: if your stated odds violate the rules — say you'd accept bets implying P(A) = 0.6 and P(not A) = 0.6 at once — a clever opponent can construct a set of bets you find each fair, yet which together guarantee you lose money no matter what happens. Coherent belief, belief that cannot be milked this way, *must* satisfy the axioms. That is why a degree of belief is a genuine probability and not just a mood.
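The arithmetic of that sure loss is easy to check. In the sketch below, a "ticket" on an event pays 1 if the event happens and 0 otherwise, and you are willing to buy each ticket at the price equal to your stated probability. The ticket framing and the function name `net_gain` are assumptions made for this illustration.

```python
from fractions import Fraction


def net_gain(price_a, price_not_a, a_happens, stake=1):
    """Net result of buying one ticket on A and one on not-A at your own prices."""
    cost = (price_a + price_not_a) * stake
    payout_a = stake if a_happens else 0
    payout_not_a = 0 if a_happens else stake
    return payout_a + payout_not_a - cost


incoherent = (Fraction(6, 10), Fraction(6, 10))  # P(A) + P(not A) = 1.2
coherent = (Fraction(6, 10), Fraction(4, 10))    # P(A) + P(not A) = 1

for label, (p_a, p_not_a) in [("incoherent", incoherent), ("coherent", coherent)]:
    for a_happens in (True, False):
        result = net_gain(p_a, p_not_a, a_happens)
        print(f"{label:>10}, A happens = {a_happens}: net = {result}")
```

With the incoherent prices you pay 1.2 for two tickets that together always return exactly 1, so you lose 1/5 whichever way A turns out. With coherent prices the pair of bets breaks even. If your prices summed to less than 1, the opponent would simply take the other side and sell you nothing, buying both tickets from you instead.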

There is a price, of course. Two reasonable people with the same training but different background information can assign different probabilities to the same claim, and neither is simply "wrong" — they hold different priors. To a frequentist this looks like an admission of arbitrariness; to a Bayesian it is just honesty about the fact that probability is relative to what you know. The saving grace, which the next rung explores, is that evidence pulls disagreeing priors together: feed both people the same data through Bayes' theorem and their conclusions usually converge.
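A small numerical sketch shows the pull of shared evidence. It assumes one particular setup that is not part of the discussion above: both people are estimating a coin's chance of heads, each holds a Beta prior (a standard choice for which the updated mean has a simple formula), and the data are invented to be 60% heads. The two priors and the name `posterior_mean` are hypothetical.

```python
def posterior_mean(prior_a, prior_b, heads, tosses):
    """Mean of the Beta(prior_a + heads, prior_b + tails) posterior."""
    return (prior_a + heads) / (prior_a + prior_b + tosses)


sceptic = (2, 8)   # hypothetical prior, mean 0.2
believer = (8, 2)  # hypothetical prior, mean 0.8

for tosses in (0, 10, 100, 1000):
    heads = (6 * tosses) // 10  # invented data: exactly 60% heads
    s = posterior_mean(*sceptic, heads, tosses)
    b = posterior_mean(*believer, heads, tosses)
    print(f"{tosses:>5} tosses: sceptic {s:.3f}, believer {b:.3f}, gap {b - s:.3f}")
```

Before any data the two disagree by 0.6. In this setup the gap works out to 6 / (10 + tosses), so it shrinks steadily as the shared evidence accumulates, even though neither person ever changed their prior.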

Why the three agree, and how to read a number

Here is the reconciling insight that ties the whole interpretations question together. All three answers obey the *same* axioms, so once the numbers are fixed, every theorem in this entire course — every formula you will ever prove — holds identically no matter which interpretation you favour. The math does not care. The interpretations only disagree about how to *assign* the starting numbers and how to *describe* what they mean, not about how to *manipulate* them. That is why this debate, fierce as it gets, almost never changes a calculation.

And the three lenses tend to point the same way in practice. When symmetry exists, classical counting and long-run frequency and a well-informed belief all land on 1/6 for a fair die. When you have data, the frequency and a coherent belief updated on that data agree. They are three routes to the number, suited to three situations: symmetry (classical), repetition (frequentist), incomplete information about a one-off (Bayesian). A fluent reader of probability silently picks the lens the problem calls for.

It helps to see all three side by side for one statement. Reading a single probability through each lens makes the differences concrete — and shows they are differences of meaning, not of arithmetic.

Statement:  P(A) = 0.7

Classical    : 7 out of 10 symmetric outcomes make A happen.
               (Needs equally-likely cases. Often unavailable.)

Frequentist  : repeat forever -> A happens ~70% of the time.
               (Needs a repeatable experiment. Law of large numbers.)

Bayesian     : given what I know, my coherent belief in A is 0.7.
               (Works for one-off events. Must obey the axioms.)

All three obey the SAME axioms => every later theorem is unaffected.

One number, three honest readings — disagreeing on meaning, agreeing on the math.

Reading the dial: what 0, 1, and the middle really say

Whatever lens you use, a probability lives on a fixed dial from 0 to 1, and it pays to read the dial honestly. P(A) = 0 does not always mean "impossible," and P(A) = 1 does not always mean "guaranteed." In a finite world they do — but the moment you allow infinitely many outcomes, an event can have probability 0 yet still be possible. Pick a real number uniformly between 0 and 1: the probability of hitting *exactly* 0.5 is 0, because one point among a continuum has no width, yet some single number does come up. This is the first hint of a deep idea you will meet later: a density is not a probability, and a single point can carry probability zero without being forbidden.
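The "no width" argument can be written out in one line. As a sketch, let X be the number picked uniformly from [0, 1] (the symbols X and ε are introduced here for illustration). The event of hitting exactly 0.5 sits inside every small interval around 0.5, and the probability of such an interval is just its length:

```latex
P(X = 0.5) \;\le\; P\bigl(0.5 - \varepsilon \le X \le 0.5 + \varepsilon\bigr) \;=\; 2\varepsilon
\qquad \text{for every } 0 < \varepsilon \le 0.5 .
```

Since this holds for every ε however small, the only value left for P(X = 0.5) is 0, even though 0.5 is a perfectly possible outcome.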

A small but powerful habit: translate probabilities into *odds* when you want to compare or update beliefs. P(A) = 0.7 is the same as 7-to-3 odds in favour, or 0.7/0.3 ≈ 2.33-to-1. Odds make Bayesian reasoning vivid, because evidence multiplies the odds by a likelihood ratio rather than fiddling with the probability directly. You do not have to choose between the probability and the odds — they are the same information in two outfits — but switching outfits is often what makes a hard update feel easy.
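The switch between the two outfits is a pair of one-line formulas. The sketch below uses made-up helper names (`to_odds`, `to_probability`, `update`) and a hypothetical likelihood ratio of 3, meaning evidence that is three times as likely if A is true as if it is false.

```python
def to_odds(p):
    """Convert a probability (strictly below 1) to odds in favour."""
    return p / (1 - p)


def to_probability(odds):
    """Convert odds in favour back to a probability."""
    return odds / (1 + odds)


def update(p, likelihood_ratio):
    """Update a probability by multiplying its odds by a likelihood ratio."""
    return to_probability(to_odds(p) * likelihood_ratio)


prior = 0.7
print(round(to_odds(prior), 2))      # 2.33
print(round(update(prior, 3.0), 3))  # 0.875
```

Odds of 7-to-3 multiplied by 3 become 7-to-1, which is a probability of 7/8. Doing the same update directly on the probability scale takes noticeably more algebra.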

That is the ground floor finished. You have a probability space: a sample space of outcomes, an algebra of events, and a P that obeys the axioms. Now you also know what the P actually claims, in all three honest senses. Everything above this point is built from these pieces. The very next rung takes up the most important question the dial cannot answer on its own: how should a probability *change* when you learn something new?