Equally Likely Outcomes and Classical Probability

The oldest formula in probability

You already have the machinery from the earlier guides: a sample space is the set of all possible outcomes of a random experiment, an event is a subset of it, and the axioms tell us probability is a non-negative, normalized, additive way of measuring those subsets. The classical definition of probability is what happens when you add one extra assumption on top: that every individual outcome is equally likely. Roll a fair die and the six faces are interchangeable; deal a card from a well-shuffled deck and all 52 are on equal footing. Once you believe that, probability has nowhere left to hide — it has to be counting.

Concretely: if the sample space has N outcomes, all equally likely, and an event A contains k of them, then P(A) = k / N. Favourable cases over total cases. For a fair die, P(roll is even) = 3 / 6 = 1/2, because three of the six faces (2, 4, 6) are favourable. That's it. The whole reason this is worth a guide is that the simplicity is deceptive: the formula is only correct when the equally-likely assumption truly holds, and choosing the right N hides almost all the difficulty.

P(A) = (number of outcomes in A) / (number of outcomes in S)
     = k / N        -- valid ONLY when all N outcomes are equally likely

The classical formula, with its one load-bearing condition stated out loud.
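
As a minimal sketch of the formula in code, the ratio can be computed exactly with Python's fractions module. The helper name classical_probability is an illustrative assumption, not something defined in the earlier guides, and the function simply trusts you that the outcomes are equally likely.

```python
from fractions import Fraction

def classical_probability(event, sample_space):
    """Return k/N, ASSUMING every outcome in sample_space is equally likely."""
    sample_space = set(sample_space)
    if not sample_space:
        raise ValueError("sample space must be non-empty")
    event = set(event) & sample_space  # an event is a subset of the sample space
    return Fraction(len(event), len(sample_space))

die = {1, 2, 3, 4, 5, 6}
even = {face for face in die if face % 2 == 0}
p_even = classical_probability(even, die)
print(p_even)  # 1/2
```

Nothing in the code can verify the equally-likely condition for you; that part stays a modelling decision.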

Why it's a theorem, not a new axiom

It's tempting to treat "favourable over total" as the definition of probability — historically that's exactly how Laplace and his contemporaries thought. But from the modern viewpoint of the previous guide, the classical rule is a consequence of the axioms, not a competitor to them. Suppose each of the N outcomes has the same probability, call it p. The outcomes are elementary and mutually exclusive, so by additivity their probabilities sum to P(S). Normalization forces P(S) = 1, hence N · p = 1, so p = 1/N. An event A with k outcomes then has P(A) = k · (1/N) = k/N by additivity again. The classical formula falls straight out — no new assumption beyond "equally likely" was needed.
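
The derivation above can be replayed numerically as a small sanity check. This is only a sketch for one assumed case (N = 6, a fair die, with A the even faces); the variable names are illustrative.

```python
from fractions import Fraction

N = 6                                  # number of equally likely outcomes (fair die)
p = Fraction(1, N)                     # forced by normalization: N * p = 1
weights = {outcome: p for outcome in range(1, N + 1)}

total = sum(weights.values())          # additivity over all elementary outcomes
A = {2, 4, 6}                          # an event with k = 3 outcomes
p_A = sum(weights[o] for o in A)       # additivity over the outcomes in A

print(total)  # 1
print(p_A)    # 1/2
```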

This matters because it tells you the limits of the rule precisely. The classical definition lives entirely inside one assumption: a finite sample space whose outcomes you have good reason to call equally likely. When that symmetry is real — fair dice, shuffled cards, a roulette wheel — counting is exact and beautiful. When it isn't, the formula gives confident nonsense. Probability theory had to grow beyond Laplace exactly because most real experiments have no such symmetry, which is what the broader interpretations of probability are about.

Choosing the sample space is the whole game

Here's the classic trap. Roll two distinguishable dice and ask for the probability the sum is 7. A careless solver lists the possible sums — 2, 3, 4, all the way to 12, eleven values — and says P(sum = 7) = 1/11. Wrong. Those eleven sums are not equally likely. A sum of 2 happens only one way (1,1), but a sum of 7 happens six ways: (1,6), (2,5), (3,4), (4,3), (5,2), (6,1). The honest sample space is the 36 ordered pairs, and those genuinely are equally likely if the dice are fair. So P(sum = 7) = 6/36 = 1/6. The lesson: you must choose a sample space whose outcomes really are equally likely before you start counting.
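
To see the trap concretely, here is a short sketch that enumerates the 36 ordered pairs and counts how many ways each sum can occur. It assumes two fair six-sided dice; the names pairs, ways and p_sum are just illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

pairs = list(product(range(1, 7), repeat=2))   # the 36 equally likely ordered pairs
ways = Counter(a + b for a, b in pairs)        # number of pairs producing each sum
p_sum = {s: Fraction(ways[s], len(pairs)) for s in sorted(ways)}

print(len(ways))          # 11 distinct sums, but they are not equally likely
print(ways[2], ways[7])   # 1 6
print(p_sum[7])           # 1/6
```

The eleven sums all show up as keys, yet their counts range from 1 to 6, which is exactly why 1/11 is the wrong answer.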

Notice what we did: we refined the sample space until the outcomes became interchangeable, then we counted. This is exactly why the next rung of this ladder is about counting. Once you commit to an equally-likely sample space, every classical probability becomes a ratio of two counts, and the art is counting the favourable cases without missing or double-counting. The fundamental counting principle (multiply the choices at each independent stage) and combinations (count unordered selections) are the workhorses you'll meet there. For now, just hold the discipline: identify N first, identify k second, and make sure the N outcomes are equally likely.

1. State the experiment and write down a sample space whose outcomes you can honestly defend as equally likely (refine it if the obvious description isn't).
2. Count the total number of outcomes, N.
3. Count the favourable outcomes, k: the ones in the event you care about.
4. Report P(A) = k / N, and sanity-check that it lands between 0 and 1.
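
The four steps above can be sketched in code. The example event (drawing an ace from a well-shuffled 52-card deck) is chosen here purely for illustration, and the rank and suit labels are assumptions.

```python
from fractions import Fraction
from itertools import product

# Step 1: a sample space with equally likely outcomes (one card, shuffled deck).
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = list(product(ranks, suits))

# Step 2: count the total number of outcomes.
N = len(deck)

# Step 3: count the favourable outcomes (the card is an ace).
k = sum(1 for rank, suit in deck if rank == "A")

# Step 4: report k / N and sanity-check the range.
p_ace = Fraction(k, N)
assert 0 <= p_ace <= 1
print(N, k, p_ace)  # 52 4 1/13
```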

When outcomes are not countable: geometric probability

What if there are infinitely many equally likely outcomes — say you drop a point at random onto a one-metre stick? You can't count favourable cases over total cases, because both are infinite. The natural fix keeps the spirit of "equally likely" but swaps counting for measuring. In geometric probability, the probability of landing in a region is the size of that region divided by the size of the whole space — length over length, area over area, volume over volume. If the point lands uniformly on the stick, P(it lands in the first 30 cm) = 30/100 = 0.3. Same ratio idea as k/N, just with a continuous notion of size.
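
Here is a minimal sketch of the length-over-length ratio for the stick example. The function name interval_probability and its default arguments are assumptions for illustration, and it expects integer (or Fraction) endpoints in centimetres.

```python
from fractions import Fraction

def interval_probability(a, b, lo=0, hi=100):
    """P(point lands in [a, b]) for a point uniform on [lo, hi], lengths in cm."""
    a, b = max(a, lo), min(b, hi)      # clip the interval to the stick
    favourable = max(b - a, 0)         # an empty overlap has length 0
    return Fraction(favourable, hi - lo)

p_first_30 = interval_probability(0, 30)
print(p_first_30)  # 3/10
```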

Geometric probability is your first taste of why a single point can have probability zero. The chance the dropped point lands at exactly the 50 cm mark is 0, because a single point has length 0, and 0 / 100 = 0. That is not a contradiction and it does not mean the event is impossible — the point does land somewhere. It's a genuine feature of continuous models that we'll meet again, sharply, when we get to densities: a probability of zero is not the same as "can't happen". The later guides on continuous random variables build directly on this picture.
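
One way to build intuition for the zero is to shrink an interval around the 50 cm mark and watch its probability shrink with it. This is a sketch under the same uniform-stick assumption; p_within is a hypothetical helper name.

```python
from fractions import Fraction

STICK_CM = 100

def p_within(eps):
    """P(|X - 50| <= eps) for X uniform on [0, 100], assuming 0 <= eps <= 50."""
    return Fraction(2 * eps) / STICK_CM   # interval length 2*eps over total length

for eps in (Fraction(10), Fraction(1), Fraction(1, 100), Fraction(0)):
    print(eps, p_within(eps))
# 10 1/5
# 1 1/50
# 1/100 1/5000
# 0 0
```

Every interval of positive width has positive probability; only the degenerate interval of width 0, the single point, gets probability 0.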

The misconceptions that come bundled with it

The biggest mistake is forcing equal likelihood onto outcomes that aren't symmetric. "Either I win the lottery or I don't, so it's 50-50" is the canonical disaster: the two outcomes exist, but there is no symmetry making them equal, so k/N with N = 2 is simply false. The classical rule is a privilege you earn by demonstrating symmetry, never a default. Likewise, tomorrow it either rains or it doesn't, but listing those two outcomes tells you nothing about the probability of rain.
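
To put a number on the lottery example, suppose (purely as an assumption, since no particular lottery is specified here) that a ticket picks 6 numbers out of 49 and every combination is equally likely to be drawn. The symmetric outcomes are the tickets, not "win" and "lose".

```python
from fractions import Fraction
from math import comb

N = comb(49, 6)          # equally likely combinations in the assumed 6-from-49 draw
p_win = Fraction(1, N)   # exactly one combination matches your ticket

print(N)      # 13983816
print(p_win)  # 1/13983816
```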

A second confusion is mixing up "equally likely" within a single trial with some kind of balancing across trials. Because each spin of a fair roulette wheel has equally likely outcomes, people wrongly conclude that after five reds a black is "due" to balance things out. This is the gambler's fallacy, and it's a misreading of what equal likelihood says. Equally likely outcomes within one trial say nothing about memory across trials; a fair wheel has none. The honest statement is that the long-run fraction of reds approaches the probability of red, not that short runs must even out. The law of large numbers, several rungs ahead, makes that distinction precise.
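
A quick simulation makes the "no memory" point tangible. This sketch assumes a European-style wheel (18 red, 18 black, 1 green) and independent spins; it compares the overall fraction of reds with the fraction of reds on spins that immediately follow five reds in a row.

```python
import random

rng = random.Random(0)                 # fixed seed so the run is repeatable
P_RED = 18 / 37                        # assumed wheel: 18 red, 18 black, 1 green
spins = [rng.random() < P_RED for _ in range(400_000)]  # True means red

# Outcomes of every spin whose five predecessors were all red.
after_streak = [spins[i] for i in range(5, len(spins)) if all(spins[i - 5:i])]

overall = sum(spins) / len(spins)
conditional = sum(after_streak) / len(after_streak)
print(round(overall, 3), round(conditional, 3))
```

Both fractions should sit close to 18/37 (about 0.486). If black were really "due" after a streak, the second number would be noticeably lower than the first.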

A third, subtler trap: the classical rule needs a finite N. There is no way to pick a positive integer "at random" so that every integer is equally likely. If each had the same probability p, then either p = 0 (and the probabilities sum to 0, not 1) or p > 0 (and the sum is infinite). A countably infinite sample space has no uniform distribution. That's why "pick a random number" always needs a stated range, and why geometric probability measures length or area over a bounded region instead of counting points. Symmetry is powerful, but it needs a finite total to share out.
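
The two-case argument can be checked with a few lines of arithmetic. In this sketch, first_n_total is a hypothetical helper giving the total probability the integers 1 to n would carry if each had the same probability p; the value p = 1/1000 is an arbitrary choice.

```python
from fractions import Fraction

def first_n_total(p, n):
    """Total probability of the integers 1..n if each had the same probability p."""
    return n * p

p = Fraction(1, 1000)                       # any fixed p > 0 behaves the same way
print(first_n_total(p, 1000))               # 1
print(first_n_total(p, 1001))               # 1001/1000, already more than 1
print(first_n_total(Fraction(0), 10**9))    # 0, and it stays 0 for every n
```

With p > 0 the running total passes 1 as soon as n exceeds 1/p, and with p = 0 it never leaves 0, so no choice of p normalizes.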