Mathematics 1933

Foundations of the Theory of Probability (Grundbegriffe der Wahrscheinlichkeitsrechnung)

Andrey Kolmogorov

Probability is just measure: a few axioms, and chance becomes rigorous mathematics.

In depth · the introduction

What is the chance of something? For nearly three centuries that question had no clean answer, until Kolmogorov noticed that probability is just a way of measuring, like area or weight.

The big idea

Picture the set of everything that could happen — every face a die could land on, every path a particle could take. Call that whole set E. An 'event' is just a chunk of E: 'the die shows an even number' is the chunk {2, 4, 6}. Kolmogorov's insight was that the probability of an event behaves exactly like the size of that chunk.

So he wrote down a handful of simple rules. No probability is negative. The probability that something in E happens is exactly 1 — the whole is the whole. And if two events can't both happen at once, the chance of one-or-the-other is just their chances added together. From those few rules — and one more for handling infinitely many events — every law of probability follows. He had turned chance into a branch of measurement, no looser than geometry.
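The die example can be turned into a small sketch. It assumes a fair die, so that the "size" of a chunk is simply its share of the six faces; the names `E`, `prob`, `even` and `one` are illustrative and not Kolmogorov's.

```python
from fractions import Fraction

# Sample space E: the six faces of a die (fairness is an assumption of this sketch).
E = frozenset({1, 2, 3, 4, 5, 6})

def prob(event):
    """Probability of an event (a subset of E), measured as its share of E."""
    event = frozenset(event)
    assert event <= E, "an event must be a chunk of E"
    return Fraction(len(event), len(E))

even = {2, 4, 6}
one = {1}

print(prob(even))                                   # 1/2
print(prob(E))                                      # 1
# 'even' and 'one' cannot both happen, so their chances add.
print(prob(even | one) == prob(even) + prob(one))   # True
```

Exact fractions are used instead of floats so that the additivity check is an equality rather than an approximation.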

How it came about

By 1900 probability was a scandal at the heart of mathematics: gamblers and physicists used it daily and it plainly worked, yet no one could say precisely what a 'probability' was, and clever paradoxes kept springing up. David Hilbert, listing the great unsolved problems of the new century, folded it into his sixth: treat the mathematical parts of physics axiomatically, with the theory of probability named in the first rank alongside mechanics.

The tools arrived from an unexpected place. Henri Lebesgue had built a powerful new theory of how to 'measure' the size of complicated sets, and Maurice Fréchet had stripped it of its geometric trappings so it could measure anything. The young Soviet mathematician Andrey Kolmogorov saw that probability was simply measure wearing a disguise. In 1933 he published a slim 62-page book, in German, that made the identification exact. He was the first to say plainly: probability theory is part of measure theory.

Why it mattered

Before this book, deep results in probability were hard to trust because the ground beneath them was vague. Afterwards, every theorem rested on the same secure footing as the rest of mathematics, and the entire toolkit of Lebesgue integration came along for free. It also tamed the infinite: Kolmogorov showed how to give a precise probability to things like 'a random curve traced out forever,' which is what made the modern theory of random processes — and with it modern finance, statistics, and machine learning — possible. He was careful, too: he openly credited Lebesgue, Fréchet and others, and admitted his axioms say nothing about what a probability really means.

A way to picture it

Think of pouring exactly one litre of water over a map. Every region's 'probability' is just how much water lands on it. No region can hold negative water; the whole map holds the full litre (that is P(E) = 1); and the water on two regions that don't overlap, taken together, is simply the sum of the water on each. Conditional probability is asking: of the water that fell on this county, what fraction sits in this town? That is the whole of Kolmogorov's idea: probability is liquid you measure, and his axioms are just the rules water already obeys.
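The county-and-town question can be sketched in a few lines. The map below is entirely hypothetical (four made-up towns and made-up amounts of water that total one litre), and `measure` and `conditional` are names chosen for this illustration.

```python
from fractions import Fraction

# Hypothetical map: litres of water landing on each town, one litre in total.
water = {
    "town_a": Fraction(1, 10),
    "town_b": Fraction(3, 10),
    "town_c": Fraction(2, 10),
    "town_d": Fraction(4, 10),
}
county = {"town_a", "town_b"}  # a region made up of two towns

def measure(region):
    """Total water on a region (a set of towns)."""
    return sum((water[t] for t in region), Fraction(0))

def conditional(a, b):
    """Share of the water on b that also lies on a: P(a | b) = P(a and b) / P(b)."""
    if measure(b) == 0:
        raise ValueError("cannot condition on a region that holds no water")
    return measure(a & b) / measure(b)

print(measure(set(water)))               # 1
print(conditional({"town_b"}, county))   # 3/4
```

The guard in `conditional` reflects that the ratio only makes sense when the region being conditioned on holds some water.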

Where it sits

Probability began at the gambling table — Pascal and Fermat in the 1650s, then Jacob Bernoulli's law of large numbers and Laplace's grand synthesis. But its foundations stayed shaky for centuries. Kolmogorov's book is the hinge: classical probability before, measure-theoretic probability after. The framework runs straight into the modern world — into Shannon's information theory (also in this Library), into the random walks that price options, and into the convergence guarantees behind today's AI. Rival foundations were proposed — von Mises's frequencies, de Finetti's subjective bets — but it is Kolmogorov's triple (E, F, P) that every textbook now opens with.

The original document

Preface — measure and probability

A. Kolmogorov · Foundations of the Theory of Probability · 1933 · Preface (dated Easter 1933, Moscow)

After Lebesgue's investigations, the analogy between the measure of a set and the probability of an event, as well as between the integral of a function and the mathematical expectation of a random variable, was clear.

This analogy could be extended further; for example, many properties of independent random variables are completely analogous to corresponding properties of orthogonal functions. But in order to base probability theory on this analogy, one still needed to liberate the theory of measure and integration from the geometric elements still in the foreground with Lebesgue. This liberation was accomplished by Fréchet.

The axioms (Chapter I)

Chapter I · Elementary Theory of Probability · §1 Axioms

Let E be a set of elements, which we shall call elementary events, and F a set of subsets of E; the elements of the set F will be called random events. Kolmogorov begins with five axioms concerning E and F:

I. F is a field of sets.

II. F contains the set E.

III. To each set A from F is assigned a non-negative real number P(A). This number P(A) is called the probability of the event A.

IV. P(E) = 1.

V. If A and B are disjoint, then P(A + B) = P(A) + P(B). (Here A + B denotes the sum, that is the union, of A and B.)

A system of sets F, together with a definite assignment of numbers P(A) satisfying Axioms I–V, is called a field of probability.
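For a finite example, Axioms I–V can be checked mechanically. The sketch below is only an illustration: the field F (the empty set, the odd faces, the even faces, and E) and the non-uniform numbers in `P` are assumptions chosen for the example, and "field" is read as a family closed under union, intersection and difference.

```python
from fractions import Fraction
from itertools import combinations

E = frozenset({1, 2, 3, 4, 5, 6})
odd, even = frozenset({1, 3, 5}), frozenset({2, 4, 6})

# P maps each random event in F to a number; the values are illustrative.
P = {
    frozenset(): Fraction(0),
    odd: Fraction(1, 3),
    even: Fraction(2, 3),
    E: Fraction(1),
}
F = set(P)

def is_field(F):
    """Axiom I: F is closed under union, intersection and difference."""
    return all(
        (a | b) in F and (a & b) in F and (a - b) in F
        for a in F for b in F
    )

def satisfies_axioms(F, P, E):
    """Check Axioms I-V for a finite family F with assignment P."""
    if not is_field(F):                      # Axiom I
        return False
    if E not in F:                           # Axiom II
        return False
    if any(P[a] < 0 for a in F):             # Axiom III
        return False
    if P[E] != 1:                            # Axiom IV
        return False
    return all(                              # Axiom V, over disjoint pairs
        P[a | b] == P[a] + P[b]
        for a, b in combinations(F, 2) if not (a & b)
    )

print(satisfies_axioms(F, P, E))             # True
```

Changing `P[even]` to `Fraction(1, 2)` makes the check fail at Axiom V, since the odd and even faces would no longer add up to P(E) = 1.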

The axiom of continuity (Chapter II)

Chapter II · Infinite Probability Fields · §1 Axiom of Continuity

VI. For a decreasing sequence of events A₁ ⊇ A₂ ⊇ ⋯ of F, for which the product (intersection) of all the Aₙ is empty, the following equation holds: lim P(Aₙ) = 0 as n → ∞.

This is the axiom of continuity. Given the first five axioms, it is equivalent to countable additivity. Kolmogorov is candid about its status:

Since the new axiom is essential only for infinite fields of probability, it is hardly possible to explain its empirical meaning … Infinite fields of probability occur only as idealized models of real random processes. This understood, we limit ourselves arbitrarily to models that satisfy Axiom VI.
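The equivalence with countable additivity stated before the quotation can be sketched in a few lines. The symbols A_k, A and R_n are introduced here for illustration, and ∪ is written for what the axioms call the sum. Suppose A_1, A_2, … are pairwise disjoint events of F whose union A also belongs to F, and let R_n be what is left of A after removing the first n of them.

```latex
\begin{align*}
R_n &= A \setminus (A_1 \cup \dots \cup A_n), \qquad
R_1 \supseteq R_2 \supseteq \cdots, \qquad \bigcap_{n} R_n = \varnothing, \\
P(A) &= \sum_{k=1}^{n} P(A_k) + P(R_n)
  && \text{(Axiom V, applied finitely often)}, \\
\lim_{n \to \infty} P(R_n) &= 0
  && \text{(Axiom VI)}, \\
\text{hence}\quad P(A) &= \sum_{k=1}^{\infty} P(A_k).
\end{align*}
```

In the other direction, if P is countably additive and A_1 ⊇ A_2 ⊇ ⋯ has empty intersection, then each A_n is the disjoint union of the differences A_k \ A_{k+1} for k ≥ n, so P(A_n) is the tail of a convergent series and tends to 0.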

[ … ]

Kolmogorov · Moscow · Easter 1933