Foundations of the Theory of Probability (Grundbegriffe der Wahrscheinlichkeitsrechnung)
Probability is just measure: a few axioms, and chance becomes rigorous mathematics.
What is the chance of something? For nearly three centuries after Pascal and Fermat, that question had no clean answer, until Kolmogorov noticed that probability is a way of measuring, like area or weight.
The big idea
Picture the set of everything that could happen — every face a die could land on, every path a particle could take. Call that whole set E. An 'event' is just a chunk of E: 'the die shows an even number' is the chunk {2, 4, 6}. Kolmogorov's insight was that the probability of an event behaves exactly like the size of that chunk.
So he wrote down a handful of simple rules. No probability is negative. The probability that something in E happens is exactly 1 — the whole is the whole. And if two events can't both happen at once, the chance of one-or-the-other is just their chances added together. From those few rules — and one more for handling infinitely many events — every law of probability follows. He had turned chance into a branch of measurement, no looser than geometry.
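The three finite rules can be checked mechanically on a small example. The sketch below is only an illustration, not anything taken from the book: it assumes a fair six-sided die, and the names weight, P and all_events are invented for the purpose. It uses exact fractions so that the additions are not blurred by rounding.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical example: a fair six-sided die. E is the set of all outcomes.
E = frozenset(range(1, 7))
weight = {outcome: Fraction(1, 6) for outcome in E}  # assumed equal weights


def P(event):
    """Probability of an event: the total weight of the outcomes it contains."""
    return sum((weight[o] for o in event), Fraction(0))


def all_events(space):
    """Yield every subset of a finite space (the largest possible field of events)."""
    items = sorted(space)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


events = list(all_events(E))

# Rule 1: no probability is negative.
nonnegative = all(P(A) >= 0 for A in events)

# Rule 2: the whole set E has probability exactly 1.
total = P(E)

# Rule 3: for two events that cannot both happen, the probabilities add.
additive = all(
    P(A | B) == P(A) + P(B)
    for A in events
    for B in events
    if not (A & B)  # keep only pairs with no outcome in common
)

even = frozenset({2, 4, 6})
print(P(even), nonnegative, total, additive)
```

With these assumed weights, the event 'the die shows an even number' gets probability 1/2, and all three checks pass. Changing the weights to anything non-negative that still sums to 1 leaves the checks true, which is the point: the rules constrain the bookkeeping, not the particular numbers.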
How it came about
By 1900 probability was a scandal at the heart of mathematics: gamblers and physicists used it daily and it plainly worked, yet no one could say precisely what a 'probability' was, and clever paradoxes kept springing up. David Hilbert, listing the great unsolved problems of the new century, folded it into his sixth: treat by means of axioms the physical sciences in which mathematics plays an important part, with probability and mechanics named first.
The tools arrived from an unexpected place. Henri Lebesgue had built a powerful new theory of how to 'measure' the size of complicated sets, and Maurice Fréchet had stripped it of its geometric trappings so it could measure anything. The young Soviet mathematician Andrey Kolmogorov saw that probability was simply measure wearing a disguise. In 1933 he published a slim 62-page book, in German, that made the identification exact. Others had sensed the analogy; his was the first complete exposition to say plainly that probability theory is part of measure theory.
Why it mattered
Before this book, deep results in probability were hard to trust because the ground beneath them was vague. Afterwards, every theorem rested on the same secure footing as the rest of mathematics, and the entire toolkit of Lebesgue integration came along for free. It also tamed the infinite: Kolmogorov showed how to give a precise probability to things like 'a random curve traced out forever,' which is what made the modern theory of random processes — and with it modern finance, statistics, and machine learning — possible. He was careful, too: he openly credited Lebesgue, Fréchet and others, and admitted his axioms say nothing about what a probability really means.
A way to picture it
Think of pouring exactly one litre of water over a map. Every region's 'probability' is just how much water lands on it. No region can hold negative water; the whole map holds the full litre (that is P(E) = 1); and the water on two regions that don't overlap, taken together, is simply the sum of the two amounts. Conditional probability is asking: of the water that fell on this county, what fraction sits in this town? (The question only makes sense if the county caught some water.) That is the whole of Kolmogorov's idea: probability is liquid you measure, and his axioms are just the rules water already obeys.
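The county-and-town question in the water picture can be worked through with numbers. Everything in the sketch below is made up for illustration: the four patches of ground, the millilitres on each, and the helper names P and P_given. It assumes the usual definition of conditional probability as P(A and B) divided by P(B).

```python
from fractions import Fraction

# Hypothetical map: each key is a patch of ground, each value the millilitres
# of water that landed on it. The amounts are invented and total one litre.
water_ml = {
    "town_centre": 50,
    "town_suburb": 150,
    "county_farmland": 300,
    "elsewhere": 500,
}

# Regions are sets of patches; the town lies inside the county.
town = {"town_centre", "town_suburb"}
county = {"town_centre", "town_suburb", "county_farmland"}

total_ml = sum(water_ml.values())


def P(region):
    """Share of the whole litre that landed on the region."""
    return Fraction(sum(water_ml[patch] for patch in region), total_ml)


def P_given(region, condition):
    """Of the water on `condition`, the fraction that also sits on `region`."""
    if P(condition) == 0:
        raise ValueError("cannot condition on a region that caught no water")
    return P(region & condition) / P(condition)


print(P(town), P(county), P_given(town, county))
```

With these invented amounts the town holds 1/5 of the litre, the county 1/2, and so the town holds 2/5 of the county's water. The guard against a dry county reflects the fact that the ratio is undefined when the conditioning event has probability zero.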
Where it sits
Probability began at the gambling table, with Pascal and Fermat in the 1650s, then Jacob Bernoulli's law of large numbers and Laplace's grand synthesis. But its foundations stayed shaky for centuries. Kolmogorov's book is the hinge: classical probability before, measure-theoretic probability after. The framework runs straight into the modern world: into Shannon's information theory (also in this Library), into the random walks that price options, and into the convergence guarantees behind today's AI. Rival foundations were proposed, among them von Mises's frequencies and de Finetti's subjective bets, but it is Kolmogorov's triple (E, F, P), the set of outcomes, the field of events, and the probability assigned to each event, that every textbook now opens with.
Preface — measure and probability
The axioms (Chapter I)
III. To each set A from F is assigned a non-negative real number P(A). This number P(A) is called the probability of the event A.
The axiom of continuity (Chapter II)
Since the new axiom is essential only for infinite fields of probability, it is hardly possible to explain its empirical meaning … Infinite fields of probability occur only as idealized models of real random processes. This understood, we limit ourselves arbitrarily to models that satisfy Axiom VI.
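For reference, the axiom this passage discusses can be written out in symbols. The block below is a sketch in modern notation rather than a quotation from the book; it reuses F and P from Axiom III above, and the indexing of the events as A_1, A_2, ... is a choice made here.

```latex
% Axiom VI (continuity): a shrinking sequence of events with nothing
% left in common has probabilities that shrink to zero.
\[
A_1 \supseteq A_2 \supseteq \cdots, \quad A_n \in F, \quad
\bigcap_{n=1}^{\infty} A_n = \emptyset
\;\Longrightarrow\;
\lim_{n \to \infty} P(A_n) = 0 .
\]

% Combined with finite additivity, this gives countable additivity:
% for pairwise disjoint A_1, A_2, \dots in F whose union also lies in F,
\[
P\Bigl(\bigcup_{n=1}^{\infty} A_n\Bigr) = \sum_{n=1}^{\infty} P(A_n) .
\]
```

On a finite field the first statement holds automatically, since a decreasing sequence of events must eventually stop changing, which is why the axiom only does real work for infinite fields of probability.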