Maximal Inequalities and the Convergence Theorem

From one clever time to all times at once

The last two guides leaned on the optional stopping theorem: if you freeze a martingale at a well-behaved stopping time, its expected value stays put at its start. That is a statement about one time — a time you get to choose, but still a single snapshot. This final guide asks a harder question. Forget choosing a moment: how large does the martingale ever get along the whole path? And does the path eventually stop wandering and settle on a value? These are statements about all times together, and they are exactly the tools that make martingales the workhorse of modern probability.

Here is why all-times control is so much stronger than the ordinary tools you met earlier in the ladder. Markov's inequality bounds the chance that X_n alone is large at a fixed step n. But a process can be small at every fixed inspection time and still spike enormously in between — like a pot that boils over only while you blink. A maximal inequality closes that gap: it bounds the probability that the running maximum, the highest point reached up to time n, ever crosses a level. The structure of a martingale is exactly what lets you upgrade a one-time bound into a whole-history bound for almost no extra cost.

Doob's maximal inequality: bounding the running peak

Let M_0, M_1, ..., M_n be a martingale, or a non-negative submartingale (a process whose conditional mean drifts upward). Write M_n^* for the running maximum max(M_0, ..., M_n), the highest value seen so far. The Doob maximal inequality says that for any level a > 0, P(M_n^* >= a) <= E[|M_n|] / a. Read it slowly: the chance that the path *ever* reached a, at any time up to n, is controlled by the *final* expected size E[|M_n|] divided by a. The whole noisy history is governed by a single endpoint expectation. For a martingale the same bound holds even for the running maximum of |M_k|, because |M_k| is itself a non-negative submartingale.

Compare this with plain Markov, which only gives P(M_n >= a) <= E[|M_n|] / a for the single endpoint. The maximal inequality gives the *same right-hand side* but for the far stronger left-hand event 'the maximum over the whole path crossed a'. You pay nothing extra and you control every time up to n at once. The trick that buys this is exactly optional stopping: let T be the first time the path reaches level a or higher, capped at n if that never happens. On the event {M_n^* >= a} the stopped value M_T is at least a, and the martingale (or submartingale) property ties E[M_T] to the endpoint expectation, which is what produces the bound.
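For readers who want the stopping-time step written out, here is a sketch of the argument for a non-negative submartingale (for a martingale, apply it to |M_k|). The capped stopping time T and the indicator notation are choices made for this sketch, not notation from the earlier guides.

```latex
\begin{aligned}
T &= \min\{k \le n : M_k \ge a\}
   \quad (\text{with } T = n \text{ if the level is never reached}),\\
a\,\mathbb{P}(M_n^* \ge a)
  &\le \mathbb{E}\big[M_T\,\mathbf{1}_{\{M_n^* \ge a\}}\big]
   \le \mathbb{E}[M_T]
   \le \mathbb{E}[M_n].
\end{aligned}
```

The first inequality uses M_T >= a on the event, the second uses M_T >= 0, and the third is optional stopping for a submartingale at the bounded time T <= n.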

Plain Markov (one fixed time):   P( M_n >= a )               <=  E[|M_n|] / a
Doob maximal (whole history):    P( max_{k<=n} M_k >= a )    <=  E[|M_n|] / a

L^2 form (square-integrable martingale):
    E[ (M_n^*)^2 ]   <=   4 * E[ M_n^2 ]

Same endpoint cost, far stronger left-hand event.

Markov controls one snapshot; Doob controls the entire running peak for the same price.
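The comparison can be checked numerically. The following is a minimal Monte Carlo sketch, assuming a simple fair +/-1 random walk as the martingale; the function names, the level 15, the horizon 100 and the seed are all illustrative choices, not part of the guide.

```python
import random

def simulate_walk(n_steps, rng):
    """Return one path M_0..M_n of a fair +/-1 random walk (a martingale)."""
    path = [0]
    for _ in range(n_steps):
        path.append(path[-1] + rng.choice((-1, 1)))
    return path

def compare_bounds(n_steps=100, level=15, n_paths=5000, seed=0):
    """Estimate the endpoint and running-max exceedance probabilities
    and the shared right-hand side E[|M_n|] / a."""
    rng = random.Random(seed)
    hits_end = hits_max = 0
    abs_end_total = 0.0
    for _ in range(n_paths):
        path = simulate_walk(n_steps, rng)
        hits_end += path[-1] >= level      # event bounded by plain Markov
        hits_max += max(path) >= level     # event bounded by Doob
        abs_end_total += abs(path[-1])
    return {
        "p_endpoint": hits_end / n_paths,
        "p_running_max": hits_max / n_paths,
        "bound": abs_end_total / n_paths / level,
    }

if __name__ == "__main__":
    print(compare_bounds())
```

The running-max frequency is always at least the endpoint frequency, since any path that ends above the level has also peaked above it. For this walk both estimates should sit well below the common bound.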

There is a squared version that is even more useful: for a square-integrable martingale, E[(M_n^*)^2] <= 4 E[M_n^2]. In words, the expected squared size of the worst peak is at most four times the expected squared size of the endpoint. This is the p = 2 case of Doob's L^p inequalities, and it is how, later in your studies, you control the maximum fluctuation of Brownian motion and of sums of martingale increments. That is the all-times grip you cannot get from looking at one time at a time.
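A quick numerical sanity check of the squared bound, again as a sketch on a fair +/-1 random walk. It takes the peak as the largest |M_k| along the path, which is the usual form of the L^2 inequality; the function name and parameters are illustrative.

```python
import random

def l2_maximal_check(n_steps=100, n_paths=4000, seed=1):
    """Estimate E[(max_k |M_k|)^2] and E[M_n^2] for a fair +/-1 random walk."""
    rng = random.Random(seed)
    peak_sq_total = 0
    end_sq_total = 0
    for _ in range(n_paths):
        pos = 0
        peak = 0
        for _ in range(n_steps):
            pos += rng.choice((-1, 1))
            peak = max(peak, abs(pos))     # running maximum of |M_k|
        peak_sq_total += peak * peak
        end_sq_total += pos * pos
    return peak_sq_total / n_paths, end_sq_total / n_paths

if __name__ == "__main__":
    peak_sq, end_sq = l2_maximal_check()
    print(peak_sq, end_sq, peak_sq / end_sq)
```

The ratio of the two estimates is at least 1 on every run, because the peak dominates the endpoint path by path. The inequality says the ratio of the true expectations cannot exceed 4.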

Counting wobbles: the upcrossing inequality

Why would a martingale converge at all? A sequence of numbers fails to converge for exactly one reason: it keeps oscillating, swinging down past some low level b and back up past some higher level a, over and over, forever. Each such full sweep from below b to above a is called an upcrossing of the interval [b, a]. If a sequence makes only finitely many upcrossings of *every* such interval, it cannot oscillate persistently — it is forced to converge (possibly to plus or minus infinity, but to settle). So convergence becomes a counting question: how many upcrossings can the path afford?

The upcrossing inequality (Doob's, again) caps that count in expectation. Let U_n[b, a] be the number of times the martingale completes an upcrossing of [b, a] by step n. Then E[U_n[b, a]] <= E[(M_n - b)^+] / (a - b), where x^+ means max(x, 0). The proof is a beautiful payoff of the earlier guide on the martingale transform: imagine a gambler who buys in whenever the price dips below b and sells out whenever it rises above a. Each completed upcrossing books a profit of at least (a - b). But you cannot beat a fair game: for a martingale the expected gain from any such bounded predictable strategy is zero (and non-positive for a supermartingale). The guaranteed profits can therefore be paid for only by the possible loss on the last, unfinished position, and that loss is controlled by the distance of the endpoint M_n from b, so the number of guaranteed profits must be small. The fairness of the game is precisely what forbids endless oscillation.
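The gambler's strategy can be sketched in a few lines. This is an illustrative implementation, assuming the convention "buy one unit once the path is at or below b, sell once it is at or above a" (texts differ on strict versus non-strict inequalities); the function name is hypothetical.

```python
def upcrossings_and_gain(path, b, a):
    """Return (completed upcrossings of [b, a], gain of buy-low/sell-high).

    Assumed convention: enter when the path is <= b, exit when it is >= a.
    """
    if not b < a:
        raise ValueError("need b < a")
    count = 0
    gain = 0.0
    holding = False
    for prev, cur in zip(path, path[1:]):
        if not holding and prev <= b:
            holding = True           # decision uses only the path up to prev
        if holding:
            gain += cur - prev       # stake 1 on the next increment
            if cur >= a:
                count += 1           # upcrossing completed: sell
                holding = False
    return count, gain

if __name__ == "__main__":
    path = [0, -1, 2, -1, 3, 0]
    print(upcrossings_and_gain(path, b=-1, a=2))
```

On every path the gain is at least (a - b) times the count, minus max(b - M_n, 0) for a position still open at the end. Taking expectations of that pathwise inequality, with the expected gain equal to zero, is the step that bounds the expected number of upcrossings.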

The martingale convergence theorem

Now the payoff. The martingale convergence theorem states: if M_n is a martingale (or submartingale) that is bounded in L^1 — meaning sup_n E[|M_n|] is finite, the sizes do not blow up on average — then M_n converges almost surely to a finite limit M_infinity. No assumption that the increments shrink, no smoothness, nothing about a formula. Just 'fair game' plus 'does not explode on average' is enough to guarantee the path settles down to a definite value with probability one. This is one of the most surprising and powerful convergence results in all of probability.

The proof is the upcrossing inequality, used in one clean stroke. If sup_n E[|M_n|] is finite, then for every fixed interval [b, a] the bound E[U_n[b, a]] <= E[(M_n - b)^+]/(a - b) stays bounded as n grows, so the total number of upcrossings of [b, a] over the whole infinite path has finite expectation — hence is finite almost surely. But a path can fail to converge only if it upcrosses *some* rational interval [b, a] infinitely often. Ruling that out for every rational interval (countably many, so a union of probability-zero bad events is still probability zero) leaves the path no room to oscillate. It must converge.

Assume M_n is an L^1-bounded martingale: sup_n E[|M_n|] is finite.
For any rationals b < a, the upcrossing inequality keeps E[U_n[b, a]] bounded in n, so the path upcrosses [b, a] only finitely often, almost surely.
Take the union over all (countably many) rational intervals — still a probability-zero exceptional set — so almost every path upcrosses no interval infinitely often.
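A real sequence with finitely many upcrossings of every rational interval cannot oscillate, so it converges, possibly to plus or minus infinity. Fatou's lemma then gives E[|M_infinity|] <= liminf E[|M_n|] < infinity, so the limit is finite almost surely. Hence M_n converges almost surely to a finite M_infinity.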
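The two quantitative steps in the argument above can be written out as follows. This is a sketch, with the constant C_{b,a} and the symbol U_infinity (the total upcrossing count over the infinite path) introduced here for convenience.

```latex
\begin{aligned}
\mathbb{E}\big[U_n[b,a]\big]
  &\le \frac{\mathbb{E}\big[(M_n-b)^+\big]}{a-b}
   \le \frac{\mathbb{E}|M_n| + |b|}{a-b}
   \le \frac{\sup_m \mathbb{E}|M_m| + |b|}{a-b} =: C_{b,a} < \infty,\\
\mathbb{E}\big[U_\infty[b,a]\big]
  &= \lim_{n\to\infty} \mathbb{E}\big[U_n[b,a]\big] \le C_{b,a}
  \quad\text{(monotone convergence, since } U_n \uparrow U_\infty\text{)},\\
\mathbb{E}|M_\infty|
  &\le \liminf_{n\to\infty} \mathbb{E}|M_n| \le \sup_m \mathbb{E}|M_m| < \infty
  \quad\text{(Fatou's lemma)}.
\end{aligned}
```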

What convergence does and does not promise

Be precise about what is promised, because this is where intuition trips. The theorem gives convergence of the *random variables* M_n to M_infinity along almost every path. It does not automatically give E[M_n] -> E[M_infinity], nor M_n = E[M_infinity given F_n]. Those extra conclusions need a stronger hypothesis: uniform integrability, which holds in particular when the martingale is bounded in L^p for some p > 1 (this is the L^p convergence upgrade). L^p-boundedness is sufficient for uniform integrability but not equivalent to it. Without uniform integrability, the expectation can quietly shift in the limit.

The cleanest cautionary tale is a stopped random walk. Bet 1 on a fair coin at every step, win or lose, so that your fortune S_n is a martingale with E[S_n] = 0. Now stop playing the first time T you are ahead by 1, and let M_n = S_{min(n, T)} be the fortune of this stopped game. The stopped process M_n is still a martingale, so E[M_n] = 0 at every step, and it is L^1-bounded because it never exceeds 1 (E[|M_n|] = 2 E[M_n^+] - E[M_n] <= 2). With probability one the time T arrives, so M_n converges almost surely to the constant 1. But E[M_n] = 0 for all n, while E[M_infinity] = 1. The limit exists pathwise yet the expectation jumped. The convergence theorem held perfectly; it simply never promised the means would follow. This is the same uniform-integrability gap that made optional stopping fail in the gambler's-ruin guide when the stopping time was unbounded.
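The gap can be seen exactly, without simulation. The sketch below computes the law of the fair walk frozen the first time it reaches +1, using exact fractions; the function names are illustrative.

```python
from fractions import Fraction

def stopped_walk_law(n_steps):
    """Exact law after n steps of a fair +/-1 walk from 0,
    frozen the first time it reaches +1."""
    law = {0: Fraction(1)}
    half = Fraction(1, 2)
    for _ in range(n_steps):
        nxt = {}
        for pos, prob in law.items():
            if pos == 1:                       # already stopped: stay put
                nxt[1] = nxt.get(1, 0) + prob
            else:
                for step in (-1, 1):
                    nxt[pos + step] = nxt.get(pos + step, 0) + prob * half
        law = nxt
    return law

def summarize(n_steps):
    """Return (mean of the stopped walk, probability it has stopped at +1)."""
    law = stopped_walk_law(n_steps)
    mean = sum(pos * prob for pos, prob in law.items())
    return mean, law.get(1, Fraction(0))

if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        mean, stopped = summarize(n)
        print(n, mean, float(stopped))
```

The mean is exactly zero for every n, while the probability of already sitting at +1 climbs toward 1. The missing expectation is carried by the shrinking set of paths that have not yet stopped and sit far below zero.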

One last honesty check before you climb on. A non-negative supermartingale always converges, because it is automatically L^1-bounded: E[|M_n|] = E[M_n] <= E[M_0]. This is why the simplest convergence applications (like Polya urn proportions, or the survival probabilities in a branching process) come out so cleanly. But 'converges almost surely' is a statement about probability one, not about every single path: rare exceptional paths that wander forever can exist, but they form a set of probability zero. And convergence to M_infinity says nothing about *how fast*. For rates you reach for the maximal inequalities above and for concentration inequalities like Azuma-Hoeffding, which you met earlier. With these three results (maximal control, upcrossing control, and the convergence theorem) you now hold the core machinery that makes martingales the most reusable tool in the probabilist's kit.