The problem: integrating against a path with no slope
The previous guide left us with a paradox. We want to make sense of an expression like the integral of f(t) dB(t) — adding up tiny gains, where each step is weighted by an increment dB(t) of Brownian motion. In ordinary calculus we would rewrite dB(t) as B'(t) dt and integrate as usual. But B has nowhere-differentiable paths: B'(t) simply does not exist anywhere. The classical bridge from a sum to an integral is burned, and we have to lay a new one.
Why would anyone want this integral? Picture a trader holding f(t) shares of a stock at time t, where the price wiggles like Brownian motion. Over a tiny interval the price moves by dB(t), so the trader's gain over that sliver is f(t) dB(t), and the total profit is the sum of all those slivers — exactly the integral we are after. The financial meaning is the whole point: the integrand f(t) is a betting strategy, and the integral is the running fortune it produces.
A Riemann integral approximates the area by a sum of rectangles and lets the mesh shrink. The answer does not care whether you sample the height of each rectangle at its left edge, right edge, or middle, because as the rectangles narrow those choices converge to the same number. For Brownian motion that comfort evaporates. Because the path has positive quadratic variation (and therefore infinite total variation over any interval, however short), the left-endpoint and midpoint sums converge to genuinely different limits. For the integral of B dB over [0, T], for instance, the left-endpoint sums converge to (B(T)^2 - T) / 2 while the midpoint sums converge to B(T)^2 / 2, a gap of T / 2. The choice of sampling point is no longer a harmless convention; it changes the answer.
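The disagreement between sampling rules can be seen numerically. The sketch below is only an illustration under stated assumptions: it simulates one Brownian path with Python's standard `random` module, takes the integrand to be B itself, and the function name `left_and_midpoint_sums`, the grid size, and the seed are all invented for this example.

```python
import math
import random

def left_and_midpoint_sums(n_intervals=20000, T=1.0, seed=0):
    """Approximate the integral of B dB two ways on one simulated path.

    The path is sampled on a grid of 2 * n_intervals half-steps so that
    every interval has a simulated value at its midpoint.
    """
    rng = random.Random(seed)
    half_dt = T / (2 * n_intervals)
    sd = math.sqrt(half_dt)

    # B[j] is the path at time j * half_dt, starting from B(0) = 0.
    B = [0.0]
    for _ in range(2 * n_intervals):
        B.append(B[-1] + rng.gauss(0.0, sd))

    left_sum = 0.0
    mid_sum = 0.0
    for k in range(n_intervals):
        start, mid, end = B[2 * k], B[2 * k + 1], B[2 * k + 2]
        increment = end - start
        left_sum += start * increment  # height sampled at the left edge
        mid_sum += mid * increment     # height sampled at the midpoint
    return left_sum, mid_sum, B[-1]

left_sum, mid_sum, B_T = left_and_midpoint_sums()
print("left-endpoint sum:", left_sum)
print("midpoint sum:     ", mid_sum)
print("gap between them: ", mid_sum - left_sum)
```

With T = 1 the gap should land near 0.5 rather than shrinking toward zero as the grid is refined, which is the quadratic variation showing up in the sum.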
Ito's rule: always look backward, never forward
Kiyosi Ito's resolution is beautifully decisive: when you build the approximating sum, always evaluate the integrand f at the left endpoint of each little interval, before the increment dB happens. Chop [0, T] into points 0 = t(0) < t(1) < ... < t(n) = T and form the sum over k = 0, ..., n-1 of f(t(k)) * (B(t(k+1)) - B(t(k))). The integrand uses the value at t(k); the Brownian increment B(t(k+1)) - B(t(k)) covers the interval from t(k) to t(k+1), which still lies ahead. As the mesh shrinks, this sum converges (in mean square) to the Ito integral, at least for well-behaved f: continuous, square-integrable, and using only information available up to the current time.
The left-endpoint rule is not an arbitrary tie-breaker; it is the only choice that respects causality. The integrand f(t(k)) is decided using information available at time t(k) — your share count is fixed before the price jumps — and the increment B(t(k+1)) - B(t(k)) is the future surprise, independent of everything up to t(k). A betting strategy that peeked at the next price move would be cheating, and the midpoint or right-endpoint rules quietly let it peek. Ito's rule encodes the honest principle: you must bet, then watch the coin land.
This single rule buys an enormous prize: the Ito integral is a martingale, at least when f is square-integrable (the technical condition usually imposed). Because each increment B(t(k+1)) - B(t(k)) has mean zero and is independent of the already-fixed weight f(t(k)), every new term adds zero on average, so the running integral is a fair game with no drift. The trader's accumulated profit, built from a strategy that cannot see the future, therefore has expected value zero at every horizon. That is exactly the no-free-lunch heartbeat that makes this integral the right tool for pricing, and it is a direct inheritance from the martingale rung you just climbed.
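A quick Monte Carlo experiment makes the fair-game property concrete. This is a rough sketch, not a proof: the strategy f(t) = B(t), the function name `mean_endpoint_sums`, and the path and step counts are assumptions chosen for illustration. It averages the left-endpoint sum over many simulated paths and, for contrast, does the same for a right-endpoint sum whose weight already contains the increment it is betting on.

```python
import math
import random

def mean_endpoint_sums(n_paths=2000, n_steps=200, T=1.0, seed=0):
    """Average the left- and right-endpoint sums of B dB over many paths."""
    rng = random.Random(seed)
    sd = math.sqrt(T / n_steps)
    total_left = 0.0
    total_right = 0.0
    for _ in range(n_paths):
        B = 0.0
        left = 0.0
        right = 0.0
        for _ in range(n_steps):
            dB = rng.gauss(0.0, sd)
            left += B * dB          # weight fixed before the increment arrives
            right += (B + dB) * dB  # weight peeks at the increment
            B += dB
        total_left += left
        total_right += right
    return total_left / n_paths, total_right / n_paths

mean_left, mean_right = mean_endpoint_sums()
print("average of left-endpoint sums: ", mean_left)
print("average of right-endpoint sums:", mean_right)
```

The left-endpoint average should hover near 0, while the peeking version should drift toward T, a systematic profit that no honest strategy could earn.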
The extra term: why (dB)^2 behaves like dt
Now for the twist that makes stochastic calculus its own subject. In ordinary calculus, when you expand a small change you keep first-order terms and throw away (dx)^2, (dx)^3, ... as negligible. For Brownian motion you cannot throw away (dB)^2. The reason is precisely the quadratic variation result from guide 3: over a small step of length dt, the increment dB has size roughly the square root of dt, so (dB)^2 has size roughly dt — the same order as the terms you must keep. The square of the noise is not negligible; it is a first-class citizen.
The working rule, the engine of everything that follows, is (dB)^2 = dt, together with dt * dB = 0 and (dt)^2 = 0. Read these as a bookkeeping summary of which terms survive in the limit. Honestly: (dB)^2 is itself a random quantity that merely averages to dt; what the identity records is that, when you sum these squared increments over an interval, the random fluctuations wash out and the total converges (in mean square) to the deterministic length of the interval. So (dB)^2 = dt is a statement about the accumulated sum, not a literal equation about one random wiggle — a subtlety worth keeping in mind even as you use the rule mechanically.
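The claim that the squared increments add up to the interval length can be checked directly. The following is a small sketch under assumed names (`quadratic_variation` is not from any library) that sums simulated (dB)^2 terms over [0, T] for coarser and finer grids.

```python
import math
import random

def quadratic_variation(n_steps, T=1.0, seed=0):
    """Sum of squared Brownian increments over [0, T] on n_steps equal steps."""
    rng = random.Random(seed)
    sd = math.sqrt(T / n_steps)  # each increment is Normal(0, dt)
    return sum(rng.gauss(0.0, sd) ** 2 for _ in range(n_steps))

# Each single (dB)^2 is random, but the total settles down as the grid refines.
for n in (10, 1000, 100000):
    print(n, quadratic_variation(n))
```

On the coarse grid the total can be visibly off from T = 1; on the fine grid it should sit very close to 1, which is the sense in which (dB)^2 = dt holds.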
Ito's lemma: the chain rule, corrected
Suppose you hold not the Brownian path itself but a smooth function of it, say Y(t) = f(B(t)). How does Y change? In ordinary calculus the chain rule says dY = f'(B) dB and we stop. But a careful Taylor expansion keeps the second-order term: dY = f'(B) dB + (1/2) f''(B) (dB)^2 + ... . Normally the (dB)^2 term is discarded — here it is not, because (dB)^2 = dt. Substituting gives Ito's lemma for this case: dY = f'(B) dB + (1/2) f''(B) dt.
    Ordinary chain rule:   dY = f'(B) dB
    Ito's lemma:           dY = f'(B) dB + (1/2) f''(B) dt
                                           \_____________/
                                          the Ito correction
                                       (born from (dB)^2 = dt)

    General form, X driven by dX = a dt + b dB:
                           dY = ( f'(X) a + (1/2) f''(X) b^2 ) dt + f'(X) b dB

Let us see the correction bite with a tiny worked example: Y = B(t)^2, so f(x) = x^2, f'(x) = 2x, f''(x) = 2. Ito's lemma gives dY = 2B dB + (1/2)(2) dt = 2B dB + dt. Integrate from 0 to T (using B(0) = 0): B(T)^2 = (the Ito integral of 2B dB) + T. Take expectations; the Ito integral is a martingale starting at 0, so its expectation is 0, and we recover E[B(T)^2] = T, the variance of Brownian motion, which we already knew. The reassurance is real: drop the +dt correction and you would get the absurd E[B(T)^2] = 0. The extra term is not decoration; it carries the variance.
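The same bookkeeping can be tested on a function other than x^2. The sketch below assumes f(x) = cos(x), so f'(x) = -sin(x) and f''(x) = -cos(x); that choice, the function name `ito_lemma_residuals`, and the step count are illustrative only. It compares the actual change in f(B) along one simulated path with what the ordinary chain rule and Ito's lemma each predict.

```python
import math
import random

def ito_lemma_residuals(n_steps=50000, T=1.0, seed=0):
    """Compare f(B(T)) - f(B(0)) with chain-rule and Ito predictions, f = cos."""
    rng = random.Random(seed)
    dt = T / n_steps
    sd = math.sqrt(dt)
    B = 0.0
    stochastic_part = 0.0  # sum of f'(B) dB, evaluated at the left endpoint
    correction_part = 0.0  # sum of (1/2) f''(B) dt
    for _ in range(n_steps):
        dB = rng.gauss(0.0, sd)
        stochastic_part += -math.sin(B) * dB
        correction_part += 0.5 * (-math.cos(B)) * dt
        B += dB
    actual_change = math.cos(B) - math.cos(0.0)
    naive_residual = actual_change - stochastic_part
    ito_residual = actual_change - (stochastic_part + correction_part)
    return naive_residual, ito_residual

naive_residual, ito_residual = ito_lemma_residuals()
print("residual without the correction:", naive_residual)
print("residual with the correction:   ", ito_residual)
```

The residual with the correction should be close to zero and shrink as `n_steps` grows; the residual without it is whatever the (1/2) f''(B) dt term would have contributed, and it does not shrink with a finer grid.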
Putting it to work: geometric Brownian motion
The headline application is the model for a stock price, the geometric Brownian motion you will use in the next guide for Black-Scholes. It is defined by the stochastic differential equation dS = mu * S dt + sigma * S dB, read as: over each instant dt the price changes by a drift mu * S dt (growth at average rate mu) plus a random kick sigma * S dB (noise of volatility sigma), both scaled by the current price S. The natural guess for its solution is exponential growth with noise, and Ito's lemma is exactly the tool to confirm it and pin down the constant.
- Try Y = log(S) and apply Ito's lemma with f(x) = log(x), so f'(x) = 1/x and f''(x) = -1/x^2. The general form gives dY = ( (1/S)(mu S) + (1/2)(-1/S^2)(sigma S)^2 ) dt + (1/S)(sigma S) dB.
- Simplify each piece: the drift becomes (mu - sigma^2 / 2) dt and the noise becomes sigma dB. So d(log S) = (mu - sigma^2 / 2) dt + sigma dB — a constant drift plus constant-scale noise, which is easy to integrate.
- Integrate from 0 to t and exponentiate: S(t) = S(0) * exp( (mu - sigma^2 / 2) t + sigma B(t) ). The price is the exponential of a Brownian motion with drift — always positive, log-normally distributed at each time.
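The three steps above can be sanity-checked by stepping the equation dS = mu * S dt + sigma * S dB directly and comparing with the closed form on the same Brownian path. This is a minimal sketch, assuming a plain Euler-type discretization and made-up parameter values; `gbm_euler_vs_closed_form` is a name introduced here for illustration.

```python
import math
import random

def gbm_euler_vs_closed_form(S0=100.0, mu=0.05, sigma=0.2, T=1.0,
                             n_steps=10000, seed=0):
    """Step the SDE in small increments and compare with the exact solution."""
    rng = random.Random(seed)
    dt = T / n_steps
    sd = math.sqrt(dt)
    S = S0
    B = 0.0
    for _ in range(n_steps):
        dB = rng.gauss(0.0, sd)
        S += mu * S * dt + sigma * S * dB  # one small step of the SDE
        B += dB                            # accumulate the same Brownian path
    closed_form = S0 * math.exp((mu - 0.5 * sigma ** 2) * T + sigma * B)
    without_correction = S0 * math.exp(mu * T + sigma * B)
    return S, closed_form, without_correction

stepped, closed_form, without_correction = gbm_euler_vs_closed_form()
print("stepped SDE:              ", stepped)
print("closed form:              ", closed_form)
print("closed form, no -sigma^2/2:", without_correction)
```

The stepped value should track the closed form closely, while the version that omits the -sigma^2 / 2 term overshoots by a factor of exp(sigma^2 * T / 2) on every path.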
Stare at the drift the solution actually carries: not mu but mu - sigma^2 / 2. That subtracted sigma^2 / 2 is the Ito correction making itself felt in the real world, and it is genuinely counterintuitive. Naive intuition says a price growing at average rate mu should have log-price growing at rate mu too, but volatility steals sigma^2 / 2 from the growth rate of the log. The mean E[S(t)] = S(0) * exp(mu * t) does grow at rate mu; it is the typical (median) path, S(0) * exp( (mu - sigma^2 / 2) t ), that lags behind. This is the honest reason a noisier asset with the same average return compounds more slowly, and forgetting the -sigma^2 / 2 is one of the most common mistakes when people first model returns. The extra term that ordinary calculus tries to discard is, here, money.
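The gap between the average price and the average log-price can be estimated by sampling the closed-form solution. The sketch below is illustrative only: the parameter values and the function name `gbm_terminal_stats` are assumptions, and B(T) is drawn directly as a normal variable with variance T.

```python
import math
import random

def gbm_terminal_stats(S0=1.0, mu=0.1, sigma=0.4, T=1.0,
                       n_samples=100000, seed=0):
    """Sample S(T) from the closed form; return mean of S(T) and of log S(T)."""
    rng = random.Random(seed)
    sqrt_T = math.sqrt(T)
    sum_S = 0.0
    sum_log = 0.0
    for _ in range(n_samples):
        B_T = rng.gauss(0.0, sqrt_T)  # B(T) is Normal(0, T)
        log_S = math.log(S0) + (mu - 0.5 * sigma ** 2) * T + sigma * B_T
        sum_log += log_S
        sum_S += math.exp(log_S)
    return sum_S / n_samples, sum_log / n_samples

mean_S, mean_log_S = gbm_terminal_stats()
print("mean of S(T):    ", mean_S, " vs S0 * exp(mu * T) =", math.exp(0.1))
print("mean of log S(T):", mean_log_S, " vs (mu - sigma^2 / 2) * T =", 0.1 - 0.5 * 0.4 ** 2)
```

With these assumed values the average price should sit near exp(0.1), while the average log-price should sit near 0.02 rather than 0.1: the same mu, with sigma^2 / 2 taken off the log.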