Why Ordinary Calculus Fails: Quadratic Variation

The previous guide left us a problem

In the last guide we met the unsettling geometry of Brownian motion: its paths are continuous — you can draw one without lifting your pen — yet differentiable at no point at all. There is no slope dB/dt anywhere, because the difference quotient [B(t + h) - B(t)] / h has a typical size of sqrt(h)/h = 1/sqrt(h), which blows up to infinity as h shrinks. The path never commits to a tangent direction. That is a vivid fact, but it leaves us stranded: all of calculus is built on derivatives and on integrals of the form 'integral of f times dx'. If dB/dt does not exist, what does it even mean to do calculus driven by B?

This guide answers a sharper, more diagnostic question: not just 'is B smooth?' (no) but 'exactly how rough is B, and what does that roughness do to the rules of calculus?' The answer is a single, astonishingly clean number called the quadratic variation. It will turn out that B is rough in a very precise, measurable way — and that precise roughness is not an obstacle to be apologized for. It is the engine that powers a whole new calculus. By the end you should see why the ordinary chain rule cannot survive, and what quantity has to take its place.

Two ways to measure how much a path moves

Imagine a tiny pen tracing a path over the interval [0, t]. How much did it move? The most natural answer is the total variation: chop [0, t] into many small pieces and add up the absolute sizes of all the little up-and-down changes, sum of |B(t_(k+1)) - B(t_k)|. This is the total up-and-down distance the pen travels, a close cousin of arc length that counts only the vertical motion. For any smooth curve on a bounded interval this is a finite number, and refining the chopping just nails it down more precisely.

For a Brownian path the total variation is infinite. Here is the intuition. Over a small piece of length t/n, the change in B has a typical size of sqrt(t/n) — that is the hallmark sqrt-of-time scaling of Brownian increments. Adding up n of these gives a sum of about n times sqrt(t/n) = sqrt(n) times sqrt(t), which marches off to infinity as n grows. The pen, asked to trace a Brownian path over even one second, would travel an unbounded distance. The path is so crinkly that 'length' is not a useful ruler — it always reads infinity. We need a gentler instrument.
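The sqrt(n) growth can be watched numerically. The following is a minimal sketch, not a rigorous experiment: the helper names brownian_increments and total_variation are hypothetical, the seed is arbitrary, and each n uses a freshly simulated path rather than a refinement of one fixed path.

```python
import math
import random


def brownian_increments(t, n, rng):
    """Draw n independent Normal(0, t/n) increments of a Brownian path on [0, t]."""
    step_sd = math.sqrt(t / n)
    return [rng.gauss(0.0, step_sd) for _ in range(n)]


def total_variation(increments):
    """Sum of the absolute increments over the given partition."""
    return sum(abs(dx) for dx in increments)


if __name__ == "__main__":
    rng = random.Random(0)  # arbitrary seed, only for reproducibility
    t = 1.0
    for n in (100, 10_000, 100_000):
        tv = total_variation(brownian_increments(t, n, rng))
        # The ratio to sqrt(n * t) should stay roughly constant while tv itself grows.
        print(n, round(tv, 2), round(tv / math.sqrt(n * t), 3))
```

The printed ratio should hover near sqrt(2/pi), about 0.80, which is the mean absolute value of a standard Normal variable. The sum itself keeps growing like sqrt(n).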

The gentler instrument squares the little changes instead of taking their absolute values. Do exactly the same chopping, but this time add up the SQUARES of the increments: sum of [B(t_(k+1)) - B(t_k)]^2. This is the quadratic variation, written [B, B]_t. Squaring is the whole trick. Each little change is around sqrt(t/n) in size, so its square is around t/n — a much smaller number — and squaring crushes the wild fluctuations that made the total variation explode. The question is whether this gentler sum settles down to something finite. It does, and the answer is the heart of this guide.

The clean answer: [B, B]_t = t

As you make the chopping finer (n to infinity), the sum of squared increments does not shrink to zero and does not blow up: it converges to exactly t (in mean square, and hence in probability). The quadratic variation of Brownian motion over [0, t] is simply t itself: [B, B]_t = t. No constants to remember, and the limit is the same deterministic number whichever random path you happened to draw. Two competing pressures balance perfectly: there are more and more pieces (pushing the sum up) but each squared piece is smaller and smaller (pushing it down), and the squaring is exactly the power at which they cancel into a finite limit.

Why exactly t? It is pure variance bookkeeping, and it leans on two facts from earlier rungs. Each increment B(t_(k+1)) - B(t_k) over a piece of length t/n is Normal(0, t/n), so on average its square equals its variance, E[(increment)^2] = t/n. There are n such pieces, so the expected total is n times (t/n) = t. That fixes the average. Then, because the increments over disjoint pieces are independent (independent stationary increments), the random scatter around that average is a sum of many independent terms. Each squared increment has variance 2(t/n)^2, since the fourth moment of a Normal(0, s) variable is 3s^2, so the variance of the whole sum is n times 2(t/n)^2 = 2t^2/n, which goes to zero as n grows. In the spirit of the law of large numbers, the scatter washes out. The sum doesn't just average t, it converges to t.
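The bookkeeping can be checked by simulation. This is a small sketch under stated assumptions: the function name quadratic_variation, the seed, and the choice t = 2 are illustrative. The quoted spread uses the standard fact that a squared Normal(0, s) variable has variance 2s^2, so the sum has variance 2t^2/n.

```python
import math
import random


def quadratic_variation(t, n, rng):
    """Sum of squared Normal(0, t/n) increments over n equal pieces of [0, t]."""
    step_sd = math.sqrt(t / n)
    return sum(rng.gauss(0.0, step_sd) ** 2 for _ in range(n))


if __name__ == "__main__":
    rng = random.Random(0)  # arbitrary seed, only for reproducibility
    t = 2.0
    for n in (10, 1_000, 100_000):
        qv = quadratic_variation(t, n, rng)
        spread = math.sqrt(2.0 * t * t / n)  # theoretical standard deviation of the sum
        print(f"n={n:>7}  sum of squares={qv:.4f}  target={t}  theoretical sd={spread:.4f}")
```

As n grows, the printed sum should settle near t while the theoretical spread shrinks like 1/sqrt(n).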

Chop [0, t] into n equal pieces of length t/n.

  increment on piece k:   B(t_{k+1}) - B(t_k)  ~  Normal(0, t/n)

  TOTAL VARIATION     = sum |increment|    ~ n * sqrt(t/n) = sqrt(n*t)  -> infinity
  QUADRATIC VARIATION = sum (increment)^2  ~ n * (t/n)     = t          -> t

  Smooth curve f:   increment ~ (t/n),  (increment)^2 ~ (t/n)^2
                    sum (increment)^2 ~ n * (t/n)^2 = t^2 / n -> 0

Total variation diverges; quadratic variation converges to t. A smooth curve has quadratic variation zero.

Compare this with a smooth curve, where the contrast is total. On a smooth f, a change over a piece of length t/n is itself of order t/n (slope times step), so its square is of order (t/n)^2. Summing n of those gives about n times (t/n)^2 = t^2/n, which goes to zero. Every ordinary function you ever differentiated has quadratic variation zero. Brownian motion is the strange middle creature: total variation infinite, quadratic variation finite and equal to t. That finite, nonzero quadratic variation is the fingerprint of genuine roughness — it is exactly what tells a real Brownian path apart from a smooth impostor.
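For the smooth side of the contrast no randomness is needed. Here is a minimal sketch with sin on [0, 1] as an arbitrary example curve. The helper name variations is hypothetical.

```python
import math


def variations(f, t, n):
    """Return (total variation sum, quadratic variation sum) of f over n equal pieces of [0, t]."""
    total = 0.0
    quadratic = 0.0
    previous = f(0.0)
    for k in range(1, n + 1):
        current = f(k * t / n)
        step = current - previous
        total += abs(step)         # first-power sum: stays finite for a smooth curve
        quadratic += step * step   # squared sum: shrinks like 1/n for a smooth curve
        previous = current
    return total, quadratic


if __name__ == "__main__":
    for n in (10, 1_000, 100_000):
        tv, qv = variations(math.sin, 1.0, n)
        print(n, tv, qv)
```

Because sin is increasing on [0, 1], the total variation sum telescopes to sin(1) for every n. The squared sum falls by roughly a factor of ten each time n grows tenfold.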

The heuristic that changes everything: (dB)^2 = dt

The statement [B, B]_t = t is usually carried around in a compact, suggestive shorthand: (dB)^2 = dt. Read it as the infinitesimal version of the bookkeeping above. Over a tiny time step dt, a Brownian increment dB is Normal(0, dt), so its typical magnitude is sqrt(dt). Squaring gives (dB)^2 with mean exactly dt. A single (dB)^2 is still random, but once many of them are added up their fluctuations around dt cancel, so inside any sum or integral (dB)^2 behaves exactly like dt. In other words, the square of a Brownian increment is not negligibly small the way a squared step is in ordinary calculus. It is comparable to dt itself, a genuine first-order quantity. That is the whole departure from the calculus you know.

Hold the orders of magnitude side by side. In ordinary calculus with a smooth variable x, a step dx is first order and its square (dx)^2 is second order — vanishingly smaller, so we throw it away. With Brownian motion, dB is of order sqrt(dt), so (dB)^2 is of order dt: the same order as a single ordinary step in time, too big to discard. Meanwhile (dt)^2 and dB times dt (of order sqrt(dt) times dt = dt^1.5) are still negligible. The clean working rules for the new calculus are: (dB)^2 = dt, (dt)^2 = 0, and dB times dt = 0.
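The three working rules can be sanity-checked by summing each product over a fine partition of [0, t]. This is an illustrative sketch only. The name ito_table_sums is hypothetical, and a finite n only approximates the limiting statements.

```python
import math
import random


def ito_table_sums(t, n, rng):
    """Accumulate sum (dB)^2, sum dB*dt and sum (dt)^2 over n equal steps of [0, t]."""
    dt = t / n
    step_sd = math.sqrt(dt)
    sum_dB_sq = 0.0
    sum_dB_dt = 0.0
    sum_dt_sq = 0.0
    for _ in range(n):
        dB = rng.gauss(0.0, step_sd)
        sum_dB_sq += dB * dB   # expected to approach t
        sum_dB_dt += dB * dt   # expected to approach 0
        sum_dt_sq += dt * dt   # equals t^2 / n, which approaches 0
    return sum_dB_sq, sum_dB_dt, sum_dt_sq


if __name__ == "__main__":
    rng = random.Random(0)  # arbitrary seed, only for reproducibility
    for n in (100, 10_000, 100_000):
        print(n, ito_table_sums(1.0, n, rng))
```

Only the first sum should survive as n grows. The other two shrink toward zero, matching (dB)^2 = dt, dB times dt = 0, and (dt)^2 = 0.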

Why the ordinary chain rule breaks

Now we can see precisely where ordinary calculus fails. Suppose we want to track how a smooth function f changes as its input follows Brownian motion — say f(B_t). In ordinary calculus we would expand f using a Taylor series for a small change in B: f(B + dB) is approximately f(B) + f'(B) dB + (1/2) f''(B) (dB)^2 + smaller terms. For a smooth driver, the (dB)^2 term is second order and we drop it, leaving the familiar chain rule df = f'(B) dB. That is the move that quietly fails here.

Write the Taylor expansion of the change: df = f'(B) dB + (1/2) f''(B) (dB)^2 + higher-order terms.
In ordinary calculus you would discard (dB)^2 as negligibly second order. But here quadratic variation forbids it: (dB)^2 = dt, a first-order quantity, so it cannot be thrown away.
Substitute (dB)^2 = dt and drop the genuinely negligible terms ((dt)^2 and dB dt vanish): df = f'(B) dB + (1/2) f''(B) dt.
Read the result: the change in f(B_t) carries an EXTRA drift term (1/2) f''(B) dt that the ordinary chain rule never had. That correction is exactly the content of Ito's lemma.
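The steps above can be sketched for the concrete choice f(x) = x^2, where f'(x) = 2x and (1/2) f''(x) = 1. The code compares B_t^2 with the naive chain-rule sum of 2B dB along one simulated path. Evaluating B at the left endpoint of each step is an assumption here, chosen to match the Ito convention. The function name chain_rule_gap is hypothetical.

```python
import math
import random


def chain_rule_gap(t, n, rng):
    """Return (B_t^2, sum of 2*B*dB at left endpoints, sum of (dB)^2) for one simulated path."""
    step_sd = math.sqrt(t / n)
    b = 0.0
    naive = 0.0       # what the ordinary chain rule alone would predict for B_t^2
    correction = 0.0  # the second-order Taylor terms, (1/2) * f'' * (dB)^2 = (dB)^2
    for _ in range(n):
        dB = rng.gauss(0.0, step_sd)
        naive += 2.0 * b * dB    # f'(B) dB with B taken before the step
        correction += dB * dB
        b += dB
    return b * b, naive, correction


if __name__ == "__main__":
    rng = random.Random(0)  # arbitrary seed, only for reproducibility
    final_sq, naive, correction = chain_rule_gap(1.0, 100_000, rng)
    print("B_t^2              :", final_sq)
    print("naive chain rule   :", naive)
    print("missing correction :", correction)
```

Since (b + dB)^2 - b^2 = 2 b dB + (dB)^2 holds exactly at every step, the first number equals the sum of the other two up to rounding. The gap left by the naive sum is the quadratic variation, which sits close to t.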

So the surviving second-order term is not an error or an approximation. It is forced on us by the finite quadratic variation, and it is the signature of stochastic calculus. The statement df = f'(B) dB + (1/2) f''(B) dt is [[ito-lemma|Ito's lemma]] in its simplest form (a twice continuously differentiable f of Brownian motion alone), the replacement for the chain rule, which the next guide develops in earnest alongside the [[ito-integral|Ito integral]] that gives the dB term a rigorous meaning. The extra (1/2) f''(B) dt term is why, for example, the average of f(B_t) can drift even though B_t itself has mean zero: convexity (a positive f'') systematically pushes the average up, a phenomenon worth a whole guide of its own. The simplest instance is f(x) = x^2, where the formula reads d(B^2) = 2B dB + dt and the average of B_t^2 is t, not 0.
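The convexity drift can be seen in a quick Monte Carlo sketch with the convex choice f(x) = exp(x), an example not taken from the text above. If the dB term averages to zero (a standard property of the Ito integral, taken here as an assumption), the formula gives d E[exp(B_t)] = (1/2) E[exp(B_t)] dt, so E[exp(B_t)] = exp(t/2). That agrees with the known mean of a lognormal variable. The function name mean_of_exp_brownian is hypothetical.

```python
import math
import random


def mean_of_exp_brownian(t, samples, rng):
    """Monte Carlo estimate of E[exp(B_t)], sampling B_t directly as Normal(0, t)."""
    sd = math.sqrt(t)
    return sum(math.exp(rng.gauss(0.0, sd)) for _ in range(samples)) / samples


if __name__ == "__main__":
    rng = random.Random(0)  # arbitrary seed, only for reproducibility
    for t in (0.5, 1.0, 2.0):
        estimate = mean_of_exp_brownian(t, 200_000, rng)
        print(f"t={t}  estimate={estimate:.4f}  exp(t/2)={math.exp(t / 2):.4f}  exp(E[B_t])=1.0")
```

The estimate should track exp(t/2) rather than exp(0) = 1, even though B_t itself averages zero.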

Why this number is the right one to obsess over

Step back and notice how much the single fact [B, B]_t = t organizes. The sqrt-of-time scaling of increments — variance over [0, t] is t, so a typical move is sqrt(t) — is precisely what produces the finite quadratic variation, and it is the same scaling behind the path's self-similarity: zoom into a Brownian path with the right space-time ratio and it looks statistically identical. Roughness, sqrt-time scaling, and quadratic variation are three faces of one structure. Quadratic variation is simply the cleanest, most quantitative face of the three.

Quadratic variation is also where the abstract meets the practical. In finance, the quadratic variation of a (log) price process is the mathematical home of realized variance, whose square root is realized volatility: when traders square tiny price moves and add them up over a day, they are literally estimating a quadratic variation. The rate at which [B, B]_t accumulates (here, 1 unit per unit time) is the variance rate; scale Brownian motion by sigma and the quadratic variation of sigma times B over [0, t] becomes sigma^2 times t, which is exactly why sigma^2 shows up everywhere in option pricing. The strange number from a thought experiment about squared wiggles is the same object a trading desk measures before lunch.
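A toy version of that desk calculation is sketched below. It simulates increments of sigma times B, sums their squares, and backs out sigma from the result. The value sigma = 0.2 is made up, the function names are hypothetical, and real price data would bring complications (microstructure noise, jumps) that this sketch ignores.

```python
import math
import random


def scaled_increments(sigma, t, n, rng):
    """Increments of sigma * B over n equal pieces of [0, t]: each is Normal(0, sigma^2 * t / n)."""
    step_sd = sigma * math.sqrt(t / n)
    return [rng.gauss(0.0, step_sd) for _ in range(n)]


def realized_variance(increments):
    """Sum of squared increments, an estimate of the quadratic variation sigma^2 * t."""
    return sum(dx * dx for dx in increments)


if __name__ == "__main__":
    rng = random.Random(0)  # arbitrary seed, only for reproducibility
    sigma, t, n = 0.2, 1.0, 50_000
    rv = realized_variance(scaled_increments(sigma, t, n, rng))
    print("realized variance :", rv, " (target sigma^2 * t =", sigma ** 2 * t, ")")
    print("implied sigma     :", math.sqrt(rv / t))
```

The implied sigma should land close to the 0.2 used to generate the data, with the error shrinking as the sampling gets finer.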

One honest caveat to carry forward. The clean value [B, B]_t = t is special to standard Brownian motion; the deeper lesson is that quadratic variation, as a concept, is what continuous random processes have instead of arc length, and it is exactly the quantity the chain rule must respect. A different rough process can have a different quadratic variation (think sigma^2 t), and a smooth process has quadratic variation zero, recovering the ordinary chain rule as the special case where the correction term disappears. So nothing you learned in calculus was wrong: it was the special case in which the f''-correction is silently equal to zero. Brownian motion just refuses to let that term vanish.