From a number to a whole function
In Volume I, optimization meant finding a number: you set the derivative to zero, dy/dx = 0, and solved for the x that minimized a function. Here we want something bolder — the best whole curve y(x) running between two fixed endpoints. The quantity we minimize is a functional, usually an integral J[y] = integral from a to b of L(x, y, y') dx, which eats an entire function and returns one number (a time, a length, an energy). The previous guide introduced J and its first variation; this one turns that variation into a concrete equation you can solve.
The strategy is a faithful copy of single-variable calculus, but lifted up a level. To probe the minimizer y(x), nudge it: replace y by y(x) + epsilon eta(x), where eta is any smooth 'wiggle' that vanishes at both endpoints (so the competitor still passes through the same fixed points) and epsilon is a tiny dial. Plug this into J and you get an ordinary function of the single number epsilon, call it Phi(epsilon) = J[y + epsilon eta]. If y is genuinely optimal, then epsilon = 0 is an ordinary minimum of Phi, so dPhi/depsilon at epsilon = 0 must equal zero — exactly the critical-point condition from Volume I.
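To make Phi(epsilon) concrete, here is a minimal numerical sketch. It assumes, purely for illustration, the arc-length integrand sqrt(1 + (y')^2), the straight line y(x) = x on [0, 1], and the wiggle eta(x) = sin(pi x); none of these choices come from the derivation itself, and the helper names are hypothetical.

```python
import math

def arc_length(y_prime, a=0.0, b=1.0, n=2000):
    """Midpoint-rule approximation of the integral of sqrt(1 + y'(x)^2) over [a, b]."""
    h = (b - a) / n
    return sum(math.sqrt(1.0 + y_prime(a + (i + 0.5) * h) ** 2) for i in range(n)) * h

def phi(eps):
    """Phi(eps) = J[y + eps*eta] for the assumed y(x) = x and eta(x) = sin(pi x)."""
    # Derivative of the perturbed curve: y' + eps*eta' = 1 + eps*pi*cos(pi x).
    return arc_length(lambda x: 1.0 + eps * math.pi * math.cos(math.pi * x))

if __name__ == "__main__":
    # Phi should be smallest at eps = 0 and rise on either side.
    for eps in (-0.2, -0.1, 0.0, 0.1, 0.2):
        print(f"eps = {eps:+.1f}   Phi = {phi(eps):.6f}")
```

Running it should show Phi bottoming out at epsilon = 0, which is the one-variable picture the argument relies on.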
Differentiating under the integral
Compute dPhi/depsilon by differentiating inside the integral. Because L depends on epsilon only through y + epsilon eta and its derivative y' + epsilon eta', the chain rule gives a clean integrand: at epsilon = 0, dPhi/depsilon = integral from a to b of [ (partial L / partial y) eta + (partial L / partial y') eta' ] dx. This is precisely the first variation delta J. The eta term is harmless, but the eta' term is awkward — it carries the derivative of our arbitrary wiggle, and we cannot conclude anything while eta and eta' both float free.
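The formula for dPhi/depsilon can be sanity-checked numerically. The sketch below assumes an illustrative integrand L = (y')^2 + y^2, a trial curve y(x) = x^2 that is deliberately not a minimizer, and the wiggle eta(x) = sin(pi x) on [0, 1]; all three are stand-ins chosen for the check, not part of the text above.

```python
import math

A, B, N = 0.0, 1.0, 4000

def integrate(f):
    """Midpoint rule on [A, B]."""
    h = (B - A) / N
    return sum(f(A + (i + 0.5) * h) for i in range(N)) * h

# Assumed trial curve and wiggle, together with their derivatives.
y = lambda x: x * x
dy = lambda x: 2.0 * x
eta = lambda x: math.sin(math.pi * x)
deta = lambda x: math.pi * math.cos(math.pi * x)

def phi(eps):
    """Phi(eps) = J[y + eps*eta] for the illustrative L = (y')^2 + y^2."""
    return integrate(lambda x: (dy(x) + eps * deta(x)) ** 2 + (y(x) + eps * eta(x)) ** 2)

def first_variation():
    """Integral of (dL/dy)*eta + (dL/dy')*eta', where dL/dy = 2y and dL/dy' = 2y'."""
    return integrate(lambda x: 2.0 * y(x) * eta(x) + 2.0 * dy(x) * deta(x))

step = 1e-5
finite_difference = (phi(step) - phi(-step)) / (2.0 * step)

if __name__ == "__main__":
    print("finite difference of Phi at 0:", finite_difference)
    print("first-variation integral:     ", first_variation())
```

The two numbers should agree closely, and both should be nonzero, since this trial curve is not stationary for the assumed L.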
The fix is the oldest trick in the trade: integration by parts, which moves the derivative off eta and onto the coefficient in front of it. Writing F = partial L / partial y', the second piece becomes integral from a to b of F eta' dx = [F eta] from a to b minus integral from a to b of (dF/dx) eta dx. The boundary term [F eta] vanishes outright, because eta was chosen to be zero at both a and b. That single design choice (fixed endpoints force the wiggle to die at the ends) is what makes the whole method work.
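Written out in symbols, the by-parts step looks like this. It restates the paragraph above, with partial L / partial y' in place of F:

```latex
\int_a^b \frac{\partial L}{\partial y'}\,\eta'\,dx
  = \left[\frac{\partial L}{\partial y'}\,\eta\right]_a^b
  - \int_a^b \frac{d}{dx}\!\left(\frac{\partial L}{\partial y'}\right)\eta\,dx
  = -\int_a^b \frac{d}{dx}\!\left(\frac{\partial L}{\partial y'}\right)\eta\,dx,
\qquad \text{since } \eta(a) = \eta(b) = 0 .
```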
The fundamental lemma seals it
After the integration by parts, every term multiplies eta alone: delta J = integral from a to b of [ partial L / partial y minus d/dx ( partial L / partial y' ) ] eta(x) dx, and optimality demands that this integral equal zero for EVERY admissible eta. Here the fundamental lemma of the calculus of variations does the heavy lifting: if a continuous function g(x) satisfies integral from a to b of g(x) eta(x) dx = 0 for all smooth eta vanishing at the ends, then g(x) is identically zero on [a, b]. The intuition is sharp: if g were positive at some point, continuity would keep it positive on a little interval around that point; pick an eta that bumps up only there, and the integral would come out positive, a contradiction. The same argument with the sign flipped rules out g being negative anywhere.
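The bump argument can be illustrated numerically. The sketch below assumes a sample function g(x) = x - 0.5 on [0, 1] and a standard smooth bump supported on (0.6, 0.9); both are hypothetical choices made for the demonstration. It also shows why the lemma needs EVERY eta: one particular wiggle can give zero by accident.

```python
import math

def bump(x, center=0.75, half_width=0.15):
    """Smooth bump: positive strictly inside (center - half_width, center + half_width), zero elsewhere."""
    t = (x - center) / half_width
    denom = 1.0 - t * t
    if denom <= 0.0:
        return 0.0
    return math.exp(-1.0 / denom)

def g(x):
    """Assumed continuous g, positive on (0.5, 1] and negative on [0, 0.5)."""
    return x - 0.5

def integrate(f, a=0.0, b=1.0, n=4000):
    """Midpoint rule on [a, b]."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

# A symmetric wiggle cancels against the antisymmetric g: the integral vanishes by accident.
symmetric_test = integrate(lambda x: g(x) * math.sin(math.pi * x))
# A bump placed where g > 0 exposes that g is not identically zero.
localized_test = integrate(lambda x: g(x) * bump(x))

if __name__ == "__main__":
    print("integral of g * sin(pi x):", symmetric_test)
    print("integral of g * bump:     ", localized_test)
```

The first integral comes out at rounding-error size while the second is strictly positive, so this g fails the "for all eta" condition, as the lemma predicts for any g that is not identically zero.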
Setting that bracket to zero is the prize. The Euler–Lagrange equation reads d/dx ( partial L / partial y' ) minus partial L / partial y = 0. Read it carefully: partial L / partial y and partial L / partial y' are partial derivatives taken treating x, y, y' as three independent slots, while d/dx out front is a TOTAL derivative along the curve, which lets x, y(x), and y'(x) all vary together — a distinction worth pausing on, because confusing the two is the single most common error in the subject.
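To see the partial-versus-total distinction in symbols, expand the total derivative with the chain rule. The small example that follows uses L = x (y')^2, an integrand chosen only for illustration:

```latex
\frac{d}{dx}\!\left(\frac{\partial L}{\partial y'}\right)
  = \frac{\partial^2 L}{\partial x\,\partial y'}
  + \frac{\partial^2 L}{\partial y\,\partial y'}\,y'
  + \frac{\partial^2 L}{\partial y'^2}\,y''
% Example (assumed integrand): L = x\,(y')^2
%   \partial L/\partial y' = 2x\,y'   (partial: x, y, y' held as independent slots)
%   d/dx\,(2x\,y') = 2y' + 2x\,y''    (total: x and y'(x) both vary along the curve)
```

Treating d/dx as a partial derivative in the example would give only 2y' and silently drop the y'' term.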
J[y] = integral from a to b of L(x, y, y') dx   (minimize over y, endpoints fixed)
d/dx ( partial L / partial y' ) - partial L / partial y = 0   (Euler–Lagrange equation)
(partial L / partial y and partial L / partial y' are partial derivatives; d/dx is the total derivative along y(x))
A worked path: light and the shortest line
Try the friendliest case: the shortest curve between two points. Arc length is J[y] = integral from a to b of sqrt(1 + (y')^2) dx, so L = sqrt(1 + (y')^2). Notice L does not contain y at all, so partial L / partial y = 0. Meanwhile partial L / partial y' = y' / sqrt(1 + (y')^2). The Euler–Lagrange equation then says d/dx of that quantity equals zero, which means y' / sqrt(1 + (y')^2) is a constant — and that forces y' itself to be constant. The minimizer is a straight line, exactly as it must be. The machine recovered an obvious truth, which is how you trust it on the non-obvious cases.
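The step from "that quantity is constant" to "y' is constant" is short algebra. In the sketch below, c, m, k and the endpoint values y_a, y_b are names introduced here for illustration:

```latex
\frac{y'}{\sqrt{1 + (y')^2}} = c, \quad |c| < 1
\;\Longrightarrow\;
(y')^2 = c^2\bigl(1 + (y')^2\bigr)
\;\Longrightarrow\;
y' = \frac{c}{\sqrt{1 - c^2}} =: m
\;\Longrightarrow\;
y(x) = m x + k,
\qquad
m = \frac{y_b - y_a}{b - a}, \quad k = y_a - m a .
```

The sign of y' matches the sign of c, which is why only one root survives when taking the square root.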
That example exposed a gift: when a variable is missing from L, the Euler–Lagrange equation hands you a first integral for free. There L had no y, so partial L / partial y' was constant. A companion shortcut applies when L has no explicit x (when L = L(y, y') only), and it spares you a hard second-order equation. The Beltrami identity, L minus y' (partial L / partial y') = constant, follows directly from Euler–Lagrange and drops the order by one. It is exactly this shortcut that cracks the hanging-chain catenary problem and the fastest-descent brachistochrone, the headline puzzles of the next guides: in both, L has no explicit x, and Beltrami turns a fearsome equation into a separable one.
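A quick numerical check of the Beltrami identity follows. It assumes the integrand L = y sqrt(1 + (y')^2), a common textbook form for catenary-type problems that this guide does not write out, and it ignores any length constraint. The curve y = cosh(x) is taken as the candidate solution and a parabola as a non-solution for contrast.

```python
import math

def L(y, p):
    """Assumed integrand L(y, y') = y * sqrt(1 + y'^2); note there is no explicit x."""
    return y * math.sqrt(1.0 + p * p)

def dL_dp(y, p):
    """Partial derivative of L with respect to the y' slot."""
    return y * p / math.sqrt(1.0 + p * p)

def beltrami(y, p):
    """The Beltrami quantity L - y' * (partial L / partial y')."""
    return L(y, p) - p * dL_dp(y, p)

xs = (-1.0, -0.5, 0.0, 0.5, 1.0)
# Candidate solution y = cosh(x), with y' = sinh(x).
catenary_values = [beltrami(math.cosh(x), math.sinh(x)) for x in xs]
# Non-solution for contrast: y = 1 + x^2/2, with y' = x.
parabola_values = [beltrami(1.0 + 0.5 * x * x, x) for x in xs]

if __name__ == "__main__":
    print("catenary:", [round(v, 12) for v in catenary_values])
    print("parabola:", [round(v, 6) for v in parabola_values])
```

Along the cosh curve the quantity stays fixed, while along the parabola it drifts with x, which is the behaviour a first integral should have.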
The recipe, and the honest fine print
- Write the functional J[y] = integral of L(x, y, y') dx and read off the integrand L.
- Compute the two partial derivatives partial L / partial y and partial L / partial y', treating x, y, y' as independent.
- Take the total d/dx of partial L / partial y' (use the chain rule — it will produce y'' terms in general).
- Set d/dx(partial L / partial y') minus partial L / partial y = 0; if L has no explicit x, use the Beltrami first integral instead.
- Solve the resulting ODE and fix the constants with the two endpoint values y(a) and y(b).
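The recipe above can be sketched end to end on a small example. The integrand L = (y')^2 + y^2 on [0, 1] with y(0) = 0 and y(1) = 1 is an assumption chosen for illustration. Here partial L / partial y = 2y and partial L / partial y' = 2y', so Euler–Lagrange reduces to y'' = y, and the endpoint values give y = sinh(x) / sinh(1). The code only verifies that candidate; it does not derive it.

```python
import math

def y_star(x):
    """Candidate from Euler-Lagrange (y'' = y) with y(0) = 0 and y(1) = 1."""
    return math.sinh(x) / math.sinh(1.0)

def dy_star(x):
    """Derivative of the candidate."""
    return math.cosh(x) / math.sinh(1.0)

def J(y, dy, n=4000):
    """Midpoint-rule value of the integral of (y')^2 + y^2 over [0, 1]."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        total += dy(x) ** 2 + y(x) ** 2
    return total * h

def el_residual(x, h=1e-3):
    """y'' - y for the candidate, using a central second difference for y''."""
    ypp = (y_star(x + h) - 2.0 * y_star(x) + y_star(x - h)) / (h * h)
    return ypp - y_star(x)

J_star = J(y_star, dy_star)
# Competitor through the same endpoints: the straight line y = x.
J_line = J(lambda x: x, lambda x: 1.0)

if __name__ == "__main__":
    print("Euler-Lagrange residual at x = 0.5:", el_residual(0.5))
    print("J[candidate]:", J_star)
    print("J[straight line]:", J_line)
```

The candidate should give a smaller J than the straight line. Since this L has no explicit x, the Beltrami route would also work; the direct equation is used here only because it is already linear.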
Two honest cautions. First, Euler–Lagrange is only a NECESSARY condition: it locates a stationary function the way dy/dx = 0 locates a stationary point, but a solution might be a minimum, a maximum, or a saddle, and proving it truly minimizes needs a second-order test (the analogue of the Volume I second-derivative test). Second, the derivation quietly assumed the minimizer is smooth enough to be twice differentiable and that you may differentiate under the integral; for badly behaved L or non-smooth competitors those steps need more care. The equation is powerful and central, not magic.