Lagrange Multipliers

The constrained question

In the previous guide you hunted for the peaks and valleys of a function f(x, y) by setting its gradient to zero — the flat-tangent-plane condition of [[unconstrained-optimization|unconstrained optimization]]. But almost no real problem is unconstrained. A factory minimizes cost subject to producing a fixed quantity; a satellite seeks lowest fuel subject to reaching a fixed orbit; a soap film minimizes area subject to clinging to a fixed wire. In every case you are not free to roam the whole plane: you are pinned to a curve, the set of points where some condition g(x, y) = 0 holds. The question becomes: among only those allowed points, where is f largest or smallest?

Here is the picture to hold in your head. Draw the objective f as a landscape of contour lines — the level sets f(x, y) = c, each a curve along which f is constant, like the colored rings of a topographic map. Now overlay the constraint curve g(x, y) = 0, a single fixed path winding across that map. You are a hiker forbidden to leave the path. As you walk it, you cross from one contour ring to the next; the value of f rises and falls along your route. You want the moment on the path where f is as high as it can get.

Why the gradients must line up

Walk the constraint path slowly and watch which contour rings you cross. While the path cuts across the rings — slicing from a lower contour to a higher one — f is still changing, so you are not yet at the best point; keep going and you can climb higher. The value of f along the path can only stop changing when the path stops crossing rings. And the only way to be on a path while crossing no contour at all, even momentarily, is for the path to run tangent to a contour — to graze it, kissing one ring without cutting through it. That tangency is the entire secret.

Now translate tangency into gradients. Recall the key fact from this rung: a gradient is always perpendicular to its own level set, and it points in the direction of steepest increase. So nabla f is perpendicular to the contour of f, and nabla g is perpendicular to the constraint curve (which is just the level set g = 0). When the two curves are tangent, they share the same tangent line, and therefore the same perpendicular line. Two vectors in the plane that are both perpendicular to the same tangent line must be parallel to each other. Hence at the optimum, nabla f and nabla g point along the same line: nabla f = lambda nabla g for some scalar lambda, called the Lagrange multiplier. Equivalently, the directional derivative of f along the path's tangent direction is zero there: no uphill is left to gain without leaving the path.
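To make the tangency argument concrete, here is a minimal numerical sketch. It assumes an illustrative problem that is not taken from this guide: f(x, y) = x + y on the unit circle g(x, y) = x^2 + y^2 - 1 = 0. All function names are hypothetical. It compares a generic point on the path with the point at angle pi/4, where f peaks on the circle.

```python
import math

def grad_f(x, y):
    # Assumed objective f(x, y) = x + y, so its gradient is constant.
    return (1.0, 1.0)

def grad_g(x, y):
    # Assumed constraint g(x, y) = x^2 + y^2 - 1 (the unit circle).
    return (2.0 * x, 2.0 * y)

def cross(u, v):
    # 2D cross product: zero exactly when u and v are parallel.
    return u[0] * v[1] - u[1] * v[0]

def report(theta):
    # Point on the circle and the unit tangent of the path at that point.
    x, y = math.cos(theta), math.sin(theta)
    tangent = (-y, x)
    gf = grad_f(x, y)
    # Directional derivative of f along the path.
    slope_along_path = gf[0] * tangent[0] + gf[1] * tangent[1]
    return cross(gf, grad_g(x, y)), slope_along_path

generic = report(math.pi / 6)   # path still cuts across contours here
optimum = report(math.pi / 4)   # path is tangent to a contour here

print("generic point:", generic)
print("optimum      :", optimum)
```

At the generic point both numbers are clearly nonzero: the gradients are not parallel and f is still changing along the path. At the optimum both vanish up to rounding.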

The method, step by step

The parallel-gradient idea becomes a recipe. The condition nabla f = lambda nabla g is a vector equation, so in two variables it is really two scalar equations — one for the x-component, one for the y-component. Together with the constraint g = 0 that is three equations in three unknowns: x, y, and the multiplier lambda. Solve the system and the points you find are your candidates for the constrained best and worst.

1. Write the constraint in the form g(x, y) = 0 (move everything to one side), and identify the objective f(x, y) you are optimizing.
2. Compute both gradients, nabla f = (df/dx, df/dy) and nabla g = (dg/dx, dg/dy), using the partial derivatives you already know.
3. Set them parallel: df/dx = lambda dg/dx and df/dy = lambda dg/dy. These are your two component equations.
4. Adjoin the constraint g(x, y) = 0 as the third equation, and solve the three together for x, y, and lambda.
5. Evaluate f at every candidate point you found. When a constrained maximum and minimum exist (they always do on a closed, bounded constraint curve), the largest value is the constrained maximum and the smallest is the constrained minimum.
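The steps above can be sketched in a few lines of Python. The problem is an assumed illustration, not one from this guide: optimize f(x, y) = x + y on the unit circle g(x, y) = x^2 + y^2 - 1 = 0. The component equations are solved by hand in the comments, and the code then checks the residuals of all three equations and compares f across the candidates. Names such as `residuals` are hypothetical.

```python
import math

def f(x, y):
    # Assumed objective.
    return x + y

def g(x, y):
    # Assumed constraint, already written in the form g = 0.
    return x * x + y * y - 1.0

def residuals(x, y, lam):
    # The three equations: two gradient components and the constraint.
    # grad f = (1, 1), grad g = (2x, 2y).
    return (1.0 - lam * 2.0 * x, 1.0 - lam * 2.0 * y, g(x, y))

# Solving by hand: 1 = 2*lam*x and 1 = 2*lam*y give x = y = 1/(2*lam).
# Substituting into g = 0 gives lam^2 = 1/2, so there are two roots.
candidates = []
for lam in (1.0 / math.sqrt(2.0), -1.0 / math.sqrt(2.0)):
    x = y = 1.0 / (2.0 * lam)
    candidates.append((x, y, lam))

# Evaluate f at every candidate. The circle is closed and bounded,
# so the largest and smallest values are the constrained max and min.
best = max(candidates, key=lambda c: f(c[0], c[1]))
worst = min(candidates, key=lambda c: f(c[0], c[1]))

print("max:", f(best[0], best[1]), "at", best[:2])
print("min:", f(worst[0], worst[1]), "at", worst[:2])
```

For systems that cannot be solved by hand, a numerical root finder would replace the hand-derived roots; the final comparison of f across candidates stays the same.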

A clean way to remember all of this at once is the Lagrangian function L(x, y, lambda) = f(x, y) - lambda * g(x, y). Take its three partial derivatives and set each to zero. The x- and y-partials reproduce exactly the two parallel-gradient equations; the lambda-partial, beautifully, just hands back the constraint g = 0, because lambda appears linearly. So the whole constrained problem becomes an ordinary 'set the gradient to zero' problem — in one extra dimension. We have traded a constrained search in the plane for an unconstrained stationary-point hunt in (x, y, lambda)-space.

Objective:   maximize  f(x, y) = x * y     (area of a rectangle)
Constraint:  g(x, y) = 2x + 2y - P = 0     (fixed perimeter P)

Lagrangian:  L = x*y - lambda*(2x + 2y - P)

  dL/dx = 0  ->   y = 2 lambda
  dL/dy = 0  ->   x = 2 lambda     ==>  x = y   (a square!)
  dL/dlam = 0 ->  2x + 2y = P      ==>  x = y = P/4

so the largest-area rectangle of fixed perimeter is the square,
and lambda = (P/4)/2 = P/8  -- we read its meaning below.

The classic worked case: of all rectangles with a fixed perimeter, the square encloses the most area — and the multiplier falls out for free.
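As a sanity check on the worked case, the sketch below estimates the three partial derivatives of the Lagrangian by central differences. It assumes a sample perimeter P = 20, and the helper names are hypothetical. All three partials should vanish at (x, y, lambda) = (P/4, P/4, P/8), while at another rectangle with the same perimeter they should not.

```python
P = 20.0  # assumed sample perimeter, chosen only for illustration

def lagrangian(x, y, lam):
    # L = x*y - lambda*(2x + 2y - P), as in the worked example.
    return x * y - lam * (2.0 * x + 2.0 * y - P)

def partials(x, y, lam, h=1e-6):
    # Central-difference estimates of dL/dx, dL/dy, dL/dlam.
    dx = (lagrangian(x + h, y, lam) - lagrangian(x - h, y, lam)) / (2.0 * h)
    dy = (lagrangian(x, y + h, lam) - lagrangian(x, y - h, lam)) / (2.0 * h)
    dlam = (lagrangian(x, y, lam + h) - lagrangian(x, y, lam - h)) / (2.0 * h)
    return dx, dy, dlam

# The square found above: every partial is zero up to rounding.
at_square = partials(P / 4.0, P / 4.0, P / 8.0)

# A 6-by-4 rectangle also has perimeter 20, so dL/dlam is still zero,
# but the x- and y-partials are not: it is on the path, not stationary.
at_other = partials(6.0, 4.0, P / 8.0)

print("square   :", at_square)
print("6 x 4    :", at_other)
```

The second point shows why the constraint alone is not enough: satisfying g = 0 only zeroes the lambda-partial.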

What the multiplier actually means

It is tempting to treat lambda as a throwaway — an algebraic crutch you compute and discard. That sells it badly short. The multiplier carries real meaning: lambda is the sensitivity of the optimum to a loosening of the constraint. Write the constraint as g(x, y) = c and let the best achievable value of f be M(c). Then, to first order, dM/dc = lambda. In words: if you relax the budget by one unit, the best attainable f improves by about lambda units. That is why economists call lambda a shadow price — it is the marginal value of one more unit of the scarce resource the constraint is rationing.

Look back at the rectangle. We found the maximum area was a square with x = y = P/4, so the best area is M(P) = (P/4)^2 = P^2 / 16. Differentiate directly: dM/dP = 2P / 16 = P/8 — which is exactly the lambda we computed. The match is not a coincidence; it is the sensitivity theorem in action. Stretch the available perimeter by a tiny amount dP and the enclosed area grows by about (P/8) dP. The multiplier you might have thrown away was quietly telling you the exchange rate between the constraint and the prize.
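The same match can be checked without using the closed form for M(P). The sketch below finds the best area by brute force along the constraint, then differentiates that result numerically. The sample value P = 20, the grid size, and the function names are assumptions made for illustration.

```python
def best_area(P, n=1000):
    # Brute force over rectangles with perimeter P: sample widths x on
    # [0, P/2]; the constraint 2x + 2y = P then fixes y = P/2 - x.
    half = P / 2.0
    return max((half * i / n) * (half - half * i / n) for i in range(n + 1))

def sensitivity(P, h=1e-3):
    # Central-difference estimate of dM/dP, where M(P) = best_area(P).
    return (best_area(P + h) - best_area(P - h)) / (2.0 * h)

P = 20.0          # assumed sample perimeter
lam = P / 8.0     # multiplier from the worked example

print("numerical dM/dP:", sensitivity(P))
print("lambda = P/8   :", lam)
```

The two printed numbers should agree closely, which is the shadow-price reading of lambda: one more unit of perimeter is worth about lambda units of area.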

Honest limits, and where it goes next

Be honest about what the method guarantees, and what it does not. The parallel-gradient condition is a necessary condition for a constrained optimum, not a sufficient one. It locates stationary points along the constraint, but those can be maxima, minima, or neither, exactly as a flat tangent in one variable can mark a peak, a valley, or an inflection point. The method hands you a list of suspects; it does not by itself tell you which is the maximum and which the minimum. On a closed, bounded constraint curve the safe move is the one in the steps above: simply evaluate f at every candidate and read off the largest and smallest. To classify a candidate by curvature instead, you would test the [[bordered-hessian|bordered Hessian]], the constrained cousin of the second-derivative test from the last guide.
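A small sketch of a suspect that turns out to be innocent. The problem is an assumed illustration, not one from this guide: f(x, y) = x^3 restricted to the line g(x, y) = y = 0. The origin satisfies the parallel-gradient equations with lambda = 0, yet f is lower on one side of it and higher on the other. Function names are hypothetical.

```python
def f(x, y):
    # Assumed objective.
    return x ** 3

def g(x, y):
    # Assumed constraint: stay on the x-axis.
    return y

def grad_f(x, y):
    return (3.0 * x * x, 0.0)

def grad_g(x, y):
    return (0.0, 1.0)

def is_candidate(x, y, lam, tol=1e-12):
    # Checks grad f = lam * grad g together with g = 0.
    gf, gg = grad_f(x, y), grad_g(x, y)
    return (abs(gf[0] - lam * gg[0]) <= tol
            and abs(gf[1] - lam * gg[1]) <= tol
            and abs(g(x, y)) <= tol)

# The origin passes the necessary condition with lam = 0.
passes = is_candidate(0.0, 0.0, 0.0)

# Yet nearby points on the constraint lie on both sides of f(0, 0).
step = 0.1
left, centre, right = f(-step, 0.0), f(0.0, 0.0), f(step, 0.0)

print("candidate:", passes)
print("f left, centre, right:", left, centre, right)
```

Here nabla g is nonzero everywhere, so the failure has nothing to do with a degenerate constraint; the condition is simply not sufficient.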

There is a second honest caveat: the derivation quietly assumed nabla g is not the zero vector at the optimum. If nabla g vanishes there, the constraint curve may have a singular point such as a cusp or a self-crossing, where its tangent direction is undefined, and the tidy 'parallel gradients' picture breaks down, since you cannot be parallel to a vector that has no direction. Such points must be checked separately by hand. This nondegeneracy is the constraint-qualification fine print; it is genuinely needed, not a formality, and skipping the check is how an otherwise careful solution occasionally misses the true answer.
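A minimal sketch of this failure, using an assumed textbook-style example that does not appear in this guide: minimize f(x, y) = x on the curve g(x, y) = y^2 - x^3 = 0, which has a cusp at the origin. The curve is sampled through the parametrization (t^2, t^3), and all names are hypothetical.

```python
def f(x, y):
    # Assumed objective: the x-coordinate.
    return x

def g(x, y):
    # Assumed constraint with a cusp at the origin.
    return y ** 2 - x ** 3

def grad_g(x, y):
    return (-3.0 * x * x, 2.0 * y)

# Every point (t^2, t^3) lies on the curve, so sample it directly.
points = [((i / 100.0) ** 2, (i / 100.0) ** 3) for i in range(-200, 201)]

# The true constrained minimum of f on the sampled curve.
best = min(points, key=lambda p: f(p[0], p[1]))

def x_residual(lam):
    # x-component of grad f - lam * grad g at the minimizer,
    # with grad f = (1, 0). A Lagrange solution would make this zero.
    return 1.0 - lam * grad_g(best[0], best[1])[0]

print("minimizer:", best)
print("grad g there:", grad_g(best[0], best[1]))
print("residuals for several lam:", [x_residual(lam) for lam in (-1e6, 0.0, 1e6)])
```

The minimizer is the cusp itself, where nabla g is the zero vector, so the residual stays at 1 for every lambda tried. Solving the three equations would therefore return no candidates at all, and the true minimum is found only by checking the point where nabla g vanishes.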