Inequality Constraints & the KKT Conditions

When the fence may or may not be touched

The previous guide solved problems where a constraint had to hold exactly: minimize f(x, y) subject to g(x, y) = 0. The Lagrange multiplier told you the optimum sits where nabla f is parallel to nabla g, because along the curve g = 0 there is no allowed direction left in which f still decreases. But most real constraints are not equalities — they are limits you must not cross. A budget says total cost <= B, not cost = B. A material says stress <= S. A probability says p >= 0. These are inequality constraints, written g(x) <= 0, and they change the geometry in one decisive way: the feasible region is now a solid region with a boundary, not a thin curve.

With a solid feasible region, an optimum can land in one of two genuinely different places, and the whole theory pivots on which. Either the best point sits strictly inside the region, comfortably away from every fence — then the constraint is doing nothing, the problem is locally an ordinary unconstrained optimization, and you simply solve nabla f = 0 as in Volume I. Or the best point is pressed up against a fence g(x) = 0 — then that constraint is active (binding), and it behaves exactly like the equality constraint of the previous guide. The hard part is that you usually do not know in advance which case you are in; the KKT conditions are precisely the bookkeeping that lets the algebra decide for you.

The sign of the multiplier now carries meaning

Suppose the minimum does sit on the fence g = 0. As with equality constraints we still need nabla f = mu nabla g for some multiplier mu, because at an optimum no feasible direction may decrease f. But an inequality is one-sided, and that asymmetry pins down the SIGN of mu — something the equality case never cared about. Recall that the gradient nabla g points in the direction of increasing g, that is, outward across the fence into the forbidden region g > 0. To stay feasible you may only step where nabla g points inward or sideways. For f to have no feasible descent direction at such a point, nabla f must point in the same outward direction as nabla g, which forces the multiplier to be nonnegative: mu >= 0.

Read that sign condition physically and it stops being a technicality. The multiplier mu is a shadow price: it measures how much the optimal value of f would improve if you relaxed the fence by a hair, loosening g <= 0 to g <= epsilon. A nonnegative mu says exactly what intuition demands — loosening a binding constraint can only help you (or do nothing), never hurt. If your algebra ever returns mu < 0 on an active constraint, that is not a feasible minimum: it means f would actually decrease if you stepped OFF the fence into the interior, so the true optimum is inside and this constraint should have been declared inactive. The sign is the test that catches that mistake.

Complementary slackness: the clever switch

Now for the trick that fuses both cases — inside or on the fence — into one tidy system. We keep a multiplier mu for the constraint and impose a single extra equation: mu g(x) = 0. This is complementary slackness, and it is an elegant logical switch. A product is zero only if at least one factor is zero, so at any solution either mu = 0 (the multiplier switches off) or g = 0 (the constraint is active), or both. If the optimum is strictly inside, then g < 0, so the equation forces mu = 0 and the constraint silently vanishes from nabla f = mu nabla g, leaving the plain nabla f = 0. If the optimum is on the fence, then g = 0 and mu is free to be positive and do its job. One equation lets the solution choose its own regime.

Assemble the pieces and you have the full Karush-Kuhn-Tucker conditions — the KKT conditions — for minimizing f subject to several inequality constraints g_i(x) <= 0 and possibly equality constraints h_j(x) = 0. Build the Lagrangian L = f + sum mu_i g_i + sum lambda_j h_j, exactly as in the previous guide but with two families of multipliers, and demand four things at once: stationarity (nabla L = 0 in x), primal feasibility (every g_i <= 0 and h_j = 0), dual feasibility (every mu_i >= 0 — but the equality multipliers lambda_j stay unrestricted in sign), and complementary slackness (every mu_i g_i = 0). These multiplier families generalise directly from the multiple-constraints machinery of the last guide.

KKT conditions  (minimize f, constraints g_i(x) <= 0, h_j(x) = 0)

Lagrangian:   L = f + sum_i mu_i g_i + sum_j lambda_j h_j

  (1) stationarity           nabla_x L = nabla f + sum mu_i nabla g_i + sum lambda_j nabla h_j = 0
  (2) primal feasibility     g_i(x) <= 0   and   h_j(x) = 0       (stay legal)
  (3) dual feasibility       mu_i >= 0                            (lambda_j free in sign)
  (4) complementary slack    mu_i * g_i(x) = 0  for every i       (the switch)

Reading (4):  g_i < 0 (slack)  => mu_i = 0   (constraint idle)
              mu_i > 0         => g_i = 0     (constraint active, on the fence)

The four KKT blocks. Stationarity and primal feasibility you already met for equalities; dual feasibility (mu >= 0) and complementary slackness (mu g = 0) are what inequalities add.

Working it: enumerate the cases

In practice, complementary slackness turns a problem with inequalities into a small finite hunt: for each constraint you guess active or inactive, solve the resulting equality system, and then check the guess against the conditions it did not assume. Take a clean two-variable example — minimize f(x, y) = (x - 2)^2 + (y - 2)^2, the squared distance to the point (2, 2), subject to x + y <= 2. The unconstrained minimum is plainly at (2, 2), but there x + y = 4 > 2, which is infeasible, so the fence must bite. The constraint is g = x + y - 2 <= 0; we test it active.

Guess the constraint inactive (mu = 0): then nabla f = 0 gives (x, y) = (2, 2). Check feasibility: x + y = 4, which violates x + y <= 2. Rejected — the interior case fails.
Guess the constraint active (g = 0): write stationarity nabla f + mu nabla g = 0. Here nabla f = (2(x - 2), 2(y - 2)) and nabla g = (1, 1), giving 2(x - 2) + mu = 0 and 2(y - 2) + mu = 0.
Subtracting the two equations gives x = y; the active constraint x + y = 2 then forces x = y = 1. So the candidate point is (1, 1).
Solve for the multiplier: 2(1 - 2) + mu = 0 gives mu = 2. Check dual feasibility: mu = 2 >= 0. Passed. So (1, 1) with mu = 2 satisfies every KKT condition and is the constrained minimum.

The picture matches the algebra perfectly: the nearest point of the half-plane x + y <= 2 to the target (2, 2) is the foot of the perpendicular onto the boundary line, namely (1, 1), and the positive multiplier mu = 2 records that pushing the fence outward would let you get closer to (2, 2) — a strictly helpful relaxation, exactly as a nonnegative shadow price should read. With several constraints you would repeat this case-enumeration over every subset of constraints declared active, but the logic per case is identical.

Necessary, sufficient, and honest limits

Be precise about what the KKT conditions deliver, because here it is easy to overclaim. In general the conditions are NECESSARY but not sufficient: at a genuine local minimum they hold (so any minimum is a KKT point), yet a point can satisfy all four conditions and still be a maximum, a saddle, or merely a local optimum — exactly like nabla f = 0 in unconstrained Volume I calculus, which flags critical points without telling you their type. KKT narrows the search to a finite list of candidates; you must still test each one to find the true minimizer.

There are two important caveats. First, necessity itself needs a mild regularity hypothesis — a constraint qualification (the commonest is that the gradients of the active constraints are linearly independent at the point). Without it the multipliers may fail to exist even at a true minimum, so KKT can quietly miss it; such degenerate corners are rare in practice but real. Second, and far more cheering, is the convex case: when f and every g_i is a convex function and the equality constraints are linear, the KKT conditions become both necessary AND sufficient, and any point that satisfies them is a global minimum, not merely local. This is the structural reason local versus global optimum anxieties evaporate for convex problems and why convexity is the property engineers and machine-learning practitioners prize above almost all others.