Critical Points & the Second-Derivative Test

Where the ground goes flat

In Volume I you hunted for the highs and lows of a curve y = f(x) by setting the derivative to zero. At a critical point the tangent line is horizontal, and a max or min can only hide where the slope has vanished. Lift this onto a surface z = f(x, y) — a rolling landscape over the plane — and the same instinct holds, but now there is no single slope. At each point the surface tilts by a different amount in every compass direction, and all of that tilt is packaged into one vector: the [[gradient|gradient]] nabla f = (df/dx, df/dy), the partial slope along x stacked on the partial slope along y.

A maximum or minimum can only live where the surface is locally flat, and locally flat means flat in every direction at once, not just along the two axes. The gradient encodes the steepest uphill direction and how steep it is; if the gradient were nonzero at a point, you could walk a little along it and climb, or walk against it and descend, so that point could be neither a peak nor a pit. So we demand the whole vector vanish: nabla f = (0, 0). A point where the gradient is the zero vector is a [[stationary-point-multivariable|stationary point]] (also called a critical point), and these are the only candidates for an interior maximum or minimum of a smooth function.

Concretely, nabla f = (0, 0) is two equations, df/dx = 0 and df/dy = 0, solved simultaneously. For f(x, y) = x^2 + y^2 - 4x the partials are 2x - 4 and 2y, so the only stationary point is (2, 0), and because the surface is a bowl, that point is the bottom. For f(x, y) = x^2 - y^2 the partials are 2x and -2y, again giving a single stationary point, this time (0, 0), but this surface curves up along x and down along y. Same recipe, very different geography. The gradient test found the candidate in both cases; it simply cannot, by itself, tell a valley floor from a mountain pass.
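If you want to double-check a hand-solved stationary point, a numerical gradient is a quick sanity test. The sketch below is only an illustration: the helper name grad and the step size h are assumptions made for this example, and the partials are estimated with central differences instead of being computed symbolically.

```python
def grad(f, x, y, h=1e-6):
    """Central-difference estimate of (df/dx, df/dy) at (x, y)."""
    dfdx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    dfdy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return dfdx, dfdy

def bowl(x, y):
    return x**2 + y**2 - 4 * x

def saddle(x, y):
    return x**2 - y**2

print(grad(bowl, 2.0, 0.0))    # both components close to 0: stationary
print(grad(saddle, 0.0, 0.0))  # both components close to 0: stationary
print(grad(bowl, 0.0, 0.0))    # x-component close to -4: not stationary
```

Both candidates pass the same gradient check, which is the point: nothing in this output distinguishes the bowl from the saddle.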

A new animal: the saddle

Single-variable calculus offered three fates at a flat point: a local max, a local min, or an inflection where the curve flattens but keeps going the same way. The plane adds a genuinely new species. Look hard at f(x, y) = x^2 - y^2 at the origin. Slice the surface with the plane y = 0 and you see the parabola z = x^2 — a valley, smiling upward, with the origin at its lowest. Slice instead with x = 0 and you see z = -y^2 — a hill, frowning downward, with the origin at its highest. The very same point is the bottom of one cross-section and the top of another.

This is a [[saddle-point|saddle point]], and the name is exactly right: it is shaped like a horse's saddle or a Pringle. Sit at the center and the surface rises in front of you and behind you, while falling away to your left and right. A marble placed there is in equilibrium — the ground is flat, the gradient is zero — but the equilibrium is unstable: nudge it along the rising axis and it rolls back, nudge it along the falling axis and it rolls clean away. A saddle is a stationary point that is neither a maximum nor a minimum, because the surface goes up in some directions and down in others. There is no analogue of this on a curve, because a curve has no 'other direction' to disagree with the first.
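To see the disagreement between directions numerically, you can sample f(x, y) = x^2 - y^2 on a small circle around the origin. This is a rough sketch; the helper values_on_circle, the radius, and the number of sample points are all choices made for this illustration.

```python
import math

def saddle(x, y):
    return x**2 - y**2

def values_on_circle(f, radius=0.1, n=8):
    """f sampled at n evenly spaced points on a small circle around the origin."""
    out = []
    for k in range(n):
        t = 2 * math.pi * k / n
        out.append(f(radius * math.cos(t), radius * math.sin(t)))
    return out

vals = values_on_circle(saddle)
print([round(v, 4) for v in vals])
print("rises somewhere:", max(vals) > saddle(0.0, 0.0))
print("falls somewhere:", min(vals) < saddle(0.0, 0.0))
```

Some samples sit above the value at the center and some sit below it, so the origin can be neither a local maximum nor a local minimum.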

The Hessian: a matrix of curvatures

To classify a flat point we need curvature, the way a single variable needed the second derivative f''. With two inputs there are several second partials, and the bookkeeping that holds them is the [[calc-hessian-matrix|Hessian matrix]] H. Its entries are the second partial derivatives: H = [d^2f/dx^2, d^2f/dxdy; d^2f/dydx, d^2f/dy^2]. The diagonal entries measure how the surface curves along each axis on its own; the off-diagonal entries measure how the slope in x changes as you move in y — the twist that couples the two directions together.
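When second partials are tedious to compute by hand, a finite-difference estimate of the Hessian can serve as a cross-check. The sketch below is one possible way to do it, not a canonical one: the function name hessian, the test surface twisted, and the step size h are assumptions for illustration. Note that it uses a single stencil for both off-diagonal entries, so its output is symmetric by construction.

```python
def hessian(f, x, y, h=1e-4):
    """Central-difference estimate of [[f_xx, f_xy], [f_xy, f_yy]] at (x, y)."""
    fxx = (f(x + h, y) - 2 * f(x, y) + f(x - h, y)) / h**2
    fyy = (f(x, y + h) - 2 * f(x, y) + f(x, y - h)) / h**2
    # Four-point stencil for the mixed partial (the "twist" entry).
    fxy = (f(x + h, y + h) - f(x + h, y - h)
           - f(x - h, y + h) + f(x - h, y - h)) / (4 * h**2)
    return [[fxx, fxy], [fxy, fyy]]

def twisted(x, y):
    # Hand-computed Hessian is [[2, 3], [3, -2]] at every point.
    return x**2 + 3 * x * y - y**2

for row in hessian(twisted, 1.0, 2.0):
    print([round(v, 3) for v in row])
```

The diagonal entries recover the curvature along each axis, and the off-diagonal entry recovers the coefficient of the xy coupling term.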

A gift makes the Hessian symmetric. Clairaut's theorem says that for a function whose second partials are continuous, the order of differentiation does not matter: d^2f/dxdy = d^2f/dydx, the mixed partials agree. So the off-diagonal entries are equal, H = [a, b; b, c], and under that hypothesis the matrix is symmetric. This is not a cosmetic nicety: a real symmetric matrix always has real eigenvalues, which is exactly what guarantees, later, that all the curvature information is real and well-behaved, with no twisting into imaginary numbers. Keep the honest caveat in mind, though: if the second partials are not continuous, the mixed partials can genuinely differ and this whole machine needs more care.
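A small numerical experiment can make Clairaut's theorem feel less like a decree. In the sketch below, the sample function f(x, y) = x^3 y^2 + sin(xy) and its first partials are chosen and worked out by hand for this illustration. Each first partial is then differentiated numerically in the other variable, so the two mixed partials are computed by genuinely different routes.

```python
import math

def f_x(x, y):
    # df/dx for f(x, y) = x^3 y^2 + sin(x y), worked out by hand
    return 3 * x**2 * y**2 + y * math.cos(x * y)

def f_y(x, y):
    # df/dy for the same f, worked out by hand
    return 2 * x**3 * y + x * math.cos(x * y)

def mixed_partials(x, y, h=1e-5):
    """Return (d/dy of f_x, d/dx of f_y), each by central differences."""
    x_then_y = (f_x(x, y + h) - f_x(x, y - h)) / (2 * h)
    y_then_x = (f_y(x + h, y) - f_y(x - h, y)) / (2 * h)
    return x_then_y, y_then_x

first, second = mixed_partials(0.7, -1.3)
print(first, second, abs(first - second))
```

The two numbers agree up to finite-difference error, as the theorem predicts for a function this smooth. The check says nothing about functions whose second partials are discontinuous.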

Why does a matrix, of all things, capture curvature? Because of the [[second-order-taylor-expansion-multivariable|second-order Taylor expansion]]. Recall from Volume I that near a point a single-variable function is f(a) + f'(a)(x-a) + (1/2)f''(a)(x-a)^2. The multivariable version says: near a stationary point, where the gradient term drops out entirely, f(x) is approximately f at the point plus (1/2) times d-transpose-times-H-times-d, where d is the small step you take. That quadratic form d^T H d is the surface's local shape, stripped of everything but its curving. Classifying the stationary point is therefore the same question as: is this quadratic form always positive, always negative, or does its sign depend on which way d points?
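Here is a minimal sketch of that approximation at work. The example function exp(x^2 + xy + 2y^2) is an assumption picked for illustration: it has a stationary point at the origin, and its Hessian there, [2, 1; 1, 4], was computed by hand. The helper names quad_form and taylor2 are likewise hypothetical.

```python
import math

def f(x, y):
    return math.exp(x**2 + x * y + 2 * y**2)

H = [[2.0, 1.0], [1.0, 4.0]]  # Hessian of f at the origin, computed by hand

def quad_form(H, d):
    """d^T H d for a 2-by-2 matrix H and a step d = (dx, dy)."""
    dx, dy = d
    return H[0][0] * dx * dx + (H[0][1] + H[1][0]) * dx * dy + H[1][1] * dy * dy

def taylor2(d):
    """Second-order approximation of f near the stationary point (0, 0)."""
    return f(0.0, 0.0) + 0.5 * quad_form(H, d)

for d in [(0.1, -0.05), (0.05, -0.025)]:
    exact = f(*d)
    approx = taylor2(d)
    print(d, exact, approx, abs(exact - approx))
```

Halving the step shrinks the error by much more than half, which is what you expect when the quadratic form has captured everything up to second order.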

Definiteness reads the verdict

The single fact that decides everything is the [[definiteness-of-the-hessian|definiteness of the Hessian]]. Call H positive definite if d^T H d is strictly positive for every nonzero step d: the surface curves upward no matter which way you lean, so the point is a strict local minimum. Call it negative definite if d^T H d is strictly negative for every nonzero d: the surface curves down in all directions, a strict local maximum. If d^T H d is positive for some directions and negative for others, H is indefinite, and that mixed verdict is precisely a saddle: up this way, down that way. Definiteness is the direction-free summary the gradient could not give you.
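One crude way to build intuition for definiteness is to evaluate d^T H d over many unit directions and look at the signs. The sketch below does exactly that; the function names and the number of sampled directions are assumptions for this illustration, and a finite scan like this can only suggest definiteness, never prove it.

```python
import math

def quad_form(H, d):
    """d^T H d for a 2-by-2 matrix H and a direction d = (dx, dy)."""
    dx, dy = d
    return H[0][0] * dx * dx + (H[0][1] + H[1][0]) * dx * dy + H[1][1] * dy * dy

def describe(H, n=360):
    """Heuristic label from the signs of d^T H d over n unit directions."""
    vals = []
    for k in range(n):
        t = 2 * math.pi * k / n
        vals.append(quad_form(H, (math.cos(t), math.sin(t))))
    if all(v > 0 for v in vals):
        return "looks positive definite"
    if all(v < 0 for v in vals):
        return "looks negative definite"
    if min(vals) < 0 < max(vals):
        return "indefinite"
    return "inconclusive"

print(describe([[2.0, 0.0], [0.0, 2.0]]))    # bowl opening upward
print(describe([[-2.0, 1.0], [1.0, -3.0]]))  # bowl opening downward
print(describe([[2.0, 0.0], [0.0, -2.0]]))   # the saddle x^2 - y^2
```

An "indefinite" answer is trustworthy, since it exhibits one direction of each sign. A "looks definite" answer is only as good as the sampling.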

For a 2-by-2 Hessian H = [a, b; b, c], definiteness reduces to a famous, checkable rule built from the determinant D = ac - b^2. If D > 0 and a > 0, the form is positive definite: a local minimum. If D > 0 and a < 0, it is negative definite: a local maximum. If D < 0, the form is indefinite: a saddle point, guaranteed. The determinant is doing real work here: a positive D means the two axis curvatures a and c share a sign and the twist b^2 is too small to flip anything, so the bowl holds; a negative D means either a and c already disagree in sign, as in x^2 - y^2, or the twist b^2 outweighs their product ac and tears the bowl into a saddle. The single number a then just says which way a same-sign bowl opens.

Stationary point of f(x,y), Hessian H = [a, b; b, c],  D = a*c - b*b

   D > 0,  a > 0   ->  local MINIMUM   (bowl up,  positive definite)
   D > 0,  a < 0   ->  local MAXIMUM   (bowl down, negative definite)
   D < 0           ->  SADDLE point    (mixed,    indefinite)
   D = 0           ->  TEST FAILS      (flat to 2nd order -- look higher)

example  f = x^2 - y^2      :  a=2, c=-2, b=0  ->  D = -4 < 0       ->  saddle
example  f = x^2 + y^2 - 4x :  a=2, c= 2, b=0  ->  D =  4 > 0, a>0  ->  minimum

The 2-by-2 second-derivative test at a glance, with the two surfaces from earlier slotted in.
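The 2-by-2 rule translates almost line for line into code. The sketch below is one way to write it; the function name classify and the returned labels are choices made for this illustration. It assumes you have already confirmed the point is stationary and evaluated a, b, c there.

```python
def classify(a, b, c):
    """Second-derivative test for H = [a, b; b, c] at a stationary point."""
    D = a * c - b * b
    if D > 0:
        # D > 0 forces a and c to share a sign, so a cannot be zero here.
        return "minimum" if a > 0 else "maximum"
    if D < 0:
        return "saddle"
    # D == 0: the test is silent. With floating-point inputs, a D that is
    # merely tiny deserves the same suspicion.
    return "inconclusive"

print(classify(2, 0, -2))  # x^2 - y^2
print(classify(2, 0, 2))   # x^2 + y^2 - 4x
print(classify(1, 3, 1))   # same-sign axis curvatures, but the twist dominates
```

The last call is worth a second look: both axis curvatures are positive, yet D = 1 - 9 < 0, so the point is a saddle. Slicing along the axes alone would have missed it.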

Reading the fine print

Be honest about where this [[second-derivative-test-multivariable|second-derivative test]] falls silent: the case D = 0. When the determinant is exactly zero, the quadratic form is flat along at least one direction — it neither curves up nor down to second order — and the test simply cannot decide. The verdict then hangs on cubic or higher terms that the Hessian never saw. The classic trap is the 'monkey saddle' f(x, y) = x^3 - 3xy^2, whose Hessian at the origin is the zero matrix, so D = 0; the test shrugs, and only by looking at the cubic shape do you find a three-way saddle with room for two legs and a tail. A zero determinant is a flag that says 'look higher,' never 'this is a minimum.'
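You can watch the monkey saddle defeat the test and still reveal its shape by sampling it on a small ring around the origin. In this sketch the helper ring_values, the radius, and the choice of six sample directions are assumptions made for illustration.

```python
import math

def monkey(x, y):
    return x**3 - 3 * x * y**2

def ring_values(f, radius=0.1, n=6):
    """f at n evenly spaced points on a circle around the origin."""
    return [f(radius * math.cos(2 * math.pi * k / n),
              radius * math.sin(2 * math.pi * k / n)) for k in range(n)]

vals = ring_values(monkey)
signs = ["up" if v > 0 else "down" for v in vals]
print(signs)  # alternates: up, down, up, down, up, down
```

Three rising directions interleaved with three falling ones is exactly the three-way saddle, and none of it is visible in the Hessian, whose entries 6x, -6y, -6y, -6x all vanish at the origin.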

Two more honest limits, both inherited straight from Volume I. First, the test is purely local: it certifies a point as a low spot relative only to its immediate neighbors. A function can have a tidy local minimum that is nowhere near its true global minimum elsewhere on the surface — the second-derivative test never sees the whole landscape, only the dimple under its feet. Second, it speaks only about interior stationary points. The actual best value of a real design problem often sits on a boundary or at a corner the gradient never flattens at, exactly as an endpoint could beat every interior critical point on a closed interval in single-variable optimization.

Compute both first partials df/dx and df/dy, set each to zero, and solve the system simultaneously to list every interior stationary point.
Form the Hessian H = [d^2f/dx^2, d^2f/dxdy; d^2f/dydx, d^2f/dy^2] and evaluate its entries at each stationary point separately.
Compute D = ac - b^2 at that point; if D > 0 read off a min (a > 0) or a max (a < 0), if D < 0 declare a saddle, and if D = 0 fall back to higher-order analysis.
For a true optimum, also check the boundary and any corners separately, then compare those candidate values against the interior winners to crown a global best.
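The steps above can be sketched end to end for f(x, y) = x^2 + y^2 - 4x. Everything specific here is an assumption for illustration: the rectangle 0 <= x <= 3, -1 <= y <= 1 is a made-up region, the stationary point and Hessian entries are entered by hand, and the boundary is scanned on a coarse grid where a careful solution would optimize along each edge.

```python
def f(x, y):
    return x**2 + y**2 - 4 * x

def classify(a, b, c):
    """Second-derivative test for H = [a, b; b, c] at a stationary point."""
    D = a * c - b * b
    if D > 0:
        return "minimum" if a > 0 else "maximum"
    return "saddle" if D < 0 else "inconclusive"

# Steps 1-3 by hand: grad f = (2x - 4, 2y) vanishes only at (2, 0),
# and the Hessian there is [2, 0; 0, 2].
interior = [((2.0, 0.0), classify(2.0, 0.0, 2.0))]

def boundary_points(x0, x1, y0, y1, n=60):
    """Grid points along the four edges of the rectangle, corners included."""
    pts = []
    for k in range(n + 1):
        t = k / n
        x = x0 + t * (x1 - x0)
        y = y0 + t * (y1 - y0)
        pts += [(x, y0), (x, y1), (x0, y), (x1, y)]
    return pts

# Step 4: pool interior candidates with boundary samples and compare values.
candidates = [p for p, _ in interior] + boundary_points(0.0, 3.0, -1.0, 1.0)
best_min = min(candidates, key=lambda p: f(*p))
best_max = max(candidates, key=lambda p: f(*p))

print(interior)
print("global min candidate:", best_min, f(*best_min))
print("global max candidate:", best_max, f(*best_max))
```

On this region the interior stationary point wins the minimum, but the maximum lands on a corner where the gradient is nowhere near zero, which is why the boundary check cannot be skipped.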