The Chain Rule and Equality of Mixed Partials

The chain rule is matrix multiplication

Once the derivative is a linear map, the multivariable chain rule becomes beautifully simple: the derivative of a composition is the composition of the derivatives. In matrix terms, the Jacobian of g∘f at a is the product of the Jacobian of g at f(a) and the Jacobian of f at a, in that order. This is one of the main payoffs of defining the total derivative as a linear map in the first place.

Chain rule:  if f is differentiable at a and g is differentiable at f(a), then g∘f is differentiable at a and

   D(g∘f)(a) = Dg(f(a)) * Df(a)     (matrix product).

Entrywise, for h = g(f(x_1,...,x_n)) with f having components u_k, and each dg/du_k evaluated at u = f(x):

   dh/dx_j = sum_k  (dg/du_k) * (du_k/dx_j).
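The matrix form can be checked numerically. The following is a minimal sketch, not a definitive implementation: the maps f: R^2 -> R^3 and g: R^3 -> R^2 are made up for illustration (neither comes from the text), the helpers jacobian and matmul are hypothetical names, and the Jacobians are approximated by central differences rather than computed exactly.

```python
import math

def jacobian(func, point, h=1e-6):
    """Central-difference Jacobian of func at point, returned as a list of rows."""
    n_out = len(func(point))
    rows = [[0.0] * len(point) for _ in range(n_out)]
    for j in range(len(point)):
        plus = list(point)
        minus = list(point)
        plus[j] += h
        minus[j] -= h
        f_plus, f_minus = func(plus), func(minus)
        for i in range(n_out):
            rows[i][j] = (f_plus[i] - f_minus[i]) / (2 * h)
    return rows

def matmul(A, B):
    """Plain matrix product of two list-of-rows matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

# Illustrative maps (assumptions, not from the text).
def f(p):
    x, y = p
    return [x * y, x + y * y, math.sin(x)]      # R^2 -> R^3

def g(q):
    u, v, w = q
    return [u * v + w, math.exp(u) * w]         # R^3 -> R^2

def g_after_f(p):
    return g(f(p))

a = [0.7, -0.3]
lhs = jacobian(g_after_f, a)                         # D(g∘f)(a), 2x2
rhs = matmul(jacobian(g, f(a)), jacobian(f, a))      # Dg(f(a)) * Df(a), (2x3)(3x2)
max_gap = max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(2))
print("largest entrywise difference:", max_gap)
```

Note that the Jacobian of g is evaluated at f(a), not at a; evaluating it at the wrong point is the most common way to get the product wrong.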

Worked example: z = g(u, v) with u = x^2 y, v = x + y. Then, with g_u and g_v evaluated at (u, v),

   dz/dx = g_u * (2xy) + g_v * (1)
   dz/dy = g_u * (x^2) + g_v * (1).

As matrices, Df = [ 2xy   x^2 ;  1   1 ] and Dg = [ g_u  g_v ];
the row-times-matrix product reproduces both lines above.

The chain rule as a Jacobian product, then unpacked entry by entry.
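The worked example can be sanity-checked in code. This sketch assumes a concrete outer function, g(u, v) = sin(u) + u v^2, chosen only for illustration (the text leaves g unspecified), and compares the row-times-matrix product against a central-difference gradient of z.

```python
import math

# Hypothetical choice of g; the text does not fix one.
def g(u, v):
    return math.sin(u) + u * v ** 2

def g_u(u, v):
    return math.cos(u) + v ** 2

def g_v(u, v):
    return 2 * u * v

def z(x, y):
    # z = g(u, v) with u = x^2 y and v = x + y, as in the worked example.
    return g(x ** 2 * y, x + y)

def chain_rule_gradient(x, y):
    u, v = x ** 2 * y, x + y
    Dg = [g_u(u, v), g_v(u, v)]                 # 1x2 row, evaluated at (u, v)
    Df = [[2 * x * y, x ** 2],                  # row of partials of u
          [1.0, 1.0]]                           # row of partials of v
    dz_dx = Dg[0] * Df[0][0] + Dg[1] * Df[1][0]
    dz_dy = Dg[0] * Df[0][1] + Dg[1] * Df[1][1]
    return [dz_dx, dz_dy]

def numeric_gradient(x, y, h=1e-6):
    dz_dx = (z(x + h, y) - z(x - h, y)) / (2 * h)
    dz_dy = (z(x, y + h) - z(x, y - h)) / (2 * h)
    return [dz_dx, dz_dy]

analytic = chain_rule_gradient(1.2, 0.5)
numeric = numeric_gradient(1.2, 0.5)
print("chain rule:        ", analytic)
print("finite differences:", numeric)
```

The two printed gradients should agree to within the finite-difference error.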

Higher derivatives and mixed partials

Differentiate a partial derivative again and you get a second-order derivative, the first of the higher-order derivatives. The interesting ones are the mixed partial derivatives, where you differentiate with respect to different variables. Collecting all second partials into a matrix gives the Hessian, the second-order analogue of the gradient. A natural worry: does the order of differentiation matter? Is d/dx d/dy f the same as d/dy d/dx f?

Usually it does not. Clairaut's theorem (also called Schwarz's theorem) says: if the mixed partials d/dx d/dy f and d/dy d/dx f exist and are continuous on a neighborhood of a, they are equal at a. The off-diagonal entries of the Hessian are exactly these mixed partials, which is why the Hessian is symmetric for any C^2 function.
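A small numerical illustration of the symmetry, offered as a sketch under stated assumptions: the function f(x, y) = x^3 y^2 + sin(xy) is an arbitrary smooth example (not from the text), its first partials are written out by hand, and each is then differentiated once more by central differences, so the two off-diagonal entries come from genuinely different computations.

```python
import math

# First partials of the illustrative function f(x, y) = x^3 y^2 + sin(x y),
# derived by hand.
def f_x(x, y):
    return 3 * x ** 2 * y ** 2 + y * math.cos(x * y)

def f_y(x, y):
    return 2 * x ** 3 * y + x * math.cos(x * y)

def d_dx(func, x, y, h=1e-6):
    return (func(x + h, y) - func(x - h, y)) / (2 * h)

def d_dy(func, x, y, h=1e-6):
    return (func(x, y + h) - func(x, y - h)) / (2 * h)

x0, y0 = 0.8, -1.1

# Row i holds the derivatives of the i-th first partial.
hessian = [
    [d_dx(f_x, x0, y0), d_dy(f_x, x0, y0)],   # f_xx, then x first and y second
    [d_dx(f_y, x0, y0), d_dy(f_y, x0, y0)],   # y first and x second, then f_yy
]

# Closed form of the mixed partial, for comparison.
exact_mixed = 6 * x0 ** 2 * y0 + math.cos(x0 * y0) - x0 * y0 * math.sin(x0 * y0)

print("off-diagonal entries:", hessian[0][1], hessian[1][0])
print("closed-form mixed partial:", exact_mixed)
```

The two off-diagonal entries should match each other and the closed form up to finite-difference error.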

When the order does matter

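The standard counterexample is f(x, y) = xy(x^2 - y^2)/(x^2 + y^2) with f(0, 0) = 0. Both mixed partials exist at the origin, but d/dy d/dx f(0, 0) = -1 while d/dx d/dy f(0, 0) = +1; the mixed partials are not continuous there, so Clairaut's theorem does not apply. In practice, almost every function you meet is C^2 or better on its domain, so you may freely swap the order of partials. The counterexample is a reminder that the continuity hypothesis is doing real work, not a piece of pedantry.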
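The failure can be seen numerically on the usual textbook counterexample, defined in the code below. This is a rough sketch, not a proof: the step sizes INNER and OUTER are assumptions chosen so that the inner difference approximates a first partial much more finely than the outer difference samples it, which mimics taking the limits one after the other.

```python
def f(x, y):
    # Usual textbook counterexample, extended by f(0, 0) = 0.
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y * (x * x - y * y) / (x * x + y * y)

INNER = 1e-8   # step for the first differentiation (assumed, much smaller)
OUTER = 1e-3   # step for the second differentiation (assumed)

def f_x(x, y):
    return (f(x + INNER, y) - f(x - INNER, y)) / (2 * INNER)

def f_y(x, y):
    return (f(x, y + INNER) - f(x, y - INNER)) / (2 * INNER)

# Differentiate in x first, then in y, at the origin.
dy_of_fx = (f_x(0.0, OUTER) - f_x(0.0, -OUTER)) / (2 * OUTER)

# Differentiate in y first, then in x, at the origin.
dx_of_fy = (f_y(OUTER, 0.0) - f_y(-OUTER, 0.0)) / (2 * OUTER)

print("d/dy d/dx f(0,0) is approximately", dy_of_fx)
print("d/dx d/dy f(0,0) is approximately", dx_of_fy)
```

The first value comes out close to -1 and the second close to +1. If the two step sizes were equal, both computations would reduce to the same symmetric four-point stencil and the asymmetry would be hidden, which is why the sketch keeps them separate.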