The chain rule is matrix multiplication
Once the derivative is a linear map, the multivariable chain rule becomes beautifully simple: the derivative of a composition is the composition of the derivatives. In matrix terms, the Jacobian of g∘f is the product of the Jacobians. This is one of the main reasons for defining the total derivative as a linear map in the first place.
Chain rule: if f is differentiable at a and g is differentiable at f(a), then g∘f is differentiable at a and
    D(g∘f)(a) = Dg(f(a)) * Df(a)    (matrix product).
Entrywise, for h = g(f(x_1, ..., x_n)) with f having components u_k:
    dh/dx_j = sum_k (dg/du_k) * (du_k/dx_j),
where each dg/du_k is evaluated at f(x).
Worked example: z = g(u, v) with u = x^2 y, v = x + y. Then
    dz/dx = g_u * (2xy) + g_v * (1)
    dz/dy = g_u * (x^2) + g_v * (1).
As matrices, Df = [ 2xy, x^2 ; 1, 1 ] and Dg = [ g_u, g_v ]; the row-times-matrix product Dg * Df reproduces both lines above.
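A quick numerical sanity check of the worked example can make the matrix product concrete. The sketch below keeps u = x^2 y and v = x + y from the text. The text leaves g unspecified, so the code picks a hypothetical g(u, v) = sin(u) + u v^2 purely for illustration; the evaluation point (1.5, -0.5) is also arbitrary. It forms Dg * Df by hand and compares the result with central finite differences of h = g(f(x, y)).

```python
import math

def f(x, y):
    """Inner map f(x, y) = (u, v) from the worked example."""
    return (x * x * y, x + y)

def g(u, v):
    """Hypothetical outer function, chosen only for illustration."""
    return math.sin(u) + u * v * v

def jacobian_f(x, y):
    """Df as a 2x2 matrix: rows are u, v; columns are x, y."""
    return [[2 * x * y, x * x],
            [1.0, 1.0]]

def jacobian_g(u, v):
    """Dg as a 1x2 row [g_u, g_v] for the hypothetical g above."""
    return [math.cos(u) + v * v, 2 * u * v]

def chain_rule_gradient(x, y):
    """Row-times-matrix product Dg(f(x, y)) * Df(x, y)."""
    u, v = f(x, y)
    dg = jacobian_g(u, v)
    df = jacobian_f(x, y)
    return [sum(dg[k] * df[k][j] for k in range(2)) for j in range(2)]

def finite_difference_gradient(x, y, eps=1e-6):
    """Central differences of h = g(f(x, y)), as an independent check."""
    def h(a, b):
        return g(*f(a, b))
    dh_dx = (h(x + eps, y) - h(x - eps, y)) / (2 * eps)
    dh_dy = (h(x, y + eps) - h(x, y - eps)) / (2 * eps)
    return [dh_dx, dh_dy]

analytic = chain_rule_gradient(1.5, -0.5)
numeric = finite_difference_gradient(1.5, -0.5)
print("chain rule:        ", analytic)
print("finite differences:", numeric)
```

The two printed rows should agree to within the finite-difference error. Note that Dg is evaluated at f(x, y), not at (x, y), which is the step most often missed when applying the formula by hand.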
Higher derivatives and mixed partials
Differentiate a partial derivative again and you get a second-order derivative, the first of the higher-order derivatives. The interesting ones are the mixed partial derivatives, where you differentiate with respect to different variables. Collecting all second partials into a matrix gives the Hessian, the second-order analogue of the gradient. A natural worry: does the order of differentiation matter? Is d/dx d/dy f the same as d/dy d/dx f?
Usually it does not. Clairaut's theorem (also called Schwarz's theorem) says: if the mixed partials d/dx d/dy f and d/dy d/dx f exist and are continuous on a neighborhood of a, they are equal at a. This is why the Hessian of any C^2 function is symmetric.
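The symmetry can be checked numerically on a concrete function. The sketch below uses a hypothetical smooth test function f(x, y) = exp(xy) + x^3 sin(y), which is not from the text, with its first partials derived by hand. Each first partial is then differentiated again by central differences to build the 2x2 Hessian. Differentiating hand-derived first partials, rather than nesting two finite-difference stencils on f, keeps the two mixed partials genuinely independent computations.

```python
import math

def f_x(x, y):
    """Hand-derived df/dx for the hypothetical f = exp(x*y) + x**3 * sin(y)."""
    return y * math.exp(x * y) + 3 * x ** 2 * math.sin(y)

def f_y(x, y):
    """Hand-derived df/dy for the same f."""
    return x * math.exp(x * y) + x ** 3 * math.cos(y)

def d_dx(func, x, y, eps=1e-5):
    """Central-difference partial of func with respect to x."""
    return (func(x + eps, y) - func(x - eps, y)) / (2 * eps)

def d_dy(func, x, y, eps=1e-5):
    """Central-difference partial of func with respect to y."""
    return (func(x, y + eps) - func(x, y - eps)) / (2 * eps)

def hessian(x, y):
    """Row 0 differentiates f_x, row 1 differentiates f_y; columns are d/dx, d/dy."""
    return [[d_dx(f_x, x, y), d_dy(f_x, x, y)],
            [d_dx(f_y, x, y), d_dy(f_y, x, y)]]

H = hessian(0.7, -1.2)
mixed_gap = abs(H[0][1] - H[1][0])  # |d/dy f_x - d/dx f_y|
print("Hessian:", H)
print("gap between mixed partials:", mixed_gap)
```

For this f the two off-diagonal entries should differ only by finite-difference noise, as Clairaut's theorem predicts for a C^2 function.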
When the order does matter
In practice, nearly every function you meet is C^2 or better, so you may freely swap the order of partials. Counterexamples do exist when the mixed partials fail to be continuous at a point; they are a reminder that the hypothesis is doing real work, not a piece of pedantry.
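The text does not say which counterexample it has in mind, so the sketch below assumes one standard choice: f(x, y) = x y (x^2 - y^2) / (x^2 + y^2) with f(0, 0) = 0. Working from the limit definition, f_x(0, y) = -y and f_y(x, 0) = x, so the two mixed partials at the origin are -1 and +1. The code approximates them with nested central differences. The step sizes INNER and OUTER are illustrative choices; the inner step has to be much smaller than the outer one, otherwise the symmetric stencil hides the asymmetry.

```python
def f(x, y):
    """Assumed standard counterexample, extended by f(0, 0) = 0."""
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y * (x * x - y * y) / (x * x + y * y)

INNER = 1e-8  # step for the first partial (illustrative choice)
OUTER = 1e-3  # step for the second partial; must be much larger than INNER

def f_x(x, y):
    """Central-difference approximation of df/dx."""
    return (f(x + INNER, y) - f(x - INNER, y)) / (2 * INNER)

def f_y(x, y):
    """Central-difference approximation of df/dy."""
    return (f(x, y + INNER) - f(x, y - INNER)) / (2 * INNER)

# d/dy of f_x at the origin: differentiate in x first, then in y.
dy_fx = (f_x(0.0, OUTER) - f_x(0.0, -OUTER)) / (2 * OUTER)

# d/dx of f_y at the origin: differentiate in y first, then in x.
dx_fy = (f_y(OUTER, 0.0) - f_y(-OUTER, 0.0)) / (2 * OUTER)

print("d/dy f_x at origin:", dy_fx)
print("d/dx f_y at origin:", dx_fy)
```

The two printed values should come out near -1 and +1 respectively. This f is C^1 everywhere, but its mixed partials are not continuous at the origin, which is exactly the hypothesis of Clairaut's theorem that fails.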