The total derivative as a linear map

Defining differentiability correctly

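On the line, f is differentiable at a if and only if f(a + h) = f(a) + f'(a) h + o(h). Read that again: the increment f'(a) h is linear in h, and the remainder is small relative to h. We carry this over verbatim. The total derivative of f at a is a linear map L: R^n -> R^m for which the error term is negligible relative to |h|.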

Definition ([[differentiability-in-rn|differentiability in R^n]]).

f is differentiable at a if there is a linear map L with

   lim_{h -> 0}  | f(a + h) - f(a) - L(h) |  /  |h|  =  0.

L is unique; we write L = Df(a), the total derivative.

Key consequences when f is differentiable at a:

  (1) every partial exists, and the matrix of L has entries D_j f_i(a);
  (2) every directional derivative exists, with  D_v f(a) = L(v);
  (3) f is continuous at a.

Contrast with anl-mul-1: partials existing does NOT give (3),
but a single linear approximation valid in EVERY direction does.

The o(|h|) definition forces the approximation to hold in all directions simultaneously.
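A minimal numerical sketch of why the limit must hold along every direction at once. The test function f(x, y) = x y / |(x, y)| with f(0, 0) = 0 is an illustrative choice, not taken from these notes. It vanishes on both axes, so both partials at the origin are 0 and the only candidate for L is the zero map. The ratio from the definition is 0 along the axes but stays at 1/2 along the diagonal, so no linear map works and f is not differentiable at the origin.

```python
import math

def f(x, y):
    # Illustrative function (an assumption, not from the notes):
    # zero on both axes, so both partials at (0, 0) are 0.
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y / math.hypot(x, y)

def ratio(direction, t):
    """|f(0 + h) - f(0) - L(h)| / |h| with h = t * direction and candidate L = 0."""
    h1, h2 = t * direction[0], t * direction[1]
    return abs(f(h1, h2) - f(0.0, 0.0) - 0.0) / math.hypot(h1, h2)

for t in (1e-1, 1e-3, 1e-6):
    along_axis = ratio((1.0, 0.0), t)      # stays at 0
    along_diagonal = ratio((1.0, 1.0), t)  # stays near 1/2, does not shrink
    print(t, along_axis, along_diagonal)
```

Checking the ratio only along the coordinate axes would wrongly suggest that L = 0 works; the diagonal shows that it does not.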

Gradient and Jacobian matrix

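When the target space is R (that is, m = 1), the linear map L(h) is a dot product g . h, and the vector g is the gradient grad f(a) = (D_1 f(a), ..., D_n f(a)). Hence D_v f(a) = grad f(a) . v, which makes the gradient the direction of steepest ascent: among unit vectors v, the dot product is largest when v points along grad f(a), provided grad f(a) is nonzero. When the target is R^m, stacking the gradients of the m components f_1, ..., f_m as rows gives the Jacobian matrix J, the matrix of the total derivative.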
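The relation D_v f(a) = J v can be checked numerically. The sketch below is only an illustration: the map F(x, y) = (x^2 + y^2, x y), the point a = (1, 2), the direction v = (3, -1), and the helper names jacobian, apply and directional are all assumptions made for this example. It approximates J by central differences, one column per input variable, and compares J v with a directly computed directional derivative.

```python
def F(x, y):
    # Hypothetical map R^2 -> R^2 chosen for illustration.
    return (x * x + y * y, x * y)

def jacobian(func, point, eps=1e-6):
    """Central-difference approximation; row i holds the gradient of component i."""
    n = len(point)
    m = len(func(*point))
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        plus = list(point)
        minus = list(point)
        plus[j] += eps
        minus[j] -= eps
        f_plus, f_minus = func(*plus), func(*minus)
        for i in range(m):
            J[i][j] = (f_plus[i] - f_minus[i]) / (2 * eps)
    return J

def apply(J, v):
    """Matrix-vector product J v."""
    return [sum(row[j] * v[j] for j in range(len(v))) for row in J]

def directional(func, point, v, t=1e-6):
    """Central-difference approximation of D_v func(point)."""
    plus = [p + t * vj for p, vj in zip(point, v)]
    minus = [p - t * vj for p, vj in zip(point, v)]
    f_plus, f_minus = func(*plus), func(*minus)
    return [(p - q) / (2 * t) for p, q in zip(f_plus, f_minus)]

a = (1.0, 2.0)
v = (3.0, -1.0)
J = jacobian(F, a)
print(J)                     # close to [[2, 4], [2, 1]]
print(apply(J, v))           # close to [2, 5]
print(directional(F, a, v))  # close to [2, 5]
```

By hand, the rows of J are the gradients (2x, 2y) = (2, 4) and (y, x) = (2, 1), so J v = (2*3 - 4, 2*3 - 1) = (2, 5).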

Verify f(x, y) = x^2 + y^2 is differentiable at a = (1, 2).

Partials at a: D_1 f = 2x = 2,  D_2 f = 2y = 4, so guess L(h) = (2, 4) . h.

Let h = (h1, h2). Compute the remainder R(h):

  f(a+h) = (1+h1)^2 + (2+h2)^2
         = 1 + 2h1 + h1^2 + 4 + 4h2 + h2^2
         = f(a) + (2 h1 + 4 h2) + (h1^2 + h2^2)
         = f(a) + L(h) + |h|^2,   so R(h) = |h|^2.

So  |R(h)| / |h| = |h|^2 / |h| = |h| -> 0  as h -> 0.   QED.

Gradient: grad f(1,2) = (2, 4). Steepest ascent points that way.

The limit-zero condition is verified directly; the remainder is |h|^2.
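The same computation can be spot-checked numerically. This is a small sketch, not a proof; the helper name remainder_ratio is made up for the example. It evaluates |f(a + h) - f(a) - L(h)| / |h| for f(x, y) = x^2 + y^2 at a = (1, 2) with L(h) = (2, 4) . h, and the result should track |h| up to rounding.

```python
import math

def f(x, y):
    return x * x + y * y

a = (1.0, 2.0)
grad = (2.0, 4.0)  # gradient of f at a, from the partials computed above

def remainder_ratio(h1, h2):
    """|f(a + h) - f(a) - grad . h| / |h|; for this f it equals |h| exactly."""
    remainder = f(a[0] + h1, a[1] + h2) - f(*a) - (grad[0] * h1 + grad[1] * h2)
    return abs(remainder) / math.hypot(h1, h2)

# Shrink h along a fixed direction and compare the ratio with |h|.
for scale in (1.0, 0.1, 0.01, 0.001):
    h1, h2 = 0.3 * scale, -0.4 * scale
    print(math.hypot(h1, h2), remainder_ratio(h1, h2))
```

For very small h the subtraction f(a + h) - f(a) loses digits to floating-point cancellation, so the printed ratio matches |h| only approximately there.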

A clean sufficient condition

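Mind the converses: differentiability does not require continuous partials, and the existence of partials does not give differentiability. The chain of strength is: C^1 ⟹ differentiable ⟹ partials exist and f is continuous. The first arrow is the sufficient condition: if the partials exist near a and are continuous at a, then f is differentiable at a. Neither arrow can be reversed.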
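A sketch of why the first arrow cannot be reversed, using the standard one-variable example g(x) = x^2 sin(1/x) with g(0) = 0 (here n = m = 1, so the only partial is the ordinary derivative; this example is an addition, not taken from the notes). The difference quotient at 0 is bounded by |h|, so g'(0) = 0 and g is differentiable there. For x != 0 the product and chain rules give g'(x) = 2x sin(1/x) - cos(1/x), which is close to -1 at the points x = 1/(2 pi k), so g' is not continuous at 0.

```python
import math

def g(x):
    # Standard example: differentiable everywhere, derivative discontinuous at 0.
    return 0.0 if x == 0.0 else x * x * math.sin(1.0 / x)

def g_prime(x):
    # Derivative for x != 0 only, from the product and chain rules.
    return 2.0 * x * math.sin(1.0 / x) - math.cos(1.0 / x)

# Difference quotient at 0: |g(h) - g(0)| / |h| <= |h|, so it tends to 0.
for h in (1e-2, 1e-4, 1e-6):
    print(h, abs(g(h) - g(0.0)) / h)

# Along x_k = 1 / (2 pi k) -> 0 the derivative stays near -1, not near g'(0) = 0.
for k in (1, 10, 100):
    x = 1.0 / (2.0 * math.pi * k)
    print(x, g_prime(x))
```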