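Defining differentiability correctly
On the line, f is differentiable at a if and only if f(a + h) = f(a) + f'(a) h + o(h) as h -> 0. Read that again: the increment f'(a) h is linear in h, and the remainder is small relative to h. We carry this over verbatim. The total derivative of f at a is a linear map L: R^n -> R^m that makes the error term negligible relative to |h|.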
Definition ([[differentiability-in-rn|differentiability in R^n]]).
f is differentiable at a if there is a linear map L with
lim_{h -> 0} | f(a + h) - f(a) - L(h) | / |h| = 0.
L is unique; we write L = Df(a), the total derivative.
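The quotient in the definition can be probed numerically. The sketch below is an illustration, not a proof: the sample function f(x, y) = x*y, the point a = (1, 2), and the helper name `remainder_ratio` are assumptions made for this note, and the candidate map L is built from the partials of that sample function.

```python
import math

def f(x, y):
    # Sample function chosen for illustration (not from the text).
    return x * y

def L(h1, h2):
    # Candidate linear map at a = (1, 2): D_1 f = y = 2, D_2 f = x = 1.
    return 2.0 * h1 + 1.0 * h2

def remainder_ratio(h1, h2, a=(1.0, 2.0)):
    # |f(a + h) - f(a) - L(h)| / |h|, the quotient in the definition.
    ax, ay = a
    remainder = f(ax + h1, ay + h2) - f(ax, ay) - L(h1, h2)
    return abs(remainder) / math.hypot(h1, h2)

# Shrink h along a few fixed directions; the ratio should shrink with |h|.
for direction in [(1.0, 0.0), (1.0, 1.0), (-2.0, 3.0)]:
    ratios = [remainder_ratio(t * direction[0], t * direction[1])
              for t in (1e-1, 1e-2, 1e-3)]
    print(direction, ratios)
```

Sampling finitely many directions can only suggest the limit; the definition requires the quotient to vanish as h -> 0 without any restriction on how h approaches 0.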
Key consequences when f is differentiable at a:
(1) every partial exists, and the matrix of L has entries D_j f_i(a);
(2) every directional derivative equals D_v f(a) = L(v);
(3) f is continuous at a.
Contrast with anl-mul-1: partials existing does NOT give (3),
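but a single linear approximation valid in EVERY direction does.
Gradient and the Jacobian matrix
When the target space is R (that is, m = 1), the linear map L(h) is a dot product g . h, and the vector g is the gradient grad f(a) = (D_1 f(a), ..., D_n f(a)). Hence D_v f(a) = grad f(a) . v. By the Cauchy-Schwarz inequality, among unit vectors v this value is largest when v points along grad f(a), so a nonzero gradient gives the direction of steepest ascent. When the target is R^m, stacking the gradients of the m components f_1, ..., f_m as rows gives the Jacobian matrix J, which is the matrix of the total derivative.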
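As a rough numerical companion, the following sketch estimates a Jacobian by central differences and applies it to a vector. The map F, the point a, the vector v, and the helper names `jacobian_fd` and `apply_linear` are all hypothetical choices for illustration; finite differences only approximate the partials and say nothing about whether the map is actually differentiable.

```python
def F(x, y):
    # Sample map R^2 -> R^2 chosen for illustration (not from the text).
    return (x * y, x + y * y)

def jacobian_fd(func, a, eps=1e-6):
    # Central-difference estimate of the Jacobian: entry [i][j] ~ D_j f_i(a).
    n = len(a)
    m = len(func(*a))
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        plus, minus = list(a), list(a)
        plus[j] += eps
        minus[j] -= eps
        fp, fm = func(*plus), func(*minus)
        for i in range(m):
            J[i][j] = (fp[i] - fm[i]) / (2 * eps)
    return J

def apply_linear(J, v):
    # Matrix-vector product J v, i.e. the linear map applied to v.
    return [sum(row[j] * v[j] for j in range(len(v))) for row in J]

a = (1.0, 2.0)
J = jacobian_fd(F, a)        # rows approximate grad f_1(a) and grad f_2(a)
v = (3.0, -1.0)
print(J)                     # close to [[2, 1], [1, 4]]
print(apply_linear(J, v))    # close to [5, -1], the value of D_v F(a)
```

Each row of J is the estimated gradient of one component, matching the row-stacking description above.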
Verify f(x, y) = x^2 + y^2 is differentiable at a = (1, 2).
Partials at a: D_1 f = 2x = 2, D_2 f = 2y = 4, so guess L(h) = (2, 4) . h.
Let h = (h1, h2). Compute the remainder R(h):
f(a+h) = (1+h1)^2 + (2+h2)^2
= 1 + 2h1 + h1^2 + 4 + 4h2 + h2^2
= f(a) + (2 h1 + 4 h2) + (h1^2 + h2^2)
= f(a) + L(h) + |h|^2.
So |R(h)| / |h| = |h|^2 / |h| = |h| -> 0 as h -> 0. QED.
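Gradient: grad f(1,2) = (2, 4). Steepest ascent points that way.
A clean sufficient condition
Mind the converses: differentiability does not require continuous partials, and the mere existence of partials does not give differentiability. The chain of strength is: C^1 (all partials exist and are continuous) ⟹ differentiable ⟹ partials exist and f is continuous. Neither arrow can be reversed.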
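To see why the second arrow does not reverse, one standard counterexample (not taken from the text above) is g(x, y) = xy / sqrt(x^2 + y^2) with g(0, 0) = 0. It is continuous at the origin and both partials there are 0, so the only candidate for the total derivative is L = 0. The sketch below, with the hypothetical helper `ratio_at_origin`, evaluates the quotient from the definition along two directions.

```python
import math

def g(x, y):
    # Continuous at the origin, both partials exist there (and equal 0),
    # yet g is not differentiable at the origin.
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y / math.hypot(x, y)

def ratio_at_origin(h1, h2):
    # With L = 0 forced by the partials, the quotient in the definition
    # reduces to |g(h)| / |h|.
    return abs(g(h1, h2)) / math.hypot(h1, h2)

axis = [ratio_at_origin(t, 0.0) for t in (1e-1, 1e-3, 1e-6)]
diagonal = [ratio_at_origin(t, t) for t in (1e-1, 1e-3, 1e-6)]
print(axis)      # zero along the x-axis
print(diagonal)  # stays near 1/2 along h = (t, t), so the limit is not 0
```

For the first arrow, the usual one-variable example is x^2 sin(1/x), extended by 0 at x = 0: it is differentiable at 0, but its derivative is not continuous there.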