Defining differentiability correctly
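On the line, f is differentiable at a if and only if there is a number f'(a) with f(a + h) = f(a) + f'(a) h + o(h). Read that again: the increment f'(a) h is linear in h, and the remainder is small relative to h. We carry this over unchanged. The total derivative of f at a is a linear map L: R^n -> R^m such that the error term is negligible relative to |h|.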
Definition ([[differentiability-in-rn|differentiability in R^n]]).
f is differentiable at a if there is a linear map L with
lim_{h -> 0} | f(a + h) - f(a) - L(h) | / |h| = 0.
L is unique; we write L = Df(a), the total derivative.
Key consequences when f is differentiable at a:
(1) every partial exists, and the matrix of L has entries D_j f_i(a);
(2) every directional derivative equals D_v f(a) = L(v);
(3) f is continuous at a.
Contrast with anl-mul-1: partials existing does NOT give (3),
but a single linear approximation valid in EVERY direction does.
The gradient and the Jacobian matrix
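When the target space is R (that is, m = 1), the linear map L(h) is a dot product g . h, and the vector g is the gradient grad f(a) = (D_1 f(a), ..., D_n f(a)). Hence D_v f(a) = grad f(a) . v. This makes the gradient the direction of steepest ascent: among unit vectors v, grad f(a) . v is largest when v points along grad f(a), provided grad f(a) is nonzero. When the target is R^m, stacking the gradients of the m components f_1, ..., f_m as rows gives the Jacobian matrix J, which is the matrix of the total derivative.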
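The identity D_v f(a) = grad f(a) . v can be checked numerically. The sketch below is only an illustration: the function f, the point a, the direction v, and the helper names are assumptions chosen for the example, and central differences stand in for the exact derivatives.

```python
import math

def f(x, y):
    # Illustrative scalar field (an assumption, not taken from the notes).
    return x * x * y + math.sin(y)

def numerical_gradient(func, a, eps=1e-6):
    """Approximate grad func(a) by central differences, one coordinate at a time."""
    grad = []
    for j in range(len(a)):
        plus = list(a)
        minus = list(a)
        plus[j] += eps
        minus[j] -= eps
        grad.append((func(*plus) - func(*minus)) / (2 * eps))
    return grad

def directional_derivative(func, a, v, eps=1e-6):
    """Approximate D_v func(a), the derivative of t -> func(a + t v) at t = 0."""
    plus = [ai + eps * vi for ai, vi in zip(a, v)]
    minus = [ai - eps * vi for ai, vi in zip(a, v)]
    return (func(*plus) - func(*minus)) / (2 * eps)

a = (1.0, 2.0)
v = (3.0, -1.0)
grad = numerical_gradient(f, a)
dot = sum(g_j * v_j for g_j, v_j in zip(grad, v))

# The two printed numbers should agree up to finite-difference error.
print("grad f(a) . v =", dot)
print("D_v f(a)      =", directional_derivative(f, a, v))
```

For this f the exact gradient is (2xy, x^2 + cos y), so the first component at a = (1, 2) should come out close to 4.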
Verify f(x, y) = x^2 + y^2 is differentiable at a = (1, 2).
Partials at a: D_1 f(a) = 2x = 2, D_2 f(a) = 2y = 4, so guess L(h) = (2, 4) . h.
Let h = (h1, h2). Compute the remainder R(h):
f(a+h) = (1+h1)^2 + (2+h2)^2
= 1 + 2h1 + h1^2 + 4 + 4h2 + h2^2
= f(a) + (2 h1 + 4 h2) + (h1^2 + h2^2)
= f(a) + L(h) + |h|^2.
So |R(h)| / |h| = |h|^2 / |h| = |h| -> 0 as h -> 0. QED.
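As a sanity check on the computation above, the ratio |R(h)| / |h| can be evaluated for shrinking increments. This is a minimal sketch: the helper names and the three sample directions are assumptions made for illustration, and a numerical check of this kind supports the algebra but does not replace it.

```python
import math

def f(x, y):
    return x * x + y * y

def L(h1, h2):
    # Candidate linear map built from the partials at a = (1, 2).
    return 2 * h1 + 4 * h2

def remainder_ratio(h1, h2, a=(1.0, 2.0)):
    """Return |f(a + h) - f(a) - L(h)| / |h| for a nonzero increment h."""
    norm = math.hypot(h1, h2)
    remainder = f(a[0] + h1, a[1] + h2) - f(*a) - L(h1, h2)
    return abs(remainder) / norm

# Shrink h along a few unit directions; the ratio should track |h| itself.
for k in range(1, 6):
    t = 10.0 ** (-k)
    for direction in [(1.0, 0.0), (0.0, 1.0), (0.6, -0.8)]:
        h = (t * direction[0], t * direction[1])
        print(f"|h| = {t:.0e}, direction = {direction}, ratio = {remainder_ratio(*h):.3e}")
```

Because R(h) = |h|^2 exactly, each printed ratio should be close to |h|, independent of the direction.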
Gradient: grad f(1,2) = (2, 4). Steepest ascent points that way.
A clean sufficient condition
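Note the converses: differentiability does not require the partial derivatives to be continuous, and existence of the partial derivatives does not give differentiability. The chain of strengths is: C^1 ⟹ differentiable ⟹ partials exist and f is continuous. Neither arrow can be reversed.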
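One way to see that existence of the partials alone does not give differentiability is a standard counterexample, which is an assumption of this sketch rather than something stated in the notes: xy / (x^2 + y^2), extended by 0 at the origin. Both partials exist at the origin, yet the function is not even continuous there, so it cannot be differentiable. The function and helper names below are hypothetical.

```python
def counterexample(x, y):
    """xy / (x^2 + y^2) away from the origin, and 0 at the origin."""
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y / (x * x + y * y)

def partial_quotients_at_origin(t):
    """Difference quotients for D_1 and D_2 at (0, 0) with step t."""
    base = counterexample(0.0, 0.0)
    d1 = (counterexample(t, 0.0) - base) / t
    d2 = (counterexample(0.0, t) - base) / t
    return d1, d2

for k in range(1, 6):
    t = 10.0 ** (-k)
    # The function vanishes on both axes, so both quotients are 0,
    # but along the diagonal it stays at 1/2 however small t is.
    print(t, partial_quotients_at_origin(t), counterexample(t, t))
```

The quotients show both partials at the origin equal to 0, while the values along the diagonal do not approach counterexample(0, 0) = 0. This illustrates only the failure of "partials exist ⟹ differentiable"; it says nothing about the first arrow.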