The total derivative as a linear map

Defining differentiability properly

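On the real line, f is differentiable at a if and only if f(a + h) = f(a) + f'(a) h + o(h). Read that again: the increment f'(a) h is linear in h, and the remainder is small compared with h. We copy this verbatim. The total derivative of f at a is a linear map L: R^n -> R^m for which the error term is negligible compared with |h|.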

Definition ([[differentiability-in-rn|differentiability in R^n]]).

f is differentiable at a if there is a linear map L with

   lim_{h -> 0}  | f(a + h) - f(a) - L(h) |  /  |h|  =  0.

L is unique; we write L = Df(a), the total derivative.
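The definition can be probed numerically. The sketch below is an illustration under stated assumptions, not a proof. It evaluates the ratio |f(a + h) - f(a) - L(h)| / |h| along a single sequence h -> 0, whereas the definition requires the limit over every way h can approach 0. The map f(x, y) = (xy, x + y^2), the point a = (1, 1), and the helper names `norm` and `remainder_ratio` are all hypothetical. L is the linear map with matrix [[1, 1], [1, 2]], worked out by hand from the partials at a. Along h = (t, t) the remainder is (t^2, t^2), so the ratio should be about t.

```python
import math
from typing import Callable, List, Sequence

Vec = Sequence[float]

def norm(v: Vec) -> float:
    """Euclidean norm |v|."""
    return math.sqrt(sum(x * x for x in v))

def remainder_ratio(f: Callable[[Vec], List[float]],
                    a: Vec,
                    L: Callable[[Vec], List[float]],
                    h: Vec) -> float:
    """Return |f(a + h) - f(a) - L(h)| / |h| for one nonzero increment h."""
    a_plus_h = [ai + hi for ai, hi in zip(a, h)]
    r = [y1 - y0 - l for y1, y0, l in zip(f(a_plus_h), f(a), L(h))]
    return norm(r) / norm(h)

# Hypothetical example map f: R^2 -> R^2, f(x, y) = (x*y, x + y^2).
def f(p: Vec) -> List[float]:
    x, y = p
    return [x * y, x + y ** 2]

# Candidate derivative at a = (1, 1): matrix [[1, 1], [1, 2]] from the partials.
def L(h: Vec) -> List[float]:
    h1, h2 = h
    return [h1 + h2, h1 + 2 * h2]

a = (1.0, 1.0)
for t in (1e-1, 1e-2, 1e-3, 1e-4):
    # Along h = (t, t) the exact ratio is t, so the printed values shrink to 0.
    print(t, remainder_ratio(f, a, L, (t, t)))
```

A shrinking ratio along one path is only evidence. The proof still has to bound the remainder for all small h, as the worked example in this section does.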

Key consequences when f is differentiable at a:

  (1) every partial D_j f_i(a) exists, and the matrix of L has D_j f_i(a) in row i, column j;
  (2) every directional derivative exists and equals  D_v f(a) = L(v);
  (3) f is continuous at a.
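Consequences (1) and (2) can be checked numerically. The sketch below approximates each partial D_j f_i(a) with a central difference quotient and assembles the results into the matrix of L. It then compares the directional difference quotient with L(v) = Jv. Finite differences only approximate derivatives, with an error that depends on the assumed step `eps`. The map f(x, y) = (xy, x + y^2), the point a = (1, 2), the direction v = (3, -1), and the helper names are hypothetical choices. By hand, J = [[y, x], [1, 2y]] = [[2, 1], [1, 4]] at a.

```python
def jacobian_fd(f, a, eps=1e-6):
    """Approximate the matrix with entries D_j f_i(a) by central differences."""
    n = len(a)
    cols = []
    for j in range(n):
        # Perturb only coordinate j to approximate the j-th partial of every component.
        e = [eps if k == j else 0.0 for k in range(n)]
        fp = f([ai + ei for ai, ei in zip(a, e)])
        fm = f([ai - ei for ai, ei in zip(a, e)])
        cols.append([(p - q) / (2 * eps) for p, q in zip(fp, fm)])
    # cols[j][i] approximates D_j f_i(a); transpose so row i, column j holds it.
    return [list(row) for row in zip(*cols)]

def directional_fd(f, a, v, eps=1e-6):
    """Approximate the directional derivative D_v f(a) by a central difference."""
    fp = f([ai + eps * vi for ai, vi in zip(a, v)])
    fm = f([ai - eps * vi for ai, vi in zip(a, v)])
    return [(p - q) / (2 * eps) for p, q in zip(fp, fm)]

def matvec(J, v):
    """Apply the matrix J to the vector v, i.e. evaluate L(v)."""
    return [sum(Jij * vj for Jij, vj in zip(row, v)) for row in J]

# Hypothetical example: f(x, y) = (x*y, x + y**2), differentiable everywhere.
def f(p):
    x, y = p
    return [x * y, x + y ** 2]

a = [1.0, 2.0]
v = [3.0, -1.0]
J = jacobian_fd(f, a)
print(J)                        # approximately [[2, 1], [1, 4]]
print(directional_fd(f, a, v))  # approximately [5, -1]
print(matvec(J, v))             # approximately [5, -1], matching (2)
```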

Contrast with anl-mul-1: the existence of partials does NOT give (3),
but a single linear approximation valid in EVERY direction does.

The o(|h|) definition forces the approximation to hold in all directions at once.
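The gap between "partials exist" and "continuous" shows up in a standard example. It is assumed here and may differ from the one in anl-mul-1: g(x, y) = xy / (x^2 + y^2), with g(0, 0) = 0. Both partials at the origin are 0 because g vanishes on both axes. Yet g(t, t) = 1/2 for every nonzero t, so g is not continuous at the origin. The sketch below evaluates both facts in floating point. It illustrates the claim rather than proving it, and the helper `partial_at_origin` is hypothetical.

```python
def g(x, y):
    # Classic example: x*y / (x^2 + y^2) away from the origin, 0 at the origin.
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y / (x * x + y * y)

def partial_at_origin(j, t):
    # Difference quotient for D_j g(0, 0); g is 0 on both axes, so it is exactly 0.
    if j == 1:
        return (g(t, 0.0) - g(0.0, 0.0)) / t
    return (g(0.0, t) - g(0.0, 0.0)) / t

for t in (1e-1, 1e-3, 1e-6):
    print(t, partial_at_origin(1, t), partial_at_origin(2, t), g(t, t))
# The partial quotients stay 0, but g(t, t) = 1/2 for every t != 0,
# so g(t, t) does not tend to g(0, 0) = 0 as t -> 0.
```

The quotient (g(t, t) - g(0, 0)) / t = 1/(2t) also blows up. So the directional derivative along (1, 1) does not exist, which is consequence (2) failing as well.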

Gradient and Jacobian matrix

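When the target space is R (that is, m = 1), the linear map L(h) is a dot product g . h, and the vector g is the gradient grad f(a) = (D_1 f(a), ..., D_n f(a)). Hence D_v f(a) = grad f(a) . v. By Cauchy-Schwarz, grad f(a) . v <= |grad f(a)| |v|, with equality exactly when v is a nonnegative multiple of grad f(a). So among unit vectors v, the rate of increase is largest in the direction of grad f(a) (when it is nonzero), which makes the gradient the direction of steepest ascent. When the target is R^m, stacking the gradients of the m components f_1, ..., f_m as rows gives the m x n Jacobian matrix J, the matrix of the total derivative.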
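A brute-force scan makes the steepest-ascent claim concrete. The sketch below is a hypothetical illustration. It takes the hand-computed gradient (2, 4) of x^2 + y^2 at (1, 2), the function and point of this section's worked example. It then scans 3600 unit vectors v and checks two things: the largest value of grad . v is close to |grad| = sqrt(20), and it is attained near grad / |grad|. It also assembles the Jacobian of a hypothetical F(x, y) = (x^2 + y^2, xy) by stacking the component gradients as rows. The function names and the sample count are assumptions.

```python
import math

def grad_sq_norm(x, y):
    # Gradient of f1(x, y) = x^2 + y^2, computed by hand: (2x, 2y).
    return (2 * x, 2 * y)

def grad_xy(x, y):
    # Gradient of f2(x, y) = x*y, computed by hand: (y, x).
    return (y, x)

def jacobian(x, y):
    # Rows are the gradients of the components of F = (x^2 + y^2, x*y).
    return [list(grad_sq_norm(x, y)), list(grad_xy(x, y))]

def best_direction(g, samples=3600):
    # Scan unit vectors v = (cos t, sin t) and return the one maximising g . v.
    best_v, best_val = None, -math.inf
    for k in range(samples):
        t = 2 * math.pi * k / samples
        v = (math.cos(t), math.sin(t))
        val = g[0] * v[0] + g[1] * v[1]
        if val > best_val:
            best_v, best_val = v, val
    return best_v, best_val

g = grad_sq_norm(1.0, 2.0)                 # (2.0, 4.0)
v, val = best_direction(g)
norm_g = math.hypot(*g)
print(v, (g[0] / norm_g, g[1] / norm_g))   # nearly equal directions
print(val, norm_g)                         # largest slope close to |grad f| = sqrt(20)
print(jacobian(1.0, 2.0))                  # [[2.0, 4.0], [2.0, 1.0]]
```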

Example. Verify that f(x, y) = x^2 + y^2 is differentiable at a = (1, 2).

Partials at a: D_1 f(a) = 2x = 2,  D_2 f(a) = 2y = 4, so guess L(h) = (2, 4) . h.

Let h = (h1, h2). Compute the remainder R(h) = f(a+h) - f(a) - L(h):

  f(a+h) = (1+h1)^2 + (2+h2)^2
         = 1 + 2h1 + h1^2 + 4 + 4h2 + h2^2
         = f(a) + (2 h1 + 4 h2) + (h1^2 + h2^2)
         = f(a) + L(h) + |h|^2.

So  |R(h)| / |h| = |h|^2 / |h| = |h| -> 0  as h -> 0.   QED.

Gradient: grad f(1,2) = (2, 4). Steepest ascent points that way.

We checked the limit-equals-zero condition directly; the remainder is R(h) = |h|^2.
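The algebra can be double-checked with exact rational arithmetic, which avoids floating-point noise. The sketch below is a hypothetical illustration using Python's standard `fractions` module. It computes R(h) = f(a + h) - f(a) - L(h) for f(x, y) = x^2 + y^2, a = (1, 2), and L(h) = 2 h1 + 4 h2. For increments h = (3, -4) / 10^k, it confirms that R(h) = h1^2 + h2^2 exactly. It then prints R(h) / |h|, which equals |h| = 5 / 10^k.

```python
import math
from fractions import Fraction

def f(x, y):
    return x * x + y * y

A = (Fraction(1), Fraction(2))

def L(h1, h2):
    # Candidate derivative from the partials at a = (1, 2): L(h) = (2, 4) . h
    return 2 * h1 + 4 * h2

def remainder(h1, h2):
    # R(h) = f(a + h) - f(a) - L(h), computed exactly with rationals.
    return f(A[0] + h1, A[1] + h2) - f(*A) - L(h1, h2)

for k in range(1, 6):
    h1, h2 = Fraction(3, 10 ** k), Fraction(-4, 10 ** k)  # |h| = 5 / 10^k
    R = remainder(h1, h2)
    assert R == h1 * h1 + h2 * h2          # the remainder is exactly |h|^2
    norm_h = math.hypot(float(h1), float(h2))
    print(norm_h, float(R) / norm_h)       # the ratio equals |h|, shrinking to 0
```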

A clean sufficient condition

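If every partial derivative D_j f_i exists on a neighborhood of a and is continuous at a, then f is differentiable at a; in particular, every C^1 function is differentiable. Note the converses: differentiability does not require continuous partials, and the existence of partials does not give differentiability. The chain from strongest to weakest is: C^1 ⟹ differentiable ⟹ partials exist and f is continuous. Neither arrow reverses.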
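The first arrow already fails to reverse in one variable, a special case of R^n. The standard example, assumed here, is f(x) = x^2 sin(1/x) with f(0) = 0. The difference quotient at 0 is h sin(1/h), bounded by |h|, so f'(0) = 0. Away from 0, f'(x) = 2x sin(1/x) - cos(1/x). At the points x_k = 1/(2 pi k) this equals -1, so f' does not tend to f'(0) and f is differentiable but not C^1. The sketch below evaluates both facts in floating point as an illustration, not a proof. The function names are hypothetical.

```python
import math

def f(x):
    # f(x) = x^2 sin(1/x) for x != 0, and f(0) = 0.
    return 0.0 if x == 0.0 else x * x * math.sin(1.0 / x)

def fprime(x):
    # Derivative for x != 0, by the product and chain rules.
    return 2 * x * math.sin(1.0 / x) - math.cos(1.0 / x)

# f'(0) = 0: the difference quotient h*sin(1/h) is bounded by |h|.
for h in (1e-2, 1e-4, 1e-6):
    print("difference quotient at 0:", h, (f(h) - f(0.0)) / h)

# f' is not continuous at 0: f'(x_k) = -1 at x_k = 1/(2*pi*k), up to rounding.
for k in (10, 1000, 100000):
    x = 1.0 / (2 * math.pi * k)
    print("f'(x_k) at", x, "=", fprime(x))
```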