Iterating toward a solution: from splittings to Krylov

The iterative bargain

Direct methods transform A until the answer falls out; iterative methods start from a guess x_0 and refine it, producing x_1, x_2, ... that converge to the true x. The bargain is appealing for huge sparse systems: each iteration's main cost is one matrix-vector product A v, which for a sparse A costs only O(nnz) — proportional to the number of nonzeros, not n^2. You never form L or U, never suffer fill-in. You only ever touch A through multiplication.

Stationary iterations: Jacobi and Gauss-Seidel

The oldest idea is a matrix splitting: write A = M - N where M is easy to invert, turn A x = b into M x = N x + b, and iterate x_{k+1} = M^-1 (N x_k + b). In Jacobi iteration M is just the diagonal of A, so each new component is solved using only old values. Gauss-Seidel takes M to be the lower triangle, immediately reusing freshly updated components within the same sweep — usually converging in about half as many steps.

Jacobi sweep for  A x = b  (a_ii on the diagonal):

  for i = 1..n:
      x_new[i] = ( b[i] - sum_{j != i} a[i][j] * x_old[j] ) / a[i][i]
  x_old <- x_new      # all components use OLD values

Gauss-Seidel sweep (in place):

  for i = 1..n:
      x[i] = ( b[i] - sum_{j < i} a[i][j]*x[j]      # already updated
                    - sum_{j > i} a[i][j]*x[j]       # still old
             ) / a[i][i]

Convergence guaranteed if A is strictly diagonally dominant
(|a_ii| > sum_{j != i} |a_ij| for every row).

Jacobi uses only old values per sweep; Gauss-Seidel reuses fresh ones immediately, usually halving the iterations.

The leap: Krylov subspaces

Stationary methods waste information: at step k they remember only x_k, throwing away every earlier vector. The modern insight is to keep them all. Starting from the residual r_0 = b - A x_0, repeatedly multiplying by A generates r_0, A r_0, A^2 r_0, ..., and their span is the Krylov subspace K_m = span{r_0, A r_0, ..., A^{m-1} r_0}.

Why is this the right space? Because A^-1 b, the answer you want, can be written as a polynomial in A applied to b (Cayley-Hamilton guarantees it), and that polynomial's powers of A live exactly in the Krylov subspace. So K_m contains an increasingly good approximation to the true solution, and it is built using nothing but the sparse matrix-vector products you can already afford. A Krylov method picks the best approximation inside K_m by some criterion — and the choice of criterion is precisely what distinguishes conjugate gradient from GMRES in the next guide.