CG and GMRES: choosing what 'best' means

Building an orthonormal Krylov basis

The raw Krylov vectors r_0, A r_0, A^2 r_0, ... are a numerical disaster: they all tilt toward the dominant eigenvector, so they become nearly parallel and lose linear independence fast. Before doing anything useful you must orthonormalize them. The Arnoldi iteration does exactly this — a Gram-Schmidt process applied as the basis is generated — producing an orthonormal basis of the Krylov subspace and, as a byproduct, a small Hessenberg matrix that represents A on that basis.

Conjugate gradient: symmetry as a gift

Conjugate gradient (CG) is the method of choice when A is symmetric positive definite. It reframes A x = b as minimizing the energy f(x) = (1/2) x^T A x - b^T x, whose minimizer is exactly the solution. CG descends along directions that are A-orthogonal (conjugate) to all previous ones, which is what lets each step rely on the two-term Lanczos recurrence — it stores only a handful of vectors no matter how many iterations you run.

Conjugate Gradient for symmetric positive-definite A x = b:

  x = x0;  r = b - A*x;  p = r;  rs = <r,r>
  repeat:
      Ap    = A * p                 # the one matvec per step
      alpha = rs / <p, Ap>          # exact line minimum
      x     = x + alpha * p
      r     = r - alpha * Ap
      rs_new = <r, r>
      if sqrt(rs_new) < tol: stop
      beta  = rs_new / rs           # keeps directions A-conjugate
      p     = r + beta * p
      rs    = rs_new

Memory: x, r, p, Ap  -> just 4 vectors, independent of step count.
Error bound: ||e_k||_A / ||e_0||_A <= 2 ((sqrt(K)-1)/(sqrt(K)+1))^k,
            where K = condition number.  Small K => fast.

CG: one matvec and a constant four vectors per step; convergence speed set by sqrt of the condition number.

GMRES: when symmetry is gone

Most real matrices are not symmetric — convection terms, asymmetric couplings, general operators. Then CG's short recurrence collapses and you need GMRES (Generalized Minimal RESidual). At step m it picks the x in x_0 + K_m that minimizes the residual norm ||b - A x|| over the whole Krylov subspace, solving a tiny least-squares problem on the Hessenberg matrix from Arnoldi. Its residual is monotonically non-increasing — it can never get worse.

Both methods share one Achilles heel, visible in CG's error bound: convergence speed is governed by the condition number. A well-conditioned matrix converges in a handful of steps; an ill-conditioned one can crawl for thousands. That single fact is what makes the next guide — preconditioning — not an optional refinement but the difference between a solver that finishes and one that never does.