Building an orthonormal Krylov basis
The raw Krylov vectors r_0, A r_0, A^2 r_0, ... are a numerical disaster: they all tilt toward the dominant eigenvector, so they become nearly parallel and lose linear independence fast. Before doing anything useful you must orthonormalize them. The Arnoldi iteration does exactly this — a Gram-Schmidt process applied as the basis is generated — producing an orthonormal basis of the Krylov subspace and, as a byproduct, a small Hessenberg matrix that represents A on that basis.
Conjugate gradient: symmetry as a gift
Conjugate gradient (CG) is the method of choice when A is symmetric positive definite. It reframes A x = b as minimizing the energy f(x) = (1/2) x^T A x - b^T x, whose minimizer is exactly the solution. CG descends along directions that are A-orthogonal (conjugate) to all previous ones, which is what lets each step rely on the two-term Lanczos recurrence — it stores only a handful of vectors no matter how many iterations you run.
Conjugate Gradient for symmetric positive-definite A x = b:
x = x0; r = b - A*x; p = r; rs = <r,r>
repeat:
Ap = A * p # the one matvec per step
alpha = rs / <p, Ap> # exact line minimum
x = x + alpha * p
r = r - alpha * Ap
rs_new = <r, r>
if sqrt(rs_new) < tol: stop
beta = rs_new / rs # keeps directions A-conjugate
p = r + beta * p
rs = rs_new
Memory: x, r, p, Ap -> just 4 vectors, independent of step count.
Error bound: ||e_k||_A / ||e_0||_A <= 2 ((sqrt(K)-1)/(sqrt(K)+1))^k,
where K = condition number. Small K => fast.GMRES: when symmetry is gone
Most real matrices are not symmetric — convection terms, asymmetric couplings, general operators. Then CG's short recurrence collapses and you need GMRES (Generalized Minimal RESidual). At step m it picks the x in x_0 + K_m that minimizes the residual norm ||b - A x|| over the whole Krylov subspace, solving a tiny least-squares problem on the Hessenberg matrix from Arnoldi. Its residual is monotonically non-increasing — it can never get worse.
Both methods share one Achilles heel, visible in CG's error bound: convergence speed is governed by the condition number. A well-conditioned matrix converges in a handful of steps; an ill-conditioned one can crawl for thousands. That single fact is what makes the next guide — preconditioning — not an optional refinement but the difference between a solver that finishes and one that never does.