Eckart-Young: The Best Low-Rank Approximation

A as a sum of rank-one pieces

Multiplying A = U Sigma V^T out term by term gives the outer-product form: A = sigma_1 u_1 v_1^T + sigma_2 u_2 v_2^T + ... + sigma_r u_r v_r^T. Each u_i v_i^T is a rank-one matrix, weighted by its singular value. Because the sigma_i are sorted, the first terms carry the most of A.

Keep only the first k terms and you get the truncated SVD A_k = sum_{i=1}^k sigma_i u_i v_i^T. It is a rank-k matrix — a deliberately simplified stand-in for A. The natural question: among ALL rank-k matrices, is A_k the closest one to A?

The Eckart-Young theorem

Yes — and that is the Eckart-Young theorem, one of the most useful results in all of applied mathematics. Over every matrix B of rank at most k, the truncated SVD A_k minimizes both ||A - B||_2 and ||A - B||_F. The leftover error is governed entirely by the singular values you discarded.

Eckart-Young (spectral and Frobenius forms):

   min over rank(B) <= k  of  ||A - B||_2  =  ||A - A_k||_2  =  sigma_{k+1}

   min over rank(B) <= k  of  ||A - B||_F  =  ||A - A_k||_F
                                            =  sqrt(sigma_{k+1}^2 + ... + sigma_r^2)

Spectral-norm sketch:
   - A - A_k = sum_{i>k} sigma_i u_i v_i^T, whose top singular value is sigma_{k+1},
     so ||A - A_k||_2 = sigma_{k+1}: A_k DOES this well.
   - For ANY rank-k B, the (k+1)-dim space span(v_1,...,v_{k+1}) meets null(B)
     (dimensions k+1 and >= n-k overlap). Pick a unit x there: B x = 0, and
     ||A x|| >= sigma_{k+1}. Hence ||A - B||_2 >= sigma_{k+1}. No B beats A_k.

The error of the best rank-k approximation is exactly the first discarded singular value.

The pseudoinverse: solving the unsolvable

Most matrices have no inverse: rectangular, or rank-deficient. The SVD repairs this. Invert what you can — the nonzero singular values — and leave the rest alone. That gives the Moore-Penrose pseudoinverse A^+ = V Sigma^+ U^T, where Sigma^+ replaces each sigma_i > 0 by 1/sigma_i and transposes.

Then x = A^+ b is exactly the minimum-norm least-squares solution of A x = b: among all x that minimize ||A x - b||, it is the shortest one. The SVD thus unifies inversion, least squares, and rank deficiency into a single formula.