Singular Values as Norms, and the Four Subspaces

The biggest stretch is a norm

The largest singular value sigma_1 answers: by how much can A magnify a vector? Precisely, sigma_1 = max over ||x||=1 of ||A x||. This is the spectral norm (the operator 2-norm): ||A||_2 = sigma_1. The maximizing input is v_1 and the output direction is u_1, because A v_1 = sigma_1 u_1.

The smallest nonzero singular value sigma_r tells the opposite story: it is the least A can stretch a unit vector inside its row space. The ratio sigma_1 / sigma_r is the condition number kappa(A): how much A can distort relative lengths, and by what factor relative errors can be amplified when you solve A x = b. For a square invertible A, r = n and sigma_r is simply the smallest singular value. A nearly singular matrix has a tiny smallest singular value and therefore a huge condition number.
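As a rough numerical check of the two claims above, here is a minimal NumPy sketch. The 2x2 matrix is a hypothetical example chosen only because its singular values come out cleanly; it is invertible, so r = 2.

```python
import numpy as np

# Hypothetical example matrix (not from the text), invertible so r = n = 2.
A = np.array([[3.0, 0.0],
              [4.0, 5.0]])

# Full SVD: A = U @ diag(s) @ Vt, with s sorted in descending order.
U, s, Vt = np.linalg.svd(A)
sigma_1, sigma_r = s[0], s[-1]     # here 3*sqrt(5) and sqrt(5)

# The spectral norm (operator 2-norm) equals the largest singular value.
spectral = np.linalg.norm(A, 2)

# v_1 is the unit input that is stretched the most; its image is sigma_1 * u_1.
v1, u1 = Vt[0], U[:, 0]
stretch = np.linalg.norm(A @ v1)   # equals sigma_1 up to rounding

# Condition number as the ratio of extreme singular values (3 for this A).
kappa = sigma_1 / sigma_r

print(spectral, stretch, kappa)
```

np.linalg.cond(A) uses the 2-norm by default, so it should agree with kappa here.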

Three norms, one spectrum

Different norms are different ways of summarizing the same list of singular values. The Frobenius norm is their root-sum-of-squares; the nuclear (trace) norm is their plain sum; the spectral norm is their maximum. Think of them, respectively, as the L2, L1, and L-infinity norms of the singular-value vector.

Let the singular values be  sigma_1 >= sigma_2 >= ... >= sigma_r > 0.

  spectral norm    ||A||_2   = sigma_1                  (max)
  Frobenius norm   ||A||_F   = sqrt(sigma_1^2 + ... + sigma_r^2)
  nuclear norm     ||A||_*   = sigma_1 + ... + sigma_r  (sum)

Why Frobenius works:  ||A||_F^2 = sum of all a_ij^2 = trace(A^T A)
                                = sum of eigenvalues lambda_i of A^T A
                                = sum of sigma_i^2,  since lambda_i = sigma_i^2.
Orthogonal U, V do not change ||.||_F, so ||A||_F = ||Sigma||_F.

All three norms read off the same singular values; only the way of combining them differs.
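The three formulas can be checked numerically. The sketch below assumes NumPy and a hypothetical 2x3 matrix; it computes each norm from the singular values alone and compares with NumPy's built-in matrix norms.

```python
import numpy as np

# Hypothetical 2x3 example matrix (not from the text).
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 3.0]])

# Singular values only, in descending order.
s = np.linalg.svd(A, compute_uv=False)

# Each matrix norm is a vector norm of s.
spectral  = np.linalg.norm(s, np.inf)   # max      -> ||A||_2
frobenius = np.linalg.norm(s, 2)        # root-sum-of-squares -> ||A||_F
nuclear   = np.linalg.norm(s, 1)        # plain sum -> ||A||_*

# Cross-check against the matrix norms NumPy computes directly from A.
print(spectral,  np.linalg.norm(A, 2))
print(frobenius, np.linalg.norm(A, 'fro'))
print(nuclear,   np.linalg.norm(A, 'nuc'))
```

For this A the squared entries sum to 15, so the Frobenius norm is sqrt(15). The ordering spectral <= Frobenius <= nuclear always holds, since it is the usual ordering of the L-infinity, L2, and L1 vector norms.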

Reading the four fundamental subspaces

Split the singular vectors by whether they pair with a positive singular value (i <= r) or not (i > r). The full SVD then reads off all four fundamental subspaces at once, with an orthonormal basis for each, and the dimension counts below are the rank-nullity theorem applied to A and to A^T.

Let r = rank(A) = number of positive singular values.

  column space of A      = span( u_1, ..., u_r )       (in R^m)
  left null space of A   = span( u_{r+1}, ..., u_m )    (in R^m)
  row space of A         = span( v_1, ..., v_r )        (in R^n)
  null space of A        = span( v_{r+1}, ..., v_n )    (in R^n)

The v's split R^n;  the u's split R^m;  each into orthogonal complements.
  dim(row space) + dim(null space)        = r + (n - r) = n
  dim(column space) + dim(left null space) = r + (m - r) = m

Positive-sigma vectors span the row/column spaces; zero-sigma vectors span the null spaces.
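A minimal NumPy sketch of this bookkeeping follows. The 3x4 matrix is a hypothetical rank-2 example (its third row is the sum of the first two), and the relative tolerance 1e-10 used to decide which singular values count as zero is an assumption, not a universal rule.

```python
import numpy as np

# Hypothetical 3x4 matrix of rank 2: row 3 = row 1 + row 2.
A = np.array([[1.0, 2.0, 0.0, 1.0],
              [0.0, 1.0, 1.0, 0.0],
              [1.0, 3.0, 1.0, 1.0]])
m, n = A.shape

# Full SVD: U is m x m, Vt is n x n, s holds min(m, n) singular values.
U, s, Vt = np.linalg.svd(A)

# Numerical rank: count singular values above an assumed relative tolerance.
tol = 1e-10 * s[0]
r = int(np.sum(s > tol))

# Orthonormal bases, stored as columns.
col_space  = U[:, :r]      # span(u_1, ..., u_r)      in R^m
left_null  = U[:, r:]      # span(u_{r+1}, ..., u_m)  in R^m
row_space  = Vt[:r].T      # span(v_1, ..., v_r)      in R^n
null_space = Vt[r:].T      # span(v_{r+1}, ..., v_n)  in R^n

print("rank:", r)
print("dims in R^n:", row_space.shape[1], "+", null_space.shape[1], "=", n)
print("dims in R^m:", col_space.shape[1], "+", left_null.shape[1], "=", m)
```

Up to rounding, A @ null_space and left_null.T @ A should both be zero matrices, which confirms that the trailing v's and u's really lie in the two null spaces.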