One decomposition for every matrix
The spectral theorem was lovely, but it only applies to symmetric matrices. Most matrices are not symmetric, and many are not even square. The singular value decomposition (SVD) handles all of them. For any real m-by-n matrix A: A = U*S*V^T, where U (m-by-m) and V (n-by-n) are orthogonal (rotations/reflections) and S is an m-by-n matrix that is zero everywhere except for non-negative entries on its diagonal.
A = U * S * V^T
    V^T : rotate the input (orthogonal)
    S   : stretch each axis by sigma_1 >= sigma_2 >= ... >= 0
    U   : rotate into the output (orthogonal)
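To make the factorization concrete, here is a minimal sketch that checks it numerically with NumPy's `np.linalg.svd`. The 2-by-3 matrix is an arbitrary example chosen for illustration, and names such as `A_rebuilt` are ours, not part of any API.

```python
import numpy as np

# An arbitrary non-square example matrix (2-by-3), chosen only for illustration.
A = np.array([[3.0, 1.0, 1.0],
              [-1.0, 3.0, 1.0]])

# full_matrices=True returns U as m-by-m and Vt (that is, V^T) as n-by-n.
# s is a 1-D array of singular values, sorted from largest to smallest.
U, s, Vt = np.linalg.svd(A, full_matrices=True)

# Place the singular values on the diagonal of an m-by-n matrix S.
S = np.zeros(A.shape)
S[:len(s), :len(s)] = np.diag(s)

# Multiply the three factors back together.
A_rebuilt = U @ S @ Vt

print("singular values:", s)
print("A == U S V^T:", np.allclose(A, A_rebuilt))
print("U orthogonal:", np.allclose(U.T @ U, np.eye(U.shape[0])))
print("V orthogonal:", np.allclose(Vt @ Vt.T, np.eye(Vt.shape[0])))
```

Note that NumPy hands back V^T rather than V, and the singular values as a plain vector rather than the matrix S, so S has to be assembled by hand when the matrix is not square.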
The geometry: rotate, stretch, rotate
Picture the unit circle in the plane. Apply a 2-by-2 matrix A and it becomes an ellipse (flattened to a line segment if A is singular). The SVD names the pieces of that transformation precisely.
- V^T rotates the input space so the special input directions line up with the axes.
- S stretches each axis by its singular value sigma_i — these are the lengths of the ellipse's semi-axes.
- U rotates the stretched result into its final orientation in the output space.
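The three steps above can be traced numerically. The sketch below assumes an arbitrary 2-by-2 example matrix and samples the unit circle at a finite number of points, so the longest and shortest radii of the resulting ellipse only approximate the singular values.

```python
import numpy as np

# An arbitrary 2-by-2 example matrix, chosen only for illustration.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
U, s, Vt = np.linalg.svd(A)

# Sample points on the unit circle, one point per column.
theta = np.linspace(0.0, 2.0 * np.pi, 3600, endpoint=False)
circle = np.vstack([np.cos(theta), np.sin(theta)])

# Apply the three factors one at a time, right to left.
rotated = Vt @ circle              # rotate (or reflect) the input
stretched = np.diag(s) @ rotated   # stretch axis i by sigma_i
ellipse = U @ stretched            # rotate (or reflect) into the output

# The three steps together do the same thing as A itself.
same_as_A = np.allclose(ellipse, A @ circle)

# Distance of each ellipse point from the origin.
radii = np.linalg.norm(ellipse, axis=0)

print("three steps match A:", same_as_A)
print("sigma_1, longest radius: ", s[0], radii.max())
print("sigma_2, shortest radius:", s[1], radii.min())
```

After the stretch step the points already lie on an axis-aligned ellipse with semi-axes sigma_1 and sigma_2; the final factor U only changes its orientation, which is why the radii are unaffected by it.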
The singular values sigma_1 >= sigma_2 >= ... >= 0 are the heart of the SVD. A large singular value marks a direction the matrix stretches a lot; a near-zero singular value marks a direction it nearly flattens. This single ordered list tells you how strongly the matrix acts along each of its special directions, which is almost everything about how it behaves.
Singular values, rank, and eigenvalues
The number of nonzero singular values is exactly the rank of A. So the SVD reads off the rank by counting — and, crucially, it lets you measure *almost*-rank: if a singular value is tiny but not quite zero, the matrix is *nearly* lower rank, which is the whole basis for compression in the next guide.
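Counting nonzero singular values needs a little care in floating point, where an exact zero usually shows up as something like 1e-16. The sketch below assumes a small example matrix built to have rank 2, and uses a threshold `tol` that mirrors the default documented for `np.linalg.matrix_rank`; other thresholds are reasonable too.

```python
import numpy as np

# Third row = first row + second row, so the rank is 2 by construction.
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [5.0, 7.0, 9.0]])

# compute_uv=False returns only the singular values, largest first.
s = np.linalg.svd(A, compute_uv=False)

# Treat anything below this threshold as zero (an assumption; it matches
# the default tolerance documented for np.linalg.matrix_rank).
tol = s.max() * max(A.shape) * np.finfo(A.dtype).eps
rank = int(np.sum(s > tol))

# Nudge one entry slightly: the matrix is now full rank on paper,
# but its smallest singular value is still tiny compared to the largest.
A_nearly = A.copy()
A_nearly[2, 2] += 1e-6
s_nearly = np.linalg.svd(A_nearly, compute_uv=False)
rank_nearly = int(np.sum(s_nearly > tol))

print("singular values of A:       ", s)
print("rank of A by counting:      ", rank)
print("singular values of A_nearly:", s_nearly)
print("rank of A_nearly by counting:", rank_nearly)
```

The strict count jumps from 2 to 3 after the nudge, yet the third singular value stays many orders of magnitude below the first. That gap, rather than the integer rank, is what signals a nearly lower-rank matrix.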