PCA is just an SVD of centered data
Principal component analysis via SVD is the SVD wearing a statistician's hat. Center your data matrix X (subtract each column's mean), take its SVD X = U Sigma V^T, and the right singular vectors v_i are exactly the principal directions — the axes of greatest variance. This sharpens the Vol I view of PCA.
The variance captured along v_i is sigma_i^2 / (N-1), where N is the number of samples (the rows of X). Keeping the top k directions IS a truncated SVD, so by Eckart-Young it is provably the best rank-k approximation of the centered data in the least-squares (Frobenius) sense. Dimensionality reduction and best-low-rank approximation are the same theorem.
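A minimal NumPy sketch of this recipe, assuming the rows of X are samples and the columns are features. The function name pca_svd and the synthetic data are illustrative, not from the text; the last line cross-checks sigma_i^2 / (N-1) against the eigenvalues of the sample covariance matrix.

```python
import numpy as np

def pca_svd(X, k):
    """Top-k principal directions, variances, and scores (rows of X = samples)."""
    Xc = X - X.mean(axis=0)                      # center each column
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    n_samples = X.shape[0]
    directions = Vt[:k]                          # rows are v_1, ..., v_k
    variances = s[:k] ** 2 / (n_samples - 1)     # sigma_i^2 / (N-1)
    scores = Xc @ directions.T                   # coordinates along each v_i
    return directions, variances, scores

# Hypothetical data: 200 samples, 3 correlated features.
rng = np.random.default_rng(0)
mix = np.array([[3.0, 0.0, 0.0],
                [1.0, 1.0, 0.0],
                [0.0, 0.0, 0.1]])
X = rng.normal(size=(200, 3)) @ mix

directions, variances, scores = pca_svd(X, k=2)

# Cross-check: eigenvalues of the sample covariance, largest first.
cov_eigs = np.sort(np.linalg.eigvalsh(np.cov(X, rowvar=False)))[::-1]
```

The scores are the reduced k-dimensional representation of the data; their per-column sample variances should match the returned variances.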
Polar decomposition: rotation times stretch
Regroup the SVD and a second classical factorization falls out. The polar decomposition writes a square matrix A as A = Q P, where Q = U V^T is orthogonal (a pure rotation/reflection) and P = V Sigma V^T is symmetric positive semidefinite (a pure stretch). It is the matrix analogue of writing a complex number as (unit phase) times (magnitude).
From A = U Sigma V^T, insert V^T V = I in the middle:
A = U Sigma V^T = (U V^T)(V Sigma V^T) = Q P
Q = U V^T orthogonal (rotation / reflection)
P = V Sigma V^T sym. pos. semidef. (stretch along principal axes)
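The regrouping can be checked numerically. The sketch below assumes a square A; the helper name polar_from_svd and the random test matrix are illustrative only.

```python
import numpy as np

def polar_from_svd(A):
    """Return (Q, P) with A = Q @ P, built from the SVD of a square matrix A."""
    U, s, Vt = np.linalg.svd(A)
    Q = U @ Vt                       # orthogonal factor: rotation / reflection
    P = Vt.T @ np.diag(s) @ Vt       # symmetric PSD factor: stretch
    return Q, P

rng = np.random.default_rng(1)
A = rng.normal(size=(3, 3))          # hypothetical test matrix
Q, P = polar_from_svd(A)

reconstruction_error = np.linalg.norm(Q @ P - A)
orthogonality_error = np.linalg.norm(Q.T @ Q - np.eye(3))
smallest_stretch = np.linalg.eigvalsh(P).min()   # should be >= 0
```

If SciPy is available, scipy.linalg.polar computes the same factorization directly and may be preferable to a hand-rolled version.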
Application: the orthogonal matrix CLOSEST to A (in Frobenius norm) is
exactly Q = U V^T.
This is how you 'snap' a drifted rotation matrix back to a true rotation
in graphics, robotics, and orthogonal Procrustes alignment.
Scaling up: randomized SVD
On a billion-by-million matrix you cannot afford a full SVD — but you rarely need it. If only the top k singular values matter, randomized SVD finds them fast. Multiply A by a small random matrix to sketch its dominant range, orthonormalize, and do a small dense SVD inside that sketch.
- Draw a random n-by-(k+p) matrix Omega, where A is m-by-n (p is a small oversampling margin, say 5-10).
- Form the sketch Y = A Omega; orthonormalize its columns with QR to get Q with Q^T Q = I.
- Project: B = Q^T A is small; take its SVD B = U_B Sigma V^T cheaply.
- Lift back: U = Q U_B, and keep the leading k components. Then A is approximately U Sigma V^T, with error close to the Eckart-Young optimum when the singular values decay quickly.
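The four steps above can be sketched in a few lines of NumPy. This is a bare-bones illustration, not a production implementation: the function name randomized_svd, the default p=10, and the Gaussian test matrix are assumptions, and refinements such as power iterations are left out. The example matrix is built to have exact rank 5 so that the sketch captures its range.

```python
import numpy as np

def randomized_svd(A, k, p=10, seed=None):
    """Approximate top-k SVD of A using a random range sketch."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    Omega = rng.standard_normal((n, k + p))   # random n-by-(k+p) test matrix
    Y = A @ Omega                             # sketch of the dominant range
    Q, _ = np.linalg.qr(Y)                    # orthonormal columns, Q^T Q = I
    B = Q.T @ A                               # small (k+p)-by-n matrix
    U_B, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_B                               # lift back to m dimensions
    return U[:, :k], s[:k], Vt[:k]            # keep the leading k components

# Hypothetical test matrix of exact rank 5.
rng = np.random.default_rng(2)
A = rng.standard_normal((500, 5)) @ rng.standard_normal((5, 300))

U, s, Vt = randomized_svd(A, k=5, seed=0)
rel_err = np.linalg.norm(A - (U * s) @ Vt) / np.linalg.norm(A)
exact_s = np.linalg.svd(A, compute_uv=False)[:5]
```

On this exactly low-rank input the relative error should sit near machine precision; on real data with slowly decaying singular values the error would be larger, which is where oversampling and power iterations come in.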
From a decomposition that merely *exists* for every matrix, we arrived at the workhorse behind recommender systems, image compression, latent-semantic search, noise reduction, and the dimensionality reduction inside countless machine-learning pipelines. One factorization — A = U Sigma V^T — and the rotate-stretch-rotate picture sits underneath all of it. That is the payoff of the SVD.