The Payoff: PCA, Polar Decomposition, and Big Data

PCA is just an SVD of centered data

Principal component analysis is the SVD wearing a statistician's hat. Take a data matrix X with N samples as rows and one feature per column, center it (subtract each column's mean), and take its SVD X = U Sigma V^T. The right singular vectors v_i are exactly the principal directions, the axes of greatest variance. The reason is one line: the sample covariance is X^T X / (N-1) = V (Sigma^2 / (N-1)) V^T, so the v_i are its eigenvectors. This sharpens the Vol I view of PCA.

The variance captured along v_i is therefore sigma_i^2 / (N-1). Keeping the top k directions is a truncated SVD, so by Eckart-Young it is provably the best rank-k approximation of the centered data, in both the Frobenius and the spectral norm. Dimensionality reduction and best low-rank approximation are the same theorem.
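The recipe above can be sketched in a few lines of NumPy. This is a minimal illustration, not a library-grade PCA: the function name pca_svd, the synthetic data, and the choice k = 2 are assumptions made for the example. It cross-checks the sigma_i^2 / (N-1) formula against the eigenvalues of the sample covariance matrix.

```python
import numpy as np

def pca_svd(X, k):
    """PCA of an (N samples x d features) array via a thin SVD.

    Hypothetical helper for illustration only.
    """
    Xc = X - X.mean(axis=0)                       # center each column
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]                           # rows are v_1 ... v_k
    explained_variance = s[:k] ** 2 / (X.shape[0] - 1)
    scores = Xc @ components.T                    # coordinates along v_1 ... v_k
    return components, explained_variance, scores

# Synthetic correlated data (an assumption for the demo).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))

components, var, scores = pca_svd(X, k=2)

# Eigenvalues of the sample covariance, largest first, for comparison.
cov_eigvals = np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1]
print(var, cov_eigvals[:2])
```

The two printed arrays should agree to floating-point precision. The signs of the rows of components are arbitrary, as with any SVD.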

Polar decomposition: rotation times stretch

Regroup the SVD and a second classical factorization falls out. The polar decomposition writes a square matrix A as A = Q P, where Q = U V^T is orthogonal (a pure rotation or reflection) and P = V Sigma V^T is symmetric positive semidefinite (a pure stretch). It is the matrix analogue of writing a complex number as (unit phase) times (magnitude).

From A = U Sigma V^T, insert V^T V = I in the middle:

   A = U Sigma V^T = (U V^T)(V Sigma V^T) = Q P
        Q = U V^T          orthogonal     (rotation / reflection)
        P = V Sigma V^T    sym. pos. semidef. (stretch along principal axes)
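The regrouping above can be checked numerically. The sketch below assumes a square real matrix and uses a hypothetical helper name, polar_from_svd; it builds Q and P directly from NumPy's SVD.

```python
import numpy as np

def polar_from_svd(A):
    """Return (Q, P) with A = Q @ P for a square real matrix A.

    Hypothetical helper for illustration only.
    """
    U, s, Vt = np.linalg.svd(A)
    Q = U @ Vt                      # orthogonal factor U V^T
    P = Vt.T @ np.diag(s) @ Vt      # symmetric PSD factor V Sigma V^T
    return Q, P

rng = np.random.default_rng(1)
A = rng.normal(size=(3, 3))         # arbitrary test matrix
Q, P = polar_from_svd(A)

print(np.allclose(Q @ P, A))            # the factors reproduce A
print(np.allclose(Q.T @ Q, np.eye(3)))  # Q is orthogonal
print(np.allclose(P, P.T))              # P is symmetric
```

SciPy also provides scipy.linalg.polar, which should return the same pair for an invertible square input; the hand-rolled version here is only meant to mirror the derivation.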

Application: the orthogonal matrix closest to A in Frobenius norm is exactly Q = U V^T.
This is how you 'snap' a drifted rotation matrix back to a true rotation in graphics, robotics, and orthogonal Procrustes alignment. (Q is a proper rotation when det(A) > 0; when det(A) < 0 it is a reflection.)
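As a small illustration of the snapping step, the sketch below perturbs a known rotation with noise and projects it back with Q = U V^T. The helper names nearest_orthogonal and rotation_z, the rotation angle, and the noise level are all assumptions for the demo. It does not handle the case where the input has negative determinant.

```python
import numpy as np

def nearest_orthogonal(A):
    """Closest orthogonal matrix to A in Frobenius norm (hypothetical helper)."""
    U, _, Vt = np.linalg.svd(A)
    return U @ Vt

def rotation_z(theta):
    """Rotation by theta radians about the z-axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

R = rotation_z(0.3)
rng = np.random.default_rng(2)
drifted = R + 1e-3 * rng.normal(size=(3, 3))   # simulated numerical drift

R_fixed = nearest_orthogonal(drifted)

# How far each matrix is from being orthogonal.
print(np.linalg.norm(drifted.T @ drifted - np.eye(3)))
print(np.linalg.norm(R_fixed.T @ R_fixed - np.eye(3)))
```

The second printed number should be at the level of rounding error, while the first reflects the injected drift.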

Polar decomposition splits any square matrix into a rotation (or reflection) Q and a stretch P.

Scaling up: randomized SVD

On a billion-by-million matrix you cannot afford a full SVD — but you rarely need it. If only the top k singular values matter, randomized SVD finds them fast. Multiply A by a small random matrix to sketch its dominant range, orthonormalize, and do a small dense SVD inside that sketch.

Draw a random n-by-(k+p) matrix Omega, typically Gaussian, where A is m-by-n and p is a small oversampling margin (say 5-10).
Form the sketch Y = A Omega; orthonormalize its columns with QR to get Q with Q^T Q = I.
Project: B = Q^T A is small, only (k+p)-by-n; take its SVD B = U_B Sigma V^T cheaply.
Lift back: U = Q U_B. Then A is approximately U Sigma V^T, and keeping the top k terms is, with high probability, close to the Eckart-Young optimum when the singular values decay quickly.
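The four steps above translate almost line for line into NumPy. This is a minimal sketch under stated assumptions: a Gaussian test matrix, no power iterations, and a dense A that fits in memory. The function name randomized_svd and its parameters are chosen for the example, and the test matrix is built with exact rank 8 so the result can be compared against a full SVD.

```python
import numpy as np

def randomized_svd(A, k, p=10, seed=0):
    """Approximate top-k SVD of A via a random range sketch (illustrative)."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    Omega = rng.normal(size=(n, k + p))      # step 1: random test matrix
    Y = A @ Omega                            # step 2: sketch the range of A
    Q, _ = np.linalg.qr(Y)                   # reduced QR, Q is m x (k+p)
    B = Q.T @ A                              # step 3: small (k+p) x n matrix
    U_B, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_B                              # step 4: lift back to m dimensions
    return U[:, :k], s[:k], Vt[:k]

# Rank-8 test matrix (an assumption for the demo).
rng = np.random.default_rng(3)
A = rng.normal(size=(500, 8)) @ rng.normal(size=(8, 300))

U, s, Vt = randomized_svd(A, k=8)
s_exact = np.linalg.svd(A, compute_uv=False)[:8]

print(np.max(np.abs(s - s_exact)))           # gap to the exact singular values
```

Because this A has exact rank 8 and the sketch uses k + p = 18 columns, the sketch captures the whole range of A and the printed gap should be at rounding-error level. On matrices with slowly decaying singular values the approximation is looser, and practical implementations commonly add a few power iterations.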

From a decomposition that merely *exists* for every matrix, we arrived at the workhorse behind recommender systems, image compression, latent-semantic search, noise reduction, and the dimensionality reduction inside countless machine-learning pipelines. One factorization — A = U Sigma V^T — and the rotate-stretch-rotate picture sits underneath all of it. That is the payoff of the SVD.