Building the SVD from A^T A

Why A^T A is the right object

For any real m-by-n matrix A, the matrix A^T A is square (n-by-n), symmetric, and positive semidefinite, since x^T (A^T A) x = ||A x||^2 >= 0 for every x. Symmetry gives an orthonormal eigenbasis with real eigenvalues (the spectral theorem), and positive semidefiniteness makes those eigenvalues nonnegative. That is the seed of everything.
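The three properties can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the matrix entries, the variable names, and the rounding tolerance are illustrative choices, not part of the argument above.

```python
import numpy as np

# Arbitrary rectangular example (m = 4, n = 3); the entries are made up for illustration.
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, -1.0],
              [3.0, 0.0, 2.0],
              [1.0, 1.0, 1.0]])

gram = A.T @ A                                   # the n-by-n matrix A^T A

is_square = gram.shape == (A.shape[1], A.shape[1])
is_symmetric = np.allclose(gram, gram.T)

# eigvalsh is meant for symmetric matrices and returns real eigenvalues.
eigenvalues = np.linalg.eigvalsh(gram)
is_psd = bool(np.all(eigenvalues >= -1e-12))     # assumed tolerance for rounding noise

# Spot-check x^T (A^T A) x = ||A x||^2 on one random vector.
rng = np.random.default_rng(0)
x = rng.standard_normal(A.shape[1])
quadratic_form = x @ gram @ x
squared_norm = np.linalg.norm(A @ x) ** 2

print(is_square, is_symmetric, is_psd)
print(np.isclose(quadratic_form, squared_norm))
```

A single random x does not prove semidefiniteness, of course; the eigenvalue check is the numerical counterpart of the inequality, and the spot-check only illustrates the identity behind it.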

Write those eigenpairs as A^T A v_i = lambda_i v_i with lambda_1 >= lambda_2 >= ... >= lambda_n >= 0. Define the singular values as sigma_i = sqrt(lambda_i). Because the lambda_i are nonnegative, the square root is real. This is the precise reason singular values are always real and nonnegative, even when a square A itself has negative or complex eigenvalues. See singular values vs eigenvalues.
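A small example makes the contrast concrete. The sketch below, assuming NumPy, uses a 90-degree rotation matrix (an illustrative choice) whose eigenvalues are purely imaginary, yet whose singular values computed as sqrt(lambda_i) are real and agree with the library's SVD routine.

```python
import numpy as np

# 90-degree rotation: a real matrix whose eigenvalues are the complex pair +i, -i.
R = np.array([[0.0, -1.0],
              [1.0,  0.0]])

eigenvalues_of_R = np.linalg.eigvals(R)            # complex-valued array

# Singular values as defined in the text: square roots of the eigenvalues of R^T R.
lam = np.linalg.eigvalsh(R.T @ R)[::-1]            # reverse to get descending order
sigma_from_gram = np.sqrt(np.clip(lam, 0.0, None)) # clip guards against tiny negative rounding

# Reference values from the library's SVD, for comparison only.
sigma_reference = np.linalg.svd(R, compute_uv=False)

print(eigenvalues_of_R)
print(sigma_from_gram, sigma_reference)
```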

Manufacturing the left singular vectors

The v_i are in hand. For each i with sigma_i > 0, define u_i = (A v_i) / sigma_i. Two short calculations show these are exactly the left singular vectors: they are unit vectors, and they are mutually orthogonal.

Unit length:
   ||u_i||^2 = (A v_i)^T (A v_i) / sigma_i^2
             = v_i^T (A^T A) v_i / sigma_i^2
             = v_i^T (lambda_i v_i) / sigma_i^2
             = lambda_i / sigma_i^2  =  1     (since sigma_i^2 = lambda_i)

Orthogonality (i != j):
   <u_i, u_j> = (A v_i)^T (A v_j) / (sigma_i sigma_j)
              = v_i^T (A^T A) v_j / (sigma_i sigma_j)
              = v_i^T (lambda_j v_j) / (sigma_i sigma_j)
              = (lambda_j / (sigma_i sigma_j)) <v_i, v_j>
              = 0                              (the v's are orthonormal)

=>  the u_i form an orthonormal set, and  A v_i = sigma_i u_i.

The vectors u_i = A v_i / sigma_i are automatically orthonormal; that is the whole trick.
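The two calculations above can be mirrored in a few lines. This is a sketch assuming NumPy; the 3-by-2 matrix is an arbitrary full-column-rank example, chosen so that every sigma_i is positive and the division is safe.

```python
import numpy as np

# Illustrative 3-by-2 matrix with full column rank, so both sigma_i are positive.
A = np.array([[3.0, 0.0],
              [4.0, 5.0],
              [0.0, 1.0]])

# Eigenpairs of A^T A, reordered so that lambda_1 >= lambda_2.
lam, V = np.linalg.eigh(A.T @ A)
order = np.argsort(lam)[::-1]
lam, V = lam[order], V[:, order]

sigma = np.sqrt(lam)

# u_i = A v_i / sigma_i for every column at once (broadcasting divides column i by sigma_i).
U = (A @ V) / sigma

gram_of_U = U.T @ U          # entry (i, j) is <u_i, u_j>; expected to be the identity
print(np.round(gram_of_U, 12))
```

The diagonal of gram_of_U corresponds to the unit-length calculation and the off-diagonal entries to the orthogonality calculation.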

The existence proof, assembled

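Form A^T A. By the spectral theorem it has orthonormal eigenvectors v_1, ..., v_n with eigenvalues lambda_1 >= ... >= lambda_n >= 0.
Set sigma_i = sqrt(lambda_i), and let r be the number of strictly positive sigma_i (this r is the rank of A).
For i <= r set u_i = A v_i / sigma_i, then extend u_1, ..., u_r to a full orthonormal basis u_1, ..., u_m of R^m (for example by Gram-Schmidt).
For i > r we have ||A v_i||^2 = lambda_i = 0, so A v_i = 0. Together with A v_i = sigma_i u_i for i <= r, this says A V = U Sigma, where Sigma is the m-by-n matrix with sigma_1, ..., sigma_r on its diagonal and zeros elsewhere. Because V is orthogonal, A = U Sigma V^T holds exactly. Since this construction never required A to be square or invertible, the SVD exists for every real matrix.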
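The steps of the proof translate almost line by line into code. The following is a sketch, assuming NumPy; the function name svd_via_gram, the cutoff tol, and the use of a full QR factorization to extend the basis are choices made here for illustration. It assumes A has rank at least 1, and it is meant to illustrate the proof, not to serve as a robust SVD routine.

```python
import numpy as np

def svd_via_gram(A, tol=1e-10):
    """Follow the existence proof step by step (illustrative, hypothetical helper)."""
    m, n = A.shape

    # Orthonormal eigenvectors of A^T A, eigenvalues sorted in descending order.
    lam, V = np.linalg.eigh(A.T @ A)
    order = np.argsort(lam)[::-1]
    lam, V = lam[order], V[:, order]

    # Rank = number of eigenvalues treated as strictly positive.
    # The cutoff is applied to lambda (not sigma) because rounding noise in
    # lambda of size ~1e-16 would become ~1e-8 after the square root.
    r = int(np.sum(lam > tol * lam[0]))
    sigma = np.sqrt(lam[:r])

    # u_i = A v_i / sigma_i for i <= r.
    U_r = (A @ V[:, :r]) / sigma

    # Extend to an orthonormal basis of R^m: in a full QR of U_r, the trailing
    # m - r columns of Q span the orthogonal complement of the columns of U_r.
    Q, _ = np.linalg.qr(U_r, mode="complete")
    U = np.hstack([U_r, Q[:, r:]])

    # Sigma is m-by-n with sigma_1..sigma_r on the diagonal, zeros elsewhere.
    Sigma = np.zeros((m, n))
    Sigma[:r, :r] = np.diag(sigma)
    return U, Sigma, V, r

# Rank-1, non-square example: the second column is twice the first.
A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])
U, Sigma, V, r = svd_via_gram(A)
reconstruction_ok = np.allclose(U @ Sigma @ V.T, A)
print(r, reconstruction_ok)
```

The example is deliberately rank-deficient and rectangular, which exercises both the basis extension and the zero block of Sigma.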

In practice one rarely forms A^T A numerically: doing so squares the condition number, cond(A^T A) = cond(A)^2, so the small singular values lose roughly half of their significant digits. The proof tells you the SVD *exists*; real algorithms compute it far more stably by working on A directly. But for understanding, A^T A is the master key.
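The precision loss is easy to observe. The sketch below, assuming NumPy and double precision, first confirms the squared condition number on a mild example, then uses a standard ill-conditioned test matrix (often attributed to Läuchli) whose exact singular values are sqrt(2 + eps^2) and eps. The value eps = 1e-10 and all variable names are illustrative.

```python
import numpy as np

# Mild example: cond(B) is about 1e3, cond(B^T B) about 1e6.
B = np.diag([1.0, 1e-3])
cond_B = np.linalg.cond(B)
cond_gram_B = np.linalg.cond(B.T @ B)
print(cond_B, cond_gram_B)

# Harder example: exact singular values are sqrt(2 + eps^2) and eps.
eps = 1e-10
A = np.array([[1.0, 1.0],
              [eps, 0.0],
              [0.0, eps]])

# Route 1: through A^T A. In double precision 1 + eps^2 rounds to 1, so the
# small singular value is already lost before the eigenvalue solver runs.
lam = np.linalg.eigvalsh(A.T @ A)[::-1]
sigma_gram = np.sqrt(np.clip(lam, 0.0, None))

# Route 2: the library SVD, which operates on A itself.
sigma_svd = np.linalg.svd(A, compute_uv=False)

rel_err_gram = abs(sigma_gram[1] - eps) / eps
rel_err_svd = abs(sigma_svd[1] - eps) / eps
print(f"via A^T A: {sigma_gram[1]:.3e}  (relative error {rel_err_gram:.1e})")
print(f"via SVD:   {sigma_svd[1]:.3e}  (relative error {rel_err_svd:.1e})")
```

The exact digits printed for the A^T A route depend on rounding inside the eigenvalue solver, but the small singular value it reports should bear little resemblance to eps, while the direct SVD recovers it closely.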