PCA and Dimensionality Reduction

The idea: rotate to the axes of greatest variance

Imagine a cloud of data points. It is wider in some directions than others. Principal component analysis finds the direction of greatest spread (the first principal component), then the direction of next-greatest spread perpendicular to it, and so on. It is nothing more than choosing a better set of axes: a rotation aligned with the shape of your data.

Computationally, PCA is just the SVD of your data after centering it (subtracting the mean of each feature). The principal components are the top right-singular vectors; the squared singular values tell you how much variance each one captures.

Center the data: subtract each feature's mean so the cloud sits at the origin.
Take the SVD of the centered data matrix A (rows are samples, columns are features): A = U*S*V^T.
The columns of V are the principal components; sigma_i^2 / (n - 1) is the variance along the i-th one, where n is the number of samples.
Project the data onto the top k components to reduce it to k dimensions.
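
The four steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function name pca_svd and the synthetic 2D cloud are assumptions made up for the example, and the data matrix is assumed to have one sample per row.

```python
import numpy as np

def pca_svd(X, k):
    """PCA via SVD. X has shape (n_samples, n_features); rows are samples."""
    # Step 1: center each feature (column) so the cloud sits at the origin.
    mean = X.mean(axis=0)
    A = X - mean
    # Step 2: thin SVD, A = U @ diag(S) @ Vt; NumPy returns S in descending order.
    U, S, Vt = np.linalg.svd(A, full_matrices=False)
    # Step 3: the rows of Vt (columns of V) are the principal components.
    components = Vt[:k]
    # Variance along each component is sigma_i^2 / (n - 1).
    variances = S**2 / (X.shape[0] - 1)
    # Step 4: project the centered data onto the top k components.
    Z = A @ components.T
    return Z, components, variances

# Made-up data: a 2D cloud stretched along the direction (1, 1), shifted off the origin.
rng = np.random.default_rng(0)
t = rng.normal(size=(500, 1))
X = t @ np.array([[1.0, 1.0]]) + 0.1 * rng.normal(size=(500, 2)) + np.array([5.0, -3.0])

Z, components, variances = pca_svd(X, k=1)
print(components[0])   # close to +/-(0.707, 0.707); the sign of a component is arbitrary
```

The first component comes out along the diagonal the cloud was stretched on, and the sample variance of the projected data matches sigma_1^2 / (n - 1).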

Compress and visualize

Keeping only the top few components is a projection onto the directions that matter most for variance. Often a 1000-feature dataset has its shape captured by a handful of components, letting you plot it in 2D or feed a smaller, faster model. This is the same move as low-rank approximation: throw away the small singular values, keep the big ones.
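
To make the link to low-rank approximation concrete, here is a small sketch on made-up data: 50 features generated from 3 hidden factors plus a little noise (the sizes and noise level are assumptions chosen for the example). Projecting onto the top 3 components and mapping back gives a rank-3 reconstruction, and the squared reconstruction error equals the sum of the discarded sigma_i^2.

```python
import numpy as np

rng = np.random.default_rng(1)
# Made-up data: 200 samples, 50 features, driven by 3 hidden factors plus small noise.
latent = rng.normal(size=(200, 3))
mixing = rng.normal(size=(3, 50))
X = latent @ mixing + 0.01 * rng.normal(size=(200, 50))

# Center, then take the thin SVD of the centered matrix.
A = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(A, full_matrices=False)

k = 3
Z = A @ Vt[:k].T    # compressed representation, shape (200, 3)
A_k = Z @ Vt[:k]    # rank-k reconstruction back in the original 50 dimensions

# Relative error in the Frobenius norm; small here because the data is nearly rank 3.
rel_error = np.linalg.norm(A - A_k) / np.linalg.norm(A)
print(Z.shape, rel_error)
```

On real data the drop-off in singular values is rarely this sharp, so the error for a given k should be checked rather than assumed.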

variance kept = (sigma_1^2 + ... + sigma_k^2) / (sigma_1^2 + ... + sigma_r^2)
# pick the smallest k that keeps, say, 95% of the variance
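
The rule above can be sketched as a short helper. The name smallest_k and the example singular values are hypothetical, chosen only to illustrate the calculation; the input is assumed to be sorted in descending order, as an SVD routine returns it.

```python
import numpy as np

def smallest_k(singular_values, target=0.95):
    """Smallest k whose top-k components keep at least `target` of the variance."""
    var = np.asarray(singular_values, dtype=float) ** 2
    # kept[i] is the fraction of variance kept by the first i + 1 components.
    kept = np.cumsum(var) / var.sum()
    # searchsorted finds the first index with kept >= target; +1 turns it into a count.
    # The min() guards against rounding leaving kept[-1] a hair below a target of 1.0.
    k = min(int(np.searchsorted(kept, target)) + 1, len(var))
    return k, kept

S = np.array([10.0, 5.0, 2.0, 1.0])   # made-up singular values
k, kept = smallest_k(S)
print(k, kept)
```

With these values the squared singular values are 100, 25, 4, and 1, so two components keep 125/130 of the variance (about 96%) and k comes out as 2.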

The explained-variance ratio guides how many components to keep.

Honest caveats