Regression is projection
Least-squares regression fits y ≈ X b. Since X b can only ever land in the column space of X, the closest we can get to y (in the sense of minimizing the squared error ||y - X b||^2) is the orthogonal projection of y onto that subspace. Requiring the residual y - X b to be orthogonal to every column of X, i.e. X^T (y - X b) = 0, gives the normal equations X^T X b = X^T y. This is Volume I's geometry, now wearing a data-science hat.
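To make the projection concrete, here is a minimal NumPy sketch on made-up data. The line 2 + 3t and the noise level are illustrative assumptions, not values from the text. It solves the normal equations and then checks that the residual is orthogonal to both columns of X.

```python
import numpy as np

# Hypothetical toy data: 20 noisy samples of the line y = 2 + 3t (values chosen for illustration).
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 20)
X = np.column_stack([np.ones_like(t), t])   # columns: intercept, slope
y = 2.0 + 3.0 * t + 0.1 * rng.standard_normal(t.size)

# Normal equations X^T X b = X^T y (solve the system rather than inverting X^T X).
b = np.linalg.solve(X.T @ X, X.T @ y)

# X b is the orthogonal projection of y onto col(X), so the residual is orthogonal to every column.
residual = y - X @ b
print("coefficients:", b)
print("X^T residual:", X.T @ residual)   # numerically ~0

# Cross-check against NumPy's SVD-based least-squares solver.
b_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print("matches lstsq:", np.allclose(b, b_lstsq))
```

In practice `np.linalg.lstsq` is the safer default, because forming X^T X squares the condition number of the problem. The normal equations are used here only because they are exactly the geometry described above.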
The Kalman filter: dynamics meets least squares
Fuse guide 4 with guide 5. A Kalman filter tracks a state that follows a linear dynamical system x_{t+1} = A x_t + noise, while you only see noisy measurements z_t = H x_t + noise. Each step predicts by pushing the state through A, then corrects by a least-squares update toward the new measurement, weighted by how much you trust each source.
- Predict: x_pred = A x_est, and propagate the uncertainty as P_pred = A P A^T + Q, where P is the state covariance and Q is the process-noise covariance.
- Compare: residual = z - H x_pred (how surprised the measurement is).
- Correct: x_est = x_pred + K * residual, where the Kalman gain K is the least-squares optimal blend.
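The three steps above can be sketched in NumPy as follows. This is a minimal illustration assuming the standard linear-Gaussian Kalman equations. The names P (state covariance), Q (process noise), R (measurement noise) and the constant-velocity toy model are illustrative assumptions that go beyond what the text specifies.

```python
import numpy as np

def kalman_step(x_est, P, z, A, H, Q, R):
    """One predict / compare / correct cycle of a linear Kalman filter."""
    # Predict: push the state through the dynamics and propagate the covariance.
    x_pred = A @ x_est
    P_pred = A @ P @ A.T + Q
    # Compare: residual (innovation) and its covariance S.
    residual = z - H @ x_pred
    S = H @ P_pred @ H.T + R
    # Correct: gain K = P_pred H^T S^{-1} blends prediction and measurement.
    K = P_pred @ H.T @ np.linalg.inv(S)   # inv is fine for this tiny S
    x_new = x_pred + K @ residual
    P_new = (np.eye(len(x_est)) - K @ H) @ P_pred
    return x_new, P_new

# Assumed toy model: constant-velocity motion, state = [position, velocity], dt = 1.
A = np.array([[1.0, 1.0], [0.0, 1.0]])
H = np.array([[1.0, 0.0]])     # only position is measured
Q = 1e-4 * np.eye(2)           # small process noise (illustrative value)
R = np.array([[4.0]])          # measurement noise variance, i.e. std 2 (illustrative value)

rng = np.random.default_rng(1)
true_x = np.array([0.0, 1.0])
x_est = np.array([0.0, 0.0])   # deliberately wrong initial velocity guess
P = 10.0 * np.eye(2)           # large initial uncertainty

for _ in range(100):
    true_x = A @ true_x
    z = H @ true_x + rng.normal(0.0, 2.0, size=1)
    x_est, P = kalman_step(x_est, P, z, A, H, Q, R)

print("true state:", true_x)
print("estimate:  ", x_est)
```

Although each position reading is off by about 2 units, the filter recovers the velocity it never observes directly. It can do this because A ties velocity to the sequence of positions.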
The DFT is just a change of basis
The discrete Fourier transform sounds like analysis, but it is pure linear algebra: it rewrites a signal in the basis of complex sinusoids. The n × n DFT matrix F, with entries F_{jk} = ω^{jk} for ω = e^{-2πi/n}, has mutually orthogonal columns of length √n. So F/√n is unitary (its columns are orthonormal), and F^{-1} is the conjugate transpose of F divided by n. Transforming and inverting are therefore the same change of basis run forward and back, with no information lost.
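A quick numerical check of the change-of-basis view, as a small NumPy sketch. The size n = 8 and the random test signal are arbitrary choices. It builds F explicitly, confirms that F/√n is unitary, matches `np.fft.fft`, and inverts with the conjugate transpose.

```python
import numpy as np

n = 8
row, col = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
F = np.exp(-2j * np.pi * row * col / n)   # F[j, k] = omega^(j k), omega = e^{-2 pi i / n}

U = F / np.sqrt(n)                        # rescaled DFT matrix
print("unitary:", np.allclose(U.conj().T @ U, np.eye(n)))

rng = np.random.default_rng(0)
x = rng.standard_normal(n)                # arbitrary test signal
X = F @ x                                 # coordinates of x in the sinusoid basis
print("matches np.fft.fft:", np.allclose(X, np.fft.fft(x)))

x_back = F.conj().T @ X / n               # inverse: conjugate transpose, divided by n
print("round trip exact:", np.allclose(x_back, x))
```

Multiplying by F costs O(n^2). The FFT computes the same product in O(n log n) by exploiting the structure of F, but the underlying linear algebra is unchanged.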
The payoff: four problems, one toolkit
Step back and the whole track is three moves played in different arenas:
- Eigenvectors find steady states and stability: PageRank, Markov chains, dynamical systems.
- Projection / low rank finds the best simple summary: PCA, regression, recommenders, embeddings.
- Change of basis picks coordinates where the problem is easy: Fourier, spectral clustering, diagonalization.
That is the whole promise of Volume II made concrete: the same handful of theorems you proved abstractly in a first course are the working engine behind search, recommendation, networks, signal processing, and control. Learn the toolkit once; recognize it everywhere.