When there is no exact answer
Real data does not sit on a perfect line. Suppose you measure many points and want a single straight line through them. Writing 'this line passes exactly through every point' gives a system A x = b with more equations than unknowns, which is called overdetermined. Almost always there is no exact solution; the noise makes the equations contradict each other.
fit y = m*t + c to three points: (t,y) = (0,1), (1,2), (2,2)

    [0 1][m]   [1]
    [1 1][c] = [2]    <- 3 equations, 2 unknowns
    [2 1]      [2]

    no exact (m,c) hits all three
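To see the contradiction concretely, here is a minimal sketch that forces a line through the first two points and then tests it against the third. The helper name line_through is made up for this illustration, and exact fractions are used only to avoid rounding noise.

```python
from fractions import Fraction

# The three data points (t, y) from the example above.
points = [(0, 1), (1, 2), (2, 2)]

def line_through(p, q):
    """Return (m, c) for the line y = m*t + c through two points with distinct t."""
    (t0, y0), (t1, y1) = p, q
    m = Fraction(y1 - y0, t1 - t0)  # slope
    c = y0 - m * t0                 # intercept
    return m, c

# Satisfy the first two equations exactly...
m, c = line_through(points[0], points[1])

# ...then see what that line predicts at the third point.
t2, y2 = points[2]
predicted = m * t2 + c

print(f"line through the first two points: m = {m}, c = {c}")
print(f"at t = {t2} it predicts {predicted}, but the data says {y2}")
```

Whichever pair of points you pick to fix the line, the remaining point disagrees, so the three equations cannot all hold at once.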
Settle for the closest thing
If you cannot hit b exactly, hit the closest reachable point instead. Every vector A x you can produce lives in the column space of A, the set of all combinations of A's columns. The vector b you actually want usually sits outside that space. Least squares picks the x whose A x is nearest to b, measured by the total squared error ||b - A x||^2. That nearest point is the perpendicular projection of b onto the column space, so the leftover error b - A x is perpendicular to the column space.
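As a rough illustration of 'total squared error', the sketch below scores a few candidate lines against the three example points. The function name squared_error and the candidate (m, c) pairs are chosen by hand for this illustration; they are not part of any library.

```python
from fractions import Fraction

# Data matrix and target from the three-point example:
# each row of A is [t, 1], and b holds the measured y values.
A = [[0, 1], [1, 1], [2, 1]]
b = [1, 2, 2]

def squared_error(m, c):
    """Total squared error ||b - A x||^2 for x = (m, c)."""
    total = 0
    for (t, one), y in zip(A, b):
        residual = y - (t * m + one * c)  # one entry of b - A x
        total += residual ** 2
    return total

# Hand-picked candidates, purely for comparison.
candidates = [
    (Fraction(1), Fraction(1)),
    (Fraction(1, 2), Fraction(1)),
    (Fraction(1, 2), Fraction(7, 6)),
]

for m, c in candidates:
    print(f"m = {m}, c = {c}: squared error = {squared_error(m, c)}")
```

A smaller number means A x lands closer to b. Least squares asks for the (m, c) that makes this number as small as possible.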
The normal equations
How do we turn 'the error is perpendicular to the column space' into something computable? Each column of A must be orthogonal to the error b - A x, i.e. A^T (b - A x) = 0. Rearranged, this gives the famous normal equations: A^T A x = A^T b. This is a square system, and when the columns of A are linearly independent it has exactly one solution, the best x.
- Form A^T A and A^T b from your data matrix A and target b.
- Solve the square system A^T A x = A^T b for x (the line's slope and intercept).
- That x minimizes the total squared error — your best-fit line.
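The steps above can be sketched in plain Python for the three-point example. This is only an illustration under some assumptions: the helper names (gram_matrix, transpose_times, solve_2x2) are invented here, the solver handles only the 2-unknown case via Cramer's rule, and exact fractions stand in for floating point. For real data you would normally reach for a numerical library instead.

```python
from fractions import Fraction

# Data matrix and target from the three-point example (rows are [t, 1]).
A = [[0, 1], [1, 1], [2, 1]]
b = [1, 2, 2]

def gram_matrix(A):
    """Return A^T A as a list of rows."""
    n = len(A[0])
    return [[sum(row[i] * row[j] for row in A) for j in range(n)]
            for i in range(n)]

def transpose_times(A, v):
    """Return A^T v as a list."""
    n = len(A[0])
    return [sum(row[i] * y for row, y in zip(A, v)) for i in range(n)]

def solve_2x2(M, v):
    """Solve the 2x2 system M x = v with Cramer's rule."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    if det == 0:
        raise ValueError("A^T A is singular: columns of A are dependent")
    x0 = Fraction(v[0] * M[1][1] - M[0][1] * v[1], det)
    x1 = Fraction(M[0][0] * v[1] - v[0] * M[1][0], det)
    return [x0, x1]

# Step 1: form A^T A and A^T b.
AtA = gram_matrix(A)
Atb = transpose_times(A, b)

# Step 2: solve the square system for slope m and intercept c.
m, c = solve_2x2(AtA, Atb)

# Step 3: confirm the error b - A x is orthogonal to every column of A.
residual = [y - (row[0] * m + row[1] * c) for row, y in zip(A, b)]
orthogonality = transpose_times(A, residual)

print("A^T A =", AtA)
print("A^T b =", Atb)
print(f"best fit: y = {m}*t + {c}")
print("A^T (b - A x) =", orthogonality)
```

For this data A^T A is [[5, 3], [3, 3]] and A^T b is [6, 5], which gives m = 1/2 and c = 7/6. The final check comes out as a vector of zeros, which is exactly the perpendicularity condition the normal equations encode.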