Least Squares: Fitting Lines to Data

When there is no exact answer

Real data does not sit on a perfect line. Suppose you measure many points and want a single straight line through them. Writing 'this line passes exactly through every point' gives one equation per point, so the system A x = b has more equations than unknowns: it is overdetermined. Almost always there is no exact solution, because the noise makes the equations contradict each other.

fit y = m*t + c to three points:
  (t,y) = (0,1), (1,2), (2,2)

[0 1][m]   [1]
[1 1][c] = [2]    <- 3 equations, 2 unknowns
[2 1]      [2]       no exact (m,c) hits all three

More equations than unknowns: usually unsolvable exactly.
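
To see the contradiction concretely, here is a minimal sketch in plain Python. It solves the first two equations exactly and then tests the third. The helper name line_through is made up for this illustration, and exact fractions are used only to avoid rounding noise.

```python
from fractions import Fraction

# The three data points (t, y) from the example above.
points = [(0, 1), (1, 2), (2, 2)]

def line_through(p, q):
    """Return (m, c) for the unique line through two points with different t."""
    (t1, y1), (t2, y2) = p, q
    m = Fraction(y2 - y1, t2 - t1)
    c = y1 - m * t1
    return m, c

# Solve the first two equations exactly, then test the third one.
m, c = line_through(points[0], points[1])
t3, y3 = points[2]
predicted = m * t3 + c

print(f"line through the first two points: m = {m}, c = {c}")
print(f"third point needs y = {y3}, but the line gives y = {predicted}")

exact_fit_exists = (predicted == y3)
print("exact fit exists:", exact_fit_exists)
```

The line forced by the first two points overshoots the third, so no single (m, c) satisfies all three equations.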

Settle for the closest thing

If you cannot hit b exactly, hit the closest reachable point instead. Every vector A x you can produce lives in the column space of A — the set of all combinations of A's columns. The vector b you actually want usually sits outside that space. Least squares picks the x whose A x is nearest to b, measured by the total squared error.
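
As a rough illustration of 'nearest, measured by the total squared error', the sketch below scores a few candidate lines against the three points from the earlier example. The candidates and the function name total_squared_error are chosen here purely for illustration; they are not a search procedure.

```python
from fractions import Fraction

points = [(0, 1), (1, 2), (2, 2)]

def total_squared_error(m, c, data):
    """Sum of squared vertical gaps between the data and the line y = m*t + c."""
    return sum((y - (m * t + c)) ** 2 for t, y in data)

# Hypothetical candidate lines (m, c), picked only to compare their errors.
candidates = {
    "through the first two points": (Fraction(1), Fraction(1)),
    "flat line at the mean of y": (Fraction(0), Fraction(5, 3)),
    "slope 1/2, intercept 7/6": (Fraction(1, 2), Fraction(7, 6)),
}

errors = {
    name: total_squared_error(m, c, points)
    for name, (m, c) in candidates.items()
}
for name, err in errors.items():
    print(f"{name}: total squared error = {err}")

# The candidate with the smallest error is the closest of these three.
best = min(errors, key=errors.get)
print("closest of these candidates:", best)
```

Each candidate line corresponds to one reachable vector A x; a smaller total squared error means that vector sits closer to b.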

The normal equations

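The closest point A x is the one whose error b - A x is perpendicular to the column space; if the error leaned along the space, moving in that direction would bring A x closer to b. How do we turn that perpendicularity into something computable? Each column of A must be orthogonal to the error, i.e. A^T (b - A x) = 0. Rearranged, these are the famous normal equations: A^T A x = A^T b. The system is now square, and it has a unique solution whenever the columns of A are linearly independent.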
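
The orthogonality condition can be checked by hand on the three-point example. The sketch below assumes x = (1/2, 7/6), which is what the normal equations give for that data, and verifies that every column of A has zero dot product with the error. The variable names are illustrative.

```python
from fractions import Fraction

# Data matrix and target for fitting y = m*t + c to (0,1), (1,2), (2,2).
A = [[0, 1], [1, 1], [2, 1]]   # each row is [t, 1]
b = [1, 2, 2]

# Assumed solution (m, c) of the normal equations for this data.
x = [Fraction(1, 2), Fraction(7, 6)]

# Error vector b - A x, one entry per data point.
residual = [
    bi - sum(aij * xj for aij, xj in zip(row, x))
    for row, bi in zip(A, b)
]

# A^T (b - A x): dot each column of A with the error.
columns = list(zip(*A))
dots = [sum(a * r for a, r in zip(col, residual)) for col in columns]

print("error b - A x:", [str(r) for r in residual])
print("A^T (b - A x):", [str(d) for d in dots])
```

Both dot products come out to zero, so the error is perpendicular to each column of A and therefore to the whole column space.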

Form A^T A and A^T b from your data matrix A and target b.
Solve the square system A^T A x = A^T b for x (the line's slope and intercept).
That x minimizes the total squared error; its entries are the slope and intercept of your best-fit line.
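
The three steps above can be sketched in plain Python for the two-unknown line fit. The function name fit_line is hypothetical, and solving the 2x2 system with Cramer's rule is a choice made here for brevity, not the only way to do it.

```python
from fractions import Fraction

def fit_line(points):
    """Least-squares fit of y = m*t + c via the normal equations."""
    A = [[Fraction(t), Fraction(1)] for t, _ in points]   # rows [t, 1]
    b = [Fraction(y) for _, y in points]

    # Step 1: form A^T A (2x2) and A^T b (length 2).
    ata = [[sum(row[i] * row[j] for row in A) for j in range(2)]
           for i in range(2)]
    atb = [sum(row[i] * bi for row, bi in zip(A, b)) for i in range(2)]

    # Step 2: solve the square system A^T A x = A^T b (Cramer's rule).
    det = ata[0][0] * ata[1][1] - ata[0][1] * ata[1][0]
    if det == 0:
        raise ValueError("A^T A is singular: need at least two distinct t values")
    m = (atb[0] * ata[1][1] - ata[0][1] * atb[1]) / det
    c = (ata[0][0] * atb[1] - ata[1][0] * atb[0]) / det

    # Step 3: (m, c) minimizes the total squared error.
    return m, c

m, c = fit_line([(0, 1), (1, 2), (2, 2)])
print(f"best-fit line: y = {m}*t + {c}")
```

For the three example points this gives slope 1/2 and intercept 7/6. For larger or noisier problems, a library routine such as NumPy's numpy.linalg.lstsq is usually a better choice than forming A^T A by hand.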