Building Orthogonal Bases

A basis you have, an orthogonal one you want

The earlier guides in this rung kept handing you orthogonality as a gift. The Sturm-Liouville problem guaranteed that eigenfunctions for different eigenvalues are automatically orthogonal, and the generalized Fourier series guide showed how cheaply you can expand a function once an orthogonal family is in hand — one integral per coefficient and no algebra to solve. But that raises an honest question we have not yet faced: where does a fresh orthogonal family actually come from when nobody hands you one? This guide answers it constructively. We are going to BUILD an orthogonal basis from the most boring raw material imaginable, and the famous polynomials will fall out as a reward.

Start with what you already trust. The powers 1, x, x^2, x^3, ... are the most familiar basis in all of mathematics: every polynomial is a combination of them, and a Taylor series is just an expansion in exactly these blocks. So the powers already span the space — they are a complete set of axes for polynomials. The trouble is that they are a SKEWED set of axes. They overlap each other badly; on the interval [-1, 1], for instance, the integral of 1 times x^2 is two-thirds, not zero, so the constant function and x^2 are far from perpendicular. Working in a skewed basis is like reading coordinates off two axes meeting at a sharp angle — every coordinate leaks into the others, and you cannot read one off without solving for all of them at once.
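To see the skew in numbers, the following minimal sketch tabulates the plain (unweighted) inner products of the first four powers on [-1, 1]. It relies on the closed form for the integral of x^n over [-1, 1]. The helper names power_integral and gram_matrix are illustrative choices, not anything defined in this guide.

```python
from fractions import Fraction

def power_integral(n):
    """Exact integral of x**n over [-1, 1]: 0 for odd n, 2/(n+1) for even n."""
    return Fraction(0) if n % 2 else Fraction(2, n + 1)

def gram_matrix(size):
    """Entry (i, j) is the integral of x**i * x**j over [-1, 1]."""
    return [[power_integral(i + j) for j in range(size)] for i in range(size)]

G = gram_matrix(4)
for row in G:
    # Any nonzero off-diagonal entry is a pair of powers that are NOT perpendicular.
    print([str(entry) for entry in row])
```

An orthogonal family would give a diagonal table. Here the entry for 1 against x^2 is 2/3 and the entry for x against x^3 is 2/5, which is exactly the leakage between axes described above.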

So the plan writes itself. We keep the spanning power of 1, x, x^2, ... (that part is already perfect) and we straighten the axes until they meet at right angles, without changing the space they describe. The straightening procedure is Gram-Schmidt orthogonalization, the single most important algorithm for turning any basis into a perpendicular one. You likely met it for vectors in finite dimensions; here it works verbatim for functions, once we agree on what 'perpendicular' means for two curves. That agreement is the whole subtlety, and it is where the weight enters.

What 'perpendicular' means for two curves

For two arrows in space, perpendicular means the dot product is zero: multiply the matching components, add them up, get zero. For two functions we copy that recipe one notch up. Multiplying matching components becomes multiplying the curves point by point; adding up the components becomes integrating over the interval. So the natural inner product of two functions f and g on an interval [a, b] is the integral over [a, b] of f(x) g(x) dx, and the curves are orthogonal exactly when that integral vanishes. The dot product became an integral, and that is the only genuinely new idea you need.

But the Sturm-Liouville machinery from guide one already warned us that orthogonality usually comes with a weight function w(x) baked in — when an operator is written in self-adjoint form, its eigenfunctions are orthogonal not under the plain integral but under a weighted one. So we adopt the general definition from the start. The weighted inner product of f and g is the integral over [a, b] of f(x) g(x) w(x) dx, where w(x) is a fixed positive weight. Two curves are orthogonal when this weighted integral is zero. This is orthogonality with a weight, and the punchline of this whole guide is that the choice of interval and weight is the ONLY input that decides which famous family you build.
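Written as a formula, the definition above reads as follows. The angle-bracket notation with a subscript w, and the norm symbol, are notational assumptions added here for compactness; setting w(x) = 1 recovers the plain inner product.

```latex
\langle f, g \rangle_w = \int_a^b f(x)\, g(x)\, w(x)\, dx,
\qquad
\| f \|_w^2 = \langle f, f \rangle_w,
\qquad
f \perp g \iff \langle f, g \rangle_w = 0 .
```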

Gram-Schmidt: subtract the shadow, repeat

The whole algorithm is one vivid idea applied over and over: to make a new vector perpendicular to the ones you already have, subtract off its shadow on each of them. Geometrically, the shadow (the projection) of a vector onto an existing axis is the part that points along that axis; remove it and what is left points purely sideways, perpendicular to that axis. Do this against every axis you have built so far, and the leftover is perpendicular to all of them at once. Gram-Schmidt just runs this shadow-removal down the list of raw powers, one at a time, in order.

1. Take the first raw power, the constant 1, unchanged; call it the first family member u_0. There is nothing earlier to be perpendicular to, so it starts the basis as-is.
2. Take the next raw power, x. Compute its shadow on u_0 (the weighted inner product of x with u_0, divided by the inner product of u_0 with itself, times u_0) and subtract that shadow off. What remains, u_1, is guaranteed perpendicular to u_0.
3. Take x^2. Subtract its shadow on u_0 AND its shadow on u_1; both must go. What remains, u_2, is perpendicular to both earlier members. Continue with x^3, x^4, ..., each time stripping off every shadow on every member already built.
4. Optionally normalize: divide each finished u_k by its own norm so it has length 1. Orthogonal is enough for clean coefficient formulas; orthonormal is just the tidied-up, unit-length version.
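The steps above can be condensed into one formula. Here the angle brackets with subscript w stand for the weighted inner product (the integral of the product of the two functions times w over the interval), and e_k is a label introduced only for the optional unit-length version; both are notational assumptions.

```latex
u_0 = 1, \qquad
u_k = x^k - \sum_{j=0}^{k-1}
      \frac{\langle x^k, u_j \rangle_w}{\langle u_j, u_j \rangle_w}\, u_j
\quad (k \ge 1), \qquad
e_k = \frac{u_k}{\sqrt{\langle u_k, u_k \rangle_w}} .
```

Each term in the sum is one shadow: the fraction is the shadow coefficient, and multiplying it by u_j gives the part of x^k that points along u_j.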

Notice one structural fact that will reassure you later: because u_k is x^k minus a combination of lower powers, every u_k is itself a polynomial of degree exactly k, with one new highest power and corrections below it. So the family you grow has exactly one polynomial of each degree, and you can stop at any degree and still have a perfectly good orthogonal basis for polynomials up to that degree. That is precisely the structure of all the classical families. Now we turn the crank with two different weights and watch two different famous families march out.

Weight 1 on [-1, 1] grows Legendre

Make the simplest possible choice: the interval [-1, 1] with the plainest weight, w(x) = 1. Here the inner product is just the ordinary integral over [-1, 1]. A symmetry shortcut saves a lot of arithmetic: over a symmetric interval, the integral of any ODD function is zero, because the left half cancels the right. Keep that in mind and the shadows mostly vanish before you compute them. Start the machine: u_0 = 1. Next, x — is its shadow on u_0 already zero? Its shadow uses the integral over [-1, 1] of x times 1, which is the integral of an odd function and so is zero. So x is already perpendicular to 1, and u_1 = x untouched.

Now the first real subtraction. Take x^2 and remove its shadows on u_0 = 1 and u_1 = x. Its shadow on x is zero (the integral of x^2 times x over [-1, 1] is an odd integral, gone). Its shadow on 1 is not zero: the integral over [-1, 1] of x^2 is two-thirds, while the integral of 1 times 1 is two, so the shadow coefficient is (two-thirds) over two, which is one-third. Subtract: u_2 = x^2 minus one-third. That is the Legendre shape — up to the conventional scaling P_2(x) = (3 x^2 minus 1) over 2, which is exactly 3/2 times our u_2. The classical Legendre polynomials just assembled themselves from the powers, with no equation, no luck, only shadow-subtraction.

Weight w(x) = 1 on [-1, 1].  Inner product <f,g> = integral_{-1}^{1} f g dx.
(Odd integrands over [-1,1] vanish, killing most shadows.)

  u0 = 1
  u1 = x  -  ( <x,1>/<1,1> ) * 1   =  x  -  0       =  x
  u2 = x^2 - ( <x^2,1>/<1,1> )*1 - ( <x^2,x>/<x,x> )*x
            <x^2,1> = 2/3,  <1,1> = 2  ->  shadow coeff = 1/3
            <x^2,x> = 0 (odd)          ->  no shadow on x
     u2 = x^2 - 1/3

Conventional rescaling (force P_n(1) = 1):
  P0 = 1,   P1 = x,   P2 = (3x^2 - 1)/2  =  (3/2) * u2

Same machine, only the WEIGHT and INTERVAL change to switch families.

Two turns of the Gram-Schmidt crank on [-1, 1] with weight 1. The leftover after shadow-subtraction is the Legendre polynomial, up to a conventional rescaling that fixes P_n(1) = 1.
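The same two turns, plus one more, can be checked mechanically. The sketch below is one possible way to do it, under stated assumptions: polynomials are stored as coefficient lists with the lowest degree first, and every inner product is expanded into exact integrals of single powers over [-1, 1]. The function names are hypothetical helpers, not part of any library.

```python
from fractions import Fraction

def moment(n):
    """Exact integral of x**n over [-1, 1] with weight 1."""
    return Fraction(0) if n % 2 else Fraction(2, n + 1)

def inner(p, q):
    """Inner product of two polynomials given as coefficient lists
    (p[i] is the coefficient of x**i), expanded power by power."""
    return sum(a * b * moment(i + j)
               for i, a in enumerate(p) for j, b in enumerate(q))

def gram_schmidt_powers(max_degree):
    """Orthogonalize 1, x, ..., x**max_degree by subtracting shadows."""
    basis = []
    for k in range(max_degree + 1):
        raw = [Fraction(0)] * k + [Fraction(1)]           # the raw power x**k
        u = list(raw)
        for prev in basis:
            coeff = inner(raw, prev) / inner(prev, prev)  # shadow coefficient
            for i, c in enumerate(prev):
                u[i] -= coeff * c                         # subtract the shadow
        basis.append(u)
    return basis

def rescale_to_one_at_one(u):
    """Divide by the value at x = 1 (the sum of the coefficients) so P(1) = 1.
    Assumes that value is nonzero, which holds for the members built here."""
    value_at_one = sum(u)
    return [c / value_at_one for c in u]

us = gram_schmidt_powers(3)
ps = [rescale_to_one_at_one(u) for u in us]
for u, p in zip(us, ps):
    print([str(c) for c in u], "->", [str(c) for c in p])
```

Read lowest degree first, the third entry of us is x^2 - 1/3 and rescales to (3x^2 - 1)/2, matching the hand computation. The fourth entry is x^3 - (3/5)x, which rescales to (5x^3 - 3x)/2, the classical P_3.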

Swap the weight, grow Hermite instead

Here is the moment the construction earns its keep: change nothing about the algorithm, only the habitat. Move to the whole real line (-infinity, infinity) and adopt the bell-curve weight w(x) = e^{-x^2}, the Gaussian shape that ran through the special-functions rung. The same odd-function symmetry still helps — over a symmetric domain with a symmetric weight, odd integrands still integrate to zero. The new ingredient is just that the inner-product integrals now run to infinity and lean on the Gaussian integral: the integral over the whole line of e^{-x^2} is the square root of pi, and the integral of x^2 e^{-x^2} is half the square root of pi.

Run the identical three steps. u_0 = 1 again. For x, its shadow on 1 uses the integral over the line of x e^{-x^2}, an odd integrand, which is zero, so u_1 = x once more. For x^2, its shadow on x is zero by oddness again, and its shadow on 1 has coefficient (the integral of x^2 e^{-x^2}) over (the integral of e^{-x^2}), which is (half the square root of pi) over (the square root of pi), exactly one-half. Subtract: u_2 = x^2 minus one-half. That is the Hermite shape: the classical (physicists') convention writes H_2(x) = 4 x^2 minus 2, which is just 4 times our u_2. The very same crank, the very same powers, but a different weight, and out walks a different family: the Hermite polynomials.
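One more turn of the crank, worked as equations, shows the pattern continuing. This step is not in the text above. It assumes one extra Gaussian integral, that the integral of x^4 e^{-x^2} over the whole line is three-quarters of the square root of pi, and it compares against the physicists' convention H_3(x) = 8x^3 - 12x.

```latex
\langle x^3, u_0 \rangle = \int_{-\infty}^{\infty} x^3 e^{-x^2}\, dx = 0,
\qquad
\langle x^3, u_2 \rangle = \int_{-\infty}^{\infty} x^3 \left(x^2 - \tfrac12\right) e^{-x^2}\, dx = 0
\quad \text{(odd integrands)},
\\[4pt]
\frac{\langle x^3, u_1 \rangle}{\langle u_1, u_1 \rangle}
= \frac{\int_{-\infty}^{\infty} x^4 e^{-x^2}\, dx}{\int_{-\infty}^{\infty} x^2 e^{-x^2}\, dx}
= \frac{\tfrac34 \sqrt{\pi}}{\tfrac12 \sqrt{\pi}} = \frac32,
\\[4pt]
u_3 = x^3 - \tfrac32\, x,
\qquad
H_3(x) = 8x^3 - 12x = 8\, u_3 .
```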

Sit with what just happened, because it is the lesson of the guide. Legendre and Hermite are not two unrelated inventions a textbook asks you to memorize. They are the SAME algorithm (orthogonalize the powers) run with two different weights, w = 1 on a finite interval versus w = e^{-x^2} on the whole line. Laguerre is the same crank with weight e^{-x} on the half-line [0, infinity); Chebyshev (first kind) is the same crank with weight 1 over the square root of (1 - x^2) on [-1, 1]. The famous families are not a zoo of special creatures; they are one construction photographed under four different lights. The weight is the dial, and turning it selects the family.
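The dial can be made literal in code. In the hedged sketch below, the only thing that changes between families is a function returning the integral of x^n against the weight. The closed forms used for those integrals are standard results but are assumptions relative to this guide. The Hermite values are given in units of the square root of pi and the Chebyshev (first kind) values in units of pi, which is harmless because a common factor cancels in every shadow coefficient. All names are illustrative.

```python
from fractions import Fraction
from math import comb, factorial

def odd_double_factorial(m):
    """(2m - 1)!! = 1 * 3 * ... * (2m - 1); the empty product (m = 0) is 1."""
    result = 1
    for odd in range(1, 2 * m, 2):
        result *= odd
    return result

# moment(n) = integral of x**n * w(x) over the family's interval.
MOMENTS = {
    # w = 1 on [-1, 1]
    "Legendre": lambda n: Fraction(0) if n % 2 else Fraction(2, n + 1),
    # w = exp(-x**2) on the whole line, divided by sqrt(pi)
    "Hermite": lambda n: Fraction(0) if n % 2
               else Fraction(odd_double_factorial(n // 2), 2 ** (n // 2)),
    # w = exp(-x) on [0, infinity)
    "Laguerre": lambda n: Fraction(factorial(n)),
    # w = 1/sqrt(1 - x**2) on [-1, 1], divided by pi
    "Chebyshev": lambda n: Fraction(0) if n % 2
                 else Fraction(comb(n, n // 2), 2 ** n),
}

def gram_schmidt_powers(moment, max_degree):
    """Orthogonalize 1, x, ..., x**max_degree for the given moment function.
    Polynomials are coefficient lists, lowest degree first."""
    def inner(p, q):
        return sum(a * b * moment(i + j)
                   for i, a in enumerate(p) for j, b in enumerate(q))

    basis = []
    for k in range(max_degree + 1):
        raw = [Fraction(0)] * k + [Fraction(1)]           # the raw power x**k
        u = list(raw)
        for prev in basis:
            coeff = inner(raw, prev) / inner(prev, prev)  # shadow coefficient
            for i, c in enumerate(prev):
                u[i] -= coeff * c
        basis.append(u)
    return basis

families = {name: gram_schmidt_powers(m, 3) for name, m in MOMENTS.items()}
for name, basis in families.items():
    print(name, [str(c) for c in basis[3]])
```

At degree 3 the four families are x^3 - (3/5)x for Legendre, x^3 - (3/2)x for Hermite, x^3 - 9x^2 + 18x - 6 for Laguerre, and x^3 - (3/4)x for Chebyshev, each equal to the classical polynomial up to the conventional scale factor. The algorithm itself never changes.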

Honest limits, and where this points next

A few honesties before you over-trust the recipe. First, normalization is a convention, not a fact of nature: our raw u_2 was x^2 minus one-third, the classical P_2 is (3 x^2 minus 1) over 2, and a physicist normalizing to unit norm would write yet another multiple. All three are the same curve up to scale, and you must always check which convention a formula assumes: the orthogonality is real, the leading constant is a choice. Second, Gram-Schmidt needs every power to be integrable against the weight; on an infinite interval the weight MUST decay faster than every x^k grows, which is why the Hermite and Laguerre weights are exponentials. A weight with only a power-law tail, such as 1 over (1 + x^2), would already make the integral of x^2 times the weight diverge, so the crank would stall almost immediately.
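For the first point, the three conventions for the degree-2 Legendre member can be lined up side by side. The hat notation for the unit-norm version is an assumption introduced here; the inner product is the plain integral over [-1, 1].

```latex
\langle u_2, u_2 \rangle
= \int_{-1}^{1} \left(x^2 - \tfrac13\right)^2 dx
= \tfrac25 - \tfrac49 + \tfrac29 = \tfrac{8}{45},
\\[4pt]
u_2(x) = x^2 - \tfrac13,
\qquad
P_2(x) = \tfrac32\, u_2(x) = \frac{3x^2 - 1}{2},
\qquad
\hat{u}_2(x) = \frac{u_2(x)}{\sqrt{8/45}} = \sqrt{\tfrac52}\; P_2(x) .
```

All three are proportional, so each is orthogonal to 1 and to x; only the constant in front differs.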

Third, a caution that bites in real computation: classical Gram-Schmidt done in finite-precision arithmetic is numerically fragile. The raw powers, especially the high ones, are nearly parallel before straightening, so subtracting their shadows means subtracting big nearly-equal numbers, and rounding errors let orthogonality quietly drift away by the time you reach high degree. The cure is a re-ordered version called modified Gram-Schmidt, which subtracts shadows one at a time and measures each new shadow against the partly straightened remainder rather than the original raw power; in exact arithmetic it is mathematically identical, but it is far steadier on a computer. For pencil-and-paper families like the ones above this never matters, but it is the reason production code rarely runs the textbook algorithm verbatim.
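The drift is easy to provoke. The sketch below is an illustration under stated assumptions rather than a polynomial computation: it uses three nearly parallel vectors in ordinary floating point, with the dot product as the inner product, and the vectors and the value of eps are chosen only to make the effect visible. The two routines differ in a single line, namely whether the shadow coefficient is measured on the original vector or on the running remainder.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    length = math.sqrt(dot(v, v))
    return [x / length for x in v]

def classical_gs(vectors):
    """Classical: every shadow coefficient is measured on the ORIGINAL vector."""
    basis = []
    for a in vectors:
        v = list(a)
        for q in basis:
            c = dot(q, a)
            v = [vi - c * qi for vi, qi in zip(v, q)]
        basis.append(normalize(v))
    return basis

def modified_gs(vectors):
    """Modified: every shadow coefficient is measured on the running remainder."""
    basis = []
    for a in vectors:
        v = list(a)
        for q in basis:
            c = dot(q, v)
            v = [vi - c * qi for vi, qi in zip(v, q)]
        basis.append(normalize(v))
    return basis

eps = 1e-8  # small enough that 1 + eps**2 rounds to 1 in double precision
vectors = [[1.0, eps, 0.0, 0.0],
           [1.0, 0.0, eps, 0.0],
           [1.0, 0.0, 0.0, eps]]

cgs = classical_gs(vectors)
mgs = modified_gs(vectors)

# For a truly orthonormal basis this dot product would be zero.
cgs_leak = abs(dot(cgs[1], cgs[2]))
mgs_leak = abs(dot(mgs[1], mgs[2]))
print("classical:", cgs_leak)
print("modified: ", mgs_leak)
```

With these inputs the classical version returns second and third vectors whose dot product is about one half, nowhere near perpendicular, while the modified version keeps it at the level of rounding error.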

Carry away the working picture. An orthogonal basis is not a magic gift; it is a basis you straighten, and Gram-Schmidt is the straightening — subtract the shadow on each axis already built and keep the perpendicular remainder. The interval and the weight are the only knobs, and turning them sends the same machine out a different door as Legendre, Hermite, Laguerre, or Chebyshev. With a genuine orthogonal family now in hand by construction, you are equipped to expand any reasonable function in it for free, exactly the generalized Fourier payoff from before. The final guide of this rung asks a different question of these same eigenfunctions — not how to build the basis, but how to estimate its eigenvalues quickly — and answers it with the Rayleigh quotient.