Estimating Eigenvalues: The Rayleigh Quotient

Why we need an estimate at all

By now in this rung you have seen the Sturm-Liouville problem from several sides: an operator L written in self-adjoint form, a weight w(x), and a ladder of eigenvalues lambda_1 < lambda_2 < lambda_3 < ... that are real, discrete, and march off to infinity, each with its own eigenfunction. Those eigenvalues are the whole point: physically they ARE the squared natural frequencies of a vibrating string, the energy levels of a quantum particle, the buckling loads of a column. But here is the uncomfortable truth this guide confronts head-on — for almost any real problem you cannot write the eigenvalues down in closed form. The string is non-uniform, the column tapers, the potential is lumpy, and the exact spectrum is simply not available.

So we lower our ambition in exactly the right way. Often we do not need ALL the eigenvalues — we need the lowest one, lambda_1, because it governs the slowest, most easily excited, most physically dominant mode: the fundamental tone of the drum, the ground-state energy of the atom, the load at which the column first buckles. And we do not always need it to ten digits — we need a trustworthy estimate, ideally one that comes with a guarantee about which side of the truth it lands on. The remarkable fact ahead is that a single quotient delivers exactly that.

The energy quotient and what it remembers

Here is the object. For a Sturm-Liouville operator built from p(x), q(x), and weight w(x), the Rayleigh quotient of a trial function u(x) (any reasonable function that obeys the boundary conditions) is R[u] = (integral of [p (u')^2 - q u^2] dx) / (integral of w u^2 dx), with both integrals taken across the interval. (This sign convention matches writing the equation as (p y')' + q y + lambda w y = 0; texts that write -(p y')' + q y = lambda w y put +q u^2 in the numerator instead.) Do not let the symbols intimidate you; read them physically. The numerator is an energy: p (u')^2 is the bending-or-stretching energy stored when you deform the system into the shape u (it grows where the slope u' is steep), and -q u^2 is a potential contribution. The denominator is the weighted size of u, its squared norm measured with the natural weight w. So R[u] is, quite literally, stored energy divided by the amount of stuff doing the storing: an energy per unit of weighted squared amplitude.

Why this particular ratio? Because it is the eigenvalue in disguise. Suppose you were lucky enough to feed it the EXACT n-th eigenfunction y_n, which satisfies L y_n = lambda_n w y_n. Multiply that equation by y_n, integrate, and move the derivative onto the other factor using integration by parts. The boundary terms drop out precisely because y_n obeys the boundary conditions (for Dirichlet or Neumann conditions they vanish outright); this is the self-adjoint structure earning its keep. Dividing what is left by the positive number integral of w y_n^2 gives exactly R[y_n] = lambda_n. The quotient does not approximate the eigenvalue when handed a true eigenfunction; it returns it on the nose. The Rayleigh quotient remembers the spectrum perfectly.
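Written out, the integration-by-parts step looks like this. The sketch assumes the operator is L y = -(p y')' - q y on an interval [a, b] (the sign convention that matches the numerator of R[u]) and boundary conditions, such as Dirichlet or Neumann, that make the bracketed boundary term vanish; with Robin conditions that term survives and is usually folded into the numerator.

```latex
\begin{aligned}
-(p\,y_n')' - q\,y_n &= \lambda_n\, w\, y_n \\
\int_a^b \bigl[-(p\,y_n')'\,y_n - q\,y_n^2\bigr]\,dx &= \lambda_n \int_a^b w\,y_n^2\,dx \\
-\bigl[p\,y_n'\,y_n\bigr]_a^b + \int_a^b \bigl[p\,(y_n')^2 - q\,y_n^2\bigr]\,dx &= \lambda_n \int_a^b w\,y_n^2\,dx \\
\int_a^b \bigl[p\,(y_n')^2 - q\,y_n^2\bigr]\,dx &= \lambda_n \int_a^b w\,y_n^2\,dx
\quad\Longrightarrow\quad R[y_n] = \lambda_n .
\end{aligned}
```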

The minimum principle: lambda_1 is the floor

Now the centerpiece, the variational characterization of the lowest eigenvalue: among ALL admissible trial functions u, the Rayleigh quotient is smallest exactly when u is the ground-state eigenfunction y_1, and its minimum value is lambda_1. In symbols, lambda_1 = min over u of R[u]. The eigenvalue problem has quietly turned into a minimization problem — and minimizing a single number over a family of shapes is something we know how to attack approximately, whereas solving the differential equation exactly we often cannot.

Where does the floor come from? Recall the deep fact from earlier in this rung: the eigenfunctions form a complete, orthogonal family, so ANY admissible u can be written as a superposition u = c_1 y_1 + c_2 y_2 + c_3 y_3 + ..., a generalized Fourier series in the eigenfunctions. Plug that expansion into R[u]. Because the y_n are orthogonal with weight w, all the cross-terms in the denominator vanish. The cross-terms in the numerator vanish too: the same integration by parts shows that the integral of [p y_m' y_n' - q y_m y_n] equals lambda_n times the integral of w y_m y_n, which is zero for m different from n. The quotient collapses into a clean weighted average: R[u] = (lambda_1 c_1^2 + lambda_2 c_2^2 + lambda_3 c_3^2 + ...) / (c_1^2 + c_2^2 + c_3^2 + ...), where I have normalized each y_n so that the integral of w y_n^2 is 1. Read that as a center of mass: it is an average of the eigenvalues lambda_n, weighted by how much of each mode your trial shape contains.

From that one line the whole theorem falls out. An average of the numbers lambda_1, lambda_2, lambda_3, ... — all of which are at least lambda_1, since lambda_1 is the smallest — can never dip below lambda_1. So R[u] is greater than or equal to lambda_1 for EVERY trial function, with equality precisely when all the weight sits on the first mode, that is when c_2 = c_3 = ... = 0 and u is a multiple of y_1. That is the minimum principle, proved by nothing deeper than 'an average cannot be smaller than its smallest ingredient.' The lowest eigenvalue is the floor of the Rayleigh quotient, and the ground state is the shape that sits on it.
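A quick numerical check of the weighted-average formula, using the clamped string y'' + lambda y = 0 on [0, 1] (p = 1, q = 0, w = 1), whose eigenpairs y_n = sin(n pi x), lambda_n = (n pi)^2 come from the earlier sine-series guides. Every sine has the same weighted norm 1/2, so that factor cancels in the ratio and the coefficients can be used as-is. This is a sketch: the helper names (`simpson`, `rayleigh`, `mode_mix`, `weighted_average`) are illustrative, and the integrals are done with a plain composite Simpson rule.

```python
import math
import random

def simpson(f, a=0.0, b=1.0, n=2000):
    """Composite Simpson's rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3

def rayleigh(u, du):
    """R[u] for p = 1, q = 0, w = 1 on [0, 1]: integral of u'^2 over integral of u^2."""
    return simpson(lambda x: du(x) ** 2) / simpson(lambda x: u(x) ** 2)

def mode_mix(coeffs):
    """Return u and u' for u = sum of c_n sin(n pi x), n = 1, 2, ..."""
    def u(x):
        return sum(c * math.sin((n + 1) * math.pi * x) for n, c in enumerate(coeffs))
    def du(x):
        return sum(c * (n + 1) * math.pi * math.cos((n + 1) * math.pi * x)
                   for n, c in enumerate(coeffs))
    return u, du

def weighted_average(coeffs):
    """(sum of lambda_n c_n^2) / (sum of c_n^2) with lambda_n = (n pi)^2."""
    lams = [((n + 1) * math.pi) ** 2 for n in range(len(coeffs))]
    return sum(l * c * c for l, c in zip(lams, coeffs)) / sum(c * c for c in coeffs)

# A random mixture of the first five modes: quadrature and formula should agree,
# and both should sit above lambda_1 = pi^2.
random.seed(0)
coeffs = [random.uniform(-1, 1) for _ in range(5)]
u, du = mode_mix(coeffs)
print(rayleigh(u, du), weighted_average(coeffs), math.pi ** 2)

# A pure eigenfunction returns its own eigenvalue: R[sin(2 pi x)] = 4 pi^2.
u2, du2 = mode_mix([0.0, 1.0])
print(rayleigh(u2, du2), 4 * math.pi ** 2)
```

Any choice of coefficients works here; the random ones just make the point that no mixture can dip below pi^2.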

Why a crude guess works so well

Here is the property that makes the method magical rather than merely valid. Near the true ground state the Rayleigh quotient is stationary: flat, like the bottom of a smooth valley. Suppose your guess is the real eigenfunction plus a small error, u = y_1 + epsilon h, where epsilon is tiny and h is orthogonal to y_1 (any y_1-component of the error merely rescales the guess, and R ignores overall scale). Push it through the weighted-average formula: the error h injects small amounts of the higher modes, with coefficients of size epsilon, and every one of them enters R[u] squared. The shape error is first order in epsilon, yet the eigenvalue error is second order, of size epsilon^2: halve the shape error and the eigenvalue error drops by about a factor of four. Roughly, a 10 percent error in your guessed shape produces an eigenvalue error on the order of a percent or a few percent; the exact factor depends on which higher modes the error excites, since mode n contributes in proportion to lambda_n - lambda_1. This is the same flatness-at-a-minimum you met in Volume I: at a minimum of any smooth function the derivative is zero, so first-order wiggles cost you nothing and only the second-order curvature is felt.
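A sketch of the epsilon^2 scaling for the clamped string (p = 1, q = 0, w = 1 on [0, 1]). The error direction h = sin(2 pi x) is an arbitrary illustrative choice; for it the weighted-average formula predicts a relative eigenvalue error of 3 epsilon^2 / (1 + epsilon^2), so each halving of epsilon should shrink the error by close to a factor of four. The quotient itself is computed by quadrature, not from that formula.

```python
import math

def simpson(f, a=0.0, b=1.0, n=2000):
    """Composite Simpson's rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3

def rayleigh_perturbed(eps):
    """R[u] for u = sin(pi x) + eps * sin(2 pi x), with p = 1, q = 0, w = 1 on [0, 1]."""
    u = lambda x: math.sin(math.pi * x) + eps * math.sin(2 * math.pi * x)
    du = lambda x: (math.pi * math.cos(math.pi * x)
                    + 2 * math.pi * eps * math.cos(2 * math.pi * x))
    return simpson(lambda x: du(x) ** 2) / simpson(lambda x: u(x) ** 2)

lam1 = math.pi ** 2
for eps in (0.2, 0.1, 0.05, 0.025):
    rel_err = (rayleigh_perturbed(eps) - lam1) / lam1
    print(f"eps = {eps:<6} relative eigenvalue error = {rel_err:.5f}")
```

With this particular h, a 10 percent shape error costs about 3 percent in the eigenvalue; an error made of higher modes would cost more, one made of modes closer to the ground state less.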

Let us make it concrete with the cleanest possible case. Take y'' + lambda y = 0 on the interval 0 to 1 with y(0) = y(1) = 0 — the plucked string clamped at both ends. Here p = 1, q = 0, w = 1. The exact answer you already know from the sine series of earlier guides: lambda_1 = pi^2 = 9.8696, with eigenfunction sin(pi x). Now pretend you do not know that, and simply guess the simplest curve that is zero at both ends and bulges up in the middle: the parabola u = x(1 - x). One integral for the numerator, one for the denominator, and you are done.

Problem:  y'' + lambda y = 0,  y(0) = y(1) = 0   (p=1, q=0, w=1)
Exact:    lambda_1 = pi^2 = 9.8696...   ,  y_1 = sin(pi x)

Trial shape (zero at both ends, bulges in the middle):
    u = x(1 - x)        u' = 1 - 2x

Numerator   = integral_0^1 (u')^2 dx = integral_0^1 (1 - 2x)^2 dx = 1/3
Denominator = integral_0^1  u^2  dx = integral_0^1 [x(1-x)]^2 dx = 1/30

    R[u] = (1/3) / (1/30) = 10

Compare:  10  vs  pi^2 = 9.8696...   ->  +1.3%, and ABOVE the truth.

A one-line parabola overestimates the fundamental eigenvalue by only 1.3 percent — and the minimum principle guarantees the estimate sits above pi^2, never below.

Sit with that result. A parabola is visibly not a sine wave: scaled to the same peak height, it is too blunt at the top and too curved near the clamped ends, and its second derivative is the constant -2 rather than being proportional to u itself, so it cannot satisfy u'' + lambda u = 0 for any lambda. And yet R[u] = 10 misses the exact pi^2 = 9.8696 by barely over one percent. That is the second-order stationarity at work: a shape that is roughly right in the eyeball sense gives an eigenvalue that is precisely right in the decimal sense. And we got it for free, with two elementary integrals and no differential equation solved.
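The two elementary integrals can be checked exactly with Python's standard `fractions` module by treating u and u' as polynomial coefficient lists. This is only a sketch; the helpers `poly_mul` and `integrate_0_1` are illustrative names, not library functions.

```python
from fractions import Fraction
import math

def poly_mul(a, b):
    """Multiply polynomials given as coefficient lists in increasing powers of x."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def integrate_0_1(c):
    """Exact integral over [0, 1] of sum c_k x^k, using integral of x^k = 1/(k+1)."""
    return sum(ck / (k + 1) for k, ck in enumerate(c))

u = [Fraction(0), Fraction(1), Fraction(-1)]   # u = x - x^2 = x(1 - x)
du = [Fraction(1), Fraction(-2)]               # u' = 1 - 2x

numerator = integrate_0_1(poly_mul(du, du))    # integral of (u')^2
denominator = integrate_0_1(poly_mul(u, u))    # integral of u^2
R = numerator / denominator

print(numerator, denominator, R)               # 1/3 1/30 10
print(f"relative error vs pi^2: {(R - math.pi ** 2) / math.pi ** 2:+.4f}")
```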

Turning the principle into a procedure

The single parabola was a one-shot estimate. To do better, do not guess once — guess a whole family and let the minimum principle pick the best member. Write the trial function with a few free knobs, u = a_1 phi_1 + a_2 phi_2 + ... + a_N phi_N, where the phi_k are chosen basis shapes (each satisfying the boundary conditions) and the a_k are numbers you get to tune. Then R[u] becomes an ordinary function of the coefficients, and 'minimize over all u' becomes 'minimize over the a_k' — a finite, concrete optimization. This is the Rayleigh-Ritz method, the practical engine behind the principle.

Choose N trial shapes phi_1, ..., phi_N that each satisfy the boundary conditions, and form u = a_1 phi_1 + ... + a_N phi_N with unknown coefficients.
Build the stiffness matrix A, with entries A_ij = integral of [p phi_i' phi_j' - q phi_i phi_j] dx, and the mass matrix B, with entries B_ij = integral of w phi_i phi_j dx, so that the numerator is a-transpose A a and the denominator is a-transpose B a.
Setting the gradient of R to zero turns the minimization into a generalized matrix eigenvalue problem A a = mu B a — the calculus-of-variations problem has become linear algebra.
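The smallest matrix eigenvalue mu_1 is your estimate of lambda_1, and it never falls below lambda_1. Enlarging the family (keeping the old shapes and adding new ones) can never push the estimate UP, so as the family grows the estimate moves down monotonically toward the true value.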
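The four steps above can be sketched for the clamped string (p = 1, q = 0, w = 1 on [0, 1]). The basis phi_k(x) = x^k (1 - x), k = 1..N, is an illustrative choice (each shape vanishes at both ends), and the generalized problem A a = mu B a is reduced to a standard symmetric one through a Cholesky factorization B = L L^T. The sketch assumes NumPy is available; the helper names are hypothetical.

```python
import math
import numpy as np

def poly_mul(a, b):
    """Multiply polynomials stored as coefficient lists in increasing powers of x."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def integrate_0_1(c):
    """Integral over [0, 1] of sum c_k x^k."""
    return sum(ck / (k + 1) for k, ck in enumerate(c))

def derivative(c):
    return [k * c[k] for k in range(1, len(c))]

def basis(k):
    """phi_k(x) = x^k (1 - x) = x^k - x^(k+1); vanishes at x = 0 and x = 1 for k >= 1."""
    c = [0.0] * (k + 2)
    c[k], c[k + 1] = 1.0, -1.0
    return c

def rayleigh_ritz(n):
    """Ascending eigenvalues mu of A a = mu B a using the first n basis shapes."""
    phis = [basis(k) for k in range(1, n + 1)]
    dphis = [derivative(c) for c in phis]
    # Stiffness A_ij = integral of phi_i' phi_j' (q = 0); mass B_ij = integral of phi_i phi_j (w = 1)
    A = np.array([[integrate_0_1(poly_mul(dphis[i], dphis[j])) for j in range(n)]
                  for i in range(n)])
    B = np.array([[integrate_0_1(poly_mul(phis[i], phis[j])) for j in range(n)]
                  for i in range(n)])
    # B = L L^T, so A a = mu B a becomes C b = mu b with C = L^-1 A L^-T and b = L^T a
    L = np.linalg.cholesky(B)
    L_inv = np.linalg.inv(L)
    C = L_inv @ A @ L_inv.T
    return np.linalg.eigvalsh(C)

for n in range(1, 6):
    mu = rayleigh_ritz(n)
    print(n, mu[0], mu[0] - math.pi ** 2)
```

With one shape the estimate is exactly 10, reproducing the parabola. The second shape x^2(1 - x) only adds content that is antisymmetric about x = 1/2, so mu_1 stays at 10 (it does not go up, but it need not go down either); from three shapes on it drops toward pi^2 without crossing it. For n = 2 the second matrix eigenvalue is 42, above lambda_2 = 4 pi^2 (about 39.48), consistent with the Courant-Fischer min-max characterization.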

Honest limits, and the higher modes

Be clear-eyed about what the guarantee does and does not cover. The basic minimum principle delivers an UPPER bound on lambda_1, and in this form only on lambda_1. It never underestimates: your number is too big or, at best, exactly right, never too small, which is genuinely useful (in quantum mechanics this is the famous variational principle: a computed ground-state energy can never fall below the true one). But it gives you no automatic lower bound, so 'how far above am I?' is not answered by the method itself; that requires extra, harder estimates. The one-sidedness is built into the method, not a flaw you can fix by guessing better.

What about lambda_2, lambda_3, and beyond? The same machine reaches them, with one extra rule. Recall the weighted-average picture: R[u] is dragged down by whatever lambda_1-content the trial function carries. So to estimate the SECOND eigenvalue, restrict your trial functions to be orthogonal (with weight w) to the ground state. That forces c_1 = 0 and removes the lowest mode from the average, so the minimum of R[u] over the restricted set is exactly lambda_2 and any trial function in it gives an upper bound for lambda_2. Knock out the first two modes and you bound lambda_3, and so on up the ladder. The catch is that imposing the orthogonality requires the earlier eigenfunctions exactly. (The slicker version is the Courant-Fischer min-max characterization, which reaches lambda_n without needing the earlier eigenfunctions in hand.)
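A small sketch of the orthogonality rule on the clamped string (p = 1, q = 0, w = 1 on [0, 1]), assuming the ground state y_1 = sin(pi x) is known exactly. The trial shape u = x^2(1 - x) is an arbitrary illustrative choice. Left alone it has R[u] = 14, which is below lambda_2 = 4 pi^2 and therefore bounds nothing about lambda_2; after its y_1-component is subtracted, its quotient lands above 4 pi^2 as the rule promises.

```python
import math

def simpson(f, a=0.0, b=1.0, n=2000):
    """Composite Simpson's rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3

def rayleigh(u, du):
    """R[u] for p = 1, q = 0, w = 1 on [0, 1]."""
    return simpson(lambda x: du(x) ** 2) / simpson(lambda x: u(x) ** 2)

# Ground state, assumed known exactly
y1 = lambda x: math.sin(math.pi * x)
dy1 = lambda x: math.pi * math.cos(math.pi * x)

# Illustrative trial shape, not orthogonal to y1
u = lambda x: x ** 2 * (1 - x)
du = lambda x: 2 * x - 3 * x ** 2

# Subtract the y1-component: c = <u, y1> / <y1, y1>, inner product with weight w = 1
c = simpson(lambda x: u(x) * y1(x)) / simpson(lambda x: y1(x) ** 2)
u_perp = lambda x: u(x) - c * y1(x)
du_perp = lambda x: du(x) - c * dy1(x)

R_raw = rayleigh(u, du)             # 14, below 4 pi^2
R_perp = rayleigh(u_perp, du_perp)  # above 4 pi^2
print(R_raw, R_perp, 4 * math.pi ** 2)
```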

One last guardrail worth stating plainly. The whole minimum principle leans on the Sturm-Liouville structure being in force: a self-adjoint operator with proper boundary conditions, and coefficients p and w that are positive inside the interval. That is what makes the eigenvalues real and bounded below, supplies the complete orthogonal basis the proof relied on, and keeps the denominator of R positive and the derivative term of the numerator nonnegative. Strip those hypotheses away (a non-self-adjoint operator, a sign-changing weight) and the comforting 'always an upper bound' can fail. The Rayleigh quotient is not a universal oracle; it is the precise reward you earn for having a genuine Sturm-Liouville problem, which, as this rung has argued throughout, is exactly the setting where Fourier, Legendre, and Bessel expansions all live.