The Sturm–Liouville Problem

The question hiding behind every expansion

By now you have met the same move again and again on this ladder: take an arbitrary function and write it as a sum of simpler pieces. A Fourier series builds it from sines and cosines; a Legendre series from polynomials on [-1, 1]; a Bessel series from the oscillating functions that live on a drumhead. Each of these felt like its own small miracle — why should an *arbitrary* function be expressible in exactly these special pieces, and why are the pieces always *orthogonal*, so that you can pick off one coefficient at a time? The honest answer is that these are not three miracles but one. They are all instances of a single eigenvalue problem named after Charles Sturm and Joseph Liouville, who studied it in the 1830s.

Here is the shape of it. A [[sturm-liouville-problem|Sturm–Liouville problem]] is a second-order linear differential equation carrying an undetermined parameter lambda, posed on an interval [a, b] together with conditions at the two ends. In its canonical written form it reads (d/dx)(p(x) dy/dx) + q(x) y + lambda w(x) y = 0, where p(x), q(x), and the [[weight-function|weight function]] w(x) are given, with p and w strictly positive inside the interval. The task is not to solve it for one fixed lambda. It is to find the *special* values of lambda, the eigenvalues lambda_1, lambda_2, lambda_3, ..., for which a nonzero solution exists that also obeys the boundary conditions. Each such lambda_n comes paired with its own eigenfunction y_n(x).

The self-adjoint form: a symmetric matrix in disguise

Strip the parameter and the weight away for a moment and look at the differential operator itself: L[y] = (d/dx)(p(x) dy/dx) + q(x) y. The crucial thing is the very first term, where the derivative is *packaged inside another derivative* — p times y', and then the whole thing differentiated again. This particular packaging is called the [[self-adjoint-form|self-adjoint form]], and it is the differential-equation analogue of a symmetric matrix [a, b; b, c] (note the matching off-diagonal entries). A symmetric matrix is the kind that hands you real eigenvalues and orthogonal eigenvectors for free; an operator written in self-adjoint form inherits exactly those guarantees. Almost every good property in this whole subject traces back to this one shape.

But a typical second-order equation does not arrive in this tidy shape. It usually looks like a(x) y'' + b(x) y' + c(x) y + lambda y = 0, with the first-derivative term standing on its own outside any package. The beautiful fact, and the reason the theory is so broad, is that *any* such equation can be forced into self-adjoint form on an interval where a(x) does not vanish. You multiply the whole equation through by a cleverly chosen factor, after which the first two terms collapse into one exact derivative (d/dx)(p y'). This is precisely the [[integrating-factor|integrating factor]] trick you first used on first-order linear ODEs early in this volume, now doing heavier lifting. The required multiplier is mu(x) = (1/a) exp(integral of (b/a) dx), and afterward p(x) = mu(x) a(x), q(x) = mu(x) c(x), and w(x) = mu(x).

General form:    a y'' + b y' + c y + lambda y = 0      (not self-adjoint)

  multiply by    mu(x) = (1/a) exp( integral (b/a) dx )

Self-adjoint:    (p y')' + q y + lambda w y = 0           p = mu*a,  q = mu*c,  w = mu
  ___________________________________________________________________
  Bessel:   x^2 y'' + x y' + (x^2 - n^2) y = 0
            divide by x  ->  (x y')' + (x - n^2/x) y = 0
            so   p = x,   q = -n^2/x,   weight  w = x   (the lone x y term is lambda w y with lambda scaled to 1)
  ___________________________________________________________________
  Legendre: (1-x^2) y'' - 2x y' + lambda y = 0    is ALREADY self-adjoint:
            ((1-x^2) y')' + lambda y = 0
            so   p = 1-x^2,   q = 0,   weight  w = 1   on  [-1, 1]

One multiplier turns any linear second-order equation self-adjoint. The weight w is whatever ends up multiplying lambda y — and it is rarely just 1.
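
The multiplier recipe can be checked numerically. The sketch below is a minimal illustration, not a general-purpose tool: the helper names `simpson` and `integrating_factor` and the lower integration limit `x0 = 1` are assumptions made here for the example (changing `x0` only rescales mu by a constant). It applies the formula to the Bessel coefficients a = x^2, b = x and confirms that p' = mu*b, which is exactly the condition for the first two terms to collapse into (p y')'.

```python
import math

def simpson(f, lo, hi, n=200):
    """Composite Simpson rule on [lo, hi] with n (even) subintervals."""
    if n % 2:
        n += 1
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3

def integrating_factor(a, b, x, x0=1.0):
    """mu(x) = (1/a(x)) * exp(integral from x0 to x of b/a).

    The lower limit x0 is an arbitrary choice for this sketch; a different
    x0 multiplies mu by a constant and leaves the self-adjoint form intact.
    """
    return math.exp(simpson(lambda t: b(t) / a(t), x0, x)) / a(x)

# Bessel: x^2 y'' + x y' + ... = 0, so a(x) = x^2 and b(x) = x.
a = lambda x: x * x
b = lambda x: x

mu = lambda x: integrating_factor(a, b, x)
p = lambda x: mu(x) * a(x)

for x in (0.5, 2.0, 3.0):
    h = 1e-4
    dp = (p(x + h) - p(x - h)) / (2 * h)  # central difference for p'(x)
    print(f"x={x}: mu={mu(x):.6f}  p={p(x):.6f}  p'={dp:.6f}  mu*b={mu(x) * b(x):.6f}")
```

With these choices mu should track 1/x and p should track x, matching the "divide by x" step in the Bessel row above.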

Why self-adjointness forces orthogonality

It is worth seeing, not just being told, why this packaging matters — because the proof is short and it shows you exactly where everything comes from. The whole argument rides on integration by parts, the very same integration by parts you learned in first-year calculus, applied twice. Take two eigenfunctions, y_m and y_n, with their eigenvalues lambda_m and lambda_n. Multiply the equation for y_m by y_n, the equation for y_n by y_m, subtract, and integrate across the interval. The q y terms cancel cleanly, and after moving the derivatives around by parts, almost everything else cancels too.

What is left is an exact identity: (lambda_m - lambda_n) times the integral from a to b of y_m(x) y_n(x) w(x) dx equals the boundary term p(x)(y_m' y_n - y_m y_n'), taken as its value at a minus its value at b. Read this carefully, because it is the hinge of the whole subject. *If* the boundary conditions are chosen so that this boundary term vanishes (and that is exactly the job of the Sturm–Liouville boundary conditions, coming next), then the right side is zero. For two distinct eigenvalues, lambda_m minus lambda_n is not zero, so the only way to satisfy the identity is for the integral itself to vanish: the integral of y_m y_n w over [a, b] equals zero. That is [[orthogonality-of-eigenfunctions|orthogonality]], falling straight out of the self-adjoint shape plus a cooperative boundary.
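
For readers who want the cancellation spelled out, here is one way to write the computation, using the sign convention (p y')' + q y + lambda w y = 0 from earlier. It is a sketch of the standard argument; recognising the exact derivative in the middle step is equivalent to integrating each term by parts and watching the p y_m' y_n' pieces cancel.

```latex
\begin{align*}
(p\,y_m')' + q\,y_m + \lambda_m w\,y_m &= 0, \\
(p\,y_n')' + q\,y_n + \lambda_n w\,y_n &= 0.
\end{align*}
% Multiply the first by y_n, the second by y_m, and subtract; the q terms cancel:
\[
y_n\,(p\,y_m')' - y_m\,(p\,y_n')' + (\lambda_m - \lambda_n)\,w\,y_m y_n = 0 .
\]
% The first two terms form an exact derivative:
\[
y_n\,(p\,y_m')' - y_m\,(p\,y_n')'
  = \frac{d}{dx}\Bigl[\,p\,\bigl(y_m' y_n - y_m y_n'\bigr)\Bigr].
\]
% Integrate from a to b and move the boundary term to the right:
\[
(\lambda_m - \lambda_n)\int_a^b y_m\,y_n\,w\,dx
  = \Bigl[\,p\,\bigl(y_m' y_n - y_m y_n'\bigr)\Bigr]_{x=a}
  - \Bigl[\,p\,\bigl(y_m' y_n - y_m y_n'\bigr)\Bigr]_{x=b}.
\]
```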

The boundary conditions that make it work

Everything just hinged on one phrase: *the boundary term must vanish*. So the boundary conditions are not decoration — they are half of the problem, and the wrong ones wreck the guarantees. A Sturm–Liouville problem is genuinely a boundary-value problem, not an initial-value problem: instead of fixing y and y' at one starting point and marching forward, you pin down behavior at *both* ends at once, and only special lambda values can thread that needle. The admissible [[sturm-liouville-boundary-conditions|Sturm–Liouville boundary conditions]] come in three families, and each is just a way of guaranteeing that p(y_m' y_n - y_m y_n') cancels between the two endpoints.

Regular (separated) conditions impose a homogeneous relation at each end independently: alpha_1 y(a) + alpha_2 y'(a) = 0 and beta_1 y(b) + beta_2 y'(b) = 0, with the two constants at each end not both zero. The familiar special cases are Dirichlet (y = 0, the string clamped to the wall), Neumann (y' = 0, an insulated end of a heated bar where no heat flows out), and Robin (a mix of value and slope, like a bar losing heat to the air by Newton's cooling). Each kills the boundary term at its own endpoint, separately.

Periodic conditions tie the two ends together: y(a) = y(b) and y'(a) = y'(b), as on a closed ring where the two ends are literally the same point. Here the boundary term at a exactly cancels the one at b because everything matches across the seam, provided p(a) = p(b) as well. This is the case behind the full Fourier series with both sines and cosines, and it carries a small subtlety: a periodic eigenvalue can carry two independent eigenfunctions at once (a degeneracy), unlike the regular case where each eigenvalue is simple.

Singular conditions appear when p(x) itself vanishes at an endpoint (p = 1-x^2 dies at x = +/-1 for Legendre; p = x dies at x = 0 for Bessel) or when the interval runs off to infinity. There you cannot pin down y the usual way. At an endpoint where p vanishes you do not need to, because p = 0 there kills the boundary term on its own as long as y and y' stay bounded. Instead you impose a *regularity* demand: y must stay bounded, or stay square-integrable against the weight. The unspoken boundary condition for the sphere is exactly this: the solution must not blow up at the poles.

The modeling lesson is that the *physics* dictates which family you are in, and getting it wrong silently corrupts everything downstream. A string clamped at both ends is Dirichlet and gives a sine basis; the very same equation with both ends insulated is Neumann and flips the basis to cosines, even admitting a flat lambda = 0 mode that encodes conserved total heat. Same operator, different boundary, different orthogonal family. So before reaching for a known expansion, always ask which of these three situations the real problem lives in.
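
To see the boundary conditions selecting the eigenvalues, here is a small shooting-method sketch. It assumes the simplest operator, y'' + lambda y = 0 on [0, pi] (so p = 1, q = 0, w = 1); that interval, the RK4 step count, the bisection brackets, and the helper names are all choices made for this illustration. For a trial lambda it integrates from the left end and reports how badly the right-end condition is missed; eigenvalues are the lambdas where the miss is zero.

```python
import math

def shoot(lam, y0, dy0, length=math.pi, steps=400):
    """Integrate y'' + lam*y = 0 from x=0 with RK4; return (y, y') at x=length."""
    h = length / steps
    y, v = y0, dy0
    for _ in range(steps):
        # First-order system: y' = v, v' = -lam * y
        k1y, k1v = v, -lam * y
        k2y, k2v = v + 0.5 * h * k1v, -lam * (y + 0.5 * h * k1y)
        k3y, k3v = v + 0.5 * h * k2v, -lam * (y + 0.5 * h * k2y)
        k4y, k4v = v + h * k3v, -lam * (y + h * k3y)
        y += h * (k1y + 2 * k2y + 2 * k3y + k4y) / 6
        v += h * (k1v + 2 * k2v + 2 * k3v + k4v) / 6
    return y, v

def dirichlet_mismatch(lam):
    # Clamped start: y(0) = 0, slope normalised to 1. Target: y(pi) = 0.
    return shoot(lam, 0.0, 1.0)[0]

def neumann_mismatch(lam):
    # Insulated start: y'(0) = 0, value normalised to 1. Target: y'(pi) = 0.
    return shoot(lam, 1.0, 0.0)[1]

def bisect(g, lo, hi, tol=1e-10):
    """Plain bisection; assumes g changes sign on [lo, hi]."""
    glo = g(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        gmid = g(mid)
        if glo * gmid <= 0:
            hi = mid
        else:
            lo, glo = mid, gmid
    return 0.5 * (lo + hi)

brackets = [(0.5, 2.0), (3.0, 5.0), (8.0, 10.0)]  # hand-picked for this example
dirichlet = [bisect(dirichlet_mismatch, lo, hi) for lo, hi in brackets]
neumann = [bisect(neumann_mismatch, lo, hi) for lo, hi in brackets]

print("Dirichlet eigenvalues:", [round(v, 6) for v in dirichlet])
print("Neumann eigenvalues:  ", [round(v, 6) for v in neumann])
print("lambda = 0 mismatch, Dirichlet:", dirichlet_mismatch(0.0))
print("lambda = 0 mismatch, Neumann:  ", neumann_mismatch(0.0))
```

On this interval both lists should land near 1, 4, 9 (the sine and cosine modes share those values), while only the Neumann case also accepts lambda = 0: the constant function satisfies y' = 0 at both ends, whereas the clamped problem misses its target there.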

The weight function: the hidden choice in 'orthogonal'

Whenever you call two functions orthogonal, you have quietly committed to a particular way of multiplying and adding them up — an inner product, built from a definite integral. The weight function w(x) is that quiet commitment made loud. Two functions f and g are orthogonal *with respect to the weight w* exactly when the integral from a to b of f(x) g(x) w(x) dx equals zero. Change the weight and the same two functions may stop being orthogonal; the word 'orthogonal' is meaningless until you name the weight that goes with it. For plain Fourier sines the weight is the boring w = 1, which is why nobody mentions it there — but that silence has misled many a student into thinking weight is optional.
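
A quick numerical check makes the dependence on the weight concrete. The sketch below takes the first two even Legendre polynomials, P_0 = 1 and P_2 = (3x^2 - 1)/2, and computes their inner product on [-1, 1] twice: once with w = 1, and once with a deliberately different weight w = 1 - x^2 chosen here purely for contrast. The helper names `simpson` and `inner` are assumptions for this example.

```python
def simpson(f, lo, hi, n=2000):
    """Composite Simpson rule on [lo, hi] with n (even) subintervals."""
    if n % 2:
        n += 1
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3

def inner(f, g, w, a, b):
    """Weighted inner product: integral over [a, b] of f(x) g(x) w(x) dx."""
    return simpson(lambda x: f(x) * g(x) * w(x), a, b)

P0 = lambda x: 1.0
P2 = lambda x: 0.5 * (3 * x * x - 1)

flat = lambda x: 1.0           # the weight the Legendre equation supplies
other = lambda x: 1 - x * x    # a different weight, picked only for contrast

with_flat = inner(P0, P2, flat, -1.0, 1.0)
with_other = inner(P0, P2, other, -1.0, 1.0)

print("inner product with w = 1:      ", with_flat)
print("inner product with w = 1 - x^2:", with_other)
```

The first value should be zero to rounding error; the second works out by hand to -4/15, so the same two functions are orthogonal under one weight and not under the other.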

And where does the weight come from? You do not choose it by hand — you *read it off* the self-adjoint equation. It is simply whatever function ends up multiplying lambda y: the w in (d/dx)(p y') + q y + lambda w y = 0. This is no coincidence, and the proof from two sections ago shows why: the term that survived all the cancellation was exactly the integral of (lambda_m - lambda_n) y_m y_n w, so the eigenfunctions come out orthogonal against w, never against the bare dx. The classics make the point vivid. For the Fourier sine basis w = 1; for Legendre on [-1, 1] also w = 1; but for Bessel w = x, and for the Hermite polynomials on the whole real line w = e^{-x^2} — without that exponential weight the Hermite integrals would simply diverge and there would be no orthogonality to speak of.

The practical sting is that the weight is the one thing students most often drop. To extract the coefficient of y_n in an expansion you compute (integral of f y_n w dx) divided by (integral of y_n^2 w dx) — and forgetting the w in either integral hands you wrong numbers and a series that refuses to rebuild f. One honest caveat: the weight must be strictly positive throughout the open interval, only ever allowed to touch zero at a singular endpoint. If w changed sign anywhere inside, the inner-product structure would collapse and the eigenvalues could go complex — the whole edifice rests on w > 0.

What you are guaranteed, and where it goes next

Assemble the three ingredients — self-adjoint form, the right boundary conditions, a positive weight — and a regular Sturm–Liouville problem hands you a remarkable package of guarantees, the continuous echo of everything a symmetric matrix gives. First, the eigenvalues are [[real-discrete-eigenvalues|real and discrete]]: they form an increasing sequence lambda_1 < lambda_2 < lambda_3 < ... marching off to plus infinity, never complex, never crowding together. Second, the eigenfunctions are mutually orthogonal against the weight, as we proved. Third — and this is the deepest, the part we have only asserted — the eigenfunctions are *complete*: any reasonable function on the interval can be expanded in them, with no piece left over.

Completeness is the licence to expand. It says the eigenfunctions y_n form a basis for functions just as the standard axes form a basis for ordinary vectors, so you may write f(x) = sum over n of c_n y_n(x). And because of orthogonality you can read each coefficient off in isolation: c_n = (integral of f y_n w dx) / (integral of y_n^2 w dx). That formula — multiply by one eigenfunction, integrate against the weight, divide by its norm — *is* the general recipe behind every Fourier coefficient you have ever computed. Bundle it all up and you have the [[generalized-fourier-series|generalized Fourier series]]: the single template that the ordinary Fourier, Legendre, and Bessel series are just three fillings of.
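
The coefficient recipe translates almost word for word into code. The sketch below is an illustration under stated assumptions: it uses the first three Hermite polynomials in the physicists' convention (H_0 = 1, H_1 = 2x, H_2 = 4x^2 - 2) with weight e^{-x^2}, truncates the real line to [-8, 8] where the weight is negligible, and expands f(x) = x^2. The helper names and the truncation length are choices made here. It also recomputes the coefficients with the weight dropped, to show how far off the numbers go.

```python
import math

def simpson(f, lo, hi, n=2000):
    """Composite Simpson rule on [lo, hi] with n (even) subintervals."""
    if n % 2:
        n += 1
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3

def coefficient(f, y_n, w, a, b):
    """c_n = (integral of f y_n w dx) / (integral of y_n^2 w dx) over [a, b]."""
    num = simpson(lambda x: f(x) * y_n(x) * w(x), a, b)
    den = simpson(lambda x: y_n(x) ** 2 * w(x), a, b)
    return num / den

# First three Hermite polynomials (physicists' convention).
H = [lambda x: 1.0, lambda x: 2 * x, lambda x: 4 * x * x - 2]
w = lambda x: math.exp(-x * x)   # Hermite weight
f = lambda x: x * x              # function to expand
L = 8.0                          # truncation of the real line for this sketch

c = [coefficient(f, Hn, w, -L, L) for Hn in H]
rebuilt = lambda x: sum(cn * Hn(x) for cn, Hn in zip(c, H))

# Same recipe with the weight forgotten (w replaced by 1).
c_bad = [coefficient(f, Hn, lambda x: 1.0, -L, L) for Hn in H]

print("coefficients with weight:   ", [round(v, 6) for v in c])
print("rebuilt f(1.5) vs exact:    ", rebuilt(1.5), f(1.5))
print("coefficients without weight:", [round(v, 6) for v in c_bad])
```

With the weight in place the coefficients should come out near 1/2, 0, 1/4, which matches the identity x^2 = (1/2) H_0 + (1/4) H_2. Without it, the "coefficients" depend on the arbitrary truncation length and no longer rebuild f.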

Be honest about the fine print, though. The cleanest theorems above are stated for a *regular* problem — finite interval, p and w strictly positive throughout, separated boundary conditions. The singular cases (Legendre, Bessel, the infinite line) still work beautifully but need extra care: there a continuous spectrum can appear instead of a tidy discrete list, and 'completeness' has to be stated with more attention to which functions and which kind of convergence. None of this undoes the picture; it just means the guarantees are theorems with hypotheses, not magic. This guide built the frame. The guides that follow in this rung fill it in — the orthogonality and completeness made rigorous, then the Legendre and Bessel problems worked end to end — and you will recognize every one of them as this same self-adjoint eigenvalue problem wearing a new coefficient p, a new weight w, and a new pair of ends.