Convolution: blur in space, multiply in frequency
The convolution (f * g)(x) = integral over ℝ of f(t) g(x − t) dt is a moving weighted average: slide g over f and accumulate overlap. Its magic is the convolution theorem — it turns the entangled operation of convolution into plain multiplication on the Fourier side.
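Convolution theorem: (f * g)-hat(xi) = f-hat(xi) · g-hat(xi), where f-hat(xi) = integral_x f(x) e^{-2pi i x xi} dx and · is ordinary pointwise multiplication (not convolution).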
Proof (f, g in L^1, so Fubini applies):
(f * g)-hat(xi) = integral_x [ integral_t f(t) g(x - t) dt ] e^{-2pi i x xi} dx.
Swap order (Fubini applies because integral_t integral_x |f(t)| |g(x - t)| dx dt = ||f||_1 ||g||_1 < infinity):
= integral_t f(t) [ integral_x g(x - t) e^{-2pi i x xi} dx ] dt.
Inner integral: substitute u = x - t, so e^{-2pi i x xi} = e^{-2pi i t xi} e^{-2pi i u xi}:
integral_x g(x - t) e^{-2pi i x xi} dx = e^{-2pi i t xi} integral_u g(u) e^{-2pi i u xi} du = e^{-2pi i t xi} · g-hat(xi).
Put it back:
= [ integral_t f(t) e^{-2pi i t xi} dt ] · g-hat(xi) = f-hat(xi) · g-hat(xi). QED
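A quick numerical sanity check of the theorem, as a sketch rather than a proof. It uses the discrete, periodic analogue (circular convolution and the DFT via NumPy's FFT) instead of the integrals on the line; the random vectors, the length n = 64, and all variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed, illustrative only
n = 64
f = rng.standard_normal(n)
g = rng.standard_normal(n)

# Direct circular convolution: (f * g)[k] = sum_t f[t] * g[(k - t) mod n]
idx = (np.arange(n)[:, None] - np.arange(n)[None, :]) % n   # idx[k, t] = (k - t) mod n
direct = (f[None, :] * g[idx]).sum(axis=1)

# Fourier side: transform of the convolution vs. pointwise product of transforms
lhs = np.fft.fft(direct)
rhs = np.fft.fft(f) * np.fft.fft(g)

max_err = np.max(np.abs(lhs - rhs))
print(f"max |DFT(f*g) - DFT(f)·DFT(g)| = {max_err:.2e}")
```

The two sides should agree up to floating-point rounding. The direct sum costs O(n^2) operations, while the FFT route costs O(n log n), which is one practical reason the theorem matters.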
Readout: convolving = blurring in x <--> pointwise multiplying spectra in xi.
Low-pass filtering, smoothing, and PDE solution operators are all 'multiply f-hat by something'.
Approximate identities: a delta with no value
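Convolution has no honest identity element among functions: there is no g in L¹ with f * g = f for all f in L¹ (that would force g-hat ≡ 1, which Riemann–Lebesgue forbids). The substitute is an approximate identity {φ_ε}: non-negative, mass 1, concentrating at 0 as ε → 0. Then φ_ε * f → f as ε → 0, in L^p norm for f in L^p (1 ≤ p < ∞) and uniformly when f is bounded and uniformly continuous. We have already met two: the Fejér kernel on the circle and the Gaussian on the line. Each smooths f a little, then less and less.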
- Convolving with a SMOOTH φ_ε produces a smooth function φ_ε * f — derivatives fall on φ_ε, which has as many as we like (mollification).
- Since φ_ε * f → f, smooth functions are a dense subset of L¹ and L² — the technical backbone behind Riemann–Lebesgue and Plancherel.
- As ε → 0 the family {φ_ε} wants to converge to a single object δ with the property δ * f = f exactly — but no function does this.
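The behaviour described in the list above can be watched numerically. The following is a minimal sketch, assuming a Gaussian φ_ε with standard deviation ε, the indicator of [-1, 1] as f, and a Riemann sum on a uniform grid in place of the integral; the helper name `gaussian_kernel` and the grid parameters are hypothetical choices for illustration.

```python
import numpy as np

def gaussian_kernel(x, eps):
    """phi_eps(x): Gaussian density with standard deviation eps (total mass 1)."""
    return np.exp(-x**2 / (2 * eps**2)) / (eps * np.sqrt(2 * np.pi))

x = np.linspace(-5.0, 5.0, 4001)        # symmetric grid with 0 at the centre
dx = x[1] - x[0]
f = (np.abs(x) <= 1.0).astype(float)    # indicator of [-1, 1]: discontinuous

epsilons = [0.5, 0.25, 0.125]
l1_errors = []
for eps in epsilons:
    phi = gaussian_kernel(x, eps)
    # Riemann-sum approximation of (phi_eps * f)(x) on the same grid
    smoothed = np.convolve(f, phi, mode="same") * dx
    # Approximate L^1 distance between phi_eps * f and f
    l1_errors.append(np.sum(np.abs(smoothed - f)) * dx)

for eps, err in zip(epsilons, l1_errors):
    print(f"eps = {eps:<6} L1 error ~ {err:.4f}")
```

Each φ_ε * f is smooth even though f jumps at ±1, and the L¹ error shrinks as ε decreases. The grid must stay fine relative to ε, otherwise the Riemann sum no longer resolves the kernel.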
Distributions: redefine “function” by how it acts
The limit δ is not a function: no rule x ↦ δ(x) makes sense. The resolution is to stop asking what δ is at a point and ask only what it does to test functions. A distribution is a continuous linear functional on a space of test functions; with smooth, rapidly decaying (Schwartz) test functions, as here, these are the tempered distributions. δ is the one that reads off a value: ⟨δ, φ⟩ = φ(0). Ordinary functions g (locally integrable, with at most polynomial growth) embed by g ↦ (φ ↦ ∫ g φ), so distributions genuinely enlarge the function concept.
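To make "what it does to test functions" concrete, here is a small sketch that pairs Gaussian kernels φ_ε, viewed as distributions via ψ ↦ ∫ φ_ε ψ, with one fixed test function ψ and compares the result with ⟨δ, ψ⟩ = ψ(0). The test function ψ(x) = e^{-x²} cos(3x), the grid, and the helper names `pair` and `gaussian_kernel` are assumptions made for illustration.

```python
import numpy as np

x = np.linspace(-8.0, 8.0, 16001)
dx = x[1] - x[0]

def pair(g_vals, psi_vals):
    """Riemann-sum approximation of <g, psi> = integral of g(x) psi(x) dx."""
    return np.sum(g_vals * psi_vals) * dx

def gaussian_kernel(eps):
    """phi_eps sampled on the grid: Gaussian density with standard deviation eps."""
    return np.exp(-x**2 / (2 * eps**2)) / (eps * np.sqrt(2 * np.pi))

psi = np.exp(-x**2) * np.cos(3 * x)   # smooth, rapidly decaying test function
psi_at_0 = 1.0                        # <delta, psi> = psi(0)

epsilons = [0.4, 0.2, 0.1, 0.05]
gaps = [abs(pair(gaussian_kernel(eps), psi) - psi_at_0) for eps in epsilons]

for eps, gap in zip(epsilons, gaps):
    print(f"eps = {eps:<5} |<phi_eps, psi> - psi(0)| ~ {gap:.4f}")
```

The gap shrinks as ε decreases, which is the sense in which φ_ε → δ: not pointwise as functions, but as functionals evaluated on each test function.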