Convolution, approximate identities, and a first look at distributions

Convolution: blur in space, multiply in frequency

The convolution (f * g)(x) = ∫_ℝ f(t) g(x − t) dt is a moving weighted average: flip g, slide it across f, and accumulate the overlap. Its magic is the convolution theorem, which turns the entangled operation of convolution into plain multiplication on the Fourier side.
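Below is a minimal numerical sketch of the definition, assuming NumPy. The helper names gaussian and convolve_at, the grid, and the evaluation point are illustrative choices, not part of the text.

The sketch approximates the integral by a Riemann sum. It then checks the result against a fact we can verify by hand: convolving centered normal densities with standard deviations s1 and s2 gives the centered normal density with standard deviation sqrt(s1² + s2²).

```python
import numpy as np

def gaussian(x, s):
    """Centered normal density with standard deviation s (unit mass)."""
    return np.exp(-x**2 / (2 * s**2)) / (s * np.sqrt(2 * np.pi))

def convolve_at(f, g, x, t):
    """Riemann-sum approximation of (f * g)(x) = ∫ f(t) g(x - t) dt on the uniform grid t."""
    dt = t[1] - t[0]
    return np.sum(f(t) * g(x - t)) * dt

t = np.linspace(-20.0, 20.0, 40001)   # wide enough that both Gaussian tails are negligible
f = lambda u: gaussian(u, 1.0)
g = lambda u: gaussian(u, 2.0)

x0 = 0.7
approx = convolve_at(f, g, x0, t)
exact = gaussian(x0, np.sqrt(1.0**2 + 2.0**2))   # variances add under convolution
print(approx, exact)
```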

Convolution theorem:   (f * g)-hat(xi) = f-hat(xi) · g-hat(xi).

Proof (f, g in L^1, so Fubini applies; f-hat(xi) = integral_x f(x) e^{-2pi i x xi} dx):
   (f * g)-hat(xi) = integral_x [ integral_t f(t) g(x - t) dt ] e^{-2pi i x xi} dx.

Swap order (Fubini: the double integral of |f(t) g(x - t)| over R^2 equals ||f||_1 ||g||_1 < infinity):
                = integral_t f(t) [ integral_x g(x - t) e^{-2pi i x xi} dx ] dt.

Inner integral: substitute u = x - t, so e^{-2pi i x xi} = e^{-2pi i t xi} e^{-2pi i u xi}:
   integral_x g(x-t) e^{-2pi i x xi} dx = e^{-2pi i t xi} · g-hat(xi).

Put it back:
                = [ integral_t f(t) e^{-2pi i t xi} dt ] · g-hat(xi) = f-hat(xi) · g-hat(xi).   QED
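The same swap-and-shift argument works verbatim for the discrete Fourier transform on N points, with convolution taken mod N. Below is a minimal check, assuming NumPy. Its np.fft.fft uses the e^{-2 pi i k n / N} sign convention, which matches the e^{-2pi i x xi} kernel above. The helper circular_convolve is hypothetical and written out directly.

This is the periodic, discrete analogue of the theorem, not the theorem on ℝ itself.

```python
import numpy as np

def circular_convolve(a, b):
    """Direct O(N^2) circular convolution: c[m] = sum_k a[k] * b[(m - k) mod N]."""
    n = len(a)
    return np.array([sum(a[k] * b[(m - k) % n] for k in range(n)) for m in range(n)])

rng = np.random.default_rng(0)
a = rng.standard_normal(64)
b = rng.standard_normal(64)

direct = circular_convolve(a, b)
# Discrete convolution theorem: DFT(a ⊛ b) = DFT(a) · DFT(b); invert the pointwise product.
via_fft = np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)).real
print(np.max(np.abs(direct - via_fft)))
```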

Readout:  convolving = blurring in x  <-->  pointwise multiplying spectra in xi.
Low-pass filtering, smoothing, and the solution operators of constant-coefficient linear PDEs (the heat equation, for instance) are all 'multiply f-hat by something'.
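To make 'multiply f-hat by something' concrete, here is a hedged sketch for the heat equation u_t = u_xx on the line (an illustrative choice of PDE).

With the e^{-2pi i x xi} convention, d/dx becomes multiplication by 2 pi i xi. The solution operator therefore multiplies f-hat by e^{-4 pi^2 t xi^2}. In x-space that is convolution with the heat kernel (4 pi t)^{-1/2} e^{-x^2/(4t)}, a Gaussian of variance 2t.

The sketch assumes NumPy. It approximates the line by a large periodic box so the FFT can stand in for the Fourier transform.

```python
import numpy as np

def gaussian(x, s):
    """Centered normal density with standard deviation s."""
    return np.exp(-x**2 / (2 * s**2)) / (s * np.sqrt(2 * np.pi))

L, N = 40.0, 4096
dx = 2 * L / N
x = -L + dx * np.arange(N)     # periodic grid on [-L, L); wrap-around is negligible for these Gaussians
xi = np.fft.fftfreq(N, d=dx)   # frequencies matching the e^{-2 pi i x xi} convention

f = gaussian(x, 1.0)           # initial data u(0, x)
t = 0.5
# Fourier side: u-hat(t, xi) = exp(-4 pi^2 t xi^2) * f-hat(xi).
multiplier = np.exp(-4 * np.pi**2 * t * xi**2)
u = np.fft.ifft(np.fft.fft(f) * multiplier).real

# x side: convolve with the heat kernel (variance 2t); variances add, giving variance 1 + 2t.
exact = gaussian(x, np.sqrt(1.0 + 2 * t))
print(np.max(np.abs(u - exact)))
```

The FFT route costs O(N log N) rather than the O(N^2) of a direct convolution. That cost gap is a practical reason filters and such solution operators are often applied on the Fourier side.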

Convolution becomes a product under the Fourier transform (Fubini + a shift).

Approximate identities: a delta with no value

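Convolution has no honest identity element among functions: no g ∈ L¹ satisfies f * g = f for all f ∈ L¹. The convolution theorem would force g-hat ≡ 1, while the Riemann–Lebesgue lemma forces g-hat(ξ) → 0 as |ξ| → ∞. The substitute is an approximate identity {φ_ε}: non-negative functions of mass 1 whose mass concentrates at 0 as ε → 0 (for each r > 0, the mass of φ_ε outside [−r, r] tends to 0). Then φ_ε * f → f, in L¹ norm for every f ∈ L¹ and uniformly for every bounded, uniformly continuous f. We have already met two: the Fejér kernel on the circle and the Gaussian on the line. Each smooths f a little, then less and less.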
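Below is a sketch of φ_ε * f → f in L¹ norm, assuming NumPy. It takes φ_ε to be the centered normal density with standard deviation ε (one concrete Gaussian kernel) and f = 1_[0,1], a function with two jumps.

The convolution has a closed form through the normal CDF Φ, and the L¹ error is computed on a midpoint grid. The names smoothed_indicator and EPSILONS are hypothetical.

```python
import math
import numpy as np

# Standard normal CDF, applied elementwise.
_Phi = np.vectorize(lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0))))

def smoothed_indicator(x, eps):
    """(phi_eps * 1_[0,1])(x) = integral of phi_eps over [x - 1, x]
    = Phi(x/eps) - Phi((x - 1)/eps), with phi_eps the N(0, eps^2) density."""
    return _Phi(x / eps) - _Phi((x - 1.0) / eps)

# Midpoint grid on [-2, 3]; the jumps of f at 0 and 1 fall on cell boundaries.
dx = 1e-4
x = -2.0 + dx * (np.arange(50000) + 0.5)
f = ((x >= 0.0) & (x <= 1.0)).astype(float)

EPSILONS = [0.2, 0.1, 0.05, 0.01]
errors = [np.sum(np.abs(smoothed_indicator(x, eps) - f)) * dx for eps in EPSILONS]
for eps, err in zip(EPSILONS, errors):
    # For small eps each jump contributes about eps * sqrt(2/pi), so err is roughly 1.6 * eps.
    print(f"eps={eps:<5} L1 error={err:.5f}")
```

The error shrinks linearly in ε, since each jump is smeared over a width of order ε.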

Convolving with a SMOOTH φ_ε produces a smooth function φ_ε * f: derivatives fall on the kernel, (φ_ε * f)^{(k)} = φ_ε^{(k)} * f, and φ_ε has as many as we like (mollification).
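The next sketch is a numerical check that derivatives fall on the kernel, assuming NumPy. For illustration it swaps in the Gaussian for the usual compactly supported bump mollifier, because then φ_ε * 1_[0,1] has a closed form. The names phi and mollified_indicator and the width EPS = 0.1 are hypothetical.

The indicator itself has no classical derivative at 0 or 1. Yet (φ_ε * f)' = φ_ε' * f = φ_ε(x) − φ_ε(x − 1), which is smooth.

```python
import math
import numpy as np

EPS = 0.1  # mollifier width (illustrative choice)

def phi(x, eps=EPS):
    """Gaussian mollifier: the N(0, eps^2) density, smooth with unit mass."""
    return np.exp(-x**2 / (2 * eps**2)) / (eps * math.sqrt(2 * math.pi))

# Standard normal CDF, applied elementwise.
_Phi = np.vectorize(lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0))))

def mollified_indicator(x, eps=EPS):
    """(phi_eps * 1_[0,1])(x) = Phi(x/eps) - Phi((x - 1)/eps): smooth, although 1_[0,1] jumps."""
    return _Phi(x / eps) - _Phi((x - 1.0) / eps)

x = np.linspace(-1.0, 2.0, 3001)
h = 1e-4
# Left side: central-difference derivative of the mollified function.
lhs = (mollified_indicator(x + h) - mollified_indicator(x - h)) / (2 * h)
# Right side: derivative moved onto the kernel, (phi_eps' * f)(x) = phi_eps(x) - phi_eps(x - 1).
rhs = phi(x) - phi(x - 1.0)
print(np.max(np.abs(lhs - rhs)))
```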
Since φ_ε * f → f, smooth functions are a dense subset of L¹ and L² — the technical backbone behind Riemann–Lebesgue and Plancherel.
As ε → 0 the family {φ_ε} wants to converge to a single object δ with the property δ * f = f exactly — but no function does this.

Distributions: redefine “function” by how it acts

The limit δ is not a function: no rule x ↦ δ(x) makes sense. The resolution is to stop asking what δ is at a point and ask only what it does to test functions. A (tempered) distribution is a continuous linear functional on smooth, rapidly decaying test functions ψ; δ is the one that reads off a value: ⟨δ, ψ⟩ = ψ(0). Ordinary functions (say, locally integrable ones of at most polynomial growth) embed by g ↦ (ψ ↦ ∫ g ψ), so distributions genuinely enlarge the function concept. In this sense the approximate identity does converge: ⟨φ_ε, ψ⟩ = ∫ φ_ε ψ → ψ(0) = ⟨δ, ψ⟩ for every test function ψ.
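One way to make 'a distribution is what it does to test functions' tangible is to model a distribution as a plain function that takes a test function and returns a number. The sketch below assumes NumPy. The names regular, delta and phi, the truncated quadrature grid, and the test function ψ(x) = e^{-x²} cos 3x are illustrative choices, and continuity of the functionals is not modeled.

It shows the pairings ⟨φ_ε, ψ⟩ approaching ⟨δ, ψ⟩ = ψ(0) = 1 as ε → 0. This is the sense in which the approximate identity converges to δ as distributions.

```python
import numpy as np

GRID = np.linspace(-30.0, 30.0, 600001)   # truncated quadrature grid for the pairing integrals
DX = GRID[1] - GRID[0]

def regular(g):
    """Embed an ordinary function g as the distribution psi -> ∫ g(x) psi(x) dx (Riemann sum)."""
    return lambda psi: np.sum(g(GRID) * psi(GRID)) * DX

def delta(psi):
    """Dirac distribution: reads off the value at 0, <delta, psi> = psi(0)."""
    return psi(0.0)

def phi(eps):
    """Gaussian approximate identity: the N(0, eps^2) density."""
    return lambda x: np.exp(-x**2 / (2 * eps**2)) / (eps * np.sqrt(2 * np.pi))

psi = lambda x: np.exp(-x**2) * np.cos(3 * x)   # a smooth, rapidly decaying test function

for eps in [1.0, 0.1, 0.01]:
    print(eps, regular(phi(eps))(psi), delta(psi))
```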