The delta cannot possibly be a function
By now you have used the [[dirac-delta-function|Dirac delta]] delta(x) freely — as the perfect impulse in a Laplace transform, as the single sharp poke whose response is the Green's function this whole rung is built on. The usual cartoon is: delta(x) is zero everywhere except at x = 0, where it is infinite, and yet its total area is exactly one, so that the integral from minus infinity to infinity of delta(x) dx equals 1. Stop and stare at that cartoon honestly, because it is impossible. No genuine function can do this. A function that is zero at every point except one isolated point has, by any sensible notion of area — the definite integral you know — an integral of exactly zero. Changing a function's value at a single point cannot move its integral at all.
And "infinite" is not a number you are allowed to assign as a function value anyway; the moment you write delta(0) = infinity you have left the world of functions, which take ordinary real values at every point. So delta is not a badly-behaved function — it is not a function. For a while physicists used it as if it were, and it worked beautifully, which is suspicious and wonderful at the same time. The job of this guide is to find the honest object that delta really is, so that everything you have been doing with it becomes provably correct rather than merely lucky.
Stop asking what it equals; ask what it does
Here is the conceptual pivot that fixes everything, and it is worth slowing down for. A classical function is defined by its values: hand it a point x, it hands you back a number f(x). The delta has no sensible values, so that road is closed. But notice how we ALWAYS use the delta in practice — never alone, always inside an integral against some smooth, well-behaved function f, where its one job is to reach in and pluck out a value: integral of delta(x) f(x) dx = f(0). The delta's entire identity lives in that action. So we make a radical move: we redefine the delta not by its values but by what it DOES to the functions it meets. We declare, once and for all, that delta is the rule "feed me a smooth function f, and I return the number f(0)." That rule is perfectly well defined, perfectly finite, and involves no infinities anywhere.
An object that is defined by how it acts on functions, eating a function and returning a number, is called a [[generalized-function|generalized function]], or a distribution. The smooth functions we are allowed to feed it are called test functions: think of them as polite, obedient probes, taken to be infinitely differentiable and to vanish outside some bounded region so that no boundary terms can ever cause trouble. A distribution is then nothing more or less than a (linear, continuous) rule that assigns a number to each test function. The delta is the simplest interesting one: its rule is "return the value at the origin." Every ordinary, reasonable function g (locally integrable is enough) also defines such a rule: feed it f and it returns the honest integral of g(x) f(x) dx. So the old functions all still live inside the new world. The new world is strictly bigger, and the delta is one of the newcomers it makes room for.
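A minimal Python sketch of this picture, under assumptions that are not part of the text: the helper names (`bump`, `integrate`, `from_function`, `narrow_gaussian`) are hypothetical, the test function is the standard smooth bump supported on (-1, 1), and integrals are approximated by a plain midpoint rule. A distribution is modelled as a callable that eats a test function and returns a number. The loop suggests that rules built from ever narrower ordinary functions of unit area approach the delta's rule, although no ordinary function reaches it.

```python
import math

def bump(x):
    """A test function: infinitely smooth and exactly zero outside (-1, 1)."""
    if abs(x) >= 1.0:
        return 0.0
    return math.exp(-1.0 / (1.0 - x * x))

def integrate(g, a, b, n=20000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def delta(f):
    """The delta as a rule: eat a test function, return its value at 0."""
    return f(0.0)

def from_function(g, a=-1.0, b=1.0):
    """Rule defined by an ordinary function g: f -> integral of g(x) f(x) dx.
    Assumes the test function vanishes outside [a, b]."""
    return lambda f: integrate(lambda x: g(x) * f(x), a, b)

def narrow_gaussian(eps):
    """An ordinary function with unit area and width of order eps."""
    norm = eps * math.sqrt(2.0 * math.pi)
    return lambda x: math.exp(-0.5 * (x / eps) ** 2) / norm

errors = []
for eps in (0.3, 0.1, 0.03):
    T = from_function(narrow_gaussian(eps))
    errors.append(abs(T(bump) - delta(bump)))
    print(f"eps={eps}: T(bump)={T(bump):.6f}, delta(bump)={delta(bump):.6f}")
```

Note that `delta` never integrates anything: it is the rule itself, which is the whole point of the definition.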
How to differentiate something that has no graph
Here is where the new world pays off spectacularly. Once a distribution is defined by how it acts on test functions, we get to DEFINE its derivative by deciding how the derivative should act — and there is exactly one honest way to do it. Take an ordinary differentiable function g and integrate its derivative g' against a test function f. Because f vanishes far away, integration by parts has no boundary term and gives integral of g'(x) f(x) dx = minus integral of g(x) f'(x) dx. The derivative jumped off g and landed on f, with a minus sign. Now read that equation as a definition: for ANY distribution T, its derivative T' is the rule "T' acting on f equals minus T acting on f'." Since the test function f is infinitely smooth, f' always exists, so EVERY distribution is differentiable — infinitely many times. Nothing can fail to have a derivative anymore.
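The integration-by-parts identity that motivates the definition can be checked numerically for an ordinary function. The sketch below is illustrative only: the choice g = sin, the bump test function, and the helper names are assumptions made for the example, and the integrals are midpoint-rule approximations.

```python
import math

def f(x):
    """Test function: smooth bump, exactly zero outside (-1, 1)."""
    if abs(x) >= 1.0:
        return 0.0
    return math.exp(-1.0 / (1.0 - x * x))

def f_prime(x):
    """Exact derivative of the bump (chain rule on the exponent)."""
    if abs(x) >= 1.0:
        return 0.0
    return f(x) * (-2.0 * x) / (1.0 - x * x) ** 2

def integrate(g, a=-1.0, b=1.0, n=20000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

# g = sin is an ordinary differentiable function with g' = cos.
lhs = integrate(lambda x: math.cos(x) * f(x))          # integral of g' f
rhs = -integrate(lambda x: math.sin(x) * f_prime(x))   # minus integral of g f'
print(lhs, rhs)
```

The two numbers should agree up to quadrature error, with no boundary term needed because f vanishes at both ends of the window.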
Test this on the most famous staircase in mathematics, the [[heaviside-step-function|Heaviside step]] H(x), which is 0 for x < 0 and 1 for x > 0. Classically it has a jump at the origin and no derivative there at all. As a distribution, its derivative H' is the rule "H' on f = minus H on f' = minus integral from 0 to infinity of f'(x) dx." By the fundamental theorem of calculus that last integral is f(infinity) minus f(0), and f vanishes far away so f(infinity) = 0, leaving exactly +f(0). But "return f(0)" is precisely the delta's own rule. So H' = delta, cleanly and provably. The slope of a step, which classical calculus refuses to define, is the delta — and now that statement is a theorem, not a hand-wave. This single identity is the engine behind every jump you will meet.
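The computation of H' acting on f can also be checked numerically. This is a sketch under stated assumptions: the test function is a smooth bump centred at 0.3 (chosen off-centre so that f(0) is not a special value), the helper names are hypothetical, and the pairing is a midpoint-rule approximation on a window that contains the support of f.

```python
import math

CENTER = 0.3  # assumed centre of the bump, so f lives on (-0.7, 1.3)

def f(x):
    """Test function: smooth bump, zero outside (CENTER - 1, CENTER + 1)."""
    t = x - CENTER
    if abs(t) >= 1.0:
        return 0.0
    return math.exp(-1.0 / (1.0 - t * t))

def f_prime(x):
    """Exact derivative of f."""
    t = x - CENTER
    if abs(t) >= 1.0:
        return 0.0
    return f(x) * (-2.0 * t) / (1.0 - t * t) ** 2

def heaviside(x):
    """The step: 0 for x < 0 and 1 for x > 0."""
    return 1.0 if x > 0 else 0.0

def pair(g, phi, a=-2.0, b=2.0, n=40000):
    """Approximate <g, phi> = integral of g(x) phi(x) dx by the midpoint rule."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) * phi(a + (i + 0.5) * h)
                   for i in range(n))

lhs = -pair(heaviside, f_prime)   # <H', f> := -<H, f'>
rhs = f(0.0)                      # <delta, f>
print(lhs, rhs)
```

Only f' is ever differentiated here; the step itself is just multiplied and integrated, which is exactly why its jump causes no trouble.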
DERIVATIVE OF A DISTRIBUTION (move d/dx onto the test function, flip the sign)
< T' , f > := - < T , f' > for every smooth test function f
(no boundary term: f vanishes far away)
WORKED: the step function H (H = 0 for x<0, H = 1 for x>0)
< H' , f > = - < H , f' >
= - integral over all x of H(x) f'(x) dx
= - integral from 0 to infinity of f'(x) dx
= - [ f(infinity) - f(0) ]
= - [ 0 - f(0) ]
= + f(0)
= < delta , f >
==> H' = delta (and one more derivative: < delta' , f > = - f'(0) )
The delta's family: derivatives, shifts, and scalings
Now that differentiation costs nothing, the delta sprouts a whole family. Its first derivative delta', the doublet, acts by the rule we just derived: delta' on f equals minus f'(0). Picture it physically as a pair of opposite spikes infinitesimally close together, a tiny push immediately followed by an equal tiny pull, which is exactly how you model an idealized point couple or a dipole. The second derivative delta'' returns plus f''(0), because two sign flips cancel, and so on up the ladder: the n-th derivative of delta returns (-1)^n times the n-th derivative of f at the origin. Each new derivative probes one more order of the test function's behaviour at the origin. These are not curiosities: delta' and delta'' show up the instant you take derivatives of a Green's function that has a kink, which is most of them.
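The ladder of derivatives can be sketched directly from the definition. In the Python below, everything beyond the rule "T' on f equals minus T on f'" is an assumption for illustration: the names `d`, `derivative`, `doublet` and `delta2` are hypothetical, the test function is chosen so that f(0) = 1, f'(0) = 2 and f''(0) = 4, and derivatives of the test function are approximated by central differences rather than computed exactly.

```python
import math

def f(x):
    """Test function, zero outside (-1, 1). Near 0 it behaves like
    1 + 2x + 2x^2 + ..., so f(0) = 1, f'(0) = 2 and f''(0) = 4."""
    if abs(x) >= 1.0:
        return 0.0
    return (1.0 + 2.0 * x + 3.0 * x * x) * math.exp(-x * x / (1.0 - x * x))

def d(phi, h=1e-3):
    """Central-difference stand-in for the derivative of a test function."""
    return lambda x: (phi(x + h) - phi(x - h)) / (2.0 * h)

def derivative(T):
    """Distributional derivative: <T', phi> := -<T, phi'>."""
    return lambda phi: -T(d(phi))

delta = lambda phi: phi(0.0)    # <delta, phi> = phi(0)
doublet = derivative(delta)     # delta': returns about -f'(0)
delta2 = derivative(doublet)    # delta'': returns about +f''(0)

print(delta(f), doublet(f), delta2(f))
```

Because `derivative` works on any callable rule, the same helper could be applied to a rule built from a step or a kinked function, not only to the delta.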
Two more manipulations let you handle the delta wherever it lands. A SHIFTED delta, delta(x minus a), simply samples the test function at the point a instead of the origin: integral of delta(x minus a) f(x) dx = f(a). That is precisely the point source sitting at location a: the impulse you place wherever you want the system poked, the heart of the whole Green's-function construction. And a SCALED delta obeys delta(c x) = delta(x) / |c| for any nonzero c. To see why, substitute u = c x in the integral of delta(c x) f(x) dx: the spike still samples f at the origin, but dx becomes du / |c| once the limits are put back in increasing order, so the result is f(0) / |c|. Squeezing the horizontal axis by a factor |c| makes the spike that much narrower without making it taller, so its area drops from one to 1 / |c|. The absolute value matters: for negative c the substitution flips both the sign of dx and the direction of the limits, and the two flips cancel. These rules are not separate facts to memorize; each is forced on you by demanding that the action-on-test-functions definition stay consistent.
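Both rules can be sanity-checked by replacing the delta with a very narrow ordinary spike. This is only an approximation under stated assumptions: `spike` is a hypothetical unit-area Gaussian of width 0.01 standing in for delta, the test function is the standard bump on (-1, 1), and the shift a = 0.4 and scale factors 3 and -2 are arbitrary example values.

```python
import math

def f(x):
    """Test function: smooth bump, exactly zero outside (-1, 1)."""
    if abs(x) >= 1.0:
        return 0.0
    return math.exp(-1.0 / (1.0 - x * x))

def spike(x, eps=0.01):
    """Narrow unit-area Gaussian standing in for delta(x)."""
    return math.exp(-0.5 * (x / eps) ** 2) / (eps * math.sqrt(2.0 * math.pi))

def integrate(g, a=-1.0, b=1.0, n=40000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

a = 0.4
shifted = integrate(lambda x: spike(x - a) * f(x))        # about f(a)
scaled_pos = integrate(lambda x: spike(3.0 * x) * f(x))   # about f(0) / 3
scaled_neg = integrate(lambda x: spike(-2.0 * x) * f(x))  # about f(0) / 2

print(shifted, f(a))
print(scaled_pos, f(0.0) / 3.0)
print(scaled_neg, f(0.0) / 2.0)
```

The negative scale factor gives a positive result, in line with the absolute value in the scaling rule.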
Why this is exactly the language Green's functions speak
Everything in this rung quietly relies on what we just built. A Green's function G(x, a) is DEFINED as the response to a unit point source, the solution of L G = delta(x minus a) where L is your differential operator. That equation has no meaning until delta is a legitimate object you can put on the right-hand side of a differential equation, which is exactly what a distribution is. The famous jump condition you use to build G by hand for a second-order operator (the solution is continuous but its first derivative jumps by a definite amount across the source point) is nothing but the identity H' = delta read in reverse. A delta on the right-hand side forces a jump in the derivative one order below the highest that appears in L, of size one over the leading coefficient, so a unit jump when that coefficient is 1. Differentiating that jump once more is the one and only way to manufacture the delta. The book-keeping you learned as a recipe is, underneath, distribution theory.
The same idea promotes the fundamental solution of a partial differential equation to a rigorous object — the free-space Green's function for the Laplacian, for instance, satisfies nabla-squared G = delta in the distribution sense, even though G itself blows up at the source and is certainly not twice differentiable there in the classical way. Classically that equation is gibberish at the singular point; as a statement about how both sides act on test functions, it is exact. This is the deeper reason the whole method works: a distribution can solve a differential equation it could never satisfy pointwise, because the derivatives are interpreted in the weak, test-function sense. The delta is what lets a single sharp poke be a legitimate input, and the Green's function is the legitimate output.
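A one-dimensional example makes the weak reading concrete. The sketch below rests on assumptions that are not in the text: the operator is taken to be L = -d^2/dx^2 on [0, 1] with G = 0 at both ends, whose Green's function is the piecewise-linear tent used in `G`; the test function is a bump supported on (0, 1); and f'' is approximated by central differences. It checks that G paired with L f returns f(a), and that the first derivative of G jumps by -1 across the source (one over the leading coefficient, which is -1 for this L).

```python
import math

def G(x, a):
    """Green's function of L = -d^2/dx^2 on [0, 1] with G = 0 at both ends."""
    return x * (1.0 - a) if x < a else a * (1.0 - x)

def f(x):
    """Test function: smooth bump supported on (0, 1)."""
    t = 2.0 * x - 1.0
    if abs(t) >= 1.0:
        return 0.0
    return math.exp(-1.0 / (1.0 - t * t))

def f_second(x, h=1e-4):
    """Central-difference approximation of f''."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)

def integrate(g, lo=0.0, hi=1.0, n=20000):
    """Midpoint-rule approximation of the integral of g over [lo, hi]."""
    step = (hi - lo) / n
    return step * sum(g(lo + (i + 0.5) * step) for i in range(n))

a = 0.3
# Weak form of L G = delta(x - a): both derivatives are moved onto f.
weak = integrate(lambda x: G(x, a) * (-f_second(x)))

# Jump of dG/dx across the source, from one-sided difference quotients.
h = 1e-6
jump = (G(a + h, a) - G(a, a)) / h - (G(a, a) - G(a - h, a)) / h

print(weak, f(a), jump)
```

G is never differentiated twice in this check; all the smoothness demanded by L is supplied by the test function.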
- When you meet a delta, never read it as a function value; read it as the rule "feed me a smooth test function f and I return f(0)" (or f(a) for a delta shifted to a).
- To differentiate any distribution, move the derivative onto the test function and flip the sign: T' on f equals minus T on f'. This always works because test functions are infinitely smooth.
- Use the master identity H' = delta to convert jumps into deltas and back — this is exactly the jump condition you impose when constructing a Green's function by hand.
- Read L G = delta(x minus a) as a statement about actions on test functions, not pointwise values; that is what makes the fundamental solution and the free-space Green's function rigorous even where G itself is singular.