Two impossible objects you cannot live without
Picture a light switch. Before time t = 0 the room is dark; the instant you flip it, the light is on, and stays on. As a graph that is a function which is 0 for t < 0 and 1 for t > 0, jumping straight up at the origin with no in-between. That is the Heaviside step function, H(t) — the mathematical idealization of an instantaneous switch. See Heaviside step function. It is honest about something real engineering does constantly: turn a voltage on, drop a load on a beam, open a valve — events that, on the timescale of the problem, are effectively instant.
Now ask the dangerous question: what is the derivative of that switch? Everywhere except the origin H is flat, so H'(t) = 0 for every t other than 0. But at t = 0 the graph leaps by a full unit over zero horizontal distance, so the slope there is, in an honest sense, infinite. So H' is zero everywhere except one point, where it is infinitely tall, and yet (this is the magic) the total area under it must be exactly 1, because that is how much H climbed. An infinitely thin, infinitely tall spike of area 1: that is the Dirac delta, delta(t). See Dirac delta function. The spike is the derivative of the switch, and the switch is the running integral of the spike.
Making it rigorous: the limit of a narrowing bump
How do you tame an object that breaks the rules? You build it as a limit of perfectly ordinary functions. Take a tall thin rectangle: width epsilon, height 1/epsilon, centered at 0. Its area is width times height = 1, no matter how small epsilon is. Now let epsilon shrink toward 0. The rectangle grows taller and skinnier, always with area 1, and squeezes onto the single point x = 0. The delta is the idealized end of that process. It is not the rectangle at any particular epsilon. It is the limit of what the rectangle does inside an integral: for each continuous function f, the value that the integral of f(x) times the rectangle approaches as epsilon goes to 0.
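The following minimal Python sketch watches the rectangle limit at work numerically. The function name rect_sift is hypothetical, and the test function cos and the point a = 0.5 are arbitrary choices. Integrating f against a width-epsilon, height-1/epsilon rectangle centred at a is the same as averaging f over the rectangle's window. The sketch computes that average with a midpoint rule and prints how far it sits from f(a).

```python
import math

def rect_sift(f, a, eps, n=1000):
    """Integral of f(x) times a width-eps, height-1/eps rectangle centred at a.

    Multiplying by 1/eps and integrating over the window is the same as averaging f
    over [a - eps/2, a + eps/2]; the average is approximated by an n-point midpoint rule.
    """
    h = eps / n
    window_sum = sum(f(a - eps / 2 + (i + 0.5) * h) for i in range(n)) * h
    return window_sum / eps

a = 0.5
errors = []
for eps in (1.0, 0.1, 0.01, 0.001):
    value = rect_sift(math.cos, a, eps)
    errors.append(abs(value - math.cos(a)))
    print(f"eps = {eps:<6}  integral = {value:.10f}  |error| = {errors[-1]:.2e}")
```

The area under the rectangle never changes, only where it sits. Each tenfold narrowing cuts the error by roughly a hundredfold, which is the epsilon-squared rate a Taylor expansion of f about a predicts.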
The rectangle is the crudest choice; smoother ones work too and are often nicer. A tall narrow Gaussian bell of unit area, (1/(epsilon square root of pi)) e^{-x^2/epsilon^2}, narrows to delta as epsilon goes to 0. It is a familiar shape, and its normalizing factor comes straight from the Gaussian integral. The integral of e^{-x^2/epsilon^2} over the whole line is epsilon times the square root of pi, so dividing by that factor leaves an area of exactly 1 for every epsilon. The point is liberating: there is no single 'correct' bump. Any family of unit-area pulses that concentrates onto the origin gives the same delta. The only thing about them that survives the limit is what they do to a smooth function, and that turns out to be one clean rule.
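The following minimal Python sketch (hypothetical helper names) checks the Gaussian family numerically. The shift a = 1.2 and the test function cos are arbitrary choices. For this particular pulse the sifting integral against cos has the closed form cos(a) e^{-epsilon^2/4}, which tends to cos(a). The sketch also confirms that the area stays 1 at every width.

```python
import math

def gaussian_delta(x, eps):
    # Unit-area Gaussian pulse (1/(eps*sqrt(pi))) * exp(-x^2/eps^2)
    return math.exp(-(x / eps) ** 2) / (eps * math.sqrt(math.pi))

def midpoint(g, lo, hi, n=4000):
    # Plain midpoint-rule quadrature of g over [lo, hi]
    h = (hi - lo) / n
    return sum(g(lo + (i + 0.5) * h) for i in range(n)) * h

a = 1.2
for eps in (0.5, 0.05, 0.005):
    lo, hi = a - 8 * eps, a + 8 * eps  # outside this window the pulse is below 1e-27 of its peak
    area = midpoint(lambda x: gaussian_delta(x - a, eps), lo, hi)
    sifted = midpoint(lambda x: math.cos(x) * gaussian_delta(x - a, eps), lo, hi)
    # Closed form for this pulse: cos(a) * exp(-eps^2 / 4), which tends to cos(a)
    closed = math.cos(a) * math.exp(-eps ** 2 / 4)
    print(f"eps = {eps:<6} area = {area:.12f} sifted = {sifted:.10f} closed form = {closed:.10f}")
```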
The one rule that defines it: the sifting property
Everything the delta does is contained in one equation: the integral over the whole line of f(x) times delta(x - a) dx equals f(a). This is the sifting property, and it is the real definition — the delta is precisely the thing that, multiplied against any smooth f and integrated, hands you back the single value of f at one point. Picture it with the narrowing bump: delta(x - a) is zero everywhere except a tiny window around x = a. Inside that window f is essentially constant at f(a), so it slides out of the integral, leaving f(a) times (the area of the bump) = f(a) times 1 = f(a). The spike samples f at exactly one place and throws everything else away.
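For the rectangle pulse, the "slides out of the integral" step can be made exact. The short derivation below writes delta_epsilon for the width-epsilon, height-1/epsilon rectangle. It assumes only that f is continuous near a, and that f is smooth for the error term.

```latex
\begin{align*}
\int_{-\infty}^{\infty} f(x)\,\delta_\varepsilon(x-a)\,dx
  &= \frac{1}{\varepsilon}\int_{a-\varepsilon/2}^{a+\varepsilon/2} f(x)\,dx
   = f(\xi_\varepsilon),
   && \xi_\varepsilon \in \bigl[a-\tfrac{\varepsilon}{2},\, a+\tfrac{\varepsilon}{2}\bigr]
      \ \text{(mean value theorem for integrals)} \\
  &\longrightarrow f(a) \quad (\varepsilon \to 0),
   && \text{since } \xi_\varepsilon \to a \text{ and } f \text{ is continuous at } a, \\
\frac{1}{\varepsilon}\int_{a-\varepsilon/2}^{a+\varepsilon/2} f(x)\,dx
  &= f(a) + \frac{f''(a)}{24}\,\varepsilon^{2} + O(\varepsilon^{4})
   && \text{for smooth } f \text{ (odd Taylor terms cancel on the symmetric window).}
\end{align*}
```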
DEFINING PROPERTY (the sifting / sampling rule):
integral_{-inf}^{+inf} f(x) delta(x - a) dx = f(a)
Special case f = 1 (total area is one):
integral_{-inf}^{+inf} delta(x) dx = 1
LINK TO THE STEP (Fundamental Theorem of Calculus):
integral_{-inf}^{x} delta(s) ds = H(x) so H'(x) = delta(x)
USEFUL ALGEBRA OF THE DELTA:
delta(-x) = delta(x) (even)
delta(c x) = (1/|c|) delta(x) (c nonzero; scaling the argument by c rescales the area by 1/|c|)
x delta(x) = 0 (the spike sits where x = 0)
delta(g(x)) = sum over roots x_k of delta(x - x_k) / |g'(x_k)| (simple roots only: g'(x_k) nonzero)
The link between the two objects is now exactly the Fundamental Theorem of Calculus, read generously. Integrating delta from minus infinity up to x accumulates its single unit of area the moment you pass the origin. The running total is therefore 0 before 0 and 1 after, which is precisely H(x). Run the theorem the other way and H'(x) = delta(x). So the switch is the integral of the spike, and the spike is the derivative of the switch. Volume I taught you this pairing for smooth functions. Here it survives, intact, for objects that were not even differentiable in the old sense, and that is the whole power of the generalized-function viewpoint.
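The running-integral link and two rules from the box (rescaling and composition) can be checked numerically. The minimal Python sketch below uses the unit-area Gaussian pulse (1/(epsilon square root of pi)) e^{-x^2/epsilon^2} as a stand-in for delta. It assumes epsilon = 0.01, uses hypothetical helper names, and picks arbitrary test functions. The pulse's running integral has the closed form (1 + erf(x/epsilon))/2, a smoothed switch that sits at exactly 1/2 at the origin.

```python
import math

def gaussian_delta(u, eps):
    # Unit-area Gaussian pulse standing in for delta(u)
    return math.exp(-(u / eps) ** 2) / (eps * math.sqrt(math.pi))

def running_integral(x, eps):
    # Closed form of the integral of gaussian_delta(s, eps) for s from -inf to x:
    # a smoothed Heaviside step that equals 1/2 at x = 0
    return 0.5 * (1.0 + math.erf(x / eps))

def midpoint(g, lo, hi, n=200_000):
    # Midpoint-rule quadrature; n is large so the grid resolves pulses of width ~eps
    h = (hi - lo) / n
    return sum(g(lo + (i + 0.5) * h) for i in range(n)) * h

eps = 1e-2
step_values = [running_integral(x, eps) for x in (-0.1, 0.0, 0.1)]
print(step_values)  # [0.0, 0.5, 1.0]

# Rescaling: delta(3x) should act like delta(x) / 3, so this tends to cos(0) / 3
scaled = midpoint(lambda x: math.cos(x) * gaussian_delta(3 * x, eps), -1.0, 1.0)

# Composition: g(x) = x^2 - 1 has simple roots at +1 and -1 with |g'| = 2 at both,
# so the integral should approach (f(1) + f(-1)) / 2, which is cosh(1) for f = exp
composed = midpoint(lambda x: math.exp(x) * gaussian_delta(x * x - 1, eps), -2.0, 2.0)

print(scaled, 1 / 3)
print(composed, math.cosh(1.0))
```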
Why engineers reach for it: impulses, point sources, sampling
The first job is the impulse. Hit a mass with a hammer: a large force acts over a tiny time, and what actually matters is their product, force times time, which is the change in momentum. Idealize the hammer blow as F(t) = J times delta(t), where J is the total impulse. Feed that into a system and you read off its impulse response: the entire way the system rings, decays, or settles after a single sharp kick. See impulse response. This is not a toy. For a linear, time-invariant system that starts at rest, knowing the impulse response h(t) tells you its response to ANY input x(t). You superpose infinitely many scaled, shifted impulses, y(t) = integral of h(t - s) x(s) ds, which is exactly what a convolution is.
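The following minimal Python sketch illustrates "impulse response plus convolution gives everything." It assumes a hypothetical first-order system y' = -y/tau + x(t) that starts at rest, with tau = 0.5; think of a leaky tank or an RC circuit. Its impulse response is h(t) = e^{-t/tau} for t >= 0. The sketch evaluates the convolution integral with a midpoint rule for two inputs and compares each with a closed-form solution of the differential equation worked out by hand.

```python
import math

TAU = 0.5  # hypothetical time constant

def impulse_response(t):
    # h(t): the system's reaction to one unit kick delta(t) when it starts at rest
    return math.exp(-t / TAU) if t >= 0 else 0.0

def response(x, t, n=2000):
    # Superpose scaled, shifted impulse responses: y(t) = integral_0^t h(t - s) x(s) ds
    ds = t / n
    return sum(impulse_response(t - (k + 0.5) * ds) * x((k + 0.5) * ds) for k in range(n)) * ds

t = 2.0
# Input 1: a unit step (the light switch); exact answer tau * (1 - e^{-t/tau})
y_step = response(lambda s: 1.0, t)
exact_step = TAU * (1 - math.exp(-t / TAU))

# Input 2: x(s) = cos(3s); with tau = 0.5 the exact answer is
# (2 cos 3t + 3 sin 3t - 2 e^{-2t}) / 13
y_cos = response(lambda s: math.cos(3 * s), t)
exact_cos = (2 * math.cos(3 * t) + 3 * math.sin(3 * t) - 2 * math.exp(-2 * t)) / 13

print(y_step, exact_step)
print(y_cos, exact_cos)
```

Nothing in `response` depends on which input is passed. Once h is known, any input is one integral away.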
The second job is the point source. A point charge, a point mass, a single concentrated load on a beam — physically the matter sits at one location with finite total amount but zero extent. A delta in space, delta(x - a) (or its multidimensional version), is the only honest way to write 'all of this quantity, packed at the point a.' Solve a differential equation with a delta source and the answer you get is a Green's function — the response of the medium to a single point poke, the spatial cousin of the impulse response, and the building block from which the response to a smeared-out source is assembled by integration.
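As a concrete instance, take a hypothetical taut string with unit tension, modelled as -u''(x) = f(x) on [0, 1] with both ends pinned, so u(0) = u(1) = 0. Its Green's function is the deflection under a unit point load delta(x - a). That deflection is the tent G(x, a) = x(1 - a) for x <= a and a(1 - x) for x >= a. The tent is linear on each side, vanishes at both ends, and its slope drops by exactly 1 at x = a, which is what -u'' = delta(x - a) demands. The minimal Python sketch below assembles the response to a smeared-out load by integrating G against f. It compares the result with solutions of the differential equation checked by hand.

```python
import math

def green(x, a):
    # Deflection at x caused by a unit point load at a, for -u'' = delta(x - a), u(0) = u(1) = 0
    return x * (1 - a) if x <= a else a * (1 - x)

def solve_with_green(f, x, n=1000):
    # u(x) = integral_0^1 G(x, a) f(a) da, approximated with a midpoint rule over a
    da = 1.0 / n
    return sum(green(x, (k + 0.5) * da) * f((k + 0.5) * da) for k in range(n)) * da

x = 0.3
# Uniform load f = 1: exact solution u = x(1 - x)/2
u_uniform = solve_with_green(lambda a: 1.0, x)
# Load f = sin(pi a): exact solution u = sin(pi x)/pi^2
u_sine = solve_with_green(lambda a: math.sin(math.pi * a), x)

print(u_uniform, x * (1 - x) / 2)
print(u_sine, math.sin(math.pi * x) / math.pi ** 2)
```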
The third job is sampling, and it is just the sifting property wearing work clothes. A row of deltas spaced T apart (a 'Dirac comb'), multiplied against a continuous signal, plucks out the signal's value at each tick and ignores the rest. That single picture is the mathematical heart of converting an analog signal into the stream of numbers a computer stores. It also explains why, for a signal containing no frequencies above some f_max, there is a largest spacing, T = 1/(2 f_max), below which no detail is lost (the Nyquist-Shannon sampling theorem). So the same idealized spike that models a hammer blow also models the click of an analog-to-digital converter: the abstraction pays for itself three times over.
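The following minimal Python sketch illustrates sampling, using a hypothetical 3 Hz sine as the analog signal. Multiplying by the comb keeps only the values signal(nT), so the sketch stores just those numbers. To show that nothing is lost when T is small enough, it rebuilds an in-between value with Whittaker-Shannon (sinc) interpolation. That is a standard reconstruction formula, swapped in here for illustration and truncated to finitely many samples. To show what goes wrong when T is too large, the sketch exhibits a different sine with identical samples.

```python
import math

F0 = 3.0  # hypothetical signal frequency in Hz

def signal(t):
    return math.sin(2 * math.pi * F0 * t)

def sample(x, T, ns):
    # What the Dirac comb keeps: the values x(nT) at the ticks, nothing in between
    return {n: x(n * T) for n in ns}

def sinc(u):
    return 1.0 if u == 0 else math.sin(math.pi * u) / (math.pi * u)

def reconstruct(samples, T, t):
    # Whittaker-Shannon interpolation, truncated to the samples we kept
    return sum(v * sinc(t / T - n) for n, v in samples.items())

# Fine spacing: 10 samples per second, more than twice the 3 Hz content
T_fine = 0.1
fine = sample(signal, T_fine, range(-4000, 4001))
t = 0.037
print(reconstruct(fine, T_fine, t), signal(t))

# Coarse spacing: 4 samples per second, fewer than 2 * 3
T_coarse = 0.25

def alias(t):
    # A -1 Hz sine (3 - 4 = -1): different signal, same samples at spacing 0.25
    return math.sin(2 * math.pi * (F0 - 1 / T_coarse) * t)

coarse = sample(signal, T_coarse, range(0, 20))
coarse_alias = sample(alias, T_coarse, range(0, 20))
print(all(math.isclose(coarse[n], coarse_alias[n], abs_tol=1e-9) for n in coarse))
```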
The delta in the transform machinery
Where the delta truly earns its keep is inside integral transforms, where the messy switch-and-spike calculus turns into clean algebra. Take the Laplace transform, with its lower limit at 0-, just before the spike, so the whole unit of area is counted. Applying it to delta(t) just samples the kernel e^{-st} at t = 0 by the sifting property, so the Laplace transform of delta is 1, the simplest possible answer. A kick delayed by a > 0, delta(t - a), transforms to e^{-as}. So when you solve a differential equation driven by a sudden impulse, the delta becomes the constant 1 in transform space, you do ordinary algebra, and you invert. For a linear system that starts at rest, the impulse response is literally the inverse transform of the system's transfer function.
- Model the sudden event as a delta. A hammer blow at t = a, a point load at x = a, an idealized current pulse — write it as a (possibly shifted, possibly scaled) delta term in your equation.
- Transform, and let the sifting property collapse it. The delta becomes a clean factor — 1, or e^{-as}, or e^{-i omega a} — and your differential equation becomes an algebraic one in the transform variable.
- Solve the algebra, then invert. The inverse transform carries you back to the time or space response — for a delta input, that response is exactly the impulse response or Green's function you wanted.
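The following sketch applies the three-step recipe to one concrete, hypothetical case: a mass m on a spring of stiffness k, at rest, struck by a hammer blow J delta(t - a). Transforming m y'' + k y = J delta(t - a) with zero initial conditions gives (m s^2 + k) Y(s) = J e^{-as}. With omega = sqrt(k/m), that becomes Y(s) = (J/(m omega)) e^{-as} omega/(s^2 + omega^2). Inverting gives y(t) = (J/(m omega)) sin(omega (t - a)) H(t - a). The minimal Python sketch below evaluates that formula. As an independent check, it brute-forces the same equation with the delta replaced by a narrow rectangular pulse of area J. The parameter values and step sizes are arbitrary choices.

```python
import math

def impulse_formula(t, m, k, J, a):
    # Inverse transform of J e^{-as} / (m s^2 + k): (J/(m w)) sin(w (t - a)) H(t - a)
    w = math.sqrt(k / m)
    return J / (m * w) * math.sin(w * (t - a)) if t >= a else 0.0

def simulate_narrow_pulse(m, k, J, a, eps=1e-3, dt=1e-5, t_end=3.0):
    # Brute force: replace J*delta(t - a) by a rectangle of width eps and area J,
    # then step m y'' + k y = F(t) from rest with semi-implicit Euler.
    i0 = round((a - eps / 2) / dt)   # first step inside the pulse
    n_pulse = round(eps / dt)        # number of steps the pulse lasts
    force = J / (n_pulse * dt)       # chosen so the delivered impulse is exactly J
    y, v = 0.0, 0.0
    for i in range(round(t_end / dt)):
        F = force if i0 <= i < i0 + n_pulse else 0.0
        v += (F - k * y) / m * dt
        y += v * dt
    return y

m, k, J, a = 1.0, 4.0, 0.5, 1.0
exact = impulse_formula(3.0, m, k, J, a)
simulated = simulate_narrow_pulse(m, k, J, a)
print(exact, simulated)
```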
The Fourier side tells the deepest story. With the transform taken as the integral of f(x) e^{-i omega x} dx, the Fourier transform of the Dirac delta is the flat constant 1: the delta contains every frequency in equal measure. That is the precise sense in which a single sharp spike is 'made of all the waves at once,' and it is why one clean impulse is the perfect probe. Feed a system a delta, and you have simultaneously tested it at every frequency. Run the duality the other way and a constant in time transforms to a delta in frequency, encoding the obvious fact that a pure unchanging signal lives at exactly one frequency, zero. The step function joins in too. Differentiating a signal that jumps plants a delta in its derivative. Because that delta's spectrum is flat, the signal's own spectrum can decay only like 1/omega (like 1/n for Fourier series coefficients). That slow decay is why truncated Fourier sums overshoot near a jump by about 9 percent of the jump size. This is the Gibbs phenomenon: the overshoot narrows toward the jump but never shrinks as more terms are added.
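The following minimal Python sketch makes two quick numerical checks of those claims; the function names are hypothetical.

First, the Gaussian pulse (1/(epsilon square root of pi)) e^{-x^2/epsilon^2} has Fourier transform e^{-epsilon^2 omega^2/4}, with the same transform convention as above. That transform flattens toward the constant 1 at every frequency as epsilon shrinks.

Second, consider the square wave sign(sin x), whose Fourier series is (4/pi) times the sum of sin(kx)/k over odd k. Its partial sum through k = 2N - 1 peaks just right of the jump, at x = pi/(2N). The known limit of that peak is (2/pi) Si(pi), about 1.179, an overshoot of roughly 9 percent of the jump of 2.

```python
import math

def gaussian_pulse_spectrum(omega, eps):
    # Closed-form Fourier transform of (1/(eps*sqrt(pi))) * exp(-x^2/eps^2)
    return math.exp(-(eps * omega) ** 2 / 4)

def square_wave_peak(N):
    # Partial Fourier sum of sign(sin x) through harmonic 2N - 1, evaluated at its first
    # maximum x = pi/(2N), where the derivative (proportional to sin(2Nx)/sin x) first vanishes
    x = math.pi / (2 * N)
    return 4 / math.pi * sum(math.sin((2 * j - 1) * x) / (2 * j - 1) for j in range(1, N + 1))

# The spectrum flattens toward 1 at every listed frequency as eps shrinks
for eps in (1.0, 0.1, 0.001):
    print(eps, [round(gaussian_pulse_spectrum(w, eps), 4) for w in (0, 10, 100)])

# The overshoot above the level 1 does not shrink as more terms are added
peaks = {N: square_wave_peak(N) for N in (10, 100, 1000)}
print(peaks)
```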
Handling them safely: what is and isn't allowed
Because the delta is a distribution, not a function, some familiar operations are fine and others are forbidden, and confusing the two is where people go wrong. You may multiply delta by a smooth function (g(x) delta(x - a) = g(a) delta(x - a); it just samples g). You may also integrate it, shift it, scale it, and differentiate it as often as you like. Its derivative delta'(x), the 'dipole,' makes perfect sense. It obeys the rule that the integral of f(x) times delta'(x - a) dx equals minus f'(a), found by one integration by parts; the boundary terms vanish because the spike is zero far from a. What you may NOT do is multiply two deltas together: delta(x) times delta(x), or delta(x)^2, has no meaning. Square the rectangle of width epsilon and height 1/epsilon and you get height 1/epsilon^2 over the same width. Its area is therefore 1/epsilon, which blows up as epsilon goes to 0 instead of settling on any value.
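Both rules can be watched in the limit, with the unit-area Gaussian pulse standing in for delta. The minimal Python sketch below uses hypothetical helper names, and f = sin and a = 0.3 are arbitrary choices. It integrates f against the derivative of the pulse, which should approach -f'(a) = -cos(a). It also measures the area under the square of the pulse. For this Gaussian that area equals 1/(epsilon sqrt(2 pi)) and grows without bound.

```python
import math

def pulse(x, eps):
    # Unit-area Gaussian pulse standing in for delta(x)
    return math.exp(-(x / eps) ** 2) / (eps * math.sqrt(math.pi))

def pulse_prime(x, eps):
    # Exact derivative of pulse with respect to x: a stand-in for the dipole delta'(x)
    return -2 * x / eps ** 2 * pulse(x, eps)

def midpoint(g, lo, hi, n=20_000):
    # Plain midpoint-rule quadrature of g over [lo, hi]
    h = (hi - lo) / n
    return sum(g(lo + (i + 0.5) * h) for i in range(n)) * h

a = 0.3
for eps in (0.1, 0.01, 0.001):
    w = 10 * eps  # the pulse is negligible beyond 10 widths from its centre
    dipole = midpoint(lambda x: math.sin(x) * pulse_prime(x - a, eps), a - w, a + w)
    square_area = midpoint(lambda x: pulse(x, eps) ** 2, -w, w)
    print(f"eps = {eps:<6} dipole integral = {dipole:+.6f} (target {-math.cos(a):+.6f})  "
          f"area of pulse^2 = {square_area:.1f}")
```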
There is also a quiet subtlety in the step itself: what is the value of H(0)? The answer is that it does not matter. Different texts set H(0) to 0, 1, or 1/2, and every integral involving H comes out the same regardless, because changing a function at one isolated point changes no integral. (The symmetric choice 1/2 is the natural one, since it is the average of the two sides — and it is exactly the value a Fourier series converges to at a jump, where the partial sums settle on the midpoint of the gap.) The lesson generalizes: distributions are defined by what they do under integration, so anything that survives integration is real, and anything that does not is a harmless choice of convention.
Step back and see what you have gained. By widening the idea of 'function' to include these idealized objects, calculus from Volume I — the derivative, the integral, the Fundamental Theorem — keeps working on switches, kicks, and point sources that the old rules choked on. You traded a little rigor up front (you must remember they only mean something inside an integral) for an enormous gain in reach: the step and the spike let you write down, cleanly and exactly, the sudden and the concentrated events that fill real physics and engineering. They are not really functions, and that is precisely the point.