Building any signal out of impulses
Imagine you want to know how a concert hall reacts to an entire symphony. That sounds impossibly complicated — millions of overlapping notes. But in rung 2 you discovered a shortcut: clap your hands once (an impulse) and record the reverberation. That single recording, the impulse response h, captures everything the room does. The promise of this rung is breathtaking: if you know how the room answers *one* clap, you know how it answers *any* sound at all. The bridge between the two is convolution, and the first step is a trick for seeing every signal as nothing but a crowd of impulses.
Start in discrete time, where it's cleanest. Take any signal x[n] — a list of numbers, one per time step. Look at the value at, say, n = 4. You can isolate it as a single impulse δ[n−4] scaled by the height x[4]. Do that for every sample and add them all back up, and you've rebuilt the original signal exactly. Nothing was lost; you just rewrote x[n] as a sum of shifted, scaled impulses. That's the whole foundation, written in one line: x[n] = Σ x[k]·δ[n−k], summed over every k.
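To make that one-line identity concrete, here is a minimal Python sketch. The helper names `impulse` and `rebuild_from_impulses` are invented for this illustration, and the signal is assumed to start at n = 0 and have finite length.

```python
def impulse(n, k):
    """Unit impulse delta[n - k]: 1 where n == k, else 0."""
    return 1 if n == k else 0

def rebuild_from_impulses(x):
    """Rebuild x[n] as the sum over k of x[k] * delta[n - k]."""
    N = len(x)
    return [sum(x[k] * impulse(n, k) for k in range(N)) for n in range(N)]

x = [2, 3, 1]  # samples at n = 0, 1, 2
# Each row is one shifted, scaled impulse x[k] * delta[n - k].
terms = [[x[k] * impulse(n, k) for n in range(len(x))] for k in range(len(x))]
print(terms)                     # [[2, 0, 0], [0, 3, 0], [0, 0, 1]]
print(rebuild_from_impulses(x))  # [2, 3, 1]
```

Each row of `terms` is a single impulse standing alone; adding the rows column by column gives back the original list.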
Decomposing x[n] = {2, 3, 1} (samples at n = 0, 1, 2)
   x[n]        2·delta[n-0]   3·delta[n-1]   1·delta[n-2]
 3|   o         3|             3|   o         3|
 2| o |     =   2| o       +   2|   |     +   2|
 1| | | o       1| |           1|   |         1|     o
 0+-+-+-+-      0+-+-+-+-      0+-+-+-+-      0+-+-+-+-
    0 1 2          0 1 2          0 1 2          0 1 2
one signal = a sum of three shifted, scaled impulses
Superposition gives you convolution
Now feed that decomposed signal into the LTI system and let the two magic properties do the work. Watch one impulse first. By definition, the system's answer to δ[n] is the impulse response h[n]. By time-invariance, its answer to a *shifted* impulse δ[n−k] is just the *same* response, shifted: h[n−k]. By homogeneity, scaling the input by x[k] scales the output by x[k] too: the term x[k]·δ[n−k] produces x[k]·h[n−k].
The input is the *sum* of all those impulse terms, so by superposition the output is the *sum* of all those responses. Add them up over every k and you have arrived — with no leap of faith, just L and TI applied honestly — at the convolution sum: y[n] = Σ x[k]·h[n−k]. We write it y = x ∗ h. This is not one formula among many; it is the complete input–output relationship of every LTI system. Hand me x and h, and y is fully determined.
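The argument above can be tried out as a small experiment. The sketch below treats a system as a black box, "claps" once to measure h, and then checks that the convolution sum predicts the output for another input. The system `two_tap_adder` (y[n] = x[n] + x[n−1]) and the function name `convolve` are assumptions chosen for illustration, not anything specific from this lesson.

```python
def two_tap_adder(x):
    """Hypothetical LTI system for illustration: y[n] = x[n] + x[n-1].
    Returns len(x) + 1 samples so the tail of the response is kept."""
    padded = [0] + list(x) + [0]
    return [padded[n + 1] + padded[n] for n in range(len(x) + 1)]

def convolve(x, h):
    """Convolution sum y[n] = sum over k of x[k] * h[n - k], built by superposition."""
    y = [0] * (len(x) + len(h) - 1)
    for k, xk in enumerate(x):      # one impulse x[k] * delta[n - k] at a time
        for m, hm in enumerate(h):  # its response x[k] * h[n - k] lands at n = k + m
            y[k + m] += xk * hm
    return y

h = two_tap_adder([1])    # clap once: the response to a single impulse
x = [1, 2, 3]
print(h)                  # [1, 1]
print(convolve(x, h))     # [1, 3, 5, 3]
print(two_tap_adder(x))   # [1, 3, 5, 3]
```

The last two lines agree: once h is known, the box itself never needs to be run again for this input.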
x[n] ----> [ LTI system, fingerprint = h[n] ] ----> y[n]
the only thing this box can do is:
    y[n] = SUM over k of x[k]·h[n-k]
         = x[n] ∗ h[n]
Continuous-time twin (the integral version):
    y(t) = INTEGRAL of x(tau)·h(t - tau) d tau
         = x(t) ∗ h(t)
The flip-shift-multiply-sum recipe, worked
Look hard at the index in h[n−k]. As k runs forward, the argument n−k runs *backward* — the impulse response appears flipped in time and then shifted to line up with sample n. That gives the famous four-step mechanical recipe. Let's grind through a tiny example by hand so it stops being symbols: convolve x = {1, 2, 3} (at n = 0,1,2) with h = {1, 1} (a two-tap running adder).
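- Flip. Reverse h in time to get h[−k]. For h = {1, 1} the flipped taps sit at k = −1 and k = 0; because this h is symmetric the values look unchanged, but their positions are mirrored. The flipped sequence is what you slide across x.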
- Shift. To compute output sample y[n], slide the flipped h so its reference point lands at position n. Each output sample is one snapshot of this slide.
- Multiply. Wherever the flipped, shifted h overlaps x, multiply the overlapping values pair by pair.
- Sum. Add those products into the single number y[n]. Slide one step and repeat for the next output sample.
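The four steps map almost line for line onto a double loop. The following is a minimal sketch, assuming both sequences start at n = 0; the function name `flip_shift_multiply_sum` and the second, asymmetric test filter h = {1, −1} are additions made here for illustration.

```python
def flip_shift_multiply_sum(x, h):
    """Compute y[n] one output sample at a time using the four-step recipe."""
    y = []
    for n in range(len(x) + len(h) - 1):  # Shift: one slide position per output sample
        total = 0
        for k in range(len(x)):
            m = n - k                     # Flip: h is read backward, as h[n - k]
            if 0 <= m < len(h):           # keep only the overlapping positions
                total += x[k] * h[m]      # Multiply the overlapping pair
        y.append(total)                   # Sum: the accumulated total is y[n]
    return y

print(flip_shift_multiply_sum([1, 2, 3], [1, 1]))    # [1, 3, 5, 3]
print(flip_shift_multiply_sum([1, 2, 3], [1, -1]))   # [1, 1, 1, -3]
```

With the symmetric h = {1, 1} the flip is invisible; the second call uses an asymmetric h, where reading h forward instead of backward would give a different answer.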
x = {1, 2, 3}   h = {1, 1}   y = x ∗ h (length 3+2-1 = 4)
n=0:  x:           1  2  3
      flip h:   1  1            -> overlap (1·1)   = 1  => y[0]=1
n=1:  x:           1  2  3
      flip h:      1  1         -> (1·1)+(2·1)     = 3  => y[1]=3
n=2:  x:           1  2  3
      flip h:         1  1      -> (2·1)+(3·1)     = 5  => y[2]=5
n=3:  x:           1  2  3
      flip h:            1  1   -> (3·1)           = 3  => y[3]=3
y = {1, 3, 5, 3}
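Quick check by shift-scale-add (no flipping):
x[0]·h, shifted by 0:  1·{1,1} -> {1, 1, 0, 0}
x[1]·h, shifted by 1:  2·{1,1} -> {0, 2, 2, 0}
x[2]·h, shifted by 2:  3·{1,1} -> {0, 0, 3, 3}
---------------------------------------------
sum                            = {1, 3, 5, 3} <- same answer, shift-scale-add view
A graphical continuous example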
In continuous time the same recipe becomes a sliding-overlap-area picture, and one example is worth a hundred formulas: convolve a 1-second rectangular pulse with itself. Think of it physically — you flick a system on for one second (the input pulse x), and the system's own impulse response h happens to also be a one-second rectangle (it 'holds' for a second). The output y(t) is the running overlap area as you slide the flipped h past x.
x(t): rect, 1 wide, height 1      h(t): rect, 1 wide, height 1
Slide flipped h across x; y(t) = area of overlap:
  t<0      no overlap yet               y = 0
  t=0      edges just touch             y = 0
  0<t<1    overlap GROWS linearly       y = t       (ramp up)
  t=1      full overlap, area = 1       y = 1       (peak)
  1<t<2    overlap SHRINKS linearly     y = 2 - t   (ramp down)
  t=2      edges part                   y = 0
y(t)
 1 |        /\
   |       /  \
   |      /    \
   |     /      \
 0 |____/________\____ t
        0   1    2
rect ∗ rect = TRIANGLE (each input 1 wide, output 2 wide; peak 1, base 2: wider AND smoother)
Stare at that triangle and you've grasped the deepest intuition convolution offers: convolving with a bump-shaped h smooths and spreads. The abrupt jumps at the edges of the input pulse become ramps, and the output is wider than either input (base 2 = 1 + 1, the continuous counterpart of the length M+N−1 in the discrete case). Replace x with a real audio waveform and h with a room's impulse response and that same overlapping-and-summing is how convolution reverb is computed. Replace them with a row of image pixels and a small bump-shaped h and the very same operation is the blur in your photo editor.
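The triangle can also be checked numerically. The sketch below approximates the convolution integral with a midpoint Riemann sum; it assumes both rectangles occupy 0 ≤ t < 1, and the step size `dt = 0.001` and the names `rect` and `conv_at` are choices made for this illustration.

```python
def rect(t):
    """Unit rectangle: height 1 on 0 <= t < 1, zero elsewhere."""
    return 1.0 if 0.0 <= t < 1.0 else 0.0

def conv_at(t, dt=0.001):
    """Midpoint Riemann sum for y(t) = integral of x(tau) * h(t - tau) d tau,
    with x = h = rect. Only 0 <= tau < 1 can contribute, since x is zero elsewhere."""
    steps = int(round(1.0 / dt))
    total = 0.0
    for i in range(steps):
        tau = (i + 0.5) * dt              # midpoint of the i-th slice
        total += rect(tau) * rect(t - tau)
    return total * dt

for t in [-0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5]:
    print(t, round(conv_at(t), 2))
# rounded values: 0.0, 0.0, 0.5, 1.0, 0.5, 0.0, 0.0 (ramp up, peak at t = 1, ramp down)
```

This is only an approximation of the integral, but with slices this thin it reproduces y = t on the way up and y = 2 − t on the way down.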
From impulse to step, and a glimpse of frequency
Convolution instantly explains a relationship you may have met in rung 2: the step response is the convolution of the impulse response with a unit step. A unit step u[n] is just a string of 1s switched on at n = 0. Convolving h with that string of 1s means, at each output sample, summing every value of h up to that point — convolution with a step is a running accumulation of the impulse response. So the step response is the running sum (in continuous time, the running integral) of h. Differencing the step response gives h back. They are two views of the same fingerprint.
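Both directions of that relationship fit in a few lines of Python. This is a minimal sketch using the standard library's `itertools.accumulate`; the helper names `step_response` and `difference` are invented here, and h is assumed to be zero before n = 0 and after its last tap.

```python
from itertools import accumulate

def step_response(h, length):
    """s[n] = running sum of h, with h treated as zero past its last tap."""
    padded = list(h) + [0] * max(0, length - len(h))
    return list(accumulate(padded))[:length]

def difference(s):
    """Recover h[n] = s[n] - s[n-1], taking s[-1] = 0."""
    return [s[n] - (s[n - 1] if n > 0 else 0) for n in range(len(s))]

h = [1, 1, 1]
s = step_response(h, 5)
print(s)               # [1, 2, 3, 3, 3]
print(difference(s))   # [1, 1, 1, 0, 0]
```

The trailing zeros in the recovered sequence are just h having run out: once the step response is flat, its differences are zero.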
h[n] = {1, 1, 1} (a 3-tap averager, unscaled)
u[n] = {1, 1, 1, 1, ...} (unit step)
step response s[n] = h * u = running sum of h:
s[0] = 1
s[1] = 1+1 = 2
s[2] = 1+1+1 = 3
s[3] = 1+1+1 = 3 (h has run out; sum stays flat)
s[4] = 3 ... -> settles at the DC gain = sum(h) = 3
And to recover h: difference the step response
h[n] = s[n] - s[n-1] = {1, 1, 1} (taking s[-1] = 0) <- back to the fingerprint
There's one last gift, and it's the headline of the rungs to come. Convolution in time is honest but laborious: flip, shift, multiply, sum, for every output sample. The astonishing fact is that this time-domain grind becomes plain multiplication in the frequency domain. Feed a pure sinusoid into an LTI system and out comes a sinusoid of the same frequency, merely rescaled in amplitude and shifted in phase; the table of those rescalings and phase shifts across all frequencies is the frequency response H, which is the Fourier transform of h. So y = x ∗ h in time is Y = X·H in frequency. That is why engineers obsess over the frequency response: it turns the system's single hardest operation into elementary-school arithmetic.
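For finite sequences the claim can be spot-checked with a discrete Fourier transform. The sketch below is a finite stand-in for the Fourier transform in the text, not a general proof: it uses a naive DFT written with `cmath`, zero-pads x and h to the output length so that the DFT product corresponds to ordinary (linear) convolution, and reuses the earlier x = {1, 2, 3}, h = {1, 1} example. The function names are illustrative.

```python
import cmath

def dft(seq):
    """Naive discrete Fourier transform (O(N^2)), enough for a tiny check."""
    N = len(seq)
    return [sum(seq[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

def convolve(x, h):
    """Time-domain convolution sum y[n] = sum over k of x[k] * h[n - k]."""
    y = [0] * (len(x) + len(h) - 1)
    for k, xk in enumerate(x):
        for m, hm in enumerate(h):
            y[k + m] += xk * hm
    return y

x, h = [1, 2, 3], [1, 1]
y = convolve(x, h)                 # [1, 3, 5, 3]
N = len(y)                         # zero-pad both inputs to the output length
X = dft(x + [0] * (N - len(x)))
H = dft(h + [0] * (N - len(h)))
Y = dft(y)
# Y[k] should equal X[k] * H[k] at every frequency bin, up to rounding error.
matches = all(abs(Y[k] - X[k] * H[k]) < 1e-9 for k in range(N))
print(matches)                     # True
```

Practical code would likely use an FFT rather than this O(N²) loop, but the check is the same: transform, multiply bin by bin, and compare.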