Windowing and Spectral Leakage: Cleaning Up the FFT

The disappointment of the first real FFT

After rung 4 you can compute an FFT and read its magnitude spectrum. So you run an experiment that *should* be the easiest possible case: a pure sine wave at, say, 1000 Hz, sampled cleanly, fed into a 1024-point transform. Textbook theory promises a single delta — one bin lit up, everything else dead silent. You hit run, and instead of a needle you get a lumpy hill: the peak is near 1000 Hz, sure, but there are visible skirts spilling 30, 40, 50 dB down into neighbouring bins on both sides. It looks like noise. It is not.

The clue is hiding in *which* frequency you chose. Pick a frequency that lands exactly on a bin centre and the spike comes back, clean as a whistle. Nudge it a fraction of a bin off-centre and the skirts explode back into view. The transform is not lying about your signal — it is honestly reporting on a signal you did not realise you created the moment you grabbed a finite chunk of samples.
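
If you want to watch this happen, here is a minimal sketch, assuming NumPy, a 48 kHz sample rate and a 1024-point block (illustrative values; the helper name `spectrum_db` is made up for this example). It compares a tone parked exactly on bin 21 with a 1000 Hz tone, and counts how many bins land within 60 dB of the peak.

```python
import numpy as np

fs, N = 48_000, 1024                     # assumed sample rate and block length
n = np.arange(N)
bin_hz = fs / N                          # spacing between bin centres

def spectrum_db(freq_hz):
    """Magnitude spectrum of a unit sine, in dB relative to its own peak."""
    mag = np.abs(np.fft.rfft(np.sin(2 * np.pi * freq_hz * n / fs)))
    return 20 * np.log10(np.maximum(mag / mag.max(), 1e-12))

on_bin = spectrum_db(21 * bin_hz)        # exactly 21 cycles in the block
off_bin = spectrum_db(1000.0)            # about 21.33 cycles in the block

# Count the bins that come within 60 dB of the peak.
lit_on = int(np.sum(on_bin > -60))
lit_off = int(np.sum(off_bin > -60))
print("on-bin tone :", lit_on, "bin(s) lit")
print("off-bin tone:", lit_off, "bin(s) lit")
```

The on-bin tone should light a single bin, while the off-bin tone lights a whole crowd of them, even though both inputs are equally pure sines.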

Why a finite block has invisible edges

Here is the mental model that fixes everything. When you take N samples and run a DFT, the maths implicitly assumes those N samples are one period of a signal that repeats forever. The transform tiles your block end-to-end into an infinite loop. If your sine completes a whole number of cycles inside the block, the last sample flows smoothly into the first sample of the next copy — the loop is seamless and you get a clean line. This is called a bin-aligned or coherent frequency.

But a real-world tone almost never lands on a bin centre. Say your 1024-point block at 48 kHz spans about 21.3 ms, and the bin spacing is 48000/1024 ≈ 46.9 Hz. A 1000 Hz tone sits at bin 21.33 — *between* bins. Now the block contains 21.33 cycles. The last sample ends partway up the waveform while the next copy starts at zero, so the seam has a sudden jump, a vertical cliff repeated every block. The transform sees that cliff and faithfully decomposes it. A sharp discontinuity is rich in harmonics, so the energy of your single tone is sprayed across a wide swath of bins. That spray *is* the leakage.

Bin-aligned (5.0 cycles in block)      Off-bin (5.3 cycles in block)

  copy A  |  copy B                       copy A  |  copy B
 /\  /\  /| /\  /\  /\                    /\  /\  /|
/  \/  \/ |/  \/  \/  \                  /  \/  \/ |\
----------+----------                   ---------+ +--------
          ^                                      ^ ^
     seam is SMOOTH                        seam has a JUMP
     -> one clean bin                      -> energy leaks everywhere

Left: an integer number of cycles loops seamlessly. Right: a fractional cycle count leaves a cliff at every block boundary — and the FFT decomposes that cliff into broadband leakage.
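
The arithmetic behind the picture is easy to check. The sketch below assumes the same 48 kHz, 1024-point setup; `cycles_in_block` and `seam_jump` are hypothetical helper names. It works out how many cycles fit in the block and how big the step is between where the sine wanted to go next and where the tiled copy actually restarts.

```python
import numpy as np

fs, N = 48_000, 1024
bin_hz = fs / N                           # 46.875 Hz between bin centres
block_ms = 1e3 * N / fs                   # block duration in milliseconds

def cycles_in_block(freq_hz):
    """Cycles that fit in one block; this is also the fractional bin index."""
    return freq_hz / bin_hz

def seam_jump(freq_hz):
    """Mismatch between the sine's true next sample and the tiled copy's first."""
    x = lambda n: np.sin(2 * np.pi * freq_hz * n / fs)
    return abs(x(N) - x(0))

for f in (21 * bin_hz, 1000.0):
    print(f"{f:8.3f} Hz: {cycles_in_block(f):6.3f} cycles, seam jump {seam_jump(f):.3f}")
```

A whole number of cycles gives a seam jump of essentially zero; a fractional count leaves a step that is a large fraction of the tone's amplitude.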

The rectangular window and its ugly side-lobes

Whether you realise it or not, every plain FFT already applies a window: the rectangular (boxcar) window, which multiplies each in-block sample by 1 and everything outside by 0. The problem is what a box looks like in the frequency domain. Its transform is a sinc-shaped pattern with one tall main lobe flanked by a parade of side-lobes that decay slowly — the first side-lobe of a rectangular window is only about −13 dB below the peak, and the lobes fall off at a lazy 6 dB per octave. Those side-lobes are exactly the skirts you saw.
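
You can measure those numbers rather than take them on trust. This is a rough sketch, assuming NumPy and a heavily zero-padded FFT of an all-ones block so the lobes between bins become visible; the helper `lobe_peak_db` and the choice of lobes are illustrative.

```python
import numpy as np

N, pad = 1024, 16                         # window length, zero-padding factor
mag = np.abs(np.fft.rfft(np.ones(N), N * pad))
db = 20 * np.log10(np.maximum(mag / mag[0], 1e-12))   # dB relative to main-lobe peak
bins = np.arange(mag.size) / pad          # frequency axis in units of DFT bins

def lobe_peak_db(lo, hi):
    """Highest level between lo and hi bins (the boxcar's nulls sit on whole bins)."""
    return db[(bins >= lo) & (bins <= hi)].max()

first = lobe_peak_db(1, 2)                            # first side-lobe
drop = lobe_peak_db(8, 9) - lobe_peak_db(16, 17)      # decay over roughly one octave
print(f"first side-lobe: {first:.1f} dB")
print(f"drop from lobe 8 to lobe 16: {drop:.1f} dB")
```

The first figure should land near −13 dB, and the drop across the doubling in frequency should come out close to 6 dB.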

Why does −13 dB matter so much? Imagine two tones in your signal: a loud one at 1000 Hz and a quiet one at 1200 Hz that is 25 dB weaker. With the 46.9 Hz bins from the earlier example, the two sit only about 4.3 bins apart, and that close in the loud tone's side-lobes, which start at −13 dB and decay slowly, are still only around 22 dB down: *taller than the entire quiet tone*. The quiet tone is buried under the leakage skirt of its loud neighbour and becomes invisible. In audio, RF, vibration and radar work this is a daily hazard: leakage from a strong carrier masks the weak signal you actually care about.
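
To put a number on the masking, the sketch below (assuming NumPy, 48 kHz and a 1024-point block; `level_db_at` is a made-up helper) feeds in the loud 1000 Hz tone on its own and reads the level of its skirt at the bin where the quiet 1200 Hz tone would live. The same measurement is repeated after a Hann taper (`np.hanning`) for comparison.

```python
import numpy as np

fs, N = 48_000, 1024
n = np.arange(N)
loud = np.sin(2 * np.pi * 1000 * n / fs)  # the strong tone only, no quiet tone present

def level_db_at(x, freq_hz):
    """Level at the bin nearest freq_hz, in dB relative to the spectrum's peak."""
    mag = np.abs(np.fft.rfft(x))
    k = int(round(freq_hz * N / fs))
    return 20 * np.log10(mag[k] / mag.max())

skirt_rect = level_db_at(loud, 1200)                   # plain FFT: rectangular window
skirt_hann = level_db_at(loud * np.hanning(N), 1200)   # same tone, tapered first
print(f"skirt at 1200 Hz, rectangular: {skirt_rect:.1f} dB")
print(f"skirt at 1200 Hz, Hann       : {skirt_hann:.1f} dB")
```

With the rectangular window the skirt alone should sit above the −25 dB level of the quiet tone; after tapering it should fall far below it, which is what lets the quiet tone show through.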

Taper the edges: Hann and Hamming windows

The cure follows directly from the diagnosis. The cliff at the seam came from the signal being chopped off abruptly. So instead of multiplying by a hard box, multiply your N samples by a smooth window function that fades to zero at both ends. Now the first and last samples are gently pulled to nothing, the seam in the repeating loop becomes continuous, the cliff vanishes — and with it, most of the leakage. This is windowing: a sample-by-sample multiply, dirt cheap, applied before the FFT.

Two raised-cosine windows dominate everyday practice. The Hann window (often misnamed 'Hanning') is a single cosine bump, w[n] = 0.5·(1 − cos(2πn/(N−1))) for n = 0 … N−1, touching exactly zero at both ends. Its first side-lobe drops to about −31 dB and the lobes roll off fast at 18 dB/octave, which is superb for hunting weak tones. The Hamming window, w[n] = 0.54 − 0.46·cos(2πn/(N−1)), is almost the same shape but does not quite reach zero at the ends: it sits on a pedestal of 0.08. That tiny pedestal all but cancels the nearest side-lobe, leaving the highest remaining side-lobe at about −43 dB, the better peak figure of the two, at the price of a slower 6 dB/octave far-out roll-off.
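
As a quick sanity check, the two formulas can be typed in directly and compared against NumPy's built-ins. This sketch assumes `np.hanning` and `np.hamming` use the same symmetric N−1 convention as the formulas above, which is how NumPy documents them.

```python
import numpy as np

N = 1024
n = np.arange(N)

hann = 0.5 * (1 - np.cos(2 * np.pi * n / (N - 1)))
hamming = 0.54 - 0.46 * np.cos(2 * np.pi * n / (N - 1))

hann_ok = np.allclose(hann, np.hanning(N))
hamming_ok = np.allclose(hamming, np.hamming(N))
print("Hann matches np.hanning      :", hann_ok)
print("Hamming matches np.hamming   :", hamming_ok)
print("Hann end samples             :", hann[0], hann[-1])
print("Hamming end samples (pedestal):", hamming[0], hamming[-1])
```

The end samples make the difference between the two windows concrete: Hann reaches zero, Hamming stops at 0.08.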

# Apply a Hann window before the FFT (Python / NumPy)
import numpy as np

fs     = 48000                     # sample rate in Hz
N      = 1024
sig    = np.sin(2*np.pi*1000*np.arange(N)/fs)   # 1 kHz, off-bin (bin 21.33)

win    = np.hanning(N)             # raised-cosine taper, 0 -> 1 -> 0
spec_r = np.abs(np.fft.rfft(sig))        # rectangular: ugly skirts
spec_w = np.abs(np.fft.rfft(sig*win))    # windowed: skirts collapse

# Coherent-gain fix: a Hann window halves a tone's amplitude
# (mean(win) ~ 0.5; the energy falls further, to ~0.375).
# Scale both spectra to single-sided amplitude, so that a
# bin-centred unit sine would read 1.0 in either one:
spec_r *= 2.0 / N                  # rectangular: sum of the window is N
spec_w *= 2.0 / np.sum(win)        # = 2 / (N * mean(win)), about 4/N

The whole technique is one extra line — multiply by the window — plus a gain correction so peak heights still mean something.

The trade-off you can never escape

Windowing is not free magic; it is a transaction. Tapering the edges suppresses side-lobes, but it also widens the main lobe. A rectangular window has a main lobe ~1 bin wide (measured null to null it is 2 bins). Measured the same null-to-null way, a Hann or Hamming main lobe is 4 bins, *twice* as wide; by the narrower width figures used in the table below, the penalty is closer to 1.4 to 1.5 times. Either way, the very act of killing leakage blurs the peak, and two tones that the rectangular window could *just* resolve may merge into one fat bump under a Hann. You buy dynamic range by spending frequency resolution.
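
The widening is easy to measure. Here is a small sketch, assuming NumPy and a zero-padded FFT of the window itself; `null_to_null_bins` is a hypothetical helper that walks down the main lobe until the level turns back up and reports the full width in bins.

```python
import numpy as np

def null_to_null_bins(win, pad=64):
    """Main-lobe width between its first nulls, in DFT bins."""
    mag = np.abs(np.fft.rfft(win, len(win) * pad))   # zero-pad for a fine frequency grid
    first_null = np.argmax(np.diff(mag) > 0)         # first point where the level rises again
    return 2 * first_null / pad                      # the lobe is symmetric about its peak

N = 1024
rect = null_to_null_bins(np.ones(N))
hann = null_to_null_bins(np.hanning(N))
hamming = null_to_null_bins(np.hamming(N))
print(f"rectangular: {rect:.2f} bins")
print(f"Hann       : {hann:.2f} bins")
print(f"Hamming    : {hamming:.2f} bins")
```

The rectangular window should come out at about 2 bins and both tapered windows at about 4, which is the price of the lower side-lobes.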

Window         Main-lobe width     Highest side-lobe   Side-lobe roll-off
-----------    -----------------   -----------------   ------------------
Rectangular    1.0  bins (best)    -13 dB (worst)       6 dB/oct
Hamming        ~1.4 bins           -43 dB               6 dB/oct
Hann           ~1.5 bins           -31 dB              18 dB/oct (best)
Blackman       ~1.7 bins (worst)   -58 dB              18 dB/oct

           narrow main lobe <----------------> low side-lobes
          (sharp resolution)                  (wide dynamic range)

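Walk down the table and the main lobe widens while the side-lobes, broadly, drop. The one wrinkle is Hamming versus Hann: Hamming has the lower peak side-lobe, but Hann's side-lobes fall away three times faster. There is no window that wins every column, and that is the resolution-versus-leakage trade-off in one picture.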
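
The table's figures can be reproduced approximately from the windows themselves. The sketch below rests on one assumption worth stating loudly: the main-lobe column is treated as the equivalent noise bandwidth, N·Σw²/(Σw)², which is a guess based on the numbers matching and not something the table itself says. The helper names are made up, and `np.blackman` stands in for the Blackman row.

```python
import numpy as np

def enbw_bins(win):
    """Equivalent noise bandwidth in DFT bins: N * sum(w^2) / sum(w)^2."""
    return len(win) * np.sum(win**2) / np.sum(win)**2

def peak_sidelobe_db(win, pad=16):
    """Highest side-lobe in dB relative to the main-lobe peak."""
    mag = np.abs(np.fft.rfft(win, len(win) * pad))
    mag /= mag[0]
    first_null = np.argmax(np.diff(mag) > 0)   # where the main lobe ends
    return 20 * np.log10(mag[first_null:].max())

N = 1024
windows = {
    "Rectangular": np.ones(N),
    "Hamming": np.hamming(N),
    "Hann": np.hanning(N),
    "Blackman": np.blackman(N),
}
results = {name: (enbw_bins(w), peak_sidelobe_db(w)) for name, w in windows.items()}
for name, (enbw, sll) in results.items():
    print(f"{name:<12} width {enbw:4.2f} bins   peak side-lobe {sll:6.1f} dB")
```

Swapping in any other window array is a one-line change, which makes this a handy way to size up a window before committing to it.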

Just looking for tones in a clean signal? Start with Hann — a great all-rounder with fast-decaying side-lobes.
Hunting a weak tone hiding next to a strong one? Reach for a low-side-lobe window like Blackman or a Kaiser, and accept the wider main lobe.
Need to split two tones that are extremely close in frequency? Keep the main lobe narrow — stay rectangular (or near it) and live with the skirts.
Measuring the absolute power of a tone? Window first to stop leakage stealing energy into neighbours, then apply the window's coherent-gain correction.
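
The last recipe in the list can be sketched as follows, assuming NumPy and a test tone placed exactly on bin 21 so the peak bin carries the whole answer. The amplitude 0.7 is an arbitrary illustrative value.

```python
import numpy as np

N = 1024
n = np.arange(N)
amp, k0 = 0.7, 21                         # assumed test amplitude and bin
tone = amp * np.sin(2 * np.pi * k0 * n / N)

win = np.hanning(N)
coherent_gain = win.mean()                # amplitude scaling of the window, ~0.5 for Hann
power_gain = np.mean(win**2)              # energy scaling of the window, ~0.375 for Hann

raw_peak = np.abs(np.fft.rfft(tone * win))[k0]
estimate = raw_peak * 2 / (N * coherent_gain)   # identical to raw_peak * 2 / np.sum(win)
print(f"coherent gain {coherent_gain:.3f}, power gain {power_gain:.3f}")
print(f"recovered amplitude: {estimate:.4f}")
```

The recovered amplitude should match the 0.7 that went in. For a tone that falls between bins the peak bin reads slightly low even after this correction, so treat it as a sketch of the idea and not a calibrated measurement.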

Putting it together: read a spectrum like a pro

Step back and the whole chain makes sense. A continuous wave becomes a discrete-time signal by sampling; sampling already obliged you to respect Nyquist (rung 3). Then you grab a finite block — and that grab silently multiplies by a window. The FFT reports the spectrum of *signal × window*, which in the frequency domain is your true spectrum convolved with the window's response. Every peak you see is the window's shape, parked on top of a real tone. Once you internalise that, a 'messy' spectrum stops being a mystery and becomes a measurement you can read and trust.
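
The "multiply in time, convolve in frequency" claim can be verified numerically. This is a small sketch assuming NumPy, a deliberately short 64-point block and an off-bin tone at 5.3 cycles per block; it builds the circular convolution of the two spectra by brute force and compares it with the FFT of the windowed block.

```python
import numpy as np

N = 64
n = np.arange(N)
x = np.sin(2 * np.pi * 5.3 * n / N)       # off-bin tone: 5.3 cycles per block
w = np.hanning(N)

X = np.fft.fft(x)                         # spectrum of the unwindowed block
W = np.fft.fft(w)                         # the window's own spectrum

# Circular convolution of the two spectra, scaled by 1/N (DFT product rule).
k = np.arange(N)
smeared = np.array([np.sum(X * W[(m - k) % N]) for m in range(N)]) / N

direct = np.fft.fft(x * w)                # what the FFT of the windowed block returns
match = np.allclose(smeared, direct)
print("convolved spectra equal FFT of windowed block:", match)
```

The brute-force loop is only there to make the identity explicit; in practice you multiply by the window in the time domain and let the FFT do the rest.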

This intuition is also the foundation of the next rung. Designing a windowed-sinc lowpass filter is *exactly the same operation* run in reverse: you take an ideal brick-wall response, inverse-transform it to an infinitely long impulse response, then window it to a finite length — and the side-lobes you fought here reappear as ripple in the filter's stopband. Master windows now and FIR filter design will feel like an old friend.
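
As a small preview of that connection, here is a rough sketch assuming NumPy, a 101-tap filter and a cutoff of 0.25 cycles per sample (all illustrative choices, with a hypothetical helper `stopband_peak_db`). It truncates an ideal sinc impulse response two ways, abruptly and with a Hamming taper, and compares the worst stopband level.

```python
import numpy as np

taps, fc = 101, 0.25                      # filter length, cutoff in cycles/sample
n = np.arange(taps) - (taps - 1) / 2      # centre the impulse response on zero
ideal = 2 * fc * np.sinc(2 * fc * n)      # brick-wall impulse response, cut to `taps` samples

def stopband_peak_db(h, f_stop=0.30, nfft=4096):
    """Worst-case level beyond f_stop, in dB (passband gain is about 0 dB)."""
    mag = np.abs(np.fft.rfft(h, nfft))
    freqs = np.fft.rfftfreq(nfft)         # 0 to 0.5 cycles/sample
    return 20 * np.log10(mag[freqs >= f_stop].max())

rect_db = stopband_peak_db(ideal)                         # hard truncation: rectangular window
hamming_db = stopband_peak_db(ideal * np.hamming(taps))   # tapered truncation
print(f"stopband peak, rectangular: {rect_db:.1f} dB")
print(f"stopband peak, Hamming    : {hamming_db:.1f} dB")
```

The tapered version should show a markedly lower stopband floor, the same side-lobe behaviour seen in the spectra above, now showing up as filter ripple.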