Sampling & the Discrete/Fast Fourier Transform

From a curve to a list of numbers

Everything so far in this rung has lived in the world of continuous functions. You met the [[fourier-transform-pair|Fourier transform]] as the limit of a Fourier series when the period stretches to infinity: an integral over all of x turns a signal f(t) into its spectrum F(omega), and a second integral turns it back. Beautiful — but a computer cannot evaluate an integral over all time, because it has never seen f(t) at every instant. It has seen a microphone voltage, or a sensor reading, measured at a tick of a clock: t = 0, then t = T, then t = 2T, and so on. The continuous curve has already been replaced by a finite list of numbers before any mathematics begins.

Call the spacing between ticks the sampling interval T_s, and its reciprocal f_s = 1/T_s the sampling rate — samples per second. A compact disc, for instance, samples at f_s = 44100 hertz: forty-four thousand numbers a second per channel. The central question of this whole guide is disarmingly simple to ask: when you keep only those samples and throw the continuous curve away, have you lost anything? Could two different signals produce the very same list of numbers? The honest, and at first surprising, answer is that under one clean condition you lose absolutely nothing — and when that condition fails, you lose something specific and irreversible.

Sampling is multiplication by a comb

To see what sampling does to the spectrum, model it honestly. Measuring f(t) only at the ticks is the same as multiplying f(t) by a row of infinitely thin spikes, one at each multiple of T_s — a Dirac comb, the sum over all integers n of delta(t - n T_s). Remember from earlier in this rung that the Fourier transform of a single delta is a flat constant, and that the transform of a periodic train of deltas is itself a periodic train of deltas — a comb in time transforms into a comb in frequency, with teeth spaced f_s apart. That one fact is the whole engine of the sampling theorem.

Now invoke the convolution theorem from two guides ago: multiplying two signals in time *convolves* their transforms in frequency. We multiplied f by a comb in time, so in frequency the spectrum F(omega) gets convolved with a frequency comb — and convolving anything with a comb simply stamps a copy of it at every tooth. The picture is irresistibly concrete: the original spectrum F is rubber-stamped over and over along the frequency axis, one copy centered at 0, the next at f_s, the next at 2 f_s, and so on in both directions. Sampling in time *periodicizes* the spectrum. That single image tells you everything about when sampling is safe.

The Nyquist rate, and the crime of aliasing

Suppose the original signal is band-limited: its spectrum F(omega) is exactly zero above some highest frequency B — no energy lives beyond there. Then each rubber-stamped copy of F is a little island of width 2B (it runs from -B to +B), and the copies sit spaced f_s apart. If f_s is large enough, the islands stand cleanly separated with gaps between them, and the original F is still sitting there untouched in the middle, perfectly recoverable: multiply by a box that keeps only the central island, transform back, and you have reconstructed f(t) exactly. No information was lost. This is the heart of the [[sampling-and-nyquist-criterion|sampling theorem]] of Nyquist and Shannon.

How large is large enough? The islands stop overlapping precisely when the spacing f_s is at least twice the highest frequency: f_s >= 2B. That threshold, 2B, is the Nyquist rate — the minimum sampling rate that preserves a signal whose top frequency is B. Equivalently, for a given sampling rate f_s, the highest frequency you can faithfully capture is the Nyquist frequency f_s/2. The compact disc's 44100 hertz was chosen exactly so its Nyquist frequency, 22050 hertz, sits just above the roughly 20000 hertz ceiling of human hearing. The factor of two is not a safety margin or an engineering fudge; it falls straight out of the geometry of those two stamped copies just touching.

What goes wrong below the threshold is aliasing, and it is genuinely irreversible. If f_s < 2B, the islands are too close and their skirts overlap; the high-frequency tail of one stamped copy spills into the band of the next, and at any frequency in the overlap the spectrum is now a sum of two contributions you can no longer separate. A high frequency that has wrapped around is wearing a disguise — it has become the *alias* of a low one. The textbook picture is the wagon wheel in old films that seems to spin backward, or a 7000-hertz tone sampled at 8000 hertz that comes out, indistinguishably, as a 1000-hertz tone. Once two frequencies have landed on the same samples, no later processing can tell them apart, because the samples themselves are identical.

The DFT: a Fourier transform you can actually compute

Suppose we have done it right and now hold N samples, x[0], x[1], ..., x[N-1], taken over one window of time. The continuous transform integrates over all t; we can only sum over the samples we have. So replace the integral by a finite sum and the continuous frequency omega by a discrete grid of N frequencies, and you get the [[discrete-and-fast-fourier-transform|discrete Fourier transform]], the DFT: X[k] = sum over n from 0 to N-1 of x[n] times e^{-i 2 pi k n / N}. This is the literal, computable cousin of the analysis integral, with the same minus sign in the exponent that, by orthogonality, sifts out one frequency component at a time — exactly the trick you saw with the complex exponential Fourier series, now over a finite grid instead of a period.

The DFT quietly assumes your N samples are one period of a periodic signal — it tiles your window end to end to fill all time. That assumption has teeth. If the true signal does not complete a whole number of cycles inside the window, the tiling creates a jump where the window's end meets its repeated beginning, and that artificial discontinuity smears each sharp spectral line into a spread of nearby bins — spectral leakage, the discrete echo of the very jumps that caused the Gibbs phenomenon back in the Fourier-series guides. The standard remedy is to taper the samples smoothly to zero at both ends with a window function (Hann, Hamming, and their kin) before transforming, trading a little blur for the suppression of that fake edge.

The fast Fourier transform: why it changed everything

Look hard at the DFT sum and count the work. Each of the N output values X[k] is a sum of N products, so a direct computation costs about N times N multiply-adds. For a one-second audio clip with N around 44000 that is roughly two billion operations for a single spectrum — and N^2 grows so brutally that an hour of audio would be hopeless. For decades this cost made the DFT a beautiful idea that was mostly too expensive to use at scale. What rescued it is not a different transform but a faster *way to compute the same numbers*: the [[discrete-and-fast-fourier-transform|fast Fourier transform]], the FFT.

The FFT's idea, made famous by Cooley and Tukey in 1965 (and, it turned out, glimpsed by Gauss around 1805), is divide and conquer. Split the N samples into the even-indexed ones and the odd-indexed ones. A short calculation shows the full DFT of length N can be assembled from the DFT of the N/2 evens plus the DFT of the N/2 odds, glued with a single twist factor e^{-i 2 pi k / N} per pair — a step called the butterfly. Each half is again split into halves, and so on, all the way down. Because you halve the problem at each of about log-base-2 of N levels, the total work collapses from N^2 to a number proportional to N log N.

Cost of transforming N samples:

   direct DFT   ~  N * N         operations
   FFT          ~  N * log2(N)   operations

   N = 1,024          DFT ~ 1,000,000     FFT ~ 10,000     ( ~100x )
   N = 1,048,576      DFT ~ 1.1e12        FFT ~ 21,000,000 ( ~50,000x )

Idea (radix-2, Cooley-Tukey):
   X[k]   =  E[k]  +  w^k * O[k]
   X[k+N/2] = E[k] - w^k * O[k]        with  w = e^{-i 2 pi / N}
   where E = DFT of even-indexed samples,
         O = DFT of odd-indexed samples   (each of length N/2)
   recurse until length 1.

The same X[k] values, two costs. Splitting evens from odds and recursing turns N^2 into N log N — a speedup that grows without bound as N grows, which is exactly why the FFT, not the bare DFT, runs underneath modern signal processing.

Do not let the word *fast* mislead you: the FFT computes exactly the same X[k] as the DFT, to the same precision — it is an algorithm, not an approximation. But the change in cost is what made the theory livable. The N log N FFT is what lets your phone decode music, draw a real-time spectrum, run noise cancellation, compress a photo, and demodulate a wireless signal, all while sipping battery. It also speeds up things that look unrelated: because multiplication in one domain is convolution in the other, an FFT, a cheap pointwise multiply, and an inverse FFT will convolve two long signals — or even multiply two enormous integers — far faster than the schoolbook method. When people call the FFT one of the most important algorithms ever written, this is what they mean.

Putting it together: from microphone to spectrum

Step back and the whole pipeline reads as one clean story, each piece a tool you have now met. A continuous signal is band-limited by an analog filter, sampled above its Nyquist rate so the spectral copies do not overlap, chopped into a window and tapered to tame leakage, and then run through an FFT that delivers — cheaply, exactly — the discrete spectrum the continuous Fourier transform was always pointing at. Reverse every arrow with the inverse FFT and a reconstruction filter and you walk back out to sound. The continuous theory of the earlier guides is the dream; this chain is how a real machine lives inside that dream.

Anti-alias. Pass the continuous signal through a low-pass filter that removes everything above the intended Nyquist frequency, so the sampling theorem's band-limited hypothesis actually holds.
Sample. Measure at rate f_s >= 2B (above the Nyquist rate), turning the curve into a list x[0..N-1] with no information lost.
Window. Multiply the N samples by a smooth taper (Hann, Hamming) to suppress the fake jump at the window's seam and so reduce spectral leakage.
Transform. Run the FFT to get X[0..N-1] in N log N time; bin k corresponds to frequency k times f_s / N, and only bins up to N/2 carry new information for a real signal.
Interpret (or invert). Read off magnitudes and phases as the spectrum — or process them and run the inverse FFT, then a reconstruction filter, to return to a continuous signal.

That is the bridge the whole rung was building toward. The continuous Fourier transform told you *what* frequency content means; the sampling theorem told you *when* a finite list of numbers preserves it; the DFT made the idea computable; and the FFT made it cheap enough to run a billion times a day. The same family — the convolution theorem, the comb, the Dirac delta as the limit of a spike — kept reappearing at every step, which is exactly why one analysis course can underwrite an entire engineering discipline. With this in hand, the next guide opens the door to the transform's wider relatives, each tuned to a geometry or a symmetry the plain Fourier transform was not built for.