计算机科学 1965

一种机器计算复数傅里叶级数的算法

詹姆斯·库利与约翰·图基

把本要重复的求和只算一次——傅里叶变换便从 N² 降到 N log N。

Choose your version

In depth · the introduction

正是这个技巧，把一项慢得不可能的计算，变成你的手机一秒钟做上千次的小事——并悄悄地，开启了声音、图像与无线的数字时代。

把这个想法拆开看

傅里叶变换，会拿过一个信号——一段声波、一束无线电脉冲——弄清楚它由哪些纯粹的频率构成。精确做到这一点的「配方」已有几百年，却慢得叫人受不了：要分析 N 个采样，你得做大约 N × N 次乘法。数据翻一倍，工作量就翻四倍。真实信号里的 N 很大，而在 1960 年代初的计算机上，这几乎是没指望的。

库利与图基找到了一条捷径。这项计算里满是重复——同样的小乘积，被一算再算。把信号对半分，再对半分，并复用其中共享的部分，他们就把代价从 N² 降到了大约 N × log N。对一千个采样而言，这是「一百万步」与「几千步」之差。

它从哪里来

1965 年，IBM 研究院的程序员詹姆斯·库利，与普林斯顿的统计学家约翰·图基，在《Mathematics of Computation》上发表了一篇短文。据库利本人后来的回忆，这个想法是在一次美国政府科学顾问委员会的会上，由图基随手勾勒出来的：当时的一个紧要问题，是从微弱的地震仪记录里探测远方的核试验——这意味着要快速地对很长的记录做傅里叶变换。库利写下了程序，论文随之而来。

时机就是一切。同样的数学技巧此前早被发现过——高斯在约 1805 年就有了——却一再被遗忘，因为在快速计算机出现以前，没人迫切地需要它。库利与图基的发表，恰好在机器与问题都已大到使它不可或缺的那一刻——这一次，它留下来了。

它为何重要

几乎任何与声音、图像或无线电有关的数字事物，都在某处倚赖傅里叶变换——而 FFT，正是让它快到「仿佛免费」的那一环。你的音乐能流媒体播放、照片能压缩、手机能和基站对话、医院的扫描仪能把微弱的无线电回声变成一幅图像，背后都有它的份。一套只是把算术重新排布的算法，最终成了现代技术里一面不声不响的承重墙。

一个精确的类比

想象给一场庞大的循环赛计分，每名选手都和其余每名选手交过手。笨办法是把同样的小计，一遍遍重新加。聪明的记分员会把选手分组，每组的小计只算一次，再在所有需要它的地方反复取用。FFT 就是这位记分员：它看出暴力法反复重算的那些和，每一个只算一次，再用区区几轮——log N 轮，而非 N 轮——把它们合起来。

它在故事里的位置

这背后的连续数学，要追溯到约瑟夫·傅里叶——他在 1822 年论证：任何信号都能由简单的波叠加而成（见本馆的《热的解析理论》）。而把这个想法变成计算机能存下的采样，则倚赖香农 1948 年信息论核心处的采样定理。FFT 是那座让这一切落地的桥——六十年过去，它依旧是地球上被运行得最多的算法之一。

The original document

Original source text

J. W. Cooley & J. W. Tukey · Mathematics of Computation 19 (1965), pp. 297–301

Introduction

An efficient method for the calculation of the interactions of a 2^m factorial experiment was introduced by Yates and is widely known by his name.

The paper opens not with signals but with statistics: the same combinatorial bookkeeping that speeds up factorial experiments, generalized by I. J. Good, is what speeds up the Fourier transform, and the authors place their work squarely in that lineage.

In their full generality, Good's methods are applicable to certain problems in which one must multiply an N-vector by an N × N matrix which can be factored into m sparse matrices, where m is proportional to log N.

That sentence is the whole idea: the dense N × N transform matrix is rewritten as a product of about log N sparse matrices, so the matrix–vector multiply costs on the order of N log N operations rather than N².

The algorithm

The body derives the factorization for a composite length N = r₁·r₂·…, re-indexing inputs and outputs in mixed radix so the long sum becomes nested shorter sums tied together by twiddle factors W = e^(−2πi/N). Applied to a power of two, N = 2^m, this is the familiar radix-2 transform of m stages.

[ … ]

Numerical examples

Short numerical sections report measured running times on a contemporary IBM computer, confirming that the operation count grows like N log N in practice and that transforms previously too large to attempt finish in a fraction of a second. The complete five-page paper — derivation, the program's logic, and the timings — is available at the source below.

IBM Research · Mathematics of Computation, April 1965