编码理论 1950

检错码与纠错码

理查德·汉明

几个安插得当的冗余位，让一段信息自己发现错误——并把它改对。

Choose your version

In depth · the introduction

一串 0 和 1，怎么能察觉自己当中有一位被翻了——还在没人盯着的时候，悄悄把它改回来？

把这个想法拆开看

计算机和电话线都说着「比特」——0 和 1——而每隔一阵，就有一位会翻转：一个 0 变成了 1。光秃秃地把比特发出去，接收方无从知晓。经典的补救，是一个奇偶校验位：多加一位，让 1 的个数凑成偶数；这样单个翻转会让它变奇，从而拉响警报。可警报只说「出事了」——不说是哪一位，也没法把它修好。

汉明的飞跃，是不止多加一位，而是加上几位，并安排得使每一位各自看守一组「彼此重叠」的位置。当某一位翻转时，那几个被触发的警报，其特定的组合，会用二进制「拼」出出错位的确切位置。知道了位置，接收方只需把它翻回去。信息便自愈了——不必重发，也不必停机。

它从哪里来

理查德·汉明在贝尔实验室工作，与克劳德·香农同处一隅。实验室那些机械式的继电器计算机，会在周末无人看管时跑他的作业——而只要有一个零件读错一位，整批运算便告中止，他星期一回来，便一无所获。他想：机器既然能看出「发生了错误」，为什么不能算出「错误在哪里」并纠正它？正是那份周末的懊恼，催生了大约 1947 年的第一批纠错码；论文因专利审查，被压到了 1950 年。

它为何重要

1948 年，香农刚刚证明：在有噪声的信道上，无差错通信在原理上是可能的——但他的证明没说该怎么造出来。汉明给出了「怎么造」：一套具体、能跑的码，所需的不过是数一数奇偶。这把可靠的数字存储与通信，从一条定理，变成了一个能用的零件；此后每一套纠错系统，都能追溯回这一步。

一个可以这样想的画面

想象在一条很差的电话线上，用「北约字母表」拼一个词——「Alpha、Bravo、Charlie」。哪怕有一个音节被搅糊了，这些代号彼此差得太远，听者仍能听出原本是哪一个。汉明码对比特模式做的是同一件事：合法码字被刻意拉得彼此「很远」，于是翻转任意一位，结果仍然离那唯一正确的信息最近——接收方便能把它啪地拨回去。

它在何处落座

这是香农信息论的「构造版」搭档，本馆中亦有收录。香农（1948）量出了一条信道能承载多少可靠通信；汉明（1950）则造出了第一批真正去承载它的码。这条线，一路延伸到里德—所罗门码，再到如今在你手机里紧贴着香农极限的 LDPC 码与极化码。

The original document

Original source text

R. W. Hamming · The Bell System Technical Journal 29(2): 147–160 · April 1950

Introduction

The author was led to the study given in this paper from a consideration of large scale computing machines in which a large number of operations must be performed without a single error in the end result.

The paper opens on this practical worry: a long machine computation is worthless if a single bit goes wrong, and a plain parity check can only sound an alarm and stop. Hamming sets out to do better — to build codes that not only detect an error but locate and correct it automatically, so the machine can keep running.

A geometric model

He recasts the problem geometrically. Each possible message of n binary digits is a point; the 'distance' between two such points is the number of positions in which they differ. An error displaces a point a small distance from the one that was sent. Choosing a code therefore means scattering the legal codewords so that the distance between any two of them is large enough that a corrupted word still sits unambiguously nearest to the one intended.

Single-error detecting and correcting codes

A minimum distance of 2 lets you detect a single error (a single flip can never reach another codeword) but not place it; a minimum distance of 3 lets you correct any single error. Hamming derives how much redundancy this costs — for a single-error-correcting code the number of check digits must satisfy 2^k ≥ n + 1 — and notes that the redundancy grows only slowly as the block lengthens.

[ … ]

A worked example: the (7,4) code

His showcase carries four message digits in a block of seven, with three check digits placed at the power-of-two positions 1, 2 and 4. Each check digit enforces parity over a fixed group of positions; on reception, the pattern of failed checks, read as a binary number, is exactly the position of the wrong digit, which is then corrected. The paper closes by adding one further check digit to also detect double errors — the construction now built into computer memory.

Bell Telephone Laboratories · 1950