Coding Theory 1950

Error Detecting and Error Correcting Codes

Richard W. Hamming

A few well-placed extra bits let a message catch its own error — and fix it.

Choose your version

In depth · the introduction

How can a stream of 0s and 1s notice that one of them got flipped — and quietly put it right, with nobody watching?

The idea, unpacked

Computers and phone lines speak in bits — 0s and 1s — and now and then a bit flips: a 0 becomes a 1. Send the bits bare and the receiver has no way to know. The classic fix is a single parity bit, one extra bit that makes the number of 1s come out even; a single flip then makes it odd and rings an alarm. But an alarm only says that something broke — not which bit, and it can't fix it.

Hamming's leap was to use not one extra bit but a few, arranged so that each one watches a different, overlapping group of positions. When a bit flips, the particular combination of alarms that go off spells out, in binary, the exact position of the broken bit. Knowing where it is, the receiver simply flips it back. The message heals itself — no resend, no halt.

Where it came from

Richard Hamming worked at Bell Labs, down the hall from Claude Shannon. The lab's mechanical relay computers ran his jobs over the weekend, unattended — and if a single part misread a bit, the whole run aborted, so he'd arrive on Monday to nothing. If the machine can tell that an error happened, he reasoned, why can't it work out where the error is and correct it? Out of that weekend frustration, around 1947, came the first error-correcting codes; patent reviews held the paper back until 1950.

Why it mattered

In 1948 Shannon had just proved that error-free communication over a noisy channel was possible in principle — but his proof didn't say how to build it. Hamming delivered the how: a concrete, runnable code that needed nothing fancier than counting odds and evens. That turned reliable digital storage and communication from a theorem into a working part, and every error-correcting system since traces back to this step.

A way to picture it

Think of spelling a word over a bad phone line with the NATO alphabet — 'Alpha, Bravo, Charlie.' Even if a syllable gets garbled, the code words are so unlike one another that the listener can still tell which was meant. Hamming codes do the same with bit-patterns: the valid codewords are kept so far apart that flipping any one bit still leaves the result closest to the single true message, so the receiver can snap it back.

Where it sits

This is the constructive companion to Shannon's information theory, also in this Library. Shannon (1948) measured how much reliable communication a channel can carry; Hamming (1950) built one of the first codes to actually carry it. The line runs onward to Reed–Solomon codes and to the LDPC and polar codes that now press right up against Shannon's limit inside your phone.

The original document

Original source text

R. W. Hamming · The Bell System Technical Journal 29(2): 147–160 · April 1950

Introduction

The author was led to the study given in this paper from a consideration of large scale computing machines in which a large number of operations must be performed without a single error in the end result.

The paper opens on this practical worry: a long machine computation is worthless if a single bit goes wrong, and a plain parity check can only sound an alarm and stop. Hamming sets out to do better — to build codes that not only detect an error but locate and correct it automatically, so the machine can keep running.

A geometric model

He recasts the problem geometrically. Each possible message of n binary digits is a point; the 'distance' between two such points is the number of positions in which they differ. An error displaces a point a small distance from the one that was sent. Choosing a code therefore means scattering the legal codewords so that the distance between any two of them is large enough that a corrupted word still sits unambiguously nearest to the one intended.

Single-error detecting and correcting codes

A minimum distance of 2 lets you detect a single error (a single flip can never reach another codeword) but not place it; a minimum distance of 3 lets you correct any single error. Hamming derives how much redundancy this costs — for a single-error-correcting code the number of check digits must satisfy 2^k ≥ n + 1 — and notes that the redundancy grows only slowly as the block lengthens.

[ … ]

A worked example: the (7,4) code

His showcase carries four message digits in a block of seven, with three check digits placed at the power-of-two positions 1, 2 and 4. Each check digit enforces parity over a fixed group of positions; on reception, the pattern of failed checks, read as a binary number, is exactly the position of the wrong digit, which is then corrected. The paper closes by adding one further check digit to also detect double errors — the construction now built into computer memory.

Bell Telephone Laboratories · 1950