Information / CS 1952

A Method for the Construction of Minimum-Redundancy Codes

David A. Huffman

Give common symbols short codes, rare ones long — and you reach the shortest code of all.

Choose your version

In depth · the introduction

What if you could squeeze a message smaller for free, just by giving your most common letters the shortest codes — the way Morse gave “E” a single dot?

The big idea

A computer stores everything as bits — 0s and 1s. The lazy way is to give every letter the same number of bits, say eight for each. But letters are not equally common: in English, “e” and space appear constantly while “q” and “z” are rare. So why spend the same room on each?

Huffman's method hands out short codes to frequent symbols and long codes to rare ones, in just the right proportion to make the average message as short as it can possibly be. And there is a beautiful guarantee: no other code that can be read back unambiguously can do better. It is not a clever trick that usually works — it is provably the best.

How it came about

In 1951 David Huffman was a graduate student at MIT, taking an information-theory course from Robert Fano — a colleague of Claude Shannon, who three years earlier had founded the whole field. Fano offered his students a choice: sit the final exam, or write a term paper solving an open problem. The problem was exactly this one: find the most efficient such code. Fano and Shannon had each found good methods, but nobody had proved one was best.

Huffman struggled for weeks and had nearly given up — about to throw his notes away — when the key idea arrived: build the code from the bottom up, starting by merging the two rarest symbols, instead of splitting from the top down as his teachers had. That inversion was the whole solution. The student had solved what his famous professor could not.

Why it mattered

Huffman coding became one of the most widely used algorithms ever written. Every time you open a ZIP file, view a JPEG photo, or play an MP3, a Huffman tree is quietly at work, packing symbols into the fewest bits. Because it is simple, fast, and provably optimal among codes of its kind, it became the default building block of compression — and it still is, seventy years on.

A way to picture it

Think of packing a suitcase. The things you use every day — phone, toothbrush — you want right on top, easy and quick to reach; the things you rarely touch can go to the bottom, harder to get at. Huffman does the same with symbols: the ones you “reach for” most often get the shortest codes, the rarities sink to the deepest, longest paths. Arrange it just so, and the whole suitcase packs as tightly as it possibly can.

Where it sits

Huffman stands on Claude Shannon, whose 1948 theory (also in this Library) defined entropy — the true floor on how few bits a message needs — and proved that floor can be approached. Shannon and Fano found codes that came close; Huffman found the one that reaches the best a symbol-by-symbol code can. After him came arithmetic coding and, more recently, ANS, which press still closer to Shannon's limit. From the telegraph through ZIP to streaming video, this is the lineage of squeezing more meaning into fewer bits.

The original document

Original source text

Summary & the problem

David A. Huffman · Proceedings of the IRE, vol. 40, no. 9 · September 1952 · pp. 1098–1101 (manuscript received December 6, 1951)

An optimum method of coding an ensemble of messages consisting of a finite number of members is developed. A minimum-redundancy code is one constructed in such a way that the average number of coding digits per message is minimized.

One important method of transmitting messages is to transmit in their place sequences of symbols. If there are more messages which might be sent than there are kinds of symbols available, then some of the messages must use more than one symbol.

Probably the most familiar ensemble code was stated in the phrase “one if by land and two if by sea.” In this case, the message ensemble consisted of the two individual messages “by land” and “by sea,” and the message codes were “one” and “two.”

The two basic restrictions

The following basic restrictions will be imposed on an ensemble code: (a) No two messages will consist of identical arrangements of coding digits. (b) The message codes will be constructed in such a way that no additional indication is necessary to specify where a message code begins and ends once the starting point of a sequence of messages is known.

Restriction (b) necessitates that no message be coded in such a way that its code appears, digit for digit, as the first part of any message code of greater length.

C. E. Shannon and R. M. Fano have developed ensemble coding procedures … Their coding procedures are not optimum, but approach the optimum behavior when N approaches infinity. … However, up to the present time, no definite procedure has been suggested for the construction of such a code to the knowledge of the author. It is the purpose of this paper to derive such a procedure.

The optimum binary procedure

Derived coding requirements

For an optimum code, the length of a given message code can never be less than the length of a more probable message code. If this requirement were not met, then a reduction in average message length could be obtained by interchanging the codes for the two messages in question.

Optimum binary code

Restriction (c) makes it necessary that the two least probable messages have codes of equal length. … It will be necessary to assign these two message codes to the Nth and the (N–1)st messages. … Once this has been done, these two messages are equivalent to a single composite message. … Its probability will be the sum of the probabilities of the two messages from which it was created. The ensemble containing this composite message in the place of its two component messages will be called the first auxiliary message ensemble.

The procedure is applied again and again until the number of members in the most recently formed auxiliary message ensemble is reduced to two. One of each of the binary digits is assigned to each of these two composite messages. These messages are then combined to form a single composite message with probability unity, and the coding is complete.

[ … ]

In the worked example (Table II), an ensemble of N = 13 messages is coded; the resulting average message length is Lav = 3.42 binary digits per message — below the four bits a fixed-length code would need, and close to the source entropy.

Generalization & acknowledgments

Optimum coding of an ensemble of messages using three or more types of digits is similar to the binary coding procedure. … in order to satisfy restriction (e), it will be required that all these brackets, with the possible exception of one combining the least probable messages of the original ensemble, always combine a number of messages equal to D.

The author is indebted to Dr. W. K. Linvill and Dr. R. M. Fano, both of the Massachusetts Institute of Technology, for their helpful criticism of this paper.