摘要与问题
David A. Huffman · Proceedings of the IRE, vol. 40, no. 9 · September 1952 · pp. 1098–1101 (manuscript received December 6, 1951)
An optimum method of coding an ensemble of messages consisting of a finite number of members is developed. A minimum-redundancy code is one constructed in such a way that the average number of coding digits per message is minimized.
One important method of transmitting messages is to transmit in their place sequences of symbols. If there are more messages which might be sent than there are kinds of symbols available, then some of the messages must use more than one symbol.
Probably the most familiar ensemble code was stated in the phrase “one if by land and two if by sea.” In this case, the message ensemble consisted of the two individual messages “by land” and “by sea,” and the message codes were “one” and “two.”
两条基本限制
The following basic restrictions will be imposed on an ensemble code: (a) No two messages will consist of identical arrangements of coding digits. (b) The message codes will be constructed in such a way that no additional indication is necessary to specify where a message code begins and ends once the starting point of a sequence of messages is known.
Restriction (b) necessitates that no message be coded in such a way that its code appears, digit for digit, as the first part of any message code of greater length.
C. E. Shannon and R. M. Fano have developed ensemble coding procedures … Their coding procedures are not optimum, but approach the optimum behavior when N approaches infinity. … However, up to the present time, no definite procedure has been suggested for the construction of such a code to the knowledge of the author. It is the purpose of this paper to derive such a procedure.
最优二进制步骤
Derived coding requirements
For an optimum code, the length of a given message code can never be less than the length of a more probable message code. If this requirement were not met, then a reduction in average message length could be obtained by interchanging the codes for the two messages in question.
Optimum binary code
Restriction (c) makes it necessary that the two least probable messages have codes of equal length. … It will be necessary to assign these two message codes to the Nth and the (N–1)st messages. … Once this has been done, these two messages are equivalent to a single composite message. … Its probability will be the sum of the probabilities of the two messages from which it was created. The ensemble containing this composite message in the place of its two component messages will be called the first auxiliary message ensemble.
The procedure is applied again and again until the number of members in the most recently formed auxiliary message ensemble is reduced to two. One of each of the binary digits is assigned to each of these two composite messages. These messages are then combined to form a single composite message with probability unity, and the coding is complete.
[ … ]
In the worked example (Table II), an ensemble of N = 13 messages is coded; the resulting average message length is Lav = 3.42 binary digits per message — below the four bits a fixed-length code would need, and close to the source entropy.
推广与致谢
Optimum coding of an ensemble of messages using three or more types of digits is similar to the binary coding procedure. … in order to satisfy restriction (e), it will be required that all these brackets, with the possible exception of one combining the least probable messages of the original ensemble, always combine a number of messages equal to D.
The author is indebted to Dr. W. K. Linvill and Dr. R. M. Fano, both of the Massachusetts Institute of Technology, for their helpful criticism of this paper.