Difference between revisions of "Huffman codes"

Revision as of 03:23, 15 March 2021

Construction

Optimality

The optimality of the Huffman code hinges upon the following two properties of optimal prefix-free codes:

Shorter codewords are assigned to more probable symbols.
There exists an optimal code in which the codewords assigned to the two smallest probabilities are siblings.

Let us prove the first statement by contradiction. Write $l_{i}$ for the length of the codeword assigned to symbol $x_{i}$ and $p_{i}$ for its probability. Suppose instead that an optimal code has $l_{1}<l_{2}$ for some symbols $x_{1},x_{2}$ with $p_{1}<p_{2}$. In the expected length, $x_{1}$ and $x_{2}$ contribute the term $p_{1}l_{1}+p_{2}l_{2}$. We show that the term obtained by swapping the two codewords, $p_{1}l_{2}+p_{2}l_{1}$, is strictly lower:

$(p_{1}l_{2}+p_{2}l_{1})-(p_{1}l_{1}+p_{2}l_{2})=(p_{1}-p_{2})(l_{2}-l_{1})<0,$

since $p_{1}<p_{2}$ and $l_{1}<l_{2}$ . This contradicts the supposed optimality of the code, and so it must be the case that $l_{1}\geq l_{2}$ whenever $p_{1}<p_{2}$ .
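The swap identity can be checked numerically. The following is a minimal sketch; the function name `swap_gain` and the sample probabilities and lengths are illustrative assumptions, not part of the original argument.

```python
from fractions import Fraction

def swap_gain(p1, p2, l1, l2):
    """Return (cost after swapping the two codewords) - (original cost)."""
    original = p1 * l1 + p2 * l2
    swapped = p1 * l2 + p2 * l1
    return swapped - original

# Hypothetical values with p1 < p2 and l1 < l2, i.e. the "bad" assignment.
p1, p2 = Fraction(1, 10), Fraction(4, 10)
l1, l2 = 1, 3

gain = swap_gain(p1, p2, l1, l2)
# The difference factors exactly as in the proof.
assert gain == (p1 - p2) * (l2 - l1)
print(gain)  # -3/5
```

A negative value means the swapped assignment has a strictly smaller expected length, which is the contradiction used above.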

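We only outline the proof for the second statement. First, note that in an optimal code the longest codeword must have a sibling. Otherwise, we could "trim" the tree by moving that codeword up one level, which shortens it while keeping the code prefix-free. Next, by the first statement, the two least probable symbols can be taken to have codewords of maximal length, so an exchange argument among the codewords of maximal length makes them siblings without changing the expected length.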
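Both properties can be observed on a small example. The sketch below builds a binary Huffman code with Python's `heapq`; the function name `huffman_code` and the four-symbol distribution are assumptions chosen for illustration, and the construction shown is the standard merge-the-two-smallest procedure rather than anything specific to this page.

```python
import heapq
from fractions import Fraction
from itertools import count

def huffman_code(probs):
    """Build a binary Huffman code; returns {symbol: codeword string}."""
    tiebreak = count()  # keeps heap entries comparable when probabilities tie
    heap = [(p, next(tiebreak), {sym: ""}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)  # least probable subtree
        p1, _, c1 = heapq.heappop(heap)  # second least probable subtree
        # Merging makes the two subtrees siblings under a new parent node.
        merged = {s: "0" + w for s, w in c0.items()}
        merged.update({s: "1" + w for s, w in c1.items()})
        heapq.heappush(heap, (p0 + p1, next(tiebreak), merged))
    return heap[0][2]

# Hypothetical source distribution.
probs = {"a": Fraction(4, 10), "b": Fraction(3, 10),
         "c": Fraction(2, 10), "d": Fraction(1, 10)}
code = huffman_code(probs)

# Property 1: a strictly less probable symbol never gets a shorter codeword.
for s in probs:
    for t in probs:
        if probs[s] < probs[t]:
            assert len(code[s]) >= len(code[t])

# Property 2: the two least probable symbols differ only in their last digit.
assert code["c"][:-1] == code["d"][:-1] and code["c"] != code["d"]
print(code)
```

For this distribution the two least probable symbols, `c` and `d`, end up at the deepest level of the tree and share a parent.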

Achieving Shannon's limit

Using a Huffman code to encode every single symbol from ${\mathcal {X}}$ can incur a large penalty, since the expected length can be almost 1 $D$-ary digit above the entropy bound (using the base-$D$ logarithm). If we instead group symbols together $n$ at a time, we can approach the Shannon limit to arbitrary precision.

The idea behind achieving Shannon's compression limit is to use a larger source alphabet. If we take $n$ symbols at a time and create a Huffman code, we are essentially using the larger input alphabet ${\mathcal {X}}^{n}$. Using the bounds for Huffman codes, the expected length $L_{n}$ satisfies

$H_{D}(X^{n})\leq L_{n}<H_{D}(X^{n})+1$ .

Here, $L_{n}$ accounts for the encoding of a block of $n$ symbols. Dividing the inequality by $n$ gives a bound on the expected length per source symbol:

${\frac {1}{n}}H_{D}(X^{n})\leq {\frac {1}{n}}L_{n}<{\frac {1}{n}}[H_{D}(X^{n})+1]$ .

Since we are encoding $n$ independent and identically distributed symbols at a time, the joint entropy is $H_{D}(X^{n})=nH_{D}(X)$. Hence,

$H_{D}(X)\leq {\frac {1}{n}}L_{n}<H_{D}(X)+{\frac {1}{n}}$ ,

where the upper bound can be made to approach $H_{D}(X)$ by letting $n$ be sufficiently large.
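The convergence can be observed numerically for a small binary source. The sketch below is illustrative only: the source distribution `[0.9, 0.1]`, the block lengths, and the helper names are assumptions. It relies on the standard fact that the expected length of a Huffman code equals the sum of the probabilities of all merged nodes, and it uses `math.prod`, which requires Python 3.8 or later.

```python
import heapq
from itertools import product
from math import log2, prod

def huffman_expected_length(probs):
    """Expected codeword length (in bits) of a binary Huffman code."""
    heap = list(probs)
    heapq.heapify(heap)
    total = 0.0
    while len(heap) > 1:
        p0 = heapq.heappop(heap)
        p1 = heapq.heappop(heap)
        # Each merge adds one digit to every codeword below the new node.
        total += p0 + p1
        heapq.heappush(heap, p0 + p1)
    return total

def per_symbol_length(source, n):
    """L_n / n for blocks of n i.i.d. symbols drawn from `source`."""
    block_probs = [prod(block) for block in product(source, repeat=n)]
    return huffman_expected_length(block_probs) / n

source = [0.9, 0.1]  # hypothetical skewed binary source
entropy = -sum(p * log2(p) for p in source)

for n in (1, 2, 4, 8):
    rate = per_symbol_length(source, n)
    # H(X) <= L_n / n < H(X) + 1/n, up to floating-point rounding.
    assert entropy - 1e-9 <= rate < entropy + 1 / n
    print(n, round(rate, 4), round(entropy, 4))
```

Note that the number of blocks is $2^{n}$ here, so even this toy example becomes expensive as $n$ grows.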

The proof above is representative of many arguments in information theory that take the blocklength $n$ to arbitrarily large values. In that respect, we can look at $H(X)$ as an asymptotic compression limit. However, it is also important to keep in mind that directly applying the above construction is very impractical. If we take the English alphabet, with 26 letters, and group them five at a time, then the effective alphabet blows up to $26^{5}=11{,}881{,}376$ blocks, well above 10 million. Fortunately, more modern coding schemes exist that approach the entropy bound without the heavy exponential overhead. The interested student is encouraged to look up "arithmetic coding."

@@ Line 3: / Line 3: @@
 == Optimality ==
-The optimality of the Huffman code hinges on the following two properties of optimal prefix-free codes:
+The optimality of the Huffman code hinges upon the following two properties of optimal prefix-free codes:
-*
+* Shorter codewords are assigned to more probable symbols.
-*
+* There exists an optimal code in which the codewords assigned to the two smallest probabilities are siblings.
+Let us prove the first statement by contradiction. Let us suppose that <math>l_1 < l_2</math> instead for some symbols <math>x_1,x_2</math> such that <math> p_1 < p_2</math>. If we calculate the expected length due to all symbols, <math>x_1</math> and <math>x_2</math> will contribute the term <math>p_1l_1 + p_2 l_2</math>. We proceed by showing the term obtained by swapping the assignment, <math>p_1l_2 + p_2l_1</math>, is strictly lower.
+<math>(p_1l_2 + p_2l_1) - (p_1l_1 + p_2l_2) = (p_1 - p_2)(l_2 - l_1) <0,</math>
+since <math>p_1 < p_2</math> and <math>l_1 < l_2</math>. This contradicts the supposed optimality of the code, and so it must be the case that <math>l_1 \ge l_2</math> whenever <math> p_1 < p_2</math>.
+We only outline the proof for the second statement. First, we must note that there must be at least two codewords which are siblings of each other. Otherwise, we can "trim" the tree by moving up a codeword by one level. Next, we can use the first statement to perform an exchange argument that will give the desired property.
 == Achieving Shannon's limit ==
