Uniquely Decodable Codes

In this article, we prove an important inequality which characterizes uniquely decodable codes and derive the entropy bound which places a lower bound on the achievable compression with such codes.

The Kraft-McMillan Inequality

The Kraft-McMillan inequality provides (1) a necessary condition for unique decodability, and (2) a sufficient condition for the existence of uniquely decodable codes.

(Necessity.) Let ${\mathcal {C}}$ be a $D$ -ary uniquely decodable code whose codewords have lengths $l_{1},l_{2},...,l_{m}$ . Then, $\sum _{i}D^{-l_{i}}\leq 1.$

(Sufficiency.) Let $l_{1},l_{2},...,l_{m}$ be a sequence of positive integers such that $\sum _{i}D^{-l_{i}}\leq 1$ . Then, there exists a uniquely decodable $D$ -ary code ${\mathcal {C}}$ whose codewords have lengths $l_{1},l_{2},...,l_{m}$ .

Both statements remain true if we replace "uniquely decodable" by "instantaneous." Necessity carries over because every instantaneous (prefix-free) code is uniquely decodable, and sufficiency carries over because the code we construct in the proof is itself prefix-free. Let us first prove the statements for prefix-free codes, and then discuss the more general case of uniquely decodable codes.
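The sum in both statements is easy to evaluate for a concrete list of lengths. The following is a minimal sketch; the function names `kraft_sum` and `satisfies_kraft` are illustrative choices rather than part of the article, and exact fractions are used to avoid floating-point rounding at the boundary case where the sum equals 1.

```python
from fractions import Fraction

def kraft_sum(lengths, D=2):
    """Return the sum of D**(-l) over the given codeword lengths, as an exact fraction."""
    return sum(Fraction(1, D ** l) for l in lengths)

def satisfies_kraft(lengths, D=2):
    """True if the lengths satisfy the Kraft-McMillan inequality for a D-ary alphabet."""
    return kraft_sum(lengths, D) <= 1

print(kraft_sum([1, 2, 3, 3]))       # 1/2 + 1/4 + 1/8 + 1/8
print(satisfies_kraft([1, 1, 2]))    # 1/2 + 1/2 + 1/4 exceeds 1
```

By the necessity statement, no uniquely decodable binary code can have the lengths (1, 1, 2) used in the second call.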

Proof for Prefix-Free Codes

Prefix-free codes can be visualized as a $D$ -ary tree, in which all non-leaf nodes have at most $D$ children. In the figure below, we have a binary (2-ary) prefix-free code where each "left turn" corresponds to 0 and each "right turn" corresponds to 1.

Since no codeword can be a prefix of another codeword, the following (equivalent) properties must be satisfied:

no codeword can be a descendant of another codeword, and
no two codewords can share a descendant in the full binary tree.

The first property is a direct consequence of being prefix-free. Essentially, it tells us that once we have assigned a node in the tree as a codeword, all of its descendants in the full binary tree are denied the possibility of being a codeword. The figure illustrates this property using the dashed circles below solid black nodes. The second property requires a bit more thought: if a tree node has codewords A and B as ancestors, then either A is a descendant of B or B is a descendant of A, which would violate the first property. The proof of this fact is left as an exercise.
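The first property can be checked mechanically when codewords are written as strings of digits. The sketch below is one way to do it, not a construction taken from the article; the name `is_prefix_free` and the two example codes are hypothetical. It relies on the fact that, after lexicographic sorting, a codeword that is a prefix of any other codeword is also a prefix of its immediate successor.

```python
def is_prefix_free(codewords):
    """True if no codeword is a prefix of (or equal to) another codeword."""
    words = sorted(codewords)
    # After sorting, any prefix relation shows up between adjacent entries.
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))

print(is_prefix_free(["0", "10", "110", "111"]))   # no codeword lies below another
print(is_prefix_free(["0", "01", "011", "111"]))   # "0" is an ancestor of "01"
```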

We are now ready to prove the necessity condition for prefix-free codes. Let $l_{\text{max}}$ be the maximum codeword length in $l_{1},...,l_{m}$ . Construct a full $D$ -ary tree with depth $l_{\text{max}}$ . At the lowest level, there are $D^{l_{\text{max}}}$ leaf nodes. For example, in the figure we have $D=2$ and $l_{\text{max}}=4$ , which corresponds to $2^{4}=16$ leaf nodes.

Each codeword will have some descendants at the lowest level. In the figure, the codeword 01 is at level 2 and it has 4 descendants at the lowest level. In general, a $D$ -ary codeword at level $l_{i}$ will have $D^{l_{\text{max}}-l_{i}}$ descendants at the lowest level, $l_{\text{max}}$ . (If the codeword is itself at the lowest level, then we count that codeword as its own descendant.) Now, since no two codewords can share a descendant, we can obtain the total number of distinct descendants by simply adding them up. The resulting total can be at most the number of leaf nodes, which is $D^{l_{\text{max}}}$ :

$\sum _{i}D^{l_{\text{max}}-l_{i}}\leq D^{l_{\text{max}}}$ .

Dividing both sides by $D^{l_{\text{max}}}$ gives us Kraft's inequality. We have just proved the necessity condition.
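The counting argument above can be replayed on a small example. This is a sketch under stated assumptions: the code `["0", "10", "110", "111"]` is a hypothetical binary prefix-free code (not the one in the figure), the helper name `lowest_level_descendants` is illustrative, and digits are written as characters, which limits the sketch to $D\leq 10$ .

```python
from itertools import product

def lowest_level_descendants(codeword, l_max, D=2):
    """All depth-l_max nodes below `codeword` (the codeword itself if it is at depth l_max)."""
    alphabet = [str(d) for d in range(D)]
    tails = product(alphabet, repeat=l_max - len(codeword))
    return {codeword + "".join(t) for t in tails}

code = ["0", "10", "110", "111"]          # hypothetical binary prefix-free code
l_max = max(len(c) for c in code)
sets = [lowest_level_descendants(c, l_max) for c in code]

# Prefix-freeness makes the sets pairwise disjoint, so their sizes simply add up.
total = sum(len(s) for s in sets)
assert len(set().union(*sets)) == total
assert total <= 2 ** l_max                # at most the number of leaf nodes
print([len(s) for s in sets], total)      # [4, 2, 1, 1] 8
```

Here the total equals $2^{3}=8$ , so this particular code meets Kraft's inequality with equality.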

To prove sufficiency, we will construct a prefix-free code using a full $D$ -ary tree as a visual aid. Let $l_{1},...,l_{m}$ be a sequence of positive integers which satisfy Kraft's inequality for some $D$ . Without loss of generality, assume that $l_{1}\leq l_{2}\leq ...\leq l_{m}$ .

We start with a full D-ary tree with depth $l_{\text{max}}=l_{m}$ . At this point, we mark all tree nodes as "available". We select $m$ codewords by proceeding as follows:

Assign the $i$ th codeword to an available node at level $l_{i}$ .
Eliminate all of the descendants of the recently selected codeword, and mark them as "not available."

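Since the codeword lengths satisfy Kraft's inequality, we never run out of available nodes. When the $i$ th codeword is about to be placed, the earlier codewords have made exactly $\sum _{j<i}D^{l_{i}-l_{j}}$ nodes at level $l_{i}$ unavailable, and Kraft's inequality gives $\sum _{j<i}D^{-l_{j}}\leq 1-D^{-l_{i}}<1$ , so this count is strictly less than the $D^{l_{i}}$ nodes at that level.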
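The selection procedure can be sketched in code by always taking the leftmost available node at each level. This is one possible realization of the construction, not the only one; the names `to_base` and `build_prefix_free_code` are illustrative, and digits are written as characters, which assumes $D\leq 10$ .

```python
from fractions import Fraction

def to_base(n, D, width):
    """Write the non-negative integer n as a string of exactly `width` base-D digits."""
    digits = []
    for _ in range(width):
        n, d = divmod(n, D)
        digits.append(str(d))
    return "".join(reversed(digits))

def build_prefix_free_code(lengths, D=2):
    """Assign each length (in increasing order) the leftmost available node at that level."""
    if sum(Fraction(1, D ** l) for l in lengths) > 1:
        raise ValueError("lengths violate Kraft's inequality")
    codewords = []
    next_node = 0   # index of the leftmost available node at the current level
    level = 0
    for l in sorted(lengths):
        next_node *= D ** (l - level)   # descend to level l, staying leftmost
        level = l
        codewords.append(to_base(next_node, D, l))
        next_node += 1                  # the whole subtree below the chosen node is skipped
    return codewords

print(build_prefix_free_code([2, 1, 3, 3]))   # ['0', '10', '110', '111']
```

The codewords are returned in order of increasing length rather than in the order the lengths were given.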

Checkpoint: Construct a ternary ( $D=3$ ) prefix-free code with the following codeword lengths: (1, 2, 2, 2, 2).

Proof for Uniquely Decodable Codes

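For this proof we change notation: the code is $r$ -ary (so $r$ plays the role of $D$ ) and has $n$ codewords with lengths $\ell _{1},\ell _{2},\ldots ,\ell _{n}$ . Let $K=\sum _{i=1}^{n}r^{-\ell _{i}}$ be the sum in the Kraft-McMillan inequality. The proof is interesting since it starts by evaluating $K^{m}$ for an arbitrary positive integer $m$ (which here no longer denotes the number of codewords):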

$K^{m}=\left(\sum _{i=1}^{n}{\frac {1}{r^{\ell _{i}}}}\right)^{m}=\sum _{i_{1}=1}^{n}\sum _{i_{2}=1}^{n}\cdots \sum _{i_{m}=1}^{n}{\frac {1}{r^{\ell _{i_{1}}+\ell _{i_{2}}+\ldots +\ell _{i_{m}}}}}\qquad (2)$

Let $\ell =\max \left(\ell _{1},\ell _{2},\ldots ,\ell _{n}\right)$ . Since every codeword length is at least 1, the exponent $\ell _{i_{1}}+\ell _{i_{2}}+\ldots +\ell _{i_{m}}$ is at least $m$ (with equality when all $m$ chosen codewords are one symbol long) and at most $m\ell$ (with equality when all of them have the maximum length). Grouping the terms by their combined length $k$ , we can then write:

$K^{m}=\sum _{k=m}^{m\ell }{\frac {N_{k}}{r^{k}}}\qquad (3)$

where $N_{k}$ is the number of sequences of $m$ codewords that have a combined length of $k$ . Note that the number of distinct $r$ -ary strings of length $k$ is $r^{k}$ . If the code is uniquely decodable, then each such string can be the concatenation of at most one sequence of codewords. Therefore, the number of sequences of $m$ codewords whose combined length is $k$ cannot be greater than $r^{k}$ , or:

$N_{k}\leq r^{k}\qquad (4)$

We can then write:

$K^{m}\leq \sum _{k=m}^{m\ell }{\frac {r^{k}}{r^{k}}}=m\ell -m+1\qquad (5)$

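This bound holds for every positive integer $m$ . Thus, we can conclude that $K\leq 1$ : if $K$ were greater than 1, $K^{m}$ would grow exponentially in $m$ and would eventually exceed $m\ell -m+1$ , which grows only linearly in $m$ .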
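For a small code, the quantities in equations (3) to (5) can be tabulated by brute force. The sketch below assumes a hypothetical binary code 0, 01, 011, 111, which is uniquely decodable (its reversed codewords are prefix-free) but not prefix-free. Only the lengths enter the computation, and the name `combined_length_counts` is illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def combined_length_counts(lengths, m):
    """N_k: the number of ordered sequences of m codewords with combined length k."""
    return Counter(sum(t) for t in product(lengths, repeat=m))

lengths = [1, 2, 3, 3]   # lengths of the hypothetical code 0, 01, 011, 111
r, m = 2, 3
N = combined_length_counts(lengths, m)
K = sum(Fraction(1, r ** l) for l in lengths)

# Equation (3): K^m regrouped by combined length k.
assert K ** m == sum(Fraction(n_k, r ** k) for k, n_k in N.items())
# Equation (4): for a uniquely decodable code, N_k cannot exceed r^k.
assert all(n_k <= r ** k for k, n_k in N.items())
# Equation (5): the resulting bound, which is linear in m.
l_max = max(lengths)
assert K ** m <= m * l_max - m + 1
print(sorted(N.items()))
```

Replacing the lengths with (1, 1, 2), which violate the inequality, makes the check for equation (4) fail already at $m=1$ , since three codewords of length at most 2 would need two distinct binary strings of length 1 to cover both length-1 codewords and still leave none to spare: $N_{1}=2\leq 2$ holds, but larger $m$ quickly breaks the bound in (5).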

The Entropy Bound

Let ${\mathcal {C}}$ be a $D$ -ary uniquely decodable code for a random variable $X$ , with source entropy $H_{D}(X)$ . Then, the expected length of ${\mathcal {C}}$ is bounded below by $H_{D}(X)$ ,

$L=\sum _{i}{p_{i}l_{i}}\geq H_{D}(X),$

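where $p_{i}$ is the probability of the $i$ th source symbol, $l_{i}$ is the length of its codeword, and $H_{D}(X)=-\sum _{i}p_{i}\log _{D}p_{i}$ is the entropy of $X$ using base- $D$ logarithms.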

We can write

$L=\sum _{i}{p_{i}l_{i}}=\sum _{i}p_{i}\log _{D}D^{l_{i}},$

and aim to produce an expression with $D^{-l_{i}}$ so that we can use the Kraft-McMillan inequality, which is satisfied by any uniquely decodable code.

To show that $L\geq H_{D}(X)$ , we will show that $L-H_{D}(X)\geq 0$ .

$L-H_{D}(X)=\sum _{i}p_{i}\log _{D}D^{l_{i}}+\sum _{i}p_{i}\log _{D}p_{i}=\sum _{i}p_{i}\log _{D}(D^{l_{i}}p_{i})$

Next, we perform a change of base by observing that $\log _{D}t={\frac {\ln t}{\ln D}}$ .

$L-H_{D}(X)={\frac {1}{\ln D}}\sum _{i}p_{i}\ln(D^{l_{i}}p_{i})$ .

For all positive real numbers $x$ , $\ln x\geq 1-1/x$ . This step will enable us to produce the necessary $D^{-l_{i}}$ terms.

$L-H_{D}(X)\geq {\frac {1}{\ln D}}\sum _{i}p_{i}\left(1-{\frac {1}{D^{l_{i}}p_{i}}}\right)={\frac {1}{\ln D}}\left(\sum _{i}p_{i}-\sum _{i}D^{-l_{i}}\right)\geq {\frac {1}{\ln D}}(1-1)=0,$

where we used both the Kraft inequality and $\sum _{i}p_{i}=1$ .
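The bound can be checked numerically for a given distribution and set of codeword lengths. The following is a minimal sketch: the function names and both example distributions are assumptions made for illustration, and a small tolerance absorbs floating-point rounding.

```python
import math

def entropy(probs, D=2):
    """Entropy in base-D units; zero-probability symbols contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0) / math.log(D)

def expected_length(probs, lengths):
    """Expected codeword length, the sum of p_i * l_i."""
    return sum(p * l for p, l in zip(probs, lengths))

lengths = [1, 2, 3, 3]                 # satisfies Kraft's inequality for D = 2
dyadic = [0.5, 0.25, 0.125, 0.125]     # p_i = 2**(-l_i)
skewed = [0.4, 0.3, 0.2, 0.1]

for probs in (dyadic, skewed):
    L, H = expected_length(probs, lengths), entropy(probs)
    assert L >= H - 1e-12              # tolerance for floating-point rounding
    print(f"L = {L:.4f}, H = {H:.4f}")
```

For the dyadic distribution the two values coincide, because $p_{i}=D^{-l_{i}}$ makes every term $\ln(D^{l_{i}}p_{i})$ vanish. For the second distribution the expected length is strictly larger than the entropy.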
