161-A3.1

Activity: Mutual Information and Channel Capacity
Instructions: In this activity, you are tasked to:
- Walk through the examples.
- Calculate the channel capacity of different channel models.
Should you have any questions, clarifications, or issues, please contact your instructor as soon as possible.

Example 1: Mutual Information

Given the following joint probabilities:

Table 1: Joint probabilities $P\left(x_{i},y_{j}\right)$ of $X$ (blood type, columns) and $Y$ (chance for skin cancer, rows)
	A	B	AB	O
Very Low	1/8	1/16	1/32	1/32
Low	1/16	1/8	1/32	1/32
Medium	1/16	1/16	1/16	1/16
High	1/4	0	0	0

To get the entropies of $X$ and $Y$, we need to calculate the marginal probabilities:

P_{X}=\{{\tfrac {1}{2}},{\tfrac {1}{4}},{\tfrac {1}{8}},{\tfrac {1}{8}}\}

(1)

P_{Y}=\{{\tfrac {1}{4}},{\tfrac {1}{4}},{\tfrac {1}{4}},{\tfrac {1}{4}}\}

(2)

And since:

H\left(A\right)=\sum _{i=1}^{n}P\left(a_{i}\right)\cdot \log _{2}\left({\frac {1}{P\left(a_{i}\right)}}\right)

(3)

We get:

H\left(X\right)={\frac {1}{2}}\log _{2}2+{\frac {1}{4}}\log _{2}4+{\frac {1}{8}}\log _{2}8+{\frac {1}{8}}\log _{2}8={\frac {7}{4}}\,\mathrm {bits} =1.75\,\mathrm {bits}

(4)

H\left(Y\right)={\frac {1}{4}}\log _{2}4+{\frac {1}{4}}\log _{2}4+{\frac {1}{4}}\log _{2}4+{\frac {1}{4}}\log _{2}4=2\,\mathrm {bits}

(5)
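
As a quick numerical check of Eqs. 1 to 5, the following Python sketch recomputes the marginals and entropies from Table 1. The names `joint` and `entropy` are illustrative choices and not part of the original activity, and the table is assumed to be stored with rows indexed by $Y$ and columns by $X$.

```python
from fractions import Fraction as F
from math import log2

# Joint probabilities from Table 1 (assumed layout: joint[y][x]).
# Rows: Very Low, Low, Medium, High. Columns: A, B, AB, O.
joint = [
    [F(1, 8),  F(1, 16), F(1, 32), F(1, 32)],
    [F(1, 16), F(1, 8),  F(1, 32), F(1, 32)],
    [F(1, 16), F(1, 16), F(1, 16), F(1, 16)],
    [F(1, 4),  F(0),     F(0),     F(0)],
]

def entropy(probs):
    """Entropy in bits (Eq. 3); zero-probability terms are skipped."""
    return sum(float(p) * log2(1 / float(p)) for p in probs if p > 0)

p_x = [sum(col) for col in zip(*joint)]  # column sums give P_X
p_y = [sum(row) for row in joint]        # row sums give P_Y

print("P_X =", [str(p) for p in p_x], "H(X) =", entropy(p_x))
print("P_Y =", [str(p) for p in p_y], "H(Y) =", entropy(p_y))
```

Exact fractions are used for the marginals so that they can be compared directly with Eqs. 1 and 2; the entropies should agree with the 1.75 bits and 2 bits of Eqs. 4 and 5.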

Calculating the conditional entropies using:

H\left(X\mid Y\right)=\sum _{i=1}^{4}\sum _{j=1}^{4}P\left(x_{i},y_{j}\right)\cdot \log _{2}\left({\frac {P\left(y_{j}\right)}{P\left(x_{i},y_{j}\right)}}\right)={\frac {11}{8}}\,\mathrm {bits} =1.375\,\mathrm {bits}

(6)

H\left(Y\mid X\right)=\sum _{i=1}^{4}\sum _{j=1}^{4}P\left(x_{i},y_{j}\right)\cdot \log _{2}\left({\frac {P\left(x_{i}\right)}{P\left(x_{i},y_{j}\right)}}\right)={\frac {13}{8}}\,\mathrm {bits} =1.625\,\mathrm {bits}

(7)

Note that $H\left(X\mid Y\right)\neq H\left(Y\mid X\right)$. Calculating the mutual information, we get:

I\left(X;Y\right)=H\left(X\right)-H\left(X\mid Y\right)={\frac {7}{4}}-{\frac {11}{8}}=0.375\,\mathrm {bits}

(8)

Or equivalently:

I\left(X;Y\right)=H\left(Y\right)-H\left(Y\mid X\right)=2-{\frac {13}{8}}=0.375\,\mathrm {bits}

(9)
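
The double sums in Eqs. 6 and 7 are tedious to evaluate by hand, so a short Python sketch is given here as a cross-check. It assumes the same table layout as Table 1 (rows indexed by $Y$, columns by $X$); the variable names are hypothetical.

```python
from math import log2

# Joint probabilities from Table 1 (assumed layout: joint[y][x]).
joint = [
    [1/8,  1/16, 1/32, 1/32],  # Very Low
    [1/16, 1/8,  1/32, 1/32],  # Low
    [1/16, 1/16, 1/16, 1/16],  # Medium
    [1/4,  0,    0,    0],     # High
]

p_y = [sum(row) for row in joint]        # marginal of Y (row sums)
p_x = [sum(col) for col in zip(*joint)]  # marginal of X (column sums)

def entropy(probs):
    """Entropy in bits; zero-probability terms are skipped."""
    return sum(p * log2(1 / p) for p in probs if p > 0)

# Eq. 6: H(X|Y). Zero entries contribute nothing and are skipped.
h_x_given_y = sum(
    p * log2(p_y[j] / p)
    for j, row in enumerate(joint)
    for p in row
    if p > 0
)

# Eq. 7: H(Y|X).
h_y_given_x = sum(
    p * log2(p_x[i] / p)
    for row in joint
    for i, p in enumerate(row)
    if p > 0
)

mi_from_x = entropy(p_x) - h_x_given_y  # Eq. 8
mi_from_y = entropy(p_y) - h_y_given_x  # Eq. 9
print(h_x_given_y, h_y_given_x, mi_from_x, mi_from_y)
```

Both routes to the mutual information should give the same 0.375 bits, even though the two conditional entropies differ.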

Let us try to understand what this means:

If we only consider $X$, we have the a priori probabilities $P_{X}$ for each blood type, and we can calculate the entropy $H\left(X\right)$, i.e. the expected value of the information we get when we observe the result of the blood test.
Will our expectations change if we do not have access to the blood test, but instead have access to (1) the person's susceptibility to skin cancer and (2) the joint probabilities in Table 1? Since we are given more information, we expect one of the following:
- No change in the uncertainty if $X$ and $Y$ are independent, since then $H\left(X\mid Y\right)=H\left(X\right)$ and thus $I\left(X;Y\right)=H\left(X\right)-H\left(X\mid Y\right)=0$.
- A reduction in the uncertainty equal to $I\left(X;Y\right)$, due to the additional information given by $P\left(x_{i},y_{j}\right)$. In this example, the uncertainty about $X$ drops from $H\left(X\right)=1.75$ bits to $H\left(X\mid Y\right)=1.375$ bits.
- A reduction of the uncertainty to zero if $X$ and $Y$ are perfectly correlated, since then $H\left(X\mid Y\right)=H\left(X\mid X\right)=0$ and $I\left(X;Y\right)=H\left(X\right)$.
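
The three cases above can be illustrated with a small Python sketch. It uses the equivalent form $I\left(X;Y\right)=\sum P\left(x_{i},y_{j}\right)\log _{2}{\frac {P\left(x_{i},y_{j}\right)}{P\left(x_{i}\right)P\left(y_{j}\right)}}$, which equals $H\left(X\right)-H\left(X\mid Y\right)$. The function name and the two toy joint tables (`independent`, `correlated`) are assumptions made for illustration and do not appear in the original activity.

```python
from math import log2

def mutual_information(joint):
    """I(X;Y) in bits for a joint table laid out as joint[y][x]."""
    p_y = [sum(row) for row in joint]
    p_x = [sum(col) for col in zip(*joint)]
    return sum(
        p * log2(p / (p_x[i] * p_y[j]))
        for j, row in enumerate(joint)
        for i, p in enumerate(row)
        if p > 0  # zero entries contribute nothing
    )

# Independent case: every entry equals P(x) * P(y),
# with P_X = {1/4, 3/4} and P_Y = {1/2, 1/2}.
independent = [[1/8, 3/8],
               [1/8, 3/8]]

# Perfectly correlated case: Y determines X exactly, with H(X) = 1 bit.
correlated = [[1/2, 0],
              [0, 1/2]]

# Table 1 from Example 1: a partial reduction in uncertainty.
table1 = [
    [1/8,  1/16, 1/32, 1/32],
    [1/16, 1/8,  1/32, 1/32],
    [1/16, 1/16, 1/16, 1/16],
    [1/4,  0,    0,    0],
]

for name, joint in [("independent", independent),
                    ("correlated", correlated),
                    ("Table 1", table1)]:
    print(name, mutual_information(joint))
```

The independent table should give zero mutual information, the correlated table should give $I\left(X;Y\right)=H\left(X\right)=1$ bit, and Table 1 should fall in between at 0.375 bits.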

Example 2: A Noiseless Binary Channel

Figure 1: Channel model for a noiseless binary channel.

Consider transmitting information over the noiseless binary channel shown in Fig. 1. The input, $A=\{0,1\}$, has a priori probabilities $P\left(A=0\right)=\alpha$ and $P\left(A=1\right)=1-\alpha$, and the output is $B=\{0,1\}$. Calculating the output probabilities:

{\begin{aligned}P\left(B=0\right)&=P\left(B=0\mid A=0\right)\cdot P\left(A=0\right)+P\left(B=0\mid A=1\right)\cdot P\left(A=1\right)\\&=1\cdot \alpha +0\cdot \left(1-\alpha \right)\\&=\alpha \\\end{aligned}}

(10)

{\begin{aligned}P\left(B=1\right)&=P\left(B=1\mid A=0\right)\cdot P\left(A=0\right)+P\left(B=1\mid A=1\right)\cdot P\left(A=1\right)\\&=0\cdot \alpha +1\cdot \left(1-\alpha \right)\\&=1-\alpha \\\end{aligned}}

(11)

Thus, the entropy at the output is:

H\left(B\right)=\alpha \log _{2}{\frac {1}{\alpha }}+\left(1-\alpha \right)\log _{2}{\frac {1}{1-\alpha }}=H\left(A\right)

(12)

The conditional entropy $H\left(B\mid A\right)$ is then:

{\begin{aligned}H\left(B\mid A\right)=&\,\,\sum _{i=1}^{n}P\left(a_{i}\right)\sum _{j=1}^{m}P\left(b_{j}\mid a_{i}\right)\cdot \log _{2}\left({\frac {1}{P\left(b_{j}\mid a_{i}\right)}}\right)\\=&\,\,P\left(A=0\right)\cdot P\left(B=0\mid A=0\right)\cdot \log _{2}{\frac {1}{P\left(B=0\mid A=0\right)}}\\&+P\left(A=0\right)\cdot P\left(B=1\mid A=0\right)\cdot \log _{2}{\frac {1}{P\left(B=1\mid A=0\right)}}\\&+P\left(A=1\right)\cdot P\left(B=0\mid A=1\right)\cdot \log _{2}{\frac {1}{P\left(B=0\mid A=1\right)}}\\&+P\left(A=1\right)\cdot P\left(B=1\mid A=1\right)\cdot \log _{2}{\frac {1}{P\left(B=1\mid A=1\right)}}\\=&\,\,\alpha \cdot 1\cdot \log _{2}1\\&+\alpha \cdot 0\cdot \log _{2}{\frac {1}{0}}\\&+\left(1-\alpha \right)\cdot 0\cdot \log _{2}{\frac {1}{0}}\\&+\left(1-\alpha \right)\cdot 1\cdot \log _{2}1\\=&\,\,0\\\end{aligned}}

(13)

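Here, the terms of the form $0\cdot \log _{2}{\tfrac {1}{0}}$ are taken to be zero, since $p\log _{2}{\tfrac {1}{p}}\to 0$ as $p\to 0$. The result is expected since $A$ and $B$ are perfectly correlated, and there is no uncertainty in $B$ once $A$ is given. Thus, the mutual information is: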

{\begin{aligned}I\left(A;B\right)&=H\left(B\right)-H\left(B\mid A\right)=H\left(B\right)\\&=\alpha \log _{2}{\frac {1}{\alpha }}+\left(1-\alpha \right)\log _{2}{\frac {1}{1-\alpha }}\\\end{aligned}}

(14)

To get the channel capacity, we maximize $I\left(A;B\right)$ over all $\alpha$. We take the derivative of the mutual information with respect to $\alpha$ and set it equal to zero to find the optimal $\alpha$. Noting that $\log _{2}x={\tfrac {1}{\ln 2}}\ln x$, we get:

{\begin{aligned}{\frac {\partial I\left(A;B\right)}{\partial \alpha }}&={\frac {\partial }{\partial \alpha }}\left(\alpha \log _{2}{\frac {1}{\alpha }}+\left(1-\alpha \right)\log _{2}{\frac {1}{1-\alpha }}\right)\\&={\frac {\partial }{\partial \alpha }}\left({\frac {\alpha }{\ln 2}}\ln {\frac {1}{\alpha }}+{\frac {1-\alpha }{\ln 2}}\ln {\frac {1}{1-\alpha }}\right)\\&={\frac {1}{\ln 2}}\ln {\frac {1}{\alpha }}-\alpha {\frac {1}{\alpha ^{2}}}{\frac {\alpha }{\ln 2}}-{\frac {1}{\ln 2}}\ln {\frac {1}{1-\alpha }}+{\frac {1-\alpha }{\ln 2}}{\frac {1-\alpha }{\left(1-\alpha \right)^{2}}}\\&={\frac {1}{\ln 2}}\left(\ln {\frac {1}{\alpha }}-\ln {\frac {1}{1-\alpha }}\right)=0\\\end{aligned}}

(15)

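Solving gives $\alpha =1-\alpha$, i.e. $\alpha =0.5$. Since $I\left(A;B\right)$ is concave in $\alpha$, this stationary point is the maximum. This can also be seen from the plot of $I\left(A;B\right)$ as a function of $\alpha$ shown in Fig. 2. Therefore, the channel capacity is: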

C=\max _{P\left(A\right)}I\left(A;B\right)=\left.I\left(A;B\right)\right|_{\alpha =0.5}=1\,\mathrm {bit/channel\,use}

(16)

As expected, in a noiseless binary channel, we can transmit a maximum of 1 bit of information per channel use.
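
The maximization can also be approximated numerically. The Python sketch below is one possible approach, assuming a simple grid search over $\alpha$ rather than the analytical derivative; the function `mutual_information` and the matrix layout `channel[i][j]` for $P\left(B=j\mid A=i\right)$ are hypothetical conventions chosen for this example.

```python
from math import log2

def mutual_information(p_a, channel):
    """I(A;B) = H(B) - H(B|A) in bits, with channel[i][j] = P(B=j | A=i)."""
    n_in, n_out = len(channel), len(channel[0])
    # Output probabilities by the law of total probability (Eqs. 10 and 11).
    p_b = [sum(p_a[i] * channel[i][j] for i in range(n_in))
           for j in range(n_out)]
    h_b = sum(p * log2(1 / p) for p in p_b if p > 0)
    # Conditional entropy (Eq. 13); terms with P(b|a) = 0 are taken as zero.
    h_b_given_a = sum(
        p_a[i] * q * log2(1 / q)
        for i in range(n_in)
        for q in channel[i]
        if q > 0
    )
    return h_b - h_b_given_a

# Noiseless binary channel: each input maps to exactly one output.
noiseless = [[1.0, 0.0],
             [0.0, 1.0]]

# Grid search over alpha in (0, 1); the step size 0.001 is an arbitrary choice.
alphas = [k / 1000 for k in range(1, 1000)]
best_alpha = max(alphas, key=lambda a: mutual_information([a, 1 - a], noiseless))
capacity = mutual_information([best_alpha, 1 - best_alpha], noiseless)
print(best_alpha, capacity)
```

A grid search is used here only because it is easy to verify by inspection; it should recover $\alpha =0.5$ and a capacity of 1 bit per channel use, in agreement with Eq. 16.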

Example 3: A Noisy Channel with Non-Overlapping Outputs

Figure 3: A noisy channel with non-overlapping outputs.

Consider the channel shown in Fig. 3. The input, $A=\{0,1\}$, has a priori probabilities $P\left(A=0\right)=\alpha$ and $P\left(A=1\right)=1-\alpha$, and in this case, the output is $B=\{1,2,3,4\}$. Once again, we can calculate the output probabilities:

P\left(B=1\right)=P\left(B=1\mid A=0\right)\cdot P\left(A=0\right)={\frac {1}{2}}\cdot \alpha

(17)

P\left(B=2\right)=P\left(B=2\mid A=0\right)\cdot P\left(A=0\right)={\frac {1}{2}}\cdot \alpha

(18)

P\left(B=3\right)=P\left(B=3\mid A=1\right)\cdot P\left(A=1\right)={\frac {1}{3}}\cdot \left(1-\alpha \right)

(19)

P\left(B=4\right)=P\left(B=4\mid A=1\right)\cdot P\left(A=1\right)={\frac {2}{3}}\cdot \left(1-\alpha \right)

(20)

Note that the terms that evaluate to zero have already been omitted. The entropy at the output is then:

{\begin{aligned}H\left(B\right)&={\frac {\alpha }{2}}\log _{2}{\frac {2}{\alpha }}+{\frac {\alpha }{2}}\log _{2}{\frac {2}{\alpha }}+{\frac {1-\alpha }{3}}\log _{2}{\frac {3}{1-\alpha }}+{\frac {2\left(1-\alpha \right)}{3}}\log _{2}{\frac {3}{2\left(1-\alpha \right)}}\\&=\alpha \log _{2}{\frac {1}{\alpha }}+\left(1-\alpha \right)\log _{2}{\frac {1}{1-\alpha }}+\alpha \log _{2}2+\left(1-\alpha \right)\left(\log _{2}3-{\frac {2}{3}}\log _{2}2\right)\\\end{aligned}}

(21)

Calculating the conditional entropy:

{\begin{aligned}H\left(B\mid A\right)=&\,\,P\left(A=0\right)P\left(B=1\mid A=0\right)\cdot \log _{2}{\frac {1}{P\left(B=1\mid A=0\right)}}\\&\,\,+P\left(A=0\right)P\left(B=2\mid A=0\right)\cdot \log _{2}{\frac {1}{P\left(B=2\mid A=0\right)}}\\&\,\,+P\left(A=1\right)P\left(B=3\mid A=1\right)\cdot \log _{2}{\frac {1}{P\left(B=3\mid A=1\right)}}\\&\,\,+P\left(A=1\right)P\left(B=4\mid A=1\right)\cdot \log _{2}{\frac {1}{P\left(B=4\mid A=1\right)}}\\=&\,\,{\frac {\alpha }{2}}\cdot \log _{2}2+{\frac {\alpha }{2}}\cdot \log _{2}2+{\frac {1-\alpha }{3}}\cdot \log _{2}3+{\frac {2\left(1-\alpha \right)}{3}}\cdot \log _{2}{\frac {3}{2}}\\=&\,\,\alpha \cdot \log _{2}2+\left(1-\alpha \right)\left(\log _{2}3-{\frac {2}{3}}\log _{2}2\right)\\\end{aligned}}

(22)

This allows us to calculate the mutual information:

{\begin{aligned}I\left(A;B\right)&=H\left(B\right)-H\left(B\mid A\right)\\&=\alpha \log _{2}{\frac {1}{\alpha }}+\left(1-\alpha \right)\log _{2}{\frac {1}{1-\alpha }}\\\end{aligned}}

(23)

This once again leads to a channel capacity of $C=1$ bit per channel use at $\alpha =0.5$. It reinforces our intuition that as long as each output symbol is determined by only one input symbol, i.e. there are no overlaps, we can infer which input symbol was transmitted when we know the output symbol, even in the presence of noise.
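
The cancellation between Eqs. 21 and 22 can be checked numerically. The following Python sketch builds the transition probabilities from Eqs. 17 to 20 and compares $I\left(A;B\right)$ against the binary entropy of $\alpha$. The function names and the matrix layout `channel[i][j]` for $P\left(B=j+1\mid A=i\right)$ are assumptions made for this example.

```python
from math import log2

def binary_entropy(a):
    """H(A) in bits for P(A=0) = a, with 0 < a < 1."""
    return a * log2(1 / a) + (1 - a) * log2(1 / (1 - a))

def mutual_information(p_a, channel):
    """I(A;B) = H(B) - H(B|A) in bits, with channel[i][j] = P(b_j | a_i)."""
    n_in, n_out = len(channel), len(channel[0])
    p_b = [sum(p_a[i] * channel[i][j] for i in range(n_in))
           for j in range(n_out)]
    h_b = sum(p * log2(1 / p) for p in p_b if p > 0)
    h_b_given_a = sum(
        p_a[i] * q * log2(1 / q)
        for i in range(n_in)
        for q in channel[i]
        if q > 0  # transitions with zero probability are omitted
    )
    return h_b - h_b_given_a

# Transition probabilities read from Eqs. 17 to 20:
# A=0 goes to B=1 or B=2; A=1 goes to B=3 or B=4.
channel = [[1/2, 1/2, 0,   0],
           [0,   0,   1/3, 2/3]]

for a in (0.1, 0.3, 0.5, 0.9):
    print(a, mutual_information([a, 1 - a], channel), binary_entropy(a))

# Grid search for the capacity-achieving alpha (step size is arbitrary).
alphas = [k / 1000 for k in range(1, 1000)]
best_alpha = max(alphas, key=lambda a: mutual_information([a, 1 - a], channel))
capacity = mutual_information([best_alpha, 1 - best_alpha], channel)
print(best_alpha, capacity)
```

Up to floating-point rounding, the two printed columns should agree for every $\alpha$, which is the statement of Eq. 23, and the grid search should land on $\alpha =0.5$.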

Activity: The Binary Symmetric Channel (BSC)

Figure 4: The binary symmetric channel.

Consider the Binary Symmetric Channel shown in Fig. 4.

Sources

Yao Xie's slides on Entropy and Mutual Information
