Entropy Chain Rule
Figure 1: Entropy visualization for two random variables using Venn diagrams.
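As we increase the number of random variables we are dealing with, it is important to understand how this increase affects entropy. We have previously shown that for two random variables <math>X</math> and <math>Y</math>:
<math>H\left(X,Y\right) = H\left(X\right) + H\left(Y\mid X\right) = H\left(Y\right) + H\left(X\mid Y\right)</math> (1)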
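We can use Venn diagrams to visualize these relationships, as seen in Fig. 1. For three random variables <math>X</math>, <math>Y</math>, and <math>Z</math>:
<math>H\left(X,Y,Z\right) = H\left(X\right) + H\left(Y,Z\mid X\right) = H\left(X\right) + H\left(Y\mid X\right) + H\left(Z\mid X,Y\right)</math> (2)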
In general:
<math>H\left(X_1,X_2,\ldots,X_n\right) = \sum_{i=1}^{n} H\left(X_i\mid X_{i-1},\ldots,X_1\right)</math> (3)
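The three-variable chain rule can be checked numerically. The following is a minimal sketch, not part of the original derivation: the joint distribution is an arbitrary hypothetical one, entropies are measured in bits (base-2 logarithm, an assumption), and the helper names `marginal` and `cond_entropy` are illustrative. The conditional entropies are computed directly from their definition, so the agreement between the two sides is not built in by construction.

```python
import math
from itertools import product

def marginal(joint, axes):
    """Marginalize a joint pmf (dict keyed by outcome tuples) onto the given coordinate indices."""
    out = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in axes)
        out[key] = out.get(key, 0.0) + p
    return out

def cond_entropy(joint, target, given=()):
    """H(target | given) in bits, from the definition -sum p(t, g) log2 p(t | g).

    With an empty `given`, this reduces to the ordinary entropy H(target).
    """
    p_tg = marginal(joint, target + given)
    p_g = marginal(joint, given)
    h = 0.0
    for key, p in p_tg.items():
        if p > 0:
            h -= p * math.log2(p / p_g[key[len(target):]])
    return h

# Hypothetical joint pmf over binary (X, Y, Z); the weights are arbitrary.
weights = [1, 2, 3, 4, 5, 6, 7, 8]
joint = {xyz: w / sum(weights)
         for xyz, w in zip(product((0, 1), repeat=3), weights)}

h_joint = cond_entropy(joint, (0, 1, 2))         # H(X,Y,Z)
h_chain = (cond_entropy(joint, (0,))             # H(X)
           + cond_entropy(joint, (1,), (0,))     # H(Y|X)
           + cond_entropy(joint, (2,), (0, 1)))  # H(Z|X,Y)

print(f"H(X,Y,Z)                 = {h_joint:.6f} bits")
print(f"H(X) + H(Y|X) + H(Z|X,Y) = {h_chain:.6f} bits")
```

Both printed values should agree up to floating-point rounding for any valid joint distribution.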
Conditional Mutual Information
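Conditional mutual information is the expected value, taken over a third random variable, of the mutual information of two random variables given the value of that third variable. For three random variables <math>X</math>, <math>Y</math>, and <math>Z</math>, it is defined as:
<math>I\left(X;Y\mid Z\right) = \sum_{z} P\left(z\right) \sum_{x}\sum_{y} P\left(x,y\mid z\right) \log\frac{P\left(x,y\mid z\right)}{P\left(x\mid z\right) P\left(y\mid z\right)}</math> (4)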
We can rewrite the definition of conditional mutual information as:
<math>I\left(X;Y\mid Z\right) = H\left(X\mid Z\right) - H\left(X\mid Y,Z\right)</math> (5)
Figure 2: Entropy visualization for three random variables using Venn diagrams.
We can visualize this relationship using the Venn diagrams in Fig. 2. Compare this to our expression for the mutual information of two random variables <math>X</math> and <math>Y</math>:
<math>I\left(X;Y\right) = H\left(X\right) - H\left(X\mid Y\right)</math> (6)
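As a small illustration of how conditioning can change mutual information, consider a hypothetical example that is not in the original text: <math>X</math> and <math>Y</math> are independent fair bits and <math>Z = X \oplus Y</math>. The sketch below computes <math>I\left(X;Y\mid Z\right)</math> both from the probability-based definition and from the entropy form <math>H\left(X\mid Z\right) - H\left(X\mid Y,Z\right)</math>, in bits (base-2 logarithm assumed). The function names are illustrative.

```python
import math
from itertools import product

def marginal(joint, axes):
    """Marginalize a joint pmf (dict keyed by outcome tuples) onto the given coordinate indices."""
    out = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in axes)
        out[key] = out.get(key, 0.0) + p
    return out

def entropy(pmf):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)

def cmi_definition(joint):
    """I(X;Y|Z) from the definition, for a joint pmf keyed by (x, y, z)."""
    p_z = marginal(joint, (2,))
    p_xz = marginal(joint, (0, 2))
    p_yz = marginal(joint, (1, 2))
    total = 0.0
    for (x, y, z), p in joint.items():
        if p > 0:
            # p(x,y|z) / (p(x|z) p(y|z)) = p(x,y,z) p(z) / (p(x,z) p(y,z))
            total += p * math.log2(p * p_z[(z,)] / (p_xz[(x, z)] * p_yz[(y, z)]))
    return total

def cmi_entropies(joint):
    """I(X;Y|Z) = H(X|Z) - H(X|Y,Z), using H(A|B) = H(A,B) - H(B)."""
    h = lambda axes: entropy(marginal(joint, axes))
    return (h((0, 2)) - h((2,))) - (h((0, 1, 2)) - h((1, 2)))

# Hypothetical example: X, Y independent fair bits, Z = X XOR Y.
joint = {(x, y, x ^ y): 0.25 for x, y in product((0, 1), repeat=2)}

cmi_def = cmi_definition(joint)
cmi_ent = cmi_entropies(joint)
# Unconditional I(X;Y) = H(X) + H(Y) - H(X,Y)
mi_xy = (entropy(marginal(joint, (0,))) + entropy(marginal(joint, (1,)))
         - entropy(marginal(joint, (0, 1))))

print("I(X;Y|Z) from definition:", cmi_def)
print("I(X;Y|Z) from entropies: ", cmi_ent)
print("I(X;Y):                  ", mi_xy)
```

In this example the two bits share no information on their own, yet once <math>Z</math> is known each one determines the other, so the conditional mutual information is one full bit.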
Chain Rule for Mutual Information
For random variables <math>X</math> and <math>Z</math>:
<math>I\left(X;Z\right) = H\left(X\right) - H\left(X\mid Z\right)</math> (7)
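And for random variables <math>X</math>, <math>Y</math>, and <math>Z</math>:
<math>I\left(X;Y,Z\right) = H\left(X\right) - H\left(X\mid Y,Z\right)</math> (8)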
We can then express the conditional mutual information as:
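<math>I\left(X;Y\mid Z\right) = H\left(X\mid Z\right) - H\left(X\mid Y,Z\right) = I\left(X;Y,Z\right) - I\left(X;Z\right)</math> (9)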
Rearranging, we then obtain the chain rule for mutual information:
<math>I\left(X;Y,Z\right) = I\left(X;Z\right) + I\left(X;Y\mid Z\right)</math> (10)
Thus, we can extend this for additional random variables:
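<math>I\left(X;Y_1,Y_2,Y_3\right) = I\left(X;Y_1\right) + I\left(X;Y_2\mid Y_1\right) + I\left(X;Y_3\mid Y_1,Y_2\right)</math> (11)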
In general:
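<math>I\left(X;Y_1,Y_2,\ldots,Y_n\right) = \sum_{i=1}^{n} I\left(X;Y_i\mid Y_{i-1},\ldots,Y_1\right)</math> (12)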
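The chain rule <math>I\left(X;Y,Z\right) = I\left(X;Z\right) + I\left(X;Y\mid Z\right)</math> can also be verified numerically. This is a minimal sketch under stated assumptions: the joint distribution is arbitrary and hypothetical, logarithms are base 2, and `mutual_info` is an illustrative helper that evaluates (conditional) mutual information directly from probabilities rather than from entropies.

```python
import math
from itertools import product

def marginal(joint, axes):
    """Marginalize a joint pmf (dict keyed by outcome tuples) onto the given coordinate indices."""
    out = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in axes)
        out[key] = out.get(key, 0.0) + p
    return out

def mutual_info(joint, a, b, c=()):
    """I(A;B|C) in bits; a, b, c are tuples of coordinate indices (c may be empty)."""
    p_abc = marginal(joint, a + b + c)
    p_ac = marginal(joint, a + c)
    p_bc = marginal(joint, b + c)
    p_c = marginal(joint, c)
    total = 0.0
    for key, p in p_abc.items():
        if p > 0:
            ka = key[:len(a)]
            kb = key[len(a):len(a) + len(b)]
            kc = key[len(a) + len(b):]
            total += p * math.log2(p * p_c[kc] / (p_ac[ka + kc] * p_bc[kb + kc]))
    return total

# Hypothetical joint pmf over binary (X, Y, Z); the weights are arbitrary.
weights = [4, 1, 2, 7, 3, 3, 6, 2]
joint = {xyz: w / sum(weights)
         for xyz, w in zip(product((0, 1), repeat=3), weights)}

i_x_yz = mutual_info(joint, (0,), (1, 2))          # I(X;Y,Z)
i_x_z = mutual_info(joint, (0,), (2,))             # I(X;Z)
i_x_y_given_z = mutual_info(joint, (0,), (1,), (2,))  # I(X;Y|Z)

print(f"I(X;Y,Z)          = {i_x_yz:.6f} bits")
print(f"I(X;Z) + I(X;Y|Z) = {i_x_z + i_x_y_given_z:.6f} bits")
```

The two printed values should match up to rounding. Swapping the roles of <math>Y</math> and <math>Z</math> in the last two calls gives the other ordering of the same decomposition.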
Markovity
A Markov chain is a random process that describes a sequence of possible events where the probability of each event depends only on the outcome of the previous event. Thus, we say that <math>X</math>, <math>Y</math>, and <math>Z</math> form a Markov chain in this order, denoted as:
<math>X \rightarrow Y \rightarrow Z</math> (13)
If we can write:
<math>P\left(Z=z\mid X=x, Y=y\right) = P\left(Z=z\mid Y=y\right)</math> (14)
Or in a more compact form:
<math>P\left(z\mid x,y\right) = P\left(z\mid y\right)</math> (15)
We can use Markov chains to model how a signal is corrupted when passed through noisy channels. For example, if <math>X</math> is a binary signal, it can change with a certain probability, <math>p</math>, to <math>Y</math>, and it can again be corrupted to produce <math>Z</math>.
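Consider the joint probability <math>P\left(x,z\mid y\right)</math>. We can express this as:
<math>P\left(x,z\mid y\right) = \frac{P\left(x,y,z\right)}{P\left(y\right)} = \frac{P\left(x,y\right) P\left(z\mid x,y\right)}{P\left(y\right)}</math> (16)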
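And if <math>X \rightarrow Y \rightarrow Z</math>, we get:
<math>P\left(x,z\mid y\right) = \frac{P\left(x,y\right) P\left(z\mid y\right)}{P\left(y\right)}</math> (17)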
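Since <math>P\left(x,y\right) / P\left(y\right) = P\left(x\mid y\right)</math>, we can write:
<math>P\left(x,z\mid y\right) = P\left(x\mid y\right) P\left(z\mid y\right)</math> (18)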
Thus, we can say that <math>X</math> and <math>Z</math> are conditionally independent given <math>Y</math>. If we think of <math>X</math> as some past event, and <math>Z</math> as some future event, then the past and future events are independent if we know the present event <math>Y</math>. Note that this property is a good definition of Markovity, as well as a useful tool for checking it.
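We can rewrite the joint probability <math>P\left(x,z\mid y\right)</math> as:
<math>P\left(x,z\mid y\right) = \frac{P\left(x,y,z\right)}{P\left(y\right)} = \frac{P\left(y,z\right) P\left(x\mid y,z\right)}{P\left(y\right)} = P\left(z\mid y\right) P\left(x\mid y,z\right)</math> (19)
Because <math>X</math> and <math>Z</math> are conditionally independent given <math>Y</math>, the left-hand side also equals <math>P\left(x\mid y\right) P\left(z\mid y\right)</math>, so <math>P\left(x\mid y,z\right) = P\left(x\mid y\right)</math>. Therefore, if <math>X \rightarrow Y \rightarrow Z</math>, then it follows that <math>Z \rightarrow Y \rightarrow X</math>.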
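The noisy-channel example can be made concrete with a short numerical check. The sketch below assumes, purely for illustration, that both corruptions are binary symmetric channels with hypothetical flip probabilities 0.1 and 0.2, and that the input has <math>P\left(X=1\right) = 0.3</math>. It builds the joint distribution as <math>P\left(x\right) P\left(y\mid x\right) P\left(z\mid y\right)</math> and then tests conditional independence in both directions.

```python
import math
from itertools import product

def bsc(flip):
    """Return P(out | inp) for a binary symmetric channel with the given flip probability."""
    return lambda out, inp: flip if out != inp else 1.0 - flip

# Hypothetical parameters: input prior and the two flip probabilities.
p_x = {0: 0.7, 1: 0.3}
ch_xy, ch_yz = bsc(0.1), bsc(0.2)

# Joint pmf of the chain X -> Y -> Z, keyed by (x, y, z).
joint = {(x, y, z): p_x[x] * ch_xy(y, x) * ch_yz(z, y)
         for x, y, z in product((0, 1), repeat=3)}

def prob(x=None, y=None, z=None):
    """Sum the joint pmf over all outcomes that match the fixed coordinates."""
    want = (x, y, z)
    return sum(p for outcome, p in joint.items()
               if all(w is None or w == o for w, o in zip(want, outcome)))

# Forward check: P(x, z | y) == P(x | y) P(z | y) for every outcome.
forward_ok = all(
    math.isclose(prob(x, y, z) / prob(y=y),
                 (prob(x=x, y=y) / prob(y=y)) * (prob(y=y, z=z) / prob(y=y)))
    for x, y, z in product((0, 1), repeat=3))

# Reverse check: P(x | y, z) == P(x | y), i.e. Z -> Y -> X is also a Markov chain.
reverse_ok = all(
    math.isclose(prob(x, y, z) / prob(y=y, z=z), prob(x=x, y=y) / prob(y=y))
    for x, y, z in product((0, 1), repeat=3))

print("X and Z conditionally independent given Y:", forward_ok)
print("Reverse chain Z -> Y -> X holds:          ", reverse_ok)
```

Both checks should succeed for any choice of prior and flip probabilities strictly between 0 and 1, since the joint distribution is constructed so that <math>Z</math> depends on <math>X</math> only through <math>Y</math>.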
The Data Processing Inequality
Figure 3: Venn diagram visualization of mutual information in a Markov chain.
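Consider three random variables, <math>X</math>, <math>Y</math>, and <math>Z</math>. The mutual information <math>I\left(X;Y,Z\right)</math> can be expressed as:
<math>I\left(X;Y,Z\right) = I\left(X;Z\right) + I\left(X;Y\mid Z\right) = I\left(X;Y\right) + I\left(X;Z\mid Y\right)</math> (20)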
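If <math>X \rightarrow Y \rightarrow Z</math>, i.e. <math>X</math>, <math>Y</math>, and <math>Z</math> form a Markov chain, then <math>X</math> is conditionally independent of <math>Z</math> given <math>Y</math>, resulting in <math>I\left(X;Z\mid Y\right) = 0</math>. Thus,
<math>I\left(X;Y\right) = I\left(X;Z\right) + I\left(X;Y\mid Z\right)</math> (21)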
And since <math>I\left(X;Y\mid Z\right) \ge 0</math>, we get the expression known as the Data Processing Inequality:
<math>I\left(X;Z\right) \le I\left(X;Y\right)</math> (21)
And since <math>Z \rightarrow Y \rightarrow X</math> is also a Markov chain, then:
<math>I\left(Z;X\right) \le I\left(Z;Y\right)</math> (22)
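We can visualize this relation using the Venn diagrams in Fig. 3. Note that equality occurs when <math>I\left(X;Y\mid Z\right) = 0</math>. It should also be evident that <math>X</math> and <math>Z</math> are conditionally independent given <math>Y</math>, and that <math>I\left(X;Z\mid Y\right) = 0</math>.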
Essentially, the data processing inequality implies that no amount of processing or clever manipulation of data can improve inference. Stated in another way, no clever transformation of the received code <math>Y</math> can give more information about the sent code <math>X</math> than <math>Y</math> itself already carries.
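The inequality can be observed numerically on a cascade of two noisy channels. The following is a sketch under assumptions that are not in the original text: both stages are binary symmetric channels, the input prior and first flip probability are fixed at hypothetical values (0.3 and 0.1), and the second flip probability is swept over a few values. Mutual information is in bits, and `cascade` and `mutual_info` are illustrative helper names.

```python
import math
from itertools import product

def marginal(joint, axes):
    """Marginalize a joint pmf (dict keyed by outcome tuples) onto the given coordinate indices."""
    out = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in axes)
        out[key] = out.get(key, 0.0) + p
    return out

def mutual_info(joint, a, b, c=()):
    """I(A;B|C) in bits; a, b, c are tuples of coordinate indices (c may be empty)."""
    p_abc = marginal(joint, a + b + c)
    p_ac = marginal(joint, a + c)
    p_bc = marginal(joint, b + c)
    p_c = marginal(joint, c)
    total = 0.0
    for key, p in p_abc.items():
        if p > 0:
            ka = key[:len(a)]
            kb = key[len(a):len(a) + len(b)]
            kc = key[len(a) + len(b):]
            total += p * math.log2(p * p_c[kc] / (p_ac[ka + kc] * p_bc[kb + kc]))
    return total

def cascade(p_x1, flip1, flip2):
    """Joint pmf of X -> Y -> Z built from two binary symmetric channels."""
    bsc = lambda out, inp, f: f if out != inp else 1.0 - f
    prior = {0: 1.0 - p_x1, 1: p_x1}
    return {(x, y, z): prior[x] * bsc(y, x, flip1) * bsc(z, y, flip2)
            for x, y, z in product((0, 1), repeat=3)}

results = []
for flip2 in (0.0, 0.05, 0.2, 0.5):  # hypothetical second-stage noise levels
    joint = cascade(0.3, 0.1, flip2)
    i_xy = mutual_info(joint, (0,), (1,))
    i_xz = mutual_info(joint, (0,), (2,))
    i_xz_given_y = mutual_info(joint, (0,), (2,), (1,))
    results.append((flip2, i_xy, i_xz, i_xz_given_y))
    print(f"flip2={flip2:.2f}  I(X;Y)={i_xy:.4f}  I(X;Z)={i_xz:.4f}  "
          f"I(X;Z|Y)={i_xz_given_y:.1e}")
```

In every row <math>I\left(X;Z\right)</math> should not exceed <math>I\left(X;Y\right)</math>, and <math>I\left(X;Z\mid Y\right)</math> should be zero up to rounding. With no second-stage noise <math>Z</math> is a copy of <math>Y</math> and the two quantities coincide, while a flip probability of 0.5 leaves <math>Z</math> carrying no information about <math>X</math>.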
Sufficient Statistics
Fano's Inequality