
In general, the channel itself can add noise. This means that the channel serves as an additional layer of uncertainty to our transmissions. Consider a channel with input symbols <math>a_i \in A</math> and output symbols <math>b_j \in B</math>. Note that the input and output alphabets do not need to have the same number of symbols. Given the noise in the channel, if we observe the output symbol <math>b_j</math>, we are not sure which input symbol <math>a_i</math> was sent. We can then characterize the channel as a set of conditional probabilities <math>P\left(a_i \mid b_j\right)</math>. Let us consider the information we get from observing a symbol <math>b_j</math>.

== Definition ==

[[File:Noisy channel.png|thumb|500px|Figure 1: A noisy channel.]]

Given a probability model of the source, we have an ''a priori'' estimate <math>P\left(a_i\right)</math> that symbol <math>a_i</math> will be sent next. Upon observing <math>b_j</math>, we can revise our estimate to <math>P\left(a_i\mid b_j\right)</math>, as shown in Fig. 1. The change in information, or ''mutual information'', is given by:

{{NumBlk|::|<math>I\left(a_i ; b_j\right)=\log_2\left(\frac{1}{P\left(a_i\right)}\right)-\log_2\left(\frac{1}{P\left(a_i \mid b_j\right)}\right)=\log_2\left(\frac{P\left(a_i \mid b_j\right)}{P\left(a_i\right)}\right)</math>|{{EquationRef|1}}}}
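To make Equation 1 concrete, here is a minimal Python sketch using made-up probabilities (the values are illustrative and are not taken from Fig. 1):

<syntaxhighlight lang="python">
from math import log2

# Illustrative values only: a priori and a posteriori probabilities of a_i
p_a = 0.25          # P(a_i): estimate before observing b_j
p_a_given_b = 0.75  # P(a_i | b_j): revised estimate after observing b_j

# Equation 1: change in information upon observing b_j
mutual_info = log2(1 / p_a) - log2(1 / p_a_given_b)

print(mutual_info)              # ~1.585 bits
print(log2(p_a_given_b / p_a))  # same value, using the second form of Equation 1
</syntaxhighlight>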

Let's look at a few properties of mutual information. Expressing the equation above in terms of <math>P\left(a_i \mid b_j\right)</math>, and noting that <math>P\left(a_i \mid b_j\right) \le 1</math>:

{{NumBlk|::|<math>I\left(a_i ; b_j\right)=\log_2\left(\frac{1}{P\left(a_i\right)}\right)-\log_2\left(\frac{1}{P\left(a_i \mid b_j\right)}\right)\le \log_2\left(\frac{1}{P\left(a_i\right)}\right)</math>|{{EquationRef|2}}}}

Thus, we can say:

{{NumBlk|::|<math>I\left(a_i ; b_j\right)\le \log_2\left(\frac{1}{P\left(a_i\right)}\right)</math>|{{EquationRef|3}}}}

This is expected since, after observing <math>b_j</math>, the amount of uncertainty is reduced, i.e. we know a bit more about <math>a_i</math>, and the most change in information we can get is when <math>a_i</math> and <math>b_j</math> are perfectly correlated, with <math>P\left(a_i \mid b_j\right)=1</math>. From Bayes' Theorem, we have the property:

{{NumBlk|::|<math>I\left(a_i ; b_j\right)=\log_2\left(\frac{P\left(a_i \mid b_j\right)}{P\left(a_i\right)}\right)=\log_2\left(\frac{P\left(a_i, b_j\right)}{P\left(a_i\right)P\left(b_j\right)}\right)=\log_2\left(\frac{P\left(b_j \mid a_i\right)}{P\left(b_j\right)}\right)=I\left(b_j ; a_i\right)</math>|{{EquationRef|4}}}}

Note that if <math>a_i</math> and <math>b_j</math> are independent, where <math>P\left(a_i, b_j\right)=P\left(a_i\right)P\left(b_j\right)</math> and <math>P\left(a_i \mid b_j\right)=P\left(a_i\right)</math>, then:

{{NumBlk|::|<math>I\left(a_i ; b_j\right)=\log_2\left(\frac{P\left(a_i\right)}{P\left(a_i\right)}\right)=\log_2\left(1\right)=0</math>|{{EquationRef|5}}}}
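The symmetry from Bayes' Theorem in Equation 4, and the independent case in Equation 5, can be checked numerically. The joint distribution below is an arbitrary example, not taken from the text:

<syntaxhighlight lang="python">
from math import log2, isclose

# Arbitrary example joint distribution P(a_i, b_j); rows index the input A, columns the output B
P_AB = [[0.30, 0.10],
        [0.20, 0.40]]

P_A = [sum(row) for row in P_AB]                             # marginal P(a_i)
P_B = [sum(P_AB[i][j] for i in range(2)) for j in range(2)]  # marginal P(b_j)

i, j = 0, 1
# I(a_i; b_j) computed from P(a_i | b_j), and again from P(b_j | a_i)
I_ab = log2((P_AB[i][j] / P_B[j]) / P_A[i])
I_ba = log2((P_AB[i][j] / P_A[i]) / P_B[j])
print(isclose(I_ab, I_ba))    # True: I(a_i; b_j) = I(b_j; a_i)

# If a_i and b_j were independent, P(a_i | b_j) = P(a_i) and the mutual information is zero
print(log2(P_A[i] / P_A[i]))  # 0.0
</syntaxhighlight>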

We can get the average mutual information over all the input symbols as:

{{NumBlk|::|<math>I\left(A ; b_j\right)=\sum_i P\left(a_i \mid b_j\right) I\left(a_i ; b_j\right)=\sum_i P\left(a_i \mid b_j\right)\log_2\left(\frac{P\left(a_i \mid b_j\right)}{P\left(a_i\right)}\right)</math>|{{EquationRef|6}}}}

Similarly, for all the output symbols:

{{NumBlk|::|<math>I\left(a_i ; B\right)=\sum_j P\left(b_j \mid a_i\right) I\left(a_i ; b_j\right)=\sum_j P\left(b_j \mid a_i\right)\log_2\left(\frac{P\left(b_j \mid a_i\right)}{P\left(b_j\right)}\right)</math>|{{EquationRef|7}}}}

For both input and output symbols, we get:

{{NumBlk|::|<math>I\left(A ; B\right)=\sum_i \sum_j P\left(a_i, b_j\right)\log_2\left(\frac{P\left(a_i, b_j\right)}{P\left(a_i\right)P\left(b_j\right)}\right)</math>|{{EquationRef|8}}}}
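As a sketch of how Equation 8 might be evaluated in practice, the short Python snippet below computes <math>I\left(A ; B\right)</math> from a small joint probability table. The table itself is an arbitrary example:

<syntaxhighlight lang="python">
from math import log2

# Arbitrary example joint distribution P(a_i, b_j); rows index the input A, columns the output B
P_AB = [[0.30, 0.10],
        [0.20, 0.40]]

P_A = [sum(row) for row in P_AB]                             # marginal P(a_i)
P_B = [sum(P_AB[i][j] for i in range(2)) for j in range(2)]  # marginal P(b_j)

# Equation 8: I(A; B) = sum over i, j of P(a_i, b_j) * log2( P(a_i, b_j) / (P(a_i) P(b_j)) )
I_AB = sum(P_AB[i][j] * log2(P_AB[i][j] / (P_A[i] * P_B[j]))
           for i in range(2) for j in range(2))

print(I_AB)   # ~0.125 bits for this example
</syntaxhighlight>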

== Non-Negativity of Mutual Information ==

To show the non-negativity of mutual information, let us use Jensen's Inequality, which states that for a convex function <math>f\left(x\right)</math>, and weights <math>p_k \ge 0</math> with <math>\sum_k p_k = 1</math>:

{{NumBlk|::|<math>\sum_k p_k f\left(x_k\right) \ge f\left(\sum_k p_k x_k\right)</math>|{{EquationRef|9}}}}

Using the fact that <math>\log_2\left(\frac{1}{x}\right)</math> is convex, and applying this to our expression for mutual information, we get:

{{NumBlk|::|<math>I\left(A ; B\right)=\sum_i \sum_j P\left(a_i, b_j\right)\log_2\left(\frac{P\left(a_i, b_j\right)}{P\left(a_i\right)P\left(b_j\right)}\right)\ge \log_2\left(\frac{1}{\sum_i \sum_j P\left(a_i, b_j\right)\frac{P\left(a_i\right)P\left(b_j\right)}{P\left(a_i, b_j\right)}}\right)=\log_2\left(\frac{1}{\sum_i \sum_j P\left(a_i\right)P\left(b_j\right)}\right)=\log_2\left(1\right)=0</math>|{{EquationRef|10}}}}

Note that <math>I\left(A ; B\right)=0</math> when <math>A</math> and <math>B</math> are independent.
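A quick numerical spot check of this result, using randomly generated joint distributions (the 3-symbol alphabets and the helper function below are for illustration only):

<syntaxhighlight lang="python">
from math import log2
import random

random.seed(0)

def mutual_information(P_AB):
    """Average mutual information I(A; B) of a joint probability table, in bits."""
    P_A = [sum(row) for row in P_AB]
    P_B = [sum(row[j] for row in P_AB) for j in range(len(P_AB[0]))]
    return sum(p * log2(p / (P_A[i] * P_B[j]))
               for i, row in enumerate(P_AB)
               for j, p in enumerate(row) if p > 0)

# Random joint distributions over 3 input and 3 output symbols
for _ in range(5):
    w = [[random.random() for _ in range(3)] for _ in range(3)]
    total = sum(sum(row) for row in w)
    P_AB = [[x / total for x in row] for row in w]   # normalize into a valid joint distribution
    print(mutual_information(P_AB) >= 0)             # True every time, as expected
</syntaxhighlight>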

== Conditional and Joint Entropy ==

Given <math>A</math> and <math>B</math>, and their entropies:

{{NumBlk|::|<math>H\left(A\right)=\sum_i P\left(a_i\right)\log_2\left(\frac{1}{P\left(a_i\right)}\right)</math>|{{EquationRef|11}}}}

{{NumBlk|::|<math>H\left(B\right)=\sum_j P\left(b_j\right)\log_2\left(\frac{1}{P\left(b_j\right)}\right)</math>|{{EquationRef|12}}}}

=== Conditional Entropy ===

The conditional entropy is a measure of the average uncertainty about <math>A</math> when <math>B</math> is known, and we can define it as:

{{NumBlk|::|<math>H\left(A \mid B\right)=\sum_j P\left(b_j\right)\sum_i P\left(a_i \mid b_j\right)\log_2\left(\frac{1}{P\left(a_i \mid b_j\right)}\right)=\sum_i \sum_j P\left(a_i, b_j\right)\log_2\left(\frac{1}{P\left(a_i \mid b_j\right)}\right)</math>|{{EquationRef|13}}}}

And similarly,

{{NumBlk|::|<math>H\left(B \mid A\right)=\sum_i P\left(a_i\right)\sum_j P\left(b_j \mid a_i\right)\log_2\left(\frac{1}{P\left(b_j \mid a_i\right)}\right)=\sum_i \sum_j P\left(a_i, b_j\right)\log_2\left(\frac{1}{P\left(b_j \mid a_i\right)}\right)</math>|{{EquationRef|14}}}}
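As a rough illustration of Equations 13 and 14, the following Python sketch computes both conditional entropies from a small joint probability table. The table is an arbitrary example:

<syntaxhighlight lang="python">
from math import log2

# Arbitrary example joint distribution P(a_i, b_j); rows index A, columns index B
P_AB = [[0.30, 0.10],
        [0.20, 0.40]]

P_A = [sum(row) for row in P_AB]                             # marginal P(a_i)
P_B = [sum(P_AB[i][j] for i in range(2)) for j in range(2)]  # marginal P(b_j)

# Equation 13: H(A|B), using P(a_i | b_j) = P(a_i, b_j) / P(b_j)
H_A_given_B = sum(P_AB[i][j] * log2(P_B[j] / P_AB[i][j])
                  for i in range(2) for j in range(2))

# Equation 14: H(B|A), using P(b_j | a_i) = P(a_i, b_j) / P(a_i)
H_B_given_A = sum(P_AB[i][j] * log2(P_A[i] / P_AB[i][j])
                  for i in range(2) for j in range(2))

print(H_A_given_B, H_B_given_A)   # average remaining uncertainty, in bits
</syntaxhighlight>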

=== Joint Entropy ===

If we extend the definition of entropy to two (or more) random variables, <math>A</math> and <math>B</math>, we can define the joint entropy of <math>A</math> and <math>B</math> as:

{{NumBlk|::|<math>H\left(A, B\right)=\sum_i \sum_j P\left(a_i, b_j\right)\log_2\left(\frac{1}{P\left(a_i, b_j\right)}\right)</math>|{{EquationRef|15}}}}

Expanding the expression for joint entropy, and using <math>P\left(a_i, b_j\right)=P\left(a_i \mid b_j\right)P\left(b_j\right)</math>, we get:

{{NumBlk|::|<math>H\left(A, B\right)=\sum_i \sum_j P\left(a_i, b_j\right)\log_2\left(\frac{1}{P\left(a_i \mid b_j\right)}\right)+\sum_j P\left(b_j\right)\log_2\left(\frac{1}{P\left(b_j\right)}\right)=H\left(A \mid B\right)+H\left(B\right)</math>|{{EquationRef|16}}}}

If we instead used <math>P\left(a_i, b_j\right)=P\left(b_j \mid a_i\right)P\left(a_i\right)</math>, we would get the alternative expression:

{{NumBlk|::|<math>H\left(A, B\right)=\sum_i \sum_j P\left(a_i, b_j\right)\log_2\left(\frac{1}{P\left(b_j \mid a_i\right)}\right)+\sum_i P\left(a_i\right)\log_2\left(\frac{1}{P\left(a_i\right)}\right)=H\left(B \mid A\right)+H\left(A\right)</math>|{{EquationRef|17}}}}

We can then expand our expression for <math>I\left(A ; B\right)</math> as:

{{NumBlk|::|<math>I\left(A ; B\right)=\sum_i \sum_j P\left(a_i, b_j\right)\log_2\left(\frac{P\left(a_i \mid b_j\right)}{P\left(a_i\right)}\right)=H\left(A\right)-H\left(A \mid B\right)=H\left(B\right)-H\left(B \mid A\right)=H\left(A\right)+H\left(B\right)-H\left(A, B\right)</math>|{{EquationRef|18}}}}
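These identities are easy to check numerically. The Python sketch below verifies Equation 18 on the same kind of small, arbitrary joint probability table used above:

<syntaxhighlight lang="python">
from math import log2, isclose

# Arbitrary example joint distribution P(a_i, b_j); rows index A, columns index B
P_AB = [[0.30, 0.10],
        [0.20, 0.40]]

P_A = [sum(row) for row in P_AB]                             # marginal P(a_i)
P_B = [sum(P_AB[i][j] for i in range(2)) for j in range(2)]  # marginal P(b_j)

H_A  = sum(p * log2(1 / p) for p in P_A)                     # Equation 11
H_B  = sum(p * log2(1 / p) for p in P_B)                     # Equation 12
H_AB = sum(p * log2(1 / p) for row in P_AB for p in row)     # Equation 15
H_A_given_B = sum(P_AB[i][j] * log2(P_B[j] / P_AB[i][j]) for i in range(2) for j in range(2))
H_B_given_A = sum(P_AB[i][j] * log2(P_A[i] / P_AB[i][j]) for i in range(2) for j in range(2))
I_AB = sum(P_AB[i][j] * log2(P_AB[i][j] / (P_A[i] * P_B[j])) for i in range(2) for j in range(2))

# Equation 18: all three expansions of I(A; B) agree
print(isclose(I_AB, H_A - H_A_given_B))   # True
print(isclose(I_AB, H_B - H_B_given_A))   # True
print(isclose(I_AB, H_A + H_B - H_AB))    # True
</syntaxhighlight>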

== Sources ==

* Tom Carter's notes on Information Theory
* Dan Hirschberg's notes on Data Compression
