In general, the channel itself can add noise. This means that the channel serves as an additional layer of uncertainty in our transmissions. Consider a channel with input symbols <math>A=\{a_1, a_2, \ldots, a_n\}</math> and output symbols <math>B=\{b_1, b_2, \ldots, b_m\}</math>. Note that the input and output alphabets do not need to have the same number of symbols. Given the noise in the channel, if we observe the output symbol <math>b_j</math>, we are not sure which <math>a_i</math> was the input symbol. We can then characterize the channel as a set of probabilities <math>\{P\left(a_i\mid b_j\right)\}</math>. Let us consider the information we get from observing a symbol <math>b_j</math>.
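
As a concrete illustration of this channel model, here is a minimal sketch (in Python, with made-up numbers that are not from the sources) of a binary symmetric channel: starting from the forward probabilities <math>P\left(b_j\mid a_i\right)</math> and the input priors <math>P\left(a_i\right)</math>, it recovers the characterization <math>\{P\left(a_i\mid b_j\right)\}</math> using Bayes' rule.

<syntaxhighlight lang="python">
# Hypothetical binary symmetric channel (illustrative numbers only):
# inputs A = {a1, a2}, outputs B = {b1, b2}, crossover probability 0.1.

p_a = [0.5, 0.5]                # a priori input probabilities P(a_i)
p_b_given_a = [[0.9, 0.1],      # P(b_j | a_i): row i is input a_i,
               [0.1, 0.9]]      # column j is output b_j

# Output probabilities: P(b_j) = sum_i P(a_i) * P(b_j | a_i)
p_b = [sum(p_a[i] * p_b_given_a[i][j] for i in range(2)) for j in range(2)]

# Channel characterization P(a_i | b_j), obtained via Bayes' rule
p_a_given_b = [[p_a[i] * p_b_given_a[i][j] / p_b[j] for j in range(2)]
               for i in range(2)]

print(p_b)          # [0.5, 0.5]
print(p_a_given_b)  # [[0.9, 0.1], [0.1, 0.9]]
</syntaxhighlight>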
 
== Definition ==
 
Given a probability model of the source, we have an ''a priori'' estimate <math>P\left(a_i\right)</math> that symbol <math>a_i</math> will be sent next. Upon observing <math>b_j</math>, we can revise our estimate to <math>P\left(a_i\mid b_j\right)</math>. The change in information, or ''mutual information'', is given by:
 
{{NumBlk|::|<math>I\left(a_i ; b_j\right)=\log_2\left(\frac{1}{P\left(a_i\right)}\right)-\log_2\left(\frac{1}{P\left(a_i \mid b_j\right)}\right)=\log_2\left(\frac{P\left(a_i \mid b_j\right)}{P\left(a_i\right)}\right)</math>|{{EquationRef|1}}}}
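
As a quick numerical illustration (the numbers are made up): suppose <math>P\left(a_1\right)=0.5</math>, and observing <math>b_1</math> revises this to <math>P\left(a_1\mid b_1\right)=0.9</math>. Then

:<math>I\left(a_1 ; b_1\right)=\log_2\left(\frac{0.9}{0.5}\right)\approx 0.85\ \text{bits},</math>

i.e. the observation provides about <math>0.85</math> of the <math>I\left(a_1\right)=\log_2\left(\frac{1}{0.5}\right)=1</math> bit needed to identify <math>a_1</math>.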
  
 
Let's look at a few properties of mutual information. Expressing the equation above in terms of <math>I\left(a_i\right)</math>:
 
{{NumBlk|::|<math>I\left(a_i ; b_j\right)=I\left(a_i\right) + \log_2\left(P\left(a_i \mid b_j\right)\right)</math>|{{EquationRef|2}}}}
  
 
Thus, we can say:
 
{{NumBlk|::|<math>I\left(a_i ; b_j\right)\leq I\left(a_i\right)</math>|{{EquationRef|3}}}}
  
 
This is expected since, after observing <math>b_j</math>, the amount of uncertainty is reduced, i.e. we know a bit more about <math>a_i</math>. The largest change in information occurs when <math>a_i</math> and <math>b_j</math> are perfectly correlated, in which case <math>I\left(a_i ; b_j\right)= I\left(a_i\right)</math>. From Bayes' Theorem, we have the property:
 
{{NumBlk|::|<math>I\left(a_i ; b_j\right)=\log_2\left(\frac{P\left(a_i \mid b_j\right)}{P\left(a_i\right)}\right)=\log_2\left(\frac{P\left(b_j \mid a_i\right)}{P\left(b_j\right)}\right)=I\left(b_j ; a_i\right)</math>|{{EquationRef|4}}}}
  
 
Note that if <math>a_i</math> and <math>b_j</math> are independent, i.e. <math>P\left(a_i\mid b_j\right) = P\left(a_i\right)</math> and <math>P\left(b_j\mid a_i\right) = P\left(b_j\right)</math>, then:
 
{{NumBlk|::|<math>I\left(a_i ; b_j\right)=\log_2\left(\frac{P\left(a_i \mid b_j\right)}{P\left(a_i\right)}\right) = \log_2\left(\frac{P\left(b_j \mid a_i\right)}{P\left(b_j\right)}\right) = \log_2\left(1\right)= 0</math>|{{EquationRef|5}}}}
  
 
We can get the average mutual information over all the input symbols as:
 
{{NumBlk|::|<math>I\left(A ; b_j\right)= \sum_{i=1}^n P\left(a_i\mid b_j\right)\cdot I\left(a_i;b_j\right)=\sum_{i=1}^n P\left(a_i\mid b_j\right)\cdot \log_2\left(\frac{P\left(a_i\mid b_j\right)}{P\left(a_i\right)}\right)</math>|{{EquationRef|6}}}}
  
 
Similarly, for all the output symbols:
 
{{NumBlk|::|<math>I\left(a_i ; B\right)= \sum_{j=1}^m P\left(b_j\mid a_i\right)\cdot  \log_2\left(\frac{P\left(b_j\mid a_i\right)}{P\left(b_j\right)}\right)</math>|{{EquationRef|7}}}}
  
 
For both input and output symbols, we get:
 
{{NumBlk|::|<math>\begin{align}
I\left(A ; B\right) & = \sum_{i=1}^n \sum_{j=1}^m P\left(a_i, b_j\right)\cdot \log_2\left(\frac{P\left(a_i \mid b_j\right)}{P\left(a_i\right)}\right) \\
& = \sum_{i=1}^n \sum_{j=1}^m P\left(a_i, b_j\right)\cdot \log_2\left(\frac{P\left( a_i, b_j\right)}{P\left(a_i\right)\cdot P\left(b_j\right)}\right) \\
& = I\left(B ; A\right)
\end{align}</math>|{{EquationRef|8}}}}
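
The double sum above is easy to evaluate numerically. The following sketch (made-up joint distribution, for illustration only) computes <math>I\left(A;B\right)</math> directly from <math>P\left(a_i, b_j\right)</math> and confirms the symmetry <math>I\left(A;B\right)=I\left(B;A\right)</math>.

<syntaxhighlight lang="python">
from math import log2

# Made-up joint distribution P(a_i, b_j): row i is a_i, column j is b_j.
p_ab = [[0.4, 0.1],
        [0.1, 0.4]]

# Marginals P(a_i) and P(b_j)
p_a = [sum(row) for row in p_ab]
p_b = [sum(p_ab[i][j] for i in range(2)) for j in range(2)]

def mutual_information(p_xy, p_x, p_y):
    """I(X;Y) = sum_ij P(x_i, y_j) * log2( P(x_i, y_j) / (P(x_i) * P(y_j)) )."""
    return sum(p_xy[i][j] * log2(p_xy[i][j] / (p_x[i] * p_y[j]))
               for i in range(len(p_x)) for j in range(len(p_y))
               if p_xy[i][j] > 0)

i_ab = mutual_information(p_ab, p_a, p_b)

# Symmetry check: transpose the joint distribution and swap the marginals.
p_ba = [[p_ab[i][j] for i in range(2)] for j in range(2)]
i_ba = mutual_information(p_ba, p_b, p_a)

print(round(i_ab, 6), round(i_ba, 6))  # both are ~0.278072 bits
</syntaxhighlight>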
  
== Non-Negativity of Mutual Information ==
 
To show the non-negativity of mutual information, let us use ''Jensen's Inequality'', which states that for a convex function, <math>f\left(x\right)</math>:
 
{{NumBlk|::|<math>\langle f\left(x\right)\rangle \ge f\left(\langle x\rangle\right)</math>|{{EquationRef|9}}}}
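
For example, if <math>x</math> is <math>0.25</math> or <math>1</math> with equal probability (arbitrary values, for illustration only), then <math>\langle -\log_2\left(x\right)\rangle = \tfrac{1}{2}\left(2\right)+\tfrac{1}{2}\left(0\right)=1</math>, while <math>-\log_2\left(\langle x\rangle\right) = -\log_2\left(0.625\right)\approx 0.68</math>, consistent with the inequality.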
  
 
Using the fact that <math>f\left(x\right)=-\log_2\left( x\right)</math> is convex, and applying this to our expression for mutual information, we get:
 
{{NumBlk|::|<math>\begin{align}
I\left(A ; B\right) & = \sum_{i=1}^n \sum_{j=1}^m P\left(a_i, b_j\right)\cdot \left(-\log_2\left(\frac{P\left(a_i\right)\cdot P\left(b_j\right)}{P\left(a_i, b_j\right)}\right)\right) \\
& \ge -\log_2\left(\sum_{i=1}^n \sum_{j=1}^m P\left(a_i, b_j\right)\cdot \frac{P\left(a_i\right)\cdot P\left(b_j\right)}{P\left(a_i, b_j\right)}\right)
= -\log_2\left(\sum_{i=1}^n P\left(a_i\right) \sum_{j=1}^m P\left(b_j\right)\right) = -\log_2\left(1\right) \\
& \ge 0\\
\end{align}</math>|{{EquationRef|10}}}}
  
 
Note that <math>I\left(A ; B\right) =0</math> when <math>A</math> and <math>B</math> are independent.
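
Both facts are easy to check numerically. The sketch below (arbitrary distributions, for illustration only) evaluates the double-sum expression for <math>I\left(A;B\right)</math> on a few random joint distributions, where it is always non-negative, and on a product distribution <math>P\left(a_i, b_j\right)=P\left(a_i\right)P\left(b_j\right)</math>, where it is zero.

<syntaxhighlight lang="python">
from math import log2
import random

def mutual_information(p_ab):
    """I(A;B) from a joint distribution given as nested lists, p_ab[i][j] = P(a_i, b_j)."""
    n, m = len(p_ab), len(p_ab[0])
    p_a = [sum(p_ab[i][j] for j in range(m)) for i in range(n)]
    p_b = [sum(p_ab[i][j] for i in range(n)) for j in range(m)]
    return sum(p_ab[i][j] * log2(p_ab[i][j] / (p_a[i] * p_b[j]))
               for i in range(n) for j in range(m) if p_ab[i][j] > 0)

random.seed(0)

# Random joint distributions: I(A;B) is never negative.
for _ in range(5):
    w = [[random.random() for _ in range(3)] for _ in range(2)]
    total = sum(sum(row) for row in w)
    p_ab = [[x / total for x in row] for row in w]
    assert mutual_information(p_ab) >= -1e-12   # non-negative, up to rounding

# Independent case: the joint is the product of the marginals, so I(A;B) = 0.
p_a, p_b = [0.3, 0.7], [0.2, 0.5, 0.3]
p_indep = [[pa * pb for pb in p_b] for pa in p_a]
print(mutual_information(p_indep))  # ~0.0 (up to floating-point rounding)
</syntaxhighlight>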
 
== Conditional and Joint Entropy ==
 
Given <math>A</math> and <math>B</math>, and their entropies:
 
{{NumBlk|::|<math>H\left(A\right)=\sum_{i=1}^n P\left(a_i\right)\cdot\log_2\left(\frac{1}{P\left(a_i\right)}\right)</math>|{{EquationRef|11}}}}
{{NumBlk|::|<math>H\left(B\right)=\sum_{j=1}^m P\left(b_j\right)\cdot\log_2\left(\frac{1}{P\left(b_j\right)}\right)</math>|{{EquationRef|12}}}}
  
=== Conditional Entropy ===
 
The '''conditional entropy''' is a measure of the average uncertainty about <math>B</math> when <math>A</math> is known, and we can define it as:
 
{{NumBlk|::|<math>\begin{align}
H\left(B\mid A\right) & =\sum_{i=1}^n P\left(a_i\right) \sum_{j=1}^m P\left(b_j\mid a_i\right)\cdot\log_2\left(\frac{1}{P\left(b_j\mid a_i\right)}\right) \\
& =\sum_{i=1}^n \sum_{j=1}^m P\left(a_i\right) P\left(b_j\mid a_i\right)\cdot\log_2\left(\frac{1}{P\left(b_j\mid a_i\right)}\right) \\
& =\sum_{i=1}^n \sum_{j=1}^m P\left(b_j, a_i\right)\cdot\log_2\left(\frac{P\left(a_i\right)}{P\left(b_j, a_i\right)}\right)\\
\end{align}</math>|{{EquationRef|13}}}}
  
 
And similarly,
 
{{NumBlk|::|<math>\begin{align}
H\left(A\mid B\right) & =\sum_{j=1}^m P\left(b_j\right) \sum_{i=1}^n P\left(a_i\mid b_j\right)\cdot\log_2\left(\frac{1}{P\left(a_i\mid b_j\right)}\right) \\
& =\sum_{i=1}^n \sum_{j=1}^m P\left(b_j, a_i\right)\cdot\log_2\left(\frac{P\left(b_j\right)}{P\left(b_j, a_i\right)}\right)\\
& \neq  H\left(B\mid A\right) \\
\end{align}</math>|{{EquationRef|14}}}}
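
To make the asymmetry concrete, the following sketch evaluates both conditional entropies from a small, made-up joint distribution using the joint-probability forms above (the numbers are for illustration only).

<syntaxhighlight lang="python">
from math import log2

# Made-up joint distribution P(a_i, b_j): row i is a_i, column j is b_j.
p_ab = [[0.30, 0.20, 0.10],
        [0.10, 0.05, 0.25]]

p_a = [sum(row) for row in p_ab]                               # P(a_i)
p_b = [sum(p_ab[i][j] for i in range(2)) for j in range(3)]    # P(b_j)

# H(B|A) = sum_ij P(a_i, b_j) * log2( P(a_i) / P(a_i, b_j) )
h_b_given_a = sum(p_ab[i][j] * log2(p_a[i] / p_ab[i][j])
                  for i in range(2) for j in range(3))

# H(A|B) = sum_ij P(a_i, b_j) * log2( P(b_j) / P(a_i, b_j) )
h_a_given_b = sum(p_ab[i][j] * log2(p_b[j] / p_ab[i][j])
                  for i in range(2) for j in range(3))

print(round(h_b_given_a, 4), round(h_a_given_b, 4))  # 1.395 and 0.8071: not equal
</syntaxhighlight>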
  
=== Joint Entropy ===
 
If we extend the definition of entropy to two (or more) random variables, <math>A</math> and <math>B</math>, we can define the '''joint entropy''' of <math>A</math> and <math>B</math> as:
 
{{NumBlk|::|<math>H\left(A, B\right)=\sum_{i=1}^n \sum_{j=1}^m P\left(a_i, b_j\right)\cdot\log_2\left(\frac{1}{P\left(a_i, b_j\right)}\right)</math>|{{EquationRef|15}}}}
  
 
Expanding the expression for joint entropy, and using <math>P\left(a_i, b_j\right) = P\left(a_i\mid b_j\right)P\left(b_j\right)</math>, we get:
 
{{NumBlk|::|<math>\begin{align}
H\left(A, B\right) & = \sum_{i=1}^n \sum_{j=1}^m P\left(a_i, b_j\right)\cdot\log_2\left(\frac{1}{P\left(a_i\mid b_j\right) P\left(b_j\right)}\right)\\
& = \sum_{i=1}^n \sum_{j=1}^m P\left(a_i, b_j\right)\cdot\log_2\left(\frac{1}{P\left(a_i\mid b_j\right)}\right) + \sum_{i=1}^n \sum_{j=1}^m P\left(a_i, b_j\right)\cdot\log_2\left(\frac{1}{P\left(b_j\right)}\right)\\
& = H\left(A\mid B\right) + \sum_{j=1}^m P\left(b_j\right)\cdot \log_2\left(\frac{1}{P\left(b_j\right)}\right)\\
& = H\left(A\mid B\right) + H\left(B\right)
\end{align}</math>|{{EquationRef|16}}}}
  
 
If we instead used <math>P\left(a_i, b_j\right) = P\left(b_j\mid a_i\right)P\left(a_i\right)</math>, we would get the alternative expression:
 
{{NumBlk|::|<math>H\left(A, B\right)=H\left(B\mid A\right) + H\left(A\right)</math>|{{EquationRef|17}}}}
  
 
We can then expand our expression for <math>I\left(A;B\right)</math> as:
 
{{NumBlk|::|<math>\begin{align}
I\left(A ; B\right) & = \sum_{i=1}^n \sum_{j=1}^m P\left(a_i, b_j\right)\cdot \log_2\left(\frac{P\left(a_i, b_j\right)}{P\left(a_i\right)\cdot P\left(b_j\right)}\right)\\
& = H\left(A\right) + H\left(B\right) - H\left(A, B\right)\\
& = H\left(A\right) - H\left(A\mid B\right)\\
& = H\left(B\right) - H\left(B\mid A\right)\\
\end{align}</math>|{{EquationRef|18}}}}
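
These identities can be verified numerically. The sketch below (same made-up joint distribution as in the conditional entropy example) computes <math>H\left(A\right)</math>, <math>H\left(B\right)</math>, <math>H\left(A,B\right)</math>, and the conditional entropies, and checks that all three expressions for <math>I\left(A;B\right)</math> agree.

<syntaxhighlight lang="python">
from math import log2, isclose

# Made-up joint distribution P(a_i, b_j): row i is a_i, column j is b_j.
p_ab = [[0.30, 0.20, 0.10],
        [0.10, 0.05, 0.25]]

p_a = [sum(row) for row in p_ab]
p_b = [sum(p_ab[i][j] for i in range(2)) for j in range(3)]

def entropy(p):
    """H = sum_k P_k * log2(1 / P_k)."""
    return sum(pk * log2(1 / pk) for pk in p if pk > 0)

h_a = entropy(p_a)
h_b = entropy(p_b)
h_ab = entropy([p for row in p_ab for p in row])   # joint entropy H(A,B)

h_a_given_b = h_ab - h_b    # H(A|B), from H(A,B) = H(A|B) + H(B)
h_b_given_a = h_ab - h_a    # H(B|A), from H(A,B) = H(B|A) + H(A)

# I(A;B) from the double sum, then compared against the entropy identities
i_ab = sum(p_ab[i][j] * log2(p_ab[i][j] / (p_a[i] * p_b[j]))
           for i in range(2) for j in range(3))

assert isclose(i_ab, h_a - h_a_given_b)
assert isclose(i_ab, h_b - h_b_given_a)
assert isclose(i_ab, h_a + h_b - h_ab)
print(round(i_ab, 4))  # ~0.1639 bits
</syntaxhighlight>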
  
 
== Sources ==
 

* Tom Carter's notes on Information Theory
* Dan Hirschberg's notes on Data Compression
