In general, the channel itself can add noise. This means that the channel serves as an additional layer of uncertainty in our transmissions. Consider a channel with input symbols $x_1, x_2, \ldots, x_n$ and output symbols $y_1, y_2, \ldots, y_m$. Note that the input and output alphabets do not need to have the same number of symbols. Given the noise in the channel, if we observe the output symbol $y_j$, we are not sure which $x_i$ was the input symbol. We can then characterize the channel as a set of probabilities $P(y_j \mid x_i)$. Let us consider the information we get from observing a symbol $y_j$.
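To make this concrete, a noisy channel can be represented by its transition matrix. Below is a minimal sketch in Python, assuming a hypothetical binary symmetric channel with a 10% crossover probability and a uniform input prior; the array names are illustrative only.

```python
import numpy as np

# Hypothetical binary channel: entry [i, j] is P(y_j | x_i).
# Here, a symmetric channel that flips its input 10% of the time.
P_y_given_x = np.array([[0.9, 0.1],
                        [0.1, 0.9]])

# Assumed a priori input distribution P(x_i).
P_x = np.array([0.5, 0.5])

# Joint distribution P(x_i, y_j) = P(x_i) P(y_j | x_i).
P_xy = P_x[:, None] * P_y_given_x

# Output distribution P(y_j) and the posterior P(x_i | y_j) via Bayes' theorem:
# the revised estimate of the input after observing an output symbol.
P_y = P_xy.sum(axis=0)
P_x_given_y = P_xy / P_y[None, :]

print(P_y)          # [0.5 0.5]
print(P_x_given_y)  # column j is the posterior over inputs given y_j
```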
Definition
Given a probability model of the source, we have an a priori estimate $P(x_i)$ that symbol $x_i$ will be sent next. Upon observing $y_j$, we can revise our estimate to $P(x_i \mid y_j)$. The change in information, or mutual information, is given by:
$$i(x_i; y_j) = \log \frac{1}{P(x_i)} - \log \frac{1}{P(x_i \mid y_j)} = \log \frac{P(x_i \mid y_j)}{P(x_i)} \tag{13}$$
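A quick numerical sketch of (13), using made-up values for the prior and the revised estimate (e.g. the ones produced by the hypothetical channel above):

```python
from math import log2

p_x = 0.5          # assumed a priori probability P(x_i)
p_x_given_y = 0.9  # assumed revised estimate P(x_i | y_j) after observing y_j

# Mutual information of the symbol pair, equation (13), in bits.
i_xy = log2(p_x_given_y / p_x)
print(i_xy)  # ~0.848 bits: observing y_j tells us about 0.85 bits about x_i
```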
Let's look at a few properties of mutual information. Expressing the equation above in terms of the self-information $i(x_i) = \log \frac{1}{P(x_i)}$:
$$i(x_i; y_j) = \log P(x_i \mid y_j) + \log \frac{1}{P(x_i)} = \log P(x_i \mid y_j) + i(x_i) \tag{14}$$
Since $P(x_i \mid y_j) \le 1$, the term $\log P(x_i \mid y_j)$ is at most zero. Thus, we can say:
$$i(x_i; y_j) \le i(x_i) = \log \frac{1}{P(x_i)} \tag{15}$$
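Continuing with the same made-up numbers, a quick check of the bound in (15):

```python
from math import log2

p_x, p_x_given_y = 0.5, 0.9      # assumed values, as before
i_xy = log2(p_x_given_y / p_x)   # ~0.848 bits, equation (13)
i_x = log2(1 / p_x)              # self-information of x_i: 1 bit
assert i_xy <= i_x               # equation (15); equality only when P(x_i | y_j) = 1
```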
This bound is expected: after observing $y_j$, the amount of uncertainty is reduced, i.e. we know a bit more about $x_i$, and the most change in information we can get is when $x_i$ and $y_j$ are perfectly correlated, with $P(x_i \mid y_j) = 1$. From Bayes' Theorem, we have the property:
$$i(x_i; y_j) = \log \frac{P(x_i \mid y_j)}{P(x_i)} = \log \frac{P(x_i, y_j)}{P(x_i)P(y_j)} = \log \frac{P(y_j \mid x_i)}{P(y_j)} = i(y_j; x_i) \tag{16}$$
Note that if $x_i$ and $y_j$ are independent, where $P(x_i \mid y_j) = P(x_i)$ and $P(y_j \mid x_i) = P(y_j)$, then:
$$i(x_i; y_j) = \log \frac{P(x_i)}{P(x_i)} = \log 1 = 0 \tag{17}$$
We can get the average mutual information over all the input symbols as:
$$I(X; y_j) = \sum_i P(x_i \mid y_j)\, i(x_i; y_j) = \sum_i P(x_i \mid y_j) \log \frac{P(x_i \mid y_j)}{P(x_i)} \tag{18}$$
Similarly, for all the output symbols:
$$I(x_i; Y) = \sum_j P(y_j \mid x_i)\, i(x_i; y_j) = \sum_j P(y_j \mid x_i) \log \frac{P(y_j \mid x_i)}{P(y_j)} \tag{19}$$
For both input and output symbols, we get:
$$I(X; Y) = \sum_i \sum_j P(x_i, y_j)\, i(x_i; y_j) = \sum_i \sum_j P(x_i, y_j) \log \frac{P(x_i, y_j)}{P(x_i)P(y_j)} \tag{20}$$
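Equation (20) translated directly into code. This is a minimal sketch assuming the joint distribution of the hypothetical binary symmetric channel from earlier; `mutual_information` is an illustrative helper, not a library function.

```python
import numpy as np

def mutual_information(P_xy):
    """Average mutual information I(X;Y) in bits, equation (20),
    for a joint distribution with P_xy[i, j] = P(x_i, y_j)."""
    P_x = P_xy.sum(axis=1, keepdims=True)  # marginal P(x_i)
    P_y = P_xy.sum(axis=0, keepdims=True)  # marginal P(y_j)
    mask = P_xy > 0                        # skip zero-probability terms
    return float(np.sum(P_xy[mask] * np.log2(P_xy[mask] / (P_x * P_y)[mask])))

# Joint distribution of the hypothetical binary symmetric channel (uniform prior).
P_xy = np.array([[0.45, 0.05],
                 [0.05, 0.45]])
print(mutual_information(P_xy))  # ~0.531 bits per transmitted symbol
```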
Non-Negativity of Mutual Information
To show the non-negativity of mutual information, let us use Jensen's Inequality, which states that for a convex function $f$:
$$\mathbb{E}[f(X)] \ge f(\mathbb{E}[X]), \quad \text{i.e.} \quad \sum_k p_k f(a_k) \ge f\!\left(\sum_k p_k a_k\right) \tag{21}$$
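A tiny numerical illustration of (21), with made-up probabilities $p_k$ and values $a_k$, using the convex function $f(t) = -\log_2 t$ that appears in the next step:

```python
from math import log2

p = [0.2, 0.3, 0.5]   # made-up probabilities p_k
a = [0.1, 0.4, 0.9]   # made-up values a_k

f = lambda t: -log2(t)                          # a convex function
lhs = sum(pk * f(ak) for pk, ak in zip(p, a))   # E[f(A)] ~ 1.14
rhs = f(sum(pk * ak for pk, ak in zip(p, a)))   # f(E[A]) ~ 0.76
assert lhs >= rhs                               # Jensen's inequality, equation (21)
```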
Using the fact that $-\log x$ is convex, and applying this to our expression for mutual information, we get:
$$I(X; Y) = \sum_i \sum_j P(x_i, y_j)\left(-\log \frac{P(x_i)P(y_j)}{P(x_i, y_j)}\right) \ge -\log\left(\sum_i \sum_j P(x_i, y_j)\,\frac{P(x_i)P(y_j)}{P(x_i, y_j)}\right) = -\log(1) = 0 \tag{22}$$
Note that $I(X; Y) = 0$ when $X$ and $Y$ are independent.
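The non-negativity in (22) can also be spot-checked numerically; the sketch below samples random joint distributions and re-defines the same `mutual_information` helper so it runs standalone (this is a sanity check, not part of the proof):

```python
import numpy as np

def mutual_information(P_xy):
    # Same helper as in the earlier sketch: I(X;Y) in bits from P(x_i, y_j).
    P_x = P_xy.sum(axis=1, keepdims=True)
    P_y = P_xy.sum(axis=0, keepdims=True)
    mask = P_xy > 0
    return float(np.sum(P_xy[mask] * np.log2(P_xy[mask] / (P_x * P_y)[mask])))

rng = np.random.default_rng(0)

# Random 3x4 joint distributions: I(X;Y) should never be negative.
for _ in range(1000):
    P_xy = rng.random((3, 4))
    P_xy /= P_xy.sum()
    assert mutual_information(P_xy) >= -1e-12   # tolerance for float round-off

# An independent joint distribution P(x)P(y) gives I(X;Y) = 0.
P_x = np.array([0.2, 0.3, 0.5])
P_y = np.array([0.25, 0.75])
print(mutual_information(np.outer(P_x, P_y)))   # ~0.0
```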
Conditional and Joint Entropy
Given $X$ and $Y$, and their entropies:
$$H(X) = -\sum_i P(x_i) \log P(x_i) \tag{23}$$

$$H(Y) = -\sum_j P(y_j) \log P(y_j) \tag{24}$$
The conditional entropy is a measure of the average uncertainty about $X$ when $Y$ is known, and we can define it as:
$$H(X \mid Y) = -\sum_i \sum_j P(x_i, y_j) \log P(x_i \mid y_j) \tag{25}$$
And similarly,
$$H(Y \mid X) = -\sum_i \sum_j P(x_i, y_j) \log P(y_j \mid x_i) \tag{26}$$
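Equation (25) in code, again assuming the hypothetical binary-symmetric-channel joint distribution; `conditional_entropy` is an illustrative name.

```python
import numpy as np

def conditional_entropy(P_xy):
    """H(X|Y) in bits, equation (25): -sum_ij P(x_i, y_j) log2 P(x_i | y_j)."""
    P_y = P_xy.sum(axis=0, keepdims=True)   # marginal P(y_j)
    P_x_given_y = P_xy / P_y                # P(x_i | y_j)
    mask = P_xy > 0                         # skip zero-probability terms
    return float(-np.sum(P_xy[mask] * np.log2(P_x_given_y[mask])))

# Joint distribution of the hypothetical binary symmetric channel (uniform prior).
P_xy = np.array([[0.45, 0.05],
                 [0.05, 0.45]])
print(conditional_entropy(P_xy))    # H(X|Y) ~ 0.469 bits of uncertainty left about X
print(conditional_entropy(P_xy.T))  # H(Y|X), by swapping the roles of X and Y
```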
If we extend the definition of entropy to two (or more) random variables, $X$ and $Y$, we can define the joint entropy of $X$ and $Y$ as:
$$H(X, Y) = -\sum_i \sum_j P(x_i, y_j) \log P(x_i, y_j) \tag{27}$$
Expanding the expression for joint entropy, and using $P(x_i, y_j) = P(x_i) P(y_j \mid x_i)$, we get:
$$H(X, Y) = -\sum_i \sum_j P(x_i, y_j)\left[\log P(x_i) + \log P(y_j \mid x_i)\right] = H(X) + H(Y \mid X) \tag{28}$$
If we instead used $P(x_i, y_j) = P(y_j) P(x_i \mid y_j)$, we would get the alternative expression:
$$H(X, Y) = H(Y) + H(X \mid Y) \tag{29}$$
We can then expand our expression for $I(X; Y)$ as:
$$I(X; Y) = H(X) - H(X \mid Y) = H(Y) - H(Y \mid X) = H(X) + H(Y) - H(X, Y) \tag{30}$$
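The chain rules (28) and (29) and the three expressions in (30) can be verified numerically for any joint distribution; here is a minimal sketch with a made-up 2x3 joint distribution (all numbers are assumptions for illustration):

```python
import numpy as np

def H(P):
    """Entropy in bits of a probability array of any shape; zero entries skipped."""
    P = P[P > 0]
    return float(-np.sum(P * np.log2(P)))

# Made-up 2x3 joint distribution P(x_i, y_j).
P_xy = np.array([[0.10, 0.25, 0.15],
                 [0.30, 0.05, 0.15]])
P_x = P_xy.sum(axis=1)   # marginal P(x_i)
P_y = P_xy.sum(axis=0)   # marginal P(y_j)

# Conditional entropies straight from definitions (25) and (26).
H_x_given_y = float(-np.sum(P_xy * np.log2(P_xy / P_y[None, :])))
H_y_given_x = float(-np.sum(P_xy * np.log2(P_xy / P_x[:, None])))

# Chain rules (28) and (29): H(X,Y) = H(X) + H(Y|X) = H(Y) + H(X|Y).
assert np.isclose(H(P_xy), H(P_x) + H_y_given_x)
assert np.isclose(H(P_xy), H(P_y) + H_x_given_y)

# Equation (30): three equivalent expressions for I(X;Y).
I = H(P_x) - H_x_given_y
assert np.isclose(I, H(P_y) - H_y_given_x)
assert np.isclose(I, H(P_x) + H(P_y) - H(P_xy))
print(I)   # ~0.18 bits for this particular joint distribution
```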
Sources
- Tom Carter's notes on Information Theory
- Dan Hirschberg's notes on Data Compression