Entropy, Relative Entropy, Mutual Information
Welcome to CoE 161 / CoE 197!
Since we are offering this class remotely, there will be many changes to our normal course delivery:
- There will be no face-to-face lecture classes. All the material will be made available via this site.
- There will be more emphasis on student-centric activities, e.g. analysis, design, and simulations. Thus, you will be mostly "learning by doing". In this context, we will set aside an hour every week for consultations and questions via video-conferencing.
- Grades will be based on the submitted deliverables from the activities. Though we will not be very strict regarding the deadlines, it is a good idea to keep up with the class schedule and avoid cramming later in the semester.
Let's get started!
Contents
Definitions
Entropy
- a measure of the uncertainty of a random variable
- The entropy of a random variable is a measure of the uncertainty of the random variable
- it is a measure of the amount of information required on the average to describe the random variable
Relative Entropy
- a measure of the distance between two distributions
- a measure of the inefficiency of assuming that the distribution is when the true distribution is .
Mutual Information
- a measure of the amount of information that one random variable contains about another random variable
Entropy
Definitions:
- Shannon entropy
- a measure of the uncertainty of a random variable
- The entropy of a random variable is a measure of the uncertainty of the random variable
- it is a measure of the amount of information required on the average to describe the random variable
Desired Properties[1]
- Uniform distributions have maximum uncertainty.
- Uncertainty is additive for independent events.
- Adding an outcome with zero probability has no effect.
- The measure of uncertainty is continuous in all its arguments.
- Uniform distributions with more outcomes have more uncertainty.
- Events have non-negative uncertainty.
- Events with a certain outcome have zero uncertainty.
- Flipping the arguments has no effect.
Formulation
The entropy of a discrete random variable, , is
-
(1)
where has a probability mass function (pmf), , and an alphabet .
Expected Value
For a discrete random variable, , with probability mass function, , the expected value of is
-
(2)
For a discrete random variable, , with probability mass function, , the expected value of is
-
(3)
Consider the case where . We get
-
(4)
Lemma 1: Entropy is greater than or equal to zero
-
(5)
Proof: Since , then , and subsequently, . Thus from Eq. (4) we get .
Lemma 2: Changing the logarithm base
-
(6)
Proof:
- Given that
- And since
- We get
Note that the entropy, , has units of bits for , or nats (natural units) for , or dits (decimal digits) for .
Joint Entropy
Definition:
- a measure of the uncertainty associated with a set of variables
The joint entropy of a pair of discrete random variables with joint pmf is defined as
-
(7)