Difference between revisions of "Entropy, Relative Entropy, Mutual Information"
Jump to navigation
Jump to search
(28 intermediate revisions by the same user not shown) | |||
Line 14: | Line 14: | ||
== Entropy == | == Entropy == | ||
− | The entropy of a | + | Definitions: |
+ | * Shannon entropy | ||
+ | * a measure of the uncertainty of a random variable | ||
+ | * The entropy of a random variable is a measure of the uncertainty of the random variable | ||
+ | ** it is a measure of the amount of information required on the average to describe the random variable | ||
− | {{NumBlk|:|<math>H\left(X\right)=-\sum_{x\in \mathcal{X}} p\left(x\right) \log_2 p\left(x\right)</math>|{{EquationRef|1}}}} | + | === Desired Properties<ref>[https://towardsdatascience.com/entropy-is-a-measure-of-uncertainty-e2c000301c2c?gi=624b56e1fe17]</ref> === |
+ | # Uniform distributions have maximum uncertainty. | ||
+ | # Uncertainty is additive for independent events. | ||
+ | # Adding an outcome with zero probability has no effect. | ||
+ | # The measure of uncertainty is continuous in all its arguments. | ||
+ | # Uniform distributions with more outcomes have more uncertainty. | ||
+ | # Events have non-negative uncertainty. | ||
+ | # Events with a certain outcome have zero uncertainty. | ||
+ | # Flipping the arguments has no effect. | ||
+ | |||
+ | === Formulation === | ||
+ | The ''entropy'' of a discrete random variable, <math>X</math>, is | ||
+ | |||
+ | {{NumBlk|:|<math>H\left(X\right)=-\sum_{x\in \mathcal{X}} p\left(x\right) \cdot\log_2 p\left(x\right)</math>|{{EquationRef|1}}}} | ||
where <math>X</math> has a probability mass function (pmf), <math>p\left(x\right)</math>, and an alphabet <math>\mathcal{X}</math>. | where <math>X</math> has a probability mass function (pmf), <math>p\left(x\right)</math>, and an alphabet <math>\mathcal{X}</math>. | ||
Line 31: | Line 48: | ||
Consider the case where <math>g\left(x\right)=\log_2\left(\tfrac{1}{p\left(x\right)}\right)</math>. We get | Consider the case where <math>g\left(x\right)=\log_2\left(\tfrac{1}{p\left(x\right)}\right)</math>. We get | ||
− | {{NumBlk|:|<math>E\left[\log_2\left(\tfrac{1}{p\left(x\right)}\right)\right]=\sum_{x\in \mathcal{X}} \log_2\left(\tfrac{1}{p\left(x\right)}\right) \cdot p\left(x\right)=-\sum_{x\in \mathcal{X}} p\left(x\right) \log_2 p\left(x\right)=H\left(X\right)</math>|{{EquationRef|4}}}} | + | {{NumBlk|:|<math>E\left[\log_2\left(\tfrac{1}{p\left(x\right)}\right)\right]=\sum_{x\in \mathcal{X}} \log_2\left(\tfrac{1}{p\left(x\right)}\right) \cdot p\left(x\right)=-\sum_{x\in \mathcal{X}} p\left(x\right) \cdot \log_2 p\left(x\right)=H\left(X\right)</math>|{{EquationRef|4}}}} |
+ | |||
+ | === Lemma 1: Entropy is greater than or equal to zero === | ||
+ | {{NumBlk|:|<math>H\left(X\right)\ge 0</math>|{{EquationRef|5}}}} | ||
+ | |||
+ | '''Proof''': Since <math>0 \le p\left(x\right) \le 1</math>, then <math>\log_2\left(\tfrac{1}{p\left(x\right)}\right) \ge 0</math>, and subsequently, <math>E\left[\log_2\left(\tfrac{1}{p\left(x\right)}\right)\right] \ge 0</math>. Thus from Eq. ({{EquationNote|4}}) we get <math>H\left(X\right)\ge 0</math>. | ||
+ | |||
+ | === Lemma 2: Changing the logarithm base === | ||
+ | {{NumBlk|:|<math>H_b\left(X\right)=\left(\log_b a\right)\cdot H_a\left(X\right)</math>|{{EquationRef|6}}}} | ||
+ | |||
+ | '''Proof''': | ||
+ | * Given that | ||
+ | ** <math>H_b\left(X\right)=-\sum_{x\in \mathcal{X}} p\left(x\right)\cdot \log_b p\left(x\right)</math> | ||
+ | ** <math>H_a\left(X\right)=-\sum_{x\in \mathcal{X}} p\left(x\right)\cdot \log_a p\left(x\right)</math> | ||
+ | * And since <math>\log_b p = \log_b a \cdot log_a p</math> | ||
+ | * We get <math>H_b\left(X\right)=\left(\log_b a\right)\cdot H_a\left(X\right)</math> | ||
+ | |||
+ | Note that the entropy, <math>H_b\left(X\right)</math>, has units of ''bits'' for <math>b=2</math>, or ''nats'' (natural units) for <math>b=e</math>, or ''dits'' (decimal digits) for <math>b=10</math>. | ||
+ | |||
+ | == Joint Entropy == | ||
+ | Definition: | ||
+ | * a measure of the uncertainty associated with a set of variables | ||
+ | |||
+ | The ''joint entropy'' of a pair of discrete random variables <math>\left(X, Y\right)</math> with joint pmf <math>p\left(x, y\right)</math> is defined as | ||
+ | |||
+ | {{NumBlk|:|<math>H\left(X, Y\right)=-\sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p\left(x, y\right)\cdot \log_2 p\left(x, y\right)</math>|{{EquationRef|7}}}} | ||
+ | |||
+ | |||
+ | == References == | ||
+ | <references /> |
Latest revision as of 09:39, 9 September 2020
Contents
Definitions
Entropy
- a measure of the uncertainty of a random variable
- The entropy of a random variable is a measure of the uncertainty of the random variable
- it is a measure of the amount of information required on the average to describe the random variable
Relative Entropy
- a measure of the distance between two distributions
- a measure of the inefficiency of assuming that the distribution is when the true distribution is .
Mutual Information
- a measure of the amount of information that one random variable contains about another random variable
Entropy
Definitions:
- Shannon entropy
- a measure of the uncertainty of a random variable
- The entropy of a random variable is a measure of the uncertainty of the random variable
- it is a measure of the amount of information required on the average to describe the random variable
Desired Properties[1]
- Uniform distributions have maximum uncertainty.
- Uncertainty is additive for independent events.
- Adding an outcome with zero probability has no effect.
- The measure of uncertainty is continuous in all its arguments.
- Uniform distributions with more outcomes have more uncertainty.
- Events have non-negative uncertainty.
- Events with a certain outcome have zero uncertainty.
- Flipping the arguments has no effect.
Formulation
The entropy of a discrete random variable, , is
-
(1)
where has a probability mass function (pmf), , and an alphabet .
Expected Value
For a discrete random variable, , with probability mass function, , the expected value of is
-
(2)
For a discrete random variable, , with probability mass function, , the expected value of is
-
(3)
Consider the case where . We get
-
(4)
Lemma 1: Entropy is greater than or equal to zero
-
(5)
Proof: Since , then , and subsequently, . Thus from Eq. (4) we get .
Lemma 2: Changing the logarithm base
-
(6)
Proof:
- Given that
- And since
- We get
Note that the entropy, , has units of bits for , or nats (natural units) for , or dits (decimal digits) for .
Joint Entropy
Definition:
- a measure of the uncertainty associated with a set of variables
The joint entropy of a pair of discrete random variables with joint pmf is defined as
-
(7)