Difference between revisions of "Entropy, Relative Entropy, Mutual Information"

Latest revision as of 09:39, 9 September 2020

Definitions

Entropy

The entropy of a random variable is a measure of the uncertainty of the random variable
- it is a measure of the amount of information required on average to describe the random variable

Relative Entropy

a measure of the distance between two distributions (it is not a true metric: it is not symmetric and does not satisfy the triangle inequality)
a measure of the inefficiency of assuming that the distribution is $q$ when the true distribution is $p$.

Mutual Information

a measure of the amount of information that one random variable contains about another random variable

Entropy

Definitions:

Shannon entropy
The entropy of a random variable is a measure of the uncertainty of the random variable
- it is a measure of the amount of information required on average to describe the random variable

Desired Properties^[1]

Uniform distributions have maximum uncertainty.
Uncertainty is additive for independent events.
Adding an outcome with zero probability has no effect.
The measure of uncertainty is continuous in all its arguments.
Uniform distributions with more outcomes have more uncertainty.
Events have non-negative uncertainty.
Events with a certain outcome have zero uncertainty.
Flipping (reordering) the arguments, i.e. the outcome probabilities, has no effect.
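
As a rough numerical illustration, several of the properties listed above can be spot-checked against the Shannon formula of Eq. (1). This is only a sketch on a few hand-picked distributions, not a proof; the helper name `H` and the example pmfs are assumptions made for this example.

```python
import math

def H(probs):
    """Shannon entropy in bits; zero-probability outcomes contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

uniform4 = [0.25] * 4
skewed4 = [0.7, 0.1, 0.1, 0.1]  # hypothetical non-uniform pmf on 4 outcomes

# Uniform distributions have maximum uncertainty (checked against one skewed pmf).
assert H(uniform4) > H(skewed4)

# Uncertainty is additive for independent events: H(X, Y) = H(X) + H(Y).
px, py = [0.5, 0.5], [0.2, 0.8]
joint = [a * b for a in px for b in py]
assert math.isclose(H(joint), H(px) + H(py))

# Adding an outcome with zero probability has no effect.
assert math.isclose(H(skewed4 + [0.0]), H(skewed4))

# Uniform distributions with more outcomes have more uncertainty.
assert H([1 / 8] * 8) > H(uniform4)

# Events with a certain outcome have zero uncertainty.
assert H([1.0]) == 0

# Flipping (reordering) the arguments has no effect.
assert math.isclose(H([0.1, 0.7, 0.1, 0.1]), H(skewed4))
```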

Formulation

The entropy of a discrete random variable, $X$, is

$$H\left(X\right)=-\sum _{x\in {\mathcal {X}}}p\left(x\right)\cdot \log _{2}p\left(x\right)\tag{1}$$

where $X$ has a probability mass function (pmf), $p\left(x\right)$, and an alphabet ${\mathcal {X}}$.
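
A minimal sketch of Eq. (1) in Python may help make the formula concrete. The function name `entropy`, the list representation of the pmf, and the two example coins are assumptions made for illustration; terms with $p\left(x\right)=0$ are skipped, following the usual convention that they contribute nothing to the sum.

```python
import math

def entropy(pmf, base=2.0):
    """Shannon entropy of a discrete pmf given as a sequence of probabilities."""
    if any(p < 0 for p in pmf) or not math.isclose(sum(pmf), 1.0):
        raise ValueError("pmf must be non-negative and sum to 1")
    # Outcomes with p(x) = 0 are skipped (convention: 0 * log 0 = 0).
    return -sum(p * math.log(p, base) for p in pmf if p > 0)

fair_coin = [0.5, 0.5]
biased_coin = [0.9, 0.1]

print(entropy(fair_coin))    # 1 bit: a fair coin is maximally uncertain
print(entropy(biased_coin))  # roughly 0.469 bits: the outcome is more predictable
```

The biased coin has lower entropy than the fair one because its outcome is easier to guess, which matches the reading of entropy as average uncertainty.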

Expected Value

For a discrete random variable, $X$, with probability mass function, $p\left(x\right)$, the expected value of $X$ is

$$E\left[X\right]=\sum _{x\in {\mathcal {X}}}x\cdot p\left(x\right)\tag{2}$$

For a discrete random variable, $X$, with probability mass function, $p\left(x\right)$, the expected value of $g\left(X\right)$ is

$$E\left[g\left(X\right)\right]=\sum _{x\in {\mathcal {X}}}g\left(x\right)\cdot p\left(x\right)\tag{3}$$

Consider the case where $g\left(x\right)=\log _{2}\left({\tfrac {1}{p\left(x\right)}}\right)$. We get

$$E\left[\log _{2}\left({\tfrac {1}{p\left(x\right)}}\right)\right]=\sum _{x\in {\mathcal {X}}}\log _{2}\left({\tfrac {1}{p\left(x\right)}}\right)\cdot p\left(x\right)=-\sum _{x\in {\mathcal {X}}}p\left(x\right)\cdot \log _{2}p\left(x\right)=H\left(X\right)\tag{4}$$
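
The identity in Eq. (4) can be checked numerically. The sketch below computes the entropy twice, once as the expectation of Eq. (3) with $g\left(x\right)=\log _{2}\left({\tfrac {1}{p\left(x\right)}}\right)$ and once directly from Eq. (1). The names `expected_value` and `surprisal` and the three-symbol pmf are assumptions chosen for this example.

```python
import math

def expected_value(g, pmf):
    """E[g(X)] as in Eq. (3), for a pmf given as a dict mapping x -> p(x)."""
    return sum(g(x) * p for x, p in pmf.items())

# Hypothetical pmf over a three-symbol alphabet.
pmf = {"a": 0.5, "b": 0.25, "c": 0.25}

def surprisal(x):
    """g(x) = log2(1 / p(x)), the function used in Eq. (4)."""
    return math.log2(1.0 / pmf[x])

h_as_expectation = expected_value(surprisal, pmf)
h_direct = -sum(p * math.log2(p) for p in pmf.values())

print(h_as_expectation, h_direct)  # 1.5 1.5
```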

Lemma 1: Entropy is greater than or equal to zero

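$$H\left(X\right)\geq 0\tag{5}$$

Proof: Since $0<p\left(x\right)\leq 1$ for every outcome with nonzero probability, $\log _{2}\left({\tfrac {1}{p\left(x\right)}}\right)\geq 0$ for each such $x$; outcomes with $p\left(x\right)=0$ contribute nothing to the sum under the convention $0\cdot \log _{2}0=0$. The expectation of a non-negative quantity is non-negative, so $E\left[\log _{2}\left({\tfrac {1}{p\left(x\right)}}\right)\right]\geq 0$. Thus, from Eq. (4), we get $H\left(X\right)\geq 0$.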

Lemma 2: Changing the logarithm base

$$H_{b}\left(X\right)=\left(\log _{b}a\right)\cdot H_{a}\left(X\right)\tag{6}$$

Proof:

Given that
- $H_{b}\left(X\right)=-\sum _{x\in {\mathcal {X}}}p\left(x\right)\cdot \log _{b}p\left(x\right)$
- $H_{a}\left(X\right)=-\sum _{x\in {\mathcal {X}}}p\left(x\right)\cdot \log _{a}p\left(x\right)$
And since $\log _{b}p=\log _{b}a\cdot \log _{a}p$ for any $p>0$,
substituting into the sum for $H_{b}\left(X\right)$ and factoring out the constant $\log _{b}a$, we get $H_{b}\left(X\right)=\left(\log _{b}a\right)\cdot H_{a}\left(X\right)$

Note that the entropy, $H_{b}\left(X\right)$, has units of bits for $b=2$, or nats (natural units) for $b=e$, or dits (decimal digits) for $b=10$.
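
To illustrate Lemma 2 and the unit conversions, the following sketch computes the entropy of one pmf in bits, nats, and dits and checks Eq. (6) with $a=2$. The function name `entropy` and the example pmf are assumptions made for this illustration.

```python
import math

def entropy(pmf, base):
    """Entropy of a pmf (sequence of probabilities) using logarithms in `base`."""
    return -sum(p * math.log(p, base) for p in pmf if p > 0)

pmf = [0.5, 0.25, 0.125, 0.125]  # hypothetical example distribution

h_bits = entropy(pmf, 2)        # base 2  -> bits
h_nats = entropy(pmf, math.e)   # base e  -> nats
h_dits = entropy(pmf, 10)       # base 10 -> dits

# Eq. (6) with a = 2: H_b(X) = log_b(2) * H_2(X)
assert math.isclose(h_nats, math.log(2) * h_bits)
assert math.isclose(h_dits, math.log10(2) * h_bits)

print(h_bits, h_nats, h_dits)
```

Changing the base only rescales the entropy by a constant factor, so comparisons between distributions do not depend on the unit chosen.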

Joint Entropy

Definition:

a measure of the uncertainty associated with a set of variables

The joint entropy of a pair of discrete random variables $\left(X,Y\right)$ with joint pmf $p\left(x,y\right)$ is defined as

$$H\left(X,Y\right)=-\sum _{x\in {\mathcal {X}}}\sum _{y\in {\mathcal {Y}}}p\left(x,y\right)\cdot \log _{2}p\left(x,y\right)\tag{7}$$
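
A short sketch of Eq. (7), assuming the joint pmf is stored as a dictionary keyed by $\left(x,y\right)$ pairs. The function name `joint_entropy` and the two example distributions are hypothetical and chosen only to show the two extremes for a pair of binary variables.

```python
import math

def joint_entropy(joint_pmf):
    """H(X, Y) in bits for a joint pmf given as a dict {(x, y): p(x, y)}."""
    return -sum(p * math.log2(p) for p in joint_pmf.values() if p > 0)

# Two independent fair bits: all four (x, y) pairs are equally likely.
independent = {(x, y): 0.25 for x in (0, 1) for y in (0, 1)}

# Y is always a copy of X: only (0, 0) and (1, 1) can occur.
copied = {(0, 0): 0.5, (1, 1): 0.5}

print(joint_entropy(independent))  # 2.0
print(joint_entropy(copied))       # 1.0
```

When the two variables are independent the joint entropy is the sum of the individual entropies (2 bits here), while a fully dependent pair carries no more uncertainty than one variable alone (1 bit).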

References

↑ [1] https://towardsdatascience.com/entropy-is-a-measure-of-uncertainty-e2c000301c2c?gi=624b56e1fe17

@@ Line 14: / Line 14: @@
 == Entropy ==
-The entropy of a discrete random variable, <math>X</math>, is
+Definitions:
+* Shannon entropy
+* a measure of the uncertainty of a random variable
+* The entropy of a random variable is a measure of the uncertainty of the random variable
+** it is a measure of the amount of information required on the average to describe the random variable
-{{NumBlk|:|<math>H\left(X\right)=-\sum_{x\in \mathcal{X}} p\left(x\right) \log_2 p\left(x\right)</math>|{{EquationRef|1}}}}
+=== Desired Properties<ref>[https://towardsdatascience.com/entropy-is-a-measure-of-uncertainty-e2c000301c2c?gi=624b56e1fe17]</ref> ===
+# Uniform distributions have maximum uncertainty.
+# Uncertainty is additive for independent events.
+# Adding an outcome with zero probability has no effect.
+# The measure of uncertainty is continuous in all its arguments.
+# Uniform distributions with more outcomes have more uncertainty.
+# Events have non-negative uncertainty.
+# Events with a certain outcome have zero uncertainty.
+# Flipping the arguments has no effect.
+=== Formulation ===
+The ''entropy'' of a discrete random variable, <math>X</math>, is
+{{NumBlk|:|<math>H\left(X\right)=-\sum_{x\in \mathcal{X}} p\left(x\right) \cdot\log_2 p\left(x\right)</math>|{{EquationRef|1}}}}
 where <math>X</math> has a probability mass function (pmf), <math>p\left(x\right)</math>, and an alphabet <math>\mathcal{X}</math>.
 === Expected Value ===
-Given a random variable, <math>X</math> with probability mass function <math>p\left(x\right)</math>, the expected value of <math>g\left(X\right)</math> is
+For a discrete random variable, <math>X</math>, with probability mass function, <math>p\left(x\right)</math>, the expected value of <math>X</math> is
+{{NumBlk|:|<math>E\left[X\right]=\sum_{x\in \mathcal{X}} x\cdot p\left(x\right)</math>|{{EquationRef|2}}}}
+For a discrete random variable, <math>X</math>, with probability mass function, <math>p\left(x\right)</math>, the expected value of <math>g\left(X\right)</math> is
+{{NumBlk|:|<math>E\left[g\left(X\right)\right]=\sum_{x\in \mathcal{X}} g\left(x\right)\cdot p\left(x\right)</math>|{{EquationRef|3}}}}
+Consider the case where <math>g\left(x\right)=\log_2\left(\tfrac{1}{p\left(x\right)}\right)</math>. We get
+{{NumBlk|:|<math>E\left[\log_2\left(\tfrac{1}{p\left(x\right)}\right)\right]=\sum_{x\in \mathcal{X}} \log_2\left(\tfrac{1}{p\left(x\right)}\right) \cdot p\left(x\right)=-\sum_{x\in \mathcal{X}} p\left(x\right) \cdot \log_2 p\left(x\right)=H\left(X\right)</math>|{{EquationRef|4}}}}
+=== Lemma 1: Entropy is greater than or equal to zero ===
+{{NumBlk|:|<math>H\left(X\right)\ge 0</math>|{{EquationRef|5}}}}
+'''Proof''': Since <math>0 \le p\left(x\right) \le 1</math>, then <math>\log_2\left(\tfrac{1}{p\left(x\right)}\right) \ge 0</math>, and subsequently, <math>E\left[\log_2\left(\tfrac{1}{p\left(x\right)}\right)\right] \ge 0</math>. Thus from Eq. ({{EquationNote|4}}) we get <math>H\left(X\right)\ge 0</math>.
+=== Lemma 2: Changing the logarithm base ===
+{{NumBlk|:|<math>H_b\left(X\right)=\left(\log_b a\right)\cdot H_a\left(X\right)</math>|{{EquationRef|6}}}}
+'''Proof''':
+* Given that
+** <math>H_b\left(X\right)=-\sum_{x\in \mathcal{X}} p\left(x\right)\cdot \log_b p\left(x\right)</math>
+** <math>H_a\left(X\right)=-\sum_{x\in \mathcal{X}} p\left(x\right)\cdot \log_a p\left(x\right)</math>
+* And since <math>\log_b p = \log_b a \cdot \log_a p</math>
+* We get <math>H_b\left(X\right)=\left(\log_b a\right)\cdot H_a\left(X\right)</math>
+Note that the entropy, <math>H_b\left(X\right)</math>, has units of ''bits'' for <math>b=2</math>, or ''nats'' (natural units) for <math>b=e</math>, or ''dits'' (decimal digits) for <math>b=10</math>.
+== Joint Entropy ==
+Definition:
+* a measure of the uncertainty associated with a set of variables
+The ''joint entropy'' of a pair of discrete random variables <math>\left(X, Y\right)</math> with joint pmf <math>p\left(x, y\right)</math> is defined as
+{{NumBlk|:|<math>H\left(X, Y\right)=-\sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p\left(x, y\right)\cdot \log_2 p\left(x, y\right)</math>|{{EquationRef|7}}}}
+== References ==
+<references />

