The Data Processing Inequality

Markovity

A Markov chain is a random process describing a sequence of possible events in which the probability of each event depends only on the outcome of the previous event. Accordingly, we say that the random variables $X,Y,Z$ form a Markov chain in this order, denoted as:

X\rightarrow Y\rightarrow Z

(1)

if their joint distribution can be written as:

P\left(X=x,Y=y,Z=z\right)=P\left(Z=z\mid Y=y\right)\cdot P\left(Y=y\mid X=x\right)\cdot P\left(X=x\right)

(2)

Or in a more compact form:

P\left(x,y,z\right)=P\left(z\mid y\right)\cdot P\left(y\mid x\right)\cdot P\left(x\right)

(3)

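We can use Markov chains to model how a signal is corrupted as it passes through a series of noisy channels. For example, if $X$ is a binary signal, each bit may be flipped with some probability $p$ to produce $Y$, and $Y$ may be corrupted again to produce $Z$. Because the second channel sees only $Y$, the three variables satisfy $X\rightarrow Y\rightarrow Z$.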
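A minimal sketch of this example, assuming two cascaded binary symmetric channels with illustrative crossover probabilities $p_1=0.1$ and $p_2=0.2$ and a uniform input. These values and the helper names `bsc` and `cascade_joint` are hypothetical, not from the text. The code builds the joint pmf directly from the factorization (3) and computes the probability that $Z$ differs from $X$.

```python
from itertools import product


def bsc(p):
    """Transition probabilities P(out = b | in = a) of a binary symmetric channel
    that flips its input bit with probability p (hypothetical helper)."""
    return {(a, b): (p if a != b else 1 - p) for a, b in product((0, 1), repeat=2)}


def cascade_joint(px, p1, p2):
    """Joint pmf P(x, y, z) = P(z | y) * P(y | x) * P(x) for X -> Y -> Z,
    where Y is X sent through one BSC and Z is Y sent through a second BSC."""
    ch1, ch2 = bsc(p1), bsc(p2)
    return {
        (x, y, z): ch2[(y, z)] * ch1[(x, y)] * px[x]
        for x, y, z in product((0, 1), repeat=3)
    }


# Illustrative parameters (assumed): uniform input, crossover 0.1 then 0.2.
joint = cascade_joint({0: 0.5, 1: 0.5}, p1=0.1, p2=0.2)

# Z != X exactly when one channel flips the bit and the other does not.
p_err = sum(prob for (x, _, z), prob in joint.items() if x != z)
print(round(p_err, 4))  # 0.26
```

The result agrees with the closed form $p_1\left(1-p_2\right)+\left(1-p_1\right)p_2=0.08+0.18=0.26$.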

Consider the conditional joint probability $P\left(x,z\mid y\right)$. By the definition of conditional probability, we can express this as:

P\left(x,z\mid y\right)={\frac {P\left(x,y,z\right)}{P\left(y\right)}}

(4)

If $X\rightarrow Y\rightarrow Z$, substituting the factorization (3) gives:

P\left(x,z\mid y\right)={\frac {P\left(z\mid y\right)\cdot P\left(y\mid x\right)\cdot P\left(x\right)}{P\left(y\right)}}

(5)

Since $P\left(y,x\right)=P\left(y\mid x\right)\cdot P\left(x\right)=P\left(x\mid y\right)\cdot P\left(y\right)$, we can write:

P\left(x,z\mid y\right)={\frac {P\left(z\mid y\right)\cdot P\left(y,x\right)}{P\left(y\right)}}=P\left(z\mid y\right)\cdot P\left(x\mid y\right)

(6)

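Thus, $X$ and $Z$ are conditionally independent given $Y$. If we think of $X$ as a past event and $Z$ as a future event, then the past and the future are conditionally independent once the present event $Y$ is known. Conversely, if $P\left(x,z\mid y\right)=P\left(x\mid y\right)\cdot P\left(z\mid y\right)$ for all $x,y,z$, then multiplying both sides by $P\left(y\right)$ and using $P\left(x\mid y\right)\cdot P\left(y\right)=P\left(y\mid x\right)\cdot P\left(x\right)$ recovers the factorization (3). This property is therefore an equivalent definition of Markovity, as well as a useful tool for checking it.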
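The conditional-independence test can be sketched in a few lines. This is a minimal illustration, not a method from the text: the function names `marginal`, `is_markov`, and `flip`, the channel parameters, and the XOR counterexample are all assumptions chosen for demonstration. To avoid dividing by $P\left(y\right)$, the check multiplies both sides of (6) by $P\left(y\right)^2$.

```python
from itertools import product


def marginal(joint, keep):
    """Sum a pmf over (x, y, z) tuples down to the coordinates listed in `keep`."""
    out = {}
    for outcome, prob in joint.items():
        key = tuple(outcome[i] for i in keep)
        out[key] = out.get(key, 0.0) + prob
    return out


def is_markov(joint, tol=1e-12):
    """Return True if P(x, z | y) == P(x | y) * P(z | y) for all x, y, z.

    Multiplying both sides by P(y)**2 gives the division-free test
    P(x, y, z) * P(y) == P(x, y) * P(y, z), which also holds trivially when P(y) = 0.
    """
    xs = {o[0] for o in joint}
    ys = {o[1] for o in joint}
    zs = {o[2] for o in joint}
    py, pxy, pyz = marginal(joint, (1,)), marginal(joint, (0, 1)), marginal(joint, (1, 2))
    for x, y, z in product(xs, ys, zs):
        lhs = joint.get((x, y, z), 0.0) * py.get((y,), 0.0)
        rhs = pxy.get((x, y), 0.0) * pyz.get((y, z), 0.0)
        if abs(lhs - rhs) > tol:
            return False
    return True


def flip(a, b, p):
    """P(out = b | in = a) for a channel that flips a bit with probability p."""
    return p if a != b else 1 - p


# Cascaded noisy channels (assumed parameters): X -> Y -> Z holds.
markov = {(x, y, z): 0.5 * flip(x, y, 0.1) * flip(y, z, 0.2)
          for x, y, z in product((0, 1), repeat=3)}

# Z = X XOR Y with X, Y independent fair bits: Z depends on X directly, so not Markov.
xor = {(x, y, x ^ y): 0.25 for x, y in product((0, 1), repeat=2)}

print(is_markov(markov))  # True
print(is_markov(xor))     # False
```

In the XOR case, knowing $Y$ makes $Z$ a deterministic function of $X$. As a result, $P\left(x,z\mid y\right)$ is either $0.5$ or $0$, while $P\left(x\mid y\right)\cdot P\left(z\mid y\right)=0.25$.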

We can rewrite the joint probability $P\left(x,y,z\right)$ as:

{\begin{aligned}P\left(x,y,z\right)&=P\left(z\mid y\right)\cdot P\left(y\mid x\right)\cdot P\left(x\right)\\&={\frac {P\left(z,y\right)}{P\left(y\right)}}\cdot P\left(y,x\right)\\&={\frac {P\left(z,y\right)}{P\left(y\right)}}\cdot P\left(x\mid y\right)\cdot P\left(y\right)\\&=P\left(z,y\right)\cdot P\left(x\mid y\right)\\&=P\left(x\mid y\right)\cdot P\left(y\mid z\right)\cdot P\left(z\right)\\\end{aligned}}

(7)

Therefore, if $X\rightarrow Y\rightarrow Z$, then it follows that $Z\rightarrow Y\rightarrow X$.
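As a numerical sanity check of (7), the sketch below builds a forward chain and confirms that the backward factorization $P\left(x\mid y\right)\cdot P\left(y\mid z\right)\cdot P\left(z\right)$ reproduces every joint probability. The non-uniform input, the channel parameters, and the helper names are illustrative assumptions only.

```python
from itertools import product


def flip(a, b, p):
    """P(out = b | in = a) for a channel that flips a bit with probability p."""
    return p if a != b else 1 - p


def marginal(joint, keep):
    """Sum a pmf over (x, y, z) tuples down to the coordinates listed in `keep`."""
    out = {}
    for outcome, prob in joint.items():
        key = tuple(outcome[i] for i in keep)
        out[key] = out.get(key, 0.0) + prob
    return out


# Forward chain X -> Y -> Z with a non-uniform input (illustrative values only).
px = {0: 0.7, 1: 0.3}
joint = {(x, y, z): px[x] * flip(x, y, 0.1) * flip(y, z, 0.2)
         for x, y, z in product((0, 1), repeat=3)}

py, pz = marginal(joint, (1,)), marginal(joint, (2,))
pxy, pyz = marginal(joint, (0, 1)), marginal(joint, (1, 2))


def backward(x, y, z):
    """Last line of (7): P(x | y) * P(y | z) * P(z)."""
    p_x_given_y = pxy[(x, y)] / py[(y,)]
    p_y_given_z = pyz[(y, z)] / pz[(z,)]
    return p_x_given_y * p_y_given_z * pz[(z,)]


max_gap = max(abs(prob - backward(*outcome)) for outcome, prob in joint.items())
print(max_gap < 1e-12)  # True
```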

Chain Rules

As the number of random variables we are dealing with increases, it is important to understand how entropy and mutual information decompose into terms that involve one variable at a time, conditioned on the others.

Entropy Chain Rule

We have previously shown that for two random variables $X$ and $Y$:

{\begin{aligned}H\left(X,Y\right)&=H\left(X\right)+H\left(Y\mid X\right)\\&=H\left(Y\right)+H\left(X\mid Y\right)\\&=H\left(Y,X\right)\end{aligned}}

(8)

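For three random variables $X$, $Y$, and $Z$, applying the two-variable rule twice (first to $X$ and the pair $\left(Y,Z\right)$, then to $Y$ and $Z$ conditioned on $X$) gives: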

{\begin{aligned}H\left(X,Y,Z\right)&=H\left(X\right)+H\left(Y,Z\mid X\right)\\&=H\left(X\right)+H\left(Y\mid X\right)+H\left(Z\mid Y,X\right)\\\end{aligned}}

(9)
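The three-variable chain rule holds for any joint distribution, not only for Markov chains. The sketch below checks it numerically on an arbitrary pmf over three bits, with weights $1,\dots,8$ normalized by $36$. These values are assumed for illustration. The conditional entropies are computed directly from conditional probabilities rather than as differences of joint entropies, so that the check is not circular.

```python
from itertools import product
from math import log2

# Arbitrary pmf over three bits (illustrative values, not from the text): weights 1..8 / 36.
outcomes = list(product((0, 1), repeat=3))
joint = {o: (i + 1) / 36 for i, o in enumerate(outcomes)}


def marginal(pmf, keep):
    """Sum a pmf over tuples down to the coordinates listed in `keep`."""
    out = {}
    for outcome, prob in pmf.items():
        key = tuple(outcome[i] for i in keep)
        out[key] = out.get(key, 0.0) + prob
    return out


def entropy(pmf):
    """Shannon entropy in bits."""
    return -sum(p * log2(p) for p in pmf.values() if p > 0)


px = marginal(joint, (0,))
pxy = marginal(joint, (0, 1))

# H(Y | X) = -sum P(x, y) log2 P(y | x);  H(Z | X, Y) = -sum P(x, y, z) log2 P(z | x, y).
h_y_given_x = -sum(p * log2(p / px[(x,)]) for (x, y), p in pxy.items() if p > 0)
h_z_given_xy = -sum(p * log2(p / pxy[(x, y)]) for (x, y, z), p in joint.items() if p > 0)

lhs = entropy(joint)
rhs = entropy(px) + h_y_given_x + h_z_given_xy
print(abs(lhs - rhs) < 1e-12)  # True
```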

Conditional Mutual Information

Conditional mutual information $I\left(X;Y\mid Z\right)$ is defined as the expected value, taken over $Z$, of the mutual information between $X$ and $Y$ given the value of $Z$.
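Applying this definition with the roles of $Y$ and $Z$ swapped gives $I\left(X;Z\mid Y\right)=\sum_{y}P\left(y\right)\cdot I\left(X;Z\mid Y=y\right)$, which is zero exactly when $X$ and $Z$ are conditionally independent given $Y$, as in (6). The sketch below computes this quantity for a cascade of noisy channels and for an XOR counterexample. It is a minimal illustration under assumed parameters, and the function name `cond_mutual_info` and both example distributions are hypothetical.

```python
from itertools import product
from math import log2


def cond_mutual_info(joint):
    """I(X; Z | Y) in bits for a pmf given as {(x, y, z): probability}.

    Computes sum over (x, y, z) of P(x, y, z) * log2[P(x, z | y) / (P(x | y) P(z | y))],
    which equals the P(y)-weighted average of I(X; Z | Y = y).
    """
    py, pxy, pyz = {}, {}, {}
    for (x, y, z), p in joint.items():
        py[y] = py.get(y, 0.0) + p
        pxy[(x, y)] = pxy.get((x, y), 0.0) + p
        pyz[(y, z)] = pyz.get((y, z), 0.0) + p
    total = 0.0
    for (x, y, z), p in joint.items():
        if p > 0:
            # P(x,z|y) / (P(x|y) P(z|y)) simplifies to P(x,y,z) P(y) / (P(x,y) P(y,z)).
            total += p * log2(p * py[y] / (pxy[(x, y)] * pyz[(y, z)]))
    return total


def flip(a, b, p):
    """P(out = b | in = a) for a channel that flips a bit with probability p."""
    return p if a != b else 1 - p


# Markov chain X -> Y -> Z through two noisy channels (assumed parameters).
markov = {(x, y, z): 0.5 * flip(x, y, 0.1) * flip(y, z, 0.2)
          for x, y, z in product((0, 1), repeat=3)}
# Z = X XOR Y with X, Y independent fair bits: not a Markov chain in this order.
xor = {(x, y, x ^ y): 0.25 for x, y in product((0, 1), repeat=2)}

print(abs(cond_mutual_info(markov)) < 1e-12)  # True
print(cond_mutual_info(xor))                  # 1.0
```

In the XOR case, once $Y$ is known, $Z$ reveals $X$ completely, so each $I\left(X;Z\mid Y=y\right)$ equals one bit.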

The Data Processing Inequality

