Asymptotic Equipartition Property

Essentially, the Asymptotic Equipartition Property (AEP) tells us that "almost all long sequences of fixed length are almost equally likely." The AEP is a key ingredient in the proof of Shannon's noisy channel coding theorem.

Typical sets

We make the notions of "almost all" and "almost equally likely" precise using a tolerance parameter $\epsilon$. For every $\epsilon >0$, define the $\epsilon$-typical set $A_{\epsilon }^{(n)}$ to be the set of all $n$-sequences from ${\mathcal {X}}^{n}$ whose empirical entropies are less than $\epsilon$ away from the entropy $H(X)$:

$A_{\epsilon }^{(n)}=\left\{x^{n}\in {\mathcal {X}}^{n}:\left|-{\frac {1}{n}}\log p(x^{n})-H(X)\right|<\epsilon \right\},$

where we assume that the elements of $x^{n}$ are drawn as i.i.d. samples of $X$, in which case $p(x^{n})$ factors into the product of the probabilities of its entries, $p(x^{n})=\prod _{i=1}^{n}p(x_{i})$. Observe that, under the i.i.d. assumption, $p(x^{n})$ depends only on how often each symbol occurs in the sequence, not on the order of the entries.
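The definition can be turned into a membership test almost line by line. The following is a minimal sketch, assuming logarithms to base 2 and a distribution given as a dictionary from symbols to probabilities; the function names `empirical_entropy`, `entropy`, and `is_typical` are illustrative and not part of the original text.

```python
import math

def empirical_entropy(seq, pmf):
    """Return -(1/n) * log2 p(x^n) for an i.i.d. sequence, in bits per symbol."""
    return -sum(math.log2(pmf[s]) for s in seq) / len(seq)

def entropy(pmf):
    """Return the Shannon entropy H(X) in bits."""
    return -sum(q * math.log2(q) for q in pmf.values() if q > 0)

def is_typical(seq, pmf, eps):
    """Check whether seq lies in the eps-typical set for the given pmf."""
    return abs(empirical_entropy(seq, pmf) - entropy(pmf)) < eps

# Hypothetical usage: a biased bit with P(X=1) = 2/3.
pmf = {0: 1 / 3, 1: 2 / 3}
print(is_typical([1, 1, 0], pmf, eps=0.1))  # relative frequencies match the pmf
print(is_typical([0, 0, 0], pmf, eps=0.1))  # all zeros is far from the pmf
```

Because the empirical entropy is a sum over the entries, reordering the sequence leaves the result unchanged up to floating-point rounding.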

Numerical example

Consider the following binary sequence of length $n=20$ where $P(X=1)=2/3$ :

$x^{20}=[1,1,0,0,0,1,1,1,1,0,0,1,0,0,1,1,0,1,1,0]$

That sequence consists of 11 ones and 9 zeros, so its probability is $p(x^{20})=(2/3)^{11}(1/3)^{9}\approx 5.8736\times 10^{-7}$. This probability does not change if we permute the ones and zeros within the sequence. To check whether the sequence is $\epsilon$-typical, we first have to set a value for $\epsilon$, say $\epsilon =0.01$. Taking logarithms to base 2, so that $H(X)\approx 0.9183$ bits, we get

$\left|-{\frac {1}{n}}\log p(x^{n})-H(X)\right|=\left|-{\frac {1}{20}}\log(5.8736\times 10^{-7})-0.9183\right|=0.1167>0.01$

Hence, the sequence we have is not $\epsilon$ -typical for $\epsilon =0.01$ .
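The arithmetic of this example can be recomputed in a few lines. This is a sketch under the assumption that logarithms are taken to base 2; the variable names are illustrative, and `fractions.Fraction` is used only to keep the sequence probability exact before converting to a float.

```python
from fractions import Fraction
import math

x = [1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0]
n = len(x)
ones = sum(x)        # number of ones in the sequence
zeros = n - ones     # number of zeros in the sequence

# Exact probability of the sequence under P(X=1) = 2/3.
p_seq = Fraction(2, 3) ** ones * Fraction(1, 3) ** zeros

# Empirical entropy and true entropy, both in bits.
emp = -math.log2(float(p_seq)) / n
H = -(2 / 3) * math.log2(2 / 3) - (1 / 3) * math.log2(1 / 3)

deviation = abs(emp - H)
eps = 0.01
print(ones, zeros, float(p_seq), deviation, deviation < eps)
```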

Special case: binary sequences

Let us now try to characterize typical sets for binary sequences. Let $X$ be a binary random variable over the alphabet $\{0,1\}$ such that $P(X=1)=p$. Let us fix $n$ (the sequence length) and $\epsilon >0$ (the typicality threshold). If we draw an $n$-sequence $x^{n}$, we will get $k$ ones and $n-k$ zeros, which corresponds to the probability $p(x^{n})=p^{k}(1-p)^{n-k}$.

Now define ${\hat {p}}=k/n$ , the relative frequency of ones in the sequence so that

$-{\frac {1}{n}}\log p(x^{n})=-{\frac {1}{n}}(k\log p+(n-k)\log(1-p))=-{\frac {k}{n}}\log p-\left(1-{\frac {k}{n}}\right)\log(1-p)=-{\hat {p}}\log p-\left(1-{\hat {p}}\right)\log(1-p)$ .

Using the definition of typical sets and $H(X)=-p\log p-(1-p)\log(1-p)$ , the sequence $x^{n}$ is $\epsilon$ -typical if and only if

${\begin{aligned}\left|\left[-{\hat {p}}\log p-\left(1-{\hat {p}}\right)\log(1-p)\right]-\left[-p\log p-(1-p)\log(1-p)\right]\right|&<\epsilon \\\left|-({\hat {p}}-p)\log p+\left({\hat {p}}-p\right)\log(1-p)\right|&<\epsilon \\|{\hat {p}}-p|\left|\log {\frac {p}{1-p}}\right|&<\epsilon \end{aligned}}$

Isolating $|{\hat {p}}-p|$ (which requires $0<p<1$ and $p\neq 1/2$, so that the logarithm is finite and nonzero),

$|{\hat {p}}-p|<{\frac {\epsilon }{\left|\log {\frac {p}{1-p}}\right|}}$

The above inequality is a satisfying result: a sequence $x^{n}$ is $\epsilon$-typical exactly when the relative frequency of ones ${\hat {p}}$ (and hence of zeros) is close to the true probability $p$. The parameter $\epsilon$ sets an explicit bound on the deviation $|{\hat {p}}-p|$, at or beyond which the sequence is no longer $\epsilon$-typical.
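As a sanity check, the bound on $|{\hat {p}}-p|$ can be compared numerically with the original definition. The sketch below assumes base-2 logarithms and $p\notin \{0,1/2,1\}$; the function names `typical_counts` and `typical_counts_by_definition` are illustrative only.

```python
import math

def typical_counts(n, p, eps):
    """Counts k of ones satisfying |k/n - p| < eps / |log2(p / (1 - p))|."""
    bound = eps / abs(math.log2(p / (1 - p)))
    return [k for k in range(n + 1) if abs(k / n - p) < bound]

def typical_counts_by_definition(n, p, eps):
    """Counts k of ones whose empirical entropy is within eps of H(X)."""
    H = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
    result = []
    for k in range(n + 1):
        emp = -(k * math.log2(p) + (n - k) * math.log2(1 - p)) / n
        if abs(emp - H) < eps:
            result.append(k)
    return result

# Hypothetical usage: which counts of ones are typical for n = 100?
print(typical_counts(100, 2 / 3, 0.01))
print(typical_counts_by_definition(100, 2 / 3, 0.01))
```

Both functions should return the same list as long as no count $k$ sits exactly on the boundary, where floating-point rounding could tip the comparison either way.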

Checkpoint: Use the definition to show that if $X$ is equiprobable over a binary alphabet, then every $n$-sequence is $\epsilon$-typical for all $\epsilon >0$.

Checkpoint: Use the inequality derived in this section to show that there are no $\epsilon$-typical sequences for $n=20$ and $\epsilon =0.01$ if the sequence is drawn i.i.d. from ${\mathcal {X}}=\{0,1\}$ where $P(X=1)=2/3$.

Just how large are these typical sets? Let us fix an $\epsilon$ , and count $n$ -sequences drawn from a distribution $p(X)$ .

Behavior for large $n$

In the previous section, you were asked to show that there are no $\epsilon$ -typical sequences for $\epsilon =0.01$ and $n=20$ when the sequence is drawn from the probability distribution $[1/3,2/3]$ over a binary alphabet. Although the proof was given as an exercise, we can see (informally) that it is hard to get close to the true probabilities for relatively small $n$ . This phenomenon becomes more apparent for shorter sequences, say $n=5$ . If you draw only five random bits, the relative frequencies that you can get are very limited: ${\hat {p}}\in \{0,1/5,2/5,3/5,4/5,1\}$ .

In information theory, fascinating properties emerge if we consider very long sequences. Indeed, for large $n$ , we get a much richer set of relative frequencies that makes it easier to approach the true probabilities of the underlying distribution. Let us still fix $\epsilon =0.01$ and draw bits from the distribution $[1/3,2/3]$ , but now let us inspect what happens if we increase $n$ .

$n$	$\mathbb {P} \left(X^{n}\in A_{\epsilon }^{(n)}\right)$
50	0.117828
100	0.167524
500	0.364729
1000	0.497573
5000	0.866395
10000	0.966111
50000	0.999998

The table demonstrates that for $n$ sufficiently large, the probability of an $n$ -sequence being $\epsilon$ -typical comes very close to 1. This is the Asymptotic Equipartition Property. Formally, it states that for every $\epsilon >0$ , there exists a minimum sequence length $n_{0}$ such that for all $n>n_{0}$ , $\mathbb {P} \left(X^{n}\in A_{\epsilon }^{(n)}\right)>1-\epsilon$ .
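Probabilities of this kind can be computed by summing the binomial probabilities of all counts $k$ whose relative frequency satisfies the bound from the binary special case. The following is a sketch, not necessarily the method used to produce the table; it assumes base-2 logarithms in the typicality condition and evaluates the binomial terms in log space with `math.lgamma` to avoid overflow for large $n$. The function names are illustrative.

```python
import math

def log_binom_pmf(n, k, p):
    """Natural log of C(n, k) * p^k * (1 - p)^(n - k)."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))

def prob_typical(n, p, eps):
    """P(X^n in A_eps^(n)) for i.i.d. bits with P(X=1) = p, p not in {0, 1/2, 1}."""
    bound = eps / abs(math.log2(p / (1 - p)))
    return sum(math.exp(log_binom_pmf(n, k, p))
               for k in range(n + 1)
               if abs(k / n - p) < bound)

# Same setting as the table: eps = 0.01 and P(X=1) = 2/3.
for n in (50, 100, 500, 1000, 5000, 10000, 50000):
    print(n, round(prob_typical(n, 2 / 3, 0.01), 6))
```

The printed values can be compared against the table; small differences in the last digits are possible because of floating-point rounding.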
