Essentially, the Asymptotic Equipartition Property tell us that "almost all long sequences of fixed length are almost equally likely." The AEP is an important requirement for the validity of Shannon's noisy channel coding theorem.
Typical sets
We make our notion of "almost all" and "almost equal" precise using a tolerance parameter
. For every
, define the
typical set
be the set of all
-sequences from
whose empirical entropies are less than
away from the
:
where we assume that the elements of
are drawn as i.i.d. samples of
in which case
can be expanded as the product of the probabilities of all of its entries. Observe that the i.i.d. assumption implies that only the relative frequency of entries matter.
Numerical example
Consider the following binary sequence of length
where
:
That sequence consists of 11 ones and nine zeros, which corresponds to the probability
. This probability will not change if we permute the ones and zeros within the sequence. To check if the sequence we have is
-typical, we first have to set a value for
, say
.
Hence, the sequence we have is not
-typical for
.
Special case: binary sequences
Let us now try to characterize typical sets for binary sequences. Let
be a binary random variable over the alphabet
such that
. Let us fix
(the sequence length) and
(the typicality threshold). If we drawn an
-sequence
, we will get
ones and
zeros, which corresponds to the probability
.
Now define
, the relative frequency of ones in the sequence so that
.
Using the definition of typical sets and
, the sequence
is
-typical if and only if
Isolating
,
The above inequality is a fairly satisfying result in that the relative frequencies of zeros and ones in the sequence
should be close to the distribution. The parameter
places a well-defined upper bound on the deviation
, beyond which the sequence is no longer
-typical.
Checkpoint: Use the definition to show that if
is a equiprobable over any binary alphabet, then any
-sequence is
-typical for all
.
Checkpoint: Use the derived inequality in this section to show that there are no
-typical sequences for
and
if the sequence is drawn i.i.d from
where
.
Just how large are these typical sets? Let us fix an
, and count
-sequences drawn from a distribution
.
Behavior for large 
In the previous section, you were asked to show that there are no
-typical sequences for
and
when the sequence is drawn from the probability distribution
over a binary alphabet. Although the proof was given as an exercise, we can see (informally) that it is hard to get close to the true probabilities for relatively small
. This phenomenon becomes more apparent for shorter sequences, say
. If you draw only five random bits, the relative frequencies that you can get are very limited:
.
In information theory, fascinating properties emerge if we consider very long sequences. Indeed, for large
, we get a much richer set of relative frequencies that makes it easier to approach the true probabilities of the underlying distribution. Let us still fix
and draw bits from the distribution
, but now let us inspect what happens if we increase
.
|
|
50
|
0.117828
|
100
|
0.167524
|
500
|
0.364729
|
1000
|
0.497573
|
5000
|
0.866395
|
10000
|
0.966111
|
50000
|
0.999998
|
The table demonstrates that for
sufficiently large, the probability of an
-sequence being
-typical comes very close to 1. This is the Asymptotic Equipartition Property. Formally, it states that for every
, there exists a minimum sequence length
such that for all
,
.
Consequences