In this article, we discuss an extension of typical sets to jointly distributed random variables.
Definition
Let
be an
-sequence of pairs drawn from the set
. For a fixed
,
is said to be jointly typical with respect to random variables
and
if the three conditions below are all satisfied:



From the definition, we see that the notion of joint typicality is stronger than the previous formulation. If we only needed the sequence of pairs
for
, we could simply invoke the Asymptotic Equipartition Property with the random variable
replaced by
. The first two properties further constrain the component sequences
and
beyond the earlier requirement of AEP. Altogether, these three properties guarantee us that sufficiently long sequences behave near their true joint and marginal distributions with very high probability.
Example: binary symmetric channel
For our example, we consider the sequence of input-output pairs
of a binary symmetric channel (BSC). A BSC takes in inputs from the input alphabet
and produces outputs which also belong to the same alphabet
. If we write the joint distribution as
, we can generate the pairs by first generating a value
using
, and then drawing a value
using the conditional distribution
. For the BSC,
if
and
if
, where
is called the crossover probability.
In our calculation, the input
is assumed to be distributed such that
, and the crossover probability is set to
. The typicality threshold is set at
. From
and
, we obtain the following joint distribution:
|
|
|
|
3/10
|
1/30
|
|
1/15
|
3/5
|
The table below shows the probability of producing a jointly typical sequence as the blocklength increases.
|
|
50
|
0
|
100
|
0.003050
|
500
|
0.155243
|
1000
|
0.317690
|
5000
|
0.703099
|
10000
|
0.818271
|
50000
|
0.982649
|
To produce the values in the table, we enumerate the relatively frequencies of pairs that will result in a typical sequence for a fixed
. In our example, you can think of this a "profile" consisting of four numbers,
, where
is the number of
pairs in the sequence for
. At
, there is only one such profile:
. Using the multinomial theorem, we can calculate the probability of this profile as
For larger
, the number of profiles increases and concentrates the total probability weight near the underlying joint distribution
.
Important Properties