Difference between revisions of "Jointly typical sequences"

From Microlab Classes
Jump to navigation Jump to search
(Created page with "== Definition == == Illustration == == Example: Binary symmetric channel ==")
 
Line 1: Line 1:
== Definition ==
 
  
== Illustration ==
 
  
== Example: Binary symmetric channel ==
+
==Definition==
 +
Let <math>(x^n,y^n)</math> be an <math>n</math>-sequence of pairs drawn from the set <math>\mathcal{X} \times \mathcal{Y}</math>. For a fixed <math>\epsilon > 0</math>, <math>(x^n, y^n)</math> is said to be jointly typical with respect to random variables <math>X</math> and <math>Y</math> if the three conditions below are all satisfied:
 +
* <math>|-\frac{1}{n} \log p(x^n) - H(X)| < \epsilon </math>
 +
* <math>|-\frac{1}{n} \log p(y^n) - H(Y)| < \epsilon </math>
 +
* <math>|-\frac{1}{n} \log p(x^n,y^n) - H(X,Y)| < \epsilon </math>
 +
 
 +
From the definition, we see that the notion of joint typicality is stronger than the previous formulation. If we only needed the sequence of pairs <math>(x_i, y_i)</math> for <math> i = 1, 2, \cdots, n</math>, we could simply invoke the Asymptotic Equipartition Property with the random variable <math>X</math> replaced by <math>(X,Y)</math>. The first two properties further constrain the component sequences <math>(x_1, x_2, \cdots, x_n)</math> and <math>(y_1, y_2, ..., y_n)</math> beyond the earlier requirement of AEP. Altogether, these three properties guarantee us that sufficiently long sequences behave near their true joint and marginal distributions with very high probability.
 +
 
 +
== Example: binary symmetric channel ==
 +
For our example, we consider the sequence of input-output pairs <math>(x_i, y_i)</math> of a binary symmetric channel (BSC). A BSC takes in inputs from the input alphabet <math>\mathcal{X} = \{0,1\}</math> and produces outputs which also belong to the same alphabet <math>\mathcal{Y} = \{0, 1\}</math>. If we write the joint distribution as <math>p(X,Y) = p(X) p(Y|X)</math>, we can generate the pairs by first generating a value <math>x_i</math> using <math>p(X)</math>, and then drawing a value <math>y_i</math> using the conditional distribution <math>p(Y|X=x_i)</math>. For the BSC, <math>p(Y|X) = p_e</math> if <math>Y\ne X</math> and <math>p(Y|X) = 1 - p_e </math> if <math>Y=X</math>, where <math>0 \le p_e \le 1/2</math> is called the crossover probability.
 +
 
 +
In our calculation, the input <math>X</math> is assumed to be distributed such that <math>P(X=1) = 2/3</math>, and the crossover probability is set to <math>p_e = 1/10</math>. The typicality threshold is set at <math>\epsilon = 0.01</math>. From <math>p(X)</math> and <math>p(Y|X)</math>, we obtain the following joint distribution:
 +
 
 +
{| class="wikitable"
 +
|-
 +
! scope="col"|
 +
! scope="col"| <math>Y=0</math>
 +
! scope="col"| <math>Y=1</math>
 +
|-
 +
! scope="row"| <math>X=0</math>
 +
| 3/10
 +
| 1/30
 +
|-
 +
! scope="row"| <math>X=1</math>
 +
| 1/15
 +
| 3/5
 +
|-
 +
|}
 +
 
 +
The table below shows the probability of producing a jointly typical sequence as the blocklength increases.
 +
{| class="wikitable"
 +
|-
 +
! scope="col"| <math>n</math>
 +
! scope="col"| <math>\mathbb{P}\left(X^n \in A_\epsilon^{(n)}\right)</math>
 +
|-
 +
| 50
 +
| 0
 +
|-
 +
| 100
 +
| 0.003050
 +
|-
 +
| 500
 +
| 0.155243
 +
|-
 +
| 1000
 +
| 0.317690
 +
|-
 +
| 5000
 +
| 0.703099
 +
|-
 +
| 10000
 +
| 0.818271
 +
|-
 +
| 50000
 +
| 0.982649
 +
|-
 +
|}
 +
 
 +
To produce the values in the table, we enumerate the relatively frequencies of pairs that will result in a typical sequence for a fixed <math>n</math>. In our example, you can think of this a "profile" consisting of four numbers, <math>(k_{00}, k_{01}, k_{10}, k_{11})</math>, where <math>k_{ij}</math> is the number of <math>(i,j)</math> pairs in the sequence for <math>i,j\in\{0,1\}</math>. At <math>n=100</math>, there is only one such profile: <math>(k_{00}, k_{01}, k_{10}, k_{11}) = (30, 3, 7, 60)</math>. Using the multinomial theorem, we can calculate the probability of this profile as
 +
 
 +
<math>\frac{100!}{30!3!7!60!}(3/10)^{30} (1/30)^3 (1/15)^7 (3/5)^{60} \approx 0.003050</math>
 +
 
 +
For larger <math>n</math>, the number of profiles increases and concentrates the total probability weight near the underlying joint distribution <math>p(X,Y)</math>.

Revision as of 08:59, 20 April 2021


Definition

Let be an -sequence of pairs drawn from the set . For a fixed , is said to be jointly typical with respect to random variables and if the three conditions below are all satisfied:

From the definition, we see that the notion of joint typicality is stronger than the previous formulation. If we only needed the sequence of pairs for , we could simply invoke the Asymptotic Equipartition Property with the random variable replaced by . The first two properties further constrain the component sequences and beyond the earlier requirement of AEP. Altogether, these three properties guarantee us that sufficiently long sequences behave near their true joint and marginal distributions with very high probability.

Example: binary symmetric channel

For our example, we consider the sequence of input-output pairs of a binary symmetric channel (BSC). A BSC takes in inputs from the input alphabet and produces outputs which also belong to the same alphabet . If we write the joint distribution as , we can generate the pairs by first generating a value using , and then drawing a value using the conditional distribution . For the BSC, if and if , where is called the crossover probability.

In our calculation, the input is assumed to be distributed such that , and the crossover probability is set to . The typicality threshold is set at . From and , we obtain the following joint distribution:

3/10 1/30
1/15 3/5

The table below shows the probability of producing a jointly typical sequence as the blocklength increases.

50 0
100 0.003050
500 0.155243
1000 0.317690
5000 0.703099
10000 0.818271
50000 0.982649

To produce the values in the table, we enumerate the relatively frequencies of pairs that will result in a typical sequence for a fixed . In our example, you can think of this a "profile" consisting of four numbers, , where is the number of pairs in the sequence for . At , there is only one such profile: . Using the multinomial theorem, we can calculate the probability of this profile as

For larger , the number of profiles increases and concentrates the total probability weight near the underlying joint distribution .