== Channel capacity ==
=== Motivation ===
Formally, a channel is defined as a stochastic system that takes an input random variable <math>X</math> and produces an output random variable <math>Y</math> according to a specified conditional probability distribution <math>p(Y|X)</math>. Ideally, in data transmission or storage systems, channels are noiseless, which means that <math>X</math> is completely determined by <math>Y</math> and hence <math>H(X|Y) = 0</math>. However, most channels exhibit some level of randomness, and a single output value <math>Y=y</math> may arise from more than one value of <math>X</math>. We therefore want to make the mutual information between <math>X</math> and <math>Y</math> as large as possible.

=== Example ===
Consider the following ternary-input, ternary-output channel with input and output alphabets <math>\mathcal{X} = \mathcal{Y} = \{0, 1, 2\}</math> and conditional probability <math>p(Y|X)</math>:

{| class="wikitable"
|-
|
| <math>Y=0</math>
| <math>Y=1</math>
| <math>Y=2</math>
|-
| <math>X = 0</math>
| 0.9
| 0.1
| 0
|-
| <math>X = 1</math>
| 0.1
| 0.9
| 0
|-
| <math>X = 2</math>
| 0
| 0
| 1
|}

Here, it is clear that <math>Y=0</math> may have been caused by either <math>X=0</math> or <math>X = 1</math>. Although we have no control over the channel itself (we cannot change <math>p(Y|X)</math>), we do have control over <math>p(X)</math>, the input distribution. Our goal, then, is to select an input distribution (a transmission strategy) that maximizes the mutual information between the inputs and outputs. The maximum mutual information obtained in this manner is defined as the channel capacity. For discrete memoryless channels, we define the channel capacity <math>C</math> as

<math>C = \max_{p(X)} I(X;Y)</math>.

In our example, our transmission strategy is to assign probabilities to each input symbol <math>x \in \{0, 1, 2\}</math> with complete knowledge of the channel <math>p(Y|X)</math>. (If we do not know <math>p(Y|X)</math>, we need to perform some form of ''channel estimation'', which is outside the scope of CoE 161.)

Let us say that this is your first time doing capacity calculations. You see the definition and think that maybe you can guess the right input distribution. After some thought, you come up with three candidate solutions for <math>p_X</math> (a numerical check of all three is sketched after the checkpoint below):

* Always transmit <math>X=2</math>.
* Transmit <math>X=0</math> or <math>X=1</math> with 50-50 probability.
* Transmit all symbols with equal probability, <math>p(x) = 1/3</math> for all <math>x</math>.

{{Note|Checkpoint: Without calculating <math>I(X;Y)</math>, argue that the first two transmission schemes are suboptimal.}}
== Channel coding ==
In noisy channels, <math>H(X|Y) > 0</math>, and so the mutual information <math>I(X;Y)</math> is strictly less than the source entropy <math>H(X)</math>. Even if we know the capacity-achieving input distribution, it is always possible to make mistakes if we inspect the channel outputs on a per-symbol basis. In this section, we introduce channel coding and see that by adding redundancy, we can control the probability of error.

=== Example: repetition code ===
A simple form of error control is repetition. In this scheme, we transmit the same symbol multiple times and decode the received sequence by taking a majority vote. For example, to transmit the symbol 'a', we can use the channel 10 times, which might result in the received sequence 'axaaaasdas', containing six a's, one x, two s's, and one d. The receiver then decides that the transmitted symbol is 'a'.

Is it possible to make mistakes even with repetition? Yes! To see this, let us go back to the binary case and transmit binary symbols over a binary symmetric channel (BSC) with crossover probability <math>0 < p_e < 1/2</math>. Let <math>n</math> be the number of channel uses, and for simplicity, let us assume that <math>n</math> is odd so that there is no possibility of a tie in deciding between 0 and 1 as the transmitted symbol. The receiver makes a decision error if and only if the channel flips more than <math>(n-1)/2</math> out of <math>n</math> bits. The probability of this event is given by

<math>P(\hat{X}\ne X) = \sum_{i=\frac{n+1}{2}}^{n} \binom{n}{i} p_e^{i}(1 - p_e)^{n-i}</math>.

Fortunately, we do not need to evaluate this sum to see that the receiver will decode correctly with high probability. By the asymptotic equipartition property (AEP), for sufficiently large <math>n</math>, the received sequence will be a typical <math>n</math>-sequence with high probability. For the BSC, a typical output sequence contains roughly <math>np_e</math> errors. Since <math>p_e < 1/2</math>, there are fewer erroneous bits than correct bits, so the majority-rule decision will make the right choice.
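
As a quick numerical check of the expression above, the sketch below evaluates <math>P(\hat{X}\ne X)</math> for a few odd blocklengths. The crossover probability <math>p_e = 0.1</math> is only an illustrative choice, not a value fixed anywhere in this article.

<syntaxhighlight lang="python">
from math import comb

def repetition_error_prob(n, pe):
    """Probability that a majority vote over n independent uses of a BSC(pe) is wrong (n odd)."""
    return sum(comb(n, i) * pe**i * (1 - pe)**(n - i) for i in range((n + 1) // 2, n + 1))

pe = 0.1  # illustrative crossover probability
for n in (1, 3, 5, 11, 21):
    print(f"n = {n:2d}: P(error) = {repetition_error_prob(n, pe):.6f}")
</syntaxhighlight>

The error probability decreases rapidly with <math>n</math>, but the price is rate: an <math>n</math>-fold repetition code delivers only <math>1/n</math> information bits per channel use. The channel coding theorem below shows that we can drive the error probability down without letting the rate go to zero.
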
== Channel coding theorem ==
=== The channel capacity is achievable ===
To show that the channel capacity is achievable, we will demonstrate a transmission scheme that is guaranteed to have arbitrarily low error probability for sufficiently large <math>n</math>. The scheme is chosen primarily as a mathematical tool, not as a guide for practical systems. The outline of the proof is as follows:

* We describe a transmission scheme which uses the input distribution <math>p(X)</math> to generate a random codebook.
* We describe a decoding scheme based on joint typicality.
* We show that a sufficiently large <math>n</math> can be chosen to satisfy any desired upper bound on the error probability.
* We conclude that any rate up to <math>\max_{p(X)} I(X;Y)</math> is achievable.

The receiver makes an error only if at least one of the following events occurs:
* The received sequence is not jointly typical with the transmitted codeword.
* Some other codeword in the codebook is jointly typical with the received sequence.

By the AEP, the first event has probability at most <math>\epsilon</math> for sufficiently large <math>n</math>. For the second event, the probability that any particular wrong codeword, drawn independently from <math>p(X)</math>, is jointly typical with the received sequence is approximately <math>2^{-nI(X;Y)}</math>. Taking the union bound over the <math>2^{nR}-1</math> wrong codewords, the probability of the second event is at most about <math>2^{-n(I(X;Y)-R)}</math>, which can be made arbitrarily small for large <math>n</math> as long as the rate satisfies <math>R < I(X;Y)</math>.

=== We cannot do better than the channel capacity ===
Because of the inherent stochasticity of the channel, each transmitted codeword <math>X^n</math> is received as a "diffuse" set of possible sequences <math>Y^n</math>. As we increase the blocklength, the AEP guarantees that these diffuse sets "harden," i.e., they contain only typical sequences with high probability. This geometric intuition makes it easy to see why we cannot use every sequence in <math>\mathcal{X}^n</math> as a codeword: if we use two similar sequences <math>x_1^n</math> and <math>x_2^n</math>, their typical sets at the receiver may overlap, and error-free decoding is not possible. Conversely, if we can collect codewords that are sufficiently different from one another, then their output typical sets will not overlap with very high probability.

How many typical sets can we "pack" inside <math>\mathcal{Y}^n</math>? First, we observe that there are only about <math>2^{nH(Y)}</math> typical <math>n</math>-sequences in the output space. Second, for a randomly chosen codeword <math>X^n</math>, there are about <math>2^{nH(Y|X)}</math> likely received sequences. The maximum number of codewords that we can pick from <math>\mathcal{X}^n</math> is therefore

<math>\frac{2^{nH(Y)}}{2^{nH(Y|X)}} = 2^{n(H(Y) - H(Y|X))} = 2^{nI(X;Y)}</math>.

The best we can do for error-free transmission is therefore to send <math>nI(X;Y)</math> information bits over <math>n</math> channel uses. Going beyond this limit forces the codeword "spheres" in <math>\mathcal{Y}^n</math> to overlap and thus results in an error probability that is bounded away from zero. This is true for any choice of <math>p(X)</math>, so we have the following upper bound on the achievable rate:

<math>R \le \max_{p(X)} I(X;Y)</math>.

Together with the achievability result, we can now say that for sufficiently large <math>n</math>, the maximum rate for reliable transmission over a channel <math>p(Y|X)</math> is <math>\max_{p(X)} I(X;Y)</math>.
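
The quantity <math>\max_{p(X)} I(X;Y)</math> can be computed numerically for any discrete memoryless channel. As a sketch only, the code below uses the Blahut–Arimoto algorithm (not covered in this article) to approximate the capacity of the ternary channel from the motivation section; the function name and iteration count are arbitrary choices made here for illustration.

<syntaxhighlight lang="python">
import numpy as np

def blahut_arimoto(P, iters=500):
    """Approximate C = max_{p(X)} I(X;Y) for a channel matrix P whose rows are p(y|x)."""
    nx, ny = P.shape
    px = np.full(nx, 1.0 / nx)               # start from the uniform input distribution
    for _ in range(iters):
        py = px @ P                           # current output marginal q(y)
        # D( p(y|x) || q(y) ) for each x, in nats; terms with p(y|x) = 0 contribute nothing
        kl = np.array([sum(P[x, y] * np.log(P[x, y] / py[y])
                           for y in range(ny) if P[x, y] > 0) for x in range(nx)])
        px = px * np.exp(kl)                  # multiplicative update
        px /= px.sum()                        # renormalize
    py = px @ P
    capacity = sum(px[x] * P[x, y] * np.log2(P[x, y] / py[y])
                   for x in range(nx) for y in range(ny) if P[x, y] > 0)
    return capacity, px

# Transition matrix of the ternary example channel from the "Channel capacity" section
P = np.array([[0.9, 0.1, 0.0],
              [0.1, 0.9, 0.0],
              [0.0, 0.0, 1.0]])
C, px_star = blahut_arimoto(P)
print(f"C is approximately {C:.4f} bits per channel use at p(X) = {np.round(px_star, 4)}")
</syntaxhighlight>

For this channel, the maximization gives roughly 1.29 bits per channel use, with the optimal input distribution placing more weight on the noiseless symbol <math>X=2</math> than on either of the other two symbols.
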
== Some examples ==
=== Binary symmetric channel ===
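
For the binary symmetric channel with crossover probability <math>p_e</math>, the symmetry of the channel means that the uniform input distribution <math>p(x) = 1/2</math> maximizes <math>I(X;Y) = H(Y) - H(Y|X)</math>, giving

<math>C = 1 - H_b(p_e)</math>,

where <math>H_b(p) = -p\log_2 p - (1-p)\log_2 (1-p)</math> is the binary entropy function. For example, with <math>p_e = 0.1</math>, the capacity is about 0.531 bits per channel use.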
+ | |||
+ | === Binary erasure channel === |
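
For the binary erasure channel, each transmitted bit is either received correctly with probability <math>1-\alpha</math> or replaced by an erasure symbol with probability <math>\alpha</math>. The uniform input distribution is again optimal, and the capacity is

<math>C = 1 - \alpha</math>,

which matches the intuition that, on average, a fraction <math>\alpha</math> of the channel uses carry no information about the input.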