Coding teaser: source coding

{{Note| Make sure to read the [[Channeling_your_inner_capacity|Wiki page about the basics of a channel]]. Several discussions and interpretations are important for this chapter 😊.  |reminder}}
 
==Complete Channel Model==
  
Previously, we showed a simple channel model that consists of the source, channel, and receiver. In modern communication systems, we also add an '''encoder''' after the source and a '''decoder''' before the receiver. Figure 1 shows the complete channel model. The encoder transforms the source symbols into a code that is suitable for the channel, while the decoder recovers the message coming out of the channel for the receiver.
  
[[File:Complete channel model.PNG|frame|center|Figure 1: Complete channel model with the encoder added after the source and the decoder added before the receiver.]]
  
Let's bring back Bob and Alice into the scene again. This time let's use the complete channel model. Since most of our communication systems use binary representations, Bob needs to encode his data into binary digits before he sends it over the channel.
[[File:Bob alice w encode.gif|center|Bob sending an encoded message to Alice.]]
Let's look at a few examples. Suppose our source alphabet <math> S </math> consists of all outcomes of flipping three fair coins. Then our symbols would be <math> \{TTT,TTH,THT,THH,HTT,HTH,HHT,HHH\} </math>. Bob needs to map each symbol into a binary representation such that <math> \{TTT,TTH,THT,THH,HTT,HTH,HHT,HHH\} \rightarrow \{000,001,010,011,100,101,110,111\}</math>. This is the '''encoding process'''. Let's add a few new definitions:
  
 
* The elements of the binary set <math> \{0,1\} </math> are the '''code symbols'''.
* The encoded patterns <math> \{000,001,010,011,100,101,110,111\} </math> are called the '''codewords'''. For example, <math> 000 </math> is the codeword for the symbol <math> TTT </math>, and <math> 101 </math> is the codeword for <math> HTH </math>. In practice, we let some random variable <math> X </math> have outcomes that are the codewords <math>\{x_1,x_2,x_3,...,x_n\}</math>, associated with probabilities <math> \{P(x_1),P(x_2),P(x_3),...,P(x_n)\} </math>. For example, <math> x_1 = 000 </math> is the codeword for <math> TTT </math>. Moreover, since in our example the source symbols are equiprobable and we have a direct mapping from source symbol to codeword (i.e., <math> s_i \rightarrow x_i </math>), it follows that the codewords are equiprobable too: <math> P(x_1)=P(x_2)=P(x_3)=P(x_4)=P(x_5)=P(x_6)=P(x_7)=P(x_8)=\frac{1}{8} </math>. The probability distribution of <math> X </math> can change depending on the encoding scheme.
* We usually tabulate the encoding or mapping. We call this table the '''codebook'''. It's a simple '''look-up table (LUT)'''. Let's call the table below codebook-1.
  
 
{| class="wikitable"
|-
! scope="col"| Source Symbol
! scope="col"| Codeword
|-
| style="text-align:center;" | TTT
| style="text-align:center;" | 000
|-
| style="text-align:center;" | TTH
| style="text-align:center;" | 001
|-
| style="text-align:center;" | THT
| style="text-align:center;" | 010
|-
| style="text-align:center;" | THH
| style="text-align:center;" | 011
|-
| style="text-align:center;" | HTT
| style="text-align:center;" | 100
|-
| style="text-align:center;" | HTH
| style="text-align:center;" | 101
|-
| style="text-align:center;" | HHT
| style="text-align:center;" | 110
|-
| style="text-align:center;" | HHH
| style="text-align:center;" | 111
|-
|}
The encoder looks at this table and encodes the message before it gets into the channel. For example, suppose we want to send the message <math> TTTHHHTHTHTH </math>. The encoder uses codebook-1 to translate the source message into the encoded message <math> 000111010101 </math>. Receivers should also know the same codebook-1 so that they can decode the message. For example, if the encoded message is <math> 101011001110 </math>, then we can decode the message as <math> HTHTHHTTHHHT </math>. Choosing the encoding where <math> T \rightarrow 0</math> and <math> H \rightarrow 1 </math> is simple. This encoding is arbitrary, and it seems convenient for our needs. However, nothing stops us from encoding the source symbols with a different representation. For example, we can choose an encoding where <math> T \rightarrow 1</math> and <math> H \rightarrow 0 </math>.
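Since a codebook is just a look-up table, the encoder and decoder are easy to sketch in software. Below is a minimal Python sketch that reproduces the two codebook-1 examples above; the names <code>codebook_1</code>, <code>encode</code>, and <code>decode</code> are our own and not part of any standard library.

<syntaxhighlight lang="python">
# Codebook-1 as a simple look-up table (LUT): source symbol -> codeword.
codebook_1 = {
    "TTT": "000", "TTH": "001", "THT": "010", "THH": "011",
    "HTT": "100", "HTH": "101", "HHT": "110", "HHH": "111",
}

def encode(symbols, codebook):
    """Replace every source symbol with its codeword and concatenate the bits."""
    return "".join(codebook[s] for s in symbols)

def decode(bits, codebook):
    """Invert the codebook; every codeword here is 3 bits, so split into chunks of 3."""
    inverse = {codeword: symbol for symbol, codeword in codebook.items()}
    return "".join(inverse[bits[i:i + 3]] for i in range(0, len(bits), 3))

message = ["TTT", "HHH", "THT", "HTH"]      # the source message TTTHHHTHTHTH
print(encode(message, codebook_1))          # 000111010101
print(decode("101011001110", codebook_1))   # HTHTHHTTHHHT
</syntaxhighlight>

Note that the decoder above relies on every codeword in codebook-1 having the same length of 3 bits. A variable-length codebook needs a decoder that finds the codeword boundaries itself, which is exactly the situation with the next example.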
We can go for more complex ones, like codebook-2 shown below:

{| class="wikitable"
|-
! scope="col"| Source Symbol
! scope="col"| Codeword
|-
| style="text-align:center;" | TTT
| style="text-align:center;" | 01
|-
| style="text-align:center;" | TTH
| style="text-align:center;" | 001
|-
| style="text-align:center;" | THT
| style="text-align:center;" | 0001
|-
| style="text-align:center;" | THH
| style="text-align:center;" | 00001
|-
| style="text-align:center;" | HTT
| style="text-align:center;" | 000001
|-
| style="text-align:center;" | HTH
| style="text-align:center;" | 0000001
|-
| style="text-align:center;" | HHT
| style="text-align:center;" | 00000001
|-
| style="text-align:center;" | HHH
| style="text-align:center;" | 000000001
|-
|}
With codebook-2, the source message <math> TTTHHHTHTHTH </math> translates into <math> 0100000000100010000001 </math>. Codebook-2 can also decode an encoded message. For example, when we receive <math> 00000010000100100000001 </math>, we get <math> HTHTHHTTHHHT </math>. The choice of the encoding process is purely arbitrary. However, we are interested in whether these encodings are efficient. One metric is the '''average code length''', defined as:
 +
 +
{{NumBlk|::|<math> L(X) = \sum_{i=1}^n P(x_i) l_i </math>|{{EquationRef|1}}}}
 +
 +
Where <math> l_i </math> is the code length of the codeword <math> x_i </math>. For example, the average code length of codebook-1 is:
 +
 +
<math> L(X) = 8 \cdot \frac{1}{8} \cdot 3 = 3</math>
 +
 +
The average code length of codebook-2 is:
 +
 +
<math> L(X) = \frac{1}{8}\cdot 2 + \frac{1}{8}\cdot 3 + \frac{1}{8}\cdot 4 + \frac{1}{8}\cdot 5 + \frac{1}{8}\cdot 6 + \frac{1}{8}\cdot 7 + \frac{1}{8}\cdot 8 + \frac{1}{8}\cdot 9 = 5.5 </math>
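As a quick sanity check of Equation 1, the short Python sketch below computes <math> L(X) </math> for both codebooks of the equiprobable three-coin source. The dictionaries and the helper <code>average_code_length</code> are our own naming, not a standard API.

<syntaxhighlight lang="python">
# Average code length L(X) = sum_i P(x_i) * l_i (Equation 1) for both codebooks.
codebook_1 = {"TTT": "000", "TTH": "001", "THT": "010", "THH": "011",
              "HTT": "100", "HTH": "101", "HHT": "110", "HHH": "111"}
codebook_2 = {"TTT": "01", "TTH": "001", "THT": "0001", "THH": "00001",
              "HTT": "000001", "HTH": "0000001", "HHT": "00000001", "HHH": "000000001"}

def average_code_length(codebook):
    p = 1 / len(codebook)                        # the 8 coin-flip symbols are equiprobable
    return sum(p * len(codeword) for codeword in codebook.values())

print(average_code_length(codebook_1))           # 3.0
print(average_code_length(codebook_2))           # 5.5
</syntaxhighlight>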
 +
 +
Clearly, codebook-1 has a shorter average code length than codebook-2. Let's introduce another metric called the '''coding efficiency''', defined as:
 +
 +
{{NumBlk|::|<math> \eta = \frac{H(S)}{L(X)} </math>|{{EquationRef|2}}}}
 +
 +
Where <math> H(S) </math> is the source entropy. The coding efficiency <math> \eta </math> has a range of <math> 0 \leq \eta \leq 1 </math>. If you recall our "go-to" interpretation of entropy, an entropy of <math> H(S) </math> means we can represent <math> m = 2^{H(S)} </math> outcomes. Essentially, <math> H(S) </math> is the minimum average number of bits we can use to represent the outcomes of a source, whatever its distribution may be. Of course, a shorter average code length is desirable if we want to compress representations. For our simple coin-flip example, <math> H(S) = 3 </math> bits. The efficiency of codebook-1 is <math> \eta = \frac{3}{3} = 1 </math>. Codebook-1 is said to be maximally efficient because it matches the minimum number of bits needed to represent the source symbols. The efficiency of codebook-2 is <math> \eta = \frac{3}{5.5} = 0.5455 </math>.
 +
 +
{{Note| <math> L(X) </math> should never be lower than <math> H(S) </math>. Otherwise, that doesn't make sense. How can you represent <math> H(S) = 3 </math> with <math> L(X) = 2 </math>? Something's wrong with your codebook if this happens. |reminder}}
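Continuing the same sketch, we can also compute <math> H(S) </math> and the coding efficiency of Equation 2 for both codebooks, and check the reminder above that <math> L(X) </math> never drops below <math> H(S) </math>. Again, the variable names are our own.

<syntaxhighlight lang="python">
import math

# Source entropy H(S) of the 8 equiprobable coin-flip symbols, and the
# coding efficiency eta = H(S) / L(X) from Equation 2.
probabilities = [1 / 8] * 8
H_S = -sum(p * math.log2(p) for p in probabilities)      # 3.0 bits

for name, L in [("codebook-1", 3.0), ("codebook-2", 5.5)]:
    assert L >= H_S, "L(X) should never be lower than H(S)"
    print(f"{name}: L(X) = {L}, efficiency = {H_S / L:.4f}")
# codebook-1: L(X) = 3.0, efficiency = 1.0000
# codebook-2: L(X) = 5.5, efficiency = 0.5455
</syntaxhighlight>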
Let's bring back Bob and Alice into the scene again. If you recall the [[Engaging_introduction_to_information_theory|story of Bob and Alice]], Bob wants to send a file to Alice, but it will take a long time due to the limited bandwidth of the channel. When we say bandwidth, we mean that the channel can only send a maximum number of bits per second. Let's define <math> C </math> as the maximum channel rate in units of bits/s. Channels have different maximum rates because of their physical limitations and noise. For example, Ethernet can achieve a bandwidth of 10 Gbps (gigabits per second), while Wi-Fi can only achieve a maximum of 6.9 Gbps. It really is slower to communicate wirelessly.
  
 
==Channel Capacity==
 
Figure 2: <math> I(X;Y) </math> of a BSC with varying <math> P(x) </math> and parametrizing <math> \epsilon </math>.

Previously, we comprehensively investigated the measures of entropy, conditional entropy, and mutual information on a BSC. Recall that:

<math> I(X;Y) = H(X) - H(X|Y) = H(Y) - H(Y|X) </math>
If you think about it, <math> I(X;Y) </math> is a measure of channel quality. It tells us how much we can estimate the input from an observed output. In English, we said that <math> I(X;Y) </math> is a measure of the received information contained in the source information. Ideally, if there is no noise, then <math> H(X|Y) = 0 </math>, resulting in <math> I(X;Y) = H(X) </math>. We want this because it means all the information from the source gets to the receiver! When noise is present (<math> \epsilon > 0 </math>), it degrades the quality of our channel, and the information about the source contained in the received output is now being shared with noise. This only means one thing: we want <math> I(X;Y) </math> to be at a maximum. From the previous discussion, we presented how <math> I(X;Y) </math> varies with <math> P(x) </math> while parametrizing the noise probability <math> \epsilon </math>. Figure 2 shows this plot (it's the same image from the last discussion). We can visualize <math> I(X;Y) </math> like a pipe that constricts the flow of information depending on the noise parameter <math> \epsilon </math>. The GIF in Figure 3 shows the pipe visualization.

Figure 3: Pipe visualization for <math> I(X;Y) </math>.

Let's modify our definition of a bit. In communication theory, <math> I(X;Y) </math> is a metric for transmission rate, so the units are really bits/second. Figure 2 demonstrates how much data can be transmitted per second. Interchanging between bits and bits/second might be confusing, but for now, when we talk about channels (and also <math> I(X;Y) </math>), let's use bits/s. Bits/s is also a measure of bandwidth. From the discussions, it is clear that we can define the maximum channel capacity as:

{{NumBlk|::|<math> C = \max_{P(x)} I(X;Y) </math>|{{EquationRef|3}}}}

We can re-write this as:

{{NumBlk|::|<math> C = \max_{P(x)} \left( H(Y) - H(Y|X) \right) </math>|{{EquationRef|4}}}}

Here, <math> C </math> is in units of bits/second. It tells us the maximum amount of data that we can transmit per unit of time. Both <math> H(Y) </math> and <math> H(Y|X) </math> are affected by the noise parameter <math> \epsilon </math>. In a way, noise limits <math> C </math>. For example, a certain <math> \epsilon </math> can only provide a certain maximum <math> I(X;Y) </math>. Looking at figure 2, when <math> \epsilon = 0 </math> (or <math> \epsilon = 1 </math>) we can get a maximum of <math> 1 </math> bit/s, while when <math> \epsilon = 0.5 </math> we can only get a maximum of <math> 0 </math> bits/s. These maxima occur when <math> P(x_0) = P(x_1) = 0.5 </math>. We can prove this mathematically. Let's determine <math> C </math> of a BSC given some <math> P(x) </math> and <math> \epsilon </math>.

Since we know that <math> H(Y|X) </math> follows the Bernoulli entropy, <math> H(Y|X) = H_b(\epsilon) = -\epsilon \log_2 \epsilon - (1-\epsilon) \log_2 (1-\epsilon) </math>, then we can just write:

<math> C = \max_{P(x)} \left( H(Y) \right) - H_b(\epsilon) </math>

<math> H(Y) </math> is at a maximum of <math> 1 </math> bit when <math> P(y_0) = P(y_1) = 0.5 </math>. Since <math> \epsilon </math> is fixed depending on the channel, we can manipulate the input probability <math> P(x) </math> such that we force <math> P(y_0) = P(y_1) = 0.5 </math> and achieve the maximum <math> H(Y) </math>. Hold on to this very important thought. We also know that <math> H(Y|X) = H_b(\epsilon) </math>. Unfortunately, since <math> \epsilon </math> is fixed, we can't immediately assume <math> H(Y|X) = 0 </math> unless <math> \epsilon = 0 </math>. Therefore we can write <math> \max_{P(x)} H(Y) = 1 </math>, leading to:

{{NumBlk|::|<math> C = 1 - H_b(\epsilon) </math>|{{EquationRef|5}}}}

Equation 5 shows how the maximum capacity is limited by the noise <math> \epsilon </math>. It describes the pipe visualization in figure 3. When <math> \epsilon = 0 </math>, <math> H_b(\epsilon) = 0 </math> and our BSC can transmit <math> C = 1 </math> bit/s of data. When <math> \epsilon = 0.5 </math>, <math> H_b(\epsilon) = 1 </math>, the noise entropy is at a maximum, and our channel can transmit <math> C = 0 </math> bits/s. Clearly, we can't transmit data this way. Lastly, when <math> \epsilon = 1 </math>, then <math> H_b(\epsilon) = 0 </math> again and we're back to transmitting <math> C = 1 </math> bit/s.
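The Python sketch below numerically confirms both Equation 5 and the claim that the maximum of <math> I(X;Y) </math> is attained when the input symbols are equiprobable. The helper names <code>Hb</code> and <code>bsc_mutual_information</code> are our own, and we use the usual convention <math> 0 \log_2 0 = 0 </math>.

<syntaxhighlight lang="python">
import math

def Hb(p):
    """Binary (Bernoulli) entropy in bits, using the convention 0*log2(0) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_mutual_information(p_x0, eps):
    """I(X;Y) = H(Y) - H(Y|X) for a BSC with input P(x0) = p_x0 and noise probability eps."""
    p_y0 = p_x0 * (1 - eps) + (1 - p_x0) * eps   # probability of receiving a 0
    return Hb(p_y0) - Hb(eps)

for eps in (0.0, 0.1, 0.5, 1.0):
    # Sweep the input distribution P(x0) and compare against Equation 5.
    max_I = max(bsc_mutual_information(p / 1000, eps) for p in range(1001))
    at_half = bsc_mutual_information(0.5, eps)
    print(f"eps = {eps}: 1 - Hb(eps) = {1 - Hb(eps):.4f}, "
          f"max I(X;Y) = {max_I:.4f}, I(X;Y) at P(x0) = 0.5 is {at_half:.4f}")
# eps = 0 or eps = 1 give C = 1 bit/s, eps = 0.5 gives C = 0 bits/s, and in every
# case the sweep's maximum equals I(X;Y) evaluated at the equiprobable input.
</syntaxhighlight>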

Again, we can assume <math> \max_{P(x)} H(Y) = 1 </math> because we can transform the input into a new probability distribution that maximizes <math> H(Y) </math>! In fact, this is a prelude to Shannon's source coding theorem:


"Let a source have entropy (bits per symbol) and a channel have a capacity (bits per second). Then it is possible to encode the output of the source in such a way as to transmit at the average rate symbols per second over the channel where is arbitrarily small. It is not possible to transmit at an average rate greater than ." [1]


Note that the above statement only says that we can recode the source into something that can maximize the channel capacity. Also note that the <math> \epsilon </math> in the theorem is an arbitrarily small error term, not the noise probability.
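To get a feel for the numbers, take our three-coin source with <math> H = 3 </math> bits per symbol and suppose, purely as a hypothetical example, that the channel has a capacity of <math> C = 12 </math> bits per second. The theorem then says that with a suitable encoding we can transmit close to, but never more than, <math> \tfrac{C}{H} = \tfrac{12}{3} = 4 </math> symbols per second over that channel.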


==Huffman Coding==

==References==
# Shannon, C. E., & Weaver, W. (1949). ''The Mathematical Theory of Communication''. Urbana: University of Illinois Press.