Channeling your inner capacity

==Channel Basics==

When Shannon developed his theory of information, it was in the context of improving communication. His goal was to determine whether there was a way to maximize transmission rates over a noisy channel. In this module, we will learn more about the channel and the formal terminology that we will use in the succeeding modules. The GIF below recalls the story of Bob and Alice: Bob wants to send a message to Alice across some channel. To simplify the scenario, let's say Bob just wants to send a love letter to Alice through some wireless (satellite-based) channel.

[[File:Bob alice nodencode.gif]]

Figure 1: Simplified communication model
Figure 2 shows a simplified model of their communication. Bob is the '''source'''. Sometimes we use the term '''sender''' or '''transmitter''' for the source. The medium that the message goes through is the '''channel''' (in this case, the wireless channel). Alice is the '''receiver''' who hopefully gets the correct message at the other end of the channel. Let's take a look at each component formally.
  
  
 
=== The Source ===
  
The source is the component that produces the messages or objects sent over a channel. We represent the source as a random variable <math> S </math> whose outcomes are <math> \{s_1, s_2, s_3, ... , s_n \} </math> with probability distribution <math>\{P(s_1), P(s_2), P(s_3), ... , P(s_n) \}</math>. The random variable <math> S </math> is also called the '''source alphabet''', and its outcomes are the '''symbols''' of the source alphabet. A sequence of symbols forms the '''message''' that travels through the channel. Consider the following examples (a short sketch after the list makes these ideas concrete):
  
* We can let the source alphabet <math> S </math> be the English alphabet, where the symbols are all the letters from a to z plus the space. Suppose we want to send the message "the quick brown fox jumps over the lazy dog". Each letter of the English alphabet has a probability associated with it. We saw this in the programming exercise of Module 2, where we extracted the probability distributions of N-block combinations for the English, German, French, and Tagalog languages.
  
* We can let <math> S </math> be the binary alphabet, where the symbols are <math>\{ 1, 0\} </math>. The message could be a stream of 1s and 0s that carries some meaning. For example, a message could be a sequence of events recording when a light is on or off. It could also encode decimal values in the form of binary digits. Finally, it could represent the binary pixels of a fixed-size image.
  
* In biology, we can let <math> S </math> be the set of DNA bases, whose symbols are <math> \{A,C,G,T\} </math>: <math> A </math> is adenine, <math> C </math> is cytosine, <math> G </math> is guanine, and <math> T </math> is thymine. These symbols combine to form DNA sequences that are messages instructing a cell to perform certain types of [https://www.biologyonline.com/dictionary/protein-synthesis#:~:text=Protein%20synthesis%20is%20the%20process,from%20carbon%20sources%20like%20glucose. protein synthesis].
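
To make these ideas concrete, here is a minimal Python sketch (an illustration only, not part of the exercises). It treats the pangram above as a sample of the source, estimates each symbol's probability from its relative frequency, and then draws a random message from the resulting source alphabet.

<syntaxhighlight lang="python">
import random
from collections import Counter

# Estimate P(s) for each symbol of the source alphabet from a sample text.
sample = "the quick brown fox jumps over the lazy dog"
counts = Counter(sample)
symbols = sorted(counts)                                    # the source alphabet S
probabilities = [counts[s] / len(sample) for s in symbols]  # P(s_1), ..., P(s_n)

# A message is a sequence of symbols drawn from the source.
message = "".join(random.choices(symbols, weights=probabilities, k=20))
print(message)  # e.g. "oe ru hwuqo tj  oesn" (random each run)
</syntaxhighlight>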
  
 
Make sure to understand carefully what source alphabets, symbols, and messages mean.
 
=== The Channel ===
  
The channel is the medium through which the source message travels to reach the receiver. In our Bob and Alice example, they used a wireless channel. A channel has a maximum capacity, measured as the number of symbols that we can transmit per second. We call this the '''channel capacity'''. We will discuss this later. A channel is, most of the time, associated with '''noise'''. A '''noiseless channel''' is one where the information of the message gets to the receiver "perfectly": whatever the source sends arrives at the receiver without any glitches or any manipulation of information. A '''noisy channel''' flips existing symbols of a message or adds unnecessary (or new) symbols that are not originally part of the source alphabet. The noise disrupts the message and thereby affects the information that the receiver retrieves. We model the chance of flipping symbols with conditional probabilities. For example, suppose we receive a 1 at the receiver's end but the source actually sent a 0. The probability of this event is the probability of receiving a 1 given that a 0 was sent, i.e., <math>P(r=1|s=0) = \epsilon </math>. This conditional probability models the noise.

Let's take a few examples (a small simulation sketch follows the list):
  
 
* In the Bob and Alice story, the wireless channel can disrupt Bob's message. Suppose one of the messages Bob sends is "love can be as sweet as smelling the roses". Unfortunately, noise can either corrupt symbols or add unwanted symbols such that the message becomes "love cant bed as sweat as smelting the noses". This is a disaster if it gets to Alice. 😱
  
* Another example is memory. Suppose you're on Mars and you need to record audio logs and videos on a [https://en.wikipedia.org/wiki/Solid-state_drive solid-state drive (SSD)]. Memory can be corrupted by [https://www.scienceabc.com/innovation/what-are-bit-flips-and-how-are-spacecraft-protected-from-them.html#:~:text=Cosmic%20Bit%2DFlip&text=Again%2C%20as%20the%20name%20suggests,been%20stripped%20of%20their%20electrons. cosmic-ray bit flips], which occur when cosmic rays zap some bits of memory. Say the data <math> 1010 </math> gets flipped into <math> 0010 </math>. The two representations mean totally different things.
  
 
* A last example is social media. Good people (the source) would like to spread factual news (the message) that is worth broadcasting to ordinary citizens (the receivers). Unfortunately, this does not stop evil citizens from creating fake news (the noise). Since factual news and fake news mix together on social media, fake news disrupts the intended messages for ordinary citizens. Don't you hate it when this happens? 😩 Innocent people fall into this trap.
Of course, if the channels were noiseless, the examples above wouldn't have any issues with the received messages. We won't be dealing with the physics of channels in this course, although it is worth mentioning that channels can be modeled for many different settings.
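As a toy illustration (a sketch under a simple substitution-noise assumption, not a formal channel model), the snippet below passes a message through a noisy channel that independently replaces each symbol with a random symbol from the alphabet with probability <math>\epsilon</math>.

<syntaxhighlight lang="python">
import random

ALPHABET = "abcdefghijklmnopqrstuvwxyz "

def noisy_channel(message, epsilon):
    """Replace each symbol with a random alphabet symbol with probability epsilon."""
    return "".join(
        random.choice(ALPHABET) if random.random() < epsilon else symbol
        for symbol in message
    )

print(noisy_channel("love can be as sweet as smelling the roses", 0.1))
# e.g. "love cxn be as sweet as smelliug the roses"
</syntaxhighlight>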
  
 
=== The Receiver ===
The receiver, obviously, receives and accepts the message at the other end of the channel. Just like the source, the receiver is a random variable <math> R </math> with outcomes <math> \{r_1, r_2, r_3, ... , r_m \} </math> and probability distribution <math>\{P(r_1), P(r_2), P(r_3), ... , P(r_m) \} </math>. We also call <math> R </math> the '''receiver alphabet''' with symbols <math> \{r_1, r_2, r_3, ... , r_m \} </math>. Observe that we purposely set the maximum index to <math> m </math> because the receiver may receive more symbols than the source alphabet contains (i.e., <math>n \leq m </math>) when the channel is noisy. A noiseless channel produces <math> R = S </math>, where the outcomes <math> \{r_1, r_2, r_3, ... , r_n \} = \{s_1, s_2, s_3, ... , s_n \} </math> and the probability distributions <math>\{P(r_1), P(r_2), P(r_3), ... , P(r_n) \} = \{P(s_1), P(s_2), P(s_3), ... , P(s_n) \} </math>. If the channel is noisy, then <math> R \neq S </math>: either the outcomes or the probability distributions differ. Take note that it is possible to have the exact same outcomes but different probability distributions. Let's take a look at some examples:

* Consider the Bob and Alice example again, where Bob sent "love can be as sweet as smelling the roses" but Alice received "love cant bed as sweat as smelting the noses". Here, we can show that <math> n = m </math> but the probability distribution of <math> R </math> differs from that of <math> S </math>. Check out the tabulated data below; the short sketch after the table shows one way to reproduce these numbers.
{| class="wikitable"
|-
! scope="col"| Outcome
! scope="col"| <math> P(s) </math>
! scope="col"| <math> P(r) </math>
|-
| style="text-align:center;" | a
| style="text-align:center;" | 0.071
| style="text-align:center;" | 0.091
|-
| style="text-align:center;" | b
| style="text-align:center;" | 0.024
| style="text-align:center;" | 0.023
|-
| style="text-align:center;" | c
| style="text-align:center;" | 0.024
| style="text-align:center;" | 0.023
|-
| style="text-align:center;" | d
| style="text-align:center;" | 0
| style="text-align:center;" | 0.023
|-
| style="text-align:center;" | e
| style="text-align:center;" | 0.167
| style="text-align:center;" | 0.136
|-
| style="text-align:center;" | g
| style="text-align:center;" | 0.024
| style="text-align:center;" | 0.023
|-
| style="text-align:center;" | h
| style="text-align:center;" | 0.024
| style="text-align:center;" | 0.023
|-
| style="text-align:center;" | i
| style="text-align:center;" | 0.024
| style="text-align:center;" | 0.023
|-
| style="text-align:center;" | l
| style="text-align:center;" | 0.071
| style="text-align:center;" | 0.045
|-
| style="text-align:center;" | m
| style="text-align:center;" | 0.024
| style="text-align:center;" | 0.023
|-
| style="text-align:center;" | n
| style="text-align:center;" | 0.048
| style="text-align:center;" | 0.068
|-
| style="text-align:center;" | o
| style="text-align:center;" | 0.048
| style="text-align:center;" | 0.045
|-
| style="text-align:center;" | r
| style="text-align:center;" | 0.024
| style="text-align:center;" | 0
|-
| style="text-align:center;" | s
| style="text-align:center;" | 0.143
| style="text-align:center;" | 0.136
|-
| style="text-align:center;" | t
| style="text-align:center;" | 0.048
| style="text-align:center;" | 0.091
|-
| style="text-align:center;" | v
| style="text-align:center;" | 0.024
| style="text-align:center;" | 0.023
|-
| style="text-align:center;" | w
| style="text-align:center;" | 0.024
| style="text-align:center;" | 0.023
|-
| style="text-align:center;" | space
| style="text-align:center;" | 0.190
| style="text-align:center;" | 0.180
|}
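
For reference, here is a minimal Python sketch that reproduces the table above; each entry is just a symbol's count divided by the message length.

<syntaxhighlight lang="python">
from collections import Counter

sent = "love can be as sweet as smelling the roses"
received = "love cant bed as sweat as smelting the noses"

def distribution(message):
    """Relative frequency of every symbol (letters and the space)."""
    total = len(message)
    return {symbol: count / total for symbol, count in sorted(Counter(message).items())}

print(distribution(sent))      # e.g. P(e) = 7/42 ≈ 0.167, P(space) = 8/42 ≈ 0.190
print(distribution(received))  # e.g. P(e) = 6/44 ≈ 0.136, P(space) = 8/44 ≈ 0.18
</syntaxhighlight>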
* For binary channels, it is possible that the receiver alphabet is the same as the source alphabet, <math> R = S</math>, where <math>\{ 0,1 \}</math> are the symbols. Suppose the source sends a binary image where each pixel is either a 1 or a 0, with <math> P(s=1) = p </math> and <math> P(s=0) = 1- p </math>. If the channel is noiseless, then <math> P(r=1) = p </math> and <math> P(r=0) = 1-p </math>, and the noise conditional probabilities are <math> P(r=0|s=1) = P(r=1|s=0) = 0 </math>. However, if the channel is noisy, then the noise probabilities take effect: <math> P(r=0|s=1) = P(r=1|s=0) = \epsilon </math>, resulting in <math> P(r=1) = \epsilon + p - 2\epsilon p </math> and <math> P(r=0) = 1-\epsilon-p+2\epsilon p </math>. Deriving these results is part of your theoretical exercise 😁.
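
As a quick numerical sanity check (it does not replace the derivation you are asked to do), here is a Monte Carlo sketch of this noisy binary channel; the observed fraction of received 1s should approach <math> \epsilon + p - 2\epsilon p </math>.

<syntaxhighlight lang="python">
import random

def received_ones_fraction(p, epsilon, trials=1_000_000):
    """Source emits 1 with probability p; the channel flips each bit with probability epsilon."""
    ones = 0
    for _ in range(trials):
        s = 1 if random.random() < p else 0
        r = s ^ (random.random() < epsilon)  # flip the bit with probability epsilon
        ones += r
    return ones / trials

p, epsilon = 0.3, 0.1
print(received_ones_fraction(p, epsilon))  # ≈ 0.34 (Monte Carlo estimate)
print(epsilon + p - 2 * epsilon * p)       # 0.34  (closed form for P(r=1))
</syntaxhighlight>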
  
 
==Binary Symmetric Channels==

==Sample Application: Noisy Images==