Difference between revisions of "Source coding"

From Microlab Classes
Jump to navigation Jump to search
Line 11: Line 11:
  
 
== Source codes ==
 
== Source codes ==
Simply put, a source code <math>\mathcal{C}</math> is a mapping from an input alphabet <math>\mathcal{X}</math> to another alphabet <math>\mathcal{D}</math>, which is the set of ''codewords''. If we let <math>D = |\mathcal{D}|</math>, then we can say that <math>\mathcal{C}</math> is a D-ary code. A source code <math>\mathcal{C}</math> can be ''extended'' so that sequences of symbols drawn from <math>\mathcal{X}</math> are encoded as the concatenafirst rowtion of the individual encodings of each source symbol. For example, consider the source alphabet <math>\mathcal{X} = \{1, 2, 3, 4\}</math> and the source code
+
Simply put, a source code <math>\mathcal{C}</math> is a mapping from an input alphabet <math>\mathcal{X}</math> to another alphabet <math>\mathcal{D}</math>, which is the set of ''codewords''. If we let <math>D = |\mathcal{D}|</math>, then we can say that <math>\mathcal{C}</math> is a D-ary code. A source code <math>\mathcal{C}</math> can be ''extended'' so that sequences of symbols drawn from <math>\mathcal{X}</math> are encoded as the concatenation of the individual encodings of each source symbol. For example, consider the source alphabet <math>\mathcal{X} = \{1, 2, 3, 4\}</math> and the source code
  
 
{| class="wikitable"
 
{| class="wikitable"

Revision as of 02:08, 15 March 2021


A First Look at Shannon's Communication Theory

Figure 1: A general communication system[1].

In his landmark 1948 paper[1], Claude Shannon developed a general model for communication systems, as well as a framework for analyzing these systems. The model has three components: (1) the sender or source, (2) the channel, and (3) the receiver or sink. The model also includes the transmitter that encodes the message into a signal, the receiver, for decoding the signal back into a message, as well the noise of the channel, as shown in Fig. 1.

In Shannon's discrete model, the source provides a stream of symbols from a finite alphabet, , which are then encoded. The code is sent through the channel, which could be corrupted by noise, and when the code reaches the other end, it is decoded by the receiver, and then the sink extracts information from the steam of symbols.

Note that sending information in space, e.g. from here to there is equivalent sending information in time, e.g. from now to then. Thus, Shannon's theory applies to both information transmission and information storage.

Source codes

Simply put, a source code is a mapping from an input alphabet to another alphabet , which is the set of codewords. If we let , then we can say that is a D-ary code. A source code can be extended so that sequences of symbols drawn from are encoded as the concatenation of the individual encodings of each source symbol. For example, consider the source alphabet and the source code

1

2

3

4

0

010

01

10

will have an extension which will map the sequence to 010001.

The given definition, by itself, is so general that it may include codes that barely have any practical use. For example, from the definition it is possible to have a code which maps every symbol to 0. You should see that such a code is useless since there is no way to recover the original symbols from a stream of zeros. A code , such as the example just now, is called singular if there exists distinct symbols such that . Whenever a code produces a one-to-one mapping between and , we will call such a code nonsingular.

As a preview, below is a list of increasingly stricter classes of source codes:

  • nonsingular: if whenever .
  • uniquely encodable: if the extension of is nonsingular.
  • instantaneous: if no codeword is a prefix of another codeword.

Uniquely encodable codes produces encodings that can only correspond to a unique sequence of input symbols. The code presented above is nonsingular, but not uniquely encodable because the encoding 0100 can be obtained from either or .

Code-hierarchy.svg
singular nonsingular but not uniquely encodable uniquely encodable but not instantaneous instantaneous
1 0 0 10 0
2 0 010 00 10
3 0 01 11 110
4 0 10 110 111

References

  1. 1.0 1.1 C. E. Shannon, A Mathematical Theory of Communication, The Bell System Technical Journal, Vol. 27, pp. 379–423, 623–656, July, October, 1948. (pdf)