Engaging introduction to information theory


What's with information theory?

If Newton developed calculus and Einstein developed the theory of relativity, then Claude Shannon, on par with these mathematical geniuses, developed information theory. Shannon's 1948 paper, "A Mathematical Theory of Communication," established the mathematical framework that lets us quantify information. In any dictionary, information can mean knowledge, facts, meaning, or even a message obtained from investigation, study, or instruction. Information in a mathematical sense means something similar, but with a little twist in its definition. Let's take a look at a few examples. Suppose you were told the following:

  1. Classes will start next week!
  2. The entire semester will be suspended due to the pandemic!
  3. AEKG EAKJGRALGN EAFKEA EAFFH

If we ask you to identify which of the three statements conveys the most information, hopefully you'll pick statement number 2! If you think about it carefully, the first statement is probably something you already know. It has a high probability of happening, so this information isn't new to you. The second statement, on the other hand, has a probability of happening that is close to zero, and we would be most surprised if it did happen! The second statement carries the most information. Lastly, the third statement is just a jumble of letters and has no meaning at all. From the perspective of the English language, the third statement also has a very low chance of occurring, because we know very well that this is not how English sentences are structured.

The example above tells us something about information: information consists of surprise and meaning. In this class, we'll be more interested in the surprise part of information. There are two motivations for this: (1) Shannon's information theory was originally developed for communication systems, where 'surprise' is the more relevant quantity; and (2) meaning, or semantics, is a challenging problem. We won't discuss it for now, but those interested might want to check out the branch of artificial intelligence called natural language processing (NLP).
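To make the idea of 'surprise' a bit more concrete, here is a minimal Python sketch. It uses the standard self-information measure from Shannon's framework, I(x) = -log2 p(x) bits; the probabilities assigned to the first two statements are made up purely for illustration.

  import math

  def self_information(p):
      # Surprise of an event with probability p, measured in bits: -log2(p)
      return -math.log2(p)

  # Made-up probabilities for the first two statements above (illustration only)
  p_classes_start = 0.95   # very likely, so not much of a surprise
  p_sem_suspended = 0.01   # very unlikely, so very surprising if it happens

  print(f"Statement 1: {self_information(p_classes_start):.3f} bits of surprise")
  print(f"Statement 2: {self_information(p_sem_suspended):.3f} bits of surprise")

Running this prints roughly 0.074 bits for the nearly certain statement and about 6.6 bits for the improbable one, matching the intuition that unlikely events are the ones that carry information.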

A tale of 2^n paths

A simple case of data compression

A brief history of information theory

Odd one out!