The exercise below will appear in your submission bin after passing Exercise 2.1 (Prefix-free codes).
Exercise 2.2. Huffman encoding (5 points)
If we can model our information source as discrete and memoryless, Huffman codes will provide a prefix-free encoding that has the minimum expected length per symbol. In practice, we do not have access to the underlying probability distribution of our information source, so we use the frequency counts as a proxy. For instance, if we have a coin-flipping machine as an information source and we observe 201 heads and 99 tails after 300 observations, then we estimate the probability of heads to be 201/300. We can similarly perform such an estimate for other information sources with larger alphabets.
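As a quick illustration of this estimate (the coin-flip data below simply mirrors the hypothetical counts above), the relative frequencies can be computed with Python's collections.Counter:

from collections import Counter

# Hypothetical observations: 201 heads and 99 tails out of 300 flips.
samples = ['H'] * 201 + ['T'] * 99

counts = Counter(samples)
total = sum(counts.values())

# Relative frequencies serve as estimates of the true probabilities.
estimates = {symbol: count / total for symbol, count in counts.items()}
print(estimates)  # {'H': 0.67, 'T': 0.33}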
The Huffman code can be visualized as a binary tree in which each symbol is located at a leaf. The path from the root node to a leaf node corresponds to the binary encoding of that symbol. To produce a Huffman code, we need a sequence of samples from which to estimate the relative frequency of each symbol. Construction then proceeds in a greedy fashion in which more probable symbols tend to receive shorter encodings. In this exercise, you are to use a Huffman encoder to convert a sequence of samples into a bit string.
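As a small worked example (the three-symbol source below is hypothetical), suppose the estimated probabilities are p(A) = 0.5, p(B) = 0.25, and p(C) = 0.25. One valid Huffman tree places A at depth 1 and B, C at depth 2, and reading left edges as '0' and right edges as '1' (one common convention) gives the codebook sketched here:

# Tree:        (root)
#              /    \
#            'A'   (internal)
#                  /       \
#                'B'       'C'
codebook = {'A': '0', 'B': '10', 'C': '11'}

# Encoding concatenates codewords; the prefix-free property makes the
# resulting bit string uniquely decodable.
print(''.join(codebook[s] for s in ['A', 'B', 'A', 'C']))  # '010011'

The expected length is 0.5·1 + 0.25·2 + 0.25·2 = 1.5 bits per symbol, which for this particular distribution equals the source entropy.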
Task description
In this task, you need to write a function huffman_encode which takes a HuffmanEncoder object and a list of strings, and returns a binary string consisting of '0's and '1's corresponding to an encoding of the samples. The list of strings may contain strings with more than one character, and each string is guaranteed to have an encoding under the given Huffman encoder.
Although the Huffman encoder is constructed in the header script for this exercise, we still encourage you to understand how it works. Note that in the interest of conciseness and readability, the script below may not be the fastest way to implement Huffman encoders.
Header
class Node:
    def __init__(self, data, weight):
        # 'data' is the set of symbols covered by this subtree;
        # 'weight' is the total frequency count of those symbols.
        self.left = None
        self.right = None
        self.data = data
        self.weight = weight

class HuffmanEncoder:
    def __init__(self, samples):
        alphabet = set(samples)
        alphabet_size = len(alphabet)
        # Start with one leaf node per symbol, weighted by its frequency.
        nodes = [Node({symb}, samples.count(symb)) for symb in alphabet]
        # Greedily merge the two lightest nodes until a single tree remains.
        for it in range(1, alphabet_size):
            a, b = sorted(range(len(nodes)), key=lambda idx: nodes[idx].weight)[:2]
            supernode = Node(nodes[a].data.union(nodes[b].data),
                             nodes[a].weight + nodes[b].weight)
            supernode.left = nodes[a]
            supernode.right = nodes[b]
            nodes = [nodes[idx] for idx in range(len(nodes)) if idx not in (a, b)]
            nodes.append(supernode)
        self.root = nodes[0]

    def encode(self, samples):
        return huffman_encode(self, samples)
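As a quick sanity check (the sample string below is arbitrary), you can construct an encoder and inspect the resulting tree before writing your own huffman_encode. Note that the header does not fix a bit convention for left and right children, so that choice is left to your implementation:

# Assumes the Node and HuffmanEncoder classes from the header above.
samples = list("ABRACADABRA")
enc = HuffmanEncoder(samples)

# The root covers the whole alphabet, and its weight is the sample count.
print(enc.root.weight)        # 11
print(sorted(enc.root.data))  # ['A', 'B', 'C', 'D', 'R']

# Each child covers a subset of the alphabet; following left/right
# pointers down to a leaf (a node whose data holds a single symbol)
# traces out that symbol's codeword.
print(sorted(enc.root.left.data), sorted(enc.root.right.data))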
Student submission
def huffman_encode(encoder, samples):
    # Your code goes here
    # More code...
    return ans
Main test script
# Main test script goes here, and will not be visible to students.
# The script will test if the submitted code does the prescribed functionality.
# For successfully validated scripts, test results will be sent via email.
Sample input
To be added