Homework 2 - Compression
For Questions 1-3, assume that the English language is a memoryless source
which produces characters with the following probabilities:
| Letter | Prob | Letter | Prob | Letter | Prob | Letter | Prob |
| A | 0.064 | H | 0.042 | O | 0.056 | V | 0.010 |
| B | 0.014 | I | 0.063 | P | 0.017 | W | 0.018 |
| C | 0.027 | J | 0.003 | Q | 0.004 | X | 0.003 |
| D | 0.035 | K | 0.006 | R | 0.049 | Y | 0.018 |
| E | 0.100 | L | 0.035 | S | 0.056 | Z | 0.002 |
| F | 0.020 | M | 0.020 | T | 0.071 | SPACE | 0.166 |
| G | 0.014 | N | 0.056 | U | 0.031 | | |
-
Construct a (binary) Huffman code for English. What is its expected
codeword length? Encode the words ``THERE'' and ``ZEBRA'' in your code.
-
Calculate the information content of each letter in English. What is the
entropy of English? Approximately how much information would you expect in
a book containing 1,000,000 letters of English?
-
Calculate the amount of information contained in this question.
-
Consider the following image:
Each box represents a pixel.
- Encode this image as a sequence of 0's and 1's.
- Use the LZW algorithm to compress this sequence.
- Use run length encoding (with a maximum run-length of 16) to
compress this sequence. Encode the run lengths as 4-bit binary
numbers (ie. 1 = 0001, 2 = 0010, etc).
- Which code compresses best?
|