Skip to content

Communication, Codes and Cyphers

Spring 2002

Homework 2 - Compression

For Questions 1-3, assume that the English language is a memoryless source which produces characters with the following probabilities:

LetterProbLetterProbLetterProbLetterProb
A0.064H0.042O0.056V0.010
B0.014I0.063P0.017W0.018
C0.027J0.003Q0.004X0.003
D0.035K0.006R0.049Y0.018
E0.100L0.035S0.056Z0.002
F0.020M0.020T0.071SPACE0.166
G0.014N0.056U0.031
  1. Construct a (binary) Huffman code for English. What is its expected codeword length? Encode the words ``THERE'' and ``ZEBRA'' in your code.

  2. Calculate the information content of each letter in English. What is the entropy of English? Approximately how much information would you expect in a book containing 1,000,000 letters of English?

  3. Calculate the amount of information contained in this question.
  4. Consider the following image:

    Each box represents a pixel.

    1. Encode this image as a sequence of 0's and 1's.
    2. Use the LZW algorithm to compress this sequence.
    3. Use run length encoding (with a maximum run-length of 16) to compress this sequence. Encode the run lengths as 4-bit binary numbers (ie. 1 = 0001, 2 = 0010, etc).
    4. Which code compresses best?
[ Valid XHTML 1.0! ]