Skip to content

Communication, Codes and Cyphers

Spring 2002

Sample Projects

There is a vast amount of territory you could cover in a project related to this course. The following are some suggestions for projects. I strongly encourage you to devise your own, based on your own particular skills and interests. Feel free to discuss your ideas with me.

As a general rule, I am looking for projects which require you to apply your critical thinking skills. Simple reporting of facts without any analysis will not get good grades.

Basic Projects

These projects are fairly basic, safe, and easy to complete. They are unadventurous. As a result, I would only give a grade higher than a B for such a project if it is truly exceptional.

  • A biography of an important figure in the field, such as Shannon or Turing.
  • Describing how a particular code not covered in detail in this course works (eg. mp3, video compression, knapsack codes, zero-knowledge proofs, etc.).
  • An essay discussing the role codes and codebreakers played in some world events (eg. the UK breaking the German Enigma cyphers in WWII, the Navajo "codetalkers").
  • Writing a computer program that implements or is related to one of the codes covered in detail in the course.

Better Projects

These projects are deeper, more challenging, more interesting, and have a higher chance of failure. I will look more kindly on an ambitious project which falls slightly short, than an unadventurous one which succeeds.

  • The statistics we use about the frequency of words and letters reflect the English language frequencies. Choose a different language (eg. French) and, through the analysis of source documents, construct frequency tables for that language. Compare the frequencies with those of English, particularly in the information/entropy of the language. Construct Huffman codes for the language, and/or give examples of code-breaking using your data.

    You should describe your methodology, and compare your results to any published sources that you can find (explaining any discrepancies).

  • Implement a computer program or suite of programs for breaking substitution cyphers (or any other cypher). Ideally this should provide more than just frequency analysis of the cyphertext, it should try to guess words and letters, and identify what it thinks the plain-text should be in many cases. The closer you can get it to human standards of analysis, the better.

    You should throughly document the program. Compare your work with that of others.

  • Devise a code for musical notation. It should be flexible enough to encode almost any piece of standard sheet music. Using your code, perform frequency/entropy analyses for several different pieces of music of different styles. Are there noticable differences, and if so, do the differences reflect any qualities of the music?

    Compare your code with systems such as MIDI. Is there any other published work looking at this sort of question?

  • The prime number described at The Prime Pages may constitute an illegal circumvention device under the Digital Millennium Copyright Act (DMCA), since if it is decompressed with gzip it becomes a program for breaking the encryption system on DVDs.

    Computer code for performing RSA and similar public-key encryption are considered munitions by the US government if the secret keys are sufficiently hard to break. Exporting these computer programs is illegal when in machine readable form, but not if it is printed in a book or other human-readable form.

    Analyse legal issues involving codes and cryptography. Consider court cases like the "DeCSS" case against the hacker magazine 2600. Is publishing computer code protected speech, or is it sometimes the legal equivalent of yelling "Fire!" in a crowded theater? Compare and contrast the legal situation in the US with the situation in other countries.

    You may also wish to consider issues surrounding the patenting of encoding schemes, such as RSA cryptography and the various LZ compression systems.

  • Zipf's law states that given a natural language such as English, the probability of occurrence of the nth most common word is approximately A/n, where A is a constant depending on the language.

    In 1954 Benoit Mandelbrot suggested a modification where the probability is given instead by the formula

    A/(n + V)(1/D)
    where V and D are constants depending on the language, and A is a parameter so that the sum of the probabilities is 1.

    Report on the failings of Zipf's law, the possible advantages of Mandelbrot's scheme, and the connections with fractals. Estimate values of V, D and A which are reasonable for English.

    This project requires you to be able to read French (or find someone who can and is willing to translate for you).

    References:
    • B. Mandelbrot, Structure Formalle des Textes et Communication, Word, 10 (1954), 1-26.
    • B. Mandelbrot, Fractals, Form Chance and Dimension. 1977. Freeman, San Francisco.
  • English grammar is error-prone, especially when spoken. Compare

    Have you eaten, mother?
    with
    Have you eaten mother?
    Indeed it can even be ambiguous, as in the phrase
    the shooting of the hunters
    (are the hunters shooting or being shot?)

    Moreover, English is plagued by homophones and homographs, is not consistent in pronunciation and is not at all optimised for speed.

    Devise an artificial language, with a more robust grammar - preferably error-correcting; an alphabet which accurately reflects pronunciation; which has a lower entropy than English. A sample vocabulary should be included, which avoids homophones and homographs, and includes an error-detection and -correction system.

    Using the language should not require the speaker to do matrix multiplication in their heads - composition should be simple. Any error-correction is permitted to use sophisticated techniques. The language should be as grammatically sophisticated as English.

    Additional enhancements, such as genderless pronouns; mellifluousness; consistency; the ability to deal with proper names or words from other languages nicely; or closeness to English; will be given consideration as well.

    Your report should detail why and in what ways your language is superior to English, backed up by calculations and examples to prove your point. You should also discuss any limitations or disadvantages; and you may feel inclined to speculate about practical applications. Compare your work with the work of others, and look at how the limitations of English are overcome in safety critical situations, such as Air Traffic Control.

[ Valid XHTML 1.0! ]