Sample Projects
There is a vast amount of territory you could cover in a project related
to this course. The following are some suggestions for projects. I strongly
encourage you to devise your own, based on your own particular skills and
interests. Feel free to discuss your ideas with me.
As a general rule, I am looking for projects which require you to apply
your critical thinking skills. Simple reporting of facts without any
analysis will not get good grades.
Basic Projects
These projects are fairly basic, safe, and easy to complete. They are
unadventurous. As a result, I would only give a grade higher
than a B for such a project if it is truly exceptional.
- A biography of an important figure in the field, such as Shannon or
Turing.
- Describing how a particular code not covered in detail in this course
works (eg. mp3, video compression, knapsack codes, zero-knowledge proofs,
etc.).
- An essay discussing the role codes and codebreakers played in some
world events (eg. the UK breaking the German Enigma cyphers in WWII, the
Navajo "codetalkers").
- Writing a computer program that implements or is related to one of
the codes covered in detail in the course.
Better Projects
These projects are deeper, more challenging, more interesting, and have a
higher chance of failure. I will look more kindly on an ambitious project
which falls slightly short, than an unadventurous one which succeeds.
-
The statistics we use about the frequency of words and letters reflect
the English language frequencies. Choose a different language (eg.
French) and, through the analysis of source documents, construct frequency
tables for that language. Compare the frequencies with those of English,
particularly in the information/entropy of the language. Construct
Huffman codes for the language, and/or give examples of code-breaking
using your data.
You should describe your methodology, and compare your results to any
published sources that you can find (explaining any discrepancies).
-
Implement a computer program or suite of programs for breaking
substitution cyphers (or any other cypher). Ideally this should
provide more than just frequency analysis of the cyphertext, it
should try to guess words and letters, and identify what it thinks
the plain-text should be in many cases. The closer you can get it
to human standards of analysis, the better.
You should throughly document the program. Compare your work with
that of others.
-
Devise a code for musical notation. It should be flexible enough to
encode almost any piece of standard sheet music. Using your code,
perform frequency/entropy analyses for several different pieces of
music of different styles. Are there noticable differences, and if
so, do the differences reflect any qualities of the music?
Compare your code with systems such as MIDI. Is there any other
published work looking at this sort of question?
-
The prime number described at The
Prime Pages may constitute an illegal circumvention device
under the Digital Millennium Copyright Act (DMCA), since if it is
decompressed with gzip it becomes a program for breaking the
encryption system on DVDs.
Computer code for performing RSA and similar public-key encryption
are considered munitions by the US government if the secret keys are
sufficiently hard to break. Exporting these computer programs is
illegal when in machine readable form, but not if it is printed in
a book or other human-readable form.
Analyse legal issues involving codes and cryptography. Consider
court cases like the "DeCSS" case against the hacker magazine 2600.
Is publishing computer code protected speech, or is it sometimes
the legal equivalent of yelling "Fire!" in a crowded theater? Compare
and contrast the legal situation in the US with the situation in
other countries.
You may also wish to consider issues surrounding the patenting of
encoding schemes, such as RSA cryptography and the various LZ
compression systems.
-
Zipf's law states that given a natural language such as English, the
probability of occurrence of the nth most common word is approximately
A/n, where A is a constant depending on the language.
In 1954 Benoit Mandelbrot suggested a modification where the probability
is given instead by the formula
A/(n + V)(1/D)
where V and D are constants depending on the language, and
A is a parameter so that the sum of the probabilities is 1.
Report on the failings of Zipf's law, the possible advantages of
Mandelbrot's scheme, and the connections with fractals. Estimate values of
V, D and A which are reasonable for English.
This project requires you to be able to read French (or find someone who
can and is willing to translate for you).
References:
- B. Mandelbrot, Structure Formalle des Textes et
Communication, Word, 10 (1954), 1-26.
- B. Mandelbrot, Fractals, Form Chance and Dimension. 1977.
Freeman, San Francisco.
-
English grammar is error-prone, especially when spoken. Compare
Have you eaten, mother?
with
Have you eaten mother?
Indeed it can even be ambiguous, as in the phrase
the shooting of the hunters
(are the hunters shooting or being shot?)
Moreover, English is plagued by homophones and homographs, is not
consistent in pronunciation and is not at all optimised for speed.
Devise an artificial language, with a more robust grammar - preferably
error-correcting; an alphabet which accurately reflects pronunciation;
which has a lower entropy than English. A sample vocabulary should be
included, which avoids homophones and homographs, and includes an
error-detection and -correction system.
Using the language should not require the speaker to do matrix
multiplication in their heads - composition should be simple. Any
error-correction is permitted to use sophisticated techniques. The
language should be as grammatically sophisticated as English.
Additional enhancements, such as genderless pronouns; mellifluousness;
consistency; the ability to deal with proper names or words from other
languages nicely; or closeness to English; will be given consideration as
well.
Your report should detail why and in what ways your language is superior
to English, backed up by calculations and examples to prove your point.
You
should also discuss any limitations or disadvantages; and you may feel
inclined to speculate about practical applications. Compare your work
with the work of others, and look at how the limitations of English
are overcome in safety critical situations, such as Air Traffic Control.
|