UNLV : Math : Corran Webster : Teaching : Teaching : Codes : Introduction
Communication, Codes and Cyphers: Introduction
1.2 Examples

1.2.1 Morse Code

Morse code is probably the code the lay public would most readily recognize; it was once the international standard for telecommunication. Its source alphabet is the 26 letters of the upper-case Roman alphabet, and it encodes them into a channel alphabet of two symbols: dots and dashes. The translation is as follows:
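A minimal sketch of this translation as a lookup table, assuming the standard International Morse code for the 26 letters. The names `MORSE` and `morse_codewords` are illustrative only, with "." standing for a dot and "-" for a dash.

```python
# Standard International Morse code for the 26 upper-case letters
# (an assumption: taken to match the translation the text refers to).
MORSE = {
    "A": ".-",   "B": "-...", "C": "-.-.", "D": "-..",  "E": ".",
    "F": "..-.", "G": "--.",  "H": "....", "I": "..",   "J": ".---",
    "K": "-.-",  "L": ".-..", "M": "--",   "N": "-.",   "O": "---",
    "P": ".--.", "Q": "--.-", "R": ".-.",  "S": "...",  "T": "-",
    "U": "..-",  "V": "...-", "W": ".--",  "X": "-..-", "Y": "-.--",
    "Z": "--..",
}


def morse_codewords(message):
    """Return the codeword of each letter in `message`, in order.

    Characters outside the 26-letter source alphabet are skipped.
    """
    return [MORSE[ch] for ch in message.upper() if ch in MORSE]


print(morse_codewords("SOS"))  # ['...', '---', '...']
```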

There are a number of features of Morse code that we will see repeated later on. The first is that each English letter is translated into a string of dots and dashes, and that the translations, or codewords, of different letters may have different lengths: Morse code is a variable length code. The second, and perhaps most important, feature is that these lengths vary roughly according to how often the letters occur in English. The most common letters, like E and T, have the shortest codewords, while uncommon letters, like Q and Z, have the longest. The correspondence is not exact: O is more common in English than I, yet it has a longer codeword. The reason for this is that Morse developed his code before the techniques we will be looking into later were devised.

However, Morse code as presented above has a serious flaw. If I sent you the message

    · · · - - - · · ·

I might mean SOS, but I may also have meant VMS, or EEETTTEEE, or a number of other possible messages. In other words, using just dots and dashes, Morse code is not uniquely decipherable.

In practice, people got around this by leaving a pause after the end of each letter (usually of the same length as a dash). In effect this adds another symbol to the channel alphabet: a pause. We will denote this by a slash character /. So what we should have transmitted to make our meaning unambiguous was:

    · · · / - - - / · · · /

So we now have a code which is uniquely decipherable (in fact it is now instantaneously decipherable; see Section 1.4). The problem with this change is that it makes all our codewords one symbol longer, which slows down the transmission rate. Morse code is definitely not the best we can do, and this is largely the reason it is no longer a major standard for communication.

1.2.2 ASCII

ASCII, or the American Standard Code for Information Interchange, is now one of the standards for electronic communication, particularly between computers.
Indeed it is the basic standard coding for Internet messages. It is also a standard for the representation of data within a computer.

The source alphabet for ASCII is a collection of 128 (= 2^7) characters, including upper- and lower-case Roman characters, numbers, punctuation, a <space> character and a variety of control characters such as <return> and <tab>. If you have used a computer at all, the source alphabet of ASCII will be familiar to you as, roughly speaking, the characters that you can type from your keyboard.

Since, internally, computers work in binary, the channel alphabet is simply the two binary digits (or bits) 0 and 1. The encoding takes each source character and translates it to a string of seven 0's and 1's. For example, the upper-case A becomes 1000001, while <space> becomes 0100000. The complete code is:
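A minimal sketch of this encoding in Python, relying on the fact that the built-in `ord` returns the ASCII code for characters numbered below 128. The function names `ascii_codeword` and `ascii_encode` are illustrative, not part of any standard.

```python
def ascii_codeword(ch):
    """Return the 7-bit ASCII codeword of one character as a string of 0s and 1s."""
    code = ord(ch)
    if code >= 128:
        raise ValueError("not in the ASCII source alphabet: %r" % ch)
    # "07b" formats the number in binary, padded with leading zeros to 7 digits.
    return format(code, "07b")


def ascii_encode(message):
    """Translate each character of `message` into its 7-bit codeword."""
    return [ascii_codeword(ch) for ch in message]


print(ascii_codeword("A"))  # 1000001
print(ascii_codeword(" "))  # 0100000
```

Because every codeword has the same length, a decoder can split the received bit string into blocks of seven without needing any separator symbol.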

The first thirty-two characters are control codes, of which the most commonly used are CR (Carriage Return) and LF (Line Feed); they are followed by SP (SPace). Others, like BS (BackSpace), DEL (DELete) and ESC (ESCape), may be familiar as keys on a standard computer keyboard.

Since computers usually store information in bytes, which hold 8 bits, we have one bit left over. What this bit is used for can vary: some systems use it to extend the code to allow representation of additional characters; often, however, it is used in a simple error detection scheme called a parity check.

Parity checking involves looking at the codeword for a character and counting the number of 1's in it. If this number is odd, then we set the eighth bit to 1; if the number is even, we set the eighth bit to 0. This means that the total number of 1's in the 8 bit string is now always even. So if the decoder receives a codeword which has an odd number of 1's, we know that there must have been an error in the channel. Unfortunately, we have no way of telling what the error was. This is no problem if the decoder can contact the sender and ask for the appropriate part of the message to be sent again. However, if two errors happen in the same codeword, we will have no way of knowing, since we will once again have an even number of 1's in our codeword.

1.2.3 ISBN

Most books and other similar (non-periodical) publications have an ISBN, or International Standard Book Number. This is a 10 digit code number that can usually be found on the same page as the publication details or on the back cover. A typical example might be:

    0-19-853287-3

The ISBN consists of 4 parts, separated by dashes (the actual length of each part may vary). The first is a country code, the second a code identifying the publisher, the third identifies the particular book, and the last is a single error-check digit.
The tenth digit is chosen so that if the ISBN has digits abcdefghij then the number given by the formula

    10a + 9b + 8c + 7d + 6e + 5f + 4g + 3h + 2i + j

is evenly divisible by 11 (this might mean that j would have to be 10, and in that case an X is used for the tenth "digit"). For the ISBN above the formula gives 231 = 21 × 11. This scheme can not only detect that one of the digits has been changed, but can also detect that two of the digits have been swapped. We will see why this is true when we talk about modulo arithmetic, but a couple of examples should convince you that it works.

Example

Suppose that the ISBN above gets changed by an error to

    0-19-857287-3

Then the formula gives

    0 + 9 + 72 + 56 + 30 + 35 + 8 + 24 + 14 + 3 = 251

and this has remainder 9 when we divide by 11. Similarly, if two digits were swapped so that the ISBN became

    0-12-853987-3

then the formula would give

    0 + 9 + 16 + 56 + 30 + 15 + 36 + 24 + 14 + 3 = 203

and this has remainder 5 when we divide by 11.

1.2.4 Substitution Cyphers

The simple substitution cypher dates back at least to Roman times. The way it works is that you simply mix up (or permute) the letters of your alphabet and replace each letter in your message by the corresponding letter in the permuted alphabet.

For example, on the USENET newsgroup rec.humor.funny, sick or offensive jokes are usually encoded so that people who do not want to read them do not have to. The code used is called ROT-13, where every letter of the alphabet is shifted by 13 letters. We can write this permutation as:

    ABCDEFGHIJKLMNOPQRSTUVWXYZ
    NOPQRSTUVWXYZABCDEFGHIJKLM

by which we mean we change all the A's to N's, B's to O's, and so forth. Using this code the message:

    WE SHOULD MAKE A BIG WOODEN HORSE

gets translated to:

    JR FUBHYQ ZNXR N OVT JBBQRA UBEFR

Given a long enough message encrypted by a substitution cypher, it is not hard to break the cypher and work out what the original message is.
The usual approach is to look at the frequency with which the letters in the encrypted message occur: if the message is long enough, then the frequency of an encrypted letter should approach the frequency of the unencrypted letter to which it corresponds. In fact, using a little educated guesswork and knowledge of English, it is often possible to work out what the unencrypted version must be for a message with as few as 50 letters. Needless to say, you do not want to be using a substitution cypher to transmit national secrets.
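
The parity check, the ISBN check digit and the ROT-13 cypher described in this section can each be sketched in a few lines of Python. This is only an illustration under stated assumptions: all function names are hypothetical, and the parity bit is appended at the end of the 7-bit codeword, a placement the text does not specify.

```python
import string


def add_even_parity(codeword7):
    """Append a parity bit so that the 8-bit string has an even number of 1s.

    Assumption: the parity bit goes last; real systems may place it first.
    """
    parity = codeword7.count("1") % 2  # 1 when the number of 1s is odd
    return codeword7 + str(parity)


def parity_ok(codeword8):
    """Return True if the received 8-bit string has an even number of 1s."""
    return codeword8.count("1") % 2 == 0


def isbn_weighted_sum(isbn):
    """Return 10a + 9b + ... + 2i + j for a 10-character ISBN.

    Dashes are ignored, and an X in the last position counts as 10.
    """
    chars = isbn.replace("-", "")
    if len(chars) != 10:
        raise ValueError("expected 10 characters besides dashes")
    digits = [int(ch) for ch in chars[:9]]
    digits.append(10 if chars[9] in "Xx" else int(chars[9]))
    return sum(w * d for w, d in zip(range(10, 0, -1), digits))


def isbn_ok(isbn):
    """Return True if the weighted sum is evenly divisible by 11."""
    return isbn_weighted_sum(isbn) % 11 == 0


# ROT-13 permutation: A..M map to N..Z and N..Z map back to A..M.
UPPER = string.ascii_uppercase
ROT13 = str.maketrans(UPPER, UPPER[13:] + UPPER[:13])


def rot13(message):
    """Apply ROT-13 to the letters of `message`; other characters are unchanged."""
    return message.upper().translate(ROT13)


print(add_even_parity("1000001"))          # 10000010
print(isbn_ok("0-19-853287-3"))            # True
print(isbn_weighted_sum("0-19-857287-3"))  # 251
print(rot13("WE SHOULD MAKE A BIG WOODEN HORSE"))
# JR FUBHYQ ZNXR N OVT JBBQRA UBEFR
```

Applying `rot13` twice returns the original message, since shifting by 13 twice is a shift by 26. Flipping any two bits of a parity-checked codeword leaves `parity_ok` returning True, which is the undetected double error described in Section 1.2.2.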