| UNLV : Math : Corran Webster : Teaching : CCC | Skip to content | |
Communication, Codes and CyphersSpring 2002 |
||
ContentsQuick Links |
Introduction to RSARecall that a number p is prime if it's only whole number divisors are 1 and itself (by convention, 1 is not considered prime). Two numbers are relatively prime if their greatest common divisor is 1, or in other words, their only common factor is 1. Example:Prime numbers are 2, 3, 5, 7, 11, 13, 17, 19, 23, ... The numbers 12 and 35 are relatively prime, since the factors of 12 are 1, 2, 3, 4, 6 and 12; while the factors of 35 are 1, 5, 7 and 35. The numbers 12 and 14 are not relatively prime, as 2 divides both numbers. An efficient way to find the greatest common divisor is the Euclidean algorithm. Procedure
ExampleConsider 35 and 12:
35 = 2 x 12 + 11
Since 11 is not 0, we try again:
12 = 1 x 11 + 1
Since 1 is not 0, we try again:
11 = 11 x 1 + 0
Now we have a remainder of 0, so we are done, and the gcd is 1. CodeIn the python programming language, the following code finds the gcd:
def gcd(a, b):
if a < b:
a, b = b, a
while b:
a, b = b, a % b
return a
Computational ComplexityIt is important to be able to estimate, at least in general terms, how efficient an algorithm is, based on the size of the inputs. Computer scientists are concerned with how many CPU operations the algorithm takes, as well as the resources such as memory or bandwidth that might be required by the algorithm. For our purposes, we will be primarily interested in how many CPU instructions are required to perform the an algorithm when the input size is n. In addition, we are really only concerned with how fast the number of operations grows as n grows. ExampleTo multiply two numbers consisiting of n digits (or bits), the simplest algorithm is the long multiplication algorithm you learnt in school. In this algorithm, you must multiply each digit from the first number by each digit in the second number, which means that you are doing n^2 multiplications. You also need to add together the lines, which means at worst about (2n)^2 digit additions. If a digit multiplication takes time m, and a digit addition takes time a, then the total running time of the long multiplication algorithm is roughly (m + 4a) n^2 The important idea here is that this quantity grows at a square rate: if you double the number of digits, you quadruple the time required to do the multiplication. Computer scientists often use "big O" notation to describe running time. An algorithm has running time O(f(n)) if (in terms of calculus):
r(n)
lim ---- = c
n->oo f(n)
where 0 < c < oo. (For those who are interested, this often requires L'Hopital's rule to calculate). In intuitive terms, this gives you an idea of how fast an algorithm gros for large inputs. ExampleLong multiplication is O(n^2). The best known multiplication algrithm is O(n log n log(log n)). For this algorithm, if you double the number of digits from 2048 to 4096, you increase the running time by a factor of only about 2.275, which is signficantly better than the factor of 4 for long multiplication. ExampleThe fastest a general purpose sorting algorithm can run is O(n log n). ExampleThe fastest known general purpose factoring algorithm is the number field sieve, which runs in:
O(e^(1.9 n^(1/3) (log n)^(2/3)))
Computer scientists are generally happy with any algorithm which runs in polynomial time or better, ie there is some polynomial such that:
r(n)
lim ---- = 0.
n->oo p(n)
A problem which can be solved by a computer by an algorithm which runs in polynomial time is of class P. ExampleMultiplication and sorting are in P. As far as we know, factoring is not. Another important class are called NP problems (NP stands for non-deterministic polynomial time). Roughly speaking, these are problems whose solutions can be verified in polynomial time. ExampleAny problem in P is in NP. Factoring is in NP, since if you know the factors, all you need to do to verify the solution is to multiply and compare the answer, which is polynomial time. The Travelling Salesman problem is NP. An important theoretical result says that there are certain NP problems, such as the Travelling Salesman problem, which if you can solve using a certain algorithm, you can adapt that algorithm to provide a solution to any NP problem, using only polynomial additional time. These sorts of problems are called NP-complete. It is conjectured that there are NP problems which are not in P, but none are known. On the other hand, some people believe that P and NP are equal. To decide one way or another, all that is needed is either a single NP problem which can be proved not to be P; or a single NP-complete problem which can be shown to be P. In any case, most public key cryptosystems have at their heart an NP problem which is not known to be in P. In essence, decoding an encrypted message is an NP problem, and the secret key is the solution which allows the message to be decrypted in polynomial time. RSA CodesRSA codes rely on the difficulty of factoring large numbers. Procedure
(That this number exists and is unique is guaranteed by a basic theorem from abstract number theory, and it is easy to find using an algorithm related to the Euclidean algorithm.)
(That you get M_k back by this process is guaranteed by a generalization of Fermat's Little Theorem, another theorem from abstract number theory.) ExampleFor simplicity, we will choose small prime numbers:
p = 5, q = 11, so n = 55
Now we pick our secret number d. d must be coprime to (5-1)(11-1) = 40, so it can't have 2 or 5 as factors. Let's choose d = 23. A quick series of calculations tells us that e = 7, since:
23 * 7 = 161 = 1 (mod 40)
So to encode something, we need to know the numbers n = 55 and e = 7. Let's encode by letting Space = 0, A = 1, ..., Z = 26, a = 27, ..., z = 52, Period = 53, Comma = 54, say. So the message:
Hello, RSA.
Becomes the sequence of numbers:
8, 31, 38, 38, 41, 54, 0, 18, 19, 1
These are encoded by raising to the 7th power, mod 55, giving the sequence:
2, 26, 47, 47, 46, 54, 0, 17, 24, 1
or, converting back to letters, "BZuut, QXA". A few simple calculations show that if you raise these numbers to the 23rd power, mod 55, you get the original numbers back. For example:
2^23 = 8388608 = 8 (mod 55)
Note that some of the numbers get quite large. For example:
47^23 = 287243845682065590744605010781602099023 = 38 (mod 55)
You can keep the numbers small by calculating the remainder at each step; you will still get the same answer overall:
47*47 = 2209 = 9 (mod 55)
47*47*47 = 47*9 = 423 = 38 (mod 55)
47*47*47*47 = 47*38 = 1786 = 26 (mod 55)
...
47^22 = 2209 = 9 (mod 55)
47^23 = 47*9 = 423 = 38 (mod 55)
(Note also that there is a lot of repetition in the intermediate powers, which can be exploited to work out the power more quickly.) This example is not particularly good, since it is really just a complicated way of generating a substitution cypher, however, if we choose larger numbers, and convert blocks of letters together, we get a quite secure code. Examplep = 73428577 is probably prime, as is q = 126733 (the probability that each is not prime is about one in 10 billion, and finding these numbers took less than a second on my Mac G4). Their product is:
n = 9305823848941
and since log_2 n = 43.08... we can encode blocks of 43 bits or less into blocks of 44 bits or more. 5 ASCII characters take up 40 bits, so given a message in ASCII, we could break into blocks of 5 characters and encode. To break this code using statistical attacks, you would need to know the distributions of all of the (roughly) 1 trillion possible 5 character blocks in a typical message, as well as a large collection of encrypted text. Breaking this code by trying to guess the factorization of n is potentially much more reasonable. The square root of n is 3050544, and a brute-force attempt to factorize this number takes only a couple of seconds on my Mac G4. How Secure is RSA?Breaking RSA means finding the secret decoding number d for a given public key (n, e). If you know d, then you can read any message that is transmitted. Factoring MethodsThe simplest way to break RSA is to factor the number n (which is publically available) into the numbers p and q; given these you can find the secret key d from the number e in the same way that the number e was generated from d. A 2 MHz PC can execute about 63 x 10^12 operations in a year. As a (very) rough ballpark figure, that might translate to checking about 10^12 numbers to see if they divide a given number n using a simple brute-force method. This means that any secret key with less than about 80 bits can be easily broken in about a year by a single desktop PC. Or in a single day by 365 desktop PCs splitting the problem between them. Cleverer algorithms on faster hardware can potentially do much better. A team led by Nicko van Someren claims to be able to factor 512 bit RSA keys in about 6 weeks with spare off-the-shelf hardware from their lab; a paper distributed earlier this year by Bernstein describes how a special purpose chip could be designed and built which could factor 1024 bit RSA keys in a matter of seconds. While the cost of creating such a chip is high, most major governments already have the infrastructure to do so, and it is likely that if they don't have such chips already, then they will soon. Doubling the number of bits in an RSA key roughly multiplies by 8 the time required to create and use the code; however it increases the required time for factorization by a much larger amount. Using the best known general purpose factoring algorithm for large numbers, the number field sieve, doubling the key size from 1024 to 2048 bits increases the time required to break the key by a factor of about 22.8 billion. Putting this in terms of Bernstein's special purpose chip: if this chip could break 1024 bit RSA in a second, then it would take over 700 years to break 2048 bit RSA. This is probably not enough to be considered secure, since anyone who can build one machine could potentially build several thousand and break the code in a few weeks. Increasing from 1024 to 4096 bits increases the time needed for usage of the code by a factor of 64, but increases the cracking time by a factor of 9.2 x 10^23: a hypothetical Bernstein machine would need 27984 trillion years to break it. Moore's LawMoore's Law states that the amount of computing power roughly doubles every year so. This is an emprical law, first stated by Gordon Moore (a co-founder of Intel) in terms of the number of transistors that can be put onto a chip, but seems to also apply to processor speed, computer memory sizes and storage sizes. It was originally proposed in 1965, and held true until the 1970's. Since then the rate of doubling has slowed to about every 18 months to two years, but most observers (including Moore himself) think that it will continue to hold for at least another 30 years or so. What this means is that you have a problem that might take 100 years to solve on today's hardware, then according to Moore's law, in 2010 you'll be able to solve it in about a year using the hardware available then, and in 2020 you'll be able to solve it in a few days of computing time. If Moore's law holds for another 100 years, it is likely that the 4096 bit RSA keys (which would take 27984 trillion years with current technology) could be broken in less than a year. What this means is that the fastest way to factor a large number may well be to wait until computers are powerful enough to factor the number in a reasonable time. From a security standpoint, it means that you should be extremely skeptical of claims about In conclusion, given the best publically known factoring algorithms, 4096 bit RSA is secure against anyone for practical purposes; and 1024 bit RSA is secure against anyone but major govenments and corporations. All this could change if someone came up with a more efficient factoring algorithm, or some new form of computing technology which is significantly more efficient than the present technology. It has been shown that so-called quantum computers can, theoretically, factor numbers extremely quickly. However no one has yet built a quantum computer, and even components that would be needed to build one are extremely limited - the largest collection of quantum memory assembled to date could hold 7 bits of data. There is also an important conjecture (that P=NP) in computation theory which, if true, would mean that factoring numbers could be done much more quickly. Other AttacksWe really don't need to know p and q to be able to calculate d. In particular, if you can find the number (p - 1)(q - 1) you can find d fairly easily. This number happens to be equal to the Euler phi function of n, which is defined to be the number of numbers less than n which are relatively prime to it. If there were an efficient way to calculate this Euler phi function for a given number n, then this would give a way to break RSA. All known ways of calculating the Euler phi function require knowing p and q, however; and given the Euler phi function, you can find p and q. This means that this attack is likely to be as hard as factoring n. Another way of looking at the problem is that, since C is M raised to the n-th power (mod n), all we need is an efficient way to find e-th roots (mod n). Very little is known about this problem, but it is likely to be hard to solve as well. In conclusion, breaking RSA could potentially be much easier than factoring large numbers. |
|
|
|
||