Copyright

Library of Congress Cataloging-in-Publication Data

Das, Abhijit.
     Public-key cryptography : theory and practice / Abhijit Das, C. E. Veni Madhavan.
               p. cm.
     Includes bibliographical references and index.
     ISBN: 978-8131708323 (pbk.)
  1. Public key cryptography. 2. Telecommunication—Security
measures-Mathematics. 3. Computers-Access control-Mathematics. I. Madhavan,
C. E. Veni. II. Title.     TK5102.94.D37 2009
     005.8'2-dc22
                                                                           2009012766

Copyright © 2009 Dorling Kindersley (India) Pvt. Ltd.

Licensees of Pearson Education in South Asia

This book is sold subject to the condition that it shall not, by way of trade or otherwise, be lent, resold, hired out, or otherwise circulated without the publisher’s prior written consent in any form of binding or cover other than that in which it is published and without a similar condition including this condition being imposed on the subsequent purchaser. Without limiting the rights under copyright reserved above, no part of this publication may be reproduced, stored in or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording or otherwise), without the prior written permission of both the copyright owner and the above-mentioned publisher of this book.

ISBN 9788131708323

Head Office: 482 FIE, Patparganj, Delhi 110 092, India

Registered Office: 14 Local Shopping Centre, Panchsheel Park, New Delhi 110 017, India

Printed in India.

Pearson Education Inc., Upper Saddle River, NJ
Pearson Education Ltd., London
Pearson Education Australia Pty, Limited, Sydney
Pearson Education Singapore, Pte. Ltd
Pearson Education North Asia Ltd, Hong Kong
Pearson Education Canada, Ltd., Toronto
Pearson Educacion de Mexico, S.A. de C.V.
Pearson Education-Japan, Tokyo
Pearson Education Malaysia, Pte. Ltd.

Preface

I can’t understand why a person will take a year to write a novel when he can easily buy one for a few dollars.

—Fred Allen

The first question that we, like most authors, faced is: “Why another book?” Textbooks on public-key cryptography (or on cryptography in general) are plentiful [37, 74, 113, 114, 145, 152, 153, 194, 209, 262, 283, 288, 291, 296]. In the presence of all these books, writing another may sound like a waste of energy and effort.

Fortunately, we have a substantial answer. Most cryptography textbooks today, even many of the celebrated ones, essentially take a narrative approach. While such an approach may be suitable for beginners at the undergraduate level, it misses the finer details of this rapidly growing area of applied mathematics. That public-key cryptography is mathematical is hard to deny, and a mathematical subject is best treated mathematically.

This is precisely the point that this book addresses: it proceeds in a canonically mathematical way while developing cryptographic concepts. The mathematics involved is often not so simple (which is perhaps why other textbooks do not bother to mention it), but we maintain mathematical rigour as far as possible. A typical feature of this book is that it does not rely on anything other than the reader’s mathematical intuition; it develops all the mathematical abstractions from scratch. Although computer science and mathematics students nowadays undergo some courses on discrete structures somewhere in their curricula, we do not assume this; instead we develop the algebra starting at the level of set operations. Simpler structures like groups, rings and fields are followed by more complex concepts like finite fields, algebraic curves, number fields and p-adic numbers. The resulting (long) compilation of abstract mathematical tools should relieve cryptography students and researchers from consulting many mathematics books for the background concepts. We are happy to offer this self-sufficient treatment complete with proofs and other details. The only place where we had to be somewhat sketchy is the discussion of elliptic and hyperelliptic curves. The mathematics here is too vast to fit in a few pages, and we opted for a deliberate simplification of these topics.

A big problem with discrete mathematics is that many of its proofs are existential. In order to make things work in a practical environment, however, one must study the algorithmic side of algebra and number theory. This is what our book does next. While many algorithmic issues in this area are settled favourably, there remain problems whose best known algorithmic complexities are still poor. Some of these so-called computationally difficult problems are used to build secure public-key cryptosystems. The security of these systems is assumed (rather than proven), and so we deal extensively with the algorithms known to date for solving these difficult problems. It is precisely here that the mathematics developed in the earlier chapters is put to use, to a great extent.

Chapter 5 is the culmination of all these mathematical and algorithmic studies: the design of public-key systems for achieving various cryptographic goals. With the theoretical base developed in the earlier chapters, Chapter 5 turns out to be an easy chapter. This is our way of looking at the problem, namely a formal bottom–up approach. We claim to be different from most textbooks in this regard. Our discussion of mathematics is not for its own sake, but to develop the foundation of cryptographic primitives.

We then turn to some purely practical and implementation-related issues of public-key cryptography. Standards proposed by organizations such as IEEE and RSA Security Inc. promote the interoperable use of crypto primitives in Internet applications. We then look at some small applications of the crypto basics. Some indirect ways of cryptanalysis are described next. These techniques (side-channel and backdoor attacks) give the book a strong practical flavour in tandem with its otherwise formal appearance.

As an eleventh-hour decision, we added a final chapter to the book, on quantum computation and its implications for public-key cryptography. Although somewhat theoretical at this point, quantum computation has important ramifications for public-key cryptography. The mathematics behind quantum mechanics and computation is not developed in the earlier chapters, which only highlights the distinctive nature of this chapter; it might well be titled “cryptography of the future”.

This schematic description perhaps makes it clear that the book is best suited as a graduate-level textbook. A one- or two-semester graduate or advanced undergraduate course can be based on its contents. Self-study is also possible at an advanced graduate or research level, but is expected to be difficult at the undergraduate level. We highlight the importance of classroom teaching if an undergraduate course is to be based on this textbook.

We have rated the items in the book by their levels of difficulty and/or mathematical sophistication. Unstarred items can be covered even in undergraduate courses. Items marked with a single star are suitable for a second course or a second reading. Doubly starred items, on the other hand, are research-level material and can be pursued only in really advanced courses or for research. The inclusion of a good amount of these advanced topics marks another distinction of this book from other available textbooks.

The book comes with plenty of exercises, with a twofold motivation. First, they help the readers deepen their understanding of the material discussed in the text. Second, some of these exercises build additional theory that we omit from the text proper. We occasionally make use of these additional topics in proving and/or explaining results in the text. We do not classify the exercises into easy and difficult ones, but provide hints, some of them quite explicit, for the intellectually challenging parts. We collect the hints in an appendix near the end of the book and leave the marker [H] at the appropriate locations in the statements of the exercises. This practice prevents a reader from accidentally seeing a hint; only when stuck need the reader look at the hints at the end. We believe that the exercises, together with our discussion of algorithms and implementation issues, will offer serious students many ways to carry out substantial implementation work to further their research and development in cryptography.

Every chapter ends with annotated references for further study. We do not claim to be encyclopaedic in this respect. Instead we mention only those references that, we feel, are directly related to the topics dealt with in the respective chapters.

As a trade-off between bulk and coverage, we had to leave many issues untouched. For example, constraints of space prevented us from presenting symmetric-key cryptography in detail. In view of its importance today, however, we include brief discussions of block ciphers, stream ciphers and hash functions in an appendix. We also do not discuss the formal security of public-key protocols. The issues related to provable security are, at a minimum, theoretically important in the study of cryptography, but are left out here; only a brief discussion of the implications of complexity theory for the security of public-key protocols is included in another appendix. The Handbook of Applied Cryptography [194] by Menezes et al. can supplement this book for learning symmetric techniques, whereas the book by Delfs and Knebl [74] or those by Goldreich [113, 114] can be consulted for formal security issues.

We are indebted to everybody whose criticism, encouragement and support made this project possible. Special thanks go to Bimal Roy, Chandan Mazumdar, C. Pandurangan, Debdeep Mukhopadhyay, Dipanwita Roychowdhury, Gagan Garg, Hartmut Wiebe, H. V. Kumar Swamy, Indranil Sengupta, Kapil Paranjape, Manindra Agarwal, Palash Sarkar, Rajesh Pillai, Rana Barua, R. Balasubramanian, Sanjay Barman, Shailesh, Satrajit Ghosh, Souvik Bhattacherjee, Srihari Vavilapalli, Subhamoy Maitra, Surjyakanta Mohapatro, and Uwe Storch. This book has been tested in postgraduate courses at the Indian Institute of Science, Bangalore, and at the Indian Institute of Technology Kharagpur. We sincerely thank all our students for pointing out many errors and suggesting several improvements. We express our deep gratitude to our family members for their constant understanding and moral support. We are also indebted to our institutes for providing the wonderful intellectual climate for completing this work.

A. D.

C. E. V. M.

Notations

Any time you are stuck on a problem, introduce more notation.

—Chris Skinner [Plenary Lecture, Aug 1997, Topics in Number Theory, Penn State]

General
|a|  absolute value of real number a
min S  minimum of elements of set S
max S  maximum of elements of set S
exp(a)  e^a, where e is the base of natural logarithms
log x  logarithm of x with respect to some unspecified base (like 10)
ln x  log_e x, the natural logarithm of x
lg x  log_2 x
log^k x  (log x)^k (similarly, ln^k x = (ln x)^k and lg^k x = (lg x)^k)
:=  is defined as (or “is assigned the value” in code snippets)
i  the imaginary unit, that is, √–1
z̄  complex conjugate (x – iy) of the complex number z = x + iy
δij  Kronecker delta
(a_s a_{s–1} . . . a_0)_b  b-ary representation of a non-negative integer
binomial coefficient, equals n(n – 1) ··· (n – r + 1)/r!
⌊x⌋  floor of real number x
⌈x⌉  ceiling of real number x
[a, b]  closed interval, that is, the set of real numbers x in the range a ≤ x ≤ b
(a, b)  open interval, that is, the set of real numbers x in the range a < x < b
L(t, α, c)  expression of the form exp((c + o(1))(ln t)^α (ln ln t)^(1–α))
L_t[c]  abbreviation for L(t, 1/2, c) (denoted also as L[c] if t is understood)
Bit-wise operations (on bit strings a, b)
NAND  negation of AND
NOR  negation of OR
XOR  exclusive OR
a ⊕ b  bit-wise exclusive OR (XOR) of a and b
a AND b  bit-wise AND of a and b
a OR b  bit-wise inclusive OR of a and b
LSk(a)  left shift of a by k bits
RSk(a)  right shift of a by k bits
LRk(a)  left rotate (cyclic left shift) of a by k bits
RRk(a)  right rotate (cyclic right shift) of a by k bits
ā  bit-wise complement of a
a ‖ b  concatenation of a and b
Sets
∅  empty set
#A  cardinality of set A
a ∈ A  a is an element of set A
A ⊆ B  set A is contained in set B
A ⊈ B  set A is not contained in set B
A ⊊ B  set A is properly contained in set B
A ∪ B  union of sets A and B
A ⊔ B  disjoint union of sets A and B
A ∩ B  intersection of sets A and B
A \ B  difference of sets A and B
Ā  complement of set A (in a bigger set)
A × B  (Cartesian) product of sets A and B
ℕ  set of all natural numbers, that is, {1, 2, 3, . . .}
ℕ₀  set of all non-negative integers, that is, {0, 1, 2, . . .}
ℤ  set of all integers, that is, {. . . , –2, –1, 0, 1, 2, . . .}
ℙ  set of all (positive) prime numbers, that is, {2, 3, 5, 7, . . .}
ℚ  set of all rational numbers, that is, {a/b | a, b ∈ ℤ, b ≠ 0}
ℚ*  set of all non-zero rational numbers
ℝ  set of all real numbers
ℝ*  set of all non-zero real numbers
ℝ≥0  set of all non-negative real numbers
ℂ  set of all complex numbers
ℂ*  set of all non-zero complex numbers
ℤn  ring of integers modulo n, can be represented by the set {0, 1, . . . , n – 1}
ℤn*  group of units in ℤn, can be represented as {a | 0 ≤ a < n, gcd(a, n) = 1}
𝔽q  finite field of cardinality q
𝔽q*  multiplicative group of 𝔽q, that is, 𝔽q \ {0}
𝒪K  ring of integers of number field K
𝒪K*  group of units of 𝒪K
ℤp  ring of p-adic integers
ℚp  field of p-adic numbers
Up  group of units of ℤp
Functions and relations
f : A → B  f is a function from set A to set B
f : A ↪ B  f is an injective function from set A to set B
f : A ↠ B  f is a surjective function from set A to set B
a ↦ b  a is mapped to b (by a function)
f ∘ g  composition of functions f and g (applied from right to left)
f⁻¹  inverse of bijective function f
Ker f  kernel of function (homomorphism) f
Im f  image of function f
~  equivalent to
[a]  equivalence class of a
Groups
aH  coset in a multiplicative group
a + H  coset in an additive group
HK  internal direct product of (sub)groups H and K
H × K  external direct product of (sub)groups H and K
[G : H]  index of subgroup H in group G
G/H  quotient group
G1 ≅ G2  groups G1 and G2 are isomorphic
ord G  order (that is, cardinality) of group G
ordG a  order of element a in group G
Exp G  exponent of group G
Z(G)  centre of group G
C(a)  centralizer of group element a
GLn(K)  general linear group over field K (of n × n matrices)
SLn(K)  special linear group over field K (of n × n matrices)
Gtors  torsion subgroup of G
Rings
char A  characteristic of ring A
A × B  direct product of rings A and B
A*  multiplicative group of units of ring A
⟨S⟩  for ring A, the ideal generated by S ⊆ A
⟨a⟩  for ring A, the principal ideal generated by a ∈ A, also written as aA and Aa
a ≡ b (mod 𝔞)  a is congruent to b modulo ideal 𝔞, that is, a – b ∈ 𝔞
A ≅ B  rings A and B are isomorphic
A/𝔞  quotient ring (modulo ideal 𝔞)
a|b  a divides b (in some ring)
vp(a)  multiplicity of prime p in element a
p^k ‖ a  k = vp(a)
nilradical of ring A
Ared  reduction of ring A, equals A modulo its nilradical
gcd(a, b)  greatest common divisor of elements a and b
lcm(a, b)  least common multiple of elements a and b
𝔞 + 𝔟  sum of ideals 𝔞 and 𝔟
𝔞 ∩ 𝔟  intersection of ideals 𝔞 and 𝔟
𝔞𝔟  product of ideals 𝔞 and 𝔟
√𝔞  root (or radical) of ideal 𝔞
Q(A)  total quotient ring of ring A (quotient field of A, if A is an integral domain)
S⁻¹A  localization of ring A at multiplicative set S
A𝔭  localization of ring A at prime ideal 𝔭
𝒪K  ring of integers of number field K
N(𝔞)  norm of ideal 𝔞 (in a Dedekind domain)
CRT  Chinese remainder theorem
ED  Euclidean domain
DD  Dedekind domain
DVD (or DVR)  discrete valuation domain (or ring)
PID  principal ideal domain
UFD  unique factorization domain
Fields
char K  characteristic of field K
K*  multiplicative group of units of field K, that is, K \ {0}
K̄  algebraic closure of field K
[K : F]  degree of the field extension F ⊆ K
K[a]  {f(a) | f(X) ∈ K[X]}
K(a)  {f(a)/g(a) | f(X), g(X) ∈ K[X], g(a) ≠ 0}
Aut K  group of automorphisms of field K
AutF K  for field extension F ⊆ K, group of F-automorphisms of K (also Gal(K|F))
FixF H  for field extension F ⊆ K, fixed field of subgroup H of AutF K
𝔽q  finite field of cardinality q
𝔽q*  multiplicative group of units of 𝔽q, that is, 𝔽q \ {0}
Tr  trace function
TrK|F (a)  for field extension F ⊆ K, trace of a ∈ K over F
N  norm function
NK|F (a)  for field extension F ⊆ K, norm of a ∈ K over F
Frobenius automorphism, a ↦ a^q
𝒪K  ring of integers of number field K
𝒪K*  group of units of 𝒪K
ΔK  discriminant of number field K
ℤp  ring of p-adic integers
ℚp  field of p-adic numbers
Up  group of units of ℤp
| |p  p-adic norm on ℚp
Integers
a quot b  quotient of Euclidean division of a by b ≠ 0
a rem b  remainder of Euclidean division of a by b ≠ 0
a|b  a divides b in ℤ, that is, b = ca for some c ∈ ℤ
vp(a)  multiplicity of prime p in non-zero integer a
gcd(a, b)  greatest common divisor of integers a and b (not both zero)
lcm(a, b)  least common multiple of integers a and b
a ≡ b (mod n)  a is congruent to b modulo n
a⁻¹ (mod n)  multiplicative inverse of a modulo n (given that gcd(a, n) = 1)
φ(n)  Euler’s totient function
Legendre (or Jacobi) symbol
[a]n  coset of a modulo n
ordn a  multiplicative order of a modulo n (given that gcd(a, n) = 1)
μ(n)  Möbius function
π(x)  number of primes between 1 and positive real number x
Li(x)  Gauss’ Li function
ψ(x, y)  fraction of positive integers ≤ x, that are y-smooth
ζ(s)  Riemann zeta function
RH  Riemann hypothesis
ERH  extended Riemann hypothesis
Mn  2^n – 1 (Mersenne number)
2^32, the standard radix for representation of multiple-precision integers
Polynomials
A[X1, . . . , Xn]  polynomial ring in indeterminates X1, . . . , Xn over ring A
A(X1, . . . , Xn)  ring of rational functions in indeterminates X1, . . . , Xn over ring A
deg f  degree of polynomial f
lc f  leading coefficient of polynomial f
minpolyα,K(X)  minimal polynomial of α over field K, belongs to K[X]
cont f  content of polynomial f
pp f  primitive part of polynomial f
f′(X)  formal derivative of polynomial f(X)
Δ(f)  discriminant of polynomial f
the polynomial
μm  group of m-th roots of unity
Φm  m-th cyclotomic polynomial
Vector spaces, modules and matrices
dimK V  dimension of vector space V over field K
Span S  span of subset S of a vector space
HomK(V, W)  set of all K-linear transformations V → W
EndK(V)  set of all K-linear transformations V → V
M/N  quotient vector space or module
M ≅ N  vector spaces or modules M and N are isomorphic
∏ Mi  direct product of modules Mi, i ∈ I
⊕ Mi  direct sum of modules Mi, i ∈ I
At  transpose of matrix (or vector) A
A⁻¹  inverse of matrix A
Rank T  rank of matrix or linear transformation T
RankA M  rank of A-module M
Null T  nullity of matrix or linear transformation T
(M : N)  for A-module M and submodule N, the ideal {a ∈ A | aM ⊆ N} of A
AnnA(M)  annihilator of A-module M, same as (M : 0)
Tors M  torsion submodule of M
A[S]  A-algebra generated by set S
⟨v, w⟩  inner product of two real vectors v and w
Algebraic curves
𝔸n  n-dimensional affine space over field K
ℙn  n-dimensional projective space over field K
(x1, . . . , xn)  homogeneous coordinates of a point in 𝔸n
[x0, x1, . . . , xn]  projective coordinates of a point in ℙn
f^(h)  homogenization of polynomial f
C(K)  set of K-rational points on curve C defined over field K
K[C]  ring of polynomial functions on curve C defined over K
K(C)  field of rational functions on curve C defined over K
[P]  point P on a curve in formal sums
ordP (r)  order of rational function r at point P
DivK (C)  group of divisors on curve C defined over field K
Div⁰K (C)  group of divisors of degree 0 on curve C defined over field K
DivK(r)  divisor of a rational function r
PrinK(C)  group of principal divisors on curve C defined over field K
JK(C)  Jacobian of curve C defined over field K
PicK(C)  Picard group of curve C (equals DivK(C)/PrinK(C))
Pic⁰K(C)  Div⁰K(C)/PrinK(C), same as Jacobian
∞  point at infinity on an elliptic or a hyperelliptic curve
Δ(E)  discriminant of elliptic curve E
j(E)  j-invariant of elliptic curve E
E(K)  group of points on elliptic curve E defined over field K
P + Q  sum of two points P, Q on an elliptic curve
mP  m-th multiple (that is, m-fold sum) of point P
ψm, , fm  m-th division polynomials
t  trace of Frobenius of elliptic curve
EK[m]  group of m-torsion points in E(K)
E[m]  abbreviation for EK̄[m]
em  Weil pairing (a map E[m] × E[m] → μm)
Div(a, b)  representation of reduced divisor on hyperelliptic curve by polynomials a, b
Probability and statistics
Pr(E)  probability of event E
Pr(E1|E2)  conditional probability of event E1 given event E2
E(X)  expectation of random variable X
Var(X)  variance of random variable X
σX  standard deviation of random variable X (equals √Var(X))
Cov(X, Y)  covariance of random variables X, Y
ρX,Y  correlation coefficient of random variables X, Y
Computational complexity
f = O(g)  big-Oh notation: f is of the order of g
f = Ω(g)  big-Omega notation: g is of the order of f
f = Θ(g)  big-Theta notation: f and g have the same order
f = o(g)  small-oh notation: f is of strictly smaller order than g
f = ω(g)  small-omega notation: f is of strictly larger order than g
f = O~(g)  soft-Oh notation: f = O(g log^k g) for real constant k ≥ 0
problem P1 is polynomial-time reducible to problem P2
P1 ≡ P2  problems P1 and P2 are polynomial-time equivalent
Intractable problems
CVP  closest vector problem
DHP  (finite field) Diffie–Hellman problem
DLP  (finite field) discrete logarithm problem
ECDHP  elliptic curve Diffie–Hellman problem
ECDLP  elliptic curve discrete logarithm problem
HECDHP  hyperelliptic curve Diffie–Hellman problem
HECDLP  hyperelliptic curve discrete logarithm problem
GIFP  general integer factorization problem
IFP  integer factorization problem
QRP  quadratic residuosity problem
RSAIFP  RSA integer factorization problem
RSAKIP  RSA key inversion problem
RSAP  RSA problem
SQRTP  modular square root problem
SSP  subset sum problem
SVP  shortest vector problem
Algorithms
ADH  Adleman, DeMarrais and Huang’s algorithm
AES  advanced encryption standard
AKS  Agarwal, Kayal and Saxena’s deterministic primality test
BSGS  Shanks’ baby-step–giant-step method
CBC  cipher-block chaining mode
CFB  cipher feedback mode
CSM  cubic sieve method
CSPRBG  cryptographically strong pseudorandom bit generator
CvA  Chaum and Van Antwerpen’s undeniable signature scheme
DDF  distinct-degree factorization
DES  data encryption standard
DH  Diffie–Hellman key exchange
DPA  differential power analysis
DSA  digital signature algorithm
DSS  digital signature standard
ECB  electronic codebook mode
ECDSA  elliptic curve digital signature algorithm
ECM  elliptic curve method
E-D-E  encryption–decryption–encryption scheme of triple encryption
EDF  equal-degree factorization
EG  Eschenauer and Gligor’s scheme
FEAL  fast data encipherment algorithm
FFS  Feige, Fiat and Shamir’s zero-knowledge protocol
GKR  Gennaro, Krawczyk and Rabin’s RSA-based undeniable signature scheme
GNFSM  general number field sieve method
GQ  Guillou and Quisquater’s zero-knowledge protocol
HFE  cryptosystem based on hidden field equations
ICM  index calculus method
IDEA  international data encryption algorithm
KLCHKP  braid group cryptosystem
L3  Lenstra–Lenstra–Lovász algorithm
LFSR  linear feedback shift register
LSM  linear sieve method
LUC  cryptosystem based on Lucas sequences
MOV  Menezes, Okamoto and Vanstone’s reduction
MPQSM  multiple polynomial quadratic sieve method
MQV  Menezes–Qu–Vanstone key exchange
NFSM  number field sieve method
NR  Nyberg–Rueppel signature algorithm
NTRU  Hoffstein, Pipher and Silverman’s encryption algorithm
NTRUSign  NTRU signature algorithm
OAEP  optimal asymmetric encryption procedure
OFB  output feedback mode
PAP  pretty awful privacy
PGP  pretty good privacy
PH  Pohlig–Hellman method
PRBG  pseudorandom bit generator
PSS  probabilistic signature scheme
QSM  quadratic sieve method
RSA  Rivest, Shamir and Adleman’s algorithm
SAFER  secure and fast encryption routine
Satoh–FGH  point counting algorithm on elliptic curves over fields of characteristic 2
SDSA  shortened digital signature algorithm
SEA  Schoof, Elkies and Atkin’s algorithm for point counting on elliptic curves
SETUP  secretly embedded trapdoor with universal protection
SFF  square-free factorization
SHA  secure hash algorithm
SmartASS  algorithm for computing discrete logs in anomalous elliptic curves
SNFSM  special number field sieve method
SPA  simple power analysis
TWINKLE  the Weizmann Institute key location engine
TWIRL  the Weizmann Institute relation locator
XCM  xedni calculus method
XSL  extended sparse linearization attack
XTR  efficient and compact subgroup trace representation
ZK  zero-knowledge
Quantum computation
|ψ〉  ket notation for vector ψ
〈φ|ψ〉  inner product of vectors |ψ〉 and |φ〉
‖ψ‖  norm of vector |ψ〉 (equals √〈ψ|ψ〉)
ℋn  n-dimensional Hilbert space (over ℂ)
|0〉, |1〉, . . . , |n – 1〉  orthonormal basis of ℋn
cbit  classical bit
qubit  quantum bit
⊗  tensor product of Hilbert spaces
F  Fourier transform
H  Hadamard transform
I  Identity transform
X  Exchange transform
Z  Z transform
Computational primitives
ulong  32-bit unsigned integer data type (unsigned long)
ullong  64-bit unsigned integer data type (unsigned long long)
a := b  assignment operator (returns the value assigned)
+, –, ×, /, %  arithmetic operators
++, – –  increment and decrement operators
a ◊= b  a := a ◊ b for a binary operator ◊
=, ≠, >, <, ≥, ≤  comparison operators
1  True as a condition
if  conditional statement: if (condition) ···
if-else  conditional statement: if (condition) ··· , else ···
while  while loop: while (condition) ···
do  do loop: do ··· while (condition)
for  for loop: for (range of values) ···
{···}  block of statements
, or . or new-line  statement terminator
/* ··· */  comment
return  return from this routine
Miscellaneous
end of (visible or invisible) proof
end of item (like example, definition, assumption)
[H]  hint available in Appendix D

1. Overview

1.1  Introduction
1.2  Common Cryptographic Primitives
1.3  Public-key Cryptography
1.4  Some Cryptographic Terms
     Chapter Summary

Aller Anfang ist schwer: All beginnings are difficult.

—German proverb

Defendit numerus: There is safety in numbers.

—Anonymous

The ability to quote is a serviceable substitute for wit.

—W. Somerset Maugham

1.1. Introduction

It is rather difficult to give a precise definition of cryptography. Loosely speaking, it is the science (or art or technology) of preventing unauthorized parties from accessing sensitive data. Secure transmission of messages over a public channel is the first, simplest and oldest example of a cryptographic protocol. For assessing the security of such protocols, one studies their possible weak points, namely the strategies for breaking them. This study is commonly referred to as cryptanalysis. Finally, the study of both cryptography and cryptanalysis together is known as cryptology.

Cryptology = Cryptography + Cryptanalysis

The science of cryptology is rather old. It developed naturally as and when human beings felt the need for privacy and secrecy. The rapid deployment of the Internet in recent years demands that we look into the subject with renewed interest. Newer requirements tailored to Internet applications keep cropping up, and as a result newer methods, protocols and algorithms keep appearing. The most startling discoveries include the key-exchange protocol of Diffie and Hellman in 1976 and the RSA cryptosystem of Rivest, Shamir and Adleman in 1978. These opened up a new branch of cryptology, namely public-key cryptology. Historically, public-key technology came earlier than the Internet, but it is the latter that makes extensive use of the former.

This book is an attempt to introduce the reader to the vast and interesting branch of public-key cryptology. One of the most distinguishing features of public-key cryptology is that it involves a fair amount of abstract mathematics, which often stands in the way of a complete understanding for the uninitiated reader. This book tries to bridge that gap: we develop the required mathematics in necessary and sufficient detail.

This chapter is an overview of the topics that the rest of the book deals with. We start with a description of the most common cryptographic protocols. Then we introduce the public-key paradigm and discuss the source of its security. We use certain mathematical terms and notations throughout this chapter. If the reader is not already familiar with these terms, there is nothing to worry about. As we have just claimed, we will introduce the mathematics in the later chapters. The exposition of this chapter is expected to give the reader an overview of the area of public-key cryptography and also the requisite motivation for learning the mathematical tools that follow.

1.2. Common Cryptographic Primitives

As claimed at the outset of this chapter, it is rather difficult to give a precise definition of the term cryptography. The best way to understand it is by examples. In this section, we briefly describe the common problems that cryptography deals with.

1.2.1. The Classical Problem: Secure Transmission of Messages

To start with, we introduce the legendary figures of cryptography: Alice, Bob and Carol. Alice wants to send a message to Bob over a public communication channel like the Internet and wants to ensure that nobody other than Bob can make out the meaning of the message. A third party like Carol, who has access to the communication channel, can intercept the message. But the message should be wrapped or transformed before transmission in such a way that knowledge of some secret piece of information is needed to unwrap or transform back the message. It is Bob who has this information, but not Carol (nor Dorothy nor Emily nor . . .).

It is expedient to point out here that Alice, Bob and Carol need not be human beings. They can stand for organizations (like banks) or, more correctly, for computers or computer programs run by individuals or organizations. It is, therefore, customary to call them parties, entities or subjects instead of persons or characters. In the cryptology jargon, Carol goes by several interchangeable names: adversary, eavesdropper, opponent, intruder, attacker and enemy are the most common ones. When a message transmission like the one just mentioned is involved, Alice is called the sender and Bob the receiver of the message.

It is a natural strategy to put the message in a box and lock the box using a key, called the encryption key. A matching decryption key is needed to unlock the box and retrieve the message. The process of putting the message in the box is commonly called encoding and that of locking the box is called encryption. The reverse processes, namely unlocking the box and taking the message out of the box are respectively called decryption and decoding. This is precisely the classical encryption–decryption protocol of cryptography.[1]

[1] Some people prefer to use the terms enciphering and deciphering in place of the words encryption and decryption respectively.

In the world of electronic communication, a message M is usually a bit string, and encoding, encryption, decryption and decoding are well-defined transformations of bit strings. If we denote by fe the transformation consisting of encoding and encryption, then we get a new bit string C = fe(M, Ke), where Ke stands for the encryption key. This bit string C is sent over the communication channel. After Bob receives C, he uses the reverse transformation fd (decryption followed by decoding) to get the original message M back; that is, M = fd(C, Kd). Note that the decryption key Kd is needed as an argument to fd. If Carol does not know Kd, she cannot compute M. We conventionally call M the plaintext message and C the ciphertext message.

The encoding and decoding operations do not make use of keys and can be performed by anybody. (It should not be difficult to put a letter in or take a letter out of an unlocked box!) One might then wonder why it is necessary to do these transformations instead of applying the encryption and decryption operations directly on M and C respectively. With whatever we have discussed so far, we cannot give a full answer to this question. For the answer, we will need to wait until we reach the later chapters. We only mention here that the encryption algorithms often require as input some mathematical entities (like integers or elements of a field) which are logically not bit strings. But that’s not all! As we see later, the additional transformations often add to the security of the protocols. On the other hand, for a general discussion, it is often unnecessary to start from the encoding process and end at the decoding process. As a result, we will assume, unless otherwise stated, that M is the input to the encryption routine and the output of the decryption routine, in which case fe and fd stand for the encryption and decryption functions only.

Symmetric-key or secret-key cryptography

In the simplest form of locking mechanism, one has Ke = Kd. That is, the same key, called the symmetric key or the secret key, is used for both encryption and decryption. Common examples of such symmetric-key algorithms include DES (Data Encryption Standard) together with its various modifications like Triple DES and DES-X, IDEA (International Data Encryption Algorithm), SAFER (Secure And Fast Encryption Routine), FEAL (Fast data Encipherment Algorithm), Blowfish, RC5 and AES (Advanced Encryption Standard). We will not describe all these algorithms in this book. Interested readers can consult the abundant literature on them.

Asymmetric-key or public-key cryptography

The biggest disadvantage of using a secret-key system is that Alice and Bob must agree upon the key Ke = Kd secretly, for example, by personal contact or over a secure channel. This is a serious limitation and is often impractical, or even impossible. Another drawback of secret-key systems is that every pair of parties needs its own key for communication. Thus, if there are n entities communicating over a network, the number of keys is of the order of n². Also, each entity has to remember O(n) keys for communicating with the other entities. In practice, an entity does not communicate with every other entity on the network, yet the total number of keys to be remembered by an entity can still be quite high.
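A quick count makes this key-management burden concrete. The sketch below (plain Python, with illustrative party counts only) compares the roughly n²/2 pairwise secret keys against the n key pairs needed in a public-key setting.

```python
# Number of secret keys needed for pairwise secure channels among n
# parties, versus the number of key pairs in a public-key setting.

def pairwise_keys(n):
    # every unordered pair of parties shares one secret key
    return n * (n - 1) // 2

def public_key_pairs(n):
    # each party publishes a single key pair, regardless of n
    return n

for n in (10, 100, 1000):
    print(n, pairwise_keys(n), public_key_pairs(n))
```

For 1000 parties, pairwise keying already requires 499500 distinct secret keys, while the public-key setting needs only 1000 key pairs.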

Both these problems can be avoided by using what is called an asymmetric-key or a public-key protocol. In such a protocol, each entity decides a key pair (Ke, Kd), makes the encryption key Ke public and keeps the decryption key Kd secret. Ke is also called the public key and Kd the private key. Anybody who wants to send a message to Bob gets Bob’s public key, encrypts the message with the key, and sends the ciphertext to Bob. Upon receiving the ciphertext, Bob uses his private key to decrypt the message. One may view such a lock as a self-locking padlock. Anybody can lock a box with a self-locking padlock, but opening it requires a key which only Bob possesses.

The security of such a system rests on the difficulty of computing the private key Kd from the public key Ke. It is apparent that Ke and Kd are inverses of each other in a sense, because the former is used to generate C from M and the latter to recover M from C. This is where mathematics comes into the picture. We mention a few possible constructions of key pairs in the next section; the rest of the book deals with an in-depth study of these public-key protocols.

Attractive as they look, public-key protocols have a serious drawback, namely that they are orders of magnitude slower than their secret-key counterparts. This is a concern when huge amounts of data need to be encrypted and decrypted. The shortcoming can be overcome by using secret-key and public-key protocols in tandem as follows: Alice generates a secret key (say, for AES), encrypts the message with the secret key and the secret key with Bob’s public key, and sends both the encrypted message and the encrypted secret key. Bob first decrypts the encrypted secret key using his private key and then uses the recovered secret key to decrypt the message. Since secret keys are usually short bit strings (most commonly of length 128 bits), the slow performance of the public-key algorithms causes little trouble. At the same time, Alice and Bob are relieved of any prior secret meeting or communication for agreeing on the secret key. Moreover, neither Alice nor Bob needs to remember the secret key. For every session of message transmission, a random secret key can be generated and destroyed when the communication is over.
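This hybrid arrangement can be sketched in a few lines. In the toy Python below, a 16-bit textbook-RSA key pair (no padding) transports the session key, and a SHA-256-based XOR keystream stands in for a real symmetric cipher such as AES; every parameter here is for illustration only, not a secure configuration.

```python
import hashlib
from secrets import randbelow

# Bob's toy RSA key pair: p = 61, q = 53, n = 3233, phi(n) = 3120,
# e*d = 17*2753 = 1 (mod 3120).  Real moduli are 2048 bits or more.
n, e, d = 3233, 17, 2753

def xor_cipher(key: bytes, data: bytes) -> bytes:
    # a keystream derived from the session key; XOR is its own inverse.
    # This is a stand-in for a real symmetric cipher such as AES.
    stream = hashlib.sha256(key).digest()
    while len(stream) < len(data):
        stream += hashlib.sha256(stream).digest()
    return bytes(a ^ b for a, b in zip(data, stream))

# Alice: pick a random session key, encrypt the message with it,
# and encrypt the session key with Bob's public key.
session_key = randbelow(n - 2) + 2                     # integer in [2, n-1]
ciphertext = xor_cipher(session_key.to_bytes(2, 'big'), b"attack at dawn")
wrapped_key = pow(session_key, e, n)                   # RSA-encrypt the key

# Bob: unwrap the session key with his private key, then decrypt.
recovered_key = pow(wrapped_key, d, n)
plaintext = xor_cipher(recovered_key.to_bytes(2, 'big'), ciphertext)
print(plaintext)   # b'attack at dawn'
```

Only the short session key pays the cost of the slow public-key operation; the bulk data goes through the fast symmetric cipher.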

1.2.2. Key Exchange

There is an alternative method by which Alice and Bob can exchange secret information (like AES keys) over a public communication channel. Let us first see how this can be done in the physical lock-and-key scenario. Alice generates a secret, puts it in a box, locks the box with her own key and sends it to Bob. Bob, upon receiving the locked box, adds a second lock to it and sends the doubly locked box back to Alice. Alice then removes her lock and again sends the box to Bob. Finally, Bob uses his key to unlock the box and retrieve the secret. A third party (Carol) that can access the box during the three communications finds it locked by Alice or Bob or both. Since Carol does not possess the keys to these locks, she cannot open the box to discover the secret.

This process can be abstractly described as follows: Alice and Bob first independently generate key pairs (AKe, AKd) and (BKe, BKd) respectively. Alice then sends AKe to Bob and Bob sends BKe to Alice. The private keys AKd and BKd are not disclosed. They also agree upon a function g with which Alice computes gA = g(AKd, BKe) and Bob computes gB = g(BKd, AKe). If gA = gB, then this common value can be used as a shared secret between Alice and Bob.

Our intruder Carol knows g and taps the values of AKe and BKe. So the function g should be such that a knowledge of these values alone does not suffice for the computation of gA = gB. One of the private keys AKd or BKd is needed for the computation. Since (AKe, AKd) and (BKe, BKd) are key pairs, it is assumed that private keys are difficult to compute from the knowledge of the corresponding public keys.

Such a technique of exchanging secret values over an insecure channel is called a key-exchange or a key-agreement protocol. It is important to point out here that such a protocol is usually based on the public-key paradigm; that is to say, we do not know secret-key counterparts for a key-exchange protocol. Since a shared secret between the communicating parties is usually short, the low speed of public-key algorithms is really not a concern in this case.
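The abstract protocol above, instantiated in the multiplicative group of integers modulo a prime (the Diffie–Hellman setting treated later in the book), looks as follows; the tiny prime p = 23 with generator g = 5 is purely illustrative, as real deployments use primes of 2048 bits or more.

```python
from secrets import randbelow

# Key agreement sketched in the multiplicative group mod a prime p
# with generator g.  Here the function g of the text is
# g(AKd, BKe) = BKe^AKd mod p.
p, g = 23, 5

a_priv = randbelow(p - 2) + 1          # Alice's private key AKd
b_priv = randbelow(p - 2) + 1          # Bob's private key BKd
a_pub = pow(g, a_priv, p)              # AKe, sent to Bob
b_pub = pow(g, b_priv, p)              # BKe, sent to Alice

gA = pow(b_pub, a_priv, p)             # Alice computes g(AKd, BKe)
gB = pow(a_pub, b_priv, p)             # Bob computes g(BKd, AKe)
assert gA == gB                        # both hold g^(a*b) mod p
```

Carol sees only a_pub and b_pub; without one of the private exponents (that is, without solving a discrete logarithm), she cannot compute the shared value.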

1.2.3. Digital Signatures

A digital signature is yet another application of the public-key paradigm. Suppose Alice wants to sign a message M in such a way that the signature S can be verified by anybody but nobody other than Alice would be able to generate the signature S on the message M. This can be achieved as follows: Alice generates a key pair (Ke, Kd), makes Ke public and keeps Kd secret. She now uses the decryption function fd to generate the signature, that is, S = fd(M, Kd). The signature S is then made public. Anybody who has access to Alice’s public key Ke applies the reverse transformation fe to get back the message M = fe(S, Ke).

If Carol signs the message M with a different key K′d, she generates the signature S′ = fd(M, K′d). Now, since K′d and Ke are not matching keys, verification using Ke gives M′ = fe(S′, Ke), which is different from M. If we assume that M is a message written in a human-readable language (like English), then M′ would generally look like a meaningless sequence of characters, neither English nor any sensible string to a human reader. The signature verifier would then immediately conclude that this is a case of forged signature.

Such a scheme of generating digital signatures is called a signature scheme with message recovery. It is obvious that this is the same as our encrypt–decrypt scheme with the sequence of encryption and decryption steps reversed. If the message M to be signed is quite long, using this algorithm calls for a large execution time both for signature generation and for verification. It is, therefore, customary to use another variant of signature schemes called signature schemes with appendix that we describe now.

Instead of applying the decryption transform directly on M, Alice first computes a short representative H(M) of her message M. Her signature now becomes the pair S = (M, σ), where σ = fd(H(M), Kd). Typically, a hash function (see Section 1.2.6) is used to compute the representative H(M) from M and is assumed to be public knowledge. Now anybody can verify the signature by checking whether the equality H(M) = fe(σ, Ke) holds. If a key different from Kd is used to generate the signature, one would (in general) get a value σ′ ≠ σ, and the forgery will be detected by observing that H(M) ≠ fe(σ′, Ke).
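A sketch of signing with appendix, using hashlib for H and a toy textbook-RSA key pair for fd and fe; the 16-bit modulus, and the reduction of the hash value below it, are artifacts of the illustrative parameters only.

```python
import hashlib

# Toy RSA parameters: p = 61, q = 53, n = 3233, e*d = 1 (mod 3120).
n, e, d = 3233, 17, 2753

def H(message: bytes) -> int:
    # short representative of the message; the reduction mod n is
    # needed only because the toy modulus is so small
    return int.from_bytes(hashlib.sha256(message).digest(), 'big') % n

M = b"pay Bob 100 rupees"
sigma = pow(H(M), d, n)            # Alice signs: sigma = fd(H(M), Kd)

# Anybody can verify with the public key: H(M) == fe(sigma, Ke)?
assert H(M) == pow(sigma, e, n)
```

Only the short value H(M), rather than the whole of M, goes through the slow private-key operation, which is the point of the appendix variant.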

1.2.4. Entity Authentication

By entity authentication, we mean a process in which one entity, called the claimant, proves its identity to another entity, called the verifier. Entity-authentication techniques thus aim to prevent impersonation of an entity by an intruder. Both secret-key and public-key techniques are used in entity-authentication schemes.

The simplest example of an entity-authentication scheme is the use of passwords, as when a user (the claimant) tries to gain access to some resources on a computer (the verifier) by proving its identity using a password. Password schemes are mostly based on secret-key techniques. For example, the UNIX password system is based on encrypting the zero message (a string of 64 zero bits) using a repeated application of a variant of the DES algorithm with 64 bits of the user input (the password) as the key. Password-based authentication schemes are fixed and time-invariant and are often called weak authentication schemes.

We see applications of public-key techniques in challenge–response authentication schemes (also called strong authentication schemes). Assume that an entity, Alice, wants to prove her identity to another entity, Bob. Alice generates a key pair (Ke, Kd), makes Ke public and keeps Kd secret. Now, Bob chooses a random message M, encrypts M using Alice’s public key—that is, computes C = fe(M, Ke)—and sends C to Alice. Alice, upon reception of C, decrypts it using her private key Kd; that is, she regenerates M = fd(C, Kd) and sends M to Bob. Bob compares this value of M with the one he generated, and if a match occurs, Bob becomes sure that the entity who is claiming to be Alice possesses the knowledge of Alice’s private key. If Carol uses any private key other than Kd for the decryption, she gets a message M′ different from M and thereby cannot prove to Bob her identity as Alice. This is how this scheme prevents impersonation of Alice by Carol.

Entity authentication is often carried out using another interesting technique called zero-knowledge proof. In such a protocol, the verifier (or any third party listening to the conversation) gains no knowledge of the secret possessed by the claimant, but develops the desired confidence that the claimant indeed possesses the secret. We provide here an informal example explaining zero-knowledge proofs.

Let us think of a circular cave as shown in Figure 1.1. The cave has two exits, left and right, denoted by L and R respectively. The cave also has a door inside it, which is invisible from outside the cave. Alice (A) wants to prove to Bob (B) that she possesses a key to this door without showing him the key or the process of unlocking the door with the key. Bob stations himself somewhere outside the exits of the cave. Alice enters the cave and randomly chooses the left or right wing of the cave (and goes there). She does not disclose this choice to Bob, because Bob is not allowed to know the session secrets either. Once Alice is in place, Bob makes a random choice from L and R and asks Alice (using cell phones or by shouting loudly) to come out of the cave via the chosen exit. Suppose Bob challenges Alice to use L. If Alice is in the left wing, she can come out of the cave using L. If Alice is in the right wing, she must use her secret key to open the central door, cross to the left wing and then go out via exit L. If Alice does not possess the secret key, she succeeds in obeying Bob’s directive only with probability one half. If this procedure is repeated t times, then the probability that Alice succeeds on all occasions without possessing the secret key is (1/2)^t = 1/2^t. By choosing t appropriately, Bob can make the probability of accepting a false claim arbitrarily small. For example, if t = 20, then the chance is less than one in a million that Alice can establish a false claim.

Figure 1.1. Zero-knowledge proofs


Thus, if Alice succeeds every time, Bob gains the desired confidence that Alice actually possesses the secret. However, during this entire process, Bob can obtain no information regarding Alice’s secrets (the key and the choices of wings). Another important aspect of this interaction is that Alice has no way of predicting Bob’s questions, preventing impostors (of Alice) from fooling Bob.
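The probability argument can be checked by simulation. The Python sketch below plays the cave game for a cheating Alice who holds no key and must simply hope that Bob’s challenge matches the wing she already picked:

```python
import random

# A cheating prover (no key) survives one round only when Bob's random
# challenge happens to match the wing she already chose: probability 1/2.

def cheater_survives(t: int) -> bool:
    # one full run of t challenge rounds; any mismatch exposes the cheat
    return all(random.choice('LR') == random.choice('LR') for _ in range(t))

t = 20
trials = 100_000
cheats = sum(cheater_survives(t) for _ in range(trials))
# (1/2)^20 is below one in a million, so cheats is almost surely 0 here
print(cheats / trials)

# the exact false-acceptance probability for t = 20
assert 0.5 ** 20 < 1e-6
```

Raising t shrinks the false-acceptance probability geometrically while telling Bob nothing about the key itself.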

1.2.5. Secret Sharing

Suppose that a secret piece of information is to be distributed among n entities in such a way that n – 1 (or fewer) entities are unable to reconstruct the secret. All of the n entities must participate to reveal the secret. As usual, let us assume that the secret is an l-bit string. A simple strategy would be to break the string into n parts and provide each entity with one part. This method is, however, not really attractive, because it gives each entity partial information about the secret. Thus, for example, if a 256-bit string is distributed equally among 16 entities, any 15 of them working together can reconstruct the secret by trying only 2^16 = 65536 possibilities for the unknown 16 bits.

We now describe an alternative strategy that does not suffer from this drawback. Once again, we break the secret string into n parts and consider the parts as integers a_0, . . . , a_{n–1}. We construct the monic polynomial f(x) = x^n + a_{n–1}x^{n–1} + · · · + a_1x + a_0 and give the integers f(1), f(2), . . . , f(n) to the entities. When all of the entities cooperate, the linear system of equations f(i) = i^n + a_{n–1}i^{n–1} + · · · + a_1i + a_0, 1 ≤ i ≤ n, can be solved for the unknown coefficients a_0, . . . , a_{n–1} which, in turn, reveal the secret. On the other hand, if n – 1 or fewer entities cooperate, they get an underdetermined system of equations in n unknowns, from which the actual solution is not readily available.

The secret-sharing problem can be generalized in the following way: distribute a secret among n parties in such a way that any m or more of the parties can reconstruct the secret (for some m ≤ n), whereas any m – 1 or fewer parties cannot do the same. A polynomial of degree m as in the above example readily adapts to this generalized situation.
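One standard realization of this (m, n)-threshold generalization is Shamir’s secret-sharing scheme, which hides the secret as the constant term of a random polynomial of degree m – 1 over a prime field and recovers it by Lagrange interpolation at x = 0; the sketch below (with illustrative parameters) follows that route rather than the monic-polynomial example above.

```python
import random

p = 2**127 - 1                   # a Mersenne prime comfortably above the secret

def make_shares(secret: int, m: int, n: int):
    # random polynomial of degree m-1 with constant term = secret
    coeffs = [secret] + [random.randrange(p) for _ in range(m - 1)]
    def f(x):
        return sum(c * pow(x, k, p) for k, c in enumerate(coeffs)) % p
    return [(i, f(i)) for i in range(1, n + 1)]

def reconstruct(shares):
    # Lagrange interpolation at x = 0 over GF(p)
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % p
                den = den * (xi - xj) % p
        secret = (secret + yi * num * pow(den, -1, p)) % p
    return secret

shares = make_shares(secret=123456789, m=3, n=5)
assert reconstruct(shares[:3]) == 123456789   # any 3 of the 5 shares suffice
assert reconstruct(shares[2:]) == 123456789
```

Working over a prime field also removes the partial-information leak of the naive splitting: any m – 1 shares are consistent with every possible secret.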

1.2.6. Hashing

A function which converts bit strings of arbitrary lengths to bit strings of a fixed (finite) length is called a hash function. Hash functions play a crucial role in cryptography. We have already seen an application of one in designing a digital signature scheme with appendix. If H is a hash function, a pair of distinct input values (strings) x1 and x2 for which H(x1) = H(x2) is called a collision for H. For any hash function H, collisions must exist, since H maps an infinite set to a finite set. However, for cryptographic purposes we want collisions to be difficult to find. More specifically, a cryptographic hash function H should satisfy the following desirable properties:

First pre-image resistance

Except for a small set of hash values y it should be difficult to find an input x with H(x) = y. We exclude a small set of values, because an adversary might prepare (and maintain) a list of pairs (x, H(x)) for certain values of x of her choice. If the given value of y is the second coordinate of one pair in her list, she can produce the corresponding input value x easily.

Second pre-image resistance

Given a pair (x, H(x)), it should be difficult to find an input x′ different from x with H(x) = H(x′).

Collision resistance

It should be difficult to find two different input strings x, x′ with H(x) = H(x′).

The output of a hash function is also called a message digest, and hash functions may be used with or without a secret key. Popular examples of unkeyed hash functions are SHA-1, MD5 and MD2, whereas keyed hash functions include HMAC and CBC-MAC.
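The need for a sufficiently long output can be demonstrated directly. The sketch below truncates SHA-256 to 16 bits and finds a collision by brute force; by the birthday bound a collision typically appears after only a few hundred inputs, and the pigeonhole principle guarantees one within 65537 distinct inputs.

```python
import hashlib

def h16(data: bytes) -> bytes:
    # SHA-256 truncated to 16 bits: only 2^16 possible outputs
    return hashlib.sha256(data).digest()[:2]

seen = {}
collision = None
for i in range(100_000):
    x = str(i).encode()
    y = h16(x)
    if y in seen:
        collision = (seen[y], x)      # two distinct inputs, same digest
        break
    seen[y] = x

x1, x2 = collision
assert x1 != x2 and h16(x1) == h16(x2)
print(x1, x2, h16(x1).hex())
```

The same brute-force search against the full 256-bit output would take about 2^128 trials, which is why no collision in SHA-256 itself has ever been published.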

1.2.7. Certification

So far we have seen several protocols which are based on the use of public keys of remote entities, but have never questioned the authenticity of public keys. In other words, it is necessary to ascertain that a public key is really owned by a remote entity. Public-key certificates are used to that effect. These are data structures that bind public-key values to entities. This binding is achieved by having a trusted certification authority digitally sign each certificate.

Typically, a certificate is issued for a period of validity. However, a certificate may become invalid before its date of expiry for several reasons, such as a possible or suspected compromise of the private key. Under such circumstances, the certification authority must revoke the certificate and maintain a list, called a certificate revocation list (CRL), of revoked certificates. When Alice verifies the authenticity of Bob’s public-key certificate by checking the digital signature of the authority and does not find the certificate in the CRL, she gains the desired confidence in using Bob’s public key.

The X.509 public-key infrastructure (PKIX) specifies Internet standards for certificates and CRLs.

1.3. Public-key Cryptography

In this section, we give a short introduction to the realization of public-key cryptosystems. More specifically, we list some of the computationally intensive mathematical problems and describe how the (apparent) intractability of these problems can be used for designing key pairs. We use some mathematical terms that we will introduce later in this book.

1.3.1. The Mathematical Problems

The security of the public-key cryptosystems is based on the presumed difficulty of solving certain mathematical problems.

The integer factorization problem (IFP)

Given the product n = pq of two distinct prime integers p and q, find p and q.

The discrete logarithm problem (DLP)

Let G be a finite cyclic (multiplicatively written) group with cardinality n and a generator g. Given an element a of G, find an integer x (or the unique integer x with 0 ≤ x ≤ n – 1) such that a = g^x in G. Three different types of groups are commonly used for cryptographic applications: the multiplicative group of a finite field, the group of rational points on an elliptic curve over a finite field, and the Jacobian of a hyperelliptic curve over a finite field. By an abuse of notation, we often refer to the DLP over finite fields simply as the DLP, whereas the DLP on elliptic and hyperelliptic curves is referred to as the elliptic curve discrete logarithm problem (ECDLP) and the hyperelliptic curve discrete logarithm problem (HECDLP) respectively.

The Diffie–Hellman problem (DHP)

Let G and g be as above. Given the elements g^a and g^b of G, compute the element g^(ab). As in the case of the DLP, the DHP can be posed in the multiplicative group of a finite field, the group of rational points on an elliptic curve and the Jacobian of a hyperelliptic curve.

We show in the next section how (the intractability of) these problems can be exploited to create key pairs for various cryptosystems. These computational problems are termed difficult, intractable, infeasible or intensive in the sense that there are no known algorithms to solve these problems in time polynomially bounded by the input size. The best-known algorithms are subexponential or even fully exponential in some cases. This means that if the input size is chosen to be sufficiently large, then it is infeasible to compute the private key from a knowledge of the public key in a reasonable amount of time. This, in turn, implies (not provably, but as the current state of the art stands) that encryption or signature verification can be done rather quickly (in polynomial time), but the converse process of decryption or signature generation cannot be done in feasible time, unless one knows the private key. As a result, encryption (or signature verification) is called a trapdoor one-way function, that is, a function which is easy to compute but for which the inverse is computationally infeasible, unless some additional information (the trapdoor) is available.

It is, however, not known that these problems are really computationally infeasible, that is, there is no proof of the fact that these problems cannot be solved in polynomial time. As a result, the public-key cryptographic systems based on these problems are not provably secure.

1.3.2. Realization of Key Pairs

In RSA and similar cryptosystems, one generates two (distinct) suitably large primes p and q and computes the product n = pq. Then φ(n) = (p – 1)(q – 1), where φ denotes Euler’s totient function. One then chooses a random integer e with gcd(e, φ(n)) = 1. There exists an integer d such that ed ≡ 1 (mod φ(n)). The integer e is used as the public key, whereas the integer d is used as the private key.

If the IFP can be solved fast, one can also compute φ(n) easily, and subsequently d can be computed from e using the (polynomial-time) extended GCD algorithm. This is why[2] we say that the RSA cryptosystem derives its security from the intractability of the IFP.

[2] The problem of factoring n = pq is polynomial-time equivalent to computing φ(n) = (p – 1)(q – 1).

In order to see how RSA encryption and decryption work, let the plaintext message be encoded as an integer m with 2 ≤ m < n. The ciphertext message is generated (as an integer) as c ≡ m^e (mod n). Decryption is analogous, that is, m ≡ c^d (mod n). The correctness of the algorithm follows from the fact that ed ≡ 1 (mod φ(n)). It is, however, not proved that one has to know d or φ(n) or the factorization of n in order to decrypt an RSA-encrypted message. But at present no better methods are known.
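The whole recipe fits in a few lines of Python; the primes below are toy values for illustration (real keys use primes of about 1024 bits each, and real RSA pads the message before exponentiation).

```python
from math import gcd

# Textbook RSA following the recipe above: n = p*q,
# phi(n) = (p-1)(q-1), e coprime to phi(n), d = e^{-1} mod phi(n).
p, q = 61, 53
n = p * q                       # 3233, the public modulus
phi = (p - 1) * (q - 1)         # 3120
e = 17                          # public key
assert gcd(e, phi) == 1
d = pow(e, -1, phi)             # private key via the extended GCD

m = 65                          # plaintext encoded as 2 <= m < n
c = pow(m, e, n)                # encryption: c = m^e mod n
assert pow(c, d, n) == m        # decryption: m = c^d mod n
```

The three-argument pow performs fast modular exponentiation, and pow(e, -1, phi) computes the modular inverse using the polynomial-time extended GCD algorithm mentioned above.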

Let us now consider the discrete logarithm problem. Let G be a finite cyclic multiplicative group (such as those mentioned above) in which it is easy to multiply two elements, but difficult to compute discrete logarithms. Let g be a generator of G. In order to set up a random key pair over such a group, one chooses the private key as a random integer d, 2 ≤ d < n, where n is the cardinality of G. The public key e is then computed as the element e = g^d of G.

Applications of encryption–decryption schemes based on the key pair (g^d, d) are given in Chapter 5. For now, we only remark that many such schemes (like the ElGamal scheme) derive their security from the DHP instead of the DLP, whereas other schemes (like the Nyberg–Rueppel scheme) do so from the DLP. It is assumed that these two problems are computationally equivalent (at least for the groups of our interest). Obviously, a solution of the DLP yields a solution of the DHP too: recover b from g^b and compute g^(ab) = (g^a)^b. The reverse implication is not clear.
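This reduction from the DHP to the DLP can be sketched with a brute-force discrete logarithm, feasible here only because the illustrative group is tiny:

```python
# Reduction DHP <= DLP in the multiplicative group mod a toy prime:
# given g^a and g^b, a DLP solver recovers b, after which
# g^(ab) = (g^a)^b is a single exponentiation.
p, g = 23, 5                             # 5 generates the group mod 23

def dlp(h: int) -> int:
    # exhaustive-search discrete log: find x with g^x = h (mod p)
    for x in range(p - 1):
        if pow(g, x, p) == h:
            return x
    raise ValueError("not in the group generated by g")

a, b = 6, 15
ga, gb = pow(g, a, p), pow(g, b, p)      # the publicly visible values

b_recovered = dlp(gb)                    # solve the DLP instance
dh_secret = pow(ga, b_recovered, p)      # then the DHP is easy
assert dh_secret == pow(g, a * b, p)
```

For cryptographic group sizes the loop in dlp becomes infeasible, which is exactly the gap these systems rely on; no comparably simple reduction in the other direction is known.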

1.3.3. Public-key Cryptanalysis

As we pointed out earlier, (most of) the public-key cryptosystems are not provably secure, in the sense that they are based on the apparent difficulty of solving certain computational problems. It is expedient to know how difficult these problems are. No non-trivial complexity-theoretic statements are available for these problems, and so it is worthwhile to study the algorithms known to date for solving them. Unfortunately, many of these cryptanalytic algorithms are much more complicated than the algorithms for building the corresponding cryptographic systems. One needs to acquire more mathematical machinery in order to understand (and augment) them. We devote Chapter 4 to a detailed discussion of these algorithms.

In specific situations, one need not always use these computationally intensive algorithms. Access to a party’s decryption equipment may allow an adversary to gain partial or complete information about the private key by watching a decryption process. For example, an adversary (say, the superuser) might have the capability to read the contents of the memory holding a private key during some decryption process. For another possibility, think of RSA decryption which involves a modular exponentiation. If the standard square-and-multiply algorithm (Algorithm 3.9) is used for this purpose and the adversary can tap some hardware details (like machine cycles or power fluctuations) during a decryption process, she can guess a significant number of the bits in the private key. Such attacks, often called side-channel attacks, are particularly relevant for cryptographic applications based on smart cards.

A cryptographic system is (believed to be) strong if and only if there are no good known mechanisms to break it. It is, therefore, for the sake of security that we must study cryptanalysis. Cryptography and cryptanalysis are deeply intertwined and a complete study of one must involve the other.

1.4. Some Cryptographic Terms

In cryptology, there are different models of attacks or attackers.

1.4.1. Models of Attacks

So far we have assumed that an adversary can only read messages during transmission over a channel. Such an adversary is called a passive adversary. An active adversary, on the other hand, can mutilate or delete messages during transmission and/or generate false messages. An attack mounted by an active (resp.[3] a passive) adversary is called an active (resp. a passive) attack. In this book, we will mostly concentrate on passive attacks.

[3] Throughout the book, resp. stands for respectively.

1.4.2. Models of Passive Attacks

A two-party communication involves transmission of ciphertext messages over a communication channel. A passive attacker can read these ciphertext messages. In practice, however, an attacker might have more control over the choice of ciphertext and/or plaintext messages. Based on these capabilities of the attacker we have the following types of attacks.

Ciphertext-only attack

This is the weakest model of the adversary. Here the attacker has no control over the choice of the ciphertext messages that flow through the channel, nor over the corresponding plaintext messages. Using only these ciphertext messages, the attacker has to obtain a private key and/or the plaintext corresponding to a new ciphertext message.

Known-pair attack

In this kind of attack (also called a known-plaintext or known-ciphertext attack), the attacker uses her knowledge of some plaintext–ciphertext pairs. If many such pairs are available, she can use them to deduce a pattern from which she can subsequently gain some information about a new plaintext whose ciphertext is available. In a public-key scheme, the adversary can generate as many such pairs as she wants, because generating such a pair requires only a knowledge of the receiver’s public key. Thus a public-key encryption scheme must provide sufficient security against known-plaintext attacks.

Chosen-plaintext attack

In this kind of attack, the attacker knows some plaintext–ciphertext pairs in which the plaintexts are chosen by the attacker. As discussed earlier, such an attack is easily mountable for a public-key encryption scheme.

Adaptive chosen-plaintext attack

This is similar to the chosen-plaintext attack with the additional possibility that the attacker chooses the plaintexts in the known plaintext–ciphertext pairs sequentially and adaptively based on the knowledge of the previous pairs. This kind of attack can be easily mounted on public-key encryption systems.

Chosen-ciphertext attack

The attacker has knowledge of some plaintext–ciphertext pairs in which the ciphertexts are chosen by the attacker. Such an attack is not directly mountable on a public-key scheme, since obtaining a plaintext from a chosen ciphertext requires knowledge of the private key. However, if the attacker has access to the receiver’s decryption equipment, the machine can divulge the plaintexts corresponding to the ciphertexts that the attacker supplies to the machine. In this context, we assume that the machine does not reveal the private key itself, that is, it has the key stored secretly somewhere in its hardware which the attacker cannot directly access. However, the attacker can run the machine to know the plaintexts corresponding to the ciphertexts of her choice. Later (when the attacker no longer has access to the decryption equipment) the known pairs may be exploited to obtain information about the plaintext corresponding to a new ciphertext.

Adaptive chosen-ciphertext attack

This is similar to the chosen-ciphertext attack with the additional possibility that the attacker chooses the ciphertexts in the known pairs sequentially and adaptively based on her knowledge of the previously generated plaintext–ciphertext pairs. This attack is mountable in a scenario described in connection with chosen-ciphertext attacks.

For a digital signature scheme, there are equivalent names for these types of attacks. The attacker is assumed to have access to the public key of the signer, because this key is used for signature verification. An attempt to forge signatures based only on the knowledge of this verification key is called a key-only attack. The adversary may additionally possess knowledge of some message–signature pairs. An attack based on this knowledge is called a known-pair or known-message or known-signature attack. If the messages are chosen by the adversary, we call the attack a chosen-message attack. If the adversary generates the sequence of messages in a chosen-message attack adaptively (based on the previously generated message–signature pairs), we have an adaptive chosen-message attack. An (adaptive or non-adaptive) chosen-message attack can be mounted, if the attacker gains access to the signer’s signature generation equipment, or if the signer is willing to sign arbitrary messages provided by the adversary.

The attacker can choose some signatures and generate the corresponding messages by encrypting them with the signer’s public key. The private-key operation on these messages generates the signatures chosen by the attacker. This gives chosen-signature and adaptive chosen-signature attacks on a digital signature scheme. Now the adversary cannot directly control the messages to sign. On the other hand, such an attack is easily mountable, because it utilizes only some public knowledge (the signer’s public key). Indeed, one may treat chosen-signature attacks as variants of key-only attacks.

1.4.3. Public Versus Private Algorithms

So far, we have assumed that all the parties connected to a network know the algorithms used in a cryptographic scheme. The security of the scheme is based on the difficulty of obtaining some secret information (the secret or private key).

It, however, remains possible that two parties communicate using an algorithm unknown to other entities. Top-secret communications (for example, during wars or diplomatic transactions) often use private cryptographic algorithms. In this book, we will not deal with such techniques. Our attention is focused mostly on Internet applications in which public knowledge of the algorithms is of paramount importance (for the sake of universal applicability and convenience).

In short, this book deals with a world in which only publicly known public-key algorithms are deployed and in which adversaries are usually passive. A restricted model of the world though it may be, it is general and useful enough to concentrate on. Let us begin our journey!

Chapter Summary

This chapter provides an overview of the problems that cryptology deals with. The first and oldest cryptographic primitive is encryption for secure transmission of messages. Some other primitives are key exchange, digital signature, authentication, secret sharing, hashing, and digital certificates. We then highlight the difference between symmetric (secret-key) and asymmetric (public-key) cryptography. The relevance of some computationally intractable mathematical problems in public-key cryptography is discussed next, and the working of a prototype public-key cryptosystem (RSA) is explained. We finally discuss different models of attacks on cryptosystems.

Some people think that cryptology also deals with intrusion, viruses and Trojan horses. We emphasize that this is not the case. Data and network security is the branch that deals with these topics; cryptography is a part of this branch, but not the other way round. Imagine that your house is to be secured against theft. First, you need a good lock—that is cryptography. However, a lock does nothing to prevent a thief from entering the house by breaking the window panes. A bad butler who leaks secrets of the house to the outside world also does not come under the jurisdiction of the lock. Securing your house requires adopting sufficient safeguards against all these possibilities of theft. In this book, we will study only the technology of manufacturing and breaking locks.

2. Mathematical Concepts

2.1  Introduction
2.2  Sets, Relations and Functions
2.3  Groups
2.4  Rings
2.5  Integers
2.6  Polynomials
2.7  Vector Spaces and Modules
2.8  Fields
2.9  Finite Fields
2.10 Affine and Projective Curves
2.11 Elliptic Curves
2.12 Hyperelliptic Curves
2.13 Number Fields
2.14 p-adic Numbers
2.15 Statistical Methods
     Chapter Summary
     Suggestions for Further Reading

Young man, in mathematics you don’t understand things, you just get used to them.

—John von Neumann

Mathematics contains much that will neither hurt one if one does not know it nor help one if one does know it.

—J. B. Mencken

Mathematics is the Queen of Science but she isn’t very pure; she keeps having babies by handsome young upstarts and various frog princes.

—Donald Kingsbury

2.1. Introduction

In this chapter, we introduce the basic mathematical concepts that one should know in order to understand the public-key cryptographic protocols and the corresponding cryptanalytic algorithms described in the later chapters. If the reader is already familiar with these concepts, she may quickly browse through the chapter in order to know about our notations and conventions.

This chapter is meant for cryptology students and as such does not describe the mathematical topics in their full generality. It is our intention only to state (and, if possible, prove) the relevant results that would be useful for the rest of the book. For further study, we urge the reader to consult the books suggested at the end of this chapter.

2.2. Sets, Relations and Functions

Sets are absolutely basic entities used throughout the present-day study of mathematics. Unfortunately, however, we cannot define sets. Loosely speaking, a set is an (unordered) collection of objects. But we run into difficulty with this definition for collections that are too big. Of course, infinite sets like the set of all integers or real numbers are not too big. However, a collection of all sets is too big to be called a set. (Also see Exercise 2.6.) It is, therefore, customary to have an axiomatic definition of sets. That is to say, a collection qualifies to be a set if it satisfies certain axioms. We do not go into the details of this axiomatic definition, but state the axioms as properties of sets. Luckily enough, we won’t have a chance in the rest of this book to deal with collections that are not sets. So the reader can, for the time being, have faith in the above (wrong) identification of a set as a collection.

An object in a set A is commonly called an element of A. By the notation a ∈ A, we mean that a is an element of the set A. Often a set A can be represented explicitly by writing down its elements within curly brackets or braces. For example, A = {2, 3, 5, 7} denotes the set consisting of the elements 2, 3, 5, 7 which are incidentally all the (positive) prime numbers less than 10. We often use the ellipsis sign (. . .) to denote an infinite (or even a finite) set. For example, ℙ = {2, 3, 5, 7, 11, . . .} would denote the set of all (positive) prime numbers. (We prove later that ℙ is an infinite set.) Alternatively, we often describe a set by mentioning the properties of the elements of the set. For example, the set ℙ can also be described as ℙ = {p | p is a positive prime number}.

Some frequently occurring sets are denoted by special symbols. We list a few of them here.

ℕ    The set of all natural numbers, that is, {1, 2, 3, . . .}
ℕ0   The set of all non-negative integers, that is, {0, 1, 2, . . .}
ℤ    The set of all integers, that is, {. . . , –2, –1, 0, 1, 2, . . .}
ℙ    The set of all (positive) prime numbers, that is, {2, 3, 5, 7, . . .}
ℚ    The set of all rational numbers, that is, {a/b | a, b ∈ ℤ, b ≠ 0}
ℚ*   The set of all non-zero rational numbers
ℝ    The set of all real numbers
ℝ*   The set of all non-zero real numbers
ℂ    The set of all complex numbers
ℂ*   The set of all non-zero complex numbers
∅    The empty set

The cardinality of a set A is the number of elements in A. We use the symbol #A to denote the cardinality of A. If #A is finite, we call A a finite set. Otherwise A is said to be infinite. The empty set has cardinality zero.

2.2.1. Set Operations

Let A and B be two sets. We say that A is a subset of B and denote this as A ⊆ B, if all elements of A are in B. Two sets A and B are equal (that is, A = B) if and only if A ⊆ B and B ⊆ A. A is said to be a proper subset of B (denoted A ⊊ B), if A ⊆ B and A ≠ B (that is, B ⊄ A).

The union of A and B is the set whose elements are either in A or in B (or both). This set is denoted by A ∪ B. The intersection of A and B is the set consisting of elements that are common to A and B. The intersection of A and B is denoted by A ∩ B. If A ∩ B = ∅, then we say that A and B are disjoint. In that case, the union A ∪ B is also called a disjoint union and is denoted by A ⊔ B. (For a generalization, see Exercise 2.7.) The difference of A and B, denoted A \ B, is the set whose elements are in A but not in B. If A is understood from the context and B ⊆ A, then we denote A \ B by B̄ and refer to B̄ as the complement of B (in A). The product A × B of two sets A and B is the set of all ordered pairs (a, b) where a ∈ A and b ∈ B.
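These operations have direct counterparts for finite sets in most programming languages. Here is a small illustrative sketch in Python (the example sets A and B are our own):

```python
from itertools import product

A = {2, 3, 5, 7}
B = {1, 2, 3, 4}

union = A | B                    # elements in A or in B (or both)
intersection = A & B             # elements common to A and B
difference = A - B               # elements in A but not in B
cartesian = set(product(A, B))   # all ordered pairs (a, b)

assert union == {1, 2, 3, 4, 5, 7}
assert intersection == {2, 3}
assert difference == {5, 7}
assert len(cartesian) == len(A) * len(B)   # #(A × B) = #A · #B
```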

The notions of union, intersection and product of sets can be readily extended to an arbitrary family of sets. Let Ai, i ∈ I, be a family of sets indexed by I. In this case, we denote the union and intersection of Ai, i ∈ I, by ∪i∈I Ai and ∩i∈I Ai respectively. The product of Ai, i ∈ I, is denoted by ∏i∈I Ai. When Ai = A for all i ∈ I, we denote the product also as AI. If, in addition, I is a finite set of cardinality n, then the product AI is also written as An.

2.2.2. Relations

A relation ρ on a set A is a subset of A × A. For (a, b) ∈ ρ, we usually say a ρ b implying that a is related by ρ to b. Common examples are the standard relations =, ≠, ≤, <, ≥, > on ℤ (or ℚ or ℝ).

A relation ρ on a set A is called reflexive, if a ρ a for all a ∈ A. For example, =, ≤ and ≥ are reflexive relations on ℤ, but the relations ≠, <, > are not.

A relation ρ on A is called symmetric, if a ρ b implies b ρ a. On the other hand, ρ is called anti-symmetric if a ρ b and b ρ a imply a = b. For example, = is symmetric and anti-symmetric, <, ≤, > and ≥ are anti-symmetric but not symmetric, ≠ is symmetric but not anti-symmetric.

A relation ρ on A is called transitive if a ρ b and b ρ c imply a ρ c. For example, =, <, ≤, >, ≥ are all transitive, but ≠ is not transitive.

An equivalence relation is one which is reflexive, symmetric and transitive. For example, = is an equivalence relation on ℤ, but none of the other relations mentioned above (≠, <, ≥ and so on) is an equivalence relation on ℤ.

A partition of a set A is a collection of pairwise disjoint subsets Ai, i ∈ I, of A, such that A = ∪i∈I Ai, that is, A is the union of Ai, i ∈ I, and for i, j ∈ I, i ≠ j, we have Ai ∩ Aj = ∅. The following theorem establishes an important connection between equivalence relations and partitions.

Theorem 2.1.

An equivalence relation on a set A produces a partition of A. Conversely, every partition of a set A corresponds to an equivalence relation on A.

Proof

Let ρ be an equivalence relation on a set A. For a ∈ A, let us denote [a] = {b ∈ A | a ρ b}. Clearly, a ∈ [a], since a ρ a (by reflexivity). Now we show that for a, b ∈ A, either [a] = [b] or [a] ∩ [b] = ∅. Assume that [a] ∩ [b] ≠ ∅. Choose c ∈ [a]. By construction, a ρ c. Now choose d ∈ [a] ∩ [b]. Then a ρ d and b ρ d. By symmetry, d ρ b, so that by transitivity a ρ b, that is, b ρ a. But a ρ c. Hence, once again by transitivity, b ρ c, that is, c ∈ [b]. Thus [a] ⊆ [b]. Similarly [b] ⊆ [a].

Conversely, let Ai, i ∈ I, be a partition of A. Define a relation ρ on A such that a ρ b if and only if a and b are in the same subset Ai for some i ∈ I. It is easy to see that ρ is an equivalence relation on A.

The subset [a] of A defined in the proof of the above theorem is called the equivalence class of a with respect to the equivalence relation ρ.
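The construction used in the proof, collecting each element into its class [a], can be sketched for a finite set in Python. The relation below (congruence modulo 3) and the helper name are our own illustrative choices:

```python
def equivalence_classes(A, related):
    """Group the elements of a finite iterable A into the classes [a]
    of an equivalence relation, given as a predicate related(a, b)."""
    classes = []
    for a in A:
        for cls in classes:
            # a belongs to cls iff it is related to any one representative
            if related(a, next(iter(cls))):
                cls.add(a)
                break
        else:
            classes.append({a})   # a starts a new class
    return classes

# congruence modulo 3 on {0, ..., 9}
classes = equivalence_classes(range(10), lambda a, b: a % 3 == b % 3)
assert sorted(sorted(c) for c in classes) == [[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]
```

Note that comparing with a single representative of each class suffices only because the relation is assumed to be an equivalence relation.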

An anti-symmetric and transitive relation is called a partial order (or simply an order). All of the relations =, ≤, <, ≥, > are partial orders on ℤ (but ≠ is not). A partial order ρ on A is called a total order or a linear order or a simple order, if for every a, b ∈ A, a ≠ b, either a ρ b or b ρ a. For example, if we take A = {1, 2, 3} and the relation ρ = {(1, 2), (1, 3)}, then ρ is a partial order but not a total order (because it does not specify a relation between 2 and 3). On the other hand, ρ′ = {(1, 2), (1, 3), (2, 3)} is a total order. A set with a partial (resp. total) order is often called a partially ordered (resp. totally ordered or linearly ordered or simply ordered) set.

2.2.3. Functions

Let A and B be two sets (not necessarily distinct). A function or a map f from A to B, denoted f : A → B, assigns to each a ∈ A some element b ∈ B. In this case, we write b = f(a) or f maps a ↦ b and say that b is the image of a (under f). For example, if A = B = ℝ, then the assignment a ↦ a2 is a function. On the other hand, the assignment a ↦ √a (the non-negative square root) is not a function from ℝ to ℝ, because it is not defined for negative values of a. However, if A = ℝ and B = ℂ, then the assignment a ↦ √a (with non-negative real and imaginary parts) is a function.

The function f : A → A assigning a ↦ a for all a ∈ A is called the identity map on A and is usually denoted by idA. On the other hand, if f : A → B maps all the elements of A to a fixed element of B, then f is said to be a constant function. A function which is not constant is called a non-constant function.

A function f : A → B that maps different elements of A to different elements of B is called injective or one-one. In other words, f is injective if and only if f(a) = f(a′) implies a = a′. The function ℝ → ℝ given by a ↦ a2 is not injective, since f(–a) = f(a) for all a ∈ ℝ. On the other hand, the function ℤ → ℤ given by a ↦ 2a is injective. An injective map f : A → B is sometimes denoted by the special symbol f : A ↪ B.

The image of a function f : A → B is defined to be the subset {f(a) | a ∈ A} of B. It is denoted by f(A) or by Im f. The function f is said to be surjective or onto or a surjection, if Im f = B, that is, every element b of B has at least one preimage a ∈ A (which means f(a) = b). As an example, the function ℤ → ℤ given by a ↦ a/2 (if a is even) and by a ↦ (a – 1)/2 (if a is odd) is surjective, whereas the function ℤ → ℤ that maps a ↦ |a| (the absolute value) is not surjective. A surjective map f : A → B is sometimes denoted by the special symbol f : A ↠ B.

A map f : A → B is called bijective or a bijection, if it is both injective and surjective. For example, the identity map on a set is bijective. Another example of a bijective function is the map ℕ → ℙ that maps a to the ath prime.

Let f : A → B and g : B → C be functions. The composition of f and g is the function from A to C that takes a ↦ g(f(a)). It is denoted by g ∘ f, that is, (g ∘ f)(a) = g(f(a)). Note that in the notation g ∘ f one applies f first and then g. The notion of composition of functions can be extended to more than two functions. In particular, if f : A → B, g : B → C and h : C → D are functions, then (h ∘ g) ∘ f and h ∘ (g ∘ f) are the same function from A to D, so that we can unambiguously write this as h ∘ g ∘ f.
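For maps on finite sets, injectivity, surjectivity and composition can all be checked by brute force. A small Python sketch (the helper names and example maps are our own):

```python
def is_injective(f, A):
    """f maps different elements of A to different images."""
    images = [f(a) for a in A]
    return len(set(images)) == len(images)

def is_surjective(f, A, B):
    """Every element of B has at least one preimage in A."""
    return {f(a) for a in A} == set(B)

def compose(g, f):
    """(g o f)(a) = g(f(a)); note that f is applied first."""
    return lambda a: g(f(a))

A = list(range(-3, 4))
square = lambda a: a * a
double = lambda a: 2 * a

assert not is_injective(square, A)       # since square(-a) == square(a)
assert is_injective(double, A)
assert compose(square, double)(3) == 36  # (2 * 3)^2
```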

2.2.4. The Axioms of Mathematics

The study of mathematics is based on certain axioms. We state four of these axioms. It is not possible to prove the axioms independently, but it can be shown that they are equivalent in the sense that each of them can be proved, if any of the others is assumed to be true.

Let A be a partially ordered set under the relation ≼. An element a ∈ A is called maximal (resp. minimal), if there is no element b ∈ A, b ≠ a, that satisfies a ≼ b (resp. b ≼ a). Let B be a non-empty subset of A. Then an upper bound (resp. a lower bound) for B is an element a ∈ A such that b ≼ a (resp. a ≼ b) for all b ∈ B. If an upper bound (resp. a lower bound) a of B is an element of B, then a is called a last element or a largest element or a maximum element (resp. a first element or a least element or a smallest element or a minimum element) of B. By antisymmetry, it follows that a first (resp. last) element of B, if existent, is unique. A chain of A is a totally ordered (under ≼) subset of A.

Consider the sets ℕ, ℤ and ℝ with the natural order ≤. None of these sets contains a maximal element. ℕ contains a minimal element, namely 1, but ℤ and ℝ do not contain minimal elements. The subset {2, 4, 6, . . .} of even natural numbers has two lower bounds in ℕ, namely 1 and 2, of which 2 is the first element of the subset.

A totally ordered set A is said to be well ordered (and the relation ≼ is called a well order), if every non-empty subset B of A contains a first element.

Axiom 2.1. Zermelo’s well-ordering principle

Every set A can be well ordered, that is, there is a relation ≼ which well orders A.

The set ℕ is well ordered under the natural relation ≤. The set ℤ can be well ordered by the relation ≼ defined by the enumeration 0 ≼ 1 ≼ –1 ≼ 2 ≼ –2 ≼ 3 ≼ –3 ≼ · · ·. A well ordering of ℝ is not known.

Axiom 2.2. Zorn’s lemma

Let A be a partially ordered set. If every chain of A has an upper bound (in A), then A has at least one maximal element.

To illustrate Zorn’s lemma, consider any non-empty set A and define P(A) to be the set of all subsets of A. P(A) is called the power set of A and is partially ordered under containment ⊆. A chain of P(A) is a set of subsets Ai of A such that for all i, j, either Ai ⊆ Aj or Aj ⊆ Ai. Clearly, the union ∪i Ai is an upper bound of the chain. Then Zorn’s lemma guarantees that P(A) has at least one maximal element. In this case, the maximal element, namely A, is unique. If A is finite, then for the set of all proper subsets of A, a maximal element (under the partial order ⊆) exists by Zorn’s lemma, but is not unique, if #A > 1.

Axiom 2.3. Hausdorff’s maximal principle

Let ≼ be a partial order on a set A. Then there is a maximal chain B of A, that is, if C is any chain of A with B ⊆ C ⊆ A, then C = B.

Finally, let A be a set and P′(A) = P(A) \ {∅}, that is, P′(A) is the set of all non-empty subsets of A. A choice function of A is a function f : P′(A) → A such that for every B ∈ P′(A) we have f(B) ∈ B.

Axiom 2.4. Axiom of choice

Every set has a choice function.

Exercise Set 2.2

2.1
  1. Let G = (V, E) be an undirected graph. Define a relation ρ on the vertex set V of G by: u ρ v if and only if there is a path from u to v. Show that ρ is an equivalence relation on V. What are the equivalence classes for this relation?

  2. Let G = (V, E) be a directed acyclic graph. Define the relation ρ on V as in part 1. Show that ρ is a partial order on V. When is ρ a total order?

2.2 Let f : A → B and g : B → A be functions. Show that if f ∘ g = idB, then g is injective and f is surjective. In particular, f (and also g) is bijective, if f ∘ g = idB and g ∘ f = idA. In this case, we call g the inverse of f and denote this as g = f–1. Show by examples that both the conditions f ∘ g = idB and g ∘ f = idA are necessary for f to be bijective.
2.3 Let f : A → B be a map from a finite set A to a finite set B. Prove that
  1. #A ≤ #B, if f is injective,

  2. #A ≥ #B, if f is surjective, and

  3. #A = #B, if f is bijective.

2.4Let A be a finite set and let f : AA be a map. Show that the following conditions are equivalent.
  1. f is injective.

  2. f is surjective.

  3. f is bijective.

Show by examples that this equivalence need not hold, if A is an infinite set.

2.5 Let A and B be two arbitrary sets, f : A → B a map, A′ ⊆ A and B′ ⊆ B. We define f(A′) = {f(a) | a ∈ A′} and f–1(B′) = {a ∈ A | f(a) ∈ B′}. Show that:
  1. If A′ ⊆ A″ ⊆ A, then f(A′) ⊆ f(A″).

  2. If B′ ⊆ B″ ⊆ B, then f–1(B′) ⊆ f–1(B″).

  3. f–1(f(A′)) ⊇ A′.

  4. f(f–1(B′)) ⊆ B′.

  5. f(f–1(f(A′))) = f(A′).

  6. f–1(f(f–1(B′))) = f–1(B′).

2.6

Russell’s paradox A collection C is called ordinary, if C is not a member of C. A collection which is not ordinary is called extraordinary. Show that the collection of all ordinary collections is neither ordinary nor extraordinary.

2.7 Let Ai, i ∈ I, be a family of sets (not necessarily pairwise disjoint). For each i ∈ I, consider the set Bi = Ai × {i}. Show that the sets Bi, i ∈ I, are pairwise disjoint. The union ∪i∈I Bi is called the disjoint union of Ai, i ∈ I.

2.3. Groups

So far we have studied sets as unordered collections. However, things start getting interesting if we define one or more binary operations on sets. Such operations define structures on sets, and we compare different sets in light of their respective structures. Groups are the first (and simplest) examples of sets with binary operations.

Definition 2.1.

A binary operation on a set A is a map from A × A to A. If ◊ is a binary operation on A, it is customary to write a ◊ a′ to denote the image of (a, a′) (under ◊).

For example, addition, subtraction and multiplication are all binary operations on ℤ (or ℚ or ℝ). Subtraction is not a binary operation on ℕ, since, for example, 2 – 3 is not an element of ℕ. Division is not a binary operation on ℚ, since division by zero is not defined. Division is a binary operation on ℚ*.

2.3.1. Definition and Basic Properties

Definition 2.2.

A group[1] (G, ◊) is a set G together with a binary operation ◊ on G satisfying the following three conditions:

[1] In binary operations and algebras generally there is a morass of terminology which reflects on the literacy of the promulgators. Starting for example with a poor choice, namely “group”, we now have “semigroup” (why?), “loop” (why?), “groupoid”, and “partial groupoid”. . . .Among other poor choices are “ring”, “field”, “ideal”, “category theory”, and “universal algebra”. “Ideal” was used by Dedekind in a sense which made sense to mathematicians of that day but it does not today. “Field” can best be labeled as ridiculous. As to categories of category theory, the concept of category is too broad for that reduction. It is not good taste to take such a term and place it in restricted surroundings.

—Preston C. Hammer

  1. Associativity (a ◊ b) ◊ c = a ◊ (b ◊ c) for all a, b, c ∈ G.

  2. Identity element There exists a (unique) element e ∈ G such that e ◊ a = a ◊ e = a for all a ∈ G. The element e is called the identity of G.

  3. Inverse For each a ∈ G, there exists a (unique) element b ∈ G such that a ◊ b = b ◊ a = e. The element b is called the inverse of a.

    If, in addition, we assume that

  4. Commutativity a ◊ b = b ◊ a for all a, b ∈ G,

    then G is called a commutative or an Abelian group.

A group (G, ◊) is also written in short as G, when the operation ◊ is understood from the context. More often than not, the operation ◊ is either addition (+) or multiplication (·) in which cases we also say that G is respectively an additive or a multiplicative group. For a multiplicative group, we often omit the multiplication sign and denote a · b simply as ab. The identity in an additive group is usually denoted by 0, whereas that in a multiplicative group by 1. The inverse of an element a in these cases is denoted respectively by –a and a–1. Groups written additively are usually Abelian, but groups written multiplicatively need not be so.

Note that associativity allows us to write a ◊ b ◊ c unambiguously to represent (a ◊ b) ◊ c = a ◊ (b ◊ c). More generally, if a1, . . . , an ∈ G, then a1 ◊ ··· ◊ an represents a unique element of the group irrespective of how we insert brackets to compute the element a1 ◊ ··· ◊ an.

Example 2.1.
  1. The set ℤ is an Abelian group under addition. The identity is 0 and the inverse of a is –a. Note, however, that ℤ is not a group under multiplication, because though it contains the multiplicative identity 1, no element of ℤ other than ±1 has a multiplicative inverse in ℤ.

  2. The set ℚ* of non-zero rational numbers is a group under multiplication. The identity is 1 = 1/1 and the inverse of a/b is b/a.

  3. For a set A, the set of all bijective functions A → A is a group under composition of functions. The identity element is idA and the inverse of f is denoted by f–1. (See also Exercise 2.2.) This group is not Abelian in general.

  4. The set of all m × n matrices with entries from ℝ is a group under matrix addition. On the other hand, the set GL(n, ℝ) of all n × n invertible matrices over ℝ is a group under matrix multiplication and is called the general linear group. Note that GL(n, ℝ) is another example of a group that is not Abelian (for n > 1).

  5. A group G is called finite, if G as a set consists of (only) finitely many elements. Finite groups play an extremely important role in cryptography. Here is our first example of finite groups: Let n be an integer ≥ 2. The set

    ℤn := {0, 1, . . . , n – 1}

    is a group under addition modulo n (that is, add (and subtract) two elements in ℤn as integers and if the result is not in ℤn, take the remainder of division by n). For this group, the identity element is 0 and –a = n – a for a ≠ 0 and –0 = 0. (See Example 2.3 for a formal definition of ℤn.)

  6. For an integer n ≥ 2, define the set

    ℤn* := {a ∈ ℤn | gcd(a, n) = 1}.

    If n is prime, then ℤn* = {1, 2, . . . , n – 1}. The set ℤn* is a group under multiplication modulo n with identity 1. We need a little more machinery than introduced so far in order to prove that every element of ℤn* has a multiplicative inverse modulo n. The other group axioms are easy to check.
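For a small modulus, the group axioms for ℤn and ℤn* can be verified exhaustively. A Python sketch with n = 10 (our own choice of example):

```python
from math import gcd

n = 10
Zn = set(range(n))                                    # {0, 1, ..., 9}
Zn_star = {a for a in range(1, n) if gcd(a, n) == 1}  # {1, 3, 7, 9}

# (Zn, + mod n): closure and additive inverses (-a = n - a, -0 = 0)
assert all((a + b) % n in Zn for a in Zn for b in Zn)
assert all((a + (n - a) % n) % n == 0 for a in Zn)

# (Zn*, * mod n): closure and existence of multiplicative inverses
assert all((a * b) % n in Zn_star for a in Zn_star for b in Zn_star)
assert all(any((a * b) % n == 1 for b in Zn_star) for a in Zn_star)

print(sorted(Zn_star))   # [1, 3, 7, 9]
```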

Proposition 2.1.

Let (G, ◊) be a group and let a, b, c ∈ G. Then a ◊ b = a ◊ c implies b = c. Similarly, a ◊ c = b ◊ c implies a = b. These statements are commonly known as the (left and right) cancellation laws.

Proof

We prove only the left cancellation law. The proof of the other law is similar. Let e denote the identity of G and d the inverse of a. Then b = e ◊ b = (d ◊ a) ◊ b = d ◊ (a ◊ b) = d ◊ (a ◊ c) = (d ◊ a) ◊ c = e ◊ c = c.

2.3.2. Subgroups, Cosets and Quotient Groups

Definition 2.3.

Let (G, ◊) be a group. Then a subset H of G is called a subgroup of G, if H is a group under the operation ◊ inherited from G. For a subset H of G to be a subgroup, it is necessary and sufficient that H is non-empty and closed under the operation ◊ and under inverse. Any subgroup of an Abelian group is also Abelian.

Example 2.2.
  1. For any group G with identity element e, the subsets {e} and G are subgroups of G. They are called the trivial subgroups of G.

  2. For an integer n ≥ 2, the set of all integral multiples of n is an additive subgroup of ℤ and is denoted by nℤ.

  3. The set SL(n, ℝ) consisting of all n × n real matrices of determinant 1 is a subgroup of GL(n, ℝ) and is commonly referred to as the special linear group.

  4. Note that though ℤn in Example 2.1 is a subset of ℤ, it is not a subgroup of ℤ, since it is not closed under the addition of ℤ. It is a group under addition modulo n, which is not the same as integer addition.

Let (G, ◊) be a group. For subsets A and B of G, we denote by AB the set {a ◊ b | a ∈ A, b ∈ B}. In particular, if A = {a} (resp. B = {b}), then AB is denoted by aB (resp. Ab). Note that the sets AB and BA are not necessarily equal. If G is Abelian, then AB = BA.

Definition 2.4.

Let (G, ◊) be a group, H a subgroup of G and a ∈ G. The set aH is called the left coset of a with respect to H and the set Ha is called the right coset of a with respect to H. If G is Abelian, then a left coset is naturally a right coset and vice versa. In that case, we call aH (or Ha) simply a coset.

From now onward, we consider left cosets only and call them cosets. If the underlying group is Abelian, then the two notions coincide. The theory of right cosets can be developed in parallel, but we choose to omit that here. For simplicity, we also assume that the group G is a multiplicative group, so that the operation ◊ is replaced by · (or by mere juxtaposition).

Proposition 2.2.

Let G be a (multiplicative) group and H a subgroup of G. Then, the cosets aH, a ∈ G, partition G. Two cosets aH and bH are equal if and only if a–1b ∈ H. There is a bijective map from aH to bH for every a, b ∈ G.

Proof

We define a relation ~ on G such that a ~ b if and only if a–1b ∈ H. Clearly, a ~ a. Now a ~ b implies a–1b ∈ H, so that b–1a = (a–1b)–1 ∈ H (see Exercise 2.8), that is, b ~ a. Finally, a ~ b and b ~ c imply a ~ c, since a–1c = (a–1b)(b–1c). Thus ~ is an equivalence relation on G and hence by Theorem 2.1 produces a partition of G. We now show that the equivalence class [a] of a ∈ G is the coset aH. This follows from the chain of equivalences: b ∈ [a] if and only if a–1b = h for some h ∈ H, if and only if b = ah for some h ∈ H, if and only if b ∈ aH.

Now we define a map φ : aH → bH by ah ↦ bh for every h ∈ H. The map φ is clearly surjective. Injectivity of φ follows from the left cancellation law (Proposition 2.1). Hence φ is bijective.

The following theorem is an important corollary to the last proposition.

Theorem 2.2. Lagrange’s theorem

Let G be a finite group and H a subgroup of G. Then, the cardinality of G is an integral multiple of the cardinality of H.

Proof

From Proposition 2.2, the cosets form a partition of G and there is a bijective map from one coset to another. Hence by Exercise 2.3 all cosets have the same cardinality. Finally, note that H is the coset of the identity element.

Definition 2.5.

Let G be a group and H a subgroup of G. The number of distinct cosets of H in G is called the index of H in G and is denoted by [G : H]. If G is finite, then [G : H] = #G/#H.
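A small Python sketch (with G = ℤ12 and H = {0, 4, 8}, our own choice of example) that lists the distinct cosets and confirms #G = #H · [G : H]:

```python
n = 12
G = set(range(n))                  # the additive group Z_12
H = {0, 4, 8}                      # the subgroup generated by 4

# distinct cosets a + H (frozensets so they can be collected in a set)
cosets = {frozenset((a + h) % n for h in H) for a in G}

assert len(G) == len(H) * len(cosets)   # Lagrange: #G = #H * [G : H]
print(sorted(sorted(c) for c in cosets))
# [[0, 4, 8], [1, 5, 9], [2, 6, 10], [3, 7, 11]]
```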

Definition 2.6.

Let H be a subgroup of a (multiplicative) group G. Then H is called a normal subgroup of G, if (aH)(bH) = (ab)H for all a, b ∈ G. It is clear that any subgroup H of an Abelian group G satisfies this condition and hence is normal.

If H is a normal subgroup of a group G, then the cosets aH, a ∈ G, form a group with multiplication defined by (aH)(bH) = (ab)H. This group is called the quotient group of G with respect to H and is denoted by G/H.

Example 2.3.
  1. Let n be an integer ≥ 2. The subgroup nℤ of (ℤ, +) (Example 2.2) is normal, since ℤ is Abelian. The coset of a ∈ ℤ is the set a + nℤ = {a + kn | k ∈ ℤ}. The quotient group ℤ/nℤ is denoted as ℤn and is essentially the same as the group {0, 1, . . . , n – 1} with the operation of addition modulo n (Example 2.1).

  2. For any group G with identity e, the trivial subgroups G and {e} are normal. G/G is a group with a single element, whereas G/{e} is essentially the same as the group G.

2.3.3. Homomorphisms

Definition 2.7.

Let (G, ◊) and (G′, ⊙) be groups. A function f : G → G′ is called a homomorphism (of groups), if f(a ◊ b) = f(a) ⊙ f(b) for all a, b ∈ G, that is, if f commutes with the group operations of G and G′.

A group homomorphism f : G → G′ is called an isomorphism, if there exists a group homomorphism g : G′ → G such that g ∘ f = idG and f ∘ g = idG′. It can be easily seen that a homomorphism f : G → G′ is an isomorphism if and only if f is bijective as a function.[2] If there exists an isomorphism f : G → G′, we say that the groups G and G′ are isomorphic and write G ≅ G′.

[2] If f : GG′ is a bijective homomorphism, its inverse f–1 : G′ → G is bijective as a function. However, it is not obvious that f–1 has to be a group homomorphism. We are lucky here; f–1 is.

A homomorphism f from G to itself is called an endomorphism (of G). An endomorphism which is also an isomorphism is called an automorphism. The set of all automorphisms of a group G is a group under function composition. We denote this group by Aut G.

Example 2.4.
  1. The canonical inclusion a ↦ a/1 is a group homomorphism from (ℤ, +) to (ℚ, +). More generally, if H is a subgroup of G, then the inclusion map h ↦ h for all h ∈ H is a group homomorphism from H to G. In particular, the identity map on any group G is an automorphism of G (and is the identity element of the group Aut G).

  2. For a (multiplicative) group G and a normal subgroup H, the map G → G/H that takes a ∈ G to its coset aH is a surjective group homomorphism. It is called the canonical surjection of G onto G/H. For example, the map that takes a to its remainder of division by n (≥ 2) is a canonical surjection from the additive group ℤ onto the quotient group ℤn. (Also see Examples 2.1, 2.2 and 2.3.)

  3. The map that takes a complex number z = a + ib to its conjugate z̄ = a – ib is a group automorphism of both (ℂ, +) and (ℂ*, ·).
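The canonical surjection of item 2 can be spot-checked numerically. A Python sketch with n = 7 (our own choice), verifying the homomorphism property and surjectivity over a small range:

```python
n = 7
f = lambda a: a % n   # candidate homomorphism (Z, +) -> (Z_7, + mod 7)

# f(a + b) = f(a) + f(b) (mod n), spot-checked over a small range
for a in range(-25, 25):
    for b in range(-25, 25):
        assert f(a + b) == (f(a) + f(b)) % n

# f is surjective onto {0, 1, ..., 6}
assert {f(a) for a in range(-25, 25)} == set(range(n))
```

Python's `%` operator returns a non-negative remainder even for negative arguments, which is exactly the "remainder of division by n" used in the text.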

Proposition 2.3.

Let f be a group homomorphism from (G, ◊) to (G′, ⊙). Let e and e′ denote the identity elements of G and G′ respectively. Then f(e) = e′. If a, b ∈ G and c, d ∈ G′ satisfy a ◊ b = e, c ⊙ d = e′ and f(a) = c, then f(b) = d.

Proof

We have e′ ⊙ f(e) = f(e) = f(e ◊ e) = f(e) ⊙ f(e), so that by right cancellation f(e) = e′. To prove the second assertion we note that c ⊙ d = e′ = f(e) = f(a ◊ b) = f(a) ⊙ f(b) = c ⊙ f(b). Thus, by left cancellation, f(b) = d.

Definition 2.8.

With the notations of the last proposition we define the kernel of f to be the following subset of G:

Ker f := {a ∈ G | f(a) = e′}.

We also define the image of f to be the subset

Im f := {f(a) | a ∈ G}

of G′. Then we have the following important theorem.

Theorem 2.3. Isomorphism theorem

Ker f is a normal subgroup of G, Im f is a subgroup of G′, and G/ Ker f ≅ Im f.

Proof

In order to simplify notations, let us assume that G and G′ are multiplicatively written groups. For u, v ∈ Ker f, we have f(uv–1) = f(u)(f(v))–1 = e′, that is, uv–1 ∈ Ker f. By Exercise 2.8, Ker f is a subgroup of G. We now show that it is normal. Note that for a ∈ G and u ∈ Ker f we have f(aua–1) = f(a)f(u)f(a–1) = e′, that is, aua–1 ∈ Ker f, since f(u) = e′ and f(a–1) = f(a)–1. By Exercise 2.10, Ker f is a normal subgroup of G. Now let a′ = f(a) and b′ = f(b) be arbitrary elements of Im f. Then, f(ab–1) = a′(b′)–1, that is, a′(b′)–1 ∈ Im f. Thus, by Exercise 2.8, Im f is a subgroup of G′.

Now define a map φ : G/Ker f → Im f that takes a Ker f ↦ f(a). Let a Ker f = b Ker f. Then by Proposition 2.2, a–1b ∈ Ker f, that is, b = au for some u ∈ Ker f. But then f(b) = f(au) = f(a)f(u) = f(a)e′ = f(a). This shows that the map φ is well-defined. It is easy to check that φ is a group homomorphism. Now φ(a Ker f) = φ(b Ker f) implies f(a) = f(b), that is, f(a–1b) = e′, that is, a–1b ∈ Ker f, that is, a Ker f = b Ker f. Thus φ is injective. It is clearly surjective. Thus φ is bijective and hence an isomorphism from G/Ker f to Im f.

2.3.4. Generators and Orders

Definition 2.9.

Let G be a group. In this section, we assume, unless otherwise stated, that G is multiplicatively written and has identity e. Let ai, i ∈ I, be a family of elements of G. Consider the subset H of G defined as

H := {ai1^(±1) ai2^(±1) ··· air^(±1) | r ≥ 0 and i1, . . . , ir ∈ I},

with the empty product (corresponding to r = 0) being treated as e. It is easy to check that H is a subgroup of G and contains all ai, i ∈ I. We call H the subgroup generated by ai, i ∈ I, or say that the elements ai, i ∈ I, generate H. H is called finitely generated, if it is generated by finitely many elements. In particular, H is called cyclic, if it is generated by a single element. If H is cyclic and generated by g ∈ H, then g is called a generator or a primitive element of H. Note that, in general, a cyclic subgroup has more than one generator (Exercise 2.47).

Example 2.5.
  1. The additive groups ℤ and ℤn are generated by 1 and hence are cyclic. The multiplicative group ℤn* is cyclic if and only if n is 2, 4, pr or 2pr, where p is an odd prime and r ∈ ℕ (see Exercise 2.50). A generator of ℤn* for such an n is often called a primitive root modulo n.

  2. The group (ℚ*, ·) is generated by the “primes” p/1, p ∈ ℙ, and –1.

  3. Let G be a multiplicative group (not necessarily Abelian) with identity e and let a ∈ G. Then the subgroup H generated by a is the set of elements of the form ar, r ∈ ℤ, and is always Abelian. If H is finite, then the elements ar, r ∈ ℕ0, cannot be all distinct, that is, as = at for some s, t ∈ ℕ0, s > t. Then as–t = e, where s – t > 0. Now a–1 = as–t–1 and, more generally, a–k = ak(s–t–1). Thus we may consider H to consist of non-negative powers of a only. Let n := min {r ∈ ℕ | ar = e}. It is easy to see that H = {ar | r = 0, . . . , n – 1}.
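For ℤn* these notions are easy to compute by brute force. A Python sketch (the helper name mult_order is ours) that finds all generators, that is, the primitive roots, of the cyclic group ℤ11*:

```python
from math import gcd

def mult_order(a, n):
    """Smallest positive r with a^r = 1 (mod n); a must be a unit mod n."""
    k, x = 1, a % n
    while x != 1:
        x = (x * a) % n
        k += 1
    return k

n = 11                                    # prime, so Z_11^* is cyclic
units = [a for a in range(1, n) if gcd(a, n) == 1]
generators = [a for a in units if mult_order(a, n) == len(units)]
print(generators)   # primitive roots modulo 11: [2, 6, 7, 8]
```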

Definition 2.10.

Let G be a finite group with identity e. The order of G is defined to be the cardinality of the set G and is denoted by ord G. The order of an element a ∈ G is the cardinality of the subgroup of G generated by a and is denoted by ordG a or simply by ord a, when G is understood from the context.

With these notations we prove the following important proposition.

Proposition 2.4.

The order m := ordG a of a ∈ G is the smallest of the positive integers r for which ar = e. If n = ord G, then n is an integral multiple of m. In particular, an = e.

Proof

Let H be the (cyclic) subgroup of G generated by a. Then by Example 2.5, H = {ar | r = 0, . . . , m – 1} and m is the smallest of the positive integers r for which ar = e. By Lagrange’s theorem (Theorem 2.2), n is an integral multiple of m. That is, n = km for some k ∈ ℕ. But then an = (am)k = ek = e.
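Proposition 2.4 can be checked exhaustively for a small group. A Python sketch with G = ℤ13* (our own choice of example):

```python
p = 13
group = list(range(1, p))        # the multiplicative group Z_13^*
n = len(group)                   # ord G = p - 1 = 12

def order(a):
    """Smallest positive r with a^r = 1 (mod p)."""
    k, x = 1, a % p
    while x != 1:
        x = (x * a) % p
        k += 1
    return k

for a in group:
    m = order(a)
    assert n % m == 0            # ord G is an integral multiple of ord a
    assert pow(a, n, p) == 1     # hence a^n = e
```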

Lemma 2.1.

Let G be a finite cyclic group. Then any subgroup of G is also cyclic.

Proof

Let G be generated by g and ord G = n. Then G = {gr | r = 0, . . . , n – 1}. The subgroup {e} of G is clearly cyclic. For an arbitrary subgroup H ≠ {e} of G, define k := min {r ∈ ℕ | gr ∈ H}. Now take any gr ∈ H and write r = qk + δ, where q and δ are respectively the quotient and remainder of division of r by k with 0 ≤ δ < k. Then gr = (gk)qgδ and so gδ = (gk)–qgr ∈ H. The minimality of k implies that δ = 0, that is, gr = (gk)q. Hence every element of H is a power of gk, that is, H is cyclic with generator gk.

Proposition 2.5.

Let G be a finite cyclic multiplicative group with identity e and let H be a subgroup of G of order m. Then an element a ∈ G is an element of H if and only if am = e.

Proof

If a ∈ H, then am = e by Proposition 2.4. Conversely, assume that am = e, but a ∉ H. Let K be the subgroup of G generated by the elements of H and by a. By Lemma 2.1, K is cyclic. By assumption, K contains more than m elements (since H ∪ {a} ⊆ K). But every element of K has order dividing m, a contradiction.

Finite cyclic groups play a crucial role in public-key cryptography. To see how, let G be a group which is finite, cyclic with generator g and multiplicatively written. Given r ∈ ℕ, one can compute g^r using ≤ 2 lg r + 2 group multiplications (see Algorithms 3.9 and 3.10). This means that if it is easy to multiply elements of G, then it is also easy to compute g^r. On the other hand, there are certain groups for which it is very difficult to find the integer r from the knowledge of g and g^r, even when one is certain that such an integer exists. This is the basic source of security in many cryptographic protocols, like those based on finite fields, elliptic and hyperelliptic curves.
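
The bound of about 2 lg r + 2 multiplications comes from the square-and-multiply method (the book's Algorithms 3.9 and 3.10; the sketch below is an assumed left-to-right variant, not the book's code): each bit of r costs one squaring plus, when the bit is 1, one extra multiplication.

```python
def power_mod(g, r, n):
    """Left-to-right square-and-multiply: return (g^r mod n, multiplications used)."""
    result, mults = 1, 0
    for bit in bin(r)[2:]:              # scan the bits of r, most significant first
        result = (result * result) % n; mults += 1
        if bit == '1':
            result = (result * g) % n; mults += 1
    return result, mults

r = 1000003
val, mults = power_mod(3, r, 2**31 - 1)
assert val == pow(3, r, 2**31 - 1)
assert mults <= 2 * r.bit_length()      # roughly 2 lg r + 2 multiplications
```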

*2.3.5. Sylow’s Theorem

Sylow’s theorem is a powerful tool for studying the structure of finite groups. Recall that if G is a finite group of order n and if H is a subgroup of G of order m, then by Lagrange’s theorem m divides n. But given any divisor m′ of n, there need not exist a subgroup of G of order m′. However, for certain special values of m′, we can prove the existence of subgroups of order m′. Sylow’s theorem considers the case that m′ is a power of a prime.

Definition 2.11.

Let G be a finite group of cardinality n and let p be a prime. If n = p^r for some r ∈ ℕ, we call G a p-group. More generally, let p be a prime divisor of n. Then a p-subgroup of G is a subgroup H of G such that H is a p-group. If H is a p-subgroup of G with cardinality p^r for some r ∈ ℕ, then p^r divides n. Moreover, if p^(r+1) does not divide n, then H is called a p-Sylow subgroup of G.

We shortly prove that p-Sylow subgroups always exist. Before doing that, we prove a simpler result.

Theorem 2.4. Cauchy’s theorem

Let G be a finite group and p a prime dividing ord G. Then G has a subgroup of order p.

Proof

Let n := ord G. Note that if we can find an element a ∈ G such that ord a = p, then the subgroup generated by a is the desired subgroup. To do that, consider the set S consisting of all p-tuples (a1, . . . , ap) with ai ∈ G such that a1 . . . ap = e. S consists of n^(p–1) elements, since we can choose a1, . . . , ap–1 arbitrarily and independently from G and for each such choice of a1, . . . , ap–1 the value of ap = (a1 . . . ap–1)^(–1) gets fixed. Since p divides n, it follows that p divides #S too. Now we define a relation ~ on S by (a1, . . . , ap) ~ (b1, . . . , bp) if and only if (b1, . . . , bp) = (ai, . . . , ap, a1, . . . , ai–1) for some i ∈ {1, . . . , p} (that is, (b1, . . . , bp) is a cyclic shift of (a1, . . . , ap)). It is easy to see that ~ is an equivalence relation on S. The equivalence class of (a1, . . . , ap) contains 1 or p elements depending on whether a1 = · · · = ap or not. Let r and s be the number of equivalence classes containing 1 and p elements of S respectively. Then r + sp = #S = n^(p–1), so that p divides r. Since the equivalence class of (e, . . . , e) contains only one element, we must have r ≥ 1, and hence r ≥ p. This, in turn, proves the existence of a ∈ G, a ≠ e, such that (a, . . . , a) ∈ S. But then a^p = e.
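
Cauchy's theorem can be verified directly in small groups. This sketch (an illustration, not the book's) searches (ℤ/nℤ)* for an element of prime order p, which the theorem guarantees whenever p divides the group order.

```python
from math import gcd

def element_of_order(p, n):
    """Find a in (Z/nZ)* with ord a = p exactly (p prime), or None if absent."""
    for a in range(2, n):
        # a^p = 1 and a != 1 force ord a = p, since ord a divides the prime p
        if gcd(a, n) == 1 and pow(a, p, n) == 1:
            return a
    return None

# (Z/13Z)* has order 12 = 2^2 * 3; Cauchy guarantees elements of order 2 and 3.
assert element_of_order(2, 13) == 12   # 12^2 = 144 = 1 (mod 13)
assert element_of_order(3, 13) == 3    # 3^3 = 27 = 1 (mod 13)
```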

Now we are in a position to prove the general theorem.

Theorem 2.5. Sylow’s theorem

Let G be a finite group of order n and let p be a prime dividing n. Then there exists a p-Sylow subgroup of G.

Proof

We proceed by induction on n. If n = p, then G itself is a p-Sylow subgroup of G. So we assume n > p and write n = p^r m, where p does not divide m. If r = 1, then the theorem follows from Cauchy’s theorem (Theorem 2.4). So we assume r > 1 and consider the class equation of G, namely, #G = #Z(G) + ∑[G : C(a)], where the sum is over a set of pairwise non-conjugate elements a ∉ Z(G) (see Exercise 2.16). If p does not divide [G : C(a)] for some a ∉ Z(G), then #C(a) = #G/[G : C(a)] = p^r m′ < #G for some m′ < m (the full power p^r must divide #C(a)). By induction, C(a) has a p-Sylow subgroup, which is also a p-Sylow subgroup of G. On the other hand, if p divides [G : C(a)] for all a ∉ Z(G), then p divides #Z(G), as can be easily seen from the class equation. We apply Cauchy’s theorem to Z(G) to obtain a subgroup H of Z(G) with #H = p. By Exercise 2.16(b), H is a normal subgroup of G, and we consider the canonical surjection μ : G → G/H. Since #(G/H) = p^(r–1) m < n and r > 1, by induction G/H has a p-Sylow subgroup, say K. But then μ^(–1)(K) is a p-Sylow subgroup of G.

Note that if H is a p-Sylow subgroup of G and g ∈ G, then gHg^(–1) is also a p-Sylow subgroup of G. The converse is also true, that is, if H and H′ are two p-Sylow subgroups of G, then there exists a g ∈ G such that H′ = gHg^(–1). We do not prove this assertion here, but mention the following important consequence of it. If G is Abelian, then H′ = gHg^(–1) = gg^(–1)H = H, that is, there is only one p-Sylow subgroup of G. If G is Abelian with ord G = p1^(r1) · · · pt^(rt) for pairwise distinct primes pi and ri ∈ ℕ, then G is the internal direct product of its pi-Sylow subgroups, i = 1, . . . , t (Exercises 2.17 and 2.19).

Exercise Set 2.3

2.8 Let G be a multiplicatively written group (not necessarily Abelian). Prove the following assertions.
  1. For all elements a, b ∈ G, we have (ab)^(–1) = b^(–1)a^(–1) and (a^(–1))^(–1) = a.

  2. A subset H of G is a subgroup of G if and only if H is non-empty and ab^(–1) ∈ H for all a, b ∈ H.

2.9 Let G be a multiplicatively written group and let H and K be subgroups of G. Show that:
  1. H ∩ K is a subgroup of G.

  2. H ∪ K is a subgroup of G if and only if H ⊆ K or K ⊆ H.

  3. HK := {hk | h ∈ H, k ∈ K} is a subgroup of G if and only if HK = KH. In particular, if K is normal in G, then HK is a subgroup of G.

  4. G × G is a group and H × K is a subgroup of G × G.

  5. If g ∈ G, then gHg^(–1) is a subgroup of G.

2.10
  1. Let G be a multiplicatively written group and H a subgroup of G. Show that the following conditions are equivalent:

    1. H is a normal subgroup of G.

    2. ghg^(–1) ∈ H for all g ∈ G and h ∈ H.

    3. gHg^(–1) = H for all g ∈ G.

    4. gH = Hg for all g ∈ G.

  2. Show that if [G : H] = 2, then H is normal.

2.11 Let G be a (multiplicative) group.
  1. Second isomorphism theorem Let H and K be subgroups of G and let K be normal in G. Show that H/(H ∩ K) ≅ (HK)/K. [H]

  2. Third isomorphism theorem Let H and K be normal subgroups of G with H ⊆ K. Show that G/K ≅ (G/H)/(K/H) (where K/H denotes the image of K in G/H). [H]

2.12
  1. Show that the only automorphisms of the group (ℤ, +) are the identity map and the map that sends a ↦ –a.

  2. Show that the group of automorphisms of (ℚ, +) is isomorphic to (ℚ*, ·).

2.13 Let H be the subgroup of G generated by ai, i ∈ I. Show that H is the smallest subgroup of G that contains all of ai, i ∈ I.
2.14 Let f : G → G′ be a homomorphism of (multiplicative) groups. Show that:
  1. If H is a subgroup of G, then H′ := f(H) is a subgroup of G′. If f is surjective and H is normal, then H′ is also normal.

  2. If H′ is a subgroup of G′, then H := f^(–1)(H′) is a subgroup of G. If H′ is normal, then H is also normal.

  3. Correspondence theorem Let H be a normal subgroup of G. Then the subgroups (resp. normal subgroups) of G/H are in one-to-one correspondence with the subgroups (resp. normal subgroups) of G that contain H. [H]

2.15 Let G be a cyclic group. Show that G is isomorphic to ℤ or to ℤn for some n ∈ ℕ, depending on whether G is infinite or finite.
2.16 Let G be a finite (multiplicative) group (not necessarily Abelian).
  1. We define the centre of G to be the set Z(G) := {a ∈ G | ag = ga for all g ∈ G}. Show that Z(G) is a subgroup of G.

  2. If H ⊆ Z(G) is a subgroup of G, show that H is a normal subgroup of G.

  3. The centralizer of a ∈ G is defined to be the set C(a) := {g ∈ G | ga = ag}. Show that C(a) is a subgroup of G. Show also that C(a) = G if and only if a ∈ Z(G).

  4. Define a relation ~ on G by a ~ b if and only if b = gag^(–1) for some g ∈ G. Show that ~ is an equivalence relation on G. We say that the elements a and b of G are conjugate, if the equivalence classes [a] and [b] are the same. The equivalence classes are called the conjugacy classes of G.

  5. Show that the cardinality of the conjugacy class of a ∈ G is equal to the index [G : C(a)].

  6. Deduce the class equation of G, that is, #G = #Z(G) + ∑[G : C(a)], where the sum is over a set of pairwise non-conjugate elements a ∉ Z(G).

2.17 Let G be a (multiplicative) Abelian group with identity e and order p1^(e1) · · · pr^(er), where pi are distinct primes and ei ∈ ℕ. For each i, let Hi be the pi-Sylow subgroup of G. Show that:
  1. G = H1 · · · Hr. [H]

  2. Every element g ∈ G can be written uniquely as g = h1 · · · hr with hi ∈ Hi. Moreover, in that case we have ord_G g = (ord_H1 h1) · · · (ord_Hr hr).

  3. G is cyclic if and only if all of H1, . . . , Hr are cyclic.

2.18 Let G be a finite (multiplicative) Abelian group with identity e. Assume that for every n ∈ ℕ there are at most n elements x of G satisfying x^n = e. Show that G is cyclic. [H]
2.19 Let G be a (multiplicative) group and let H1, . . . , Hr be normal subgroups of G. If G = H1 · · · Hr and every element g ∈ G can be written uniquely as g = h1 · · · hr with hi ∈ Hi, then G is called the internal direct product of H1, . . . , Hr. (For example, if G is finite and Abelian, then by Exercise 2.17 it is the internal direct product of its Sylow subgroups.) Show that:
  1. If G is finite, it is the internal direct product of normal subgroups H1, . . . , Hr if and only if G = H1 · · · Hr and Hi ∩ Hj = {e} for all i, j, i ≠ j.

  2. If G is the internal direct product of the normal subgroups H1, . . . , Hr, then G is isomorphic to the (external) direct product H1 × · · · × Hr. [H]

2.20 Let Hi, i = 1, . . . , r, be finite Abelian groups of orders mi and let H := H1 × · · · × Hr be their direct product. Show that H is cyclic if and only if each Hi is cyclic and m1, . . . , mr are pairwise coprime.

2.4. Rings

So far we have studied algebraic structures with only one operation. Now we study rings, which are sets with two (compatible) binary operations, conventionally denoted by + and ·. One can, of course, adopt more general notations for these operations. However, that generality does not pay much and only complicates matters, so we stick to the conventions.

2.4.1. Definition and Basic Properties

Definition 2.12.

A ring (R, +, ·) (or R in short) is a set R together with two binary operations + and · on R such that the following conditions are satisfied. As in the case of multiplicative groups we write ab for a · b.

  1. Additive group The set R is an Abelian group under +. The additive identity is denoted by 0.

  2. · is associative (ab)c = a(bc) for every a, b, c ∈ R.

  3. · is commutative ab = ba for every a, b ∈ R.

  4. Multiplicative identity There is an element (denoted by 1) in R such that a · 1 = 1 · a = a for every a ∈ R. The element 1 is called the identity of R.

  5. Distributivity The operation · is distributive over +, that is, a(b + c) = ab + ac and (a + b)c = ac + bc for every a, b, c ∈ R.

Notice that it is more conventional to define a ring as an algebraic structure (R, +, ·) that satisfies conditions (1), (2) and (5) only. A ring (by the conventional definition) is called a commutative ring (resp. a ring with identity), if it (additionally) satisfies condition (3) (resp. (4)). As per our definition, a ring is always a commutative ring with identity. Rings that are not commutative or that do not contain the identity element are not used in the rest of the book. So let us be happy with our unconventional definition of a ring.[3]

[3] Cool! But what’s circular in a ring? Historically, such algebraic structures were introduced by Hilbert to designate a Zahlring (a number ring, see Section 2.13). If α is an algebraic integer (Definition 2.95) and we take a Zahlring of the form ℤ[α] and consider the powers α, α^2, α^3, . . . , we eventually get an α^d which can be expressed as a linear combination of the previous (that is, smaller) powers of α. This is perhaps the reason that prompted Hilbert to call such structures “rings”. Also see Footnote 1.

We do not rule out the possibility that 0 = 1 in R. In that case, for any a ∈ R, we have a = a · 1 = a · 0 = 0 (see Proposition 2.6), that is to say, the set R consists of the single element 0. In this case, R is called the zero ring and is denoted (by an abuse of notation) by 0.

Finally, note that R is, in general, not a group under multiplication. This is because we do not expect a ring R to contain the multiplicative inverse of every element of R. Indeed the multiplicative inverse of the element 0 exists if and only if R = 0.

Example 2.6.
  1. The sets ℤ, ℚ, ℝ and ℂ are all rings under usual addition and multiplication. Each of ℚ, ℝ and ℂ contains the multiplicative inverse of every non-zero element, whereas the only elements in ℤ that have multiplicative inverses are ±1.

  2. Let ℤn denote the set {0, 1, . . . , n – 1} for an integer n ≥ 2. Then ℤn is a ring under addition and multiplication modulo n. The additive identity is 0 and the multiplicative identity is 1. Later we see a more formal definition of this ring. Recall from Example 2.1 how we have defined the groups ℤn and ℤn* under addition and multiplication modulo n. These groups have a connection with the ring ℤn, as we will shortly see.

  3. Let R be a ring and S a set. The set of all functions S → R is a ring under pointwise addition and multiplication of functions (that is, if f and g are two such functions, then we define (f + g)(a) := f(a) + g(a) and (fg)(a) := f(a)g(a) for every a ∈ S). The additive (resp. multiplicative) identity in this ring is the constant function 0 (resp. 1).

  4. Let R be a ring. The set R[X] of all polynomials in one indeterminate X and with coefficients from R is a ring. The identity elements in R[X] are the constant polynomials 0 and 1. The addition and multiplication operations in R[X] are the standard ones on polynomials. For a non-zero polynomial f ∈ R[X], the largest non-negative integer d for which the coefficient of X^d is non-zero is called the degree of the polynomial f and is denoted by deg f. The coefficient of X^(deg f) in f is called the leading coefficient of f and is denoted by lc(f). The degree of the zero polynomial is conventionally taken to be –∞. A non-zero polynomial with leading coefficient 1 is called a monic polynomial.

    More generally, for n ∈ ℕ one can define the ring R[X1, . . . , Xn] of multivariate polynomials over R. Polynomial rings are of paramount importance in algebra and number theory. We devote Section 2.6 to a study of these rings.

    We also define the ring R(X) of rational functions over R, which consists of elements of the form f/g with f, g ∈ R[X], g ≠ 0. More generally, the set of elements f/g with f, g ∈ R[X1, . . . , Xn], g ≠ 0, is a ring denoted R(X1, . . . , Xn).

  5. Let Ri, i ∈ I, be a family of rings, and R the product of the sets Ri, i ∈ I, that is, the set of all ordered tuples (ai), ai ∈ Ri, indexed by I. For tuples (ai) and (bi), define the sum (ai) + (bi) := (ai + bi) and the product (ai)(bi) := (aibi). It is easy to see that R is a ring with identity elements 0 = (0) and 1 = (1). It is called the direct product of the rings Ri, i ∈ I. If I is of finite cardinality n and if Ri = A for all i ∈ I, then R is denoted in short by A^n.
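
The ring ℤn of item 2 is easy to experiment with. The following sketch (an illustration, not from the book) verifies the axioms of Definition 2.12 exhaustively in ℤ6.

```python
# Exhaustively check the ring axioms of Definition 2.12 in Z_6 (a sketch).
n = 6
R = range(n)
add = lambda a, b: (a + b) % n
mul = lambda a, b: (a * b) % n

assert all(mul(mul(a, b), c) == mul(a, mul(b, c)) for a in R for b in R for c in R)          # associativity
assert all(mul(a, b) == mul(b, a) for a in R for b in R)                                     # commutativity
assert all(mul(a, 1) == a for a in R)                                                        # identity
assert all(mul(a, add(b, c)) == add(mul(a, b), mul(a, c)) for a in R for b in R for c in R)  # distributivity
print("Z_6 satisfies the ring axioms")
```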

Proposition 2.6.

Let R be a ring. For all a, b ∈ R, we have:

  1. a · 0 = 0 · a = 0

  2. a(–b) = (–a)b = –ab

  3. (–a)(–b) = ab

Proof

  1. a · 0 = a · (0 + 0) = a · 0 + a · 0, so that a · 0 = 0. Similarly, 0 · a = 0.

  2. By (1), 0 = a · 0 = a(b + (–b)) = ab + a(–b), that is, a(–b) = –ab. Similarly, (–a)b = –ab.

  3. (–a)(–b) = –(a(–b)) = –(–ab) = ab.

Definition 2.13.

Let R be a ring.

  1. An element a ∈ R is called a zero-divisor of R, if ab = 0 for some b ∈ R, b ≠ 0. By this definition, 0 is a zero-divisor of R, unless R = 0. The elements 0, 3, 5, 6, 9, 10 and 12 are all the zero-divisors of ℤ15.

  2. An element a ∈ R is called a unit of R, if there exists an element b ∈ R such that ab = 1. The elements 1 and –1 are units in any ring. It is easy to see that an element cannot be simultaneously a zero-divisor and a unit. The set of all units in a ring R is denoted by R* and is a group under the multiplication of the ring R (see Exercise 2.21), called the multiplicative group or the group of units of R. The multiplicative group of the ring ℤn (Example 2.6) is ℤn*.

  3. An element a ∈ R is called nilpotent, if a^k = 0 for some k ∈ ℕ. By this definition, 0 is a nilpotent element in any ring. It is also evident that every nilpotent element in a non-zero ring is a zero-divisor. An example of a non-zero nilpotent element is 2 in the ring ℤ4, since 2^2 = 4 ≡ 0 (mod 4).

  4. An element a ∈ R is called idempotent, if a^2 = a. In every ring, 0 and 1 are idempotent. The element 6 is idempotent in ℤ15, since 6^2 = 36 ≡ 6 (mod 15). It is easy to check that 0 is the only element in a ring that is both nilpotent and idempotent.
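
All four notions of this definition can be computed exhaustively in ℤn. This sketch (not from the book) recovers, for example, the zero-divisors of ℤ15 listed in item 1.

```python
def classify(n):
    """Zero-divisors, units, nilpotents and idempotents of Z_n (Definition 2.13)."""
    R = range(n)
    zero_divisors = [a for a in R if any((a * b) % n == 0 for b in R if b != 0)]
    units = [a for a in R if any((a * b) % n == 1 for b in R)]
    nilpotents = [a for a in R if any(pow(a, k, n) == 0 for k in range(1, n + 1))]
    idempotents = [a for a in R if (a * a) % n == a]
    return zero_divisors, units, nilpotents, idempotents

zd, units, nil, idem = classify(15)
assert zd == [0, 3, 5, 6, 9, 10, 12]    # as stated in Definition 2.13(1)
assert set(zd) & set(units) == set()    # no element is both a zero-divisor and a unit
```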

Definition 2.14.

Let R be a ring.

  1. R is called an integral domain (or simply a domain), if R ≠ 0 and if R contains no non-zero zero-divisors. Examples of integral domains are ℤ, ℚ, ℝ, ℂ and ℤp for prime p. On the other hand, 3 · 5 = 0 in ℤ15, so ℤ15 is not an integral domain.

  2. R is called a field, if R ≠ 0 and if R* = R \ {0}, that is, if every non-zero element of R is a unit. This means that in a field one can divide any element by any non-zero element. The most common fields are ℚ, ℝ and ℂ. Note that ℤ is not a field, since, for example, 2 does not have a multiplicative inverse in ℤ.

  3. A field R with #R finite is called a finite field. The simplest examples of finite fields are the fields ℤp for prime integers p. In fact, it is easy to see that ℤn is a field if and only if n is a prime. Finite fields are widely used for building various cryptographic protocols. See Section 2.9 for a detailed study of finite fields.

Corollary 2.1.

A field is an integral domain.

Proof

Recall from Definition 2.13 that an element in a ring cannot be simultaneously a unit and a zero-divisor.

Definition 2.15.

Let R be a non-zero ring. The characteristic of R, denoted char R, is the smallest positive integer n such that 1 + 1 + · · · + 1 (n times) = 0. If no such integer exists, then we take char R = 0.

ℤ, ℚ, ℝ and ℂ are rings of characteristic zero. If R is a non-zero finite ring, then the elements 1, 1 + 1, 1 + 1 + 1, · · · cannot be all distinct. This shows that there are positive integers m and n, m < n, such that 1 + 1 + · · · + 1 (n times) = 1 + 1 + · · · + 1 (m times). But then 1 + 1 + · · · + 1 (n – m times) = 0. Thus any non-zero finite ring has positive (that is, non-zero) characteristic. If char R = t is positive, then for any a ∈ R one has a + a + · · · + a (t times) = (1 + 1 + · · · + 1)a = 0.
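
For instance, the characteristic of ℤn is n. The following sketch (not from the book) computes it by repeatedly adding 1, exactly as in Definition 2.15.

```python
def char_Zn(n):
    """Characteristic of Z_n: add 1 to itself until the sum becomes 0 modulo n."""
    s, t = 1 % n, 1
    while s != 0:
        s, t = (s + 1) % n, t + 1
    return t

assert char_Zn(2) == 2 and char_Zn(12) == 12   # char Z_n = n for every n >= 2
```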

In what follows, we will often denote by n the element 1 + 1 + · · · + 1 (n times) of any ring. One should not confuse this with the integer n. One can similarly identify a negative integer –n with the ring element –(1 + 1 + · · · + 1) (n times) = (–1) + (–1) + · · · + (–1) (n times).

Proposition 2.7.

Let R be an integral domain of positive characteristic p. Then p is a prime.

Proof

If p is composite, then we can write p = mn with 1 < m < p and 1 < n < p. But then p = mn = 0 (in R). Since R is an integral domain, we must have m = 0 or n = 0 (in R). This contradicts the minimality of p.

2.4.2. Subrings, Ideals and Quotient Rings

Just as we studied subgroups of groups, it is now time to study subrings of rings. It turns out, however, that subrings are not as important for the study of rings as are the subsets called ideals. In fact, it is ideals (and not subrings) that help us construct quotient rings. This does not mean that ideals are “normal” subrings! In fact, ideals are, in general, not subrings at all, and conversely. The formal definitions are waiting!

Definition 2.16.

Let R be a ring. A subset S of R is called a subring of R, if S is a ring under the ring operations of R. In this case, one calls R a superring or a ring extension of S.

If R and S are both fields, then S is often called a subfield of R and R a field extension (or simply an extension) of S. In that case, one also says that SR is a field extension or that R is an extension over S.

ℤ is a subring of ℚ, ℝ and ℂ, whereas ℚ ⊆ ℝ and ℝ ⊆ ℂ are field extensions.

We demand that a ring always contains the multiplicative identity (Definition 2.12). This implies that if S is a subring of R, then for all integers n, the elements n · 1 are also in S (though they need not be pairwise distinct). Similarly, if R and S are fields, then S contains all the elements of the form mn^(–1) for m, n ∈ ℤ with n ≠ 0 in S (cf. Exercise 2.26). Thus 2ℤ, the set of all even integers, is not a subring of ℤ, though it is a subgroup of (ℤ, +) (Example 2.2).

Definition 2.17.

Let R be a ring. A subset 𝔞 of R is called an ideal of R, if 𝔞 is an additive subgroup of (R, +) and if ra ∈ 𝔞 for all r ∈ R and a ∈ 𝔞.[4]

[4] Kummer introduced the concept of ideal numbers. Later Dedekind reformulated Kummer’s notion of ideal numbers to define what we now know as ideals.

In this book, we will use Gothic letters (usually lower case) like 𝔞, 𝔟, 𝔠, 𝔪, 𝔭 to denote ideals.[5]

[5] Mathematicians always run out of symbols. Many believe if it is Gothic, it is just ideal!

The condition for being an ideal is in one sense more stringent than that for being a subring, in that an ideal has to be closed under multiplication by any element of the entire ring. On the other hand, we do not demand that an ideal necessarily contain the identity element 1. In fact, 2ℤ is an ideal of ℤ that is not a subring. Conversely, ℤ is a subring of ℚ but not an ideal of ℚ. Subrings and ideals are different things.

Example 2.7.
  1. Let R be any ring. The subset {0} is an ideal of R, called the zero ideal and denoted also by 0. Similarly, the entire ring R is an ideal of R and is called the unit ideal. Note that if an ideal 𝔞 contains a unit u of R, then 1 = u^(–1)u is also in 𝔞 and so a = a · 1 ∈ 𝔞 for every a ∈ R. It follows that an ideal 𝔞 of R is the unit ideal if and only if 𝔞 contains a unit, a justification for the name.

  2. The integral multiples of an integer n form an ideal of ℤ denoted by nℤ. More generally, for any ring R and for any a ∈ R, the set {ra | r ∈ R} is an ideal of R and is denoted by Ra or aR or 〈a〉. Such an ideal is called a principal ideal. (See also Definition 2.18.)

  3. Let R be a ring and let 𝔞i, i ∈ I, be a family of ideals of R. The intersection ∩ 𝔞i is an ideal of R. The set of finite sums of the form a1 + · · · + ak (where each aj belongs to some 𝔞i) is an ideal of R. It is called the sum of the ideals 𝔞i, i ∈ I, and is denoted by ∑ 𝔞i. The union ∪ 𝔞i is, in general, not an ideal of R. In fact, the sum ∑ 𝔞i is the smallest ideal that contains (the set) ∪ 𝔞i.

Proposition 2.8.

The only ideals of a field are the zero ideal and the unit ideal.

Proof

By definition, every non-zero element of a field is a unit. So a non-zero ideal of a field contains a unit and is therefore the unit ideal (Example 2.7(1)).

Definition 2.18.

Let R be a ring and ai, i ∈ I, a family of elements of R. The ideal generated by ai, i ∈ I, is defined to be the sum ∑ Rai of the principal ideals Rai. We denote this ideal as 〈ai | i ∈ I〉. In this case, we also say that the ideal is generated by ai, i ∈ I. If I is finite, then we say that the ideal is finitely generated. In particular, if #I = 1, then the ideal is a principal ideal (see Example 2.7).

An integral domain every ideal of which is principal is called a principal ideal domain or PID in short. A ring every ideal of which is finitely generated is called Noetherian. Thus principal ideal domains are Noetherian.

Note that an ideal may have different generating sets of varying cardinalities. For example, the unit ideal in any ring is principal, since it is generated by 1. The integers 2 and 3 generate the unit ideal of ℤ, since 1 = (–1) · 2 + 1 · 3. However, neither 2 nor 3 individually generates the unit ideal of ℤ. Indeed, using Bézout’s relation (Proposition 2.16) one can show that for every n ∈ ℕ there is a (minimal) generating set of the unit ideal of ℤ that contains exactly n integers. Interested readers may try to construct such generating sets as an (easy) exercise.
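
Bézout's relation mentioned above is effective: extended Euclidean division, sketched below (an illustration, not the book's algorithm), produces the coefficients showing that ⟨2, 3⟩ is the unit ideal of ℤ.

```python
def ext_gcd(a, b):
    """Extended Euclid: return (g, u, v) with g = gcd(a, b) = u*a + v*b."""
    if b == 0:
        return (a, 1, 0)
    g, u, v = ext_gcd(b, a % b)
    return (g, v, u - (a // b) * v)

g, u, v = ext_gcd(2, 3)
assert g == 1 and 2 * u + 3 * v == 1   # so 1 lies in <2, 3>, the unit ideal of Z
```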

Theorem 2.6.

ℤ is a principal ideal domain.

Proof

The zero ideal is generated by 0. Let 𝔞 be a non-zero ideal of ℤ and let a be the smallest positive integer contained in 𝔞. We claim that 𝔞 = aℤ. Clearly, aℤ ⊆ 𝔞. For the converse, take b ∈ 𝔞. We can write b = aq + r, where q and r are the quotient and the remainder of (Euclidean) division of b by a. Now r = b – aq ∈ 𝔞 and since 0 ≤ r < a, by the choice of a we must have r = 0, so that b = aq ∈ aℤ.

A very similar argument proves the following theorem. The details are left to the reader. Also see Exercise 2.31.

Theorem 2.7.

If K is a field, then K[X] is a principal ideal domain.

We now prove a very important theorem:

Theorem 2.8. Hilbert’s basis theorem

If R is a Noetherian ring, then so is the polynomial ring R[X1, . . . , Xn] for n ∈ ℕ. In particular, the polynomial rings ℤ[X1, . . . , Xn] and K[X1, . . . , Xn] are Noetherian, where K is a field.

Proof

Using induction on n, we can reduce to the case n = 1. So we prove that if R is Noetherian, then R[X] is also Noetherian. Let 𝔞 be a non-zero ideal of R[X]. Assume that 𝔞 is not finitely generated. Then we can inductively choose non-zero polynomials f1, f2, f3, · · · from 𝔞 such that for each i the polynomial fi is one having the smallest degree in 𝔞 \ 〈f1, . . . , fi–1〉. Let di := deg fi. Then d1 ≤ d2 ≤ d3 ≤ · · ·. Let ai denote the leading coefficient of fi. Consider the ideal 𝔟 := 〈a1, a2, a3, . . .〉 in R. By hypothesis, 𝔟 is finitely generated, say, 𝔟 = 〈a1, . . . , ar〉. This, in particular, implies that ar+1 = u1a1 + · · · + urar for some u1, . . . , ur ∈ R. But then the polynomial fr+1 – (u1X^(dr+1–d1) f1 + · · · + urX^(dr+1–dr) fr) belongs to 𝔞 \ 〈f1, . . . , fr〉, is non-zero and has degree < dr+1, a contradiction to the choice of fr+1. Thus 𝔞 must be finitely generated.

Two particular types of ideals are very important in algebra.

Definition 2.19.

Let R be a ring.

  1. An ideal 𝔭 of R is called a prime ideal, if 𝔭 ≠ R and if ab ∈ 𝔭 implies a ∈ 𝔭 or b ∈ 𝔭 for a, b ∈ R. The second condition is equivalent to saying that if a ∉ 𝔭 and b ∉ 𝔭, then the product ab ∉ 𝔭. For a prime integer p, the principal ideal pℤ of ℤ is prime. On the other hand, for a composite integer n the ideal nℤ of ℤ is not prime. For example, 3 ∉ 15ℤ and 5 ∉ 15ℤ, but the product 3 · 5 = 15 ∈ 15ℤ.

  2. An ideal 𝔪 of R is called a maximal ideal, if 𝔪 ≠ R and if for any ideal 𝔞 satisfying 𝔪 ⊆ 𝔞 ⊆ R we have 𝔞 = 𝔪 or 𝔞 = R. This means that there are no non-unit ideals of R properly containing 𝔪. All the ideals pℤ of ℤ for prime integers p are maximal ideals (Corollary 2.3). Next consider the polynomial ring R := ℤ[X] and the principal ideal 〈X〉 of R. It is easy to see that 〈X〉 ⊊ 〈X, 2〉 ⊊ R. Thus 〈X〉 is not maximal.

Prime and maximal ideals can be characterized by some nice equivalent criteria. See Proposition 2.9.

Definition 2.20.

Let R be a ring and 𝔞 an ideal of R. Then 𝔞 is a subgroup of the group (R, +). Since (R, +) is Abelian, 𝔞 is a normal subgroup (Definition 2.6). Thus the cosets a + 𝔞, a ∈ R, form an additive Abelian group. We define multiplication on these cosets as (a + 𝔞)(b + 𝔞) := ab + 𝔞. It is easy to check that this multiplication is well-defined. Furthermore, the set of these cosets, denoted R/𝔞, becomes a ring under this addition and multiplication. The ring R/𝔞 is called the quotient ring of R with respect to 𝔞.

We say that two elements a, b ∈ R are congruent modulo an ideal 𝔞 (of R) and write a ≡ b (mod 𝔞), if a – b ∈ 𝔞. Thus a ≡ b (mod 𝔞) if and only if a and b lie in the same coset of 𝔞, that is, a + 𝔞 = b + 𝔞.

Example 2.8.
  1. For any ring R, the quotient ring R/0 is essentially the same as R and the quotient ring R/R is the zero ring.

  2. The ring ℤn of Example 2.6 is formally defined to be the quotient ring ℤ/nℤ. Convince yourself that both these definitions are equivalent.

Proposition 2.9.

Let R be a ring and 𝔞 an ideal of R.

  1. 𝔞 is a prime ideal of R if and only if R/𝔞 is an integral domain.

  2. 𝔞 is a maximal ideal of R if and only if R/𝔞 is a field.

Proof

  1. Let a, b ∈ R be arbitrary. Then 𝔞 is prime ⇔ ab ∈ 𝔞 implies a ∈ 𝔞 or b ∈ 𝔞 ⇔ (a + 𝔞)(b + 𝔞) = 0 in R/𝔞 implies a + 𝔞 = 0 or b + 𝔞 = 0 ⇔ R/𝔞 is an integral domain.

  2. Let 𝔞 be a maximal ideal. Choose b ∈ R with b ∉ 𝔞, that is, b + 𝔞 ≠ 0. Consider the ideal 𝔞 + Rb. Since 𝔞 is maximal and 𝔞 ⊊ 𝔞 + Rb, we must have 𝔞 + Rb = R. This means that a + cb = 1 for some a ∈ 𝔞 and c ∈ R. Then (c + 𝔞)(b + 𝔞) = cb + 𝔞 = (1 – a) + 𝔞 = 1 + 𝔞, which implies that b + 𝔞 is a unit in R/𝔞. That is, R/𝔞 is a field.

    Conversely, let R/𝔞 be a field. Consider any ideal 𝔟 of R with 𝔞 ⊊ 𝔟 ⊆ R. Choose any b ∈ 𝔟 \ 𝔞. Then b + 𝔞 ≠ 0 in R/𝔞. By hypothesis, there exists c ∈ R such that (c + 𝔞)(b + 𝔞) = 1 + 𝔞, that is, 1 – cb ∈ 𝔞 ⊆ 𝔟. Hence 1 = (1 – cb) + cb ∈ 𝔟, that is, 𝔟 = R.

The last proposition in conjunction with Corollary 2.1 indicates:

Corollary 2.2.

Maximal ideals are prime.

Corollary 2.3.

For every prime p, the quotient ring ℤ/pℤ is a field. In particular, pℤ is a maximal ideal of ℤ.

Proof

Since pℤ is a prime ideal of ℤ, the quotient ℤ/pℤ is an integral domain. But ℤ/pℤ is finite, so by Exercise 2.25 it is a field.
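
Corollary 2.3 can be seen concretely: every non-zero class of ℤ/pℤ has a multiplicative inverse, and since a^(p–1) = 1 for non-zero a (Proposition 2.4 applied to the group of units), that inverse is a^(p–2) mod p. A sketch (not from the book) for p = 13:

```python
# Inverses in the field Z/13Z: a^(p-2) inverts a, because a^(p-1) = 1.
p = 13
for a in range(1, p):
    inv = pow(a, p - 2, p)
    assert (a * inv) % p == 1
print("every non-zero class of Z/13Z is a unit")
```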

2.4.3. Homomorphisms

Recall how we have defined homomorphisms of groups. In a similar manner, we define homomorphisms of rings. A ring homomorphism is a map from one ring to another, which respects addition, multiplication and the identity element. More precisely:

Definition 2.21.

Let R and S be rings. A map f : R → S is called a (ring) homomorphism, if f(a + b) = f(a) + f(b) and f(ab) = f(a)f(b) for all a, b ∈ R, and if f(1) = 1. A homomorphism f : R → S is called an isomorphism, if there exists a homomorphism g : S → R such that g ∘ f = idR and f ∘ g = idS. As in the case of groups, bijectivity of f as a function is both necessary and sufficient for a homomorphism f : R → S to be an isomorphism. If f : R → S is an isomorphism, we write R ≅ S and say that R is isomorphic to S or that R and S are isomorphic.

A homomorphism f : RR is called an endomorphism of R. An automorphism is a bijective endomorphism.

Example 2.9.
  1. For any ring extension R ⊆ S, the canonical inclusion a ↦ a is a homomorphism from R to S. In particular, the identity map on any ring is an automorphism.

  2. Let R be a ring and 𝔞 an ideal of R. The canonical surjection R → R/𝔞 that takes a ↦ a + 𝔞 is a ring homomorphism.

  3. Let R be a ring and let a ∈ R. The map R[X] → R that takes f(X) ↦ f(a) is a ring homomorphism and is called the substitution homomorphism.

  4. The map ℤ → ℤ taking n ↦ –n is not a ring homomorphism, since it maps 1 to –1 (and does not satisfy f(ab) = f(a)f(b) for all a, b ∈ ℤ).

  5. The map ℂ → ℂ that maps z = a + ib to its conjugate a – ib is an automorphism of the field ℂ.

Proposition 2.10.

Let f : R → S be a ring homomorphism.

  1. If a ∈ R is a unit, then f(a) is a unit in S and f(a^(–1)) = (f(a))^(–1).

  2. Let 𝔟 be an ideal in S. Then f^(–1)(𝔟) is an ideal in R. If 𝔟 is prime, then f^(–1)(𝔟) is also prime.

Proof

  1. If ab = 1, then f(a)f(b) = f(ab) = f(1) = 1.

  2. For a, a′ ∈ f^(–1)(𝔟) and b, b′ ∈ 𝔟 with f(a) = b and f(a′) = b′, we have f(a – a′) = b – b′ ∈ 𝔟 and f(ra) = f(r)b ∈ 𝔟 for every r ∈ R. Thus f^(–1)(𝔟) is an ideal of R. If aa′ ∈ f^(–1)(𝔟), then f(a)f(a′) = f(aa′) ∈ 𝔟. If 𝔟 is prime (in which case f^(–1)(𝔟) and 𝔟 are proper ideals of R and S respectively), then f(a) ∈ 𝔟 or f(a′) ∈ 𝔟. But then a ∈ f^(–1)(𝔟) or a′ ∈ f^(–1)(𝔟).

The ideal f^(–1)(𝔟) of the above proposition is called the contraction of 𝔟 and is often denoted by 𝔟 ∩ R. If R ⊆ S and f is the inclusion homomorphism, then f^(–1)(𝔟) is indeed the set-theoretic intersection 𝔟 ∩ R.

Definition 2.22.

Let f : R → S be a ring homomorphism. The set {a ∈ R | f(a) = 0} is called the kernel of f and is denoted by Ker f. The set {f(a) | a ∈ R} is called the image of f and is denoted by f(R) or Im f.

Theorem 2.9. Isomorphism theorem

With the notations of the last definition, Ker f is an ideal of R, Im f is a subring of S and R/ Ker f ≅ Im f.

Proof

Consider the map R/Ker f → Im f that takes a + Ker f ↦ f(a). It is easy to verify that this map is a well-defined ring homomorphism and is bijective. The details are left to the reader. Also see Theorem 2.3.

Definition 2.23.

Two ideals 𝔞 and 𝔟 of a ring R are called relatively prime or coprime if 𝔞 + 𝔟 = R, that is, if there exist a ∈ 𝔞 and b ∈ 𝔟 with a + b = 1.

Theorem 2.10. Chinese remainder theorem (CRT)

Let R be a ring and n ∈ ℕ. Let 𝔞1, . . . , 𝔞n be ideals in R such that for all i, j, i ≠ j, the ideals 𝔞i and 𝔞j are relatively prime. Then R/(𝔞1 ∩ · · · ∩ 𝔞n) is isomorphic to the direct product (R/𝔞1) × · · · × (R/𝔞n).

Proof

The assertion is obvious for n = 1. So assume that n ≥ 2 and define the map φ : R/(𝔞1 ∩ · · · ∩ 𝔞n) → (R/𝔞1) × · · · × (R/𝔞n) by a + (𝔞1 ∩ · · · ∩ 𝔞n) ↦ (a + 𝔞1, . . . , a + 𝔞n) for all a ∈ R. Since 𝔞1 ∩ · · · ∩ 𝔞n ⊆ 𝔞i for all i, the map φ is well-defined. It is easy to see that φ is a ring homomorphism. In order to show that φ is injective, we let φ(a + (𝔞1 ∩ · · · ∩ 𝔞n)) = 0. This means that a + 𝔞i = 𝔞i, that is, a ∈ 𝔞i for all i. Then a ∈ 𝔞1 ∩ · · · ∩ 𝔞n, that is, a + (𝔞1 ∩ · · · ∩ 𝔞n) = 0. The trickier part is to prove that φ is surjective. Let (a1 + 𝔞1, . . . , an + 𝔞n) ∈ (R/𝔞1) × · · · × (R/𝔞n). Let us consider the ideal 𝔞i + ∩j≠i 𝔞j for each i. For a given i, there exist for each j ≠ i elements αj ∈ 𝔞i and βj ∈ 𝔞j with αj + βj = 1. Multiplying these equations shows that we have a γi ∈ 𝔞i such that γi + δi = 1, where δi := ∏j≠i βj ∈ ∩j≠i 𝔞j. (This shows that 𝔞i + ∩j≠i 𝔞j = R for all i.) Now consider the element a := a1δ1 + · · · + anδn. Since δi ≡ 1 (mod 𝔞i) and δj ∈ 𝔞i for j ≠ i, it follows that a ≡ ai (mod 𝔞i) for all i, that is, φ(a + (𝔞1 ∩ · · · ∩ 𝔞n)) = (a1 + 𝔞1, . . . , an + 𝔞n).

In Section 2.5, we will see an interesting application of this theorem. Notice that the injectivity of φ in the last proof does not require the pairwise coprimality of the ideals 𝔞i; only the surjectivity of φ requires this condition.
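
For R = ℤ with 𝔞i = niℤ and pairwise coprime ni, the surjectivity argument in the proof is constructive. The sketch below (an illustration, not the book's algorithm) builds the element a = a1δ1 + · · · + anδn explicitly.

```python
def crt(residues, moduli):
    """Return the unique a modulo N = prod(moduli) with a = r_i (mod n_i),
    assuming the moduli are pairwise coprime (Theorem 2.10 for R = Z)."""
    N = 1
    for n in moduli:
        N *= n
    a = 0
    for r, n in zip(residues, moduli):
        m = N // n
        # m * (m^(-1) mod n) plays the role of delta_i in the proof:
        # it is 1 modulo n and 0 modulo every other modulus
        a += r * m * pow(m, -1, n)
    return a % N

assert crt([2, 3, 2], [3, 5, 7]) == 23   # 23 = 2 (mod 3) = 3 (mod 5) = 2 (mod 7)
```

(The call pow(m, -1, n) for a modular inverse needs Python 3.8 or later.)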

2.4.4. Factorization in Rings

Now we introduce the concept of divisibility in a ring. We also discuss an important type of rings known as unique factorization domains. This study is a natural generalization of that of the rings ℤ and K[X], K a field.

Definition 2.24.

Let R be a ring and a, b, p ∈ R. Also let K be a field.

  1. We say that a divides b and write a|b, if there exists an element c ∈ R such that b = ac. If a does not divide b, we write a∤b. In ℤ, for example, –31|899, since 899 = (–31) · (–29). By this definition, any element divides 0, whereas 0 divides no element other than 0.

  2. It is easy to see that a|b and b|a if and only if b = ca for some unit c ∈ R*. In that case, we say that a and b are associates of each other. The relation of being associate is an equivalence relation on R (or R \ {0}), as can be easily verified. The only associates of a ∈ ℤ, a ≠ 0, are ±a, since ±1 are the only units in ℤ. Two non-zero polynomials f and g of K[X] are associates if and only if f = αg for some α ∈ K*.

  3. A non-zero non-unit p ∈ R is called a prime, if p|ab implies either p|a or p|b. One can check easily that p is prime if and only if the principal ideal 〈p〉 = pR is a prime ideal.

  4. A non-zero non-unit p ∈ R is called irreducible, if p = ab implies that either a or b is a unit.

Note that for ℤ the concepts of prime and irreducible elements coincide. This is indeed true for any PID (Proposition 2.12). Thus our conventional definition of a prime integer p > 0 as one which has only 1 and p as (positive) divisors tallies with the definition of irreducible elements above. For the ring K[X], on the other hand, it is more customary to talk about irreducible polynomials instead of prime polynomials; they are the same thing anyway.

Proposition 2.11.

Let R be an integral domain and p ∈ R a prime. Then p is irreducible.

Proof

Let p = ab. Then p|(ab), so that by hypothesis p|a or p|b. If p|a, then a = up for some u ∈ R. Hence p = ab = upb, that is, (1 − ub)p = 0. Since R is an integral domain and p ≠ 0, we have 1 − ub = 0, that is, ub = 1, that is, b is a unit. Similarly, p|b implies that a is a unit.

Proposition 2.12.

Let R be a PID. An element p ∈ R is prime if and only if p is irreducible.

Proof

[if] Let p be irreducible, but not prime. Then there are a, b ∈ R such that a ∉ 〈p〉 and b ∉ 〈p〉, but ab ∈ 〈p〉. Consider the ideal 〈p〉 + 〈a〉. Since R is a PID, 〈p〉 + 〈a〉 = 〈α〉 for some α ∈ R, and p ∈ 〈α〉 gives p = cα for some c ∈ R. By hypothesis, p is irreducible, so that either c or α is a unit. If c is a unit, 〈p〉 = 〈α〉 = 〈p〉 + 〈a〉, that is, a ∈ 〈p〉, a contradiction. So α is a unit. Then 〈p〉 + 〈a〉 = R, which implies that there are elements u, v ∈ R such that up + va = 1. Similarly, there are elements u′, v′ ∈ R such that u′p + v′b = 1. Multiplying these two equations gives (uu′p + uv′b + u′va)p + (vv′)ab = 1. Now ab ∈ 〈p〉, so that ab = wp for some w ∈ R. But then (uu′p + uv′b + u′va + vv′w)p = 1, which shows that p is a unit, a contradiction.

[only if] Immediate from Proposition 2.11.

Definition 2.25.

An integral domain R is called a unique factorization domain, or a UFD in short, if every non-zero element a ∈ R can be written as a product a = up1 · · · pr, where u is a unit and p1, . . . , pr are prime elements (not necessarily distinct) of R. Moreover, such a factorization is unique up to permutation of the primes p1, . . . , pr and up to multiplication of the primes by units. This factorization can also be written as a = u q1^(α1) · · · qs^(αs), where u is a unit, q1, . . . , qs are pairwise non-associate primes and αi > 0 for i = 1, . . . , s. Some authors also use the term factorial ring or factorial domain in order to describe a UFD.

If p ∈ R is a prime and a ∈ R, a ≠ 0, then the multiplicity of p in a is the non-negative integer v such that p^v | a, but p^(v+1) ∤ a. This integer v is denoted by vp(a). It is clear from the definition that for every a ∈ R, a ≠ 0, there exist only finitely many non-associate primes p for which vp(a) > 0.

Proposition 2.13.

Let R be a UFD. An element p ∈ R is prime if and only if p is irreducible.

Proof

The only if part is immediate from Proposition 2.11. For proving the if part, let p = up1 · · · pr (with u a unit and the pi primes in R) be irreducible. If r = 0, p is a unit, a contradiction. If r > 1, then p can be written as the product of the two non-units up1 · · · pr−1 and pr, again a contradiction. So r = 1, that is, p is an associate of the prime p1 and is hence itself prime.

A classical example of an integral domain that is not a UFD is ℤ[√−5]. In this ring, we have two essentially different factorizations of 6 into irreducible elements: 6 = 2 · 3 = (1 + √−5)(1 − √−5). The failure of irreducible elements to be primes in such rings is a serious defect, and not an easy one to patch up!

Theorem 2.11.

A PID is a UFD.

Proof

Let R be a PID and a ∈ R, a ≠ 0. We show that a has a factorization of the form a = up1 · · · pr, where u is a unit and p1, . . . , pr are prime elements of R. If a is a unit, we are done. So assume that a =: a0 is a non-unit and consider the ideal 〈a0〉. Since 〈a0〉 ≠ R, there is a maximal (and hence prime) ideal 〈p1〉 containing 〈a0〉 (Exercise 2.23). Then p1 is a prime that divides a0. Let a0 = a1p1. We have 〈a0〉 ⊆ 〈a1〉. If 〈a1〉 is the unit ideal, we are done. Otherwise we choose as before a prime p2 dividing a1 and with a1 = a2p2 get the ideal 〈a2〉 properly containing 〈a1〉. Repeating this process we can generate a strictly ascending chain 〈a0〉 ⊊ 〈a1〉 ⊊ 〈a2〉 ⊊ · · · of ideals of R. Since R is a PID and hence Noetherian, this process must stop after finitely many steps (Exercise 2.33).

The converse of the above theorem is not necessarily true. For example, the polynomial ring K[X1, . . . , Xn] over a field K is a UFD for every n ≥ 1, but not a PID for n ≥ 2.

Divisibility in a UFD can be rephrased in terms of prime factorizations. Let R be a UFD and let the non-zero elements a, b ∈ R have the prime factorizations a = u p1^(α1) · · · pr^(αr) and b = u′ p1^(β1) · · · pr^(βr) with units u, u′, pairwise non-associate primes p1, . . . , pr and with αi ≥ 0 and βi ≥ 0. Then a|b if and only if αi ≤ βi for all i = 1, . . . , r. This notion leads to the following definitions.

Definition 2.26.

Let R be a UFD and let a, b ∈ R have prime factorizations as in the last paragraph. Any associate of p1^(min(α1,β1)) · · · pr^(min(αr,βr)) is called a greatest common divisor of a and b and is denoted by gcd(a, b). Clearly, gcd(a, b) is unique up to multiplication by units of R. Similarly, any associate of p1^(max(α1,β1)) · · · pr^(max(αr,βr)) is called a least common multiple of a and b and is denoted by lcm(a, b). lcm(a, b) is again unique up to multiplication by units of R. The gcd of a ≠ 0 and 0 is taken to be an associate of a, whereas gcd(0, 0) is undefined. On the other hand, lcm(a, 0) is defined to be 0 for any a ∈ R.

It is clear that these definitions of gcd and lcm can be readily generalized for any arbitrary finite number of elements.

Corollary 2.4.

Let R be a UFD and a, b ∈ R not both zero. Then gcd(a, b) · lcm(a, b) is an associate of ab.

Proof

Immediate from the definitions.

Corollary 2.5.

Let R be a UFD and a, b, c ∈ R with a|bc. If gcd(a, c) = 1, then a|b.

Proof

Consider the prime factorizations of a, b and c.

For a PID, the gcd and lcm have equivalent characterizations.

Proposition 2.14.

Let R be a PID and a, b be non-zero elements of R. Let d be a gcd of a and b. Then 〈d〉 = 〈a〉 + 〈b〉. If f is an lcm of a and b, then 〈f〉 = 〈a〉 ∩ 〈b〉.

Proof

Let 〈a〉 + 〈b〉 = 〈c〉. We show that c and d are associates. There exist u, v ∈ R such that ua + vb = c. Since d|a and d|b, we have d|c. On the other hand, a ∈ 〈c〉, so that c|a. Similarly c|b. Considering the prime factorizations of a and b, one can then readily verify that c|d. The proof for the second part is similar and is left to the reader.

A direct corollary to the last proposition is the following.

Corollary 2.6.

Let R be a PID, a, b ∈ R (not both zero) and d a gcd of a and b. Then there are elements u, v ∈ R such that ua + vb = d. In particular, the ideals 〈a〉 and 〈b〉 are relatively prime if and only if gcd(a, b) is a unit. In that case, we also say that the elements a and b are relatively prime or coprime.

This completes our short survey of factorization in rings. Note that ℤ and K[X] (for a field K) are PIDs and hence UFDs. Thus all the results we have proved in this section apply equally well to both these rings. It is because of this (and not a mere coincidence) that these two rings enjoy many common properties. Our abstract treatment thus saves us the duplicate effort of proving the same results once for integers (Section 2.5) and once more for polynomials (Section 2.6).

Exercise Set 2.4

2.21For a non-zero ring R, prove the following assertions:
  1. A unit of R is not a zero-divisor.

  2. The product of two units of R is again a unit.

  3. The product of two non-units of R is again a non-unit.

  4. The element 0 is not a unit in R.

  5. The element 1 is always a unit in R.

  6. If a is a unit and ab = ac, then b = c.

Let K be a field. What are the units in the polynomial ring K[X]? In K[X1, . . . , Xn]? In the ring K(X) of rational functions? In K(X1, . . . , Xn)?

2.22

Binomial theorem Let R be a ring, a, b ∈ R and n ∈ ℕ. Show that

(a + b)^n = Σ_{i=0}^{n} C(n, i) a^i b^(n−i),

where C(n, i) := n!/(i!(n − i)!) are the binomial coefficients.

2.23Show that every non-zero ring has a maximal (and hence prime) ideal. More generally, show that every non-unit ideal of a non-zero ring is contained in a maximal ideal. [H]
2.24Let R be a ring.
  1. Show that the set of all nilpotent elements of R is an ideal of R. This ideal is called the nilradical of R.

  2. Show that the quotient ring of R by its nilradical has no non-zero nilpotent elements. (This ring is called the reduction of R and is often written as Rred. If the nilradical of R is the zero ideal, then we say that R is reduced. Thus Rred is always reduced.)

  3. Show that the nilradical of R is the intersection of the prime ideals of R. [H]

2.25Show that a finite integral domain R is a field. [H]
2.26Let R be a ring of characteristic 0. Show that:
  1. R contains infinitely many elements.

  2. If R is an integral domain, then R contains as subring an isomorphic copy of ℤ.

  3. If R is a field, then R contains as subfield an isomorphic copy of ℚ.

2.27Let f : R → S be a ring-homomorphism and let 𝔞 and 𝔟 be ideals in R and S respectively. Find examples to corroborate the following statements.
  1. Let a ∈ R be such that f(a) is a unit in S. Then a need not be a unit in R.

  2. The set f(𝔞) need not be an ideal of S.

  3. If 𝔟 is maximal, then f^(−1)(𝔟) need not be maximal.

2.28Let K be a field.
  1. Show that a homomorphism from K to any non-zero ring is injective.

  2. Let L be another field and let f : K → L and g : L → K be homomorphisms such that g ∘ f = idK. Show that f and g are isomorphisms.

2.29
  1. Show that a ring R is an integral domain if and only if 0 is a prime ideal of R.

  2. Give an example of a reduced ring that is not an integral domain. (Note that an integral domain is always reduced.)

2.30Let R be a ring and let 𝔞 and 𝔟 be ideals of R with 𝔞 ⊆ 𝔟. Show that 𝔟/𝔞 is an ideal of R/𝔞 and that (R/𝔞)/(𝔟/𝔞) ≅ R/𝔟. [H]
2.31An integral domain R is called a Euclidean domain (ED) if there is a map ν : R \ {0} → ℕ ∪ {0} satisfying the following two conditions:
  1. ν(a) ≤ ν(ab) for all a, b ∈ R \ {0}.

  2. For every a, b ∈ R with b ≠ 0, there exist (not necessarily unique) q, r ∈ R such that a = qb + r with r = 0 or ν(r) < ν(b).

Show that:

  1. ℤ is a Euclidean domain with ν(a) = |a| for a ≠ 0.

  2. The polynomial ring K[X] over a field K is a Euclidean domain with ν(a) = deg a for a ≠ 0.

  3. For d = −2, −1, 2, 3, the ring

    ℤ[√d] := {a + b√d | a, b ∈ ℤ}

    is a Euclidean domain with ν(a + b√d) := |a² − db²|, a, b ∈ ℤ, not both 0.

  4. A Euclidean domain is a PID (and hence a UFD).

2.32Let R be a ring and 𝔞 ⊆ R an ideal. Consider the set

√𝔞 := {a ∈ R | a^n ∈ 𝔞 for some n ∈ ℕ}.

Show that √𝔞 is an ideal of R. It is called the radical or root of 𝔞. If 𝔞 = √𝔞, then 𝔞 is called a radical or a root ideal. For arbitrary ideals 𝔞 and 𝔟 of R, prove the following assertions.

  1. 𝔞 ⊆ √𝔞.

  2. √(√𝔞) = √𝔞.

  3. If 𝔞 ⊆ 𝔟, then √𝔞 ⊆ √𝔟.

  4. If 𝔞 is a prime ideal, then √𝔞 = 𝔞.

  5. √𝔞 = R if and only if 𝔞 = R.

  6. √(𝔞𝔟) = √(𝔞 ∩ 𝔟) = √𝔞 ∩ √𝔟.

  7. √(𝔞 + 𝔟) = √(√𝔞 + √𝔟).

  8. The nilradical of R equals √〈0〉.

2.33Let R be a ring. An ascending chain of ideals is a sequence 𝔞0 ⊆ 𝔞1 ⊆ 𝔞2 ⊆ · · · of ideals of R. The ascending chain is called stationary, if there is some n0 ∈ ℕ such that 𝔞n = 𝔞n0 for all n ≥ n0. Show that the following conditions are equivalent. [H]
  1. R is Noetherian (that is, every ideal of R is finitely generated).

  2. Every ascending chain of ideals in R is stationary.

  3. Every non-empty set of ideals of R has a maximal element.

2.34
  1. Let R be an integral domain. Define the set S := R × (R \ {0}). Define a relation ~ on S as (a, b) ~ (c, d) if and only if ad = bc. Show that ~ is an equivalence relation on S. Let us denote the equivalence class of (a, b) ∈ S by a/b and the set of all equivalence classes of S under ~ by K.

  2. Now define (a/b)+(c/d) := (ad+bc)/(bd) and (a/b)·(c/d) := (ac)/(bd). Show that these definitions make K a field. This field is called the quotient field of R and is denoted as Q(R). This process resembles the formation of rational numbers from the integers. Indeed, Q(ℤ) = ℚ.

2.5. Integers

The set ℤ of integers is the main object of study in this section. We use many results from previous sections to derive properties of integers. Recall that ℤ is a PID and hence a UFD.

2.5.1. Divisibility

The notions of divisibility, prime and relatively prime integers, gcd and lcm of integers are essentially the same as discussed in connection with a PID or a UFD. We avoid repeating the definitions here, but concentrate on other useful properties of integers, not covered so far. We only mention that whenever we talk about a prime integer, or the gcd or lcm of two or more integers, we will usually refer to a non-negative integer. This convention makes primes, gcds and lcms unique.

Theorem 2.12.

There are infinitely many prime integers.

Proof

Let n ∈ ℕ be arbitrary and let p1, p2, . . . , pn be n distinct primes. The (non-zero non-unit) integer q := p1p2 · · · pn + 1 is divisible by none of p1, . . . , pn and hence must have a prime divisor pn+1 different from p1, . . . , pn. The result then follows by induction on n (and the fact that the set of primes is non-empty).

Theorem 2.13.

For an integer a and an integer b ≠ 0, there exist unique integers q and r such that a = qb + r with 0 ≤ r < |b|.

Proof

Let r be the smallest non-negative element of the set {a − cb | c ∈ ℤ} and let q be the corresponding value of c. Then these integers q and r satisfy the desired properties. To prove the uniqueness, let a = q1b + r1 = q2b + r2, where 0 ≤ r1 < |b| and 0 ≤ r2 < |b|. But then (q2 − q1)b = r1 − r2 with −|b| < r1 − r2 < |b|. Since b|(r1 − r2), we must then have r1 − r2 = 0, that is, r1 = r2, which, in turn, implies that q1 = q2.

The integers q and r in the above theorem are respectively called the quotient and the remainder of Euclidean division of a by b and are denoted respectively by a quot b and a rem b. Do not confuse Euclidean division with division (that is, the inverse of multiplication) in the field ℚ. Euclidean division is the basis of the Euclidean gcd algorithm. More specifically:
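The quot and rem operations above can be sketched in a few lines (our illustration, not the text's). Note that Python's built-in `//` and `%` floor toward minus infinity, so a small correction is needed when b < 0 to enforce 0 ≤ r < |b|.

```python
def quot_rem(a, b):
    """Euclidean division: return (q, r) with a == q*b + r and 0 <= r < |b|."""
    if b == 0:
        raise ZeroDivisionError("b must be non-zero")
    q, r = a // b, a % b          # Python floors toward -infinity
    if r < 0:                     # can only happen when b < 0
        q, r = q + 1, r - b
    return q, r

assert quot_rem(17, 5) == (3, 2)
assert quot_rem(-17, 5) == (-4, 3)    # -17 = (-4)*5 + 3
assert quot_rem(17, -5) == (-3, 2)    # 17 = (-3)*(-5) + 2
```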

Proposition 2.15.

For integers a, b with b ≠ 0, let r be the remainder of Euclidean division of a by b. Then gcd(a, b) = gcd(b, r).

Proof

Clearly, 〈a〉 + 〈b〉 = 〈r〉 + 〈b〉. Now use Proposition 2.14.
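Proposition 2.15 justifies the familiar Euclidean gcd algorithm: repeatedly replace (a, b) by (b, a rem b) until the second entry vanishes. A minimal sketch (ours, not the text's):

```python
def gcd(a, b):
    """Euclidean algorithm, based on gcd(a, b) = gcd(b, a rem b)."""
    a, b = abs(a), abs(b)          # convention: return the non-negative gcd
    while b != 0:
        a, b = b, a % b
    return a

assert gcd(899, -31) == 31         # 899 = (-31)*(-29), cf. the example above
assert gcd(0, 7) == 7              # gcd(0, a) is an associate of a
```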

Proposition 2.16.

Let a and b be two integers, not both zero, and let d be the (positive) gcd of a and b. Then there are integers u and v such that d = ua + vb. (Such an equality is called a Bézout relation.) Furthermore, if a and b are both non-zero and (|a|, |b|) ≠ (1, 1), then u and v can be so chosen that |u| < |b| and |v| < |a|.

Proof

The existence of u and v follows immediately from Proposition 2.14. If a = qb, then u = 0 and v = 1 is a suitable choice. So assume that a ∤ b and b ∤ a, in which case d < |a| and d < |b|. We may assume, without loss of generality, that a and b are positive. First note that if (u, v) satisfies the Bézout relation, then for any k ∈ ℤ the pair (u + kb, v − ka) also satisfies the same relation. So we may replace v by its remainder of Euclidean division by a and may assume |v| < a. But then |u|a − b < |u|a − d ≤ |ua − d| = |v|b ≤ (a − 1)b, which implies |u| < b.

The notions of the gcd and of the Bézout relation can be generalized to any finite number of integers a1, . . . , an as

gcd(a1, . . . , an) = gcd(· · · (gcd(gcd(a1, a2), a3) · · ·), an) = u1a1 + · · · + unan

for some integers u1, . . . , un (provided that all the gcds mentioned are defined).
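The coefficients u and v of a Bézout relation can be computed alongside the gcd by the extended Euclidean algorithm, sketched below (our code, not the text's):

```python
def ext_gcd(a, b):
    """Return (d, u, v) with d = gcd(a, b) = u*a + v*b (a Bezout relation)."""
    old_r, r = a, b
    old_u, u = 1, 0
    old_v, v = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r    # remainders, as in the plain algorithm
        old_u, u = u, old_u - q * u    # carry the coefficients of a along
        old_v, v = v, old_v - q * v    # carry the coefficients of b along
    return old_r, old_u, old_v

d, u, v = ext_gcd(240, 46)
assert d == 2 and u * 240 + v * 46 == 2
```

The invariant maintained by the loop is old_r = old_u·a + old_v·b, so the final remainder comes with its Bézout coefficients for free.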

2.5.2. Congruences

Since ℤ is a PID, congruence modulo a non-zero ideal of ℤ can be rephrased in terms of congruence modulo a positive integer as follows.

Definition 2.27.

Let n ∈ ℕ. Two integers a and b are said to be congruent modulo n, denoted a ≡ b (mod n), if n|(a − b), that is, if the remainders of Euclidean division of a and b by n are the same. In terms of ideals, this is the same as a ≡ b (mod 〈n〉) (see Definition 2.20). Congruence is an equivalence relation on ℤ, the equivalence classes being the cosets of the ideal 〈n〉 = nℤ of ℤ.

By an abuse of notation, we often denote the equivalence class [a] of a ∈ ℤ simply by a. The following are some basic properties of congruent integers.

Proposition 2.17.

Let n ∈ ℕ, a ≡ b (mod n) and c ≡ d (mod n). Then:

  1. a ± c ≡ b ± d (mod n).

  2. ac ≡ bd (mod n).

  3. For any polynomial f(X) ∈ ℤ[X], we have f(a) ≡ f(b) (mod n).

  4. If n′|n, then a ≡ b (mod n′).

  5. If m|a and m|b, then a/m ≡ b/m (mod n/gcd(n, m)).

Proof

(1) and (2) follow from the consideration of the quotient ring ℤ/nℤ. (3) follows from repeated applications of (1) and (2). For the proof of (4), consider a − b = kn and n = k′n′ for some k, k′ ∈ ℤ. For proving (5), take a − b = kn = lm. Then m/gcd(n, m) divides k(n/gcd(n, m)). Since m/gcd(n, m) and n/gcd(n, m) are coprime, by Corollary 2.5 l′ := k/(m/gcd(n, m)) is an integer and we have a/m − b/m = l = kn/m = l′(n/gcd(n, m)).

Let n1, . . . , nr ∈ ℕ with gcd(ni, nj) = 1 for i ≠ j. Then lcm(n1, . . . , nr) = n1 · · · nr, and by the Chinese remainder theorem (Theorem 2.10), we have ℤ/(n1 · · · nr)ℤ ≅ (ℤ/n1ℤ) × · · · × (ℤ/nrℤ).

This implies that, given integers a1, . . . , ar, there exists an integer x unique modulo n1 · · · nr such that x satisfies the following congruences simultaneously:

x ≡ a1 (mod n1)
x ≡ a2 (mod n2)
  ⋮
x ≡ ar (mod nr)

We now give a procedure for constructing the integer x explicitly. Define N := n1 · · · nr and Ni := N/ni for 1 ≤ i ≤ r. Then for each i we have gcd(ni, Ni) = 1 and, therefore, there are integers ui and vi with uini + viNi = 1. Then x ≡ a1v1N1 + · · · + arvrNr (mod N) is the desired solution.
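This explicit construction can be coded directly (a sketch of ours; the function name `crt` is not from the text):

```python
from math import prod

def crt(residues, moduli):
    """Solve x = a_i (mod n_i) for pairwise coprime moduli, as in the text:
    with N = n_1...n_r and N_i = N/n_i, pick v_i with v_i*N_i = 1 (mod n_i);
    then x = sum of a_i*v_i*N_i, reduced mod N."""
    N = prod(moduli)
    x = 0
    for a_i, n_i in zip(residues, moduli):
        N_i = N // n_i
        v_i = pow(N_i, -1, n_i)   # inverse exists since gcd(n_i, N_i) = 1
        x += a_i * v_i * N_i
    return x % N

x = crt([2, 3, 2], [3, 5, 7])
assert x == 23
assert all(x % n == a for a, n in zip([2, 3, 2], [3, 5, 7]))
```

(The three-argument `pow` with exponent −1 computes a modular inverse and requires Python 3.8 or later.)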

Let n ∈ ℕ. We now study the multiplicative group (ℤ/nℤ)* of the ring ℤ/nℤ. We say that an integer a has a multiplicative inverse modulo n, if [a] ∈ (ℤ/nℤ)*, or, equivalently, if there is an integer b with ab ≡ 1 (mod n). The following proposition is an important characterization of the elements of (ℤ/nℤ)*.

Proposition 2.18.

(The equivalence class of) an integer a belongs to (ℤ/nℤ)* if and only if gcd(a, n) = 1.

Proof

[if] By Proposition 2.16, there exist integers u and v such that ua + vn = 1. But then ua ≡ 1 (mod n).

[only if] If ua ≡ 1 (mod n) for some integer u, then ua + vn = 1 for some integer v, which implies that the gcd of a and n divides 1 and hence is equal to 1.
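In code, the inverse promised by this proposition comes from a Bézout relation ua + vn = 1; Python 3.8+ exposes exactly this via the three-argument `pow` (our illustration, not the text's):

```python
# a is invertible mod n iff gcd(a, n) = 1; from u*a + v*n = 1, u is the
# inverse of a modulo n.
assert pow(7, -1, 30) == 13          # 7*13 = 91 = 3*30 + 1

try:
    pow(6, -1, 30)                   # gcd(6, 30) = 6 != 1: no inverse exists
except ValueError:
    pass                             # Python signals the failure this way
```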

Definition 2.28.

The cardinality of (ℤ/nℤ)* is denoted by φ(n). By Proposition 2.18, φ(n) is equal to the number of integers between 0 and n − 1 (both inclusive) that are relatively prime to n. The function φ : ℕ → ℕ is called Euler’s totient function. For example, for a prime p we have (ℤ/pℤ)* = {1, . . . , p − 1}, so φ(p) = p − 1.

The following two theorems are immediate consequences of Proposition 2.4.

Theorem 2.14. Euler’s theorem

Let n ∈ ℕ and a ∈ ℤ with gcd(a, n) = 1. Then

a^φ(n) ≡ 1 (mod n).

Theorem 2.15. Fermat’s little theorem

Let p be a prime and a ∈ ℤ with gcd(a, p) = 1. Then

a^(p−1) ≡ 1 (mod p).

For any integer b, one has b^p ≡ b (mod p).

Theorem 2.16. Wilson’s theorem

For every prime p, we have (p – 1)! ≡ –1 (mod p).

Proof

The result holds for p = 2. So assume that p is an odd prime. Since ℤ/pℤ is a field, Fermat’s little theorem gives the factorization

Equation 2.1

X^(p−1) − 1 ≡ (X − 1)(X − 2) · · · (X − (p − 1)) (mod p).

Comparing the constant terms on the two sides proves Wilson’s theorem.

The structure of the group (ℤ/pℤ)*, p prime, can be easily deduced from Fermat’s little theorem. This gives us the following important result.

Proposition 2.19.

For a prime p, the group (ℤ/pℤ)* is cyclic.

Proof

For every divisor d of p − 1, we have X^(p−1) − 1 = (X^d − 1)f(X) for some f(X) ∈ ℤ[X] with deg f = p − 1 − d. By Congruence 2.1, X^(p−1) − 1 has p − 1 roots modulo p. Since ℤ/pℤ is a field, f(X) (mod p) cannot have more than p − 1 − d roots (Proposition 2.25), and it follows that X^d − 1 has exactly d roots modulo p. In particular, if q^e divides p − 1 for a prime q and an integer e ≥ 1, then there exist exactly q^e elements of (ℤ/pℤ)* of order dividing q^e and exactly q^(e−1) elements of order dividing q^(e−1), that is, there are q^e − q^(e−1) > 0 elements of (ℤ/pℤ)* of order q^e. If p − 1 = q1^(e1) · · · qr^(er) is the canonical prime factorization of p − 1 (with each ei ≥ 1), by the above argument there exists an element gi of (ℤ/pℤ)* of order qi^(ei) for each i = 1, . . . , r. It is now easy to check that g1 · · · gr has order q1^(e1) · · · qr^(er) = p − 1.
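Cyclicity can be observed directly for a small prime by computing element orders (a brute-force sketch of ours, fine for small p, not an efficient generator-finding method):

```python
def multiplicative_order(a, p):
    """Order of a in (Z/pZ)*, p prime, by repeated multiplication."""
    x, k = a % p, 1
    while x != 1:
        x = x * a % p
        k += 1
    return k

# (Z/11Z)* is cyclic of order 10: some element attains the full order,
# and 2 is one such generator.
assert max(multiplicative_order(a, 11) for a in range(1, 11)) == 10
assert multiplicative_order(2, 11) == 10
```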

Euler’s totient function plays an extremely important role in number theory (and cryptology). We now describe a method for computing it.

Lemma 2.2.

If n and n′ are relatively prime positive integers, then φ(nn′) = φ(n)φ(n′).

Proof

If a is invertible modulo nn′, then clearly it is invertible modulo both n and n′. Conversely, suppose ua ≡ 1 (mod n) and u′a′ ≡ 1 (mod n′). By the Chinese remainder theorem there are integers x and α, unique modulo nn′, satisfying x ≡ u (mod n), x ≡ u′ (mod n′), α ≡ a (mod n) and α ≡ a′ (mod n′). But then xα ≡ 1 (mod nn′). Therefore, #(ℤ/nn′ℤ)* = #(ℤ/nℤ)* · #(ℤ/n′ℤ)*, whence the lemma follows.

Lemma 2.3.

If p is a prime and e ∈ ℕ, then φ(p^e) = p^e − p^(e−1) = p^e(1 − 1/p).

Proof

Integers between 0 and p^e − 1 that are relatively prime to p^e are precisely those that are not multiples of p, and there are p^e − p^(e−1) of them.

Proposition 2.20.

Let n = p1^(e1) · · · pr^(er) be the prime factorization of a positive integer n, with pairwise distinct primes p1, . . . , pr and with ei > 0. Then

φ(n) = n(1 − 1/p1) · · · (1 − 1/pr).

Proof

Immediate from Lemmas 2.2 and 2.3.
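Proposition 2.20 translates directly into a small totient routine; the sketch below (ours, not the text's) factors n by trial division and multiplies in a factor (1 − 1/p) for each distinct prime p found:

```python
def phi(n):
    """Euler's totient via phi(n) = n * prod(1 - 1/p) over the distinct
    prime divisors p of n (trial-division factoring; fine for small n)."""
    result = n
    p = 2
    while p * p <= n:
        if n % p == 0:
            while n % p == 0:      # strip all copies of this prime
                n //= p
            result -= result // p  # multiply result by (1 - 1/p), exactly
        p += 1
    if n > 1:                      # one prime factor > sqrt(original n) left
        result -= result // n
    return result

assert phi(1) == 1 and phi(97) == 96   # phi(p) = p - 1 for a prime p
assert phi(360) == 96                  # 360 = 2^3 * 3^2 * 5
```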

By Proposition 2.18, the linear congruence ax ≡ 1 (mod n) is solvable for x if and only if gcd(a, n) = 1. In such a case, the solution is unique modulo n. Now, let us concentrate on the solutions of the general linear congruence:

ax ≡ b (mod n).

Theorem 2.17 characterizes the solutions of this congruence.

Theorem 2.17.

Let d := gcd(a, n). Then the congruence ax ≡ b (mod n) is solvable for x if and only if d|b. A solution of the congruence, if existent, is unique modulo n/d.

Proof

[if] By Proposition 2.17, the given congruence is equivalent to (a/d)x ≡ b/d (mod n/d). Since gcd(a/d, n/d) = 1, the congruence (a/d)x′ ≡ 1 (mod n/d) is solvable for x′. Then a solution for x is x ≡ (b/d)x′ (mod n/d).

[only if] There exists an integer k such that ax + kn = b. This shows that d|b.

To prove the uniqueness, let x and x′ be two integers satisfying the given congruence. But then a(x − x′) ≡ 0 (mod n), that is, (a/d)(x − x′) ≡ 0 (mod n/d), that is, x − x′ ≡ 0 (mod n/d), since gcd(a/d, n/d) = 1.

The last theorem implies that if d|b, then the congruence axb (mod n) has d solutions modulo n. These solutions are given by ξ + r(n/d), r = 0, . . . , d – 1, where ξ is the solution modulo n/d of the congruence (a/d)ξ ≡ b/d (mod n/d).
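The full solution procedure of Theorem 2.17 can be sketched as follows (our code; the function name is an assumption, not from the text):

```python
from math import gcd

def solve_linear_congruence(a, b, n):
    """All solutions mod n of a*x = b (mod n): empty if d = gcd(a, n) does
    not divide b, else the d solutions xi + r*(n/d), r = 0, ..., d-1."""
    d = gcd(a, n)
    if b % d != 0:
        return []
    nd = n // d
    # unique solution mod n/d of (a/d)*xi = b/d (mod n/d)
    xi = (b // d) * pow(a // d, -1, nd) % nd
    return [xi + r * nd for r in range(d)]

assert solve_linear_congruence(6, 4, 10) == [4, 9]   # 6*4 = 24 = 4 (mod 10)
assert solve_linear_congruence(6, 5, 10) == []       # gcd(6, 10) = 2 does not divide 5
```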

2.5.3. Quadratic Residues

In this section, we consider quadratic congruences, that is, congruences of the form ax² + bx + c ≡ 0 (mod n). We start with the simple case n = p, a prime. We assume further that p is odd, so that 2 has a multiplicative inverse mod p. Since we are considering quadratic congruences, we are interested only in those integers a for which gcd(a, p) = 1. In that case, a also has a multiplicative inverse mod p and the above congruence can be written as y² ≡ α (mod p), where y ≡ x + b(2a)^(−1) (mod p) and α ≡ b²(4a²)^(−1) − c·a^(−1) (mod p). This motivates us to provide Definition 2.29.

Definition 2.29.

Let p be an odd prime and a an integer with gcd(a, p) = 1. We say that a is a quadratic residue modulo p, if the congruence x² ≡ a (mod p) has a solution (for x). Otherwise we say that a is a quadratic non-residue modulo p.

If a is a quadratic residue modulo an odd prime p, then the congruence x² ≡ a (mod p) has exactly two solutions. If ξ is one solution, the other solution is p − ξ. It is, therefore, evident that there are exactly (p − 1)/2 quadratic residues and exactly (p − 1)/2 quadratic non-residues modulo p. For example, the quadratic residues modulo p = 11 are 1 = 1² = 10², 3 = 5² = 6², 4 = 2² = 9², 5 = 4² = 7² and 9 = 3² = 8². The quadratic non-residues modulo 11 are, therefore, 2, 6, 7, 8 and 10. We treat 0 neither as a quadratic residue nor as a quadratic non-residue.
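The residues modulo 11 listed above can be reproduced by squaring every non-zero element (a one-line brute-force sketch of ours):

```python
def quadratic_residues(p):
    """The (p-1)/2 quadratic residues modulo an odd prime p, by squaring
    every non-zero residue; each residue appears as x^2 and (p-x)^2."""
    return sorted({x * x % p for x in range(1, p)})

assert quadratic_residues(11) == [1, 3, 4, 5, 9]
```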

Definition 2.30.

Let p be an odd prime and a an integer with gcd(a, p) = 1. The Legendre symbol (a/p) is defined as: (a/p) := 1, if a is a quadratic residue modulo p, and (a/p) := −1, if a is a quadratic non-residue modulo p.

Proposition 2.21.

Let p be an odd prime and a and b integers coprime to p.

  1. Euler’s criterion: (a/p) ≡ a^((p−1)/2) (mod p).

  2. (ab/p) = (a/p)(b/p).

  3. (1/p) = 1, (a²/p) = 1 and (−1/p) = (−1)^((p−1)/2).

  4. If a ≡ b (mod p), then (a/p) = (b/p). In particular, if r is the remainder of Euclidean division of a by p, then (a/p) = (r/p).

Proof

If a is a quadratic residue modulo p, then a ≡ b² (mod p) for some integer b (coprime to p) and by Fermat’s little theorem we have a^((p−1)/2) ≡ b^(p−1) ≡ 1 (mod p). Conversely, the polynomial X^(p−1) − 1 = (X^((p−1)/2) − 1)(X^((p−1)/2) + 1) has p − 1 (distinct) roots mod p (again by Fermat’s little theorem). We have just seen that no quadratic residue is a root of X^((p−1)/2) + 1. Since ℤ/pℤ is a field, the (p − 1)/2 roots of X^((p−1)/2) − 1 are precisely all the quadratic residues modulo p; the non-residues are then exactly the roots of X^((p−1)/2) + 1. This proves Euler’s criterion. The other statements are immediate consequences of this.
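Euler's criterion gives an immediate way to evaluate the Legendre symbol with one modular exponentiation (our sketch, not the text's):

```python
def legendre(a, p):
    """Legendre symbol via Euler's criterion: a^((p-1)/2) mod p equals
    1 for residues and p-1 (i.e. -1) for non-residues; p an odd prime,
    gcd(a, p) = 1 assumed."""
    t = pow(a, (p - 1) // 2, p)
    return -1 if t == p - 1 else t

# the residues mod 11 are exactly 1, 3, 4, 5, 9, as listed in the text
assert [a for a in range(1, 11) if legendre(a, 11) == 1] == [1, 3, 4, 5, 9]
```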

Euler’s criterion gives us a nice way to check if a given integer is a quadratic residue modulo an odd prime. While this is much faster than the brute-force strategy of enumerating all the quadratic residues, it is still not the best solution, because it involves a modular exponentiation. We can, however, employ a gcd-like procedure for a faster computation. The development of this method demands further results which are otherwise interesting in themselves as well. The first important result is known as the law of quadratic reciprocity (Theorem 2.18 below). Gauss was the first to prove it and he deemed the result so important that he gave eight proofs for it. At present about two hundred published proofs of this law exist in the literature. We go in the classical way, that is, the Gaussian way, because the proof, though somewhat long, is elementary.

Lemma 2.4. Gauss

Let p be an odd prime and a an integer with gcd(a, p) = 1. Let us denote t := (p − 1)/2. For an integer i, let ri be the unique integer with ri ≡ ia (mod p) and −t ≤ ri ≤ t. Let n be the number of i, 1 ≤ i ≤ t, for which ri is negative. Then (a/p) = (−1)^n.

Proof

It is easy to check that ri ≢ ±rj (mod p) for all i ≠ j with 1 ≤ i, j ≤ t. Thus |ri|, i = 1, . . . , t, are precisely (a permuted version of) the integers 1, . . . , t. Thus a^t · t! = (1a)(2a) · · · (ta) ≡ r1r2 · · · rt ≡ (−1)^n |r1||r2| · · · |rt| = (−1)^n t! (mod p). Canceling t! and using Proposition 2.21(1) gives the desired result.
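Gauss's lemma is itself computable: count the i ≤ (p − 1)/2 whose least absolute residue of ia is negative. The sketch below (ours) cross-checks it against Euler's criterion for a small prime:

```python
def legendre_gauss(a, p):
    """Legendre symbol via Gauss's lemma: r_i is negative exactly when
    (i*a mod p) exceeds t = (p-1)/2; the symbol is (-1)^n."""
    t = (p - 1) // 2
    n = sum(1 for i in range(1, t + 1) if (i * a) % p > t)
    return (-1) ** n

# agreement with Euler's criterion for every a coprime to p = 23
p = 23
for a in range(1, p):
    assert legendre_gauss(a, p) == (1 if pow(a, (p - 1) // 2, p) == 1 else -1)
```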

Definition 2.31.

Let . The largest integer smaller than or equal to x is called the floor of x and is denoted by ⌊x⌋. Similarly, the smallest integer larger than or equal to x is called the ceiling of x and is denoted by ⌈x⌉.

Corollary 2.7.

With the notations of Lemma 2.4, we have n ≡ Σ_{j=1}^{t} ⌊2ja/p⌋ (mod 2). If a is odd, then n ≡ Σ_{j=1}^{t} ⌊ja/p⌋ (mod 2). In particular, (2/p) = (−1)^((p²−1)/8), that is, 2 is a quadratic residue mod p if and only if p ≡ ±1 (mod 8).

Proof

Since 2⌊ja/p⌋ is even and ⌊2ja/p⌋ = 2⌊ja/p⌋ + ⌊2(ja rem p)/p⌋, it follows that if rj > 0, then ⌊2ja/p⌋ is even, and if rj < 0, then ⌊2ja/p⌋ is odd. Therefore, n ≡ Σ_{j=1}^{t} ⌊2ja/p⌋ (mod 2).

If a is odd, p + a is even. Also 4 is a quadratic residue modulo p. So (2/p)(a/p) = (2a/p) = (4 · ((p + a)/2) / p) = ((p + a)/2 / p) = (−1)^m, where, by the first part, m ≡ Σ_{j=1}^{t} ⌊j(p + a)/p⌋ = Σ_{j=1}^{t} j + Σ_{j=1}^{t} ⌊ja/p⌋ ≡ (p² − 1)/8 + Σ_{j=1}^{t} ⌊ja/p⌋ (mod 2), since Σ_{j=1}^{t} j = t(t + 1)/2 = (p² − 1)/8. Putting a = 1 gives (2/p) = (−1)^((p²−1)/8) and, therefore, n ≡ m + (p² − 1)/8 ≡ Σ_{j=1}^{t} ⌊ja/p⌋ (mod 2).

Theorem 2.18. Law of quadratic reciprocity

Let p and q be distinct odd primes. Then (p/q)(q/p) = (−1)^(((p−1)/2)·((q−1)/2)).

Proof

By Corollary 2.7, (p/q) = (−1)^m and (q/p) = (−1)^n, where m = Σ_{x=1}^{s} ⌊px/q⌋, n = Σ_{y=1}^{t} ⌊qy/p⌋, s = (q − 1)/2 and t = (p − 1)/2. So we are done, if we can show that m + n = st. Consider the set S := {(x, y) | 1 ≤ x ≤ s, 1 ≤ y ≤ t} of cardinality st. Now S is the disjoint union of S1 and S2, where S1 := {(x, y) ∈ S | qy < px} and S2 := {(x, y) ∈ S | qy > px}. (Note that we cannot have px = qy.) It is easy to see that #S1 = m and #S2 = n.

To demonstrate how we can use the results deduced so far, let us compute (360/997). Since 360 = 2³ · 3² · 5, we have

(360/997) = (2/997)³ (3/997)² (5/997) = (2/997)(5/997) = (−1)(997/5) = (−1)(2/5) = (−1)(−1) = 1,

using (2/997) = −1 (since 997 ≡ 5 (mod 8)), quadratic reciprocity with 5 ≡ 1 (mod 4), 997 ≡ 2 (mod 5), and (2/5) = −1.

Thus 360 is a quadratic residue modulo 997. The apparent attractiveness of this method is offset by the fact that it demands the factorization of several integers, and as such it does not lead to a practical algorithm. We indeed need further machinery in order to have an efficient algorithm. First, we define a generalization of the Legendre symbol.

Definition 2.32.

Let a, b be integers with b > 0 and odd. We define the Jacobi symbol (a/b) as: (a/b) := 1, if b = 1; (a/b) := 0, if gcd(a, b) > 1; and (a/b) := (a/p1) · · · (a/pt) otherwise,

where, in the last case, p1, . . . , pt are all the prime factors of b (not necessarily all distinct) and each (a/pi) is a Legendre symbol.

Note that if (a/b) = −1, then a is not a quadratic residue mod b. However, the converse is not always true, that is, (a/b) = 1 does not necessarily imply that a is a quadratic residue modulo b (example: a = 2 and b = 9). Of course, if b is an odd prime and if gcd(a, b) = 1, the Legendre and Jacobi symbols coincide in value and meaning.

The Jacobi symbol enjoys many properties similar to the Legendre symbol.

Proposition 2.22.

For integers a, a′ and positive odd integers b, b′, we have:

  1. (aa′/b) = (a/b)(a′/b),

  2. (a/bb′) = (a/b)(a/b′), and

  3. if a ≡ a′ (mod b), then (a/b) = (a′/b). In particular, if r is the remainder of Euclidean division of a by b, then (a/b) = (r/b).

Proof

Immediate from the definition and Proposition 2.21.

Theorem 2.19.
  1. For an odd positive integer b, we have (−1/b) = (−1)^((b−1)/2) and (2/b) = (−1)^((b²−1)/8).

  2. If a is another odd positive integer with gcd(a, b) = 1, then (a/b)(b/a) = (−1)^(((a−1)/2)·((b−1)/2)).

Proof

  1. Let b = p1 · · · ps, where the pi are odd primes (not necessarily distinct). Then by definition (−1/b) = (−1/p1) · · · (−1/ps) = (−1)^m, where m = Σ_{i=1}^{s} (pi − 1)/2. Now for odd integers x and y one has (xy − 1)/2 ≡ (x − 1)/2 + (y − 1)/2 (mod 2). Repeated applications of this prove that m ≡ (b − 1)/2 (mod 2). To prove that (2/b) = (−1)^((b²−1)/8), we proceed in a similar manner and note that for odd integers x and y one has ((xy)² − 1)/8 ≡ (x² − 1)/8 + (y² − 1)/8 (mod 2).

  2. If a = q1 · · · qt with odd primes q1, . . . , qt, then by definition

    (a/b)(b/a) = Π_{i=1}^{s} Π_{j=1}^{t} (qj/pi)(pi/qj),

    where from Theorem 2.18 it follows that (a/b)(b/a) = (−1)^μ with μ = Σ_{i=1}^{s} Σ_{j=1}^{t} ((pi − 1)/2)((qj − 1)/2) ≡ ((a − 1)/2)((b − 1)/2) (mod 2), the last congruence following as in Part (1).

Now, we can calculate (a/b) without factoring b as follows.
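The gcd-like procedure alluded to here can be sketched as follows (our code, reconstructing the standard method from Theorem 2.19, not necessarily the book's exact listing): strip factors of 2 using (2/b) = (−1)^((b²−1)/8), flip the symbol by reciprocity, and reduce, exactly as in a Euclidean gcd computation.

```python
def jacobi(a, b):
    """Jacobi symbol (a/b) for odd b > 0, computed without factoring b."""
    assert b > 0 and b % 2 == 1
    a %= b
    result = 1
    while a != 0:
        while a % 2 == 0:
            a //= 2
            if b % 8 in (3, 5):      # (2/b) = -1 exactly when b = +-3 (mod 8)
                result = -result
        a, b = b, a                  # reciprocity for odd coprime arguments
        if a % 4 == 3 and b % 4 == 3:
            result = -result
        a %= b
    return result if b == 1 else 0   # b > 1 at the end means gcd(a, b) > 1

assert jacobi(360, 997) == 1         # the worked example above
assert jacobi(2, 9) == 1             # yet 2 is not a square modulo 9
```

Each iteration at least halves one argument, so the running time is comparable to that of the Euclidean gcd algorithm.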

2.5.4. Some Assorted Topics

So far, we have studied some elementary properties of integers. Number theory is, however, one of the oldest and widest branches of mathematics. Various complex-analytic and algebraic tools have been employed to derive more complicated properties of integers. In Section 2.13, we give a short introductory exposition to algebraic number theory. Here, we mention a collection of useful results from analytic number theory. The proofs of these analytic results would lead us too far away and hence are omitted here. Inquisitive (and/or cynical) readers may consult textbooks on analytic number theory for the details missing here.

The prime number theorem

The famous prime number theorem gives an asymptotic estimate of the density of primes smaller than or equal to a positive real number. Gauss conjectured this result in 1791. Many mathematicians tried to prove it during the 19th century and came up with partial results. Riemann made reasonable progress towards proving the theorem, but could not furnish a complete proof before he died in 1866. It is interesting to mention here that a good portion of the theory of analytic functions (also called holomorphic functions) in complex analysis was developed during these attempts to prove the prime number theorem. The first complete proof of the theorem (based mostly on the ideas of Riemann and Chebyshev) was given independently by the French mathematician Hadamard and by the Belgian mathematician de la Vallée Poussin in 1896. Their proof is regarded as one of the major achievements of modern mathematics. People started believing that any proof of the prime number theorem has to be analytic. Erdös and Selberg destroyed this belief by independently providing the first elementary proof of the theorem in 1949. Here (and elsewhere in mathematics), the adjective elementary refers to something which does not depend on results from analysis or algebra. Caution: Elementary is not synonymous with easy !

Theorem 2.20. Prime Number Theorem

Let π(x) denote the number of primes less than or equal to a real number x > 0. As x → ∞, we have π(x) ~ x/ln x (that is, the ratio π(x)/(x/ln x) → 1). In particular, the density π(n)/n of primes among the natural numbers ≤ n asymptotically approaches 1/ln n as n → ∞. It also follows that the n-th prime is approximately equal to n ln n.

Though the prime number theorem provides an asymptotic estimate (that is, one for x → ∞), for finite values of x (for example, for the values of x in the cryptographic range) it does give good approximations for π(x). Table 2.1 lists π(x) against the rounded values of x/ ln x for x equal to small powers of 10.

Table 2.1. Approximations to π(x)
x       π(x)        x/ln x      x/(ln x − 1)  Li(x)
10^3    168         145         169           178
10^4    1229        1086        1218          1246
10^5    9592        8686        9512          9630
10^6    78,498      72,382      78,030        78,628
10^7    664,579     620,421     661,458       664,918
10^8    5,761,455   5,428,681   5,740,304     5,762,209

Given the prime number theorem, it follows that π(x) is also asymptotic to x/(ln x − ξ) for any fixed real ξ. It turns out that ξ = 1 is the best choice. Gauss’ Li function is also an asymptotic estimate for π(x), where for real x > 0 one defines:

Li(x) := ∫_0^x dt/ln t (the integral taken as a principal value).

Gauss conjectured that Li(x) asymptotically equals π(x). The prime number theorem is, in fact, equivalent to this conjecture. Furthermore, de la Vallée Poussin proved that Li(x) is a better approximation to π(x) than x/(ln x – ξ) for any real ξ. Table 2.1 also lists x/(ln x – 1) and Li(x) against the actual values of π(x).

The asymptotic formula does not, by itself, give explicit bounds on the error π(x) − (x/ln x) for finite x. It has been shown by Dusart [83] that (x/ln x) + 0.992(x/ln² x) ≤ π(x) ≤ (x/ln x) + 1.2762(x/ln² x) for all x > 598.
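The first three columns of Table 2.1 are easy to reproduce; the sketch below (ours, not the text's) counts primes with a simple sieve and compares against the two elementary estimates:

```python
from math import log

def prime_pi(x):
    """pi(x): the number of primes <= x, by a sieve of Eratosthenes."""
    sieve = bytearray([1]) * (x + 1)
    sieve[0:2] = b"\x00\x00"                       # 0 and 1 are not prime
    for p in range(2, int(x ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(sieve[p * p :: p]))
    return sum(sieve)

x = 10 ** 6
assert prime_pi(x) == 78498                        # the pi(x) column
assert round(x / log(x)) == 72382                  # the x/ln x column
assert round(x / (log(x) - 1)) == 78030            # the x/(ln x - 1) column
```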

Density of smooth integers

Integers having only small prime divisors play an interesting role in cryptography and in number theory in general.

Definition 2.33.

Let y > 0. An integer x is called y-smooth (or simply smooth, if y is understood from the context), if all the prime divisors of x are ≤ y. We denote by ψ(x, y) the fraction of positive integers ≤ x that are y-smooth.

The following theorem gives an asymptotic estimate for ψ(x, y).

Theorem 2.21.

Let x, y ∈ ℕ with x > y and let u := ln x/ln y. For u → ∞ and y ≥ ln² x we have the asymptotic formula:

ψ(x, y) = u^(−u+o(u)) = e^(−(1+o(1))u ln u).

In Theorem 2.21, the notation g(u) = o(f(u)) implies that the ratio g(u)/f(u) tends to 0 as u approaches ∞. See Definition 3.1 for more details. An interesting special case of the formula for ψ(x, y) will be used quite often in this book and is given as Corollary 4.1 in Chapter 4.

Like the prime number theorem, Theorem 2.21 gives only an asymptotic estimate, but it is a good approximation for finite values of x, y and u (that is, for the values of practical interest). The most important implication of this theorem is that the density of y-smooth integers in the set {1, . . . , x} is a very sensitive function of u = ln x/ln y and decreases very rapidly as u increases (that is, for a fixed y, as x increases). For example, if y = 15,485,863, the millionth prime, then a random integer ≤ 2^250 is y-smooth with probability approximately 2.12 × 10^(–11), whereas a random integer ≤ 2^500 is y-smooth with probability approximately 2.23 × 10^(–28). (These figures are computed neglecting the o(u) term in the expression for ψ(x, y).) In other words, smaller integers have a higher probability of being smooth (that is, y-smooth for a given y).
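The probabilities quoted above can be reproduced by dropping the o(u) term, so that ψ(x, y) ≈ u^(–u); a small sketch:

```python
from math import log

y = 15_485_863                    # the millionth prime
for bits in (250, 500):
    u = bits * log(2) / log(y)    # u = ln x / ln y for x = 2^bits
    print(bits, u ** -u)          # density estimate u^(-u), o(u) term dropped
```

For bits = 250 this prints a value close to 2.12 × 10^(–11), and for bits = 500 a value close to 2.23 × 10^(–28), matching the figures in the text.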

The extended Riemann hypothesis

The Riemann hypothesis (RH) is one of the deepest unsolved problems in mathematics. An extended version of this hypothesis has important bearings on the solvability of certain computational problems in polynomial time.

Definition 2.34.

The Euler zeta function ζ(s) is defined for a complex variable s with Re s > 1 as

ζ(s) := Σ_{n≥1} 1/n^s = 1 + 1/2^s + 1/3^s + · · · .

The reader may already be familiar with the results ζ(2) = π^2/6 and ζ(4) = π^4/90, and with the fact that the series diverges at s = 1. Riemann (analytically) extended the Euler zeta function to all complex values of s (except at s = 1, where the function has a simple pole). This extended function, called the Riemann zeta function, is known to have zeros at s = –2, –4, –6, . . . . These are called the trivial zeros of ζ(s). It can be proved that all non-trivial zeros of ζ(s) must lie in the so-called critical strip 0 ≤ Re s ≤ 1, and are symmetric about the critical line Re s = 1/2.
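Partial sums of the zeta series converge (slowly, for s close to 1) to the familiar values ζ(2) = π^2/6 and ζ(4) = π^4/90; for instance:

```python
from math import pi

def zeta_partial(s, terms=1_000_000):
    """Partial sum of the zeta series sum_{n>=1} 1/n^s for real s > 1."""
    return sum(n ** -s for n in range(1, terms + 1))

print(zeta_partial(2), pi ** 2 / 6)   # tail of the s = 2 series is about 1/terms
print(zeta_partial(4), pi ** 4 / 90)
```

The s = 2 sum agrees with π^2/6 to about six digits; the s = 4 sum converges much faster.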

Conjecture 2.1. Riemann hypothesis (RH)

All non-trivial zeros of ζ(s) lie on the critical line.

In 1900, Hilbert asserted that proving or disproving the RH is one of the most important problems confronting 20th-century mathematicians. It remains just as important to the mathematicians of the 21st century.

In 1901, von Koch proved that the RH is equivalent to the formula:

Conjecture 2.2. An equivalent form of the Riemann hypothesis

π(x) = Li(x) + O(x^(1/2) ln x)

Here the order notation f(x) = O(g(x)) means that |f(x)/g(x)| remains bounded by a constant for all sufficiently large x (see Definition 3.1).

Hadamard and de la Vallée Poussin proved that

π(x) = Li(x) + O(x e^(–α √(ln x)))

for some positive constant α. While this estimate was sufficient to prove the prime number theorem, the tighter bound of Conjecture 2.2 continues to remain unproved.

Theorem 2.22. Dirichlet’s theorem on primes in arithmetic progression

Let a, b ∈ ℕ be coprime. The set {a + kb | k ∈ ℕ} contains an infinite number of primes.

Dirichlet’s theorem is a powerful generalization of Theorem 2.12 (which corresponds to a = b = 1). One can accordingly generalize the notation π(x) as follows:

Definition 2.35.

Let a, b ∈ ℕ with gcd(a, b) = 1. By π_{a,b}(x), we denote the number of primes in the set {a + kb | k ∈ ℕ} that are ≤ x.

The prime number theorem generalizes to the estimate:

π_{a,b}(x) ~ (1/φ(b)) · (x/ln x),

where φ is Euler’s totient function. The RH now generalizes to:

Conjecture 2.3. Extended Riemann hypothesis (ERH)

For a, b ∈ ℕ with gcd(a, b) = 1,

π_{a,b}(x) = (1/φ(b)) Li(x) + O(x^(1/2) ln x).

Some authors use the expression Generalized Riemann hypothesis (GRH) in place of ERH. Taking b = 1 demonstrates that the ERH implies the RH. The ERH also implies the following:

Conjecture 2.4.

The smallest positive quadratic non-residue modulo a prime p is < 2 ln^2 p.
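This conjectured bound is easy to probe experimentally. The sketch below (the function name `least_qnr` is ours) finds the least positive quadratic non-residue using Euler’s criterion a^((p–1)/2) ≡ –1 (mod p) and compares it with 2 ln^2 p for a few primes:

```python
from math import log

def least_qnr(p):
    """Least a >= 2 that is a quadratic non-residue modulo the odd prime p,
    found with Euler's criterion: a^((p-1)/2) ≡ -1 (mod p)."""
    a = 2
    while pow(a, (p - 1) // 2, p) != p - 1:
        a += 1
    return a

for p in (1009, 10007, 999983):
    n = least_qnr(p)
    bound = 2 * log(p) ** 2
    print(p, n, round(bound, 1))
    assert n < bound          # consistent with Conjecture 2.4
```

As is typical, the least non-residue is tiny compared with the ERH-based bound.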

Exercise Set 2.5

2.35
  1. Show that any integer n ≥ 3 satisfies n^2 = a^2 – b^2 for some a, b ∈ ℕ.

  2. Show that for any integer n ≥ 2 the integer n^4 + 4^n is composite.

2.36 Let n ∈ ℕ and let S be a subset of {1, 2, ..., 2n} of cardinality n + 1. Show that: [H]
  1. There exist x, y ∈ S such that x – y = 1.

  2. There exist x, y ∈ S such that x – y = n.

  3. There exist distinct x, y ∈ S such that x is a multiple of y.

  4. There exist distinct x, y ∈ S such that x is relatively prime to y.

2.37 Show that for any n ∈ ℕ, n > 1, the rational number 1 + 1/2 + 1/3 + · · · + 1/n is not an integer. [H]
2.38
  1. Show that the Mersenne number M_n := 2^n – 1 is prime only if n is prime.

  2. Show that the Fermat number 2^n + 1 is prime only if n = 2^t for some integer t ≥ 0.

2.39 Let n ≥ 2 be a natural number. A complete residue system modulo n is a set of n integers a_1, . . . , a_n such that a_i ≢ a_j (mod n) for i ≠ j. Similarly, a reduced residue system modulo n is a set of φ(n) integers b_1, . . . , b_φ(n) such that gcd(b_i, n) = 1 for all i = 1, . . . , φ(n) and b_i ≢ b_j (mod n) for i ≠ j. Show that:
  1. If {a_1, . . . , a_n} is a complete residue system modulo n, the equivalence classes of a_1, . . . , a_n (modulo the ideal 〈n〉) constitute the set ℤ_n. In other words, given any integer a, there exists a unique i, 1 ≤ i ≤ n, for which a ≡ a_i (mod n).

  2. If {b_1, . . . , b_φ(n)} is a reduced residue system modulo n, then the equivalence classes of b_1, . . . , b_φ(n) constitute the set ℤ_n^*. In other words, given any integer b coprime to n, there exists a unique i, 1 ≤ i ≤ φ(n), for which b ≡ b_i (mod n).

  3. If {a_1, . . . , a_n} is a complete residue system modulo n, then for any integer a coprime to n, the integers aa_1, . . . , aa_n constitute a complete residue system modulo n. For example, if n is odd, then {2, 4, 6, . . . , 2n} is a complete residue system modulo n.

  4. If {b_1, . . . , b_φ(n)} is a reduced residue system modulo n, then for any integer b coprime to n, the integers bb_1, . . . , bb_φ(n) constitute a reduced residue system modulo n.

  5. For n > 2, the integers 1^2, 2^2, . . . , n^2 do not constitute a complete residue system modulo n. [H]

  6. If p is an odd prime and if {a_1, . . . , a_p} and {b_1, . . . , b_p} are two complete residue systems modulo p, then {a_1b_1, . . . , a_pb_p} is not a complete residue system modulo p. [H]

2.40 Prove that the decimal expansion of any rational number a/b is recurring, that is, (eventually) periodic. (A terminating expansion may be viewed as one with recurring 0.) [H]
2.41 Let p be an odd prime. Show that the congruence x^2 ≡ –1 (mod p) is solvable if and only if p ≡ 1 (mod 4). [H]
2.42 Let n ∈ ℕ.
  1. Show that if n > 2, then φ(n) is even.

  2. Show that if n is odd, then φ(n) = φ(2n).

  3. Find out all the values of n for which φ(n) = 12.

2.43For , show that .
2.44 Let n > 2 and gcd(a, n) = 1. Let h be the multiplicative order of a modulo n (that is, the order of a in the group ℤ_n^*). Show that:
  1. a^i ≡ a^j (mod n) if and only if i ≡ j (mod h).

  2. The multiplicative order of a^l modulo n is h/gcd(h, l).

  3. If a is a primitive element of ℤ_n^* (that is, if h = φ(n)), then 1, a, a^2, . . . , a^(h–1) is a reduced residue system modulo n.

  4. If gcd(b, n) = 1 and b has multiplicative order k modulo n and if gcd(h, k) = 1, then the multiplicative order of ab modulo n is hk.

2.45 Devise a criterion for the solvability of ax^2 + bx + c ≡ 0 (mod p), where p is an odd prime and gcd(a, p) = 1. [H]
2.46 Let p be a prime and r ∈ ℕ. An integer a with gcd(a, p) = 1 is called an r-th power residue modulo p, if the congruence x^r ≡ a (mod p) has a solution. Show that a is an r-th power residue modulo p if and only if a^((p–1)/gcd(r, p–1)) ≡ 1 (mod p). This is a generalization of Euler’s criterion for quadratic residues.
2.47 Let G be a finite cyclic group of cardinality n. Show that G is isomorphic to the additive group ℤ_n and that there are exactly φ(n) generators (that is, primitive elements) of G.
2.48 Let m, n ∈ ℕ with m|n. Show that the canonical (surjective) ring homomorphism ℤ_n → ℤ_m induces a surjective group homomorphism ℤ_n^* → ℤ_m^* of the respective groups of units. (Note that every ring homomorphism f : A → B induces a group homomorphism f^* : A^* → B^*, where A^* and B^* are the groups of units of A and B respectively. Even when f is surjective, f^* need not be surjective, in general. As an example consider the canonical surjection ℤ → ℤ_p for a prime p > 3.)
2.49 In this exercise, we investigate which of the groups ℤ_{p^e}^* is cyclic for a prime p and e ∈ ℕ.
  1. Show that ℤ_2^* and ℤ_4^* are cyclic, but ℤ_8^* is not cyclic. Conclude that ℤ_{2^e}^* is not cyclic for e ≥ 3. [H] More specifically, show that for e ≥ 3 the multiplicative group ℤ_{2^e}^* is the direct product of two cyclic subgroups generated by –1 and 5 respectively.

  2. Show that if p is an odd prime and e ∈ ℕ, then ℤ_{p^e}^* is cyclic. [H]

2.50 Show that the multiplicative group ℤ_n^*, n ≥ 2, is cyclic if and only if n = 2, 4, p^e or 2p^e, where p is an odd prime and e ∈ ℕ. [H]

2.6. Polynomials

Unless otherwise stated, in this section we denote by K an arbitrary field and by K[X] the ring of polynomials in one indeterminate X with coefficients from K. Since K[X] is a PID, it enjoys many properties similar to those of ℤ. To start with, we take a look at these properties. Then we introduce the concept of algebraic elements and discuss how irreducible polynomials can be used to construct (algebraic) extensions of fields. When no confusion is likely, we denote a polynomial f(X) simply by f.

2.6.1. Elementary Properties

Since K[X] is a PID and hence a UFD, every polynomial in K[X] can be written essentially uniquely as a product of prime polynomials. Conventionally, prime polynomials are more commonly referred to as irreducible polynomials. As in the case of ℤ, the ring K[X] contains an infinite number of irreducible elements: if K is infinite, then {X – a | a ∈ K} is an infinite set of irreducible polynomials of K[X], and if K is finite, then, as we will see later, there is an irreducible polynomial of degree d in K[X] for every d ∈ ℕ.

It is important to note here that the concept of irreducibility of a polynomial is very much dependent on the field K. If K ⊆ L is a field extension, then a polynomial in K[X] is naturally an element of L[X] also. A polynomial which is irreducible over K need not continue to remain so over L. For example, the polynomial X^2 – 2 is irreducible over ℚ, but reducible over ℝ, since X^2 – 2 = (X – √2)(X + √2), √2 being a real number but not a rational number. As a second example, the polynomial X^2 + 1 is irreducible over both ℚ and ℝ, but not over ℂ. In fact, we will show shortly that an irreducible polynomial in K[X] of degree > 1 becomes reducible over a suitable extension of K.

For polynomials f(X), g(X) ∈ K[X] with g(X) ≠ 0, there exist unique polynomials q(X) and r(X) in K[X] such that f(X) = q(X)g(X) + r(X) with r(X) = 0 or deg r(X) < deg g(X). The polynomials q(X) and r(X) are respectively called the quotient and remainder of polynomial division of f(X) by g(X) and can be obtained by the so-called long division procedure. We use the notations: q(X) = f(X) quot g(X) and r(X) = f(X) rem g(X).
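The long division procedure is straightforward to implement. A sketch for the case K = ℤ_p with p prime, representing a polynomial by its list of coefficients, lowest degree first (the representation and helper name are ours, not the book’s):

```python
def poly_divmod(f, g, p):
    """Quotient and remainder of f divided by g over the field Z_p (p prime).
    Polynomials are lists of coefficients, lowest degree first."""
    f = f[:]                                   # work on a copy
    inv = pow(g[-1], -1, p)                    # inverse of g's leading coefficient
    quot = [0] * max(len(f) - len(g) + 1, 1)
    while len(f) >= len(g) and any(f):
        shift = len(f) - len(g)
        c = f[-1] * inv % p
        quot[shift] = c
        for i, gi in enumerate(g):             # subtract c * X^shift * g
            f[shift + i] = (f[shift + i] - c * gi) % p
        while len(f) > 1 and f[-1] == 0:       # drop leading zeros
            f.pop()
    return quot, f

# (X^3 + 2X + 5) = X * (X^2 + 1) + (X + 5) over Z_7
q, r = poly_divmod([5, 2, 0, 1], [1, 0, 1], 7)
print(q, r)
```

The call prints the quotient [0, 1] (that is, X) and the remainder [5, 1] (that is, X + 5).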

Whenever we talk about the gcd of two non-zero polynomials, we usually refer to the monic gcd, that is, a polynomial with leading coefficient 1. This makes the gcd of two polynomials unique. We have gcd(f(X), g(X)) = gcd(g(X), r(X)), where r(X) = f(X) rem g(X). This gives rise to an algorithm (similar to the Euclidean gcd algorithm for integers) for computing the gcd of two polynomials. Bézout relations also hold for polynomials. More specifically:
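The Euclidean algorithm for polynomials then takes the same shape as for integers; a sketch over K = ℤ_p (coefficient lists, lowest degree first; the helper names are ours):

```python
def poly_rem(f, g, p):
    """Remainder of f modulo g over Z_p; coefficient lists, lowest degree first."""
    f = f[:]
    inv = pow(g[-1], -1, p)
    while len(f) >= len(g) and any(f):
        c = f[-1] * inv % p
        shift = len(f) - len(g)
        for i, gi in enumerate(g):
            f[shift + i] = (f[shift + i] - c * gi) % p
        while len(f) > 1 and f[-1] == 0:
            f.pop()
    return f

def poly_gcd(f, g, p):
    """Monic gcd via the Euclidean algorithm: gcd(f, g) = gcd(g, f rem g)."""
    while any(g):
        f, g = g, poly_rem(f, g, p)
    inv = pow(f[-1], -1, p)
    return [c * inv % p for c in f]            # normalize to a monic polynomial

# X^2 - 1 = (X - 1)(X + 1) and X^2 + X - 2 = (X - 1)(X + 2) share X - 1 over Z_7
print(poly_gcd([6, 0, 1], [5, 1, 1], 7))
```

The result [6, 1] is the monic polynomial X + 6 ≡ X – 1 over ℤ_7.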

Proposition 2.23.

Let f(X), g(X) ∈ K[X], not both zero, and let d(X) be the (monic) gcd of f(X) and g(X). Then there are polynomials u(X), v(X) ∈ K[X] such that d(X) = u(X)f(X) + v(X)g(X). (Such an equality is called a Bézout relation.) Furthermore, if f(X) and g(X) are non-zero and not both constant, then u(X) and v(X) can be so chosen that deg u(X) < deg g(X) and deg v(X) < deg f(X).[6]

[6] Recall that the degree of the zero polynomial is taken to be –∞.

Proof

Similar to the proof of Proposition 2.16.

The concept of congruence can be extended to polynomials, namely, if f(X) ∈ K[X] is non-zero, then two polynomials g(X), h(X) ∈ K[X] are said to be congruent modulo f(X), denoted g(X) ≡ h(X) (mod f(X)), if f(X)|(g(X) – h(X)), that is, if there exists u(X) ∈ K[X] with g(X) – h(X) = u(X)f(X), or equivalently, if g(X) rem f(X) = h(X) rem f(X).

The principal ideals 〈f(X)〉 of K[X] play an important role (as do the ideals 〈n〉 of ℤ). Let us investigate the structure of the quotient ring R := K[X]/〈f(X)〉 for a non-constant polynomial f(X) ∈ K[X]. If r(X) denotes the remainder of division of g(X) ∈ K[X] by f(X), then it is clear that the residue classes of g(X) and r(X) are the same in R. On the other hand, two polynomials g(X), h(X) ∈ K[X] with deg g(X) < deg f(X) and deg h(X) < deg f(X) represent the same residue class in R if and only if g(X) = h(X). Thus elements of R are uniquely representable as polynomials of degrees < deg f(X). In other words, we may represent the ring R as the set {g(X) ∈ K[X] | deg g(X) < deg f(X)} together with addition and multiplication modulo the polynomial f(X). The ring R contains all the constant polynomials a ∈ K, that is, the field K is canonically embedded in R. In general, R is not a field. The next theorem gives the criterion for R to be a field.

Theorem 2.23.

For a non-constant polynomial , the ring K[X]/〈f(X)〉 is a field if and only if f(X) is irreducible in K[X].

Proof

If f(X) is reducible over K, then we can write f(X) = g(X)h(X) for some polynomials g(X), with 1 ≤ deg g < deg f and 1 ≤ deg h < deg f. Then both g and h represent non-zero elements in K[X]/〈f(X)〉, whose product is 0, that is, K[X]/〈f(X)〉 has non-zero zero divisors.

Conversely, if f(X) is irreducible over K and if g(X) is a non-zero polynomial of degree < deg f(X), then gcd(f(X), g(X)) = 1, so that by Proposition 2.23 there exist polynomials u(X), v(X) ∈ K[X] with u(X)f(X) + v(X)g(X) = 1 and deg v(X) < deg f(X). Thus we see that v(X)g(X) ≡ 1 (mod f(X)), that is, g(X) has a multiplicative inverse modulo f(X).

Let L := K[X]/〈f(X)〉 with f(X) irreducible over K. Then K ⊆ L is a field extension. If deg f(X) = 1, then L is isomorphic to K. If deg f(X) ≥ 2, then L is a proper extension of K. This gives us a useful and important way of representing the extension field L, given a representation for K. (For example, see Section 2.9.)
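As a tiny illustration of Theorem 2.23, take K = ℤ_3 and f(X) = X^2 + 1, which is irreducible over ℤ_3 (the squares modulo 3 are 0 and 1, so –1 ≡ 2 is not a square). The quotient ring is then a field with 9 elements, which the following brute-force sketch confirms:

```python
p = 3

def mul(u, v):
    """(u0 + u1*X)(v0 + v1*X) reduced modulo X^2 + 1 (so X^2 ≡ -1) over Z_3."""
    return ((u[0] * v[0] - u[1] * v[1]) % p,
            (u[0] * v[1] + u[1] * v[0]) % p)

elems = [(a, b) for a in range(p) for b in range(p)]
for z in elems:
    if z != (0, 0):
        inverses = [w for w in elems if mul(z, w) == (1, 0)]
        assert len(inverses) == 1          # every non-zero element is a unit
print("Z_3[X]/<X^2+1> is a field with", len(elems), "elements")
```

Each pair (a_0, a_1) stands for the residue class of a_0 + a_1X, exactly the representation by polynomials of degree < deg f discussed above.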

2.6.2. Roots of Polynomials

The study of the roots of a polynomial is the central objective in algebra. We now derive some elementary properties of roots of polynomials.

Definition 2.36.

Let f ∈ K[X]. An element a ∈ K is said to be a root of f, if f(a) = 0.

Proposition 2.24.

Let and . Then f(X) = (Xa)q(X) + f(a) for some . In particular, a is a root of f(X) if and only if Xa divides f(X).

Proof

Polynomial division of f(X) by Xa gives f(X) = (Xa)q(X) + r(X) with deg r(X) < deg(Xa) = 1. Thus r(X) is a constant polynomial. Let us denote r(X) by . Substituting X = a gives f(a) = r.

Proposition 2.25.

A non-zero polynomial f ∈ K[X] with d := deg f can have at most d roots in K.

Proof

We proceed by induction on d. The result clearly holds for d = 0. So assume that d ≥ 1 and that the result holds for all polynomials of degree d – 1. If f has no roots in K, we are done. So assume that f has a root, say, a ∈ K. By Proposition 2.24, we have f(X) = (X – a)g(X) for some g(X) ∈ K[X]. Clearly, deg g = d – 1 and so by the induction hypothesis g has at most d – 1 roots. Since K is a field (and hence does not contain non-zero zero divisors), it follows that the roots of f are precisely a and the roots of g. This establishes the induction step.

In the last proof, the only property of the field K that we have used is that K contains no non-zero zero divisors. This is, however, true for every integral domain. Thus Proposition 2.25 continues to hold if K is any integral domain (not necessarily a field). However, if K is not an integral domain, the proposition is not necessarily true. For example, if ab = 0 with a ≠ 0, b ≠ 0 and a ≠ b, then the polynomial X^2 + (b – a)X has at least three roots: 0, a and a – b.
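Concretely, in ℤ_6 we have 2 · 3 = 0; taking a = 2 and b = 3, the quadratic X^2 + (b – a)X = X^2 + X acquires more roots than its degree:

```python
# Roots of X^2 + X in Z_6, a ring with zero divisors (2 * 3 = 0)
roots = [x for x in range(6) if (x * x + x) % 6 == 0]
print(roots)
```

This prints [0, 2, 3, 5]: besides 0 and a – b ≡ 5 (mod 6), the zero divisors contribute a = 2 (and even 3), so the degree-2 polynomial has four roots.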

For a field extension K ⊆ L and for a polynomial f ∈ K[X], we may think of the roots of f in L, since f ∈ L[X] too. Clearly, all the roots of f in K are also roots of f in L. However, the converse is not true in general. For example, the only roots of X^4 – 1 in ℝ are ±1, whereas the roots of the same polynomial in ℂ are ±1, ±i. Indeed we have the following important result.

Proposition 2.26.

For any non-constant polynomial f ∈ K[X], there exists a field extension K′ of K such that f has a root in K′.

Proof

If f has a root in K, taking K′ = K proves the proposition. So we assume that f has no root in K (which implies that deg f ≥ 2). In principle, we do not require f to be irreducible. But if we consider a non-constant factor g of f, irreducible over K, we see that the roots of g in any extension L of K are roots of f in L too. Thus we may replace f by g and assume, without loss of generality, that f is irreducible. We construct the field extension K′ := K[X]/〈f〉 of K and denote the equivalence class of X in K′ by α. (One also writes x, X or [X] to denote this equivalence class.) It is clear that f(α) = 0 in K′, that is, α is a root of f(X) in K′.

We say that the field K′ in the proof of the last proposition is obtained by adjoining the root α of f and denote this as K′ = K(α). We can write f(X) = (X – α)f_1(X), where f_1(X) ∈ K′[X] and deg f_1 = (deg f) – 1. Now there is a field extension K″ of K′, in which f_1 has a root. Proceeding in this way we prove the following result.

Proposition 2.27.

A non-constant polynomial f in K[X] with deg f = d has d roots (not necessarily all distinct) in some field extension L of K.

If a polynomial f ∈ K[X] of degree d ≥ 1 has all its roots α_1, . . . , α_d in L, then f(X) = a(X – α_1) · · · (X – α_d) for some a ∈ L (actually a ∈ K, the leading coefficient of f). In this case, we say that f splits (completely or into linear factors) over L.

Definition 2.37.

Let f ∈ K[X] be a non-constant polynomial. A minimal (with respect to inclusion) field extension of K over which f splits completely is called a splitting field of f over K.[7] This is a minimal field which contains K and all the roots of f.

[7] It is necessary to use the phrase “over K” in this definition. X^2 + 1, treated as a polynomial in ℚ[X], has the splitting field ℚ(i), whereas the same polynomial, treated as an element of ℝ[X], has the splitting field ℂ (see Equation (2.3) on p 74).

Every non-constant polynomial f ∈ K[X] has a splitting field L over K. Quite importantly, this field L is unique in some sense. This allows us to call L the splitting field of f instead of a splitting field of f. We discuss these topics further in Section 2.8.

Definition 2.38.

Let f be a non-constant polynomial in K[X] and let α be a root of f (in some extension of K). The largest natural number n for which (X – α)^n | f(X) is called the multiplicity of the root α (in f). If n = 1 (resp. n > 1), then α is called a simple (resp. multiple) root of f. If all the roots of f are simple, then we call f a square-free polynomial. It is easy to see that f is square-free only if f is not divisible by the square of a non-constant polynomial in K[X]. The reverse implication also holds if char K = 0 or if K is a finite field (or, more generally, if K is a perfect field—see Exercise 2.76).

The notion of multiplicity can be extended to a non-root β of f by setting the multiplicity of β to zero.

2.6.3. Algebraic Elements and Extensions

Here we assume, unless otherwise stated, that KL is a field extension.

Definition 2.39.

An element α ∈ L is said to be algebraic over K, if there exists a non-constant polynomial f ∈ K[X] with f(α) = 0. If an element α ∈ L is not algebraic over K, we say that α is transcendental over K. Thus an element transcendental over K is a root of no non-constant polynomial in K[X]. A field extension K ⊆ L is called an algebraic extension, if every element of L is algebraic over K. A non-algebraic extension is also called a transcendental extension. If K ⊆ L is a transcendental extension, there exists at least one element α ∈ L which is transcendental (that is, not algebraic) over K.

Example 2.10.
  1. Every element a ∈ K is algebraic over K, since it is a root of the non-constant polynomial X – a ∈ K[X].

  2. The element √2 ∈ ℝ is algebraic over ℚ, since it is a root of the polynomial X^2 – 2 ∈ ℚ[X].

  3. The well-known real numbers e and π are transcendental over ℚ. (We are not going to prove this.) Of course, the concept of algebraic and transcendental elements is heavily dependent on the field K. For example, e and π, being elements of ℝ, are algebraic over ℝ.

  4. A complex number z = a + ib, where i = √(–1) and a, b ∈ ℝ, is a root of the polynomial X^2 – 2aX + (a^2 + b^2) ∈ ℝ[X] and hence is algebraic over ℝ. Therefore, the field extension ℝ ⊆ ℂ is algebraic.

  5. The extension ℚ ⊆ ℝ is transcendental, since ℝ contains elements (like e and π) that are transcendental over ℚ.

Definition 2.40.

Let α ∈ L be algebraic over K. A non-constant polynomial f ∈ K[X] of least positive degree with f(α) = 0 is called a minimal polynomial of α over K.

Proposition 2.28.

Let α ∈ L be algebraic over K. A minimal polynomial f of α over K is irreducible over K. If h ∈ K[X] is a polynomial with h(α) = 0, then f|h. In particular, any two minimal polynomials f and g of α satisfy g(X) = cf(X) for some non-zero c ∈ K.

Proof

Let f = f_1f_2 for some non-constant polynomials f_1, f_2 ∈ K[X]. Since K is a field and 0 = f(α) = f_1(α)f_2(α), we have f_1(α) = 0 or f_2(α) = 0. But deg f_1 < deg f and deg f_2 < deg f, a contradiction to the choice of f.

Using polynomial division one can write h(X) = q(X)f(X) + r(X) for some polynomials q, r ∈ K[X] with r = 0 or deg r < deg f. Now h(α) = 0 implies r(α) = 0. Since deg r < deg f, by the choice of f we must then have r(X) = 0, that is, f|h.

Finally, if f and g are two minimal polynomials of α over K, then f|g and g|f and it follows that g(X) = cf(X) for some unit c of K[X]. But the only units of K[X] are the non-zero elements of K.

By Proposition 2.28, a monic minimal polynomial f of α over K is uniquely determined by α and K. It is, therefore, customary to define the minimal polynomial of α over K to be this (unique) monic polynomial. Unless otherwise stated, we will stick to this revised definition and write f(X) = minpolyα, K(X).

Example 2.11.
  1. For α ∈ K, we have minpoly_{α,K}(X) = X – α.

  2. A complex number z = a + ib, a, b ∈ ℝ, b ≠ 0, is not a root of a linear polynomial over ℝ, but is a root of the quadratic polynomial f(X) = X^2 – 2aX + (a^2 + b^2) ∈ ℝ[X]. Therefore, minpoly_{z,ℝ}(X) = f(X), that is, f is irreducible over ℝ.

Proposition 2.29.

For a field K, the following conditions are equivalent.

  1. Every proper field extension KL is transcendental (that is, K has no algebraic extensions other than itself).

  2. Every non-constant polynomial in K[X] has a root in K.

  3. Every non-constant polynomial in K[X] splits in K.

  4. Every non-constant irreducible polynomial in K[X] is of degree 1.

Proof

[(a)⇒(b)] Consider a non-constant irreducible polynomial f ∈ K[X] and the field extension L = K[X]/〈f〉 of K. We have seen that L contains a root of f. We will prove in Section 2.8 that such an extension is algebraic (Corollary 2.11). Hence (a) implies that L = K, that is, K contains a root of f.

[(b)⇒(c)] Let f ∈ K[X] be a non-constant polynomial. By (b), f has a root, say, α_1 ∈ K. Thus f(X) = (X – α_1)f_1(X) for some f_1 ∈ K[X] with deg f_1 = (deg f) – 1. If f_1 is a constant polynomial, we are done. Otherwise, we find as above α_2 ∈ K and f_2 ∈ K[X] with f_1(X) = (X – α_2)f_2(X) and with deg f_2 = (deg f) – 2. Proceeding in this way proves (c).

[(c)⇒(d)] Obvious.

[(d)⇒(a)] Let α (in some extension of K) be algebraic over K and let f := minpoly_{α,K}(X). Since f is irreducible, by (d) deg f = 1, that is, f(X) = X – α, so that α ∈ K.

Definition 2.41.

A field K satisfying the equivalent conditions of Proposition 2.29 is called an algebraically closed field. For an arbitrary field K, a minimal algebraically closed field containing K is called an algebraic closure of K.

We will see in Section 2.8 that an algebraic closure of every field exists and is unique in some sense. The algebraic closure of an algebraically closed field K is K itself. We end this section with the following well-known theorem. We will not prove the theorem in this book, because every known proof of it uses some kind of complex analysis which this book does not deal with.

Theorem 2.24. Fundamental theorem of algebra

The field is algebraically closed.

ℝ is not algebraically closed, since the proper extension ℝ ⊆ ℂ is algebraic (see Example 2.10). Indeed, ℂ is the algebraic closure of ℝ.

Exercise Set 2.6

2.51 Let R be a ring and f, g ∈ R[X]. Show that:
  1. deg(f + g) ≤ max(deg f, deg g), with equality holding if deg f ≠ deg g.

  2. deg(fg) ≤ deg f + deg g, with equality holding if R is an integral domain.

  3. If R is an integral domain, then R[X] is an integral domain too. More generally, if R is an integral domain, then R[X_1, . . . , X_n] is also an integral domain for all n ∈ ℕ.

2.52 Let f, g ∈ R[X], where R is an integral domain. Show that if f(a_i) = g(a_i) for i = 1, . . . , n, where n > max(deg f, deg g) and where a_1, . . . , a_n are distinct elements of R, then f = g. In particular, if f(a) = g(a) for an infinite number of a ∈ R, then f = g.
2.53

Lagrange’s interpolation formula Let K be a field and let a_0, . . . , a_n be distinct elements of K. Show that for b_0, . . . , b_n ∈ K (not necessarily all distinct), there exists a unique polynomial f ∈ K[X] of degree ≤ n such that f(a_i) = b_i for all i = 0, . . . , n. [H]

2.54

Polynomials over a UFD Let R be a UFD. For a non-zero polynomial f ∈ R[X], a gcd of the coefficients of f is called a content of f and is denoted by cont f. One can then write f = (cont f)f_1, where f_1 ∈ R[X] with cont f_1 = 1. f_1 is called a primitive part of f and is often denoted as pp f. It is clear that cont f and pp f are unique up to multiplication by units of R. If for a non-zero polynomial f ∈ R[X] the content cont f is a unit of R (or, equivalently, if f and pp f are associates), then f is called a primitive polynomial. Show that for two non-zero polynomials f, g ∈ R[X] the elements cont(fg) and (cont f)(cont g) are associates in R. In particular, the product of two primitive polynomials is again primitive.

2.55 Let R be a UFD. Show that a non-constant primitive polynomial f ∈ R[X] is irreducible over R if and only if f is irreducible over Q(R), where Q(R) denotes the quotient field of R (see Exercise 2.34).
2.56
  1. Eisenstein’s criterion Let R be a UFD and f(X) = a_nX^n + · · · + a_1X + a_0 ∈ R[X] with a_n ≠ 0. Suppose that there is a prime p ∈ R such that p does not divide a_n, p divides a_i for all i, 0 ≤ i ≤ n – 1, and p^2 does not divide a_0. Show that f is irreducible over R.

  2. As an application of Eisenstein’s criterion, show that for a prime p ∈ ℕ the polynomial X^(p–1) + · · · + X + 1 is irreducible in ℚ[X]. [H]

2.57Let KL be a field extension and f1, . . . , fn non-constant polynomials in K[X]. Show that each fi, i = 1, . . . , n, splits over L if and only if the product f1 · · · fn splits over L.
2.58 Show that the irreducible polynomials in ℝ[X] have degrees ≤ 2. [H]
2.59Show that a finite field (that is, a field with finite cardinality) is not algebraically closed. In particular, the algebraic closure of a finite field is infinite.
2.60 A complex number z is called an algebraic number, if z is algebraic over ℚ. An algebraic number z is called an algebraic integer, if z is a root of a monic polynomial in ℤ[X]. Show that:
  1. If z is an algebraic number, then mz is an algebraic integer for some m ∈ ℕ.

  2. If z ∈ ℚ is an algebraic integer, then z ∈ ℤ.

  3. If z ∈ ℂ is an algebraic integer, then for any integer n ∈ ℤ the complex numbers nz and z + n are algebraic integers.

2.61 Let K be a field and f(X) = a_0 + a_1X + · · · + a_dX^d ∈ K[X]. The formal derivative f′ of f is defined to be the polynomial f′(X) := a_1 + 2a_2X + · · · + da_dX^(d–1). Show that:
  1. (f + g)′ = f′ + g′ and (fg)′ = f′g + fg′ for any f, g ∈ K[X].

  2. If char K = 0, then f′ = 0 if and only if f is a constant polynomial.

  3. If char K = p > 0, then f′ = 0 if and only if f(X) = g(X^p) for some g ∈ K[X].

  4. f (≠ 0) has no multiple roots (in any extension field of K), that is, f is square-free, if and only if gcd(f, f′) = 1.

  5. Let f be a (non-constant) irreducible polynomial over K. Show that if char K = 0, then f has no multiple roots. On the other hand, if char K = p > 0, show that f has multiple roots if and only if f(X) = g(X^p) for some g ∈ K[X]. (However, if K = ℤ_p, then by Fermat’s little theorem g(X^p) = g(X)^p, which contradicts the fact that f(X) is irreducible. Therefore, in this case f cannot have multiple roots.)

2.62 Let f ∈ K[X] be a non-constant polynomial of degree d and let α_1, . . . , α_d be the roots of f (in some extension field of K). The quantity Δ(f) := ∏_{1≤i<j≤d} (α_i – α_j)^2 is called the discriminant of f. Prove the following assertions:
  1. Δ(f) = 0 if and only if f has a multiple root.

  2. .

  3. Δ(X^2 + aX + b) = a^2 – 4b.

  4. Δ(X^3 + aX + b) = –(4a^3 + 27b^2).

2.7. Vector Spaces and Modules

Vector spaces and linear transformations between them are the central objects of study in linear algebra. In this section, we investigate the basic properties of vector spaces. We also generalize the concept of vector spaces to get another useful class of objects called modules. A module which also carries a (compatible) ring structure is referred to as an algebra. Study of algebras over fields (or more generally over rings) is of importance in commutative algebra, algebraic geometry and algebraic number theory.

2.7.1. Vector Spaces

Unless otherwise specified, K denotes a field in this section.

Definition 2.42.

A vector space V over a field K (or a K-vector space, in short) is an (additively written) Abelian group V together with a multiplication map · : K × V → V called the scalar multiplication map, such that the following properties are satisfied for every a, b ∈ K and x, y ∈ V.

  1. a · (x + y) = a · x + a · y,

  2. (a + b) · x = a · x + b · x,

  3. 1 · x = x,

  4. a · (b · x) = (ab) · x,

where ab denotes the product of a and b in the field K. When no confusions are likely, we omit the scalar multiplication sign · and write a · x simply as ax.

Example 2.12.
  1. Any field K is trivially a K-vector space with the scalar multiplication being the same as the field multiplication. More generally, if K ⊆ L is a field extension, then L is a K-vector space.

  2. For n ∈ ℕ, the product K^n = K × · · · × K (n factors) is a K-vector space under the scalar multiplication map a(x_1, . . . , x_n) := (ax_1, . . . , ax_n). For arbitrary K-vector spaces V_1, . . . , V_n, we can analogously define the product V_1 × · · · × V_n.

  3. The polynomial ring K[X] (or K[X1, . . . , Xn]) is a K-vector space (with the natural scalar multiplication).

Corollary 2.8.

Let V be a K-vector space. For every a ∈ K and x ∈ V, we have:

  1. 0 · x = 0.

  2. a · 0 = 0.

  3. (–a) · x = a · (–x) = –(a · x).

Proof

Easy verification.

Definition 2.43.

Let V be a vector space over K and S a subset of V. We say that S is a generating set or a set of generators of V (over K), or that S generates V (over K), if every element x ∈ V can be written as a finite linear combination x = a_1x_1 + · · · + a_nx_n for some n ∈ ℕ (depending on x) and with a_i ∈ K and x_i ∈ S for 1 ≤ i ≤ n. A generating set S of V is called minimal, if no proper subset of S generates V. If V has a finite generating set, then V is called finitely generated or finite-dimensional.

Example 2.13.
  1. Consider the field extension L := K[X]/〈f(X)〉 of K, where f is an irreducible polynomial in K[X] of degree n. If α denotes the equivalence class of X in L, then every element of L can be written as a_(n–1)α^(n–1) + · · · + a_1α + a_0 with a_i ∈ K for 0 ≤ i ≤ n – 1. Thus {1, α, . . . , α^(n–1)} is a generating set of L over K. In particular, L is finitely generated over K.

  2. The K-vector space K^n is generated by the unit vectors e_i, 1 ≤ i ≤ n, defined as e_i := (0, . . . , 0, 1, 0, . . . , 0) (1 in the i-th position). Thus K^n is also finitely generated over K.

  3. {1, X, X^2, · · ·} is an infinite generating set of the polynomial ring K[X] regarded as a K-vector space. K[X] is not finitely generated over K.

    It is not difficult to show that the generating sets discussed in these examples are minimal.

Definition 2.44.

A subset S of a K-vector space V is called linearly independent (over K), if whenever a_1x_1 + · · · + a_nx_n = 0 for some n ∈ ℕ, a_i ∈ K and distinct x_i ∈ S, 1 ≤ i ≤ n, we have a_1 = · · · = a_n = 0. If S is not linearly independent, it is called linearly dependent. If S is linearly independent (resp. dependent), then we also say that the elements of S are linearly independent (resp. dependent). A maximal linearly independent subset of V is a linearly independent subset S ⊆ V with the property that S ∪ {x} is linearly dependent for any x ∈ V \ S.

If 0 ∈ S, then S is linearly dependent, since a · 0 = 0 for any non-zero a ∈ K. One can easily check that all the generating sets of Example 2.13 are linearly independent too. This is, however, not a mere coincidence, as the following result demonstrates.

Theorem 2.25.

A subset S of a K-vector space V is a minimal generating set for V if and only if S is a maximal linearly independent set of V.

Proof

[if] Given a maximal linearly independent subset S of V, we first show that S is a generating set for V. Take any non-zero x ∈ V; if x ∈ S, it is trivially generated, so assume x ∉ S. By the maximality of S, the set S ∪ {x} is linearly dependent, that is, there exists a linear relation of the form a_0x + a_1x_1 + · · · + a_nx_n = 0, a_i ∈ K, x_i ∈ S, with some a_i ≠ 0. The linear independence of S forces a_0 ≠ 0, and so x = –a_0^(–1)(a_1x_1 + · · · + a_nx_n) is a finite linear combination of elements of S. Thus S generates V. Now, we show that S is minimal. Assume otherwise, that is, S′ := S \ {y} generates V for some y ∈ S. Since S is linearly independent, y ≠ 0. For some m ∈ ℕ, b_i ∈ K and y_i ∈ S′, we then have y = b_1y_1 + · · · + b_my_m, a contradiction to the linear independence of S.

[only if] Given a minimal generating set S of V, we first show that S is linearly independent. Assume not, that is, a1x1 + · · · + anxn = 0 for some ai ∈ K and distinct xi ∈ S with some ai, say a1, non-zero. But then x1 = –(a2/a1)x2 – · · · – (an/a1)xn and, therefore, S \ {x1} also generates V, a contradiction to the minimality of S. Thus S is linearly independent. Now choose a non-zero y ∈ V \ S. Since S generates V, we can write y = b1y1 + · · · + bmym, m ≥ 1, bi ∈ K and yi ∈ S, that is, 1·y – b1y1 – · · · – bmym = 0, that is, S ∪ {y} is linearly dependent.

Definition 2.45.

Let V be a K-vector space. A minimal generating set S of V is called a basis of V over K (or a K-basis of V). By Theorem 2.25, S is a basis of V if and only if S is a maximal linearly independent subset of V. Equivalently, S is a basis of V if and only if S is a generating set of V and is linearly independent.

Any element of a vector space can be written uniquely as a finite linear combination of elements of a basis, since two different ways of writing the same element contradict the linear independence of the basis elements.

A K-vector space V may have many K-bases. For example, the elements 1, aX + b, (aX + b)2, · · · form a K-basis of K[X] for any a, b ∈ K, a ≠ 0. However, what is unique in any basis of a given K-vector space V is the cardinality[8] of the basis, as shown in Theorem 2.26.

[8] Two sets (finite or not) S1 and S2 are said to be of the same cardinality, if there exists a bijective map S1S2.

For the sake of simplicity, we sometimes assume that V is a finitely generated K-vector space. This assumption simplifies certain proofs greatly. But it is important to highlight here that, unless otherwise stated, all the results continue to remain valid without the assumption. For example, it is a fact that every vector space has a basis. For finitely generated vector spaces, this is a trivial statement to prove, whereas without our assumption we need to use arguments that are not so simple. (A possible proof follows from Exercise 2.63 with U = {0}.)

Theorem 2.26.

Let V be a K-vector space. Then any K-basis of V has the same cardinality.

Proof

We assume that V is finitely generated. Let S = {x1, . . . , xn} be a minimal finite generating set, that is, a basis, of V. Let T be another basis of V. Assume that m := #T > n. (We might even have m = ∞.) We can choose distinct elements y1, . . . , yn ∈ T. Note that xi and yj are non-zero. Now we can write y1 = a1x1 + · · · + anxn for some (unique) ai ∈ K, with some ai ≠ 0. Renumbering x1, . . . , xn, if necessary, we may assume that a1 ≠ 0. Then x1 = (1/a1)(y1 – a2x2 – · · · – anxn). It follows that y1, x2, . . . , xn generate V. In particular, we can write y2 = b1y1 + b2x2 + · · · + bnxn, bi ∈ K, with some bi ≠ 0. If b2 = · · · = bn = 0, then y1, y2 are linearly dependent, a contradiction. So bi ≠ 0 for some i, 2 ≤ i ≤ n. Again we may renumber x2, . . . , xn, if necessary, to assume that b2 ≠ 0. Then x2 = (1/b2)(y2 – b1y1 – b3x3 – · · · – bnxn), that is, y1, y2, x3, . . . , xn generate V. Proceeding in this way we can show that y1, . . . , yn generate V, a contradiction to the minimality of T as a generating set. Thus we must have m ≤ n. In particular, m is finite. Now reversing the roles of S and T we can likewise prove that n ≤ m.

Theorem 2.26 holds even when V is not finitely generated. We omit the proof for this case here.

Definition 2.46.

Let V be a K-vector space. The cardinality of any K-basis of V is called the dimension of V over K and is denoted by dimK V (or by dim V, if K is understood from the context). We call V finite-dimensional (resp. infinite-dimensional), if dimK V is finite (resp. infinite).

For example, dimK Kn = n for every n ≥ 1, and dimK K[X] = ∞.

Definition 2.47.

Let V be a K-vector space. A subgroup U of V, which is closed under the scalar multiplication of V, is again a K-vector space and is called a (vector) subspace of V. In this case, we have dimK U ≤ dimK V (Exercise 2.63).

Example 2.14.

Let V be a vector space over K.

  1. The subsets {0} and V are trivially subspaces of V.

  2. Let S be any subset of V (not necessarily linearly independent). Then the set U of all finite linear combinations a1x1 + · · · + anxn (ai ∈ K, xi ∈ S) is a vector subspace of V. We say that U is spanned or generated by S, or that S generates or spans U, or that U is the span of S. This is often denoted by U = 〈S〉 or by U = Span S. If S is linearly independent, then S is a basis of U.

Definition 2.48.

Let V and W be K-vector spaces. A map f : V → W is called a homomorphism (of vector spaces) or a linear transformation or a linear map over K, if

f(ax + by) = af(x) + bf(y)

for all a, b ∈ K and x, y ∈ V. Equivalently, f is a linear map over K if and only if f(x + y) = f(x) + f(y) and f(ax) = af(x) for all a ∈ K and x, y ∈ V. The set of all K-linear maps V → W is denoted by HomK(V, W). HomK(V, W) is a K-vector space under the definitions (f + g)(x) := f(x) + g(x) and (af)(x) := af(x) for all f, g ∈ HomK(V, W), a ∈ K and x ∈ V. A K-linear transformation V → V is called a K-endomorphism of V. The set of all K-endomorphisms of V is denoted by EndK V. A bijective[9] homomorphism (resp. endomorphism) is called an isomorphism (resp. automorphism).

[9] As in Footnote 2, we continue to be lucky here: The inverse of a bijective linear transformation is again a linear transformation.

Theorem 2.27.

Let V and W be K-vector spaces. Then V and W are isomorphic if and only if dimK V = dimK W.

Proof

If dimK V = dimK W and S and T are bases of V and W respectively, then there exists a bijection f : S → T. One can extend f to a linear map g : V → W as g(a1x1 + · · · + anxn) := a1f(x1) + · · · + anf(xn) for ai ∈ K and xi ∈ S. One can readily verify that g is an isomorphism. Conversely, if g : V → W is an isomorphism and S is any basis of V, then g(S) := {g(x) | x ∈ S} is clearly a basis of W.

Corollary 2.9.

A K-vector space V with n := dimK V < ∞ is isomorphic to Kn.

Let V be a K-vector space and U a subspace. As in Section 2.3 we construct the quotient group V/U. This group can be given a K-vector space structure under the scalar multiplication map a(x + U) := ax + U, a ∈ K, x ∈ V. If T ⊆ V is such that the residue classes of the elements of T form a K-basis of V/U and if S is a K-basis of U, then it is easy to see that S ∪ T is a K-basis of V. In particular,

Equation 2.2

dimK V = dimK U + dimK (V/U)
For f ∈ HomK(V, W), the set {x ∈ V | f(x) = 0} is called the kernel Ker f of f, and the set {f(x) | x ∈ V} is called the image Im f of f. We have the isomorphism theorem for vector spaces:

Theorem 2.28. Isomorphism theorem

Let f ∈ HomK(V, W). Then Ker f is a subspace of V, Im f is a subspace of W, and V/Ker f ≅ Im f.

Proof

Similar to Theorem 2.3 and Theorem 2.9.

Definition 2.49.

For f ∈ HomK(V, W), the dimension of Im f is called the rank of f and is denoted by Rank f, whereas the dimension of Ker f is called the nullity of f and is denoted by Null f. An immediate consequence of the isomorphism theorem and of Equation (2.2) is the following important result.

Theorem 2.29.

Rank f + Null f = dimK V for any f ∈ HomK(V, W).
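
Theorem 2.29 can be checked mechanically for maps between the spaces Qn of Corollary 2.9. The following sketch (ours, not the book's) row-reduces a matrix over ℚ, using exact rational arithmetic, and verifies that rank and nullity add up to the dimension of the domain:

```python
from fractions import Fraction

def rank_nullity(rows, n):
    """Rank and nullity of the linear map f : Q^n -> Q^m represented
    by an m x n matrix (a list of m rows of length n)."""
    A = [[Fraction(x) for x in row] for row in rows]
    rank, col = 0, 0
    while rank < len(A) and col < n:
        pivot = next((r for r in range(rank, len(A)) if A[r][col] != 0), None)
        if pivot is None:          # no pivot: this column is dependent
            col += 1
            continue
        A[rank], A[pivot] = A[pivot], A[rank]
        for r in range(rank + 1, len(A)):   # clear entries below the pivot
            factor = A[r][col] / A[rank][col]
            A[r] = [x - factor * y for x, y in zip(A[r], A[rank])]
        rank, col = rank + 1, col + 1
    return rank, n - rank

rank, null = rank_nullity([[1, 2, 3], [2, 4, 6]], 3)
assert (rank, null) == (1, 2)
assert rank + null == 3            # Rank f + Null f = dimK V with V = Q^3
```

Here Im f is spanned by the single independent row, and Ker f is the 2-dimensional solution space of the homogeneous system.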

*2.7.2. Modules

If we remove the restriction that K is a field and allow K to be an arbitrary ring, then the resulting analogue of a vector space over K is called a K-module. More specifically, we have:

Definition 2.50.

Let R be a ring. A module over R (or an R-module) is an (additively written) Abelian group M together with a multiplication map · : R × M → M, called the scalar multiplication map, such that for every a, b ∈ R and x, y ∈ M we have a · (x + y) = a · x + a · y, (a + b) · x = a · x + b · x, 1 · x = x, and a · (b · x) = (ab) · x, where ab denotes the product of a and b in the ring R. When no confusions are likely, we omit the scalar multiplication sign · and write a · x as ax.

Example 2.15.
  1. Vector spaces are special cases of modules, when the underlying ring is a field.

  2. Ideals of R are modules over R with the ring multiplication map taken as the scalar multiplication.

  3. Every Abelian group G is a ℤ-module under the scalar multiplication n · x := x + · · · + x (n times) for n ≥ 0 and n · x := –((–n) · x) for n < 0.

  4. The polynomial rings R[X] and R[X1, . . . , Xn] are modules over R.

  5. Let Mi, i ∈ I, be a family of R-modules. The direct product of the Mi is defined as the set of all tuples (ai)i∈I with ai ∈ Mi, indexed by I. The direct sum is the subset of the Cartesian product consisting only of the tuples for which ai = 0 except for a finite number of i ∈ I. Both the direct product and the direct sum are R-modules under component-wise addition and scalar multiplication. When I is finite, they are naturally the same.

Modules are a powerful generalization of vector spaces. Any result we prove for modules is equally valid for vector spaces, ideals and Abelian groups. On the other hand, since we do not demand that the ring R be necessarily a field, certain results for vector spaces are not applicable for all modules.

It is easy to see that Corollary 2.8 continues to hold for modules. An R-submodule of an R-module M is a subgroup of M that is closed under the scalar multiplication of M. For a subset S ⊆ M, the set of all finite linear combinations of the form a1x1 + · · · + anxn, n ≥ 0, ai ∈ R, xi ∈ S, is an R-submodule N of M, denoted by RS or 〈S〉. We say that N is generated by S (or by the elements of S). If S is finite, then N is said to be finitely generated. A (sub)module generated by a single element is called cyclic. It is important to note that unlike vector spaces the cardinality of a minimal generating set of a module is not necessarily unique. (See Exercise 2.68 for an example.) It is also true that given a minimal generating set S of M, there may be more than one way of writing an element of M as a finite linear combination of elements of S. For example, if M = R = ℤ and S = {2, 3}, then 1 = (–1)·2 + 1·3 = 2·2 + (–1)·3. The nice theory of dimensions developed in connection with vector spaces does not apply to modules.
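
The non-uniqueness just described can be spot-checked directly (a trivial Python illustration of the example M = R = ℤ with S = {2, 3}):

```python
from math import gcd

# S = {2, 3} generates Z since gcd(2, 3) = 1, but neither element
# alone does (2Z and 3Z are proper subgroups), so S is minimal.
assert gcd(2, 3) == 1

# Two distinct representations of 1 as a Z-linear combination of S:
assert (-1) * 2 + 1 * 3 == 1
assert 2 * 2 + (-1) * 3 == 1
```

So both {1} and {2, 3} are minimal generating sets of ℤ, of different cardinalities, unlike bases of a vector space.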

For an R-submodule N of M, the Abelian group M/N is given an R-module structure by the scalar multiplication map a(x + N) := ax + N. This module is called the quotient module of M by N.

For R-modules M and N, an R-linear map or an R-module homomorphism (from M to N) is defined as a map f : M → N with f(ax + by) = af(x) + bf(y) for all a, b ∈ R and x, y ∈ M (or equivalently with f(x + y) = f(x) + f(y) and f(ax) = af(x) for all a ∈ R and x, y ∈ M). An isomorphism, an endomorphism and an automorphism are defined in ways analogous to the case of vector spaces. The set of all (R-module) homomorphisms M → N is denoted by HomR(M, N) and the set of all (R-module) endomorphisms of M is denoted by EndR M. These sets are again R-modules under the definitions: (f + g)(x) := f(x) + g(x) and (af)(x) := af(x) for all a ∈ R and x ∈ M (and f, g in HomR(M, N) or EndR M).

The kernel and image of an R-linear map f : M → N are defined as the sets Ker f := {x ∈ M | f(x) = 0} and Im f := {f(x) | x ∈ M}. With these notations we have the isomorphism theorem for modules:

Theorem 2.30. Isomorphism theorem

Ker f and Im f are submodules of M and N respectively and M / Ker f ≅ Im f.

For an R-module M and an ideal 𝔞 of R, the set 𝔞M consisting of all finite linear combinations a1x1 + · · · + anxn with ai ∈ 𝔞 and xi ∈ M is a submodule of M. On the other hand, for a submodule N of M the set (M : N) := {a ∈ R | ax ∈ N for every x ∈ M} is an ideal of R. In particular, the ideal (M : 0) is called the annihilator of M and is denoted as AnnR M (or as Ann M). For any ideal 𝔞 ⊆ Ann M, one can view M as an R/𝔞-module under the map ā · x := ax. One can easily check that this map is well-defined, that is, the product ā · x is independent of the choice of the representative a of the equivalence class ā.
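
For a concrete annihilator, take M = ℤ/6ℤ as a ℤ-module. The following small Python check (our illustration; the helper name is ours) confirms that the positive integers annihilating M are exactly the multiples of 6, so Ann M = 6ℤ:

```python
def annihilates(a, n):
    """True iff a * x = 0 in Z/nZ for every x, i.e. a lies in Ann(Z/nZ)."""
    return all(a * x % n == 0 for x in range(n))

n = 6
positive_ann = [a for a in range(1, 3 * n + 1) if annihilates(a, n)]
assert positive_ann == [6, 12, 18]   # Ann_Z(Z/6Z) = 6Z
# Since 6Z = Ann M, the quotient Z/6Z acts on M via a-bar * x := ax mod 6,
# and this is independent of the chosen representative a.
```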

Definition 2.51.

A free module M over a ring R is defined to be a direct sum of R-modules Mi, i ∈ I, with each Mi ≅ R as an R-module. If I is of finite cardinality n, then M is isomorphic to Rn.

Any vector space is a free module (Theorem 2.27 and Corollary 2.9). The Abelian groups ℤn, n ≥ 2, are not free ℤ-modules.

Theorem 2.31. Structure theorem for finitely generated modules

M is a finitely generated R-module if and only if M is a quotient of a free module Rn for some n ∈ ℕ.

Proof

[if] The free module Rn has a canonical generating set ei, 1 ≤ i ≤ n, where

ei = (0, . . . , 0, 1, 0, . . . , 0) (1 in the i-th position).

If M = Rn/N, then the equivalence classes ei + N, i = 1, ..., n, constitute a finite set of generators of M.

[only if] If x1, ..., xn generate M, then the R-linear map f : Rn → M defined by (a1, ..., an) ↦ a1x1 + · · · + anxn is surjective. Hence by the isomorphism theorem M ≅ Rn / Ker f.

*2.7.3. Algebras

Let φ : R → A be a homomorphism of rings. The ring A can be given an R-module structure with the multiplication map a · x := φ(a)x for a ∈ R and x ∈ A. This R-module structure of A is compatible with the ring structure of A in the sense that for every a, b ∈ R and x, y ∈ A one has (ax)(by) = (ab)(xy).

Conversely, if a ring A has an R-module structure with (ax)(by) = (ab)(xy) for every a, b ∈ R and x, y ∈ A, then there is a unique ring homomorphism R → A taking a ↦ a · 1 (where 1 denotes the identity of A). This motivates us to define the following.

Definition 2.52.

Let R be a ring. An algebra over R or an R-algebra is a ring A together with a ring homomorphism φ : R → A. The homomorphism φ is called the structure homomorphism of the R-algebra A. If A and B are R-algebras with structure homomorphisms φ : R → A and ψ : R → B, then an R-algebra homomorphism (from A to B) is a ring homomorphism η : A → B such that η ∘ φ = ψ.

Example 2.16.

Let R be a ring.

  1. The polynomial ring R[X1, . . . , Xn] is an R-algebra with the canonical inclusion R ⊆ R[X1, . . . , Xn] as the structure homomorphism and is called a polynomial algebra over R.

  2. For an ideal 𝔞 of R, the canonical surjection R → R/𝔞 makes R/𝔞 an R-algebra.

  3. If A is an R-algebra with structure homomorphism φ : R → A and if B is an A-algebra with structure homomorphism ψ : A → B, then B is an R-algebra with structure homomorphism ψ ∘ φ : R → B.

  4. Combining (2) and (3) implies that if A is an R-algebra and 𝔞 an ideal of A, then the ring A/𝔞 is again an R-algebra, called the quotient algebra of A by 𝔞.

An R-algebra A is an R-module with the added property that multiplication of elements of A is now legal. Exploiting this new feature leads to the following concept of algebra generators.

Definition 2.53.

Let A be an R-algebra with the structure homomorphism φ : R → A. A subset S of A is said to generate A as an R-algebra, if every element of A can be written as a polynomial expression in (finitely many) elements of S with coefficients from R (that is, from φ(R)). We write this as A = R[S]. If S = {x1, . . . , xn} is finite, we also write R[x1, . . . , xn] in place of R[S] and say that A is finitely generated as an R-algebra or that the homomorphism φ is of finite type.

Example 2.17.
  1. The polynomial algebra R[X1, . . . , Xn], n ≥ 1, over R is not finitely generated as an R-module, but is finitely generated as an R-algebra.

  2. For an ideal 𝔞 of R[X1, . . . , Xn], the ring A := R[X1, . . . , Xn]/𝔞 is generated as an R-algebra by the equivalence classes xi of Xi, 1 ≤ i ≤ n, that is, A = R[x1, . . . , xn]. If 𝔞 is not the zero ideal, then A is not a polynomial algebra, because x1, . . . , xn are not indeterminates in the sense that they satisfy (non-zero) polynomial equations f(x1, . . . , xn) = 0 for every non-zero f ∈ 𝔞. (In this case, we also say that x1, . . . , xn are algebraically dependent.) The notation R[. . .] is a generalization of the notation for polynomial algebras. In what follows, we usually denote polynomial algebras by R[X1, . . . , Xn] with upper-case algebra generators, whereas for an arbitrary finitely generated R-algebra we use lower-case symbols for the algebra generators as in R[x1, . . . , xn].

One may proceed to define kernels and images of R-algebra homomorphisms and frame and prove the isomorphism theorem for R-algebras. We leave the details to the reader. We only note that algebra homomorphisms are essentially ring homomorphisms with the added condition of commutativity with the structure homomorphisms.

Theorem 2.32.

A ring A is a finitely generated R-algebra if and only if A is a quotient of a polynomial algebra (over R).

Proof

[if] Immediate from Example 2.17.

[only if] Let A := R[x1, . . . , xn]. The map η : R[X1, . . . , Xn] → A that takes f(X1, . . . , Xn) ↦ f(x1, . . . , xn) is a surjective R-algebra homomorphism. By the isomorphism theorem, one has the isomorphism AR[X1, . . . , Xn]/Ker η of R-algebras.

This theorem suggests that for the study of finitely generated algebras it suffices to investigate only the polynomial algebras and their quotients.

Exercise Set 2.7

2.63 Let V be a K-vector space, U a subspace of V, and T an arbitrary K-basis of U. Show that there is a K-basis of V that contains T. [H]
2.64
  1. Let V be a K-vector space, and U1, U2 subspaces of V. Show that the set U := {x1 + x2 | x1 ∈ U1, x2 ∈ U2} is a K-subspace of V. If U1 ∩ U2 = {0}, we say that U is the direct sum of U1 and U2 and write U = U1 ⊕ U2.

  2. Let V be a K-vector space and W a subspace of V. Show that there exists a subspace W′ of V such that V = W ⊕ W′. This space W′ is called the complement subspace of W in V. [H]

2.65 Let V and W be K-vector spaces and f : V → W a K-linear map. Show that f is uniquely determined by the images f(x), x ∈ S, where S is a basis of V.
2.66 Let V and W be K-vector spaces. Check that HomK(V, W) is a vector space over K. Show that dimK(HomK(V, W)) = (dimK V)(dimK W). In particular, if W = K, then HomK(V, K) is isomorphic to V. The space HomK(V, K) is called the dual space of V.
2.67 Let V and W be m- and n-dimensional K-vector spaces, S = {x1, . . . , xm} a K-basis of V, T = {y1, . . . , yn} a K-basis of W, and f : V → W a K-linear map. For each i = 1, . . . , m, write f(xi) = ai1y1 + · · · + ainyn, aij ∈ K. The m × n matrix Mf := (aij) is called the transformation matrix of f (with respect to the bases S and T). We have:

Let V1, V2, V3 be K-vector spaces, f, f1, f2 ∈ HomK(V1, V2), and g ∈ HomK(V2, V3). Prove the following assertions:

  1. Mf1+f2 = Mf1 + Mf2.

  2. Mg∘f = Mf Mg.

  3. f is invertible (as a map) if and only if Mf is invertible (as a matrix).

(Remark: This exercise shows that linear transformations of finite-dimensional vector spaces can be described in terms of matrices.)

2.68 Show that for every n ≥ 1 there are integers a1, . . . , an that constitute a minimal set of generators for the unit ideal in ℤ. [H]
2.69 Let M be an R-module. A subset S of M is called a basis of M, if S generates M and is linearly independent over R in the sense that a1x1 + · · · + anxn = 0, n ≥ 1, ai ∈ R, distinct xi ∈ S, implies a1 = · · · = an = 0. Show that M has a basis if and only if M is a free R-module.
2.70 We define the rank of a finitely generated R-module M as

RankR M := min{#S | M is generated by S}.

If N is a submodule of M, show that RankR M ≤ RankR N + RankR(M/N). Give an example where the strict inequality holds.

2.71 Let M be an R-module. An element x ∈ M is called a torsion element of M, if AnnR x ≠ 0, that is, if there is a ∈ R, a ≠ 0, with ax = 0. The set of all torsion elements of M is denoted by Tors M. M is called torsion-free if Tors M = {0}, and a torsion module if Tors M = M.
  1. Show that Tors M is a submodule of M.

  2. Show that Tors M is a torsion module (called the torsion submodule of M) and that the module M/Tors M is torsion-free.

  3. If R is an integral domain, show that every free module over R is torsion-free. In particular, every vector space is torsion-free.

2.72 Show that:
  1. ℚ is not finitely generated as a ℤ-module. [H]

  2. ℚ is not a free ℤ-module. [H]

  3. ℚ is a torsion-free ℤ-module.

This shows that the converse of Exercise 2.71(c) is not true in general.

2.8. Fields

In this section, we study some important properties of field extensions. We also give an introduction to Galois theory. Unless otherwise stated, the letters F, K and L stand for fields in this section.

2.8.1. Properties of Field Extensions

We have seen that if F ⊆ K is a field extension, then K is a vector space over F. This observation leads to the following very useful definitions.

Definition 2.54.

For a field extension F ⊆ K, the cardinality of any F-basis of K is called the degree of the extension F ⊆ K and is denoted by [K : F]. If [K : F] is finite, K is called a finite extension of F. Otherwise, K is called an infinite extension of F.

Proposition 2.30.

Let F ⊆ K ⊆ L be a tower of field extensions. Then [L : F] = [L : K] [K : F]. In particular, the extension F ⊆ L is finite if and only if the extensions F ⊆ K and K ⊆ L are finite. In that case, [L : K] | [L : F] and [K : F] | [L : F].

Proof

One can easily check that if S is an F-basis of K and S′ a K-basis of L, then the set {ss′ | s ∈ S, s′ ∈ S′} is an F-basis of L.
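
As a concrete instance of Proposition 2.30 (a standard example, added here for illustration), take F = ℚ, K = ℚ(√2) and L = ℚ(√2, √3):

```latex
[\mathbb{Q}(\sqrt{2},\sqrt{3}) : \mathbb{Q}]
  = [\mathbb{Q}(\sqrt{2},\sqrt{3}) : \mathbb{Q}(\sqrt{2})]\,
    [\mathbb{Q}(\sqrt{2}) : \mathbb{Q}]
  = 2 \cdot 2 = 4.
```

A ℚ-basis of L arises exactly as in the proof, as the products of the bases {1, √2} of K over ℚ and {1, √3} of L over K, namely {1, √2, √3, √6}.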

Recall the definitions of the rings F[X] of polynomials and F(X) of rational functions in one indeterminate X. These notations are now generalized. For a field extension F ⊆ K and for a ∈ K, we define F[a] := {f(a) | f(X) ∈ F[X]} and

Equation 2.3

F(a) := {f(a)/g(a) | f(X), g(X) ∈ F[X], g(a) ≠ 0}.

It is easy to see that F[a] is the smallest (with respect to inclusion) of the integral domains that contain F and a. Similarly F(a) is the smallest of the fields that contain F and a. We also have F[a] ⊆ F(a). Now we state the following important characterization of algebraic elements.

Theorem 2.33.

For a field extension F ⊆ K and an element a ∈ K, the following conditions are equivalent:

  1. The element a is algebraic over F.

  2. The extension F(a) is finite over F.

  3. F(a) = F[a].

Proof

[(a)⇒(b)] Let h ∈ F[X] be the minimal polynomial of a over F, of degree d. Consider the ring homomorphism Φ : F[X] → K that takes f(X) ↦ f(a). From Proposition 2.28, Ker Φ = 〈h〉, and by the isomorphism theorem F[X]/〈h〉 ≅ Im Φ. Since h is irreducible over F, F[X]/〈h〉 and so Im Φ are fields. Since Im Φ contains F and a (note that Φ(X) = a), we have F(a) ⊆ Im Φ and hence F(a) = Im Φ. Finally, notice that [F[X]/〈h〉 : F] = d, so that [F(a) : F] = d is finite.

[(b)⇒(c)] Let d := [F(a) : F]. Since the d + 1 elements 1, a, a2, . . . , ad are linearly dependent over F, there exist α0, . . . , αd ∈ F, not all 0, such that α0 + α1a + · · · + αdad = 0. This, in turn, implies that there is an irreducible polynomial h ∈ F[X] with h(a) = 0. Now consider any g ∈ F[X] with g(a) ≠ 0. Clearly, h ∤ g (because otherwise g(a) = 0). Since h is irreducible, gcd(g, h) = 1, that is, there exist polynomials u(X), v(X) ∈ F[X] with u(X)g(X) + v(X)h(X) = 1, that is, with u(a)g(a) = 1. But then 1/g(a) = u(a) ∈ F[a], so that F(a) = F[a].

[(c)⇒(a)] Clearly, the element 0 is algebraic over F. So assume a ≠ 0. Since 1/a ∈ F(a) = F[a], by hypothesis there is a polynomial f ∈ F[X] such that 1/a = f(a). But then a is a root of the non-constant polynomial Xf(X) – 1.
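
Condition (c) can be seen concretely in K = ℚ(√2): the inverse of a non-zero element is again a polynomial (of degree < 2) in √2. A sketch in Python (our representation, not from the text; a pair (p, q) of rationals stands for p + q√2):

```python
from fractions import Fraction as Fr

def mul(u, v):
    # (p + q*sqrt2)(r + s*sqrt2) = (pr + 2qs) + (ps + qr)*sqrt2
    (p, q), (r, s) = u, v
    return (p * r + 2 * q * s, p * s + q * r)

def inv(u):
    # 1/(p + q*sqrt2) = (p - q*sqrt2) / (p^2 - 2 q^2); the denominator
    # is a non-zero rational, so the inverse is again of the form
    # p' + q'*sqrt2 -- an instance of F(a) = F[a].
    p, q = u
    d = p * p - 2 * q * q
    return (p / d, -q / d)

a = (Fr(1), Fr(1))                 # the element 1 + sqrt(2)
assert inv(a) == (Fr(-1), Fr(1))   # 1/(1 + sqrt2) = -1 + sqrt2
assert mul(a, inv(a)) == (Fr(1), Fr(0))
```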

Corollary 2.10.

For a field extension F ⊆ K, the set of elements in K that are algebraic over F is a field.

Proof

It is sufficient to show that if a, b ∈ K are algebraic over F, then the elements a ± b, ab and a/b (if b ≠ 0) are also algebraic over F. By Theorem 2.33, [F(a) : F] is finite. Since b is algebraic over F, it is also algebraic over F(a). In particular, [F(a)(b) : F(a)] is finite. But then the extension F(a)(b) is also finite over F and contains a ± b, ab and a/b (if b ≠ 0).

The field F(a)(b) in the proof of the last corollary is also denoted as F(a, b). It is the smallest subfield of K that contains F, a and b, and it follows that F(a, b) = F(b, a). More generally, for a field extension F ⊆ K and for a1, . . . , an ∈ K, each algebraic over F, the field F(a1, . . . , an) is defined as F(a1)(a2) . . . (an) and is independent of the order in which the ai are adjoined.

Corollary 2.11.

Let F ⊆ K be a finite extension. Then K is algebraic over F.

Proof

For any a ∈ K, we have F ⊆ F(a) ⊆ K. By Proposition 2.30, [F(a) : F] is finite, and so a is algebraic over F by Theorem 2.33.

The converse of the last corollary is not true, that is, it is possible that an algebraic extension has infinite extension degree. Exercise 2.59 gives an example.

Corollary 2.12.

If F ⊆ K and K ⊆ L are algebraic field extensions, then F ⊆ L is also algebraic.

Proof

Take an arbitrary a ∈ L. Since K ⊆ L is algebraic, there is a non-zero polynomial f(X) = α0 + α1X + · · · + αnXn ∈ K[X] such that f(a) = 0. It then follows that a is algebraic over F(α0, . . . , αn). Since each αi is algebraic over F, the degree [F(α0, . . . , αn) : F] is finite. Therefore, [F(α0, . . . , αn)(a) : F] = [F(α0, . . . , αn)(a) : F(α0, . . . , αn)] [F(α0, . . . , αn) : F] is also finite and hence F(α0, . . . , αn)(a) and, in particular, a are algebraic over F.

Definition 2.55.

A field extension F ⊆ K is called simple, if K = F(a) for some a ∈ K.

Proposition 2.31.

Let F be a field of characteristic 0 and let a, b (belonging to some extension of F) be algebraic over F. Then the extension F(a, b) of F is simple.

Proof

Let p(X) and q(X) be the minimal polynomials (over F) of a and b respectively. Let d := deg p and d′ := deg q. The polynomials p and q are irreducible over F and hence by Exercise 2.61 have no multiple roots. Let a1, . . . , ad be the roots of p and b1, . . . , bd′ the roots of q (in a splitting field), with a = a1 and b = b1. For each i, j with j ≠ 1, the equation ai + λbj = a + λb has a unique solution for λ (not necessarily in F). Since F is infinite, we can choose μ ∈ F which is not a solution of any of the equations just mentioned. Define c := a + μb, so that c ≠ ai + μbj for all i, j with j ≠ 1. Clearly, F(c) ⊆ F(a, b). To prove the reverse inclusion, note that by hypothesis q(b) = 0. Also if we define f(X) := p(c – μX) ∈ F(c)[X], we see that f(b) = p(a) = 0. By the choice of c, we have f(bj) ≠ 0 for j ≠ 1. Finally since q is square-free, we have gcd(f, q) = X – b. Since this gcd can be computed in F(c)[X], we get b ∈ F(c), and so a = c – μb ∈ F(c) too.
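
For F = ℚ, a = √2, b = √3, the proof's element c = a + μb with μ = 1 indeed generates ℚ(√2, √3). A quick floating-point sanity check (our illustration; the minimal polynomial X4 – 10X2 + 1 of c and the expressions for √2 and √3 in terms of c are standard facts, not taken from the text):

```python
from math import sqrt

a, b = sqrt(2.0), sqrt(3.0)
c = a + b          # the primitive element of the proof, with mu = 1

# c is a root of X^4 - 10 X^2 + 1, an irreducible polynomial of degree
# 4 = [Q(sqrt2, sqrt3) : Q], so Q(c) = Q(sqrt2, sqrt3).
assert abs(c**4 - 10 * c**2 + 1) < 1e-9

# Both generators are recovered as polynomials in c:
assert abs((c**3 - 9 * c) / 2 - a) < 1e-9    # sqrt(2) = (c^3 - 9c)/2
assert abs((11 * c - c**3) / 2 - b) < 1e-9   # sqrt(3) = (11c - c^3)/2
```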

Corollary 2.13.

A finite extension F ⊆ K of fields of characteristic 0 is simple.

Proof

We proceed by induction on d := [K : F]. The result vacuously holds for d = 1. So let us assume that d > 1 and that the result holds for all smaller values of d. Choose an element a ∈ K \ F. Then [F(a) : F] > 1 and divides d. If [F(a) : F] = d, we are done. So assume [F(a) : F] < d. Since [K : F(a)] < d, by the induction hypothesis the extension F(a) ⊆ K is simple, say K = F(a)(b) = F(a, b). The result now follows immediately from the previous proposition.

2.8.2. Splitting Fields and Algebraic Closure

Let f(X) be a non-constant polynomial of degree d in F[X]. Assume that f does not split over F. Consider an irreducible (in F[X]) factor f′ of f of degree d′ > 1. F′ := F[X]/〈f′〉 is a field extension of F. Furthermore, if α1 denotes the residue class of X in F′, then f′(α1) = 0, and the elements 1, α1, α12, . . . , α1d′–1 constitute a basis of F′ over F. In particular, [F′ : F] = d′ ≤ d. Now, one can write f(X) = (X – α1)g(X) for some g(X) ∈ F′[X]. If g splits over F′, so does f too. Otherwise, choose any irreducible (in F′[X]) factor g′ of g with deg g′ > 1 and consider the field extension F″ := F′[X]/〈g′〉. Then [F″ : F′] = deg g′ ≤ deg g = d – 1, so that [F″ : F] ≤ d(d – 1). Moreover, if α2 denotes the residue class of X in F″, then f(X) = (X – α1)(X – α2)h(X) for some h(X) ∈ F″[X]. Proceeding in this way we get:

Proposition 2.32.

For a polynomial f ∈ F[X] of degree d ≥ 1, there is a field extension K of F with [K : F] ≤ d!, such that f splits over K.
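
The construction above can be carried out explicitly for F = GF(2) and f(X) = X2 + X + 1, which is irreducible over F: already F′ = F[X]/〈f〉, the field with four elements, splits f. A Python sketch (our encoding of GF(4) as bit pairs, not from the text):

```python
# Arithmetic in GF(4) = GF(2)[X]/<X^2 + X + 1>.  An element hi*t + lo
# (t = residue class of X) is encoded as the bit pair (hi, lo).
def add(u, v):
    return (u[0] ^ v[0], u[1] ^ v[1])

def mul(u, v):
    # (a t + b)(c t + d) = ac t^2 + (ad + bc) t + bd, then reduce t^2 = t + 1
    a, b = u
    c, d = v
    return ((a & c) ^ (a & d) ^ (b & c), (a & c) ^ (b & d))

def f(x):
    # evaluate f(X) = X^2 + X + 1 at x in GF(4)
    return add(add(mul(x, x), x), (0, 1))

t, t_plus_1 = (1, 0), (1, 1)
assert f(t) == (0, 0) and f(t_plus_1) == (0, 0)     # f splits over GF(4)
assert f((0, 0)) == (0, 1) and f((0, 1)) == (0, 1)  # but has no root in GF(2)
```

Here [K : F] = 2 ≤ 2! = d!, in accordance with the proposition.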

We now establish the uniqueness of the splitting field of a polynomial f ∈ F[X]. To start with, we set up certain notations. An isomorphism μ : F → F′ of fields induces an isomorphism μ* : F[X] → F′[Y] of polynomial rings, defined by adXd + ad–1Xd–1 + · · · + a0 ↦ μ(ad)Yd + μ(ad–1)Yd–1 + · · · + μ(a0). We have μ*(a) = μ(a) for all a ∈ F. Note also that f ∈ F[X] is irreducible over F if and only if μ*(f) ∈ F′[Y] is irreducible over F′. With these notations we state the following important lemma.

Lemma 2.5.

Let the non-constant polynomial f ∈ F[X] be irreducible over F. Let α and β be roots of f and μ*(f) respectively (in extensions of F and F′). Then there is an isomorphism τ : F(α) → F′(β) of fields such that τ(a) = μ(a) for all a ∈ F and τ(α) = β.

Proof

Since F(α) = F[α] and F′(β) = F′[β], we can define the map τ : F[α] → F′[β] by g(α) ↦ (μ*(g))(β) for each g ∈ F[X]. It is now an easy check that τ is a well-defined isomorphism of fields with the desired properties.

Roots of an irreducible polynomial are called conjugates (of each other). If α and β are two roots of an irreducible polynomial f ∈ F[X], the last lemma guarantees the existence of an isomorphism τ : F(α) → F(β) that fixes all the elements of F and that maps α ↦ β.

Proposition 2.33.

We use the maps μ : F → F′ and μ* : F[X] → F′[Y] as defined above. Let f ∈ F[X] be a non-constant polynomial and let K and K′ be splitting fields of f and μ*(f) over F and F′ respectively. Then there is an isomorphism τ : K → K′ of fields, such that τ(a) = μ(a) for all a ∈ F.

Proof

We proceed by induction on n := [K : F]. (By Proposition 2.32 n is finite.) If n = 1, then K = F, that is, the polynomial f splits over F itself and so does μ*(f) over F′, that is, K′ = F′. Thus τ = μ is the desired isomorphism.

Now assume that n > 1 and that the result holds for all fields L and for all polynomials in L[X] with splitting fields (over L) of extension degrees less than n. Consider an irreducible factor g of f with 1 < deg g ≤ deg f. Note that g also splits over K. We take any root α ∈ K of g and consider the tower of field extensions F ⊆ F(α) ⊆ K. Similarly, let β ∈ K′ be a root of μ*(g) and consider F′ ⊆ F′(β) ⊆ K′. By Lemma 2.5 there is an isomorphism ν : F(α) → F′(β) with ν(a) = μ(a) for all a ∈ F and ν(α) = β. Now [K : F(α)] = [K : F]/[F(α) : F] = [K : F]/deg g < n. It is evident that K and K′ are splitting fields of f and μ*(f) over F(α) and F′(β) respectively. Hence by the induction hypothesis there is an isomorphism τ : K → K′ with τ(a) = ν(a) for all a ∈ F(α). In particular, τ(a) = μ(a) for all a ∈ F.

The results pertaining to the splitting field of a polynomial can be generalized in the following way. Let S be a non-empty subset of F[X]. A splitting field of S over F is a minimal field K containing F such that each polynomial f ∈ S splits in K. If S = {f1, . . . , fr} is a finite set, the splitting field of S is the same as the splitting field of f = f1 · · · fr (Exercise 2.57). But the situation is different, if S is infinite. Of particular interest is the set S consisting of all irreducible polynomials in F[X]. In this case, the splitting field of S is an algebraic closure of F.

We give a sketch of the proof that even when S is infinite, a splitting field for S can be constructed. This, in particular, establishes the existence of an algebraic closure of any field. We may assume that S comprises non-constant polynomials only. For each f ∈ S, we define an indeterminate Xf and consider the polynomial ring A := F[Xf | f ∈ S] and the ideal 𝔞 of A generated by f(Xf) for all f ∈ S. We have 𝔞 ≠ A and, therefore, there is a maximal ideal 𝔪 of A containing 𝔞 (Exercise 2.23). Consider the field F1 := A/𝔪 containing F. Every polynomial f ∈ S has at least one root in F1. Now we replace F by F1 and as above get another field F2 containing F1 (and hence F), such that every polynomial in S (of degree ≥ 2) has at least two roots in F2. We continue this procedure (infinitely often, if necessary) and obtain a sequence of fields F ⊆ F1 ⊆ F2 ⊆ F3 ⊆ · · ·. Define K to be the field consisting of all elements of the union of the Fi that are algebraic over F. Each polynomial in S splits in K, but in no proper subfield of K, that is, K is a splitting field of S.

It turns out that the splitting field of S is unique up to isomorphisms that fix elements of F. In particular, the algebraic closure of F is unique up to isomorphisms that fix elements of F, and is denoted by F̄.

*2.8.3. Elements of Galois Theory

For a field K, the set Aut K of all automorphisms of K is a group under (functional) composition. We extend this concept now. Let F ⊆ K be an extension of fields.

Definition 2.56.

An automorphism σ ∈ Aut K is called an F-automorphism of K, if σ fixes all the elements of F (which means that σ(a) = a for all a ∈ F). The set of all F-automorphisms of K is denoted by AutF K or by Gal(K|F) and is a subgroup of Aut K. The Galois group of a polynomial f ∈ F[X] is defined to be the group AutF K, where K is the splitting field of f over F.

Conversely, for a subgroup H of AutF K the set of elements of K that are fixed by all the automorphisms of H, that is, the set of all x ∈ K with σ(x) = x for every σ ∈ H, is a subfield of K, called the fixed field of H (over F) and denoted as FixF H. Clearly, F ⊆ FixF H ⊆ K.

For every intermediate field L (that is, a field L with F ⊆ L ⊆ K), we have a subgroup AutL K of AutF K. Conversely, given a subgroup H of AutF K we have the intermediate fixed field FixF H. It is a relevant question to ask if there is any relationship between the subgroups of AutF K and the intermediate fields. A nice correspondence exists for a particular type of extensions that we define now.

Definition 2.57.

A field extension F ⊆ K is said to be a Galois extension (or K is said to be a Galois extension over F), if FixF(AutF K) = F. Thus K is Galois over F if and only if for every x ∈ K \ F there is a σ ∈ AutF K with σ(x) ≠ x.

Example 2.18.

Let K be the splitting field of a non-constant polynomial f ∈ F[X]. By Exercise 2.77, the extension F ⊆ K is normal. Assume that F ⊆ K is a separable extension (Exercise 2.75). Consider an element α ∈ K \ F and let g be the minimal polynomial of α over F. Then deg g > 1 and g splits in K[X]. By assumption (of separability), there is a root β ∈ K of g with β ≠ α. Lemma 2.5 shows that there is a τ ∈ AutF K such that τ(α) = β. Thus, K is Galois over F. In particular, if char F = 0 or if F is a finite field, then F ⊆ K is separable and so Galois. For example, the splitting field over ℚ of any non-constant polynomial in ℚ[X] is a Galois extension of ℚ.
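
For a tiny explicit Galois group, take F = GF(2) and K = GF(4), the splitting field of X2 + X + 1 over F. The Frobenius map σ(x) = x2 is an F-automorphism exchanging the two conjugate roots. A Python check (our encoding, not from the text: GF(4) elements are bit pairs (hi, lo) standing for hi·t + lo with t2 = t + 1):

```python
def mul(u, v):
    # multiplication in GF(4) = GF(2)[X]/<X^2 + X + 1>
    a, b = u
    c, d = v
    return ((a & c) ^ (a & d) ^ (b & c), (a & c) ^ (b & d))

def frob(x):
    return mul(x, x)   # the Frobenius map sigma(x) = x^2

GF4 = [(0, 0), (0, 1), (1, 0), (1, 1)]

# sigma fixes exactly the prime field GF(2) = {0, 1} ...
assert [x for x in GF4 if frob(x) == x] == [(0, 0), (0, 1)]
# ... and swaps the conjugate roots t and t + 1 of X^2 + X + 1:
assert frob((1, 0)) == (1, 1) and frob((1, 1)) == (1, 0)
# sigma has order 2, so {id, sigma} is a group of F-automorphisms of
# order 2 = [GF(4) : GF(2)], and GF(4) is Galois over GF(2).
assert all(frob(frob(x)) == x for x in GF4)
```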

The following theorem establishes the correspondence we are looking for.

Theorem 2.34. Fundamental theorem of Galois theory

For a finite Galois extension F ⊆ K, there is a bijective correspondence between the set of all intermediate fields and the set of all subgroups of AutF K (given by L ↦ AutL K and H ↦ FixF H) such that the following assertions hold:

  1. AutFixF H K = H for every subgroup H of AutF K.

  2. FixF (AutL K) = L for every field L with F ⊆ L ⊆ K.

  3. For field extensions F ⊆ L ⊆ L′ ⊆ K, the extension degree [L′ : L] is the same as the index [AutL K : AutL′ K]. In particular, the order of AutF K is [K : F].

  4. For every intermediate field L, one has:

  1. K is Galois over L.

  2. L is Galois over F if and only if AutL K is a normal subgroup of AutF K. In this case, AutF L ≅ AutF K/AutL K.

A proof of this theorem is rather long and uses many auxiliary results which we would not need otherwise. We, therefore, choose to omit the proof here.

Exercise Set 2.8

2.73 Let α be transcendental over F. Show that the domain F[α] and the field F(α) are respectively isomorphic to the polynomial ring F[X] and the field F(X) of rational functions in one indeterminate X. Generalize the result to an arbitrary family αi, i ∈ I, of elements, each of which is transcendental over F.
2.74 Let F ⊆ K be a field extension and let σ be an endomorphism of K with σ(a) = a for every a ∈ F.
  1. If a non-constant polynomial f ∈ F[X] has a root α ∈ K, show that σ(α) is also a root of f. For example, if F = ℝ, K = ℂ, and σ is the automorphism mapping each z ∈ ℂ to its (complex) conjugate z̄, then we conclude that if a complex number z is a root of f ∈ ℝ[X], then z̄ is also a root of f. A similar result holds for the extension ℚ ⊆ ℚ(√m), where m is a non-square rational number.

  2. If K is algebraic over F, show that σ is an automorphism. [H]

2.75 Let F ⊆ K be a field extension.
  1. An irreducible polynomial f ∈ F[X] is said to be separable over F, if f has no multiple roots. An algebraic element α ∈ K is said to be separable over F, if the minimal polynomial of α over F is separable. K is called a separable extension of F, if every element of K is (algebraic and) separable over F. Show that if char F = 0 or if F is a finite field, and if K is an algebraic extension of F, then K is separable over F. [H]

  2. An algebraic element α ∈ K is called purely inseparable over F, if the minimal polynomial of α over F factors in K[X] as (X – α)^n for some n ∈ ℕ. If every element of K is (algebraic and) purely inseparable over F, then K is called a purely inseparable extension of F. Show that α ∈ K is both separable and purely inseparable over F if and only if α ∈ F. Thus, if char F = 0 or if F is a finite field, then F has no purely inseparable extension other than itself.

  3. If p := char F > 0, show that an element α ∈ K is purely inseparable over F if and only if minpoly_{α,F}(X) = X^{p^r} + a for some r ≥ 0 and a ∈ F. In particular, show that if K is a finite purely inseparable extension of F, then [K : F] = p^s for some s ≥ 0.

2.76 F is called a perfect field, if every irreducible polynomial in F[X] is separable over F.
  1. Show that F is a perfect field if and only if every algebraic extension of F is separable over F. In particular, fields of characteristic 0 and finite fields are perfect.

  2. Let p := char F > 0. Show that F is perfect if and only if every element of F has a p-th root in F. [H]

2.77 A field extension F ⊆ K is called normal, if every irreducible polynomial in F[X] that has a root in K splits in K[X].
  1. If K is the splitting field of a polynomial f ∈ F[X] over F, show that K is a normal extension of F. [H]

  2. If [K : F] = 2, show that F ⊆ K is a normal extension.

  3. Consider the tower of field extensions ℚ ⊆ ℚ(√2) ⊆ ℚ(2^{1/4}) to conclude that if F ⊆ K and K ⊆ L are normal extensions, then F ⊆ L need not be normal.

2.78Prove the following assertions:
  1. is an infinite extension of . [H]

  2. . [H]

2.79 Let F ⊆ K be a field extension and let L be the fixed field of AutF K over F. Show that K is a Galois extension of L.

2.9. Finite Fields

Finite fields are arguably the most important fields used in cryptography. They enjoy certain nice properties that infinite fields (in particular, the well-known fields ℚ, ℝ and ℂ) do not. We concentrate on some properties of finite fields in this section. As we see later, arithmetic over a finite field K is fast when char K = 2 or when #K is a prime. As a result, these two classes of fields are the most common ones employed in cryptography. However, in this section, we do not restrict ourselves to these specific fields, but provide a general treatment valid for all finite fields. As in the previous section, we continue to use the letters F, K, L to denote fields. In addition, we use the letter p to denote a prime number and q a power of p, that is, q = p^n for some n ∈ ℕ.

2.9.1. Existence and Uniqueness of Finite Fields

Let K be a finite field of cardinality q. Then p := char K > 0. By Proposition 2.7, p is a prime, and K contains an isomorphic copy F of the field 𝔽_p. If n := [K : F], then q = p^n. Therefore, we have proved the first statement of the following important result.

Theorem 2.35.

The cardinality of a finite field is a power p^n, n ∈ ℕ, of a prime number p. Conversely, given a prime p and n ∈ ℕ, there exists a finite field of cardinality p^n.

Proof

In order to construct a finite field of cardinality q := p^n, we start with F := 𝔽_p and consider the splitting field K of the polynomial f(X) := X^q – X ∈ F[X]. Since f′(X) = –1 ≠ 0, the roots of f are distinct (Exercise 2.61). Therefore, the set E := {a ∈ K | a^q = a} of roots of f has cardinality q. By Exercise 2.80, E is a field. Since F ⊆ E ⊆ K and f splits over E, by the definition of splitting fields we have K = E, that is, #K = #E = q.

Theorem 2.36. Fermat’s little theorem for finite fields

Let K be a finite field of cardinality q. Then every a ∈ K satisfies a^q = a.

Proof

Clearly, 0^q = 0. Take a ≠ 0. K* being a group of order q – 1, by Proposition 2.4 ord_{K*}(a) divides q – 1. In particular, a^{q–1} = 1, that is, a^q = a.
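For a prime field, Theorem 2.36 can be checked by direct computation; the following small Python check is ours, not the book's (the modulus 7 is an arbitrary choice of prime):

```python
# Fermat's little theorem in the prime field F_q, here with q = 7:
# every a satisfies a^q = a, and every a != 0 satisfies a^(q-1) = 1.
q = 7
assert all(pow(a, q, q) == a for a in range(q))
assert all(pow(a, q - 1, q) == 1 for a in range(1, q))
```

The built-in three-argument `pow` performs modular exponentiation by repeated squaring, so the same check remains fast even for cryptographically large prime moduli.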

Theorem 2.37.

Let K be a finite field of cardinality q = p^n and let F be the subfield of K isomorphic to 𝔽_p. Then K is the splitting field of the polynomial X^q – X over F. In particular, K is unique up to F-isomorphisms (that is, isomorphisms fixing the elements of F).

Proof

By Theorem 2.36, each of the q elements of K is a root of f(X) := X^q – X, and consequently K is the splitting field of f. The last assertion in the theorem follows from the uniqueness of splitting fields (Proposition 2.33).

This uniqueness allows us to talk about the finite field of cardinality q (rather than a finite field of cardinality q). We denote this (unique) field by 𝔽_q.

The results proved so far can be generalized to arbitrary extensions 𝔽_q ⊆ 𝔽_{q^m}, where q = p^n and n, m ∈ ℕ. We leave the details to the reader (Exercise 2.82). It is important to point out here that since 𝔽_{q^m} is the splitting field of X^{q^m} – X over 𝔽_q, by Exercise 2.77 we have:

Corollary 2.14.

Every finite extension of finite fields is normal.

This implies that an irreducible polynomial f ∈ 𝔽_q[X] has either none or all of its roots in 𝔽_{q^m}. Also, if α ∈ 𝔽_q with q = p^n, then α^q = α^{p^n} = α. Therefore, α^{p^{n–1}} is a p-th root of α. By Exercise 2.76(b), we then conclude:

Corollary 2.15.

Every finite field is perfect.

Proposition 2.34.

Consider the extension 𝔽_q ⊆ 𝔽_{q^m}, m ∈ ℕ. For d ∈ ℕ, there is a unique intermediate field with q^d elements if and only if d | m. Furthermore, if d | m, then α ∈ 𝔽_{q^m} belongs to the (unique) intermediate field with q^d elements if and only if α^{q^d} = α.

Proof

For d | m, we have (X^{q^d} – X) | (X^{q^m} – X). The q^d roots of X^{q^d} – X in 𝔽_{q^m} constitute an intermediate field L. If L′ ≠ L were another intermediate field with q^d elements, then by Theorem 2.36 more than q^d elements of 𝔽_{q^m} would be roots of X^{q^d} – X, a contradiction. Conversely, an intermediate field L contains q^d elements, where d := [L : 𝔽_q]. Since m = [𝔽_{q^m} : 𝔽_q] = [𝔽_{q^m} : L] [L : 𝔽_q], we have d | m. The last assertion in the proposition follows immediately from the above argument.

Corollary 2.16.

Let f ∈ 𝔽_q[X] be irreducible and let α ∈ 𝔽_{q^m} be a root of f. Then deg f divides m.

Proof

Consider the intermediate field 𝔽_q(α) ≅ 𝔽_{q^d} of the extension 𝔽_q ⊆ 𝔽_{q^m}, where d := deg f, and use Proposition 2.34 together with the fact that 𝔽_q ⊆ 𝔽_{q^m} is a normal extension.

Now we will prove a very important result concerning the multiplicative group 𝔽_q* = 𝔽_q \ {0}.

Theorem 2.38.

𝔽_q* is a cyclic group for every finite field 𝔽_q.

Proof

Modify the proof of Proposition 2.19 or use the following more general result.

Theorem 2.39.

Let K be a field (not necessarily finite). Then any finite subgroup G of the multiplicative group K* is cyclic.

Proof

Since K is a field, for any n ∈ ℕ the polynomial X^n – 1 has at most n roots in K and hence in G. The theorem then follows immediately from Exercise 2.18.
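For small prime fields, the generators of the cyclic group 𝔽_p* can be found by brute force; the following Python sketch (the helper name `order` is ours) illustrates Theorem 2.38 for p = 13:

```python
def order(a, p):
    """Multiplicative order of a in F_p*, for a prime p and a not divisible by p."""
    k, x = 1, a % p
    while x != 1:
        x = (x * a) % p
        k += 1
    return k

p = 13
generators = [a for a in range(1, p) if order(a, p) == p - 1]
# F_13* is cyclic of order 12; it has phi(12) = 4 generators.
assert generators == [2, 6, 7, 11]
```

Note that the order of every element divides p – 1 (Proposition 2.4), which is why a single generator forces the whole group to be cyclic.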

Corollary 2.17.

Every finite extension 𝔽_q ⊆ 𝔽_{q^m} is simple. In particular, 𝔽_q[X] contains an irreducible polynomial of degree m (for every q and every m).

Proof

Let α be a generator of the cyclic group 𝔽_{q^m}*. Then m is the smallest positive integer s for which α^{q^s} = α. Let f := minpoly_{α,𝔽_q} with d := deg f, so that 𝔽_q(α) ≅ 𝔽_{q^d}. If d < m, then α^{q^d} = α, a contradiction. Thus d = m, that is, 𝔽_{q^m} = 𝔽_q(α), and f is an irreducible polynomial of degree m in 𝔽_q[X].

2.9.2. Polynomials over Finite Fields

In this section, we study some useful properties of polynomials over finite fields. We concentrate on polynomials in 𝔽_q[X] for an arbitrary q = p^n, p prime, n ∈ ℕ. We have seen how the polynomials X^{q^m} – X prove to be important for understanding the structure of finite fields. But that is not all; these polynomials have further roles to play, as we now describe.

Let 𝔽_q ⊆ 𝔽_{q^m} be a finite extension of finite fields and let α ∈ 𝔽_{q^m} be a root of the polynomial f(X) = a_d X^d + · · · + a_1 X + a_0 ∈ 𝔽_q[X]. Since each a_i satisfies a_i^q = a_i, we have f(α^q) = a_d α^{qd} + · · · + a_1 α^q + a_0 = (a_d α^d + · · · + a_1 α + a_0)^q = f(α)^q = 0. Therefore, α^q is also a root of f. More generally, for each r = 0, 1, 2, . . . the element α^{q^r} is a root of f(X). This gives us a nice procedure for computing the minimal polynomial of α, as the following corollary suggests.

Corollary 2.18.

The minimal polynomial of α ∈ 𝔽_{q^m} over 𝔽_q is (X – α)(X – α^q) · · · (X – α^{q^{d–1}}), where d is the smallest positive integer s for which α^{q^s} = α.

Proof

Let f_α := minpoly_{α,𝔽_q} have degree δ. Then 𝔽_{q^δ} = 𝔽_q(α) is the smallest field containing 𝔽_q and α and hence all the roots of f_α, that is, α^{q^s} = α for s = δ and for no smaller positive integer value of s. Therefore, δ = d, and the conjugates of α are precisely α, α^q, . . . , α^{q^{d–1}}.

We now prove a theorem which has important consequences.

Theorem 2.40.

X^{q^m} – X is the product of all monic irreducible polynomials in 𝔽_q[X] whose degrees divide m.

Proof

We have X^{q^m} – X = ∏_{α ∈ 𝔽_{q^m}} (X – α). By Corollary 2.18, the minimal polynomial f_α(X) of α ∈ 𝔽_{q^m} over 𝔽_q divides X^{q^m} – X. By Corollary 2.16, deg f_α divides m. Finally, since f_α(X) = f_β(X) or gcd(f_α(X), f_β(X)) = 1 depending on whether α and β are conjugates or not, X^{q^m} – X is a product of monic irreducible polynomials of 𝔽_q[X] whose degrees divide m. In order to show that X^{q^m} – X is the product of all such polynomials, consider an arbitrary polynomial g ∈ 𝔽_q[X] which is monic and irreducible over 𝔽_q and has degree d | m. The polynomial g splits over 𝔽_{q^d} (with no multiple roots, finite fields being perfect). Since d | m, by Proposition 2.34 𝔽_{q^d} ⊆ 𝔽_{q^m}. Thus g splits over 𝔽_{q^m} as well and, in particular, divides X^{q^m} – X.

The first consequence of Theorem 2.40 is that it leads to a procedure for checking the irreducibility of a polynomial f ∈ 𝔽_q[X]. Let d := deg f. If f(X) is reducible, it admits an irreducible factor of degree ≤ ⌊d/2⌋. Since gcd(f(X), X^{q^m} – X) is the product of all distinct irreducible factors of f with degrees dividing m, we compute the gcds g_m := gcd(f(X), X^{q^m} – X) for m = 1, . . . , ⌊d/2⌋. If all these gcds are 1, we conclude that f is irreducible; otherwise f is reducible. We will see an optimized implementation of this procedure in Chapter 3. Besides irreducibility testing, the above theorem also leads to algorithms for finding random irreducible polynomials and for factoring polynomials, as we will also discuss in Chapter 3.
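The gcd procedure above can be implemented compactly for q = 2. The Python sketch below is our own illustration (not the optimized implementation of Chapter 3); it encodes a polynomial over 𝔽₂ as an integer whose bit i is the coefficient of X^i, computes X^{2^i} mod f by repeated squaring, and takes gcds:

```python
# Irreducibility test for f in F_2[X] via gcd(f, X^(2^i) - X), i = 1..deg(f)//2.
# Polynomials over F_2 are encoded as ints (bit i = coefficient of X^i).

def pmulmod(a, b, f):
    """Product a*b mod f in F_2[X]; a and b must already be reduced mod f."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a.bit_length() == f.bit_length():
            a ^= f                       # reduce: subtraction is XOR over F_2
    return r

def pgcd(a, b):
    """gcd in F_2[X] (Euclidean algorithm with XOR-based remainders)."""
    while b:
        while a.bit_length() >= b.bit_length():
            a ^= b << (a.bit_length() - b.bit_length())
        a, b = b, a
    return a

def is_irreducible(f):
    d = f.bit_length() - 1               # deg f
    x = 0b10                             # the polynomial X
    xq = x
    for _ in range(d // 2):
        xq = pmulmod(xq, xq, f)          # now xq = X^(2^i) mod f
        if pgcd(f, xq ^ x) != 1:         # gcd(f, X^(2^i) - X) nontrivial
            return False
    return True

assert is_irreducible(0b111)             # X^2 + X + 1
assert is_irreducible(0b1011)            # X^3 + X + 1
assert is_irreducible(0b1101)            # X^3 + X^2 + 1
assert not is_irreducible(0b101)         # X^2 + 1 = (X + 1)^2
assert not is_irreducible(0b10101)       # X^4 + X^2 + 1 = (X^2 + X + 1)^2
```

Squaring i times gives X^{2^i} mod f without ever forming the huge polynomial X^{2^i} itself; this is exactly what makes the test feasible for large degrees.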

The second consequence of Theorem 2.40 is that it gives us a formula for the number of monic irreducible polynomials of a given degree over a given finite field. First we need to define a function on the set ℕ of natural numbers.

Definition 2.58.

The Möbius function μ : ℕ → {–1, 0, 1} is defined as

μ(n) := 1 if n = 1, μ(n) := (–1)^r if n is a product of r pairwise distinct primes, and μ(n) := 0 if n is divisible by the square of a prime.

It follows that μ(n) ≠ 0 if and only if n is square-free.

Lemma 2.6.

For n ∈ ℕ, we have

Σ_{d|n} μ(d) = 1 if n = 1, and Σ_{d|n} μ(d) = 0 if n > 1,

where Σ_{d|n} denotes summation over all positive divisors d of n.

Proof

The result follows immediately for n = 1. For n > 1, write n = p_1^{e_1} · · · p_r^{e_r}, where p_1, . . . , p_r are r ≥ 1 distinct primes and each e_i ≥ 1. The only non-zero terms in the sum are those corresponding to d = 1 and to d = p_{i_1} · · · p_{i_s} for pairwise distinct choices of i_1, . . . , i_s ∈ {1, . . . , r}. From the definition of μ, it then follows that Σ_{d|n} μ(d) = Σ_{s=0}^{r} (–1)^s C(r, s) = (1 – 1)^r = 0, where C(r, s) denotes the binomial coefficient.
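Both the definition of μ and Lemma 2.6 are easy to check computationally; the following Python sketch (the helper name `mobius` is ours) verifies the lemma for the first few hundred integers:

```python
def mobius(n):
    """Mobius function: 0 if n is not square-free, else (-1)^(number of prime factors)."""
    r, d = 0, 2
    while d * d <= n:
        if n % d == 0:
            n //= d
            if n % d == 0:       # d^2 divided the original n
                return 0
            r += 1
        else:
            d += 1
    if n > 1:                    # one remaining prime factor
        r += 1
    return (-1) ** r

# Lemma 2.6: the sum of mu(d) over the divisors d of n is 1 for n = 1, else 0.
for n in range(1, 200):
    s = sum(mobius(d) for d in range(1, n + 1) if n % d == 0)
    assert s == (1 if n == 1 else 0)
```

Trial division suffices here because μ is only ever evaluated at divisors of the (small) extension degree m in the applications that follow.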

Lemma 2.7. Möbius inversion formula

Let f and g be maps from ℕ to an Abelian group G.

  1. If G is additive and g(n) = Σ_{d|n} f(d) for all n ∈ ℕ, then f(n) = Σ_{d|n} μ(d) g(n/d) = Σ_{d|n} μ(n/d) g(d) for all n ∈ ℕ.

  2. If G is multiplicative and g(n) = ∏_{d|n} f(d) for all n ∈ ℕ, then f(n) = ∏_{d|n} g(n/d)^{μ(d)} = ∏_{d|n} g(d)^{μ(n/d)} for all n ∈ ℕ.

Proof

To prove the additive formula we note that

Σ_{d|n} μ(d) g(n/d) = Σ_{d|n} μ(d) Σ_{e|(n/d)} f(e) = Σ_{e|n} f(e) (Σ_{d|(n/e)} μ(d)) = f(n),

where the last equality follows from Lemma 2.6. The multiplicative formula can be proved similarly.

Let us denote by ν_{q,m} the number of monic irreducible polynomials in 𝔽_q[X] of degree m, and by I_{q,m}(X) the product of all monic irreducible polynomials in 𝔽_q[X] of degree m. By Theorem 2.40, we have q^m = Σ_{d|m} d ν_{q,d} (on comparing degrees) and X^{q^m} – X = ∏_{d|m} I_{q,d}(X). Applications of the Möbius inversion formula then yield the following formulas:

Equation 2.4

ν_{q,m} = (1/m) Σ_{d|m} μ(d) q^{m/d} and I_{q,m}(X) = ∏_{d|m} (X^{q^d} – X)^{μ(m/d)}
Since μ(n) ≥ –1 for all n ∈ ℕ, and since m has at most m – 1 divisors d > 1, each contributing a term of absolute value at most q^{m/2}, Equation (2.4) implies that m ν_{q,m} ≥ q^m – (m – 1)q^{m/2} > 0, and hence ν_{q,m} ≥ 1. We, therefore, have an independent proof of the second statement in Corollary 2.17. Moreover, for practical values of q and m we have the good approximation:

Equation 2.5

ν_{q,m} ≈ q^m / m
Since the total number of monic polynomials of degree m in 𝔽_q[X] is q^m, a randomly chosen monic polynomial in 𝔽_q[X] of degree m is irreducible with probability approximately 1/m, that is, one expects to find an irreducible polynomial of degree m after picking O(m) random monic polynomials from 𝔽_q[X]. These observations have an important bearing on devising efficient algorithms for finding irreducible polynomials over finite fields. (See Chapter 3.)
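Equation (2.4) is straightforward to evaluate; the Python sketch below (function names are ours) computes ν_{q,m} and cross-checks it against the degree identity q^m = Σ_{d|m} d ν_{q,d}:

```python
# nu(q, m) = (1/m) * sum over d | m of mu(d) * q^(m/d)   (Equation 2.4)

def mobius(n):
    r, d = 0, 2
    while d * d <= n:
        if n % d == 0:
            n //= d
            if n % d == 0:
                return 0
            r += 1
        else:
            d += 1
    return (-1) ** (r + (1 if n > 1 else 0))

def nu(q, m):
    total = sum(mobius(d) * q ** (m // d) for d in range(1, m + 1) if m % d == 0)
    assert total % m == 0            # the count is always an integer
    return total // m

assert nu(2, 1) == 2                 # X and X + 1
assert nu(2, 2) == 1                 # X^2 + X + 1
assert nu(2, 3) == 2                 # X^3 + X + 1 and X^3 + X^2 + 1
assert nu(2, 4) == 3
# Degrees add up: sum over d | 4 of d * nu(2, d) equals 2^4.
assert sum(d * nu(2, d) for d in (1, 2, 4)) == 2 ** 4
```

For m = 4 and q = 2 the count 3 also illustrates the approximation ν_{q,m} ≈ q^m/m = 4: roughly one monic polynomial in m is irreducible.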

The conjugates of α ∈ 𝔽_{q^m} over 𝔽_q are α^{q^i}, i = 0, 1, . . . , d – 1. It is interesting to look at the sum and the product of the conjugates of α. By Corollary 2.18, minpoly_{α,𝔽_q}(X) = (X – α)(X – α^q) · · · (X – α^{q^{d–1}}) for some d | m. Since minpoly_{α,𝔽_q}(X) ∈ 𝔽_q[X], the sum α + α^q + · · · + α^{q^{d–1}} and the product α · α^q · · · α^{q^{d–1}} (being, up to sign, coefficients of this polynomial) belong to 𝔽_q. Since α^{q^d} = α, for any (positive) integral multiple δ of d, the sum α + α^q + · · · + α^{q^{δ–1}} and the product α · α^q · · · α^{q^{δ–1}} are elements of 𝔽_q too.

Definition 2.59.

Let 𝔽_q ⊆ 𝔽_{q^m}, q = p^n, be a finite extension of finite fields and let α ∈ 𝔽_{q^m}. The trace of α over 𝔽_q is defined as the sum

Tr_{𝔽_{q^m}|𝔽_q}(α) := α + α^q + α^{q^2} + · · · + α^{q^{m–1}},

and the norm of α over 𝔽_q is defined as

N_{𝔽_{q^m}|𝔽_q}(α) := α · α^q · α^{q^2} · · · α^{q^{m–1}} = α^{(q^m – 1)/(q – 1)}.

In view of the preceding discussion, the trace and norm of α are elements of 𝔽_q. For q = p, the trace and the norm of α are also called the absolute trace and the absolute norm of α. We often drop the suffixes in these notations, when no ambiguities are likely.

The trace and norm functions play an important role in the theory of finite fields. See Exercise 2.86 for some elementary properties of these functions.

2.9.3. Representation of Finite Fields

𝔽_{q^m} is a vector space of dimension m over 𝔽_q. Let β_0, . . . , β_{m–1} be an 𝔽_q-basis of 𝔽_{q^m}. Each element a ∈ 𝔽_{q^m} has a unique representation a = a_0β_0 + · · · + a_{m–1}β_{m–1} with each a_i ∈ 𝔽_q. Therefore, if we have a representation of the elements of 𝔽_q, we can also represent the elements of 𝔽_{q^m}. Thus the elements of any finite field can be represented, provided that we have representations of the elements of prime fields. But the set {0, 1, . . . , p – 1} under modulo p arithmetic represents 𝔽_p.

So our problem reduces to selecting suitable bases β_0, . . . , β_{m–1} of 𝔽_{q^m} over 𝔽_q. In order to illustrate how we can do that, let us choose a priori a fixed monic irreducible polynomial f ∈ 𝔽_q[X] with deg f = m. We then represent 𝔽_{q^m} = 𝔽_q[X]/⟨f(X)⟩ = 𝔽_q(α), where α (the residue class of X) is a root of f in 𝔽_{q^m}. The elements 1, α, . . . , α^{m–1} are linearly independent over 𝔽_q, since otherwise α would be a root of a non-zero polynomial of degree less than m. The 𝔽_q-basis 1, α, . . . , α^{m–1} of 𝔽_{q^m} is called a polynomial basis (with respect to the defining polynomial f). The elements of 𝔽_{q^m} are then represented by polynomials in α of degree < m. The arithmetic in 𝔽_{q^m} is carried out as polynomial arithmetic modulo the irreducible polynomial f.

Example 2.19.
  1. The elements of 𝔽₂ are 0 and 1 with 0 + 0 = 0, 0 + 1 = 1, 1 + 0 = 1, 1 + 1 = 0, 0 · 0 = 1 · 0 = 0 · 1 = 0 and 1 · 1 = 1. In order to represent 𝔽₈, we choose the irreducible polynomial f(X) := X³ + X² + 1 ∈ 𝔽₂[X]. Elements of 𝔽₈ are a₂α² + a₁α + a₀, where a₀, a₁, a₂ ∈ 𝔽₂. In order to demonstrate the arithmetic in 𝔽₈, we take a := α² + 1 and b := α² + α. Their sum in 𝔽₈ is a + b = α + 1. On the other hand, ab = α⁴ + α³ + α² + α = α(α³ + α² + 1) + α² = α · 0 + α² = α². The complete multiplication table for this representation is given in Table 2.2.

    Table 2.2. Multiplication table for 𝔽₈ = 𝔽₂(α), α³ + α² + 1 = 0

    · | 0 | 1 | α | α+1 | α² | α²+1 | α²+α | α²+α+1
    0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
    1 | 0 | 1 | α | α+1 | α² | α²+1 | α²+α | α²+α+1
    α | 0 | α | α² | α²+α | α²+1 | α²+α+1 | 1 | α+1
    α+1 | 0 | α+1 | α²+α | α²+1 | 1 | α | α²+α+1 | α²
    α² | 0 | α² | α²+1 | 1 | α²+α+1 | α+1 | α | α²+α
    α²+1 | 0 | α²+1 | α²+α+1 | α | α+1 | α²+α | α² | 1
    α²+α | 0 | α²+α | 1 | α²+α+1 | α | α² | α+1 | α²+1
    α²+α+1 | 0 | α²+α+1 | α+1 | α² | α²+α | 1 | α²+1 | α

  2. 𝔽₃ is represented by the set {0, 1, 2} with arithmetic operations modulo 3. Since –1 is a quadratic non-residue modulo 3, the polynomial X² + 1 is irreducible over 𝔽₃. Therefore, the quotient field 𝔽₃[X]/⟨X² + 1⟩ = 𝔽₃(β) can be used to represent 𝔽₉, β being a root of this polynomial. The multiplication table of 𝔽₉ under this representation is then as shown in Table 2.3.

    Table 2.3. Multiplication table for 𝔽₉ = 𝔽₃(β), β² + 1 = 0

    · | 0 | 1 | 2 | β | β+1 | β+2 | 2β | 2β+1 | 2β+2
    0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
    1 | 0 | 1 | 2 | β | β+1 | β+2 | 2β | 2β+1 | 2β+2
    2 | 0 | 2 | 1 | 2β | 2β+2 | 2β+1 | β | β+2 | β+1
    β | 0 | β | 2β | 2 | β+2 | 2β+2 | 1 | β+1 | 2β+1
    β+1 | 0 | β+1 | 2β+2 | β+2 | 2β | 1 | 2β+1 | 2 | β
    β+2 | 0 | β+2 | 2β+1 | 2β+2 | 1 | β | β+1 | 2β | 2
    2β | 0 | 2β | β | 1 | 2β+1 | β+1 | 2 | 2β+2 | β+2
    2β+1 | 0 | 2β+1 | β+2 | β+1 | 2 | 2β | 2β+2 | β | 1
    2β+2 | 0 | 2β+2 | β+1 | 2β+1 | β | 2 | β+2 | 1 | 2β

Polynomial bases are most common in finite field implementations. Some other types of bases also deserve specific mention in this context.

Definition 2.60.

An element α ∈ 𝔽_{q^m} is called a normal element over 𝔽_q, if the conjugates α, α^q, . . . , α^{q^{m–1}} are (distinct and) linearly independent over 𝔽_q. For a normal element α of 𝔽_{q^m} over 𝔽_q, the 𝔽_q-basis α, α^q, . . . , α^{q^{m–1}} is called a normal basis (of 𝔽_{q^m} over 𝔽_q). If, in addition, α is a primitive element (that is, a generator) of 𝔽_{q^m}*, then α and the corresponding normal basis are called a primitive normal element and a primitive normal basis respectively.

It can be shown that normal bases exist for all finite extensions 𝔽_q ⊆ 𝔽_{q^m}. It can even be shown that primitive normal bases exist for all such extensions.

Example 2.20.

Consider the representation of 𝔽₈ in Example 2.19. The elements α, α² and α⁴ = α² + α + 1 satisfy

(α  α²  α⁴) = (1  α  α²) M,   M = | 0 0 1 |
                                   | 1 0 1 |
                                   | 0 1 1 |

with the 3×3 transformation matrix M having determinant 1 modulo 2. Thus α is a normal element of 𝔽₈ over 𝔽₂ and (α, α², α⁴) is a normal basis of 𝔽₈ over 𝔽₂. Since #𝔽₈* = 7 is prime, α is a generator of 𝔽₈*, that is, α is also a primitive normal element of 𝔽₈ over 𝔽₂.

On the other hand, α + 1 is not a normal element of 𝔽₈ over 𝔽₂. Table 2.2 gives (α + 1)² = α² + 1 and (α + 1)⁴ = α² + α, so that

(α+1  (α+1)²  (α+1)⁴) = (1  α  α²) M′,   M′ = | 1 1 0 |
                                               | 1 0 1 |
                                               | 0 1 1 |

with the transformation matrix M′ having determinant zero modulo 2.

Computations over finite fields often call for exponentiations of elements a = a_0β_0 + · · · + a_{m–1}β_{m–1}. If the β_i = α^{q^i}, i = 0, . . . , m – 1, constitute a normal basis, then a^q = a_{m–1}β_0 + a_0β_1 + · · · + a_{m–2}β_{m–1}, since α^{q^m} = α and a_i^q = a_i for each i. Thus the coefficients of a^q (in the representation under the given normal basis) are obtained simply by cyclically shifting the coefficients a_0, . . . , a_{m–1} in the representation of a. This leads to a considerable saving of time. In particular, this trick becomes most meaningful for q = 2 (a case of high importance in cryptography).

Now that exponentiations become cheaper with normal bases, one should not let the common operations (addition and multiplication) turn significantly slower. The sum of a = a_0β_0 + · · · + a_{m–1}β_{m–1} and b = b_0β_0 + · · · + b_{m–1}β_{m–1} remains as easy as in the case of a polynomial basis, namely, a + b = (a_0 + b_0)β_0 + · · · + (a_{m–1} + b_{m–1})β_{m–1}, where each a_i + b_i is calculated in 𝔽_q. However, computing the product ab introduces difficulty. In particular, it requires the representations β_iβ_j = Σ_{k=0}^{m–1} t_{ij}^{(k)} β_k, 0 ≤ i, j ≤ m – 1, of the products of basis elements. For i ≤ j, we have β_iβ_j = (β_0β_{j–i})^{q^i}, so that all the coefficients t_{ij}^{(k)} are determined by the t_{0j}^{(k)}. It is thus sufficient to look only at the coefficients t_{0j}^{(k)}, 0 ≤ j, k ≤ m – 1. We denote by C_α the number of non-zero t_{0j}^{(k)}. From practical considerations (for example, for hardware implementations), C_α should be as small as possible. For q = 2, one can show that 2m – 1 ≤ C_α ≤ m². If, for this special case, C_α = 2m – 1, the normal basis α, α^q, . . . , α^{q^{m–1}} is called an optimal normal basis. Unlike normal (or primitive normal) bases, optimal normal bases do not exist for all values of m.
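The cyclic-shift property of squaring can be verified directly for the normal basis (α, α², α⁴) of Example 2.20. The Python sketch below is our own check, assuming the bit-mask representation of 𝔽₈ with defining polynomial X³ + X² + 1 from Example 2.19:

```python
from itertools import product

F = 0b1101   # X^3 + X^2 + 1

def mul(a, b):
    """Product in F_8 = F_2[X]/(X^3 + X^2 + 1); elements are 3-bit masks."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0b1000:
            a ^= F
    return r

B = [0b010, 0b100, 0b111]    # normal basis: alpha, alpha^2, alpha^4 = alpha^2+alpha+1

def from_coords(c):
    """Element with coordinates c = (c0, c1, c2) in the normal basis B."""
    e = 0
    for ci, bi in zip(c, B):
        if ci:
            e ^= bi
    return e

# Squaring cyclically shifts the normal-basis coordinates (c0,c1,c2) -> (c2,c0,c1).
for c in product((0, 1), repeat=3):
    e = from_coords(c)
    assert mul(e, e) == from_coords((c[2], c[0], c[1]))
```

In hardware, this shift is a single-cycle rotation of a register, which is precisely why normal bases over 𝔽₂ are attractive for repeated squaring.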

We finally mention another representation of the elements of a finite field 𝔽_q that does not depend on the vector space representation discussed so far, but is based on the fact that the group 𝔽_q* is cyclic. If we are given a primitive element (that is, a generator) γ of 𝔽_q*, then the elements of 𝔽_q are 0, 1 = γ⁰, γ, . . . , γ^{q–2}. Multiplication and exponentiation become easy with this representation, since 0 · a = 0 for all a ∈ 𝔽_q, whereas γ^i · γ^j = γ^k with k ≡ i + j (mod q – 1). Unfortunately, this representation provides no clue on how to compute γ^i + γ^j. One possibility is to store a table consisting of the values z_k satisfying 1 + γ^k = γ^{z_k} for all k = 0, . . . , q – 2 (with γ^k ≠ –1), so that for i ≤ j one can compute γ^i + γ^j = γ^i(1 + γ^{j–i}) = γ^i γ^{z_{j–i}} = γ^l, where l ≡ i + z_{j–i} (mod q – 1). Such a table, called Zech’s logarithm table, can be maintained for small values of q and may facilitate computations in extensions of 𝔽_q. But if q is large (or, more correctly, if p is large, where q = p^n), this representation of the elements of 𝔽_q is often neither practical nor feasible. Another difficulty with this representation is that it calls for a primitive element γ. If q is large and the integer factorization of q – 1 is not available, no efficient methods are known for finding such an element, or even for checking whether a given element is primitive.

Example 2.21.

Consider the representation of 𝔽₉ in Example 2.19. By Table 2.3, γ := β + 1 is a generator of 𝔽₉*. Table 2.4 lists the powers of γ and the Zech logarithms.

Table 2.4. Zech’s logarithm table for 𝔽₉ with respect to γ = β + 1

k | γ^k | 1 + γ^k | z_k
0 | 1 | 2 | 4
1 | β+1 | β+2 | 7
2 | 2β | 2β+1 | 3
3 | 2β+1 | 2β+2 | 5
4 | 2 | 0 | —
5 | 2β+2 | 2β | 2
6 | β | β+1 | 1
7 | β+2 | β | 6
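Table 2.4 can be rebuilt programmatically from the representation 𝔽₉ = 𝔽₃[X]/⟨X² + 1⟩ of Example 2.19; the Python sketch below (our own encoding: an element c₀ + c₁β is the pair (c₀, c₁)) recomputes the powers of γ = β + 1 and the Zech logarithms:

```python
P = 3   # F_9 = F_3[X]/(X^2 + 1); an element c0 + c1*beta is the pair (c0, c1)

def mul(a, b):
    """(a0 + a1 beta)(b0 + b1 beta) with beta^2 = -1 (mod 3)."""
    c0 = (a[0] * b[0] - a[1] * b[1]) % P
    c1 = (a[0] * b[1] + a[1] * b[0]) % P
    return (c0, c1)

gamma = (1, 1)                       # beta + 1
powers = [(1, 0)]                    # gamma^0 = 1
for _ in range(7):
    powers.append(mul(powers[-1], gamma))
assert len(set(powers)) == 8         # gamma really generates F_9*

# Zech logarithms: z[k] with 1 + gamma^k = gamma^(z[k]); undefined when the sum is 0.
log = {g: k for k, g in enumerate(powers)}
zech = {}
for k, g in enumerate(powers):
    s = ((1 + g[0]) % P, g[1])
    if s != (0, 0):
        zech[k] = log[s]

assert zech == {0: 4, 1: 7, 2: 3, 3: 5, 5: 2, 6: 1, 7: 6}   # matches Table 2.4
```

Note that k = 4 is missing from the dictionary, exactly as the table leaves z₄ undefined: γ⁴ = 2 = –1, so 1 + γ⁴ = 0 has no discrete logarithm.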

Exercise Set 2.9

2.80 Let F be a field (not necessarily finite) of characteristic p > 0 and let a, b ∈ F. Prove that (a + b)^p = a^p + b^p, or, more generally, (a + b)^{p^n} = a^{p^n} + b^{p^n} for all n ∈ ℕ. [H]
2.81 Let p be a prime, n ∈ ℕ and q := p^n. Prove that:
  1. If f ∈ 𝔽_p[X], then f(X^p) = f(X)^p.

  2. If f ∈ 𝔽_q[X], then f(X^p) = g(X)^p for some g ∈ 𝔽_q[X].

2.82 Let p be a prime, n, m ∈ ℕ and q := p^n. Let F ⊆ K be an extension of finite fields with #F = q and #K = q^m. Show that K is the splitting field of X^{q^m} – X over F. [H]
2.83Write the addition and multiplication tables of (some representations of) the fields and . Use these tables to find a primitive element in each of these fields and a normal element in (over ).
2.84 Let K be a field (not necessarily finite or of positive characteristic).
  1. Let f ∈ K[X] be of degree 2 or 3. Prove that f is reducible in K[X] if and only if f has a root in K. Deduce that X² + X + 1 and X³ + X + 1 are irreducible in 𝔽₂[X].

  2. Let f ∈ K[X] be of degree d ≥ 0 with f(0) ≠ 0. The opposite of f is the polynomial f^op(X) := X^d f(1/X). Show that f(X) is irreducible in K[X] if and only if f^op(X) is irreducible in K[X]. Deduce that X³ + X² + 1 is irreducible in 𝔽₂[X].

2.85 In this exercise, one studies the arithmetic in the finite field 𝔽₁₂₅.
  1. Show that the polynomial f(X) := X³ + X + 1 is irreducible in 𝔽₅[X].

  2. Let us represent 𝔽₁₂₅ as 𝔽₅[X]/⟨f(X)⟩. Call α the residue class of X and consider the elements a := 3α² + 2α + 1 and b := 2α² + 3 in 𝔽₁₂₅. Compute ab⁻¹ in this representation of 𝔽₁₂₅. You should compute the canonical representative of ab⁻¹ in 𝔽₁₂₅, that is, a polynomial in α of degree < 3 with coefficients reduced modulo 5.

2.86 Let F ⊆ K ⊆ L be finite extensions of finite fields with [L : K] = s. Let α, β ∈ K and γ ∈ L. Prove the following assertions:
  1. TrK|F(α + β) = TrK|F(α) + TrK|F (β) and NK|F (αβ) = NK|F (α) NK|F (β).

  2. TrL|F (α) = s TrK|F (α) and NL|F (α) = NK|F (α)^s.

  3. Transitivity of trace and norm

    TrL|F (γ) = TrK|F (TrL|K(γ)) and NL|F (γ) = NK|F (NL|K (γ)).

2.87 Let K ⊆ L be a finite extension of finite fields. In this exercise, we treat both K and L as vector spaces over K. Show that:
  1. TrL|K is a surjective linear transformation L → K.

  2. All the linear transformations L → K are given by T_α : L → K, β ↦ TrL|K(αβ), where α ∈ L. (In this notation, TrL|K = T₁.) Moreover, for distinct elements α, α′ ∈ L, the linear transformations T_α and T_{α′} are distinct.

2.88 Let K and L be as in Exercise 2.87 with q := #K, and let β ∈ L. Show that TrL|K(β) = 0 if and only if β = γ^q – γ for some γ ∈ L.
2.89 Let K and L be as in Exercise 2.87 with m := [L : K]. Two K-bases (β_0, . . . , β_{m–1}) and (γ_0, . . . , γ_{m–1}) of L are called dual or complementary, if TrL|K(β_iγ_j) = δ_{ij}.[10] Show that every K-basis of L has a unique dual basis.

[10] The Kronecker delta δ on an index set I (finite or infinite) is defined for i, j ∈ I as: δ_{ij} := 1 if i = j, and δ_{ij} := 0 if i ≠ j.

2.90 Prove that every finite extension of finite fields is Galois. [H]
2.91 For the extension 𝔽_q ⊆ 𝔽_{q^m}, consider the map Φ : 𝔽_{q^m} → 𝔽_{q^m}, α ↦ α^q.
  1. Show that Φ is an 𝔽_q-automorphism of 𝔽_{q^m}. Φ is called the Frobenius automorphism of 𝔽_{q^m} over 𝔽_q.

  2. Show that Aut_{𝔽_q} 𝔽_{q^m} is cyclic of order m with Φ as a generator. [H]

2.92 Let f ∈ 𝔽_q[X] be irreducible with deg f = d. Consider the extension 𝔽_q ⊆ 𝔽_{q^m} and let r := gcd(d, m).
  1. Show that f is irreducible in 𝔽_{q^m}[X] if and only if r = 1. [H]

  2. More generally, show that f factors in 𝔽_{q^m}[X] into a product of r irreducible polynomials each of degree d/r.

2.93 Consider the representation of 𝔽₈ in Example 2.19. Construct the minimal polynomials over 𝔽₂ of the elements of 𝔽₈. [H]
2.94 Show that the number of (ordered) 𝔽_q-bases of 𝔽_{q^m} is

(q^m – 1)(q^m – q)(q^m – q²) · · · (q^m – q^{m–1}).

*2.10. Affine and Projective Curves

In this section, we introduce some elementary concepts from algebraic geometry that facilitate the treatment of elliptic and hyperelliptic curves in the next two sections. We concentrate only on plane curves, because these are the only curves we need in this book. Throughout this section, K denotes a field (finite or infinite) and K̄ the algebraic closure of K.

2.10.1. Plane Curves

The solution set of a polynomial equation f(X, Y) = 0 is one of the central objects of study in algebraic geometry. For example, we know that in ℝ² the equation X² + Y² – 1 = 0 represents a circle with centre (0, 0) and radius 1. When we pass to an arbitrary field, it is often not possible to visualize such plots, but it still makes sense to talk about the set of solutions of such an equation. For example, the solutions of the above circle equation in 𝔽₃² are the four discrete points (0, 1), (0, 2), (1, 0) and (2, 0). (This solution set does not really look round.)
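The four points can be found by direct enumeration; a quick Python check (our own, for illustration):

```python
# F_3-rational points on the 'circle' X^2 + Y^2 - 1 = 0.
p = 3
points = [(x, y) for x in range(p) for y in range(p)
          if (x * x + y * y - 1) % p == 0]
assert points == [(0, 1), (0, 2), (1, 0), (2, 0)]
```

Exhaustive enumeration is feasible only for tiny fields, but it makes the notion of a K-rational point concrete before the formal definitions below.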

One can generalize this study by considering polynomials in n indeterminates and by investigating the simultaneous solutions of m polynomials. We, however, do not intend to be so general here and concentrate only on curves defined by a single polynomial equation in two indeterminates.

Definition 2.61.

For n ∈ ℕ, the n-dimensional affine space 𝔸ⁿ(K) over K is defined to be the set consisting of all n-tuples (x_1, . . . , x_n) with each x_i ∈ K. For n = 2, the affine space 𝔸²(K) is also called the affine plane over K. For a point P = (x_1, . . . , x_n) ∈ 𝔸ⁿ(K), the elements x_1, . . . , x_n are called the affine coordinates of P. The affine space over the closure K̄ is often abbreviated as 𝔸ⁿ, when the field K is understood from the context.

𝔸ⁿ(K) is an n-dimensional vector space over K. For example, the affine plane 𝔸²(ℝ) can be identified with the conventional X-Y plane.

Definition 2.62.

An affine plane (algebraic) curve C over K is defined by a polynomial f ∈ K[X, Y] and is written as C : f(X, Y) = 0. The set C(K) of K-rational points on an affine plane curve C : f(X, Y) = 0 is the set of all points (x, y) ∈ 𝔸²(K) satisfying f(x, y) = 0.

K-rational points on a plane curve are precisely the solutions of the defining polynomial equation. Standard examples of affine plane curves include the straight lines given by aX + bY + c = 0, a, b, c ∈ K, a and b not both 0, and the conic sections (circles, ellipses, parabolas and hyperbolas) given by aX² + bXY + cY² + dX + eY + f = 0, a, b, c, d, e, f ∈ K, with at least one of a, b, c non-zero. For K = ℝ, the set of K-rational points can be drawn as the graph of the polynomial equation, whereas for an arbitrary field K (in particular, for finite fields) such drawings make little or no sense. However, it is often helpful to visualize curves as curves over ℝ (also called real curves) and then generalize the situation to an arbitrary field K.

The number ∞ is not treated as a real number (or an integer or a natural number). But it is often helpful to extend ℝ by including two points that are infinitely far away from the origin, one in each direction. This gives us the so-called extended real line ℝ ∪ {–∞, +∞}. An immediate advantage of such a completion of ℝ is that every monotone sequence converges in it. But for studying the roots of polynomial equations it is helpful to add only a single point at infinity to ℝ in order to get what is called the projective line ℙ¹(ℝ) over ℝ. Similarly, if we start with the affine plane ℝ² and add a point at infinity for each slope of the straight lines Y = aX + b and one more for the vertical lines X = c, we get the so-called projective plane ℙ²(ℝ) over ℝ. We call the line passing through all the points at infinity in ℙ²(ℝ) the line at infinity. An immediate benefit of passing from ℝ² to ℙ²(ℝ) is that in ℙ²(ℝ) any two distinct lines (parallel or not in ℝ²) meet at exactly one point, and through any two distinct points of ℙ²(ℝ) passes a unique line.

Now it is time to replace ℝ by an arbitrary field K and to rephrase our definitions in such a way that it continues to make sense to talk about the points and the line at infinity, even when K itself contains only finitely many elements.

Definition 2.63.

Let n ∈ ℕ. Define the relation ~ on the ‘punctured’ (n + 1)-dimensional affine space 𝔸ⁿ⁺¹(K) \ {(0, . . . , 0)} over K by (x_0, . . . , x_n) ~ (y_0, . . . , y_n) if and only if there exists a λ ∈ K* such that y_i = λx_i for all i = 0, . . . , n. It is easy to see that ~ is an equivalence relation on 𝔸ⁿ⁺¹(K) \ {(0, . . . , 0)}. The set ℙⁿ(K) of all equivalence classes of ~ is called the n-dimensional projective space over K. In particular, ℙ²(K) is called the projective plane over K. A point P = [x_0, . . . , x_n] ∈ ℙⁿ(K) is the equivalence class of a point (x_0, . . . , x_n) ∈ 𝔸ⁿ⁺¹(K) \ {(0, . . . , 0)}. The elements x_0, . . . , x_n constitute a set of homogeneous coordinates for P.

It is evident that ℙⁿ(K) can be identified with the set of all 1-dimensional vector subspaces (that is, lines through the origin) of the affine space 𝔸ⁿ⁺¹(K). To argue that this formal definition tallies with the intuitive notion for n = 2 and K = ℝ, consider the affine 3-space 𝔸³(ℝ) referred to the coordinates X, Y, Z. Look at the family of planes ε_λ : Z = λ, parallel to the X-Y plane. (ε_0 is the X-Y plane itself.) First take a non-zero value of λ, say λ = 1. Every line in 𝔸³(ℝ) passing through the origin and not parallel to the X-Y plane meets ε_1 at exactly one point. Conversely, a unique such line passes through each point on ε_1 and the origin. In this way, we associate points of ℙ²(ℝ) with points on ε_1. These are all the finite points of ℙ²(ℝ). On the other hand, the lines passing through the origin and lying in the X-Y plane (ε_0 : Z = 0) do not meet ε_1 and correspond to the points at infinity of ℙ²(ℝ).

In the last paragraph, we obtained the canonical embedding of the affine plane ℝ² in ℙ²(ℝ) by setting Z = 1. By definition, ℙ²(ℝ) is symmetric in X, Y and Z. This means that we can as well set X = 1 or Y = 1 and see that there are other embeddings of ℝ² in ℙ²(ℝ). This observation often proves to be useful (for example, see Definition 2.66).

Now that we have passed from the affine plane to the projective plane, we should be able to carry (affine) plane curves to the projective plane. For this, we need some definitions.

Definition 2.64.

Let R denote the polynomial ring K[X_0, X_1, . . . , X_n] over a field K. A monomial of R is an element of R of the form X_0^{α_0} X_1^{α_1} · · · X_n^{α_n}, α_i ≥ 0. A term in R is a monomial multiplied by an element c ∈ K*. Any polynomial f ∈ R is a sum of finitely many non-zero terms. The degree of a monomial X_0^{α_0} X_1^{α_1} · · · X_n^{α_n} (or of a term c X_0^{α_0} X_1^{α_1} · · · X_n^{α_n}) is defined as α_0 + α_1 + · · · + α_n. The degree of a non-zero polynomial f ∈ R, denoted deg f, is defined to be the maximum of the degrees of its non-zero terms. The degree of the zero polynomial is taken to be –∞. A non-zero polynomial is said to be homogeneous of degree d ≥ 0, if all of its non-zero terms have degree d. The zero polynomial is said to be homogeneous of any degree.

Let C : f(X, Y) = 0 be an affine plane curve over a field K defined by a non-zero polynomial f ∈ K[X, Y] and let d := deg f. Then f^(h)(X, Y, Z) := Z^d f(X/Z, Y/Z) is a homogeneous polynomial of degree d in the polynomial ring K[X, Y, Z]. The polynomial f^(h) is called the homogenization of f. Putting Z = 1 in f^(h)(X, Y, Z) gives back the original polynomial f(X, Y), that is, f^(h)(X, Y, 1) = f(X, Y). Therefore, f is called the dehomogenization of the homogeneous polynomial f^(h). The homogenization (and dehomogenization) of the zero polynomial is taken to be the zero polynomial.
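Homogenization is purely mechanical; the Python sketch below (our own dict-of-monomials representation, not the book's) homogenizes the circle polynomial and checks that setting Z = 1 recovers it:

```python
# Homogenize f(X, Y) of degree d into Z^d * f(X/Z, Y/Z).
# A polynomial is stored as a dict mapping exponent tuples to coefficients.

def homogenize(f):
    d = max(i + j for (i, j) in f)                     # total degree of f
    return {(i, j, d - i - j): c for (i, j), c in f.items()}

def dehomogenize(fh):
    """Set Z = 1 (no exponent collisions occur for a homogenization)."""
    return {(i, j): c for (i, j, k), c in fh.items()}

f = {(2, 0): 1, (0, 2): 1, (0, 0): -1}                 # X^2 + Y^2 - 1
fh = homogenize(f)
assert fh == {(2, 0, 0): 1, (0, 2, 0): 1, (0, 0, 2): -1}   # X^2 + Y^2 - Z^2
assert dehomogenize(fh) == f
```

Each monomial of degree e < d simply picks up the factor Z^{d–e}, which is all the formula Z^d f(X/Z, Y/Z) does after clearing denominators.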

Take [x, y, z] ∈ ℙ²(K) and λ ∈ K*. By definition, [x, y, z] = [λx, λy, λz]. Since f^(h)(λx, λy, λz) = λ^d f^(h)(x, y, z), we have f^(h)(λx, λy, λz) = 0 if and only if f^(h)(x, y, z) = 0, so it makes sense to talk about the zeros of the homogeneous polynomial f^(h) in the projective plane ℙ²(K). This motivates us to define projective plane curves:

Definition 2.65.

A projective plane curve C over K is defined by a homogeneous polynomial h ∈ K[X, Y, Z] and is written as C : h(X, Y, Z) = 0. The set C(K) of K-rational points on a projective plane curve C : h(X, Y, Z) = 0 is the set of all points [x, y, z] ∈ ℙ²(K) such that h(x, y, z) = 0.

Let C : f(X, Y) = 0 be an affine plane curve. The projective plane curve defined by f^(h)(X, Y, Z) is, by an abuse of notation, denoted also by C. The zeros of the affine curve C : f(X, Y) = 0 in 𝔸²(K) are in one-to-one correspondence with the finite zeros of C : f^(h)(X, Y, Z) = 0 in ℙ²(K) (that is, zeros with z ≠ 0, normalized to z = 1). The projective curve contains some more point(s), namely those at infinity, which can be obtained by putting Z = 0 in f^(h)(X, Y, Z). Passage from the affine plane to the projective plane is just that: a systematic inclusion of the points at infinity.

It is often customary to write an affine plane curve as C : f(X, Y) = g(X, Y) and a projective plane curve as C : f(h)(X, Y, Z) = g(h)(X, Y, Z) with f(h) and g(h) of the same degree. The former is the same as the curve C : fg = 0, and the latter the same as C : f(h)g(h) = 0.

A homogeneous polynomial f(X, Y, Z) ∈ K[X, Y, Z] can be viewed as the homogenization of any of the polynomials

fZ(X, Y) = f(X, Y, 1), fY (X, Z) = f(X, 1, Z) and fX(Y, Z) = f(1, Y, Z).

Consider a point P = [a, b, c] on the projective curve C : f(X, Y, Z) = 0. Since a, b and c are not all 0, P corresponds to a finite point on at least one of the affine curves fX = 0, fY = 0 and fZ = 0.

2.10.2. Polynomial and Rational Functions on Plane Curves

Throughout the rest of Section 2.10 we make the following assumption:

Assumption 2.1.

K is an algebraically closed field, that is, K equals its algebraic closure.

Although many of the results we state now are valid for fields that are not algebraically closed, it is convenient to make this assumption in order to avoid unnecessary complications.

Let C : f(X, Y) = 0 be a curve defined over K. Henceforth we assume that the polynomial f(X, Y) is irreducible over K. Though we write the affine equation for the curve for notational simplicity, we usually work with the set C(K) of the K-rational points on the corresponding projective curve. We refer to the solutions of C in the affine plane as the finite points on the curve.

Definition 2.66.

Let P = [a, b, c] be a point on a curve C defined over K. We call P a smooth or regular or non-singular point of C, if P satisfies the following conditions.

  1. If P is a finite point (that is, if c ≠ 0), then P is called a smooth point on C, if the partial derivatives ∂f/∂X and ∂f/∂Y do not vanish simultaneously at (a/c, b/c).

  2. If P is a point at infinity (that is, if c = 0), then we must have a ≠ 0 or b ≠ 0. Assume a ≠ 0. (The other case can be treated similarly.) Consider the polynomial g(Y, Z) := f(h)(1, Y, Z), where f(h) is the homogenization of f. Then P = [1, b/a, 0] corresponds to the finite point (b/a, 0) on the curve D : g(Y, Z) = 0. P is called a smooth point on C, if (b/a, 0) is a smooth point on D, that is, if ∂g/∂Y and ∂g/∂Z do not vanish simultaneously at (b/a, 0).

A non-smooth point on C is also called non-regular or singular. C is called smooth or regular or non-singular, if all points (finite and infinite) on C are smooth.
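Over a finite field, condition 1 can be brute-forced. The sketch below (the helper name is ours, and it examines only the F7-rational finite points, whereas the definition is stated over an algebraically closed field) finds the singular point of Y2 = X3 at the origin and detects no finite singular point on Y2 = X3 – X:

```python
p = 7
def singular_finite_points(f, fx, fy):
    """Finite points of f(X, Y) = 0 over F_p where both partials vanish."""
    return [(x, y) for x in range(p) for y in range(p)
            if f(x, y) % p == 0 and fx(x, y) % p == 0 and fy(x, y) % p == 0]

# C1 : Y^2 - X^3 = 0 has a singular point (a cusp) at the origin.
c1 = singular_finite_points(lambda x, y: y*y - x**3,
                            lambda x, y: -3*x*x,
                            lambda x, y: 2*y)
# C2 : Y^2 - X^3 + X = 0 is smooth at all finite points.
c2 = singular_finite_points(lambda x, y: y*y - x**3 + x,
                            lambda x, y: -3*x*x + 1,
                            lambda x, y: 2*y)
assert c1 == [(0, 0)] and c2 == []
```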

Now we define polynomial functions on C. For a moment, we concentrate on the affine curve, that is, only on the finite points of C. Let g, h ∈ K[X, Y] with g ≡ h (mod f) (that is, f | (g – h)). Since for any point P on C we have f(P) = 0, it follows that g(P) = h(P). This motivates us to define the following.

Definition 2.67.

The ring K[X, Y]/〈f〉 is called the affine coordinate ring of C and is denoted by K[C]. Elements of K[C] are called polynomial functions on C. If we denote by x and y the residue classes of X and Y respectively in K[C], then a polynomial function on C is given by a polynomial g(x, y) ∈ K[x, y].[11] By our assumption, f is an irreducible polynomial; so 〈f〉 is a prime ideal of K[X, Y], that is, the coordinate ring K[C] is an integral domain.

[11] Recall from Section 2.7 that K[x, y] is the K-algebra generated by x and y. It is not a polynomial algebra (in general).

The quotient field (Exercise 2.34) of K[C] is called the function field of C and is denoted by K(C). An element of K(C) is of the form g(x, y)/h(x, y) with g(x, y), h(x, y) ∈ K[C] and h(x, y) ≠ 0 (that is, h(X, Y) ∉ 〈f〉), and is called a rational function on C.

By definition, two rational functions g1(x, y)/h1(x, y) and g2(x, y)/h2(x, y) are equal if and only if g1(x, y)h2(x, y) – g2(x, y)h1(x, y) = 0 in K[C] or, equivalently, if and only if f | (g1(X, Y)h2(X, Y) – g2(X, Y)h1(X, Y)) in K[X, Y]. We define addition and multiplication of rational functions by the usual rules (Exercise 2.34).

Definition 2.68.

Let P = (a, b) be a finite point on the curve C. Given a polynomial function g(x, y) ∈ K[C], the value of g at P is defined to be g(a, b). If r ∈ K(C) is a rational function, then r is said to be defined at P, if r has a representation r = g/h, g, h ∈ K[C], with h(P) ≠ 0. In that case, we define the value of r at P to be r(P) := g(P)/h(P). If r is not defined at P, it is customary to write r(P) = ∞.

By definition, K[C] and K(C) are collections of equivalence classes. However, the value of a polynomial or a rational function on C is independent of the representatives of the equivalence classes and is, therefore, a well-defined concept.

The above definitions can be extended to the corresponding projective curve C : f(h)(X, Y, Z) = 0. By Exercise 2.96(e), the polynomial f(h) is irreducible, since we assumed f to be so.

Definition 2.69.

The function field (denoted again by K(C)) of the projective curve C is the set of quotients (called rational functions) of the form g(X, Y, Z)/h(X, Y, Z), where g, h ∈ K[X, Y, Z] are homogeneous of the same degree and h ∉ 〈f(h)〉. Two rational functions g1/h1 and g2/h2 are equal if and only if g1h2 – g2h1 ∈ 〈f(h)〉.

A rational function r ∈ K(C) is said to be defined at a point P = [a, b, c] on C, if r has a representation g/h with h(a, b, c) ≠ 0. In that case, we define r(P) := g(a, b, c)/h(a, b, c). Since g and h are homogeneous and of the same degree, the value r(P) is independent of the choice of the projective coordinates of P (Exercise 2.95). If r is not defined at P, we write r(P) = ∞.

One can define polynomial functions on a projective curve (as we did for affine curves), but it makes no sense to talk about the value of such a polynomial function at a point P on the curve, because this value depends on the choice of the homogeneous coordinates of P (Exercise 2.95). This problem is eliminated for a rational function g/h by assuming g and h to be of the same degree.

Definition 2.70.

Let C be a projective plane curve, r be a non-zero rational function and P a point on C. P is called a zero of r if r(P) = 0, and a pole of r if r(P) = ∞.

Now we define the multiplicities of zeros and poles of a rational function or, more generally, the order of any point on a projective plane curve. This is based on the following result, the proof of which is long and difficult, and is omitted.

Theorem 2.41.

Let C be a projective plane curve defined by an irreducible polynomial over K and P a smooth point on C. Then there exists a rational function uP ∈ K(C) (depending on P) with the following properties:

  1. uP (P) = 0.

  2. For any non-zero rational function r ∈ K(C), there exist an integer d and a rational function s ∈ K(C) having neither a zero nor a pole at P such that r = uP^d s. The integer d does not depend on the choice of uP.

Definition 2.71.

The function uP of the last theorem is called a uniformizing variable or a uniformizing parameter or simply a uniformizer of C at P. For any non-zero rational function r ∈ K(C), the integer d with r = uP^d s (s having neither a zero nor a pole at P) is called the order of r at P and is denoted by ordP(r).

The connection of poles and zeros with orders is established by the following theorem, which we again state without proof.

Theorem 2.42.

P is neither a pole nor a zero of r if and only if ordP(r) = 0. P is a zero of r if and only if ordP(r) > 0. P is a pole of r if and only if ordP(r) < 0.

If P is a zero (resp. a pole) of r, the integer ordP(r) (resp. – ordP(r)) is called the multiplicity of the zero (resp. pole) P.

Theorem 2.43.

Let r be a non-zero rational function on the projective plane curve C defined over K. Then r has only finitely many poles and zeros. Furthermore, ΣP ordP(r) = 0, the sum being taken over all points P on C.

This is one of the theorems that demand K to be algebraically closed. More explicitly, if K is not algebraically closed, any rational function continues to have only finitely many zeros and poles, but the sum of the orders of r at these points is not necessarily equal to 0. Also note that this sum, if taken over only the finite points of C, need not be 0, even when K is algebraically closed.

2.10.3. Maps Between Plane Curves

Now that we know how to define and evaluate rational functions on a curve, we are in a position to define rational maps between two curves. Let C1 : f1(X, Y, Z) = 0 and C2 : f2(X, Y, Z) = 0 be two projective plane curves defined over K by irreducible homogeneous polynomials f1, f2 ∈ K[X, Y, Z].

Definition 2.72.

A rational map φ : C1 → C2 (defined over K) is given by rational functions r1, r2, r3 in K(C1) such that for each point P ∈ C1(K) at which all of r1, r2 and r3 are defined, the point [r1(P), r2(P), r3(P)] lies on C2(K). One often uses the notation φ = [r1, r2, r3].

This, however, is not the complete story. A more precise characterization of a rational map is as follows:

A rational map φ = [r1, r2, r3] : C1 → C2 is said to be defined at P ∈ C1(K), if there exists a rational function s ∈ K(C1) (depending on P) such that sr1, sr2 and sr3 are all defined at P, the values (sr1)(P), (sr2)(P) and (sr3)(P) are not all zero, and φ(P) = [(sr1)(P), (sr2)(P), (sr3)(P)]. A rational map which is defined at every point of C1(K) is called a morphism.

The curves C1 and C2 are said to be isomorphic (denoted C1 ≅ C2), if there exist morphisms φ : C1 → C2 and ψ : C2 → C1 such that ψ ∘ φ and φ ∘ ψ are the identity maps on C1(K) and C2(K) respectively.

Isomorphism is an equivalence relation on the set of all projective plane curves defined over K. Since two isomorphic curves share many common algebraic and geometric properties, it is of interest in algebraic geometry to study the equivalence classes (rather than the individual curves). If C1 ≅ C2 and C2 has a simpler representation than C1, then studying the properties of C2 makes our job simpler and at the same time reveals all the common properties of C1. (See Section 2.11 for an example.)

**2.10.4. Divisors on Plane Curves

Let a be a symbol and n a positive integer. We represent by na the formal sum a + · · · + a (n times). We also define 0a := 0 and –na := n(–a), where the symbol –a satisfies a + (–a) = (–a) + a = 0. For n1, n2 ∈ Z, we define n1a + n2a := (n1 + n2)a. The set {na | n ∈ Z} under these definitions becomes an Abelian group. If we are given two symbols a, b, we can analogously define formal sums na + mb, n, m ∈ Z, and the sum of formal sums as (n1a + m1b) + (n2a + m2b) := (n1 + n2)a + (m1 + m2)b. With these definitions the set {na + mb | n, m ∈ Z} becomes an Abelian group. These constructions can be generalized as follows:

Definition 2.73.

Given a set (not necessarily finite) of symbols ai, i ∈ I, the set of formal sums of the form Σi niai with ni ∈ Z, where ni = 0 except for finitely many i ∈ I, is an Abelian group with the addition formula Σi niai + Σi miai = Σi (ni + mi)ai. This group is called the free Abelian group generated by the ai, i ∈ I.

Now let the ai be the K-rational points on a projective plane curve C defined over K. For notational convenience, we represent by [P] the symbol corresponding to the point P on C. This removes confusion in connection with elliptic curves C (see Section 2.11), for which we intend to make a distinction between P + Q and [P] + [Q] for two points P, Q ∈ C(K). The former sum is again a point on C, whereas the latter is never (the symbol corresponding to) a point on C.

Definition 2.74.

A formal sum D = ΣP nP[P], nP ∈ Z, where nP = 0 except for finitely many P ∈ C(K), is called a divisor on C. The free Abelian group generated by the symbols [P] for all the points P ∈ C(K) is called the group of divisors of C and is denoted by DivK(C) or simply by Div(C), when K is implicit in the context.

Let D = ΣP nP[P] be a divisor. The support of D is defined to be the set {P ∈ C(K) | nP ≠ 0} and is denoted by Supp D.

The degree of D is defined as the integer ΣP nP and is denoted as deg D. The subset {D ∈ Div(C) | deg D = 0} of Div(C) is clearly a subgroup of Div(C). We denote this subgroup by Div0(C).
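Divisor arithmetic is pure bookkeeping on the coefficients nP. A sketch (the points are hypothetical labels rather than actual curve points, and the helper names are ours):

```python
from collections import Counter

# A divisor is stored as a Counter mapping a point to its coefficient n_P.
def add_div(d1, d2):
    s = Counter(d1)
    s.update(d2)                       # coefficients add pointwise
    return Counter({pt: n for pt, n in s.items() if n != 0})

def deg(d):
    return sum(d.values())             # deg D = sum of the n_P

def support(d):
    return {pt for pt, n in d.items() if n != 0}

D1 = Counter({'P': 2, 'Q': -1})        # 2[P] - [Q]
D2 = Counter({'Q': 1, 'R': -2})        # [Q] - 2[R]
D = add_div(D1, D2)                    # 2[P] - 2[R]
assert deg(D) == 0                     # so D lies in Div0
assert support(D) == {'P', 'R'}
```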

Now we define divisors of rational functions on C. Henceforth we assume that C is smooth (that is, smooth at all K-rational points on C).

Definition 2.75.

The divisor of a non-zero rational function r ∈ K(C) is defined to be the formal sum Div(r) := ΣP ordP(r)[P], where ordP(r) is the order of r at P (Definition 2.71). By Theorem 2.43, Div(r) is indeed a divisor and, in fact, Div(r) ∈ Div0(C).

A divisor D ∈ Div(C) is called principal, if D = Div(r) for some non-zero rational function r ∈ K(C). We have Div(rr′) = Div(r) + Div(r′) for any non-zero rational functions r, r′ ∈ K(C). It follows that the set of all principal divisors on C is a subgroup of Div(C) (and of Div0(C) as well). We denote this subgroup by PrinK(C) or simply by Prin(C). The quotient group Div(C)/Prin(C) is called the divisor class group or the Picard group of C and is denoted by PicK(C) or in short by Pic(C). On the other hand, the quotient Div0(C)/Prin(C) is denoted by Pic0K(C) or Pic0(C) and is called the Jacobian of C. Instead of Pic0(C), we also use the notation JK(C) or J(C) for the Jacobian.

Though the Jacobian is defined for an arbitrary smooth curve C (defined by an irreducible polynomial), it is for a special class of curves called hyperelliptic curves that it is particularly easy to represent and do arithmetic in the Jacobian. This gives us yet another family of groups on which cryptographic protocols can be built.

If K is not algebraically closed, the divisor Div(r) of a non-zero rational function r ∈ K(C) need not have degree 0. This means that in that case the Jacobian cannot be defined in the above manner. However, since C is also a curve defined over the algebraic closure of K, we can define the Jacobian over the algebraic closure as above and call a particular subgroup of it the Jacobian of C over K. We defer this discussion until Section 2.12.

Exercise Set 2.10

In this exercise set, we do not assume (unless otherwise stated) that K is necessarily algebraically closed.

2.95
  1. For homogeneous polynomials f1, f2 ∈ K[X1, . . . , Xn] of respective degrees d1 and d2, prove the following assertions:

    1. If d1 = d2, then f1 ± f2 are homogeneous polynomials of degree d1.

    2. The polynomial f1f2 is homogeneous of degree d1 + d2. Conversely, if f1f2 is homogeneous, then f1 and f2 are also homogeneous.

  2. A polynomial f ∈ K[X1, . . . , Xn] is homogeneous of degree d if and only if it satisfies f(λX1, . . . , λXn) = λ^d f(X1, . . . , Xn) for every non-zero λ in the algebraic closure of K.

2.96 In this exercise, we generalize the notion of homogenization and dehomogenization of polynomials. Let K[X1, . . . , Xn] denote the polynomial ring in n indeterminates. Introducing another indeterminate X0, we define the homogenization of a non-zero polynomial f ∈ K[X1, . . . , Xn] of degree d as

f(h)(X0, X1, . . . , Xn) := X0^d f(X1/X0, . . . , Xn/X0).
Prove the following assertions.

  1. f(h) is an element of K[X0, X1, . . . , Xn] and is homogeneous of degree d.

  2. f(h)(1, X1, . . . , Xn) = f(X1, . . . , Xn).

  3. If deg f = d ≥ 0 and fd is the sum of all non-zero terms of degree d in f, then we have f(h)(0, X1, . . . , Xn) = fd(X1, . . . , Xn).

  4. For f, g ∈ K[X1, . . . , Xn], (fg)(h) = f(h)g(h). Moreover, if g|f, then g(h)|f(h) and (f/g)(h) = f(h)/g(h). Under what condition(s) is (f + g)(h) = f(h) + g(h)?

  5. f is irreducible if and only if f(h) is irreducible.

2.97 Let C : f(X, Y) = 0 be an affine plane curve defined by a non-zero polynomial f ∈ K[X, Y] and C : f(h)(X, Y, Z) = 0 the corresponding projective plane curve. Let d := deg f = deg f(h) and let fd be the sum of the non-zero terms of f of degree d. Show that:
  1. f(h)(X, Y, 1) = f(X, Y) and f(h)(X, Y, 0) = fd(X, Y).

  2. (x, y) is a K-rational point of the affine curve if and only if [x, y, 1] is a K-rational point of the projective curve. More generally, let λ ∈ K be non-zero. The point (x/λ, y/λ) is a K-rational solution of f if and only if [x, y, λ] is a K-rational solution of f(h).

  3. The solutions of f at infinity are obtained by solving f(h)(X, Y, 0) = fd(X, Y) = 0. Conclude that the curve C can have at most d points at infinity.

  4. For a, b ∈ K, each of the curves Y – aX = b and X – aY = b (straight lines), and Y – X2 = 0 and X – Y2 = 0 (parabolas) contains only one point at infinity. The hyperbola XY – 1 = 0 contains two points at infinity. How many points at infinity does the hyperbola X2 – Y2 – 1 = 0 contain? The circle X2 + Y2 – 1 = 0?

  5. For a1, a2, a3, a4, a6 ∈ K, the elliptic curve Y2 + a1XY + a3Y = X3 + a2X2 + a4X + a6 contains only one point at infinity.

  6. Let g be a positive integer and u(X), v(X) ∈ K[X] with deg u ≤ g, deg v = 2g + 1 and v monic. Show that the hyperelliptic curve Y2 + u(X)Y = v(X) has only one point at infinity.

2.98Show that the defining polynomial of the elliptic curve in Exercise 2.97(e) is irreducible. Prove the same for the hyperelliptic curve of Exercise 2.97(f). [H]
2.99 Show that for an ideal a of K[X1, . . . , Xn] the following two conditions are equivalent:
  1. a is generated by a set of homogeneous polynomials.

  2. If f = f0 + f1 + · · · + fd ∈ a, where fi is the sum of the non-zero terms of degree i in f, then fi ∈ a for all i = 0, . . . , d. (The polynomials fi are called the homogeneous components of f.)

An ideal satisfying the above equivalent conditions is called a homogeneous ideal. Construct an example to demonstrate that not all ideals of K[X1, . . . , Xn] are homogeneous.

*2.11. Elliptic Curves

The mathematics of elliptic curves is vast and complicated. A reasonably complete understanding of elliptic curves would require a book as large as this one. So we plan to be rather informal while talking about elliptic curves and about their generalizations called hyperelliptic curves. Interested readers can go through the books suggested at the end of this chapter to learn more about these curves. In this section, K stands for a field (finite or infinite) and K̄ for the algebraic closure of K.

2.11.1. The Weierstrass Equation

An elliptic curve E over K is a plane curve defined by the polynomial equation

Equation 2.6

E : Y2 + a1XY + a3Y = X3 + a2X2 + a4X + a6
or by the corresponding homogeneous equation

E : Y2Z + a1XYZ + a3YZ2 = X3 + a2X2Z + a4XZ2 + a6Z3.

These equations are called the Weierstrass equations for E. In order that E qualifies as an elliptic curve, we additionally require that it is smooth at all K̄-rational points (Definition 2.66).[12] Two elliptic curves defined over the field of real numbers are shown in Figure 2.1.

[12] Ellipses are not elliptic curves.

Figure 2.1. Elliptic curves over the reals: (a) Y2 = X3 – X + 1; (b) Y2 = X3 – X


E contains a single point at infinity, namely O = [0, 1, 0] (Exercise 2.97(e)). The set of K-rational points on E in the projective plane is denoted by E(K) and is the central object of study in the theory of elliptic curves. We shortly endow E(K) with a group structure, and this group is used extensively in cryptography.

Let us first see how we can simplify the equation for E. The simplification depends on the characteristic of K. Because fields of characteristic 3 are only rarely used in cryptography, we do not deal with such fields. Simplification of the Weierstrass equation is effected by suitable changes of coordinates. Only a special kind of transformation is allowed, in order to preserve the geometric and algebraic properties of an elliptic curve.

Theorem 2.44.

Two elliptic curves

E1:Y2 + a1XY + a3Y = X3 + a2X2 + a4X + a6
E2:Y2 + b1XY + b3Y = X3 + b2X2 + b4X + b6

defined over K are isomorphic (Definition 2.72) if and only if there exist u ∈ K, u ≠ 0, and r, s, t ∈ K such that the substitution of u2X + r for X and u3Y + u2sX + t for Y transforms the equation of E1 to the equation of E2. For this transformation, the coefficients bi are related to the coefficients ai as follows:

Equation 2.7


The theorem is not proved here. Formulas (2.7) can be checked by tedious calculations. A change of variables as in Theorem 2.44 is referred to as an admissible change of variables. We denote this by

(X, Y) ← (u2X + r, u3Y + u2sX + t).

The inverse transformation is also admissible and is given by

(X, Y) ← ((X – r)/u2, (Y – sX + sr – t)/u3).
Isomorphism is an equivalence relation on the set of all elliptic curves over K.

Consider the elliptic curve E over K given by Equation (2.6). If char K ≠ 2, the admissible change (X, Y) ← (X, Y – (a1/2)X – a3/2) transforms E to the form

E1 : Y2 = X3 + b2X2 + b4X + b6.

If, in addition, char K ≠ 3, the admissible change (X, Y) ← (X – b2/3, Y) transforms E1 to E2 : Y2 = X3 + aX + b. We henceforth assume that an elliptic curve over a field of characteristic ≠ 2, 3 is defined by

Equation 2.8

E : Y2 = X3 + aX + b
(instead of by the original Weierstrass Equation (2.6)).

If char K = 2, the Weierstrass equation cannot be simplified as in Equation (2.8). In this case, we consider two cases separately, namely a1 ≠ 0 and a1 = 0. In the former case, a suitable admissible change allows us to write Equation (2.6) in the simplified form

Equation 2.9

E : Y2 + XY = X3 + aX2 + b
On the other hand, if a1 = 0, then the admissible change (X, Y) ← (X + a2, Y) shows that E can be written in the form

Equation 2.10

E : Y2 + aY = X3 + bX + c
A curve defined by Equation (2.9) is called non-supersingular, whereas one defined by Equation (2.10) is called supersingular.

Now we associate two quantities with an elliptic curve. The importance of these quantities follows from the subsequent theorem. We start with the generic Weierstrass equation and later specialize to the simplified formulas.

Definition 2.76.

For the curve given by Equation (2.6), we define the following quantities:

Equation 2.11


Δ(E) is called the discriminant of the curve E, and j(E) the j-invariant of E.

For the special cases given by the simplified equations above, these quantities have more compact formulas as given in Table 2.5.

Theorem 2.45.

For the curve E defined by Equation (2.6), the following properties hold:

  1. An admissible change of variables does not alter Δ(E) and j(E).

    Table 2.5. Discriminant and j-invariant for elliptic curves
    Special case                                    Δ(E)                j(E)
    char K ≠ 2, 3 (Equation 2.8)                    –16(4a3 + 27b2)     1728(4a)3/Δ(E)
    char K = 2, non-supersingular (Equation 2.9)    b                   1/b
    char K = 2, supersingular (Equation 2.10)       a4                  0

  2. E is an elliptic curve, that is, E is smooth, if and only if Δ(E) ≠ 0. In particular, the j-invariant is defined for all elliptic curves.

  3. Let E1 and E2 be two elliptic curves defined over the field K. If E1 and E2 are isomorphic over K, then j(E1) = j(E2). Conversely, if j(E1) = j(E2), then E1 and E2 are isomorphic over K̄.

Proof

  1. Tedious calculations using Formulas (2.7) establish this claim.

  2. The polynomial f(X, Y, Z) = Y2Z + a1XYZ + a3YZ2 – X3 – a2X2Z – a4XZ2 – a6Z3 defines the curve E. One checks directly (using Definition 2.66) that E is smooth at the point at infinity O = [0, 1, 0]. Suppose that E is not smooth at the finite point (x0, y0). The admissible change (X, Y) ← (X + x0, Y + y0) does not alter the value of Δ(E) by (1). So we can assume, without loss of generality, that (x0, y0) = (0, 0). But then, with f now denoting the affine polynomial of Equation (2.6), we have f(0, 0) = –a6 = 0, ∂f/∂x(0, 0) = –a4 = 0 and ∂f/∂y(0, 0) = a3 = 0. Now it is easy to check from Equation (2.11) that Δ(E) = 0.

    Conversely, let Δ(E) = 0. For simplicity, we assume that char K ≠ 2, 3 and E is given by Equation (2.8). By Exercise 2.62, the polynomial X3 + aX + b then has multiple roots; let α ∈ K̄ be such a multiple root. But then E is not smooth at the point (α, 0).

  3. By Part (1) and Theorem 2.44, two isomorphic elliptic curves have the same j-invariant. For proving the converse, we once again assume that char K ≠ 2, 3 and E1 : Y2 = X3 + a1X + b1 and E2 : Y2 = X3 + a2X + b2 have the same j-invariant. Then we have a1^3 b2^2 = a2^3 b1^2. Now we provide an admissible change of variables of the form (X, Y) ← (u2X, u3Y), u ≠ 0, that transforms E1 to E2. Since Δ(E1) ≠ 0 and Δ(E2) ≠ 0, we take u = (b1/b2)1/6 if a1 = 0, u = (a1/a2)1/4 if b1 = 0, and u = (a1/a2)1/4 = (b1/b2)1/6 if a1b1 ≠ 0. Note that since K̄ is algebraically closed, u is defined in all the above cases.
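Part 1 can be checked numerically. The sketch below (curve parameters are hypothetical) uses the j-invariant in the convention of Table 2.5 and Example 2.22, j = 1728(4a)3/Δ(E), and verifies over F13 that the admissible change (X, Y) ← (u2X, u3Y) leaves j unchanged:

```python
p = 13
def inv(x):
    return pow(x, p - 2, p)            # inverse mod the prime p (Fermat)

def j_inv(a, b):
    # j = 1728 (4a)^3 / Delta, with Delta = -16(4a^3 + 27b^2), as in Table 2.5
    disc = (-16 * (4 * a**3 + 27 * b * b)) % p
    return (1728 * pow(4 * a % p, 3, p) * inv(disc)) % p

a1, b1, u = 2, 5, 3    # hypothetical curve Y^2 = X^3 + 2X + 5 over F_13, u = 3
# (X, Y) <- (u^2 X, u^3 Y) sends Y^2 = X^3 + aX + b to Y^2 = X^3 + (a/u^4)X + (b/u^6)
a2 = a1 * inv(pow(u, 4, p)) % p
b2 = b1 * inv(pow(u, 6, p)) % p
assert j_inv(a1, b1) == j_inv(a2, b2)  # j is invariant under the change
```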

2.11.2. The Elliptic Curve Group

Consider an elliptic curve E over a field K. We now define an operation (which is conventionally denoted by +) on the set E(K) of K-rational points on E in the projective plane. This operation provides a group structure on E(K). It is important to point out that this group is not the same as the group DivK(E) of divisors on E(K) (Definition 2.74), since the sum of points we are going to define is not formal. However, there is a connection between these two groups (see Exercise 2.125).

Definition 2.77.

Let E be the elliptic curve defined by Equation (2.6) and O = [0, 1, 0] the point at infinity on E. A binary operation + on E(K) is defined as follows:

  1. For any P ∈ E(K), we define P + O := O + P := P, that is, O serves as the additive identity.

  2. The opposite (additive inverse) of a point P ∈ E(K) is now defined: if P = O, then –P := P, and if P = (h, k) is a finite point, then –P := (h, –k – a1h – a3).

  3. For P, Q ∈ E(K), the sum P + Q is defined by the chord-and-tangent rule, which goes as follows.

    1. If Q = –P, then P + Q := O.

    2. If Q ≠ –P, we consider the line passing through P and Q (we take the tangent line at P if P = Q). Since the degree of the defining equation for E is three, this line meets the curve at exactly one other point R (counted with multiplicity). We define P + Q := –R. Figure 2.1 illustrates this case for curves over the reals.

Theorem 2.46.

The set E(K) under the operation + is an Abelian group.

No simple proof of this theorem is known. Indeed, the only group axiom that is difficult to check is associativity, that is, that (P + Q) + R = P + (Q + R) for all P, Q, R ∈ E(K). An elementary strategy would be to write explicit formulas for (P + Q) + R and P + (Q + R) (using the formulas for P + Q given below) and show that they are equal, but this process involves a lot of tedious calculations and consideration of many cases.

There are other proofs that are more elegant, but not as elementary. One possibility is to use the theory of divisors and is outlined now. It turns out that the Jacobian of E has a bijective correspondence with the set E(K) via the map which takes P ∈ E(K) to [P] – [O] (more correctly, to the equivalence class of the divisor [P] – [O] in the Jacobian). Furthermore, the point P + Q corresponds to the sum of the classes of [P] – [O] and [Q] – [O], where the addition on the left is the addition on E(K) as defined above and the addition on the right is that in the Jacobian. By definition, the Jacobian is naturally an additive Abelian group. It immediately follows that E(K) is an additive Abelian group too. (See Exercise 2.125.)

We now give the formulas for the coordinates of the points –P and P + Q on E(K). The derivation of these formulas for the general case is left to the reader (Exercise 2.102). We concentrate on the important special cases. We assume that P = (h1, k1) and Q = (h2, k2) are finite points on E(K) with Q ≠ –P, so that P + Q = (h3, k3) is also a finite point.

If char K ≠ 2, 3 and E is defined by Equation (2.8), we have –P = (h1, –k1) and

λ = (k2 – k1)/(h2 – h1) if P ≠ Q,    λ = (3h1^2 + a)/(2k1) if P = Q,
h3 = λ^2 – h1 – h2,    k3 = λ(h1 – h3) – k1.
Next, we consider char K = 2 and non-supersingular curves (Equation (2.9)). The formulas in this case are –P = (h1, h1 + k1) and

λ = (k1 + k2)/(h1 + h2), h3 = λ^2 + λ + h1 + h2 + a, k3 = λ(h1 + h3) + h3 + k1, if P ≠ Q,
λ = h1 + k1/h1, h3 = λ^2 + λ + a, k3 = h1^2 + (λ + 1)h3, if P = Q.
Finally, for supersingular curves (Equation (2.10)) with char K = 2, we have –P = (h1, k1 + a) and

λ = (k1 + k2)/(h1 + h2) if P ≠ Q,    λ = (h1^2 + b)/a if P = Q,
h3 = λ^2 + h1 + h2,    k3 = λ(h1 + h3) + k1 + a.
We denote by mP the sum P + · · · + P (m times) for a point P ∈ E(K) and for a positive integer m. We also define 0P := O and (–m)P := –(mP).
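The chord-and-tangent rule for char K ≠ 2, 3 translates directly into code. The following sketch (helper names are ours) implements the group law for E : Y2 = X3 + aX + b over Fp, p > 3, and reproduces the multiples of the point (4, 1) on the curve Y2 = X3 + X + 3 over F7 appearing in Table 2.6:

```python
O = None                                    # the point at infinity

def neg(P, p):
    return None if P is O else (P[0], (-P[1]) % p)

def add(P, Q, a, p):
    if P is O: return Q
    if Q is O: return P
    (h1, k1), (h2, k2) = P, Q
    if h1 == h2 and (k1 + k2) % p == 0:     # Q = -P
        return O
    if P == Q:                              # tangent: lambda = (3h1^2 + a)/(2k1)
        lam = (3*h1*h1 + a) * pow(2*k1, p - 2, p) % p
    else:                                   # chord:   lambda = (k2 - k1)/(h2 - h1)
        lam = (k2 - k1) * pow(h2 - h1, p - 2, p) % p
    h3 = (lam*lam - h1 - h2) % p
    return (h3, (lam*(h1 - h3) - k1) % p)

def mul(m, P, a, p):                        # mP by repeated addition
    R = O
    for _ in range(m):
        R = add(R, P, a, p)
    return R

a, p, P1 = 1, 7, (4, 1)                     # E1 : Y^2 = X^3 + X + 3 over F_7
assert mul(2, P1, a, p) == (6, 6)
assert mul(3, P1, a, p) == (5, 0)
assert mul(6, P1, a, p) is O                # P1 has order 6
```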

Example 2.22.
  1. Consider the elliptic curve

    E1 : Y2 = X3 + X + 3

    over F7. We have Δ(E1) ≡ –16(4 × 1^3 + 27 × 3^2) ≡ 3 (mod 7). Also j(E1) ≡ 1728 × 4^3 × 3^(–1) ≡ 2 (mod 7), that is, j(E1) = 2. It is easy to check that E1(F7) contains the six points P0 = O, P1 = (4, 1), P2 = (4, 6), P3 = (5, 0), P4 = (6, 1) and P5 = (6, 6). The multiples of these points are summarized in Table 2.6. It follows that the group E1(F7) is cyclic with P1 as a generator.

    Table 2.6. Multiples of points on the elliptic curve Y2 = X3 + X + 3 over F7
    P            2P      3P      4P      5P      6P    ord P
    P0 = O                                              1
    P1 = (4, 1)  (6, 6)  (5, 0)  (6, 1)  (4, 6)  O      6
    P2 = (4, 6)  (6, 1)  (5, 0)  (6, 6)  (4, 1)  O      6
    P3 = (5, 0)  O                                      2
    P4 = (6, 1)  (6, 6)  O                              3
    P5 = (6, 6)  (6, 1)  O                              3

  2. Now, consider the non-supersingular elliptic curve

    E2 : Y2 + XY = X3 + X2 + ξ

    defined over F8 := F2[T]/〈T3 + T + 1〉, where ξ := T + 〈T3 + T + 1〉. We have Δ(E2) = ξ and j(E2) = ξ–1 = ξ2 + 1. The finite points on E2 are:

    P1=(0, ξ2 + ξ),
    P2=(1, ξ2),
    P3=(1, ξ2 + 1),
    P4=(ξ, ξ2),
    P5=(ξ, ξ2 + ξ),
    P6=(ξ + 1, ξ2 + 1),
    P7=(ξ + 1, ξ2 + ξ),
    P8=(ξ2 + ξ, 1),
    P9=(ξ2 + ξ, ξ2 + ξ + 1).

    So E2(F8) contains 10 points (including O). The multiples of the points are listed in Table 2.7, which implies that E2(F8) is again cyclic.[13] The φ(10) = 4 generators of this group are P4, P5, P8 and P9.

    [13] Both 6 and 10 are square-free integers, and so the groups E1(F7) and E2(F8) must be cyclic (Exercise 2.115(a)).

    Table 2.7. Multiples of points on the elliptic curve Y2 + XY = X3 + X2 + ξ over F8
    P       2P   3P   4P   5P   6P   7P   8P   9P   10P   ord P
    P0 = O                                                  1
    P1      O                                               2
    P2      P7   P6   P3   O                                5
    P3      P6   P7   P2   O                                5
    P4      P3   P9   P6   P1   P7   P8   P2   P5   O      10
    P5      P2   P8   P7   P1   P6   P9   P3   P4   O      10
    P6      P2   P3   P7   O                                5
    P7      P3   P2   P6   O                                5
    P8      P6   P4   P2   P1   P3   P5   P7   P9   O      10
    P9      P7   P5   P3   P1   P2   P4   P6   P8   O      10
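The count of 10 points in part (2) can be confirmed by brute force. The sketch below (helper names are ours) represents F8 = F2[T]/〈T3 + T + 1〉 by 3-bit integers, with carry-less multiplication followed by reduction modulo T3 + T + 1:

```python
def f8_mul(a, b):
    """Multiply two elements of F_8, encoded as 3-bit integers."""
    r = 0
    for i in range(3):                 # carry-less (XOR) schoolbook product
        if (b >> i) & 1:
            r ^= a << i
    for i in (4, 3):                   # reduce high bits: T^3 = T + 1 (0b1011)
        if (r >> i) & 1:
            r ^= 0b1011 << (i - 3)
    return r

xi = 0b010                             # xi = T mod <T^3 + T + 1>
def on_curve(x, y):                    # y^2 + xy == x^3 + x^2 + xi (char 2)
    lhs = f8_mul(y, y) ^ f8_mul(x, y)
    rhs = f8_mul(f8_mul(x, x), x) ^ f8_mul(x, x) ^ xi
    return lhs == rhs

count = 1 + sum(on_curve(x, y) for x in range(8) for y in range(8))
assert count == 10                     # the 9 finite points plus O
```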

  3. Let us continue to represent F8 as in (2). The supersingular curve

    E3 : Y2 + Y = X3 + ξX + ξ2

    has Δ(E3) = 1 and j(E3) = 0. E3(F8) is a cyclic group with 9 points, as Table 2.8 illustrates.

Table 2.8. Multiples of points on the elliptic curve Y2 + Y = X3 + ξX + ξ2 over F8
P                           2P   3P   4P   5P   6P   7P   8P   9P   ord P
P0 = O                                                               1
P1 = (0, ξ2 + ξ)            P5   P4   P7   P8   P3   P6   P2   O     9
P2 = (0, ξ2 + ξ + 1)        P6   P3   P8   P7   P4   P5   P1   O     9
P3 = (ξ + 1, ξ)             P4   O                                   3
P4 = (ξ + 1, ξ + 1)         P3   O                                   3
P5 = (ξ2, ξ2)               P7   P3   P2   P1   P4   P8   P6   O     9
P6 = (ξ2, ξ2 + 1)           P8   P4   P1   P2   P3   P7   P5   O     9
P7 = (ξ2 + ξ, ξ2 + ξ)       P2   P4   P6   P5   P3   P1   P8   O     9
P8 = (ξ2 + ξ, ξ2 + ξ + 1)   P1   P3   P5   P6   P4   P2   P7   O     9

Definition 2.78.

Let m be a positive integer. The set of points P ∈ E(K) such that mP = O is evidently a subgroup of E(K) and is denoted by EK[m] or by E[m], if K is understood from the context. The elements of EK[m], called the m-torsion points of E, are those points of E(K) whose (additive) orders are finite and divide m.
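For char K ≠ 2, 3, a point of order 2 satisfies P = –P = (h, –k), so k = 0 and h is a root of X3 + aX + b (cf. Exercise 2.101). A sketch locating the 2-torsion of the curve E1 of Example 2.22(1) over F7:

```python
p, a, b = 7, 1, 3
# The roots of X^3 + aX + b lying in F_7; the remaining 2-torsion points,
# if any, live only over an extension field.
two_torsion = [(x, 0) for x in range(p) if (x**3 + a*x + b) % p == 0]
assert two_torsion == [(5, 0)]   # so E1(F_7)[2] = {O, (5, 0)}
```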

Multiples mP of a point P ∈ E(K) can be expressed using nice formulas.

Definition 2.79.

For an elliptic curve defined over K by the equation E : f(X, Y) = 0 and for a positive integer m, there exist polynomials θm, ωm, ψm ∈ K[X, Y] such that for any point P = (h, k) ∈ E(K) with mP ≠ O we have

mP = (θm(h, k)/ψm(h, k)2, ωm(h, k)/ψm(h, k)3).

The polynomial ψm is called the m-th division polynomial of E.

Using the addition formula one can verify the following recursive description for ψm and the expressions for θm and ωm in terms of ψm.

Lemma 2.8.

For an elliptic curve E defined by the general Weierstrass Equation (2.6) over a field K, the division polynomials ψm, m a positive integer, are recursively described as:

where di are as in Definition 2.76. The polynomials θm satisfy

for all positive integers m,

and for char K ≠ 2, one has

It follows by induction on m that these formulas really give polynomial expressions for ψm, θm and ωm for all positive integers m. For even m, the polynomial ψm is divisible by ψ2. Furthermore, the polynomials defined as

can be expressed as polynomials in x only. These univariate polynomials are easier to handle than the bivariate polynomials ψm and, by an abuse of notation, are also called division polynomials. Their degrees satisfy the inequality:

Points of E[m] can be characterized in terms of the division polynomials:

Theorem 2.47.

Let m be a positive integer and let P = (h, k) be a finite point on E. Then P ∈ E[m] if and only if ψm(h, k) = 0. Furthermore, if m > 2 and P ∉ E[2], then P ∈ E[m] if and only if the univariate division polynomial corresponding to ψm vanishes at h.

We finally define polynomials fm as follows. If char K ≠ 2, then fm is the univariate division polynomial introduced above, for all positive integers m. On the other hand, for char K = 2 and for non-supersingular curves over K, the polynomials ψm already involve x alone (Exercise 2.107), and it is customary to define fm(x) := ψm(x, y) for all positive integers m. By further abuse of notation, we also call fm the m-th division polynomial of E.

2.11.3. Elliptic Curves over Finite Fields

In this section, we take K = Fq, a finite field of cardinality q and characteristic p. We do not deal with the case p = 3. Let E be an elliptic curve defined over Fq. If p > 3, we assume that E is defined by Equation (2.8), whereas for p = 2, we assume that E is defined by Equation (2.10) or Equation (2.9), depending on whether E is supersingular or not.

Since E(Fq) is a subset of the projective plane over Fq, the cardinality #E(Fq) is finite. The next theorem shows that #E(Fq) is quite close to q.

Theorem 2.48. Hasse’s theorem

#E(Fq) = q + 1 – t for some integer t with |t| ≤ 2√q. (The integer t is called the trace of Frobenius at q.)

The implication of this theorem is that the possible cardinalities of E(Fq) lie in the rather narrow interval [q + 1 – 2√q, q + 1 + 2√q]. If q = p is a prime, then for every integer n in this interval, there is at least one curve E with #E(Fp) = n. Moreover, the values of #E(Fp) are distributed almost uniformly in this interval. However, if q is not a prime, these nice results do not continue to hold.
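Hasse's bound is easy to confirm for a small curve. A sketch (the helper name is ours) for E1 : Y2 = X3 + X + 3 over F7 from Example 2.22:

```python
def count_points(a, b, p):
    # all finite points (x, y) with y^2 = x^3 + ax + b over F_p,
    # plus the single point at infinity
    return 1 + sum(1 for x in range(p) for y in range(p)
                   if (y*y - x**3 - a*x - b) % p == 0)

p, a, b = 7, 1, 3
n = count_points(a, b, p)       # n = #E1(F_7)
t = p + 1 - n                   # trace of Frobenius
assert n == 6                   # matches Example 2.22(1)
assert t * t <= 4 * p           # Hasse: |t| <= 2 sqrt(q)
```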

Definition 2.80.

If t = 1 (that is, if #E(Fq) = q), the curve E is called anomalous. If p|t, the curve E is called supersingular, and if p∤t, then E is called non-supersingular.

Anomalous and supersingular curves are cryptographically weak, because algorithms with running time better than exponential are known for solving the so-called elliptic curve discrete logarithm problem on these curves. Determination of the order #E(Fq) gives t, from which one can easily check whether E is anomalous or supersingular. If p = 2, we have an easier check for supersingularity.

Proposition 2.35.

An elliptic curve E over a finite field of characteristic 2 is supersingular if and only if j(E) = 0 or, equivalently, if and only if a1 = 0 in Equation (2.6).

For arbitrary characteristic p, we have the following characterization.

Proposition 2.36.

An elliptic curve E over Fq is supersingular if and only if t^2 = 0, q, 2q, 3q or 4q. In particular, if char Fq ≠ 2, 3, then E is supersingular if and only if t = 0.

By Theorem 2.38, the multiplicative group of Fq is always cyclic. However, the group E(Fq) is not always cyclic, but is of a special kind. We need a few definitions to explain the structure of E(Fq). The notion of internal direct product for multiplicative groups (Exercise 2.19) can be readily applied to additive groups as follows.

Definition 2.81.

Let G be an additive group and let H1, . . . , Hr be subgroups of G. If every element of G can be written uniquely as h1 + · · · + hr with hi ∈ Hi, i = 1, . . . , r, we say that G is the (internal) direct sum of the subgroups H1, . . . , Hr and denote this as G = H1 ⊕ · · · ⊕ Hr.

Theorem 2.49. Structure theorem for finite Abelian groups

Let G be a finite additive Abelian group of cardinality #G = n. Then there exist an integer r ≥ 0 and integers ni ≥ 2 for 1 ≤ i ≤ r, such that G is the direct sum of (subgroups isomorphic to the) cyclic groups Zn1, . . . , Znr, that is, G ≅ Zn1 ⊕ · · · ⊕ Znr, where ni+1|ni for all i = 1, . . . , r – 1. Furthermore, such a decomposition is unique in the sense that if G ≅ Zm1 ⊕ · · · ⊕ Zms with integers mi ≥ 2 and mi+1|mi for i = 1, . . . , s – 1, then r = s and ni = mi for all i = 1, . . . , r. In this case, we say that G has rank r and is of type (n1, . . . , nr). By Lagrange’s theorem, each ni|n. Moreover, n = n1n2 · · · nr. G is cyclic if and only if the rank of G is 1.

Theorem 2.50. Structure theorem for E(F_q)

The elliptic curve group E(F_q) is of rank 1 or 2. If the rank is 1, then E(F_q) is cyclic; otherwise E(F_q) ≅ Z_{n1} ⊕ Z_{n2}, where n1, n2 ≥ 2 and n2|n1. In the second case, we have n2|(q – 1).

Once we know the order of the group E(F_q), it is easy to compute the order of E(F_{q^n}), as the following theorem suggests.

Theorem 2.51.

Let α, β ∈ C satisfy 1 – tX + qX2 = (1 – αX)(1 – βX). Then for any n ≥ 1 the order #E(F_{q^n}) = q^n + 1 – (α^n + β^n).
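Since α + β = t and αβ = q, the power sums s_n = α^n + β^n satisfy the integer recurrence s_n = t·s_{n–1} – q·s_{n–2} with s_0 = 2 and s_1 = t, so Theorem 2.51 needs no complex arithmetic at all. A sketch (names ours); the sample values below use Y2 = X3 + 3 with 13 points, as in Exercise 2.108 (where we take the base field to be F_7), so that t = 8 – 13 = –5:

```python
# #E(F_{q^n}) = q^n + 1 - (alpha^n + beta^n), computed via the integer
# recurrence s_n = t*s_{n-1} - q*s_{n-2}, s_0 = 2, s_1 = t (Theorem 2.51).

def order_over_extension(q, t, n):
    s_prev, s_cur = 2, t                      # s_0, s_1
    for _ in range(n - 1):
        s_prev, s_cur = s_cur, t * s_cur - q * s_prev
    return q ** n + 1 - s_cur
```

As a consistency check, #E(F_7) = 13 must divide #E(F_49), since E(F_7) is a subgroup of E(F_49).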

Exercise Set 2.11

2.100Show that the following curves over K are not smooth (and hence not elliptic curves):
  1. Y2 = X3, K arbitrary.

  2. Y2 = X3 + X2, K arbitrary.

  3. Y2 = X3 + aX + b, if char K = 2.

2.101
  1. Show that for an elliptic curve E over K and a finite point P = (h, k), the only points in E(K) (or E(K̄)) having X-coordinate equal to h are P and –P.

  2. Let char K ≠ 2, 3 and let E be defined by Equation (2.8). If α1, α2, α3 are the roots (distinct by Theorem 2.45) of X3 + aX + b, then (α1, 0), (α2, 0) and (α3, 0) are the only points on E(K̄) with Y-coordinate equal to 0. Show that these are the only points of order 2 in E(K̄).

2.102Let P = (h1, k1) and Q = (h2, k2) be two points (different from the point at infinity) in E(K) defined by the Weierstrass Equation (2.6). Assume that Q ≠ –P. Determine R = (h3, k3) = P + Q as follows:
  1. Show that the line passing through P and Q (the tangent, if P = Q) has the equation Y = λX + μ, where
     λ = (k2 – k1)/(h2 – h1) and μ = (k1h2 – k2h1)/(h2 – h1) if P ≠ Q, whereas
     λ = (3h1^2 + 2a2h1 + a4 – a1k1)/(2k1 + a1h1 + a3) and μ = (–h1^3 + a4h1 + 2a6 – a3k1)/(2k1 + a1h1 + a3) if P = Q.
  2. Substituting λX + μ for Y in Equation (2.6) gives a cubic equation in X of which h1 and h2 are two roots. Show that the third root (the X-coordinate of R) is

    h3 = λ2 + a1λ – a2 – h1 – h2.

    Hence deduce that the Y-coordinate of R is

    k3 = –(λ + a1)h3 – μ – a3.
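The formulas of this exercise translate directly into an addition algorithm for the group of a general Weierstrass curve over a prime field; in the sketch below (function names ours) the point at infinity is represented by None, and μ is recovered as k1 – λh1 from the line through P.

```python
# Addition on Y^2 + a1*XY + a3*Y = X^3 + a2*X^2 + a4*X + a6 over F_p,
# following Exercise 2.102; c = (a1, a2, a3, a4, a6), infinity = None.

def ec_neg(P, c, p):
    """-P = (h, -k - a1*h - a3) for a finite point P = (h, k)."""
    a1, a2, a3, a4, a6 = c
    if P is None:
        return None
    h, k = P
    return (h, (-k - a1 * h - a3) % p)

def ec_add(P, Q, c, p):
    a1, a2, a3, a4, a6 = c
    if P is None:
        return Q
    if Q is None:
        return P
    h1, k1 = P
    h2, k2 = Q
    if h1 == h2 and (k1 + k2 + a1 * h1 + a3) % p == 0:
        return None                          # Q = -P, so P + Q is infinity
    if P == Q:                               # tangent line at P
        num = (3 * h1 * h1 + 2 * a2 * h1 + a4 - a1 * k1) % p
        den = (2 * k1 + a1 * h1 + a3) % p
    else:                                    # chord through P and Q
        num = (k2 - k1) % p
        den = (h2 - h1) % p
    lam = num * pow(den, -1, p) % p          # slope of the line Y = lam*X + mu
    mu = (k1 - lam * h1) % p
    h3 = (lam * lam + a1 * lam - a2 - h1 - h2) % p
    k3 = (-(lam + a1) * h3 - mu - a3) % p
    return (h3, k3)

def ec_mul(n, P, c, p):
    """n*P by double-and-add."""
    R = None
    while n:
        if n & 1:
            R = ec_add(R, P, c, p)
        P = ec_add(P, P, c, p)
        n >>= 1
    return R

# Example: P = (1, 2) lies on Y^2 = X^3 + 3 over F_7, a group of order 13.
C7, P7, PT = (0, 0, 0, 0, 3), 7, (1, 2)
```

Because the sample group has prime order 13, multiplying any point by 13 must yield the point at infinity.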

2.103Let . Show that there exists an elliptic curve E over K such that . [H]
2.104Assume that char K ≠ 2, 3 and consider the elliptic curve E given by Equation (2.8). Let K[E] be the affine coordinate ring and K(E) the field of rational functions on E.
  1. Show that every element in K[E] can be uniquely represented as u(x) + yv(x) for polynomials u(x), v(x) ∈ K[x].

  2. The conjugate of f = u(x) + yv(x) ∈ K[E] is defined as f̄ := u(x) – yv(x). The norm of f is defined as N(f) := f f̄. Show that N(f) = u(x)2 – (x3 + ax + b)v(x)2 ∈ K[x].

  3. The degree of f = u(x) + yv(x) ∈ K[E] is defined as deg f := max(2 degx u, 3 + 2 degx v), where degx denotes the degree in x. Show that deg f = degx N(f).

  4. Show that for f, g ∈ K[E], one has N(fg) = N(f) N(g). Hence conclude that deg(fg) = deg f + deg g.

  5. Show that every rational function in K(E) can be represented as a(x) + yb(x), where a(x), b(x) ∈ K(x).

2.105Show that the division polynomials for the general Weierstrass equation can be recursively defined as

where F = 4x3 + d2x2 + 2d4x + d6.

2.106Write the recursive formulas for the division polynomials ψm(x, y) and for the elliptic curve E defined by Equation 2.8 over a field K of characteristic ≠ 2, 3. Show that for m ≥ 2 and for we have

2.107Write the recursive formulas for the division polynomials ψm(x, y) and for the elliptic curve E defined by Equation 2.9 over a field K of characteristic 2. Conclude that ψm are polynomials in only x for all . With fm := ψm for all show that for m ≥ 2 and for we have

2.108Consider the elliptic curve defined over the field F_7:

Ea,b : Y2 = X3 + aX + b.

Verify the following assertions: (You may write a computer program.)

  1. Each Ea,b has order between 3 and 13.

  2. The curve E0,3 : Y2 = X3 + 3 has the maximum possible order 13.

  3. The curve E0,4 : Y2 = X3 + 4 has the minimum possible order 3.

  4. The curve E0,5 : Y2 = X3 + 5 is anomalous.

  5. The group is not cyclic.
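The suggested computer program for this exercise can be as simple as the brute-force sketch below (names ours); we take the base field to be F_7, an assumption consistent with the stated Hasse bounds 3 and 13.

```python
# Verify Exercise 2.108 by naive point counting over F_7.

p = 7

def ec_order(a, b):
    n = 1                                    # the point at infinity
    for x in range(p):
        rhs = (x**3 + a * x + b) % p
        n += sum(1 for y in range(p) if (y * y) % p == rhs)
    return n

# Orders of all smooth curves E_{a,b} (discriminant 4a^3 + 27b^2 != 0 mod p).
orders = {(a, b): ec_order(a, b)
          for a in range(p) for b in range(p)
          if (4 * a**3 + 27 * b**2) % p != 0}
```

The dictionary confirms assertions 1–4: the orders range over [3, 13], E_{0,3} attains 13, E_{0,4} attains 3, and E_{0,5} has exactly p = 7 points, hence is anomalous.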

2.109Consider the representation of F_8 as F_2(ξ), where ξ is a root of T3 + T + 1 in F_8. Identify an element a2ξ2 + a1ξ + a0 (where ai ∈ F_2) with the integer (a2a1a0)2 = a2·2^2 + a1·2 + a0. For integers a, b ∈ {0, 1, . . . , 7}, b ≠ 0, define the non-supersingular elliptic curve:

Ea,b : Y2 + XY = X3 + aX2 + b.

Verify the following assertions: (You may write a computer program.)

  1. Each Ea,b has order between 4 and 14.

  2. The curve E1,1 : Y2 + XY = X3 + X2 + 1 has the maximum possible order 14.

  3. The curve E2,1 : Y2 + XY = X3 + ξX2 + 1 has the minimum possible order 4.

  4. The curve E2,2 : Y2 + XY = X3 + ξX2 + ξ is anomalous.

  5. The orders of Ea,b for all choices of a, b lie in the set {4, 6, 8, 10, 12, 14}.

  6. Each group Ea,b(F_8) is cyclic.

  7. Theorem 2.45(3) requires the phrase over K̄, that is, two curves over an algebraically non-closed field having the same j-invariant may be non-isomorphic.
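A brute-force sketch for this exercise (names ours), with the field elements of F_8 encoded as 3-bit integers exactly as in the identification above (so 2 stands for ξ):

```python
# Exercise 2.109 over F_8 = F_2[T]/(T^3 + T + 1); xi^3 = xi + 1.

MOD = 0b1011                                  # T^3 + T + 1

def gf8_mul(a, b):
    r = 0
    for i in range(3):                        # carry-less multiplication
        if (b >> i) & 1:
            r ^= a << i
    for i in range(4, 2, -1):                 # reduce modulo T^3 + T + 1
        if (r >> i) & 1:
            r ^= MOD << (i - 3)
    return r

def ec_order_gf8(a, b):
    """#E_{a,b}(F_8) for E : Y^2 + XY = X^3 + aX^2 + b, counting infinity."""
    n = 1
    for x in range(8):
        x2 = gf8_mul(x, x)
        rhs = gf8_mul(x2, x) ^ gf8_mul(a, x2) ^ b
        n += sum(1 for y in range(8) if gf8_mul(y, y) ^ gf8_mul(x, y) == rhs)
    return n
```

The same routine, with the right-hand side adapted, also settles Exercise 2.110.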

2.110Consider the representation of F_8 and the identification of elements of F_8 with integers as in Exercise 2.109. For a, b, c ∈ {0, 1, . . . , 7}, a ≠ 0, define the supersingular elliptic curve:

Ea,b,c : Y2 + aY = X3 + bX + c.

Verify the following assertions: (You may write a computer program.)

  1. Each Ea,b,c has order between 5 and 13.

  2. The curve E1,1,1 : Y2 + Y = X3 + X + 1 has the maximum possible order 13.

  3. The curve E1,1,2 : Y2 + Y = X3 + X + ξ has the minimum possible order 5.

  4. The orders of Ea,b,c for all choices of a, b, c lie in the set {5, 9, 13}.

  5. No Ea,b,c is anomalous.

  6. Each group Ea,b,c(F_8) is cyclic.

2.111Consider the elliptic curve E : Y2 + XY = X3 + X2 + 1 defined over F_{2^n} for all n ≥ 1. Show that

where r = ⌊n/2⌋. [H] Conclude that E is anomalous over F_2, but not so over F_{2^n} for n ≥ 2.

2.112Let K be a finite field of characteristic ≠ 2, 3 and E : Y2 = X3 + aX + b an elliptic curve defined over K. Prove that:
  1. #E(K) is odd if and only if X3 + aX + b is irreducible in K[X]. [H]

  2. E(K) is not cyclic if X3 + aX + b splits in K[X].

  3. The converse of Part (b) does not hold. [H]

2.113Let E : Y2 + XY = X3 + aX2 + b be a non-supersingular elliptic curve defined over F_{2^n}. Prove that:
  1. E(F_{2^n}) has exactly one point of order 2. [H]

  2. #E(F_{2^n}) is even.

2.114Let E : Y2 + aY = X3 + bX + c be a supersingular elliptic curve over F_{2^n}. Prove that:
  1. E(F_{2^n}) has no points of order 2.

  2. #E(F_{2^n}) is odd.

2.115
  1. Let G be a finite Abelian group of cardinality n. Show that if n is square-free, then G is cyclic. [H]

  2. Prove that if E is an anomalous elliptic curve over F_p (p prime), then E(F_p) is cyclic. [H]

  3. If E is a supersingular elliptic curve over the field F_q of characteristic ≠ 2, 3, prove that E(F_q) is either cyclic or isomorphic to Z_2 ⊕ Z_{(q+1)/2}. [H]

2.116Let p be a prime, p ≡ 3 (mod 4), and a ∈ F_p, a ≠ 0. Consider the elliptic curve E : Y2 = X3 – a^2X over F_p (or over F_{p^2}). Prove that:
  1. contains at most three points of order three.

  2. The points of order three in are precisely the points of order three in .

2.117A Weierstrass equation of an elliptic curve defined over a field K is said to be in the Legendre form, if it can be written as

Equation 2.12

Y2 = X(X – 1)(X – k)
for some k ∈ K, k ≠ 0, 1. Show that if char K ≠ 2, then every Weierstrass equation over K can be written in the Legendre form. Show that the j-invariant of the curve E defined by Equation (2.12) is 2^8(k^2 – k + 1)^3/(k^2(k – 1)^2).
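The j-invariant of the Legendre curve Y2 = X(X – 1)(X – k) can be checked numerically: depress the cubic to short Weierstrass form and compare with the classical identity j = 2^8(k^2 – k + 1)^3/(k^2(k – 1)^2), which we assume as the target value. A sketch in exact rational arithmetic, characteristic 0 (names ours):

```python
# Numerical check of the Legendre j-invariant formula.
from fractions import Fraction

def legendre_short_form(k):
    """Depress X^3 - (1+k)X^2 + kX via X -> X + (1+k)/3 to X^3 + aX + b."""
    k = Fraction(k)
    p, q = -(1 + k), k
    a = q - p * p / 3
    b = 2 * p**3 / 27 - p * q / 3
    return a, b

def j_from_short(a, b):
    """j-invariant of Y^2 = X^3 + aX + b."""
    return 1728 * 4 * a**3 / (4 * a**3 + 27 * b**2)

def j_legendre(k):
    k = Fraction(k)
    return 2**8 * (k * k - k + 1)**3 / (k * k * (k - 1)**2)
```

For k = 2 (and k = –1) both computations give the familiar value j = 1728.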

2.12. Hyperelliptic Curves

Hyperelliptic curves are generalizations of elliptic curves. We cannot define a group structure on a general hyperelliptic curve in the same way as we did for elliptic curves. We instead work in the Jacobian of a hyperelliptic curve. For an elliptic curve E over an algebraically closed field K, the Jacobian is canonically isomorphic to the group E(K). Thus one can as well use the techniques for hyperelliptic curves for describing and working in elliptic curve groups. However, the exposition of the previous section turns out to be more intuitive and computationally oriented.

2.12.1. The Defining Equations

A hyperelliptic curve C of genus g over a field K is defined by a polynomial equation of the form

Equation 2.13


In order that C qualifies as a hyperelliptic curve, we additionally require that C (as a projective curve) be smooth over K̄. The set of K-rational points on C is denoted as usual by C(K). For g = 1, Equation (2.13) is the same as the Weierstrass Equation (2.6) on page 98, that is, elliptic curves are hyperelliptic curves of genus one. A hyperelliptic curve of genus 2 over R is shown in Figure 2.2.

Figure 2.2. A hyperelliptic curve of genus 2 over R: Y2 = X(X2 – 1)(X2 – 2)


A hyperelliptic curve has only one point at infinity (Exercise 2.97(f)) and is smooth at ∞. If char K ≠ 2, the substitution Y ↦ Y – u(X)/2 simplifies Equation (2.13) to Y2 = v(X) + u(X)2/4. Since v(X) + u(X)2/4 is a monic polynomial in K[X] of degree 2g + 1, we may assume that if char K ≠ 2, the equation for C is of the form:

Equation 2.14


Proposition 2.37.

If char K ≠ 2, then the hyperelliptic curve C defined by Equation (2.14) is smooth if and only if v has no multiple roots (in ). If char K = 2, then the curve defined by Equation (2.14) is never smooth.

Proof

First, consider char K ≠ 2. If v has a multiple root, say α ∈ K̄, then v′(α) = 0 and, therefore, C is not smooth at the finite point (α, 0). Conversely, if (h, k) is a singular point on C, then we have 2k = 0 and v′(h) = 0. Since (h, k) = (h, 0) is a point on C, we have v(h) = 0, that is, h is a multiple root of v.

For char K = 2 and (h, k) ∈ C, we have (∂(Y2 – v(X))/∂X)(h, k) = v′(h) and (∂(Y2 – v(X))/∂Y)(h, k) = 2k = 0. Now, v′(X) is a monic polynomial of degree 2g > 0 and, therefore, has at least one root, say h ∈ K̄. Choosing k ∈ K̄ with k2 = v(h), we conclude that C is not smooth at (h, k).

Definition 2.82.

Let P = (h, k) be a finite point on the hyperelliptic curve C defined by Equation (2.13). The point P̃ := (h, –k – u(h)) is called the opposite of P.[14] P and P̃ are the only points on C with X-coordinate equal to h. If P̃ = P, then P is called a special point on C, otherwise it is called an ordinary point on C. The set of all finite (resp. ordinary, resp. special) points on C is denoted by Cfin(K) (resp. Cord(K), resp. Cspl(K)). These notations are also abbreviated as Cfin, Cord and Cspl, if the field K is understood from the context.

[14] It is customary to define the opposite of ∞ to be ∞ itself.

2.12.2. Polynomial and Rational Functions

All the general theory we described in Section 2.10 continues to be valid for hyperelliptic curves. However, since we are now given an explicit equation describing the curves, we can give more explicit expressions for polynomial and rational functions on hyperelliptic curves. For simplicity, we consider the affine equation and extend our definitions separately for the point at infinity.

Consider the hyperelliptic curve C defined by Equation (2.13). By Exercise 2.98, the defining polynomial f(X, Y) := Y2 + u(X)Y – v(X) (or its homogenization) is irreducible over K̄, so that the affine (or projective) coordinate ring of C is an integral domain and the corresponding function field is simply the field of fractions of the coordinate ring.

Let G(x, y) ∈ K[C]. Since y2 + u(x)y – v(x) = 0 in K[C], we can repeatedly substitute y2 by –u(x)y + v(x) in G(x, y) until the y-degree of G(x, y) becomes less than 2. This proves part of the following:

Proposition 2.38.

Every polynomial function G(x, y) ∈ K[C] can be written uniquely as G(x, y) = a(x) + yb(x) for some a(X), b(X) ∈ K[X].

Proof

In order to establish the uniqueness, note that if G(x, y) = a1(x) + yb1(x) = a2(x) + yb2(x), then f(X, Y) divides [a1(X) + Y b1(X)] – [a2(X) + Y b2(X)] in K[X, Y]. Since the Y-degree of f is 2, this implies [a1(X) + Y b1(X)] – [a2(X) + Y b2(X)] = 0, that is, [a1(X) – a2(X)] + [b1(X) – b2(X)]Y = 0, that is, a1(X) = a2(X) and b1(X) = b2(X).

Definition 2.83.

Let G(x, y) = a(x) + yb(x) ∈ K[C]. The conjugate of G is defined to be the polynomial function Ḡ(x, y) := a(x) – b(x)u(x) – yb(x). The norm of G is defined as N(G) := GḠ.

Some useful properties of the norm function are listed in the following lemma, the proof of which is left to the reader as an easy exercise.

Lemma 2.9.

For G, H ∈ K[C], we have:

  1. .

  2. If G(x, y) = a(x) + yb(x), then N(G) = a(x)2 – a(x)b(x)u(x) – v(x)b(x)2. In particular, N(G) ∈ K[x].

  3. .

  4. N(GH) = N(G) N(H).
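Lemmas 2.9 and 2.10 can be checked experimentally on a concrete curve. The sketch below (names ours) uses the genus-2 curve y2 = x(x2 – 1)(x2 – 2) of Figure 2.2, for which u = 0, together with naive coefficient-list arithmetic (index = degree):

```python
# Norm and degree of polynomial functions a(x) + y*b(x) on the genus-2
# curve y^2 = v(x) = x^5 - 3x^3 + 2x (so u = 0).

GENUS = 2
V = [0, 2, 0, -3, 0, 1]                  # v(x) = x^5 - 3x^3 + 2x

def pmul(f, g):
    r = [0] * (len(f) + len(g) - 1)
    for i, c in enumerate(f):
        for j, d in enumerate(g):
            r[i + j] += c * d
    return r

def padd(f, g):
    return [(f[i] if i < len(f) else 0) + (g[i] if i < len(g) else 0)
            for i in range(max(len(f), len(g)))]

def ptrim(f):
    while f and f[-1] == 0:
        f = f[:-1]
    return f

def pdeg(f):
    f = ptrim(f)
    return len(f) - 1 if f else -1

def norm(G):
    a, b = G                             # G = a(x) + y*b(x)
    # Lemma 2.9(2) with u = 0:  N(G) = a^2 - v*b^2
    return padd(pmul(a, a), pmul([-c for c in V], pmul(b, b)))

def degree(G):
    a, b = G                             # Definition 2.84
    cands = []
    if pdeg(a) >= 0:
        cands.append(2 * pdeg(a))
    if pdeg(b) >= 0:
        cands.append(2 * GENUS + 1 + 2 * pdeg(b))
    return max(cands)

def fmul(G, H):
    # (a1 + y*b1)(a2 + y*b2) with y^2 = v (since u = 0)
    a1, b1 = G
    a2, b2 = H
    return (padd(pmul(a1, a2), pmul(V, pmul(b1, b2))),
            padd(pmul(a1, b2), pmul(a2, b1)))

G = ([0, 1], [1])                        # the function x + y
H = ([1], [0, 1])                        # the function 1 + y*x
```

The test values confirm deg G = degx N(G), deg(GH) = deg G + deg H and N(GH) = N(G)N(H) on this sample.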

We also have an easy description of the rational functions on C.

Proposition 2.39.

Every rational function r(x, y) ∈ K(C) can be written in the form s(x) + yt(x) for some s(X), t(X) ∈ K(X).

Proof

We can write r(x, y) = G(x, y)/H(x, y) for G, H ∈ K[C], H ≠ 0. Multiplying both the numerator and the denominator by the conjugate H̄ and using Lemma 2.9(2) and Proposition 2.38 completes the proof.

The value of a rational function on C at a finite point on C can be defined as in the case of general curves (See Definition 2.68). In order to define the value of a rational function at the point , we need some other concepts.

For a moment, let us assume that K = R. From the equation of C, we see that k^2 ≈ h^(2g+1) (neglecting lower-degree terms) for sufficiently large coordinates h, k of a point (h, k) on C. This means that k grows roughly like h^((2g+1)/2), that is, (2g + 1)/2 times as fast as h on a logarithmic scale. So it is customary to give Y a weight (2g + 1)/2 times the weight we give to X. The smallest integral weights of X and Y that satisfy this are 2 and 2g + 1 respectively. This motivates us to provide Definition 2.84 (generalized for any K).

Definition 2.84.

Let G(x, y) = a(x) + yb(x) ∈ K[C]. The degree of G is defined to be deg G := max(2 degx a, 2g + 1 + 2 degx b), where degx denotes the usual x-degree of a polynomial in K[x]. Since a and b are uniquely determined by G, deg G is well-defined. If G = 0, we set deg G := –∞.

If 0 ≠ G = a(x)+yb(x), d1 = degx a and d2 = degx b, then the leading coefficient of G is taken to be the coefficient of xd1 in a(x) if deg G = 2d1, or to be the coefficient of xd2 in b(x) if deg G = 2g + 1 + 2d2. (We cannot have 2d1 = 2g + 1 + 2d2, since the left side is even and the right side is odd.)

Some basic properties of the degree function follow.

Lemma 2.10.

For G, H ∈ K[C], we have:

  1. deg G = degx(N(G)).

  2. deg(GH) = deg G + deg H.

  3. .

Proof

Easy exercise.

Now we are in a position to give an explicit definition of the value of a rational function at .

Definition 2.85.

For r = G/H ∈ K(C) with G, H ∈ K[C], we define the value r(∞) as:

If deg(G) < deg(H), then r(∞) := 0.

If deg(G) > deg(H), then r(∞) is undefined (that is, r is not defined at ∞).

If deg(G) = deg(H), then r(∞) is defined as the ratio of the leading coefficients of G and H.

Now that we have a complete description of the value of a rational function at any point on C, poles and zeros of rational functions on C can be defined as in Definition 2.70. In order to define the order of a polynomial or rational function at a point P on C, we should find a uniformizing parameter uP at P. Tedious calculations help one deduce the following explicit expressions for uP.

Proposition 2.40.

Let P = (h, k) ∈ C be a finite point. Then we can take

uP := x – h if P is an ordinary point, and uP := y – k if P is a special point,

as a uniformizing parameter at P. Finally, u∞ := x^g/y is a uniformizing parameter at the point at infinity (where g is the genus of C).

We give an alternative definition of the order (independent of uP), which is computationally useful and which is equivalent to Definition 2.71 for a hyperelliptic curve.

Definition 2.86.

Let G(x, y) = a(x) + yb(x) ∈ K[C] and P ∈ C. The order of G at P is defined as follows. First, let P = (h, k) be a finite point on C. Let e be the largest exponent such that (x – h)^e divides both a(x) and b(x). We write G = (x – h)^e G1(x, y). If G1(h, k) ≠ 0 we set l := 0, otherwise we set l to be the highest exponent such that (x – h)^l divides N(G1). We then define

ordP(G) := e + l if P is an ordinary point, and ordP(G) := 2e + l if P is a special point.

Finally, we define ord∞(G) := –deg(G).

Now, let r(x, y) = G(x, y)/H(x, y) be a rational function on C and . We define the order of r at P as ordP(r) := ordP(G) – ordP(H). The value ordP(r) can be shown to be independent of the choice of G and H.

Example 2.23.

Let P = (h, k) be a finite point on C. Consider the rational function r := (x – h)^m, m > 0. The only points on C with X-coordinate equal to h are P and its opposite. Therefore, if P is an ordinary point, ordP(r) = m (and likewise at its opposite), whereas if P is a special point, ordP(r) = 2m. Moreover, ord∞(r) = –2m. For any other point Q on C, we have ordQ(r) = 0.

Now consider r = (x – h)^m for some m < 0. Write r = G/H with G = 1 and H = (x – h)^(–m). Since ordQ(r) = ordQ(G) – ordQ(H), we continue to have

If m ≥ 0, then r is a polynomial function with zeros P and its opposite, and no finite poles. In this case, the sum of the orders of its zeros is 2m = 2 degx r = deg r. Theorem 2.52 generalizes this observation.

Theorem 2.52.

A non-constant polynomial function G ∈ K[C] has only finitely many zeros and a single pole, at ∞. Furthermore, if K is algebraically closed, then the sum of the orders of the zeros of G equals deg G = –ord∞(G).

2.12.3. The Jacobian

We continue to work with the hyperelliptic curve C of Equation (2.13). We first impose the restriction that K is algebraically closed and use the theory of Section 2.10 to define the set Div(C) of divisors on C, the degree zero part Div0(C) of Div(C), the divisor Div(r) of a rational function r ∈ K(C), the set Prin(C) of principal divisors on C, the Picard group Pic(C) = Div(C)/Prin(C) and the Jacobian Div0(C)/Prin(C).

Example 2.24.

For the rational function r := (xh)m of Example 2.23, we have:

The Jacobian is the set of all cosets of Prin(C) in Div0(C). It is not a good idea to work with cosets (which are equivalence classes). Recall that in the case of Z_n, we represented a coset a + nZ by the remainder of Euclidean division of a by n. In case of the representation F_q = F_p[X]/〈f(X)〉, we took polynomials of smallest degrees as canonical representatives of the cosets of 〈f(X)〉. In case of the Jacobian too, we intend to find such good representatives, one from each coset. We now introduce the concept of reduced divisors for that purpose.

Definition 2.87.

Two divisors D1, D2 ∈ Div0(C) (resp. in Div(C)) are said to be equivalent, denoted D1 ~ D2, if D1 – D2 ∈ Prin(C), or equivalently if D1 = D2 + Div(r) for some rational function r on C.

Our goal is to associate to every divisor some unique reduced divisor with D ~ Dred, that is, Dred plays the role of the canonical representative of . We start with the following definition.

Definition 2.88.

A divisor D = ΣP mP(P) – (ΣP mP)(∞) (the sums over finite points P) is called semi-reduced, if each mP ≥ 0 and if for mP > 0 we have: the opposite of P occurs with coefficient 0 if P is an ordinary point, and mP = 1 if P is a special point.

Proposition 2.41.

Every divisor D ∈ Div0(C) is equivalent to some semi-reduced divisor D1.

Proof

Let , with and with Cord being the disjoint union of C1 and C2, where an ordinary point if and only if its opposite and . Now we can write D = D1 + D2, where

and

with m1 and m2 so chosen that D1, . By definition, D1 is semi-reduced, whereas by Example 2.24 , where

Now, we explain how we can represent a semi-reduced divisor by a pair of polynomials a(x), . For that, we need a definition.

Definition 2.89.

Let D1 and D2 be two divisors on C (not necessarily in Div0(C)). The greatest common divisor (gcd) of D1 and D2 is defined as the divisor

Theorem 2.53.

Let D = ΣP mP(P) – (ΣP mP)(∞) be a semi-reduced divisor on C. Let Pi = (hi, ki), i = 1, . . . , n, be the only finite points P on C such that mP > 0. Let mi := mPi, m := m1 + · · · + mn, and a(x) := (x – h1)^m1 · · · (x – hn)^mn (so that degx(a) = m). Then there exists a unique polynomial b(x) ∈ K[x] with the following properties:

  1. degx b < m,

  2. b(hi) = ki for i = 1, . . . , n,

  3. a(x) divides b(x)2 + b(x)u(x) – v(x), and

  4. D = gcd(Div(a(x)), Div(b(x) – y)).

Conversely, if a(x), b(x) ∈ K[x] with degx b < degx a and with a dividing b2 + bu – v, then the divisor gcd(Div(a(x)), Div(b(x) – y)) is semi-reduced.

We denote the divisor gcd(Div(a(x)), Div(b(x) – y)) by Div(a, b). The zero divisor has the representation Div(1, 0).
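The divisibility condition of Theorem 2.53 can be tested concretely. In the sketch below (names ours) we use the curve C2 : Y2 = X5 + X + 2 of Exercise 2.118, taking the base field to be F_7 (our assumption there), and the semi-reduced divisor (P1) + (P2) – 2(∞) with P1 = (0, 3) and P2 = (1, 2):

```python
# Check that a(x) divides b(x)^2 + b(x)u(x) - v(x) for a Div(a, b)
# representation on C2 : y^2 = x^5 + x + 2 over F_7 (u = 0).

P = 7
V = [2, 1, 0, 0, 0, 1]                       # v(x) = x^5 + x + 2

def pmul(f, g):
    r = [0] * (len(f) + len(g) - 1)
    for i, c in enumerate(f):
        for j, d in enumerate(g):
            r[i + j] = (r[i + j] + c * d) % P
    return r

def pmod(f, a):
    """Remainder of f modulo the monic polynomial a (coefficients mod P)."""
    f = [c % P for c in f]
    da = len(a) - 1
    for i in range(len(f) - 1, da - 1, -1):
        c = f[i]
        if c:
            for j in range(da + 1):
                f[i - da + j] = (f[i - da + j] - c * a[j]) % P
    return f[:da]

# Divisor (P1) + (P2) - 2(infinity) with P1 = (0, 3), P2 = (1, 2) on C2:
pts = [(0, 3), (1, 2)]
a = [1]
for h, k in pts:
    a = pmul(a, [(-h) % P, 1])               # a(x) = (x - 0)(x - 1)
b = [3, 6]                                   # interpolation: b(0) = 3, b(1) = 2

# Theorem 2.53(3): a(x) must divide b(x)^2 + b(x)u(x) - v(x)  (u = 0 here)
bb = pmul(b, b)
diff = [((bb[i] if i < len(bb) else 0) - (V[i] if i < len(V) else 0)) % P
        for i in range(max(len(bb), len(V)))]
remainder = pmod(diff, a)
```

A zero remainder confirms that Div(a, b) is a valid semi-reduced representation of the divisor.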

A representation of the elements of by semi-reduced divisors (that is, by pairs of polynomials in K[x]) suffers from two disadvantages. First, the representation is not unique, and second, the degrees of the representing polynomials may be quite large. These difficulties are removed if we consider semi-reduced divisors of a special kind.

Definition 2.90.

A semi-reduced divisor D = ΣP mP(P) – (ΣP mP)(∞) is called a reduced divisor, if ΣP mP ≤ g, where g is the genus of C.

The following theorem establishes the desirable properties of a reduced divisor.

Theorem 2.54.

For every D ∈ Div0(C), there exists a unique reduced divisor D1 equivalent to D.

Proof

We only prove the existence of reduced divisors. For the proof of the uniqueness, one may, for example, see Koblitz [154]. The norm of a semi-reduced divisor D = ΣP mP(P) – (ΣP mP)(∞) is defined as the integer |D| := ΣP mP.

Let D ∈ Div0(C). By Proposition 2.41 there exists a semi-reduced divisor D′ ~ D. One can easily verify that |D′| ≤ |D|. If we already have |D′| ≤ g, then D′ is a desired reduced divisor. So assume otherwise, that is, |D′| ≥ g + 1. We can then choose finite points P1, . . . , Pg+1 on C (not necessarily all distinct) such that (P1) + · · · + (Pg+1) is a subsum of the formal sum D′. Let the semi-reduced divisor (P1) + · · · + (Pg+1) – (g + 1)(∞) be represented as Div(a, b) with degx a = g + 1 and degx b ≤ g. But then deg(b(x) – y) = 2g + 1 and b(x) – y has zeros at P1, . . . , Pg+1 by Theorem 2.53. So by Theorem 2.52 we can write Div(b(x) – y) = (P1) + · · · + (Pg+1) + (Q1) + · · · + (Qg) – (2g + 1)(∞) for some finite points Q1, . . . , Qg on C. Now D″ := D′ – Div(b(x) – y) satisfies D″ ~ D′ and |D″| < |D′|. We apply Proposition 2.41 again to get a semi-reduced divisor D‴ ~ D″ with |D‴| ≤ |D″|. Thus starting from the semi-reduced divisor D′ we produce another semi-reduced divisor D‴ such that D‴ ~ D′ ~ D and |D‴| < |D′|. We continue the process a finite number of times, until we get an equivalent semi-reduced divisor D1 of norm ≤ g. This is a desired reduced divisor.

From the viewpoint of cryptography, the field K should be a finite field, which is never algebraically closed. So we must remove the restriction that K be algebraically closed. Since C is naturally defined over K̄ as well, we start with the Jacobian of C over K̄ and define a particular subgroup of it to be the Jacobian of C over K.

Definition 2.91.

Let σ be a K-automorphism of K̄. For a point P = (h, k) ∈ C(K̄), the point σ(P) := (σ(h), σ(k)) is also in C(K̄). For a divisor D = ΣP mP(P), we define σ(D) := ΣP mP(σ(P)). D is said to be defined over K if σ(D) = D for all K-automorphisms σ of K̄. The subset of the Jacobian over K̄ consisting of divisor classes that have representative divisors defined over K is a subgroup and is called the Jacobian of C over K.

Every element of the Jacobian of C over K can be represented uniquely as a reduced divisor Div(a, b) for polynomials a(x), b(x) ∈ K[x] with degx a ≤ g and degx b < degx a. The Jacobian of C over K is, therefore, a finite Abelian group. For suitably chosen hyperelliptic curves, these groups can be used to build cryptographic protocols.

Exercise Set 2.12

In this exercise set, we let C denote a hyperelliptic curve of genus g defined by Equation (2.13) over a field K (not necessarily algebraically closed).

2.118
  1. Show that the curve

    C1 : Y2 = X5 + X + 1

    defined over F_7 is not smooth and so not a hyperelliptic curve. Find a point where C1 is not smooth.

  2. Show that the curve

    C2 : Y2 = X5 + X + 2

    defined over F_7 is smooth, that is, a hyperelliptic curve of genus 2. Find out all the F_7-rational points on C2. (There are ten of them.)
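Both parts can be verified by a short program; in the sketch below (names ours) we take the base field to be F_7, an assumption consistent with the stated count of ten rational points on C2.

```python
# Exercise 2.118 over F_7: singularity of C1 and point count of C2.

p = 7

def v1(x): return (x**5 + x + 1) % p          # C1 : Y^2 = v1(X)
def v2(x): return (x**5 + x + 2) % p          # C2 : Y^2 = v2(X)
def dv(x): return (5 * x**4 + 1) % p          # common derivative v'(x)

# Part 1: by Proposition 2.37, C1 is singular where v1 and v1' vanish together.
singular_x = [x for x in range(p) if v1(x) == 0 and dv(x) == 0]

# Part 2: affine points of C2, plus the single point at infinity.
affine = [(x, y) for x in range(p) for y in range(p) if (y * y) % p == v2(x)]
num_points = len(affine) + 1
```

The program locates the singular point of C1 at (4, 0) and finds exactly ten rational points on C2.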

2.119Represent F_8 as F_2(ξ), where ξ is a root of the irreducible polynomial T3 + T + 1 ∈ F_2[T].
  1. Show that the curve

    C3 : Y2 + XY = X5 + X + 1

    defined over F_8 is not smooth and so not a hyperelliptic curve. Find a point where C3 is not smooth.

  2. Show that the curve

    C4 : Y2 + XY = X5 + X + ξ

    defined over F_8 is smooth, that is, a hyperelliptic curve of genus 2. Find out all the F_8-rational points on C4. (There are eight of them.)

2.120Let P = (h, k) be a finite point on C. Prove the following assertions:
  1. The only points on C with X-coordinate equal to h are P and its opposite.

  2. .

  3. P is a special point if and only if u2(h) + 4v(h) = 0.

  4. If char K ≠ 2, then C has at most 2g + 1 special points, whereas if char K = 2, then C has at most g special points.

2.121Prove Lemmas 2.9 and 2.10.
2.122Let G(x, y) = a(x) + yb(x) ∈ K[C] and P = (h, k) ∈ C.
  1. Show that G(P) = 0 if and only if .

  2. Let . Show that either P is a special point of C or h is a common root of u and v.

  3. Show that and that .

2.123Prove Theorem 2.52. [H]
2.124A line on C is a polynomial function of the form ax + by + c with a, b, c ∈ K, a and b not both 0.
  1. Let D = Div(l) be the divisor of a line l. Show that the norm |D| is either 2 or 2g + 1.

  2. Let P = (h, k) ∈ C be a finite point. Determine Div(x – h).

  3. Determine Div(y).

2.125Let E be an elliptic curve (that is, a hyperelliptic curve of genus 1) defined over K.
  1. Show that any divisor can be written as for some unique point and for some rational function . This rational function r is unique up to multiplication by elements of .

  2. Show that the map that maps the residue class of to the point satisfying for some , is a bijection.

  3. Let P, , not both . Show that there is a line l with , where R = –(P + Q).

  4. Let , where σ is defined in Part (b). Show that for P, one has . (This, in particular, proves Theorem 2.46 and that σ is a group isomorphism.)

  5. Let . Show that D is a principal divisor if and only if (integer sum) and (sum in ).

2.13. Number Fields

In this section, we develop the theory of number fields and rings. Our aim is to make accessible to the readers the working of the cryptanalytic algorithms based on number field sieves.

2.13.1. Some Commutative Algebra

Commutative algebra is the study of commutative rings with identity (rings by our definition). Modern number theory and geometry are based on results from this area of mathematics. Here we give a brief sketch of some commutative algebra tools that we need for developing the theory of number fields.

Ideal arithmetic

We start with some basic operations on ideals (cf. Example 2.7, Definition 2.23).

Definition 2.92.

Let A be a ring and let 𝔞i, i ∈ I, be a family (not necessarily finite) of ideals in A.

The set-theoretic intersection ∩i∈I 𝔞i is evidently an ideal in A.

The sum Σi∈I 𝔞i of the family is the ideal consisting of all finite sums ai1 + · · · + air with aij ∈ 𝔞ij.

Two ideals 𝔞 and 𝔟 of A are said to be relatively prime or coprime, if 𝔞 + 𝔟 = A, or equivalently if there exist a ∈ 𝔞 and b ∈ 𝔟 with a + b = 1.

If I = {1, 2, . . . , n} is finite, the product 𝔞1𝔞2 · · · 𝔞n is the ideal generated by all elements of the form x1x2 . . . xn with xi ∈ 𝔞i for all i = 1, . . . , n. We have:

If 𝔞1 = 𝔞2 = · · · = 𝔞n = 𝔞, the product is denoted as 𝔞^n. The empty product of ideals is conventionally taken to be the unit ideal A. If 𝔞 is the principal ideal 〈a〉, then 𝔞^n = 〈a^n〉.

One can readily check that the operations intersection, sum and product on ideals in a ring are associative and commutative.

Commutative algebra extensively uses the theory of prime and maximal ideals (Definition 2.19, Proposition 2.9, Corollary 2.2 and Exercise 2.23). The set of all prime ideals in A is called the (prime) spectrum of A and is denoted by Spec A. The set of all maximal ideals of A is called the maximal spectrum of A and denoted by Spm A. We have Spm A ⊆ Spec A. These two sets play an extremely useful role for the study of the ring A. If A is non-zero, both these sets are non-empty.

Localization

The concept of formation of fractions of integers to give the rationals can be applied in a more general setting. Instead of having any non-zero element in the denominator of a fraction we may allow only elements from a specific subset. All we require to make the collection of fractions a ring is that the allowed denominators should be closed under multiplication.

Definition 2.93.

Let A be a ring. A non-empty subset S of A is called multiplicatively closed or simply multiplicative, if 1 ∈ S and for any s, t ∈ S we have st ∈ S.

Example 2.25.
  1. For a non-zero ring A, the subset A \ {0} is multiplicatively closed, if and only if A is an integral domain. For a general non-zero ring A, the set of all elements a ∈ A such that a is not a zero-divisor is a multiplicative subset of A.

  2. Let A be a ring and 𝔞 a proper ideal of A. The set A \ 𝔞 is multiplicatively closed, if and only if 𝔞 is a prime ideal of A.

  3. For a ring A and an element f ∈ A, the set {1, f, f2, f3, . . .} ⊆ A is multiplicatively closed.

Let A be a ring and S a multiplicative subset of A. We define a relation ~ on A × S as: (a, s) ~ (b, t) if and only if u(at – bs) = 0 for some u ∈ S. (If A is an integral domain, one may take u = 1 in the definition of ~.) It is easy to check that ~ is an equivalence relation on A × S. The set of equivalence classes of A × S under ~ is denoted by S–1A, whereas the equivalence class of (a, s) is denoted as a/s. For a/s, b/t ∈ S–1A, define (a/s) + (b/t) := (at + bs)/(st) and (a/s)(b/t) := (ab)/(st). It is easy to check that these operations are well-defined and make S–1A a ring with identity 1/1, in which each s/1, s ∈ S, is invertible. There is a canonical ring homomorphism A → S–1A taking a to a/1. In general, this homomorphism is not injective. However, if A is an integral domain and 0 ∉ S, then the injectivity can be proved easily and we say that the ring A is canonically embedded in the ring S–1A.

Definition 2.94.

Let A be a ring and S a multiplicative subset of A. The ring S–1A constructed as above is called the localization of A away from S or the ring of fractions of A with respect to S.

Example 2.26.
  1. Let A be an integral domain and let S = A \ {0}. Then S–1A is called the quotient field or the field of fractions of A and is denoted as Q(A). If A is already a field, then Q(A) ≅ A. Other examples include Q(Z) = Q and Q(K[X]) = K(X), K a field, where K(X) denotes the field of rational functions over K in one indeterminate X.

    More generally, if A is any ring and S is the set of all non-zero-divisors of A, then S–1A is called the total quotient ring of A and is again denoted by Q(A). It is, in general, not a field. If A is an integral domain, then S = A \ {0} and the usage of Q(A) remains consistent.

  2. Let A be a ring, 𝔭 a prime ideal of A and S = A \ 𝔭. Then S–1A is called the localization of A at 𝔭 and is usually denoted by A𝔭.

  3. Let A be a ring, f ∈ A, and S = {1, f, f2, f3, . . . }. In this case, S–1A is conventionally denoted by Af.
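Part 2 with A = Z and the prime ideal 5Z gives the localization Z_(5): the rationals whose reduced denominator is prime to 5. That this set is closed under addition and multiplication can be illustrated directly (function names ours):

```python
# Z localized at the prime p consists of fractions with denominator
# not divisible by p; Fraction reduces automatically, so the test is exact.
from fractions import Fraction

def in_local_ring(x, p):
    """Is the rational x in Z localized at the prime p?"""
    return x.denominator % p != 0

a, b = Fraction(3, 4), Fraction(7, 2)     # both lie in Z_(5)
```

By contrast, 1/5 does not lie in Z_(5), since its denominator is the excluded prime.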

Integral dependence

The concept of integral dependence generalizes the notion of integers. Recall that for a field extension KL, an element is called algebraic over K, if α is a root of a non-zero polynomial . Since K is a field, the polynomial f can be divided by its leading coefficient, giving a monic polynomial in K[X] of which α is a root. However, if K is not a field, division by the leading coefficient is not always permissible. So we require the minimal polynomial to be monic in order to define a special class of objects.

Definition 2.95.

Let A ⊆ B be an extension of rings. An element α ∈ B is said to be integral over A, if α satisfies[15] (that is, is a root of) a monic (and hence non-zero) polynomial f(X) ∈ A[X]. An equation of the form f(α) = 0, f(X) ∈ A[X] monic, is called an equation of integral dependence of α over A.

[15] Strictly speaking, α being a root of f(X) is equivalent to α satisfying the polynomial equation f(α) = 0. Often the term equation is dropped in this context—a harmless colloquial contraction.

Example 2.27.
  1. If both A and B are fields, the concepts of integral and algebraic elements are the same. (See the argument preceding Definition 2.95.)

  2. Take A = Z and B = Q and let a/b ∈ Q, gcd(a, b) = 1, be integral over Z. Let (a/b)^n + αn–1(a/b)^(n–1) + · · · + α1(a/b) + α0 = 0, αi ∈ Z, be an equation of integral dependence of a/b over Z. Multiplication by b^n gives a^n = –b(αn–1a^(n–1) + · · · + α1ab^(n–2) + α0b^(n–1)), that is, b|a^n. Since gcd(a, b) = 1, this forces b = ±1, that is, a/b ∈ Z. This is, in general, true for any UFD A and its field of fractions B = Q(A) (See Exercise 2.131).

  3. Every element a ∈ A is integral over A, since it satisfies the monic polynomial X – a ∈ A[X].
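Part 2 has a classical computational consequence: every rational root of a monic integer polynomial is an integer dividing the constant term (the rational root test for monic polynomials). A sketch (names ours; a nonzero constant term is assumed):

```python
# Rational roots of x^n + c[n-1]x^(n-1) + ... + c[0]; by integrality
# (Example 2.27(2)) every rational root is an integer dividing c[0].
from fractions import Fraction

def is_root(r, coeffs):
    """Evaluate the monic polynomial at r by Horner's rule;
    coeffs lists c[0], ..., c[n-1] (low degree first)."""
    val = Fraction(1)                        # leading (monic) coefficient
    for c in reversed(coeffs):
        val = val * r + c
    return val == 0

def rational_roots_monic(coeffs):
    c0 = coeffs[0]                           # assumed nonzero
    candidates = set()
    for d in range(1, abs(c0) + 1):
        if c0 % d == 0:
            candidates.update({d, -d})
    return sorted(r for r in candidates if is_root(Fraction(r), coeffs))
```

For instance, x^2 – 3x + 2 has the integer roots 1 and 2, while x^2 – 2 has no rational root at all, reflecting that √2 is integral over Z but not rational.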

Now let A ⊆ B be an extension of rings and let C consist of all the elements of B that are integral over A. Clearly, A ⊆ C ⊆ B. It turns out that C is again a ring. This result is not at all immediate from the definition of integral elements. We prove it by using the following lemma, which generalizes Theorem 2.33.

Lemma 2.11.

For a ring extension A ⊆ B and for α ∈ B, the following conditions are equivalent:

  1. α is integral over A.

  2. A[α] is a finitely generated A-module.

  3. A[α] ⊆ C for some subring C of B with C being a finitely generated A-module.

Proof

[(a)⇒(b)] Let α^n + an–1α^(n–1) + · · · + a1α + a0 = 0, ai ∈ A, be an equation of integral dependence of α over A. A[α] is generated as an A-module by 1, α, α2, . . . . In order to show that the elements 1, α, . . . , α^(n–1) alone generate A[α] as an A-module, it is sufficient to show that each α^k, k ≥ n, is an A-linear combination of 1, α, . . . , α^(n–1). We proceed by induction on k. The assertion certainly holds for k = 0, . . . , n – 1, whereas for k ≥ n we write α^k = –(an–1α^(k–1) + · · · + a1α^(k–n+1) + a0α^(k–n)), whence induction completes the proof.

[(b)⇒(c)] Take C := A[α].

[(c)⇒(a)] Let generate C as an A-module. Since A[α] ⊆ C and, in particular, , for all i = 1, . . . , n we can write for some . Let denote the matrix (αδijaij)1≤i,jn, where δij is the Kronecker delta. Then . Multiplication (on the left) by the adjoint of shows that for all i = 1, . . . , n. Since , we have for some , so that (det ) · 1 = 0, that is, det . But det is a monic polynomial in α of degree n and with coefficients from A.

Proposition 2.42.

For an extension A ⊆ B of rings, the set

C := {α ∈ B | α is integral over A}

is a subring of B containing A.

Proof

Clearly, A ⊆ C ⊆ B as sets. To show that C is a ring, let α, β ∈ C. By Condition (b) of Lemma 2.11, A[α] is a finitely generated A-module. Now β, being integral over A, is also integral over A[α]; so again by Lemma 2.11(b), A[α][β] is a finitely generated A[α]-module. It is then easy to check that A[α, β] = A[α][β] is a finitely generated A-module. Since α ± β and αβ are in A[α, β], by Lemma 2.11(c), these elements are integral over A, that is, belong to C. Thus C is a ring.

Definition 2.96.

The ring C of Proposition 2.42 is called the integral closure of A in B. A is called integrally closed in B, if C = A. On the other hand, if C = B, we say that B is an integral extension of A or that B is integral over A.

An integral domain A is called integrally closed (without specific mention of the ring in which it is so), if A is integrally closed in its quotient field Q(A). An integrally closed integral domain is called a normal domain (ND).

Example 2.28.
  1. ℤ (or more generally any UFD) is a normal domain.

  2. ℤ is not integrally closed in ℝ or ℂ, since, for example, (1 + √5)/2 is integral over ℤ (being a root of X2 – X – 1). The elements of the integral closure of ℤ in ℂ are called algebraic integers (See Exercise 2.60).
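As a quick illustrative check that the golden ratio (1 + √5)/2 is an algebraic integer although it lies outside ℤ:

```python
# (1 + sqrt(5))/2 is a root of the monic integer polynomial X^2 - X - 1,
# hence integral over Z -- yet it is not itself a rational integer.
phi = (1 + 5 ** 0.5) / 2
assert abs(phi ** 2 - phi - 1) < 1e-9   # satisfies a monic relation over Z
assert phi != int(phi)                  # ... but phi is not in Z
print("phi =", phi)
```
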

Noetherian rings

Recall that a PID is a ring (integral domain) in which every ideal is principal, that is, generated by a single element. We now want to be a bit more general and demand only that every ideal be finitely generated. If a ring meets our demand, we call it a Noetherian ring. These rings are named after Emmy Noether (1882–1935), one of the most celebrated mathematicians of all time, whose work on such rings was fundamental and deep for the development of algebra. Emmy’s father Max Noether (1844–1921) was also an eminent mathematician.

Definition 2.97.

Let A be a ring and let 𝔞1 ⊆ 𝔞2 ⊆ 𝔞3 ⊆ · · · be an ascending chain of ideals of A. This chain is called stationary, if there is an m ∈ ℕ such that 𝔞n = 𝔞m for all n ≥ m. The ring A is said to satisfy the ascending chain condition or the ACC, if every ascending chain of ideals in A is stationary, or in other words, if there does not exist any infinite strictly ascending chain of ideals in A.

Proposition 2.43.

For a ring A, the following conditions are equivalent:

  1. Every ideal of A is finitely generated.

  2. A satisfies the ascending chain condition.

  3. Every non-empty set of ideals of A contains a maximal element.

Proof

[(a)⇒(b)] Let 𝔞1 ⊆ 𝔞2 ⊆ · · · be an ascending chain of ideals of A. Consider the ideal 𝔞 := ∪n 𝔞n, which is finitely generated by hypothesis. Let a1, . . . , ar be a set of generators of 𝔞. Each ai ∈ 𝔞, that is, there exists mi with ai ∈ 𝔞mi and hence ai ∈ 𝔞n for every n ≥ mi. Take m := max(m1, . . . , mr). For every n ≥ m, we have 𝔞 ⊆ 𝔞n ⊆ 𝔞, that is, 𝔞n = 𝔞.

[(b)⇒(c)] Let S be a non-empty set of ideals of A. Order S by inclusion. The ACC implies that every chain in S has an upper bound in S. By Zorn’s lemma, S has a maximal element.

[(c)⇒(a)] Let 𝔞 be an ideal of A. Consider the set S of all finitely generated ideals of A contained in 𝔞. S is non-empty, since it contains the zero ideal. By Condition (c), S has a maximal element, say, 𝔟. If 𝔟 ≠ 𝔞, take a ∈ 𝔞 \ 𝔟. Then 𝔟 + 〈a〉 is finitely generated (since 𝔟 is so), properly contains 𝔟 and is contained in 𝔞. This contradicts the maximality of 𝔟 in S. Thus we must have 𝔟 = 𝔞, that is, 𝔞 is finitely generated.

Definition 2.98.

A ring A is called Noetherian, if A satisfies (one and hence all of) the equivalent conditions of Proposition 2.43.

Example 2.29.
  1. All PIDs are Noetherian, since principal ideals are obviously finitely generated. In particular, ℤ and K[X] (K a field) are Noetherian.

  2. If A is Noetherian and 𝔞 an ideal of A, then A/𝔞 is Noetherian, since the ideals of A/𝔞 are in one-to-one inclusion-preserving correspondence with the ideals of A containing 𝔞 and hence satisfy the ACC.

  3. Let A be a Noetherian ring and S a multiplicative subset of A. Then the localization B := S–1A is also Noetherian. To prove this fact let 𝔟 be an ideal in B. One can show that 𝔟 = S–1𝔞 for some ideal 𝔞 of A. Since A is Noetherian, 𝔞 is finitely generated, say, 𝔞 = 〈a1, . . . , ar〉. It is now (almost) obvious that 𝔟 is generated by a1/1, . . . , ar/1. A particular case: If A is Noetherian and 𝔭 a prime ideal of A, then the localization A𝔭 is also Noetherian.

  4. The ring of polynomials with infinitely many indeterminates X1, X2, X3, . . . is not Noetherian. This is because the ideal

    〈X1, X2, X3, . . .〉 = AX1 + AX2 + AX3 + · · ·

    is not finitely generated, or alternatively because we have the infinite strictly ascending chain of ideals 〈X1〉 ⊊ 〈X1, X2〉 ⊊ 〈X1, X2, X3〉 ⊊ · · ·, or because the set S := {〈X1〉, 〈X1, X2〉, 〈X1, X2, X3〉, . . .} of ideals in A does not contain a maximal element.

We have seen that if A is a PID, the polynomial ring A[X] need not be a PID. However, the property of being Noetherian is preserved during the passage from A to A[X] (Theorem 2.8).

Dedekind domains

A class of rings proves to be vital in the study of number fields:

Definition 2.99.

An integral domain A is called a Dedekind domain, if it satisfies all of the following three conditions:

  1. A is Noetherian.

  2. Every non-zero prime ideal of A is maximal.

  3. A is integrally closed (in its quotient field K := Q(A)).

2.13.2. Number Fields and Rings

After much ado we are finally in a position to define the basic objects of study in this section.

Definition 2.100.

A number field K is defined to be a finite (and hence algebraic) extension of the field ℚ of rational numbers. Clearly, ℚ ⊆ K. The extension degree d := [K : ℚ] is called the degree of the number field K and is finite by definition.

Note that there is considerable disagreement among mathematicians over this definition of number fields. Some insist that any field K satisfying ℚ ⊆ K ⊆ ℂ should be called a number field. Some others restrict the definition by demanding that K must be algebraic over ℚ; however, fields K with infinite extension degree are allowed. We restrict the definition further by imposing the condition that [K : ℚ] has to be finite. Our restricted definition is seemingly the most widely accepted one. In this book, we study only the number fields of Definition 2.100, and accepting this definition at the minimum saves us from writing huge expressions like “(algebraic) number fields of finite extension degree over ℚ” to denote number fields.

For number fields, the notion of integral closure leads to the following definition.

Definition 2.101.

A number field K contains ℚ and hence ℤ. The integral closure of ℤ in K is called the ring of integers of K and is denoted by 𝔒K. (𝔒 is the Gothic O.) Clearly, ℤ ⊆ 𝔒K, and 𝔒K is an integral domain. The elements of 𝔒K are precisely the algebraic integers lying in K. A number ring is a ring which is (isomorphic to) the ring of integers of a number field.

By Example 2.27(2), the ring of integers of the number field ℚ is ℤ, that is, 𝔒ℚ = ℤ. It is, therefore, customary to call the elements of ℤ rational integers. Since ℤ is naturally embedded in 𝔒K for any number field K, it is important to notice the distinction between the integers of K (that is, the elements of 𝔒K) and the rational integers in K (that is, the images of the canonical inclusion ℤ ↪ 𝔒K).

Some simple properties of number rings are listed below.

Proposition 2.44.

For a number field K, we have:

  1. 𝔒K ∩ ℚ = ℤ.

  2. For α ∈ K, there exists a non-zero rational integer r such that rα ∈ 𝔒K. In particular, the quotient field of 𝔒K is K.

  3. 𝔒K is integrally closed in K, that is, 𝔒K is a normal domain.

Proof

(1) follows immediately from Example 2.27(2), (2) follows from Exercise 2.60, and (3) follows from Exercise 2.126(b).

Let K be a number field of degree d. By Corollary 2.13, K is a simple extension of ℚ, that is, there exists an element α ∈ K with a minimal polynomial f(X) over ℚ such that deg f = d and K = ℚ(α). The field K is a ℚ-vector space of dimension d with basis 1, α, . . . , αd–1. There exists a non-zero integer a such that aα is an algebraic integer, and we continue to have K = ℚ(aα). Thus, without loss of generality, we may take α to be an algebraic integer. In this case, the ℚ-basis 1, α, . . . , αd–1 of K consists only of algebraic integers.

Conversely, let f(X) ∈ ℚ[X] be an irreducible polynomial of degree d ≥ 1. The field K := ℚ[X]/〈f(X)〉 is a number field of degree d, and the elements of K can be represented by polynomials with rational coefficients and of degrees < d. Arithmetic in K is carried out as the polynomial arithmetic of ℚ[X] followed by reduction modulo the defining irreducible polynomial f(X). This gives us an algebraic representation of K independent of any element of K. Now, K can also be viewed as a subfield of ℂ and the elements of K can be represented as complex numbers.[16] A representation of K by a subfield K′ of ℂ together with a field isomorphism σ : K → K′ is called a complex embedding of K in ℂ.[17] Such a representation is not unique, as Proposition 2.45 demonstrates.

[16] A complex number a + ib has a representation by a pair (a, b) of real numbers. Here, i plays the role of X + 〈X2 + 1〉 in ℝ[X]/〈X2 + 1〉. Finally, every real number has a decimal (or binary or hexadecimal or . . .) representation.

[17] The field ℚ is canonically embedded in K. It is evident that the embedding σ : K → K′ fixes ℚ element-wise.

Proposition 2.45.

A number field K of degree d ≥ 1 has exactly d distinct complex embeddings.

Proof

As above we take K := ℚ[X]/〈f(X)〉 for some irreducible polynomial f(X) ∈ ℚ[X] of degree d. Since ℚ is a perfect field (See Exercise 2.76), the d roots α1, . . . , αd ∈ ℂ of f(X) are all distinct. For each i = 1, . . . , d, the map sending X + 〈f(X)〉 ↦ αi clearly extends to a field isomorphism σi : K → ℚ(αi). Thus we get d distinct complex embeddings of K in ℂ. Now let K′ be a subfield of ℂ, such that σ : K → K′ is a ℚ-isomorphism. Let α := σ(X + 〈f(X)〉). Then 0 = σ(0) = σ(f(X + 〈f(X)〉)) = f(σ(X + 〈f(X)〉)) = f(α). Thus α is a root of f, that is, α = αi for some i. Since K′ is a field containing ℚ and αi and having [K′ : ℚ] = d, it follows that K′ = ℚ(αi) and σ = σi.
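The embeddings of Proposition 2.45 can be exhibited numerically. A small sketch, using numpy’s root finder on the sample polynomial X3 – 2 (an illustrative choice, not a polynomial fixed by the text):

```python
import numpy as np

# The complex embeddings of K = Q[X]/<f(X)> send x := X + <f(X)> to the
# d distinct complex roots of f.  Illustration with f(X) = X^3 - 2:
f = [1, 0, 0, -2]                     # coefficients of X^3 - 2, degree d = 3
roots = np.roots(f)                   # images of x under the three embeddings
assert len(roots) == 3

# f is separable over Q, so the three embeddings are pairwise distinct:
for i in range(3):
    for j in range(i + 1, 3):
        assert abs(roots[i] - roots[j]) > 1e-6

# The i-th embedding maps a polynomial expression g(x) to g(root_i);
# e.g. the element x^2 + 1 has the three images below.
images = [r ** 2 + 1 for r in roots]
print(np.round(images, 6))
```
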

This proposition says that the conjugates α1, . . . , αd are algebraically indistinguishable. For example, X2 + 1 has the two roots ±i, where i := √–1. But it makes little sense to talk about the positive and the negative square roots of –1. They are algebraically indistinguishable, and if one calls one of these i, the other one becomes –i.[18] However, if a representation of ℂ is given, we can distinguish between √–5 and –√–5 by associating these quantities with the elements i√5 and –i√5 respectively, where √5 is the positive real square root of 5 and where i is the imaginary unit available from the given representation of ℂ.

[18] In a number theory seminar in 1996, Hendrik W. Lenstra, Jr. commented:

Suppose the Martians defined the complex numbers by adjoining a root of –1 they called j. And when the Earth and Martians start talking, they have to translate i to be either j or –j. So we take i to j, because I think that’s what the scientists will decide. ··· But it was later discovered that most Martians are left handed, so the philosophers decide it’s better to send i to –j instead.

It is also quite customary to start with K := ℚ(α) for some algebraic α ∈ ℂ and seek the complex embeddings of K in ℂ. One then considers the minimal polynomial f(X) of α (over ℚ) and proceeds as in the proof of Proposition 2.45, but now defining the map σi : ℚ(α) → ℚ(αi) as the unique field isomorphism that fixes ℚ and takes α ↦ αi. If we take α = α1, then σ1 is the identity map, whereas σ2, . . . , σd are non-identity field isomorphisms.

The moral of this story is that whether one wants to view the number field K as ℚ[X]/〈f(X)〉 or as ℚ(α) for any root α of f(X) is one’s personal choice. In any case, one will be dealing with the same mathematical object, and as long as representation issues are not brought into the picture, all these definitions of a number field are absolutely equivalent.

The embeddings need not be all distinct as sets. For example, the two images ℚ(i) and ℚ(–i) of ℚ[X]/〈X2 + 1〉 are identical as sets. But the maps x ↦ i and x ↦ –i are distinct (where x := X + 〈X2 + 1〉). Thus while specifying a complex embedding of a number field K, it is necessary to mention not only the subfield K′ of ℂ isomorphic to K, but also the explicit field isomorphism K → K′.

Definition 2.102.

Let K be a number field of degree d defined by an irreducible polynomial f(X) ∈ ℚ[X] or by any root of f(X). Let r1 be the number of real roots and 2r2 the number of non-real roots of f. (Note that the non-real roots of a real polynomial occur in (complex) conjugate pairs.) By the fundamental theorem of algebra, we have d = r1 + 2r2. For any real root α of f, the complex embedding ℚ(α) of K is completely contained in ℝ and hence is often called a real embedding of K. On the other hand, for a non-real root β of f the complex embedding ℚ(β) of K is called a non-real or a properly complex embedding of K. The pair (r1, r2) is called the signature of the number field K. K has r1 real embeddings and 2r2 properly complex embeddings. If r2 = 0, that is, if all embeddings of K are real, one calls K a totally real number field. On the other hand, if r1 = 0, that is, if all embeddings of K are properly complex, then K is called a totally complex number field.

Example 2.30.
  1. The number field ℚ(√2) is totally real and has the signature (2, 0). (The roots of X2 – 2 are ±√2.)

  2. The number field ℚ(√–2) is totally complex and has the signature (0, 1). (The roots of X2 + 2 are ±i√2.)

  3. The number field K := ℚ(∛2) is neither totally real nor totally complex. The roots of X3 – 2 are ∛2, ω∛2 and ω2∛2, where ω is a primitive complex cube root of unity. The signature of K is (1, 1), that is, K has one real embedding and two properly complex embeddings.
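The signatures in Example 2.30 can be computed by counting real and non-real roots. A minimal sketch, assuming numerically well-separated roots:

```python
import numpy as np

def signature(coeffs, tol=1e-8):
    """Signature (r1, r2) of the number field defined by the (assumed
    irreducible) rational polynomial with the given coefficient list."""
    roots = np.roots(coeffs)
    r1 = sum(1 for z in roots if abs(z.imag) < tol)   # real roots
    r2 = (len(roots) - r1) // 2                       # conjugate pairs
    return (r1, r2)

print(signature([1, 0, -2]))     # X^2 - 2  -> (2, 0): totally real
print(signature([1, 0, 2]))      # X^2 + 2  -> (0, 1): totally complex
print(signature([1, 0, 0, -2]))  # X^3 - 2  -> (1, 1)
```
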

The simplest examples of number fields are the quadratic number fields, that is, number fields of degree 2. Some special properties of quadratic number fields are covered in the exercises. It follows from Exercise 2.136 that every quadratic number field is of the form ℚ(√D) for some non-zero square-free integer D ≠ 1.

Now we investigate the ℤ-module structure of 𝔒K for a number field K of degree d. Let σ1, . . . , σd be the complex embeddings of K.

Definition 2.103.

For an element α ∈ K, we define the trace of α (over ℚ) as

Equation 2.15

Tr(α) := σ1(α) + σ2(α) + · · · + σd(α)

and the norm of α (over ℚ) as

N(α) := σ1(α)σ2(α) · · · σd(α).

If g(X) is the minimal polynomial of α over ℚ and r := deg g, then r|d. Moreover, each root of g(X) occurs exactly d/r times among σ1(α), . . . , σd(α). So Tr(α) and N(α), being symmetric functions of these roots, belong to ℚ. If α is an algebraic integer, then g(X) ∈ ℤ[X], that is, Tr(α), N(α) ∈ ℤ.

The following properties of the norm and trace functions can be readily verified. Here α, β ∈ K and c ∈ ℚ.

Tr(α + β) = Tr(α) + Tr(β),
N(αβ) = N(α)N(β),
Tr(cα) = c Tr(α),
N(cα) = cdN(α),
Tr(c) = dc,
N(c) = cd.
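These identities are easy to verify numerically in a concrete field. A sketch in K = ℚ(√2) (a sample choice), where the two embeddings send √2 to ±√2:

```python
# Trace and norm over Q: sum and product of the images of an element
# under the d complex embeddings.  In Q(sqrt(2)), d = 2 and the element
# a + b*sqrt(2) has images a + b*sqrt(2) and a - b*sqrt(2).
s = 2 ** 0.5

def embeddings(a, b):
    return (a + b * s, a - b * s)

def trace(a, b):
    return sum(embeddings(a, b))

def norm(a, b):
    x, y = embeddings(a, b)
    return x * y

# Tr(3 + 5*sqrt(2)) = 2*3 = 6 and N(3 + 5*sqrt(2)) = 3^2 - 2*5^2 = -41.
print(round(trace(3, 5)), round(norm(3, 5)))
```
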

Definition 2.104.

Let β1, . . . , βd ∈ K. We call the determinant of the matrix (Tr(βiβj))1≤i,j≤d, whose ij-th entry is equal to Tr(βiβj), the discriminant Δ(β1, . . . , βd) of β1, . . . , βd. Since each Tr(βiβj) ∈ ℚ, it follows that Δ(β1, . . . , βd) ∈ ℚ. Moreover, if β1, . . . , βd are all algebraic integers, then Δ(β1, . . . , βd) ∈ ℤ.

Proposition 2.46.

Δ(β1, . . . , βd) = (det(σj(βi)))2.

Proof

Consider the matrices D := (Tr(βiβj)) and E := (σj(βi)). By definition, we have Δ(β1, . . . , βd) = det D. We show that D = EEt, which implies that det D = (det E)2. The ij-th entry of EEt is

σ1(βi)σ1(βj) + · · · + σd(βi)σd(βj) = σ1(βiβj) + · · · + σd(βiβj) = Tr(βiβj),

where the last equality follows from Equation (2.15).

Let K = ℚ(α) for some α ∈ ℂ algebraic of degree d over ℚ and let f(X) be the minimal polynomial of α over ℚ. We define the discriminant of f as

Δ(f) := Δ(1, α, α2, ..., αd–1).

We have to show that the quantity Δ(f) is well-defined, that is, independent of the choice of the root α of f(X). Let α = α1, α2, . . . , αd be all the roots of f(X) and let the complex embedding σj of K map α to αj. By Proposition 2.46, we have Δ(f) = (det E)2, where E = (αji–1)1≤i,j≤d is a Vandermonde matrix. Computing the determinant of E gives Δ(f) = ∏1≤i<j≤d (αj – αi)2, which implies that Δ(f) is independent of the permutations of the conjugates α1, . . . , αd of α. Notice that since α1, . . . , αd are all distinct, Δ(f) ≠ 0.

Let us deduce a useful formula for Δ(f). Write f(X) = (X – α1)(X – α2) · · · (X – αd) and take the formal derivative to get f′(X) = Σj=1,...,d ∏i≠j (X – αi), that is, f′(αj) = ∏i≠j (αj – αi). Therefore, ∏j=1,...,d f′(αj) = ∏j ∏i≠j (αj – αi) = (–1)d(d–1)/2 ∏i<j (αj – αi)2, that is,

Equation 2.16

Δ(f) = (–1)d(d–1)/2 N(f′(α)),

since N(f′(α)) = σ1(f′(α)) · · · σd(f′(α)) = f′(α1) · · · f′(αd).
For arbitrary β1, . . . , βd ∈ K, the discriminant Δ(β1, . . . , βd) discriminates between the cases that β1, . . . , βd form a ℚ-basis of K and that they do not.
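Equation (2.16) can be checked numerically. A sketch for the sample polynomial X3 – 2, comparing the product of squared root differences with the norm-of-derivative formula:

```python
import numpy as np

# Check Delta(f) = prod_{i<j} (alpha_j - alpha_i)^2
#                = (-1)^{d(d-1)/2} * prod_j f'(alpha_j)
# for f(X) = X^3 - 2 (d = 3); the discriminant of X^3 - 2 is -108.
f = np.poly1d([1, 0, 0, -2])
roots = f.roots
d = 3

vandermonde = np.prod([(roots[j] - roots[i]) ** 2
                       for i in range(d) for j in range(i + 1, d)])
via_derivative = (-1) ** (d * (d - 1) // 2) * np.prod(f.deriv()(roots))

assert abs(vandermonde - via_derivative) < 1e-6
print(round(vandermonde.real))   # -> -108
```
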

Lemma 2.12.

Let β1, . . . , βd ∈ K and γ1, . . . , γd ∈ K satisfy γi = Σj=1,...,d tijβj for i = 1, . . . , d and for some tij ∈ ℚ. Then Δ(γ1, . . . , γd) = (det T)2Δ(β1, . . . , βd), where T = (tij).

Proof

Let E1 := (σj(βi)) and E2 := (σj(γi)). Now

σj(γi) = σj(Σk tikβk) = Σk tikσj(βk)

is the ij-th entry of the matrix T E1, that is, E2 = T E1. Hence

Δ(γ1, . . . , γd) = (det E2)2 = (det T)2(det E1)2 = (det T)2Δ(β1, . . . , βd).

Corollary 2.19.

Let (β1, . . . , βd) and (γ1, . . . , γd) be two ℚ-bases of K. Then Δ(γ1, . . . , γd) = (det T)2Δ(β1, . . . , βd), where T is the change-of-basis matrix from (β1, . . . , βd) to (γ1, . . . , γd).

Corollary 2.20.

β1, . . . , βd ∈ K form a ℚ-basis of K, if and only if Δ(β1, . . . , βd) ≠ 0.

Proof

Let K = ℚ(α) and let f(X) be the minimal polynomial of α over ℚ. Since 1, α, . . . , αd–1 is a ℚ-basis of K, each βi can be written (uniquely) as βi = Σj=1,...,d tijαj–1 with tij ∈ ℚ. By Lemma 2.12, Δ(β1, . . . , βd) = (det T)2Δ(1, α, . . . , αd–1) = (det T)2Δ(f), where T = (tij). We have seen that Δ(f) ≠ 0. Therefore, Δ(β1, . . . , βd) ≠ 0 if and only if det T ≠ 0, that is, if and only if β1, . . . , βd form a ℚ-basis of K.

Finally comes the desired characterization of 𝔒K.

Theorem 2.55.

For a number field K of degree d, the ring 𝔒K is a free ℤ-module of rank d.

Proof

Let β1, . . . , βd form a ℚ-basis of K. We know that for some non-zero r1, . . . , rd ∈ ℤ the elements r1β1, . . . , rdβd are in 𝔒K and continue to constitute a ℚ-basis of K. So we may assume that the elements β1, . . . , βd are already in 𝔒K. Consider the set S of all ℚ-bases (β1, . . . , βd) of K consisting of elements from 𝔒K only. By Definition 2.104 and Corollary 2.20, Δ(β1, . . . , βd) is a non-zero integer for every (β1, . . . , βd) ∈ S. Choose (β1, . . . , βd) ∈ S such that |Δ(β1, . . . , βd)| is minimal in S.

Claim: (β1, . . . , βd) is linearly independent over ℤ.

(β1, . . . , βd) is a ℚ-basis of K, that is, linearly independent over ℚ and so trivially over ℤ too.

Claim: (β1, . . . , βd) generates 𝔒K as a ℤ-module.

Assume not, that is, there exists α ∈ 𝔒K such that α = a1β1 + · · · + adβd with some ai ∉ ℤ. Without loss of generality, we may assume that a1 ∉ ℤ and write a1 = a + r with a ∈ ℤ and 0 < r < 1. Define γ1 := α – aβ1 = rβ1 + a2β2 + · · · + adβd, γ2 := β2, . . . , γd := βd. Clearly, γ1, . . . , γd ∈ 𝔒K. Furthermore, the change-of-basis matrix T expressing γ1, . . . , γd in terms of β1, . . . , βd is upper triangular with diagonal entries r, 1, . . . , 1, so that det T = r. By Lemma 2.12, we have

Δ(γ1, . . . , γd) = (det T)2Δ(β1, . . . , βd) = r2Δ(β1, . . . , βd).

Since r ≠ 0, Δ(γ1, . . . , γd) ≠ 0, that is, (γ1, . . . , γd) is again a ℚ-basis of K (Corollary 2.20), that is, (γ1, . . . , γd) ∈ S. Finally since r < 1, we have |Δ(γ1, . . . , γd)| < |Δ(β1, . . . , βd)|, a contradiction to the choice of (β1, . . . , βd). Thus every α ∈ 𝔒K has to be a ℤ-linear combination of β1, . . . , βd. This completes the proof of the second claim and also of the theorem.

Definition 2.105.

Any ℤ-basis of 𝔒K is called an integral basis of K (or of 𝔒K).

Corollary 2.21.

Every integral basis of K has the same discriminant (for a given K).

Proof

Let (β1, . . . , βd) and (γ1, . . . , γd) be two integral bases of K. Let T be the (β1, . . . , βd)-to-(γ1, . . . , γd) change-of-basis matrix. (β1, . . . , βd) being an integral basis of K, all the entries of T are integers. Also from Corollary 2.19 we have Δ(γ1, . . . , γd) = (det T)2Δ(β1, . . . , βd), and hence Δ(β1, . . . , βd) divides and has the same sign as Δ(γ1, . . . , γd). One can analogously show that Δ(γ1, . . . , γd) divides Δ(β1, . . . , βd). Therefore, Δ(β1, . . . , βd) = Δ(γ1, . . . , γd).

Definition 2.106.

Let (β1, . . . , βd) be an integral basis of a number field K. The discriminant of K is defined to be the integer ΔK := Δ(β1, . . . , βd). By Corollary 2.21, ΔK is well-defined, that is, independent of the choice of the integral basis of K.

Recall that K, as a vector space over ℚ, always possesses a ℚ-basis of the form 1, α, . . . , αd–1. 𝔒K, as a ℤ-module, is free of rank d, but a number field K need not possess an integral basis of the form 1, α, . . . , αd–1. Whenever it does, 𝔒K is called monogenic, and an integral basis 1, α, . . . , αd–1 of K is called a power integral basis. Clearly, if K has a power integral basis 1, α, . . . , αd–1, then 𝔒K = ℤ[α]. But the converse is not true, that is, for α ∈ 𝔒K with K = ℚ(α), the elements 1, α, . . . , αd–1 need not form an integral basis of K, even when 𝔒K is monogenic.

Example 2.31.

Consider the quadratic number field K := ℚ(√D) for some square-free integer D ≠ 0, 1. We consider the two cases (See Exercise 2.136):

Case 1: D ≡ 2, 3 (mod 4)

Here 𝔒K = ℤ[√D], that is, 1, √D is a power integral basis of K. The minimal polynomial of √D is X2 – D, and the conjugates of √D are ±√D. Therefore, by Equation (2.16), we have

ΔK = –N(2√D) = –(2√D)(–2√D) = 4D.
Case 2: D ≡ 1 (mod 4)

In this case, 𝔒K = ℤ[(1 + √D)/2], that is, 1, (1 + √D)/2 is a power integral basis of K. The minimal polynomial of (1 + √D)/2 is X2 – X + (1 – D)/4, and the conjugates of (1 + √D)/2 are (1 ± √D)/2. Therefore, Equation (2.16) gives

ΔK = –N(2 · (1 + √D)/2 – 1) = –N(√D) = –(√D)(–√D) = D.
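The case distinction of Example 2.31 can be packaged as a small helper (illustrative; the function name and its string output are our own):

```python
def quadratic_field_data(D):
    """Power-integral-basis generator and discriminant of Q(sqrt(D)),
    for a square-free integer D != 0, 1 (classical case distinction)."""
    assert D % 4 != 0            # square-free D is never 0 mod 4
    if D % 4 == 1:
        return ("(1 + sqrt(D))/2", D)      # O_K = Z[(1+sqrt(D))/2], Delta_K = D
    else:                                   # D = 2, 3 (mod 4)
        return ("sqrt(D)", 4 * D)           # O_K = Z[sqrt(D)], Delta_K = 4D

print(quadratic_field_data(5))    # -> ('(1 + sqrt(D))/2', 5)
print(quadratic_field_data(-1))   # Gaussian integers: ('sqrt(D)', -4)
```
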

2.13.3. Unique Factorization of Ideals

Ideals in a number ring possess a very rich structure. We prove that number rings are Dedekind domains (Definition 2.99). A Dedekind domain (henceforth abbreviated as DD) need not be a UFD (or a PID). However, it is a ring in which ideals admit unique factorizations into products of prime ideals.

Let K be a number field of degree d and 𝔒K its ring of integers. If f : A → B is a homomorphism of rings and if 𝔮 is a prime ideal of B, then the contraction 𝔭 := f–1(𝔮) is a prime ideal of A. We say that 𝔮 lies above or over 𝔭. If A ⊆ B and f is the inclusion homomorphism, then 𝔭 = 𝔮 ∩ A. For a number field K, we consider the natural inclusion ℤ ↪ 𝔒K.

Lemma 2.13.

Let 𝔮 be a non-zero prime ideal of 𝔒K. Then 𝔮 lies above a unique non-zero prime ideal 𝔭 = 𝔮 ∩ ℤ of ℤ. In particular, 𝔮 contains a (unique) rational prime.

Proof

Let 𝔭 := 𝔮 ∩ ℤ. If 𝔭 = 0, then both 𝔮 and 0 are prime ideals of 𝔒K that lie over the zero ideal of ℤ. Since 0 ⊊ 𝔮, we get 𝔮 = 0 by Exercise 2.128(c), a contradiction.

Proposition 2.47.

𝔒K is Noetherian.

Proof

Let α1, . . . , αd constitute an integral basis of K, that is, 𝔒K = ℤα1 + · · · + ℤαd, that is, the ring homomorphism ℤ[X1, . . . , Xd] → 𝔒K mapping f(X1, . . . , Xd) ↦ f(α1, . . . , αd) is surjective. By Hilbert’s basis theorem (Theorem 2.8), the polynomial ring ℤ[X1, . . . , Xd] is Noetherian, and so 𝔒K, being the quotient of a Noetherian ring (by the isomorphism theorem), is Noetherian too (Example 2.29).

Theorem 2.56.

The ring of integers 𝔒K of a number field K is a Dedekind domain.

Proof

We have proved that 𝔒K is Noetherian (Proposition 2.47) and integrally closed (Proposition 2.44). It then suffices to show that each non-zero prime ideal 𝔮 of 𝔒K is maximal. By Lemma 2.13, 𝔮 lies over a non-zero prime ideal pℤ of ℤ. But pℤ is maximal in ℤ. Exercise 2.128(b) now completes the proof.

Now we derive the unique factorization theorem for ideals in a DD. It is going to be a long story. We refer the reader to Definition 2.92 to recall how the product of two ideals is defined.

Lemma 2.14.

Let A be a ring, 𝔞1, . . . , 𝔞r ideals of A, and 𝔭 a prime ideal of A such that 𝔞1 · · · 𝔞r ⊆ 𝔭. Then 𝔞i ⊆ 𝔭 for some i. In particular, if A is a DD and 𝔭, 𝔭1, . . . , 𝔭r are non-zero prime ideals with 𝔭1 · · · 𝔭r ⊆ 𝔭, then 𝔭 = 𝔭i for some i.

Proof

The proof is obvious for r = 1. So assume that r > 1. If 𝔞i ⊄ 𝔭 for all i = 1, . . . , r, then for each i we can choose ai ∈ 𝔞i \ 𝔭 and see that a1 · · · ar ∈ 𝔞1 · · · 𝔞r ⊆ 𝔭, a contradiction to the fact that 𝔭 is prime. The last statement of the lemma follows from the fact that in a DD every non-zero prime ideal is maximal.

We now generalize the concept of ideals.

Definition 2.107.

Let A be an integral domain and K := Q(A). An A-submodule 𝔞 of K is called a fractional ideal of A, if b𝔞 ⊆ A for some non-zero b ∈ A.

Every ideal of A is evidently a fractional ideal of A and hence is often called an integral ideal of A. Conversely, every fractional ideal of A contained in A is an integral ideal of A. The principal fractional ideal Ax is the A-submodule of K generated by x ∈ K. If A is a Noetherian domain, we have the following equivalent characterization of fractional ideals.

Lemma 2.15.

Let A be a Noetherian integral domain, K := Q(A) and 𝔞 an A-submodule of K. Then 𝔞 is a fractional ideal of A, if and only if 𝔞 is a finitely generated A-submodule of K.

Proof

[if] Let 𝔞 = Ax1 + · · · + Axn, where xi = ai/bi, ai, bi ∈ A, bi ≠ 0. Then b𝔞 ⊆ A for b := b1 · · · bn.

[only if] Let b ∈ A, b ≠ 0, be such that b𝔞 ⊆ A. Now b𝔞 is an (integral) ideal of A (easy check) and is finitely generated, since A is Noetherian. Let b𝔞 = 〈c1, . . . , cn〉, ci ∈ A. Then 𝔞 = Ax1 + · · · + Axn, where xi := ci/b.

We define the product of two fractional ideals 𝔞, 𝔟 of an integral domain A as we did for integral ideals:

𝔞𝔟 := {a1b1 + · · · + anbn | n ∈ ℕ, ai ∈ 𝔞, bi ∈ 𝔟}.

It is easy to check that 𝔞𝔟 is again a fractional ideal of A. Let ℐA denote the set of non-zero fractional ideals of A. The product of fractional ideals defines a commutative and associative binary operation on ℐA. The ideal A acts as a (multiplicative) identity in ℐA. A fractional ideal 𝔞 of A is called invertible, if 𝔞𝔟 = A for some fractional ideal 𝔟 of A. We deduce shortly that if A is a DD, then every non-zero fractional ideal of A is invertible and, therefore, ℐA is a group under multiplication of fractional ideals.

Lemma 2.16.

Let A be a Noetherian domain and 𝔞 a non-zero (integral) ideal of A. For some r ∈ ℕ, there exist prime ideals 𝔭1, . . . , 𝔭r of A each containing 𝔞 such that 𝔭1 · · · 𝔭r ⊆ 𝔞.

Proof

Let S be the set of non-zero ideals of A for which the lemma does not hold. Assume that S ≠ ∅. Since A is Noetherian, S contains a maximal element, say 𝔞. Clearly, 𝔞 is a proper non-prime ideal of A, that is, for some a, b ∈ A \ 𝔞 we have ab ∈ 𝔞. The ideals 𝔞 + 〈a〉 and 𝔞 + 〈b〉 strictly contain 𝔞 and, therefore, by the maximality of 𝔞 are not in S, that is, there exist prime ideals 𝔭1, . . . , 𝔭s each containing 𝔞 + 〈a〉 (and hence 𝔞) such that 𝔭1 · · · 𝔭s ⊆ 𝔞 + 〈a〉 and prime ideals 𝔮1, . . . , 𝔮t each containing 𝔞 + 〈b〉 (and hence 𝔞) such that 𝔮1 · · · 𝔮t ⊆ 𝔞 + 〈b〉. Moreover, (𝔞 + 〈a〉)(𝔞 + 〈b〉) ⊆ 𝔞, since ab ∈ 𝔞, so that 𝔭1 · · · 𝔭s𝔮1 · · · 𝔮t ⊆ 𝔞, a contradiction. Thus S must be empty.

Note that the condition “each containing 𝔞” was necessary in Lemma 2.16 in order to rule out the trivial possibility that 𝔭i = 0 for some i.

Lemma 2.17.

Let A be a DD, K := Q(A) and 𝔭 a non-zero prime ideal of A. Define the set

𝔭–1 := {x ∈ K | x𝔭 ⊆ A}.

Then we have:

  1. 𝔭–1 is a fractional ideal of A.

  2. A ⊊ 𝔭–1.

  3. 𝔭𝔭–1 = A. In particular, every non-zero prime ideal in a DD is invertible.

Proof

  1. Clearly, 𝔭–1 is an A-submodule of K, and for any non-zero b ∈ 𝔭, we have b𝔭–1 ⊆ A.

  2. Since A𝔭 ⊆ 𝔭 ⊆ A, we have A ⊆ 𝔭–1. In order to prove the strict inclusion, we take any non-zero a ∈ 𝔭 and consider the ideal 〈a〉. By Lemma 2.16, there exist prime ideals 𝔭1, . . . , 𝔭r each containing 〈a〉 (and hence non-zero) such that 𝔭1 · · · 𝔭r ⊆ 〈a〉. We choose r to be minimal, so that 〈a〉 does not contain the product of any r – 1 of 𝔭1, . . . , 𝔭r. Now 𝔭1 · · · 𝔭r ⊆ 〈a〉 ⊆ 𝔭 and hence by Lemma 2.14 𝔭 = 𝔭i for some i, say, i = r. By the minimality of r, we can choose b ∈ 𝔭1 · · · 𝔭r–1 \ 〈a〉. Since b ∉ 〈a〉, we have b/a ∉ A. On the other hand, b𝔭 ⊆ 𝔭1 · · · 𝔭r ⊆ 〈a〉 and so (b/a)𝔭 ⊆ A, so that b/a ∈ 𝔭–1, that is, A ⊊ 𝔭–1.

  3. By the definition of 𝔭–1, it follows that 𝔭𝔭–1 is contained in A and hence is an integral ideal of A. Since 1 ∈ 𝔭–1, it follows that 𝔭 ⊆ 𝔭𝔭–1. Since 𝔭 is a maximal ideal, we then have 𝔭𝔭–1 = 𝔭 or 𝔭𝔭–1 = A. Assume that 𝔭𝔭–1 = 𝔭. We claim that this assumption implies that 𝔭–1 = A, a contradiction to Part (2). So we must have 𝔭𝔭–1 = A. For proving the claim, let b ∈ 𝔭–1 and choose a non-zero a ∈ 𝔭. Then we have ab ∈ 𝔭–1𝔭 = 𝔭 and, therefore, ab2 = (ab)b ∈ 𝔭–1𝔭 = 𝔭, and so on. For each n ∈ ℕ, define the ideal 𝔞n := 〈a, ab, ab2, . . . , abn〉. Then 𝔞0 ⊆ 𝔞1 ⊆ 𝔞2 ⊆ · · · is an ascending chain of ideals in A. Since A is Noetherian, the chain must be stationary, that is, for some n we have 𝔞n+1 = 𝔞n, that is, abn+1 ∈ 𝔞n, that is, abn+1 = c0a + c1ab + · · · + cnabn with ci ∈ A. Since A is an integral domain and a ≠ 0, we see that bn+1 – cnbn – · · · – c1b – c0 = 0, that is, b is integral over A. Since A is integrally closed, b ∈ A. Therefore, 𝔭–1 = A, as claimed.

Theorem 2.57.

Every non-zero ideal 𝔞 in a DD A can be represented as a product of prime ideals of A. Moreover, such a factorization of 𝔞 is unique up to permutations of the factors.

Proof

If 𝔞 = A, there is nothing to prove. So let 𝔞 be a proper non-zero ideal of A. We first show that if 𝔞 contains a product of non-zero prime ideals, then 𝔞 is a product of prime ideals. By Lemma 2.16, we have prime ideals 𝔭1, . . . , 𝔭r, each containing 𝔞, such that 𝔭1 · · · 𝔭r ⊆ 𝔞. Let us choose r to be minimal and proceed by induction on r. If r = 1, then 𝔭1 ⊆ 𝔞 ⊆ 𝔭1, that is, 𝔞 = 𝔭1 is already prime. So take r > 1 and assume that if an ideal 𝔟 of A contains a product of r – 1 or fewer non-zero prime ideals of A, then 𝔟 is a product of prime ideals. Let 𝔭 be a maximal ideal containing 𝔞. We then have 𝔭1 · · · 𝔭r ⊆ 𝔞 ⊆ 𝔭 and by Lemma 2.14 𝔭 = 𝔭i for some i, say, i = r. Now, consider the fractional ideal 𝔞𝔭–1. Then 𝔞𝔭–1 ⊆ 𝔭𝔭–1 = A, and so 𝔞𝔭–1 is an integral ideal of A. Furthermore 𝔭1 · · · 𝔭r–1 = 𝔭1 · · · 𝔭r𝔭–1 ⊆ 𝔞𝔭–1, that is, 𝔞𝔭–1 contains a product of r – 1 non-zero prime ideals. By the induction hypothesis, 𝔞𝔭–1 is a product of prime ideals, say, 𝔞𝔭–1 = 𝔮1 · · · 𝔮s. But then 𝔞 = 𝔮1 · · · 𝔮s𝔭 is also a product of prime ideals.

In order to prove the uniqueness of this product, let 𝔭1 · · · 𝔭r = 𝔮1 · · · 𝔮s with prime ideals 𝔭i and 𝔮j. Now 𝔮1 · · · 𝔮s ⊆ 𝔭1 and by Lemma 2.14 𝔭1 = 𝔮j for some j, say, j = 1. Then 𝔭2 · · · 𝔭r = 𝔮2 · · · 𝔮s, since 𝔭1 = 𝔮1 is invertible. Proceeding in this way shows the desired uniqueness.

In the factorization of a non-zero ideal of a DD, we do not rule out the possibility of repeated occurrences of factors. Taking this into account shows that every non-zero ideal 𝔞 in a DD A admits a unique factorization

𝔞 = 𝔭1e1 · · · 𝔭rer

with pairwise distinct non-zero prime ideals 𝔭i and with exponents ei ∈ ℕ. Here uniqueness is up to permutations of the indexes 1, . . . , r. This factorization can be extended to fractional ideals, but this time we have to allow non-positive exponents. First note that for integers e1, . . . , er and non-zero prime ideals 𝔭1, . . . , 𝔭r of A the product 𝔭1e1 · · · 𝔭rer is well-defined and is a fractional ideal of A. The converse is proved in the following corollary.

Corollary 2.22.

Every non-zero fractional ideal 𝔞 of a DD A admits a unique factorization of the form 𝔞 = 𝔭1e1 · · · 𝔭rer with pairwise distinct non-zero prime ideals 𝔭i of A and with exponents ei ∈ ℤ. Moreover, for such a fractional ideal we have 𝔞–1 = 𝔭1–e1 · · · 𝔭r–er.

Proof

By definition, there exists a non-zero b ∈ A such that b𝔞 ⊆ A. But then 𝔟 := b𝔞 = 〈b〉𝔞 is an integral ideal of A. We write 𝔟 = 𝔭1f1 · · · 𝔭rfr and 〈b〉 = 𝔭1g1 · · · 𝔭rgr with fi, gi ≥ 0. Since each non-zero prime ideal is invertible (Lemma 2.17(3)), it follows that 𝔞 = 𝔭1f1–g1 · · · 𝔭rfr–gr. This proves the existence of a factorization of 𝔞. The proof for the uniqueness is left to the reader as an easy exercise. The last assertion follows from a repeated use of Lemma 2.17(3).

For e ∈ ℕ, the fractional ideal (𝔭–1)e appearing in Corollary 2.22 is denoted by 𝔭–e. We have 𝔭e𝔭–e = A. One can easily verify that 𝔭–e defined as above is equal to the set

{x ∈ K | x𝔭e ⊆ A}.

In fact, one can use the last equality as the definition for 𝔭–e.

To sum up, every non-zero fractional ideal of a DD A is invertible and the set ℐA of all non-zero fractional ideals of A is a group. The unit ideal A acts as the identity in ℐA.

As in every group, we have the cancellation law(s) in ℐA.

Corollary 2.23.

Let A be a DD and 𝔞, 𝔟, 𝔠 fractional ideals of A with 𝔠 ≠ 0. If 𝔞𝔠 = 𝔟𝔠, then 𝔞 = 𝔟.

In view of the unique factorization of ideals in A, we can speak of the divisibility of integral ideals in A. Let 𝔞 and 𝔟 be two integral ideals of A. We say that 𝔟 divides 𝔞 and write 𝔟|𝔞, if 𝔞 = 𝔟𝔠 for some integral ideal 𝔠 of A. We now show that the condition 𝔟|𝔞 is equivalent to the condition 𝔞 ⊆ 𝔟. Thus for ideals in a DD the term divides is synonymous with contains.

Corollary 2.24.

Let 𝔞 and 𝔟 be integral ideals of a DD A. Then 𝔟|𝔞 if and only if 𝔞 ⊆ 𝔟.

Proof

[if] If 𝔞 ⊆ 𝔟, we have 𝔞𝔟–1 ⊆ 𝔟𝔟–1 = A, that is, 𝔠 := 𝔞𝔟–1 is an integral ideal of A.

Also 𝔞 = 𝔟𝔠, that is, 𝔟|𝔞.

[only if] If 𝔞 = 𝔟𝔠 for some integral ideal 𝔠, we have 𝔞 = 𝔟𝔠 ⊆ 𝔟A = 𝔟.

Corollary 2.25.

Let 𝔞 = 𝔭1e1 · · · 𝔭rer and 𝔟 = 𝔭1f1 · · · 𝔭rfr with ei, fi ≥ 0 be the prime decompositions of two non-zero integral ideals of a DD A. Then 𝔞|𝔟 if and only if ei ≤ fi for all i = 1, . . . , r.

Proof

[if] We have 𝔟 = 𝔞𝔠, where 𝔠 := 𝔭1f1–e1 · · · 𝔭rfr–er is an integral ideal of A.

[only if] Let 𝔟 = 𝔞𝔠 for some integral ideal 𝔠 of A. Clearly, 𝔠 ≠ 0, and we can write the prime decomposition 𝔠 = 𝔭1l1 · · · 𝔭rlr𝔭r+1lr+1 · · · 𝔭r+slr+s with li ≥ 0. We have 𝔭1f1 · · · 𝔭rfr = 𝔭1e1+l1 · · · 𝔭rer+lr𝔭r+1lr+1 · · · 𝔭r+slr+s. By unique factorization, we have f1 = e1 + l1, . . . , fr = er + lr and lr+1 = · · · = lr+s = 0, that is, ei ≤ fi for all i.
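With ideals encoded by their exponent vectors over a fixed list of primes, divides-is-contains turns divisibility into a componentwise comparison; likewise the sum 𝔞 + 𝔟 and the intersection 𝔞 ∩ 𝔟 become componentwise min and max (the ideal gcd and lcm, a standard consequence not spelled out in the text). A sketch:

```python
# Ideals in a DD, given as exponent vectors over a common list of
# non-zero prime ideals p1, ..., pr (Corollary 2.25).
def divides(e, f):
    """Does the ideal with exponents e divide the one with exponents f?"""
    return all(ei <= fi for ei, fi in zip(e, f))

def ideal_sum(e, f):        # a + b: the "gcd" of the two ideals
    return [min(ei, fi) for ei, fi in zip(e, f)]

def ideal_intersection(e, f):   # a ∩ b: the "lcm" of the two ideals
    return [max(ei, fi) for ei, fi in zip(e, f)]

print(divides([1, 0, 2], [1, 1, 2]))                  # -> True
print(ideal_sum([1, 0, 2], [1, 1, 2]))                # -> [1, 0, 2]
print(ideal_intersection([1, 0, 2], [1, 1, 2]))       # -> [1, 1, 2]
```
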

As we pass from ℤ to 𝔒K, the notion of unique factorization passes from the element level to the ideal level. If a DD is already a PID, these two concepts are equivalent. (Non-zero prime ideals in a PID are generated by prime elements.) Though a UFD need not be a PID in general, we have the following result for a DD.

Proposition 2.48.

A Dedekind domain A is a UFD, if and only if A is a PID.

Proof

[if] Every PID is a UFD (Theorem 2.11).

[only if] Let A be a UFD. In order to show that A is a PID, it suffices (in view of Theorem 2.57) to show that every non-zero prime ideal 𝔭 of A is a principal ideal. Choose any non-zero a ∈ 𝔭. Then 〈a〉 ⊆ 𝔭. Now a is a non-unit in A (since otherwise we would have 𝔭 = A) and A is assumed to be a UFD. Thus we can write a = uq1 · · · qr for a unit u and for prime elements qi in A. Clearly, each 〈qi〉 is a non-zero prime ideal of A and 〈a〉 = 〈q1〉 · · · 〈qr〉. Therefore, 〈q1〉 · · · 〈qr〉 ⊆ 𝔭 and hence by Lemma 2.14 𝔭 = 〈qi〉 for some i.

In the rest of this section, we abbreviate 𝔒K as 𝔒, if K is implicit in the context.

2.13.4. Norms of Ideals

We have seen that the ring 𝔒 is a free ℤ-module of rank d. The same result holds for every non-zero ideal 𝔞 of 𝔒. Let β1, . . . , βd constitute an integral basis of K.

One can choose rational integers aij with each aii positive such that

Equation 2.17

γ1 = a11β1,
γ2 = a21β1 + a22β2,
. . .
γd = ad1β1 + ad2β2 + · · · + addβd

constitute a ℤ-basis of 𝔞. Moreover, the discriminant Δ(γ1, . . . , γd) is independent of the choice of an integral basis γ1, . . . , γd of 𝔞 and is called the discriminant of 𝔞, denoted Δ(𝔞). It follows that 𝔞 can be generated as an ideal (that is, as an 𝔒-module) by at most d elements. We omit the proof of the following tighter result.

Proposition 2.49.

Every (integral) ideal in a DD A is generated by (at most) two elements. More precisely, for a proper non-zero ideal 𝔞 of A and for any non-zero a ∈ 𝔞 there exists b ∈ 𝔞 with 𝔞 = 〈a, b〉.

Definition 2.108.

The norm of a non-zero ideal 𝔞 of 𝔒 is defined as N(𝔞) := |𝔒/𝔞|, the cardinality of the quotient ring 𝔒/𝔞. It is customary to define the norm of the zero ideal as zero.

Using the integers aij of Equation (2.17), we can write

Equation 2.18

N(𝔞) = a11a22 · · · add = √|Δ(𝔞)/ΔK|.
Corollary 2.26.

For every non-zero ideal 𝔞 of 𝔒, the quotient ring 𝔒/𝔞 is a finite ring. In particular, if 𝔭 is a non-zero prime (hence maximal) ideal of 𝔒, then 𝔒/𝔭 is a finite field.

It is tempting to define the norm of an element α ∈ 𝔒 to be the norm of the principal ideal 〈α〉. It turns out that this new definition is (almost) the same as the old definition of N(α). More precisely:

Proposition 2.50.

For any element α ∈ 𝔒, we have N(〈α〉) = |N(α)|.

Proof

The result is obvious for α = 0. So assume that α ≠ 0 and call 𝔞 := 〈α〉. Let β1, . . . , βd be an integral basis of 𝔒. It is an easy check that αβ1, . . . , αβd is an integral basis of 𝔞. Let σ1, . . . , σd be the complex embeddings of K. Then Δ(𝔞) = Δ(αβ1, . . . , αβd) is the square of the determinant of the matrix

(σj(αβi)) = (σj(α)σj(βi)).

It follows that Δ(𝔞) = (σ1(α) · · · σd(α))2Δ(β1, . . . , βd) = N(α)2ΔK. Equation (2.18) now completes the proof.

Corollary 2.27.

For any non-zero a ∈ ℤ, we have N(〈a〉) = |a|d.

Like the norm of elements, the norm of ideals is also multiplicative. We omit the (not-so-difficult) proof here.

Proposition 2.51.

Let 𝔞 and 𝔟 be ideals in 𝔒. Then, N(𝔞𝔟) = N(𝔞)N(𝔟).

The following immediate corollary often comes in handy.

Corollary 2.28.

Let 𝔞 and 𝔟 be non-zero ideals of 𝔒. If 𝔞 = 𝔭1e1 · · · 𝔭rer is the factorization of 𝔞, then N(𝔞) = N(𝔭1)e1 · · · N(𝔭r)er. In particular, if 𝔟|𝔞, then N(𝔟)|N(𝔞) (in ℤ).

2.13.5. Rational Primes in Number Rings

The behaviour of rational primes in number rings is an interesting topic of study in algebraic number theory. Let K be a number field of degree d and 𝔒 := 𝔒K. Consider a rational prime p and denote by 〈p〉 the ideal generated by p in 𝔒. We use the symbol pℤ to denote the (prime) ideal of ℤ generated by p. Further let

Equation 2.19

〈p〉 = 𝔭1e1 · · · 𝔭rer

be the prime factorization of 〈p〉 with r ≥ 1, with pairwise distinct non-zero prime ideals 𝔭i of 𝔒 and with ei ≥ 1. For each i, we have 〈p〉 ⊆ 𝔭i, that is, p ∈ 𝔭i, that is, 𝔭i ∩ ℤ = pℤ (Lemma 2.13), that is, 𝔭i lies over pℤ. Conversely, if 𝔮 is a prime ideal of 𝔒 lying over pℤ, then p ∈ 𝔮, that is, 〈p〉 ⊆ 𝔮, that is, 𝔮|〈p〉, that is, 𝔮 = 𝔭i for some i. Thus, 𝔭1, . . . , 𝔭r are precisely all the prime ideals of 𝔒 that lie over pℤ.

By Corollary 2.27, N(〈p〉) = p^d. By Corollary 2.28, each N(𝔭i) divides p^d and is again a power p^di of p.

Definition 2.109.

We define the ramification index of 𝔭i over p (or pℤ) as ei. This is the largest e such that 𝔭i^e divides (that is, contains) 〈p〉. The integer di (where N(𝔭i) = p^di) is called the inertial degree of 𝔭i over p.

By the multiplicative property of norms, we have p^d = N(〈p〉) = N(𝔭1)^e1 · · · N(𝔭r)^er = p^(e1d1 + · · · + erdr), that is, e1d1 + · · · + erdr = d.

Definition 2.110.

If r = d, so that each ei = di = 1, we say that the prime p (or pℤ) splits completely in 𝒪K. On the other extreme, if r = 1, e1 = 1, d1 = d, then 〈p〉 is prime in 𝒪K and we say that p is inert in 𝒪K. Finally, if ei > 1 for some i, we say that the prime p ramifies in 𝒪K. If r = 1 and e1 = d (so that d1 = 1), then the prime p is said to be totally ramified in 𝒪K.

The following important result is due to Dedekind. Its proof is long and complicated and is omitted here.

Theorem 2.58.

A rational prime p ramifies in 𝒪K if and only if p divides the discriminant ΔK. In particular, there are only finitely many rational primes that ramify in 𝒪K.

Though this is not the case in general, let us assume that the ring 𝒪K is monogenic (that is, 𝒪K = ℤ[α] for some α ∈ 𝒪K) and try to compute the explicit factorization (Equation (2.19)) of 〈p〉 in 𝒪K. Let f(X) ∈ ℤ[X] be the minimal polynomial of α. We then have 𝒪K = ℤ[α] ≅ ℤ[X]/〈f(X)〉.

Let us agree to write the canonical image of any polynomial g(X) ∈ ℤ[X] in 𝔽p[X] as ḡ(X). We write the factorization of f̄(X) as

f̄(X) = f̄1(X)^e1 · · · f̄r(X)^er

with ei ≥ 1 and with pairwise distinct irreducible polynomials f̄i(X) ∈ 𝔽p[X]. If di := deg f̄i(X), then e1d1 + · · · + erdr = deg f̄(X) = d. For each i = 1, . . . , r choose fi(X) ∈ ℤ[X] whose reduction modulo p is f̄i(X). Define the ideals

𝔭i := 〈p, fi(α)〉, i = 1, . . . , r,

of 𝒪K. Since 𝒪K ≅ ℤ[X]/〈f(X)〉, we have

𝒪K/𝔭i ≅ ℤ[X]/〈p, f(X), fi(X)〉 ≅ 𝔽p[X]/〈f̄i(X)〉

and

N(𝔭i) = p^di.

Therefore, 𝔭1, . . . , 𝔭r are non-zero prime ideals of 𝒪K with N(𝔭1^e1 · · · 𝔭r^er) = p^(e1d1 + · · · + erdr) = p^d = N(〈p〉). On the other hand, 𝔭1^e1 · · · 𝔭r^er ⊆ 〈p, f1(α)^e1 · · · fr(α)^er〉 ⊆ 〈p〉, since f(α) = 0 and f1(X)^e1 · · · fr(X)^er ≡ f(X) (mod p). Thus we must have 〈p〉 = 𝔭1^e1 · · · 𝔭r^er, that is, we have obtained the desired factorization of 〈p〉.

Let us now concentrate on an example of this explicit factorization.

Example 2.32.

Let D ≠ 0, 1 be a square-free integer congruent to 2 or 3 modulo 4. If K = ℚ(√D), then 𝒪K = ℤ[√D] is monogenic. We take an odd rational prime p and compute the factorization of 〈p〉 in 𝒪K. We have to factorize modulo p the minimal polynomial f(X) := X² − D. We consider three cases separately based on the value of the Legendre symbol (D/p).

Case 1: (D/p) = 0

In this case, p | D, that is, X² − D ≡ X² (mod p). Then 〈p〉 = 𝔭², where 𝔭 = 〈p, √D〉. Thus p (totally) ramifies in 𝒪K.

Case 2: (D/p) = 1

Since p is assumed to be an odd prime, the two square roots of D modulo p are distinct. Let δ be an integer with δ² ≡ D (mod p). Then X² − D ≡ (X − δ)(X + δ) (mod p). In this case, 〈p〉 = 𝔭1𝔭2, where 𝔭1 = 〈p, √D − δ〉 and 𝔭2 = 〈p, √D + δ〉. Thus p splits (completely) in 𝒪K.

Case 3: (D/p) = −1

The polynomial X² − D is irreducible in 𝔽p[X] and hence 〈p〉 remains prime in 𝒪K, that is, p is inert in 𝒪K.

Thus the quadratic residuosity of D modulo p dictates the behaviour of p in 𝒪K.
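The three cases above can be checked numerically. The following Python sketch (the helper names are ours) computes the Legendre symbol by Euler's criterion and reports the behaviour of an odd prime p in ℤ[√D]:

```python
def legendre(D, p):
    """Legendre symbol (D/p) for an odd prime p, via Euler's criterion."""
    t = pow(D % p, (p - 1) // 2, p)
    return -1 if t == p - 1 else t

def splitting_type(D, p):
    """Behaviour of an odd prime p in Z[sqrt(D)] (D square-free, D = 2, 3 mod 4)."""
    s = legendre(D, p)
    if s == 0:
        return "ramified"   # <p> is the square of a prime ideal
    if s == 1:
        return "split"      # <p> is a product of two distinct prime ideals
    return "inert"          # <p> stays prime

# D = -1 (Gaussian integers): 5 = (2+i)(2-i) splits, while 7 stays inert.
print(splitting_type(-1, 5), splitting_type(-1, 7))
```

For instance, 2 is a quadratic residue modulo 7 (3² ≡ 2), so `splitting_type(2, 7)` reports that 7 splits in ℤ[√2].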

Let us finally look at the fate of the even prime 2 in 𝒪K. If D is even, then X² − D ≡ X² (mod 2), and if D is odd, then X² − D ≡ (X + 1)² (mod 2). In each case, 2 ramifies in 𝒪K.

Recall from Example 2.31 that ΔK = 4D. Thus we have a confirmation of the fact that a rational prime p ramifies in 𝒪K if and only if p | ΔK.

One can similarly study the behaviour of rational primes in

𝒪K = ℤ[(1 + √D)/2],

where D ≡ 1 (mod 4) is a square-free integer ≠ 0, 1.

2.13.6. Units in a Number Ring

There are just two units in ℤ, namely ±1. In a general number ring, there may be many more units. For example, all the units in the ring ℤ[i] of Gaussian integers are ±1, ±i. There may even be an infinite number of units in a number ring. It can be shown that ±(1 + √2)^n, n ∈ ℤ, are all the units of ℤ[√2]. (Note that for all n ≠ 0 the absolute values of (1 + √2)^n are different from 1.) ℤ[√2] is a PID. So we can think of factorizations in ℤ[√2] as element-wise factorizations. To start with, we fix a set of pairwise non-associate prime elements of ℤ[√2]. Every non-zero element of ℤ[√2] admits a factorization u p1^e1 · · · pk^ek for prime “representatives” pi and for a unit u of the form ±(1 + √2)^n. Thus, in order to complete the picture of factorization, we need machinery to handle the units in a number ring.

Let K be a number field of degree d and signature (r1, r2). We have d = r1 + 2r2. The set of units in is denoted by . We know that is an (Abelian) group under (complex) multiplication. Our basic aim now is to reveal the structure of the group .

Every Abelian group is a -module and, if finitely generated and not free, contains torsion elements, that is, (non-identity) elements of finite order > 1.[19] always contains the element –1 of order 2. The torsion subgroup of is denoted by . We have , where is a torsion-free group. It turns out that ℜ is a finite group (and hence cyclic) and that is finitely generated and hence free, that is, for some . From Dirichlet’s unit theorem (which we do not prove), it follows that ρ = r1 + r2 – 1. Thus, has a -basis consisting of ρ elements, say ξ1, . . . , ξρ, and every unit of can be uniquely expressed as , where ω is a root of unity and . A set of generators of is called a set of fundamental units.

[19] Every finitely generated torsion-free module over a PID is free.

Example 2.33.

Let D ≠ 0, 1 be a square-free integer and K := ℚ(√D). If D < 0, the signature of K is (0, 1) and the value of ρ for K is 0 + 1 − 1 = 0, that is, the unit group equals its torsion subgroup, that is, the unit group of 𝒪K is finite in this case.

Now, suppose D > 0. K is a real field in this case, so that the only roots of unity in K are ±1. Also the signature of K is (2, 0), that is, ρ = 2 + 0 − 1 = 1. This means that 𝒪K contains an infinite number of units. Let ξ be a fundamental unit of 𝒪K. Then, every unit of 𝒪K is of the form ±ξ^n, n ∈ ℤ.

Exercise Set 2.13

2.126
  1. If A ⊆ B and B ⊆ C are integral extensions of rings, show that A ⊆ C is also an integral extension.

  2. Let A ⊆ B be an extension of rings. Show that the integral closure of A in B is integrally closed in B.

  3. Let A ⊆ B be an integral extension of rings, 𝔟 an ideal of B and 𝔞 := 𝔟 ∩ A. (Note that 𝔞 is an ideal of A. If 𝔟 is prime in B, then 𝔞 is prime in A. See Proposition 2.10.) Show that B/𝔟 is integral over A/𝔞.

2.127Let A ⊆ B be an extension of integral domains, 𝔞 a finitely generated non-zero ideal of A and γ ∈ B. If γ𝔞 ⊆ 𝔞, show that γ is integral over A. [H]
2.128
  1. Let A ⊆ B be an integral extension of integral domains. Show that A is a field if and only if B is a field.

  2. Let A ⊆ B be an integral extension of rings, 𝔮 a prime ideal of B and 𝔭 := 𝔮 ∩ A. Show that 𝔮 is maximal if and only if 𝔭 is maximal. [H]

  3. Let A, B, 𝔮 and 𝔭 be as in (b). Further let 𝔮′ be another prime ideal of B with 𝔮 ⊆ 𝔮′. Show that if 𝔮′ ∩ A = 𝔭, then 𝔮′ = 𝔮. [H]

2.129Let A be a ring and S a multiplicatively closed subset of A. Show that:
  1. If 0 ∈ S, then S–1A is the zero ring.

  2. If S′ := S \ {1} is non-empty and closed under multiplication, then S′–1A ≅ S–1A.

  3. If A is Noetherian, then S–1A is also Noetherian.

2.130Let A ⊆ B be a ring extension and C the integral closure of A in B. Show that for any multiplicative subset S of A (and hence of B and C) the integral closure of S–1A in S–1B is S–1C. In particular, if A is integrally closed in B, then so is S–1A in S–1B.
2.131Recall that an integrally closed integral domain is called a normal domain (ND).
  1. Show that every UFD is a normal domain.

  2. Let D be a square-free integer ≠ 0, 1. Show that ℤ[√D] is normal if and only if D ≡ 2, 3 (mod 4).

(Remark: The reader should note the following important implications:

Euclidean domain ⇒ PID ⇒ UFD ⇒ normal domain.

That is, a Euclidean domain is a PID, a PID is a UFD and a UFD is a normal domain. None of the reverse implications is true. For example, the ring of integers of ℚ(√−19) is known to be a PID but not a Euclidean domain. The ring K[X1, . . . , Xn], n ≥ 2, of multivariate polynomials over a field K is a UFD, but not a PID, since the ideal 〈X1, . . . , Xn〉 is not principal. Finally, ℤ[√−5] is a normal domain (by Exercise 2.136 below), but not a UFD, since 2 × 3 and (1 + √−5)(1 − √−5) are two different factorizations of 6 into irreducible elements.)
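The classical failure of unique factorization mentioned in the remark, 6 = 2 · 3 = (1 + √−5)(1 − √−5) in ℤ[√−5], can be checked with norms. A short Python sketch (using the standard norm N(a + b√−5) = a² + 5b²):

```python
def norm(a, b):
    """Norm of a + b*sqrt(-5) in Z[sqrt(-5)]."""
    return a * a + 5 * b * b

# 6 = 2 * 3 = (1 + sqrt(-5)) * (1 - sqrt(-5)); compare the norms:
assert norm(2, 0) == 4 and norm(3, 0) == 9
assert norm(1, 1) == norm(1, -1) == 6

# No element of Z[sqrt(-5)] has norm 2 or 3 (a^2 + 5b^2 never equals 2 or 3),
# so all four factors are irreducible and the factorizations are distinct.
assert all(norm(a, b) not in (2, 3)
           for a in range(-3, 4) for b in range(-2, 3))
```

Since norms multiply, a proper factorization of 2 or 3 would need an element of norm 2 or 3, which the last check rules out (norms grow with |a| and |b|, so the small search range suffices).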

2.132A (non-zero) ring A with a unique maximal ideal m is called a local ring. In that case, the field A/m is called the residue field of A.

Let A be a ring and 𝔭 a prime ideal of A. Show that the localization A𝔭 := (A \ 𝔭)–1A is a local ring with the unique maximal ideal 𝔭A𝔭 generated by the elements of 𝔭, and that the residue field is canonically isomorphic to the quotient field of the integral domain A/𝔭 under the map a/s ↦ (a + 𝔭)/(s + 𝔭).

2.133A ring A is called a discrete valuation ring (DVR) or a discrete valuation domain (DVD), if A is a local principal ideal domain. Let A be a DVR with maximal ideal m = 〈p〉. Prove the following assertions:
  1. A is a UFD.

  2. The only primes in A are the associates of p. [H]

  3. Every non-zero element of A can be written as up^α, where u is a unit of A and α is an integer ≥ 0.

  4. Every non-zero ideal of A is of the form 〈p^α〉 for some integer α ≥ 0.

  5. A has only one non-zero prime ideal (namely, m).

(Remark: The prime p of A is called a uniformizing parameter or a uniformizer for A and is unique up to multiplication by units.

The map ν taking up^α ↦ α is called a discrete valuation of A and can be naturally extended to a group homomorphism ν : K* → ℤ by defining ν(a/b) := ν(a) − ν(b), where a, b ∈ A, b ≠ 0 and K = Q(A) is the quotient field of A. It is often convenient to define ν(0) := +∞. It follows that ν(xy) = ν(x) + ν(y) and ν(x + y) ≥ min(ν(x), ν(y)).)

2.134
  1. Let A be a local Noetherian integral domain which is not a field. Assume further that the maximal ideal m ≠ 0 of A is the only non-zero prime ideal of A. Show that A is a DVR (that is, a PID) if and only if A is integrally closed.

  2. Let A be a Noetherian integral domain which is not a field. Prove that A is a Dedekind domain if and only if the localization A𝔭 is a DVR for every non-zero prime ideal 𝔭 of A.

2.135
  1. Show that the only units of ℤ[i] are ±1 and ±i.

  2. Show that the primes of ℤ[i] are associates of the following:

    1. a prime integer ≡ 3 (mod 4),

    2. a + ib, a, b ∈ ℤ, with a² + b² equal to 2 or a prime integer ≡ 1 (mod 4).

2.136
  1. Show that every quadratic number field K can be represented as K = ℚ(√D) for a square-free integer D ≠ 0, 1.

  2. Let K = ℚ(√D) for some square-free integer D ≠ 0, 1. Show that:

𝒪K = ℤ[√D] if D ≡ 2, 3 (mod 4), and 𝒪K = ℤ[(1 + √D)/2] if D ≡ 1 (mod 4).

(In particular, the ring of integers of ℚ(i) is the ring ℤ[i] of Gaussian integers.)

2.137Let A be a Dedekind domain.
  1. Let 𝔮1 and 𝔮2 be two distinct non-zero prime ideals of A. Show that for any e1, e2 ≥ 1, we have 𝔮1^e1 + 𝔮2^e2 = A. [H]

  2. Let 𝔞 = 𝔮1^e1 · · · 𝔮r^er be the prime factorization of a non-zero ideal 𝔞 of A with pairwise distinct primes 𝔮i and ei ≥ 1. Show that A/𝔞 ≅ A/𝔮1^e1 × · · · × A/𝔮r^er. [H]

2.138Let A be a Dedekind domain and 𝔞 a non-zero (integral) ideal of A. Show that:
  1. There exists a non-zero (integral) ideal 𝔟 of A such that 𝔞𝔟 is a principal ideal. [H]

  2. The number of ideals of A containing 𝔞 is finite.

  3. Every ideal of A/𝔞 is principal.

2.139Let 𝔞 = 𝔮1^e1 · · · 𝔮r^er and 𝔟 = 𝔮1^f1 · · · 𝔮r^fr, ei, fi ≥ 0, be the prime decompositions of two non-zero ideals 𝔞, 𝔟 of a DD A. Define the gcd and lcm of 𝔞 and 𝔟 as

gcd(𝔞, 𝔟) := 𝔮1^min(e1,f1) · · · 𝔮r^min(er,fr),  lcm(𝔞, 𝔟) := 𝔮1^max(e1,f1) · · · 𝔮r^max(er,fr).

Show that gcd(𝔞, 𝔟) = 𝔞 + 𝔟 and lcm(𝔞, 𝔟) = 𝔞 ∩ 𝔟. Conclude that 𝔞𝔟 = (𝔞 + 𝔟)(𝔞 ∩ 𝔟). (Note that if A is a general ring, we only have (𝔞 + 𝔟)(𝔞 ∩ 𝔟) ⊆ 𝔞𝔟.)

2.140Let K be a number field and .
  1. Let 𝔞 ≠ 0 be an ideal of 𝒪K. Show that 𝔞 ∩ ℤ ≠ 0. In particular, every non-zero ideal of 𝒪K contains a non-zero integer. [H]

  2. Let 𝔭 be a non-zero prime ideal of 𝒪K. Prove that N(𝔭) = p^f for some f ≥ 1, where p is the unique rational prime contained in 𝔭 (Lemma 2.13).

2.141Let K be a number field and α ∈ 𝒪K. Show that:
  1. α is a unit of 𝒪K, if and only if N(α) = ±1.

  2. α is a unit of 𝒪K, if and only if f(0) = ±1, where f(X) ∈ ℤ[X] is the minimal polynomial of α over ℚ.

  3. α is a root of unity, if and only if |σ(α)| = 1 for every complex embedding σ of K.

2.142Let K be a number field. We say that K is norm-Euclidean, if for every α, β ∈ 𝒪K, β ≠ 0, there exist q, r ∈ 𝒪K such that α = qβ + r and |N(r)| < |N(β)|.
  1. Conclude that if K is norm-Euclidean, then 𝒪K is a Euclidean domain with the Euclidean degree function ν(α) := |N(α)|. (The converse of this is not true. For example, it is known that ℚ(√69) is not norm-Euclidean, but its ring of integers is a Euclidean domain.)

  2. Prove the following equivalent characterization of a norm-Euclidean number field: K is norm-Euclidean if and only if for every α ∈ K there exists β ∈ 𝒪K such that |N(α − β)| < 1.

  3. Show that the following number fields are norm-Euclidean:

    , , , and .

  4. Show that is not norm-Euclidean. [H]

2.143In this exercise, one derives that the only (rational) integer solutions of Bachet’s equation

Equation 2.20


are x = 3, y = ±5.

  1. Show that Equation (2.20) has no solutions with x or y even. [H]

    Let (x, y) be a solution of Equation (2.20) with both x and y odd. Then x³ admits a factorization in ℤ[√−2] as x³ = y² + 2 = (y + √−2)(y − √−2).

  2. Let A := ℤ[√−2]. Show that A is the ring of integers of ℚ(√−2) and that A is a UFD. Also the only units of A are ±1.

  3. Show that gcd(y + √−2, y − √−2) = 1 in A. [H]

  4. Because of unique factorization one can write y + √−2 = (c + d√−2)³ for c, d ∈ ℤ. Expand the cube and equate the real and imaginary parts to conclude that we must have y = ±5, so that x = 3.

**2.14. p-adic Numbers

Let us now study a different area of algebraic number theory, introduced by Kurt Hensel in an attempt to apply power-series expansions to numbers. While trying to explain the properties of the (rational) integers, mathematicians kept embedding ℤ in bigger and bigger structures, richer and richer in properties. ℚ came in a natural attempt to form quotients, and for some time people believed that the rationals told the whole story. Pythagoras was seemingly the first to locate and prove the irrationality of a number, namely √2. It took humankind centuries to complete the picture of the real line. One possibility is to view ℝ as the completion of ℚ. A sequence an, n ∈ ℕ, of rational numbers is called a Cauchy sequence if for every real ε > 0, there exists N ∈ ℕ such that |am − an| ≤ ε for all m, n ≥ N. Every Cauchy sequence should converge to a limit, and it is in ℝ (and not ℚ) where this happens. Even with the convergence of Cauchy sequences, people were not wholeheartedly happy, because the real polynomial X² + 1 did not have—it continues not to have—roots in ℝ. So the next question that arose was that of algebraic closure. ℂ was invented and turned out to be a nice field which is both algebraically closed and complete.

Throughout the above business, we were led by the conventional notion of distance between points (that is, between numbers)—the so-called Archimedean distance or the absolute value. For every rational prime p, there exists a p-adic distance which leads to a ring strictly bigger than ℤ. This is the ring ℤ̂p of p-adic integers. The quotient field of ℤ̂p is the field ℚ̂p of p-adic numbers. ℚ̂p is complete in the sense of convergence of Cauchy sequences (under the p-adic distance), but is not algebraically closed. We know anyway that a (unique) algebraic closure of ℚ̂p exists. We have ℂ = ℝ(i), that is, it was necessary and sufficient to add the imaginary quantity i to ℝ to get an algebraically closed field. Unfortunately in the case of the p-adic distance the closure is of infinite extension degree over ℚ̂p. In addition, this closure is not complete. An attempt to make it complete gives an even bigger field Ωp and the story stops here, Ωp being both algebraically closed and complete. But Ωp is already a pretty huge field and very little is known about it.

In the rest of this section, we, without specific mention, denote by p an arbitrary rational prime.

2.14.1. The Arithmetic of p-adic Numbers

There are various ways in which p-adic integers can be defined. A simple way is to use infinite sequences.

Definition 2.111.

A p-adic integer is defined as an infinite sequence (an), n ∈ ℕ, of elements an ∈ ℤ/p^nℤ with the property that an+1 ≡ an (mod p^n) for every n ∈ ℕ. Each an, being an element of ℤ/p^nℤ, can be represented as a (rational) integer unique modulo p^n. Thus, if bn, n ∈ ℕ, define another sequence of integers with bn ≡ an (mod p^n) for every n, the p-adic integers (an) and (bn) are treated the same. In particular, if 0 ≤ bn < p^n for every n, then (bn) is called the canonical representation of (an). The set of all p-adic integers is denoted by ℤ̂p.[20] A sequence (an) of integers with an+1 ≡ an (mod p^n) for every n is called a p-coherent sequence.

[20] Well! We are now in a mess of notations. We have ℤn = ℤ/nℤ for every n ∈ ℕ. In particular, for n = p we have ℤp = ℤ/pℤ, which is a field that we planned to denote also by 𝔽p. It is superfluous to have two notations for the same thing. Many authors, therefore, prefer to avoid the hat and call ℤ̂p as ℤp. For them, our ℤp is 𝔽p and/or ℤ/pℤ written explicitly. Let us stick to our old conventions and use hats to remove ambiguities.

See Exercise 2.144 for another way of defining p-adic integers. We now show that ℤ̂p is a ring. Before doing that, we mention that the ring ℤ is canonically embedded in ℤ̂p by the injective map a ↦ (a, a, a, . . .).

Definition 2.112.

Let (an) and (bn) be two p-adic integers. Define:

(an) + (bn):=(an + bn).
(an) · (bn):=(an · bn).

One can easily check that these operations are well-defined, that is, independent of the choice of the representatives an and bn. It also follows easily that these operations make ℤ̂p a ring with additive identity 0 = (0, 0, . . .) and with multiplicative identity 1 = (1, 1, . . .). The additive inverse of (an) is −(an) = (−an). Moreover a ↦ (a, a, . . .) is an injective ring homomorphism ℤ → ℤ̂p. In view of this, one often identifies the rational integer a with the p-adic integer (a, a, . . .). We will also do so, provided that we do not expect to face a danger of confusion. Also note that for l ∈ ℕ the l-fold sum l(an) is the same as (l)(an) = (lan). Thus in this context the two interpretations of l remain perfectly consistent.
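The componentwise arithmetic of Definition 2.112 can be tried out on truncated coherent sequences. A Python sketch (the helper names are ours; we work with the first five components only):

```python
P = 7

def coherent(seq, p=P):
    """Check the p-coherence condition a_{n+1} = a_n (mod p^n)."""
    return all(b % p**n == a % p**n
               for n, (a, b) in enumerate(zip(seq, seq[1:]), start=1))

def add(x, y, p=P):
    """Componentwise sum modulo p^n."""
    return [(a + b) % p**n for n, (a, b) in enumerate(zip(x, y), start=1)]

# The rational integers 10 and -1 as (truncated) 7-adic integers:
ten  = [10 % 7**n for n in range(1, 6)]
mone = [-1 % 7**n for n in range(1, 6)]
s = add(ten, mone)
assert coherent(s)                          # the sum is again p-coherent
assert s == [9 % 7**n for n in range(1, 6)] # and represents 10 + (-1) = 9
```

Multiplication works the same way, with `(a * b) % p**n` in place of the sum.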

It turns out that ℤ̂p is an integral domain. In order to see why, let us focus our attention on the units of ℤ̂p. Let us plan to denote the multiplicative group of units of ℤ̂p by Up. The next result characterizes the elements of Up.

Proposition 2.52.

For x = (an) ∈ ℤ̂p, the following conditions are equivalent:

  1. x ∈ Up (that is, x is a unit of ℤ̂p).

  2. p ∤ an for all n ∈ ℕ.

  3. p ∤ a1.

Proof

[(a)⇒(b)] Let (an)(bn) = (anbn) = 1 = (1) for some (bn) ∈ ℤ̂p. Then for every n ∈ ℕ we have anbn ≡ 1 (mod p^n), that is, an is invertible modulo p^n and hence modulo p as well, that is, p ∤ an.

[(b)⇒(c)] Obvious.

[(c)⇒(a)] Let us construct a p-coherent sequence bn, n ∈ ℕ, of (rational) integers with anbn ≡ 1 (mod p^n). This (bn) would be the desired inverse of (an) in ℤ̂p. Since p ∤ a1 and an ≡ a1 (mod p), it follows that p ∤ an as well and, therefore, the congruence anx ≡ 1 (mod p^n) has a unique solution modulo p^n, namely bn :≡ an^(−1) (mod p^n).

We also have an+1bn+1 ≡ 1 (mod p^n), that is, anbn+1 ≡ 1 (mod p^n), that is, bn+1 ≡ bn (mod p^n).
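The inverse constructed in the proof can be computed directly: each component is a modular inverse, and the resulting sequence is automatically p-coherent. A minimal Python sketch (using the built-in `pow(a, -1, m)` for modular inverses):

```python
P = 7

def inverse(x, p=P):
    """Componentwise inverse b_n = a_n^{-1} (mod p^n) of a unit (a_n)."""
    return [pow(a, -1, p**n) for n, a in enumerate(x, start=1)]

x = [10 % 7**n for n in range(1, 6)]   # 10 is a unit of Z_7-hat: 7 does not divide 10
y = inverse(x)

# b_{n+1} = b_n (mod p^n): the inverses form a p-coherent sequence,
# because the inverse modulo p^n is unique.
assert all(y[n] % 7**n == y[n - 1] % 7**n for n in range(1, 5))
# and a_n * b_n = 1 (mod p^n) in every component:
assert all((a * b) % 7**(n + 1) == 1 for n, (a, b) in enumerate(zip(x, y)))
```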

Proposition 2.53.

Every non-zero x = (an) ∈ ℤ̂p can be written uniquely as x = p^r y for some integer r ≥ 0 and for some y ∈ Up.

Proof

If p ∤ a1, take r := 0 and y := x. So assume that p | a1. Choose r ≥ 1 such that [an]pn = [0]pn for 1 ≤ n ≤ r, whereas [ar+1]pr+1 ≠ [0]pr+1. Such an r exists, since x ≠ 0 by hypothesis. For n ∈ ℕ, we have ar+n ≡ ar ≡ 0 (mod p^r), that is, p^r | ar+n, whereas ar+n ≡ ar+1 ≢ 0 (mod p^(r+1)), that is, p^(r+1) ∤ ar+n, that is, vp(ar+n) = r. Define bn := ar+n/p^r. Since ar+n+1 ≡ ar+n (mod p^(r+n)), division by p^r gives bn+1 ≡ bn (mod p^n), that is, y := (bn) ∈ ℤ̂p. Moreover, p^r bn = ar+n ≡ an (mod p^n), that is, x = p^r y. Finally, since p ∤ b1, we have y ∈ Up. This establishes the existence of a factorization x = p^r y. The uniqueness of this factorization is left to the reader as an easy exercise.

Proposition 2.54.

ℤ̂p is an integral domain.

Proof

Let x1 and x2 be non-zero elements of ℤ̂p. By Proposition 2.53, we can write x1 = p^r1 y1 and x2 = p^r2 y2 with r1, r2 ≥ 0 and y1, y2 ∈ Up. Then (an) := x1x2 = p^(r1+r2) y1y2. Now y1y2 =: (bn) ∈ Up and hence no bn is divisible by p. Therefore, ar1+r2+1 = p^(r1+r2) br1+r2+1 ≢ 0 (mod p^(r1+r2+1)), that is, (an) = x1x2 ≠ 0.

Definition 2.113.

The quotient field ℚ̂p of ℤ̂p is called the field of p-adic numbers.

Proposition 2.55.

Every non-zero x ∈ ℚ̂p can be expressed uniquely as x = p^r y with r ∈ ℤ and y ∈ Up.

Proof

One can write x = a/b for some a, b ∈ ℤ̂p, b ≠ 0. Then a = p^s c and b = p^t d for some s, t ≥ 0 and c, d ∈ Up, and so x = p^(s−t)(c/d) with c/d ∈ Up. The proof for the uniqueness is left to the reader.

The canonical inclusion ℤ ↪ ℤ̂p naturally extends to the canonical inclusion ℚ ↪ ℚ̂p. We can identify the image of a/b with the rational a/b and say that ℚ is contained in ℚ̂p. Being a field of characteristic 0, ℚ̂p contains an isomorphic copy of ℚ. The map a/b ↦ a · b^(−1) gives this isomorphism explicitly. Note that the ring ℤ̂p is strictly bigger than ℤ and the field ℚ̂p is strictly bigger than the field ℚ (Exercise 2.147).

2.14.2. The p-adic Valuation

Proposition 2.55 leads to the notion of p-adic distance between pairs of points in . Let us start with some formal definitions.

Definition 2.114.

A metric on a set S is a map d : S × S → ℝ such that for every x, y, z ∈ S, we have:

  1. Non-negativity: d(x, y) ≥ 0.

  2. Non-degeneracy: d(x, y) = 0 if and only if x = y.

  3. Symmetry: d(x, y) = d(y, x).

  4. Triangle inequality: d(x, z) ≤ d(x, y) + d(y, z).

A set S together with a metric d is called a metric space (with metric d).

Definition 2.115.

A norm on a field K is a map ‖ ‖ : K → ℝ such that for all x, y ∈ K, we have:

  1. Non-negativex‖ ≥ 0.

  2. Non-degeneracyx‖ = 0 if and only if x = 0.

  3. Multiplicativityxy‖ = ‖x‖ ‖y‖.

  4. Triangle inequalityx + y‖ ≤ ‖x‖ + ‖y‖.

It is an easy check that for a norm ‖ ‖ on K the function , d(x, y) := ‖xy‖, defines a metric on K.

A norm ‖ ‖ on a field K is called non-Archimedean (or a finite valuation), if ‖x + y‖ ≤ max(‖x‖, ‖y‖) for all x, y ∈ K (a condition stronger than the triangle inequality). A norm which is not non-Archimedean is called Archimedean (or an infinite valuation).

Example 2.34.
  1. Setting ‖x‖ := 1 for all x ≠ 0 (and ‖0‖ := 0) defines a norm on any field K. This norm is called the trivial norm on K.

  2. The absolute value | | is an Archimedean norm on ℚ (or ℝ). It is customary to denote this norm as | |∞. This norm induces the usual metric topology on ℚ (or ℝ) which is at the heart of real analysis. In p-adic analysis, one investigates ℚ̂p under the p-adic norms that we define now.

Definition 2.116.

The p-adic norm on ℚ̂p is defined as |x|p := p^(−r) for x ≠ 0 written as x = p^r y with r ∈ ℤ and y ∈ Up (Proposition 2.55), and |0|p := 0.

Theorem 2.59.

The p-adic norm | |p is a non-Archimedean norm on ℚ̂p.

Proof

Non-negativity, non-degeneracy and multiplicativity of | |p are immediate. For proving the triangle inequality, it is sufficient to prove the non-Archimedean condition. Take x, y ∈ ℚ̂p. If x = 0 or y = 0 or x + y = 0, we clearly have |x + y|p ≤ max(|x|p, |y|p). So assume that each of x, y and x + y is non-zero. Write x = p^r u and y = p^s v with r, s ∈ ℤ and u, v ∈ Up. Without loss of generality, we may assume that r ≥ s. Then, x + y = p^s z, where z = p^(r−s) u + v ∈ ℤ̂p. Since x + y ≠ 0, we have z ≠ 0; so we can write z = p^t w for some t ≥ 0 and w ∈ Up. But then |x + y|p = p^(−(s+t)) ≤ p^(−s) = max(p^(−r), p^(−s)) = max(|x|p, |y|p).
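The non-Archimedean inequality is easy to experiment with on rationals. A Python sketch computing the p-adic valuation and norm (exact arithmetic via `fractions`; helper names are ours):

```python
from fractions import Fraction

def vp(x, p):
    """p-adic valuation of a non-zero rational x."""
    x = Fraction(x)
    num, den, v = x.numerator, x.denominator, 0
    while num % p == 0:
        num //= p
        v += 1
    while den % p == 0:
        den //= p
        v -= 1
    return v

def norm_p(x, p):
    """p-adic norm |x|_p = p^(-vp(x)), with |0|_p = 0."""
    return Fraction(0) if x == 0 else Fraction(p) ** (-vp(x, p))

x, y, p = Fraction(7, 3), Fraction(49, 5), 7
assert norm_p(x + y, p) <= max(norm_p(x, p), norm_p(y, p))
# when |x|_p != |y|_p, the ultrametric inequality is an equality:
assert norm_p(x + y, p) == max(norm_p(x, p), norm_p(y, p))
```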

Definition 2.117.

Two metrics d1 and d2 on a metric space S are called equivalent if a sequence (xn) from S is Cauchy with respect to d1 if and only if it is Cauchy with respect to d2. Two norms on a field are called equivalent if they induce equivalent metrics.

For every prime p, the field ℚ is canonically embedded in ℚ̂p and thus we have a notion of a p-adic distance on ℚ. We also have the usual Archimedean distance | |∞ on ℚ. We now state an interesting result without a proof, which asserts that any distance on ℚ must be essentially the same as either the usual Archimedean distance or one of the p-adic distances.

Theorem 2.60. Ostrowski’s theorem

Every non-trivial norm on ℚ is equivalent to | |p for some p ∈ ℙ ∪ {∞}.

The notions of sequences and series and their convergences can be readily extended to ℚ̂p under the norm | |p. Since the p-adic distance assumes only the discrete values p^(−r), r ∈ ℤ, it is often customary to restrict ourselves only to these values while talking about the convergence criteria of sequences and series, that is, instead of an infinitesimally small real ε > 0 one can talk about an arbitrarily large M ∈ ℕ with p^(−M) ≤ ε.

Definition 2.118.

Let x1, x2, . . . be a sequence of elements of ℚ̂p. We say that this sequence converges to a limit x ∈ ℚ̂p, if given M ∈ ℕ there exists N ∈ ℕ such that |xn − x|p ≤ p^(−M) for all n ≥ N. We write this as x = lim xn or as xn → x.

Consider the partial sums sn := x1 + x2 + · · · + xn for each n ∈ ℕ. If there exists s ∈ ℚ̂p with sn → s, we say that the sum Σn≥1 xn converges to s and write s = Σn≥1 xn.

A sequence x1, x2, . . . of elements of ℚ̂p is said to be a Cauchy sequence if for every M ∈ ℕ, there exists an N ∈ ℕ such that |xm − xn|p ≤ p^(−M) for all m, n ≥ N.

Definition 2.119.

A field K is called complete under a norm ‖ ‖ if every sequence of elements of K, which is Cauchy under ‖ ‖, converges to an element in K.

For example, ℝ is complete under | |. We shortly demonstrate that ℚ̂p is complete under | |p.

Consider a field K not (necessarily) complete under a norm ‖ ‖. Let C denote the set of all Cauchy sequences from K. Define addition and multiplication in C as (an) + (bn) := (an + bn) and (an)(bn) := (anbn). Under these operations C becomes a commutative ring with identity, having a maximal ideal 𝔪 consisting of the sequences converging to 0. The field L := C/𝔪 is called the completion of K with respect to the norm ‖ ‖. K is canonically embedded in L via the map a ↦ (a, a, a, . . .) + 𝔪. The norm ‖ ‖ on K extends to elements (an) + 𝔪 of L as limn→∞ ‖an‖. L is a complete field under this extended norm. In fact, it is the smallest field containing K and complete under ‖ ‖.

ℝ is the completion of ℚ with respect to the Archimedean norm | |. On the other hand, ℚ̂p turns out to be the completion of ℚ with respect to the p-adic norm | |p. Before proving this let us first prove that ℚ̂p itself is a complete field under the p-adic norm. Let us start with a lemma.

Lemma 2.18.

A sequence (an) of p-adic numbers is a Cauchy sequence if and only if the sequence (an+1an) converges to 0.

Proof

[if] Take any M ∈ ℕ. Since an+1 − an → 0 by hypothesis, there exists N ∈ ℕ such that |an+1 − an|p ≤ p^(−M) for all n ≥ N. But then for all m, n ≥ N with m = n + k, k ≥ 1, we have |am − an|p = |(am − am−1) + · · · + (an+1 − an)|p ≤ max(|am − am−1|p, . . . , |an+1 − an|p) ≤ p^(−M).

Thus (an) is a Cauchy sequence.

[only if] Take any M ∈ ℕ. Since (an) is a Cauchy sequence by hypothesis, there exists N ∈ ℕ such that |am − an|p ≤ p^(−M) for all m, n ≥ N. In particular, |an+1 − an|p ≤ p^(−M) for all n ≥ N, that is, an+1 − an → 0.

Theorem 2.61.

The field ℚ̂p is complete with respect to | |p.

Proof

Let (an) be a Cauchy sequence in ℚ̂p. By Lemma 2.18, an+1 − an → 0. Therefore, there exists N ∈ ℕ such that |an+1 − an|p ≤ 1 for all n ≥ N. For n = N + k, k ≥ 1, we have

|an|p = |aN+k|p
     = |(aN+k − aN+k−1) + · · · + (aN+1 − aN) + aN|p
     ≤ max(|aN+k − aN+k−1|p, . . . , |aN+1 − aN|p, |aN|p)
     ≤ max(1, |aN|p).

It then follows that |an|p ≤ p^m for all n ∈ ℕ, where m ≥ 0 satisfies p^m = max(1, |a1|p, . . . , |aN|p). If m = 0, then each an ∈ ℤ̂p (Exercise 2.148). Otherwise consider the sequence (p^m an) which is clearly Cauchy and in which each p^m an ∈ ℤ̂p, since |p^m an|p ≤ p^(−m) p^m = 1. Thus, without loss of generality, we may assume that the given sequence (an) itself is one of p-adic integers.

Let an = an,0 + an,1p + an,2p² + · · · be the p-adic expansion of an (Exercise 2.145). Since (an) is Cauchy, for every M ∈ ℕ there exists NM ∈ ℕ such that |am − an|p ≤ p^(−(M+1)) for all m, n ≥ NM: that is, an,i = am,i for 0 ≤ i ≤ M and m, n ≥ NM. Define xM := an,M for any n ≥ NM and x := x0 + x1p + x2p² + · · ·. It then follows that an → x.

Theorem 2.62.

ℚ̂p is the completion of ℚ with respect to the norm | |p.

Proof

Let C denote the ring of Cauchy sequences from ℚ (under the p-adic norm), 𝔪 the maximal ideal of C consisting of sequences that converge to 0, and L := C/𝔪. We now show that L ≅ ℚ̂p.

If a ∈ ℚ̂p has the p-adic expansion a = a−r p^(−r) + · · · + a−1 p^(−1) + a0 + a1p + a2p² + · · · (Exercise 2.145), then αn := a−r p^(−r) + · · · + a−1 p^(−1) + a0 + a1p + · · · + an p^n, n ∈ ℕ, define a sequence of elements of ℚ. We have |αn − a|p ≤ p^(−(n+1)), that is, αn → a. Moreover, the sequence (αn) of rational numbers is Cauchy with respect to | |p, since for every M ∈ ℕ we have |αm − αn|p ≤ p^(−(M+1)) for all m, n ≥ M. Thus Φ : ℚ̂p → L, a ↦ (αn) + 𝔪, is a well-defined field homomorphism. Being a field homomorphism, Φ is injective.

What remains is to show that the map Φ is surjective. Take any (βn) + 𝔪 ∈ L. Since (βn) is a Cauchy sequence, by Theorem 2.61 it converges to a point a ∈ ℚ̂p. We construct the sequence (αn) corresponding to a as described in the last paragraph. Then αn → a as well and hence using the triangle inequality (or the non-Archimedean condition) we have αn − βn = (αn − a) − (βn − a) → 0, that is, (αn) − (βn) ∈ 𝔪, that is, Φ(a) = (βn) + 𝔪.

Corollary 2.29.

The p-adic series Σn≥1 an (with an ∈ ℚ̂p) converges if and only if |an|p → 0.

Proof

The only if part is obvious. For the if part, take a sequence (an) of p-adic numbers with |an|p → 0. Define sn := a1 + a2 + · · · + an. Since an+1 = sn+1 − sn → 0 by hypothesis, Lemma 2.18 guarantees that (sn) is a Cauchy sequence, that is, (sn) converges in ℚ̂p.

This is quite unlike the Archimedean norm | |. For example, with respect to this norm we have 1/n → 0, whereas the series Σn≥1 1/n diverges.
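A standard p-adic example of Corollary 2.29 is the geometric series Σn≥0 pⁿ: its terms have |pⁿ|p = p^(−n) → 0, so it converges, and its sum is 1/(1 − p). This can be checked on truncations, since the partial sum s satisfies s(1 − p) ≡ 1 modulo a high power of p. A minimal Python sketch:

```python
p, N = 7, 10
s = sum(p**k for k in range(N))   # partial sum 1 + p + ... + p^(N-1)

# In Q_p the series converges to 1/(1 - p):
# s * (1 - p) = 1 - p^N, which is congruent to 1 modulo p^N,
# i.e. |s - 1/(1 - p)|_p <= p^(-N).
assert (s * (1 - p)) % p**N == 1
```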

2.14.3. Hensel’s Lemma

Let us conclude our short study of p-adic methods by proving an important theorem due to Hensel. This theorem talks about the solvability of polynomial equations f(X) = 0 for f(X) ∈ ℤ̂p[X]. Before proceeding further, let us introduce a notation. Recall that every a ∈ ℤ̂p has a unique p-adic expansion of the form a = a0 + a1p + a2p² + · · · with 0 ≤ an < p (Exercises 2.144 and 2.145). If a0 = a1 = · · · = an−1 = 0, then a = an p^n + an+1 p^(n+1) + an+2 p^(n+2) + · · · = p^n b, where b ∈ ℤ̂p. Thus p^n | a in ℤ̂p. We denote this by saying that a ≡ 0 (mod p^n). Notice that a ≡ 0 (mod p^n) if and only if |a|p ≤ p^(−n). We write a ≡ b (mod p^n) for a, b ∈ ℤ̂p, if a − b ≡ 0 (mod p^n). Since p^n can be viewed as an element of ℤ̂p, this congruence notation conforms to that for a general PID. (ℤ̂p is a PID by Exercise 2.148.)

Since by our assumption any ring A comes with identity (that we denote by 1 = 1A), it makes sense to talk for every n ∈ ℕ about an element n = nA in A, namely the n-fold sum 1A + 1A + · · · + 1A.

Given any f(X) = c0 + c1X + · · · + cdX^d ∈ A[X], one can define the formal derivative of f as f′(X) := c1 + 2c2X + · · · + dcdX^(d−1). Properties of formal derivatives of polynomials are covered in Exercise 2.61.

Theorem 2.63. Hensel’s lemma

Let f(X) ∈ ℤ̂p[X]. Suppose that there exist α0 ∈ ℤ̂p and an integer M ≥ 0 satisfying:

  1. |f(α0)|p ≤ p^(−(2M+1)) (that is, α0 is a solution of f(x) ≡ 0 (mod p^(2M+1))), and

  2. |f′(α0)|p = p^(−M) (that is, f′(α0) ≡ 0 (mod p^M) but f′(α0) ≢ 0 (mod p^(M+1))).

Then there exists a unique α ∈ ℤ̂p such that f(α) = 0 and |α − α0|p ≤ p^(−(M+1)) (that is, α ≡ α0 (mod p^(M+1))).

Proof

Let us inductively construct a sequence α0, α1, α2, . . . of p-adic integers with the properties that |f(αn)|p ≤ p^(−(2M+n+1)) and |f′(αn)|p = p^(−M) for every n ∈ ℕ. The given α0 provides the starting point (induction basis). For the inductive step, assume that n ≥ 1 and that α0, α1, . . . , αn−1 have been constructed with the desired properties. We now explain how to construct αn from αn−1. Put

αn := αn−1 + kn p^(M+n) for some kn ∈ ℤ̂p.

We want to find a suitable kn for which |f(αn)|p ≤ p^(−(2M+n+1)). Taylor expansion gives f(αn) = f(αn−1) + kn p^(M+n) f′(αn−1) + cn p^(2(M+n)) for some cn ∈ ℤ̂p. Since by the induction hypothesis p^(2M+n) | f(αn−1) and p^M | f′(αn−1), we can write

Since p^(M+1) ∤ f′(αn−1), the element f′(αn−1)/p^M is a unit of ℤ̂p and, therefore, there is a unique solution for kn of the congruence

This value of kn yields

f(αn) = p^(2M+n)(bnp + cnp^n) ≡ 0 (mod p^(2M+n+1))

for some bn ∈ ℤ̂p. The Taylor expansion of f′ gives f′(αn) = f′(αn−1) + dn p^(M+n) (for some dn ∈ ℤ̂p), which implies that f′(αn) ≡ f′(αn−1) (mod p^(M+1)), that is, |f′(αn)|p = p^(−M).

Since |αn − αn−1|p ≤ p^(−(M+n)), it follows that αn − αn−1 → 0, that is, (αn) is a Cauchy sequence (under | |p). By the completeness of ℚ̂p, we then have an α ∈ ℤ̂p such that αn → α. Similarly f(αn) − f(αn−1) → 0, that is, the sequence (f(αn)) is Cauchy and hence converges to f(α). Also |f(αn)|p ≤ p^(−(2M+n+1)), that is, f(αn) → 0, that is, f(α) = 0. Finally, each αn ≡ α0 (mod p^(M+1)), so that α ≡ α0 (mod p^(M+1)). This establishes the existence of a desired α.

For proving the uniqueness of α, let β ∈ ℤ̂p satisfy f(β) = 0 and |β − α0|p ≤ p^(−(M+1)). By Taylor expansion, f(β) = f(α) + (β − α)f′(α) + (β − α)²c for some c ∈ ℤ̂p, that is, (β − α)(f′(α) + (β − α)c) = 0. Now β − α = (β − α0) − (α − α0) and so |β − α|p ≤ max(|β − α0|p, |α − α0|p) ≤ p^(−(M+1)), whereas f′(αn) → f′(α), so that |f′(α)|p = p^(−M). Therefore, f′(α) + (β − α)c ≢ 0 (mod p^(M+1)) and, in particular, f′(α) + (β − α)c ≠ 0. Thus we must have β − α = 0.

Note that αn in the last proof satisfies the congruence

f(αn) ≡ 0 (mod p^(2M+n+1))

for each n ∈ ℕ. We are given the solution α0 corresponding to n = 0. From this, we inductively construct the solutions α1, α2, . . . corresponding to n = 1, 2, . . . respectively. The process for computing αn from αn−1 as described in the proof of Hensel’s lemma is referred to as Hensel lifting. The given conditions ensure that this lifting is possible (and uniquely doable) for every n ∈ ℕ, and in the limit n → ∞ we get a root of f. Since each kn is required modulo p, we can take kn ∈ {0, 1, . . . , p − 1}. So α admits a p-adic expansion of the form α = α0 + k1 p^(M+1) + k2 p^(M+2) + k3 p^(M+3) + · · ·.

The special case M = 0 for Hensel’s lemma is now singled out:

Corollary 2.30.

Let f(X) ∈ ℤ̂p[X]. Suppose that there exists an α0 ∈ ℤ̂p satisfying:

  1. |f(α0)|p < 1 (that is, α0 is a solution of f(x) ≡ 0 (mod p)), and

  2. |f′(α0)|p = 1 (that is, f′(α0) ≢ 0 (mod p), that is, α0 is a simple root of f modulo p).

Then there exists a unique α ∈ ℤ̂p such that f(α) = 0 and |α − α0|p < 1 (that is, α ≡ α0 (mod p)).

For this special case, we compute solutions αn of f(x) ≡ 0 (mod p^(n+1)) inductively for n = 1, 2, 3, . . . , given a suitable solution α0 of this congruence for n = 0. The lifting formula is now:

Equation 2.21

αn := αn−1 + kn p^n, where kn ∈ {0, 1, . . . , p − 1} is the unique solution of (f(αn−1)/p^n) + kn f′(αn−1) ≡ 0 (mod p).
Example 2.35.

ℤ is canonically embedded in ℤ̂p and so is ℤ[X] in ℤ̂p[X]. Thus it makes sense to carry out the lifting process for a polynomial f(X) ∈ ℤ[X] and for some solution α0 ∈ ℤ of f(X) ≡ 0 (mod p). One solves Formula (2.21) in ℤ and obtains each αn ∈ ℤ. The limit α belongs to ℤ̂p and is a solution of f(X) = 0 in ℤ̂p.

For example, let p be an odd prime and a ∈ ℤ a quadratic residue modulo p with p ∤ a. Let α0 ∈ ℤ be a solution of X² ≡ a (mod p). Here f(X) = X² − a, so that f′(X) = 2X, that is, f′(α0) = 2α0 ≢ 0 (mod p). Thus the conditions of Corollary 2.30 are satisfied and we get a unique square root α of a in ℤ̂p with α ≡ α0 (mod p). This α has a p-adic expansion of the form α = α0 + k1p + k2p² + k3p³ + · · ·.

As a specific numerical example, take p = 7, a = 2 and α0 = 3. Using Formula (2.21), we compute k1 = 1, α1 = 10, k2 = 2, α2 = 108, k3 = 6, α3 = 2166, and so on. Thus a square root of 2 in ℤ7 is 3 + 1 × 7 + 2 × 7^2 + 6 × 7^3 + · · ·. The other square root of 2 in ℤ7 can be obtained by starting with α0 = 4.
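The lifting just performed is easy to reproduce mechanically. The following is a minimal sketch, not the book's own code: the helper name hensel_lift_sqrt is ours, the Newton-style step x ← x – f(x)/f′(x) is equivalent to Formula (2.21) in this setting, and the modular inverse pow(b, -1, m) needs Python 3.8 or later.

```python
def hensel_lift_sqrt(a, p, alpha0, steps):
    """Lift a solution alpha0 of x^2 ≡ a (mod p) to solutions modulo
    p^2, p^3, ..., returning the successive approximations alpha_n."""
    alphas = [alpha0]
    x, mod = alpha0, p
    for _ in range(steps):
        mod *= p
        # Hensel/Newton step for f(X) = X^2 - a, f'(X) = 2X:
        # x <- x - f(x) * f'(x)^(-1)  (mod p^(n+2)).
        x = (x - (x * x - a) * pow(2 * x, -1, mod)) % mod
        alphas.append(x)
    return alphas

print(hensel_lift_sqrt(2, 7, 3, 3))  # [3, 10, 108, 2166]
```

Each printed value αn satisfies αn^2 ≡ 2 (mod 7^(n+1)), matching the hand computation above.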

Exercise Set 2.14

2.144
  1. Establish that any p-adic integer (an) can be uniquely described as a sequence of integers xn satisfying 0 ≤ xn < p for every n ≥ 0 and an ≡ x0 + x1 p + · · · + xn–1 p^(n–1) (mod p^n) for every n ≥ 1. In this case, the p-adic integer (an) is written as the infinite series

    (an) = x0 + x1p + x2p2 + · · ·.

    One calls the above series the p-adic expansion of (an). Note that the sum in the above series is not to be treated as a sum of integers. However, for a non-negative integer a, the expansion of a to the base p is the same as the p-adic expansion of a (more correctly, of the p-adic integer corresponding to a). In other words, if the p-adic expansion of (an) is terminating, that is, xN = xN+1 = xN+2 = · · · = 0 for some N, then (an) can be identified with the rational integer x0 + x1 p + · · · + xN–1 p^(N–1). A non-terminating p-adic series, on the other hand, diverges under the Archimedean norm, but converges under the p-adic norm and corresponds to an element of ℤp that is not a non-negative integer. The rational integer –1, for example, has the infinite p-adic expansion (p – 1) + (p – 1)p + (p – 1)p^2 + · · ·. The partial sums of this series telescope to p^n – 1, which converges (under the p-adic norm) to limn→∞ (p^n – 1) = –1.
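The expansion of –1 can be checked numerically: the first n digits of the p-adic expansion of a rational integer a are simply the base-p digits of the canonical residue a mod p^n. A small sketch (the function name padic_digits is our own):

```python
def padic_digits(a, p, n):
    """First n digits x0, x1, ..., x_{n-1} of the p-adic expansion of the
    rational integer a, i.e. the base-p digits of a mod p^n."""
    r = a % p**n          # canonical residue in [0, p^n)
    digits = []
    for _ in range(n):
        digits.append(r % p)
        r //= p
    return digits

print(padic_digits(-1, 7, 5))  # [6, 6, 6, 6, 6]: every digit is p - 1
print(padic_digits(10, 3, 4))  # [1, 0, 1, 0]: ordinary base-3 digits of 10
```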

  2. Let a ∈ ℕ. Write the p-adic expansion for –a. [H]

  3. Given p-adic integers a := x0 + x1p + x2p2 + · · · and b := y0 + y1p + y2p2 + · · · , find the p-adic integers c := z0 + z1p + z2p2 + · · · and d := w0 + w1p + w2p2 + · · · , such that c = a + b and d = ab. (Express each zn and wn explicitly in terms of xn’s and yn’s.)

2.145In view of Exercise 2.144, every x ∈ ℤp admits a unique expansion of the form x = x0 + x1 p + x2 p^2 + · · · , where each xi ∈ {0, 1, . . . , p – 1}. This notion of p-adic expansion can be extended to the elements of ℚp.
  1. Show that for every non-zero x ∈ ℚp, there exist a unique r ∈ ℤ and unique integers xr, xr+1, . . . , x–1, x0, x1, . . . , each in {0, 1, . . . , p – 1}, such that x can be written as:

    x = xr p^r + xr+1 p^(r+1) + · · · + x–1 p^–1 + x0 + x1 p + x2 p^2 + · · ·.

  2. Describe how to compute the p-adic expansions of x + y and xy given those for x, y ∈ ℚp. Also of x/y, provided that y ≠ 0.

  3. What is |x|p for x ∈ ℤp?

  4. What is |x|p for x ∈ ℚp with xr ≠ 0?

2.146Let p be an odd prime and a ∈ ℤ a quadratic residue modulo p with p ∤ a. From elementary number theory we know that the congruence x^2 ≡ a (mod p^n) has two solutions for every n ≥ 1. Let x1 be a solution of x^2 ≡ a (mod p). We know that a solution xn of x^2 ≡ a (mod p^n) lifts uniquely to a solution xn+1 of x^2 ≡ a (mod p^(n+1)). Thus we can inductively compute a sequence x1, x2, x3, · · · of integers. Show that (xn) is a p-adic integer and that (xn)^2 = (a).
2.147
  1. Show that the ring ℤp contains all rationals of the form a/b with a, b ∈ ℤ and p ∤ b. This implies that ℤ is properly contained in ℤp.

  2. Take a := 17 for p = 2, a := 7 for p = 3 and a := p + 1 for p > 3. Show that there exists x ∈ ℤp with x^2 = a. Show also that such an x does not belong to ℚ. Thus ℤp contains elements that are not rational.

  3. Show that . Thus .

2.148Prove the following assertions:
  1. .

  2. .

  3. Every non-zero ideal of ℤp is of the form p^r ℤp for some integer r ≥ 0.

  4. The ideals of Part (c) satisfy the infinite strictly descending chain ℤp ⊋ pℤp ⊋ p^2 ℤp ⊋ p^3 ℤp ⊋ · · ·.

  5. ℤp is a local domain with the maximal ideal pℤp.

  6. The ideal p^r ℤp of Part (c) is the principal ideal of ℤp generated by p^r. In particular, ℤp is a local PID, that is, a discrete valuation domain (Exercise 2.133), with the residue field ℤp/pℤp ≅ 𝔽p.

2.149Compute the p-adic expansion of 1/3 in and of –2/5 in .
2.150Show that ℤ is dense in ℤp under the p-adic norm | |p, that is, show that given any a ∈ ℤp and real ε > 0, there exists x ∈ ℤ with |x – a|p < ε. Show also that ℚ is dense in ℚp.
2.151Prove the following assertions that establish that ℤp is the closure of ℤ in ℚp under | |p.
  1. Every sequence (an) of rational integers, Cauchy under | |p, converges in ℤp.

  2. If a sequence (an) of rational numbers, Cauchy under | |p, converges to x ∈ ℤp, then there exists a sequence (bn) of rational integers, Cauchy under | |p, that converges to x.

2.152Show that:
  1. The series converges in .

  2. The series converges in .

  3. in . [H]

  4. The series does not converge in .

  5. If and |a|p < 1, then .

2.153Prove that for any non-zero . [H]
2.154Prove that for any a ∈ ℤp the sequence (a^(p^n)) converges in ℤp. [H]
2.155Let p, q be primes with p ≠ q. Show that the fields ℚp and ℚq are not isomorphic.
2.156Let a be an integer congruent to 1 modulo 8. Show that there exists an α ∈ ℤ2 such that α^2 = a.
2.157Compute α ∈ ℤ3 with α^2 + α + 223 = 0 and α ≡ 4 (mod 243).
2.158Let p be an odd prime and . Show that the polynomial X2a has exactly root in .
2.159Show that the polynomial X^2 – p is irreducible in ℚp[X].
2.160

Teichmüller representative Let a ∈ ℤp. Show that there exists a unique α ∈ ℤp such that α^p = α and α ≡ a (mod p).

2.161Show that the algebraic closure of ℚp is of infinite extension degree over ℚp. [H]

2.15. Statistical Methods

Many attacks on cryptosystems involve statistical analysis of ciphertexts and also of data collected from the victim’s machine during one or more private-key operations. For a proper understanding of these analysis techniques, one requires some knowledge of statistics and random variables. In this section, we provide a quick overview of some statistical gadgets. We make the assumption that the reader is already familiar with the elementary notion of probability. We denote the probability of an event E by Pr(E).

2.15.1. Random Variables and Their Probability Distributions

An experiment whose outcome is random is referred to as a random experiment. The set of all possible outcomes of a random experiment is called the sample space of the experiment. For example, the outcomes of tossing a coin can be mapped to the set {H, T} with H and T standing respectively for head and tail. It is convenient to assign numerical values to the outcomes of a random experiment. Identifying head with 0 and tail with 1, one can view coin tossing as a random experiment with sample space {0, 1}. Some other random experiments include throwing a die (with sample space {1, 2, 3, 4, 5, 6}), the life of an electric bulb (with sample space [0, ∞), the set of all non-negative real numbers), and so on. Unless otherwise specified, we henceforth assume that sample spaces are subsets of ℝ.

A random variable is a variable which can assume (all and only) the values from a (given) sample space.

A discrete random variable can assume only countably many values, that is, the sample space SX of a discrete random variable X either is finite or has a bijection with ℕ, that is, we can enumerate the elements of SX as x1, x2, x3, . . ..

The probability distribution function or the probability mass function

fX : SX → [0, 1]

of a discrete random variable X assigns to each x in the sample space SX of X the probability of the occurrence of the value x in a random experiment.[21] We have fX(x) = Pr(X = x) for each x ∈ SX, and Σx∈SX fX(x) = 1.

[21] [a, b] is the closed interval consisting of all real numbers u satisfying aub. Similarly, the open interval (a, b) is the set of all real values u satisfying a < u < b. In order to make a distinction between the open interval (a, b) and the ordered pair (a, b), many—mostly Europeans—use the notation ]a, b[ for denoting open intervals.

A continuous random variable assumes an uncountable number of values, that is, the sample space SX of a continuous random variable X cannot be put in bijective correspondence with a subset of ℕ. Typically SX is an interval [a, b] or (a, b) with –∞ ≤ a < b ≤ +∞.

One does not assign individual probabilities Pr(X = x) to a value assumed by a continuous random variable X.[22] The probabilistic behaviour of X is in this case described by the probability density function fX : SX → [0, ∞),

[22] More correctly, Pr(X = x) = 0 for each x ∈ SX.

with the implication that the probability that X occurs in the interval [c, d] (or (c, d)) is given by the integral

Pr(c ≤ X ≤ d) = ∫cd fX(x) dx,

that is, by the area between the x-axis, the curve fX(x) and the vertical lines x = c and x = d. We have ∫SX fX(x) dx = 1.

It is sometimes useful to set fX(x) := 0 for x ∈ ℝ \ SX, so that fX is defined on the entire real line ℝ.

The cumulative probability distribution of a random variable X (discrete or continuous) is the function FX(x) := Pr(X ≤ x) for all x ∈ ℝ. If X is continuous, we have

FX(x) = ∫–∞x fX(u) du,

which implies that fX(x) = F′X(x) wherever fX is continuous.

2.15.2. Operations on Random Variables

Let X and Y be discrete random variables. The joint probability distribution of X, Y refers to a random variable Z with SZ = SX × SY. For z = (x, y), the probability of Z = z is denoted by fZ(z) = Pr(Z = z) = Pr(X = x, Y = y). The probability Pr(X = x, Y = y) stands for the probability that X = x and Y = y. The random variables X and Y are called independent, if

Pr(X = x, Y = y) = Pr(X = x) Pr(Y = y)

for all x, y.

Example 2.36.

Suppose that we have an urn containing three identical balls with labels 1, 2, 3. We draw two balls randomly from the urn. Let us denote the outcome of the first drawing by X and that of the second drawing by Y. We consider the joint distribution of X, Y in the following two cases:

  1. The balls are drawn with replacement, that is, after the first ball is drawn, it is returned to the urn (and the urn is shaken well) before the next ball is drawn. The joint probability distribution is now uniform: Pr(X = x, Y = y) = 1/9 for each of the nine pairs (x, y) with x, y ∈ {1, 2, 3}.

    In this case, the outcome of the second drawing is not influenced by the outcome of the first drawing; that is, X and Y are independent, and we have Pr(X = x, Y = y) = Pr(X = x) Pr(Y = y) = (1/3) × (1/3) = 1/9, as expected.

  2. The balls are drawn without replacement, that is, the ball obtained by the first drawing is not returned to the urn, before the second ball is drawn. In this case, the outcome of the second drawing is influenced by that of the first drawing in the sense that the same ball cannot be drawn on both occasions. Thus, X and Y are now dependent. This is revealed by the following joint probability distribution:

    x  y  Pr(X = x, Y = y)
    1  1  0
    1  2  1/6
    1  3  1/6
    2  1  1/6
    2  2  0
    2  3  1/6
    3  1  1/6
    3  2  1/6
    3  3  0
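The two joint distributions of this example can be encoded and tested for independence mechanically. A sketch in exact rational arithmetic (the helper names joint and independent are ours):

```python
from fractions import Fraction
from itertools import product

def joint(replacement):
    """Joint distribution Pr(X = x, Y = y) for two draws from {1, 2, 3}."""
    dist = {}
    for x, y in product((1, 2, 3), repeat=2):
        if replacement:
            dist[x, y] = Fraction(1, 9)          # uniform over the 9 pairs
        else:
            dist[x, y] = Fraction(0) if x == y else Fraction(1, 6)
    return dist

def independent(dist):
    """Check Pr(X = x, Y = y) = Pr(X = x) Pr(Y = y) for all pairs."""
    px = {x: sum(p for (a, _), p in dist.items() if a == x) for x in (1, 2, 3)}
    py = {y: sum(p for (_, b), p in dist.items() if b == y) for y in (1, 2, 3)}
    return all(dist[x, y] == px[x] * py[y] for x, y in dist)

print(independent(joint(True)), independent(joint(False)))  # True False
```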

For continuous random variables X and Y, the joint distribution is defined by the probability density function fX,Y (x, y), and the cumulative distribution is obtained by the double integral

FX,Y (c, d) = Pr(X ≤ c, Y ≤ d) = ∫–∞c ∫–∞d fX,Y (x, y) dy dx.

X and Y are independent, if fX,Y (x, y) = fX(x)fY (y) for all x, y. In this case, we also have FX,Y (c, d) = FX(c)FY (d) for all c, d.

Now, we define arithmetic operations on random variables. First, let X and Y be discrete random variables. The sum X + Y is defined to be a random variable U which assumes the values u = x + y for x ∈ SX and y ∈ SY with probability

fU(u) = Pr(U = u) = Σ Pr(X = x, Y = y), the sum being over all pairs (x, y) with x + y = u.

The product XY of X and Y is defined to be a random variable V which assumes the values v = xy for x ∈ SX and y ∈ SY with probability

fV(v) = Pr(V = v) = Σ Pr(X = x, Y = y), the sum being over all pairs (x, y) with xy = v.

For a non-zero α ∈ ℝ, the random variable W = αX assumes the values w = αx for x ∈ SX with probability

fW(w) = Pr(W = w) = Pr(X = x) = fX(x), where w = αx.

Example 2.37.

Let us consider the random variables X and Y of Example 2.36. For the sake of brevity, we denote Pr(X = x, Y = y) by Pxy. The distributions of U = X + Y in the two cases are as follows:

  1. Drawing with replacement:

    Pr(U = 2) = P11 = 1/9
    Pr(U = 3) = P12 + P21 = 2/9
    Pr(U = 4) = P13 + P22 + P31 = 1/3
    Pr(U = 5) = P23 + P32 = 2/9
    Pr(U = 6) = P33 = 1/9

  2. Drawing without replacement:

    Pr(U = 3) = P12 + P21 = 1/3
    Pr(U = 4) = P13 + P31 = 1/3
    Pr(U = 5) = P23 + P32 = 1/3
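Either table can be derived mechanically from the joint distribution by summing Pr(X = x, Y = y) over all pairs with x + y = u. A sketch for the without-replacement case:

```python
from fractions import Fraction

# Joint distribution for drawing without replacement (Example 2.36(2)).
P = {(x, y): Fraction(0) if x == y else Fraction(1, 6)
     for x in (1, 2, 3) for y in (1, 2, 3)}

# f_U(u) = sum of Pr(X = x, Y = y) over all pairs with x + y = u.
f_U = {}
for (x, y), pr in P.items():
    f_U[x + y] = f_U.get(x + y, Fraction(0)) + pr

# The mass sits at u = 3, 4, 5, each with probability 1/3.
print({u: p for u, p in sorted(f_U.items()) if p})
```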

Now, let us consider continuous random variables X and Y. In this case, it is easier to define first the cumulative distribution functions of U = X + Y, V = XY and W = αX, and then the probability density functions by taking derivatives. For example, FU(u) = Pr(X + Y ≤ u) is the integral of fX,Y (x, y) over the region x + y ≤ u, and fU(u) = F′U(u).

One can easily generalize sums and products to an arbitrary finite number of random variables. More generally, if X1, . . . , Xn are random variables and g : ℝn → ℝ, one can talk about the probability distribution or density function of the random variable g(X1, . . . , Xn). (See Exercise 2.163.)

Now, we introduce the important concept of conditional probability. Let X and Y be two random variables. To start with, suppose that they are discrete. We denote by f(x, y) = Pr(X = x, Y = y) the joint probability distribution function of X, Y. For y ∈ SY with Pr(Y = y) > 0, we define the conditional probability of X = x given Y = y as:

fX|y(x) := Pr(X = x | Y = y) = f(x, y)/fY (y).

For a fixed y ∈ SY, the probabilities fX|y(x), x ∈ SX, constitute the probability distribution function of the random variable X|y (X given Y = y). If X and Y are independent, f(x, y) = fX(x)fY (y) and so fX|y(x) = fX(x) for all x ∈ SX, that is, the random variables X and X|y have the same probability distribution. This is expected, because in this case the probability of X = x does not depend on whatever value y the variable Y takes.

If X and Y are continuous random variables with joint density f(x, y), and y is such that fY (y) > 0, the conditional probability density function of X|y (X given Y = y) is defined by

fX|y(x) := f(x, y)/fY (y).

Again if X and Y are independent, we have fX|y(x) = fX(x) for all x, y.

For a fixed x with fX(x) > 0, one can likewise define the conditional probabilities fY|x(y) := f(x, y)/fX(x) for all y ∈ SY.

Let X and Y be discrete random variables with joint distribution f(x, y). Also let Γ ⊆ SX and Δ ⊆ SY. One defines the probability fX(Γ) as:

fX(Γ) := Σx∈Γ fX(x).

The joint probability f(Γ, Δ) is defined as:

f(Γ, Δ) := Σx∈Γ Σy∈Δ f(x, y).

If Γ = {x} is a singleton, we prefer to write f(x, Δ) instead of f({x}, Δ). Similarly, f(Γ, y) stands for f(Γ, {y}). We also define the conditional distributions:

fX|Δ(Γ) := f(Γ, Δ)/fY (Δ) and fY|Γ(Δ) := f(Γ, Δ)/fX(Γ).

We abbreviate fX|Δ(Γ) as Pr(Γ|Δ) and fY|Γ(Δ) as Pr(Δ|Γ).

Theorem 2.64. Bayes rule

Let X, Y be discrete random variables and Δ ⊆ SY with fY (Δ) > 0. Also let Γ1, . . . , Γn form a partition of SX with fX(Γi) > 0 for all i = 1, . . . , n. Then we have:

fX|Δ(Γi) = fY|Γi(Δ) fX(Γi) / (Σj=1n fY|Γj(Δ) fX(Γj)),

that is, in terms of probability:

Pr(Γi|Δ) = Pr(Δ|Γi) Pr(Γi) / (Σj=1n Pr(Δ|Γj) Pr(Γj)).
Proof

Pr(Γi, Δ) = Pr(Δ|Γi) Pr(Γi) = Pr(Γi|Δ) Pr(Δ). So it is sufficient to show that Pr(Δ) equals the sum in the denominator. The event Δ is the union of the pairwise disjoint events (Γj, Δ), j = 1, . . . , n, and so Pr(Δ) = Σj=1n Pr(Γj, Δ) = Σj=1n Pr(Δ|Γj) Pr(Γj).

The Bayes rule relates the a priori probabilities Pr(Γj) and Pr(Δ|Γj) to the a posteriori probabilities Pr(Γi|Δ). The following example demonstrates this terminology.

Example 2.38.

Consider the random experiment of Example 2.36(2). Take Γj := {j} for j = 1, 2, 3 and Δ := {2, 3}. We have the following a priori probabilities:

Pr(Γj) = Probability of getting ball j in the first draw = 1/3,
Pr(Δ|Γ1) = Probability of getting the second or the third ball in the second draw, given that the first ball is obtained in the first draw = 1,
Pr(Δ|Γ2) = Probability of getting the second or the third ball in the second draw, given that the second ball is obtained in the first draw = 1/2,
Pr(Δ|Γ3) = Probability of getting the second or the third ball in the second draw, given that the third ball is obtained in the first draw = 1/2.

The a posteriori probability Pr(Γ1|Δ) that the first ball was obtained in the first draw, given that the ball obtained in the second draw is the second or the third one, is calculated using the Bayes rule as:

Pr(Γ1|Δ) = (1 × (1/3)) / (1 × (1/3) + (1/2) × (1/3) + (1/2) × (1/3)) = 1/2.

One can similarly calculate Pr(Γ2|Δ) = Pr(Γ3|Δ) = 1/4. This is expected, since the only events (x, y) consistent with Δ are the four equiprobable possibilities (1, 2), (1, 3), (2, 3) and (3, 2).
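The posterior probabilities just computed follow mechanically from the Bayes rule; a sketch in exact arithmetic, with the priors Pr(Γj) and likelihoods Pr(Δ|Γj) of this example:

```python
from fractions import Fraction

prior = {1: Fraction(1, 3), 2: Fraction(1, 3), 3: Fraction(1, 3)}  # Pr(Gamma_j)
lik = {1: Fraction(1), 2: Fraction(1, 2), 3: Fraction(1, 2)}       # Pr(Delta | Gamma_j)

def posterior(i):
    """Bayes rule: Pr(Gamma_i | Delta)."""
    total = sum(lik[j] * prior[j] for j in prior)   # = Pr(Delta)
    return lik[i] * prior[i] / total

print(posterior(1), posterior(2), posterior(3))  # 1/2 1/4 1/4
```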

2.15.3. Expectation, Variance and Correlation

Let X be a random variable. The expectation E(X) of X is defined as E(X) := Σx∈SX x fX(x) if X is discrete, and E(X) := ∫SX x fX(x) dx if X is continuous.

E(X) is also called the (arithmetic) mean or average of X. One uses the alternative symbols μX and X̄ to denote E(X). More generally, let X1, . . . , Xn be n random variables with joint probability distribution/density function f(x1, . . . , xn). Also let g : ℝn → ℝ. We define the following expectations:

X is discrete: E(g(X1, . . . , Xn)) := Σ g(x1, . . . , xn) f(x1, . . . , xn), the sum ranging over all tuples (x1, . . . , xn) in the joint sample space.

X is continuous: E(g(X1, . . . , Xn)) := ∫ · · · ∫ g(x1, . . . , xn) f(x1, . . . , xn) dx1 · · · dxn.

Let g(X) and h(Y) be real polynomial functions of the random variables X and Y, and let α ∈ ℝ. Then

E(g(X) + h(Y)) = E(g(X)) + E(h(Y)),
E(g(X)h(Y)) = E(g(X)) E(h(Y)) if X and Y are independent,
E(αg(X)) = α E(g(X)).

Let us derive the sum and product formulas for discrete variables X and Y. For the sum,

E(g(X) + h(Y)) = Σx Σy (g(x) + h(y)) f(x, y) = Σx g(x) Σy f(x, y) + Σy h(y) Σx f(x, y) = Σx g(x) fX(x) + Σy h(y) fY (y) = E(g(X)) + E(h(Y)).

If X and Y are independent, then

E(g(X)h(Y)) = Σx Σy g(x)h(y) fX(x) fY (y) = (Σx g(x) fX(x)) (Σy h(y) fY (y)) = E(g(X)) E(h(Y)).

The variance Var(X) of a random variable X is defined as

Var (X) := E[(X – E(X))2].

From the observation that E[(X – E(X))2] = E[X2 – 2 E(X)X + [E(X)]2] = E(X2) – 2 E(X) E(X) + [E(X)]2, we derive the computational formula:

Var (X) = E[X2] – [E(X)]2.

Var(X) is a measure of how the values of X are dispersed about the mean E(X) and is always a non-negative quantity. The (non-negative) square root of Var(X) is called the standard deviation σX of X:

σX := √Var(X).

The following formulas can be easily verified:

Var(X + α) = Var(X),
Var(αX) = α^2 Var(X),
Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y),

where α ∈ ℝ, and where the covariance Cov(X, Y) of X and Y is defined as:

Cov(X, Y) := E[(X – E(X))(Y – E(Y))] = E(XY) – E(X) E(Y).

Normalized covariance is a measure of correlation between the two random variables X and Y. More precisely, the correlation coefficient ρX,Y is defined as:

ρX,Y := Cov(X, Y)/(σX σY).

If X and Y are independent, E(XY) = E(X) E(Y), so that Cov(X, Y) = 0 and hence ρX,Y = 0. The converse is, however, not true, that is, ρX,Y = 0 does not necessarily imply that X and Y are independent. ρX,Y is a real value in the interval [–1, 1] and is a measure of the linear relationship between X and Y. If larger (resp. smaller) values of X are (in general) associated with larger (resp. smaller) values of Y, then ρX,Y is positive. On the other hand, if larger (resp. smaller) values of X are (in general) associated with smaller (resp. larger) values of Y, then ρX,Y is negative.

Example 2.39.

Once again consider the drawing of two balls from an urn containing three balls labelled {1, 2, 3} (Examples 2.36, 2.37 and 2.38). Look at the second case (drawing without replacement). We use the shorthand notation Pxy for Pr(X = x, Y = y). The individual probability distributions of X and Y can be obtained from the joint distribution as follows:

Pr(X = 1) = P11 + P12 + P13 = 0 + (1/6) + (1/6) = 1/3
Pr(X = 2) = P21 + P22 + P23 = (1/6) + 0 + (1/6) = 1/3
Pr(X = 3) = P31 + P32 + P33 = (1/6) + (1/6) + 0 = 1/3

Pr(Y = 1) = P11 + P21 + P31 = 0 + (1/6) + (1/6) = 1/3
Pr(Y = 2) = P12 + P22 + P32 = (1/6) + 0 + (1/6) = 1/3
Pr(Y = 3) = P13 + P23 + P33 = (1/6) + (1/6) + 0 = 1/3

Thus E(X) = 1 × (1/3) + 2 × (1/3) + 3 × (1/3) = 2. Similarly, E(Y) = 2. Therefore, E(X + Y) = E(X) + E(Y) = 4. This can also be verified by direct calculations: E(X + Y) = 3 × (1/3) + 4 × (1/3) + 5 × (1/3) = 4.

E(X^2) = E(Y^2) = 1^2 × (1/3) + 2^2 × (1/3) + 3^2 × (1/3) = 14/3 and Var(X) = Var(Y) = (14/3) – 2^2 = 2/3. The probability distribution for XY is

Pr(XY = 2) = P12 + P21 = 1/3
Pr(XY = 3) = P13 + P31 = 1/3
Pr(XY = 6) = P23 + P32 = 1/3,

so that E(XY) = 2 × (1/3) + 3 × (1/3) + 6 × (1/3) = 11/3. Therefore, Cov(X, Y) = E(XY) – E(X) E(Y) = (11/3) – 2 × 2 = –1/3, that is,

ρX,Y = (–1/3)/(√(2/3) √(2/3)) = –1/2.

The negative correlation between X and Y is expected. If X = 1 (small), Y takes the bigger values (2, 3). On the other hand, if X = 3 (large), Y assumes the smaller values (1, 2). Of course, the correlation is not perfect, since for X = 2 the value of Y can be smaller (1) or larger (3). A moderate negative correlation of –1/2 between X and Y is therefore reasonable.
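All the moments in this example can be computed mechanically from the joint table. A sketch in exact arithmetic (the helper E is our own shorthand for expectation under the joint distribution):

```python
from fractions import Fraction

# Joint distribution of (X, Y) when drawing without replacement.
P = {(x, y): Fraction(0) if x == y else Fraction(1, 6)
     for x in (1, 2, 3) for y in (1, 2, 3)}

def E(g):
    """Expectation of g(X, Y) under the joint distribution P."""
    return sum(pr * g(x, y) for (x, y), pr in P.items())

EX, EY = E(lambda x, y: x), E(lambda x, y: y)
VarX = E(lambda x, y: x * x) - EX ** 2
VarY = E(lambda x, y: y * y) - EY ** 2
Cov = E(lambda x, y: x * y) - EX * EY
rho = Cov / VarX            # valid here because Var(X) = Var(Y)
print(EX, VarX, Cov, rho)   # 2 2/3 -1/3 -1/2
```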

2.15.4. Some Famous Probability Distributions

We now describe some probability distributions that occur frequently in statistical theory and in practice. Some other useful probability distributions are considered in Exercises 2.169, 2.170 and 2.171.

Uniform distribution

A discrete uniform random variable U has sample space SU := {x1, . . . , xn} and probability distribution fU(xi) = 1/n for each i = 1, . . . , n.

A continuous uniform random variable U has sample space SU and probability density function fU(x) = 1/A for x ∈ SU,

where A > 0 is the size[23] of SU. For example, if SU is the real interval [a, b] for a < b, we have fU(x) = 1/(b – a) for a ≤ x ≤ b.

[23] If SU ⊆ ℝ, “size” means length. If SU ⊆ ℝ^2 or SU ⊆ ℝ^3, “size” refers to area or volume respectively. We assume that the size of SU is “measurable”.

In this case, we have

E(U) = (a + b)/2 and Var(U) = (b – a)^2/12.

Uniform random variables often occur naturally. For example, if we throw an unbiased die, the six possible outcomes (1 through 6) are equally likely, that is, each possible outcome has the probability 1/6. Similarly, if a real number is chosen randomly in the interval [0, 1], we have a continuous uniform random variable. The built-in C library call rand() (pretends to) return an integer between 0 and 2^31 – 1, each with equal probability (namely, 2^–31).

Bernoulli distribution

The Bernoulli random variable B = B(n, p) is a discrete random variable characterized by two parameters n ∈ ℕ and p ∈ [0, 1], where p stands for the probability of a certain event E and n represents the number of (independent) trials. It is assumed that the probability of E remains constant (namely, p) in each of the n trials. The sample space SB = {0, 1, . . . , n} comprises the (exact) numbers of occurrences of E in the n trials. B has the probability distribution

fB(x) = C(n, x) p^x (1 – p)^(n–x) for x = 0, 1, . . . , n,

as follows from simple combinatorial arguments. Here C(n, x) denotes the binomial coefficient. The mean and variance of B are:

E(B) = np and Var(B) = np(1 – p).

The Bernoulli distribution is also called the binomial distribution.
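The stated mean and variance can be checked by summing directly over the distribution. A brief sketch (the parameter values n = 10, p = 0.3 are our own choice):

```python
from math import comb

def binomial_pmf(n, p):
    """f_B(x) = C(n, x) p^x (1 - p)^(n - x) for x = 0, 1, ..., n."""
    return [comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]

n, p = 10, 0.3
f = binomial_pmf(n, p)
mean = sum(x * f[x] for x in range(n + 1))
var = sum(x * x * f[x] for x in range(n + 1)) - mean ** 2
print(mean, var)  # close to np = 3.0 and np(1 - p) = 2.1
```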

Normal distribution

The normal random variable or the Gaussian random variable N = N(μ, σ^2) is a continuous random variable characterized by two real parameters μ and σ with σ > 0. The density function of N is

fN(x) = (1/(σ√(2π))) e^(–(x – μ)^2/(2σ^2)), x ∈ ℝ.

The cumulative distribution for N can be expressed in terms of the error function erf():

FN(x) = (1/2) [1 + erf((x – μ)/(σ√2))].

The error function does not have a known closed-form expression. Figure 2.3 shows the curves for fN(x) and FN(x) for the parameter values μ = 0 and σ = 1 (in this case, N is called the standard normal variable).

Figure 2.3. Standard normal distribution


Some statistical properties of N are:

E(N) = μ and Var(N) = σ^2.

The curve fN (x) is symmetric about x = μ. Most of the area under the curve is concentrated in the region μ – 3σ ≤ x ≤ μ + 3σ. More precisely:

Pr(μ – σ ≤ N ≤ μ + σ) ≈ 0.68,
Pr(μ – 2σ ≤ N ≤ μ + 2σ) ≈ 0.95,
Pr(μ – 3σ ≤ N ≤ μ + 3σ) ≈ 0.997.
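These three probabilities follow from the erf expression for FN; a sketch using math.erf from the standard library:

```python
from math import erf, sqrt

def normal_cdf(x, mu=0.0, sigma=1.0):
    """F_N(x) = (1/2)(1 + erf((x - mu)/(sigma * sqrt(2))))."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

# Pr(mu - k*sigma <= N <= mu + k*sigma) for k = 1, 2, 3.
for k in (1, 2, 3):
    print(k, round(normal_cdf(k) - normal_cdf(-k), 4))
```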

Many distributions occurring in practice (and in nature) approximately follow normal distributions. For example, the height of (adult) people in a given community is roughly normally distributed. Of course, the height of a person cannot be negative, whereas a normal random variable may assume negative values. But, in practice, the probability that such an approximating normal variable assumes a negative value is typically negligibly low.

2.15.5. Sample Mean, Variation and Correlation

In practice, we often do not know a priori the probability distribution or density function of a random variable X. In some cases, we do not have the complete data, whereas in some other cases we would need an infinite amount of data to obtain the actual probability distribution of a random variable. For example, let X represent the life of an electric bulb manufactured by a given company in the last ten years. Even though there are only finitely many such bulbs, and even if we assume that it is possible to trace the working of every such bulb, we have to wait until all these bulbs burn out before we know the actual distribution of X. That is certainly impractical. Instead, if we have data on the life-times of some sample bulbs, we can approximate the properties of X by those of the samples.

Suppose that S := (x1, x2, . . . , xn) is a sample of size n. We assume that all xi are real numbers. We define the following quantities for S:

mean(S) := (x1 + x2 + · · · + xn)/n and Var(S) := ((x1 – mean(S))^2 + · · · + (xn – mean(S))^2)/n = mean(S2) – (mean(S))^2.

Here mean(S2) is the mean of the collection S2 := (x1^2, x2^2, . . . , xn^2).

If T := (y1, y2, . . . , yn) is another sample of real numbers of the same size n, the (linear) relationship between S and T is measured by the following quantities:

Cov(S, T) := mean(ST) – mean(S) mean(T) and r(S, T) := Cov(S, T)/(√Var(S) √Var(T)).

Here mean(ST) is the mean of the collection ST := (x1y1, x2y2, . . . , xnyn).
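These sample quantities translate directly into code. A minimal sketch (the helper names mean, variance and correlation are ours, and the 1/n population convention is used, matching the formulas above):

```python
from math import sqrt

def mean(xs):
    """Arithmetic mean of a sample."""
    return sum(xs) / len(xs)

def variance(xs):
    """Sample variance with the 1/n convention."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def correlation(xs, ys):
    """Sample correlation coefficient of two equal-length samples."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / (sqrt(variance(xs)) * sqrt(variance(ys)))

S = [1.0, 2.0, 3.0, 4.0]
T = [2.0, 4.1, 5.9, 8.2]
print(mean(S), variance(S), round(correlation(S, T), 3))
```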

An important property of the normal distribution is the following:

Theorem 2.65. Central limit theorem

Let X be any random variable with mean μ and variance σ^2, and let n ∈ ℕ. The mean of a random sample S of size n chosen according to the distribution of X approximately follows the normal distribution N(μ, σ^2/n). The larger the sample size n, the better this approximation.
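The theorem can be illustrated by simulation: draw many samples of size n from a uniform distribution and check that the sample means cluster around μ with variance close to σ^2/n. A seeded sketch (the sample and trial counts are our own choices):

```python
import random

random.seed(1)

# X uniform on [0, 1]: mu = 1/2, sigma^2 = 1/12.
mu, sigma2 = 0.5, 1.0 / 12.0
n, trials = 100, 2000

# Record the mean of each of many samples of size n.
means = [sum(random.random() for _ in range(n)) / n for _ in range(trials)]

emp_mean = sum(means) / trials
emp_var = sum((m - emp_mean) ** 2 for m in means) / trials
print(emp_mean, emp_var, sigma2 / n)  # emp_var should be close to sigma^2/n
```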

Exercise Set 2.15

2.162An urn contains n1 red balls and n2 black balls. We draw k balls sequentially and randomly from the urn, where 1 ≤ kn1 + n2.
  1. If the balls are drawn with replacement, what is the probability that the k-th ball drawn from the urn is red?

  2. If the balls are drawn without replacement, what is the probability that the k-th ball drawn from the urn is red?

2.163Let X and Y be the random variables of Example 2.36. For each of the two cases, calculate the probability distribution functions, expectations and variances of the following random variables:
  1. XY

  2. 2X + 3Y

  3. X2

  4. X2 + 2XY + Y2

  5. (X + Y)2

2.164Let X and Y be continuous random variables, g(X) and h(Y) non-constant real polynomials and α, β, γ ∈ ℝ. Prove that:
E(g(X) + h(Y)) = E(g(X)) + E(h(Y)).
E(g(X)h(Y)) = E(g(X)) E(h(Y)), if X and Y are independent.
E(αg(X)) = α E(g(X)).
Var(αX + βY + γ) = α^2 Var(X) + β^2 Var(Y), if X and Y are independent.

2.165Let X be a random variable and Y := αX + β for some α, β ∈ ℝ with α ≠ 0. What is ρX,Y?
2.166
  1. Let X and Y be discrete random variables with joint probability distribution function f(x, y). Show that the probability distributions of X and Y can be obtained as

fX(x) = Σy∈SY f(x, y) and fY (y) = Σx∈SX f(x, y).

  2. If X and Y are continuous random variables with joint density function f(x, y), show that the density functions of X and Y are given by

fX(x) = ∫ f(x, y) dy and fY (y) = ∫ f(x, y) dx.

    The functions fX and fY are called the marginal probability distribution (or density function) of X and Y respectively.

2.167Let X and Y be continuous random variables whose joint distribution is the uniform distribution on the triangle 0 ≤ x ≤ y ≤ 1.
  1. Compute the marginal distributions fX and fY.

  2. Compute E(X), E(Y), Var(X), Var(Y), Cov(X, Y) and ρX,Y.

2.168Let X, Y, Z be random variables. Show that:
Cov(X, Y) = Cov(Y, X).
ρX,Y = ρY,X.
Cov(X, X) = Var(X).
Cov(X, Y + Z) = Cov(X, Y) + Cov(X, Z).
Cov(X, X + Y) = Var(X) + Cov(X, Y).
Cov(X, X + Y) = Var(X), if X and Y are independent.

2.169

Geometric distribution Assume that in each trial of an experiment, an event E has a constant probability p > 0 of occurrence. Let G = G(p) denote the random variable with SG = {1, 2, 3, . . .} and with fG(x) equal to the probability that E occurs for the first time in the x-th trial (that is, after exactly x – 1 failures). Show that:

fG(x) = (1 – p)^(x–1) p, E(G) = 1/p and Var(G) = (1 – p)/p^2.

What if p = 0?
2.170

Poisson distribution Let P = P(λ) be the discrete random variable with SP = {0, 1, 2, . . .} and with fP(x) = e^–λ λ^x/x!, where λ is a positive real constant. Show that E(P) = Var(P) = λ.

2.171Exponential distribution
  1. Let X = X(λ) be the continuous random variable with density

fX(x) = λ e^(–λx) for x ≥ 0 (and fX(x) = 0 for x < 0),

    where λ is a positive real constant. Show that:

E(X) = 1/λ and Var(X) = 1/λ^2.

  2. A random variable Y with SY = [0, ∞) is said to be memoryless, if

    Pr(Y > s + t | Y > s) = Pr(Y > t) for all s, t ≥ 0.

Show that the exponential variable X of Part (a) is memoryless.

2.172

The birthday paradox Let S be a finite set of cardinality n.

  1. Show that the probability that k < n elements, drawn at random from S (with replacement), are (pairwise) distinct is

p = (1 – 1/n)(1 – 2/n) · · · (1 – (k – 1)/n).

  2. Use the inequality 1 – x ≤ e^–x for any real number x to show that p ≤ e^(–k(k–1)/(2n)).

  3. Deduce that p ≤ 1/2 if k(k – 1) ≥ (2 ln 2)n, and that p ≤ 0.136 for k(k – 1) ≥ 4n.

    (The birthday paradox states that if only 23 people are chosen at random, there is a chance as high as 50 per cent that at least two of them have the same birthday.)
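The bound and the 23-people figure are easy to check numerically; a sketch (the helper name p_distinct is ours):

```python
from math import exp

def p_distinct(n, k):
    """Probability that k draws (with replacement) from n values are pairwise distinct."""
    p = 1.0
    for i in range(1, k):
        p *= 1.0 - i / n
    return p

p = p_distinct(365, 23)
print(round(1 - p, 4))                  # chance of a shared birthday, just over 1/2
print(p <= exp(-23 * 22 / (2 * 365)))   # the e^(-k(k-1)/(2n)) bound holds
```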

Chapter Summary

This chapter provides the foundations of public-key cryptology. The long compilation of mathematical concepts presented here is indispensable for understanding the topics developed in the following chapters.

This chapter begins with the basic concepts of sets, functions and relations. We also present the fundamental axioms of mathematics. Although the curricula of plus-two courses of many examination boards do include these topics, we have discussed them here in order to make our treatment self-contained.

Next comes a study of groups which are sets with binary operations satisfying some nice properties (associativity, identity, inverse and optionally commutativity). Groups are extremely important for cryptology. In particular, all discrete-log-based cryptosystems use suitable groups. Subgroups, cosets and formation of quotient groups constitute a prototypical feature that illustrates the basic paradigm of modern algebra. Secure cryptographic algorithms on groups rely on the availability of elements of large orders: for example, generators of big cyclic groups. We study these topics at length. Finally, we present Sylow’s theorem. For us, this theorem has only theoretical significance; it is used for proving some other theorems.

A set with a single operation (like a group) is often too restrictive. Many mathematical structures we are familiar with (like integers, polynomials) are endowed with two basic operations: addition and multiplication. A set with two such (compatible) operations is called a ring. A study of rings, fields, ideals and quotient rings is essential in algebra (and so in cryptography too). Three important types of rings, namely unique factorization domains, principal ideal domains and Euclidean domains, are also discussed. Euclidean division is an important property of integers and polynomials, and is useful from a computational perspective.

Then, as a specific example, we study the properties of , the ring of integers. We concentrate mostly on elementary properties of integers like divisibility, congruence, Chinese remainder theorem, Fermat’s and Euler’s theorems, quadratic residues and the law of quadratic reciprocity. We finally discuss some assorted topics from analytic number theory. In cryptography, we require many big randomly generated primes. The prime number theorem guarantees that there is essentially an abundant source of primes. Smooth integers (that is, integers having only small prime divisors) are useful for modern algorithms that compute factorization and discrete logarithms. We present an estimate on the density of smooth integers. The last topic we study is the Riemann hypothesis and its generalizations. This yet unproven hypothesis has a bearing on the running times of many number-theoretic algorithms relevant to cryptology.

The next example is the ring of polynomials over a ring. Polynomials over a field admit Euclidean division and consequently unique factorization. Irreducible polynomials are useful for constructing field extensions. Extension fields of characteristic 2 are quite frequently used in cryptographic systems.

We subsequently study the theory of vector spaces. Linear transformations are appropriate maps between vector spaces and necessitate the theory of matrices. Matrix algebra is widely useful in cryptology as it is in any other branch of algorithmic computer science. Algorithms to solve linear systems over rings and fields constitute a basic computational tool. A study of modules and algebras at the end of this section is mostly theoretical and can be avoided if the reader is willing to accept some theorems without proofs.

In the next section, we discuss the theory of field extensions. As mentioned earlier, cryptography relies heavily on extension fields of characteristic 2. Some related topics include splitting fields and algebraic closure of fields. At the end of this section, we have a short theoretical treatment of Galois theory.

Many popular cryptosystems are based on the multiplicative groups of finite fields. We study these fields as the next topic. Polynomials over finite fields are extremely useful for the construction and representation of finite fields. At the end of this section, we discuss several ways in which (elements of) finite fields can be represented in a computer’s memory. This study expedites the design, analysis and efficient implementation of finite-field arithmetic.

With elliptic- and hyperelliptic-curve cryptography having gained popularity in recent years, one needs to study the theory of plane algebraic curves. This is what we do in the next three sections. To start with, we define affine and projective spaces and curves. Going from the affine space to the projective space is necessitated by a systematic (algebraic) inclusion of points at infinity on a plane curve. We also discuss the theory of divisors and the Jacobian on plane curves. For elliptic curves, the Jacobian can be replaced by the equivalent group described in terms of the chord and tangent rule. For hyperelliptic curves, on the other hand, we have little option other than understanding the Jacobian itself.

Two kinds of elliptic curves that must be avoided in cryptography are supersingular curves and anomalous curves. The elliptic curve group (over a finite field) is the basic set used in elliptic curve cryptosystems. The possible orders (cardinalities) of these groups are constrained by Hasse’s theorem. The structure theorem establishes that an elliptic curve group (over a finite field) is not necessarily cyclic, but has a rank of at most two.

We then study Jacobians of hyperelliptic curves over finite fields. This study supplements the theory of divisors on general curves. Reduced and semi-reduced divisors are expedient for the representation of the elements in the Jacobian of a hyperelliptic curve.

Many popular cryptosystems (including RSA) derive their security (presumably) from the intractability of the integer factorization problem. The best algorithm known to date for factoring integers is the number-field sieve method. An understanding of this algorithm requires knowledge of number fields and number rings. We devote a section to the study of these mathematical objects. We start with some necessary commutative algebra including localization, integral dependence and Noetherian rings. Next, we deal with Dedekind domains. All number rings are Dedekind domains in which ideals admit unique factorization. We also discuss the factorization of ideals in number rings generated by rational primes and the structure of units in number rings (Dirichlet’s unit theorem).

The next section is a gentle introduction to the theory of p-adic numbers. These numbers are useful, for example, for designing attacks against elliptic curve cryptosystems.

In the last section, we summarize some statistical tools. Under the assumption that the reader is already familiar with the elementary notion of probability, we discuss properties of random variables and of some common probability distributions (including the uniform and normal distributions). The birthday paradox, described in an exercise, is often useful in cryptographic contexts (for example, in collision attacks on hash functions).

That is the end of this chapter. The compilation may initially look long and boring, perhaps intimidating too. The unfortunate reality is that public-key cryptology is mathematical, and it is arguably better to treat it in the formal way. If the reader is not comfortable with mathematics (in general), cryptology is perhaps not her cup of tea. An elementary approach to cryptology is what many other books have adopted. This book aims at being different in that respect. It is up to the reader to decide to what level of detail she is willing to study cryptography.

Suggestions for Further Reading

Knowledge is of two kinds. We know a subject ourselves, or we know where we can find information upon it.

—Samuel Johnson

In this chapter, we have summarized the basic mathematical facts that cryptologists are expected to know in order to have a decent understanding of present-day public-key technology. Our discussion has often been more intuitive than mathematically complete. A reader willing to gain further insight into these areas should look at materials written specifically to deal with these specialized topics. Here are our (biased) suggestions.

There are numerous textbooks on introductory algebra. The books by Herstein [125], Fraleigh [96], Dummit and Foote [81], Hungerford [133] and Adkins and Weintraub [1] are some of our favourites. The algebra of commutative rings with identity (rings by our definition) is called commutative algebra and is the basis for learning advanced areas of mathematics like algebraic geometry and algebraic number theory. A serious study of these disciplines demands more in-depth knowledge of commutative algebra than we have presented in Section 2.13.1. Atiyah and MacDonald’s book [14] is a de facto standard on commutative algebra. Hoffman and Kunze’s book [127] is a good reference for linear algebra and matrix algebra.

Elementary number theory deals with the theory of (natural) numbers without using sophisticated techniques from complex analysis and algebra. Zuckerman et al. [316] can be consulted for a lucid introduction to this subject. The books by Burton [42] and Mollin [207] are good alternatives.

A thorough mathematical treatment of finite fields can be found in the books by Lidl and Niederreiter [179, 180], of which the second also deals with computational issues. Other books with a computational flavour include those by Menezes [191] and by Shparlinski [274]. Also see the paper [273] by Shparlinski.

The use of elliptic curves in cryptography was proposed by Koblitz [150] and Miller [205], and that of hyperelliptic curves by Koblitz [151]. A fair mathematical understanding of elliptic curves relies on knowledge of commutative algebra (see above) and algebraic geometry. Hartshorne’s book [124] is a detailed introduction to algebraic geometry. Fulton’s book [99] on algebraic curves is another good reference. Rigorous mathematical treatments of elliptic curves can be found in Silverman’s books [275, 276]. The book by Koblitz [152] is elementary, but has a somewhat different focus than needed in cryptology. By far, the best short-cut is the recent textbook by Washington [298]. Some other books by Koblitz [150, 153, 154], Blake et al. [24], Menezes [192] and Hankerson et al. [123] are written for non-experts in algebraic geometry (and hence lack mathematical details), but are good from a computational viewpoint. The expository reports [46, 47] by Charlap et al. provide a nice elementary introduction to elliptic curves. For hyperelliptic curves, on the other hand, no such books are available. Koblitz’s book [154] includes a chapter on hyperelliptic curves. In addition, an appendix in the same book, written by Menezes et al. much in the style of Charlap et al. [46, 47], provides an introductory and elementary coverage.

In an oversimplified sense, algebraic number theory deals with the study of number fields. The books by Janusz [140], Lang [160], Mollin [208] and Ribenboim [251] go well beyond what we cover in Section 2.13. Also see [89]. For a more modern and sophisticated treatment, look at Neukirch’s book [216]. A book dedicated to p-adic numbers is due to Koblitz [149]. Course notes from one of the authors of this book can also be useful in this regard. The notes are freely downloadable from:

http://www.facweb.iitkgp.ernet.in/~adas/IITK/course/MTH617/SS02/

Analytic number theory deals with the application of complex analytic techniques to solve problems in number theory. Although we do not explicitly need this branch of mathematics (apart from a few theorems that we mention without proofs), it is rather important for the study of numbers. Consult the books by Apostol [12] and by Ireland and Rosen [136] for this. Also see [249]. For complex analysis, we recommend the book by Ahlfors [6].

Feller’s celebrated book [92] is a classical reference on probability theory. Grinstead and Snell’s book [121] is available on the Internet.

3. Algebraic and Number-theoretic Computations

3.1 Introduction
3.2 Complexity Issues
3.3 Multiple-precision Integer Arithmetic
3.4 Elementary Number-theoretic Computations
3.5 Arithmetic in Finite Fields
3.6 Arithmetic on Elliptic Curves
3.7 Arithmetic on Hyperelliptic Curves
3.8 Random Numbers
 Chapter Summary
 Suggestions for Further Reading

From the start there has been a curious affinity between mathematics, mind and computing . . . It is perhaps no accident that Pascal and Leibniz in the seventeenth century, Babbage and George Boole in the nineteenth, and Alan Turing and John von Neumann in the twentieth – seminal figures in the history of computing – were all, among their other accomplishments, mathematicians, possessing a natural affinity for symbol, representation, abstraction and logic.

—Doron Swade [295]

. . . the laws of physics and of logic . . . the number system . . . the principle of algebraic substitution. These are ghosts. We just believe in them so thoroughly they seem real.

—Robert M. Pirsig [233]

The world is continuous, but the mind is discrete.

—David Mumford

3.1. Introduction

Now that we have studied the properties of important mathematical objects that play vital roles in public-key cryptology, it is time to concentrate on the algorithmic and implementation issues for working with these objects. We need well-defined schemes (data structures) to represent these objects and well-defined procedures (algorithms) to manipulate them. While a theoretical analysis of the performance of our data structures and algorithms is of great concern, it still leaves us in the abstract domain. In the long run, one has to translate the abstract statements in the algorithms to machine code that the computer understands, and this is where the implementation tidbits come into the picture. It is our personal experience that a naive implementation of an algorithm may run a hundred times slower than a carefully optimized implementation of the same algorithm. In certain specific applications (like those based on smart cards), where memory is a scarce resource, one should also pay attention to the storage requirements of the data structures and code segments. This chapter is an introduction to all these specialized topics.

Before we proceed further, certain comments are in order. In this book, we describe algorithms using a pseudocode that closely resembles the syntax of the programming language C. The biggest difference between C and our pseudocode is that we have given preference to mathematical notations in place of C syntax. For example, = means equality in our codes, whereas assignment is denoted by :=. Similarly, our while and for loops look more human-readable, for example, for i = 0, 1, . . . , m – 1 instead of C’s for (i=0; i<m; i++). In order to understand our pseudocode, a knowledge of C (or a similar programming language) is helpful, but not essential, on the part of the reader.

For certain implementations, we assume that the target machine carries out 32-bit 2’s-complement arithmetic. This is indeed true for most modern PCs and workstations. By the term word, we mean a 32-bit unit in the computer memory. We will also assume that the compiler provides facilities for storing and doing arithmetic with unsigned 64-bit integers. Though this is not an ANSI C feature, most popular compilers used today do support such a built-in data type (examples: unsigned __int64 for the Microsoft Visual C++ compiler and unsigned long long for the GNU C compiler). Though it is apparently desirable to be more generic and to avoid these specific assumptions about the machine and the compiler, our exposition highlights the power of fine-tuning based on knowledge of the underlying system.
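As an illustration of how such a 64-bit type is used, the following C99 sketch (using the standard uint64_t in place of the compiler-specific types just mentioned; the function name is ours) splits the exact product of two 32-bit words into its high and low words:

```c
#include <stdint.h>

/* Split the exact product of two 32-bit words into high and low
   words.  The cast to uint64_t is essential: without it the
   multiplication is performed in 32 bits and the high word is lost. */
void mul32(uint32_t x, uint32_t y, uint32_t *hi, uint32_t *lo)
{
    uint64_t p = (uint64_t)x * y;   /* full 64-bit product */
    *lo = (uint32_t)p;              /* least significant 32 bits */
    *hi = (uint32_t)(p >> 32);      /* most significant 32 bits */
}
```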

3.2. Complexity Issues

Given an algorithm (or an implementation of the same), the time and space required for the execution of the algorithm on a machine depend very much on the machine’s architecture and on the compiler. But this does not mean that we cannot make some general theoretical estimates. The so-called asymptotic estimates that we are going to introduce now tend to approach the real situation as the input size tends to infinity. For finite input sizes (which is always the case in practice), these theoretical predictions turn out to provide valuable guidelines.

3.2.1. Order Notations

We start with the following important definitions.

Definition 3.1.

Let f and g be positive real-valued functions of natural numbers.

  1. f is said to be bounded above by g or of the order of g, denoted f = O(g), if there exist an integer n0 and a positive real constant c such that f(n) ≤ cg(n) for all n ≥ n0. In this case, we also say that g is bounded below by f and denote this by g = Ω(f).

  2. If f = O(g) and g = O(f), we say that f and g are of the same order and denote this by f = Θ(g) (or by g = Θ(f)). Equivalently, f = Θ(g) if and only if f = O(g) and f = Ω(g); that is, if and only if there exist an integer n0 and real positive constants c1, c2 such that c1 g(n) ≤ f(n) ≤ c2 g(n) for all n ≥ n0.

  3. f is said to be of strictly lower order than g, denoted f = o(g), if f(n)/g(n) tends to 0 as n tends to infinity. In other words, f = o(g) if and only if for every real positive constant c (however small it may be) there exists an integer nc such that f(n) < cg(n) for all n ≥ nc. If f = o(g), we also say that g is of strictly higher order than f and denote this by g = ω(f). Thus g = ω(f) if and only if for every real positive constant c (however large it may be) there exists an integer nc such that g(n) > cf(n) for all n ≥ nc.

Example 3.1.
  1. Let f(n) := a_d n^d + · · · + a_1 n + a_0 with d ≥ 0, real coefficients a_0, . . . , a_d, and a_d > 0. Then f = Θ(n^d). This heuristically means that as n becomes sufficiently large, the leading term a_d n^d dominates over the other terms, and apart from the constant of proportionality a_d the function f(n) grows with n as n^d does. If f = Θ(n^d) for some integer d > 0, we say that f is of polynomial order in n.[1] A Θ(1) function is often called a constant function.

    [1] This is not the complete truth. Functions like n^2.3 or n^3(log n)^2 would be better included in the polynomial family. Thus, we may define f to be of polynomial order (in n) if f = O(n^d) and f = Ω(n^d′) for some positive real constants d, d′. Similar comments hold for poly-logarithmic and exponential orders.

  2. If f = Θ((log n)^a) for some real a > 0, we say that f is of poly-logarithmic order in n. By Exercise 3.2(b), any function of poly-logarithmic order grows asymptotically slower than any function of polynomial order.

  3. If f = Θ(a^n) for some real a > 1, f is said to be of exponential order in n. Again by Exercise 3.2(b), any function of exponential order grows asymptotically faster than any function of polynomial order.

  4. Now, consider a function of the form

    Equation 3.1

    f(n) = exp(c · n^α · (ln n)^(1−α))

    for real c > 0 and for 0 ≤ α ≤ 1. For α = 0, we have f = Θ(n^c); that is, f is of polynomial order. On the other extreme, if α = 1, f = Θ(a^n), where a := exp(c); that is, f is of exponential order. If 0 < α < 1, we say that f is of subexponential order in n, since the order of f is somewhere in between polynomial and exponential. We will come across functions of subexponential orders quite frequently in the rest of the book. Note that as α increases from 0 to 1, the order of f also increases monotonically from polynomial to exponential.

  5. A function f = O(n^a (log n)^b) with a > 0 and b ≥ 0 is often denoted by the soft O-notation: f = O~(n^a). This means that, up to multiplication by a polynomial in log n, the function f is of the order of n^a. Similarly, if f = O(a^n g(n)) for a > 1 and for some g(n) of polynomial order, we say that f = O~(a^n). Intuitively speaking, the O-notation hides constant multipliers, whereas the soft O-notation also suppresses multipliers of strictly lower order (like powers of log n).

  6. The notion of order can be readily extended to functions with two or more input variables. For example, for positive real-valued functions f, g of two positive integer variables m, n one says f = O(g) if for some m0, n0 and for some positive real constant c one has f(m, n) ≤ cg(m, n) for all m ≥ m0 and n ≥ n0. The function f(m, n) = m^3 2^n is of polynomial order in m, but of exponential order in n.

The order notation is used to analyse algorithms in the following way. For an algorithm, the input size is defined as the total number of bits needed to represent the input of the algorithm. We find asymptotic estimates of the running time and the memory requirement of the algorithm in terms of its input size. Let f(n) denote the running time[2] of an algorithm A for an input of size n. If f(n) = Θ(n^a) (or, more generally, if f = O(n^a)) for some a > 0, A is called a polynomial-time algorithm. If a = 1 (resp. 2, 3, . . .), then A is specifically called a linear-time (resp. quadratic-time, cubic-time, . . .) algorithm. A Θ(1) algorithm is often called a constant-time algorithm. If f = Θ(b^n) for some b > 1, A is called an exponential-time algorithm. Similarly, if f satisfies Equation (3.1) with 0 < α < 1, A is called a subexponential-time algorithm.

[2] The practical running time of an algorithm may vary widely depending on its implementation and also on the processor, the compiler and even on run-time conditions. Since we are talking about the order of growth of running times in relation to the input size, we neglect the constants of proportionality and so these variations are usually not a problem. If one plans to be more concrete, one may measure the running time by the number of bit operations needed by the algorithm.

One has similar classifications of an algorithm in terms of its space requirements, namely, polynomial-space, linear-space, exponential-space, and so on. We can afford to be lazy and drop -time from the adjectives introduced in the previous paragraph. Thus, an exponential algorithm is an exponential-time algorithm, not an exponential-space algorithm.

It is expedient to note here that the running time of an algorithm may depend on the particular instance of the input, even when the input size is kept fixed. For an example, see Exercise 3.3. We should, therefore, be prepared to distinguish, for a given algorithm and for a given input size n, between the best (that is, shortest) running time fb(n), the worst (that is, longest) running time fw(n), the average running time fa(n) on all possible inputs (of size n) and the expected running time fe(n) for a randomly chosen input (of size n). In typical situations, fw(n), fa(n) and fe(n) are of the same order, in which case we simply denote, by running time, one of these functions. If this is not the case, an unqualified use of the phrase running time would denote the worst running time fw(n).

The order notation, though apparently attractive and useful, has certain drawbacks. First, it depicts the behaviour of functions (like running times) as the input size tends to infinity. In practice, one always has finite input sizes. One can check that if f(n) = n^100 and g(n) = (1.01)^n are the running times of two algorithms A and B respectively (for solving the same problem), then f(n) ≤ g(n) if and only if n = 1 or n ≥ 117,309. But then if the input size is only 1,000, one would prefer the exponential-time algorithm B over the polynomial-time algorithm A. Thus asymptotic estimates need not guarantee correct suggestions at practical ranges of interest. On the other hand, an algorithm which is a product of human intellect does not tend to have such extreme values for the parameters; that is, in a polynomial-time algorithm, the degree is usually ≤ 10 and the base for an exponential-time algorithm is usually not as close to 1 as 1.01 is. If we have f(n) = n^5 and g(n) = 2^n as the respective running times of the algorithms A and B, then A outperforms B (in terms of speed) for all n ≥ 23.

The second drawback of the order notation is that it suppresses the constant of proportionality; that is, an algorithm whose running time is 100n2 has the same order as one whose running time is n2. This is, however, a situation that we cannot neglect in practice. In particular, when we compare two different implementations of the same algorithm, the one with a smaller constant of proportionality is more desirable than the one with a larger constant. This is where implementation tricks prove to be important and even indispensable for large-scale applications.

3.2.2. Randomized Algorithms

A deterministic algorithm is one that always follows the same sequence of computations (and thereby produces the same output) for a given input. The deterministic running time of a computational problem P is the fastest of the running times (in order notation) of the known algorithms to solve P.

If an algorithm makes some random choices during execution, we call the algorithm randomized or probabilistic. The exact sequence of computations followed by the algorithm depends on these random choices and as a result different executions of the same algorithm may produce different outputs for a given input. At first glance, randomized algorithms look useless, because getting different outputs for a given input is apparently not what one would really want. But there are situations where this is desirable. For example, in an implementation of the RSA protocol, one generates random primes p and q of given bit lengths. Here we require our prime generation procedure to produce different primes during different executions (that is, for different entities on the net).

More importantly, randomized algorithms often provide practical computational solutions for many problems for which no practical deterministic algorithms are known. We will shortly encounter many such situations where randomized algorithms are the simplest and/or fastest known algorithms. However, this sudden enhancement in performance by random choices does not come for free. To explain the so-called darker sides of randomization, we describe two different types of randomized algorithms.

A Monte Carlo algorithm is a randomized algorithm that may produce incorrect outputs. However, for such an algorithm to be useful, we require that the running time be always small and the probability of an error sufficiently low. A good example of a Monte Carlo algorithm is the Miller–Rabin algorithm (Algorithm 3.13) for testing the primality of an integer. For an integer of bit size n, the Miller–Rabin test with t iterations runs in time O(tn^3). Whenever the algorithm outputs false, it is always correct. But an answer of true is incorrect with an error probability ≤ 2^(−2t); that is, it certifies a composite integer as a prime with probability ≤ 2^(−2t). For t = 20, an error is expected to occur less than once in every 10^12 executions. With this little sacrifice we achieve a running time of O(n^3) (for a fixed t), whereas the best deterministic primality testing algorithm (known to the authors at the time of writing this book) takes time O(n^(7.5)) and hence is not practical.
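The Monte Carlo structure can be made concrete with a minimal Miller–Rabin-style sketch restricted to 32-bit operands (this is our own illustrative code, not the book's Algorithm 3.13; the function names and the choice of random bases via rand() are ours):

```c
#include <stdint.h>
#include <stdlib.h>

/* b^e mod m for m < 2^32; intermediate products fit in 64 bits */
static uint32_t powmod32(uint32_t b, uint32_t e, uint32_t m)
{
    uint64_t r = 1, x = b % m;
    while (e) {
        if (e & 1) r = r * x % m;
        x = x * x % m;
        e >>= 1;
    }
    return (uint32_t)r;
}

/* Monte Carlo primality test: 0 means "certainly composite",
   1 means "prime with error probability at most 4^(-t)" */
int miller_rabin(uint32_t n, int t)
{
    if (n < 4) return n == 2 || n == 3;
    if (n % 2 == 0) return 0;
    uint32_t d = n - 1;
    int s = 0;
    while (d % 2 == 0) { d /= 2; s++; }   /* n - 1 = 2^s * d, d odd */
    for (int i = 0; i < t; i++) {
        uint32_t a = 2 + (uint32_t)rand() % (n - 3);  /* base in [2, n-2] */
        uint32_t x = powmod32(a, d, n);
        if (x == 1 || x == n - 1) continue;
        int witness = 1;
        for (int r = 1; r < s; r++) {
            x = (uint32_t)((uint64_t)x * x % n);
            if (x == n - 1) { witness = 0; break; }
        }
        if (witness) return 0;   /* a proves n composite */
    }
    return 1;                    /* probably prime */
}
```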

A Las Vegas algorithm is a randomized algorithm which always produces the correct output. However, the running time of such an algorithm depends on the random choices made. For such an algorithm to be useful, we expect that for most random choices the running time is small. As an example, consider the problem of finding a random (monic) irreducible polynomial of degree n over a finite field. Algorithm 3.22 tests the irreducibility of a polynomial over a finite field in deterministic polynomial time. We generate random monic polynomials of degree n and check the irreducibility of these polynomials by Algorithm 3.22. From Section 2.9.2, we know that a randomly chosen monic polynomial of degree n over a finite field is irreducible with an approximate probability of 1/n. This implies that after O(n) random polynomials are tried, one expects to find an irreducible polynomial. The resulting Las Vegas algorithm (Algorithm 3.23) runs in expected polynomial time. It may, however, happen that for certain random choices we keep on generating reducible polynomials an exponential number of times, but the likelihood of such an accident is very, very low (Exercise 3.5).
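The Las Vegas structure (keep drawing random candidates until a deterministic test succeeds) can be sketched in C for the field with two elements. For brevity we replace the book's polynomial-time irreducibility test (Algorithm 3.22) with naive trial division by all lower-degree polynomials, which is exponential in n but fine for small degrees; the retry loop is the point. Polynomials over GF(2) are encoded as bit masks, and all names and encodings are ours:

```c
#include <stdint.h>
#include <stdlib.h>

/* A polynomial over GF(2) is stored as a bit mask: bit i holds the
   coefficient of x^i.  E.g., 0x13 = 10011b is x^4 + x + 1. */

static int deg(uint64_t p) { int d = -1; while (p) { d++; p >>= 1; } return d; }

/* Remainder of a modulo m (polynomial division over GF(2)) */
static uint64_t poly_rem(uint64_t a, uint64_t m)
{
    int dm = deg(m);
    while (deg(a) >= dm) a ^= m << (deg(a) - dm);
    return a;
}

/* Deterministic test: f is irreducible iff no polynomial of degree
   between 1 and deg(f)/2 divides it (naive, exponential in deg f) */
int irreducible(uint64_t f)
{
    int n = deg(f);
    if (n <= 0) return 0;
    for (uint64_t d = 2; deg(d) <= n / 2; d++)
        if (poly_rem(f, d) == 0) return 0;
    return 1;
}

/* Las Vegas loop: always correct, expected O(n) iterations */
uint64_t random_irreducible(int n)
{
    for (;;) {
        /* random monic polynomial of degree n with constant term 1
           (multiples of x are reducible anyway, so nothing is lost) */
        uint64_t f = (1ULL << n) | ((uint64_t)rand() & ((1ULL << n) - 1)) | 1ULL;
        if (irreducible(f)) return f;
    }
}
```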

An algorithm is said to be a probabilistic or randomized polynomial-time algorithm, if it is either a Monte Carlo algorithm with polynomial worst running time or a Las Vegas algorithm with polynomial expected running time. Both the above examples of randomized algorithms are probabilistic polynomial-time algorithms. A combination of these two types of algorithms can also be conceived; namely, algorithms that produce correct outputs with high probability and have polynomial expected running time. Some computational problems are so challenging that even such probably correct and probably fast algorithms are quite welcome.

We finally note that there are certain computational problems for which the deterministic running time is exponential and for which randomization also does not help much. In some cases, we have subexponential randomized algorithms which are still too slow to be of reasonable practical use. Some of these so-called intractable problems are at the heart of the security of many public-key cryptographic protocols.

3.2.3. Reduction Between Computational Problems

In the last two sections, we have introduced theoretical measures (the order notations) for estimating the (known) difficulty of solving computational problems. In this section, we introduce another concept by which we can compare the relative difficulty of two computational problems.

Let P1 and P2 be two computational problems. We say that P1 is polynomial-time reducible to P2 and denote this as P1 ≤P P2, if there is a polynomial-time algorithm which, given a solution of P2, provides a solution for P1. This means that if P1 ≤P P2, then the problem P1 is no more difficult than P2 apart from the extra polynomial-time reduction effort. In that case, if we know an algorithm to solve P2 in polynomial time, then we have a polynomial-time algorithm for P1 too. If P1 ≤P P2 and P2 ≤P P1, we say that the problems P1 and P2 are polynomial-time equivalent and write P1 ≅ P2.

In order to give an example of these concepts, we let G be a finite cyclic multiplicative group of order n and g a generator of G. The discrete logarithm problem (DLP) is the problem of computing, for a given a ∈ G, an integer x such that a = g^x. The Diffie–Hellman problem (DHP), on the other hand, is the problem of computing g^(xy) from the given values of g^x and g^y. If one can compute y from g^y, one can also compute g^(xy) = (g^x)^y by performing an exponentiation in the group G. Therefore, DHP ≤P DLP, if exponentiations in G can be computed in polynomial time. In other words, if a solution for DLP is known, a solution for DHP is also available; that is, DHP is no more difficult than DLP except for the additional exponentiation effort. However, the reverse implication (that is, whether DLP ≤P DHP) is not known for many groups.
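The reduction of DHP to DLP can be made concrete in a toy setting. The following C sketch works in the group (Z/101Z)* with generator 2; all parameter choices are ours and purely illustrative, and the brute-force DLP "oracle" is of course exponential-time:

```c
#include <stdint.h>

/* Toy demonstration of DHP reducing to DLP in the group (Z/101Z)*
   with generator G = 2 (2 generates this group of order 100). */
static const uint32_t P = 101, G = 2;

static uint32_t powmod(uint32_t b, uint32_t e)
{
    uint64_t r = 1, x = b % P;
    while (e) {
        if (e & 1) r = r * x % P;
        x = x * x % P;
        e >>= 1;
    }
    return (uint32_t)r;
}

/* DLP "oracle": exhaustive search, exponential in the bit size of P,
   but enough to exhibit the reduction for P = 101 */
static uint32_t dlog(uint32_t a)
{
    uint32_t x = 0, t = 1;
    while (t != a) {
        t = (uint32_t)((uint64_t)t * G % P);
        x++;
    }
    return x;
}

/* DHP solved via one oracle call plus one exponentiation:
   g^(xy) = (g^x)^y */
uint32_t dhp(uint32_t gx, uint32_t gy)
{
    return powmod(gx, dlog(gy));
}
```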

So far we have assumed that our reduction algorithms are deterministic. If we allow randomized (that is, probabilistic) polynomial-time reduction algorithms, we can similarly introduce the concepts of randomized polynomial-time reducibility and of randomized polynomial-time equivalence. We urge the reader to formulate the formal definitions for these concepts.

Exercise Set 3.2

3.1
  1. Sort the following functions in increasing order of growth. (Don’t mind if some of these functions are not defined for a few values of n.)

    10^12, 2^n, 2^(2n), 2^(n^2), 100n^2, 10^(−3)n^3, 1/n, , n!, n^n,

    log n, (log n)/n, n/log n, n^2 log n, n(log n)^2, (0.1)^(log n), (log n)^n,

    1/log n, , 10^6(log n)^100, log log n, 2^(log log n), n^(log log n),

    , , ,

    exp(n^(1/3)(ln n)^(2/3)), exp((ln n)^(1/3)(ln ln n)^(2/3)).

  2. Evaluate the functions of Part (a) at n = 10^i for i = 1, 2, . . . , 10 and conclude that as n gets larger, the asymptotic ordering tallies with the actual ordering more closely.

3.2
  1. Show that for any real a > 1 and b > 0 one has n^b = o(a^n).

  2. For any positive real c, d, show that (log n)^c = o(n^d).

  3. Show that if f = O(g) and g = O(h), then f = O(h).

  4. Give an example to show that f = O(g) does not necessarily imply f = Θ(g).

  5. Give an example of a function f with f = O(n^(1+ε)) for every ε > 0, but f is not O(n).

3.3 Suppose that an algorithm A takes as input a bit string and runs in time g(t), where t is the number of one-bits in the input string. Let fb(n), fw(n), fa(n) and fe(n) respectively denote the best, worst, average and expected running times of A for inputs of size n. Derive the following table under the assumption that each of the 2^n bit strings of length n is equally likely.

    Running times

    g(t)    fb(n)    fw(n)    fa(n)       fe(n)
    t       0        n        n/2         n/2
    t^2     0        n^2      n(n+1)/4    n^2/4
    2^t     1        2^n      (3/2)^n

3.4
  1. Show that an exponential-space (resp. subexponential-space) algorithm must be (at least) exponential-time (resp. subexponential-time) too. You may assume that at a time a computing device can access (read/write) at most a constant number of memory locations.

  2. Give an example of an algorithm that is exponential-time but polynomial-space.

3.5 Consider the Las Vegas algorithm discussed in Section 3.2.2 for generating a random irreducible polynomial of degree n over a finite field. Assume that a randomly chosen monic polynomial of degree n over this field has (an exact) probability of 1/n of being irreducible. Find out the probability pr that r polynomials chosen randomly (with repetition) are all reducible. For n = 1000, calculate the numerical values of pr for r = 10^i, i = 1, . . . , 6, and find the smallest integers r for which pr ≤ 1/2 and pr ≤ 10^(−12). Find the expected number of polynomials tested for irreducibility before the algorithm terminates.
3.6 Let n = pq be the product of two distinct primes p and q. Show that factoring n is polynomial-time equivalent to computing φ(n) = (p–1)(q–1), where φ is Euler’s totient function. (Assume that an arithmetic operation (including computation of integer square roots) on integers of bit size t can be performed in polynomial time (in t).)
3.7 Let G be a finite cyclic multiplicative group and let H be the subgroup of G generated by an element h, where the order of H is known. The generalized discrete logarithm problem (GDLP) is the following: Given a ∈ G, find out if a ∈ H and, if so, find an integer x for which a = h^x. Show that GDLP ≅ DLP, if exponentiations in G can be carried out in polynomial time and if DLP in H is polynomial-time equivalent to DLP in G. [H]

3.3. Multiple-precision Integer Arithmetic

Cryptographic protocols based on the rings Zn and the fields Fp demand n and p to be sufficiently large (of bit length ≥ 512) in order to achieve the desired level of security. However, standard compilers do not support data types that can hold integers of this size with full precision. For example, C compilers support integers of size ≤ 64 bits. So one must employ custom-designed data types for representing and working with such big integers. Many libraries that can handle integers of arbitrary length are already available; FREELIP, GMP, LiDIA, NTL and ZEN are some such libraries that are freely available.

Alternatively, one may design one’s own functions for multiple-precision integers. Such a programming exercise is not very difficult, but making the functions run efficiently is a huge challenge. Several tricks and optimization techniques can turn a naive implementation into a much faster and more memory-efficient code, and it takes years of experimental experience to discover the subtleties. Theoretical asymptotic estimates might serve as a guideline, but only experimentation can settle the relative merits and demerits of the available algorithms for input sizes of practical interest. For example, the theoretically fastest algorithm known for multiplying two multiple-precision integers is based on the so-called fast Fourier transform (FFT) techniques. But our experience shows that this algorithm starts to outperform other common but asymptotically slower algorithms only when the input size is at least several thousand bits. Since such very large integers are rarely needed by cryptographic protocols, FFT-based multiplication is not useful in this context.

3.3.1. Representation of Large Integers

In order to represent a large integer, we break it up into small parts and store each part in a memory word[3] accessible by built-in data types. The simplest way to break up a (positive) integer a is to predetermine a radix ℜ and compute the ℜ-ary representation (a_{s–1}, . . . , a_0) of a (see Exercise 3.8). One should have ℜ ≤ 2^32 so that each ℜ-ary digit a_i can be stored in a memory word. For the sake of efficiency, it is advisable to take ℜ to be a power of 2. It is also expedient to take ℜ as large as possible, because smaller values of ℜ lead to (possibly) longer sizes s and thereby add to the storage requirement and also to the running time of arithmetic functions. The best choice is ℜ = 2^32. We denote by ulong a built-in unsigned integer data type provided by the compiler (like the ANSI C standard unsigned long). We use an array of ulong for storing the digits. The array can be static or dynamic. Though dynamic arrays are more storage-efficient (because they can be allocated only as much memory as needed), they have memory allocation and deallocation overheads and are somewhat more complicated to programme than static arrays. Moreover, for cryptographic protocols one typically needs integers no longer than 4096 bits. Since the product of two integers of bit size t has bit size ≤ 2t, a static array of 8192/32 = 256 ulong suffices for storing cryptographic integers. It is also necessary to keep track of the actual size of an integer, since filling up with leading 0 digits is not an efficient strategy. Finally, it is often useful to have a signed representation of integers. A sign bit is also necessary for this case. We state three possible declarations in Exercise 3.11.

[3] We assume that a word in the memory is 32 bits long.

3.3.2. Basic Arithmetic Operations

We now describe the implementations of addition, subtraction, multiplication and Euclidean division of multiple-precision integers. Every other complex operation (like modular arithmetic, gcd) is based on these primitives. It is, therefore, of utmost importance to write efficient codes for these basic operations.

For integers of cryptographic sizes, the most efficient algorithms are the standard ones we use for doing arithmetic on decimal numbers, that is, for two positive integers a = as–1 . . . a0 and b = bt–1 . . . b0 we compute the sum c = a + b = cr–1 . . . c0 as follows. We first compute a0 + b0. If this sum is ≥ ℜ, then c0 = a0 + b0 – ℜ and the carry is 1, otherwise c0 = a0 + b0 and the carry is 0. We then compute a1 + b1 plus the carry available from the previous digit, and compute c1 and the next carry as before.

For computing the product d = ab = dl–1 . . . d0, we do the usual quadratic procedure; namely, we initialize all the digits of d to 0 and for each i = 0, . . . , s – 1 and j = 0, . . . , t – 1 we compute aibj and add it to the (i + j)-th digit of d. If this sum (call it σ) at the (i + j)-th location exceeds ℜ – 1, we find out q, r with σ = qℜ + r, r < ℜ. Then di+j is assigned r, and q is added to the (i + j + 1)-st location. If that addition results in a carry, we propagate the carry to higher locations until it gets fully absorbed in some word of d.

All this sounds simple, but complications arise when we consider the fact that the sum of two 32-bit words (and a possible carry from the previous location) may be 33 bits long. For multiplication, the situation is even worse, because the product aibj can be 64 bits long. Since our machine word can hold only 32 bits, it becomes problematic to hold these intermediate sums and products to full precision. We assume that the least significant 32 bits are correctly returned and assigned to the output variable (ulong), whereas the leading 32 bits are lost.[4] The most efficient way to keep track of these overflows is to use assembly instructions, and this is what many number theory packages (like PARI and UBASIC) do. But this means that for every target architecture we have to write different assembly code. Here we describe certain tricks that make it possible to grab the overflow information using only high-level languages, without significantly degrading the performance compared to assembly instructions.

[4] This is the typical behaviour of a CPU that supports 2’s complement arithmetic.

Addition and subtraction

First consider the sum ai + bi. We compute the least significant 32 bits by assigning ci := ai + bi. It is easy to see that an overflow occurs during this sum if and only if ci < ai. We set the output carry accordingly. Now, let us consider the situation when we have an input carry: that is, when we compute the sum ci := ai + bi + 1. Here an overflow occurs if and only if ci ≤ ai. Algorithm 3.1 performs this addition of words.

Algorithm 3.1. Addition of words

Input: Words ai and bi and the input carry γi ∈ {0, 1}.

Output: Word ci and the output carry δi ∈ {0, 1} with ai + bi + γi = ci + δiℜ.

Steps:

ci := ai + bi.

if (γi) { ci ++, δi := ( (ci ≤ ai) ? 1 : 0 ). } else { δi := ( (ci < ai) ? 1 : 0 ). }

Algorithm 3.1 assumes that ci and ai are stored in different memory words. If this is not the case, we should store ai + bi in a temporary variable and, after the second line, ci should be assigned the value of this temporary variable. Note also that many processors provide an increment primitive which is faster than the general addition primitive. In that case, the statement ci ++ is preferable to ci := ci + 1.
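Algorithm 3.1 can be sketched in C99 types as follows. The function name and the convention of returning the output carry are our own choices for this sketch, not part of any particular library.

```c
#include <stdint.h>

/* Add the words a and b and the input carry g (0 or 1); store the low
   32 bits in *c and return the output carry, so that
   a + b + g == *c + carry * 2^32. */
uint32_t add_words(uint32_t a, uint32_t b, uint32_t g, uint32_t *c)
{
    uint32_t t = a + b;              /* low 32 bits of a + b; may wrap around */
    uint32_t carry;
    if (g) {
        t++;
        carry = (t <= a) ? 1 : 0;    /* overflow iff t <= a when a carry came in */
    } else {
        carry = (t < a) ? 1 : 0;     /* overflow iff t < a */
    }
    *c = t;
    return carry;
}
```

Note that the overflow tests rely on the wrap-around (modulo 2^32) behaviour of unsigned arithmetic, which the C standard guarantees.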

For subtraction, we proceed analogously from right to left and keep track of the borrow. Here the check for overflow can be done before the subtraction of words is carried out (and, therefore, no temporary variable is needed, if we assume that the output carry is not stored in the location of the operands).

Algorithm 3.2. Subtraction of words

Input: Words ai and bi and the input borrow γi ∈ {0, 1}.

Output: Word ci and the output borrow δi ∈ {0, 1} with ai – bi – γi = ci – δiℜ.

Steps:

if (γi) { δi := ( (ai ≤ bi) ? 1 : 0 ), ci := ai – bi, ci – –. }

else { δi := ( (ai < bi) ? 1 : 0 ), ci := ai – bi. }

We urge the reader to develop the complete addition and subtraction procedures for multiple-precision integers, based on the above primitives for words.
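As a starting point for that exercise, here is one possible C sketch, assuming both operands are stored as little-endian arrays (least significant word first) of a common length n; the names mp_add and mp_sub are illustrative, not the book's API.

```c
#include <stdint.h>
#include <stddef.h>

/* c := a + b for n-word operands; returns the final carry. */
uint32_t mp_add(uint32_t *c, const uint32_t *a, const uint32_t *b, size_t n)
{
    uint32_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t t = a[i] + b[i];        /* may wrap around */
        uint32_t d = (t < a[i]);         /* overflow of a[i] + b[i] */
        t += carry;
        d += (t < carry);                /* overflow of adding the carry */
        c[i] = t;
        carry = d;                       /* d is always 0 or 1 */
    }
    return carry;
}

/* c := a - b (assuming a >= b as n-word numbers); returns the final borrow. */
uint32_t mp_sub(uint32_t *c, const uint32_t *a, const uint32_t *b, size_t n)
{
    uint32_t borrow = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t t = a[i] - b[i];        /* may wrap around */
        uint32_t d = (a[i] < b[i]);      /* borrow from a[i] - b[i] */
        c[i] = t - borrow;
        d += (t < borrow);               /* borrow from subtracting the carry-in */
        borrow = d;
    }
    return borrow;
}
```

In each iteration the two possible overflows (from the word addition and from the incoming carry) cannot both occur, so the running carry stays 0 or 1, exactly as in Algorithms 3.1 and 3.2.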

Multiplication

The product of two 32-bit words can be as long as 64 bits, and we plan to (compute and) store this product in two words. Assuming the availability of a built-in 64-bit unsigned integer data type (which we will henceforth denote as ullong), this can be performed as in Algorithm 3.3.

Algorithm 3.3. Multiplication of words

Input: Words a and b.

Output: Words c and d with ab = cℜ + d.

Steps:

/* We use a temporary variable t of data type ullong */

t := (ullong)(a) * (ullong)(b), c := (ulong)(t ≫ 32), d := (ulong)t.

We use a temporary 64-bit integer variable t to store the product ab. The lower 32 bits of t are stored in d by simple typecasting, whereas the higher 32 bits of t are obtained by right-shifting t (the operator ≫) by 32 bits. This is a reasonable strategy given that we do not explore assembly-level instructions. Algorithm 3.4 describes a multiplication algorithm for two multiple-precision integer operands that does not directly use the word-multiplying primitive of Algorithm 3.3.
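In C99 types, Algorithm 3.3 may be sketched as follows, with uint32_t and uint64_t playing the roles of ulong and ullong (the function name is ours):

```c
#include <stdint.h>

/* Split the 64-bit product a*b into the high word c and the low word d,
   so that a*b == c*2^32 + d. */
void mul_words(uint32_t a, uint32_t b, uint32_t *c, uint32_t *d)
{
    uint64_t t = (uint64_t)a * (uint64_t)b;  /* full 64-bit product */
    *c = (uint32_t)(t >> 32);                /* high 32 bits */
    *d = (uint32_t)t;                        /* low 32 bits (truncating cast) */
}
```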

The reader can verify easily that this code properly computes the product. We now highlight how this makes the computation efficient. The intermediate results are stored in the array t of 64-bit ullong. This means that after the 64-bit product aibj of the words ai and bj is computed (in the temporary variable T), we directly add T to the location ti+j. If the sum exceeds ℜ^2 – 1 = 2^64 – 1, that is, if an overflow occurs, we should add ℜ to ti+j+1 or, equivalently, 1 at position i + j + 2. This last addition is one of ullong integers and can be made more efficient if it is replaced by ulong increments, and this is what we do using the temporary array u. Since the quadratic loop is the bottleneck of the multiplication procedure, it is absolutely necessary to make this loop as efficient as possible.

Algorithm 3.4. Multiplication of multiple-precision integers

Input: Integers a = (ar–1 . . . a0) and b = (bs–1 . . . b0)

Output: The product c = (cr+s–1 . . . c0) = ab.

Steps:

/* Let T be a variable and t0, . . . , tr+s–1 an array of ullong variables */

/* Let v be a variable and u0, . . . , ur+s–1 an array of ulong variables */

Initialize the array locations ci, ti and ui to 0 for all i = 0, . . . , r + s – 1.

/* The quadratic loop */
for (i = 0, . . . , r – 1) and (j = 0, . . . , s – 1) {
   T := (ullong)(ai) * (ullong)(bj).
   if ((ti+j += T) < T) ui+j+2 ++.
}

/* Deferred normalization */
for (i = 0, . . . , r + s – 1) {
    if ((ci += ui) < ui) ui+1 ++.
    v := (ulong)(ti), if ((ci += v) < v) ui+1 ++.
    v := (ulong)(ti ≫ 32), if ((ci+1 += v) < v) ui+2 ++.
}

After the quadratic loop, we do deferred normalization from the array of 64-bit double-words ti to the array of 32-bit words ci. This is done using the typecasting and right-shift strategy mentioned in Algorithm 3.3. We should also take care of the intermediate carries stored in the array u. The normalization loop takes a total time of O(r + s), whereas the quadratic loop takes time O(rs). If we had done normalization inside the quadratic loop itself, that would incur an additional O(rs) cost (which is significantly more than that of deferred normalization).
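A possible C99 rendering of Algorithm 3.4 is sketched below, with the 64-bit accumulator array t and the deferred-carry array u. The bound MP_MAX, the function name and the guard on the topmost word are our choices for the sketch.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define MP_MAX 64   /* sketch-only cap on operand lengths */

/* c := a * b for little-endian arrays a (r words) and b (s words);
   c must have room for r + s words. */
void mp_mul(uint32_t *c, const uint32_t *a, size_t r, const uint32_t *b, size_t s)
{
    uint64_t t[2 * MP_MAX] = {0};
    uint32_t u[2 * MP_MAX + 2] = {0};
    size_t i, j;

    /* the quadratic loop: accumulate a[i]*b[j] into t[i+j], recording
       64-bit overflows as increments of u[i+j+2] */
    for (i = 0; i < r; i++)
        for (j = 0; j < s; j++) {
            uint64_t T = (uint64_t)a[i] * (uint64_t)b[j];
            if ((t[i + j] += T) < T)
                u[i + j + 2]++;
        }

    /* deferred normalization: fold t[] and u[] into the 32-bit result c[] */
    memset(c, 0, (r + s) * sizeof(uint32_t));
    for (i = 0; i < r + s; i++) {
        uint32_t v;
        if ((c[i] += u[i]) < u[i]) u[i + 1]++;
        v = (uint32_t)t[i];                    /* low word of t[i] */
        if ((c[i] += v) < v) u[i + 1]++;
        v = (uint32_t)(t[i] >> 32);            /* high word of t[i] */
        if (i + 1 < r + s) {
            if ((c[i + 1] += v) < v) u[i + 2]++;
        }
    }
}
```

Since the product fits in r + s words, the top 64-bit accumulator contributes no high word, so the guard on i + 1 only skips an addition of zero.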

Squaring

If both the operands a and b of multiplication are the same, it is not necessary to compute aibj and ajbi separately. We should add to ti+j the product ai^2, if i = j, or the product 2aiaj, if i < j. Note that 2aiaj can be computed by left-shifting aiaj by one bit. This might result in an overflow, which can be checked before shifting by looking at the 64th (most significant) bit of aiaj. Algorithm 3.5 incorporates these changes.

Fast multiplication

For the multiplication of two multiple-precision integers, there are algorithms that are asymptotically faster than the quadratic Algorithms 3.4 and 3.5. However, not all these theoretically faster algorithms are practical for the sizes of integers used in cryptology. Our practical experience shows that a strategy due to Karatsuba outperforms the quadratic algorithm, if both the operands are of roughly equal sizes and if the bit lengths of the operands are 300 or more. We describe Karatsuba’s algorithm in connection with squaring, where the two operands are the same (and hence of the same size). Suppose we want to compute a^2 for a multiple-precision integer a = (ar–1 . . . a0). We first break a into two integers of almost equal sizes, namely, α := (ar–1 . . . at) and β := (at–1 . . . a0), so that a = ℜ^t α + β. Now, a^2 = α^2 ℜ^(2t) + 2αβℜ^t + β^2 and 2αβ = (α^2 + β^2) – (α – β)^2. We recursively invoke Karatsuba’s multiplication with operands α, β and α – β. Recursion continues as long as the operands are not too small and the depth of recursion is within a prescribed limit. One can check that Karatsuba’s algorithm runs in time O(r^(lg 3) lg r) = O(r^1.585 lg r), which is a definite improvement over the O(r^2) running time taken by the quadratic algorithm.
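The identity 2αβ = (α^2 + β^2) – (α – β)^2 can be illustrated at a single level of recursion by squaring one 32-bit word split into 16-bit halves. This is only a toy sketch (a real implementation recurses on arrays of words); the function name is ours.

```c
#include <stdint.h>

/* Square a 32-bit integer a = 2^16*hi + lo using Karatsuba's identity:
   a^2 = hi^2*2^32 + [(hi^2 + lo^2) - (hi - lo)^2]*2^16 + lo^2,
   so the middle term 2*hi*lo costs no extra general multiplication. */
uint64_t karatsuba_square32(uint32_t a)
{
    uint32_t hi = a >> 16, lo = a & 0xFFFFu;
    uint64_t hi2 = (uint64_t)hi * hi;     /* alpha^2 */
    uint64_t lo2 = (uint64_t)lo * lo;     /* beta^2 */
    int64_t  d   = (int64_t)hi - (int64_t)lo;
    uint64_t d2  = (uint64_t)(d * d);     /* (alpha - beta)^2 */
    uint64_t mid = hi2 + lo2 - d2;        /* equals 2*alpha*beta */
    return (hi2 << 32) + (mid << 16) + lo2;
}
```

All three products here are squarings of quantities at most 16 bits (in absolute value), which is exactly why the identity is attractive for recursive squaring.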

Algorithm 3.5. The quadratic loop for squaring

for (i = 0, . . . , r – 1) and (j = i, . . . , r – 1) {
   T := (ullong)(ai) * (ullong)(aj).
   if (i ≠ j) {
      if (the 64th bit of T is 1) ui+j+2 ++.
      T ≪= 1.
   }
   if ((ti+j += T) < T) ui+j+2 ++.
}

The best-known algorithm for multiplication of two multiple-precision integers is based on the fast Fourier transform (FFT) techniques and has running time Õ(r), that is, linear in r up to logarithmic factors. However, for integers used in cryptology this algorithm is usually not practical. Therefore, we will not discuss FFT multiplication in this book.

Division

Euclidean division with remainder of multiple-precision integers is somewhat cumbersome, although conceptually as difficult (that is, as simple) as the division procedure for decimal integers taught in the early days of school. The most challenging part of the procedure is guessing the next digit of the quotient. For decimal integers, we usually do this by looking at the first few (decimal) digits of the divisor and the dividend. This need not give us the correct digit, but something close to it. In the case of ℜ-ary digits, we also make a guess of the quotient digit based on a few leading ℜ-ary digits of the divisor and the dividend, but certain precautions have to be taken to ensure that the guess is not too far from the correct value.

Suppose we are given positive integers a = (ar–1 . . . a0) and b = (bs–1 . . . b0) with ar–1 ≠ 0 and bs–1 ≠ 0, and we want to compute the integers x = (xr–s . . . x0) and y = (ys–1 . . . y0) with a = xb + y, 0 ≤ y < b. First, we want that bs–1 ≥ ℜ/2 (you will see why later). If this condition is not already met, we force it by multiplying both a and b by 2^t for some suitable t, 0 < t < 32. In that case, the quotient remains the same, but the remainder gets multiplied by 2^t. The desired remainder can later be recovered easily by right-shifting the computed remainder by t bits. The process of making bs–1 ≥ ℜ/2 is often called normalization (of b). Henceforth, we will assume that b is normalized. Note that normalization may increase the word-size of a by 1.

Algorithm 3.6. Euclidean division of multiple-precision integers

Input: Integers a = (ar–1 . . . a0) and b = (bs–1 . . . b0) with r ≥ 3, s ≥ 2, ar–1 ≠ 0, bs–1 ≥ ℜ/2 and a ≥ b.

Output: The quotient x = (xr–s . . . x0) = a quot b and the remainder y = (ys–1 . . . y0) = a rem b of Euclidean division of a by b.

Steps:

Initialize the quotient digits xi to 0 for i = 0, . . . , r – s.

/* The main loop */
for (i = r – 1, . . . , s) {
   /* Initial check */
   if ((ai ≥ bs–1) and (a ≥ bℜ^(i–s+1))) { xi–s+1 ++, a := a – bℜ^(i–s+1). }

   /* Guess the next digit of quotient */
   if (ai = bs–1) xi–s := ℜ – 1, else xi–s := ⌊(aiℜ + ai–1)/bs–1⌋.
   if (xi–s ≠ 0)
       while (xi–s(bs–1ℜ + bs–2) > aiℜ^2 + ai–1ℜ + ai–2) xi–s – –.

   /* Modify the guess to the correct value */
   z := xi–s bℜ^(i–s).
   if (a < z) { xi–s – –, z := z – bℜ^(i–s). }
   a := a – z.
}

/* Here the quotient may be one less than the actual value */
if (a ≥ b) { a := a – bx := x+1. }
y := a.

Algorithm 3.6 implements multiple-precision division. It is not difficult to prove the correctness of the algorithm. We refrain from doing so, but make some useful comments. The initial check inside the main loop may cause the increment of xi–s+1. This may produce a carry which has to be propagated to higher digits. This carry propagation is not shown in the code, for simplicity. Since b is assumed to be normalized, this initial check needs to be carried out only once; for a non-normalized b, we would have to replace the if statement by a while loop. This is the first advantage of normalization. In the first step of guessing the quotient digit xi–s, we compute ⌊(aiℜ + ai–1)/bs–1⌋ using ullong arithmetic. At this point, the guess is based only on two leading digits of a and one leading digit of b. In the while loop, we refine this guess by considering one more digit of each of a and b. Since b is normalized, this while loop is executed no more than twice (the second advantage of normalization). The guess for xi–s made in this way is either equal to or one more than the correct value, which is then computed by comparing a with xi–s bℜ^(i–s). The running time of the algorithm is O(s(r – s)). For a fixed r, this is maximum (namely O(r^2)) when s ≈ r/2.

Bit-wise operations

Multiplication and division by a power of 2 can be carried out more efficiently using bit operations (on words) instead of calling the general procedures just described. It is also often necessary to compute the bit length of a non-zero multiple-precision integer and the multiplicity of 2 in it. In these cases also, one should use bit operations for efficiency. For these implementations, it is advantageous to maintain precomputed tables of the constants 2i, i = 0, . . . , 31, and of 2i – 1, i = 0, . . . , 32, rather than computing them in situ every time they are needed. In Algorithm 3.7, we describe an implementation of multiplication by a power of 2 (that is, the left shift operation). We use the symbols OR, ≫ and ≪ to denote bit-wise or, right shift and left shift operations on 32-bit integers.

Algorithm 3.7. Left-shift of multiple-precision integers

Input: Integer a = (ar–1 . . . a0) ≠ 0 with ar–1 ≠ 0, and the shift amount t ∈ ℕ.

Output: The integer c = (cs–1 . . . c0) = a · 2t, cs–1 ≠ 0.

Steps:

u := t quot 32, v := t rem 32.
if (v = 0) { /* Word-by-word copy */
    s := r + u.
    for (i = r – 1, . . . , 0) ci+u := ai.
}
else { /* Use shifts of individual words */
    s := r + u + 1, cs–1 := 0.
    for (i = r – 1, . . . , 0) { ci+u+1 := ci+u+1 OR (ai ≫ (32 – v)), ci+u := (ai ≪ v). }
    if (cs–1 = 0) s– –.
}
for (i = u – 1, . . . , 0) ci := 0.
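Algorithm 3.7 translates to C99 almost line by line. In this sketch (function name ours) the word length s of the result is returned; the destination must have room for r + t/32 + 1 words.

```c
#include <stdint.h>
#include <stddef.h>

/* c := a * 2^t for a little-endian r-word integer a with a[r-1] != 0;
   returns the word length of the result. */
size_t mp_shift_left(uint32_t *c, const uint32_t *a, size_t r, unsigned t)
{
    size_t u = t / 32, s, i;
    unsigned v = t % 32;

    if (v == 0) {                         /* word-by-word copy */
        s = r + u;
        for (i = r; i-- > 0; )
            c[i + u] = a[i];
    } else {                              /* shift individual words */
        s = r + u + 1;
        c[s - 1] = 0;
        for (i = r; i-- > 0; ) {          /* descending, as in Algorithm 3.7 */
            c[i + u + 1] |= a[i] >> (32 - v);
            c[i + u] = a[i] << v;
        }
        if (c[s - 1] == 0) s--;           /* drop a leading zero word */
    }
    for (i = 0; i < u; i++)               /* clear the low words */
        c[i] = 0;
    return s;
}
```

The descending loop matters: c[i + u] is written at iteration i and then OR-ed into at iteration i – 1, so no word is clobbered before its high bits are consumed.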

Unless otherwise mentioned, we will henceforth forget about the above structural representation of multiple-precision integers and denote arithmetic operations on them by the standard symbols (+, –, * or · or ×, quot, rem and so on).

3.3.3. GCD

Computing the greatest common divisor of two (multiple-precision) integers has important applications. In this section, we assume that we want to compute the (positive) gcd of two positive integers a and b. The Euclidean gcd loop comprising repeated division (Proposition 2.15) is not usually the most efficient way to compute integer gcds. We describe the binary gcd algorithm, which turns out to be faster for practical bit sizes of the operands a and b. If a = 2^r a′ and b = 2^s b′ with a′ and b′ odd, then gcd(a, b) = 2^min(r,s) gcd(a′, b′). Therefore, we may assume that a and b are odd. In that case, if a > b, then gcd(a, b) = gcd(a – b, b) = gcd((a – b)/2^t, b), where t := v2(a – b) is the multiplicity of 2 in a – b. Since the sum of the bit sizes of (a – b)/2^t and b is strictly smaller than that of a and b, repeating the above computation terminates the algorithm after finitely many iterations.
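These reductions can be sketched for word-sized operands as follows (extending the same loop to arrays of words, with the shift and subtraction primitives of Section 3.3.2, gives the multiple-precision version):

```c
#include <stdint.h>

/* Binary gcd: only subtractions, comparisons and shifts are used. */
uint64_t binary_gcd(uint64_t a, uint64_t b)
{
    unsigned shift = 0;
    if (a == 0) return b;
    if (b == 0) return a;
    /* factor out the common power of 2: gcd(a,b) = 2^min(r,s) * gcd(a',b') */
    while (((a | b) & 1) == 0) { a >>= 1; b >>= 1; shift++; }
    while ((a & 1) == 0) a >>= 1;
    while ((b & 1) == 0) b >>= 1;
    /* now a and b are odd: replace the larger by the odd part of the
       difference until the two become equal */
    while (a != b) {
        if (a > b) {
            a -= b;                              /* even and nonzero */
            do a >>= 1; while ((a & 1) == 0);
        } else {
            b -= a;
            do b >>= 1; while ((b & 1) == 0);
        }
    }
    return a << shift;
}
```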

Algorithm 3.8. Extended binary gcd

Input: Two positive integers a, b with a ≥ b and b odd.

Output: Integers d, u and v with d = gcd(a, b) = ua + vb > 0. If (a, b) ≠ (1, 1), then |u| < b and |v| < a.

Steps:

/* Initial reduction */
Compute integers q and r satisfying a = bq + r with 0 ≤ r < b.
if (r = 0) { (d, u, v) := (b, 0, 1), return. }

/* Initialize */
(x, y) := (b, r).
v1 := 0, v2 := 1.

/* Main loop */
while (1) {
   if (x ≥ y) {
      x := x – y.   /* x is even here except perhaps in the first iteration */
      v1 := v1 – v2.
      if (x = 0) {   /* End loop and return du and v */
         u2 := (y – v2r)/b.
         (d, u, v) := (y, v2, u2 – v2q).
         Return.
      } else if (x is even) {
         t := v2(x), x := x/2^t.    /* x is odd here */
         for (i = 1, . . . , t) {
            if (v1 is odd) v1 := v1 + b.
            v1 := v1/2.
         }
       }
     } else { /* if (x < y) */
       y := y – x, v2 := v2 – v1.    /* y is even here */
       t := v2(y), y := y/2^t.   /* y is odd here */
       for (i = 1, . . . , t) {
          if (v2 is odd) v2 := v2 + b.
          v2 := v2/2.
       }
   }
}

Multiple-precision division is much costlier than subtraction followed by division by a power of 2. This is why the binary gcd algorithm outperforms the Euclidean gcd algorithm. However, if the bit sizes of a and b differ reasonably, it is preferable to use Euclidean division once and replace the pair (a, b) by (b, a rem b), before entering the binary gcd loop. Even when the original bit sizes of a and b are not much different, one may carry out this initial reduction, because in this case Euclidean division does not take much time.

Recall from Proposition 2.16 that if d := gcd(a, b), then for some integers u and v we have d = ua + vb. Computation of d along with a pair of integers u, v is called the extended gcd computation. Both the Euclidean and the binary gcd loops can be augmented to compute these integers u and v. Since binary gcd is faster than Euclidean gcd, we describe an implementation of the extended binary gcd algorithm. We assume that 0 < ba and compute u and v in such a way that if (a, b) ≠ (1, 1), then |u| < b and |v| < a. Algorithm 3.8, which shows the details, requires b to be odd. The other operand a may also be odd, though the working of the algorithm does not require this.
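The structure of Algorithm 3.8 can be sketched for word-sized operands as follows. The operands must stay below 2^31 here so that intermediate products such as v2·r fit in int64_t; the multiple-precision version replaces these machine operations by the primitives of Section 3.3.2. The function name is ours.

```c
#include <stdint.h>

/* Given a >= b > 0 with b odd, compute d = gcd(a, b) and u, v with
   d = u*a + v*b (a word-sized sketch of Algorithm 3.8). */
void ext_binary_gcd(int64_t a, int64_t b, int64_t *d, int64_t *u, int64_t *v)
{
    int64_t q = a / b, r = a % b;           /* initial reduction */
    int64_t x, y, v1 = 0, v2 = 1;

    if (r == 0) { *d = b; *u = 0; *v = 1; return; }

    x = b; y = r;                           /* invariant: v1*r = x, v2*r = y (mod b) */
    for (;;) {
        if (x >= y) {
            x -= y; v1 -= v2;
            if (x == 0) {
                int64_t u2 = (y - v2 * r) / b;   /* exact, from y = u2*b + v2*r */
                *d = y; *u = v2; *v = u2 - v2 * q;
                return;
            }
            while ((x & 1) == 0) {          /* strip factors of 2 from x */
                x >>= 1;
                if (v1 & 1) v1 += b;        /* make v1 even before halving */
                v1 >>= 1;
            }
        } else {
            y -= x; v2 -= v1;
            while ((y & 1) == 0) {          /* strip factors of 2 from y */
                y >>= 1;
                if (v2 & 1) v2 += b;
                v2 >>= 1;
            }
        }
    }
}
```

The while loops simply skip when the freshly computed difference happens to be odd (which, as noted in the algorithm's comments, can occur when the initial remainder r is even).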

In order to prove the correctness of Algorithm 3.8, we introduce the sequence of integers xk, yk, u1,k, u2,k, v1,k and v2,k for k = 0, 1, 2, . . . , initialized as:

x0 := b,   u1,0 := 1,   v1,0 := 0,
y0 := r,   u2,0 := 0,   v2,0 := 1.

During the k-th iteration of the main loop, k = 1, 2, . . . , we modify the values xk–1, yk–1, u1,k–1, u2,k–1, v1,k–1 and v2,k–1 to xk, yk, u1,k, u2,k, v1,k and v2,k in such a way that we always maintain the relations:

u1,k x0 + v1,k y0 = xk,
u2,k x0 + v2,k y0 = yk.

The main loop terminates when xk = 0, and at that point we have the desired relation yk = gcd(b, r) = u2,kb + v2,kr. For the updating during the k-th iteration, we assume that xk–1yk–1. (The converse inequality can be handled analogously.) The x and y values are updated as xk := (xk–1yk–1)/2tk, yk := yk–1, where tk := v2(xk–1yk–1). Thus, we have u2,k = u2,k–1 and v2,k = v2,k–1, whereas if tk > 0, we write

All the expressions within square brackets in the last equation are integers, since x0 = b is odd. Note that updating the variables in the loop requires only the values of these variables available from the previous iteration. Therefore, we may drop the prefix k and call these variables x, y, u1, u2, v1 and v2. Moreover, the variables u1 and u2 need not be maintained and updated in every iteration, since the updating procedure for the other variables does not depend on the values of u1 and u2. We need the value of u2 only at the end of the main loop, and this is available from the relation y = u2b + v2r maintained throughout the loop. The formula u2b + v2r = y = gcd(b, r) is then combined with the relations a = qb + r and gcd(a, b) = gcd(b, r) to get the final relation gcd(a, b) = v2a + (u2v2q)b.

Algorithm 3.8 continues to work even when a < b, but in that case the initial reduction simply interchanges a and b and we forfeit the possibility of the reduction in size of the arguments (x and y) caused by the initial Euclidean division.

Finally, we remove the restriction that b is odd. We write a = 2^r a′ and b = 2^s b′ with a′, b′ odd and call Algorithm 3.8 with a′ and b′ as parameters (swapping a′ and b′, if a′ < b′) to compute integers d′, u′, v′ with d′ = gcd(a′, b′) = u′a′ + v′b′. Without loss of generality, assume that r ≥ s. Then d := gcd(a, b) = 2^s d′ = u′(2^s a′) + v′b. If r = s, then 2^s a′ = a and we are done. So assume that r > s. If u′ is even, we can extract a power of 2 from u′ and multiply 2^s a′ by this power. So let us say that we have a situation of the form d = ũ(2^t a′) + ṽb for some integers ũ and ṽ, with ũ odd, and with s ≤ t < r. We can rewrite this as d = (ũ – b)(2^t a′) + (ṽ + 2^t a′)b. Since ũ – b is even, this gives us d = ũ′(2^τ a′) + ṽ′b, where τ > t and where ũ′ is odd or τ = r. Proceeding in this way, we eventually reach a relation of the form d = u(2^r a′) + vb = ua + vb. It is easy to check that if (a′, b′) ≠ (1, 1), then the integers u and v obtained as above satisfy |u| < b and |v| < a.

3.3.4. Modular Arithmetic

So far, we have described how we can represent and work with the elements of ℤ. In cryptology, we are more interested in the arithmetic of the rings ℤn for multiple-precision integers n. We canonically represent the elements of ℤn by integers between 0 and n – 1.

Let a, b ∈ ℤn. In order to compute a + b in ℤn, we compute the integer sum a + b, and, if a + b ≥ n, we subtract n from a + b. This gives us the desired canonical representative in ℤn. Similarly, for computing a – b in ℤn, we subtract b from a as integers, and, if the difference is negative, we add n to it. For computing ab in ℤn, we multiply a and b as integers and then take the remainder of Euclidean division of this product by n.

Note that a ∈ ℤn is invertible (that is, a ∈ ℤn*) if and only if gcd(a, n) = 1. For a ∈ ℤn, a ≠ 0, we call the extended (binary) gcd algorithm with a and n as the arguments and get integers d, u, v satisfying d = gcd(a, n) = ua + vn. If d > 1, then a is not invertible modulo n. Otherwise, we have ua ≡ 1 (mod n), that is, a^(–1) ≡ u (mod n). The extended gcd algorithm indeed returns a value of u satisfying |u| < n. Thus if u > 0, it is the canonical representative of a^(–1), whereas if u < 0, then u + n is the canonical representative of a^(–1).
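For word-sized operands, this inversion procedure may be sketched as follows; for brevity the sketch uses the plain (not binary) extended Euclidean loop, and the function name and the error convention are ours.

```c
#include <stdint.h>

/* Inverse of a modulo n, via the extended Euclidean relation
   u*a + v*n = gcd(a, n); returns the canonical representative in
   {1, ..., n-1}, or -1 if a is not invertible modulo n. */
int64_t mod_inverse(int64_t a, int64_t n)
{
    int64_t u0 = 0, u1 = 1;           /* coefficients of a, modulo n */
    int64_t r0 = n, r1 = a % n;       /* invariant: ui * a = ri (mod n) */
    while (r1 != 0) {
        int64_t q = r0 / r1, t;
        t = r0 - q * r1; r0 = r1; r1 = t;
        t = u0 - q * u1; u0 = u1; u1 = t;
    }
    if (r0 != 1) return -1;           /* gcd(a, n) > 1: not invertible */
    return u0 < 0 ? u0 + n : u0;      /* normalize a negative u */
}
```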

Modular exponentiation

Another frequently needed operation in ℤn is modular exponentiation, that is, the computation of a^e for some a ∈ ℤn and e ∈ ℤ. Since a^0 = 1 for all a ∈ ℤn and since a^e = (a^(–1))^(–e) for e < 0 and a ∈ ℤn*, we may assume, without loss of generality, that e ∈ ℕ. Computing the integral power a^e followed by taking the remainder of Euclidean division by n is not an efficient way to compute a^e in ℤn. Instead, after every multiplication, we reduce the product modulo n. This keeps the sizes of the intermediate products small. Furthermore, it is also a bad idea to compute a^e as (· · ·((a · a)a) · · · a), which involves e – 1 multiplications. It is possible to compute a^e using O(lg e) multiplications and O(lg e) squarings in ℤn, as Algorithm 3.9 suggests. This algorithm requires the bits of the binary expansion of the exponent e, which are easily obtained by bit operations on the words of e.

The for loop iteratively computes bi := a^((er–1 . . . ei)2) (mod n) starting from the initial value br := 1. Since (er–1 . . . ei)2 = 2(er–1 . . . ei+1)2 + ei, we have bi ≡ bi+1^2 a^ei (mod n). This establishes the correctness of the algorithm. The squaring (b^2) and multiplication (ba) inside the for loop of the algorithm are computed in ℤn (that is, as integer multiplication followed by reduction modulo n). If we assume that er–1 = 1, then r = ⌈lg e⌉. The algorithm carries out r squarings and ρ ≤ r multiplications in ℤn, where ρ is the number of bits of e that are 1. On an average, ρ = r/2. Algorithm 3.9 runs in time O((log e)(log n)^2). Typically, e = O(n), so this running time is O((log n)^3).

Algorithm 3.9. Modular exponentiation: square-and-multiply algorithm

Input: a ∈ ℤn, e ∈ ℕ.

Output: b = a^e (mod n).

Steps:

Let the binary expansion of e be e = (er–1 . . . e1e0)2, where each ei ∈ {0, 1}.
b := 1.
for (i = r – 1, . . . , 0) {
   b := b2 (mod n).    /* Squaring */
   if (ei = 1) b := ba (mod n).    /* Multiplication */
}
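For word-sized operands, Algorithm 3.9 may be sketched in C99 as follows; n must fit in 32 bits so that each intermediate product fits in 64 bits (the function name is ours).

```c
#include <stdint.h>

/* Left-to-right square-and-multiply: returns a^e mod n for n >= 1. */
uint32_t mod_exp(uint32_t a, uint32_t e, uint32_t n)
{
    uint64_t b = 1 % n;                          /* handles n == 1 */
    int i;
    for (i = 31; i >= 0; i--) {                  /* leading zero bits are harmless */
        b = (b * b) % n;                         /* squaring */
        if ((e >> i) & 1)
            b = (b * (uint64_t)a) % n;           /* multiplication */
    }
    return (uint32_t)b;
}
```

Scanning all 32 exponent positions (instead of starting at the most significant one-bit) only wastes a few squarings of 1 and keeps the sketch branch-free in structure.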

Now, we describe a simple variant of this square-and-multiply algorithm, in which we choose a small t and use the 2^t-ary representation of the exponent e. The case t = 1 corresponds to Algorithm 3.9. In practical situations, t = 4 is a good choice. As in Algorithm 3.9, multiplication and squaring are done in ℤn.

Algorithm 3.10. Modular exponentiation: windowed square-and-multiply algorithm

Input: a ∈ ℤn, e ∈ ℕ.

Output: b = a^e (mod n).

Steps:

Let e = (er–1 . . . e1e0)2^t, where each ei ∈ {0, 1, . . . , 2^t – 1}.
Compute and store a^l (mod n) for l = 0, 1, . . . , 2^t – 1.   /* Precomputation */
b := 1.
for (i = r – 1, . . . , 0) {
   for (j = 1, . . . , t) b := b2 (mod n).    /* Squaring */
   b := b a^ei (mod n).     /* Multiplication: Read a^ei from the precomputed table */
}

In Algorithm 3.10, the powers a^l, l = 0, 1, . . . , 2^t – 1, are precomputed using the formulas a^0 = 1, a^1 = a and a^l = a^(l–1) · a for l ≥ 2. The number of squarings inside the for loop remains (almost) the same as in Algorithm 3.9. However, the number of multiplications in this loop reduces at the expense of the precomputation step. For example, let n be an integer of bit length 1024 and let e < n. A randomly chosen e of this size has about 512 one-bits. Therefore, the for loop of Algorithm 3.9 does about 512 multiplications, whereas with t = 4 Algorithm 3.10 does only 1024/4 = 256 multiplications, with the precomputation step requiring 14 multiplications. Thus, the total number of multiplications reduces from (about) 512 to 14 + 256 = 270.
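A word-sized C99 sketch of Algorithm 3.10 with t = 4, so that a 32-bit exponent is consumed in eight radix-16 digits (the function name is ours; n must fit in 32 bits):

```c
#include <stdint.h>

/* 2^t-ary square-and-multiply with t = 4: returns a^e mod n for n >= 1. */
uint32_t mod_exp_window(uint32_t a, uint32_t e, uint32_t n)
{
    uint64_t table[16], b = 1 % n;
    int i, j, l;

    table[0] = 1 % n;                        /* precompute a^l (mod n) */
    for (l = 1; l < 16; l++)
        table[l] = (table[l - 1] * a) % n;

    for (i = 28; i >= 0; i -= 4) {           /* the 8 radix-16 digits of e */
        for (j = 0; j < 4; j++)
            b = (b * b) % n;                 /* t squarings */
        b = (b * table[(e >> i) & 0xF]) % n; /* one table multiplication */
    }
    return (uint32_t)b;
}
```

Multiplying by table[0] = 1 when a digit is zero wastes a multiplication; Exercise 3.16 shows how a sliding window avoids exactly this waste.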

Montgomery exponentiation

During a modular exponentiation in ℤn, every reduction (computation of remainder) is done by the fixed modulus n. Montgomery exponentiation exploits this fact and speeds up each modular reduction at the cost of some preprocessing overhead.

Assume that the storage of n requires s ℜ-ary digits, that is, n = (ns–1 . . . n0) (with ns–1 ≠ 0). Take R := ℜ^s = 2^(32s), so that R > n. As is typical in most cryptographic situations, n is an odd integer (for example, a big prime or a product of two big primes). Then gcd(ℜ, n) = gcd(R, n) = 1. Use the extended gcd algorithm to precompute n′ := –n^(–1) (mod ℜ).

We associate x ∈ ℤn with x̄ ∈ ℤn, where x̄ ≡ xR (mod n). Since R is invertible modulo n, this association gives a bijection of ℤn onto itself. This bijection respects the addition in ℤn: that is, x̄ + ȳ is the Montgomery representation of x + y. Multiplication in ℤn, on the other hand, corresponds to the computation of x̄ȳR^(–1) (mod n), which is the Montgomery representation of xy, and can be implemented as Algorithm 3.11 suggests.

Algorithm 3.11. Montgomery multiplication

Input: x̄ and ȳ (Montgomery representations of x, y ∈ ℤn).

Output: The Montgomery representation of xy ∈ ℤn.

Steps:

w := x̄ȳ.    /* The integer product, with ℜ-ary digits wi */
for (i = 0, . . . , s – 1) w := w + ((wi n′) rem ℜ) n ℜ^i.
w := w/R.
if (w ≥ n) w := w – n.
Return w.

Montgomery multiplication works as follows. In the first step, it computes the integer product w := x̄ȳ. The subsequent for loop adds suitable multiples of n to w. Since n′ ≡ –n^(–1) (mod ℜ), the i-th iteration of the loop makes wi = 0 (and leaves wi–1, . . . , w0 unchanged). So when the for loop terminates, we have w0 = w1 = · · · = ws–1 = 0: that is, w is a multiple of ℜ^s = R. Therefore, w/R is an integer. Furthermore, this w is obtained by adding to x̄ȳ a multiple of n: that is, w = x̄ȳ + kn for some integer k ≥ 0. Since R is coprime to n, it follows that w/R ≡ x̄ȳR^(–1) (mod n). But this value may be bigger than the canonical representative of x̄ȳR^(–1). Since k is an integer with s ℜ-ary digits (so that k < R) and x̄ < n, ȳ < n and n < R, it follows that w/R = (x̄ȳ + kn)/R < 2n. Therefore, if w/R exceeds n – 1, a single subtraction suffices.

Computation of the product x̄ȳ requires ≤ s^2 single-precision multiplications. One can use the optimized Algorithm 3.4 for that purpose. In the case of squaring (x̄ = ȳ), nearly half of these word products are repeated, and further optimizations (say, in the form of Karatsuba’s method) can be employed.

Each iteration of the for loop carries out s + 1 single-precision multiplications. (The reduction modulo ℜ is just returning the less significant word in the two-word product wi n′.) Since the for loop is executed s times, Algorithm 3.11 performs a total of ≤ s^2 + s(s + 1) = 2s^2 + s single-precision multiplications.

Integer multiplication (Algorithm 3.4) followed by classical modular reduction (Algorithm 3.6) does almost an equal number of single-precision multiplications, but also O(s) divisions of double-precision integers by single-precision ones. It turns out that the complicated for loop of Algorithm 3.6 is slower than the much simpler loop in Algorithm 3.11. But if the precomputations in Montgomery multiplication are taken into account, we do not, in general, achieve a speed-up with this new technique. For modular exponentiations, however, the precomputations need to be done only once, that is, outside the square-and-multiply loop, and Montgomery multiplication pays off. In Algorithm 3.12, we rewrite Algorithm 3.9 in terms of Montgomery arithmetic. A similar rewriting applies to Algorithm 3.10.
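The case s = 1 of Algorithm 3.11 (a one-word modulus, R = 2^32) already exhibits the whole structure. The sketch below also computes n′ by Hensel (Newton) lifting instead of the extended gcd, a standard alternative; the names are ours, and n must stay below 2^31 here so that w + mn fits in 64 bits.

```c
#include <stdint.h>

/* One-word Montgomery multiplication: given xb = x*R mod n and
   yb = y*R mod n (with R = 2^32, n odd, n < 2^31), return x*y*R mod n. */
uint32_t mont_mul(uint32_t xb, uint32_t yb, uint32_t n, uint32_t nprime)
{
    uint64_t w = (uint64_t)xb * yb;              /* the integer product */
    uint32_t m = (uint32_t)w * nprime;           /* m = w * n' mod 2^32 */
    uint64_t z = (w + (uint64_t)m * n) >> 32;    /* low word is 0: exact /R */
    return (uint32_t)(z >= n ? z - n : z);       /* single conditional subtract */
}

/* n' = -n^{-1} mod 2^32 for odd n, by Hensel lifting. */
uint32_t mont_nprime(uint32_t n)
{
    uint32_t inv = n;                 /* n*n = 1 (mod 8): correct to 3 bits */
    for (int i = 0; i < 4; i++)
        inv *= 2 - n * inv;           /* each step doubles the correct bits */
    return (uint32_t)(0u - inv);      /* negate: n' = -n^{-1} mod 2^32 */
}
```

Since m·n ≡ –w (mod 2^32), the low word of w + mn vanishes, so the right shift is an exact division by R, mirroring the for loop of Algorithm 3.11 collapsed to one iteration.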

Algorithm 3.12. Montgomery exponentiation

Input: a ∈ ℤn, e ∈ ℕ.

Output: b = ae (mod n).

Steps:

/* Precomputations */
n′ := –n^(–1) (mod ℜ), ā := aR (mod n), b̄ := R (mod n).

/* The square-and-multiply loop, with e = (er–1 . . . e1e0)2 and Mont(·, ·) denoting Algorithm 3.11 */
for (i = r – 1, . . . , 0) {
   b̄ := Mont(b̄, b̄).    /* Squaring */
   if (ei = 1) b̄ := Mont(b̄, ā).    /* Multiplication */
}
b := Mont(b̄, 1).    /* Convert back from the Montgomery representation */

Exercise Set 3.3

3.8 Let ℜ ∈ ℤ, ℜ > 1. Show that every positive integer a can be represented uniquely as a tuple (as–1, . . . , a1, a0) for some s ∈ ℕ (depending on a) with

a = as–1s–1 + · · · + a1ℜ + a0,

0 ≤ ai < ℜ for all i and as–1 ≠ 0. In this case, we write a as (as–1 . . . a0) or simply as as–1 . . . a0, when ℜ is understood from the context. ℜ is called the radix or base of this representation, as–1, . . . , a0 the (ℜ-ary) digits of a, as–1 the most significant digit, a0 the least significant digit and s the size of a with respect to the radix ℜ.

3.9Let . Show that every can be written uniquely as

a = asR^s + as–1R^(s–1) + · · · + a1R + a0

with each .

3.10

Negative radix Show that every integer a can be written as

a = as(–2)^s + as–1(–2)^(s–1) + · · · + a1(–2) + a0

with each ai ∈ {0, 1}. Moreover, if we force that as ≠ 0 for a ≠ 0 and that s = 0 for a = 0, argue that this representation is unique.

3.11 Investigate the relative merits and demerits of the following three representations (in C) of multiple-precision integers needed for cryptography. In each case, we have room for storing 256 ℜ-ary words, the actual size and a sign indicator. In the second and third representations, we use two extra locations (sizeIdx and signIdx) in the digit array for holding the size and sign information.
/* Representation 1 */
typedef struct {
   int size;
   boolean sign;
   ulong digits[256];
} cryptInt1;
/* Representation 2 */
typedef ulong cryptInt2[258];
#define signIdx 0
#define sizeIdx 1
/* Representation 3 */
typedef ulong cryptInt3[258];
#define signIdx 256
#define sizeIdx 257

Remark: We recommend the third representation.

3.12 Write an algorithm that prints a multiple-precision integer in decimal and an algorithm that accepts a string of decimal digits (optionally preceded by a + or – sign) and stores the corresponding integer as a multiple-precision integer. Also write algorithms for input and output of multiple-precision integers in hexadecimal, octal and binary.
3.13 Write an algorithm which, given two multiple-precision integers a and b, compares the absolute values |a| and |b|. Also write an algorithm to compare a and b as signed integers.
3.14
  1. Write an algorithm that uses the Euclidean gcd loop (Proposition 2.15) to compute the gcd d of two integers a and b. (Observe that gcd(a, b) = gcd(b, a rem b) for b ≠ 0.)

  2. Modify the Euclidean gcd algorithm of Part (a), so that for given integers a, b we obtain d, u, v with d = gcd(a, b) = ua + vb.

3.15 Describe a representation of rational numbers with exact multiple-precision numerators and denominators. Implement the arithmetic (addition, subtraction, multiplication and division) of rational numbers under this representation.
3.16

Sliding window exponentiation Suppose we want to compute the modular exponentiation a^e (mod n). Consider the following variant of the square-and-multiply algorithm: Choose a small t (say, t = 4) and precompute a^(2^(t–1)), a^(2^(t–1)+1), . . . , a^(2^t–1) modulo n. Do a squaring for every bit of e, but skip the multiplication for zero bits in e. Whenever a 1 bit is found, consider the next t bits of e (including the 1 bit). Let these t bits represent the integer l, 2^(t–1) ≤ l ≤ 2^t – 1. Multiply by a^l (mod n) (after computing the usual t squarings) and move right in e by t bit positions. Argue that this method works and write an algorithm based on this strategy. What are the advantages and disadvantages of this method over Algorithm 3.10?

3.17 Suppose we want to compute a^e b^f (mod n), where both e and f are positive r-bit integers. One possibility is to compute a^e and b^f modulo n individually, followed by a modular multiplication. This strategy requires the running time of two exponentiations (neglecting the time for the final multiplication). In this exercise, we investigate a trick to reduce this running time to something close to 1.25 times the time for one exponentiation. Precompute ab (mod n). Inside the square-and-multiply loop, either skip the multiplication or multiply by a, b or ab, depending upon the next bits in the two exponents e and f. Complete the details of this algorithm. Deduce that, on average, the running time of this algorithm is as declared above.
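The trick of this exercise can be sketched for single-precision operands as follows (the function name dual_expmod and the 64-bit types are ours):

```c
#include <stdint.h>

typedef unsigned __int128 u128;

static uint64_t mulmod(uint64_t x, uint64_t y, uint64_t n)
{
    return (uint64_t)((u128)x * y % n);
}

/* Simultaneous square-and-multiply for a^e * b^f (mod n): one squaring
   per bit position, and a multiplication by a, b or the precomputed ab
   according to the current bit pair of (e, f). */
uint64_t dual_expmod(uint64_t a, uint64_t e, uint64_t b, uint64_t f, uint64_t n)
{
    uint64_t ab = mulmod(a % n, b % n, n);   /* the single precomputation */
    uint64_t x = 1 % n;
    for (int i = 63; i >= 0; i--) {
        x = mulmod(x, x, n);
        int ei = (int)((e >> i) & 1), fi = (int)((f >> i) & 1);
        if (ei && fi)      x = mulmod(x, ab, n);
        else if (ei)       x = mulmod(x, a % n, n);
        else if (fi)       x = mulmod(x, b % n, n);
    }
    return x;
}
```

For random exponents, three of the four bit pairs trigger a multiplication, so each bit costs one squaring plus 3/4 of a multiplication, which gives the declared 1.25 factor.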
3.18 Let m ∈ ℕ, m ≠ 1. An addition chain for m of length l is a sequence 1 = a_1, a_2, . . . , a_l = m of natural numbers such that for every index i, 2 ≤ i ≤ l, there exist indices i_1, i_2 < i with a_i = a_(i_1) + a_(i_2). (It is allowed to have i_1 = i_2.)
  1. If 1 = a_1, a_2, . . . , a_l = m is an addition chain for m and if j_1, j_2, . . . , j_l is a permutation of 1, 2, . . . , l with a_(j_1) ≤ a_(j_2) ≤ · · · ≤ a_(j_l), show that a_(j_1), a_(j_2), . . . , a_(j_l) is also an addition chain for m. It, therefore, suffices to consider sorted addition chains only.

  2. Show that m has an addition chain of length ≤ 2 ⌈lg m⌉. [H]

  3. Let G be a (multiplicative) group and g ∈ G. Design an algorithm for computing g^m given an addition chain for m. What is the complexity of the algorithm (in terms of the length of the given addition chain)?

  4. Show that Algorithms 3.9 and 3.10 use addition chains for e of lengths ≤ 2 ⌈lg e⌉.

3.4. Elementary Number-theoretic Computations

Now that we know how to work in ℤ and in the residue class rings ℤn, n ∈ ℕ, we address some important computational problems associated with these rings. In this chapter, we restrict ourselves only to those problems that are needed for setting up various cryptographic protocols.

3.4.1. Primality Testing

One of the simplest and oldest questions in algorithmic number theory is to decide if a given integer n ∈ ℕ, n > 1, is prime or composite. Practical primality testing algorithms are based on randomization techniques. In this section, we describe the Monte Carlo algorithm due to Miller and Rabin. The obvious question that comes next is to find one (or all) of the prime factors of an integer, deterministically or probabilistically proven to be composite. This is the celebrated integer factorization problem and will be formally introduced in Section 4.2. In spite of the apparent proximity between the primality testing and the integer factoring problems, they currently have widely different (known) complexities. Primality testing is easy and thereby promotes efficient setting up of cryptographic protocols. On the other hand, the difficulty of factoring integers protects these protocols against cryptanalytic attacks.

Definition 3.2.

Let n be an odd integer greater than 1 and let a ∈ ℤ with gcd(a, n) = 1. Then n is called a pseudoprime to the base a, if a^(n–1) ≡ 1 (mod n).

By Fermat’s little theorem, a prime p is a pseudoprime to every base a with gcd(a, p) = 1. However, the converse of this is not true. By Exercise 3.19, n is not a pseudoprime to at least half of the bases in ℤn*, provided that there is at least one such base in ℤn*. Unfortunately, there exist composite integers m, known as Carmichael numbers, such that m is a pseudoprime to every base a ∈ ℤm*. The smallest Carmichael number is 561 = 3 × 11 × 17. Exercises 3.21 and 3.22 investigate some properties of these numbers. Though Carmichael numbers are not very abundant in nature, they are still infinite in number. So a robust primality test requires n to satisfy certain constraints in addition to being a pseudoprime to one or more bases. The following constraint is due to Solovay and Strassen.

Definition 3.3.

Let n be an odd integer > 1 and let a ∈ ℤ with gcd(a, n) = 1. Then n is called an Euler pseudoprime or a Solovay–Strassen pseudoprime to the base a, if a^((n–1)/2) ≡ (a/n) (mod n), where (a/n) is the Jacobi symbol (Definition 2.32). Clearly, an Euler pseudoprime to the base a is also a pseudoprime to the base a.

By Euler’s criterion (Proposition 2.21), if p is a prime and gcd(a, p) = 1, then p is an Euler pseudoprime to the base a. The converse is not true in general, but if n is composite, then n is an Euler pseudoprime to at most φ(n)/2 bases in ℤn* (Exercise 3.20). This, in turn, implies that if n is an Euler pseudoprime to t randomly chosen bases in ℤn*, then the chance that n is composite is no more than 1/2^t. This observation leads to a Monte Carlo algorithm for testing the primality of an integer, where the probability of error (1/2^t) can be made arbitrarily small by choosing large values of t. A more efficient algorithm can be developed using the following concept due to Miller and Rabin.

Definition 3.4.

Let n be an odd integer > 1 with n – 1 = 2^r n′, r := v_2(n – 1) > 0, n′ odd, and let a ∈ ℤ with gcd(a, n) = 1. Then n is called a strong pseudoprime to the base a, if either a^n′ ≡ 1 (mod n) or a^(2^i n′) ≡ –1 (mod n) for some i, 0 ≤ i < r. It is clear that if n is a strong pseudoprime to the base a, then n is also a pseudoprime to the base a. What is less evident but still true is that if n is a strong pseudoprime to the base a, then n is also an Euler pseudoprime to the base a.

The rationale behind this definition is the following. If for some a with gcd(a, n) = 1 we have a^(n–1) ≢ 1 (mod n), we conclude with certainty that n is composite. So assume that a^(n–1) ≡ 1 (mod n) and consider the powers b_i := a^(2^i n′) (mod n) for i = 0, 1, . . . , r to see how the sequence b_0, b_1, . . . eventually reaches b_r ≡ 1 (mod n). If b_0 ≡ 1 (mod n) already, this dynamics is clear. If, on the other hand, we have an i such that b_i ≢ 1 (mod n), whereas b_(i+1) ≡ 1 (mod n), then b_i is a square root of 1 modulo n. If n is a prime, the only square roots of 1 modulo n are ±1 and so n must be a strong pseudoprime to the base a. On the other hand, if n is composite but not the power of a prime, then 1 has at least two non-trivial square roots (that is, square roots other than ±1) modulo n (Exercise 3.30). We hope to find one such non-trivial square root of 1 in the sequence b_0, b_1, . . . , b_(r–1) and if we are successful, the compositeness of n is proved with certainty.

A complete residue system modulo an odd composite n contains at most n/4 bases to which n is a strong pseudoprime. The proof of this fact is somewhat involved (though elementary) and can be found elsewhere, for example, in Chapter V of Koblitz [153]. Here, we concentrate on the Monte Carlo Algorithm 3.13 known as the Miller–Rabin primality test and based on this observation.

Algorithm 3.13. Miller–Rabin primality test

Input: An odd integer n ∈ ℕ and an acceptable probability δ of failure.

Output: A certificate that either “n is composite” or “n is prime”.

Steps:

Find out n′ and r such that n – 1 = 2^r n′ with r ≥ 1 and n′ odd.
Determine the number t of iterations, so that the probability of failure is ≤ δ.
for (j = 1, . . . , t) {
   Choose a random base a, 1 < a < n.
   b := a^n′  (mod n).   /* Compute b_0 */
   if (b ≢ 1 (mod n)) {
      i := 0.
      while (i < r – 1) and (b ≢ –1 (mod n)) {
         i++, b := b^2 (mod n).    /* Compute b_i by squaring b_(i–1) */
         if (b ≡ 1 (mod n)) { Return “n is composite”. }
      }
      if (b ≢ –1 (mod n)) { Return “n is composite”. }
   }
}
Return “n is prime”.

Whenever Algorithm 3.13 outputs “n is composite”, it is correct. On the other hand, if it certifies n as prime, there is a probability ≤ δ that n is composite. This probability can be made very small by choosing a suitably large value of the iteration count t. For cryptographic applications, δ ≤ 1/2^80 is considered sufficiently safe. In view of the first statement of the last paragraph, we can take t = 40 to meet this error bound. In practice, much smaller values of t offer the desired confidence. For example, if n is of bit length 250, 500, 750 or 1000, the respective values t = 12, 6, 4 and 3 suffice.

Although, in Algorithm 3.13, we have chosen a to be an arbitrary integer between 2 and n – 2, there is apparently no harm, if we choose a randomly in the interval 2 ≤ a < 2^32. In fact, such a choice of single-precision bases is desirable, because that makes the exponentiation a^n′ (mod n) more efficient (see Algorithm 3.9). A typical cryptographic application loads at start-up a precalculated table of small primes (say, the first thousand primes). Choosing the bases randomly from this list of small primes is indeed a good idea.
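Algorithm 3.13 can be sketched for single-precision n as follows; the random bases are replaced by the small-prime bases suggested above (with this particular fixed base set the test happens to be correct for all 64-bit integers, though the text's analysis only needs random bases), and the names are ours:

```c
#include <stdint.h>

typedef unsigned __int128 u128;

static uint64_t mulmod(uint64_t x, uint64_t y, uint64_t n)
{
    return (uint64_t)((u128)x * y % n);
}

static uint64_t powmod(uint64_t a, uint64_t e, uint64_t n)
{
    uint64_t x = 1 % n;
    for (a %= n; e; e >>= 1) {
        if (e & 1) x = mulmod(x, a, n);
        a = mulmod(a, a, n);
    }
    return x;
}

/* One iteration of the Miller-Rabin loop for a fixed base a: returns 0
   if the base proves n composite, 1 if n is a strong pseudoprime to a. */
static int strong_probable_prime(uint64_t n, uint64_t a)
{
    uint64_t np = n - 1;
    int r = 0;
    while (!(np & 1)) { np >>= 1; r++; }      /* n - 1 = 2^r * n'      */
    uint64_t b = powmod(a, np, n);            /* b_0 = a^n' mod n      */
    if (b == 1 || b == n - 1) return 1;
    for (int i = 1; i < r; i++) {
        b = mulmod(b, b, n);
        if (b == n - 1) return 1;
        if (b == 1) return 0;   /* non-trivial square root of 1 found  */
    }
    return 0;
}

int is_probable_prime(uint64_t n)
{
    static const uint64_t bases[] = {2,3,5,7,11,13,17,19,23,29,31,37};
    if (n < 2) return 0;
    for (int i = 0; i < 12; i++) {
        if (n == bases[i]) return 1;
        if (n % bases[i] == 0) return 0;
        if (!strong_probable_prime(n, bases[i])) return 0;
    }
    return 1;
}
```

Note that the Carmichael number 561 is rejected immediately: it is a pseudoprime to many bases but not a strong pseudoprime to the base 2.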

Deterministic primality proving

While the Miller–Rabin algorithm settles the primality testing problem in a practical sense, it is, after all, a randomized algorithm. It is interesting, at the minimum theoretically, to investigate the deterministic complexity of primality testing. There has been a good amount of research in this line. Let us sketch here the history of deterministic primality proving, without going to rigorous mathematical details.

One natural strategy to check the primality of a positive integer n is to factor it. However, factoring integers is a computationally difficult problem. Primality proving has turned out to be a much easier computational exercise. That is, one need not factor n explicitly in order to settle the primality of n.

The (seemingly) first modern primality testing algorithm is due to Miller [204]. This algorithm is deterministic polynomial-time, provided that the extended Riemann hypothesis or ERH (Conjecture 2.3) is true. Since the ERH is still an unsolved problem in mathematics, it cannot be claimed with certainty that Miller’s test is really a polynomial-time algorithm. Rabin [248] provided a version of Miller’s test which is unconditionally polynomial-time, but is, at the same time, randomized. This is what we have discussed earlier under the name Miller–Rabin primality test. This is a Monte Carlo algorithm which produces the answer no (composite) with certainty, but the answer yes (prime) with some (small) probability of error. Solovay and Strassen’s test [287] based on Definition 3.3 is another no-biased randomized polynomial-time primality test and can be made deterministic polynomial-time under the ERH.

Adleman and Huang [3], using the work of Goldwasser and Kilian [116], provide a yes-biased randomized primality-proving algorithm that runs in expected polynomial time unconditionally. Adleman et al. [4] propose the first deterministic algorithm that runs unconditionally in time less than fully exponential (in log n). Its (worst-case) running time is (ln n)^(O(ln ln ln n)), which is still not polynomial. (The exponent ln ln ln n grows very slowly with n, but still is not a constant.)

In August 2002, Agrawal, Kayal and Saxena came up with the first deterministic primality testing algorithm that runs in polynomial time unconditionally, that is, under no unproven assumptions. This algorithm, popularly abbreviated as the AKS algorithm, is based on the observation that n is prime if and only if (X + a)^n ≡ X^n + a (mod n) for every a ∈ ℤ (Exercise 3.26). A naive application of this observation requires computing an exponential number of coefficients in the binomial expansion of (X + a)^n. The AKS algorithm gets around this difficulty by checking the new congruence

Equation 3.2

(X + a)^n ≡ X^n + a (mod n, h(X))
for some polynomial h(X) of small degree. Here the notation (mod n, h(X)) means modulo the ideal ⟨n, h(X)⟩ of ℤ[X]. If deg h(X) is bounded by a polynomial in log n, (X + a)^n (and also X^n + a) can be computed modulo (n, h(X)) in polynomial time. However, reduction modulo h(X) may allow a composite n to satisfy the new congruence. Agrawal et al. took h(X) := X^r – 1 for some prime r = O(ln^6 n) with r – 1 having a suitably large prime divisor. From a result in analytic number theory due to Fouvry, such a prime r always exists. Congruence (3.2) is verified for this h(X) and for only polynomially many (in ln n) values of a. An elementary proof presented in Agrawal et al. [5] demonstrates that this suffices to conclude deterministically and unconditionally about the primality of n. The AKS algorithm in this form runs in time O~(ln^12 n).

Lenstra and Pomerance [175] have reduced the running time of the AKS algorithm to O~(ln^6 n). The AKS paper comes with another conjecture which, if true, yields an O~(ln^3 n) deterministic primality-proving algorithm.

Conjecture 3.1. AKS conjecture

Let n be an odd integer > 1, and let r ∈ ℕ with r ∤ n. If

(X – 1)^n ≡ X^n – 1 (mod n, X^r – 1),

then either n is prime or n^2 ≡ 1 (mod r).

It remains an open question whether a future version of the AKS algorithm would supersede the Miller–Rabin test in terms of performance. As long as the answers are not favourable to the AKS algorithm, these new theoretical endeavours do not seem to have sufficient impact on cryptography. Primes certified by the Miller–Rabin test are at present secure enough for all applications. Nonetheless, the AKS breakthrough has solid theoretical implications and deserves mention in a prime context.

3.4.2. Generating Random Primes

If a random prime of a given bit length t is called for, we can keep on generating random odd integers of bit length t and check these integers for primality using the Miller–Rabin test. The prime number theorem (Theorem 2.20) ascertains that we expect to find a prime after O(t) iterations. A somewhat similar but reasonably faster algorithm is discussed in Exercise 4.14. We will henceforth call random primes of a given bit length and having no additional imposed properties naive primes. Naive primes are often not cryptographically secure, because the primes used in many protocols should satisfy certain properties in order to preclude some known cryptanalytic attacks.

Definition 3.5.

Let p be an odd prime. Then p is called a safe prime, if (p – 1)/2 is also a prime, whereas p is called a strong prime, if

  1. p – 1 has a large prime divisor, say, q,

  2. p + 1 has a large prime divisor, say, q′, and

  3. q – 1 has a large prime divisor, say, q″.

In cryptography, a large prime divisor typically refers to one with bit length ≥ 160.

A random safe prime of a given bit length t can be found by generating a random sequence of natural numbers n congruent to 3 modulo 4 and of bit length t, until one is found for which both n and (n – 1)/2 are prime (as certified by the Miller–Rabin primality test). The prime number theorem once again implies that this search is expected to terminate after O(t^2) iterations.
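A toy version of this search can be sketched as follows; trial division stands in for the Miller–Rabin certificate, and we scan upward from a starting point instead of sampling at random, so both simplifications (and the names) are ours:

```c
#include <stdint.h>

/* Trial-division primality test: adequate only for this small
   demonstration, where the text intends the Miller-Rabin test. */
static int is_prime(uint64_t n)
{
    if (n < 2) return 0;
    for (uint64_t d = 2; d * d <= n; d++)
        if (n % d == 0) return 0;
    return 1;
}

/* Scan upward from `start` until both n and (n - 1)/2 are prime.
   A safe prime > 5 is necessarily congruent to 3 (mod 4), because
   (n - 1)/2 must be odd, so only that residue class is examined. */
uint64_t next_safe_prime(uint64_t start)
{
    uint64_t n = start;
    while (n % 4 != 3) n++;
    for (;; n += 4)
        if (is_prime(n) && is_prime((n - 1) / 2))
            return n;
}
```

Restricting attention to n ≡ 3 (mod 4) quarters the number of candidates without losing any safe prime above 5.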

For generating a random strong prime p of bit length t, we first generate q′ and q″ and then q and finally p. (See the notations of Definition 3.5.) Algorithm 3.14 describes Gordon’s algorithm in which the bit lengths l and l′ of q and q′ are nearly t/2 and the bit length l″ of q″ is slightly smaller than l′. In our concrete implementation of the algorithm, we choose l := ⌈t/2⌉ – 2, l′ := ⌊t/2⌋ – 20 and l″ := ⌈t/2⌉ – 22. If t is sufficiently large (say, t ≥ 400), the prime divisors q, q′ and q″ are then cryptographically large.

The simple check that Gordon’s algorithm correctly computes a strong prime of bit length t with q, q′ and q″ as in Definition 3.5 is based on Fermat’s little theorem and is left to the reader. Note that with our choice of l, l′ and l″, the loop variables i and j run through single-precision values only, thereby making arithmetic involving them efficient. Also note that the ranges over which i and j vary are sufficiently large so that we expect the (outer) while loop to be executed only once. This implementation has a tendency to generate smaller values of q and p (with the given bit sizes). In practice, this is not a serious problem and can be avoided, if desired, by choosing random values of i and j from the indicated ranges.

Algorithm 3.14. Gordon’s strong-prime generator

Input: t ∈ ℕ, t ≥ 400.

Output: A strong prime p of bit length t.

Steps:

l := ⌈t/2⌉ – 2, l′ := ⌊t/2⌋ – 20, l″ := ⌈t/2⌉ – 22.

while (1) {
    Find a (random) naive prime q′ of bit length l′.
    Find a (random) naive prime q″ of bit length l″.
    for (i = ⌈(2^(l–1) – 1)/2q″⌉, . . . , ⌊(2^l – 2)/2q″⌋) {                 /* Search for q */
       q := 2iq″ + 1.
       if (q is prime) {
          p′ := 2((q′)^(q–2) mod q)q′ – 1.
          for (j = ⌈(2^(t–1) – p′)/2qq′⌉, . . . , ⌊(2^t – 1 – p′)/2qq′⌋) {     /* Search for p */
             p := p′ + 2jqq′.
             if (p is prime) { Return p. }
          }
       }
    }
}

Gordon’s algorithm takes only nominally more expected running time than that needed by the algorithm discussed at the beginning of Section 3.4.2 for generating naive primes of the same bit length. On the other hand, safe primes are much costlier to generate and may be avoided, unless the situation specifically demands their usage.

3.4.3. Modular Square Roots

Determination of square roots modulo a prime p is frequently needed in cryptographic applications. In this section, we assume that p is an odd prime and want to compute the square roots of a ∈ ℤ, gcd(a, p) = 1, modulo p, provided that a is a quadratic residue modulo p, that is, if (a/p) = 1. Using the Jacobi symbol, the value (a/p) can be computed efficiently, as Algorithm 3.15 suggests.

The correctness of Algorithm 3.15 follows from the properties of the Jacobi symbol (Proposition 2.22 and Theorem 2.19). The value of (–1)^((b^2–1)/8) is determined by the value of b modulo 8, that is, by the three least significant bits of b: it equals +1 for b ≡ ±1 (mod 8) and –1 for b ≡ ±3 (mod 8).

Similarly, (–1)^((a–1)(b–1)/4) can be computed using only the second least significant bits of a and b: it equals –1 if and only if a ≡ b ≡ 3 (mod 4), and +1 otherwise.

If (a/p) = 1, our next task is to compute x ∈ ℤ with x^2 ≡ a (mod p). If one such x is found, the other square root of a modulo p is –x ≡ p – x (mod p). If p ≡ 3 (mod 4) or p ≡ 5 (mod 8), we have explicit formulas for a square root x. The remaining case, namely p ≡ 1 (mod 8), is somewhat complicated. In this case, we use the probabilistic algorithm due to Tonelli and Shanks. The details are given in Algorithm 3.16. The explicit formulas for the first two cases are easy to verify. We now prove the correctness of the algorithm in the remaining case.

Algorithm 3.15. Computation of the Legendre symbol

Input: An odd prime p and an integer a, 1 ≤ a < p.

Output: The Legendre symbol (a/p).

Steps:

b := p, k := 1.    /* Initialize */

/* The Euclidean loop */
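The Euclidean loop indicated above can be sketched in C on word-sized operands; the book's version works on multiple-precision integers, and the iterative structure below (factoring out 2s with the b mod 8 rule, then swapping by reciprocity using the bits of a mod 4 and b mod 4) is the standard one:

```c
#include <stdint.h>

/* Jacobi symbol (a/b) for odd b > 0; for prime b this is the Legendre
   symbol computed by Algorithm 3.15.  The sign k is updated from the
   three least significant bits of b (the (2/b) rule) and from the
   second least significant bits of a and b (quadratic reciprocity). */
int jacobi(uint64_t a, uint64_t b)
{
    int k = 1;
    a %= b;
    while (a != 0) {
        while (!(a & 1)) {                 /* (2/b) = -1 iff b = +-3 (mod 8) */
            a >>= 1;
            if ((b & 7) == 3 || (b & 7) == 5) k = -k;
        }
        uint64_t t = a; a = b; b = t;      /* reciprocity: flip the sign     */
        if ((a & 3) == 3 && (b & 3) == 3) k = -k;  /* iff a = b = 3 (mod 4)  */
        a %= b;
    }
    return (b == 1) ? k : 0;               /* 0 when gcd(a, b) > 1 */
}
```

For example, (2/7) = +1 since 7 ≡ –1 (mod 8), while (3/7) = –1.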

Since ℤp* is cyclic and has order p – 1 = 2^v q, the 2-Sylow subgroup G of ℤp* has order 2^v and is also cyclic. Let g be a generator of G. By Euler’s criterion, a^q is a square in G and, therefore, a^q g^e = 1 (in G) for some even integer e, 0 ≤ e < 2^v, and x ≡ a^((q+1)/2) g^(e/2) (mod p) is a square root of a modulo p.

A generator g of G can be obtained by choosing random elements b from ℤp* and computing the Legendre symbol (b/p). It is easy to see that b^q ∈ G. Furthermore, b^q is a generator of G if and only if (b/p) = –1. Finding a quadratic non-residue in ℤp* is the probabilistic part of the algorithm. Since exactly half of the elements of ℤp* are quadratic non-residues, one expects to find one after a few random trials. In order to make the exponentiation b^q (mod p) efficient, b should be chosen as a single-precision integer. The while loop of the algorithm computes the multiplier g^(e/2) in x using O(v) iterations by successively locating the 1 bits of e starting from the least significant end.

To sum up, square roots modulo a prime can be computed in probabilistic polynomial time. Computing square roots modulo a composite integer n is, on the other hand, a very difficult problem, unless the complete factorization of n is known (see Section 4.2 and Exercise 3.29).

Exercise Set 3.4

3.19 Let n ∈ ℕ be odd and composite and suppose that there exists (at least) one a ∈ ℤn* with a^(n–1) ≢ 1 (mod n). Show that b^(n–1) ≢ 1 (mod n) for at least half of the bases b ∈ ℤn*. [H]

Algorithm 3.16. Modular square root

Input: An odd prime p and an integer a, 1 ≤ a < p.

Output: A square root of a modulo p (if existent).

Steps:

if { Returna does not have a square root modulo p”. }

if (p ≡ 3 (mod 4)) { Return (mod p). }

if (p ≡ 5 (mod 8))
   if  { Return  (mod p) }
   else { Return  (mod p). }

/* The case p ≡ 1 (mod 8) */
v := v_2(p – 1), q := (p – 1)/2^v.    /* q is odd */
Find a random quadratic non-residue b modulo p and set g := b^q (mod p).
x := a^((q+1)/2) (mod p).
Precompute a^(–1) (mod p).
while (1) {
   Find the smallest i ≥ 0 for which (x^2 a^(–1))^(2^i) ≡ 1 (mod p).
   if (i = 0) { Return x. }
   x := xg^(2^(v–i–1)) (mod p).
}
}
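Algorithm 3.16 can be sketched for single-precision p as follows. We deviate from the text in three labelled places: the p ≡ 5 (mod 8) case is folded into the general Tonelli–Shanks branch, the non-residue b is found by scanning 2, 3, 4, . . . instead of by random trials, and a^(–1) is obtained via Fermat's little theorem; the names are ours:

```c
#include <stdint.h>

typedef unsigned __int128 u128;

static uint64_t mulmod(uint64_t x, uint64_t y, uint64_t n)
{
    return (uint64_t)((u128)x * y % n);
}

static uint64_t powmod(uint64_t a, uint64_t e, uint64_t n)
{
    uint64_t x = 1 % n;
    for (a %= n; e; e >>= 1) {
        if (e & 1) x = mulmod(x, a, n);
        a = mulmod(a, a, n);
    }
    return x;
}

/* Square root of a modulo an odd prime p; returns 0 when a is a
   quadratic non-residue (or a = 0 (mod p)). */
uint64_t sqrtmod(uint64_t a, uint64_t p)
{
    a %= p;
    if (a == 0) return 0;
    if (powmod(a, (p - 1) / 2, p) != 1) return 0;   /* Euler's criterion */
    if (p % 4 == 3) return powmod(a, (p + 1) / 4, p);  /* explicit case  */

    /* Tonelli-Shanks for p = 1 (mod 4) */
    uint64_t q = p - 1;
    int v = 0;
    while (!(q & 1)) { q >>= 1; v++; }              /* p - 1 = 2^v q     */
    uint64_t b = 2;
    while (powmod(b, (p - 1) / 2, p) != p - 1) b++; /* non-residue       */
    uint64_t g = powmod(b, q, p);       /* generator of the 2-Sylow subgroup */
    uint64_t x = powmod(a, (q + 1) / 2, p);
    uint64_t ainv = powmod(a, p - 2, p);            /* a^(-1) by Fermat  */
    for (;;) {
        uint64_t t = mulmod(mulmod(x, x, p), ainv, p);
        int i = 0;
        while (t != 1) { t = mulmod(t, t, p); i++; }  /* smallest such i */
        if (i == 0) return x;
        x = mulmod(x, powmod(g, 1ULL << (v - i - 1), p), p);
    }
}
```

Each pass through the loop strictly reduces the order of x^2 a^(–1) in the 2-Sylow subgroup, which is why the loop needs only O(v) iterations.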

3.20 Let n ∈ ℕ be odd and composite.
  1. Show that there exists a ∈ ℤn* such that a^((n–1)/2) ≢ (a/n) (mod n). [H]

  2. Show that a^((n–1)/2) ≢ (a/n) (mod n) for at least half of the bases a ∈ ℤn*. [H]

3.21 Let n ∈ ℕ be a Carmichael number, that is, a composite integer for which a^(n–1) ≡ 1 (mod n) for all a coprime to n, that is, ord_n(a) | (n – 1) for all a ∈ ℤn*. Prove that:
  1. (p – 1)|(n – 1) for every prime divisor p of n. [H]

  2. n is odd. [H]

  3. n is square-free. [H]

  4. n has at least three distinct prime divisors.

3.22
  1. Let n ∈ ℕ be a square-free composite integer, such that (p – 1)|(n – 1) for every prime divisor p of n. Show that n is a Carmichael number.

  2. Demonstrate that 561 = 3 × 11 × 17; 2,821 = 7 × 13 × 31; and 172,081 = 7 × 13 × 31 × 61 are Carmichael numbers.

  3. Assume that for some k ∈ ℕ the integers p1 := 6k + 1, p2 := 12k + 1 and p3 := 18k + 1 are prime. Prove that p1p2p3 is a Carmichael number.

  4. Deduce that 1,729 = 7 × 13 × 19 and 294,409 = 37 × 73 × 109 are Carmichael numbers.

3.23

Fermat’s test for prime numbers Let n ∈ ℕ and let n – 1 = p1^(e1) · · · pr^(er), with distinct primes p1, . . . , pr, be the prime factorization of n – 1. Suppose that there exist integers a1, . . . , ar such that for each i we have a_i^(n–1) ≡ 1 (mod n) and a_i^((n–1)/p_i) ≢ 1 (mod n). Show that n is prime.

3.24

Pépin’s test for Fermat numbers Show that the Fermat number n := 2^(2^k) + 1 is prime if and only if 3^((n–1)/2) ≡ –1 (mod n).

3.25 Write an algorithm that, given natural numbers t, l with l < t, outputs a (probable) prime p of bit length t such that p – 1 has a (probable) prime divisor q of bit length l.
3.26 Let n ∈ ℕ.
  1. Show that the ring ℤn[X] is (canonically) isomorphic to the ring ℤ[X]/⟨n⟩. In view of this, we write f(X) ≡ g(X) (mod n) to mean either that the coefficients of f are congruent modulo n to the respective coefficients of g or that the polynomials f(X) and g(X) are congruent modulo the principal ideal of ℤ[X] generated by n.

  2. Prove that if n is a prime, then (X + a)^n ≡ X^n + a (mod n) for every a ∈ ℤ.

  3. Prove that for composite n there exists k ∈ ℕ, 1 < k < n, with the binomial coefficient C(n, k) ≢ 0 (mod n). Deduce that in this case (X + a)^n ≢ X^n + a (mod n) for some a ∈ ℤ.

  4. Let h(X) ∈ ℤ[X] and let h̄(X) be the canonical image of h(X) in ℤn[X]. Show that the ring ℤ[X]/⟨n, h(X)⟩ is isomorphic to the ring ℤn[X]/⟨h̄(X)⟩.

3.27 Modify Algorithm 3.15 to compute the (generalized) Jacobi symbol (a/b) for odd b ∈ ℕ and for arbitrary a ∈ ℤ.
3.28A Implement the Chinese remainder theorem for integers, that is, write an algorithm that takes as input pairwise relatively prime moduli n_i ∈ ℕ and integers a_i for i = 1, . . . , r and that outputs a ∈ ℤ with a ≡ a_i (mod n_i) for all i = 1, . . . , r. [H]
3.29 Let f(X) be a non-constant polynomial in ℤ[X].
  1. Let the congruence f(x) ≡ 0 (mod p^e), e ∈ ℕ, have a solution x ≡ a (mod p^e). Show that if an integer a′ := a + kp^e solves the congruence f(x) ≡ 0 (mod p^(e+1)), then k satisfies the congruence

    f′(a)k ≡ –f(a)/p^e (mod p).

    Here f(a)/p^e means integer division. Demonstrate that this congruence may have 0, 1 or p solutions (for k) depending on the values of f′(a) and f(a)/p^e. Each such k gives a solution a′ of f(x) ≡ 0 (mod p^(e+1)) with a′ ≡ a (mod p^e). We say that the solution a′ (modulo p^(e+1)) is obtained from the solution a (modulo p^e) by (Hensel) lifting.

  2. Lifting together with the Chinese remainder theorem allows us to reduce the problem of solving a polynomial congruence modulo an arbitrary modulus n ∈ ℕ to the problem of solving the same congruence modulo the prime divisors of n. More precisely, if the prime factorization of n and all the solutions of the congruences f(x) ≡ 0 (mod p_i) for all i = 1, . . . , r are given, design an algorithm to compute all the solutions of the congruence f(x) ≡ 0 (mod n).

3.30 Let n ∈ ℕ be odd and let a ∈ ℤn* be a quadratic residue modulo n. Deduce that the congruence x^2 ≡ a (mod n) has exactly 2^(ω(n)) solutions modulo n, where ω(n) denotes the number of distinct prime divisors of n.
3.31 Show that Algorithm 3.17 correctly computes ⌊√n⌋ for n ∈ ℕ. Specify a strategy to initialize a before the while loop. Determine how Algorithm 3.17 can be used to check if a given n ∈ ℕ is a perfect square. [H]
Algorithm 3.17. Integer square root

Input: n ∈ ℕ.

Output: ⌊√n⌋.

Steps:

Using bit operations, initialize a to an integral value x ≥ ⌊√n⌋.
while (1) {    /* Newton’s iteration loop */
   b := ⌊(a + ⌊n/a⌋)/2⌋.
   if (a ≤ b) { Return a. }
   a := b.
}
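Algorithm 3.17 can be rendered on 64-bit integers as follows; the initial value 2^⌈(bit length)/2⌉ is one easy way to satisfy a ≥ ⌊√n⌋ (cf. Exercise 3.31), and the name isqrt is ours:

```c
#include <stdint.h>

/* Newton's iteration of Algorithm 3.17 on 64-bit integers. */
uint64_t isqrt(uint64_t n)
{
    if (n < 2) return n;
    int bits = 0;                            /* bit length of n */
    for (uint64_t t = n; t; t >>= 1) bits++;
    uint64_t a = 1ULL << ((bits + 1) / 2);   /* a >= floor(sqrt(n)) */
    for (;;) {                               /* Newton's iteration loop */
        uint64_t b = (a + n / a) / 2;
        if (a <= b) return a;                /* a has reached floor(sqrt(n)) */
        a = b;
    }
}
```

Starting from above, each iterate strictly decreases until it reaches ⌊√n⌋, at which point the test a ≤ b fires and the loop returns.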

3.32
  1. Design an algorithm that, given n, k ∈ ℕ, computes ⌊n^(1/k)⌋. [H]

  2. Design an algorithm to check if a given n ∈ ℕ is an integral power of another integer.

3.5. Arithmetic in Finite Fields

Many cryptographic protocols are based on the (apparent) intractability of the discrete logarithm problem (Section 4.2) in the multiplicative group of a finite field F_q. The arithmetic of the finite fields F_p, p prime, and F_(2^n), n ∈ ℕ, is easy to implement and runs efficiently. In view of this, these two kinds of finite fields are most popular in cryptography and we concentrate our algorithmic study on these fields only.

A prime field F_p is the quotient ring ℤ/pℤ = ℤp. In Section 3.3.4, we have already made a thorough study of the arithmetic of the rings ℤn, n ∈ ℕ. We recall that the elements of ℤp are represented as integers from the set {0, 1, . . . , p – 1} and the arithmetic in ℤp is the modulo p integer arithmetic. Since p is typically multiple-precision, the characteristic p of F_p is odd. The fields of even characteristic that we will study are the non-prime fields F_(2^n).

Section 2.9.3 explains several representations of extension fields. The most common one is the polynomial-basis representation for an irreducible polynomial f(X) of degree n in F_2[X]. In that case, an element of F_(2^n) has the canonical representation as a polynomial a_0 + a_1 X + · · · + a_(n–1) X^(n–1), a_i ∈ F_2, of degree < n. An arithmetic operation on two elements of F_(2^n) is the same operation in F_2[X] followed by reduction modulo the defining polynomial f(X). So we start with the implementation of the polynomial arithmetic over F_2.

3.5.1. Arithmetic in the Ring F_2[X]

A polynomial over F_2 (or any field) is identified by its coefficients, of which only finitely many are non-zero. Thus for storing a polynomial g(X) = a_d X^d + a_(d–1) X^(d–1) + · · · + a_1 X + a_0 it is sufficient to store the finite ordered sequence a_d a_(d–1) . . . a_1 a_0. It is not necessary to demand a_d ≠ 0, but the shortest sequence representing a non-zero polynomial corresponds to a_d ≠ 0 and in this case deg g = d. On the other hand, as we see later, it is often useful to pad such a sequence with leading zero coefficients. As an example, the polynomial X^2 + 1 is representable as 101 or as 0101 or as 00101 or · · ·.

Since F_2 can be viewed as the set {0, 1} with operations modulo 2, a polynomial in F_2[X] is essentially a bit string unique up to insertion (and deletion) of leading zero bits. As in the case of multiple-precision integers, we pack these coefficients in an array of 32-bit words and maintain the number of coefficients belonging to the polynomial. For example, the polynomial g(X) = X^64 + X^31 + X^7 + 1 can be stored in an array w2w1w0 of three 32-bit words. w0 consists of the coefficients of X^0, X^1, . . . , X^31, w1 consists of the coefficients of X^32, X^33, . . . , X^63, and w2 consists of the coefficient of X^64. It is up to the implementation scheme to decide whether the coefficients are to be stored from left to right or from right to left in the bits of a word. We assume that less significant coefficients go to the less significant bits of a word. For the polynomial g above, the word w0 viewed as an unsigned integer will then be w0 = 2^31 + 2^7 + 1, whereas we have w1 = 0. The least significant bit of w2 would be 1. The remaining 31 bits of w2 are not important and can be assigned any value as long as we maintain the information that only the coefficients of X^i, 0 ≤ i ≤ 64, need to be considered. On the other hand, if we want to store the coefficients of g up to that of X^80, then the bits of w2 at locations 1, . . . , 16 must be zero, whereas those at locations 17, . . . , 31 may be of any value. We, however, always recommend the use of leading zero bits to fill the portion of the leading word not belonging to the polynomial.

Such a representation of elements of F_2[X], in addition to being compact, facilitates efficient implementation of arithmetic functions. As we will shortly see, we often need not extract the individual coefficients of a polynomial, but apply bit operations on entire words to process 32 coefficients simultaneously per operation. We usually do not need polynomials of degrees > 4096 for cryptographic applications. It is, therefore, sufficient to declare a static array capable of storing all the 8193 coefficients of a product of two such largest polynomials. The zero polynomial may be represented as one with zero word size, whereas the degree of the zero polynomial is taken to be –∞, which may be representable as –1.
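The packing convention just described can be made concrete in C; the helper names set_coeff/get_coeff and the fixed three-word array are ours, chosen to reproduce the example g(X) = X^64 + X^31 + X^7 + 1:

```c
#include <stdint.h>

enum { WORDS = 3 };   /* enough for coefficients of X^0 ... X^95 */

/* Less significant coefficients go to less significant bits of a word. */
static void set_coeff(uint32_t w[], int i)
{
    w[i / 32] |= (uint32_t)1 << (i % 32);
}

static int get_coeff(const uint32_t w[], int i)
{
    return (w[i / 32] >> (i % 32)) & 1;
}

/* Pack g(X) = X^64 + X^31 + X^7 + 1 into the word array w2 w1 w0,
   padding the unused leading bits of w2 with zeros as recommended. */
void pack_example(uint32_t w[WORDS])
{
    for (int i = 0; i < WORDS; i++) w[i] = 0;
    set_coeff(w, 64);
    set_coeff(w, 31);
    set_coeff(w, 7);
    set_coeff(w, 0);
}
```

After packing, w0 viewed as an unsigned integer is 2^31 + 2^7 + 1 and the least significant bit of w2 is 1, exactly as in the text's example.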

We now describe the arithmetic functions on two non-zero polynomials

Equation 3.3

a(X) = a_r X^r + · · · + a_1 X + a_0,    b(X) = b_s X^s + · · · + b_1 X + b_0.
Under our implementation, a and b demand ρ := ⌈(r + 1)/32⌉ and σ := ⌈(s + 1)/32⌉ machine words α_(ρ–1) . . . α1α0 and β_(σ–1) . . . β1β0, respectively. We also assume paddings with leading zero bits in the areas not belonging to the operands.

Note that the addition of F_2 is the same as the XOR (⊕) of two bits. Applying this bit operation on the words αi and βi adds 32 coefficients of the operand polynomials simultaneously (see Algorithm 3.18). Finally, note that –1 = 1 in any field of characteristic 2, that is, subtraction is the same as addition in such a field.

The product a(X)b(X) can be computed as in Algorithm 3.19. Once again, using word-wise operations yields a faster implementation. By AND and OR, we denote the bit-wise and and or operations on 32-bit words. The easy verification of the correctness of this algorithm is left to the reader. As in the case of addition, one might want to make the polynomial c compact after its words γ_(τ–1), . . . , γ0 are computed.

Algorithm 3.18. Polynomial addition

Input: a(X) and b(X), as in Equation (3.3).

Output: c(X) = a(X) + b(X) (to be stored in the array γ_(τ–1) . . . γ1γ0).

Steps:

τ := max(ρ, σ).
for (i = 0, . . . , min(ρ, σ) – 1) γi := αi ⊕ βi.
if (ρ > σ) for (i = σ, . . . , ρ – 1) γi := αi,
else if (ρ < σ) for (i = ρ, . . . , σ – 1) γi := βi.
while (τ > 0) and (γ_(τ–1) = 0) τ––.       /* Make c compact (optional) */

Algorithm 3.19. Polynomial multiplication

Input: a(X) and b(X), as in Equation (3.3).

Output: c(X) = a(X)b(X) (to be stored in the array γ_(τ–1) . . . γ1γ0).

Steps:

τ := ρ + σ.     /* The size of the product: ρ + σ words accommodate all product bits */
for (i = 0, . . . , τ – 1) γi := 0.     /* Initialize the product */

/* The quadratic multiplication loop */
for (k = 0, . . . , 31) {    /* For each bit position in a word */
   for (j = 0, . . . , σ – 1) {     /* For each word of b */
      if (βj AND 2^k) {     /* if the k-th bit of βj is 1 */
         for (i = 0, . . . , ρ – 1) {    /* For each word of a */
            set γ_(i+j) := γ_(i+j) ⊕ (αi ≪ k) and γ_(i+j+1) := γ_(i+j+1) ⊕ (αi ≫ (32 – k)).    /* the second XOR is vacuous for k = 0 */
         }
      }
   }
}
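The heart of Algorithm 3.19 is the shift-and-XOR update; collapsed to single 64-bit operands (so the word loops over i and j disappear and only the bit loop over b remains), it can be sketched as follows, with the name gf2_mul and the 128-bit product type being ours:

```c
#include <stdint.h>

typedef unsigned __int128 u128;

/* Carry-less multiplication of two polynomials over GF(2), each packed
   into a 64-bit word.  For every 1 bit of b(X) at position k, the
   partial product a(X) * X^k is added (XORed) in, with no carries. */
u128 gf2_mul(uint64_t a, uint64_t b)
{
    u128 c = 0;
    for (int k = 0; k < 64; k++)
        if ((b >> k) & 1)
            c ^= (u128)a << k;
    return c;
}
```

For example, (X + 1)(X + 1) = X^2 + 1 over F_2 because the cross terms cancel, which the carry-less product reproduces as 3 · 3 = 5.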

The square of a polynomial a(X) ∈ F_2[X] can be computed very easily using the fact that

a(X)^2 = (a_r X^r + · · · + a_1 X + a_0)^2 = a_r X^(2r) + · · · + a_1 X^2 + a_0.

This gives us a linear-time (in terms of r or ρ) algorithm instead of the quadratic general-purpose multiplication Algorithm 3.19. We leave the implementational details to the reader.
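Concretely, squaring just spreads the coefficient bits apart, inserting a zero between consecutive bits. For a 32-bit word of coefficients this spreading can even be done in a logarithmic number of mask-and-shift steps; this particular bit trick is a standard one, not taken from the text:

```c
#include <stdint.h>

/* Square a polynomial over GF(2) whose coefficients fill one 32-bit
   word: a(X)^2 has the same coefficients, moved from bit i to bit 2i.
   Each step doubles the gap between surviving bit groups. */
uint64_t gf2_sqr32(uint32_t a)
{
    uint64_t x = a;
    x = (x | (x << 16)) & 0x0000FFFF0000FFFFULL;
    x = (x | (x << 8))  & 0x00FF00FF00FF00FFULL;
    x = (x | (x << 4))  & 0x0F0F0F0F0F0F0F0FULL;
    x = (x | (x << 2))  & 0x3333333333333333ULL;
    x = (x | (x << 1))  & 0x5555555555555555ULL;
    return x;
}
```

For instance, (X + 1)^2 = X^2 + 1 appears as gf2_sqr32(3) = 5, and (X^2 + X + 1)^2 = X^4 + X^2 + 1 as gf2_sqr32(7) = 21.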

Division with remainder in F_2[X] is implemented in Algorithm 3.20. As before, we continue to work with the operands a(X) and b(X) as in Equation (3.3). But now we make the further assumptions that b_s = 1, so that β_(σ–1) ≠ 0, and that s ≤ r. When the Euclidean division loop of Algorithm 3.20 terminates, the array locations δ_(σ–1), . . . , δ1, δ0 contain the remainder. The arrays γ and δ may be made compact to discard the leading zero bits, if any.

Algorithm 3.20. Euclidean division of polynomials

Input: a(X), as in Equation (3.3).

Output: c(X) = a(X) quot b(X) (to be stored in the array γ_(τ–1) . . . γ1γ0) and d(X) = a(X) rem b(X) (to be stored in the array δ_(ρ–1) . . . δ1δ0).

Steps:

τ := ⌈(r – s + 1)/32⌉.    /* The size of the quotient */
for i = 0, . . . , τ – 1 { γi := 0 }    /* Initialize c(Xto 0 */

for i = 0, . . . , ρ – 1 { δi := αi }   /* Copy a(Xto d(X*/

/* Euclidean division loop */
for i = r, r – 1, . . . , s {
   if (the coefficient of Xi in d(Xis 1) {
       j := (i – s) quot 32, k := (i – s) rem 32.

       /* Set the coefficient of Xis of c(X*/
       γj := γj OR 2k.

       /* Update d(X) := d(X) – Xisb(X*/
       for l = 0, . . . , σ – 1 {
          δl + j := δl + j ⊕ (bl ≪ k).
          δl + j + 1 := δl + j + 1 ⊕ (bl ≫ (32 – k)).
       }
    }
}
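The Euclidean division loop may be sketched as follows; as in the earlier sketches, a Python integer encodes the coefficient bits, and the degree is read off via bit_length.

```python
def gf2_divmod(a, b):
    """Euclidean division in GF(2)[X], mirroring the division loop of
    Algorithm 3.20: for every set coefficient X^i of the running
    remainder, i going down from deg a to deg b, set bit i - deg b of
    the quotient and XOR b shifted by i - deg b into the remainder."""
    q, r = 0, a
    db = b.bit_length() - 1            # deg b
    while r and r.bit_length() - 1 >= db:
        shift = (r.bit_length() - 1) - db
        q |= 1 << shift
        r ^= b << shift
    return q, r
```

For instance, dividing X^5 + X^2 + 1 by X^2 + X gives quotient X^3 + X^2 + X and remainder 1.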

Computing modular inverses requires computation of extended gcds of polynomials in F2[X]. We again start with the non-zero polynomials a(X) and b(X), and compute polynomials d(X), u(X) and v(X) in F2[X] with d(X) = gcd(a(X), b(X)) = u(X)a(X) + v(X)b(X), deg u < deg b and deg v < deg a. For polynomials, we do not have an equivalent of the binary gcd algorithm (Algorithm 3.8). We use repeated Euclidean divisions instead.

The proof of correctness of Algorithm 3.21 is similar to that for Algorithm 3.8. Here, we introduce the variables rk, Uk and Vk for k = 0, 1, 2, . . . . The initialization goes as: r0 := a, r1 := b, U0 := 1, U1 := 0, V0 := 0 and V1 := 1. During the k-th iteration (k = 1, 2, . . .), we first use Euclidean division to get rk–1 = qkrk + rk+1, which gives rk+1 = rk–1 – qkrk. We also compute Uk+1 = Uk–1 – qkUk and Vk+1 = Vk–1 – qkVk using the values available from the previous two iterations, so as to maintain the relation rk+1 = Uk+1r0 + Vk+1r1 for all k = 1, 2, . . . . In Algorithm 3.21, the k-th iteration of the while loop begins with x = rk–1, y = rk, u1 = Uk and u2 = Uk–1 and ends after updating the values to x = rk, y = rk+1, u1 = Uk+1 and u2 = Uk. It is not necessary to maintain the values Vk in the main loop. After the loop terminates, one computes Vk = (rk – Ukr0)/r1.

Modular arithmetic in F2[X] is very similar to modular arithmetic in Z. If f(X) is a non-constant polynomial of F2[X] of degree n (not necessarily irreducible), we represent elements of F2[X]/⟨f(X)⟩ as polynomials in F2[X] of degrees < n. Given two such polynomials a and b, we compute the sum a + b simply as the sum in F2[X]. The product ab is computed by first computing the product ab in F2[X] and then computing the remainder of Euclidean division of this product by f. The inverse of a modulo f exists if and only if gcd(a, f) = 1 (in F2[X]). In that case, extended gcd computation gives us polynomials u, v such that 1 = ua + vf, so that ua ≡ 1 (mod f). If a ≠ 0, then Algorithm 3.21 computes u with deg u < deg f = n, so that we take this u to be the canonical representative of a^(–1). Finally, the modular exponentiation a^e (mod f) can be computed using an algorithm very similar to Algorithm 3.9 or Algorithm 3.10. We leave the details to the reader.
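The modular exponentiation left to the reader may be sketched as a square-and-multiply in the style of Algorithm 3.9; the helpers pmul and pmod below are minimal Python stand-ins for the multiplication and division routines of this section.

```python
def pmul(a, b):
    """Carry-less product of two GF(2)[X] polynomials encoded as ints."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def pmod(a, f):
    """Remainder of a modulo f in GF(2)[X]."""
    df = f.bit_length() - 1
    while a and a.bit_length() - 1 >= df:
        a ^= f << ((a.bit_length() - 1) - df)
    return a

def ppowmod(a, e, f):
    """a^e rem f by repeated squaring, keeping everything reduced mod f."""
    r, a = 1, pmod(a, f)
    while e:
        if e & 1:
            r = pmod(pmul(r, a), f)
        a = pmod(pmul(a, a), f)
        e >>= 1
    return r
```

As a check, in F2[X]/⟨X^3 + X + 1⟩ (a representation of F8) the element X satisfies X^7 ≡ 1.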

Algorithm 3.21. Extended gcd of polynomials

Input: Non-zero polynomials a, b ∈ F2[X].

Output: Polynomials d, u, v satisfying

d = gcd(a, b) = ua + vb, deg u < deg b, deg v < deg a.

Steps:

/* Initialize */
x := a, y := b, u1 := 0, u2 := 1.

/* Repeated Euclidean division */
while (y ≠ 0) {
   Simultaneously compute q := x quot y and r := x rem y (Algorithm 3.20).
   u := u2 – qu1, u2 := u1, u1 := u,
   x := y, y := r.
}
d := x, u := u2, v := (d – ua)/b.
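Algorithm 3.21 can be sketched as follows; Python integers encode the coefficient bits, and over F2 both addition and subtraction are XOR.

```python
def pmul(a, b):
    """Carry-less product in GF(2)[X]."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def pdivmod(a, b):
    """Euclidean division in GF(2)[X] (quotient, remainder)."""
    q, r = 0, a
    db = b.bit_length() - 1
    while r and r.bit_length() - 1 >= db:
        s = (r.bit_length() - 1) - db
        q |= 1 << s
        r ^= b << s
    return q, r

def pgcdext(a, b):
    """Extended gcd by repeated Euclidean division: returns (d, u, v)
    with d = gcd(a, b) = u*a + v*b.  u1, u2 track the coefficients U_k,
    U_{k-1} of the text, initialized from U_0 = 1, U_1 = 0."""
    x, y = a, b
    u1, u2 = 0, 1
    while y:
        q, r = pdivmod(x, y)
        u1, u2 = u2 ^ pmul(q, u1), u1
        x, y = y, r
    d, u = x, u2
    v = pdivmod(d ^ pmul(u, a), b)[0]     # v = (d - u*a)/b
    return d, u, v
```

When gcd(a, b) = 1, the coefficient u is the canonical representative of a^(–1) modulo b; for instance, the inverse of X modulo X^3 + X + 1 comes out as X^2 + 1.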

3.5.2. Finite Fields of Characteristic 2

For the polynomial-basis representation of F2^n, we need an irreducible polynomial of degree n. We shortly present a probabilistic algorithm that generates a random monic irreducible polynomial in Fq[X] of a given degree n. Although we are interested only in the case q = 2, this algorithm works for any prime or prime power q.

First, we describe a deterministic polynomial-time algorithm for checking the irreducibility of a non-constant polynomial f ∈ Fq[X] of degree n. If f is reducible, it has a factor of degree i ≤ ⌊n/2⌋. Also recall (Theorem 2.40, p 82) that X^(q^i) – X is the product of all monic irreducible polynomials of Fq[X] of degrees dividing i. Therefore, if f has an irreducible factor of degree i, then gcd(f, X^(q^i) – X) = gcd(f, (X^(q^i) – X) rem f) is a non-constant polynomial. Algorithm 3.22 employs these simple observations.

Now, recall from Section 2.9.2 that a random monic polynomial of Fq[X] of degree n is irreducible with probability approximately 1/n. Therefore, if we keep checking random monic polynomials in Fq[X] of degree n for irreducibility, then after O(n) checks we expect to find an irreducible polynomial. This leads to the Las Vegas probabilistic Algorithm 3.23.

Algorithm 3.22. Check for irreducibility of a polynomial

Input: A non-constant polynomial f ∈ Fq[X].

Output: A (deterministic) certificate whether f is irreducible or not.

Steps:

n := deg f, g := X.
for i = 1, . . . , ⌊n/2⌋ {
   g := g^q (mod f).   /* Here g = X^(q^i) rem f */
   if (deg(gcd(f, g – X)) > 0) { Return “f is reducible”. }
}
Return “f is irreducible”.

Algorithm 3.23. Generation of a random irreducible polynomial

Input: The field Fq and an integer n ≥ 2.

Output: A random monic irreducible polynomial of degree n.

Steps:

while (1) {
   f := a random monic polynomial in Fq[X] of degree n.
   if (f is irreducible) { Return f. }    /* Algorithm 3.22 */
}
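For q = 2, Algorithms 3.22 and 3.23 specialize to the following sketch. Forcing the constant term of the random candidate to 1 is a small shortcut of ours (otherwise X divides f); the helpers are the bit-level GF(2)[X] routines of Section 3.5.1.

```python
import random

def pmul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def pmod(a, f):
    df = f.bit_length() - 1
    while a and a.bit_length() - 1 >= df:
        a ^= f << ((a.bit_length() - 1) - df)
    return a

def pgcd(a, b):
    while b:
        a, b = b, pmod(a, b)
    return a

def is_irreducible(f):
    """Algorithm 3.22 for q = 2: f is irreducible iff
    gcd(f, X^(2^i) - X) is constant for i = 1, ..., deg(f)//2."""
    n = f.bit_length() - 1
    g = 0b10                          # g = X
    for _ in range(n // 2):
        g = pmod(pmul(g, g), f)       # g = X^(2^i) rem f
        if pgcd(f, g ^ 0b10).bit_length() - 1 > 0:
            return False
    return True

def random_irreducible(n):
    """Algorithm 3.23: sample random monic degree-n polynomials with
    constant term 1 until one passes the irreducibility check."""
    while True:
        f = (1 << n) | (random.getrandbits(n - 1) << 1) | 1
        if is_irreducible(f):
            return f
```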

Once the defining irreducible polynomial f is available, we carry out the arithmetic in F2^n as modular polynomial arithmetic with respect to the modulus f, as described at the end of Section 3.5.1. Since this modular arithmetic involves taking the remainder of Euclidean division by f, it is sometimes expedient to choose f to be an irreducible polynomial of a certain special type. The randomized algorithm described above gives a random monic irreducible polynomial f of degree n having on average ≈ n/2 non-zero coefficients. The division algorithm (Algorithm 3.20) in that case takes time O(n^2). On the other hand, if f is a sparse polynomial (like a trinomial), the Euclidean division loop can be rewritten to exploit this sparsity, thereby bringing down the running time of the division procedure to O(n). (See Exercise 3.34. Also see Exercise 3.38 for computing isomorphisms between different polynomial-basis representations of the same field.)

Let p be a prime and let . We have seen how to implement arithmetic in and hence, by Exercise 3.35, that in too. If is an irreducible polynomial of degree n and if q = p^n, then and we implement the arithmetic of as the polynomial arithmetic of modulo f. Again by Exercise 3.35, this gives us the arithmetic of . Now, for and a monic irreducible polynomial we have a representation . Instead of having such a two-way representation of we may also represent as , where is a monic irreducible polynomial of degree nm. It usually turns out that the second representation of is more efficient. However, there are some situations where the two-way representation performs better. This is, in particular, the case when the arithmetic of can be made more efficient than the modular polynomial arithmetic of . For example, we might precompute tables of arithmetic operations of and use table lookups for performing the coefficient arithmetic of . This demands O(q^2) storage and is feasible only when q is small. On the other hand, if we find a primitive element γ of and precompute a table that maps i ↦ γ^i and another that maps γ^i ↦ i, then products in can be computed in time O(1) using table lookups. If, in addition, we store the Zech’s logarithm table (Section 2.9.3) for , then addition in can also be performed in O(1) time with table lookup. All these tables take O(q) memory, which (though better than the O(q^2) storage of the previous scheme) is feasible only for small q.

3.5.3. Selecting Suitable Finite Fields

Not all finite fields are suitable for cryptographic applications. In this section, we discuss the desirable properties of a field Fq on which secure protocols can be developed. We first note that such protocols are usually based on the apparent intractability of the so-called discrete logarithm problem (DLP) (Section 4.2). As a result, the selection of suitable fields is dictated by the known cryptanalytic algorithms for solving the DLP (see Section 4.4). We shall mostly concentrate on Fq with either q = p a prime or q = 2^n for some n ∈ N. By the bit size of q, denoted |q|, we mean the number of bits in the binary representation of q, that is, |q| = ⌈lg q⌉. As we have seen, each element of Fq is representable using O(|q|) bits and, therefore, |q| is often also called the size of Fq.

The first requirement on a cryptographically suitable field is that the size |q| should be sufficiently large. Recent cryptanalytic studies show that sizes |q| ≤ 512 are not secure enough. Sizes |q| ≥ 768 are recommended for secure applications. For long-term security, one might even require |q| ≥ 2048.

Not every field of the recommended size is, however, adequately secure. The cardinality #Fq = q must be such that q – 1 has at least one large prime divisor q′ (see the Pohlig–Hellman method in Section 4.4). By large, we usually mean |q′| ≥ 160. In addition, this prime factor q′ of q – 1 should be known to us. If q = p is a prime, then a safe prime or a strong prime serves our purpose (Definition 3.5, Algorithm 3.14). Also see Exercise 3.25. On the other hand, if q = 2^n, the only way to obtain q′ is by factoring the Mersenne number Mn := q – 1 = 2^n – 1. Factoring Mn for n ≥ 768 is a very difficult task. Luckily, extensive tables of complete or partial factorizations of Mn are available. For example, for n = 769 (a prime number), we have

M769 = 2^769 – 1 = 1,591,805,393 × 6,123,566,623,856,435,977,170,641 × q′,

where q′ is a 657-bit prime. These tables should be consulted for choosing a suitable value of n.

The multiplicative group Fq* is cyclic (Theorem 2.38). If the complete integer factorization of q – 1 is known, then it is possible to find, in polynomial time (in |q|), a primitive element of Fq*. Algorithm 3.24 performs r = O(lg m) exponentiations in G in order to conclude whether a given element is a generator of G. For G = Fq*, we have polynomial-time exponentiation algorithms, so Algorithm 3.24 runs in deterministic polynomial time. By Exercise 2.47, the probability of a randomly chosen element of G being primitive is φ(m)/m. In view of the lower bound on φ(m)/m given in Theorem 3.1, proved by Rosser and Schoenfeld [253], Algorithm 3.25 is expected to return a random primitive element of G after O(ln ln m) iterations.

Theorem 3.1.

Let m ∈ N, m ≥ 5. Then φ(m)/m ≥ 1/(6 ln ln m).

Algorithm 3.24. Check for primitive element

Input: A cyclic group G of cardinality #G = m with known prime factorization m = p1^e1 · · · pr^er, and an element a ∈ G.

Output: A deterministic certificate stating whether a is a generator of G.

Steps:

/* We assume that G is multiplicatively written and has the identity e */
for i = 1, . . . , r {
   if (a^(m/pi) = e) { Return “a is not a generator of G”. }
}
Return “a is a generator of G”.
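For the common case G = (Z/pZ)* with p prime (so m = p – 1), Algorithm 3.24 amounts to the following sketch:

```python
def is_generator(a, p, prime_factors):
    """Algorithm 3.24 specialized to G = (Z/pZ)* with #G = m = p - 1:
    a generates G iff a^(m/l) != 1 (mod p) for every distinct prime l
    dividing m.  prime_factors must list those primes."""
    m = p - 1
    return all(pow(a, m // l, p) != 1 for l in prime_factors)
```

Algorithm 3.25 then simply samples random elements of G until this check succeeds. For p = 13 (m = 12 = 2^2 · 3), the generators found this way are 2, 6, 7 and 11.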

Algorithm 3.25. Computation of a generator of a finite cyclic group

Input: A cyclic group G of cardinality #G = m with known prime factorization m = p1^e1 · · · pr^er.

Output: A generator g of G.

Steps:

while (1) {
    g := a random element of G.
    if (g is a generator of G) /* Algorithm 3.24 */ { Return g. }
}

If, however, the factorization of #G = m is not known, there are no known (deterministic or probabilistic) algorithms for finding a random generator of G or even for checking if a given element of G is primitive. This is indeed one of the intractable problems of computational algebraic number theory. For G = Fq*, this problem can be bypassed as follows.

Recall that we have chosen q in such a way that q – 1 has a large known prime factor q′. Let H be the unique subgroup of G of order q′. Then H is also cyclic and we choose to work in H (using the arithmetic of G). It turns out that if q′ ≥ 2^160 and if H is not contained in a proper subfield of Fq, the security of cryptographic protocols over Fq does not degrade too much by the use of H (instead of the full G) as the ground group. But we now face a new problem, namely, the problem of finding a generator of H. Since #H = q′ is a prime, every element of H \ {1} is a generator of H. So the problem essentially reduces to that of finding any non-identity element of H. This latter problem has a simple probabilistic solution. First of all, if q – 1 = q′ is itself prime, choosing any random non-identity element of G will do. So assume q′ < q – 1. Choose a random a ∈ G and let b := a^((q – 1)/q′). By Lagrange’s theorem (Theorem 2.2, p 24), b^q′ = a^(q–1) = 1 and, therefore, by Proposition 2.5, b ∈ H. Now, Fq being a field, the polynomial X^((q–1)/q′) – 1 can have at most (q – 1)/q′ roots in Fq (that is, in G), and hence the probability that b = 1 is ≤ ((q – 1)/q′)/(q – 1) = 1/q′. This justifies the randomized polynomial running time of the Las Vegas Algorithm 3.26. Indeed, if q′ ≥ 2^160, the while loop of the algorithm is almost always executed only once.

Algorithm 3.26. Computation of an element of given order

Input: A finite field Fq and an (odd) prime factor q′ of q – 1 with q′ < q – 1.

Output: An element of Fq* of multiplicative order q′.

Steps:

while (1) {
   a := a random element of Fq \ {0, ±1}.
   b := a^((q – 1)/q′).
   if (b ≠ 1) { Return b. }
}
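For a prime field (q = p a prime, so that the group arithmetic is ordinary modular arithmetic), Algorithm 3.26 can be sketched as follows:

```python
import random

def element_of_order(q, qp):
    """Sketch of Algorithm 3.26 for prime q: qp is a known prime factor
    of q - 1 with qp < q - 1.  Returns b != 1 with b^qp = 1, i.e. a
    generator of the unique subgroup H of order qp."""
    while True:
        a = random.randrange(2, q - 1)       # avoid 0 and +-1
        b = pow(a, (q - 1) // qp, q)
        if b != 1:
            return b
```

Each trial fails with probability at most 1/qp, so the loop almost always runs exactly once.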

3.5.4. Factoring Polynomials over Finite Fields

Polynomial factorization over finite fields is an interesting computational problem. All deterministic algorithms known for this purpose are quite poor, that is, fully exponential in the size of the field. However, if randomization is allowed, we have reasonably efficient (polynomial-time) algorithms. In this section, we outline the basic working of the modern probabilistic algorithms for polynomial factorization over finite fields. We assume that a non-constant polynomial f ∈ Fq[X] is to be factored. Without loss of generality, we can take f to be monic. We assume further that the arithmetic of Fq and that of Fq[X] is available. We work with a general value of q = p^n, p prime and n ∈ N, though in some cases we have to treat the case p = 2 separately. Irreducibility (or otherwise) in this section always means irreducibility over Fq.

The factorization algorithm we are going to discuss is a generalization of the root finding algorithm (see Exercise 3.36) and consists of three steps:

Square-free factorization (SFF) Decompose the input polynomial f into a product of square-free polynomials.

Distinct-degree factorization (DDF) Given a square-free polynomial f of degree d, compute f = f1 . . . fd with each fi being a product of irreducible polynomials of degree i.

Equal-degree factorization (EDF) Given a product f of irreducible polynomials of the same degree, find out the irreducible factors of f.

We now provide a separate detailed discussion for each of these three steps.

Square-free factorization

Theorem 3.2 is at the very heart of the square-free factorization algorithm and is a generalization of Exercise 2.61.

Theorem 3.2.

Let K be a field and f ∈ K[X] a non-constant monic polynomial. Then the polynomial f / gcd(f, f′) is square-free, where f′ is the formal derivative of f. In particular, f is square-free if and only if gcd(f, f′) = 1.

Proof

Let f = f1^α1 f2^α2 · · · fr^αr be the factorization of f into pairwise distinct monic irreducible polynomials f1, . . . , fr with exponents αi ≥ 1. In order to determine vf1(f′), we employ the usual rules for derivatives to get f′ = α1f1^(α1–1)f1′u + f1^α1 w for some u, w ∈ K[X] with f1 ∤ u. If char K divides α1, then vf1(f′) ≥ α1. Otherwise, vf1(f′) = α1 – 1, since f1 divides neither f1′ nor the fi, i > 1. Similar is the case for vfi(f′) for i = 2, . . . , r. It follows that gcd(f, f′) = f1^β1 · · · fr^βr, where each βi ∈ {αi – 1, αi}, so that f/gcd(f, f′), being a product of distinct irreducible polynomials fi, is square-free.

The algorithm for SFF over Fq is now almost immediate, except for one subtlety, namely, the consideration of the case f/gcd(f, f′) = 1, or equivalently, f′ = 0. In order to see when this case can occur, let us write the non-zero terms of f as f = a1X^e1 + · · · + atX^et with distinct exponents e1, . . . , et and non-zero coefficients ai. Then f′ = a1e1X^(e1 – 1) + · · · + atetX^(et – 1) = 0 if and only if ei ≡ 0 (mod p) for all i, that is, if p divides all of e1, . . . , et. But then f(X) = h(X)^p for some h ∈ Fq[X], since every ai ∈ Fq is itself a p-th power in Fq. These observations motivate the recursive Algorithm 3.27. It is easy to check that this (deterministic) algorithm runs in time polynomially bounded by deg f and log q.

Algorithm 3.27. Square-free factorization

Input: A monic non-constant polynomial f ∈ Fq[X], q = p^n, p prime, n ∈ N.

Output: A square-free factorization of f.

Steps:

Compute f′.
if (f′ = 0) {
    Compute h ∈ Fq[X] such that f = h^p.
    Recursively compute a SFF h = h1 · · · hs of h.
    Return the SFF of f as f = (h1 · · · hs)(h1 · · · hs) · · · (h1 · · · hs)    (p times).
} else {
    Recursively compute a SFF gcd(f, f′) = g1 · · · gs of gcd(f, f′).
    Return the SFF of f as f = (f/gcd(f, f′))g1 · · · gs.
}
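The central step, the square-free part f/gcd(f, f′) of Theorem 3.2, can be computed as in the following sketch for q = 2 (the recursion of Algorithm 3.27 is omitted; we only signal the case f′ = 0, in which f is the square h^2 of a smaller polynomial):

```python
def pdivmod(a, b):
    """Euclidean division in GF(2)[X] (ints encode coefficient bits)."""
    q, r = 0, a
    db = b.bit_length() - 1
    while r and r.bit_length() - 1 >= db:
        s = (r.bit_length() - 1) - db
        q |= 1 << s
        r ^= b << s
    return q, r

def pgcd(a, b):
    while b:
        a, b = b, pdivmod(a, b)[1]
    return a

def pderiv(a):
    """Formal derivative in F2[X]: only odd-degree coefficients survive,
    each shifted down one position."""
    r = 0
    for i in range(1, a.bit_length(), 2):
        if (a >> i) & 1:
            r |= 1 << (i - 1)
    return r

def squarefree_part(f):
    """f / gcd(f, f') as in Theorem 3.2.  If f' = 0 then (for p = 2)
    f = h^2, the case Algorithm 3.27 handles by recursing on h."""
    d = pderiv(f)
    if d == 0:
        return None          # caller should extract the square root h
    return pdivmod(f, pgcd(f, d))[0]
```

For example, f = (X^2 + X + 1)^2 (X + 1) has square-free part X + 1, and (X^2 + X + 1)^2 itself has vanishing derivative.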

Distinct-degree factorization

Let f ∈ Fq[X] be a square-free polynomial of degree d. We can write f = f1 · · · fd, where for each i the polynomial fi is the product of all the irreducible factors of f of degree i. If f does not have an irreducible factor of degree i, then we take fi = 1 as usual.[5] In order to compute the polynomials fi, we make use of the fact that X^(q^i) – X is the product of all monic irreducible polynomials in Fq[X] whose degrees divide i (see Theorem 2.40 on p 82). It immediately follows that fi = gcd(X^(q^i) – X, f/(f1 · · · fi–1)). Thus a few (at most d) gcd computations give us all the fi. The polynomials X^(q^i) – X are of rather large degrees, but since gcd(X^(q^i) – X, g) = gcd((X^(q^i) – X) rem f, g) for every factor g of f, keeping polynomials reduced modulo f implies that we take gcds of polynomials of degrees ≤ d. This, in turn, implies that the DDF can be performed in (deterministic) polynomial time (in d and ln q).

[5] Conventionally, an empty product is taken to be the multiplicative identity and an empty sum to be the additive identity.

Algorithm 3.28 shows an implementation of the DDF. Though the algorithm does not require f to be monic, there is no harm in assuming so.

Algorithm 3.28. Distinct-degree factorization

Input: A (non-constant) square-free polynomial .

Output: The DDF of f, that is, the polynomials f1, . . . , fd as explained above.

Steps:

g := f.   /* Make a local copy of f */
h := X, i := 1.
while (deg g ≠ 0) {
   h := h^q (mod f).   /* Modular exponentiation */
   fi := gcd(h – X, g).
   g := g/fi.    /* Factor out fi from g */
   i++.
}
if (i ≤ d) { fi := 1, . . . , fd := 1. }
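Algorithm 3.28 for q = 2 may be sketched as follows; only the non-trivial parts fi are recorded, and subtraction of X is again an XOR.

```python
def pmul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def pdivmod(a, b):
    q, r = 0, a
    db = b.bit_length() - 1
    while r and r.bit_length() - 1 >= db:
        s = (r.bit_length() - 1) - db
        q |= 1 << s
        r ^= b << s
    return q, r

def pgcd(a, b):
    while b:
        a, b = b, pdivmod(a, b)[1]
    return a

def ddf(f):
    """DDF of a square-free f in F2[X] (ints encode coefficient bits):
    returns the list of pairs (i, f_i) with f_i non-constant."""
    parts = []
    g, h, i = f, 0b10, 1                   # h = X
    while g.bit_length() - 1 > 0:
        h = pdivmod(pmul(h, h), f)[1]      # h = X^(2^i) rem f
        fi = pgcd(h ^ 0b10, g)             # gcd(h - X, g)
        if fi.bit_length() - 1 > 0:
            parts.append((i, fi))
            g = pdivmod(g, fi)[0]          # factor f_i out of g
        i += 1
    return parts
```

For instance, X^3 + 1 = (X + 1)(X^2 + X + 1) decomposes into a degree-1 part and a degree-2 part.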

This simple-minded implementation of the DDF is not the theoretically most efficient one known. In fact, it turns out that the DDF (and not the seemingly more complicated EDF) is the bottleneck of the entire polynomial factorization process. Therefore, making the DDF more efficient is important, and many improvements have been suggested in the literature. All these improved algorithms essentially do the same thing as above (that is, the computation of the gcds gcd(X^(q^i) – X, g)), but they optimize the computation of the polynomials X^(q^i) rem f. The best-known method (due to Kaltofen and Shoup) is based on the observation that, in general, most of the fi are 1. Therefore, instead of computing each gcd individually, one may break the interval 1, . . . , d into several subintervals I1, I2, . . . , Il and compute one gcd per subinterval Ij, j = 1, . . . , l. Only those gcds that turn out to be non-constant are further decomposed.

For cryptographic purposes, we will, however, deal with rather small values of d = deg f. (Typically d is at most a few thousand.) The asymptotically better algorithms usually do not outperform the simple Algorithm 3.28 for these values of d.

Equal-degree factorization

Equal-degree factorization, the last step of the polynomial factorization process, is the only probabilistic part of the algorithm. We may assume that f is a (monic) square-free polynomial of degree d and that each irreducible factor of f has the same (known) degree, say δ. If d = δ, then f is irreducible. So we assume that d > δ, that is, d = rδ for some integer r ≥ 2. Theorem 3.3 provides the basic foundations for the EDF.

Theorem 3.3.

Let g be any polynomial in Fq[X] and let δ ∈ N. Then X^(q^δ) – X divides g^(q^δ) – g.

Proof

If g = 0, there is nothing to prove. If g = alX^l + · · · + a1X + a0 ≠ 0 with ai ∈ Fq, then g^(q^δ) – g = al(X^(lq^δ) – X^l) + · · · + a1(X^(q^δ) – X). It is easy to verify that X^(q^δ) – X divides X^(iq^δ) – X^i for every i ≥ 1.

Now, we have to separate two cases, namely, q odd and q even. Theorem 3.3 is valid for any q, even or odd, but taking q odd allows us to write g^(q^δ) – g = g(g^((q^δ–1)/2) – 1)(g^((q^δ–1)/2) + 1). With the above assumptions on f, we have f | (X^(q^δ) – X) and, therefore, f | (g^(q^δ) – g), so that f = gcd(g^(q^δ) – g, f) = gcd(g, f) gcd(g^((q^δ–1)/2) – 1, f) gcd(g^((q^δ–1)/2) + 1, f). If g is randomly chosen, then gcd(g^((q^δ–1)/2) – 1, f) is a non-trivial factor of f with probability ≈ 1/2. The idea is, therefore, to keep choosing random g and computing g1 := gcd(g^((q^δ–1)/2) – 1, f) until one gets 1 ≤ deg g1 < deg f. One then recursively applies the algorithm to g1 and f/g1. It is sufficient to choose g with deg g < 2δ. Obviously, the exponentiation g^(q^δ) has to be carried out modulo f. We leave the details to the reader, but note that trying O(1) random polynomials g is expected to split f and, therefore, the EDF runs in expected polynomial time.

For the case q = 2^n, essentially the same algorithm works, but we have to use the split g^(2^(nδ)) + g = (g + g^2 + g^4 + · · · + g^(2^(nδ–1)))(g + g^2 + g^4 + · · · + g^(2^(nδ–1)) + 1). Once again, computing gcd(g + g^2 + g^4 + · · · + g^(2^(nδ–1)), f) for a random g splits f with probability ≈ 1/2 and, thus, we get an EDF algorithm that runs in expected polynomial time.
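The splitting step for odd q can be sketched as follows. The toy field F7 and δ = 1 are our own choices, purely for illustration; polynomials are little-endian coefficient lists.

```python
import random

P = 7                       # a small odd prime; the field is F_P

def trim(f):
    while f and f[-1] == 0:
        f.pop()
    return f

def pmulm(f, g):
    r = [0] * (len(f) + len(g) - 1) if f and g else []
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            r[i + j] = (r[i + j] + a * b) % P
    return trim(r)

def pmodm(f, g):
    f = trim(f[:])
    dg, inv = len(g) - 1, pow(g[-1], -1, P)
    while f and len(f) - 1 >= dg:
        c, s = f[-1] * inv % P, (len(f) - 1) - dg
        for i, b in enumerate(g):
            f[s + i] = (f[s + i] - c * b) % P
        f = trim(f)
    return f

def pgcdm(f, g):
    while g:
        f, g = g, pmodm(f, g)
    inv = pow(f[-1], -1, P)
    return [c * inv % P for c in f]          # normalize monic

def ppowm(g, e, f):
    r, g = [1], pmodm(g, f)
    while e:
        if e & 1:
            r = pmodm(pmulm(r, g), f)
        g = pmodm(pmulm(g, g), f)
        e >>= 1
    return r

def edf_split(f, delta=1):
    """One splitting step of the EDF for odd q = P: for a random g of
    degree < 2*delta, gcd(g^((q^delta - 1)/2) - 1, f) is a proper
    factor of f with probability about 1/2, so we retry until it is."""
    e = (P ** delta - 1) // 2
    while True:
        g = trim([random.randrange(P) for _ in range(2 * delta)])
        if not g:
            continue
        h = ppowm(g, e, f)
        h = trim([(h[0] - 1) % P] + h[1:]) if h else [P - 1]
        d = pgcdm(h, f)
        if 0 < len(d) - 1 < len(f) - 1:
            return d
```

On f = (X – 1)(X – 2) over F7 this returns one of the two monic linear factors.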

Exercise Set 3.5

3.33 Find a (polynomial-basis) representation of . Compute a primitive element in this representation.
3.34
  1. Show that the running time of Algorithm 3.20 is O(s(r – s)), which reaches the maximum order of O(r^2) = O(s^2) when s ≈ r/2.

  2. Suppose b is known to have e non-zero coefficients. Modify the Euclidean division loop of Algorithm 3.20 so that the algorithm runs in time O((r – s)e). [H] In particular, if e = O(1), the running time of Algorithm 3.20 becomes linear, namely O(r).

3.35 Implement the polynomial arithmetic of given that of .
3.36 Let q = p^n (p prime and n ∈ N), f ∈ Fq[X] a non-constant polynomial and let g := gcd(f, X^q – X).
  1. If S is the set of all roots of f in Fq, show that g = ∏a∈S (X – a). Thus, g is a square-free polynomial which splits over Fq and has the same roots (over Fq) as f. If deg g = 0 or 1, then we know all the roots of g and hence of f. So, for the rest of this exercise, we assume that deg g ≥ 2.

  2. Consider the case that p is odd. Let b ∈ Fq be arbitrary. Show that

    (X + b)((X + b)^((q–1)/2) – 1)((X + b)^((q–1)/2) + 1) = X^q – X

    and that

    g = gcd(g, X + b) gcd(g, (X + b)^((q–1)/2) – 1) gcd(g, (X + b)^((q–1)/2) + 1).

    Explain how Algorithm 3.29 produces two non-trivial factors of g (over Fq) in probabilistic polynomial time. [H] Write an algorithm to compute all the roots of f in Fq.

    Algorithm 3.29. Computing roots of a polynomial: odd characteristic

    Input: A square-free polynomial g ∈ Fq[X] that splits over Fq.

    Output: Polynomials g1, g2 with g = g1g2 and deg gi ≥ 1 for i = 1, 2.

    Steps:

    if (g(0) = 0) { (g1, g2) := (X, g(X)/X), return. }
    while (1) {
      Select a random element b ∈ Fq.
      h := (X + b)^((q–1)/2) – 1 (mod g).
      g1 := gcd(g, h).
      if (1 ≤ deg g1 < deg g) { g2 := g/g1, return. }
    }

  3. Now, assume that p = 2 and define the polynomial H(X) := X + X^2 + X^4 + · · · + X^(2^(n–1)).

    Let b ∈ Fq be arbitrary. Show that

    H(X + b)(H(X + b) + 1) = X^q – X

    [H] and that

    g(X) = gcd(g(X), H(X + b)) gcd(g(X), H(X + b) + 1).

Explain how Algorithm 3.30 produces two non-trivial factors of g (over Fq) in probabilistic polynomial time. Write an algorithm to compute all the roots of f in Fq.

Algorithm 3.30. Computing roots of a polynomial: characteristic 2

Input: A square-free polynomial g ∈ Fq[X] (q = 2^n) that splits over Fq.

Output: Polynomials g1, g2 with g = g1g2 and deg gi ≥ 1 for i = 1, 2.

Steps:

if (g(0) = 0) { (g1, g2) := (X, g(X)/X), return. }
while (1) {
   Select a random element b ∈ Fq.
   h := (X + b) + (X + b)^2 + (X + b)^4 + · · · + (X + b)^(2^(n–1)) (mod g).
   g1 := gcd(g, h).
   if (1 ≤ deg g1 < deg g) { g2 := g/g1, return. }
}

3.37 Use Exercise 3.36 to compute all the roots of the following polynomials:
  1. X^6 + 6X^4 + 4X^2 + 6 in .

  2. X^3 + (α^2 + α)X^2 + (α^2 + α + 1) in , where is represented as , α being a root of the polynomial X^3 + X + 1.

3.38 Let f and g be two monic irreducible polynomials over Fq and of the same degree n. Consider the two representations Fq[X]/⟨f(X)⟩ and Fq[Y]/⟨g(Y)⟩ of Fq^n. In this exercise, we study how we can compute an isomorphism between these two representations. The polynomial f(Y) splits into linear factors over Fq[Y]/⟨g(Y)⟩. Consider a root α = α(Y) of f(Y) in Fq[Y]/⟨g(Y)⟩. Show that 1, α, α^2, . . . , α^(n–1) is an Fq-basis of (the Fq-vector space) Fq[Y]/⟨g(Y)⟩. For i = 0, . . . , n – 1, write (uniquely) α^i = αi0 + αi1Y + · · · + αi,n–1Y^(n–1) with αij ∈ Fq, and consider the matrix A = (αij), 0 ≤ i, j ≤ n – 1. Show that the map that maps (the equivalence class of) a0 + a1X + · · · + an–1X^(n–1) to (the equivalence class of) b0 + b1Y + · · · + bn–1Y^(n–1), where (b0 b1 . . . bn–1) = (a0 a1 . . . an–1)A, is an Fq-isomorphism.
3.39 Let q = p^n for a prime p and n ∈ N. We have seen that the elements of Fp can be represented as integers between 0 and p – 1, whereas the elements of Fq can be represented as polynomials modulo some irreducible polynomial of degree n, that is, as polynomials of Fp[X] of degrees < n. Show that the substitution X = p in the polynomial representation of elements of Fq gives a representation of elements of Fq as integers between 0 and q – 1. We call this latter representation the packed representation. Compare the advantages and disadvantages of the packed representation over the polynomial representation.
3.40 Let G be a cyclic multiplicatively written group of order m (and with the identity element e). Assume that the factorization of m is known. Devise an algorithm that computes the order of an arbitrary element in G. [H]
3.41

Berlekamp’s Q-matrix factorization Let f ∈ Fq[X] be a monic square-free polynomial of degree d that admits a factorization f(X) = f1(X) · · · fr(X) with each fi monic, non-constant and irreducible. (Note that the fi are pairwise distinct, since f is square-free.) Let di be the degree of fi.

  1. Consider the ring A := Fq[X]/⟨f(X)⟩.

    Show that A ≅ Fq[X]/⟨f1(X)⟩ × · · · × Fq[X]/⟨fr(X)⟩. [H] A is an Fq-vector space of dimension d.

  2. Consider the map 𝒬 : A → A that maps a to a^q – a. Show that 𝒬 is an Fq-linear transformation with Ker 𝒬 ≅ Fq^r, and so the nullity of 𝒬 equals the number r of irreducible factors of f.

  3. Let Q be the matrix of 𝒬 with respect to the basis 1, x, . . . , x^(d–1), where x = X + ⟨f(X)⟩. Describe an algorithm to compute Q. Also design an algorithm to compute a basis of Ker 𝒬.

  4. Show that if h(X) + ⟨f(X)⟩ ∈ Ker 𝒬, then

    f(X) = ∏c∈Fq gcd(f(X), h(X) – c).

    For a suitable h(X), this is a non-trivial factorization of f. This procedure is efficient when q is small.

  5. Use Berlekamp’s method to factor X^6 + X^5 + X^2 + 1 over F2.

*3.6. Arithmetic on Elliptic Curves

The recent popularity of cryptographic systems based on elliptic curve groups over Fq stems from two considerations. First, discrete logarithms in Fq* can be computed in subexponential time. This demands that q be sufficiently large, typically of length 768 bits or more. On the other hand, if the elliptic curve E over Fq is carefully chosen, the only known algorithms for solving the discrete logarithm problem in E(Fq) are fully exponential in lg q. As a result, smaller values of q suffice to achieve the desired level of security. In practice, the length of q is required to be between 160 and 400 bits. This leads to smaller key sizes for elliptic curve cryptosystems. The second advantage of using elliptic curves is that for a given prime power q, there is only one group Fq*, whereas there are many elliptic curve groups (over the same field Fq), with orders ranging over the Hasse interval from q + 1 – 2√q to q + 1 + 2√q. If a particular group E(Fq) is compromised, we can switch to another curve without changing the base field Fq.

In this section, we start with a description of the efficient implementation of the arithmetic in the groups E(Fq). Then we concentrate on some algorithms for counting the order #E(Fq). Knowledge of this order is necessary to find cryptographically suitable elliptic curves. We consider only prime fields or fields of characteristic 2. So we assume that the curve is defined by Equation (2.8) or Equation (2.9) on p 100 (supersingular curves are not used in cryptography) instead of by the general Weierstrass Equation (2.6) on p 98.

3.6.1. Point Arithmetic

Let us first see how we can efficiently represent points on an elliptic curve E over Fq. Since a finite point P = (h, k) corresponds to two elements h, k ∈ Fq, and since each element of Fq can be represented using ≤ s = ⌈lg q⌉ bits, 2s bits suffice to represent P. We can do better than this. Substituting X = h in the equation for E leaves us with a quadratic equation in Y. This equation has two roots, of which k is one. If we adopt a convention (for example, see Section 6.2.1) that identifies, using a single bit, which of the two roots the coordinate k is, the storage requirement for P drops to s + 1 bits. During an on-line computation this compressed representation incurs some overhead and may be avoided. However, for off-line storage and transmission (of public keys, for example), this compression may be helpful.

Explicit formulas for the sum of two points and for the opposite of a point on an elliptic curve E are given in Section 2.11.2. These operations in E(Fq) can be implemented using a few operations in the ground field Fq.

Computation of mP for m ∈ N and P ∈ E(Fq) (or, more generally, for m ∈ Z) can be performed using a repeated double-and-add algorithm similar to the repeated square-and-multiply Algorithm 3.9. We leave out the trivial modifications and urge the reader to carry out the details.
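The double-and-add computation of mP can be sketched as follows, in affine coordinates over a prime field, with None standing for the point at infinity O; the addition formulas are the chord-and-tangent formulas of Section 2.11.2.

```python
def ec_add(P, Q, a, p):
    """Chord-and-tangent addition on Y^2 = X^3 + aX + b over F_p.
    Points are (x, y) tuples; None is the point at infinity O."""
    if P is None:
        return Q
    if Q is None:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None                       # P + (-P) = O
    if P == Q:
        lam = (3 * x1 * x1 + a) * pow(2 * y1, -1, p) % p
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, p) % p
    x3 = (lam * lam - x1 - x2) % p
    return (x3, (lam * (x1 - x3) - y1) % p)

def ec_mul(m, P, a, p):
    """Repeated double-and-add: scan the bits of m, doubling at each
    step, exactly as in the square-and-multiply Algorithm 3.9."""
    R = None
    while m:
        if m & 1:
            R = ec_add(R, P, a, p)
        P = ec_add(P, P, a, p)
        m >>= 1
    return R
```

On the toy curve Y^2 = X^3 + X + 1 over F5 (a group of order 9), the point (0, 1) has order 9, so 9·(0, 1) = O.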

Finding a random point P ∈ E(Fq) is another useful problem. If q = p is an odd prime and we use the short Weierstrass Equation (2.8), we first choose a random h ∈ Fp and substitute X by h to get Y^2 = h^3 + ah + b. This equation has 2, 0 or 1 solution(s) depending on whether h^3 + ah + b is a quadratic residue, a quadratic non-residue, or 0 modulo p. Quadratic residuosity can be checked by computing the Legendre symbol (Algorithm 3.15), whereas square roots modulo p can be computed using Tonelli and Shanks’ Algorithm 3.16.
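For primes p ≡ 3 (mod 4), the square-root step degenerates to a single exponentiation, which gives the following sketch of random point generation (the general case would use the full Tonelli–Shanks Algorithm 3.16):

```python
import random

def random_point(a, b, p):
    """Sketch of random point sampling on Y^2 = X^3 + aX + b over F_p.
    Assumes p = 3 (mod 4), in which case a square root of a quadratic
    residue z is simply z^((p+1)/4)."""
    assert p % 4 == 3
    while True:
        h = random.randrange(p)
        z = (h * h * h + a * h + b) % p
        if z == 0:
            return (h, 0)
        if pow(z, (p - 1) // 2, p) == 1:   # Euler's criterion: z is a QR
            return (h, pow(z, (p + 1) // 4, p))
```

Each random h succeeds with probability about 1/2, so only a couple of trials are expected.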

For a non-supersingular curve E over F2^n defined by Equation (2.9), a random point is chosen by first choosing a random h ∈ F2^n. Substituting X = h in the defining equation gives Y^2 + hY + (h^3 + ah^2 + b) = 0. If h = 0, then the unique solution for k is b^(2^(n–1)). If h ≠ 0, replacing Y by hY and dividing by h^2 transforms the equation to the form Y^2 + Y + α = 0 for some α ∈ F2^n. This equation has two or zero solutions depending on whether the absolute trace Tr(α) is 0 or 1. If k is a solution, the other solution is k + 1. In order to find a solution (if it exists), one may use the (probabilistic) root-finding algorithm of Exercise 3.36. Another possibility is discussed now.

We consider two separate cases. First, if n is odd, then the half-trace k = α + α^4 + α^16 + · · · + α^(2^(n–1)) is a solution, since then k^2 + k = α + Tr(α) = α. On the other hand, if n is even, we first find a β ∈ F2^n with Tr(β) = 1. Since Tr is a homomorphism of the additive groups and Tr(1) = 1, exactly half of the elements of F2^n have trace 1. Therefore, a desired β can be quickly found by selecting elements of F2^n at random and computing their traces. Now, it is easy to check that a solution of Y^2 + Y + α = 0 can be written down explicitly in terms of α and β.

**3.6.2. Counting Points on Elliptic Curves

Counting points on elliptic curves is a challenging problem, both theoretically and computationally. The first polynomial-time (in log q) algorithm, invented by Schoof and later made efficient by Elkies and Atkin (and many others), is popularly called the SEA algorithm. Unfortunately, even the most efficient implementations of this algorithm remain rather slow, but it is the only known reasonable strategy, in particular, when q = p is a large (odd) prime of a size of cryptographic interest. The more recent Satoh–FGH algorithm, named after its discoverer Satoh and after Fouquet, Gaudry and Harley, who proposed its generalized and efficient versions, is a remarkable breakthrough for the case q = 2^n. Both the SEA and the Satoh–FGH algorithms are mathematically quite sophisticated. We now present a brief overview of these algorithms.

The SEA algorithm

We assume that q = p is a large odd prime, this being the typical situation when we apply the SEA algorithm. We also assume that E is given by the short Weierstrass equation Y^2 = X^3 + aX + b. Let q1 = 2, q2 = 3, q3 = 5, . . . be the sequence of prime numbers and t the Frobenius trace of E at p. By Hasse’s theorem (Theorem 2.48, p 106), #E(Fp) = p + 1 – t with |t| ≤ 2√p. A knowledge of t modulo sufficiently many small primes l allows us to reconstruct t using the Chinese remainder theorem. Because of the Hasse bound on t, it is sufficient to choose l from the primes q1, q2, . . . in succession, until the product q1q2 · · · qr exceeds 4√p. By the prime number theorem (Theorem 2.20, p 53), we have r = O(ln p) and also qi = O(ln p) for each i = 1, . . . , r.

The most innovative idea of Algorithm 3.31 is the determination of the integers ti. For l = q1 = 2, the process is easy. We have t1 = t rem 2 = 0 if and only if E(Fp) contains a point of order 2 (a point of the form (h, 0)), or equivalently, if and only if the polynomial X^3 + aX + b has a root in Fp. We compute the polynomial gcd g(X) := gcd(X^3 + aX + b, X^p – X) over Fp and conclude that t1 = 0 if and only if deg g(X) > 0.

Algorithm 3.31. SEA algorithm for elliptic curve point counting

Input: A prime field F_p, p odd, and an elliptic curve E defined over F_p.

Output: The order of the group E(F_p).

Steps:

Find (the smallest) r such that the product q1q2 · · · qr > 4√p.
for i = 1, 2, . . . , r { Compute ti ∈ {0, 1, . . . , qi – 1} with t ≡ ti (mod qi). }
Compute t by combining t1, t2, . . . , tr using the Chinese remainder theorem, and return #E(F_p) = p + 1 – t.
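The CRT step of the algorithm can be sketched in code. The function names below are ours, and the per-prime residues are assumed to come from Schoof's computations; the toy curve over F_23 is one whose trace is known to be –4.

```python
# Sketch of the CRT step of Algorithm 3.31: recover the Frobenius trace t
# from its residues t_i modulo small primes q_i, then shift the CRT result
# into the Hasse interval |t| <= 2*sqrt(p).
import math

def crt(residues, moduli):
    """Chinese remainder theorem for pairwise coprime moduli."""
    M = math.prod(moduli)
    x = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        x += r * Mi * pow(Mi, -1, m)    # pow(., -1, m) is the inverse mod m
    return x % M, M

def recover_trace(residues, moduli, p):
    x, M = crt(residues, moduli)
    assert M > 4 * math.sqrt(p)         # needed for uniqueness of t
    if x > M // 2:                      # pick the representative of small absolute value
        x -= M
    return x

# Toy example: Y^2 = X^3 + X + 1 over F_23 has 28 points, so t = -4.
# Residues of t modulo 2, 3, 5, 7 (product 210 > 4*sqrt(23)):
t = recover_trace([0, 2, 1, 3], [2, 3, 5, 7], 23)
print(t, 23 + 1 - t)    # -4 28
```

Note that once the product of the small primes exceeds 4√p, the centred representative is forced to be the true trace.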

Determination of ti for i > 1 involves more work. We explain here the original idea due to Schoof. We denote by l the i-th prime qi and by E[l] the set of all l-torsion points of E (Definition 2.78, p 105). The Frobenius endomorphism φ of E, which fixes the point at infinity O and maps (h, k) to (h^p, k^p), satisfies the relation φ^2 – tφ + p = 0. If we restrict our attention only to the group E[l], then this relation reduces to φ^2 – tiφ + pi = 0, where ti = t rem l and pi = p rem l, that is, φ^2(P) – tiφ(P) + piP = O for all P ∈ E[l].

In terms of polynomials, the last relation is equivalent to

Equation 3.4

(X^(p^2), Y^(p^2)) + pi(X, Y) – ti(X^p, Y^p) = O,
where the sum and difference follow the addition formulas for the elliptic curve E, and O is the point at infinity. Now, one has to calculate symbolically rather than numerically, since X and Y are indeterminates. These computations can be carried out in the ring F_p[X, Y]/⟨f, fl⟩ (instead of in F_p[X, Y]), where f(X, Y) = Y^2 – (X^3 + aX + b) is the defining polynomial of E and fl = fl(X) is the l-th division polynomial of E (Section 2.11.2 and Theorem 2.47, p 106). Reduction of a polynomial in F_p[X, Y] modulo f makes its Y-degree ≤ 1, whereas reduction modulo fl makes the X-degree less than deg fl, which is O(l^2). We can try the values ti = 0, 1, . . . , l – 1 successively until the desired value satisfying Equation (3.4) is found.
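The l = 2 case described earlier gives a small, self-contained taste of these symbolic calculations. The sketch below (our own illustrative code, not the book's) computes X^p modulo the cubic by square-and-multiply and then takes a polynomial gcd over F_p, never expanding X^p explicitly.

```python
# Decide the parity of t by testing whether gcd(X^3 + aX + b, X^p - X)
# is nontrivial over F_p.  Polynomials are coefficient lists, lowest
# degree first.  Illustrative code, not an optimized implementation.

def poly_trim(f, p):
    f = [c % p for c in f]
    while f and f[-1] == 0:
        f.pop()
    return f

def poly_mod(f, g, p):
    f, g = poly_trim(f, p), poly_trim(g, p)
    inv = pow(g[-1], -1, p)
    while len(f) >= len(g):
        c = f[-1] * inv % p
        shift = len(f) - len(g)
        for i, gi in enumerate(g):
            f[shift + i] = (f[shift + i] - c * gi) % p
        f = poly_trim(f, p)
    return f

def poly_gcd(f, g, p):
    f, g = poly_trim(f, p), poly_trim(g, p)
    while g:
        f, g = g, poly_mod(f, g, p)
    return f

def poly_mulmod(f, g, m, p):
    """Product of f and g reduced modulo the monic polynomial m and mod p."""
    res = [0] * max(1, len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            res[i + j] = (res[i + j] + a * b) % p
    dm = len(m) - 1
    for k in range(len(res) - 1, dm - 1, -1):
        c = res[k]
        if c:
            for t in range(dm + 1):
                res[k - dm + t] = (res[k - dm + t] - c * m[t]) % p
    return poly_trim(res[:dm], p)

def poly_powmod(base, e, m, p):
    result = [1]
    while e:
        if e & 1:
            result = poly_mulmod(result, base, m, p)
        base = poly_mulmod(base, base, m, p)
        e >>= 1
    return result

def trace_parity(a, b, p):
    """t mod 2 for E: Y^2 = X^3 + aX + b over F_p (p an odd prime)."""
    cubic = [b % p, a % p, 0, 1]             # X^3 + aX + b
    xp = poly_powmod([0, 1], p, cubic, p)    # X^p reduced mod the cubic
    while len(xp) < 2:
        xp.append(0)
    xp[1] = (xp[1] - 1) % p                  # X^p - X, reduced
    g = poly_gcd(cubic, xp, p)
    return 0 if len(g) != 1 else 1           # nontrivial gcd <=> 2 | t

# Y^2 = X^3 + X + 1 over F_23 has 28 points (t = -4), so t is even:
print(trace_parity(1, 1, 23))   # 0
```

For larger l one works with pairs (X, Y) modulo both the curve equation and the division polynomial, but the flavour is the same: exponentiation and gcds of polynomials of bounded degree.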

It is not difficult to verify that Schoof's algorithm runs in time O(log^8 p) (under standard arithmetic in F_p) and is thus a deterministic polynomial-time algorithm for the point-counting problem. Essentially the same algorithm works for fields of cardinality q = 2^n and has the same running time. Unfortunately, the big exponent (8) in the running time makes Schoof's algorithm quite impractical. Numerous improvements have been suggested to bring down this exponent. Elkies and Atkin's modification for the case q = p gives rise to the SEA algorithm, which has a running time of O(log^6 p) under standard arithmetic in F_p. This speed-up is achieved by working in the ring F_p[X, Y]/⟨f, gl⟩, where gl is a suitable factor of fl of degree O(l). Couveignes suggests improvements for fields of characteristic 2. Efficient implementations of the SEA algorithm have been reported by Morain, Müller, Dewaghe, Vercauteren and many others. At the time of writing this book, the largest values of q for which the algorithm has been successfully applied are 10^499 + 153 (a prime) and 2^1999 (a power of 2).
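For contrast, the naive method, enumerating every x-coordinate and counting square roots, takes time exponential in log p; a short sketch shows why point-counting algorithms such as SEA are needed at all.

```python
# Baseline for comparison: naive point counting by enumerating x and
# counting the square roots of x^3 + a*x + b modulo p.  Usable only for
# tiny p.

def naive_count(a, b, p):
    sqrt_count = {}
    for y in range(p):
        v = y * y % p
        sqrt_count[v] = sqrt_count.get(v, 0) + 1
    n = 1                                    # the point at infinity
    for x in range(p):
        n += sqrt_count.get((x * x * x + a * x + b) % p, 0)
    return n

print(naive_count(1, 1, 23))   # 28 points, trace t = -4
```

The loop over all p residues is exactly what SEA avoids by determining t modulo small primes instead.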

The Satoh–FGH algorithm

The Satoh–FGH algorithm is well suited for fields of small characteristic p and, in particular, for the fields of characteristic 2. This algorithm has enabled point counting over fields of cardinality 2^n for very large n. A generic description of the Satoh–FGH algorithm now follows, after the introduction of some mathematical notions. Though our practical interest concentrates on the fields of characteristic 2 only, we consider curves over a general F_q with q = p^n, p a prime.

Recall from Section 2.14 that the ring Z_p of p-adic integers is a discrete valuation ring (Exercises 2.133 and 2.148) with the unique maximal ideal generated by p, and that the residue field Z_p/⟨p⟩ is isomorphic to F_p.

We represent F_q as a polynomial algebra over F_p. We analogously define the p-adic ring Z_q := Z_p[X]/⟨f⟩, where f is an irreducible polynomial of degree n in Z_p[X]. The elements of Z_q can be viewed as polynomials of degrees < n with p-adic integers as coefficients. The arithmetic operations in Z_q are polynomial operations in Z_p[X] modulo the defining polynomial f. The ring Z_p is canonically embedded in the ring Z_q (consider constant polynomials).

Z_q turns out to be a discrete valuation ring with maximal ideal ⟨p⟩, and the residue field Z_q/⟨p⟩ is isomorphic to F_q.

Definition 3.6.

The projection map π : Z_p → F_p is defined as the map that takes a p-adic integer α = (a1, a2, . . .) to a1; it can be canonically extended to a map on polynomials by π(α0 + α1X + · · · + αdX^d) := π(α0) + π(α1)X + · · · + π(αd)X^d. In particular, this defines a projection map π : Z_q → F_q.

The (Teichmüller) lift is the map F_q → Z_q that takes 0 ↦ 0 and 0 ≠ a ↦ ω(a), where ω(a) is the unique (q – 1)-th root of unity in Z_q satisfying π(ω(a)) = a (cf. Exercise 2.160).

The semi-Witt decomposition of an element α ∈ Z_q is defined to be the unique sequence a0, a1, . . . with ai ∈ F_q such that α has the p-adic expansion α = ω(a0) + ω(a1)p + ω(a2)p^2 + · · ·.
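For the base case q = p, the Teichmüller lift of Definition 3.6 can be approximated numerically by working modulo p^m (the truncation discussed later in this section): since a^(p^k) converges p-adically to the (p – 1)-th root of unity above a, a few rounds of p-th powering suffice. A sketch, with our own function name:

```python
# Approximate the Teichmueller lift omega(a) in Z_p by computing modulo
# p^m.  m rounds of p-th powering modulo p^m give omega(a) to the full
# working precision.  Illustrative code.

def teichmuller(a, p, m):
    """The (p-1)-th root of unity in Z/p^m that reduces to a mod p (a != 0)."""
    M = p ** m
    x = a % M
    for _ in range(m):          # each powering gains at least one p-adic digit
        x = pow(x, p, M)
    return x

w = teichmuller(3, 7, 5)
print(w % 7)                    # 3: omega(3) projects back to 3
print(pow(w, 6, 7 ** 5))        # 1: a 6th root of unity modulo 7^5
```

The two printed checks mirror the defining properties π(ω(a)) = a and ω(a)^(q–1) = 1.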

The p-th power Frobenius endomorphism σ : F_q → F_q, a ↦ a^p, can now be extended to an endomorphism Σ : Z_q → Z_q as follows. Let α ∈ Z_q have the semi-Witt decomposition a0, a1, . . . with ai ∈ F_q. Then, Σ(α) is the unique element of Z_q having the semi-Witt decomposition a0^p, a1^p, . . . . One can show that Σ is a ring endomorphism of Z_q satisfying π ∘ Σ = σ ∘ π. We have Σ^n = id on Z_q, and similarly σ^n = id on F_q.

Now, let E = E0 be an elliptic curve defined over F_q. Application of σ to the coefficients of E0 gives another elliptic curve E1 over F_q, whose rational points are (h^p, k^p), where (h, k) ∈ E0(F_q), together with the point at infinity. We may apply σ to E1 to get another curve E2 over F_q, and so on. Since σ^n = id, we get a cycle of elliptic curves defined over F_q:

Equation 3.5

E0 → E1 → E2 → · · · → En–1 → En = E0.
Similarly, if ε = ε0 is an elliptic curve defined over Z_q, application of Σ leads to a sequence of elliptic curves defined over Z_q:

Equation 3.6

ε0 → ε1 → ε2 → · · · → εn–1 → εn = ε0.
We need the canonical lifting of an elliptic curve E over F_q to a curve ε over Z_q. Explaining that requires some more mathematical concepts:

Definition 3.7.

Let K be a field and let E and E′ be two elliptic curves defined over K. A morphism ψ : E → E′ (Definition 2.72, p 95) that maps the point at infinity of E to the point at infinity of E′ is called an isogeny. The zero isogeny E → E′ maps every point of E to the point at infinity of E′. A non-zero isogeny is also called a non-constant isogeny. Two curves E and E′ are called isogenous if there exists a non-constant isogeny E → E′.

The kernel ker ψ of an isogeny ψ : E → E′ is defined to be the set of points of E mapped by ψ to the point at infinity of E′. For every non-constant isogeny ψ, the kernel ker ψ is a finite subgroup of E.

The set Hom(E, E′) of all isogenies E → E′ is an Abelian group under pointwise addition: (ψ1 + ψ2)(P) := ψ1(P) + ψ2(P). If E = E′, then End(E) := Hom(E, E) becomes a ring with multiplication defined by composition and is called the endomorphism ring of E.

The multiplication-by-m map of E is an isogeny. If End(E) contains an isogeny not of this type, we call E an elliptic curve with complex multiplication.

Theorem 3.4.

For each prime i, there exists a unique polynomial Φi(X, Y) with integer coefficients, symmetric and of degree i + 1 in each of X and Y, such that two curves E and E′ (defined over a field K) with j-invariants j and j′ satisfy Φi(j, j′) = 0 if and only if there is an isogeny E → E′ whose kernel is cyclic of order i.

Definition 3.8.

The polynomials Φi(X, Y), i = 2, 3, 5, . . . , of Theorem 3.4 are called modular polynomials. As an example,

Φ2(X, Y)=X3 + Y3X2Y2 + 1488(X2Y + XY2) –
  162,000(X2 + Y2) + 40,773,375XY + 8,748,000,000(X + Y) –
  157,464,000,000,000.
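Two quick sanity checks on the quoted polynomial Φ2: it is symmetric in X and Y, and it vanishes at a known pair of 2-isogenous j-invariants (j = 8000 is the j-invariant of a curve with complex multiplication by Z[√–2]; since 2 ramifies there, the curve is 2-isogenous to itself). Illustrative code:

```python
# Sanity checks on the modular polynomial Phi_2 quoted in the text.

def phi2(x, y):
    return (x**3 + y**3 - x**2 * y**2
            + 1488 * (x**2 * y + x * y**2)
            - 162000 * (x**2 + y**2)
            + 40773375 * x * y
            + 8748000000 * (x + y)
            - 157464000000000)

assert phi2(12, 34) == phi2(34, 12)   # symmetric in X and Y
assert phi2(8000, 8000) == 0          # a self-2-isogenous CM curve
print("Phi_2 checks pass")
```

Python's arbitrary-precision integers make such checks on the huge coefficients painless.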

The next theorem establishes the foundation for lifting curves from F_q to Z_q.

Theorem 3.5. Lubin–Serre–Tate

Let E be an ordinary elliptic curve defined over F_q, q = p^n, with j-invariant j(E). Then there exists an elliptic curve ε defined over Z_q, with a unique j-invariant J(ε) ∈ Z_q, such that π(J(ε)) = j(E) and End(ε) ≅ End(E). The curve ε is called the canonical lift of E and is unique up to isomorphism.

With this definition of lifting of elliptic curves, Cycles (3.5) and (3.6) fit into a commutative diagram in which εi is the canonical lift of Ei for each i = 0, 1, . . . , n, the projection π takes εi to Ei, and Σ and σ map εi to εi+1 and Ei to Ei+1, respectively.

Algorithm 3.32 outlines the Satoh–FGH algorithm. In order to complete the description of the algorithm, one should specify how to lift curves (that is, a procedural equivalent of Theorem 3.5) and their p-torsion points and how the lifted data can be used to compute the Frobenius trace t. We leave out the details here.

Algorithm 3.32. Satoh–FGH algorithm for elliptic curve point counting

Input: An elliptic curve E over F_q, q = p^n, p prime, with j-invariant j(E).

Output: The cardinality #E(F_q) or equivalently the trace t.

Steps:

Compute the curves E0, . . . , En–1 and their j-invariants j0, . . . , jn–1.
Compute the lifted j-invariants J0, . . . , Jn–1.
Compute the lifted curves ε0, . . . , εn–1.
Lift the p-torsion groups Ei[p] for i = 0, . . . , n – 1.
Compute t and hence #E(F_q) from the lifted data.

The elements of Z_p (and hence of Z_q) are infinite sequences and hence cannot be represented in computer memory. However, we make an approximate representation by considering only the first m terms of the sequences representing elements of Z_p. Working in Z_q with this approximate representation is then essentially the same as working in Z_q/⟨p^m⟩. For the Satoh–FGH algorithm, a precision of m ≈ n/2 suffices.

For small p (for example, p = 2) and with standard arithmetic in Z_q, the Satoh–FGH algorithm has a deterministic running time of O(n^5) and a space requirement of O(n^3). With Karatsuba arithmetic, the exponent in the running time drops from 5 to nearly 4.17. In addition, this algorithm is significantly easier to implement than optimized versions of the SEA algorithm. These facts are responsible for the superior performance of the Satoh–FGH algorithm over the SEA algorithm (for small p).

3.6.3. Choosing Good Elliptic Curves

Choosing cryptographically suitable elliptic curves is more difficult than choosing good finite fields. First, the order of the elliptic curve group must have a suitably large prime divisor, say, of bit length 160 or more. In addition, the MOV attack applies to supersingular curves and the anomalous attack to anomalous curves (Definition 2.80 and Section 4.5). So a secure curve must be non-supersingular and non-anomalous. Checking all these criteria for a random curve E over F_q requires the group order #E(F_q). One may use either the SEA algorithm or the Satoh–FGH algorithm to compute #E(F_q). Once #E(F_q) is known, it is easy to check whether E is supersingular or anomalous. But factoring #E(F_q) to find its largest prime divisor may be a difficult task and is not recommended. One may instead extract all the small prime factors of #E(F_q) by trial divisions with the primes q1 = 2, q2 = 3, q3 = 5, . . . , qr for a predetermined r and write #E(F_q) = m1m2, where m1 has all prime factors ≤ qr and m2 has all prime factors > qr. If m2 is prime and of the desired size, then E is treated as a good curve. Algorithm 3.33 illustrates these steps.
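The order-screening step (extract m1 by trial division, then test the cofactor m2 for primality) can be sketched as follows. The Miller–Rabin test shown is a compressed version of the primality test discussed earlier in the chapter, and the example order is invented for illustration:

```python
# Split a candidate group order N = m1*m2 by trial division with small
# primes, then probabilistically test whether the cofactor m2 is prime.
import random

def is_probable_prime(n, rounds=25):
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13):
        if n % p == 0:
            return n == p
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):                 # Miller-Rabin rounds
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False                    # a is a witness of compositeness
    return True

def screen_order(N, small_primes):
    """Return (m1, m2, m2_is_probable_prime) for N = m1*m2."""
    m1 = 1
    for q in small_primes:
        while N % q == 0:
            N //= q
            m1 *= q
    m2 = N
    return m1, m2, is_probable_prime(m2)

# Hypothetical order: small cofactor 4 times the prime 1000003.
print(screen_order(4 * 1000003, [2, 3, 5, 7, 11, 13]))
```

In practice one also checks that m2 exceeds the required bit length (160 or more) before accepting the curve.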

The computation of the group orders takes up most of the execution time of the above algorithm. It is, therefore, of utmost importance to employ good algorithms for point counting. The best algorithms known to date (the SEA and the Satoh–FGH algorithms) are only reasonable. Further research in this area may lead to better algorithms in future.

Algorithm 3.33. Selecting cryptographically suitable elliptic curves

Input: A suitably large finite field F_q.

Output: A cryptographically good elliptic curve E over F_q.

Steps:

while (1) {
   Generate a random elliptic curve E over F_q.
   Determine #E(F_q).
   if (E is neither supersingular nor anomalous) {
      Try to factorize #E(F_q) using trial division by small primes.
      if (#E(F_q) has a suitably large prime divisor) { Return E }
   }
}

There are ways of generating good curves without requiring the point-counting algorithms over large finite fields. One possibility is to use the so-called subfield curves. If F_q has a subfield F_q′ of relatively small cardinality q′, one can choose a random curve E over F_q′ and compute #E(F_q′). Since E is also a curve defined over F_q and #E(F_q) can be easily obtained from #E(F_q′) using Theorem 2.51 (p 107), we save the lengthy direct computation of #E(F_q). However, the drawback of this method is that since E is now chosen with coefficients from a small field F_q′, we do not have many choices. The second drawback is that q must have a proper prime-power divisor q′. If q is already a prime, this strategy does not work at all. If q = p^n, p a small prime, we need n to have a small divisor n′ that corresponds to q′ = p^(n′). Sometimes small odd primes p are suggested, but the arithmetic in a non-prime field of some odd characteristic is inherently much slower than that in a field of nearly equal size but of characteristic 2.

Specific curves with complex multiplication (Definition 3.7) over large prime fields have also been suggested in the literature. Finding good curves with complex multiplication involves less computational overhead than Algorithm 3.33, but (like subfield curves) offers limited choice. However, it is important to mention that no special attacks are currently known for subfield curves, nor for those chosen by the complex multiplication strategy.

3.7. Arithmetic on Hyperelliptic Curves

Let K = F_q be a finite field and C a hyperelliptic curve of genus g defined over K by Equation (2.13), that is, by

C : Y2 + u(X)Y = v(X)

for suitable polynomials u, v ∈ K[X]. We want to implement the arithmetic in the Jacobian J_C(K). Recall from Section 2.12 that an element of J_C(K) can be represented uniquely as a reduced divisor Div(a, b) for a pair of polynomials a(X), b(X) ∈ K[X] with a monic, deg a ≤ g, deg b < deg a and a | (b^2 + bu – v). Thus, each element of J_C(K) requires O(g log q) storage.

3.7.1. Arithmetic in the Jacobian

We first present Algorithm 3.34 that, given two elements Div(a1, b1), Div(a2, b2) of J_C(K), computes the reduced divisor Div(a, b) which satisfies Div(a, b) ~ Div(a1, b1) + Div(a2, b2). The algorithm proceeds in two steps:

  1. Compute a semi-reduced divisor Div(a′, b′) ~ Div(a1, b1) + Div(a2, b2).

  2. Compute the reduced divisor Div(a, b) ~ Div(a′, b′).

Both these steps can be performed in (deterministic) polynomial time (in the input size, that is, g log q). Algorithm 3.34 implements the first step and continues to work even when the input divisors are semi-reduced (and not completely reduced).

Algorithm 3.34. Sum of semi-reduced divisors

Input: (Semi-)reduced divisors Div(a1, b1) and Div(a2, b2) defined over K.

Output: A semi-reduced divisor Div(a′, b′) ~ Div(a1, b1) + Div(a2, b2).

Steps:

d1 := gcd(a1, a2) = u1a1 + u2a2.   /* Extended gcd in K[X] */
d2 := gcd(d1, b1 + b2 + u) = v1d1 + v2(b1 + b2 + u).   /* Extended gcd in K[X] */
s1 := v1u1, s2 := v1u2, s3 := v2.   /* So that d2 = s1a1 + s2a2 + s3(b1 + b2 + u) */
a′ := (a1a2)/d2^2.
b′ := ((s1a1b2 + s2a2b1 + s3(b1b2 + v))/d2) rem a′.

It is an easy check that the two expressions appearing between pairs of big parentheses in Algorithm 3.34 are polynomials. This algorithm does only a few gcd calculations and some elementary arithmetic operations on polynomials of K[X]. If the input polynomials (a1, a2, b1, b2) correspond to reduced divisors, then their degrees are ≤ g and hence this algorithm runs in polynomial time in the input size. Furthermore, in that case, the output polynomials a′ and b′ are of degrees ≤ 2g.

We now want to compute the unique reduced divisor Div(a, b) equivalent to the semi-reduced divisor Div(a′, b′). This can be performed using Algorithm 3.35. If the degrees of the input polynomials a′ and b′ are O(g) (as is the case with those output by Algorithm 3.34), Algorithm 3.35 takes time polynomial in g log q. To sum up, two elements of J_C(K) can be added in polynomial time. The correctness of the two algorithms is not difficult to establish, but the proof is long and involved and hence omitted. Interested readers might look at the appendix of Koblitz's book [154].

For an element α ∈ J_C(K) and an integer n ≥ 0, one can easily write an algorithm (similar to Algorithm 3.9) to compute nα using O(log n) additions and doublings in J_C(K).
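Such a double-and-add procedure is generic: it needs only the group's addition function and identity element, so the same code serves Jacobians, elliptic curves, or any other additive group. A sketch, tested on ordinary integer addition:

```python
# Generic double-and-add: computes n*alpha with O(log n) group operations,
# given an addition function and the identity of the group.

def scalar_multiple(n, alpha, add, identity):
    if n < 0:
        raise ValueError("negate alpha first for negative multipliers")
    result, addend = identity, alpha
    while n:
        if n & 1:
            result = add(result, addend)   # "add" step
        addend = add(addend, addend)       # "double" step
        n >>= 1
    return result

# Toy check in the additive group of integers:
print(scalar_multiple(1000003, 5, lambda x, y: x + y, 0))   # 5000015
```

For a Jacobian, `add` would be the composition-plus-reduction of Algorithms 3.34 and 3.35 and `identity` the zero divisor Div(1, 0).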

3.7.2. Counting Points in Jacobians of Hyperelliptic Curves

For a hyperelliptic curve C of genus g defined over a field K = F_q, we are interested in the order of the Jacobian J_C(F_q) rather than in the cardinality of the curve C(F_q). Algorithmic and implementational studies of counting #J_C(F_q) have not received enough research attention till date, and though polynomial-time algorithms are known to this effect (at least for curves of small genus), these algorithms are far from practical for hyperelliptic curves of cryptographic sizes. In this section, we look at some of these algorithms.

Algorithm 3.35. Reduction of a semi-reduced divisor

Input: A semi-reduced divisor Div(a′, b′) defined over K.

Output: The reduced divisor Div(a, b) ~ Div(a′, b′).

Steps:

(ab) := (a′, b′).
while (deg a > g) {
  a′ := (v – bu – b^2)/a.  /* a′ is a polynomial */
  b′ := –(u + b) rem a′.
  (ab) := (a′, b′).
}
a := [lc(a)]–1a.   /* Make a monic */

We start with some theoretical results which are generalizations of those for elliptic curves. The Frobenius endomorphism φ : x ↦ x^q is a (non-trivial) F_q-automorphism of the algebraic closure of F_q. The map φ naturally (that is, coordinate-wise) extends to the points on C and also to divisors and, in particular, to the Jacobian J_C as well as to J_C(F_q). For a reduced divisor Div(a, b), we have φ(Div(a, b)) = Div(φ(a), φ(b)), where for a polynomial h the polynomial φ(h) is obtained by applying the map φ to the coefficients of h. It is known that φ satisfies a monic polynomial χ(X) of degree 2g with integer coefficients. For example, for g = 1 (elliptic curves) we have

χ(X) = X2tX + q,

where t is the trace of Frobenius at q. For g = 2, we have

Equation 3.7

χ(X) = X^4 – t1X^3 + t2X^2 – qt1X + q^2
for integers t1, t2. The cardinality n := #J_C(F_q) is related to the polynomial χ(X) as n = χ(1), and satisfies the inequalities

Equation 3.8

(√q – 1)^(2g) ≤ n ≤ (√q + 1)^(2g).
Thus n lies in a rather narrow interval, called the Hasse–Weil interval, of width w := (√q + 1)^(2g) – (√q – 1)^(2g).

Theorem 2.50 can be generalized as follows:

Theorem 3.6. Structure theorem for J_C(F_q)

The Jacobian J_C(F_q) is the direct sum of at most 2g cyclic groups of orders n1, . . . , nr, with r ≤ 2g, n1, . . . , nr ≥ 2 and ni+1 | ni for each i = 1, 2, . . . , r – 1.

The exponent of J_C(F_q) (see Exercise 3.42) is clearly m := Exp J_C(F_q) = n1. Since m | n, there are at most ⌈(w + 1)/m⌉ possibilities for n for a given m (where w is the width of the Hasse–Weil interval). In particular, n is uniquely determined by m if m > w. It is possible to have m ≤ w (for instance, when the Jacobian is far from cyclic), though such curves are relatively rare. In the more frequent case (m > w), Algorithm 3.36 determines n.

Algorithm 3.36. Hyperelliptic curve point counting

Input: A hyperelliptic curve C of genus g defined over .

Output: The cardinality n of the Jacobian .

Steps:

m := 1.
while (m ≤ w) {
   Choose a random element x ∈ J_C(F_q).
   Determine ν := ord x.
   m := lcm(m, ν).
}
n := the unique multiple of m in the Hasse–Weil interval.

Since the accumulated m always divides Exp J_C(F_q) and random elements quickly supply the missing factors, the above algorithm eventually (in practice, after a few executions of the while loop) computes this exponent. However, if Exp J_C(F_q) ≤ w, the algorithm never terminates. Thus, we may forcibly terminate the algorithm by reporting failure after sufficiently many random elements x have been tried (while we continue to have m ≤ w). In order to complete the description of the algorithm, we must specify a strategy to compute ν := ord x for a randomly chosen x ∈ J_C(F_q). Instead of computing ν directly, we compute an (integral) multiple μ of ν, factorize μ and then determine ν. Since nx = 0, we search for a desired multiple μ in the Hasse–Weil interval. This search can be carried out using a baby-step–giant-step (Section 4.4) or a birthday-paradox (Exercise 2.172) method, but the resulting expected running time is exponential in the input size. This method, therefore, cannot be used except when n is small.
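The final step of Algorithm 3.36 deserves a remark: once m exceeds the interval width, picking n is a one-line computation. A sketch with invented numbers (q = 31, g = 2, and a hypothetical exponent multiple of 1500):

```python
# The closing step of Algorithm 3.36: when the accumulated lcm m exceeds
# the width of the Hasse-Weil interval, the order n is the unique multiple
# of m inside that interval.  The example numbers are illustrative only.
import math

def hasse_weil_interval(q, g):
    lo = math.ceil((math.sqrt(q) - 1) ** (2 * g))
    hi = math.floor((math.sqrt(q) + 1) ** (2 * g))
    return lo, hi

def unique_multiple_in_interval(m, lo, hi):
    if hi - lo + 1 > m:
        raise ValueError("interval too wide: the multiple need not be unique")
    n = -(-lo // m) * m        # first multiple of m that is >= lo
    if n > hi:
        raise ValueError("no multiple of m in the interval")
    return n

lo, hi = hasse_weil_interval(31, 2)
print(lo, hi)                                        # 436 1860
print(unique_multiple_in_interval(1500, lo, hi))     # 1500
```

The guard `hi - lo + 1 > m` enforces exactly the m > w condition under which the algorithm is guaranteed to succeed.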

For hyperelliptic curves of small genus g, generalizations of Schoof's algorithm (Algorithm 3.31) can be used. Gaudry and Harley [106] describe the case g = 2. One computes the polynomial χ(X) of Equation (3.7), that is, the values of t1 and t2 modulo sufficiently many small primes l. Since the roots of χ(X) are of absolute value √q, we have |t1| ≤ 4√q and |t2| ≤ 6q. Therefore, determination of t1 and t2 modulo O(log q) small primes l uniquely determines χ(X) (as well as n = χ(1)).

Let J[l] be the set of l-torsion points of the Jacobian J_C. The Frobenius map φ restricted to J[l] satisfies

Equation 3.9

φ^4(D) – t1,lφ^3(D) + t2,lφ^2(D) – qlt1,lφ(D) + ql^2 D = 0 for all D ∈ J[l],
where t1,l := t1 rem l, t2,l := t2 rem l and ql := q rem l. By exhaustively trying all (that is, ≤ l^2) possibilities for the pair (t1,l, t2,l), one can find their actual values, that is, those values that cause the left side of Equation (3.9) to vanish (symbolically).

A result by Kampkötter [144] allows us to consider only the reduced divisors of the form D = Div(a, b) with a(X) = X^2 + a1X + a0 and b(X) = b1X + b0. There exists an ideal I of the polynomial ring F_q[A1, A0, B1, B0] (one indeterminate for each of the coefficients a1, a0, b1, b0) such that a reduced divisor D of this special form lies in J[l] if and only if f(a1, a0, b1, b0) = 0 for all f ∈ I. Thus the computation of the left side of Equation (3.9) may be carried out in the ring F_q[A1, A0, B1, B0]/I. An explicit set of generators for I can be found in Kampkötter [144]. To sum up, we get a polynomial-time algorithm.

Working (modulo the ideal I) in the 4-variate polynomial ring is, indeed, expensive. Use of Cantor's division polynomials [43] essentially reduces the arithmetic to a single variable (instead of four). We do not explore further along this line, but only mention that for g = 2, Schoof's algorithm employing division polynomials runs in time O(log^9 q). Although this is a theoretical breakthrough, the prohibitively large exponent (9) in the running time precludes the feasibility of using the algorithm in the range of interest in cryptography.

Exercise Set 3.7

3.42Let G be a multiplicative group (not necessarily Abelian and/or finite) with identity e.

Let S := {m ∈ Z | x^m = e for all x ∈ G}.

  1. Show that S is a subgroup of the additive group Z.

  2. Show that every subgroup of Z is generated by a single element. In particular, S = 〈m〉 for some integer m. Without loss of generality, we can take m ≥ 0. This m is called the exponent of the group G and is denoted by Exp G.

  3. If G is finite, show that Exp G| ord G.

  4. If G is finite and Abelian, show that Exp G = max{ord x | x ∈ G}. Deduce that in this case there exists x ∈ G such that ord x = Exp G.

3.8. Random Numbers

So far we have met several situations where we needed random elements from a (finite) set S, for example, the set Z_n (or Z_n*), the set F_q (or F_q*), or the set of F_q-rational points on an elliptic (or hyperelliptic) curve. By randomness, we here mean that each element is equally likely to get selected, that is, if #S = n, then each element of S is selected with probability 1/n. Since elements of a set S of cardinality n can be represented as bit strings of length ≤ ⌈lg(n + 1)⌉, the problem of selecting a random element of S essentially reduces to the problem of generating (finite) random sequences of bits. A random sequence of bits is one in which every bit has a probability of 1/2 of being either 0 or 1 (irrespective of the other bits in the sequence).
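The reduction from "random element of S" to random bits can be made exact by rejection sampling: draw enough bits for an index and discard values ≥ n. A sketch (Python's secrets module stands in here for any random bit source):

```python
# Uniform random index in {0, ..., n-1} from a stream of random bits.
# Rejection of out-of-range values keeps the distribution exactly uniform.
import secrets

def random_index(n, randbit=lambda: secrets.randbits(1)):
    k = (n - 1).bit_length()          # number of bits needed per attempt
    while True:
        x = 0
        for _ in range(k):
            x = (x << 1) | randbit()
        if x < n:                     # reject x >= n and redraw
            return x

counts = [0] * 5
for _ in range(10000):
    counts[random_index(5)] += 1
print(counts)   # five roughly equal counts
```

Each attempt succeeds with probability n/2^k > 1/2, so the expected number of redraws is below two regardless of n.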

3.8.1. Pseudorandom Bit Generators

Generating a (truly) random sequence of bits seems to be an impossible task. Some natural phenomena, such as electronic noise from a specifically designed integrated circuit, can be used to generate random bit sequences. However, such systems are prone to malfunctioning, are often influenced by observation and are, of course, costly. A software solution is definitely the more practical alternative. Phenomena like the system clock or the work load or memory usage of a machine, which can be captured by programs, may be used to generate random bit sequences. But this strategy also suffers from various drawbacks. First of all, the sequences generated by these methods are not (truly) random. Moreover, they are vulnerable to attacks by adversaries (for example, if a random bit generator is based on the system clock and if the adversary knows the approximate time when a bit sequence was generated using that generator, she has to try only a few possibilities to generate the same sequence).

In order to obviate these difficulties, pseudorandom bit generators (PRBG) are commonly used. A bit string a0a1a2 . . . is generated by a PRBG following a specific strategy, which is more often than not a (mathematical) algorithm. The first bit a0 is based on a certain initial value, called a seed, whereas for i ≥ 1 the bit ai is generated as a predetermined function of some or all of the previous bits a0, . . . , ai–1. Since the resulting bit ai is now functionally dependent on the previous bits, the sequence is not at all random (but deterministic); still, we are happy if the sequence a0a1a2 . . . looks or behaves random. The random behaviour of a sequence is often examined by certain well-known statistical tests. If a generator generates bit sequences that pass these tests, we call it a PRBG and call the sequences available from such a generator pseudorandom bit sequences. Various kinds of PRBGs are used for generating pseudorandom bit sequences. We won't describe them here, but concentrate on a particular kind of generator that has a special significance in cryptography.

3.8.2. Cryptographically Strong Pseudorandom Bit Generators

A PRBG is called a cryptographically strong (or secure) pseudorandom bit generator, or a CSPRBG in short, if no polynomial-time algorithm exists (provably or otherwise) that, from a knowledge of the previous bits of a generated sequence (but without the knowledge of the seed), predicts a bit with probability significantly larger than 1/2. Usually, an intractable computational problem (see Section 4.2) is at the heart of the security of a CSPRBG. As an example, we now explain the Blum–Blum–Shub (or BBS) generator.

Algorithm 3.37. Blum–Blum–Shub pseudorandom bit generator

Input: A positive integer m (the bits a0, . . . , am are to be generated).

Output: A cryptographically strong pseudorandom bit sequence a0a1a2 . . . .

Steps:

Generate two (distinct) large primes p and q, each ≡ 3 (mod 4).
n := pq.
Generate a (random) seed s ∈ Z_n*.
x0 := s^2 (mod n).
for i = 0, . . . , m {
   ai := the least significant bit of xi.
   xi+1 := xi^2 (mod n).
}

In Algorithm 3.37, we have used indices for the sequence xi for the sake of clarity. In an actual implementation, all indices may be removed, that is, one may use a single variable x to store and update the sequence xi. Furthermore, if there is no harm in altering the value of s, one might even use the same variable for s and x.
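Such an indexless implementation can be sketched as follows; the tiny primes are purely illustrative, since real use requires large secret primes of adequate size.

```python
# Indexless BBS sketch: a single variable x is squared repeatedly, and the
# least significant bit of each x_i is emitted.

def bbs_bits(p, q, s, m):
    assert p % 4 == 3 and q % 4 == 3 and p != q
    n = p * q
    x = s * s % n                 # x_0 := s^2 (mod n)
    bits = []
    for _ in range(m + 1):        # bits a_0, a_1, ..., a_m
        bits.append(x & 1)        # a_i := least significant bit of x_i
        x = x * x % n             # x_{i+1} := x_i^2 (mod n)
    return bits

print(bbs_bits(11, 19, 100, 9))
```

With fixed inputs the output is of course deterministic; the unpredictability of a real deployment rests entirely on the secrecy of p, q and s.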

The cryptographic security of the BBS generator stems from the presumed intractability of factoring integers or of computing square roots modulo a composite integer (here n = pq) (see Exercise 3.43). Note that p, q and s have to be kept secret, whereas n can be made public. A knowledge of xm+1 is also not expected to help an opponent, so it too may be made public. For achieving the desired level of secrecy, p and q should be of nearly equal size, and the size of n should be sufficiently large (say, 768 bits or more). Generating each bit by the BBS generator involves a modular squaring and is, therefore, somewhat slow (compared to the traditional PRBGs, which do not guarantee cryptographic security). However, the BBS generator can be used for moderately infrequent purposes, for example, for the generation of a session key. Moreover, a maximum of lg lg n (least significant) bits (instead of 1 as in the above snippet) can be extracted from each xi without degrading the security of the generator.

It is evident that any (infinite) sequence a0a1 · · · generated by the BBS generator must be periodic. As an extreme example, if s = 1, then the BBS generator outputs a sequence of one-bits only. We are interested in rather short (sub)sequences (of such infinite sequences). Therefore, it suffices if the length of the period is reasonably large (for a random seed s). This is guaranteed if one uses strong primes (Definition 3.5).

3.8.3. Seeding Pseudorandom Bit Generators

The way we have defined PRBG (or CSPRBG) makes it evident that the unpredictability of a pseudorandom bit sequence essentially reduces to that of the seed. Care should, therefore, be taken in choosing the values of the seed. The seed need not be randomly or pseudorandomly generated, but should have a high degree of unpredictability, so that it is infeasible for an adversary to make a reasonably quick guess of it. As an example, assume that we intend to generate a suitable seed s for the BBS generator with a 1024-bit modulus n. If we employ for that purpose a specific algorithm (known to the opponent) using only the built-in random number generator of a standard compiler, and if this built-in generator has a 32-bit seed σ, then there are only 2^32 possibilities for s, even when s itself is 1024 bits long. Thus an adversary has to try at most 2^32 (2^31 on an average) values of σ in order to guess the correct value of s. So we must add further unpredictability to the resulting seed value s. This can be done by setting the bits of s depending on several factors, like the system clock, the system load, the memory usage, keyboard inputs from a human user and so on. Each of such factors might not be individually completely unpredictable, but their combined effect should preclude the feasibility of an exhaustive search by the opponent. After all, we have 1024 bits of s to fill up, and even if the total search space of possible values of s is as low as 2^160, it would be impossible for the opponent to guess s in a reasonable span of time. Note that more often than not the values of the seed need not be remembered, that is, need not be regenerated afterwards. As a result, there is no harm in introducing unpredictability in s caused by certain factors that we would not ourselves be able to reproduce in future.
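One common way to realize this advice is to mix several partially unpredictable sources through a cryptographic hash function and use the (stretched) digest as seed material. The particular sources in the sketch below are illustrative, not exhaustive:

```python
# Harden a seed by hashing together several partially unpredictable
# sources; any additional hard-to-guess inputs can be mixed in the same way.
import hashlib
import os
import time

def gather_seed(num_bytes=128):
    h = hashlib.sha256()
    h.update(os.urandom(32))                  # OS entropy pool
    h.update(str(time.time_ns()).encode())    # high-resolution clock
    h.update(str(os.getpid()).encode())       # process id
    # stretch the digest to the desired seed length
    material = b""
    counter = 0
    while len(material) < num_bytes:
        material += hashlib.sha256(h.digest() + bytes([counter])).digest()
        counter += 1
    return material[:num_bytes]

print(len(gather_seed()))   # 128
```

The hash ensures that an adversary must guess all the mixed inputs simultaneously, so the sources' unpredictability combines rather than averages.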

Exercise Set 3.8

3.43With the notations of Algorithm 3.37 show that:
  1. Every quadratic residue x modulo n has four distinct square roots modulo n, of which exactly one, say y, is a quadratic residue modulo n. [H]

  2. The square root y of x can be obtained by solving the simultaneous congruences y ≡ x^((p + 1)/4) (mod p) and y ≡ x^((q + 1)/4) (mod q).

  3. The bit sequence a0a1 . . . am is uniquely determined by (n and) xm+1.

  4. One can compute in polynomial (in log n and m) time the bit sequence a0a1 . . . am from the knowledge of n and xm+1, if either

    1. the primes p and q are known, or

    2. one can check in polynomial (in log n) time if an arbitrary element y ∈ Z_n* is a quadratic residue modulo n and, if so, compute in polynomial time the square roots of y modulo n.
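Parts 1 and 2 of this exercise can be checked numerically for a toy modulus; the sketch below (variable names are ours) combines the two prime-field square roots via the Chinese remainder theorem to exhibit all four roots.

```python
# Modulo a prime p = 3 (mod 4), a square root of a quadratic residue x is
# x^((p+1)/4) mod p; modulo n = p*q, a quadratic residue has four square
# roots, obtained by combining the two sign choices via the CRT.

def sqrt_mod_prime(x, p):          # assumes p = 3 (mod 4) and x a QR mod p
    return pow(x, (p + 1) // 4, p)

p, q = 11, 19
n = p * q
x = 5 * 5 % n                      # a quadratic residue modulo n
rp = sqrt_mod_prime(x % p, p)
rq = sqrt_mod_prime(x % q, q)
roots = set()
for sp in (rp, p - rp):
    for sq in (rq, q - rq):
        # CRT: y = sp (mod p) and y = sq (mod q)
        y = (sp * q * pow(q, -1, p) + sq * p * pow(p, -1, q)) % n
        roots.add(y)
print(sorted(roots))
assert len(roots) == 4 and all(y * y % n == x for y in roots)
```

Exactly one of the four roots is itself a quadratic residue modulo n, which is what makes the BBS squaring map a permutation on the residues.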

Chapter Summary

This chapter deals with the algorithmic details needed for setting up public-key cryptosystems. We study algorithms for selecting public-key parameters and for carrying out the basic cryptographic primitives. Algorithms required for cryptanalysis are dealt with in Chapters 4 and 7.

We start the chapter with a discussion on algorithms. Time and space complexities of algorithms are discussed first and the standard order notations are explained. Next we study the class of randomized algorithms which provide practical solutions to many computational problems that do not have known efficient deterministic algorithms. In the worst case, a randomized algorithm may take exponential running time and/or may output an incorrect answer. However, the probability of these bad behaviours of a randomized algorithm can be made arbitrarily low. We finally discuss reduction between computational problems. A reduction helps us conclude about the complexity of one problem relative to that for another problem.

Many popular public-key cryptosystems are based on working modulo big integers. These integers have sizes up to several thousand bits. One cannot represent such integers with full precision using the built-in data types supplied by common programming languages. So we require efficient ways of representing and doing arithmetic on big integers. We carefully deal with the implementation of the arithmetic on multiple-precision integers. We provide a special treatment of the computation of gcds and extended gcds of integers. We utilize these arithmetic functions in order to implement modular arithmetic. Most public-key primitives involve modular exponentiations as the most time-consuming steps. In addition to the standard square-and-multiply algorithm, certain special tricks (including Montgomery exponentiation) that help speed up modular exponentiation are described at length in this section.

In the next section, we deal with some other number-theoretic algorithms. One important topic is the determination of whether a given integer is prime. The Miller–Rabin primality test is an efficient algorithm for primality testing. This algorithm is, however, randomized in the sense that it may declare some composite integers as primes. Using suitable choices of the relevant parameters, the probability of this error may be reduced to very low values (≤ 2^–80). We also briefly introduce the deterministic polynomial-time AKS algorithm for primality testing. Since we can easily check for the primality of integers, we can generate random primes by essentially searching in a pool of randomly generated odd integers of a given size. Security in some cryptosystems requires such random primes to possess some special properties. We present Gordon's algorithm for generating cryptographically strong primes. The section ends with a study of the Tonelli–Shanks algorithm for computing square roots modulo a big prime.

Next, we concentrate on the implementation of finite field arithmetic. The arithmetic of a field of prime cardinality p is the same as integer arithmetic modulo p and is discussed in detail earlier. The other finite fields of interest to cryptology are extension fields of characteristic 2. In order to study the arithmetic of these fields, one first requires the arithmetic of the polynomial ring F_2[X]. We discuss the basic operations in this ring. Next we talk about algorithms for checking irreducibility of polynomials and for obtaining (random) irreducible polynomials in F_2[X]. If f(X) is such a polynomial of degree d, the arithmetic of the field F_(2^d) is the same as the arithmetic of F_2[X] modulo the defining polynomial f(X). In order that a finite field F_q is cryptographically safe, we require q − 1 to have a prime factor of sufficiently big size (160 bits or more). Suppose that the factorization of q − 1 is provided. We discuss algorithms that compute the order of elements of F_q^*, that check if a given element is a generator of the cyclic group F_q^*, and that produce random generators of F_q^*. We end the study of finite fields by discussing a way to factor polynomials over finite fields. The standard algorithm comprising the three steps square-free factorization, distinct-degree factorization and equal-degree factorization is explained in detail. The exercises cover the details of an algorithm to compute the roots of polynomials over finite fields.

The arithmetic of elliptic curves over finite fields is dealt with next. Each operation in the elliptic curve group can be realized by a sequence of operations in the underlying field. A multiple of a point on an elliptic curve can be computed by a repeated double-and-add algorithm, which is the square-and-multiply algorithm for modular exponentiation applied in an additive setting. We also discuss ways of selecting random points on elliptic curves. We then present two algorithms for counting the points in an elliptic curve group: the SEA algorithm is suitable for curves over prime fields, whereas the Satoh–FGH algorithm works efficiently for curves over fields of characteristic 2. Once we can determine the order of an elliptic curve group, we can choose good elliptic curves for cryptographic use.
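The double-and-add idea can be sketched for a curve y^2 = x^3 + ax + b over a prime field F_p in affine coordinates (an illustrative Python sketch with hypothetical toy parameters; curves of characteristic 2 use a different equation, and real implementations prefer projective coordinates):

```python
def ec_add(P, Q, a, p):
    """Add two points of the curve y^2 = x^3 + a*x + b over F_p
    (affine coordinates; None stands for the point at infinity O)."""
    if P is None:
        return Q
    if Q is None:
        return P
    x1, y1 = P
    x2, y2 = Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None                                  # P + (-P) = O
    if P == Q:                                       # doubling: tangent slope
        lam = (3 * x1 * x1 + a) * pow(2 * y1, -1, p) % p
    else:                                            # addition: chord slope
        lam = (y2 - y1) * pow(x2 - x1, -1, p) % p
    x3 = (lam * lam - x1 - x2) % p
    y3 = (lam * (x1 - x3) - y1) % p
    return (x3, y3)

def ec_multiply(k, P, a, p):
    """Compute kP by double-and-add, scanning the bits of k."""
    R, Q = None, P                                   # R accumulates the result
    while k > 0:
        if k & 1:
            R = ec_add(R, Q, a, p)                   # "add" when the bit is 1
        Q = ec_add(Q, Q, a, p)                       # "double" at every bit
        k >>= 1
    return R

# toy example on y^2 = x^3 + 7 over F_17, with the point P = (1, 5):
P, a, p = (1, 5), 0, 17
assert ec_multiply(3, P, a, p) == ec_add(P, ec_add(P, P, a, p), a, p)
```

Each group operation costs one field inversion and a few field multiplications, so the cost of computing kP is dominated by the bit length of k, exactly as for modular exponentiation.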

In the next section, we study the arithmetic of hyperelliptic curves. We describe ways to represent elements of the Jacobian by pairs of polynomials and to do arithmetic on elements in this representation. We also discuss two algorithms for counting points in a Jacobian.

In the last section, we address the issue of generation of pseudorandom bits. We define the concept of cryptographically strong pseudorandom bit generator and provide an example, namely the Blum–Blum–Shub generator, which is cryptographically strong under the assumption that taking square roots modulo a big composite integer is computationally intractable.

Suggestions for Further Reading

The basic algorithmic issues discussed in Section 3.2 can be found in any textbook on data structures and algorithms; see, for example, [7, 8, 61]. However, most of these elementary books do not address randomization and parallelization issues. We refer to [214] for a recent treatise on randomized algorithms. Also see Rabin’s papers [247, 248].

Complexity theory deals with classifying computational problems based on the known algorithms for solving them and on reduction of one problem to another. A simple introduction to complexity theory is the book [280] by Sipser. Chapter 2 of Koblitz’s book [154] is also a compact introduction to computational complexity meant for cryptographers. Also see [113].

Knuth’s book [147] is seemingly the best resource for a comprehensive treatment of multiple-precision integer arithmetic. The proofs of correctness of many algorithms that we omitted in Section 3.3 can be found in this book. This can be supplemented by the more advanced algorithms and important practical tips compiled in the book [56] by Cohen, who designed a versatile computational number theory package known as PARI. Montgomery’s multiplication algorithm appeared in [210]. Also see Chapter 14 of Menezes et al. [194] for more algorithms and implementation issues.

Most of the important papers on primality testing [3, 4, 5, 116, 175, 204, 248, 287] have been referred to in Section 3.4.1. Also see the survey [164] by Lenstra and Lenstra. Gordon’s algorithm for generating strong primes appeared in [118]. The book [69] by Crandall and Pomerance is an interesting treatise on prime numbers, written from a computational perspective. The modular square-root Algorithm 3.16 is essentially due to Tonelli (1891). Algebraic number theory is treated from a computational perspective in Cohen [56] and Pohst and Zassenhaus [235].

Arithmetic in finite fields is discussed in many books, including [179, 191]. Finite fields find modern applications in cryptography and coding theory, and it is therefore necessary to have efficient software and hardware implementations of finite field arithmetic. A huge number of papers on these implementation issues have appeared in the last two decades. Chapter 5 of Menezes [191] discusses optimal normal bases (Section 2.9.3 of the current book), which speed up exponentiation in finite fields.

Factoring univariate polynomials over finite fields is a topic that has attracted a lot of research attention. Berlekamp’s Q-matrix method [21] is the first modern algorithm for this purpose. Computationally efficient versions of the algorithm discussed in Section 3.5.4 have been presented by Gathen and Shoup [104] and Kaltofen and Shoup [143]. The best-known running time for a deterministic algorithm for univariate factorization over finite fields is due to Shoup [272]. Shparlinski shows [274] that Shoup’s algorithm on a polynomial in F_q[X] of degree d uses O(q^(1/2)(log q)d^(2+ε)) bit operations. This is fully exponential in log q.

The book [103] by von zur Gathen and Gerhard is a detailed treatise on many topics discussed in Sections 3.2 to 3.5 of the current book. Mignotte’s book [203] and the one [108] by Geddes et al. also have interesting coverage. Also see Chapter 1 of Das [72] for a survey of algorithms for various computational problems on finite fields.

For elliptic curve arithmetic, look at Blake et al. [24], Hankerson et al. [123] and Menezes [192]. The first polynomial-time algorithm for counting the points of an elliptic curve over a finite field was proposed by Schoof. The original version of this algorithm runs in time O(log^8 q). Later, Elkies improved this running time to O(log^6 q) for most elliptic curves. Further modifications due to Atkin gave rise to what we call the SEA algorithm. Schoof’s paper [264] discusses this point-counting algorithm and includes the modifications due to Elkies and Atkin. Also look at the article [85] by Elkies.

The Satoh–FGH algorithm is originally due to Satoh [256]. Fouquet et al. [94] have proposed a modification of Satoh’s algorithm to work for fields of characteristic 2. They also report large-scale implementations of the modified algorithm. Also see Fouquet et al. [95] and Skjernaa [281].

Recently, there has been a lot of progress in point-counting algorithms, in particular for fields of characteristic 2. The most recent account of this can be found in Lercier and Lubicz [177]. The authors of this paper later reported an implementation of their algorithm for counting the points of an elliptic curve over a large field of characteristic 2. This computation took nearly 82 hours on a 731 MHz Alpha EV6 processor. With these new developments, the point-counting problem is practically solved for fields of small characteristic. However, for prime fields the known algorithms require further enhancements in order to be useful on a wide scale.

Finding good random elliptic curves for cryptographic purposes has also been an area of active research recently. With the current status of solving the elliptic curve discrete-log problem, the strategy we mentioned in Algorithm 3.33 is quite acceptable as long as good point-counting algorithms are at our disposal (they are now). For further discussions on this topic, we refer the reader to two papers [95, 176].

The appendix in Koblitz’s book [154] is seemingly the best source for learning hyperelliptic curve arithmetic. This is also available as a CACR technical report [195]. Gaudry and Harley’s paper [106] has more on the hyperelliptic curve point-counting algorithms we discussed in Section 3.7.2. Hess et al. [126] discuss methods for computing hyperelliptic curves for cryptographic usage.

Chapter 5 of Menezes et al. [194] is devoted to the generation of pseudorandom bits and sequences. This chapter lists the statistical tests for checking the randomness of a bit sequence. It also describes two cryptographically secure pseudorandom bit generators other than the BBS generator (Algorithm 3.37). The BBS generator was originally proposed by Blum et al. [26]. Also see Chapter 3 of Knuth [147].

4. The Intractable Mathematical Problems

4.1 Introduction
4.2 The Problems at a Glance
4.3 The Integer Factorization Problem
4.4 The Finite Field Discrete Logarithm Problem
4.5 The Elliptic Curve Discrete Logarithm Problem
4.6 The Hyperelliptic Curve Discrete Logarithm Problem
4.7 Solving Large Sparse Linear Systems over Finite Rings
4.8 The Subset Sum Problem
 Chapter Summary
 Suggestions for Further Reading

It is insufficient to protect ourselves with laws; we need to protect ourselves with mathematics.

—Bruce Schneier

Most number theorists considered the small group of colleagues that occupied themselves with these problems as being inflicted with an incurable but harmless obsession.

—Arjen K. Lenstra and Hendrik W. Lenstra, Jr. [164]

All mathematics is divided into three parts: cryptography (paid for by CIA, KGB and the like), hydrodynamics (supported by manufacturers of atomic submarines) and celestial mechanics (financed by military and other institutions dealing with missiles, such as NASA).

—V. I. Arnold [13]

4.1. Introduction

Public-key cryptographic systems are based on the apparent intractability of certain computational problems. However, there is very little evidence (if any) to corroborate the claim that these problems are really very difficult to solve algorithmically. In spite of intensive study over a long period, mathematicians and cryptologists have not come up with good algorithms, and it is this failure that justifies the attempts to go on building secure cryptographic protocols based on these problems. The inherent assumption is that it is infeasible for an opponent with practical amounts of computing resources to break these cryptosystems in a reasonable amount of time. Of course, the fear remains that someone may devise a fast algorithm, and our cryptosystems may then fail their security guarantees. At the other extreme, it is also possible that someone proves the theoretical (and, hence, practical) impossibility of solving such a problem in a small (say, polynomial) amount of time, and our cryptosystems become secure forever (well, at least until other paradigms of computing, like the as yet practically unimplementable quantum computing, solve the problems efficiently).

Whether you are a cryptographer or a cryptanalyst, it is important, if not essential, to be aware of the best methods available to date for attacking the intractable problems of cryptography. In the first place, this knowledge quantifies the practical security margins of the protocols, for instance, by dictating the choice of input sizes as a function of the security requirements. Let us take a specific example: with today’s computing power and known integer factorization algorithms, a message that needs to be kept secret for a day or two may be encrypted with a 768-bit RSA key, whereas if one wants to maintain secrecy for a year or more, much longer keys are needed. The second point in studying the known cryptanalytic algorithms is that though efficient general-purpose algorithms for solving these problems are still unknown, there are good algorithms for specific cases—the cases to be avoided by the designers of cryptographic applications. For example, there is a linear-time algorithm that attacks cryptographic systems based on anomalous elliptic curves. The moral is that one must not employ these curves in cryptographic applications. The third reason for studying cryptanalytic algorithms is sentimental. The fact that we are still unable to answer some simply stated questions, even after spending a considerable amount of collective effort, is indeed humbling. To worsen matters, cryptography thrives by exploiting this scientific inadequacy. Cryptanalysis, though seemingly unlawful from a cryptographer’s viewpoint, turns out to be a deep and beautiful area of applied mathematics. Ironically enough, it is quite common that the proponents of cryptographic protocols are themselves the most interested in seeing how the story ends. The journey goes on. . . Read on!

It may appear somewhat unusual to discuss the cryptanalytic algorithms before the cryptographic ones (see Chapter 5). We find this order convenient in that one must first know the intractable problems before applying them in cryptographic protocols. Moreover, the known attacks help one fix the parameters for use in the cryptographic algorithms. We defer until Chapter 7 other cryptanalytic techniques which do not directly involve solving these mathematical problems. The full power of the mathematical machinery of Chapters 2 and 3 is felt here in the science of cryptology, and understanding the various aspects of cryptology hence becomes easier.

4.2. The Problems at a Glance

Let us first introduce the intractable problems of cryptology. In the rest of this chapter, we describe some known methods to solve these problems.

The integer factorization problem (IFP) is perhaps the most studied one in the lot. We know that ℤ is a unique factorization domain (UFD) (Definition 2.25, p 40); that is, given a natural number n there are pairwise distinct primes p_1, . . . , p_r (unique up to rearrangement) such that n = p_1^(α_1) · · · p_r^(α_r) for some α_1, . . . , α_r ∈ ℕ. Broadly speaking, the IFP is the determination of these p_i and α_i from the knowledge of n. Note that once the prime divisors p_i of n are known, it is rather easy to compute the multiplicities α_i = v_(p_i)(n) by trial divisions. It is, therefore, sufficient to find the primes p_i only. It is easy (Algorithm 3.13) to check whether n is composite. If n is prime, then its prime factorization is already known. On the other hand, if n is known to be composite, an algorithm that splits n into two non-trivial factors, that is, that outputs n_1, n_2 with n = n_1 n_2, n_1 < n and n_2 < n, can be used repeatedly to compute the complete factorization of n. It is enough that a non-trivial factor n_1 of n is made available; the cofactor n_2 = n/n_1 is obtained by a single division. Finally, it is sometimes known a priori that n is the product of two (distinct odd) primes (as in the RSA protocols). In this case, a non-trivial split of n immediately gives the desired factorization of n. To sum up, the IFP can be stated in various versions, the presumed difficulty of all these versions being essentially the same.

Problem 4.1

General integer factorization problem Given an integer n ≥ 2, determine all the prime divisors of n.

Problem 4.2

Integer factorization problem (IFP) Given a composite integer n, find a non-trivial divisor n_1 of n (that is, a divisor n_1 of n in the range 1 < n_1 < n).

Problem 4.3

RSA integer factorization problem Given a product n = pq of two (distinct odd) primes p and q, find the prime divisors p and q of n.

Recall that if n = p_1^(α_1) · · · p_r^(α_r) is the prime factorization of n, then the Euler totient function of n is φ(n) = p_1^(α_1 − 1)(p_1 − 1) · · · p_r^(α_r − 1)(p_r − 1). Thus, if the prime factorization of n is known, it is easy to compute φ(n). The converse is not known to be true in general. However, if n = pq is the product of two primes, factoring n is polynomial-time equivalent to computing φ(n) (Exercise 3.6).
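For n = pq, the reduction from φ(n) to factoring is explicit: since p + q = n − φ(n) + 1 and pq = n, the primes are the roots of X^2 − (n − φ(n) + 1)X + n. A sketch (the numbers in the example are hypothetical toy values):

```python
from math import isqrt

def factor_from_totient(n, phi):
    """Recover p and q from n = p*q and phi = (p-1)*(q-1): they are
    the roots of X^2 - s*X + n, where s = p + q = n - phi + 1."""
    s = n - phi + 1                      # p + q
    disc = s * s - 4 * n                 # (p - q)^2
    root = isqrt(disc)
    assert root * root == disc, "inconsistent inputs"
    return (s - root) // 2, (s + root) // 2

# toy example: n = 3233 = 53 * 61, phi(n) = 52 * 60 = 3120
assert factor_from_totient(3233, 3120) == (53, 61)
```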
Problem 4.4

Totient problem Given a natural number n, compute φ(n).

Problem 4.5

RSA totient problem Given a product n = pq of two (distinct odd) primes p and q, compute φ(n).

Note that ℤ[X] is also a UFD. Quite interestingly, it is computationally easy to find a non-trivial factor g of a polynomial f ∈ ℤ[X] (that is, a factor with 0 < deg g < deg f). One might, for example, use the polynomial-time deterministic L³ algorithm named after Lenstra, Lenstra and Lovász (Section 4.8.2).

Square roots modulo an integer n can be computed in probabilistic polynomial time if n is a prime (Algorithm 3.16). If n is composite, the situation is different. If the factorization of n is known, then square roots can be computed modulo each prime divisor of n, lifted modulo the appropriate powers of the prime divisors, and subsequently combined using the Chinese remainder theorem. On the other hand, if the factorization of n is not known, then computing square roots modulo n turns out to be a very difficult task. Recall that the Blum–Blum–Shub algorithm (Algorithm 3.37) exploits this fact to design a cryptographically secure random number generator.
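For the easy case n = pq with p ≡ q ≡ 3 (mod 4), the compute-and-combine procedure can be sketched as follows (illustrative Python; the exponent (p + 1)/4 shortcut replaces the general Tonelli–Shanks computation, and no lifting modulo prime powers is needed here since n is square-free):

```python
def sqrt_mod_pq(a, p, q):
    """All square roots of a modulo n = p*q, given the factorization,
    in the easy case p ≡ q ≡ 3 (mod 4)."""
    n = p * q
    rp = pow(a, (p + 1) // 4, p)        # a square root of a modulo p
    rq = pow(a, (q + 1) // 4, q)        # a square root of a modulo q
    roots = set()
    for sp in (rp, p - rp):             # combine the four sign choices
        for sq in (rq, q - rq):         # via the Chinese remainder theorem
            x = (sp * q * pow(q, -1, p) + sq * p * pow(p, -1, q)) % n
            roots.add(x)
    return sorted(roots)

# the four square roots of 4 modulo 77 = 7 * 11:
assert sqrt_mod_pq(4, 7, 11) == [2, 9, 68, 75]
```

Note that a quadratic residue modulo n = pq has four square roots; producing all of them from one of them is, in fact, equivalent to factoring n.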

Problem 4.6

Modular square root problem (SQRTP) Given a composite integer n and an integer a, compute an integer x, if one exists, such that x^2 ≡ a (mod n).

Let us now look at another class of problems of an apparently distinct flavour. Let G be a finite cyclic group of order n := #G and let g be a generator of G. For the moment, let us assume that G is multiplicatively written. Any element a ∈ G can be written as a = g^x for some integer x, unique modulo n. In this case, x is called the discrete logarithm or the index of a with respect to the base g and is denoted by ind_g a.
Problem 4.7

Discrete logarithm problem (DLP) Given a finite cyclic group G, a generator g of G and an element a ∈ G, compute ind_g a.

If we now remove the restrictions that G is cyclic and/or that g is a generator of G (if G is cyclic), then we arrive at a generalized version of the DLP. Let us continue to assume that G is Abelian and finite. The subgroup H of G generated by g ∈ G is in any case cyclic. If a ∈ H, then the discrete logarithm or index of a with respect to the base g is the integer x, unique modulo m := ord H, such that a = g^x. In this case, we denote such an integer x by ind_g a. On the other hand, if a ∉ H, then we say that the discrete logarithm ind_g a is not defined. Recall from Proposition 2.5 that if G is cyclic and if m is known, then checking whether a belongs to H amounts to computing an exponentiation in G (that is, a ∈ H if and only if a^m is the identity of G). If G is not cyclic (or if m is not known), then it is not easy, in general, to develop such a nice criterion.
Problem 4.8

Generalized discrete logarithm problem (GDLP) Given a finite Abelian group G and elements g, a ∈ G, determine whether a belongs to the subgroup of G generated by g, and if so, compute ind_g a.

Note that the DLP (or the GDLP) need not be an inherently difficult problem. Its difficulty depends on the choice of the group G and also on the representation of the elements of G. For example, if G is the additive (cyclic) group ℤ_n and g is an integer with gcd(g, n) = 1, then for every integer a we have ind_g a ≡ g^(−1)a (mod n), where the modular inverse g^(−1) (mod n) can be computed efficiently by running the extended gcd algorithm (Algorithm 3.8) on g and n. Also note that if G is cyclic and each element a of G is represented as ind_g a for a given generator g of G (see, for example, Section 2.9.3), then computing discrete logarithms in G to the base g is a trivial problem. In that case, it is also trivial to compute discrete logarithms (when they exist) to any other base h (Exercise 4.3).
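The additive-group example above can be made concrete in a one-liner (a small Python sketch; Python's three-argument pow with exponent −1 computes the modular inverse, which internally amounts to an extended-gcd computation):

```python
def additive_dlog(g, a, n):
    """Index of a to the base g in the additive group Z_n: the x with
    x*g ≡ a (mod n), assuming gcd(g, n) = 1."""
    return (pow(g, -1, n) * a) % n      # pow(g, -1, n) is g^(-1) mod n

x = additive_dlog(7, 3, 23)             # solve 7x ≡ 3 (mod 23)
assert (x * 7) % 23 == 3
```

This is why the DLP in ℤ_n is easy: "exponentiation" there is just multiplication, and its inverse is division modulo n.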

On the other hand, there are certain groups G in which discrete logarithms cannot be computed so easily; that is, computing indices in G may demand time not bounded by any polynomial in log n, where n = ord G. However, if the group operation on any two elements of G can be performed in time bounded by a polynomial in log n, then cryptographic protocols can be based on G. Typical candidates for such groups are listed below together with the conventional names for the DLP over such groups.

Table 4.1. The discrete logarithm problem in various groups
Group | Name for the DLP
The (cyclic) multiplicative group F_q^* of a finite field F_q | The finite field discrete logarithm problem, or simply the DLP by an abuse of notation
The (not necessarily cyclic) additive group E(F_q) of points of an elliptic curve E defined over a finite field F_q | The elliptic curve discrete logarithm problem, or the ECDLP
The Jacobian J_C(F_q) of a hyperelliptic curve C defined over a finite field F_q | The hyperelliptic curve discrete logarithm problem, or the HECDLP

Note that if we are interested in computing indices to a base g ∈ G, we may indeed replace, at least theoretically, G by the subgroup H of G generated by g, and may assume, without loss of generality, that G is cyclic. Now, if we know an isomorphism G → ℤ_n, computing discrete logarithms in G is rather easy (Exercise 4.4). However, computing such an isomorphism is, in general, not an easy task and may demand exponential time and/or storage.

Another problem that is widely believed to be computationally equivalent to the DLP (at least for the groups mentioned in the above table) is called the Diffie–Hellman problem (DHP). Like the DLP, the DHP is presumably difficult to solve in the groups F_q^*, E(F_q) and J_C(F_q), and one may use the specific names DHP, ECDHP and HECDHP to designate this problem applied to these specific groups.

Problem 4.9

Diffie–Hellman problem (DHP) Let G be a multiplicative group and let g ∈ G. Given g^x and g^y for some (unknown) integers x and y, compute g^(xy).

Clearly, if a solution of the DLP is given, one may compute y = ind_g(g^y) and, subsequently, g^(xy) = (g^x)^y. That is, the DHP is no harder than the DLP. A proof of the validity or otherwise of the converse relation between these two problems is not known. It is also widely believed that the DLP is computationally equivalent to the IFP. A complete proof of this equivalence is not known, though certain partial results are available in the literature.

There are some other difficult problems on which cryptographic systems can be built. Problem 4.10 deserves specific mention in this regard.

Problem 4.10

Subset sum problem (SSP) Given a set A := {a_1, . . . , a_n} of natural numbers and a natural number s, find out if there exist x_1, . . . , x_n ∈ {0, 1} such that x_1 a_1 + · · · + x_n a_n = s, that is, if there is a subset B of A whose elements sum to s. The integers a_1, . . . , a_n are called the weights for the SSP.

The knapsack problem is a related combinatorial optimization problem. In view of this, the set {a_1, . . . , a_n} is often called a knapsack set, and the SSP is, by an abuse of notation, also referred to as the knapsack problem.
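For small n, the SSP can of course be solved by exhaustive search over all 2^n subsets, which also makes the exponential nature of the naive approach plain (illustrative Python; the weights in the example are arbitrary toy values):

```python
from itertools import combinations

def subset_sum(weights, s):
    """Exhaustive search for the SSP: a subset of `weights` summing
    to s, or None.  Visits up to 2^n subsets, so only for tiny n."""
    for r in range(len(weights) + 1):
        for combo in combinations(weights, r):
            if sum(combo) == s:
                return list(combo)
    return None

assert sum(subset_sum([3, 34, 4, 12, 5, 2], 9)) == 9   # e.g. {4, 5}
assert subset_sum([3, 34, 4], 100) is None
```

Cryptographic knapsack schemes use n large enough (say, a few hundred weights) that this search is hopeless; the attacks mentioned below instead exploit special structure in the weights.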

Some of the early cryptographic systems based on the SSP succumbed to efficient (even polynomial-time) cryptanalytic attacks. However, some schemes proposed in recent years seem to be resistant to such attacks, or, in other words, no good attacks on them are yet known. As a result, it is important to study the SSP in some detail.

The SSP is often mapped to problems on lattices. Let v_1, . . . , v_n be linearly independent vectors in ℝ^n. Consider the set of integer linear combinations of these vectors:

L := {c_1 v_1 + · · · + c_n v_n | c_1, . . . , c_n ∈ ℤ}.

L is called the lattice generated by v_1, . . . , v_n.

Problem 4.11

Shortest vector problem (SVP) Find a non-zero vector v ∈ L whose length ‖v‖ is smallest.

Problem 4.12

Closest vector problem (CVP) Given a vector w ∈ ℝ^n, find a vector v ∈ L such that the length ‖v − w‖ is smallest over all choices of v ∈ L.

For some other difficult computational problems and their applications to cryptography, we refer the reader to the references suggested at the end of this chapter and of Chapter 5.

Exercise Set 4.2

4.1
  1. Let n ≥ 2 be a square-free integer (that is, a product of pairwise distinct primes) and let a ∈ ℕ. Show that the exponentiation map ℤ_n → ℤ_n, x ↦ x^a, is bijective if and only if gcd(a, φ(n)) = 1. [H]

  2. Show that if n ∈ ℕ is not square-free, then for no integer a ≥ 2 is the exponentiation map ℤ_n → ℤ_n, x ↦ x^a, bijective. [H]

4.2 Show that the following problems are polynomial-time reducible to the IFP.
  1. RSA key inversion problem (RSAKIP) Let n = pq be a product of two (distinct odd) primes p and q. Given e ∈ ℕ with gcd(e, φ(n)) = 1, compute an integer d such that ed ≡ 1 (mod φ(n)).

  2. RSA problem (RSAP) Let n and e be as in Part (a). Given c ∈ ℤ_n, compute x ∈ ℤ_n such that c ≡ x^e (mod n). (By Exercise 4.1, such an x exists and is unique.)

  3. Quadratic residuosity problem (QRP) Given an odd integer n > 1 and an integer a with gcd(a, n) = 1, check if a is a quadratic residue modulo n. (Note that if n is a prime, then this problem reduces to the computation of the Legendre symbol (a/n). If, on the other hand, n is composite and the Jacobi symbol (a/n) equals 1, one cannot conclude that a is a quadratic residue modulo n.)

4.3 Let G be a finite cyclic group of order n and let g, g′ be two arbitrary generators of G.
  1. Show that ind_g g′ is invertible modulo n and that for every a ∈ G we have ind_(g′) a ≡ (ind_g a)(ind_g g′)^(−1) (mod n).

  2. Let h ∈ G, m := ord(h) and y := ind_g h. Show that m = n/gcd(y, n), that y/gcd(y, n) is invertible modulo m, and that for an arbitrary element a ∈ G the index ind_h a exists if and only if gcd(y, n) | ind_g a, in which case we have

     ind_h a ≡ (ind_g a / gcd(y, n))(y / gcd(y, n))^(−1) (mod m).

4.4 Let G be a finite cyclic multiplicatively written group of order n. An algorithm on G is said to be polynomial-time if it runs in time bounded above by a polynomial function of log n. Assume that the product of any two elements of G can be computed in polynomial time. Recall from Exercise 2.47 that G ≅ ℤ_n. Show that the computation of an isomorphism G → ℤ_n is polynomial-time equivalent to computing discrete logarithms in G. (That is, given a two-way black box that evaluates such an isomorphism and its inverse in polynomial time, discrete logarithms in G can be computed in polynomial time. Conversely, if discrete logarithms with respect to a primitive element can be computed in polynomial time, then such a black box can be realized.)
4.5 Let p be an (odd) prime and let g be a primitive root modulo p. Show that a ∈ ℤ_p^* is a quadratic residue modulo p if and only if the index ind_g a is even. Hence, conclude that there is a polynomial-time (in log p) algorithm that computes the least significant bit of ind_g a, given any a ∈ ℤ_p^*. More generally, let p − 1 = 2^r s, where r ∈ ℕ and s is odd. Show that there exists a polynomial-time algorithm that computes the r least significant bits of ind_g a, given any a ∈ ℤ_p^*. (This exercise shows that the DLP has a polynomial-time solution for Fermat primes F_n := 2^(2^n) + 1. Note that F_n is prime for n = 0, 1, 2, 3, 4. No other Fermat primes are known.)

4.3. The Integer Factorization Problem

The integer factorization problem (IFP) (Problems 4.1, 4.2 and 4.3) is one of the most easily stated and yet hopelessly difficult computational problems, one that has attracted researchers’ attention for ages and most notably in the age of electronic computers. A huge number of algorithms, varying widely in basic strategy, mathematical sophistication and implementation intricacy, have been suggested, and, in spite of these, factoring a general integer of only 1000 bits seems to be an impossible task today, even using the fastest computers on earth.

It is important to note here that even proving rigorous bounds on the running times of the integer-factoring algorithms is quite often a very difficult task. In many cases, we have to be satisfied with clever heuristic bounds based on one or more reasonable but unprovable assumptions.

This section highlights human achievements in the battle against the IFP. Before going into the details of this account, we want to mention some relevant points. Throughout this section we assume that we want to factor a (positive) integer n. Since such an integer can be represented by ⌈lg(n + 1)⌉ bits, the input size is taken to be lg n (or ln n, or log n). Most modern factorization algorithms take time given by the following subexponential expression in ln n:

L(n, α, c) := exp((c + o(1))(ln n)^α (ln ln n)^(1−α)),

where 0 < α < 1 and c > 0 are constants. As described in Section 3.2, the smaller the value of α, the closer the expression L(n, α, c) is to a polynomial expression (in ln n). If n is understood from the context, we write L(α, c) in place of L(n, α, c). Although the current best-known algorithms correspond to α = 1/3, the algorithms with α = 1/2 are also quite interesting. In this case, we use the shorter notation L[c] := L(1/2, c).
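To get a feel for these growth rates, one can evaluate L(n, α, c) numerically (a small Python sketch that ignores the o(1) term; the sample modulus size is an arbitrary choice):

```python
from math import exp, log

def L(n, alpha, c):
    """The subexponential expression L(n, alpha, c), with o(1) dropped."""
    ln_n = log(n)
    return exp(c * ln_n ** alpha * log(ln_n) ** (1 - alpha))

n = 2 ** 512                 # a sample (hypothetical) modulus size
# alpha = 0 gives a polynomial in ln n, alpha = 1 gives n^c;
# intermediate values of alpha interpolate between the two extremes:
assert L(n, 0, 2) < L(n, 0.5, 1) < L(n, 1, 1)
```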

Henceforth we will use, without explicit mention, the notation q_1 := 2, q_2 := 3, q_3 := 5, . . . to denote the sequence of primes. The concept of q_t-smoothness (for some t ∈ ℕ) will often be referred to as B-smoothness, where B = {q_1, . . . , q_t}. Recall from Theorem 2.21 that smaller integers have a higher probability of being B-smooth for a given B. This observation plays an important role in the design of integer-factoring algorithms. The following special case of Theorem 2.21 is often useful.

Corollary 4.1.

Let n ∈ ℕ, x = O(n^α) and y = L[β] = L(n, 1/2, β). Then the probability that a random positive integer ≤ x is y-smooth is, asymptotically, L[−α/(2β)].

Before any attempt at factoring n is made, it is worthwhile to check n for primality. Since probabilistic primality tests (like Algorithm 3.13) are quite efficient, we should first run one such test to make sure that n is really composite. Henceforth, we will assume that n is known to be composite.

4.3.1. Older Algorithms

“Factoring in the dark ages” (a phrase attributed to Hendrik Lenstra) used fully exponential algorithms, some of which we discuss now. Though the worst-case performances of these algorithms are quite poor, there are many situations in which they might factor even a large integer quite fast. It is, therefore, worthwhile to spend some time on these algorithms.

Trial division

A composite integer n admits a factor ≤ √n that can be found by trial divisions of n by the integers ≤ √n. This demands O(√n) trial divisions and is clearly impractical even when n contains only 30 decimal digits. It is also true that n has a prime divisor ≤ √n, so it suffices to carry out trial divisions by primes only. Though this modified strategy saves us many unnecessary divisions, the asymptotic complexity does not reduce much, since by the prime number theorem the number of primes ≤ √n is about 2√n/ln n. In addition, we need to have a list of primes ≤ √n or generate the primes on the fly, neither of which is really practical. A trade-off can be made by noting that an integer m ≥ 30 cannot be prime unless m ≡ 1, 7, 11, 13, 17, 19, 23, 29 (mod 30). This means that we need to perform trial divisions only by integers m congruent to one of these eight values modulo 30, which reduces the number of trial divisions to about 25 per cent of the naive count. Though trial division is not a practical general-purpose algorithm for factoring large integers, we recommend extracting all the small prime factors of n, if any, by dividing n by a predetermined set {q_1, . . . , q_t} of small primes. If n is indeed q_t-smooth, or has all prime factors ≤ q_t except only one, then the trial division method completely factors n quite fast. Even when n is not of this type, trial division might reduce its size, so that other algorithms run somewhat more efficiently.
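The mod-30 trade-off described above can be sketched as follows (illustrative Python; the bound on the trial divisors is an arbitrary cut-off, and the returned cofactor may itself be prime or composite):

```python
def trial_division(n, bound=10**6):
    """Strip the small prime factors of n by trial division.  Candidate
    divisors > 5 are restricted to the residues 1, 7, 11, 13, 17, 19,
    23, 29 modulo 30; the returned cofactor is the unfactored part."""
    factors = []
    for q in (2, 3, 5):
        while n % q == 0:
            factors.append(q)
            n //= q
    gaps = [4, 2, 4, 2, 4, 6, 2, 6]     # gaps between candidates mod 30
    m, i = 7, 0
    while m <= bound and m * m <= n:
        while n % m == 0:
            factors.append(m)
            n //= m
        m += gaps[i]
        i = (i + 1) % 8
    return factors, n

# 96864 = 2^5 * 3 * 1009; the cofactor 1009 survives because the loop
# stops once m^2 exceeds the remaining part (so it is in fact prime):
assert trial_division(96864) == ([2, 2, 2, 2, 2, 3], 1009)
```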

Pollard’s rho method

Pollard’s rho method solves the IFP in expected Õ(n^(1/4)) time and is based on the birthday paradox (Exercise 2.172).

Let p ≤ √n be an (unknown) prime divisor of n and let f : ℤ_n → ℤ_n be a random map. We start with an initial value x_0 ∈ ℤ_n and generate a sequence x_(i+1) = f(x_i), i ≥ 0, of elements of ℤ_n. Let y_i denote the smallest non-negative integer satisfying y_i ≡ x_i (mod p). By the birthday paradox, after about √p iterates x_1, . . . , x_t are generated, we have a high chance that y_i = y_j, that is, x_i ≡ x_j (mod p), for some 1 ≤ i < j ≤ t. This means that p | (x_i − x_j), and computing gcd(x_i − x_j, n) splits n into two non-trivial factors with high probability. The method fails if this gcd is n. For a random n, this incident of having a gcd equal to n has very low probability.

Algorithm 4.1 gives a specific implementation of this method. Computing gcds for all the pairs (xi – xj, n) is a massive investment of time. Instead we store (in the variable ξ) the values x_r, r = 2^t, for t = 0, 1, 2, . . . and compute only gcd(x_{r+s} – x_r, n) for s = 1, . . . , r. Since the sequence yi, i ≥ 0, is ultimately periodic with expected period length τ = O(√p), we eventually reach a t with r = 2^t ≥ max(μ, τ). In that case, the for loop detects a match. Typically, the update function f is taken to be f(x) = x^2 – 1 (mod n), which, though not a random function, behaves like one. Note that the iterates yi, i ≥ 0, may be visualized as being located on the Greek letter ρ as shown in Figure 4.1 (with a tail of the first μ iterates followed by a cycle of length τ). This is how this method derives its name.

Figure 4.1. Iterates in Pollard’s rho method


Algorithm 4.1 takes an expected running time of O~(√p). Since p ≤ √n for the smallest prime divisor p of n, Pollard’s rho method runs in expected time O~(n^(1/4)).

Algorithm 4.1. Pollard’s rho method

Input: A composite integer n.

Output: A non-trivial factor of n.

Steps:

Choose a random element x ∈ Z_n and set ξ := x and r := 1.

while (1) {
   for s = 1, . . . , r {
       x := f(x).
       d := gcd(x – ξ, n).
       if (1 < d < n) { Return d. }
   }
   ξ := x.
   r := 2r.
}
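A sketch of Algorithm 4.1 in Python follows. We use f(x) = x^2 + c (mod n) and restart with a new constant c whenever the gcd turns out to be n — a common practical variant, not the text’s exact prescription:

```python
from math import gcd
import random

def pollard_rho(n):
    """Pollard's rho with the doubling trick of Algorithm 4.1.

    Uses f(x) = x^2 + c (mod n); on the rare event gcd = n we restart
    with a different constant c (a common practical variant)."""
    if n % 2 == 0:
        return 2
    for c in range(1, 20):              # restart with a new f on failure
        f = lambda x: (x * x + c) % n
        x = random.randrange(2, n)
        xi, r = x, 1                    # xi plays the role of ξ
        while True:
            restart = False
            for _ in range(r):
                x = f(x)
                d = gcd(x - xi, n)
                if d == n:              # sequence collapsed modulo n
                    restart = True
                    break
                if d > 1:
                    return d
            if restart:
                break
            xi, r = x, 2 * r            # store x_r for r = 2^t
    return None
```

The doubling of r mirrors the listing above: gcds are computed only against the stored iterate ξ = x_r, so each iterate costs one application of f and one gcd.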

Many modifications of Pollard’s rho method have been proposed in the literature. Perhaps the most notable one is an idea due to R. P. Brent. All these modifications considerably speed up Algorithm 4.1, though leaving the complexity essentially the same, that is, O~(n^(1/4)). We will not describe these modifications in this book.

Pollard’s p – 1 method

Pollard’s p – 1 method is dependent on the prime factors of p – 1 for a prime divisor p of n. Indeed if p – 1 is rather smooth, this method may extract a (non-trivial) factor of n pretty fast, even when p itself is quite large. To start with we extend the definition of smoothness as follows.

Definition 4.1.

Let y ∈ N. An integer x is called y-power-smooth if, whenever a prime power p^e divides x, we have p^e ≤ y. Clearly, a y-power-smooth integer is y-smooth, but not necessarily conversely.

Let p be an (unknown) prime divisor of n. We may assume, without loss of generality, that p ≤ √n. Assume that p – 1 is M-power-smooth. Then (p – 1)|lcm(1, . . . , M) and, therefore, for an integer a with gcd(a, n) = 1 (and hence with gcd(a, p) = 1), we have a^lcm(1,...,M) ≡ 1 (mod p) by Fermat’s little theorem, that is, d := gcd(a^lcm(1,...,M) – 1, n) > 1. If d ≠ n, then d is a non-trivial factor of n. In case we have d = n (a very rare occurrence), we may try with another a or declare failure.

The problem with this method is that p and so M are not known in advance. One may proceed by guessing successively increasing values of M, till the method succeeds. In the worst case, that is, when p is a safe prime, we have M = (p – 1)/2. Since p can be as large as about √n, this algorithm runs in a worst-case time of O~(n^(1/2)). However, if M is quite small, then this algorithm is rather efficient, irrespective of how large p itself is.

In Algorithm 4.2, we give a variant of the p – 1 method, where we supply a predetermined value of the bound M. We also assume that we have at our disposal a precalculated list of all primes q1, . . . , qt ≤ M.

There is a modification of this algorithm known as Stage 2 or the second stage. For this, we choose a second bound M′ larger than M. Assume that p – 1 = rq, where r is M-power-smooth and q is a prime in the range M < q ≤ M′. In this case, Stage 2 computes with high probability a factor of n after doing an expected O(√M′) operations as follows. When Algorithm 4.2 returns “failure” at the last step, it has already computed the value A := a^m (mod n), where m = q1^e1 · · · qt^et, ei = ⌊ln M/ln qi⌋. In this case, A has multiplicative order q modulo p, that is, the subgroup H of Z_p* generated by A has order q. We choose s = O(√M′) random integers l1, . . . , ls in the range 1 ≤ li ≤ M′. By the birthday paradox (Exercise 2.172), we have with high probability A^li ≡ A^lj (mod p) for some i ≠ j. In that case, d := gcd(A^li – A^lj, n) is divisible by p and is a desired factor of n (unless d = n, a case that occurs with a very low probability). In practice, we do not know q and so we determine s and the integers l1, . . . , ls using the bound M′ instead of q.

Algorithm 4.2. Pollard’s p – 1 method

Input: A composite integer n, a bound M and all primes q1, . . . , qt ≤ M.

Output: A non-trivial factor d of n or “failure”.

Steps:

Select a random integer a, 1 < a < n. /* For example, we may take a := 2 */

if ((d := gcd(a, n)) ≠ 1) { Return d. }
for i = 1, . . . , t {
    ei := ⌊ln M/ln qi⌋.
    a := a^(qi^ei) (mod n).
    d := gcd(a – 1, n).
    if (1 < d < n) { Return d. }
    if (d = n) { Return “failure”. }  /* Or repeat the for loop with another a */
    if (d = 1) { Return “failure”. }
}
Return “failure”.
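Algorithm 4.2 may be sketched in Python as follows (the function names are ours; the example modulus in the test below is chosen so that one prime factor p has p – 1 power-smooth to the bound M = 11):

```python
from math import gcd, log

def small_primes_upto(M):
    """Primes q <= M by the sieve of Eratosthenes."""
    flags = [True] * (M + 1)
    flags[0:2] = [False, False]
    for i in range(2, int(M ** 0.5) + 1):
        if flags[i]:
            flags[i * i :: i] = [False] * len(flags[i * i :: i])
    return [i for i, f in enumerate(flags) if f]

def pollard_p_minus_1(n, M, a=2):
    """Stage 1 of Pollard's p-1 method (Algorithm 4.2).

    Returns a non-trivial factor of n, or None on failure."""
    d = gcd(a, n)
    if d != 1:
        return d
    for q in small_primes_upto(M):
        e = int(log(M) / log(q))        # e_i = floor(ln M / ln q_i)
        a = pow(a, q ** e, n)           # a := a^(q^e) (mod n)
        d = gcd(a - 1, n)
        if 1 < d < n:
            return d
        if d == n:
            return None                 # failure; could retry with new a
    return None
```

For example, with n = 2311 · 2003 the factor 2311 is found with M = 11, since 2310 = 2 · 3 · 5 · 7 · 11 is 11-power-smooth while 2002 = 2 · 7 · 11 · 13 is not.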

In another variant of Stage 2, we compute the powers A^q_{t+1}, . . . , A^q_{t′} (mod n), where q_{t+1}, . . . , q_{t′} are all the primes qj satisfying M < qj ≤ M′. If p – 1 = rq is of the desired form, we would find q = qj for some t < j ≤ t′, and then gcd(A^qj – 1, n), if not equal to n, would be a non-trivial factor of n.
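The prime-by-prime variant of Stage 2 can be sketched as follows (self-contained, so Stage 1 is repeated; the helper names are ours):

```python
from math import gcd, log

def p_minus_1_with_stage2(n, M, M2, a=2):
    """Pollard p-1: Stage 1 with bound M, then the prime-by-prime
    Stage 2 over the primes in (M, M2].  Returns a factor or None."""
    primes = [p for p in range(2, M2 + 1)
              if all(p % r for r in range(2, int(p ** 0.5) + 1))]
    # Stage 1: raise a to q^e for every prime q <= M
    A = a % n
    for q in (p for p in primes if p <= M):
        e = int(log(M) / log(q))
        A = pow(A, q ** e, n)
    d = gcd(A - 1, n)
    if 1 < d < n:
        return d
    # Stage 2: try gcd(A^q - 1, n) for each prime M < q <= M2
    for q in (p for p in primes if p > M):
        d = gcd(pow(A, q, n) - 1, n)
        if 1 < d < n:
            return d
    return None
```

As an example, n = 607 · 613 resists Stage 1 with M = 10 (since 606 = 2 · 3 · 101 has the large factor 101), but Stage 2 with M′ = 200 succeeds.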

In practice, one may try one’s luck using this algorithm for some M in the range 10^5 ≤ M ≤ 10^6 (and possibly also the second stage with 10^6 ≤ M′ ≤ 10^8) before attempting a more sophisticated algorithm like the MPQSM, the ECM or the NFSM.

Williams’ p + 1 method

As always, we assume that n is a composite integer and that p is an (unknown) prime divisor of n. Pollard’s p – 1 method uses an element a in the group Z_p* whose multiplicative order divides p – 1. The idea of Williams’ p + 1 method is very similar, that is, it works with an element a, this time in F_{p^2}*, whose multiplicative order is p + 1. If p + 1 is M-power-smooth for a reasonably small bound M, then computing d := gcd(a^lcm(1,...,M) – 1, n) > 1 splits n with high probability.

In order to find an element of order p + 1, we proceed as follows. Let α be an integer such that α^2 – 4 is a quadratic non-residue modulo p. Then the polynomial f(X) = X^2 – αX + 1 is irreducible modulo p and F_{p^2} ≅ F_p[X]/⟨f(X)⟩. Let a, b be the two roots of f in F_{p^2}. Then ab = 1 and a + b = α. Since f(a^p) = 0 (check it!) and since a ∉ F_p, we have a^p = b = a^(–1), that is, a^(p+1) = 1.

Unfortunately, p is not known in advance. Therefore, we represent elements of F_p as integers modulo n and the elements of F_{p^2} as polynomials c0 + c1X with c0, c1 ∈ Z_n. Multiplying two such elements of F_{p^2} is accomplished by multiplying the two polynomials representing these elements modulo the defining polynomial f(X), the coefficient arithmetic being that of Z_n. This gives us a way to do exponentiations in F_{p^2} in order to compute a^m – 1 for a suitable m (for example, m = lcm(1, . . . , M)).

However, the absence of knowledge of p has a graver consequence, namely, it is impossible to decide whether α^2 – 4 is a quadratic non-residue modulo p for a given integer α. The only thing we can do is to try several random values of α. This is justified, because if k random integers α are tried, then the probability that for all of these α the integers α^2 – 4 are quadratic residues modulo p is only 1/2^k.

The code for the p + 1 method is very similar to Algorithm 4.2. We urge the reader to complete the details. Since p^3 – 1 = (p – 1)(p^2 + p + 1), p^4 – 1 = (p^2 – 1)(p^2 + 1) and so on, we can work in higher extensions like F_{p^3}, F_{p^4} to find elements of order p^2 + p + 1, p^2 + 1 and so on, and generalize the p ± 1 methods. However, the integers p^2 + p + 1, p^2 + 1, being large (compared to p ± 1), have smaller chance of being M-smooth (or M-power-smooth) for a given bound M.
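A sketch of the p + 1 method under the representation described above: elements of F_{p^2} are stored as pairs (c0, c1) modulo n, multiplied modulo f(X) = X^2 – αX + 1, and the root a = X is raised to m = lcm(1, . . . , M). The names and the failure handling are ours:

```python
from math import gcd, lcm

def mulmod(u, v, alpha, n):
    """Multiply u and v, each a pair (c0, c1) representing c0 + c1*X
    in Z_n[X]/(X^2 - alpha*X + 1), using the reduction X^2 = alpha*X - 1."""
    c0, c1 = u
    d0, d1 = v
    return ((c0 * d0 - c1 * d1) % n,
            (c0 * d1 + c1 * d0 + alpha * c1 * d1) % n)

def williams_p_plus_1(n, M, alphas=range(3, 200)):
    """Williams' p+1 method: raise the root a = X of X^2 - alpha*X + 1
    to m = lcm(1..M); a^m ≡ 1 (mod p) forces the X-coefficient to
    vanish modulo p, so its gcd with n can reveal a factor."""
    m = lcm(*range(1, M + 1))
    for alpha in alphas:                   # try several alpha values
        a, base = (1, 0), (0, 1)           # the elements 1 and X
        e = m
        while e:                           # square-and-multiply
            if e & 1:
                a = mulmod(a, base, alpha, n)
            base = mulmod(base, base, alpha, n)
            e >>= 1
        d = gcd(a[1], n)
        if 1 < d < n:
            return d
    return None
```

For example, n = 863 · 1009 splits with M = 32, since 863 + 1 = 864 = 2^5 · 3^3 is 32-power-smooth.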

The reader should have recognized why we paid attention to strong primes and safe primes (Definition 3.5, p 199, and Algorithm 3.14, p 200). Let us now concentrate on the recent developments in the IFP arena.

4.3.2. The Quadratic Sieve Method

Carl Pomerance’s quadratic sieve method (QSM) is one of the (reasonably) successful modern methods of factoring integers. Though the number field sieve factoring method is the current champion, there was a time in the recent past when the quadratic sieve method and the elliptic curve method were known to be the fastest algorithms for solving the IFP.

The basic algorithm

We assume that n is a composite integer which is not a perfect square (because it is easy to detect whether n is a perfect square and, if so, we replace n by √n). The basic idea is to arrive at a congruence of the form

Equation 4.1

x^2 ≡ y^2 (mod n)

with x ≢ ±y (mod n). In that case, gcd(xy, n) is a non-trivial factor of n.
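A toy instance of this principle (the numbers are ours):

```python
from math import gcd

# A toy instance of Congruence (4.1): modulo n = 91 we have
# 10^2 = 100 ≡ 9 = 3^2 (mod 91), with 10 ≢ ±3 (mod 91).
n, x, y = 91, 10, 3
assert (x * x - y * y) % n == 0
assert x % n != y % n and x % n != (-y) % n

d = gcd(x - y, n)      # gcd(7, 91) = 7, a non-trivial factor
print(d, n // d)       # prints: 7 13
```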

We start with a factor base B = {q1, . . . , qt} comprising the first t primes and let H := ⌈√n⌉ and J := H^2 – n. Then H and J are each O(√n) and hence for a small integer c the right side of the congruence

(H + c)2J + 2cH + c2 (mod n)

is also O(√n). We try to factor T(c) := J + 2cH + c^2 using trial divisions by elements of B. If the factorization is successful, that is, if T(c) is B-smooth, then we get a relation of the form

Equation 4.2

(H + c)^2 ≡ q1^α1 q2^α2 · · · qt^αt (mod n)

where each αi ≥ 0. (Note that T(c) ≠ 0, since n is assumed not to be a perfect square.) If all αi are even, say, αi = 2βi, then we get the desired Congruence (4.1) with x = q1^β1 · · · qt^βt and y = H + c. But this is rarely the case. So we keep on generating other relations. After sufficiently many relations are available, we combine these together (by multiplication) to get Congruence (4.1) and compute gcd(x – y, n). If this does not give a non-trivial factor, we try to recombine the collected relations in order to get another Congruence (4.1). This is how Pomerance’s QSM works.

In order to find suitable combinations for yielding Congruence (4.1), we employ a method similar to Gaussian elimination. Assume that we have collected r relations of the form

(H + cj)^2 ≡ q1^α1j · · · qt^αtj (mod n), j = 1, . . . , r.

We search for integers β1, . . . , βr ∈ {0, 1} such that the product

Π_{j=1}^{r} (H + cj)^(2βj) ≡ Π_{i=1}^{t} qi^(α_{i1}β1 + · · · + α_{ir}βr) (mod n)

is a desired Congruence (4.1). The left side of this congruence is already a square. In order to make the right side a square too, we have to essentially solve the following system of linear congruences modulo 2:

α11β1 + α12β2 + · · · + α1rβr ≡ 0 (mod 2)
. . .
αt1β1 + αt2β2 + · · · + αtrβr ≡ 0 (mod 2)

This is a system of t equations over F_2 in r unknowns β1, . . . , βr and is expected to have solutions if r is slightly larger than t. Note that only the values of αij modulo 2 are needed for solving the above linear system. This means that we can have a compact representation of the coefficient matrix (αij) by packing 32 of the coefficients as bits per word. Gaussian elimination (over F_2) can be done using bit operations only.
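The packed representation can be sketched in Python, where an arbitrary-precision integer plays the role of a row of bits and XOR is addition over F_2 (the function is our illustration, using an echelon-style elimination):

```python
def find_dependency(rows):
    """Given exponent vectors modulo 2 packed as Python ints (bit i of
    rows[j] is alpha_{ij} mod 2), return a set of row indices whose
    vectors XOR to zero, or None.  XOR of bitmasks is exactly addition
    over F_2, so one machine word carries many coefficients at once."""
    basis = {}                     # pivot bit -> (mask, contributing rows)
    for j, row in enumerate(rows):
        combo = {j}
        while row:
            pivot = row.bit_length() - 1
            if pivot not in basis:
                basis[pivot] = (row, combo)
                break
            brow, bcombo = basis[pivot]
            row ^= brow            # eliminate the pivot bit
            combo ^= bcombo        # symmetric difference of index sets
        else:
            return combo           # row reduced to zero: a dependency
    return None

# Example: the first three rows satisfy r0 XOR r1 XOR r2 = 0.
deps = find_dependency([0b101, 0b011, 0b110, 0b001])
```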

The running time of this method can be derived using Corollary 4.1. Note that the integers T(c) that are tested for B-smoothness are O(n^(1/2)), which corresponds to α = 1/2 in the corollary. We take qt = L[1/2] (so that t ≈ L[1/2]/ln L[1/2] = L[1/2] by the prime number theorem), which corresponds to β = 1/2. Assuming that the integers T(c) behave as random integers of magnitude O(n^(1/2)), the probability that one such T(c) is B-smooth is L[–1/2]. Therefore, if L[1] values of c are tried, we expect to get L[1/2] relations involving the L[1/2] primes q1, . . . , qt. Combining these relations by Gaussian elimination is now expected to produce a non-trivial Congruence (4.1). This gives us a running time of the order of L[3/2] for the relation collection stage. Gaussian elimination on L[1/2] unknowns also takes asymptotically the same time. However, each T(c) can have at most O(log n) distinct prime factors, implying that Relation (4.2) is necessarily sparse. This sparsity can be effectively exploited and the Gaussian elimination can be done essentially in time L[1]. Nevertheless, the entire procedure runs in time L[3/2], a subexponential expression in ln n.

Sieving

In order to reduce the running time from L[3/2] to L[1], we employ what is known as sieving (and from which the algorithm derives its name). Let us fix a priori the sieving interval, that is, the values of c for which T(c) is tested for B-smoothness, to be –M ≤ c ≤ M, where M = L[1]. Let q ∈ B be a small prime (that is, q = qi for some i = 1, . . . , t). We intend to find out the values of c such that q^h|T(c) for small exponents h = 1, 2, . . . . Since T(c) = J + 2cH + c^2 = (c + H)^2 – n, the solvability for c of the condition q^h|T(c) or of q|T(c) is equivalent to the solvability of the congruence (c + H)^2 ≡ n (mod q). If n is a quadratic non-residue modulo q, no c satisfies the above condition. Consequently, the factor base B may comprise only those primes q for which n is a quadratic residue modulo q (instead of all primes ≤ qt). So we assume that q meets this condition. We may also assume that q ∤ n, because it is a good strategy to perform trial divisions of n by all the primes in B before we go for sieving. The sieving process makes use of an array A indexed by c. We initialize the array location A[c] to ln |T(c)| for each c, –M ≤ c ≤ M.

We explain the sieving process only for an odd prime q. The modifications for the case q = 2 are left to the reader as an easy exercise. The congruence x^2 – n ≡ 0 (mod q) has two distinct solutions for x, say, x1 and x1′ := –x1 mod q. These correspond to two solutions for c of (H + c)^2 ≡ n (mod q), namely, c1 ≡ x1 – H (mod q) and c1′ ≡ x1′ – H (mod q). For each value of c in the interval –M ≤ c ≤ M that is congruent either to c1 or c1′ modulo q, we subtract ln q from A[c]. We then lift the solutions x1 and x1′ to the (unique) solutions x2 and x2′ of the congruence x^2 – n ≡ 0 (mod q^2) (Exercise 3.29), compute c2 ≡ x2 – H (mod q^2) and c2′ ≡ x2′ – H (mod q^2), and for each c in the range –M ≤ c ≤ M congruent to c2 or c2′ modulo q^2 subtract ln q from A[c]. We then again lift to obtain the solutions modulo q^3 and proceed as above. We repeat this process of lifting and subtracting ln q from appropriate locations of A until we reach a sufficiently large h for which neither ch nor ch′ corresponds to any value of c in the range –M ≤ c ≤ M. We then choose another q from the factor base and repeat the procedure explained in this paragraph for this q.

After the sieving procedure is carried out for all small primes q in the factor base B, we check for which c, –M ≤ c ≤ M, the array location A[c] is (close to) 0. These are precisely the values of c in the indicated range for which T(c) is B-smooth. For each smooth T(c), we then compute Relation (4.2) using trial division (by primes of B).
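The sieving procedure may be sketched as follows on a toy n. For brevity, the roots modulo q^h are found by brute-force scanning instead of by computing a square root modulo q and lifting it, and negative values of T(c) are handled simply through |T(c)|:

```python
import math

def sieve_smooth(n, M, qmax):
    """Complete sieving for the QSM: interval -M <= c <= M, factor base
    of primes q <= qmax modulo which n is a quadratic residue.  Roots
    modulo q^h are found by brute force here; a real implementation
    computes a square root mod q and lifts it to higher powers."""
    assert math.isqrt(n) ** 2 != n          # n must not be a perfect square
    H = math.isqrt(n) + 1
    T = lambda c: (H + c) ** 2 - n
    A = {c: math.log(abs(T(c))) for c in range(-M, M + 1)}
    primes = [q for q in range(2, qmax + 1)
              if all(q % r for r in range(2, int(q ** 0.5) + 1))]
    base = [q for q in primes
            if any((x * x - n) % q == 0 for x in range(q))]
    maxT = max(abs(T(c)) for c in range(-M, M + 1))
    for q in base:
        qh = q
        while qh <= maxT:                    # every relevant power q^h
            for x in (x for x in range(qh) if (x * x - n) % qh == 0):
                start = (x - H + M) % qh - M # least c >= -M with c ≡ x - H
                for c in range(start, M + 1, qh):
                    A[c] -= math.log(q)
            qh *= q
    # near-zero leftovers are exactly the B-smooth values of |T(c)|
    return sorted(c for c, v in A.items() if v < 1.0), base
```

The returned indices c can then be confirmed and turned into relations by trial division over the factor base, as described above.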

The sieving process replaces trial divisions (of every T(c) by every q) by subtractions (of ln q from appropriate A[c]). This is intuitively the reason why sieving speeds up the relation collection stage. For a more rigorous analysis of the running time, note that in order to get the desired ci and ci′ modulo q^i for each q ∈ B and for each i = 1, . . . , h, we have either to compute a square root modulo q (for i = 1) or to solve a linear congruence (during lifting for i ≥ 2), each of which can be done in polynomial time. Also the bound h on the exponent of q satisfies q^h ≤ max |T(c)|, that is, h = O(log n). Finally, there are L[1/2] primes in B. Therefore, the computation of the ci and ci′ for all q and i takes a total of L[1/2] time.

Now, we count the total number ν of subtractions of different ln q values from all the locations of the array A. The size of A is 2M + 1. For each qi, we need to subtract ln qi from at most 2⌈(2M + 1)/qi^h⌉ locations for the modulus qi^h (for odd qi), and summing this over all the relevant powers h gives at most about 4⌈(2M + 1)/qi⌉ subtractions for qi. Therefore, ν is of the order of (2M + 1)H_Q, where Q is the maximum of all the qi and is L[1/2], and where Hm := 1 + 1/2 + · · · + 1/m, m ≥ 1, denote the harmonic numbers (Exercise 4.6). But Hm = O(ln m), and so ν = O((2M + 1) log n) = L[1], since M = L[1].

The logarithms ln q (as well as the initial array values ln |T(c)|) are irrational numbers and hence need infinite precision for storing. We, however, need to work with only crude approximations of these logarithms, say up to three places after the decimal point. In that case, we cannot take A[c] = 0 as the criterion for selecting smooth values of T(c), because the approximate representation of logarithms leads to truncation (and/or rounding) errors. In practice, this is not a severe problem, because T(c) is not smooth if and only if it has a prime factor at least as large as q_{t+1} (the smallest prime not in B). This implies that at the end of the sieving operation the values of A[c] for smooth T(c) are close to 0, whereas those for non-smooth T(c) are much larger (close to a number at least as large as ln q_{t+1}). Thus we may set the selection criterion for smooth integers as A[c] < 1 or as A[c] < 0.1 ln q_{t+1}. It is also possible to replace floating point subtraction by integer subtraction by doing the arithmetic on 1000 times the logarithm values. To sum up, the ν = L[1] subtractions the sieving procedure does would be only single-precision operations and hence take a total of L[1] time.

As mentioned earlier, Gaussian elimination with sparse equations can also be performed in time L[1]. So Pomerance’s algorithm with sieving takes time L[1].

Incomplete sieving

Numerous modifications over this basic strategy speed up the algorithm considerably. One possibility is to do sieving every time only for h = 1 and ignore all higher powers of q. That is, for every q we check which of the integers T(c) are divisible by q and then subtract ln q from the corresponding locations A[c] of the array. If some T(c) is divisible by a higher power of q, this strategy fails to subtract ln q the required number of times. As a result, this T(c), even if smooth, may fail to pass the smoothness criterion. This problem can be overcome by increasing the cut-off from 1 (or 0.1 ln q_{t+1}) to a value ξ ln qt for some ξ ≥ 1. But then some non-smooth T(c) will pass the selection criterion in addition to some smooth ones that could not otherwise be detected. This is reasonable, because the non-smooth ones can later be filtered out from the smooth ones, and one might even use trial divisions to do so. Experimentation shows that values of ξ ≤ 2.5 work quite well in practice.

The reason why this strategy performs well is as follows. If q is small, for example q = 2, we should subtract only 0.693 from A[c] for every power of 2 dividing T(c). On the other hand, if q is much larger, say q = 1,299,709 (the 10^5-th prime), then ln q ≈ 14.078 is large. But T(c) would not, in general, be divisible by a high power of this q. This modification, therefore, leads to a situation where the probability that a smooth T(c) is actually detected as smooth is quite high. A few relations would still be missed out even with the modified selection criterion, but that is more than compensated by the speed-up gained by the method. Henceforth, we will call this modified strategy incomplete sieving and the original strategy (of considering all powers of q) complete sieving.

Large prime variation

Another trick known as large prime variation also tends to give more usable relations than are available from the original (complete or incomplete) sieving. In this context, we call a prime q′ large if q′ ∉ B. A value of T(c) is often expected to be B-smooth except for a single large prime factor:

Equation 4.3

T(c) = q1^α1 q2^α2 · · · qt^αt q′

with q′ ∉ B. Such a value of T(c) can be easily detected. For example, incomplete sieving with the relaxed selection criterion is expected to give many such relations naturally, whereas for complete sieving, if the left-over of ln |T(c)| in A[c] at the end of the subtraction steps is < 2 ln qt, then this must correspond to a large prime factor < qt^2. Instead of throwing away an apparently unusable Equation (4.3), we may keep track of them. If a large prime q′ is not large enough (that is, not much larger than qt), then it might appear on the right side of Equation (4.3) for more than one value of c, and if that is the case, all these relations taken together become usable for the subsequent Gaussian elimination stage (after including q′ in the factor base). This means that for each large prime occurring more frequently than once, the factor base size increases by 1, whereas the number of relations increases by at least 2. Thus with a little additional effort we enrich the factor base and the relations collected, and this, in turn, increases the probability of finding a useful Congruence (4.1), our ultimate goal. Viewed from another angle, the strategy of large prime variation allows us to start with smaller values of t and/or M and thereby speed up the sieving stage and still end up with a system capable of yielding the desired Congruence (4.1). Note that an increased factor base size leads to a larger system to solve by Gaussian elimination. But this is not a serious problem in practice, because the sieving stage (and not the Gaussian elimination stage) is usually the bottleneck of the running time of the algorithm.
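The bookkeeping for single large primes can be sketched as follows (the relation format is our illustration):

```python
from collections import defaultdict

def combine_large_prime_relations(relations):
    """Each relation is (c, exponents, lp) as in Equation (4.3), where
    `exponents` is the exponent vector over the factor base and `lp` is
    the single large prime (None if T(c) is fully smooth).  Relations
    sharing a large prime q' are paired: in their product q' occurs to
    an even power, so the pair becomes usable for the elimination."""
    usable = [r for r in relations if r[2] is None]
    by_prime = defaultdict(list)
    for r in relations:
        if r[2] is not None:
            by_prime[r[2]].append(r)
    pairs = [(grp[0], grp[1]) for grp in by_prime.values() if len(grp) >= 2]
    return usable, pairs
```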

It is natural that the above discussion on handling one large prime is applicable to situations where a T(c) value has more than one large prime factor, say q′ and q″. Such a T(c) value leads to a usable relation if the large primes involved reappear in other relations. This situation can be detected by a compositeness test on the non-smooth part of T(c). Subsequently, we have to factor the non-smooth part to obtain the two large primes q′ and q″. This is called the two large prime variation. As the size of the integer n to be factored becomes larger, one may go for three and four large prime variations.

We will shortly encounter many other instances of sieving (for solving the IFP and the DLP). Both incomplete sieving and the use of large primes, if carefully applied, help speed up most of these sieving methods much in the same way as they do in connection with the QSM.

The multiple polynomial quadratic sieve

Easy computations (Exercise 4.11) show that the average and maximum of the integers |T(c)| checked for smoothness in the QSM are approximately MH and 2MH respectively. Though these values are theoretically n^(1/2 + o(1)), in practice the factor of M (or 2M) makes the integers |T(c)| somewhat large, leading to a poor yield of B-smooth integers for larger values of |c| in the sieving interval. The multiple-polynomial quadratic sieve method (MPQSM) applies a nice trick to reduce these average and maximum values. In the original QSM, we work with a single polynomial in c, namely,

T(c) = J + 2cH + c^2 = (H + c)^2 – n.

Now, we work with a more general quadratic polynomial

g(c) = U + 2Vc + Wc^2

with W > 0 and V^2 – UW = n. (The original T(c) corresponds to U = J, V = H and W = 1.) Then we have (V + Wc)^2 ≡ W g(c) (mod n), that is, in this case a relation looks like

(V + Wc)^2 ≡ W q1^α1 · · · qt^αt (mod n).

This relation has an additional factor of W that was absent in Relation (4.2). However, if W is chosen to be a prime (possibly a large one), then the Gaussian elimination stage proceeds exactly as in the original method. Indeed in this case W appears in every relation and hence poses no problem. Only the integers g(c) need to be checked for B-smoothness and hence should have small values. The sieving procedure (that is, computing the appropriate locations of A for subtracting ln q, q ∈ B) for the general polynomial g(c) is very much similar to that for T(c). The details are left to the reader as an easy exercise.

Let us now explain how we can choose the parameters U, V, W. To start with, we fix a suitable sieving interval –M′ ≤ c ≤ M′ and then choose W to be a prime close to √(2n)/M′ such that n is a quadratic residue modulo W. Then we compute a square root V of n modulo W (Algorithm 3.16) and finally take U := (V^2 – n)/W. This choice clearly gives V^2 – UW = n and 0 < V < W. (Indeed one may choose 0 < V < W/2, but this is not an important issue.) Now, the maximum value of |g(c)| becomes about M′√(n/2). Thus even for M′ = M, this maximum value is smaller by a factor of 2√2 than the maximum value of |T(c)| in the original QSM. Moreover, we may choose somewhat smaller values of M′ (compared to M) by working with several polynomials corresponding to different choices for the prime W. This is why the MPQSM, despite having the same theoretical running time (L[1]) as the original QSM, runs faster in practice.
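The algebraic identity (V + Wc)^2 – n = W g(c) underlying this choice can be checked directly (the toy parameters are ours, and the square root modulo W is found by brute force instead of Algorithm 3.16):

```python
def mpqs_poly(n, W):
    """Given a prime W with n a quadratic residue modulo W, build the
    MPQSM polynomial g(c) = U + 2Vc + Wc^2 with V^2 - UW = n."""
    V = next(x for x in range(W) if (x * x - n) % W == 0)
    U = (V * V - n) // W           # exact: V^2 ≡ n (mod W)
    return U, V

n, W = 8051, 103                   # toy example; n is a QR modulo 103
U, V = mpqs_poly(n, W)
# the identity behind the relation: (V + W*c)^2 - n = W * g(c)
for c in range(-5, 6):
    g = U + 2 * V * c + W * c * c
    assert (V + W * c) ** 2 - n == W * g
```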

Parallelization

The QSM is highly parallelizable. More specifically, different processors can handle pairwise disjoint subsets of B during the sieving process. That is, each processor P maintains a local array A_P indexed by c, –M ≤ c ≤ M. The (local) sieving process at P starts with initializing all the locations A_P[c] to 0. For each prime q in the subset B_P of the factor base B assigned to P, one adds ln q to appropriate locations (an appropriate number of times). After all these processors finish local sieving, a central processor computes, for each c in the sieving interval, the value ln |T(c)| – Σ_P A_P[c] (where the sum extends over all processors P which have done local sieving), based on which T(c) is recognized as smooth or not. For the multiple-polynomial variant of the QSM, different processors might handle different polynomials and/or different subsets of B.

TWINKLE: Shamir’s factoring device

Adi Shamir has proposed the complete design of a (hardware) device, TWINKLE (The Weizmann INstitute Key Location Engine), that can perform the sieving stage of the QSM a hundred to a thousand times faster than software implementations on the usual PCs available nowadays. This speed-up is obtained by using a high clock speed (10 GHz) and opto-electronic technology for detecting smooth integers. Each TWINKLE, if mass produced, has an estimated cost of US $5,000.

The working of TWINKLE is described in Figure 4.2. It uses an opaque cylinder of a height of about 10 inches and a diameter of about 6 inches. At the bottom of the cylinder is an array of LEDs,[1] each LED representing a prime in the factor base. The i-th LED (corresponding to the i-th prime qi) emits light of intensity proportional to log qi. The device is clocked and the i-th LED emits light only during the clock cycles c for which qi|T(c). The light emitted by all the active LEDs at a given clock cycle is focused by a lens and a photo-detector senses the total emitted light. If this total light exceeds a certain threshold, the corresponding clock cycle (that is, the time c) is reported to a PC attached to TWINKLE. The PC then analyses the particular T(c) for smoothness over {q1, . . . , qt} by trial division.

[1] An LED (light emitting diode) is an electronic device that emits light, when current passes through it. A GaAs(Gallium arsenide)-based LED emits (infra-red) light of wavelength ~870 nano-meters. In the operational range of an LED, the intensity of emitted light is roughly proportional to the current passing through the LED.

Figure 4.2. Working of TWINKLE


Thus, TWINKLE implements incomplete sieving by opto-electronic means. The major difference between TWINKLE’s sieving and software sieving is that in the latter we used an array of times (the c values) and the iteration went over the set of small primes. In TWINKLE, we use an array of small primes and allow time to iterate over the different values of c in the sieving interval –M ≤ c ≤ M. An electronic circuit in TWINKLE computes for each LED the cycles c at which that LED is expected to emanate light. That is to say that the i-th LED emits light only in the clock cycles c congruent modulo qi to either of the two solutions c1 and c1′ of T(c) ≡ 0 (mod qi). Shamir’s original design uses two LEDs for each prime qi, one corresponding to c1, the other to c1′. In that case, each LED emits light at regularly spaced clock cycles and this simplifies the electronic circuitry (at the cost of having twice the number of LEDs).

Another difference of TWINKLE from software sieving is that here we add the log qi values (to zero) instead of subtracting them from log |T(c)|. By Exercise 4.11, the values |T(c)| typically vary by small constant factors only. Taking logs reduces this variation further and, therefore, comparing the sum of the active log qi values for a given c with a fixed predefined threshold (say log MH) independent of c is a neat way of bypassing the computation of all log |T(c)|, –M ≤ c ≤ M. (This strategy can also be used for software sieving.)

The reasons, why TWINKLE speeds up the sieving procedure over software implementations in conventional PCs, are the following:

  1. Silicon-based PC chips at present can withstand clock frequencies on the order of 1 GHz. On the contrary a GaAs-based wafer containing the LED array can be clocked faster than 10 GHz.

  2. There is no need to initialize the array (to log |T(c)| or zero). Similarly at the end, there is no need to compare the final values in all these array locations with a threshold.

  3. The addition of all the log qi values effective at a given c is done instantly by analog optical means. We do not require an explicit electronic adder.

Shamir [269] reports the full details of a VLSI[2] design of TWINKLE.

[2] very large-scale integration

*4.3.3. Factorization Using Elliptic Curves

H. W. Lenstra’s elliptic curve method (ECM) is another modern algorithm to solve the IFP and runs in expected time L(p, 1/2, √2), where p is the smallest prime factor of n (the integer to be factored). Since p can be as large as about √n, this running time is, in the worst case, L[1] = L(n, 1/2, 1), that is, the same as that of the QSM. However, if p is small (that is, if p = O(n^α) for some α < 1/2), then the ECM is expected to outperform the QSM, since the working of the QSM is incapable of exploiting smaller values of p.

As before, let n be a composite natural number having no small prime divisors and let p be the smallest prime divisor of n. For denoting subexponential expressions in ln p, we use the symbol L_p[c] := L(p, 1/2, c), whereas the unsubscripted symbol L[c] stands for L(n, 1/2, c). We work with random elliptic curves

E : Y^2 = X^3 + aX + b

modulo n and consider the group E(F_p) of rational points on E modulo p. However, since p is not known a priori, we intend to work modulo n. The canonical surjection Z_n → F_p allows us to identify the points on E modulo n as points on E over F_p. We now define a bound M := L_p[1/√2] and let B = {q1, . . . , qt} be all the primes smaller than or equal to M, so that by the prime number theorem (Theorem 2.20) #B ≈ M/ln M. Of course, p is not known in advance, so that M and B are also not known. We will discuss the choice of M and B later. For the time being, let us assume that we know some approximate value of p, so that M and B can be fixed, at least approximately, at the beginning of the algorithm.

By Hasse’s theorem (Theorem 2.48, p 106), the cardinality ν := #E(F_p) satisfies |ν – (p + 1)| ≤ 2√p, that is, ν = O(p). If we make the heuristic assumption that ν behaves as a random integer of the order O(p), then Corollary 4.1 tells us that ν is B-smooth with probability L_p[–1/√2]. This assumption is certainly not rigorous, but accepting it gives us a way to analyse the running time of the algorithm.

If L_p[1/√2] random curves are tried, then we expect to find one B-smooth value of ν. In this case, a non-trivial factor of n can be computed with high probability as follows. Define ei := ⌊ln n/ln qi⌋ for i = 1, . . . , t, and m := q1^e1 · · · qt^et, where t is the number of primes in B. If ν is B-smooth, then ν|m and, therefore, for any point P ∈ E(F_p) we have mP = O, the point at infinity. Computation of mP involves computation of many sums P1 + P2 of points P1 := (h1, k1) and P2 := (h2, k2). At some point of time, we would certainly compute a sum P1 + P2 = O, that is, P1 = –P2, that is, h1 ≡ h2 (mod p) and k1 ≡ –k2 (mod p). Since p was unknown, we worked modulo n, that is, the values of h1, h2, k1 and k2 are known modulo n. Let d := gcd(h1 – h2, n). Then p|d and if d ≠ n (the case d = n has a very small probability!), we have the non-trivial factor d of n. The computation of the coordinates of P1 + P2 (assuming P1 ≠ P2) demands computing the inverse of h1 – h2 modulo n (Section 2.11.2). However, if d = gcd(h1 – h2, n) ≠ 1, then this inverse does not exist, so the computation of P1 + P2 fails, and we have a non-trivial factor of n. If ν is B-smooth, then the computation of mP is bound to fail. The basic steps of the ECM are then as shown in Algorithm 4.3.

Algorithm 4.3. Elliptic curve method (ECM)

Input: A composite integer n (with no small prime factors).

Output: A non-trivial divisor d of n.

Steps:

while (1) {
   Select a random curve E : Y2 = X3 + aX + b modulo n.
   Choose a point P on E modulo n.
   Try to compute mP.   /* where m is as defined in the text */
   if (the computation of mP fails) {
       /* We have found a divisor d > 1 of n */
       if (d ≠ n) { Return d. }
   }
}
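Algorithm 4.3 may be sketched in Python as follows. The failed inversion is surfaced as an exception carrying the divisor; the curve parameters follow the practical choices discussed in the text (b = 1, P = (0, 1), successive values of a), and the names are ours:

```python
from math import gcd, lcm

class FactorFound(Exception):
    def __init__(self, d):
        self.d = d

def inv_mod(x, n):
    """Inverse of x modulo n; a failure reveals a divisor of n."""
    g = gcd(x, n)
    if g != 1:
        raise FactorFound(g)
    return pow(x, -1, n)

def ec_add(P, Q, a, n):
    """Affine addition on Y^2 = X^3 + aX + b modulo n (None = infinity)."""
    if P is None: return Q
    if Q is None: return P
    x1, y1 = P; x2, y2 = Q
    if (x1 - x2) % n == 0:
        if (y1 + y2) % n == 0:
            return None                       # P = -Q
        lam = (3 * x1 * x1 + a) * inv_mod(2 * y1, n) % n
    else:
        lam = (y2 - y1) * inv_mod(x2 - x1, n) % n
    x3 = (lam * lam - x1 - x2) % n
    return (x3, (lam * (x1 - x3) - y1) % n)

def ecm(n, M=100):
    """Lenstra's ECM with b = 1 and P = (0, 1), trying a = 1, 2, 3, ..."""
    m = lcm(*range(1, M + 1))
    for a in range(1, 10000):                 # successive curves Y^2=X^3+aX+1
        try:
            P, Q = (0, 1), None
            e = m
            while e:                          # double-and-add towards m*P
                if e & 1:
                    Q = ec_add(Q, P, a, n)
                P = ec_add(P, P, a, n)
                e >>= 1
        except FactorFound as f:
            if f.d < n:
                return f.d                    # else d = n: try next curve
    return None
```

The computation of mP is deliberately allowed to fail: the non-invertible denominator is precisely where the divisor of n appears.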

Before we derive the running time of the ECM, some comments are in order. A random curve E is chosen by selecting random integers a and b modulo n. It turns out that taking a to be a single-precision integer and b = 1 works quite well in practice. Indeed one can keep on trying the values a = 0, 1, 2, . . . successively. Note that the curve E is an elliptic curve, that is, non-singular, if and only if δ := gcd(n, 4a^3 + 27b^2) = 1. However, having δ > 1 is an extremely rare occurrence and one might skip the computation of δ before starting the trial with a curve. The choice b = 1 is attractive, because in that case we may take the point P = (0, 1). In Section 3.6, we have described a strategy to find a random point on an elliptic curve over a field K. This is based on the assumption that computing square roots in K is easy. The same method can be applied to curves modulo n, but n being composite, it is difficult to compute square roots modulo n. So taking b to be 1 (or the square of a known integer) is indeed a pragmatic decision. After all, we do not need P to be a random point on E.

Recall that we have taken m := q1^e1 ··· qt^et, where ei = ⌊ln n/ln qi⌋. If instead we take ei := ⌊ln M/ln qi⌋ (where M is the bound mentioned earlier), the computation of mP per trial becomes much cheaper, whereas the probability of a successful trial (that is, of a failure while computing mP) does not decrease much. The integer m can be quite large. One, however, need not compute m explicitly, but may proceed as follows: first take Q0 := P and subsequently, for each i = 1, . . . , t, compute Qi := qi^ei Qi–1. One finally gets mP = Qt.

Now comes the analysis of the running time of the ECM. We have fixed the parameter M to be L(p, 1/2, 1/√2), so that B contains L(p, 1/2, 1/√2) small primes. The most expensive part of a trial with a random elliptic curve is the (attempted) computation of the point mP. This involves L(p, 1/2, 1/√2) additions of points. Since an expected number L(p, 1/2, 1/√2) of elliptic curves needs to be tried for finding a non-trivial factor of n, the algorithm performs an expected number L(p, 1/2, √2) of additions of points on curves modulo n. Since each such addition can be done in polynomial time, the announced running time follows.

Note that L(p, 1/2, √2) is the optimal running time of the ECM, and it can be shown to be achieved by taking M = L(p, 1/2, 1/√2). But, in practice, p is not known a priori. Various ad hoc ways may be adopted to get around this difficulty. One possibility is to use the worst-case bound p ≤ √n. For example, for factoring integers of the form n = pq, where p and q are primes of roughly the same size, √n is a good approximation for p. Another strategy is to start with a small value of M and increase M gradually with the number of trials performed. For larger values of M, the probability of a successful trial increases, implying that fewer elliptic curves need to be tried, whereas the time per trial (that is, for the computation of mP) increases. In other words, the total running time of the ECM is apparently not very sensitive to the choice of M.

A second stage can be used for each elliptic curve in order to increase the probability of a trial being successful. A strategy very similar to the second stage of the p – 1 method can be employed. The reader is urged to fill out the details. Employing the second stage leads to reasonable speed-up in practice, though it does not affect the asymptotic running time.

The ECM can be effectively parallelized, since different processors can carry out the trials, that is, computations of mP (together with the second stage) with different sets of (random) elliptic curves.

4.3.4. The Number Field Sieve Method

The number field sieve method (NFSM) is to date the most successful of all integer-factoring algorithms. Under certain heuristic assumptions it achieves a running time of the form L(n, 1/3, c), which is better than the L(n, 1/2, c′) algorithms described so far. The NFSM was first designed for integers of a special form. This variant of the NFSM is called the special NFS method (SNFSM); it was later modified to the general NFS method (GNFSM), which can handle arbitrary integers. The running time of the SNFSM has c = (32/9)^{1/3} ≈ 1.526, whereas that of the GNFSM has c = (64/9)^{1/3} ≈ 1.923. For the sake of simplicity, we describe only the SNFSM in this book (see Cohen [56] and Lenstra and Lenstra [165] for further details).

We choose an integer m and a polynomial f(X) with integer coefficients such that f(m) ≡ 0 (mod n). We assume that f is irreducible in ℤ[X]; otherwise a non-trivial factor of f yields a non-trivial factor of n. Consider the number field K := ℚ(α), where α is a root of f. Let d := deg f be the degree of the number field K. We use the complex embedding of K that sends α to a fixed complex root of f. The special NFS method makes certain simplifying assumptions:

  1. f is monic, so that α is an algebraic integer.

  2. The ring of integers of K is ℤ[α], that is, K is monogenic.

  3. ℤ[α] is a PID.

Consider the ring homomorphism

Φ : ℤ[α] → ℤn determined by Φ(α) = m (mod n).

This is well-defined, since f(m) ≡ 0 (mod n). We choose small coprime (rational) integers a, b and note that Φ(a + bα) ≡ a + bm (mod n). Let B be a predetermined smoothness bound. Assume that for a given pair (a, b), both a + bm and a + bα are B-smooth. For the rational integer a + bm, this means

a + bm = ± ∏ q^{eq}, the product being over the set of all rational primes q ≤ B. On the other hand, smoothness of the algebraic integer a + bα means that the principal ideal ⟨a + bα⟩ of ℤ[α] is a product of prime ideals of prime norms ≤ B; that is, we have a factorization

where the product runs over the set of all prime ideals of ℤ[α] of prime norms ≤ B. By assumption, each such prime ideal is principal. Let us fix a set of generators, one for each of these prime ideals. Further, let us fix a set of generators of the multiplicative group of units of ℤ[α]. The smoothness of a + bα can, therefore, be rephrased as

Equation 4.4


Applying Φ then yields

This is a relation for the SNFSM. After sufficiently many relations are available, Gaussian elimination modulo 2 (as in the case of the QSM) is expected to give us a congruence of the form

x² ≡ y² (mod n),

and gcd(x – y, n) is possibly a non-trivial factor of n. This is the basic strategy of the SNFSM. We now clarify some details.

Selecting the polynomial f(X)

There is no clearly specified way to select the polynomial f for defining the number field K. We require f to have small coefficients. Typically, m is much smaller than n, and one writes the expansion of n in base m as n = b_t m^t + b_{t–1} m^{t–1} + ··· + b_1 m + b_0 with 0 ≤ b_i < m. Taking f(X) = b_t X^t + b_{t–1} X^{t–1} + ··· + b_1 X + b_0 is often suggested.
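The base-m construction can be sketched as follows (the function name is ours). Since f is built from the base-m digits of n, f(m) = n, and hence f(m) ≡ 0 (mod n) as required.

```python
def base_m_coeffs(n, m):
    """Digits of n in base m: n = b_t m^t + ... + b_1 m + b_0 with
    0 <= b_i < m.  Using these digits as the coefficients of f
    guarantees f(m) = n, hence f(m) ≡ 0 (mod n)."""
    coeffs = []
    while n:
        n, b = divmod(n, m)
        coeffs.append(b)                  # b_0 first, b_t last
    return coeffs

# A toy example; real NFS parameters are far larger.
m = 2**13
coeffs = base_m_coeffs(2**64 + 1, m)
assert sum(b * m**i for i, b in enumerate(coeffs)) == 2**64 + 1
```

Here deg f = 4, since (2^13)^4 ≤ 2^64 + 1 < (2^13)^5.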

For integers n of certain special forms, we have natural choices for f. The seminal paper on the NFSM by Lenstra et al. [167] assumes that n = r^e – s for a small integer r > 1 and a non-zero integer s with small absolute value. In this case, one first chooses a small extension degree d and sets m := r^⌈e/d⌉ and f(X) := X^d – s·r^{d⌈e/d⌉–e}. Typically, d = 5 works quite well in practice. Lenstra et al. report the implementation of the SNFSM for factoring n = 3^239 – 1. The parameters chosen are d = 5, m = 3^48 and f(X) = X^5 – 3. In this case, the ring of integers of K is ℤ[α], and it is a PID.
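The parameter choice for n = 3^239 – 1 can be checked directly: f(m) = (3^48)^5 – 3 = 3^240 – 3 = 3(3^239 – 1) ≡ 0 (mod n).

```python
# SNFS parameters of Lenstra et al. for n = 3^239 - 1 with d = 5:
# m = 3^48 and f(X) = X^5 - 3, and indeed f(m) ≡ 0 (mod n).
n = 3**239 - 1
m = 3**48
assert (m**5 - 3) % n == 0
```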

Construction of the prime ideals of small norm

Take a small rational prime p ≤ B. From Section 2.13, it follows that if f(X) ≡ f_1(X)^{d_1} ··· f_r(X)^{d_r} (mod p) is the factorization of the canonical image of f(X) modulo p (the f_i distinct and irreducible modulo p), then the ideals ⟨p, f_i(α)⟩, i = 1, . . . , r, are all the primes lying over p. We have also seen that the norm of ⟨p, f_i(α)⟩ is p^{deg f_i}, which is prime if and only if deg f_i = 1, that is, f_i(X) = X – cp for some cp ∈ {0, 1, . . . , p – 1}. Thus, each root cp of f(X) modulo p corresponds to a prime ideal of ℤ[α] of prime norm p.

To sum up, a prime ideal in ℤ[α] of prime norm is specified by a pair (p, cp) of values, where p ≤ B is a rational prime and cp is a root of f modulo p. All such ideals can be precomputed by finding the roots of the defining polynomial f(X) modulo the small primes p ≤ B. One can use the root-finding algorithms of Exercise 3.29.
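A minimal sketch of this precomputation (names are ours; brute-force root finding stands in for the algorithms of Exercise 3.29, which is adequate for the small primes involved):

```python
def roots_mod_p(coeffs, p):
    """All roots of f modulo a small prime p by exhaustive search;
    coeffs = [b_0, b_1, ..., b_d] in ascending order."""
    def f_mod(x):
        r = 0
        for b in reversed(coeffs):         # Horner evaluation modulo p
            r = (r * x + b) % p
        return r
    return [c for c in range(p) if f_mod(c) == 0]

def prime_ideal_pairs(coeffs, B):
    """All pairs (p, c_p) with p <= B prime and f(c_p) ≡ 0 (mod p);
    each pair labels a prime ideal of prime norm p."""
    primes = [p for p in range(2, B + 1) if all(p % q for q in range(2, p))]
    return [(p, c) for p in primes for c in roots_mod_p(coeffs, p)]

# f(X) = X^5 - 3 from the example above:
pairs = prime_ideal_pairs([-3, 0, 0, 0, 0, 1], 20)
```

For f(X) = X^5 – 3 the list contains, for instance, (3, 0) and (5, 3), since 0^5 ≡ 3 (mod 3) and 3^5 = 243 ≡ 3 (mod 5).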

Construction of the generators gp,cp

Constructing a set of generators of the ideals of small prime norm is a costly operation. We have just seen that each such prime ideal of ℤ[α] corresponds to a pair (p, cp) and is a principal ideal by assumption. A generator of such an ideal is an element gp,cp = h0 + h1α + ··· + h_{d–1}α^{d–1}, with integer coefficients hi, satisfying N(gp,cp) = ±p. Algorithm 4.4 (quoted from Lenstra et al. [167]) computes the generators gp,cp for all relevant pairs (p, cp). The first for loop exhaustively searches over all small polynomials h(α) in order to locate, for each (p, cp), an element of norm kp with |k| as small as possible. If the smallest k (stored in ap,cp) is ±1, the stored element is already a generator gp,cp of the ideal; else some additional adjustments need to be performed.

Algorithm 4.4. Construction of generators of ideals for the SNFSM

Choose two suitable positive constants aB and CB (depending on B and K).

Initialize an array ap,cp := aB indexed by the relevant pairs (p, cp).

for each h = h0 + h1α + ··· + h_{d–1}α^{d–1} with |hi| ≤ CB,
     N(h) = kp for a prime p ≤ B and an integer k ≠ 0, |k| < min(p, aB) {
    Find cp such that h(cp) ≡ 0 (mod p).    /* Root finding */
    if (|k| < |ap,cp|) {
       /* Store the least k and the corresponding h found so far */
       ap,cp := k. hp,cp := h.
    }
}
for each relevant pair (p, cp) {
    if (ap,cp = ±1) { gp,cp := hp,cp. }    /* The more frequent case */
    else {
       Locate a g ∈ ℤ[α] with N(g) = ap,cp.
       gp,cp := hp,cp/g.
    }
}

Construction of the units

Let K have the signature (r1, r2). Write ρ = r1 + r2 – 1. By Dirichlet’s unit theorem, the group of units of ℤ[α] is generated by an appropriate root u0 of unity and ρ multiplicatively independent[3] elements u1, . . . , uρ of infinite order. Each unit u of ℤ[α] has norm N(u) = ±1. Thus, one may keep on generating elements h0 + h1α + ··· + h_{d–1}α^{d–1}, with small integers hi, of norm ±1, until ρ independent elements are found. Many elements of norm ±1 are available as a by-product during the construction of the generators gp,cp, which involves the computation of norms of many elements in ℤ[α]. For a more general exposition on this topic, see Algorithm 6.5.9 of Cohen [56].

[3] The elements u1, . . . , uρ in a (multiplicatively written) group are called (multiplicatively) independent if u1^{n1} ··· uρ^{nρ}, with integers ni, is the group identity only for n1 = ··· = nρ = 0.

Computing the factorization of a + bα

In order to compute the factorization of Equation (4.4), we first factor the integer N(a + bα) = ±b^d f(–a/b). If the prime factorization of ⟨a + bα⟩ involves the pairwise distinct prime ideals P1, . . . , Pk of ℤ[α] with multiplicities e1, . . . , ek, then by the multiplicative property of norms we obtain |N(a + bα)| = N(P1)^{e1} ··· N(Pk)^{ek}.

Now, let p ≤ B be a small prime. If p ∤ N(a + bα), it is clear that no prime ideal of ℤ[α] of norm p (or a power of p) appears in the factorization of ⟨a + bα⟩. On the other hand, if p | N(a + bα), then some prime ideal lying over p appears in the factorization. The smoothness of a + bα implies that the inertial degree of this ideal is 1, that is, its norm is p itself, that is, there is a cp with f(cp) ≡ 0 (mod p) such that the prime ideal corresponds to the pair (p, cp). In this case, we have a ≡ –cpb (mod p). Assume that another prime ideal of norm p appears in the prime factorization of ⟨a + bα⟩. If this second ideal corresponds to the pair (p, c′p), then a ≡ –c′pb (mod p). Since cp and c′p are distinct modulo p, it follows that p | gcd(a, b), a contradiction, since gcd(a, b) = 1. Thus, a unique ideal of norm p appears in the factorization of ⟨a + bα⟩. Moreover, the multiplicity of this ideal in the factorization of ⟨a + bα⟩ is the same as the multiplicity vp(N(a + bα)) of p in N(a + bα).

Thus, one may attempt to factorize N(a + bα) using trial divisions by primes ≤ B. If the factorization is successful, that is, if N(a + bα) is B-smooth, then for each prime divisor p of N(a + bα) we find out the corresponding ideal (p, cp) and its multiplicity in the factorization of ⟨a + bα⟩, as explained above. Since we know a generator gp,cp of each such ideal, we eventually compute a factorization of a + bα as a unit u of ℤ[α] times a product of the generators gp,cp. What remains is to factor u as a product of the unit generators. We do not discuss this step here, but refer the reader to Lenstra et al. [167].
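The norm computation and smoothness test can be sketched as follows (names are ours). Expanding |b^d f(–a/b)| for monic f with coefficients b_0, . . . , b_d gives the integer expression used in the code.

```python
def abs_norm(a, b, coeffs):
    """|N(a + b*alpha)| = |b^d f(-a/b)| for monic f; expanding the
    fraction gives |sum_i c_i (-a)^i b^(d-i)| with coeffs = [c_0,...,c_d]."""
    d = len(coeffs) - 1
    return abs(sum(c * (-a)**i * b**(d - i) for i, c in enumerate(coeffs)))

def trial_factor(N, B):
    """Trial division by the primes <= B; returns (exponents, cofactor).
    N is B-smooth exactly when the cofactor is 1."""
    exps = {}
    for p in range(2, B + 1):
        while N % p == 0:
            N //= p
            exps[p] = exps.get(p, 0) + 1
    return exps, N
```

With f(X) = X^5 – 3 as before, N(1 + α) has absolute value |f(–1)| = 4 = 2², so a = b = 1 gives a 2-smooth norm.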

Sieving

In the QSM, we check the smoothness of a single integer T(c) per trial, whereas in the NFS method we do so for two integers, namely, a + bm and N(a + bα). However, both these integers are much smaller than T(c), and the probability that they are simultaneously smooth is larger than the probability that T(c) alone is smooth. This accounts for the better asymptotic performance of the NFS method compared with the QSM.

One has to check the smoothness of a + bm and N(a + bα) for each coprime pair (a, b) in a predetermined range. This check can be carried out efficiently using sieves. We have to use two sieves, one for filtering out the non-smooth a + bm values and the other for filtering out the non-smooth a + bα values. We should have gcd(a, b) = 1, but computing gcd(a, b) for all values of a and b is rather costly. We may instead use a third sieve to throw away, for a given b, those values of a for which gcd(a, b) is divisible by primes ≤ B. This still leaves us with some pairs (a, b) for which gcd(a, b) > 1. But this is not a serious problem, since such pairs are small in number and can be discarded later from the list of pairs (a, b) selected by the smoothness test.

We fix b and allow a to vary in the interval –M ≤ a ≤ M for a predetermined bound M. We use an array A indexed by a. Before the first sieve, we initialize the entry Aa to ln |a + mb|. We may set Aa := +∞ for those values of a for which gcd(a, b) is known to be > 1 (where +∞ stands for a suitably large positive value). For each small prime p ≤ B and small exponent h, we compute a′ := –mb (mod p^h) and subtract ln p from Aa for each a, –M ≤ a ≤ M, with a ≡ a′ (mod p^h). Finally, for each value of a for which Aa is not (close to) 0, that is, for which a + mb is not B-smooth, we set Aa := +∞. For the other values of a, we set Aa := ln |N(a + bα)|. One may use incomplete sieving (with a liberal selection criterion) during the first sieve.
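The first (rational) sieve can be sketched as follows; this is a toy, self-contained version (names are ours) that sieves with prime powers and reports the surviving values of a.

```python
import math

def first_sieve(b, m, M, B, tol=0.5):
    """Log-sieve the values a + m*b for -M <= a <= M over the primes
    <= B; returns the values of a whose entry is driven (close) to 0,
    that is, whose a + m*b is B-smooth.  tol absorbs round-off."""
    primes = [p for p in range(2, B + 1) if all(p % q for q in range(2, p))]
    A = [math.log(abs(a + m * b)) if a + m * b != 0 else float("inf")
         for a in range(-M, M + 1)]
    bound = abs(m * b) + M                 # largest value that can occur
    for p in primes:
        ph = p
        while ph <= bound:                 # sieve with p, p^2, p^3, ...
            i0 = (-m * b + M) % ph         # index of least a ≡ -m*b (mod p^h)
            for i in range(i0, 2 * M + 1, ph):
                A[i] -= math.log(p)
            ph *= p
    return [i - M for i in range(2 * M + 1) if A[i] < tol]

# m = 100, b = 1, M = 10, B = 7: pick out the 7-smooth values in 90..110.
smooth_a = first_sieve(1, 100, 10, 7)
```

Here `smooth_a` is `[-10, -4, -2, 0, 5, 8]`, corresponding to the 7-smooth values 90, 96, 98, 100, 105 and 108.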

The second sieve proceeds as follows. We continue to work with the value of b fixed before the first sieve and with the array A available from the first sieve. For each prime ideal (p, cp), we compute a″ := –bcp (mod p) and subtract ln p from each location Aa for which a ≡ a″ (mod p). For those a for which Aa ≤ ξ ln B for some real ξ ≥ 1, say ξ = 2, we try to factorize a + bα over the generators gp,cp and the units. If the attempt is successful, both a + bm and a + bα are smooth. This second sieve is an incomplete one and, therefore, we must use a liberal selection criterion.

The running time of the SNFSM

For deriving the running time of the SNFSM, take d ≤ (3 ln n/(2 ln ln n))^{1/3}, m = L(n, 2/3, (2/3)^{1/3}), B = L(n, 1/3, (2/3)^{2/3}) and M = L(n, 1/3, (2/3)^{2/3}). From the prime number theorem and from the fact that d is small, it follows that the numbers of prime ideals, of their generators and of the unit generators all have the same asymptotic bound as B, as does the number of rational primes ≤ B. We then have L(n, 1/3, (2/3)^{2/3}) unknown quantities on which we have to do Gaussian elimination.

The integers a + mb have absolute values ≤ L(n, 2/3, (2/3)^{1/3}). If the coefficients of f are small, then

|N(a + bα)| = |b^d f(–a/b)| ≤ L(n, 1/3, (2/3)^{2/3})^d = L(n, 2/3, (2/3)^{1/3}).

Under the heuristic assumption that a + mb and N(a + bα) behave as random integers of magnitude L(n, 2/3, (2/3)^{1/3}), the probability that both of these are B-smooth turns out to be L(n, 1/3, –(2/3)^{2/3}), and so trying L(n, 1/3, 2(2/3)^{2/3}) pairs (a, b) is expected to give us L(n, 1/3, (2/3)^{2/3}) relations. The entire sieving process takes time L(n, 1/3, 2(2/3)^{2/3}), whereas solving a sparse system in L(n, 1/3, (2/3)^{2/3}) unknowns can be done in essentially the same time. Thus the running time of the SNFSM is L(n, 1/3, 2(2/3)^{2/3}) = L(n, 1/3, (32/9)^{1/3}).

Exercise Set 4.3

4.6 For m ≥ 1, define the harmonic numbers Hm := 1 + 1/2 + 1/3 + ··· + 1/m. Show that for each m we have ln(m + 1) ≤ Hm ≤ 1 + ln m. [H] Deduce that the sequence Hm, m = 1, 2, . . ., is not convergent. (Note, however, that the sequence Hm – ln m converges to the constant γ = 0.57721566 . . ., known as the Euler constant. It is not known whether γ is rational.)
4.7Let k, c, c′, α be positive constants with α < 1. Prove the following assertions.
  1. .

  2. L(n, α, c)L(n, α, c′) is of the form L(n, α, c + c′).

  3. (ln n)^k L(n, α, c) is again of the form L(n, α, c).

  4. L(n, α, c)·n^k is of the form n^{k+o(1)}.

4.8 Let us assume that an adversary C has the computing power to carry out 10^12 floating point operations (flops) per second. Let A be an algorithm that computes a certain function P(n) using T(n) flops for an input n. We say that it is infeasible for C to compute P(n) using algorithm A if the computation takes ≥ 100 years or, equivalently, if T(n) ≥ 3.1536 × 10^21. Find, for the following expressions of T(n), the smallest values of n that make the computation of P(n) by Algorithm A infeasible: T(n) = (ln n)^3, T(n) = (ln n)^10, T(n) = n, T(n) = n^{1/4}, T(n) = L[2], T(n) = L[1], T(n) = L[0.5], T(n) = L(n, 1/3, 2) and T(n) = L(n, 1/3, 1). (Neglect the o(1) terms in the definitions of L( ) and L[ ].)
4.9 Let n be an odd integer and let r be the total number of distinct (odd) prime divisors of n. Show that for each integer a the congruence x² ≡ a² (mod n) has ≤ 2^r solutions for x modulo n. If gcd(a, n) = 1, show that this congruence has exactly 2^r solutions. [H]
4.10Show that the problems IFP and SQRTP are probabilistic polynomial-time equivalent. [H]
4.11 In this exercise, we use the notations introduced in connection with the quadratic sieve method for factoring integers (Section 4.3.2). We assume that M ≪ H, since H ≈ √n, whereas M = L[1].
  1. Show that J ≤ 2H – 1.

  2. Prove that the average of the integers |T(c)|, –M ≤ c ≤ M, is approximately J + MH, and that the maximum of the same integers is |T(M)| = J + 2MH + M² ≈ J + 2MH.

  3. Prove that the average and the maximum of the integers |T(c)|, 0 ≤ c ≤ 2M, are respectively J + 2MH + M(4M + 1)/3 ≈ J + 2MH and |T(2M)| = J + 4MH + 4M² ≈ J + 4MH.

  4. Conclude that it is better to choose the sieving interval as –M ≤ c ≤ M instead of as 0 ≤ c ≤ 2M.

4.12

Reyneri’s cubic sieve method (CSM) Suppose that we want to factor an odd integer n. Suppose also that we know a triple (x, y, z) of integers satisfying x³ ≡ y²z (mod n) with x³ ≠ y²z (as integers). We assume further that |x|, |y|, |z| are all O(n^ξ) for some ξ, 1/3 < ξ < 1/2.

  1. Show that for integers a, b, c with a + b + c = 0 one has

    (x + ay)(x + by)(x + cy) ≡ y2T(a, b, c) (mod n),

    where T(a, b, c) := z + (ab + ac + bc)x + (abc)y = –b(b + c)(x + cy) + (z – c²x). If x, y, z = O(n^ξ), then T(a, b, c) is O(n^ξ) for small values of a, b, c.

  2. Let α be a suitable positive constant (to be fixed by the analysis). Choose a factor base comprising all primes q1, . . . , qt with t = L[α] together with the integers x + ay, –M ≤ a ≤ M, M = L[α]. The size of the factor base is then L[α].

    If T(a, b, c) with –M ≤ a, b, c ≤ M and a + b + c = 0 is qt-smooth, we get a relation for the CSM. Show that trying out the L[2α] triples (a, b, c) gives us a set of linear congruences of the desired size under the heuristic assumption that the T(a, b, c) values behave as random integers on the order of n^ξ.

  3. Propose a strategy how these linear congruences can be combined (by Gaussian elimination) to get a quadratic congruence of the form u2v2 (mod n).

  4. Design a sieve for checking the smoothness of the expressions T(a, b, c). [H]

  5. Show that the running time of the CSM is . Since ξ < 1/2, the CSM is more efficient than the QSM. For ξ ≈ 1/3, the running time is .

    (Remark: It is not known how we can efficiently obtain a solution of x³ ≡ y²z (mod n) with x³ ≠ y²z and |x|, |y|, |z| = O(n^ξ), ξ being as small as possible. For some particular values of n, say, for n of the form x³ – z with small z, a solution is naturally available.)

4.13Sieve of Eratosthenes Two hundred years before Christ, Eratosthenes proposed a sieve (Algorithm 4.5) for computing all primes between 1 and a positive integer n. Prove the correctness of this algorithm and compute its running time. [H]
Algorithm 4.5. The sieve of Eratosthenes

Initialize to zero an array A indexed 2, . . . , n.
for k = 2, . . . , ⌊√n⌋ {
   if (Ak = 0) { for l = 2, . . . , ⌊n/k⌋ { Alk := 1. } }
}
for k = 2, . . . , n { if (Ak = 0) { Print “k is a prime”. } }
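Algorithm 4.5 translates directly into the following sketch (the function name is ours); the loop on k stops at ⌊√n⌋, since every composite ≤ n has a prime divisor ≤ √n.

```python
def eratosthenes(n):
    """Sieve of Eratosthenes (Algorithm 4.5): A[k] = 1 marks k as
    composite; the surviving indices >= 2 are exactly the primes <= n."""
    A = [0] * (n + 1)
    k = 2
    while k * k <= n:                      # k runs up to floor(sqrt(n))
        if A[k] == 0:
            for l in range(2, n // k + 1): # mark the multiples l*k of k
                A[l * k] = 1
        k += 1
    return [k for k in range(2, n + 1) if A[k] == 0]
```

For example, `eratosthenes(30)` yields `[2, 3, 5, 7, 11, 13, 17, 19, 23, 29]`.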

4.14 This exercise proposes an adaptation of the sieve of Eratosthenes for computing a random prime of a given bit length l. In Section 3.4.2, we have described an algorithm for this computation, which generates random (odd) integers of bit length l and checks the primality of each such integer until a (probable) prime is found. An alternative strategy is to generate a random l-bit odd integer n and check the integers n, n + 2, n + 4, . . . for primality.
  1. Use sieving to design an algorithm that generalizes this second strategy in the sense that it checks for primality only those integers n + r, r = 0, 1, 2, . . . , M (n a random l-bit integer), which are not divisible by any of the first t primes. In practice, the values 100 ≤ t ≤ 10,000 and M = 10l work quite well. For cryptographic sizes, sieving typically speeds up naive prime generation by a factor of 10 to 100.

  2. Generalize the sieve of Part (a) for the computation of safe and strong primes.
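The sieved search of Part 1 can be sketched as follows (names are ours; a Miller–Rabin test stands in for the primality test of Section 3.4.2):

```python
import random

def miller_rabin(n, rounds=20):
    """Probabilistic primality test (stand-in for Section 3.4.2)."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13):
        if n % p == 0:
            return n == p
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        x = pow(random.randrange(2, n - 1), d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False
    return True

def sieved_random_prime(l, t=100, M=None):
    """Sieve the window n, n+1, ..., n+M by the first t primes, then
    run the primality test only on the survivors."""
    M = M if M is not None else 10 * l
    small, k = [], 2
    while len(small) < t:                  # collect the first t primes
        if all(k % q for q in small):
            small.append(k)
        k += 1
    while True:                            # retry if the window has no prime
        n = random.randrange(1 << (l - 1), (1 << l) - M)
        alive = [True] * (M + 1)
        for q in small:
            for r in range((-n) % q, M + 1, q):
                alive[r] = False           # q divides n + r
        for r in range(M + 1):
            if alive[r] and miller_rabin(n + r):
                return n + r
```

The sieve is much cheaper per candidate than an exponentiation-based primality test, which is the source of the speed-up claimed above.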

4.4. The Finite Field Discrete Logarithm Problem

The discrete logarithm problem (DLP) has attracted somewhat less attention from the research community than the IFP. Nonetheless, many algorithms exist to solve the DLP, most of which are direct adaptations of algorithms for solving the IFP. We start with the older algorithms, collectively known as the square-root methods, since the worst-case running time of each of them is of the order of the square root of the group size. The newer family of algorithms based on the index calculus method provides subexponential solutions to the DLP and is described next. For the sake of simplicity, we assume in this section that we want to compute the discrete logarithm indg a of an element a with respect to a primitive element g of the field. We concentrate only on the prime fields (p an odd prime) and the fields of characteristic 2, since non-prime fields of odd characteristic are only rarely used in cryptography.

4.4.1. Square Root Methods

Square-root methods are applicable to any finite (cyclic) group. To avoid repetition, we provide a generic description here. That is, we assume that G is a multiplicatively written cyclic group of order n, generated by an element g, and that a ∈ G. The identity of G is denoted by 1. It is not necessary to assume that G is cyclic or that g is a generator of G. However, these assumptions make the descriptions of the algorithms somewhat easier, and hence we stick to them. The necessary modifications for non-cyclic groups G or non-primitive elements g are rather easy, and the reader is requested to fill in the details. We assume that each element of G can be represented by O(lg n) bits (so that the input size is taken to be lg n) and that multiplications, exponentiations and inverses in G can be computed in time polynomially bounded by this input size.

Shanks’ baby-step–giant-step method

Let us assume that the elements of G can be (totally) ordered in such a way that comparing two elements of G with respect to this order can be done in time polynomial in the input size. For example, a natural order on the non-zero residues modulo p is the relation ≤ on the representatives {1, . . . , p – 1}. Note that k elements of G can be sorted (under the above order) using O(k log k) comparisons.

Let m := ⌈√n⌉. Then d := indg a is uniquely determined by two (non-negative) integers d0, d1 < m with d = d0 + d1m (the base-m representation of d). In Shanks’ baby-step–giant-step (BSGS) method, we compute d0 and d1 as follows. To start with, we compute a list of pairs (d0, g^{d0}) for d0 = 0, 1, . . . , m – 1 and store these pairs in a table sorted with respect to the second coordinate (the baby steps). Now, for each d1 = 0, 1, . . . , m – 1, we compute a(g^{–m})^{d1} (the giant steps) and search whether this element appears as the second coordinate g^{d0} of some entry in the table mentioned above. If so, we have found the desired d0 and d1; otherwise we try the next value of d1. An optimized implementation of this strategy is given as Algorithm 4.6.

The computation of all the elements of T and the sorting of T can be done in time O~(m). If we use a binary search algorithm (Exercise 4.15), then the search for h in T can be performed using O(lg m) comparisons in G. Therefore, the giant steps also take a total running time of O~(m). Since m ≈ √n, the BSGS method runs in time O~(√n). The memory requirement of the BSGS method (that is, of the table T) is O(√n) group elements. Thus, this method becomes impractical even when n contains as few as 30 decimal digits.

Pollard’s rho method

Pollard’s rho method for solving the DLP is similar in spirit to the method of the same name for solving the IFP. Let f be a random map on pairs of residues modulo n, and let us generate a sequence of tuples (ri, si), i = 1, 2, . . ., starting with a random (r1, s1) and subsequently computing (ri+1, si+1) = f(ri, si) for each i = 1, 2, . . . . The elements bi := a^{ri} g^{si} for i = 1, 2, . . . can then be thought of as randomly chosen elements of G. By the birthday paradox (Exercise 2.172), we expect to get a match bi = bj for some i ≠ j after O(√n) of the elements b1, b2, . . . are generated. But then we have a^{ri–rj} = g^{sj–si}, that is, indg a ≡ (ri – rj)^{–1}(sj – si) (mod n), provided that the inverse exists, that is, gcd(ri – rj, n) = 1. The expected running time of this algorithm is O~(√n), the same as that of the BSGS method, but the storage requirement drops to only O(1) elements of G.
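A minimal sketch of the rho method for the non-zero residues modulo a prime p (names are ours; the common three-way partition plays the role of the random map, and Floyd's cycle detection finds the match bi = bj without storing the sequence):

```python
import math
import random

def rho_step(b, r, s, g, a, p, n):
    """One step of the random-looking map; maintains b = a^r * g^s."""
    if b % 3 == 0:
        return b * b % p, 2 * r % n, 2 * s % n      # squaring
    if b % 3 == 1:
        return a * b % p, (r + 1) % n, s            # multiply by a
    return g * b % p, r, (s + 1) % n                # multiply by g

def pollard_rho_dlp(g, a, p, n):
    """ind_g(a) in the group of non-zero residues modulo p, n = ord(g).
    Restarts from a fresh random point when gcd(r - R, n) > 1."""
    while True:
        r, s = random.randrange(n), random.randrange(n)
        b = pow(a, r, p) * pow(g, s, p) % p
        B, R, S = rho_step(b, r, s, g, a, p, n)
        while b != B:                               # tortoise and hare
            b, r, s = rho_step(b, r, s, g, a, p, n)
            B, R, S = rho_step(*rho_step(B, R, S, g, a, p, n), g, a, p, n)
        if math.gcd(r - R, n) == 1:
            return (S - s) * pow((r - R) % n, -1, n) % n
```

For example, with the primitive root g = 2 modulo p = 1019 (so n = 1018), `pollard_rho_dlp(2, 5, 1019, 1018)` returns a d with 2^d ≡ 5 (mod 1019).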

Algorithm 4.6. Shanks’ baby-step–giant-step method

Input: G, g and a as described above.

Output: d = indg a.

Steps:

n := ord G, m := ⌈√n⌉.

/* Baby steps */

Initialize T to an empty table.

Insert the pairs (0, 1) and (1, g) in T.

h := g.
for d0 = 2, . . . , m – 1 {
    h := hg.
    Insert (d0hin T.
}
sort T with respect to the second coordinate.

/* Giant steps */
h := a. l := (g^{–1})^m.
for d1 = 0, . . . , m – 1 {
    if (T contains an entry (d0, h)) { Return d := d0 + d1m. }
    h := hl.
}
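Algorithm 4.6 can be sketched in a few lines (the function name is ours); a hash table replaces the sorted table T and its binary search, which does not change the O~(√n) running time.

```python
import math

def bsgs(g, a, p, n):
    """d = ind_g(a) in the group of non-zero residues modulo p, where
    n = ord(g); a dictionary plays the role of the sorted table T."""
    m = math.isqrt(n - 1) + 1          # m = ceil(sqrt(n))
    T = {}
    h = 1
    for d0 in range(m):                # baby steps: record g^d0 -> d0
        T.setdefault(h, d0)
        h = h * g % p
    l = pow(g, -m, p)                  # l = (g^-1)^m
    h = a % p
    for d1 in range(m):                # giant steps: a * (g^-m)^d1
        if h in T:
            return T[h] + d1 * m
        h = h * l % p
    return None                        # a does not lie in <g>
```

For instance, `bsgs(2, 5, 1019, 1018)` returns the index d of 5 to the base 2 modulo 1019, so that 2^d ≡ 5 (mod 1019).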

The Pohlig–Hellman method

The Pohlig–Hellman (PH) method assumes that the prime factorization n = ord G = p1^{α1} ··· pr^{αr} is known. Since d := indg a is unique modulo n, we can easily compute d using the CRT from a knowledge of d modulo pj^{αj}, j = 1, . . . , r. So assume that p is a prime dividing n and that p^α exactly divides n. Let d ≡ d0 + d1p + ··· + dα–1 p^{α–1} (mod p^α), 0 ≤ di < p, be the p-ary representation of d modulo p^α. The p-ary digits d0, d1, . . . , dα–1 can be successively computed as follows.

Let H be the subgroup of G generated by h := g^{n/p}. We have ord H = p (Exercise 2.44). For the computation of di, 0 ≤ i ≤ α – 1, from the knowledge of d0, . . . , di–1, consider the element

b := (a g^{–(d0 + d1p + ··· + di–1 p^{i–1})})^{n/p^{i+1}} = (g^{n/p^{i+1}})^{d – (d0 + d1p + ··· + di–1 p^{i–1})}.

But ord(g^{n/p^{i+1}}) = p^{i+1}, so that

b = (g^{n/p^{i+1}})^{di p^i} = (g^{n/p})^{di} = h^{di}.

Thus b ∈ H and di = indh b, that is, each di can be obtained by computing a discrete logarithm in the group H of order p (using the BSGS method or the rho method).

From the prime factorization of n, we see that the computations of d modulo pj^{αj} for all j = 1, . . . , r can be done in time O~(√q), q being the largest prime factor of n, since the αj and r are O(log n). Combining the values of d modulo pj^{αj} by the CRT can be done in polynomial time (in log n). In the worst case, q = O(n), and the PH method takes time O~(√n), which is fully exponential in the input size log n. But if q (or, equivalently, each of the prime divisors p1, . . . , pr of n) is small, then the PH method runs quite efficiently. In particular, if q = O((log n)^c) for some (small) constant c, then the PH method computes discrete logarithms in G in polynomial time. This fact has an important bearing on the selection of a group G for cryptographic applications: n = ord G is required to have a suitably large prime divisor, so that the PH method cannot compute discrete logarithms in G in feasible time.
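The digit-by-digit computation and the CRT combination can be sketched as follows (names are ours; for brevity, the small discrete logarithms in H are found by brute force where BSGS or the rho method would be used):

```python
def small_dlog(h, b, q, p):
    """ind_h(b) in the subgroup of order q, by brute force (adequate
    only for small q; use BSGS or the rho method otherwise)."""
    x, e = 1, 0
    while x != b:
        x, e = x * h % p, e + 1
    return e

def pohlig_hellman(g, a, p, n, factors):
    """d = ind_g(a) modulo p, where n = ord(g) and factors lists the
    pairs (q, alpha) with n = prod q^alpha; computes the q-ary digits
    of d mod q^alpha and then combines the residues by the CRT."""
    residues, moduli = [], []
    for q, alpha in factors:
        h = pow(g, n // q, p)              # generates the subgroup of order q
        d_mod = 0                          # d modulo q^i found so far
        for i in range(alpha):
            b = pow(a * pow(g, -d_mod, p) % p, n // q**(i + 1), p)
            d_mod += small_dlog(h, b, q, p) * q**i
        residues.append(d_mod)
        moduli.append(q**alpha)
    d = 0                                  # CRT combination
    for r_i, m_i in zip(residues, moduli):
        M_i = n // m_i
        d = (d + r_i * M_i * pow(M_i, -1, m_i)) % n
    return d
```

With p = 1019, n = 1018 = 2 · 509 and g = 2, `pohlig_hellman(2, 5, 1019, 1018, [(2, 1), (509, 1)])` returns a d with 2^d ≡ 5 (mod 1019); the largest subgroup has order only 509, which is the point of the method.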

4.4.2. The Index Calculus Method

The index calculus method (ICM) is not applicable to all (cyclic) groups. But whenever it applies, it usually leads to the fastest algorithms for solving the DLP. Several variants of the ICM are used for prime finite fields and also for finite fields of characteristic 2. On such a field of size q, they achieve subexponential running times of the order of L(q, 1/2, c) = L[c] or L(q, 1/3, c) for positive constants c. We start with a generic description of the ICM. We assume that g is a primitive element of the field and want to compute d := indg a for a non-zero element a.

To start with, we fix a suitable subset B = {b1, . . . , bt} of the field of small cardinality, so that a reasonably large fraction of the non-zero field elements can be expressed easily as products of elements of B. We call B a factor base. In the ICM, we search for relations of the form

Equation 4.5


for integers α, β, γi and δi. This gives us a linear congruence

Equation 4.6


The ICM proceeds in two[4] stages. In the first stage, we compute di := indg bi for each element bi in the factor base B. For that, we collect relations of the form (4.5) with β = 0. When sufficiently many relations are available, the corresponding system of linear congruences (4.6) is solved modulo q – 1 for the unknowns di. In the second stage, a single relation with gcd(β, q – 1) = 1 is found. Substituting the values of di available from the first stage yields indg a.

[4] Some authors prefer to say that the number of stages in the ICM is actually three, because they decouple the congruence-solving phase from the first stage. This is indeed justified, since implementations by several researchers reveal that for large fields this linear algebra part often demands running time comparable to that needed by the relation-collection part. Our philosophy is to call the entire precomputation work the first stage. Now, although it hardly matters, it is up to the reader which camp she wants to join.

Note that as long as q (and g) are fixed, we do not have to carry out the first stage every time the discrete logarithm of a field element is to be computed. If the values di, i = 1, . . . , t, are stored, then only the second stage needs to be carried out for computing the indices of any number of elements. This is why the first stage of the ICM is often called the precomputation stage.

In order to make the algorithm more concrete, we have to specify:

  1. how to choose a factor base B;

  2. how to find Relation (4.5);

  3. how to solve a linear system of congruences modulo q – 1 (in particular, when the system is sparse).

In the rest of this section, we describe variants of the ICM based on their strategies for selecting the factor base and for collecting relations. We discuss the third issue in Section 4.7.

4.4.3. Algorithms for Prime Fields

Consider a finite field of prime cardinality p. For cryptographic applications, p should be quite large, say, of length around a thousand bits or more, and so naturally p is odd. Elements of the field are canonically represented as integers between (and including) 0 and p – 1. The equality x = y in the field means equality of two integers in the range 0, . . . , p – 1, whereas x ≡ y (mod p) means that the two integers x and y may be different, but their equivalence classes modulo p are the same.

The basic ICM

In the basic version of the ICM, we choose the factor base B to comprise the first t primes q1, . . . , qt, where t = L[ζ]. (The optimal value of ζ is determined below.) In the first stage, we choose random values of α modulo p – 1 and compute g^α. Any integer representing g^α can be considered, but we think of g^α as an integer in {1, . . . , p – 1}. We then try to factorize g^α using trial divisions by elements of the factor base B. If g^α is found to be B-smooth, then we get a desired relation for the first stage, namely, g^α ≡ q1^{γ1} q2^{γ2} ··· qt^{γt} (mod p), that is, α ≡ γ1d1 + γ2d2 + ··· + γtdt (mod p – 1).

If gα is not B-smooth, we try another random α and proceed as above. After sufficiently many relations are available, we solve the resulting system of linear congruences modulo p – 1. This gives us di := indg qi for i = 1, . . . , t.

In the second stage, we again choose random integers α and try to factorize ag^α completely over B. Once the factorization is successful, that is, once we have ag^α ≡ q1^{δ1} ··· qt^{δt} (mod p), we compute d = indg a ≡ –α + δ1d1 + ··· + δtdt (mod p – 1).
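The second stage can be sketched as follows (names are ours; in this toy demonstration, the first-stage output di is obtained by brute force instead of by solving a system of congruences):

```python
import random

def smooth_exponents(x, primes):
    """Exponent vector of x over the factor base, or None if not smooth."""
    exps = [0] * len(primes)
    for i, q in enumerate(primes):
        while x % q == 0:
            x //= q
            exps[i] += 1
    return exps if x == 1 else None

def icm_second_stage(g, a, p, primes, d):
    """Given d[i] = ind_g(q_i) from the first stage, find a smooth
    a*g^alpha and read off ind_g(a) = -alpha + sum(delta_i * d_i)."""
    n = p - 1
    while True:
        alpha = random.randrange(1, n)
        exps = smooth_exponents(a * pow(g, alpha, p) % p, primes)
        if exps is not None:
            return (sum(e * di for e, di in zip(exps, d)) - alpha) % n

def brute_ind(g, x, p):
    """Stand-in for the first stage: ind_g(x) by exhaustive search."""
    e, y = 0, 1
    while y != x:
        y, e = y * g % p, e + 1
    return e

p, g = 1019, 2
primes = [2, 3, 5, 7, 11, 13]
d = [brute_ind(g, q, p) for q in primes]
```

With these stored di, `icm_second_stage(g, 5, p, primes, d)` returns an index of 5 modulo p – 1, and 2 raised to it is 5 modulo 1019.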

In order to optimize the running time, we note that the relation-collection phase of the first stage is usually the bottleneck of the algorithm. If ζ (or, equivalently, t) is chosen too small, then finding B-smooth integers is very difficult. On the other hand, if ζ is too large, then we have to collect too many relations to have a solvable linear system of congruences. More explicitly, since the integers g^α can be regarded as random integers of the order of p, the probability that g^α is B-smooth is L[–1/(2ζ)] (Corollary 4.1). Thus we expect to get each relation after L[1/(2ζ)] random values of α are tried. Since for each α we need to carry out L[ζ] divisions by elements of the factor base B (the exponentiation g^α can be done in polynomial time and hence can be neglected in this analysis), each relation can be found in expected time L[1/(2ζ) + ζ]. Now, in order to solve for di, i = 1, . . . , t, we must have (slightly more than) t = L[ζ] relations. Thus, the relation-collection phase takes a total time of L[1/(2ζ) + 2ζ]. It can be easily checked that 1/(2ζ) + 2ζ is minimized for ζ = 1/2. This gives a running time of L[2] for the relation-collection phase.

Since each g^α is a positive integer less than p, it is evident that it can have at most O(log p) prime divisors. In other words, the congruences collected are necessarily sparse. As we shall see later, such a system can be solved in time O~(t²), that is, in time L[1] for ζ = 1/2.

In the second stage, it is sufficient to have a single relation to compute d = indg a. As explained before, such a relation can be found in expected time L[1/(2ζ) + ζ] = L[3/2] for ζ = 1/2. Thus the total running time of the basic ICM is L[2].

The second stage of the basic ICM is much faster than the first stage. In fact, this is a typical phenomenon associated with most variants of the ICM. Speeding up the first stage is, therefore, our primary concern.

Each step in the search for relations consists of an exponentiation (g^α) modulo p followed by trial divisions by q1, . . . , qt. Now, g^α may be non-smooth, but g^α + kp (as an integer sum) may be smooth for some integer k. Once g^α is computed and found to be non-smooth, one can check the smoothness of g^α + kp for k = ±1, ±2, . . . before another α is tried. Since these integers are obtained by additions (or subtractions) only, which are much faster than exponentiation, this strategy tends to speed up the relation-collection phase. Moreover, information about the divisibility of g^α + kp by qi can be obtained from that of g^α + (k – 1)p by qi. So, using suitable tricks, one might reduce the cost of trial divisions. Two such possibilities are explored in Exercise 4.18. Though these modifications lead to some speed-up in practice, they have the disadvantage that as |k| increases, the size of |g^α + kp| also increases, so that the chance of getting smooth candidates decreases; therefore, using large values of |k| does not effectively help.

There are other heuristic modification schemes that help us gain some speed-up in practice. For example, the large prime variation as discussed in connection with the QSM applies equally well here. Another trick is to use the early abort strategy. A random B-smooth integer has higher probability of having many small prime factors rather than a few large prime factors. This observation can be incorporated in the smoothness tests as follows. Let us assume that we do trial divisions by the small primes in the order q1, q2, . . . , qt. After we do trial divisions of a candidate x by the first t′ < t primes (say, t′ ≈ t/2), we check how far we have been able to reduce x. If the reduction of x is already substantial, we continue with the trial divisions by the remaining primes qt′+1, . . . , qt. In the other case, we abort the smoothness test for x and try another candidate. Obviously, this strategy prematurely rejects some smooth candidates (which are anyway rather small in number), but since most candidates are expected to be non-smooth, it saves a lot of trial divisions in the long run. Determination of t′ and/or the quantification of “substantial” reduction actually depends on practical experience. With suitable choices one may expect to get a speed-up of about 2. The drawback with the early abort strategy is that it often does not go well with sieving. Sieving, whenever applicable, should be given higher preference.
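A minimal sketch of the early abort test follows; the abort threshold (measured here in bits removed, a choice made for this illustration) is exactly the experimentally tuned quantity the text refers to:

```python
def is_smooth_early_abort(x, factor_base, t_prime, threshold_bits):
    """Trial-divide x by the factor-base primes in order, but abort
    after the first t_prime primes unless x has already shrunk by at
    least threshold_bits bits."""
    original_bits = x.bit_length()
    for i, q in enumerate(factor_base):
        while x % q == 0:
            x //= q
        if i + 1 == t_prime and original_bits - x.bit_length() < threshold_bits:
            return False          # early abort: reduction not substantial
    return x == 1                 # smooth iff fully factored
```

Note that 1001 = 7 · 11 · 13 is smooth over {2, 3, 5, 7, 11, 13} yet gets rejected (no factor among 2, 3, 5), illustrating the premature rejections mentioned above.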

To sum up, the basic ICM and all its modifications can be used for computing discrete logarithms only in small fields, say, of size ≤ 80 bits. For bigger fields, we need newer ideas.

The linear sieve method

The linear sieve method (LSM) is a direct adaptation of the quadratic sieve method for factoring integers (Section 4.3.2). In the basic ICM just discussed, we try to find smooth integers from candidates that are on an average as large as O(p). The LSM, on the other hand, finds smooth ones from a pool of integers each of which is of the order of √p. As a result, we expect to have a higher density of smooth integers among the candidates tested in the LSM than those in the basic method. Furthermore, the LSM employs sieving techniques instead of trial divisions. All these help the LSM achieve a running time of L[1], a definite improvement over the L[2] performance of the basic method.

Let H := ⌈√p⌉ and J := H² – p. Then 0 < J ≤ 2H. Let’s consider the congruence

Equation 4.7

(H + c1)(H + c2) ≡ J + (c1 + c2)H + c1c2 (mod p).
For small integers c1, c2, the right side of the above congruence, henceforth denoted as

T(c1, c2) := J + (c1 + c2)H + c1c2,

is of the order of √p. If the integer T(c1, c2) is smooth with respect to the first t primes q1, q2, . . . , qt, that is, if we have a factorization like T(c1, c2) = q1^e1 · · · qt^et, then we have a relation
indg(H + c1) + indg(H + c2) ≡ e1 indg q1 + · · · + et indg qt (mod p – 1).
For the linear sieve method, the factor base comprises primes less than L[1/2] (so that t = L[1/2] by the prime number theorem) and integers H + c for –M ≤ c ≤ M. The bound M on c is chosen to be of the order of L[1/2]. Each T(c1, c2), being of the order of √p · L[1/2] in absolute value, has a probability of L[–1/2] for being qt-smooth. Thus once we check the factorization of T(c1, c2) for all (that is, for a total of L[1]) values of the pair (c1, c2) with –M ≤ c1 ≤ c2 ≤ M, we expect to get L[1/2] Relations (4.7) involving the unknown indices of the factor base elements. If we further assume that the primitive element g is a small prime which itself is in the factor base, then we get a free relation indg g = 1. The resulting system is then solved to compute the discrete logarithms of elements in the factor base. This is the basic principle for the first stage of the LSM.

If we compute all T(c1, c2) and use trial divisions by q1, . . . , qt to separate out the smooth ones, we achieve a running time of L[1.5], as can be easily seen. Sieving is employed to reduce the running time to L[1]. First one fixes a value of c1 and initializes to ln |T(c1, c2)| an array indexed by c2 in the range c1 ≤ c2 ≤ M. One then computes, for each prime power q^h (q being a small prime in the factor base and h a small positive exponent), a solution for c2 of the congruence (H + c1)c2 + (J + c1H) ≡ 0 (mod q^h).

If gcd(H + c1, q) = 1, that is, if H + c1 is not a multiple of q, then the solution is given by σ ≡ –(J + c1H)(H + c1)^{–1} (mod q^h). The inverse in the last congruence can be calculated by running the extended gcd algorithm (Algorithm 3.8) on H + c1 and q^h. Then for each value of c2 (in the range c1 ≤ c2 ≤ M) that is congruent to σ modulo q^h, the quantity ln q is subtracted from the corresponding array location.

If q | (H + c1), we find out h1 := vq(H + c1) > 0 and h2 := vq(J + c1H) ≥ 0. If h1 > h2, then for every value of c2 the expression T(c1, c2) is divisible by q^{h2} and by no higher power of q. So we subtract the quantity h2 ln q from the array locations for all c2. Finally, if h1 ≤ h2, then we subtract h1 ln q from the array locations for all c2 and, for h > h1, solve the congruence ((H + c1)/q^{h1})c2 + (J + c1H)/q^{h1} ≡ 0 (mod q^{h–h1}) as before.

Once the above procedure is carried out for each small prime q in the factor base and for each small exponent h, we check for which values of c2 the array value is equal (that is, sufficiently close) to 0. These are precisely the values of c2 such that, for the given c1, the integer T(c1, c2) factors smoothly over the small primes in the factor base.
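The sieving steps above (for the coprime case) can be sketched as follows; the parameters p = 1009, H = 32, J = 15 are a hypothetical toy instance, and primes dividing H + c1 are simply skipped rather than handled as in the text:

```python
import math

def sieve_fixed_c1(p, H, J, c1, M, primes, tol=1e-6):
    """For a fixed c1, sieve T(c1, c2) = J + (c1 + c2)H + c1*c2 over
    c2 in [c1, M] and return the c2 for which T(c1, c2) is smooth over
    `primes`.  Simplification: primes dividing H + c1 are skipped, so
    the q | (H + c1) case from the text is not handled here."""
    T = lambda c2: J + (c1 + c2) * H + c1 * c2
    values = {c2: T(c2) for c2 in range(c1, M + 1) if T(c2) != 0}
    bound = max(abs(v) for v in values.values())
    acc = {c2: math.log(abs(v)) for c2, v in values.items()}
    for q in primes:
        if (H + c1) % q == 0:
            continue                        # simplifying assumption
        qh = q
        while qh <= bound:
            # unique solution of (H + c1)*c2 + (J + c1*H) = 0 (mod q^h)
            sigma = (-(J + c1 * H) * pow(H + c1, -1, qh)) % qh
            for c2 in acc:
                if (c2 - sigma) % qh == 0:
                    acc[c2] -= math.log(q)  # q^h divides T(c1, c2)
            qh *= q
    return sorted(c2 for c2, v in acc.items() if abs(v) < tol)
```

For c1 = 5 this reports c2 = 5 (T(5, 5) = 360 = 2³ · 3² · 5), and indeed (H + 5)(H + 5) = 37² ≡ 360 (mod 1009), giving a Relation (4.7).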

As in the QSM for integer factorization, it is sufficient to have some approximate representations of the logarithms (like ln q). Incomplete sieving and large prime variation can also be adopted as in the QSM.

Finally, we change c1 and repeat the sieving process described above. It is easy to see that the sieving operations for all c1 in the range –M ≤ c1 ≤ M take time L[1] as announced earlier. Gaussian elimination involving sparse congruences in L[1/2] variables also meets the same running time bound.

The second stage of the LSM can be performed in L[1/2] time. Using a method similar to the second stage of the basic ICM leads to a huge running time (L[3/2]), because we have only L[1/2] small primes in the factor base. We instead do the following. We start with a random j and try to obtain a factorization of the form (g^j a) rem p = ∏q q^{eq} · ∏u u^{wu}, where q runs over the L[1/2] small primes in the factor base and u runs over medium-sized primes, that is, primes less than L[2]. One can use an integer factorization algorithm to this effect. Lenstra’s ECM is, in particular, recommended, since it can detect smooth integers fast. More specifically, about L[1/4] random values of j need to be tried before we expect to get an integer with the desired factorization. Each attempt of factorization using the ECM takes time less than L[1/4].

Now, we have indg a ≡ –j + Σq eq indg q + Σu wu indg u (mod p – 1). The indices indg q are available from the first stage, whereas for each u (with wu ≠ 0) the index indg u is calculated as follows. First we sieve in an interval of size L[1/2] around p/u and collect integers y in this interval which are smooth with respect to the L[1/2] primes in the factor base. A second sieve in an interval of size L[1/2] around H gives us a small integer c such that ((H + c)yu) rem p is smooth again with respect to the L[1/2] primes in the factor base. Since H + c is in the factor base, we get indg u. The reader can easily verify that computing individual logarithms indg a using this method takes time L[1/2] as claimed earlier.

There are some other L[1] methods (like the Gaussian integer method and the residue list sieve method) known for computing discrete logarithms in prime fields. We will not discuss these methods in this book. Interested readers may refer to Coppersmith et al. [59] to learn about these L[1] methods. A faster method (running time L[0.816]), namely the cubic sieve method, is covered in Exercise 4.21. Now, we turn our attention to the best method known to date.

** The number field sieve method

The number field sieve method (NFSM) for solving the DLP in a prime field is a direct adaptation of the NFSM used to factor integers (Section 4.3.4). As before, we let g be a generator of the multiplicative group modulo p and are interested in computing the index d = indg a for some element a of this group.

We choose an irreducible polynomial f with small integer coefficients and of small degree, and use the number field K = Q(α) for some root α of f. For the sake of simplicity, we consider the special case (SNFSM) in which f is monic, Z[α] is a PID, and Z[α] is the full ring of integers of K. We also choose an integer m such that f(m) ≡ 0 (mod p) and define the ring homomorphism Φ : Z[α] → Z/pZ that takes α to m (mod p).

Finally, we predetermine a smoothness bound and let q1, . . . , qt be the (rational) primes up to this bound, let p1, . . . , ps be the prime ideals of Z[α] of prime norms up to this bound, let γ1, . . . , γs be a set of generators of the (principal) ideals p1, . . . , ps, and let u1, . . . , ur be a set of generators of the group of units of Z[α].

We try to find coprime integers c, d of small absolute values such that both c + dα and Φ(c + dα) = c + dm are smooth with respect to these bases, that is, we have factorizations of the forms c + dm = q1^{e1} · · · qt^{et} and c + dα = u1^{a1} · · · ur^{ar} γ1^{b1} · · · γs^{bs} or, equivalently, c + dm ≡ Φ(u1)^{a1} · · · Φ(ur)^{ar} Φ(γ1)^{b1} · · · Φ(γs)^{bs} (mod p). But then the two representations of c + dm modulo p yield the same index, that is,

Equation 4.8

e1 indg q1 + · · · + et indg qt ≡ a1 indg Φ(u1) + · · · + ar indg Φ(ur) + b1 indg Φ(γ1) + · · · + bs indg Φ(γs) (mod p – 1).

This motivates us to define the factor base as

B := {q1, . . . , qt} ∪ {Φ(u1), . . . , Φ(ur)} ∪ {Φ(γ1), . . . , Φ(γs)}.

We assume that g ∈ B so that we have the free relation indg g ≡ 1 (mod p – 1).

Trying sufficiently many pairs (c, d) we generate many Relations (4.8). The resulting sparse linear system is solved for the unknown indices of the elements of B. This completes the first stage of the SNFSM.

In the second stage, we bring a to the scene in the following manner. First assume that a is small, such that either a is smooth as a rational integer, that is, a factors completely over the primes q1, . . . , qt, or, for some γ ∈ Z[α] with Φ(γ) = a, the ideal 〈γ〉 can be written as a product of the prime ideals p1, . . . , ps or, equivalently, γ is, up to a product of the unit generators ui, a product of the ideal generators γj. In both cases, taking logarithms and substituting the indices of the elements of the factor base (available from the first stage) yields d = indg a.

However, a is not small in general, and it is a non-trivial task to find a γ (with Φ(γ) = a) such that 〈γ〉 is smooth. We instead write a as a product

Equation 4.9


where each ai is small enough so that indg ai can be computed using the method described above. This gives indg a ≡ Σi indg ai (mod p – 1). In order to see how one can find a representation of a as a product of small integers as in Congruence (4.9), we refer the reader to Weber [300].

As in most variants of the ICM, the running time of the SNFSM is dominated by the first stage and under certain heuristic assumptions can be shown to be of the order of L(p, 1/3, (32/9)^{1/3}). Look at Section 4.3.4 to see how the different parameters can be set in order to achieve this running time. For the general NFS method (GNFSM), the running time is L(p, 1/3, (64/9)^{1/3}). The GNFSM has been implemented by Weber and Denny [301] for computing discrete logarithms modulo a particular prime having 129 decimal digits (see McCurley [189]).

4.4.4. Algorithms for Fields of Characteristic 2

We wish to compute the discrete logarithm indg a of an element a of the field Fq, q = 2^n, with respect to a primitive element g of Fq. We work with the representation F2[X]/〈f(X)〉 for some irreducible polynomial f(X) ∈ F2[X] with deg f = n. For certain algorithms, we require f to be of special forms. This does not create serious difficulties, since it is easy to compute isomorphisms between two polynomial basis representations of Fq (Exercise 3.38).

Recall that we have defined the smoothness of an integer x in terms of the magnitudes of the prime divisors of x. Now, we deal with polynomials (over F2) and extend the definition of smoothness in the obvious way: that is, a polynomial is called smooth if it factors into irreducible polynomials of low degrees. The next theorem is an analog of Theorem 2.21 for polynomials. By an abuse of notation, we use ψ(·, ·) here also. The context should make it clear what we are talking about – smoothness of integers or of polynomials.

Theorem 4.1.

Let r, m ∈ N with r^{1/100} ≤ m ≤ r^{99/100}, and let u := r/m. Then the number of polynomials f ∈ F2[X], deg f = r, such that all irreducible factors of f have degrees ≤ m equals 2^r u^{–u+o(u)} = 2^r e^{–(1+o(1))u ln u} as u → ∞. In particular, the probability that the degrees of all irreducible factors of a randomly chosen polynomial in F2[X] of degree r are ≤ m is asymptotically equal to

ψ(r, m) := u^{–u+o(u)} = e^{–(1+o(1))u ln u}.

The above expression for ψ(r, m), though valid asymptotically, gives good approximations for finite values of r and m. The condition r^{1/100} ≤ m ≤ r^{99/100} is met in most practical situations. The probability ψ(r, m) is a very sensitive function of u = r/m. For a fixed m, polynomials of smaller degrees have higher chances of being smooth (that is, of having all irreducible factors of degrees ≤ m).

Now, let us consider the field Fq with q = 2^n. The elements of Fq are represented as polynomials of degrees ≤ n – 1. For a given m, the probability that a randomly chosen element of Fq has all irreducible factors of degrees ≤ m is then approximately given by ψ(n – 1, m), as n, m → ∞ with n^{1/100} ≤ m ≤ n^{99/100}. We can, therefore, approximate ψ(n – 1, m) by ψ(n, m).

For many algorithms that we will come across shortly, we have r ≈ n/α and m ≈ β√(n lg n) for some positive α and β, so that u ln u ≈ (ln 2/(2αβ))√(n lg n) and, consequently, ψ(r, m) ≈ L[–1/(2αβ)].

The basic ICM

The idea of the basic ICM for Fq, q = 2^n, is analogous to that for prime fields. Now, the factor base B comprises all irreducible polynomials of F2[X] having degrees ≤ m. We choose m ≈ (1/√2)√(n lg n). (As in the case of the basic ICM for prime fields, this can be shown to be the optimal choice.) By Approximation (2.5) on p 84, we then have t := |B| ≈ 2^{m+1}/m = L[1/√2].

In the first stage, we choose random α, 1 ≤ α ≤ q – 2, compute gα and check if gα is B-smooth. If so, we get a relation. For a random α, the polynomial gα is a random polynomial of degree < n and hence has a probability of nearly ψ(n, m) ≈ L[–1/√2] of being smooth. Note that unlike integers a polynomial over F2 can be factored in probabilistic polynomial time (though for small m it may be preferable to do trial division by elements of B). Thus checking the smoothness of a random element of Fq can be done in (probabilistic) polynomial time, and each relation is available in expected time L[1/√2]. Since we need (slightly more than) t = L[1/√2] relations for setting up the linear system, the relation collection stage runs in expected time L[√2]. A sparse system with L[1/√2] unknowns can also be solved in time L[√2].

In the second stage, we need a single smooth polynomial of the form gαa. If α is randomly chosen, we expect to get this relation in time L[1/√2]. Therefore, the second stage is again faster than the first and the basic method takes a total expected running time of L[√2]. Recall that the basic method for prime fields requires time L[2]. The difference arises because polynomial factorization is much easier than integer factorization.

We now explain a modification of the basic method, proposed by Blake et al. [23]. Let h be a non-zero element of Fq: that is, a non-zero polynomial in F2[X] of degree < n. If h is randomly chosen (as in the case of gα or gαa for random α), then we expect the degree of h to be close to n. Let us write h ≡ h1/h2 (mod f) (f being the defining polynomial) with h1 and h2 each having degree ≈ n/2. Then the ratio of the probability that both h1 and h2 are smooth to the probability that h is smooth is ψ(n/2, m)²/ψ(n, m) ≈ 2^{n/m} (neglecting the o( ) terms). For practical values of n and m, this ratio of probabilities can be substantially large, implying that it is easier to get relations by trying to factor both h1 and h2 instead of trying to factor h. This is the key observation behind the modification due to Blake et al. [23]. Simple calculations show that this modification does not affect the asymptotic behaviour of the basic method, but it leads to considerable speed-up in practice.
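The ratio 2^{n/m} can be checked numerically with the crude approximation ψ(r, m) ≈ u^{–u}, u = r/m (o(u) terms dropped); the parameter values below are illustrative only:

```python
def psi_estimate(r, m):
    """Crude estimate psi(r, m) ~ u^(-u) with u = r/m (o(u) dropped)."""
    u = r / m
    return u ** (-u)

def split_gain(n, m):
    """psi(n/2, m)^2 / psi(n, m): how much likelier it is that two
    random halves of degree n/2 are both m-smooth than that one
    random polynomial of degree n is m-smooth."""
    return psi_estimate(n / 2, m) ** 2 / psi_estimate(n, m)
```

For example, with n = 128 and m = 16 the gain is 2^{128/16} = 256, and with n = 512, m = 32 it is already 2^{16} = 65536.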

In order to complete the description of the modification of Blake et al. [23], we mention an efficient way to write h as h1/h2 (mod f). Since 0 ≤ deg h < n and since f is irreducible of degree n, we must have gcd(h, f) = 1. During the iteration of the extended gcd algorithm we actually compute a sequence of polynomials uk, vk, xk such that ukh + vkf = xk for all k = 0, 1, 2, . . . . At the start of the algorithm we have u0 = 1, v0 = 0 and x0 = h. As the algorithm proceeds, the sequence deg uk changes non-decreasingly, whereas the sequence deg xk changes non-increasingly, and at the end of the extended gcd algorithm we have xk = 1 and the desired Bézout relation ukh + vkf = 1 with deg uk ≤ n – 1. Instead of proceeding till the end of the gcd loop, we stop at the value k = k′ for which deg xk′ is closest to n/2. We will then usually have deg uk′ ≈ n/2, so that taking h1 = xk′ and h2 = uk′ serves our purpose.
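A minimal sketch of this truncated extended gcd follows, with GF(2) polynomials encoded as Python integers (bit i is the coefficient of X^i); the defining polynomial used in the usage note is a hypothetical toy choice:

```python
def pdeg(a):
    return a.bit_length() - 1

def pmul(a, b):                      # carry-less multiplication over GF(2)
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def pdivmod(a, b):                   # quotient and remainder over GF(2)
    q, db = 0, pdeg(b)
    while a and pdeg(a) >= db:
        shift = pdeg(a) - db
        q ^= 1 << shift
        a ^= b << shift
    return q, a

def half_split(h, f, n):
    """Stop the extended gcd of f and h as soon as the remainder has
    degree <= n/2; the invariant u*h = x (mod f) then gives
    h = x/u (mod f) with both parts of degree about n/2."""
    x_prev, x = f, h                 # remainder sequence x_k
    u_prev, u = 0, 1                 # u_k satisfies u_k*h + v_k*f = x_k
    while pdeg(x) > n // 2:
        q, r = pdivmod(x_prev, x)
        x_prev, x = x, r
        u_prev, u = u, u_prev ^ pmul(q, u)
    return x, u                      # (h1, h2)
```

With the toy choice f = X^8 + X^4 + X^3 + X + 1 (irreducible, n = 8) and h of degree 7, both returned polynomials have degree ≤ 4 and satisfy h ≡ h1/h2 (mod f).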

The concept of large prime variation is applicable for the basic ICM. Moreover, if trial divisions are used for smoothness tests, one can employ the early abort strategy. Despite all these modifications the basic variant continues to be rather slow. Our hunt for faster algorithms continues.

The adaptation of the linear sieve method

The LSM for prime fields can be readily adapted to the fields Fq, q = 2^n. Let us assume that the defining polynomial f is of the special form f(X) = X^n + f1(X), where deg f1 is small. The total number of choices for such f with deg f1 < k is 2^k. Under the assumption that irreducible polynomials (over F2) of degree n are randomly distributed among the set of polynomials of degree n, we expect to find an irreducible polynomial f = X^n + f1 with deg f1 = O(lg n) (see Approximation (2.5) on p 84). In particular, we may assume that deg f1 ≪ n/2.

Let k := ⌈n/2⌉ and σ := 2k – n ∈ {0, 1}. For polynomials h1, h2 ∈ F2[X] of small degrees, we then have

(X^k + h1)(X^k + h2) ≡ X^σ f1 + (h1 + h2)X^k + h1h2 (mod f).

The right side of the congruence, namely,

T(h1, h2) := X^σ f1 + (h1 + h2)X^k + h1h2,

has degree slightly larger than n/2. This motivates the following algorithm.

We take m ≈ (1/2)√(n lg n) and let the factor base B be the (disjoint) union of B1 and B2, where B1 contains irreducible polynomials of degrees ≤ m, and where B2 contains polynomials of the form X^k + h, deg h ≤ m. Both B1 and B2 (and hence B) contain L[1/2] elements. For each pair X^k + h1, X^k + h2 ∈ B2, we then check the smoothness of T(h1, h2) over B1. Since deg T(h1, h2) ≈ n/2, the probability of finding a smooth candidate per trial is L[–1/2]. Therefore, trying L[1] values of the pair (h1, h2) is expected to give L[1/2] relations (in L[1/2] variables). Since factoring each T(h1, h2) can be performed in probabilistic polynomial time, the relation collection stage takes time L[1]. Gaussian elimination (with sparse congruences) can be done in the same time. As in the case of the LSM for prime fields, the second stage can be carried out in time L[1/2]. To sum up, the LSM for fields of characteristic 2 takes L[1] running time.

Note that the running time L[1] is achievable in this case without employing any sieving techniques. This is again because checking the smoothness of each T(h1, h2) can simply be performed in polynomial time. Application of polynomial sieving, though unable to improve upon the L[1] running time, often speeds up the method in practice. We will describe such a sieving procedure in connection with Coppersmith’s algorithm that we describe next.

Coppersmith’s algorithm

Coppersmith’s algorithm is the fastest algorithm known to compute discrete logarithms in finite fields of characteristic 2. Theoretically it achieves the (heuristic) running time L(q, 1/3, c) and is, therefore, subexponentially faster than the L[c′] = L(q, 1/2, c′) algorithms described so far. Gordon and McCurley have made aggressive attempts to compute discrete logarithms in fields as large as F_{2^503} using Coppersmith’s algorithm in tandem with a polynomial sieving procedure and, thereby, established the practicality of the algorithm.

In the basic method, each trial during the search for relations involves checking the smoothness of a polynomial of degree nearly n. The modification due to Blake et al. [23] replaces this by checking the smoothness of two polynomials of degree ≈ n/2. For the adaptation of the LSM, on the other hand, we check the smoothness of a single polynomial of degree ≈ n/2. In Coppersmith’s algorithm, each trial consists of checking the smoothness of two polynomials of degrees ≈ n^{2/3}. This is the basic reason behind the improved performance of Coppersmith’s algorithm.

To start with, we make the assumption that the defining polynomial f of Fq is of the form f(X) = X^n + f1(X) with deg f1 = O(lg n). We have argued earlier that an irreducible polynomial f of this special form is expected to be available. We now choose three integers m, M, k such that

m ≈ αn^{1/3}(ln n)^{2/3}, M ≈ βn^{1/3}(ln n)^{2/3} and 2^k ≈ γn^{1/3}(ln n)^{–1/3},

where the (positive real) constants α, β and γ are to be chosen appropriately to optimize the running time. The factor base B comprises irreducible polynomials (over F2) of degrees ≤ m. Let

l := ⌊n/2^k⌋ + 1,

so that l ≈ (1/γ)n^{2/3}(ln n)^{1/3}. Choose relatively prime polynomials u1(X) and u2(X) (in F2[X]) of degrees ≤ M and let

h1(X) := u1(X)X^l + u2(X) and h2(X) := (h1(X))^{2^k} rem f(X).

But then, since indg h2 ≡ 2^k indg h1 (mod q – 1), we get a relation if both h1 and h2 are smooth over B. By choice, deg h1 is clearly O~(n^{2/3}), whereas

h2(X) ≡ u1(X^{2^k})X^{l·2^k} + u2(X^{2^k}) ≡ u1(X^{2^k})X^{l·2^k–n}f1(X) + u2(X^{2^k}) (mod f)

and, therefore, deg h2 = O~(n^{2/3}) too.
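The congruence for h2 rests on the fact that raising to the power 2^k is linear in characteristic 2 and that X^n ≡ f1 (mod f). It can be verified on a toy instance (the parameters n = 15, f = X^15 + X + 1, u1, u2 below are hypothetical, far smaller than Coppersmith's actual sizes), with GF(2) polynomials encoded as ints:

```python
def pdeg(a):
    return a.bit_length() - 1

def pmul(a, b):                       # carry-less product over GF(2)
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def pmod(a, f):                       # remainder of a modulo f
    while pdeg(a) >= pdeg(f):
        a ^= f << (pdeg(a) - pdeg(f))
    return a

def frob(u, k):
    """u(X^(2^k)): move the bit at position j to position j * 2^k."""
    r, j = 0, 0
    while u >> j:
        if (u >> j) & 1:
            r |= 1 << (j << k)
        j += 1
    return r

n, f, f1 = 15, (1 << 15) | 0b11, 0b11     # f = X^15 + X + 1, f1 = X + 1
k = 2
l = n // (1 << k) + 1                     # l = floor(n/2^k) + 1 = 4
u1, u2 = 0b101, 0b111                     # relatively prime, degrees <= 2
h1 = pmul(u1, 1 << l) ^ u2                # h1 = u1*X^l + u2
h2 = h1
for _ in range(k):                        # h2 = h1^(2^k) rem f
    h2 = pmod(pmul(h2, h2), f)
# the claimed alternative expression for h2:
rhs = pmod(pmul(pmul(frob(u1, k), 1 << (l * (1 << k) - n)), f1)
           ^ frob(u2, k), f)
```

Here `h2 == rhs`, confirming the congruence for this instance.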

For each pair (u1, u2) of relatively prime polynomials of degrees ≤ M, we compute h1 and h2 as above and collect all the relations corresponding to the smooth values of both h1 and h2. This gives us the desired (sparse) system of linear congruences in the unknown indices of the elements of B, which is subsequently solved modulo q – 1.

An appropriate choice of α and β together with γ = α^{–1/2} gives the optimal running time of the first stage as

e^{(2α ln 2 + o(1))n^{1/3}(ln n)^{2/3}} = L(q, 1/3, 2α/(ln 2)^{1/3}) ≈ L(q, 1/3, 1.526).

The second stage of Coppersmith’s algorithm is somewhat involved. The factor base now contains only about L(q, 1/3, 0.763) elements. Therefore, finding a relation using a method similar to the second stage of the basic method requires time L(q, 2/3, c) for some c, which is much worse than even L[c′] = L(q, 1/2, c′). To work around this difficulty we start by finding a polynomial gαa all of whose irreducible factors have degrees ≤ n^{2/3}(ln n)^{1/3}. This takes time of the order of L(q, 1/3, c1) (where c1 ≈ 0.377) and gives us gαa = v1v2 · · · vs, where the vi have degrees ≤ n^{2/3}(ln n)^{1/3}. Note that the number of vi is less than n, since deg(gαa) < n. We then have
α + indg a ≡ Σi indg vi (mod q – 1).
All these vi need not belong to the factor base, so we cannot simply substitute the values of indg vi. We instead reduce the problem of computing each indg vi to that of computing indg vi′ for several polynomials vi′ with deg vi′ ≤ σ deg vi for some constant 0 < σ < 1. Subsequently, computing each indg vi′ is reduced to computing indg vi″ for several vi″ with deg vi″ < σ deg vi′. Repeating this process, we eventually end up with polynomials in the factor base. Because each reduction generates new polynomials with degrees reduced by at least the constant factor σ, it is clear that the recursion depth is O(ln n). Now, if for each vi the number of vi′ is ≤ n and for each vi′ the number of vi″ is ≤ n and so on, we have to carry out the reduction of ≤ n^{O(ln n)} = e^{O((ln n)²)} = L(q, 1/3, 0) polynomials. Therefore, if each reduction can be performed in time L(q, 1/3, c2), the second stage will run in time L(q, 1/3, max(c1, c2)).

In order to explain how a polynomial v of degree ≤ d ≈ n^{2/3}(ln n)^{1/3} can be reduced in the desired time, we choose a suitable k ∈ N (so as to balance the degrees of the two polynomials defined below), and let l := ⌊n/2^k⌋ + 1. As in the first stage, we fix a suitable bound M, choose relatively prime polynomials u1(X), u2(X) of degrees ≤ M and define

h1(X) := u1(X)X^l + u2(X)

and

h2(X) := (h1(X))^{2^k} rem f(X) = u1(X^{2^k})X^{l·2^k–n}f1(X) + u2(X^{2^k}).

The polynomials u1 and u2 should be so chosen that v | h1. We see that h1 and h2 have low degrees and we try to factor h1/v and h2. Once we get a factorization of the form
h1(X)/v(X) = ∏i vi(X) and h2(X) = ∏j wj(X)
with deg vi, deg wj < σ deg v, we have the desired reduction of v, namely,
2^k (indg v + Σi indg vi) ≡ Σj indg wj (mod q – 1),
that is, the reduction of the computation of indg v to that of all indg vi and indg wj. With the choice M ≈ (n^{1/3}(ln n)^{2/3}(ln 2)^{–1} + deg v)/2 and σ = 0.9, reduction of each polynomial can be shown to run in time L(q, 1/3, (ln 2)^{–1/3}) ≈ L(q, 1/3, 1.130). Thus the second stage of Coppersmith’s algorithm runs in time L(q, 1/3, 1.130) and is faster than the first stage.

Large prime variation is a useful strategy to speed up Coppersmith’s algorithm. In case of trial divisions for smoothness tests, early abort strategy can also be applied. However, a more efficient idea (though seemingly non-collaborative with the early abort strategy) is to use polynomial sieving as introduced by Gordon and McCurley.

Recall that in the first stage we take relatively prime polynomials u1 and u2 of degrees ≤ M and check the smoothness of both h1(X) = u1(X)X^l + u2(X) and h2(X) = h1(X)^{2^k} rem f(X). We now explain the (incomplete) sieving technique for filtering out the (non-)smooth values of h1 = (h1)u1,u2 for the different values of u1 and u2. To start with, we fix u1 and let u2 vary. We need an array A indexed by u2, a polynomial of degree ≤ M. Clearly, u2 can assume 2^{M+1} values and so A must contain 2^{M+1} elements. To be very concrete, we will denote the location of A corresponding to u2 by A[u2(2)], where u2(2) ≥ 0 is the integer obtained canonically by substituting 2 for X in u2(X), considered to be a polynomial with coefficients 0 and 1. We initialize all the locations of A to zero.

Let t = t(X) be a small irreducible polynomial in the factor base B (or a small power of such an irreducible polynomial) with δ := deg t. The values of u2 for which t divides (h1)u1,u2 satisfy the polynomial congruence u2(X) ≡ u1(X)X^l (mod t). Let σ = σ(X) be the solution of this congruence with δ* := deg σ < δ. If δ* > M, then no value of u2 corresponds to (h1)u1,u2 divisible by t. So assume that δ* ≤ M. If δ > M, then the only value of u2 for which t divides (h1)u1,u2 is u2 = σ. So we may also assume that δ ≤ M. Then the values of u2 for which t divides (h1)u1,u2 are given by u2 = σ + vt for all polynomials v(X) of degrees ≤ M – δ. For each of these 2^{M–δ+1} values of u2, we add δ = deg t to the corresponding array location.

When the process mentioned in the last paragraph is completed for all such t, we find out for which values of u2 the array locations contain values close to deg (h1)u1,u2. These values of u2 correspond to the smooth values of (h1)u1,u2 for the chosen u1. Finally, we vary u1 and repeat the sieving procedure.

In each sieving process described above, we have to find out all the values u2 = σ + vt as v runs through all polynomials of degrees ≤ M – δ. We may choose the different possibilities for v in any sequence, compute the products vt and then add these products to σ. While doing so serves our purpose, it is not very efficient, because computing each u2 involves performing a polynomial multiplication vt. Gordon and McCurley’s trick steps through the possibilities for v in a clever sequence that helps one get each value of u2 from the previous one with much reduced effort (compared to a polynomial multiplication). The 2^{M–δ+1} choices of v can be naturally mapped to the bit strings of length (exactly) M – δ + 1 (with the coefficients of lower powers of X appearing later in the string). This motivates using the following concept.

Definition 4.2.

Let d ∈ N. Then the (binary) gray code of dimension d is a sequence G_d = (g1, g2, . . . , g_{2^d}) of all (that is, 2^d) bit strings of length d defined inductively as follows. For d = 1, we define G_1 := (0, 1), whereas for d > 1 we define G_d := (0g1, 0g2, . . . , 0g_{2^{d–1}}, 1g_{2^{d–1}}, . . . , 1g2, 1g1), with G_{d–1} = (g1, g2, . . . , g_{2^{d–1}}),

where juxtaposition denotes string concatenation.

For example, the gray code of dimension 2 is 00, 01, 11, 10 and that of dimension 3 is 000, 001, 011, 010, 110, 111, 101, 100. Proposition 4.1 can be easily proved by induction on the dimension d.

Proposition 4.1.

Let d ∈ N and let G_d = (g1, g2, . . . , g_{2^d}) be the gray code of dimension d. For any i, 1 ≤ i < 2^d, the bit strings gi and g_{i+1} differ in exactly one bit position b(i). This position is given by b(i) = v2(i), where v2(i) denotes the multiplicity of 2 in i.

Back to our sieving business! Let us agree to step through the values of v in the sequence v1, v2, . . . , v_{2^{M–δ+1}}, where vi corresponds to the bit string gi of the (M – δ + 1)-dimensional gray code. Let us also call the corresponding values of u2 as (u2)1, (u2)2, . . . . Now, v1 is 0 and the corresponding (u2)1 = σ is available at the beginning. By Proposition 4.1 we have for 1 ≤ i < 2^{M–δ+1} the equality v_{i+1} = vi + X^{v2(i)}, so that (u2)_{i+1} = (u2)i + X^{v2(i)}t. Computing the product X^{v2(i)}t involves shifting the coefficients of t and is done efficiently using bit operations only (assuming the data structures introduced in Section 3.5). Thus (u2)_{i+1} is obtained from (u2)i by a shift followed by a polynomial addition. This is much faster than computing (u2)_{i+1} directly as σ + v_{i+1}t.
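The stepping can be sketched as follows, again with GF(2) polynomials encoded as ints so that adding X^j · t is an XOR with a shifted copy of t (the starting values are arbitrary toy choices):

```python
def v2(i):
    """Multiplicity of 2 in i (i > 0)."""
    return (i & -i).bit_length() - 1

def gray(i):
    """i-th element of the binary reflected gray code, as a bit mask."""
    return i ^ (i >> 1)

def sieve_values(sigma, t, dim):
    """Yield sigma + v*t for v running through the 2^dim gray-code
    strings: each value follows from the previous one by a single
    shift-and-XOR instead of a full polynomial multiplication."""
    u2 = sigma                       # v_1 = 0, so (u2)_1 = sigma
    yield u2
    for i in range(1, 2 ** dim):
        u2 ^= t << v2(i)             # v_{i+1} = v_i + X^{v2(i)}
        yield u2
```

Every yielded value equals sigma plus a carry-less product of a gray-code string with t, so all 2^dim values of u2 are visited exactly once.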

We mentioned earlier that efficient implementations of Coppersmith’s algorithm allow one to compute, in feasible time, discrete logarithms in fields as large as F_{2^503}. However, for much larger fields, say for n ≥ 1024, this algorithm is still not a practical breakthrough. The intractability of the DLP continues to remain cryptographically exploitable.

Exercise Set 4.4

4.15

Binary search Let ≤ be a total order on a set S (finite or infinite) and let a1 ≤ a2 ≤ ··· ≤ am be a given sequence of elements of S. Devise an algorithm that, given an arbitrary element a ∈ S, determines using only O(lg m) comparisons in S whether a = ai for some i = 1, . . . , m and, if so, returns i. [H]

4.16
  1. Show that any map of Fq to itself can be represented uniquely as a polynomial of degree < q. [H]

  2. The set S of all maps of Fq to itself is a ring under point-wise addition and multiplication. Prove the ring isomorphism S ≅ Fq[X]/〈X^q – X〉.

4.17Let p be a prime and g a primitive element of . For a , prove the explicit formula (mod p). What is the problem in using this formula for computing indices in ?
4.18 In the basic ICM for a prime field, we try to factor random powers gα over the factor base B = {q1, . . . , qt}. In addition to the canonical representative of gα in the set {1, . . . , p – 1}, one can also check for the smoothness of the integers gα + kp for –M ≤ k ≤ M, where M is a small positive integer (to be determined experimentally).
  1. Let ρk,i := (gα + kp) rem qi for i = 1, . . . , t and for –M ≤ k ≤ M. How can one compute these remainders ρk,i efficiently? Devise an algorithm that checks the smoothness of all gα + kp using the values ρk,i. [H]

  2. Devise an algorithm that uses a sieve over the interval –M ≤ k ≤ M.

  3. Explain how the above two strategies can be modified to work for the fields F_{2^n}.

4.19
  1. Show that for the LSM over a prime field the average and the maximum Tmax of |T(c1, c2)| over all values of c1, c2 (that is, for –M ≤ c1 ≤ c2 ≤ M) are approximately HM and 2HM, respectively. [H]

  2. For real 0 ≤ η ≤ 1, let N(η) := #{(c1, c2) : –M ≤ c1 ≤ c2 ≤ M, |T(c1, c2)| ≤ ηTmax} and let t(η) := N(η)/N(1). Show that t(η) ≈ η(2 – η). (This shows that the distribution of T(c1, c2) is not really random.)

4.20 Consider the following modification of the LSM for a prime field. Define, for 1 ≤ r ≤ s, the integers Hr := ⌈√(rp)⌉ and Jr := Hr² – rp. Choose a small s ∈ N and repeat the linear sieve method for each r, 1 ≤ r ≤ s, that is, check the smoothness (over the first t = L[1] primes) of the integers Tr(c1, c2) := Jr + (c1 + c2)Hr + c1c2 for all 1 ≤ r ≤ s, –μ ≤ c1 ≤ c2 ≤ μ. Let T̄ be the average of |Tr(c1, c2)| over all choices of r, c1 and c2. Show that T̄ is smaller than the corresponding average of Exercise 4.19. In particular, this holds for both the choices: (1) μ = ⌊M/√s⌋ and (2) μ = ⌊M/s⌋; that is, on an average we check smaller integers for smoothness under this modified strategy. Determine the size of the factor base and the total number of integers Tr(c1, c2) checked for smoothness for the two values of μ given above.
4.21

Cubic sieve method (CSM) for a prime field Let the integers x, y, z satisfy x³ ≡ y²z (mod p) with x³ ≠ y²z. Assume that each of x, y, z is O(p^ξ).

  1. Show that for integers a, b, c with a + b + c = 0 one has

    (x + ay)(x + by)(x + cy) ≡ y²T(a, b, c) (mod p),

    where T(a, b, c) := z + (ab + ac + bc)x + (abc)y = –b(b + c)(x + cy) + (z – c²x). Since x, y, z are O(p^ξ), we have T(a, b, c) = O(p^ξ) for small values of a, b, c.

  2. For the CSM, the factor base B comprises all the primes q1, . . . , qt together with the integers x + ay for –M ≤ a ≤ M. If T(a, b, c) factors completely over q1, . . . , qt, we get a relation. Show that if we check the smoothness of T(a, b, c) for all –M ≤ a ≤ b ≤ c ≤ M with a + b + c = 0, we expect to get enough relations to compute the discrete logarithms of elements of B.

  3. In order to carry out sieving, fix c and let b vary. Specify the details of the sieving process. [H]

  4. Specify an algorithm for the second stage of the CSM. [H]

  5. Show that the expected running time of the CSM is L[√(2ξ)]. Therefore, if ξ < 1/2, the CSM is asymptotically faster than the LSM, since the LSM runs in time L[1]. The best possible value ξ = 1/3 corresponds to a running time of L[√(2/3)] ≈ L[0.816] for the CSM.

4.22 The problem with the CSM is that it is not known how to efficiently compute a solution of the congruence

Equation 4.10


subject to the condition that x³ ≠ y²z and x, y, z = O(p^ξ) for 1/3 ≤ ξ < 1/2. In this exercise, we estimate the number of solutions of Congruence (4.10).

  1. Show that the total number of solutions of Congruence (4.10) modulo p with x, y, is (p − 1)², which is Θ(p²).

  2. Show that the total number of solutions of Congruence (4.10) modulo p with x, y, and x³ ≠ y²z is also Θ(p²).

  3. Under the heuristic assumption that the solutions (x, y, z) of Congruence (4.10) are randomly distributed in , deduce that the expected number of solutions of Congruence (4.10) modulo p with x, y, , x³ ≠ y²z, and 1 ≤ x, y, z ≤ p^ξ, 1/3 ≤ ξ ≤ 1, is nearly p^(3ξ−1). (Therefore, if ξ is slightly larger than 1/3, we expect to get a solution. It is not known how to compute such a solution in polynomial (or even subexponential) time. However, for certain values of p a solution is naturally available, for example, if p (or a small multiple of p) is close to an integer cube.)

4.23

Adaptation of CSM for . Let be represented as , where the defining polynomial f is of the form f(X) = X^n + f1(X) with deg f1 ≤ n/3. Let k := ⌈n/3⌉. Show that for polynomials h1, h2 of small degrees, (X^k + h1(X))(X^k + h2(X))(X^k + h1(X) + h2(X)) rem f(X) is of degree slightly larger than n/3. Devise an ICM for solving the DLP in based on this observation. What is the best running time of this method? [H]

*4.5. The Elliptic Curve Discrete Logarithm Problem (ECDLP)

Unlike the finite field DLP, there are no general-purpose subexponential algorithms to solve the ECDLP. Though good algorithms are known for certain specific types of elliptic curves, all known algorithms that apply to general curves take fully exponential time. The square root methods of Section 4.4 are the fastest known methods for solving the ECDLP over an arbitrary curve. As a result, elliptic curves are gaining popularity for building cryptosystems. The absence of subexponential algorithms implies that smaller fields can be chosen compared to those needed for cryptosystems based on the (finite field) DLP. This, in particular, results in smaller sizes of keys.

We start with Menezes, Okamoto and Vanstone’s (MOV) algorithm that reduces the ECDLP in a curve over to the DLP over the field for some suitable . Since the DLP can be solved in subexponential time, the ECDLP is also solved in that time, provided that the extension degree is small. For supersingular curves, one can choose k ≤ 6. For non-supersingular curves, this k is, in general, quite large, and the MOV reduction takes exponential time.

A linear-time algorithm is known to solve the ECDLP over anomalous curves (that is, curves with trace of Frobenius equal to 1). This algorithm is called the SmartASS method after its inventors Smart, Araki, Satoh and Semaev [257, 265, 282].

J. H. Silverman [277] has proposed an algorithm known as the xedni calculus method for solving the ECDLP over an arbitrary curve. Rigorous running times for this algorithm are not known; however, heuristic analysis and experiments suggest that the algorithm is not really practical.

Let E be an elliptic curve over a finite field and let be of order m. We want to compute indP Q (if it exists) for a point . Unless it is necessary, we will not assume any specific defining equation for E or a specific value of q.

**4.5.1. The MOV Reduction

Let us first look at the structure of the group of m-torsion points on an elliptic curve defined over K. Here is the algebraic closure of K.

Theorem 4.2.

Let K be a field of characteristic , and E an elliptic curve defined over K. We consider two separate cases:[5]

[5] For the MOV reduction, only the first case is important.

  1. If p = 0, or if p > 0 and p does not divide m, then . In particular, in this case.

  2. If p > 0, then either for all or for all .

Now, let E be an elliptic curve defined over a finite field K of characteristic p. Let with gcd(m, p) = 1. We use the shorthand notation E[m] for (and not for EK[m]). We want to define a function

em : E[m] × E[m] → μm,

where is the group of m-th roots of unity (Exercise 4.24). This function em, known as the Weil pairing, helps us reduce the ECDLP in to the DLP in a suitable field . Let P, . The definition of em(P, R) calls for using divisors on E. Recall from Exercise 2.125 that a divisor belongs to (that is, is the divisor of a rational function on E) if and only if and . Since , there is a rational function such that . Now, as well and p ∤ m². Hence, by Theorem 4.2 there exists a point R′ of order m² such that R = mR′. Since #E[m] = m², it follows that and, therefore, there exists a rational function with . The functions f and g as introduced above are unique up to multiplication by elements of . One can show that we can choose f and g in such a manner that f ∘ λm = g^m, where λm is the multiplication map Q ↦ mQ. Then for and we have g^m(P + U) = f(mP + mU) = f(mU) = g^m(U). Since g has only finitely many poles and zeros (whereas is infinite), we can choose U such that both g(U) and g(P + U) are defined and non-zero. For such a point U, we then have and define

em(P, R) := g(P + U)/g(U).

The right side can be shown to be independent of the choice of U. The relevant properties of the Weil pairing em are now listed.

Proposition 4.2.

Let P, P′, R, R′ ∈ E[m] and a, b ∈ ℤ. Then we have:

Identity: em(P, P) = 1.
Alternation: em(P, R) = em(R, P)⁻¹.
Bilinearity: em(P + P′, R) = em(P, R)em(P′, R),
 em(P, R + R′) = em(P, R)em(P, R′),
 em(aP, bR) = (em(P, R))^(ab).
Non-degeneracy: em(P, ) = 1.
 If em(P, T) = 1 for all , then .

The above definition of em is not computationally effective. We will see later how we can compute em(P, T) in probabilistic polynomial time using an alternative (but equivalent) definition.

Algorithm 4.7 shows how the MOV reduction algorithm makes use of Weil pairing. We now clarify the subtle details of this algorithm.

Algorithm 4.7. MOV reduction

Input: A point P of order m, gcd(m, q) = 1, and a multiple Q of P.

Output: The index indP Q, that is, an integer l with Q = lP.

Steps:

Choose the smallest k such that .
while (1) {
   Choose a random point .
   α := em(P, R),   β := em(Q, R).  /* α, β ∈ μm */
   l := indα β.   /* Discrete logarithm in  */
   if (Q = lP) { Return l. }
}

The correctness of the algorithm

From the bilinearity of the Weil pairing, it follows that if Q = lP, 0 ≤ l < m, then β = em(Q, R) = em(lP, R) = em(P, R)^l = α^l. Thus, treating indα β as the least non-negative residue modulo ord α, we conclude that l = indα β if and only if ord α = m, that is, α is a primitive m-th root of unity. That α is an m-th root of unity for any R is obvious from the definition of em. We now show that there exists some R for which α = em(P, R) is primitive.

Lemma 4.1.

Let be of order m (so that P generates the subgroup 〈P〉 of order m in E[m]). Then for any R1, R2, the cosets R1 + 〈P〉 and R2 + 〈P〉 are equal if and only if em(P, R1) = em(P, R2).

Proof

If R1 + 〈P〉 = R2 + 〈P〉, then R1 = R2 + rP for some integer r and so by bilinearity and identity of Weil pairing em(P, R1) = em(P, R2)em(P, P)r = em(P, R2).

Conversely, let em(P, R1) = em(P, R2). By Theorem 4.2, is generated by two elements of order m. We can take one of these elements to be P; let P′ be the other element and write R1 − R2 = aP + a′P′ for some a, a′. Then em(P, R1) = em(P, R2 + aP + a′P′) = em(P, R2)em(P, P)^a em(P, a′P′), whence it follows that em(P, a′P′) = 1. Finally, for an arbitrary T = bP + b′P′, b, b′, we have em(a′P′, T) = em(a′P′, bP + b′P′) = em(a′P′, P)^b em(P′, P′)^(a′b′) = em(P, a′P′)^(−b) = 1. By the non-degeneracy property of em, it then follows that a′P′ is the identity element, that is, R1 − R2 ∈ 〈P〉.

As an immediate corollary to Lemma 4.1, the desired result follows.

Proposition 4.3.

Let P be of order m and let

Then #S/#E[m] = φ(m)/m. In particular, S is non-empty.

Proof

There are m distinct cosets of 〈P〉 in E[m]. Now, as R ranges over all points of E[m], the coset R + 〈P〉 ranges over all of these m possibilities and, accordingly, by Lemma 4.1, the value em(P, R) ranges over m distinct values. Since μm is cyclic of order m and hence has φ(m) generators, the proposition follows.

By Theorem 3.1, one should try an expected number of O(ln ln m) random points before a primitive m-th root α = em(P, R) is found.
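As a small numerical illustration of this estimate, the success probability φ(m)/m per random point is easy to compute from the factorization of m. The following Python sketch is ours and not part of the algorithm; the sample value of m is arbitrary.

```python
def phi_over_m(m):
    """Return phi(m)/m via the product of (1 - 1/p) over the distinct primes p | m."""
    ratio, n, p = 1.0, m, 2
    while p * p <= n:
        if n % p == 0:
            ratio *= 1 - 1 / p
            while n % p == 0:
                n //= p
        p += 1
    if n > 1:                      # leftover prime factor
        ratio *= 1 - 1 / n
    return ratio

# Expected number of random points R tried before em(P, R) is primitive
# is m/phi(m), which is O(ln ln m):
m = 2 ** 31 - 1                    # a sample (prime) group order
print(1 / phi_over_m(m))           # ≈ 1, since phi(m)/m = 1 - 1/m for prime m
```

For highly composite m the expected number of trials grows, but only very slowly, in accordance with the O(ln ln m) bound quoted above.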

Choosing k

Since E[m] consists of finitely many (namely, m²) points, it is obvious that there exist finite values of k such that . It can also be shown that if , then , that is, for all P, . The computation of the discrete logarithm indα β is then carried out in . For Algorithm 4.7 to be efficient, one requires k to be rather small. However, for most curves, k is rather large, implying that the MOV reduction is impractical for these curves. For a specific class of curves, the so-called supersingular curves, one can choose k to be rather small, namely k ≤ 6. We do not go into the details of the choices of k for the various cases of supersingular curves, but refer the reader to Menezes [192].
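Assuming, as in the standard analysis of the MOV reduction, that the relevant condition on k is m | q^k − 1, the smallest such k (often called the embedding degree) can be found by a direct search. The following Python fragment is our own sketch; the function name and sample parameters are illustrative.

```python
def embedding_degree(q, m, k_max=10**6):
    """Smallest k >= 1 with q**k ≡ 1 (mod m), i.e. m | q^k - 1 (assumes gcd(q, m) = 1)."""
    t = q % m
    for k in range(1, k_max + 1):
        if t == 1:
            return k
        t = (t * q) % m
    raise ValueError("no k found up to k_max")

# A supersingular-style situation: m | q + 1 forces k = 2.
print(embedding_degree(1009, 101))  # 2, since 1009 ≡ -1 (mod 101)
```

For a random curve, m is typically of the same magnitude as q and k behaves like a random element of large order, which is why the reduction is impractical outside the supersingular case.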

Computing em(P, R)

We start with an alternative definition of the Weil pairing for P, . First note that if is a divisor and if is a rational function on E such that for every pole or zero T of f one has mT = 0 (that is, such that Div(f) and T have disjoint supports), then one can define

Choose points U, (where ) and consider the divisors DP := [P + U] – [U] and DR := [R + V] – [V]. Since is infinite, one can choose both P + U and U distinct from R + V and V. Since P, , it follows that mDP and mDR are principal, namely, there are rational functions fP and fR such that Div(fP) = mDP = m[P + U] – m[U] and Div(fR) = mDR = m[R + V] – m[V]. One can show that

Equation 4.11


independent of the choice of U and V as long as fP (DR) and fR(DP) are defined. Therefore, em(P, R) can be computed efficiently, if fP and fR can be computed efficiently. To this effect we now describe an algorithm for computing the rational function f of a principal divisor , where . Since deg , we can write . Suppose that we have an Algorithm A that, for a pair of reduced divisors

and

computes the sum (a reduced divisor)

Then, f can be computed by repeated application of Algorithm A as follows.

  1. Compute for each i = 1, . . . , r the reduced divisor . Let 1 = ai1, ai2, . . . , aiti = |mi| be an addition chain for |mi| (Exercise 3.18). Clearly, ti – 1 applications of Algorithm A compute Δi. Since we can choose ti ≤ 2 ⌈lg |mi|⌉, each Δi can be computed using O(log |mi|) applications of Algorithm A.

  2. Compute f by computing D = Div(f) = Δ1 + ··· + Δr. This can be done by applying Algorithm A a total of r – 1 times.

What remains is the description of Algorithm A that computes P3 and f3 from a knowledge of P1, P2, f1 and f2. Clearly, if , then we have P3 = P2 and f3 = f1f2. Similar is the case for . So assume and . Let l1 be the line passing through P1 and P2 and P′ := –(P1 + P2). First, assume that . By Exercise 2.125, we have . Let l2 be the (vertical) line passing through P′ and –P′. Again by Exercise 2.125, we have . But then , that is, we take P3 = –P′ = P1+P2 and f3 = f1f2l1/l2. Finally, if , then and, therefore, . Thus, in this case too, we take and f3 = f1f2l1/l2 with l2 := 1.

Before we finish the description of the MOV reduction, some comments are in order. First note that if f1, and P1, , then both l1 and l2 are in K(E) and the computation of f3 and P3 can be carried out by working in K only.

Second, consider the (general) case . Since , the rational function f3 has poles and is, therefore, undefined only at the points P3 and . f3 is certainly defined at –P3, but l2(–P3) = 0 and, therefore, evaluating f3(–P3) as (f1f2l1)(–P3)/l2(–P3) fails. Of course, there is a rational function g such that both f1f2l1g and l2g are defined and non-zero at –P3, but finding such a rational function is an added headache. So we choose to continue to have the representation f3 = f1f2l1/l2 and agree not to evaluate f3 at –P3. Recall from Equation (4.11) that we want to evaluate fP at DR (that is, at R + V and V) and also fR at DP (that is, at P + U and U). Let us assume that we use the addition chain 1 = a1, a2, . . . , at = m for m. This means that we cannot evaluate fP at the points ±ai(P + U) and ±aiU for all i = 1, . . . , t. Therefore, V should be chosen such that both R + V and V are not one of these points. Similar constraints dictate the choice of U. However, if m is sufficiently large (m ≥ 1024) and if we choose an addition chain of length t ≤ 2 ⌈lg m⌉, then it can be easily seen that for a random choice of (U, V) the evaluation of fP (DR) or fR(DP) fails with a probability of no more than 1/2. Therefore, a few random choices of (U, V) are expected to make the algorithm work. This is the only place where a probabilistic behaviour of the algorithm creeps in. In practice, however, this is not a serious problem, since we have much larger values of m (than 1024) and accordingly the above probability of failure becomes negligibly small.

Finally, note that if we multiply the factors f1, f2 and l1 in the numerator, then the coefficients of the numerator grow very rapidly, when the algorithm is applied repeatedly. Thus we prefer to keep the numerator in the factored form. The same applies to the denominator as well.
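The chord and vertical line functions l1 and l2 that Algorithm A accumulates can be sketched concretely. The following Python toy is ours, on a short Weierstrass model y² = x³ + Ax + B over a small prime field (curve parameters and point choices are illustrative); it verifies that l1 vanishes exactly at P1, P2 and −(P1 + P2), and that l2 vanishes at ±P3.

```python
p, A, B = 97, 2, 3   # illustrative toy curve y^2 = x^3 + 2x + 3 over F_97

# All affine points, found by brute force (fine for a toy field).
pts = [(x, y) for x in range(p) for y in range(p)
       if (y * y - x ** 3 - A * x - B) % p == 0]

def slope(P1, P2):
    """Slope of the chord through P1, P2 (tangent when P1 = P2)."""
    (x1, y1), (x2, y2) = P1, P2
    if P1 == P2:
        return (3 * x1 * x1 + A) * pow(2 * y1, -1, p) % p
    return (y2 - y1) * pow(x2 - x1, -1, p) % p

def ec_add(P1, P2):
    """Affine addition (both points finite, P2 != -P1)."""
    (x1, y1), (x2, y2) = P1, P2
    lam = slope(P1, P2)
    x3 = (lam * lam - x1 - x2) % p
    return (x3, (lam * (x1 - x3) - y1) % p)

def l1(P1, P2, Q):
    """Line through P1 and P2, evaluated at Q; its zeros are P1, P2, -(P1+P2)."""
    lam = slope(P1, P2)
    return (Q[1] - P1[1] - lam * (Q[0] - P1[0])) % p

def l2(P3, Q):
    """Vertical line through P3 and -P3, evaluated at Q."""
    return (Q[0] - P3[0]) % p

P1, P2 = pts[0], pts[2]              # two points with distinct x-coordinates
P3 = ec_add(P1, P2)
negP3 = (P3[0], -P3[1] % p)
print(l1(P1, P2, P1), l1(P1, P2, P2), l1(P1, P2, negP3))  # 0 0 0
print(l2(P3, P3), l2(P3, negP3))                          # 0 0
```

Since l1/l2 has divisor [P1] + [P2] − [P1 + P2] − [ ], multiplying such factors along an addition chain builds exactly the functions f that the text describes, kept in factored form as recommended above.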

**4.5.2. The SmartASS Method

The SmartASS method, named after its inventors Smart [282], Satoh and Araki [257] and Semaev [265], is also called the anomalous attack on the ECDLP, since it applies to anomalous elliptic curves. Let be a finite field of odd prime cardinality p, and let E be an elliptic curve defined over . We assume that E is anomalous: that is, the trace of Frobenius of E at p is 1; that is, . Since p is prime, the group is cyclic and, in particular, isomorphic to the additive group (, +). This isomorphism is effectively exploited by the SmartASS method to give a polynomial-time algorithm for the ECDLP in the group .

Before proceeding further, we introduce some auxiliary results. Recall (Exercise 2.133) that a local PID is called a discrete valuation ring (DVR). We now give an equivalent definition of a DVR, one that justifies its name.

Definition 4.3.

A discrete valuation on a field K is a surjective group homomorphism

such that for every a, b we have v(a + b) ≥ min(v(a), v(b)). We extend the definition of v to a map by setting v(0) := +∞. The set

is a ring called the valuation ring of v.

A DVR can be characterized as follows:

Proposition 4.4.

Let R be an integral domain and let K := Q(R) be the field of fractions of R. Then R is a DVR if and only if there exists a discrete valuation of K such that R is the valuation ring of v.

Proof

[if] By definition, . We have v(1) = v(1 · 1) = v(1) + v(1), so that v(1) = 0. If ab = 1 for some a, , then 0 = v(1) = v(ab) = v(a) + v(b). Since v(a), v(b) ≥ 0, it follows that v(a) = v(b) = 0. Conversely, let v(a) = 0 for some , a ≠ 0. Now, and we have 0 = v(1) = v(aa⁻¹) = v(a) + v(a⁻¹) = v(a⁻¹): that is, . We conclude that is a unit if and only if v(a) = 0. Any proper ideal of R consists only of non-units and hence is contained in the set which is easily seen to be an ideal of R. Thus R is a local domain with maximal ideal .

Let and define . Clearly, each is an ideal of R. For an arbitrary non-zero ideal of R, consider . If i = 0, then contains a unit, that is, . So assume i > 0. Clearly, . Conversely, let , so that v(a) ≥ i. Choose with v(b) = i. But then i ≤ v(a) = v(ab⁻¹) + v(b) = v(ab⁻¹) + i: that is, v(ab⁻¹) ≥ 0; that is, ; that is, . Thus, . In other words, , , are the only non-zero ideals of R. These ideals form the (infinite) descending chain .

By definition, is surjective. Let be such that v(x) = 1. The principal ideal 〈x〉 is not the unit ideal, satisfies and hence equals . One can likewise show that for all . Thus R is a PID. [only if] See Exercise 2.133.

Recall that the ring of p-adic integers (Definition 2.111) is a DVR. The field of fractions of is called the field of p-adic numbers. We now explicitly describe a valuation v on of which is the valuation ring. Let the p-adic expansion (Exercises 2.144 and 2.145) of a p-adic integer α be

Equation 4.12


A rational integer can be naturally viewed as a p-adic integer with finitely many non-zero terms, that is, one for which ki = 0 for all but finitely many i. However, a p-adic integer with infinitely many non-zero ki does not correspond to a rational integer. If in Expansion (4.12) we have k0 = k1 = ··· = kr−1 = 0, we can write

α = p^r(kr + kr+1p + kr+2p² + ···).

A p-adic integer is, in general, an infinite series and a representation with finite precision looks like

k0 + k1p + k2p² + ··· + ksp^s + O(p^(s+1)).

Arithmetic on p-adic numbers is done like arithmetic on integers written in base p, but from left to right (that is, starting with the digit k0). Thus, for example, if one wants to add two p-adic integers k0 + k1p + k2p² + ··· and , one may add the base-p integers ··· k2k1k0 and in the usual manner up to the desired level of precision. A p-adic integer α = k0 + k1p + k2p² + ··· is invertible (in ) if and only if k0 ≠ 0 (Proposition 2.52).
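The digit-by-digit arithmetic just described can be sketched as follows. The helper names, the sample prime and the precision are ours; a p-adic integer is truncated to its first s digits (that is, it is known modulo p^s).

```python
def to_digits(n, p, s):
    """First s p-adic digits (ascending powers of p) of a non-negative integer n."""
    ds = []
    for _ in range(s):
        ds.append(n % p)
        n //= p
    return ds

def padic_add(a, b, p):
    """Add two truncated digit lists, carrying from the low digits upwards."""
    out, carry = [], 0
    for da, db in zip(a, b):
        t = da + db + carry
        out.append(t % p)
        carry = t // p           # the final carry falls beyond the precision
    return out

p, s = 7, 6
x, y = 123456, 654321
lhs = padic_add(to_digits(x, p, s), to_digits(y, p, s), p)
rhs = to_digits((x + y) % p ** s, p, s)
print(lhs == rhs)                # True: digit-wise addition agrees mod p^s
```

Dropping the final carry corresponds exactly to working modulo p^s, which is how finite-precision p-adic arithmetic is implemented in practice.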

An element also has a p-adic expansion, but in this case one has to allow a finite number of terms with negative exponents of p. That is to say, we have an expansion of the form

β = k−t p^(−t) + k−t+1 p^(−t+1) + ··· + k−1 p^(−1) + k0 + k1p + k2p² + ···

or

β = p^(−t)(k−t + k−t+1 p + ··· + k−1 p^(t−1) + k0 p^t + k1 p^(t+1) + k2 p^(t+2) + ···).

Of course, if k−t = k−t+1 = ··· = k−1 = 0, then β is already in .

From the arguments above, it follows that any non-zero can be written uniquely as γ = p^δ(γ0 + γ1p + γ2p² + ···) with and γ0 ≠ 0. We then set v(γ) := δ. It is easy to see that v defines a discrete valuation on of which is the valuation ring. Moreover, since γ0 + γ1p + γ2p² + ··· is a unit in , the element p = 0 + 1 · p + 0 · p² + ··· plays the role of a uniformizer of the DVR . As usual, we write v(0) = +∞.
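On rational numbers, the valuation v just defined restricts to the familiar p-adic valuation v(a/b) = v(a) − v(b). A minimal Python sketch (the function name is ours):

```python
from fractions import Fraction

def vp(x, p):
    """p-adic valuation of a non-zero rational x (by convention v(0) = +infinity)."""
    x = Fraction(x)
    v, num, den = 0, x.numerator, x.denominator
    while num % p == 0:          # count powers of p in the numerator
        num //= p
        v += 1
    while den % p == 0:          # subtract powers of p in the denominator
        den //= p
        v -= 1
    return v

print(vp(12, 2))                 # 2, since 12 = 2^2 * 3
print(vp(Fraction(5, 9), 3))     # -2

# The ultrametric inequality v(a + b) >= min(v(a), v(b)):
a, b = Fraction(5, 9), Fraction(1, 3)
print(vp(a + b, 3) >= min(vp(a, 3), vp(b, 3)))  # True
```
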

Now, back to our ECDLP business. Let E be an elliptic curve defined over . Here we consider the case that E is anomalous. We can naturally think of E as a curve over the field as well and denote this curve by ε. The coordinate-wise application of the canonical surjection induces the reduction homomorphism . Now, we define the following subgroups of :

It can be shown that is a subgroup of and is a subgroup of . Furthermore, since E is anomalous, we have

Now, let and Q a point in the subgroup of generated by P. Our purpose is to find an integer l such that Q = lP. Let , be such that and . It is not difficult to find such points and . For example, if P = (a, b), we can take , where b0 = b and b1, b2, . . . are successively obtained by Hensel lifting.

Since , the point and, therefore, . Now, if we take the so-called p-adic elliptic logarithm ψp on both sides, we get (mod p²), whence it follows that

provided that is invertible modulo p. The function ψp can be computed easily. Therefore, this gives a very efficient probabilistic algorithm for computing discrete logarithms over anomalous elliptic curves. Here the most time-consuming step is the linear-time computation of the points and . For further details on the algorithm (like the computation of and from P and Q, and the definition of the p-adic elliptic logarithm), see Blake et al. [24] and Silverman [275].

**4.5.3. The Xedni Calculus Method

Joseph Silverman’s xedni calculus method (XCM) is a relatively recent algorithm for solving the ECDLP over an arbitrary elliptic curve over a finite field. The algorithm is based on some deep mathematical conjectures and heuristic ideas; however, its performance has been experimentally shown to be poor. Here we give a brief sketch of the XCM. For simplicity, we concentrate on elliptic curves over prime fields only.

The basic idea of the XCM is to lift an elliptic curve E over to a curve ε over . In view of this, we start with a couple of important results regarding elliptic curves over (or, more generally, over a number field). See Silverman [275], for example, for the proofs.

Let ε be an elliptic curve defined over a number field K.

Theorem 4.3. Mordell–Weil theorem

The group ε(K) is finitely generated.

The group structure of ε(K) is made explicit by the next theorem. Note that the elements of ε(K) of finite order form a subgroup εtors(K) of ε(K), called the torsion subgroup of ε(K) (Exercise 4.26).

Theorem 4.4.

for some .

The non-negative integer ρ of Theorem 4.4 is called the rank of ε(K).

Now, let E be an elliptic curve defined over a prime field , and Q a multiple of P. Our task is to compute an integer such that Q = lP. We assume that E is defined by a suitable Weierstrass equation. We consider the projective coordinates of points on . Let n denote the cardinality of .

The basic idea of the XCM is to select r points , compute an elliptic curve ε defined over and points such that modulo p the curve ε reduces to E and the points S1, . . . , Sr to Rp,1, . . . , Rp,r. If the rank of ε is small, then the points S2, . . . , Sr are expected to be linearly dependent. Computing a non-trivial linear dependency among S2, . . . , Sr gives a linear dependency among Rp,1, . . . , Rp,r, which in turn yields indP Q with high probability. The details are now explained. For r points Li := [hi, ki, li], i = 1, . . . , r, we use the notation:

We start by fixing an integer r, 4 ≤ r ≤ 9. We then choose r random pairs (si, ti) of integers and compute the points

We now apply a change of coordinates of the form

Equation 4.13


so that the first four of the points Rp,i become Rp,1 = [1, 0, 0], Rp,2 = [0, 1, 0], Rp,3 = [0, 0, 1] and Rp,4 = [1, 1, 1]. This change of coordinates fails if some three of the four points Rp,1, Rp,2, Rp,3 and Rp,4 sum to . But in that case the desired index indP Q can be computed with high probability. If, for example, , then we have (s1 + s2 + s3)P = (t1 + t2 + t3)Q and, therefore, if gcd(t1 + t2 + t3, n) = 1, then indP Q ≡ (t1 + t2 + t3)–1(s1 + s2 + s3) (mod n). On the other hand, if gcd(t1 + t2 + t3, n) ≠ 1, we repeat with a different set of pairs (si, ti).

Henceforth, we assume that the change of coordinates, as given in Equation (4.13), is successful. This transforms the equation for E to a general cubic equation:

Cp : up,1X³ + up,2X²Y + up,3XY² + up,4Y³ + up,5X²Z + up,6XYZ + up,7Y²Z + up,8XZ² + up,9YZ² + up,10Z³ = 0.

Now, we carry out a step that heuristically ensures that the curve ε over (that we are going to construct) has a small rank. We choose a product M of small primes with pM, a cubic curve

CM : uM,1X³ + uM,2X²Y + uM,3XY² + uM,4Y³ + uM,5X²Z + uM,6XYZ + uM,7Y²Z + uM,8XZ² + uM,9YZ² + uM,10Z³ ≡ 0 (mod M)

over and points RM,1, . . . , RM,r on CM with coordinates in . The first four points should be RM,1 = [1, 0, 0], RM,2 = [0, 1, 0], RM,3 = [0, 0, 1] and RM,4 = [1, 1, 1]. We have to ensure also that for every prime divisor q of M, the matrix B(RM,1, . . . , RM,r) has maximal rank modulo q. In practice, it is easier to choose the points RM,1, . . . , RM,r first and then compute a curve CM passing through these points by solving a set of linear equations in the coefficients uM,1, . . . , uM,10 of CM. The curve CM should be so chosen that it has the minimum possible number of solutions modulo M. This, in conjunction with some deep conjectures in the theory of elliptic curves, guarantees that the curve ε that we will construct shortly will have a rank less than the expected value.

We now combine the curves Cp and CM as follows. Using the Chinese remainder theorem, we compute integers such that (mod p) and (mod M) for each i = 1, . . . , 10. Similarly, we compute points R1, . . . , Rr with integer coefficients such that RiRp,i (mod p) and RiRM,i (mod M) for each i = 1, . . . , r, where congruence of points stands for coordinate-wise congruence. Here we have R1 = [1, 0, 0], R2 = [0, 1, 0], R3 = [0, 0, 1] and R4 = [1, 1, 1].
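The Chinese-remainder combination of the two sets of coefficients (and, coordinate-wise, of the two sets of points) is elementary. A minimal sketch, with names and sample moduli of our own choosing, assuming gcd(p, M) = 1:

```python
def crt(r1, m1, r2, m2):
    """Unique x mod m1*m2 with x ≡ r1 (mod m1) and x ≡ r2 (mod m2); gcd(m1, m2) = 1."""
    # pow(m1, -1, m2) is the modular inverse of m1 modulo m2 (Python 3.8+).
    t = ((r2 - r1) * pow(m1, -1, m2)) % m2
    return (r1 + m1 * t) % (m1 * m2)

p, M = 101, 2 * 3 * 5 * 7      # illustrative: p prime, M a product of small primes
u = crt(17, p, 41, M)          # a coefficient ≡ 17 (mod p) and ≡ 41 (mod M)
print(u % p, u % M)            # 17 41
```

Applying this to each of the ten coefficients and to each coordinate of the r points yields the lifted curve and points described above.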

Clearly, the points R1, . . . , Rr are lifts of the points Rp,1, . . . , Rp,r respectively, whereas the cubic curve

over is a lift of E. However, , treated as a curve over , need not pass through the points R1, . . . , Rr. In order to ensure this last condition, we modify the coefficients of to the (small integer) coefficients u1, . . . , u10 by solving the system of linear equations

subject to the condition that (mod pM) for each i = 1, . . . , 10. The resulting cubic curve

C : u1X³ + u2X²Y + u3XY² + u4Y³ + u5X²Z + u6XYZ + u7Y²Z + u8XZ² + u9YZ² + u10Z³ = 0

over evidently continues to be a lift of E.

Now, we apply a change of coordinates in order to transform C to the standard Weierstrass equation

ε : Y² + a1XY + a3Y = X³ + a2X² + a4X + a6

with integer coefficients ai. This transformation changes the points R1, . . . , Rr to the points S1, . . . , Sr. One should also ensure that .

Finally, we check if S2, . . . , Sr are linearly dependent. If so, we determine a (non-trivial) relation with . This corresponds to the relation , where n1 := –(n2 + ··· + nr), that is, sP = tQ with s := n1s1 + ··· + nrsr and t := n1t1 + ··· + nrtr. If gcd(t, n) = 1, we have indP Qt–1s (mod n).

On the other hand, if S2, . . . , Sr are linearly independent or if gcd(t, n) > 1, then the lifted data fail to compute indP Q. In that case, we repeat the entire process by selecting new pairs (si, ti) and/or new points RM,1, . . . , RM,r.

This completes our description of the XCM. See Silverman [277] for further details. No rigorous or heuristic analysis of the running time of the XCM is available in the literature. Practical experience (reported in Jacobson et al. [139]) shows that the algorithm is rather impractical. The predominant cause for failure of a trial of the XCM is that the probability that the points S2, . . . , Sr are linearly dependent is amazingly low. Suitable choices of the curve CM help us to construct curves ε of low rank, but not low enough, in general, to render S2, . . . , Sr linearly dependent. Larger values of r are expected to increase the probability of success in each trial, but it is not clear how to handle the values r > 9. Nevertheless, the XCM is a radically new idea to solve the ECDLP. As Joseph Silverman [277] says, “some of the ideas may prove useful in future work on ECDLP”.

Exercise Set 4.5

4.24Let K be a field, and . Elements of μm are called the m-th roots of unity. Prove the following assertions.
  1. μm is a subgroup of (, ·).

  2. If char K = 0, then #μm = m. [H]

  3. If p := char K > 0, then #μm = m/p^(vp(m)). [H]

  4. μm is cyclic. [H]

  5. The set is a subgroup of .

4.25 We use the notation of the last exercise and assume that #μm = m, that is, either char K = 0 or p := char K > 0 is coprime to m. In this case, a generator of μm is called a primitive m-th root of unity. If ω is a primitive m-th root of unity and ω^r = 1 for some integer r, then evidently m | r. In particular, m is the smallest positive exponent r such that ω^r = 1. The (monic) polynomial

where the product runs over all primitive m-th roots of unity, is called the m-th cyclotomic polynomial (over K). Clearly, deg Φm(X) = φ(m) (where φ is Euler’s totient function).

  1. Show that . [H] Use the Möbius inversion formula to deduce that , where μ is the Möbius function. Conclude that .

  2. If m is a prime, show that Φm(X) = X^(m−1) + ··· + X + 1.

  3. Let m ≠ 1 be odd and char K ≠ 2. Show that Φ2m(X) = Φm(–X). [H]

  4. Show that if K is the finite field with q elements, l is the (multiplicative) order of q modulo m, and ω is a primitive m-th root of unity, then [K(ω) : K] = l. [H] In particular, Φm is a product of φ(m)/l (distinct) irreducible polynomials, each of degree l.
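The identity of Part 1, namely that the product of Φd(X) over all divisors d of m equals X^m − 1, gives a direct way to compute cyclotomic polynomials by exact division. The following Python sketch is ours; polynomials are integer coefficient lists in ascending powers.

```python
def polydiv_exact(num, den):
    """Exact division of integer polynomials (coefficients ascending), den monic."""
    num = num[:]
    q = [0] * (len(num) - len(den) + 1)
    for i in range(len(q) - 1, -1, -1):
        q[i] = num[i + len(den) - 1]          # leading coefficient of den is 1
        for j, c in enumerate(den):
            num[i + j] -= q[i] * c
    return q

def cyclotomic(m):
    """Φ_m as a coefficient list, via Φ_m = (X^m - 1) / Π_{d|m, d<m} Φ_d."""
    num = [-1] + [0] * (m - 1) + [1]          # X^m - 1
    for d in range(1, m):
        if m % d == 0:
            num = polydiv_exact(num, cyclotomic(d))
    return num

print(cyclotomic(1))   # [-1, 1]          : X - 1
print(cyclotomic(6))   # [1, -1, 1]       : X^2 - X + 1
print(cyclotomic(5))   # [1, 1, 1, 1, 1]  : X^4 + X^3 + X^2 + X + 1
```

The naive recursion recomputes small Φd repeatedly, which is harmless for the small values of m one meets in these exercises.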

4.26
  1. Let G be an (additive) Abelian group (not necessarily finite). Show that the subset

    is a subgroup of G. Gtors is called the torsion subgroup of G, and the elements of Gtors are called the torsion elements of G. An element a is a torsion element of G if and only if a is of finite order.

  2. Let ε be an elliptic curve defined over a number field K. Show that the torsion subgroup εtors(K) of ε(K) is finite. [H]

  3. Let ε and K be as in Part (b). Show that is not finite. [H]

**4.6. The Hyperelliptic Curve Discrete Logarithm Problem

The hyperelliptic curve discrete logarithm problem (HECDLP) has attracted less research attention than the ECDLP. Surprisingly, however, there exist subexponential (index calculus) algorithms for solving the HECDLP over curves of large genus. Adleman, DeMarrais and Huang first proposed such an algorithm [2] (which we will refer to as the ADH algorithm). Enge [86] suggested some modifications of the ADH algorithm and provided a rigorous analysis of its running time. Gaudry [105] simplified the ADH algorithm and even implemented it. Gaudry’s experiments suggest that it is feasible to compute discrete logarithms in Jacobians of almost cryptographic sizes, provided that the genus of the underlying curve is high (say, ≥ 6). Enge and Gaudry [87] proved rigorously that as long as the genus g is greater than ln q ( being the field over which the curve is defined), the ADH algorithm (and its improvements) runs in time L(qg, 1/2, ).

In what follows, we outline Gaudry’s version of the ADH algorithm and refer to it as the ADH–Gaudry algorithm. Let C : Y² + u(X)Y = v(X) be a hyperelliptic curve of genus g defined over a finite field . We assume that the cardinality of the Jacobian is known and has a suitably large prime divisor m. We assume further that a reduced divisor of order m is available, and we want to compute the discrete logarithm indα β of β with respect to α.

4.6.1. Choosing the Factor Base

Recall that every reduced divisor can be written uniquely as , l ≤ g, where for i ≠ j the points Pi and Pj are not opposite to each other. Only ordinary points (not special points) may appear more than once in the list P1, . . . , Pl. We also know that such a divisor can be represented by a unique pair of polynomials a, b satisfying deg b < deg a ≤ g and a | (b² + bu − v). In that case, we write D = Div(a, b). What interests us here is the fact that the roots of the polynomial a are precisely the X-coordinates of the points P1, . . . , Pl. This fact leads to the very useful concepts of prime divisors and smooth divisors.

Definition 4.4.

A divisor D = Div(a, b) is called prime if the polynomial a is irreducible (that is, prime) over .

For an arbitrary divisor , let a = a1 · · · ar be the factorization of a into irreducible polynomials ai over . There exist polynomials such that , where Di := Div(ai, bi). In that case, the (prime) divisors D1, . . . , Dr are called the prime divisors of D. Moreover, if deg ai ≤ δ for all i = 1, . . . , r for some , then D is called δ-smooth. In particular, D = Div(a, b) is 1-smooth if and only if a splits completely over .

In order to set up a factor base B, we predetermine a smoothness bound δ and let B consist of all the prime divisors Div(a, b) with deg a ≤ δ. For simplicity, we take δ = 1. This is indeed a practical choice when the genus g is not too large (say, g ≤ 9). Let a = X − h be an (irreducible) polynomial of degree 1. In order to find b such that Div(a, b) is a prime divisor, we first note that deg b < deg a, that is, b is a constant. Furthermore, a | (b² + bu − v): that is, b² + bu − v ≡ 0 (mod X − h); that is, b² + bu(h) − v(h) = 0. Thus, the desired values of b, if they exist, can be found by solving a quadratic equation over . There are q (monic) irreducible polynomials of degree 1, and for each such a there are either two or no solutions for b. Assuming that both these possibilities are equally likely, we conclude that the size of the factor base is ≈ q.
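For a toy curve with u = 0 (so that the quadratic for b reduces to b² = v(h)), the factor-base construction can be sketched as follows. The curve, prime and variable names below are ours, chosen only for illustration.

```python
p = 1009                       # illustrative odd prime field F_p
v = [1, 0, 0, 0, 0, 1]         # v(X) = X^5 + 1 (coefficients ascending), genus 2

def v_at(h):
    """Evaluate v at h over F_p."""
    return sum(c * pow(h, i, p) for i, c in enumerate(v)) % p

# Degree-1 prime divisors Div(X - h, b) with b^2 + b*u(h) - v(h) = 0, here u = 0.
factor_base = []
for h in range(p):             # candidate a(X) = X - h
    rhs = v_at(h)
    for b in range(p):         # brute-force the quadratic b^2 = v(h) over F_p
        if (b * b - rhs) % p == 0:
            factor_base.append((h, b))

print(len(factor_base))        # close to p, as argued above
```

In a real implementation one would use a square-root algorithm (Tonelli–Shanks) instead of the inner brute-force loop, but the count of roughly q factor-base elements is the same.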

4.6.2. Checking the Smoothness of a Divisor

In order to check the smoothness of a divisor over the factor base B, we first factor a over . Under the assumption that δ = 1, the divisor D is smooth if and only if a splits completely over . Let us write a(X) = (X − h1) ··· (X − hl), . Then for some we have , where Di := Div(X − hi, ki). We may use trial divisions (that is, trial subtractions in this additive setting) by elements of B in order to determine the prime divisors D1, . . . , Dl of D. Proposition 4.5 establishes the probability that a randomly chosen element of is smooth.
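The δ = 1 smoothness test, namely checking whether a(X) splits completely into linear factors over F_p, can be sketched by repeatedly finding and stripping roots. This brute-force Python fragment (names ours) is practical only for small p; real implementations use polynomial factorization algorithms instead.

```python
def strip_root(a, r, p):
    """Synthetic division of monic a(X) by (X - r) over F_p; returns (quotient, a(r))."""
    q, acc = [], 0
    for c in reversed(a):          # process coefficients from the leading one down
        acc = (acc * r + c) % p
        q.append(acc)
    return list(reversed(q[:-1])), q[-1]   # last accumulated value is a(r)

def splits_completely(a, p):
    """True iff the monic polynomial a (coefficients ascending) splits over F_p."""
    a = [c % p for c in a]
    while len(a) > 1:
        for r in range(p):
            quo, rem = strip_root(a, r, p)
            if rem == 0:
                a = quo            # strip the linear factor (X - r)
                break
        else:
            return False          # no root: an irreducible factor of degree > 1 remains
    return True

# a(X) = (X-1)(X-2)(X-3) = X^3 - 6X^2 + 11X - 6 splits over F_7:
print(splits_completely([-6, 11, -6, 1], 7))  # True
# X^2 + 1 is irreducible over F_7, since -1 is a non-residue mod 7:
print(splits_completely([1, 0, 1], 7))        # False
```
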

Proposition 4.5.

For q ≫ 4g², there are approximately q^g/g! smooth (that is, 1-smooth) divisors in . In particular, the probability that a randomly chosen divisor in is smooth is approximately 1/g!.

The assumption q ≫ 4g² is practical, since we usually employ curves of (fixed) small genus g over finite fields of medium size. For example, Koblitz [154] proposed the curve Y² + Y = X^13 of genus g = 6 over the prime field . An interesting consequence of the last proposition is that the proportion of smooth divisors in depends only on the genus g of C (and not on q).
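The count behind Proposition 4.5 can be checked exactly: a monic polynomial of degree g that splits completely over F_q corresponds to a multiset of g roots, and there are C(q + g − 1, g) ≈ q^g/g! such multisets. A quick Python check (ours):

```python
from math import comb, factorial

def split_count(q, g):
    """Number of monic degree-g polynomials over F_q that split completely."""
    return comb(q + g - 1, g)     # multisets of g roots drawn from F_q

q, g = 1009, 6
exact = split_count(q, g)
approx = q ** g / factorial(g)
print(exact / q ** g)             # proportion of completely split polynomials
print(approx / q ** g)            # ≈ 1/g!, the estimate of Proposition 4.5
print(abs(exact - approx) / approx < 0.02)  # True: the two agree for q >> g^2
```

This also makes the role of the assumption q ≫ 4g² visible: the relative error of the approximation q^g/g! is roughly g²/(2q).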

4.6.3. The Algorithm

Now, we have all the machinery required to describe the basic version of the index calculus method for computing indα β in the Jacobian. In the first stage, we choose a random j ∈ {1, . . . , m}, compute the (reduced) divisor jα, and check if jα is smooth over the factor base B. Every smooth jα gives a relation: that is, a linear congruence modulo m involving the (unknown) indices of the elements of B to the base α. After sufficiently many (say, ≥ 2(#B)) such relations are found, the system of linear congruences collected is expected to be of full rank and is solved modulo m. This gives us the indices of the elements of the factor base. Each congruence collected above contains at most g non-zero coefficients, so the system is necessarily sparse. In the second stage, we find a single random j for which β + jα is smooth. The database prepared in the first stage then immediately gives indα β.

The Hasse–Weil Bounds (3.8) on p 226 show that the cardinality of the Jacobian is approximately q^g. Thus O(g log q) bits are needed to represent an element of the Jacobian. This fact is consistent with the representation of reduced divisors by pairs of polynomials. Gaudry [105] calculates that this variant of the ICM does O(q² + g!q) operations, each of which takes polynomial time in the input size g log q. If g is considered to be constant, the running time becomes O(q² log^t q) (that is, O~(q²)) for some real t > 0. A square-root method on the Jacobian runs in (expected) time O~(q^(g/2)). Thus for g > 4 the index calculus method performs better than the square-root methods. Indeed, Gaudry’s implementation of this algorithm is capable of computing in a few days discrete logs in the Jacobian of the curve of genus 6 mentioned above. The Jacobian of this curve is of cardinality ≈ 10^40.

For cryptographic purposes, the Jacobian should be large enough to resist the square-root methods. If we want to take q small (so that multi-precision arithmetic can be avoided), we should choose large values of g. But this choice makes the ADH–Gaudry algorithm quite efficient. For achieving the desired level of security in cryptographic applications, only hyperelliptic curves of genus 2, 3 and 4 are recommended.

4.7. Solving Large Sparse Linear Systems over Finite Rings

So far we have seen many algorithms which require solving large systems of linear equations (or congruences). The number n of unknowns in such systems can be as large as several millions. Standard Gaussian elimination on such a system takes time O(n³) and space O(n²). There are asymptotically faster algorithms like Strassen’s method [292] that takes time O(n^2.807) and Coppersmith and Winograd’s method [60] having a running time of O(n^2.376). Unfortunately, these asymptotic estimates do not show up in the range of practical interest. Moreover, the space requirements of these asymptotically faster methods are prohibitively high (though still O(n²)).

Luckily enough, cryptanalytic algorithms usually deal with coefficient matrices that are sparse: that is, that have only a small number of non-zero entries in each row. For example, consider the system of linear congruences available from the relation collection stage of an ICM for solving the DLP over a finite field F_q. The factor base consists of a subexponential (in lg q) number of elements, whereas each relation involves at most O(lg q) non-zero coefficients. Furthermore, the sparsity of the resulting matrix A is somewhat structured in the sense that the columns of A corresponding to larger primes in the factor base tend to have fewer non-zero entries. In this regard, we refer to the interesting analysis by Odlyzko [225] in connection with the Coppersmith method (Section 4.4.4). Odlyzko took m = 2n equations in n unknown indices and showed that about n/4 columns of A are expected to contain only zero coefficients, implying that these variables never occurred in any relation collected. Moreover, about 0.346n columns of A are expected to have only a single non-zero coefficient.

The sparsity (as well as the structure of the sparsity) of the coefficient matrix A can be effectively exploited and the system can be solved in time O~(n²). In this section, we describe some special algorithms for large sparse linear systems. In what follows, we assume that we want to compute the unknown n-dimensional column vector x from the given system of equations

Ax = b,

where A is an m × n matrix, m ≥ n, and where b is a non-zero m-dimensional column vector. Though this is not the case in general, we will often assume for the sake of simplicity that A has full rank (that is, rank n). We write vectors as column vectors, that is, an l-dimensional vector v with elements v1, . . . , vl is written as v = (v1 v2 . . . vl)^t, where the superscript t denotes matrix transpose.

Before we proceed further, some comments are in order. First note that our system of equations is often one over the finite ring ℤ_r, which is not necessarily a field. Most of the methods we describe below assume that ℤ_r is a field, that is, r is a prime. If r is composite, we can do the following. First, assume that the prime factorization r = p1^α1 ··· ps^αs, αi > 0, of r is known. In that case, we first solve the system over the fields ℤ_pi for i = 1, . . . , s. Then for each i we lift the solution modulo pi to the solution modulo pi^αi. Finally, all these lifted solutions are combined using the CRT to get the solution modulo r.

Hensel lifting can be used to lift a solution of the system Ax ≡ b (mod p) to a solution of Ax ≡ b (mod p^α), where p is a prime and α ≥ 2. We proceed by induction on α. Let us denote the (or a) solution of Ax ≡ b (mod p) by x1, which can be computed by solving a system in the field ℤ_p. Now, assume that for some i ≥ 1 we know (integer) vectors x1, . . . , xi such that

Equation 4.14

A(x1 + px2 + ··· + p^(i–1)xi) ≡ b (mod p^i).
We then attempt to compute a vector xi+1 such that

Equation 4.15

A(x1 + px2 + ··· + p^(i–1)xi + p^i xi+1) ≡ b (mod p^(i+1)).
Congruence (4.14) shows that the elements of A, x1, . . . , xi, b can be so chosen (as integers) that for some vector yi we have the equality

A(x1 + px2 + ··· + p^(i–1)xi) = b – p^i yi

of integer vectors. Substituting this in Congruence (4.15) gives Axi+1 ≡ yi (mod p). Thus the (incremental) vector xi+1 can be obtained by solving a linear system in ℤ_p.

It, therefore, suffices to know how to solve linear congruences modulo a prime p. However, problems arise when we do not know the factorization of r (while solving Ax ≡ b (mod r)). If r is large, it would be a heavy investment to make attempts to factor r. What can be done instead is the following. First, we use trial divisions to extract the small prime factors of r. We may, therefore, assume that r has no small prime factors. We proceed to solve Ax ≡ b (mod r) assuming that r is a prime (that is, that ℤ_r is a field). In a field, every non-zero element is invertible. But if r is composite, there are non-zero elements a which are not invertible (that is, for which gcd(a, r) > 1). If, during the course of the computation, we never happen to meet (and try to invert) such non-zero non-invertible elements, then the computation terminates without any trouble. Otherwise, such an element a yields a non-trivial factor gcd(a, r) of r. In that case, we have a partial factorization of r and restart solving the system modulo each suitable factor of r.
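The invert-or-split idea can be captured in a few lines of Python (the function name inv_or_factor is ours):

```python
from math import gcd

def inv_or_factor(a, r):
    """Try to invert a modulo r.  Returns (inverse, None) on success;
    otherwise returns (None, g), where g = gcd(a, r) > 1 is a factor
    of r (non-trivial whenever 1 < g < r)."""
    g = gcd(a, r)
    if g == 1:
        return pow(a, -1, r), None
    return None, g
```

For example, inv_or_factor(3, 35) succeeds with the inverse 12, while inv_or_factor(10, 35) fails but exposes the factor 5 of 35.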

Some of the algorithms we discuss below assume that A is a symmetric matrix. For our systems, this is usually not so; indeed, we have matrices A which are not even square. Both these problems can be overcome by trying to solve the modified system A^t Ax = A^t b. If A has full rank, this leads to an equivalent system.

If r = 2 (as in the case of the QSM for factoring integers), using the special methods is often not recommended. In this case, the elements of A are bits and can be packed compactly in machine words, and addition of rows can be done word-wise (say, 32 bits at a time). This leads to an efficient implementation of ordinary Gaussian elimination, which usually runs faster than the more complicated special algorithms described below, at least for the sizes of practical systems.
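A minimal Python sketch of this word-packed elimination over GF(2), using arbitrary-precision integers as bit vectors so that adding two rows is a single XOR; the function name and input conventions are our own:

```python
def solve_gf2(rows, rhs, n):
    """Gauss-Jordan elimination over GF(2).  Each equation is one integer:
    bit j of rows[i] is the coefficient of x_j; rhs[i] is 0 or 1.
    Internally, bit 0 of the augmented row holds the right-hand side,
    so a row addition is a single XOR.  Returns one solution (free
    variables set to 0), or None if the system is inconsistent."""
    aug = [(row << 1) | b for row, b in zip(rows, rhs)]
    pivots = []                      # columns where a pivot was found
    r = 0
    for c in range(n):
        piv = next((i for i in range(r, len(aug)) if (aug[i] >> (c + 1)) & 1), None)
        if piv is None:
            continue                 # no pivot: x_c is a free variable
        aug[r], aug[piv] = aug[piv], aug[r]
        for i in range(len(aug)):
            if i != r and (aug[i] >> (c + 1)) & 1:
                aug[i] ^= aug[r]     # row addition over GF(2) = XOR
        pivots.append(c)
        r += 1
    if any(row & 1 for row in aug[r:]):   # leftover row says 0 = 1
        return None
    x = [0] * n
    for i, c in enumerate(pivots):
        x[c] = aug[i] & 1            # pivot variable = its rhs bit
    return x
```

A real QSM implementation would pack rows into fixed-width machine words, but the XOR-per-row-addition idea is the same.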

In what follows, we discuss some well-known methods for solving large sparse linear systems over finite fields (typically prime fields). In order to simplify notations, we will refrain from writing the matrix equalities as congruences, but treat them as equations over the underlying finite fields.

4.7.1. Structured Gaussian Elimination

Structured Gaussian elimination is applied to a sparse system before one of the next three methods is employed to solve the system. If the sparsity of A has some structures (as discussed earlier), then structured Gaussian elimination tends to reduce the size of the system considerably, while maintaining the sparsity of the system. We now describe the essential steps of structured Gaussian elimination. Let us define the weight of a row or column of a matrix to be the number of non-zero entries in that row or column.

First we delete all the columns (together with the corresponding variables) that have weight 0. These variables never occur in the system and need not be considered at all.

Next we delete all the columns that have weight 1 and the rows corresponding to the non-zero entries in these columns. Each such deleted column corresponds to a variable xi that appears in exactly one equation. After the rest of the system is solved, the value of xi is obtained by back substitution. Deleting some rows in this step may expose some new columns of weight 1. So this step should be repeated until all the columns have weight > 1.

Now, take each row of weight 1. Such a row gives a direct solution for the variable xi corresponding to its single non-zero entry. We then substitute this value of xi in all the equations where it occurs and subsequently delete the i-th column. We repeat this step until no row of weight 1 remains.

At this point, the system usually has many more equations than variables. We may make the system a square one by throwing away some rows. Since subtracting multiples of rows of higher weights tends to increase the number of non-zero elements in the matrix, we should throw away the rows with higher weights. While discarding the excess rows, we should be careful to ensure that we are not left with a matrix having columns of weight 0. Some columns in the reduced system may again happen to have weight 1. Thus, we have to repeat the above steps again and again, until we are left with a square matrix each row and column of which has weight ≥ 2.

This procedure leads to a system which is usually much smaller than the original system. In a typical example quoted in Odlyzko [225], structured Gaussian elimination reduces a system with 16,500 unknowns to one with less than 1,000 unknowns. The resulting reduced system may be solved using ordinary Gaussian elimination which, for smaller systems, appears to be much faster than the following sophisticated methods.
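A toy Python sketch of one part of this process — repeatedly solving rows of weight 1 and substituting — may clarify the bookkeeping. The name structured_elim and the dict-based sparse format are our choices, and a full implementation would also apply the column rules and the row-discarding step described above:

```python
def structured_elim(rows, rhs, p):
    """Repeatedly solve rows of weight 1 and substitute the solved
    variable into every equation containing it, shrinking the system.
    rows: list of sparse rows {column: coefficient}, over Z_p.
    Note: the input lists/dicts are consumed; use the return values."""
    solved = {}
    progress = True
    while progress:
        progress = False
        for i, row in enumerate(rows):
            if len(row) == 1:                     # weight-1 row: direct solution
                (j, c), = row.items()
                xj = rhs[i] * pow(c, -1, p) % p
                solved[j] = xj
                for k, r in enumerate(rows):      # substitute x_j everywhere
                    if j in r:
                        rhs[k] = (rhs[k] - r.pop(j) * xj) % p
                progress = True
                break
        kept = [(r, b) for r, b in zip(rows, rhs) if r]   # drop emptied rows
        rows = [r for r, _ in kept]
        rhs = [b for _, b in kept]
    return solved, rows, rhs
```

On the tiny system x0 + x1 ≡ 3, x1 ≡ 2 (mod 7) this solves x1 = 2 and then, after substitution, x0 = 1, leaving an empty residual system.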

4.7.2. The Conjugate Gradient Method

The conjugate gradient method was originally proposed to solve a linear system Ax = b over ℝ for an n × n (that is, square) symmetric positive definite matrix A and for a non-zero vector b, and is based on the idea of minimizing the quadratic function f(x) := (1/2)x^t Ax – b^t x. The minimum is attained when the gradient ∇f = Ax – b equals zero, which corresponds to the solution of the given system.

The conjugate gradient method is an iterative procedure. The iterations start with an initial minimizer x0, which can be any n-dimensional vector. As the iterations proceed, we obtain gradually improved minimizers x0, x1, x2, . . . , until we reach the solution. We also maintain and update two other sequences of vectors ei and di. The vector ei stands for the error b – Axi, whereas the vectors d0, d1, . . . constitute a set of mutually conjugate (that is, orthogonal) directions. We initialize e0 = d0 = b – Ax0 and for i = 0, 1, . . . repeat the steps of Algorithm 4.8, until ei = 0. We denote the inner product of two vectors v = (v1 v2 . . . vn)^t and w = (w1 w2 . . . wn)^t by 〈v, w〉 := v1w1 + ··· + vnwn.

Algorithm 4.8. An iteration in the conjugate gradient method

ai := 〈ei, ei〉/〈di, Adi〉.

xi+1 := xi + aidi.

ei+1 := ei – ai Adi.

bi := 〈ei+1, ei+1〉/〈ei, ei〉.

di+1 := ei+1 + bidi.

This method computes a set of mutually orthogonal directions d0, d1, . . . , and hence it has to stop after at most n – 1 iterations, when we run out of new orthogonal directions. Provided that we work with infinite precision, we must eventually obtain ei = 0 for some i, 0 ≤ i ≤ n – 1.

If A is sparse, that is, if each row of A has O(log^c n) non-zero entries, c being a positive constant, then the product Adi can be computed using O~(n) field operations. Other operations clearly meet this bound. Since at most n – 1 iterations are necessary, the conjugate gradient method terminates after performing O~(n²) field operations.

We face some potential problems when we want to apply this method to solve a system over a finite field F_q. First, the matrix A is usually not symmetric and need not even be square. This problem can be avoided by solving the system A^t Ax = A^t b. The new coefficient matrix A^t A may be non-sparse (that is, dense). So instead of computing and working with A^t A explicitly, we compute the product (A^t A)di as A^t (Adi); that is, we avoid multiplication by a (possibly) dense matrix at the cost of multiplications by two sparse matrices.

The second difficulty with a finite field is that the question of minimizing an F_q-valued function makes hardly any sense (and so does positive definiteness of a matrix over F_q). However, the conjugate gradient method is essentially based on the generation of a set of mutually orthogonal vectors d0, d1, . . . . This concept continues to make sense in the setting of a finite field.

If A is a real positive definite matrix, we cannot have 〈di, Adi〉 = 0 for a non-zero vector di. But this condition need not hold for a matrix A over F_q. Similarly, we may have a non-zero error vector ei over F_q for which 〈ei, ei〉 = 0. (Again this is not possible for real vectors.) So for the iterations over F_q (more precisely, the computations of ai and bi) to proceed gracefully, all that we can hope for is that before reaching the solution we never hit a non-zero direction vector di for which 〈di, Adi〉 = 0, nor a non-zero error vector ei for which 〈ei, ei〉 = 0. If q is sufficiently large and if the initial minimizer x0 is chosen sufficiently randomly, then the probability of encountering such a bad di or ei is rather low, and as a result the method is very likely to terminate without problems. If, by a terrible stroke of bad luck, we have to abort the computation prematurely, we should restart the procedure with a new random initial vector x0. If q is small (say, q = 2 as in the case of the QSM), it is a neater idea to select the entries of the initial vector x0 from a field extension F_(q^k) and work in this extension. The eventual solution we reach will be in F_q, but working in the larger field decreases the possibility of an attempt of division by 0.
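Algorithm 4.8 carries over to a prime field almost verbatim. The following Python sketch (our own naming and conventions) performs the iteration modulo p and simply lets the modular inversion raise an error when a bad inner product is hit, at which point one would restart as described above:

```python
def conjugate_gradient_mod_p(A, b, p):
    """Conjugate gradient over Z_p (Algorithm 4.8 with x0 = 0).
    A: symmetric n x n matrix, b: vector, entries taken mod p.
    Raises ValueError if a self-orthogonal vector is encountered,
    in which case one restarts (e.g. over an extension field)."""
    n = len(b)
    dot = lambda u, v: sum(x * y for x, y in zip(u, v)) % p
    mat = lambda M, v: [sum(M[i][j] * v[j] for j in range(n)) % p
                        for i in range(n)]
    x = [0] * n
    e = d = [bi % p for bi in b]               # e0 = d0 = b - A*x0
    while any(e):
        Ad = mat(A, d)
        a = dot(e, e) * pow(dot(d, Ad), -1, p) % p      # a_i
        x = [(xi + a * di) % p for xi, di in zip(x, d)]
        e_new = [(ei - a * adi) % p for ei, adi in zip(e, Ad)]
        if not any(e_new):
            break                              # error vector is zero: done
        bcoef = dot(e_new, e_new) * pow(dot(e, e), -1, p) % p   # b_i
        d = [(ei + bcoef * di) % p for ei, di in zip(e_new, d)]
        e = e_new
    return x
```

For the system with A = [[2, 1], [1, 3]] and b = [1, 2] over Z_7, the iteration terminates after two steps with the solution x = [3, 2].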

There is, however, a brighter side of using a finite field in place of ℝ: every calculation we perform in F_q is exact, and we do not have to bother about a criterion for determining whether an error vector ei is zero, or about the conditioning of the matrix A. One of the biggest headaches of numerical analysis is absent here.

4.7.3. The Lanczos Method

The Lanczos method is another iterative method quite similar to the conjugate gradient method. The basic difference between these methods lies in the way in which the mutually conjugate directions d0, d1, . . . are generated. For the Lanczos method, we start with the initializations d0 := b and x0 := a0d0, and then, for i = 1, 2, . . . , we repeat the steps in Algorithm 4.9 as long as 〈di, Adi〉 ≠ 0.

Algorithm 4.9. An iteration in the Lanczos method

vi+1 := Adi.

.

.

xi := xi–1 + aidi.

If A is a real positive definite matrix, the termination criterion is equivalent to the condition di = 0. When this is satisfied, the vector xi–1 equals the desired solution x of the system Ax = b. Since d0, d1, . . . are mutually orthogonal, the process must stop after at most n – 1 iterations. Therefore, for a sparse matrix A, the entire procedure performs O~(n²) field operations.

The problems we face with the Lanczos method applied to a system over F_q are essentially the same as those discussed in connection with the conjugate gradient method. The problem with a non-symmetric and/or non-square matrix A is solved by multiplying the system by A^t. Instead of working with A^t A explicitly, we prefer to multiply separately by A and A^t.

The more serious problem with a system over F_q is that of encountering a non-zero direction vector di with 〈di, Adi〉 = 0. If this happens, we have to abort the computation prematurely. In order to restart the procedure, we try to solve the system BAx = Bb, where B is a diagonal matrix whose diagonal elements are chosen randomly from the non-zero elements of the field, or of some suitable extension (if q is small).

4.7.4. The Wiedemann Method

The Wiedemann method for solving a sparse system Ax = b over F_q uses ideas different from those employed by the other methods discussed so far. For the sake of simplicity, we assume that A is a square non-singular matrix (not necessarily symmetric). The Wiedemann method tries to compute the minimal polynomial μA(X) = X^d + cd–1X^(d–1) + ··· + c1X + c0, d ≤ n, of A. To that end, one selects a small positive integer l in the range 10 ≤ l ≤ 20. For i = 0, 1, . . . , 2n, let vi denote the column vector of length l consisting of the first l entries of the vector A^i b. For the working of the Wiedemann method, we need to compute only the vectors v0, . . . , v2n. If A is a sparse matrix, this computation involves a total of O~(n²) operations in F_q.

Since μA(A) = 0, we have μA(A)A^i b = 0 for every i ≥ 0. Therefore, for each k = 1, . . . , l the sequence v0,k, v1,k, . . . of the k-th entries of v0, v1, . . . satisfies the linear recurrence

vi+d,k + cd–1vi+d–1,k + ··· + c0vi,k = 0 for all i ≥ 0.

But then the minimal polynomial μk(X) of the k-th such sequence is a factor of μA(X). There are methods that compute each μk(X) using O(n²) field operations. We then expect to obtain μA(X) = lcm(μk(X) | 1 ≤ k ≤ l).

The assumption that A is non-singular is equivalent to the condition that c0 ≠ 0. In that case, since μA(A)b = 0, the solution vector x = –c0^(–1)(c1b + c2Ab + ··· + A^(d–1)b) can be computed using O~(n²) arithmetic operations in the field F_q.
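The final back-substitution step can be sketched in Python. We assume the minimal polynomial has already been found (e.g., by the sequence methods mentioned above); the function name and the coefficient convention (low-degree first, leading coefficient 1) are ours:

```python
def solution_from_min_poly(A, b, mu, p):
    """Given the minimal polynomial mu = [c0, c1, ..., c_{d-1}, 1] of A
    over Z_p (with c0 != 0), recover x with Ax = b:
    since mu(A)b = 0, we get x = -c0^{-1} (c1 b + c2 Ab + ... + A^{d-1} b)."""
    n = len(b)
    mat = lambda v: [sum(A[i][j] * v[j] for j in range(n)) % p
                     for i in range(n)]
    acc = [0] * n
    w = [bi % p for bi in b]       # w runs through b, Ab, A^2 b, ...
    for c in mu[1:]:               # coefficients c1, ..., c_{d-1}, c_d = 1
        acc = [(ai + c * wi) % p for ai, wi in zip(acc, w)]
        w = mat(w)
    factor = (-pow(mu[0], -1, p)) % p
    return [factor * ai % p for ai in acc]
```

For the diagonal matrix A = diag(2, 3) over Z_7, whose minimal polynomial is X² + 2X + 6 (that is, (X – 2)(X – 3) mod 7), this recovers x = A^(–1)b directly.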

If A is singular, we may find linear dependencies among the rows of A and subsequently throw away suitable rows. Doing this repeatedly eventually gives us a non-singular matrix. For further details on the Wiedemann method, see [303].

4.8. The Subset Sum Problem

In this section, we assume that A = {a1, . . . , an} ⊆ ℕ is a knapsack set. For s ∈ ℕ, we are required to find ∊1, . . . , ∊n ∈ {0, 1} such that ∊1a1 + ··· + ∊nan = s, provided that a solution exists. In general, finding such a solution ∊1, . . . , ∊n is a very difficult problem.[6] However, if the weights satisfy some specific bounds, there exist polynomial-time algorithms for solving the SSP.

[6] In the language of complexity theory, the decision problem of determining whether a solution of the SSP exists is NP-complete.

Let us first define an important quantity associated with a knapsack set:

Definition 4.5.

The density of the knapsack set A = {a1, . . . , an} is defined to be the real number d(A) := n / lg(max(a1, . . . , an)).

If d(A) > 1, then there is, in general, more than one solution for the SSP (provided that there exists one). This makes the corresponding knapsack set A unsuitable for cryptographic purposes. So we consider low densities: that is, the case d(A) ≤ 1.

There are certain algorithms that reduce in polynomial time the problem of finding a solution of the SSP to that of finding a shortest (non-zero) vector in a lattice. Assuming that such a vector is computable in polynomial time, Lagarias and Odlyzko’s reduction algorithm [157] solves the SSP in polynomial time with high probability, if d(A) ≤ 0.6463. An improved version of the algorithm adapts to densities d(A) ≤ 0.9408 (see Coster et al. [64] and Coster et al. [65]). The reduction algorithm is easy and will be described in Section 4.8.1. However, it is not known how to efficiently compute a shortest non-zero vector in a lattice. The Lenstra–Lenstra–Lovasz (L3) polynomial-time lattice-basis reduction algorithm [166] provably finds a non-zero vector whose length is at most the length of a shortest non-zero vector multiplied by a power of 2. In practice, however, the L3 algorithm tends to compute a shortest vector quite often. Section 4.8.2 deals with the L3 lattice-basis reduction algorithm.

Before providing a treatment on lattices, let us introduce a particular case of the SSP, which is easily (and uniquely) solvable.

Definition 4.6.

A knapsack set {a1, . . . , an} with a1 < ··· < an is said to be superincreasing, if aj > a1 + ··· + aj–1 for all j = 2, . . . , n.

Algorithm 4.10 solves the SSP for a superincreasing knapsack set in deterministic polynomial time. The proof for the correctness of this algorithm is easy and left to the reader.

Algorithm 4.10. Solving the superincreasing knapsack problem

Input: A superincreasing knapsack set {a1, . . . , an} with a1 < ··· < an, and a target sum s ∈ ℕ.

Output: The (unique) solution ∊1, . . . , ∊n ∈ {0, 1} of ∊1a1 + ··· + ∊nan = s, if it exists; “failure”, otherwise.

Steps:

for i = n, n – 1, . . . , 1 {
   if (s ≥ ai) { ∊i := 1, s := s – ai. } else { ∊i := 0. }
}
if (s = 0) { Return (∊1, . . . , ∊n). } else { Return “failure”. }
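A direct Python transcription of Algorithm 4.10 (0-based indexing; the function name is ours):

```python
def solve_superincreasing(a, s):
    """Solve the subset sum problem for a superincreasing knapsack set
    a[0] < a[1] < ... < a[n-1] (Algorithm 4.10).  Returns the 0/1 tuple
    (eps_1, ..., eps_n), or None if no solution exists."""
    eps = [0] * len(a)
    for i in range(len(a) - 1, -1, -1):   # greedily, from the largest weight
        if s >= a[i]:
            eps[i] = 1
            s -= a[i]
    return tuple(eps) if s == 0 else None
```

For example, with the superincreasing set {1, 2, 4, 8, 17} and s = 22 = 1 + 4 + 17, the greedy scan returns (1, 0, 1, 0, 1), while s = 16 correctly yields failure.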

4.8.1. The Low-Density Subset Sum Problem

We start by defining a lattice.

Definition 4.7.

Let n, d ∈ ℕ, d ≤ n, and let v1, . . . , vd ∈ ℝ^n be d linearly independent (non-zero) vectors (that is, n-tuples). The lattice L of dimension d spanned by v1, . . . , vd is the set of all ℤ-linear combinations of v1, . . . , vd, that is,

L := {m1v1 + ··· + mdvd | m1, . . . , md ∈ ℤ}.

We say that v1, . . . , vd constitute a basis of L.

In general, a lattice may have more than one basis. We are interested in bases consisting of short vectors, where the concept of shortness is with respect to the following definition.

Definition 4.8.

Let v := (v1, . . . , vn)^t and w := (w1, . . . , wn)^t be two n-dimensional vectors in ℝ^n. The inner product of v and w is defined to be the real number

〈v, w〉 := v1w1 + ··· + vnwn,

and the length of v is defined as ‖v‖ := √〈v, v〉.

For the time being, let us assume the availability of a lattice oracle which, given a lattice, returns a shortest non-zero vector in the lattice. The possibilities for realizing such an oracle will be discussed in the next section.

Consider the subset sum problem with the knapsack set A := {a1, . . . , an} and let B be an upper bound on the weights (that is, each ai ≤ B). For s ∈ ℕ, we are supposed to find ∊1, . . . , ∊n ∈ {0, 1} such that ∊1a1 + ··· + ∊nan = s. Let L be the (n + 1)-dimensional lattice in ℝ^(n+1) generated by the vectors

v1 := (1, 0, . . . , 0, Na1)^t,   . . . ,   vn := (0, 0, . . . , 1, Nan)^t,   vn+1 := (1/2, 1/2, . . . , 1/2, Ns)^t,

where N is an integer larger than (1/2)√n. The vector v := ∊1v1 + ··· + ∊nvn – vn+1 = (∊1 – 1/2, . . . , ∊n – 1/2, 0)^t is in the lattice L, with ‖v‖ = (1/2)√n. Involved calculations (carried out in Coster et al. [64, 65]) show that the probability P of the existence of a vector w ∈ L, w ≠ ±v, with ‖w‖ ≤ ‖v‖ satisfies a bound which tends to 0 for B ≥ 2^(cn), where c ≈ 1.0628. Now, if the density d(A) of A is less than 1/c ≈ 0.9408, then B = 2^(c′n) for some c′ > c and, therefore, P → 0 as n → ∞. In other words, if d(A) < 0.9408, then, with a high probability, ±v are the shortest non-zero vectors of L. The lattice oracle then returns such a vector, from which the solution ∊1, . . . , ∊n can be readily computed.

4.8.2. The Lattice-Basis Reduction Algorithm

Let L be a lattice in ℝ^n specified by a basis of n linearly independent vectors v1, . . . , vn. We now construct a basis v1*, . . . , vn* of ℝ^n such that 〈vi*, vj*〉 = 0 (that is, vi* and vj* are orthogonal to each other) for all i, j, i ≠ j. Note that v1*, . . . , vn* need not be a basis for L. Algorithm 4.11 is known as the Gram–Schmidt orthogonalization procedure.

Algorithm 4.11. Gram–Schmidt orthogonalization

Input: A basis v1, . . . , vn of ℝ^n.

Output: The Gram–Schmidt orthogonalization of v1, . . . , vn.

Steps:

v1* := v1.
for i = 2, . . . , n {
   vi* := vi – μi,1v1* – ··· – μi,i–1v*i–1, where μi,j := 〈vi, vj*〉/〈vj*, vj*〉.
}

One can easily verify that v1*, . . . , vn* constitute an orthogonal basis of ℝ^n. Using these notations, we introduce the following important concept:
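Algorithm 4.11 translates directly into Python. Using exact rational arithmetic keeps the orthogonality exact; the function name and return convention are our own:

```python
from fractions import Fraction

def gram_schmidt(v):
    """Gram-Schmidt orthogonalization (Algorithm 4.11) over the rationals.
    Returns (vstar, mu) with vstar[i] = v[i] - sum_{j<i} mu[i][j]*vstar[j],
    the vstar[i] being pairwise orthogonal."""
    n = len(v)
    v = [[Fraction(x) for x in row] for row in v]
    dot = lambda u, w: sum(a * b for a, b in zip(u, w))
    vstar = []
    mu = [[Fraction(0)] * n for _ in range(n)]
    for i in range(n):
        w = v[i][:]
        for j in range(i):
            mu[i][j] = dot(v[i], vstar[j]) / dot(vstar[j], vstar[j])
            w = [wi - mu[i][j] * sj for wi, sj in zip(w, vstar[j])]
        vstar.append(w)
    return vstar, mu
```

For the basis (1, 1, 1), (1, 0, 1), (0, 1, 1) this yields the orthogonal vectors (1, 1, 1), (1/3, –2/3, 1/3), (–1/2, 0, 1/2), with μ2,1 = 2/3.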

Definition 4.9.

The basis v1, . . . , vn is called a reduced basis of L, if

Equation 4.16

|μi,j| ≤ 1/2 for 1 ≤ j < i ≤ n,
and

Equation 4.17

‖vi* + μi,i–1v*i–1‖² ≥ (3/4)‖v*i–1‖² for 1 < i ≤ n.
A reduced basis v1, . . . , vn of L is termed so, because the vectors vi are somewhat short. More precisely, we have Theorem 4.5, the proof of which is not difficult, but is involved, and is omitted here.

Theorem 4.5.

Let v1, . . . , vn be a reduced basis of a lattice L, and let 1 ≤ m ≤ n. For any m linearly independent vectors w1, . . . , wm of L, we have

‖vi‖² ≤ 2^(n–1) max(‖w1‖², . . . , ‖wm‖²)

for all i = 1, . . . , m. In particular, for any non-zero vector w of L we have

‖v1‖² ≤ 2^(n–1)‖w‖².

That is, for a reduced basis v1, . . . , vn of L the length of v1 is at most 2^((n–1)/2) times that of the shortest non-zero vector in L.

Given an arbitrary basis v1, . . . , vn of a lattice L, the L3 basis reduction algorithm computes a reduced basis of L. The algorithm starts by computing the Gram–Schmidt orthogonalization v1*, . . . , vn* of v1, . . . , vn. The rational numbers μi,j are also available from this step. We also obtain as byproducts the numbers Vi := ‖vi*‖² for i = 1, . . . , n.

Algorithm 4.12 enforces Condition (4.16) |μk,l| ≤ 1/2 for a given pair of indices k and l. The essential work done by this routine is subtracting a suitable multiple of vl from vk and updating the values μk,1, . . . , μk,l accordingly.

Algorithm 4.12. Subroutine for basis reduction

Input: Two indices k and l.

Output: An update of the basis vectors to ensure |μk,l| ≤ 1/2.

Steps:

r := the integer nearest to μk,l.   vk := vk – rvl.

for h = 1, . . . , l – 1 {μk,h := μk,hrμl,h. }

μk,l := μk,lr.

If Condition (4.17) is not satisfied by some k, that is, if Vk < (3/4 – μ²k,k–1)Vk–1, then vk and vk–1 are swapped. The necessary changes in the values Vk, Vk–1 and certain μi,j’s should also be incorporated. This is explained in Algorithm 4.13.

Algorithm 4.13. Subroutine for basis reduction

Input: An index k.

Output: An update of the basis vectors to ensure Condition (4.17) for the index k.

Steps:

μ := μk,k–1.   V := Vk + μ²Vk–1.
μk,k–1 := μVk–1/V.   Vk := Vk–1Vk/V.   Vk–1 := V.
Swap (vkvk–1).
for h = 1, . . . , k – 2 { Swap (μk,h, μk–1,h). }
for h = k + 1, . . . , n {
   μ′ := μh,k–1 – μ·μh,k.   μh,k–1 := μh,k + μk,k–1·μ′.   μh,k := μ′.
}

The main basis reduction algorithm is described in Algorithm 4.14. It is not obvious that this algorithm should terminate at all. Consider the quantity D := d1 ··· dn–1, where di := |det(〈vk, vl〉)1≤k,l≤i| for each i = 1, . . . , n – 1. At the beginning of the basis reduction procedure one has di ≤ B^i for all i, where B := max(‖vi‖² | 1 ≤ i ≤ n). It can be shown that an invocation of Algorithm 4.12 does not alter the value of D, whereas interchanging vi and vi–1 in Algorithm 4.13 decreases D by a factor < 3/4. It can also be shown that for any basis of L the value D is bounded from below by a constant which depends only on the lattice. Thus, Algorithm 4.14 stops after finitely many steps.

Algorithm 4.14. Basis reduction in a lattice

Input: A basis v1, . . . , vn of a lattice L.

Output: v1, . . . , vn converted to a reduced basis.

Steps:

Compute the Gram–Schmidt orthogonalization of v1, . . . , vn (Algorithm 4.11).

/* The initial values of μi,j and Vi are available at this point */
i := 2.
while (i ≤ n) {
   if (|μi,i–1| > 1/2) { Call Algorithm 4.12 with k = i and l = i – 1. }
   if (Vi < (3/4 – μ²i,i–1)Vi–1) {
      Call Algorithm 4.13 with k = i.
      i := max(2, i – 1).
   }
   for j = i – 2, i – 3, . . . , 1 {
      if (|μi,j| > 1/2) { Call Algorithm 4.12 with k = i and l = j. }
   }
   i++.
}
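For concreteness, here is a compact (and deliberately inefficient) Python rendering of the reduction loop of Algorithm 4.14. Instead of the incremental updates of Algorithms 4.12 and 4.13, it simply recomputes the Gram–Schmidt data after every change; naming and conventions are our own:

```python
from fractions import Fraction

def lll_reduce(basis, delta=Fraction(3, 4)):
    """L3 lattice-basis reduction with exact rational arithmetic.
    Recomputes the Gram-Schmidt data (star, mu) after every change
    rather than updating it incrementally; correct but slow."""
    b = [[Fraction(x) for x in v] for v in basis]
    n = len(b)
    dot = lambda u, w: sum(p * q for p, q in zip(u, w))

    def gso():                      # Gram-Schmidt orthogonalization
        star, mu = [], [[Fraction(0)] * n for _ in range(n)]
        for i in range(n):
            w = b[i][:]
            for j in range(i):
                mu[i][j] = dot(b[i], star[j]) / dot(star[j], star[j])
                w = [wi - mu[i][j] * sj for wi, sj in zip(w, star[j])]
            star.append(w)
        return star, mu

    star, mu = gso()
    k = 1
    while k < n:
        for j in range(k - 1, -1, -1):        # size reduction (4.16)
            if abs(mu[k][j]) > Fraction(1, 2):
                r = round(mu[k][j])           # nearest integer
                b[k] = [bk - r * bj for bk, bj in zip(b[k], b[j])]
                star, mu = gso()
        if dot(star[k], star[k]) >= (delta - mu[k][k - 1] ** 2) * dot(star[k - 1], star[k - 1]):
            k += 1                            # Lovasz condition (4.17) holds
        else:
            b[k], b[k - 1] = b[k - 1], b[k]   # swap and step back
            star, mu = gso()
            k = max(k - 1, 1)
    return [[int(x) for x in v] for v in b]
```

Since the algorithm only swaps basis vectors and subtracts integer multiples of one from another, the lattice (and hence |det L|) is preserved, and the first output vector obeys the bound of Theorem 4.5.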

For a more complete treatment of the L3 basis reduction algorithm, we refer the reader to Lenstra et al. [166] (or Mignotte [203]). It is important to note here that the L3 basis reduction algorithm is at the heart of the Lenstra–Lenstra–Lovasz algorithm for factoring a polynomial in ℚ[X]. This factoring algorithm indeed runs in time polynomial in the degree of the polynomial to be factored and is one of the major breakthroughs in the history of symbolic computing.

Exercise Set 4.8

4.27 Let A = {a1, . . . , an} ⊆ ℕ be a knapsack set. Show that:
  1. If A is superincreasing with a1 < ··· < an, then ai ≥ 2^(i–1) for all i = 1, . . . , n and hence d(A) ≤ n/(n – 1).

  2. If d(A) > 1 (with n sufficiently large), then there exist two different tuples (∊1, . . . , ∊n) and (∊′1, . . . , ∊′n) in {0, 1}^n such that ∊1a1 + ··· + ∊nan = ∊′1a1 + ··· + ∊′nan.

4.28 Let L be a lattice in ℝ^n and let v1, . . . , vn constitute a basis of L. The determinant of L is defined by

det L := |det(v1, . . . , vn)|.

  1. Show that det L is an invariant of the lattice L (that is, independent of the basis v1, . . . , vn of L).

    Let v1*, . . . , vn* be the Gram–Schmidt orthogonalization of the basis v1, . . . , vn.

  2. Show that det L = ‖v1*‖ ··· ‖vn*‖.

  3. Prove the Hadamard inequality: det L ≤ ‖v1‖ · · · ‖vn‖.

Chapter Summary

This chapter introduces the most common computationally intractable mathematical problems on which the security of public-key cryptosystems rests. We also describe some of the algorithms known to date for solving these difficult computational problems.

To start with, we enumerate these computational problems. The first of these is the integer factorization problem (IFP) and its several variants. Some problems that are provably or believably equivalent to the IFP are the totient problem, problems associated with the RSA algorithm, and the modular square root problem. The next class of problems includes the discrete logarithm problem (DLP) and its variants on elliptic curves (ECDLP) and hyperelliptic curves (HECDLP). The Diffie–Hellman problem (DHP) and its variants (ECDHP, HECDHP) are believed to be equivalent to the respective variants of the DLP. Finally, the subset sum problem (SSP) and two related problems, namely the shortest vector problem (SVP) and the closest vector problem (CVP) on lattices, are introduced.

The subsequent sections are devoted to an algorithmic study of these difficult problems. We start with IFP. We first present some fully exponential algorithms like trial division, Pollard’s rho method, Pollard’s p – 1 method and Williams’ p + 1 method. Next we describe the modern genre of subexponential algorithms. The quadratic sieve method (QSM) is discussed at length together with its heuristic improvements like incomplete sieving, large prime variation and the multiple polynomial variant. We also describe TWINKLE, a hardware device that efficiently implements the sieving stage of the QSM. We then discuss the elliptic curve method (ECM) and the number field sieve method (NFSM) for factoring integers. The NFSM turns out to be the asymptotically fastest known algorithm for factoring integers.

The (finite field) DLP is discussed next. The older square-root methods, such as Shanks’ baby-step–giant-step method (BSGS), Pollard’s rho method and the Pohlig–Hellman method (PHM), take exponential running times in the worst case. The PHM for a field F_q is, however, efficient if q – 1 has only small prime factors. Next we discuss the modern family of algorithms collectively known as the index calculus method (ICM). For prime fields, we discuss three variants of the ICM, namely the basic method, the linear sieve method (LSM) and the number field sieve method (NFSM). We also discuss three variants of the ICM for fields of characteristic 2: the basic method, the linear sieve method and Coppersmith’s algorithm. Another interesting variant is the cubic sieve method (CSM) covered in the exercises. We explain Gordon and McCurley’s polynomial sieving in connection with Coppersmith’s algorithm.

The next section deals with algorithms for solving the ECDLP. For a general elliptic curve, the exponential square-root methods are the only known algorithms. For some special classes of curves, more efficient methods are proposed in the literature. The MOV reduction based on the Weil pairing reduces the ECDLP on a curve over F_q to the DLP in the finite field F_(q^k) for some suitable k ∈ ℕ. This k is small, and the reduction efficient, for supersingular curves. The SmartASS method (also called the anomalous method) reduces the ECDLP in an anomalous curve to the computation of p-adic discrete logarithms. This reduction solves the original DLP in polynomial time. In view of these algorithms, it is preferable to avoid supersingular and anomalous curves in cryptographic applications. The xedni calculus method (XCM) is discussed finally. This algorithm works by lifting a curve over F_p to a curve over ℚ. Experimental and theoretical evidence suggests that the XCM is not an efficient solution to the ECDLP.

We then devote a section to the study of an index calculus method to solve the HECDLP. For hyperelliptic curves of small genus, this method leads to a subexponential algorithm (the ADH–Gaudry algorithm).

Many of the above subexponential methods require solving a system of linear congruences over finite rings. This (inherently sequential) linear algebra part often turns out to be the bottleneck of the algorithms. However, the fact that these equations are necessarily sparse can be effectively exploited, and some faster algorithms can be used to solve these systems. We study four such algorithms: structured Gaussian elimination, the conjugate gradient method, the Lanczos method and the Wiedemann method.

In the last section, we study the subset sum problem. We first reduce the SSP to problems associated with lattices. We finally present the lattice-basis reduction algorithm due to Lenstra, Lenstra and Lovasz.

Several other computationally intractable problems have been proposed in the literature for building cryptographic systems. Some of these problems are mentioned in the annotated references of Chapter 5. Due to space and time limitations, we will not discuss these problems in this book.

Suggestions for Further Reading

The integer factorization problem is one of the oldest computational problems. Though the exact notion of computational complexity took shape only after the advent of computers, the apparent difficulty of solving the factorization problem was noticed centuries ago. Crandall and Pomerance [69] call it the fundamental computational problem of arithmetic. Numerous books and articles provide discussions on this subject at varying levels of coverage. Crandall and Pomerance [69] is perhaps the most extensive in this regard. The reader can also take a look at Bressoud’s (much simpler) book [36] or the (compact, yet reasonably detailed) Chapter 10 of Henri Cohen’s book [56]. The articles by Lenstra et al. [164] and by Montgomery [211] are also worth reading.

John M. Pollard has his name attached to three modern inventions in the arena of integer factorization. In [238, 239], he introduces the rho and the p – 1 methods. (He was later part of the team that designed the number field sieve factoring algorithm.) Williams’ p + 1 method appears in 1982 in [305].

The continued fraction method (CFRAC) is apparently the first known subexponential-time integer factoring algorithm. It is based on the work of Lehmer and Powers [162] and first appears in its currently used form in Morrison and Brillhart’s paper [213]. CFRAC was the most widely used integer factoring algorithm during the late 1970s and early 1980s.

The quadratic sieve method, invented by Carl Pomerance [241] in 1984, supersedes the CFRAC method. The multiple-polynomial QSM appears in Silverman [279]. Hendrik Lenstra’s elliptic curve method [174] is proposed almost concurrently with the QSM. Nowadays, the QSM and the ECM are the most commonly used factoring methods. Reyneri’s cubic sieve method is described in Lenstra and Lenstra [165].

The theoretically superior number field sieve method follows from Pollard’s factoring method using cubic integers [240]. The initial proposal for the NFS method is that of the simple NFS and appears in Lenstra et al. [167]. It is later modified to the general NFS method in Buhler et al. [41]. Lenstra and Lenstra [165] is a compilation of papers on the NFS method. Though the NFS method is the asymptotically fastest factoring method, its fairly complicated implementation makes it superior to the QSM or the ECM only when the bit size of the integer to be factored is sufficiently large.

Shamir’s factoring engine TWINKLE is proposed in [269]. A. K. Lenstra and Shamir analyse and optimize its design in [168]. Shamir and Tromer [270] have proposed a device called TWIRL (The Weizmann Institute Relation Locator) that is geared to the NFS factoring method. It is estimated that a TWIRL implementation costing US$10K can complete the sieving for a 512-bit RSA modulus in less than 10 minutes, whereas one that does the same for a 1024-bit RSA modulus costs US$10–50M and takes about one year. Lenstra et al. [163] provide a more detailed analysis of these estimates. See Lenstra et al. [169] for Bernstein’s factorization circuit, another proposed implementation of the NFS factoring method.

The (finite field) discrete logarithm problem has also attracted much research over the last few decades. The older square-root methods are described well in the book [191] by Menezes. Donald Knuth attributes the baby-step–giant-step method to Daniel Shanks. See Stein and Teske [290] for various optimizations of the baby-step–giant-step method. Pollard’s rho method is an adaptation of his rho method for integer factorization. See Pohlig and Hellman [234] for the Pohlig–Hellman method.

The first idea of the index calculus method appears in Western and Miller [302]. Coppersmith et al. [59] describe three variants of the index calculus method: the linear sieve method, the residue list sieve method and the Gaussian integer method. The same paper also proposes the cubic sieve method (CSM). LaMacchia and Odlyzko [158] describe an implementation of the linear sieve and the Gaussian integer methods. Das and Veni Madhavan [73] make an implementation study of the CSM. Also look at the survey [189] by McCurley.

Gordon [119] uses number field sieves for computing discrete logarithms over prime fields. Weber et al. [261, 299, 300, 301] have implemented the number field sieve method and demonstrated its practicality. Also see Schirokauer’s paper [260].

Odlyzko [225] surveys the algorithms for computing discrete logs in the fields F2^n. The best algorithm for these fields is Coppersmith’s algorithm [57]. No analog of this algorithm is known for prime fields. Gordon and McCurley [120] use Coppersmith’s algorithm for the computation of discrete logarithms in such fields.

The article [226] by Odlyzko and the one [242] by Pomerance are two recent surveys on the finite field discrete logarithm problem. Also see Buchmann and Weber [40].

The elliptic curve discrete logarithm problem seems to be a very difficult computational problem. A direct adaptation of the index calculus method is expected to lead to a running time worse than that of brute-force search (Silverman and Suzuki [278] and Blake et al. [24]). Menezes et al. [193] reduce the problem of computing discrete logs in an elliptic curve over Fq to computing discrete logs in the field Fq^k for some k. For supersingular elliptic curves, this k can be chosen to be small. For a general curve, the MOV reduction takes exponential time (Balasubramanian and Koblitz [16]). The SmartASS method is due to Smart [282], Satoh and Araki [257] and Semaev [265]. Joseph H. Silverman proposes the xedni calculus method in [277]. This method has been experimentally and heuristically shown to be impractical by Jacobson et al. [139].

Adleman et al. [2] propose the first subexponential algorithm for the hyperelliptic curve discrete log problem. This algorithm is applicable for curves of high genus over prime fields. The analysis of its running time is based on certain heuristic assumptions. Enge [86] provides a subexponential algorithm which has a rigorously provable running time and which works for curves over an arbitrary finite field. Again, the algorithm demands curves of high genus. An implementation of the Adleman–DeMarrais–Huang algorithm is given by Gaudry [105]. Also see Enge and Gaudry [87].

Gaudry et al. [107] propose a Weil-descent attack for the hyperelliptic curve discrete log problem. This is modified in Galbraith [100] and Galbraith et al. [101].

Coppersmith et al. [59] describe sparse system solvers. LaMacchia and Odlyzko [159] implement these methods. For further details, see Montgomery [212], Coppersmith [58], Wiedemann [303], and Yang and Brent [306].

That public-key cryptosystems can be based on the subset-sum problem (or the knapsack problem) was considered at the beginning of the era of public-key cryptography. Historically, the first realization of a public-key system is along these lines and is due to Merkle and Hellman [196]. But the Merkle–Hellman system and several of its variants have been broken; see Shamir [266], for example. At present, most public-key systems based on the subset-sum problem are known to be insecure.

The lattice-basis reduction algorithm and the associated L3 algorithm for factoring polynomials appear in the celebrated work [166] of Lenstra, Lenstra and Lovász. Mignotte’s book [203] also describes these topics in good detail.

5. Cryptographic Algorithms

5.1 Introduction
5.2 Secure Transmission of Messages
5.3 Key Exchange
5.4 Digital Signatures
5.5 Entity Authentication
 Chapter Summary
 Suggestions for Further Reading

An essential element of freedom is the right to privacy, a right that cannot be expected to stand against an unremitting technological attack.

—Whitfield Diffie

Mary had a little key (It’s all she could export), and all the email that she sent was opened at the Fort.

—Ronald L. Rivest

Treat your password like your toothbrush. Don’t let anybody else use it, and get a new one every six months.

—Clifford Stoll

5.1. Introduction

As we pointed out in Chapter 1, cryptography aims to guard sensitive data from unauthorized access. We now describe some algorithms that achieve this goal, restricting ourselves to public-key algorithms. In practice, however, public-key algorithms are used in tandem with secret-key algorithms. In this chapter, we describe only the basic routines, whose inputs are mathematical entities like integers, elements of finite fields, or points on curves. Message encoding will be dealt with in Chapter 6.

5.2. Secure Transmission of Messages

Consider the standard scenario: a party named Alice, called the sender, wishes to send a secret message m to a party named Bob, called the receiver or recipient, over a public communication channel. A third party Carol may intercept and read the message. In order to maintain the secrecy of the message, Alice uses a well-defined transform fe to convert the plaintext message m to the ciphertext message c, and sends c to Bob. Bob possesses some secret information with the help of which he applies the reverse transform fd to get back m. Carol, who is expected not to know the secret information, cannot retrieve m from c by applying fd.

In a public-key system, the realization of the transforms fe and fd is based on a key pair (e, d) predetermined by Bob. The public key e is made public, whereas the private key d is kept secret. The encryption transform generates c = fe(m, e). Since e is public knowledge, anybody can generate c from a given m, whereas the decryption transform m = fd(c, d) can be performed only by Bob, who possesses the knowledge of d. The key pair has to be chosen so that the knowledge of e does not allow Carol to compute d in feasible time. The intractability of the computational problems discussed in Chapter 4 can be exploited to design such key pairs. The exact realization of the keys e, d and the transforms fe, fd depends on the choice of the underlying intractable problem and also on the way to make use of the problem. Since there are several intractable problems suitable for cryptography, there are several encryption schemes varying widely in algorithmic and mathematical details.

5.2.1. The RSA Public-key Encryption Algorithm

RSA has been the most popular encryption algorithm. Historically, too, it is the first public-key encryption algorithm published in the literature (see Rivest et al. [252]). Its security is based on the intractability of the RSAP (or the RSAKIP) discussed in Exercise 4.2. Since both these problems are polynomial-time reducible to the IFP, we often say that the RSA algorithm derives its security from the intractability of the IFP. It may, however, be the case that breaking RSA is easier than factoring integers, though no concrete evidence seems to be available.

RSA key pair

Algorithm 5.1 generates a key pair for RSA.

Algorithm 5.1. RSA key generation

Input: A bit length l.

Output: A random RSA key pair.

Steps:

Generate two different random primes p and q each of bit length l.

n := pq.

Choose an integer e coprime to φ(n) = (p – 1)(q – 1).

d := e^(–1) (mod φ(n)).

Return the pair (n, e) as the public key and the pair (n, d) as the private key.

The length l of the primes p and q should be chosen large enough so as to make the factorization of n infeasible. For short-term security, values of l between 256 and 512 suffice. For long-term security, one may choose l as large as 2,048.

The random primes p and q can be generated using a probabilistic algorithm like those described in Section 3.4.2. Naive primes are normally considered to be sufficiently secure in this respect, since p ± 1 and q ± 1 are expected to have large prime factors in general. Gordon’s algorithm (Algorithm 3.14) can also be used for generating strong primes p and q. Since Gordon’s algorithm runs only nominally slower than the algorithm for generating naive primes, there is no harm in using strong primes. Safe primes, on the other hand, are difficult to generate and may be avoided.

The RSA modulus n is public knowledge. Determining d from n and e is easily doable, given the value of φ(n) = (p – 1)(q – 1) which, in turn, is readily computable, if p and q are known. If an adversary can compute φ(n) (with or without factoring n), the security of the RSA protocol based on the modulus n is compromised. However, computing φ(n) without the knowledge of p and q is (at least historically) a very difficult computational problem, and so, if n is reasonably large, RSA encryption is assumed to be sufficiently secure.

RSA encryption is done by raising the plaintext message m to the power e modulo n. In order to speed up this (modular) exponentiation, it is often expedient to take a small value for e (like 3, 257 and 65,537). However, in that case one should adopt certain precautions, as Exercise 5.2 suggests. More specifically, if e entities share a common (small) encryption exponent e but different (pairwise coprime) moduli, and if the same message m is encrypted using all these public keys, then an eavesdropper can reconstruct m easily from a knowledge of the e ciphertext messages. Another potential problem with using a small e is that if m is small, that is, if m < n^(1/e), then m can be retrieved by taking the integer e-th root of the ciphertext message.

Although the pair (n, d) is sufficient for carrying out RSA decryption, maintaining some additional (secret) information significantly speeds up decryption. To this end, it is often recommended that some or all of the values n, e, d, p, q, d1, d2, h be stored, where d1 := d rem (p – 1), d2 := d rem (q – 1) and h := q^(–1) (mod p).

If n can be factored, then d can be easily computed from the public key (n, e). Conversely, if n, e, d are all known, there is an efficient probabilistic algorithm which factors n. This algorithm is based on the fact that if ed – 1 = 2^s · t with t odd, then for at least half of the integers a coprime to n, there exists a σ with 0 ≤ σ < s such that a^(2^σ · t) ≢ ±1 (mod n), whereas a^(2^(σ+1) · t) ≡ 1 (mod n). But then the gcd of n and a^(2^σ · t) – 1 is a non-trivial factor of n. For the details, solve Exercise 7.9.
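The reduction just described can be sketched in Python as follows. The helper name is ours, and the sketch relies on the fact from the text that ed – 1 is a multiple of φ(n), so that a^(ed–1) ≡ 1 (mod n) for every a coprime to n.

```python
import math
import random

def factor_with_private_key(n, e, d):
    """Recover a prime factor of n = pq from a full key pair (n, e, d).

    Write e*d - 1 = 2^s * t with t odd; at least half of the bases a
    lead to a non-trivial square root of 1 modulo n, whose gcd with n
    reveals a factor."""
    k = e * d - 1
    s, t = 0, k
    while t % 2 == 0:
        s, t = s + 1, t // 2
    while True:
        a = random.randrange(2, n - 1)
        g = math.gcd(a, n)
        if g > 1:
            return g                   # unlikely luck: a shares a factor
        b = pow(a, t, n)
        if b in (1, n - 1):
            continue                   # this base reveals nothing; retry
        for _ in range(s):
            c = b * b % n
            if c == 1:                 # b^2 = 1 but b != +-1 (mod n)
                return math.gcd(b - 1, n)
            if c == n - 1:
                break                  # dead end for this base; retry
            b = c
```

For the toy key n = 3233, e = 17, d = 2753, the routine returns 53 or 61, the two prime factors of 3233.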

Different entities in a given network should use different values of n. If two or more entities share a common n but different exponent pairs (ei, di), then each of these entities can first factor n (using its own pair, as described above) and then use this factorization to compute the private keys of the other entities. Primes are quite abundant, and so finding pairwise coprime RSA moduli for all entities poses no problem at all. A common value of the encryption exponent e (for example, a small value of e) can, however, be shared by all entities. In that case, for pairwise different moduli ni, the corresponding decryption exponents di will also be pairwise different.

RSA encryption

RSA encryption is rather simple, as Algorithm 5.2 shows.

Algorithm 5.2. RSA encryption

Input: The RSA public key (n, e) of the recipient and the plaintext message m ∈ Zn.

Output: The ciphertext message c ∈ Zn.

Steps:

c := m^e (mod n).

By Exercise 4.1, the exponentiation function m ↦ m^e is bijective; so m can be uniquely recovered from c. It is clear why small encryption exponents e speed up RSA encryption. For a general exponent e, the routine takes time O(log^3 n), whereas for a small e (that is, e = O(1)) the running time drops to O(log^2 n).

RSA decryption

RSA decryption (Algorithm 5.3) is analogous to RSA encryption.

Algorithm 5.3. RSA decryption

Input: The RSA private key (n, d) of the recipient and the ciphertext message c ∈ Zn.

Output: The recovered plaintext message m ∈ Zn.

Steps:

m := c^d (mod n).

The correctness of this decryption procedure follows from Exercise 4.1. As in the case of encryption, one might go for small decryption exponents d. In general, both e and d cannot be small simultaneously. If e is small, the security of the RSA scheme is expected not to be affected, whereas small values of d are undesirable for several reasons. First, if d is very small, the adversary chooses some m, computes the corresponding ciphertext c (using public knowledge) and then keeps on computing c^x (mod n) for x = 1, 2, . . ., until x = d is reached, that is, until the original message m is recovered.

Even when d is not very small so that the possibility of exhaustive search with x = 1, 2, . . . can be precluded, there are several attacks known for small private exponents. Wiener [304] proposes an efficient algorithm in this respect. Boneh and Durfee [32] improve Wiener’s algorithm. Sun et al. [294] propose three variants of the RSA scheme that are resistant to these attacks. Durfee and Nguyen [82] extend the Boneh–Durfee attack to break two of these three variants. To sum up, it is advisable not to use small secret exponents d, that is, the bit length of d should be close to that of n in order to achieve the desired level of security.
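Algorithms 5.1–5.3 can be sketched together in Python. This is a toy illustration: the Miller–Rabin test stands in for the primality tests of Section 3.4.2, the fixed exponent e = 65537 and all function names are our choices, the modular inverse via `pow` needs Python 3.8+, and real deployments require padding and far larger parameters.

```python
import random
from math import gcd

def probably_prime(n, rounds=40):
    """Miller-Rabin test, standing in for the tests of Section 3.4.2."""
    if n < 4:
        return n in (2, 3)
    if n % 2 == 0:
        return False
    s, t = 0, n - 1
    while t % 2 == 0:
        s, t = s + 1, t // 2
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        b = pow(a, t, n)
        if b in (1, n - 1):
            continue
        for _ in range(s - 1):
            b = pow(b, 2, n)
            if b == n - 1:
                break
        else:
            return False               # a witnesses compositeness
    return True

def random_prime(l):
    """A random l-bit (naive) prime."""
    while True:
        p = random.getrandbits(l) | (1 << (l - 1)) | 1   # force l bits, odd
        if probably_prime(p):
            return p

def rsa_keygen(l, e=65537):
    """Algorithm 5.1, with a popular fixed encryption exponent."""
    while True:
        p, q = random_prime(l), random_prime(l)
        phi = (p - 1) * (q - 1)
        if p != q and gcd(e, phi) == 1:
            d = pow(e, -1, phi)            # e^(-1) mod phi(n); Python 3.8+
            return (p * q, e), (p * q, d)  # public key, private key

def rsa_encrypt(pub, m):
    n, e = pub
    return pow(m, e, n)                    # Algorithm 5.2: c = m^e mod n

def rsa_decrypt(prv, c):
    n, d = prv
    return pow(c, d, n)                    # Algorithm 5.3: m = c^d mod n
```

Note that Python's `random` module is not a cryptographic source of randomness; a real implementation would draw from the `secrets` module or an equivalent.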

There are alternative ways to speed up RSA decryption. If the values p, q, d1 := d rem (p – 1), d2 := d rem (q – 1) and h := q^(–1) (mod p) are all available to the recipient, he can use Algorithm 5.4 for RSA decryption.

Algorithm 5.4. RSA decryption using CRT

Input: The RSA extended private key (p, q, d1, d2, h) of the recipient and the ciphertext message .

Output: The recovered plaintext message .

Steps:

m1 := c^d1 (mod p).

m2 := c^d2 (mod q).

t := h(m1 – m2) (mod p).

m := m2 + tq.

In this modified routine, m1 := m rem p and m2 := m rem q are first computed and then combined using the CRT to get m modulo n = pq. Algorithm 5.3 performs a single modular exponentiation modulo n, whereas in Algorithm 5.4 two exponentiations modulo p and q respectively take the major portion of the running time. Since an exponentiation modulo N with an exponent of size O(N) runs in time O(log^3 N), and since each of p and q has bit length (about) half that of n, Algorithm 5.4 runs about four times as fast as Algorithm 5.3.
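A Python sketch of Algorithm 5.4 follows. For brevity, d1, d2 and h are derived on the fly from (p, q, d) rather than stored; the function name is ours, and the modular inverse via `pow` needs Python 3.8+.

```python
def rsa_decrypt_crt(p, q, d, c):
    """Algorithm 5.4: RSA decryption via the CRT."""
    d1, d2 = d % (p - 1), d % (q - 1)      # d rem (p-1), d rem (q-1)
    h = pow(q, -1, p)                      # h = q^(-1) mod p (Python 3.8+)
    m1 = pow(c, d1, p)                     # m rem p
    m2 = pow(c, d2, q)                     # m rem q
    t = h * (m1 - m2) % p
    return m2 + t * q                      # CRT combination: m mod pq
```

With the toy parameters p = 61, q = 53, e = 17 and m = 65, the routine recovers 65 from the ciphertext 65^17 mod 3233, matching plain decryption.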

If only the values p, q, d are stored, then d1, d2 and h can be computed on the fly using relatively inexpensive operations and subsequently Algorithm 5.4 can be used. This leads to a decryption routine almost as fast as Algorithm 5.4, with somewhat smaller memory requirements for the storage of the private key.

5.2.2. The Rabin Public-key Encryption Algorithm

The Rabin public-key encryption algorithm is based on the intractability of computing square roots modulo a composite integer (SQRTP). By Exercise 4.10, the SQRTP is probabilistically polynomial-time equivalent to the IFP, that is, breaking the Rabin scheme is provably as hard as factoring integers. Breaking RSA, on the other hand, is only believed to be equivalent to factoring integers. Moreover, Rabin encryption is faster than RSA encryption (for moduli of the same size).

Rabin key pair

Like RSA, Rabin encryption requires a modulus of the form n = pq.

Algorithm 5.5. Rabin key generation

Input: A bit length l.

Output: A random Rabin key pair.

Steps:

Generate two different random primes p and q each of bit length l.

n := pq.

Return n as the public key and the pair (p, q) as the private key.

Here, the choice of the bit length l and the generation of the primes p and q follow the same guidelines as discussed in connection with RSA key generation.

Rabin encryption

Encryption in the Rabin scheme involves a single modular squaring.

Algorithm 5.6. Rabin encryption

Input: The Rabin public key n of the recipient and the plaintext message m ∈ Zn.

Output: The ciphertext message c ∈ Zn.

Steps:

c := m^2 (mod n).

Unfortunately, the Rabin encryption map m ↦ m^2 (mod n) is not injective. In general, a ciphertext c has four square roots modulo n.[1] This poses an ambiguity during decryption. In order to work around this difficulty, one adds some distinguishing feature or redundancy to the message m before encryption. One possibility is to duplicate a predetermined number of bits at the least significant end of m. This reduces the message space somewhat, but is rarely a serious issue. Only one of the (four) square roots of the ciphertext c is expected to have the desired redundancy. If none or more than one square root possesses the redundancy, decryption fails. However, this is a very rare phenomenon and can be ignored for all practical purposes.

[1] More specifically, if an element c ∈ Zn is a square modulo both p and q, then the number of square roots of c modulo n equals 1 if c = 0; it is 2 if exactly one of c ≡ 0 (mod p) and c ≡ 0 (mod q) holds; and it is 4 if c ≢ 0 (mod p) and c ≢ 0 (mod q). If c is not a square modulo p or modulo q, then c does not possess a square root modulo n. These assertions can be readily proved using the Chinese remainder theorem.

Rabin decryption

Rabin decryption (Algorithm 5.7) involves computing square roots modulo n. Since n is composite, this is a very difficult problem (for the eavesdropper). But the knowledge of the prime factors p and q of n allows the recipient to decrypt.

Algorithm 5.7. Rabin decryption

Input: The Rabin private key (p, q) of the recipient and the ciphertext message c ∈ Zn.

Output: The recovered plaintext message m ∈ Zn.

Steps:

if (c is not a square modulo p or modulo q) { Return “c is not a ciphertext message”. }

Compute the square roots of c mod p./* Algorithm 3.17 */
Compute the square roots of c mod q./* Algorithm 3.17 */
Compute the square roots of c mod n from those mod p and q./* Use CRT */

if (c has exactly one distinguished square root m mod n) { Return m. }

else { Return “failure”. }
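A Python sketch of Algorithms 5.6 and 5.7 with bit-duplication redundancy follows. Two simplifications are ours: we take p ≡ q ≡ 3 (mod 4), so that the square roots produced by Algorithm 3.17 reduce to single exponentiations, and the redundancy width is an illustrative parameter.

```python
def rabin_encrypt(n, m, red_bits=16):
    """Algorithm 5.6 with redundancy: the low red_bits bits of m are
    duplicated before squaring, so that decryption can identify m."""
    m_red = (m << red_bits) | (m & ((1 << red_bits) - 1))
    assert m_red < n, "message too long for this modulus"
    return pow(m_red, 2, n)

def rabin_decrypt(p, q, c, red_bits=16):
    """Algorithm 5.7, specialized to p = q = 3 (mod 4): the square roots
    modulo p and q are single exponentiations, combined by the CRT."""
    n = p * q
    rp = pow(c, (p + 1) // 4, p)           # square root of c mod p
    rq = pow(c, (q + 1) // 4, q)           # square root of c mod q
    if (rp * rp - c) % p or (rq * rq - c) % q:
        return None                        # c is not a ciphertext message
    h = pow(q, -1, p)                      # q^(-1) mod p, for the CRT
    roots = set()
    for a in (rp, p - rp):                 # all four sign combinations
        for b in (rq, q - rq):
            roots.add((b + (h * (a - b) % p) * q) % n)
    mask = (1 << red_bits) - 1
    good = [r for r in roots if (r >> red_bits) & mask == r & mask]
    if len(good) != 1:
        return None                        # redundancy fails to disambiguate
    return good[0] >> red_bits
```

A wrong square root passes the redundancy check only with probability about 2^(-red_bits), which is the “very rare phenomenon” mentioned above.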

5.2.3. The Goldwasser–Micali Encryption Algorithm

So far, we have encountered encryption algorithms that are deterministic in the sense that for a given public key of the recipient the same plaintext message encrypts to the same ciphertext message. In a probabilistic encryption algorithm, different calls of the encryption routine produce different ciphertext messages for the same plaintext message and public key.

The Goldwasser–Micali encryption algorithm is probabilistic and is based on the intractability of the quadratic residuosity problem (QRP) described in Exercise 4.2. If n is a composite integer and a an integer coprime to n, then the Jacobi symbol value (a | n) = –1 implies that a is a quadratic non-residue modulo n. The converse does not hold, that is, one may have (a | n) = +1 even when a is a quadratic non-residue modulo n. For example, if n is the product of two distinct odd primes p and q, then a is a quadratic residue modulo n if and only if a is a quadratic residue modulo both p and q. However, if (a | p) = (a | q) = –1, we continue to have (a | n) = +1. There is no easy way to find out if a is a quadratic residue modulo n for an integer a with (a | n) = +1. If the factorization of n is available, the QRP is solvable in polynomial time. These observations lead to the design of the Goldwasser–Micali scheme.

Goldwasser–Micali key pair

The Goldwasser–Micali scheme works in the ring Zn, where n is the product of two distinct sufficiently large primes p and q. The integer a (resp. b) in Algorithm 5.8 can be found by randomly choosing elements of Zp* (resp. Zq*) and computing the Legendre symbol (a | p) (resp. (b | q)). Under the assumption that quadratic non-residues are randomly located in Zp* and Zq*, a and b can be found after only a few trials. The integer x is a quadratic non-residue modulo n with (x | n) = +1.

Goldwasser–Micali encryption

Goldwasser–Micali encryption (Algorithm 5.9) is probabilistic, since its output depends on a sequence of random elements ai of Zn*. It generates a tuple (c1, . . . , cr) of elements of Zn* such that each (ci | n) = +1. If mi = 0, then ci is a quadratic residue modulo n, whereas if mi = 1, ci is a quadratic non-residue modulo n. Therefore, if the quadratic residuosity of ci modulo n can be computed, the bit mi can be determined. If one (for example, the recipient) knows the factorization of n or equivalently the prime factor p of n, one can perform decryption easily. An eavesdropper, on the other hand, must solve the QRP (or the IFP) in order to find out the bits m1, . . . , mr. This is how Goldwasser–Micali encryption derives its security.

Algorithm 5.8. Goldwasser–Micali key generation

Input: A bit length l.

Output: A random Goldwasser–Micali key pair.

Steps:

Generate two (different) random primes p and q each of bit length l.

n := pq.

Find out integers a and b such that (a | p) = –1 and (b | q) = –1.

Compute an integer x with x ≡ a (mod p) and x ≡ b (mod q).          /* Use CRT */

Return the pair (n, x) as the public key and the prime p as the private key.

Algorithm 5.9. Goldwasser–Micali encryption

Input: The Goldwasser–Micali public key (n, x) of the recipient and the plaintext message m = m1 . . . mr, which is a bit string of length r.

Output: The ciphertext message (c1, . . . , cr), where each ci ∈ Zn*.

Steps:

for i = 1, . . . , r {
   Select a random element ai of Zn*.
   ci := ai^2 · x^mi (mod n).
}

Since randomly chosen non-zero elements of Zn are with high probability coprime to n, it is sufficient to draw ai from Zn \ {0} and skip the check whether gcd(ai, n) = 1. In fact, if an ai with gcd(ai, n) > 1 is somehow located, this gcd equals a non-trivial factor of n, and the security of the scheme is broken.

The Goldwasser–Micali scheme has the drawback that the length of the ciphertext message is much bigger than that of the plaintext message. Thus, for example, for a 1024-bit modulus n and a message m of bit length 64, the output requires a huge 65,536-bit space. This phenomenon is called message expansion and can be a serious limitation in certain circumstances.

Goldwasser–Micali decryption

Goldwasser–Micali decryption (Algorithm 5.10) recovers the bits of the plaintext message by computing Legendre symbols modulo the prime divisor p of n. The correctness of this decryption algorithm is evident from the discussion immediately following Algorithm 5.9.

Algorithm 5.10. Goldwasser–Micali decryption

Input: The Goldwasser–Micali private key p of the recipient and the ciphertext message (c1, . . . , cr).

Output: The recovered plaintext message m = m1 . . . mr, a bit string of length r.

Steps:

for i = 1, . . . , r {
   if ((ci | p) = 1) { mi := 0 } else { mi := 1 }
}
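A Python sketch of Algorithms 5.8–5.10 follows. One simplification is ours: key generation searches directly for an x that is a non-residue modulo both p and q, instead of combining a and b by the CRT as in Algorithm 5.8; the resulting x has the same properties. Function names are illustrative.

```python
import random

def legendre(a, p):
    """Legendre symbol (a | p) by Euler's criterion: 1, -1 or 0."""
    s = pow(a % p, (p - 1) // 2, p)
    return -1 if s == p - 1 else s

def gm_keygen(p, q):
    """Algorithm 5.8 for given primes p and q: x is a non-residue
    modulo both p and q, so (x | n) = +1 yet x is a non-residue mod n."""
    n = p * q
    while True:
        x = random.randrange(2, n)
        if legendre(x, p) == -1 and legendre(x, q) == -1:
            return (n, x), p           # public key (n, x), private key p

def gm_encrypt(pub, bits):
    """Algorithm 5.9: ci = ai^2 * x^mi (mod n) for random ai."""
    n, x = pub
    cipher = []
    for m in bits:
        a = random.randrange(1, n)     # coprime to n with high probability
        cipher.append(a * a * pow(x, m, n) % n)
    return cipher

def gm_decrypt(p, cipher):
    """Algorithm 5.10: mi = 0 exactly when ci is a square modulo p."""
    return [0 if legendre(c, p) == 1 else 1 for c in cipher]
```

Running the encryption twice on the same bit string produces different ciphertexts, illustrating the probabilistic nature of the scheme; each plaintext bit costs a full element of Zn*, which is the message expansion discussed above.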

5.2.4. The Blum–Goldwasser Encryption Algorithm

The Blum–Goldwasser algorithm is another probabilistic encryption algorithm and is better than the Goldwasser–Micali algorithm in the sense that in this case the message expansion is by only a constant number of bits irrespective of the length of the plaintext message. The Blum–Goldwasser scheme is based on the intractability of the SQRTP (modulo a composite integer).

Blum–Goldwasser key pair

As in the case of the encryption algorithms discussed so far, the Blum–Goldwasser algorithm works in the ring Zn, where n = pq is the product of two distinct primes p and q. Now, we additionally demand that p and q both be congruent to 3 modulo 4.

Algorithm 5.11. Blum–Goldwasser key generation

Input: A bit length l.

Output: A random Blum–Goldwasser key pair.

Steps:

Generate two (different) random primes p and q each of bit length l and each congruent to 3 mod 4.

n := pq.

Return n as the public key and the pair (p, q) as the private key.

Since p and q are two different primes, there exist integers u and v such that up + vq = 1. In order to speed up decryption, it is often expedient to store u and v along with p and q in the private key. Recall that the solution of the congruences xa (mod p) and xb (mod q) is given by xvqa + upb (mod n).

Blum–Goldwasser encryption

The Blum–Goldwasser encryption algorithm assumes that the input plaintext message m is in the form of a bit string, and breaks m into substrings of a fixed length t. A typical choice for t is t = ⌊lg lg n⌋, where n is the public key of the recipient. Write m = m1 . . . mr, where each mi is a bit string of length t. The ciphertext consists of r bit strings c1, . . . , cr, each of bit length t, and an element of Zn*.

Algorithm 5.12. Blum–Goldwasser encryption

Input: The Blum–Goldwasser public key n of the recipient and the plaintext message m = m1 . . .mr, where each mi is a bit string of length t.

Output: The ciphertext message (c1, . . . , cr, d), where each ci is a bit string of length t and d ∈ Zn*.

Steps:

Choose a random element d ∈ Zn*.

d := d^2 (mod n).
for i = 1, . . . , r {
   d := d^2 (mod n).
   δ := the t least significant bits of d.
   ci := mi ⊕ δ.                                            /* Here ⊕ denotes bit-wise XOR of t-bit strings */
}
d := d^2 (mod n).

Blum–Goldwasser encryption involves the computation of r + 2 modular squarings in Zn and is quite fast (for example, faster than RSA encryption with a general encryption exponent). It makes sense to assume that the initial choice of d is from Zn*, since finding a non-zero non-invertible element of Zn is as difficult as factoring n.

For an intruder to determine the plaintext message m from the corresponding ciphertext message, the values of d inside the for loop are necessary. These can be obtained by taking repeated square roots modulo n. Since n is composite, this is a difficult problem. On the other hand, since the recipient knows the prime divisors p and q of n, taking square roots modulo n requires only polynomial-time effort.

Blum–Goldwasser decryption

Recall from Exercise 3.43 that a quadratic residue d modulo n (where n is the public key of the recipient) has four distinct square roots, of which exactly one is again a quadratic residue modulo n. This distinguished square root y of d satisfies the congruences y ≡ d^((p+1)/4) (mod p) and y ≡ d^((q+1)/4) (mod q). In the decryption Algorithm 5.13, we assume that the received d is a quadratic residue modulo n.

Algorithm 5.13 assumes that each value of d is a quadratic residue modulo n. This can be verified by inserting in the for loop a check whether d is a quadratic residue modulo both p and q, before an attempt is made to compute the square root of d modulo n. If (c1, . . . , cr, d) is a valid ciphertext message, this condition necessarily holds, and there is no point wasting time checking obvious things. However, if there is a possibility that d is altered by an (active) adversary (or corrupted during transmission), one may insert this check. In that case, the routine should report failure when the square root of a quadratic non-residue modulo n is to be computed.

Algorithm 5.13. Blum–Goldwasser decryption

Input: The Blum–Goldwasser private key (p, q) of the recipient and the ciphertext message (c1, . . . , cr, d), where each ci is a bit string of length t and d ∈ Zn*.

Output: The recovered plaintext message m = m1 . . . mr, where each mi is a bit string of length t.

Steps:

for i = r, r – 1, . . . , 1 {
   a := d^((p+1)/4) (mod p) and b := d^((q+1)/4) (mod q).
   Compute d ∈ Zn with d ≡ a (mod p) and d ≡ b (mod q).  /* Use CRT */
   δ := the t least significant bits of d.
   mi := ci ⊕ δ.  /* XOR of t-bit strings */
}
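Algorithms 5.12 and 5.13 can be sketched in Python as follows. Messages are bit strings, the distinguished square root is computed by the (p + 1)/4-exponentiations and CRT combination described above, and the function names are ours.

```python
import random

def bg_encrypt(n, m_bits, t):
    """Algorithm 5.12: mask successive t-bit chunks of the message with
    the t low bits of repeated squarings; send the final square openly."""
    assert len(m_bits) % t == 0
    d = random.randrange(2, n)
    d = d * d % n                      # x_0: make the seed a square
    cipher = []
    for i in range(0, len(m_bits), t):
        d = d * d % n                  # x_1, ..., x_r
        delta = d & ((1 << t) - 1)     # t least significant bits
        cipher.append(int(m_bits[i:i + t], 2) ^ delta)
    return cipher, d * d % n           # (c_1, ..., c_r) and x_{r+1}

def bg_decrypt(p, q, cipher, d, t):
    """Algorithm 5.13: undo the squarings by taking, at each step, the
    distinguished (quadratic-residue) square root modulo n = pq."""
    n = p * q
    h = pow(q, -1, p)                  # q^(-1) mod p, for the CRT
    chunks = [0] * len(cipher)
    for i in range(len(cipher) - 1, -1, -1):
        a = pow(d, (p + 1) // 4, p)    # QR square root mod p
        b = pow(d, (q + 1) // 4, q)    # QR square root mod q
        d = (b + (h * (a - b) % p) * q) % n
        chunks[i] = cipher[i] ^ (d & ((1 << t) - 1))
    return ''.join(format(c, '0%db' % t) for c in chunks)
```

Note that only the final square d travels with the ciphertext, so the expansion is a constant number of bits regardless of the message length, in contrast with the Goldwasser–Micali scheme.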

5.2.5. The ElGamal Public-key Encryption Algorithm

The ElGamal encryption algorithm works in a group G in which it is difficult to solve the Diffie–Hellman problem (DHP). Typical candidates for G include the multiplicative group Fq* of a finite field Fq (usually q is a prime or a power of 2), the (additive) group of points on an elliptic curve over a finite field, and the (additive) group (called the Jacobian) of reduced divisors on a hyperelliptic curve over a finite field. Here we assume that G is multiplicatively written and has order n. It is not necessary for G to be cyclic, but we should have at our disposal an element g ∈ G with a suitably large (preferably prime) order k. We essentially work in the cyclic subgroup H of G generated by g (but using the arithmetic of G). For the ElGamal scheme, G (together with its representation), g, n and k are made public and can be shared by different entities on a network.

ElGamal key pair

Generating a key pair for the ElGamal scheme (Algorithm 5.14) involves an exponentiation in G. In order to make the exponentiation efficient, the exponent (the private key) is often chosen to have a small number of 1 bits. However, if this number is too small, exhaustive search by an adversary may become feasible.

If the DLP can be solved in G, the private key d can be computed from the public key g^d. This amounts to breaking a system based on this key pair. This is why we often say that the security of the ElGamal encryption scheme banks on the intractability of the DLP. But, as we see shortly, the DHP is the more fundamental computational problem that dictates the security of ElGamal encryption.

Algorithm 5.14. ElGamal key generation

Input: G, g and k as defined above.

Output: A random ElGamal key pair.

Steps:

Generate a random integer d, 2 ≤ d ≤ k – 1.

Return g^d as the public key and d as the private key.

ElGamal encryption

Given a message m ∈ G, the ElGamal encryption procedure (Algorithm 5.15) generates a pair (r, s) of elements of G as the ciphertext message and thus corresponds to message expansion by a factor of 2. Clearly, the sender has all the relevant information for computing (r, s). The need for using a different session key for each encryption is explained in Exercise 5.6.

Algorithm 5.15. ElGamal encryption

Input: (G, g, k and) the ElGamal public key g^d of the recipient and the plaintext message m ∈ G.

Output: The ciphertext message (r, s) ∈ H × G (where H = 〈g〉).

Steps:

Generate a (random) session key d′, 2 ≤ d′ ≤ k – 1.

r := g^{d′}.

s := mg^{dd′} = m(g^d)^{d′}.

Notice that ElGamal encryption uses two exponentiations in G to exponents which are O(k). Therefore, the running time of Algorithm 5.15 decreases if smaller values of k are selected. On the other hand, if k is too small, the square-root methods in H = 〈g〉 may become efficient (see Section 4.4.1). In practice, it is recommended that k be taken to be a prime of length 160 bits or more.

ElGamal decryption

ElGamal decryption involves an exponentiation in G to an exponent which is O(k). It is easy to verify that Algorithm 5.16 performs decryption correctly and that the recipient has the necessary information to carry out decryption.

Algorithm 5.16. ElGamal decryption

Input: (G, g, k and) the ElGamal private key d of the recipient and the ciphertext message (r, s) ∈ H × G (where H = 〈g〉).

Output: The recovered plaintext message m ∈ G.

Steps:

m := sr^{–d} = sr^{k–d}.
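For concreteness, here is a toy run of Algorithms 5.14–5.16 with G = Z*_p for a hypothetical (far too small) prime p = 2579; g = 4 generates the subgroup H of prime order k = 1289, since p – 1 = 2 × 1289.

```python
import random

p = 2579                  # toy prime; p - 1 = 2 * 1289 with 1289 prime
g, k = 4, 1289            # g = 2^((p-1)/1289) has order k = 1289

# Key generation (Algorithm 5.14)
d = random.randrange(2, k)            # private key
pub = pow(g, d, p)                    # public key g^d

# Encryption (Algorithm 5.15): fresh session key d' for every message
def encrypt(m, pub):
    d1 = random.randrange(2, k)
    return pow(g, d1, p), m * pow(pub, d1, p) % p    # (r, s)

# Decryption (Algorithm 5.16): r^(-d) = r^(k-d), since ord(r) divides k
def decrypt(r, s, d):
    return s * pow(r, k - d, p) % p

r, s = encrypt(1357, pub)
assert decrypt(r, s, d) == 1357
```

The blinding factor pub^{d′} = g^{dd′} introduced during encryption is cancelled exactly by r^{k–d} during decryption.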

An eavesdropper Carol knows the domain parameters G, g, k and n and also the recipient’s public key g^d. Determining the message m from a knowledge of the corresponding ciphertext (r, s) is then equivalent to computing the element g^{dd′}. This implies that a (quick) solution of the DHP permits Carol to decrypt a ciphertext. If a (quick) solution of the DLP is available, then the element g^{dd′} is computable fast. The reverse implication is, however, not clear: it may be easier to solve the DHP than the DLP, though no concrete evidence is available to corroborate this.

5.2.6. The Chor–Rivest Public-key Encryption Algorithm

The Chor–Rivest encryption algorithm is based on a variant of the subset sum problem. It selects a prime p and an integer h ≥ 2, uses a knapsack set A = {a0, . . . , ap–1} with 1 ≤ ai ≤ p^h – 2 for each i, and considers sums of the form s = ∊0a0 + · · · + ∊p–1ap–1 with each ∊i ∈ {0, 1}. In order to construct the set A for which the h-fold sum s is uniquely determined by the binary vector (∊0, . . . , ∊p–1) of weight h (that is, with exactly h bits equal to 1), we take the help of the finite field F_{p^h}. We represent F_{p^h} as F_p[X]/〈f(X)〉, where f(X) ∈ F_p[X] is irreducible of degree h and where x is the residue class of X in F_p[X]/〈f(X)〉. The parameters p and h must be so chosen that p^h – 1 is reasonably smooth, so that the integer factorization of p^h – 1 can be easily computed. This helps us in two ways. First, a generator g(x) of the multiplicative group F*_{p^h} can be made available quickly using Algorithm 3.25. Second, the Pohlig–Hellman method of Section 4.4.1 becomes efficient for computing discrete logarithms in F*_{p^h}. We can then take ai := ind_{g(x)}(x + i), i = 0, 1, . . . , p – 1. If (∊0, . . . , ∊p–1) and (∊′0, . . . , ∊′p–1) are two binary vectors of weight h, then ∊0a0 + · · · + ∊p–1ap–1 = ∊′0a0 + · · · + ∊′p–1ap–1 implies g(x)^{∊0a0 + · · · + ∊p–1ap–1} = g(x)^{∊′0a0 + · · · + ∊′p–1ap–1}, that is, (x + 0)^{∊0} · · · (x + p – 1)^{∊p–1} = (x + 0)^{∊′0} · · · (x + p – 1)^{∊′p–1}, that is, ∊i = ∊′i for all i = 0, . . . , p – 1, since otherwise x would satisfy a non-zero polynomial of degree < h.

Chor–Rivest key pair

A randomly permuted version of a0, . . . , ap–1 shifted by a noise (that is, a random bias) d together with p and h constitute the public key of the Chor–Rivest scheme. The private key, on the other hand, comprises the polynomials f(X) and g(x), the permutation just mentioned and the noise d. Algorithm 5.17 elaborates the generation of such a key pair. The same values of p and h can be used by different entities on a network. So we assume that p and h are provided instead of generated by the recipient as part of his public key. For brevity, we use the notation q := p^h.

Key generation may be a long process in the Chor–Rivest scheme, depending on how difficult it is to compute all the indexes ind_{g(x)}(x + i). Furthermore, the size of the public key is quite large, namely O(ph log p). Typically one may take p ≈ 200 and h ≈ 25. The original paper of Chor and Rivest [54] recommends the possibilities (197, 24), (211, 24), (243, 24) and (256, 25) for (p, h). Note that 256 is not a prime, but the Chor–Rivest algorithm works even when p is a power of a prime. For the sake of simplicity, we stick here to the case that p is a prime.

Algorithm 5.17. Chor–Rivest key generation

Input: A prime p and an integer h ≥ 2 such that ph – 1 is smooth.

Output: A Chor–Rivest key pair.

Steps:

Choose an irreducible polynomial f(X) ∈ F_p[X] of degree h.

Use the representation F_{p^h} = F_p[X]/〈f(X)〉, where x := X + 〈f(X)〉.

Choose a random generator g(x) of F*_{p^h}.

Compute the indexes ai := ind_{g(x)}(x + i) for i = 0, 1, . . . , p – 1.

Select a random permutation π of {0, 1, . . . , p – 1}.

Select a random noise d in the range 0 ≤ dq – 2.

Compute αi := aπ(i) + d (mod q – 1) for i = 0, 1, . . . , p – 1.

Return (α0, α1, . . . , αp–1) as the public key and (f, g, π, d) as the private key.

Chor–Rivest encryption

The Chor–Rivest encryption procedure (Algorithm 5.18) assumes that the input plaintext message is represented as a binary vector (m0, . . . , mp–1) of weight (that is, number of one-bits) equal to h. Since there are C(p, h) = p!/(h!(p – h)!) such binary vectors, arbitrary binary strings of bit length ⌊lg C(p, h)⌋ can be encoded into binary vectors of the above special form. See Chor and Rivest [54] for an algorithm that describes how such an encoding can be done. Chor–Rivest encryption is quite fast, since it computes only h integer additions modulo q – 1.

Algorithm 5.18. Chor–Rivest encryption

Input: The Chor–Rivest public key (α0, . . . , αp–1) (together with p and h) and the plaintext message (m0, . . . , mp–1) which is a binary vector of weight h.

Output: The ciphertext message c ∈ {0, 1, . . . , q – 2}.

Steps:

c := m0α0 + m1α1 + · · · + mp–1αp–1 (mod q – 1).
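The single step above is just a subset sum: add up the weights αi at the h one-bit positions and reduce modulo q – 1. A minimal sketch, using a made-up toy key rather than one produced by Algorithm 5.17:

```python
def cr_encrypt(m_bits, alpha, q):
    """Chor-Rivest encryption: sum the public weights at the one-bits, mod q-1."""
    return sum(a for mi, a in zip(m_bits, alpha) if mi) % (q - 1)

# e.g. cr_encrypt([1, 0, 1, 1, 0], [5, 11, 2, 8, 17], 32)
# adds the weights 5, 2 and 8 and reduces modulo 31
```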

Chor–Rivest decryption

The Chor–Rivest decryption procedure (Algorithm 5.19) generates a monic polynomial of degree h, the h (distinct) roots of which give the non-zero bits mi in the original plaintext message.

In order to prove that the decryption works correctly, note that s = c – hd ≡ m0aπ(0) + · · · + mp–1aπ(p–1) (mod q – 1), so that g(X)^s is congruent modulo f(X) to the product of the factors (X + π(i)) over those i with mi = 1. The polynomial u(X) is computed as one of degree < h. Adding f(X) to u(X) gives a monic polynomial v(X) of degree h, which is congruent modulo f(X) to this product; both being monic of degree h, they are equal. The roots of v(X) can be obtained either by a root-finding algorithm or by trial divisions of v(X) by X + i, i = 0, 1, . . . , p – 1. Applying the inverse of π on these roots then reconstructs the plaintext message.

Algorithm 5.19. Chor–Rivest decryption

Input: The Chor–Rivest private key (f, g, π, d) (together with p and h) and the ciphertext message c ∈ {0, 1, . . . , q – 2}.

Output: The recovered plaintext message (m0, . . . , mp–1) which is a binary vector of weight h.

Steps:

s := c – hd (mod q – 1).

u(X) := g(X)^s (mod f(X)).

v(X) := f(X) + u(X).

Factorize v(X) as v(X) = (X + i1)· · ·(X + ih), where i1, . . . , ih are distinct elements of {0, 1, . . . , p – 1}.

For i = 0, 1, . . . , p – 1 set mi := 1 if π(i) ∈ {i1, . . . , ih}, and mi := 0 otherwise.
An eavesdropper sees only the sum c = m0α0 + · · · + mp–1αp–1 (mod q – 1) of the (known) knapsack weights α0, . . . , αp–1. In order to recover m0, . . . , mp–1, she has to solve the SSP. By choosing p and h carefully, the density of the knapsack set can be adjusted to be high, that is, larger than what the cryptanalytic routines described in Section 4.8 can handle. Thus, the Chor–Rivest scheme is assumed to be secure. However, as discussed in Chor and Rivest [54], the security of the system breaks down when certain partial information on the private key is available.

*5.2.7. The XTR Public-key Encryption Algorithm

XTR, a phonetic abbreviation of efficient and compact subgroup trace representation, was designed by Arjen Lenstra and Eric Verheul as an attractive alternative to RSA (and similar cryptosystems including the ElGamal scheme over finite fields) and elliptic curve cryptosystems (ECC). The attractiveness of XTR arises from its compact key sizes and its fast arithmetic, both discussed below.

XTR, though not a fundamental breakthrough, deserves treatment in this chapter. The working of XTR is somewhat involved and we plan to present only a conceptual description of the algorithm, hiding the mathematical details.

XTR considers the following tower of field extensions:

F_p ⊆ F_{p^2} ⊆ F_{p^6},

where p ≡ 2 (mod 3) is a prime, sufficiently large so that computing discrete logs in F*_{p^6} using known algorithms is infeasible. We have p^6 – 1 = (p – 1)(p + 1)(p^2 – p + 1)(p^2 + p + 1). Let q be a prime divisor of p^2 – p + 1 of bit length 160 or more. There is a unique subgroup G of F*_{p^6} with #G = q. G is called the XTR (sub)group, whereas the entire group F*_{p^6} is called the XTR supergroup. The XTR group G is cyclic (Lemma 2.1, p 27). Let g be a generator of G, that is, G = 〈g〉 = {1, g, g^2, . . . , g^{q–1}}.

The working of XTR is based on the discrete log problem in G. Since p^2 – p + 1 and hence q are relatively prime to the orders of the multiplicative groups of all proper subfields of F_{p^6}, computing discrete logs in G is (seemingly) as difficult as that in F*_{p^6}, that is, one gets the same level of security by the use of G instead of the full XTR supergroup.

The main technical innovation of XTR is the proposal of a compact representation of the elements of G in place of the obvious representation using ⌈6 lg p⌉ bits inherited from that of F_{p^6}. This is precisely where the intermediate field F_{p^2} comes into the picture. We require a map G → F_{p^2}, so that we can represent elements of G by those of F_{p^2}. This map offers two benefits. First, the elements of G can now be represented using ⌈2 lg p⌉ bits, leading to a three-fold reduction in the key size. Second, the arithmetic of F_{p^2} can be exploited to implement the arithmetic in G, thereby improving the efficiency of encryption and decryption routines (compared to those over the full XTR supergroup).

The map uses the traces of elements of F_{p^6} over F_{p^2} (Definition 2.59). In this section, we use the shorthand notation Tr to stand for Tr_{F_{p^6}/F_{p^2}}. The conjugates of an element h ∈ F_{p^6} over F_{p^2} are h, h^{p^2}, h^{p^4}, and so Tr(h) = h + h^{p^2} + h^{p^4}.

Let us now specialize to h = g^n ∈ G. Since p^2 ≡ p – 1 (mod p^2 – p + 1) and p^4 ≡ –p (mod p^2 – p + 1), the conjugates of h are g^n, g^{(p–1)n}, g^{–pn}. Thus, Tr(g^n) = g^n + g^{(p–1)n} + g^{–pn}. Moreover,

g^n · g^{(p–1)n} · g^{–pn} = g^{n(1 + (p – 1) – p)} = 1 and g^n g^{(p–1)n} + g^n g^{–pn} + g^{(p–1)n} g^{–pn} = Tr(g^n)^p,

so the minimal polynomial of h = g^n over F_{p^2} is

X^3 – Tr(g^n) X^2 + Tr(g^n)^p X – 1.

This minimal polynomial is determined uniquely by Tr(g^n) and so we can represent g^n by Tr(g^n) ∈ F_{p^2}. Note, however, that this representation is not unique, that is, the map G → F_{p^2}, g^n ↦ Tr(g^n), is not injective. More precisely, the only elements of G that map to Tr(g^n) are the conjugates g^n, g^{(p–1)n}, g^{–pn} of g^n. This is often not a serious problem, as we see below.

In order to complete the description of the implementation of the arithmetic of the group G, we need to address two further issues, since the trace representation defined above is not a homomorphism of groups. First, we specify how one can implement the arithmetic of F_{p^2}. Since p ≡ 2 (mod 3), X^2 + X + 1 is irreducible over F_p. If α is a root of X^2 + X + 1, we have the standard representation F_{p^2} = {y0 + y1α | y0, y1 ∈ F_p}. Since 1 + α + α^2 = 0, we have y0 + y1α = (–α – α^2)y0 + y1α = (y1 – y0)α + (–y0)α^2. This leads to the non-standard representation

F_{p^2} = {x1α + x2α^2 | x1, x2 ∈ F_p}.

Since p ≡ 2 (mod 3) and α^3 = 1 + (α – 1)(α^2 + α + 1) = 1, the F_p-basis {α, α^2} of F_{p^2} is the same as the normal basis {α, α^p}. Under this basis, the basic arithmetic operations in F_{p^2} can be implemented using only a few multiplications (and some additions/subtractions) in F_p, as described in Table 5.1. Here, the operands are x = x1α + x2α^2, y = y1α + y2α^2 and z = z1α + z2α^2.

Table 5.1. Basic operations in F_{p^2}

Operation    Number of multiplications in F_p
x^p          0  (since x^p = x2α + x1α^2)
x^2          2  (since x^2 = x2(x2 – 2x1)α + x1(x1 – 2x2)α^2)
xy           3  (since xy = (x2y2 – x1y2 – x2y1)α + (x1y1 – x1y2 – x2y1)α^2, so it suffices to compute x1y1, x2y2 and (x1 + x2)(y1 + y2))
xz – yz^p    4  (since xz – yz^p = (z1(y1 – x2 – y2) + z2(x2 – x1 + y2))α + (z1(x1 – x2 + y1) + z2(y2 – x1 – y1))α^2)
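The formulas of Table 5.1 are easy to check by machine. In the sketch below, an element x1α + x2α² of F_{p²} is a pair (x1, x2), and p = 11 is a hypothetical toy prime ≡ 2 (mod 3); frob, sqr and mul use exactly the expressions (and multiplication counts) from the table.

```python
p = 11                       # any prime p = 2 (mod 3); toy size only

def frob(x):                 # x^p: zero multiplications (swap the coordinates)
    return (x[1], x[0])

def sqr(x):                  # x^2: two multiplications in F_p
    x1, x2 = x
    return (x2 * (x2 - 2 * x1) % p, x1 * (x1 - 2 * x2) % p)

def mul(x, y):               # xy: three multiplications in F_p
    x1, x2 = x
    y1, y2 = y
    a, b = x1 * y1 % p, x2 * y2 % p
    c = (x1 + x2) * (y1 + y2) % p
    s = (c - a - b) % p      # s = x1*y2 + x2*y1, at no extra multiplication
    return ((b - s) % p, (a - s) % p)
```

As a sanity check, mul(x, x) agrees with sqr(x), and computing x^p by square-and-multiply reproduces the free Frobenius map frob(x).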

Now, we explain how arithmetic operations in G translate to those in F_{p^2} under the representation of g^n by Tr(g^n). To start with, we show how the knowledge of Tr(h) and n allows one to compute Tr(h^n) for h ∈ G. For c ∈ F_{p^2}, define the polynomial

Fc(X) := X^3 – cX^2 + c^pX – 1 ∈ F_{p^2}[X],

and let h1, h2, h3 be the three roots (not necessarily distinct) of Fc(X). For n ∈ Z, we use the notation

cn := h1^n + h2^n + h3^n.
Putting c = Tr(g) yields cn = Tr(g^n), or, more generally, for c = Tr(g^k) we have cn = Tr(g^{kn}). Algorithm 5.20 computes

Sn(c) := (cn–1, cn, cn+1)

given c ∈ F_{p^2} (for example, c = Tr(g^k)) and n ∈ Z. The correctness of the algorithm is based on the following identities, the derivations of which are left to the reader (alternatively, see Lenstra and Verheul [170]).

Equation 5.1


Equation 5.2


Equation 5.3

c–n = cn^p


Equation 5.4


Equation 5.5

cn+2 = c·cn+1 – c^p·cn + cn–1


Equation 5.6

c2n = cn^2 – 2cn^p


Equation 5.7

c2n–1 = cn–1·cn – c^p·cn^p + cn+1^p


Equation 5.8

c2n+1 = cn+1·cn – c·cn^p + cn–1^p


Algorithm 5.20. XTR exponentiation

Input: and .

Output:.

Steps:

if (n < 0) {
   Compute S–n(c).
   Use Equation (5.3) to compute and return Sn(c).
}
if (n = 0) { Return (c^p, 3, c). }
if (n = 1) { Return (3, c, c^2 – 2c^p). }
if (n = 2) {
   Compute S1(c) and hence c3 using Equation (5.5).
   Return (c1, c2, c3).
}
/* Now n ≥ 3. Take m := n if n is odd and m := n – 1 otherwise, and let (m – 1)/2 = (1 ml–1 . . . m0)2 be its binary expansion. */

/* Initialize */
k := 1.
Compute S2k+1(c) = S3(c) = (c2, c3, c4) from S2(c) using Equation (5.5).
/* Exponentiation loop */
for j = l – 1, l – 2, . . . , 0 {
   if (mj = 0) {
      Compute S4k+1(c) = (c4k, c4k+1, c4k+2) from S2k+1(c) = (c2k, c2k+1, c2k+2).
      /* Use Equation (5.6) for c4k and c4k+2 and Equation (5.7) for c4k+1 */
   } else {       /* mj = 1 */
      Compute S4k+3(c) = (c4k+2, c4k+3, c4k+4) from S2k+1(c) = (c2k, c2k+1, c2k+2).
      /* Use Equation (5.6) for c4k+2 and c4k+4 and Equation (5.8) for c4k+3 */
   }
   }
   k := 2k + mj.
}

/* We have now computed S2k+1(c) = (c2k, c2k+1, c2k+2), where 2k + 1 equals n or n – 1 according as n is odd or even */

if (n is even) {
   Compute Sn(c) = (cn–1, cn, cn+1) from Sn–1(c) = (cn–2, cn–1, cn).
   /* Use Equation (5.5) to compute cn+1 from Sn–1 */
}

A careful analysis suggests that the computation of cn from c requires 8 lg n multiplications in F_p. An exponentiation in F_{p^6}, on the other hand, requires an expected number of 23.4 lg n multiplications in F_p (assuming that the time for a squaring is 80 per cent of that for a multiplication). Thus, the XTR representation provides a speed-up of about 3.

XTR key pair

The domain parameters for an XTR cryptosystem include primes p and q satisfying the requirements stated above: p ≡ 2 (mod 3), and q a divisor of p^2 – p + 1 of bit length 160 or more.

We require a generator g of the XTR group G. Since we planned to replace working in G by working in F_{p^2}, the element g is not needed explicitly. The trace Tr(g) suffices for our purpose. Lenstra and Verheul [170, 172] describe several methods for obtaining the domain parameters p, q, Tr(g). We describe here the simplest strategies. Algorithm 5.21 outputs the primes p, q with |p| = lp and |q| = lq for some given lengths lp, lq.

Algorithm 5.21. Generation of XTR primes

Randomly choose r ∈ N such that q := r^2 – r + 1 is a prime of size |q| = lq.

Randomly choose k ∈ N such that p := r + kq is a prime with |p| = lp and p ≡ 2 (mod 3).

Determination of Tr(g) for a suitable g requires some mathematics. First, notice that if the polynomial Fc(X) is irreducible (over F_{p^2}) for some c ∈ F_{p^2}, then c = Tr(h) for some h ∈ F_{p^6} with ord h | (p^2 – p + 1). Moreover, c_{(p^2–p+1)/q}, if not equal to 3, is the trace of an element (for example, h^{(p^2–p+1)/q}) of order q. Thus, we may take Tr(g) = c_{(p^2–p+1)/q}. Although we do not need it explicitly, the corresponding g can be taken to be any root of the polynomial F_{Tr(g)}(X).

What remains to explain is how one can find an irreducible Fc(X). A randomized algorithm results from the fact that for a randomly chosen c ∈ F_{p^2} the polynomial Fc(X) is irreducible with probability ≈ 1/3.

Once the domain parameters of an XTR system are set, the recipient chooses a random d, 2 ≤ d ≤ q – 2, and computes Tr(g^d) using Algorithm 5.20. The tuple (p, q, Tr(g), Tr(g^d)) is the public key and d the private key of the recipient.

XTR encryption

XTR encryption (Algorithm 5.22) is very similar to ElGamal encryption. The only difference is that now we work in under the trace representation of the elements of G, that is, one uses Algorithm 5.20 for computing exponentiations in G.

Algorithm 5.22. XTR encryption

Input: The public key (p, q, Tr(g), Tr(g^d)) of the recipient and the message m ∈ F_{p^2} to be encrypted.

Output: The ciphertext message (r, s) ∈ F_{p^2} × F_{p^2}.

Steps:

Generate a random session key d′, 2 ≤ d′ ≤ q – 2.

Compute r := Tr(g^{d′}) using Algorithm 5.20 with c := Tr(g) and n := d′.

Compute Tr(g^{dd′}) using Algorithm 5.20 with c := Tr(g^d) and n := d′.

Set s := m Tr(g^{dd′}).

XTR decryption

XTR decryption (Algorithm 5.23) is again analogous to ElGamal decryption except that we have to incorporate the XTR representation of elements of G.

Algorithm 5.23. XTR decryption

Input: The private key d of the recipient and the ciphertext (r, s).

Output: The recovered plaintext message m.

Steps:

Compute Tr(g^{dd′}) using Algorithm 5.20 with c := r = Tr(g^{d′}) and n := d.

Set m := s Tr(g^{dd′})^{–1}.

Note that XTR encryption and decryption use Algorithm 5.20 for performing exponentiations. Therefore, these routines run about three times faster than the corresponding ElGamal routines based on the standard arithmetic.

*5.2.8. The NTRU Public-key Encryption Algorithm

Hoffstein et al. [130] have proposed the NTRU encryption scheme in which encryption involves a mixing system using the polynomial algebra and reductions modulo two relatively prime integers α and β. The decryption involves an unmixing system and can be proved to be correct with high probability. The security of this scheme banks on the interaction of the mixing system with the independence of the reductions modulo α and β. Attacks against NTRU based on the determination of short vectors in certain lattices are known. However, suitable choices of the parameters make NTRU resistant to these attacks. The most attractive feature of the NTRU scheme is that encryption and decryption in this case are much faster than those in other known schemes (like RSA, ECC and even XTR).

NTRU key pair

NTRU parameters include three positive integers n, α and β with gcd(α, β) = 1 and with β considerably larger than α (see Table 5.2). Consider the polynomial algebra R := Z[X]/〈X^n – 1〉. An element of R is represented as a polynomial f = f0 + f1X + · · · + fn–1X^{n–1} or, equivalently, as a vector (f0, f1, . . . , fn–1) of the coefficients. Note that X^n – 1 is not irreducible in Z[X] (for n ≥ 2) and so R is not a field, but that does not matter for the NTRU scheme. For two polynomials f, g of degree < n and with integer coefficients, we denote by f g the product of f and g in Z[X], whereas f and g as elements of R multiply to fg = h with

hk = Σ_{i+j ≡ k (mod n)} fi gj for k = 0, 1, . . . , n – 1.

Table 5.2. Recommended NTRU parameters

Security       n     α    β     νf     νg    νu
short-term     107   3    64    15     12    5
moderate       167   3    128   61     20    18
standard[*]    263   3    128   50     24    16
high           503   3    256   216    72    55

[*] Assumed to be equivalent to 1024-bit RSA
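The product in R is the cyclic convolution of the coefficient vectors: the coefficient of X^k in fg collects all products fi gj with i + j ≡ k (mod n). A direct sketch:

```python
def conv(f, g, n):
    """Multiply f and g in Z[X]/(X^n - 1), as coefficient vectors of length n."""
    h = [0] * n
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            h[(i + j) % n] += fi * gj     # X^(i+j) wraps around modulo X^n - 1
    return h

# e.g. (1 + X)(X + X^2) = X + 2X^2 + X^3 = 1 + X + 2X^2 in Z[X]/(X^3 - 1)
```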

NTRU works with polynomials having small coefficients. More specifically, we define the following subsets of R. The message space M (that is, the set of plaintext messages) consists of all polynomials of R with coefficients reduced modulo α. Unlike our representation of Z_α so far, we use the integers between –α/2 and +α/2 to represent the coefficients of polynomials in M, that is,

M = {f ∈ R | every coefficient of f lies between –α/2 and +α/2}.

For ν1, ν2 ∈ N, we also define the subset

L(ν1, ν2) := {f ∈ R | f has ν1 coefficients equal to 1, ν2 coefficients equal to –1, and all other coefficients equal to 0}

of R. For suitably chosen parameters νf, νg and νu (see Table 5.2), we use the special notations:

Lf := L(νf, νf – 1),  Lg := L(νg, νg),  Lu := L(νu, νu).

With these notations we are now ready to describe the NTRU key generation routine. The subsets M, Lf, Lg and Lu are assumed to be public knowledge (along with the parameters n, α and β).

Algorithm 5.24. NTRU key generation

Input: n, α, β and Lf, Lg, as defined above.

Output: A random NTRU key pair.

Steps:

Choose f ∈ Lf and g ∈ Lg randomly.

/* f must be invertible modulo both α and β */

Compute fα and fβ satisfying fαf ≡ 1 (mod α) and fβf ≡ 1 (mod β).

h := fβg (mod β).

Return h as the public key and f (along with fα) as the private key.

The polynomial fα can be computed from f during decryption. However, for the sake of efficiency, it is recommended that fα be stored along with f.

The integers α and β are either small primes or small powers of small primes (Table 5.2). The most time-consuming step in the NTRU key generation procedure is the computation of the inverses fα and fβ. Suppose we want to compute the inverse of f in Z_{p^e}[X]/〈X^n – 1〉, where p is a small prime and e is a small exponent (we may have e = 1). We first compute f(X)^{–1} in the ring Z_p[X]/〈X^n – 1〉. Since p is a prime, Z_p is a field, that is, Z_p[X] is a Euclidean domain (Exercise 2.31). We compute the extended Euclidean gcd of f(X) with X^n – 1. If f(X) and X^n – 1 are not coprime modulo p, then f(X) is not invertible in Z_p[X]/〈X^n – 1〉, else we get s(X)f(X) + t(X)(X^n – 1) ≡ 1 (mod p), and s(X) is the inverse of f(X) in Z_p[X]/〈X^n – 1〉. A randomly chosen f(X) with gcd(f(1), p) = 1 has high probability of being invertible modulo p. Recall that we have chosen f ∈ Lf, so that f(1) = 1.

If e = 1, we have already computed the desired inverse of f(X). If e > 1, we have to lift the inverse fp(X) = s(X) of f(X) modulo p to the inverse fp2(X) of f(X) modulo p^2, and then to the inverse fp3(X) of f(X) modulo p^3, and so on. Eventually, we get the inverse fpe(X) of f(X) modulo p^e. Here we describe the generic lift procedure of fpk(X) to fpk+1(X). In the ring Z[X]/〈X^n – 1〉, we have fpk f ≡ 1 (mod p^k). We can write fpk+1(X) = fpk(X) + p^k a(X) for some a(X) with coefficients reduced modulo p. Substituting this value in fpk+1 f ≡ 1 (mod p^{k+1}) gives the unknown polynomial a(X) as

a(X) ≡ –s(X) × ((fpk(X)f(X) – 1)/p^k) (mod p),

where s(X) = fp(X) is the inverse of f modulo p.
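The lift can be checked on a toy instance. The sketch below implements exactly this step: divide fpk·f – 1 by p^k, multiply by –s(X) modulo p, and add p^k·a(X); the values n = 3, p = 3 and f = 1 + X are hypothetical toy choices.

```python
def conv(f, g, n, mod):
    """Product in Z_mod[X]/(X^n - 1), coefficient vectors of length n."""
    h = [0] * n
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            h[(i + j) % n] = (h[(i + j) % n] + fi * gj) % mod
    return h

def lift(f, s, p, e, n):
    """Given s = f^(-1) (mod p), return f^(-1) (mod p^e), one power at a time."""
    fk = s[:]
    for k in range(1, e):
        pk = p ** k
        t = conv(f, fk, n, pk * p)           # f * f_{p^k} (mod p^{k+1})
        t[0] = (t[0] - 1) % (pk * p)         # f * f_{p^k} - 1, divisible by p^k
        a = [(-c) % p for c in conv(s, [x // pk for x in t], n, p)]
        fk = [(u + pk * v) % (pk * p) for u, v in zip(fk, a)]
    return fk
```

For instance, with f = 1 + X in Z[X]/(X^3 – 1), the inverse 2 + X + 2X^2 modulo 3 lifts to 5 + 4X + 5X^2 modulo 9.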

It is often recommended that f(X) be taken of the form f(X) = 1 + αf1(X) for some f1(X) ∈ R with small coefficients. In this case, fα(X) = 1 is trivially available and need not be computed as mentioned above. Such a choice of f also speeds up NTRU decryption (see Algorithm 5.26) by reducing the number of polynomial multiplications from two to one. The inverse fβ, however, has to be computed (but need not be stored).

NTRU encryption

For NTRU encryption (Algorithm 5.25), the message is encoded as a polynomial m ∈ M. The costliest step in this algorithm is computing the product uh, which can be done in time O(n^2). Asymptotically better running time (O(n log n)) is achievable by Algorithm 5.25, if one uses faster polynomial multiplication routines (like those based on fast Fourier transforms). However, for the cryptographic range of values of n, straightforward quadratic multiplication gives better performance. Most other encryption schemes (like RSA) take time O(n^3), where n is the size of the modulus. This explains why NTRU encryption is much faster than conventional encryption routines.

Algorithm 5.25. NTRU encryption

Input: (n, α, β and) the NTRU public key h of the recipient and the plaintext message m ∈ M.

Output: The ciphertext c which is a polynomial in R, reduced modulo β.

Steps:

Randomly select u ∈ Lu.

c := αuh + m (mod β).

NTRU decryption

NTRU decryption (Algorithm 5.26) involves two multiplications in R and runs in time O(n^2). In order to prove the correctness of Algorithm 5.26, one needs to verify that v ≡ αug + fm (mod β). With an appropriate choice of the parameters, it can be ensured that almost always the polynomial αug + fm has coefficients in the interval between –β/2 and +β/2. In that case, we have the equality v = αug + fm in R. Multiplication of v by fα and reduction modulo α now clearly retrieves m.

Algorithm 5.26. NTRU decryption

Input: The NTRU private key f (and fα) of the recipient and the ciphertext message c.

Output: The recovered plaintext message m ∈ M.

Steps:

v := fc (mod β).

/* The coefficients of v are chosen to lie between –β/2 and +β/2 */

m := fαv (mod α).

If f is chosen to be of the special form f = 1 + αf1 (for some polynomial f1), then v = αug + αf1m + m. Thus, reduction of v modulo α straightaway gives m, that is, there is no need to multiply v by fα. Also fα (having the trivial value 1) need not be stored in the private key. To sum up, taking f to be of the above special form increases the efficiency of the NTRU scheme without (seemingly) affecting its security. But now f is no longer an element of Lf, and some care should be taken to choose suitable values of f1.
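A complete toy roundtrip ties Algorithms 5.24–5.26 together. All values below are hypothetical samples (n = 7 is hopelessly insecure); f has the special form 1 + αf1 just discussed, so fα = 1, and fβ is found by a brute-force inverse modulo 2 followed by lifting, here in a modulus-squaring variant of the lift described earlier.

```python
import itertools

n, alpha, beta = 7, 3, 32        # toy parameters; see Table 5.2 for real ones

def conv(f, g, mod):
    """Product in Z_mod[X]/(X^n - 1)."""
    h = [0] * n
    for i in range(n):
        for j in range(n):
            h[(i + j) % n] += f[i] * g[j]
    return [c % mod for c in h]

def centre(v, q):
    """Represent residues modulo q by integers between -q/2 and +q/2."""
    return [c % q - q if c % q > q // 2 else c % q for c in v]

def invert(f, q):
    """f^(-1) mod q (q a power of 2): brute force mod 2, then Newton lifting."""
    one = [1] + [0] * (n - 1)
    s = next(list(b) for b in itertools.product(range(2), repeat=n)
             if conv(f, list(b), 2) == one)
    m = 2
    while m < q:
        m *= m                                   # double the 2-adic precision
        t = conv(f, s, m)
        t[0] -= 2                                # t = f*s - 2
        s = [(-c) % m for c in conv(s, t, m)]    # s <- s*(2 - f*s)
    return [c % q for c in s]

# key generation: f = 1 + alpha*(X - X^4); g has coefficients in {-1, 0, 1}
f = [1, alpha, 0, 0, -alpha, 0, 0]
g = [0, -1, 0, 1, 0, 0, 0]
h = conv(invert(f, beta), g, beta)               # public key h = f_beta * g

# encryption with a random-looking blinding polynomial u
m = [1, -1, 0, 0, 0, 1, 0]
u = [0, 0, 1, 0, 0, -1, 0]
c = [(alpha * x + y) % beta for x, y in zip(conv(u, h, beta), m)]

# decryption: centre f*c mod beta, then reduce modulo alpha and centre again
v = centre(conv(f, c, beta), beta)               # equals alpha*u*g + f*m in R
recovered = centre(v, alpha)
assert recovered == m
```

Because every coefficient of αug + fm stays well inside the interval between –β/2 and +β/2 for these small polynomials, the centred reduction recovers the product exactly, and reducing modulo α strips away the αug and αf1m terms.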

NTRU decryption fails, usually when m is not properly centred (around 0). In that case, representing v as a polynomial with coefficients in the range –β/2 + x and +β/2 + x for a small positive or negative value of x may result in correct decryption. If, on the other hand, no values of x work, NTRU decryption cannot recover m easily and is said to suffer from a gap failure. For suitable parameter values, gap failures are very unlikely and can be ignored for all practical purposes.

Now, let us see how the NTRU system can be broken. In order to find out the private key f from the public key h = fβg, one may keep on searching exhaustively for f′ ∈ Lf, until f′h (mod β) lies in Lg. Alternatively, one may carry out a similar exhaustive search over all g′ ∈ Lg. In a similar manner, m can be retrieved from c by trying all u′ ∈ Lu, until c – αu′h (mod β) lies in M. Clearly, such an attack takes expected time proportional to the size of Lf or Lg or Lu.

A baby-step–giant-step strategy reduces the running times to the square roots of the sizes of the above sets. For example, suppose we want to compute f from h. We split f = f1 + f2 into two nearly equal pieces f1 and f2. If n is odd, f1 may contain the (n + 1)/2 most significant terms and f2 the (n – 1)/2 least significant terms of f. Now, we compute (f2, –f2h (mod β)) for all possibilities of f2 and store the pairs sorted by the second component. Next, for each possibility of f1 (baby step) we compute f1h (mod β) and see if there is any f2 (giant step) for which f1h (mod β) and –f2h (mod β) have nearly equal values. If a matching pair (f1, f2) is located, we take f = f1 + f2. A similar method works for guessing m from c.

It is necessary to take the sets Lf, Lg and Lu big enough, so that exhaustive or square-root attacks are not feasible. Typically, choosing the sizes of these sets to be ≥ 2^{160} is deemed sufficiently secure.

Another relevant attack is discussed in Exercise 5.11. By far the most sophisticated attack on the NTRU encryption scheme is based on finding short vectors in a lattice. We describe this attack in connection with the computation of the private key f from a knowledge of the public key h. Let L denote the lattice in Z^{2n} generated by the rows of the 2n × 2n matrix

( λI_n    H  )
(  0    βI_n ),

where h = h0 + h1X + · · · + hn–1X^{n–1} = (h0, h1, . . . , hn–1), H is the n × n matrix whose i-th row is the coefficient vector of X^{i–1}h (mod X^n – 1), I_n is the n × n identity matrix, and λ is a parameter whose choice is discussed below. Since h ≡ gf^{–1} (mod β), multiplying the i-th row by fi–1 (i = 1, . . . , n), adding, and subtracting suitable multiples of the last n rows, we conclude that the vector v := (λf0, λf1, . . . , λfn–1, g0, g1, . . . , gn–1) is in L. By tuning the value λ, the attacker maximizes the chance for v to be a short vector in L. However, if the system parameters are appropriately selected, lattice reduction algorithms become rather ineffective in finding v. Heuristic evidence suggests that this attack runs in time exponential in n.

Exercise Set 5.2

5.1Establish the correctness of Algorithm 5.4.
5.2
  1. Assume that the same message m is encrypted using the RSA algorithm and using the public keys (n1, e), . . . , (ne, e) of e entities each of which has the same encryption exponent e. Assume further that the moduli n1, . . . , ne are pairwise coprime. Specify a method by which an adversary can reconstruct the message m from a knowledge of the ciphertext messages c1, . . . , ce. [H]

  2. How can such an attack be prevented? [H]

5.3
  1. Let n, e ∈ N. How many solutions does the polynomial X^e – X have in Z_n? [H]

  2. In particular, conclude that if n = pq is an RSA modulus and e is the encryption exponent, there exist gcd(e – 1, p – 1) × gcd(e – 1, q – 1) messages m for which m^e ≡ m (mod n). Such messages are often called unconcealed. The number of unconcealed messages for random parameters n and e is, in general, vanishingly small compared to n.

5.4Assume that two parties Bob and Barbara share a common RSA modulus n but relatively prime encryption exponents e1 and e2. Alice encrypts the same message by (n, e1) and (n, e2) and sends the ciphertext messages to Bob and Barbara respectively. Suppose also that Carol intercepts both the ciphertexts. Describe a method by which Carol retrieves the (common) plaintext. [H]
5.5Let n = pq be a Rabin public key and let c ∈ Z*_n be a quadratic residue modulo n. Show that the knowledge of the four square roots of c modulo n breaks the Rabin system.
5.6What is the disadvantage of using the same session key in the ElGamal encryption scheme for encrypting two different messages (for the same recipient)? [H]
5.7Let p be an odd prime and g a generator of Z*_p.
  1. Show that the set S := {g^{2i} | i = 0, 1, . . . , (p – 3)/2} is precisely the set of all quadratic residues modulo p. Show also that S is a subgroup of Z*_p.

  2. Assume that y ≡ g^x (mod p) for some x. Show that the least significant bit of x is 0 or 1 according as y^{(p–1)/2} is congruent to 1 or –1 modulo p respectively. Thus, it is easy to determine from y the least significant bit of the discrete logarithm x = indg y.

  3. Assume that p ≡ 3 (mod 4) and that only p, g, y are known (but x is not known). Suppose further that there is an oracle (a black box) that, given z ∈ Z*_p, returns the second least significant bit of indg z. Show that x = indg y can be easily computed by making a polynomial (in log p) number of calls to this oracle. [H]

5.8Show that if the private-key parameters f(X) and d are known to a cryptanalyst of the Chor–Rivest scheme, she can recover the other parts of the private key and thus break the system completely. [H]
5.9Show that if f(X) is only known to a cryptanalyst of the Chor–Rivest scheme, then also she can recover the full private key. [H]
5.10
  1. Derive the identities of Equations (5.1) through (5.8) (p 325).

  2. With the notations of Section 5.2.7 deduce that:

    c3 = c^3 – 3c^{p+1} + 3.
    c4 = c^4 – 4c^{p+2} + 2c^{2p} + 4c.

5.11In this exercise, we use the notations of Section 5.2.8. Assume that Alice encrypts the same message m several times using the NTRU public key h of Bob, but with different random polynomials ui ∈ Lu, i = 1, . . . , r, and sends the corresponding ciphertext messages c1, . . . , cr. Describe a strategy by which an eavesdropper Carol can recover a considerable part of u1. [H] Trying all the possibilities for the (relatively small) unknown part of u1 allows Carol to retrieve m with little effort.

5.3. Key Exchange

Consider the scenario wherein two parties Alice and Bob want to share a piece of secret information (say, a DES key for future correspondence), but it is not possible to communicate this secret by personal contact or by conversing over a secure channel. In other words, Alice and Bob want to arrive at a common secret value by communicating over a public (and hence insecure) channel. A key-exchange or key-agreement protocol allows Alice and Bob to do so. The protocol should be such that an eavesdropper listening to the conversation between Alice and Bob cannot compute the secret value in feasible time.

Public-key technology is used to design a key-exchange protocol in the following way. Alice generates a key pair (eA, dA) and sends the public key eA to Bob. Similarly, Bob generates a random key pair (eB, dB) and sends the public key eB to Alice. Now, Alice and Bob respectively compute the values sA = f(eB, dA) and sB = f(eA, dB) using their respective knowledge, where f is a suitably chosen function. If sA = sB, then this value can be used as the shared secret between Alice and Bob. The intruder Carol can intercept eA and eB, but f should be such that a knowledge of eA and eB alone does not allow Carol to compute sA = sB. She needs dA or dB for this computation. Since (eA, dA) and (eB, dB) are key pairs, we assume that it is infeasible to compute dA from eA or dB from eB.

In what follows, we describe some key-exchange protocols. The security of these protocols depends on the intractability of the DHP (or the DLP). We provide a generic description, where we work in a finite Abelian multiplicative group G of order n. We write the identity of G as 1. G need not be cyclic, but we assume that an element g having suitably large (and preferably prime) multiplicative order m is provided. G, g, n and m may be made publicly available, but G should be a group in which one cannot compute discrete logarithms in feasible time. Typical examples of G are given in Section 5.2.5.

5.3.1. Basic Key-Exchange Protocols

Basic key-exchange protocols provide provable security against passive attacks under the intractability of the DHP. However, several models of active attacks are known for the basic protocols. One requires authentication (validation of the public keys) to eliminate these attacks.

The Diffie–Hellman key-exchange protocol

The Diffie–Hellman (DH) key-exchange algorithm [78] is one of the pioneering discoveries leading to the birth of public-key cryptography.

Algorithm 5.27. Diffie–Hellman key exchange

Input: G, g, n and m as defined above.

Output: A secret element to be shared by Alice and Bob.

Steps:

Alice generates a random dA, 2 ≤ dA ≤ m – 1, and computes eA := g^dA.

Alice sends eA to Bob.

Bob generates a random dB, 2 ≤ dB ≤ m – 1, and computes eB := g^dB.

Bob sends eB to Alice.

Alice computes s := (eB)^dA = g^(dA dB).

Bob computes s := (eA)^dB = g^(dA dB).

if (s = 1) { Return “failure”. }

The DH scheme fails, if the shared secret turns out to be a trivial element (like the identity) of G. In that case, Alice and Bob should re-execute the protocol with different key pairs. The probability of such an incident is, however, extremely low.

The intruder Carol learns the group elements g^dA and g^dB by listening to the conversation between Alice and Bob and intends to compute s = g^(dA dB). Thus, she has to solve an instance of the DHP in the group G. By assumption, this is computationally infeasible. This is how the DH scheme derives its security.
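The protocol can be sketched concretely in the group G = Z_p* for a toy safe prime p = 2q + 1. The specific parameters below (and the choice g = 4, an element of prime order q) are assumptions for illustration only; real deployments use far larger groups.

```python
import secrets

# A minimal sketch of Algorithm 5.27 with G = Z_p* for a toy safe prime
# p = 2q + 1.  g = 4 is a quadratic residue, so ord g = q, a large prime
# divisor of n = p - 1.  (Toy parameters; not secure.)
p, q, g = 2039, 1019, 4          # q = (p - 1) / 2 is prime

def keypair():
    d = 2 + secrets.randbelow(q - 2)   # private key d in [2, q - 1]
    return d, pow(g, d, p)             # (d, e = g^d mod p)

dA, eA = keypair()                     # Alice; she sends eA to Bob
dB, eB = keypair()                     # Bob;   he sends eB to Alice
sA = pow(eB, dA, p)                    # Alice: (eB)^dA = g^(dA dB)
sB = pow(eA, dB, p)                    # Bob:   (eA)^dB = g^(dA dB)
assert sA == sB and sA != 1            # shared secret agreed, non-trivial
```

Since q is prime and 2 ≤ dA, dB ≤ q – 1, the product dA·dB is never divisible by q, so the failure case s = 1 cannot occur here.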

Small-subgroup attacks

A small-subgroup attack on the DH protocol can be mounted by an active adversary. Assume that the order m of g in G is composite and has known factorization m = uv with u small. Carol intercepts the messages between Alice and Bob, replaces them by their respective v-th powers and retransmits the modified messages.

Algorithm 5.28. A small-subgroup attack by an active eavesdropper

Alice generates a random dA, 2 ≤ dA ≤ m – 1, and computes eA := g^dA.

Alice transmits eA to Bob.

Carol intercepts eA, computes (eA)^v and sends it to Bob.

Bob generates a random dB, 2 ≤ dB ≤ m – 1, and computes eB := g^dB.

Bob transmits eB to Alice.

Carol intercepts eB, computes (eB)^v and sends it to Alice.

Alice computes s′ := ((eB)^v)^dA = g^(v dA dB).

Bob computes s′ := ((eA)^v)^dB = g^(v dA dB).

if (s′ = 1) { Return “failure”. }

But ord g = uv, so (s′)^u = 1, that is, s′ can take at most u – 1 non-trivial values. Since u is small, Carol can exhaustively search the possibilities for s′. The best countermeasure against this attack is to take m to be a prime (of bit length ≥ 160).
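Algorithm 5.28 can be demonstrated concretely. The toy parameters below (p = 1913, with ord g = m = 8 · 239) are assumptions chosen so that m has the small factor u = 8; after Carol's substitution, the agreed secret lies in a subgroup of only u elements.

```python
import secrets

# Sketch of Algorithm 5.28 in G = Z_p*.  Here ord g = m = u*v with a
# small factor u, so forwarding v-th powers forces the shared secret
# into the subgroup of order u.  (Toy parameters, for clarity only.)
p = 1913                      # prime; p - 1 = 1912 = 8 * 239
g, u, v = 3, 8, 239           # ord g = m = u * v = 1912

dA = 2 + secrets.randbelow(u * v - 2)
dB = 2 + secrets.randbelow(u * v - 2)
eA, eB = pow(g, dA, p), pow(g, dB, p)

# Carol, in the middle, forwards the v-th powers instead:
eA_, eB_ = pow(eA, v, p), pow(eB, v, p)
s_alice = pow(eB_, dA, p)     # = g^(v dA dB), of order dividing u
s_bob = pow(eA_, dB, p)
assert s_alice == s_bob

# Carol's exhaustive search over the u possible values:
candidates = {pow(g, v * j, p) for j in range(u)}
assert s_alice in candidates
```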

Even when m is prime, it may be the case that the cofactor k := n/m has a small divisor u, and it is possible that an active attacker intervenes in such a way that Alice and Bob agree upon a secret value of order (equal to or dividing) u. For example, Carol may replace both the transmitted public keys by an element h of order u. If dA and dB are congruent modulo u, the shared secret has only a few possible values and Carol can obtain the correct value by exhaustive search. On the other hand, if dA ≢ dB (mod u), Alice and Bob do not come up with the same secret. However, if Alice uses her secret to encrypt a message for Bob, it remains easy for Carol to decrypt the intercepted ciphertext by trying only a few choices for Alice’s key. Alice and Bob can prevent this attack by refusing to accept as the shared secret not only the trivial value s = 1 but also elements of small orders.

A small-subgroup attack can also be mounted by one of the communicating parties (say, Bob) in an attempt to gain information about the other’s (Alice’s) secret dA. Let us continue to assume that the cofactor k := n/m has a small divisor u. Bob finds an element h in G of order u. Instead of eB = g^dB, Bob now sends h·g^dB to Alice. Alice computes the shared secret as sA := (h·g^dB)^dA = h^dA·g^(dA dB). Bob, on the other hand, can normally compute sB := (eA)^dB = g^(dA dB). Now, suppose that Alice uses a symmetric cipher with the key sA (or some part of it) and sends the ciphertext to Bob. In order to decrypt, Bob tries all of the u possible keys sB·h^j for j = 0, 1, . . . , u – 1. The value of j for which decryption succeeds equals dA modulo u. A similar attack can be mounted by Bob, when eB is chosen to be an element (like h itself) of order u.

If G is cyclic and H is the subgroup generated by g, then an element a ∈ G is in H if and only if a^m = 1 (Proposition 2.5, p 27). Moreover, if gcd(k, m) = 1, each communicating party can check the validity of the other party’s public key by an m-th power exponentiation. An element like h or h·g^dB of the last paragraph does not pass this test, in which case Alice should abandon the protocol. However, the validation of the public key requires a modular exponentiation and thereby slows down the protocol.

Cofactor exponentiation

We now present an efficient modification of the basic Diffie–Hellman scheme that prevents small-subgroup attacks (by a communicating party or an eavesdropper) without an extra exponentiation. We continue with the notation k := n/m and assume that k is coprime to m. Now, the shared secret is computed as g^(dA dB) or g^(k dA dB), depending on whether compatibility with the original DH scheme is desired or not. Algorithm 5.29 describes the modified DH algorithm. Solve Exercise 5.12 in order to establish the effectiveness of this algorithm against small-subgroup attacks.
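The cofactor variant (Algorithm 5.29) can be sketched as follows, again in a toy group Z_p* with p = 2q + 1, so that the cofactor is k = 2. The parameters are assumptions for illustration.

```python
import secrets

# Sketch of Algorithm 5.29 (cofactor DH) in G = Z_p* with p = 2q + 1:
# n = p - 1 = 2q, m = ord g = q, cofactor k = n/m = 2, gcd(k, m) = 1.
p, m, g, k = 2039, 1019, 4, 2

dA = 2 + secrets.randbelow(m - 2); eA = pow(g, dA, p)
dB = 2 + secrets.randbelow(m - 2); eB = pow(g, dB, p)

k_inv = pow(k, -1, m)                  # compatibility with plain DH
dA_, dB_ = (k_inv * dA) % m, (k_inv * dB) % m   # deltaA, deltaB
sA = pow(eB, k * dA_, p)
sB = pow(eA, k * dB_, p)
assert sA == sB == pow(g, dA * dB, p)  # agrees with Algorithm 5.27

# If an attacker substitutes an element of order dividing k (here p - 1,
# which has order 2), cofactor exponentiation collapses it to 1, so the
# parties detect failure instead of agreeing on a low-entropy secret:
h = p - 1                              # ord h = 2 = k
assert pow(h, k * dA_, p) == 1
```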

5.3.2. Authenticated Key-Exchange Protocols

Other active attack models on the (basic or modified) DH protocol can be conceived of. One important class of attacks is now described.

Unknown key-share attacks

An unknown key-share attack on a key-exchange protocol makes a party believe that (s)he shares a secret with another party, whereas the secret is actually shared by a third party. Assume that Carol can monitor and modify every message between Alice and Bob. When Alice and Bob execute Algorithm 5.27 or 5.29, Carol can intervene and pretend to Alice that she is Bob and to Bob that she is Alice. At the end of the protocol, Alice and Carol come up with a shared secret sAC, and Bob and Carol with another shared secret sBC. Alice believes that she shares sAC with Bob, and Bob believes that he shares sBC with Alice.

Algorithm 5.29. Diffie–Hellman key exchange with cofactor exponentiation

Input: G, g, n, m and k as defined above and a flag indicating compatibility with the original DH scheme.

Output: A secret element to be shared by Alice and Bob.

Steps:

Alice generates a random dA, 2 ≤ dA ≤ m – 1, and computes eA := g^dA.

Alice sends eA to Bob.

Bob generates a random dB, 2 ≤ dB ≤ m – 1, and computes eB := g^dB.

Bob sends eB to Alice.

if (compatibility with the original DH algorithm is desired) {
   Alice assigns δA := k^(–1) dA (mod m).
   Bob assigns δB := k^(–1) dB (mod m).
} else {
   Alice assigns δA := dA (mod m).
   Bob assigns δB := dB (mod m).
}
Alice computes s := (eB)^(kδA).
Bob computes s := (eA)^(kδB).
if (s = 1) { Return “failure”. }

Now, when Alice wants to send a secret message m to Bob, she encrypts m by sAC and transmits the ciphertext c. Carol intercepts c, decrypts it by sAC to retrieve m, encrypts m by sBC and sends the new ciphertext c′ to Bob. Bob retrieves m by decrypting c′ with his key sBC. The process raises hardly any suspicion in Alice or Bob about the existence of the mediating third party.

In order to avoid this attack, Alice and Bob should each validate the authenticity of the public key of the other party. Public-key certificates can be used to this effect. Unfortunately, using certificates alone may fail to eliminate unknown key-share attacks, as Algorithm 5.30 shows. At the end of this protocol Alice and Bob share a secret s, but Bob believes that he shares it with (the intruder) Carol. Here Carol herself cannot compute the shared secret s (provided that computing discrete logs in G is infeasible). Still there may be situations where this attack can be exploited (see Law et al. [161] for a hypothetical example).

Two objections may be raised against this attack. Under the assumption of intractability of the DLP in G, Carol cannot compute the private key corresponding to the public key eC, so her obtaining the certificate CertC from a knowledge of eC alone may be questioned. Furthermore, replacing (eB, CertB) by ((eB)^d, CertB) may make the certificate invalid. If we assume that a certificate authenticates only the entity and not the public key, then these objections can be overruled. In practice, however, a public-key certificate should bind the public key to an entity (who can prove knowledge of the corresponding private key), and so the above attack cannot be easily mounted. Nonetheless, the attack highlights the need for stronger authenticated key-exchange protocols.

Algorithm 5.30. An unknown key-share attack

Alice generates a random dA, 2 ≤ dA ≤ m – 1, and computes eA := g^dA.

Alice gets the certificate CertA on eA from the certifying authority.

Alice transmits (eA, CertA) to Bob.

Carol intercepts (eA, CertA).

Carol chooses a random d, 2 ≤ d ≤ m – 1.

Carol gets the certificate CertC on eC := (eA)^d from the certifying authority.

Carol sends (eC, CertC) to Bob.

Bob generates a random dB, 2 ≤ dB ≤ m – 1, and computes eB := g^dB.

Bob gets the certificate CertB on eB from the certifying authority.

Bob sends (eB, CertB) to Carol.

Carol transmits ((eB)^d, CertB) to Alice.

Alice computes s = ((eB)^d)^dA = g^(d dA dB).

Bob computes s = (eC)^dB = ((eA)^d)^dB = g^(d dA dB).

The Menezes–Qu–Vanstone key-exchange protocol

The Menezes–Qu–Vanstone (MQV) key-exchange protocol is an improved extension of the basic DH scheme that incorporates public-key authentication. Though the MQV protocol is not known to provably achieve the desired security goals, heuristic arguments suggest its effectiveness against active adversaries.

Once again, let Alice and Bob be the two parties who plan to agree on a secret element s ∈ G, where the domain parameters G, g, n and m are chosen as in the basic DH scheme. In the MQV scheme, each entity uses two key pairs, one of which ((EA, DA) for Alice and (EB, DB) for Bob) is called the static or long-term key pair, whereas the other ((eA, dA) for Alice and (eB, dB) for Bob) is called the ephemeral or short-term key pair. The static key is bound to an entity for a certain period of time and is used in every invocation of the MQV protocol during that period. On the other hand, each entity generates and uses a new ephemeral key pair during each invocation of the protocol. The static key of an entity is assumed to be authentic, say, certified by a trusted authority. The ephemeral key, on the other hand, is validated using the static private key.

Assume that there is a (publicly known) function F that converts group elements to integers (for example, by interpreting the bit representation of an element as an integer). Let l := ⌊lg m⌋ + 1 denote the bit length of m = ord g. For a ∈ G, let ā denote the integer (F(a) rem 2^⌈l/2⌉) + 2^⌈l/2⌉. The bit size of ā is about half of that of m. In particular, ā ≢ 0 (mod m) for all a ∈ G.

In the MQV protocol, Alice and Bob each computes the shared secret s = g^(σA σB), where σA := dA + ēA·DA (mod m) and σB := dB + ēB·DB (mod m). Here the exponents σA and σB bear the implicit signatures of Alice and Bob, impressed by their respective static private keys. Alice can compute g^σB = eB·(EB)^ēB, since she knows the static public key EB and the ephemeral public key eB of Bob. Similarly, Bob can compute g^σA = eA·(EA)^ēA from a knowledge of the public keys EA and eA of Alice. We summarize the steps in Algorithm 5.31.

Algorithm 5.31. MQV key exchange

Input: G, g, n and m as defined above.

Output: A secret element to be shared by Alice and Bob.

Steps:

Alice obtains Bob’s static public key EB.

Bob obtains Alice’s static public key EA.

Alice generates a random integer dA, 2 ≤ dA ≤ m – 1, and computes eA := g^dA.

Alice sends eA to Bob.

Bob generates a random integer dB, 2 ≤ dB ≤ m – 1, and computes eB := g^dB.

Bob sends eB to Alice.

Alice computes σA := dA + ēA·DA (mod m).

Alice computes s := (eB·(EB)^ēB)^σA.

Bob computes σB := dB + ēB·DB (mod m).

Bob computes s := (eA·(EA)^ēA)^σB.

if (s = 1) { Return “failure”. }

Each participating entity using the MQV protocol performs three exponentiations in G. Alice computes g^dA, (EB)^ēB and (eB·(EB)^ēB)^σA, of which the first and the last ones have exponents O(m). On the other hand, ēB is O(√m), so that the middle exponentiation is about twice as fast as a full exponentiation. This performance benefit justifies the use of ēA and ēB instead of eA and eB themselves. It appears that using these half-sized exponents does not affect security. Also note that ēA ≢ 0 (mod m), which implies a non-zero contribution of the static key DA to the exponent σA. Similarly for σB.
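Algorithm 5.31 can be sketched in a toy group. Here group elements are mapped to integers simply by their residue value, one assumed instance of the conversion function F; the small parameters are for illustration only.

```python
import secrets

# Sketch of Algorithm 5.31 (MQV) in G = Z_p* with toy parameters
# p = 2039, ord g = m = 1019.  F(a) is taken to be the residue a itself
# (an assumed encoding of group elements as integers).
p, m, g = 2039, 1019, 4
L = m.bit_length()                      # l = 10
HALF = 1 << ((L + 1) // 2)              # 2^ceil(l/2)

def bar(a):                             # the half-size exponent a-bar
    return (a % HALF) + HALF            # never 0 (mod m)

def keypair():
    d = 2 + secrets.randbelow(m - 2)
    return d, pow(g, d, p)

DA, EA = keypair()                      # Alice's static pair
DB, EB = keypair()                      # Bob's static pair
dA, eA = keypair()                      # Alice's ephemeral pair
dB, eB = keypair()                      # Bob's ephemeral pair

sigmaA = (dA + bar(eA) * DA) % m
sA = pow(eB * pow(EB, bar(eB), p) % p, sigmaA, p)   # Alice's view
sigmaB = (dB + bar(eB) * DB) % m
sB = pow(eA * pow(EA, bar(eA), p) % p, sigmaB, p)   # Bob's view
assert sA == sB                          # both equal g^(sigmaA sigmaB)
```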

In order to guard against small-subgroup attacks, the MQV algorithm can incorporate the cofactor k := n/m, that is, assuming gcd(k, m) = 1, the shared secret would now be g^(σA σB) or g^(k σA σB), depending on whether compatibility with the original MQV method is desired or not.

The MQV algorithm can be used in a situation when only one party, say, Alice, is capable of initiating a transmission to the other party (Bob). In that case, Bob’s static key pair is used also as his ephemeral key pair, that is, eB = EB and dB = DB, so that the secret element shared between Alice and Bob is s = (EB·(EB)^ĒB)^σA.

See Raymond and Stiglic [250] to know more about the security issues for the DH key agreement protocol and its variants.

Exercise Set 5.3

5.12Let G be a multiplicative Abelian group of order n and with identity 1, H the subgroup of G generated by an element g of order m, k := n/m and gcd(k, m) = 1. Further let a be a non-identity element of G.
  1. Prove that if a^k = 1, then a ∉ H. (The converse of this statement is not true in general, even when G is cyclic. However, if a is an element of small order dividing k, we obviously have a^k = 1.)

  2. Explain how the modified Diffie–Hellman protocol (Algorithm 5.29) prevents an active attack by Bob described in connection with small-subgroup attacks.

5.13Write the MQV key-exchange protocol with cofactor exponentiation.
5.14Provide the details of the Diffie–Hellman key-exchange algorithm based on the XTR representation (Section 5.2.7).

5.4. Digital Signatures

Suppose an entity (Alice) is required to be bound to some electronic data (like messages or documents or keys). This binding is achieved by Alice digitally signing the data in such a way that no party other than Alice would be able to generate the signature. The signature should also be such that any entity can easily verify that it was Alice who generated the signature. Digital signatures can be realized using public-key techniques. The entity (Alice) generating a digital signature is called the signer, whereas anybody who wants to verify a signature is called a verifier.

We have seen in Section 5.2 how the encryption and decryption transforms fe, fd achieve confidentiality of sensitive data. If the set of all possible plaintext messages is the same as the set of all ciphertext messages and if fe and fd are bijective maps on that set, then the sequence of encryption and decryption can be reversed in order to realize a digital signature scheme. In order to sign m, Alice uses her private key d and the transform fd to generate s = fd(m, d). Any party who knows the corresponding public key e can recover m as m = fe(s, e). This is broadly how a signature scheme works. Depending on how the representative m is generated from the message M that Alice wants to sign, signature schemes can be classified in two categories.

Signature scheme with message recovery

In this case, one takes m = M. Verification involves getting back the message M. If M is assumed to be (the encoded version of) some human-readable text, then the recovered M = fe(s, e) will also be human-readable. If s is forged, that is, if a private key d′ ≠ d has been used to generate s′ = fd(m, d′), then verification using Alice’s public key yields m′ = fe(s′, e), and typically m′ ≠ m, since d′ and e are not matching keys. The resulting message m′ will, in general, make little or no sense to a human reader. If m is not a human-readable text, one adds some redundancy to it before signing. A forged signature yields m′ during verification, which, with high probability, is expected not to have this redundancy.

Attractive as it looks, this scheme is not suitable if M is a long message. In that case, it is customary to break M into smaller pieces and sign each piece separately. Since public-key operations are slow, signature generation (and also verification) will be time-consuming if there are too many pieces to sign (and verify). This difficulty is overcome by the second scheme described now.

Signature scheme with appendix

In this scheme, a short representative m = H(M) of M is first computed.[2] The function H is usually chosen to be a hash function, that is, one which converts bit strings of arbitrary length to bit strings of a fixed length. H is assumed to be public knowledge, that is, anybody who knows M can compute m. We also assume that H(M) can be computed fast for messages M of practical sizes. Alice uses the decryption transform on m to generate s = fd(m, d). The signature now becomes the pair (M, s). A verifier obtains Alice’s public key e and checks if H(M) = fe(s, e). The signature is taken to be valid if and only if equality holds. If a forger uses a private key d′ ≠ d, she generates a signature (M, s′), s′ = fd(m, d′), on M, and a verifier expects, with high probability, the inequality H(M) ≠ fe(s′, e).

[2] If M is already a short message, one may simply take m = M. In order to promote uniform treatment, we assume that the function H is always applied for the generation of m. Use of H is also desirable from the standpoint of security (Exercise 5.15).

A kind of forgery is possible on signature schemes with appendix. Assume that Alice creates a valid signature (M, s), s = fd(H(M), d), on a message M. The function H is certainly not injective, since its input space is much bigger (infinite) than its output space (finite). Suppose that Carol finds a message M′ ≠ M with H(M′) = H(M). In that case, the pair (M′, s) is a valid signature of Alice on the message M′, though it is not Alice who has generated it. (Indeed it has been generated without the knowledge of the private key d of Alice.) In order to foil such attacks, the function H should have second pre-image resistance. The pre-image resistance and collision resistance properties of a hash function also turn out to be important in the context of digital signatures. See Sections 1.2.6 and A.4 for more about hash functions.

We now describe some specific algorithms for (generating and verifying) digital signatures. Key pairs used for these algorithms are usually identical to those used for encryption algorithms of Section 5.2 and, therefore, we refrain from a duplicate description of the key-generation procedures. We focus our discussion only on signature schemes with appendix.

5.4.1. The RSA Digital Signature Algorithm

As in the RSA encryption scheme of Section 5.2.1, each entity generates an RSA modulus n = pq, which is the product of two distinct large primes p and q. A key pair consists of an encryption exponent e (the public key) and a decryption exponent d (the private key) satisfying ed ≡ 1 (mod φ(n)).

RSA signature generation involves a modular exponentiation in the ring Zn.

Algorithm 5.32. RSA signature generation

Input: A message M to be signed and the signer’s private key (n, d).

Output: The signature (M, s) on M.

Steps:

m := H(M).   /* m ∈ Zn is the short representative of M */
s := m^d (mod n).

Signature generation can be speeded up if the parameters p, q, d1 := d rem (p – 1), d2 := d rem (q – 1) and h := q–1 (mod p) are stored (secretly) in the private key. Now, one can use Algorithm 5.4 for signature generation.

The verification routine also involves a modular exponentiation in Zn.

Algorithm 5.33. RSA signature verification

Input: A signature (M, s) and the signer’s public key (n, e).

Output: Verification status of the signature.

Steps:

m := H(M).   /* m ∈ Zn is the short representative of M */
m′ := s^e (mod n).
if (m = m′) { Return “Signature verified”. }
else { Return “Signature not verified”. }

Small values of e speed up RSA signature verification and are not known to expose the scheme to any special attacks. So values of e like 3, 257 and 65,537 are recommended.
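Algorithms 5.32 and 5.33, together with the CRT speed-up of signature generation, can be sketched as follows. The toy primes and the choice of SHA-256 reduced modulo n as the short representative are assumptions for illustration.

```python
import hashlib

# Sketch of RSA signing/verification (Algorithms 5.32/5.33) with a toy
# modulus; real RSA moduli are >= 2048 bits.  H is SHA-256 reduced
# modulo n, one possible choice of short representative.
p, q = 61, 53
n = p * q                          # 3233
e = 17
d = pow(e, -1, (p - 1) * (q - 1))  # ed = 1 (mod phi(n))

def H(M, n):
    return int.from_bytes(hashlib.sha256(M).digest(), "big") % n

def sign(M):
    m = H(M, n)
    # CRT speed-up: exponentiate separately modulo p and modulo q
    s1 = pow(m, d % (p - 1), p)
    s2 = pow(m, d % (q - 1), q)
    h = pow(q, -1, p)              # h := q^(-1) (mod p)
    t = (h * (s1 - s2)) % p        # Garner recombination
    return s2 + q * t              # s = m^d (mod n)

def verify(M, s):
    return H(M, n) == pow(s, e, n)

s = sign(b"attack at dawn")
assert s == pow(H(b"attack at dawn", n), d, n)  # CRT agrees with direct
assert verify(b"attack at dawn", s)
```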

5.4.2. The Rabin Digital Signature Algorithm

As in the Rabin encryption algorithm, we choose two distinct large primes p and q of nearly equal sizes and take n = pq. The public key is n, whereas the private key is the pair (p, q). The Rabin signature scheme is based on the intractability of computing square roots modulo n in absence of the knowledge of the prime factors p and q of n.

Rabin signature generation involves finding a quadratic residue m modulo n as a representative of the message M and computing a square root of m modulo n.

Algorithm 5.34. Rabin signature generation

Input: A message M to be signed and the signer’s private key (p, q).

Output: The signature (M, s) on M.

Steps:

m := H(M).          /* m is assumed to be a quadratic residue modulo n */

Compute a square root s1 of m modulo p.    /* Algorithm 3.17 */
Compute a square root s2 of m modulo q.    /* Algorithm 3.17 */
Compute s satisfying s ≡ s1 (mod p) and s ≡ s2 (mod q).    /* CRT */

Verification (Algorithm 5.35) involves a squaring operation in Zn.

Algorithm 5.35. Rabin signature verification

Input: A signature (M, s) and the signer’s public key n.

Output: Verification status of the signature.

Steps:

m := H(M).    /* m is a quadratic residue modulo n */

m′ := s^2 (mod n).

if (m = m′) { Return “Signature verified”. }

else { Return “Signature not verified”. }
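A sketch of the Rabin scheme follows, with toy primes p, q ≡ 3 (mod 4), for which a square root of a quadratic residue m is simply m^((p+1)/4) (a special case of Algorithm 3.17). Since H(M) need not be a quadratic residue modulo n, the sketch appends a counter to M until it is; this workaround is an assumption of the sketch, not taken from the text.

```python
import hashlib

# Sketch of Algorithms 5.34/5.35.  Toy primes, both = 3 (mod 4).
p, q = 499, 547
n = p * q

def representative(M):
    # Hash M || counter until the value is a QR modulo both p and q.
    c = 0
    while True:
        m = int.from_bytes(hashlib.sha256(M + bytes([c])).digest(), "big") % n
        if pow(m, (p - 1) // 2, p) == 1 and pow(m, (q - 1) // 2, q) == 1:
            return m, c
        c += 1

def sign(M):
    m, c = representative(M)
    s1 = pow(m, (p + 1) // 4, p)   # square root of m modulo p
    s2 = pow(m, (q + 1) // 4, q)   # square root of m modulo q
    # CRT: s = s1 (mod p), s = s2 (mod q)
    s = (s1 * q * pow(q, -1, p) + s2 * p * pow(p, -1, q)) % n
    return s, c

def verify(M, s, c):
    m = int.from_bytes(hashlib.sha256(M + bytes([c])).digest(), "big") % n
    return pow(s, 2, n) == m

s, c = sign(b"message")
assert verify(b"message", s, c)
```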

5.4.3. The ElGamal Digital Signature Algorithm

The ElGamal signature algorithm is based on the intractability of computing discrete logarithms in certain groups G. For a general description, we consider an arbitrary (finite Abelian multiplicative) group G of order n. We assume that G is cyclic and that a generator g of G is provided. A key pair is obtained by selecting a random integer (the private key) d, 2 ≤ d ≤ n – 1, and then computing g^d (the public key). The hash function H is assumed to convert arbitrary bit strings to elements of Zn. We further assume that the elements of G can be identified as bit strings (on which the hash function H can be directly applied). G (together with its representation), g and n are considered to be public knowledge and are not input to the signature generation and verification routines.

ElGamal signatures are generated as in Algorithm 5.36. The appendix consists of the pair (s, t).

Algorithm 5.36. ElGamal signature generation

Input: A message M to be signed and the signer’s private key d.

Output: The signature (M, s, t) on M.

Steps:

Generate a random session key d′, 2 ≤ d′ ≤ n – 1, with gcd(d′, n) = 1.

s := g^d′.

t := d′^(–1) (H(M) – dH(s)) (mod n).

The costliest step in the ElGamal signature generation algorithm is the exponentiation g^d′. Here, G is assumed to be cyclic and the exponent d′ to be O(n). We will shortly see modifications of the ElGamal scheme in which the exponent can be chosen to be much smaller, namely O(r), where r is a suitably large (prime) divisor of n.

In order to forge a signature, Carol can generate a random session key pair (d′, g^d′) and obtain s. For the computation of t, she requires the private key d of the signer. Conversely, if t (and d′) are available to Carol, she can easily compute the private key d. Thus, forging an ElGamal signature in this way is equivalent to solving the DLP in G.

Each invocation of the ElGamal signature generation algorithm must use a new session key (d′, g^d′). If the same session key (d′, g^d′) is used to generate the signatures (M1, s1, t1) and (M2, s2, t2) on two different messages M1 and M2, then we have (t1 – t2)d′ ≡ H(M1) – H(M2) (mod n), whence d′ can be computed, provided that gcd(t1 – t2, n) = 1. If d′ is known, the private key d can be easily computed (see Exercise 5.6 for a similar situation).

ElGamal signature verification is described in Algorithm 5.37. This is based on the observation that for a (valid) ElGamal signature (M, s, t) on a message M we have g^H(M) = (g^d)^H(s)·s^t. This verification calls for three exponentiations in G to full-size exponents. Working in a suitable (cyclic) subgroup of G makes the algorithm more efficient.

Algorithm 5.37. ElGamal signature verification

Input: A signature (M, s, t) and the signer’s public key gd.

Output: Verification status of the signature.

Steps:

a1 := g^H(M).

a2 := (g^d)^H(s)·s^t.

if (a1 = a2) { Return “Signature verified”. }

else { Return “Signature not verified”. }
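Algorithms 5.36 and 5.37 can be sketched as follows. The toy prime and generator, and the hashing of a group element via its decimal representation, are assumptions for illustration.

```python
import hashlib, math, secrets

# Sketch of Algorithms 5.36/5.37 with G = Z_p*, cyclic of order
# n = p - 1; g = 7 generates the full group for this toy prime.
p = 2039
n = p - 1                 # 2038 = 2 * 1019
g = 7

def H(x):
    # Hash bytes directly; hash a group element via its decimal string.
    data = str(x).encode() if isinstance(x, int) else x
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n

d = 2 + secrets.randbelow(n - 2)          # private key
gd = pow(g, d, p)                         # public key

def sign(M):
    while True:
        dp = 2 + secrets.randbelow(n - 2)       # session key d'
        if math.gcd(dp, n) == 1:                # d'^(-1) mod n must exist
            break
    s = pow(g, dp, p)
    t = (pow(dp, -1, n) * (H(M) - d * H(s))) % n
    return s, t

def verify(M, s, t):
    return pow(g, H(M), p) == pow(gd, H(s), p) * pow(s, t, p) % p

s, t = sign(b"hello")
assert verify(b"hello", s, t)
```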

ElGamal signatures use a congruence of the form A ≡ dB + d′C (mod n), and verification is done by checking the equality g^A = (g^d)^B·s^C. Our choice for A, B and C was A = H(M), B = H(s) and C = t. Indeed, any permutation of H(M), H(s) and t is acceptable as A, B, C. These give rise to several variants of the ElGamal scheme. It is also allowed to take as A, B, C any permutation of H(M)H(s), t, 1 or of H(M)H(s), H(M)t, 1 or of H(M)H(s), H(s)t, 1 or of H(M)t, H(s)t, 1. Permutations of H(M)t, H(s), 1 or of H(M), H(s)t, 1, on the other hand, are known to have security flaws. For any allowed combination of A, B, C, the choices ±A, ±B, ±C are also valid. For some other variants, see Horster et al. [132].

5.4.4. The Schnorr Digital Signature Algorithm

The Schnorr signature scheme is a modification of the ElGamal scheme and is faster, since it works in the (small) subgroup of G generated by g. We assume that r := ord g is a prime (though it suffices for ord g to have a suitably large prime divisor). We suppose further that the elements of G are represented as bit strings and that we have a hash function H that maps bit strings to elements of Zr. A key pair now consists of an integer d (the private key), 2 ≤ d ≤ r – 1, and the element g^d (the public key).

Schnorr signature generation is described in Algorithm 5.38.

Algorithm 5.38. Schnorr signature generation

Input: A message M to be signed and the signer’s private key d.

Output: The signature (M, s, t) on M.

Steps:

Generate a random session key pair (d′, g^d′), 2 ≤ d′ ≤ r – 1.

s := H(M ‖ g^d′).    /* Here ‖ denotes string concatenation */
t := d′ – ds (mod r).

Similar to the ElGamal scheme, the most time-consuming step in this routine is the computation of the session public key g^d′. But now d′ < r and, therefore, Algorithm 5.38 runs faster than Algorithm 5.36. One can easily check that forging a signature of Alice is computationally equivalent to determining Alice’s private key d from her public key g^d. The importance of using a new session key pair in each run of Algorithm 5.38 is exactly the same as in the case of ElGamal signatures.

The verification of Schnorr signatures (Algorithm 5.39) is based upon the fact that g^d′ = g^t·(g^d)^s. Thus, the knowledge of g, s, t and g^d allows one to compute g^d′ and subsequently H(M ‖ g^d′). The algorithm involves two exponentiations with both the exponents (t and s) being < r. Thus, signature verification is also faster in the Schnorr scheme than in the ElGamal scheme.

Algorithm 5.39. Schnorr signature verification

Input: A signature (M, s, t) and the signer’s public key gd.

Output: Verification status of the signature.

Steps:

u := g^t·(g^d)^s.

s′ := H(M ‖ u).

if (s = s′) { Return “Signature verified”. }

else { Return “Signature not verified”. }
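Algorithms 5.38 and 5.39 can be sketched in the toy subgroup of Z_p* generated by g = 4, of prime order r; encoding the group element by its decimal representation before hashing is an assumption of the sketch.

```python
import hashlib, secrets

# Sketch of Algorithms 5.38/5.39 in the order-r subgroup of Z_p*
# generated by g = 4.  (Toy parameters.)
p, r, g = 2039, 1019, 4

def H(M, a):
    # H(M || a), with the group element a encoded as a decimal string.
    return int.from_bytes(hashlib.sha256(M + str(a).encode()).digest(), "big") % r

d = 2 + secrets.randbelow(r - 2)
gd = pow(g, d, p)

def sign(M):
    dp = 2 + secrets.randbelow(r - 2)           # session key d'
    s = H(M, pow(g, dp, p))                     # s := H(M || g^d')
    t = (dp - d * s) % r                        # t := d' - ds (mod r)
    return s, t

def verify(M, s, t):
    u = pow(g, t, p) * pow(gd, s, p) % p        # u = g^t (g^d)^s = g^d'
    return s == H(M, u)

s, t = sign(b"hello")
assert verify(b"hello", s, t)
```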

5.4.5. The Nyberg–Rueppel Digital Signature Algorithm

The Nyberg–Rueppel (NR) signature algorithm is another adaptation of the ElGamal signature scheme and is based on the intractability of solving the DLP in a group G. We assume that ord G = n has a large prime divisor r and that an element g of order r is available. Here, a key pair is of the form (d, g^d), where the private key d is an integer between 2 and r – 1 (both inclusive) and the public key g^d is an element of 〈g〉. The hash function H converts bit strings to elements of Zr. We also assume the existence of a (publicly known) function F : G → Zr.

NR signature generation can be performed as in Algorithm 5.40.

Algorithm 5.40. Nyberg–Rueppel signature generation

Input: A message M to be signed and the signer’s private key d.

Output: The signature (M, s, t) on M.

Steps:

Generate a random session key pair (d′, g^d′), 2 ≤ d′ ≤ r – 1.

s := H(M) + F(g^d′) (mod r).

t := d′ – ds (mod r).

The only difference between NR signature generation and Schnorr signature generation is the way s is computed. Therefore, our remarks on the security and the efficiency of the Schnorr scheme apply equally well to the NR scheme. Signature verification is also analogous, as Algorithm 5.41 shows.

Algorithm 5.41. Nyberg–Rueppel signature verification

Input: A signature (M, s, t) and the signer’s public key gd.

Output: Verification status of the signature.

Steps:

u := g^t·(g^d)^s.

s′ := H(M) + F(u) (mod r).

if (s = s′) { Return “Signature verified”. }

else { Return “Signature not verified”. }
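A sketch of Algorithms 5.40 and 5.41 follows; it differs from the Schnorr sketch only in how s is formed. Taking F(a) to be the residue a reduced modulo r is one possible (assumed) choice of the conversion function.

```python
import hashlib, secrets

# Sketch of Algorithms 5.40/5.41 (Nyberg-Rueppel) with toy parameters.
p, r, g = 2039, 1019, 4

def H(M):
    return int.from_bytes(hashlib.sha256(M).digest(), "big") % r

F = lambda a: a % r                             # assumed choice of F

d = 2 + secrets.randbelow(r - 2)
gd = pow(g, d, p)

def sign(M):
    dp = 2 + secrets.randbelow(r - 2)           # session key d'
    s = (H(M) + F(pow(g, dp, p))) % r           # s := H(M) + F(g^d') (mod r)
    t = (dp - d * s) % r                        # t := d' - ds (mod r)
    return s, t

def verify(M, s, t):
    u = pow(g, t, p) * pow(gd, s, p) % p        # u = g^t (g^d)^s = g^d'
    return s == (H(M) + F(u)) % r

s, t = sign(b"hello")
assert verify(b"hello", s, t)
```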

5.4.6. The Digital Signature Algorithm (DSA)

The digital signature algorithm (DSA) has been proposed as a standard by the US National Institute of Standards and Technology (NIST) and later accepted as a Federal Information Processing Standard (FIPS) by the US government. This standard is also known as the digital signature standard (DSS). See the NIST document [220] for a complete description of this standard.

Algorithm 5.42. Generation of DSA primes

Input: An integer λ, 0 ≤ λ ≤ 8.

Output: A prime p of bit length l := 512+64λ such that p – 1 has a prime divisor r of length 160 bits.

Steps:

Let l – 1 = 160n + b, 0 ≤ b < 160.     /* n = (l–1) quot 160, b = (l–1) rem 160. */
while (1) {
   do {
       Choose a random seed σ which is a bit string of length k ≥ 160.
       Compute the bit string u := H(σ) ⊕ H((σ + 1) rem 2^k).
       r := u OR 2^159 OR 1.    /* Set the most and least significant bits of u */
   } while (r is not a prime).
   i := 0, f := 2.
   while (i < 4096) {
       for j = 0, 1, . . . , n { vj := H((σ + f + j) rem 2^k). }
       v := v0 + v1·2^160 + · · · + vn–1·2^(160(n–1)) + (vn rem 2^b)·2^(160n) + 2^(l–1).
                                                     /* v is an integer of bit length exactly l */
       p := v – (v rem 2r) + 1.   /* p – 1 is a multiple of 2r */
       if (p is prime) { Return (p, r). }
       i++, f := f + n + 1.
   }
}

DSA is based on the intractability of the DLP in the multiplicative group of the finite field Fp, where p is a prime of bit length 512 + 64λ with 0 ≤ λ ≤ 8. The cardinality p – 1 of Fp* is required to have a prime divisor r of length (exactly) 160 bits. The NIST document [220] specifies a standard method for obtaining such a prime p, which we describe in Algorithm 5.42. We denote by H the SHA-1 hash function that converts bit strings of arbitrary length to bit strings of length 160. We will identify (often without explicit mention) the bit string a1a2 . . . ak of length k with the integer a1·2^(k–1) + a2·2^(k–2) + · · · + ak–1·2 + ak.

The DSA prime generation procedure (Algorithm 5.42) starts by selecting the prime divisor r and then tries to find a prime p such that r|(p–1). The outputs of H are utilized as pseudorandomly generated bit strings of length 160.
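Algorithm 5.42 can be turned into runnable form; the sketch below fixes λ = 0 (l = 512), uses SHA-1 for H, and uses Miller-Rabin as the (probabilistic) primality test, an assumed stand-in for whatever primality test an implementation chooses.

```python
import hashlib, secrets

# Sketch of Algorithm 5.42 for lambda = 0, i.e. l = 512.
L_BITS, K = 512, 160
N, B = divmod(L_BITS - 1, 160)            # l - 1 = 160n + b

def sha1_int(x):                          # H(.), output as a 160-bit integer
    return int.from_bytes(hashlib.sha1(x.to_bytes(K // 8, "big")).digest(), "big")

def is_prime(x, rounds=32):               # Miller-Rabin primality test
    if x < 2 or x % 2 == 0:
        return x == 2
    d, s = x - 1, 0
    while d % 2 == 0:
        d, s = d // 2, s + 1
    for _ in range(rounds):
        a = 2 + secrets.randbelow(x - 3)
        y = pow(a, d, x)
        if y in (1, x - 1):
            continue
        for _ in range(s - 1):
            y = pow(y, 2, x)
            if y == x - 1:
                break
        else:
            return False
    return True

def dsa_primes():
    while True:
        while True:                       # find the 160-bit prime r
            sigma = secrets.randbits(K)
            u = sha1_int(sigma) ^ sha1_int((sigma + 1) % 2**K)
            r = u | (1 << 159) | 1        # set top and bottom bits
            if is_prime(r):
                break
        f = 2
        for _ in range(4096):             # search for p with 2r | p - 1
            vj = [sha1_int((sigma + f + j) % 2**K) for j in range(N + 1)]
            v = sum(vj[j] << (160 * j) for j in range(N))
            v += (vj[N] % 2**B) << (160 * N)
            v += 1 << (L_BITS - 1)        # force bit length exactly l
            p = v - (v % (2 * r)) + 1     # p - 1 is a multiple of 2r
            if is_prime(p):
                return p, r
            f += N + 1

p, r = dsa_primes()
assert p.bit_length() == 512 and r.bit_length() == 160
assert (p - 1) % r == 0
```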

Once the DSA parameters p and r are available, an element g of multiplicative order r can be computed by Algorithm 3.26. Henceforth we assume that p, r and g are public knowledge and need not be supplied as inputs to the signature generation and verification routines. A DSA key pair consists of an integer (the private key) d, 2 ≤ d ≤ r – 1, and the element g^d (the public key) of Fp*.

The DSA signature-generation procedure is given as Algorithm 5.43. One may additionally include a check whether s = 0 or t = 0, and, if so, one should repeat signature generation with another session key. But this, being an extremely rare phenomenon, can be ignored for all practical purposes. Both s and t are elements of Zr and hence are represented as integers between 0 and r – 1.

Algorithm 5.43. DSA signature generation

Input: A message M to be signed and the signer’s private key d.

Output: The signature (M, s, t) on M.

Steps:

Generate a random session key d′, 2 ≤ d′ ≤ r − 1.

s := (g^{d′} (mod p)) (mod r).

t := d′^{−1}(H(M) + ds) (mod r).

DSA signature verification is described in Algorithm 5.44. For a valid signature (M, s, t) on a message M, the algorithm computes w ≡ d′(H(M) + ds)^{−1} (mod r), w1 ≡ H(M)w (mod r) and w2 ≡ sw (mod r). Therefore, g^{w1}(g^d)^{w2} ≡ g^{w1 + d·w2} ≡ g^{w(H(M) + ds)} ≡ g^{d′(H(M) + ds)^{−1}(H(M) + ds)} ≡ g^{d′} (mod p). Reduction modulo r now gives (g^{w1}(g^d)^{w2} (mod p)) (mod r) = (g^{d′} (mod p)) (mod r) = s.

Algorithm 5.44. DSA signature verification

Input: A signature (M, s, t) and the signer’s public key gd.

Output: Verification status of the signature.

Steps:

if (s ∉ {1, 2, . . . , r − 1} or t ∉ {1, 2, . . . , r − 1}) { Return “Signature not verified”. }

w := t–1 (mod r).

w1 := H(M)w (mod r).

w2 := sw (mod r).

if ((g^{w1}(g^d)^{w2} (mod p)) (mod r) = s) { Return “Signature verified”. }

else { Return “Signature not verified”. }

DSA signature generation performs a single exponentiation and DSA verification does two exponentiations modulo p. All the exponents are positive and ≤ r. Thus, DSA is essentially as fast as the Schnorr scheme or the NR scheme.
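Algorithms 5.43 and 5.44 can be exercised end to end on a toy group. In the Python sketch below, the miniature parameters p = 23, r = 11, g = 2 (an element of order 11 modulo 23) and the use of SHA-256 in place of SHA-1 are illustrative assumptions, not real DSA parameters.

```python
# Toy DSA signing and verification (Algorithms 5.43 and 5.44) with the
# miniature parameters p = 23, r = 11, g = 2 (ord(2) = 11 in Z_23^*).
# SHA-256 stands in for SHA-1; real DSA uses a 512-1024 bit prime p.
import hashlib, random

p, r, g = 23, 11, 2
d = random.randrange(2, r)          # private key
y = pow(g, d, p)                    # public key g^d

def H(M):
    return int(hashlib.sha256(M.encode()).hexdigest(), 16) % r

def sign(M):
    while True:
        k = random.randrange(2, r)              # session key d'
        s = pow(g, k, p) % r                    # s := (g^d' mod p) mod r
        t = pow(k, -1, r) * (H(M) + d * s) % r  # t := d'^(-1)(H(M) + ds)
        if s != 0 and t != 0:
            return (s, t)

def verify(M, s, t):
    if not (0 < s < r and 0 < t < r):
        return False
    w = pow(t, -1, r)
    w1, w2 = H(M) * w % r, s * w % r
    # check (g^w1 (g^d)^w2 mod p) mod r = s
    return pow(g, w1, p) * pow(y, w2, p) % p % r == s

s, t = sign("attack at dawn")
assert verify("attack at dawn", s, t)
```

With real parameters, the only change is the size of p, r and the use of SHA-1; the arithmetic is identical.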

*5.4.7. The Elliptic Curve Digital Signature Algorithm (ECDSA)

The ECDSA is the elliptic curve analog of the DSA. Algorithm 5.45 describes the generation of the domain parameters necessary to set up an ECDSA system. One first selects a suitable finite field F_q and takes a random elliptic curve E over F_q. E must be such that the cardinality n of the group E(F_q) has a suitably large prime divisor r. One generates a random point P ∈ E(F_q) of order r and works in the subgroup 〈P〉 of E(F_q) generated by P. It is assumed that q is either a prime p or a power 2^m of 2.

Algorithm 5.45. Generation of ECDSA parameters

Input: A finite field F_q, where q is a prime p or a power 2^m of 2.

Output: A set of parameters E, n, r, P for the ECDSA.

Steps:

while (1) {
  Choose a, b ∈ F_q randomly.
  Consider the curve E : Y^2 = X^3 + aX + b (for q = p) or E : Y^2 + XY = X^3 + aX^2 + b (for q = 2^m).
  Compute n := ord E(F_q).
  if (n has a prime divisor r > max(2^160, 4√q)) {
     if (n ∤ (q^k − 1) for k = 1, . . . , 20) and (n ≠ q) {
        do {
          Select P′ ∈ E(F_q) randomly.
          P := (n/r)P′.
        } while (P = O).
        Return (E, n, r, P).
     }
  }
}

The order n = ord E(F_q) can be computed using the SEA algorithm (for q = p) or the Satoh–FGH algorithm (for q = 2^m) described in Section 3.6. The integer n should be factored to check if it has a prime divisor r > max(2^160, 4√q). The condition n ∤ (q^k − 1) for small values of k is necessary to avoid the MOV attack, whereas the condition n ≠ q ensures that the SmartASS attack cannot be mounted. E(F_q) is not necessarily a cyclic group. But, r being a prime, a point P = (n/r)P′ ≠ O must be one of order r.
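The two arithmetic conditions on the order n can be tested directly; a small Python sketch (the numeric examples are illustrative, not real curve orders):

```python
# Check the embedding-degree (MOV) and anomalous-curve (n = q) conditions
# of Algorithm 5.45 for a candidate group order n over F_q.
def order_is_safe(n, q, kmax=20):
    if n == q:                      # anomalous curve
        return False
    for k in range(1, kmax + 1):    # small embedding degree: MOV attack
        if (q ** k - 1) % n == 0:
            return False
    return True

# 7 divides 2^3 - 1 = 7, so a subgroup of order 7 over F_2 embeds into
# F_{2^3}^* and fails the MOV condition:
assert not order_is_safe(7, 2)
assert not order_is_safe(13, 13)    # anomalous case n = q
```

For any n larger than q^20 − 1 (as a 160-bit r always is for practical q^20), the divisibility can fail only through a genuinely small embedding degree, which is what the loop detects.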

An ECDSA key pair consists of a private key d (an integer in the range 2 ≤ d ≤ r − 1) and the corresponding public key dP ∈ E(F_q). H denotes the hash function SHA-1 that converts bit strings of arbitrary length to bit strings of length 160. As discussed in connection with DSA, we identify bit strings with integers. We also make an association of elements of F_q with integers in the set {0, 1, . . . , q − 1}. ECDSA signatures can be generated as in Algorithm 5.46. It is necessary to check the conditions s ≠ 0 and t ≠ 0. If these conditions are not both satisfied, one should re-run the procedure with a new session key pair.

Algorithm 5.46. ECDSA signature generation

Input: A message M to be signed and the signer’s private key d.

Output: The signature (M, s, t) on M.

Steps:

Generate a random session key pair (d′, d′P), 2 ≤ d′ ≤ r − 1.

/* Let us denote d′P = (h, k), where h and k are identified with integers */

s := h (mod r).

t := d′^{−1}(H(M) + ds) (mod r).

ECDSA signature verification is explained in Algorithm 5.47. The correctness of this algorithm can be proved like that of Algorithm 5.44.

Algorithm 5.47. ECDSA signature verification

Input: A signature (M, s, t) and the signer’s public key dP.

Output: Verification status of the signature.

Steps:

if (s ∉ {1, 2, . . . , r − 1} or t ∉ {1, 2, . . . , r − 1}) { Return “Signature not verified”. }

w := t–1 (mod r).

w1 := H(M)w (mod r).

w2 := sw (mod r).

Q := w1P + w2(dP).

if (Q = O) { Return “Signature not verified”. }

/* Otherwise denote Q = (h, k) */

s′ := h (mod r).

if (s′ = s) { Return “Signature verified”. }

else { Return “Signature not verified”. }
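A miniature end-to-end illustration of Algorithms 5.45-5.47 is possible in Python. In the sketch below, the toy field F_97, the brute-force point-order computation (standing in for SEA) and SHA-256 (standing in for SHA-1) are all simplifying assumptions; the parameter search loosely mirrors Algorithm 5.45.

```python
# Toy ECDSA over curves y^2 = x^3 + 2x + b (mod 97): search for a point
# of prime order r >= 11, then run Algorithms 5.46/5.47.  Brute force
# replaces SEA point counting; SHA-256 replaces SHA-1.  Illustration only.
import hashlib, random

q, A = 97, 2
INF = None                               # point at infinity O

def add(P, Q):
    if P is None: return Q
    if Q is None: return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % q == 0:
        return INF
    if P == Q:
        lam = (3 * x1 * x1 + A) * pow(2 * y1, -1, q) % q
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, q) % q
    x3 = (lam * lam - x1 - x2) % q
    return (x3, (lam * (x1 - x3) - y1) % q)

def mul(k, P):
    R = INF
    while k:
        if k & 1:
            R = add(R, P)
        P, k = add(P, P), k >> 1
    return R

def point_order(P):
    m, Q = 1, P
    while Q is not None:
        Q, m = add(Q, P), m + 1
    return m

def find_base_point():
    for b in range(1, 50):               # vary the curve, as in Algorithm 5.45
        for x in range(q):
            rhs = (x ** 3 + A * x + b) % q
            ys = [y for y in range(q) if y * y % q == rhs]
            if not ys:
                continue
            P1 = (x, ys[0])
            n = point_order(P1)
            r, m, f = 1, n, 2            # largest prime factor r of n
            while f * f <= m:
                while m % f == 0:
                    r, m = f, m // f
                f += 1
            if m > 1:
                r = m
            if r >= 11 and r != q:       # big enough and non-anomalous
                return mul(n // r, P1), r
    raise RuntimeError("no suitable toy curve found")

P, r = find_base_point()
d = random.randrange(2, r)               # private key
Y = mul(d, P)                            # public key dP

def H(M):
    return int(hashlib.sha256(M.encode()).hexdigest(), 16) % r

def sign(M):
    while True:
        k = random.randrange(2, r)
        h = mul(k, P)[0]                 # x-coordinate of d'P
        s = h % r
        t = pow(k, -1, r) * (H(M) + d * s) % r
        if s and t:
            return (s, t)

def verify(M, s, t):
    if not (0 < s < r and 0 < t < r):
        return False
    w = pow(t, -1, r)
    Q = add(mul(H(M) * w % r, P), mul(s * w % r, Y))
    return Q is not None and Q[0] % r == s

s, t = sign("hello")
assert verify("hello", s, t)
```

The structure mirrors DSA exactly: the modular exponentiations g^{w1}(g^d)^{w2} become the point combination w1·P + w2·(dP), and the reduction modulo r acts on the x-coordinate.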

*5.4.8. The XTR Signature Algorithm

As discussed in Section 5.2.7, the XTR family of algorithms is an adaptation of other conventional algorithms over finite fields. XTR achieves a speed-up of about three using a clever way of representing elements in certain finite fields. It is no surprise that the DLP-based signature algorithms, described so far, can be given efficient XTR renderings. We explain here XTR–DSA, the XTR version of the digital signature algorithm.

In order to set up an XTR system, we need a prime p ≡ 2 (mod 3). The XTR group G is a subgroup of the multiplicative group F_{p^6}^* and has a prime order q dividing p^2 − p + 1. For compliance with the original version of DSA, one requires q to be of bit length 160. The trace map Tr : F_{p^6} → F_{p^2} taking x ↦ x + x^{p^2} + x^{p^4} is used to represent an element a ∈ G by the element Tr(a) ∈ F_{p^2}. Under this representation, arithmetic in G translates to that in F_{p^2}. For example, we have seen how exponentiation in G can be efficiently implemented using F_{p^2} arithmetic (Algorithm 5.20). The trace Tr(g) of a generator g of G should also be made available for setting up the XTR domain parameters. In Section 5.2.7, we have discussed how a random set of XTR parameters (p, q, Tr(g)) can be computed.

An XTR key pair comprises a random integer d (the private key) and the trace Tr(g^d) ∈ F_{p^2} (the public key). Algorithm 5.20 is used to compute Tr(g^d) from Tr(g) and d. This algorithm gives Tr(g^{d−1}) and Tr(g^{d+1}) as by-products. For an implementation of XTR–DSA, we require these two elements of F_{p^2}. So we assume that the public key consists of the three traces S_d(Tr(g)) = (Tr(g^{d−1}), Tr(g^d), Tr(g^{d+1})). As explained in Lenstra and Verheul [172], the values Tr(g^{d−1}) and Tr(g^{d+1}) can be computed easily from Tr(g^d) even when d is unknown, so it suffices to store only Tr(g^d) as the public key. But we avoid the details of this computation here and assume that all the three traces are available to the signature verifier.

Algorithm 5.20 provides an efficient way of computing exponentiations in G. For DSA-like signature verification (cf. Algorithm 5.44), one computes products of the form ga(gd)b with d unknown. In the XTR world, this amounts to computing the trace Tr(ga(gd)b) from the knowledge of a, b, Tr(g) and Tr(gd) (or Sd(Tr(g))) but without the knowledge of d. The XTR exponentiation algorithm is as such not applicable in such a situation. We should, therefore, prescribe a method to compute traces of products in G. Doing that requires some mathematics that we mention now without proofs. See Lenstra and Verheul [170] for the missing details.

Let e := ab^{−1} (mod q). Then, a + bd ≡ b(e + d) (mod q), that is, Tr(g^a(g^d)^b) = Tr(g^{b(e+d)}); it is thus sufficient to compute Tr(g^{e+d}) from the knowledge of e, Tr(g) and Tr(g^d). We treat the 3-tuple S_k(Tr(g)) as a row vector (over F_{p^2}). For c ∈ F_{p^2}, let M_c denote the matrix

Equation 5.9


We take c := Tr(g). It can be shown that det M_{Tr(g)} ≠ 0, that is, the matrix M_{Tr(g)} is invertible, and we have:

Equation 5.10


Here the superscript t denotes the transpose of a matrix. With these observations, one can write the procedure for computing Tr(ga(gd)b) as in Algorithm 5.48.

Algorithm 5.48. XTR multiplication

Input: a, b, Tr(g) and Sd(Tr(g)) for some unknown d.

Output: Tr(ga(gd)b).

Steps:

Compute e := ab^{−1} (mod q).
Compute S_e(Tr(g)) using Algorithm 5.20 with c := Tr(g) and n := e.
Use Equation (5.10) to compute Tr(g^{e+d}).
Use Algorithm 5.20 with c := Tr(g^{e+d}) and n := b to compute
    S_b(Tr(g^{e+d})) = (Tr(g^{(b−1)(e+d)}), Tr(g^{b(e+d)}), Tr(g^{(b+1)(e+d)})).
Return Tr(g^{b(e+d)}).
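The exponent manipulation underlying Algorithm 5.48, namely a + bd ≡ b(e + d) (mod q) for e := ab^{−1} (mod q), is elementary modular arithmetic and can be checked numerically (the toy prime and random values below are illustrative only):

```python
# Check that e := a*b^(-1) (mod q) turns the exponent a + b*d into
# b*(e + d) modulo q, so Tr(g^a (g^d)^b) = Tr(g^(b(e+d))).
import random

q = 1009                      # toy stand-in for the 160-bit prime q
a, b, d = (random.randrange(1, q) for _ in range(3))
e = a * pow(b, -1, q) % q
assert (a + b * d) % q == b * (e + d) % q
```

This is exactly why a trace of a product can be obtained from a single "shifted" exponentiation by e + d.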

XTR–DSA signature generation (Algorithm 5.49) is an obvious adaptation of Algorithm 5.43.

Algorithm 5.49. XTR signature generation

Input: A message M to be signed and the signer’s private key d.

Output: The signature (M, s, t) on M with s, t ∈ {1, 2, . . . , q − 1}.

Steps:

do {
  Generate a random d′ ∈ {2, . . . , q − 1}.
  Compute Tr(g^{d′}).          /* Use Algorithm 5.20 with c := Tr(g) and n := d′ */
  Let Tr(g^{d′}) = x1 α + x2 α^2.     /* α is defined in Section 5.2.7 to represent F_{p^2} */
  s := x1 + p·x2 (mod q).
} while (s = 0).
t := d′^{−1}(H(M) + ds) (mod q).         /* Here H is the hash function SHA-1 */

The bulk of the time taken by Algorithm 5.49 goes to the computation of Tr(g^{d′}). Since the trace representation of XTR makes this exponentiation three times as efficient as the corresponding DSA exponentiation, XTR–DSA signature generation runs nearly three times as fast as DSA signature generation.

XTR–DSA signature verification can be easily translated from Algorithm 5.44 and is shown in Algorithm 5.50. The most costly step in the XTR–DSA verification routine is the computation of Tr(g^{w1}(g^d)^{w2}). One uses Algorithm 5.48 for this purpose. This algorithm, in turn, invokes the exponentiation Algorithm 5.20 twice. For the original DSA signature verification (Algorithm 5.44), the costliest step is the computation of g^{w1}(g^d)^{w2}, which involves two exponentiations and a (cheap) multiplication. A careful analysis shows that XTR–DSA signature verification runs nearly 1.75 times faster than DSA verification.

Algorithm 5.50. XTR signature verification

Input: XTR–DSA signature (M, s, t) on a message M and the signer’s public key (Tr(gd–1), Tr(gd), Tr(gd+1)).

Output: Verification status of the signature.

Steps:

if (s ∉ {1, 2, . . . , q − 1} or t ∉ {1, 2, . . . , q − 1}) { Return “Signature not verified”. }

w := t–1 (mod q).

w1 := H(M)w (mod q).

w2 := sw (mod q).

Compute Tr(g^{w1}(g^d)^{w2}).     /* Use Algorithm 5.48 */
Write this trace value as x1 α + x2 α^2.     /* See Section 5.2.7 */

s′ := x1 + p·x2 (mod q).

if (s′ = s) { Return “Signature verified”. }

else { Return “Signature not verified”. }

*5.4.9. The NTRUSign Algorithm

The NTRU Signature Scheme (NSS) (Hoffstein et al. [131]) is an adaptation of the NTRU encryption algorithm discussed in Section 5.2.8. Cryptanalytic studies (Gentry et al. [110]) show that the NSS has security flaws. A newer version of the NSS, referred to as NTRUSign and resistant to these attacks, has been proposed by Hoffstein et al. [128]. In this section, we provide a brief overview of NTRUSign.

In order to set up the domain parameters for NTRUSign, we start with a positive integer n and consider the ring R := Z[X]/(X^n − 1). Elements of R are polynomials with integer coefficients and of degrees ≤ n − 1. The multiplication of R is denoted by ⊛, which is essentially the multiplication of two polynomials of Z[X] followed by setting X^n = 1. We also fix a positive integer β to be used as a modulus for the coefficients of the polynomials in R. Two subsets F_f and F_g of R, determined by suitably chosen parameters ν_f and ν_g, are of importance for the NTRUSign algorithm. The message space is assumed to consist of pairs of polynomials of R with coefficients reduced modulo β. We further assume that we have at our disposal a hash function H that maps messages (that is, binary strings) to elements of the message space.

Let a = a_0 + a_1 X + · · · + a_{n−1} X^{n−1} ∈ R. The average of the coefficients of a is denoted by ā := (1/n)(a_0 + a_1 + · · · + a_{n−1}). The centred norm ‖a‖ of a is defined by

‖a‖^2 := (a_0 − ā)^2 + (a_1 − ā)^2 + · · · + (a_{n−1} − ā)^2.

For two polynomials a, b ∈ R, one also defines

‖(a, b)‖^2 := ‖a‖^2 + ‖b‖^2.

The parameters ν_f and ν_g should be so chosen that any polynomial f ∈ F_f and any polynomial g ∈ F_g have (centred) norms of the order O(n). An upper bound B on the norms (of pairs of polynomials) should also be predetermined.
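The centred norm is immediate to compute from a coefficient vector; a small Python sketch (the representation of ring elements as plain coefficient lists is a choice made here for illustration):

```python
# Centred norm of a polynomial in R, represented by its coefficient list,
# and the combined norm of a pair (a, b).
def centred_norm_sq(a):
    mu = sum(a) / len(a)                      # average of the coefficients
    return sum((c - mu) ** 2 for c in a)

def pair_norm_sq(a, b):
    return centred_norm_sq(a) + centred_norm_sq(b)

assert centred_norm_sq([5, 5, 5, 5]) == 0     # constant vector: norm 0
assert centred_norm_sq([1, -1, 0, 0]) == 2.0
```

Subtracting the average makes the norm insensitive to adding a constant polynomial, which is why it is called "centred".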

Typical values for NTRUSign parameters are

(n, β, νf, νg, B) = (251, 128, 73, 71, 300).

It is estimated that these choices lead to a security level at least as high as in an RSA scheme with a 1024-bit modulus. For very long-term security, one may go for (n, β) = (503, 256).

In order to set up a key pair, the signer first chooses two random polynomials f ∈ F_f and g ∈ F_g. The polynomial f should be invertible modulo β, and the signer computes f_β ∈ R with the property that f ⊛ f_β ≡ 1 (mod β). The public key of the signer is the polynomial h ≡ f_β ⊛ g (mod β), whereas the private key is the tuple (f, g, F, G), where F and G are two polynomials in R satisfying

f ⊛ G − g ⊛ F = β   and   ‖F‖, ‖G‖ = O(n).

Hoffstein et al. [128] present an algorithm to compute such polynomials F and G from f and g; the norms involved are controlled by a given constant c.

Algorithm 5.51. NTRU signature generation

Input: A message M to be signed and the signer’s private key (f, g, F, G).

Output: The signature (M, s) on M.

Steps:

Compute (m1, m2) := H(M).

Compute polynomials A, B, a, b ∈ R satisfying

G ⊛ m1 − F ⊛ m2 = A + βB,
−g ⊛ m1 + f ⊛ m2 = a + βb,

where a and A have coefficients in the range between −β/2 and +β/2.

Compute s ≡ f ⊛ B + F ⊛ b (mod β).

NTRUSign signature generation is described in Algorithm 5.51. It is apparent that the NTRUSign algorithm derives its security from the difficulty in computing a vector v in a certain lattice, close to the vector defined by the hashed message (m1, m2). For defining the lattice, we first note that a polynomial u = u_0 + u_1 X + · · · + u_{n−1} X^{n−1} ∈ R can be identified with the vector (u_0, u_1, . . . , u_{n−1}) of dimension n defined by its coefficients. Similarly, two polynomials u, v ∈ R define a vector, denoted by (u, v), of dimension 2n. To the public key h we associate the 2n-dimensional lattice

L_h := {(u, v) | u, v ∈ R with v ≡ h ⊛ u (mod β)}.
It is clear from the definitions that both (f, g) and (F, G) are in Lh.

If h = (h_0, h_1, . . . , h_{n−1}), then for each i = 0, 1, . . . , n − 1 we have

X^i ⊛ h(X) ≡ (h_{n−i}, . . . , h_{n−1}, h_0, . . . , h_{n−i−1}) (mod β) and
0 ⊛ h(X) ≡ 0 ≡ βX^i (mod β).

It follows immediately that L_h is generated by the rows of the matrix

( I   H )
( 0  βI )

where I is the n × n identity matrix, 0 is the n × n zero matrix, and H is the n × n matrix whose i-th row consists of the coefficients of X^i ⊛ h(X).
Now, consider the signature generation routine (Algorithm 5.51). The hash function H generates from the message M a random 2n-dimensional vector m := (m1, m2) not necessarily on Lh. We then look at the vector v := (s, t) defined as:

s ≡ f ⊛ B + F ⊛ b (mod β), and
t ≡ g ⊛ B + G ⊛ b (mod β).

The lattice L_h has the rotational invariance property, namely, if (u, v) ∈ L_h, then (X^i ⊛ u, X^i ⊛ v) is also in L_h for all i = 0, 1, . . . , n − 1. More generally, if (u, v) ∈ L_h, then (w ⊛ u, w ⊛ v) ∈ L_h for any polynomial w ∈ R. In particular, since v = (s, t) ≡ B ⊛ (f, g) + b ⊛ (F, G) (mod β) and since (f, g), (F, G) ∈ L_h, it follows that v ∈ L_h. Of these two polynomials, only s is needed for the generation of NTRUSign signatures. The other is needed during signature verification and can be computed easily from s using the formula t ≡ h ⊛ s (mod β), the validity of which is established from the definition of the lattice L_h.
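The ⊛ product, the rotation property and the membership congruence t ≡ h ⊛ s (mod β) can all be illustrated in a few lines of Python (the toy n, β and random polynomials below are illustrative choices):

```python
# Cyclic convolution u ⊛ v in R = Z[X]/(X^n - 1), the rotation property
# of multiplication by X, and reduction of a lattice partner modulo beta.
import random

def star(u, v):
    n = len(u)
    w = [0] * n
    for i in range(n):
        for j in range(n):
            w[(i + j) % n] += u[i] * v[j]   # X^i * X^j = X^((i+j) mod n)
    return w

n, beta = 8, 32                             # toy parameters
h = [random.randrange(beta) for _ in range(n)]

# multiplying by X rotates the coefficient vector by one position
X = [0, 1] + [0] * (n - 2)
assert star(X, h) == [h[-1]] + h[:-1]

# the lattice partner of u is t := h ⊛ u reduced modulo beta
u = [random.randrange(-1, 2) for _ in range(n)]
t = [c % beta for c in star(h, u)]
assert t == [c % beta for c in star(u, h)]  # ⊛ is commutative
```

The rotation check is exactly the statement that X^i ⊛ h cyclically shifts the coefficients of h, which is why the lattice basis has the circulant block H.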

The vector v = (s, t) is close to the message vector m in the sense that the norm ‖(m1 − s, m2 − t)‖ is small, bounded in terms of the constant c chosen earlier (see Hoffstein et al. [128] for a proof of this relation). The verification routine can, therefore, be designed as in Algorithm 5.52.

Algorithm 5.52. NTRU signature verification

Input: A signature (M, s) and the signer’s public key h.

Output: Verification status of the signature.

Steps:

Compute (m1, m2) := H(M).

Compute t ≡ h ⊛ s (mod β).

if (‖(m1s, m2t)‖ ≤ B) { Return “Signature verified”. }

else { Return “Signature not verified”. }

For the choice (n, β, c) = (251, 128, 0.45), we have ‖(m1 − s, m2 − t)‖ ≈ 216. Therefore, choosing the norm bound B slightly larger than this value (say, B = 300) allows the verification scheme to work correctly most of the time. The knowledge of the private key (f, g, F, G) allows the legitimate signer to compute the close vector (s, t) easily. On the other hand, for a forger (who is lacking the private information) fast computation of a vector v′ = (s′, t′) with small norm ‖(m1 − s′, m2 − t′)‖ (say, ≤ 400 for the above parameter values) seems to be an intractable task. This is precisely why forging an NTRUSign signature is considered infeasible.

An exhaustive search can be mounted for generating a valid signature (s′, t′) on a message M with H(M) = (m1, m2). More precisely, a forger fixes half of the 2n coefficients of the polynomials s′ and t′ and then tries to solve t′ ≡ h ⊛ s′ (mod β) for the remaining half such that the norm ‖(m1 − s′, m2 − t′)‖ is small. It is estimated (see Hoffstein et al. [128] for the details) that the probability that a random guess for the unknown half succeeds is very low (≤ 2^{−178.44} for the given parameter values).

Another attack on the NTRUSign scheme is to determine the polynomials f, g from a knowledge of h. Since (f, g) is a short non-zero vector in the lattice Lh, an algorithm that can find such vectors can determine (f, g) (or a rotated version of it). However, for a proper choice of the parameters such an algorithm is deemed infeasible. (Also see the NTRU encryption scheme in Section 5.2.8.)

Like the NTRU encryption scheme, the NTRUSign scheme is fast: both signature generation and verification can be carried out in time O(n^2). This efficiency is one of the main attractions of the NTRUSign scheme. Indeed, it may be adopted as an IEEE standard. Unfortunately, however, several attacks on NTRUSign are known. Gentry and Szydlo [111] indicate the possibility of extending the attacks of Gentry et al. [110]. Nguyen [217] proposes a more concrete attack on NTRUSign that is capable of recovering the private key from only 400 signatures. The future of NTRUSign and its modifications remains uncertain.

5.4.10. Blind Signature Schemes

Suppose that an entity (Alice), referred to as the sender or the user, wants to get a message M signed by a second entity (Bob), called the signer, without revealing M to Bob. This can be achieved as follows. First, Alice transforms the message M to f(M) and sends f(M) to Bob. Bob generates the signature (f(M), σ) on f(M) and sends this pair back to Alice. Finally, Alice applies a second transform g to generate the signature (M, s) of Bob on M. The transform f hides the actual message M from Bob and, thereby, prevents Bob from associating Alice with the signed message (M, s). Such a signature scheme is called a blind signature scheme.

Blind signatures are widely used in electronic payment systems in which Alice (a customer) wants the signature of Bob (the bank) on an electronic coin, but does not want the bank to be capable of associating Alice with the coin. In this way, Alice achieves anonymity while spending an electronic coin.

In a blind signature scheme, Bob does not know M, but his signature on the transformed message f(M) is essential for Alice to reconstruct the signature on M. Furthermore, the blind signature on M should not allow Alice to compute the blind signature on another message M′. More generally, Alice should not be able to generate l + 1 (or more) blind signatures with only l (or fewer) interactions with Bob. A forgery of this kind is often called an (l, l + 1) forgery, or a one-more forgery (in case l is bounded above by a polynomial in the security parameter), or a strong one-more forgery (in case l is bounded above only poly-logarithmically in the security parameter). An (l, l + 1) forgery is mountable on a scheme which is not existentially unforgeable (Exercises 5.15 and 5.19). Usually, existential forgery gives forged signatures on messages over which the forger has no (or little) control (that is, on messages which are likely to be meaningless).

Now, we describe some common blind signature schemes. We provide a brief overview of the algorithms. Detailed analysis of the security of these schemes can be found in the references cited at the end of this chapter.

Chaum’s RSA blind signature protocol

Chaum’s blind signature protocol is based on the intractability of the RSAP (or the IFP). The signer generates two (distinct) large random primes p and q and computes n := pq. He then chooses a random integer e with gcd(e, φ(n)) = 1 and computes an integer d such that ed ≡ 1 (mod φ(n)). The public key (of the signer) is the pair (n, e), whereas the private key is d. Chaum’s protocol works as in Algorithm 5.53.

Algorithm 5.53. Chaum’s RSA blind signature

Input: A message M generated by Alice.

Output: Bob’s blind RSA signature (M, s) on M.

Steps:

Alice hashes the message M to m ∈ Z_n.

Alice chooses a random ρ ∈ Z_n^* and computes m̃ := ρ^e m (mod n).

Alice sends m̃ to Bob.

Bob generates the signature σ := m̃^d (mod n) on m̃.

Bob sends σ to Alice.

Alice computes Bob’s (blind) signature s := ρ^{−1} σ (mod n) on M.

Since σ ≡ (ρ^e m)^d ≡ ρ m^d (mod n), we have s ≡ ρ^{−1} σ ≡ m^d (mod n), that is, s is indeed the RSA signature of Bob on M. Bob receives only the blinded value ρ^e m and gains no idea about m, since ρ is randomly and secretly chosen by Alice.
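Algorithm 5.53 runs end to end in a few lines. In the Python sketch below, the toy modulus n = 3233 (p = 61, q = 53, e = 17, d = 2753), the fixed blinding factor and the integer stand-in for the hashed message are illustrative assumptions:

```python
# Chaum's RSA blind signature on a toy modulus: blind, sign, unblind,
# and check that the result is Bob's ordinary RSA signature m^d mod n.
n, e, d = 3233, 17, 2753        # toy RSA key: n = 61 * 53, ed = 1 mod phi(n)

m = 1234                        # stand-in for the hashed message H(M)
rho = 7                         # Alice's random blinding factor, gcd(rho, n) = 1

blinded = pow(rho, e, n) * m % n       # Alice -> Bob
sigma = pow(blinded, d, n)             # Bob signs the blinded message
s = pow(rho, -1, n) * sigma % n        # Alice unblinds

assert s == pow(m, d, n)               # s is the RSA signature on m
```

Bob only ever sees `blinded`, which is uniformly distributed for random ρ; this is the blindness of the scheme.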

The Schnorr blind signature protocol

Let G be a finite multiplicative Abelian group and let g ∈ G be of order r (a large prime). We assume that computing discrete logarithms in G is an infeasible task. The key pair of the signer is denoted by (d, g^d), where the integer d, 2 ≤ d ≤ r − 1, is the private key and g^d the public key. The Schnorr blind signature protocol is described in Algorithm 5.54.

Algorithm 5.54. Schnorr blind signature

Input: A message M generated by Alice.

Output: Bob’s blind Schnorr signature (M, s, t) on M.

Steps:

Alice asks Bob to initiate a communication.

Bob chooses a random and computes .

Bob sends to Alice.

Alice selects α, randomly.

Alice computes .

Alice computes and .

Alice sends to Bob.

Bob computes .

Bob sends to Alice.

Alice computes .

It is easy to check that the output (M, s, t) of Algorithm 5.54 is a valid Schnorr signature of Bob on the message M. The session key d′ (Algorithm 5.38) for this signature involves the random quantity chosen by Bob in the first step. Since d and this quantity are secrets known only to Bob, Alice must depend on Bob for the computation of the response. The message M is never sent to Bob. Moreover, its hash is masked by β. This is how this protocol achieves blindness.
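Several maskings of the Schnorr interaction appear in the literature. The Python sketch below fixes one standard formulation (signature (s, t) with s = H(M ‖ g^{d′}); the exact roles assigned to α and β here are an assumption, chosen so that the algebra closes) over the toy subgroup of order 11 in Z_23^*:

```python
# One standard blinding of the Schnorr scheme over the order-11 subgroup
# of Z_23^* generated by g = 2.  Bob never sees M; Alice never sees k or d.
import hashlib, random

p, r, g = 23, 11, 2
d = random.randrange(2, r)                  # Bob's private key
y = pow(g, d, p)                            # Bob's public key g^d

def H(M, R):
    return int(hashlib.sha256(f"{M}|{R}".encode()).hexdigest(), 16) % r

M = "pay 10 coins"

k = random.randrange(1, r)                  # Bob: session secret
R_bar = pow(g, k, p)                        # Bob -> Alice: commitment

alpha, beta = random.randrange(r), random.randrange(r)
R = R_bar * pow(g, alpha, p) * pow(y, beta, p) % p   # Alice: masked commitment
s = H(M, R)                                 # challenge on the real message
s_bar = (s + beta) % r                      # Alice -> Bob: blinded challenge

t_bar = (k + s_bar * d) % r                 # Bob -> Alice: blinded response
t = (t_bar + alpha) % r                     # Alice: unblinded response

# standard Schnorr check: recompute g^d' as g^t * y^(-s) and re-hash
R_check = pow(g, t, p) * pow(y, (r - s) % r, p) % p
assert H(M, R_check) == s
```

Bob sees only (s_bar, t_bar), both offset by Alice's random α and β, so he cannot later link the published signature (s, t) to this session.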

The Okamoto–Schnorr blind signature protocol

Okamoto’s adaptation of the Schnorr scheme is proved to be resistant to an attack by a third entity (Pointcheval and Stern [237]). As in the Schnorr scheme, we fix a (finite multiplicative Abelian) group G (in which it is difficult to compute discrete logarithms). We then choose two elements g1, g2 ∈ G of (large prime) order r. The private key of the signer now comprises a pair (d1, d2) of integers in {2, . . . , r − 1}, whereas the public key y is the group element g1^{d1} g2^{d2}. We assume that there is a hash function H whose outputs are in Z_r. We identify elements of G with bit strings. The Okamoto–Schnorr blind signature protocol is explained in Algorithm 5.55.

Algorithm 5.55. Okamoto–Schnorr blind signature

Input: A message M generated by Alice.

Output: Bob’s blind signature (M, s1, s2, s3) on M.

Steps:

Alice asks Bob to initiate a communication.

Bob chooses random and computes .

Bob sends to Alice.

Alice selects α, β, randomly.

Alice computes .

Alice computes and .

Alice sends to Bob.

Bob computes and .

Bob sends and to Alice.

Alice computes and .

An Okamoto–Schnorr signature (M, s1, s2, s3) on a message M can be verified by checking the equality s1 = H(M ‖ u), where u := g1^{s2} g2^{s3} y^{s1}. Each invocation of the protocol uses a session private key chosen by Bob. Alice must depend on Bob for generating s2 and s3, because she is unaware of the private values d1, d2 and of Bob’s session secrets. Alice, in an attempt to forge Bob’s blind signature, may start with random quantities of her own choice. But she still needs the integers d1 and d2 in order to complete the protocol. The blindness of Algorithm 5.55 stems from the fact that the message M is never sent to Bob and its hash is masked by γ.

5.4.11. Undeniable Signature Schemes

So far we have seen signature schemes for which any entity with a knowledge of the signer’s public key can verify the authenticity of a signature. There are, however, situations where an active participation of the signer is necessary for the verification of a signature. Moreover, during a verification interaction a signer should not be allowed to deny a legitimate signature made by him. A signature meeting these requirements is called an undeniable signature.

Undeniable signatures are typically used for messages that are too confidential or private to be given unlimited verification facility. In case of a dispute, an entity should be capable of proving a forged signature to be so and at the same time must accept the binding to his own valid signatures. So in addition to the signature generation and verification protocols, an undeniable signature scheme comes with a denial or disavowal protocol to guard against a cheating signer that is unwilling to accept his valid signature either by not taking part in the verification interaction or by responding incorrectly or by claiming a valid signature to be forged.

There are applications where undeniable signatures are useful. For example, a software vendor can use undeniable signatures to prove the authenticity of its products only to its (paying) customers (and not to everybody).

Chaum and van Antwerpen gave the first concrete realization of an undeniable signature scheme [52, 51]. It is based on the intractability of computing discrete logarithms in the group Z_p^*, p a prime. Gennaro et al. [109] later adapted the algorithm to design an RSA-based undeniable signature scheme. We now describe these two schemes. Rigorous studies of these schemes can be found in the original papers. See also [53, 186, 187, 102, 202, 230].

The Chaum–Van Antwerpen undeniable signature scheme

For setting up the domain parameters for Chaum–Van Antwerpen (CvA) signatures, Bob chooses a (large) prime p of the form p = 2r + 1, where r is also a prime. (Such a prime p is called a safe prime (Definition 3.5).) Bob finds a random element g ∈ Z_p^* of multiplicative order r, selects a random integer d ∈ {2, . . . , r − 1} and computes y := g^d (mod p). Bob publishes (p, g, y) as his public key and keeps the integer d secret as his private key. The value d^{−1} (mod r) is needed during verification and can be precomputed and stored (secretly) along with d. We assume that we have a hash function H that maps messages (that is, bit strings) to elements of the subgroup of order r in Z_p^*. In order to generate a CvA signature on a message M, Bob carries out the steps given in Algorithm 5.56. Verification of Bob’s CvA signature by Alice involves the interaction given in Algorithm 5.57.

Algorithm 5.56. Chaum–Van Antwerpen undeniable signature generation

Input: The message M to be signed and the signer’s private key (p, d).

Output: The signature (M, s) on M.

Steps:

m := H(M).

s := m^d (mod p).

If (M, s) is a valid CvA signature, then

v ≡ (s^i y^j)^{d^{−1} (mod r)} ≡ ((m^d)^i (g^d)^j)^{d^{−1} (mod r)} ≡ m^i g^j ≡ v′ (mod p).

On the other hand, if smd (mod p), Bob can guess the element v′ with a probability of only 1/r, even under the assumption that Bob has unbounded computing resources. This means that unless the signature (M, s) is valid, it is extremely unlikely that Bob can make Alice accept the signature.

The denial protocol for the CvA scheme involves an interaction between the prover Bob and the verifier Alice, as given in Algorithm 5.58. In order to see how this denial protocol works, we note that Algorithm 5.58 essentially makes two calls of the verification protocol. First assume that Bob executes the protocol honestly, that is, Bob follows the steps as indicated. If the signature (M, s) is a valid one, the check v1 ≡ m^{i1} g^{j1} (mod p) (as well as the check v2 ≡ m^{i2} g^{j2} (mod p)) should succeed, and Alice’s decision to accept the signature as valid is justified. On the other hand, if (M, s) is a forged signature, that is, if s ≢ m^d (mod p), then the probability that each of these checks succeeds is 1/r, as discussed before. Thus, it is extremely unlikely that a forged signature is accepted as valid by Alice. So Alice eventually computes both w1 and w2 equal to s^{i1 i2 d^{−1} (mod r)} (mod p) and concludes the signature to be forged. Finally, suppose that Bob is intending to deny the (purported) signature (M, s). If Bob does not fully take part in the interaction, then his intention becomes clear. Otherwise, he sends v1 and/or v2 not computed according to the formulas specified. In that case, Bob succeeds in making Alice compute w1 = w2 with a probability of only 1/r. Thus, it is extremely unlikely that Bob, executing this protocol dishonestly, can successfully disavow a valid signature.

Algorithm 5.57. Chaum–Van Antwerpen undeniable signature verification

Input: A CvA signature (M, s) on a message M.

Output: Verification status of the signature.

Steps:

Alice computes m := H(M).

Alice chooses two secret random integers i, j ∈ {1, 2, . . . , r − 1}.

Alice computes u := s^i y^j (mod p).

Alice sends u to Bob.

Bob computes v := u^{d^{−1} (mod r)} (mod p).

Bob sends v to Alice.

Alice computes v′ := m^i g^j (mod p).

Alice accepts the signature (M, s) if and only if v = v′.
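The verification interaction of Algorithm 5.57 can be simulated within a single program. In the Python sketch below, the toy safe prime p = 23 (so r = 11), the generator g = 2 of the order-r subgroup and the subgroup element standing in for H(M) are illustrative assumptions:

```python
# Chaum-Van Antwerpen verification with toy parameters: p = 2r + 1 = 23,
# r = 11, g = 2 of order r.  m stands in for the hashed message H(M),
# taken inside the order-r subgroup as the scheme requires.
import random

p, r, g = 23, 11, 2
d = random.randrange(2, r)                 # Bob's private key
y = pow(g, d, p)                           # public key component g^d
d_inv = pow(d, -1, r)                      # precomputed d^(-1) mod r

m = pow(g, 3, p)                           # H(M): an element of the subgroup
s = pow(m, d, p)                           # Bob's undeniable signature on M

# interactive verification
i, j = random.randrange(1, r), random.randrange(1, r)   # Alice's secrets
u = pow(s, i, p) * pow(y, j, p) % p        # Alice -> Bob: challenge
v = pow(u, d_inv, p)                       # Bob -> Alice: response
assert v == pow(m, i, p) * pow(g, j, p) % p   # Alice's check v = m^i g^j
```

Because u is an element of the order-r subgroup, raising it to d^{−1} (mod r) exactly cancels the exponent d hidden in s and y, which is what Alice's final comparison exploits.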

Algorithm 5.58. Chaum–Van Antwerpen undeniable signature: denial protocol

Input: A (purported) CvA signature (M, s) of Bob on a message M.

Output: One of the following decisions by Alice:

  1. The signature is valid.

  2. The signature is forged.

  3. Bob is trying to deny the signature.

Steps:

Alice computes m := H(M).

Alice chooses two secret random integers i1, j1 ∈ {1, 2, . . . , r − 1}.

Alice computes u1 := s^{i1} y^{j1} (mod p) and sends u1 to Bob.

Bob computes v1 := u1^{d^{−1} (mod r)} (mod p) and sends v1 to Alice.

if (v1 ≡ m^{i1} g^{j1} (mod p)) {
   Alice accepts the signature (M, s) to be valid and quits the protocol.
}

Alice chooses two other secret random integers i2, j2 ∈ {1, 2, . . . , r − 1}.

Alice computes u2 := s^{i2} y^{j2} (mod p) and sends u2 to Bob.

Bob computes v2 := u2^{d^{−1} (mod r)} (mod p) and sends v2 to Alice.

if (v2 ≡ m^{i2} g^{j2} (mod p)) {
   Alice concludes the signature (M, s) to be valid and quits the protocol.
}

Alice computes w1 := (v1 g^{−j1})^{i2} (mod p) and w2 := (v2 g^{−j2})^{i1} (mod p).

if (w1 = w2) {
   Alice concludes that the signature is forged.
} else {
   Alice concludes that Bob is trying to deny the signature.
}

RSA-based undeniable signature scheme

Gennaro, Krawczyk and Rabin’s undeniable signature scheme (the GKR scheme) is based on the (intractability of the) RSA problem.

A GKR key pair differs from a usual RSA key pair. The signer chooses two (large) random primes p and q such that both p′ := (p − 1)/2 and q′ := (q − 1)/2 are also prime, and sets n := pq. Two integers e and d satisfying ed ≡ 1 (mod φ(n)) are then selected. Finally, one requires an element g ∈ Z_n^*, g ≠ 1, and y := g^d (mod n). The public key of the signer is the tuple (n, g, y), whereas the private key is the pair (e, d). It can be shown that g need not be a random element of Z_n^*. Choosing a (fixed) small value of g (for example, g = 2) does not affect the security of the GKR protocol, but makes certain operations (computing powers of g) efficient.

Algorithm 5.59. GKR RSA undeniable signature generation

Input: The message M to be signed and the signer’s private key (e, d).

Output: The signature (M, s) on M.

Steps:

m := H(M).     /* Hash the message M to an element m of Z_n^* */
s := m^d (mod n).

GKR signature generation (Algorithm 5.59) is the same as in RSA. The verification protocol described in Algorithm 5.60 accepts, in addition to a valid GKR signature (M, s), the signatures (M, αs), where α ∈ Z_n^* has multiplicative order 1 or 2 (there are four such values of α). In view of this, we define the subset

Sig_M := {α H(M)^d (mod n) | α ∈ Z_n^*, α^2 ≡ 1 (mod n)}

of Z_n^*. Any element of Sig_M is considered to be a valid signature on M. Since Bob knows p and q, he can easily find out all the elements α of Z_n^* of order ≤ 2 and can choose to output (M, αH(M)^d) as the GKR signature for any such α. Taking α = 1 (as in Algorithm 5.59) is the canonical choice, but during the execution of the denial protocol Bob will not be allowed to disavow other valid choices.

The interaction between the prover Bob and the verifier Alice during GKR signature verification is given in Algorithm 5.60. It is easy to see that if (M, s) is a valid GKR signature, then v = v′. On the other hand, if (M, s) is a forged signature, that is, if s ∉ Sig_M, then the equality v = v′ occurs with only a negligibly small probability, even in the case that the forger has unbounded computational resources.

Algorithm 5.60. GKR RSA undeniable signature verification

Input: A GKR signature (M, s) on a message M.

Output: Verification status of the signature.

Steps:

Alice computes m := H(M).

Alice chooses two random integers i and j.

Alice computes u := s^{2i} y^j (mod n).

Alice sends u to Bob.

Bob computes v := ue (mod n).

Bob sends v to Alice.

Alice computes v′ := m^{2i} g^j (mod n).

Alice accepts the signature (M, s) if and only if v = v′.
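The interaction of Algorithm 5.60 can likewise be simulated with toy parameters. Below, p = 23 and q = 47 (so that p′ = 11 and q′ = 23 are prime), n = 1081, e = 3 and d = 675 with ed ≡ 1 (mod φ(n)), and the small stand-in for H(M) are all illustrative choices:

```python
# GKR RSA undeniable signature verification with toy parameters:
# n = 23 * 47 = 1081, phi(n) = 1012, e = 3, d = 675 (3*675 = 2*1012 + 1).
import random

n, e, d = 1081, 3, 675
g = 2
y = pow(g, d, n)                           # public key component g^d

m = 5                                      # stand-in for H(M) in Z_n^*
s = pow(m, d, n)                           # Bob's signature m^d mod n

i, j = random.randrange(1, 50), random.randrange(1, 50)  # Alice's secrets
u = pow(s, 2 * i, n) * pow(y, j, n) % n    # Alice -> Bob: challenge
v = pow(u, e, n)                           # Bob -> Alice: response u^e
assert v == pow(m, 2 * i, n) * pow(g, j, n) % n   # Alice checks v = m^(2i) g^j
```

Note that replacing s by (n − 1)·s, i.e., by αs with α = −1 of order 2, leaves the challenge u unchanged because the exponent 2i kills the sign; this is why every element of Sig_M passes verification.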

Algorithm 5.61. GKR RSA undeniable signature: denial protocol

Input: A (purported) GKR signature (M, s) of Bob on a message M.

Output: One of the following decisions by Alice:

  1. The signature is forged.

  2. Bob is trying to deny the signature.

Steps:

Alice computes m := H(M).

Alice chooses a random i ∈ {1, 2, . . . , k} and a random j.

Alice computes w1 := m^i g^j (mod n) and w2 := s^i y^j (mod n).

Alice sends (w1, w2) to Bob.

Bob computes m := H(M).

Bob determines i′ ∈ {1, 2, . . . , k} such that the following congruence holds:

Equation 5.11

w2^e ≡ (s^e m^{−1})^{i′} w1 (mod n)
if (no such i′ is found) {    /* This may happen, if Alice has cheated */
   Bob aborts the protocol.
}
Bob sends i′ to Alice.
if (i = i′) {
   Alice concludes that the signature is forged.
}
else {
   Alice concludes that Bob is trying to deny the signature.
}

The denial protocol for the GKR scheme is described in Algorithm 5.61. This protocol is executed after verification by Algorithm 5.60 fails. In that case, Alice wants to ascertain whether the signature is actually invalid or Bob has denied his valid signature by incorrectly executing the verification protocol. A small integer k is predetermined for the denial protocol. The prover needs a running time proportional to k, whereas the probability of a successful denial of a valid signature decreases with k. Taking k = O(lg n) gives optimal performance.

In order to see how this protocol prevents Bob from denying a valid signature, first consider the case that (M, s) is a valid GKR signature of Bob. In that case, s ≡ αm^d (mod n) with ord_n α ≤ 2. On the other hand, s^e ≡ α^e m^(de) ≡ α^e m (mod n). Therefore, for every i′ ∈ {1, 2, . . . , k}, Congruence (5.11) is satisfied. Thus, Bob can only guess the secret value of i chosen by Alice and the guess is correct with a probability of 1/k. On the other hand, if (M, s) is a forged signature, Congruence (5.11) holds only for a single i′, that is, for i′ = i (Exercise 5.23). Sending i′ will then convince Alice that the signature is really forged. In both these cases, Congruence (5.11) holds for at least one i′. Failure to detect such an i′ implies that the value(s) of w1 and/or w2 have not been correctly sent by Alice. The protocol should then be aborted.

In order to reduce the probability of successful cheating, it is convenient to repeat the protocol a few times instead of increasing k. If k = 1024, Bob can successfully cheat in eight executions of the denial protocol with a probability of only 2^(−80).
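The figure 2^(−80) can be checked directly: each execution independently lets Bob guess i correctly with probability 1/k, so eight executions succeed together with probability (1/k)^8. A quick arithmetic check:

```python
from math import log2

# With k = 1024 = 2^10, eight independent executions of the denial
# protocol give Bob a cheating probability of (1/1024)^8 = 2^(-80).
k, executions = 1024, 8
cheat_prob = (1.0 / k) ** executions
print(log2(cheat_prob))   # prints -80.0
```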

5.4.12. Signcryption

The conventional way to ensure both authentication and confidentiality of a message is to sign the message first and then encrypt the signed message. Now that we have many signature and encryption algorithms in our bag, there is hardly any problem in achieving both goals simultaneously. Zheng proposes signcryption schemes that combine these two operations. A signcryption scheme is better than a sign-and-encrypt scheme in two respects. First, the combined primitive takes less running time than the composite primitive comprising signature generation followed by encryption. Second, a signcrypted message is of smaller size than a signed-and-encrypted message. When communication overheads need to be minimized, signcryption proves to be useful.

Before describing the signcryption primitive, let us first review the composite sign-and-encrypt scheme. Let M be the message to be sent. Alice the sender generates the signature appendix s on M using one of the signature schemes described earlier. This step can be described as s = fs(M, da), where da is the private key of Alice. Next a symmetric key k is generated by Alice. The message M is encrypted by a symmetric cipher (like DES) under the key k, that is, C := E(M, k). The key k is then encrypted using an asymmetric routine under the public-key eb of Bob the recipient, that is, c = fe(k, eb). The triple (C, c, s) is then transmitted to Bob.

Upon reception of (C, c, s) Bob first retrieves k using his private key db, that is, k = fd(c, db). The message M is then recovered by symmetric decryption: M = D(C, k). Finally, the authenticity of M is verified from the signature using the verification operation: fv(M, s, ea), where ea is the public key of Alice. Algorithm 5.62 describes the sign-and-encrypt operation and its inverse.

Algorithm 5.62. Sign-and-encrypt

s := fs(M, da).

Generate a random symmetric key k.

c := fe(k, eb).

C := E(M, k).

Send (C, c, s) to the recipient.

Decrypt-and-verify

k := fd(c, db).

M := D(C, k).

Verify the signature: fv(M, s, ea).
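The sign-and-encrypt flow of Algorithm 5.62 can be sketched as follows; textbook RSA stands in for the signature pair (fs, fv) and the asymmetric pair (fe, fd), and an XOR keystream stands in for the symmetric cipher E/D. All moduli and exponents are illustrative toy values.

```python
import hashlib, random

# Toy sketch of Algorithm 5.62 (sign-and-encrypt / decrypt-and-verify).
na, ea_, da = 3233, 17, 2753      # Alice's RSA signing keys (n = 61*53)
nb, eb_, db = 2537, 13, 937       # Bob's RSA encryption keys (n = 43*59)

def H(M):                          # hash value reduced into Z_na
    return int.from_bytes(hashlib.sha256(M).digest(), 'big') % na

def E(M, k):                       # toy symmetric cipher: XOR with keystream
    stream = hashlib.sha256(k.to_bytes(2, 'big')).digest()
    return bytes(a ^ b for a, b in zip(M, stream))

D = E                              # XOR is its own inverse

def sign_and_encrypt(M):
    s = pow(H(M), da, na)          # s := fs(M, da)
    k = random.randrange(2, nb)    # random symmetric key
    c = pow(k, eb_, nb)            # c := fe(k, eb)
    return E(M, k), c, s           # C := E(M, k)

def decrypt_and_verify(C, c, s):
    k = pow(c, db, nb)             # k := fd(c, db)
    M = D(C, k)                    # M := D(C, k)
    return M, pow(s, ea_, na) == H(M)   # verify fv(M, s, ea)

C, c, s = sign_and_encrypt(b"hi Bob")
M, ok = decrypt_and_verify(C, c, s)
print(M, ok)                       # prints b'hi Bob' True
```

Note that the triple (C, c, s) is what travels on the wire; signcryption, described next, removes the separate key ciphertext c.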

Zheng’s signcryption scheme combines fs and fe into a single operation fse and also fd and fv into another single operation fdv. Each of these combined operations essentially takes the time of a single public- or private-key operation and hence leads to a performance enhancement by a factor of nearly two. Moreover, the encrypted key c need not be sent with the message, that is, C and s are sufficient for both authentication and confidentiality. This reduces communication overhead.

Signcryption is based on shortened digital signature schemes. Table 5.3 describes the shortened versions of DSA (Section 5.4.6). We use the notations of Algorithms 5.43 and 5.44. Also, ‖ denotes concatenation of strings, and H is a hash function (like SHA-1). The shortened schemes have two advantages over the original DSA. First, a DSA signature is of length 2|r|, whereas an SDSA1 or SDSA2 signature has length |r| + |H(·)|. For the current version of the standard, both r and H(·) are of size 160 bits. However, one may use a potentially bigger r, and in that case the shortened schemes give smaller signatures with equivalent security. Second, DSA requires computing a modular inverse during verification, whereas SDSA does not. So verification is more efficient in the shortened schemes.

Table 5.3. Shortened digital signature algorithms

Name    Signature generation                 Signature verification
SDSA1   s := H(g^d′ (mod p) ‖ M).            w := (ea g^s)^t (mod p).
        t := d′ (s + d)^(−1) (mod r).        Verify if s = H(w ‖ M).
SDSA2   s := H(g^d′ (mod p) ‖ M).            w := (g ea^s)^t (mod p).
        t := d′ (1 + ds)^(−1) (mod r).       Verify if s = H(w ‖ M).
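A minimal SDSA1 sketch in Python may make the table concrete; the group parameters below (g = 2 of prime order r = 11 modulo p = 23) and the use of SHA-256 for H are illustrative assumptions only.

```python
import hashlib, random

# Toy sketch of SDSA1 from Table 5.3.
p, r, g = 23, 11, 2        # g has prime order r modulo p

def H(data):
    return int.from_bytes(hashlib.sha256(data).digest(), 'big')

d = 3                      # Alice's private key
ea = pow(g, d, p)          # Alice's public key

def sdsa1_sign(M):
    while True:
        d_eph = random.randrange(1, r)            # ephemeral key d'
        s = H(str(pow(g, d_eph, p)).encode() + M)
        if (s + d) % r == 0:                      # s + d must be invertible
            continue
        t = d_eph * pow(s + d, -1, r) % r
        return s, t

def sdsa1_verify(M, s, t):
    # w = (ea * g^s)^t = g^((d + s) * t) = g^d'  (exponents taken mod r)
    w = pow(ea * pow(g, s, p) % p, t, p)
    return s == H(str(w).encode() + M)

s, t = sdsa1_sign(b"msg")
print(sdsa1_verify(b"msg", s, t))   # prints True
```

Observe that verification recovers g^d′ without any modular inversion, which is exactly the efficiency advantage noted above.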

Algorithms 5.63 and 5.64 provide the details of the signcryption algorithm and its inverse, called unsigncryption. The algorithms use a keyed hash function KH. One may implement KH(x, k) as H(x ‖ k) using an unkeyed hash function H.

Signcryption differs from the shortened scheme in that eb^d′ (mod p) is used instead of g^d′ for the computation of s. The running time of the signcryption algorithm is dominated by this modular exponentiation. When signature and encryption are used separately, the encryption operation uses one (or more) additional exponentiations. So signcryption significantly improves upon the sign-and-encrypt scheme of Algorithm 5.62.

Algorithm 5.63. Signcryption

Input: Plaintext message M, the sender’s private key da, the recipient’s public key

eb = gdb (mod p).

Output: The signcrypted message (C, s, t).

Steps:

Select a random d′ ∈ {1, 2, . . . , r − 1}.
k := H(eb^d′ (mod p)).                /* Generate keys for both signing and encrypting. */
Write k := k1 ‖ k2 with |k2| equal to the length of a symmetric key.
s := KH(M ‖ N, k1).
                 /* Here N is the public key or the public key certificate of the sender. */
t := d′ (s + da)^(−1) (mod r).

C := E(M, k2).                                                          /* Symmetric encryption */

Algorithm 5.64. Unsigncryption

Input: The signcrypted message (C, s, t), the sender’s public key ea = gda (mod p) and the recipient’s private key db.

Output: The plaintext message M and the verification status of the signature.

Steps:

k := H((ea g^s)^(t·db) (mod p)).                /* Key recovery */

Write k := k1 ‖ k2 with |k2| equal to the length of a symmetric key.

M := D(C, k2)./* Symmetric decryption */

if (KH(M ‖ N, k1) = s) { Return “Signature verified”. }

else { Return “Signature not verified”. }
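The two algorithms can be sketched end to end in Python; the group parameters, the SHA-256-based KH, the XOR “cipher” and the use of Alice’s public key as the stand-in for her certificate N are all illustrative assumptions.

```python
import hashlib, random

# Toy sketch of Zheng-style signcryption (Algorithms 5.63 and 5.64),
# built on the shortened scheme SDSA1.
p, r, g = 23, 11, 2                # g has prime order r modulo p
da = 3;  ea = pow(g, da, p)        # Alice (sender)
db = 4;  eb = pow(g, db, p)        # Bob (recipient)
N = str(ea).encode()               # stand-in for Alice's certificate

def KH(x, key):                    # keyed hash: KH(x, k) = H(x || k)
    return int.from_bytes(hashlib.sha256(x + key).digest(), 'big')

def E(M, k2):                      # toy XOR "cipher"
    return bytes(a ^ b for a, b in zip(M, k2))
D = E

def signcrypt(M):
    while True:
        d_eph = random.randrange(1, r)                       # d'
        k = hashlib.sha256(str(pow(eb, d_eph, p)).encode()).digest()
        k1, k2 = k[:16], k[16:]
        s = KH(M + N, k1)
        if (s + da) % r == 0:
            continue
        t = d_eph * pow(s + da, -1, r) % r
        return E(M, k2), s, t

def unsigncrypt(C, s, t):
    w = pow(ea * pow(g, s, p) % p, t, p)    # recovers g^d'
    k = hashlib.sha256(str(pow(w, db, p)).encode()).digest()  # = eb^d'
    k1, k2 = k[:16], k[16:]
    M = D(C, k2)
    return M, KH(M + N, k1) == s

C, s, t = signcrypt(b"hello")
M, ok = unsigncrypt(C, s, t)
print(M, ok)                       # prints b'hello' True
```

The key point is the recovery step: (ea g^s)^(t·db) ≡ eb^d′ (mod p), so Bob re-derives k without ever seeing an encrypted key c.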

The most time-consuming part of unsigncryption is the computation of two modular exponentiations. DSA verification too has this property. However, an additional decryption in the decrypt-and-verify scheme of Algorithm 5.62 calls for one (or more) exponentiations, making it slower than unsigncryption.

Exercise Set 5.4

5.15
  1. Show how first pre-image resistance of the hash function H plays an important role for RSA signatures (with appendix) described in Section 5.4.1. More precisely, show that if it is easy to find a pre-image of any hash value, it is easy to generate a valid signature (M, s) from two valid signatures (M1, s1) and (M2, s2) with M ∉ {M1, M2}. This is often referred to as existential forgery of a signature. [H]

  2. Describe how existential forgery is possible for the Rabin signature scheme. [H]

  3. Describe how existential forgery is possible for the ElGamal signature scheme. [H]

5.16Assume that Bob uses the same RSA key pair ((n, e), d) for receiving encrypted messages and for signing. Suppose that Carol intercepts the ciphertext c ≡ m^e (mod n) sent by Alice. Also suppose that Bob is willing to sign any random message presented by Carol. Explain how Carol can choose a message to be signed by Bob in order to retrieve the secret m. [H]
5.17Let G be a finite cyclic group of order n, and g a generator of G. Suppose that Alice’s private and public keys are respectively d and gd.
  1. Consider a variant of the ElGamal signature scheme, in which s is computed as in Algorithm 5.36, but the roles of d and d′ are interchanged in the generation of t, that is, the modified signature (s, t′) on M is generated as:

    s := g^d′,
    t′ := d^(−1)[H(M) − d′H(s)] (mod n).

    Write the verification routine for the modified scheme.

  2. Show that forging modified ElGamal signatures is as difficult as computing discrete logarithms in G. You may assume that a forger can arrange d′ of her choice.

  3. Explain why signature generation is (a bit) more efficient in the modified scheme. Suppose that because of this enhanced performance Alice decided to switch to the modified scheme, but for backward compatibility she maintained both the original signature (s, t) and the modified signature (s, t′) on a message M. What went wrong?

5.18Show that:
  1. There are two valid ECDSA signatures on each message.

  2. There are three valid XTR–DSA signatures on each message.

(Here we call a signature valid if it passes the verification routine.)

5.19
  1. Write the versions with message recovery of the RSA, Rabin, Schnorr and Nyberg–Rueppel signature schemes.

  2. Describe the possibilities of existential forgery for these versions. (Since hash functions cannot be inverted, they are not used for signature schemes with message recovery, and so the problem of existential forgery is more acute in this case. To avoid such forgeries the signer should add some redundancy to each message block before signing the same. An existentially forged signature is likely to correspond to a message not containing the redundancy.)

5.20Design the XTR version of the Nyberg–Rueppel signature scheme with appendix (Section 5.4.5). What are the speed-ups achieved by the signature generation and verification routines of the XTR version over the original NR routines?
5.21Repeat Exercise 5.20 with the Schnorr digital signature scheme (Section 5.4.4).
5.22
  1. Deduce that the determinant of the matrix Mc of Equation (5.9) is

  2. Demonstrate that

5.23Let p, q, p′, q′ be distinct odd primes with p = 2p′ + 1 and q = 2q′ + 1, and let n := pq (as in the RSA-based undeniable signature scheme).
  1. Let α ∈ Z_n^*. Show that ord_n α divides 2p′q′. [H]

  2. Argue that there are exactly four elements in Z_n^* of order ≤ 2.

  3. Let α ∈ Z_n^* with α ≢ ±1 (mod n) and ord_n α < p′q′. Show that gcd(α – 1, n) or gcd(α + 1, n) is a non-trivial divisor of n. How many such elements α does Z_n^* contain?

  4. Let α ∈ Z_n^* have order p′q′ or 2p′q′. Show that α^i ≢ 1 (mod n) for every i with 1 ≤ i < p′q′.

  5. Look at the denial protocol for the GKR RSA signature scheme (Algorithm 5.61) and assume that p′ < q′. Suppose that (M, s) is a forged signature (that is, s ∉ Sig M) on some message M with m := H(M) ∈ Z_n^*. Show that s ≡ αm^d (mod n) for some α ∈ Z_n^* with ord_n α ≥ p′. Deduce that ord_n(m^(−1) s^e) ≥ p′. Conclude that if 4k < p′, then there exists a unique i′ ∈ {1, 2, . . . , k} (namely, i′ = i) for which Congruence (5.11) holds.

5.24
  1. Write the shortened versions of ECDSA signature generation and verification.

  2. Write the signcryption and unsigncryption algorithms based on shortened ECDSA.

5.5. Entity Authentication

Entity authentication (also called identification) is a process by means of which an entity Alice, called the claimant, proves her identity to another entity Bob, called the verifier. Alice is assumed to possess some secret piece(s) of information that no intruder is expected to know. During the execution of the identification protocol, an interaction takes place between Alice and Bob. If the interaction allows Bob to conclude (deterministically or with high probability) that the claimant possesses the secret knowledge, he accepts the claimant as Alice. An intruder Carol lacking the secret information is expected (with high probability) to fail to convince Bob of her identity as Alice. This is how entity authentication schemes tend to prevent impersonation attacks by intruders. Typically, identification schemes are used to protect access to some sensitive piece(s) of data, like a user’s (or a group’s) private files in a computer or an account in a bank. Both secret-key and public-key techniques are used for the realization of entity authentication protocols.

5.5.1. Passwords

A password is a small string to be remembered by an entity and produced verbatim to the verifier at the time of identification. The most common example is a computer password used to protect access to a user’s private working area in a file system. In this case, an alphanumeric string (or a string that can be input using a computer keyboard) of length between 4 and 20 characters is normally used as the secret information associated with an entity. Passwords are also used to prevent misuse of certain physical objects (like an ATM card for withdrawing cash from one’s bank account, a prepaid telephone card) by anybody other than the legitimate owners of the objects. In this case, a password usually consists of a sequence of four to ten digits and is also called a personal identification number or a PIN.

In order that Bob can recognize an entity from her password, a possibility for Bob is to store the (entity, password) pairs corresponding to all the entities that are expected to participate in identification interactions with Bob. When Alice enters her password, Bob checks if Alice’s input is the same as what he stores in the pair for Alice. The file(s) storing these private records should be preserved with high secrecy, and neither read nor write access should be granted to any user. But a privileged user (the superuser) is usually given the capability to inspect any file (even read-protected ones) and can, therefore, misuse the passwords.

This problem can be avoided by storing, instead of the passwords themselves, a one-way transform of the passwords.[3] When Alice enters a password P, Bob computes the transform f(P) and compares f(P) with the record stored for Alice. The identity of Alice is accepted if and only if a match occurs. The password file now need not be read-protected, since any intruder (even the superuser) knowing the value f(P) cannot easily compute P.

[3] Informally speaking, a one-way function is one which is computationally infeasible to invert.

Passwords should be chosen from a space large enough to preclude exhaustive search by an intruder in feasible time. Unfortunately, however, it is a common tendency for human users to choose passwords from limited subsets of the allowed space. For example, use of lower case characters, dictionary words, popular names, birth dates and so on in passwords makes attacks on passwords much easier. A strategy to foil such dictionary-based attacks is to use a pseudorandom bit sequence S known as the salt and apply the one-way function f to a combination of the password P and the salt S. That is, a function f(P, S) is now stored against an entity Alice having a password P. The combination (P, S) is often referred to as a key for the password scheme. Since a password now corresponds to many possible keys, the search space for an intruder increases dramatically. For instance, if S is a pseudorandomly chosen bit string of length 64, the intruder has to compute f(P, S) up to 2^64 times in order to guess the correct candidate for S for each P under trial. It is also necessary that the same key is not chosen for two different entities. If the salt S is a 64-bit string, then by the birthday paradox a collision between two keys is expected to occur only after (at least) 2^32 keys are generated.

A second strategy to strengthen the protection of passwords is to increase the so-called iteration count n, that is, instead of storing f(P, S) for each password P, Bob now stores the n-fold iterate f^n(P, S). An n-fold application of the function f increases by a factor of n both the time for password verification and the time for exhaustive search by an intruder. For a legitimate user, this is not really a nuisance, since computing f^n(P, S) only once during identification is tolerable (and may even be unnoticeable), whereas to an intruder breaking a password simply becomes n times as difficult. In typical applications, values of n ≥ 1000 are recommended.
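Both strategies, salting and iteration, are available in Python’s standard library via PBKDF2; the sketch below is illustrative (the salt length and iteration count are example choices, not recommendations).

```python
import hashlib, os

# Salted, iterated password hashing: pbkdf2_hmac applies the underlying
# hash n times to the (password, salt) combination, in the spirit of
# storing f^n(P, S) instead of P.
def make_record(password: bytes, n: int = 100_000):
    salt = os.urandom(8)                       # 64-bit random salt S
    key = hashlib.pbkdf2_hmac('sha256', password, salt, n)
    return salt, key                           # store (S, f^n(P, S))

def check(password: bytes, salt: bytes, key: bytes, n: int = 100_000):
    return hashlib.pbkdf2_hmac('sha256', password, salt, n) == key

salt, key = make_record(b"correct horse")
print(check(b"correct horse", salt, key))      # prints True
print(check(b"wrong guess", salt, key))        # prints False
```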

In some situations, it is advisable to lock access to a password-protected area after a predetermined number of (say, three) wrong passwords have been input in succession. This is typically the case with PINs for which the search space is rather small. For unlocking the access (to the legitimate user Alice), a second longer key (again known only to Alice) is used or human intervention is called for.

As a case study, let us briefly describe the password scheme used by the UNIX operating system. During the creation of a password, a user supplies a string P of eight 7-bit ASCII characters as the password. (Longer strings are truncated to the first eight characters.) A 56-bit DES[4] key K is constructed from P. A 12-bit random salt S is obtained from the system clock at the time of the creation of the password. The zero message (that is, a block of 64 zero bits) is then iteratively encrypted n = 25 times using K as the key. The encryption algorithm is a variant of DES that depends on the salt S. The output ciphertext and the salt (which account for a total of 64 + 12 = 76 bits) are then packed into eleven 7-bit ASCII characters and stored in the password file (usually /etc/passwd). When UNIX was designed (in 1970), this algorithm, often referred to as the UNIX crypt password algorithm, was considered to be reasonably safe under the assumption of the difficulty of finding a DES key from a plaintext–ciphertext pair. With today’s hardware and software speed, a motivated attacker can break UNIX passwords in very little time.

[4] The data encryption standard (DES) is a well-known symmetric-key cipher (Section A.2.1).

Password-based authentication schemes suffer from the disadvantage that the user has to disclose her secret P to the verifier. The verifier may misuse the knowledge of P by storing it secretly and deploying it afterwards. During the computation of f^n(P, S), the string P resides in the machine’s memory. An eavesdropper capable of monitoring the temporary storage holding the string P easily gets its value. In view of these shortcomings, password schemes are referred to as weak authentication schemes.

5.5.2. Challenge–Response Algorithms

In a strong authentication scheme, the claimant proves the possession of a secret knowledge to a verifier without disclosing the secret to the verifier. One of the communicating entities generates a random bit string c known as the challenge and sends c (or a function of c) to the other. The latter then reacts to the challenge appropriately, for example, by sending a response string r to the former. Strong authentication schemes are, therefore, also called challenge–response authentication schemes. The communication between the entities depends both on the random challenge and on the secret knowledge of the claimant. An intruder lacking the secret knowledge of a valid claimant cannot take part properly in the interaction. Furthermore, since a random challenge is used during each invocation of the identification protocol, an eavesdropper cannot use the intercepted transcripts of a particular session for a future invocation of the protocol.

Public-key protocols can be used to realize challenge–response schemes. We assume that Alice is the claimant and Bob is the verifier. Without committing to specific algorithms, we denote the public and private keys of Alice by e and d, and the encryption and decryption transforms by fe and fd respectively. Alice proves her identity by demonstrating her knowledge of d (but without revealing d) to Bob. Bob uses the transform fe and Alice the transform fd under the respective keys e and d. If a key d′ other than d is used by Carol in conjunction with e, some step of the interaction detects this and the protocol rejects Carol’s claim to be Alice. We describe two challenge–response schemes that differ in the sequence of applying the transforms fe and fd.

A challenge–response scheme based on encryption–decryption

In this scheme, Bob (the verifier) first generates a random string r, encrypts the same by the public key of Alice (the claimant) and sends the ciphertext c (the challenge) to Alice. Alice uses her private key to decrypt c to the message r′ and sends r′ (the response) back to Bob. Identification of Alice succeeds if and only if r = r′. Algorithm 5.65 illustrates the details of this scheme. It employs a one-way function H (like a hash function) for a reason explained later. This scheme checks whether the claimant can recover the random string r correctly. A knowledge of the decryption key d is needed for that.

Algorithm 5.65. Challenge–response authentication based on encryption

Bob generates a random bit string r and computes w := H(r).

Bob reads Alice’s (authentic) public key e and computes c := fe(r, e).

Bob sends (w, c) to Alice.

Alice computes r′ := fd(c, d).

if (H(r′) ≠ w) { Alice quits the protocol. }

Alice sends r′ to Bob.

Bob identifies Alice if and only if r′ = r.
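A single run of Algorithm 5.65 can be sketched with textbook RSA standing in for (fe, fd) and SHA-256 as the one-way function H; the key pair below is an illustrative toy.

```python
import hashlib, random

# Toy sketch of challenge-response authentication based on encryption
# (Algorithm 5.65).
n, e, d = 3233, 17, 2753          # Alice's toy RSA key pair (n = 61*53)

def H(x: int) -> bytes:
    return hashlib.sha256(str(x).encode()).digest()

# Bob's side: pick r, form witness w = H(r) and challenge c = fe(r, e).
r = random.randrange(2, n)
w, c = H(r), pow(r, e, n)

# Alice's side: decrypt the challenge and check Bob's witness first.
r_prime = pow(c, d, n)
assert H(r_prime) == w            # otherwise Alice quits the protocol

# Alice sends r_prime; Bob identifies Alice iff r_prime == r.
print(r_prime == r)               # prints True
```

The witness check is the step that protects Alice from decrypting an arbitrary ciphertext on a cheater’s behalf.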

The string H(r) = w is called the witness. By sending w to Alice, Bob convinces her of his knowledge about the secret r without disclosing r itself. If Bob (or a third party pretending to be Bob) tries to cheat, Alice has the option to abort the protocol prematurely. In other words, Alice does not have to decrypt an arbitrary ciphertext presented by Bob without confirming that Bob knows the corresponding plaintext.

A challenge–response scheme based on digital signatures

In the scheme explained in Algorithm 5.66, Alice (the claimant) first does the private key operation, that is, Alice sends her digital signature on a message to Bob (the verifier). Bob then verifies the signature of Alice by employing the encryption transform with Alice’s public key.

Algorithm 5.66. Challenge–response authentication based on signature

Bob selects a random string rB.

Bob sends rB to Alice.

Alice selects a random string rA.

Alice generates the signature s := fd(rA ‖ rB, d).

Alice sends (rA, s) to Bob.

Bob reads Alice’s (authentic) public key e.

Bob retrieves the strings r′A and r′B satisfying fe(s, e) = r′A ‖ r′B.

Bob identifies Alice if and only if r′A = rA and r′B = rB.

This authentication scheme is based on the assumption that only a person knowing Alice’s private key d can generate a signature s that leads to the equalities r′A = rA and r′B = rB. Using only rA and the signature s = fd(rA, d) would demonstrate to Bob that Alice possesses the requisite knowledge of d. The random string rB is used to prevent the so-called replay attack. If rB were not used, an eavesdropper Carol intercepting the transcripts of a session can later claim her identity as Alice by simply supplying rA and Alice’s signature on rA to Bob. Using a new rB in every session (and incorporating it in the signature) guarantees that the signature varies in different sessions, even when rA remains the same.

There is an alternative strategy by which the use of the random string rB can be avoided. All we have to ensure is that a value of rA used once cannot be reused in a subsequent session. This can be achieved by using a timestamp, which is a string reflecting the time when a certain event occurs (in our case, when Alice generates the signature). Thus, if Alice gets the local time tA, computes the signature s := fd(tA, d) and sends (tA, s) to Bob, it is sufficient for Bob to check that the timestamp tA is valid. A possible criterion for the validity of Alice’s timestamp tA is that the difference between tA and the time when Bob is verifying the signature is within an allowed bound (predetermined, based on the approximate time for the communication). But it may be possible for an adversary to provide to Bob the timestamp tA and Alice’s signature on tA, before tA expires. Therefore, Bob should additionally ensure that timestamps from Alice come in a strictly ascending order. Maintaining the timestamp for the last interaction with Alice takes care of this requirement. Algorithm 5.67 describes the modified version of Algorithm 5.66, based on timestamps. A problem with timestamps is that (local) clocks across a network have to be properly synchronized.

Algorithm 5.67. Using timestamp in challenge–response authentication

Alice reads the local time tA.

Alice generates the signature s := fd(tA, d).

Alice sends (tA, s) to Bob.

Bob reads Alice’s (authentic) public key e.

Bob retrieves the timestamp t′A := fe(s, e).

Bob identifies Alice if and only if t′A = tA and this timestamp is valid.

Mutual authentication

So far, we have described identification schemes that are unidirectional or unilateral in the sense that only Alice tries to prove her identity to Bob. For mutual authentication between Alice and Bob, the above schemes can be used a second time by reversing the roles of Alice and Bob. Algorithm 5.68 describes an alternative strategy that achieves mutual authentication with reduced communication overhead (compared to two invocations of the unidirectional scheme). Now, the key pairs (eA, dA) and (eB, dB) and the transforms fe,A, fd,A and fe,B, fd,B of both Alice and Bob should be used.

5.5.3. Zero-Knowledge Protocols

The challenge–response schemes described above ensure that the claimant’s secret is not made available to the verifier (or a listener to the communication between the verifier and the claimant). But the claimant uses her private key for generating the response and, therefore, it continues to remain possible that a verifier extracts some partial information on the secret by choosing challenges strategically.

Algorithm 5.68. Mutual authentication

Bob selects a random string rB.

Bob sends rB to Alice.

Alice selects a random string rA.

Alice generates the signature sA := fd,A(rA ‖ rB, dA).

Alice sends (rA, sA) to Bob.

Bob reads Alice’s (authentic) public key eA.

Bob retrieves the strings r′A and r′B satisfying fe,A(sA, eA) = r′A ‖ r′B.

Bob identifies Alice if and only if r′A = rA and r′B = rB.

Bob generates the signature sB := fd,B(rB ‖ rA, dB).

Bob sends sB to Alice.

Alice reads Bob’s (authentic) public key eB.

Alice retrieves the strings r′B and r′A satisfying fe,B(sB, eB) = r′B ‖ r′A.

Alice identifies Bob if and only if r′B = rB and r′A = rA.

Using a zero-knowledge (ZK) protocol overcomes this difficulty in the sense that (absolutely) no information on the claimant’s secret is leaked out during the conversation between the claimant and the verifier. The verifier (or a listener) continues to remain as much ignorant of the secret as he was before the invocation of the protocol. In other words, the verifier (or a listener) does not learn anything from the conversation that he could not learn by himself in the absence of the claimant. The only thing the verifier gains is the confidence whether the claimant actually knows the secret or not. This is intuitively the defining feature of a ZK protocol.

Similar to other public-key techniques, the security of the ZK protocols is based on the intractability of some difficult computational problems. A repeated use of a public-key scheme with a given set of parameters may degrade the security of the scheme under those parameters. For example, each encryption of a message (or each generation of a signature) makes available a plaintext–ciphertext pair which may eventually help a cryptanalyst. A ZK protocol, on the other hand, does not lead to such a degradation of the security of the protocol, irrespective of how many times it is invoked.

We stick to the usual scenario: Alice is the claimant, Bob is the verifier and Carol is an eavesdropper trying to impersonate Alice. In the jargon of ZK protocols, Alice (and not Bob) is called the prover. In order to avoid confusion, we continue to use the terms claimant and verifier. A ZK protocol is usually a three-pass interactive protocol. To start with, Alice chooses a random commitment and sends a witness of the commitment to Bob. A new commitment should be selected by Alice during each invocation of the protocol in order to guard against an adversarial verifier. Upon receiving the witness, Bob chooses and sends a random challenge to Alice. Finally, Alice replies by sending a response to the challenge. If Alice knows the secret (and performs the protocol steps correctly), her response can be easily proved by Bob to be valid. Carol, in an attempt to impersonate Alice without knowing the secret, can produce the valid response only with a probability P bounded away from 1. If P happens not to be negligibly small, then the protocol can be repeated a sufficient number of times, so that Carol’s probability of giving the correct response on all occasions becomes extremely low.

The parameters and the secrets for a ZK protocol can be set privately by each claimant. Another alternative is that a trusted third party (TTP) generates a set of parameters and makes these parameters available for use by every claimant over a network. A second duty of the TTP is to register a secret against each entity. The secret may be generated either by the TTP or by the respective entity. The knowledge of this (registered) secret by an entity is equivalent to her identity in the network. Finally, the authenticity of the public key of an entity is ensured by the digital signature of the TTP on the public key. For simplicity, however, we will not bother about the existence of the TTP and the way in which the secret (the possession of which by Alice is to be proved) has been created and/or handed over to Alice. We will also assume that each entity’s public key is authentic.

The Feige–Fiat–Shamir (FFS) protocol

The FFS protocol (Algorithm 5.69) is based on the intractability of computing square roots modulo a composite integer n. We take n = pq with two distinct primes p and q each congruent to 3 modulo 4.

Algorithm 5.69. Feige–Fiat–Shamir zero-knowledge protocol

Selection of domain parameters:

Select two large distinct primes p and q each congruent to 3 modulo 4.

n := pq.

Select a small integer t./* The probability of a successful cheat is 2^(−t) */

Selection of Alice’s secret:

Alice selects t random integers x1, . . . , xt ∈ Z_n^*.

Alice selects t random bits b1, . . . , bt.

Alice computes yi := (−1)^bi xi^(−2) (mod n) for i = 1, . . . , t.

Alice makes (y1, . . . , yt) public and keeps (x1, . . . , xt) secret.

The protocol:

Alice randomly chooses c ∈ Z_n^* and γ ∈ {0, 1}./* Commitment */
Alice computes and sends to Bob w := (−1)^γ c^2 (mod n)./* Witness */
Bob randomly chooses and sends to Alice bits ∊1, . . . , ∊t ∈ {0, 1}./* Challenge */
Alice computes and sends to Bob r := c x1^∊1 · · · xt^∊t (mod n)./* Response */

Bob computes w′ := r^2 y1^∊1 · · · yt^∊t (mod n).

Bob accepts Alice’s identity if and only if w′ ≠ 0 and w′ ≡ ±w (mod n).

It is clear from Algorithm 5.69 that knowing the secret (x1, . . . , xt) allows Alice to let Bob accept her identity (as Alice). The check w′ ≠ 0 in the last line is necessary to preclude the commitment c = 0, that makes any claimant succeed irrespective of the availability of the knowledge of the secret.

Now, let us see how an opponent (Carol), without knowing the secret, can succeed in impersonating Alice by taking part in this protocol. To start with, we consider the simple case t = 1 (which corresponds to Fiat and Shamir’s original scheme). Carol can start the process by generating a random c and γ and computing w = (−1)^γ c^2. Now, Carol should send the response c or cx1 depending on whether Bob sends ∊1 = 0 or 1. Her capability of sending both correctly is equivalent to her knowledge of x1. If Bob sends ∊1 = 0, then she can provide the correct response c. Otherwise, Carol can at best select a random response from Z_n^*, and the probability that this is correct is overwhelmingly low. On the other hand, let Carol choose a random c and γ and send the (improper) witness w := (−1)^γ c^2 y1 (mod n). In that case, Carol can answer the valid response r = c, if Bob’s challenge is ∊1 = 1. Sending the correct response to the challenge ∊1 = 0 now requires knowledge of x1. Therefore, if ∊1 is randomly chosen by Bob (without the prior knowledge of Carol), Carol can successfully respond with probability (very close to) 1/2. For t ≥ 1, this probability of a cheat by Carol can be easily shown to be (very close to) 1/2^t, which is negligibly small for t ≥ 80.

In practice, however, t is chosen to be O(ln ln n). It is, therefore, necessary to repeat the protocol t′ times, so that the probability of a successful cheat becomes (nearly) 1/2^tt′. Taking t′ = Θ(ln n) is recommended. It can be shown that these choices for t and t′ offer the FFS protocol the desired ZK property. Without going into a proof of this assertion, let us informally explain the ZK property of the FFS protocol. Neither Bob nor a listener to the conversation between Alice and Bob can get any idea of the secret (x1, . . . , xt). Bob gets as a response the product of c and those xi’s for which ∊i = 1. Since c is randomly chosen by Alice and is not available to Bob, there is no way to choose a strategic challenge. However, if the square root of w (or −w) can be computed by Bob, then the interaction may give away partial information on the secret. For example, if Bob chooses the challenge (∊1, ∊2, . . . , ∊t) = (1, 0, . . . , 0), then Alice’s response would be cx1, from which x1 can be computed by Bob, if he knows c. Thus, the security and the ZK property of the FFS protocol are based on the assumption that computing square roots modulo n is an infeasible computational problem.
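
To make the round structure concrete, here is a minimal Python sketch of one round of the FFS protocol with toy parameters. The tiny modulus, the seeds, the function names and the key setup yi = (−1)^bi xi^−2 (mod n) are our illustrative choices, a sketch rather than a normative implementation:

```python
import random
from math import gcd

# Toy sketch of one round of Feige-Fiat-Shamir identification.
# Assumes the key setup y_i = (-1)^b_i * x_i^(-2) (mod n); all
# parameters below are far too small for real use.

def ffs_keygen(n, t, rng):
    xs, ys = [], []
    for _ in range(t):
        x = rng.randrange(2, n)
        while gcd(x, n) != 1:           # secrets must be units mod n
            x = rng.randrange(2, n)
        b = rng.randrange(2)
        xs.append(x)
        ys.append((-1) ** b * pow(x, -2, n) % n)
    return xs, ys

def ffs_round(n, xs, ys, rng):
    t = len(xs)
    c = rng.randrange(2, n)                      # commitment c
    gamma = rng.randrange(2)
    w = (-1) ** gamma * pow(c, 2, n) % n         # witness w = (-1)^gamma c^2
    eps = [rng.randrange(2) for _ in range(t)]   # Bob's challenge bits
    r = c
    for i in range(t):                           # response r = c * prod x_i^eps_i
        if eps[i]:
            r = r * xs[i] % n
    wp = pow(r, 2, n)                            # Bob: w' = r^2 * prod y_i^eps_i
    for i in range(t):
        if eps[i]:
            wp = wp * ys[i] % n
    return wp != 0 and (wp == w or wp == (n - w) % n)
```

An honest prover always passes, while a claimant with wrong secrets survives a round essentially only when all challenge bits are zero.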

The Guillou–Quisquater (GQ) protocol

The GQ identification protocol is based on the intractability of the RSA problem. The correctness of Algorithm 5.70 (for a legitimate claimant) is easy to establish. The check w′ ≠ 0 is necessary to avoid the commitment c = 0, which makes any claimant always succeed.

A TTP typically selects the domain parameters p, q, n, e and d. It also selects m and gives s to Alice without revealing d. The execution of the protocol does not require the use of the decryption exponent d. In fact, d is a global secret, whereas s is Alice’s personal secret. Alice tries to prove the knowledge of s (and not of d).

In the GQ algorithm, the power s is blinded by multiplying it with the random commitment c. As a witness for c, Alice presents its encrypted version w. Under the assumption that RSA decryption without the knowledge of the decryption exponent d is infeasible, Bob (or an eavesdropper) cannot compute c and hence cannot separate out the value of s. Thus, no partial information on s is provided. Furthermore, each invocation requires a random ∊. In order to compute a strategic witness, Carol can at best have a guess of ∊. The guess is correct with a probability of 1/e. If e is reasonably large, the probability of a successful cheat is low. However, larger values of e lead to more expensive generation of the witness from the commitment (and also of the response). So small values of e (say, 2^16 + 1 = 65,537) are usually recommended. In that case, repeating the protocol a suitable number of times makes Carol’s chance of cheating as small as one desires. Taking t′e (where t′ is the number of iterations of the protocol) of the order of (log n)^α for some constant α gives the GQ protocol the desired zero-knowledge property.

Algorithm 5.70. Guillou–Quisquater zero-knowledge protocol

Selection of domain parameters:

Select two distinct large primes p and q and set the modulus n := pq.

Select an exponent e with gcd(e, φ(n)) = 1 and compute d := e^−1 (mod φ(n)).

The pair (n, e) is made public and d is kept secret.

Selection of Alice’s secret:

Alice selects a random m ∈ ℤ_n* and computes s := m^d (mod n).

Alice makes m public and keeps s secret.

The protocol:

Alice selects a random c ∈ ℤ_n*./* Commitment */
Alice computes and sends to Bob w := c^e (mod n)./* Witness */
Bob selects and sends to Alice a random ∊ ∈ {0, 1, . . . , e − 1}./* Challenge */
Alice computes and sends to Bob r := cs^∊ (mod n)./* Response */

Bob computes w′ := m^−∊ r^e (mod n).

Bob accepts Alice’s identity if and only if w′ ≠ 0 and w′ = w.
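
The algebra behind the GQ round can be checked with a small Python sketch. The tiny parameters and function names are our own illustrative choices, and the verification step assumes the reading w′ = m^−∊ r^e (mod n):

```python
import random
from math import gcd

# Toy sketch of one round of the Guillou-Quisquater protocol.
# Parameters are tiny and purely illustrative.

def gq_setup(p, q, e):
    n, phi = p * q, (p - 1) * (q - 1)
    assert gcd(e, phi) == 1
    return n, pow(e, -1, phi)        # d = e^(-1) mod phi(n)

def gq_round(n, e, m, s, rng):
    c = rng.randrange(2, n)          # Alice's commitment
    w = pow(c, e, n)                 # witness w = c^e
    eps = rng.randrange(e)           # Bob's challenge in {0, ..., e-1}
    r = c * pow(s, eps, n) % n       # response r = c * s^eps
    wp = pow(m, -eps, n) * pow(r, e, n) % n   # Bob: w' = m^(-eps) * r^e
    return wp != 0 and wp == w
```

With s = m^d (mod n), an honest run always verifies, since m^−∊ r^e = m^−∊ c^e m^d∊e = c^e = w; a claimant with a wrong s survives essentially only when the challenge is ∊ = 0.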

The Schnorr protocol

The Schnorr protocol is based on the intractability of computing discrete logarithms in a large prime field ℤ_p. We assume that a suitably large prime divisor q of p − 1 and an element g ∈ ℤ_p* of multiplicative order q are known. The algorithm works in the subgroup of ℤ_p* generated by g. In order to make the known algorithms for solving the DLP infeasible for the field ℤ_p, one should have q > 2^160.

Algorithm 5.71. Schnorr zero-knowledge protocol

Selection of domain parameters:

Select a large prime p such that p – 1 has a large prime divisor q.

Select an element g ∈ ℤ_p* having multiplicative order q modulo p.

Publish (p, q, g).


Select a small integer t < lg q.           /* The probability of a successful cheat is 2^−t */

Selection of Alice’s secret:

Alice chooses a random secret integer d ∈ {0, 1, . . . , q − 1}.

Alice computes and makes public the integer y := g^d (mod p).

The protocol:

Alice chooses a random c ∈ {0, 1, . . . , q − 1}./* Commitment */
Alice computes and sends to Bob w := g^c (mod p)./* Witness */
Bob selects and sends to Alice a random ∊ ∈ {0, 1, . . . , 2^t − 1}./* Challenge */
Alice computes and sends to Bob r := d∊ + c (mod q)./* Response */

Bob computes w′ := g^r y^−∊ (mod p).

Bob accepts Alice’s identity if and only if w′ = w.

We leave the analysis of correctness and security of this protocol to the reader. The secret d is masked from Bob and other eavesdroppers by introducing the random additive bias c modulo q. The probability of a successful cheat by an adversary is 2^−t, since ∊ is chosen randomly from a set of cardinality 2^t. Usually the Schnorr protocol is not used iteratively. Therefore, t ≥ 40 is recommended for making the probability of cheating negligible. On the other hand, if t is too large, then the protocol can be shown to lose the ZK property. For the generation of the witness from the commitment, Alice computes a modular exponentiation with an exponent which is O(q). Generating the response, on the other hand, involves a single multiplication (and a single addition) modulo q and hence is very fast.
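
A toy Python sketch of one Schnorr round, under the reading that y = g^d (mod p) is public and Bob verifies w′ = g^r y^−∊ (mod p); the tiny group and the function name are illustrative assumptions of ours:

```python
import random

# Toy sketch of one round of the Schnorr protocol. Below, p = 23,
# q = 11 divides p - 1, and g = 2 has order 11 modulo 23; real
# parameters take q > 2^160.

def schnorr_round(p, q, g, d, y, t, rng):
    c = rng.randrange(q)              # Alice's commitment
    w = pow(g, c, p)                  # witness w = g^c
    eps = rng.randrange(2 ** t)       # Bob's challenge in {0, ..., 2^t - 1}
    r = (d * eps + c) % q             # response r = d*eps + c (mod q)
    wp = pow(g, r, p) * pow(y, -eps, p) % p   # Bob: w' = g^r * y^(-eps)
    return wp == w
```

An honest run always verifies, since g^r y^−∊ = g^(d∊ + c − d∊) = g^c = w.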

Exercise Set 5.5

5.25
  1. Describe how a zero-knowledge witness–challenge–response identification scheme can be converted to a signature scheme. [H]

  2. Write the Feige–Fiat–Shamir, Guillou–Quisquater and Schnorr signature schemes based on the corresponding identification schemes.

5.26 Let n := pq with distinct primes p and q, each congruent to 3 modulo 4.
  1. Show that –1 is a quadratic non-residue modulo p and modulo q.

  2. If is a quadratic residue modulo n, prove that a has exactly four square roots modulo n, of which exactly one is a quadratic residue modulo n.

  3. Consider the following identification protocol in which Alice wants to prove to Bob her knowledge of the factorization of n = pq. Assume that p and q are sufficiently large so that computing square roots modulo n is infeasible without the knowledge of the factorization of n. Argue that Alice can prove her identity to Bob if and only if she knows the factorization of n.

    A bad zero-knowledge protocol

    Bob chooses a random x ∈ ℤ_n* and computes a := x^4 (mod n).

    Bob sends a to Alice.

    Alice computes four square roots of a modulo n and picks up the unique
           square root b which is a quadratic residue modulo n.

    Alice sends b to Bob.

    Bob accepts Alice’s claim if and only if b ≡ x^2 (mod n).

  4. Conclude that this is not a good zero-knowledge protocol, by demonstrating that Bob can maliciously send a bad a to Alice so that during the execution of the protocol he gathers enough information to factor n. [H]

Chapter Summary

All the material studied in earlier chapters culminates in this relatively short chapter which describes some popular cryptographic algorithms. We address most of the problems relevant in cryptography, namely, encryption, key agreement, digital signatures and entity authentication. Against each algorithm we mention the (provable or alleged) source of security of the algorithm.

Encryption algorithms are treated first. We start with the seemingly most popular RSA algorithm. This algorithm derives its security from the RSA key inversion problem and the RSA problem. The key inversion problem is probabilistic polynomial-time equivalent to the integer factorization problem. The intractability of the RSA problem is unknown. At present no algorithm other than factoring the RSA modulus is known for solving the RSA problem. We subsequently describe Rabin encryption (based on the square root problem), Goldwasser–Micali encryption (based on the quadratic residuosity problem), Blum–Goldwasser encryption (based on the square root problem), ElGamal encryption (based on the Diffie–Hellman problem) and Chor–Rivest encryption (based on a variant of the subset sum problem). The XTR encryption algorithm is essentially an efficient implementation of ElGamal encryption and is based on a tricky representation of elements in certain finite fields. The last encryption algorithm we discuss is the NTRU algorithm. It derives its security from a mixing system that uses the algebra ℤ[X]/〈X^N − 1〉. Attacks on NTRU based on the shortest vector problem are also known.

The basic key-agreement scheme is the Diffie–Hellman scheme. In order to prevent small-subgroup attacks on this scheme, one employs a technique known as cofactor expansion. We then explain unknown key-share attacks against key-agreement schemes. These attacks necessitate the use of authenticated key agreement schemes. The MQV algorithm is presented as an example of an authenticated key-agreement scheme.

Next come digital signature algorithms. Digital signatures may be classified in two broad categories: signature schemes with appendix and signature schemes with message recovery. In this book, we study only the signature schemes with appendix. As specific examples of signature schemes, we first explain RSA and Rabin signatures. Then, we present several variants of discrete-log-based signature schemes: ElGamal signatures, Schnorr signatures, Nyberg–Rueppel signatures, the digital signature algorithm (DSA) and its elliptic curve variant ECDSA. All the discrete-log (over finite fields)-based signature schemes have efficient XTR implementations. The NTRUSign algorithm is the last general-purpose signature scheme discussed in this section.

We then present a treatment of some special signature schemes. Blind signatures are created on messages unknown to the signer. Three blind signature schemes are described: Chaum, Schnorr and Okamoto–Schnorr schemes. An undeniable signature, on the other hand, requires an active participation of the signer at the time of verification and comes with a denial protocol that prevents a signer from denying a valid signature at a later time. The Chaum–Van Antwerpen undeniable signature scheme is based on the discrete-log problem, whereas the GKR scheme is based on the RSA problem.

A way to guarantee both authentication and confidentiality of a message is to sign the message and then encrypt the signed message. This involves two basic operations (signature generation and encryption). Zheng’s signcryption scheme combines these two primitives with a view to reducing both running time and message expansion.

The final topic we discuss in this chapter is entity authentication, a mechanism by means of which an entity can prove its identity to another. Here identity of an entity is considered synonymous with the possession of some secret information by the entity. Passwords are called weak authentication schemes, since the claimant has to disclose the secret straightaway to the verifier. A strong authentication scheme (also called a challenge–response scheme) does not reveal the secret to the verifier. We describe two strong authentication schemes; the first is based on encryption and the second on digital signatures. A way to establish mutual authentication between two entities is also presented. Challenge–response algorithms may be vulnerable to some attacks mounted by the verifier. A zero-knowledge protocol comes with a proof that during the authentication conversation no information is leaked to the verifier. Three zero-knowledge protocols are discussed: the Feige–Fiat–Shamir protocol, the Guillou–Quisquater protocol, and the Schnorr protocol.

Suggestions for Further Reading

Public-key cryptography was born from the seminal works of Diffie and Hellman [78] and Rivest, Shamir and Adleman [252]. Though still young, this area has induced much research in the last three decades. In this chapter, we have made an attempt to summarize some important cryptographic algorithms proposed in the literature. The original papers where these techniques have been introduced are listed below. We don’t plan to be exhaustive, but mention only the most relevant resources.

Algorithm | Reference(s)
RSA encryption | [252]
Rabin encryption | [246]
Goldwasser–Micali encryption | [117]
Blum–Goldwasser encryption | [27]
ElGamal encryption | [84]
Chor–Rivest encryption | [54]
XTR encryption | [170, 172, 171, 173, 289, 297]
NTRU encryption | [130]
Identity-based encryption | [267, 34, 35]
Diffie–Hellman key exchange | [78]
Menezes–Qu–Vanstone key exchange | [161]
RSA signature | [252]
Rabin signature | [246]
ElGamal signature | [84]
Schnorr signature | [263]
Nyberg–Rueppel signature | [223, 224]
DSA | [220]
ECDSA | [141]
XTR signature | [170, 172, 171, 173, 289, 297]
NTRUSign | [110, 111, 128, 129, 131, 217]
Chaum blind signature | [48, 49, 50]
Schnorr blind signature | [263, 202]
Okamoto–Schnorr blind signature | [227, 236]
Chaum–Van Antwerpen undeniable signature | [51, 52, 53]
RSA undeniable signature | [109, 187, 102, 186]
Signcryption | [310, 311, 312]
Signcryption based on elliptic curves | [313, 314]
Identity-based signcryption | [178, 185]
Feige–Fiat–Shamir ZK protocol | [90, 91]
Guillou–Quisquater ZK protocol | [122]
Schnorr ZK protocol | [263]

The Handbook of Applied Cryptography [194] is a single resource where most of the above algorithms are discussed in good detail. See Chapter 8 of that handbook for encryption algorithms, Chapter 11 for digital signatures and Chapter 10 for identification schemes.

There are several other (allegedly) intractable mathematical problems based on which cryptographic protocols can be built. Some of the promising candidates that we left out in the text are summarized below:

Algorithm | Intractable problem
LUC [284, 285, 286] | RSA and ElGamal-like problems based on Lucas sequences
Goldreich–Goldwasser–Halevi [115] | lattice-basis reduction
Patarin’s hidden field equations (HFE) [232] | solving multivariate polynomial equations
EPOC/ESIGN [97, 228] | factorization of integers of the form p^2 q
McEliece encryption [190] | decoding of error-correcting codes
Number field cryptography [38, 39] | discrete-log problem in class groups of quadratic fields
KLCHKP (braid group cryptosystem) [148] | braid conjugacy problem

The Internet site http://www.tcs.hut.fi/~helger/crypto/link/public/index.html is a good place to start, for more information on these (and some other) cryptosystems. Also visit http://www.kisa.or.kr/technology/sub1/index-PKC.htm.

The obvious question that crops up now is, given so many different cryptographic schemes, which one should a user go for?[5] There is no clear-cut answer to this question. One has to study the relative merits and demerits of the systems. If computational efficiency is what matters, we advocate the NTRU schemes. Having said that, we must also add that the NTRU scheme is relatively new and has not yet withstood sufficient cryptanalytic attacks. Various attacks on NSS and NTRUSign cast doubt on the practical safety of applying such young schemes in serious applications.

[5] It is worthwhile to issue a warning to the readers. Many cryptographic algorithms (and also the idea of public-key cryptography) are/were patented. In order to implement these algorithms (in particular, for commercial purposes), one should take care of the relevant legal issues. We summarize here some of the important patents in this area. The list is far from exhaustive.

Patent No. | Covers | Patent holder | Date of issue
US 4,200,770 | Diffie–Hellman key exchange (includes ElGamal encryption) | Stanford University | Apr 29, 1980
US 4,218,582 | Public-key cryptography | Stanford University | Aug 19, 1980
US 4,405,829 | RSA | MIT | Sep 20, 1983
US 5,231,668 | DSA | USA, Secretary of Commerce | Jul 27, 1993
US 5,351,298 | LUC | P. J. Smith | Sep 27, 1994
US 5,790,675 | HFE | CP8 Transac (France) | Aug 4, 1998
EP 0963635A1 / WO 09836526 | XTR | Citibank (North America) | Dec 15, 1999 / Aug 20, 1998
US 6,081,597 | NTRU | NTRU Cryptosystems, Inc. | Jun 27, 2000
| EPOC/ESIGN | Nippon Telegraph and Telephone Corporation | Apr 17, 2001


Our mathematical trapdoors are not provably secure, and this is where the problems begin. We have to rely on historical evidence that should not be collected too hastily. Slow as it is, RSA has stood the test of time, and has successfully survived more than twenty years of cryptanalytic attacks [29]. The risk that an unforeseen attack will break the system tomorrow appears much smaller with RSA than with newer schemes that have enjoyed only a little cryptanalytic study. The hidden monomial system proposed by Imai and Matsumoto [188] was broken by Patarin [231]. As a by-product, Patarin came up with the idea of cryptosystems based on hidden field equations (HFE) [232]. No serious attacks on HFE are known to date, but as we mentioned earlier, only time will show whether HFE is going to survive.

Bruce Schneier asserts in his Crypto-Gram newsletter (15 March 1999, http://www.counterpane.com/crypto-gram.html): “No one can duplicate the confidence that RSA offers after 20 years of cryptanalytic review. A standard security review, even by competent cryptographers, can only prove insecurity; it can never prove security. By following the pack you can leverage the cryptanalytic expertise of the worldwide community, not just a handful of hours of a consultant’s time.”

Twenty-odd years is definitely not a wide span of time in the history of evolution of our knowledge, but public-key cryptography is only as old as RSA is!

6. Standards

6.1Introduction
6.2IEEE Standards
6.3RSA Standards
 Chapter Summary
 Suggestions for Further Reading

In theory, there is no difference between theory and practice. But, in practice, there is.

—Jan L. A. van de Snepscheut

ECC curves are divided into three groups, weak curves, inefficient curves, and curves patented by Certicom.

—Peter Gutmann

Acceptance of prevailing standards often means we have no standards of our own.

—Jean Toomer (1894 – 1967)

6.1. Introduction

Public-key cryptographic protocols deal with sets like the ring of integers modulo n, the multiplicative group of units in a finite field or the group of points on an elliptic curve over a finite field. Messages that need to be encrypted or signed are, on the other hand, usually human-readable text or numbers or keys of secret-key cryptographic protocols, which are typically represented in computers in the form of sequences of bits (or bytes). It is necessary to convert such bit strings (or byte strings) to mathematical elements before the cryptographic algorithms are applied. This conversion is referred to as encoding. The reverse transition, that is, converting mathematical entities back to bit strings, is called decoding.

If Alice and Bob were the only two parties involved in deploying public-key protocols, they could have agreed upon a set of private (not necessarily secret) encoding and decoding rules. In practice, however, when many entities interact over a public network, it is impractical, if not impossible, to have an individual encoding scheme for every pair of communicating parties. This is also unnecessary, because the security of the protocols comes from the encryption process and not from encoding. On the contrary, poorly designed encoding schemes may endanger the security of the underlying protocols.

We, therefore, need a set of standard ways of converting data between various logical formats. This promotes interoperability, removes ambiguities, facilitates simplicity in handling cryptographic data and thereby enhances the applicability and acceptability of public-key algorithms. IEEE (The Institute of Electrical and Electronics Engineers, Inc., pronounced eye-triple-e) and the RSA laboratories have published extensive documents standardizing data conversion and encoding for many popular public-key cryptosystems. Here we summarize the contents of some of these documents. This exposition is meant mostly for software engineers intending to develop cryptographic tool-kits that conform to the accepted standards.

6.2. IEEE Standards

In this section, we outline the first three of the drafts from IEEE, shown in Table 6.1. At the time of writing this book, these are the latest versions of the drafts available from IEEE. In future, these may be superseded by newer documents. We urge the reader to visit the web-site http://grouper.ieee.org/groups/1363/ for more up-to-date information. Also see the standard IEEE 1363–2000: Standards Specifications for Public-key Cryptography [134].

Table 6.1. IEEE drafts on public-key cryptography
Draft | Date | Description
P1363 / D13 | 12 November 1999 | Traditional public-key cryptography based on IFP, DLP and ECDLP
P1363a/D12 | 16 July 2003 | Additional techniques on traditional public-key cryptography
P1363.1/D4 | 7 March 2002 | Lattice-based cryptography
P1363.2/D15 | 25 May 2004 | Password-based authentication
P1363.3/D1 | May 2008 | Identity-based public-key cryptography

6.2.1. The Data Types

Public-key protocols operate on data of various types. The IEEE drafts specify only the logical descriptions of these data types. The realizations of these data types should be taken care of by individual implementations and are left unspecified.

Bit strings

A bit string is a finite ordered sequence a0a1 . . . al–1 of bits, where each bit ai can assume the value 0 or 1. The length of the bit string a0a1 . . . al–1 is l. The bit a0 in the bit string a0a1 . . . al–1 is called the leftmost or the first or the leading or the most significant bit, whereas the bit al–1 is called the rightmost or the last or the trailing or the least significant bit.

The order of appearance of the bits in a bit string is important, rather than the way the bits are indexed or named. That is to say, the most and least significant bits in a given bit string are uniquely determined by their positions of occurrences in the string, and not by the way the individual bits in the string are numbered. Thus, for example, if we call the bit string 01101 as a0a1a2a3a4, then the leading and trailing bits are a0 and a4 respectively. If we index the bits in the same bit string as a2a3a5a7a11, the first bit is a2 and the last bit is a11. Finally, for the indexing a5a4a3a2a1, the leftmost and rightmost bits are a5 and a1 respectively.

Octet strings

Though bits are the basic building blocks in computer memory, programs typically access memory in groups of 8 bits, known as octets. Thus, an octet is a bit string of length 8 and can have one of the 256 values 0000 0000 through 1111 1111. It is convenient to write an octet as a concatenation of two hexadecimal digits, the first (resp. second) digit corresponding to the first (resp. last) 4 bits of the octet. For example, the octet 0010 1011 is represented by 2b. It is also often customary to treat an octet a0a1 . . . a7 as the integer (between 0 and 255, both inclusive) whose binary representation is a0a1 . . . a7.

An octet string is a finite ordered sequence of octets. The length of an octet string is the number of octets in the string. The leftmost (or first or leading or most significant) and the rightmost (or last or trailing or least significant) octets in an octet string are defined analogously as in the case of bit strings. These octets are dependent solely on their positions in the octet string and are independent of how the individual octets in the octet string are numbered.

Integers

Integers are the whole numbers 0, ±1, ±2, . . . . For cryptographic applications, one typically considers only non-negative integers. Integers used in cryptography may have binary representations requiring as many as several thousand bits.

Prime finite fields

Let p be a prime (typically, odd). The elements of ℤ_p are represented as the integers 0, 1, . . . , p – 1 under the standard way of associating the integer a with the congruence class [a]p in ℤ_p. Arithmetic operations in ℤ_p are the corresponding integer operations modulo the prime p.

Finite fields of characteristic 2

The elements of the field GF(2^m) are represented as bit strings of length m. In order to provide the mathematical interpretation of these bit strings, we recall that GF(2^m) is an m-dimensional vector space over GF(2). Let β0, . . . , βm–1 be an ordered basis of GF(2^m) over GF(2). The bit string a0 . . . am–1 is to be identified with the element a0β0 + · · · + am–1βm–1, where the bit ai represents the element [ai]2 of GF(2). Selection of the basis β0, . . . , βm–1 renders a complete meaning to this representation and determines how arithmetic operations on these elements are to be performed. The following two cases are recommended.

For the polynomial-basis representation, one chooses an irreducible polynomial f(X) ∈ GF(2)[X] of degree m and represents GF(2^m) as GF(2)[X]/〈f(X)〉. Letting x denote the canonical image of X in GF(2)[X]/〈f(X)〉, one chooses the ordered basis β0 = x^{m–1}, β1 = x^{m–2}, . . . , βm–1 = 1. Arithmetic operations in GF(2^m) under this representation are those of GF(2)[X] followed by reduction modulo the defining polynomial f(X). Choice of the irreducible polynomial f(X) is left unspecified in the IEEE drafts.

For the normal-basis representation, one selects an element θ ∈ GF(2^m) which is normal over GF(2) (see Definition 2.60, p 86), and takes the ordered basis β0 = θ = θ^{2^0}, β1 = θ^{2^1}, β2 = θ^{2^2}, . . . , βm–1 = θ^{2^{m–1}}. Arithmetic in GF(2^m) is carried out as explained in Section 2.9.3.

The IEEE draft P1363a also specifies a composite-basis representation of elements of GF(2^m), provided that m is composite. Let m = ds with 1 < d < m. One chooses an (ordered) polynomial or normal basis γ0, γ1, . . . , γs–1 of GF(2^m) over GF(2^d). An element of GF(2^m) is of the form a0γ0 + a1γ1 + · · · + as–1γs–1 and is represented by a0a1 . . . as–1, where each ai, being an element of GF(2^d), is represented by a bit string of length d. The interpretation of the representation of ai is dependent on how GF(2^d) is represented. One can use a polynomial- or normal-basis representation of GF(2^d) (over GF(2)), or even a composite-basis representation of GF(2^d) over GF(2^{d′}), if d happens to be composite with a non-trivial divisor d′.

Extension fields of odd characteristics

A non-prime finite field of odd characteristic is one with cardinality p^m for some odd prime p and some integer m > 1. The field GF(p^m) is represented as GF(p)[X]/〈f(X)〉, where f(X) ∈ GF(p)[X] is an irreducible polynomial of degree m. An element of GF(p^m) is then of the form α = am–1x^{m–1} + · · · + a1x + a0, where x := X + 〈f(X)〉 and where each ai is an element of GF(p), that is, an integer in the range 0, 1, . . . , p – 1. The element α is represented as an integer by substituting p for x, that is, as the integer am–1p^{m–1} + · · · + a1p + a0 (see the packed representation of Exercise 3.39). In order to interpret an integer between 0 and p^m – 1 as an element of GF(p^m), one has to expand the integer in base p.
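
The packed representation and its base-p expansion can be sketched in a few lines of Python (the function names are our own illustrative choices):

```python
# Sketch of the packed representation of GF(p^m): the coefficient tuple
# (a_0, ..., a_{m-1}) maps to the integer a_0 + a_1*p + ... + a_{m-1}*p^(m-1),
# and decoding is base-p expansion.

def fe2int(coeffs, p):
    """Pack field-element coefficients into a single integer."""
    return sum(a * p ** i for i, a in enumerate(coeffs))

def int2fe(v, p, m):
    """Expand an integer in 0..p^m - 1 back into m base-p digits."""
    out = []
    for _ in range(m):
        v, a = divmod(v, p)
        out.append(a)
    return out
```

The two functions are mutual inverses on their respective domains.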

* Elliptic curves

An elliptic curve defined over a finite field GF(q) is specified by two elements a, b ∈ GF(q). Depending on the characteristic of GF(q), this pair defines the following curves.

If char GF(q) ≠ 2, 3, then 4a^3 + 27b^2 must be non-zero in GF(q) and the equation of the elliptic curve is taken to be Y^2 = X^3 + aX + b.

For char GF(q) = 2, we must have b ≠ 0 in GF(q) and we use the non-supersingular curve Y^2 + XY = X^3 + aX^2 + b. Because of the MOV attack (Section 4.5.1), supersingular curves are not recommended for cryptographic applications.

Finally, if GF(q) has characteristic 3, then both a and b must be non-zero in GF(q) and the elliptic curve Y^2 = X^3 + aX^2 + b is specified by (a, b).

* Elliptic curve points

A point P on an elliptic curve defined over GF(q) can be represented either in compressed or in uncompressed form. In the uncompressed form, one represents P as the pair (h, k) of elements of GF(q). The compressed form can be either lossy or lossless. In the lossy compressed form, P is represented by its X-coordinate h only. Such a representation is not unique in the sense that there can be two points on the elliptic curve with the same X-coordinate h. In applications where Y-coordinates of elliptic curve points are not utilized, such a representation can be used. In the lossless compressed form, one represents P as (h, ỹ), where ỹ is a single bit. There are two solutions (perhaps repeated) for Y for a given value h of X. The bit ỹ specifies which of these two values is represented. Depending on how the bit ỹ is computed, we have two different lossless compressed forms.

The LSB compressed form is applicable for odd prime fields or fields of even characteristic. For GF(p) with p an odd prime, the bit ỹ is taken to be the least significant (that is, rightmost) bit of k (treated as an integer). For GF(2^m), we have ỹ = 0 if h = 0, whereas if h ≠ 0, then ỹ is the least significant bit of the element kh^−1 treated as an integer via the FE2I conversion primitive described in Section 6.2.2.
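
For an odd prime field with p ≡ 3 (mod 4), the LSB form can be sketched as follows. The toy curve and function names are our own assumptions, and a general implementation would need a full modular square-root routine (e.g., Tonelli–Shanks) rather than the p ≡ 3 (mod 4) shortcut used here:

```python
# Sketch of LSB point compression over GF(p), p = 3 (mod 4),
# for the curve Y^2 = X^3 + aX + b.

def compress(h, k):
    return h, k & 1                       # X-coordinate plus the LSB of Y

def decompress(h, ybit, a, b, p):
    rhs = (pow(h, 3, p) + a * h + b) % p  # right-hand side of the curve equation
    k = pow(rhs, (p + 1) // 4, p)         # a square root when p = 3 (mod 4)
    if k * k % p != rhs:
        raise ValueError("h is not the X-coordinate of a curve point")
    # Pick whichever of the two roots k, p - k has the stored LSB.
    return (h, k) if k & 1 == ybit else (h, (p - k) % p)
```

On the toy curve Y^2 = X^3 + 2X + 3 over GF(11), the point (2, 2) compresses to (2, 0) and decompresses back correctly.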

The SORT compressed form is used for q = p^m, m > 1. Let P′ = (h, k′) be the opposite of P = (h, k), that is, P′ = −P. One converts k and k′ to integers k̄ and k̄′ using the FE2I primitive and sets ỹ := 1 if k̄ > k̄′, and ỹ := 0 otherwise.

One may also go for a hybrid representation of the elliptic curve point P = (h, k), in which information for both the compressed and the uncompressed representations of P is stored, that is, P is stored as (h, k, ỹ) with ỹ computed by one of the methods (LSB or SORT) described above.

* Convolution polynomial rings

For NTRU public-key cryptosystems, we work in the ring R := ℤ[x]/〈x^n − 1〉, with x denoting the image of the indeterminate as usual. An element of R is a polynomial a(x) = a0 + a1x + a2x^2 + · · · + an–1x^{n–1} with integer coefficients, and is represented by the ordered n-tuple of integers (a0, a1, . . . , an–1). Addition (resp. subtraction) in R is simply component-wise addition (resp. subtraction), whereas multiplication of a(x) = a0 + a1x + · · · + an–1x^{n–1} and b(x) = b0 + b1x + · · · + bn–1x^{n–1} gives c(x) = c0 + c1x + · · · + cn–1x^{n–1}, where ck = Σ aibj, the sum being over all pairs (i, j) with i + j ≡ k (mod n) (see Section 5.2.8). The IEEE draft P1363.1 designates elements of R as ring elements.

It is customary to deal with polynomials in R with small coefficients. If all the coefficients of a(x) ∈ R are known to be from {0, 1}, it is convenient to represent a(x) as the bit string a0a1 . . . an–1 instead of as an n-tuple of integers. In this case, a(x) is called a binary ring element or simply a binary element.
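
The cyclic-convolution product described above can be sketched in a few lines of Python (a naive O(n^2) loop on coefficient tuples; the function name is ours):

```python
# Sketch of multiplication in R = Z[x]/<x^n - 1> on n-tuples of
# coefficients: a cyclic convolution, with exponents reduced mod n.

def ring_mul(a, b):
    n = len(a)
    assert len(b) == n
    c = [0] * n
    for i in range(n):
        for j in range(n):
            c[(i + j) % n] += a[i] * b[j]   # x^i * x^j = x^((i+j) mod n)
    return c
```

For example, with n = 3, multiplying x^2 by x^2 gives x^4 = x, that is, the tuple (0, 1, 0).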

6.2.2. Conversion Among Data Types

The IEEE drafts P1363 and P1363.1 specify algorithms for converting data among the formats discussed above. The standardized data conversion primitives are summarized in Figure 6.1. Though these drafts support elliptic curve cryptography, it is not specified how data representing elliptic curves can be converted to data of other types (like octet strings and bit strings).

Figure 6.1. IEEE P1363 data types and conversions


We now provide a brief description of the data conversion primitives at a logical level. The implementation details depend on the representations of the data types and are left out here.

Converting bit strings to octet strings (BS2OS)

A bit string a0a1 . . . al–1 can be broken up in groups of eight bits and packed into octets. But we run into difficulty if the length of the input bit string is not an integral multiple of 8. We have to add extra bits in order to make the length of the augmented bit string an integral multiple of 8. This can be done in several ways, and in this context a standard convention needs to be adopted. The IEEE drafts prescribe the following rules:

  1. Every extra bit added must be the zero bit.

  2. Add the minimal number of extra bits.

  3. Add the extra bits, if any, to the left.[1]

    [1] At the time of writing this book there is a serious conflict between the latest drafts of P1363 and P1363.1 from IEEE. The former asks to add extra bits to the left, the latter to the right. One of the authors of this book raised this issue in the discussion group stds-p1363-discuss maintained by IEEE and was notified that in the next version of the P1363.1 document this conflict would be resolved in favour of P1363.

In order to see what these rules mean, let a0a1 . . . al–1 be a bit string of length l to be converted to the octet string A0A1 . . . Ad–1. The length of the output octet string must be d = ⌈l/8⌉. 8d – l zero bits should be added to the left of the input bit string in order to create the augmented bit string 0 . . . 0a0a1 . . . al–1, whose length is 8d. Now, we start from the left and pack blocks of eight consecutive bits into A0, A1, . . . , Ad–1. Thus, we have A0 = 0 . . . 0a0 . . . ak–1, A1 = ak . . . ak+7, . . . , Ad–1 = ak+8(d–2) . . . ak+8(d–2)+7, where k = 8 – (8d – l). Note that if l is already a multiple of 8, then 8d – l = 0, that is, no extra bits need to be added.

As an example, consider the input bit string 01110 01101011 of length 13. The output octet string should be of length ⌈13/8⌉ = 2. Padding gives the augmented bit string 00001110 01101011. The first octet in the output octet string will then be 00001110, that is, 0e; and the second octet will be 01101011, that is, 6b.
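These rules can be sketched in a few lines of Python; the function name bs2os is ours and is not part of the drafts.

```python
def bs2os(bits: str) -> bytes:
    """BS2OS: left-pad with the minimal number of zero bits, then pack into octets."""
    d = (len(bits) + 7) // 8              # output length d = ceil(l/8)
    padded = bits.zfill(8 * d)            # prepend 8d - l zero bits on the left
    return bytes(int(padded[8 * i : 8 * i + 8], 2) for i in range(d))

print(bs2os("0111001101011").hex())       # the example above -> "0e6b"
```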

Converting octet strings to bit strings (OS2BS)

The OS2BS primitive is designed to ensure that if we convert an octet string generated by BS2OS, we get back the original bit string (that is, the input to BS2OS) with which we started. Suppose that we want to convert an octet string A0A1 . . . Ad–1. Let us write the bits of Ai as ai,0ai,1 . . . ai,7. The desired length l of the output bit string also has to be specified. If d ≠ ⌈l/8⌉, the procedure OS2BS reports error and stops. If d = ⌈l/8⌉, we consider the bit string

a0,0a0,1 . . . a0,7a1,0a1,1 . . . a1,7 . . . ad–1,0ad–1,1 . . . ad–1,7

of length 8d. If the leftmost 8d – l bits of this flattened bit string are not all zero, OS2BS should quit after reporting error. Otherwise, the trailing l bits of the flattened bit string are returned.

The reader can check that when 0e 6b and l = 13 are input to OS2BS, it returns the bit string 01110 01101011. (See the example in connection with BS2OS.) Notice also that for this input octet string, OS2BS reports error if and only if a value l ≥ 17 or l ≤ 11 is supplied as the desired length of the output bit string.
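A matching sketch of OS2BS (again, the name is ours); note the two distinct error conditions of the draft: a length mismatch and non-zero padding bits.

```python
def os2bs(octets: bytes, l: int) -> str:
    """OS2BS: flatten the octets and strip the 8d - l leftmost (zero) padding bits."""
    d = len(octets)
    if d != (l + 7) // 8:
        raise ValueError("error: d != ceil(l/8)")
    flat = "".join(format(o, "08b") for o in octets)
    if "1" in flat[: 8 * d - l]:
        raise ValueError("error: non-zero padding bits")
    return flat[8 * d - l :]

print(os2bs(bytes.fromhex("0e6b"), 13))   # -> "0111001101011"
```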

Converting integers to bit strings (I2BS)

Let a non-negative integer n be given. The I2BS primitive outputs a bit string of length l representing n. If n ≥ 2l, this conversion cannot be done and the primitive reports error and quits. If n < 2l, we write the binary representation of n as

n = al–12l–1 + al–22l–2 + · · · + a1 · 2 + a0 with each ai ∈ {0, 1}.

Treating each ai as a bit[2], I2BS returns the bit string al–1al–2 . . . a1a0. One or more leading bits of the binary representation of n may be zero. There is no limit on how many leading zero bits are allowed during the conversion. In particular, the integer 0 gets converted to a sequence of l zero bits for any value of l supplied.

[2] Each ai is logically an integer which happens to assume one of two possible values: 0 and 1. A bit, on the other hand, is a quantity that can also assume only two possible values. Traditionally, the values of a bit are also denoted by 0 and 1. But one has the liberty to call these values off and on, or false and true, or black and white, or even armadillo and platypus. To many people, bit is an abbreviation for binary digit, which our ai’s logically are. To others, binit is a safer and more individualistic acronym for binary digit. For I2BS, we identify the two concepts.

A request to I2BS to convert n = 2357 = 211 + 28 + 25 + 24 + 22 + 20 with l = 12 returns 1001 00110101, one with l = 18 returns 00 00001001 00110101 and, finally, one with l ≤ 11 reports failure. Note that for a neater look we write bit strings in groups of eight, and grouping starts from the right. This convention reflects the relationship between bit strings and octet strings, as mentioned above.
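I2BS is essentially fixed-width binary formatting; a sketch (name ours):

```python
def i2bs(n: int, l: int) -> str:
    """I2BS: the binary representation of n, zero-extended to exactly l bits."""
    if n >= 1 << l:
        raise ValueError("error: n >= 2^l")
    return format(n, f"0{l}b")

print(i2bs(2357, 12))   # -> "100100110101"
print(i2bs(0, 5))       # -> "00000"
```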

Converting bit strings to integers (BS2I)

The primitive BS2I converts the bit string a0a1 . . . al–1 to the integer a02l–1 + a12l–2 + · · · + al–22 + al–1, where we again identify a bit with an integer (or a binary digit). As an illustrative example, the bit string 1001 00110101 (or 00 00001001 00110101) gets converted to the integer 211 + 28 + 25 + 24 + 22 + 20 = 2357. The null bit string (that is, the one of zero length) is converted to the integer 0.
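BS2I is the inverse operation; the null bit string needs explicit handling, since it must map to 0 (sketch, name ours):

```python
def bs2i(bits: str) -> int:
    """BS2I: read the bit string as a big-endian binary integer."""
    return int(bits, 2) if bits else 0    # the null bit string converts to 0

print(bs2i("100100110101"))   # -> 2357
```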

Converting integers to octet strings (I2OS)

In order to convert a non-negative integer n to an octet string of length d, we write the base-256 expansion of n as

n = Ad–1 · 256d–1 + Ad–2 · 256d–2 + · · · + A1 · 256 + A0,

where each Ai ∈ {0, 1, . . . , 255} can be naturally identified with an octet. I2OS returns the octet string Ad–1Ad–2 . . . A1A0. Note that the above representation of n to the base 256 is possible if and only if n < 256d. If n ≥ 256d, I2OS should return failure. As with bit strings, an arbitrary number of leading zero octets is allowed.

Consider the integer 2357 = 9 × 256 + 53. The two-digit hexadecimal representations of 9 and 53 are 09 and 35 respectively. Thus, a call of I2OS on this n with d = 3 (resp. d = 2, resp. d = 1) returns 00 09 35 (resp. 09 35, resp. failure).
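I2OS and OS2I map directly onto big-endian byte conversions; a sketch (names ours, using Python's built-in int/bytes conversions):

```python
def i2os(n: int, d: int) -> bytes:
    """I2OS: base-256 digits of n, most significant octet first."""
    if n >= 256 ** d:
        raise ValueError("error: n >= 256^d")
    return n.to_bytes(d, "big")

def os2i(octets: bytes) -> int:
    """OS2I: read octets as base-256 digits; the empty string gives 0."""
    return int.from_bytes(octets, "big")

print(i2os(2357, 3).hex())            # -> "000935"
print(os2i(bytes.fromhex("0935")))    # -> 2357
```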

Converting octet strings to integers (OS2I)

Let an octet string A0A1 . . . Ad–1 be given. Each Ai can be identified with a 256-ary digit. OS2I returns the integer A0256d–1 + A1256d–2 + · · · + Ad–2256 + Ad–1. If d = 0, the integer 0 should be output.

Converting field elements to octet strings (FE2OS)

In the IEEE P1363 jargon, a field element is an element of the finite field Fq, where q is a prime or an integral power of a prime. We want to convert an element β ∈ Fq to an octet string. Depending on the value of q, we have two cases:

If the characteristic of Fq is odd, β is represented as an integer in {0, 1, . . . , q – 1}. FE2OS converts β to an octet string of length ⌈log256 q⌉ by calling the primitive I2OS.

If q = 2m, β is represented as a bit string of length m. The primitive BS2OS is called to convert β to an octet string.

Converting octet strings to field elements (OS2FE)

Assume that an octet string is to be converted to an element of the finite field Fq. Again we have two possibilities depending on q.

If Fq is of odd characteristic, the primitive OS2I is called to convert the given octet string to an integer. This integer is returned as the field element.

If q = 2m, one calls the primitive OS2BS with the given octet string and with the length m supplied as inputs. The resulting bit string is returned by OS2FE. If OS2BS reports error, so does OS2FE.

Converting field elements to integers (FE2I)

Let β ∈ Fq, and suppose that the integer equivalent of β is sought. If q is odd, then β is already represented as an integer (in {0, 1, . . . , q – 1}) and is output as it is. If q = 2m, one first converts β to an octet string by FE2OS and subsequently converts this octet string to an integer by calling the primitive OS2I.

* Converting elliptic curve points to octet strings (EC2OS)

The point at infinity O (on an elliptic curve over Fq) is represented by an octet string comprising a single zero octet only. So let P = (h, k) be a finite point. The EC2OS primitive produces an octet string PO = PC ‖ H ‖ K which is the concatenation of a single octet PC with octet strings H and K representing h and k respectively. The values of PC and K depend on the type of compression used. One has PC = 0000 SUCỸ, where

S = 1 if and only if the SORT compression is used.

U = 1 if and only if uncompressed or hybrid form is used.

C = 1 if and only if compressed or hybrid form is used.

Ỹ = ỹ (the compression bit of the Y-coordinate k) if compression is used; Ỹ = 0 otherwise.

The first four bits of PC are reserved for (possible) future use and should be set to 0000 for this version of the standard. H is the octet string of length ⌈log256 q⌉ obtained by converting h using FE2OS. If the compressed form is used, K is the empty octet string, whereas if uncompressed or hybrid form is used, we have K = FE2OS(k, ⌈log256 q⌉). Finally, for the lossy compression we have PC = 0000 0001, H = FE2OS(h, ⌈log256 q⌉) and K is empty. Table 6.2 summarizes all these possibilities. Here, l := ⌈log256 q⌉, and p is an odd prime.

Table 6.2. The EC2OS primitive
Representation       PC         H            K            q
uncompressed         0000 0100  FE2OS(h, l)  FE2OS(k, l)  All
LSB compressed       0000 001Ỹ  FE2OS(h, l)  Empty        p, 2m
LSB hybrid           0000 011Ỹ  FE2OS(h, l)  FE2OS(k, l)  p, 2m
SORT compressed      0000 101Ỹ  FE2OS(h, l)  Empty        2m, pm
SORT hybrid          0000 111Ỹ  FE2OS(h, l)  FE2OS(k, l)  2m, pm
lossy compression    0000 0001  FE2OS(h, l)  Empty        All
point at infinity    0000 0000  Empty        Empty        All

* Converting octet strings to elliptic curve points (OS2EC)

The OS2EC data conversion primitive takes as input an octet string PO, the length l = ⌈log256 q⌉ and the method of compression. If PO contains only one octet and that octet is zero, the point at infinity O is output. Otherwise, the elliptic curve point P = (h, k) is computed as follows. OS2EC decomposes PO = PC ‖ H ‖ K, with PC the first octet and with H an octet string of length l. If PC does not match the method of compression, OS2EC returns error. Otherwise, it uses OS2FE to compute the field element h. If no compression or hybrid compression is used, the Y-coordinate k is also computed by applying OS2FE to K. If (h, k) is not a point on the elliptic curve, error is reported. For the LSB or SORT compression, the Y-coordinate is computed using h and the bit Ỹ stored in PC. If the hybrid scheme is used and the bit Ỹ in PC does not match the Y-coordinate k, OS2EC halts after reporting error. If all computations are successful till now, the point (h, k) is output.

Note that the checks for (h, k) being on the curve or for the equality of Ỹ with the compression bit of k are optional and may be omitted. For the lossy compression scheme, the Y-coordinate k is not necessarily uniquely determined from the input octet string PO. In that case, either of the two possibilities is output.

* Converting ring elements to octet strings (RE2OS)

Ring elements are elements of the convolution polynomial ring R = Z[x]/(xn – 1) and can be identified with polynomials with integer coefficients and of degrees < n. The element a(x) = a0 + a1x + · · · + an–1xn–1 (where each ai ∈ Z) is represented by the n-tuple of integers (a0, a1, . . . , an–1). The IEEE draft P1363.1 assumes that the coefficients ai are available modulo a positive integer β ≤ 256. But then each ai is an integer in {0, 1, . . . , β – 1} and can be naturally encoded by a single octet. RE2OS, upon receiving a(x) as input, outputs the octet string a0a1 . . . an–1 of length n.

An example: Let n = 7 and β = 128. The ring element a(x) = 2 + 11x + 101x3 + 127x4 + 71x5 = (2, 11, 0, 101, 127, 71, 0) is converted to the octet string 02 0b 00 65 7f 47 00.
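Since each coefficient fits in one octet, RE2OS is a direct packing; a sketch (name ours):

```python
def re2os(coeffs: list, beta: int) -> bytes:
    """RE2OS: one octet per coefficient, each reduced modulo beta <= 256."""
    if beta > 256 or any(not (0 <= a < beta) for a in coeffs):
        raise ValueError("coefficients must lie in {0, ..., beta - 1}")
    return bytes(coeffs)

# the example above: n = 7, beta = 128
print(re2os([2, 11, 0, 101, 127, 71, 0], 128).hex())   # -> "020b00657f4700"
```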

* Converting octet strings to ring elements (OS2RE)

Let an octet string a0a1 . . . an–1 of length n be given, which we want to convert to an element of R. Once again a modulus β ≤ 256 is assumed, so that each octet ai can be viewed as an integer reduced modulo β. Making the natural identification of ai with an integer, the polynomial a(x) = a0 + a1x + · · · + an–1xn–1 is output. Thus, for example, the octet string 02 0b 00 65 7f 47 00 gets converted to the ring element 2 + 11x + 101x3 + 127x4 + 71x5.

* Converting ring elements to bit strings (RE2BS)

The RE2BS primitive assumes that the modulus β is a power of 2, that is, β = 2t for some positive integer t ≤ 8. Let a ring element a(x) = a0 + a1x + · · · + an–1xn–1 be given, where each ai ∈ {0, 1, . . . , 2t – 1}. One applies the I2BS primitive on each ai to generate the bit string ai,0ai,1 . . . ai,t–1 of length t. The concatenated bit string

a0,0a0,1 . . . a0,t–1 a1,0a1,1 . . . a1,t–1 . . . an–1,0an–1,1 . . . an–1,t–1

of length nt is then returned by RE2BS.

As before, take the example of n = 7, β = 128 = 27 (so that t = 7) and a(x) = 2 + 11x + 101x3 + 127x4 + 71x5 = (2, 11, 0, 101, 127, 71, 0). The coefficients 2, 11, 0, . . . should first be converted to bit strings of length 7 each, that is, 2 gives 0000010, 11 gives 0001011 and so on. Thus, the bit string output by RE2BS will be 0000010 0001011 0000000 1100101 1111111 1000111 0000000. Note that here we have shown the bits in groups of 7 in order to highlight the intermediate steps (the outputs from I2BS). With the otherwise standard grouping in blocks of 8, the output bit string looks like 0 00001000 01011000 00001100 10111111 11100011 10000000 and hence transforms to the octet string 00 08 58 0c bf e3 80 by an invocation of BS2OS. This example illustrates that RE2BS followed by BS2OS does not necessarily give the same output as the direct conversion RE2OS, even when every underlying parameter (like β) remains unchanged.
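The RE2BS-then-BS2OS chain of this example can be checked mechanically (helper names ours):

```python
def re2bs(coeffs: list, t: int) -> str:
    """RE2BS for beta = 2^t: concatenate the t-bit strings of the coefficients."""
    return "".join(format(a, f"0{t}b") for a in coeffs)

def bs2os(bits: str) -> bytes:
    """BS2OS: left-pad with zero bits to a multiple of 8, then pack into octets."""
    d = (len(bits) + 7) // 8
    padded = bits.zfill(8 * d)
    return bytes(int(padded[8 * i : 8 * i + 8], 2) for i in range(d))

bits = re2bs([2, 11, 0, 101, 127, 71, 0], 7)   # 49 bits
print(bs2os(bits).hex())                       # -> "0008580cbfe380"
```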

* Converting bit strings to ring elements (BS2RE)

Once again we require the modulus β to be a power 2t of 2. Let a bit string a0a1 . . . al–1 of length l be given, from which we want to compute the equivalent ring element a(x). If l is not an integral multiple of t, the algorithm should quit after reporting error. Otherwise we let l = nt for some positive integer n, and repeatedly call the BS2I primitive on the bit strings a0a1 . . . at–1, atat+1 . . . a2t–1, . . . , a(n–1)ta(n–1)t+1 . . . ant–1 to get the integers α0, α1, . . . , αn–1 respectively. The polynomial a(x) = α0 + α1x + · · · + αn–1xn–1 is then output.

We urge the reader to verify that BS2RE with β = 128 and the bit string

0000010 0001011 0000000 1100101 1111111 1000111 0000000

as input produces the ring element 2 + 11x + 101x3 + 127x4 + 71x5.
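A sketch of BS2RE that recovers the coefficients of the example above (name ours):

```python
def bs2re(bits: str, t: int) -> list:
    """BS2RE for beta = 2^t: cut the bit string into t-bit blocks and apply BS2I."""
    if len(bits) % t:
        raise ValueError("error: length is not a multiple of t")
    return [int(bits[i : i + t], 2) for i in range(0, len(bits), t)]

bits = "0000010" "0001011" "0000000" "1100101" "1111111" "1000111" "0000000"
print(bs2re(bits, 7))   # -> [2, 11, 0, 101, 127, 71, 0]
```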

* Converting binary elements to octet strings (BE2OS)

A binary (ring) element is an element a(x) = a0 + a1x + · · · + an–1xn–1 of R with each ai ∈ {0, 1}. One can convert a(x) to an octet string A0A1 . . . Al–1 of any desired length l as follows. We denote the bits in the octet Ai as Ai,7Ai,6 . . . Ai,0. Here, the index of the bits increases from right to left.

First we rewrite the polynomial a(x) as one of degree 8l – 1, that is, as a(x) = a0 + a1x + · · · + a8l–1x8l–1. If n ≤ 8l, this can be done by setting an = an+1 = · · · = a8l–1 = 0. On the other hand, if n > 8l and one or more of the coefficients a8l, a8l+1, . . . , an–1 are non-zero (that is, 1), the above rewriting of a(x) cannot be done and BE2OS terminates after reporting failure.

When the above rewriting of a(x) becomes successful, one sets the bits of the output octets as A0,0 := a0, A0,1 := a1, . . . , A0,7 := a7, A1,0 := a8, A1,1 := a9, . . . , A1,7 := a15, A2,0 := a16, A2,1 := a17, . . . , A2,7 := a23, . . . , Al–1,0 := a8l–8, Al–1,1 := a8l–7, . . . , Al–1,7 := a8l–1.

As an example, take n = 20 and consider the binary element a(x) = 1 + x + x2 + x10 + x12. First let l = 1. Rewriting a(x) as a polynomial of degree 7 is not possible, since the coefficients of x10 and x12 are 1; so BE2OS outputs error in this case. If l = 2, then the output octet string will be 00000111 00010100, that is, 07 14. For l ≥ 3, the first two octets will be 07 and 14 as before, whereas the 3rd through l-th octet will be 00.

The BE2OS primitive can be quite effective for reducing storage requirements. For example, the polynomial a(x) of degree 12 of the previous paragraph, viewed as an element of R with n = 200, can be encoded in just two octets. Of course, by specifying l ≥ 3 one may add l – 2 trailing zero octets, if one desires. On the other hand, RE2OS requires exactly 200 octets, whereas RE2BS with β = 128 followed by BS2OS requires exactly ⌈(200 × 7)/8⌉ = 175 octets for storing the same a(x).
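BE2OS places coefficient a8i+j in bit j of octet Ai; a sketch (name ours):

```python
def be2os(coeffs: list, l: int) -> bytes:
    """BE2OS: pack binary coefficients into l octets, with bit A_{i,j} = a_{8i+j}."""
    if any(coeffs[8 * l :]):
        raise ValueError("error: a non-zero coefficient of index >= 8l")
    a = coeffs + [0] * max(0, 8 * l - len(coeffs))
    return bytes(sum(a[8 * i + j] << j for j in range(8)) for i in range(l))

# a(x) = 1 + x + x^2 + x^10 + x^12 with n = 20
coeffs = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
print(be2os(coeffs, 2).hex())   # -> "0714"
```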

* Converting octet strings to binary elements (OS2BE)

Assume that an octet string A0A1 . . . Al–1 of length l is given and the equivalent binary element in R is to be determined. As in the case of BE2OS, we index the bits in the octet Ai as Ai = Ai,7Ai,6 . . . Ai,0. Now, consider the polynomial a(x) = a0 + a1x + a2x2 + · · · + a8l–1x8l–1, where a8i+j = Ai,j. If n ≥ 8l, we set a8l = a8l+1 = · · · = an–1 = 0 and output a(x) as the binary element. On the other hand, if n < 8l and an = an+1 = · · · = a8l–1 = 0, then the truncated polynomial a0 + a1x + · · · + an–1xn–1 equals a(x) and is returned. Finally, if n < 8l and any of the coefficients an, an+1, . . . , a8l–1 is non-zero, then OS2BE returns error.[3]

[3] In this case, it still makes full algebraic sense to treat a(x) as an element of R, though not in the canonical representation.

For example, assume that the octet string 07 14 is given as input to OS2BE. If n ≤ 12, the algorithm outputs error, because the polynomial a(x) in this case has degree 12. For any n ≥ 13, the binary element 1 + x + x2 + x10 + x12 is returned.
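The inverse OS2BE, with the same bit-indexing convention (name ours):

```python
def os2be(octets: bytes, n: int) -> list:
    """OS2BE: unpack bits a_{8i+j} = A_{i,j} and reduce to n coefficients."""
    a = [(octets[i >> 3] >> (i & 7)) & 1 for i in range(8 * len(octets))]
    if any(a[n:]):
        raise ValueError("error: non-zero coefficient of index >= n")
    return (a + [0] * n)[:n]          # zero-extend (or drop trailing zeros) to length n

exps = [i for i, c in enumerate(os2be(bytes.fromhex("0714"), 20)) if c]
print(exps)   # exponents with coefficient 1 -> [0, 1, 2, 10, 12]
```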

6.3. RSA Standards

The public-key cryptography standards (PKCS) [254] refer to a set of standard specifications proposed by the RSA Laboratories. A one-line description of each of these documents is given in Table 6.3. In the rest of this section, we concentrate only on the documents PKCS #1 and #3.

Table 6.3. Public-key cryptography standards from the RSA Laboratories
Document    Description
PKCS #1     RSA encryption and signature
PKCS #2     Merged with PKCS #1
PKCS #3     Diffie–Hellman key exchange
PKCS #4     Merged with PKCS #1
PKCS #5     Password-based cryptography
PKCS #6     Extension of X.509 public-key certificates
PKCS #7     Syntax of cryptographic messages
PKCS #8     Syntax and encryption of private keys
PKCS #9     Attribute types for use in PKCS #6, #7, #8 and #10
PKCS #10    Syntax for certification requests
PKCS #11    Cryptoki, an application programming interface (API)
PKCS #12    Syntax for transferring personal information (private keys, certificates and so on)
PKCS #13    Elliptic curve cryptography (under preparation)
PKCS #15    Syntax for cryptographic token (like integrated circuit card) information

6.3.1. PKCS #1

PKCS #1 describes RSA encryption and RSA signatures. In this section, we summarize Version 2.1 (dated 14 June 2002) of the standard. This version specifies cryptographically stronger encoding procedures compared with the older versions. More specifically, the optimal asymmetric encryption procedure (OAEP [18]) for RSA encryption was incorporated in Version 2.0 of PKCS #1, whereas the new probabilistic signature scheme (PSS [19]) is introduced in Version 2.1. This latest draft also includes encryption and signature schemes compatible with older versions (1.5 and 2.0). However, adoption of the new algorithms is strongly recommended for enhanced security.

RSA keys

PKCS #1 Version 2.1 introduces the concept of multi-prime RSA, in which the RSA modulus n may have more than two prime divisors. For RSA encryption and decryption to work properly, we only need n to be square-free (Exercise 4.1). Using u > 2 prime divisors of n increases efficiency and does not degrade the security of the resulting system much, as long as u is not very large. More specifically, if T is the time for RSA private-key operation without CRT, then the cost of this operation with CRT is approximately T/u2 (neglecting the cost of CRT combination).

So an RSA modulus is of the form n = r1r2 . . . ru with u ≥ 2 and with pairwise distinct primes r1, . . . , ru. For the sake of conformity with the older versions of the standard, the first two primes are given the alternate special names p := r1 and q := r2. PKCS #1 does not mention any specific way of choosing the prime divisors ri of n, but encourages use of primes that make factorization of n difficult.

An RSA public exponent is an integer e, 3 ≤ e ≤ n – 1, with gcd(e, λ(n)) = 1, where λ(n) := lcm(r1 – 1, r2 – 1, . . . , ru – 1). An RSA public key is a pair (n, e) with n and e chosen as above.

The RSA private key corresponding to (n, e) can be stored in one of two formats. In the first format, one maintains the pair (n, d) with the private exponent d chosen so as to make ed ≡ 1 (mod λ(n)). In the second format, one stores the five quantities (p, q, dP, dQ, qInv) and, if u > 2, the triples (ri, di, ti) for each i = 3, . . . , u. The meanings of these quantities are as follows:

p    = r1
q    = r2
dP   ≡ e–1 (mod p – 1)
dQ   ≡ e–1 (mod q – 1)
qInv ≡ q–1 (mod p)
di   ≡ e–1 (mod ri – 1)
ti   ≡ (r1 . . . ri–1)–1 (mod ri)

For the sake of consistency, one should store the CRT coefficient r1–1 (mod r2), that is, p–1 (mod q). In order to ensure compatibility with older versions of PKCS, q–1 (mod p) is stored instead.
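A toy computation of the second-format quantities (tiny primes for exposition only; requires Python 3.9+ for multi-argument lcm and modular inverses via pow):

```python
from math import lcm, prod

r = [11, 13, 17]                   # toy primes r1, r2, r3; real primes are far larger
n, e = prod(r), 7
lam = lcm(*(ri - 1 for ri in r))   # lambda(n) = lcm(10, 12, 16) = 240

p, q = r[0], r[1]
dP   = pow(e, -1, p - 1)           # e^(-1) (mod p - 1)
dQ   = pow(e, -1, q - 1)           # e^(-1) (mod q - 1)
qInv = pow(q, -1, p)               # q^(-1) (mod p)
d3   = pow(e, -1, r[2] - 1)        # e^(-1) (mod r3 - 1)
t3   = pow(p * q, -1, r[2])        # (r1 r2)^(-1) (mod r3)
print(dP, dQ, qInv, d3, t3)        # -> 3 7 6 7 5
```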

RSA key operations

The RSA public-key operation is used to encrypt a message or to verify a signature. The PKCS draft calls these primitives RSAEP (encryption primitive) and RSAVP1 (verification primitive). Both are implemented in a straightforward manner as in Algorithm 6.1.

Algorithm 6.1. RSA encryption/signature verification primitive

Input: RSA public key (n, e) and message/signature representative x.

Output: The ciphertext/message representative y.

Steps:

if (x < 0) or (x ≥ n) { Return “Error: representative out of range”. }

y := xe (mod n).

The RSA decryption or signature-generation primitive is called RSADP or RSASP1 and is given in Algorithm 6.2. The operation depends on the format in which the private key K is stored. The correctness of the primitive is left to the reader as an easy exercise.

Algorithm 6.2. RSA decryption/signature generation primitive

Input: RSA private key K and the ciphertext/message representative y.

Output: The message/signature representative x.

Steps:

if (y < 0) or (y ≥ n) { Return “Error: representative out of range”. }
if (K is stored in the first format) {
   x := yd (mod n).
} else {  /* K is stored in the second format */
   x1 := ydP (mod p).
   x2 := ydQ (mod q).
   h := (x1 – x2)qInv (mod p).
   x := x2 + qh.
   if (u > 2) {
      R := r1.
      for i = 3, . . . , u {
         xi := ydi (mod ri).
         R := R × ri–1.
         h := (xi – x)ti (mod ri).
         x := x + Rh.
      }
   }
}
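Algorithm 6.2 translates almost line by line into Python; here the di and ti are recomputed on the fly instead of being stored, and the toy primes are for illustration only (Python 3.8+ for modular inverses via pow):

```python
def rsadp_crt(y: int, e: int, r: list) -> int:
    """RSA private-key operation via CRT (cf. Algorithm 6.2, second key format)."""
    p, q = r[0], r[1]
    x1 = pow(y, pow(e, -1, p - 1), p)        # x1 := y^dP (mod p)
    x2 = pow(y, pow(e, -1, q - 1), q)        # x2 := y^dQ (mod q)
    h = ((x1 - x2) * pow(q, -1, p)) % p      # h := (x1 - x2) qInv (mod p)
    x = x2 + q * h
    R = p
    for i in range(2, len(r)):               # the draft's i = 3, ..., u
        xi = pow(y, pow(e, -1, r[i] - 1), r[i])
        R *= r[i - 1]                        # R = r1 r2 ... r_{i-1}
        h = ((xi - x) * pow(R, -1, r[i])) % r[i]   # ti = R^(-1) (mod ri)
        x += R * h
    return x

# sanity check against plain exponentiation with d = e^(-1) mod lambda(n)
r = [11, 13, 17]
n, d = 11 * 13 * 17, pow(7, -1, 240)         # lambda(n) = lcm(10, 12, 16) = 240
print(rsadp_crt(1234, 7, r) == pow(1234, d, n))   # -> True
```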

RSAES–OAEP encryption scheme

The encryption scheme RSAES–OAEP is based on the optimal asymmetric encryption procedure (OAEP) proposed by Bellare and Rogaway [18, 98]. In this procedure, a string of length slightly less than the size of the modulus n is probabilistically encoded using a hash function and the encoded message is subsequently encrypted. The probabilistic encoding makes the encryption procedure semantically secure and (provably) provides resistance against chosen-ciphertext attacks. Under this scheme, an adversary can produce a valid ciphertext only if she knows the corresponding plaintext. Such an encryption scheme is called plaintext-aware. Given an ideal hash function, Bellare and Rogaway’s OAEP is plaintext-aware.

RSAES–OAEP uses a label L which is hashed by a hash function H. One may take L as the empty string. Other possibilities are not specified in the PKCS draft. SHA-1 (or SHA-256 or SHA-384 or SHA-512) is the recommended hash function. The hash values (in hex) of the empty string under these hash functions are given in Table 6.4.

Table 6.4. Hash values of the empty string
Function    Hash of the empty string
SHA-1       da39a3ee 5e6b4b0d 3255bfef 95601890 afd80709
SHA-256     e3b0c442 98fc1c14 9afbf4c8 996fb924 27ae41e4 649b934c a495991b 7852b855
SHA-384     38b060a7 51ac9638 4cd9327e b1b1e36a 21fdb711 14be0743 4c0cc7bf 63f6e1da 274edebf e76f65fb d51ad2f1 4898b95b
SHA-512     cf83e135 7eefb8bd f1542850 d66d8007 d620e405 0b5715dc 83f4a921 d36ce9ce 47d0d13c 5d85f2b0 ff8318d2 877eec2f 63b931bd 47417a81 a538327a f927da3e

The length of the hash output (in octets) is denoted by hLen. For SHA-1, hLen = 20. The RSA modulus n is assumed to be of octet length k. The octet length mLen of the input message M must be ≤ k–2hLen–2. RSAES–OAEP uses a mask-generation function designated as MGF (see Algorithm 6.11 for a recommended realization).

Algorithm 6.3 describes the RSA–OAEP encryption scheme which employs the EME–OAEP encoding scheme described in Algorithm 6.4. The use of a random seed makes the encryption probabilistic. We use the notation ‖ to denote string concatenation and ⊕ to denote bit-wise XOR.

Algorithm 6.3. RSA–OAEP encryption scheme

Input: The recipient’s public key (n, e), the message M (an octet string of length mLen) and an optional label L whose default value is the empty string.

Output: The ciphertext C of octet length k.

Steps:

/* Check lengths */

if (L is longer than what H can handle) { Return “Error: label too long”. }

/* For example, for SHA-1 the input must be of length ≤ 2^61 – 1 octets. */

if (mLen > k – 2hLen – 2) { Return “Error: message too long”. }

/* Encode M to EM (EME–OAEP encoding scheme) */

EM := EME-OAEP-encode(M, L).   /* Algorithm 6.4 */
/* RSA encryption */
m := OS2I(EM).                 /* Convert octet string to integer */
c := RSAEP((n, e), m).         /* RSA encryption primitive */
C := I2OS(c, k).               /* Convert integer back to octet string */

The matching decryption operation is shown in Algorithm 6.5 which calls the EME–OAEP decoding procedure of Algorithm 6.6. The only error message that the decryption and decoding algorithms issue is decryption error. This is to ensure that an adversary cannot distinguish between different kinds of errors, because such an ability of the adversary may lead her to guess partial information about the decryption process and thereby mount a chosen-ciphertext attack.

Algorithm 6.4. RSA–OAEP encoding scheme

Input: The message M of octet length mLen, the label L.

Output: The EME–OAEP encoded message EM.

Steps:

lHash := H(L).

Generate the padding string PS with k – mLen – 2hLen – 2 zero octets.

Generate the data block DB := lHash ‖ PS ‖ 01 ‖ M.

Let seed := a random string of length hLen octets.

Generate the data-block mask dbMask := MGF(seed, k – hLen – 1).

Generate the masked data-block maskedDB := DB ⊕ dbMask.

Generate mask for seed seedMask := MGF(maskedDB, hLen).

Generate the masked seed maskedSeed := seed ⊕ seedMask.

Generate the encoded message EM := 00 ‖ maskedSeed ‖ maskedDB.

Algorithm 6.5. RSA–OAEP decryption scheme

Input: The recipient’s private key K, the ciphertext C to be decrypted and an optional label L (the default value of which is the null string).

Output: The decrypted message M.

Steps:

if (the length of L is more than the limitation of H) or (the length of C is not k octets)
        or (k < 2hLen + 2) { Return “Decryption error”. }

c := OS2I(C).                  /* Convert octet string to integer */
m := RSADP(K, c).              /* RSA decryption primitive */
EM := I2OS(m, k).              /* Convert integer back to octet string */
M := EME-OAEP-decode(EM, L).   /* Algorithm 6.6 */

Algorithm 6.6. RSA–OAEP decoding scheme

Input: The encoded message EM and the label L.

Output: The EME–OAEP decoded message M.

Steps:

lHash := H(L).
Write EM = Y ‖ maskedSeed ‖ maskedDB, where Y is a single octet,
       maskedSeed is a string of length hLen octets and
       maskedDB is a string of length k – hLen – 1 octets.
seedMask := MGF(maskedDB, hLen).
seed := maskedSeed ⊕ seedMask.
dbMask := MGF(seed, k – hLen – 1).
DB := maskedDB ⊕ dbMask.
Try to decompose DB = lHash′ ‖ PS ‖ 01 ‖ M, where lHash′ is of length hLen
       and PS is a (possibly empty) padding string comprising octets 00 only.
if (DB cannot be decomposed as above) or (lHash′ ≠ lHash) or
       (Y ≠ 00) { Return “Decryption error”. }
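An illustrative Python sketch of the EME–OAEP encode/decode pair (Algorithms 6.4 and 6.6), with SHA-1 and the MGF1 of Algorithm 6.11; as the draft requires, all error causes collapse into one indistinguishable exception:

```python
import hashlib, os

H = lambda b: hashlib.sha1(b).digest()
hLen = 20                                     # octet length of SHA-1 output

def mgf1(seed: bytes, length: int) -> bytes:
    T = b"".join(H(seed + i.to_bytes(4, "big"))
                 for i in range((length + hLen - 1) // hLen))
    return T[:length]

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def oaep_encode(M: bytes, k: int, L: bytes = b"") -> bytes:
    if len(M) > k - 2 * hLen - 2:
        raise ValueError("message too long")
    PS = b"\x00" * (k - len(M) - 2 * hLen - 2)
    DB = H(L) + PS + b"\x01" + M              # lHash || PS || 01 || M
    seed = os.urandom(hLen)
    maskedDB = xor(DB, mgf1(seed, k - hLen - 1))
    maskedSeed = xor(seed, mgf1(maskedDB, hLen))
    return b"\x00" + maskedSeed + maskedDB

def oaep_decode(EM: bytes, k: int, L: bytes = b"") -> bytes:
    Y, maskedSeed, maskedDB = EM[0], EM[1 : 1 + hLen], EM[1 + hLen :]
    seed = xor(maskedSeed, mgf1(maskedDB, hLen))
    DB = xor(maskedDB, mgf1(seed, k - hLen - 1))
    lHash, rest = DB[:hLen], DB[hLen:]
    sep = rest.find(b"\x01")                  # end of the zero padding string PS
    if Y != 0 or lHash != H(L) or sep < 0 or any(rest[:sep]):
        raise ValueError("decryption error")  # one message for all error causes
    return rest[sep + 1 :]

EM = oaep_encode(b"attack at dawn", 128)      # k = 128 octets (1024-bit modulus)
print(oaep_decode(EM, 128))                   # -> b'attack at dawn'
```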

RSASSA–PSS signature scheme with appendix

RSASSA–PSS employs the probabilistic signature scheme proposed by Bellare and Rogaway [19]. Under suitable assumptions about the hash function and the mask-generation function, the RSASSA–PSS scheme produces secure signatures which are also tight in the sense that forging RSASSA–PSS signatures is computationally equivalent to inverting RSA.

Algorithm 6.7. RSASSA–PSS signature generation

Input: The message M (an octet string) to be signed, the private key K of the signer.

Output: The signature S (an octet string of length k).

Steps:

EM := EMSA–PSS–encode(M, modBits – 1).   /* Encode by Algorithm 6.8 */
m := OS2I(EM).                           /* Convert octet string to integer */
s := RSASP1(K, m).                       /* RSA signature generation primitive */
S := I2OS(s, k).                         /* Convert integer back to octet string */

Algorithm 6.8. RSASSA–PSS encoding

Input: The message M to be encoded (an octet string), the maximum bit length emBits of OS2I(EM). One should have emBits ≥ 8hLen + 8sLen + 9.

Output: The encoded message EM, an octet string of length emLen := ⌈emBits/8⌉.

Steps:

if (M is longer than what H can handle) { Return “Error: message too long”. }

Generate the hashed message mHash := H(M).

if (emLen < hLen + sLen + 2) { Return “Encoding error”. }

Let salt := a random string of length sLen octets.

Generate the salted message M′ := 00 00 00 00 00 00 00 00 ‖ mHash ‖ salt.

Generate the hashed salted message mHash′ := H(M′).

Generate the padding string PS with emLen – sLen – hLen – 2 zero octets.

Generate the data block DB := PS ‖ 01 ‖ salt.

Generate the data block mask dbMask := MGF(mHash′, emLen – hLen – 1).

Generate the masked data block maskedDB := DB ⊕ dbMask.

Set to 0 the leftmost 8emLen – emBits bits of the leftmost octet of maskedDB.

Compute EM := maskedDBmHash′ ‖ bc.

RSASSA–PSS signature generation (Algorithm 6.7) uses the EMSA–PSS encoding method (Algorithm 6.8). Verification (Algorithm 6.9) uses the EMSA–PSS decoding method (Algorithm 6.10). We assume that k is the octet length of the RSA modulus n. Let modBits denote the bit length of n. The encoded message is of length emLen = ⌈(modBits – 1)/8⌉ octets. The probabilistic behaviour of the encoding scheme is incorporated by the use of a random salt, the octet length of which is sLen. A hash function H that produces hash values of octet length hLen is employed.

Algorithm 6.9. RSASSA–PSS signature verification

Input: The message M, the signature S to be verified and the signer’s public key (n, e).

Output: Verification status of the signature.

Steps:

if (the length of S is not k octets) { Return “Signature not verified”. }

s := OS2I(S).                                  /* Convert octet string to integer */
m := RSAVP1((n, e), s).                        /* RSA signature verification primitive */
EM := I2OS(m, emLen).                          /* Convert integer back to octet string */
status := EMSA–PSS–decode(M, EM, modBits – 1). /* Algorithm 6.10 */

if (status is “consistent”) { Return “Signature verified”. }

else { Return “Signature not verified”. }

Algorithm 6.10. RSASSA–PSS decoding

Input: The message M (an octet string), the encoded message EM (an octet string of length emLen = ⌈emBits/8⌉) and the maximum bit length emBits of OS2I(EM). One should have emBits ≥ 8hLen + 8sLen + 9.

Output: Decoding status: “consistent” or “inconsistent”.

Steps:

if (M is longer than what H can handle) { Return “inconsistent”. }
Generate the hashed message mHash := H(M).
if (emLen < hLen + sLen + 2) { Return “inconsistent”. }
Try to decompose EM = maskedDB ‖ mHash′ ‖ Y, where
       maskedDB is an octet string of length emLen – hLen – 1,
       mHash′ is an octet string of length hLen, and Y is a single octet.
if (Y ≠ bc) or (the leftmost 8emLen – emBits bits of the leftmost octet of
       maskedDB are not all 0) { Return “inconsistent”. }
dbMask := MGF(mHash′, emLen – hLen – 1).
DB := maskedDB ⊕ dbMask.
Set to 0 the leftmost 8emLen – emBits bits of the leftmost octet of DB.
Try to decompose DB = PS ‖ 01 ‖ salt, where PS is a string with
       emLen – sLen – hLen – 2 zero octets, and salt is of length sLen octets.
if (the above decomposition is unsuccessful) { Return “inconsistent”. }
Set M′ := 00 00 00 00 00 00 00 00 ‖ mHash ‖ salt.
if (H(M′) = mHash′) { Return “consistent”. } else { Return “inconsistent”. }

A mask-generation function

A mask-generation function (MGF1) is specified in the PKCS #1 draft. It is based on a hash function H. The mask-generation function is deterministic in the sense that its output is completely determined by its input. However, the (provable) security of the OAEP and PSS schemes is based on the pseudorandom nature of the output of the mask-generation function. This means that any part of the output should be statistically independent of the other parts. MGF1 derives this pseudorandomness from that of the underlying hash function H.

Algorithm 6.11. Mask-generation function MGF1

Input: The seed mgfSeed (an octet string) and the desired octet length maskLen of the output mask. One requires maskLen ≤ 2^32 hLen, where hLen is the octet length of the hash function output.

Output: An octet string mask of length maskLen.

Steps:

if (maskLen > 2^32 hLen) { Return “Error: mask too long”. }
Initialize T to the empty octet string.
for i = 0, 1, . . . , ⌈maskLen/hLen⌉ – 1 {
    I := I2OS(i, 4).
    T := T ‖ H(mgfSeed ‖ I).
}
mask := the leftmost maskLen octets of T.
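Algorithm 6.11 in Python (the helper name mgf1 is ours); the output is deterministic, and shorter masks are prefixes of longer ones generated from the same seed:

```python
import hashlib

def mgf1(mgf_seed: bytes, mask_len: int, H=hashlib.sha1) -> bytes:
    """MGF1: hash the seed with a 4-octet counter and concatenate the digests."""
    h_len = H().digest_size
    T = b""
    for i in range((mask_len + h_len - 1) // h_len):      # ceil(maskLen/hLen) blocks
        T += H(mgf_seed + i.to_bytes(4, "big")).digest()  # counter I = I2OS(i, 4)
    return T[:mask_len]                                   # leftmost maskLen octets

print(mgf1(b"seed", 10) == mgf1(b"seed", 45)[:10])        # -> True
```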

The RSA encryption scheme of PKCS #1, Version 1.5

The older encryption scheme RSAES–PKCS1–v1_5 is no longer recommended, since this scheme is not plaintext-aware, that is, with high probability, an adversary can generate ciphertexts without knowing the corresponding plaintexts. This allows the adversary to mount chosen-ciphertext attacks. The new drafts of PKCS #1 include this old scheme for backward compatibility. Encryption and decryption for RSAES–PKCS1–v1_5 are given in Algorithms 6.12 and 6.13. Here, k is the octet length of the modulus.

Algorithm 6.12. RSA–PKCS1 encryption scheme

Input: The recipient’s public key (n, e) and the message M (an octet string of length mLen).

Output: The ciphertext C which is an octet string of length k.

Steps:

if (mLen > k – 11) { Return “Error: message too long”. }
Generate a padding string PS of length k – mLen – 3 ≥ 8 octets consisting of
       random non-zero octets.
Generate the encoded message EM := 00 ‖ 02 ‖ PS ‖ 00 ‖ M.

m := OS2I(EM).                 /* Convert octet string to integer */
c := RSAEP((n, e), m).         /* RSA encryption primitive */
C := I2OS(c, k).               /* Convert integer back to octet string */

Algorithm 6.13. RSA–PKCS1 decryption scheme

Input: The recipient’s private key K and the ciphertext C (an octet string).

Output: The plaintext message M (an octet string of length ≤ k – 11).

Steps:

if (the length of the ciphertext is not k octets) { Return “decryption error”. }

c := OS2I(C). /* Convert octet string to integer */
m := RSADP(K, c). /* RSA decryption primitive */
EM := I2OS(m, k). /* Convert integer back to octet string */

Try to decompose EM = 00 ‖ 02 ‖ PS ‖ 00 ‖ M, where PS is an octet string of length ≥ 8 and containing only non-zero octets.

if (the above decomposition is unsuccessful) { Return “decryption error”. } else { Return M. }
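The padding and parsing halves of Algorithms 6.12 and 6.13 (everything except the RSAEP/RSADP exponentiations) can be sketched as follows; the function names and the choice k = 128 (a 1024-bit modulus) are our illustrative assumptions:

```python
import os

K = 128  # octet length k of the modulus (illustrative: a 1024-bit modulus)

def pkcs1_v1_5_pad(m: bytes, k: int = K) -> bytes:
    """EM = 00 || 02 || PS || 00 || M, with PS at least 8 random non-zero octets."""
    if len(m) > k - 11:
        raise ValueError("message too long")
    ps = b""
    while len(ps) < k - len(m) - 3:
        b = os.urandom(1)
        if b != b"\x00":                 # PS must contain no zero octets
            ps += b
    return b"\x00\x02" + ps + b"\x00" + m

def pkcs1_v1_5_unpad(em: bytes, k: int = K) -> bytes:
    """Try to decompose EM = 00 || 02 || PS || 00 || M with |PS| >= 8."""
    if len(em) != k or em[:2] != b"\x00\x02":
        raise ValueError("decryption error")
    sep = em.find(b"\x00", 2)            # position of the 00 separator
    if sep < 10:                         # no separator, or PS shorter than 8 octets
        raise ValueError("decryption error")
    return em[sep + 1:]
```

A real decoder must also take care to make all failure branches indistinguishable to the caller; distinguishable errors like those in this sketch are exactly the kind of oracle a chosen-ciphertext attacker exploits.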

The RSA signature scheme of PKCS #1, Version 1.5

The older RSA signature scheme RSASSA–PKCS1–v1_5 is not known to have security loopholes. (Nevertheless, the provably secure PSS scheme is recommended for future applications.) RSASSA–PKCS1–v1_5 uses the EMSA–PKCS1–v1_5 message encoding procedure (Algorithm 6.16). The signature generation and verification procedures are given in Algorithms 6.14 and 6.15. Here, k denotes the octet length of the modulus n.

The EMSA–PKCS1–v1_5 message encoding procedure (Algorithm 6.16) uses a hash function H. Although a member of the SHA family is recommended for future applications, MD2 and MD5 are also supported for compliance with older applications. An octet string hashAlgo is used whose value depends on the underlying hash algorithm and is given in Table 6.5.

Table 6.5. The string hashAlgo used by EMSA–PKCS1–v1_5
Function    The string hashAlgo
MD2         30 20 30 0c 06 08 2a 86 48 86 f7 0d 02 02 05 00 04 10
MD5         30 20 30 0c 06 08 2a 86 48 86 f7 0d 02 05 05 00 04 10
SHA-1       30 21 30 09 06 05 2b 0e 03 02 1a 05 00 04 14
SHA-256     30 31 30 0d 06 09 60 86 48 01 65 03 04 02 01 05 00 04 20
SHA-384     30 41 30 0d 06 09 60 86 48 01 65 03 04 02 02 05 00 04 30
SHA-512     30 51 30 0d 06 09 60 86 48 01 65 03 04 02 03 05 00 04 40

Algorithm 6.14. RSA–PKCS1 signature generation

Input: The signer’s private key K and the message M to be signed (an octet string).

Output: The signature S (an octet string of length k).

Steps:

Encode M to EM := EMSA–PKCS1–v1_5(M, k). /* Algorithm 6.16 */
m := OS2I(EM). /* Convert octet string to integer */
s := RSASP1(K, m). /* RSA signature generation primitive */
S := I2OS(s, k). /* Convert integer back to octet string */

Algorithm 6.15. RSA–PKCS1 signature verification

Input: The signer’s public key (n, e), the message M (an octet string) and the signature S to be verified (an octet string of length k).

Output: Verification status of the signature.

Steps:

if (the length of S is not k octets) { Return “Signature not verified”. }

s := OS2I(S). /* Convert octet string to integer */
m := RSAVP1((n, e), s). /* RSA signature verification primitive */
EM′ := I2OS(m, k). /* Convert integer back to octet string */
Encode M to EM := EMSA–PKCS1–v1_5(M, k). /* Algorithm 6.16 */

if (EM′ = EM) { Return “Signature verified”. }

else { Return “Signature not verified”. }

Algorithm 6.16. EMSA–PKCS1 encoding

Input: The message M (an octet string) and the intended length emLen of the encoded message. One requires emLen ≥ tLen + 11, where tLen is the octet length of hashAlgo plus the octet length of the hash output.

Output: The encoded message EM (an octet string of length emLen).

Steps:

if (M is longer than what H can handle) { Return “Error: message too long”. }
Compute the hash value mHash := H(M).
Let T := hashAlgo ‖ mHash.
/* Let tLen be the octet length of T */
if (emLen < tLen + 11) { Return “Error: encoded message length too short”. }
Generate a padding string PS of length emLen – tLen – 3 ≥ 8 octets each
      having the hexadecimal value ff.
Set EM := 00 ‖ 01 ‖ PS ‖ 00 ‖ T.
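Algorithm 6.16 is equally short in code. In the sketch below (ours), the dictionary carries the hashAlgo octet strings of Table 6.5 for two of the supported hash functions:

```python
import hashlib

# The string hashAlgo of Table 6.5, for two supported hash functions
HASH_ALGO = {
    "sha1":   bytes.fromhex("3021300906052b0e03021a05000414"),
    "sha256": bytes.fromhex("3031300d060960864801650304020105000420"),
}

def emsa_pkcs1_v1_5(m: bytes, em_len: int, hash_name: str = "sha256") -> bytes:
    """EM = 00 || 01 || PS || 00 || T, where T = hashAlgo || H(M) and PS is all ff."""
    t = HASH_ALGO[hash_name] + hashlib.new(hash_name, m).digest()
    if em_len < len(t) + 11:
        raise ValueError("encoded message length too short")
    ps = b"\xff" * (em_len - len(t) - 3)     # at least 8 octets of ff
    return b"\x00\x01" + ps + b"\x00" + t
```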

6.3.2. PKCS #3

PKCS #3 describes the Diffie–Hellman key-exchange algorithm. The draft assumes the existence of a central authority which generates the domain parameters that include a prime p of octet length k, an integer g satisfying 0 < g < p and optionally a positive integer l. The integer g need not be a generator of Zp* (the multiplicative group of integers modulo p), but is expected to be of sufficiently large multiplicative order modulo p. The integer l denotes the bit length of the private Diffie–Hellman key of an entity. Values of l ≪ 8k can be chosen for efficiency. However, for maintaining a desired level of security l should not be too small. Since the central authority determines p, g (and l), individual users need not bother about the generation of these parameters.

During a Diffie–Hellman key-exchange interaction of Alice with Bob, Alice performs the steps described in Algorithm 6.17. Bob performs an identical operation which is omitted here.

Algorithm 6.17. PKCS3 Diffie–Hellman key-exchange scheme

Input: p, g and optionally l.

Output: The shared secret SK (an octet string of length k).

Steps:

Alice generates a random private value x with 0 < x < p.

/* If l is specified, one should have 2^(l–1) ≤ x < 2^l. */

Alice computes y := g^x (mod p).

Alice converts y to an octet string PV := I2OS(y, k).

Alice sends the public value PV to Bob.

Alice receives Bob’s public value PV′.

Alice converts PV′ to the integer y′ := OS2I(PV′).

Alice computes z := (y′)^x (mod p) (with 0 < z < p).

Alice transforms z to the shared secret SK := I2OS(z, k).
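Algorithm 6.17 can be acted out end to end. The parameters below are toy illustrative choices of ours (a real deployment uses a prime of cryptographic size supplied by the central authority):

```python
import secrets

p = 2**127 - 1                            # a Mersenne prime; octet length k = 16
g = 3
k = (p.bit_length() + 7) // 8

def dh_keypair():
    x = secrets.randbelow(p - 2) + 1      # private value x, 0 < x < p
    pv = pow(g, x, p).to_bytes(k, "big")  # public value PV = I2OS(g^x mod p, k)
    return x, pv

def dh_shared(x: int, peer_pv: bytes) -> bytes:
    y = int.from_bytes(peer_pv, "big")    # y' := OS2I(PV')
    z = pow(y, x, p)                      # z := (y')^x (mod p)
    return z.to_bytes(k, "big")           # SK := I2OS(z, k)

# Alice and Bob exchange PV and PV' and arrive at the same shared secret
xa, pva = dh_keypair()
xb, pvb = dh_keypair()
assert dh_shared(xa, pvb) == dh_shared(xb, pva)
```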

Chapter Summary

In this chapter, we describe some standards for representation of cryptographic data in various formats and for conversion of data among different formats. We also present some standard encoding and decoding schemes that are applied before encryption and after decryption. These standards promote easy and unambiguous interfaces with the cryptographic primitives described in the previous chapter.

The IEEE P1363 range of standards defines several data types: bit strings, octet strings, integers, prime finite fields, finite fields of characteristic 2, extension fields of odd characteristic, elliptic curves, elliptic curve points and polynomial rings. The IEEE drafts also prescribe standard ways of converting data among these formats. For example, the primitive BS2OS converts a bit string to an octet string, and the primitive FE2I converts a finite-field element to an integer.

We subsequently mention some of the public-key cryptography standards (PKCS) propounded by RSA Laboratories. Draft PKCS #1 deals with RSA encryption and signature. In addition to the standard RSA moduli of the form pq, it also suggests the possibility of using multi-prime RSA, that is, moduli which are products of more than two (distinct) primes. The draft recommends use of optimal asymmetric encryption padding (OAEP). This probabilistic encryption scheme provides provable security against chosen-ciphertext attacks. A probabilistic signature scheme is also advocated for use. These probabilistic schemes call for using a mask-generation function (MGF). A concrete realization of an MGF is also provided. Draft PKCS #3 standardizes the Diffie–Hellman key-exchange algorithm.

Suggestions for Further Reading

The P1363 class of preliminary drafts [134] published by IEEE and the PKCS standards [254] from RSA Security Inc. are available for free download from Internet sites. However, IEEE’s published standard 1363-2000 must be purchased against a fee. In addition to the data types and data conversion primitives described in this chapter, the IEEE drafts (P1363, P1363a, P1363.1 and P1363.2) provide encryption/decryption and signature generation/verification primitives and also several encryption and signature schemes based on these primitives. These schemes are very similar to the algorithms that we described in Chapter 5, so we avoid repeating the same descriptions here. Elaborate encoding procedures are described in the PKCS drafts, but only for RSA- and Diffie–Hellman-based systems. We have reproduced the details in this chapter. The remaining PKCS drafts deal with topics that this book does not directly address. An exception is PKCS #13, which deals with elliptic-curve cryptography. This draft is not ready yet; when it is, it may be consulted to learn about RSA Laboratories’ standards on elliptic-curve cryptography.

At present, the different families of standards do not seem to have mutually conflicting specifications. The IEEE has a (free) mailing list for promoting the development and improvement of the IEEE P1363 standards, via e-mail discussions.

Other Internet standards include the Federal Information Processing Standards (FIPS) [221] from NIST, and the RFCs (Requests for Comments) from the Internet Engineering Task Force (IETF) [135].

7. Cryptanalysis in Practice

7.1  Introduction
7.2  Side-Channel Attacks
7.3  Backdoor Attacks
     Chapter Summary
     Suggestions for Further Reading

A man cannot be too careful in the choice of his enemies.

—Oscar Wilde (1854–1900), The Picture of Dorian Gray, 1891

If you reveal your secrets to the wind you should not blame the wind for revealing them to the trees.

—Kahlil Gibran (1883–1931)

There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult.

—Charles Antony Richard Hoare

7.1. Introduction

The security of public-key cryptographic protocols is based on the apparent intractability of solving some computational problems. If one can factor large integers efficiently, one breaks RSA. In that sense, seeking good algorithms to solve these problems (like factoring integers) is part of cryptanalysis. Proving that no poly-time algorithm can break RSA would enhance the status of the security of the protocol from assumed to provable. On the other hand, developing a poly-time algorithm for breaking RSA (or for factoring integers) would make RSA (and many other protocols) unusable. Though a temporary setback to our existing cryptographic tools, such a discovery would enrich our understanding of the underlying computational problems. In short, breaking the trapdoors of public-key cryptosystems is of both theoretical and practical significance.

But research along these mathematical lines is open-ended. A desperate cryptanalyst may not wait indefinitely for a theoretical resolution. Instead, she tries to find loopholes in the systems that she can effectively exploit to gain secret information.

A cryptographic protocol must be implemented (in software or hardware) before it can be used. Careless implementations often supply the loopholes that cryptanalysts wait for. For example, a software implementation of a public-key system may allow the private key to be read only from a secure device (a removable medium, like CDROM), but may make copies of the key in the memory of the machine where the decryption routine is executed. If the decryption routine does not lock and eventually flush the memory holding the key, a second user having access to the machine can simply read off the secrets.

Software and hardware implementations often tend to leak out secrets at a level much more subtle than the example just mentioned. A public-key algorithm is a known algorithm and involves a sequence of well-defined steps dictated by the private key. Each step requires its private share of execution time and power consumption. Watching the decrypting device carefully during a private-key operation may reveal information about the exact sequence of basic steps in the algorithm. Random hardware faults during a private-key operation may also compromise security. Such attacks are commonly dubbed as side-channel attacks.

Let us now look at another line of attack. Not every user of cryptography is expected to implement all the routines she uses. On the contrary, most users run precompiled programs available from third parties. How can a user assess the soundness of the products she is using? That is, who will guarantee that there are no (intentional or unintentional) security snags in the products? The key-generation software available from a malicious software designer may initiate a clandestine e-mail every time a key pair is generated. It is also possible that a private key supplied by such a program is generated from a small predefined set known to the designer. Even when private keys look random, they need not come with the unpredictability necessary for cryptographic usage. Such attacks during key generation are called backdoor attacks.

In short, public-key cryptanalysis at present encompasses trapdoors, backdoors and side channels. The trapdoor methods have already been discussed in Chapter 4. In this chapter, we concentrate on the other attacks on public-key systems.

7.2. Side-Channel Attacks

Side-channel attacks refer to a class of cryptanalytic tools for determining a private key by measuring signals (like timing, power fluctuation, electromagnetic radiation) from or by inducing faults in the device performing operations involving the private key. In this section, we describe three methods of side-channel cryptanalysis: timing attack, power attack and fault attack.

7.2.1. Timing Attack

Paul C. Kocher introduced the concept of side-channel cryptanalysis in his seminal paper [155] on timing attacks. Though not unreasonable, timing attacks are somewhat difficult to mount in practice.

Details of the attack

The private-key operation in many cryptographic systems (like RSA or discrete-log-based systems) is usually a modular exponentiation of the form

y := x^d (mod n),

where d is the private key. The private-key procedure may involve other overheads (like message decoding), but the running time of the routine is usually dominated by, and so can be approximated by, the time of the modular exponentiation.

Assume that this exponentiation is carried out by a square-and-multiply algorithm known to Carol, the attacker. For example, suppose that Algorithm 3.9 is used. Each iteration of the for loop involves a modular squaring followed conditionally by a modular multiplication. The multiplication is done in an iteration if and only if the corresponding bit ei in the exponent is 1. Thus, an iteration runs slower if ei = 1 than if ei = 0. If Carol could measure the timing of each individual iteration of the for loop, she would correctly guess most (if not all) of the bits in the exponent. But it is unreasonable to assume that an attacker can collect such detailed timing data. Moreover, if Algorithm 3.10 is used, these detailed data do not help much, because in this case the timing of an individual iteration of the for loop can at best differentiate between the two cases ei = 0 and ei ≠ 0. There are 2^t – 1 non-zero values for each ei.

However, it is not difficult to think of a situation where the attacker can measure, to a reasonable accuracy, the total time of the exponentiation. In order to guess d, Carol requires the times of the modular exponentiations for several different values of x, say x1, . . . , xk, all known to her. (Note that the xi may be messages to be signed or intercepted ciphertexts.) The same exponent d is used for all these exponentiations. Let Ti be the time for computing xi^d (mod n), as measured by Carol. We may assume that all these k exponentiations are carried out on the same machine using the same routine.

Kocher considers the attack on the exponentiation routine of RSAREF, a cryptography toolkit available from the RSA Laboratories. This routine implements Algorithm 3.10 with t = 2. For the sake of convenience, the algorithm is reproduced below. We may assume that the exponent has an even number of bits—if not, pad a leading zero.

Algorithm 7.1. RSAREF’s exponentiation routine

Input: x ∈ Zn, the modulus n, and d = (d2l–1d2l–2 · · · d1d0)2.

Output: y := x^d (mod n).

Steps:

 (1)  z1 := x.
 (2)  z2 := z1 · x (mod n).
 (3)  z3 := z2 · x (mod n).
 (4)  y := 1.
 (5)  for j = l – 1, . . . , 0 {
 (6)     y := y^2 (mod n).
 (7)     y := y^2 (mod n).
 (8)     if ((d2j+1d2j)2 ≠ 0) {
 (9)         y := y · z(d2j+1d2j)2 (mod n).
(10)     }
(11)  }
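Functionally, Algorithm 7.1 is a left-to-right exponentiation that scans the exponent two bits at a time. The Python rendering below (ours; it ignores timing, which is the whole point of the attack) shows the structure:

```python
def rsaref_exp(x: int, d: int, n: int) -> int:
    """Algorithm 7.1: square twice per iteration, multiply by z[w] when the bit pair w != 0."""
    z = [1, x % n, (x * x) % n, (x * x * x) % n]   # z[w] = x^w mod n, w = 0..3
    l = (d.bit_length() + 1) // 2                  # number of bit pairs (pad a leading zero)
    y = 1
    for j in range(l - 1, -1, -1):
        y = (y * y) % n                            # Step (6)
        y = (y * y) % n                            # Step (7)
        w = (d >> (2 * j)) & 3                     # the bit pair (d_2j+1 d_2j)_2
        if w != 0:                                 # Step (8): the data-dependent branch
            y = (y * z[w]) % n                     # Step (9)
    return y

assert rsaref_exp(7, 123, 1000003) == pow(7, 123, 1000003)
```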

Every step of the above algorithm runs in a time dependent on the operands. For example, the modular multiplication in Step (9) takes time dependent on the operands y and z(d2j+1d2j)2. The variation in the timing depends on the implementation of the modular arithmetic routines and also on the machine’s architecture. However, we make the assumption that for fixed operands each step requires a constant time on a given machine (or on identical machines). This is actually a loss of generality, since the running time of a complex step (like modular multiplication or squaring) for fixed operands may vary for various reasons like process scheduling, availability of cache, page faults and so on. It may be difficult, perhaps impossible, for an attacker to arrange for herself a verbatim emulation of the victim’s machine at the time when the latter performed the private-key operations. Let us still proceed with our assumption, say by conceiving of a not-so-unreasonable situation where the effects of these other factors are not sizable enough.

We use the subscript i to denote the i-th private-key operation for 1 ≤ ik. The entire routine takes time Ti for the i-th exponentiation, that is, for the input xi. This measurement may involve some (unknown) error which we denote by ei. The first four steps are executed only once during each call and take a total time of pi (precomputation time). The for loop is executed l times. We ignore the time needed to maintain the loop (like decrementing j) and also the time taken by the if statement in Step (8). Let si,j and ti,j be the times taken respectively by Steps (6) and (7), when the loop variable (j) assumes the value j. If Step (9) is executed, we denote by mi,j the time taken by this step, else we set mi,j := 0. It follows that

Equation 7.1

     Ti = ei + pi + Σj=l–1,...,0 (si,j + ti,j + mi,j),
where the index in the sum decreases from l – 1 to 0 in steps of 1. Carol does not know this break-up (that is, the explicit values of ei, si,j, ti,j and mi,j), but she can make an inductive guess in the following way.

Carol manages a machine and a copy of the exponentiation software both identical to those of the victim. She then successively guesses the secret bit pairs d2l–1d2l–2, d2l–3d2l–4, d2l–5d2l–6 and so on. Assume that at some stage Carol has correctly determined the exponent bits d2j+1d2j for j = l–1, l–2, . . . , j′+1. Initially j′ = l–1. Using this information Carol computes d2j′+1d2j′ as follows. Carol’s knowledge at this stage allows her to measure pi and si,j, ti,j, mi,j for j = l – 1, . . . , j′ + 1 — she simply runs Algorithm 7.1 on xi. Carol then enters the loop with j = j′. The squaring operations are unconditional, and Carol has the exact operands as the victim for the squaring steps. So Carol also measures si,j′ and ti,j′.

The bit pair d2j′+1d2j′ (considered as a binary integer) can take any one of the four values g = 0, 1, 2, 3. Carol measures the time mi,j′(g) of Step (9) for each of the four choices of g (taking mi,j′(0) := 0, since Step (9) is skipped for g = 0) and adds this time to the time taken by the algorithm so far, in order to obtain:

Equation 7.2

     T̃i(g) := pi + Σj=l–1,...,j′+1 (si,j + ti,j + mi,j) + si,j′ + ti,j′ + mi,j′(g).

Kocher observed that the distribution of Ti, i = 1, . . . , k, is statistically related to that of T̃i(g) only for the correct guess g. In order to see how, we subtract Equation (7.2) from Equation (7.1) to get:

Equation 7.3

     Ti – T̃i(g) = ei + (mi,j′ – mi,j′(g)) + Σj=j′–1,...,0 (si,j + ti,j + mi,j).

Let us assume that the error term ei is distributed like a random variable E. Similarly suppose that each multiplication (resp. squaring) has the distribution of a random variable M (resp. S). Taking the variance of Equation (7.3) over the values i = 1, 2, . . . , k and assuming that the sample size k is so large that the sample variances are very close to the variances of the respective random variables, we obtain:

Equation 7.4

     Var(Ti – T̃i(g)) = Var(E) + Var(mi,j′ – mi,j′(g)) + 2j′ Var(S) + λ Var(M),

where λ denotes the number of times Step (9) is executed for j = j′ – 1, . . . , 0. Note that λ is dependent on the private key and not on the arguments to the exponentiation routine. For the correct guess g, we have mi,j′ = mi,j′(g) and so

     Var(Ti – T̃i(g)) = Var(E) + 2j′ Var(S) + λ Var(M).

On the other hand, for an incorrect guess g we have:

     Var(Ti – T̃i(g)) = Var(E) + 2j′ Var(S) + (λ + 1) Var(M)

if one of mi,j′ or mi,j′(g) is zero, or

     Var(Ti – T̃i(g)) = Var(E) + 2j′ Var(S) + (λ + 2) Var(M)

if both mi,j′ and mi,j′(g) are non-zero. (Recall that Var(αX + βY) = α^2 Var(X) + β^2 Var(Y) for independent X, Y and any real α, β.)

Calculation of the sample variances of Ti – T̃i(g) for the four choices of g gives Carol a handle to determine (or guess) the correct choice. Carol simply takes the g for which the variance is minimum. This is the fundamental observation that makes the timing attack work.

Of course, statistical irregularities exist in practice, and the approximation of the actual variances by the sample variances introduces errors in Equation (7.4). These errors are of particular concern for large values of j′, that is, during the beginning of the attack. However, if an incorrect guess is made at a certain stage, this is detected soon with high probability, as Carol proceeds further. Suppose that an erroneous guess of d2j″+1d2j″ has been made for some j″ > j′. This means that Carol’s values of y are different from the actual values starting from the iteration of the loop with j = j″ – 1. (We may assume that most, if not all, xi ≠ 1.) We then do not have a cancellation of the timings for j = j″ – 1, . . . , j′. More correctly, if the guesses for j = l – 1, . . . , j″ + 1 are correct and the first error occurs at j = j″, then denoting the timings measured by Carol for the subsequent iterations by ŝi,j, t̂i,j and m̂i,j, one gets

Equation 7.5

     Ti – T̃i(g) = ei + (mi,j″ – m̂i,j″) + Σj=j″–1,...,j′ (si,j – ŝi,j + ti,j – t̂i,j + mi,j – m̂i,j) + Σj=j′–1,...,0 (si,j + ti,j + mi,j).

Since each of the square and multiplication operations takes y as an operand, the original timings and the measured timings (the ones with a hat) behave like independent variables and, therefore, taking the variance of Equation (7.5) yields

     Var(Ti – T̃i(g)) = Var(E) + 2(2j″ – j′) Var(S) + λ′ Var(M)

for some λ′ depending on the private key and on the previous guesses, but independent of the current guess g. In other words, Carol loses a meaningful relation of Var(Ti – T̃i(g)) with the correctness of the current guess. Once Carol notices this, she backtracks and changes older guesses until the expected behaviour is restored. Thus, the timing attack comes with an error detection and correction strategy.

An analysis done by Kocher (neglecting E and assuming normal distributions for S and M) shows that Carol needs k = O(l) for a good probability of success.

Countermeasures

There are several ways in which timing attacks can be prevented.

7.2.2. Power Analysis

In connection with timing attacks, we mentioned that if an adversary were able to measure the timing of each iteration of the square-and-multiply loop during an RSA (or discrete-log-based) private-key exponentiation, she could guess the bits in the key quite efficiently from only a few timing measurements. But it is questionable whether such detailed timing data can be made available.

Now, think of a situation where Carol can measure patterns of power consumption made by the decrypting (or signing) device during one or more private-key operations with Alice’s private key. If Alice carries out the private-key operations on her personal workstation, it is difficult for Carol to conduct such measurements. So assume that Alice is using a smart card in a device to which Carol has access. Carol inserts a small resistor in series with the line which drives Alice’s smart card. The power consumed by the smart-card circuit is roughly proportional to the current through the resistor. By measuring the voltage across the resistor (and multiplying by a suitable factor), Carol can observe the power consumed by Alice’s decryption device. Carol has to use a power-measuring device that takes readings at a high frequency (100 MHz to several GHz, depending on Carol’s budget). A set of power measurements obtained during a cryptographic operation is called a power trace. We now study how power traces can reveal Alice’s secrets.

Simple power analysis (SPA)

The individual steps in a private-key operation may be nakedly exposed in a power trace. This is, in particular, the case when different steps consume different amounts of power and/or take different times. Obtaining information about the operation of the decrypting device and/or the secrets by a direct interpretation of power traces is referred to as simple power analysis or SPA in short.

As an example of SPA, consider an implementation of RSA exponentiation using the naive square-and-multiply Algorithm 3.9. Here, the most power-consuming operations are modular squaring and modular multiplication. Modular multiplication typically runs slower than modular squaring. Also, modular multiplication requires two different operands to be fetched from memory, whereas modular squaring requires only one operand. Thus, a multiplication operation draws more power, and for a longer time, than a squaring operation.

A hypothetical[1] SPA trace during a portion of an RSA private-key operation is shown in Figure 7.1. Each spike in the trace corresponds to either a square or a multiplication operation. Let us assume that the power consumption is measured with sufficient resolution, so that no spike is missed. Since multiplication runs longer (and requires more operands) than squaring, multiplication spikes are wider than squaring spikes.

[1] SPA traces from real-life experiments on smart cards, as reported in several references, look similar to this. We, however, generated the trace using a random number generator. Absolute conformity to reality is not always crucial for the purposes of illustration.

Figure 7.1. Simulated SPA trace for a portion of an RSA private-key operation


Let us denote a squaring operation by S and a multiplication operation by M. We observe that Alice’s smart card performs the sequence

SMSMSSMSSSSMSSSMSS

of operations during the measurement interval shown. Since multiplication in an iteration of the loop is skipped if and only if the corresponding bit in the exponent is zero, we can group the operations as

(SM)(SM)(S)(SM)(S)(S)(S)(SM)(S)(S)(SM)(S)(S . . .

This, in turn, reveals the bit string 110100010010 in Alice’s private key. (The trailing S is left ungrouped, since its matching multiplication, if any, may lie outside the measurement window.)
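The grouping step is mechanical. The sketch below (ours) converts an observed S/M sequence into exponent bits, leaving a trailing lone S undecided since its multiplication, if any, may fall outside the measurement window:

```python
def spa_bits(trace: str) -> str:
    """Group an S/M operation sequence: (SM) -> bit 1, lone (S) -> bit 0."""
    bits, i = [], 0
    while i < len(trace):
        if trace[i] != "S":
            raise ValueError("malformed trace")
        if trace[i:i + 2] == "SM":
            bits.append("1")
            i += 2
        elif i == len(trace) - 1:    # trailing lone S: undetermined bit
            break
        else:
            bits.append("0")
            i += 1
    return "".join(bits)

assert spa_bits("SMSMSSMSSSSMSSSMSS") == "110100010010"
```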

Effective as it appears, SPA, in practice, does not pose a huge threat to the security of conventional cryptographic systems. Using algorithms for which power traces do not bear direct relationships with the bits of the private key largely reduces the risk of fruitful SPA. The inefficient repeated square-and-multiply Algorithm 7.2 always performs a multiplication after each squaring and thereby eliminates chances of a successful SPA.

Algorithm 7.2. SPA-resistant exponentiation

Input: x ∈ Zn, the modulus n, and the private key d = (dl–1 · · · d1d0)2.

Output: y := x^d (mod n).

Steps:

y := 1.
for (j = l – 1, . . . , 0) {
    t0 := y^2 (mod n).
    t1 := t0 · x (mod n).
    y := tdj.
}
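A Python rendering of Algorithm 7.2 follows (a sketch of ours; on real hardware even the final selection y := tdj must itself avoid a key-dependent branch, which the conditional expression below does not by itself guarantee):

```python
def spa_resistant_exp(x: int, d: int, n: int) -> int:
    """Algorithm 7.2: always square AND multiply; the key bit only selects the result."""
    y = 1
    for j in range(d.bit_length() - 1, -1, -1):
        t0 = (y * y) % n                  # squaring, performed in every iteration
        t1 = (t0 * x) % n                 # multiplication, performed in every iteration
        y = t1 if (d >> j) & 1 else t0    # y := t_dj
    return y

assert spa_resistant_exp(5, 77, 9999991) == pow(5, 77, 9999991)
```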

Using the (more efficient) Algorithm 7.1 also frustrates SPA. Some chunks of two successive 0 bits are anyway revealed by power traces collected during the execution of this algorithm. But, for a decently large and random private key, this still leaves Carol with many unknown bits to be guessed. Note, however, that none of the three remedies suggested to thwart the timing attack on Algorithm 7.1 seems to be effective in the context of SPA. Delays normally do not consume much power (unless some power-intensive dummy computations fill up the delays). Also, the masking of (x, y) by (u, v) fails to produce any alteration in the power-consumption pattern during exponentiation.

If some private-key algorithm has unavoidable branchings due to individual bits in the private key, SPA can prove to be a serious nuisance.

Differential power analysis (DPA)

A carefully designed algorithm (like Algorithm 7.2) does not reveal key information from a simple observation of power traces. Moreover, the observed power traces may be corrupted by noise to an extent where SPA is not feasible. In such cases, differential power analysis (DPA) often helps the cryptanalyst reduce the effects of noise and exploit subtle correlation of power consumption patterns with specific bits in the operands. DPA requires availability of power traces from several private-key operations with the same key.

Consider the SPA-resistant Algorithm 7.2. Suppose that k power traces P1(t), . . . , Pk(t) for the computations of xi^d (mod n), i = 1, . . . , k, are available to Carol, that the ciphertexts x1, . . . , xk are known to Carol and that d = (dl–1 · · · d1d0)2. Carol successively guesses the bits dl–1, dl–2, dl–3, . . . of the exponent. Suppose that Carol has correctly guessed dj for j = l – 1, . . . , j′ + 1. She now uses DPA to guess dj′.

Let e := (dl–1dl–2 · · · dj′+1)2. At the beginning of the for loop with j = j′ the variable y holds the value x^e modulo n. The loop computes x^(2e) and x^(2e+1) and assigns y the appropriate value. If dj′ = 0, then in the next iteration the loop computes x^(4e) and x^(4e+1), whereas if dj′ = 1, then in the next iteration the loop computes x^(4e+2) and x^(4e+3). It follows that the algorithm handles the value x^(4e) if and only if dj′ = 0.

For each i = 1, . . . , k, Carol computes zi := xi^(4e) (mod n). Carol then chooses a particular bit position (say, the least significant bit) and considers the bit bi of zi at this position. We make the assumption that there is some subsequent step (or substep) in the implementation for which the average power consumption Π0 for bi = 0 is different from the average power consumption Π1 for bi = 1.[2]

[2] The exact step which exhibits differential bias toward an individual bit value is dependent on the implementation. If the implementation does not provide such a step, the attack cannot be mounted in this way. Initially, DPA was proposed for DES, a symmetric encryption algorithm, in which such a dependence is clearly available. With asymmetric-key encryption, such a strong dependence of the power consumed by a step on an individual bit value is not obvious. One may, however, use other dividing criteria, like low versus high Hamming weight (that is, number of one-bits) in the operand, which bear more direct relationships with power consumption.

Carol partitions {1, . . . , k} into two subsets:

I0 := {i | bi = 0},
I1 := {i | bi = 1}.

Carol computes the average power traces A0(t) (over the traces Pi(t) with i ∈ I0) and A1(t) (over those with i ∈ I1) and subsequently the differential power trace

     Δ(t) := A1(t) – A0(t).

First, let dj′ = 0. In this case, the routine handles zi = xi^(4e) (mod n), and so the power consumption at some time τ is correlated to the bit bi of zi. At any other instant, the power consumption is uncorrelated to this particular bit value. Therefore, if the sample size is sufficiently large and if the measurement noise has mean zero, we have:

     Δ(τ) ≈ Π1 – Π0 ≠ 0,  whereas Δ(t) ≈ 0 for t ≠ τ.

On the other hand, if dj′ = 1, the value xi^(4e) never appears in the execution of the algorithm, and so at every time t the power consumption is uncorrelated to the particular bit of zi. We then expect

     Δ(t) ≈ 0 for all t.

Figure 7.2 illustrates the two cases.[3] If the differential power trace has a distinct spike, the guess dj′ = 0 is correct. So by observing the existence or otherwise of a spike, Carol determines whether dj′ = 0 or dj′ = 1.

[3] Once again, these are hypothetical traces obtained by random number generators.

Figure 7.2. Simulated DPA trace for a portion of an RSA private-key operation

(a) for the correct guess
(b) for an incorrect guess


The number k of samples required for a good probability of success depends on the bias Π1 – Π0 relative to the measurement noise. We assume that |I0| ≈ |I1| ≈ k/2. If the noise has a variance of σ^2, then by the central limit theorem the noise in each average power trace A0(t) or A1(t) has at each t an approximate variance 2σ^2/k, and so in the differential power trace Δ(t) the noise has an approximate variance 4σ^2/k. In order that the bias Π1 – Π0 stands out against the noise, we require |Π1 – Π0| to be a few times the noise standard deviation 2σ/√k, say |Π1 – Π0| ≥ 8σ/√k, that is, k ≥ 64σ^2/(Π1 – Π0)^2.
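The averaging argument can be checked in simulation. The sketch below (every parameter is an illustrative choice of ours) fabricates k noisy traces whose consumption at a single instant τ depends on a target bit, partitions them into I0 and I1, and watches the differential trace peak at τ:

```python
import random

random.seed(1)
k, T, tau, sigma = 4000, 50, 17, 2.0
PI0, PI1 = 1.0, 1.5       # average power at time tau for b = 0 and b = 1
bits = [random.randrange(2) for _ in range(k)]
traces = [[random.gauss(PI1 if (b and t == tau) else (PI0 if t == tau else 0.8), sigma)
           for t in range(T)] for b in bits]

def avg(rows):            # pointwise average of a set of traces
    return [sum(r[t] for r in rows) / len(rows) for t in range(T)]

a0 = avg([tr for tr, b in zip(traces, bits) if b == 0])   # A0(t), over I0
a1 = avg([tr for tr, b in zip(traces, bits) if b == 1])   # A1(t), over I1
delta = [a1[t] - a0[t] for t in range(T)]                 # Delta(t) = A1(t) - A0(t)

# The spike stands out at t = tau because only there the partition matters
assert max(range(T), key=lambda t: delta[t]) == tau
```

With k = 4000, σ = 2 and Π1 – Π0 = 0.5, the sample-size bound k ≥ 64σ^2/(Π1 – Π0)^2 = 1024 is comfortably met, so the spike is clearly visible above the averaged-out noise.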

Countermeasures

Several countermeasures can be adopted to prevent DPA, both in the software level and in the hardware level.

Paul Kocher asserts: “DPA highlights the need for people who design algorithms, protocols, software, and hardware to work closely together when producing security products.”

7.2.3. Fault Analysis

We finally come to the third genre of side-channel cryptanalysis. We investigate how hardware faults occurring during private-key operations can reveal the secret to an adversary. There are situations where a single fault suffices. Boneh et al. [30] classify hardware faults into three broad categories.

  1. Transient faults: These are faults caused by random (unpredictable) hardware malfunctioning. They may be the outcomes of occasional flips of bit values in registers or of temporary erroneous outputs from logic or arithmetic circuits in the processor. These faults are called transient because they are not repeated. It is rather difficult to detect such (silent) faults.

  2. Latent faults: These are faults generated by some permanent malfunctioning and/or bugs inherent in the processor. For example, the floating-point bug in the early releases of the Pentium processor may lead to latent faults. Latent faults are permanent, that is, repeatable, but may be difficult to locate in practice.

  3. Induced faults: An induced fault is deliberately caused by an adversary. For example, a short surge of electromagnetic radiation may cause a smart card to malfunction temporarily. A malicious adversary can induce such temporary hardware faults to extract secret information from the smart card. It is, however, difficult to induce deliberate faults in a remote workstation.

Although induced faults appear to be the ones to guard against most seriously, the other two types of faults are also of relevance. Consider a certifying authority signing many messages. Transient and/or unknown latent faults may reveal the authority’s private key to a user who can later utilize this knowledge to produce false certificates.

Fault attack on RSA based on CRT

Consider the implementation of the RSA private-key operation based on the CRT combination of the values obtained by exponentiation modulo the prime divisors p and q of the modulus n (Algorithm 5.4). Suppose that m is a message to be signed and s := m^d (mod n) the corresponding signature, where d is the signer's private key. The CRT-based implementation computes s1 := s (mod p) and s2 := s (mod q). Assume that due to hardware fault(s) exactly one of s1 and s2 is wrongly computed. Say, s1 is incorrectly computed as ŝ1 ≠ s1. The corresponding faulty signature is denoted by ŝ. We assume that the CRT combination of ŝ1 and s2 is correctly computed.

An adversary requires the faulty signature ŝ and the correct signature s on the same message m in order to obtain the factor q of n. To see how, note that ŝ ≡ ŝ1 (mod p), s ≡ s1 (mod p) and ŝ1 ≢ s1 (mod p), so that ŝ ≢ s (mod p), that is, p does not divide ŝ – s. On the other hand, ŝ ≡ s ≡ s2 (mod q), that is, q divides ŝ – s. Therefore,

q = gcd(ŝ – s, n).

This is how the fault analysis of Boneh et al. [30] works.

Arjen K. Lenstra et al. [142] point out that the knowledge of the faulty signature ŝ alone reveals the secret divisor q, that is, one does not require the genuine signature s on m. The verification key e of the signer is publicly known. Since RSA exponentiation is bijective, ŝ^e ≢ m (mod n). However, ŝ^e ≡ m (mod q), and so ŝ^e ≢ m (mod p). It follows that

q = gcd(ŝ^e – m, n).
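Both variants are easy to verify with toy parameters. The sketch below (the primes and the message are illustrative assumptions, not from the text) flips one bit of s1 and then factors n, first from the pair (ŝ, s) as in Boneh et al., then from ŝ alone as in Lenstra's observation.

```python
from math import gcd

p, q = 10007, 10009                    # toy primes (assumed)
n, e = p * q, 17
d = pow(e, -1, (p - 1) * (q - 1))
m = 1234

def crt(a, b):
    # combine a mod p and b mod q into a residue mod n
    return (a * q * pow(q, -1, p) + b * p * pow(p, -1, q)) % n

s1, s2 = pow(m, d, p), pow(m, d, q)
s = crt(s1, s2)
assert s == pow(m, d, n)               # correct signature

s_faulty = crt(s1 ^ 1, s2)             # fault: one bit of s1 flipped

assert gcd(s_faulty - s, n) == q              # Boneh et al.: needs s and s_faulty
assert gcd(pow(s_faulty, e, n) - m, n) == q   # Lenstra: s_faulty alone suffices
```

Note that the second gcd succeeds because the e-th power map is injective modulo p, exactly as argued above.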

Fault attack on RSA without CRT

Now, consider an implementation of RSA decryption based on a single exponentiation modulo n. For such an implementation, several models of fault attacks have been proposed. These attacks are less practical than the attack on CRT-based RSA just mentioned, because now one requires several faulty signatures in order to deduce the entire private key. Here, we present an attack due to Bao et al. [17].

As usual, the RSA modulus is n = pq and the signer’s key pair is (e, d). Consider a valid signature s on a message m. Let d = (dl–1 · · · d1d0)2 be the binary representation of the private key. Consider the powers:

si ≡ m^(2^i) (mod n) for i = 0, 1, . . . , l – 1.

The signature s can be written as:

s ≡ s0^d0 s1^d1 · · · s(l–1)^d(l–1) (mod n).
We assume that the attacker knows m and s and hence can compute si and si^(–1) modulo n for i = 0, . . . , l – 1. There is no harm in assuming that the message m is randomly chosen. (We may assume that randomly chosen integers are invertible modulo n, because encountering a non-invertible non-zero integer by chance is a stroke of unimaginable good luck and is tantamount to knowing the factors of n.)

In order to guess a bit of d, the attacker induces a fault in exactly one of the bits dj, changing it from dj to its complement. The position j is random, that is, not under the control of the attacker. Now, the algorithm outputs the faulty signature

ŝ ≡ s sj (mod n) if dj = 0, or ŝ ≡ s sj^(–1) (mod n) if dj = 1,

and so

ŝ s^(–1) ≡ sj^(±1) (mod n).
A repetition in the values s(l–1), . . . , s0, s(l–1)^(–1), . . . , s0^(–1) modulo n is again an incident of minuscule probability. Hence the attacker can uniquely identify the bit position j and the bit value dj in d by comparing ŝ s^(–1) with these 2l values.

Statistical analysis implies that the attacker needs to repeat this procedure about l log l times (on the same or different (m, s) pairs) in order to ensure that the probability of identifying all the bits of d is at least 1/2.
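A single iteration of the attack can be simulated as follows; the primes, message, and faulted position are illustrative assumptions. Flipping bit j of d multiplies the signature by sj^(±1), so comparing ŝ s^(–1) against the 2l precomputed values exposes both j and dj.

```python
p, q = 10007, 10009                 # toy primes (assumed)
n, e = p * q, 17
phi = (p - 1) * (q - 1)
d = pow(e, -1, phi)
m = 4321
s = pow(m, d, n)                    # valid signature, known to the attacker

l = d.bit_length()
j = 5                               # faulted bit (unknown to the attacker)
s_faulty = pow(m, d ^ (1 << j), n)  # signature under the mutilated key

s_i = [pow(m, 1 << i, n) for i in range(l)]   # s_i = m^(2^i) mod n
ratio = s_faulty * pow(s, -1, n) % n

# d_j = 0 flipped up gives ratio = s_j; d_j = 1 flipped down gives s_j^(-1)
matches = [(i, b) for i in range(l)
           for b, v in ((0, s_i[i]), (1, pow(s_i[i], -1, n)))
           if ratio == v]
assert (j, (d >> j) & 1) in matches
```

A repetition among the 2l comparison values would create a spurious match, but, as the text notes, this happens with only minuscule probability.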

Fault attack on the Rabin digital signature algorithm

Recall from Algorithm 5.34 that the Rabin signature algorithm uses CRT to combine s1 (mod p) and s2 (mod q). Thus, the attack on CRT-based RSA, described earlier, is applicable mutatis mutandis to the Rabin signature scheme. The computation of the square roots s1 and s2 demands the major portion of the running time of the routine. Inducing a fault during the execution is, therefore, expected to affect exactly one of s1 and s2, as desired by the attacker.

Fault attack on DSA

Bao et al. [17] propose a fault attack on the digital signature algorithm (DSA). We work with the notations of Algorithm 5.43 and Algorithm 5.44, except that, for maintaining uniformity in this section, we use m (instead of M) to denote the message to be signed. The (public) parameters are a prime p, a prime divisor r of p – 1 of length 160 bits, and an element g ∈ Z_p^* of multiplicative order r. The signer's DSA key pair is (d, g^d (mod p)) with 1 < d < r.

Suppose that during the generation of a DSA signature, an attacker induces a fault in exactly one bit position of d, changing it to d̃. The routine generates the faulty signature (s, t̃), where

s ≡ (g^d′ (mod p)) (mod r),
t̃ ≡ d′^(–1)(H(m) + d̃ s) (mod r),

(d′, g^d′) being the session key pair (not mutilated). As in the DSA signature-verification scheme, the attacker computes the following:

u ≡ t̃^(–1) H(m) (mod r),
v ≡ t̃^(–1) s (mod r).

For each i = 0, . . . , l – 1 (where the bit length of d is l), the attacker also computes

wi^+ ≡ (g^u (y g^(2^i))^v (mod p)) (mod r) and wi^– ≡ (g^u (y g^(–2^i))^v (mod p)) (mod r),

where y ≡ g^d (mod p) is the signer's public key.

Assume that the j-th bit dj of d is altered. If dj = 0, then d̃ = d + 2^j, and so

wj^+ = s.

On the other hand, if dj = 1, then d̃ = d – 2^j, and a similar calculation shows that

wj^– = s.

Thus, the attacker computes wj^+ and wj^– for all j = 0, . . . , l – 1 and notices a unique match (with s). This discloses the position j and the corresponding bit dj.
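The attack can be rehearsed with toy DSA-like parameters. Every value below is an illustrative assumption (a 10-bit private key stands in for a 160-bit one, and a small r replaces the 160-bit prime).

```python
r = 1019                              # small prime in the role of the 160-bit r
k = 1
while True:                           # find a prime p = k*r + 1
    p = k * r + 1
    if all(p % t for t in range(2, int(p**0.5) + 1)):
        break
    k += 1
g = pow(2, (p - 1) // r, p)           # element of multiplicative order r
assert g != 1

d = 723                               # signer's private key (assumed)
y = pow(g, d, p)
l = d.bit_length()
H = 400                               # hash of the message (assumed)
j = 4                                 # faulted bit position
d_f = d ^ (1 << j)                    # mutilated key

k_sess = 577                          # session key (not mutilated)
while True:                           # skip degenerate signatures
    s = pow(g, k_sess, p) % r
    if s != 0 and (H + d_f * s) % r != 0:
        break
    k_sess += 1
t_f = pow(k_sess, -1, r) * (H + d_f * s) % r   # faulty signature part

# attacker's side: only (s, t_f), H, y and the public parameters are known
u = H * pow(t_f, -1, r) % r
v = s * pow(t_f, -1, r) % r
matches = []
for i in range(l):
    for sign, b in ((1, 0), (-1, 1)):          # try d_i = 0 and d_i = 1
        y_g = y * pow(g, sign * (1 << i) % (p - 1), p) % p
        if pow(g, u, p) * pow(y_g, v, p) % p % r == s:
            matches.append((i, b))
assert (j, (d >> j) & 1) in matches            # position j and bit d_j disclosed
```

The match occurs because u + d̃v ≡ d′ (mod r), so verification succeeds exactly when the guessed public key equals g^d̃.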

Fault attack on the ElGamal signature scheme

A fault attack similar to that on the DSA scheme can be mounted on the ElGamal signature scheme. Here, we describe an alternative method proposed by Zheng and Matsumoto [315]. The novelty in their approach is that it cryptanalyzes the ElGamal signature scheme by inducing a fault in the pseudorandom bit generator of the signer's smart card.

Algorithms 5.36 and 5.37 describe the ElGamal signature scheme on a general cyclic group G. Here, we restrict our attention to the specific group Z_p^* (though the following exposition works perfectly well for a general G). The parameters are a prime modulus p and a generator g of Z_p^*. The signer's key pair is (d, g^d (mod p)) for some d, 2 ≤ d ≤ p – 2.

In order to generate a signature (s, t) on a message m, a random session key d′ is generated and subsequently the following computations are carried out:

sgd (mod p),
td–1(H(m) – dH(s)) (mod p – 1).

Zheng and Matsumoto attack the generation of the session key d′. They propose the possibility that an abnormal physical stress (like low voltage) forces a constant output d0 for d′ from the pseudorandom bit generator (software or hardware) in the smart card. First, assume that this particular value d0 is known a priori to the attacker. She then lets a message m generate a signature (s, t) with the session secret d0. The private key d is then immediately available from the equation:

dH(s)–1(H(m) – d0t) (mod p – 1).

Here, we assume that H(s) is invertible modulo p – 1.

If d0 is not known a priori, the attacker generates two signatures (s1, t1) and (s2, t2) on messages m1 and m2 respectively. Since d′ is always d0, we have s1 = s2 = s0, say. One can then easily calculate

d0 ≡ (t1 – t2)^(–1)(H(m1) – H(m2)) (mod p – 1),

which, in turn, yields

dH(s0)–1(H(m1) – d0t1) (mod p – 1).

Fault attack on the Feige–Fiat–Shamir identification protocol

Let us conclude our repertoire of fault attack examples by explaining an attack on the FFS zero-knowledge identification protocol. This attack is again from Boneh et al. [30].

We use the notations of Algorithm 5.69. A modulus n = pq, with primes p, q ≡ 3 (mod 4), is first chosen (by Alice or by a trusted third party). Alice selects random x1, . . . , xt ∈ Z_n^* and random bits δ1, . . . , δt, computes yi := (–1)^δi xi^(–2) (mod n), publishes (y1, . . . , yt) and keeps (x1, . . . , xt) secret.

During an identification session with Bob, Alice generates a random commitment c and sends to Bob the witness w := c^2 (mod n). (For simplicity, we take γ of Algorithm 5.69 to be 0.) While Alice is waiting for a challenge from Bob, a fault occurs in her smart card changing the commitment c to c + E. Assume that the fault is at exactly one bit position, that is, E = ±2^j for some j ∈ {0, 1, . . . , l – 1}, l being the bit length of c (or of n). This fault may be purposely induced by Bob with the malicious intention of guessing Alice's secret (x1, . . . , xt).

Bob then generates a random challenge (∊1, . . . , ∊t) ∈ {0, 1}^t as usual. Upon reception of this challenge, Alice computes and sends to Bob the faulty response

r̂ ≡ (c + E) x1^∊1 x2^∊2 · · · xt^∊t (mod n).

The knowledge of r̂ now aids Bob to obtain the product T ≡ x1^∊1 x2^∊2 · · · xt^∊t (mod n) as follows. First, note that

r̂^2 ≡ (c + E)^2 x1^(2∊1) · · · xt^(2∊t) (mod n),

so that

r̂^2 y1^∊1 · · · yt^∊t ≡ (–1)^δ (c + E)^2 (mod n)

for some δ ∈ {0, 1}.

There are only 4l possible values of (E, δ). Bob tries all these possibilities one by one. To simplify matters we assume that only one value of (E, δ) with E of the special form ±2^j and with δ ∈ {0, 1} satisfies the last congruence. In practice, the existence of two (or more) solutions for (E, δ) is an extremely improbable phenomenon. For a guess of (E, δ), the commitment c can be computed as

c ≡ ((–1)^δ r̂^2 y1^∊1 · · · yt^∊t – w – E^2)(2E)^(–1) (mod n).
The correctness of the guess (E, δ) can be verified from the relation w ≡ c^2 (mod n). Bob can now compute the desired product

T ≡ r̂ (c + E)^(–1) (mod n).

In order to strengthen the confidence about the correctness of T, Bob may repeat the protocol once more with the same values of ∊1, . . . , ∊t, but under normal conditions (that is, without faults). This time he obtains w′ ≡ (c′)^2 (mod n) and r′ ≡ c′T (mod n), which together give (r′)^2 ≡ w′T^2 (mod n), a relation that proves the correctness of T.
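One faulted round of the protocol can be simulated as follows; the modulus, secrets, challenge, and fault position are all illustrative assumptions. Bob guesses (E, δ), verifies the guess via w ≡ c^2, and extracts T.

```python
from math import gcd

p, q = 499, 547                    # toy primes with p, q = 3 (mod 4) (assumed)
n = p * q
t = 3
x = [12, 35, 101]                  # Alice's secrets (assumed, coprime to n)
delta = [1, 0, 1]                  # the secret sign bits
y = [(-1) ** delta[i] * pow(x[i], -2, n) % n for i in range(t)]

c = 5000                           # Alice's random commitment
w = pow(c, 2, n)                   # witness sent to Bob
eps = [1, 1, 0]                    # Bob's challenge
E = 1 << 7                         # fault: bit 7 of c flipped up
prod_x = 1
for i in range(t):
    prod_x = prod_x * pow(x[i], eps[i], n) % n
r_f = (c + E) * prod_x % n         # the faulty response

# Bob's side: try all (E', delta') and keep every verified guess
l = n.bit_length()
A0 = pow(r_f, 2, n)
for i in range(t):
    A0 = A0 * pow(y[i], eps[i], n) % n     # r_f^2 * prod y_i^eps_i mod n
found = []
for j in range(l):
    for E_g in (1 << j, -(1 << j)):
        for d_g in (0, 1):
            A = (-1) ** d_g * A0 % n       # candidate for (c + E)^2 mod n
            c_g = (A - w - E_g * E_g) * pow(2 * E_g % n, -1, n) % n
            if pow(c_g, 2, n) == w and gcd((c_g + E_g) % n, n) == 1:
                found.append(r_f * pow((c_g + E_g) % n, -1, n) % n)
assert prod_x in found             # T = x1^e1 ... xt^et recovered
```

For the correct guess, A equals (c + E)^2 modulo n, so the recovered c_g satisfies the witness relation and T drops out by division.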

Bob repeats the above procedure t times in order to generate the system:

Equation 7.6

Tk ≡ x1^∊k1 x2^∊k2 · · · xt^∊kt (mod n), k = 1, 2, . . . , t.
Here, ∊ki and Tk are known to Bob. Moreover, the exponents ∊ki can be so selected that the matrix (∊ki) is invertible modulo 2. In order to determine x1, Bob tries to find u1, . . . , ut ∈ {0, 1} satisfying

T1^u1 T2^u2 · · · Tt^ut ≡ x1^(1+2v1) x2^(2v2) · · · xt^(2vt) (mod n)

for some integers v1, . . . , vt. Comparing the exponents gives the linear system

∊1i u1 + ∊2i u2 + · · · + ∊ti ut = δ1i + 2vi, i = 1, 2, . . . , t,

which can be solved for u1, . . . , ut, since the matrix (∊ki) is invertible modulo 2. The solution gives v1, . . . , vt and hence

x1 ≡ ± T1^u1 · · · Tt^ut y1^v1 · · · yt^vt (mod n).

Similarly, x2, . . . , xt can be determined up to sign. Plugging in these values of xi in System (7.6) and solving another linear system modulo 2 gives the exact signs of all xi.

Notice that Bob could have selected ∊ki = δki (where δ is the Kronecker delta). For this choice, System (7.6) immediately gives x1, . . . , xt. But, in practice, Alice may refuse to respond to such simplistic challenges. Moreover, Bob must not raise any suspicion about a possible malpractice. For a general choice, all Bob has to do additionally is a small amount of simple linear algebra. The parameter t is rather small (typically less than 20); so this extra effort is of little concern to Bob.

Countermeasures

Fault analysis could be a serious threat, especially to smart-card users and certification authorities. We mention here some precautions to guard against such attacks. Some of these work against fault attacks in general, while others are specific to the algorithms they aim to protect.

Exercise Set 7.2

7.1 Consider the notations of Section 7.2.1. Assume that mi,j is constant for all i, j (and irrespective of d2j+1d2j), but the squaring times si,j and ti,j vary according to their operands. Devise a timing attack on such a system.
7.2 Show that under reasonable assumptions the SPA-resistant Algorithm 7.2 can be cryptanalyzed by timing attacks.
7.3 Recall that SPA of Algorithm 7.1 may leak partial information on the private key (some 00 sequences in the key). Rewrite the algorithm to prevent this leakage.
7.4 Assume that in Bao et al.'s attack on RSA described in the text, the attacker can induce faults in exactly two bit positions of d. Suggest how the two bits of d at these positions can be revealed from the resulting faulty signature.
7.5 Consider a variant of Bao et al.'s attack on RSA described in the text, in which the valid signature s on m is unknown to the attacker. Explain how the position j of the erroneous bit and the bit dj at this position can still be identified. [H]
7.6 Bao et al. [17] propose an alternate fault analysis on RSA with square-and-multiply exponentiation. Use the notations (n, e, d, m, s, si) as in the text. Assume that the attacker knows an (m, s) pair and can induce a fault in exactly one of the values sj (and nowhere else) and generate the corresponding faulty signature. Suggest a strategy for how the position j and the bit dj can be recovered in this case.
7.7 Propose a fault attack on the ElGamal signature scheme (Algorithms 5.36 and 5.37), similar to the attack on DSA described in the text.

7.3. Backdoor Attacks

Backdoor attacks on a public-key cryptosystem refer to attacks embedded in the key generation procedure (hardware or software) by the designer of the procedure. A contaminated cryptosystem is one in which the key generation procedure comes with hidden backdoors. A good backdoor attack should remain undetectable to the user and be exploitable exclusively by the designer.

Young and Yung [307] have proposed using public-key cryptography itself for generating backdoors. In their schemes, the attacker (the designer) embeds the encryption routine and the encryption key of the attacker in the key generation procedure of the contaminated system. The decryption key of the attacker is not embedded in the contaminated system and is known only to the attacker. The attacker’s encryption system is assumed to be honest and unbreakable and, thereby, it gives the attacker the exclusive power to decrypt contaminated keys. Young and Yung call such a backdoor a secretly embedded trapdoor with universal protection (SETUP). They also coined the term kleptography to denote such use of cryptography against cryptography.

In the rest of this section, we denote the attacker’s encryption and decryption functions by fe and fd respectively. We often do not restrict these functions to public-key routines only. Since public-key routines are slow, symmetric-key routines can be employed in practice. Simple XOR-ing with a fixed bit string (known to the designer) may also suffice. However, for these faster alternatives of fe, fd, reverse engineering reveals the symmetric key or the XOR operand to the user who can subsequently mimic the attacker to steal keys generated elsewhere by the same contaminated system.

We use the following shorthand notations. Here, n stands for a positive integer that can be naturally identified with a unique bit string having the most significant (that is, leftmost) bit equal to 1.

|n|=the bit length of n.
lsbk(n)=the least significant k bits of n.
msbk(n)=the most significant k bits of n.
(a1 ‖ a2 ‖ · · · ‖ ar)=the concatenation of the bit strings a1, a2, . . . , ar.

7.3.1. Attacks on RSA

RSA, (seemingly) being the most popular public-key cryptosystem, has been the target of most cryptanalytic attacks. Backdoor attacks are not an exception. The backdoor attacks on RSA work by cleverly hiding some secret information in the public key (n, e) of a user. As earlier, we denote the corresponding private exponent by d and the prime factors of n by p and q.

Hiding prime factor

The simplest attack is to choose a fixed p known to the designer. The other prime q is generated randomly, and correspondingly n = pq and the key pair (e, d) are computed. Reverse engineering such a scheme is pretty simple, since two different moduli n1 = pq1 and n2 = pq2 belch out p = gcd(n1, n2) easily.
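A two-line computation shows why the fixed-p scheme is so fragile (the primes are, of course, illustrative assumptions):

```python
from math import gcd

p = 10007                     # the designer's fixed prime (assumed)
q1, q2 = 10009, 10037         # fresh random primes for two different users
n1, n2 = p * q1, p * q2       # the two users' moduli
assert gcd(n1, n2) == p       # anyone holding both moduli recovers p
```

Once p is exposed, both moduli factor immediately, so the "backdoor" is available to anybody, not just the designer.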

A better approach is given in Algorithm 7.3. The function fe may be RSA encryption under the designer’s public key. In that case, the RSA modulus of the attacker should be so chosen that the condition e < n is satisfied with good probability. On the other hand, if this modulus is too small, then this scheme will generate values of e much smaller than n.

In order to determine the secret exponent from a public key generated using this scheme, the attacker runs Algorithm 7.4. If fe and fd are RSA functions under the attacker’s keys, nobody other than the attacker can apply fd to generate p from e. This provides the designer with the exclusive capability of stealing keys.

A problem with Algorithm 7.3 is that the attacker has little control over the length of the public exponent e. If the user demands a small exponent (like e = 3 or e = 257), this scheme fails to produce one. Algorithm 7.5 overcomes this difficulty by hiding p in the high-order bits of the modulus n (instead of in the exponent e). Young and Yung [307] proposed this algorithm under the name PAP (pretty awful privacy). The name contrasts with PGP (pretty good privacy), a popular and widely used RSA implementation.

Algorithm 7.3. A simple backdoor attack on RSA

Input: The bit length k.

Output: An RSA modulus n = pq with |p| = |q| = k, and exponents (e, d).

Steps:

Generate a random k-bit prime q.
while (1) {
    Generate a random k-bit prime p.
    n := pq.
    e := fe(p).
    if ((e < n) and (gcd(e, φ(n)) = 1)) {
        Compute d with ed ≡ 1 (mod φ(n)).
        Return (n, e, d).
    }
}

Algorithm 7.4. Retrieving the secret exponent

Input: An RSA public key (n, e).

Output: The corresponding secret (p, q, d) or failure.

Steps:

p := fd(e).
if (p|n) {
    q := n/p.
    φ := (p – 1)(q – 1).
    d := e–1 (mod φ).
    Return (p, q, d).
} else {
    /* The key is not generated by Algorithm 7.3 */
    Return failure.
}
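The pair of algorithms can be exercised end to end with the cheap XOR realization of fe, fd mentioned earlier (the pad and the prime sizes below are illustrative assumptions):

```python
from math import gcd

PAD = 0x5A5A                       # designer's fixed XOR pad (assumed)
fe = fd = lambda v: v ^ PAD        # XOR with a constant is its own inverse

def next_prime(v):
    # smallest prime >= v (trial division; fine for toy sizes)
    while not all(v % t for t in range(2, int(v**0.5) + 1)):
        v += 1
    return v

# contaminated key generation in the spirit of Algorithm 7.3
q = next_prime(10000)
cand = q + 1
while True:
    p = next_prime(cand)
    n, e, phi = p * q, fe(p), (p - 1) * (q - 1)
    if e < n and gcd(e, phi) == 1:
        d = pow(e, -1, phi)
        break
    cand = p + 1

# the attacker's side (Algorithm 7.4): e alone leaks the factor p
p_rec = fd(e)
assert p_rec == p and n % p_rec == 0
```

With fe realized by RSA under the designer's key instead of XOR, only the designer can invert e, which is precisely the SETUP property; the XOR variant, as the text warns, falls to reverse engineering.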

Algorithm 7.5 works as follows. Following Young and Yung [307], we assume that the attacker uses RSA to realize fe and fd. The RSA modulus of the attacker is denoted by N. The attack requires |N| = k, where |p| = |q| = k. To start with, a random prime p of the desired bit length k is generated. This prime is to be encrypted using fe, and so the quantity actually encrypted must be less than N. Instead of encrypting p directly, the attacker uses a permutation function π keyed by K + i for some fixed K and for i = 1, 2, . . . , B, where B is a small bound (typically B = 16). This permutation helps the attacker in two ways. First, one may now have p > N, so a suspicion regarding bounded values of p does not arise. Second, it is cheaper to apply the permutation instead of generating fresh candidates for p. (In an (honest) RSA key generation routine, the prime generation part typically takes most of the running time.)

Algorithm 7.5. Backdoor attack on RSA: Young and Yung’s PAP scheme

Input: The bit length k, the attacker's public modulus N with |N| = k, the key K, and the bounds B and B′.

Output: An RSA modulus n = pq with |p| = |q| = k, and exponents (e, d).

Steps:

while (1) {
    /* Try to generate a suitable p */
    Generate a random k-bit prime p.
    i = 1.
    while (i ≤ B) {
        p′ := πK+i(p).    /* Use a keyed permutation πK+i*/
        if (p′ < N) { break } else { i++ }
    }

    /* Try to generate n and q */
    if (i ≤ B) {
        p″ := fe(p′).  /* Encrypt p′ by the designer's public key */
        j := 1.
        while (j ≤ B′) {
            p‴ := π′K+j(p″).   /* π′K+j is a keyed permutation and |p‴| = k or k – 1. */
            Generate a pseudorandom bit string a of length k.
            X := (p‴ ‖ a).
            q := X quot p.
            if (|q| = k) and (q is prime) {
                n := pq.
                e := 17.
                while (gcd(e, φ(n)) ≠ 1) { e += 2. }
                d := e–1 (mod φ(n)).
                Return (n, e, d).
            } else { j ++ }
        }
    }
}

Once a suitable p and the corresponding p′ = πK+i(p) are generated, the encryption function fe is applied to generate p″ = fe(p′). Now, instead of embedding p″ directly in the modulus n, another keyed permutation π′K+j is applied on p″ to generate p‴. This permutation facilitates investigating several choices for q and so is a faster alternative than restarting the entire process afresh every time an unsuitable q is computed. A pseudorandom bit string a of length k is appended to p‴ to obtain an approximation X for n. If q := ⌊X/p⌋ happens to be a prime of bit length k, the exact n = pq is computed, else another j is tried. If all values of j = 1, 2, . . . , B′ fail, the entire procedure is repeated with a new k-bit prime p.

For random choices of a, the quotients q = ⌊X/p⌋ behave like random integers, and so the probability that q is prime is almost the same as that for random integers of bit length k. Write X = qp + r with r = X rem p. If r > a, then n = X – r has p‴ – 1 embedded in its higher bits, whereas if r ≤ a, then p‴ itself is embedded in the higher bits of n.

Once suitable p and q are found, the PAP routine generates (like PGP) a small encryption exponent e relatively prime to φ(n), and its inverse d modulo φ(n). One can anyway opt for bigger values of e. In that case, instead of choosing e successively from the sequence 17, 19, 21, 23, . . . , one writes one's customized steps for generating candidate values for e. Choosing a small e in Algorithm 7.5 merely illustrates the resemblance with PGP and the flexibility of doing so.

The authors of PAP compare their implementation of Algorithm 7.5 with that of the honest PGP key generation procedure. The contaminated routine has been found to run on average only 20 per cent slower than the honest routine.

Algorithm 7.6 recovers the prime factor p of n from a public key (n, e) generated by PAP, using the RSA decryption function fd of the attacker. Reverse engineering may make available to the user the permutation functions π and π′, the fixed constants K, B, B′ and the designer’s public key. But this knowledge alone does not empower the user to steal PAP-generated keys.

Algorithm 7.6. Retrieving the prime divisor

Input: An RSA public key (n, e) with n = pq.

Output: The prime divisor p of n or failure.

Steps:

Write n = (U ‖ V) with |V| = k.
for U′ ∈ {U, U + 1} {    /* The higher bits of n contain p‴ or p‴ – 1. */
    for j = 1, 2, . . . , B′ {
        p″ := (π′K+j)–1(U′).
        p′ := fd(p″).
        for i = 1, 2, . . . , B {
            p := (πK+i)–1(p′).
            if (p|n) { Return p. }
        }
    }
}
/* (n, e) is not generated by Algorithm 7.5 */
Return failure.

Hiding small private exponent

Another possible backdoor is hiding an RSA key pair (∊, δ) with small δ inside a key pair (e, d). Crépeau and Slakmon [70] realize this backdoor using a result from Boneh and Durfee [32], which describes a polynomial-time (in |n|) algorithm for computing δ from the public key (n, ∊), provided that δ is less than n^0.292. This attack is explained in Algorithm 7.7. Here, the modulus n is a genuine random RSA modulus. The mischievous key ∊ is neatly hidden by the attacker's encryption routine fe. The resulting output key pair (e, d) looks reasonably random. However, this scheme has a drawback similar to Algorithm 7.3; that is, it cannot easily generate small values of e.

Algorithm 7.7. Backdoor attack on RSA: small private exponent

Input: The bit length k.

Output: An RSA modulus n = pq with |n| = k and a key pair (e, d).

Steps:

Generate random primes p ≠ q of bit length ≈ k/2, such that n := pq has |n| = k.
do {
   Generate random δ with gcd(δ, φ(n)) = 1 and |δ| < 0.292 |n|.
   ∊ := δ–1 (mod φ(n)).
   e := fe(∊).    /* Hide ∊ */
} while (gcd(e, φ(n)) ≠ 1).
d := e–1 (mod φ(n)).
Return (n, e, d).

Algorithm 7.8 retrieves d from a public key (n, e) generated by Algorithm 7.7.

Algorithm 7.8. Retrieving the secret exponent

Input: An RSA public key (n, e) generated by Algorithm 7.7.

Output: The corresponding private key d.

Steps:

∊ := fd(e).     /* Recover the hidden exponent */
Use Boneh and Durfee’s algorithm to recover δ ≡ ∊–1 (mod φ(n)).
Use ∊ and δ to compute φ(n).
Compute d ≡ e–1 (mod φ(n)).

The correctness of Algorithm 7.8 is evident. In order to see how the knowledge of ∊ and δ reveals φ(n), note that x := ∊δ – 1 is a multiple of φ(n); that is,

Equation 7.7

x = lφ(n)
for some integer l. Since δ < n^0.292 and ∊ < n, we have x < n^1.292. But φ(n) ≈ n and so l cannot be much larger than n^0.292. Since |p| ≈ k/2 ≈ |q|, we have l(p + q – 1) < n. Now, if we write

x = an + b = (a + 1)n – (n – b)

with a = x quot n and b = x rem n, comparison with Equation (7.7) reveals that l = a + 1. This gives φ(n) = x/l.
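The recovery of φ(n) from x = ∊δ – 1 can be checked numerically; the primes and δ below are illustrative assumptions, with δ well under n^0.292.

```python
p, q = 10007, 10009            # toy primes (assumed)
n = p * q
phi = (p - 1) * (q - 1)

delta = 101                    # small hidden private exponent, delta < n**0.292
eps = pow(delta, -1, phi)      # the hidden public exponent
x = eps * delta - 1            # a multiple of phi(n): x = l*phi(n)

a = x // n                     # a = x quot n
l = a + 1                      # by the derivation above
assert x % l == 0 and x // l == phi
# the factors then follow from p*q = n and p + q = n - phi(n) + 1
```

Note that l(p + q – 1) < n holds comfortably here, which is exactly the condition the derivation relies on.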

Although not needed explicitly here, the factorization of n can be easily obtained by solving the equations pq = n and p + q = n – φ(n) + 1. If ∊ and δ are not small, we may have l(p + q – 1) ≥ n, and φ(n) cannot be calculated so easily as above. A randomized polynomial-time algorithm can still factor n from the knowledge of ∊, δ and n. For the details, solve Exercise 7.9.

Hiding small public exponent

Crépeau and Slakmon propose another backdoor attack based on the following result due to Boneh et al. [33]. Let (∊, δ) be a key pair for an RSA modulus n = pq. Further, let t ≤ |n|/4 and 2^(t–1) ≤ ∊ < 2^t. There exists a polynomial-time algorithm that, given n, ∊, the t most significant and the |n|/4 least significant bits of δ, recovers the full private exponent δ.

Algorithm 7.9. Backdoor attack on RSA: small public exponent

Input: The bit lengths k and t.

Output: An RSA modulus n = pq with |n| = k and a key pair (e, d).

Steps:

Generate random primes p ≠ q of bit length ≈ k/2, such that n := pq has |n| = k.
do {
   Generate random ∊ with gcd(∊, φ(n)) = 1 and |∊| = t.
   δ := ∊–1 (mod φ(n)).
   e := fe(∊ ‖ msbt(δ) ‖ lsb|n|/4(δ)).    /* Hide ∊ and partial bits of δ */
}
while (gcd(e, φ(n)) ≠ 1).
d := e–1 (mod φ(n)).
Return (n, e, d).

Algorithm 7.9 uses fe to hide in e a small ∊, the t most significant bits of δ and the |n|/4 least significant bits of δ. A string of bit length 2t + k/4 is encrypted by fe. Applying the decryption routine fd on e recovers these hidden values, from which ∊ and δ, and hence φ(n), can be obtained. Algorithm 7.10 does this task. This scheme also fails, in general, to produce small public exponents e.

Algorithm 7.10. Retrieving the secret exponent

Input: An RSA public key (n, e) generated by Algorithm 7.9 and the matching parameter t.

Output: The corresponding private key d.

Steps:

Compute fd(e) and retrieve the following:
   (a) the hidden public exponent ∊,
   (b) the t most significant bits of the hidden private exponent δ and
   (c) the |n|/4 least significant bits of δ.
Apply the Boneh-Durfee-Frankel algorithm to recover δ completely.
Use ∊ and δ to compute φ(n).       /* See Exercise 7.9 */
Compute d ≡ e–1 (mod φ(n)).

7.3.2. An Attack on ElGamal Signatures

We now describe a backdoor attack on the ElGamal signature Algorithm 5.36. This attack does not tamper with the generation of the user's permanent key pair. Instead, it manipulates the session-key generation in such a way that the user's permanent private key is revealed to the attacker from two successive signatures.

Let p be a prime, g a generator of Z_p^*, and (d, g^d (mod p)) the permanent key pair of Alice. The attacker uses the same field and a key pair (D, g^D (mod p)), with g^D supplied to the signing device. Suppose that Alice signs two messages m1 and m2 with session keys d1 and d2 to generate signatures (s1, t1) and (s2, t2) respectively, where

si ≡ g^di (mod p),
ti ≡ di^(–1)(H(mi) – d H(si)) (mod p – 1).
The attack proceeds by letting d1 be arbitrary, but by taking

d2 ≡ (gD)d1 (mod p).

Since d2 ≡ s1^D (mod p) can be computed from the first signature using the attacker's private key D, we have

t2 d2 ≡ H(m2) – d H(s2) (mod p – 1),

that is,

d ≡ H(s2)^(–1)(H(m2) – d2 t2) (mod p – 1).
The private key D of the attacker (or d1) is required for computing d; so nobody other than the designer can retrieve Alice’s secret by observing the contaminated signatures (s1, t1) and (s2, t2).
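The whole interaction can be sketched as follows; the toy prime, generator, stand-in hash, and all keys are assumptions for illustration.

```python
from math import gcd

p, g = 2039, 7                   # toy prime and generator (assumed)
pm1 = p - 1
H = lambda v: v % pm1            # stand-in hash (assumption)
d = 1234                         # Alice's permanent private key
D = 771                          # designer's private key
Y = pow(g, D, p)                 # g^D, embedded in the signing device

def sign(m, dk):
    # ElGamal signature with session key dk
    s = pow(g, dk, p)
    t = pow(dk, -1, pm1) * (H(m) - d * H(s)) % pm1
    return s, t

d1 = 1357                        # first session key: arbitrary
while True:                      # skip degenerate cases lacking inverses
    d2 = pow(Y, d1, p)           # contaminated second session key
    if gcd(d1, pm1) == gcd(d2 % pm1, pm1) == gcd(H(pow(g, d2, p)), pm1) == 1:
        break
    d1 += 2

s1, t1 = sign(111, d1)
s2, t2 = sign(222, d2)

# designer's side: d2 = s1^D mod p, then solve the second signature for d
d2_rec = pow(s1, D, p)
d_rec = pow(H(s2), -1, pm1) * (H(222) - d2_rec * t2) % pm1
assert d2_rec == d2 and d_rec == d
```

The key step is d2 ≡ (g^D)^d1 ≡ (g^d1)^D ≡ s1^D (mod p): the first (ordinary-looking) signature silently transports d2 to the designer.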

7.3.3. An Attack on ElGamal Encryption

For ElGamal encryption (Algorithm 5.15) and for Diffie–Hellman key exchange (Algorithm 5.27) over Z_p^*, a party (Alice) generates random session key pairs of the form (d′, g^d′ (mod p)) and communicates the public session key g^d′ to another party. The following backdoor manipulates the session-key generation in such a way that two public session keys reveal the second private session key (but not the permanent private key). We assume that the attacker learns the public session keys by eavesdropping. The attacker's key pair is (D, g^D (mod p)). The contaminated routine contains the public key g^D (mod p), but not the private key D.

Let (d1, r1) and (d2, r2) be two session keys used by Alice, where

r1gd1 (mod p),
r2gd2 (mod p).

The contaminated routine that generates the session keys uses a fixed odd integer u, a hash function H and a random bit b ∈ {0, 1} to generate d2 from d1 as follows:

zgd1+ub(gD)d1 (mod p),
d2H(z) (mod p – 1).

The attacker knows r1 and r2 by eavesdropping. She computes d2 by Algorithm 7.11, the correctness of which is established from the fact that z ≡ r1^(D+1) g^(ub) (mod p).

Algorithm 7.11. Backdoor attack on ElGamal encryption

z0 := r1^(D+1) (mod p).                                                                     /* corresponding to b = 0 */
if (r2 ≡ gH(z0) (mod p)) { Return H(z0). }
z1 := z0gu (mod p).                                                                   /* corresponding to b = 1 */
if (r2 ≡ gH(z1) (mod p)) { Return H(z1). }
Return failure.              /* The attacker's routine was not used for key generation. */

Algorithm 7.11 requires the attacker’s private key D (or d1) and can be performed only by the attacker. Now, d2 can be analogously used to generate the third session key d3 and so on, that is, the attacker can steal all the private session keys (except the first).

The odd integer u is used for additional safety. In order to see what might happen without it (that is, with b = 0 always), assume that H can be inverted. This gives z and hence y ≡ z r1^(–1) ≡ r1^D (mod p). If D is even, y is always a quadratic residue modulo p. If D is odd, y is a quadratic residue or non-residue modulo p depending on whether d1 is even or odd. The randomly added odd bias u destroys this correlation of z with quadratic residues.
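Algorithm 7.11 can be exercised as follows; the prime, generator, bias u, hash stand-in, and keys below are illustrative assumptions.

```python
import hashlib

p, g, u = 2039, 7, 5               # toy prime, generator, fixed odd bias (assumed)
pm1 = p - 1
H = lambda z: int.from_bytes(hashlib.sha256(str(z).encode()).digest(), "big") % pm1
D = 771                            # attacker's private key
Y = pow(g, D, p)                   # embedded public key g^D

d1, b = 901, 1                     # first session key and the random bit
z = pow(g, d1 + u * b, p) * pow(Y, d1, p) % p
d2 = H(z)                          # contaminated second session key
r1, r2 = pow(g, d1, p), pow(g, d2, p)   # what the attacker eavesdrops

# Algorithm 7.11, run by the attacker
z0 = r1 * pow(r1, D, p) % p        # guess b = 0: z0 = r1^(D+1)
z1 = z0 * pow(g, u, p) % p         # guess b = 1
matches = [H(zz) for zz in (z0, z1) if pow(g, H(zz), p) == r2]
assert d2 in matches               # the second session key is stolen
```

Each subsequent session key can then be peeled off in the same way, so the attacker steals every private session key after the first.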

7.3.4. Countermeasures

Using trustworthy implementations (hardware or software) of cryptographic routines (in particular, key generation routines) eliminates or reduces the risk of backdoor attacks. Preference should be given to software applications with source code (rather than to the more capable ones without source code). Random number generators should be given specific attention. Cascading products from different independent sources also minimizes the possibility of hidden backdoors.

If the desired grain of trust is missing from the available products, the only safe alternative is to write the code oneself. Complete trust in cryptographic devices and packages, using them as black boxes without bothering about the internals, is often called black-box cryptography. Users should learn to question black-box cryptography. The motto is: Be aware or bring peril.

Exercise Set 7.3

7.8 Argue that reverse engineering the PAP routine (Algorithm 7.5) can enable a user to distinguish in polynomial time between key pairs generated by PAP and those generated by honest procedures.
7.9 Let n = pq be an RSA modulus and (e, d) a key pair under this modulus. Write ed – 1 = 2^s t, where s = v2(ed – 1) (so that t is odd). Since ed – 1 is a multiple of φ(n) = (p – 1)(q – 1) with odd primes p, q, we have s ≥ 2.
  1. Show that for any a ∈ Z_n^* the multiplicative order ordn(a^t) divides 2^s. [H]

  2. Let a ∈ Z_n^* be such that a^t has different orders modulo p and modulo q. Show that gcd(a^(2^σ t) – 1, n) is a non-trivial divisor of n for some σ ∈ {0, 1, . . . , s – 1}.

  3. Let g be a generator of Z_p^*. Take a := g^k (mod p) for some k and let ordp(a^t) = 2^σ. Show that σ = v2(p – 1) if k is odd, and σ < v2(p – 1) if k is even. [H] An analogous result holds for the other prime q.

  4. Demonstrate that there are at least φ(n)/2 elements a in Z_n^* with the property that a^t has different orders modulo p and q. [H]

  5. Suggest a randomized poly-time algorithm for factoring n from the knowledge of n, e and d.

Chapter Summary

In this chapter, we discuss some indirect ways of attacking public-key cryptosystems. These attacks do not attempt to solve the underlying intractable problems, but watch the decryption device and/or use malicious key generation routines in order to gain information about private keys.

The timing attack works based on the availability of the total times of several private-key operations under the same private key. It successively keeps on guessing bits of the private key by performing some variance calculations.

The power attack requires the availability of the power consumption patterns (also called power traces) of the decrypting (or signing) device during one or more private-key operations. If the measurements are done with good accuracy and resolution, a single power trace may reveal the private key to the attacker; this is called simple power analysis. In practice, however, such power measurements are often contaminated with noise. Differential power analysis requires power traces from several decryption operations under the same private key. The different traces are combined using a technique that reduces the effect of noise.

A fault attack can be mounted by injecting one or more faults in the device performing private-key operations. Fault attacks are discussed in connection with several encryption (RSA), signature (ElGamal, DSA and so on) and authentication (FFS) schemes.

The above three kinds of attacks are collectively called side-channel attacks. Several general and algorithm-specific countermeasures against side-channel attacks are discussed.

Backdoor attacks, on the other hand, are mounted by malicious key generation routines. Young and Yung propose the concept of secretly embedded trapdoor with universal protection (SETUP). In a SETUP-contaminated system, the designer of the key generation routine possesses the exclusive right to steal keys from users. Several examples of backdoor attacks on RSA and ElGamal cryptosystems are described.

Suggestions for Further Reading

Kocher introduces the concept of side-channel attacks in his seminal paper [155]. This paper describes further details about the timing attack (like a derivation of the choice of the sample size k) and some experimental results.

Timing attacks in various forms are applicable to other systems. Kocher [155] himself suggests a chosen-message attack on an RSA implementation based on CRT (Algorithm 5.4). Carol, in an attempt to recover Alice’s private key d, tries to guess the factor p (or q) of the modulus n using a timing attack. She starts by letting Alice sign a message y (c in Algorithm 5.4) close to an initial guess of p. The CRT-based algorithm first reduces y modulo p and modulo q before performing the modular exponentiations. If y < p already, then the initial reduction modulo p returns (almost) immediately, whereas if y ≥ p, the reduction involves at least one subtraction. This gives a variation in the timings based on the value of p. The attack exploits this fact to arrive at better and better approximations of p.

A known-message timing attack (in addition to the chosen message attack mentioned in the last paragraph) on the CRT-based RSA signature scheme is proposed by Kocher in the same paper [155]. Kocher also explains a timing attack on the signature algorithm DSA (Algorithm 5.43), based on the dependence of the modular reduction of H(M) + ds modulo r on the bits of the signer’s private key d.

Large scale implementations of timing attacks are reported in the technical reports [77, 259] from the Crypto group of Université catholique de Louvain. These implementations study Montgomery exponentiation.

Kocher [155] mentions the possibility of power attacks. However, a concrete description is first published in Kocher et al. [156], which explains both SPA and DPA. DES is the basic target of this paper, though possibilities for using these techniques against public-key systems are also mentioned.

Several variants of the basic DPA model described in the text have been proposed. Messerges et al. [200] describe attacks against smart-card implementations of exponentiation-based public-key systems. Also consult Aigner and Oswald’s tutorial [9] for a recent survey.

DPA seems to be the most threatening of all side-channel attacks. Many papers suggesting countermeasures against DPA have appeared. Chari et al. [45] propose a masking method. Messerges [199] applies this idea to a form suitable for AES.[4] Messerges’ countermeasure is broken in [63] using a multi-bit DPA. Some other useful papers on DPA include [10, 55, 201].

[4] AES is an abbreviation for advanced encryption standard which is a US-government standard that supersedes the older standard DES. AES uses the Rijndael cipher [219].

Boneh et al. [30, 31] from the Bellcore Lab. announce the first systematic study of fault attacks on asymmetric-key cryptosystems. They explain fault attacks on RSA (with and without CRT), the Rabin signature scheme, the Feige–Fiat–Shamir identification protocol and on the Schnorr identification protocol. These attacks are collectively known as Bellcore attacks.

Arjen K. Lenstra points out that the fault attack on CRT-based RSA does not require the valid signature: a single faulty signature of a known message suffices. Joye and Quisquater propose some generalizations of the Bellcore–Lenstra attack. A form of this attack is applicable to elliptic-curve cryptosystems. The paper [142] discusses these developments.

Bao et al. [17] propose fault attacks on DSA, ElGamal and Schnorr signatures. They also describe variants of the fault analysis of RSA based on square-and-multiply algorithms. Zheng and Matsumoto [315] indicate the possibilities of attacking the random bit generator in a smart card.

Biham and Shamir [22] investigate fault analysis of symmetric-key ciphers and introduce the concept of differential fault analysis. Anderson and Kuhn [11] also study fault analysis of symmetric-key ciphers. Aumüller et al. [15] publish their practical experiences regarding physical realizations of faults in smart cards. They also suggest countermeasures against such attacks.

James A. Muir’s work [215] is a very readable and extensive survey on side-channel cryptanalysis. Also look at Boneh’s survey [29].

Because of small key sizes, elliptic-curve cryptosystems are very attractive for implementation in smart cards. It is, therefore, necessary to provide effective countermeasures against side-channel attacks (most importantly, against the DPA) for elliptic-curve cryptosystems. Many recent articles discuss this issue. Coron [62] suggests the use of random projective coordinates to avoid the costly (and power-consuming) field inversion operation needed for adding and doubling of points. Möller [206] proposes a non-conventional way of carrying out the double-and-add procedure. Izu and Takagi [138] describe a Montgomery-type point addition scheme resistant against side-channel attacks. An improved version of this algorithm, that works for a more general class of elliptic curves, is presented in Izu et al. [137].

Young and Yung introduce the concept of SETUP in [307]. The PAP SETUP on RSA and the ElGamal signature SETUP are from this paper, which also includes attacks on DSA and the Kerberos authentication protocol. In a later paper [308], Young and Yung categorize SETUPs into three types: regular, weak and strong. Strong SETUPs are proposed for Diffie–Hellman key exchange and for RSA. The third reference [309] from the same authors extends the ideas of kleptography further and provides backdoor routines for several other cryptographic schemes.

Crépeau and Slakmon [70] adopt a more informal approach and discuss several backdoors for RSA key generation. In addition to the trapdoors with hidden small private and public exponents, described in the text, they propose a trapdoor that hides a small prime public exponent. They also present an improved version of the PAP routine. Unlike Young and Yung, they suggest symmetric techniques for designing f_e, f_d. Symmetric techniques endanger the universal protection of the attacker, but continue to make perfect sense in the context of black-box cryptography.

8. Quantum Computation and Cryptography

8.1Introduction
8.2Quantum Computation
8.3Quantum Cryptography
8.4Quantum Cryptanalysis
 Chapter Summary
 Suggestions for Further Reading

Our best theories are not only truer than common sense, they make far more sense than common sense does.

—David Deutsch [76]

One can be a masterful practitioner of computer science without having the foggiest notion of what a transistor is, not to mention how it works.

—N. David Mermin [197]

But suppose I could buy a truly powerful quantum computer off the shelf today — what would I do with it? I don’t know, but it appears that I will have plenty of time to think about it!

—John Preskill [243]

8.1. Introduction

So far, we have studied algorithms in cryptology that can be implemented on classical computers (Turing machines or von Neumann’s stored-program computers). Now, we shift our attention to a different paradigm of computation, known as quantum computation. The working of a quantum computer is specified by the laws of quantum mechanics, a branch of physics developed in the twentieth century. However counterintuitive, contrived or artificial these laws sound at first, they have been accepted by the physics community as robust models of certain natural phenomena. A bit, modelled as a quantum mechanical system, appears to be a more powerful unit than a classical bit for building a computing device.

This enhanced power of a computing device has many important ramifications in cryptology. On one hand, we have polynomial-time quantum algorithms to solve the integer factorization and the discrete-log problems. This implies that most of the cryptographic algorithms that we discussed earlier become (provably) insecure. On the other hand, there are proposals for a quantum key-exchange method that possesses unconditional (and provable) security.

Unfortunately, it is not clear how one can manufacture a quantum computer. The technological difficulties involved appear enormous, and some researchers even question the feasibility of building such a machine. However, no laws or proofs rule out the possibility of success in the (near or distant) future. Legend has it that Thomas Alva Edison, after several hundred futile attempts to manufacture an electric light bulb, asserted that he knew hundreds of ways how one cannot make an electric bulb. Edison succeeded eventually, and the dream turned into reality.

But we will not build quantum computers in this chapter. That is well beyond the scope of this book, or, for that matter, of computer science in general. It is thoroughly unimportant to understand the I-V curves of a transistor (or even to know what a transistor actually is), when one designs and analyses (classical) algorithms. In order to design and analyse quantum algorithms, it is equally unimportant to know how a quantum computer can be realized.

8.2. Quantum Computation

We start with a formal description of quantum computation. Quantum mechanical laws govern this paradigm. We will pay little attention to the physical interpretations of these laws. A mathematical formulation suffices for our purpose.

For defining a quantum mechanical system, we need to enrich our mathematical vocabulary. Let V be a vector space over ℂ (or ℝ). Using Dirac’s ket notation, we denote a vector ψ in V as |ψ〉.

Definition 8.1.

An inner product (also called a dot product or a scalar product) on V is a function 〈·|·〉 : V × V → ℂ satisfying the following properties:

  1. Positivity For any |ψ〉 ∈ V, the inner product 〈ψ|ψ〉 is real and non-negative. Moreover, 〈ψ|ψ〉 = 0 if and only if |ψ〉 = 0.

  2. Linearity For a₁, a₂ ∈ ℂ and |ψ₁〉, |ψ₂〉, |φ〉 ∈ V, we have 〈φ| (a₁|ψ₁〉 + a₂|ψ₂〉) = a₁〈φ|ψ₁〉 + a₂〈φ|ψ₂〉.

  3. Skew symmetry For any |ψ〉, |φ〉 ∈ V, we have 〈ψ|φ〉 = 〈φ|ψ〉*, where the star denotes complex conjugation.

A vector space V with an inner product is called an inner product space.

Example 8.1.

For n ∈ ℕ, the space ℂ^n is an inner product space with the inner product of |ψ〉 = (ψ₁, . . . , ψₙ) and |φ〉 = (φ₁, . . . , φₙ) defined as

〈ψ|φ〉 := ψ₁*φ₁ + ψ₂*φ₂ + · · · + ψₙ*φₙ,

where the star denotes complex conjugation.
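The inner product of Example 8.1 on ℂ^n, namely 〈ψ|φ〉 = Σᵢ ψᵢ* φᵢ (conjugating the first argument, the usual convention with Dirac notation), can be checked numerically. A small NumPy sketch, with arbitrarily chosen example vectors, verifies positivity and skew symmetry; NumPy's vdot conjugates its first argument, matching this convention:

```python
import numpy as np

psi = np.array([1 + 1j, 0, 2j])
phi = np.array([2, 1j, 1 - 1j])

ip = np.vdot(psi, phi)              # <psi|phi> = sum_i conj(psi_i) * phi_i
assert np.isclose(ip, np.sum(np.conj(psi) * phi))

# Positivity: <psi|psi> is real and non-negative
assert np.isclose(np.vdot(psi, psi).imag, 0)
assert np.vdot(psi, psi).real >= 0

# Skew symmetry: <psi|phi> is the complex conjugate of <phi|psi>
assert np.isclose(ip, np.conj(np.vdot(phi, psi)))
```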

Definition 8.2.

The inner product on a vector space V induces a norm (Definition 2.115) on V:

‖ψ‖ := √〈ψ|ψ〉.

An inner product space which is complete (Definition 2.119) under the norm induced by its inner product is called a Hilbert space. We will typically consider finite-dimensional Hilbert spaces (over ℂ) and for n ∈ ℕ denote the n-dimensional Hilbert space by ℋₙ.

Definition 8.3.

We define an equivalence relation ~ on a Hilbert space ℋ as |ψ〉 ~ |φ〉 if and only if |ψ〉 = a|φ〉 for some non-zero a ∈ ℂ. An equivalence class under this relation is called a ray in ℋ. One typically considers a vector |ψ〉 with 〈ψ|ψ〉 = 1 as a representative of its equivalence class. Such a representative is unique up to multiplication by complex numbers of the form e^(iθ).

Definition 8.4.

An orthonormal basis of a Hilbert space ℋ is a subset B of ℋ with the following properties:

  1. B is a ℂ-basis of ℋ.

  2. 〈ψ|ψ〉 = 1 for every |ψ〉 ∈ B.

  3. 〈ψ|φ〉 = 0 for every pair of distinct vectors |ψ〉, |φ〉 ∈ B.

It is customary to denote the n vectors in an orthonormal basis of ℋₙ by the symbols |0〉, |1〉, . . . , |n – 1〉.

Example 8.2.

|0〉 := (1, 0, 0, . . . , 0), |1〉 := (0, 1, 0, . . . , 0), . . . , |n – 1〉 := (0, 0, . . . , 0, 1) form an orthonormal basis of ℂ^n under the inner product of Example 8.1.

8.2.1. System

The following axiom describes the model of a quantum mechanical system.

Axiom 8.1. First axiom of quantum mechanics

A system is a ray in a (finite-dimensional) Hilbert space (over ℂ).

Definition 8.5.

The simplest non-trivial quantum mechanical system is a ray in a 2-dimensional Hilbert space ℋ₂. Such a system is assumed to be the basic building block of a quantum computer and is called a quantum bit or a qubit.

In order to distinguish a qubit from a classical bit, we call the latter a cbit.

ℋ₂ has an orthonormal basis {|0〉, |1〉}. In the classical interpretation, a cbit can assume only the two values |0〉 and |1〉, whereas a qubit can assume any value of the form

a|0〉 + b|1〉  with  a, b ∈ ℂ, |a|² + |b|² = 1.

Such a state of the qubit is called a superposition of the classical states.
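A qubit state can be modelled concretely as a normalized complex 2-vector. The following NumPy sketch (the amplitudes a, b are example values of our choosing) checks the normalization condition |a|² + |b|² = 1:

```python
import numpy as np

# The classical basis states |0> and |1> as vectors of C^2
ket0 = np.array([1, 0], dtype=complex)
ket1 = np.array([0, 1], dtype=complex)

# A superposition a|0> + b|1> with |a|^2 + |b|^2 = 1
a, b = 3 / 5, 4j / 5
psi = a * ket0 + b * ket1

assert np.isclose(abs(a) ** 2 + abs(b) ** 2, 1.0)
assert np.isclose(np.vdot(psi, psi).real, 1.0)   # <psi|psi> = 1
```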

Though we don’t care much, at least for the moment, about physical realizations, two promising candidates for realizing a qubit are the two polarization states of a photon and the two spin states of a spin-½ particle such as an electron.

A conceptual example of a 2-state quantum system is the Schrödinger cat. The two independent states of a cat, as we classically know it, are |alive〉 and |dead〉. However, if we think of the cat confined in a closed room and isolated from our observations, quantum mechanics models the state of the cat as a superposition (that is, a complex-linear combination) of these two states. But then, if the quantum model were true, opening the room might reveal the cat in a non-trivial state a|alive〉 + b|dead〉 for some complex numbers a, b with |a|² + |b|² = 1. It would indeed be an exciting experience. But alas, quantum mechanics precludes the possibility of such an observation. Read on to know what we would actually see if we open the room.

8.2.2. Entanglement

A single qubit is too small to build a useful computer. We need to use several (albeit a finite number of) qubits and hence must have a way to describe the combined system in terms of the individual qubits. As the simplest base case, we first concentrate on combining two quantum systems into one.

Axiom 8.2. Second axiom of quantum mechanics

Let A and B be two quantum mechanical systems with respective Hilbert spaces ℋ_A and ℋ_B. Let {|i〉_A | i = 0, . . . , m – 1} and {|j〉_B | j = 0, . . . , n – 1} be orthonormal bases of these Hilbert spaces. The quantum mechanical system AB having A and B as its two parts is described by the tensor product

ℋ_A ⊗ ℋ_B,

where ℋ_A ⊗ ℋ_B is an mn-dimensional Hilbert space with an orthonormal basis

{|i〉_A ⊗ |j〉_B | i = 0, . . . , m – 1 and j = 0, . . . , n – 1}.

It is customary to abbreviate the normalized vector |i〉_A ⊗ |j〉_B as |i〉_A|j〉_B or even as |ij〉_AB. A general state of AB is of the form

Σ_{i,j} a_{i,j}|ij〉_AB,  where the amplitudes a_{i,j} ∈ ℂ satisfy Σ_{i,j} |a_{i,j}|² = 1.

We can generalize this construction to describe a system having components A₁, . . . , Aₖ. If ℋᵢ is the Hilbert space of Aᵢ with an orthonormal basis {|j〉ᵢ | 0 ≤ j < nᵢ}, the composite system A₁ · · · Aₖ has the n₁ · · · nₖ-dimensional Hilbert space ℋ₁ ⊗ ℋ₂ ⊗ · · · ⊗ ℋₖ with an orthonormal basis comprising the vectors

|j₁〉₁ ⊗ |j₂〉₂ ⊗ · · · ⊗ |jₖ〉ₖ = |j₁〉₁|j₂〉₂ · · · |jₖ〉ₖ = |j₁j₂ . . . jₖ〉

with 0 ≤ jᵢ < nᵢ for all i = 1, . . . , k.

Definition 8.6.

An n-bit quantum register is a system having exactly n qubits.

Let A₁, . . . , Aₙ denote the individual bits in an n-bit quantum register A. Each Aᵢ has the Hilbert space ℋ₂ with orthonormal basis {|0〉, |1〉}. So A has the 2^n-dimensional Hilbert space ℋ₂ ⊗ ℋ₂ ⊗ · · · ⊗ ℋ₂ (n factors) with an orthonormal basis consisting of the vectors

|j₁〉 ⊗ |j₂〉 ⊗ · · · ⊗ |jₙ〉 = |j₁〉|j₂〉 · · · |jₙ〉 = |j₁j₂ · · · jₙ〉

with each jᵢ ∈ {0, 1}. Viewed as an integer in binary notation, j₁j₂ . . . jₙ is an integral value between 0 and 2^n – 1. This gives us a canonical numbering |0〉, |1〉, . . . , |2^n – 1〉 of the basis vectors for the register A. These 2^n values are precisely the states that a classical n-bit register can have. The quantum register can, however, be in any state |ψ〉 which is a superposition of the classical states:

|ψ〉 = a₀|0〉 + a₁|1〉 + · · · + a_{2^n–1}|2^n – 1〉  with  |a₀|² + |a₁|² + · · · + |a_{2^n–1}|² = 1.

Let us once again look at the general composite system A = A₁ · · · Aₖ. In the classical sense, each state of A is composed of the individual states of the subsystems Aᵢ. For example, each of the 2^n classical states of an n-bit register corresponds to a choice between |0〉 and |1〉 for each individual bit. That is, each individual component retains its own state in a classical composite system. This is, however, not the case with a quantum composite system. Just think of a 2-bit quantum register C := AB. A state

|ψ〉_C = c₀|0〉_C + c₁|1〉_C + c₂|2〉_C + c₃|3〉_C

of C equals a tensor product

|ψ₁〉_A ⊗ |ψ₂〉_B = (a₀|0〉_A + a₁|1〉_A) ⊗ (b₀|0〉_B + b₁|1〉_B)
 = a₀b₀|0〉_C + a₀b₁|1〉_C + a₁b₀|2〉_C + a₁b₁|3〉_C,

if and only if c₀c₃ = c₁c₂.
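The product-state criterion c₀c₃ = c₁c₂ can be verified numerically. In the NumPy sketch below, np.kron builds a tensor-product state, which passes the test, while the state (1/√2)(|00〉 + |11〉) (our choice of an entangled example) fails it:

```python
import numpy as np

def is_product(c):
    """Test whether c0|0> + c1|1> + c2|2> + c3|3> is a tensor product."""
    c0, c1, c2, c3 = c
    return bool(np.isclose(c0 * c3, c1 * c2))

# A tensor-product state: np.kron gives coefficients (a0b0, a0b1, a1b0, a1b1)
a = np.array([3 / 5, 4 / 5])                  # a0|0> + a1|1>
b = np.array([1, 1j]) / np.sqrt(2)            # b0|0> + b1|1>
product = np.kron(a, b)
assert is_product(product)

# The state (|00> + |11>)/sqrt(2): c0c3 = 1/2 but c1c2 = 0, so entangled
bell = np.array([1, 0, 0, 1]) / np.sqrt(2)
assert not is_product(bell)
```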

Definition 8.7.

The state |ψ〉 of a quantum register A = A₁ · · · Aₙ is called entangled, if |ψ〉 cannot be written as a tensor product of the states of any two parts of A. In other words, |ψ〉 is entangled if and only if no set of fewer than n qubits of A possesses its individual state.

Entanglement essentially implies correlation or interaction between the components. In a composite quantum system, we cannot treat the components individually. A quantum system, as we have defined (axiomatically) earlier, is a completely isolated system. In reality, interactions with the surroundings make a (non-isolated) system change its state and get entangled. This is one of the biggest problems in the realization of a quantum computer. Quantum error correction is an important topic in quantum computation. For our purpose, we stick to the abstract model of an isolated system (quantum register) immune from external disturbances.

8.2.3. Evolution

Quantum registers give us a way to store quantum information. A computation involves manipulating the information stored in the registers. In quantum mechanics, all such operations must be reversible, that is, it must be possible to invert every operation. The only invertible operations on the classical states |0〉, |1〉, . . . , |2^n – 1〉 of an n-bit quantum register A are precisely all the permutations of the classical states. Now that A can be in many more (quantum) states, there are other allowed operations on A. Any such operation must be reversible and of a particular type. This is the third axiom of quantum mechanics, which is detailed shortly.

A classical n-bit register supports many non-invertible operations. For example, erasing the content of the register (that is, resetting all the bits to zero) is a non-invertible process, since the pre-erasure state of the register cannot be uniquely determined after the erase operation is carried out. Classical computation is based on (classical) gates (like NOT, AND, OR, XOR, NOR, NAND), most of which are non-invertible. XOR, as an example, requires two input bits and outputs a single bit. It is impossible to determine the inputs uniquely from the output only. All such non-reversible operations are disallowed in the quantum world. An invertible version of the XOR operation takes two bits x and y as input and outputs the two bits x and x ⊕ y (where ⊕ denotes XOR of bits). Given the output (x, x ⊕ y), the input can be uniquely determined as (x, y) = (x, x ⊕ (x ⊕ y)), that is, by applying the reversible XOR operation once more.

Like XOR, all bit operations that build up a classical computer can be realized using reversible operations only. This gives us the (informal) assurance that quantum computers are at least as powerful as classical computers.
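The reversible XOR just described can be sketched in a few lines of Python; applying the map (x, y) ↦ (x, x ⊕ y) twice recovers the input, so the operation is its own inverse:

```python
def cnot(x, y):
    """Reversible XOR (controlled-NOT): (x, y) -> (x, x XOR y)."""
    return x, x ^ y

# Applying the operation twice recovers the original input for all inputs
for x in (0, 1):
    for y in (0, 1):
        out = cnot(x, y)
        assert cnot(*out) == (x, y)   # self-inverse
```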

Back to the business—the third axiom of quantum mechanics.

Definition 8.8.

Let U be a square matrix (that is, an m × m matrix for some m ∈ ℕ) with complex entries. The conjugate transpose of U is denoted by the symbol U†, that is, if U = (u_{ij}), then U† = (u_{ji}*). U is called unitary, if U†U = UU† = I, where I is the m × m identity matrix. Every unitary matrix U is invertible with U⁻¹ = U†, and preserves the inner product of ℂ^m, that is, 〈Uψ|Uφ〉 = 〈ψ|φ〉 for |ψ〉, |φ〉 ∈ ℂ^m.

Let A be a quantum system (like a quantum register) with Hilbert space ℂ^m. An m × m unitary matrix U defines a unitary linear transformation on ℂ^m taking a normalized vector |ψ〉 to a normalized vector U|ψ〉. Moreover, the transformation maps an orthonormal basis of ℂ^m to another orthonormal basis of ℂ^m (Exercise 8.4).

Axiom 8.3. Third axiom of quantum mechanics

A quantum system evolves unitarily, that is, any operation on a quantum mechanical system is a unitary transformation.

Example 8.3.

The Hadamard transform H on one qubit is defined as:

H|0〉 := (1/√2)(|0〉 + |1〉),  H|1〉 := (1/√2)(|0〉 – |1〉).

(Recall that a linear transformation is completely specified by its images of the elements of a basis.) If one takes |0〉 = (1, 0) and |1〉 = (0, 1), the Hadamard transform corresponds to the unitary matrix

H = (1/√2) ( 1   1
             1  –1 ).

By linearity, H transforms a general state |ψ〉 = a|0〉 + b|1〉 to the state

H|ψ〉 = (1/√2)((a + b)|0〉 + (a – b)|1〉).

Some other unitary operators are described in Exercises 8.5 and 8.6.
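The unitarity of H, the identity H² = I, and the action on a general state can all be checked numerically (a NumPy sketch; the amplitudes a, b are example values of our choosing):

```python
import numpy as np

# The Hadamard matrix of Example 8.3
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)

assert np.allclose(H.conj().T @ H, np.eye(2))   # unitary: H'H = I
assert np.allclose(H @ H, np.eye(2))            # H is its own inverse

# H maps a|0> + b|1> to ((a+b)|0> + (a-b)|1>)/sqrt(2)
a, b = 0.6, 0.8
psi = np.array([a, b])
assert np.allclose(H @ psi, np.array([a + b, a - b]) / np.sqrt(2))
```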

An important consequence of quantum mechanical dynamics is that cloning of a state of a system is not permissible. In other words, there does not exist an operator that copies an arbitrary state (content) of one quantum register to another.

Theorem 8.1. No-cloning theorem

For two n-bit registers A and B, there do not exist a unitary transform U of the composite system AB and a state |s〉 of B such that U(|ψ〉|s〉) = |ψ〉|ψ〉 for every state |ψ〉 of A.

Proof

Assume that such a state |s〉 of B and a unitary transform U of AB exist. Take two states |ψ₁〉 and |ψ₂〉 of A and a, b ∈ ℂ such that |aψ₁ + bψ₂〉 := a|ψ₁〉 + b|ψ₂〉 is a state of A. Then, U(|ψ₁〉|s〉) = |ψ₁〉|ψ₁〉 and U(|ψ₂〉|s〉) = |ψ₂〉|ψ₂〉. By linearity, we have U(|aψ₁ + bψ₂〉|s〉) = a|ψ₁〉|ψ₁〉 + b|ψ₂〉|ψ₂〉. Now, since U clones |aψ₁ + bψ₂〉 also, U(|aψ₁ + bψ₂〉|s〉) = |aψ₁ + bψ₂〉|aψ₁ + bψ₂〉 = a²|ψ₁〉|ψ₁〉 + ab|ψ₁〉|ψ₂〉 + ab|ψ₂〉|ψ₁〉 + b²|ψ₂〉|ψ₂〉. The two expressions for U(|aψ₁ + bψ₂〉|s〉) are different, unless a = 0, b = 1 or a = 1, b = 0.

8.2.4. Measurement

We have seen how to represent a quantum mechanical system and do operations on the system. Now comes the final part of the game, namely observing or measuring or reading the state of a quantum system. In classical computation, reading the value stored in a classical register is a trivial exercise—just read it! In quantum mechanics, this is not the case.

Axiom 8.4. Fourth axiom of quantum mechanics—the Born rule

Let A be a quantum mechanical system with an orthonormal basis {|0〉, |1〉, . . . , |m – 1〉}. Assume that A is in a state |ψ〉 = a₀|0〉 + a₁|1〉 + · · · + a_{m–1}|m – 1〉 with |a₀|² + · · · + |a_{m–1}|² = 1. A measurement of A at this state is a mechanism (or device) that outputs one of the integers 0, 1, . . . , m – 1, and i is output with probability |aᵢ|². If i is output by the measurement, the system collapses from the state |ψ〉 to the state |i〉 after the measurement.

This means that whatever the state |ψ〉 of A was before the measurement, the process of measurement can reveal only one of m possible integer values. Moreover, the measurement causes a total loss of information about the pre-measurement amplitudes ai. Thus, it is impossible to measure A repeatedly at the state |ψ〉 to see a statistical pattern in the occurrences of different values of i so as to guess the probabilities |ai|2.
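The Born rule can be simulated classically (a NumPy sketch; the measure function and the example state are our own illustrations):

```python
import numpy as np

rng = np.random.default_rng(0)

def measure(psi, rng):
    """Born rule: return outcome i with probability |a_i|^2 together with
    the collapsed post-measurement state |i>."""
    probs = np.abs(psi) ** 2
    probs = probs / probs.sum()        # guard against rounding error
    i = rng.choice(len(psi), p=probs)
    collapsed = np.zeros_like(psi)
    collapsed[i] = 1                   # the system collapses to |i>
    return int(i), collapsed

psi = np.array([1, 1j]) / np.sqrt(2)   # outcomes 0 and 1, each with prob 1/2
i, post = measure(psi, rng)
assert i in (0, 1)
assert np.isclose(np.vdot(post, post).real, 1.0)   # post-state is normalized
```

Note that the simulation keeps the amplitudes in memory; a physical measurement destroys them, which is exactly why repeated measurement cannot recover the |aᵢ|².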

If we open the room, we can see the Schrödinger cat in only one of the two possible states: |alive〉 or |dead〉. Well, then, what else can we expect? Quantum mechanics models the cat in the isolated room only as a system evolving under the unitary dynamics.

At first glance, this is rather frustrating. We claim that the system went through a series of classically meaningless states, but the classical states are all we can see. What is the guarantee that the system really evolved in the quantum mechanical way? Well, there is no guarantee actually. The solace is that the axioms of quantum mechanics can explain certain natural phenomena. Also it is perfectly consistent with the classical behaviour in that if the system A evolves classically and is measured at the state |i〉 (so that ai = 1 and aj = 0 for ji), measuring A reveals i with probability one and causes the system to collapse to the state |i〉, that is, to remain in the state |i〉 itself.

There is a positive side of the quantum mechanical axioms. A quantum mechanical system is inherently parallel. An n-bit classical register at any point of time can hold only one of the classical values |0〉, . . . , |2^n – 1〉. An n-bit quantum register, on the other hand, can simultaneously hold all these classical values, with respective probabilities. This inherent parallelism seems to impart a good deal of power to a computing device. Of course, as long as we cannot harness some physical objects to build a real quantum mechanical computing device, quantum computation continues to remain science fiction. But on an algorithmic level, the inherent parallelism of a (hypothetical) quantum computer can be exploited to do miracles, for example, to design a polynomial-time integer factorization algorithm. This is where we win—at least conceptually. Our failure to see a cat in the state (1/√2)(|alive〉 – |dead〉) should not bother us at all!

Measurement of a quantum register gives us a way to initialize a quantum register A to a state |ψ〉. Suppose that we get the value i upon measuring A. We then apply any unitary transform on A that changes A from the post-measurement state |i〉 to the desired state |ψ〉.

The measurement described in Axiom 8.4 is called measurement in the classical basis. The system A has, in general, many orthonormal bases other than the classical one {|0〉, . . . , |m – 1〉}. If B is any such basis, we can conceive of measuring A in the basis B. All we need to perform is to rewrite the state of A in terms of the new basis B. This can be achieved by applying to A a unitary transformation (the change-of-basis transformation) before the measurement in the classical basis is carried out.

A generalization of the Born rule is also worth mentioning here. Suppose that we have an (m + n)-bit quantum register A and we want to measure not all but some of the bits of A. To be more specific, let us say that we want to measure the leftmost m bits of A, though the generalized Born rule works for any arbitrary choice of m bit positions in the register A. Denoting by |i〉_m, i = 0, . . . , 2^m – 1, the canonical basis vectors for the left m bits and by |j〉_n, j = 0, . . . , 2^n – 1, those for the right n bits, a general state of A can be written as

|ψ〉 = Σ_{i,j} a_{i,j}|i, j〉_{m+n}

with Σ_{i,j} |a_{i,j}|² = 1 and with |i, j〉_{m+n} identified as |i〉_m|j〉_n = |i〉_m ⊗ |j〉_n. A measurement of the left m bits of A yields an integer i, 0 ≤ i ≤ 2^m – 1, with probability p_i = Σ_j |a_{i,j}|². Also this measurement causes A to collapse to the state (1/√p_i) Σ_j a_{i,j}|i〉_m|j〉_n.

Now, if we immediately apply the generalized Born rule once again on the right n bits of A, we get an integer j, 0 ≤ j ≤ 2^n – 1, with probability |a_{i,j}|²/p_i, and the system collapses to the state |i〉_m|j〉_n. The probability of getting |i〉_m|j〉_n by this two-step process is then p_i · |a_{i,j}|²/p_i = |a_{i,j}|². This is consistent with a single application of the original Born rule.
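The generalized Born rule for the left bits can likewise be simulated (a NumPy sketch with m = n = 1 and an example state of our choosing; the function name is ours):

```python
import numpy as np

rng = np.random.default_rng(1)

def measure_left(a, rng):
    """Generalized Born rule: measure the left part of a register whose
    state a[i, j] is indexed by the left value i and the right value j."""
    p = np.sum(np.abs(a) ** 2, axis=1)     # p_i = sum_j |a_ij|^2
    p = p / p.sum()                        # guard against rounding error
    i = rng.choice(a.shape[0], p=p)
    post = np.zeros_like(a)
    post[i] = a[i] / np.sqrt(p[i])         # renormalized conditional state
    return int(i), post

# 1+1 bit example: the state (|00> + |01> + |10>)/sqrt(3)
a = np.array([[1.0, 1.0], [1.0, 0.0]]) / np.sqrt(3)
i, post = measure_left(a, rng)
assert np.isclose(np.sum(np.abs(post) ** 2), 1.0)   # post-state is normalized
```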

8.2.5. The Deutsch Algorithm

We start with a general framework of doing computations using quantum registers. Suppose we want to compute a function f which requires an m-bit integer as input and which outputs an n-bit integer. A general function f need not be invertible, but we cannot afford non-invertible operations on quantum registers. This is why we work on an (m + n)-bit quantum register A in which the left m bits represent the input and the right n bits the output. Computing f(x) for a given x is tantamount to designing a unitary transformation U_f that acts on A and converts its state from |x〉_m|y〉_n to |x〉_m|f(x) ⊕ y〉_n, where ⊕ is the bitwise XOR operation, and where the subscripts (m and n) indicate the number of bits in the input or output part of A. It is easy to verify that U_f is unitary. Moreover, the inverse of U_f is U_f itself. For y = 0, we, in particular, have U_f(|x〉_m|0〉_n) = |x〉_m|f(x)〉_n.
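As a sanity check, U_f can be built explicitly as a permutation matrix on the 2^(m+n) classical basis states; being a permutation matrix, it is unitary, and applying it twice XORs f(x) into the output part twice, so U_f is its own inverse. A NumPy sketch (make_Uf and the sample f are our own names):

```python
import numpy as np

def make_Uf(f, m, n):
    """Permutation matrix for U_f|x>|y> = |x>|f(x) XOR y> on m+n bits."""
    dim = 2 ** (m + n)
    U = np.zeros((dim, dim))
    for x in range(2 ** m):
        for y in range(2 ** n):
            src = (x << n) | y            # basis index of |x>|y>
            dst = (x << n) | (f(x) ^ y)   # basis index of |x>|f(x) XOR y>
            U[dst, src] = 1
    return U

f = lambda x: x % 2              # a sample, non-invertible function f
U = make_Uf(f, 2, 1)             # m = 2 input bits, n = 1 output bit

assert np.allclose(U @ U, np.eye(8))      # U_f is its own inverse
assert np.allclose(U.T @ U, np.eye(8))    # unitary (real orthogonal)
```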

It may still be unclear to the reader what one really gains by using this quantum model. The answer lies in the parallelism inherent in a quantum register. In order to see how this parallelism can be exploited, we describe David Deutsch’s algorithm which, being the first known quantum algorithm, has enough historical importance to be included here in spite of its apparent irrelevance in the context of cryptology.

Assume that f : {0, 1} → {0, 1} is a function that operates on one bit and outputs one bit. There are four such functions: Two of these are constant functions (f(0) = f(1)) and the remaining two non-constant (f(0) ≠ f(1)). We are given a black box Df representing f. We don’t know which one of the four functions Df actually implements, but we can supply a bit to Df as input and read its output on this bit. Our task is to determine whether Df represents a constant function or not. Classically, we make two invocations of Df on the inputs 0 and 1 and make a comparison of the output values f(0) and f(1). It is impossible to solve the problem classically using only one invocation of the black box. The Deutsch algorithm makes this task possible using quantum computational techniques.

Following the general quantum computational model, we assume that D_f is a unitary transformation on a 2-bit register A (with m = n = 1) that computes D_f|x〉|y〉 = |x〉|f(x) ⊕ y〉, with the left (resp. the right) bit corresponding to the input (resp. the output) of f. Instead of supplying a classical input to D_f, we initialize the register A to the state

(H|1〉)(H|1〉) = (1/2)(|0〉 – |1〉)(|0〉 – |1〉).

Linearity shows that on this input, D_f ends its execution leaving A in the state

((–1)^f(0)/2)(|0〉 – (–1)^(f(0) ⊕ f(1))|1〉)(|0〉 – |1〉).

Here, f(0) ⊕ f(1) = 0 if and only if f is constant. We won’t measure A right now, but apply the Hadamard transform on the left bit. This transforms A to the state

±(1/√2)|1〉(|0〉 – |1〉) if f is constant, and ±(1/√2)|0〉(|0〉 – |1〉) otherwise.

Now, if we measure the input bit, we deterministically get the integer 1 or 0 according as f is constant or not. That’s it!
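The whole algorithm can be simulated classically. In the NumPy sketch below we initialize the register to (H|1〉)(H|1〉); this initialization is our assumption, chosen to be consistent with the outcome just stated (measuring 1 for a constant f and 0 otherwise):

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)   # Hadamard matrix
ket1 = np.array([0.0, 1.0])

def deutsch(f):
    """One call to D_f decides whether f: {0,1} -> {0,1} is constant."""
    psi = np.kron(H @ ket1, H @ ket1)          # initial state (H|1>)(H|1>)
    U = np.zeros((4, 4))                       # D_f|x>|y> = |x>|f(x) XOR y>
    for x in (0, 1):
        for y in (0, 1):
            U[2 * x + (f(x) ^ y), 2 * x + y] = 1
    psi = np.kron(H, np.eye(2)) @ (U @ psi)    # apply D_f, then H on left bit
    prob_left_1 = psi[2] ** 2 + psi[3] ** 2    # probability of measuring 1
    return round(float(prob_left_1))           # deterministically 0 or 1

assert deutsch(lambda x: 0) == 1               # constant
assert deutsch(lambda x: 1) == 1               # constant
assert deutsch(lambda x: x) == 0               # non-constant
assert deutsch(lambda x: 1 - x) == 0           # non-constant
```

Note that D_f is invoked exactly once, which is impossible classically.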

Deutsch’s algorithm solves a rather artificial problem, but it opened up the possibility of exploring a new paradigm of computation. To date, (good) quantum algorithms are known for many interesting computational problems. In the rest of this chapter, we concentrate on some of the quantum algorithms that have an impact on cryptology.

Exercise Set 8.2

8.1 Let S be a finite set and let l²(S) denote the set of all functions f : S → ℂ.
  1. Show that l²(S) is a Hilbert space under the inner product

     〈f|g〉 := Σ_{x ∈ S} f(x)* g(x),  where the star denotes complex conjugation.

  2. Let B := {δₓ | x ∈ S}, where δₓ(y) is 1 if y = x, and is 0 otherwise. Show that B is an orthonormal basis of l²(S).

8.2 Show that the vectors (1/√2)(|0〉 + |1〉) and (1/√2)(|0〉 – |1〉) form an orthonormal ℂ-basis of ℋ₂.
8.3 Show that (1/√2)(|00〉 + |11〉) is an entangled state of a 2-bit quantum register.
8.4Prove the following assertions.
  1. The matrix H of Example 8.3 is unitary.

  2. A unitary matrix preserves the inner product, that is, if U is an m × m unitary matrix and |ψ〉, |φ〉 ∈ ℂ^m, then 〈Uψ|Uφ〉 = 〈ψ|φ〉.

  3. The determinant of a unitary matrix has absolute value 1.

  4. Every eigenvalue of a unitary matrix has absolute value 1.

  5. An m × m matrix A is unitary if and only if the columns of A constitute an orthonormal basis of ℂ^m (over ℂ).

8.5
  1. Show that the following operators are unitary on a qubit. Also construct the corresponding transformation matrices.

    Identity operator: I|0〉 = |0〉, I|1〉 = |1〉.
    Exchange operator: X|0〉 = |1〉, X|1〉 = |0〉.
    Z operator: Z|0〉 = |0〉, Z|1〉 = –|1〉.
    Hadamard operator: H|0〉 = (1/√2)(|0〉 + |1〉), H|1〉 = (1/√2)(|0〉 – |1〉).

  2. Deduce the following identities:

  3. Let . Show that defines a unitary operator on a qubit and that , where the last X is the matrix of the exchange operator.

8.6Let A be an n-bit quantum register. Let us plan to number the bits of A as 1, . . . , n from left to right. One can apply the operators like X, Z, H of Exercise 8.5 on each individual bit of A. A qubit operation B applied on bit i of A will be denoted by Bi.
  1. Let Sij be the operator that swaps bit i with bit j. Show that

  2. Let C be the reversible XOR operation (also called the controlled-NOT operation) on a two-bit register A = (A₁A₂), that is, C|x〉|y〉 = |x〉|x ⊕ y〉. Show that C can be realized as

8.7Suppose that whenever you switch on your quantum computer, every bit in its registers is initialized to the state |0〉. Describe how you can use the operators I, X, Z and H defined in Exercise 8.5, in order to change the state of a qubit from |0〉 to the following:
  1. |1〉

  2. –|1〉

8.8 Let A be an n-bit quantum register in the state |0〉_n (that is, with every bit in the state |0〉). Show that the application of the Hadamard transform individually to each bit of A transforms A to the state |ψ〉 = (1/√(2^n))(|0〉 + |1〉 + · · · + |2^n – 1〉). This is precisely the state of A in which all of the 2^n possible outcomes in a measurement of A are equally likely. What happens if we apply H a second time individually to each bit of A, that is, what is H₁H₂ · · · Hₙ|ψ〉, where Hᵢ denotes the Hadamard transform on the i-th bit of A?
8.9We know that any arithmetic or Boolean operation can be implemented using AND and NOT gates. This exercise suggests a reversible way to implement these operations. The Toffoli gate is a function T : {0, 1}3 → {0, 1}3 that maps (x, y, z) ↦ (x, y, zxy), where ⊕ means XOR, and xy means AND of x and y. Thus, T flips the third bit, if and only if the first two bits are both 1.
  1. Show that T is a unitary transformation on a 3-bit quantum register. What is the inverse of T?

  2. Use T to realize the Boolean AND and NOT operations.
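The classical truth-table behaviour of the Toffoli gate, and the realizations of AND and NOT asked for in Exercise 8.9, can be sketched as follows (Python; the fixed bit settings used for AND and NOT are the standard ones):

```python
def toffoli(x, y, z):
    # T(x, y, z) = (x, y, z XOR (x AND y)): flip z exactly when x = y = 1.
    return (x, y, z ^ (x & y))

def AND(x, y):
    return toffoli(x, y, 0)[2]   # with z = 0, the third output bit is x AND y

def NOT(z):
    return toffoli(1, 1, z)[2]   # with x = y = 1, the third output is z XOR 1

# T is its own inverse, so the construction is reversible.
for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            assert toffoli(*toffoli(a, b, c)) == (a, b, c)
print(AND(1, 1), AND(1, 0), NOT(0), NOT(1))   # prints: 1 0 1 0
```

Since T composed with itself is the identity, T is its own inverse, which answers the second half of Part 1.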

8.3. Quantum Cryptography

We now describe the quantum key-exchange algorithm due to Bennett and Brassard. The original paper also describes a practical implementation of the algorithm using polarization of photons. For the moment, we do not highlight such implementation-specific issues, but describe the algorithm in terms of the conceptual computational units called qubits.

The usual actors Alice and Bob want to agree upon a shared secret using communication over an insecure channel. A third party, who gave her name as Carol, plans to eavesdrop during the transmission. Alice and Bob repeat the following steps. Here, H stands for the Hadamard transform.

Algorithm 8.1. Quantum key-exchange algorithm

Alice generates a random classical bit i ∈ {0, 1}.

Alice makes a random choice x ∈ {0, 1}.

Alice computes the quantum bit A := H^x|i〉.

Alice sends A to Bob.

Bob makes a random choice y ∈ {0, 1}.

Bob computes B := H^y A.

Bob measures B to get the classical bit j ∈ {0, 1}.

Bob sends y to Alice.

Alice sends x to Bob.

if (x = y) { Alice and Bob retain the bit i = j }

The algorithm works as follows. Alice generates a random bit i and a random decision x whether she is going to use the Hadamard transform H. If x = 0, she sends the quantum bit |0〉 or |1〉 to Bob. If x = 1, she sends either H|0〉 = (|0〉 + |1〉)/√2 or H|1〉 = (|0〉 – |1〉)/√2 to Bob. At this point Bob does not know whether Alice applied H before the transmission. So Bob makes a random guess y and accordingly skips/applies the Hadamard transform on the qubit received. If x = y = 0, then Bob has the qubit B = H^0H^0|i〉 = |i〉 and a measurement of this qubit reveals i with probability 1. On the other hand, if x = y = 1, then B = H²|i〉 = |i〉, since H² is the identity transform (Exercise 8.5). In this case also, Bob retrieves Alice’s classical bit i with certainty by measuring B.

If x ≠ y, then B is generated from Alice’s initial choice |i〉 using a single application of H, that is, B = H|i〉 in this case. A measurement of this bit outputs 0 or 1, each with probability 1/2, that is, Bob gathers no idea about the initial choice of Alice. So after it is established that x ≠ y, they both discard the bit.
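The three cases above can be checked with a small classical simulation of Algorithm 8.1, representing a qubit by its pair of amplitudes (a Python sketch; the seeded random generator is ours, used only for reproducibility):

```python
import math
import random

SQ = 1 / math.sqrt(2)
H = [[SQ, SQ], [SQ, -SQ]]

def apply(M, v):
    return [M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1]]

def measure(v, rng):
    # Born rule: outcome 0 with probability |v[0]|^2, outcome 1 otherwise.
    return 0 if rng.random() < v[0] ** 2 else 1

def bb84_iteration(rng):
    i = rng.randrange(2)                  # Alice's random classical bit
    x = rng.randrange(2)                  # Alice's choice: apply H or not
    A = [1.0, 0.0] if i == 0 else [0.0, 1.0]
    if x:
        A = apply(H, A)                   # A = H^x |i>
    y = rng.randrange(2)                  # Bob's guess
    B = apply(H, A) if y else A           # B = H^y A
    j = measure(B, rng)
    return i, x, y, j

rng = random.Random(1)
secret = []
for _ in range(2000):
    i, x, y, j = bb84_iteration(rng)
    if x == y:
        assert i == j                     # matching bases: Bob recovers i exactly
        secret.append(i)
print(len(secret), "bits kept out of 2000 iterations")
```

The assertion never fires: whenever x = y, Bob's qubit is |i〉 and the measurement is deterministic. About half of the iterations are kept, matching the discussion below.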

If we assume that x and y are uniformly chosen, Bob and Alice succeed in having x = y about half of the time. They eventually set up an n-bit secret after about 2n invocations of the above protocol. Table 8.1 illustrates a sample session between Alice and Bob. After 20 iterations of the above procedure, they agree upon the shared secret 0001110111.

Table 8.1. A sample session of the quantum key-exchange algorithm
Iteration   i   x   A      y   B      j   Common bit
    1       0   1   H|0〉   0   H|0〉   1
    2       0   0   |0〉    1   H|0〉   1
    3       0   1   H|0〉   1   |0〉    0   0
    4       0   1   H|0〉   0   H|0〉   0
    5       1   1   H|1〉   0   H|1〉   1
    6       0   0   |0〉    0   |0〉    0   0
    7       0   0   |0〉    0   |0〉    0   0
    8       1   0   |1〉    0   |1〉    1   1
    9       0   0   |0〉    1   H|0〉   0
   10       1   1   H|1〉   0   H|1〉   0
   11       0   1   H|0〉   0   H|0〉   1
   12       0   0   |0〉    1   H|0〉   0
   13       1   0   |1〉    1   H|1〉   1
   14       1   1   H|1〉   1   |1〉    1   1
   15       1   1   H|1〉   1   |1〉    1   1
   16       0   1   H|0〉   1   |0〉    0   0
   17       1   1   H|1〉   1   |1〉    1   1
   18       1   0   |1〉    0   |1〉    1   1
   19       0   1   H|0〉   0   H|0〉   0
   20       1   0   |1〉    0   |1〉    1   1

(Here H|0〉 = (|0〉 + |1〉)/√2 and H|1〉 = (|0〉 – |1〉)/√2 denote the superposition states.)

What remains to explain is how this protocol guards against eavesdropping by Carol. Let us model Carol as a passive adversary who intercepts the qubit A transmitted by Alice, investigates the bit to learn about Alice’s secret i, and subsequently transmits the qubit to Bob. In order to guess i, Carol mimics the role of Bob. At this point Carol does not know x, so she makes a guess z about x, accordingly skips/applies the Hadamard transform on the intercepted qubit in order to get a qubit C, measures C to get a bit value k, and sends the measured qubit D to Bob. (Recall from Theorem 8.1 that it is impossible for Carol to make a copy of A, work on this copy and transmit the original qubit A to Bob.) Bob receives D, assumes that it is the qubit A transmitted by Alice, and carries out his part of the work to generate the bit j. Bob and Alice later reveal x and y. If x ≠ y, they reject the bits obtained from this iteration anyway. Carol should also reject her bit k in this case. So let us concentrate only on the case that x = y. The introduction of Carol in the protocol changes A to D and hence Alice and Bob may eventually agree upon distinct bits. A sample session of the protocol in the presence of Carol is illustrated in Table 8.2. The three parties generate their secrets as:

Alice:   0110 0111 1000 1011
Bob:     0101 1101 1100 1011
Carol:   0100 0101 0100 1011

Table 8.2. Eavesdropping during a key-exchange session
Iteration   i   x   A      z   C = H^z A   k   D     y   B = H^y D   j
    1       0   1   H|0〉   1   |0〉         0   |0〉   1   H|0〉        0
    2       1   0   |1〉    0   |1〉         1   |1〉   0   |1〉         1
    3       1   0   |1〉    1   H|1〉        0   |0〉   0   |0〉         0
    4       0   1   H|0〉   0   H|0〉        0   |0〉   1   H|0〉        1
    5       0   1   H|0〉   1   |0〉         0   |0〉   1   H|0〉        1
    6       1   1   H|1〉   1   |1〉         1   |1〉   1   H|1〉        1
    7       1   1   H|1〉   0   H|1〉        0   |0〉   1   H|0〉        0
    8       1   0   |1〉    0   |1〉         1   |1〉   0   |1〉         1
    9       1   1   H|1〉   0   H|1〉        0   |0〉   1   H|0〉        1
   10       0   1   H|0〉   0   H|0〉        1   |1〉   1   H|1〉        1
   11       0   0   |0〉    1   H|0〉        0   |0〉   0   |0〉         0
   12       0   0   |0〉    0   |0〉         0   |0〉   0   |0〉         0
   13       1   1   H|1〉   1   |1〉         1   |1〉   1   H|1〉        1
   14       0   0   |0〉    0   |0〉         0   |0〉   0   |0〉         0
   15       1   0   |1〉    0   |1〉         1   |1〉   0   |1〉         1
   16       1   0   |1〉    1   H|1〉        1   |1〉   0   |1〉         1

In this example, Alice and Bob’s shared secrets differ in five bit positions. Carol’s intervention causes a shared bit to differ with a probability of 3/8 (Exercise 8.11). Thus, the more Carol eavesdrops, the more mismatched bits she introduces in the secret shared by Alice and Bob.

Once Alice and Bob generate a shared secret of the desired bit length, they can check for the equality of their secret values without revealing them. For example, if the shared secret is a 64-bit DES key, Alice can send Bob one or more plaintext–ciphertext pairs generated by the DES algorithm using her shared key. Bob also generates the ciphertexts on Alice’s plaintexts using his secret key. If the ciphertexts generated by Bob differ from those generated by Alice, Bob becomes confident that their shared secrets are different and this happened because of the presence of some adversary (or because of communication errors). They then repeat the key-exchange protocol.

Another possible way in which Alice and Bob can gain confidence about the equality of their shared secrets is the use of parity checks. Suppose Alice breaks up her secret into blocks of eight bits, computes the parity bit of each block and sends these bits to Bob. Bob generates the parity bits on the blocks of his secret and compares the two sets of parity bits. If the shared secrets of Alice and Bob differ, this parity check reveals the fact with high probability.
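As an illustration (a Python sketch, using the two secrets from the eavesdropped session above), a block that happens to differ in an even number of positions escapes detection, which is why the parity check succeeds only with high probability:

```python
def parities(bits, block=8):
    # One parity bit (the XOR of all bits) per 8-bit block of the secret.
    return [sum(bits[k:k + block]) % 2 for k in range(0, len(bits), block)]

alice = [0,1,1,0, 0,1,1,1, 1,0,0,0, 1,0,1,1]   # Alice's 16-bit secret
bob   = [0,1,0,1, 1,1,0,1, 1,1,0,0, 1,0,1,1]   # Bob's secret from the same session

pa, pb = parities(alice), parities(bob)
print(pa, pb, "match" if pa == pb else "mismatch detected")
```

Here the first 8-bit block differs in four positions and so passes the parity test unnoticed, while the second block differs in a single position and is caught: the parity lists are [1, 0] and [1, 1].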

A minor variant of the key-exchange algorithm just described comes with an implementation strategy. The polarization of a photon is measured by an angle θ, 0° ≤ θ < 180°.[1] A photon polarized at an angle θ passes through a φ-filter with the probability cos²(φ – θ) and gets absorbed in the filter with the probability sin²(φ – θ). Therefore, photons polarized at the angles 0°, 90°, 45°, 135° can be used to represent the quantum states |0〉, |1〉, H|0〉, H|1〉, respectively. Alice and Bob use 0°- and 45°-filters. Alice makes a random choice (x) between the two filters. If x = 0, she sends a photon polarized at an angle 0° or 90°. If x = 1, a photon polarized at an angle 45° or 135° is sent. When Bob receives the photon transmitted by Alice, he makes a random guess y. If y = 0, he uses the 0°-filter to detect its polarization, and if y = 1, he uses the 45°-filter. Then, Alice and Bob reveal their choices x and y, and if the two choices agree, they share a common secret bit. See Exercise 8.12 for a mathematical formulation of this strategy.

[1] Ask a physicist!

One of the most startling features of this Bennett–Brassard algorithm (often called the BB84 algorithm) is that there have been successful experimental implementations of the strategy. The first prototype was built by the authors themselves at the T. J. Watson Research Center, using a quantum channel of length 32 cm. Using longer channels requires many technological barriers to be overcome. For example, fiber-optic cables tend to weaken and may even destroy the polarization of photons. Using boosters to strengthen the signal is impossible in the quantum-mechanical world, since doing so produces an effect similar to eavesdropping. Interference patterns (instead of polarization) have been proposed and utilized to build longer quantum channels for key exchange. At present, Stucki et al. [293] hold the record for performing quantum key exchange over an (underwater) channel of length 67 km between Geneva and Lausanne.

Exercise Set 8.3

8.10We have exploited the property that H² = I in order to prove the correctness of the quantum key-exchange algorithm. Exercise 8.5 lists some other operators (X and Z) which also satisfy the same property (X² = Z² = I). Can one use one of these transforms in place of H in the quantum key-exchange algorithm?
8.11Assume that Carol eavesdrops (in the manner described in the text) during the execution of the quantum key-exchange protocol between Alice and Bob. Derive for different choices of i, x and z the following probabilities Pixz of having i ≠ j in the case x = y.
i   x   z   Pixz
0   0   0    0
0   0   1   1/2
0   1   0   1/2
0   1   1   1/2
1   0   0    0
1   0   1   1/2
1   1   0   1/2
1   1   1   1/2

If all these choices of i, x, z are equally likely, show that the probability that Carol introduces a mismatch (that is, i ≠ j) in a shared bit during a random execution of the key-exchange protocol with x = y is 3/8.

(Note that if x = y = z = 0, that is, if the execution of the algorithm proceeds entirely in the classical sense, Carol goes unnoticed. It is the application of the classically meaningless Hadamard transform that introduces the desired security in the protocol.)

8.12In the key-exchange algorithm described in the text, Bob (and also Carol) always measures qubits in the classical basis {|0〉, |1〉}. Now, consider the following variant of this algorithm. Alice sends, as before, one of the four qubits |0〉, |1〉, H|0〉, H|1〉, depending on her choice of i and x. Bob, upon receiving the qubit A, generates a random guess y ∈ {0, 1}. If y = 0, Bob measures A in the classical basis, whereas if y = 1, Bob measures A in the basis {H|0〉, H|1〉}. After this, they exchange x and y, and retain/discard the bits as in the original algorithm.
  1. Assume that there is no eavesdropping. Argue that this modified strategy works, that is, if x = y, we have i = j, whereas if x ≠ y, then i = j with probability 1/2.

  2. Explain the role of a passive adversary (Carol) in this modified strategy.

  3. Calculate for this variant the probability that Carol introduces an error in a shared bit (when x = y).

8.4. Quantum Cryptanalysis

Quantum parallelism has been effectively exploited to design fast (polynomial-time) algorithms for some of the intractable mathematical problems discussed in Chapter 4. With the availability of quantum computers, cryptographic systems that derive their security from the intractability of these problems will become unusable (completely insecure). Nobody, however, has a proof that these intractable problems cannot have fast classical algorithms. It is interesting to wait and see which (if either) is invented first: a quantum computer or a polynomial-time classical algorithm.

Let us set up some terminology for the rest of this chapter. Let P be a unitary operator on a qubit. One can apply P individually on the i-th bit of an n-bit register. In this case, we denote the operation by Pi. If Pi is operated for each i = 1, . . . , n (in succession or simultaneously), then we abbreviate P1 · · · Pn by the short-hand notation P(n). The parentheses distinguish the operation from Pn which is the n-fold application of P on a single qubit.

If P and Q are unitary transforms on n1- and n2-bit quantum registers respectively, we let P ⊗ Q denote the unitary transform on an (n1 + n2)-bit register, with P operating on the left n1 bits and Q on the right n2 bits of the register.

8.4.1. Shor’s Algorithm for Computing Period

Let N := 2^n for some n ∈ ℕ. Let f be an integer-valued function defined on ℤ, periodic with (least) period r, that is, f(x + kr) = f(x) for every x, k ∈ ℤ. Suppose further that 1 ≪ r ≤ 2^(n/2) and also that f(0), f(1), . . . , f(r – 1) are pairwise distinct. Shor proposed an algorithm for an efficient computation of the period r in this case.

Let’s first look at the problem classically. If one evaluates f at randomly chosen points, by the birthday paradox (Exercise 2.172) one requires O(√r) evaluations of f on an average in order to find two different integers x and y with f(x) = f(y). But then r | (x – y). If sufficiently many such pairs (x, y) are available, the period can be obtained by computing the gcd of the integers x – y. If r is large, say, r = O(2^(n/2)), this gives us an algorithm for computing r in expected time exponential in n. Shor’s quantum algorithm determines r in expected time polynomial in n.
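This classical collision search can be sketched as follows (Python; the toy function and the parameters are ours):

```python
import math
import random
from functools import reduce

def classical_period(f, N, trials, rng):
    # Evaluate f at random points; a collision f(x) = f(y) means r divides x - y.
    seen, diffs = {}, []
    for _ in range(trials):
        x = rng.randrange(N)
        v = f(x)
        if v in seen and seen[v] != x:
            diffs.append(abs(x - seen[v]))
        seen[v] = x
    # The gcd of enough random multiples of r is (very likely) r itself.
    return reduce(math.gcd, diffs) if diffs else None

r = 20                                   # hidden period
f = lambda x: x % r                      # toy function: f(0), ..., f(r-1) distinct
period = classical_period(f, 1 << 16, 400, random.Random(7))
print(period)
```

Every collected difference is a multiple of r, so the returned gcd is always a multiple of r, and with many collisions it is almost surely r itself. The number of evaluations needed grows like √r, which is exponential in n for large periods.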

Let us assume that we have an oracle Uf which, on input the 2n-bit value |x〉n|y〉n, computes |x〉n|f(x) ⊕ y〉n. We prepare a 2n-bit register A in the state |0〉n|0〉n. Then, we apply the Hadamard transform H(n) on the left n bits. By Exercise 8.8, the state of A becomes

    (1/√N) Σ_{x=0}^{N–1} |x〉|0〉n.

Supplying this state as the input to the oracle Uf yields the state

    (1/√N) Σ_{x=0}^{N–1} |x〉|f(x)〉.
We then measure the output register (right n bits). By the generalized Born rule, we get a value f(x0) for some x0 ∈ {0, 1, . . . , r – 1}, and the state of the register A collapses to the uniform superposition of all those |x〉|f(x)〉 for which f(x) = f(x0). By the given periodicity properties of f, the post-measurement state of the input register (left n bits) can be written as

Equation 8.1

    |ψ〉 = (1/√M) Σ_{j=0}^{M–1} |x0 + jr〉

for some M determined by the relations:

x0 + (M – 1)r < N ≤ x0 + Mr.

This is an interesting state, for if we were allowed to make copies of this state and measure the different copies, we could collect some values x0 + j1r, . . . , x0 + jkr, which in turn would reveal r with high probability. But the no-cloning theorem disallows making copies of quantum states. Shor proposed a trick to work around this difficulty. He considered the following transform:

Equation 8.2

    F|x〉 = (1/√N) Σ_{y=0}^{N–1} e^(2πixy/N) |y〉

By Exercise 8.13, F is a unitary transform. F is known as the Fourier transform. Applying F to State (8.1) transforms the input register to the state

    F|ψ〉 = (1/√(MN)) Σ_{y=0}^{N–1} Σ_{j=0}^{M–1} e^(2πi(x0 + jr)y/N) |y〉.

A measurement of this state gives an integer y ∈ {0, 1, . . . , N – 1} with the probability

    p(y) = (1/(MN)) |Σ_{j=0}^{M–1} e^(2πijry/N)|².
Application of the Fourier transform to State (8.1) helps us to concentrate the probabilities of measurement outcomes in strategic states. More precisely, consider a value of y of the form yk = kN/r + ∊k, where k ∈ ℤ and –1/2 ≤ ∊k < 1/2, that is, a value of y close to an integral multiple of N/r. In this case,

    p(yk) = (1/(MN)) |Σ_{j=0}^{M–1} e^(2πijr∊k/N)|².

The last summation is that of a geometric series and we have

    p(yk) = (1/(MN)) · sin²(πMr∊k/N) / sin²(πr∊k/N).

Now, we use the inequalities (2/π)x ≤ sin x ≤ x for 0 ≤ x ≤ π/2 and the facts that Mr ≈ N and that |∊k| ≤ 1/2 to get

    p(yk) ≥ 4/(π²r).
Since N/r has about r positive integral multiples less than N and each such multiple kN/r has a closest integer yk, the probability that we obtain one such yk as the outcome of the measurement is at least 4/π² = 0.40528 . . . , that is, after O(1) iterations of the above procedure we get some yk. The Fourier transform thus raises the likelihood of getting some yk to a level bounded below by a positive constant.

What remains is to show that r can be retrieved from such a useful observation yk. We have |yk/N – k/r| = |∊k|/N ≤ 1/(2N) < 1/(2r²). If a/b and c/d are two distinct rationals with b, d ≤ r and with |yk/N – a/b| < 1/(2r²) and |yk/N – c/d| < 1/(2r²), then by the triangle inequality we have |a/b – c/d| < 1/r². On the other hand, since a/b ≠ c/d, we have |a/b – c/d| = |ad – bc|/(bd) ≥ 1/(bd) ≥ 1/r², a contradiction. Therefore, since r² < N, there is a unique rational k/r satisfying |yk/N – k/r| < 1/(2r²), and this rational k/r can be determined by efficient classical algorithms, for example, using the continued fraction expansion[2] of yk/N.

[2] Consult Zuckerman et al. [316] to learn about continued fractions and their applications in approximating real numbers.

If gcd(k, r) = 1, we get r. We can verify this by checking whether f(x) = f(x + r). If gcd(k, r) > 1, we get a factor of r. Repeating the entire procedure gives another k′/r, from which we get (hopefully) another factor of r (if not r itself). After a few (O(1)) iterations, we obtain r as the lcm of the factors obtained.
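The classical post-processing, recovering k/r from a measured yk and combining several runs by an lcm, can be sketched with Python's Fraction type, whose limit_denominator method implements exactly the continued-fraction approximation needed here (the toy values of N, r and the "measured" values of k are ours):

```python
from fractions import Fraction
import math

N = 1 << 12            # N = 2^n with n = 12
r = 21                 # hidden period; the algorithm only knows r <= 2^(n/2) = 64
bound = 1 << 6

factors = []
for k in [7, 6, 5]:    # pretend these values of k arose from three measurements
    y_k = round(k * N / r)                       # y_k is within 1/2 of kN/r
    frac = Fraction(y_k, N).limit_denominator(bound)
    factors.append(frac.denominator)             # equals r / gcd(k, r)

print(factors, "lcm =", math.lcm(*factors))      # the factors combine to r = 21
```

The first two measurements yield only the factors 3 and 7 of r (because gcd(k, r) > 1), but their lcm, 21, already equals r, illustrating the final sentence above.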

Much of the quantum magic is obtained by the use of the Fourier transform F on a suitably prepared quantum register. The question is then how easy it is to implement F. We will not go into the details, but only mention that a circuit consisting of basic quantum gates and of size O(n²) can be used to realize the Fourier transform (cf. Exercise 8.14).

To sum up, we have a polynomial-time (in n) randomized quantum algorithm for computing the period r of f. This leads to efficient quantum algorithms for solving many classically intractable problems of cryptographic significance.

8.4.2. Breaking RSA

Let m = pq with p, q distinct primes. We have φ(m) = (p – 1)(q – 1). Choose an RSA key pair (e, d) with gcd(e, φ(m)) = 1 and ed ≡ 1 (mod φ(m)). Given a message a ∈ ℤm, the ciphertext message is b ≡ a^e (mod m). The task of a cryptanalyst is to compute a from the knowledge of m, e and b. If gcd(b, m) > 1, then this gcd is a non-trivial factor of m. So assume that gcd(b, m) = 1, that is, b ∈ ℤm*. But then a ∈ ℤm* also. Since b ≡ a^e (mod m), b is in the subgroup of ℤm* generated by a. Similarly, a ≡ b^d (mod m), that is, a is in the subgroup of ℤm* generated by b. It follows that these two subgroups are equal and, in particular, the multiplicative orders of a and b modulo m are the same. This order, call it r, divides φ(m) and hence is ≤ (p – 1)(q – 1) < m.

Choose n ∈ ℕ with N := 2^n ≥ m² > r². The function sending x ↦ b^x (mod m) is periodic of (least) period r. By Shor’s algorithm, one computes r efficiently. Since gcd(e, φ(m)) = 1 and r | φ(m), we have gcd(e, r) = 1, that is, using the extended gcd algorithm one obtains an integer d′ with d′e ≡ 1 (mod r). But then b^d′ ≡ a^(d′e) ≡ a (mod m).

The private key d is the inverse of e modulo φ(m). It is not necessary to compute d for decrypting b. The inverse d′ of e modulo r = ordm(a) = ordm(b) suffices.
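The attack can be sketched as follows (a Python sketch with a toy modulus; brute-force order computation stands in for the quantum period-finding step):

```python
from math import gcd

# A toy RSA instance (illustrative parameters, far too small for real use).
p, q = 61, 53
m = p * q                       # modulus 3233
phi = (p - 1) * (q - 1)
e = 17                          # public exponent, gcd(e, phi) = 1
a = 1234                        # plaintext
b = pow(a, e, m)                # ciphertext

# Shor's algorithm would return r = ord_m(b); here we find it by brute force.
r = 1
while pow(b, r, m) != 1:
    r += 1

assert gcd(e, r) == 1           # guaranteed, since r divides phi(m)
d1 = pow(e, -1, r)              # d' with d'e = 1 (mod r)
recovered = pow(b, d1, m)
print(recovered == a)           # plaintext recovered without knowing phi(m)
```

Indeed b^d′ = a^(d′e) = a^(1 + tr) = a · (a^r)^t ≡ a (mod m), since a and b have the same order r.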

8.4.3. Factoring Integers

Let m be a composite integer that we want to factor. Choose a non-zero integer a ∈ ℤm. If gcd(a, m) > 1, then we already know a non-trivial factor of m. So assume that gcd(a, m) = 1, that is, a ∈ ℤm*. Let r := ordm(a).

As in the case of breaking RSA, choose n ∈ ℕ with N := 2^n ≥ m² > r². The function x ↦ a^x (mod m) is periodic of least period r. Shor’s algorithm computes r. If r is even, we can write:

(a^(r/2) – 1)(a^(r/2) + 1) ≡ 0 (mod m).

Since ordm(a) = r, we have a^(r/2) – 1 ≢ 0 (mod m). If we also have a^(r/2) + 1 ≢ 0 (mod m), then gcd(a^(r/2) + 1, m) is a non-trivial factor of m. It can be shown that the probability of finding an even r with a^(r/2) + 1 ≢ 0 (mod m) is at least half (cf. Exercise 4.9). Thus, trying a few integers a, one can factor m.
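A classical sketch of the whole factoring procedure (Python; brute-force order computation again stands in for Shor's quantum period finding):

```python
from math import gcd

def shor_factor(m, a):
    # Classical stand-in: compute r = ord_m(a) by brute force instead of quantumly.
    g = gcd(a, m)
    if g > 1:
        return g                          # lucky: a already shares a factor with m
    r = 1
    while pow(a, r, m) != 1:
        r += 1
    if r % 2 == 1:
        return None                       # odd order: try another a
    t = pow(a, r // 2, m)
    if t == m - 1:
        return None                       # a^(r/2) = -1 (mod m): try another a
    return gcd(t + 1, m)                  # a non-trivial factor of m

m = 3233                                  # = 61 * 53
a, f = 2, None
while f is None:
    f = shor_factor(m, a)
    a += 1
print(f, m // f)
```

For m = 3233 the base a = 2 fails (its order r is even but 2^(r/2) ≡ –1 (mod m)), while a = 3 succeeds, matching the at-least-half success probability quoted above.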

8.4.4. Computing Discrete Logarithms

A variant of Shor’s algorithm in Section 8.4.1 can be used to compute discrete logarithms in the finite field 𝔽q, q = p^s, p prime, s ∈ ℕ. For the sake of simplicity, let us concentrate only on prime fields (s = 1). Let g be a generator of ℤp*; our task is to compute, for a given a ∈ ℤp*, an integer r with a ≡ g^r (mod p). We assume that p is a large prime; in particular, p is odd.

Choose n ∈ ℕ with N := 2^n satisfying p < N < 2p. We use a 3n-bit quantum register A in which the left 2n bits constitute the input part and the right n bits the output part. The input part is initialized to the uniform superposition of all pairs (x, y) with x, y ∈ {0, 1, . . . , N – 1}, that is, A has the initial state:

    (1/N) Σ_{x=0}^{N–1} Σ_{y=0}^{N–1} |x〉|y〉|0〉n
(see Exercise 8.15). Then, we use an oracle

Uf : |x〉n|y〉n|z〉n ↦ |x〉n|y〉n|f(x, y) ⊕ z〉n

to compute the function f(x, y) := g^x a^(–y) (mod p) in the output register. Applying Uf transforms A to the state

    (1/N) Σ_{x=0}^{N–1} Σ_{y=0}^{N–1} |x〉|y〉|g^x a^(–y) rem p〉.

Measurement of the output register now gives a value z ≡ g^k (mod p) for some k ∈ {0, 1, . . . , p – 2} and causes the input register to jump to the state

    (1/√(p–1)) Σ_{y=0}^{p–2} |(ry + k) rem (p–1)〉 |y〉.
Note that g^x a^(–y) ≡ g^k (mod p) if and only if x – ry ≡ k (mod p – 1), that is, only those pairs (x, y) that satisfy this congruence contribute to the post-measurement state. For each value of y modulo p – 1, we get a unique x ≡ ry + k (mod p – 1), that is, there are exactly p – 1 such pairs (x, y).

If we were allowed to make copies of this state and observe two copies separately, we would get pairs (x1, y1) and (x2, y2) with x1 – ry1 ≡ x2 – ry2 ≡ k (mod p – 1). Now, if gcd(y1 – y2, p – 1) = 1, we would get r ≡ (y1 – y2)^(–1)(x1 – x2) (mod p – 1). But we are not allowed to copy quantum states. So Shor used his old trick, that is, applied the Fourier transform F ⊗ F

    F ⊗ F : |x〉|y〉 ↦ (1/N) Σ_{u=0}^{N–1} Σ_{v=0}^{N–1} e^(2πi(xu + yv)/N) |u〉|v〉

to obtain the state

    (1/(N√(p–1))) Σ_{y=0}^{p–2} Σ_{u=0}^{N–1} Σ_{v=0}^{N–1} e^(2πi(((ry + k) rem (p–1))u + yv)/N) |u〉|v〉.

A measurement of the input register at this state yields a pair (u, v) with probability:

Equation 8.3

    p_{u,v} = (1/(N²(p–1))) |Σ_{y=0}^{p–2} e^(2πi(((ry + k) rem (p–1))u + yv)/N)|²


As in Shor’s period-finding algorithm, we now need to identify a set of useful pairs (u, v) which are sufficiently many in number so as to make the probability of observing one of them bounded below by a positive constant. We also need to demonstrate how a useful pair can reveal the unknown discrete logarithm r of a. The jugglery with inequalities and approximations is much more involved in this case. Let us still make a patient attempt to see the end of the story.

First, we eliminate one of x, y from Equation (8.3). Since x ≡ ry + k (mod p – 1) and 0 ≤ x ≤ p – 2, we have x = (ry + k) rem (p – 1). Let j be the integer closest to u(p – 1)/N, that is, u(p – 1) = jN + ∊ with –N/2 < ∊ ≤ N/2. This yields

Equation 8.4


where

Equation 8.5


Since is an integer, substituting Equation (8.4) in Equation (8.3) gives

Writing S = lN + σ with –N/2 < σ ≤ N/2 then gives

We now impose the usefulness conditions on u, v:

Equation 8.6


Equation 8.7


Involved calculations show that the probability pu,v for a (u, v) satisfying these two conditions is at least . Let us now see how many pairs (u, v) satisfy the conditions. From Equation (8.5), it follows that for each u there exists a unique v, such that Condition (8.6) is satisfied. Condition (8.7), on the other hand, involves only u. If w := v2(p – 1), then 2w must divide ∊. For each multiple of 2w not exceeding N/12 in absolute value, we get 2w distinct solutions for u modulo N. (We are solving for u the congruence u(p – 1) ≡ ∊ (mod 2n).) There is a total of at least N/12 of them. Therefore, the probability of making any one of the useful observations (u, v) is at least , since N < 2p.

We finally explain the extraction of r from a useful observation (u, v). Condition (8.6) and Equation (8.5) give . Dividing throughout by N and using the fact that u(p – 1) = jN + ∊, we get

that is, the fractional part of must lie between and . The measurement of the input gives us v and we know N. We approximate to the nearest multiple of and get rj ≡ λ (mod p – 1). Now, j, being the integer closest to u(p – 1)/N, is also known to us. If gcd(j, p – 1) = 1, we have r ≡ j^(–1)λ (mod p – 1). We don’t go into the details of determining the likelihood of the invertibility of j modulo p – 1. A careful analysis shows that Shor’s quantum discrete-log algorithm runs in probabilistic polynomial time (in n).

Exercise Set 8.4

8.13Let F be the Fourier Transform (8.2). For basis vectors |x〉 and |x′〉, show that

Conclude that F is a unitary transform.

8.14Let N = 2^n. Let x, y ∈ {0, 1, . . . , N – 1} have binary expansions (xn–1 · · · x1x0)2 and (yn–1 · · · y1y0)2 respectively.
  1. Show that xy/N equals an integer plus the quantity

    yn–1 (.x0) + yn–2(.x1x0) + yn–3(.x2x1x0) + · · · + y0(.xn–1 xn–2 . . . x0),

    where (.xk xk–1 . . . x0) denotes the binary fraction xk/2 + xk–1/4 + · · · + x0/2^(k+1).

  2. Deduce that the quantum Fourier Transform (8.2) can be written as

    where the i-th expression in parentheses applies to the i-th bit from the left.

8.15Let n ∈ ℕ, N := 2^n and f : {0, 1}n → {0, 1}. Consider an (n + 1)-bit quantum register with input consisting of the left n bits and the output the rightmost bit. Suppose there is an oracle Uf that takes an n-bit input x and outputs the bit:

First prepare the register in the state . Then, apply Uf on this register and finally measure the output bit. Describe the state of the input register after this measurement depending on the outcome of the measurement.

8.16Recall that the Fourier Transform (8.2) is defined for N equal to a power of 2. It turns out that for such values of N the quantum Fourier transform is easy to implement. For this exercise, assume hypothetically that one can efficiently implement F for other values of N too. In particular, take N = p – 1 in Shor’s quantum discrete-log algorithm. Show that in this case, the probability pu,v of Equation (8.3) becomes:

Conclude that an outcome (u, v) of measuring the input register yields

r ≡ –u–1v (mod p – 1),

provided gcd(u, p – 1) = 1.

Chapter Summary

This chapter is a gentle introduction to the recent applications of quantum computation in public-key cryptography. These developments have both good and bad consequences for cryptologists. It is still a big question whether a quantum computer can ever be manufactured. So, at present, a study of quantum cryptology is mostly theoretical in nature.

Quantum mechanics is governed by a set of four axioms that define a quantum system and prescribe its properties. A quantum bit (qubit) is a quantum mechanical system that has two orthogonal states |0〉 and |1〉. A quantum register is a collection of qubits of a fixed size.

As an example of what we can gain by using quantum algorithms, we first describe the Deutsch algorithm that determines whether a function f : {0, 1} → {0, 1} is constant by invoking f only once. A classical algorithm requires two invocations.

Next we present the BB84 algorithm for key exchange over a quantum mechanical channel. The algorithm guarantees perfect security. This algorithm has been implemented in hardware, and key agreement is carried out over a channel of length 67 km.

Finally, we describe Shor’s polynomial-time quantum algorithms for factoring integers and for computing discrete logarithms in finite fields. These algorithms are based on a technique called quantum Fourier transform.

If quantum computers can ever be realized, RSA and most other popular cryptosystems described and not described in this book will forfeit all security guarantees. And what will happen to this book? If you don’t possess a copy of this wonderful book, just rush to your nearest book store now—they have not yet mastered the quantum technology!

Suggestions for Further Reading

There was a time when the newspapers said that only twelve men understood the theory of relativity. I do not believe there ever was such a time . . . On the other hand, I think I can safely say that nobody understands quantum mechanics.

—Richard Feynman, The Character of Physical Law, BBC, 1965

Quantum mechanics came into existence when Werner Heisenberg, at the age of 25, proposed the uncertainty principle in 1927. It created an immediate stir in the physics community. Eventually Heisenberg and Niels Bohr came up with an interpretation of quantum mechanics, known as the Copenhagen interpretation. While many physicists (like Max Born, Wolfgang Pauli and John von Neumann) subscribed to this interpretation, many other eminent ones (including Albert Einstein, Erwin Schrödinger, Max Planck and Bertrand Russell) did not. Interested readers may consult textbooks by Sakurai [255] and Schiff [258] to study this fascinating area of fundamental science.[3]

[3] Well! We are not physicists. These books are followed in graduate and advanced undergraduate courses in many institutes and universities.

For a comprehensive treatment of quantum computation (including cryptographic and cryptanalytic quantum algorithms), we refer the reader to the book by Nielsen and Chuang [218]. Mermin’s paper [197] and course notes [198] are also good sources for learning quantum mechanics and computation, and are suitable for computer scientists. Preskill’s course notes [244] are also useful, though a bit more physics-oriented. The very readable article [243] by Preskill on the realizability of quantum computers is also worth mentioning in this context. The first known quantum algorithm is due to Deutsch [75].

Bennett and Brassard’s quantum key-exchange algorithm (BB84) appeared in [20]. The implementation due to Stucki et al. of this algorithm is reported in [293].

Shor’s polynomial-time quantum factorization and discrete-log algorithms are described in [271]. All the details missing in Section 8.4.4 can be found in this paper. No polynomial-time quantum algorithms are known to solve the elliptic curve discrete logarithm problem. Proos and Zalka [245] present an extension of Shor’s algorithm for a special class of elliptic curves. See [146] for an adaptation of this algorithm applicable to fields of characteristic 2.

Appendices

 


A. Symmetric Techniques

A.1Introduction
A.2Block Ciphers
A.3Stream Ciphers
A.4Hash Functions

Sour, sweet, bitter, pungent, all must be tasted.

—Chinese Proverb

Unless we change direction, we are likely to end up where we are going.

—Anonymous

Not everything that can be counted counts, and not everything that counts can be counted.

—Albert Einstein

A.1. Introduction

Cryptography, today, cannot bank solely on public-key (that is, asymmetric) algorithms. Secret-key (that is, symmetric) techniques also have important roles to play. This chapter is an attempt to introduce to the readers some rudimentary notions about symmetric cryptography. The sketchy account that follows lacks both the depth and the breadth of a comprehensive treatment. Given the focus of this book, Appendix A could have been omitted. Nonetheless, some attention to the symmetric technology is never irrelevant for any book on cryptology.

It remains debatable whether hash functions can be treated under the banner of this chapter—a hash function need not even use a key. If the reader is willing to accept symmetric as an abbreviation for not asymmetric, some justifications can perhaps be given. How does it matter anyway?

A.2. Block Ciphers

Block ciphers encrypt plaintext messages in blocks of fixed lengths and are used even more widely than public-key encryption routines. In a sense, public-key encryption is also block encryption. Since public-key routines are much slower than (secret-key) block ciphers, it is customary to use public-key algorithms only in specific situations, for example, for encrypting single blocks of data, like keys of symmetric ciphers.

In the rest of this chapter, we use the word bit in the conventional sense, that is, to denote a quantity that can take only two possible values, 0 and 1. It is convenient to use the symbol 𝔹 to refer to the set {0, 1}. We also let 𝔹m stand for the set of all bit strings of length m. Whenever we plan to refer to the field (or group) structure of 𝔹 (or 𝔹m), we will use the alternative notation 𝔽2 (or 𝔽2m).

Definition A.1.

A block cipher f of block-size n and of key-size r is a map

    f : 𝔹r × 𝔹n → 𝔹n

that encrypts a plaintext block m of bit length n to a ciphertext block c = f(K, m) of bit length n under a key K, a bit string of length r. To ensure unique decryption, the map

    fK : 𝔹n → 𝔹n,  m ↦ f(K, m),

for a fixed key K has to be a permutation of (that is, a bijective function on) 𝔹n. In that case, the decryption of c to get back m is carried out as m = fK^(–1)(c).
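A toy instance of Definition A.1 (cryptographically worthless, for illustration only): take n = r = 8 and let fK XOR the key into the block and then rotate it. Both steps are bijections, so fK is a permutation of the set of 8-bit blocks:

```python
def rotl8(b, s):
    # Cyclic left rotation of an 8-bit value by s positions.
    return ((b << s) | (b >> (8 - s))) & 0xFF

def f(K, msg):
    return rotl8(msg ^ K, 3)     # XOR the key in, then rotate: both bijective

def f_inv(K, c):
    return rotl8(c, 5) ^ K       # undo the rotation (5 + 3 = 8), then the XOR

K = 0b10110100
# f_K is a permutation of all 256 possible blocks, as the definition demands.
assert sorted(f(K, x) for x in range(256)) == list(range(256))
c = f(K, 0b01100001)
print(f_inv(K, c) == 0b01100001)
```

Decryption simply inverts each step in reverse order, which is exactly the requirement m = fK⁻¹(c) in the definition.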

A good block cipher has the following desirable properties:

A block cipher provably possessing all these good characteristics (in particular, the randomness properties) is difficult to construct in practice. Practical block ciphers are designed for reasonably big n and r and come with the hope of representing reasonably unpredictable permutations. We dub a block cipher good or safe if it stands the test of time. Table A.1 lists some widely used block ciphers.

Table A.1. Some popular block ciphers
Name                                                      n             r
DES (Data Encryption Standard)                            64            56
FEAL (Fast Data Encipherment Algorithm)                   64            64
SAFER (Secure And Fast Encryption Routine)                64            64
IDEA (International Data Encryption Algorithm)            64            128
Blowfish                                                  64            ≤ 448
Rijndael, accepted as AES (Advanced Encryption
Standard) by NIST (National Institute of Standards
and Technology, a US government organization)             128/192/256   128/192/256

A.2.1. A Case Study: DES

The data encryption standard (DES) was proposed as a federal information processing standard (FIPS) in 1975. DES has been the most popular and the most widely used among all block ciphers ever designed. Although its relatively small key-size offers questionable security under today’s computing power, DES still enjoys large-scale deployment in not-so-serious cryptographic applications.

DES encryption requires a 64-bit plaintext block m and a 56-bit key K.[1] We use the notations DESK and DESK^(–1) to stand respectively for the DES encryption and decryption functions under the key K.

[1] A DES key K = k1k2 . . . k64 is actually a 64-bit string. Only 56 bits of K are used for encryption. The remaining 8 bits are used as parity-check bits. Specifically, for each i = 1, . . . , 8 the bit k8i is adjusted so that the i-th byte (k8i – 7k8i – 6 . . . k8i) has an odd number of one-bits.

DES key schedule

The DES algorithm first computes sixteen 48-bit keys K1, K2, . . . , K16 from K using a procedure known as the DES key schedule described in Algorithm A.1. These 16 keys are used in the 16 rounds of encryption. The key schedule uses two fixed permutations PC1 and PC2 described after Algorithm A.1 and to be read in the row-major order. Here, PC is an abbreviation for permuted choice.

Algorithm A.1. The DES key schedule

Input: A DES key K = k1k2 . . . k64 (containing the parity-check bits).

Output: Sixteen 48-bit round keys K1, K2, . . . , K16.

Steps:

Use PC1 to generate U0 := PC1(K) = k57k49 . . . k4.
Write U0 = C0 ‖ D0 with C0, D0 ∈ {0, 1}^28.
for i = 1, 2, . . . ,16 {
   Take s := 1 if i ∈ {1, 2, 9, 16}, and s := 2 otherwise.
   Cyclically left shift Ci–1 by s bits to get Ci.
   Cyclically left shift Di–1 by s bits to get Di.
   Let Ui := Ci ‖ Di.
   Compute the i-th round key Ki := PC2(Ui) = u14u17u11 . . . u29u32.
}

PC1
57 49 41 33 25 17  9
 1 58 50 42 34 26 18
10  2 59 51 43 35 27
19 11  3 60 52 44 36
63 55 47 39 31 23 15
 7 62 54 46 38 30 22
14  6 61 53 45 37 29
21 13  5 28 20 12  4

PC2
14 17 11 24  1  5
 3 28 15  6 21 10
23 19 12  4 26  8
16  7 27 20 13  2
41 52 31 37 47 55
30 40 51 45 33 48
44 49 39 56 34 53
46 42 50 36 29 32

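Algorithm A.1 can be prototyped in a few lines of Python. The sketch below follows the steps and the two tables above; the 1-indexed bit positions of PC1 and PC2 are translated to 0-indexed list accesses, and the function name key_schedule is ours.

```python
# A sketch of the DES key schedule (Algorithm A.1).
PC1 = [57,49,41,33,25,17, 9,  1,58,50,42,34,26,18,
       10, 2,59,51,43,35,27, 19,11, 3,60,52,44,36,
       63,55,47,39,31,23,15,  7,62,54,46,38,30,22,
       14, 6,61,53,45,37,29, 21,13, 5,28,20,12, 4]
PC2 = [14,17,11,24, 1, 5,  3,28,15, 6,21,10,
       23,19,12, 4,26, 8, 16, 7,27,20,13, 2,
       41,52,31,37,47,55, 30,40,51,45,33,48,
       44,49,39,56,34,53, 46,42,50,36,29,32]
SHIFTS = [1,1,2,2,2,2,2,2,1,2,2,2,2,2,2,1]   # s = 1 for rounds 1, 2, 9, 16

def key_schedule(key_bits):
    """key_bits: list of the 64 bits k1..k64 (index 0 holds k1).
    Returns the sixteen 48-bit round keys K1..K16 as bit lists."""
    u = [key_bits[p - 1] for p in PC1]        # U0 = PC1(K), 56 bits
    c, d = u[:28], u[28:]                     # U0 = C0 || D0
    round_keys = []
    for s in SHIFTS:
        c = c[s:] + c[:s]                     # cyclic left shift of C by s bits
        d = d[s:] + d[:s]                     # cyclic left shift of D by s bits
        ui = c + d                            # Ui = Ci || Di
        round_keys.append([ui[p - 1] for p in PC2])   # Ki = PC2(Ui)
    return round_keys
```

As a sanity check, the weak key 0101 0101 0101 0101 of Exercise A.3 selects only parity-check bits through PC1, so C0 and D0 are all-zero and all sixteen round keys coincide. Note also that the shift amounts sum to 28, so C16 = C0 and D16 = D0.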
DES encryption

DES encryption, as described in Algorithm A.2, proceeds in 16 rounds. The i-th round uses the key Ki (obtained from the key schedule) in tandem with the encryption primitive e. A fixed permutation IP and its inverse IP–1 are also used.[2]

[2] A block cipher that executes several encryption rounds with the i-th round computing the two halves as Li := Ri–1 and Ri := Li–1 ⊕ e(Ri–1, Ki) for some round key Ki and for some encryption primitive e, is called a Feistel cipher. Most popular block ciphers mentioned earlier are of this type. Rijndael is an exception, and its acceptance as the new standard has been interpreted as an end of the Feistel dynasty.

To complete the description of DES encryption, it remains to specify the round encryption function e. The function e can be compactly depicted as:

e(X, J) := P(S(E(X) ⊕ J)),

Algorithm A.2. DES encryption

Input: Plaintext block m = m1m2 . . . m64 and the round keys K1, . . . , K16.

Output: The ciphertext block c = DESK(m).

Steps:

Apply the initial permutation on m to get
     V := IP(m) = m58m50 . . . m7.
Write V = L0 ‖ R0 with L0, R0 ∈ {0, 1}^32.
for i = 1, 2, . . . , 16 {
   /* The i-th encryption round */
   Li := Ri–1.
   Ri := Li–1 ⊕ e(Ri–1, Ki).
}
Let W := R16 ‖ L16.
Apply the inverse of the initial permutation on W to get the ciphertext block
   c := IP–1(W) = w40w8 . . . w25.

IP
58 50 42 34 26 18 10  2
60 52 44 36 28 20 12  4
62 54 46 38 30 22 14  6
64 56 48 40 32 24 16  8
57 49 41 33 25 17  9  1
59 51 43 35 27 19 11  3
61 53 45 37 29 21 13  5
63 55 47 39 31 23 15  7

IP–1
40  8 48 16 56 24 64 32
39  7 47 15 55 23 63 31
38  6 46 14 54 22 62 30
37  5 45 13 53 21 61 29
36  4 44 12 52 20 60 28
35  3 43 11 51 19 59 27
34  2 42 10 50 18 58 26
33  1 41  9 49 17 57 25

where E : {0, 1}^32 → {0, 1}^48 is an expansion function, S : {0, 1}^48 → {0, 1}^32 is a contraction function, and P is a fixed permutation of {0, 1}^32 (called the permutation function). S uses eight S-boxes (substitution boxes) S1, S2, . . . , S8. Each S-box Sj is a 4 × 16 matrix with each row a permutation of 0, 1, 2, . . . , 15 and is used to convert a 6-bit string y1y2y3y4y5y6 to a 4-bit string z1z2z3z4 as follows. Let μ denote the integer with binary representation y1y6 and ν the integer with binary representation y2y3y4y5. Then, z1z2z3z4 is the 4-bit binary representation of the (μ, ν)-th entry in the matrix Sj. (Here, the numbering of the rows and columns starts from 0.) In this case, we write Sj(y1y2y3y4y5y6) = z1z2z3z4. Algorithm A.3 provides the description of e.
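The slightly unusual indexing convention (outer bits select the row, inner bits the column) is worth a small sanity check in Python. S1 below is the first DES S-box, listed again among the tables later in this section, and the helper name sbox_lookup is ours.

```python
# Row-major DES S-box lookup: 6 input bits -> 4 output bits.
S1 = [
    [14, 4,13, 1, 2,15,11, 8, 3,10, 6,12, 5, 9, 0, 7],
    [ 0,15, 7, 4,14, 2,13, 1,10, 6,12,11, 9, 5, 3, 8],
    [ 4, 1,14, 8,13, 6, 2,11,15,12, 9, 7, 3,10, 5, 0],
    [15,12, 8, 2, 4, 9, 1, 7, 5,11, 3,14,10, 0, 6,13],
]

def sbox_lookup(sbox, y):
    """y is a 6-character bit string y1 y2 y3 y4 y5 y6."""
    mu = int(y[0] + y[5], 2)        # row index: bits y1 y6
    nu = int(y[1:5], 2)             # column index: bits y2 y3 y4 y5
    return format(sbox[mu][nu], '04b')

print(sbox_lookup(S1, '011011'))    # row 01 = 1, column 1101 = 13 -> 5 = '0101'
```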

Algorithm A.3. The DES round encryption primitive e

Input: X ∈ {0, 1}^32 and J ∈ {0, 1}^48.

Output: e(X, J).

Steps:

Y := E(X) ⊕ J (where E(x1x2 . . . x32) = x32x1x2 . . . x32x1).
Write Y = Y1 ‖ Y2 ‖ . . . ‖ Y8 with each Yj ∈ {0, 1}^6.
for j = 1, 2, . . . , 8 {
   Zj := Sj(Yj).
}
e(X, J) := P(Z1 ‖ Z2 ‖ . . . ‖ Z8) (where P(z1z2 . . . z32) = z16z7z20 . . . z4z25).

The tables for E and P are as follows.

E
32  1  2  3  4  5
 4  5  6  7  8  9
 8  9 10 11 12 13
12 13 14 15 16 17
16 17 18 19 20 21
20 21 22 23 24 25
24 25 26 27 28 29
28 29 30 31 32  1

P
16  7 20 21
29 12 28 17
 1 15 23 26
 5 18 31 10
 2  8 24 14
32 27  3  9
19 13 30  6
22 11  4 25

Finally, the eight S-boxes are presented:

S1
14  4 13  1  2 15 11  8  3 10  6 12  5  9  0  7
 0 15  7  4 14  2 13  1 10  6 12 11  9  5  3  8
 4  1 14  8 13  6  2 11 15 12  9  7  3 10  5  0
15 12  8  2  4  9  1  7  5 11  3 14 10  0  6 13

S2
15  1  8 14  6 11  3  4  9  7  2 13 12  0  5 10
 3 13  4  7 15  2  8 14 12  0  1 10  6  9 11  5
 0 14  7 11 10  4 13  1  5  8 12  6  9  3  2 15
13  8 10  1  3 15  4  2 11  6  7 12  0  5 14  9

S3
10  0  9 14  6  3 15  5  1 13 12  7 11  4  2  8
13  7  0  9  3  4  6 10  2  8  5 14 12 11 15  1
13  6  4  9  8 15  3  0 11  1  2 12  5 10 14  7
 1 10 13  0  6  9  8  7  4 15 14  3 11  5  2 12

S4
 7 13 14  3  0  6  9 10  1  2  8  5 11 12  4 15
13  8 11  5  6 15  0  3  4  7  2 12  1 10 14  9
10  6  9  0 12 11  7 13 15  1  3 14  5  2  8  4
 3 15  0  6 10  1 13  8  9  4  5 11 12  7  2 14

S5
 2 12  4  1  7 10 11  6  8  5  3 15 13  0 14  9
14 11  2 12  4  7 13  1  5  0 15 10  3  9  8  6
 4  2  1 11 10 13  7  8 15  9 12  5  6  3  0 14
11  8 12  7  1 14  2 13  6 15  0  9 10  4  5  3

S6
12  1 10 15  9  2  6  8  0 13  3  4 14  7  5 11
10 15  4  2  7 12  9  5  6  1 13 14  0 11  3  8
 9 14 15  5  2  8 12  3  7  0  4 10  1 13 11  6
 4  3  2 12  9  5 15 10 11 14  1  7  6  0  8 13

S7
 4 11  2 14 15  0  8 13  3 12  9  7  5 10  6  1
13  0 11  7  4  9  1 10 14  3  5 12  2 15  8  6
 1  4 11 13 12  3  7 14 10 15  6  8  0  5  9  2
 6 11 13  8  1  4 10  7  9  5  0 15 14  2  3 12

S8
13  2  8  4  6 15 11  1 10  9  3 14  5  0 12  7
 1 15 13  8 10  3  7  4 12  5  6 11  0 14  9  2
 7 11  4  1  9 12 14  2  0  6 10 13 15  3  5  8
 2  1 14  7  4 10  8 13 15 12  9  0  3  5  6 11

DES decryption

DES decryption is analogous to DES encryption. To obtain m = DESK–1(c), one first computes the round keys K1, K2, . . . , K16 using Algorithm A.1. One then calls a minor variant of Algorithm A.2. First, the roles of m and c are interchanged. That is, one inputs c instead of m, and obtains m in place of c as output. Moreover, the right half Ri in the i-th round is computed as Ri := Li–1 ⊕ e(Ri–1, K17–i). In other words, DES decryption is the same as DES encryption, only with the sequence of using the keys K1, K2, . . . , K16 reversed. Solve Exercise A.1 in order to establish the correctness of this decryption procedure.

DES test vectors

Some test vectors for DES are given in Table A.2.

Table A.2. DES test vectors
Key                Plaintext block    Ciphertext block
0101010101010101   0000000000000000   8ca64de9c1b123a7
fefefefefefefefe   ffffffffffffffff   7359b2163e4edc58
3101010101010101   1000000000000001   958e6e627a05557b
1010101010101010   1111111111111111   f40379ab9e0ec533
0123456789abcdef   1111111111111111   17668dfc7292532d
1010101010101010   0123456789abcdef   8a5ae1f81ab8f2dd
fedcba9876543210   0123456789abcdef   ed39d950fa74bcc4

Cryptanalysis of DES

DES, being a popular block cipher, has gone through a good amount of cryptanalytic study. At present, linear cryptanalysis and differential cryptanalysis are the most sophisticated attacks on DES. But the biggest problem with DES is its relatively small key size (56 bits). An exhaustive key search for a given plaintext–ciphertext pair needs carrying out a maximum of 2^56 encryptions in order to obtain the correct key. But how big is this number 2^56 = 72,057,594,037,927,936 (nearly 72 quadrillion) in a cryptographic sense?

In order to review this question, RSA Security Inc. posed several challenges for obtaining the DES key from given plaintext–ciphertext pairs. The first challenge, posed in January 1997, was broken by Rocke Verser of Loveland, Colorado, with approximately 96 days of computing. DES Challenge II-1 was broken in February 1998 by distributed.net with 41 days of computing, and DES Challenge II-2 was cracked in July 1998 by the Electronic Frontier Foundation (EFF) in just 56 hours. Finally, DES Challenge III was broken in a record 22 hours 15 minutes in January 1999. The computations were carried out in EFF's special-purpose machine Deep Crack with collaborative efforts from nearly 100,000 PCs on the Internet guided by distributed.net. These figures demonstrate that DES offers hardly any security against a motivated adversary.

Another problem with DES is that its design criteria (most importantly, the objectives behind choosing the particular S-boxes) were never made public. Chances remain that there are hidden backdoors, though none has been discovered to date.

A.2.2. The Advanced Standard: AES

The advanced encryption standard (AES) [219] has superseded the older standard DES. The Rijndael cipher designed by Daemen and Rijmen has been accepted as the advanced standard. As mentioned in Footnote 2, Rijndael is not a Feistel cipher. Its working is based on the arithmetic in the finite field F_{2^8} and in the finite ring A = F_{2^8}[Y]/⟨Y^4 + 1⟩.

Data representation

AES encrypts data in blocks of 128 bits. Let B = b0b1 . . . b127 be a block of data, where each bi is a bit. Keeping in view typical 32-bit processors, each such block B is represented as a sequence of four 32-bit words, that is, B = B0B1B2B3, where Bi represents the bit string b32ib32i+1 . . . b32i+31. Each word C = c0c1 . . . c31, in turn, is viewed as a sequence of four octets, that is, C = C0C1C2C3, where Ci stores the bit string c8ic8i+1 . . . c8i+7. Each octet is identified with an element of F_{2^8}, whereas an entire 32-bit word is identified with an element of the ring A.

The field F_{2^8} is represented as F_2[X]/⟨f(X)⟩, where f(X) is the irreducible polynomial X^8 + X^4 + X^3 + X + 1. Let x := X + ⟨f(X)⟩. The element d7x^7 + d6x^6 + · · · + d1x + d0 is identified with the octet d7d6 . . . d1d0. Thus, the i-th octet c8ic8i+1 . . . c8i+7 in a word is treated as the finite field element c8ix^7 + c8i+1x^6 + · · · + c8i+6x + c8i+7.

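This octet arithmetic is easy to prototype. The sketch below (function names ours) multiplies two octets by shift-and-add with reduction modulo f(X), and computes inverses by raising to the power 254, which works because the multiplicative group of the field has order 255.

```python
def gmul(a, b):
    """Multiply two octets in F_2[X]/<X^8 + X^4 + X^3 + X + 1> (the AES field)."""
    p = 0
    for _ in range(8):
        if b & 1:
            p ^= a                   # add (XOR) the current multiple of a
        carry = a & 0x80
        a = (a << 1) & 0xff          # multiply a by x
        if carry:
            a ^= 0x1b                # reduce: x^8 = x^4 + x^3 + x + 1
        b >>= 1
    return p

def ginv(a):
    """Inverse via a^254 (a^255 = 1 for a != 0); ginv(0) = 0 by convention."""
    r = 1
    for _ in range(254):
        r = gmul(r, a)
    return r
```

For example, gmul(0x57, 0x83) evaluates to 0xc1, a product worked out in the AES specification.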
Now, let us explain the interpretation of a 32-bit word C = C0C1C2C3. The F_{2^8}-algebra A = F_{2^8}[Y]/⟨Y^4 + 1⟩ is not a field, since the polynomial Y^4 + 1 is reducible (over F_2 and so over F_{2^8}). However, each element β of A can be uniquely expressed as a polynomial β = α3y^3 + α2y^2 + α1y + α0, where y := Y + ⟨Y^4 + 1⟩ and where each αi is an element of F_{2^8}. As described in the last paragraph, each αi is represented as an octet. We take Ci to be the octet representing α3–i, that is, the 32-bit word α3α2α1α0 stands for the element α3y^3 + α2y^2 + α1y + α0.

F_{2^8} and A are rings and hence are equipped with arithmetic operations (addition and multiplication). These operations are different from the usual addition and multiplication operations defined on octets and words. For example, the addition of two octets or words under the AES interpretation is the same as the bit-wise XOR of the octets or words. The AES multiplication of octets and words, on the other hand, involves polynomial arithmetic and reduction modulo the defining polynomials and so cannot be expressed as simply as addition. To resolve ambiguities, let us denote the multiplication of F_{2^8} by ⊙ and that of A by ⊗, whereas regular multiplication symbols (·, × and juxtaposition) stand for the standard multiplication on octets or words. Exercises A.5, A.6 and A.7 discuss efficient implementations of the arithmetic in F_{2^8} and A.

Every non-zero element α ∈ F_{2^8} is invertible; the inverse is denoted by α–1 and can be computed by the extended gcd algorithm on polynomials over F_2. With an abuse of notation, we take 0–1 := 0. In contrast, not every non-zero element of A is invertible (under the multiplication of A). The AES algorithm uses the following invertible element β := 03010102 (in hex notation); its inverse is β–1 = 0b0d090e.

The AES algorithm uses an object called a state, comprising 16 octets arranged in a 4 × 4 array. Each message block also consists of 16 octets. Let M = μ0μ1 . . . μ15 be a message block (of 16 octets). This block is translated to a state as follows:

Equation A.1

     sr,c := μr+4c    (0 ≤ r ≤ 3, 0 ≤ c ≤ 3),

where sr,c denotes the octet in row r and column c of the state.
Thus, each word in the block is relocated in a column of the state. At the end of the encryption procedure, AES makes the reverse translation of a state to a block:

Equation A.2

     γr+4c := sr,c    (0 ≤ r ≤ 3, 0 ≤ c ≤ 3).
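The two translations (A.1) and (A.2) amount to nothing more than a column-major reshaping, as this small sketch (helper names ours) shows:

```python
def block_to_state(block):
    """(A.1): a list of 16 octets -> a 4x4 state, with s[r][c] = block[r + 4c]."""
    return [[block[r + 4 * c] for c in range(4)] for r in range(4)]

def state_to_block(state):
    """(A.2): the reverse translation from a 4x4 state back to 16 octets."""
    return [state[r][c] for c in range(4) for r in range(4)]

msg = list(range(16))
st = block_to_state(msg)
assert st[1][2] == 9                 # row 1, column 2 holds octet mu[1 + 4*2]
assert state_to_block(st) == msg     # the two translations are mutual inverses
```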
AES key schedule

A collection of round keys is generated from the given AES key K. The number of rounds of the AES encryption algorithm depends on the size of the key. Let us denote the number of words in the AES key by Nk and the corresponding number of rounds by Nr. We have:

Nk = 4 (128-bit key):  Nr = 10
Nk = 6 (192-bit key):  Nr = 12
Nk = 8 (256-bit key):  Nr = 14

One first generates an initial 128-bit key K0K1K2K3. Subsequently, for the i-th round, 1 ≤ i ≤ Nr, a 128-bit key K4iK4i+1K4i+2K4i+3 is required. Here, each Kj is a 32-bit word. The key schedule (also called key expansion) generates a total of 4(Nr + 1) words K0, K1, . . . , K4Nr+3 from the given secret key K using a procedure described in Algorithm A.4. Here, (02)^(j–1) stands for the octet that represents the element x^(j–1) of F_{2^8}. The following table summarizes these values for j = 1, 2, . . . , 15.

j          1   2   3   4   5   6   7   8   9   10  11  12  13  14  15
x^(j–1)    01  02  04  08  10  20  40  80  1b  36  6c  d8  ab  4d  9a

The transformation SubWord on a word T = τ0τ1τ2τ3 is the octet-wise application of AES S-box substitution SubOctet, that is,

SubWord(T) = SubOctet(τ0) ‖ SubOctet(τ1) ‖ SubOctet(τ2) ‖ SubOctet(τ3).

Algorithm A.4. AES key schedule

Input: (Nk and) the secret key K = κ0κ1 ... κ4Nk – 1, where each κi is an octet.

Output: The expanded keys K0, K1, . . . , K4Nr+3.

Steps:

/* Initially copy the bytes of K */
for i = 0, 1, . . . , Nk – 1 { Ki := κ4iκ4i+1κ4i+2κ4i+3. }

/* Recursively define the round keys */
for i = Nk, Nk + 1, . . . , 4Nr + 3 {
      T := Ki–1;       /* T is a temporary word variable. */
      /* Let T = τ0τ1τ2τ3, where each τi is an octet. */
      if (i rem Nk = 0) { T := SubWord(τ1τ2τ3τ0) ⊕ [(02)^((i/Nk)–1) ‖ 000000]. }
      else if (Nk > 6) and (i rem Nk = 4) { T := SubWord(T). }
      Ki := Ki–Nk ⊕ T.
}
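For concreteness, here is a sketch of Algorithm A.4 for the AES-128 case (Nk = 4, Nr = 10), where the Nk > 6 branch never fires. SBOX is the byte array of Table A.3 written row-major, and the function name expand_key_128 is ours.

```python
SBOX = bytes.fromhex(
    '637c777bf26b6fc53001672bfed7ab76' 'ca82c97dfa5947f0add4a2af9ca472c0'
    'b7fd9326363ff7cc34a5e5f171d83115' '04c723c31896059a071280e2eb27b275'
    '09832c1a1b6e5aa0523bd6b329e32f84' '53d100ed20fcb15b6acbbe394a4c58cf'
    'd0efaafb434d338545f9027f503c9fa8' '51a3408f929d38f5bcb6da2110fff3d2'
    'cd0c13ec5f974417c4a77e3d645d1973' '60814fdc222a908846eeb814de5e0bdb'
    'e0323a0a4906245cc2d3ac629195e479' 'e7c8376d8dd54ea96c56f4ea657aae08'
    'ba78252e1ca6b4c6e8dd741f4bbd8b8a' '703eb5664803f60e613557b986c11d9e'
    'e1f8981169d98e949b1e87e9ce5528df' '8ca1890dbfe6426841992d0fb054bb16')

RCON = [0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80, 0x1b, 0x36]

def expand_key_128(key):
    """key: 16 bytes. Returns the 44 words K0..K43, each as a 4-byte list."""
    w = [list(key[4 * i:4 * i + 4]) for i in range(4)]   # copy the key into K0..K3
    for i in range(4, 44):
        t = list(w[i - 1])
        if i % 4 == 0:                       # i rem Nk = 0
            t = t[1:] + t[:1]                # rotate tau0..tau3 -> tau1 tau2 tau3 tau0
            t = [SBOX[b] for b in t]         # SubWord
            t[0] ^= RCON[i // 4 - 1]         # XOR with (02)^(i/Nk - 1) || 000000
        w.append([w[i - 4][j] ^ t[j] for j in range(4)])   # Ki = K(i-Nk) xor T
    return w
```

With the FIPS-197 sample key 2b7e1516 28aed2a6 abf71588 09cf4f3c, the first computed word K4 comes out as a0fafe17.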

The transformation SubOctet is also used in each encryption round and is now described. Let A = a0a1 . . . a7 be an octet, identified with an element of F_{2^8} as mentioned earlier. Let B = b0b1 . . . b7 denote the octet representing the inverse of this finite field element. (We take 0–1 = 0.) One then applies the following affine transformation on B to generate the final value C := SubOctet(A) := c0c1 . . . c7. Here, D = d0d1 . . . d7 is the constant octet 63 = 01100011.

Equation A.3

     ci = bi ⊕ b(i+1) mod 8 ⊕ b(i+2) mod 8 ⊕ b(i+3) mod 8 ⊕ b(i+4) mod 8 ⊕ di,    i = 0, 1, . . . , 7.
In order to speed up this octet substitution, one may use table lookup. Since the output octet C depends only on the input octet A, one can precompute a table of the values SubOctet(A) for the 256 possible values of A. This list is given in Table A.3. The table is to be read in the row-major fashion. In other words, if hi and lo respectively represent the most and the least significant four bits of A, then SubOctet(A) can be read off from the entry in the table having row number hi and column number lo. For example, SubOctet(a7) = 5c. In an actual implementation, a one-dimensional array is to be used. We use a two-dimensional format in Table A.3 for the sake of clarity of presentation.
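SubOctet can also be computed on the fly instead of by table lookup. The sketch below (helper names ours) inverts in the field by raising to the power 254, and applies the affine step in the standard equivalent form: B XOR-ed with four of its cyclic rotations, plus the constant 63.

```python
def gmul(a, b):
    """Octet multiplication in F_2[X]/<X^8 + X^4 + X^3 + X + 1>."""
    p = 0
    for _ in range(8):
        if b & 1:
            p ^= a
        hi = a & 0x80
        a = (a << 1) & 0xff
        if hi:
            a ^= 0x1b
        b >>= 1
    return p

def sub_octet(a):
    """SubOctet: field inverse (with 0 mapped to 0) followed by the affine map."""
    b = 1
    for _ in range(254):            # b = a^254, the inverse of a (0 stays 0)
        b = gmul(b, a)
    c, r = b, b
    for _ in range(4):              # XOR in four cyclic rotations of b
        r = ((r << 1) | (r >> 7)) & 0xff
        c ^= r
    return c ^ 0x63                 # add the constant octet 63
```

As a check, sub_octet(0x00) gives 0x63 and sub_octet(0xa7) gives 0x5c, matching the table entries.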

Table A.3. AES S-box
    0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
0  63 7c 77 7b f2 6b 6f c5 30 01 67 2b fe d7 ab 76
1  ca 82 c9 7d fa 59 47 f0 ad d4 a2 af 9c a4 72 c0
2  b7 fd 93 26 36 3f f7 cc 34 a5 e5 f1 71 d8 31 15
3  04 c7 23 c3 18 96 05 9a 07 12 80 e2 eb 27 b2 75
4  09 83 2c 1a 1b 6e 5a a0 52 3b d6 b3 29 e3 2f 84
5  53 d1 00 ed 20 fc b1 5b 6a cb be 39 4a 4c 58 cf
6  d0 ef aa fb 43 4d 33 85 45 f9 02 7f 50 3c 9f a8
7  51 a3 40 8f 92 9d 38 f5 bc b6 da 21 10 ff f3 d2
8  cd 0c 13 ec 5f 97 44 17 c4 a7 7e 3d 64 5d 19 73
9  60 81 4f dc 22 2a 90 88 46 ee b8 14 de 5e 0b db
a  e0 32 3a 0a 49 06 24 5c c2 d3 ac 62 91 95 e4 79
b  e7 c8 37 6d 8d d5 4e a9 6c 56 f4 ea 65 7a ae 08
c  ba 78 25 2e 1c a6 b4 c6 e8 dd 74 1f 4b bd 8b 8a
d  70 3e b5 66 48 03 f6 0e 61 35 57 b9 86 c1 1d 9e
e  e1 f8 98 11 69 d9 8e 94 9b 1e 87 e9 ce 55 28 df
f  8c a1 89 0d bf e6 42 68 41 99 2d 0f b0 54 bb 16

AES encryption

AES encryption is described in Algorithm A.5. The algorithm first converts the input plaintext message block to a state, applies a series of transformations on this state and finally converts the state back to a message (the ciphertext).

The individual state transformations are now explained. The transformation SubState is an octet-by-octet application of the substitution function SubOctet, that is, SubState maps the state (sr,c) to the state (s′r,c),

where s′r,c = SubOctet(sr,c) for all r, c. The transformation ShiftRows cyclically left rotates the r-th row by r byte positions, that is, it maps

sr,c to s′r,c = sr,(c+r) mod 4 for all r, c.

The AddKey operation uses four 32-bit round keys L0, L1, L2, L3. Name the octets of Li as λi0λi1λi2λi3. The i-th key Li is XORed with the i-th column of the state, that is, AddKey transforms

sr,i to s′r,i = sr,i ⊕ λir for all r, i.

Finally, the MixCols transform multiplies each column of the state, regarded as an element of A, by the element [03]y^3 + [01]y^2 + [01]y + [02], where the coefficients (expressions within square brackets) are octet values in hexadecimal that can be identified with elements of F_{2^8}. For the c-th column, this transformation can be represented as:

s′0,c = ([02] ⊙ s0,c) ⊕ ([03] ⊙ s1,c) ⊕ s2,c ⊕ s3,c,
s′1,c = s0,c ⊕ ([02] ⊙ s1,c) ⊕ ([03] ⊙ s2,c) ⊕ s3,c,
s′2,c = s0,c ⊕ s1,c ⊕ ([02] ⊙ s2,c) ⊕ ([03] ⊙ s3,c),
s′3,c = ([03] ⊙ s0,c) ⊕ s1,c ⊕ s2,c ⊕ ([02] ⊙ s3,c).

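On a single column, MixCols is thus four ⊙-multiplications and XORs per octet. A sketch (helper names ours, with the octet multiplication written out inline):

```python
def gmul(a, b):
    """Octet multiplication in the AES field."""
    p = 0
    for _ in range(8):
        if b & 1:
            p ^= a
        hi = a & 0x80
        a = (a << 1) & 0xff
        if hi:
            a ^= 0x1b
        b >>= 1
    return p

def mix_column(col):
    """Multiply one state column [s0, s1, s2, s3] by [03]y^3 + [01]y^2 + [01]y + [02]."""
    s0, s1, s2, s3 = col
    return [gmul(2, s0) ^ gmul(3, s1) ^ s2 ^ s3,
            s0 ^ gmul(2, s1) ^ gmul(3, s2) ^ s3,
            s0 ^ s1 ^ gmul(2, s2) ^ gmul(3, s3),
            gmul(3, s0) ^ s1 ^ s2 ^ gmul(2, s3)]
```

For instance, the column db 13 53 45 maps to 8e 4d a1 bc, a worked example given in the AES specification. A column of four equal octets is left unchanged, since [02] ⊙ a ⊕ [03] ⊙ a ⊕ a ⊕ a = a.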
Algorithm A.5. AES encryption

Input: The plaintext message M = μ0μ1 . . . μ15 and the round keys K0, K1, . . . , K4Nr+3.

Output: Ciphertext message C = γ0γ1 . . . γ15.

Steps:

Convert M to the state S.                                      /* Use Transform (A.1) */
S := AddKey(S, K0, K1, K2, K3).
for i = 1, 2, . . . , Nr {
      S := SubState(S).
      S := ShiftRows(S).
      if (i ≠ Nr) { S := MixCols(S). }
      S := AddKey(S, K4i, K4i+1, K4i+2, K4i+3).
}
Convert S to the message C.                                /* Use Transform (A.2) */

AES decryption

AES decryption involves taking the inverse of each state transformation performed during encryption. The key schedule used for encryption is used during decryption too. The straightforward decryption routine is given in Algorithm A.6.

Algorithm A.6. AES decryption

Input: The ciphertext message C = γ0γ1 . . . γ15 and the round keys K0, K1, . . . , K4Nr+3.

Output: The recovered plaintext message M = μ0μ1 . . . μ15.

Steps:

Convert C to the state S.                                      /* Use Transform (A.1) */
S := AddKey(S, K4Nr, K4Nr+1, K4Nr+2, K4Nr+3).
for i = Nr – 1, Nr – 2, . . . , 1, 0 {
      S := ShiftRows–1(S).
      S := SubState–1(S).
      S := AddKey(S, K4i, K4i+1, K4i+2, K4i+3).
      if (i ≠ 0) { S := MixCols–1(S). }
}
Convert S to the message M.                                /* Use Transform (A.2) */

What remains is a description of the inverses of the basic state transformations. AddKey involves octet-by-octet XOR-ing and so is its own inverse. Table A.4 summarizes the inverse of the substitution SubOctet (Exercise A.8). For computing SubState–1(S), one applies SubOctet–1 on each octet of S. The inverse of ShiftRows is also straightforward: ShiftRows–1 cyclically right rotates the r-th row by r byte positions, that is, it maps sr,c to s′r,c = sr,(c–r) mod 4.

Finally, MixCols–1 involves multiplication of each column by the inverse of the element [03]y^3 + [01]y^2 + [01]y + [02], that is, by the element [0b]y^3 + [0d]y^2 + [09]y + [0e]. So MixCols–1 transforms each column of the state as follows:

s′0,c = ([0e] ⊙ s0,c) ⊕ ([0b] ⊙ s1,c) ⊕ ([0d] ⊙ s2,c) ⊕ ([09] ⊙ s3,c),
s′1,c = ([09] ⊙ s0,c) ⊕ ([0e] ⊙ s1,c) ⊕ ([0b] ⊙ s2,c) ⊕ ([0d] ⊙ s3,c),
s′2,c = ([0d] ⊙ s0,c) ⊕ ([09] ⊙ s1,c) ⊕ ([0e] ⊙ s2,c) ⊕ ([0b] ⊙ s3,c),
s′3,c = ([0b] ⊙ s0,c) ⊕ ([0d] ⊙ s1,c) ⊕ ([09] ⊙ s2,c) ⊕ ([0e] ⊙ s3,c).

Table A.4. Inverse of AES S-box
    0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
0  52 09 6a d5 30 36 a5 38 bf 40 a3 9e 81 f3 d7 fb
1  7c e3 39 82 9b 2f ff 87 34 8e 43 44 c4 de e9 cb
2  54 7b 94 32 a6 c2 23 3d ee 4c 95 0b 42 fa c3 4e
3  08 2e a1 66 28 d9 24 b2 76 5b a2 49 6d 8b d1 25
4  72 f8 f6 64 86 68 98 16 d4 a4 5c cc 5d 65 b6 92
5  6c 70 48 50 fd ed b9 da 5e 15 46 57 a7 8d 9d 84
6  90 d8 ab 00 8c bc d3 0a f7 e4 58 05 b8 b3 45 06
7  d0 2c 1e 8f ca 3f 0f 02 c1 af bd 03 01 13 8a 6b
8  3a 91 11 41 4f 67 dc ea 97 f2 cf ce f0 b4 e6 73
9  96 ac 74 22 e7 ad 35 85 e2 f9 37 e8 1c 75 df 6e
a  47 f1 1a 71 1d 29 c5 89 6f b7 62 0e aa 18 be 1b
b  fc 56 3e 4b c6 d2 79 20 9a db c0 fe 78 cd 5a f4
c  1f dd a8 33 88 07 c7 31 b1 12 10 59 27 80 ec 5f
d  60 51 7f a9 19 b5 4a 0d 2d e5 7a 9f 93 c9 9c ef
e  a0 e0 3b 4d ae 2a f5 b0 c8 eb bb 3c 83 53 99 61
f  17 2b 04 7e ba 77 d6 26 e1 69 14 63 55 21 0c 7d

AES decryption is as efficient as AES encryption, since each state transformation primitive has the same structure as its inverse. However, the sequence of application of these primitives in the loop (rounds) for decryption differs from that for encryption. For some implementations, mostly in hardware, this may be a problem. Compare this with DES, for which the encryption and decryption algorithms are identical save the sequence of using the round keys (Exercise A.1). With a little additional effort, AES can also be furnished with this useful property of DES. All we have to do is to use a different key schedule for decryption. The necessary modifications are explored in Exercise A.9.

AES test vectors

Table A.5 provides the ciphertexts for the plaintext block

M = 00112233445566778899aabbccddeeff

under different keys.

Table A.5. AES test vectors
CipherKeyCiphertext block
Cipher    Key                                   Ciphertext block
AES-128   0001020304050607 08090a0b0c0d0e0f     69c4e0d86a7b0430 d8cdb78070b4c55a
AES-192   0001020304050607 08090a0b0c0d0e0f     dda97ca4864cdfe0 6eaf70a0ec0d7191
          1011121314151617
AES-256   0001020304050607 08090a0b0c0d0e0f     8ea2b7ca516745bf eafc49904b496089
          1011121314151617 18191a1b1c1d1e1f

Cryptanalysis of AES

AES has been designed so that linear and differential attacks are infeasible. Another attack known as the square attack has been proposed by Lucks [184] and Ferguson et al. [93], but at present it can tackle fewer rounds than are used in Rijndael encryption. Also see Gilbert and Minier [112] for the collision attack.

The distinct algebraic structure of AES encryption invites special algebraic attacks. One such potential attack (the XSL attack) has been proposed by Courtois and Pieprzyk [68]. Although this attack has not yet been proved to be effective, a better understanding of the algebra may, in the foreseeable future, lead to disturbing consequences for the advanced standard.

For more information on AES, read the book [71] from the designers of the cipher. Also visit the following Internet sites:

Rijndael home page: http://www.esat.kuleuven.ac.be/~rijmen/rijndael/
NIST site for AES: http://csrc.nist.gov/CryptoToolkit/aes/index1.html
Algebraic attacks: http://www.cryptosystem.net/aes/

A.2.3. Multiple Encryption

Multiple encryption presents a way to achieve a desired level of security by using block ciphers of small key sizes. The idea is to cascade several stages of encryption and/or decryption, with different stages working under different keys. Figure A.1 illustrates double and triple encryption for a block cipher f. Each gi or hj represents either the encryption or the decryption function of f under the given key.

Figure A.1. Multiple encryption


For double encryption, we have K1 ≠ K2, and both g1 and g2 are usually the encryption function. Provided that fK2 ∘ fK1 is not the same as fK for any single key K, and that the permutations of f are reasonably random, it appears at first glance that double encryption doubles the effective key size. Unfortunately, this is not the case. The meet-in-the-middle attack on double encryption works as follows.

Suppose that an adversary knows a plaintext–ciphertext pair (m, c) under the unknown keys K1, K2. We assume as before that f has block size n and key size r. The adversary computes, for each possible key i ∈ {0, 1}^r, the encrypted message xi := fi(m). She also computes, for each j ∈ {0, 1}^r, the decrypted message yj := fj–1(c). Now, (i, j) is a possible value of (K1, K2) if and only if xi = yj.

A given pair (m, c) usually yields many such candidates (i, j) for (K1, K2). More precisely, if each fi is assumed to be a random permutation of {0, 1}^n, then for a given i we have the equality xi = yj for an expected number of 2^r/2^n values of j. Considering all possibilities for i gives an expected number of 2^r × 2^r/2^n = 2^(2r–n) candidate pairs (i, j). If f = DES, this number is 2^(2×56–64) = 2^48.

If a second pair (m′, c′) under (K1, K2) is also known to the adversary, then for a given i the pair (i, j) is consistent with both (m, c) and (m′, c′) for an expected number of 2^r/(2^n × 2^n) values of j. Thus, we get an expected number of (2^r × 2^r)/(2^n × 2^n) = 2^(2r–2n) candidates (i, j). For DES, this number is 2^(–16). This implies that it is very unlikely that a false candidate (i, j) satisfies both (m, c) and (m′, c′). Thus, with high probability the adversary uniquely identifies the double DES key (K1, K2) from two plaintext–ciphertext pairs.

This attack calls for O(2^r) encryptions and O(2^r) decryptions. With the assumption that each encryption takes roughly the same time as each decryption (as in the case of DES), the adversary spends the time of about 2^(r+1) single encryptions. Moreover, she can find all the matches in O(r 2^r) time (for example, by sorting the lists of xi and yj values). This implies that double encryption increases the effective key size (over single encryption) by a few bits only. On the other hand, both the actual key size and the encryption time get doubled. In view of these shortcomings, double encryption is rarely used in practice.
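The attack is easy to demonstrate on a toy cipher. The block below uses a hypothetical 8-bit cipher with an 8-bit key (n = r = 8; not DES): it tabulates xi = fi(m) for all 2^r keys i, matches against yj = fj–1(c), and filters the surviving (i, j) pairs with a second known pair.

```python
# Toy 8-bit block cipher: for every key k, m -> 5*(m ^ k) mod 256 is a permutation.
INV5 = 205                                    # 5 * 205 = 1025 = 1 (mod 256)

def enc(k, m):
    return (5 * (m ^ k)) % 256

def dec(k, c):
    return ((c * INV5) % 256) ^ k

def double_enc(k1, k2, m):
    return enc(k2, enc(k1, m))

def mitm(pairs):
    """Recover candidate (K1, K2) pairs from known plaintext-ciphertext pairs."""
    m, c = pairs[0]
    table = {}
    for i in range(256):                      # forward table: x_i = f_i(m)
        table.setdefault(enc(i, m), []).append(i)
    cand = [(i, j) for j in range(256)        # match against y_j = f_j^-1(c)
            for i in table.get(dec(j, c), [])]
    for m2, c2 in pairs[1:]:                  # filter with further known pairs
        cand = [(i, j) for (i, j) in cand if double_enc(i, j, m2) == c2]
    return cand

k1, k2 = 0x3a, 0xc7
pairs = [(m, double_enc(k1, k2, m)) for m in (0x42, 0x99)]
assert (k1, k2) in mitm(pairs)
```

With n = r = 8, the first pair leaves about 2^(2r–n) = 256 candidates, and the second pair almost surely cuts these down to the true key pair alone.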

For the triple encryption scheme of Figure A.1, a meet-in-the-middle attack at x or y demands an effort equivalent to O(2^(2r)) encryptions, that is, the effective key size gets doubled. It is, therefore, customary to take K1 = K3 and K2 different from this common value. With this choice, the actual key size is also only doubled, since one does not have to store K3 separately. It is also a common practice to take h1 and h3 to be the encryption function (under K1 = K3) and h2 the decryption function (under K2). One often calls this particular triple encryption an E-D-E scheme.
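A small illustration of the E-D-E arrangement with the same toy 8-bit cipher as above (not DES): choosing K2 equal to K1 collapses the scheme to single encryption, which is one practical reason for the E-D-E ordering, since triple-encryption hardware can then interoperate with single-encryption peers.

```python
def enc(k, m):
    return (5 * (m ^ k)) % 256        # toy 8-bit block cipher under key k

def dec(k, c):
    return ((c * 205) % 256) ^ k      # 205 = 5^-1 mod 256

def ede(k1, k2, m):
    """E-D-E triple encryption with K3 = K1."""
    return enc(k1, dec(k2, enc(k1, m)))

# With K2 = K1, the middle D undoes the inner E, leaving single encryption.
assert all(ede(0x5d, 0x5d, m) == enc(0x5d, m) for m in range(256))
```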

A.2.4. Modes of Operation

In practice, the length of the message m to be encrypted need not equal the block length n of the block cipher f. One then has to break up m into blocks of some fixed length n′ ≤ n and encrypt each block using the block cipher. In order to make the length of m an integral multiple of n′, one may have to pad extra bits to m (say, zero bits at the end). It is often necessary to store the initial size of m in a separate block, say, after the last message block. In what follows, we shall assume that the input message m gives rise to l blocks m1, m2, . . . , ml each of size n′. The corresponding ciphertext blocks c1, c2, . . . , cl will also be of bit length n′ each. The reason for choosing the block size n′ ≤ n will be clear soon.

The ECB mode

The easiest way to encrypt multiple blocks m1, . . . , ml is to take n′ = n and encrypt each block mi as ci := fK(mi). Decryption is analogous: mi := fK–1(ci). This mode of operation of a block cipher is called the electronic code-book or the ECB mode. Algorithms A.7 and A.8 describe this mode.

Algorithm A.7. ECB encryption

Input: The plaintext blocks m1, . . . , ml and the key K.

Output: The ciphertext c = c1 . . . cl.

Steps:

for i = 1, . . . , l { ci := fK(mi) }

Algorithm A.8. ECB decryption

Input: The ciphertext blocks c1, . . . , cl and the key K.

Output: The plaintext m = m1 . . . ml.

Steps:

for i = 1, . . . , l { mi := fK–1(ci). }

In this mode, identical message blocks encrypt to identical ciphertext blocks (under the same key), that is, partial information about the plaintext may be leaked out. The following three modes overcome this problem.

The CBC mode

In the cipher-block chaining or the CBC mode, one takes n′ = n and each plaintext block is first XOR-ed with the previous ciphertext block and then encrypted. In order to XOR the first plaintext block, one needs an n-bit initialization vector (IV). The IV need not be kept secret and may be sent along with the ciphertext blocks.

Algorithm A.9. CBC encryption

Input: The plaintext blocks m1, . . . , ml, the key K and the IV.

Output: The ciphertext c = c1 . . . cl.

Steps:

c0 := IV.

for i = 1, . . . , l { ci := fK(mici – 1). }

Algorithm A.10. CBC decryption

Input: The ciphertext blocks c1, . . . , cl, the key K and the IV.

Output: The plaintext m = m1 . . . ml.

Steps:

c0 := IV.

for i = 1, . . . , l { mi := fK–1(ci) ⊕ ci–1. }
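The contrast between ECB and CBC is easy to visualize with a toy 8-bit block cipher (one octet per block; the cipher below is a stand-in, not a real one): ECB sends a repeated plaintext block to a repeated ciphertext block, while CBC masks the repetition.

```python
def enc(k, m):
    return (5 * (m ^ k)) % 256        # toy 8-bit block cipher f_K

def ecb(key, blocks):
    return [enc(key, m) for m in blocks]

def cbc(key, iv, blocks):
    out, prev = [], iv
    for m in blocks:
        prev = enc(key, m ^ prev)     # XOR with the previous ciphertext block
        out.append(prev)
    return out

msg = [0x41, 0x41, 0x41, 0x41]        # four identical plaintext blocks
e = ecb(0x2f, msg)
c = cbc(0x2f, 0x77, msg)
assert len(set(e)) == 1               # ECB: all ciphertext blocks identical
assert len(set(c)) > 1                # CBC: the repetition is hidden
```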

The CFB mode

In the cipher feedback or the CFB mode, one chooses any n′ with 1 ≤ n′ ≤ n. In this mode, the plaintext blocks are not encrypted directly, but masked by XOR-ing with a stream of keys generated from the secret key K and a (not necessarily secret) n-bit IV. In this sense, the CFB mode works like a stream cipher (see Section A.3).

Algorithm A.11. CFB encryption

Input: The plaintext blocks m1, . . . , ml, the key K and the IV.

Output: The ciphertext c = c1 . . . cl.

Steps:

k0 := IV.   /* Initialize the key stream */
for i = 1, . . . , l {
   /* Mask the current key by block encryption and the message by XOR-ing */
   ci := mi ⊕ msbn′(fK(ki–1)).
   /* Generate the next key from the previous key and the current ciphertext block */
   ki := lsbn–n′(ki–1) ‖ ci.
}

Algorithm A.11 explains CFB encryption. The notation msbk(z) (resp. lsbk(z)) stands for the most (resp. least) significant k bits of a bit string z. For CFB decryption (Algorithm A.12), the identical key stream k0, k1, . . . , kl is generated and used to mask off the message blocks from the ciphertext blocks.

Algorithm A.12. CFB decryption

Input: The ciphertext blocks c1, . . . , cl, the key K and the IV.

Output: The plaintext m = m1 . . . ml.

Steps:

k0 := IV.
for i = 1, . . . , l {
   mi := ci ⊕ msbn′(fK(ki–1)).
   ki := lsbn–n′(ki–1) ‖ ci.
}
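Algorithms A.11 and A.12 can be checked with a toy 8-bit cipher and n′ = 4 (helper names ours). The shift register keeps the low n – n′ bits of the previous key and appends the n′ ciphertext bits; one routine serves for both directions because only the feedback value differs.

```python
def enc(key, x):
    return (5 * (x ^ key)) % 256      # toy 8-bit block cipher f_K

N, NP = 8, 4                          # block size n = 8, segment size n' = 4

def cfb(key, iv, blocks, decrypt=False):
    out, k = [], iv
    for b in blocks:                  # each b is an n'-bit (4-bit) block
        mask = enc(key, k) >> (N - NP)          # msb_{n'}(f_K(k_{i-1}))
        o = b ^ mask
        cipher_block = b if decrypt else o      # feedback is always the ciphertext
        k = ((k << NP) & 0xff) | cipher_block   # lsb_{n-n'}(k_{i-1}) || c_i
        out.append(o)
    return out

msg = [0x3, 0xa, 0xa, 0xf, 0x0]
ct = cfb(0x6b, 0x55, msg)
assert cfb(0x6b, 0x55, ct, decrypt=True) == msg
```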

The OFB mode

The output feedback or the OFB mode also works like a stream cipher by masking the plaintext blocks using a stream of keys. The key stream in the OFB mode is generated by successively applying the block encryption function on an n-bit (not necessarily secret) IV. Here, one chooses any n′ with 1 ≤ n′ ≤ n.

OFB encryption is explained in Algorithm A.13. OFB decryption (Algorithm A.14) is identical, with only the roles of m and c interchanged, and requires the generation of the same key stream k0, k1, . . . , kl used during encryption.

Algorithm A.13. OFB encryption

Input: The plaintext blocks m1, . . . , ml, the key K and the IV.

Output: The ciphertext c = c1 . . . cl.

Steps:

k0 := IV.      /* Initialize the key stream */
for i = 1, . . . , l {
    ki := fK(ki–1).     /* Generate the next key in the stream */
    ci := mi ⊕ msbn′(ki).    /* Mask the plaintext block */
}

Algorithm A.14. OFB decryption

Input: The ciphertext blocks c1, . . . , cl, the key K and the IV.

Output: The plaintext m = m1 . . . ml.

Steps:

k0 := IV.     /* Initialize the key stream */
for i = 1, . . . , l {
   ki := fK(ki–1).    /* Generate the next key in the stream */
   mi := ci ⊕ msbn′(ki).    /* Remove the mask from the ciphertext block */
}
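Since the OFB key stream depends only on K and the IV (never on the data), the encryption routine is its own inverse, as this sketch with the earlier toy cipher shows:

```python
def enc(key, x):
    return (5 * (x ^ key)) % 256      # toy 8-bit block cipher f_K

def ofb(key, iv, blocks, np=4):
    out, k = [], iv
    for b in blocks:                  # each b is an np-bit block
        k = enc(key, k)               # next key in the stream: k_i = f_K(k_{i-1})
        out.append(b ^ (k >> (8 - np)))   # mask with msb_{n'}(k_i)
    return out

msg = [0x1, 0x2, 0x3, 0x4]
assert ofb(0x9c, 0x31, ofb(0x9c, 0x31, msg)) == msg   # applying it twice restores msg
```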

Exercise Set A.2

A.1 Let us use the notations of Algorithm A.2. For a message m and round keys Ki, we have the values V, Li, Ri, W, c. For another message m′ and another set of round keys K′i, let us denote these values by V′, L′i, R′i, W′, c′. Show that if m′ = c and if K′i = K17–i for i = 1, . . . , 16, then L′i = R16–i and R′i = L16–i for all i = 0, 1, . . . , 16. Deduce that in this case we have c′ = m. (This shows that DES decryption is the same as DES encryption with the key schedule reversed.)
A.2 For a bit string z, let ~z denote the bit-wise complement of z. Deduce that DES~K(~m) = ~DESK(m), that is, complementing both the plaintext message and the key complements the ciphertext message. [H]
A.3 A DES key K is said to be weak, if the DES key schedule on K gives K1 = K2 = · · · = K16. Show that there are exactly four weak DES keys, which in hexadecimal notation are:
0101 0101 0101 0101
FEFE FEFE FEFE FEFE
1F1F 1F1F 0E0E 0E0E
E0E0 E0E0 F1F1 F1F1

A.4 A DES key K is said to be anti-palindromic, if the DES key schedule on K gives Ki = ~K17–i for all i = 1, . . . , 16 (where ~z denotes the bit-wise complement of z). Show that the following four DES keys (in hexadecimal notation) are anti-palindromic:
01FE 01FE 01FE 01FE
FE01 FE01 FE01 FE01
1FE0 1FE0 0EF1 0EF1
E01F E01F F10E F10E

A.5 Represent F_{2^8} = F_2[X]/⟨f(X)⟩, where f(X) = X^8 + X^4 + X^3 + X + 1 (Section A.2.2).
  1. Show that multiplication by x (the octet 02) in F_{2^8} can be computed by a left shift followed conditionally (derive the condition) by XOR-ing with the octet 1b.

  2. Design an algorithm for multiplying two elements of F_{2^8} using bit manipulations on octets only.

A.6 The multiplication of F_{2^8} can be made table-driven. Since this field contains 256 elements, a 256 × 256 array suffices to store all the products. That requires a storage of 64 KB. We can considerably reduce the storage by using discrete logs.
  1. Show that the multiplicative order of x (in F_{2^8}*) is 51.

  2. Show that x + 1 is a generator of F_{2^8}*.

  3. Write a computer program to generate the table of discrete logarithms of elements of F_{2^8}* to the base x + 1 (Table A.6).

    Table A.6. Discrete-log table for AES
         0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
    0    -  00 19 01 32 02 1a c6 4b c7 1b 68 33 ee df 03
    1    64 04 e0 0e 34 8d 81 ef 4c 71 08 c8 f8 69 1c c1
    2    7d c2 1d b5 f9 b9 27 6a 4d e4 a6 72 9a c9 09 78
    3    65 2f 8a 05 21 0f e1 24 12 f0 82 45 35 93 da 8e
    4    96 8f db bd 36 d0 ce 94 13 5c d2 f1 40 46 83 38
    5    66 dd fd 30 bf 06 8b 62 b3 25 e2 98 22 88 91 10
    6    7e 6e 48 c3 a3 b6 1e 42 3a 6b 28 54 fa 85 3d ba
    7    2b 79 0a 15 9b 9f 5e ca 4e d4 ac e5 f3 73 a7 57
    8    af 58 a8 50 f4 ea d6 74 4f ae e9 d5 e7 e6 ad e8
    9    2c d7 75 7a eb 16 0b f5 59 cb 5f b0 9c a9 51 a0
    a    7f 0c f6 6f 17 c4 49 ec d8 43 1f 2d a4 76 7b b7
    b    cc bb 3e 5a fb 60 b1 86 3b 52 a1 6c aa 55 29 9d
    c    97 b2 87 90 61 be dc fc bc 95 cf cd 37 3f 5b d1
    d    53 39 84 3c 41 a2 6d 47 14 2a 9e 5d 56 f2 d3 ab
    e    44 11 92 d9 23 20 2e 89 b4 7c b8 26 77 99 e3 a5
    f    67 4a ed de c5 31 fe 18 0d 63 8c 80 c0 f7 70 07
    (The entry for 0 is left blank, since the discrete log of 0 is undefined.)

  4. Write a computer program to generate the table of powers of x + 1 (Table A.7).

    Table A.7. Power table for AES
         0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
    0    01 03 05 0f 11 33 55 ff 1a 2e 72 96 a1 f8 13 35
    1    5f e1 38 48 d8 73 95 a4 f7 02 06 0a 1e 22 66 aa
    2    e5 34 5c e4 37 59 eb 26 6a be d9 70 90 ab e6 31
    3    53 f5 04 0c 14 3c 44 cc 4f d1 68 b8 d3 6e b2 cd
    4    4c d4 67 a9 e0 3b 4d d7 62 a6 f1 08 18 28 78 88
    5    83 9e b9 d0 6b bd dc 7f 81 98 b3 ce 49 db 76 9a
    6    b5 c4 57 f9 10 30 50 f0 0b 1d 27 69 bb d6 61 a3
    7    fe 19 2b 7d 87 92 ad ec 2f 71 93 ae e9 20 60 a0
    8    fb 16 3a 4e d2 6d b7 c2 5d e7 32 56 fa 15 3f 41
    9    c3 5e e2 3d 47 c9 40 c0 5b ed 2c 74 9c bf da 75
    a    9f ba d5 64 ac ef 2a 7e 82 9d bc df 7a 8e 89 80
    b    9b b6 c1 58 e8 23 65 af ea 25 6f b1 c8 43 c5 54
    c    fc 1f 21 63 a5 f4 07 09 1b 2d 77 99 b0 cb 46 ca
    d    45 cf 4a de 79 8b 86 91 a8 e3 3e 42 c6 51 f3 0e
    e    12 36 5a ee 29 7b 8d 8c 8f 8a 85 94 a7 f2 0d 17
    f    39 4b dd 7c 84 97 a2 fd 1c 24 6c b4 c7 52 f6 01

  5. Design an algorithm for multiplying two elements of F_{2^8} using table lookup.

A.7 Denote the multiplication of A by ⊗ (Section A.2.2).
  1. Let α = a3y^3 + a2y^2 + a1y + a0 and β = b3y^3 + b2y^2 + b1y + b0 be elements of A and γ = c3y^3 + c2y^2 + c1y + c0 = α ⊗ β. Show that

     [c0]   [a0 a3 a2 a1] [b0]
     [c1] = [a1 a0 a3 a2] [b1]
     [c2]   [a2 a1 a0 a3] [b2]
     [c3]   [a3 a2 a1 a0] [b3]

     where the matrix arithmetic on the right side follows the arithmetic of F_{2^8}.

  2. Verify that the inverse of the element of A represented by the word 03010102 (in hex) is 0b0d090e.

A.8
  1. Show that Transform (A.3) can be represented as

    where the matrix arithmetic on the right side is that of .

  2. Let M denote the 8 × 8 matrix of Part (a). Prove that M is invertible over GF(2) with

  3. Conclude that the transformation A ↦ SubOctet(A) is invertible.

A.9
  1. Argue that the transforms SubState and ShiftRows commute with one another.

  2. Show that MixCols⁻¹(AddKey(S, L0, L1, L2, L3)) = AddKey(MixCols⁻¹(S), MixCols⁻¹(L0, L1, L2, L3)) for a suitable meaning of the application of MixCols⁻¹ on four 32-bit keys L0, L1, L2 and L3.

  3. Conclude that one can obtain a decryption key schedule in such a way that Algorithm A.15 correctly performs AES decryption. [H]

Algorithm A.15. Equivalent form of AES decryption

Input: The ciphertext message C = γ0γ1 . . . γ15 and the decryption key schedule .

Output: Plaintext message M = μ0μ1 . . . μ15.

Steps:

Convert C to the state S.                                 /* Use Transform (A.1) */

for i = Nr – 1, Nr – 2, . . . , 0 {
      S := SubState⁻¹(S).
      S := ShiftRows⁻¹(S).
      if (i ≠ 0) { S := MixCols⁻¹(S). }
      
}
Convert S to the message M.                            /* Use Transform (A.2) */

A.10Show that a multiple encryption scheme with exactly k stages provides an effective security of ⌈k/2⌉ keys against the meet-in-the-middle attack.
A.11Consider a message m broken into blocks m1, . . . , ml, encrypted to c1, . . . , cl and sent to an entity.
  1. Suppose that during the transmission exactly one ciphertext block gets corrupted. Show that for the different modes of encryption, the numbers ν of blocks that are incorrectly decrypted due to this transmission error are as listed in the following table.

    Mode    ν
    ECB     1
    CBC     ≤ 2
    CFB     ≤ 1 + ⌈n/n′⌉
    OFB     1

  2. For each of the four modes, discuss the effects on decryption caused by the insertion or deletion of a ciphertext block during transmission (say, by an active adversary).

A.3. Stream Ciphers

A block cipher encrypts large blocks of data using a fixed key. A stream cipher, on the other hand, encrypts small blocks of data (typically single bits or bytes) using a different key for each block. The security of a stream cipher stems from the unpredictability of the keys in the key stream. Here, we deal with stream ciphers that encrypt bit-by-bit.

Definition A.2.

A stream cipher F encrypts a plaintext m = m1m2 . . . ml to a ciphertext c = c1c2 . . . cl using a key stream k = k1k2 . . . kl, where each mi, ci, ki ∈ {0, 1}. F uses a function f that yields f(mi, ki) = ci. In order to effect unique decryption, the map fκ : {0, 1} → {0, 1}, μ ↦ f(μ, κ), must be a bijection for each κ ∈ {0, 1}. F encrypts and decrypts bit-by-bit using the formulas ci = fki(mi) and mi = fki⁻¹(ci).

Example A.1.

An obvious choice for fκ is fκ(μ) := μ ⊕ κ, so that fκ⁻¹ = fκ. Suppose that the bits k1, k2, . . . , kl in the key stream are generated randomly and uniformly, independent of the plaintext bits. Let us assume that for an index i the probability Pr(mi = 0) is p, so that Pr(mi = 1) = 1 – p. Since Pr(ki = 0) = Pr(ki = 1) = 1/2, and mi and ki are independent, we have:

Pr(ci = 0) = Pr(mi = 0, ki = 0) + Pr(mi = 1, ki = 1)
           = Pr(mi = 0) Pr(ki = 0) + Pr(mi = 1) Pr(ki = 1)
           = p × (1/2) + (1 – p) × (1/2) = 1/2.

So Pr(ci = 1) is 1/2 too, that is, the two values of ci are equally likely, irrespective of the probability p. This, in turn, implies that the ciphertext bit ci provides absolutely no information about the plaintext bit mi. In this sense, this stream cipher, called Vernam’s one-time pad, offers unconditional security.

Generating a truly random key stream of arbitrary length is a difficult problem. Moreover, the same key stream is used for decryption and has to be reproduced at the recipient’s end. In view of these difficulties, Vernam’s one-time pad is used only very rarely.
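With fκ(μ) = μ ⊕ κ, encryption and decryption are the same XOR operation. A minimal sketch in Python (operating on bytes rather than individual bits; the function name is ours):

```python
import secrets

def otp(message: bytes, pad: bytes) -> bytes:
    """Vernam one-time pad: XOR each message byte with a pad byte.

    Decryption is the same operation, since (m ^ k) ^ k = m.
    """
    assert len(pad) >= len(message)
    return bytes(m ^ k for m, k in zip(message, pad))

pad = secrets.token_bytes(5)       # a fresh, uniformly random pad
c = otp(b"hello", pad)
assert otp(c, pad) == b"hello"     # the round trip recovers the plaintext
```

Note that the pad must be as long as the message and must never be reused; this is exactly the key-management burden described above.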

A practical solution is to use a pseudorandom key stream k1, k2, k3, . . . generated from a secret key J of fixed small length. The bits in the pseudorandom stream should be sufficiently unpredictable and the length of J adequately large, so as to preclude the possibility of mounting a successful attack in feasible time.

Depending on how the key stream is generated from J, stream ciphers can be broadly classified in two categories. In a synchronous stream cipher, each key in the key stream is generated independent of any plaintext or ciphertext bit, whereas in a self-synchronizing (or asynchronous) stream cipher each key in the stream is generated based only on J and a fixed number of previous ciphertext bits. Algorithms A.16 and A.17 explain the workings of these two classes of stream ciphers.

Algorithm A.16. Encryption in a synchronous stream cipher

Input: The message m = m1m2 . . . ml, the secret key J and a (not necessarily secret) initial state S of the key stream generator.

Output: The ciphertext c = c1c2 . . . cl.

Steps:

s0 := S.                             /* Initialize the state of the key stream generator */
for i = 1, . . . , l {
   ki := g(si–1, J).               /* Generate the key ki */
   si := δ(si–1, J).                /* Transition to the next state */
   ci := fki (mi).                  /* Encrypt the plaintext bit mi */
}

Algorithm A.17. Encryption in an asynchronous stream cipher

Input: The message m = m1m2 . . . ml, the secret key J and a (not necessarily secret) initial state (c–t+1, c–t+2, . . . , c0).

Output: The ciphertext c = c1c2 . . . cl.

Steps:

for i = 1, . . . , l {
   ki := g(ci–t, ci–t+1, . . . , ci–1, J).         /* Generate the key ki */
   ci := fki (mi).                                     /* Encrypt the plaintext bit mi */
}

A block cipher in the OFB mode works like a synchronous stream cipher, whereas a block cipher in the CFB mode works like an asynchronous stream cipher.

A.3.1. Linear Feedback Shift Registers

Linear feedback shift registers (LFSRs), being suitable for hardware implementation and possessing good cryptographic properties, are widely used as basic building blocks for many stream ciphers. Figure A.2 depicts an LFSR L with d stages or delay elements D0, D1, . . . , Dd–1, each capable of storing one bit. The state of the LFSR is described by the d-tuple s := (s0, s1, . . . , sd–1), where si is the bit stored in Di. It is often convenient to treat s as the column vector (s0 s1 . . . sd–1)t.

Figure A.2. A linear feedback shift register (LFSR) with d stages


There are d control bits a0, a1, . . . , ad–1. The working of the LFSR is governed by a clock. At every clock pulse the bits stored in the delay elements are bit-wise AND-ed with the respective control bits and the AND gate outputs are XOR-ed to obtain the bit sd. The bit s0 stored in D0 is delivered to the output. Finally, for each i = 0, 1, . . . , d – 2 the delay element Di sets its stored bit to si+1, that is, the register experiences a right shift by one bit with the feedback bit sd filling up the leftmost delay element.

Thus, a clock pulse changes the state of the LFSR from s := (s0, s1, . . . , sd–1) to t := (t0, t1, . . . , td–1), where s and t are related as:

    ti = si+1 for i = 0, 1, . . . , d – 2,  and  td–1 = sd = a0s0 ⊕ a1s1 ⊕ · · · ⊕ ad–1sd–1.
If s and t are treated as column vectors, this can be compactly represented as

Equation A.4

    t ≡ ΔLs (mod 2),
where the transition matrix ΔL is given by

Equation A.5

    ΔL = ( 0    1    0   · · ·   0   )
         ( 0    0    1   · · ·   0   )
         ( :    :    :           :   )
         ( 0    0    0   · · ·   1   )
         ( a0   a1   a2  · · ·  ad–1 )

When the LFSR L is initialized to a non-zero state, the bit stream output by it can be used as a pseudorandom bit sequence. For a given set of control bits a0, . . . , ad–1, the next state of L is uniquely determined by its previous state only. Since L has only finitely many (2^d – 1) non-zero states, the output bit sequence of L must be (eventually) periodic. For cryptographic use, the period of the bit sequence should be as large as possible. If the period is the maximum possible, namely 2^d – 1, L is called a maximum-length LFSR.
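The clocking rule just described can be sketched directly (the function names are ours). Each pulse outputs s0, XORs the AND-ed control bits into the feedback bit, and shifts:

```python
def lfsr_step(state, control):
    """One clock pulse of an LFSR.

    state holds (s0, ..., s_{d-1}) and control holds (a0, ..., a_{d-1});
    the output is s0, the register shifts, and the feedback bit
    s_d = a0 s0 XOR a1 s1 XOR ... XOR a_{d-1} s_{d-1} enters at the far end.
    """
    feedback = 0
    for a, s in zip(control, state):
        feedback ^= a & s
    return state[0], state[1:] + [feedback]

def lfsr_stream(state, control, nbits):
    """Collect nbits of output from the given initial state."""
    out = []
    for _ in range(nbits):
        bit, state = lfsr_step(state, control)
        out.append(bit)
    return out
```

For example, d = 4 with control bits (1, 1, 0, 0) realizes the recurrence s(i+4) = s(i) ⊕ s(i+1); starting from any non-zero state the register cycles through all 15 non-zero states, so this is a maximum-length LFSR of period 2^4 – 1 = 15.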

Many properties of the LFSR L can be explained in terms of its connection polynomial defined as:

Equation A.6

    CL(X) := 1 + ad–1X + ad–2X^2 + · · · + a1X^(d–1) + a0X^d.

For example, assume that a0 = 1, so that deg CL(X) = d. Assume further that CL(X) is irreducible (over GF(2)). Consider the extension GF(2^d) of GF(2), represented as GF(2)[X]/⟨CL(X)⟩, where x denotes the class of X. It turns out that if x is a generator of the cyclic group GF(2^d)*, then L is a maximum-length LFSR. In this case, the polynomial CL(X) is called a primitive polynomial of GF(2^d).[3]

[3] A primitive polynomial defined in this way has nothing to do with a primitive polynomial over a UFD, defined in Exercise 2.54. Mathematicians often go for such multiple definitions of the same terms and phrases.

A.3.2. Stream Ciphers Based on LFSRs

The bit sequence output by an LFSR L can be used as the key stream k1k2 . . . kl in order to encrypt a plaintext stream m1m2 . . . ml to the ciphertext stream c1c2 . . . cl with ci := mi ⊕ ki. The number d of stages in L should be chosen reasonably large and the control bits a0, . . . , ad–1 should be kept secret. The initial state of L may or may not be a secret. For suitable choices of a0, . . . , ad–1, the output sequences from L possess good statistical properties and hence L appears to be an efficient key stream generator.

Unfortunately, such a key stream generator is vulnerable to a known-plaintext attack as follows. Suppose that mi and ci are known for i = 1, 2, . . . , 2d. One can easily compute ki = mi ⊕ ci for all these i. Let si := (ki, ki+1, . . . , ki+d–1) denote the state of L while outputting ci. By Congruence (A.4), si+1 ≡ ΔLsi (mod 2) for i = 1, 2, . . . , d. Define the d × d matrices S := (s1 s2 . . . sd) and T := (s2 s3 . . . sd+1), where the si are treated as column vectors as before. We then have T ≡ ΔLS (mod 2). If S is invertible modulo 2, then ΔL and hence the secret control bits can be easily computed. In order to avoid this known-plaintext attack, one should introduce some non-linearity in the LFSR outputs.
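The attack can be sketched end-to-end for a toy LFSR (all names here are ours). Given 2d keystream bits, we form the matrices S and T and solve T ≡ ΔLS (mod 2) by Gauss-Jordan elimination over GF(2); the last row of ΔL reveals the control bits:

```python
def lfsr_bits(state, control, n):
    """Generate n keystream bits from the LFSR."""
    out = []
    for _ in range(n):
        out.append(state[0])
        fb = 0
        for a, s in zip(control, state):
            fb ^= a & s
        state = state[1:] + [fb]
    return out

def solve_transition(cols_S, cols_T):
    """Return Delta with Delta * S = T (mod 2); S must be invertible.

    cols_S, cols_T are lists of d column vectors.  We reduce the
    augmented system [S^t | T^t] to [I | Delta^t] over GF(2).
    """
    d = len(cols_S)
    A = [cols_S[i][:] + cols_T[i][:] for i in range(d)]
    for col in range(d):
        piv = next(r for r in range(col, d) if A[r][col])
        A[col], A[piv] = A[piv], A[col]
        for r in range(d):
            if r != col and A[r][col]:
                A[r] = [x ^ y for x, y in zip(A[r], A[col])]
    return [[A[i][d + j] for i in range(d)] for j in range(d)]

def recover_control(keystream, d):
    """Recover the secret control bits from 2d known keystream bits."""
    cols_S = [keystream[i:i + d] for i in range(d)]          # s1, ..., sd
    cols_T = [keystream[i + 1:i + d + 1] for i in range(d)]  # s2, ..., s_{d+1}
    delta = solve_transition(cols_S, cols_T)
    return delta[d - 1]           # last row of Delta_L holds a0, ..., a_{d-1}

ks = lfsr_bits([1, 0, 0, 0], [1, 1, 0, 0], 8)   # 2d = 8 known key bits
print(recover_control(ks, 4))                    # [1, 1, 0, 0] recovered
```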

A non-linear combination generator combines the output bits u1, u2, . . . , ur from r LFSRs by a non-linear function f in order to generate the key k := f(u1, u2, . . . , ur). The Geffe generator of Figure A.3 gives a well-known example. It uses the non-linear function f(u1, u2, u3) := u1u2 ⊕ u2u3 ⊕ u3, that is, k ≡ u1u2 + u2u3 + u3 (mod 2).

Figure A.3. The Geffe generator


A non-linear filter generator generates the key as k = ψ(s0, s1, . . . , sd–1), where s0, . . . , sd–1 are the bits stored in the delay elements of a single LFSR and where ψ is a non-linear function.

Several other ad hoc schemes can destroy the linearity of an LFSR’s output. The shrinking generator, for example, uses two LFSRs L1 and L2. Both L1 and L2 are simultaneously clocked. If the output of L1 is 1, the output of L2 goes to the key stream, whereas if the output of L1 is 0, the output of L2 is discarded. The resulting key stream is an irregularly (and non-linearly) decimated subsequence of the output sequence of L2.

The non-linear function (f or ψ) eliminates the chance of mounting the straightforward known-plaintext attack described above. However, for polynomial non-linearities certain algebraic attacks are known; see, for example, Courtois and Pieprzyk [67, 66].[4] Solving non-linear polynomial equations is usually more difficult than solving linear equations, but ample care should be taken to avoid accidental encounters with easily solvable systems. Complacency is a word forever excluded from a cryptologist's world.

[4] Visit the Internet site http://www.cryptosystem.net/ for more papers in related areas.

Exercise Set A.3

A.12For each of the two classes of stream ciphers (Algorithms A.16, A.17) discuss the effects on decryption of
  1. alteration

  2. insertion or deletion

of a ciphertext bit during transmission.

A.13Suppose that the LFSR L of Figure A.4 is initialized to the state (1, 0, 0, 0). Derive the sequence of state transitions of the LFSR, and hence determine the output bit sequence of L. Argue that L is a maximum-length LFSR. Verify (according to the definition) that the connection polynomial CL(X) is primitive.

Figure A.4. An LFSR with four stages


A.14Let ΔL and CL(X) be as in Equations (A.5) and (A.6). Show that:
  1. ΔL is invertible modulo 2 if and only if a0 = 1.

  2. The characteristic polynomial of ΔL (a matrix over GF(2)) is X^d CL(1/X). [H]

A.15Let L be an LFSR with d stages and connection polynomial CL(X). Further let S(X) := s0 + s1X + s2X^2 + · · · denote a power series[5] over GF(2). Show that L generates the (infinite) bit sequence s0, s1, s2, . . . if and only if the product CL(X)S(X) modulo 2 is a polynomial of degree < d.

[5] A power series over a ring A is a (formal) expression of the form a0 + a1X + a2X^2 + · · · with each ai ∈ A. The set of all such power series is denoted by A[[X]]. For two power series f = a0 + a1X + a2X^2 + · · · and g = b0 + b1X + b2X^2 + · · · over A, the sum f + g is defined to be the power series (a0 + b0) + (a1 + b1)X + (a2 + b2)X^2 + · · · and the product fg is defined as the power series c0 + c1X + c2X^2 + · · · , where ci = a0bi + a1bi–1 + · · · + aib0. Under these operations A[[X]] is a ring. A polynomial over A can be identified with an element of A[[X]] in which all but finitely many coefficients are zero.

A.16Let σ = s0s1 . . . sd–1 ≠ 00 . . . 0 be a bit string of length d ≥ 1. The linear complexity L(σ) of σ is defined to be the length of the shortest LFSR that generates σ as the leftmost part of its output (after it is initialized to a suitable state). Prove that:
  1. L(σ) ≤ d.

  2. L(σ) = d if and only if σ = 00 . . . 01. [H]

A.17Assume that the three LFSR outputs u1, u2, u3 in the Geffe generator are uniformly distributed. Show that Pr(k = u1) = 3/4 = Pr(k = u3). Thus, partial information about the internal details of the Geffe generator is leaked out in the key stream.
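Exercise A.17 can be checked by brute force over the eight equally likely values of (u1, u2, u3), assuming the standard Geffe combining function k = u1u2 ⊕ u2u3 ⊕ u3 (when u2 = 1 the output copies u1, when u2 = 0 it copies u3):

```python
from itertools import product

# Count how often the Geffe output agrees with u1 and with u3.
matches_u1 = matches_u3 = 0
for u1, u2, u3 in product((0, 1), repeat=3):
    k = (u1 & u2) ^ (u2 & u3) ^ u3
    matches_u1 += (k == u1)
    matches_u3 += (k == u3)
print(matches_u1 / 8, matches_u3 / 8)   # 0.75 0.75
```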

A.4. Hash Functions

A hash function maps bit strings of any length to bit strings of a fixed length n. For practical uses, hash functions should be easy to compute, that is, computing the hash of x should be doable in time polynomial in the size of x.

Since a hash function H maps an infinite set to a finite set, there must exist pairs (x1, x2) of distinct strings with H(x1) = H(x2). Such a pair is called a collision for H. For cryptographic applications (for example, for generating digital signatures), it should be computationally infeasible to find collisions for hash functions. To elaborate this topic further we mention the following two desirable properties of hash functions used in cryptography.

Definition A.3.

A hash function H is called second pre-image resistant, if it is computationally infeasible[6] to find, for a given bit string x1, a second bit string x2 with H(x1) = H(x2).

[6] A problem P is said to be computationally infeasible if any known or possible algorithm (deterministic or randomized) to solve P runs in infeasible (like super-polynomial) time, except perhaps for a set of some input instances, the density of which in the input space is zero (or, more generally, negligibly small).

Definition A.4.

A hash function H is called collision resistant, if it is computationally infeasible to find any two distinct bit strings x1 and x2 with H(x1) = H(x2).

In order to prevent existential forgery (Exercise 5.15) of digital signatures, hash functions should also be difficult to invert.

Definition A.5.

An n-bit hash function H is called first pre-image resistant (or simply pre-image resistant), if it is computationally infeasible to find, for almost all bit strings y of length n, a bit string x (of any length) such that y = H(x). The qualification almost all in the last sentence was necessary, since one can compute and store the pairs (xi, H(xi)), i = 1, 2, . . . , k, for some small k and for some xi of one’s choice. If the given y turns out to be one of these hash values H(xi), a pre-image of y is easily available.

A hash function (provably or believably) satisfying all these three properties is called a cryptographic hash function. A hash function having first and second pre-image resistance is often called a one-way hash function. Some authors require both second pre-image resistance and collision resistance to define a collision-resistant hash function, but here we stick to Definitions A.3 and A.4. In what follows, an unqualified use of the phrase hash function indicates a cryptographic hash function.

Most of the properties of a cryptographic hash function are mutually independent. However, we have the following implication.

Proposition A.1.

A collision resistant hash function is second pre-image resistant.

Proof

Let H be a (non-cryptographic) hash function which is not second pre-image resistant. This means that there is an algorithm A that efficiently computes second pre-images, except perhaps for a vanishingly small fraction of inputs. Choose a random bit string x1. The probability that x1 is not a bad input to A is very high and, in that case, A outputs a second pre-image x2 quickly. This gives us an efficient randomized algorithm to compute collisions (x1, x2) for H.

The converse of Proposition A.1 is not true: A second pre-image resistant hash function need not be collision resistant (Exercise A.19). Also collision resistance (or second pre-image resistance) does not imply first pre-image resistance (Exercise A.20), and first pre-image resistance does not imply second pre-image resistance (Exercise A.21).

A hash function may or may not be used in conjunction with a secret key. An unkeyed hash function is typically used to check the integrity of a message and is often called a modification detection code (MDC). A keyed hash function, on the other hand, is usually employed to authenticate the origin of a message (in addition to verifying the integrity of the message) and so is often called a message authentication code (MAC).

A.4.1. Merkle’s Meta Method

Let us now describe a generic method of constructing hash functions. We start by defining the following basic building block.

Definition A.6.

Let m, n, r be positive integers with m = n + r. A function F that maps bit strings of length m to bit strings of length n is called a compression function. Henceforth, we will consider only those compression functions that can be computed easily, that is, in time polynomial in the input size.

Since m > n, collisions must exist for F. For cryptographic use, collisions should be difficult to locate. We can define first and second pre-image resistance and collision resistance of compression functions as before.

Algorithm A.18. Merkle’s meta method

Input: A compression function F with m = n + r and a bit string x of length < 2^r.

Output: The hash value H(x).

Steps:

Let λ be the bit length of x.
Set l := ⌈λ/r⌉.
If (λ is not a multiple of r) { Append rl – λ zero bits to the right of x. }
Break the padded x into blocks x1, . . . , xl each of length r.
Store in a new block xl+1 the r-bit representation of λ.
Initialize h0 := 0^n.
for i = 1, 2, . . . , l + 1 { hi := F (hi–1 ‖ xi) }
Set H(x) := hl+1.

Algorithm A.18 demonstrates how a compression function can be used to design an n-bit hash function H. The input message x is first broken into l ≥ 0 blocks, each of bit length r, after padding zero bits, if necessary. The initial bit length λ of x is then stored in a new block. This implies that H cannot handle bit strings of length ≥ 2^r. For a reasonably big r, this is not a practical limitation. Storing λ is necessary for several reasons. First, it ensures that the for loop is executed at least once for any message. This prevents the trivial hash value 0^n (the bit string of length n containing zero bits only) for the null message. Moreover, if hi = 0^n for some i with 1 ≤ i < l, then, without the length block, we would get H(x1 ‖ . . . ‖ xl) = H(xi+1 ‖ . . . ‖ xl), which leads to a collision for H.
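Merkle's meta method is easy to prototype. The sketch below is our own illustration, not the book's code: it instantiates F by SHA-256 truncated to n = 32 bits (so it is only a toy), and follows the padding, length-block and chaining steps of Algorithm A.18 with an all-zero initial chaining value:

```python
import hashlib

N, R = 32, 32   # toy parameters: n-bit chaining value, r-bit message blocks

def F(block):
    """Toy compression function from {0,1}^(n+r) to {0,1}^n."""
    assert len(block) == N + R
    digest = hashlib.sha256(block.encode()).digest()
    return format(int.from_bytes(digest[:4], "big"), "032b")

def merkle_hash(x):
    """Hash a bit string x (a str of '0'/'1') by Merkle's meta method."""
    lam = len(x)
    assert lam < 2 ** R                     # the length block must fit
    l = -(-lam // R)                        # number of r-bit blocks
    x = x.ljust(l * R, "0")                 # pad with zero bits
    blocks = [x[i * R:(i + 1) * R] for i in range(l)]
    blocks.append(format(lam, "0%db" % R))  # the length block x_{l+1}
    h = "0" * N                             # the initial chaining value
    for b in blocks:
        h = F(h + b)                        # h_i := F(h_{i-1} || x_i)
    return h
```

Note that merkle_hash("") still invokes F once, on the length block alone, exactly as the discussion above requires.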

We now show that if F possesses the desired properties for use in cryptography, then so does H.

Proposition A.2.

If F is first pre-image resistant, then so is H.

Proof

Assume that H is not first pre-image resistant, that is, an efficient algorithm A exists that computes x with H(x) = y for most (if not all) y. Since y = hl+1 = F(hl ‖ xl+1), a pre-image (namely, hl ‖ xl+1) of y under F is easily computable.

Proposition A.3.

If F is collision resistant, then H is collision resistant (and hence also second pre-image resistant).

Proof

Given a collision (x, x′) for H, we can find a collision for F with little additional effort. We use the notations of Algorithm A.18 with primed variables for x′.

First consider l ≠ l′. But then, in particular, the length blocks xl+1 and x′l′+1 are different, and thus the pair (hl ‖ xl+1, h′l′ ‖ x′l′+1) is a collision for F. So for the rest of the proof we take l = l′.

Now, suppose that hi ≠ h′i for some i. Choose the largest such i and note that hi+1 and h′i+1 are defined and equal for this choice. This gives us the collision (hi ‖ xi+1, h′i ‖ x′i+1) for F.

The only case that remains to be treated is hi = h′i for all i. Since x ≠ x′, there is at least one i with xi ≠ x′i. For such an i, the equality hi = h′i implies that (hi–1 ‖ xi, h′i–1 ‖ x′i) is a collision for F.

In order to design cryptographic hash functions, it suffices to design cryptographic compression functions. Block ciphers can be used for that purpose. Let f be a block cipher with block size n and key size r. Take m := n + r and consider the map F that sends x = LR, with L of length n and R of length r, to the encrypted bit string fR(L). If the maps fR are assumed to be random permutations of the set of n-bit strings, the resulting compression function F possesses the desirable properties.

A.4.2. The Secure Hash Algorithm

Several custom-designed hash functions have been popularly used by the cryptography community. MD4 and MD5 are somewhat older 128-bit hash functions. Soon after its conception, MD4 was found to be vulnerable to several attacks. Also collisions for the compression function of MD5 are known. Therefore, these two hash functions have lost the desired level of confidence for cryptographic uses.

NIST has proposed a family of four hash algorithms. These algorithms are called secure hash algorithms and have the short names SHA-1, SHA-256, SHA-384 and SHA-512, which respectively produce 160-, 256-, 384- and 512-bit hash values. No collisions for these SHA algorithms are known to date. In the rest of this section, we explain the SHA-1 algorithm. The workings of the other SHA algorithms are very similar and can be found in the FIPS document [222]. RIPEMD-160 is another popular 160-bit hash function.

SHA-1 (like the other custom-designed hash functions mentioned above) is suitable for implementation on 32-bit processors. Suppose that we want to compute the hash SHA-1(M) of a message M of bit length λ. First, M is padded to get the bit string M′ := M ‖ 1 ‖ 0^k ‖ Λ, where Λ is the 64-bit representation of λ, and where k is the smallest non-negative integer for which the bit length of M′, that is, λ + 1 + k + 64, is a multiple of 512. M′ is broken into blocks M(1), M(2), . . . , M(l), each of length 512 bits. Each M(i) is represented as a collection of sixteen 32-bit words Mj(i), j = 0, 1, . . . , 15. SHA-1 uses big-endian packing, that is, M0(i) stores the leftmost 32 bits of M(i), M1(i) the next 32 bits, . . . , M15(i) the rightmost 32 bits of M(i).

The SHA-1 computations are given in Algorithm A.19. One starts with a fixed initial 160-bit hash H(0). Successively for i = 1, 2, . . . , l the i-th message block M(i) is considered and the previous hash value H(i–1) is updated to H(i). At the end of the loop the 160-bit string H(l) is returned as SHA-1(M). Each H(i) is represented by five 32-bit words Hj(i), j = 0, 1, 2, 3, 4. Here also, big-endian notation is used, that is, H0(i) stores the leftmost 32 bits of H(i), . . . , H4(i) the rightmost 32 bits of H(i).

The updating procedure uses logical functions fj. Here, juxtaposition (as in xy) denotes bit-wise AND, a bar (as in x̄) denotes bit-wise complementation, and ⊕ denotes bit-wise XOR, each on 32-bit operands. The notation LRk(z) (resp. RRk(z)) stands for a left (resp. right) rotation, that is, a cyclic left (resp. right) shift, of the 32-bit string z by k positions.

The bits of H(i) are well-defined transformations of the bits of H(i–1) under the guidance of the bits of M(i). The good amount of non-linearity, introduced by the functions fj and the modulo 2^32 sums, makes it difficult to invert the transformation H(i–1) ↦ H(i) and thereby makes SHA-1 an (apparently) secure hash function.

Algorithm A.19. The SHA-1 algorithm

Input: A message M.

Output: The hash SHA-1(M) of M.

Steps:

Generate the message blocks M(i), i = 1, 2, . . . , l.
/* Initialize the hash value */
H(0) := 67452301 efcdab89 98badcfe 10325476 c3d2e1f0 (in hex).
for i = 1, 2, . . . , l {
   /* Compute the message schedule Wj, 0 ≤ j ≤ 79. */
   for j = 0, 1, . . . , 15 { Wj := Mj(i) }
   for j = 16, 17, . . . , 79 { Wj := LR1(Wj–3 ⊕ Wj–8 ⊕ Wj–14 ⊕ Wj–16) }
   /* Store the previous hash words */
   for j = 0, 1, . . . , 4 { tj := Hj(i–1) }
   /* Compute the updating values */
   for j = 0, 1, . . . , 79 {
      T := LR5(t0) + fj(t1, t2, t3) + t4 + Kj + Wj (mod 2^32), where

          fj(x, y, z) := xy ⊕ x̄z          for  0 ≤ j ≤ 19,
                         x ⊕ y ⊕ z         for 20 ≤ j ≤ 39,
                         xy ⊕ yz ⊕ zx      for 40 ≤ j ≤ 59,
                         x ⊕ y ⊕ z         for 60 ≤ j ≤ 79,

          and the constants Kj are (in hex)

          Kj := 5a827999 for  0 ≤ j ≤ 19,    6ed9eba1 for 20 ≤ j ≤ 39,
                8f1bbcdc for 40 ≤ j ≤ 59,    ca62c1d6 for 60 ≤ j ≤ 79.

      t4 := t3, t3 := t2, t2 := RR2(t1), t1 := t0, t0 := T.
   }
   /* Update the hash value */
   for j = 0, 1, . . . , 4 { Hj(i) := Hj(i–1) + tj (mod 2^32) }
}
Set SHA-1(M) := H(l).

A test vector for SHA-1 is the following (here 616263 is the string “abc”):

SHA-1(616263) = a9993e364706816aba3e25717850c26c9cd0d89d.
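This vector can be confirmed with any standard SHA-1 implementation, for example the one in Python's hashlib:

```python
import hashlib

# Confirm the SHA-1 test vector for the string "abc" (hex 616263).
digest = hashlib.sha1(b"abc").hexdigest()
print(digest)   # a9993e364706816aba3e25717850c26c9cd0d89d
```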

Exercise Set A.4

A.18Let x be a bit string. Break up x into blocks x1, . . . , xl each of bit size n (after padding, if necessary). Define H1(x) := x1 ⊕ . . . ⊕ xl. Show that H1 possesses none of the desirable properties of a cryptographic hash function.
A.19Let H be an n-bit cryptographic hash function and S a finite set of strings with #S ≥ 2. Define the function . Here, 0n+1 refers to a bit string of length n + 1 containing zero-bits only. Show that H2 is second pre-image resistant, but not collision resistant. [H]
A.20Let H be an n-bit cryptographic hash function. Show that the function H3 defined as is collision resistant (and hence second pre-image resistant), but not first pre-image resistant. [H]
A.21Let m be a product of two (unknown) big primes and let the binary representation of m (with leading one-bit) have n bits. Assume that it is computationally infeasible to compute square roots modulo m. We can identify bit strings with integers in a natural way. For a bit string x, take y := 1 ‖ x and let H4(x) denote the n-bit binary representation of y^2 (mod m). Show that H4 is first pre-image resistant, but not second pre-image resistant (and hence not collision-resistant). [H]
A.22Let H be an n-bit cryptographic hash function. Assume that H produces random hash values on random input strings. Prove that O(2^(n/2)) hash values need to be computed to detect a collision for H with high probability. [H] Deduce also that nearly 2^(n–1) hash values need to be computed on an average to obtain a second pre-image x′ of a given H(x).
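The 2^(n/2) bound is easy to observe experimentally. The sketch below (names ours) truncates SHA-256 to a toy n = 24 bits and searches for a collision by storing hashes in a dictionary; roughly 2^(n/2) = 2^12 trials typically suffice, far below the work a pre-image search would need:

```python
import hashlib
from itertools import count

def toy_hash(msg: bytes) -> bytes:
    """SHA-256 truncated to n = 24 bits (a deliberately weak toy hash)."""
    return hashlib.sha256(msg).digest()[:3]

def find_collision():
    """Birthday search: store hashes until one value repeats."""
    seen = {}
    for i in count():
        m = str(i).encode()
        h = toy_hash(m)
        if h in seen:
            return seen[h], m, i + 1      # colliding pair and trial count
        seen[h] = m

m1, m2, trials = find_collision()
assert m1 != m2 and toy_hash(m1) == toy_hash(m2)
```

The number of trials fluctuates from run to run of the hash choice, but its expectation is on the order of the square root of the 2^24 output space.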
A.23Let F1 be a collision resistant compression function mapping bit strings of length 2n to bit strings of length n.
  1. Define a compression function F2, mapping bit strings of length 4n to bit strings of length n, as follows. Let x be a bit string of length 4n. Write x = LR, where each of L and R is of length 2n bits. Define F2(x) := F1(F1(L) ‖ F1(R)). Show that F2 is also collision-resistant.

  2. Inductively define, for k ≥ 2, the compression function Fk, mapping bit strings of length 2^k n to bit strings of length n, as Fk(x) := F1(Fk–1(L) ‖ Fk–1(R)), where L and R are the left and right halves of x. Show that each Fk is collision resistant.

  3. Show that if F1 is first pre-image resistant, then so is each Fk.

  4. Define an n-bit hash function H as follows. Let x be a bit string of length l. If l < n, take k := 1, else choose k such that 2^(k–1)n ≤ l < 2^k n. Construct the string y := x ‖ 1 ‖ 00 . . . 0 of length 2^k n and define H(x) := Fk(y). Is H collision resistant? [H] (Appending a one-bit at the end of x delimits x and thereby prevents trivial collisions.)

A.24
  1. Let F1 and F2 be cryptographic compression functions, F1 mapping m1-bit strings to n1-bit strings and F2 mapping m2-bit strings to n2-bit strings. Show that F defined as F(L ‖ R) := F1(L) ‖ F2(R) (where L is of length m1 and R of length m2) is again a cryptographic compression function.

  2. The hash function H derived from DES (Section A.4.1) produces 64-bit hash values. For reasonable security, we require n-bit hash values with n at least 128. Use Part (a) to propose a method to make H achieve this desired level of security.

A.25Assume that in the SHA-1 algorithm the designers opted for Algorithm A.19 with the following minor modifications: They defined fj as fj(x, y, z) := x ⊕ y ⊕ z for all j, and they replaced all costly mod 2^32 addition operations (+) by cheap bit-wise XOR operations (⊕). Do you sense anything wrong with this design? [H]

B. Key Exchange in Sensor Networks

B.1Introduction
B.2Security Issues in a Sensor Network
B.3The Basic Bootstrapping Framework
B.4The Basic Random Key Predistribution Scheme
B.5Random Pairwise Scheme
B.6Polynomial-pool-based Key Predistribution
B.7Matrix-based Key Predistribution
B.8Location-aware Key Predistribution

One of the keys to happiness is a bad memory.

—Rita Mae Brown

That theory is worthless. It isn’t even wrong!

—Wolfgang Pauli

You’re only as sick as your secrets.

—Anonymous

B.1. Introduction

Public-key cryptography is not a solution to every security problem. Asymmetric routines are bulky and slow; in practice, they augment symmetric cryptography by eliminating the need for the prior secret establishment of keys between communicating parties. On a workstation built with today's computing technology, this is an acceptable trade-off. A 1 GHz processor runs one public-key encryption or key-exchange primitive in tens to hundreds of milliseconds, using at least hundreds of kilobytes of memory. That is reasonable for most applications, given that these routines are invoked rather infrequently.

Now, imagine a situation, where many tiny computing nodes, called sensor nodes, are scattered in an area for the purpose of sensing some data and transmitting the data to nearby base stations for further processing. This transmission is done by short-range radio communications. The base stations are assumed to be computationally well-equipped, but the sensor nodes are resource-starved. Such networks of sensor nodes are used in many important applications including tracking of objects in an enemy’s area for military purposes and scientific, engineering and medical explorations like wildlife monitoring, distributed seismic measurement, pollution tracking, monitoring fire and nuclear power plants and tracking patients. In some cases, mostly for military and medical applications, data collected by sensor nodes need to be encrypted before transmitting to neighbouring nodes and base stations.

Evidently one has to resort to symmetric-key cryptography in order to meet the security needs in a sensor network. Appendix B provides an overview of some key exchange schemes suitable for sensor networks.

B.2. Security Issues in a Sensor Network

Several issues make secure communication in sensor networks different from that in usual networks:

Limited resources in sensor nodes

Each sensor node contains a primitive processor featuring very low computing speed and only a small amount of programmable memory. The popular Atmel ATmega 128L, as an example, is an 8-bit 4 MHz RISC processor with only 128 kbytes of programmable memory. The processor does not support instructions for multiplying or dividing integers. Performing a single RSA or Diffie-Hellman exponentiation for cryptographic key sizes takes tens of minutes to several hours on such a processor.

Limited lifetime of sensor nodes

Each sensor node is battery-powered and is expected to operate for only a few days. Once the deployed sensor nodes die, it becomes necessary to add fresh nodes to the network to continue the data-collection operation. This calls for dynamic management of security objects (like keys).

Limited communication ability of sensor nodes

Sensor nodes communicate with each other and with the base stations by wireless radio transmission at low bandwidth and over small communication ranges. For sensor nodes built around the Atmel ATmega 128L, the maximum bandwidth is 40 kbps, and the communication range is at most 100 feet (30 m).

Moreover, the deployment area may have irregularities (like physical obstacles) that further limit the communication abilities of the nodes. One, therefore, expects that a deployed sensor node can directly communicate with only a few other nodes in the network.

Possibility of node capture

A sensor network is vulnerable to the capture of nodes by the enemy. The captured nodes may be physically destroyed, or utilized to send misleading signals and/or disrupt the normal activity of the network. As a result, no node should fully trust the nodes with which it communicates. The relevant security goal in this context is that captured nodes should not divulge to the enemy enough secrets to jeopardize the communication among the uncaptured nodes.

Lack of knowledge about deployment configuration

In many situations (like scattering of nodes from airplanes or trucks), the post-deployment configuration of the sensor network is not known a priori. It is unreasonable to use security algorithms that depend strongly on the locations of nodes in the network. For example, each sensor node u is expected to have only a few neighbours with which it can directly communicate. This is precisely the set of nodes with which u needs to share keys. However, this list cannot be determined before the actual deployment. An approximate knowledge of the locations of the nodes may strengthen the protocols, but robustness for handling run-time variations must be built into the protocols.

Mobility of sensor nodes

Sensor nodes may be static or mobile. Mobile nodes change the network configurations (like the lists of neighbours) as functions of time and call for time-varying security tools.

Still, sensor nodes need to communicate secretly. The clear impracticality of using public-key routines forces one to use symmetric ciphers. But setting up symmetric keys among communicating nodes is a difficult task. The number n of nodes in a sensor network can range up to several hundred thousand. Storing a symmetric key for each pair of nodes is impossible, since that requires each sensor to have a memory large enough to store n – 1 keys. On the other extreme, every communication may use a single network-wide symmetric key. In that case, the capture of a single node makes communication over the entire network completely insecure.

The plot thickens. There are graceful ways out. A host of algorithms has been recently proposed to address key establishment issues in sensor networks. In the rest of this appendix, we provide a quick survey of these tools. For the sake of simplicity, we assume here that our sensor network is static, that is, the nodes have no (or negligibly small) mobility. Though the schemes described below may be adapted to mobile networks, the required modifications are not necessarily easy and the current literature does not seem to be ready to take mobility into account.

We continue to deal with sensor processors of the capability of Atmel ATmega 128L. In practice, better processors (with speed, storage and cost roughly one order of magnitude higher) are available. We assume that the size (number of nodes) n of a sensor network is (usually) not bigger than a million, and also that a sensor node has of the order of 100 neighbours in its communication range.

B.3. The Basic Bootstrapping Framework

Key establishment in a sensor network is effected by a three-stage process called bootstrapping. Subsequent node-to-node communication uses the keys established during the bootstrapping phase. The three stages of bootstrapping are as follows:

Key predistribution

This step is carried out before the deployment of the sensors. A key set-up server chooses a pool K of randomly generated keys and assigns to each sensor node ui a subset Ki of K. The set Ki is called the key ring of the node ui. The key predistribution algorithms essentially differ in the ways the sets K and Ki are selected. Each key is associated with an ID that need not be kept secret and can even be transmitted in plaintext. Similarly, each sensor node is given a unique ID which need not be maintained secretly.

Direct key establishment

Immediately after deployment, each sensor node tries to determine all other sensor nodes with which it can communicate directly and secretly. Two nodes that are within the communication ranges of one another are called physical neighbours, whereas two nodes sharing one (or more) key(s) in their key rings are called key neighbours. Two nodes can secretly (and directly) communicate with one another if and only if they are both physical and key neighbours; let us call such pairs direct neighbours.

In the direct key establishment phase, each sensor node u locates its direct neighbours. To that end, u broadcasts its own ID and the IDs of the keys in its key ring. Each physical neighbour v of u responds with the matching key IDs, if any, stored in the key ring of v. This is how u identifies its direct neighbours.

If sending unencrypted key IDs poses a potential threat to the security of the network, each node u can instead encrypt some plaintext message m with each of the keys in its ring and broadcast the corresponding ciphertexts in place of the key IDs. Those physical neighbours of u that can decrypt one of the transmitted ciphertexts using one of the keys in their respective key rings establish themselves as direct neighbours of u.
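The key-ID matching step can be sketched as follows. This is a toy in-memory model of our own; a real node would broadcast its IDs over the radio rather than consult shared tables, and the node and key IDs here are purely illustrative.

```python
# Sketch of direct key establishment: each node broadcasts the IDs of the
# keys in its ring; physical neighbours reply with the matching IDs.

def find_direct_neighbours(node_id, key_rings, physical_neighbours):
    """Return {neighbour_id: shared_key_ids} for one node's handshake."""
    my_ring = key_rings[node_id]
    direct = {}
    for v in physical_neighbours[node_id]:
        shared = my_ring & key_rings[v]   # matching key IDs, if any
        if shared:                        # key neighbour AND physical neighbour
            direct[v] = shared
    return direct

key_rings = {
    "u": {1, 5, 9},
    "v": {2, 5, 7},   # shares key 5 with u
    "w": {3, 4, 8},   # shares nothing with u
}
physical_neighbours = {"u": ["v", "w"]}

print(find_direct_neighbours("u", key_rings, physical_neighbours))
# v is a direct neighbour (shared key ID 5); w is only a physical neighbour
```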

Path key establishment

This is an optional stage and, if executed, adds to the connectivity of the network. Suppose that two physical neighbours u and v fail to establish a direct link between them in the direct key establishment phase, but there exists a path u = u0, u1, u2, . . . , uh–1, uh = v in the network with each ui a direct neighbour of ui+1 (for i = 0, 1, . . . , h – 1). The node u then generates a random key k, encrypts k with the key shared between u and u1, and sends the encrypted key to u1. Subsequently, u1 retrieves k by decryption, encrypts k with the key shared by u1 and u2, and sends this encrypted version of k to u2. This process is repeated until the key k reaches the desired destination v. Now, u and v can communicate secretly and directly using k and thereby become direct neighbours.

The main difficulty in this process is the discovery of a path between u and v. This can be achieved by u initiating a message reflecting its desire to communicate with v. Let u1 be a direct neighbour of u. If u1 is also a direct neighbour of v, a path between u and v is discovered. Else u1 retransmits u’s request to its own direct neighbours. This process is repeated until a path is established between u and v, or the number of hops exceeds a certain limit. Note that path discovery may incur substantial communication overhead, and so the maximum number h of hops allowed needs to be fixed at a small value. Typically, h = 2 or 3 is recommended.
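The hop-limited path discovery just described can be sketched as a bounded breadth-first search over the direct-neighbour graph. The graph below is our own illustration, not part of any of the schemes discussed here.

```python
from collections import deque

# Hop-limited path discovery for path key establishment: breadth-first
# search over the direct-neighbour graph, abandoning any branch once the
# number of hops reaches the small limit h (typically h = 2 or 3).

def find_path(graph, u, v, max_hops):
    """Return a u-v path of at most max_hops links, or None."""
    queue = deque([[u]])
    while queue:
        path = queue.popleft()
        last = path[-1]
        if last == v:
            return path
        if len(path) - 1 == max_hops:     # hop budget exhausted
            continue
        for w in graph[last]:
            if w not in path:             # avoid revisiting nodes
                queue.append(path + [w])
    return None

graph = {                                  # direct-neighbour lists (toy data)
    "u": ["a"], "a": ["u", "b"], "b": ["a", "v"], "v": ["b"],
}
print(find_path(graph, "u", "v", 3))   # ['u', 'a', 'b', 'v']
print(find_path(graph, "u", "v", 2))   # None: the only path needs 3 hops
```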

A bootstrapping algorithm, or more precisely, a key predistribution algorithm must fulfill the following requirements. These requirements often turn out to be mutually contradictory. A key predistribution scheme attempts to achieve suitable trade-offs among them.

Compactness

Each key ring should be small enough to fit in a sensor node’s memory. Typically, 50–200 cryptographic keys (say, 128-bit keys of block ciphers) can be stored in each processor. This number lies between the two extremes of n – 1 keys per node (a key for each pair) and a single master key for the entire network.

Randomness

The key rings in different nodes are to be chosen randomly from a big pool, so that the rings of two different nodes do not overlap too much.

Network connectivity

The resulting network should be connected in the following sense: the undirected graph G = (V, E), with V comprising the nodes in the network and E containing a link (u, v) if and only if u and v are direct neighbours, must be connected (or at least connected with high probability).

Resilience against node capture

Ideally, the capture of any number of nodes must not divulge the secret key(s) between uncaptured direct neighbours. Practically, the fraction of communication links among uncaptured nodes that are compromised by node captures must remain small, at least as long as the fraction of captured nodes is not too high.

Scalability

Arbitrarily (but not impractically) big networks should be supported.

Future addition of nodes

One should allow new nodes to join the network at any time after the initial deployment, for example, to replace captured, faulty and dead nodes.

Additional requirements may also be conceived of in order to take curative measures against active attacks and/or faults. However, a study of active attacks and of countermeasures against those is beyond the scope of our treatment here.

Detection of bad nodes

There should be a mechanism to detect the presence and identities of dead, malfunctioning and rogue nodes. Here, a rogue node stands for a captured node that is used by the enemy to disrupt the natural working of the network. Active attacks mountable by the enemy include transmission of unauthorized and misleading data across the network, making neighbours always busy and letting them run out of battery sooner than the expected lifetime (sleep deprivation attack), and so on.

Revocation of bad nodes

Faulty and rogue nodes must be pruned out of the network before they can cause sizeable harm.

Resilience against node replication

Captured nodes can be replicated and the copies deployed by the enemy with the intention that these added nodes outnumber the legitimate nodes and eventually take control of the network. There should be a strategy to detect and cure replication of malicious nodes.

We now concentrate on some concrete realizations of the bootstrapping scheme. The optional third stage (path key establishment) will often be excluded from our discussion, because there are few algorithm-specific issues in this stage.

Before we introduce specific algorithms, let us summarize the notation we are going to use in the rest of this appendix:

n = Number of nodes in the sensor network
n′ = (Expected) number of nodes in the physical neighbourhood of each node
d = Degree of connectivity of each node in the key/direct neighbourhood graph
Pc = Global connectivity (a high probability like 0.9999)
p′ = Local connectivity (probability that two physical neighbours share a key)
M = Size of the key pool
m = Size of the key ring of each node (in number of cryptographic keys)
Fq = The underlying field for the poly-pool and the matrix-pool schemes
S = Size of the polynomial (or matrix) pool
s = Number of polynomial (or matrix) shares in the key ring of each node
t = Degree of a polynomial (or dimension of a matrix)
c = Number of nodes captured
Pe = Probability of successful eavesdropping expressed as a function of c

B.4. The Basic Random Key Predistribution Scheme

The paper [88] by Eschenauer and Gligor is a pioneering work on bootstrapping in sensor networks. Their scheme, henceforth referred to as the EG scheme, is essentially the basic bootstrapping method just described.

The key set-up server starts with a pool K of randomly generated keys. The number M of keys in K is taken to be a small multiple of the network size n. For each sensor node u to be deployed, a random subset of m keys from K is selected and given to u as its key ring. Upon deployment, each node discovers its direct neighbours as specified in the generic description. We now explain how the parameters M and m are to be chosen so as to make the resulting network connected with high probability.

Let us first look at the key neighbourhood graph Gkey on the n sensor nodes, in which a link exists between two nodes if and only if these nodes are key neighbours. Let p denote the probability that a link exists between two randomly selected nodes of this graph. A result on random graphs due to Erdős and Rényi indicates that in the limit n → ∞, the probability that Gkey is connected is

Equation B.1

Pc = e^(−e^(−ξ)),   where p = (ln n)/n + ξ/n.

We fix Pc at a high value, say, 0.9999, and express the expected degree of each node in Gkey as

Equation B.2

d = ((n − 1)/n) (ln n − ln(−ln Pc)).

In practice, we should also bring physical neighbourhood into consideration and look at the direct neighbourhood graph G = Gdirect on the n deployed sensor nodes. In this graph, two nodes are connected by an edge if and only if they are direct neighbours. G is not random, since it depends on the geographical distribution of the nodes in the deployment area. However, we assume that the above result for random graphs continues to hold for G too. In particular, we fix the degree of direct connectivity of each node to be (at least) d and require

Equation B.3

p′ = d/n′,

where n′ denotes the expected number of physical neighbours of each node, and p′ is the probability that two physical neighbours share one or more keys in their key rings. (Pc is often called the global connectivity and p′ the local connectivity.)

For the determination of p′, we first note that there is a total of C(M, m) (the binomial coefficient “M choose m”) key rings of size m that can be chosen from the pool of size M. For a fixed key ring Ki, the total number of ways of choosing a key ring Kj that does not share a key with Ki is equal to the number of ways of choosing m keys from the M − m keys outside Ki. This number is C(M − m, m). It then follows that

Equation B.4

p′ = 1 − C(M − m, m)/C(M, m).

Equations (B.2), (B.3) and (B.4) dictate how the key-pool size M is to be chosen, given the values of n, n′ and m.

Example B.1.

As a specific numerical example, consider a sensor network with n = 10,000 nodes. For the desired probability Pc = 0.9999 of connectedness of Gkey, we use Equation (B.2) to obtain the desired degree d as d ≥ 18.419. Let us take d = 20. Now, suppose that the expected number of physical neighbours of each deployed node is n′ = 50. By Equation (B.3), we then require p′ = d/n′ = 0.4. Finally, assume that each sensor can hold m = 150 keys in its memory. Equation (B.4) indicates that we should have M ≤ 44,195 in order to ensure p′ ≥ 0.4. In particular, we may take M = 40,000.
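The figures of Example B.1 can be checked numerically. A minimal sketch, using Equation (B.2) for the required degree and Equation (B.4) for the local connectivity (the function names are ours):

```python
import math

# Numerical check of Example B.1 under the random-graph model.

def required_degree(n, Pc):
    # d = ((n-1)/n) * (ln n - ln(-ln Pc))        -- Equation (B.2)
    return ((n - 1) / n) * (math.log(n) - math.log(-math.log(Pc)))

def local_connectivity(M, m):
    # p' = 1 - C(M-m, m)/C(M, m)                 -- Equation (B.4)
    return 1 - math.comb(M - m, m) / math.comb(M, m)

n, Pc, nprime, m = 10_000, 0.9999, 50, 150
d = required_degree(n, Pc)
print(round(d, 3))                                   # about 18.419; take d = 20
print(local_connectivity(40_000, m) >= 20 / nprime)  # M = 40,000 gives p' >= 0.4
```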

Let us now study the resilience of the EG scheme against node captures. Assume that c nodes are captured at random from the network and that u and v are two uncaptured nodes that are direct neighbours. We compute the probability Pe that an eavesdropper can decipher encrypted communication between u and v based on the knowledge of the keys available from the c captured key rings. Clearly, smaller values of Pe indicate higher resilience against node captures.

Suppose that u and v use the key k for communication between them. Then, Pe is equal to the probability that k resides in one of the key rings of the c captured nodes. Since each key ring consists of m keys randomly chosen from a pool of M keys, the probability that a particular key k is not available in a key ring is 1 − m/M, and consequently the probability that k does not appear in any of the c compromised key rings is (1 − m/M)^c. Thus, the probability of successful eavesdropping is

Pe = 1 − (1 − m/M)^c.

Example B.2.

As in Example B.1, take n = 10,000, n′ = 50, m = 150 and M = 40,000. If c = 100 nodes are captured, the fraction of compromised communication is Pe ≈ 0.313. Thus, a capture of only 100 nodes leads to a compromise of about one-third of the traffic. That is not a satisfactory figure. We need better algorithms.
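The eavesdropping probability Pe = 1 − (1 − m/M)^c of the EG scheme is a one-liner to evaluate; the sketch below reproduces the figures of Example B.2 (the function name is ours):

```python
# Resilience of the EG scheme: probability that the key used by two
# uncaptured direct neighbours leaks from c captured key rings.

def eg_eavesdrop_probability(m, M, c):
    # Pe = 1 - (1 - m/M)^c
    return 1 - (1 - m / M) ** c

m, M = 150, 40_000
for c in (10, 50, 100):
    print(c, round(eg_eavesdrop_probability(m, M, c), 3))
# c = 100 gives Pe close to 0.313: about one third of the traffic is compromised
```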

B.4.1. The q-composite Scheme

Chan et al. [44] propose several modifications of the basic EG scheme in order to improve upon the resilience of the network against node capture. The q-composite scheme, henceforth abbreviated as the qC scheme, is based on the requirement of a bigger overlap of key rings for enabling nodes to communicate.

As in the EG scheme, the key set-up server chooses a pool K of M random keys and loads the key ring of each node with a random subset of K of size m. Let the network consist of n nodes.

In the direct key establishment phase, each node u discovers all its physical neighbours that share q or more keys with u, where q is a predetermined system-wide parameter. Those physical neighbours that do so are now called direct neighbours of u. Let v be a direct neighbour of u, and let q′ ≥ q be the actual number of keys shared by u and v. Call these keys k1, k2, . . . , kq′. The nodes use the key

k := H(k1‖k2‖ · · · ‖kq′)

for future communication, where ‖ denotes string concatenation and H is a hash function. A pair of physical neighbours that share fewer than q predistributed keys do not communicate directly.

Recall that for the basic EG scheme q = 1, and the key k for communication between direct neighbours is taken to be one shared key instead of a hash value of all shared keys. The motivation behind going for the qC scheme is that requiring a bigger overlap between the key rings of a pair of physical neighbours leads to a smaller probability Pe of successful eavesdropping, since the eavesdropper now has to possess at least q shared keys (not just one). However, the requirement of q (or more) matching keys between communicating nodes restricts the key-pool size M more than in the EG scheme, and consequently a capture of fewer nodes reveals a bigger fraction of the total key pool to the eavesdropper. Chan et al. [44] report that the best trade-off is achieved for the value q = 2 or 3.
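The derivation k := H(k1‖k2‖ · · · ‖kq′) can be sketched as follows. SHA-256 stands in for the unspecified hash H, and the canonical sorting of the shared keys is our own device to make both endpoints produce identical input to H.

```python
import hashlib

# Sketch of the q-composite link-key derivation.

def composite_key(shared_keys, q=2):
    """Derive a link key from q' >= q shared keys (each given as bytes)."""
    if len(shared_keys) < q:
        raise ValueError("fewer than q shared keys: no direct link")
    material = b"".join(sorted(shared_keys))  # both sides must agree on the order
    return hashlib.sha256(material).digest()

k1, k2, k3 = b"\x01" * 16, b"\x02" * 16, b"\x03" * 16
# both endpoints compute the same key regardless of discovery order
assert composite_key([k1, k3, k2]) == composite_key([k2, k1, k3])
print(composite_key([k1, k2, k3]).hex()[:16])
```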

Let us now derive the explicit expressions for M and Pe. Equations (B.1), (B.2) and (B.3) hold for the qC scheme with the sole exception that the interpretation of the probability p′ of direct neighbourhood is now different. There is a total of C(M, m)² ways of choosing an ordered pair of random key rings of size m from a pool of M keys. Let us compute the number of such pairs of key rings sharing exactly r keys. First, the shared r keys can be chosen in C(M, r) ways. Out of the remaining M − r keys, the remaining m − r keys for the first ring can be chosen in C(M − r, m − r) ways. Finally, the remaining m − r keys for the second ring can be chosen in C(M − m, m − r) ways from the M − m keys not present in the first ring. Thus, if p(r) denotes the probability that two random key rings share exactly r keys, we have

p(r) = C(M, r) C(M − r, m − r) C(M − m, m − r) / C(M, m)²,

that is,

p′ = 1 − (p(0) + p(1) + · · · + p(q − 1))

is the equivalent of Equation (B.4) for the qC scheme.

Example B.3.

As in Example B.1, consider n = 10,000, n′ = 50, m = 150. For d = 20, we require p′ ≥ 0.4. This, in turn, demands M ≤ 16,387 for q = 2 and M ≤ 9,864 for q = 3. Compare these with the requirement M ≤ 44,195 for the EG scheme.
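The local connectivity of the qC scheme can be evaluated exactly with the formula for p(r). The sketch below (function names ours) checks that pool sizes slightly below the bounds of Example B.3 indeed keep p′ near the required 0.4:

```python
from math import comb

# Local connectivity of the q-composite scheme:
# p(r)  = probability that two random key rings share exactly r keys,
# p'    = 1 - (p(0) + ... + p(q-1)).

def p_exactly(M, m, r):
    return comb(M, r) * comb(M - r, m - r) * comb(M - m, m - r) / comb(M, m) ** 2

def local_connectivity(M, m, q):
    return 1 - sum(p_exactly(M, m, r) for r in range(q))

m = 150
print(round(local_connectivity(16_000, m, 2), 3))  # near 0.4 for q = 2
print(round(local_connectivity(9_800, m, 3), 3))   # near 0.4 for q = 3
```

Note that p(r) equals the hypergeometric probability C(m, r) C(M − m, m − r)/C(M, m), so the values p(0), p(1), . . . sum to 1 as they must.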

Let us now calculate the probability Pe of successfully deciphering the communication between two uncaptured nodes u and v, given that c nodes are already captured by the eavesdropper. Let q′ ≥ q be the actual number of keys shared by u and v; given that u and v are direct neighbours, this happens with probability p(q′)/p′. Each of these shared keys is available to the eavesdropper with probability 1 − (1 − m/M)^c. It follows that

Pe = Σ from q′ = q to m of (1 − (1 − m/M)^c)^q′ · p(q′)/p′.

Example B.4.

Let us continue with the network of Examples B.1, B.2 and B.3. The following table summarizes the probabilities Pe for various values of c. For the EG scheme, we take M = 40,000, whereas for the qC scheme, we take M = 16,000 for q = 2 and M = 9,800 for q = 3.

Pe (probability of successful eavesdropping)

Scheme   c = 10   c = 20   c = 30   c = 40   c = 50   c = 75   c = 100   c = 150
EG       0.037    0.072    0.107    0.140    0.171    0.246    0.313     0.431
2C       0.005    0.019    0.041    0.068    0.101    0.196    0.300     0.499
3C       0.002    0.011    0.032    0.066    0.111    0.255    0.413     0.678

This table indicates that when the number of nodes captured is small, the qC scheme outperforms the EG scheme. However, for large values of c, the effects of smaller values of the key-pool size show up, leading to a poorer performance of the qC schemes compared to the EG scheme.

B.4.2. Multi-path Key Reinforcement

Another way to improve the resilience of the network against node captures is the multi-path key reinforcement scheme, proposed again by Chan et al. [44]. As in the EG scheme, sensor nodes are deployed each with m keys in its key ring chosen randomly from a pool of M keys. Let u and v establish themselves as direct neighbours sharing the key k. Instead of using k itself as the key for future communication, the nodes try to locate several pairwise node-disjoint paths between them. Such a path u = v0, v1, . . . , vl = v consists of pairs of direct neighbours (vi, vi+1) for i = 0, . . . , l – 1. A randomly generated key is then routed securely along each such path from u to v.

Assume that r node-disjoint paths between u and v are discovered and that the random keys k1, k2, . . . , kr are transferred securely along these paths. The nodes u and v then use the key

k′ := k ⊕ k1 ⊕ k2 ⊕ · · · ⊕ kr

for future communication (here ⊕ denotes bit-wise XOR).

The reason why this scheme improves resilience against node captures is that even if the original k resides in the memory of a captured node, the new key k′ is computable by the adversary if and only if she can obtain all of the r session secrets k1, k2, . . . , kr. The bigger r is, the more difficult it is for the adversary to eavesdrop on all of the r node-disjoint paths. On the other hand, if the lengths of these paths are large, then the probability of eavesdropping at some links of the paths increases. Moreover, increasing the lengths of the paths incurs bigger communication overhead. The proponents of the scheme recommend only 2-hop multi-path key reinforcement.

We do not go into the details of the analysis of the multi-path key reinforcement scheme, but refer the reader to Chan et al. [44]. We only note that though it is possible to use multi-path key reinforcement for the q-composite scheme, it is not a lucrative option. The smaller size of the key pool for the q-composite scheme tends to nullify the effects of multi-path key reinforcement.

B.5. Random Pairwise Scheme

A pairwise key predistribution scheme offers perfect resilience against node captures, that is, the capture of any number c of nodes does not reveal any information about the secrets used by uncaptured nodes. This corresponds to Pe = 0 irrespective of c. This desirable property of the network is achieved by giving each key to the key rings of only two nodes. Moreover, the sharing of a key k between two unique nodes u and v implies that these nodes can authenticate themselves to one another: no other node possesses k, so no third node can prove itself as u to v or as v to u.

Pairwise keys can be distributed to nodes in many ways. Now, we deal with random distribution. Let m denote the size of the key ring of each sensor node. For each node u in the network, the key set-up server randomly selects m other nodes v1, . . . , vm and distributes a new random key ki to each of the pairs (u, vi) for i = 1, . . . , m. This distribution mechanism should also ensure that two nodes u, v in the network share at most one key. If k is given to u and v, the set-up server also attaches the ID of v to the copy of k in the key ring of u and the ID of u to the copy of k in the key ring of v.

In the direct key establishment phase, each node u broadcasts its own ID. Each physical neighbour v of u that finds the ID of u stored against a key in its key ring identifies u as its direct neighbour, along with the unique key shared by u and v.
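The ID-tagged predistribution and the subsequent lookup can be sketched as follows. This is a simplified toy of ours: a real set-up server would guarantee that every ring fills to exactly m keys, whereas this sketch only caps rings at m.

```python
import random

# Sketch of random pairwise predistribution: each pair of partnered nodes
# receives a fresh random key, stored at both ends and tagged with the
# partner's ID, so that deployment-time discovery is a simple ID lookup.

def predistribute(node_ids, m, rng):
    """Give each chosen pair one shared key; aim for m keys per ring."""
    rings = {u: {} for u in node_ids}          # ring: partner ID -> key
    for u in node_ids:
        candidates = [v for v in node_ids
                      if v != u and v not in rings[u] and len(rings[v]) < m]
        rng.shuffle(candidates)
        for v in candidates[:max(0, m - len(rings[u]))]:
            k = rng.getrandbits(128)
            rings[u][v] = rings[v][u] = k      # same key, tagged with partner ID
    return rings

def discover(u, v, rings):
    """Direct key establishment: v looks up u's broadcast ID in its ring."""
    return rings[v].get(u)

rng = random.Random(7)
rings = predistribute(list(range(20)), 3, rng)
partner = next(iter(rings[0]))
print(discover(0, partner, rings) == discover(partner, 0, rings))  # True
```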

The analysis of the random pairwise scheme is a bit tricky. Here, the global connectivity graph Gkey is m-regular, that is, each node has degree exactly m (so the global link probability is p = m/(n − 1) ≈ m/n), and we cannot expect to maintain this degree locally too. On the other hand, it is reasonable to assume under a random deployment model that the fraction of nodes with which a given node shares pairwise keys remains the same both locally and globally. More precisely, we equate p′ with p, that is,

Equation B.5

p′ = d/n′ = m/n = p.

Here, d denotes the desired local degree of a node. Equation (B.2) gives the formula for d in terms of the global connectivity Pc. For Pc = 0.9999, we have d = 16.11 for n = 1,000, d = 18.42 for n = 10,000, d = 20.72 for n = 100,000, and d = 23.03 for n = 1,000,000. That is, the value of d does not depend heavily on n, as long as n ranges over practical values. In particular, one may fix d = 20 (or d = 25 more conservatively) for all applications.

Equation (B.5) implies

n = m n′/d.

This equation reflects the drawback of the random pairwise scheme. The value m is limited by the memory of a sensor node, n′ is dictated by the density of nodes in the deployment area, and d can be taken as a constant; so the network size n is bounded above by the quantity m n′/d, called the maximum supportable network size. The basic scheme (and its variants) supports networks of arbitrarily large sizes, whereas the random pairwise scheme offers only limited support.

Example B.5.

Take m = 150, n′ = 50 and d = 20. The maximum supportable network size is then m n′/d = (150 × 50)/20 = 375. This is too small to be useful. We require modifications of the random pairwise scheme in order to be able to use it in practice.

B.5.1. Multi-hop Range Extension

Since m and d are limited by hard constraints, the only way to increase the maximum supportable network size is to increase the effective size n′ of the physical neighbourhood of a node. The multi-hop range extension strategy accomplishes that. In the direct key establishment phase, each node u broadcasts its ID. Each physical neighbour v of u re-broadcasts the ID of u. Each physical neighbour w of v then re-re-broadcasts the ID of u. This process is continued for a predetermined number r of hops. Any node u′ reachable from u in ≤ r hops and sharing a pairwise key with u can now establish a path of secure communication with u. During a future communication between u and u′, the intermediate nodes in the path simply forward a message encrypted by the pairwise key between u and u′. Using r hops thereby increases the effective radius of physical neighbourhood by a factor of r, and consequently the number of effective neighbours of each node gets multiplied by a factor of r². Thus, the maximum supportable network size now becomes

n = r² m n′/d.

For r = 3 and for the parameters of Example B.5, this size now attains a more decent value of 3375.
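The arithmetic of the maximum supportable network size, with and without range extension, is easily checked (the function name is ours):

```python
# Maximum supportable network size of the random pairwise scheme,
# n = m * n' / d, and its r-hop range extension, n = r^2 * m * n' / d
# (the effective neighbourhood grows quadratically with the hop count r).

def max_network_size(m, nprime, d, r=1):
    return (r * r * m * nprime) // d

m, nprime, d = 150, 50, 20
print(max_network_size(m, nprime, d))        # 375  (Example B.5)
print(max_network_size(m, nprime, d, r=3))   # 3375 (three-hop range extension)
```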

Increasing r incurs some cost. First, the communication overhead increases quadratically with r. Second, since intermediate nodes in a multi-hop path simply retransmit messages without authentication, chances of specific active attacks at these nodes increase. Large values of r are, therefore, discouraged.

B.6. Polynomial-pool-based Key Predistribution

Liu and Ning’s polynomial-pool-based key predistribution scheme (abbreviated as the poly-pool scheme) [181, 183] is based on the idea presented by Blundo et al. [28]. Let Fq be a finite field with q just large enough to accommodate a symmetric encryption key. For a 128-bit block cipher, one may take q to be the smallest prime larger than 2^128 (prime field) or 2^128 itself (extension field of characteristic 2). Let f(X, Y) ∈ Fq[X, Y] be a bivariate polynomial that is assumed to be symmetric, that is, f(X, Y) = f(Y, X). Let t be the degree of f in each of X and Y. A polynomial share of f is a univariate polynomial f^(α)(X) := f(X, α) for some element α ∈ Fq. Two shares f^(α) and f^(β) of the same polynomial f satisfy

Equation B.6

f^(α)(β) = f(β, α) = f(α, β) = f^(β)(α).

Thus, if the shares f^(α) and f^(β) are given to two nodes, they can come up with the common value f(α, β) = f(β, α) as a shared secret between them.

Given t + 1 or more shares of f, one can reconstruct f(X, Y) uniquely using Lagrange’s interpolation formula (Exercise 2.53). On the other hand, if only t or fewer shares are available, there are many (at least q) possibilities for f, and it is impossible to determine f uniquely. So the disclosure of up to t shares does not reveal the polynomial f to an adversary, and uncompromised shared keys based on f remain secure.
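The share mechanism of Equation (B.6) can be sketched over a toy field. The modulus below is far too small for real use (a deployment would use a prime of at least 128 bits), and the node IDs are illustrative; a symmetric coefficient matrix a[i][j] = a[j][i] defines f(X, Y), and node α stores the univariate share f(X, α).

```python
import random

# Toy sketch of Blundo-style symmetric bivariate polynomial shares.

P = 2**31 - 1          # field modulus (illustrative; too small in practice)
T = 4                  # degree t in each variable

rng = random.Random(1)
a = [[0] * (T + 1) for _ in range(T + 1)]
for i in range(T + 1):
    for j in range(i, T + 1):
        a[i][j] = a[j][i] = rng.randrange(P)   # enforce f(X, Y) = f(Y, X)

def share(alpha):
    """Coefficients of the univariate share f(X, alpha)."""
    return [sum(a[i][j] * pow(alpha, j, P) for j in range(T + 1)) % P
            for i in range(T + 1)]

def evaluate(coeffs, x):
    return sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P

alpha, beta = 17, 42                   # two node IDs
# f^(alpha)(beta) = f(beta, alpha) = f(alpha, beta) = f^(beta)(alpha):
assert evaluate(share(alpha), beta) == evaluate(share(beta), alpha)
print("shared key:", evaluate(share(alpha), beta))
```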

Using a single polynomial for the entire network is not a good proposal, since t is limited by memory constraints in a sensor node. In order to increase resilience against node captures, many bivariate polynomials need to be used, and shares of random subsets of this polynomial pool are assigned to the key rings of individual nodes. This is how the poly-pool scheme works. If the degree t equals 0, this scheme degenerates to the EG scheme.

The key set-up server first selects a random pool F of S symmetric bivariate polynomials in Fq[X, Y], each of degree t in X and Y. Some IDs are also generated for the nodes in the network. For each node u in the network, s polynomials f1, f2, . . . , fs are randomly picked from F, and the polynomial shares f1(X, α), f2(X, α), . . . , fs(X, α) are loaded in the key ring of u, where α is the ID of u. Each key ring now requires space for storing s(t + 1) log₂ q bits, that is, for storing m := s(t + 1) symmetric keys.

Upon deployment, each node u broadcasts the IDs of the polynomials, the shares of which reside in its key ring. Each physical neighbour v of u, that has shares of some common polynomial(s), establishes itself as a direct neighbour of u. The exact pairwise key k between u and v is then calculated using Equation (B.6). If broadcasting polynomial IDs in plaintext is too unsafe, each node u can send some message encrypted by potential pairwise keys based on its polynomial shares. Those physical neighbours that can decrypt one of these encrypted messages have shares of common polynomials.

Like the EG scheme, the poly-pool scheme can be analysed under the framework of random graphs. Equations (B.1), (B.2) and (B.3) continue to hold under the poly-pool scheme. However, in this case the local connection probability p′ is computed as

Equation B.7

p′ = 1 − C(S − s, s)/C(S, s).

Given constraints on the network and the nodes, the desired size S of the polynomial pool can be determined from this formula.

Let us now compute the probability Pe of compromise of communication between two uncaptured nodes u, v as a function of the number c of captured nodes. If c ≤ t, the eavesdropper cannot gather enough polynomial shares to learn anything about any polynomial in F, that is, Pe = 0. So assume that c > t, and let pr denote the probability that exactly r shares of a given polynomial f (say, the one whose shares are used by the two uncaptured nodes u, v) are available in the key rings of the c captured nodes. The probability that a share of f is present in a given key ring is s/S, and so (by the binomial distribution)

Equation B.8

pr = C(c, r) (s/S)^r (1 − s/S)^(c − r).

Since t + 1 or more shares of f are required for the determination of f, we have

Equation B.9

Pe = p(t+1) + p(t+2) + · · · + pc = 1 − (p0 + p1 + · · · + pt).

Example B.6.

Let n = 10,000 (network size), n′ = 50 (expected size of physical neighbourhood of a node), m = 150 (key ring size in number of symmetric keys) and Pc = 0.9999 (global connectivity). Let us plan to choose bivariate polynomials of degree t = 49, so that each key ring can hold s = 3 polynomial shares.

For the determination of S, we first compute d = 20 as in Example B.1. We then require p′ ≥ d/n′ = 0.4. The biggest size S satisfying this bound is derived from Equation (B.7) as S = 20.

The following table lists the probability Pe for various values of c.

c     50           100          150         200         250      300     350     400
Pe    6.38×10^−42  2.30×10^−16  1.70×10^−8  1.52×10^−4  0.0196   0.231   0.668   0.932

The table shows substantial improvement in resilience against node capture as achieved by the poly-pool scheme over the EG and qC schemes.
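The tail probability of Equations (B.8) and (B.9) is straightforward to evaluate numerically. A sketch (function name ours) for the parameters of Example B.6:

```python
from math import comb

# Resilience of the poly-pool scheme: Pe is the upper tail of a binomial
# distribution, since each captured key ring holds a share of the target
# polynomial independently with probability s/S (Equations (B.8), (B.9)).

def poly_pool_Pe(c, t, s, S):
    if c <= t:
        return 0.0                     # too few shares to reconstruct f
    p = s / S
    # Pe = 1 - sum_{r=0}^{t} C(c, r) p^r (1-p)^(c-r)
    return 1 - sum(comb(c, r) * p**r * (1 - p)**(c - r) for r in range(t + 1))

s, S, t = 3, 20, 49                    # parameters of Example B.6
for c in (100, 200, 300):
    print(c, poly_pool_Pe(c, t, s, S))
# c = 300 yields a value close to the 0.231 listed in the table
```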

B.6.1. Pairwise Key Predistribution

The poly-pool scheme can be made pairwise by allowing no more than t + 1 shares of any polynomial to be distributed among the nodes. The best that the adversary can achieve is a capture of nodes with all these t + 1 shares and a subsequent determination of the corresponding bivariate polynomial. But this knowledge does not help the adversary, since no other node in the network uses a share of this compromised polynomial. That is, two uncaptured nodes continue to communicate with perfect secrecy.

However, like the random pairwise scheme, the pairwise poly-pool scheme suffers from the drawback that the maximum supportable network size is now limited by the quantity S(t + 1)/s. For the parameters of Example B.6, this size turns out to be an impractically low 333.

B.6.2. Grid-based Key Predistribution

The grid-based key predistribution considerably enhances the resilience of the network against node captures. To start with, let us play a bit with Example B.6.

Example B.7.

Take n = 10,000, n′ = 50 and m = 150. We calculated that the optimal value of S that keeps the network connected with high probability is S = 20. Now, let us instead take a much bigger value of S, say, S = 200. First, let us look at the brighter side of this choice. The probability Pe is listed in the following table as a function of c.

c     500          1000         1500        2000        2500     3000    3500    4000
Pe    1.90×10^−25  4.88×10^−13  3.10×10^−7  4.68×10^−4  0.0282   0.245   0.655   0.917

That is a dramatic improvement in the resilience figures. It, however, comes at a cost. The optimal value S = 20 was selected in Example B.6 in order to achieve a desired connectivity in the network. With S = 200, the probability p′ reduces from 0.404 to about 0.045, and each node is expected to have only about 2 direct neighbours. As a result, the network is likely to remain disconnected with high probability.

The grid-based key predistribution scheme allocates polynomial shares cleverly to the nodes so as to achieve the resilience figures of the last example with a reasonable guarantee that the resulting network remains connected. Let n be the size of the network, and take σ = ⌈√n⌉. For the sake of simplicity, let us assume that n = σ². The n nodes are then placed on a σ × σ square grid. The node at the (i, j)-th grid location (where i, j ∈ {0, 1, . . . , σ − 1}) is identified by the pair (i, j). The set-up server generates 2σ random symmetric bivariate polynomials f^(r)_i and f^(c)_j (for i, j ∈ {0, 1, . . . , σ − 1}) in Fq[X, Y], each of degree t in both X and Y. The polynomial f^(r)_i corresponds to the i-th row, and f^(c)_j to the j-th column, in the grid. The key ring of the node at location (i, j) in the grid is given the two polynomial shares f^(r)_i(X, j) and f^(c)_j(X, i). The memory required for this is equivalent to the storage for 2(t + 1) symmetric keys.

Now, look at the key establishment phase. Let two nodes u, v with IDs (i, j) and (i′, j′) be physical neighbours after deployment. First, consider the simple case i = i′. Both the nodes have shares of the row polynomial f^(r)_i and can arrive at the common secret value f^(r)_i(j, j′) using the column identities of one another. Similarly, if j = j′, the nodes can compute the shared secret f^(c)_j(i, i′). It follows that each node can establish keys directly with 2(σ – 1) other nodes in the network. That is, however, a truly small fraction of the entire network.

Assume now that i ≠ i′ and j ≠ j′. If the node w with identity either (i, j′) or (i′, j) is in the physical neighbourhood of both u and v, then there is a secure link between u and w, and also one between w and v. The nodes u and v can then establish a path key via the intermediate node w.

So suppose also that neither (i, j′) nor (i′, j) lies in the communication ranges of both u and v. Consider the nodes w1 := (i, k) and w2 := (i′, k) for some k ≠ j, j′. Suppose further that w1 is in the physical neighbourhood of u, w2 in that of w1, and v in that of w2. But then there is a secure u, v-path comprising the links uw1, w1w2 and w2v. Similarly, the nodes (k, j) and (k, j′) for each k ≠ i, i′ can help u and v establish a path key. To sum up, there are 2(σ – 2) potential three-hop paths between u and v.
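
The enumeration of candidate intermediaries described above can be sketched as follows. Node IDs are grid coordinate pairs; the function names are ours.

```python
def two_hop_intermediaries(u, v):
    """With u = (i, j) and v = (i', j'), i != i', j != j': the two grid nodes
    that share a row or column polynomial with each endpoint."""
    (i, j), (ip, jp) = u, v
    return [(i, jp), (ip, j)]

def three_hop_paths(u, v, sigma):
    """The 2*(sigma - 2) potential three-hop paths u -> w1 -> w2 -> v."""
    (i, j), (ip, jp) = u, v
    paths = []
    for k in range(sigma):
        if k != j and k != jp:
            paths.append([(i, k), (ip, k)])   # w1 shares row i with u; w1, w2 share column k
        if k != i and k != ip:
            paths.append([(k, j), (k, jp)])   # w1 shares column j with u; w1, w2 share row k
    return paths

assert two_hop_intermediaries((1, 2), (3, 4)) == [(1, 4), (3, 2)]
assert len(three_hop_paths((1, 2), (3, 4), sigma=10)) == 2 * (10 - 2)
```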

If all these three-hop paths fail, one may go for four-hop, five-hop, . . . paths, but at the cost of increased communication overhead. As argued in Liu and Ning [181, 183], exploring paths with ≤ 3 hops is expected to give the network high connectivity.

For the grid-based scheme, we have S = 2σ (the size of the polynomial pool) and s = 2 (the number of polynomial shares in each node’s key ring). The probability Pe can now be derived in the same manner as Equations (B.8) and (B.9):

Pe = 1 – (p0 + p1 + · · · + pt) = pt+1 + pt+2 + · · · + pc,

where pi is the probability that shares of a given polynomial are held by exactly i of the c captured nodes:

pi = C(c, i)(s/S)^i (1 – s/S)^(c–i).

Example B.8.

Take n = 10,000 and m = 150. Since each node has to store only two polynomial shares, we now take t = 74. Moreover, σ = 100, that is, the size of the polynomial pool is S = 200. The probability Pe can now be tabulated as a function of c (number of nodes captured) as follows:

c      1000        2000        3000        4000       5000       6000     7000
Pe     2.45×10–40  1.99×10–21  2.68×10–12  4.35×10–7  5.41×10–4  0.0334   0.290

This is very good performance. The capture of even 60 per cent of the nodes leads to a compromise of only 3.34 per cent of the communication among uncaptured nodes.
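
Assuming pi is the binomial probability that exactly i of the c captured key rings contain a share of a given polynomial (each independently with probability s/S), the tabulated values of Pe can be recomputed with a short script. Exact rational arithmetic is used because the smallest tail probabilities lie far below floating-point round-off.

```python
from math import comb
from fractions import Fraction

def p_compromised(c, t, s, S):
    """Pe: probability that more than t of the c captured nodes hold a share
    of a fixed polynomial, each independently with probability s/S."""
    p = Fraction(s, S)
    tail = sum(comb(c, i) * p**i * (1 - p)**(c - i) for i in range(t + 1))
    return float(1 - tail)   # Pe = p_{t+1} + ... + p_c

# Parameters of Example B.8: pool size S = 2*sigma = 200, s = 2 shares per ring, t = 74.
for c in (1000, 3000, 6000):
    print(c, p_compromised(c, t=74, s=2, S=200))
```

With exact fractions the c = 1000 value comes out around 10^–40, matching the order of magnitude in the table; a naive floating-point sum would report it as zero.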

This robustness of the grid-based distribution comes at a cost, though. The path key establishment stage is communication-intensive and is mandatory for ensuring good connectivity. Moreover, this stage is based on the assumption that not many nodes are captured during bootstrapping. If this assumption cannot be enforced, the scheme loses much of its expected resilience guarantee.

B.7. Matrix-based Key Predistribution

The matrix-based key predistribution scheme is derived from the idea proposed by Blom [25]. It is similar to the polynomial-based key predistribution but employs symmetric matrices (in place of symmetric polynomials). Let 𝔽q be a finite field with q just large enough to accommodate a symmetric key, and let G be a t × n matrix over 𝔽q, where t is determined by the memory of a sensor node and n is the number of nodes in the network. G need not be kept secret: anybody, even the adversary, may know G. We only require that any t columns of G be linearly independent (in particular, G has rank t). If g is a primitive element of 𝔽q, the following matrix is recommended.

Equation B.10

         ⎡  1          1            1            · · ·   1            ⎤
         ⎢  g          g²           g³           · · ·   g^n          ⎥
G   =    ⎢  g²         (g²)²        (g³)²        · · ·   (g^n)²       ⎥
         ⎢  ···        ···          ···                  ···          ⎥
         ⎣  g^(t–1)    (g²)^(t–1)   (g³)^(t–1)   · · ·   (g^n)^(t–1)  ⎦

(The j-th column consists of the powers (g^j)^0, (g^j)^1, . . . , (g^j)^(t–1).)
In a memory-starved environment, this G has a compact representation, since its j-th column is uniquely identified by the value g^j. The remaining elements in the column can be easily computed by performing a few multiplications.

Let D be a secret t × t symmetric matrix, and A the n × t matrix defined by:

A := (DG)^T = G^T D^T = G^T D,

where M^T denotes the transpose of a matrix M and the last equality holds because D is symmetric.

Finally, define the n × n matrix

K := AG.

It follows that K^T = (AG)^T = G^T A^T = G^T (G^T D)^T = G^T D^T G = G^T D G = AG = K, that is, K is a symmetric matrix. If the (i, j)-th element of K is denoted by kij, we have kij = kji, and this common value can be used as a pairwise key between the i-th and j-th nodes.

Let the (i, j)-th element of A be denoted by aij for 1 ≤ i ≤ n and 1 ≤ j ≤ t. Also let gij, 1 ≤ i ≤ t and 1 ≤ j ≤ n, denote the (i, j)-th element of G. But then the pairwise key kij = kji is expressed as:

kij = ai1g1j + ai2g2j + · · · + aitgtj.

Thus, the i-th row of A and the j-th column of G suffice for the i-th node to compute kij. Similarly, the j-th row of A and the i-th column of G allow the j-th node to compute kji. In view of this, every node, say, the i-th node, is required to store the i-th row of A and the i-th column of G. If G is as in Equation (B.10), only g^i needs to be stored instead of the full i-th column of G. Thus, the storage of t + 1 elements of 𝔽q (equivalent to t + 1 symmetric keys) suffices.
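
The whole construction can be checked on toy parameters. The sketch below uses an illustrative field size and matrix dimensions of our choosing; primitivity of g is not verified, since only the symmetry of K is being demonstrated.

```python
import random

def blom_demo(q=1009, t=4, n=8, g=3):
    """Toy Blom construction over GF(q), q prime: public Vandermonde-style G,
    secret symmetric D, per-node rows of A = (DG)^T = G^T D."""
    # G is t x n with column j determined by g^(j+1): entries (g^(j+1))^i.
    G = [[pow(g, i * (j + 1), q) for j in range(n)] for i in range(t)]
    # Secret symmetric t x t matrix D.
    D = [[0] * t for _ in range(t)]
    for i in range(t):
        for j in range(i, t):
            D[i][j] = D[j][i] = random.randrange(q)
    # A = G^T D  (n x t); row i of A is stored in node i.
    A = [[sum(G[k][i] * D[k][j] for k in range(t)) % q for j in range(t)]
         for i in range(n)]

    def key(i, j):
        # Node i computes k_ij from its own row of A and node j's public column of G.
        return sum(A[i][l] * G[l][j] for l in range(t)) % q

    return key

key = blom_demo()
assert key(2, 5) == key(5, 2)  # k_ij == k_ji: a common pairwise key
```

Because D is symmetric, K = AG = G^T D G equals its own transpose, so any two nodes derive the same value regardless of the random choice of D.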

During direct key establishment, two physical neighbours exchange their respective columns of G for the computation of the common key. Since G is allowed to be public knowledge, this communication reveals no secret information to the adversary.

Suppose that the adversary gains knowledge of some t′ ≥ t rows of A (say, by capturing nodes). We also assume that the matrix G is completely known to the adversary. The adversary picks any t known rows of A and forms the t × t matrix A′ comprising these rows. But then A′ = G′^T D, where G′ is the t × t submatrix of G consisting of the corresponding t columns. Since any t columns of G are linearly independent, G′^T is invertible, and so the secret matrix D = (G′^T)^(–1)A′ can be easily computed. Conversely, if D is known to the adversary, she can compute A and, in particular, any t′ ≥ t rows of A.

If only t′ < t rows are known to the adversary, then every assignment of values to some t – t′ additional rows of A yields a consistent candidate for the matrix D, from which the remaining rows of A can then be constructed. In other words, D cannot be uniquely recovered from a knowledge of fewer than t rows of A. Guessing is infeasible too, since there is an astronomically large number of choices for the elements of the t – t′ unknown rows of A.

To sum up, the matrix-based key predistribution scheme is completely secure if fewer than t nodes are captured. On the other hand, if t or more nodes are captured, then the system is completely compromised. Thus, the resilience of this scheme against node capture is determined solely by t and is independent of the size n of the network. The parameter t, in turn, is restricted by the memory of a sensor node (a node has to store t + 1 elements of 𝔽q).

In order to overcome this difficulty, Du et al. [79] propose a matrix-pool-based scheme. Here, S matrices A1, A2, . . . , AS are computed from S pairwise different secret matrices D1, D2, . . . , DS. The same G may be used for all these key spaces. Each node is given shares (that is, rows) of s matrices randomly chosen from the pool {A1, A2, . . . , AS}. The resulting details of the matrix-pool-based scheme are quite analogous to those pertaining to the polynomial-pool-based scheme described in the earlier section, and are omitted here.

B.8. Location-aware Key Predistribution

The key predistribution algorithms discussed so far are based on a random deployment model. In practice, the deployment model (like the expected location of each node and the overall geometry of the deployment area) may be known a priori. This knowledge can be effectively exploited to tune the key predistribution algorithms so as to achieve better connectivity and higher resilience against node capture. As an example, consider sensor nodes deployed from airplanes in groups or scattered uniformly from trucks. Since the approximate tracks of these vehicles are planned a priori, the key rings of the nodes can be loaded appropriately to achieve the expected performance enhancements.

Only nodes that are in the physical neighbourhood of one another need to share a pairwise key. Therefore, the basic objective in designing location-aware schemes is to predistribute keys in such a way that two nodes that are expected to remain close in the deployment area are given common pairwise keys, whereas two nodes that are expected to be far away after deployment need not share any pairwise key. The actual deployment locations of the nodes cannot usually be predicted accurately. Nonetheless, an approximate knowledge of the locations can boost the performance of the network considerably. The smaller the errors between the expected and actual locations of the nodes, the better a location-aware scheme is expected to perform.

B.8.1. Closest Pairwise Keys Scheme

Liu and Ning [182] propose a modification of the random pairwise key scheme (Section B.5) based on deployment knowledge. Let there be n sensor nodes in the network with each node capable of storing m cryptographic keys. The expected deployment location of each node is provided to the key set-up server. For each node u in the network, the server determines m other nodes whose expected locations of deployment are closest to that of u and for which pairwise keys with u have not already been established. For every such node v, a new random key kuv is generated. The key-plus-ID combination (kuv, v) is loaded in u’s key ring, whereas the pair (kuv, u) is loaded in v’s key ring.

This natural and simple-minded strategy provides complete security against node capture, since it is a pairwise key distribution scheme. Moreover, there is no limitation on the maximum supportable network size (under the reasonable assumption that there are far fewer than 2^l nodes in the network, where l is the bit length of a cryptographic key, say, 64 or 128). Finally, the incorporation of deployment knowledge increases the connectivity of the network. In order to analyse this gain, we first introduce some formal notation.
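
A minimal sketch of the server-side predistribution follows, assuming expected locations are known. It ignores the scheme's finer points: in this simplification a node's key ring may exceed m keys, because other nodes may also select it as a close neighbour.

```python
import secrets

def predistribute_closest(expected, m):
    """Load each node's key ring with pairwise keys for the m nodes whose
    expected deployment locations are nearest (simplified sketch)."""
    rings = {u: {} for u in expected}
    for u, (ux, uy) in expected.items():
        others = sorted(
            (v for v in expected if v != u),
            key=lambda v: (expected[v][0] - ux) ** 2 + (expected[v][1] - uy) ** 2,
        )
        for v in others[:m]:
            if v not in rings[u]:                # pairwise key not established yet
                k = secrets.token_bytes(16)      # fresh random 128-bit pairwise key
                rings[u][v] = k
                rings[v][u] = k
    return rings

# Hypothetical expected locations: a, b, c cluster together; d is far away.
locs = {'a': (0, 0), 'b': (1, 0), 'c': (0, 1), 'd': (10, 10)}
rings = predistribute_closest(locs, m=2)
assert rings['a']['b'] == rings['b']['a']        # shared pairwise key
assert 'd' not in rings['a']                     # distant node: no key with a
```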

For the sake of simplicity, we assume that the deployment region is two-dimensional, so that every point in that region is expressed by two coordinates x and y. Let u be a sensor node whose expected deployment location is (ux, uy) and whose actual deployment location is (u′x, u′y). This corresponds to a deployment error of eu = √((u′x – ux)² + (u′y – uy)²). The actual location (u′x, u′y) (or, equivalently, the error eu) is modelled as a continuous random variable that can assume values in ℝ². The probability density function fu of (u′x, u′y) characterizes the pattern of deployment error. One possibility is to assume that (u′x, u′y) is uniformly distributed within a circle with centre at (ux, uy) and of radius ∊, called the maximum deployment error. We then have:

Equation B.11

fu(x, y) = 1/(π∊²) if (x – ux)² + (y – uy)² ≤ ∊², and fu(x, y) = 0 otherwise.
An arguably more realistic strategy is to model (u′x, u′y) as a random variable following the two-dimensional normal (Gaussian) distribution with mean (ux, uy) and variance σ². The corresponding density function is:

fu(x, y) = (1/(2πσ²)) exp(–[(x – ux)² + (y – uy)²]/(2σ²)).

Let u and v be two deployed nodes. We assume that each node has a communication range of ρ. We also make the simplifying assumption that the different nodes are deployed independently, that is, (u′x, u′y) and (v′x, v′y) are independent random variables. The probability that u and v lie in the communication ranges of one another can be expressed as a function of the expected locations (ux, uy) and (vx, vy) as:

p(u, v) = ∫C fu(x1, y1)fv(x2, y2) dx1 dy1 dx2 dy2.

Here, the integral is over the region C of ℝ⁴ defined by (x1 – x2)² + (y1 – y2)² ≤ ρ².

Let n′ denote the number of physical neighbours of u (or of any sensor node). We know that u shares pairwise keys with exactly m nodes. We assume that these key neighbours of u are distributed uniformly in a circle centred at u and of radius ρ′. Since the n′ physical neighbours of u populate a disc of radius ρ, the expected value of ρ′ is:

ρ′ = ρ√(m/n′).

Let v be a key neighbour of u. Averaging p(u, v) over the expected location of v, assumed uniform in a disc of radius ρ′ around (ux, uy), the probability that v lies in the physical neighbourhood of u is given by

p(u) = (1/(πρ′²)) ∫C′ p(u, v) dvx dvy,

where C′ is the region (vx – ux)² + (vy – uy)² ≤ ρ′². Therefore, u is expected to have m × p(u) direct neighbours. Since the size of the physical neighbourhood of u is n′, the local connectivity, that is, the probability that u can establish a pairwise key with a physical neighbour, is given by

p′ = m × p(u)/n′.

In general, it is difficult to compute the above integrals. Liu and Ning [182] compute the probability p′ for the density function given by Equation (B.11) and establish that p′ ≈ 1 for small deployment errors, namely ∊ ≤ ρ. As ∊ increases, p′ gradually reduces to the corresponding probability for the random pairwise scheme.
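
Although the integrals are hard in closed form, they are easy to estimate numerically. The sketch below uses Monte Carlo sampling under the uniform-disc error model of Equation (B.11); the parameter values are illustrative.

```python
import random

def sample_actual(expected, eps, rng):
    """Actual location: expected location plus an error uniform in a disc of
    radius eps (rejection sampling inside the disc)."""
    while True:
        dx = rng.uniform(-eps, eps)
        dy = rng.uniform(-eps, eps)
        if dx * dx + dy * dy <= eps * eps:
            return (expected[0] + dx, expected[1] + dy)

def neighbour_probability(u_exp, v_exp, rho, eps, trials=20000, seed=1):
    """Monte Carlo estimate of the probability that u and v land within
    communication range rho of one another."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        ux, uy = sample_actual(u_exp, eps, rng)
        vx, vy = sample_actual(v_exp, eps, rng)
        if (ux - vx) ** 2 + (uy - vy) ** 2 <= rho * rho:
            hits += 1
    return hits / trials

# Nodes expected 30 m apart, range 40 m, maximum deployment error 20 m.
print(neighbour_probability((0, 0), (30, 0), rho=40, eps=20))
```

Increasing rho (or shrinking eps) drives the estimate towards 1, in line with the observation that p′ ≈ 1 for small deployment errors.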

In order to add sensor nodes at a later point of time, the key set-up server again uses deployment knowledge. The key rings of the new nodes are loaded based on the expected deployment locations of these nodes and on the (expected or known) locations of the deployed nodes. Pairwise keys between the new and the deployed nodes are communicated to the deployed nodes over secure channels (routing through uncompromised nodes).

B.8.2. Location-aware Polynomial-pool-based Scheme

Several variants of the closest pairwise keys scheme have been proposed. Liu and Ning themselves propose an extension based on pseudorandom functions [182]. Du et al. propose a variant of the basic (EG) scheme based on a specific model of deployment [80]. We end this section by briefly outlining a location-aware adaptation of the polynomial-pool-based scheme (Section B.6).

For simplicity, let us assume that the deployment region is a rectangular area. This region is partitioned into a 2-dimensional array of rectangular cells. Let the partition consist of R rows and C columns. The cell located at the i-th row and the j-th column is denoted by Ci,j. The neighbours of the cell Ci,j are taken to be the four adjacent cells: Ci–1,j, Ci+1,j, Ci,j–1, Ci,j+1.

The key set-up server first decides on a finite field 𝔽q with q just big enough to accommodate a cryptographic key. The server also chooses R × C random symmetric bivariate polynomials fi,j(X, Y) for 1 ≤ i ≤ R and 1 ≤ j ≤ C. The polynomial fi,j is meant for the cell Ci,j. The degree t (in both X and Y) of each fi,j is so chosen that each sensor node has sufficient memory to store the shares of five such polynomials.

Let u be a node to be deployed and let the expected deployment location of u lie in the cell Ci,j called the home cell of u. The key ring of u is loaded with the shares (evaluated at u) of the five polynomials corresponding to the home cell and its four neighbouring cells. More precisely, u gets the five shares: fi,j(X, u), fi–1,j(X, u), fi+1,j(X, u), fi,j–1(X, u), and fi,j+1(X, u). The set-up server also stores in u’s memory the ID (i, j) of its home cell.

In the direct key establishment phase, each node u broadcasts the ID (i, j) of its home cell (or some messages encrypted by potential pairwise keys). Those physical neighbours whose home cells are either the same as or neighbouring to that of u can establish pairwise keys with u.
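
The share allocation and the direct-key test can be sketched as follows. The cell-indexing conventions are ours; two nodes can establish a direct key whenever their key rings hold a share of at least one common polynomial.

```python
def loaded_cells(i, j):
    """A node with home cell (i, j) holds shares of the polynomials of its
    home cell and the four adjacent cells."""
    return {(i, j), (i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)}

def can_establish_direct_key(cell_u, cell_v):
    """Direct key establishment is possible iff the two key rings contain
    shares of a common polynomial."""
    return bool(loaded_cells(*cell_u) & loaded_cells(*cell_v))

assert can_establish_direct_key((3, 4), (3, 5))      # neighbouring home cells
assert can_establish_direct_key((2, 2), (2, 2))      # same home cell
assert not can_establish_direct_key((3, 4), (7, 9))  # far-apart home cells
```

A full implementation would clip the neighbour indices at the boundary of the R × C array; the sketch ignores that detail.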

An analysis of the performance of this location-aware poly-pool-based scheme can be carried out along similar lines to the closest pairwise scheme. We leave out the details here and refer the reader to Liu and Ning [182].

C. Complexity Theory and Cryptography

C.1 Introduction
C.2 Provably Difficult Computational Problems Are not Suitable
C.3 One-way Functions and the Complexity Class UP

. . . complexity turns out to be most elusive precisely where it would be most welcome.

—C. H. Papadimitriou [229]

Real knowledge is to know the extent of one’s ignorance.

—Confucius

The complex develops out of the simple.

—Colin Wilson

C.1. Introduction

It is worthwhile to ask why public-key cryptography must be based on problems that are only believed to be difficult. Complexity theory suggests concrete examples of provably intractable problems. This appendix provides a brief conceptual explanation of why these provably difficult problems cannot be used for building cryptographic protocols. We may consequently conclude that, at present, we cannot prove any public-key cryptosystem to be secure. That is bad news, but we have to live with it.

Here, we make no attempt to furnish definitions of the formal complexity classes. The excellent books by Papadimitriou [229] and by Sipser [280] can be consulted for that purpose. Here is a list of the complexity classes that we require for our discussion. The relationships between these classes are depicted in Figure C.1. All the containments shown in this figure are conjectured to be proper. With a slight abuse of notation, we identify functional problems with decision problems.

Table C.1. Some complexity classes

Class      Brief description
P          Languages accepted by deterministic polynomial-time Turing machines
NP         Languages accepted by non-deterministic polynomial-time Turing machines
coNP       Complements of languages in NP
UP         Languages accepted by unambiguous polynomial-time Turing machines
PSPACE     Languages accepted by polynomial-space Turing machines
EXPTIME    Languages accepted by deterministic exponential-time Turing machines
EXPSPACE   Languages accepted by exponential-space Turing machines

Figure C.1. Relations between complexity classes


C.2. Provably Difficult Computational Problems Are not Suitable

The P =? NP problem, arguably the deepest unsolved problem in theoretical computer science, may be suspected to have some bearing on public-key cryptography. Under the assumption that P ≠ NP, one may feel tempted to use NP-complete problems for building secure cryptosystems. Unfortunately, this temptation does not prove fruitful. Several cryptosystems based on NP-complete problems have been broken, and that is not really a surprise.

It may be the case that P = NP, and, if so, all NP-complete problems are solvable in polynomial time. One may, therefore, be advised to select problems that lie outside NP, that is, in strictly bigger complexity classes. By the time and space hierarchy theorems, we have P ⊊ EXPTIME and PSPACE ⊊ EXPSPACE. Both EXPTIME and EXPSPACE have complete problems. An EXPTIME-complete problem cannot be solved in polynomial time, whereas an EXPSPACE-complete problem cannot be solved in polynomial space nor, consequently, in polynomial time. How about using these complete problems for designing cryptosystems? The idea may sound interesting, but these provably exponential problems turn out to be even poorer candidates, perhaps irrelevant, for use in cryptography.

Let fe and fd be the encryption and decryption transforms for a public-key cryptosystem. We assume that the set of plaintext messages and the set of ciphertext messages are both finite. (Public-key cryptosystems are like block ciphers in this respect.) Moreover, since a ciphertext c = fe(m, e) is computable in polynomial time, the length of c is bounded by a polynomial in the length of m. An intruder can non-deterministically guess messages m (from the finite space) and check if c = fe(m, e) to validate the correctness of the guess. It, therefore, follows that deciphering a ciphertext message (with no additional information) is a problem in NP. That is the reason why we should not look beyond NP.

However, the full class NP, in particular, the most difficult (that is, complete) problems of NP, may be irrelevant for cryptography, as we argue in the next section. In other words, for building cryptosystems we expect to effectively exploit problems that are believed to be easier than NP-complete. Both the integer factoring and the discrete log problems lie in the class NP ∩ coNP. We have P ⊆ NP ∩ coNP, and it is widely believed that this containment is proper. Also, NP ∩ coNP is not known (nor expected) to have complete problems. Even if P ≠ NP ∩ coNP, the factoring and discrete log problems need not be outside P, since we are unlikely to produce completeness proofs for them. Only historical evidence supports the belief that these two problems are difficult. The situation may change tomorrow. Complexity theory does not offer any formal protection.

Exercise Set C.2

C.1 Prove that the primality testing problem

PRIME := {n ∈ ℕ | n is prime}

is in NP ∩ coNP.

(Remark: The AKS algorithm is a deterministic poly-time primality testing algorithm and therefore PRIME is in P and so trivially in NP ∩ coNP too. It can, however, be independently proved that primes have succinct certificates.)
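
The "succinct certificate" mentioned in the remark can be made concrete with a Pratt-style certificate: a witness g of order p – 1 modulo p, together with the prime factors of p – 1, each certified recursively. A sketch of the verifier follows; the certificate encoding (a dictionary mapping each prime to its witness and factor list) is ours.

```python
def is_certified_prime(p, cert):
    """Recursively check a Pratt certificate.  cert[p] = (g, factors), where
    factors is the list of prime factors of p - 1 with multiplicity.
    g having order exactly p - 1 modulo p proves that p is prime."""
    if p == 2:
        return True
    g, factors = cert[p]
    if pow(g, p - 1, p) != 1:
        return False
    for q in set(factors):
        if pow(g, (p - 1) // q, p) == 1:       # order of g divides (p-1)/q: reject
            return False
        if not is_certified_prime(q, cert):    # each factor is certified in turn
            return False
    prod = 1
    for q in factors:
        prod *= q
    return prod == p - 1                       # factors must multiply to p - 1

# A certificate for 13: 2 is a generator mod 13, and 12 = 2 * 2 * 3.
cert = {13: (2, [2, 2, 3]), 3: (2, [2])}
assert is_certified_prime(13, cert)
```

The certificate has size polynomial in log p, which is exactly what membership of PRIME in NP requires.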

C.2 Consider the decision version of the integer factorization problem: given positive integers n and k, decide whether n has a non-trivial divisor ≤ k. Call this language DIFP.

  1. Prove that DIFP ∈ NP ∩ coNP.

  2. Given a poly-time algorithm for DIFP, design a poly-time algorithm that factors an integer (that is, that solves the functional problem IFP).

C.3 Let G be a finite cyclic multiplicative group with a generator g. Assume that one can compute products in G in polynomial time. Consider the decision version of the discrete log problem in G: given a ∈ G and an integer b ≥ 0, decide whether indg a ≤ b. Call this language DDLP.

Here, indices (indg a) are assumed to lie between 0 and (#G) – 1.

  1. Prove that DDLP ∈ NP ∩ coNP.

  2. Given a poly-time algorithm for DDLP, design a poly-time algorithm that computes indices in G (that is, that solves the functional problem DLP in G).

C.3. One-way Functions and the Complexity Class UP

Any public-key encryption behaves like a one-way function, easy to compute but difficult to invert.

Definition C.1.

Let Σ be an alphabet (a finite set of symbols). One may assume, without loss of generality, that Σ = {0, 1}. Let Σ* denote the set of all strings over Σ. A function f : Σ* → Σ* is called a one-way function, if it satisfies the following properties.

  1. f must be injective, that is, for every β ∈ Σ* the inverse f–1(β), if existent, is unique.

  2. For some real constant k > 0, we have |α|1/k ≤ |f(α)| ≤ |α|k for all α ∈ Σ*. (Here, |α| denotes the length of a string α ∈ Σ*.)

  3. f can be computed in deterministic polynomial time, that is, f ∈ P.

  4. f–1 must not be computable in polynomial time[1], that is, f–1 ∉ P. In view of Property (2), inverting f is a problem in NP. So we require f–1 ∈ NP \ P.

    [1] A stronger (but essential) requirement is that f–1 must not be computable by polynomial-time probabilistic algorithms.

Property (1) ensures unique decryption. Property (2) implies that the length of f(α) is polynomially bounded both above and below by the length of α. Property (3) suggests ease of encryption, whereas Property (4) suggests difficulty of decryption.

We do not know whether there exists a one-way function. The following functions are strongly suspected to be one-way. However, we do not seem to have any clues about how we can prove these functions to be one-way.

Example C.1.
  1. The function that multiplies two primes p, q with p < q is believed to be one-way. Computing its inverse is the RSA integer factoring problem.

  2. The discrete exponentiation function in a finite field 𝔽q, that maps x, 0 ≤ x ≤ q – 2, to g^x for some fixed primitive element g of 𝔽q, is suspected to be one-way. Its inverse is the discrete logarithm function.

  3. The RSA encryption function m ↦ m^e (mod n) for some fixed parameters n, e is alleged to be one-way. Its inverse is RSA decryption.
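
The asymmetry claimed for candidate (2) can be seen even at toy sizes: the forward direction is a single modular exponentiation, while the generic inverse is a search over exponents. The parameters below are illustrative; at realistic sizes (thousands of bits) the search is hopeless.

```python
def dexp(g, x, q):
    """Forward direction: one modular exponentiation (fast even for huge q)."""
    return pow(g, x, q)

def dlog_bruteforce(g, y, q):
    """Inverse direction: exhaustive search over all exponents.  For
    well-chosen groups nothing dramatically better than generic search is known."""
    acc = 1
    for x in range(q - 1):
        if acc == y:
            return x
        acc = (acc * g) % q
    return None

q, g = 1000003, 2       # 1000003 is prime; the base 2 is an illustrative choice
y = dexp(g, 911, q)     # easy
assert dlog_bruteforce(g, y, q) == 911   # feasible only because q is tiny
```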

It is evident that if P = NP, there cannot exist one-way functions. The converse of this is not true, that is, even if P ≠ NP, there may exist no one-way functions.

Definition C.2.

A non-deterministic Turing machine which has at most one accepting branch of computation for every input string is called an unambiguous Turing machine. The class of languages accepted by poly-time unambiguous Turing machines is denoted by UP.

Clearly, P ⊆ UP ⊆ NP. Both the containments are conjectured to be proper. The importance of the class UP stems from the following result:

Theorem C.1.

There exists a one-way function if and only if P ≠ UP.

Therefore, it is the P =? UP question, and not the P =? NP question, that is relevant for cryptography. The class UP is not known (nor expected) to have complete problems. So locating a one-way function may be a difficult task. But at the minimum we are now on the right track.[2] Complexity theory has helped us shift our attention from NP (or bigger classes) to UP.

[2] Well, hopefully!

In order to use a one-way function f for cryptographic purposes, we require additional properties of f. Computing f–1 must be difficult for an intruder, whereas the same computation ought to be easy for the legitimate recipient. Thus, f must support poly-time inversion, provided that some secret piece of information (the trapdoor) is available during the computation of the inverse. A one-way function with a trapdoor is called a trapdoor one-way function.

The first two functions of Example C.1 do not have obvious trapdoors and so cannot be straightaway used for designing cryptosystems. The third function (RSA encryption) has the requisite trapdoor, namely, the decryption exponent d satisfying ed ≡ 1 (mod φ(n)).

The hunt for a theoretical foundation does not end here; it begins. Most of complexity theory deals with worst-case complexities of problems, rather than their average or expected complexities. A one-way function, even if one exists, may be difficult to invert for only a few instances, whereas cryptography demands that the inversion problem be difficult for most instances. A function meeting even this cryptographic demand need not be suitable, since there may be reductions that map hard instances to easy instances. Moreover, the trapdoors themselves may inject vulnerabilities and make room for quick attacks.

There still remains a long way to go!

Exercise Set C.3

C.4 Let f : Σ* → Σ* be a function with the property that f(f(α)) = f(α) for every α ∈ Σ*. Argue that f is not a one-way function.
C.5 Design unambiguous polynomial-time Turing machines for computing the inverses of the functions described in Example C.1.
C.6 Show that if there exists a bijective one-way function, then NP ∩ coNP ≠ P. [H]

D. Hints to Selected Exercises

The greatest thing in family life is to take a hint when a hint is intended and not to take a hint when a hint isn’t intended.

—Robert Frost

Teachers open the door, but you must enter by yourself.

—Chinese Proverb

Imagination grows by exercise, and contrary to common belief, is more powerful in the mature than in the young.

—W. Somerset Maugham

2.11 (a)Apply Theorem 2.3 to the restriction to H of the canonical homomorphism GG/K.
2.11 (b)Apply Theorem 2.3 to the canonical homomorphism G/HG/K, aHaK, .
2.14 (c)Consider the canonical surjection GG/H.
2.17 (a)Let ij and . Then ord g divides both and and so is equal to 1, that is, g = e. Now let hi, and with . But then . Thus #(HiHj) = (#Hi)(#Hj). Generalize this argument to show that #(H1 · · · Hr) = n.
2.18First consider the special case #G = pr for some and . For each , the order ordG g is of the form psg for some sgr. Let s be the maximum of the values sg, . Take any element with ordG h = ps. Then e, h, . . . , hps–1 are all the elements x that satisfy xps = e. But by the choice of s every element satisfies xps = e. Hence we must have s = r. This proves the assertion for the special case. For the general case, use this special case in conjunction with Exercise 2.17.
2.19 (b)Show that , (h1, . . . , hr) ↦ h1 . . . hr, is a group isomorphism.
2.23Use Zorn’s lemma.
2.24 (c)Let be the intersection of all prime ideals of R. First show that . To prove the reverse inclusion take and consider the set S of all non-unit ideals of R such that for all . If f is a non-unit, the set S is non-empty and by Zorn’s lemma has a maximal element, say . Show that is a prime ideal of R.
2.25For , the map RR, bab, is injective and hence surjective by Exercise 2.4.
2.30Apply the isomorphism theorem to the canonical surjection , .
2.33[(1)⇒(2)] Let be an ascending chain of ideals of R. Consider the ideal which is finitely generated by hypothesis.

[(3)⇒(1)] Let be an ideal of R. Consider the set of all finitely generated ideals of R contained in .

2.36Use the pigeon-hole principle: If there are n + 1 pigeons in n holes, then there exists at least one hole containing more than one pigeon.
2.37Consider the integer satisfying 2tn < 2t+1.
2.39 (e)12 ≡ (n – 1)2 (mod n).
2.39 (f)Apply Wilson’s theorem.
2.40Use Fermat’s little theorem.
2.41Use Wilson’s theorem or Euler’s criterion.
2.45Reduce to the case y2 ≡ α (mod p).
2.49 (a)Consider the canonical group homomorphism and the fact that a surjective group homomorphism from a cyclic group G onto G′ implies that G′ is cyclic.
2.49 (b)Let be a primitive element modulo p. The residue class of a in has order k(p – 1) for some . Show that the order of b := p + 1 modulo pe is pe–1. So the order of akb modulo pe is pe–1(p – 1) = φ(pe).
2.50Use the Chinese remainder theorem in conjunction with Exercises 2.20 and 2.49.
2.53Take . The interpolating polynomial is . Use Exercise 2.52 to establish the uniqueness.
2.56 (b) is irreducible in if and only if f(X + 1) is irreducible in .
2.58Use the fundamental theorem of algebra.
2.63Consider the set of all linearly independent subsets of V that contain T. Show that every chain in has an upper bound in . By Zorn’s Lemma, there exists a maximal element . Show that S generates V.
2.64 (b)Use Exercise 2.63.
2.68Let p1, . . . , pn be n distinct primes. Take and ai := a/pi for i = 1, . . . , n.
2.72 (a)If N is the -submodule of generated by ai/bi, i = 1, . . . , n, with gcd(ai, bi) = 1, then for any prime p that does not divide b1 · · · bn we have 1/pN.
2.72 (b)Any two distinct elements of are linearly dependent over . Now use Exercise 2.69.
2.74 (b)Let the conjugates of over F be α1 = α, α2, . . . , αn. Since is injective, it follows from (a) that makes a permutation of α1, . . . , αn. So is surjective.
2.75 (a)Use Exercise 2.61.
2.76 (b)The if part follows from Exercise 2.61. For proving the only if part, take . If the polynomial f(X) := Xpa splits over F, we are done. So suppose that there exists an irreducible divisor of f(X) of degree ≥ 2. By the separability of F, there exist two distinct roots α, β of g(X). Let K := F (α, β). Show that the Frobenius map , , is an endomorphism of K. Also there exists a field isomorphism τ : F (α) → F (β) which fixes F element-wise and takes α ↦ β. But then . Since any field homomorphism is injective, α equals β, a contradiction. Thus no g(X) chosen as above can exist.
2.77 (a)Let be an irreducible polynomial with g(α) = 0 for some . Let β be another root of g. We show that . By Lemma 2.5, there is an isomorphism μ : F(α) → F(β). Clearly, K is the splitting field of f over F(α). Let K′ be the splitting field of μ*(f) over F (β). By Proposition 2.33, KK′. If are the roots of f, then K′ ≅ F (β, γ1, . . . , γd) = K(β). But then KK(β).
2.78 (a)Consider transcendental numbers.
2.78 (b)Let . For , we have , implying that for a, with ab. Now assume for some . Choose a rational number b with . Then , a contradiction. Thus . Similarly .
2.80Use the binomial theorem and induction on n.
2.82Follow the proof of Theorem 2.37.
2.90Example 2.18.
2.91 (b)By the fundamental theorem of Galois theory, # . Now show that are distinct -automorphisms of .
2.92 (a)Assume r > 1. We have the extensions , where is the splitting field of f over and hence over . Consider the minimal polynomial of a root of f over . Conversely, let f be reducible over . Choose an irreducible factor of f with deg h = s < d. Now h has one (and hence all) roots in and, therefore, d|sm.
2.93Use Corollary 2.18.
2.98In each case, the defining polynomial is quadratic in Y (and with coefficients in K[X]). If this polynomial admits a non-trivial factorization, one can reach a contradiction by considering the degrees of X in the coefficients of Y1 and Y0.
2.103For simplicity, consider the case char K ≠ 2, 3. Show that the curves Y2 + Y = X3 and Y2 = X3 + X have j-invariants 0 and 1728 respectively. Finally, if , 1728, then the curve has j-invariant . One must also argue that these are actually elliptic curves, that is, have non-zero discriminants.
2.111Use Theorem 2.51.
2.112 (a)Pair a point with its opposite. This pairing fails for points of orders 1 and 2.
2.112 (c)Consider the elliptic curve E : Y2 = X3 + 3 over . We have , whereas X3 + 3 is irreducible modulo 13.
2.113 (a)Every element of has a unique square root.
2.115 (a)Use Theorem 2.49 or Exercise 2.17.
2.115 (b)Use Theorem 2.50.
2.115 (c)The trace of Frobenius at q is 0 in this case. Now, use Theorem 2.50.
2.123Factor N(G) in .
2.127Let . For each i, write , . But then det , where , δij being the Kronecker delta.
2.128 (b)Use Part (a) and Exercise 2.126(c).
2.128 (c)Let . By Exercise 2.130, is integral over . Let be the ideal generated by in and let and be the ideals of generated respectively by and . Now, use Part (b).
2.133 (b)In a PID, non-zero prime ideals are maximal.
2.137 (a) Since and are maximal, we have , that is, a1 + a2 = 1 for some and . Now use the fact that (a1 + a2)e1 + e2 = 1.
2.137 (b) Use the CRT.
2.138 (a) Since is invertible, for some fractional ideal .
2.140 (a) For , let constitute a complete residue system of modulo . Then also form a complete residue system of modulo .
2.142 (d) Take in Part (b).
2.143 (a) Reduce modulo 4.
2.143 (c) Let divide this gcd. Then divides 2y and . Take norms.
2.144 (b) Look at the expansion of a – 1 in base p. More precisely, let a < pN for some . Then –a = (pN – a) – pN = [(pN – 1) – (a – 1)] – pN.
2.152 (c) First show that .
2.153 Use unique factorization of rationals.
2.154 Show by induction on n that pn+1 divides apn+1 – apn in for all .
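A quick numerical check of this divisibility in Python (the values p = 3 and a = 5 are arbitrary choices for illustration):

```python
# Check that p**(n+1) divides a**(p**(n+1)) - a**(p**n) for small parameters.
p, a = 3, 5
for n in range(4):
    diff = a ** (p ** (n + 1)) - a ** (p ** n)
    assert diff % p ** (n + 1) == 0
```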
2.161 There exists an irreducible polynomial in of every degree .
3.7 The implication is obvious. For the reverse implication, use Proposition 2.5.
3.18 (b) Consider the binary expansion of m.
3.19 If n is a pseudoprime to base a and not a pseudoprime to base b, then n is not a pseudoprime to base ab.
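A minimal Python illustration, using the classic example n = 341 = 11 · 31, which is a Fermat pseudoprime to base 2 (since 2^10 ≡ 1 (mod 341)) but not to base 3:

```python
# (ab)**(n-1) = a**(n-1) * b**(n-1) ≡ b**(n-1) (mod n) when a**(n-1) ≡ 1,
# so a pseudoprime to base a but not to base b cannot be one to base ab.
n = 341
assert pow(2, n - 1, n) == 1   # pseudoprime to base 2
assert pow(3, n - 1, n) != 1   # not a pseudoprime to base 3
assert pow(6, n - 1, n) != 1   # hence not a pseudoprime to base 6 = 2*3
```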
3.20 (a) If p2|n for some , take with ordn(a) = p. If n is square-free, consider a prime divisor p of n and take with and a ≡ 1 (mod n/p).
3.20 (b) If n is an Euler pseudoprime to base a and not an Euler pseudoprime to base b, then n is not an Euler pseudoprime to base ab.
3.21 (a) Let be the prime factorization of n with r and each αi in . Then, . For odd pi, the group is cyclic of order and hence contains an element of order pi – 1.
3.21 (b) ordn(–1) = 2.
3.21 (c) Let vp(n) ≥ 2 for some odd prime p. Construct an element with ordn(a) = p.
3.28 Proceed by induction on i = 1, . . . , r. For 1 ≤ i ≤ r, define νi := n1 · · · ni and let be a solution of the congruences bi ≡ aj (mod nj) for j = 1, . . . , i. If i < r, use the combining formula given in Section 2.5 to find such that bi+1 ≡ bi (mod νi) and bi+1 ≡ ai+1 (mod ni+1).
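The incremental combination can be sketched in Python as follows (a minimal implementation assuming pairwise coprime moduli; the combining identity u·ν + v·n = 1 comes from the extended Euclidean algorithm):

```python
def ext_gcd(a, b):
    # Extended Euclid: returns (g, u, v) with u*a + v*b = g = gcd(a, b).
    if b == 0:
        return a, 1, 0
    g, u, v = ext_gcd(b, a % b)
    return g, v, u - (a // b) * v

def crt(residues, moduli):
    # Incrementally combine x ≡ a_i (mod n_i) for pairwise coprime n_i.
    b, nu = residues[0], moduli[0]
    for a, n in zip(residues[1:], moduli[1:]):
        g, u, v = ext_gcd(nu, n)
        assert g == 1
        # b' ≡ b (mod nu) and b' ≡ a (mod n):
        b = (b * v * n + a * u * nu) % (nu * n)
        nu *= n
    return b

assert crt([2, 3, 2], [3, 5, 7]) == 23   # 23 ≡ 2, 3, 2 (mod 3, 5, 7)
```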
3.31 Apply Newton’s iteration to compute a zero of x2 – n.
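The integer form of this iteration can be sketched in Python (a standard floor-square-root routine; starting the iteration at n guarantees monotone convergence to the floor):

```python
def isqrt(n):
    # Integer square root via Newton's iteration on f(x) = x**2 - n.
    if n < 2:
        return n
    x = n
    while True:
        y = (x + n // x) // 2
        if y >= x:          # first non-decrease: x is the floor root
            return x
        x = y

assert isqrt(144) == 12
assert isqrt(145) == 12
```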
3.32 (a) Apply Newton’s iteration to compute a zero of xk – n.
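A Python sketch of the same idea for general k (the initial value 2^(⌊bitlength/k⌋ + 1) is one convenient upper bound on the root; any starting point above the root works):

```python
def kth_root(n, k):
    # Integer k-th root via Newton's iteration on f(x) = x**k - n.
    if n < 2:
        return n
    x = 1 << (n.bit_length() // k + 1)   # upper bound on the root
    while True:
        y = ((k - 1) * x + n // x ** (k - 1)) // k
        if y >= x:
            return x
        x = y

assert kth_root(2**60, 5) == 2**12
assert kth_root(2**60 - 1, 5) == 2**12 - 1
```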
3.34 (b) The updating d(X) := d(X) – Xisb(X) needs to consider only the non-zero words of b.
3.36 (b) First consider b = 0 and note that the roots of X(q–1)/2 – 1 (resp. X(q–1)/2 + 1) are all the quadratic residues (resp. non-residues) of .
3.36 (c) First consider b = 0.
3.40 For , we have ord(a)|m and for each i = 1, . . . , r the multiplicity vpi (ord(a)) is the smallest of the non-negative integers k satisfying .
3.41 (a) Use the CRT.
3.43 (a) Use the CRT and the fact that for an odd prime r ≡ 3 (mod 4).
4.1 (a) Using the CRT, reduce to the case that n is prime. Then is bijective ⇔ the restriction is bijective. Now, if gcd(a, φ(n)) = 1, the inverse of is given by , where ab ≡ 1 (mod φ(n)). On the other hand, if q is a prime divisor of gcd(a, φ(n)), choose an element with ord(y) = q. But then ya ≡ 1 (mod n), that is, is not injective. This exercise provides the foundation for the RSA cryptosystem.
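The bijectivity criterion is easy to verify numerically. A minimal Python sketch for the toy modulus n = 15 (so φ(n) = 8):

```python
from math import gcd

# The map x -> x**a on the units modulo n is a bijection exactly when
# gcd(a, phi(n)) = 1 -- the fact underlying RSA.
n, phi = 15, 8
units = [x for x in range(1, n) if gcd(x, n) == 1]
for a in range(1, 9):
    image = {pow(x, a, n) for x in units}
    assert (len(image) == len(units)) == (gcd(a, phi) == 1)
```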
4.1 (b) In view of the CRT, reduce to the case n = pα for and α > 1. Then (pα–1)a ≡ 0 (mod n).
4.6 Consider the integral .
4.9 Use the CRT and lifting.
4.10 For proving , let n be an odd composite integer, choose a random and compute a square root x of y2 modulo n. By Exercise 4.9, the probability that x ≡ ±y (mod n) is at most 1/2.
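The factoring step behind this reduction can be seen on a toy example in Python (n = 77 with the congruent squares 9² ≡ 2² chosen by hand for illustration):

```python
from math import gcd

# If x**2 ≡ y**2 (mod n) but x is not ≡ ±y (mod n), then gcd(x - y, n)
# is a non-trivial factor of n.
n, x, y = 77, 9, 2               # 81 ≡ 4 (mod 77)
assert (x * x - y * y) % n == 0
assert x % n not in (y % n, (-y) % n)
d = gcd(x - y, n)
assert 1 < d < n and n % d == 0  # d = 7 splits 77
```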
4.12 (d) Eliminate a from T (a, b, c) using a + b + c = 0. For each fixed c, allow b to vary and use a sieve to find all the values of b for which T (a, b, c) is smooth for the fixed c.
4.13 You may use the prime number theorem and the fact that the sum of the reciprocals of the first t primes asymptotically approaches ln ln t.
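This growth rate (Mertens' theorem) is easy to observe numerically. A Python sketch with a simple sieve (the bound 10^5 and the constant 0.2615, Mertens' constant, are illustrative):

```python
import math

# Sum of reciprocals of the primes up to x tracks ln ln x; the constant
# offset is approximately Mertens' constant 0.2615.
x = 10**5
sieve = bytearray([1]) * (x + 1)
sieve[0:2] = b"\x00\x00"
for p in range(2, int(x**0.5) + 1):
    if sieve[p]:
        sieve[p * p :: p] = bytearray(len(range(p * p, x + 1, p)))
s = sum(1.0 / p for p in range(2, x + 1) if sieve[p])
assert abs(s - math.log(math.log(x)) - 0.2615) < 0.02
```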
4.15 If a < a1 or a > am, then no i exists. So assume that a1 ≤ a ≤ am and let d := ⌊(1 + m)/2⌋. If a = ad, return d, else if a < ad, recursively search a among the elements a1, . . . , ad–1, and if a > ad, recursively search a among the elements ad+1, . . . , am.
4.16 (a) Use Lagrange’s interpolation formula (Exercise 2.53).
4.18 (a) One may precompute the values σi := p rem qi, i = 1, . . . , t. Note that qi|(gα + kp) if and only if ρk,i = 0.
4.19 (a) Use the approximation T (c1, c2) ≈ (c1 + c2)H.
4.21 (c) T (a, b, c) = –b2c(x + cy)b + (zc2x).
4.21 (d) Imitate the second stage of the LSM.
4.23 Let the factor base consist of all irreducible polynomials over of degrees ≤ m together with the polynomials of the form Xk + h(X), , deg h ≤ m. The optimal running time of this algorithm corresponds to .
4.24 (b) is square-free.
4.24 (c) Use the fact Xm – 1 = (Xm/pvp(m) – 1)pvp(m).
4.24 (d) Theorem 2.39.
4.25 (a) Look at the roots of the polynomials on the two sides.
4.25 (c) If ord ω = m, then ord(–ω) = 2m.
4.25 (d) ω, ωq, . . . , ωql–1 are all the roots of the minimal polynomial of ω over .
4.26 (b) Use the Mordell–Weil theorem.
4.26 (c) Use Theorem 4.2.
5.2 (a) Solve the simultaneous congruences x ≡ ci (mod ni), i = 1, . . . , e, and then take the integer e-th root of the solution x, 1 ≤ x ≤ n1 · · · ne.
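This low-exponent attack can be sketched in Python on toy numbers (the pairwise coprime "moduli" 55, 56, 57 and the message m = 9 are illustrative; the point is only that m^3 is smaller than the product of the moduli, so CRT recovers m^3 exactly as an integer):

```python
from functools import reduce

def crt(residues, moduli):
    # Standard CRT combination for pairwise coprime moduli.
    N = reduce(lambda a, b: a * b, moduli)
    x = 0
    for c, n in zip(residues, moduli):
        Ni = N // n
        x += c * Ni * pow(Ni, -1, n)
    return x % N

def icbrt(n):
    # Integer cube root by Newton's iteration, started above the root.
    x = 1 << (n.bit_length() // 3 + 1)
    while True:
        y = (2 * x + n // (x * x)) // 3
        if y >= x:
            return x
        x = y

moduli = [55, 56, 57]                 # pairwise coprime toy moduli
m = 9
cts = [pow(m, 3, n) for n in moduli]  # the same m "encrypted" thrice, e = 3
assert crt(cts, moduli) == m**3
assert icbrt(crt(cts, moduli)) == m   # plaintext recovered
```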
5.2 (b) Append (different) pseudorandom bit strings to m before encryption. This process is often referred to as salting.
5.3 (a) In view of the Chinese remainder theorem, reduce to the case n = pr for some and .
5.4 ue1 + ve2 = 1 for some u, .
5.6 If the same session key is used to generate the ciphertext pairs (r1, s1) and (r2, s2) on two plaintext messages m1 and m2, then m1/m2 = s1/s2.
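A Python sketch of this session-key-reuse weakness for ElGamal encryption, where a ciphertext has the form (g^k, m·y^k) (all the toy parameters below are illustrative):

```python
p, g = 467, 2            # toy ElGamal parameters (p prime)
x = 127                  # private key
y = pow(g, x, p)         # public key
k = 213                  # the SAME session key used twice
m1, m2 = 100, 300
s1 = m1 * pow(y, k, p) % p   # second components of the two ciphertexts
s2 = m2 * pow(y, k, p) % p
# The common mask y**k cancels: s1/s2 = m1/m2 (mod p).
assert s1 * pow(s2, -1, p) % p == m1 * pow(m2, -1, p) % p
```

Thus an eavesdropper who learns one plaintext immediately obtains the other.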
5.7 (c) Let x = (xl–1 . . . x1x0)2. Define x′ := (xl–1 . . . x2x1)2 and y′ := gx′ (mod p). Then, y ≡ y′2gx0 (mod p). Since x0 is easily computable, y′ can be obtained by computing a square root of y modulo p. Argue that a call of the oracle helps us choose the correct square root y′ of y. Now, use recursion.
5.8 Let g′ be any randomly chosen generator of , where q := ph. One computes for i = 0, 1, . . . , p – 1. We then have the equality of the sets

modulo q – 1, where l := indg′ g. But then for each i we have a (yet unknown) j such that . Show that trying all possibilities for i and j one can effectively recover l and hence g = g′l and hence π.

5.9 Let g′, and l be as in Exercise 5.8. Now, we have the equality of the sets

modulo q – 1.

5.11 (mod β) are polynomials with small coefficients.
5.15 (a)If Alice generates the signatures (M1, s1) and (M2, s2) on two messages M1 and M2, then her signature on a message M with H(M) ≡ H(M1)H(M2) (mod n) is s1s2 (mod n). Thus, without knowing the private key of Alice, an intruder can generate a valid signature (M, s1s2) of Alice, provided that such an M can be computed. Of course, here the intruder has little control over the message M. The PKC standards form RSA Laboratories add some redundancy to the hash function output before signing. The product of two hash values with redundancy is, in general, expected not to have the redundancy. This increases the security of the scheme against existential forgeries beyond that provided by the first pre-image resistance of the underlying hash function.
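A Python sketch of the multiplicative forgery on "textbook" RSA, with the message itself playing the role of its hash and the well-known toy key n = 3233 = 61 · 53, e = 17, d = 2753:

```python
n, e, d = 3233, 17, 2753
m1, m2 = 42, 101
s1, s2 = pow(m1, d, n), pow(m2, d, n)   # Alice's two legitimate signatures
m_forged = m1 * m2 % n
s_forged = s1 * s2 % n                  # forged without knowing d
assert pow(s_forged, e, n) == m_forged  # verifies as a valid signature
```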
5.15 (b) For any , a valid signature is (M, s), where H(M) ≡ s2 (mod n).
5.15 (c) Choose random integers u, v with gcd(v, n) = 1 and take d′ := u + dv. Of course, d and hence d′ are unknown to Carol, but she can compute s = gd′ = gu(gd)v and t ≡ –H(s)v–1 (mod n). But then (M, s, t) is a valid ElGamal signature on a message M for which H(M) ≡ tu (mod n).
5.16 Obviously, c itself could be a possible choice, but that is not random and Bob might refuse to sign c. Carol should hide c by cre (mod n) for some randomly chosen r known to her.
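The full blind-signature round trip can be sketched in Python with the same toy key n = 3233 = 61 · 53, e = 17, d = 2753 (the message c and blinder r are arbitrary, with r coprime to n):

```python
n, e, d = 3233, 17, 2753
c, r = 1234, 99                      # message to be signed, random blinder
blinded = c * pow(r, e, n) % n       # what Bob actually sees and signs
signed_blinded = pow(blinded, d, n)  # Bob's signature on the blinded value
s = signed_blinded * pow(r, -1, n) % n   # Carol removes the blinder
assert s == pow(c, d, n)             # = Bob's signature on c itself
```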
5.23 (a) by the CRT.
5.25 (a) Replace the random challenge of the verifier by the hash value of the string obtained by concatenating the message to be signed with the witness.
5.26 (d) Bob finds a random b′ with and sends a := (b′)2 (mod n) to Alice. But then Alice’s response b yields a non-trivial factor gcd(b – b′, n) of n.
7.5 (mod n) and mse (mod n).
7.9 (a) Use Exercise 2.44(b).
7.9 (c) Again use Exercise 2.44(b).
7.9 (d) Use Part (c) in conjunction with the CRT, and separately consider the three cases v2(p – 1) = v2(q – 1), v2(p – 1) > v2(q – 1) and v2(p – 1) < v2(q – 1).
A.2 for all X, J. One does not have to look at the S-boxes for proving this.
A.9 (c) For i = 0, 1, 2, 3, 4Nr, 4Nr + 1, 4Nr + 2, 4Nr + 3, take . For other values of i, take .
A.14 (b) Let DL(X) := XdCL(1/X) = a0 + a1X + a2X2 + · · · + ad–1Xd–1 + Xd. Consider the -algebra , where x := X + 〈DL(X)〉. The -linear transformation λx : A → A defined by g(x) ↦ xg(x) has the matrix ΔL with respect to the polynomial basis (1, x, . . . , xd–1). If is the minimal polynomial of λx, then [f(λx)](1) = f(x) = 0. Now, use the fact that 1, x, . . . , xd–1 are linearly independent over .
A.16 (b) [only if] Take σ ≠ 00 · · · 01. Since σ is non-zero, si = 1 for some . Construct an LFSR with d – 1 stages initialized to s0s1 · · · sd–2 to generate σ.
A.19 Suppose that we want to compute a second pre-image for H2(x). If , any is a second pre-image for H2(x). If , computing a second pre-image for H2(x) is equivalent to computing a second pre-image for H(x). The density of the (finite) set S is 0 in the (infinite) set of all bit strings. Thus, H2 is second pre-image resistant. On the other hand, for any two distinct x, we have a collision (x, x′) for H2.
A.20 Collision resistance of H implies that of H3. On the other hand, for a positive fraction (half) of the (n + 1)-bit strings y, it is easy to compute a pre-image of y under H3.
A.21 If y is a square root of a modulo m, then so is m – y too.
A.22 Use the birthday paradox (Exercise 2.172).
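A small Python experiment showing the birthday effect on a hash truncated to 16 bits (truncated SHA-256 here stands in for any hash; by the birthday paradox a collision typically appears after roughly 2^8 trials, far fewer than the 2^16 needed to invert a given value):

```python
import hashlib

def h16(msg):
    # A deliberately weak 16-bit "hash": the first two bytes of SHA-256.
    return hashlib.sha256(msg).digest()[:2]

seen = {}
i = 0
while True:                      # guaranteed to stop within 2**16 + 1 steps
    digest = h16(str(i).encode())
    if digest in seen:
        x1, x2 = seen[digest], i
        break
    seen[digest] = i
    i += 1

assert x1 != x2 and h16(str(x1).encode()) == h16(str(x2).encode())
```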
A.23 (d) Let L := F1(L′) and R := F1(R′) with both R and R′ non-zero. Then, F1(LR) = F2(L′R′).
A.25 Let h(i) denote the column vector of dimension 160 having the bits of H(i) as its elements and m(i) the column vector of dimension 512 + 160 = 672 having the bits of M(i) and of H(i) as its elements. Show that the modified design of SHA-1 leads to the relation h(i) ≡ Am(i–1) + c (mod 2) for some constant 160 × 672 matrix A over and for some constant vector c. So what then?
C.6 For α, , call α ≤ β if and only if |α| < |β| or |α| = |β| and α is lexicographically smaller than β. This ≤ produces a well-ordering of Σ*. For a one-way function f, look at the language for some with γ ≤ β}.

 

References

If you steal from one author, it’s plagiarism; if you steal from many, it’s research.

—Wilson Mizner

Literature is the question minus the answer.

—Roland Barthes

Everything that can be invented, has been invented.

—Charles H. Duell, 1899

[1] Adkins, W. A. and S. H. Weintraub (1992). Algebra: An Approach via Module Theory. Graduate Texts in Mathematics, 136. New York: Springer.

[2] Adleman, L. M., J. DeMarrais and M.-D. A. Huang (1994). “A Subexponential Algorithm for Discrete Logarithms over the Rational Subgroup of the Jacobians of Large Genus Hyperelliptic Curves over Finite Fields”, Algorithmic Number Theory—ANTS-I, Lecture Notes in Computer Science, 877. pp. 28–40. Berlin/Heidelberg: Springer.

[3] Adleman, L. M. and M.-D. A. Huang (1992). “Primality Testing and Two Dimensional Abelian Varieties over Finite Fields”, Lecture Notes in Mathematics, 1512. Berlin: Springer.

[4] Adleman, L. M., C. Pomerance and R. S. Rumely (1983). “On Distinguishing Prime Numbers from Composite Numbers”, Annals of Mathematics, 117: 173–206.

[5] Agrawal, M., N. Kayal and N. Saxena (2002), “Primes Is in P” [online document]. Available at http://www.cse.iitk.ac.in/users/manindra/algebra/primality_v6.pdf (October 2008).

[6] * Ahlfors, L. V. (1966). Complex Analysis. New York: McGraw-Hill.

[7] * Aho, A. V., J. E. Hopcroft and J. D. Ullman (1974). The Design and Analysis of Computer Algorithms. Reading, Massachusetts: Addison-Wesley.

[8] * Aho, A. V., J. E. Hopcroft and J. D. Ullman (1983). Data Structures and Algorithms. Reading, Massachusetts: Addison-Wesley.

[9] Aigner, M. and E. Oswald (2007), “Power Analysis Tutorial” [online document]. Available at http://www.iaik.tugraz.at/content/research/implementation_attacks/introduction_to_impa/dpa_tutorial.pdf (October 2008).

[10] Akkar, M.-L., R. Bevan, P. Dischamp and D. Moyart (2000). “Power Analysis, What Is Now Possible”, Advances in Cryptology—ASIACRYPT 2000, Lecture Notes in Computer Science, 1976. pp. 489–502. Berlin/Heidelberg: Springer.

[11] Anderson, R. and M. Kuhn (1997). “Low Cost Attacks on Tamper Resistant Devices”, Security Protocols—5th International Workshop, Lecture Notes in Computer Science, 1361. pp. 125–136. Berlin/Heidelberg: Springer.

[12] * Apostol, T. M. (1976). Introduction to Analytic Number Theory. Undergraduate Texts in Mathematics. New York: Springer.

[13] Arnold, V. I. (1999). “Polymathematics: Is Mathematics a Single Science or a Set of Arts?”, in V. Arnold, M. Atiyah, P. Lax and B. Mazur (eds.), Mathematics: Frontiers and Perspectives, pp. 403–416. Providence, Rhode Island: American Mathematical Society.

[14] Atiyah, M. F. and I. G. MacDonald (1969). Introduction to Commutative Algebra. Reading, Massachusetts: Addison-Wesley.

[15] Aumüller, C., P. Bier, W. Fischer, P. Hofreiter and J.-P. Seifert (2002), “Fault Attacks on RSA with CRT: Concrete Results and Practical Countermeasures” [online document]. Available at http://eprint.iacr.org/2002/073 (October 2008).

[16] Balasubramanian, R. and N. Koblitz (1998). “The Improbability that an Elliptic Curve has Subexponential Discrete Log Problem under the Menezes–Okamoto–Vanstone Algorithm”, Journal of Cryptology, 11: 141–145.

[17] Bao, F., R. H. Deng, Y. Han, A. B. Jeng, A. D. Narasimhalu, T.-H. Ngair (1997). “Breaking Public Key Cryptosystems on Tamper Resistant Devices in the Presence of Transient Faults”, Security Protocols—5th International Workshop, Lecture Notes in Computer Science, 1361. pp. 115–124. Berlin/Heidelberg: Springer.

[18] Bellare, M. and P. Rogaway (1995). “Optimal Asymmetric Encryption—How to Encrypt with RSA”, Advances in Cryptology—EUROCRYPT ’94, Lecture Notes in Computer Science, 950. pp. 92–111. Berlin/Heidelberg: Springer. A revised version is available at http://www-cse.ucsd.edu/users/mihir/papers/oaep.html (October 2008).

[19] Bellare, M. and P. Rogaway (1996). “The Exact Security of Digital Signatures: How to Sign with RSA and Rabin”, Advances in Cryptology—EUROCRYPT ’96, Lecture Notes in Computer Science, 1070. pp. 399–416. Berlin/Heidelberg: Springer. A revised version is available at http://www-cse.ucsd.edu/users/mihir/papers/exactsigs.html (October 2008).

[20] Bennett, C. H. and G. Brassard (1984). “Quantum Cryptography: Public Key Distribution and Coin Tossing”, pp. 175–179. Proceedings of the IEEE International Conference on Computers, Systems and Signal Processing, Bangalore, India, December.

[21] Berlekamp, E. R. (1968). Algebraic Coding Theory. New York: McGraw-Hill.

[22] Biham, E. and A. Shamir (1997). “Differential Fault Analysis of Secret Key Cryptosystems”, Advances in Cryptology—CRYPTO ’97, Lecture Notes in Computer Science, 1294. pp. 513–528. Berlin/Heidelberg: Springer.

[23] Blake, I. F., R. Fuji-Hara, R. C. Mullin and S. A. Vanstone (1984). “Computing Logarithms in Finite Fields of Characteristic Two”, SIAM Journal on Algebraic and Discrete Methods, 5: 276–285.

[24] Blake, I. F., G. Seroussi and N. P. Smart (1999). Elliptic Curves in Cryptography. Cambridge: Cambridge University Press.

[25] Blom, R. (1985). “An Optimal Class of Symmetric Key Generation Systems”, Advances in Cryptology—EUROCRYPT ’84, Lecture Notes in Computer Science, 209. pp. 335–338. Berlin/Heidelberg: Springer.

[26] Blum, L., M. Blum, and M. Shub (1986). “A Simple Unpredictable Pseudo-Random Number Generator”, SIAM Journal on Computing, 15: 364–383.

[27] Blum, M. and S. Goldwasser (1985). “An Efficient Probabilistic Public Key Encryption Scheme Which Hides All Partial Information”, Advances in Cryptology—CRYPTO ’84, Lecture Notes in Computer Science, 196. pp. 289–299. Berlin/Heidelberg: Springer.

[28] Blundo, C., A. De Santis, A. Herzberg, S. Kutten, U. Vaccaro and M. Yung (1993). “Perfectly-Secure Key Distribution for Dynamic Conferences”, Advances in Cryptology—CRYPTO ’92, Lecture Notes in Computer Science, 740. pp. 471–486. Berlin/Heidelberg: Springer.

[29] Boneh, D. (1999). “Twenty Years of Attacks on the RSA Cryptosystem”, Notices of the American Mathematical Society, 46 (2): 203–213.

[30] Boneh, D., R. A. DeMillo and R. J. Lipton (1997). “On the Importance of Checking Cryptographic Protocols for Faults”, Advances in Cryptology—EUROCRYPT ’97, Lecture Notes in Computer Science, 1233. pp. 37–51. Berlin/Heidelberg: Springer.

[31] Boneh, D., R. A. DeMillo and R. J. Lipton (2001). “On the Importance of Eliminating Errors in Cryptographic Computations”, Journal of Cryptology, 14 (2): 101–119.

[32] Boneh, D. and G. Durfee (1999). “Cryptanalysis of RSA with Private Key d Less Than N0.292”, Advances in Cryptology—EUROCRYPT ’99, Lecture Notes in Computer Science, 1592. pp. 1–11. Berlin/Heidelberg: Springer.

[33] Boneh, D., G. Durfee and Y. Frankel (1998). “Exposing an RSA Private Key Given a Small Fraction of Its Bits”, Advances in Cryptology—ASIACRYPT ’98, Lecture Notes in Computer Science, 1514. pp. 25–34. Berlin/Heidelberg: Springer.

[34] Boneh, D. and M. K. Franklin (2001). “Identity-based Encryption from the Weil Pairing”, Advances in Cryptology—CRYPTO 2001, Lecture Notes in Computer Science, 2139. pp. 213–229. Berlin/Heidelberg: Springer.

[35] Boneh, D. and M. K. Franklin (2003). “Identity-based Encryption from the Weil Pairing”, SIAM Journal on Computing, 32 (3): 586–615.

[36] Bressoud, D. M. (1989). Factorization and Primality Testing. Undergraduate Texts in Mathematics. New York: Springer.

[37] * Buchmann, J. A. (2004). Introduction to Cryptography. Undergraduate Texts in Mathematics. New York: Springer.

[38] Buchmann, J. A. et al. (2004), “The Number Field Cryptography Project” [online document]. Available at http://www.informatik.tu-darmstadt.de/TI/Forschung/nfc.html (October 2008).

[39] Buchmann, J. A. and S. Hamdy (2001). “A Survey on IQ Cryptography”. Technical report TI-4/01, TU Darmstadt, Fachbereich Informatik.

[40] Buchmann, J. A. and D. Weber (2000). “Discrete Logarithms: Recent Progress”, in J. Buchmann, T. Hoeholdt, H. Stichtenoth and H. Tapia-Recillas (eds.), Coding Theory, Cryptography and Related Areas, pp. 42–56. Proceedings of an International Conference on Coding Theory, Cryptography and Related Areas, Guanajuato, Mexico, April 1998.

[41] Buhler, J., H. W. Lenstra and C. Pomerance (1993). “Factoring Integers with the Number Field Sieve”, in A. K. Lenstra and H. W. Lenstra (eds.), The Development of the Number Field Sieve, Lecture Notes in Mathematics, 1554. pp. 50–94. Berlin: Springer.

[42] * Burton, D. M. (1998). Elementary Number Theory, 4th ed. New York: McGraw-Hill.

[43] Cantor, D. G. (1994). “On the Analogue of Division Polynomials for Hyperelliptic Curves”, Journal für die reine und angewandte Mathematik, 447: 91–145.

[44] Chan, H., A. Perrig and D. Song (2003). “Random Key Predistribution Schemes for Sensor Networks”, pp. 197–213. Proceedings of the 24th IEEE Symposium on Research in Security and Privacy, Berkeley, California, 11–14 May.

[45] Chari, S., C. S. Jutla, J. R. Rao, and P. Rohatgi (1999). “Towards Sound Approaches to Counteract Power-Analysis Attacks”, Advances in Cryptology—CRYPTO ’99, Lecture Notes in Computer Science, 1666. pp. 398–412. Berlin/Heidelberg: Springer.

[46] Charlap, L. S. and R. Coley (1990). “An Elementary Introduction to Elliptic Curves II”, CCR Expository Report 34.

[47] Charlap, L. S. and D. P. Robbins (1988). “An Elementary Introduction to Elliptic Curves”, CRD Expository Report 31.

[48] Chaum, D. (1983). “Blind Signatures for Untraceable Payments”, Advances in Cryptology—CRYPTO ’82. pp. 199–203. New York: Plenum Press.

[49] Chaum, D. (1985). “Security Without Identification: Transaction System to Make Big Brother Obsolete”, Communications of the ACM, 28 (10): 1030–1044.

[50] Chaum, D. (1989). “Privacy Protected Payments: Unconditional Payer and/or Payee Untraceability”, Smart Card 2000: The Future of IC Cards, pp. 69–93. Amsterdam: North-Holland.

[51] Chaum, D. (1990). “Zero-Knowledge Undeniable Signatures”, Advances in Cryptology—CRYPTO ’90, Lecture Notes in Computer Science, 473. pp. 458–464. Berlin/Heidelberg: Springer.

[52] Chaum, D. and H. van Antwerpen (1989). “Undeniable Signatures”, Advances in Cryptology—CRYPTO ’89, Lecture Notes in Computer Science, 435. pp. 212–217. Berlin/Heidelberg: Springer.

[53] Chaum, D., E. van Heijst and B. Pfitzmann (1991). “Cryptographically Strong Undeniable Signatures, Unconditionally Secure for the Signer”, Advances in Cryptology—CRYPTO ’91, Lecture Notes in Computer Science, 576. pp. 470–484. Berlin/Heidelberg: Springer.

[54] Chor, B. and R. L. Rivest (1988). “A Knapsack Type Cryptosystem Based on Arithmetic in Finite Fields”, IEEE Transactions on Information Theory, 34: 901–909.

[55] Clavier, C., J.-S. Coron and N. Dabbous (2000). “Differential Power Analysis in the Presence of Hardware Countermeasures”, Cryptographic Hardware and Embedded Systems—CHES 2000, Lecture Notes in Computer Science, 1965. pp. 252–263. Berlin/Heidelberg: Springer.

[56] Cohen, H. (1993). A Course in Computational Algebraic Number Theory. Graduate Texts in Mathematics, 138. New York: Springer.

[57] Coppersmith, D. (1984). “Fast Evaluation of Logarithms in Fields of Characteristic Two”, IEEE Transactions on Information Theory, 30: 587–594.

[58] Coppersmith, D. (1994). “Solving Homogeneous Equations over GF[2] via Block Wiedemann Algorithm”, Mathematics of Computation, 62: 333–350.

[59] Coppersmith, D., A. M. Odlyzko and R. Schroeppel (1986). “Discrete Logarithms in GF (p)”, Algorithmica, 1: 1–15.

[60] Coppersmith, D. and S. Winograd (1982). “On the Asymptotic Complexity of Matrix Multiplication”, SIAM Journal on Computing, 11 (3): 472–492.

[61] * Cormen, T. H., C. E. Leiserson, R. L. Rivest and C. Stein (2001). Introduction to Algorithms, 2nd ed. Cambridge, Massachusetts: MIT Press.

[62] Coron, J.-S. (1999). “Resistance Against Differential Power Analysis for Elliptic Curve Cryptosystems”, Cryptographic Hardware and Embedded Systems—CHES 1999, Lecture Notes in Computer Science, 1965. pp. 292–302. Berlin/Heidelberg: Springer.

[63] Coron, J.-S., L. Goubin (2000). “On Boolean and Arithmetic Masking Against Differential Power Analysis”, Cryptographic Hardware and Embedded Systems—CHES 2000, Lecture Notes in Computer Science, 1965. pp. 231–237. Berlin/Heidelberg: Springer.

[64] Coster, M. J., A. Joux, B. A. LaMacchia, A. M. Odlyzko, C. P. Schnorr and J. Stern (1992). “Improved Low-Density Subset Sum Algorithms”, Computational Complexity, 2: 111–128.

[65] Coster, M. J., B. A. LaMacchia, A. M. Odlyzko and C. P. Schnorr (1991). “An Improved Low-Density Subset Sum Algorithm”, Advances in Cryptology—EUROCRYPT ’91, Lecture Notes in Computer Science, 547. pp. 54–67. Berlin/Heidelberg: Springer.

[66] Courtois, N. (2003). “Fast Algebraic Attacks on Stream Ciphers with Linear Feedback”, Advances in Cryptology—CRYPTO 2003, Lecture Notes in Computer Science, 2729. pp. 177–194. Berlin/Heidelberg: Springer.

[67] Courtois, N. and W. Meier (2003). “Algebraic Attacks on Stream Ciphers with Linear Feedback”, Advances in Cryptology—EUROCRYPT 2003, Lecture Notes in Computer Science, 2656. pp. 345–359. Berlin/Heidelberg: Springer.

[68] Courtois, N. and J. Pieprzyk (2003). “Cryptanalysis of Block Ciphers with Overdefined Systems of Equations”, Advances in Cryptology—ASIACRYPT 2002, Lecture Notes in Computer Science, 2501. pp. 267–287. Berlin/Heidelberg: Springer.

[69] Crandall, R. and C. Pomerance (2001). Prime Numbers: A Computational Perspective. New York: Springer.

[70] Crépeau, C. and A. Slakmon (2003). “Simple Backdoors for RSA Key Generation”, Topics in Cryptology—CT-RSA 2003, Lecture Notes in Computer Science, 2612. pp. 403–416. Berlin/Heidelberg: Springer.

[71] Daemen, J. and V. Rijmen (2002). The Design of Rijndael: AES—The Advanced Encryption Standard. New York: Springer.

[72] Das, A. (1999). Galois Field Computations: Implementation of a Library and a Study of the Discrete Logarithm Problem [dissertation]. Bangalore, India: Indian Institute of Science.

[73] Das, A. and C. E. Veni Madhavan (1999). “Performance Comparison of Linear Sieve and Cubic Sieve Algorithms for Discrete Logarithms over Prime Fields”, Algorithms and Computation, ISAAC ’99, Lecture Notes in Computer Science, 1741. pp. 295–306. Berlin/Heidelberg: Springer.

[74] * Delfs, H. and H. Knebl (2007). Introduction to Cryptography: Principles and Applications, 2nd ed. Berlin and New York: Springer.

[75] Deutsch, D. (1985). “Quantum Theory, the Church-Turing Principle and the Universal Quantum Computer”. Proceedings of the Royal Society of London, Series A, 400. pp. 97–117.

[76] Deutsch, D. (1998). The Fabric of Reality: The Science of Parallel Universes—and Its Implications. London: Penguin.

[77] Dhem, J.-F., F. Koeune, P.-A. Leroux, P. Mestré, J.-J. Quisquater and J.-L. Willems (2000). “A Practical Implementation of the Timing Attack”, in J.-J. Quisquater and B. Schneier (eds.), Smart Card: Research and Applications, Lecture Notes in Computer Science, 1820. Proceedings of the Third Working Conference on Smart Card Research and Advanced Applications—CARDIS ’98, Louvain-la-Neuve, Belgium, 14–16 September 1998. Springer.

[78] Diffie, W. and M. Hellman (1976). “New Directions in Cryptography”, IEEE Transactions on Information Theory, 22: 644–654.

[79] Du, W., J. Deng, Y. S. Han and P. K. Varshney (2003). “Establishing Pairwise Keys in Distributed Sensor Networks”, pp. 42–51. Proceedings of the 10th ACM Conference on Computer and Communication Security, Washington D.C., USA, 27–30 October.

[80] Du, W., J. Deng, Y. S. Han, S. Chen and P. K. Varshney (2004). “A Key Management Scheme for Wireless Sensor Networks Using Deployment Knowledge”. Proceedings of IEEE INFOCOM 2004, Hong Kong, 7–11 March.

[81] * Dummit, D. and R. Foote (2004). Abstract Algebra, 3rd ed. Somerset, New Jersey: John Wiley & Sons.

[82] Durfee, G. and P. Q. Nguyen (2000). “Cryptanalysis of the RSA Schemes with Short Secret Exponent from Asiacrypt ’99”, Advances in Cryptology—ASIACRYPT 2000, Lecture Notes in Computer Science, 1976. pp. 30–44. Berlin/Heidelberg: Springer.

[83] Dusart, P. (1999). “The kth Prime Is Greater than k(ln k+ln ln k–1) for k > 2”, Mathematics of Computation, 68: 411–415.

[84] ElGamal, T. (1985). “A Public-Key Cryptosystem and a Signature Scheme Based on Discrete Logarithms”, IEEE Transactions on Information Theory, 31: 469–472.

[85] Elkies, N. D. (1998). “Elliptic and Modular Curves over Finite Fields and Related Computational Issues”, AMS/IP Studies in Advanced Mathematics, 7: 21–76.

[86] Enge, A. (1999). “Computing Discrete Logarithms in High-Genus Hyperelliptic Jacobians in Provably Subexponential Time”. Technical report CORR 99-04, University of Waterloo, Canada.

[87] Enge, A. and P. Gaudry (2002). “A General Framework for Subexponential Discrete Logarithm Algorithms”, Acta Arithmetica, 102 (1): 83–103.

[88] Eschenauer, L. and V. D. Gligor (2002). “A Key-Management Scheme for Distributed Sensor Networks”. Proceedings of the 9th ACM Conference on Computer and Communication Security, pp. 41–47. Washington D.C., USA, 18–22 November.

[89] * Esmonde, J. and M. Ram Murty (1999). Problems in Algebraic Number Theory. Graduate Texts in Mathematics, 190. New York: Springer.

[90] Fiat, A. and A. Shamir (1987). “How to Prove Yourself: Practical Solutions to Identification and Signature Problems”, Advances in Cryptology—CRYPTO ’86, Lecture Notes in Computer Science, 263. pp. 186–194. Berlin/Heidelberg: Springer.

[91] Feige, U., A. Fiat, and A. Shamir (1988). “Zero-Knowledge Proofs of Identity”, Journal of Cryptology, 1: 77–94.

[92] * Feller, W. (1966). Introduction to Probability Theory and Its Applications, 3rd ed. New York: John Wiley & Sons.

[93] Ferguson, N., J. Kelsey, S. Lucks, B. Schneier, M. Stay, D. Wagner and D. Whiting (2000). “Improved Cryptanalysis of Rijndael”, Fast Software Encryption—FSE 2000, Lecture Notes in Computer Science, 1978. pp. 213–230. Berlin/Heidelberg: Springer.

[94] Fouquet, M., P. Gaudry and R. Harley (2000). “An Extension of Satoh’s Algorithm and Its Implementation”, Journal of the Ramanujan Mathematical Society, 15: 281–318.

[95] Fouquet, M., P. Gaudry and R. Harley (2001). “Finding Secure Curves with the Satoh-FGH Algorithm and an Early-Abort Strategy”, Advances in Cryptology—EUROCRYPT 2001, Lecture Notes in Computer Science, 2045. Berlin/Heidelberg: Springer.

[96] * Fraleigh, J. B. (1998). A First Course in Abstract Algebra, 6th ed. Reading, Massachusetts: Addison-Wesley.

[97] Fujisaki, E., T. Kobayashi, H. Morita, H. Oguro, T. Okamoto, S. Okazaki, D. Pointcheval and S. Uchiyama (1999). “EPOC: Efficient Probabilistic Public-Key Encryption”, contribution to IEEE P1363a.

[98] Fujisaki, E., T. Okamoto, D. Pointcheval, J. Stern (2001). “RSA-OAEP is Secure under the RSA Assumption”, Advances in Cryptology—CRYPTO 2001, Lecture Notes in Computer Science, 2139. pp. 260–274. Berlin/Heidelberg: Springer.

[99] Fulton, W. (1969). Algebraic Curves. Mathematics Lecture Notes Series. New York: W. A. Benjamin.

[100] Galbraith, S. D. (2003). “Weil Descent of Jacobians”, Discrete Applied Mathematics, 128 (1): 165–180.

[101] Galbraith, S. D., F. Hess and N. P. Smart (2002). “Extending the GHS Weil Descent Attack”, Advances in Cryptology—EUROCRYPT 2002, Lecture Notes in Computer Science, 2332. pp. 29–44. Berlin/Heidelberg: Springer.

[102] Galbraith, S. D., W. Mao, and K. G. Paterson (2002). “RSA-based Undeniable Signatures for General Moduli”, Topics in Cryptology—CT-RSA 2002, Lecture Notes in Computer Science, 2271. pp. 200–217. Berlin/Heidelberg: Springer.

[103] Gathen, J. von zur and J. Gerhard (1999). Modern Computer Algebra. Cambridge: Cambridge University Press.

[104] Gathen, J. von zur and V. Shoup (1992). “Computing Frobenius Maps and Factoring Polynomials”, pp. 97–105. Proceedings of the 24th Annual ACM Symposium on Theory of Computing, Victoria, British Columbia, Canada.

[105] Gaudry, P. (2000). “An Algorithm for Solving the Discrete Log Problem on Hyperelliptic Curves”, Advances in Cryptology—EUROCRYPT 2000, Lecture Notes in Computer Science, 1807. pp. 19–34. Berlin/Heidelberg: Springer.

[106] Gaudry, P. and R. Harley (2000). “Counting Points on Hyperelliptic Curves over Finite Fields”, Algorithmic Number Theory—ANTS-IV, Lecture Notes in Computer Science, 1838. pp. 313–332. Berlin/Heidelberg: Springer.

[107] Gaudry, P., F. Hess and N. P. Smart (2002). “Constructive and Destructive Facets of Weil Descent on Elliptic Curves”, Journal of Cryptology, 15 (1): 19–46.

[108] Geddes, K. O., S. R. Czapor and G. Labahn (1992). Algorithms for Computer Algebra. Boston: Kluwer Academic Publishers.

[109] Gennaro, R., H. Krawczyk and T. Rabin (2000). “RSA-based Undeniable Signatures”, Journal of Cryptology, 13 (4): 397–416.

[110] Gentry, C., J. Jonsson, M. Szydlo and J. Stern (2001). “Cryptanalysis of the NTRU Signature Scheme (NSS) from Eurocrypt 2001”, Advances in Cryptology—ASIACRYPT 2001, Lecture Notes in Computer Science, 2248. pp. 1–20. Berlin/Heidelberg: Springer.

[111] Gentry, C. and M. Szydlo (2002). “Cryptanalysis of the NTRU Signature Scheme”, Advances in Cryptology—EUROCRYPT ’02, Lecture Notes in Computer Science, 2332. pp. 299–320. Berlin/Heidelberg: Springer.

[112] Gilbert, H. and M. Minier (2000). “A Collision Attack on Seven Rounds of Rijndael”, pp. 230–241. Proceedings of the 3rd AES Conference, NIST, New York, April 2000.

[113] * Goldreich, O. (2001). Foundations of Cryptography, Volume 1: Basic Tools. Cambridge: Cambridge University Press.

[114] * Goldreich, O. (2004). Foundations of Cryptography, Volume 2: Basic Applications. Cambridge: Cambridge University Press.

[115] Goldreich, O., S. Goldwasser and S. Halevi (1997). “Public-key Cryptosystems from Lattice Reduction Problems”, Advances in Cryptology—CRYPTO ’97, Lecture Notes in Computer Science, 1294. pp. 112–131. Berlin/Heidelberg: Springer.

[116] Goldwasser, S. and J. Kilian (1986). “Almost All Primes Can Be Quickly Certified”, pp. 316–329. Proceedings of the 18th Annual ACM Symposium on Theory of Computing, Berkeley, California.

[117] Goldwasser, S. and S. Micali (1984). “Probabilistic Encryption”, Journal of Computer and Systems Sciences, 28: 270–299.

[118] Gordon, D. M. (1985). “Strong Primes are Easy to Find”, Advances in Cryptology—EUROCRYPT ’84, Lecture Notes in Computer Science, 209. pp. 216–223. Berlin/Heidelberg: Springer.

[119] Gordon, D. M. (1993). “Discrete Logarithms in GF (p) Using the Number Field Sieve”, SIAM Journal on Discrete Mathematics, 6: 124–138.

[120] Gordon, D. M. and K. S. McCurley (1992). “Massively Parallel Computation of Discrete Logarithms”, Advances in Cryptology—CRYPTO ’92, Lecture Notes in Computer Science, 740. pp. 312–323. Berlin/Heidelberg: Springer.

[121] Grinstead, C. M. and J. L. Snell (1997). Introduction to Probability, 2nd revised ed. Providence, Rhode Island: American Mathematical Society. The book is also available at http://www.dartmouth.edu/~chance/book.html (October 2008).

[122] Guillou, L. C. and J.-J. Quisquater (1988). “A Practical Zero-Knowledge Protocol Fitted to Security Microprocessor Minimizing Both Transmission and Memory”, Advances in Cryptology—EUROCRYPT ’88, Lecture Notes in Computer Science, 330. pp. 123–128. Berlin/Heidelberg: Springer.

[123] Hankerson, D., A. J. Menezes and S. Vanstone (2004). Guide to Elliptic Curve Cryptography. New York: Springer.

[124] Hartshorne, R. (1977). Algebraic Geometry. Graduate Texts in Mathematics, 52. New York, Heidelberg and Berlin: Springer.

[125] * Herstein, I. N. (1975). Topics in Algebra. New York: John Wiley & Sons.

[126] Hess, F., G. Seroussi and N. P. Smart (2000). “Two Topics in Hyperelliptic Cryptography”. HP Labs technical report HPL-2000-118.

[127] * Hoffman, K. and R. Kunze (1971). Linear Algebra. Englewood Cliffs, New Jersey: Prentice-Hall.

[128] Hoffstein, J., N. Howgrave-Graham, J. Pipher, J. H. Silverman and W. White (2003). “NTRUSign: Digital Signatures Using the NTRU Lattice”, Topics in Cryptology—CT-RSA 2003, Lecture Notes in Computer Science, 2612. pp. 122–140. Berlin/Heidelberg: Springer.

[129] Hoffstein, J., N. Howgrave-Graham, J. Pipher, J. H. Silverman and W. White (2005). “Performance Improvements and a Baseline Parameter Generation Algorithm for NTRUSign”, Workshop on Mathematical Problems and Techniques in Cryptology, Barcelona, Spain, June 2005. Also available at http://www.ntru.com/cryptolab/articles.htm (October 2008).

[130] Hoffstein, J., J. Pipher and J. H. Silverman (1998). “NTRU: A Ring-Based Public Key Cryptosystem”, Algorithmic Number Theory—ANTS-III, Lecture Notes in Computer Science, 1423. pp. 267–288. Berlin/Heidelberg: Springer.

[131] Hoffstein, J., J. Pipher and J. H. Silverman (2001). “NSS: An NTRU Lattice-Based Signature Scheme”, Advances in Cryptology—EUROCRYPT 2001, Lecture Notes in Computer Science, 2045. pp. 211–228. Berlin/Heidelberg: Springer.

[132] Horster, P., M. Michels and H. Petersen (1994). “Meta-ElGamal Signature Schemes”. Technical report TR-94-5-F, Department of Computer Science, Technische Universität Chemnitz-Zwickau.

[133] * Hungerford, T. W. (1974). Algebra, 5th ed. Graduate Texts in Mathematics, 73. Berlin: Springer.

[134] IEEE (2008), “Standard Specifications for Public-Key Cryptography” [online document]. Available at http://grouper.ieee.org/groups/1363/index.html (October 2008).

[135] IETF (2008), “The Internet Engineering Task Force” [online document]. Available at http://www.ietf.org/ (October 2008).

[136] * Ireland, K. and M. Rosen (1990). A Classical Introduction to Modern Number Theory. Graduate Texts in Mathematics, 84. New York: Springer.

[137] Izu, T., B. Möller and T. Takagi (2002). “Improved Elliptic Curve Multiplication Methods Resistant Against Side Channel Attacks”, Progress in Cryptology—INDOCRYPT 2002, Lecture Notes in Computer Science, 2551. pp. 296–313. Berlin/Heidelberg: Springer.

[138] Izu, T. and T. Takagi (2002). “A Fast Parallel Elliptic Curve Multiplication Resistant Against Side Channel Attacks”, Public Key Cryptography—PKC 2002, Lecture Notes in Computer Science, 2274. pp. 280–296. Berlin/Heidelberg: Springer. An improved version of this paper is published as the technical report CORR 2002-03 of the Centre for Applied Cryptographic Research, University of Waterloo, Canada, and is available at http://www.cacr.math.uwaterloo.ca/ (October 2008).

[139] Jacobson, M. J., N. Koblitz, J. H. Silverman, A. Stein and E. Teske (2000). “Analysis of the Xedni Calculus Attack”, Designs, Codes and Cryptography, 20: 41–64.

[140] Janusz, G. J. (1995). Algebraic Number Fields. Providence, Rhode Island: American Mathematical Society.

[141] Johnson, D. and A. Menezes (1999). “The Elliptic Curve Digital Signature Algorithm (ECDSA)”. Technical report CORR 99-34, Department of Combinatorics and Optimization, University of Waterloo, Canada. Also published in International Journal on Information Security (2001), 1: 36–63.

[142] Joye, M., A. K. Lenstra and J.-J. Quisquater (1999). “Chinese Remaindering Based Cryptosystems in the Presence of Faults”, Journal of Cryptology, 12 (4): 241–246.

[143] Kaltofen, E. and V. Shoup (1995). “Subquadratic-Time Factoring of Polynomials over Finite Fields”, pp. 398–406. Proceedings of the 27th Annual ACM Symposium on Theory of Computing, Las Vegas, Nevada.

[144] Kampkötter, W. (1991). Explizite Gleichungen für Jacobische Varietäten hyperelliptischer Kurven [dissertation]. Essen: Gesamthochschule.

[145] Katz, J. and Y. Lindell (2007). Introduction to Modern Cryptography. Boca Raton, Florida; London and New York: CRC Press.

[146] Kaye, P. and C. Zalka (2004), “Optimized Quantum Implementation of Elliptic Curve Arithmetic over Binary Fields” [online document]. Available at http://arxiv.org/abs/quant-ph/0407095 (October 2008).

[147] * Knuth, D. E. (1997). The Art of Computer Programming, Volume 2: Seminumerical Algorithms. Reading, Massachusetts: Addison-Wesley.

[148] Ko, K. H., S. J. Lee, J. H. Cheon, J. W. Han, J. S. Kang and C. S. Park (2000). “New Public-Key Cryptosystem Using Braid Groups”, Advances in Cryptology—CRYPTO 2000, Lecture Notes in Computer Science, 1880. pp. 166–183. Berlin/Heidelberg: Springer.

[149] Koblitz, N. (1984). p-adic Numbers, p-adic Analysis, and Zeta-Functions, 2nd ed. Graduate Texts in Mathematics, 58. New York, Heidelberg and Berlin: Springer.

[150] Koblitz, N. (1987). “Elliptic Curve Cryptosystems”, Mathematics of Computation, 48: 203–209.

[151] Koblitz, N. (1989). “Hyperelliptic Cryptosystems”, Journal of Cryptology, 1: 139–150.

[152] Koblitz, N. (1993). Introduction to Elliptic Curves and Modular Forms, 2nd ed. Graduate Texts in Mathematics, 97. Berlin: Springer.

[153] * Koblitz, N. (1994). A Course in Number Theory and Cryptography, 2nd ed. New York: Springer.

[154] Koblitz, N. (1998). Algebraic Aspects of Cryptography. New York: Springer.

[155] Kocher, P. C. (1996). “Timing Attacks on Implementations of Diffie-Hellman, RSA, DSS, and Other Systems”, Advances in Cryptology—CRYPTO ’96, Lecture Notes in Computer Science, 1109. pp. 104–113. Berlin/Heidelberg: Springer.

[156] Kocher, P. C., J. Jaffe and B. Jun (1999). “Differential Power Analysis”, Advances in Cryptology—CRYPTO ’99, Lecture Notes in Computer Science, 1666. pp. 388–397. Berlin/Heidelberg: Springer.

[157] Lagarias, J. C. and A. M. Odlyzko (1985). “Solving Low-Density Subset Sum Problems”, Journal of the ACM, 32: 229–246.

[158] LaMacchia, B. A. and A. M. Odlyzko (1991a). “Computation of Discrete Logarithms in Prime Fields”, Designs, Codes and Cryptography, 1: 46–62.

[159] LaMacchia, B. A. and A. M. Odlyzko (1991b). “Solving Large Sparse Linear Systems over Finite Fields”, Advances in Cryptology—CRYPTO ’90, Lecture Notes in Computer Science, 537. pp. 109–133. Berlin/Heidelberg: Springer.

[160] Lang, S. (1994). Algebraic Number Theory. Graduate Texts in Mathematics, 110. New York: Springer.

[161] Law, L., A. Menezes, A. Qu, J. Solinas and S. Vanstone (1998). “An Efficient Protocol for Authenticated Key Agreement”. Technical report CORR 98-05, Department of Combinatorics and Optimization, University of Waterloo, Canada.

[162] Lehmer, D. H. and R. E. Powers (1931). “On Factoring Large Numbers”, Bulletin of the AMS, 37: 770–776.

[163] Lenstra, A. K., E. Tromer, A. Shamir, W. Kortsmit, B. Dodson, J. Hughes and P. Leyland (2003). “Factoring Estimates for a 1024-Bit RSA Modulus”, Advances in Cryptology—ASIACRYPT 2003, Lecture Notes in Computer Science, 2894. pp. 55–74. Berlin/Heidelberg: Springer.

[164] Lenstra, A. K. and H. W. Lenstra (1990). “Algorithms in Number Theory”, in J. van Leeuwen (ed.), Handbook of Theoretical Computer Science, Volume A, pp. 675–715, Amsterdam: Elsevier.

[165] Lenstra, A. K. and H. W. Lenstra (ed.) (1993). The Development of the Number Field Sieve. Lecture Notes in Mathematics, 1554. Berlin: Springer.

[166] Lenstra, A. K., H. W. Lenstra and L. Lovász (1982). “Factoring Polynomials with Rational Coefficients”, Mathematische Annalen, 261: 515–534.

[167] Lenstra, A. K., H. W. Lenstra, M. S. Manasse and J. M. Pollard (1990). “The Number Field Sieve”, pp. 564–572. Proceedings of the 22nd Annual ACM Symposium on Theory of Computing, Baltimore, Maryland, USA, 13–17 May.

[168] Lenstra, A. K. and A. Shamir (2000). “Analysis and Optimization of the TWINKLE Factoring Device”, Advances in Cryptology—EUROCRYPT 2000, Lecture Notes in Computer Science, 1807. pp. 35–52. Berlin/Heidelberg: Springer.

[169] Lenstra, A. K., A. Shamir, J. Tomlinson and E. Tromer (2002). “Analysis of Bernstein’s Factorization Circuit”, Advances in Cryptology—ASIACRYPT 2002, Lecture Notes in Computer Science, 2501. pp. 1–26. Berlin/Heidelberg: Springer.

[170] Lenstra, A. K. and E. R. Verheul (2000a). “The XTR Public Key System”, Advances in Cryptology—CRYPTO 2000, Lecture Notes in Computer Science, 1880. pp. 1–20. Berlin/Heidelberg: Springer.

[171] Lenstra, A. K. and E. R. Verheul (2000b). “Key Improvements to XTR”, Advances in Cryptology—ASIACRYPT 2000, Lecture Notes in Computer Science, 1976. pp. 220–233. Berlin/Heidelberg: Springer.

[172] Lenstra, A. K. and E. R. Verheul (2001a). “An Overview of the XTR Public Key System”, pp. 151–180. Proceedings of the Public Key Cryptography and Computational Number Theory Conference, Warsaw, Poland, 2000. Berlin: Walter de Gruyter.

[173] Lenstra, A. K. and E. R. Verheul (2001b). “Fast Irreducibility and Subgroup Membership Testing in XTR”, Public Key Cryptography—PKC 2001, Lecture Notes in Computer Science, 1992. pp. 73–86. Berlin/Heidelberg: Springer.

[174] Lenstra, H. W. (1987). “Factoring Integers with Elliptic Curves”, Annals of Mathematics, 126: 649–673.

[175] Lenstra, H. W. and C. Pomerance (2005), “Primality Testing with Gaussian Periods” [online document]. Available at http://www.math.dartmouth.edu/~carlp/PDF/complexity12.pdf (October 2008).

[176] Lercier, R. (1997). “Finding Good Random Elliptic Curves for Cryptosystems Defined over GF(2^n)”, Advances in Cryptology—EUROCRYPT ’97, Lecture Notes in Computer Science, 1233. pp. 379–392. Berlin/Heidelberg: Springer.

[177] Lercier, R. and D. Lubicz (2003). “Counting Points on Elliptic Curves over Finite Fields of Small Characteristic in Quasi Quadratic Time”, Advances in Cryptology—EUROCRYPT 2003, Lecture Notes in Computer Science, 2656. pp. 360–373. Berlin/Heidelberg: Springer.

[178] Libert, B. and J.-J. Quisquater (2003), “New Identity Based Signcryption Schemes from Pairings” [online document]. Available at http://eprint.iacr.org/2003/023/ (October 2008).

[179] Lidl, R. and H. Niederreiter (1984). Finite Fields, Encyclopedia of Mathematics and Its Applications, 20. Cambridge: Cambridge University Press.

[180] Lidl, R. and H. Niederreiter (1994). Introduction to Finite Fields and Their Applications. Cambridge: Cambridge University Press.

[181] Liu, D. and P. Ning (2003a). “Establishing Pairwise Keys in Distributed Sensor Networks”, pp. 52–61. Proceedings of the 10th ACM Conference on Computer and Communication Security, Washington D.C., USA, October 2003.

[182] Liu, D. and P. Ning (2003b). “Location-Based Pairwise Key Establishments for Static Sensor Networks”, pp. 72–82. Proceedings of the 1st ACM Workshop on Security in Ad Hoc and Sensor Networks, Fairfax, Virginia, 31 October 2003.

[183] Liu, D., P. Ning and R. Li (2005). “Establishing Pairwise Keys in Distributed Sensor Networks”, ACM Transactions on Information and System Security, 8 (1): 41–77.

[184] Lucks, S. (2000). “Attacking Seven Rounds of Rijndael Under 192-bit and 256-bit Keys”, pp. 215–229. Proceedings of the 3rd Advanced Encryption Standard Candidate Conference, New York, April 2000.

[185] Malone-Lee, J. (2002), “Identity-Based Signcryption” [online document]. Available at http://eprint.iacr.org/2002/098/ (October 2008).

[186] Mao, W. (2001). “New Zero-Knowledge Undeniable Signatures—Forgery of Signature Equivalent to Factorisation”. Hewlett-Packard technical report HPL-2001-36.

[187] Mao, W. and K. G. Paterson (2000). “Convertible Undeniable Standard RSA Signatures”. Hewlett-Packard technical report HPL-2000-148.

[188] Matsumoto, T. and H. Imai (1988). “Public Quadratic Polynomial-Tuples for Efficient Signature-Verification and Message-Encryption”, Advances in Cryptology—EUROCRYPT ’88, Lecture Notes in Computer Science, 330. pp. 419–453. Berlin/Heidelberg: Springer.

[189] McCurley, K. S. (1990). “The Discrete Logarithm Problem”, in C. Pomerance and S. Goldwasser (eds.), Cryptology and Computational Number Theory: American Mathematical Society Short Course, Boulder, Colorado, 6–7 August 1989. Proceedings of Symposia in Applied Mathematics, 42. pp. 49–74. Providence, Rhode Island: American Mathematical Society.

[190] McEliece, R. J. (1978). “A Public-Key Cryptosystem Based on Algebraic Coding Theory”. DSN progress report 42–44, Jet Propulsion Laboratory, California Institute of Technology, pp. 114–116.

[191] Menezes, A. J. (ed.) (1993). Applications of Finite Fields. Boston: Kluwer Academic Publishers.

[192] Menezes, A. J. (1993). Elliptic Curve Public Key Cryptosystems. The Springer International Series in Engineering and Computer Science, 234. Springer. Available at http://books.google.co.in/books?id=bIb54ShKS68C (October 2008).

[193] Menezes, A. J., T. Okamoto and S. Vanstone (1993). “Reducing Elliptic Curve Logarithms to Logarithms in a Finite Field”, IEEE Transactions on Information Theory, 39: 1639–1646.

[194] Menezes, A. J., P. van Oorschot and S. Vanstone (1997). Handbook of Applied Cryptography. Boca Raton, Florida: CRC Press.

[195] Menezes, A. J., Y. Wu and R. Zuccherato (1996). “An Elementary Introduction to Hyperelliptic Curves”. CACR technical report CORR 96-19, University of Waterloo, Canada.

[196] Merkle, R. C. and M. E. Hellman (1978). “Hiding Information and Signatures in Trapdoor Knapsacks”, IEEE Transactions on Information Theory, 24 (5): 525–530.

[197] Mermin, N. D. (2003). “From Cbits to Qbits: Teaching Computer Scientists Quantum Mechanics”, American Journal of Physics, 71: 23–30.

[198] Mermin, N. D. (2006), “Phys481-681-CS483 Lecture Notes and Homework Assignments” [online document]. Available at http://people.ccmr.cornell.edu/~mermin/qcomp/CS483.html (October 2008).

[199] Messerges, T. S. (2000). “Securing the AES Finalists Against Power Analysis Attacks”, Fast Software Encryption—FSE 2000, Lecture Notes in Computer Science, 1978. pp. 150–164. Berlin/Heidelberg: Springer.

[200] Messerges, T. S., E. A. Dabbish and R. H. Sloan (1999). “Power Analysis Attacks of Modular Exponentiation in Smartcards”, Cryptographic Hardware and Embedded Systems—CHES 1999, Lecture Notes in Computer Science, 1717. pp. 144–157. Berlin/Heidelberg: Springer.

[201] Messerges, T. S., E. A. Dabbish and R. H. Sloan (2002). “Examining Smart-Card Security Under the Threat of Power Analysis Attacks”, IEEE Transactions on Computers, 51 (4): 541–552.

[202] Michels, M. and M. Stadler (1997). “Efficient Convertible Undeniable Signature Schemes”, pp. 231–244. Proceedings of the 4th International Workshop on Selected Areas in Cryptography, Ottawa, Canada.

[203] Mignotte, M. (1992). Mathematics for Computer Algebra. New York: Springer.

[204] Miller, G. L. (1976). “Riemann’s Hypothesis and Tests for Primality”, Journal of Computer and System Sciences, 13: 300–317.

[205] Miller, V. (1986). “Use of Elliptic Curves in Cryptography”, Advances in Cryptology—CRYPTO ’85, Lecture Notes in Computer Science, 218. pp. 417–426. Berlin/Heidelberg: Springer.

[206] Möller, B. (2001). “Securing Elliptic Curve Point Multiplication Against Side-Channel Attacks”, Information Security Conference, Lecture Notes in Computer Science, 2200. pp. 324–334. Berlin/Heidelberg: Springer.

[207] Mollin, R. A. (1998). Fundamental Number Theory with Applications. Boca Raton, Florida: Chapman & Hall/CRC.

[208] Mollin, R. A. (1999). Algebraic Number Theory. Boca Raton, Florida: Chapman & Hall/CRC.

[209] Mollin, R. A. (2001). An Introduction to Cryptography. Boca Raton, Florida: Chapman & Hall/CRC.

[210] Montgomery, P. L. (1985). “Modular Multiplication Without Trial Division”, Mathematics of Computation, 44: 519–521.

[211] Montgomery, P. L. (1994). “A Survey of Modern Integer Factorization Algorithms”, CWI Quarterly, 7 (4): 337–366.

[212] Montgomery, P. L. (1995). “A Block Lanczos Algorithm for Finding Dependencies over GF(2)”, Advances in Cryptology—EUROCRYPT ’95, Lecture Notes in Computer Science, 921. pp. 106–120. Berlin/Heidelberg: Springer.

[213] Morrison, M. A. and J. Brillhart (1975). “A Method of Factoring and a Factorization of F7”, Mathematics of Computation, 29: 183–205.

[214] * Motwani, R. and P. Raghavan (1995). Randomized Algorithms. Cambridge: Cambridge University Press.

[215] Muir, J. A. (2001). Techniques of Side Channel Cryptanalysis [dissertation]. Canada: University of Waterloo. Available at http://www.uwspace.uwaterloo.ca/bitstream/10012/1098/1/jamuir2001.pdf (October 2008).

[216] Neukirch, J. (1999). Algebraic Number Theory. Berlin and Heidelberg: Springer.

[217] Nguyen, P. Q. (2006), “A Note on the Security of NTRUSign” [online document]. Available at http://eprint.iacr.org/2006/387 (October 2008).

[218] * Nielsen, M. A. and I. L. Chuang (2000). Quantum Computation and Quantum Information. Cambridge: Cambridge University Press.

[219] NIST (2001), “Advanced Encryption Standard” [online document]. Available at http://csrc.nist.gov/publications/fips/fips197/fips-197.pdf (October 2008).

[220] NIST (2006), “Digital Signature Standard (DSS)” [online document]. Available at http://csrc.nist.gov/publications/drafts/fips_186-3/Draft-FIPS-186-3%20_March2006.pdf (October 2008).

[221] NIST (2007a), “Federal Information Processing Standards” [online document]. Available at http://csrc.nist.gov/publications/PubsFIPS.html (October 2008).

[222] NIST (2007b), “Secure Hash Standard (SHS)” [online document]. Available at http://csrc.nist.gov/publications/drafts/fips_180-3/draft_fips-180-3_June-08-2007.pdf (October 2008).

[223] Nyberg, K. and R. A. Rueppel (1993). “A New Signature Scheme Based on the DSA Giving Message Recovery”, pp. 58–61. Proceedings of the 1st ACM Conference on Computer and Communications Security, Fairfax, Virginia, 3–5 November.

[224] Nyberg, K. and R. A. Rueppel (1995). “Message Recovery for Signature Schemes Based on the Discrete Logarithm Problem”, Advances in Cryptology—EUROCRYPT ’94, Lecture Notes in Computer Science, 950. pp. 182–193. Berlin/Heidelberg: Springer.

[225] Odlyzko, A. M. (1985). “Discrete Logarithms and Their Cryptographic Significance”, Advances in Cryptology—EUROCRYPT ’84, Lecture Notes in Computer Science, 209. pp. 224–314. Berlin/Heidelberg: Springer.

[226] Odlyzko, A. M. (2000). “Discrete Logarithms: The Past and the Future”, Designs, Codes and Cryptography, 19: 129–145.

[227] Okamoto, T. (1992). “Provably Secure and Practical Identification Schemes and Corresponding Signature Schemes”, Advances in Cryptology—CRYPTO ’92, Lecture Notes in Computer Science, 740. pp. 31–53. Berlin/Heidelberg: Springer.

[228] Okamoto, T., E. Fujisaki and H. Morita (1998). “TSH-ESIGN: Efficient Digital Signature Scheme Using Trisection Size Hash”, submission to IEEE P1363a.

[229] Papadimitriou, C. H. (1994). Computational Complexity. Reading, Massachusetts: Addison-Wesley.

[230] Park, S., T. Kim, Y. An and D. Won (1995). “A Provably Entrusted Undeniable Signature”, pp. 644–648. IEEE Singapore International Conference on Network/International Conference on Information Engineering (SICON/ICIE ’95).

[231] Patarin, J. (1995). “Cryptanalysis of the Matsumoto and Imai Public Key Scheme of Eurocrypt’88”, Advances in Cryptology—CRYPTO ’95, Lecture Notes in Computer Science, 963. pp. 248–261. Berlin/Heidelberg: Springer.

[232] Patarin, J. (1996). “Hidden Fields Equations (HFE) and Isomorphisms of Polynomials (IP): Two New Families of Asymmetric Algorithms”, Advances in Cryptology—EUROCRYPT ’96, Lecture Notes in Computer Science, 1070. pp. 33–48. Berlin/Heidelberg: Springer.

[233] Pirsig, R. M. (1974). Zen and the Art of Motorcycle Maintenance: An Inquiry into Values. London: Bodley Head.

[234] Pohlig, S. and M. Hellman (1978). “An Improved Algorithm for Computing Logarithms over GF (p) and its Cryptographic Significance”, IEEE Transactions on Information Theory, 24: 106–110.

[235] Pohst, M. and H. Zassenhaus (1989). Algorithmic Algebraic Number Theory, Encyclopaedia of Mathematics and Its Applications, 30. Cambridge: Cambridge University Press.

[236] Pointcheval, D. and J. Stern (1996). “Provably Secure Blind Signature Schemes”, Advances in Cryptology—ASIACRYPT ’96, Lecture Notes in Computer Science, 1163. pp. 252–265. Berlin/Heidelberg: Springer.

[237] Pointcheval, D. and J. Stern (2000). “Security Arguments for Digital Signatures and Blind Signatures”, Journal of Cryptology, 13 (3): 361–396.

[238] Pollard, J. M. (1974). “Theorems on Factorization and Primality Testing”, Proceedings of the Cambridge Philosophical Society, 76 (2): 521–528.

[239] Pollard, J. M. (1975). “A Monte Carlo Method for Factorization”, BIT, 15 (3): 331–334.

[240] Pollard, J. M. (1993). “Factoring with Cubic Integers”, in A. K. Lenstra and H. W. Lenstra (eds.), The Development of the Number Field Sieve, Lecture Notes in Mathematics, 1554. pp. 4–10. Berlin: Springer.

[241] Pomerance, C. (1985). “The Quadratic Sieve Factoring Algorithm”, Advances in Cryptology—EUROCRYPT ’84, Lecture Notes in Computer Science, 209. pp. 169–182. Berlin/Heidelberg: Springer.

[242] Pomerance, C. (2008). “Elementary Thoughts on Discrete Logarithms”, pp. 385–396. in J. P. Buhler and P. Stevenhagen (eds.), Surveys in Algorithmic Number Theory, Publications of the Research Institute for Mathematical Sciences, 44. New York: Cambridge University Press.

[243] Preskill, J. (1998). “Quantum Computing: Pro and Con”, Proceedings of the Royal Society of London, A454: 469–486.

[244] Preskill, J. (2007), “Course Information for Quantum Computation” [online document]. Available at http://theory.caltech.edu/people/preskill/ph219/ (October 2008).

[245] Proos, J. and C. Zalka (2004), “Shor’s Discrete Logarithm Quantum Algorithm for Elliptic Curves” [online document]. Available at http://arxiv.org/abs/quant-ph/0301141 (October 2008).

[246] Rabin, M. O. (1979). “Digitalized Signatures and Public-Key Functions as Intractable as Factorization”. Technical report MIT/LCS/TR-212, MIT Laboratory for Computer Science, Massachusetts.

[247] Rabin, M. O. (1980a). “Probabilistic Algorithms in Finite Fields”, SIAM Journal on Computing, 9: 273–280.

[248] Rabin, M. O. (1980b). “Probabilistic Algorithm for Testing Primality”, Journal of Number Theory, 12: 128–138.

[249] Ram Murty, M. (2001). Problems in Analytic Number Theory. New York: Springer.

[250] Raymond, J.-F. and A. Stiglic (2000), “Security Issues in the Diffie-Hellman Key Agreement Protocol” [online document]. Available at http://crypto.cs.mcgill.ca/~stiglic/Papers/dhfull.pdf (October 2008).

[251] Ribenboim, P. (2001). Classical Theory of Algebraic Numbers. Universitext. New York: Springer.

[252] Rivest, R. L., A. Shamir, and L. M. Adleman (1978). “A Method for Obtaining Digital Signatures and Public-Key Cryptosystems”, Communications of the ACM, 21 (2): 120–126.

[253] Rosser, J. and L. Schoenfeld (1962). “Approximate Formulas for Some Functions of Prime Numbers”, Illinois Journal of Mathematics, 6: 64–94.

[254] RSA Security Inc. (2008), “Public-Key Cryptography Standards” [online document]. Available at http://www.rsa.com/rsalabs/node.asp?id=2124 (October 2008).

[255] Sakurai, J. J. (1994). Modern Quantum Mechanics. Revised by San-Fu Tuan, Reading, Massachusetts: Addison-Wesley.

[256] Satoh, T. (2000). “The Canonical Lift of an Ordinary Elliptic Curve over a Finite Field and Its Point Counting”, Journal of the Ramanujan Mathematical Society, 15: 247–270.

[257] Satoh, T. and K. Araki (1998). “Fermat Quotients and the Polynomial Time Discrete Log Algorithm for Anomalous Elliptic Curves”, Commentarii Mathematici Universitatis Sancti Pauli, 47: 81–92.

[258] Schiff, L. I. (1968). Quantum Mechanics, 3rd ed. New York: McGraw-Hill.

[259] Schindler, W., F. Koeune and J.-J. Quisquater (2001). “Unleashing the Full Power of Timing Attack”. Technical report CG-2001/3, Université Catholique de Louvain, Belgium. Available at http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.23.6622.

[260] Schirokauer, O. (1993). “Discrete Logarithms and Local Units”, Philosophical Transactions of the Royal Society of London, Series A, 345: 409–423.

[261] Schirokauer, O., D. Weber, and T. Denny (1996). “Discrete Logarithms: The Effectiveness of the Index Calculus Method”, Algorithmic Number Theory—ANTS-II, Lecture Notes in Computer Science, 1122. pp. 337–361. Berlin/Heidelberg: Springer.

[262] * Schneier, B. (2006). Applied Cryptography, 2nd ed. New York: John Wiley & Sons.

[263] Schnorr, C. P. (1991). “Efficient Signature Generation for Smart Cards”, Journal of Cryptology, 4: 161–174.

[264] Schoof, R. (1995). “Counting Points on Elliptic Curves over Finite Fields”, Journal de Théorie des Nombres de Bordeaux, 7: 219–254.

[265] Semaev, I. A. (1998). “Evaluation of Discrete Logarithms on Some Elliptic Curves”, Mathematics of Computation, 67: 353–356.

[266] Shamir, A. (1984). “A Polynomial-Time Algorithm for Breaking the Basic Merkle-Hellman Cryptosystem”, IEEE Transactions on Information Theory, 30: 699–704.

[267] Shamir, A. (1984). “Identity-Based Cryptosystems and Signature Schemes”, Advances in Cryptology—CRYPTO ’84, Lecture Notes in Computer Science, 196. pp. 47–53. Berlin/Heidelberg: Springer.

[268] Shamir, A. (1997). “How to Check Modular Exponentiation”, presented at the rump session of Advances in Cryptology—EUROCRYPT ’97, May.

[269] Shamir, A. (1999). “Factoring Large Numbers with the TWINKLE Device”, Cryptographic Hardware and Embedded Systems—CHES ’99, Lecture Notes in Computer Science, 1717. pp. 2–12. Berlin/Heidelberg: Springer.

[270] Shamir, A. and E. Tromer (2003). “Factoring Large Numbers with the TWIRL Device”, Advances in Cryptology—CRYPTO 2003, Lecture Notes in Computer Science, 2729. pp. 1–26. Berlin/Heidelberg: Springer.

[271] Shor, P. W. (1997). “Polynomial-Time Algorithms for Prime Factorization and Discrete Logarithms on a Quantum Computer”, SIAM Journal on Computing, 26: 1484–1509.

[272] Shoup, V. (1990). “On the Deterministic Complexity of Factoring Polynomials over Finite Fields”, Information Processing Letters, 33: 261–267.

[273] Shparlinski, I. E. (1991). “On Some Problems in the Theory of Finite Fields”, Russian Mathematical Surveys, 46 (1): 199–240.

[274] Shparlinski, I. E. (1992). Computational and Algorithmic Problems in Finite Fields, Mathematics and its Applications, 88. Kluwer Academic Publishers.

[275] * Silverman, J. H. (1986). The Arithmetic of Elliptic Curves. Graduate Texts in Mathematics, 106. Berlin and New York: Springer.

[276] Silverman, J. H. (1994). Advanced Topics in the Arithmetic of Elliptic Curves. Graduate Texts in Mathematics, 151. New York: Springer.

[277] Silverman, J. H. (2000). “The Xedni Calculus and the Elliptic Curve Discrete Logarithm Problem”, Design, Codes and Cryptography, 20: 5–40.

[278] Silverman, J. H. and J. Suzuki (1998). “Elliptic Curve Discrete Logarithms and the Index Calculus”, Advances in Cryptology—ASIACRYPT ’98, Lecture Notes in Computer Science, 1514. pp. 110–125. Berlin/Heidelberg: Springer.

[279] Silverman, R. D. (1987). “The Multiple Polynomial Quadratic Sieve”, Mathematics of Computation, 48: 329–339.

[280] * Sipser, M. (1997). Introduction to the Theory of Computation, 2nd ed. Boston: PWS Publishing Company.

[281] Skjernaa, B. (2003). “Satoh’s Algorithm in Characteristic 2”, Mathematics of Computation, 72: 477–487.

[282] Smart, N. P. (1999). “The Discrete Logarithm Problem on Elliptic Curves of Trace One”, Journal of Cryptology, 12: 193–196.

[283] Smart, N. P. (2002). Cryptography: An Introduction. New York: McGraw-Hill. The 2nd edition of this book is available online at http://www.cs.bris.ac.uk/~nigel/Crypto_Book/ (October 2008).

[284] Smith, P. J. (1993). “LUC Public-Key Encryption: A Secure Alternative to RSA”, Dr. Dobb’s Journal, 18 (1): 44–49.

[285] Smith, P. J. and M. J. J. Lennon (1993). “LUC: A New Public Key System”, IFIP Transactions, A 37. pp. 103–117. Proceedings of the IFIP TC11, 9th International Conference on Information Security. Computer Security. Amsterdam: North-Holland Co.

[286] Smith, P. J. and C. Skinner (1995). “A Public-Key Cryptosystem and Digital Signature System Based on the Lucas Function Analogue to Discrete Logarithms”, Advances in Cryptology—ASIACRYPT ’94, Lecture Notes in Computer Science, 917. pp. 357–364. Berlin/Heidelberg: Springer.

[287] Solovay, R. and V. Strassen (1977). “A Fast Monte Carlo Test for Primality”, SIAM Journal on Computing, 6: 84–86.

[288] * Stallings, W. (2006). Cryptography and Network Security, 4th ed. Upper Saddle River, New Jersey: Prentice-Hall.

[289] Stam, M. and A. K. Lenstra (2001). “Speeding up XTR”, Advances in Cryptology—ASIACRYPT 2001, Lecture Notes in Computer Science, 2248. pp. 125–143. Berlin/Heidelberg: Springer.

[290] Stein, A. and E. Teske (2005). “Optimized Baby Step-Giant Step Methods”, Journal of the Ramanujan Mathematical Society, 20 (1): 27–58.

[291] * Stinson, D. (2005). Cryptography: Theory and Practice, 3rd ed. Boca Raton, Florida: CRC Press.

[292] Strassen, V. (1969). “Gaussian Elimination Is not Optimal”, Numerische Mathematik, 13: 354–356.

[293] Stucki, D., N. Gisin, O. Guinnard, G. Ribordy and H. Zbinden (2002). “Quantum Key Distribution over 67 km with a Plug & Play System”, New Journal of Physics, 4: 41.1–41.8.

[294] Sun, H.-M., W.-C. Yang and C.-S. Laih (1999). “On the Design of RSA with Short Secret Exponent”, Advances in Cryptology—ASIACRYPT ’99, Lecture Notes in Computer Science, 1716. pp. 150–164. Berlin/Heidelberg: Springer.

[295] Swade, D. (2000). The Cogwheel Brain: Charles Babbage and the Quest to Build the First Computer. London: Little, Brown and Company.

[296] Trappe, W. and L. C. Washington (2006). Introduction to Cryptography with Coding Theory, 2nd ed. Upper Saddle River, New Jersey: Prentice-Hall.

[297] Verheul, E. R. (2001). “Evidence that XTR is More Secure than Supersingular Elliptic Curve Cryptosystems”, Advances in Cryptology—EUROCRYPT 2001, Lecture Notes in Computer Science, 2045. pp. 195–210. Berlin/Heidelberg: Springer.

[298] Washington, L. C. (2003). Elliptic Curves: Number Theory and Cryptography. Boca Raton, Florida: Chapman & Hall/CRC.

[299] Weber, D. (1996). “Computing Discrete Logarithms with the General Number Field Sieve”, Algorithmic Number Theory—ANTS-II, Lecture Notes in Computer Science, 1122. pp. 337–361. Berlin/Heidelberg: Springer.

[300] Weber, D. (1998). “Computing Discrete Logarithms with Quadratic Number Rings”, Advances in Cryptology—EUROCRYPT ’98, Lecture Notes in Computer Science, 1403. pp. 171–183. Berlin/Heidelberg: Springer.

[301] Weber, D. and T. Denny (1998). “The Solution of McCurley’s Discrete Log Challenge”, Advances in Cryptology—CRYPTO ’98, Lecture Notes in Computer Science, 1462. pp. 458–471. Berlin/Heidelberg: Springer.

[302] Western, A. E. and J. C. P. Miller (1968). “Tables of Indices and Primitive Roots”, Royal Society Mathematical Tables, 9, Cambridge: Cambridge University Press.

[303] Wiedemann, D. H. (1986). “Solving Sparse Linear Equations over Finite Fields”, IEEE Transactions on Information Theory, 32: 54–62.

[304] Wiener, M. J. (1990). “Cryptanalysis of Short RSA Secret Exponents”, IEEE Transactions on Information Theory, 36: 553–558.

[305] Williams, H. C. (1982). “A p + 1 Method for Factoring”, Mathematics of Computation, 39 (159): 225–234.

[306] Yang, L. T. and R. P. Brent (2001). “The Parallel Improved Lanczos Method for Integer Factorization over Finite Fields for Public Key Cryptosystems”, pp. 106–114. Proceedings of the ICPP Workshops 2001, Valencia, Spain, 3–7 September.

[307] Young, A. and M. Yung (1996). “The Dark Side of “Black-Box” Cryptography, or: Should We Trust Capstone?”, Advances in Cryptology—CRYPTO ’96, Lecture Notes in Computer Science, 1109. pp. 89–103. Berlin/Heidelberg: Springer.

[308] Young, A. and M. Yung (1997a). “Kleptography: Using Cryptography Against Cryptography”, Advances in Cryptology—EUROCRYPT ’97, Lecture Notes in Computer Science, 1233. pp. 62–74. Berlin/Heidelberg: Springer.

[309] Young, A. and M. Yung (1997b). “The Prevalence of Kleptographic Attacks on Discrete-Log Based Cryptosystems”, Advances in Cryptology—CRYPTO ’97, Lecture Notes in Computer Science, 1294. pp. 264–276. Berlin/Heidelberg: Springer.

[310] Zheng, Y. (1997). “Digital Signcryption or How to Achieve Cost(Signature & Encryption) << Cost(Signature) + Cost(Encryption)”, Advances in Cryptology—CRYPTO ’97, Lecture Notes in Computer Science, 1294. pp. 165–179. Berlin/Heidelberg: Springer.

[311] Zheng, Y. (1998a). “Signcryption and Its Applications in Efficient Public Key Solutions”, 1997 Information Security Workshop ISW ’97, Lecture Notes in Computer Science, 1397. pp. 291–312. Berlin/Heidelberg: Springer.

[312] Zheng, Y. (1998b). “Shortened Digital Signature, Signcryption, and Compact and Unforgeable Key Agreement Schemes”, contribution to IEEE P1363 Standard for Public Key Cryptography.

[313] Zheng, Y. and H. Imai (1998a). “Efficient Signcryption Schemes on Elliptic Curves”. Proceedings of the IFIP 14th International Information Security Conference IFIP/SEC ’98, Vienna, Austria, September 1998. Chapman & Hall.

[314] Zheng, Y. and H. Imai (1998b). “How to Construct Efficient Signcryption Schemes on Elliptic Curves”, Information Processing Letters, 68: 227–233.

[315] Zheng, Y. and T. Matsumoto (1996). “Breaking Smartcard Implementations of ElGamal Signatures and Its Variants”, presented at the rump session of Advances in Cryptology—ASIACRYPT ’96. Available at http://www.sis.uncc.edu/~yzheng/publications/ (October 2008).

[316] * Zuckerman, H. S., H. L. Montgomery and I. M. Niven (1991). An Introduction to the Theory of Numbers. New York: John Wiley & Sons.

Books marked by stars have Asian editions (at the time of writing this book).

Index


Preface

I can’t understand why a person will take a year to write a novel when he can easily buy one for a few dollars.

—Fred Allen

The first moral question that we faced (like most authors) is: “Why another book?” Available textbooks on public-key cryptography (or cryptography in general) are many [37, 74, 113, 114, 145, 152, 153, 194, 209, 262, 283, 288, 291, 296]. In the presence of all these books, writing another may sound like a waste of energy and effort.

Fortunately, we have a convincing answer. Most cryptography textbooks today, even many of the celebrated ones, essentially take a narrative approach. While such an approach may be suitable for beginners at an undergraduate level, it misses the finer details of this rapidly growing area of applied mathematics. That public-key cryptography is mathematical is hard to deny, and a mathematical subject is best treated mathematically.

This is precisely the point that this book addresses: it proceeds in a canonically mathematical way while revealing cryptographic concepts. The mathematics is often not so simple (which is why other textbooks do not bother to mention it), but we maintain mathematical rigour as far as possible. A notable feature of this book is that it does not rely on anything other than the readers’ mathematical intuition; it develops all the mathematical abstractions from scratch. Although computer science and mathematics students nowadays do undergo some courses on discrete structures somewhere in their curricula, we do not assume this; instead, we develop the algebra starting at the level of set operations. Simpler structures like groups, rings and fields are followed by more complex concepts like finite fields, algebraic curves, number fields and p-adic numbers. The resulting (long) compilation of abstract mathematical tools relieves cryptography students and researchers from consulting many mathematics books for the background concepts. We are happy to offer this self-sufficient treatment complete with proofs and other details. The only place where we had to be somewhat sketchy is the discussion of elliptic and hyperelliptic curves. The mathematics here is too vast to fit into a few pages, so we opted for a deliberate simplification of these topics.

A big problem with discrete mathematics is that many of its proofs are existential. In order to make things work in a practical environment, however, one must undertake algorithmic studies of algebra and number theory. This is what our book does next. While many algorithmic issues in this area are settled favourably, there remain problems whose best known algorithmic complexities are still poor. Some of these so-called computationally difficult problems are used to build secure public-key cryptosystems. The security of these systems is assumed (rather than proved), and so we deal extensively with the algorithms known to date for solving these difficult problems. It is here that the mathematics developed in the earlier chapters is put to use to the greatest extent.

In Chapter 5, all these mathematical and algorithmic studies culminate in the design of public-key systems for achieving various cryptographic goals. With the theoretical base developed in the earlier chapters, Chapter 5 turns out to be an easy chapter. This is our way of looking at the problem, namely, a formal bottom-up approach. We claim to be different from most textbooks in this regard. Our discussion of mathematics is not for its own sake, but to develop the foundation of cryptographic primitives.

We then turn to purely implementational and practical issues of public-key cryptography. Standards proposed by organizations such as IEEE and RSA Security Inc. promote the interoperable use of cryptographic primitives in Internet applications. We then look at some small applications of the cryptographic basics. Some indirect ways of cryptanalysis are described next. These techniques (side-channel and backdoor attacks) give the book a strong practical flavour in tandem with its otherwise formal appearance.

As an eleventh-hour decision, we added a final chapter to our book, a chapter on quantum computation and its implications for public-key cryptography. Although somewhat theoretical at this point, quantum computation has important ramifications for public-key cryptography. The mathematics behind quantum mechanics and computation is not discussed in the earlier chapters, which highlights the distinctive nature of this chapter; it might well be titled “cryptography in the future”.

This outline perhaps makes it clear that the book is better suited as a graduate-level textbook. A one- or two-semester graduate or advanced undergraduate course can be based on its contents. Self-study is also possible at an advanced graduate or research level, but is expected to be difficult at the undergraduate level. We stress the importance of classroom teaching if an undergraduate course is to be based on this textbook.

We have rated the different items in the book by their level of difficulty and/or mathematical sophistication. Unstarred items can be covered even in undergraduate courses. Items marked by single stars are appropriate for a second course or a second reading. Doubly starred items, on the other hand, are research-level material and can be pursued only in really advanced courses or for carrying out research. The inclusion of a good amount of these advanced topics marks another distinction of this book from other available textbooks.

The book comes with plenty of exercises. We have a two-fold motivation behind them. First, they help readers deepen their understanding of the material discussed in the text. Second, some of the exercises build additional theory that we omit from the text proper. We occasionally make use of these additional topics in proving and/or explaining results in the text. We do not classify the exercises into easy and difficult ones, but we supply hints, some of them quite explicit, for the intellectually challenging parts. We collect the hints in an appendix near the end of the book and leave the marker [H] at the appropriate places in the statements of the exercises. This practice prevents a reader from accidentally seeing a hint; only when stuck need the reader look up the hints at the end. We believe that the exercises, together with our discussion of algorithms and implementation issues, will offer serious students many opportunities to carry out substantial implementation work to further their research and development in cryptography.

Every chapter ends with annotated references for further study. We do not claim to be encyclopaedic in this respect; instead, we mention only those references that, we feel, are directly related to the topics dealt with in the respective chapters.

As a trade-off between bulk and coverage, we had to leave many issues untouched. For example, space constraints prevented us from presenting symmetric-key cryptography in detail. However, in view of its importance today, we include brief discussions of block ciphers, stream ciphers and hash functions in an appendix. Nor do we discuss the formal security of public-key protocols. The issues related to provable security are, at the minimum, theoretically important in the study of cryptography, but are left out here almost entirely. Only a brief discussion of the implications of complexity theory for the security of public-key protocols is included in another appendix. The Handbook of Applied Cryptography [194] by Menezes et al. can supplement this book for learning symmetric techniques, whereas the book by Delfs and Knebl [74] or those by Goldreich [113, 114] can be consulted for formal security issues.

We are indebted to everybody whose criticism, encouragement and support made this project possible. Special thanks go to Bimal Roy, Chandan Mazumdar, C. Pandurangan, Debdeep Mukhopadhyay, Dipanwita Roychowdhury, Gagan Garg, Hartmut Wiebe, H. V. Kumar Swamy, Indranil Sengupta, Kapil Paranjape, Manindra Agarwal, Palash Sarkar, Rajesh Pillai, Rana Barua, R. Balasubramanian, Sanjay Barman, Shailesh, Satrajit Ghosh, Souvik Bhattacherjee, Srihari Vavilapalli, Subhamoy Maitra, Surjyakanta Mohapatro, and Uwe Storch. This book has been tested in postgraduate courses at the Indian Institute of Science, Bangalore, and at the Indian Institute of Technology Kharagpur. We sincerely thank all our students for pointing out many errors and suggesting several improvements. We express our deep gratitude to our family members for their constant understanding and moral support. We are also indebted to our institutes for providing the wonderful intellectual climate for completing this work.

A. D.

C. E. V. M.

Notations

Any time you are stuck on a problem, introduce more notation.

—Chris Skinner [Plenary Lecture, Aug 1997, Topics in Number Theory, Penn State]

General
|a|    absolute value of real number a
min S    minimum of elements of set S
max S    maximum of elements of set S
exp(a)    e^a, where e ≈ 2.71828 is the base of natural logarithms
log x    logarithm of x with respect to some unspecified base (like 10)
ln x    log_e x, the natural logarithm of x
lg x    log_2 x
log^k x    (log x)^k (similarly, ln^k x = (ln x)^k and lg^k x = (lg x)^k)
:=    is defined as (or “is assigned the value” in code snippets)
i    the imaginary unit, square root of –1
z̄    complex conjugate (x – iy) of the complex number z = x + iy
δij    Kronecker delta
(a_s a_(s–1) . . . a_0)_b    b-ary representation of a non-negative integer
binomial coefficient, equals n(n – 1) ··· (n – r + 1)/r!
⌊x⌋    floor of real number x
⌈x⌉    ceiling of real number x
[a, b]    closed interval, that is, the set of real numbers x in the range a ≤ x ≤ b
(a, b)    open interval, that is, the set of real numbers x in the range a < x < b
L(t, α, c)    expression of the form exp((c + o(1))(ln t)^α (ln ln t)^(1–α))
Lt[c]    abbreviation for L(t, 1/2, c) (denoted also as L[c] if t is understood)
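A couple of these conventions can be made concrete in code. The sketch below (plain Python; the helper names `to_base` and `L` are our own) computes the b-ary representation of an integer and evaluates the subexponential expression L(t, α, c) with the o(1) term dropped.

```python
import math

def to_base(n, b):
    """b-ary representation (a_s ... a_0)_b of a non-negative integer n."""
    if n == 0:
        return [0]
    digits = []
    while n > 0:
        digits.append(n % b)
        n //= b
    return digits[::-1]  # most significant digit first

def L(t, alpha, c):
    """exp(c * (ln t)^alpha * (ln ln t)^(1 - alpha)), with the o(1) term dropped."""
    return math.exp(c * math.log(t) ** alpha * math.log(math.log(t)) ** (1 - alpha))

print(to_base(13, 2))   # binary digits of 13: [1, 1, 0, 1]
# alpha = 1 gives fully exponential growth, alpha = 0 polynomial in ln t:
print(L(2 ** 512, 1, 1) > L(2 ** 512, 0.5, 1) > L(2 ** 512, 0, 1))  # True
```

The interpolation property shown in the last line is exactly why L(t, 1/2, c) appears in the running times of the subexponential factoring and discrete-log algorithms discussed later.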
Bit-wise operations (on bit strings a, b)
NAND    negation of AND
NOR    negation of OR
XOR    exclusive OR
a ⊕ b    bit-wise exclusive OR (XOR) of a and b
a AND b    bit-wise AND of a and b
a OR b    bit-wise inclusive OR of a and b
LSk(a)    left shift of a by k bits
RSk(a)    right shift of a by k bits
LRk(a)    left rotate (cyclic left shift) of a by k bits
RRk(a)    right rotate (cyclic right shift) of a by k bits
ā    bit-wise complement of a
a ‖ b    concatenation of a and b
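The shift and rotate operations above act on words of a fixed width; the following sketch fixes an 8-bit width for illustration (the width and the function names are our choices).

```python
W = 8                     # word size in bits (our choice for illustration)
MASK = (1 << W) - 1       # 0xFF: keeps results within the word

def LS(a, k): return (a << k) & MASK                       # left shift by k bits
def RS(a, k): return a >> k                                # right shift by k bits
def LR(a, k): return ((a << k) | (a >> (W - k))) & MASK    # left rotate by k bits
def RR(a, k): return ((a >> k) | (a << (W - k))) & MASK    # right rotate by k bits

a, b = 0b11010010, 0b01100110
print(bin(a ^ b))         # bit-wise XOR
print(bin(a & b), bin(a | b))
print(bin(LR(a, 3)))      # 0b10010110: the three leading bits wrap around
print(bin(~a & MASK))     # bit-wise complement within the word: 0b101101
```

Rotation differs from shifting in that no bits are lost: RRk(LRk(a)) always recovers a.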
Sets
∅    empty set
#A    cardinality of set A
a ∈ A    a is an element of set A
A ⊆ B    set A is contained in set B
A ⊈ B    set A is not contained in set B
A ⊊ B    set A is properly contained in set B
A ∪ B    union of sets A and B
A ⊔ B    disjoint union of sets A and B
A ∩ B    intersection of sets A and B
A \ B    difference of sets A and B
Ā    complement of set A (in a bigger set)
A × B    (Cartesian) product of sets A and B
ℕ    set of all natural numbers, that is, {1, 2, 3, . . .}
set of all non-negative integers, that is, {0, 1, 2, . . .}
ℤ    set of all integers, that is, {. . . , –2, –1, 0, 1, 2, . . .}
ℙ    set of all (positive) prime numbers, that is, {2, 3, 5, 7, . . .}
ℚ    set of all rational numbers, that is, {a/b | a, b ∈ ℤ, b ≠ 0}
ℚ*    set of all non-zero rational numbers
ℝ    set of all real numbers
ℝ*    set of all non-zero real numbers
set of all non-negative real numbers
ℂ    set of all complex numbers
ℂ*    set of all non-zero complex numbers
ℤn    ring of integers modulo n, can be represented by the set {0, 1, . . . , n – 1}
ℤn*    group of units in ℤn, can be represented as {a | 0 ≤ a < n, gcd(a, n) = 1}
𝔽q    finite field of cardinality q
𝔽q*    multiplicative group of 𝔽q, that is, 𝔽q \ {0}
O_K    ring of integers of number field K
O_K*    group of units of O_K
ℤp    ring of p-adic integers
ℚp    field of p-adic numbers
Up    group of units of ℤp
Functions and relations
f : A → B    f is a function from set A to set B
f : A ↣ B    f is an injective function from set A to set B
f : A ↠ B    f is a surjective function from set A to set B
a ↦ b    a is mapped to b (by a function)
f ∘ g    composition of functions f and g (applied from right to left)
f^(–1)    inverse of bijective function f
Ker f    kernel of function (homomorphism) f
Im f    image of function f
~    equivalent to
[a]    equivalence class of a
Groups
aH    coset in a multiplicative group
a + H    coset in an additive group
HK    internal direct product of (sub)groups H and K
H × K    external direct product of (sub)groups H and K
[G : H]    index of subgroup H in group G
G/H    quotient group
G1 ≅ G2    groups G1 and G2 are isomorphic
ord G    order (that is, cardinality) of group G
ordG a    order of element a in group G
Exp G    exponent of group G
Z(G)    centre of group G
C(a)    centralizer of group element a
GLn(K)    general linear group over field K (of n × n matrices)
SLn(K)    special linear group over field K (of n × n matrices)
Gtors    torsion subgroup of G
Rings
char A    characteristic of ring A
A × B    direct product of rings A and B
A*    multiplicative group of units of ring A
⟨S⟩    for ring A, ideal generated by S ⊆ A
⟨a⟩    for ring A, principal ideal generated by a ∈ A, also written as aA and Aa
a ≡ b (mod 𝔞)    a is congruent to b modulo ideal 𝔞, that is, a – b ∈ 𝔞
A ≅ B    rings A and B are isomorphic
A/𝔞    quotient ring (modulo ideal 𝔞)
a | b    a divides b (in some ring)
vp(a)    multiplicity of prime p in element a
p^k ∥ a    k = vp(a), that is, p^k divides a but p^(k+1) does not
nilradical of ring A
Ared    reduction of ring A, that is, A modulo its nilradical
gcd(a, b)    greatest common divisor of elements a and b
lcm(a, b)    least common multiple of elements a and b
𝔞 + 𝔟    sum of ideals 𝔞 and 𝔟
𝔞 ∩ 𝔟    intersection of ideals 𝔞 and 𝔟
𝔞𝔟    product of ideals 𝔞 and 𝔟
√𝔞    root (or radical) of ideal 𝔞
Q(A)    total quotient ring of ring A (quotient field of A, if A is an integral domain)
S^(–1)A    localization of ring A at multiplicative set S
A_𝔭    localization of ring A at prime ideal 𝔭
O_K    ring of integers of number field K
N(𝔞)    norm of ideal 𝔞 (in a Dedekind domain)
CRT    Chinese remainder theorem
ED    Euclidean domain
DD    Dedekind domain
DVD (or DVR)    discrete valuation domain (or ring)
PID    principal ideal domain
UFD    unique factorization domain
Fields
char K    characteristic of field K
K*    multiplicative group of units of field K, that is, K \ {0}
K̄    algebraic closure of field K
[K : F]    degree of the field extension F ⊆ K
K[a]    ring generated by a over K, that is, {f(a) | f(X) ∈ K[X]}
K(a)    {f(a)/g(a) | f(X), g(X) ∈ K[X], g(a) ≠ 0}
Aut K    group of automorphisms of field K
AutF K    for field extension F ⊆ K, group of F-automorphisms of K (also Gal(K|F))
FixF H    for field extension F ⊆ K, fixed field of subgroup H of AutF K
𝔽q    finite field of cardinality q
𝔽q*    multiplicative group of units of 𝔽q, that is, 𝔽q \ {0}
Tr    trace function
TrK|F (a)    for field extension F ⊆ K, trace of a ∈ K over F
N    norm function
NK|F (a)    for field extension F ⊆ K, norm of a ∈ K over F
Frobenius automorphism, a ↦ a^q
O_K    ring of integers of number field K
O_K*    group of units of O_K
ΔK    discriminant of number field K
ℤp    ring of p-adic integers
ℚp    field of p-adic numbers
Up    group of units of ℤp
| |p    p-adic norm on ℚp
Integers
a quot b    quotient of Euclidean division of a by b ≠ 0
a rem b    remainder of Euclidean division of a by b ≠ 0
a | b    a divides b in ℤ, that is, b = ca for some c ∈ ℤ
vp(a)    multiplicity of prime p in non-zero integer a
gcd(a, b)    greatest common divisor of integers a and b (not both zero)
lcm(a, b)    least common multiple of integers a and b
a ≡ b (mod n)    a is congruent to b modulo n
a^(–1) (mod n)    multiplicative inverse of a modulo n (given that gcd(a, n) = 1)
φ(n)    Euler’s totient function
(a/n)    Legendre (or Jacobi) symbol
[a]n    coset a + nℤ in ℤn
ordn a    multiplicative order of a modulo n (given that gcd(a, n) = 1)
μ(n)    Möbius function
π(x)    number of primes between 1 and positive real number x
Li(x)    Gauss’ Li function
ψ(x, y)    fraction of positive integers ≤ x that are y-smooth
ζ(s)    Riemann zeta function
RH    Riemann hypothesis
ERH    extended Riemann hypothesis
Mn    2^n – 1 (Mersenne number)
2^32, the standard radix for representation of multiple-precision integers
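Several of these arithmetic notations correspond directly to built-in or easily written operations. The sketch below (naive implementations, function names our own) illustrates quot and rem, modular inverses, φ(n) and ordn a on small numbers.

```python
from math import gcd

def phi(n):
    """Euler's totient: count of 1 <= a <= n with gcd(a, n) = 1 (naive count)."""
    return sum(1 for a in range(1, n + 1) if gcd(a, n) == 1)

def order(a, n):
    """Multiplicative order of a modulo n, assuming gcd(a, n) = 1."""
    k, x = 1, a % n
    while x != 1:
        x = (x * a) % n
        k += 1
    return k

a, b, n = 38, 7, 15
print(a // b, a % b)     # quot and rem of Euclidean division: 5 3
print(pow(2, -1, n))     # 2^(-1) (mod 15) = 8, since 2 * 8 = 16 ≡ 1 (mod 15)
print(phi(n))            # phi(15) = phi(3) * phi(5) = 8
print(order(2, n))       # ord_15(2) = 4, since 2^4 = 16 ≡ 1 (mod 15)
```

Note that ord_15(2) = 4 divides φ(15) = 8, as Euler's theorem guarantees for any a coprime to n.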
Polynomials
A[X1, . . . , Xn]    polynomial ring in indeterminates X1, . . . , Xn over ring A
A(X1, . . . , Xn)    ring of rational functions in indeterminates X1, . . . , Xn over ring A
deg f    degree of polynomial f
lc f    leading coefficient of polynomial f
minpolyα,K(X)    minimal polynomial of α over field K, belongs to K[X]
cont f    content of polynomial f
pp f    primitive part of polynomial f
f′(X)    formal derivative of polynomial f(X)
Δ(f)    discriminant of polynomial f
μm    group of m-th roots of unity
Φm    m-th cyclotomic polynomial
Vector spaces, modules and matrices
dimK V    dimension of vector space V over field K
Span S    span of subset S of a vector space
HomK(V, W)    set of all K-linear transformations V → W
EndK(V)    set of all K-linear transformations V → V
M/N    quotient vector space or module
M ≅ N    vector spaces or modules M and N are isomorphic
∏ Mi    direct product of modules Mi, i ∈ I
⊕ Mi    direct sum of modules Mi, i ∈ I
A^t    transpose of matrix (or vector) A
A^(–1)    inverse of matrix A
Rank T    rank of matrix or linear transformation T
RankA M    rank of A-module M
Null T    nullity of matrix or linear transformation T
(M : N)    for A-module M and submodule N, the ideal {a ∈ A | aM ⊆ N} of A
AnnA(M)    annihilator of A-module M, same as (M : 0)
Tors M    torsion submodule of M
A[S]    A-algebra generated by set S
⟨v, w⟩    inner product of two real vectors v and w
Algebraic curves
𝔸^n    n-dimensional affine space over field K
ℙ^n    n-dimensional projective space over field K
(x0, x1, . . . , xn)    homogeneous coordinates of a point in ℙ^n
[x0, x1, . . . , xn]    projective coordinates of a point in ℙ^n
f(h)    homogenization of polynomial f
C(K)    set of K-rational points on curve C defined over field K
K[C]    ring of polynomial functions on curve C defined over K
K(C)    field of rational functions on curve C defined over K
[P]    point P on a curve, as it appears in formal sums
ordP (r)    order of rational function r at point P
DivK (C)    group of divisors on curve C defined over field K
Div⁰K(C)    group of divisors of degree 0 on curve C defined over field K
DivK(r)    divisor of a rational function r
PrinK(C)    group of principal divisors on curve C defined over field K
JK(C)    Jacobian of curve C defined over field K
PicK(C)    Picard group of curve C (equals DivK(C)/PrinK(C))
Pic⁰K(C)    degree-0 part of the Picard group, same as the Jacobian
point at infinity on an elliptic or a hyperelliptic curve
Δ(E)    discriminant of elliptic curve E
j(E)    j-invariant of elliptic curve E
E(K)    group of points on elliptic curve E defined over field K
P + Q    sum of two points P, Q on an elliptic curve
mP    m-th multiple (that is, m-fold sum) of point P
ψm, , fm    m-th division polynomials
t    trace of Frobenius of elliptic curve
EK[m]    group of m-torsion points in E(K)
E[m]    abbreviation for EK̄[m]
em    Weil pairing (a map E[m] × E[m] → μm)
Div(a, b)    representation of a reduced divisor on a hyperelliptic curve by polynomials a, b
Probability and statistics
Pr(E)    probability of event E
Pr(E1|E2)    conditional probability of event E1 given event E2
E(X)    expectation of random variable X
Var(X)    variance of random variable X
σX    standard deviation of random variable X (equals √Var(X))
Cov(X, Y)    covariance of random variables X, Y
ρX,Y    correlation coefficient of random variables X, Y
Computational complexity
f = O(g)    big-Oh notation: f is of the order of g
f = Ω(g)    big-Omega notation: g is of the order of f
f = Θ(g)    big-Theta notation: f and g have the same order
f = o(g)    small-oh notation: f is of strictly smaller order than g
f = ω(g)    small-omega notation: f is of strictly larger order than g
f = O~(g)    soft-Oh notation: f = O(g log^k g) for real constant k ≥ 0
problem P1 is polynomial-time reducible to problem P2
P1 ≡ P2    problems P1 and P2 are polynomial-time equivalent
Intractable problems
CVP    closest vector problem
DHP    (finite field) Diffie–Hellman problem
DLP    (finite field) discrete logarithm problem
ECDHP    elliptic curve Diffie–Hellman problem
ECDLP    elliptic curve discrete logarithm problem
HECDHP    hyperelliptic curve Diffie–Hellman problem
HECDLP    hyperelliptic curve discrete logarithm problem
GIFP    general integer factorization problem
IFP    integer factorization problem
QRP    quadratic residuosity problem
RSAIFP    RSA integer factorization problem
RSAKIP    RSA key inversion problem
RSAP    RSA problem
SQRTP    modular square root problem
SSP    subset sum problem
SVP    shortest vector problem
Algorithms
ADH    Adleman, DeMarrais and Huang’s algorithm
AES    advanced encryption standard
AKS    Agarwal, Kayal and Saxena’s deterministic primality test
BSGS    Shanks’ baby-step–giant-step method
CBC    cipher-block chaining mode
CFB    cipher feedback mode
CSM    cubic sieve method
CSPRBG    cryptographically strong pseudorandom bit generator
CvA    Chaum and van Antwerpen’s undeniable signature scheme
DDF    distinct-degree factorization
DES    data encryption standard
DH    Diffie–Hellman key exchange
DPA    differential power analysis
DSA    digital signature algorithm
DSS    digital signature standard
ECB    electronic codebook mode
ECDSA    elliptic curve digital signature algorithm
ECM    elliptic curve method
E-D-E    encryption–decryption–encryption scheme of triple encryption
EDF    equal-degree factorization
EG    Eschenauer and Gligor’s scheme
FEAL    fast data encipherment algorithm
FFS    Feige, Fiat and Shamir’s zero-knowledge protocol
GKR    Gennaro, Krawczyk and Rabin’s RSA-based undeniable signature scheme
GNFSM    general number field sieve method
GQ    Guillou and Quisquater’s zero-knowledge protocol
HFE    cryptosystem based on hidden field equations
ICM    index calculus method
IDEA    international data encryption algorithm
KLCHKP    braid group cryptosystem
L3    Lenstra–Lenstra–Lovász algorithm
LFSR    linear feedback shift register
LSM    linear sieve method
LUC    cryptosystem based on Lucas sequences
MOV    Menezes, Okamoto and Vanstone’s reduction
MPQSM    multiple polynomial quadratic sieve method
MQV    Menezes–Qu–Vanstone key exchange
NFSM    number field sieve method
NR    Nyberg–Rueppel signature algorithm
NTRU    Hoffstein, Pipher and Silverman’s encryption algorithm
NTRUSign    NTRU signature algorithm
OAEP    optimal asymmetric encryption padding
OFB    output feedback mode
PAP    pretty awful privacy
PGP    pretty good privacy
PH    Pohlig–Hellman method
PRBG    pseudorandom bit generator
PSS    probabilistic signature scheme
QSM    quadratic sieve method
RSA    Rivest, Shamir and Adleman’s algorithm
SAFER    secure and fast encryption routine
Satoh–FGH    point counting algorithm on elliptic curves over fields of characteristic 2
SDSA    shortened digital signature algorithm
SEA    Schoof, Elkies and Atkin’s algorithm for point counting on elliptic curves
SETUP    secretly embedded trapdoor with universal protection
SFF    square-free factorization
SHA    secure hash algorithm
SmartASS    algorithm for computing discrete logs in anomalous elliptic curves
SNFSM    special number field sieve method
SPA    simple power analysis
TWINKLE    the Weizmann Institute key location engine
TWIRL    the Weizmann Institute relation locator
XCM    xedni calculus method
XSL    extended sparse linearization attack
XTR    efficient and compact subgroup trace representation
ZK    zero-knowledge
Quantum computation
|ψ〉    ket notation for vector ψ
〈φ|ψ〉    inner product of vectors |φ〉 and |ψ〉
‖ψ‖    norm of vector |ψ〉 (equals √〈ψ|ψ〉)
n-dimensional Hilbert space (over ℂ)
|0〉, |1〉, . . . , |n – 1〉    orthonormal basis of the n-dimensional Hilbert space
cbit    classical bit
qubit    quantum bit
⊗    tensor product of Hilbert spaces
F    Fourier transform
H    Hadamard transform
I    Identity transform
X    Exchange transform
Z    Z transform
Computational primitives
ulong    32-bit unsigned integer data type (unsigned long)
ullong    64-bit unsigned integer data type (unsigned long long)
a := b    assignment operator (returns the value assigned)
+, –, ×, /, %    arithmetic operators
++, – –    increment and decrement operators
a ◊= b    a := a ◊ b, where ◊ is an arithmetic operator
=, ≠, >, <, ≥, ≤    comparison operators
1    True as a condition
if    conditional statement: if (condition) ···
if-else    conditional statement: if (condition) ··· , else ···
while    while loop: while (condition) ···
do    do loop: do ··· while (condition)
for    for loop: for (range of values) ···
{···}    block of statements
, or . or new-line    statement terminator
/* ··· */    comment
return    return from this routine
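These conventions form a C-flavoured pseudocode. As a sketch of how a fragment written in them translates into a real language, here is Euclid's gcd loop rendered in Python; the mapping of := to =, of rem to %, and of /* ... */ comments to # comments is ours.

```python
def euclid_gcd(a, b):
    # pseudocode: while (b != 0) { t := a rem b, a := b, b := t }
    while b != 0:          # while loop: while (condition) ...
        a, b = b, a % b    # "a rem b" becomes a % b; ',' separates statements
    return a               # return from this routine

x = 28
x *= 3                     # the pseudocode "a ◊= b" (here ◊ is ×) becomes x *= 3
print(euclid_gcd(x, 60))   # gcd(84, 60) = 12
```

Any of the book's algorithm descriptions can be transcribed in the same mechanical way, which is why the pseudocode deliberately stays close to C syntax.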
Miscellaneous
end of (visible or invisible) proof
end of item (like example, definition, assumption)
[H]    hint available in Appendix D

1. Overview

1.1    Introduction
1.2    Common Cryptographic Primitives
1.3    Public-key Cryptography
1.4    Some Cryptographic Terms
       Chapter Summary

Aller Anfang ist schwer: All beginnings are difficult.

—German proverb

Defendit numerus: There is safety in numbers.

—Anonymous

The ability to quote is a serviceable substitute for wit.

—W. Somerset Maugham

1.1. Introduction

It is rather difficult to give a precise definition of cryptography. Loosely speaking, it is the science (or art or technology) of preventing access to sensitive data by parties who are not authorized to access the data. Secure transmission of messages over a public channel is the first, simplest and oldest example of a cryptographic protocol. For assessing the security of these protocols, one studies their possible weak points, namely the strategies for breaking them. This study is commonly referred to as cryptanalysis. And, finally, the study of both cryptography and cryptanalysis is known as cryptology.

Cryptology = Cryptography + Cryptanalysis

The science of cryptology is rather old; it developed naturally as and when human beings felt the need for privacy and secrecy. The rapid deployment of the Internet in recent years demands that we look at this subject with renewed interest. Newer requirements tailored to Internet applications keep cropping up, and as a result newer methods, protocols and algorithms keep appearing. The most startling discoveries include the key-exchange protocol of Diffie and Hellman in 1976 and the RSA cryptosystem of Rivest, Shamir and Adleman in 1978. They opened up a new branch of cryptology, namely public-key cryptology. Historically, public-key technology came earlier than the Internet, but it is the latter that makes extensive use of the former.

This book is an attempt to introduce to the reader the vast and interesting branch of public-key cryptology. One of its most distinguishing features is that public-key cryptology involves a reasonable amount of abstract mathematics, which often stands in the way of a complete understanding for an uninitiated reader. This book tries to bridge the gap: we develop the required mathematics in necessary and sufficient detail.

This chapter is an overview of the topics that the rest of the book deals with. We start with a description of the most common cryptographic protocols. Then we introduce the public-key paradigm and discuss the source of its security. We use certain mathematical terms and notations throughout this chapter. If the reader is not already familiar with these terms, there is nothing to worry about. As we have just claimed, we will introduce the mathematics in the later chapters. The exposition of this chapter is expected to give the reader an overview of the area of public-key cryptography and also the requisite motivation for learning the mathematical tools that follow.

1.2. Common Cryptographic Primitives

As claimed at the outset of this chapter, it is rather difficult to give a precise definition of the term cryptography. The best way to understand it is by examples. In this section, we briefly describe the common problems that cryptography deals with.

1.2.1. The Classical Problem: Secure Transmission of Messages

To start with, we introduce the legendary figures of cryptography: Alice, Bob and Carol. Alice wants to send a message to Bob over a public communication channel like the Internet and wants to ensure that nobody other than Bob can make out the meaning of the message. A third party like Carol, who has access to the communication channel, can intercept the message. But the message should be wrapped or transformed before transmission in such a way that knowledge of some secret piece of information is needed to unwrap or transform back the message. It is Bob who has this information, but not Carol (nor Dorothy nor Emily nor . . .).

It is expedient to point out here that Alice, Bob and Carol need not be human beings. They can stand for organizations (like banks) or, more precisely, for computers or computer programs run by individuals or organizations. It is, therefore, customary to call them parties, entities or subjects instead of persons or characters. In cryptologic jargon, Carol goes by several interchangeable names: adversary, eavesdropper, opponent, intruder, attacker and enemy are the most common. When a message transmission like the one just described is involved, Alice is called the sender and Bob the receiver of the message.

It is a natural strategy to put the message in a box and lock the box using a key, called the encryption key. A matching decryption key is needed to unlock the box and retrieve the message. The process of putting the message in the box is commonly called encoding, and that of locking the box is called encryption. The reverse processes, namely unlocking the box and taking the message out of it, are respectively called decryption and decoding. This is precisely the classical encryption–decryption protocol of cryptography.[1]

[1] Some people prefer to use the terms enciphering and deciphering in place of the words encryption and decryption respectively.

In the world of electronic communication, a message M is usually a bit string, and encoding, encryption, decryption and decoding are well-defined transformations of bit strings. If we denote by fe the transformation function consisting of encoding and encryption, then we get a new bit string C = fe(M, Ke), where Ke stands for the encryption key. This bit string C is sent over the communication channel. After Bob receives C, he uses the reverse transformation fd (decryption followed by decoding) to get the original message M back; that is, M = fd(C, Kd). Note that the decryption key Kd is needed as an argument to fd. If Carol does not know this, she cannot compute M. We conventionally call M the plaintext message and C the ciphertext message.

The encoding and decoding operations do not make use of keys and can be performed by anybody. (It should not be difficult to put a letter in or take a letter out of an unlocked box!) One might then wonder why it is necessary to do these transformations instead of applying the encryption and decryption operations directly on M and C respectively. With whatever we have discussed so far, we cannot give a full answer to this question. For the answer, we will need to wait until we reach the later chapters. We only mention here that the encryption algorithms often require as input some mathematical entities (like integers or elements of a field) which are logically not bit strings. But that’s not all! As we see later, the additional transformations often add to the security of the protocols. On the other hand, for a general discussion, it is often unnecessary to start from the encoding process and end at the decoding process. As a result, we will assume, unless otherwise stated, that M is the input to the encryption routine and the output of the decryption routine, in which case fe and fd stand for the encryption and decryption functions only.
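The relationship C = fe(M, Ke) and M = fd(C, Kd) can be sketched in a few lines of Python. Here XOR with a random key of the same length as the message (a one-time pad) stands in for a real cipher, and the names f_e and f_d are ours, mirroring the notation above:

```python
import os

def f_e(m: bytes, k_e: bytes) -> bytes:
    """Toy 'encryption': XOR the message with a key of equal length."""
    return bytes(a ^ b for a, b in zip(m, k_e))

def f_d(c: bytes, k_d: bytes) -> bytes:
    """Toy 'decryption': XOR is its own inverse, so f_d coincides with f_e here."""
    return bytes(a ^ b for a, b in zip(c, k_d))

M = b"attack at dawn"
K = os.urandom(len(M))      # one-time random key; here Ke = Kd = K (symmetric)
C = f_e(M, K)               # ciphertext sent over the channel
assert f_d(C, K) == M       # Bob recovers M with the matching key
```

Since XOR is its own inverse, this example is symmetric (Ke = Kd); the asymmetric case, where the two keys differ, is taken up below.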

Symmetric-key or secret-key cryptography

In the simplest form of locking mechanism, one has Ke = Kd. That is, the same key, called the symmetric key or the secret key, is used for both encryption and decryption. Common examples of such symmetric-key algorithms include DES (Data Encryption Standard) together with its various modifications like the Triple DES and DES-X, IDEA (International Data Encryption Algorithm), SAFER (Secure And Fast Encryption Routine), FEAL (Fast Encryption Algorithm), Blowfish, RC5 and AES (Advanced Encryption Standard). We will not describe all these algorithms in this book. Interested readers can look at the abundant literature to know more about them.

Asymmetric-key or public-key cryptography

The biggest disadvantage of using a secret-key system is that Alice and Bob must agree upon the key Ke = Kd secretly, for example by personal contact or over a secure channel. This is a serious limitation and is often not practical or even possible. Another drawback of secret-key systems is that every pair of parties needs a key for communication. Thus, if there are n entities communicating over a net, the number of keys would be of the order of n^2. Also, each entity has to remember O(n) keys for communicating with the other entities. In practice, however, an entity does not communicate with every other entity on the net. Yet the total number of keys to be remembered by an entity could be quite high.

Both these problems can be avoided by using what is called an asymmetric-key or a public-key protocol. In such a protocol, each entity generates a key pair (Ke, Kd), makes the encryption key Ke public and keeps the decryption key Kd secret. Ke is also called the public key and Kd the private key. Anybody who wants to send a message to Bob gets Bob’s public key, encrypts the message with the key, and sends the ciphertext to Bob. Upon receiving the ciphertext, Bob uses his private key to decrypt the message. One may view such a lock as a self-locking padlock. Anybody can lock a box with a self-locking padlock, but opening it requires a key which only Bob possesses.

The source of security of such a system is based on the difficulty of computing the private key Kd given the public key Ke. It is apparent that Ke and Kd are sort of inverses of each other, because the former is used to generate C from M and the latter is used to generate M from C. This is where mathematics comes into the picture. We mention a few possible constructions of key pairs in the next section and the rest of the book deals with an in-depth study of these public-key protocols.

Attractive as they look, public-key protocols have a serious drawback, namely that they are orders of magnitude slower than their secret-key counterparts. This is of concern if huge amounts of data need to be encrypted and decrypted. This shortcoming can be overcome by using both secret-key and public-key protocols in tandem as follows: Alice generates a secret key (say, for AES), encrypts the message by the secret key and the secret key by the public key of Bob, and sends both the encrypted message and the encrypted secret key. Bob first decrypts the encrypted secret key using his private key and uses this decrypted secret key to decrypt the message. Since secret keys are usually short bit strings (most commonly of length 128 bits), the slow performance of the public-key algorithms causes little trouble. But at the same time, Alice and Bob are relieved of having a previous secret meeting or communication for agreeing on the secret key. Moreover, neither Alice nor Bob needs to remember the secret key. During every session of message transmission, a random secret key can be generated and later destroyed when the communication is over.
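The hybrid scheme just described can be sketched as follows. The RSA-style key pair and the hash-based stream cipher below are illustrative stand-ins of our own devising: the primes are far too small for real use, and the keystream function is not a vetted cipher like AES.

```python
import hashlib, os

# Toy RSA key pair for Bob (small Mersenne primes; real keys are far larger).
p = 2**127 - 1
q = 2**89 - 1
n, phi = p * q, (p - 1) * (q - 1)
e = 65537
d = pow(e, -1, phi)          # ed ≡ 1 (mod φ(n)); needs Python 3.8+

def keystream(key: bytes, length: int) -> bytes:
    """Hash-based stream: a stand-in for AES, not a vetted cipher."""
    out, ctr = b"", 0
    while len(out) < length:
        out += hashlib.sha256(key + ctr.to_bytes(8, "big")).digest()
        ctr += 1
    return out[:length]

# Alice: fresh 128-bit session key; encrypt the bulk data with it,
# and the session key itself with Bob's public key (e, n).
session_key = os.urandom(16)
message = b"a long message encrypted under the fast symmetric scheme"
ct_message = bytes(a ^ b for a, b in
                   zip(message, keystream(session_key, len(message))))
ct_key = pow(int.from_bytes(session_key, "big"), e, n)

# Bob: unwrap the session key with his private key d, then decrypt the bulk data.
k = pow(ct_key, d, n).to_bytes(16, "big")
assert bytes(a ^ b for a, b in zip(ct_message, keystream(k, len(ct_message)))) == message
```

Only the short session key passes through the slow public-key operation; the long message is handled by the fast symmetric one, exactly as in the tandem scheme above.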

1.2.2. Key Exchange

There is an alternative method by which Alice and Bob can exchange secret information (like AES keys) over a public communication channel. Let us first see how this can be done in the physical lock-and-key scenario. Alice generates a secret, puts it in a box, locks the box with her own key and sends it to Bob. Bob, upon receiving the locked box, adds a second lock to it and sends the doubly locked box back to Alice. Alice then removes her lock and again sends the box to Bob. Finally, Bob uses his key to unlock the box and retrieve the secret. A third party (Carol) that can access the box during the three communications finds it locked by Alice or Bob or both. Since Carol does not possess the keys to these locks, she cannot open the box to discover the secret.

This process can be abstractly described as follows: Alice and Bob first independently generate key pairs (AKe, AKd) and (BKe, BKd) respectively. Alice then sends AKe to Bob and Bob sends BKe to Alice. The private keys AKd and BKd are not disclosed. They also agree upon a function g with which Alice computes gA = g(AKd, BKe) and Bob computes gB = g(BKd, AKe). If gA = gB, then this common value can be used as a shared secret between Alice and Bob.

Our intruder Carol knows g and taps the values of AKe and BKe. So the function g should be such that a knowledge of these values alone does not suffice for the computation of gA = gB. One of the private keys AKd or BKd is needed for the computation. Since (AKe, AKd) and (BKe, BKd) are key pairs, it is assumed that private keys are difficult to compute from the knowledge of the corresponding public keys.
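A concrete realization of this abstraction is the Diffie–Hellman protocol, sketched below over the multiplicative group modulo a prime p. The parameter sizes and the base g = 2 are chosen for illustration only; here AKe = g^AKd mod p and g(AKd, BKe) = BKe^AKd mod p.

```python
import secrets

# Public parameters: a prime p and base g (illustrative, not vetted for real use).
p = 2**255 - 19     # a well-known prime; any large prime works for the demo
g = 2

# Each party keeps its private key and publishes only g^priv mod p.
a_priv = secrets.randbelow(p - 2) + 1          # Alice's AKd
b_priv = secrets.randbelow(p - 2) + 1          # Bob's BKd
a_pub = pow(g, a_priv, p)                      # AKe, sent to Bob
b_pub = pow(g, b_priv, p)                      # BKe, sent to Alice

# g(priv, other's pub) = pub^priv mod p: both sides compute g^(ab) mod p.
g_A = pow(b_pub, a_priv, p)
g_B = pow(a_pub, b_priv, p)
assert g_A == g_B      # the shared secret; Carol sees only p, g, a_pub, b_pub
```

Carol, seeing only p, g, a_pub and b_pub, would have to compute a discrete logarithm to recover either private key.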

Such a technique of exchanging secret values over an insecure channel is called a key-exchange or a key-agreement protocol. It is important to point out here that such a protocol is usually based on the public-key paradigm; that is to say, we do not know secret-key counterparts for a key-exchange protocol. Since a shared secret between the communicating parties is usually short, the low speed of public-key algorithms is really not a concern in this case.

1.2.3. Digital Signatures

A digital signature is yet another application of the public-key paradigm. Suppose Alice wants to sign a message M in such a way that the signature S can be verified by anybody but nobody other than Alice would be able to generate the signature S on the message M. This can be achieved as follows: Alice generates a key pair (Ke, Kd), makes Ke public and keeps Kd secret. She now uses the decryption function fd to generate the signature, that is, S = fd(M, Kd). The signature S is then made public. Anybody who has access to Alice’s public key Ke applies the reverse transformation fe to get back the message M = fe(S, Ke).

If Carol signs the message M with a different key K′d, then she generates the signature S′ = fd(M, K′d). Now, since K′d and Ke are not matching keys, verification using Ke gives M′ = fe(S′, Ke), which is different from M. If we assume that M is a message written in a human-readable language (like English), then M′ would generally look like a meaningless sequence of characters which is neither English nor any sensible string to a human reader. The signature verifier would then immediately conclude that this is a case of forged signature.

Such a scheme of generating digital signatures is called a signature scheme with message recovery. It is obvious that this is the same as our encrypt–decrypt scheme with the sequence of encryption and decryption steps reversed. If the message M to be signed is quite long, using this algorithm calls for a large execution time both for signature generation and for verification. It is, therefore, customary to use another variant of signature schemes called signature schemes with appendix that we describe now.

Instead of applying the decryption transform directly on M, Alice first computes a short representative H(M) of her message M. Her signature now becomes the pair S = (M, σ), where σ = fd(H(M), Kd). Typically, a hash function (see Section 1.2.6) is used to compute the representative H(M) from M and is assumed to be public knowledge. Now anybody can verify the signature by checking if the equality H(M) = fe(σ, Ke) holds. If a key different from Kd is used to generate the signature, one would (in general) get a value σ′ ≠ σ and the signature forgery will be detected by observing that H(M) ≠ fe(σ′, Ke).
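A sketch of a signature scheme with appendix, using a toy RSA key pair for (fe, fd) and SHA-256 reduced modulo n as the hash H; all parameter choices here are illustrative, not a real signature standard:

```python
import hashlib

# Alice's toy RSA pair (small Mersenne primes; real keys are far larger).
p, q = 2**127 - 1, 2**89 - 1
n, phi = p * q, (p - 1) * (q - 1)
K_e = 65537
K_d = pow(K_e, -1, phi)            # needs Python 3.8+

def H(msg: bytes) -> int:
    """Hash M to an integer below n (SHA-256 is an illustrative choice)."""
    return int.from_bytes(hashlib.sha256(msg).digest(), "big") % n

M = b"pay Bob 100 rupees"
sigma = pow(H(M), K_d, n)          # sign: sigma = f_d(H(M), K_d)
assert H(M) == pow(sigma, K_e, n)  # verify: H(M) = f_e(sigma, K_e)

# A signature made with a wrong private key fails verification.
bad = pow(H(M), K_d + 2, n)
assert H(M) != pow(bad, K_e, n)
```

Note that only the short digest H(M), never the long message M, goes through the slow private-key operation.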

1.2.4. Entity Authentication

By entity authentication, we mean a process in which one entity called the claimant proves its identity to another entity called the verifier. Entity-authentication techniques thus tend to prevent impersonation of an entity by an intruder. Both secret-key and public-key techniques are used for entity-authentication schemes.

The simplest example of an entity-authentication scheme is the use of passwords, as when a user (the claimant) tries to gain access to some resources in a computer (the verifier) by proving its identity using a password. Password schemes are mostly based on secret-key techniques. For example, the UNIX password system is based on encrypting the zero message (a string of 64 zero bits) using a repeated application of a variant of the DES algorithm with 64 bits of the user input (the password) as the key. Password-based authentication schemes are fixed and time-invariant and are often called weak authentication schemes.

We see applications of public-key techniques in challenge–response authentication schemes (also called strong authentication schemes). Assume that an entity, Alice, wants to prove her identity to another entity, Bob. Alice generates a key pair (Ke, Kd), makes Ke public and keeps Kd secret. Now, Bob chooses a random message M, encrypts M using Alice’s public key—that is, computes C = fe(M, Ke)—and sends C to Alice. Alice, upon reception of C, decrypts it using her private key Kd; that is, she regenerates M = fd(C, Kd) and sends M to Bob. Bob compares this value of M with the one he generated, and if a match occurs, Bob becomes sure that the entity who is claiming to be Alice possesses the knowledge of Alice’s private key. If Carol uses any private key other than Kd for the decryption, she gets a message M′ different from M and thereby cannot prove to Bob her identity as Alice. This is how this scheme prevents impersonation of Alice by Carol.
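The challenge–response exchange can be sketched with a toy RSA setup (illustrative parameters; a deployed protocol would additionally pad and format the challenge):

```python
import secrets

# Alice's toy key pair; Bob knows only (K_e, n).
p, q = 2**127 - 1, 2**89 - 1
n = p * q
K_e = 65537
K_d = pow(K_e, -1, (p - 1) * (q - 1))   # needs Python 3.8+

# Bob: random challenge M, sent encrypted under Alice's public key.
M = secrets.randbelow(n - 2) + 2
C = pow(M, K_e, n)

# Alice: only the holder of K_d can recover M from C.
response = pow(C, K_d, n)
assert response == M          # Bob accepts: the claimant knows K_d
```

An impostor without K_d cannot recover M from C, so her response fails Bob's comparison.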

Entity authentication is often carried out using another interesting technique called zero-knowledge proof. In such a protocol, the verifier (or any third party listening to the conversation) gains no knowledge regarding the secret possessed by the claimant, but develops the desired confidence regarding the claimant’s claim of possession of the secret. We provide here an informal example explaining zero-knowledge proofs.

Let us think of a circular cave as shown in Figure 1.1. The cave has two exits, left and right, denoted by L and R respectively. The cave also has a door inside it, which is invisible outside the cave. Alice (A) wants to prove to Bob (B) that she possesses a key to this door without showing him the key or the process of unlocking the door with the key. Bob stations himself somewhere outside the exits of the cave. Alice enters the cave and randomly chooses the left or right wing of the cave (and goes there). She does not disclose this choice to Bob, because Bob must not learn the session secrets either. Once Alice is placed in the cave, Bob makes a random choice from L and R and asks Alice (using cell phones or by shouting loudly) to come out of the cave via that chosen exit. Suppose Bob challenges Alice to use L. If Alice is in the left wing, she can come out of the cave using L. If Alice is in the right wing, she must use her secret key to open the central door to come to the left wing and then go out using exit L. If Alice does not possess the secret key, she can succeed in obeying Bob’s directive only with probability half. If this procedure is repeated t times, then the probability that Alice succeeds on all occasions without possessing the secret key is (1/2)^t = 1/2^t. By choosing t appropriately, Bob can make the probability of accepting a false claim arbitrarily small. For example, if t = 20, then the chance is less than one in a million that Alice can establish a false claim.

Figure 1.1. Zero-knowledge proofs


Thus, if Alice succeeds every time, Bob gains the desired confidence that Alice actually possesses the secret. However, during this entire process, Bob can obtain no information regarding Alice’s secrets (the key and the choices of wings). Another important aspect of this interaction is that Alice has no way of predicting Bob’s questions, preventing impostors (of Alice) from fooling Bob.
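The probability argument above can be checked empirically: a simulated Alice without the key passes a round only when Bob's random challenge happens to match her random choice of wing. The function name cheat_passes is ours.

```python
import random

def cheat_passes(t: int) -> bool:
    """A keyless Alice passes round i only if Bob's random challenge
    happens to name the wing she happened to enter."""
    return all(random.choice("LR") == random.choice("LR") for _ in range(t))

# Empirical check for a small t: the success rate is about (1/2)^t.
t, trials = 3, 100_000
rate = sum(cheat_passes(t) for _ in range(trials)) / trials
print(rate)            # close to (1/2)^3 = 0.125

# For t = 20 the cheating probability is already below one in a million.
assert 0.5**20 < 1e-6
```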

1.2.5. Secret Sharing

Suppose that a secret piece of information is to be distributed among n entities in such a way that n – 1 (or fewer) entities are unable to reconstruct the secret. All of the n entities must participate to reveal the secret. As usual, let us assume that the secret is an l-bit string. A simple strategy would be to break the string into n parts and provide each entity with a part. This method is, however, not really attractive, because each part gives partial information about the secret. Thus, for example, if a 256-bit string is to be distributed equally among 16 entities, any 15 of them working together can reconstruct the secret by trying only 2^16 = 65536 possibilities for the unknown 16 bits.

We now describe an alternative strategy that does not suffer from this drawback. Once again, we break the secret string into n parts and consider the parts as integers a_0, . . . , a_{n-1}. We construct the polynomial f(x) = x^n + a_{n-1}x^{n-1} + · · · + a_1x + a_0 and give the integers f(1), f(2), . . . , f(n) to the entities. When all of the entities cooperate, the linear system of equations f(i) = i^n + a_{n-1}i^{n-1} + · · · + a_1i + a_0, 1 ≤ i ≤ n, can be solved to find out the unknown coefficients a_0, . . . , a_{n-1} which, in turn, reveal the secret. On the other hand, if n – 1 or fewer entities cooperate, they get an underdetermined system of equations in n unknowns, from which the actual solution is not readily available.
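The scheme above can be sketched directly: build f, hand out the shares f(1), . . . , f(n), and recover the coefficients by solving the linear system, here by exact Gaussian elimination over the rationals. The concrete parts [17, 203, 9] are an arbitrary illustrative secret.

```python
from fractions import Fraction

# The secret, split into n integer parts a_0, ..., a_{n-1}.
a = [17, 203, 9]                  # n = 3 parts (illustrative values)
n = len(a)

def f(x: int) -> int:
    """f(x) = x^n + a_{n-1} x^{n-1} + ... + a_1 x + a_0."""
    return x**n + sum(a[j] * x**j for j in range(n))

shares = [f(i) for i in range(1, n + 1)]    # entity i receives f(i)

# Recovery: all n entities solve f(i) - i^n = sum_j a_j i^j for the a_j
# by Gauss-Jordan elimination over the rationals (the matrix is Vandermonde,
# hence invertible).
A = [[Fraction(i**j) for j in range(n)] for i in range(1, n + 1)]
b = [Fraction(shares[i - 1] - i**n) for i in range(1, n + 1)]
for col in range(n):
    piv = next(r for r in range(col, n) if A[r][col] != 0)
    A[col], A[piv], b[col], b[piv] = A[piv], A[col], b[piv], b[col]
    for r in range(n):
        if r != col and A[r][col] != 0:
            factor = A[r][col] / A[col][col]
            A[r] = [x - factor * y for x, y in zip(A[r], A[col])]
            b[r] = b[r] - factor * b[col]

recovered = [int(b[i] / A[i][i]) for i in range(n)]
assert recovered == a             # all n shares together reveal the secret
```

With only n – 1 shares the system has one equation too few, so every value of the missing coefficient remains consistent with the shares the coalition holds.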

The secret-sharing problem can be generalized in the following way: distribute a secret among n parties in such a way that any m or more of the parties can reconstruct the secret (for some m ≤ n), whereas any m – 1 or fewer parties cannot do the same. A polynomial of degree m as in the above example readily adapts to this generalized situation.

1.2.6. Hashing

A function which converts bit strings of arbitrary lengths to bit strings of a fixed (finite) length is called a hash function. Hash functions play a crucial role in cryptography. We have already seen an application of one in designing a digital signature scheme with appendix. If H is a hash function, a pair of input values (strings) x1 and x2 for which H(x1) = H(x2) is called a collision for H. For any hash function H, collisions must exist, since H is a map from an infinite set to a finite set. However, for cryptographic purposes we require that collisions be difficult to find. More specifically, a cryptographic hash function H should satisfy the following desirable properties:

First pre-image resistance

Except for a small set of hash values y, it should be difficult to find an input x with H(x) = y. We exclude a small set of values, because an adversary might prepare (and maintain) a list of pairs (x, H(x)) for certain values of x of her choice. If the given value of y is the second coordinate of a pair in her list, she can produce the corresponding input value x easily.

Second pre-image resistance

Given a pair (x, H(x)), it should be difficult to find an input x′ different from x with H(x) = H(x′).

Collision resistance

It should be difficult to find two different input strings x, x′ with H(x) = H(x′).

The output of a hash function is also called a message digest, and hash functions can be used with a secret key. Popular examples of unkeyed hash functions are SHA-1, MD5 and MD2, whereas keyed hash functions include HMAC and CBC-MAC.
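Python's standard library provides both unkeyed and keyed hash functions, which illustrate the definitions above. (Collisions have since been found for MD5 and SHA-1, so SHA-256 is used for the keyed example here.)

```python
import hashlib, hmac

msg = b"the quick brown fox"

# An unkeyed hash: a fixed-length digest of an arbitrary-length input.
print(hashlib.sha1(msg).hexdigest())        # 40 hex chars = 160 bits

# A one-bit change in the input scrambles the digest completely.
assert hashlib.sha1(msg).digest() != hashlib.sha1(b"The quick brown fox").digest()

# A keyed hash (HMAC): only holders of the key can produce or verify the tag.
tag = hmac.new(b"secret key", msg, hashlib.sha256).digest()
assert hmac.compare_digest(tag, hmac.new(b"secret key", msg, hashlib.sha256).digest())
```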

1.2.7. Certification

So far we have seen several protocols which are based on the use of public keys of remote entities, but have never questioned the authenticity of public keys. In other words, it is necessary to ascertain that a public key is really owned by a remote entity. Public-key certificates are used to that effect. These are data structures that bind public-key values to entities. This binding is achieved by having a trusted certification authority digitally sign each certificate.

Typically a certificate is issued for a period of validity. However, it is possible that a certificate becomes invalid before its date of expiry for several reasons, like possible or suspected compromise of the private key. Under such circumstances it is necessary that the certification authority revokes the certificate and maintains a list called certificate revocation list (CRL) of revoked certificates. When Alice verifies the authenticity of Bob’s public-key certificate by verifying the digital signature of the authority and does not find the certificate in the CRL, she gains the desired confidence in using Bob’s public key.

The X.509 public-key infrastructure specifies Internet standards for certificates and CRLs.

1.3. Public-key Cryptography

In this section, we give a short introduction to the realization of public-key cryptosystems. More specifically, we list some of the computationally intensive mathematical problems and describe how the (apparent) intractability of these problems can be used for designing key pairs. We use some mathematical terms that we will introduce later in this book.

1.3.1. The Mathematical Problems

The security of the public-key cryptosystems is based on the presumed difficulty of solving certain mathematical problems.

The integer factorization problem (IFP)

Given the product n = pq of two distinct prime integers p and q, find p and q.

The discrete logarithm problem (DLP)

Let G be a finite cyclic (multiplicatively written) group with cardinality n and a generator g. Given an element a ∈ G, find an integer x (or the integer x with 0 ≤ x ≤ n – 1) such that a = g^x in G. Three different types of groups are commonly used for cryptographic applications: the multiplicative group of a finite field, the group of rational points on an elliptic curve over a finite field and the Jacobian of a hyperelliptic curve over a finite field. By an abuse of notation, we often denote the DLP over finite fields as simply DLP, whereas the DLP in elliptic curves and hyperelliptic curves is referred to as the elliptic curve discrete logarithm problem (ECDLP) and the hyperelliptic curve discrete logarithm problem (HECDLP) respectively.

The Diffie–Hellman problem (DHP)

Let G and g be as above. Given the elements g^a and g^b of G (but not the exponents a and b), compute the element g^{ab}. As in the case of the DLP, the DHP can be posed in the multiplicative group of a finite field, the group of rational points on an elliptic curve and the Jacobian of a hyperelliptic curve.

We show in the next section how (the intractability of) these problems can be exploited to create key pairs for various cryptosystems. These computational problems are termed difficult, intractable, infeasible or intensive in the sense that there are no known algorithms to solve these problems in time polynomially bounded by the input size. The best-known algorithms are subexponential or even fully exponential in some cases. This means that if the input size is chosen to be sufficiently large, then it is infeasible to compute the private key from a knowledge of the public key in a reasonable amount of time. This, in turn, implies (not provably, but as the current state of the art stands) that encryption or signature verification can be done rather quickly (in polynomial time), but the converse process of decryption or signature generation cannot be done in feasible time, unless one knows the private key. As a result, encryption (or signature verification) is called a trapdoor one-way function, that is, a function which is easy to compute but for which the inverse is computationally infeasible, unless some additional information (the trapdoor) is available.

It is, however, not known that these problems are really computationally infeasible, that is, there is no proof of the fact that these problems cannot be solved in polynomial time. As a result, the public-key cryptographic systems based on these problems are not provably secure.

1.3.2. Realization of Key Pairs

In RSA and similar cryptosystems, one generates two (distinct) suitably large primes p and q and computes the product n = pq. Then φ(n) = (p – 1)(q – 1), where φ denotes Euler’s totient function. One then chooses a random integer e with gcd(e, φ(n)) = 1. There exists an integer d such that ed ≡ 1 (mod φ(n)). The integer e is used as the public key, whereas the integer d is used as the private key.

If the IFP can be solved fast, one can also compute φ(n) easily, and subsequently d can be computed from e using the (polynomial-time) extended GCD algorithm. This is why[2] we say that the RSA cryptosystem derives its security from the intractability of the IFP.

[2] The problem of factoring n = pq is polynomial-time equivalent to computing φ(n) = (p – 1)(q – 1).

In order to see how RSA encryption and decryption work, let the plaintext message be encoded as an integer m with 2 ≤ m < n. The ciphertext message is generated (as an integer) as c = m^e (mod n). Decryption is analogous, that is, m = c^d (mod n). The correctness of the algorithm follows from the fact that ed ≡ 1 (mod φ(n)). It is, however, not proved that one has to know d or φ(n) or the factorization of n in order to decrypt an RSA-encrypted message. But at present no better methods are known.
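The whole construction can be traced with classic toy parameters p = 61, q = 53; real moduli are of course vastly larger, and real encryption pads m before exponentiation.

```python
p, q = 61, 53                     # toy primes; real moduli are 1024+ bits
n = p * q                         # 3233
phi = (p - 1) * (q - 1)           # 3120
e = 17                            # public key: gcd(17, 3120) = 1
d = pow(e, -1, phi)               # private key: 2753, since 17*2753 ≡ 1 (mod 3120)

m = 65                            # plaintext, 2 <= m < n
c = pow(m, e, n)                  # encryption: c = m^e mod n = 2790
assert pow(c, d, n) == m          # decryption: c^d mod n recovers m
```

Anyone who factors n = 3233 into 61 · 53 can recompute φ(n) and hence d, which is exactly the dependence on the IFP described above.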

Let us now consider the discrete logarithm problem. Let G be a finite cyclic multiplicative group (like those mentioned above) in which it is easy to multiply two elements, but difficult to compute discrete logarithms. Let g be a generator of G. In order to set up a random key pair over such a group, one chooses the private key as a random integer d, 2 ≤ d < n, where n is the cardinality of G. The public key e is then computed as the element e = g^d of G.

Applications of encryption–decryption schemes based on the key pair (g^d, d) are given in Chapter 5. For now, we only remark that many such schemes (like the ElGamal scheme) derive their security from the DHP instead of the DLP, whereas other schemes (like the Nyberg–Rueppel scheme) do so from the DLP. It is assumed that these two problems are computationally equivalent (at least for the groups of our interest). Obviously, if one assumes availability of a solution of the DLP, one has a solution for the DHP too (recover b from g^b and compute g^{ab} = (g^a)^b). The reverse implication is not clear.

1.3.3. Public-key Cryptanalysis

As we pointed out earlier, (most of) the public-key cryptosystems are not provably secure in the sense that they are based on the apparent difficulty of solving certain computational problems. It is expedient to know how difficult these problems are. No non-trivial complexity-theoretic statements are available for these problems, and as such it is worthwhile to study the algorithms known to date for solving them. Unfortunately, however, many of the algorithms of this kind are much more complicated than the algorithms for building the corresponding cryptographic systems. One needs to acquire more mathematical machinery in order to understand (and augment) these cryptanalytic algorithms. We devote Chapter 4 to a detailed discussion of these algorithms.

In specific situations, one need not always use these computationally intensive algorithms. Access to a party’s decryption equipment may allow an adversary to gain partial or complete information about the private key by watching a decryption process. For example, an adversary (say, the superuser) might have the capability to read the contents of the memory holding a private key during some decryption process. For another possibility, think of RSA decryption which involves a modular exponentiation. If the standard square-and-multiply algorithm (Algorithm 3.9) is used for this purpose and the adversary can tap some hardware details (like machine cycles or power fluctuations) during a decryption process, she can guess a significant number of the bits in the private key. Such attacks, often called side-channel attacks, are particularly relevant for cryptographic applications based on smart cards.

A cryptographic system is (believed to be) strong if and only if there are no good known mechanisms to break it. It is, therefore, for the sake of security that we must study cryptanalysis. Cryptography and cryptanalysis are deeply intertwined and a complete study of one must involve the other.

1.4. Some Cryptographic Terms

In cryptology, there are different models of attacks or attackers.

1.4.1. Models of Attacks

So far we have assumed that an adversary can only read messages during transmission over a channel. Such an adversary is called a passive adversary. An active adversary, on the other hand, can mutilate or delete messages during transmission and/or generate false messages. An attack mounted by an active (resp.[3] a passive) adversary is called an active (resp. a passive) attack. In this book, we will mostly concentrate on passive attacks.

[3] Throughout the book, resp. stands for respectively.

1.4.2. Models of Passive Attacks

A two-party communication involves transmission of ciphertext messages over a communication channel. A passive attacker can read these ciphertext messages. In practice, however, an attacker might have more control over the choice of ciphertext and/or plaintext messages. Based on these capabilities of the attacker we have the following types of attacks.

Ciphertext-only attack

This is the weakest model of the adversary. Here the attacker has no control over the ciphertext messages that flow in the channel, nor over the corresponding plaintext messages. Using only these ciphertext messages, the attacker has to obtain a private key and/or a plaintext message corresponding to a new ciphertext message.

Known-pair attack

In this kind of attack (also called a known-plaintext or known-ciphertext attack), the attacker uses her knowledge of some plaintext–ciphertext pairs. If many such pairs are available to the attacker, she can use them to deduce a pattern, based on which she can subsequently gain some information on a new plaintext for which the ciphertext is available. In a public-key scheme, the adversary can generate as many such pairs as she wants, because generating such a pair requires only a knowledge of the receiver’s public key. Thus a public-key encryption scheme must provide sufficient security against known-plaintext attacks.

Chosen-plaintext attack

In this kind of attack, the attacker knows some plaintext–ciphertext pairs in which the plaintexts are chosen by the attacker. As discussed earlier, such an attack is easily mountable for a public-key encryption scheme.

Adaptive chosen-plaintext attack

This is similar to the chosen-plaintext attack with the additional possibility that the attacker chooses the plaintexts in the known plaintext–ciphertext pairs sequentially and adaptively based on the knowledge of the previous pairs. This kind of attack can be easily mounted on public-key encryption systems.

Chosen-ciphertext attack

The attacker has knowledge of some plaintext–ciphertext pairs in which the ciphertexts are chosen by the attacker. Such an attack is not directly mountable on a public-key scheme, since obtaining a plaintext from a chosen ciphertext requires knowledge of the private key. However, if the attacker has access to the receiver’s decryption equipment, the machine can divulge the plaintexts corresponding to the ciphertexts that the attacker supplies to the machine. In this context, we assume that the machine does not reveal the private key itself, that is, it has the key stored secretly somewhere in its hardware which the attacker cannot directly access. However, the attacker can run the machine to know the plaintexts corresponding to the ciphertexts of her choice. Later (when the attacker no longer has access to the decryption equipment) the known pairs may be exploited to obtain information about the plaintext corresponding to a new ciphertext.

Adaptive chosen-ciphertext attack

This is similar to the chosen-ciphertext attack with the additional possibility that the attacker chooses the ciphertexts in the known pairs sequentially and adaptively based on her knowledge of the previously generated plaintext–ciphertext pairs. This attack is mountable in a scenario described in connection with chosen-ciphertext attacks.

For a digital signature scheme, there are equivalent names for these types of attacks. The attacker is assumed to have access to the public key of the signer, because this key is used for signature verification. An attempt to forge signatures based only on the knowledge of this verification key is called a key-only attack. The adversary may additionally possess knowledge of some message–signature pairs. An attack based on this knowledge is called a known-pair or known-message or known-signature attack. If the messages are chosen by the adversary, we call the attack a chosen-message attack. If the adversary generates the sequence of messages in a chosen-message attack adaptively (based on the previously generated message–signature pairs), we have an adaptive chosen-message attack. An (adaptive or non-adaptive) chosen-message attack can be mounted, if the attacker gains access to the signer’s signature generation equipment, or if the signer is willing to sign arbitrary messages provided by the adversary.

The attacker can choose some signatures and generate the corresponding messages by encrypting them with the signer’s public key. The private-key operation on these messages generates the signatures chosen by the attacker. This gives chosen-signature and adaptive chosen-signature attacks on a digital signature scheme. Now the adversary cannot directly control the messages to sign. On the other hand, such an attack is easily mountable, because it utilizes only some public knowledge (the signer’s public key). Indeed, one may treat chosen-signature attacks as variants of key-only attacks.

1.4.3. Public Versus Private Algorithms

So far, we have assumed that all the parties connected to a network know the algorithms used in a cryptographic scheme. The security of the scheme is based on the difficulty of obtaining some secret information (the secret or private key).

It, however, remains possible that two parties communicate using an algorithm unknown to other entities. Top-secret communications (for example, during wars or diplomatic transactions) often use private cryptographic algorithms. In this book, we will not deal with such techniques. Our attention is focused mostly on Internet applications in which public knowledge of the algorithms is of paramount importance (for the sake of universal applicability and convenience).

In short, this book is going to deal with a world in which only public public-key algorithms are deployed and in which adversaries are usually passive. A restricted model of the world though it may be, it is general and useful enough to concentrate on. Let us begin our journey!

Chapter Summary

This chapter provides an overview of the problems that cryptology deals with. The first and oldest cryptographic primitive is encryption for secure transmission of messages. Some other primitives are key exchange, digital signature, authentication, secret sharing, hashing, and digital certificates. We then highlight the difference between symmetric (secret-key) and asymmetric (public-key) cryptography. The relevance of some computationally intractable mathematical problems in public-key cryptography is discussed next, and the working of a prototype public-key cryptosystem (RSA) is explained. We finally discuss different models of attacks on cryptosystems.

Not uncommonly, some people think that cryptology also deals with intrusion, viruses and Trojan horses. We emphasize that this is not the case. Data and network security is the branch that deals with these topics. Cryptography is a part of this branch, but not the other way round. Imagine that your house is to be secured against theft. First, you need a good lock—that is cryptography. However, a lock can do nothing to prevent a thief from entering the house after breaking the window panes. A bad butler who leaks secrets of the house to the outside world also does not come under the jurisdiction of the lock. Securing your house requires adopting sufficient guards against all these possibilities of theft. In this book, we study only the technology of manufacturing and breaking locks.

2. Mathematical Concepts

2.1 Introduction
2.2 Sets, Relations and Functions
2.3 Groups
2.4 Rings
2.5 Integers
2.6 Polynomials
2.7 Vector Spaces and Modules
2.8 Fields
2.9 Finite Fields
2.10 Affine and Projective Curves
2.11 Elliptic Curves
2.12 Hyperelliptic Curves
2.13 Number Fields
2.14 p-adic Numbers
2.15 Statistical Methods
 Chapter Summary
 Suggestions for Further Reading

Young man, in mathematics you don’t understand things, you just get used to them.

—John von Neumann

Mathematics contains much that will neither hurt one if one does not know it nor help one if one does know it.

—J. B. Mencken

Mathematics is the Queen of Science but she isn’t very pure; she keeps having babies by handsome young upstarts and various frog princes.

—Donald Kingsbury

2.1. Introduction

In this chapter, we introduce the basic mathematical concepts that one should know in order to understand the public-key cryptographic protocols and the corresponding cryptanalytic algorithms described in the later chapters. If the reader is already familiar with these concepts, she may quickly browse through the chapter in order to know about our notations and conventions.

This chapter is meant for cryptology students and as such does not describe the mathematical topics in their full generality. It is our intention only to state (and, if possible, prove) the relevant results that would be useful for the rest of the book. For further study, we urge the reader to consult the books suggested at the end of this chapter.

2.2. Sets, Relations and Functions

Sets are absolutely basic entities used throughout the present-day study of mathematics. Unfortunately, however, we cannot define sets. Loosely speaking, a set is an (unordered) collection of objects. But we run into difficulty with this definition for collections that are too big. Of course, infinite sets like the set of all integers or real numbers are not too big. However, a collection of all sets is too big to be called a set. (Also see Exercise 2.6.) It is, therefore, customary to have an axiomatic definition of sets. That is to say, a collection qualifies to be a set if it satisfies certain axioms. We do not go into the details of this axiomatic definition, but tell the axioms as properties of sets. Luckily enough, we won’t have a chance in the rest of this book to deal with collections that are not sets. So the reader can, for the time being, have faith in the above (wrong) identification of a set as a collection.

An object in a set A is commonly called an element of A. By the notation a ∈ A, we mean that a is an element of the set A. Often a set A can be represented explicitly by writing down its elements within curly brackets or braces. For example, A = {2, 3, 5, 7} denotes the set consisting of the elements 2, 3, 5, 7, which are incidentally all the (positive) prime numbers less than 10. We often use the ellipsis sign (. . .) to denote an infinite (or even a finite) set. For example, ℙ = {2, 3, 5, 7, 11, . . .} would denote the set of all (positive) prime numbers. (We prove later that ℙ is an infinite set.) Alternatively, we often describe a set by mentioning the properties of its elements. For example, the set A above can also be described as A = {p ∈ ℙ | p < 10}.

Some frequently occurring sets are denoted by special symbols. We list a few of them here.

ℕ   The set of all natural numbers, that is, {1, 2, 3, . . .}
ℕ0   The set of all non-negative integers, that is, {0, 1, 2, . . .}
ℤ   The set of all integers, that is, {. . . , –2, –1, 0, 1, 2, . . .}
ℙ   The set of all (positive) prime numbers, that is, {2, 3, 5, 7, . . .}
ℚ   The set of all rational numbers, that is, {a/b | a, b ∈ ℤ, b ≠ 0}
ℚ*   The set of all non-zero rational numbers
ℝ   The set of all real numbers
ℝ*   The set of all non-zero real numbers
ℂ   The set of all complex numbers
ℂ*   The set of all non-zero complex numbers
∅   The empty set

The cardinality of a set A is the number of elements in A. We use the symbol #A to denote the cardinality of A. If #A is finite, we call A a finite set. Otherwise A is said to be infinite. The empty set has cardinality zero.

2.2.1. Set Operations

Let A and B be two sets. We say that A is a subset of B, denoted A ⊆ B, if all elements of A are in B. Two sets A and B are equal (that is, A = B) if and only if A ⊆ B and B ⊆ A. A is said to be a proper subset of B (denoted A ⊊ B), if A ⊆ B and A ≠ B (that is, B ⊈ A).

The union of A and B is the set whose elements are either in A or in B (or both). This set is denoted by A ∪ B. The intersection of A and B is the set consisting of the elements common to A and B and is denoted by A ∩ B. If A ∩ B = ∅, then we say that A and B are disjoint. In that case, the union A ∪ B is also called a disjoint union and is denoted by A ⊔ B. (For a generalization, see Exercise 2.7.) The difference of A and B, denoted A \ B, is the set whose elements are in A but not in B. If A is understood from the context and B ⊆ A, then we denote A \ B by B̄ and refer to B̄ as the complement of B (in A). The product A × B of two sets A and B is the set of all ordered pairs (a, b) with a ∈ A and b ∈ B.
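As a minimal illustration, Python's built-in set type supports all of these operations directly; the following sketch uses the example set A = {2, 3, 5, 7} from above:

```python
from itertools import product

A = {2, 3, 5, 7}
B = {1, 2, 3, 4}

union = A | B               # elements in A or B (or both)
intersection = A & B        # elements common to A and B
difference = A - B          # elements in A but not in B
pairs = set(product(A, B))  # the product A x B: all ordered pairs (a, b)
```

Since #A = #B = 4, the product A × B contains 4 · 4 = 16 ordered pairs.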

The notions of union, intersection and product of sets can be readily extended to an arbitrary family of sets. Let Ai, i ∈ I, be a family of sets indexed by I. In this case, we denote the union and intersection of Ai, i ∈ I, by ∪i∈I Ai and ∩i∈I Ai respectively. The product of Ai, i ∈ I, is denoted by ∏i∈I Ai. When Ai = A for all i ∈ I, we denote the product also as AI. If, in addition, I is a finite set of cardinality n, then the product AI is also written as An.

2.2.2. Relations

A relation ρ on a set A is a subset of A × A. For (a, b) ∈ ρ, we usually write a ρ b, implying that a is related by ρ to b. Common examples are the standard relations =, ≠, ≤, <, ≥, > on ℤ (or ℚ or ℝ).

A relation ρ on a set A is called reflexive, if a ρ a for all a ∈ A. For example, =, ≤ and ≥ are reflexive relations on ℤ, but the relations ≠, <, > are not.

A relation ρ on A is called symmetric, if a ρ b implies b ρ a. On the other hand, ρ is called anti-symmetric if a ρ b and b ρ a imply a = b. For example, = is symmetric and anti-symmetric; <, ≤, > and ≥ are anti-symmetric but not symmetric; ≠ is symmetric but not anti-symmetric.

A relation ρ on A is called transitive if a ρ b and b ρ c imply a ρ c. For example, =, <, ≤, >, ≥ are all transitive, but ≠ is not transitive.

An equivalence relation is one which is reflexive, symmetric and transitive. For example, = is an equivalence relation on ℤ, but none of the other relations mentioned above (≠, <, ≥ and so on) is an equivalence relation on ℤ.

A partition of a set A is a collection of pairwise disjoint subsets Ai, i ∈ I, of A whose union is A, that is, A = ∪i∈I Ai and Ai ∩ Aj = ∅ for all i, j ∈ I, i ≠ j. The following theorem establishes an important connection between equivalence relations and partitions.

Theorem 2.1.

An equivalence relation on a set A produces a partition of A. Conversely, every partition of a set A corresponds to an equivalence relation on A.

Proof

Let ρ be an equivalence relation on a set A. For a ∈ A, let us denote [a] = {b ∈ A | a ρ b}. Clearly, [a] ≠ ∅, since a ∈ [a] (by reflexivity). Now we show that for a, b ∈ A, either [a] = [b] or [a] ∩ [b] = ∅. Assume that [a] ∩ [b] ≠ ∅. Choose c ∈ [a]. By construction, a ρ c. Now choose d ∈ [a] ∩ [b]. Then a ρ d and b ρ d. By symmetry, d ρ b, so that by transitivity a ρ b, that is, b ρ a. But a ρ c. Hence, once again by transitivity, b ρ c, that is, c ∈ [b]. Thus [a] ⊆ [b]. Similarly [b] ⊆ [a].

Conversely, let Ai, i ∈ I, be a partition of A. Define a relation ρ on A such that a ρ b if and only if a and b are in the same subset Ai for some i ∈ I. It is easy to see that ρ is an equivalence relation on A.

The subset [a] of A defined in the proof of the above theorem is called the equivalence class of a with respect to the equivalence relation ρ.
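The proof of Theorem 2.1 is constructive: scanning the elements of A and placing each one into the class of some element it is already related to recovers the partition. A minimal Python sketch of this procedure (the helper name equivalence_classes is ours), applied to congruence modulo 3 on {0, . . . , 9}:

```python
def equivalence_classes(A, related):
    # Place each element into the class of a related representative;
    # by Theorem 2.1 the resulting classes partition A.
    classes = []
    for a in A:
        for cls in classes:
            if related(a, next(iter(cls))):
                cls.add(a)
                break
        else:
            classes.append({a})
    return classes

# Congruence modulo 3 is an equivalence relation on {0, ..., 9}.
classes = equivalence_classes(range(10), lambda a, b: (a - b) % 3 == 0)
```

Because ρ is an equivalence relation, any member of a class serves equally well as its representative, so the single comparison per class suffices.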

An anti-symmetric and transitive relation is called a partial order (or simply an order). All of the relations =, ≤, <, ≥, > are partial orders on ℤ (but ≠ is not). A partial order ρ on A is called a total order or a linear order or a simple order, if for every a, b ∈ A, a ≠ b, either a ρ b or b ρ a. For example, if we take A = {1, 2, 3} and the relation ρ = {(1, 2), (1, 3)}, then ρ is a partial order but not a total order (because it does not specify a relation between 2 and 3). On the other hand, ρ′ = {(1, 2), (1, 3), (2, 3)} is a total order. A set with a partial (resp. total) order is often called a partially ordered (resp. totally ordered or linearly ordered or simply ordered) set.

2.2.3. Functions

Let A and B be two sets (not necessarily distinct). A function or a map f from A to B, denoted f : A → B, assigns to each a ∈ A some element b ∈ B. In this case, we write b = f(a) or f : a ↦ b and say that b is the image of a (under f). For example, if A = B = ℝ, then the assignment a ↦ a² is a function. On the other hand, the assignment a ↦ √a (the non-negative square root) is not a function from ℝ to ℝ, because it is not defined for negative values of a. However, if A = ℝ and B = ℂ, then the assignment a ↦ √a (with non-negative real and imaginary parts) is a function.

The function f : A → A assigning a ↦ a for all a ∈ A is called the identity map on A and is usually denoted by idA. On the other hand, if f : A → B maps all the elements of A to a fixed element of B, then f is said to be a constant function. A function which is not constant is called a non-constant function.

A function f : A → B that maps different elements of A to different elements of B is called injective or one-one. In other words, f is injective if and only if f(a) = f(a′) implies a = a′. The function f : ℝ → ℝ given by a ↦ a² is not injective, since f(–a) = f(a) for all a ∈ ℝ. On the other hand, the function ℝ → ℝ given by a ↦ 2a is injective. An injective map f : A → B is sometimes denoted by the special symbol f : A ↪ B.

The image of a function f : A → B is defined to be the subset {f(a) | a ∈ A} of B. It is denoted by f(A) or by Im f. The function f is said to be surjective or onto or a surjection, if Im f = B, that is, every element b of B has at least one preimage a ∈ A (which means f(a) = b). As an example, the function ℤ → ℤ given by a ↦ a/2 (if a is even) and by a ↦ (a – 1)/2 (if a is odd) is surjective, whereas the function ℤ → ℤ that maps a ↦ |a| (the absolute value) is not surjective. A surjective map f : A → B is sometimes denoted by the special symbol f : A ↠ B.

A map f : A → B is called bijective or a bijection, if it is both injective and surjective. For example, the identity map on a set is bijective. Another example of a bijective function is ℕ → ℙ that maps a to the a-th prime.

Let f : A → B and g : B → C be functions. The composition of f and g is the function from A to C that takes a ↦ g(f(a)). It is denoted by g ο f, that is, (g ο f)(a) = g(f(a)). Note that in the notation g ο f one applies f first and then g. The notion of composition of functions can be extended to more than two functions. In particular, if f : A → B, g : B → C and h : C → D are functions, then (h ο g) ο f and h ο (g ο f) are the same function from A to D, so that we can unambiguously write this as h ο g ο f.
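A small Python sketch (the helper compose is ours) makes the order of application in g ο f concrete:

```python
def compose(g, f):
    # (g o f)(a) = g(f(a)): apply f first, then g
    return lambda a: g(f(a))

f = lambda a: a + 1   # f : Z -> Z
g = lambda a: 2 * a   # g : Z -> Z

gof = compose(g, f)   # a -> 2(a + 1)
fog = compose(f, g)   # a -> 2a + 1
```

The two compositions differ as functions (for example at a = 3), while associativity guarantees that (h ο g) ο f and h ο (g ο f) always agree.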

2.2.4. The Axioms of Mathematics

The study of mathematics is based on certain axioms. We state four of these axioms. It is not possible to prove the axioms independently, but it can be shown that they are equivalent in the sense that each of them can be proved, if any of the others is assumed to be true.

Let A be a partially ordered set under the relation ≤. An element a ∈ A is called maximal (resp. minimal), if there is no element b ∈ A, b ≠ a, that satisfies a ≤ b (resp. b ≤ a). Let B be a non-empty subset of A. Then an upper bound (resp. a lower bound) for B is an element a ∈ A such that b ≤ a (resp. a ≤ b) for all b ∈ B. If an upper bound (resp. a lower bound) a of B is an element of B, then a is called a last element or a largest element or a maximum element (resp. a first element or a least element or a smallest element or a minimum element) of B. By antisymmetry, it follows that a first (resp. last) element of B, if existent, is unique. A chain of A is a totally ordered (under ≤) subset of A.

Consider the sets ℕ, ℤ and ℚ with the natural order ≤. None of these sets contains a maximal element. ℕ contains a minimal element, namely 1, but ℤ and ℚ do not contain minimal elements. The subset B ⊆ ℕ of even natural numbers has two lower bounds (in ℕ), namely 1 and 2, of which 2 is the first element of B.

A totally ordered set A is said to be well ordered (and the relation is called a well order), if every non-empty subset B of A contains a first element.

Axiom 2.1. Zermelo’s well-ordering principle

Every set A can be well ordered, that is, there is a relation which well orders A.

The set ℕ is well ordered under the natural relation ≤. The set ℤ can be well ordered by the relation ≼ defined as 0 ≼ 1 ≼ –1 ≼ 2 ≼ –2 ≼ · · ·. A well ordering of ℝ is not known.

Axiom 2.2. Zorn’s lemma

Let A be a partially ordered set. If every chain of A has an upper bound (in A), then A has at least one maximal element.

To illustrate Zorn’s lemma, consider any non-empty set A and define 𝒫(A) to be the set of all subsets of A. 𝒫(A) is called the power set of A and is partially ordered under containment ⊆. A chain of 𝒫(A) is a set of subsets Ai of A such that for any two members Ai and Aj, either Ai ⊆ Aj or Aj ⊆ Ai. Clearly, the union ∪i Ai is an upper bound of the chain. Then Zorn’s lemma guarantees that 𝒫(A) has at least one maximal element. In this case, the maximal element, namely A, is unique. If A is finite, then for the set of all proper subsets of A, a maximal element (under the partial order ⊆) exists by Zorn’s lemma, but is not unique, if #A > 1.

Axiom 2.3. Hausdorff’s maximal principle

Let ≤ be a partial order on a set A. Then there is a maximal chain B of A, that is, if C is any chain with B ⊆ C ⊆ A, then C = B.

Finally, let A be a set and 𝒫*(A) = 𝒫(A) \ {∅}, that is, 𝒫*(A) is the set of all non-empty subsets of A. A choice function of A is a function f : 𝒫*(A) → A such that for every B ∈ 𝒫*(A) we have f(B) ∈ B.

Axiom 2.4. Axiom of choice

Every set has a choice function.

Exercise Set 2.2

2.1
  1. Let G = (V, E) be an undirected graph. Define a relation ρ on the vertex set V of G by: u ρ v if and only if there is a path from u to v. Show that ρ is an equivalence relation on V. What are the equivalence classes for this relation?

  2. Let G = (V, E) be a directed acyclic graph. Define the relation ρ on V as in Part 1. Show that ρ is a partial order on V. When is ρ a total order?

2.2 Let f : A → B and g : B → A be functions. Show that if f ο g = idB, then g is injective and f is surjective. In particular, f (and also g) is bijective, if f ο g = idB and g ο f = idA. In this case, we call g the inverse of f and denote this as g = f–1. Show by examples that both the conditions f ο g = idB and g ο f = idA are necessary for f to be bijective.
2.3 Let f : A → B be a map from a finite set A to a finite set B. Prove that
  1. #A ≤ #B, if f is injective,

  2. #A ≥ #B, if f is surjective, and

  3. #A = #B, if f is bijective.

2.4 Let A be a finite set and let f : A → A be a map. Show that the following conditions are equivalent.
  1. f is injective.

  2. f is surjective.

  3. f is bijective.

Show by examples that this equivalence need not hold, if A is an infinite set.

2.5 Let A and B be two arbitrary sets, f : A → B a map, A′ ⊆ A and B′ ⊆ B. We define f(A′) = {f(a) | a ∈ A′} and f–1(B′) = {a ∈ A | f(a) ∈ B′}. Show that:
  1. If A′ ⊆ A″ ⊆ A, then f(A′) ⊆ f(A″).

  2. If B′ ⊆ B″ ⊆ B, then f–1(B′) ⊆ f–1(B″).

  3. f–1(f(A′)) ⊇ A′.

  4. f(f–1(B′)) ⊆ B′.

  5. f(f–1(f(A′))) = f(A′).

  6. f–1(f(f–1(B′))) = f–1(B′).

2.6

Russell’s paradox A collection C is called ordinary, if C is not a member of C. A collection which is not ordinary is called extraordinary. Show that the collection of all ordinary collections is neither ordinary nor extraordinary.

2.7 Let Ai, i ∈ I, be a family of sets (not necessarily pairwise disjoint). For each i ∈ I, consider the set Bi = Ai × {i}. Show that the sets Bi, i ∈ I, are pairwise disjoint. The union ∪i∈I Bi is called the disjoint union of Ai, i ∈ I.

2.3. Groups

So far, we have studied sets as unordered collections. Things start getting interesting, however, once we define one or more binary operations on sets. Such operations define structures on sets, and we compare different sets in the light of their respective structures. Groups are the first (and simplest) examples of sets with binary operations.

Definition 2.1.

A binary operation on a set A is a map from A × A to A. If ◊ is a binary operation on A, it is customary to write a ◊ a′ to denote the image of (a, a′) (under ◊).

For example, addition, subtraction and multiplication are all binary operations on ℤ (or ℚ or ℝ). Subtraction is not a binary operation on ℕ, since, for example, 2 – 3 is not an element of ℕ. Division is not a binary operation on ℚ, since division by zero is not defined. Division is a binary operation on ℚ*.

2.3.1. Definition and Basic Properties

Definition 2.2.

A group[1] (G, ◊) is a set G together with a binary operation ◊ on G satisfying the following three conditions:

[1] In binary operations and algebras generally there is a morass of terminology which reflects on the literacy of the promulgators. Starting for example with a poor choice, namely “group”, we now have “semigroup” (why?), “loop” (why?), “groupoid”, and “partial groupoid”. . . .Among other poor choices are “ring”, “field”, “ideal”, “category theory”, and “universal algebra”. “Ideal” was used by Dedekind in a sense which made sense to mathematicians of that day but it does not today. “Field” can best be labeled as ridiculous. As to categories of category theory, the concept of category is too broad for that reduction. It is not good taste to take such a term and place it in restricted surroundings.

—Preston C. Hammer

  1. Associativity (a ◊ b) ◊ c = a ◊ (b ◊ c) for all a, b, c ∈ G.

  2. Identity element There exists a (unique) element e ∈ G such that e ◊ a = a ◊ e = a for all a ∈ G. The element e is called the identity of G.

  3. Inverse For each a ∈ G, there exists a (unique) element b ∈ G such that a ◊ b = b ◊ a = e. The element b is called the inverse of a.

    If, in addition, we assume that

  4. Commutativity a ◊ b = b ◊ a for all a, b ∈ G,

    then G is called a commutative or an Abelian group.

A group (G, ◊) is also written in short as G, when the operation ◊ is understood from the context. More often than not, the operation ◊ is either addition (+) or multiplication (·), in which cases we also say that G is respectively an additive or a multiplicative group. For a multiplicative group, we often omit the multiplication sign and denote a · b simply as ab. The identity in an additive group is usually denoted by 0, whereas that in a multiplicative group by 1. The inverse of an element a is denoted in these cases by –a and a–1 respectively. Groups written additively are usually Abelian, but groups written multiplicatively need not be so.

Note that associativity allows us to write a ◊ b ◊ c unambiguously to represent (a ◊ b) ◊ c = a ◊ (b ◊ c). More generally, if a1, . . . , an ∈ G, then a1 ◊ ··· ◊ an represents a unique element of the group irrespective of how we insert brackets to compute it.

Example 2.1.
  1. The set ℤ is an Abelian group under addition. The identity is 0 and the inverse of a is –a. Note, however, that ℤ is not a group under multiplication: though it contains the multiplicative identity 1, no element of ℤ other than ±1 has a multiplicative inverse in ℤ.

  2. The set ℚ* of non-zero rational numbers is a group under multiplication. The identity is 1 = 1/1 and the inverse of a/b is b/a.

  3. For a set A, the set of all bijective functions A → A is a group under composition of functions. The identity element is idA and the inverse of f is denoted by f–1. (See also Exercise 2.2.) This group is not Abelian in general.

  4. The set of all m × n matrices with entries from ℝ is a group under matrix addition. On the other hand, the set GLn(ℝ) of all n × n invertible matrices over ℝ is a group under matrix multiplication and is called the general linear group. Note that GLn(ℝ) is another example of a group that is not Abelian (for n > 1).

  5. A group G is called finite, if G as a set consists of (only) finitely many elements. Finite groups play an extremely important role in cryptography. Here is our first example of finite groups: Let n be an integer ≥ 2. The set

    ℤn = {0, 1, . . . , n – 1}

    is a group under addition modulo n (that is, add (and subtract) two elements of ℤn as integers and, if the result is not in ℤn, take the remainder of division by n). For this group, the identity element is 0, and –a = n – a for a ≠ 0 and –0 = 0. (See Example 2.3 for a formal definition of ℤn.)

  6. For an integer n ≥ 2, define the set

    ℤn* = {a ∈ ℤn | gcd(a, n) = 1}.

    If n is prime, then ℤn* = {1, 2, . . . , n – 1}. The set ℤn* is a group under multiplication modulo n with identity 1. We need a little more machinery than introduced so far in order to prove that every element of ℤn* has a multiplicative inverse modulo n. The other group axioms are easy to check.
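For small n, the group axioms for ℤn* can be checked exhaustively. The following Python sketch (the helper names are ours) verifies closure, identity and inverses for n = 15:

```python
from math import gcd

def units_mod(n):
    # Z_n^* : residues in {1, ..., n-1} coprime to n
    return {a for a in range(1, n) if gcd(a, n) == 1}

def is_group_mod_mul(U, n):
    closed = all((a * b) % n in U for a in U for b in U)
    has_identity = 1 in U
    has_inverses = all(any((a * b) % n == 1 for b in U) for a in U)
    return closed and has_identity and has_inverses

U15 = units_mod(15)  # {1, 2, 4, 7, 8, 11, 13, 14}
```

For prime n (say n = 7), units_mod returns all of {1, . . . , n – 1}, matching the remark above.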

Proposition 2.1.

Let (G, ◊) be a group and let a, b, c ∈ G. Then a ◊ b = a ◊ c implies b = c. Similarly, a ◊ c = b ◊ c implies a = b. These statements are commonly known as the (left and right) cancellation laws.

Proof

We prove only the left cancellation law. The proof of the other law is similar. Let e denote the identity of G and d the inverse of a. Then b = e ◊ b = (d ◊ a) ◊ b = d ◊ (a ◊ b) = d ◊ (a ◊ c) = (d ◊ a) ◊ c = e ◊ c = c.

2.3.2. Subgroups, Cosets and Quotient Groups

Definition 2.3.

Let (G, ◊) be a group. Then a subset H of G is called a subgroup of G, if H is a group under the operation ◊ inherited from G. For a subset H of G to be a subgroup, it is necessary and sufficient that H is closed under the operation ◊ and under inverse. Any subgroup of an Abelian group is also Abelian.

Example 2.2.
  1. For any group G with identity element e, the subsets {e} and G are subgroups of G. They are called the trivial subgroups of G.

  2. For an integer n ≥ 2, the set of all integral multiples of n is an additive subgroup of ℤ and is denoted by nℤ.

  3. The set SLn(ℝ) consisting of all n × n real matrices of determinant 1 is a subgroup of GLn(ℝ) and is commonly referred to as the special linear group.

  4. Note that though ℤn in Example 2.1 is a subset of ℤ, it is not a subgroup of ℤ, since it is not closed under the addition of ℤ. It is a group under addition modulo n, which is not the same as integer addition.

Let (G, ◊) be a group. For subsets A and B of G, we denote by A ◊ B the set {a ◊ b | a ∈ A, b ∈ B}. In particular, if A = {a} (resp. B = {b}), then A ◊ B is denoted by a ◊ B (resp. A ◊ b). Note that the sets A ◊ B and B ◊ A are not necessarily equal. If G is Abelian, then A ◊ B = B ◊ A.

Definition 2.4.

Let (G, ◊) be a group, H a subgroup of G and a ∈ G. The set a ◊ H is called the left coset of a with respect to H and the set H ◊ a is called the right coset of a with respect to H. If G is Abelian, then a left coset is naturally a right coset and vice versa. In that case, we call a ◊ H (or H ◊ a) simply a coset.

From now onward, we consider left cosets only and call them cosets. If the underlying group is Abelian, then they are the same thing. The theory of right cosets can be parallelly developed, but we choose to omit that here. For simplicity, we also assume that the group G is a multiplicative group, so that the operation ◊ would be replaced by · (or by mere juxtaposition).

Proposition 2.2.

Let G be a (multiplicative) group and H a subgroup of G. Then, the cosets aH, a ∈ G, partition G. Two cosets aH and bH are equal if and only if a–1b ∈ H. There is a bijective map from aH to bH for every a, b ∈ G.

Proof

We define a relation ~ on G such that a ~ b if and only if a–1b ∈ H. Clearly, a ~ a. Now a ~ b implies a–1b ∈ H, so that b–1a = (a–1b)–1 ∈ H (see Exercise 2.8), that is, b ~ a. Finally, a ~ b and b ~ c imply a ~ c, since a–1c = (a–1b)(b–1c). Thus ~ is an equivalence relation on G and hence by Theorem 2.1 produces a partition of G. We now show that the equivalence class [a] of a ∈ G is the coset aH. This follows from the fact that b ∈ [a] if and only if a–1b = h for some h ∈ H, that is, if and only if b = ah for some h ∈ H, that is, if and only if b ∈ aH.

Now we define a map φ : aH → bH by ah ↦ bh for every h ∈ H. The map φ is clearly surjective. Injectivity of φ follows from the left cancellation law (Proposition 2.1). Hence φ is bijective.

The following theorem is an important corollary to the last proposition.

Theorem 2.2. Lagrange’s theorem

Let G be a finite group and H a subgroup of G. Then, the cardinality of G is an integral multiple of the cardinality of H.

Proof

From Proposition 2.2, the cosets form a partition of G and there is a bijective map from one coset to another. Hence by Exercise 2.3 all cosets have the same cardinality. Finally, note that H is the coset of the identity element.
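Lagrange's theorem can be observed directly on a small example. The sketch below (the helper name cosets is ours) lists the cosets of the subgroup {0, 4, 8} in the additive group ℤ12:

```python
def cosets(G, H, op):
    # Collect the distinct cosets aH = {a ◊ h | h in H} for a in G.
    seen = []
    for a in G:
        aH = frozenset(op(a, h) for h in H)
        if aH not in seen:
            seen.append(aH)
    return seen

n = 12
G = set(range(n))
H = {0, 4, 8}  # the subgroup of multiples of 4 in Z_12
cs = cosets(G, H, lambda a, b: (a + b) % n)
# 12 = (number of cosets) * (size of H), as Lagrange's theorem asserts
```

The cosets all have the same size as H and together cover G, so #G is an integral multiple of #H.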

Definition 2.5.

Let G be a group and H a subgroup of G. The number of distinct cosets of H in G is called the index of H in G and is denoted by [G : H]. If G is finite, then [G : H] = #G/#H.

Definition 2.6.

Let H be a subgroup of a (multiplicative) group G. Then H is called a normal subgroup of G, if (aH)(bH) = (ab)H for all a, b ∈ G. It is clear that any subgroup H of an Abelian group G satisfies this condition and hence is normal.

If H is a normal subgroup of a group G, then the cosets aH, a ∈ G, form a group with multiplication defined by (aH)(bH) = (ab)H. This group is called the quotient group of G with respect to H and is denoted by G/H.

Example 2.3.
  1. Let n be an integer ≥ 2. The subgroup nℤ of (ℤ, +) (Example 2.2) is normal, since ℤ is Abelian. The coset of a ∈ ℤ is the set a + nℤ = {a + kn | k ∈ ℤ}. The quotient group ℤ/nℤ is denoted as ℤn and is essentially the same as the group {0, 1, . . . , n – 1} with the operation of addition modulo n (Example 2.1).

  2. For any group G with identity e, the trivial subgroups G and {e} are normal. G/G is a group with a single element, whereas G/{e} is essentially the same as the group G.

2.3.3. Homomorphisms

Definition 2.7.

Let (G, ◊) and (G′, ⊙) be groups. A function f : G → G′ is called a homomorphism (of groups), if f(a ◊ b) = f(a) ⊙ f(b) for all a, b ∈ G, that is, if f commutes with the group operations of G and G′.

A group homomorphism f : G → G′ is called an isomorphism, if there exists a group homomorphism g : G′ → G such that g ο f = idG and f ο g = idG′. It can be easily seen that a homomorphism f : G → G′ is an isomorphism if and only if f is bijective as a function.[2] If there exists an isomorphism f : G → G′, we say that the groups G and G′ are isomorphic and write G ≅ G′.

[2] If f : GG′ is a bijective homomorphism, its inverse f–1 : G′ → G is bijective as a function. However, it is not obvious that f–1 has to be a group homomorphism. We are lucky here; f–1 is.

A homomorphism f from G to itself is called an endomorphism (of G). An endomorphism which is also an isomorphism is called an automorphism. The set of all automorphisms of a group G is a group under function composition. We denote this group by Aut G.

Example 2.4.
  1. The canonical inclusion a ↦ a/1 is a group homomorphism from (ℤ, +) to (ℚ, +). More generally, if H is a subgroup of G, then the map h ↦ h for all h ∈ H is a group homomorphism from H to G. In particular, the identity map on any group G is an automorphism of G (and is the identity element of the group Aut G).

  2. For a (multiplicative) group G and a normal subgroup H, the map G → G/H that takes a ∈ G to its coset aH is a surjective group homomorphism. It is called the canonical surjection of G onto G/H. For example, the map that takes a to its remainder of division by n (≥ 2) is a canonical surjection from the additive group ℤ to the quotient group ℤn. (Also see Examples 2.1, 2.2 and 2.3.)

  3. The map that takes a complex number z = a + ib to its conjugate z̄ = a – ib is a group automorphism of both (ℂ, +) and (ℂ*, ·).
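The canonical surjection of Example 2.4(2) is easy to test numerically; the following Python sketch checks the homomorphism property f(a + b) = f(a) + f(b) for reduction modulo n = 6 over a range of integers:

```python
n = 6
f = lambda a: a % n  # canonical surjection from (Z, +) onto Z_6

# f(a + b) must equal f(a) + f(b) computed in Z_6
is_homomorphism = all(
    f(a + b) == (f(a) + f(b)) % n
    for a in range(-20, 21)
    for b in range(-20, 21)
)
```

Consistently with Proposition 2.3 below, f maps the identity 0 of ℤ to the identity 0 of ℤ6, and the inverse –a to the inverse of f(a).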

Proposition 2.3.

Let f be a group homomorphism from (G, ◊) to (G′, ⊙). Let e and e′ denote the identity elements of G and G′ respectively. Then f(e) = e′. If a, b ∈ G and c, d ∈ G′ satisfy a ◊ b = e, c ⊙ d = e′ and f(a) = c, then f(b) = d.

Proof

We have e′ ⊙ f(e) = f(e) = f(e ◊ e) = f(e) ⊙ f(e), so that by right cancellation f(e) = e′. To prove the second assertion, we note that c ⊙ d = e′ = f(e) = f(a ◊ b) = f(a) ⊙ f(b) = c ⊙ f(b). Thus, by left cancellation, f(b) = d.

Definition 2.8.

With the notations of the last proposition we define the kernel of f to be the following subset of G:

Ker f = {a ∈ G | f(a) = e′}.

We also define the image of f to be the subset

Im f = {f(a) | a ∈ G}

of G′. Then we have the following important theorem.

Theorem 2.3. Isomorphism theorem

Ker f is a normal subgroup of G, Im f is a subgroup of G′, and G/ Ker f ≅ Im f.

Proof

In order to simplify notations, let us assume that G and G′ are multiplicatively written groups. For u, v ∈ Ker f, we have f(uv–1) = f(u)(f(v))–1 = e′, that is, uv–1 ∈ Ker f. By Exercise 2.8, Ker f is a subgroup of G. We now show that it is normal. Note that for a ∈ G and u ∈ Ker f we have f(aua–1) = f(a)f(u)f(a–1) = e′, since f(u) = e′ and f(a–1) = f(a)–1; that is, aua–1 ∈ Ker f. By Exercise 2.10, Ker f is a normal subgroup of G. Now let a′ = f(a) and b′ = f(b) be arbitrary elements of Im f. Then f(ab–1) = a′(b′)–1, that is, a′(b′)–1 ∈ Im f. Thus, by Exercise 2.8, Im f is a subgroup of G′.

Now define a map φ : G/Ker f → Im f that takes a Ker f ↦ f(a). Let a Ker f = b Ker f. Then by Proposition 2.2, a–1b ∈ Ker f, that is, b = au for some u ∈ Ker f. But then f(b) = f(au) = f(a)f(u) = f(a)e′ = f(a). This shows that the map φ is well-defined. It is easy to check that φ is a group homomorphism. Now φ(a Ker f) = φ(b Ker f) implies f(a) = f(b), that is, f(a–1b) = e′, that is, a–1b ∈ Ker f, that is, a Ker f = b Ker f. Thus φ is injective. It is clearly surjective. Thus φ is bijective and hence an isomorphism from G/Ker f to Im f.
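The isomorphism theorem can likewise be observed on a small example. For the reduction map from ℤ12 to ℤ4 (a sketch under our own naming), the kernel is {0, 4, 8}, and the number of its cosets equals the size of the image:

```python
n, m = 12, 4
G = list(range(n))
f = lambda a: a % m  # homomorphism from (Z_12, +) to (Z_4, +)

kernel = [a for a in G if f(a) == 0]
image = sorted({f(a) for a in G})
num_cosets = n // len(kernel)  # #(G / Ker f), the index of Ker f
```

Since #(G/Ker f) = #Im f here (both equal 4), the sizes are consistent with G/Ker f ≅ Im f.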

2.3.4. Generators and Orders

Definition 2.9.

Let G be a group. In this section, we assume, unless otherwise stated, that G is multiplicatively written and has identity e. Let ai, i ∈ I, be a family of elements of G. Consider the subset H of G defined as

H = {ai1^(±1) · · · air^(±1) | r ≥ 0 and i1, . . . , ir ∈ I},

with the empty product (corresponding to r = 0) being treated as e. It is easy to check that H is a subgroup of G and contains all ai, i ∈ I. We call H the subgroup generated by ai, i ∈ I, or say that the elements ai, i ∈ I, generate H. H is called finitely generated, if it is generated by finitely many elements. In particular, H is called cyclic, if it is generated by a single element. If H is cyclic and generated by g ∈ H, then g is called a generator or a primitive element of H. Note that, in general, a cyclic subgroup has more than one generator (Exercise 2.47).

Example 2.5.
  1. The additive groups ℤ and ℤn are generated by 1 and hence are cyclic. The multiplicative group ℤn* is cyclic if and only if n is 2, 4, p^r or 2p^r, where p is an odd prime and r ∈ ℕ (see Exercise 2.50). A generator of ℤn* for such an n is often called a primitive root modulo n.

  2. The group (Q*, ·) is generated by the “primes” p/1, p a prime integer, and –1.

  3. Let G be a multiplicative group (not necessarily Abelian) with identity e and let a ∈ G. Then the subgroup H generated by a is the set of elements of the form ar, r ∈ Z, and is always Abelian. If H is finite, then the elements ar, r ∈ Z, cannot all be distinct, that is, as = at for some s, t ∈ Z, s > t. Then as–t = e, where s – t > 0. Now a–1 = as–t–1 and, more generally, a–k = ak(s–t–1). Thus we may consider H to consist of non-negative powers of a only. Let n be the smallest positive integer for which an = e. It is easy to see that H = {ar | r = 0, . . . , n – 1}.
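In a group like Z*n these powers are easy to tabulate. A small Python sketch (the helper name is ours; it assumes gcd(a, n) = 1, so that a lies in Z*n):

```python
def cyclic_subgroup(a, n):
    """List the subgroup of Z*_n generated by a: the powers
    a^0, a^1, a^2, ... until the identity 1 recurs.
    Assumes gcd(a, n) = 1 (otherwise the powers never return to 1)."""
    h, powers = 1, []
    while True:
        powers.append(h)
        h = (h * a) % n
        if h == 1:
            return powers

# In G = Z*_7: the element 3 generates all of Z*_7 (3 is a primitive
# root modulo 7), while 2 generates a proper cyclic subgroup.
print(cyclic_subgroup(3, 7))   # [1, 3, 2, 6, 4, 5]
print(cyclic_subgroup(2, 7))   # [1, 2, 4]
```

The length of the returned list is exactly the n of the example above, the smallest positive integer with an = e.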

Definition 2.10.

Let G be a finite group with identity e. The order of G is defined to be the cardinality of the set G and is denoted by ord G. The order of an element a ∈ G is the cardinality of the subgroup of G generated by a and is denoted by ordG a or simply by ord a, when G is understood from the context.

With these notations we prove the following important proposition.

Proposition 2.4.

The order m := ordG a of a ∈ G is the smallest of the positive integers r for which ar = e. If n = ord G, then n is an integral multiple of m. In particular, an = e.

Proof

Let H be the (cyclic) subgroup of G generated by a. Then by Example 2.5, H = {ar | r = 0, . . . , m – 1} and m is the smallest of the positive integers r for which ar = e. By Lagrange’s theorem (Theorem 2.2), n is an integral multiple of m, that is, n = km for some k ∈ N. But then an = (am)k = ek = e.

Lemma 2.1.

Let G be a finite cyclic group. Then any subgroup of G is also cyclic.

Proof

Let G be generated by g and ord G = n. Then G = {gr | r = 0, . . . , n – 1}. The subgroup {e} of G is clearly cyclic. For an arbitrary subgroup H ≠ {e} of G, define k to be the smallest positive integer for which gk ∈ H. Now take any gr ∈ H and write r = qk + δ, where q and δ are respectively the quotient and remainder of division of r by k with 0 ≤ δ < k. Then gr = (gk)qgδ and so gδ = gr(gk)–q ∈ H. The minimality of k implies that δ = 0, that is, gr = (gk)q. Thus H is generated by gk and is hence cyclic.

Proposition 2.5.

Let G be a finite cyclic multiplicative group with identity e and let H be a subgroup of order m. Then an element a ∈ G belongs to H if and only if am = e.

Proof

If a ∈ H, then am = e by Proposition 2.4. Conversely, assume that am = e, but a ∉ H. Let K be the subgroup of G generated by the elements of H and by a. By Lemma 2.1, K is cyclic. By assumption, K contains more than m elements (since H ∪ {a} ⊆ K). But K is Abelian and is generated by elements x satisfying xm = e (namely a and, by Proposition 2.4, the elements of H), so every element of K has order dividing m. A cyclic group with more than m elements contains an element of order exceeding m, a contradiction.

Finite cyclic groups play a crucial role in public-key cryptography. To see how, let G be a group which is finite, cyclic with generator g and multiplicatively written. Given r ∈ N, one can compute gr using at most 2 lg r + 2 group multiplications (See Algorithms 3.9 and 3.10). This means that if it is easy to multiply elements of G, then it is also easy to compute gr. On the other hand, there are certain groups for which it is very difficult to recover the integer r from the knowledge of g and gr, even when one is certain that such an integer exists. This is the basic source of security in many cryptographic protocols, like those based on finite fields, elliptic and hyperelliptic curves.
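The repeated-squaring idea behind the 2 lg r + 2 bound can be sketched in Python. The function name and the generic `mul` callback below are our own illustration (not the book’s Algorithms 3.9 and 3.10, which are presented later):

```python
def fast_pow(g, r, mul, e):
    """Left-to-right square-and-multiply: computes g^r using at most
    2 lg r + 2 applications of the group law `mul`, with identity `e`."""
    result = e
    for bit in bin(r)[2:]:          # binary digits of r, most significant first
        result = mul(result, result)        # square
        if bit == '1':
            result = mul(result, g)         # multiply
    return result

# Example in the multiplicative group Z*_p for the prime p = 1000003:
p = 1000003
mul = lambda x, y: (x * y) % p
print(fast_pow(5, 123456, mul, 1))  # agrees with Python's pow(5, 123456, p)
```

Computing gr this way is cheap; recovering r from g and gr (the discrete logarithm problem) is the hard direction the text refers to.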

*2.3.5. Sylow’s Theorem

Sylow’s theorem is a powerful tool for studying the structure of finite groups. Recall that if G is a finite group of order n and if H is a subgroup of G of order m, then by Lagrange’s theorem m divides n. But given any divisor m′ of n, there need not exist a subgroup of G of order m′. However, for certain special values of m′, we can prove the existence of subgroups of order m′. Sylow’s theorem considers the case that m′ is a power of a prime.

Definition 2.11.

Let G be a finite group of cardinality n and let p be a prime. If n = pr for some integer r ≥ 0, we call G a p-group. More generally, let p be a prime divisor of n. Then a p-subgroup of G is a subgroup H of G such that H is a p-group. If H is a p-subgroup of G with cardinality pr for some r, then pr divides n. Moreover, if pr+1 does not divide n, then H is called a p-Sylow subgroup of G.

We shortly prove that p-Sylow subgroups always exist. Before doing that, we prove a simpler result.

Theorem 2.4. Cauchy’s theorem

Let G be a finite group and p a prime dividing ord G. Then G has a subgroup of order p.

Proof

Let n := ord G. Note that if we can find an element a ∈ G such that ord a = p, then the subgroup generated by a is the desired subgroup. To do that, consider the set S consisting of all p-tuples (a1, . . . , ap) with ai ∈ G such that a1 · · · ap = e. S consists of np–1 elements, since we can choose a1, . . . , ap–1 arbitrarily and independently from G, and for each such choice of a1, . . . , ap–1 the value of ap = (a1 · · · ap–1)–1 gets fixed. Since p divides n, it follows that p divides #S too. Now we define a relation ~ on S by (a1, . . . , ap) ~ (b1, . . . , bp) if and only if (b1, . . . , bp) = (ai, . . . , ap, a1, . . . , ai–1) for some i (that is, (b1, . . . , bp) is a cyclic shift of (a1, . . . , ap)). It is easy to see that ~ is an equivalence relation on S. The equivalence class of (a1, . . . , ap) contains 1 or p elements depending on whether a1 = · · · = ap or not. Let r and s be the number of equivalence classes containing 1 and p elements of S respectively. Then r + sp = np–1, so that p divides r. Since the equivalence class of (e, . . . , e) contains only one element, we must have r ≥ 1 and consequently r ≥ p. This, in turn, proves the existence of an element a ∈ G, a ≠ e, such that (a, . . . , a) ∈ S. But then ap = e, and since p is prime and a ≠ e, ord a = p.
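The counting in this proof can be verified by brute force on a small group. The sketch below (our own code) takes G = (Z6, +), where the group operation is addition and e = 0, so the tuples of S are those summing to 0 modulo 6:

```python
from itertools import product

def cauchy_count(n, p):
    """In G = (Z_n, +), count the p-tuples (a1, ..., ap) with
    a1 + ... + ap = 0 (the set S of the proof), and among them the
    constant tuples a1 = ... = ap (the one-element ~ classes)."""
    S = [t for t in product(range(n), repeat=p) if sum(t) % n == 0]
    constant = [t for t in S if len(set(t)) == 1]
    return len(S), len(constant)

total, r = cauchy_count(6, 3)   # p = 3 divides n = 6
print(total, r)                  # 36 = 6^(3-1) tuples; r = 3 constant tuples
```

Here #S = 36 = n^(p–1) and r = 3 is divisible by p = 3: besides (0, 0, 0), the constant tuples come from a = 2 and a = 4, each an element of order 3, exactly as the theorem promises.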

Now we are in a position to prove the general theorem.

Theorem 2.5. Sylow’s theorem

Let G be a finite group of order n and let p be a prime dividing n. Then there exists a p-Sylow subgroup of G.

Proof

We proceed by induction on n. If n = p, then G itself is a p-Sylow subgroup of G. So we assume n > p and write n = prm, where p does not divide m. If r = 1, then the theorem follows from Cauchy’s theorem (Theorem 2.4). So we assume r > 1 and consider the class equation of G, namely, #G = #Z(G) + ∑[G : C(a)], where the sum is over a set of pairwise non-conjugate elements a ∉ Z(G) (See Exercise 2.16). If p does not divide [G : C(a)] for some a ∉ Z(G), then #C(a) = #G/[G : C(a)] = prm′ < #G for some m′ < m. By induction, C(a) has a p-Sylow subgroup, which is also a p-Sylow subgroup of G. On the other hand, if p divides [G : C(a)] for all a ∉ Z(G), then p divides #Z(G), as can be easily seen from the class equation. We apply Cauchy’s theorem to Z(G) to obtain a subgroup H of Z(G) with #H = p. By Exercise 2.16(b), H is a normal subgroup of G, and we consider the canonical surjection μ : G → G/H. Since #(G/H) = pr–1m < n and r > 1, by induction G/H has a p-Sylow subgroup, say K. But then μ–1(K) is a p-Sylow subgroup of G.

Note that if H is a p-Sylow subgroup of G and g ∈ G, then gHg–1 is also a p-Sylow subgroup of G. The converse is also true, that is, if H and H′ are two p-Sylow subgroups of G, then there exists a g ∈ G such that H′ = gHg–1. We do not prove this assertion here, but mention the following important consequence of it. If G is Abelian, then H′ = gHg–1 = gg–1H = H, that is, there is only one p-Sylow subgroup of G. If G is Abelian and ord G = p1^r1 · · · pt^rt with pairwise distinct primes pi and with ri ∈ N, then G is the internal direct product of its pi-Sylow subgroups, i = 1, . . . , t (Exercises 2.17 and 2.19).

Exercise Set 2.3

2.8Let G be a multiplicatively written group (not necessarily Abelian). Prove the following assertions.
  1. For all elements a, b ∈ G, we have (ab)–1 = b–1a–1 and (a–1)–1 = a.

  2. A subset H of G is a subgroup of G if and only if H is non-empty and ab–1 ∈ H for all a, b ∈ H.

2.9Let G be a multiplicatively written group and let H and K be subgroups of G. Show that:
  1. H ∩ K is a subgroup of G.

  2. H ∪ K is a subgroup of G if and only if H ⊆ K or K ⊆ H.

  3. HK := {hk | h ∈ H, k ∈ K} is a subgroup of G if and only if HK = KH. In particular, if K is normal in G, then HK is a subgroup of G.

  4. G × G is a group and H × K is a subgroup of G × G.

  5. If g ∈ G, then gHg–1 is a subgroup of G.

2.10
  1. Let G be a multiplicatively written group and H a subgroup of G. Show that the following conditions are equivalent:

    1. H is a normal subgroup of G.

    2. ghg–1 ∈ H for all g ∈ G and h ∈ H.

    3. gHg–1 = H for all g ∈ G.

    4. gH = Hg for all g ∈ G.

  2. Show that if [G : H] = 2, then H is normal.

2.11Let G be a (multiplicative) group.
  1. Second isomorphism theorem Let H and K be subgroups of G and let K be normal in G. Show that H/(H ∩ K) ≅ (HK)/K. [H]

  2. Third isomorphism theorem Let H and K be normal subgroups of G with H ⊆ K. Show that G/K ≅ (G/H)/(K/H) (where K/H denotes the image of K under the canonical surjection G → G/H). [H]

2.12
  1. Show that the only automorphisms of the group (Z, +) are the identity map and the map that sends a ↦ –a.

  2. Show that the group of automorphisms of (Zn, +) is isomorphic to (Z*n, ·).

2.13Let H be the subgroup of G generated by ai, i ∈ I. Show that H is the smallest subgroup of G that contains all of ai, i ∈ I.
2.14Let f : G → G′ be a homomorphism of (multiplicative) groups. Show that:
  1. If H is a subgroup of G, then H′ := f(H) is a subgroup of G′. If f is surjective and H is normal, then H′ is also normal.

  2. If H′ is a subgroup of G′, then H := f–1(H′) is a subgroup of G. If H′ is normal, then H is also normal.

  3. Correspondence theorem Let H be a normal subgroup of G. Then the subgroups (resp. normal subgroups) of G/H are in one-to-one correspondence with the subgroups (resp. normal subgroups) of G that contain H. [H]

2.15Let G be a cyclic group. Show that G is isomorphic to Z or to Zn for some n ∈ N, depending on whether G is infinite or finite.
2.16Let G be a finite (multiplicative) group (not necessarily Abelian).
  1. We define the centre of G to be the set Z(G) := {a ∈ G | ag = ga for all g ∈ G}. Show that Z(G) is a subgroup of G.

  2. If H ⊆ Z(G) is a subgroup of G, show that H is a normal subgroup of G.

  3. The centralizer of a ∈ G is defined to be the set C(a) := {g ∈ G | ag = ga}. Show that C(a) is a subgroup of G. Show also that C(a) = G if and only if a ∈ Z(G).

  4. Define a relation ~ on G by a ~ b if and only if b = gag–1 for some g ∈ G. Show that ~ is an equivalence relation on G. We say that the elements a and b of G are conjugate, if the equivalence classes [a] and [b] are the same. The equivalence classes are called the conjugacy classes of G.

  5. Show that the cardinality of the conjugacy class of a ∈ G is equal to the index [G : C(a)].

  6. Deduce the class equation of G, that is, #G = #Z(G) + ∑[G : C(a)], where the sum is over a set of pairwise non-conjugate elements a ∉ Z(G).

2.17Let G be a (multiplicative) Abelian group with identity e and order n = p1^e1 · · · pr^er, where the pi are distinct primes and ei ∈ N. For each i, let Hi be the pi-Sylow subgroup of G. Show that:
  1. G = H1 · · · Hr. [H]

  2. Every element g ∈ G can be written uniquely as g = h1 · · · hr with hi ∈ Hi. Moreover, in that case we have ordG g = (ordH1 h1) · · · (ordHr hr).

  3. G is cyclic if and only if all of H1, . . . , Hr are cyclic.

2.18Let G be a finite (multiplicative) Abelian group with identity e. Assume that for every n ∈ N there are at most n elements x of G satisfying xn = e. Show that G is cyclic. [H]
2.19Let G be a (multiplicative) group and let H1, . . . , Hr be normal subgroups of G. If G = H1 · · · Hr and every element g ∈ G can be written uniquely as g = h1 · · · hr with hi ∈ Hi, then G is called the internal direct product of H1, . . . , Hr. (For example, if G is finite and Abelian, then by Exercise 2.17 it is the internal direct product of its Sylow subgroups.) Show that:
  1. If G is finite, it is the internal direct product of normal subgroups H1, . . . , Hr if and only if G = H1 · · · Hr and Hi ∩ Hj = {e} for all i, j, i ≠ j.

  2. If G is the internal direct product of the normal subgroups H1, . . . , Hr, then G is isomorphic to the (external) direct product H1 × · · · × Hr. [H]

2.20Let Hi, i = 1, . . . , r, be finite Abelian groups of orders mi and let H := H1 × · · · × Hr be their direct product. Show that H is cyclic if and only if each Hi is cyclic and m1, . . . , mr are pairwise coprime.
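Exercise 2.20 can be checked by brute force for small additive groups Zm1 × · · · × Zmr. The following Python sketch (our own helper, not part of the exercise) searches for an element whose order equals the order of the whole product:

```python
from itertools import product
from math import gcd

def is_cyclic_product(ms):
    """H = Z_{m1} x ... x Z_{mr} (additive) is cyclic iff some element
    generates it, i.e. has order m1 * ... * mr."""
    m = 1
    for mi in ms:
        m *= mi
    for g in product(*(range(mi) for mi in ms)):
        # The order of a tuple is the lcm of its components' orders;
        # in Z_{mi}, the element gi has order mi / gcd(gi, mi).
        order = 1
        for gi, mi in zip(g, ms):
            oi = mi // gcd(gi, mi)
            order = order * oi // gcd(order, oi)   # lcm(order, oi)
        if order == m:
            return True
    return False

print(is_cyclic_product([2, 3]))   # True:  gcd(2, 3) = 1, so Z_2 x Z_3 = Z_6
print(is_cyclic_product([2, 4]))   # False: gcd(2, 4) > 1
```

The pairwise-coprimality condition of the exercise is exactly what makes an element of maximal order, such as (1, . . . , 1), reach order m1 · · · mr.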

2.4. Rings

So far we have studied algebraic structures with only one operation. Now we study rings, which are sets with two (compatible) binary operations. Unlike in groups, these two operations are usually denoted by + and ·. One can, of course, adopt more general notations for these operations. However, that generality does not pay much and only complicates matters. We stick to the conventions.

2.4.1. Definition and Basic Properties

Definition 2.12.

A ring (R, +, ·) (or R in short) is a set R together with two binary operations + and · on R such that the following conditions are satisfied. As in the case of multiplicative groups we write ab for a · b.

  1. Additive group The set R is an Abelian group under +. The additive identity is denoted by 0.

  2. · is associative (ab)c = a(bc) for every a, b, c ∈ R.

  3. · is commutative ab = ba for every a, b ∈ R.

  4. Multiplicative identity There is an element (denoted by 1) in R such that a · 1 = 1 · a = a for every a ∈ R. The element 1 is called the identity of R.

  5. Distributivity The operation · is distributive over +, that is, a(b + c) = ab + ac and (a + b)c = ac + bc for every a, b, c ∈ R.

Notice that it is more conventional to define a ring as an algebraic structure (R, +, ·) that satisfies conditions (1), (2) and (5) only. A ring (by the conventional definition) is called a commutative ring (resp. a ring with identity), if it (additionally) satisfies condition (3) (resp. (4)). As per our definition, a ring is always a commutative ring with identity. Rings that are not commutative or that do not contain the identity element are not used in the rest of the book. So let us be happy with our unconventional definition of a ring.[3]

[3] Cool! But what’s circular in a ring? Historically, such algebraic structures were introduced by Hilbert to designate a Zahlring (a number ring, see Section 2.13). If α is an algebraic integer (Definition 2.95) and we take a Zahlring of the form Z[α] and consider the powers α, α2, α3, . . . , we eventually get a power αd which can be expressed as a linear combination of the previous (that is, smaller) powers of α. This is perhaps the reason that prompted Hilbert to call such structures “rings”. Also see Footnote 1.

We do not rule out the possibility that 0 = 1 in R. In that case, for any a ∈ R, we have a = a · 1 = a · 0 = 0 (See Proposition 2.6), that is to say, the set R consists of the single element 0. In this case, R is called the zero ring and is denoted (by an abuse of notation) by 0.

Finally, note that R is, in general, not a group under multiplication. This is because we do not expect a ring R to contain the multiplicative inverse of every element of R. Indeed the multiplicative inverse of the element 0 exists if and only if R = 0.

Example 2.6.
  1. The sets Z, Q, R and C are all rings under usual addition and multiplication. Each of Q, R and C contains the multiplicative inverse of every non-zero element, whereas the only elements in Z that have multiplicative inverses are ±1.

  2. Let Zn denote the set {0, 1, . . . , n – 1} for an integer n ≥ 2. Then Zn is a ring under addition and multiplication modulo n. The additive identity is 0 and the multiplicative identity is 1. Later we see a more formal definition of this ring. Recall from Example 2.1 how we have defined the groups Zn and Z*n under addition and multiplication modulo n. These groups have a connection with the ring Zn, as we will shortly see.

  3. Let R be a ring and S a set. The set of all functions S → R is a ring under pointwise addition and multiplication of functions (that is, if f and g are two such functions, then we define (f + g)(a) := f(a) + g(a) and (f g)(a) := f(a)g(a) for every a ∈ S). The additive (resp. multiplicative) identity in this ring is the constant function 0 (resp. 1).

  4. Let R be a ring. The set R[X] of all polynomials in one indeterminate X and with coefficients from R is a ring. The identity elements in R[X] are the constant polynomials 0 and 1. The addition and multiplication operations in R[X] are the standard ones on polynomials. For a non-zero polynomial f ∈ R[X], the largest non-negative integer d for which the coefficient of Xd is non-zero is called the degree of the polynomial f and is denoted by deg f. The coefficient of Xdeg f in f is called the leading coefficient of f and is denoted by lc(f). The degree of the zero polynomial is conventionally taken to be –∞. A non-zero polynomial with leading coefficient 1 is called a monic polynomial.

    More generally, for n ∈ N one can define the ring R[X1, . . . , Xn] of multivariate polynomials over R. Polynomial rings are of paramount importance in algebra and number theory. We devote Section 2.6 to a study of these rings.

    We also define the ring R(X) of rational functions over R, which consists of elements of the form f/g with f, g ∈ R[X], g ≠ 0. More generally, the set of elements f/g with f, g ∈ R[X1, . . . , Xn], g ≠ 0, is a ring denoted R(X1, . . . , Xn).

  5. Let Ri, i ∈ I, be a family of rings, and let R := ∏i∈I Ri be the product of the sets Ri, i ∈ I, that is, the set of all ordered tuples (ai)i∈I with ai ∈ Ri, indexed by I. For tuples (ai)i∈I and (bi)i∈I, define the sum (ai)i∈I + (bi)i∈I := (ai + bi)i∈I and the product (ai)i∈I(bi)i∈I := (aibi)i∈I. It is easy to see that R is a ring with identity elements 0 = (0)i∈I and 1 = (1)i∈I. It is called the direct product of the rings Ri, i ∈ I. If I is of finite cardinality n and if Ri = A for all i ∈ I, then R is denoted in short by An.

Proposition 2.6.

Let R be a ring. For all a, b ∈ R, we have:

  1. a · 0 = 0 · a = 0

  2. a(–b) = (–a)b = –ab

  3. (–a)(–b) = ab

Proof

  1. a · 0 = a · (0 + 0) = a · 0 + a · 0, so that a · 0 = 0. Similarly, 0 · a = 0.

  2. By (1), 0 = a · 0 = a(b + (–b)) = ab + a(–b), that is, a(–b) = –ab. Similarly, (–a)b = –ab.

  3. (–a)(–b) = –(a(–b)) = –(–ab) = ab.

Definition 2.13.

Let R be a ring.

  1. An element a ∈ R is called a zero-divisor of R, if ab = 0 for some b ∈ R, b ≠ 0. By this definition, 0 is a zero-divisor of R, unless R = 0. The elements 0, 3, 5, 6, 9, 10 and 12 are all the zero-divisors of Z15.

  2. An element a ∈ R is called a unit of R, if there exists an element b ∈ R such that ab = 1. The elements 1 and –1 are units in any ring. It is easy to see that an element cannot be simultaneously a zero-divisor and a unit. The set of all units in a ring R is denoted by R* and is a group under the multiplication of the ring R (See Exercise 2.21), called the multiplicative group or the group of units of R. The multiplicative group of the ring Zn (Example 2.6) is Z*n.

  3. An element a ∈ R is called nilpotent, if ak = 0 for some k ∈ N. By this definition, 0 is a nilpotent element in any ring. It is also evident that every nilpotent element in a non-zero ring is a zero-divisor. An example of a non-zero nilpotent element in a ring is 2 in Z4 (since 2 · 2 = 4 = 0 in Z4).

  4. An element a ∈ R is called idempotent, if a2 = a. In every ring, 0 and 1 are idempotent. The element 6 is idempotent in Z15 (since 6 · 6 = 36 ≡ 6 (mod 15)). It is easy to check that 0 is the only element in a ring that is both nilpotent and idempotent.
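All four notions of this definition can be computed directly in Zn. A small Python sketch (the function name is ours), which reproduces the list of zero-divisors of Z15 quoted above:

```python
def classify(n):
    """Classify the elements of the ring Z_n (n >= 2) as in Definition 2.13."""
    elems = range(n)
    zero_divisors = [a for a in elems
                     if any(a * b % n == 0 for b in elems if b != 0)]
    units = [a for a in elems if any(a * b % n == 1 for b in elems)]
    # If a is nilpotent in Z_n, then already a^n = 0, so k <= n suffices.
    nilpotents = [a for a in elems
                  if any(pow(a, k, n) == 0 for k in range(1, n + 1))]
    idempotents = [a for a in elems if a * a % n == a]
    return zero_divisors, units, nilpotents, idempotents

zd, u, nil, idem = classify(15)
print(zd)    # [0, 3, 5, 6, 9, 10, 12] -- the list quoted in the text
print(idem)  # [0, 1, 6, 10]           -- includes the idempotent 6
```

Note that the units and the zero-divisors of Z15 are disjoint, and that 0 is the only nilpotent element here (15 is squarefree), in line with the remarks above.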

Definition 2.14.

Let R be a ring.

  1. R is called an integral domain (or simply a domain), if R ≠ 0 and if R contains no non-zero zero-divisors. Examples of integral domains: Z, Q, R, C and Zp for a prime p. On the other hand, 3 · 5 = 0 in Z15, so Z15 is not an integral domain.

  2. R is called a field, if R ≠ 0 and if R* = R \ {0}, that is, if every non-zero element of R is a unit. This means that in a field one can divide any element by any non-zero element. The most common fields are Q, R and C. Note that Z is not a field, since, for example, 2 does not have a multiplicative inverse in Z.

  3. A field R with #R finite is called a finite field. The simplest examples of finite fields are the fields Zp for prime integers p. In fact, it is easy to see that Zn is a field if and only if n is a prime. Finite fields are widely applied for building various cryptographic protocols. See Section 2.9 for a detailed study of finite fields.
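The criterion "Zn is a field if and only if n is prime" is easy to test exhaustively for small n. A brute-force Python check (our own sketch; it simply searches for a multiplicative inverse of every non-zero element):

```python
def is_field_Zn(n):
    """Z_n is a field iff every non-zero element has a
    multiplicative inverse modulo n."""
    return all(any(a * b % n == 1 for b in range(1, n))
               for a in range(1, n))

print([n for n in range(2, 20) if is_field_Zn(n)])
# [2, 3, 5, 7, 11, 13, 17, 19] -- exactly the primes below 20
```

For composite n = n1n2 with 1 < n1, n2 < n, the element n1 is a zero-divisor and hence has no inverse, which is why the search fails.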

Corollary 2.1.

A field is an integral domain.

Proof

Recall from Definition 2.13 that an element in a ring cannot be simultaneously a unit and a zero-divisor.

Definition 2.15.

Let R be a non-zero ring. The characteristic of R, denoted char R, is the smallest positive integer n such that 1 + 1 + · · · + 1 (n times) = 0. If no such integer exists, then we take char R = 0.

Z, Q, R and C are rings of characteristic zero. If R is a non-zero finite ring, then the elements 1, 1 + 1, 1 + 1 + 1, . . . cannot all be distinct. This shows that there are positive integers m and n, m < n, such that 1 + 1 + · · · + 1 (n times) = 1 + 1 + · · · + 1 (m times). But then 1 + 1 + · · · + 1 (n – m times) = 0. Thus any non-zero finite ring has positive (that is, non-zero) characteristic. If char R = t is positive, then for any a ∈ R one has a + a + · · · + a (t times) = (1 + 1 + · · · + 1)a = 0.
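For the rings Zn this definition can be evaluated mechanically; the helper below (our own name) just adds 1 to itself until it reaches 0:

```python
def char_of_Zn(n):
    """Characteristic of Z_n: the smallest t >= 1 with
    1 + 1 + ... + 1 (t times) = 0 in Z_n."""
    s, t = 1 % n, 1
    while s != 0:
        s = (s + 1) % n
        t += 1
    return t

print([char_of_Zn(n) for n in range(2, 8)])  # [2, 3, 4, 5, 6, 7]
```

As expected, char Zn = n, illustrating that a non-zero finite ring always has positive characteristic.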

In what follows, we will often denote by n the element 1 + 1 + · · · + 1 (n times) of any ring. One should not confuse this with the integer n. One can similarly identify a negative integer –n with the ring element –(1 + 1 + · · · + 1)(n times) = (–1) + (–1) + · · · + (–1)(n times).

Proposition 2.7.

Let R be an integral domain of positive characteristic p. Then p is a prime.

Proof

If p is composite, then we can write p = mn with 1 < m < p and 1 < n < p. But then p = mn = 0 (in R). Since R is an integral domain, we must have m = 0 or n = 0 (in R). This contradicts the minimality of p.

2.4.2. Subrings, Ideals and Quotient Rings

Just as we studied subgroups of groups, it is now time to study subrings of rings. It turns out, however, that subrings are not as important for the study of rings as the subsets called ideals. In fact, it is ideals (and not subrings) that help us construct quotient rings. This does not mean that ideals are “normal” subrings! In fact, ideals are, in general, not subrings at all, and conversely. The formal definitions are waiting!

Definition 2.16.

Let R be a ring. A subset S of R is called a subring of R, if S is a ring under the ring operations of R. In this case, one calls R a superring or a ring extension of S.

If R and S are both fields, then S is often called a subfield of R and R a field extension (or simply an extension) of S. In that case, one also says that SR is a field extension or that R is an extension over S.

Z is a subring of Q, R and C, whereas Q ⊆ R and R ⊆ C are field extensions.

We demand that a ring always contains the multiplicative identity (Definition 2.12). This implies that if S is a subring of R, then for all integers n, the elements n · 1 (that is, the sums 1 + 1 + · · · + 1 and their negatives) are also in S (though they need not be pairwise distinct). Similarly, if R and S are fields, then S contains all the elements of the form mn–1 for m, n ∈ Z with n · 1 ≠ 0 (cf. Exercise 2.26). Thus 2Z, the set of all even integers, is not a subring of Z, though it is a subgroup of (Z, +) (Example 2.2).

Definition 2.17.

Let R be a ring. A subset 𝔞 of R is called an ideal of R, if 𝔞 is an additive subgroup of (R, +) and if ra ∈ 𝔞 for all r ∈ R and a ∈ 𝔞.[4]

[4] Kummer introduced the concept of ideal numbers. Later Dedekind reformulated Kummer’s notion of ideal numbers to define what we now know as ideals.

In this book, we will use Gothic letters (usually lower case) like 𝔞, 𝔟, 𝔠, 𝔭, 𝔪 to denote ideals.[5]

[5] Mathematicians always run out of symbols. Many believe if it is Gothic, it is just ideal!

The condition for being an ideal is in one sense more stringent than that for being a subring: an ideal has to be closed under multiplication by every element of the entire ring. On the other hand, we do not demand that an ideal necessarily contain the identity element 1. For example, 2Z is an ideal of Z but not a subring of Z. Conversely, Z is a subring of Q but not an ideal of Q. Subrings and ideals are different things.

Example 2.7.
  1. Let R be any ring. The subset {0} is an ideal of R, called the zero ideal and denoted also by 0. Similarly, the entire ring R is an ideal of R and is called the unit ideal. Note that if an ideal 𝔞 contains a unit u of R, then 1 = u–1u is also in 𝔞 and so a = a · 1 ∈ 𝔞 for every a ∈ R. It follows that an ideal 𝔞 of R is the unit ideal if and only if 𝔞 contains a unit, a justification for the name.

  2. The integral multiples of an integer n form an ideal of Z denoted by nZ. More generally, for any ring R and for any a ∈ R, the set {ra | r ∈ R} is an ideal of R and is denoted by Ra or aR or 〈a〉. Such an ideal is called a principal ideal. (See also Definition 2.18.)

  3. Let R be a ring and let 𝔞i, i ∈ I, be a family of ideals of R. The intersection ∩i∈I 𝔞i is an ideal of R. The set of finite sums of the form ai1 + · · · + air (where r ∈ N0 and aij ∈ 𝔞ij) is an ideal of R. It is called the sum of the ideals 𝔞i, i ∈ I, and is denoted by ∑i∈I 𝔞i. The union ∪i∈I 𝔞i is, in general, not an ideal of R. In fact, the sum ∑i∈I 𝔞i is the smallest ideal that contains (the set) ∪i∈I 𝔞i.

Proposition 2.8.

The only ideals of a field are the zero ideal and the unit ideal.

Proof

By definition, every non-zero element of a field is a unit. Hence any non-zero ideal of a field contains a unit and is therefore the unit ideal (Example 2.7).

Definition 2.18.

Let R be a ring and ai, i ∈ I, a family of elements of R. The ideal 𝔞 generated by ai, i ∈ I, is defined to be the sum of the principal ideals Rai, that is, 𝔞 = ∑i∈I Rai. In this case, we also say that 𝔞 is generated by ai, i ∈ I. If I is finite, then we say that 𝔞 is finitely generated. In particular, if #I = 1, then 𝔞 is a principal ideal (See Example 2.7).

An integral domain every ideal of which is principal is called a principal ideal domain or PID in short. A ring every ideal of which is finitely generated is called Noetherian. Thus principal ideal domains are Noetherian.

Note that an ideal may have different generating sets of varying cardinalities. For example, the unit ideal in any ring is principal, since it is generated by 1. The integers 2 and 3 generate the unit ideal of Z, since 1 = (–1) · 2 + 1 · 3. However, neither 2 nor 3 individually generates the unit ideal of Z. Indeed, using Bézout’s relation (Proposition 2.16) one can show that for every n ∈ N there is a (minimal) generating set of the unit ideal of Z that contains exactly n integers. Interested readers may try to construct such generating sets as an (easy) exercise.
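A Bézout relation 1 = ax + by for coprime a and b can be produced by the extended Euclidean algorithm. A short Python sketch (our own implementation; the book's formal treatment of Bézout's relation comes with Proposition 2.16):

```python
def ext_gcd(a, b):
    """Extended Euclidean algorithm: returns (g, x, y) with
    g = gcd(a, b) and g = a*x + b*y (a Bezout relation)."""
    if b == 0:
        return (a, 1, 0)
    g, x, y = ext_gcd(b, a % b)
    # gcd(a, b) = gcd(b, a mod b); back-substitute the coefficients.
    return (g, y, x - (a // b) * y)

# 2 and 3 generate the unit ideal of Z:
g, x, y = ext_gcd(2, 3)
print(g, 2 * x + 3 * y)   # g = 1 and 2*x + 3*y = 1
```

In ideal language: 2Z + 3Z = gcd(2, 3)Z = Z, whereas 2Z alone (or 3Z alone) is a proper ideal.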

Theorem 2.6.

is a principal ideal domain.

Proof

The zero ideal is generated by 0. Let 𝔞 be a non-zero ideal of Z and let a be the smallest positive integer contained in 𝔞. We claim that 𝔞 = aZ. Clearly, aZ ⊆ 𝔞. For the converse, take b ∈ 𝔞. We can write b = aq + r, where q and r are the quotient and the remainder of (Euclidean) division of b by a. Now r = b – aq ∈ 𝔞, and since 0 ≤ r < a, by the choice of a we must have r = 0, so that b = aq ∈ aZ.

A very similar argument proves the following theorem. The details are left to the reader. Also see Exercise 2.31.

Theorem 2.7.

If K is a field, then K[X] is a principal ideal domain.

We now prove a very important theorem:

Theorem 2.8. Hilbert’s basis theorem

If R is a Noetherian ring, then so is the polynomial ring R[X1, . . . , Xn] for n ∈ N. In particular, the polynomial rings Z[X1, . . . , Xn] and K[X1, . . . , Xn] are Noetherian, where K is a field.

Proof

Using induction on n, we can reduce to the case n = 1. So we prove that if R is Noetherian, then R[X] is also Noetherian. Let 𝔞 be a non-zero ideal of R[X]. Assume that 𝔞 is not finitely generated. Then we can inductively choose non-zero polynomials f1, f2, f3, . . . from 𝔞 such that for each i ∈ N the polynomial fi is one having the smallest degree in 𝔞 \ 〈f1, . . . , fi–1〉. Let di := deg fi. Then d1 ≤ d2 ≤ d3 ≤ · · ·. Let ai denote the leading coefficient of fi. Consider the ideal 𝔟 := 〈a1, a2, a3, . . .〉 in R. By hypothesis, 𝔟 is finitely generated, say, 𝔟 = 〈a1, . . . , ar〉. This, in particular, implies that ar+1 = u1a1 + · · · + urar for some u1, . . . , ur ∈ R. But then the polynomial fr+1 – (u1X^(dr+1 – d1)f1 + · · · + urX^(dr+1 – dr)fr) belongs to 𝔞 \ 〈f1, . . . , fr〉, is non-zero and has degree < dr+1 (the leading terms cancel), a contradiction to the choice of fr+1. Thus 𝔞 must be finitely generated.

Two particular types of ideals are very important in algebra.

Definition 2.19.

Let R be a ring.

  1. An ideal 𝔭 of R is called a prime ideal, if 𝔭 ≠ R and if ab ∈ 𝔭 implies a ∈ 𝔭 or b ∈ 𝔭 for a, b ∈ R. The second condition is equivalent to saying that if a ∉ 𝔭 and b ∉ 𝔭, then the product ab ∉ 𝔭. For a prime integer p, the principal ideal pZ of Z is prime. On the other hand, for a composite integer n the ideal nZ of Z is not prime. For example, 2 ∉ 6Z and 3 ∉ 6Z, but the product 2 · 3 = 6 ∈ 6Z.

  2. An ideal 𝔪 of R is called a maximal ideal, if 𝔪 ≠ R and if for any ideal 𝔞 satisfying 𝔪 ⊆ 𝔞 ⊆ R we have 𝔞 = 𝔪 or 𝔞 = R. This means that there are no non-unit ideals of R properly containing 𝔪. All the ideals pZ of Z for prime integers p are maximal ideals (Corollary 2.3). Next consider the polynomial ring R := Z[X] and the principal ideal 〈X〉 of R. It is easy to see that 〈X〉 ⊊ 〈X, 2〉 ⊊ R. Thus 〈X〉 is not maximal.

Prime and maximal ideals can be characterized by some nice equivalent criteria. See Proposition 2.9.

Definition 2.20.

Let R be a ring and 𝔞 an ideal of R. Then 𝔞 is a subgroup of the group (R, +). Since (R, +) is Abelian, 𝔞 is a normal subgroup (Definition 2.6). Thus the cosets a + 𝔞, a ∈ R, form an additive Abelian group. We define multiplication on these cosets as (a + 𝔞)(b + 𝔞) := ab + 𝔞. It is easy to check that this multiplication is well-defined. Furthermore, the set of these cosets, denoted R/𝔞, becomes a ring under this addition and multiplication. The ring R/𝔞 is called the quotient ring of R with respect to 𝔞.

We say that two elements a, b ∈ R are congruent modulo an ideal 𝔞 (of R) and write a ≡ b (mod 𝔞), if a – b ∈ 𝔞. Thus a ≡ b (mod 𝔞) if and only if a and b lie in the same coset of 𝔞, that is, a + 𝔞 = b + 𝔞.

Example 2.8.
  1. For any ring R, the quotient ring R/0 is essentially the same as R and the quotient ring R/R is the zero ring.

  2. The ring Zn of Example 2.6 is formally defined to be the quotient ring Z/nZ. Convince yourself that both these definitions are equivalent.

Proposition 2.9.

Let R be a ring and 𝔞 an ideal of R.

  1. 𝔞 is a prime ideal of R if and only if R/𝔞 is an integral domain.

  2. 𝔞 is a maximal ideal of R if and only if R/𝔞 is a field.

Proof

  1. Let a, b ∈ R be arbitrary. Then 𝔞 is prime ⇔ ab ∈ 𝔞 implies a ∈ 𝔞 or b ∈ 𝔞 ⇔ (a + 𝔞)(b + 𝔞) = 0 in R/𝔞 implies a + 𝔞 = 0 or b + 𝔞 = 0 ⇔ R/𝔞 is an integral domain.

  2. Let 𝔞 be a maximal ideal. Choose b ∈ R \ 𝔞. Then b + 𝔞 ≠ 0 in R/𝔞. Consider the ideal 𝔞 + Rb. Since 𝔞 is maximal, we must have 𝔞 + Rb = R. This means that a + cb = 1 for some a ∈ 𝔞 and c ∈ R. Then (c + 𝔞)(b + 𝔞) = cb + 𝔞 = (1 – a) + 𝔞 = 1 + 𝔞, which implies that b + 𝔞 is a unit in R/𝔞. That is, R/𝔞 is a field.

    Conversely, let R/𝔞 be a field. Consider any ideal 𝔟 of R with 𝔞 ⊊ 𝔟 ⊆ R. Choose any b ∈ 𝔟 \ 𝔞. Then b + 𝔞 ≠ 0 in R/𝔞. By hypothesis, there exists c ∈ R such that (c + 𝔞)(b + 𝔞) = 1 + 𝔞, that is, 1 – cb ∈ 𝔞 ⊆ 𝔟. Hence 1 = (1 – cb) + cb ∈ 𝔟, that is, 𝔟 = R.

The last proposition in conjunction with Corollary 2.1 indicates:

Corollary 2.2.

Maximal ideals are prime.

Corollary 2.3.

For every prime p, the quotient ring Z/pZ is a field. In particular, pZ is a maximal ideal of Z.

Proof

Since pZ is a prime ideal of Z, the quotient Z/pZ is an integral domain. But Z/pZ is finite, so by Exercise 2.25 it is a field.

2.4.3. Homomorphisms

Recall how we have defined homomorphisms of groups. In a similar manner, we define homomorphisms of rings. A ring homomorphism is a map from one ring to another, which respects addition, multiplication and the identity element. More precisely:

Definition 2.21.

Let R and S be rings. A map f : R → S is called a (ring) homomorphism, if f(a + b) = f(a) + f(b) and f(ab) = f(a)f(b) for all a, b ∈ R and if f(1) = 1. A homomorphism f : R → S is called an isomorphism, if there exists a homomorphism g : S → R such that g ∘ f = idR and f ∘ g = idS. As in the case of groups, bijectivity of f as a function is both necessary and sufficient for a homomorphism f : R → S to be an isomorphism. If f : R → S is an isomorphism, we write R ≅ S and say that R is isomorphic to S or that R and S are isomorphic.

A homomorphism f : RR is called an endomorphism of R. An automorphism is a bijective endomorphism.

Example 2.9.
  1. For any ring extension R ⊆ S, the canonical inclusion a ↦ a is a homomorphism from R to S. In particular, the identity map on any ring is an automorphism.

  2. Let R be a ring and 𝔞 an ideal of R. The canonical surjection R → R/𝔞 that takes a ↦ a + 𝔞 is a ring homomorphism.

  3. Let R be a ring and let a ∈ R. The map R[X] → R that takes f(X) ↦ f(a) is a ring homomorphism and is called the substitution homomorphism.

  4. The map Z → Z taking n ↦ –n is not a ring homomorphism, since it maps 1 to –1 (and does not satisfy f(ab) = f(a)f(b) for all a, b ∈ Z).

  5. The map C → C that maps z = a + ib to its conjugate z̄ = a – ib is an automorphism of the field C.

Proposition 2.10.

Let f : RS be a ring homomorphism.

  1. If a ∈ R is a unit, then f(a) is a unit in S and f(a–1) = (f(a))–1.

  2. Let 𝔟 be an ideal in S. Then 𝔞 := f–1(𝔟) is an ideal in R. If 𝔟 is prime, then 𝔞 is also prime.

Proof

  1. If ab = 1, then f(a)f(b) = f(ab) = f(1) = 1.

  2. Let 𝔞 := f–1(𝔟). For a, a′ ∈ 𝔞 and r ∈ R, we have f(a – a′) = f(a) – f(a′) ∈ 𝔟 and f(ra) = f(r)f(a) ∈ 𝔟, that is, a – a′ ∈ 𝔞 and ra ∈ 𝔞. Thus 𝔞 is an ideal of R. If 1 ∈ 𝔞, then 1 = f(1) ∈ 𝔟; hence if 𝔟 is proper, so is 𝔞. Now let 𝔟 be prime (in which case 𝔟 and 𝔞 are proper ideals of S and R respectively). If aa′ ∈ 𝔞, then f(a)f(a′) = f(aa′) ∈ 𝔟, so that f(a) ∈ 𝔟 or f(a′) ∈ 𝔟. But then a ∈ 𝔞 or a′ ∈ 𝔞.

The ideal 𝔞 = f–1(𝔟) of the above proposition is called the contraction of 𝔟 and is often denoted by 𝔟c. If R ⊆ S and f is the inclusion homomorphism, then 𝔟c = 𝔟 ∩ R.

Definition 2.22.

Let f : RS be a ring homomorphism. The set is called the kernel of f and is denoted by Ker f. The set is called the image of f and is denoted by f(R) or Im f.

Theorem 2.9. Isomorphism theorem

With the notations of the last definition, Ker f is an ideal of R, Im f is a subring of S and R/ Ker f ≅ Im f.

Proof

Consider the map φ : R/Ker f → Im f that takes a + Ker f ↦ f(a). It is easy to verify that φ is a well-defined ring homomorphism and is bijective. The details are left to the reader. Also see Theorem 2.3.

Definition 2.23.

Two ideals I and J of a ring R are called relatively prime or coprime if I + J = R, that is, if there exist a ∈ I and b ∈ J with a + b = 1.

Theorem 2.10. Chinese remainder theorem (CRT)

Let R be a ring and n ∈ ℕ. Let I1, . . . , In be ideals in R such that for all i, j, i ≠ j, the ideals Ii and Ij are relatively prime. Then R/(I1 ∩ · · · ∩ In) is isomorphic to the direct product R/I1 × · · · × R/In.

Proof

The assertion is obvious for n = 1. So assume that n ≥ 2 and define the map φ : R/(I1 ∩ · · · ∩ In) → R/I1 × · · · × R/In by φ(a + (I1 ∩ · · · ∩ In)) := (a + I1, . . . , a + In) for all a ∈ R. Since I1 ∩ · · · ∩ In ⊆ Ii for all i, the map φ is well-defined. It is easy to see that φ is a ring homomorphism. In order to show that φ is injective, we let φ(a + (I1 ∩ · · · ∩ In)) = (0 + I1, . . . , 0 + In). This means that a + Ii = Ii, that is, a ∈ Ii for all i. Then a ∈ I1 ∩ · · · ∩ In, that is, a + (I1 ∩ · · · ∩ In) is the zero element of R/(I1 ∩ · · · ∩ In). The trickier part is to prove that φ is surjective. Let (a1 + I1, . . . , an + In) ∈ R/I1 × · · · × R/In. Let us consider the ideal Ii + Ji for each i, where Ji := ∏j≠i Ij. For a given i, there exist for each j ≠ i elements αj ∈ Ii and βj ∈ Ij with αj + βj = 1. Multiplying these n – 1 equations shows that we have a δi ∈ Ji such that γi + δi = 1, where γi ∈ Ii. (This shows that Ii + Ji = R for all i.) Now consider the element a := a1δ1 + · · · + anδn. Since δj ∈ Ii for all j ≠ i and δi = 1 – γi, it follows that a ≡ aiδi ≡ ai (mod Ii) for all i, that is, φ(a + (I1 ∩ · · · ∩ In)) = (a1 + I1, . . . , an + In).

In Section 2.5, we will see an interesting application of this theorem. Notice that the injectivity of the map in the last proof does not require the coprimality of the ideals I1, . . . , In; only its surjectivity requires this condition.

2.4.4. Factorization in Rings

Now we introduce the concept of divisibility in a ring. We also discuss an important type of ring known as a unique factorization domain. This study is a natural generalization of that of the rings ℤ and K[X], K a field.

Definition 2.24.

Let R be a ring, a, b ∈ R and p ∈ R. Also let K be a field.

  1. We say that a divides b and write a|b, if there exists an element c ∈ R such that b = ac. If a does not divide b, we write a∤b. In ℤ, for example, –31|899, since 899 = (–31) · (–29). By this definition, any element divides 0, whereas 0 divides no element other than 0.

  2. It is easy to see that a|b and b|a if and only if b = ca for some unit c ∈ R. In that case, we say that a and b are associates of each other. The relation of being associate is an equivalence relation on R (or R \ {0}), as can be easily verified. The only associates of a ∈ ℤ, a ≠ 0, are ±a, since ±1 are the only units in ℤ. Two non-zero polynomials f and g of K[X] are associates if and only if f = αg for some α ∈ K \ {0}.

  3. A non-zero non-unit p ∈ R is called a prime, if p|ab implies either p|a or p|b. One can check easily that p is prime if and only if the principal ideal 〈p〉 = pR is a prime ideal.

  4. A non-zero non-unit p ∈ R is called irreducible, if p = ab implies that either a or b is a unit.

Note that for ℤ the concepts of prime and irreducible elements are the same. This is indeed true for any PID (Proposition 2.12). Thus our conventional definition of a prime integer p > 0 as one which has only 1 and p as (positive) divisors tallies with the definition of irreducible elements above. For the ring K[X], on the other hand, it is more customary to talk about irreducible polynomials instead of prime polynomials; they are the same thing anyway.

Proposition 2.11.

Let R be an integral domain and p ∈ R a prime. Then p is irreducible.

Proof

Let p = ab. Then p|(ab), so that by hypothesis p|a or p|b. If p|a, then a = up for some u ∈ R. Hence p = ab = upb, that is, (1 – ub)p = 0. Since R is an integral domain and p ≠ 0, we have 1 – ub = 0, that is, ub = 1, that is, b is a unit. Similarly, p|b implies that a is a unit.

Proposition 2.12.

Let R be a PID. An element p ∈ R is prime if and only if p is irreducible.

Proof

[if] Let p be irreducible, but not prime. Then there are a, b ∈ R such that a ∉ 〈p〉 and b ∉ 〈p〉, but ab ∈ 〈p〉. Consider the ideal 〈p〉 + 〈a〉 = 〈α〉 (R being a PID). Since p ∈ 〈α〉, we have p = cα for some c ∈ R. By hypothesis, p is irreducible, so that either c or α is a unit. If c is a unit, 〈p〉 = 〈α〉 = 〈p〉 + 〈a〉, that is, a ∈ 〈p〉, a contradiction. So α is a unit. Then 〈p〉 + 〈a〉 = R, which implies that there are elements u, v ∈ R such that up + va = 1. Similarly, there are elements u′, v′ ∈ R such that u′p + v′b = 1. Multiplying these two equations gives (uu′p + uv′b + u′va)p + (vv′)ab = 1. Now ab ∈ 〈p〉, so that ab = wp for some w ∈ R. But then (uu′p + uv′b + u′va + vv′w)p = 1, which shows that p is a unit, a contradiction.

[only if] Immediate from Proposition 2.11.

Definition 2.25.

An integral domain R is called a unique factorization domain, or a UFD in short, if every non-zero element a ∈ R can be written as a product a = up1 · · · pr, where u ∈ R is a unit, r ≥ 0, and p1, . . . , pr are prime elements (not necessarily distinct) of R. Moreover, such a factorization is unique up to permutation of the primes p1, . . . , pr and up to multiplication of the primes by units. This factorization can also be written as a = uq1^(α1) · · · qs^(αs), where u is a unit, q1, . . . , qs are pairwise non-associate primes and αi > 0 for i = 1, . . . , s. Some authors also use the term factorial ring or factorial domain in order to describe a UFD.

If p ∈ R is a prime and a ∈ R, a ≠ 0, then the multiplicity of p in a is the non-negative integer v such that p^v|a, but p^(v+1)∤a. This integer v is denoted by vp(a). It is clear from the definition that for every a ∈ R, a ≠ 0, there exist only finitely many non-associate primes p for which vp(a) > 0.

Proposition 2.13.

Let R be a UFD. An element p ∈ R is prime if and only if p is irreducible.

Proof

The only if part is immediate from Proposition 2.11. For proving the if part, let p = up1 · · · pr (u a unit and the pi primes in R) be irreducible. If r = 0, then p is a unit, a contradiction. If r > 1, then p can be written as the product of the two non-units up1 · · · pr–1 and pr, again a contradiction. So r = 1, that is, p = up1 is a prime.

A classical example of an integral domain that is not a UFD is ℤ[√–5]. In this ring, we have two essentially different factorizations of 6 into irreducible elements: 6 = 2 · 3 = (1 + √–5)(1 – √–5). The failure of irreducible elements to be primes in such rings is a serious defect that is not easy to patch up!

Theorem 2.11.

A PID is a UFD.

Proof

Let R be a PID and a ∈ R \ {0}. We show that a has a factorization of the form a = up1 · · · pr, where u is a unit and p1, . . . , pr are prime elements of R. If a is a unit, we are done. So assume that a =: a0 is a non-unit, so that 〈a0〉 ≠ R. Since 〈a0〉 ≠ R, there is a maximal ideal 〈p1〉 containing 〈a0〉 (Exercise 2.23). Then p1 is a prime that divides a0. Let a0 = a1p1. We have 〈a0〉 ⊊ 〈a1〉. If 〈a1〉 is the unit ideal, we are done. Otherwise we choose as before a prime p2 dividing a1 and with a1 = a2p2 get the ideal 〈a2〉 properly containing 〈a1〉. Repeating this process we can generate a strictly ascending chain 〈a0〉 ⊊ 〈a1〉 ⊊ 〈a2〉 ⊊ · · · of ideals of R. Since R is a PID and hence Noetherian, this process must stop after finitely many steps (Exercise 2.33).

The converse of the above theorem is not necessarily true. For example, the polynomial ring K[X1, . . . , Xn] over a field K is a UFD for every n ∈ ℕ, but not a PID for n ≥ 2.

Divisibility in a UFD can be rephrased in terms of prime factorizations. Let R be a UFD and let the non-zero elements a, b ∈ R have the prime factorizations a = up1^(α1) · · · pr^(αr) and b = u′p1^(β1) · · · pr^(βr) with units u, u′, pairwise non-associate primes p1, . . . , pr and with αi ≥ 0 and βi ≥ 0. Then a|b if and only if αi ≤ βi for all i = 1, . . . , r. This notion leads to the following definitions.

Definition 2.26.

Let R be a UFD and let a, b ∈ R \ {0} have prime factorizations as in the last paragraph. Any associate of p1^(min(α1,β1)) · · · pr^(min(αr,βr)) is called a greatest common divisor of a and b and is denoted by gcd(a, b). Clearly, gcd(a, b) is unique up to multiplication by units of R. Similarly, any associate of p1^(max(α1,β1)) · · · pr^(max(αr,βr)) is called a least common multiple of a and b and is denoted by lcm(a, b). lcm(a, b) is again unique up to multiplication by units of R. The gcd of a ≠ 0 and 0 is taken to be an associate of a, whereas gcd(0, 0) is undefined. On the other hand, lcm(a, 0) is defined to be 0 for any a ∈ R.

It is clear that these definitions of gcd and lcm can be readily generalized for any arbitrary finite number of elements.

Corollary 2.4.

Let R be a UFD and a, b ∈ R, not both zero. Then gcd(a, b) · lcm(a, b) is an associate of ab.

Proof

Immediate from the definitions.

Corollary 2.5.

Let R be a UFD and a, b, c ∈ R with a|bc. If gcd(a, c) = 1, then a|b.

Proof

Consider the prime factorizations of a, b and c.

For a PID, the gcd and lcm have equivalent characterizations.

Proposition 2.14.

Let R be a PID and a, b be non-zero elements of R. Let d be a gcd of a and b. Then 〈d〉 = 〈a〉 + 〈b〉. If f is an lcm of a and b, then 〈f〉 = 〈a〉 ∩ 〈b〉.

Proof

Let 〈a〉 + 〈b〉 = 〈c〉. We show that c and d are associates. There exist u, v ∈ R such that ua + vb = c. Since d|a and d|b, we have d|c. On the other hand, 〈a〉 ⊆ 〈c〉, so that c|a. Similarly, c|b. Considering the prime factorizations of a and b one can then readily verify that c|d. The proof for the second part is similar and is left to the reader.

A direct corollary to the last proposition is the following.

Corollary 2.6.

Let R be a PID, a, b ∈ R (not both zero) and d a gcd of a and b. Then there are elements u, v ∈ R such that ua + vb = d. In particular, the ideals 〈a〉 and 〈b〉 are relatively prime if and only if gcd(a, b) is a unit. In that case, we also say that the elements a and b are relatively prime or coprime.

This completes our short survey of factorization in rings. Note that ℤ and K[X] (for a field K) are PIDs and hence UFDs. Thus all the results we have proved in this section apply equally well to both these rings. It is because of this (and not mere coincidence) that these two rings enjoy many common properties. Thus our abstract treatment saves us from the duplicate effort of proving the same results once for integers (Section 2.5) and once more for polynomials (Section 2.6).

Exercise Set 2.4

2.21For a non-zero ring R, prove the following assertions:
  1. A unit of R is not a zero-divisor.

  2. The product of two units of R is again a unit.

  3. The product of two non-units of R is again a non-unit.

  4. The element 0 is not a unit in R.

  5. The element 1 is always a unit in R.

  6. If a is a unit and ab = ac, then b = c.

Let K be a field. What are the units in the polynomial ring K[X]? In K[X1, . . . , Xn]? In the ring K(X) of rational functions? In K(X1, . . . , Xn)?

2.22

Binomial theorem Let R be a ring, a, b ∈ R and n ∈ ℕ. Show that

(a + b)^n = C(n, 0)a^n + C(n, 1)a^(n–1)b + · · · + C(n, n)b^n,

where

C(n, i) = n!/(i!(n – i)!), 0 ≤ i ≤ n,

are the binomial coefficients.

2.23Show that every non-zero ring has a maximal (and hence prime) ideal. More generally, show that every non-unit ideal of a non-zero ring is contained in a maximal ideal. [H]
2.24Let R be a ring.
  1. Show that the set of all nilpotent elements of R is an ideal of R. This ideal, denoted here by nil(R), is called the nilradical of R.

  2. Show that the quotient ring R/nil(R) has no non-zero nilpotent elements. (The ring R/nil(R) is called the reduction of R and is often written as Rred. If nil(R) = 0, then we say that R is reduced. Thus Rred is always reduced.)

  3. Show that the nilradical of R is the intersection of the prime ideals of R. [H]

2.25Show that a finite integral domain R is a field. [H]
2.26Let R be a ring of characteristic 0. Show that:
  1. R contains infinitely many elements.

  2. If R is an integral domain, then R contains as subring an isomorphic copy of ℤ.

  3. If R is a field, then R contains as subfield an isomorphic copy of ℚ.

2.27Let f : R → S be a ring homomorphism and let I and J be ideals in R and S respectively. Find examples to corroborate the following statements.
  1. Let a ∈ R be such that f(a) is a unit in S. Then a need not be a unit in R.

  2. The set f(I) need not be an ideal of S.

  3. Even if f is injective and J is maximal, the contraction f⁻¹(J) need not be maximal.

2.28Let K be a field.
  1. Show that a homomorphism from K to any non-zero ring is injective.

  2. Let L be another field and let f : KL and g : LK be homomorphisms such that g ο f = idK. Show that f and g are isomorphisms.

2.29
  1. Show that a ring R is an integral domain if and only if 0 is a prime ideal of R.

  2. Give an example of a reduced ring that is not an integral domain. (Note that an integral domain is always reduced.)

2.30Let R be a ring and let I and J be ideals of R with I ⊆ J. Show that J/I is an ideal of R/I and that (R/I)/(J/I) ≅ R/J. [H]
2.31An integral domain R is called a Euclidean domain (ED) if there is a map ν : R \ {0} → ℕ ∪ {0} satisfying the following two conditions:
  1. ν(a) ≤ ν(ab) for all a, b ∈ R \ {0}.

  2. For every a, b ∈ R with b ≠ 0, there exist (not necessarily unique) q, r ∈ R such that a = qb + r with r = 0 or ν(r) < ν(b).

Show that:

  1. ℤ is a Euclidean domain with ν(a) = |a| for a ≠ 0.

  2. The polynomial ring K[X] over a field K is a Euclidean domain with ν(a) = deg a for a ≠ 0.

  3. For d = –2, –1, 2, 3, the ring

    ℤ[√d] := {a + b√d | a, b ∈ ℤ}

    is a Euclidean domain with ν(a + b√d) := |a² – db²|, a, b ∈ ℤ, not both 0.

  4. A Euclidean domain is a PID (and hence a UFD).

2.32Let R be a ring and I an ideal. Consider the set

√I := {a ∈ R | a^n ∈ I for some n ∈ ℕ}.

Show that √I is an ideal of R. It is called the radical or root of I. If √I = I, then I is called a radical or a root ideal. For arbitrary ideals I and J of R, prove the following assertions.

  1. I ⊆ √I.

  2. √(√I) = √I.

  3. If I ⊆ J, then √I ⊆ √J.

  4. If I is a prime ideal, then √I = I.

  5. √I = R if and only if I = R.

  6. √(I ∩ J) = √I ∩ √J.

  7. √(I + J) = √(√I + √J).

  8. The nilradical nil(R) = √〈0〉.

2.33Let R be a ring. An ascending chain of ideals is a sequence I1 ⊆ I2 ⊆ I3 ⊆ · · · of ideals of R. The ascending chain is called stationary, if there is some n0 ∈ ℕ such that In = In0 for all n ≥ n0. Show that the following conditions are equivalent. [H]
  1. R is Noetherian (that is, every ideal of R is finitely generated).

  2. Every ascending chain of ideals in R is stationary.

  3. Every non-empty set of ideals of R has a maximal element.

2.34
  1. Let R be an integral domain. Define the set S := R × (R \ {0}). Define a relation ~ on S as (a, b) ~ (c, d) if and only if ad = bc. Show that ~ is an equivalence relation on S. Let us denote the equivalence class of (a, b) ∈ S by a/b and the set of all equivalence classes of S under ~ by K.

  2. Now define (a/b)+(c/d) := (ad+bc)/(bd) and (a/b)·(c/d) := (ac)/(bd). Show that these definitions make K a field. This field is called the quotient field of R and is denoted as Q(R). This process resembles the formation of rational numbers from the integers. Indeed, Q(ℤ) = ℚ.

2.5. Integers

The set ℤ of integers is the main object of study in this section. We use many results from the previous sections to derive properties of integers. Recall that ℤ is a PID and hence a UFD.

2.5.1. Divisibility

The notions of divisibility, prime and relatively prime integers, gcd and lcm of integers are essentially the same as discussed in connection with a PID or a UFD. We avoid repeating the definitions here, but concentrate on other useful properties of integers, not covered so far. We only mention that whenever we talk about a prime integer, or the gcd or lcm of two or more integers, we will usually refer to a non-negative integer. This convention makes primes, gcds and lcms unique.

Theorem 2.12.

There are infinitely many prime integers.

Proof

Let n ∈ ℕ be arbitrary and let p1, p2, . . . , pn be n distinct primes. The (non-zero non-unit) integer q := p1p2 · · · pn + 1 is divisible by none of p1, . . . , pn and hence must have a prime divisor pn+1 different from p1, . . . , pn. The result then follows by induction on n (and from the fact that the set of primes is non-empty).

Theorem 2.13.

For an integer a and an integer b ≠ 0, there exist unique integers q and r such that a = qb + r with 0 ≤ r < |b|.

Proof

Let r be the smallest non-negative element in the set {a – cb | c ∈ ℤ} and let q be the corresponding value of c. Then these integers q and r satisfy the desired properties. To prove the uniqueness, let a = q1b + r1 = q2b + r2, where 0 ≤ r1 < |b| and 0 ≤ r2 < |b|. But then (q2 – q1)b = r1 – r2 with –|b| < r1 – r2 < |b|. Since b|(r1 – r2), we must then have r1 – r2 = 0, that is, r1 = r2, which, in turn, implies that q1 = q2.
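The existence half of this proof is effectively an algorithm. A small Python sketch (the function name and the reliance on Python's divmod are our own choices, not from the text) that computes the quotient and remainder with 0 ≤ r < |b| for any non-zero divisor b:

```python
def euclidean_division(a, b):
    """Return (q, r) with a == q*b + r and 0 <= r < |b| (Theorem 2.13)."""
    assert b != 0
    q, r = divmod(a, b)     # Python: r has the sign of b, so r may be negative
    if r < 0:               # happens only for b < 0; shift r into [0, |b|)
        q, r = q + 1, r - b
    return q, r
```

For b > 0 this coincides with Python's built-in divmod; for b < 0 the remainder is shifted to stay non-negative, matching the normalization 0 ≤ a rem b < |b| in the theorem.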

The integers q and r in the above theorem are respectively called the quotient and the remainder of Euclidean division of a by b and are denoted respectively by a quot b and a rem b. Do not confuse Euclidean division with division in the sense of the inverse of multiplication. Euclidean division is the basis of the Euclidean gcd algorithm. More specifically:

Proposition 2.15.

For integers a, b with b ≠ 0, let r be the remainder of Euclidean division of a by b. Then gcd(a, b) = gcd(b, r).

Proof

Clearly, 〈a〉 + 〈b〉 = 〈r〉 + 〈b〉. Now use Proposition 2.14.

Proposition 2.16.

Let a and b be two integers, not both zero, and let d be the (positive) gcd of a and b. Then there are integers u and v such that d = ua + vb. (Such an equality is called a Bézout relation.) Furthermore, if a and b are both non-zero and (|a|, |b|) ≠ (1, 1), then u and v can be so chosen that |u| < |b| and |v| < |a|.

Proof

The existence of u and v follows immediately from Proposition 2.14. If a = qb, then u = 0 and v = 1 is a suitable choice. So assume that a∤b and b∤a, in which case d < |a| and d < |b|. We may assume, without loss of generality, that a and b are positive. First note that if (u, v) satisfies the Bézout relation, then for any k ∈ ℤ the pair (u + kb, v – ka) also satisfies the same relation. So we may replace v by its remainder of Euclidean division by a and may assume |v| < a. But then |ua| – b < |ua| – d ≤ |ua – d| = |vb| ≤ (a – 1)b, which implies |u| < b.
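Proposition 2.16 is usually made effective by the extended Euclidean algorithm, which computes d, u and v simultaneously. A minimal Python sketch (names are ours); it maintains the stated invariant throughout the loop:

```python
def extended_gcd(a, b):
    """Return (d, u, v) with d = gcd(a, b) >= 0 and u*a + v*b == d
    (a Bezout relation, as in Proposition 2.16)."""
    old_r, r = a, b
    old_u, u = 1, 0
    old_v, v = 0, 1
    while r != 0:
        # Invariant: old_u*a + old_v*b == old_r and u*a + v*b == r.
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_u, u = u, old_u - q * u
        old_v, v = v, old_v - q * v
    if old_r < 0:           # normalize so the reported gcd is non-negative
        old_r, old_u, old_v = -old_r, -old_u, -old_v
    return old_r, old_u, old_v
```

Each iteration performs one Euclidean division, so the run time is that of the ordinary Euclidean gcd algorithm.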

The notions of the gcd and of the Bézout relation can be generalized to any finite number of integers a1, . . . , an as

gcd(a1, . . . , an) = gcd(· · · (gcd(gcd(a1, a2), a3) · · ·), an) = u1a1 + · · · + unan

for some integers u1, . . . , un (provided that all the gcds mentioned are defined).

2.5.2. Congruences

Since ℤ is a PID, congruence modulo a non-zero ideal nℤ of ℤ can be rephrased in terms of congruence modulo a positive integer n as follows.

Definition 2.27.

Let n ∈ ℕ. Two integers a and b are said to be congruent modulo n, denoted a ≡ b (mod n), if n|(a – b), that is, if the remainders of Euclidean division of a and b by n are the same. In terms of ideals, this is the same as a ≡ b (mod 〈n〉) (See Definition 2.20). Congruence is an equivalence relation on ℤ, the equivalence classes being the cosets of the ideal nℤ = 〈n〉 of ℤ.

By an abuse of notation, we often denote the equivalence class [a] of a ∈ ℤ simply by a. The following are some basic properties of congruent integers.

Proposition 2.17.

Let n ∈ ℕ, a ≡ b (mod n) and c ≡ d (mod n). Then:

  1. a ± cb ± d (mod n).

  2. acbd (mod n).

  3. For any polynomial f(X) ∈ ℤ[X], we have f(a) ≡ f(b) (mod n).

  4. If n′|n, then ab (mod n′).

  5. If m|a and m|b, then a/mb/m (mod n/ gcd(n, m)).

Proof

(1) and (2) follow from the consideration of the quotient ring ℤ/nℤ. (3) follows from repeated applications of (1) and (2). For the proof of (4), consider a – b = kn and n = k′n′ for k, k′ ∈ ℤ. For proving (5), take a – b = kn = lm. Then m/gcd(n, m) divides k(n/gcd(n, m)). Since m/gcd(n, m) and n/gcd(n, m) are coprime, by Corollary 2.5 l′ := k/(m/gcd(n, m)) is an integer and we have a/m – b/m = l = kn/m = l′(n/gcd(n, m)).

Let n1, . . . , nr ∈ ℕ with gcd(ni, nj) = 1 for i ≠ j. Then lcm(n1, . . . , nr) = n1 · · · nr, and by the Chinese remainder theorem (Theorem 2.10), we have

ℤ/(n1 · · · nr)ℤ ≅ (ℤ/n1ℤ) × · · · × (ℤ/nrℤ).
This implies that, given integers a1, . . . , ar, there exists an integer x unique modulo n1 · · · nr such that x satisfies the following congruences simultaneously:

xa1 (mod n1)
xa2 (mod n2)
  
xar (mod nr)

We now give a procedure for constructing the integer x explicitly. Define N := n1 · · · nr and Ni := N/ni for 1 ≤ i ≤ r. Then for each i we have gcd(ni, Ni) = 1 and, therefore, there are integers ui and vi with uini + viNi = 1. Then x ≡ a1v1N1 + · · · + arvrNr (mod N) is the desired solution.
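This construction translates directly into code. A Python sketch (names are ours) that follows the recipe above, using Python's built-in modular inverse in place of an explicit Bézout computation for the coefficients vi:

```python
from math import gcd

def crt(residues, moduli):
    """Given pairwise coprime moduli n_i and targets a_i, return the unique
    x modulo N = n_1 * ... * n_r with x ≡ a_i (mod n_i).  Follows the text:
    N_i = N/n_i and v_i*N_i ≡ 1 (mod n_i) give x = sum(a_i*v_i*N_i) mod N."""
    assert all(gcd(m, n) == 1
               for i, m in enumerate(moduli) for n in moduli[i + 1:])
    N = 1
    for n in moduli:
        N *= n
    x = 0
    for a, n in zip(residues, moduli):
        Ni = N // n
        vi = pow(Ni, -1, n)          # modular inverse (Python 3.8+)
        x += a * vi * Ni
    return x % N
```

The coprimality assertion mirrors the hypothesis of Theorem 2.10; without it the system may be unsolvable or the solution non-unique.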

Let n ∈ ℕ. We now study the multiplicative group ℤn* of the ring ℤn = ℤ/nℤ. We say that an integer a has a multiplicative inverse modulo n, if [a] ∈ ℤn*, or, equivalently, if there is an integer b with ab ≡ 1 (mod n). The following proposition is an important characterization of the elements of ℤn*.

Proposition 2.18.

(The equivalence class of) an integer a belongs to ℤn* if and only if gcd(a, n) = 1.

Proof

[if] By Proposition 2.16, there exist integers u and v such that ua + vn = 1. But then ua ≡ 1 (mod n).

[only if] If ua ≡ 1 (mod n) for some integer u, then ua + vn = 1 for some integer v, which implies that the gcd of a and n divides 1 and hence is equal to 1.

Definition 2.28.

The cardinality of ℤn* is denoted by φ(n). By Proposition 2.18, φ(n) is equal to the number of integers between 0 and n – 1 (both inclusive), which are relatively prime to n. The function φ : ℕ → ℕ is called Euler’s totient function. For example, for a prime p we have ℤp* = ℤp \ {0}, so φ(p) = p – 1.

The following two theorems are immediate consequences of Proposition 2.4.

Theorem 2.14. Euler’s theorem

Let n ∈ ℕ and a ∈ ℤ with gcd(a, n) = 1. Then

a^φ(n) ≡ 1 (mod n).

Theorem 2.15. Fermat’s little theorem

Let p be a prime and a ∈ ℤ with gcd(a, p) = 1. Then

a^(p–1) ≡ 1 (mod p).

For any integer b ∈ ℤ, one has b^p ≡ b (mod p).
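Both congruences are easy to verify numerically with fast modular exponentiation (Python's three-argument pow); the brute-force totient helper and the sample values below are our own choices, not from the text:

```python
from math import gcd

def phi(n):
    """Brute-force Euler totient: count residues in [0, n) coprime to n."""
    return sum(1 for a in range(n) if gcd(a, n) == 1)

# Euler's theorem: a^phi(n) ≡ 1 (mod n) whenever gcd(a, n) = 1.
n, a = 15, 7
assert gcd(a, n) == 1 and pow(a, phi(n), n) == 1   # phi(15) = 8

# Fermat's little theorem: b^p ≡ b (mod p) for every integer b.
p = 13
assert all(pow(b, p, p) == b % p for b in range(-3, 20))
```

This brute-force phi is only for checking small cases; an efficient formula follows from Proposition 2.20 below.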

Theorem 2.16. Wilson’s theorem

For every prime p, we have (p – 1)! ≡ –1 (mod p).

Proof

The result holds for p = 2. So assume that p is an odd prime. Since ℤp is a field, Fermat’s little theorem gives the factorization

Equation 2.1

X^(p–1) – 1 ≡ (X – 1)(X – 2) · · · (X – (p – 1)) (mod p).

Looking at the constant terms on the two sides proves Wilson’s theorem.

The structure of the group ℤp*, p a prime, can be easily deduced from Fermat’s little theorem. This gives us the following important result.

Proposition 2.19.

For a prime p, the group ℤp* is cyclic.

Proof

For every divisor d of p – 1, we have X^(p–1) – 1 = (X^d – 1)f(X) for some f(X) ∈ ℤ[X] with deg f = p – 1 – d. By Congruence 2.1, X^(p–1) – 1 has p – 1 roots modulo p. Since ℤp is a field, f(X) (mod p) cannot have more than p – 1 – d roots (Proposition 2.25) and it follows that X^d – 1 has exactly d roots modulo p. In particular, if d = q^e for some prime q and e ∈ ℕ, then there exist exactly q^e elements of ℤp* of orders dividing q^e and exactly q^(e–1) elements of ℤp* of orders dividing q^(e–1), that is, there are q^e – q^(e–1) > 0 elements of ℤp* of order q^e. If p – 1 = q1^(e1) · · · qr^(er) is the canonical prime factorization of p – 1 (with each ei ≥ 1), by the above argument there exists an element ai of order qi^(ei) for each i = 1, . . . , r. It is now easy to check that a1 · · · ar has order q1^(e1) · · · qr^(er) = p – 1.

Euler’s totient function plays an extremely important role in number theory (and cryptology). We now describe a method for computing it.

Lemma 2.2.

If n and n′ are relatively prime positive integers, then φ(nn′) = φ(n)φ(n′).

Proof

If a is invertible modulo nn′, then clearly it is invertible modulo both n and n′. Conversely, if ua ≡ 1 (mod n) and u′a′ ≡ 1 (mod n′), then by the Chinese remainder theorem there are integers x and α, unique modulo nn′, satisfying x ≡ u (mod n), x ≡ u′ (mod n′), α ≡ a (mod n) and α ≡ a′ (mod n′). But then xα ≡ 1 (mod nn′). Therefore, ℤnn′* ≅ ℤn* × ℤn′*, whence the lemma follows.

Lemma 2.3.

If p is a prime and e ∈ ℕ, then φ(p^e) = p^e – p^(e–1) = p^e(1 – 1/p).

Proof

The integers between 0 and p^e – 1 that are relatively prime to p^e are precisely those that are not multiples of p, and there are p^e – p^(e–1) of them.

Proposition 2.20.

Let n = p1^(e1) · · · pr^(er) be the prime factorization of a positive integer n, with pairwise distinct primes p1, . . . , pr and with ei > 0. Then

φ(n) = n(1 – 1/p1) · · · (1 – 1/pr).
Proof

Immediate from Lemmas 2.2 and 2.3.
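Proposition 2.20 yields the standard way to compute φ(n) once the factorization of n is known. The sketch below (name ours) finds the prime factors by trial division, which is fine for small n but infeasible at cryptographic sizes:

```python
def euler_phi(n):
    """Compute phi(n) via Proposition 2.20: phi(n) = n * prod(1 - 1/p)
    over the distinct primes p dividing n, found by trial division."""
    result = n
    p = 2
    while p * p <= n:
        if n % p == 0:
            result -= result // p      # multiply result by (1 - 1/p) exactly
            while n % p == 0:          # strip all copies of this prime
                n //= p
        p += 1
    if n > 1:                          # leftover prime factor > sqrt(original n)
        result -= result // n
    return result
```

For example, 360 = 2³ · 3² · 5 gives euler_phi(360) = 360 · (1/2)(2/3)(4/5) = 96. The hardness of computing φ(n) without knowing the factors of n is what RSA-style systems rely on.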

By Proposition 2.18, the linear congruence ax ≡ 1 (mod n) is solvable for x if and only if gcd(a, n) = 1. In such a case, the solution is unique modulo n. Now, let us concentrate on the solutions of the general linear congruence:

axb (mod n).

Theorem 2.17 characterizes the solutions of this congruence.

Theorem 2.17.

Let d := gcd(a, n). Then the congruence axb (mod n) is solvable for x if and only if d|b. A solution of the congruence, if existent, is unique modulo n/d.

Proof

[if] By Proposition 2.17, (a/d)xb/d (mod n/d). Since gcd(a/d, n/d) = 1, the congruence (a/d)x′ ≡ 1 (mod n/d) is solvable for x′. Then a solution for x is x ≡ (b/d)x′ (mod n/d).

[only if] There exists an integer k such that ax + kn = b. This shows that d|b.

To prove the uniqueness let x and x′ be two integers satisfying the given congruence. But then a(xx′) ≡ 0 (mod n), that is, (a/d)(xx′) ≡ 0 (mod n/d), that is, xx′ ≡ 0 (mod n/d), since gcd(a/d, n/d) = 1.

The last theorem implies that if d|b, then the congruence axb (mod n) has d solutions modulo n. These solutions are given by ξ + r(n/d), r = 0, . . . , d – 1, where ξ is the solution modulo n/d of the congruence (a/d)ξ ≡ b/d (mod n/d).
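Theorem 2.17 and this remark can be packaged into a small solver. A Python sketch (name ours), assuming Python 3.8+ for the modular inverse via pow:

```python
from math import gcd

def solve_linear_congruence(a, b, n):
    """All solutions of a*x ≡ b (mod n) in {0, ..., n-1} (Theorem 2.17):
    no solution unless d = gcd(a, n) divides b, else exactly d solutions."""
    d = gcd(a, n)
    if b % d != 0:
        return []
    a0, b0, m = a // d, b // d, n // d
    xi = (pow(a0, -1, m) * b0) % m        # unique solution modulo n/d
    return [xi + r * m for r in range(d)] # the d solutions modulo n
```

For instance, 6x ≡ 4 (mod 10) has d = gcd(6, 10) = 2 solutions, x ≡ 4 and x ≡ 9 (mod 10).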

2.5.3. Quadratic Residues

In this section, we consider quadratic congruences, that is, congruences of the form ax² + bx + c ≡ 0 (mod n). We start with the simple case n = p, a prime. We assume further that p is odd, so that 2 has a multiplicative inverse mod p. Since we are considering quadratic congruences, we are interested only in those integers a for which gcd(a, p) = 1. In that case, a also has a multiplicative inverse mod p and the above congruence can be written as y² ≡ α (mod p), where y ≡ x + b(2a)⁻¹ (mod p) and α ≡ b²(4a²)⁻¹ – ca⁻¹ (mod p). This motivates us to provide Definition 2.29.

Definition 2.29.

Let p be an odd prime and a an integer with gcd(a, p) = 1. We say that a is a quadratic residue modulo p, if the congruence x2a (mod p) has a solution (for x). Otherwise we say that a is a quadratic non-residue modulo p.

If a is a quadratic residue modulo an odd prime p, then the equation x2a (mod p) has exactly two solutions. If ξ is one solution, the other solution is p – ξ. It is, therefore, evident that there are exactly (p – 1)/2 quadratic residues and exactly (p – 1)/2 quadratic non-residues modulo p. For example, the quadratic residues modulo p = 11 are 1 = 12 = 102, 3 = 52 = 62, 4 = 22 = 92, 5 = 42 = 72 and 9 = 32 = 82. The quadratic non-residues modulo 11 are, therefore, 2, 6, 7, 8 and 10. We treat 0 neither as a quadratic residue nor as a quadratic non-residue.

Definition 2.30.

Let p be an odd prime and a an integer with gcd(a, p) = 1. The Legendre symbol, written here inline as (a/p), is defined as:

(a/p) := +1, if a is a quadratic residue modulo p, and (a/p) := –1, if a is a quadratic non-residue modulo p.
Proposition 2.21.

Let p be an odd prime and a and b integers coprime to p.

  1. Euler’s criterion: (a/p) ≡ a^((p–1)/2) (mod p).

  2. (ab/p) = (a/p)(b/p).

  3. (1/p) = 1, (a²/p) = 1, and (–1/p) = (–1)^((p–1)/2).

  4. If a ≡ b (mod p), then (a/p) = (b/p). In particular, if r is the remainder of Euclidean division of a by p, then (a/p) = (r/p).

Proof

If a is a quadratic residue modulo p, then a ≡ b² (mod p) for some integer b (coprime to p) and by Fermat’s little theorem we have a^((p–1)/2) ≡ b^(p–1) ≡ 1 (mod p). Conversely, the polynomial X^(p–1) – 1 = (X^((p–1)/2) – 1)(X^((p–1)/2) + 1) has p – 1 (distinct) roots mod p (again by Fermat’s little theorem). We have just seen that no quadratic residue is a root of X^((p–1)/2) + 1. Since ℤp is a field, the (p – 1)/2 roots of X^((p–1)/2) – 1 are precisely all the quadratic residues modulo p. This proves Euler’s criterion. The other statements are immediate consequences of this.

Euler’s criterion gives us a nice way to check if a given integer is a quadratic residue modulo an odd prime. While this is much faster than the brute-force strategy of enumerating all the quadratic residues, it is still not the best solution, because it involves a modular exponentiation. We can, however, employ a gcd-like procedure for a faster computation. The development of this method demands further results which are otherwise interesting in themselves as well. The first important result is known as the law of quadratic reciprocity (Theorem 2.18 below). Gauss was the first to prove it and he deemed the result so important that he gave eight proofs for it. At present about two hundred published proofs of this law exist in the literature. We go in the classical way, that is, the Gaussian way, because the proof, though somewhat long, is elementary.
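As a baseline, Euler's criterion itself is a one-line computation with fast modular exponentiation; a Python sketch (name ours):

```python
def legendre(a, p):
    """Legendre symbol (a/p) for an odd prime p via Euler's criterion:
    a^((p-1)/2) mod p equals 1 for residues and p-1 for non-residues."""
    t = pow(a, (p - 1) // 2, p)
    return -1 if t == p - 1 else t   # returns 0 when p divides a
```

For p = 11 this reproduces the list of quadratic residues {1, 3, 4, 5, 9} given earlier. The gcd-like procedure developed below avoids the modular exponentiation altogether.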

Lemma 2.4. Gauss

Let p be an odd prime and a an integer with gcd(a, p) = 1. Let us denote t := (p – 1)/2. For an integer i, 1 ≤ i ≤ t, let ri be the unique integer with ri ≡ ia (mod p) and –t ≤ ri ≤ t. Let n be the number of i, 1 ≤ i ≤ t, for which ri is negative. Then (a/p) = (–1)^n.

Proof

It is easy to check that ri ≢ ±rj (mod p) for all i ≠ j with 1 ≤ i, j ≤ t. Thus |ri|, i = 1, . . . , t, are precisely (a permuted version of) the integers 1, . . . , t. Thus a^t · t! ≡ (a)(2a) · · · (ta) ≡ r1r2 · · · rt ≡ (–1)^n t! (mod p). Canceling t! and using Proposition 2.21(1) gives the desired result.

Definition 2.31.

Let x ∈ ℝ. The largest integer smaller than or equal to x is called the floor of x and is denoted by ⌊x⌋. Similarly, the smallest integer larger than or equal to x is called the ceiling of x and is denoted by ⌈x⌉.

Corollary 2.7.

With the notations of Lemma 2.4 we have n ≡ ⌊2a/p⌋ + ⌊4a/p⌋ + · · · + ⌊2ta/p⌋ (mod 2). If a is odd, then n ≡ ⌊a/p⌋ + ⌊2a/p⌋ + · · · + ⌊ta/p⌋ (mod 2). In particular, (2/p) = (–1)^((p²–1)/8), that is, 2 is a quadratic residue mod p if and only if p ≡ ±1 (mod 8).

Proof

For 1 ≤ j ≤ t, write ja = p⌊ja/p⌋ + sj with 0 < sj < p, so that rj = sj if sj < p/2 and rj = sj – p otherwise. Then ⌊2ja/p⌋ = 2⌊ja/p⌋ + ⌊2sj/p⌋. Since 2⌊ja/p⌋ is even, it follows that if rj > 0, then ⌊2ja/p⌋ is even, and if rj < 0, then ⌊2ja/p⌋ is odd. Therefore, n ≡ ⌊2a/p⌋ + ⌊4a/p⌋ + · · · + ⌊2ta/p⌋ (mod 2).

If a is odd, p + a is even. Also 4 is a quadratic residue modulo p. So, using 4 · ((p + a)/2) ≡ 2a (mod p), we get (2/p)(a/p) = (4/p)(((p + a)/2)/p) = (–1)^e, where, by the first part, e ≡ ⌊(p + a)/p⌋ + ⌊2(p + a)/p⌋ + · · · + ⌊t(p + a)/p⌋ = (1 + ⌊a/p⌋) + (2 + ⌊2a/p⌋) + · · · + (t + ⌊ta/p⌋) ≡ t(t + 1)/2 + (⌊a/p⌋ + · · · + ⌊ta/p⌋) (mod 2). Putting a = 1 gives (2/p) = (–1)^(t(t+1)/2) = (–1)^((p²–1)/8) and, therefore, (a/p) = (–1)^(⌊a/p⌋ + · · · + ⌊ta/p⌋), that is, n ≡ ⌊a/p⌋ + ⌊2a/p⌋ + · · · + ⌊ta/p⌋ (mod 2).

Theorem 2.18. Law of quadratic reciprocity

Let p and q be distinct odd primes. Then (p/q)(q/p) = (–1)^(((p–1)/2)((q–1)/2)).

Proof

By Corollary 2.7, (q/p) = (–1)^n and (p/q) = (–1)^m, where n := ⌊q/p⌋ + ⌊2q/p⌋ + · · · + ⌊tq/p⌋, m := ⌊p/q⌋ + ⌊2p/q⌋ + · · · + ⌊sp/q⌋, s = (q – 1)/2 and t = (p – 1)/2. So we are done, if we can show that m + n = st. Consider the set S := {(x, y) | 1 ≤ x ≤ s, 1 ≤ y ≤ t} of cardinality st. Now S is the disjoint union of S1 and S2, where S1 := {(x, y) ∈ S | qy < px} and S2 := {(x, y) ∈ S | qy > px}. (Note that we cannot have px = qy.) It is easy to see that #S1 = m and #S2 = n.

To demonstrate how we can use the results deduced so far, let us compute (360/997). Since 360 = 2³ · 3² · 5, we have

(360/997) = (2/997)³(3/997)²(5/997) = (2/997)(5/997) = (–1)(–1) = 1,

since 997 ≡ 5 (mod 8) gives (2/997) = –1, and quadratic reciprocity gives (5/997) = (997/5) = (2/5) = –1.

Thus 360 is a quadratic residue modulo 997. The apparent attractiveness of this method is offset by the fact that it demands the factorization of several integers and as such does not lead to a practical algorithm. We indeed need further machinery in order to obtain an efficient algorithm. First, we define a generalization of the Legendre symbol.

Definition 2.32.

Let a, b be integers with b > 0 and odd. We define the Jacobi symbol, written here inline as (a/b), as

(a/b) := 0 if gcd(a, b) > 1, (a/b) := 1 if b = 1, and (a/b) := (a/p1)(a/p2) · · · (a/pt) otherwise,

where, in the last case, p1, . . . , pt are all the prime factors of b (not necessarily all distinct), that is, b = p1 · · · pt, and each (a/pi) is a Legendre symbol.

Note that if (a/b) = –1, then a is not a quadratic residue mod b. However, the converse is not always true, that is, (a/b) = 1 does not necessarily imply that a is a quadratic residue modulo b (Example: a = 2 and b = 9). Of course, if b is an odd prime and if gcd(a, b) = 1, the Legendre and Jacobi symbols coincide in value and meaning.

The Jacobi symbol enjoys many properties similar to the Legendre symbol.

Proposition 2.22.

For integers a, a′ and positive odd integers b, b′, we have:

  1. (aa′/b) = (a/b)(a′/b),

  2. (a/bb′) = (a/b)(a/b′), and

  3. if a ≡ a′ (mod b), then (a/b) = (a′/b). In particular, if r is the remainder of Euclidean division of a by b, then (a/b) = (r/b).

Proof

Immediate from the definition and Proposition 2.21.

Theorem 2.19.
  1. For an odd positive integer b,

    (–1/b) = (–1)^((b–1)/2) and (2/b) = (–1)^((b²–1)/8).

  2. If a is another odd positive integer with gcd(a, b) = 1, then

    (a/b)(b/a) = (–1)^(((a–1)/2)((b–1)/2)).
Proof

  1. Let b = p1 · · · ps, where the pi are odd primes (not necessarily distinct). Then by definition (–1/b) = (–1/p1) · · · (–1/ps) = (–1)^m, where m = (p1 – 1)/2 + · · · + (ps – 1)/2. Now for odd integers x and y one has (xy – 1)/2 ≡ (x – 1)/2 + (y – 1)/2 (mod 2). Repeated applications of this prove that m ≡ (b – 1)/2 (mod 2). To prove that (2/b) = (–1)^((b²–1)/8), we proceed in a similar manner and note that for odd integers x and y one has ((xy)² – 1)/8 ≡ (x² – 1)/8 + (y² – 1)/8 (mod 2).

  2. If a = q1 · · · qr and b = p1 · · · ps with odd primes qi and pj, then by definition

    (a/b)(b/a) = ∏i ∏j (qi/pj)(pj/qi),

    where from Theorem 2.18 it follows that

    (a/b)(b/a) = (–1)^e with e = Σi Σj ((qi – 1)/2)((pj – 1)/2) = (Σi (qi – 1)/2)(Σj (pj – 1)/2) ≡ ((a – 1)/2)((b – 1)/2) (mod 2), by the congruence used in the first part.
Now, we can calculate the Jacobi symbol (a/b) without factoring b, as follows.
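The standard procedure combines Proposition 2.22, Theorem 2.19 and remainder steps into a gcd-like loop. A Python sketch (name ours) of this well-known algorithm; it never factors b:

```python
def jacobi(a, b):
    """Jacobi symbol (a/b) for odd b > 0, computed without factoring b:
    strip factors of 2 using (2/b), then swap via quadratic reciprocity."""
    assert b > 0 and b % 2 == 1
    a %= b
    result = 1
    while a != 0:
        while a % 2 == 0:                 # pull out factors of 2 from a
            a //= 2
            if b % 8 in (3, 5):           # (2/b) = -1 iff b ≡ ±3 (mod 8)
                result = -result
        a, b = b, a                       # reciprocity: replace (a/b) by (b/a)
        if a % 4 == 3 and b % 4 == 3:     # sign flips iff both ≡ 3 (mod 4)
            result = -result
        a %= b
    return result if b == 1 else 0        # gcd(a, b) > 1 gives symbol 0
```

Like the Euclidean algorithm, this runs in time polynomial in the bit lengths of a and b, which is what makes the Jacobi symbol usable in practice (for example, in the Solovay–Strassen primality test).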

2.5.4. Some Assorted Topics

So far, we have studied some elementary properties of integers. Number theory is, however, one of the oldest and widest branches of mathematics. Various complex-analytic and algebraic tools have been employed to derive more complicated properties of integers. In Section 2.13, we give a short introductory exposition to algebraic number theory. Here, we mention a collection of useful results from analytic number theory. The proofs of these analytic results would lead us too far away and hence are omitted here. Inquisitive (and/or cynical) readers may consult textbooks on analytic number theory for the details missing here.

The prime number theorem

The famous prime number theorem gives an asymptotic estimate of the density of primes smaller than or equal to a positive real number. Gauss conjectured this result in 1791. Many mathematicians tried to prove it during the 19th century and came up with partial results. Riemann made reasonable progress towards proving the theorem, but could not furnish a complete proof before he died in 1866. It is interesting to mention here that a good portion of the theory of analytic functions (also called holomorphic functions) in complex analysis was developed during these attempts to prove the prime number theorem. The first complete proof of the theorem (based mostly on the ideas of Riemann and Chebyshev) was given independently by the French mathematician Hadamard and by the Belgian mathematician de la Vallée Poussin in 1896. Their proof is regarded as one of the major achievements of modern mathematics. People started believing that any proof of the prime number theorem has to be analytic. Erdős and Selberg destroyed this belief by independently providing the first elementary proof of the theorem in 1949. Here (and elsewhere in mathematics), the adjective elementary refers to something which does not depend on results from analysis or algebra. Caution: Elementary is not synonymous with easy!

Theorem 2.20. Prime Number Theorem

Let π(x) denote the number of primes less than or equal to a real number x > 0. Then, as x → ∞, π(x) ~ x/ln x (that is, the ratio π(x)/(x/ln x) → 1). In particular, the density π(n)/n of primes among the natural numbers ≤ n asymptotically approaches 1/ln n as n → ∞. It also follows that the n-th prime is approximately equal to n ln n.

Though the prime number theorem provides only an asymptotic estimate (that is, one for x → ∞), it gives good approximations to π(x) even for finite values of x (for example, for values of x in the cryptographic range). Table 2.1 lists π(x) against the rounded values of x/ln x for x equal to small powers of 10.

Table 2.1. Approximations to π(x)
x       π(x)        x/ln x      x/(ln x – 1)   Li(x)
10^3    168         145         169            178
10^4    1229        1086        1218           1246
10^5    9592        8686        9512           9630
10^6    78,498      72,382      78,030         78,628
10^7    664,579     620,421     661,458        664,918
10^8    5,761,455   5,428,681   5,740,304      5,762,209

Given the prime number theorem, it follows that π(x) is asymptotic to x/(ln x – ξ) for any fixed real ξ. It turns out that ξ = 1 is the best choice. Gauss’ Li function is also an asymptotic estimate for π(x), where for real x > 0 one defines:

Li(x) := ∫₂ˣ dt / ln t.
Gauss conjectured that Li(x) asymptotically equals π(x). The prime number theorem is, in fact, equivalent to this conjecture. Furthermore, de la Vallée Poussin proved that Li(x) is a better approximation to π(x) than x/(ln x – ξ) for any real ξ. Table 2.1 also lists x/(ln x – 1) and Li(x) against the actual values of π(x).

The asymptotic formula gives no explicit bounds on the error π(x) – (x/ln x) for finite x. It has been shown by Dusart [83] that (x/ln x) + 0.992(x/ln² x) ≤ π(x) ≤ (x/ln x) + 1.2762(x/ln² x) for all x > 598.
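The entries of Table 2.1 are easy to reproduce. The following Python sketch (the sieve is my own helper, not from the book) counts π(10⁶) exactly and evaluates the two elementary estimates:

```python
import math

def prime_pi(x):
    # Count primes <= x with a sieve of Eratosthenes.
    sieve = bytearray([1]) * (x + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, int(x ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, x + 1, i)))
    return sum(sieve)

x = 10 ** 6
print(prime_pi(x))                    # 78498, the Table 2.1 entry for 10^6
print(round(x / math.log(x)))         # 72382
print(round(x / (math.log(x) - 1)))   # 78030
```

As the table suggests, x/(ln x – 1) is already much closer to π(x) than x/ln x.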

Density of smooth integers

Integers having only small prime divisors play an interesting role in cryptography and in number theory in general.

Definition 2.33.

Let y ∈ ℕ. An integer x is called y-smooth (or simply smooth, if y is understood from the context), if all the prime divisors of x are ≤ y. We denote by ψ(x, y) the fraction of positive integers ≤ x that are y-smooth.

The following theorem gives an asymptotic estimate for ψ(x, y).

Theorem 2.21.

Let x, y ∈ ℕ with x > y, and let u := ln x/ln y. For u → ∞ and y ≥ ln² x we have the asymptotic formula:

ψ(x, y) → u^(–u+o(u)) = e^(–(1+o(1))u ln u).

In Theorem 2.21, the notation g(u) = o(f(u)) implies that the ratio g(u)/f(u) tends to 0 as u approaches ∞. See Definition 3.1 for more details. An interesting special case of the formula for ψ(x, y) will be used quite often in this book and is given as Corollary 4.1 in Chapter 4.

Like the prime number theorem, Theorem 2.21 gives only asymptotic estimates, but is indeed a good approximation for finite values of x, y and u (that is, for the values of practical interest). The most important implication of this theorem is that the density of y-smooth integers in the set {1, . . . , x} is a very sensitive function of u = ln x/ln y and decreases very rapidly as x increases. For example, if y = 15,485,863, the millionth prime, then a random integer ≤ 2²⁵⁰ is y-smooth with probability approximately 2.12 × 10⁻¹¹, whereas a random integer ≤ 2⁵⁰⁰ is y-smooth with probability approximately 2.23 × 10⁻²⁸. (These figures are computed neglecting the o(u) term in the expression of ψ(x, y).) In other words, smaller integers have higher probability of being smooth (that is, y-smooth for a given y).
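Neglecting the o(u) term as the text does, ψ(x, y) ≈ u^(–u) can be evaluated directly; the following sketch reproduces the two probabilities quoted above:

```python
import math

def smooth_density(log2_x, y):
    # Estimate of psi(x, y) for x = 2^log2_x, using psi ~ u^(-u)
    # with u = ln x / ln y; the o(u) term of Theorem 2.21 is dropped.
    u = log2_x * math.log(2) / math.log(y)
    return math.exp(-u * math.log(u))

y = 15_485_863                     # the millionth prime
print(smooth_density(250, y))      # ≈ 2.12e-11
print(smooth_density(500, y))      # ≈ 2.23e-28
```

Doubling the bit length of x squares the (already tiny) smoothness probability's exponent, which is the sensitivity the theorem describes.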

The extended Riemann hypothesis

The Riemann hypothesis (RH) is one of the deepest unsolved problems in mathematics. An extended version of this hypothesis has important bearings on the solvability of certain computational problems in polynomial time.

Definition 2.34.

The Euler zeta function ζ(s) is defined for a complex variable s with Re s ≥ 1 as

ζ(s) := Σn≥1 1/nˢ = 1 + 1/2ˢ + 1/3ˢ + · · · .
The reader may already be familiar with the results: ζ(1) = ∞, ζ(2) = π²/6 and ζ(4) = π⁴/90. Riemann (analytically) extended the Euler zeta function to all complex values of s (except at s = 1, where the function has a simple pole). This extended function, called the Riemann zeta function, is known to have zeros at s = –2, –4, –6, . . . . These are called the trivial zeros of ζ(s). It can be proved that all non-trivial zeros of ζ(s) must lie in the so-called critical strip: 0 ≤ Re s ≤ 1, and are symmetric about the critical line: Re s = 1/2.

Conjecture 2.1. Riemann hypothesis (RH)

All non-trivial zeros of ζ(s) lie on the critical line.

In 1900, Hilbert asserted that proving or disproving the RH is one of the most important problems confronting 20th-century mathematicians. The problem remains just as important to the mathematicians of the 21st century.

In 1901, von Koch proved that the RH is equivalent to the formula:

Conjecture 2.2. An equivalent form of the Riemann hypothesis

π(x) = Li(x) + O(√x ln x)

Here the order notation f(x) = O(g(x)) means that |f(x)/g(x)| is less than a constant for all sufficiently large x (See Definition 3.1).

Hadamard and de la Vallée Poussin proved that

π(x) = Li(x) + O(x e^(–α√(ln x)))

for some positive constant α. While this estimate was sufficient to prove the prime number theorem, the tighter bound of Conjecture 2.2 continues to remain unproved.

Theorem 2.22. Dirichlet’s theorem on primes in arithmetic progression

Let a, b ∈ ℕ be coprime. The set {a + kb | k = 0, 1, 2, . . .} contains an infinite number of primes.

Dirichlet’s theorem is a powerful generalization of Theorem 2.12 (which corresponds to a = b = 1). One can accordingly generalize the notation π(x) as follows:

Definition 2.35.

Let a, b ∈ ℕ with gcd(a, b) = 1. By πa,b(x), we denote the number of primes in the set {a + kb | k = 0, 1, 2, . . .} that are ≤ x.

The prime number theorem gives the estimate:

πa,b(x) ~ (1/φ(b)) (x/ln x),

where φ is Euler’s totient function. The RH now generalizes to:

Conjecture 2.3. Extended Riemann hypothesis (ERH)

For a, b ∈ ℕ with gcd(a, b) = 1,

πa,b(x) = (1/φ(b)) Li(x) + O(√x ln x).
Some authors use the expression Generalized Riemann hypothesis (GRH) in place of ERH. Taking b = 1 demonstrates that the ERH implies the RH. The ERH also implies the following:

Conjecture 2.4.

The smallest positive quadratic non-residue modulo a prime p is < 2 ln² p.
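The conjectured bound is easy to test numerically. The sketch below (my own helper) finds the smallest positive quadratic non-residue with Euler’s criterion and checks it against 2 ln² p for a few primes:

```python
import math

def smallest_qnr(p):
    # Euler's criterion: a is a quadratic residue modulo an odd prime p
    # exactly when a^((p-1)/2) ≡ 1 (mod p).
    a = 2
    while pow(a, (p - 1) // 2, p) == 1:
        a += 1
    return a

for p in (101, 1009, 10007, 104729):
    assert smallest_qnr(p) < 2 * math.log(p) ** 2   # the bound of Conjecture 2.4
```

In practice the smallest non-residue is far below the bound; the point of the ERH is that the bound holds for every prime.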

Exercise Set 2.5

2.35
  1. Show that any integer n ≥ 3 satisfies n² = a² – b² for some a, b ∈ ℕ.

  2. Show that for any integer n ≥ 2 the integer n⁴ + 4ⁿ is composite.

2.36Let n ∈ ℕ and S a subset of {1, 2, ..., 2n} of cardinality n + 1. Show that: [H]
  1. There exist x, y ∈ S such that x – y = 1.

  2. There exist x, y ∈ S such that x – y = n.

  3. There exist distinct x, y ∈ S such that x is a multiple of y.

  4. There exist distinct x, y ∈ S such that x is relatively prime to y.

2.37Show that for any n ∈ ℕ, n > 1, the rational number 1 + 1/2 + 1/3 + · · · + 1/n is not an integer. [H]
2.38
  1. Show that the Mersenne number Mn := 2ⁿ – 1 is prime only if n is prime.

  2. Show that the Fermat number 2ⁿ + 1 is prime only if n = 2ᵗ for some integer t ≥ 0.
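The converse of part 1 fails: n prime does not force Mn to be prime. A quick check (naive trial division, my own helper):

```python
def is_prime(m):
    # naive trial division, adequate for small m
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True

assert is_prime(2**13 - 1)        # M13 = 8191 is prime
assert not is_prime(2**11 - 1)    # M11 = 2047 = 23 * 89: 11 is prime, M11 is not
```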

2.39Let n ≥ 2 be a natural number. A complete residue system modulo n is a set of n integers a1, . . . , an such that ai ≢ aj (mod n) for i ≠ j. Similarly, a reduced residue system modulo n is a set of φ(n) integers b1, . . . , bφ(n) such that gcd(bi, n) = 1 for all i = 1, . . . , φ(n) and bi ≢ bj (mod n) for i ≠ j. Show that:
  1. If {a1, . . . , an} is a complete residue system modulo n, the equivalence classes of a1, . . . , an (modulo the ideal 〈n〉) constitute the set ℤn. In other words, given any integer a, there exists a unique i, 1 ≤ i ≤ n, for which a ≡ ai (mod n).

  2. If {b1, . . . , bφ(n)} is a reduced residue system modulo n, then the equivalence classes of b1, . . . , bφ(n) constitute the set ℤn*. In other words, given any integer b coprime to n, there exists a unique i, 1 ≤ i ≤ φ(n), for which b ≡ bi (mod n).

  3. If {a1, . . . , an} is a complete residue system modulo n, then for any integer a coprime to n, the integers aa1, . . . , aan constitute a complete residue system modulo n. For example, if n is odd, then {2, 4, 6, . . . , 2n} is a complete residue system modulo n.

  4. If {b1, . . . , bφ(n)} is a reduced residue system modulo n, then for any integer b coprime to n, the integers bb1, . . . , bbφ(n) constitute a reduced residue system modulo n.

  5. For n > 2, the integers 1², 2², . . . , n² do not constitute a complete residue system modulo n. [H]

  6. If p is an odd prime and if {a1, . . . , ap} and {b1, . . . , bp} are two complete residue systems modulo p, then {a1b1, . . . , apbp} is not a complete residue system modulo p. [H]

2.40Prove that the decimal expansion of any rational number a/b is recurring, that is, (eventually) periodic. (A terminating expansion may be viewed as one with recurring 0.) [H]
2.41Let p be an odd prime. Show that the congruence x2 ≡ –1 (mod p) is solvable if and only if p ≡ 1 (mod 4). [H]
2.42Let n ∈ ℕ.
  1. Show that if n > 2, then φ(n) is even.

  2. Show that if n is odd, then φ(n) = φ(2n).

  3. Find all the values of n for which φ(n) = 12.

2.43For n ∈ ℕ, show that Σd|n φ(d) = n.
2.44Let n > 2 and gcd(a, n) = 1. Let h be the multiplicative order of a modulo n (that is, the order of a in the group ℤn*). Show that:
  1. aⁱ ≡ aʲ (mod n) if and only if i ≡ j (mod h).

  2. The multiplicative order of a^l modulo n is h/gcd(h, l).

  3. If a is a primitive element of ℤn* (that is, if h = φ(n)), then 1, a, a², . . . , a^(h–1) is a reduced residue system modulo n.

  4. If gcd(b, n) = 1 and b has multiplicative order k modulo n and if gcd(h, k) = 1, then the multiplicative order of ab modulo n is hk.

2.45Devise a criterion for the solvability of ax² + bx + c ≡ 0 (mod p), where p is an odd prime and gcd(a, p) = 1. [H]
2.46Let p be a prime and r ∈ ℕ. An integer a with gcd(a, p) = 1 is called an r-th power residue modulo p, if the congruence x^r ≡ a (mod p) has a solution. Show that a is an r-th power residue modulo p if and only if a^((p–1)/gcd(r, p–1)) ≡ 1 (mod p). This is a generalization of Euler’s criterion for quadratic residues.
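For a small prime the criterion of this exercise can be confirmed by brute force; the values p = 31, r = 6 below are an arbitrary choice of mine:

```python
import math

# a is an r-th power residue mod p  iff  a^((p-1)/gcd(r, p-1)) ≡ 1 (mod p)
p, r = 31, 6
for a in range(1, p):
    brute = any(pow(x, r, p) == a for x in range(1, p))
    criterion = pow(a, (p - 1) // math.gcd(r, p - 1), p) == 1
    assert brute == criterion
```

Taking r = 2 recovers Euler’s criterion for quadratic residues.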
2.47Let G be a finite cyclic group of cardinality n. Show that G ≅ ℤn and that there are exactly φ(n) generators (that is, primitive elements) of G.
2.48Let m, n ∈ ℕ with m|n. Show that the canonical (surjective) ring homomorphism ℤn → ℤm induces a surjective group homomorphism ℤn* → ℤm* of the respective groups of units. (Note that every ring homomorphism g : A → B induces a group homomorphism g* : A* → B*, where A* and B* are the groups of units of A and B respectively. Even when g is surjective, g* need not be surjective, in general. As an example consider the canonical surjection ℤ → ℤp for a prime p > 3.)
2.49In this exercise, we investigate which of the groups ℤp^e* is cyclic for a prime p and e ∈ ℕ.
  1. Show that ℤ2* and ℤ4* are cyclic, but ℤ8* is not cyclic. Conclude that ℤ2^e* is not cyclic for e ≥ 3. [H] More specifically, show that for e ≥ 3 the multiplicative group ℤ2^e* is the direct product of two cyclic subgroups generated by (the classes of) –1 and 5 respectively.

  2. Show that if p is an odd prime and e ∈ ℕ, then ℤp^e* is cyclic. [H]

2.50Show that the multiplicative group ℤn*, n ≥ 2, is cyclic if and only if n = 2, 4, p^e, 2p^e, where p is an odd prime and e ∈ ℕ. [H]

2.6. Polynomials

Unless otherwise stated, in this section we denote by K an arbitrary field and by K[X] the ring of polynomials in one indeterminate X and with coefficients from K. Since K[X] is a PID, it enjoys many properties similar to those of ℤ. To start with, we take a look at these properties. Then we introduce the concept of algebraic elements and discuss how irreducible polynomials can be used to construct (algebraic) extensions of fields. When no confusions are likely, we denote a polynomial f(X) by f only.

2.6.1. Elementary Properties

Since K[X] is a PID and hence a UFD, every polynomial in K[X] can be written essentially uniquely as a product of prime polynomials. Conventionally, prime polynomials are more commonly referred to as irreducible polynomials. Similar to the case of ℤ, the ring K[X] contains an infinite number of irreducible elements: if K is infinite, then {X – a | a ∈ K} is an infinite set of irreducible polynomials of K[X], and if K is finite, then, as we will see later, there is an irreducible polynomial of degree d in K[X] for every d ∈ ℕ.

It is important to note here that the concept of irreducibility of a polynomial is very much dependent on the field K. If K ⊆ L is a field extension, then a polynomial in K[X] is naturally an element of L[X] also. A polynomial which is irreducible over K need not continue to remain so over L. For example, the polynomial X² – 2 is irreducible over ℚ, but reducible over ℝ, since X² – 2 = (X – √2)(X + √2), √2 being a real number but not a rational number. As a second example, the polynomial X² + 1 is irreducible over both ℚ and ℝ but not over ℂ. In fact, we will show shortly that an irreducible polynomial in K[X] of degree > 1 becomes reducible over a suitable extension of K.

For polynomials f(X), g(X) ∈ K[X] with g(X) ≠ 0, there exist unique polynomials q(X) and r(X) in K[X] such that f(X) = q(X)g(X) + r(X) with r(X) = 0 or deg r(X) < deg g(X). The polynomials q(X) and r(X) are respectively called the quotient and remainder of polynomial division of f(X) by g(X) and can be obtained by the so-called long division procedure. We use the notations: q(X) = f(X) quot g(X) and r(X) = f(X) rem g(X).

Whenever we talk about the gcd of two non-zero polynomials, we usually refer to the monic gcd, that is, a polynomial with leading coefficient 1. This makes the gcd of two polynomials unique. We have gcd(f(X), g(X)) = gcd(g(X), r(X)), where r(X) = f(X) rem g(X). This gives rise to an algorithm (similar to the Euclidean gcd algorithm for integers) for computing the gcd of two polynomials. Bézout relations also hold for polynomials. More specifically:
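As a concrete sketch of this Euclidean algorithm (the helper names and the choice K = ℤ7 are mine, not the book’s), coefficients are kept as lists running from degree 0 upward:

```python
P = 7   # coefficient field K = Z_7; a polynomial is its coefficient
        # list from degree 0 upward, e.g. [6, 0, 1] = X^2 + 6

def trim(f):
    # drop leading zero coefficients
    while f and f[-1] % P == 0:
        f.pop()
    return f

def poly_rem(f, g):
    # remainder of f divided by g (g != 0), by long division over Z_P
    f = trim([c % P for c in f])
    g = trim([c % P for c in g])
    ginv = pow(g[-1], P - 2, P)        # inverse of the leading coefficient
    while len(f) >= len(g):
        q = f[-1] * ginv % P
        shift = len(f) - len(g)
        for i, c in enumerate(g):
            f[i + shift] = (f[i + shift] - q * c) % P
        f = trim(f)
    return f

def poly_gcd(f, g):
    # gcd(f, g) = gcd(g, f rem g); the result is normalized to be monic
    f = trim([c % P for c in f])
    g = trim([c % P for c in g])
    while g:
        f, g = g, poly_rem(f, g)
    inv = pow(f[-1], P - 2, P)
    return [c * inv % P for c in f]

# gcd(X^2 - 1, (X - 1)(X + 2)) = X - 1, i.e. X + 6 over Z_7:
print(poly_gcd([6, 0, 1], [5, 1, 1]))    # [6, 1]
```

Keeping track of the quotients in the same loop yields the Bézout cofactors of Proposition 2.23.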

Proposition 2.23.

Let f(X), g(X) ∈ K[X], not both zero, and d(X) the (monic) gcd of f(X) and g(X). Then there are polynomials u(X), v(X) ∈ K[X] such that d(X) = u(X)f(X) + v(X)g(X). (Such an equality is called a Bézout relation.) Furthermore, if f(X) and g(X) are non-zero and not both constant, then u(X) and v(X) can be so chosen that deg u(X) < deg g(X) and deg v(X) < deg f(X).[6]

[6] Recall that the degree of the zero polynomial is taken to be –∞.

Proof

Similar to the proof of Proposition 2.16.

The concept of congruence can be extended to polynomials, namely, for a non-zero f(X) ∈ K[X], two polynomials g(X), h(X) ∈ K[X] are said to be congruent modulo f(X), denoted g(X) ≡ h(X) (mod f(X)), if f(X)|(g(X) – h(X)), that is, if there exists u(X) ∈ K[X] with g(X) – h(X) = u(X)f(X), or equivalently, if g(X) rem f(X) = h(X) rem f(X).

The principal ideals 〈f(X)〉 of K[X] play an important role (as do the ideals 〈n〉 of ℤ). Let us investigate the structure of the quotient ring R := K[X]/〈f(X)〉 for a non-constant polynomial f(X) ∈ K[X]. If r(X) denotes the remainder of division of g(X) ∈ K[X] by f(X), then it is clear that the residue classes of g(X) and r(X) are the same in R. On the other hand, two polynomials g(X), h(X) ∈ K[X] with deg g(X) < deg f(X) and deg h(X) < deg f(X) represent the same residue class in R if and only if g(X) = h(X). Thus elements of R are uniquely representable as polynomials of degrees < deg f(X). In other words, we may represent the ring R as the set {g(X) ∈ K[X] | deg g(X) < deg f(X)} together with addition and multiplication modulo the polynomial f(X). The ring R contains all the constant polynomials a ∈ K, that is, the field K is canonically embedded in R. In general, R is not a field. The next theorem gives the criterion for R to be a field.

Theorem 2.23.

For a non-constant polynomial f(X) ∈ K[X], the ring K[X]/〈f(X)〉 is a field if and only if f(X) is irreducible in K[X].

Proof

If f(X) is reducible over K, then we can write f(X) = g(X)h(X) for some polynomials g(X), h(X) ∈ K[X] with 1 ≤ deg g < deg f and 1 ≤ deg h < deg f. Then both g and h represent non-zero elements in K[X]/〈f(X)〉, whose product is 0, that is, K[X]/〈f(X)〉 has non-zero zero divisors.

Conversely, if f(X) is irreducible over K and if g(X) is a non-zero polynomial of degree < deg f(X), then gcd(f(X), g(X)) = 1, so that by Proposition 2.23 there exist polynomials u(X), v(X) ∈ K[X] with u(X)f(X) + v(X)g(X) = 1 and deg v(X) < deg f(X). Thus we see that v(X)g(X) ≡ 1 (mod f(X)), that is, g(X) has a multiplicative inverse modulo f(X).

Let L := K[X]/〈f(X)〉 with f(X) irreducible over K. Then K ⊆ L is a field extension. If deg f(X) = 1, then L is isomorphic to K. If deg f(X) ≥ 2, then L is a proper extension of K. This gives us a useful and important way of representing the extension field L, given a representation for K. (For example, see Section 2.9.)
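For instance, taking K = ℤ2 and the irreducible polynomial X³ + X + 1 gives a field of 8 elements. The sketch below (the bit-packed encoding is my own, not the book’s) multiplies modulo this polynomial and confirms that every non-zero element is invertible, as Theorem 2.23 guarantees:

```python
# Arithmetic in Z_2[X]/<X^3 + X + 1>, a field with 8 elements.
# The polynomial a2*X^2 + a1*X + a0 over Z_2 is packed into the bits a2 a1 a0.
MOD = 0b1011                  # X^3 + X + 1, irreducible over Z_2

def gf8_mul(a, b):
    # schoolbook multiplication with reduction modulo MOD
    r = 0
    while b:
        if b & 1:
            r ^= a            # add a copy of a (addition over Z_2 is XOR)
        a <<= 1               # multiply a by X
        if a & 0b1000:        # degree 3 appeared: subtract (XOR) the modulus
            a ^= MOD
        b >>= 1
    return r

# Every non-zero residue class has a multiplicative inverse:
for a in range(1, 8):
    assert any(gf8_mul(a, b) == 1 for b in range(1, 8))
```

This is exactly the construction used for the fields GF(2ⁿ) that appear later in the book.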

2.6.2. Roots of Polynomials

The study of the roots of a polynomial is the central objective in algebra. We now derive some elementary properties of roots of polynomials.

Definition 2.36.

Let f ∈ K[X]. An element a ∈ K is said to be a root of f, if f(a) = 0.

Proposition 2.24.

Let f(X) ∈ K[X] and a ∈ K. Then f(X) = (X – a)q(X) + f(a) for some q(X) ∈ K[X]. In particular, a is a root of f(X) if and only if X – a divides f(X).

Proof

Polynomial division of f(X) by X – a gives f(X) = (X – a)q(X) + r(X) with deg r(X) < deg(X – a) = 1. Thus r(X) is a constant polynomial. Let us denote r(X) by r ∈ K. Substituting X = a gives f(a) = r.

Proposition 2.25.

A non-zero polynomial f ∈ K[X] with d := deg f can have at most d roots in K.

Proof

We proceed by induction on d. The result clearly holds for d = 0. So assume that d ≥ 1 and that the result holds for all polynomials of degree d – 1. If f has no roots in K, we are done. So assume that f has a root, say, a ∈ K. By Proposition 2.24, we have f(X) = (X – a)g(X) for some g(X) ∈ K[X]. Clearly, deg g = d – 1 and so by the induction hypothesis g has at most d – 1 roots. Since K is a field (and hence does not contain non-zero zero divisors), it follows that the roots of f are precisely a and the roots of g. This establishes the induction step.

In the last proof, the only result we have used to exploit the fact that K is a field is that K contains no non-zero zero divisors. This is, however, true for every integral domain. Thus Proposition 2.25 continues to hold if K is any integral domain (not necessarily a field). However, over a ring R that is not an integral domain, the proposition is not necessarily true. For example, if ab = 0 for non-zero a, b ∈ R with a ≠ b, then the polynomial X² + (b – a)X has at least three roots: 0, a and a – b.
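The failure over a ring with zero divisors is easy to see computationally; in ℤ8, for example, the degree-2 polynomial X² – 1 has four roots:

```python
# Roots of X^2 - 1 in Z_8, a ring with zero divisors:
roots = [x for x in range(8) if (x * x - 1) % 8 == 0]
print(roots)   # [1, 3, 5, 7] — four roots, exceeding the degree
```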

For a field extension K ⊆ L and for a polynomial f ∈ K[X], we may think of the roots of f in L, since f ∈ L[X] too. Clearly, all the roots of f in K are also roots of f in L. However, the converse is not true in general. For example, the only roots of X⁴ – 1 in ℝ are ±1, whereas the roots of the same polynomial in ℂ are ±1, ±i. Indeed we have the following important result.

Proposition 2.26.

For any non-constant polynomial f ∈ K[X], there exists a field extension K′ of K such that f has a root in K′.

Proof

If f has a root in K, taking K′ = K proves the proposition. So we assume that f has no root in K (which implies that deg f ≥ 2). In principle, we do not require f to be irreducible. But if we consider a non-constant factor g of f, irreducible over K, we see that the roots of g in any extension L of K are roots of f in L too. Thus we may replace f by g and assume, without loss of generality, that f is irreducible. We construct the field extension K′ := K[X]/〈f〉 of K and denote the equivalence class of X in K′ by α. (One also writes x, X̄ or [X] to denote this equivalence class.) It is clear that f(α) = 0, that is, α is a root of f(X) in K′.

We say that the field K′ in the proof of the last proposition is obtained by adjoining the root α of f and denote this as K′ = K(α). We can write f(X) = (X – α)f1(X), where f1(X) ∈ K′[X] and deg f1 = (deg f) – 1. Now there is a field extension K″ of K′, where f1 has a root. Proceeding in this way we prove the following result.

Proposition 2.27.

A non-constant polynomial f in K[X] with deg f = d has d roots (not necessarily all distinct) in some field extension L of K.

If a polynomial f ∈ K[X] of degree d ≥ 1 has all its roots α1, . . . , αd in L, then f(X) = a(X – α1) · · · (X – αd) for some a ∈ L (actually the leading coefficient a ∈ K of f). In this case, we say that f splits (completely or into linear factors) over L.

Definition 2.37.

Let f ∈ K[X] be a non-constant polynomial. A minimal (with respect to inclusion) field extension of K, over which f splits completely, is called a splitting field of f over K.[7] This is a minimal field which contains K and all the roots of f.

[7] It is necessary to use the phrase “over K” in this definition. X² + 1, treated as a polynomial in ℚ[X], has the splitting field ℚ(i), whereas the same polynomial, treated as an element of ℝ[X], has the splitting field ℂ (see Equation (2.3) on p 74).

Every non-constant polynomial f ∈ K[X] has a splitting field L over K. Quite importantly, this field L is unique in some sense. This allows us to call L the splitting field of f instead of a splitting field of f. We discuss these topics further in Section 2.8.

Definition 2.38.

Let f be a non-constant polynomial in K[X] and let α be a root of f (in some extension of K). The largest natural number n for which (X – α)ⁿ|f(X) is called the multiplicity of the root α (in f). If n = 1 (resp. n > 1), then α is called a simple (resp. multiple) root of f. If all the roots of f are simple, then we call f a square-free polynomial. It is easy to see that f is square-free only if f is not divisible by the square of a non-constant polynomial in K[X]. The reverse implication also holds, if char K = 0 or if K is a finite field (or, more generally, if K is a perfect field—see Exercise 2.76).

The notion of multiplicity can be extended to a non-root β of f by setting the multiplicity of β to zero.

2.6.3. Algebraic Elements and Extensions

Here we assume, unless otherwise stated, that K ⊆ L is a field extension.

Definition 2.39.

An element α ∈ L is said to be algebraic over K, if there exists a non-constant polynomial f ∈ K[X] with f(α) = 0. If an element α ∈ L is not algebraic over K, we say that α is transcendental over K. Thus a transcendental (over K) element is a root of no non-constant polynomial in K[X]. A field extension K ⊆ L is called an algebraic extension, if every element of L is algebraic over K. A non-algebraic extension is also called a transcendental extension. If K ⊆ L is a transcendental extension, there exists at least one element α ∈ L which is transcendental (that is, not algebraic) over K.

Example 2.10.
  1. Every element a ∈ K is algebraic over K, since it is a root of the non-constant polynomial X – a ∈ K[X].

  2. The element α := √2 ∈ ℝ is algebraic over ℚ, since α is a root of the polynomial X² – 2 ∈ ℚ[X].

  3. The well-known real numbers e and π are transcendental over ℚ. (We are not going to prove this.) Of course, the concept of algebraic and transcendental elements is heavily dependent on the field K. For example, e and π, being elements of ℝ, are algebraic over ℝ.

  4. A complex number z = a + ib, where i = √–1 and a, b ∈ ℝ, is a root of the polynomial (X – z)(X – z̄) = X² – 2aX + (a² + b²) ∈ ℝ[X] and hence is algebraic over ℝ. Therefore, the field extension ℝ ⊆ ℂ is algebraic.

  5. The extension ℚ ⊆ ℝ is transcendental, since ℝ contains elements (like e and π) that are transcendental over ℚ.

Definition 2.40.

Let α ∈ L be algebraic over K. A non-constant polynomial f ∈ K[X] of least positive degree with f(α) = 0 is called a minimal polynomial of α over K.

Proposition 2.28.

Let α ∈ L be algebraic over K. A minimal polynomial f of α over K is irreducible over K. If h ∈ K[X] is a polynomial with h(α) = 0, then f|h. In particular, any two minimal polynomials f and g of α satisfy g(X) = cf(X) for some non-zero c ∈ K.

Proof

Let f = f1f2 for some non-constant polynomials f1, f2 ∈ K[X]. Since K is a field and 0 = f(α) = f1(α)f2(α), we have f1(α) = 0 or f2(α) = 0. But deg f1 < deg f and deg f2 < deg f, a contradiction to the choice of f.

Using polynomial division one can write h(X) = q(X)f(X) + r(X) for some polynomials q, r ∈ K[X] with r = 0 or deg r < deg f. Now h(α) = 0 implies r(α) = 0. Since deg r < deg f, by the choice of f we must then have r(X) = 0, that is, f|h.

Finally, if f and g are two minimal polynomials of α over K, then f|g and g|f and it follows that g(X) = cf(X) for some unit c of K[X]. But the only units of K[X] are the non-zero elements of K.

By Proposition 2.28, a monic minimal polynomial f of α over K is uniquely determined by α and K. It is, therefore, customary to define the minimal polynomial of α over K to be this (unique) monic polynomial. Unless otherwise stated, we will stick to this revised definition and write f(X) = minpolyα, K(X).

Example 2.11.
  1. For α ∈ K, we have minpolyα, K(X) = X – α.

  2. A complex number z = a + ib, a, b ∈ ℝ, b ≠ 0, is not a root of a linear polynomial over ℝ, but is a root of the quadratic polynomial f(X) := X² – 2aX + (a² + b²) ∈ ℝ[X]. Therefore, minpolyz, ℝ(X) = f(X), that is, f is irreducible over ℝ.

Proposition 2.29.

For a field K, the following conditions are equivalent.

  1. Every proper field extension K ⊆ L is transcendental (that is, K has no algebraic extensions other than itself).

  2. Every non-constant polynomial in K[X] has a root in K.

  3. Every non-constant polynomial in K[X] splits in K.

  4. Every non-constant irreducible polynomial in K[X] is of degree 1.

Proof

[(a)⇒(b)] Consider a non-constant irreducible polynomial f ∈ K[X] and the field extension L = K[X]/〈f〉 of K. We have seen that L contains a root of f. We will prove in Section 2.8 that such an extension is algebraic (Corollary 2.11). Hence (a) implies that L = K, that is, K contains a root of f.

[(b)⇒(c)] Let f ∈ K[X] be a non-constant polynomial. By (b), f has a root, say, α1 ∈ K. Thus f(X) = (X – α1)f1(X) for some f1 ∈ K[X] with deg f1 = (deg f) – 1. If f1 is a constant polynomial, we are done. Otherwise, we find as above a root α2 ∈ K of f1 and f2 ∈ K[X] with f1(X) = (X – α2)f2(X) and with deg f2 = (deg f) – 2. Proceeding in this way proves (c).

[(c)⇒(d)] Obvious.

[(d)⇒(a)] Let α ∈ L be algebraic over K and let f := minpolyα, K(X). Since f is irreducible, by (d) deg f = 1, that is, f(X) = X – α, so that α ∈ K.

Definition 2.41.

A field K satisfying the equivalent conditions of Proposition 2.29 is called an algebraically closed field. For an arbitrary field K, a minimal algebraically closed field containing K is called an algebraic closure of K.

We will see in Section 2.8 that an algebraic closure of every field exists and is unique in some sense. The algebraic closure of an algebraically closed field K is K itself. We end this section with the following well-known theorem. We will not prove the theorem in this book, because every known proof of it uses some kind of complex analysis which this book does not deal with.

Theorem 2.24. Fundamental theorem of algebra

The field ℂ of complex numbers is algebraically closed.

ℝ is not algebraically closed, since the proper extension ℝ ⊆ ℂ is algebraic (see Example 2.10). Indeed, ℂ is the algebraic closure of ℝ.

Exercise Set 2.6

2.51Let R be a ring and f, g ∈ R[X]. Show that:
  1. deg(f + g) ≤ max(deg f, deg g) with equality holding, if deg f ≠ deg g.

  2. deg(f g) ≤ deg f + deg g with equality holding, if R is an integral domain.

  3. If R is an integral domain, then R[X] is an integral domain too. More generally, if R is an integral domain, then R[X1, . . . , Xn] is also an integral domain for all n ∈ ℕ.

2.52Let f, g ∈ R[X], where R is an integral domain. Show that if f(ai) = g(ai) for i = 1, . . . , n, where n > max(deg f, deg g) and where a1, . . . , an are distinct elements of R, then f = g. In particular, if f(a) = g(a) for an infinite number of a ∈ R, then f = g.
2.53

Lagrange’s interpolation formula Let K be a field and let a0, . . . , an be distinct elements of K. Show that for b0, . . . , bn ∈ K (not necessarily all distinct), there exists a unique polynomial f ∈ K[X] of degree ≤ n such that f(ai) = bi for all i = 0, . . . , n. [H]
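Over a finite field the interpolating polynomial can be evaluated directly from Lagrange’s formula; the sketch below (my own helper, with the arbitrary choice K = ℤ101) recovers the value at 5 of the polynomial through three given points:

```python
P = 101   # a prime, so Z_101 is a field

def lagrange_interpolate(pts, x):
    # Evaluate at x the unique polynomial of degree <= n passing through
    # the n+1 points pts = [(a_i, b_i)] with distinct a_i, over Z_P.
    total = 0
    for i, (ai, bi) in enumerate(pts):
        num, den = 1, 1
        for j, (aj, _) in enumerate(pts):
            if i != j:
                num = num * (x - aj) % P
                den = den * (ai - aj) % P
        total = (total + bi * num * pow(den, P - 2, P)) % P
    return total

pts = [(1, 4), (2, 9), (3, 16)]        # samples of (X + 1)^2
print(lagrange_interpolate(pts, 5))    # 36 = (5 + 1)^2
```

This computation is the basis of Shamir-style secret sharing, where a secret is stored as the value of an unknown polynomial.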

2.54

Polynomials over a UFD Let R be a UFD. For a non-zero polynomial f ∈ R[X], a gcd of the coefficients of f is called a content of f and is denoted by cont f. One can then write f = (cont f)f1, where f1 ∈ R[X] with cont f1 a unit of R. f1 is called a primitive part of f and is often denoted as pp f. It is clear that cont f and pp f are unique up to multiplication by units of R. If for a non-zero polynomial f ∈ R[X] the content cont f is a unit (or, equivalently, if f and pp f are associates), then f is called a primitive polynomial. Show that for two non-zero polynomials f, g ∈ R[X] the elements cont(fg) and (cont f)(cont g) are associates in R. In particular, the product of two primitive polynomials is again primitive.

2.55Let R be a UFD. Show that a non-constant polynomial f ∈ R[X] is irreducible over R if and only if f is irreducible over Q(R), where Q(R) denotes the quotient field of R (see Exercise 2.34).
2.56
  1. Eisenstein’s criterion Let R be a UFD and f(X) = anXⁿ + an–1X^(n–1) + · · · + a1X + a0 ∈ R[X] with an ≠ 0. Suppose that there is a prime p ∈ R such that p does not divide an, p divides ai for all i, 0 ≤ i ≤ n – 1, and p² does not divide a0. Show that f is irreducible over R.

  2. As an application of Eisenstein’s criterion show that for a prime p ∈ ℕ the polynomial X^(p–1) + X^(p–2) + · · · + X + 1 is irreducible in ℚ[X]. [H]

2.57Let K ⊆ L be a field extension and f1, . . . , fn non-constant polynomials in K[X]. Show that each fi, i = 1, . . . , n, splits over L if and only if the product f1 · · · fn splits over L.
2.58Show that the irreducible polynomials in ℝ[X] have degrees ≤ 2. [H]
2.59Show that a finite field (that is, a field with finite cardinality) is not algebraically closed. In particular, the algebraic closure of a finite field is infinite.
2.60A complex number z is called an algebraic number, if z is algebraic over ℚ. An algebraic number z is called an algebraic integer, if z is a root of a monic polynomial in ℤ[X]. Show that:
  1. If z is an algebraic number, then mz is an algebraic integer for some m ∈ ℕ.

  2. If z ∈ ℚ is an algebraic integer, then z ∈ ℤ.

  3. If z ∈ ℂ is an algebraic integer, then for any integer n ∈ ℤ the complex numbers nz and z + n are algebraic integers.

2.61Let K be a field and f(X) = adXᵈ + ad–1X^(d–1) + · · · + a1X + a0 ∈ K[X]. The formal derivative f′ of f is defined to be the polynomial f′(X) := dadX^(d–1) + (d – 1)ad–1X^(d–2) + · · · + 2a2X + a1. Show that:
  1. (f + g)′ = f′ + g′ and (fg)′ = f′g + fg′ for any f, g ∈ K[X].

  2. If char K = 0, then f′ = 0 if and only if f ∈ K (that is, f is constant).

  3. If char K = p > 0, then f′ = 0 if and only if f(X) = g(Xᵖ) for some g ∈ K[X].

  4. f (≠ 0) has no multiple roots (in any extension field of K), that is, f is square-free, if and only if gcd(f, f′) = 1.

  5. Let f be a (non-constant) irreducible polynomial over K. Show that if char K = 0, then f has no multiple roots. On the other hand, if char K = p > 0, show that f has multiple roots if and only if f(X) = g(Xᵖ) for some g ∈ K[X]. (However, if K = ℤp, then by Fermat’s little theorem g(Xᵖ) = g(X)ᵖ, which contradicts the fact that f(X) is irreducible. Therefore, in this case f cannot have multiple roots.)

2.62Let f ∈ K[X] be a non-constant polynomial of degree d and let α1, . . . , αd be the roots of f (in some extension field of K). The quantity Δ(f) := Π1≤i<j≤d (αi – αj)² is called the discriminant of f. Prove the following assertions:
  1. Δ(f) = 0 if and only if f has a multiple root.

  2. .

  3. Δ(X² + aX + b) = a² – 4b.

  4. Δ(X³ + aX + b) = –(4a³ + 27b²).

2.7. Vector Spaces and Modules

Vector spaces and linear transformations between them are the central objects of study in linear algebra. In this section, we investigate the basic properties of vector spaces. We also generalize the concept of vector spaces to get another useful class of objects called modules. A module which also carries a (compatible) ring structure is referred to as an algebra. Study of algebras over fields (or more generally over rings) is of importance in commutative algebra, algebraic geometry and algebraic number theory.

2.7.1. Vector Spaces

Unless otherwise specified, K denotes a field in this section.

Definition 2.42.

A vector space V over a field K (or a K-vector space, in short) is an (additively written) Abelian group V together with a multiplication map · : K × V → V called the scalar multiplication map, such that the following properties are satisfied by every a, b ∈ K and x, y ∈ V.

  1. a · (x + y) = a · x + a · y,

  2. (a + b) · x = a · x + b · x,

  3. 1 · x = x,

  4. a · (b · x) = (ab) · x,

where ab denotes the product of a and b in the field K. When no confusions are likely, we omit the scalar multiplication sign · and write a · x simply as ax.

Example 2.12.
  1. Any field K is trivially a K-vector space with the scalar multiplication being the same as the field multiplication. More generally, if K ⊆ L is a field extension, then L is a K-vector space.

  2. For n ∈ ℕ, the product Kⁿ = K × · · · × K (n factors) is a K-vector space under the scalar multiplication map a(x1, . . . , xn) := (ax1, . . . , axn). For arbitrary K-vector spaces V1, . . . , Vn, we can analogously define the product V1 × · · · × Vn.

  3. The polynomial ring K[X] (or K[X1, . . . , Xn]) is a K-vector space (with the natural scalar multiplication).

Corollary 2.8.

Let V be a K-vector space. For every a ∈ K and x ∈ V, we have:

  1. 0 · x = 0.

  2. a · 0 = 0.

  3. (–a) · x = a · (–x) = –(a · x).

Proof

Easy verification.

Definition 2.43.

Let V be a vector space over K and S a subset of V. We say that S is a generating set or a set of generators of V (over K), or that S generates V (over K), if every element x ∈ V can be written as a finite linear combination x = a1x1 + · · · + anxn for some n ∈ ℕ (depending on x) and with ai ∈ K and xi ∈ S for 1 ≤ i ≤ n. A generating set S of V is called minimal, if no proper subset of S generates V. If V has a finite generating set, then V is called finitely generated or finite-dimensional.

Example 2.13.
  1. Consider the field extension L := K[X]/〈f(X)〉 of K, where f is an irreducible polynomial in K[X] of degree n. If α denotes the equivalence class of X in L, then every element of L can be written as an–1αn–1 + · · · + a1α + a0 with ai ∈ K for 0 ≤ i ≤ n – 1. Thus {1, α, . . . , αn–1} is a generating set of L over K. In particular, L is finitely generated over K.

  2. The K-vector space Kn is generated by the unit vectors ei, 1 ≤ in, defined as ei := (0, . . . , 0, 1, 0, . . . , 0) (1 in the i-th position). Thus Kn is also finitely generated over K.

  3. {1, X, X2, · · ·} is an infinite generating set of the polynomial ring K[X] regarded as a K-vector space. K[X] is not finitely generated over K.

    It is not difficult to show that the generating sets discussed in these examples are minimal.

Definition 2.44.

A subset S of a K-vector space V is called linearly independent (over K), if whenever a1x1 + · · · + anxn = 0 for some n ∈ ℕ, ai ∈ K and distinct xi ∈ S, 1 ≤ i ≤ n, we have a1 = · · · = an = 0. If S is not linearly independent, it is called linearly dependent. If S is linearly independent (resp. dependent), then we also say that the elements of S are linearly independent (resp. dependent). A maximal linearly independent subset of V is a linearly independent subset S ⊆ V with the property that S ∪ {x} is linearly dependent for any x ∈ V \ S.

If 0 ∈ S, then S is linearly dependent, since a · 0 = 0 for any a ∈ K. One can easily check that all the generating sets of Example 2.13 are linearly independent too. This is, however, not a mere coincidence, as the following result demonstrates.

Theorem 2.25.

A subset S of a K-vector space V is a minimal generating set for V if and only if S is a maximal linearly independent set of V.

Proof

[if] Given a maximal linearly independent subset S of V, we first show that S is a generating set for V. Take any non-zero x ∈ V. If x ∈ S, there is nothing to prove, so assume x ∉ S. By the maximality of S, the set S ∪ {x} is linearly dependent, that is, there exists a linear relation of the form a0x + a1x1 + · · · + anxn = 0, ai ∈ K, xi ∈ S, with some ai ≠ 0. The linear independence of S forces a0 ≠ 0, and so x = (–1/a0)(a1x1 + · · · + anxn) is a finite linear combination of elements of S. Thus S generates V. Now, we show that S is minimal. Assume otherwise, that is, S′ := S \ {y} generates V for some y ∈ S. Since S is linearly independent, y ≠ 0. For some m ∈ ℕ, bj ∈ K and yj ∈ S′, we then have y = b1y1 + · · · + bmym, a contradiction to the linear independence of S.

[only if] Given a minimal generating set S of V, we first show that S is linearly independent. Assume not, that is, a1x1 + · · · + anxn = 0 for some ai ∈ K and distinct xi ∈ S with some ai, say a1, non-zero. But then x1 = (–1/a1)(a2x2 + · · · + anxn) and, therefore, S \ {x1} also generates V, a contradiction to the minimality of S. Thus S is linearly independent. Now choose a non-zero y ∈ V \ S. Since S generates V, we can write y = b1y1 + · · · + bmym, bj ∈ K and yj ∈ S, that is, 1 · y – b1y1 – · · · – bmym = 0, that is, S ∪ {y} is linearly dependent.

Definition 2.45.

Let V be a K-vector space. A minimal generating set S of V is called a basis of V over K (or a K-basis of V). By Theorem 2.25, S is a basis of V if and only if S is a maximal linearly independent subset of V. Equivalently, S is a basis of V if and only if S is a generating set of V and is linearly independent.

Any element of a vector space can be written uniquely as a finite linear combination of elements of a basis, since two different ways of writing the same element contradict the linear independence of the basis elements.

A K-vector space V may have many K-bases. For example, the elements 1, aX + b, (aX + b)2, · · · form a K-basis of K[X] for any a, b ∈ K, a ≠ 0. However, what is unique in any basis of a given K-vector space V is the cardinality[8] of the basis, as shown in Theorem 2.26.

[8] Two sets (finite or not) S1 and S2 are said to be of the same cardinality, if there exists a bijective map S1S2.

For the sake of simplicity, we sometimes assume that V is a finitely generated K-vector space. This assumption simplifies certain proofs greatly. But it is important to highlight here that, unless otherwise stated, all the results continue to remain valid without the assumption. For example, it is a fact that every vector space has a basis. For finitely generated vector spaces, this is a trivial statement to prove, whereas without our assumption we need to use arguments that are not so simple. (A possible proof follows from Exercise 2.63 with U = {0}.)

Theorem 2.26.

Let V be a K-vector space. Then any K-basis of V has the same cardinality.

Proof

We assume that V is finitely generated. Let S = {x1, . . . , xn} be a minimal finite generating set, that is, a basis, of V. Let T be another basis of V. Assume that m := #T > n. (We might even have m = ∞.) We can choose distinct elements y1, . . . , yn ∈ T. Note that the xi and yj are non-zero. Now we can write y1 = a1x1 + · · · + anxn for some (unique) ai ∈ K, with some ai ≠ 0. Renumbering x1, . . . , xn, if necessary, we may assume that a1 ≠ 0. Then x1 = (1/a1)(y1 – a2x2 – · · · – anxn). It follows that y1, x2, . . . , xn generate V. In particular, we can write y2 = b1y1 + b2x2 + · · · + bnxn, bi ∈ K. If b2 = · · · = bn = 0, then y1, y2 are linearly dependent, a contradiction. So bi ≠ 0 for some i, 2 ≤ i ≤ n. Again we may renumber x2, . . . , xn, if necessary, to assume that b2 ≠ 0. Then x2 = (1/b2)(y2 – b1y1 – b3x3 – · · · – bnxn), that is, y1, y2, x3, . . . , xn generate V. Proceeding in this way we can show that y1, . . . , yn generate V, a contradiction to the minimality of T as a generating set. Thus we must have m ≤ n. In particular, m is finite. Now reversing the roles of S and T we can likewise prove that n ≤ m.

Theorem 2.26 holds even when V is not finitely generated. We omit the proof for this case here.

Definition 2.46.

Let V be a K-vector space. The cardinality of any K-basis of V is called the dimension of V over K and is denoted by dimK V (or by dim V, if K is understood from the context). We call V finite-dimensional (resp. infinite-dimensional), if dimK V is finite (resp. infinite).

For example, dimK Kn = n for every n ∈ ℕ, and dimK K[X] = ∞.

Definition 2.47.

Let V be a K-vector space. A subgroup U of V, which is closed under the scalar multiplication of V, is again a K-vector space and is called a (vector) subspace of V. In this case, we have dimK U ≤ dimK V (Exercise 2.63).

Example 2.14.

Let V be a vector space over K.

  1. The subsets {0} and V are trivially subspaces of V.

  2. Let S be any subset of V (not necessarily linearly independent). Then the set U of all finite linear combinations of elements of S is a vector subspace of V. We say that U is spanned or generated by S, or that S generates or spans U, or that U is the span of S. This is often denoted by U = 〈S〉 or by U = Span S. If S is linearly independent, then S is a basis of U.

Definition 2.48.

Let V and W be K-vector spaces. A map f : V → W is called a homomorphism (of vector spaces) or a linear transformation or a linear map over K, if

f(ax + by) = af(x) + bf(y)

for all a, b ∈ K and x, y ∈ V. Equivalently, f is a linear map over K if and only if f(x + y) = f(x) + f(y) and f(ax) = af(x) for all a ∈ K and x, y ∈ V. The set of all K-linear maps V → W is denoted by HomK(V, W). HomK(V, W) is a K-vector space under the definitions (f + g)(x) := f(x) + g(x) and (af)(x) := af(x) for all f, g ∈ HomK(V, W), a ∈ K and x ∈ V. A K-linear transformation V → V is called a K-endomorphism of V. The set of all K-endomorphisms of V is denoted by EndK V. A bijective[9] homomorphism (resp. endomorphism) is called an isomorphism (resp. automorphism).

[9] As in Footnote 2, we continue to be lucky here: The inverse of a bijective linear transformation is again a linear transformation.

Theorem 2.27.

Let V and W be K-vector spaces. Then V and W are isomorphic if and only if dimK V = dimK W.

Proof

If dimK V = dimK W and S and T are bases of V and W respectively, then there exists a bijection f : S → T. One can extend f to a linear map f̃ : V → W as f̃(a1x1 + · · · + anxn) := a1f(x1) + · · · + anf(xn), for ai ∈ K and xi ∈ S. One can readily verify that f̃ is an isomorphism. Conversely, if g : V → W is an isomorphism and S is any basis of V, then g(S) is clearly a basis of W.

Corollary 2.9.

A K-vector space V with n := dimK V < ∞ is isomorphic to Kn.

Let V be a K-vector space and U a subspace. As in Section 2.3 we construct the quotient group V/U. This group can be given a K-vector space structure under the scalar multiplication map a(x + U) := ax + U, a ∈ K, x ∈ V. If T ⊆ V is such that the residue classes of the elements of T form a K-basis of V/U and if S is a K-basis of U, then it is easy to see that S ∪ T is a K-basis of V. In particular,

Equation 2.2

dimK V = dimK U + dimK (V/U)
For f ∈ HomK(V, W), the set {x ∈ V | f(x) = 0} is called the kernel Ker f of f, and the set {f(x) | x ∈ V} is called the image Im f of f. We have the isomorphism theorem for vector spaces:

Theorem 2.28. Isomorphism theorem

Ker f is a subspace of V, Im f is a subspace of W, and V/Ker f ≅ Im f.

Proof

Similar to Theorem 2.3 and Theorem 2.9.

Definition 2.49.

For f ∈ HomK(V, W), the dimension of Im f is called the rank of f and is denoted by Rank f, whereas the dimension of Ker f is called the nullity of f and is denoted by Null f. An immediate consequence of the isomorphism theorem and of Equation (2.2) is the following important result.

Theorem 2.29.

Rank f + Null f = dimK V for any f ∈ HomK(V, W).
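Theorem 2.29 can be checked computationally. A linear map f : K4 → K3 is given by a matrix over K; the rank is the number of pivots found by Gaussian elimination over K, and the nullity is then dimK V minus the rank. The sketch below is our own illustration for K = ℤ7 (the function row_reduce and the sample matrix are assumptions, not from the text).

```python
# Sketch: verify Rank f + Null f = dim V for a linear map f : K^4 -> K^3
# over K = Z_7, represented by a 3x4 matrix. Helper names are ours.
P = 7

def row_reduce(rows):
    """Gaussian elimination over Z_P; returns the number of pivots (the rank)."""
    rows = [list(r) for r in rows]
    rank, ncols = 0, len(rows[0])
    for col in range(ncols):
        pivot = next((i for i in range(rank, len(rows)) if rows[i][col] % P), None)
        if pivot is None:
            continue                                    # no pivot in this column
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = pow(rows[rank][col], P - 2, P)            # inverse via Fermat's little theorem
        rows[rank] = [(v * inv) % P for v in rows[rank]]
        for i in range(len(rows)):                      # clear the column elsewhere
            if i != rank and rows[i][col] % P:
                c = rows[i][col]
                rows[i] = [(v - c * w) % P for v, w in zip(rows[i], rows[rank])]
        rank += 1
    return rank

# Sample matrix of f (second row is 2 times the first modulo 7).
M = [[1, 2, 3, 4],
     [2, 4, 6, 1],
     [0, 0, 0, 0]]
dim_V = 4
rank = row_reduce(M)          # dim Im f
null = dim_V - rank           # dim Ker f, by Theorem 2.29
assert rank + null == dim_V
```

Here rank = 1 and null = 3, since both non-trivial rows of M are proportional over ℤ7.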

*2.7.2. Modules

If we remove the restriction that K is a field and assume that K is any ring, then a vector space over K is called a K-module. More specifically, we have:

Definition 2.50.

Let R be a ring. A module over R (or an R-module) is an (additively written) Abelian group M together with a multiplication map · : R × M → M called the scalar multiplication map, such that for every a, b ∈ R and x, y ∈ M we have a · (x + y) = a · x + a · y, (a + b) · x = a · x + b · x, 1 · x = x, and a · (b · x) = (ab) · x, where ab denotes the product of a and b in the ring R. When no confusions are likely, we omit the scalar multiplication sign · and write a · x as ax.

Example 2.15.
  1. Vector spaces are special cases of modules, when the underlying ring is a field.

  2. Ideals of R are modules over R with the ring multiplication map taken as the scalar multiplication.

  3. Every Abelian group G is a ℤ-module under the scalar multiplication n · x := x + · · · + x (n summands) for n ≥ 0, and n · x := –((–n) · x) for n < 0.

  4. The polynomial rings R[X] and R[X1, . . . , Xn] are modules over R.

  5. Let Mi, i ∈ I, be a family of R-modules. The direct product ∏i∈I Mi is defined as the set of all tuples (ai)i∈I with ai ∈ Mi, indexed by I. The direct sum ⊕i∈I Mi is the subset of the Cartesian product consisting only of the tuples for which ai = 0 except for a finite number of i ∈ I. Both the direct product and the direct sum are R-modules under component-wise addition and scalar multiplication. When I is finite, they are naturally the same.

Modules are a powerful generalization of vector spaces. Any result we prove for modules is equally valid for vector spaces, ideals and Abelian groups. On the other hand, since we do not demand that the ring R be necessarily a field, certain results for vector spaces are not applicable for all modules.

It is easy to see that Corollary 2.8 continues to hold for modules. An R-submodule of an R-module M is a subgroup of M that is closed under the scalar multiplication of M. For a subset S ⊆ M, the set of all finite linear combinations of the form a1x1 + · · · + anxn, n ∈ ℕ, ai ∈ R, xi ∈ S, is an R-submodule N of M, denoted by RS or 〈S〉. We say that N is generated by S (or by the elements of S). If S is finite, then N is said to be finitely generated. A (sub)module generated by a single element is called cyclic. It is important to note that unlike vector spaces the cardinality of a minimal generating set of a module is not necessarily unique. (See Exercise 2.68 for an example.) It is also true that given a minimal generating set S of M, there may be more than one way of writing an element of M as a finite linear combination of elements of S. For example, if M = ℤ and S = {2, 3}, then 1 = (–1)·2 + 1·3 = 2·2 + (–1)·3. The nice theory of dimensions developed in connection with vector spaces does not apply to modules.
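The failure of uniqueness in the ℤ-module example above runs deeper: since gcd(2, 3) = 1, the element 1 has infinitely many representations over the minimal generating set S = {2, 3}, one for each integer t. A quick check (our own illustration, using the standard parametrization of Bézout coefficients):

```python
# The text's Z-module example with S = {2, 3}: representations of 1 over a
# minimal generating set are far from unique. The general solution of
# a*2 + b*3 = 1 is (a, b) = (-1 + 3t, 1 - 2t) for t in Z.
for t in range(-3, 4):
    a, b = -1 + 3 * t, 1 - 2 * t
    assert a * 2 + b * 3 == 1
# t = 0 and t = 1 give exactly the two representations quoted in the text.
assert (-1) * 2 + 1 * 3 == 1
assert 2 * 2 + (-1) * 3 == 1
```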

For an R-submodule N of M, the Abelian group M/N is given an R-module structure by the scalar multiplication map a(x + N) := ax + N. This module is called the quotient module of M by N.

For R-modules M and N, an R-linear map or an R-module homomorphism (from M to N) is defined as a map f : M → N with f(ax + by) = af(x) + bf(y) for all a, b ∈ R and x, y ∈ M (or equivalently with f(x + y) = f(x) + f(y) and f(ax) = af(x) for all a ∈ R and x, y ∈ M). An isomorphism, an endomorphism and an automorphism are defined in ways analogous to the case of vector spaces. The set of all (R-module) homomorphisms M → N is denoted by HomR(M, N) and the set of all (R-module) endomorphisms of M is denoted by EndR M. These sets are again R-modules under the definitions: (f + g)(x) := f(x) + g(x) and (af)(x) := af(x) for all a ∈ R and x ∈ M (and f, g in HomR(M, N) or EndR M).

The kernel and image of an R-linear map f : M → N are defined as the sets Ker f := {x ∈ M | f(x) = 0} and Im f := {f(x) | x ∈ M}. With these notations we have the isomorphism theorem for modules:

Theorem 2.30. Isomorphism theorem

Ker f and Im f are submodules of M and N respectively and M / Ker f ≅ Im f.

For an R-module M and an ideal 𝔞 of R, the set 𝔞M consisting of all finite linear combinations a1x1 + · · · + anxn with ai ∈ 𝔞 and xi ∈ M is a submodule of M. On the other hand, for a submodule N of M the set (M : N) := {a ∈ R | aM ⊆ N} is an ideal of R. In particular, the ideal (M : 0) is called the annihilator of M and is denoted as AnnR M (or as Ann M). For any ideal 𝔞 ⊆ AnnR M, one can view M as an R/𝔞-module under the map (a + 𝔞) · x := ax. One can easily check that this map is well-defined, that is, the product is independent of the choice of the representative a of the equivalence class a + 𝔞.

Definition 2.51.

A free module M over a ring R is defined to be a direct sum ⊕i∈I Mi of R-modules Mi with each Mi ≅ R as an R-module. If I is of finite cardinality n, then M is isomorphic to Rn.

Any vector space is a free module (Theorem 2.27 and Corollary 2.9). The Abelian groups ℤn, n ≥ 2, are not free ℤ-modules.

Theorem 2.31. Structure theorem for finitely generated modules

M is a finitely generated R-module if and only if M is a quotient of a free module Rn for some n ∈ ℕ.

Proof

[if] The free module Rn has a canonical generating set ei, 1 ≤ i ≤ n, where

ei = (0, . . . , 0, 1, 0, . . . , 0) (1 in the i-th position).

If M = Rn/N, then the equivalence classes ei + N, i = 1, ..., n, constitute a finite set of generators of M.

[only if] If x1, ..., xn generate M, then the R-linear map f : RnM defined by (a1, ..., an) ↦ a1x1 + · · · + anxn is surjective. Hence by the isomorphism theorem MRn / Ker f.

*2.7.3. Algebras

Let φ : R → A be a homomorphism of rings. The ring A can be given an R-module structure with the scalar multiplication map a · x := φ(a)x for a ∈ R and x ∈ A. This R-module structure of A is compatible with the ring structure of A in the sense that for every a, b ∈ R and x, y ∈ A one has (ax)(by) = (ab)(xy).

Conversely, if a ring A has an R-module structure with (ax)(by) = (ab)(xy) for every a, b ∈ R and x, y ∈ A, then there is a unique ring homomorphism φ : R → A taking a ↦ a · 1 (where 1 denotes the identity of A). This motivates us to define the following.

Definition 2.52.

Let R be a ring. An algebra over R or an R-algebra is a ring A together with a ring homomorphism φ : R → A. The homomorphism φ is called the structure homomorphism of the R-algebra A. If A and B are R-algebras with structure homomorphisms φ : R → A and ψ : R → B, then an R-algebra homomorphism (from A to B) is a ring homomorphism η : A → B such that η ∘ φ = ψ.

Example 2.16.

Let R be a ring.

  1. The polynomial ring R[X1, . . . , Xn] is an R-algebra with the canonical inclusion as the structure homomorphism and is called a polynomial algebra over R.

  2. For an ideal 𝔞 of R, the canonical surjection R → R/𝔞 makes R/𝔞 an R-algebra.

  3. If A is an R-algebra with structure homomorphism φ : R → A and if B is an A-algebra with structure homomorphism ψ : A → B, then B is an R-algebra with structure homomorphism ψ ∘ φ.

  4. Combining (2) and (3) implies that if A is an R-algebra and 𝔞 an ideal of A, then the ring A/𝔞 is again an R-algebra, called the quotient algebra of A by 𝔞.

An R-algebra A is an R-module with the added property that multiplication of elements of A is now legal. Exploiting this new feature leads to the following concept of algebra generators.

Definition 2.53.

Let A be an R-algebra with the structure homomorphism φ : R → A. A subset S of A is said to generate A as an R-algebra, if every element x ∈ A can be written as a polynomial expression in (finitely many) elements of S with coefficients from R (that is, from φ(R)). We write this as A = R[S]. If S = {x1, . . . , xn} is finite, we also write R[x1, . . . , xn] in place of R[S] and say that A is finitely generated as an R-algebra or that the homomorphism φ is of finite type.

Example 2.17.
  1. The polynomial algebra R[X1, . . . , Xn], n ≥ 1, over R is not finitely generated as an R-module, but is finitely generated as an R-algebra.

  2. For an ideal 𝔞 of R[X1, . . . , Xn], the ring A := R[X1, . . . , Xn]/𝔞 is generated as an R-algebra by the equivalence classes xi of Xi, 1 ≤ i ≤ n, that is, A = R[x1, . . . , xn]. If 𝔞 is not the zero ideal, then A is not a polynomial algebra, because x1, . . . , xn are not indeterminates in the sense that they satisfy (non-zero) polynomial equations f(x1, . . . , xn) = 0 for every non-zero f ∈ 𝔞. (In this case, we also say that x1, . . . , xn are algebraically dependent.) The notation R[. . .] is a generalization of the notation for polynomial algebras. In what follows, we usually denote polynomial algebras by R[X1, . . . , Xn] with upper-case algebra generators, whereas for an arbitrary finitely generated R-algebra we use lower-case symbols for the algebra generators as in R[x1, . . . , xn].

One may proceed to define kernels and images of R-algebra homomorphisms and frame and prove the isomorphism theorem for R-algebras. We leave the details to the reader. We only note that algebra homomorphisms are essentially ring homomorphisms with the added condition of commutativity with the structure homomorphisms.

Theorem 2.32.

A ring A is a finitely generated R-algebra if and only if A is a quotient of a polynomial algebra (over R).

Proof

[if] Immediate from Example 2.17.

[only if] Let A := R[x1, . . . , xn]. The map η : R[X1, . . . , Xn] → A that takes f(X1, . . . , Xn) ↦ f(x1, . . . , xn) is a surjective R-algebra homomorphism. By the isomorphism theorem, one has the isomorphism AR[X1, . . . , Xn]/Ker η of R-algebras.

This theorem suggests that for the study of finitely generated algebras it suffices to investigate only the polynomial algebras and their quotients.

Exercise Set 2.7

2.63 Let V be a K-vector space, U a subspace of V, and T an arbitrary K-basis of U. Show that there is a K-basis of V that contains T. [H]
2.64
  1. Let V be a K-vector space, and U1, U2 subspaces of V. Show that the set U1 + U2 := {x1 + x2 | x1 ∈ U1, x2 ∈ U2} is a K-subspace of V. If U1 ∩ U2 = {0}, we say that U := U1 + U2 is the direct sum of U1 and U2 and write U = U1 ⊕ U2.

  2. Let V be a K-vector space and W a subspace of V. Show that there exists a subspace W′ of V such that V = WW′. This space W′ is called the complement subspace of W in V. [H]

2.65 Let V and W be K-vector spaces and f : V → W a K-linear map. Show that f is uniquely determined by the images f(x), x ∈ S, where S is a basis of V.
2.66 Let V and W be K-vector spaces. Check that HomK(V, W) is a vector space over K. Show that dimK(HomK(V, W)) = (dimK V)(dimK W). In particular, if W = K, then HomK(V, K) is isomorphic to V. The space HomK(V, K) is called the dual space of V.
2.67 Let V and W be m- and n-dimensional K-vector spaces, S = {x1, . . . , xm} a K-basis of V, T = {y1, . . . , yn} a K-basis of W, and f : V → W a K-linear map. For each i = 1, . . . , m, write f(xi) = ai1y1 + · · · + ainyn, aij ∈ K. The m × n matrix Mf := (aij) is called the transformation matrix of f (with respect to the bases S and T). We have:

Let V1, V2, V3 be K-vector spaces, f, f1, f2 ∈ HomK(V1, V2) and g ∈ HomK(V2, V3). Prove the following assertions:

  1. Mf1+f2 = Mf1 + Mf2 and Maf = aMf for all a ∈ K.

  2. Mg∘f = Mf Mg.

  3. f is invertible (as a map) if and only if Mf is invertible (as a matrix).

(Remark: This exercise shows that linear transformations of finite-dimensional vector spaces can be described in terms of matrices.)

2.68 Show that for every n ∈ ℕ there are integers a1, . . . , an that constitute a minimal set of generators for the unit ideal in ℤ. [H]
2.69 Let M be an R-module. A subset S of M is called a basis of M, if S generates M and is linearly independent over R in the sense that a1x1 + · · · + anxn = 0, n ∈ ℕ, ai ∈ R, distinct xi ∈ S, implies a1 = · · · = an = 0. Show that M has a basis if and only if M is a free R-module.
2.70 We define the rank of a finitely generated R-module M as

RankR M := min{#S | M is generated by S}.

If N is a submodule of M, show that RankR M ≤ RankR N + RankR(M/N). Give an example where the strict inequality holds.

2.71 Let M be an R-module. An element x ∈ M is called a torsion element of M, if Ann Rx ≠ 0, that is, if there is a non-zero a ∈ R with ax = 0. The set of all torsion elements of M is denoted by Tors M. M is called torsion-free if Tors M = {0}, and a torsion module if Tors M = M.
  1. Show that Tors M is a submodule of M.

  2. Show that Tors M is a torsion module (called the torsion submodule of M) and that the module M/Tors M is torsion-free.

  3. If R is an integral domain, show that every free module over R is torsion-free. In particular, every vector space is torsion-free.

2.72 Show that:
  1. ℚ is not finitely generated as a ℤ-module. [H]

  2. ℚ is not a free ℤ-module. [H]

  3. ℚ is a torsion-free ℤ-module.

This shows that the converse of Exercise 2.71(c) is not true in general.

2.8. Fields

In this section, we study some important properties of field extensions. We also give an introduction to Galois theory. Unless otherwise stated, the letters F, K and L stand for fields in this section.

2.8.1. Properties of Field Extensions

We have seen that if F ⊆ K is a field extension, then K is a vector space over F. This observation leads to the following very useful definitions.

Definition 2.54.

For a field extension F ⊆ K, the cardinality of any F-basis of K is called the degree of the extension F ⊆ K and is denoted by [K : F]. If [K : F] is finite, K is called a finite extension of F. Otherwise, K is called an infinite extension of F.

Proposition 2.30.

Let F ⊆ K ⊆ L be a tower of field extensions. Then [L : F] = [L : K] [K : F]. In particular, the extension F ⊆ L is finite if and only if the extensions F ⊆ K and K ⊆ L are finite. In that case, [L : K] | [L : F] and [K : F] | [L : F].

Proof

One can easily check that if S is an F-basis of K and S′ a K-basis of L, then the set {ss′ | s ∈ S, s′ ∈ S′} is an F-basis of L.

Recall the definitions of the rings F[X] of polynomials and F(X) of rational functions in one indeterminate X. These notations are now generalized. For a field extension F ⊆ K and for a ∈ K, we define:

F[a] := {f(a) | f(X) ∈ F[X]}

and

Equation 2.3

F(a) := {f(a)/g(a) | f(X), g(X) ∈ F[X], g(a) ≠ 0}
It is easy to see that F[a] is the smallest (with respect to inclusion) of the integral domains that contain F and a. Similarly F(a) is the smallest of the fields that contain F and a. We also have F[a] ⊆ F(a). Now we state the following important characterization of algebraic elements.

Theorem 2.33.

For a field extension F ⊆ K and an element a ∈ K, the following conditions are equivalent:

  1. The element a is algebraic over F.

  2. The extension F(a) is finite over F.

  3. F(a) = F[a].

Proof

[(a)⇒(b)] Let h(X) ∈ F[X] be the minimal polynomial of a over F, of degree d. Consider the ring homomorphism Φ : F[X] → K that takes f(X) ↦ f(a). From Proposition 2.28, Ker Φ = 〈h〉, and by the isomorphism theorem F[X]/〈h〉 ≅ Im Φ. Since h is irreducible over F, F[X]/〈h〉 and so Im Φ are fields. Since Im Φ contains F and a (note that Φ(X) = a), we have F(a) ⊆ Im Φ ⊆ F(a), that is, F(a) = Im Φ ≅ F[X]/〈h〉. Finally, notice that [F[X]/〈h〉 : F] = d.

[(b)⇒(c)] Let d := [F(a) : F]. Since the elements 1, a, a2, . . . , ad are linearly dependent over F, there exist α0, α1, . . . , αd ∈ F, not all 0, such that α0 + α1a + · · · + αdad = 0. This, in turn, implies that there is an irreducible polynomial h(X) ∈ F[X] with h(a) = 0. Now consider any g(X) ∈ F[X] with g(a) ≠ 0. Clearly, h ∤ g (because otherwise g(a) = 0). Since h is irreducible, gcd(g, h) = 1, that is, there exist polynomials u(X), v(X) ∈ F[X] with u(X)g(X) + v(X)h(X) = 1, that is, with u(a)g(a) = 1. But then 1/g(a) = u(a) ∈ F[a], so that F(a) ⊆ F[a] ⊆ F(a), that is, F(a) = F[a].

[(c)⇒(a)] Clearly, the element 0 is algebraic over F. So assume a ≠ 0. Since 1/a ∈ F(a) = F[a], by hypothesis there is a polynomial f(X) ∈ F[X] such that 1/a = f(a). But then a is a root of the non-constant polynomial Xf(X) – 1.
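The step (b)⇒(c) is effective: the inverse 1/g(a) is u(a), where the extended Euclidean algorithm in F[X] produces u(X)g(X) + v(X)h(X) = 1. The sketch below is our own illustration for F = ℤ2 and the irreducible h(X) = X3 + X + 1, so that F[a] ≅ F[X]/〈h〉 is the field with 8 elements; the bit-mask encoding and helper names are assumptions, not from the text.

```python
# Sketch of (b) => (c): invert g(a) in F[a] = F[X]/<h> with F = Z_2 and
# h(X) = X^3 + X + 1 irreducible. A polynomial over Z_2 is encoded as a
# bit mask: bit i holds the coefficient of X^i.

def pmul(f, g):                      # multiplication in Z_2[X]
    r = 0
    while g:
        if g & 1:
            r ^= f
        f <<= 1
        g >>= 1
    return r

def pdivmod(f, g):                   # division with remainder in Z_2[X]
    q = 0
    while f.bit_length() >= g.bit_length():
        shift = f.bit_length() - g.bit_length()
        q ^= 1 << shift
        f ^= g << shift
    return q, f

def inverse_mod(g, h):
    """u with u*g = 1 (mod h), by the extended Euclidean algorithm;
    assumes gcd(g, h) = 1, which holds since h is irreducible and h does
    not divide g."""
    r0, r1, u0, u1 = h, g, 0, 1
    while r1:
        q, r = pdivmod(r0, r1)
        r0, r1 = r1, r
        u0, u1 = u1, u0 ^ pmul(q, u1)
    return u0                        # here r0 = gcd(g, h) = 1

H = 0b1011                           # h(X) = X^3 + X + 1
g = 0b110                            # g(a) = a^2 + a, a non-zero element of F[a]
u = inverse_mod(g, H)
assert pdivmod(pmul(u, g), H)[1] == 1    # u(a) * g(a) = 1 in F[X]/<h>
```

This is exactly the inversion routine used for arithmetic in the finite field GF(8), a case of the theorem's statement F(a) = F[a].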

Corollary 2.10.

For a field extension F ⊆ K, the set of elements in K that are algebraic over F is a field.

Proof

It is sufficient to show that if a, b ∈ K are algebraic over F, then the elements a ± b, ab and a/b (if b ≠ 0) are also algebraic over F. By Theorem 2.33, [F(a) : F] is finite. Since b is algebraic over F, it is also algebraic over F(a). In particular, [F(a)(b) : F(a)] is finite. But then the extension F(a)(b) is also finite over F and contains a ± b, ab and a/b (if b ≠ 0).

The field F(a)(b) in the proof of the last corollary is also denoted as F(a, b). It is the smallest subfield of K that contains F, a and b, and it follows that F(a, b) = F(b, a). More generally, for a field extension F ⊆ K and for a1, . . . , an ∈ K, each algebraic over F, the field F(a1, . . . , an) is defined as F(a1)(a2) . . . (an) and is independent of the order in which the ai are adjoined.

Corollary 2.11.

Let F ⊆ K be a finite extension. Then K is algebraic over F.

Proof

For any a ∈ K, we have F ⊆ F(a) ⊆ K. Now use Proposition 2.30 and Theorem 2.33.

The converse of the last corollary is not true, that is, it is possible that an algebraic extension has infinite extension degree. Exercise 2.59 gives an example.

Corollary 2.12.

If F ⊆ K and K ⊆ L are algebraic field extensions, then F ⊆ L is also algebraic.

Proof

Take an arbitrary a ∈ L. Since K ⊆ L is algebraic, there is a non-zero polynomial f(X) = αnXn + · · · + α1X + α0 ∈ K[X] such that f(a) = 0. It then follows that a is algebraic over F(α0, . . . , αn). Since each αi is algebraic over F, the degree [F(α0, . . . , αn) : F] is finite. Therefore, [F(α0, . . . , αn)(a) : F] = [F(α0, . . . , αn)(a) : F(α0, . . . , αn)] [F(α0, . . . , αn) : F] is also finite and hence F(α0, . . . , αn)(a) and, in particular, a are algebraic over F.

Definition 2.55.

A field extension F ⊆ K is called simple, if K = F(a) for some a ∈ K.

Proposition 2.31.

Let F be a field of characteristic 0 and let a, b (belonging to some extension of F) be algebraic over F. Then the extension F(a, b) of F is simple.

Proof

Let p(X) and q(X) be the minimal polynomials (over F) of a and b respectively. Let d := deg p and d′ := deg q. The polynomials p and q are irreducible over F and hence by Exercise 2.61 have no multiple roots. Let a1, . . . , ad be the roots of p and b1, . . . , bd′ the roots of q with a = a1 and b = b1. For each i, j with j ≠ 1, the equation ai + λbj = a + λb has a unique solution for λ (not necessarily in F). Since F is infinite, we can choose μ ∈ F which is not a solution of any of the equations just mentioned. Define c := a + μb, so that cai + μbj for all i, j with j ≠ 1. Clearly, F(c) ⊆ F(a, b). To prove the reverse inclusion, note that by hypothesis q(b) = 0. Also if we define f(X) := p(c – μX) ∈ F(c)[X], we see that f(b) = p(c – μb) = p(a) = 0. By the choice of c, we have f(bj) ≠ 0 for j ≠ 1. Finally, since q is square-free, we have gcd(f, q) = X – b, computed in F(c)[X]. This implies that b ∈ F(c) and so a = c – μb ∈ F(c) too.
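The construction in the proof can be illustrated numerically. The example below is our own, not the book's: for a = √2 and b = √3 over F = ℚ, already μ = 1 avoids all the forbidden equations, and c = √2 + √3 generates ℚ(√2, √3). The floating-point checks stand in for exact algebraic identities.

```python
# Our numerical illustration of Proposition 2.31: c = sqrt(2) + sqrt(3)
# is a primitive element of Q(sqrt(2), sqrt(3)) over Q (here mu = 1).
import math

a, b = math.sqrt(2), math.sqrt(3)
c = a + b

# Both generators are polynomial expressions in c with rational coefficients:
# a = (c^3 - 9c)/2 and b = (11c - c^3)/2, so F(a, b) = F(c).
assert abs((c**3 - 9*c) / 2 - a) < 1e-9
assert abs((11*c - c**3) / 2 - b) < 1e-9
# c itself is algebraic of degree 4: it is a root of X^4 - 10X^2 + 1.
assert abs(c**4 - 10*c**2 + 1) < 1e-9
```

These identities follow from c2 = 5 + 2√6, and the degree 4 matches [ℚ(√2, √3) : ℚ] = 2 · 2 from Proposition 2.30.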

Corollary 2.13.

A finite extension F ⊆ K of fields of characteristic 0 is simple.

Proof

We proceed by induction on d := [K : F]. The result vacuously holds for d = 1. So let us assume that d > 1 and that the result holds for all smaller values of d. Choose an element a ∈ K \ F. Then [F(a) : F] > 1 and divides d. If [F(a) : F] = d, we are done. So assume [F(a) : F] < d. Since [K : F(a)] < d, by the induction hypothesis the extension F(a) ⊆ K is simple, say K = F(a)(b) = F(a, b). The result now follows immediately from the previous proposition.

2.8.2. Splitting Fields and Algebraic Closure

Let f(X) be a non-constant polynomial of degree d in F[X]. Assume that f does not split over F. Consider an irreducible (in F[X]) factor f′ of f of degree d′ > 1. F′ := F[X]/〈f′〉 is a field extension of F. Furthermore, if α1 denotes the equivalence class of X in F′, the elements 1, α1, . . . , α1d′–1 constitute a basis of F′ over F. In particular, [F′ : F] = d′ ≤ d. Now, one can write f(X) = (X – α1)g(X) for some g(X) ∈ F′[X]. If g splits over F′, so does f too. Otherwise, choose any irreducible (in F′[X]) factor g′ of g with deg g′ > 1 and consider the field extension F″ := F′[X]/〈g′〉. Then [F″ : F′] = deg g′ ≤ deg g = d – 1, so that [F″ : F] ≤ d(d – 1). Moreover, if α2 denotes the equivalence class of X in F″, then f(X) = (X – α1)(X – α2)h(X) for some h(X) ∈ F″[X]. Proceeding in this way we get:

Proposition 2.32.

For a polynomial f(X) ∈ F[X] of degree d ≥ 1, there is a field extension K of F with [K : F] ≤ d!, such that f splits over K.
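The first step of the construction above can be made concrete; the following is our own illustration, not from the text. For F = ℤ2 and the irreducible f(X) = X2 + X + 1, the quotient K = F[X]/〈f〉 is a field with 4 elements, and f already splits over K. Elements of K are encoded as pairs (c0, c1) standing for c0 + c1α, where α is the class of X; the encoding and helper names are assumptions.

```python
# Sketch for F = Z_2, f(X) = X^2 + X + 1 (irreducible over F):
# in K = F[X]/<f>, a field with 4 elements, f splits into linear factors.
# (c0, c1) encodes c0 + c1*alpha with alpha^2 = alpha + 1.

def kmul(x, y):
    """Multiply c0 + c1*alpha by d0 + d1*alpha, reducing alpha^2 to alpha + 1."""
    c0, c1 = x
    d0, d1 = y
    e0 = (c0 * d0 + c1 * d1) % 2            # alpha^2 contributes 1
    e1 = (c0 * d1 + c1 * d0 + c1 * d1) % 2  # alpha^2 contributes alpha
    return (e0, e1)

def kadd(x, y):
    return ((x[0] + y[0]) % 2, (x[1] + y[1]) % 2)

def f_of(x):
    """Evaluate f(x) = x^2 + x + 1 in K."""
    return kadd(kadd(kmul(x, x), x), (1, 0))

K = [(0, 0), (1, 0), (0, 1), (1, 1)]
roots = [x for x in K if f_of(x) == (0, 0)]
# f has no root in F = {(0,0), (1,0)}, but two roots alpha and alpha + 1 in K,
# so f(X) = (X - alpha)(X - alpha - 1) splits over K with [K : F] = 2 <= 2!.
assert roots == [(0, 1), (1, 1)]
```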

We now establish the uniqueness of the splitting field of a polynomial f(X) ∈ F[X]. To start with, we set up certain notations. An isomorphism μ : F → F′ of fields induces an isomorphism μ* : F[X] → F′[Y] of polynomial rings, defined by adXd + ad–1Xd–1 + · · · + a0 ↦ μ(ad)Yd + μ(ad–1)Yd–1 + · · · + μ(a0). We have μ*(a) = μ(a) for all a ∈ F. Note also that f ∈ F[X] is irreducible over F if and only if μ*(f) is irreducible over F′. With these notations we state the following important lemma.

Lemma 2.5.

Let the non-constant polynomial f(X) ∈ F[X] be irreducible over F. Let α and β be roots of f and μ*(f) respectively. Then there is an isomorphism τ : F(α) → F′(β) of fields such that τ(a) = μ(a) for all a ∈ F and τ(α) = β.

Proof

Since F(α) = F[α] and F′(β) = F′[β], we can define the map τ : F[α] → F′[β] by g(α) ↦ (μ*(g))(β) for each g(X) ∈ F[X]. It is now an easy check that τ is a well-defined isomorphism of fields with the desired properties.

Roots of an irreducible polynomial are called conjugates (of each other). If α and β are two roots of an irreducible polynomial f(X) ∈ F[X], the last lemma guarantees the existence of an isomorphism τ : F(α) → F(β) that fixes all the elements of F and that maps α ↦ β.

Proposition 2.33.

We use the maps μ : F → F′ and μ* : F[X] → F′[Y] as defined above. Let f(X) ∈ F[X] be a non-constant polynomial and let K and K′ be splitting fields of f and μ*(f) over F and F′ respectively. Then there is an isomorphism τ : K → K′ of fields, such that τ(a) = μ(a) for all a ∈ F.

Proof

We proceed by induction on n := [K : F]. (By Proposition 2.32 n is finite.) If n = 1, then K = F, that is, the polynomial f splits over F itself and so does μ*(f) over F′, that is, K′ = F′. Thus τ = μ is the desired isomorphism.

Now assume that n > 1 and that the result holds for all fields L and for all polynomials in L[X] with splitting fields (over L) of extension degrees less than n. Consider an irreducible factor g of f with 1 < deg g ≤ deg f. Note that g also splits over K. We take any root α ∈ K of g and consider the tower of field extensions F ⊆ F(α) ⊆ K. Similarly, let β ∈ K′ be a root of μ*(g) and consider F′ ⊆ F′(β) ⊆ K′. By Lemma 2.5 there is an isomorphism ν : F(α) → F′(β) with ν(a) = μ(a) for all a ∈ F and ν(α) = β. Now [K : F(α)] = [K : F]/[F(α) : F] = [K : F]/deg g < n. It is evident that K and K′ are splitting fields of f and μ*(f) over F(α) and F′(β) respectively. Hence by the induction hypothesis there is an isomorphism τ : K → K′ with τ(a) = ν(a) for all a ∈ F(α). In particular, τ(a) = μ(a) for all a ∈ F.

The results pertaining to the splitting field of a polynomial can be generalized in the following way. Let S be a non-empty subset of F[X]. A splitting field of S over F is a minimal field K containing F such that each polynomial f ∈ S splits in K. If S = {f1, . . . , fr} is a finite set, the splitting field of S is the same as the splitting field of f = f1 · · · fr (Exercise 2.57). But the situation is different, if S is infinite. Of particular interest is the set S consisting of all irreducible polynomials in F[X]. In this case, the splitting field of S is an algebraic closure of F.

We give a sketch of the proof that even when S is infinite, a splitting field for S can be constructed. This, in particular, establishes the existence of an algebraic closure of any field. We may assume that S comprises non-constant polynomials only. For each f ∈ S, we define an indeterminate Xf and consider the polynomial ring A := F[Xf | f ∈ S] and the ideal 𝔞 of A generated by f(Xf) for all f ∈ S. We have 𝔞 ≠ A and, therefore, there is a maximal ideal 𝔪 of A containing 𝔞 (Exercise 2.23). Consider the field F1 := A/𝔪 containing F. Every polynomial f ∈ S has at least one root in F1. Now we replace F by F1 and as above get another field F2 containing F1 (and hence F), such that every polynomial in S (of degree ≥ 2) has at least two roots in F2. We continue this procedure (infinitely often, if necessary) and obtain a sequence of fields F ⊆ F1 ⊆ F2 ⊆ F3 ⊆ · · ·. Define K to be the field consisting of all elements of ∪i≥1 Fi that are algebraic over F. Each polynomial in S splits in K, but in no proper subfield of K, that is, K is a splitting field of S.

It turns out that the splitting field of S is unique up to isomorphisms that fix elements of F. In particular, the algebraic closure of F is unique up to isomorphisms that fix elements of F, and is denoted by F̄.

*2.8.3. Elements of Galois Theory

For a field K, the set Aut K of all automorphisms of K is a group under (functional) composition. We extend this concept now. Let F ⊆ K be an extension of fields.

Definition 2.56.

An automorphism σ ∈ Aut K is called an F-automorphism of K, if σ fixes all the elements of F (which means that σ(a) = a for all a ∈ F). The set of all F-automorphisms of K is denoted by AutF K or by Gal(K|F) and is a subgroup of Aut K. The Galois group of a polynomial f ∈ F[X] is defined to be the group AutF K, where K is the splitting field of f over F.

Conversely, for a subgroup H of AutF K the set of elements of K that are fixed by all the automorphisms of H, that is, the set of all a ∈ K with σ(a) = a for every σ ∈ H, is a subfield of K, called the fixed field of H (over F) and denoted as FixF H. Clearly, F ⊆ FixF H ⊆ K.

For every intermediate field L (that is, a field L with F ⊆ L ⊆ K), we have a subgroup AutL K of AutF K. Conversely, given a subgroup H of AutF K, we have the intermediate fixed field FixF H. It is a relevant question to ask if there is any relationship between the subgroups of AutF K and the intermediate fields. A nice correspondence exists for a particular type of extension that we define now.

Definition 2.57.

A field extension F ⊆ K is said to be a Galois extension (or K is said to be a Galois extension over F), if FixF (AutF K) = F. Thus K is Galois over F if and only if for every a ∈ K \ F there is a σ ∈ AutF K with σ(a) ≠ a.

Example 2.18.

Let K be the splitting field of a non-constant polynomial f ∈ F[X]. By Exercise 2.77, the extension F ⊆ K is normal. Assume that F ⊆ K is a separable extension (Exercise 2.75). Consider an element α ∈ K \ F and let g be the minimal polynomial of α over F. Then deg g > 1 and g splits in K[X]. By the assumption (of separability), there is a root β of g with β ≠ α. Lemma 2.5 shows that there is a τ ∈ AutF K such that τ(α) = β ≠ α. Thus, K is Galois over F. In particular, if char F = 0 or if F is a finite field, then F ⊆ K is separable and so Galois. For example, the splitting field of any non-constant polynomial in ℚ[X] is a Galois extension of ℚ.

The following theorem establishes the correspondence we are looking for.

Theorem 2.34. Fundamental theorem of Galois theory

For a finite Galois extension F ⊆ K, there is a bijective correspondence between the set of all intermediate fields and the set of all subgroups of AutF K (given by L ↦ AutL K and H ↦ FixF H) such that the following assertions hold:

  1. AutFixF H K = H for every subgroup H of AutF K.

  2. FixF (AutL K) = L for every field L with F ⊆ L ⊆ K.

  3. For field extensions F ⊆ L ⊆ L′ ⊆ K, the extension degree [L′ : L] is the same as the index [AutL K : AutL′ K]. In particular, the order of AutF K is [K : F].

  4. For every intermediate field L, one has:

  1. K is Galois over L.

  2. L is Galois over F if and only if AutL K is a normal subgroup of AutF K. In this case, AutF L ≅ AutF K/AutL K.

A proof of this theorem is rather long and uses many auxiliary results which we would not need otherwise. We, therefore, choose to omit the proof here.

Exercise Set 2.8

2.73 Let α be transcendental over F. Show that the domain F[α] and the field F(α) are respectively isomorphic to the polynomial ring F[X] and the field F(X) of rational functions in one indeterminate X. Generalize the result to an arbitrary family αi, i ∈ I, of elements each of which is transcendental over F.
2.74 Let F ⊆ K be a field extension and let σ be an endomorphism of K with σ(a) = a for every a ∈ F.
  1. If a non-constant polynomial f ∈ F[X] has a root α ∈ K, show that σ(α) is also a root of f. For example, if F = ℝ, K = ℂ, and σ is the automorphism mapping z to its (complex) conjugate z̄, then we conclude that if a complex number z is a root of f ∈ ℝ[X], then z̄ is also a root of f. A similar result holds for the extension ℚ ⊆ ℚ(√m), where m is a non-square rational number.

  2. If K is algebraic over F, show that σ is an automorphism of K. [H]

2.75 Let F ⊆ K be a field extension.
  1. An irreducible polynomial f ∈ F[X] is said to be separable over F, if f has no multiple roots. An algebraic element α ∈ K is said to be separable over F, if the minimal polynomial of α over F is separable. K is called a separable extension of F, if every element of K is (algebraic and) separable over F. Show that if char F = 0 or if F is a finite field, and if K is an algebraic extension of F, then K is separable over F. [H]

  2. An algebraic element α ∈ K is called purely inseparable over F, if the minimal polynomial of α over F factors in K[X] as (X − α)^n for some n ∈ ℕ. If every element of K is (algebraic and) purely inseparable over F, then K is called a purely inseparable extension of F. Show that α ∈ K is both separable and purely inseparable over F if and only if α ∈ F. Thus, if char F = 0 or if F is a finite field, then F has no purely inseparable extension other than itself.

  3. If p := char F > 0, then an element α ∈ K is purely inseparable over F if and only if minpoly_{α,F}(X) = X^{p^r} + a for some r ≥ 0 and a ∈ F. In particular, show that if K is a finite purely inseparable extension of F, then [K : F] = p^s for some s ≥ 0.

2.76 F is called a perfect field, if every irreducible polynomial in F[X] is separable over F.
  1. Show that F is a perfect field if and only if every algebraic extension of F is separable over F. In particular, the fields of characteristic 0 and the finite fields Fq are perfect.

  2. Let p := char F > 0. Show that F is perfect if and only if every element of F has a p-th root in F. [H]

2.77 A field extension F ⊆ K is called normal, if every irreducible polynomial in F[X], that has a root in K, splits in K[X].
  1. If K is the splitting field of a polynomial over F, show that K is a normal extension of F. [H]

  2. If [K : F] = 2, show that F ⊆ K is a normal extension.

  3. Consider the tower of field extensions ℚ ⊆ ℚ(√2) ⊆ ℚ(2^{1/4}) to conclude that if F ⊆ K and K ⊆ L are normal extensions, then F ⊆ L need not be normal.

2.78 Prove the following assertions:
  1. The algebraic closure ℚ̄ of ℚ is an infinite extension of ℚ. [H]

  2. . [H]

2.79 Let F ⊆ K be a field extension and let L be the fixed field of AutF K over F. Show that K is a Galois extension of L.

2.9. Finite Fields

Finite fields are arguably the most important type of fields used in cryptography. They enjoy certain nice properties that infinite fields (in particular, the well-known fields ℚ, ℝ and ℂ) do not. We concentrate on some properties of finite fields in this section. As we will see later, arithmetic over a finite field K is fast when char K = 2 or when #K is a prime. As a result, these two classes of fields are the most common ones employed in cryptography. However, in this section, we do not restrict ourselves to these specific fields only, but provide a general treatment valid for all finite fields. As in the previous section, we continue to use the letters F, K, L to denote fields. In addition, we use the letter p to denote a prime number and q a power of p, that is, q = p^n for some n ∈ ℕ.

2.9.1. Existence and Uniqueness of Finite Fields

Let K be a finite field of cardinality q. Then p := char K > 0. By Proposition 2.7, p is a prime, that is, K contains an isomorphic copy of the field Fp. If n := [K : Fp], we have q = p^n. Therefore, we have proved the first statement of the following important result.

Theorem 2.35.

The cardinality of a finite field is a power p^n, n ∈ ℕ, of a prime number p. Conversely, given a prime p and n ∈ ℕ, there exists a finite field of cardinality p^n.

Proof

In order to construct a finite field of cardinality q := p^n, we start with F := Fp and consider the splitting field K of the polynomial f(X) := X^q − X over F. Since f′(X) = −1 ≠ 0, the roots of f are distinct (Exercise 2.61). Therefore, the set E := {a ∈ K | a^q = a} of roots of f has cardinality q. By Exercise 2.80, E is a field. Since F ⊆ E ⊆ K and f splits over E, by the definition of splitting fields we have K = E, that is, #K = #E = q.

Theorem 2.36. Fermat’s little theorem for finite fields

Let K be a finite field of cardinality q. Then every a ∈ K satisfies a^q = a.

Proof

Clearly, 0^q = 0. Take a ≠ 0. K* being a group of order q − 1, by Proposition 2.4 ord_{K*}(a) divides q − 1. In particular, a^{q−1} = 1, that is, a^q = a.
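For the prime fields Fp this is just Fermat's little theorem from elementary number theory, and it is easy to check mechanically. A quick sketch in Python (the choice of primes is arbitrary):

```python
# Theorem 2.36 for the prime fields F_p: every a in F_p satisfies a^p = a.
for p in (2, 3, 5, 7, 11, 13):
    assert all(pow(a, p, p) == a for a in range(p))
```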

Theorem 2.37.

Let K be a finite field of cardinality q = p^n and let F be the subfield of K isomorphic to Fp. Then K is the splitting field of the polynomial X^q − X over F. In particular, K is unique up to F-isomorphisms (that is, isomorphisms fixing the elements of F).

Proof

By Theorem 2.36, each of the q elements of K is a root of f(X) := X^q − X and consequently K is the splitting field of f. The last assertion of the theorem follows from the uniqueness of splitting fields (Proposition 2.33).

This uniqueness allows us to talk about the finite field of cardinality q (rather than a finite field of cardinality q). We denote this (unique) field by Fq.

The results proved so far can be generalized to arbitrary extensions Fq ⊆ F_{q^m}, where q = p^n and n, m ∈ ℕ. We leave the details to the reader (Exercise 2.82). It is important to point out here that since F_{q^m} is the splitting field of X^{q^m} − X over Fq, by Exercise 2.77 we have:

Corollary 2.14.

Every finite extension of finite fields is normal.

This implies that an irreducible polynomial f ∈ Fq[X] has either none or all of its roots in F_{q^m}. Also, if α ∈ Fq with q = p^n, then α^q = α^{p^n} = α. Therefore, α^{p^{n−1}} is a p-th root of α. By Exercise 2.76(b), we then conclude:

Corollary 2.15.

Every finite field is perfect.

Proposition 2.34.

Consider the extension Fq ⊆ K := F_{q^m}, m ∈ ℕ. There is a unique intermediate field with q^d elements, d ∈ ℕ, if and only if d|m. Furthermore, if d|m, then α ∈ K belongs to the (unique intermediate) field with q^d elements if and only if α^{q^d} = α.

Proof

For d|m, we have (X^{q^d} − X)|(X^{q^m} − X). The q^d roots of X^{q^d} − X in K constitute an intermediate field L. If L′ ≠ L is another intermediate field with q^d elements, then by Theorem 2.36 there are more than q^d elements of K that are roots of X^{q^d} − X, a contradiction. Conversely, an intermediate field L contains q^d elements, where d := [L : Fq]. Since q^m = #K = (q^d)^{[K : L]}, we have d|m. The last assertion of the proposition follows immediately from the above argument.

Corollary 2.16.

Let f ∈ Fq[X] be irreducible and let f have a root in F_{q^m}. Then deg f divides m.

Proof

Consider the extension Fq ⊆ Fq(α) = F_{q^d} generated by a root α ∈ F_{q^m} of f, where d := deg f, and use the fact that Fq ⊆ F_{q^m} is a normal extension.

Now we will prove a very important result concerning the multiplicative group Fq* = Fq \ {0}.

Theorem 2.38.

Fq* is a cyclic group for any finite field Fq.

Proof

Modify the proof of Proposition 2.19 or use the following more general result.

Theorem 2.39.

Let K be a field (not necessarily finite). Then any finite subgroup G of the multiplicative group K* is cyclic.

Proof

Since K is a field, for any n ∈ ℕ the polynomial X^n − 1 has at most n roots in K and hence in G. The theorem then follows immediately from Exercise 2.18.

Corollary 2.17.

Every finite extension Fq ⊆ F_{q^m} is simple. In particular, Fq[X] contains an irreducible polynomial of degree m (for any q and m).

Proof

Let α be a generator of the cyclic group F_{q^m}*. Then m is the smallest of the positive integers s for which α^{q^s} = α. Let f := minpoly_{α,Fq} with d := deg f, so that Fq(α) = F_{q^d}. If d < m, then α^{q^d} = α, a contradiction. Thus d = m, that is, F_{q^m} = Fq(α).

2.9.2. Polynomials over Finite Fields

In this section, we study some useful properties of polynomials over finite fields. We concentrate on polynomials in Fq[X] for an arbitrary q = p^n, p prime, n ∈ ℕ. We have seen how the polynomials X^{q^m} − X proved to be important for understanding the structure of finite fields. But that is not all; these polynomials indeed have further roles to play, and they will appear repeatedly in what follows.

Let Fq ⊆ F_{q^m} be a finite extension of finite fields and let α ∈ F_{q^m} be a root of a polynomial f ∈ Fq[X]. Since each coefficient a of f satisfies a^q = a, we have f(α^q) = f(α)^q = 0 (Exercise 2.80). Therefore, α^q is also a root of f. More generally, for each r = 0, 1, 2, · · · the element α^{q^r} is a root of f(X). This gives us a nice procedure for computing the minimal polynomial of α, as the following corollary suggests.

Corollary 2.18.

The minimal polynomial of α ∈ F_{q^m} over Fq is (X − α)(X − α^q) · · · (X − α^{q^{d−1}}), where d is the smallest of the positive integers s for which α^{q^s} = α.

Proof

Let f_α := minpoly_{α,Fq} have degree δ. So F_{q^δ} = Fq(α) is the smallest field containing (Fq and) α and hence all the roots of f_α, that is, α^{q^s} = α for s = δ and for no smaller positive integer values of s. Therefore, δ = d and the conjugates of α are precisely α, α^q, . . . , α^{q^{d−1}}.
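Corollary 2.18 can be checked directly on a small field. The sketch below realizes F_8 as F_2[x]/(x³ + x + 1) (this defining polynomial and all helper names are our choices for illustration, not the text's), multiplies out (X − α)(X − α²)(X − α⁴) for α the residue class of x, and recovers x³ + x + 1:

```python
# Minimal polynomial of alpha in F_8 = F_2[x]/(x^3 + x + 1) via Corollary 2.18.
# Field elements are 3-bit integers; bit i is the coefficient of x^i.

def gf8_mul(a, b):
    """Multiply two elements of F_2[x]/(x^3 + x + 1)."""
    r = 0
    for _ in range(3):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0b1000:
            a ^= 0b1011          # reduce: x^3 = x + 1
    return r

def polmul(f, g):
    """Multiply polynomials with coefficients in F_8 (lists, lowest degree first)."""
    r = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            r[i + j] ^= gf8_mul(a, b)    # addition in characteristic 2 is XOR
    return r

alpha = 0b010                            # the residue class of x
conjugates = [alpha]
for _ in range(2):                       # alpha^2 and alpha^4 by repeated squaring
    conjugates.append(gf8_mul(conjugates[-1], conjugates[-1]))

minpoly = [1]                            # the constant polynomial 1
for c in conjugates:
    minpoly = polmul(minpoly, [c, 1])    # times (X + c); note -c = c in char 2

assert minpoly == [1, 1, 0, 1]           # X^3 + X + 1, lowest degree first
```

The three conjugates collapse into a polynomial with coefficients in F_2, exactly as the corollary predicts.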

We now prove a theorem which has important consequences.

Theorem 2.40.

X^{q^m} − X is the product of all monic irreducible polynomials in Fq[X], whose degrees divide m.

Proof

We have X^{q^m} − X = ∏_{α ∈ F_{q^m}} (X − α). By Corollary 2.18, the minimal polynomial f_α(X) of each α ∈ F_{q^m} over Fq divides X^{q^m} − X. By Corollary 2.16, deg f_α divides m. Finally, since f_α(X) = f_β(X) or gcd(f_α(X), f_β(X)) = 1 depending on whether α and β are conjugates or not, X^{q^m} − X is a product of monic irreducible polynomials of Fq[X], whose degrees divide m. In order to show that X^{q^m} − X is the product of all such polynomials, let us consider an arbitrary polynomial g ∈ Fq[X] which is monic and irreducible over Fq and has degree d|m. The polynomial g splits over F_{q^d} (with no multiple roots, finite fields being perfect). Since d|m, by Proposition 2.34 F_{q^d} ⊆ F_{q^m}. Thus g splits over F_{q^m} as well and, in particular, divides X^{q^m} − X.

The first consequence of Theorem 2.40 is that it leads to a procedure for checking the irreducibility of a polynomial f ∈ Fq[X]. Let d := deg f. If f(X) is reducible, it admits an irreducible factor of degree ≤ ⌊d/2⌋. Since gcd(f(X), X^{q^i} − X) is the product of all distinct irreducible factors of f with degrees dividing i, we compute the gcds g_i := gcd(f(X), X^{q^i} − X) for i = 1, . . . , ⌊d/2⌋. If all these gcds are 1, we conclude that f is irreducible. Otherwise f is reducible. We will see an optimized implementation of this procedure in Chapter 3. Besides irreducibility testing, the above theorem also leads to algorithms for finding random irreducible polynomials and for factorizing polynomials, as we will also discuss in Chapter 3.
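The gcd-based test can be sketched for prime fields Fp as follows. This is a plain reference version, not the optimized implementation of Chapter 3; a polynomial is a coefficient list (lowest degree first), and all function names are ours:

```python
# Straightforward gcd-based irreducibility test over F_p (p a small prime).

def polmod(a, f, p):
    """Remainder of a modulo f in F_p[X]."""
    a = [c % p for c in a]
    while a and a[-1] == 0:
        a.pop()
    finv = pow(f[-1], p - 2, p)          # inverse of the leading coefficient
    while len(a) >= len(f):
        c = a[-1] * finv % p
        s = len(a) - len(f)
        for i, fc in enumerate(f):
            a[s + i] = (a[s + i] - c * fc) % p
        while a and a[-1] == 0:
            a.pop()
    return a

def polsub(a, b, p):
    n = max(len(a), len(b))
    r = [((a[i] if i < len(a) else 0) - (b[i] if i < len(b) else 0)) % p
         for i in range(n)]
    while r and r[-1] == 0:
        r.pop()
    return r

def polmulmod(a, b, f, p):
    r = [0] * max(len(a) + len(b) - 1, 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            r[i + j] = (r[i + j] + x * y) % p
    return polmod(r, f, p)

def polpowmod(a, e, f, p):
    result, base = [1], polmod(a, f, p)
    while e:
        if e & 1:
            result = polmulmod(result, base, f, p)
        base = polmulmod(base, base, f, p)
        e >>= 1
    return result

def polgcd(a, b, p):
    while b:
        a, b = b, polmod(a, b, p)
    if a:                                 # normalize to a monic gcd
        inv = pow(a[-1], p - 2, p)
        a = [c * inv % p for c in a]
    return a

def is_irreducible(f, p):
    """Test irreducibility of f in F_p[X] (deg f >= 1) via Theorem 2.40."""
    d = len(f) - 1
    x = [0, 1]
    xq = x
    for i in range(1, d // 2 + 1):
        xq = polpowmod(xq, p, f, p)       # X^(p^i) mod f
        # gcd(f, X^(p^i) - X) collects the irreducible factors of f
        # whose degrees divide i
        if len(polgcd(f, polsub(xq, x, p), p)) > 1:
            return False
    return True

assert is_irreducible([1, 1, 0, 1], 2)        # X^3 + X + 1 over F_2
assert not is_irreducible([1, 0, 1], 2)       # X^2 + 1 = (X + 1)^2 over F_2
assert is_irreducible([1, 0, 1], 3)           # X^2 + 1 over F_3
```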

The second consequence of Theorem 2.40 is that it gives us a formula for calculating the number of monic irreducible polynomials of a given degree over a given finite field. First we need to define a function on the set ℕ of positive integers.

Definition 2.58.

The Möbius function μ : ℕ → {−1, 0, 1} is defined as

μ(n) := 1, if n = 1; μ(n) := (−1)^r, if n is a product of r pairwise distinct primes; μ(n) := 0, if n is divisible by the square of a prime.

It follows that μ(n) ≠ 0 if and only if n is square-free.

Lemma 2.6.

For n ∈ ℕ, we have

Σ_{d|n} μ(d) = 1 for n = 1, and Σ_{d|n} μ(d) = 0 for n > 1,

where Σ_{d|n} denotes summation over all positive divisors d of n.

Proof

The result follows immediately for n = 1. For n > 1, write n = p1^{a1} · · · pr^{ar}, where p1, . . . , pr are r ≥ 1 distinct primes and a1, . . . , ar ≥ 1. The only non-zero terms in the sum Σ_{d|n} μ(d) are those corresponding to d = p_{i1} · · · p_{is} for pairwise distinct choices of i1, . . . , is ∈ {1, . . . , r} (with d = 1 for s = 0). From the definition of μ, it then follows that Σ_{d|n} μ(d) = Σ_{s=0}^{r} (r choose s)(−1)^s = (1 − 1)^r = 0.
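The definition of μ and the divisor-sum identity of Lemma 2.6 are easy to verify mechanically. A small sketch (the function name and trial-division approach are ours):

```python
def mobius(n):
    """The Moebius function, by trial division; returns 0 on a squared prime factor."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0               # n is not square-free
            result = -result
        p += 1
    if n > 1:                          # one remaining prime factor
        result = -result
    return result

assert [mobius(n) for n in range(1, 11)] == [1, -1, -1, 0, -1, 1, -1, 0, 0, 1]

# Lemma 2.6: the divisor sum of mu vanishes for every n > 1
for n in range(2, 200):
    assert sum(mobius(d) for d in range(1, n + 1) if n % d == 0) == 0
```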

Lemma 2.7. Möbius inversion formula

Let f and g be maps from ℕ to an Abelian group G.

  1. If G is additive and g(n) = Σ_{d|n} f(d) for all n ∈ ℕ, then f(n) = Σ_{d|n} μ(d) g(n/d) = Σ_{d|n} μ(n/d) g(d) for all n ∈ ℕ.

  2. If G is multiplicative and g(n) = ∏_{d|n} f(d) for all n ∈ ℕ, then f(n) = ∏_{d|n} g(n/d)^{μ(d)} = ∏_{d|n} g(d)^{μ(n/d)} for all n ∈ ℕ.

Proof

To prove the additive formula, we note that

Σ_{d|n} μ(d) g(n/d) = Σ_{d|n} μ(d) Σ_{c|(n/d)} f(c) = Σ_{c|n} f(c) Σ_{d|(n/c)} μ(d) = f(n),

where the last equality follows from Lemma 2.6 (the inner sum vanishes unless c = n). The multiplicative formula can be proved similarly.

Let us denote by ν_{q,m} the number of monic irreducible polynomials in Fq[X] of degree m and by I_{q,m}(X) the product of all monic irreducible polynomials in Fq[X] of degree m. By Theorem 2.40, we have q^m = Σ_{d|m} d ν_{q,d} and X^{q^m} − X = ∏_{d|m} I_{q,d}(X). Applications of the Möbius inversion formula then yield the following formulas:

Equation 2.4

ν_{q,m} = (1/m) Σ_{d|m} μ(d) q^{m/d},    I_{q,m}(X) = ∏_{d|m} (X^{q^d} − X)^{μ(m/d)}
If p1, . . . , pr are all the prime divisors of m (not necessarily all distinct), Equation (2.4) together with the observation that μ(n) ≥ −1 for all n ∈ ℕ implies that m ν_{q,m} ≥ q^m − (2^r − 1) q^{m/2}, since there are at most 2^r divisors d of m with μ(d) ≠ 0, and each term with d > 1 is at least −q^{m/2}. But each pi ≥ 2, so that m ≥ 2^r, and hence ν_{q,m} ≥ (1/m)(q^m − (m − 1) q^{m/2}) > 0. We, therefore, have an independent proof of the second statement in Corollary 2.17. Moreover, for practical values of q and m, we have the good approximation:

Equation 2.5

ν_{q,m} ≈ q^m / m
Since the total number of monic polynomials of degree m in Fq[X] is q^m, a randomly chosen monic polynomial in Fq[X] of degree m is irreducible with probability approximately 1/m, that is, one expects to get an irreducible polynomial of degree m after O(m) random monic polynomials are picked up from Fq[X]. These observations have an important bearing on devising efficient algorithms for finding irreducible polynomials over finite fields. (See Chapter 3.)
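Equation (2.4) can be verified by brute force for small parameters. The sketch below (all names ours) counts the monic irreducible polynomials over Fp directly, by marking every product of two monic polynomials of lower degree as reducible, and compares the count with the formula:

```python
# Checking Equation (2.4) against a brute-force count over F_p.
from itertools import product

def mobius(n):
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    return -result if n > 1 else result

def nu(q, m):
    """Equation (2.4): the number of monic irreducible polynomials of degree m."""
    return sum(mobius(d) * q ** (m // d) for d in range(1, m + 1) if m % d == 0) // m

def polmul(a, b, p):
    r = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            r[i + j] = (r[i + j] + x * y) % p
    return tuple(r)

def monic(p, k):
    """All monic polynomials of degree k over F_p (coefficient tuples, lowest first)."""
    for lows in product(range(p), repeat=k):
        yield lows + (1,)

def count_irreducible(p, m):
    """Count monic irreducibles of degree m by excluding all nontrivial products."""
    reducible = set()
    for i in range(1, m // 2 + 1):
        for a in monic(p, i):
            for b in monic(p, m - i):
                reducible.add(polmul(a, b, p))
    return p ** m - len(reducible)

for p, m in ((2, 1), (2, 4), (3, 3), (5, 2)):
    assert count_irreducible(p, m) == nu(p, m)
```

For instance, nu(2, 4) = (2⁴ − 2²)/4 = 3, matching the direct count.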

The conjugates of α ∈ F_{q^d} over Fq are α^{q^i}, i = 0, 1, . . . , d − 1. It is interesting to look at the sum and the product of the conjugates of α. By Corollary 2.18, f_α(X) = (X − α)(X − α^q) · · · (X − α^{q^{d−1}}) ∈ Fq[X] for some d ∈ ℕ. Expanding this product shows that the coefficient of X^{d−1} is the negative of the sum α + α^q + · · · + α^{q^{d−1}}, and the constant term is (−1)^d times the product α · α^q · · · α^{q^{d−1}}. Since f_α(X) ∈ Fq[X], the elements α + α^q + · · · + α^{q^{d−1}} and α · α^q · · · α^{q^{d−1}} belong to Fq. Since α^{q^d} = α, for any (positive) integral multiple δ of d, the sum α + α^q + · · · + α^{q^{δ−1}} and the product α · α^q · · · α^{q^{δ−1}} are elements of Fq too.

Definition 2.59.

Let Fq ⊆ F_{q^m}, q = p^n, be a finite extension of finite fields and let α ∈ F_{q^m}. The trace of α over Fq is defined as the sum

Tr_{F_{q^m}|Fq}(α) := α + α^q + α^{q²} + · · · + α^{q^{m−1}},

and the norm of α over Fq is defined as

N_{F_{q^m}|Fq}(α) := α · α^q · α^{q²} · · · α^{q^{m−1}} = α^{(q^m − 1)/(q − 1)}.

In view of the preceding discussion, the trace and the norm of α are elements of Fq. For q = p, the trace and the norm of α are also called the absolute trace and the absolute norm of α and are often denoted simply as Tr(α) and N(α). We often drop the suffixes in these notations, when no ambiguities are likely.

The trace and norm functions play an important role in the theory of finite fields. See Exercise 2.86 for some elementary properties of these functions.
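For instance, over F_2 the absolute trace of a ∈ F_8 is Tr(a) = a + a² + a⁴. A sketch (with F_8 realized as F_2[x]/(x³ + x + 1), our choice of defining polynomial) confirms the basic properties: the trace lands in F_2, is F_2-linear, and takes each value equally often:

```python
# Absolute trace Tr(a) = a + a^2 + a^4 on F_8 = F_2[x]/(x^3 + x + 1).
# Field elements are 3-bit integers; bit i is the coefficient of x^i.

def gf8_mul(a, b):
    r = 0
    for _ in range(3):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0b1000:
            a ^= 0b1011                # reduce: x^3 = x + 1
    return r

def trace(a):
    a2 = gf8_mul(a, a)
    a4 = gf8_mul(a2, a2)
    return a ^ a2 ^ a4                 # addition in characteristic 2 is XOR

traces = [trace(a) for a in range(8)]
assert all(t in (0, 1) for t in traces)          # Tr(a) lies in F_2
assert traces.count(0) == 4 == traces.count(1)   # Tr is onto, with kernel of size 4
assert all(trace(a ^ b) == trace(a) ^ trace(b)   # Tr is F_2-linear
           for a in range(8) for b in range(8))
```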

2.9.3. Representation of Finite Fields

F_{q^m} is a vector space of dimension m over Fq. Let β0, . . . , β_{m−1} be an Fq-basis of F_{q^m}. Each element a ∈ F_{q^m} has a unique representation a = a0β0 + · · · + a_{m−1}β_{m−1} with each ai ∈ Fq. Therefore, if we have a representation of the elements of Fq, we can also represent the elements of F_{q^m}. Thus elements of any finite field can be represented, if we have representations of elements of prime fields. But the set {0, 1, . . . , p − 1} under the modulo p arithmetic represents Fp.

So our problem reduces to selecting suitable bases β0, . . . , β_{m−1} of F_{q^m} over Fq. In order to illustrate how we can do that, let us choose a priori a fixed monic irreducible polynomial f ∈ Fq[X] with deg f = m. We then represent F_{q^m} = Fq[X]/⟨f(X)⟩ = Fq(α), where α (the residue class of X) is a root of f in F_{q^m}. The elements 1, α, . . . , α^{m−1} are linearly independent over Fq, since otherwise we would have a non-zero polynomial of degree less than m, of which α is a root. The Fq-basis 1, α, . . . , α^{m−1} of F_{q^m} is called a polynomial basis (with respect to the defining polynomial f). The elements of F_{q^m} are then polynomials in α of degrees < m. The arithmetic in F_{q^m} is carried out as the polynomial arithmetic modulo the irreducible polynomial f.

Example 2.19.
  1. The elements of F2 are 0 and 1 with 0 + 0 = 0, 0 + 1 = 1, 1 + 0 = 1, 1 + 1 = 0, 0 · 0 = 1 · 0 = 0 · 1 = 0 and 1 · 1 = 1. In order to represent F8, we choose the irreducible polynomial f(X) := X³ + X² + 1 ∈ F2[X]. Elements of F8 = F2(α) are a2α² + a1α + a0, where a0, a1, a2 ∈ F2. In order to demonstrate the arithmetic in F8, we take a := α² + α and b := α² + 1. Their sum in F8 is a + b = α + 1. On the other hand, ab = α⁴ + α³ + α² + α = α(α³ + α² + 1) + α² = α · 0 + α² = α². The complete multiplication table for this representation is given in Table 2.2.

    Table 2.2. Multiplication table for F8

    ·          | 0 | 1          | α          | α + 1      | α²         | α² + 1     | α² + α     | α² + α + 1
    0          | 0 | 0          | 0          | 0          | 0          | 0          | 0          | 0
    1          | 0 | 1          | α          | α + 1      | α²         | α² + 1     | α² + α     | α² + α + 1
    α          | 0 | α          | α²         | α² + α     | α² + 1     | α² + α + 1 | 1          | α + 1
    α + 1      | 0 | α + 1      | α² + α     | α² + 1     | 1          | α          | α² + α + 1 | α²
    α²         | 0 | α²         | α² + 1     | 1          | α² + α + 1 | α + 1      | α          | α² + α
    α² + 1     | 0 | α² + 1     | α² + α + 1 | α          | α + 1      | α² + α     | α²         | 1
    α² + α     | 0 | α² + α     | 1          | α² + α + 1 | α          | α²         | α + 1      | α² + 1
    α² + α + 1 | 0 | α² + α + 1 | α + 1      | α²         | α² + α     | 1          | α² + 1     | α

  2. F3 is represented by the set {0, 1, 2} with arithmetic operations modulo 3. Since −1 is a quadratic non-residue modulo 3, the polynomial X² + 1 is irreducible over F3. Therefore, the quotient field F3[X]/⟨X² + 1⟩ = F3(β) can be used to represent F9, β being a root of this polynomial (so that β² = −1 = 2). The multiplication table of F9 under this representation is then as shown in Table 2.3.

    Table 2.3. Multiplication table for F9

    ·      | 0 | 1      | 2      | β      | β + 1  | β + 2  | 2β     | 2β + 1 | 2β + 2
    0      | 0 | 0      | 0      | 0      | 0      | 0      | 0      | 0      | 0
    1      | 0 | 1      | 2      | β      | β + 1  | β + 2  | 2β     | 2β + 1 | 2β + 2
    2      | 0 | 2      | 1      | 2β     | 2β + 2 | 2β + 1 | β      | β + 2  | β + 1
    β      | 0 | β      | 2β     | 2      | β + 2  | 2β + 2 | 1      | β + 1  | 2β + 1
    β + 1  | 0 | β + 1  | 2β + 2 | β + 2  | 2β     | 1      | 2β + 1 | 2      | β
    β + 2  | 0 | β + 2  | 2β + 1 | 2β + 2 | 1      | β      | β + 1  | 2β     | 2
    2β     | 0 | 2β     | β      | 1      | 2β + 1 | β + 1  | 2      | 2β + 2 | β + 2
    2β + 1 | 0 | 2β + 1 | β + 2  | β + 1  | 2      | 2β     | 2β + 2 | β      | 1
    2β + 2 | 0 | 2β + 2 | β + 1  | 2β + 1 | β      | 2      | β + 2  | 1      | 2β
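The entries of Tables 2.2 and 2.3 can be reproduced mechanically. A sketch using the same representations, F_8 = F_2[x]/(x³ + x² + 1) and F_9 = F_3(β) with β² = 2 (the bit and pair encodings are our choices):

```python
def gf8_mul(a, b):
    """Multiply in F_8 = F_2[x]/(x^3 + x^2 + 1); 3-bit ints, bit i = coeff of alpha^i."""
    r = 0
    for _ in range(3):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0b1000:
            a ^= 0b1101                # reduce: alpha^3 = alpha^2 + 1
    return r

def f9_mul(x, y):
    """Multiply pairs (a, b) standing for a + b*beta in F_9 = F_3(beta), beta^2 = 2."""
    a, b = x
    c, d = y
    return ((a * c + 2 * b * d) % 3, (a * d + b * c) % 3)

# the worked product from Example 2.19: (alpha^2 + alpha)(alpha^2 + 1) = alpha^2
assert gf8_mul(0b110, 0b101) == 0b100

# spot checks against Table 2.3
assert f9_mul((0, 1), (0, 1)) == (2, 0)   # beta * beta = 2
assert f9_mul((1, 1), (1, 1)) == (0, 2)   # (beta + 1)^2 = 2*beta
assert f9_mul((1, 1), (2, 1)) == (1, 0)   # (beta + 1)(beta + 2) = 1
```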

Polynomial bases are most common in finite field implementations. Some other types of bases also deserve specific mention in this context.

Definition 2.60.

An element α ∈ F_{q^m} is called a normal element over Fq, if the conjugates α, α^q, . . . , α^{q^{m−1}} are (distinct and) linearly independent over Fq. For a normal element α of F_{q^m} over Fq, the Fq-basis α, α^q, . . . , α^{q^{m−1}} is called a normal basis (of F_{q^m} over Fq). If, in addition, α is a primitive element (that is, a generator) of F_{q^m}*, then α and the corresponding normal basis are called a primitive normal element and a primitive normal basis respectively.

It can be shown that normal bases exist for all finite extensions Fq ⊆ F_{q^m}. It can even be shown that primitive normal bases exist for all such extensions.

Example 2.20.

Consider the representation of F8 in Example 2.19. The elements α, α² and α⁴ = α² + α + 1 satisfy

α  = 0 · 1 + 1 · α + 0 · α²,
α² = 0 · 1 + 0 · α + 1 · α²,
α⁴ = 1 · 1 + 1 · α + 1 · α²,

with the 3×3 transformation matrix

( 0 1 0 )
( 0 0 1 )
( 1 1 1 )

having determinant 1 modulo 2. Thus α is a normal element of F8 and (α, α², α⁴) is a normal basis of F8 over F2. Since #F8* = 7 is prime, α is a generator of F8*, that is, α is also a primitive normal element of F8.

On the other hand, α + 1 is not a normal element of F8 over F2. Table 2.2 gives

(α + 1)  = 1 · 1 + 1 · α + 0 · α²,
(α + 1)² = 1 · 1 + 0 · α + 1 · α²,
(α + 1)⁴ = 0 · 1 + 1 · α + 1 · α²,

with the transformation matrix

( 1 1 0 )
( 1 0 1 )
( 0 1 1 )

having determinant zero modulo 2.

Computations over finite fields often call for exponentiations of elements a = a0β0 + · · · + a_{m−1}β_{m−1}. If the βi = α^{q^i}, i = 0, . . . , m − 1, constitute a normal basis, then a^q = a_{m−1}β0 + a0β1 + · · · + a_{m−2}β_{m−1}, since α^{q^m} = α and a_i^q = a_i for each i. Thus the coefficients of a^q (in the representation under the given normal basis) are obtained simply by cyclically shifting the coefficients a0, . . . , a_{m−1} in the representation of a. This leads to a considerable saving of time. In particular, this trick becomes most meaningful for q = 2 (a case of high importance in cryptography).
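This cyclic-shift behaviour is easy to confirm for the normal basis (α, α², α⁴) of F_8 from Example 2.20, with the field again realized as F_2[x]/(x³ + x² + 1) as in Example 2.19 (the encodings below are ours): squaring any element permutes its normal-basis coordinates cyclically.

```python
from itertools import product

def gf8_mul(a, b):
    """Multiply in F_8 = F_2[x]/(x^3 + x^2 + 1); 3-bit ints, bit i = coeff of alpha^i."""
    r = 0
    for _ in range(3):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0b1000:
            a ^= 0b1101                # reduce: alpha^3 = alpha^2 + 1
    return r

basis = [0b010, 0b100, 0b111]          # alpha, alpha^2, alpha^4 = alpha^2 + alpha + 1

elem = {}                              # normal-basis coordinates -> field element
for coords in product((0, 1), repeat=3):
    x = 0
    for c, b in zip(coords, basis):
        if c:
            x ^= b
    elem[coords] = x
coords_of = {v: k for k, v in elem.items()}
assert len(coords_of) == 8             # the three conjugates are linearly independent

for (c0, c1, c2), x in elem.items():
    square = gf8_mul(x, x)
    assert coords_of[square] == (c2, c0, c1)   # squaring = cyclic shift
```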

Now that exponentiations become cheaper with normal bases, one should not let the common operations (addition and multiplication) turn significantly slower. The sum of a = a0β0 + · · · + a_{m−1}β_{m−1} and b = b0β0 + · · · + b_{m−1}β_{m−1} continues to remain as easy as in the case of a polynomial basis, namely, a + b = (a0 + b0)β0 + · · · + (a_{m−1} + b_{m−1})β_{m−1}, where each ai + bi is calculated in Fq. However, computing the product ab introduces difficulty. In particular, it requires the representations of the products βiβj, 0 ≤ i, j ≤ m − 1, in the basis β0, . . . , β_{m−1}. For i ≤ j, we have βiβj = (β0β_{j−i})^{q^i}. It is thus sufficient to look only at the coefficients t_{jk} defined by β0βj = Σ_k t_{jk}βk, 0 ≤ j, k ≤ m − 1. We denote by Cα the number of non-zero t_{jk}. From practical considerations (for example, for hardware implementations), Cα should be as small as possible. For q = 2, one can show that 2m − 1 ≤ Cα ≤ m². If, for this special case, Cα = 2m − 1, the normal basis α, α^q, . . . , α^{q^{m−1}} is called an optimal normal basis. Unlike normal (or primitive normal) bases, optimal normal bases do not exist for all values of m.

We finally mention another representation of the elements of a finite field Fq, one that does not depend on the vector space representation discussed so far, but is based on the fact that the group Fq* is cyclic. If we are given a primitive element (that is, a generator) γ of Fq*, then the elements of Fq are 0, 1 = γ⁰, γ, γ², . . . , γ^{q−2}. Multiplication and exponentiation become easy with this representation, since 0 · a = 0 for all a ∈ Fq, whereas γ^i · γ^j = γ^k with k ≡ i + j (mod q − 1). Unfortunately, this representation provides no clue on how to compute γ^i + γ^j. One possibility is to store a table consisting of the values z_k satisfying 1 + γ^k = γ^{z_k} for all k = 0, . . . , q − 2 (with γ^k ≠ −1), so that for i ≤ j one can compute γ^i + γ^j = γ^i(1 + γ^{j−i}) = γ^i γ^{z_{j−i}} = γ^l, where l ≡ i + z_{j−i} (mod q − 1). Such a table is called Zech’s logarithm table; it can be maintained for small values of q and may facilitate computations in extensions of Fq. But if q is large (or, more correctly, if p is large, where q = p^n), this representation of the elements of Fq is neither practical nor often even feasible. Another difficulty with this representation is that it calls for a primitive element γ. If q is large and the integer factorization of q − 1 is not provided, there are no efficient methods known for finding such an element or even for checking if a given element is primitive.

Example 2.21.

Consider the representation of F9 in Example 2.19. By Table 2.3, γ := β + 1 is a generator of F9*. Table 2.4 lists the powers of γ and the Zech logarithms z_k.

Table 2.4. Zech’s logarithm table for F9 with respect to γ = β + 1

k | γ^k    | 1 + γ^k | z_k
0 | 1      | 2       | 4
1 | β + 1  | β + 2   | 7
2 | 2β     | 2β + 1  | 3
3 | 2β + 1 | 2β + 2  | 5
4 | 2      | 0       |
5 | 2β + 2 | 2β      | 2
6 | β      | β + 1   | 1
7 | β + 2  | β       | 6
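Table 2.4 can be regenerated in a few lines, with F_9 = F_3(β), β² = 2, as before (the pair encoding and names are ours):

```python
def f9_mul(x, y):
    """Multiply pairs (a, b) standing for a + b*beta in F_9 = F_3(beta), beta^2 = 2."""
    a, b = x
    c, d = y
    return ((a * c + 2 * b * d) % 3, (a * d + b * c) % 3)

g = (1, 1)                             # gamma = beta + 1
powers = [(1, 0)]                      # gamma^0 = 1
for _ in range(7):
    powers.append(f9_mul(powers[-1], g))
log = {v: k for k, v in enumerate(powers)}
assert len(log) == 8                   # gamma really generates F_9*

zech = {}
for k, gk in enumerate(powers):
    s = ((1 + gk[0]) % 3, gk[1])       # 1 + gamma^k
    if s != (0, 0):                    # skip gamma^k = -1
        zech[k] = log[s]
assert zech == {0: 4, 1: 7, 2: 3, 3: 5, 5: 2, 6: 1, 7: 6}

# addition via Zech logarithms: gamma^1 + gamma^3 = gamma^(1 + z_2) = gamma^4
i, j = 1, 3
total = ((powers[i][0] + powers[j][0]) % 3, (powers[i][1] + powers[j][1]) % 3)
assert log[total] == (i + zech[j - i]) % 8
```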

Exercise Set 2.9

2.80 Let F be a field (not necessarily finite) of characteristic p > 0 and let a, b ∈ F. Prove that (a + b)^p = a^p + b^p, or, more generally, (a + b)^{p^n} = a^{p^n} + b^{p^n} for all n ∈ ℕ. [H]
2.81 Let p be a prime, n ∈ ℕ and q := p^n. Prove that:
  1. If f ∈ Fp[X], then f(X^p) = f(X)^p.

  2. If f ∈ Fq[X], then f(X^p) = g(X)^p for some g ∈ Fq[X].

2.82 Let p be a prime, n, m ∈ ℕ and q := p^n. Let F ⊆ K be an extension of finite fields with #F = q and #K = q^m. Show that K is the splitting field of X^{q^m} − X over F. [H]
2.83 Write the addition and multiplication tables of (some representations of) the fields and . Use these tables to find a primitive element in each of these fields and a normal element in (over ).
2.84 Let K be a field (not necessarily finite or of positive characteristic).
  1. Let f ∈ K[X] be of degree 2 or 3. Prove that f is reducible in K[X] if and only if f has a root in K. Deduce that X² + X + 1 and X³ + X + 1 are irreducible in F2[X].

  2. Let f ∈ K[X] be of degree d ≥ 0 with f(0) ≠ 0. The opposite of f is the polynomial f^op(X) := X^d f(1/X). Show that f(X) is irreducible in K[X] if and only if f^op(X) is irreducible in K[X]. Deduce that X³ + X² + 1 is irreducible in F2[X].

2.85 In this exercise, one studies the arithmetic in the finite field F125.
  1. Show that is irreducible.

  2. Let us represent F125 as F5[X]/⟨f(X)⟩. Call α the residue class of X and consider the elements a := 3α² + 2α + 1 and b := 2α² + 3 in F125. Compute ab⁻¹ in this representation of F125. You should compute the canonical representative of ab⁻¹ in F125, that is, a polynomial in α of degree < 3 with coefficients reduced modulo 5.

2.86 Let F ⊆ K ⊆ L be finite extensions of finite fields with [L : K] = s. Let α, β ∈ K and γ ∈ L. Prove the following assertions:
  1. TrK|F(α + β) = TrK|F(α) + TrK|F (β) and NK|F (αβ) = NK|F (α) NK|F (β).

  2. TrL|F (α) = s TrK|F (α) and NL|F (α) = NK|F (α)^s.

  3. Transitivity of trace and norm

    TrL|F (γ) = TrK|F (TrL|K(γ)) and NL|F (γ) = NK|F (NL|K (γ)).

2.87 Let K ⊆ L be a finite extension of finite fields. In this exercise, we treat both K and L as vector spaces over K. Show that:
  1. TrL|K is a surjective linear transformation LK.

  2. All the linear transformations L → K are given by Tα : L → K, β ↦ TrL|K(αβ), where α ∈ L. (In this notation, TrL|K = T1.) Moreover, for distinct elements α, α′ ∈ L, the linear transformations Tα and Tα′ are distinct.

2.88 Let K and L be as in Exercise 2.87 with #K = q, and let β ∈ L. Show that TrL|K(β) = 0 if and only if β = γ^q − γ for some γ ∈ L.
2.89 Let K and L be as in Exercise 2.87 with [L : K] = m. Two K-bases (β0, . . . , β_{m−1}) and (γ0, . . . , γ_{m−1}) of L are called dual or complementary, if TrL|K(βiγj) = δij.[10] Show that every K-basis of L has a unique dual basis.

[10] The Kronecker delta δ on an index set I (finite or infinite) is defined for i, j ∈ I as: δij = 1 if i = j, and δij = 0 if i ≠ j.

2.90 Prove that every finite extension of finite fields is Galois. [H]
2.91 For the extension Fq ⊆ F_{q^m}, consider the map φ : F_{q^m} → F_{q^m}, α ↦ α^q.
  1. Show that φ is an Fq-automorphism of F_{q^m}. φ is called the Frobenius automorphism of F_{q^m} over Fq.

  2. Show that Gal(F_{q^m}|Fq) is cyclic of order m and with φ as a generator. [H]

2.92 Let f ∈ Fq[X] be irreducible with deg f = d. Consider the extension Fq ⊆ F_{q^m} and let r := gcd(d, m).
  1. Show that f is irreducible in F_{q^m}[X] if and only if r = 1. [H]

  2. More generally, show that f factors in F_{q^m}[X] into a product of r irreducible polynomials each of degree d/r.

2.93 Consider the representation of F8 in Example 2.19. Construct the minimal polynomials over F2 of the elements of F8. [H]
2.94 Show that the number of (ordered) Fq-bases of F_{q^m} is

(q^m − 1)(q^m − q)(q^m − q²) · · · (q^m − q^{m−1}).

*2.10. Affine and Projective Curves

In this section, we introduce some elementary concepts from algebraic geometry, which facilitate the treatment of elliptic and hyperelliptic curves in the next two sections. We concentrate only on plane curves, because these are the only curves we need in this book. Throughout this section, K denotes a field (finite or infinite) and K̄ the algebraic closure of K.

2.10.1. Plane Curves

The set of solutions of a polynomial equation f(X, Y) = 0 is one of the central objects of study in algebraic geometry. For example, we know that in ℝ² the equation X² + Y² − 1 = 0 represents a circle with centre at (0, 0) and with radius 1. When we pass to an arbitrary field, it is often not possible to visualize such plots, but it still makes sense to talk about the set of solutions of such an equation. For example, the solutions of the above circle equation over F3 are the four discrete points (0, 1), (0, 2), (1, 0) and (2, 0). (This solution set does not really look round.)
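Over a finite field, such a solution set can always be found by exhaustive search; a two-line sketch:

```python
# All F_3-rational points on the circle X^2 + Y^2 - 1 = 0
p = 3
points = [(x, y) for x in range(p) for y in range(p)
          if (x * x + y * y - 1) % p == 0]
assert points == [(0, 1), (0, 2), (1, 0), (2, 0)]
```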

One can generalize this study by considering polynomials in n indeterminates and by investigating the simultaneous solutions of m polynomials. We, however, do not intend to be so general here and concentrate only on curves defined by a single polynomial equation in two indeterminates.

Definition 2.61.

For n ∈ ℕ, the n-dimensional affine space A^n(K) over K is defined to be the set K^n consisting of all n-tuples (x1, . . . , xn) with each xi ∈ K. For n = 2, the affine space A²(K) is also called the affine plane over K. For a point P = (x1, . . . , xn) ∈ A^n(K), the elements x1, . . . , xn are called the affine coordinates of P. The affine space A^n(K̄) over the closure K̄ is often abbreviated as A^n, when the field K is understood from the context.

A^n(K) is an n-dimensional vector space over K. For example, the affine plane A²(ℝ) can be identified with the conventional X-Y plane.

Definition 2.62.

An affine plane (algebraic) curve C over K is defined by a polynomial f ∈ K[X, Y] and is written as C : f(X, Y) = 0. The set C(K) of K-rational points on an affine plane curve C : f(X, Y) = 0 is the set of all points (x, y) ∈ A²(K) satisfying f(x, y) = 0.

K-rational points on a plane curve are precisely the solutions of the defining polynomial equation. Standard examples of affine plane curves include the straight lines given by aX + bY + c = 0, a, b, c ∈ K, a and b not both 0, and the conic sections (circles, ellipses, parabolas and hyperbolas) given by aX² + bXY + cY² + dX + eY + f = 0, a, b, c, d, e, f ∈ K, with at least one of a, b, c non-zero. For K = ℝ, the set of K-rational points can be drawn as a graph of the polynomial equation, whereas for an arbitrary field K (in particular, for finite fields) such drawings make little or no sense. However, it is often helpful to visualize curves as curves over ℝ (also called real curves) and then generalize the situation to an arbitrary field K.

The number ∞ is not treated as a real number (or integer or natural number). But it is often helpful to extend the definition of ℝ by including two points that are infinitely far away from the origin, one in each direction. This gives us the so-called extended real line ℝ ∪ {−∞, +∞}. An immediate advantage of such a completion of ℝ is that every monotonic sequence converges in the extended real line. But for studying the roots of polynomial equations it is helpful to add only a single point ∞ at infinity to ℝ in order to get what is called the projective line ℙ¹(ℝ) over ℝ. Similarly, if we start with the affine plane ℝ² and add a point at infinity for each slope of straight lines Y = aX + b and one more for the vertical lines X = c, we get the so-called projective plane ℙ²(ℝ) over ℝ. We also call the line passing through all the points at infinity in ℙ²(ℝ) the line at infinity. An immediate benefit of passing from ℝ² to ℙ²(ℝ) is that in ℙ²(ℝ) any two distinct lines (parallel or not in ℝ²) meet at exactly one point, and through any two distinct points of ℙ²(ℝ) passes a unique line.

Now it is time to replace by an arbitrary field K and rephrase our definitions in such a way that it continues to make sense to talk about points and line at infinity, even when K itself contains only finitely many points.

Definition 2.63.

Let n ∈ ℕ. Define the relation ~ on the ‘punctured’ (n + 1)-dimensional affine space A^{n+1}(K) \ {(0, . . . , 0)} over K by (x0, . . . , xn) ~ (y0, . . . , yn) if and only if there exists a λ ∈ K \ {0} such that yi = λxi for all i = 0, . . . , n. It is easy to see that ~ is an equivalence relation on the punctured affine space. The set ℙ^n(K) of all equivalence classes of ~ is called the n-dimensional projective space over K. In particular, ℙ²(K) is called the projective plane over K. A point P = [x0, . . . , xn] ∈ ℙ^n(K) is the equivalence class of a point (x0, . . . , xn) ∈ A^{n+1}(K) \ {(0, . . . , 0)}. The elements x0, . . . , xn constitute a set of homogeneous coordinates for P.

It is evident that ℙ^n(K) can be identified with the set of all 1-dimensional vector subspaces (that is, lines through the origin) of the affine space A^{n+1}(K). To argue that this formal definition tallies with the intuitive notion for n = 2 and K = ℝ, consider the affine 3-space A³(ℝ) referred to the coordinates X, Y, Z. Look at the family of planes ε_λ : Z = λ, parallel to the X-Y plane. (ε0 is the X-Y plane itself.) First take a non-zero value of λ, say λ = 1. Every line in A³(ℝ) passing through the origin and not parallel to the X-Y plane meets ε1 at exactly one point. Conversely, a unique such line passes through each point on ε1 and the origin. In this way, we associate points of ℙ²(ℝ) with points on ε1. These are all the finite points of ℙ²(ℝ). On the other hand, the lines passing through the origin and lying in the X-Y plane (ε0 : Z = 0) do not meet ε1 and correspond to the points at infinity of ℙ²(ℝ).

In the last paragraph, we obtained the canonical embedding of the affine plane ℝ² in ℙ²(ℝ) by setting Z = 1. By definition, ℙ²(ℝ) is symmetric in X, Y and Z. This means that we can as well set X = 1 or Y = 1 and see that there are other embeddings of ℝ² in ℙ²(ℝ). This observation often proves to be useful (for example, see Definition 2.66).

Now that we have passed from the affine plane to the projective plane, we should be able to carry (affine) plane curves to the projective plane. For this, we need some definitions.

Definition 2.64.

Let R denote the polynomial ring K[X0, X1, . . . , Xn] over a field K. A monomial of R is an element of R of the form X0^α0 X1^α1 · · · Xn^αn, αi ≥ 0. A term in R is a monomial multiplied by an element c ∈ K*. Any polynomial f ∈ R is a sum of finitely many non-zero terms. The degree of a monomial (or term) X0^α0 X1^α1 · · · Xn^αn is defined as α0 + α1 + · · · + αn. The degree of a non-zero polynomial f ∈ R, denoted deg f, is defined to be the maximum of the degrees of its non-zero terms. The degree of the zero polynomial is taken to be −∞. A non-zero polynomial f ∈ R is said to be homogeneous of degree d ≥ 0, if all of its non-zero terms have degree d. The zero polynomial is said to be homogeneous of any degree.

Let C : f(X, Y) = 0 be an affine plane curve over a field K defined by a non-zero polynomial f(X, Y) ∈ K[X, Y], and let d := deg f. Then f(h)(X, Y, Z) := Z^d f(X/Z, Y/Z) is a homogeneous polynomial of degree d in the polynomial ring K[X, Y, Z]. The polynomial f(h) is called the homogenization of f. Putting Z = 1 in f(h)(X, Y, Z) gives back the original polynomial f(X, Y), that is, f(h)(X, Y, 1) = f(X, Y). Therefore, f is called the dehomogenization of the homogeneous polynomial f(h). The homogenization (and dehomogenization) of the zero polynomial is taken to be the zero polynomial.
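Homogenization is mechanical enough to script. In the sketch below (the dict representation is our own assumption, not the book's notation), a polynomial is stored as a map from exponent tuples to coefficients, and each term of f is multiplied by the power of Z that brings its degree up to d:

```python
def homogenize(f):
    """Return f^(h)(X, Y, Z) for a non-zero f(X, Y).

    f is a dict mapping exponent pairs (i, j) to coefficients; the result
    maps triples (i, j, k) with i + j + k = deg f to the same coefficients.
    """
    d = max(i + j for (i, j) in f)                      # total degree of f
    return {(i, j, d - i - j): c for (i, j), c in f.items()}

def dehomogenize(fh):
    """Set Z = 1: recover f(X, Y) from its homogenization."""
    return {(i, j): c for (i, j, k), c in fh.items()}

# f = Y^2 - X^3 - X - 1 has degree 3; its homogenization is
# Y^2 Z - X^3 - X Z^2 - Z^3
f = {(0, 2): 1, (3, 0): -1, (1, 0): -1, (0, 0): -1}
fh = homogenize(f)
```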

Take P = [x, y, z] ∈ ℙ²(K) and λ ∈ K*. By definition, [x, y, z] = [λx, λy, λz]. Since f(h)(λx, λy, λz) = λ^d f(h)(x, y, z) = 0 if and only if f(h)(x, y, z) = 0, it makes sense to talk about the zeros of the homogeneous polynomial f(h) in the projective plane ℙ²(K). This motivates us to define projective plane curves:

Definition 2.65.

A projective plane curve C over K is defined by a homogeneous polynomial h(X, Y, Z) ∈ K[X, Y, Z] and is written as C : h(X, Y, Z) = 0. The set C(K) of K-rational points on a projective plane curve C : h(X, Y, Z) = 0 is the set of all points [x, y, z] ∈ ℙ²(K) such that h(x, y, z) = 0.

Let C : f(X, Y) = 0 be an affine plane curve. The projective plane curve defined by f(h)(X, Y, Z) is, by an abuse of notation, denoted also by C. The zeros of the affine curve C : f(X, Y) = 0 in K² are in one-to-one correspondence with the finite zeros of C : f(h)(X, Y, Z) = 0 in ℙ²(K) (that is, zeros with Z = 1). The projective curve contains some more point(s), namely those at infinity, that can be obtained by putting Z = 0 in f(h)(X, Y, Z). Passage from the affine plane to the projective plane is just that: a systematic inclusion of the points at infinity.

It is often customary to write an affine plane curve as C : f(X, Y) = g(X, Y) and a projective plane curve as C : f(h)(X, Y, Z) = g(h)(X, Y, Z) with f(h) and g(h) of the same degree. The former is the same as the curve C : f − g = 0, and the latter the same as C : f(h) − g(h) = 0.

A homogeneous polynomial f(X, Y, Z) ∈ K[X, Y, Z] can be viewed as the homogenization of any of the polynomials

fZ(X, Y) = f(X, Y, 1), fY (X, Z) = f(X, 1, Z) and fX(Y, Z) = f(1, Y, Z).

Consider a point P = [a, b, c] on the projective curve C : f(X, Y, Z) = 0. Since a, b and c are not all 0, P corresponds to a finite point on at least one of the affine curves fX = 0, fY = 0 and fZ = 0.

2.10.2. Polynomial and Rational Functions on Plane Curves

Throughout the rest of Section 2.10 we make the following assumption:

Assumption 2.1.

K is an algebraically closed field, that is, K = K̄.

Although many of the results we state now are valid for fields that are not algebraically closed, it is convenient to make this assumption in order to avoid unnecessary complications.

Let C : f(X, Y) = 0 be a curve defined over K. Henceforth we assume that the polynomial f(X, Y) is irreducible over K. Though we write the affine equation for the curve for notational simplicity, we usually work with the set C(K) of the K-rational points on the corresponding projective curve. We refer to the solutions of C in the affine plane as the finite points on the curve.

Definition 2.66.

Let P = [a, b, c] be a point on a curve C defined over K. We call P a smooth or regular or non-singular point of C, if P satisfies the following conditions.

  1. If P is a finite point (that is, if c ≠ 0), then P is called a smooth point on C, if the partial derivatives ∂f/∂X and ∂f/∂Y do not vanish simultaneously at (a/c, b/c).

  2. If P is a point at infinity (that is, if c = 0), then we must have a ≠ 0 or b ≠ 0. Assume a ≠ 0. (The other case can be treated similarly.) Consider the polynomial g(Y, Z) := f(h)(1, Y, Z). Since P = [a, b, 0] = [1, b/a, 0], the point P corresponds to the finite point (b/a, 0) on the curve D : g(Y, Z) = 0. P is called a smooth point on C, if (b/a, 0) is a smooth point on D, that is, if ∂g/∂Y and ∂g/∂Z do not vanish simultaneously at (b/a, 0).

A non-smooth point on C is also called non-regular or singular. C is called smooth or regular or non-singular, if all points (finite and infinite) on C are smooth.
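For finite points over a prime field, condition 1 can be checked directly with formal partial derivatives. A sketch under a dict representation of f as a map from exponent pairs to coefficients (the representation and helper names are our own assumptions):

```python
def partials(f, p):
    """Formal partial derivatives of f (dict (i, j) -> coeff) modulo p."""
    fx = {(i - 1, j): (c * i) % p for (i, j), c in f.items() if (c * i) % p}
    fy = {(i, j - 1): (c * j) % p for (i, j), c in f.items() if (c * j) % p}
    return fx, fy

def evaluate(f, x, y, p):
    return sum(c * pow(x, i, p) * pow(y, j, p) for (i, j), c in f.items()) % p

def is_smooth_at(f, x, y, p):
    """A finite point (x, y) on C : f = 0 is smooth iff df/dX and df/dY
    do not vanish simultaneously there."""
    fx, fy = partials(f, p)
    return evaluate(fx, x, y, p) != 0 or evaluate(fy, x, y, p) != 0

# the cuspidal cubic Y^2 - X^3 is singular at the origin but smooth elsewhere
cusp = {(0, 2): 1, (3, 0): -1}
```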

Now we define polynomial functions on C. For a moment, we concentrate on the affine curve, that is, only on the finite points of C. Let g, h ∈ K[X, Y] with g − h ∈ 〈f〉 (that is, f | (g − h)). Since for any point P on C we have f(P) = 0, it follows that g(P) = h(P). This motivates us to define the following.

Definition 2.67.

The ring K[X, Y]/〈f〉 is called the affine coordinate ring of C and is denoted by K[C]. Elements of K[C] are called polynomial functions on C. If we denote by x and y the residue classes of X and Y respectively in K[C], then a polynomial function on C is given by a polynomial g(x, y) ∈ K[x, y] = K[C].[11] By our assumption, f is an irreducible polynomial; so 〈f〉 is a prime ideal of K[X, Y], that is, the coordinate ring K[C] is an integral domain.

[11] Recall from Section 2.7 that K[x, y] is the K-algebra generated by x and y. It is not a polynomial algebra (in general).

The quotient field (Exercise 2.34) of K[C] is called the function field of C and is denoted by K(C). An element of K(C) is of the form g(x, y)/h(x, y) with g(x, y), h(x, y) ∈ K[C], h(x, y) ≠ 0 (that is, h(X, Y) ∉ 〈f〉), and is called a rational function on C.

By definition, two rational functions g1(x, y)/h1(x, y) and g2(x, y)/h2(x, y) are equal if and only if g1(x, y)h2(x, y) − g2(x, y)h1(x, y) = 0 in K[C] or, equivalently, if and only if g1(X, Y)h2(X, Y) − g2(X, Y)h1(X, Y) ∈ 〈f〉. We define addition and multiplication of rational functions by the usual rules (Exercise 2.34).

Definition 2.68.

Let P = (a, b) be a finite point on the curve C. Given a polynomial function g(x, y) ∈ K[C], the value of g at P is defined to be g(a, b). If r ∈ K(C) is a rational function, then r is said to be defined at P, if r has a representation r = g/h with g, h ∈ K[C] and h(P) ≠ 0. In that case, we define the value of r at P to be r(P) := g(P)/h(P). If r is not defined at P, it is customary to write r(P) = ∞.

By definition, K[C] and K(C) are collections of equivalence classes. However, the value of a polynomial or a rational function on C is independent of the representatives of the equivalence classes and is, therefore, a well-defined concept.
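This independence of the chosen representative can be seen numerically. On C : Y² − X³ − X = 0, the polynomials Y² and X³ + X differ exactly by the defining polynomial, so they induce the same polynomial function on C. The check below runs over 𝔽7 purely as an illustration (the chapter's standing assumption is an algebraically closed field; a finite field only makes the check exhaustive):

```python
p = 7
# all F_7-rational finite points of C : Y^2 - X^3 - X = 0
curve_points = [(x, y) for x in range(p) for y in range(p)
                if (y * y - x**3 - x) % p == 0]

g = lambda x, y: (y * y) % p          # the class of Y^2 in K[C]
h = lambda x, y: (x**3 + x) % p       # the class of X^3 + X in K[C]

# g - h is a multiple of the defining polynomial, so both representatives
# take the same value at every point of the curve
values_agree = all(g(x, y) == h(x, y) for (x, y) in curve_points)
```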

The above definitions can be extended to the corresponding projective curve C : f(h)(X, Y, Z) = 0. By Exercise 2.96(e), the polynomial f(h) is irreducible, since we assumed f to be so.

Definition 2.69.

The function field (denoted again by K(C)) of the projective curve C is the set of quotients (called rational functions) of the form g(X, Y, Z)/h(X, Y, Z), where g, h ∈ K[X, Y, Z] are homogeneous of the same degree and h ∉ 〈f(h)〉. Two rational functions g1/h1 and g2/h2 are equal if and only if g1h2 − g2h1 ∈ 〈f(h)〉.

A rational function r ∈ K(C) is said to be defined at a point P = [a, b, c] on C, if r has a representation g/h with h(a, b, c) ≠ 0. In that case, we define r(P) := g(a, b, c)/h(a, b, c). Since g and h are homogeneous and of the same degree, the value r(P) is independent of the choice of the projective coordinates of P (Exercise 2.95). If r is not defined at P, we write r(P) = ∞.

One can define polynomial functions on a projective curve (as we did for affine curves), but it makes no sense to talk about the value of such a polynomial function at a point P on the curve, because this value depends on the choice of the homogeneous coordinates of P (Exercise 2.95). This problem is eliminated for a rational function g/h by assuming g and h to be of the same degree.

Definition 2.70.

Let C be a projective plane curve, r ∈ K(C) a non-zero rational function, and P a point on C. P is called a zero of r if r(P) = 0, and a pole of r if r(P) = ∞.

Now we define the multiplicities of zeros and poles of a rational function or, more generally, the order of any point on a projective plane curve. This is based on the following result, the proof of which is long and difficult, and is omitted.

Theorem 2.41.

Let C be a projective plane curve defined by an irreducible polynomial over K and P a smooth point on C. Then there exists a rational function uP ∈ K(C) (depending on P) with the following properties:

  1. uP (P) = 0.

  2. For any non-zero rational function r ∈ K(C), there exist an integer d and a rational function s ∈ K(C) having neither a zero nor a pole at P such that r = uP^d s. The integer d does not depend on the choice of uP.

Definition 2.71.

The function uP of the last theorem is called a uniformizing variable or a uniformizing parameter or simply a uniformizer of C at P. For any non-zero rational function r ∈ K(C), the integer d with r = uP^d s (where s has neither a zero nor a pole at P) is called the order of r at P and is denoted by ordP (r).

The connection of poles and zeros with orders is established by the following theorem, which we again state without proof.

Theorem 2.42.

P is neither a pole nor a zero of r if and only if ordP(r) = 0. P is a zero of r if and only if ordP(r) > 0. P is a pole of r if and only if ordP(r) < 0.

If P is a zero (resp. a pole) of r, the integer ordP(r) (resp. – ordP(r)) is called the multiplicity of the zero (resp. pole) P.

Theorem 2.43.

Let r be a non-zero rational function on the projective plane curve C defined over K. Then r has finitely many poles and zeros. Furthermore, Σ_{P∈C(K)} ordP(r) = 0.

This is one of the theorems that demand K to be algebraically closed. More explicitly, if K is not algebraically closed, any rational function continues to have only finitely many zeros and poles, but the sum of the orders of r at these points is not necessarily equal to 0. Also note that this sum, if taken over only the finite points of C, need not be 0, even when K is algebraically closed.

2.10.3. Maps Between Plane Curves

Now that we know how to define and evaluate rational functions on a curve, we are in a position to define rational maps between two curves. Let C1 : f1(X, Y, Z) = 0 and C2 : f2(X, Y, Z) = 0 be two projective plane curves defined over K by irreducible homogeneous polynomials f1, f2 ∈ K[X, Y, Z].

Definition 2.72.

A rational map φ : C1 → C2 (defined over K) is given by rational functions r1, r2, r3 ∈ K(C1) such that for each P ∈ C1(K) at which all of r1, r2 and r3 are defined, the point [r1(P), r2(P), r3(P)] ∈ C2(K). One often uses the notation φ = [r1, r2, r3].

This, however, is not the complete story. A more precise characterization of a rational map is as follows:

A rational map φ = [r1, r2, r3] : C1 → C2 is said to be defined at P ∈ C1(K), if there exists a rational function s ∈ K(C1) (depending on P) such that sr1, sr2 and sr3 are all defined at P, the values (sr1)(P), (sr2)(P) and (sr3)(P) are not all zero, and φ(P) = [(sr1)(P), (sr2)(P), (sr3)(P)]. A rational map which is defined at every point of C1(K) is called a morphism.

The curves C1 and C2 are said to be isomorphic (denoted C1 ≅ C2), if there exist morphisms φ : C1 → C2 and ψ : C2 → C1 such that ψ ∘ φ and φ ∘ ψ are the identity maps on C1(K) and C2(K) respectively.

Isomorphism is an equivalence relation on the set of all projective plane curves defined over K. Since two isomorphic curves share many common algebraic and geometric properties, it is of interest in algebraic geometry to study the equivalence classes (rather than the individual curves). If C1 ≅ C2 and C2 has a simpler representation than C1, then studying the properties of C2 makes our job simpler and at the same time reveals all the common properties of C1. (See Section 2.11 for an example.)

**2.10.4. Divisors on Plane Curves

Let a be a symbol and n a positive integer. We represent by na the formal sum a + · · · + a (n times). We also define 0a := 0 and −na := n(−a), where the symbol −a satisfies a + (−a) = (−a) + a = 0. For n1, n2 ∈ ℤ, we define n1a + n2a := (n1 + n2)a. The set {na | n ∈ ℤ} under these definitions becomes an Abelian group. If we are given two symbols a, b, we can analogously define formal sums na + mb, n, m ∈ ℤ, and the sum of formal sums as (n1a + m1b) + (n2a + m2b) := (n1 + n2)a + (m1 + m2)b. With these definitions the set {na + mb | n, m ∈ ℤ} becomes an Abelian group. These constructions can be generalized as follows:

Definition 2.73.

Given a set (not necessarily finite) of symbols ai, i ∈ I, the set of formal sums of the form Σ_{i∈I} ni ai with ni ∈ ℤ, where ni = 0 except for finitely many i ∈ I, is an Abelian group with the addition formula Σ ni ai + Σ mi ai := Σ (ni + mi)ai. This group is called the free Abelian group generated by ai, i ∈ I.

Now let ai be the K-rational points on a projective plane curve C defined over K. For notational convenience, we represent by [P] the symbol corresponding to the point P on C. This removes confusion in connection with elliptic curves C (see Section 2.11), for which we intend to make a distinction between P + Q and [P] + [Q] for two points P, Q ∈ C(K). The former sum is again a point on C, whereas the latter is never (the symbol corresponding to) a point on C.

Definition 2.74.

A formal sum Σ_{P∈C(K)} nP [P] with nP ∈ ℤ, where nP = 0 except for finitely many P ∈ C(K), is called a divisor on C. The free Abelian group generated by the symbols [P] for all the points P ∈ C(K) is called the group of divisors of C and is denoted by DivK(C) or simply by Div(C), when K is implicit in the context.

Let D = Σ_{P∈C(K)} nP [P] be a divisor. The support of D is defined to be the set {P ∈ C(K) | nP ≠ 0} and is denoted by Supp D.

The degree of D is defined as the integer Σ_{P∈C(K)} nP and is denoted by deg D. The subset {D ∈ Div(C) | deg D = 0} of Div(C) is clearly a subgroup of Div(C). We denote this subgroup by Div0(C).
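Divisors with their coefficient-wise addition are easy to model. A small sketch (the representation is our own: a Counter mapping point labels to the integer coefficients nP):

```python
from collections import Counter

def div_add(D1, D2):
    """Coefficient-wise sum of two divisors, dropping zero coefficients."""
    D = Counter(D1)
    D.update(D2)                      # Counter.update adds the counts
    return Counter({P: n for P, n in D.items() if n})

def deg(D):
    """Degree of a divisor: the sum of its coefficients n_P."""
    return sum(D.values())

D1 = Counter({"P": 2, "Q": -1})       # the divisor 2[P] - [Q]
D2 = Counter({"Q": 1, "R": 3})        # the divisor [Q] + 3[R]
```

Their sum is 2[P] + 3[R]: the [Q] terms cancel, and the degree of the sum is deg D1 + deg D2 = 1 + 4 = 5.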

Now we define divisors of rational functions on C. Henceforth we assume that C is smooth (that is, smooth at all K-rational points on C).

Definition 2.75.

The divisor of a non-zero rational function r ∈ K(C) is defined to be the formal sum Div(r) := Σ_{P∈C(K)} ordP(r) [P], where ordP(r) is the order of r at P (Definition 2.71). By Theorem 2.43, Div(r) ∈ Div0(C).

A divisor D ∈ Div(C) is called principal, if D = Div(r) for some rational function r ∈ K(C). We have Div(rr′) = Div(r) + Div(r′) for any rational functions r, r′ ∈ K(C). It follows that the set of all principal divisors on C is a subgroup of Div(C) (and of Div0(C) as well). We denote this subgroup by PrinK(C) or simply by Prin(C). The quotient group Div(C)/Prin(C) is called the divisor class group or the Picard group of C and is denoted by PicK(C) or in short by Pic(C). On the other hand, the quotient Div0(C)/Prin(C) is denoted by Pic0K(C) or Pic0(C) and is called the Jacobian of C. Instead of Pic0(C) we also use the notation JK(C) or J(C).

Though the Jacobian is defined for an arbitrary smooth curve C (defined by an irreducible polynomial), it is for a special class of curves called hyperelliptic curves that it is particularly easy to represent and do arithmetic in the Jacobian. This gives us yet another family of groups on which cryptographic protocols can be built.

If K is not algebraically closed, we need not have Div(r) ∈ Div0(C) for a rational function r ∈ K(C). This means that in that case the Jacobian of C over K cannot be defined in the above manner. However, since C is also a curve defined over K̄, we can define the Jacobian of C over K̄ as above and call a particular subgroup of it the Jacobian of C over K. We defer this discussion until Section 2.12.

Exercise Set 2.10

In this exercise set, we do not assume (unless otherwise stated) that K is necessarily algebraically closed.

2.95
  1. For homogeneous polynomials f1, f2 ∈ K[X1, . . . , Xn] of respective degrees d1 and d2, prove the following assertions:

    1. If d1 = d2, then f1 ± f2 are homogeneous polynomials of degree d1.

    2. The polynomial f1f2 is homogeneous of degree d1 + d2. Conversely, if f1f2 is homogeneous, then f1 and f2 are also homogeneous.

  2. A polynomial f ∈ K[X1, . . . , Xn] is homogeneous of degree d if and only if it satisfies f(λX1, . . ., λXn) = λ^d f(X1, . . ., Xn) for every λ ∈ K̄.

2.96In this exercise, we generalize the notion of homogenization and dehomogenization of polynomials. Let K[X1, . . . , Xn] denote the polynomial ring in n indeterminates. Introducing another indeterminate X0, we define the homogenization of a non-zero polynomial f ∈ K[X1, . . . , Xn] of degree d as

f(h)(X0, X1, . . . , Xn) := X0^d f(X1/X0, . . . , Xn/X0).

Prove the following assertions.

  1. f(h) is an element of K[X0, X1, . . . , Xn] and is homogeneous of degree d.

  2. f(h)(1, X1, . . . , Xn) = f(X1, . . . , Xn).

  3. If deg f = d ≥ 0 and fd is the sum of all non-zero terms of degree d in f, then we have f(h)(0, X1, . . . , Xn) = fd(X1, . . . , Xn).

  4. For f, g ∈ K[X1, . . . , Xn], (fg)(h) = f(h)g(h). Moreover, if g|f, then g(h)|f(h) and (f/g)(h) = f(h)/g(h). Under what condition(s) is (f + g)(h) = f(h) + g(h)?

  5. f is irreducible if and only if f(h) is irreducible.

2.97Let C : f(X, Y) = 0 be an affine plane curve defined by a non-zero polynomial f(X, Y) ∈ K[X, Y] and C : f(h)(X, Y, Z) = 0 the corresponding projective plane curve. Let d := deg f = deg f(h) and fd the sum of the non-zero terms of f of degree d. Show that:
  1. f(h)(X, Y, 1) = f(X, Y) and f(h)(X, Y, 0) = fd(X, Y).

  2. (x, y) ∈ K² is a K-rational point of the affine curve if and only if [x, y, 1] is a K-rational point of the projective curve. More generally, let λ ∈ K*. The point (x/λ, y/λ) is a K-rational solution of f if and only if [x, y, λ] is a K-rational solution of f(h).

  3. The solutions of f at infinity are obtained by solving f(h)(X, Y, 0) = fd(X, Y) = 0. Conclude that the curve C can have at most d points at infinity.

  4. For a, b ∈ K, each of the curves Y − aX = b and X − aY = b (straight lines), and Y − X² = 0 and X − Y² = 0 (parabolas), contains only one point at infinity. The hyperbola XY − 1 = 0 contains two points at infinity. How many points at infinity does the hyperbola X² − Y² − 1 = 0 contain? The circle X² + Y² − 1 = 0?

  5. For a1, a2, a3, a4, a6 ∈ K, the elliptic curve Y² + a1XY + a3Y = X³ + a2X² + a4X + a6 contains only one point at infinity.

  6. Let g ∈ ℕ and u(X), v(X) ∈ K[X] with deg u ≤ g, deg v = 2g + 1 and v monic. Show that the hyperelliptic curve Y² + u(X)Y = v(X) has only one point at infinity.

2.98Show that the defining polynomial of the elliptic curve in Exercise 2.97(e) is irreducible. Prove the same for the hyperelliptic curve of Exercise 2.97(f). [H]
2.99Show that for an ideal 𝔞 of K[X1, . . . , Xn] the following two conditions are equivalent:
  1. 𝔞 is generated by a set of homogeneous polynomials.

  2. If f = f0 + f1 + · · · + fd ∈ 𝔞, where fi is the sum of the non-zero terms of degree i in f, then fi ∈ 𝔞 for all i = 0, . . . , d. (The polynomials fi are called the homogeneous components of f.)

An ideal satisfying the above equivalent conditions is called a homogeneous ideal. Construct an example to demonstrate that not all ideals of K[X1, . . . , Xn] are homogeneous.

*2.11. Elliptic Curves

The mathematics of elliptic curves is vast and complicated. A reasonably complete understanding of elliptic curves would require a book of a size comparable to this one. So we plan to be rather informal while talking about elliptic curves and about their generalizations called hyperelliptic curves. Interested readers can go through the books suggested at the end of this chapter to learn more about these curves. In this section, K stands for a field (finite or infinite) and K̄ for the algebraic closure of K.

2.11.1. The Weierstrass Equation

An elliptic curve E over K is a plane curve defined by the polynomial equation

Equation 2.6

E : Y² + a1XY + a3Y = X³ + a2X² + a4X + a6  (a1, a2, a3, a4, a6 ∈ K)
or by the corresponding homogeneous equation

E : Y²Z + a1XYZ + a3YZ² = X³ + a2X²Z + a4XZ² + a6Z³.

These equations are called the Weierstrass equations for E. In order that E qualifies as an elliptic curve, we additionally require that it is smooth at all K̄-rational points (Definition 2.66).[12] Two elliptic curves defined over the field ℝ are shown in Figure 2.1.

[12] Ellipses are not elliptic curves.

Figure 2.1. Elliptic curves over ℝ

(a) Y² = X³ − X + 1
(b) Y² = X³ − X


E contains a single point at infinity, namely O = [0, 1, 0] (Exercise 2.97(e)). The set E(K) of K-rational points on E in the projective plane ℙ²(K) is the central object of study in the theory of elliptic curves. We shortly endow E(K) with a group structure, and this group is used extensively in cryptography.

Let us first see how we can simplify the equation for E. The simplification depends on the characteristic of K. Because fields of characteristic 3 are only rarely used in cryptography, we will not deal with such fields. Simplification of the Weierstrass equation is effected by suitable changes of coordinates. A special kind of transformation is allowed in order to preserve the geometric and algebraic properties of an elliptic curve.

Theorem 2.44.

Two elliptic curves

E1 : Y² + a1XY + a3Y = X³ + a2X² + a4X + a6
E2 : Y² + b1XY + b3Y = X³ + b2X² + b4X + b6

defined over K are isomorphic (Definition 2.72) if and only if there exist u ∈ K* and r, s, t ∈ K such that the substitution of u²X + r for X and u³Y + u²sX + t for Y transforms the equation of E1 to the equation of E2. For this transformation, the coefficients bi are related to the coefficients ai as follows:

Equation 2.7

ub1 = a1 + 2s
u²b2 = a2 − sa1 + 3r − s²
u³b3 = a3 + ra1 + 2t
u⁴b4 = a4 − sa3 + 2ra2 − (t + rs)a1 + 3r² − 2st
u⁶b6 = a6 + ra4 + r²a2 + r³ − ta3 − t² − rta1
The theorem is not proved here. Formulas (2.7) can be checked by tedious calculations. A change of variables as in Theorem 2.44 is referred to as an admissible change of variables. We denote this by

(X, Y) ← (u2X + r, u3Y + u2sX + t).

The inverse transformation is also admissible and is given by

(X, Y) ← (u⁻²(X − r), u⁻³(Y − s(X − r) − t)).

Isomorphism is an equivalence relation on the set of all elliptic curves over K.

Consider the elliptic curve E over K given by Equation (2.6). If char K ≠ 2, the admissible change (X, Y) ← (X, Y − (a1X + a3)/2) transforms E to the form

E1 : Y² = X³ + b2X² + b4X + b6.

If, in addition, char K ≠ 3, the admissible change (X, Y) ← (X − b2/3, Y) transforms E1 to E2 : Y² = X³ + aX + b. We henceforth assume that an elliptic curve over a field of characteristic ≠ 2, 3 is defined by

Equation 2.8

E : Y² = X³ + aX + b  (a, b ∈ K)
(instead of by the original Weierstrass Equation (2.6)).

If char K = 2, the Weierstrass equation cannot be simplified as in Equation (2.8). In this case, we consider two cases separately, namely a1 ≠ 0 or otherwise. In the former case, the admissible change (X, Y) ← (a1²X + a3/a1, a1³Y + (a1²a4 + a3²)/a1³) allows us to write Equation (2.6) in the simplified form

Equation 2.9

E : Y² + XY = X³ + aX² + b  (a, b ∈ K)
On the other hand, if a1 = 0, then the admissible change (X, Y) ← (X + a2, Y) shows that E can be written in the form

Equation 2.10

E : Y² + aY = X³ + bX + c  (a, b, c ∈ K)
A curve defined by Equation (2.9) is called non-supersingular, whereas one defined by Equation (2.10) is called supersingular.

Now we associate two quantities with an elliptic curve. The importance of these quantities follows from the subsequent theorem. We start with the generic Weierstrass equation and later specialize to the simplified formulas.

Definition 2.76.

For the curve E given by Equation (2.6), we define the following quantities:

Equation 2.11

d2 = a1² + 4a2
d4 = 2a4 + a1a3
d6 = a3² + 4a6
d8 = a1²a6 + 4a2a6 − a1a3a4 + a2a3² − a4²
Δ(E) = −d2²d8 − 8d4³ − 27d6² + 9d2d4d6
j(E) = (d2² − 24d4)³/Δ(E)  (defined when Δ(E) ≠ 0)
Δ(E) is called the discriminant of the curve E, and j(E) the j-invariant of E.

For the special cases given by the simplified equations above, these quantities have more compact formulas as given in Table 2.5.

Theorem 2.45.

For the curve E defined by Equation (2.6), the following properties hold:

  1. An admissible change of variables does not alter Δ(E) and j(E).

    Table 2.5. Discriminant and j-invariant for elliptic curves

    Special case                                  Δ(E)              j(E)
    char K ≠ 2, 3 (Equation 2.8)                  −16(4a³ + 27b²)   1728(4a)³/Δ(E)
    char K = 2, non-supersingular (Equation 2.9)  b                 1/b
    char K = 2, supersingular (Equation 2.10)     a⁴                0

  2. E is an elliptic curve, that is, E is smooth, if and only if Δ(E) ≠ 0. In particular, the j-invariant is defined for all elliptic curves.

  3. Let E1 and E2 be two elliptic curves defined over the field K. If E1 and E2 are isomorphic over K, then j(E1) = j(E2). Conversely, if j(E1) = j(E2), then E1 and E2 are isomorphic over K̄.

Proof

  1. Tedious calculations using Formulas (2.7) establish this claim.

  2. The polynomial f(X, Y, Z) = Y²Z + a1XYZ + a3YZ² − X³ − a2X²Z − a4XZ² − a6Z³ defines the curve E. Since (∂f/∂Z)(0, 1, 0) = 1 ≠ 0, E is smooth at the point at infinity O = [0, 1, 0]. Suppose that E is not smooth at the finite point (x0, y0). The admissible change (X, Y) ← (X + x0, Y + y0) does not alter the value of Δ(E) by (1). So we can assume, without loss of generality, that (x0, y0) = (0, 0). But then we have f(0, 0) = −a6 = 0, ∂f/∂X(0, 0) = −a4 = 0 and ∂f/∂Y(0, 0) = a3 = 0. Now it is easy to check from Equation (2.11) that Δ(E) = 0.

    Conversely, let Δ(E) = 0. For simplicity, we assume that char K ≠ 2, 3 and that E is given by Equation (2.8). By Exercise 2.62, 4a³ + 27b² = 0, that is, the polynomial X³ + aX + b has a multiple root, say x0 ∈ K̄. But then E is not smooth at (x0, 0).

  3. By Part (1) and Theorem 2.44, two isomorphic elliptic curves have the same j-invariant. For proving the converse, we once again assume that char K ≠ 2, 3 and that E1 : Y² = X³ + a1X + b1 and E2 : Y² = X³ + a2X + b2 have the same j-invariant. Then we have a1³b2² = a2³b1². Now we provide an admissible change of variables of the form (X, Y) ← (u²X, u³Y), u ∈ K̄*, that transforms E1 to E2. Since Δ(E1) ≠ 0 and Δ(E2) ≠ 0, we take u = (b1/b2)^(1/6) if a1 = 0, u = (a1/a2)^(1/4) if b1 = 0, and u = (a1/a2)^(1/4) = (b1/b2)^(1/6) if a1b1 ≠ 0. Note that since K̄ is algebraically closed, u is defined in all the above cases.
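For char K ≠ 2, 3 the Table 2.5 formulas are easy to evaluate over a prime field. A sketch (the helper name is ours, and the computation follows the book's formulas, not the usual normalization found elsewhere):

```python
def disc_and_j(a, b, p):
    """Delta(E) and j(E) for E : Y^2 = X^3 + aX + b over F_p, p > 3,
    using the Table 2.5 formulas."""
    delta = (-16 * (4 * a**3 + 27 * b**2)) % p
    if delta == 0:
        return delta, None               # curve is singular; j is undefined
    j = 1728 * (4 * a) ** 3 * pow(delta, -1, p) % p
    return delta, j
```

For the curve Y² = X³ + X + 3 over 𝔽7 used later in Example 2.22, this gives Δ ≡ 3 and j ≡ 2 (mod 7).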

2.11.2. The Elliptic Curve Group

Consider an elliptic curve E over a field K. We now define an operation (which is conventionally denoted by +) on the set E(K) of K-rational points on E in the projective plane . This operation provides a group structure on E(K). It is important to point out that this group is not the same as the group DivK(E) of divisors on E(K) (Definition 2.74), since the sum of points we are going to define is not formal. However, there is a connection between these two groups (See Exercise 2.125).

Definition 2.77.

Let E be the elliptic curve defined by Equation (2.6) and O the point at infinity on E. A binary operation + on E(K) is defined as follows:

  1. For any P ∈ E(K), we define P + O := O + P := P, that is, O serves as the additive identity.

  2. The opposite (additive inverse) of a point P ∈ E(K) is now defined: if P = O, then −P := P, and if P = (h, k) is a finite point, then −P := (h, −k − a1h − a3).

  3. For P, Q ∈ E(K), the sum P + Q is defined by the chord-and-tangent rule, which goes as follows.

    1. If Q = −P, then P + Q := O.

    2. If Q ≠ −P, we consider the line passing through P and Q (we take the tangent line if P = Q). Since the degree of the defining equation for E is three, this line meets the curve at exactly one other point R. We define P + Q := −R. Figure 2.1 illustrates this case for curves over ℝ.

Theorem 2.46.

The set E(K) under the operation + is an Abelian group.

No simple proof of this theorem is known. Indeed the only group axiom that is difficult to check is associativity, that is, to check that (P + Q) + R = P + (Q + R) for all P, Q, R ∈ E(K). An elementary strategy would be to write explicit formulas for (P + Q) + R and P + (Q + R) (using the formulas for P + Q given below) and show that they are equal, but this process involves a lot of tedious calculations and consideration of many cases.

There are other proofs that are more elegant, but not as elementary. One possibility is to use the theory of divisors and is outlined now. It turns out that the Jacobian of E has a bijective correspondence with the set E(K) via the map which takes P ∈ E(K) to [P] − [O] (more correctly, to the equivalence class of the divisor [P] − [O] in the Jacobian). Furthermore, this map takes P + Q to ([P] − [O]) + ([Q] − [O]), where the addition on the left is the addition on E(K) as defined above and the addition on the right is that in the Jacobian. By definition, the Jacobian is naturally an additive Abelian group. It immediately follows that E(K) is an additive Abelian group too. (See Exercise 2.125.)

We now give the formulas for the coordinates of the points −P and P + Q on E(K). The derivation of these formulas for the general case is left to the reader (Exercise 2.102). We concentrate on the important special cases. We assume that P = (h1, k1) and Q = (h2, k2) are finite points on E(K) with Q ≠ −P, so that P + Q = (h3, k3) is again a finite point.

If char K ≠ 2, 3 and E is defined by Equation (2.8), we have −P = (h1, −k1) and

    λ = (k2 − k1)/(h2 − h1) if P ≠ Q,  λ = (3h1² + a)/(2k1) if P = Q,
    h3 = λ² − h1 − h2,
    k3 = λ(h1 − h3) − k1.
Next, we consider char K = 2 and non-supersingular curves (Equation 2.9). Here −P = (h1, h1 + k1), and the formulas in this case are:

    for P ≠ Q:  λ = (k1 + k2)/(h1 + h2),  h3 = λ² + λ + h1 + h2 + a,  k3 = λ(h1 + h3) + h3 + k1;
    for P = Q:  λ = h1 + k1/h1,  h3 = λ² + λ + a,  k3 = h1² + (λ + 1)h3.
Finally, for supersingular curves (Equation 2.10) with char K = 2, we have −P = (h1, k1 + a) and

    λ = (k1 + k2)/(h1 + h2) if P ≠ Q,  λ = (h1² + b)/a if P = Q,
    h3 = λ² + h1 + h2,
    k3 = λ(h1 + h3) + k1 + a.
We denote by mP the sum P + · · · + P (m times) for a point P ∈ E(K) and m ∈ ℕ. We also define 0P := O and (−m)P := −(mP) (for m ∈ ℕ).
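For char K ≠ 2, 3 the rules above are short enough to implement directly. A sketch over a prime field 𝔽p (helper names are ours; the point at infinity O is represented by None):

```python
def ec_add(P, Q, a, p):
    """Chord-and-tangent addition on E : Y^2 = X^3 + aX + b over F_p."""
    if P is None:
        return Q                          # O + Q = Q
    if Q is None:
        return P                          # P + O = P
    x1, y1 = P
    x2, y2 = Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None                       # Q = -P, so P + Q = O
    if P == Q:
        lam = (3 * x1 * x1 + a) * pow(2 * y1, -1, p) % p   # tangent slope
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, p) % p          # chord slope
    x3 = (lam * lam - x1 - x2) % p
    y3 = (lam * (x1 - x3) - y1) % p       # reflect the third intersection
    return (x3, y3)

def ec_mul(m, P, a, p):
    """mP by double-and-add, m >= 0."""
    R = None
    while m:
        if m & 1:
            R = ec_add(R, P, a, p)
        P = ec_add(P, P, a, p)
        m >>= 1
    return R
```

On the curve Y² = X³ + X + 3 over 𝔽7 of Example 2.22 below, these routines reproduce Table 2.6: for instance 2(4, 1) = (6, 6), 3(4, 1) = (5, 0) and 6(4, 1) = O.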

Example 2.22.
  1. Consider the elliptic curve

    E1 : Y² = X³ + X + 3

    over 𝔽7. We have Δ(E1) ≡ −16(4 × 1³ + 27 × 3²) ≡ 3 (mod 7) and j(E1) ≡ 1728 × 4³ × 3⁻¹ ≡ 2 (mod 7). It is easy to check that E1(𝔽7) contains the six points O, P1 = (4, 1), P2 = (4, 6), P3 = (5, 0), P4 = (6, 1) and P5 = (6, 6). The multiples of these points are summarized in Table 2.6. It follows that the group E1(𝔽7) is cyclic with P1 as a generator.

    Table 2.6. Multiples of points on the elliptic curve Y² = X³ + X + 3 over 𝔽7

    P            2P      3P      4P      5P      6P   ord P
    O                                                 1
    P1 = (4, 1)  (6, 6)  (5, 0)  (6, 1)  (4, 6)  O    6
    P2 = (4, 6)  (6, 1)  (5, 0)  (6, 6)  (4, 1)  O    6
    P3 = (5, 0)  O                                    2
    P4 = (6, 1)  (6, 6)  O                            3
    P5 = (6, 6)  (6, 1)  O                            3

  2. Now, consider the non-supersingular elliptic curve

    E2 : Y² + XY = X³ + X² + ξ

    defined over 𝔽8 = 𝔽2[T]/〈T³ + T + 1〉, where ξ := T + 〈T³ + T + 1〉 (so that ξ³ = ξ + 1). We have Δ(E2) = ξ and j(E2) = ξ⁻¹ = ξ² + 1. The finite points on E2 are:

    P1 = (0, ξ² + ξ),
    P2 = (1, ξ²),
    P3 = (1, ξ² + 1),
    P4 = (ξ, ξ²),
    P5 = (ξ, ξ² + ξ),
    P6 = (ξ + 1, ξ² + 1),
    P7 = (ξ + 1, ξ² + ξ),
    P8 = (ξ² + ξ, 1),
    P9 = (ξ² + ξ, ξ² + ξ + 1).

    So E2(𝔽8) contains 10 points (including O). The multiples of the points are listed in Table 2.7, which implies that E2(𝔽8) is again cyclic.[13] The φ(10) = 4 generators of this group are P4, P5, P8 and P9.

    [13] Both 6 and 10 are square-free integers, and so the groups E1(𝔽7) and E2(𝔽8) must be cyclic (Exercise 2.115(a)).

    Table 2.7. Multiples of points on the elliptic curve Y² + XY = X³ + X² + ξ over 𝔽8

    P       2P  3P  4P  5P  6P  7P  8P  9P  10P  ord P
    P0 = O                                        1
    P1      O                                     2
    P2      P7  P6  P3  O                         5
    P3      P6  P7  P2  O                         5
    P4      P3  P9  P6  P1  P7  P8  P2  P5  O    10
    P5      P2  P8  P7  P1  P6  P9  P3  P4  O    10
    P6      P2  P3  P7  O                         5
    P7      P3  P2  P6  O                         5
    P8      P6  P4  P2  P1  P3  P5  P7  P9  O    10
    P9      P7  P5  P3  P1  P2  P4  P6  P8  O    10

  3. Let us continue to represent 𝔽8 as in (2). The supersingular curve

    E3 : Y² + Y = X³ + ξX + ξ²

    has Δ(E3) = 1 and j(E3) = 0. E3(𝔽8) is a cyclic group with 9 points, as Table 2.8 illustrates.

Table 2.8. Multiples of points on the elliptic curve Y² + Y = X³ + ξX + ξ² over 𝔽8

P                          2P  3P  4P  5P  6P  7P  8P  9P  ord P
P0 = O                                                      1
P1 = (0, ξ² + ξ)           P5  P4  P7  P8  P3  P6  P2  O    9
P2 = (0, ξ² + ξ + 1)       P6  P3  P8  P7  P4  P5  P1  O    9
P3 = (ξ + 1, ξ)            P4  O                            3
P4 = (ξ + 1, ξ + 1)        P3  O                            3
P5 = (ξ², ξ²)              P7  P3  P2  P1  P4  P8  P6  O    9
P6 = (ξ², ξ² + 1)          P8  P4  P1  P2  P3  P7  P5  O    9
P7 = (ξ² + ξ, ξ² + ξ)      P2  P4  P6  P5  P3  P1  P8  O    9
P8 = (ξ² + ξ, ξ² + ξ + 1)  P1  P3  P5  P6  P4  P2  P7  O    9
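The point counts in Example 2.22 can be reproduced by brute force, at least for small prime fields and curves in the short Weierstrass form (the enumeration helper is our own, not a method from the book):

```python
def ec_points(a, b, p):
    """All F_p-rational points of E : Y^2 = X^3 + aX + b, with O as None."""
    pts = [None]                          # the point at infinity O
    for x in range(p):
        for y in range(p):
            if (y * y - (x**3 + a * x + b)) % p == 0:
                pts.append((x, y))
    return pts

# the curve E1 : Y^2 = X^3 + X + 3 over F_7 from Example 2.22(1)
pts = ec_points(1, 3, 7)
```

The enumeration confirms that E1(𝔽7) has exactly the six points listed in part 1 of the example.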

Definition 2.78.

Let m ∈ ℕ. The set of points P ∈ E(K) such that mP = O is evidently a subgroup of E(K) and is denoted by EK[m] or by E[m], if K is understood from the context. The elements of EK[m], called the m-torsion points of E, are those points of E(K) the (additive) orders of which are finite and divide m.

Multiples mP of a point P ∈ E(K) can be expressed using nice formulas.

Definition 2.79.

For an elliptic curve defined over K by the equation E : f(X, Y) = 0 and for m ∈ ℕ, there exist polynomials θm, ωm, ψm ∈ K[X, Y] such that for any point P = (h, k) ∈ E(K) with mP ≠ O we have

mP = (θm(h, k)/ψm(h, k)², ωm(h, k)/ψm(h, k)³).

The polynomial ψm is called the m-th division polynomial of E.

Using the addition formula one can verify the following recursive description for ψm and the expressions for θm and ωm in terms of ψm.

Lemma 2.8.

For an elliptic curve E defined by the general Weierstrass Equation (2.6) over a field K, the division polynomials ψm, m ∈ ℕ, are recursively described as:

ψ1 = 1,
ψ2 = 2Y + a1X + a3,
ψ3 = 3X⁴ + d2X³ + 3d4X² + 3d6X + d8,
ψ4 = ψ2 (2X⁶ + d2X⁵ + 5d4X⁴ + 10d6X³ + 10d8X² + (d2d8 − d4d6)X + (d4d8 − d6²)),
ψ_{2m+1} = ψ_{m+2} ψm³ − ψ_{m−1} ψ_{m+1}³  (m ≥ 2),
ψ2 ψ_{2m} = ψm (ψ_{m+2} ψ_{m−1}² − ψ_{m−2} ψ_{m+1}²)  (m ≥ 3),

where di are as in Definition 2.76. The polynomials θm satisfy

θm = X ψm² − ψ_{m+1} ψ_{m−1}

for all m ∈ ℕ, and for char K ≠ 2, one has

ωm = (ψ_{2m}/ψm − (a1θm + a3ψm²) ψm)/2.
It follows by induction on m that these formulas really give polynomial expressions for ψm, θm and ωm for all m ∈ ℕ. For even m, the polynomial ψm is divisible by ψ2. Furthermore, the polynomials defined as

f̄m := ψm for odd m  and  f̄m := ψm/ψ2 for even m

can be expressed as polynomials in x only (modulo the equation of the curve). These univariate polynomials are easier to handle than the bivariate ones ψm and, by an abuse of notation, are also called division polynomials. The degrees of f̄m satisfy the inequality deg f̄m ≤ (m² − 1)/2.

Points of E[m] can be characterized in terms of the division polynomials:

Theorem 2.47.

Let m ∈ ℕ and let P = (h, k) be a finite point on E(K̄). Then mP = O if and only if ψm(h, k) = 0. Furthermore, if m > 2 and 2P ≠ O, then mP = O if and only if fm(h) = 0.

We finally define the polynomials fm as follows. If char K ≠ 2, then fm is taken to be the univariate division polynomial described above, for all m ∈ ℕ. On the other hand, for char K = 2 and for non-supersingular curves over K, ψm is already a polynomial in x alone (Exercise 2.107), and it is customary to define fm(x) := ψm(x, y) for all m ∈ ℕ. By a further abuse of notation, we also call fm the m-th division polynomial of E.

2.11.3. Elliptic Curves over Finite Fields

In this section, we take K = Fq, a finite field of cardinality q and characteristic p. We do not deal with the case p = 3. Let E be an elliptic curve defined over Fq. If p > 3, we assume that E is defined by Equation (2.8), whereas for p = 2, we assume that E is defined by Equation (2.10) or Equation (2.9) depending on whether E is supersingular or not.

Since E(Fq) is a subset of the projective plane over Fq, the cardinality #E(Fq) is finite. The next theorem shows that #E(Fq) is quite close to q.

Theorem 2.48. Hasse’s theorem

#E(Fq) = q + 1 – t, where |t| ≤ 2√q. (The integer t is called the trace of Frobenius at q.)

The implication of this theorem is that the possible cardinalities of E(Fq) lie in a rather narrow interval [q + 1 – 2√q, q + 1 + 2√q]. If q = p is a prime, then for every n with q + 1 – 2√q ≤ n ≤ q + 1 + 2√q, there is at least one curve E with #E(Fp) = n. Moreover, the values of #E(Fp) are distributed almost uniformly in the interval [q + 1 – 2√q, q + 1 + 2√q]. However, if q is not a prime, these nice results do not continue to hold.
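
A quick numerical sanity check of Hasse's bound — a sketch assuming the short Weierstrass form over the small field F7 (both the family of curves and the field are illustrative choices, not taken from the text):

```python
# Sketch: verify Hasse's bound #E(F_q) = q + 1 - t, |t| <= 2*sqrt(q), by counting
# points on every smooth curve y^2 = x^3 + a*x + b over F_7.
q = 7

def order(a, b):
    """#E(F_q), including the point at infinity."""
    return 1 + sum(1 for x in range(q) for y in range(q)
                   if (y * y - x**3 - a * x - b) % q == 0)

orders = []
for a in range(q):
    for b in range(q):
        if (4 * a**3 + 27 * b**2) % q == 0:
            continue                        # singular curve: skip
        n = order(a, b)
        t = q + 1 - n                       # trace of Frobenius
        assert t * t <= 4 * q               # Hasse: |t| <= 2*sqrt(q)
        orders.append(n)
print(min(orders), max(orders))             # 3 13: the whole Hasse interval for q = 7
```

For q = 7 the interval [q + 1 – 2√q, q + 1 + 2√q] contains exactly the integers 3 through 13, and both endpoints are attained.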

Definition 2.80.

If t = 1 (that is, if #E(Fq) = q), the curve E is called anomalous. If p|t, the curve E is called supersingular, and if p ∤ t, then E is called non-supersingular.

Anomalous and supersingular curves are cryptographically weak, because certain algorithms are known with running time better than exponential to solve the so-called elliptic curve discrete logarithm problem over these curves. Determination of the order #E(Fq) gives t, from which one can easily check whether E is anomalous or supersingular. If p = 2, we have an easier check for supersingularity.

Proposition 2.35.

An elliptic curve E over a finite field of characteristic 2 is supersingular if and only if j(E) = 0 or, equivalently, if and only if a1 = 0 in Equation (2.6).

For arbitrary characteristic p, we have the following characterization.

Proposition 2.36.

An elliptic curve E over Fq is supersingular if and only if t2 = 0, q, 2q, 3q or 4q. In particular, if p ≠ 2, 3, then E is supersingular if and only if t = 0.
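
A small illustration with a hypothetical example curve: y2 = x3 – x over F7 (a prime field with p ≡ 3 (mod 4)) has trace t = 0, hence it is supersingular by the proposition:

```python
# Sketch: check the supersingularity criterion on y^2 = x^3 - x over F_7.
# For characteristic > 3 the criterion reduces to t = 0.
q = 7
n = 1 + sum(1 for x in range(q) for y in range(q)
            if (y * y - (x**3 - x)) % q == 0)    # point count incl. infinity
t = q + 1 - n                                    # trace of Frobenius
print(n, t)                                      # 8 0
```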

By Theorem 2.38, the multiplicative group Fq* is always cyclic. However, the group E(Fq) is not always cyclic, but is of a special kind. We need a few definitions to explain the structure of E(Fq). The notion of internal direct product for multiplicative groups (Exercise 2.19) can be readily applied to additive groups as follows.

Definition 2.81.

Let G be an additive group and let H1, . . . , Hr be subgroups of G. If every element of G can be written uniquely as h1 + · · · + hr with hi ∈ Hi, i = 1, . . . , r, we say that G is the (internal) direct sum of the subgroups H1, . . . , Hr and denote this as G = H1 ⊕ · · · ⊕ Hr.

Theorem 2.49. Structure theorem for finite Abelian groups

Let G be a finite additive Abelian group of cardinality #G = n. Then there exist r ∈ ℕ and integers ni ≥ 2 for 1 ≤ i ≤ r, such that G is the direct sum of (subgroups isomorphic to the) cyclic groups ℤn1, . . . , ℤnr, that is, G ≅ ℤn1 ⊕ · · · ⊕ ℤnr, where ni+1|ni for all i = 1, . . . , r – 1. Furthermore, such a decomposition is unique in the sense that if G ≅ ℤm1 ⊕ · · · ⊕ ℤms with integers mi ≥ 2 and mi+1|mi for i = 1, . . . , s – 1, then r = s and ni = mi for all i = 1, . . . , r. In this case, we say that G has rank r and is of type (n1, . . . , nr). By Lagrange’s theorem, each ni|n. Moreover, n = n1n2 · · · nr. G is cyclic if and only if the rank of G is 1.
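
The decomposition asserted by the theorem can be computed mechanically when G is given as a direct sum of cyclic groups of known orders. The following sketch (an illustration of the theorem, not an algorithm from the text) merges the prime-power components into invariant factors n1, . . . , nr with ni+1 | ni:

```python
# Sketch: compute the type (n1, ..., nr) of a finite Abelian group given as
# Z_d1 (+) ... (+) Z_dk, by regrouping prime-power components.
from collections import defaultdict

def factor(n):
    f, d = {}, 2
    while d * d <= n:
        while n % d == 0:
            f[d] = f.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        f[n] = f.get(n, 0) + 1
    return f

def invariant_factors(orders):
    # Collect, for each prime p, the p-power components of all the d_i.
    powers = defaultdict(list)
    for d in orders:
        for p, e in factor(d).items():
            powers[p].append(p ** e)
    for p in powers:
        powers[p].sort(reverse=True)
    r = max(len(v) for v in powers.values())
    # n_i is the product of the i-th largest p-power over all primes p.
    ns = []
    for i in range(r):
        n = 1
        for p in powers:
            if i < len(powers[p]):
                n *= powers[p][i]
        ns.append(n)
    return ns                      # n1 >= n2 >= ..., with n_{i+1} | n_i

print(invariant_factors([6, 4]))   # Z_6 (+) Z_4 has type (12, 2)
```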

Theorem 2.50. Structure theorem for E(Fq)

The elliptic curve group E(Fq) is of rank 1 or 2. If the rank is 1, then E(Fq) is cyclic; otherwise E(Fq) ≅ ℤn1 ⊕ ℤn2, where n1, n2 ≥ 2 and n2|n1. In the second case, we have n2|(q – 1).

Once we know the order of the group E(Fq), it is easy to compute the order of E(Fq^n) for any n ∈ ℕ, as the following theorem suggests.

Theorem 2.51.

Let α, β ∈ ℂ satisfy 1 – tX + qX2 = (1 – αX)(1 – βX). Then for any n ∈ ℕ the order #E(Fq^n) = q^n + 1 – (α^n + β^n).
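
Since α + β = t and αβ = q, the power sums tk := α^k + β^k obey the integer recurrence tk = t·tk–1 – q·tk–2 with t0 = 2 and t1 = t, so no complex arithmetic is needed in practice. A sketch using the curve of Exercise 2.111 as a test case (the recurrence reproduces the order 14 over F8 stated in Exercise 2.109):

```python
# Sketch: Theorem 2.51 via the integer recurrence t_k = t*t_{k-1} - q*t_{k-2},
# which computes alpha^k + beta^k without touching complex numbers.
q = 2
# Count points of E : y^2 + xy = x^3 + x^2 + 1 over F_2 by brute force.
n1 = 1 + sum(1 for x in range(2) for y in range(2)
             if (y * y + x * y + x**3 + x**2 + 1) % 2 == 0)
t = q + 1 - n1                    # trace of Frobenius over F_2

def order_ext(k):
    t0, t1 = 2, t
    for _ in range(k - 1):
        t0, t1 = t1, t * t1 - q * t0
    return q**k + 1 - t1

print([order_ext(k) for k in (1, 2, 3)])   # [2, 8, 14]
```

The curve has 2 points over F2 (so t = 1, i.e., it is anomalous over F2), 8 points over F4, and 14 points over F8.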

Exercise Set 2.11

2.100Show that the following curves over K are not smooth (and hence not elliptic curves):
  1. Y2 = X3, K arbitrary.

  2. Y2 = X3 + X2, K arbitrary.

  3. Y2 = X3 + aX + b, if char K = 2.

2.101
  1. Show that for an elliptic curve E over K and a finite point P = (h, k) ∈ E(K), the only points in E(K) (or E(K̄)) having X-coordinate equal to h are P and –P.

  2. Let char K ≠ 2, 3 and let E be defined by Equation (2.8). If α1, α2, α3 ∈ K̄ are the roots (distinct by Theorem 2.45) of X3 + aX + b, then (α1, 0), (α2, 0) and (α3, 0) are the only points on E(K̄) with Y-coordinate equal to 0. Show that these are the only points of order 2 in E(K̄).

2.102Let P = (h1, k1) and Q = (h2, k2) be two points (different from the point at infinity) in E(K) defined by the Weierstrass Equation (2.6). Assume that Q ≠ –P. Determine R = (h3, k3) = P + Q as follows:
  1. Show that the line passing through P and Q (the tangent, if P = Q) has the equation Y = λX + μ, where

  2. Substituting λX + μ for Y in Equation (2.6) gives a cubic equation in X of which h1 and h2 are two roots. Show that the third root (the X-coordinate of R) is

    h3 = λ2 + a1λ – a2 – h1 – h2.

    Hence deduce that the Y-coordinate of R is

    k3 = –(λ + a1)h3 – μ – a3.

2.103Let . Show that there exists an elliptic curve E over K such that . [H]
2.104Assume that char K ≠ 2, 3 and consider the elliptic curve E given by Equation (2.8). Let K[E] be the affine coordinate ring and K(E) the field of rational functions on E.
  1. Show that every element in K[E] can be uniquely represented as u(x) + yv(x) for polynomials u(x), v(x) ∈ K[x].

  2. The conjugate of f = u(x) + yv(x) ∈ K[E] is defined as f̄ := u(x) – yv(x). The norm of f is defined as N(f) := f f̄. Show that N(f) = u(x)2 – (x3 + ax + b)v(x)2 ∈ K[x].

  3. The degree of f = u(x) + yv(x) ∈ K[E] is defined as deg f := max(2 degx u, 3 + 2 degx v), where degx denotes the degree in x. Show that deg f = degx N(f).

  4. Show that for f, g ∈ K[E], one has N(fg) = N(f) N(g). Hence conclude that deg(fg) = deg f + deg g.

  5. Show that every rational function in K(E) can be represented as a(x) + yb(x), where a(x), b(x) ∈ K(x).

2.105Show that the division polynomials for the general Weierstrass equation can be recursively defined as

where F = 4x3 + d2x2 + 2d4x + d6.

2.106Write the recursive formulas for the division polynomials ψm(x, y) and fm(x) for the elliptic curve E defined by Equation (2.8) over a field K of characteristic ≠ 2, 3. Show that for m ≥ 2 and for P = (h, k) ∈ E(K̄) we have

2.107Write the recursive formulas for the division polynomials ψm(x, y) for the elliptic curve E defined by Equation (2.9) over a field K of characteristic 2. Conclude that ψm are polynomials in x only for all m ∈ ℕ. With fm := ψm for all m ∈ ℕ, show that for m ≥ 2 and for P = (h, k) ∈ E(K̄) we have

2.108Consider the elliptic curve defined over the field F7:

Ea,b : Y2 = X3 + aX + b.

Verify the following assertions: (You may write a computer program.)

  1. Each Ea,b has order between 3 and 13.

  2. The curve E0,3 : Y2 = X3 + 3 has the maximum possible order 13.

  3. The curve E0,4 : Y2 = X3 + 4 has the minimum possible order 3.

  4. The curve E0,5 : Y2 = X3 + 5 is anomalous.

  5. The group is not cyclic.
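
The assertions can be checked by brute force. The base field is taken below to be F7 — an inference, since Hasse's bound gives exactly the order range 3 to 13 for q = 7:

```python
# Sketch: brute-force point counts for Exercise 2.108, assuming the base field F_7.
q = 7                                   # inferred base field; see the note above

def order(a, b):
    """#E_{a,b}(F_7) = 1 + number of affine points on y^2 = x^3 + a*x + b."""
    return 1 + sum(1 for x in range(q) for y in range(q)
                   if (y * y - x**3 - a * x - b) % q == 0)

print(order(0, 3), order(0, 4), order(0, 5))   # 13 3 7; 7 = q, so E_{0,5} is anomalous
```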

2.109Consider the representation of F8 as F2(ξ), where ξ is a root of T3 + T + 1 in F8. Identify an element a2ξ2 + a1ξ + a0 (where each ai ∈ {0, 1}) with the integer (a2a1a0)2 = a2·2^2 + a1·2 + a0. For a, b ∈ F8 (identified with the integers 0, . . . , 7 as above), b ≠ 0, define the non-supersingular elliptic curve:

Ea,b : Y2 + XY = X3 + aX2 + b.

Verify the following assertions: (You may write a computer program.)

  1. Each Ea,b has order between 4 and 14.

  2. The curve E1,1 : Y2 + XY = X3 + X2 + 1 has the maximum possible order 14.

  3. The curve E2,1 : Y2 + XY = X3 + ξX2 + 1 has the minimum possible order 4.

  4. The curve E2,2 : Y2 + XY = X3 + ξX2 + ξ is anomalous.

  5. The orders of Ea,b for all choices of a, b lie in the set {4, 6, 8, 10, 12, 14}.

  6. Each is cyclic.

  7. Theorem 2.45(3) requires the phrase over K̄, that is, two curves over an algebraically non-closed field having the same j-invariant may be non-isomorphic.
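
A brute-force check of assertions 2, 3 and 5, with F8 elements encoded as 3-bit integers exactly as in the identification above (the implementation details below are mine, not the book's):

```python
# Sketch: point counts over F_8 = F_2(xi), xi^3 = xi + 1, elements as 3-bit ints.
def gf8_mul(u, v):
    """Multiply two elements of F_8, reducing modulo T^3 + T + 1 (0b1011)."""
    r = 0
    for i in range(3):
        if (v >> i) & 1:
            r ^= u << i
    for i in (4, 3):                       # clear the degree-4 and degree-3 bits
        if (r >> i) & 1:
            r ^= 0b1011 << (i - 3)
    return r

def order(a, b):
    """#E_{a,b}(F_8) for E : Y^2 + XY = X^3 + aX^2 + b (addition in F_8 is XOR)."""
    n = 1                                  # the point at infinity
    for x in range(8):
        for y in range(8):
            x2 = gf8_mul(x, x)
            lhs = gf8_mul(y, y) ^ gf8_mul(x, y)
            rhs = gf8_mul(x2, x) ^ gf8_mul(a, x2) ^ b
            if lhs == rhs:
                n += 1
    return n

orders = {order(a, b) for a in range(8) for b in range(1, 8)}
print(order(1, 1), order(2, 1), sorted(orders))
```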

2.110Consider the representation of F8 and the identification of elements of F8 with integers as in Exercise 2.109. For a, b, c ∈ F8, a ≠ 0, define the supersingular elliptic curve:

Ea,b,c : Y2 + aY = X3 + bX + c.

Verify the following assertions: (You may write a computer program.)

  1. Each Ea,b,c has order between 5 and 13.

  2. The curve E1,1,1 : Y2 + Y = X3 + X + 1 has the maximum possible order 13.

  3. The curve E1,1,2 : Y2 + Y = X3 + X + ξ has the minimum possible order 5.

  4. The orders of Ea,b,c for all choices of a, b, c lie in the set {5, 9, 13}.

  5. No Ea,b,c is anomalous.

  6. Each is cyclic.

2.111Consider the elliptic curve E : Y2 + XY = X3 + X2 + 1 defined over F2^n for all n ∈ ℕ. Show that

where r = ⌊n/2⌋. [H] Conclude that E is anomalous over F2, but not so over F2^n for n ≥ 2.

2.112Let K be a finite field of characteristic ≠ 2, 3 and E : Y2 = X3 + aX + b an elliptic curve defined over K. Prove that:
  1. #E(K) is odd if and only if X3 + aX + b is irreducible in K[X]. [H]

  2. E(K) is not cyclic if X3 + aX + b splits in K[X].

  3. The converse of Part (b) does not hold. [H]

2.113Let E : Y2 + XY = X3 + aX2 + b be a non-supersingular elliptic curve defined over F2^n. Prove that:
  1. E(F2^n) has exactly one point of order 2. [H]

  2. #E(F2^n) is even.

2.114Let E : Y2 + aY = X3 + bX + c be a supersingular elliptic curve over F2^n. Prove that:
  1. E(F2^n) has no points of order 2.

  2. #E(F2^n) is odd.

2.115
  1. Let G be a finite Abelian group of cardinality n. Show that if n is square-free, then G is cyclic. [H]

  2. Prove that if E is an anomalous elliptic curve over Fp, then E(Fp) is cyclic. [H]

  3. If E is a supersingular elliptic curve over the field Fq of characteristic ≠ 2, 3, prove that E(Fq) is either cyclic or isomorphic to ℤ(q+1)/2 ⊕ ℤ2. [H]

2.116Let q = p^n, p ≡ 3 (mod 4), and a ∈ Fq, a ≠ 0. Consider the elliptic curve E : Y2 = X3 – a2X over Fq (or over Fp). Prove that:
  1. contains at most three points of order three.

  2. The points of order three in are precisely the points of order three in .

2.117A Weierstrass equation of an elliptic curve defined over a field K is said to be in the Legendre form, if it can be written as

Equation 2.12

Y2 = X(X – 1)(X – k)

for some k ∈ K, k ≠ 0, 1. Show that if char K ≠ 2, then every Weierstrass equation over K can be written in the Legendre form. Show that the j-invariant of the curve E defined by Equation (2.12) is 2^8(k2 – k + 1)3/(k2(k – 1)2).

**2.12. Hyperelliptic Curves

Hyperelliptic curves are generalizations of elliptic curves. We cannot define a group structure on a general hyperelliptic curve in the same way as we did for elliptic curves. We instead work in the Jacobian of a hyperelliptic curve. For an elliptic curve E over an algebraically closed field K, the Jacobian is canonically isomorphic to the group E(K). Thus one can as well use the techniques for hyperelliptic curves for describing and working in elliptic curve groups. However, the exposition of the previous section turns out to be more intuitive and computationally oriented.

2.12.1. The Defining Equations

A hyperelliptic curve C of genus g ≥ 1 over a field K is defined by a polynomial equation of the form

Equation 2.13


In order that C qualifies as a hyperelliptic curve, we additionally require that C (as a projective curve) be smooth over K̄. The set of K-rational points on C is denoted as usual by C(K). For g = 1, Equation (2.13) is the same as the Weierstrass Equation (2.6) on p 98, that is, elliptic curves are hyperelliptic curves of genus one. A hyperelliptic curve of genus 2 over ℝ is shown in Figure 2.2.

Figure 2.2. A hyperelliptic curve of genus 2 over ℝ: Y2 = X(X2 – 1)(X2 – 2)


A hyperelliptic curve has only one point at infinity (Exercise 2.97(f)) and is smooth at ∞. If char K ≠ 2, substituting Y ↦ Y – u(X)/2 simplifies Equation (2.13) to Y2 = v(X) + u(X)2/4. Since the right side is a monic polynomial in K[X] of degree 2g + 1, we may assume that if char K ≠ 2, the equation for C is of the form:

Equation 2.14

C : Y2 = v(X)

Proposition 2.37.

If char K ≠ 2, then the hyperelliptic curve C defined by Equation (2.14) is smooth if and only if v has no multiple roots (in K̄). If char K = 2, then the curve defined by Equation (2.14) is never smooth.

Proof

First, consider char K ≠ 2. If v has a multiple root, say α ∈ K̄, then v′(α) = 0 and, therefore, C is not smooth at the finite point (α, 0). Conversely, if (h, k) is a singular point on C, then we have 2k = 0 and v′(h) = 0. Since (h, k) = (h, 0) is a point on C, we have v(h) = 0, that is, h is a multiple root of v.

For char K = 2 and a point (h, k) on the curve, we have (∂(Y2 – v(X))/∂X)(h, k) = v′(h) and (∂(Y2 – v(X))/∂Y)(h, k) = 0. Now, v′(X) is a monic polynomial of degree 2g > 0 and, therefore, has at least one root, say α ∈ K̄. But then C is not smooth at the point of C with X-coordinate α.

Definition 2.82.

Let P = (h, k) be a finite point on the hyperelliptic curve C defined by Equation (2.13). The point P̃ := (h, –k – u(h)) is called the opposite of P.[14] P and P̃ are the only points on C with X-coordinate equal to h. If P = P̃, then P is called a special point on C, otherwise it is called an ordinary point on C. The set of all finite (resp. ordinary, resp. special) points on C is denoted by Cfin(K) (resp. Cord(K), resp. Cspl(K)). These notations are also abbreviated as Cfin, Cord and Cspl, if the field K is understood from the context.

[14] It is customary to define the opposite of ∞ to be ∞ itself.

2.12.2. Polynomial and Rational Functions

All the general theory we described in Section 2.10 continues to be valid for hyperelliptic curves. However, since we are now given an explicit equation describing the curves, we can give more explicit expressions for polynomial and rational functions on hyperelliptic curves. For simplicity, we consider the affine equation and extend our definitions separately for the point at infinity.

Consider the hyperelliptic curve C defined by Equation (2.13). By Exercise 2.98, the defining polynomial f(X, Y) := Y2 + u(X)Y – v(X) (or its homogenization) is irreducible over K̄, so that the affine (or projective) coordinate ring of C is an integral domain and the corresponding function field is simply the field of fractions of the coordinate ring.

Let G(x, y) ∈ K[C]. Since y2 + u(x)y – v(x) = 0 in K[C], we can repeatedly substitute y2 by –u(x)y + v(x) in G(x, y) until the y-degree of G(x, y) becomes less than 2. This proves part of the following:

Proposition 2.38.

Every polynomial function G(x, y) ∈ K[C] can be written uniquely as G(x, y) = a(x) + yb(x) for some a(X), b(X) ∈ K[X].

Proof

In order to establish the uniqueness, note that if G(x, y) = a1(x) + yb1(x) = a2(x) + yb2(x), then f(X, Y) divides [a1(X) + Yb1(X)] – [a2(X) + Yb2(X)] in K[X, Y]. Since the Y-degree of f is 2, this implies [a1(X) + Y b1(X)] – [a2(X) + Y b2(X)] = 0, that is, [a1(X) – a2(X)] + [b1(X) – b2(X)]Y = 0, that is, a1(X) = a2(X) and b1(X) = b2(X).

Definition 2.83.

Let G = a(x) + yb(x) ∈ K[C]. The conjugate of G is defined to be the polynomial function G̅ := a(x) – b(x)(u(x) + y). The norm of G is defined as N(G) := GG̅.

Some useful properties of the norm function are listed in the following lemma, the proof of which is left to the reader as an easy exercise.

Lemma 2.9.

For G, H ∈ K[C], we have:

  1. G̅ is again a polynomial function in K[C].

  2. If G(x, y) = a(x) + yb(x), then N(G) = a(x)2 – a(x)b(x)u(x) – v(x)b(x)2. In particular, N(G) ∈ K[x].

  3. N(G̅) = N(G).

  4. N(GH) = N(G) N(H).

We also have an easy description of the rational functions on C.

Proposition 2.39.

Every rational function r ∈ K(C) can be written in the form s(x) + yt(x) for some s(x), t(x) ∈ K(x).

Proof

We can write r(x, y) = G(x, y)/H(x, y) for G, H ∈ K[C], H ≠ 0. Multiplying both the numerator and the denominator by H̅ and using Lemma 2.9(2) and Proposition 2.38 completes the proof.

The value of a rational function on C at a finite point on C can be defined as in the case of general curves (See Definition 2.68). In order to define the value of a rational function at the point ∞, we need some other concepts.

For a moment, let us assume that K = ℝ. From the equation of C, we see that k^2 ≈ h^(2g+1) (neglecting lower-degree terms) for sufficiently large coordinates h, k of a point (h, k) on C. This means that, on the logarithmic scale, k tends to infinity (2g + 1)/2 times as fast as h does. So it is customary to give Y a weight (2g + 1)/2 times a weight we give to X. The smallest integral weights of X and Y to satisfy this are 2 and 2g + 1 respectively. This motivates us to provide Definition 2.84 (generalized for any K).

Definition 2.84.

Let G = a(x) + yb(x) ∈ K[C]. The degree of G is defined to be deg G := max(2 degx a, 2g + 1 + 2 degx b), where degx denotes the usual x-degree of a polynomial in K[x]. Since a and b are uniquely determined by G, deg G is well-defined. If G = 0, we set deg G := –∞.

If 0 ≠ G = a(x)+yb(x), d1 = degx a and d2 = degx b, then the leading coefficient of G is taken to be the coefficient of xd1 in a(x) if deg G = 2d1, or to be the coefficient of xd2 in b(x) if deg G = 2g + 1 + 2d2. (We cannot have 2d1 = 2g + 1 + 2d2, since the left side is even and the right side is odd.)

Some basic properties of the degree function follow.

Lemma 2.10.

For G, H ∈ K[C], we have:

  1. deg G = degx(N(G)).

  2. deg(GH) = deg G + deg H.

  3. deg G̅ = deg G.

Proof

Easy exercise.
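
The claims of Lemmas 2.9 and 2.10 can be checked symbolically on a small example. The sketch below uses a hypothetical genus-2 curve y2 + xy = x5 + 1 (so u = x, v = x5 + 1; the identities do not depend on smoothness) with integer coefficients, representing polynomials as Python coefficient lists:

```python
# Sketch: verify N(GH) = N(G)N(H) and deg G = deg_x N(G) on a toy genus-2 curve.
# Polynomials are coefficient lists, lowest degree first.
def padd(p, q):
    r = [0] * max(len(p), len(q))
    for i, c in enumerate(p): r[i] += c
    for i, c in enumerate(q): r[i] += c
    return r

def pneg(p): return [-c for c in p]

def pmul(p, q):
    r = [0] * (len(p) + len(q) - 1)
    for i, c in enumerate(p):
        for j, d in enumerate(q):
            r[i + j] += c * d
    return r

def pdeg(p):
    d = -1
    for i, c in enumerate(p):
        if c: d = i
    return d

def ptrim(p):
    p = p[:]
    while p and p[-1] == 0: p.pop()
    return p

u, v = [0, 1], [1, 0, 0, 0, 0, 1]          # u(x) = x, v(x) = x^5 + 1 (genus g = 2)

def norm(G):                                # N(G) = a^2 - a*b*u - v*b^2 (Lemma 2.9(2))
    a, b = G
    return padd(pmul(a, a), pneg(padd(pmul(pmul(a, b), u), pmul(v, pmul(b, b)))))

def deg(G):                                 # deg G = max(2 deg a, 2g+1 + 2 deg b)
    a, b = G
    return max(2 * pdeg(a), 5 + 2 * pdeg(b))

def cmul(G, H):                             # product in K[C], using y^2 = v - u*y
    (a1, b1), (a2, b2) = G, H
    bb = pmul(b1, b2)
    return (padd(pmul(a1, a2), pmul(v, bb)),
            padd(padd(pmul(a1, b2), pmul(a2, b1)), pneg(pmul(u, bb))))

G = ([1, 0, 1], [0, 1])                     # G = (x^2 + 1) + y*x
H = ([0, 2], [3])                           # H = 2x + 3y
print(pdeg(norm(G)) == deg(G))              # Lemma 2.10(1)
print(ptrim(norm(cmul(G, H))) == ptrim(pmul(norm(G), norm(H))))   # Lemma 2.9(4)
```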

Now we are in a position to give an explicit definition of the value of a rational function at ∞.

Definition 2.85.

For r = G/H ∈ K(C) with G, H ∈ K[C], we define the value r(∞) as:

If deg(G) < deg(H), then r(∞) := 0.

If deg(G) > deg(H), then r(∞) := ∞ (that is, r is not defined at ∞).

If deg(G) = deg(H), then r(∞) is defined as the ratio of the leading coefficients of G and H.

Now that we have a complete description of the value of a rational function at any point on C, poles and zeros of rational functions on C can be defined as in Definition 2.70. In order to define the order of a polynomial or rational function at a point P on C, we should find a uniformizing parameter uP at P. Tedious calculations help one deduce the following explicit expressions for uP.

Proposition 2.40.

Let P = (h, k) ∈ C be a finite point. Then we can take

uP := x – h, if P is an ordinary point, and uP := y – k, if P is a special point,

as a uniformizing parameter at P. Finally, u∞ := x^g/y is a uniformizing parameter at the point at infinity (where g is the genus of C).

We give an alternative definition of the order (independent of uP), which is computationally useful and which is equivalent to Definition 2.71 for a hyperelliptic curve.

Definition 2.86.

Let G = a(x) + yb(x) ∈ K[C], G ≠ 0, and let P be a point on C. The order of G at P is defined as follows. First, let P = (h, k) be a finite point on C. Let e be the largest exponent such that (x – h)e divides both a(x) and b(x). We write G = (x – h)eG1(x, y). If G1(h, k) ≠ 0 we set l := 0, otherwise we set l to be the highest exponent such that (x – h)l divides N(G1). We then define

ordP(G) := e + l, if P is an ordinary point, and ordP(G) := 2e + l, if P is a special point.

Finally, we define ord∞(G) := –deg G.

Now, let r(x, y) = G(x, y)/H(x, y) be a rational function on C and let P be a point on C. We define the order of r at P as ordP(r) := ordP(G) – ordP(H). The value ordP(r) can be shown to be independent of the choice of G and H.

Example 2.23.

Let P = (h, k) be a finite point on C. Consider the rational function r := (x – h)m, m ∈ ℕ. The only points on C with X-coordinate equal to h are P and its opposite P̃. Therefore, if P is an ordinary point, ordP(r) = ordP̃(r) = m, whereas if P is a special point, ordP(r) = 2m. Moreover, ord∞(r) = –deg r = –2m. For any point Q on C other than P, P̃ and ∞, we have ordQ(r) = 0.

Now consider r = (x – h)m for some m < 0. Write r = G/H with G = 1 and H = (x – h)^(–m). Since ordQ(r) = ordQ(G) – ordQ(H), we continue to have

If m ≥ 0, then r is a polynomial function and has zeros P and P̃ and no finite poles. In this case, the sum of the orders of its zeros is 2m = 2 degx r = deg r. Theorem 2.52 generalizes this observation.

Theorem 2.52.

A non-constant polynomial function G ∈ K[C] has only finitely many zeros and a single pole at ∞. Furthermore, if K is algebraically closed, then the sum of the orders of the zeros of G equals deg G = –ord∞(G).

2.12.3. The Jacobian

We continue to work with the hyperelliptic curve C of Equation (2.13). We first impose the restriction that K is algebraically closed and use the theory of Section 2.10 to define the set Div(C) of divisors on C, the degree zero part Div0(C) of Div(C), the divisor Div(r) of a rational function r ∈ K(C)*, the set Prin(C) of principal divisors on C, the Picard group Pic(C) = Div(C)/Prin(C) and the Jacobian, namely the quotient group Div0(C)/Prin(C).

Example 2.24.

For the rational function r := (xh)m of Example 2.23, we have:

The Jacobian is the set of all cosets of Prin(C) in Div0(C). It is not a good idea to work with cosets (which are equivalence classes). Recall that in the case of ℤn, we represented a coset a + nℤ by the remainder of Euclidean division of a by n. In the case of the representation of a finite field as Fp[X]/〈f(X)〉, we took polynomials of smallest degrees as canonical representatives of the cosets of 〈f(X)〉. In the case of the Jacobian too, we intend to find such good representatives, one from each coset. We now introduce the concept of reduced divisors for that purpose.

Definition 2.87.

Two divisors D1, D2 ∈ Div0(C) (resp. in Div(C)) are said to be equivalent, denoted D1 ~ D2, if D1 – D2 ∈ Prin(C), or equivalently if D1 = D2 + Div(r) for some r ∈ K(C)*.

Our goal is to associate to every divisor D ∈ Div0(C) some unique reduced divisor Dred with D ~ Dred, that is, Dred plays the role of the canonical representative of the coset of D. We start with the following definition.

Definition 2.88.

A divisor D = ΣP mP(P) – (ΣP mP)(∞) ∈ Div0(C) (the sums ranging over the finite points P of C) is called semi-reduced, if each mP ≥ 0 and if for mP > 0 we have: mP̃ = 0 if P is an ordinary point, and mP = 1 if P is a special point.

Proposition 2.41.

Every divisor D ∈ Div0(C) is equivalent to some semi-reduced divisor D1.

Proof

Let , with and with Cord being the disjoint union of C1 and C2, where an ordinary point if and only if its opposite and . Now we can write D = D1 + D2, where

and

with m1 and m2 so chosen that D1, . By definition, D1 is semi-reduced, whereas by Example 2.24 , where

Now, we explain how we can represent a semi-reduced divisor by a pair of polynomials a(x), b(x) ∈ K[x]. For that, we need a definition.

Definition 2.89.

Let D1 = ΣP mP(P) – m(∞) and D2 = ΣP nP(P) – n(∞) be two divisors on C (not necessarily in Div0(C)). The greatest common divisor (gcd) of D1 and D2 is defined as the divisor

gcd(D1, D2) := ΣP min(mP, nP)(P) – (ΣP min(mP, nP))(∞).

Theorem 2.53.

Let D = ΣP mP(P) – (ΣP mP)(∞) be a semi-reduced divisor on C. Let Pi = (hi, ki), i = 1, . . . , n, be the only finite points P on C such that mP > 0. Let mi := mPi, m := m1 + · · · + mn, and a(x) := (x – h1)^m1 · · · (x – hn)^mn (so that degx(a) = m). Then there exists a unique polynomial b(x) ∈ K[x] with the following properties:

  1. degx b < m,

  2. b(hi) = ki for i = 1, . . . , n,

  3. a(x) divides b(x)2 + b(x)u(x) – v(x), and

  4. D = gcd(Div(a(x)), Div(b(x) – y)).

Conversely, if a(x), b(x) ∈ K[x] with degx b < degx a and with a dividing b2 + bu – v, then the divisor gcd(Div(a), Div(b – y)) is semi-reduced.

We denote the divisor gcd(Div(a), Div(b – y)) by Div(a, b). The zero divisor has the representation Div(1, 0).
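
The converse criterion is easy to test in code. Below, on the hypothetical genus-2 curve C : y2 = x5 + 1 over F7 (so u = 0), we take the two points P1 = (0, 1) and P2 = (1, 3), set a(x) = x(x – 1), let b(x) interpolate the y-coordinates, and verify that a divides b2 + bu – v, so that Div(a, b) = (P1) + (P2) – 2(∞) is semi-reduced:

```python
# Sketch: check that a | b^2 + b*u - v for a concrete pair (a, b) over F_7.
p = 7
v = [1, 0, 0, 0, 0, 1]                    # v(x) = x^5 + 1, coefficients low-to-high

def pmulmod(f, g):
    r = [0] * (len(f) + len(g) - 1)
    for i, c in enumerate(f):
        for j, d in enumerate(g):
            r[i + j] = (r[i + j] + c * d) % p
    return r

def prem(f, g):
    """Remainder of f modulo g over F_p (leading coefficient of g invertible)."""
    f = f[:]
    dg = len(g) - 1
    inv = pow(g[-1], -1, p)
    for i in range(len(f) - 1, dg - 1, -1):
        c = f[i] * inv % p
        for j in range(dg + 1):
            f[i - dg + j] = (f[i - dg + j] - c * g[j]) % p
    return f[:dg]

# P1 = (0, 1) and P2 = (1, 3) lie on C: 1 = 0 + 1 and 9 = 2 = 1 + 1 (mod 7).
a = [0, 6, 1]                             # a(x) = x(x - 1) = x^2 - x  (mod 7)
b = [1, 2]                                # b(x) = 2x + 1: b(0) = 1, b(1) = 3
b2 = pmulmod(b, b)
b2 += [0] * (len(v) - len(b2))
diff = [(c - d) % p for c, d in zip(b2, v)]     # b^2 - v  (here u = 0)
print(all(c == 0 for c in prem(diff, a)))       # True: a | b^2 + b*u - v
```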

A representation of the elements of the Jacobian by semi-reduced divisors (that is, by pairs of polynomials in K[x]) suffers from two disadvantages. First, the representation is not unique, and second, the degrees of the representing polynomials may be quite large. These difficulties are removed if we consider semi-reduced divisors of a special kind.

Definition 2.90.

A semi-reduced divisor D = ΣP mP(P) – (ΣP mP)(∞) is called a reduced divisor, if ΣP mP ≤ g, where g is the genus of C.

The following theorem establishes the desirable properties of a reduced divisor.

Theorem 2.54.

For every D ∈ Div0(C), there exists a unique reduced divisor D1 equivalent to D.

Proof

We only prove the existence of reduced divisors. For the proof of the uniqueness, one may, for example, see Koblitz [154]. The norm of a semi-reduced divisor D = ΣP mP(P) – (ΣP mP)(∞) is defined as the integer |D| := ΣP mP.

Let D ∈ Div0(C). By Proposition 2.41 there exists a semi-reduced divisor D′ ~ D. One can easily verify that |D′| ≤ |D|. If we already have |D′| ≤ g, then D′ is a desired reduced divisor. So assume otherwise, that is, |D′| ≥ g + 1. We can then choose finite points P1, . . . , Pg+1 on C (not necessarily all distinct) such that (P1) + · · · + (Pg+1) – (g + 1)(∞) is a subsum of the formal sum D′. Let the semi-reduced divisor (P1) + · · · + (Pg+1) – (g + 1)(∞) be represented as Div(a, b) with degx a = g + 1 and degx b ≤ g. But then deg(b(x) – y) = 2g + 1 and b(x) – y has zeros at P1, . . . , Pg+1 by Theorem 2.53. So by Theorem 2.52 we can write the remaining zeros of b(x) – y as Q1, . . . , Qg for some finite points Q1, . . . , Qg on C. Now the divisor D″ := D′ – Div(b(x) – y) satisfies D″ ~ D′ and |D″| < |D′|. We apply Proposition 2.41 again to get a semi-reduced divisor D‴ ~ D″ with |D‴| ≤ |D″|. Thus starting from the semi-reduced divisor D′ we produce another semi-reduced divisor D‴ such that D‴ ~ D′ ~ D and |D‴| < |D′|. We continue the process a finite number of times, until we get an equivalent semi-reduced divisor D1 of norm ≤ g. This is a desired reduced divisor.

From the viewpoint of cryptography, the field K should be a finite field, which is never algebraically closed. So we must remove the restriction that K be algebraically closed. Since C is naturally defined over K̄ as well, we start with the Jacobian of C over K̄ and define a particular subgroup of it to be the Jacobian of C over K.

Definition 2.91.

Let σ be a K-automorphism of K̄. For a point P = (h, k) ∈ C(K̄), the point Pσ := (σ(h), σ(k)) is also in C(K̄). For a divisor D = ΣP mP(P), we define Dσ := ΣP mP(Pσ). D is said to be defined over K, if Dσ = D for every K-automorphism σ of K̄. The subset of the Jacobian of C over K̄ consisting of divisor classes that have representative divisors defined over K is a subgroup of it and is called the Jacobian of C over K.

Every element of the Jacobian of C over K can be represented uniquely as a reduced divisor Div(a, b) for polynomials a(x), b(x) ∈ K[x] with degx a ≤ g and degx b < degx a. For a finite field K, the Jacobian of C over K is, therefore, a finite Abelian group. For suitably chosen hyperelliptic curves, these groups can be used to build cryptographic protocols.

Exercise Set 2.12

In this exercise set, we let C denote a hyperelliptic curve of genus g defined by Equation (2.13) over a field K (not necessarily algebraically closed).

2.118
  1. Show that the curve

    C1 : Y2 = X5 + X + 1

    defined over F7 is not smooth and so not a hyperelliptic curve. Find a point where C1 is not smooth.

  2. Show that the curve

    C2 : Y2 = X5 + X + 2

    defined over F7 is smooth, that is, a hyperelliptic curve of genus 2. Find out all the F7-rational points on C2. (There are ten of them.)
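
A numerical check of both parts. The base field is taken to be F7, an inference consistent with the claimed point count; for part 2, full smoothness additionally needs v and v′ to have no common root in the algebraic closure, which a short calculation rules out here:

```python
# Sketch: singular points of C1 and point count of C2, assuming the base field F_7.
p = 7

def ev(cs, x):
    return sum(c * pow(x, i, p) for i, c in enumerate(cs)) % p

def multiple_roots(v):
    """x in F_p with v(x) = v'(x) = 0, i.e. multiple roots of v found in F_p."""
    dv = [(i * c) % p for i, c in enumerate(v)][1:]
    return [x for x in range(p) if ev(v, x) == 0 and ev(dv, x) == 0]

v1 = [1, 1, 0, 0, 0, 1]                  # X^5 + X + 1
v2 = [2, 1, 0, 0, 0, 1]                  # X^5 + X + 2
print(multiple_roots(v1))                # [4]: C1 is singular at the point (4, 0)
print(multiple_roots(v2))                # []

n = 1 + sum(1 for x in range(p) for y in range(p)
            if (y * y - x**5 - x - 2) % p == 0)   # + the single point at infinity
print(n)                                 # 10 rational points on C2
```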

2.119Represent F8 as F2(ξ), where ξ is a root of the irreducible polynomial T3 + T + 1 ∈ F2[T].
  1. Show that the curve

    C3 : Y2 + XY = X5 + X + 1

    defined over F8 is not smooth and so not a hyperelliptic curve. Find a point where C3 is not smooth.

  2. Show that the curve

    C4 : Y2 + XY = X5 + X + ξ

    defined over F8 is smooth, that is, a hyperelliptic curve of genus 2. Find out all the F8-rational points on C4. (There are eight of them.)

2.120Let P = (h, k) be a finite point on C. Prove the following assertions:
  1. The only points on C with X-coordinate equal to h are P and its opposite P̃.

  2. .

  3. P is a special point if and only if u(h)2 + 4v(h) = 0.

  4. If char K ≠ 2, then C has at most 2g + 1 special points, whereas if char K = 2, then C has at most g special points.

2.121Prove Lemmas 2.9 and 2.10.
2.122Let G = a(x) + yb(x) ∈ K[C], G ≠ 0, and let P = (h, k) be a finite point on C.
  1. Show that G(P) = 0 if and only if .

  2. Let . Show that either P is a special point of C or h is a common root of u and v.

  3. Show that and that .

2.123Prove Theorem 2.52. [H]
2.124A line on C is a polynomial function of the form l = ax + by + c with a, b, c ∈ K, a and b not both 0.
  1. Let D = Div(l) be the divisor of a line l. Show that the norm |D| is either 2 or 2g + 1.

  2. Let h ∈ K. Determine Div(x – h).

  3. Determine Div(y).

2.125Let E be an elliptic curve (that is, a hyperelliptic curve of genus 1) defined over K.
  1. Show that any divisor D ∈ Div0(E) can be written as D = (P) – (∞) + Div(r) for some unique point P ∈ E(K̄) and some rational function r ∈ K̄(E)*. This rational function r is unique up to multiplication by elements of K̄*.

  2. Show that the map σ that maps the residue class of D ∈ Div0(E) to the point P satisfying D = (P) – (∞) + Div(r) for some r ∈ K̄(E)*, is a bijection.

  3. Let P, Q ∈ E(K̄), not both equal to ∞. Show that there is a line l with Div(l) = (P) + (Q) + (R) – 3(∞), where R = –(P + Q).

  4. Let , where σ is defined in Part (b). Show that for P, one has . (This, in particular, proves Theorem 2.46 and that σ is a group isomorphism.)

  5. Let D = ΣP mP(P) ∈ Div(E). Show that D is a principal divisor if and only if ΣP mP = 0 (integer sum) and ΣP mP P = ∞ (sum in E(K̄)).

**2.13. Number Fields

In this section, we develop the theory of number fields and number rings. Our aim is to make accessible to the reader the working of the cryptanalytic algorithms based on the number field sieve.

2.13.1. Some Commutative Algebra

Commutative algebra is the study of commutative rings with identity (rings by our definition). Modern number theory and geometry are based on results from this area of mathematics. Here we give a brief sketch of some commutative algebra tools that we need for developing the theory of number fields.

Ideal arithmetic

We start with some basic operations on ideals (cf. Example 2.7, Definition 2.23).

Definition 2.92.

Let A be a ring and let ai, i ∈ I, be a family (not necessarily finite) of ideals in A.

The set-theoretic intersection ∩i∈I ai is evidently an ideal in A.

The sum of the family is the ideal Σi∈I ai generated by the union of the ai; it consists of all finite sums x1 + · · · + xn with each xj lying in some aij.

Two ideals a and b of A are said to be relatively prime or coprime, if a + b = A, or equivalently if there exist a ∈ a and b ∈ b with a + b = 1.

If I = {1, 2, . . . , n} is finite, the product a1a2 . . . an is the ideal generated by all elements of the form x1x2 . . . xn with xi ∈ ai for all i = 1, . . . , n. We have a1a2 . . . an ⊆ a1 ∩ a2 ∩ · · · ∩ an.

If a1 = a2 = · · · = an = a, the product is denoted as a^n. The empty product of ideals is conventionally taken to be the unit ideal A. If a is the principal ideal 〈a〉, then a^n = 〈a^n〉.

One can readily check that the operations intersection, sum and product on ideals in a ring are associative and commutative.

Commutative algebra extensively uses the theory of prime and maximal ideals (Definition 2.19, Proposition 2.9, Corollary 2.2 and Exercise 2.23). The set of all prime ideals in A is called the (prime) spectrum of A and is denoted by Spec A. The set of all maximal ideals of A is called the maximal spectrum of A and denoted by Spm A. We have Spm A ⊆ Spec A. These two sets play an extremely useful role for the study of the ring A. If A is non-zero, both these sets are non-empty.

Localization

The concept of formation of fractions of integers to give the rationals can be applied in a more general setting. Instead of having any non-zero element in the denominator of a fraction we may allow only elements from a specific subset. All we require to make the collection of fractions a ring is that the allowed denominators should be closed under multiplication.

Definition 2.93.

Let A be a ring. A non-empty subset S of A is called multiplicatively closed or simply multiplicative, if 1 ∈ S and for any s, t ∈ S we have st ∈ S.

Example 2.25.
  1. For a non-zero ring A, the subset A \ {0} is multiplicatively closed, if and only if A is an integral domain. For a general non-zero ring A, the set of all elements a ∈ A such that a is not a zero-divisor is a multiplicative subset of A.

  2. Let A be a ring and a a proper ideal of A. The set S := A \ a is multiplicatively closed, if and only if a is a prime ideal of A.

  3. For a ring A and an element f ∈ A, the set {1, f, f2, f3, . . .} ⊆ A is multiplicatively closed.

Let A be a ring and S a multiplicative subset of A. We define a relation ~ on A × S as: (a, s) ~ (b, t) if and only if u(at – bs) = 0 for some u ∈ S. (If A is an integral domain, one may take u = 1 in the definition of ~.) It is easy to check that ~ is an equivalence relation on A × S. The set of equivalence classes of A × S under ~ is denoted by S–1A, whereas the equivalence class of (a, s) is denoted as a/s. For a/s, b/t ∈ S–1A, define (a/s) + (b/t) := (at + bs)/(st) and (a/s)(b/t) := (ab)/(st). It is easy to check that these operations are well-defined and make S–1A a ring with identity 1/1, in which each s/1, s ∈ S, is invertible. There is a canonical ring homomorphism A → S–1A taking a ↦ a/1. In general, this homomorphism is not injective. However, if A is an integral domain and 0 ∉ S, then the injectivity can be proved easily and we say that the ring A is canonically embedded in the ring S–1A.

Definition 2.94.

Let A be a ring and S a multiplicative subset of A. The ring S–1A constructed as above is called the localization of A away from S or the ring of fractions of A with respect to S.

Example 2.26.
  1. Let A be an integral domain and let S = A \ {0}. Then S–1A is called the quotient field or the field of fractions of A and is denoted as Q(A). If A is already a field, then Q(A) ≅ A. Other examples include Q(ℤ) = ℚ and Q(K[X]) = K(X), K a field, where K(X) denotes the field of rational functions over K in one indeterminate X.

    More generally, if A is any ring and S is the set of all non-zero-divisors of A, then S–1A is called the total quotient ring of A and is again denoted by Q(A). It is, in general, not a field. If A is an integral domain, then S = A \ {0} and the usage of Q(A) remains consistent.

  2. Let A be a ring, p a prime ideal of A and S := A \ p. Then S–1A is called the localization of A at p and is usually denoted by Ap.

  3. Let A be a ring, f ∈ A, and S = {1, f, f2, f3, . . . }. In this case, S–1A is conventionally denoted by Af.
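
A concrete instance of Example 2.26(2): the localization of ℤ at the prime ideal 〈5〉 consists of the rationals whose lowest-terms denominator is not divisible by 5. Python's Fraction keeps fractions in lowest terms, which makes the membership test a one-liner (an illustration, not from the text):

```python
# Sketch: the localization Z_<5> of Z at the prime ideal <5>, inside Q.
from fractions import Fraction

def in_loc(x):
    """Is x (automatically in lowest terms) an element of Z_<5>?"""
    return x.denominator % 5 != 0

a, b = Fraction(3, 4), Fraction(7, 2)
print(a + b, a * b)                      # 17/4 21/8: both still in Z_<5>
print(in_loc(a + b), in_loc(a * b))      # True True: ring operations stay inside
print(in_loc(Fraction(1, 5)))            # False: 1/5 is not in the localization
print(in_loc(1 / Fraction(4, 3)))        # True: 4/3 is a unit in Z_<5>
```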

Integral dependence

The concept of integral dependence generalizes the notion of integers. Recall that for a field extension K ⊆ L, an element α ∈ L is called algebraic over K, if α is a root of a non-zero polynomial f(X) ∈ K[X]. Since K is a field, the polynomial f can be divided by its leading coefficient, giving a monic polynomial in K[X] of which α is a root. However, if K is not a field, division by the leading coefficient is not always permissible. So we require the defining polynomial to be monic in order to define a special class of objects.

Definition 2.95.

Let A ⊆ B be an extension of rings. An element α ∈ B is said to be integral over A, if α satisfies[15] (that is, is a root of) a monic (and hence non-zero) polynomial f(X) ∈ A[X]. An equation of the form f(α) = 0, f ∈ A[X] monic, is called an equation of integral dependence of α over A.

[15] Strictly speaking, α being a root of f(X) is equivalent to α satisfying the polynomial equation f(α) = 0. Often the term equation is dropped in this context—a harmless colloquial contraction.

Example 2.27.
  1. If both A and B are fields, the concepts of integral and algebraic elements are the same. (See the argument preceding Definition 2.95.)

  2. Take A = ℤ and B = ℚ, and let a/b ∈ ℚ, gcd(a, b) = 1, be integral over ℤ. Let (a/b)^n + α_(n–1)(a/b)^(n–1) + · · · + α1(a/b) + α0 = 0, αi ∈ ℤ, be an equation of integral dependence of a/b over ℤ. Multiplication by b^n gives a^n = –b(α_(n–1)a^(n–1) + · · · + α1ab^(n–2) + α0b^(n–1)), that is, b|a^n. Since gcd(a, b) = 1, this forces b = ±1, that is, a/b ∈ ℤ. This is, in general, true for any UFD A and its field of fractions B = Q(A) (See Exercise 2.131).

  3. Every element α ∈ A is integral over A, since it satisfies the monic polynomial X – α ∈ A[X].
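Part 2 of this example can be checked mechanically. The sketch below (Python; the helpers eval_poly, divisors and rational_roots are illustrative, not from any library) enumerates the rational roots of an integer polynomial via the rational root theorem and confirms that for a monic polynomial every rational root is a rational integer:

```python
from fractions import Fraction

def eval_poly(coeffs, x):
    """coeffs[i] is the coefficient of X^i; evaluation by Horner's rule."""
    acc = Fraction(0)
    for c in reversed(coeffs):
        acc = acc * x + c
    return acc

def divisors(n):
    n = abs(n)
    return {d for d in range(1, n + 1) if n % d == 0}

def rational_roots(coeffs):
    """Rational roots of an integer polynomial with non-zero constant term:
    a root a/b in lowest terms must have a | coeffs[0] and b | coeffs[-1]."""
    roots = set()
    for a in divisors(coeffs[0]):
        for b in divisors(coeffs[-1]):
            for r in (Fraction(a, b), Fraction(-a, b)):
                if eval_poly(coeffs, r) == 0:
                    roots.add(r)
    return roots

# X^2 - 3X + 2 is monic: its rational roots 1 and 2 are rational integers.
assert rational_roots([2, -3, 1]) == {1, 2}
assert all(r.denominator == 1 for r in rational_roots([2, -3, 1]))

# 2X - 1 is not monic: it has the non-integral rational root 1/2.
assert rational_roots([-1, 2]) == {Fraction(1, 2)}
```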

Now let A ⊆ B be an extension of rings and let C consist of all the elements of B that are integral over A. Clearly, A ⊆ C ⊆ B. It turns out that C is again a ring. This result is not at all immediate from the definition of integral elements. We prove this by using the following lemma which generalizes Theorem 2.33.

Lemma 2.11.

For a ring extension A ⊆ B and for α ∈ B, the following conditions are equivalent:

  1. α is integral over A.

  2. A[α] is a finitely generated A-module.

  3. A[α] ⊆ C for some subring C of B with C being a finitely generated A-module.

Proof

[(a)⇒(b)] Let αn + an–1αn–1 + · · · + a1α + a0 = 0, ai ∈ A, be an equation of integral dependence of α over A. A[α] is generated as an A-module by 1, α, α2, . . . . In order to show that only the elements 1, α, . . . , αn–1 generate A[α] as an A-module, it is sufficient to show that each αk, k ≥ 0, is an A-linear combination of 1, α, . . . , αn–1. We proceed by induction on k. The assertion certainly holds for k = 0, . . . , n – 1, whereas for k ≥ n we write αk = –(an–1αk–1 + · · · + a1αk–n+1 + a0αk–n), whence induction completes the proof.

[(b)⇒(c)] Take C := A[α].

[(c)⇒(a)] Let x1, . . . , xn generate C as an A-module. Since A[α] ⊆ C and, in particular, αxi ∈ C, for all i = 1, . . . , n we can write αxi = ai1x1 + · · · + ainxn for some aij ∈ A. Let M denote the matrix (αδij – aij)1≤i,j≤n, where δij is the Kronecker delta. Then M(x1, . . . , xn)t = 0. Multiplication (on the left) by the adjoint of M shows that (det M)xi = 0 for all i = 1, . . . , n. Since 1 ∈ C, we have 1 = c1x1 + · · · + cnxn for some ci ∈ A, so that (det M) · 1 = 0, that is, det M = 0. But det M is a monic polynomial in α of degree n and with coefficients from A.
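The rewriting step in [(a)⇒(b)] is easy to mechanize. Assuming α satisfies the monic equation αn + an–1αn–1 + · · · + a0 = 0, the illustrative routine below (our own sketch, not library code) expresses any power αk in the A-module basis 1, α, . . . , αn–1:

```python
def reduce_power(k, dep):
    """Coordinates of α^k in the basis 1, α, ..., α^(n-1), where α satisfies
    the monic equation α^n + dep[n-1]*α^(n-1) + ... + dep[0] = 0 (dep[i] = a_i)."""
    n = len(dep)
    vec = [0] * n
    vec[0] = 1                       # start with α^0 = 1
    for _ in range(k):
        top = vec[-1]                # coefficient that would overflow to α^n
        vec = [0] + vec[:-1]         # multiply the remaining terms by α
        # substitute α^n = -(a_{n-1}α^{n-1} + ... + a_0), as in the proof
        vec = [v - top * a for v, a in zip(vec, dep)]
    return vec

# α = √2 satisfies α^2 - 2 = 0, i.e. dep = [-2, 0].
assert reduce_power(4, [-2, 0]) == [4, 0]    # α^4 = 4
assert reduce_power(5, [-2, 0]) == [0, 4]    # α^5 = 4α
```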

Proposition 2.42.

For an extension A ⊆ B of rings, the set

C := {α ∈ B | α is integral over A}

is a subring of B containing A.

Proof

Clearly, A ⊆ C ⊆ B as sets. To show that C is a ring, let α, β ∈ C. By Condition (b) of Lemma 2.11, A[α] is a finitely generated A-module. Now β, being integral over A, is also integral over A[α]; so again by Lemma 2.11(b), A[α][β] is a finitely generated A[α]-module. It is then easy to check that A[α, β] = A[α][β] is a finitely generated A-module. Since α ± β and αβ are in A[α, β], by Lemma 2.11(c), these elements are integral over A, that is, belong to C. Thus C is a ring.

Definition 2.96.

The ring C of Proposition 2.42 is called the integral closure of A in B. A is called integrally closed in B, if C = A. On the other hand, if C = B, we say that B is an integral extension of A or that B is integral over A.

An integral domain A is called integrally closed (without specific mention of the ring in which it is so), if A is integrally closed in its quotient field Q(A). An integrally closed integral domain is called a normal domain (ND).

Example 2.28.
  1. ℤ (or more generally any UFD) is a normal domain.

  2. ℤ is not integrally closed in ℝ or ℂ, since, for example, √2 is integral over ℤ but √2 ∉ ℤ. The integral closure of ℤ in ℂ is denoted by 𝒪. Elements of 𝒪 are called algebraic integers (See Exercise 2.60).

Noetherian rings

Recall that a PID is a ring (integral domain) in which every ideal is principal, that is, generated by a single element. We now want to be a bit more general and demand every ideal to be finitely generated. If a ring meets our demand, we call it a Noetherian ring. These rings are named after Emmy Noether (1882–1935), one of the most celebrated mathematicians of all time, whose work on such rings was fundamental and deep in the branch of algebra. Emmy’s father Max Noether (1844–1921) was also an eminent mathematician.

Definition 2.97.

Let A be a ring and let 𝔞1 ⊆ 𝔞2 ⊆ 𝔞3 ⊆ · · · be an ascending chain of ideals of A. This chain is called stationary, if there is an m ∈ ℕ such that 𝔞m = 𝔞m+1 = 𝔞m+2 = · · ·. The ring A is said to satisfy the ascending chain condition or the ACC, if every ascending chain of ideals in A is stationary, or in other words, if there does not exist any infinite strictly ascending chain of ideals in A.

Proposition 2.43.

For a ring A, the following conditions are equivalent:

  1. Every ideal of A is finitely generated.

  2. A satisfies the ascending chain condition.

  3. Every non-empty set of ideals of A contains a maximal element.

Proof

[(a)⇒(b)] Let 𝔞1 ⊆ 𝔞2 ⊆ 𝔞3 ⊆ · · · be an ascending chain of ideals of A. Consider the ideal 𝔞 := ∪n∈ℕ 𝔞n, which is finitely generated by hypothesis. Let a1, . . . , ar be a set of generators of 𝔞. Each ai ∈ 𝔞, that is, there exists an mi ∈ ℕ such that ai ∈ 𝔞mi and hence ai ∈ 𝔞n for every n ≥ mi. Take m := max(m1, . . . , mr). For every n ≥ m, we have 𝔞 ⊆ 𝔞n ⊆ 𝔞, that is, 𝔞n = 𝔞.

[(b)⇒(c)] Let S be a non-empty set of ideals of A. Order S by inclusion. The ACC implies that every chain in S has an upper bound in S. By Zorn’s lemma, S has a maximal element.

[(c)⇒(a)] Let 𝔞 be an ideal of A. Consider the set S of all finitely generated ideals of A contained in 𝔞. S is non-empty, since it contains the zero ideal. By condition (c), S has a maximal element, say, 𝔟. If 𝔟 ≠ 𝔞, take a ∈ 𝔞 \ 𝔟. Then 𝔟 + Aa is finitely generated (since 𝔟 is so), properly contains 𝔟 and is contained in 𝔞. This contradicts the maximality of 𝔟 in S. Thus we must have 𝔟 = 𝔞, that is, 𝔞 is finitely generated.

Definition 2.98.

A ring A is called Noetherian, if A satisfies (one and hence all of) the equivalent conditions of Proposition 2.43.

Example 2.29.
  1. All PIDs are Noetherian, since principal ideals are obviously finitely generated. In particular, ℤ and K[X] (K a field) are Noetherian.

  2. If A is Noetherian and 𝔞 an ideal of A, then A/𝔞 is Noetherian, since the ideals of A/𝔞 are in one-to-one inclusion-preserving correspondence with the ideals of A containing 𝔞 and hence satisfy the ACC.

  3. Let A be a Noetherian ring and S a multiplicative subset of A. Then the localization B := S–1A is also Noetherian. To prove this fact, let 𝔟 be an ideal in B. One can show that 𝔟 = S–1𝔞 for some ideal 𝔞 of A. Since A is Noetherian, 𝔞 is finitely generated, say, 𝔞 = 〈a1, . . . , ar〉. It is now (almost) obvious that 𝔟 is generated by a1/1, . . . , ar/1. A particular case: If A is Noetherian and 𝔭 a prime ideal of A, then the localization A𝔭 is also Noetherian.

  4. The ring A of polynomials with infinitely many indeterminates X1, X2, X3, . . . (over, say, a field K) is not Noetherian. This is because the ideal

    〈X1, X2, X3, . . .〉 = AX1 + AX2 + AX3 + · · ·

    is not finitely generated, or alternatively because we have the infinite strictly ascending chain of ideals: 〈X1〉 ⊊ 〈X1, X2〉 ⊊ 〈X1, X2, X3〉 ⊊ · · ·, or because the set S := {〈X1〉, 〈X1, X2〉, 〈X1, X2, X3〉, . . .} of ideals in A does not contain a maximal element.

We have seen that if A is a PID, the polynomial ring A[X] need not be a PID. However, the property of being Noetherian is preserved during the passage from A to A[X] (Theorem 2.8).

Dedekind domains

A class of rings proves to be vital in the study of number fields:

Definition 2.99.

An integral domain A is called a Dedekind domain, if it satisfies all of the following three conditions:

  1. A is Noetherian.

  2. Every non-zero prime ideal of A is maximal.

  3. A is integrally closed (in its quotient field K := Q(A)).

2.13.2. Number Fields and Rings

After much ado we are finally in a position to define the basic objects of study in this section.

Definition 2.100.

A number field K is defined to be a finite (and hence algebraic) extension of the field ℚ of rational numbers. Clearly, ℚ ⊆ K. The extension degree [K : ℚ] is called the degree of the number field K and is finite by definition.

Note that there is considerable controversy among mathematicians about this definition of number fields. Some insist that any field K satisfying ℚ ⊆ K ⊆ ℂ should be called a number field. Some others restrict the definition by demanding that K must be algebraic over ℚ; however, fields K of infinite extension degree over ℚ are allowed. We restrict the definition further by imposing the condition that [K : ℚ] has to be finite. Our restricted definition is seemingly the most widely accepted one. In this book, we study only the number fields of Definition 2.100, and accepting this definition at the minimum saves us from writing huge expressions like “(algebraic) number fields of finite extension degree over ℚ” to denote number fields.

For number fields, the notion of integral closure leads to the following definition.

Definition 2.101.

A number field K contains ℚ and hence ℤ. The integral closure of ℤ in K is called the ring of integers of K and is denoted by 𝒪K. (𝒪 is the Gothic O.) Clearly, ℤ ⊆ 𝒪K ⊆ K and 𝒪K is an integral domain. We also have 𝒪K = K ∩ 𝒪, where 𝒪 is the subset of ℂ comprising all algebraic integers. A number ring is a ring which is (isomorphic to) the ring of integers of a number field.

By Example 2.27(2), the ring of integers of the number field ℚ is ℤ, that is, 𝒪ℚ = ℤ. It is, therefore, customary to call the elements of ℤ rational integers. Since ℚ is naturally embedded in K for any number field K, it is important to notice the distinction between the integers of K (that is, the elements of 𝒪K) and the rational integers of K (that is, the images of the elements of ℤ under the canonical inclusion ℚ ↪ K).

Some simple properties of number rings are listed below.

Proposition 2.44.

For a number field K, we have:

  1. 𝒪K ∩ ℚ = ℤ.

  2. For every α ∈ K, there exists a non-zero rational integer r such that rα ∈ 𝒪K. In particular, the quotient field of 𝒪K is K.

  3. 𝒪K is integrally closed in K, that is, 𝒪K is a normal domain.

Proof

(1) follows immediately from Example 2.27(2), (2) follows from Exercise 2.60, and (3) follows from Exercise 2.126(b).

Let K be a number field of degree d. By Corollary 2.13, K is a simple extension of ℚ, that is, there exists an element α ∈ K with a minimal polynomial f(X) over ℚ such that deg f = d and K = ℚ(α). The field K is a ℚ-vector space of dimension d with basis 1, α, . . . , αd–1. There exists a non-zero integer a such that aα is an algebraic integer, and we continue to have K = ℚ(aα). Thus, without loss of generality, we may take α to be an algebraic integer. In this case, the ℚ-basis 1, α, . . . , αd–1 of K consists only of algebraic integers.

Conversely, let f(X) ∈ ℚ[X] be an irreducible polynomial of degree d ≥ 1. The field K := ℚ[X]/〈f(X)〉 is a number field of degree d, and the elements of K can be represented by polynomials with rational coefficients and of degrees < d. Arithmetic in K is carried out as the polynomial arithmetic of ℚ[X] followed by reduction modulo the defining irreducible polynomial f(X). This gives us an algebraic representation of K independent of any embedding of K in ℂ. Now, K can also be viewed as a subfield of ℂ and the elements of K can be represented as complex numbers.[16] A representation of K by a subfield K′ of ℂ together with a field isomorphism σ : K → K′ is called a complex embedding of K in ℂ.[17] Such a representation is not unique, as Proposition 2.45 demonstrates.

[16] A complex number has a representation by a pair (a, b) of real numbers. Here, the imaginary unit i plays the role of X + 〈X2 + 1〉 in ℝ[X]/〈X2 + 1〉. Finally, every real number has a decimal (or binary or hexadecimal or . . .) representation.

[17] The field ℚ is canonically embedded in K. It is evident that the embedding σ : K → K′ fixes ℚ element-wise.

Proposition 2.45.

A number field K of degree d ≥ 1 has exactly d distinct complex embeddings.

Proof

As above, we take K = ℚ[X]/〈f(X)〉 for some irreducible polynomial f(X) ∈ ℚ[X] of degree d. Since ℚ is a perfect field (See Exercise 2.76), the d roots α1, . . . , αd ∈ ℂ of f(X) are all distinct. For each i = 1, . . . , d, the map sending X + 〈f(X)〉 ↦ αi clearly extends to a field isomorphism σi : K → ℚ(αi). Thus we get d distinct complex embeddings of K in ℂ. Now let K′ be a subfield of ℂ, such that σ : K → K′ is a ℚ-isomorphism. Let α := σ(X + 〈f(X)〉). Then 0 = σ(0) = σ(f(X + 〈f(X)〉)) = f(σ(X + 〈f(X)〉)) = f(α). Thus α is a root of f, that is, α = αi for some i ∈ {1, . . . , d}. Since K′ is a field containing ℚ and αi and satisfying [K′ : ℚ] = d, it follows that K′ = ℚ(αi) and σ = σi.

This proposition says that the conjugates α1, . . . , αd are algebraically indistinguishable. For example, X2 + 1 has two roots ±i, where i = √–1. But it makes little sense to talk about the positive and the negative square roots of –1. They are algebraically indistinguishable, and if one calls one of these i, the other one becomes –i.[18] However, if a representation of ℂ is given, we can distinguish between √–5 and –√–5 by associating these quantities with the elements ir and –ir respectively, where r is the positive real square root of 5 and where i is the imaginary unit available from the given representation of ℂ.

[18] In a number theory seminar in 1996, Hendrik W. Lenstra, Jr. commented:

Suppose the Martians defined the complex numbers by adjoining a root of –1 they called j. And when the Earth and Martians start talking, they have to translate i to be either j or –j. So we take i to j, because I think that’s what the scientists will decide. ··· But it was later discovered that most Martians are left handed, so the philosophers decide it’s better to send i to –j instead.

It is also quite customary to start with K = ℚ(α) for some algebraic α ∈ ℂ and seek the complex embeddings of K in ℂ. One then considers the minimal polynomial f(X) of α (over ℚ) and proceeds as in the proof of Proposition 2.45, but now defining the map σi : K → ℚ(αi) as the unique field isomorphism that fixes ℚ and takes α ↦ αi. If we take α = α1, then σ1 is the identity map, whereas σ2, . . . , σd are non-identity field isomorphisms.

The moral of this story is that whether one wants to view the number field K as ℚ[X]/〈f(X)〉 or as ℚ(α) for any root α ∈ ℂ of f(X) is one’s personal choice. In any case, one will be dealing with the same mathematical object, and as long as representation issues are not brought into the scene, all these definitions of a number field are absolutely equivalent.

The embeddings need not be all distinct as sets. For example, the images of the two embeddings of ℚ[X]/〈X2 + 1〉 are both equal to ℚ(i) = ℚ(–i), so these embeddings are identical as sets. But the maps x ↦ i and x ↦ –i are distinct (where x := X + 〈X2 + 1〉). Thus while specifying a complex embedding of a number field K, it is necessary to mention not only the subfield K′ of ℂ isomorphic to K, but also the explicit field isomorphism K → K′.

Definition 2.102.

Let K be a number field of degree d defined by an irreducible polynomial f(X) ∈ ℚ[X] or by any root of f(X). Let r1 be the number of real roots and 2r2 the number of non-real roots of f. (Note that the non-real roots of a polynomial with rational coefficients occur in (complex) conjugate pairs.) By the fundamental theorem of algebra, we have d = r1 + 2r2. For any real root α of f, the image ℚ(α) of the corresponding complex embedding of K is completely contained in ℝ, and hence this embedding is often called a real embedding of K. On the other hand, for a non-real root β of f, the complex embedding of K with image ℚ(β) is called a non-real or a properly complex embedding of K. The pair (r1, r2) is called the signature of the number field K. K has r1 real embeddings and 2r2 properly complex embeddings. If r2 = 0, that is, if all embeddings of K are real, one calls K a totally real number field. On the other hand, if r1 = 0, that is, if all embeddings of K are properly complex, then K is called a totally complex number field.

Example 2.30.
  1. The number field ℚ(√2) is totally real and has the signature (2, 0). (The roots of X2 – 2 are ±√2.)

  2. The number field ℚ(√–2) is totally complex and has the signature (0, 1). (The roots of X2 + 2 are ±√–2 = ±i√2.)

  3. The number field K = ℚ(∛2) is neither totally real nor totally complex. The roots of X3 – 2 are ∛2, ω∛2 and ω²∛2, where ω is a primitive cube root of unity; the first root is real and the other two are not. The signature of K is (1, 1), that is, K has one real embedding and two properly complex embeddings.
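The signatures in this example can be verified numerically. The sketch below (Python; the helper signature is illustrative) counts real and properly complex roots among the stated roots of each defining polynomial:

```python
import cmath

def signature(roots, tol=1e-9):
    """(r1, r2): r1 real roots, 2*r2 properly complex roots (d = r1 + 2*r2)."""
    r1 = sum(1 for a in roots if abs(a.imag) < tol)
    return r1, (len(roots) - r1) // 2

sqrt2 = 2 ** 0.5
cbrt2 = 2 ** (1 / 3)
omega = cmath.exp(2j * cmath.pi / 3)     # primitive cube root of unity

# Roots of X^2 - 2, X^2 + 2 and X^3 - 2, as given in the example.
assert signature([complex(sqrt2), complex(-sqrt2)]) == (2, 0)
assert signature([1j * sqrt2, -1j * sqrt2]) == (0, 1)
roots3 = [complex(cbrt2), cbrt2 * omega, cbrt2 * omega ** 2]
assert all(abs(a ** 3 - 2) < 1e-9 for a in roots3)   # they do solve X^3 = 2
assert signature(roots3) == (1, 1)
```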

The simplest examples of number fields are the quadratic number fields, that is, number fields of degree 2. Some special properties of quadratic number fields are covered in the exercises. It follows from Exercise 2.136 that every quadratic number field is of the form ℚ(√D) for some non-zero square-free integer D ≠ 1.

Now we investigate the ℤ-module structure of 𝒪K for a number field K of degree d. Let σ1, . . . , σd be the complex embeddings of K.

Definition 2.103.

For an element α ∈ K, we define the trace of α (over ℚ) as

Equation 2.15

Tr(α) := σ1(α) + σ2(α) + · · · + σd(α)

and the norm of α (over ℚ) as

N(α) := σ1(α)σ2(α) · · · σd(α).
If g(X) is the minimal polynomial of α over ℚ and r := deg g, then r|d. Moreover, σ1(α), . . . , σd(α) are precisely the roots of g, each repeated d/r times. So Tr(α) and N(α) belong to ℚ. If α is an algebraic integer, then g(X) ∈ ℤ[X], that is, Tr(α), N(α) ∈ ℤ.

The following properties of the norm and trace functions can be readily verified. Here α, β ∈ K and c ∈ ℚ.

Tr(α + β) = Tr(α) + Tr(β),
N(αβ) = N(α)N(β),
Tr(cα) = c Tr(α),
N(cα) = c^d N(α),
Tr(c) = dc,
N(c) = c^d.
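These identities are easy to verify computationally in a quadratic field K = ℚ(√D), where the two embeddings send a + b√D to a ± b√D, so that Tr(a + b√D) = 2a and N(a + b√D) = a2 – Db2. A minimal sketch (the class Quad is ours, purely illustrative):

```python
from fractions import Fraction as F

class Quad:
    """An element a + b*sqrt(D) of Q(sqrt(D)), D a fixed non-square integer."""
    def __init__(self, a, b, D):
        self.a, self.b, self.D = F(a), F(b), D

    def __add__(self, other):
        return Quad(self.a + other.a, self.b + other.b, self.D)

    def __mul__(self, other):
        # (a + b√D)(a' + b'√D) = (aa' + bb'D) + (ab' + a'b)√D
        return Quad(self.a * other.a + self.b * other.b * self.D,
                    self.a * other.b + self.b * other.a, self.D)

    def trace(self):
        # sigma1(α) + sigma2(α) = (a + b√D) + (a - b√D) = 2a
        return 2 * self.a

    def norm(self):
        # sigma1(α) * sigma2(α) = a^2 - D b^2
        return self.a ** 2 - self.D * self.b ** 2

x = Quad(1, 2, 5)          # 1 + 2√5
y = Quad(3, -1, 5)         # 3 - √5

assert (x + y).trace() == x.trace() + y.trace()   # Tr is additive
assert (x * y).norm() == x.norm() * y.norm()      # N is multiplicative
assert Quad(7, 0, 5).trace() == 2 * 7             # Tr(c) = dc with d = 2
assert Quad(7, 0, 5).norm() == 7 ** 2             # N(c) = c^d
```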

Definition 2.104.

Let β1, . . . , βd ∈ K. We call the determinant of the matrix (Tr(βiβj))1≤i,j≤d, whose ij-th entry is equal to Tr(βiβj), the discriminant Δ(β1, . . . , βd) of β1, . . . , βd. Since each Tr(βiβj) ∈ ℚ, it follows that Δ(β1, . . . , βd) ∈ ℚ. Moreover, if β1, . . . , βd are all algebraic integers, then Δ(β1, . . . , βd) ∈ ℤ.

Proposition 2.46.

Δ(β1, . . . , βd) = (det(σj(βi)))2.

Proof

Consider the matrices D := (Tr(βiβj)) and E := (σj(βi)). By definition, we have Δ(β1, . . . , βd) = det D. We show that D = EEt, which implies that det D = (det E)2. The ij-th entry of EEt is

σ1(βi)σ1(βj) + · · · + σd(βi)σd(βj) = σ1(βiβj) + · · · + σd(βiβj) = Tr(βiβj),

where the first equality holds because each σk is a ring homomorphism and the last equality follows from Equation (2.15).

Let K = ℚ(α) for some α ∈ ℂ and let f(X) be the minimal polynomial of α over ℚ. We define the discriminant of f as

Δ(f) := Δ(1, α, α2, ..., αd–1).

We have to show that the quantity Δ(f) is well-defined, that is, independent of the choice of the root α of f(X). Let α = α1, α2, . . . , αd be all the roots of f(X) and let the complex embedding σj of K map α to αj. By Proposition 2.46, we have Δ(f) = (det E)2, where E is the d × d Vandermonde matrix whose ij-th entry is σj(α^(i–1)) = αj^(i–1). Computing the determinant of E gives det E = ∏1≤i<j≤d (αj – αi), which implies that Δ(f) = ∏1≤i<j≤d (αj – αi)2 is independent of the permutations of the conjugates α1, . . . , αd of α. Notice that since α1, . . . , αd are all distinct, Δ(f) ≠ 0.

Let us deduce a useful formula for Δ(f). Write f(X) = (X – α1)(X – α2) · · · (X – αd) and take the formal derivative to get f′(X) = ∑i=1,...,d ∏j≠i (X – αj), that is, f′(αi) = ∏j≠i (αi – αj). Therefore, N(f′(α)) = ∏i f′(αi) = ∏i≠j (αi – αj) = (–1)^(d(d–1)/2) ∏i<j (αi – αj)2 = (–1)^(d(d–1)/2) Δ(f), that is,

Equation 2.16

Δ(f) = (–1)^(d(d–1)/2) N(f′(α)).


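Equation (2.16), Δ(f) = (–1)^(d(d–1)/2) N(f′(α)), can be checked numerically against the Vandermonde form Δ(f) = ∏i<j (αi – αj)2. A Python sketch (the helper names are ours) for f(X) = X3 – 2, whose discriminant is –108:

```python
import cmath

def prod(xs):
    out = 1
    for x in xs:
        out *= x
    return out

def disc_vandermonde(roots):
    """Δ(f) = Π_{i<j} (α_i - α_j)^2  (square of the Vandermonde determinant)."""
    d = len(roots)
    return prod((roots[i] - roots[j]) ** 2
                for i in range(d) for j in range(i + 1, d))

def disc_via_norm(roots):
    """Δ(f) = (-1)^(d(d-1)/2) N(f'(α)), with N(f'(α)) = Π_i Π_{j≠i} (α_i - α_j)."""
    d = len(roots)
    nfp = prod(prod(roots[i] - roots[j] for j in range(d) if j != i)
               for i in range(d))
    return (-1) ** (d * (d - 1) // 2) * nfp

cbrt2 = 2 ** (1 / 3)
omega = cmath.exp(2j * cmath.pi / 3)
roots = [complex(cbrt2), cbrt2 * omega, cbrt2 * omega ** 2]   # roots of X^3 - 2

assert abs(disc_vandermonde(roots) - disc_via_norm(roots)) < 1e-6
assert abs(disc_vandermonde(roots) - (-108)) < 1e-6           # Δ(X^3 - 2) = -108
```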
For arbitrary elements β1, . . . , βd ∈ K, the discriminant Δ(β1, . . . , βd) discriminates between the cases that β1, . . . , βd form a ℚ-basis of K and that they do not.

Lemma 2.12.

Let β1, . . . , βd, γ1, . . . , γd ∈ K satisfy γi = ti1β1 + · · · + tidβd for i = 1, . . . , d and for some tij ∈ ℚ. Then Δ(γ1, . . . , γd) = (det T)2Δ(β1, . . . , βd), where T = (tij)1≤i,j≤d.

Proof

Let E1 := (σj(βi)) and E2 := (σj(γi)). Now

σj(γi) = σj(ti1β1 + · · · + tidβd) = ti1σj(β1) + · · · + tidσj(βd)

is the ij-th entry of the matrix T E1, that is, E2 = T E1. Hence

Δ(γ1, . . . , γd) = (det E2)2 = (det T)2(det E1)2 = (det T)2Δ(β1, . . . , βd).

Corollary 2.19.

Let (β1, . . . , βd) and (γ1, . . . , γd) be two ℚ-bases of K. Let Δ := Δ(β1, . . . , βd) and Δ′ := Δ(γ1, . . . , γd). Then Δ′ = (det T)2Δ, where T is the change-of-basis matrix from (β1, . . . , βd) to (γ1, . . . , γd).

Corollary 2.20.

Elements β1, . . . , βd ∈ K form a ℚ-basis of K, if and only if Δ(β1, . . . , βd) ≠ 0.

Proof

Let K = ℚ(α) and write γi := αi–1 for i = 1, . . . , d. Since (γ1, . . . , γd) is a ℚ-basis of K, each βi can be written (uniquely) as βi = ti1γ1 + · · · + tidγd with tij ∈ ℚ. By Lemma 2.12, Δ(β1, . . . , βd) = (det T)2Δ(γ1, . . . , γd), where T := (tij). We have seen that Δ(γ1, . . . , γd) = Δ(f) ≠ 0. Therefore, Δ(β1, . . . , βd) ≠ 0 if and only if det T ≠ 0, that is, if and only if (β1, . . . , βd) is a ℚ-basis of K.

Finally comes the desired characterization of 𝒪K.

Theorem 2.55.

For a number field K of degree d, the ring 𝒪K is a free ℤ-module of rank d.

Proof

Let β1, . . . , βd form a ℚ-basis of K. We know that for some non-zero r1, . . . , rd ∈ ℤ the elements r1β1, . . . , rdβd are in 𝒪K and continue to constitute a ℚ-basis of K. So we may assume that the elements β1, . . . , βd are already in 𝒪K. Consider the set S of all ℚ-bases (β1, . . . , βd) of K consisting of elements from 𝒪K only. By Definition 2.104 and Corollary 2.20, Δ(β1, . . . , βd) is a non-zero integer for every (β1, . . . , βd) ∈ S. Choose (β1, . . . , βd) ∈ S such that |Δ(β1, . . . , βd)| is minimal in S.

Claim: β1, . . . , βd are linearly independent over ℤ.

(β1, . . . , βd) is a ℚ-basis of K, that is, β1, . . . , βd are linearly independent over ℚ and so trivially over ℤ too.

Claim: β1, . . . , βd generate 𝒪K as a ℤ-module.

Assume not, that is, there exists α ∈ 𝒪K such that α = a1β1 + · · · + adβd with a1, . . . , ad ∈ ℚ, not all in ℤ. Without loss of generality, we may assume that a1 ∉ ℤ and write a1 = a + r with a ∈ ℤ and 0 < r < 1. Define γ1 := α – aβ1 = rβ1 + a2β2 + · · · + adβd, γ2 := β2, . . . , γd := βd. Clearly, γ1, . . . , γd ∈ 𝒪K. Furthermore, the matrix T expressing (γ1, . . . , γd) in terms of (β1, . . . , βd) has the first row (r, a2, . . . , ad) and agrees with the identity matrix in the remaining rows, so that det T = r. By Lemma 2.12, we have

Δ(γ1, . . . , γd) = (det T)2Δ(β1, . . . , βd) = r2Δ(β1, . . . , βd).

Since r ≠ 0, we have Δ(γ1, . . . , γd) ≠ 0, that is, (γ1, . . . , γd) is again a ℚ-basis of K (Corollary 2.20), that is, (γ1, . . . , γd) ∈ S. Finally, since r < 1, we have |Δ(γ1, . . . , γd)| < |Δ(β1, . . . , βd)|, a contradiction to the choice of (β1, . . . , βd). Thus every α ∈ 𝒪K has to be a ℤ-linear combination of β1, . . . , βd. This completes the proof of the second claim and also of the theorem.

Definition 2.105.

Any ℤ-basis of 𝒪K is called an integral basis of K (or of 𝒪K).

Corollary 2.21.

Every integral basis of K has the same discriminant (for a given K).

Proof

Let (β1, . . . , βd) and (γ1, . . . , γd) be two integral bases of K. Let T be the (β1, . . . , βd)-to-(γ1, . . . , γd) change-of-basis matrix. (β1, . . . , βd) being an integral basis of K and each γi lying in 𝒪K, all the entries of T are integers. Also from Corollary 2.19 we have Δ(γ1, . . . , γd) = (det T)2Δ(β1, . . . , βd), and hence Δ(β1, . . . , βd) divides and has the same sign as Δ(γ1, . . . , γd). One can analogously show that Δ(γ1, . . . , γd) divides and has the same sign as Δ(β1, . . . , βd). Therefore, Δ(β1, . . . , βd) = Δ(γ1, . . . , γd).

Definition 2.106.

Let (β1, . . . , βd) be an integral basis of a number field K. The discriminant of K is defined to be the integer ΔK := Δ(β1, . . . , βd). By Corollary 2.21, ΔK is well-defined, that is, independent of the choice of the integral basis of K.

Recall that K, as a vector space over ℚ, always possesses a ℚ-basis of the form 1, α, . . . , αd–1. The ring 𝒪K, as a ℤ-module, is free of rank d, but a number field K need not possess an integral basis of the form 1, α, . . . , αd–1. Whenever it does, 𝒪K is called monogenic, and an integral basis 1, α, . . . , αd–1 of K is called a power integral basis. Clearly, if K has a power integral basis 1, α, . . . , αd–1, then 𝒪K = ℤ[α]. But the converse is not true, that is, for α ∈ 𝒪K with K = ℚ(α), the elements 1, α, . . . , αd–1 need not form an integral basis of K, even when 𝒪K is monogenic.

Example 2.31.

Consider the quadratic number field K = ℚ(√D) for some square-free integer D ≠ 0, 1. We consider the two cases (See Exercise 2.136):

Case 1: D ≡ 2, 3 (mod 4)

Here 𝒪K = ℤ[√D], that is, 1, √D is a power integral basis of K. The minimal polynomial of √D is X2 – D and the conjugates of √D are ±√D. Therefore, by Equation (2.16), we have

ΔK = –N(2√D) = –(2√D)(–2√D) = 4D.

Case 2: D ≡ 1 (mod 4)

In this case, 𝒪K = ℤ[(1 + √D)/2], that is, 1, (1 + √D)/2 is a power integral basis of K. The minimal polynomial of (1 + √D)/2 is X2 – X + (1 – D)/4 and the conjugates of (1 + √D)/2 are (1 ± √D)/2. Therefore, Equation (2.16) gives

ΔK = –N(2 · (1 + √D)/2 – 1) = –N(√D) = –(√D)(–√D) = D.
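Example 2.31 can be packaged into a small routine: ΔK = D for D ≡ 1 (mod 4) and ΔK = 4D for D ≡ 2, 3 (mod 4). The sketch below (Python; the function name and the basis labels are ours, purely illustrative):

```python
def squarefree(n):
    n = abs(n)
    return all(n % (d * d) for d in range(2, int(n ** 0.5) + 1))

def quadratic_field_data(D):
    """Integral basis and discriminant of K = Q(sqrt(D)) for a
    square-free integer D != 0, 1 (Example 2.31)."""
    assert D not in (0, 1) and squarefree(D)
    if D % 4 == 1:                           # D ≡ 1 (mod 4)
        return ("1, (1+sqrt(D))/2", D)       # O_K = Z[(1+√D)/2], Δ_K = D
    else:                                    # D ≡ 2, 3 (mod 4)
        return ("1, sqrt(D)", 4 * D)         # O_K = Z[√D],       Δ_K = 4D

assert quadratic_field_data(5) == ("1, (1+sqrt(D))/2", 5)     # Q(√5):  Δ_K = 5
assert quadratic_field_data(-1) == ("1, sqrt(D)", -4)         # Q(i):   Δ_K = -4
assert quadratic_field_data(2) == ("1, sqrt(D)", 8)           # Q(√2):  Δ_K = 8
```

Note that Python's `%` reduces negative D to a representative in {0, 1, 2, 3}, which matches the mathematical congruence (for example, –3 ≡ 1 (mod 4)).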

2.13.3. Unique Factorization of Ideals

Ideals in a number ring possess very rich structures. We prove that number rings are Dedekind domains (Definition 2.99). A Dedekind domain (henceforth abbreviated as DD) need not be a UFD (or a PID). However, it is a ring in which ideals admit unique factorizations into products of prime ideals.

Let K be a number field of degree d and 𝒪K its ring of integers. If φ : A → B is a homomorphism of rings and if 𝔮 is a prime ideal of B, then the contraction φ–1(𝔮) is a prime ideal of A. We say that 𝔮 lies above or over φ–1(𝔮). If A ⊆ B and φ is the inclusion homomorphism, then φ–1(𝔮) = 𝔮 ∩ A. For a number field K, we consider the natural inclusion ℤ ⊆ 𝒪K.

Lemma 2.13.

Let 𝔭 be a non-zero prime ideal of 𝒪K. Then 𝔭 lies above a unique non-zero prime ideal 𝔭 ∩ ℤ of ℤ. In particular, 𝔭 contains a (unique) rational prime.

Proof

Let 𝔭0 := 𝔭 ∩ ℤ. If 𝔭0 = 0, then both 𝔭 and 0 are prime ideals of 𝒪K that lie over the zero ideal of ℤ. Since 𝔭 ≠ 0, we have 𝔭 ⊋ 0, a contradiction to Exercise 2.128(c). Therefore, 𝔭0 = pℤ for a (unique) rational prime p.

Proposition 2.47.

The ring 𝒪K is Noetherian.

Proof

Let α1, . . . , αd constitute an integral basis of K, that is, 𝒪K = ℤα1 + · · · + ℤαd = ℤ[α1, . . . , αd], that is, the ring homomorphism ℤ[X1, . . . , Xd] → 𝒪K mapping f(X1, . . . , Xd) ↦ f(α1, . . . , αd) is surjective. By Hilbert’s basis theorem (Theorem 2.8), the polynomial ring ℤ[X1, . . . , Xd] is Noetherian and so 𝒪K, being a quotient of a Noetherian ring (by the isomorphism theorem), is Noetherian too (Example 2.29).

Theorem 2.56.

The ring 𝒪K of integers of a number field K is a Dedekind domain.

Proof

We have proved that 𝒪K is Noetherian (Proposition 2.47) and integrally closed (Proposition 2.44). It then suffices to show that each non-zero prime ideal 𝔭 of 𝒪K is maximal. By Lemma 2.13, 𝔭 lies over a non-zero prime ideal pℤ of ℤ. But pℤ is maximal in ℤ. Exercise 2.128(b) now completes the proof.

Now we derive the unique factorization theorem for ideals in a DD. It is going to be a long story. We refer the reader to Definition 2.92 to recall how the product of two ideals is defined.

Lemma 2.14.

Let A be a ring, 𝔞1, . . . , 𝔞r ideals of A, and 𝔭 a prime ideal of A such that 𝔭 ⊇ 𝔞1 · · · 𝔞r. Then 𝔭 ⊇ 𝔞i for some i ∈ {1, . . . , r}. In particular, if A is a DD and 𝔭, 𝔭1, . . . , 𝔭r are non-zero prime ideals with 𝔭 ⊇ 𝔭1 · · · 𝔭r, then 𝔭 = 𝔭i for some i ∈ {1, . . . , r}.

Proof

The proof is obvious for r = 1. So assume that r > 1. If 𝔭 ⊉ 𝔞i for all i = 1, . . . , r, then for each i we can choose an ai ∈ 𝔞i \ 𝔭 and see that a1 · · · ar ∈ 𝔞1 · · · 𝔞r ⊆ 𝔭, although no factor ai belongs to 𝔭, a contradiction to the fact that 𝔭 is prime. The last statement of the lemma follows from the fact that in a DD every non-zero prime ideal is maximal.

We now generalize the concept of ideals.

Definition 2.107.

Let A be an integral domain and K := Q(A). An A-submodule 𝔞 of K is called a fractional ideal of A, if b𝔞 ⊆ A for some non-zero b ∈ A.

Every ideal of A is evidently a fractional ideal of A and hence is often called an integral ideal of A. Conversely, every fractional ideal of A contained in A is an integral ideal of A. The principal fractional ideal Ax is the A-submodule of K generated by an element x ∈ K. If A is a Noetherian domain, we have the following equivalent characterization of fractional ideals.

Lemma 2.15.

Let A be a Noetherian integral domain, K := Q(A) and 𝔞 an A-submodule of K. Then 𝔞 is a fractional ideal of A, if and only if 𝔞 is a finitely generated A-submodule of K.

Proof

[if] Let 𝔞 = Ax1 + · · · + Axn, where xi = ai/bi, ai, bi ∈ A, bi ≠ 0. Then b𝔞 ⊆ A for b := b1b2 · · · bn ≠ 0.

[only if] Let b ∈ A, b ≠ 0, be such that b𝔞 ⊆ A. Now b𝔞 is an (integral) ideal of A (easy check) and is finitely generated, since A is Noetherian. Let b𝔞 = 〈a1, . . . , an〉, ai ∈ A. Then 𝔞 = Ax1 + · · · + Axn, where xi := ai/b.

We define the product of two fractional ideals 𝔞, 𝔟 of an integral domain A as we did for integral ideals:

𝔞𝔟 := {a1b1 + · · · + arbr | r ∈ ℕ, a1, . . . , ar ∈ 𝔞, b1, . . . , br ∈ 𝔟}.

It is easy to check that 𝔞𝔟 is again a fractional ideal of A. The product of fractional ideals defines a commutative and associative binary operation on the set of non-zero fractional ideals of A, and the ideal A acts as a (multiplicative) identity. A fractional ideal 𝔞 of A is called invertible, if 𝔞𝔟 = A for some fractional ideal 𝔟 of A. We deduce shortly that if A is a DD, then every non-zero fractional ideal of A is invertible and, therefore, the set of non-zero fractional ideals of A is a group under multiplication of fractional ideals.

Lemma 2.16.

Let A be a Noetherian domain and 𝔞 an (integral) ideal of A. For some r ∈ ℕ, there exist prime ideals 𝔭1, . . . , 𝔭r of A, each containing 𝔞, such that 𝔭1 · · · 𝔭r ⊆ 𝔞.

Proof

Let S be the set of ideals of A for which the lemma does not hold. Assume that S ≠ ∅. Since A is Noetherian, S contains a maximal element, say 𝔞. Clearly, 𝔞 is a proper non-prime ideal of A, that is, for some a, b ∈ A \ 𝔞, we have ab ∈ 𝔞. The ideals 𝔞 + Aa and 𝔞 + Ab strictly contain 𝔞 and, therefore, by the maximality of 𝔞, are not in S, that is, there exist prime ideals 𝔭1, . . . , 𝔭r, each containing 𝔞 + Aa (and hence 𝔞), such that 𝔭1 · · · 𝔭r ⊆ 𝔞 + Aa, and prime ideals 𝔮1, . . . , 𝔮s, each containing 𝔞 + Ab (and hence 𝔞), such that 𝔮1 · · · 𝔮s ⊆ 𝔞 + Ab. Moreover, (𝔞 + Aa)(𝔞 + Ab) ⊆ 𝔞, since ab ∈ 𝔞, so that 𝔭1 · · · 𝔭r𝔮1 · · · 𝔮s ⊆ 𝔞, a contradiction. Thus S must be empty.

Note that the condition “each containing 𝔞” was necessary in Lemma 2.16 in order to rule out the trivial possibility that 𝔭i = 0 for some i (recall that in an integral domain the zero ideal is prime).

Lemma 2.17.

Let A be a DD, K := Q(A) and 𝔭 a non-zero prime ideal of A. Define the set

𝔭–1 := {x ∈ K | x𝔭 ⊆ A}.

Then we have:

  1. 𝔭–1 is a fractional ideal of A.

  2. A ⊊ 𝔭–1.

  3. 𝔭𝔭–1 = A. In particular, every non-zero prime ideal in a DD is invertible.

Proof

  1. Clearly, 𝔭–1 is an A-submodule of K, and for any non-zero b ∈ 𝔭, we have b𝔭–1 ⊆ A.

  2. Since 𝔭 ⊆ A, we have A ⊆ 𝔭–1. In order to prove the strict inclusion, we take any non-zero a ∈ 𝔭 and consider the ideal Aa. By Lemma 2.16, there exist prime ideals 𝔭1, . . . , 𝔭r, each containing Aa (and hence non-zero), such that 𝔭1 · · · 𝔭r ⊆ Aa. We choose r to be minimal, so that Aa does not contain the product of any r – 1 of 𝔭1, . . . , 𝔭r. Now 𝔭 ⊇ Aa ⊇ 𝔭1 · · · 𝔭r, and hence by Lemma 2.14, 𝔭 = 𝔭i for some i, say, i = r. Choose any b ∈ 𝔭1 · · · 𝔭r–1 \ Aa. Since 𝔭1 · · · 𝔭r–1𝔭r ⊆ Aa, we have b𝔭 ⊆ Aa, that is, (b/a)𝔭 ⊆ A, that is, b/a ∈ 𝔭–1. On the other hand, b ∉ Aa, so that b/a ∉ A. Therefore, A ⊊ 𝔭–1.

  3. By the definition of 𝔭–1, it follows that 𝔭𝔭–1 is contained in A and hence is an integral ideal of A. Since A ⊆ 𝔭–1, it follows that 𝔭 ⊆ 𝔭𝔭–1. Since 𝔭 is a maximal ideal, we then have 𝔭𝔭–1 = 𝔭 or 𝔭𝔭–1 = A. Assume that 𝔭𝔭–1 = 𝔭. We claim that this assumption implies that 𝔭–1 = A, a contradiction to Part (2). So we must have 𝔭𝔭–1 = A. For proving the claim, let b ∈ 𝔭–1 and choose a non-zero a ∈ 𝔭. Then we have ab ∈ 𝔭𝔭–1 = 𝔭 and, therefore, ab2 = (ab)b ∈ 𝔭𝔭–1 = 𝔭, and so on. For each n ∈ ℕ, define the ideal 𝔞n := Aa + Aab + Aab2 + · · · + Aabn. Then 𝔞1 ⊆ 𝔞2 ⊆ 𝔞3 ⊆ · · · is an ascending chain of ideals in A. Since A is Noetherian, the chain must be stationary, that is, for some n we have 𝔞n+1 = 𝔞n, that is, abn+1 ∈ 𝔞n, that is, abn+1 = c0a + c1ab + · · · + cnabn with ci ∈ A. Since A is an integral domain and a ≠ 0, we can cancel a to get bn+1 = c0 + c1b + · · · + cnbn, that is, b is integral over A. Since A is integrally closed, b ∈ A. Therefore, 𝔭–1 ⊆ A, that is, 𝔭–1 = A, as claimed.

Theorem 2.57.

Every non-zero ideal 𝔞 in a DD A can be represented as a product 𝔞 = 𝔭1 · · · 𝔭r of prime ideals of A. Moreover, such a factorization of 𝔞 is unique up to permutations of the factors.

Proof

If 𝔞 = A, there is nothing to prove. So let 𝔞 be a proper ideal of A. We first show that if 𝔞 contains a product of non-zero prime ideals, then 𝔞 is a product of prime ideals. By Lemma 2.16, we have prime ideals 𝔭1, . . . , 𝔭r of A, each containing 𝔞, such that 𝔭1 · · · 𝔭r ⊆ 𝔞. Let us choose r to be minimal and proceed by induction on r. If r = 1, then 𝔭1 ⊆ 𝔞 ⊆ 𝔭1, that is, 𝔞 = 𝔭1 is already prime. So take r > 1 and assume that if an ideal 𝔟 of A contains a product of r – 1 or fewer non-zero prime ideals of A, then 𝔟 is a product of prime ideals. Let 𝔭 be a maximal ideal containing 𝔞. We then have 𝔭 ⊇ 𝔞 ⊇ 𝔭1 · · · 𝔭r, and by Lemma 2.14, 𝔭 = 𝔭i for some i, say, i = r. Now, consider the fractional ideal 𝔞𝔭–1. Then 𝔞𝔭–1 ⊆ 𝔭𝔭–1 = A and so 𝔞𝔭–1 is an integral ideal of A. Furthermore, 𝔞𝔭–1 ⊇ 𝔭1 · · · 𝔭r𝔭–1 = 𝔭1 · · · 𝔭r–1, that is, 𝔞𝔭–1 contains a product of r – 1 non-zero prime ideals. By the induction hypothesis, 𝔞𝔭–1 is a product of prime ideals, say, 𝔞𝔭–1 = 𝔮1 · · · 𝔮s. But then 𝔞 = 𝔮1 · · · 𝔮s𝔭 is also a product of prime ideals.

In order to prove the uniqueness of this product, let 𝔭1 · · · 𝔭r = 𝔮1 · · · 𝔮s with prime ideals 𝔭1, . . . , 𝔭r and 𝔮1, . . . , 𝔮s. Now 𝔭1 ⊇ 𝔮1 · · · 𝔮s, and by Lemma 2.14, 𝔭1 = 𝔮j for some j, say, j = 1. Multiplying both sides by 𝔭1–1 gives 𝔭2 · · · 𝔭r = 𝔮2 · · · 𝔮s. Proceeding in this way shows the desired uniqueness.

In the factorization of a non-zero ideal of a DD, we do not rule out the possibility of repeated occurrences of factors. Taking this into account shows that every non-zero ideal 𝔞 in a DD A admits a unique factorization

𝔞 = 𝔭1^e1 𝔭2^e2 · · · 𝔭r^er

with distinct non-zero prime ideals 𝔭i and with positive integral exponents ei. Here uniqueness is up to permutations of the indexes 1, . . . , r. This factorization can be extended to fractional ideals, but this time we have to allow non-positive exponents. First note that for integers e1, . . . , er and non-zero prime ideals 𝔭1, . . . , 𝔭r of A, the product 𝔭1^e1 · · · 𝔭r^er is well-defined and is a fractional ideal of A. The converse is proved in the following corollary.

Corollary 2.22.

Every non-zero fractional ideal 𝔞 of a DD A admits a unique factorization of the form 𝔞 = 𝔭1^e1 · · · 𝔭r^er with distinct non-zero prime ideals 𝔭i of A and with exponents ei ∈ ℤ. Moreover, for such a fractional ideal we have 𝔞–1 = 𝔭1^–e1 · · · 𝔭r^–er.

Proof

By definition, there exists a non-zero b ∈ A such that b𝔞 ⊆ A. But then (Ab)𝔞 is an integral ideal of A. We write (Ab)𝔞 = 𝔭1^f1 · · · 𝔭r^fr and Ab = 𝔭1^g1 · · · 𝔭r^gr with fi, gi ≥ 0. Since each non-zero prime ideal is invertible (Lemma 2.17(3)), it follows that 𝔞 = (Ab)–1((Ab)𝔞) = 𝔭1^(f1–g1) · · · 𝔭r^(fr–gr). This proves the existence of a factorization of 𝔞. The proof for the uniqueness is left to the reader as an easy exercise. The last assertion follows from a repeated use of Lemma 2.17(3).

The fractional ideal 𝔭1^–e1 · · · 𝔭r^–er in Corollary 2.22 is denoted by 𝔞–1. We have 𝔞𝔞–1 = A. One can easily verify that 𝔞–1 defined as above is equal to the set

𝔞–1 = {x ∈ K | x𝔞 ⊆ A}.

In fact, one can use the last equality as the definition for 𝔞–1.

To sum up, every non-zero fractional ideal of a DD A is invertible, and the set of all non-zero fractional ideals of A is a group under multiplication. The unit ideal A acts as the identity in this group.

As in every group, we have the cancellation law in the group of non-zero fractional ideals of A.

Corollary 2.23.

Let A be a DD and 𝔞, 𝔟, 𝔠 non-zero fractional ideals of A. If 𝔞𝔠 = 𝔟𝔠, then 𝔞 = 𝔟.

In view of unique factorization of ideals in A, we can speak of the divisibility of integral ideals in A. Let 𝔞 and 𝔟 be two integral ideals of A. We say that 𝔞 divides 𝔟 and write 𝔞 | 𝔟, if 𝔟 = 𝔞𝔠 for some integral ideal 𝔠 of A. We now show that the condition 𝔞 | 𝔟 is equivalent to the condition 𝔞 ⊇ 𝔟. Thus for ideals in a DD the term divides is synonymous with contains.

Corollary 2.24.

Let 𝔞 and 𝔟 be integral ideals of a DD A. Then 𝔞 | 𝔟 if and only if 𝔞 ⊇ 𝔟.

Proof

[if] If 𝔞 ⊇ 𝔟, we have 𝔠 := 𝔞–1𝔟 ⊆ 𝔞–1𝔞 = A, that is, 𝔠 is an integral ideal of A.

Also 𝔞𝔠 = 𝔞𝔞–1𝔟 = 𝔟, that is, 𝔞 | 𝔟.

[only if] If 𝔟 = 𝔞𝔠 for some integral ideal 𝔠, we have 𝔟 = 𝔞𝔠 ⊆ 𝔞A = 𝔞.

Corollary 2.25.

Let 𝔞 = 𝔭1^e1 · · · 𝔭r^er and 𝔟 = 𝔭1^f1 · · · 𝔭r^fr with ei, fi ≥ 0 be the prime decompositions of two non-zero integral ideals of a DD A. Then 𝔞 | 𝔟 if and only if ei ≤ fi for all i = 1, . . . , r.

Proof

[if] We have 𝔟 = 𝔞𝔠, where 𝔠 := 𝔭1^(f1–e1) · · · 𝔭r^(fr–er) is an integral ideal of A.

[only if] Let 𝔟 = 𝔞𝔠 for some integral ideal 𝔠 of A. Clearly, 𝔠 ≠ 0, and we can write the prime decomposition 𝔠 = 𝔭1^l1 · · · 𝔭r^lr 𝔮1^m1 · · · 𝔮s^ms with li, mj ≥ 0, where the primes 𝔮j are different from the 𝔭i. We have 𝔭1^f1 · · · 𝔭r^fr = 𝔭1^(e1+l1) · · · 𝔭r^(er+lr) 𝔮1^m1 · · · 𝔮s^ms. By unique factorization, we have f1 = e1 + l1, . . . , fr = er + lr and m1 = · · · = ms = 0, that is, ei ≤ fi for all i.
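Corollary 2.25 reduces ideal divisibility to a componentwise comparison of exponent vectors. In the DD ℤ, where non-zero ideals 〈n〉 correspond to positive integers n and the containment 〈m〉 ⊇ 〈n〉 means m | n, this can be tested exhaustively (Python sketch; the helper names factor and ideal_divides are ours):

```python
from collections import Counter

def factor(n):
    """Prime factorization of a positive integer as a Counter {p: e}."""
    f, d = Counter(), 2
    while d * d <= n:
        while n % d == 0:
            f[d] += 1
            n //= d
        d += 1
    if n > 1:
        f[n] += 1
    return f

def ideal_divides(m, n):
    """<m> | <n> in Z iff e_p(m) <= e_p(n) for every prime p (Cor. 2.25)."""
    fm, fn = factor(m), factor(n)
    return all(fm[p] <= fn[p] for p in fm)

# "divides" is the same as "contains" (Cor. 2.24): <m> ⊇ <n> iff m | n.
for m in range(1, 30):
    for n in range(1, 30):
        assert ideal_divides(m, n) == (n % m == 0)
```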

As we pass from ℤ to 𝒪K, the notion of unique factorization passes from the element level to the ideal level. If a DD is already a PID, these two concepts are equivalent. (Non-zero prime ideals in a PID are generated by prime elements.) Although a UFD need not be a PID, we have the following result for a DD.

Proposition 2.48.

A Dedekind domain A is a UFD, if and only if A is a PID.

Proof

[if] Every PID is a UFD (Theorem 2.11).

[only if] Let A be a UFD. In order to show that A is a PID, it suffices (in view of Theorem 2.57) to show that every non-zero prime ideal 𝔭 of A is a principal ideal. Choose any non-zero a ∈ 𝔭. Then 〈a〉 ⊆ 𝔭. Now a is a non-unit in A (since otherwise we would have 𝔭 = A) and A is assumed to be a UFD. Thus we can write a = uq1 · · · qr for a unit u and for prime elements qi in A. Clearly, each 〈qi〉 is a non-zero prime ideal of A and 〈a〉 = 〈q1〉 · · · 〈qr〉. Therefore, 𝔭 ⊇ 〈q1〉 · · · 〈qr〉, and hence by Lemma 2.14, 𝔭 = 〈qi〉 for some i ∈ {1, . . . , r}.

In the rest of this section, we abbreviate 𝒪_K as 𝒪, if K is implicit in the context.

2.13.4. Norms of Ideals

We have seen that the ring 𝒪 is a free ℤ-module of rank d. The same result holds for every non-zero ideal 𝔞 of 𝒪. Let β_1, . . . , β_d constitute an integral basis of K.

One can choose rational integers a_ij with each a_ii positive such that

Equation 2.17

    γ_i := a_{i1}β_1 + a_{i2}β_2 + · · · + a_{ii}β_i,   1 ≤ i ≤ d,

constitute a ℤ-basis of 𝔞. Moreover, the discriminant Δ(γ_1, . . . , γ_d) is independent of the choice of an integral basis γ_1, . . . , γ_d of 𝔞 and is called the discriminant of 𝔞, denoted Δ(𝔞). It follows that 𝔞 can be generated as an ideal (that is, as an 𝒪-module) by at most d elements. We omit the proof of the following tighter result.

Proposition 2.49.

Every (integral) ideal in a DD A is generated by (at most) two elements. More precisely, for a proper non-zero ideal 𝔞 of A and for any non-zero a ∈ 𝔞 there exists b ∈ 𝔞 with 𝔞 = 〈a, b〉.

Definition 2.108.

The norm of a non-zero ideal 𝔞 of 𝒪 is defined as the cardinality of the quotient ring 𝒪/𝔞 and is denoted by N(𝔞). It is customary to define the norm of the zero ideal as zero.

Using the integers a_ij of Equation (2.17), we can write

Equation 2.18

    N(𝔞) = a_{11}a_{22} · · · a_{dd}.
Corollary 2.26.

For every non-zero ideal 𝔞 of 𝒪, the quotient ring 𝒪/𝔞 is a finite ring. In particular, if 𝔭 is a non-zero prime (hence maximal) ideal of 𝒪, then 𝒪/𝔭 is a finite field.

It is tempting to define the norm of an element α ∈ 𝒪 to be the norm of the principal ideal 〈α〉. It turns out that this new definition is (almost) the same as the old definition of N(α). More precisely:

Proposition 2.50.

For any element α ∈ 𝒪, we have N(〈α〉) = |N(α)|.

Proof

The result is obvious for α = 0. So assume that α ≠ 0 and call 𝔞 := 〈α〉. Let β_1, . . . , β_d be an integral basis of 𝒪. It is an easy check that αβ_1, . . . , αβ_d is an integral basis of 𝔞. Let σ_1, . . . , σ_d be the complex embeddings of K. Then Δ(αβ_1, . . . , αβ_d) is the square of the determinant of the matrix (σ_i(αβ_j)) = (σ_i(α)σ_i(β_j)).

It follows that Δ(αβ_1, . . . , αβ_d) = N(α)² Δ(β_1, . . . , β_d). Equation (2.18) now completes the proof.

Corollary 2.27.

For any m ∈ ℤ, we have N(〈m〉) = |m|^d.

Like the norm of elements, the norm of ideals is also multiplicative. We omit the (not-so-difficult) proof here.

Proposition 2.51.

Let 𝔞 and 𝔟 be ideals in 𝒪. Then, N(𝔞𝔟) = N(𝔞) N(𝔟).

The following immediate corollary often comes in handy.

Corollary 2.28.

Let 𝔞 and 𝔟 be non-zero ideals of 𝒪. If 𝔞 = 𝔭_1^{e_1} · · · 𝔭_r^{e_r} is the factorization of 𝔞, then N(𝔞) = N(𝔭_1)^{e_1} · · · N(𝔭_r)^{e_r}. In particular, if 𝔟 | 𝔞, then N(𝔟) | N(𝔞) (in ℤ).

2.13.5. Rational Primes in Number Rings

The behaviour of rational primes in number rings is an interesting topic of study in algebraic number theory. Let K be a number field of degree d and 𝒪 = 𝒪_K. Consider a rational prime p. We use the symbol 〈p〉 to denote the ideal of 𝒪 generated by p. Further let

Equation 2.19

    〈p〉 = 𝔭_1^{e_1} · · · 𝔭_r^{e_r}

be the prime factorization of 〈p〉 with e_i ≥ 1, with pairwise distinct non-zero prime ideals 𝔭_i of 𝒪 and with r ≥ 1. For each i, we have 𝔭_i ⊇ 〈p〉, that is, p ∈ 𝔭_i, that is, 𝔭_i ∩ ℤ = pℤ (Lemma 2.13), that is, 𝔭_i lies over p. Conversely if 𝔭 is an ideal of 𝒪 lying over p, then 𝔭 ⊇ 〈p〉, that is, 𝔭 | 𝔭_1^{e_1} · · · 𝔭_r^{e_r}, that is, 𝔭 ⊇ 𝔭_i for some i, that is, 𝔭 = 𝔭_i for some i. Thus, 𝔭_1, . . . , 𝔭_r are precisely all the prime ideals of 𝒪 that lie over p.

By Corollary 2.27, N(〈p〉) = p^d. By Corollary 2.28, each N(𝔭_i) divides p^d and is therefore a power p^{d_i} of p.

Definition 2.109.

We define the ramification index of 𝔭_i over p (or 〈p〉) as e_i. This is the largest e such that 𝔭_i^e divides (that is, contains) 〈p〉. The integer d_i (where N(𝔭_i) = p^{d_i}) is called the inertial degree of 𝔭_i over p.

By the multiplicative property of norms, we have p^d = N(〈p〉) = N(𝔭_1)^{e_1} · · · N(𝔭_r)^{e_r} = p^{e_1d_1 + · · · + e_rd_r}, that is, e_1d_1 + · · · + e_rd_r = d.

Definition 2.110.

If r = d, so that each e_i = d_i = 1, we say that the prime p (or 〈p〉) splits completely in 𝒪. On the other extreme, if r = 1, e_1 = 1, d_1 = d, then 〈p〉 is prime in 𝒪 and we say that p is inert in 𝒪. Finally, if e_i > 1 for some i, we say that the prime p ramifies in 𝒪. If r = 1 and e_1 = d (so that d_1 = 1), then the prime p is said to be totally ramified in 𝒪.

The following important result is due to Dedekind. Its proof is long and complicated and is omitted here.

Theorem 2.58.

A rational prime p ramifies in 𝒪, if and only if p divides the discriminant Δ_K. In particular, there are only finitely many rational primes that ramify in 𝒪.

Though this is not the case in general, let us assume that the ring 𝒪 is monogenic (that is, 𝒪 = ℤ[α] for some α ∈ 𝒪) and try to compute the explicit factorization (Equation (2.19)) of 〈p〉 in 𝒪. Let f(X) ∈ ℤ[X] be the minimal polynomial of α. We then have 𝒪 = ℤ[α] ≅ ℤ[X]/〈f(X)〉.

Let us agree to write the canonical image of any polynomial g(X) ∈ ℤ[X] in 𝔽_p[X] as ḡ(X). We write the factorization of f̄(X) as

    f̄ = f̄_1^{e_1} · · · f̄_r^{e_r}

with e_i ≥ 1 and with pairwise distinct irreducible polynomials f̄_i ∈ 𝔽_p[X]. If d_i := deg f̄_i, then e_1d_1 + · · · + e_rd_r = d. For each i = 1, . . . , r choose f_i(X) ∈ ℤ[X] whose reduction modulo p is f̄_i. Define the ideals

    𝔭_i := 〈p, f_i(α)〉

of 𝒪. Since 𝒪 ≅ ℤ[X]/〈f(X)〉, we have

    𝒪/𝔭_i ≅ ℤ[X]/〈p, f(X), f_i(X)〉 ≅ 𝔽_p[X]/〈f̄_i(X)〉

and

    N(𝔭_i) = |𝒪/𝔭_i| = p^{d_i}.

Therefore, 𝔭_1, . . . , 𝔭_r are non-zero prime ideals of 𝒪 with N(𝔭_i) = p^{d_i}. Thus N(𝔭_1^{e_1} · · · 𝔭_r^{e_r}) = p^{e_1d_1 + · · · + e_rd_r} = p^d = N(〈p〉). On the other hand, 𝔭_1^{e_1} · · · 𝔭_r^{e_r} ⊆ 〈p〉, since f(α) = 0 and f̄ = f̄_1^{e_1} · · · f̄_r^{e_r}. Thus we must have 〈p〉 = 𝔭_1^{e_1} · · · 𝔭_r^{e_r}, that is, we have obtained the desired factorization of 〈p〉.

Let us now concentrate on an example of this explicit factorization.

Example 2.32.

Let D ≠ 0, 1 be a square-free integer congruent to 2 or 3 modulo 4. If K = ℚ(√D), then 𝒪 = ℤ[√D] is monogenic. We take an odd rational prime p and compute the factorization of 〈p〉 in 𝒪. We have to factorize modulo p the minimal polynomial f(X) := X² − D. We consider three cases separately based on the value of the Legendre symbol (D/p).

Case 1: (D/p) = 0

In this case, p | D, that is, f̄(X) = X². Then 〈p〉 = 𝔭², where 𝔭 = 〈p, √D〉. Thus p (totally) ramifies in 𝒪.

Case 2: (D/p) = 1

Since p is assumed to be an odd prime, the two square roots of D modulo p are distinct. Let δ be an integer with δ² ≡ D (mod p). Then f̄(X) = (X − δ̄)(X + δ̄). In this case, 〈p〉 = 𝔭_1𝔭_2, where 𝔭_1 = 〈p, √D − δ〉 and 𝔭_2 = 〈p, √D + δ〉. Thus p splits (completely) in 𝒪.

Case 3: (D/p) = −1

The polynomial f̄(X) = X² − D̄ is irreducible in 𝔽_p[X] and hence 〈p〉 remains prime in 𝒪, that is, p is inert in 𝒪.

Thus the quadratic residuosity of D modulo p dictates the behaviour of p in 𝒪.
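
The three cases can also be checked computationally. The following Python sketch (illustrative only, not from the text; function names are ours) classifies the behaviour of an odd prime p in ℤ[√D] by computing the Legendre symbol (D/p) via Euler's criterion:

```python
def legendre(D, p):
    """Legendre symbol (D/p) for an odd prime p, via Euler's criterion."""
    s = pow(D % p, (p - 1) // 2, p)
    return -1 if s == p - 1 else s

def behaviour(D, p):
    """Splitting of an odd prime p in Z[sqrt(D)] (D square-free, D = 2, 3 mod 4)."""
    s = legendre(D, p)
    if s == 0:
        return "ramified"   # <p> = P^2 with P = <p, sqrt(D)>
    if s == 1:
        return "split"      # <p> = P1 P2 with Pi = <p, sqrt(D) -+ delta>
    return "inert"          # <p> stays prime
```

For instance, with D = −1 (Gaussian integers) this recovers the classical fact that primes ≡ 1 (mod 4) split while primes ≡ 3 (mod 4) stay inert.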

Let us finally look at the fate of the even prime 2 in 𝒪. If D is even, then f̄(X) = X² modulo 2, and if D is odd, then D ≡ 3 (mod 4), so that f̄(X) = X² + 1 = (X + 1)² modulo 2. In each case, 2 ramifies in 𝒪.

Recall from Example 2.31 that Δ_K = 4D. Thus we have a confirmation of the fact that a rational prime p ramifies in 𝒪 if and only if p | Δ_K.

One can similarly study the behaviour of rational primes in

    𝒪 = ℤ[(1 + √D)/2],

where D ≡ 1 (mod 4) is a square-free integer ≠ 1.

2.13.6. Units in a Number Ring

There are just two units in ℤ, namely ±1. In a general number ring, there may be many more units. For example, all the units in the ring ℤ[i] of Gaussian integers are ±1, ±i. There may even be an infinite number of units in a number ring. It can be shown that ±(1 + √2)^n, n ∈ ℤ, are all the units of ℤ[√2]. (Note that for all n ≠ 0 the absolute values of (1 + √2)^n are different from 1.) ℤ[√2] is a PID. So we can think of factorizations in ℤ[√2] as element-wise factorizations. To start with, we fix a set of pairwise non-associate prime elements of ℤ[√2]. Every non-zero element of ℤ[√2] admits a factorization u p_1^{e_1} · · · p_r^{e_r} for prime “representatives” p_i and for a unit u of the form ±(1 + √2)^n. Thus, in order to complete the picture of factorization, we need machinery to handle the units in a number ring.
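
That the powers of 1 + √2 are all units can be verified with exact integer arithmetic: an element a + b√2 is a unit precisely when its norm a² − 2b² equals ±1 (compare Exercise 2.141). A small illustrative sketch, representing elements as pairs (a, b) (our own convention, not the book's):

```python
def mul(x, y):
    """Product of a + b*sqrt(2) and c + d*sqrt(2), as pairs (a, b), (c, d)."""
    (a, b), (c, d) = x, y
    return (a * c + 2 * b * d, a * d + b * c)

def norm(x):
    """Field norm N(a + b*sqrt(2)) = a^2 - 2*b^2; units have norm +-1."""
    a, b = x
    return a * a - 2 * b * b

u, power = (1, 1), (1, 0)   # the unit 1 + sqrt(2) and the identity 1
for n in range(1, 8):
    power = mul(power, u)
    assert norm(power) in (1, -1)   # every power of 1 + sqrt(2) is a unit
```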

Let K be a number field of degree d and signature (r_1, r_2). We have d = r_1 + 2r_2. The set of units in 𝒪 is denoted by 𝒪*. We know that 𝒪* is an (Abelian) group under (complex) multiplication. Our basic aim now is to reveal the structure of the group 𝒪*.

Every Abelian group is a ℤ-module and, if finitely generated and not free, contains torsion elements, that is, elements of finite order > 1.[19] 𝒪* always contains the element −1 of order 2. The torsion subgroup of 𝒪* is denoted by ℜ. We have 𝒪* ≅ ℜ × 𝔗, where 𝔗 is a torsion-free group. It turns out that ℜ is a finite group (and hence cyclic) and that 𝔗 is finitely generated and hence free, that is, 𝔗 ≅ ℤ^ρ for some ρ ≥ 0. From Dirichlet’s unit theorem (which we do not prove), it follows that ρ = r_1 + r_2 − 1. Thus, 𝔗 has a ℤ-basis consisting of ρ elements, say ξ_1, . . . , ξ_ρ, and every unit of 𝒪 can be uniquely expressed as ωξ_1^{n_1} · · · ξ_ρ^{n_ρ}, where ω is a root of unity and n_i ∈ ℤ. A set ξ_1, . . . , ξ_ρ of generators of 𝔗 is called a set of fundamental units.

[19] Every finitely generated torsion-free module over a PID is free.

Example 2.33.

Let D ≠ 0, 1 be a square-free integer, K = ℚ(√D) and 𝒪 = 𝒪_K. If D < 0, the signature of K is (0, 1) and the value of ρ for 𝒪* is 0 + 1 − 1 = 0, that is, 𝒪* = ℜ, that is, 𝒪* is finite in this case.

Now, suppose D > 0. K is a real field in this case, so that ℜ = {±1}. Also the signature of K is (2, 0), that is, ρ = 2 + 0 − 1 = 1. This means that 𝒪* contains an infinite number of units. Let ξ be a fundamental unit of 𝒪. Then, every unit of 𝒪 is of the form ±ξ^n, n ∈ ℤ.

Exercise Set 2.13

2.126
  1. If AB and BC are integral extensions of rings, show that AC is also an integral extension.

  2. Let AB be an extension of rings. Show that the integral closure of A in B is integrally closed in B.

  3. Let AB be an integral extension of rings, an ideal of B and . (Note that is an ideal of A. If is prime in B, then is prime in A. See Proposition 2.10.) Show that is integral over .

2.127Let AB be an extension of integral domains, a finitely generated non-zero ideal of A and . If , show that γ is integral over A. [H]
2.128
  1. Let AB be an integral extension of integral domains. Show that A is a field if and only if B is a field.

  2. Let AB be an integral extension of rings, a prime ideal of B and . Show that is maximal if and only if is maximal. [H]

  3. Let A, B, and be as in (b). Further let be another prime ideal of B with . Show that if , then . [H]

2.129Let A be a ring and S a multiplicatively closed subset of A. Show that:
  1. If , then S–1A is the zero ring.

  2. If S′ := S \ {1} is non-empty and closed under multiplication, then S–1AS–1A.

  3. If A is Noetherian, then S–1A is also Noetherian.

2.130Let AB be a ring extension and C the integral closure of A in B. Show that for any multiplicative subset S of A (and hence of B and C) the integral closure of S–1A in S–1B is S–1C. In particular, if A is integrally closed in B, then so is S–1A in S–1B.
2.131Recall that an integrally closed integral domain is called a normal domain (ND).
  1. Show that every UFD is a normal domain.

  2. Let D be a square-free integer ≠ 0, 1. Show that , is normal if and only if D ≡ 2, 3 (mod 4).

(Remark: The reader should note the following important implications:

That is, a Euclidean domain is a PID, a PID is a UFD and a UFD is a normal domain. None of the reverse implications is true. For example, the ring of integers of ℚ(√−19) is known to be a PID but not a Euclidean domain. The ring K[X_1, . . . , X_n], n ≥ 2, of multivariate polynomials over a field K is a UFD, but not a PID, since the ideal 〈X_1, . . . , X_n〉 is not principal. Finally, ℤ[√−5] is a normal domain (by Exercise 2.136 below), but not a UFD, since 2 · 3 = (1 + √−5)(1 − √−5) are two different factorizations of 6 into irreducible elements.)

2.132A (non-zero) ring A with a unique maximal ideal m is called a local ring. In that case, the field A/m is called the residue field of A.

Let A be a ring and 𝔭 a prime ideal of A. Show that the localization A_𝔭 is a local ring with the unique maximal ideal 𝔭A_𝔭 generated by the elements a/1, a ∈ 𝔭, and the residue field of A_𝔭 is canonically isomorphic to the quotient field of the integral domain A/𝔭 under the map a/b ↦ (a + 𝔭)/(b + 𝔭).

2.133A ring A is called a discrete valuation ring (DVR) or a discrete valuation domain (DVD), if A is a local principal ideal domain. Let A be a DVR with maximal ideal m = 〈p〉. Prove the following assertions:
  1. A is a UFD.

  2. The only primes in A are the associates of p. [H]

  3. Every non-zero element of A can be written as upα, where u is a unit of A and .

  4. Every non-zero ideal of A is of the form 〈pα〉 for some .

  5. A has only one non-zero prime ideal (namely, m).

(Remark: The prime p of A is called a uniformizing parameter or a uniformizer for A and is unique up to multiplication by units.

The map ν taking up^α ↦ α is called a discrete valuation of A and can be naturally extended to a group homomorphism ν : K* → ℤ by defining ν(a/b) := ν(a) − ν(b), where a, b ∈ A, b ≠ 0 and K = Q(A) is the quotient field of A. It is often convenient to define ν(0) := +∞. It follows that ν(xy) = ν(x) + ν(y) and ν(x + y) ≥ min(ν(x), ν(y)).)

2.134
  1. Let A be a local Noetherian integral domain which is not a field. Assume further that the maximal ideal m ≠ 0 of A is the only non-zero prime ideal of A. Show that A is a DVR (that is, a PID) if and only if A is integrally closed.

  2. Let A be a Noetherian integral domain which is not a field. Prove that A is a Dedekind domain if and only if is a DVR for every non-zero prime ideal of A.

2.135
  1. Show that the only units of are ±1 and ±i.

  2. Show that the primes of are associates to the following:

    1. a prime integer ≡ 3 (mod 4),

    2. a + ib, a, , with a2 + b2 equal to 2 or a prime integer ≡ 1 (mod 4).

2.136
  1. Show that every quadratic number field K can be represented as for a square-free integer D ≠ 0, 1.

  2. Let for some square-free integer D ≠ 0, 1. Show that:

(In particular, the ring of integers of is the ring of Gaussian integers.)

2.137Let A be a Dedekind domain.
  1. Let q1 and q2 be two distinct non-zero prime ideals of A. Show that for any e1, we have . [H]

  2. Let be the prime factorization of a non-zero ideal of A with pairwise distinct primes qi and . Show that . [H]

2.138Let A be a Dedekind domain and a non-zero (integral) ideal of A. Show that:
  1. There exists a non-zero (integral) ideal of A such that is a principal ideal. [H]

  2. The number of ideals of A containing is finite.

  3. Every ideal of is principal.

2.139Let and , ei, , be the prime decompositions of two non-zero ideals , of a DD A. Define the gcd and lcm of and as

Show that and lcm. Conclude that . (Note that if A is a general ring, we only have .)

2.140Let K be a number field and .
  1. Let be an ideal of . Show that . In particular, every non-zero ideal of contains a non-zero integer. [H]

  2. Let be a non-zero prime ideal of . Prove that for some , where p is the unique rational prime contained in (Lemma 2.13).

2.141Let K be a number field, , , and . Show that:
  1. , if and only if N(α) = ±1.

  2. , if and only if f(0) = ±1, where is the minimal polynomial of α over .

  3. , if and only if |σ(α)| = 1 for every complex embedding σ of K.

2.142Let K be a number field. We say that K is norm-Euclidean, if for every α, β ∈ 𝒪, β ≠ 0, there exist q, r ∈ 𝒪 such that α = qβ + r and |N(r)| < |N(β)|.
  1. Conclude that if K is norm-Euclidean, then is a Euclidean domain with the Euclidean degree function ν(α) := | N(α)|. (The converse of this is not true. For example, it is known that is not norm-Euclidean, but is a Euclidean domain.)

  2. Prove the following equivalent characterization of a norm-Euclidean number field: K is norm-Euclidean if and only if for every there exists such that | N(α – β)| < 1.

  3. Show that the following number fields are norm-Euclidean:

    , , , and .

  4. Show that is not norm-Euclidean. [H]

2.143In this exercise, one derives that the only (rational) integer solutions of Bachet’s equation

Equation 2.20

    y² + 2 = x³
are x = 3, y = ±5.

  1. Show that Equation (2.20) has no solutions with x or y even. [H]

    Let (x, y) be a solution of Equation (2.20) with both x and y odd. Then x³ admits a factorization in ℤ[√−2] as x³ = y² + 2 = (y + √−2)(y − √−2).

  2. Let A := ℤ[√−2]. Show that A is the ring of integers of ℚ(√−2) and that A is a UFD. Also the only units of A are ±1.

  3. Show that gcd. [H]

  4. Because of unique factorization one can write y + √−2 = (c + d√−2)³ for c, d ∈ ℤ. Expand the cube and equate the real and imaginary parts to conclude that we must have y = ±5, so that x = 3.

**2.14. p-adic Numbers

Let us now study a different area of algebraic number theory, introduced by Kurt Hensel in an attempt to apply the methods of power series expansions to numbers. While trying to explain the properties of (rational) integers, mathematicians embedded ℤ in bigger and bigger structures, richer and richer in properties. ℚ came in a natural attempt to form quotients, and for some time people believed that the rationals describe all of reality. Pythagoras was seemingly the first to locate and prove the irrationality of a number, namely √2. It took humankind centuries to complete the picture of the real line. One possibility is to look at ℝ as the completion of ℚ. A sequence a_n, n ≥ 1, of rational numbers is called a Cauchy sequence if for every real ε > 0, there exists N ∈ ℕ such that |a_m − a_n| ≤ ε for all m, n ≥ N. Every Cauchy sequence should converge to a limit, and it is ℝ (and not ℚ) where this happens. Even with the convergence of Cauchy sequences, people were not wholeheartedly happy, because the real polynomial X² + 1 did not have (and continues not to have) roots in ℝ. So the next question that arose was that of algebraic closure. ℂ was invented and turned out to be a nice field which is both algebraically closed and complete.

Throughout the above business, we were led by the conventional notion of distance between points (that is, between numbers), the so-called Archimedean distance or the absolute value. For every rational prime p, there exists a p-adic distance which leads to a ring strictly bigger than and containing ℤ. This is the ring Ẑ_p of p-adic integers. The quotient field Q̂_p of Ẑ_p is the field of p-adic numbers. Q̂_p is complete in the sense of convergence of Cauchy sequences (under the p-adic distance), but is not algebraically closed. We know anyway that a (unique) algebraic closure of Q̂_p exists. We have ℂ = ℝ(i), that is, it was necessary and sufficient to add the imaginary quantity i to ℝ to get an algebraically closed field. Unfortunately in the case of the p-adic distance the closure is of infinite extension degree over Q̂_p. In addition, this closure is not complete. An attempt to make it complete gives an even bigger field Ω_p and the story stops here, Ω_p being both algebraically closed and complete. But Ω_p is already a pretty huge field and very little is known about it.

In the rest of this section, we, without specific mention, denote by p an arbitrary rational prime.

2.14.1. The Arithmetic of p-adic Numbers

There are various ways in which p-adic integers can be defined. A simple way is to use infinite sequences.

Definition 2.111.

A p-adic integer is defined as an infinite sequence (a_n) of elements a_n ∈ ℤ_{p^n} with the property that a_{n+1} ≡ a_n (mod p^n) for every n ∈ ℕ. Each a_n, being an element of ℤ_{p^n}, can be represented as a (rational) integer unique modulo p^n. Thus, if b_n, n ∈ ℕ, define another sequence of integers with b_n ≡ a_n (mod p^n) for every n, the p-adic integers (a_n) and (b_n) are treated the same. In particular, if 0 ≤ b_n < p^n for every n, then (b_n) is called the canonical representation of (a_n). The set of all p-adic integers is denoted by Ẑ_p.[20] A sequence (a_n) of integers with a_{n+1} ≡ a_n (mod p^n) for every n is called a p-coherent sequence.

[20] Well! We are now in a mess of notations. We have ℤ_n := ℤ/nℤ for every n ∈ ℕ. In particular, for n = p we have ℤ_p = ℤ/pℤ, which is a field that we planned to denote also by ℤ_p. It is superfluous to have two notations for the same thing. Many authors, therefore, prefer to avoid the hat and call our Ẑ_p as ℤ_p. For them, our ℤ_p is 𝔽_p and/or written explicitly as ℤ/pℤ. Let us stick to our old conventions and use hats to remove ambiguities.

See Exercise 2.144 for another way of defining p-adic integers. We now show that Ẑ_p is a ring. Before doing that, we mention that the ring ℤ is canonically embedded in Ẑ_p by the injective map ℤ → Ẑ_p, a ↦ (a, a, a, . . .).

Definition 2.112.

Let (an) and (bn) be two p-adic integers. Define:

(an) + (bn):=(an + bn).
(an) · (bn):=(an · bn).

One can easily check that these operations are well-defined, that is, independent of the choice of the representatives a_n and b_n. It also follows easily that these operations make Ẑ_p a ring with additive identity (0, 0, . . .) and with multiplicative identity (1, 1, . . .). The additive inverse of (a_n) is −(a_n) = (−a_n). Moreover, a ↦ (a, a, . . .) is an injective ring homomorphism ℤ → Ẑ_p. In view of this, one often identifies the rational integer a with the p-adic integer (a, a, . . .). We will also do so, provided that we do not expect to face a danger of confusion. Also note that for l ∈ ℕ the l-fold sum l(a_n) is the same as (l)(a_n) = (la_n). Thus in this context the two interpretations of l remain perfectly consistent.
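
A truncated p-coherent sequence is easy to model on a computer. The following illustrative Python sketch (the names and the fixed precision LEN are our own choices, not from the text) represents a p-adic integer by its first few components a_n mod p^n and implements the componentwise ring operations of Definition 2.112:

```python
p, LEN = 5, 6  # precision: we keep a_n modulo p^n for n = 1, ..., LEN

def coherent(k):
    """Canonical p-coherent sequence of the rational integer k."""
    return [k % p ** n for n in range(1, LEN + 1)]

def add(x, y):
    return [(a + b) % p ** (n + 1) for n, (a, b) in enumerate(zip(x, y))]

def mul(x, y):
    return [(a * b) % p ** (n + 1) for n, (a, b) in enumerate(zip(x, y))]

def is_coherent(x):
    """Check the defining condition a_{n+1} = a_n (mod p^n)."""
    return all(x[n + 1] % p ** (n + 1) == x[n] for n in range(LEN - 1))
```

Since reduction modulo p^n is a ring homomorphism, the sum and product of coherent sequences are again coherent, which is exactly the well-definedness check mentioned above.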

It turns out that Ẑ_p is an integral domain. In order to see why, let us focus our attention on the units of Ẑ_p. Let us plan to denote Ẑ_p* (the multiplicative group of units of Ẑ_p) by U_p. The next result characterizes elements of U_p.

Proposition 2.52.

For (a_n) ∈ Ẑ_p, the following conditions are equivalent:

  (a) (a_n) ∈ U_p.

  (b) p ∤ a_n for all n ∈ ℕ.

  (c) p ∤ a_1.

Proof

[(a)⇒(b)] Let (a_n)(b_n) = (a_nb_n) = 1 = (1) for some (b_n) ∈ Ẑ_p. Then for every n ∈ ℕ we have a_nb_n ≡ 1 (mod p^n), that is, a_n is invertible modulo p^n and hence modulo p as well, that is, p ∤ a_n.

[(b)⇒(c)] Obvious.

[(c)⇒(a)] Let us construct a p-coherent sequence b_n, n ∈ ℕ, of (rational) integers with a_nb_n ≡ 1 (mod p^n). This (b_n) would be the desired inverse of (a_n) in Ẑ_p. Since p ∤ a_1 and a_n ≡ a_1 (mod p), it follows that p ∤ a_n as well and, therefore, the congruence a_nx ≡ 1 (mod p^n) has a unique solution modulo p^n, namely b_n :≡ a_n^{-1} (mod p^n).

We also have a_{n+1}b_{n+1} ≡ 1 (mod p^n), that is, a_nb_{n+1} ≡ 1 (mod p^n), that is, b_{n+1} ≡ b_n (mod p^n), that is, the sequence (b_n) is p-coherent.
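
The construction in this proof is effective: the inverse of a unit is obtained by inverting each component modulo the corresponding power of p. A minimal sketch, assuming a p-adic unit is given as a truncated p-coherent sequence (our own representation, not the book's):

```python
p, LEN = 7, 6  # keep the components a_n modulo p^n for n = 1, ..., LEN

def unit_inverse(a):
    """Invert a p-adic unit given by components a[n-1] = a_n mod p^n,
    following the proof of Proposition 2.52: b_n := a_n^{-1} (mod p^n)."""
    assert a[0] % p != 0, "not a unit: p divides a_1"
    return [pow(a[n], -1, p ** (n + 1)) for n in range(LEN)]
```

The coherence of the resulting sequence (b_n) is automatic, exactly as argued above.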

Proposition 2.53.

Every non-zero x ∈ Ẑ_p can be written uniquely as x = p^r y for some integer r ≥ 0 and for some y ∈ U_p.

Proof

Write x = (a_n). If p ∤ a_1, take r := 0 and y := x. So assume that p | a_1. Choose r ≥ 1 such that [a_n]_{p^n} = [0]_{p^n} for 1 ≤ n ≤ r, whereas [a_{r+1}]_{p^{r+1}} ≠ [0]_{p^{r+1}}. Such an r exists, since x ≠ 0 by hypothesis. For n ∈ ℕ, we have a_{r+n} ≡ a_r ≡ 0 (mod p^r), that is, p^r | a_{r+n}, whereas a_{r+n} ≡ a_{r+1} ≢ 0 (mod p^{r+1}), that is, p^{r+1} ∤ a_{r+n}, that is, v_p(a_{r+n}) = r. Define b_n := a_{r+n}/p^r. Since a_{r+n+1} ≡ a_{r+n} (mod p^{r+n}), division by p^r gives b_{n+1} ≡ b_n (mod p^n), that is, y := (b_n) ∈ Ẑ_p. Moreover, p^r b_n = a_{r+n} ≡ a_n (mod p^n), that is, x = p^r y. Finally, since p ∤ b_1, we have y ∈ U_p. This establishes the existence of a factorization x = p^r y. The uniqueness of this factorization is left to the reader as an easy exercise.

Proposition 2.54.

is an integral domain.

Proof

Let x_1 and x_2 be non-zero elements of Ẑ_p. By Proposition 2.53, we can write x_1 = p^{r_1} y_1 and x_2 = p^{r_2} y_2 with r_1, r_2 ≥ 0 and y_1, y_2 ∈ U_p. Then (a_n) := x_1x_2 = p^{r_1+r_2} y_1y_2. Now y_1y_2 =: (b_n) ∈ U_p and hence no b_n is divisible by p. Therefore, a_{r_1+r_2+1} ≡ p^{r_1+r_2} b_{r_1+r_2+1} ≢ 0 (mod p^{r_1+r_2+1}), that is, (a_n) = x_1x_2 ≠ 0.

Definition 2.113.

The quotient field Q̂_p of Ẑ_p is called the field of p-adic numbers.

Proposition 2.55.

Every non-zero x ∈ Q̂_p can be expressed uniquely as x = p^r y with r ∈ ℤ and y ∈ U_p.

Proof

One can write x = a/b for some a, b ∈ Ẑ_p, b ≠ 0. Then a = p^s c and b = p^t d for some s, t ≥ 0 and c, d ∈ U_p, and so x = p^{s-t}(c/d) with c/d ∈ U_p. The proof for the uniqueness is left to the reader.

The canonical inclusion ℤ ↪ Ẑ_p naturally extends to the canonical inclusion ℚ ↪ Q̂_p. We can identify the image of a/b with the rational a/b and say that ℚ is contained in Q̂_p. Being a field of characteristic 0, Q̂_p contains an isomorphic copy of ℚ. The map a/b ↦ (a, a, . . .)/(b, b, . . .) gives this isomorphism explicitly. Note that the ring Ẑ_p is strictly bigger than ℤ and the field Q̂_p is strictly bigger than the field ℚ (Exercise 2.147).

2.14.2. The p-adic Valuation

Proposition 2.55 leads to the notion of p-adic distance between pairs of points in Q̂_p. Let us start with some formal definitions.

Definition 2.114.

A metric on a set S is a map d : S × S → ℝ such that for every x, y, z ∈ S, we have:

  1. Non-negative d(x, y) ≥ 0.

  2. Non-degeneracy d(x, y) = 0 if and only if x = y.

  3. Symmetry d(x, y) = d(y, x).

  4. Triangle inequality d(x, z) ≤ d(x, y) + d(y, z).

A set S together with a metric d is called a metric space (with metric d).

Definition 2.115.

A norm on a field K is a map ‖ ‖ : K → ℝ such that for all x, y ∈ K, we have:

  1. Non-negativity ‖x‖ ≥ 0.

  2. Non-degeneracy ‖x‖ = 0 if and only if x = 0.

  3. Multiplicativity ‖xy‖ = ‖x‖ ‖y‖.

  4. Triangle inequality ‖x + y‖ ≤ ‖x‖ + ‖y‖.

It is an easy check that for a norm ‖ ‖ on K the function d : K × K → ℝ, d(x, y) := ‖x − y‖, defines a metric on K.

A norm ‖ ‖ on a field K is called non-Archimedean (or a finite valuation), if ‖x + y‖ ≤ max(‖x‖, ‖y‖) for all x, y ∈ K (a condition stronger than the triangle inequality). A norm which is not non-Archimedean is called Archimedean (or an infinite valuation).

Example 2.34.
  1. Setting ‖0‖ := 0 and ‖x‖ := 1 for all x ≠ 0 defines a norm on any field K. This norm is called the trivial norm on K.

  2. The absolute value | | is an Archimedean norm on ℚ (or ℝ). It is customary to denote this norm as | |_∞. This norm induces the usual metric topology on ℚ (or ℝ) which is at the heart of real analysis. In p-adic analysis, one investigates ℚ under the p-adic norms that we define now.

Definition 2.116.

The p-adic norm on Q̂_p is defined as |0|_p := 0 and |x|_p := p^{-r} for non-zero x = p^r y with r ∈ ℤ and y ∈ U_p (Proposition 2.55).

Theorem 2.59.

The p-adic norm | |_p is a non-Archimedean norm on Q̂_p.

Proof

Non-negativity, non-degeneracy and multiplicativity of | |_p are immediate. For proving the triangle inequality, it is sufficient to prove the non-Archimedean condition. Take x, y ∈ Q̂_p. If x = 0 or y = 0 or x + y = 0, we clearly have |x + y|_p ≤ max(|x|_p, |y|_p). So assume that each of x, y and x + y is non-zero. Write x = p^r u and y = p^s v with r, s ∈ ℤ and u, v ∈ U_p. Without loss of generality, we may assume that r ≥ s. Then, x + y = p^s z, where z = p^{r-s}u + v ∈ Ẑ_p. Since x + y ≠ 0, we have z ≠ 0; so we can write z = p^t w for some t ≥ 0 and w ∈ U_p. But then |x + y|_p = p^{-(s+t)} ≤ p^{-s} = max(p^{-r}, p^{-s}) = max(|x|_p, |y|_p).
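
Restricted to rationals, the p-adic norm is directly computable: writing a non-zero x = p^r u/v with p ∤ uv gives |x|_p = p^{-r}. A hedged Python sketch (function names are ours) that also lets one spot-check the non-Archimedean inequality:

```python
from fractions import Fraction

def vp(x, p):
    """p-adic valuation v_p(x) of a non-zero rational x."""
    x = Fraction(x)
    num, den, v = x.numerator, x.denominator, 0
    while num % p == 0:
        num //= p
        v += 1
    while den % p == 0:
        den //= p
        v -= 1
    return v

def norm_p(x, p):
    """p-adic norm: |0|_p = 0 and |x|_p = p^(-v_p(x)) otherwise."""
    x = Fraction(x)
    return Fraction(0) if x == 0 else Fraction(p) ** (-vp(x, p))
```

Note the inversion of intuition: highly p-divisible numbers are p-adically small.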

Definition 2.117.

Two metrics d1 and d2 on a metric space S are called equivalent if a sequence (xn) from S is Cauchy with respect to d1 if and only if it is Cauchy with respect to d2. Two norms on a field are called equivalent if they induce equivalent metrics.

For every rational prime p, the field ℚ is canonically embedded in Q̂_p and thus we have a notion of a p-adic distance on ℚ. We also have the usual Archimedean distance | | on ℚ. We now state an interesting result without a proof, which asserts that any distance on ℚ must be essentially the same as either the usual Archimedean distance or one of the p-adic distances.

Theorem 2.60. Ostrowski’s theorem

Every non-trivial norm on ℚ is equivalent to | |_p for some rational prime p or to the Archimedean norm | |_∞.

The notions of sequences and series and their convergences can be readily extended to Q̂_p under the norm | |_p. Since the p-adic distance assumes only the discrete values p^r, r ∈ ℤ, it is often customary to restrict ourselves only to these values while talking about the convergence criteria of sequences and series, that is, instead of an infinitesimally small real ε > 0 one can talk about an arbitrarily large M ∈ ℕ with p^{-M} ≤ ε.

Definition 2.118.

Let x_1, x_2, . . . be a sequence of elements of Q̂_p. We say that this sequence converges to a limit x ∈ Q̂_p, if given M ∈ ℕ there exists N ∈ ℕ such that |x_n − x|_p ≤ p^{-M} for all n ≥ N. We write this as x = lim x_n or as x_n → x.

Consider the partial sums s_n := x_1 + x_2 + · · · + x_n for each n ∈ ℕ. If there exists s ∈ Q̂_p with s_n → s, we say that the sum Σ_n x_n converges to s and write Σ_n x_n = s.

A sequence x_1, x_2, . . . of elements of Q̂_p is said to be a Cauchy sequence if for every M ∈ ℕ, there exists an N ∈ ℕ such that |x_m − x_n|_p ≤ p^{-M} for all m, n ≥ N.

Definition 2.119.

A field K is called complete under a norm ‖ ‖ if every sequence of elements of K, which is Cauchy under ‖ ‖, converges to an element in K.

For example, ℝ is complete under | |. We shortly demonstrate that Q̂_p is complete under | |_p.

Consider a field K not (necessarily) complete under a norm ‖ ‖. Let C denote the set of all Cauchy sequences from K. Define addition and multiplication in C as (a_n) + (b_n) := (a_n + b_n) and (a_n)(b_n) := (a_nb_n). Under these operations C becomes a commutative ring with identity having a maximal ideal 𝔪 consisting of the sequences that converge to 0. The field L := C/𝔪 is called the completion of K with respect to the norm ‖ ‖. K is canonically embedded in L via the map a ↦ (a, a, a, . . .) + 𝔪. The norm ‖ ‖ on K extends to elements (a_n) + 𝔪 of L as ‖(a_n) + 𝔪‖ := lim_{n→∞} ‖a_n‖. L is a complete field under this extended norm. In fact, it is the smallest field containing K and complete under ‖ ‖.

ℝ is the completion of ℚ with respect to the Archimedean norm | |. On the other hand, Q̂_p turns out to be the completion of ℚ with respect to the p-adic norm | |_p. Before proving this let us first prove that Q̂_p itself is a complete field under the p-adic norm. Let us start with a lemma.

Lemma 2.18.

A sequence (an) of p-adic numbers is a Cauchy sequence if and only if the sequence (an+1an) converges to 0.

Proof

[if] Take any M ∈ ℕ. Since a_{n+1} − a_n → 0 by hypothesis, there exists N ∈ ℕ such that |a_{n+1} − a_n|_p ≤ p^{-M} for all n ≥ N. But then for all m, n ≥ N with m = n + k, k ≥ 1, we have |a_m − a_n|_p = |(a_{n+k} − a_{n+k-1}) + · · · + (a_{n+1} − a_n)|_p ≤ max(|a_{n+k} − a_{n+k-1}|_p, . . . , |a_{n+1} − a_n|_p) ≤ p^{-M}.

Thus (a_n) is a Cauchy sequence.

[only if] Take any M ∈ ℕ. Since (a_n) is a Cauchy sequence by hypothesis, there exists N ∈ ℕ such that |a_m − a_n|_p ≤ p^{-M} for all m, n ≥ N. In particular, |a_{n+1} − a_n|_p ≤ p^{-M} for all n ≥ N, that is, a_{n+1} − a_n → 0.

Theorem 2.61.

The field Q̂_p is complete with respect to | |_p.

Proof

Let (a_n) be a Cauchy sequence in Q̂_p. By Lemma 2.18, a_{n+1} − a_n → 0. Therefore, there exists N ∈ ℕ such that |a_{n+1} − a_n|_p ≤ 1 for all n ≥ N. For n = N + k, k ≥ 1, we have

|a_n|_p = |a_{N+k}|_p
        = |(a_{N+k} − a_{N+k-1}) + · · · + (a_{N+1} − a_N) + a_N|_p
        ≤ max(|a_{N+k} − a_{N+k-1}|_p, . . . , |a_{N+1} − a_N|_p, |a_N|_p)
        ≤ max(1, |a_N|_p).

It then follows that |a_n|_p ≤ p^m for all n ∈ ℕ, where m ≥ 0 satisfies p^m = max(1, |a_1|_p, . . . , |a_N|_p). If m = 0, then each a_n ∈ Ẑ_p (Exercise 2.148). Otherwise consider the sequence (p^m a_n) which is clearly Cauchy and in which each p^m a_n ∈ Ẑ_p, since |p^m a_n|_p ≤ p^{-m} p^m = 1. Thus, without loss of generality, we may assume that the given sequence (a_n) itself is one of p-adic integers.

Let a_n = a_{n,0} + a_{n,1}p + a_{n,2}p² + · · · be the p-adic expansion of a_n (Exercise 2.145). Since (a_n) is Cauchy, for every M ∈ ℕ there exists N_M ∈ ℕ such that |a_m − a_n|_p ≤ p^{-(M+1)} for all m, n ≥ N_M, that is, a_{n,i} = a_{m,i} for 0 ≤ i ≤ M and m, n ≥ N_M. Define x_M := a_{n,M} for any n ≥ N_M and x := x_0 + x_1p + x_2p² + · · · ∈ Ẑ_p. It then follows that a_n → x.

Theorem 2.62.

Q̂_p is the completion of ℚ with respect to the norm | |_p.

Proof

Let C denote the ring of Cauchy sequences from ℚ (under the p-adic norm), 𝔪 the maximal ideal of C consisting of sequences that converge to 0, and L := C/𝔪. We now show that L ≅ Q̂_p.

If a ∈ Q̂_p has the p-adic expansion a = a_{-r}p^{-r} + · · · + a_{-1}p^{-1} + a_0 + a_1p + a_2p² + · · · (Exercise 2.145), then α_n := a_{-r}p^{-r} + · · · + a_{-1}p^{-1} + a_0 + a_1p + · · · + a_np^n, n ∈ ℕ, define a sequence of elements of ℚ. We have |α_n − a|_p ≤ p^{-(n+1)}, that is, α_n → a. Moreover, the sequence (α_n) of rational numbers is Cauchy with respect to | |_p, since for every M ∈ ℕ we have |α_m − α_n|_p ≤ p^{-(M+1)} for all m, n ≥ M. Thus Q̂_p → L, a ↦ (α_n) + 𝔪, is a well-defined field homomorphism. Being a field homomorphism, it is injective.

What remains is to show that the map is surjective. Take any (β_n) + 𝔪 ∈ L. Since (β_n) is a Cauchy sequence, by Theorem 2.61 it converges to a point a ∈ Q̂_p. We construct the sequence (α_n) corresponding to a as described in the last paragraph. Then α_n → a as well and hence using the triangle inequality (or the non-Archimedean condition) we have α_n − β_n = (α_n − a) − (β_n − a) → 0, that is, (α_n) − (β_n) ∈ 𝔪, that is, (β_n) + 𝔪 = (α_n) + 𝔪 is the image of a.

Corollary 2.29.

The p-adic series Σ_n a_n (with a_n ∈ Q̂_p) converges if and only if |a_n|_p → 0.

Proof

The only if part is obvious. For the if part, take a sequence (a_n) of p-adic numbers with |a_n|_p → 0. Define s_n := a_1 + · · · + a_n. Since a_{n+1} = s_{n+1} − s_n → 0 by hypothesis, Lemma 2.18 guarantees that (s_n) is a Cauchy sequence, that is, (s_n) converges in Q̂_p.

This is quite unlike the Archimedean norm | |. For example, with respect to this norm 1/n → 0, whereas the series Σ_n 1/n diverges.
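
The convergence criterion can be seen in action for the geometric series 1 + p + p² + · · ·, which converges p-adically to 1/(1 − p): since (1 − p)(1 + p + · · · + p^{K-1}) = 1 − p^K, the partial sum agrees with the inverse of 1 − p modulo p^K. A quick illustrative check in Python (the chosen p and precision K are arbitrary):

```python
p, K = 5, 8
# partial sum s = 1 + p + p^2 + ... + p^(K-1)
s = sum(p ** n for n in range(K))
# the p-adic integer 1/(1 - p), known modulo p^K
inv = pow(1 - p, -1, p ** K)
assert s % p ** K == inv   # the series converges to 1/(1 - p)
```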

2.14.3. Hensel’s Lemma

Let us conclude our short study of p-adic methods by proving an important theorem due to Hensel. This theorem talks about the solvability of polynomial equations f(X) = 0 for f(X) ∈ Ẑ_p[X]. Before proceeding further, let us introduce a notation. Recall that every a ∈ Ẑ_p has a unique p-adic expansion of the form a = a_0 + a_1p + a_2p² + · · · with 0 ≤ a_n < p (Exercises 2.144 and 2.145). If a_0 = a_1 = · · · = a_{n-1} = 0, then a = a_np^n + a_{n+1}p^{n+1} + a_{n+2}p^{n+2} + · · · = p^nb, where b ∈ Ẑ_p. Thus p^n | a in Ẑ_p. We denote this by saying that a ≡ 0 (mod p^n). Notice that a ≡ 0 (mod p^n) if and only if |a|_p ≤ p^{-n}. We write a ≡ b (mod p^n) for a, b ∈ Ẑ_p, if a − b ≡ 0 (mod p^n). Since p^n can be viewed as the element p^n(1, 1, . . .) of Ẑ_p, this congruence notation conforms to that for a general PID. (Ẑ_p is a PID by Exercise 2.148.)

Since by our assumption any ring A comes with identity (that we denote by 1 = 1_A), it makes sense to talk for every n ∈ ℕ about an element n = n_A in A, which is the n-fold sum of 1.

Given any f(X) = a_0 + a_1X + · · · + a_dX^d ∈ A[X], one can define the formal derivative of f as f′(X) := a_1 + 2a_2X + · · · + da_dX^{d-1}. Properties of formal derivatives of polynomials are covered in Exercise 2.61.

Theorem 2.63. Hensel’s lemma

Let f(X) ∈ Ẑ_p[X]. Suppose that there exist α_0 ∈ Ẑ_p and an integer M ≥ 0 satisfying:

  1. |f(α_0)|_p ≤ p^{-(2M+1)} (that is, α_0 is a solution of f(x) ≡ 0 (mod p^{2M+1})), and

  2. |f′(α_0)|_p = p^{-M} (that is, p^M | f′(α_0) but f′(α_0) ≢ 0 (mod p^{M+1})).

Then there exists a unique α ∈ Ẑ_p such that f(α) = 0 and |α − α_0|_p ≤ p^{-(M+1)} (that is, α ≡ α_0 (mod p^{M+1})).

Proof

Let us inductively construct a sequence α_0, α_1, α_2, . . . of p-adic integers with the properties that |f(α_n)|_p ≤ p^{-(2M+n+1)} and |f′(α_n)|_p = p^{-M} for every n ≥ 0. The given α_0 provides the starting point (induction basis). For the inductive step, assume that n ≥ 1 and that α_0, α_1, . . . , α_{n-1} have been constructed with the desired properties. We now explain how to construct α_n from α_{n-1}. Put

    α_n := α_{n-1} + k_np^{M+n}   for some k_n ∈ Ẑ_p.

We want to find a suitable k_n for which |f(α_n)|_p ≤ p^{-(2M+n+1)}. Taylor expansion gives f(α_n) = f(α_{n-1}) + k_np^{M+n}f′(α_{n-1}) + c_np^{2(M+n)} for some c_n ∈ Ẑ_p. Since by induction hypothesis p^{2M+n} | f(α_{n-1}) and p^M | f′(α_{n-1}), we can write f(α_{n-1}) = p^{2M+n}a_n and f′(α_{n-1}) = p^M u_n for some a_n, u_n ∈ Ẑ_p.

Since p^{M+1} ∤ f′(α_{n-1}), the element u_n ∈ U_p and, therefore, there is a unique solution for k_n modulo p of the congruence

    a_n + k_nu_n ≡ 0 (mod p).

This value of kn yields

fn) = p2M + n(bnp + cnpn) ≡ 0 (mod p2M+n+1)

for some . The Taylor expansion of f′ gives f′(αn) = f′(αn–1) + dnpM+n (for some ) which implies that f′(αn) ≡ f′(αn–1) (mod pM), that is, |f′(αn)|p = pM.

Since |αn – αn–1|p ≤ p–(M+n), it follows that αn – αn–1 → 0, that is, (αn) is a Cauchy sequence (under | |p). By the completeness of ℤp, we then have an α ∈ ℤp such that αn → α. Similarly f(αn) – f(αn–1) → 0, that is, the sequence (f(αn)) is Cauchy and hence converges to f(α). Also |f(αn)|p ≤ p–(2M+n+1), that is, f(αn) → 0, that is, f(α) = 0. Finally, each αn ≡ α0 (mod pM+1), so that α ≡ α0 (mod pM+1). This establishes the existence of a desired α.

For proving the uniqueness of α, let β ∈ ℤp satisfy f(β) = 0 and |β – α0|p ≤ p–(M+1). By Taylor expansion, f(β) = f(α) + (β – α)f′(α) + (β – α)2c for some c ∈ ℤp, that is, (β – α)(f′(α) + (β – α)c) = 0. Now β – α = (β – α0) – (α – α0) and so |β – α|p ≤ max(|β – α0|p, |α – α0|p) ≤ p–(M+1), whereas f′(αn) → f′(α), so that |f′(α)|p = p–M. Therefore, f′(α) + (β – α)c ≢ 0 (mod pM+1) and, in particular, f′(α) + (β – α)c ≠ 0. Thus we must have β – α = 0.

Note that αn in the last proof satisfies the congruence

f(αn) ≡ 0 (mod p2M+n+1)

for each n ≥ 0. We are given the solution α0 corresponding to n = 0. From this, we inductively construct the solutions α1, α2, . . . corresponding to n = 1, 2, . . . respectively. The process for computing αn from αn–1 as described in the proof of Hensel’s lemma is referred to as Hensel lifting. The given conditions ensure that this lifting is possible (and uniquely doable) for every n ≥ 1, and in the limit n → ∞ we get a root of f. Since each kn is required modulo p, we can take kn ∈ {0, 1, . . . , p – 1}. So α admits a p-adic expansion of the form α = α0 + k1pM+1 + k2pM+2 + k3pM+3 + · · ·.

The special case M = 0 for Hensel’s lemma is now singled out:

Corollary 2.30.

Let f ∈ ℤp[X]. Suppose that there exists an α0 ∈ ℤp satisfying:

  1. |f(α0)|p < 1 (that is, α0 is a solution of f(X) ≡ 0 (mod p)), and

  2. |f′(α0)|p = 1 (that is, f′(α0) ≢ 0 (mod p), that is, α0 is a simple root of f modulo p).

Then there exists a unique α ∈ ℤp such that f(α) = 0 and |α – α0|p < 1 (that is, α ≡ α0 (mod p)).

For this special case, we compute solutions αn of f(X) ≡ 0 (mod pn+1) inductively for n = 1, 2, 3, . . . , given a suitable solution α0 of this congruence for n = 0. The lifting formula is now:

Equation 2.21

αn := αn–1 + knpn, where kn ≡ –(f(αn–1)/pn)f′(αn–1)–1 (mod p).
Example 2.35.

ℤ is canonically embedded in ℤp, and so ℤ[X] is in ℤp[X]. Thus it makes sense to carry out the lifting process for a polynomial f ∈ ℤ[X] and for some solution α0 ∈ ℤ of f(X) ≡ 0 (mod p). One solves Formula (2.21) in ℤ and obtains each αn ∈ ℤ. The limit α belongs to ℤp and is a solution of f(X) = 0 in ℤp.

For example, let p be an odd prime and a ∈ ℤ a quadratic residue modulo p with p ∤ a. Let α0 ∈ ℤ be a solution of X2 ≡ a (mod p). Here f(X) = X2 – a, so that f′(X) = 2X, that is, f′(α0) = 2α0 ≢ 0 (mod p). Thus the conditions of Corollary 2.30 are satisfied and we get a unique square root α of a in ℤp with α ≡ α0 (mod p). This α has a p-adic expansion of the form α = α0 + k1p + k2p2 + k3p3 + · · ·.

As a specific numerical example, take p = 7, a = 2 and α0 = 3. Using Formula (2.21), we compute k1 = 1, α1 = 10, k2 = 2, α2 = 108, k3 = 6, α3 = 2166, and so on. Thus a square root of 2 in ℤ7 is 3 + 1 × 7 + 2 × 72 + 6 × 73 + · · ·. The other square root of 2 in ℤ7 can be obtained by starting with α0 = 4.
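This computation can be replayed mechanically. The sketch below (the helper hensel_sqrt is ours, not the book's) applies Formula (2.21) to f(X) = X2 – a and reproduces the digits above:

```python
def hensel_sqrt(a: int, p: int, alpha0: int, digits: int):
    """Lift a solution alpha0 of x^2 ≡ a (mod p) to higher powers of p.
    Returns the successive lifts alpha_n and the digits k_n of Formula (2.21)
    for f(X) = X^2 - a."""
    # f'(alpha_{n-1}) ≡ f'(alpha0) (mod p), so one inverse mod p suffices.
    fprime_inv = pow(2 * alpha0 % p, -1, p)
    alpha, alphas, ks = alpha0, [alpha0], []
    for n in range(1, digits):
        f_val = alpha * alpha - a            # f(alpha_{n-1}), divisible by p^n
        k = (-(f_val // p**n) * fprime_inv) % p
        alpha += k * p**n                    # alpha_n := alpha_{n-1} + k_n p^n
        ks.append(k)
        alphas.append(alpha)
    return alphas, ks

alphas, ks = hensel_sqrt(2, 7, 3, 4)
print(alphas)  # [3, 10, 108, 2166]
print(ks)      # [1, 2, 6]
```

Each αn satisfies αn2 ≡ 2 (mod 7n+1), matching the hand computation in the text.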

Exercise Set 2.14

2.144
  1. Establish that any p-adic integer (an) can be uniquely described as a sequence of integers xn satisfying 0 ≤ xn < p for every n ≥ 0 and an ≡ x0 + x1p + · · · + xn–1pn–1 (mod pn) for every n ≥ 1. In this case, the p-adic integer (an) is written as the infinite series

    (an) = x0 + x1p + x2p2 + · · ·.

    One calls the above series the p-adic expansion of (an). Note that the sum in the above series is not to be treated as one of integers. However, for a ∈ ℕ0 the expansion of a to the base p is the same as the p-adic expansion of a (more correctly, of the image of a in ℤp). In other words, if the p-adic expansion of (an) is terminating, that is, xN = xN+1 = xN+2 = · · · = 0 for some N, then (an) can be identified with the rational integer x0 + x1p + · · · + xN–1pN–1. A non-terminating p-adic series, on the other hand, diverges under the Archimedean norm, but converges under the p-adic norm and corresponds to an element of ℤp not in ℕ0. The rational integer –1, for example, has the infinite p-adic expansion (p – 1) + (p – 1)p + (p – 1)p2 + · · ·. The sum telescopes and in the limit n → ∞ converges (under the p-adic norm) to limn→∞ pn – 1 = –1.

  2. Let a ∈ ℤp. Write the p-adic expansion for –a. [H]

  3. Given p-adic integers a := x0 + x1p + x2p2 + · · · and b := y0 + y1p + y2p2 + · · · , find the p-adic integers c := z0 + z1p + z2p2 + · · · and d := w0 + w1p + w2p2 + · · · , such that c = a + b and d = ab. (Express each zn and wn explicitly in terms of xn’s and yn’s.)
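The truncations a mod pn determine the digits xn, which gives a simple way to compute initial segments of p-adic expansions, including the expansion of –1 discussed above. A small sketch (the function name is ours):

```python
def p_adic_digits(a: int, p: int, count: int):
    """First `count` digits x_0, x_1, ... of the p-adic expansion of the
    integer a: repeatedly take the least p-ary digit of the remaining tail.
    Works for negative a too, producing a non-terminating expansion."""
    digits = []
    for _ in range(count):
        d = a % p          # the digit, always in {0, ..., p-1}
        digits.append(d)
        a = (a - d) // p   # peel off the digit and shift by one p-adic place
    return digits

print(p_adic_digits(-1, 5, 4))  # [4, 4, 4, 4]: -1 = (p-1) + (p-1)p + ...
print(p_adic_digits(10, 3, 4))  # [1, 0, 1, 0]: 10 = 1 + 0*3 + 1*9
```

The partial sums agree with a modulo pn, which is exactly the defining congruence of Part 1.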

2.145In view of Exercise 2.144, every x ∈ ℤp admits a unique expansion of the form x = x0 + x1p + x2p2 + · · · , where each xn ∈ {0, 1, . . . , p – 1}. This notion of p-adic expansion can be extended to the elements of ℚp.
  1. Show that for non-zero x ∈ ℚp, there exist an integer r ≤ 0 and unique integers xr, xr+1, . . . , x–1, x0, x1, . . . , each in {0, 1, . . . , p – 1}, such that x can be written as:

    x = xrpr + xr+1pr+1 + · · · + x–1p–1 + x0 + x1p + x2p2 + · · ·.

  2. Describe how to compute the p-adic expansions of x + y and xy given those for x, y ∈ ℚp. Also of x/y provided that y ≠ 0.

  3. What is |x|p for x ∈ ℤp with x0 ≠ 0?

  4. What is |x|p for x ∈ ℚp with xr ≠ 0?

2.146Let p be an odd prime and a ∈ ℤ a quadratic residue modulo p with p ∤ a. From elementary number theory we know that the congruence x2 ≡ a (mod pn) has two solutions for every n ≥ 1. Let x1 be a solution of x2 ≡ a (mod p). We know that a solution xn of x2 ≡ a (mod pn) lifts uniquely to a solution xn+1 of x2 ≡ a (mod pn+1). Thus we can inductively compute a sequence x1, x2, x3, . . . of integers. Show that (xn) is a p-adic integer and that (xn)2 = (a).
2.147
  1. Show that the ring ℤp contains all rationals of the form a/b with a, b ∈ ℤ, p ∤ b. This implies that ℤ ⊊ ℤp.

  2. Take a := 17 for p = 2, a := 7 for p = 3 and a := p + 1 for p > 3. Show that there exists x ∈ ℤp with x2 = a in ℤp. Show also that such an x does not belong to ℚ. Thus ℤp ⊄ ℚ.

  3. Show that . Thus .

2.148Prove the following assertions:
  1. .

  2. .

  3. Every non-zero ideal of ℤp is of the form prℤp for some integer r ≥ 0.

  4. The ideals of Part (c) satisfy the infinite strictly descending chain ℤp ⊋ pℤp ⊋ p2ℤp ⊋ · · ·.

  5. ℤp is a local domain with the maximal ideal pℤp.

  6. The ideal prℤp of Part (c) is the principal ideal of ℤp generated by pr. In particular, ℤp is a local PID, that is, a discrete valuation domain (Exercise 2.133), with the residue field ℤp/pℤp.

2.149Compute the p-adic expansion of 1/3 in and of –2/5 in .
2.150Show that ℤ is dense in ℤp under the p-adic norm | |p, that is, show that given any x ∈ ℤp and real ε > 0, there exists a ∈ ℤ with |x – a|p < ε. Show also that ℚ is dense in ℚp.
2.151Prove the following assertions that establish that ℤp is the closure of ℤ in ℚp under | |p.
  1. Every sequence (an) of rational integers, Cauchy under | |p, converges in ℤp.

  2. If a sequence (an) of rational numbers, Cauchy under | |p, converges to x ∈ ℤp, then there exists a sequence (bn) of rational integers, Cauchy under | |p, that converges to x.

2.152Show that:
  1. The series converges in .

  2. The series converges in .

  3. in . [H]

  4. The series does not converge in .

  5. If a ∈ ℚp and |a|p < 1, then 1 + a + a2 + · · · = 1/(1 – a).

2.153Prove that for any non-zero . [H]
2.154Prove that for any a ∈ ℤp the sequence (apn) converges in ℤp. [H]
2.155Let p, q be distinct primes. Show that the fields ℚp and ℚq are not isomorphic.
2.156Let a be an integer congruent to 1 modulo 8. Show that there exists an α ∈ ℤ2 such that α2 = a.
2.157Compute α ∈ ℤ3 with α2 + α + 223 = 0 and α ≡ 4 (mod 243).
2.158Let p be an odd prime and a ∈ ℚp non-zero. Show that the polynomial X2 – a has either zero or exactly two roots in ℚp.
2.159Show that the polynomial X2 – p is irreducible in ℚp[X].
2.160

Teichmüller representative Let a ∈ ℤp. Show that there exists a unique α ∈ ℤp such that αp = α and α ≡ a (mod p).

2.161Show that the algebraic closure of ℚp is of infinite extension degree over ℚp. [H]

2.15. Statistical Methods

Many attacks on cryptosystems involve statistical analysis of ciphertexts and of data collected from the victim’s machine during one or more private-key operations. For a proper understanding of these analysis techniques, one requires some knowledge of statistics and random variables. In this section, we provide a quick overview of some statistical tools. We assume that the reader is already familiar with the elementary notion of probability. We denote the probability of an event E by Pr(E).

2.15.1. Random Variables and Their Probability Distributions

An experiment whose outcome is random is referred to as a random experiment. The set of all possible outcomes of a random experiment is called the sample space of the experiment. For example, the outcomes of tossing a coin can be mapped to the set {H, T} with H and T standing respectively for head and tail. It is convenient to assign numerical values to the outcomes of a random experiment. Identifying head with 0 and tail with 1, one can view coin tossing as a random experiment with sample space {0, 1}. Some other random experiments include throwing a die (with sample space {1, 2, 3, 4, 5, 6}), the life of an electric bulb (with sample space ℝ≥0, the set of all non-negative real numbers), and so on. Unless otherwise specified, we henceforth assume that sample spaces are subsets of ℝ.

A random variable is a variable which can assume (all and only) the values from a (given) sample space.

A discrete random variable can assume only countably many values, that is, the sample space SX of a discrete random variable X either is finite or has a bijection with ℕ, that is, we can enumerate the elements of SX as x1, x2, x3, . . ..

The probability distribution function or the probability mass function

fX : SX → [0, 1]

of a discrete random variable X assigns to each x in the sample space SX of X the probability fX(x) = Pr(X = x) of the occurrence of the value x in a random experiment.[21] We have fX(x) ≥ 0 and Σx∈SX fX(x) = 1.

[21] [a, b] is the closed interval consisting of all real numbers u satisfying aub. Similarly, the open interval (a, b) is the set of all real values u satisfying a < u < b. In order to make a distinction between the open interval (a, b) and the ordered pair (a, b), many—mostly Europeans—use the notation ]a, b[ for denoting open intervals.

A continuous random variable assumes an uncountable number of values, that is, the sample space SX of a continuous random variable X cannot be in bijective correspondence with a subset of ℕ. Typically SX is an interval [a, b] or (a, b) with –∞ ≤ a < b ≤ +∞.

One does not assign individual probabilities Pr(X = x) to a value assumed by a continuous random variable X.[22] The probabilistic behaviour of X is in this case described by the probability density function fX : SX → ℝ≥0

[22] More correctly, Pr(X = x) = 0 for each x ∈ SX.

with the implication that the probability that X occurs in the interval [c, d] (or (c, d)) is given by the integral ∫cd fX(x)dx,

that is, by the area between the x-axis, the curve fX(x) and the vertical lines x = c and x = d. We have ∫SX fX(x)dx = 1.

It is sometimes useful to set fX(x) := 0 for x ∉ SX, so that fX is defined on the entire real line ℝ.

The cumulative probability distribution of a random variable X (discrete or continuous) is the function FX(x) := Pr(X ≤ x) for all x ∈ ℝ. If X is continuous, we have

FX(x) = ∫–∞x fX(u)du,

which implies that fX(x) = F′X(x) wherever the derivative exists.

2.15.2. Operations on Random Variables

Let X and Y be discrete random variables. The joint probability distribution of X, Y refers to a random variable Z with SZ = SX × SY. For z = (x, y), the probability of Z = z is denoted by fZ(z) = Pr(Z = z) = Pr(X = x, Y = y). The probability Pr(X = x, Y = y) stands for the probability that X = x and Y = y. The random variables X and Y are called independent, if

Pr(X = x, Y = y) = Pr(X = x) Pr(Y = y)

for all x, y.

Example 2.36.

Suppose that we have an urn containing three identical balls with labels 1, 2, 3. We draw two balls randomly from the urn. Let us denote the outcome of the first drawing by X and that of the second drawing by Y. We consider the joint distribution X, Y of the two outcomes in the two following cases:

  1. The balls are drawn with replacement, that is, after the first ball is drawn, it is returned to the urn (and the urn is shaken well), before the next ball is drawn. The joint probability distribution is now as follows:

    x  y  Pr(X = x, Y = y)
    1  1  1/9
    1  2  1/9
    1  3  1/9
    2  1  1/9
    2  2  1/9
    2  3  1/9
    3  1  1/9
    3  2  1/9
    3  3  1/9

    In this case, the outcome of the second drawing is not influenced by the outcome of the first drawing; that is, X and Y are independent, and we have Pr(X = x, Y = y) = Pr(X = x) Pr(Y = y) = 1/9 for all x, y, as expected.

  2. The balls are drawn without replacement, that is, the ball obtained by the first drawing is not returned to the urn, before the second ball is drawn. In this case, the outcome of the second drawing is influenced by that of the first drawing in the sense that the same ball cannot be drawn on both occasions. Thus, X and Y are now dependent. This is revealed by the following joint probability distribution:

    x  y  Pr(X = x, Y = y)
    1  1  0
    1  2  1/6
    1  3  1/6
    2  1  1/6
    2  2  0
    2  3  1/6
    3  1  1/6
    3  2  1/6
    3  3  0
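The two cases can be checked programmatically. The sketch below (helper names are ours) builds both joint distribution tables and tests the independence criterion Pr(X = x, Y = y) = Pr(X = x) Pr(Y = y):

```python
from fractions import Fraction
from itertools import product

balls = [1, 2, 3]

# Joint distributions Pr(X = x, Y = y) for the two drawing schemes.
with_repl = {(x, y): Fraction(1, 9) for x, y in product(balls, balls)}
without_repl = {(x, y): (Fraction(0) if x == y else Fraction(1, 6))
                for x, y in product(balls, balls)}

def marginals(joint):
    """Individual distributions of X and Y obtained by summing the joint one."""
    fx = {x: sum(joint[(x, y)] for y in balls) for x in balls}
    fy = {y: sum(joint[(x, y)] for x in balls) for y in balls}
    return fx, fy

def independent(joint):
    fx, fy = marginals(joint)
    return all(joint[(x, y)] == fx[x] * fy[y] for x, y in joint)

print(independent(with_repl))     # True
print(independent(without_repl))  # False
```

Exact fractions make the independence test an equality check rather than a floating-point comparison.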

For continuous random variables X and Y, the joint distribution is defined by the probability density function fX,Y(x, y), and the cumulative distribution is obtained by the double integral FX,Y(c, d) = ∫–∞c ∫–∞d fX,Y(x, y) dy dx.

X and Y are independent, if fX,Y (x, y) = fX(x)fY (y) for all x, y. In this case, we also have FX,Y (c, d) = FX(c)FY (d) for all c, d.

Now, we define arithmetic operations on random variables. First, let X and Y be discrete random variables. The sum X + Y is defined to be a random variable U which assumes the values u = x + y for x ∈ SX and y ∈ SY with probability fU(u) = Pr(U = u) = Σx+y=u Pr(X = x, Y = y).

The product XY of X and Y is defined to be a random variable V which assumes the values v = xy for x ∈ SX and y ∈ SY with probability fV(v) = Pr(V = v) = Σxy=v Pr(X = x, Y = y).

For α ∈ ℝ, the random variable W = αX assumes the values w = αx for x ∈ SX with probability

fW(w) = Pr(W = αx) = Pr(X = x) = fX(x).

Example 2.37.

Let us consider the random variables X and Y of Example 2.36. For the sake of brevity, we denote Pr(X = x, Y = y) by Pxy. The distributions of U = X + Y in the two cases are as follows:

  1. Drawing with replacement:

    Pr(U = 2) = P11 = 1/9
    Pr(U = 3) = P12 + P21 = 2/9
    Pr(U = 4) = P13 + P22 + P31 = 1/3
    Pr(U = 5) = P23 + P32 = 2/9
    Pr(U = 6) = P33 = 1/9

  2. Drawing without replacement:

    Pr(U = 3) = P12 + P21 = 1/3
    Pr(U = 4) = P13 + P31 = 1/3
    Pr(U = 5) = P23 + P32 = 1/3

Now, let us consider continuous random variables X and Y. In this case, it is easier to define first the cumulative distribution functions of U = X + Y, V = XY and W = αX (for example, FU(u) = Pr(X + Y ≤ u) = ∫∫x+y≤u fX,Y(x, y)dx dy) and then the probability density functions by taking derivatives:

One can easily generalize sums and products to an arbitrary finite number of random variables. More generally, if X1, . . . , Xn are random variables and g : ℝn → ℝ, one can talk about the probability distribution or density function of the random variable g(X1, . . . , Xn). (See Exercise 2.163.)

Now, we introduce the important concept of conditional probability. Let X and Y be two random variables. To start with, suppose that they are discrete. We denote by f(x, y) = Pr(X = x, Y = y) the joint probability distribution function of X, Y. For y ∈ SY with Pr(Y = y) > 0, we define the conditional probability of X = x given Y = y as fX|y(x) := f(x, y)/fY(y).

For a fixed y ∈ SY, the probabilities fX|y(x), x ∈ SX, constitute the probability distribution function of the random variable X|y (X given Y = y). If X and Y are independent, f(x, y) = fX(x)fY(y) and so fX|y(x) = fX(x) for all x ∈ SX, that is, the random variables X and X|y have the same probability distribution. This is expected, because in this case the probability of X = x does not depend on whatever value y the variable Y takes.

If X and Y are continuous random variables with joint density f(x, y) and fY(y) > 0, the conditional probability density function of X|y (X given Y = y) is defined by fX|y(x) := f(x, y)/fY(y).

Again if X and Y are independent, we have fX|y(x) = fX(x) for all x, y.

For a fixed x ∈ SX with fX(x) > 0, one can likewise define the conditional probabilities fY|x(y) := f(x, y)/fX(x) for all y ∈ SY.
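For the urn example, the conditional distribution of X given Y = 2 is easy to tabulate. A sketch (the function name is ours):

```python
from fractions import Fraction
from itertools import product

balls = [1, 2, 3]
joint = {(x, y): (Fraction(0) if x == y else Fraction(1, 6))
         for x, y in product(balls, balls)}

def conditional_X_given_y(joint, y):
    """f_{X|y}(x) = f(x, y) / f_Y(y), defined when f_Y(y) > 0."""
    fy = sum(joint[(x, y)] for x in balls)
    return {x: joint[(x, y)] / fy for x in balls}

print(conditional_X_given_y(joint, 2))  # x = 1 and x = 3 each get 1/2; x = 2 gets 0
```

Given that the second ball drawn is ball 2, the first ball is equally likely to have been ball 1 or ball 3, and cannot have been ball 2, exactly as drawing without replacement dictates.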

Let X and Y be discrete random variables with joint distribution f(x, y). Also let Γ ⊆ SX and Δ ⊆ SY. One defines the probability fX(Γ) as:

fX(Γ) := Σx∈Γ fX(x).

The joint probability f(Γ, Δ) is defined as:

f(Γ, Δ) := Σx∈Γ Σy∈Δ f(x, y).

If Γ = {x} is a singleton, we prefer to write f(x, Δ) instead of f({x}, Δ). Similarly, f(Γ, y) stands for f(Γ, {y}). We also define the conditional distributions:

fX|Δ(Γ) := f(Γ, Δ)/fY(Δ) and fY|Γ(Δ) := f(Γ, Δ)/fX(Γ).

We abbreviate fX|Δ(Γ) as Pr(Γ|Δ) and fY|Γ(Δ) as Pr(Δ|Γ).

Theorem 2.64. Bayes rule

Let X, Y be discrete random variables and Δ ⊆ SY with fY(Δ) > 0. Also let Γ1, . . . , Γn form a partition of SX with fX(Γi) > 0 for all i = 1, . . . , n. Then we have:

fX|Δ(Γi) = fY|Γi(Δ)fX(Γi) / (Σj=1n fY|Γj(Δ)fX(Γj)),

that is, in terms of probability:

Pr(Γi|Δ) = Pr(Δ|Γi) Pr(Γi) / (Σj=1n Pr(Δ|Γj) Pr(Γj)).
Proof

Pr(Γi, Δ) = Pr(Δ|Γi) Pr(Γi) = Pr(Γi|Δ) Pr(Δ). So it is sufficient to show that Pr(Δ) equals the sum in the denominator. The event Δ is the union of the pairwise disjoint events (Γj, Δ), j = 1, . . . , n, and so Pr(Δ) = Σj=1n Pr(Γj, Δ) = Σj=1n Pr(Δ|Γj) Pr(Γj).

The Bayes rule relates the a priori probabilities Pr(Γj) and Pr(Δ|Γj) to the a posteriori probabilities Pr(Γi|Δ). The following example demonstrates this terminology.

Example 2.38.

Consider the random experiment of Example 2.36(2). Take Γj := {j} for j = 1, 2, 3 and Δ := {2, 3}. We have the following a priori probabilities:

Pr(Γj)=Probability of getting ball j in the first draw = 1/3,
Pr(Δ|Γ1)=Probability of getting the second or the third ball in the second draw, given that the first ball is obtained in the first draw = 1,
Pr(Δ|Γ2)=Probability of getting the second or the third ball in the second draw, given that the second ball is obtained in the first draw = 1/2,
Pr(Δ|Γ3)=Probability of getting the second or the third ball in the second draw, given that the third ball is obtained in the first draw = 1/2.

The a posteriori probability Pr(Γ1|Δ) that the first ball was obtained in the first draw given that the ball obtained in the second draw is the second or the third one is calculated using the Bayes rule as:

Pr(Γ1|Δ) = (1 × (1/3)) / (1 × (1/3) + (1/2) × (1/3) + (1/2) × (1/3)) = (1/3)/(2/3) = 1/2.

One can similarly calculate Pr(Γ2|Δ) = Pr(Γ3|Δ) = 1/4. This is expected, since the only events (x, y) consistent with Δ are the four equiprobable possibilities (1, 2), (1, 3), (2, 3) and (3, 2), and Γ1 accounts for exactly two of them.
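The calculation can be verified directly from the a priori data. A sketch (variable names are ours):

```python
from fractions import Fraction

# A priori data from Example 2.38 (drawing without replacement).
prior = {1: Fraction(1, 3), 2: Fraction(1, 3), 3: Fraction(1, 3)}    # Pr(Gamma_j)
likelihood = {1: Fraction(1), 2: Fraction(1, 2), 3: Fraction(1, 2)}  # Pr(Delta | Gamma_j)

def posterior(i):
    """Bayes rule: Pr(Gamma_i | Delta)."""
    denom = sum(likelihood[j] * prior[j] for j in prior)
    return likelihood[i] * prior[i] / denom

print(posterior(1))  # 1/2
print(posterior(2))  # 1/4
```

Since the Γi partition the sample space, the three posteriors necessarily sum to 1.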

2.15.3. Expectation, Variance and Correlation

Let X be a random variable. The expectation E(X) of X is defined as follows:

E(X) := Σx∈SX x fX(x) if X is discrete, and E(X) := ∫SX x fX(x)dx if X is continuous.

E(X) is also called the (arithmetic) mean or average of X. One uses the alternative symbol μX to denote E(X). More generally, let X1, . . . , Xn be n random variables with joint probability distribution/density function f(x1, . . . , xn). Also let g : ℝn → ℝ. We define the following expectations:

X1, . . . , Xn discrete: E(g(X1, . . . , Xn)) := Σ g(x1, . . . , xn)f(x1, . . . , xn), the sum being over all tuples (x1, . . . , xn).

X1, . . . , Xn continuous: E(g(X1, . . . , Xn)) := ∫ · · · ∫ g(x1, . . . , xn)f(x1, . . . , xn)dx1 · · · dxn.

Let g(X) and h(Y) be real polynomial functions of the random variables X and Y and let α ∈ ℝ. Then

E(g(X) + h(Y)) = E(g(X)) + E(h(Y)),
E(g(X)h(Y)) = E(g(X)) E(h(Y)) if X and Y are independent,
E(αg(X)) = αE(g(X)).

Let us derive the sum and product formulas for discrete variables X and Y. With f(x, y) = Pr(X = x, Y = y), we have

E(X + Y) = Σx,y (x + y)f(x, y) = Σx x fX(x) + Σy y fY(y) = E(X) + E(Y).

If X and Y are independent, then

E(XY) = Σx,y xy fX(x)fY(y) = (Σx x fX(x))(Σy y fY(y)) = E(X) E(Y).
The variance Var(X) of a random variable X is defined as

Var (X) := E[(X – E(X))2].

From the observation that E[(X – E(X))2] = E[X2 – 2 E(X)X + [E(X)]2] = E(X2) – 2 E(X) E(X) + [E(X)]2, we derive the computational formula:

Var (X) = E[X2] – [E(X)]2.

Var(X) is a measure of how the values of X are dispersed about the mean E(X) and is always a non-negative quantity. The (non-negative) square root of Var(X) is called the standard deviation σX of X:

σX := √Var(X).
The following formulas can be easily verified:

Var(X + α) = Var(X),
Var(αX) = α2 Var(X),
Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y),

where α ∈ ℝ and where the covariance Cov(X, Y) of X and Y is defined as:

Cov(X, Y) := E[(X – E(X))(Y – E(Y))] = E(XY) – E(X) E(Y).

Normalized covariance is a measure of correlation between the two random variables X and Y. More precisely, the correlation coefficient ρX,Y is defined as:

ρX,Y := Cov(X, Y)/(σXσY).
If X and Y are independent, E(XY) = E(X) E(Y) so that Cov(X, Y) = 0 and so ρX,Y = 0. The converse of this is, however, not true, that is, ρX,Y = 0 does not necessarily imply that X and Y are independent. ρX,Y is a real value in the interval [–1, 1] and is a measure of linear relationship between X and Y. If larger (resp. smaller) values of X are (in general) associated with larger (resp. smaller) values of Y, then ρX,Y is positive. On the other hand, if larger (resp. smaller) values of X are (in general) associated with smaller (resp. larger) values of Y, then ρX,Y is negative.

Example 2.39.

Once again consider the drawing of two balls from an urn containing three balls labelled {1, 2, 3} (Examples 2.36, 2.37 and 2.38). Look at the second case (drawing without replacement). We use the shorthand notation Pxy for Pr(X = x, Y = y). The individual probability distributions of X and Y can be obtained from the joint distribution as follows:

Pr(X = 1) = P11 + P12 + P13 = 0 + (1/6) + (1/6) = 1/3
Pr(X = 2) = P21 + P22 + P23 = (1/6) + 0 + (1/6) = 1/3
Pr(X = 3) = P31 + P32 + P33 = (1/6) + (1/6) + 0 = 1/3

Pr(Y = 1) = P11 + P21 + P31 = 0 + (1/6) + (1/6) = 1/3
Pr(Y = 2) = P12 + P22 + P32 = (1/6) + 0 + (1/6) = 1/3
Pr(Y = 3) = P13 + P23 + P33 = (1/6) + (1/6) + 0 = 1/3

Thus E(X) = 1 × (1/3) + 2 × (1/3) + 3 × (1/3) = 2. Similarly, E(Y) = 2. Therefore, E(X + Y) = E(X) + E(Y) = 4. This can also be verified by direct calculations: E(X + Y) = 3 × (1/3) + 4 × (1/3) + 5 × (1/3) = 4.

E(X2) = E(Y2) = 12 × (1/3) + 22 × (1/3) + 32 × (1/3) = 14/3 and Var(X) = Var(Y) = (14/3) – 22 = 2/3. The probability distribution for XY is

Pr(XY = 2) = P12 + P21 = 1/3
Pr(XY = 3) = P13 + P31 = 1/3
Pr(XY = 6) = P23 + P32 = 1/3,

so that E(XY) = 2 × (1/3) + 3 × (1/3) + 6 × (1/3) = 11/3. Therefore, Cov(X, Y) = E(XY) – E(X) E(Y) = (11/3) – 2 × 2 = –1/3, that is,

ρX,Y = Cov(X, Y)/(σXσY) = (–1/3)/(2/3) = –1/2.
The negative correlation between X and Y is expected. If X = 1 (small), Y takes bigger values (2, 3). On the other hand, if X = 3 (large), Y assumes smaller values (1, 2). Of course, the correlation is not perfect, since for X = 2 the values of Y can be smaller (1) or larger (3). So, we should feel happy to see a not-so-negative correlation of –1/2 between X and Y.
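All the quantities of this example can be recomputed exactly with rational arithmetic. A sketch (helper names are ours):

```python
from fractions import Fraction
from itertools import product

balls = [1, 2, 3]
joint = {(x, y): (Fraction(0) if x == y else Fraction(1, 6))
         for x, y in product(balls, balls)}

def expect(g):
    """E[g(X, Y)] with respect to the joint distribution."""
    return sum(pr * g(x, y) for (x, y), pr in joint.items())

EX   = expect(lambda x, y: x)                # 2
EY   = expect(lambda x, y: y)                # 2
VarX = expect(lambda x, y: x * x) - EX * EX  # 14/3 - 4 = 2/3
VarY = expect(lambda x, y: y * y) - EY * EY
cov  = expect(lambda x, y: x * y) - EX * EY  # 11/3 - 4 = -1/3

sigma_prod = VarX    # here Var(X) = Var(Y), so sigma_X * sigma_Y = Var(X) exactly
rho = cov / sigma_prod
print(EX, VarX, cov, rho)  # 2, 2/3, -1/3, -1/2
```

Because Var(X) = Var(Y), the product σXσY equals Var(X), so no square roots (and no floating point) are needed to obtain ρX,Y = –1/2.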

2.15.4. Some Famous Probability Distributions

Some probability distributions that occur frequently in statistical theory and in practice are described now. Some other useful probability distributions are considered in the Exercises 2.169, 2.170 and 2.171.

Uniform distribution

A discrete uniform random variable U has sample space SU := {x1, . . . , xn} and probability distribution

fU(xi) = 1/n for i = 1, . . . , n.

A continuous uniform random variable U has sample space SU and probability density function

fU(x) = 1/A for x ∈ SU,

where A > 0 is the size[23] of SU. For example, if SU is the real interval [a, b] for a < b, we have fU(x) = 1/(b – a) for a ≤ x ≤ b.

[23] If SU ⊆ ℝ, “size” means length. If SU ⊆ ℝ2 or SU ⊆ ℝ3, “size” refers to area or volume respectively. We assume that the size of SU is “measurable”.

In this case, we have

E(U) = (a + b)/2 and Var(U) = (b – a)2/12.

Uniform random variables often occur naturally. For example, if we throw an unbiased die, the six possible outcomes (1 through 6) are equally likely, that is, each possible outcome has the probability 1/6. Similarly, if a real number is chosen randomly in the interval [0, 1], we have a continuous uniform random variable. The built-in C library call rand() (pretends to) return an integer between 0 and 231 – 1, each with equal probability (namely, 2–31).
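For the unbiased die, the mean and variance follow directly from the definitions. A quick check in exact arithmetic:

```python
from fractions import Fraction

faces = range(1, 7)          # an unbiased die: each face has probability 1/6
p = Fraction(1, 6)

E   = sum(p * x for x in faces)           # E(U)   = 7/2
E2  = sum(p * x * x for x in faces)       # E(U^2) = 91/6
Var = E2 - E * E                          # Var(U) = 35/12
print(E, Var)
```

The computational formula Var(U) = E(U2) – [E(U)]2 from the previous subsection is used here verbatim.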

Bernoulli distribution

The Bernoulli random variable B = B(n, p) is a discrete random variable characterized by two parameters n ∈ ℕ and p ∈ [0, 1], where p stands for the probability of a certain event E and n represents the number of (independent) trials. It is assumed that the probability of E remains constant (namely, p) in each of the n trials. The sample space SB = {0, 1, . . . , n} comprises the (exact) numbers of occurrences of E in the n trials. B has the probability distribution

fB(x) = C(n, x)px(1 – p)n–x for x = 0, 1, . . . , n, where C(n, x) = n!/(x!(n – x)!),

as follows from simple combinatorial arguments. The mean and variance of B are:

E(B) = np and Var(B) = np(1 – p).

The Bernoulli distribution is also called the binomial distribution.
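The pmf and the stated mean and variance can be checked in exact arithmetic. A sketch (the helper binom_pmf is ours):

```python
from fractions import Fraction
from math import comb

def binom_pmf(n, p):
    """Binomial pmf: f_B(x) = C(n, x) p^x (1-p)^(n-x) for x = 0, ..., n."""
    return {x: comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)}

n, p = 5, Fraction(1, 3)
f = binom_pmf(n, p)
mean = sum(x * f[x] for x in f)
var  = sum(x * x * f[x] for x in f) - mean * mean
print(mean, var)   # 5/3 and 10/9, i.e. np and np(1 - p)
```

The pmf values also sum to 1, as any probability distribution must.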

Normal distribution

The normal random variable or the Gaussian random variable N = N(μ, σ2) is a continuous random variable characterized by two real parameters μ and σ with σ > 0. The density function of N is

fN(x) = (1/(σ√(2π))) exp(–(x – μ)2/(2σ2)).

The cumulative distribution for N can be expressed in terms of the error function erf():

FN(x) = (1/2)(1 + erf((x – μ)/(σ√2))).
The error function does not have a known closed-form expression. Figure 2.3 shows the curves for fN (x) and FN (x) for the parameter values μ = 0 and σ = 1 (in this case, N is called the standard normal variable).

Figure 2.3. Standard normal distribution


Some statistical properties of N are:

E(N) = μandVar(N) = σ2.

The curve fN (x) is symmetric about x = μ. Most of the area under the curve is concentrated in the region μ – 3σ ≤ x ≤ μ + 3σ. More precisely:

Pr(μ – σ ≤ X ≤ μ + σ) ≈ 0.68,
Pr(μ – 2σ ≤ X ≤ μ + 2σ) ≈ 0.95,
Pr(μ – 3σ ≤ X ≤ μ + 3σ) ≈ 0.997.

Many distributions occurring in practice (and in nature) approximately follow normal distributions. For example, the height of (adult) people in a given community is roughly normally distributed. Of course, the height of a person cannot be negative, whereas a normal random variable may assume negative values. But, in practice, the probability that such an approximating normal variable assumes a negative value is typically negligibly low.
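These three probabilities follow from the erf() expression for FN. A sketch using the standard library error function:

```python
from math import erf, sqrt

def normal_cdf(x, mu=0.0, sigma=1.0):
    """F_N(x) expressed through the error function."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

# Probability mass within k standard deviations of the mean:
for k in (1, 2, 3):
    pr = normal_cdf(k) - normal_cdf(-k)
    print(k, round(pr, 4))   # approximately 0.6827, 0.9545, 0.9973
```

By symmetry of fN about μ the same figures hold for any μ and σ, not just the standard normal.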

2.15.5. Sample Mean, Variation and Correlation

In practice, we often do not know a priori the probability distribution or density function of a random variable X. In some cases, we do not have the complete data, whereas in some other cases we need an infinite amount of data to obtain the actual probability distribution of a random variable. For example, let X represent the life of an electric bulb manufactured by a given company in the last ten years. Even though there are only finitely many such bulbs and even if we assume that it is possible to trace the working of every such bulb, we have to wait until all these bulbs burn out, before we know the actual distribution of X. That is certainly impractical. On the contrary, if we have data on the life-times of some sample bulbs, we can approximate the properties of X by those of the samples.

Suppose that S := (x1, x2, . . . , xn) is a sample of size n. We assume that all xi are real numbers. We define the following quantities for S:

mean(S) := x̄ := (1/n)(x1 + x2 + · · · + xn),
Var(S) := (1/n)((x1 – x̄)2 + · · · + (xn – x̄)2).

Here x̄ is the mean of the collection S.

If T := (y1, y2, . . . , ym) is another sample (of real numbers), the (linear) relationship between S and T is measured by the sample covariance and the sample correlation coefficient, defined analogously to their counterparts for random variables. Here one uses the mean of the collection ST := (xiyj | i = 1, . . . , n, j = 1, . . . , m).

An important property of the normal distribution is the following:

Theorem 2.65. Central limit theorem

Let X be any random variable with mean μ and variance σ2 and let n ∈ ℕ. The mean of a random sample S of size n chosen according to the distribution of X approximately follows the normal distribution N(μ, σ2/n). The larger the sample size n is, the better this approximation is.
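The theorem can be illustrated by simulation (our own toy experiment, with a die-rolling setup and a fixed seed chosen for reproducibility): sample means of n die rolls should have mean close to μ = 3.5 and variance close to σ2/n = (35/12)/n.

```python
import random

random.seed(1)
mu, var = 3.5, 35 / 12          # mean and variance of a single die roll
n, trials = 30, 20000

# Draw `trials` independent samples of size n and record each sample mean.
means = [sum(random.randint(1, 6) for _ in range(n)) / n for _ in range(trials)]
m = sum(means) / trials
v = sum((x - m) ** 2 for x in means) / trials

print(round(m, 2))              # close to mu = 3.5
print(round(v, 4))              # close to var / n = (35/12)/30 ≈ 0.0972
```

A histogram of the recorded means would show the characteristic bell shape of N(μ, σ2/n), even though a single die roll is uniformly, not normally, distributed.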

Exercise Set 2.15

2.162An urn contains n1 red balls and n2 black balls. We draw k balls sequentially and randomly from the urn, where 1 ≤ kn1 + n2.
  1. If the balls are drawn with replacement, what is the probability that the k-th ball drawn from the urn is red?

  2. If the balls are drawn without replacement, what is the probability that the k-th ball drawn from the urn is red?

2.163Let X and Y be the random variables of Example 2.36. For each of the two cases, calculate the probability distribution functions, expectations and variances of the following random variables:
  1. XY

  2. 2X + 3Y

  3. X2

  4. X2 + 2XY + Y2

  5. (X + Y)2

2.164Let X and Y be continuous random variables, g(X) and h(Y) non-constant real polynomials and α, β, γ ∈ ℝ. Prove that:
E(g(X) + h(Y)) = E(g(X)) + E(h(Y)).
E(g(X)h(Y)) = E(g(X)) E(h(Y)), if X and Y are independent.
E(αg(X)) = αE(g(X)).
Var(αX + βY + γ) = α2 Var(X) + β2 Var(Y), if X and Y are independent.

2.165Let X be a random variable with Var(X) > 0 and Y := αX + β for some α, β ∈ ℝ with α ≠ 0. What is ρX,Y?
2.166
  1. Let X and Y be discrete random variables with joint probability distribution function f(x, y). Show that the probability distributions of X and Y can be obtained as

    fX(x) = Σy∈SY f(x, y) and fY(y) = Σx∈SX f(x, y).

  2. If X and Y are continuous random variables with joint density function f(x, y), show that the density functions of X and Y are given by

    fX(x) = ∫ f(x, y)dy and fY(y) = ∫ f(x, y)dx.

    The functions fX and fY are called the marginal probability distribution (or density function) of X and Y respectively.

2.167Let X and Y be continuous random variables whose joint distribution is the uniform distribution on the triangle 0 ≤ X ≤ Y ≤ 1.
  1. Compute the marginal distributions fX and fY.

  2. Compute E(X), E(Y), Var(X), Var(Y), Cov(X, Y) and ρX,Y.

2.168Let X, Y, Z be random variables. Show that:
Cov(X, Y) = Cov(Y, X).
ρX,Y = ρY,X.
Cov(X, X) = Var(X).
Cov(X, Y + Z) = Cov(X, Y) + Cov(X, Z).
Cov(X, X + Y) = Var(X) + Cov(X, Y).
Cov(X, X + Y) = Var(X) if X and Y are independent.

2.169

Geometric distribution Assume that in each trial of an experiment, an event E has a constant probability p of occurrence. Let G = G(p) denote the random variable with SG = {1, 2, 3, . . .} and with fG(x) equal to the probability that E occurs the first time during the x-th trial (that is, after exactly x – 1 failures). Show that:

fG(x) = p(1 – p)x–1, E(G) = 1/p and Var(G) = (1 – p)/p2.

What if p = 0?
2.170

Poisson distribution Let P = P(λ) be the discrete random variable with SP = {0, 1, 2, . . .} and with fP(x) = e–λλx/x!, where λ is a positive real constant. Show that E(P) = Var(P) = λ.

2.171Exponential distribution
  1. Let X = X(λ) be the continuous random variable with density

    fX(x) = λe–λx for x ≥ 0 (and fX(x) = 0 for x < 0),

    where λ is a positive real constant. Show that E(X) = 1/λ and Var(X) = 1/λ2.

  2. A random variable Y assuming non-negative real values is said to be memoryless, if

    Pr(Y > s + t | Y > s) = Pr(Y > t) for all s, t ≥ 0.

Show that the exponential variable X of Part (a) is memoryless.

2.172

The birthday paradox Let S be a finite set of cardinality n.

  1. Show that the probability that k < n elements, drawn at random from S (with replacement), are (pairwise) distinct is

    p = (1 – 1/n)(1 – 2/n) · · · (1 – (k – 1)/n).

  2. Use the inequality 1 – x ≤ e–x for any real number x to show that p ≤ e–k(k–1)/(2n).

  3. Deduce that p ≤ 1/2, if k(k – 1) ≥ (2 ln 2)n, and that p ≤ 0.136 for k ≥ 2√n.

    (The birthday paradox states that if only 23 people are chosen at random, there is a chance as high as 50 per cent that at least two of them have the same birthday.)
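The figures in the parenthetical remark can be reproduced directly from the product in Part 1. A sketch (the helper name is ours):

```python
from math import prod, exp

def distinct_prob(n, k):
    """Probability that k draws (with replacement) from an n-set are pairwise distinct."""
    return prod(1 - i / n for i in range(1, k))

p = distinct_prob(365, 23)
print(round(p, 4))   # about 0.4927: a better than 50% chance of a shared birthday
print(p <= exp(-23 * 22 / (2 * 365)))   # the e^(-k(k-1)/(2n)) bound of Part 2 holds
```

With only 23 people the collision probability 1 – p already exceeds 1/2, which is the surprising content of the paradox.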

Chapter Summary

This chapter provides the foundations of public-key cryptology. The long compilation of mathematical concepts presented in the chapter is indispensable for understanding the topics that follow in the subsequent chapters.

This chapter begins with the basic concepts of sets, functions and relations. We also present the fundamental axioms of mathematics. Although the curricula of plus-two courses of many examination boards include these topics, we discuss them here in order to make our treatment self-contained.

Next comes a study of groups which are sets with binary operations satisfying some nice properties (associativity, identity, inverse and optionally commutativity). Groups are extremely important for cryptology. In particular, all discrete-log-based cryptosystems use suitable groups. Subgroups, cosets and formation of quotient groups constitute a prototypical feature that illustrates the basic paradigm of modern algebra. Secure cryptographic algorithms on groups rely on the availability of elements of large orders: for example, generators of big cyclic groups. We study these topics at length. Finally, we present Sylow’s theorem. For us, this theorem has only theoretical significance; it is used for proving some other theorems.

A set with a single operation (like a group) is often too restrictive. Many mathematical structures we are familiar with (like integers, polynomials) are endowed with two basic operations addition and multiplication. A set with two such (compatible) operations is called a ring. A study of rings, fields, ideals and quotient rings is essential in algebra (and so in cryptography too). Three important types of rings, namely unique factorization domains, principal ideal domains and Euclidean domains, are also discussed. Euclidean division is an important property of integers and polynomials, and is useful from a computational perspective.

Then, as a specific example, we study the properties of ℤ, the ring of integers. We concentrate mostly on elementary properties of integers like divisibility, congruence, the Chinese remainder theorem, Fermat’s and Euler’s theorems, quadratic residues and the law of quadratic reciprocity. We finally discuss some assorted topics from analytic number theory. In cryptography, we require many big randomly generated primes. The prime number theorem guarantees that there is essentially an abundant source of primes. Smooth integers (that is, integers having only small prime divisors) are useful for modern algorithms that compute factorization and discrete logarithms. We present an estimate on the density of smooth integers. The last topic we study is the Riemann hypothesis and its generalizations. This as-yet-unproven hypothesis has a bearing on the running times of many number-theoretic algorithms relevant to cryptology.

The next example is the ring of polynomials over a ring. Polynomials over a field admit Euclidean division and consequently unique factorization. Irreducible polynomials are useful for constructing field extensions. Extension fields of characteristic 2 are quite frequently used in cryptographic systems.

We subsequently study the theory of vector spaces. Linear transformations are appropriate maps between vector spaces and necessitate the theory of matrices. Matrix algebra is widely useful in cryptology as it is in any other branch of algorithmic computer science. Algorithms to solve linear systems over rings and fields constitute a basic computational tool. A study of modules and algebras at the end of this section is mostly theoretical and can be avoided if the reader is willing to accept some theorems without proofs.

In the next section, we discuss the theory of field extensions. As mentioned earlier, cryptography relies heavily on extension fields of characteristic 2. Some related topics include splitting fields and algebraic closure of fields. At the end of this section, we have a short theoretical treatment of Galois theory.

Many popular cryptosystems are based on the multiplicative groups of finite fields. We study these fields as the next topic. Polynomials over finite fields are extremely useful for the construction and representation of finite fields. At the end of this section, we discuss several ways in which (elements of) finite fields can be represented in a computer’s memory. This study expedites the design, analysis and efficient implementation of finite-field arithmetic.

Since elliptic- and hyperelliptic-curve cryptography has gained popularity in recent years, one needs to study the theory of plane algebraic curves. This is what we do in the next three sections. To start with, we define affine and projective spaces and curves. Going from the affine space to the projective space is necessitated by a systematic (algebraic) inclusion of points at infinity on a plane curve. We also discuss the theory of divisors and the Jacobian on plane curves. For elliptic curves, the Jacobian can be replaced by the equivalent group described in terms of the chord and tangent rule. For hyperelliptic curves, on the other hand, we have little option other than understanding the Jacobian itself.

Two kinds of elliptic curves that must be avoided in cryptography are supersingular curves and anomalous curves. The elliptic curve group (over a finite field) is the basic set used in elliptic curve cryptosystems. Bounds on the orders (cardinalities) of these groups are given by Hasse’s theorem. The structure theorem establishes that an elliptic curve group (over a finite field) is not necessarily cyclic, but has a rank of at most two.

We then study Jacobians of hyperelliptic curves over finite fields. This study supplements the theory of divisors on general curves. Reduced and semi-reduced divisors are expedient for the representation of the elements in the Jacobian of a hyperelliptic curve.

Many popular cryptosystems (including RSA) derive their security (presumably) from the intractability of the integer factorization problem. The best algorithm known to date for factoring integers is the number-field sieve method. An understanding of this algorithm requires knowledge of number fields and number rings. We devote a section to the study of these mathematical objects. We start with some necessary commutative algebra including localization, integral dependence and Noetherian rings. Next, we deal with Dedekind domains. All number rings are Dedekind domains in which ideals admit unique factorization. We also discuss the factorization in number rings of ideals generated by rational primes and the structure of units in number rings (Dirichlet’s unit theorem).

The next section is a gentle introduction to the theory of p-adic numbers. These numbers are useful, for example, for designing attacks against elliptic curve cryptosystems.

In the last section, we summarize some statistical tools. Under the assumption that the reader is already familiar with the elementary notion of probability, we discuss properties of random variables and of some common probability distributions (including uniform and normal distributions). The birthday paradox described in an exercise is often useful in cryptographic context (for example, for collision attacks on hash functions).

That is the end of this chapter. The compilation may initially look long and boring, perhaps intimidating too. The unfortunate reality is that public-key cryptology is mathematical, and it is arguably better to treat it in the formal way. If the reader is not comfortable with mathematics (in general), cryptology is perhaps not her cup of tea. An elementary approach to cryptology is what many other books have adopted. This book aims at being different in that respect. It is up to the reader to decide at what level of detail she is willing to study cryptography.

Suggestions for Further Reading

Knowledge is of two kinds. We know a subject ourselves, or we know where we can find information upon it.

—Samuel Johnson

In this chapter, we have summarized the basic mathematical facts that cryptologists are expected to know in order to have a decent understanding of the present-day public-key technology. Our discussion has often been more intuitive than mathematically complete. A reader willing to gain further insight into these areas should look at materials written specifically to deal with the specialized topics. Here are our (biased) suggestions.

There are numerous textbooks on introductory algebra. The books by Herstein [125], Fraleigh [96], Dummit and Foote [81], Hungerford [133] and Adkins and Weintraub [1] are some of our favourites. The algebra of commutative rings with identity (rings by our definition) is called commutative algebra and is the basis for learning advanced areas of mathematics like algebraic geometry and algebraic number theory. A serious study of these disciplines demands more in-depth knowledge of commutative algebra than we have presented in Section 2.13.1. Atiyah and MacDonald’s book [14] is a de facto standard on commutative algebra. Hoffman and Kunze’s book [127] is a good reference for linear algebra and matrix algebra.

Elementary number theory deals with the theory of (natural) numbers without using sophisticated techniques from complex analysis and algebra. Zuckerman et al. [316] can be consulted for a lucid introduction to this subject. The books by Burton [42] and Mollin [207] are good alternatives.

A thorough mathematical treatment of finite fields can be found in the books by Lidl and Niederreiter [179, 180], of which the second also deals with computational issues. Other books of a computational flavour include those by Menezes [191] and by Shparlinski [274]. Also see the paper [273] by Shparlinski.

The use of elliptic curves in cryptography was proposed by Koblitz [150] and Miller [205], and that of hyperelliptic curves by Koblitz [151]. A fair mathematical understanding of elliptic curves banks on knowledge of commutative algebra (see above) and algebraic geometry. Hartshorne’s book [124] is a detailed introduction to algebraic geometry. Fulton’s book [99] on algebraic curves is another good reference. Rigorous mathematical treatments of elliptic curves can be found in Silverman’s books [275, 276]. The book by Koblitz [152] is elementary, but has a somewhat different focus than needed in cryptology. By far, the best short-cut is the recent textbook by Washington [298]. Some other books by Koblitz [150, 153, 154], Blake et al. [24], Menezes [192] and Hankerson et al. [123] are written for non-experts in algebraic geometry (and hence lack mathematical details), but are good from a computational viewpoint. The expository reports [46, 47] by Charlap et al. provide a nice elementary introduction to elliptic curves. For hyperelliptic curves, on the other hand, no such books are available. Koblitz’s book [154] includes a chapter on hyperelliptic curves. In addition, an appendix in the same book, written by Menezes et al. much in the style of Charlap et al. [46, 47], provides an introductory and elementary coverage.

In an oversimplified sense, algebraic number theory deals with the study of number fields. The books by Janusz [140], Lang [160], Mollin [208] and Ribenboim [251] go well beyond what we cover in Section 2.13. Also see [89]. For a more modern and sophisticated treatment, look at Neukirch’s book [216]. A book dedicated to p-adic numbers is due to Koblitz [149]. Course notes from one of the authors of this book can also be useful in this regard. The notes are freely downloadable from:

http://www.facweb.iitkgp.ernet.in/~adas/IITK/course/MTH617/SS02/

Analytic number theory deals with the application of complex analytic techniques to solve problems in number theory. Although we do not explicitly need this branch of mathematics (apart from a few theorems that we mention without proofs), it is rather important for the study of numbers. Consult the books by Apostol [12] and by Ireland and Rosen [136] for this. Also see [249]. For complex analysis, we recommend the book by Ahlfors [6].

Feller’s celebrated book [92] is a classical reference on probability theory. Grinstead and Snell’s book [121] is available on the Internet.

3. Algebraic and Number-theoretic Computations

3.1 Introduction
3.2 Complexity Issues
3.3 Multiple-precision Integer Arithmetic
3.4 Elementary Number-theoretic Computations
3.5 Arithmetic in Finite Fields
3.6 Arithmetic on Elliptic Curves
3.7 Arithmetic on Hyperelliptic Curves
3.8 Random Numbers
 Chapter Summary
 Suggestions for Further Reading

From the start there has been a curious affinity between mathematics, mind and computing . . . It is perhaps no accident that Pascal and Leibniz in the seventeenth century, Babbage and George Boole in the nineteenth, and Alan Turing and John von Neumann in the twentieth – seminal figures in the history of computing – were all, among their other accomplishments, mathematicians, possessing a natural affinity for symbol, representation, abstraction and logic.

—Doron Swade [295]

. . . the laws of physics and of logic . . . the number system . . . the principle of algebraic substitution. These are ghosts. We just believe in them so thoroughly they seem real.

—Robert M. Pirsig [233]

The world is continuous, but the mind is discrete.

—David Mumford

3.1. Introduction

Now that we have studied the properties of important mathematical objects that play vital roles in public-key cryptology, it is time to concentrate on the algorithmic and implementation issues for working with these objects. We need well-defined schemes (data structures) to represent these objects and well-defined procedures (algorithms) to manipulate them. While a theoretical analysis of the performance of our data structures and algorithms is of great concern, it still leaves us in the abstract domain. In the long run, one has to translate the abstract statements in the algorithms to machine code that the computer understands, and this is where the implementation tidbits come into the picture. It is our personal experience that a naive implementation of an algorithm may run a hundred times slower than a carefully optimized implementation of the same algorithm. In certain specific applications (like those based on smart cards), where memory is a scarce resource, one should also pay attention to the storage requirements of the data structures and code segments. This chapter is an introduction to all these specialized topics.

Before we proceed further, certain comments are in order. In this book, we describe algorithms using a pseudocode that closely resembles the syntax of the programming language C. The biggest difference between C and our pseudocode is that we have given preference to mathematical notations in place of C syntax. For example, = means equality in our codes, whereas assignment is denoted by :=. Similarly, our while and for loops look more human-readable, for example, for i = 0, 1, . . . , m – 1 instead of C’s for (i=0; i<m; i++). In order to understand our pseudocode, a knowledge of C (or a similar programming language) is helpful, but not essential, on the part of the reader.

For certain implementations, we assume that the target machine carries out 32-bit 2’s-complement arithmetic. This is indeed true for most modern PCs and personal workstations. By the term word, we mean a 32-bit unit in the computer memory. We will also assume that the compiler provides facilities for storing and doing arithmetic with unsigned 64-bit integers. Though this is not an ANSI C feature, most popular compilers used today do support this built-in data type (examples: unsigned __int64 for the Microsoft Visual C++ compiler and unsigned long long for the GNU C Compiler). Though it is apparently desirable to be more generic and to avoid these specific assumptions about the machine and the compiler, our exposition highlights the power of fine-tuning based on the knowledge of the underlying system.
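As a small illustration of this assumption, the following C fragment (the function name mul32 is ours, not the book’s) computes the full 64-bit product of two 32-bit words using such an unsigned 64-bit type:

```c
#include <stdint.h>

/* Full 32x32 -> 64-bit product, split back into two 32-bit words.
   uint64_t plays the role of the unsigned 64-bit type mentioned above
   (unsigned long long for the GNU C Compiler). */
void mul32(uint32_t a, uint32_t b, uint32_t *hi, uint32_t *lo)
{
    uint64_t p = (uint64_t)a * b;   /* exact: the product fits in 64 bits */
    *lo = (uint32_t)p;              /* low word  */
    *hi = (uint32_t)(p >> 32);      /* high word */
}
```

This double-word product is the basic building block of the multiple-precision multiplication routines discussed later in this chapter.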

3.2. Complexity Issues

Given an algorithm (or an implementation of the same), the time and space required for the execution of the algorithm on a machine depend very much on the machine’s architecture and on the compiler. But this does not mean that we cannot make some general theoretical estimates. The so-called asymptotic estimates that we are going to introduce now tend to approach the real situation as the input size tends to infinity. For finite input sizes (which is always the case in practice), these theoretical predictions turn out to provide valuable guidelines.

3.2.1. Order Notations

We start with the following important definitions.

Definition 3.1.

Let f and g be positive real-valued functions of natural numbers.

  1. f is said to be bounded above by g or of the order of g, denoted f = O(g), if there exist an integer n0 and a positive real constant c such that f(n) ≤ cg(n) for all n ≥ n0. In this case, we also say that g is bounded below by f and denote this by g = Ω(f).

  2. If f = O(g) and g = O(f), we say that f and g are of the same order and denote this by f = Θ(g) (or by g = Θ(f)). Equivalently, f = Θ(g) if and only if f = O(g) and f = Ω(g); that is, if and only if there exist an integer n0 and real positive constants c1, c2 such that c1g(n) ≤ f(n) ≤ c2g(n) for all n ≥ n0.

  3. f is said to be of strictly lower order than g, denoted f = o(g), if f(n)/g(n) tends to 0 as n tends to infinity. In other words, f = o(g) if and only if for every real positive constant c (however small it may be) there exists an integer nc such that f(n) < cg(n) for all n ≥ nc. If f = o(g), we also say that g is of strictly higher order than f and denote this by g = ω(f). Thus g = ω(f) if and only if for every real positive constant c (however large it may be) there exists an integer nc such that g(n) > cf(n) for all n ≥ nc.

Example 3.1.
  1. Let f(n) := a_d n^d + · · · + a_1 n + a_0 with d ≥ 0 and a_d > 0. Then f = Θ(n^d). This heuristically means that as n becomes sufficiently large, the leading term a_d n^d dominates over the other terms, and apart from the constant of proportionality a_d the function f(n) grows with n as n^d does. If f = Θ(n^d) for some integer d > 0, we say that f is of polynomial order in n.[1] A Θ(1) function is often called a constant function.

    [1] This is not the complete truth. Functions like n^2.3 or n^3(log n)^2 would be better included in the polynomial family. Thus, we may define f to be of polynomial order (in n), if f = O(n^d) and f = Ω(n^d′) for some positive real constants d, d′. Similar comments hold for poly-logarithmic and exponential orders.

  2. If f = Θ((log n)^a) for some real a > 0, we say that f is of poly-logarithmic order in n. By Exercise 3.2(b), any function of poly-logarithmic order grows asymptotically slower than any function of polynomial order.

  3. If f = Θ(a^n) for some real a > 1, f is said to be of exponential order in n. By Exercise 3.2(a), any function of exponential order grows asymptotically faster than any function of polynomial order.

  4. Now, consider a function of the form

    Equation 3.1

    f(n) = exp(c n^α (ln n)^(1−α))

    for real c > 0 and for 0 ≤ α ≤ 1. For α = 0, we have f = Θ(n^c); that is, f is of polynomial order. On the other extreme, if α = 1, then f = Θ(a^n), where a := exp(c); that is, f is of exponential order. If 0 < α < 1, we say that f is of subexponential order in n, since the order of f is somewhere in between polynomial and exponential. We will come across functions of subexponential orders quite frequently in the rest of the book. Note that as α increases from 0 to 1, the order of f also increases monotonically from polynomial to exponential.

  5. A function f = O(n^a(log n)^b) with a > 0 and b ≥ 0 is often denoted by the soft O-notation: f = O~(n^a). This implies that up to multiplication by a polynomial in log n the function f is of the order of n^a. Similarly, if f = O(a^n g(n)) for a > 1 and for some g(n) of polynomial order, we say that f = O~(a^n). Intuitively speaking, the O-notation hides constant multipliers, whereas the soft O-notation also suppresses multipliers of negligibly small (poly-logarithmic or polynomial) order compared with the main term.

  6. The notion of order can be readily extended to functions with two or more input variables. For example, for positive real-valued functions f, g of two positive integer variables m, n one says f = O(g), if for some m0, n0 and for some positive real constant c one has f(m, n) ≤ cg(m, n) for all m ≥ m0 and n ≥ n0. The function f(m, n) = m^3 2^n is of polynomial order in m, but of exponential order in n.

The order notation is used to analyse algorithms in the following way. For an algorithm, the input size is defined as the total number of bits needed to represent the input of the algorithm. We find asymptotic estimates of the running time and the memory requirement of the algorithm in terms of its input size. Let f(n) denote the running time[2] of an algorithm A for an input of size n. If f(n) = Θ(n^a) (or, more generally, if f = O(n^a)) for some a > 0, A is called a polynomial-time algorithm. If a = 1 (resp. 2, 3, . . .), then A is specifically called a linear-time (resp. quadratic-time, cubic-time, . . .) algorithm. A Θ(1) algorithm is often called a constant-time algorithm. If f = Θ(b^n) for some b > 1, A is called an exponential-time algorithm. Similarly, if f satisfies Equation (3.1) with 0 < α < 1, A is called a subexponential-time algorithm.

[2] The practical running time of an algorithm may vary widely depending on its implementation and also on the processor, the compiler and even on run-time conditions. Since we are talking about the order of growth of running times in relation to the input size, we neglect the constants of proportionality and so these variations are usually not a problem. If one plans to be more concrete, one may measure the running time by the number of bit operations needed by the algorithm.

One has similar classifications of an algorithm in terms of its space requirements, namely, polynomial-space, linear-space, exponential-space, and so on. We can afford to be lazy and drop -time from the adjectives introduced in the previous paragraph. Thus, an exponential algorithm is an exponential-time algorithm, not an exponential-space algorithm.

It is expedient to note here that the running time of an algorithm may depend on the particular instance of the input, even when the input size is kept fixed. For an example, see Exercise 3.3. We should, therefore, be prepared to distinguish, for a given algorithm and for a given input size n, between the best (that is, shortest) running time fb(n), the worst (that is, longest) running time fw(n), the average running time fa(n) on all possible inputs (of size n) and the expected running time fe(n) for a randomly chosen input (of size n). In typical situations, fw(n), fa(n) and fe(n) are of the same order, in which case we simply denote, by running time, one of these functions. If this is not the case, an unqualified use of the phrase running time would denote the worst running time fw(n).

The order notation, though apparently attractive and useful, has certain drawbacks. First, it depicts the behaviour of functions (like running times) as the input size tends to infinity. In practice, one always has finite input sizes. One can check that if f(n) = n^100 and g(n) = (1.01)^n are the running times of two algorithms A and B respectively (for solving the same problem), then f(n) ≤ g(n) if and only if n = 1 or n ≥ 117,309. But then if the input size is only 1,000, one would prefer the exponential-time algorithm B over the polynomial-time algorithm A. Thus asymptotic estimates need not guarantee correct suggestions at practical ranges of interest. On the other hand, an algorithm which is a product of human intellect does not tend to have such extreme values for the parameters; that is, in a polynomial-time algorithm, the degree is usually ≤ 10 and the base for an exponential-time algorithm is usually not as close to 1 as 1.01 is. If we have f(n) = n^5 and g(n) = 2^n as the respective running times of the algorithms A and B, then A outperforms B (in terms of speed) for all n ≥ 23.

The second drawback of the order notation is that it suppresses the constant of proportionality; that is, an algorithm whose running time is 100n2 has the same order as one whose running time is n2. This is, however, a situation that we cannot neglect in practice. In particular, when we compare two different implementations of the same algorithm, the one with a smaller constant of proportionality is more desirable than the one with a larger constant. This is where implementation tricks prove to be important and even indispensable for large-scale applications.

3.2.2. Randomized Algorithms

A deterministic algorithm is one that always follows the same sequence of computations (and thereby produces the same output) for a given input. The deterministic running time of a computational problem P is the fastest of the running times (in the order notation) of the known deterministic algorithms that solve P.

If an algorithm makes some random choices during execution, we call the algorithm randomized or probabilistic. The exact sequence of computations followed by the algorithm depends on these random choices and as a result different executions of the same algorithm may produce different outputs for a given input. At first glance, randomized algorithms look useless, because getting different outputs for a given input is apparently not what one would really want. But there are situations where this is desirable. For example, in an implementation of the RSA protocol, one generates random primes p and q of given bit lengths. Here we require our prime generation procedure to produce different primes during different executions (that is, for different entities on the net).

More importantly, randomized algorithms often provide practical computational solutions for many problems for which no practical deterministic algorithms are known. We will shortly encounter many such situations where randomized algorithms are the simplest and/or fastest known algorithms. However, this sudden enhancement in performance by random choices does not come for free. To explain the so-called darker sides of randomization, we describe two different types of randomized algorithms.

A Monte Carlo algorithm is a randomized algorithm that may produce incorrect outputs. However, for such an algorithm to be useful, we require that the running time be always small and the probability of an error sufficiently low. A good example of a Monte Carlo algorithm is the Miller–Rabin algorithm (Algorithm 3.13) for testing the primality of an integer. For an integer of bit size n, the Miller–Rabin test with t iterations runs in time O(tn^3). Whenever the algorithm outputs false, it is always correct. But an answer of true is incorrect with an error probability ≤ 2^(−2t); that is, it certifies a composite integer as a prime with probability ≤ 2^(−2t). For t = 20, an error is expected to occur less than once in every 10^12 executions. With this little sacrifice we achieve a running time of O(n^3) (for a fixed t), whereas the best deterministic primality-testing algorithm (known to the authors at the time of writing this book) takes time O(n^7.5) and hence is not practical.
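The structure of the test is easy to sketch. The C fragment below is our simplified single-word version (for odd n < 2^32), not the book’s Algorithm 3.13; it performs t random-base rounds, where false (0) is always correct and true (1) may err with probability at most 2^(−2t), as quoted above:

```c
#include <stdint.h>
#include <stdlib.h>

/* b^e mod m for m < 2^32 (so the products never overflow 64 bits). */
static uint64_t powmod(uint64_t b, uint64_t e, uint64_t m)
{
    uint64_t r = 1;
    for (b %= m; e; e >>= 1) {
        if (e & 1) r = r * b % m;
        b = b * b % m;
    }
    return r;
}

/* Miller-Rabin test with t random bases, for n < 2^32.
   Returns 0 ("false"): n is certainly composite.
   Returns 1 ("true"):  n is prime with error probability <= 2^(-2t). */
int miller_rabin(uint32_t n, int t)
{
    if (n < 4) return n == 2 || n == 3;
    if (n % 2 == 0) return 0;
    uint32_t d = n - 1;
    int s = 0;
    while (d % 2 == 0) { d /= 2; s++; }               /* n - 1 = 2^s d, d odd */
    for (int i = 0; i < t; i++) {
        uint64_t a = 2 + (uint64_t)rand() % (n - 3);  /* base in [2, n-2] */
        uint64_t x = powmod(a, d, n);
        if (x == 1 || x == n - 1) continue;           /* base is no witness */
        int witness = 1;
        for (int r = 1; r < s; r++) {
            x = x * x % n;
            if (x == n - 1) { witness = 0; break; }
        }
        if (witness) return 0;                        /* certainly composite */
    }
    return 1;                                         /* probably prime */
}
```

Note that the composite 561 (a Carmichael number) fools the simpler Fermat test for every base coprime to it, yet is reliably rejected here.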

A Las Vegas algorithm is a randomized algorithm which always produces the correct output. However, the running time of such an algorithm depends on the random choices made. For such an algorithm to be useful, we expect that for most random choices the running time is small. As an example, consider the problem of finding a random (monic) irreducible polynomial of degree n over a finite field 𝔽q. Algorithm 3.22 tests the irreducibility of a polynomial in 𝔽q[x] in deterministic polynomial time. We generate random polynomials of degree n and check the irreducibility of these polynomials by Algorithm 3.22. From Section 2.9.2, we know that a randomly chosen monic polynomial of degree n over a finite field is irreducible with an approximate probability of 1/n. This implies that after O(n) random polynomials are tried, one expects to find an irreducible polynomial. The resulting Las Vegas algorithm (Algorithm 3.23) runs in expected polynomial time. It may, however, happen that for certain random choices we keep on generating reducible polynomials an exponential number of times, but the likelihood of such an accident is very, very low (Exercise 3.5).
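For concreteness, here is a toy C version of this Las Vegas strategy specialized to 𝔽2, with polynomials stored as bitmasks (bit i holding the coefficient of x^i). The irreducibility check is naive trial division standing in for the book’s Algorithm 3.22, and all names are ours:

```c
#include <stdint.h>
#include <stdlib.h>

/* Degree of a GF(2) polynomial stored as a bitmask (-1 for the zero poly). */
static int deg(uint64_t p)
{
    int d = -1;
    while (p) { p >>= 1; d++; }
    return d;
}

/* Remainder of a modulo b in GF(2)[x] (XOR-based long division). */
static uint64_t polymod(uint64_t a, uint64_t b)
{
    int db = deg(b);
    for (int d = deg(a); d >= db; d--)
        if (a >> d & 1) a ^= b << (d - db);
    return a;
}

/* Naive irreducibility test over GF(2): trial division by every
   polynomial of degree 1 .. n/2.  Feasible for small degrees only. */
int irreducible(uint64_t f)
{
    int n = deg(f);
    if (n <= 0) return 0;
    for (uint64_t g = 2; deg(g) <= n / 2; g++)
        if (polymod(f, g) == 0) return 0;
    return 1;
}

/* Las Vegas search: keep drawing random monic degree-n polynomials
   until an irreducible one appears (about n trials on average). */
uint64_t random_irreducible(int n)
{
    for (;;) {
        uint64_t f = (1ULL << n) | ((uint64_t)rand() & ((1ULL << n) - 1));
        if (irreducible(f)) return f;
    }
}
```

The output is always correct (an irreducible polynomial is returned, never a reducible one); only the number of draws before termination is random.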

An algorithm is said to be a probabilistic or randomized polynomial-time algorithm, if it is either a Monte Carlo algorithm with polynomial worst running time or a Las Vegas algorithm with polynomial expected running time. Both the above examples of randomized algorithms are probabilistic polynomial-time algorithms. A combination of these two types of algorithms can also be conceived; namely, algorithms that produce correct outputs with high probability and have polynomial expected running time. Some computational problems are so challenging that even such probably correct and probably fast algorithms are quite welcome.

We finally note that there are certain computational problems for which the deterministic running time is exponential and for which randomization also does not help much. In some cases, we have subexponential randomized algorithms which are still too slow to be of reasonable practical use. Some of these so-called intractable problems are at the heart of the security of many public-key cryptographic protocols.

3.2.3. Reduction Between Computational Problems

In the last two sections, we have introduced theoretical measures (the order notations) for estimating the (known) difficulty of solving computational problems. In this section, we introduce another concept by which we can compare the relative difficulty of two computational problems.

Let P1 and P2 be two computational problems. We say that P1 is polynomial-time reducible to P2 and denote this as P1 ≤P P2, if there is a polynomial-time algorithm which, given a solution of P2, provides a solution for P1. This means that if P1 ≤P P2, then the problem P1 is no more difficult than P2, apart from the extra polynomial-time reduction effort. In that case, if we know an algorithm to solve P2 in polynomial time, then we have a polynomial-time algorithm for P1 too. If P1 ≤P P2 and P2 ≤P P1, we say that the problems P1 and P2 are polynomial-time equivalent and write P1 ≅ P2.

In order to give an example of these concepts, we let G be a finite cyclic multiplicative group of order n and g a generator of G. The discrete logarithm problem (DLP) is the problem of computing, for a given a ∈ G, an integer x such that a = g^x. The Diffie–Hellman problem (DHP), on the other hand, is the problem of computing g^(xy) from the given values of g^x and g^y. If one can compute y from g^y, one can also compute g^(xy) = (g^x)^y by performing an exponentiation in the group G. Therefore, DHP ≤P DLP, if exponentiations in G can be computed in polynomial time. In other words, if a solution for DLP is known, a solution for DHP is also available: that is, DHP is no more difficult than DLP except for the additional exponentiation effort. However, the reverse implication (that is, whether DLP ≤P DHP) is not known for many groups.
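The reduction is mechanical enough to demonstrate on a toy group. In the C sketch below (our own, with a brute-force DLP solver that is feasible only for tiny groups), solving one DLP instance plus one extra exponentiation answers a DHP instance in the multiplicative group modulo a small prime p:

```c
#include <stdint.h>

/* b^e mod p by square-and-multiply (p < 2^32, so no 64-bit overflow). */
static uint64_t powm(uint64_t b, uint64_t e, uint64_t p)
{
    uint64_t r = 1;
    for (b %= p; e; e >>= 1) {
        if (e & 1) r = r * b % p;
        b = b * b % p;
    }
    return r;
}

/* Brute-force DLP solver: smallest x with g^x = a (mod p), or -1. */
static long dlog(uint64_t g, uint64_t a, uint64_t p)
{
    uint64_t t = 1;
    for (long x = 0; x < (long)p; x++) {
        if (t == a) return x;
        t = t * g % p;
    }
    return -1;
}

/* DHP <= DLP: recover g^(xy) from g^x and g^y by solving one DLP
   instance (for y) and performing one additional exponentiation. */
uint64_t solve_dhp(uint64_t g, uint64_t gx, uint64_t gy, uint64_t p)
{
    long y = dlog(g, gy, p);
    return powm(gx, (uint64_t)y, p);
}
```

The point of the sketch is the shape of the reduction, not its speed: replacing dlog by any polynomial-time DLP solver would immediately yield a polynomial-time DHP solver.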

So far we have assumed that our reduction algorithms are deterministic. If we allow randomized (that is, probabilistic) polynomial-time reduction algorithms, we can similarly introduce the concepts of randomized polynomial-time reducibility and of randomized polynomial-time equivalence. We urge the reader to formulate the formal definitions for these concepts.

Exercise Set 3.2

3.1
  1. Sort the following functions in the increasing sequence of order. (Don’t mind if some of these functions are not defined for a few values of n.)

    10^12, 2^n, 2^(2n), 2^(n^2), 100n^2, 10^(−3)n^3, 1/n, n!, n^n,

    log n, (log n)/n, n/log n, n^2 log n, n(log n)^2, (0.1)^(log n), (log n)^n,

    1/log n, 10^6(log n)^100, log log n, 2^(log log n), n^(log log n),

    exp(n^(1/3)(ln n)^(2/3)), exp((ln n)^(1/3)(ln ln n)^(2/3)).

  2. Evaluate the functions of Part (a) at n = 10^i for i = 1, 2, . . . , 10 and conclude that as n gets larger, the asymptotic ordering tallies with the actual ordering more and more closely.

3.2
  1. Show that for any real a > 1 and b > 0 one has n^b = o(a^n).

  2. For any positive real c, d, show that (log n)^c = o(n^d).

  3. Show that if f = O(g) and g = O(h), then f = O(h).

  4. Give an example to show that f = O(g) does not necessarily imply f = Θ(g).

  5. Give an example of a function f with f = O(n^(1+ε)) for every ε > 0, but f is not O(n).

3.3 Suppose that an algorithm A takes as input a bit string and runs in time g(t), where t is the number of one-bits in the input string. Let fb(n), fw(n), fa(n) and fe(n) respectively denote the best, worst, average and expected running times of A for inputs of size n. Derive the following table under the assumption that each of the 2^n bit strings of length n is equally likely.

    Running times

    g(t)    fb(n)    fw(n)    fa(n)        fe(n)
    t       0        n        n/2          n/2
    t^2     0        n^2      n(n+1)/4     n^2/4
    2^t     1        2^n      (3/2)^n      2^(n/2)

3.4
  1. Show that an exponential-space (resp. subexponential-space) algorithm must be (at least) exponential-time (resp. subexponential-time) too. You may assume that, at any point of time, a computing device can access (read/write) at most a constant number of memory locations.

  2. Give an example of an algorithm that is exponential-time but polynomial-space.

3.5 Consider the Las Vegas algorithm discussed in Section 3.2.2 for generating a random irreducible polynomial of degree n over 𝔽q. Assume that a randomly chosen polynomial in 𝔽q[x] of degree n has (an exact) probability of 1/n of being irreducible. Find out the probability pr that r polynomials chosen randomly (with repetition) from 𝔽q[x] are all reducible. For n = 1000, calculate the numerical values of pr for r = 10^i, i = 1, . . . , 6, and find the smallest integers r for which pr ≤ 1/2 and pr ≤ 10^(−12). Find the expected number of polynomials tested for irreducibility before the algorithm terminates.
3.6 Let n = pq be the product of two distinct primes p and q. Show that factoring n is polynomial-time equivalent to computing φ(n) = (p − 1)(q − 1), where φ is Euler’s totient function. (Assume that an arithmetic operation (including computation of integer square roots) on integers of bit size t can be performed in polynomial time (in t).)
3.7 Let G be a finite cyclic multiplicative group and let H be the subgroup of G generated by an element h ∈ G whose order is known. The generalized discrete logarithm problem (GDLP) is the following: Given a ∈ G, find out if a ∈ H and, if so, find an integer x for which a = h^x. Show that GDLP ≅ DLP, if exponentiations in G can be carried out in polynomial time and if DLP in H is polynomial-time equivalent to DLP in G. [H]

3.3. Multiple-precision Integer Arithmetic

Cryptographic protocols based on the rings ℤn and the fields 𝔽p demand n and p to be sufficiently large (of bit length ≥ 512) in order to achieve the desired level of security. However, standard compilers do not support data types that hold integers of this size with full precision. For example, C compilers support integers of size ≤ 64 bits. So one must employ custom-designed data types for representing and working with such big integers. Many libraries are already available that can handle integers of arbitrary length. FREELIP, GMP, LiDIA, NTL and ZEN are some such libraries that are even freely available.

Alternatively, one may design one’s own functions for multiple-precision integers. Such a programming exercise is not very difficult, but making the functions run efficiently is a huge challenge. Several tricks and optimization techniques can turn a naive implementation into much faster and more memory-efficient code, and it takes years of experimental experience to master the subtleties. Theoretical asymptotic estimates may serve as a guideline, but only experimentation can settle the relative merits and demerits of the available algorithms for input sizes of practical interest. For example, the theoretically fastest algorithm known for multiplying two multiple-precision integers is based on the so-called fast Fourier transform (FFT) techniques. But our experience shows that this algorithm starts to outperform the common but asymptotically slower algorithms only when the input size is at least several thousand bits. Since such very large integers are rarely needed by cryptographic protocols, FFT-based multiplication is not useful in this context.

3.3.1. Representation of Large Integers

In order to represent a large integer, we break it up into small parts and store each part in a memory word[3] accessible by built-in data types. The simplest way to break up a (positive) integer a is to predetermine a radix ℜ and compute the ℜ-ary representation (as–1, . . . , a0) of a (see Exercise 3.8). One should have ℜ ≤ 2^32 so that each ℜ-ary digit ai can be stored in a memory word. For the sake of efficiency, it is advisable to take ℜ to be a power of 2. It is also expedient to take ℜ as large as possible, because smaller values of ℜ lead to (possibly) longer size s and thereby add to the storage requirement and also to the running time of arithmetic functions. The best choice is ℜ = 2^32. We denote by ulong a built-in unsigned integer data type provided by the compiler (like the ANSI C standard unsigned long). We use an array of ulong for storing the digits. The array can be static or dynamic. Though dynamic arrays are more storage-efficient (because they can be allocated only as much memory as needed), they have memory allocation and deallocation overheads and are somewhat more complicated to programme than static arrays. Moreover, for cryptographic protocols one typically needs integers no longer than 4096 bits. Since the product of two integers of bit size t has bit size ≤ 2t, a static array of 8192/32 = 256 ulong suffices for storing cryptographic integers. It is also necessary to keep track of the actual size of an integer, since filling up with leading 0 digits is not an efficient strategy. Finally, it is often useful to have a signed representation of integers. A sign bit is also necessary for this case. We state three possible declarations in Exercise 3.11.

[3] We assume that a word in the memory is 32 bits long.

3.3.2. Basic Arithmetic Operations

We now describe the implementations of addition, subtraction, multiplication and Euclidean division of multiple-precision integers. Every other complex operation (like modular arithmetic, gcd) is based on these primitives. It is, therefore, of utmost importance to write efficient codes for these basic operations.

For integers of cryptographic sizes, the most efficient algorithms are the standard ones we use for doing arithmetic on decimal numbers, that is, for two positive integers a = as–1 . . . a0 and b = bt–1 . . . b0 we compute the sum c = a + b = cr–1 . . . c0 as follows. We first compute a0 + b0. If this sum is ≥ ℜ, then c0 = a0 + b0 – ℜ and the carry is 1, otherwise c0 = a0 + b0 and the carry is 0. We then compute a1 + b1 plus the carry available from the previous digit, and compute c1 and the next carry as before.

For computing the product d = ab = dl–1 . . . d0, we do the usual quadratic procedure; namely, we initialize all the digits of d to 0 and for each i = 0, . . . , s – 1 and j = 0, . . . , t – 1 we compute aibj and add it to the (i + j)-th digit of d. If this sum (call it σ) at the (i + j)-th location exceeds ℜ – 1, we find out q, r with σ = qℜ + r, r < ℜ. Then di+j is assigned r, and q is added to the (i + j + 1)-st location. If that addition results in a carry, we propagate the carry to higher locations until it gets fully absorbed in some word of d.

All this sounds simple, but complications arise when we consider the fact that the sum of two 32-bit words (and a possible carry from the previous location) may be 33 bits long. For multiplication, the situation is even worse, because the product aibj can be 64 bits long. Since our machine word can hold only 32 bits, it becomes problematic to hold all these intermediate sums and products to full precision. We assume that the least significant 32 bits are correctly returned and assigned to the output variable (ulong), whereas the leading 32 bits are lost.[4] The most efficient way to keep track of these overflows is to use assembly instructions, and this is what many number theory packages (like PARI and UBASIC) do. But this means that for every target architecture we have to write different assembly code. Here we describe certain tricks that make it possible to grab the overflow information using only high-level languages, without significantly degrading the performance compared to assembly instructions.

[4] This is the typical behaviour of a CPU that supports 2’s complement arithmetic.

Addition and subtraction

First consider the sum ai + bi. We compute the least significant 32 bits by assigning ci := ai + bi. It is easy to see that an overflow occurs during this sum if and only if ci < ai. We set the output carry accordingly. Now, let us consider the situation when we have an input carry: that is, when we compute the sum ci := ai + bi + 1. Here an overflow occurs if and only if ci ≤ ai. Algorithm 3.1 performs this addition of words.

Algorithm 3.1. Addition of words

Input: Words ai and bi and the input carry γi ∈ {0, 1}.

Output: Word ci and the output carry δi ∈ {0, 1} with ai + bi + γi = ci + δiℜ.

Steps:

ci := ai + bi.

if (γi) { ci++, δi := ( (ci ≤ ai) ? 1 : 0 ). } else { δi := ( (ci < ai) ? 1 : 0 ). }

Algorithm 3.1 assumes that ci and ai are stored in different memory words. If this is not the case, we should store ai + bi in a temporary variable and, after the second line, ci should be assigned the value of this temporary variable. Note also that many processors provide an increment primitive which is faster than the general addition primitive. In that case, the statement ci++ is preferable to ci := ci+1.
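As a concrete illustration, Algorithm 3.1 may be coded in C along the following lines (a sketch; `uint32_t` from `<stdint.h>` plays the role of ulong, and the output carry is passed back through a pointer — these conventions are ours, not fixed by the text):

```c
#include <stdint.h>

/* Algorithm 3.1 in C: returns c = a + b + gamma (mod 2^32) and stores the
   output carry delta (0 or 1) through the pointer. The local variable c
   plays the role of the temporary variable mentioned in the text. */
static uint32_t add_word(uint32_t a, uint32_t b, uint32_t gamma,
                         uint32_t *delta)
{
    uint32_t c = a + b;               /* least significant 32 bits */
    if (gamma) {
        c++;
        *delta = (c <= a) ? 1 : 0;    /* overflow iff c <= a (carry came in) */
    } else {
        *delta = (c < a) ? 1 : 0;     /* overflow iff c < a */
    }
    return c;
}
```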

For subtraction, we proceed analogously from right to left and keep track of the borrow. Here the check for overflow can be done before the subtraction of words is carried out (and, therefore, no temporary variable is needed, if we assume that the output carry is not stored in the location of the operands).

Algorithm 3.2. Subtraction of words

Input: Words ai and bi and the input borrow γi ∈ {0, 1}.

Output: Word ci and the output borrow δi ∈ {0, 1} with ai – bi – γi = ci – δiℜ.

Steps:

if (γi) { δi := ( (ai ≤ bi) ? 1 : 0 ), ci := ai – bi, ci––. }

else { δi := ( (ai < bi) ? 1 : 0 ), ci := ai – bi. }

We urge the reader to develop the complete addition and subtraction procedures for multiple-precision integers, based on the above primitives for words.
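For instance, the complete addition of two multiple-precision integers, built on the word primitive of Algorithm 3.1, could look as follows (a sketch, assuming little-endian digit arrays with s ≥ t and room for s + 1 digits in c; the function name and conventions are ours):

```c
#include <stdint.h>

/* c := a + b, where a has s digits and b has t digits, s >= t, digits stored
   least significant first. c must have room for s + 1 digits; the function
   returns the digit count of the sum. */
static int mp_add(const uint32_t *a, int s, const uint32_t *b, int t,
                  uint32_t *c)
{
    uint32_t carry = 0;
    for (int i = 0; i < s; i++) {
        uint32_t bi = (i < t) ? b[i] : 0;     /* pad b with leading zeros */
        uint32_t ci = a[i] + bi;
        if (carry) { ci++; carry = (ci <= a[i]); }
        else       { carry = (ci <  a[i]); }
        c[i] = ci;
    }
    c[s] = carry;                              /* possible extra digit */
    return carry ? s + 1 : s;
}
```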

Multiplication

The product of two 32-bit words can be as long as 64 bits, and we plan to (compute and) store this product in two words. Assuming the availability of a built-in 64-bit unsigned integer data type (which we will henceforth denote as ullong), this can be performed as in Algorithm 3.3.

Algorithm 3.3. Multiplication of words

Input: Words a and b.

Output: Words c and d with ab = cℜ + d.

Steps:

/* We use a temporary variable t of data type ullong */

t := (ullong)(a) * (ullong)(b), c := (ulong)(t ≫ 32), d := (ulong)t.

We use a temporary 64-bit integer variable t to store the product ab. The lower 32 bits of t are stored in d by simple typecasting, whereas the higher 32 bits of t are obtained by right-shifting t (the operator ≫) by 32 bits. This is a reasonable strategy given that we do not explore assembly-level instructions. Algorithm 3.4 describes a multiplication algorithm for two multiple-precision integer operands, that does not directly use the word-multiplying primitive of Algorithm 3.3.
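In C, Algorithm 3.3 amounts to the following (a sketch; `uint64_t` plays the role of ullong):

```c
#include <stdint.h>

/* Algorithm 3.3: split the 64-bit product ab into a high word c and a low
   word d, so that ab = c * 2^32 + d. */
static void mul_word(uint32_t a, uint32_t b, uint32_t *c, uint32_t *d)
{
    uint64_t t = (uint64_t)a * (uint64_t)b;  /* full 64-bit product */
    *c = (uint32_t)(t >> 32);                /* high 32 bits */
    *d = (uint32_t)t;                        /* low 32 bits (truncating cast) */
}
```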

The reader can easily verify that this code properly computes the product. We now highlight how this makes the computation efficient. The intermediate results are stored in the array t of 64-bit ullong. This means that after the 64-bit product aibj of words ai and bj is computed (in the temporary variable T), we directly add T to the location ti+j. If the sum exceeds ℜ^2 – 1 = 2^64 – 1, that is, if an overflow occurs, we should add ℜ to ti+j+1 or, equivalently, 1 to ti+j+2. This last addition is one of ullong integers and can be made more efficient if it is replaced by ulong increments, and this is what we do using the temporary array u. Since the quadratic loop is the bottleneck of the multiplication procedure, it is absolutely necessary to make this loop as efficient as possible.

Algorithm 3.4. Multiplication of multiple-precision integers

Input: Integers a = (ar–1 . . . a0) and b = (bs–1 . . . b0)

Output: The product c = (cr+s–1 . . . c0) = ab.

Steps:

/* Let T be a variable and t0, . . . , tr+s–1 an array of ullong variables */

/* Let v be a variable and u0, . . . , ur+s–1 an array of ulong variables */

Initialize the array locations ci, ti and ui to 0 for all i = 0, . . . , r + s – 1.

/* The quadratic loop */
for (i = 0, . . . , r – 1) and (j = 0, . . . , s – 1) {
   T := (ullong)(ai) * (ullong)(bj).
   if ((ti+j += T) < T) ui+j+2++.
}

/* Deferred normalization */
for (i = 0, . . . , r + s – 1) {
    if ((ci += ui) < ui) ui+1++.
    v := (ulong)(ti), if ((ci += v) < v) ui+1++.
    v := (ulong)(ti ≫ 32), if ((ci+1 += v) < v) ui+2++.
}

After the quadratic loop, we do deferred normalization from the array of 64-bit double-words ti to the array of 32-bit words ci. This is done using the typecasting and right-shift strategy mentioned in Algorithm 3.3. We should also take care of the intermediate carries stored in the array u. The normalization loop takes a total time of O(r + s), whereas the quadratic loop takes time O(rs). If we had done normalization inside the quadratic loop itself, that would incur an additional O(rs) cost (which is significantly more than that of deferred normalization).
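As a concrete (C99) rendering of Algorithm 3.4 with deferred normalization, the following sketch uses variable-length arrays for the temporaries; the names and the guard digit are our conventions:

```c
#include <stdint.h>
#include <string.h>

/* Schoolbook multiplication with deferred normalization (Algorithm 3.4):
   a has r digits, b has s digits, c receives the r + s digits of ab. */
static void mp_mul(const uint32_t *a, int r, const uint32_t *b, int s,
                   uint32_t *c)
{
    uint64_t t[r + s];            /* 64-bit accumulators (C99 VLAs) */
    uint32_t u[r + s + 2];        /* deferred carries */
    uint32_t cc[r + s + 1];       /* result with one guard digit */
    memset(t, 0, sizeof t); memset(u, 0, sizeof u); memset(cc, 0, sizeof cc);

    /* quadratic loop: accumulate a_i * b_j into t[i+j]; a 64-bit overflow
       amounts to a carry of R^2, recorded as an increment of u[i+j+2] */
    for (int i = 0; i < r; i++)
        for (int j = 0; j < s; j++) {
            uint64_t T = (uint64_t)a[i] * b[j];
            if ((t[i + j] += T) < T) u[i + j + 2]++;
        }

    /* deferred normalization: fold the 64-bit accumulators and the carry
       array into the 32-bit result */
    for (int i = 0; i < r + s; i++) {
        uint32_t v;
        if ((cc[i] += u[i]) < u[i]) u[i + 1]++;
        v = (uint32_t)t[i];
        if ((cc[i] += v) < v) u[i + 1]++;
        v = (uint32_t)(t[i] >> 32);
        if ((cc[i + 1] += v) < v) u[i + 2]++;
    }
    memcpy(c, cc, (r + s) * sizeof(uint32_t));
}
```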

Squaring

If both the operands a and b of multiplication are the same, it is not necessary to compute aibj and ajbi separately. We should add to ti+j the product ai^2, if i = j, or the product 2aiaj, if i < j. Note that 2aiaj can be computed by left-shifting aiaj by one bit. This might result in an overflow, which can be checked before shifting by looking at the 64th bit of aiaj. Algorithm 3.5 incorporates these changes.

Fast multiplication

For the multiplication of two multiple-precision integers, there are algorithms that are asymptotically faster than the quadratic Algorithms 3.4 and 3.5. However, not all these theoretically faster algorithms are practical for the sizes of integers used in cryptology. Our practical experience shows that a strategy due to Karatsuba outperforms the quadratic algorithm, if both the operands are of roughly equal sizes and if the bit lengths of the operands are 300 or more. We describe Karatsuba’s algorithm in connection with squaring, where the two operands are the same (and hence of the same size). Suppose we want to compute a^2 for a multiple-precision integer a = (ar–1 . . . a0). We first break a into two integers of almost equal sizes, namely, α := (ar–1 . . . at) and β := (at–1 . . . a0), so that a = ℜ^t α + β. Now, a^2 = α^2ℜ^2t + 2αβℜ^t + β^2 and 2αβ = (α^2 + β^2) – (α – β)^2. We recursively invoke Karatsuba’s multiplication with operands α, β and α – β. Recursion continues as long as the operands are not too small and the depth of recursion is within a prescribed limit. One can check that Karatsuba’s algorithm runs in time O(r^(lg 3) lg r) = O(r^1.585 lg r), which is a definite improvement over the O(r^2) running time taken by the quadratic algorithm.
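One level of this recursion can be illustrated at the word level: squaring a 32-bit word from its 16-bit halves with three half-size squarings instead of four half-size products (a self-contained sketch; a real implementation would of course recurse on multiple-precision halves):

```c
#include <stdint.h>

/* One Karatsuba level for squaring a 32-bit word a = 2^16*alpha + beta:
   a^2 = alpha^2*2^32 + (2*alpha*beta)*2^16 + beta^2, where
   2*alpha*beta = (alpha^2 + beta^2) - (alpha - beta)^2, so only the three
   squarings alpha^2, beta^2 and (alpha - beta)^2 are needed. */
static uint64_t karatsuba_square32(uint32_t a)
{
    uint32_t alpha = a >> 16, beta = a & 0xFFFF;
    uint64_t hh = (uint64_t)alpha * alpha;        /* alpha^2 */
    uint64_t ll = (uint64_t)beta * beta;          /* beta^2 */
    int32_t  d  = (int32_t)alpha - (int32_t)beta;
    uint64_t dd = (uint64_t)((int64_t)d * d);     /* (alpha - beta)^2 */
    uint64_t mid = hh + ll - dd;                  /* equals 2*alpha*beta */
    return (hh << 32) + (mid << 16) + ll;
}
```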

Algorithm 3.5. The quadratic loop for squaring

for (i = 0, . . . , r – 1) and (j = i, . . . , r – 1) {
   T := (ullong)(ai) * (ullong)(aj).
   if (i ≠ j) {
      if (the 64th bit of T is 1) ui+j+2 ++.
      T ≪= 1.
   }
   if ((ti+j += T) < T) ui+j+2++.
}

The best-known algorithm for multiplication of two multiple-precision integers is based on the fast Fourier transform (FFT) techniques and has running time O~(r). However, for integers used in cryptology this algorithm is usually not practical. Therefore, we will not discuss FFT multiplication in this book.

Division

Euclidean division with remainder of multiple-precision integers is somewhat cumbersome, although conceptually as difficult (that is, as simple) as the division procedure for decimal integers taught in the early days of school. The most challenging part of the procedure is guessing the next digit of the quotient. For decimal integers, we usually do this by looking at the first few (decimal) digits of the divisor and the dividend. This need not give us the correct digit, but something close to it. In the case of ℜ-ary digits, we also guess the quotient digit from a few leading ℜ-ary digits of the divisor and the dividend, but certain precautions have to be taken to ensure that the guess is not too far from the correct one.

Suppose we are given positive integers a = (ar–1 . . . a0) and b = (bs–1 . . . b0) with ar–1 ≠ 0 and bs–1 ≠ 0, and we want to compute the integers x = (xr–s . . . x0) and y = (ys–1 . . . y0) with a = xb + y, 0 ≤ y < b. First, we want that bs–1 ≥ ℜ/2 (you’ll see why later). If this condition is not already met, we force it by multiplying both a and b by 2^t for some suitable t, 0 < t < 32. In that case, the quotient remains the same, but the remainder gets multiplied by 2^t. The desired remainder can later be found out easily by right-shifting the computed remainder by t bits. The process of making bs–1 ≥ ℜ/2 is often called normalization (of b). Henceforth, we will assume that b is normalized. Note that normalization may increase the word-size of a by 1.

Algorithm 3.6. Euclidean division of multiple-precision integers

Input: Integers a = (ar–1 . . . a0) and b = (bs–1 . . . b0) with r ≥ 3, s ≥ 2, ar–1 ≠ 0, bs–1 ≥ ℜ/2 and a ≥ b.

Output: The quotient x = (xr–s . . . x0) = a quot b and the remainder y = (ys–1 . . . y0) = a rem b of Euclidean division of a by b.

Steps:

Initialize the quotient digits xi to 0 for i = 0, . . . , r – s.

/* The main loop */
for (i = r – 1, . . . , s) {
   /* Initial check */
   if ((ai ≥ bs–1) and (a ≥ bℜ^(i–s+1))) { xi–s+1++, a := a – bℜ^(i–s+1). }

   /* Guess the next digit of quotient */
   if (ai = bs–1) xi–s := ℜ – 1, else xi–s := ⌊(aiℜ + ai–1)/bs–1⌋.
   if (xis ≠ 0)
       while (xi–s(bs–1ℜ + bs–2) > aiℜ^2 + ai–1ℜ + ai–2) xi–s––.

   /* Modify the guess to the correct value */
   z := xi–s bℜ^(i–s).
   if (a < z) { xi–s––, z := z – bℜ^(i–s). }
   a := a – z.
}

/* Here the quotient may be one less than the actual value */
if (a ≥ b) { a := a – b, x := x + 1. }
y := a.

Algorithm 3.6 implements multiple-precision division. It is not difficult to prove the correctness of the algorithm. We refrain from doing so, but make some useful comments. The initial check inside the main loop may cause the increment of xi–s+1. This may lead to a carry which has to be adjusted in the higher digits. This carry propagation is not mentioned in the code for simplicity. Since b is assumed to be normalized, this initial check needs to be carried out only once; that is, for a non-normalized b we would have to replace the if statement by a while loop. This is the first advantage of normalization. In the first step of guessing the quotient digit xi–s, we compute ⌊(aiℜ + ai–1)/bs–1⌋ using ullong arithmetic. At this point, the guess is based only on two leading digits of a and one leading digit of b. In the while loop, we refine this guess by considering one more digit of each of a and b. Since b is normalized, this while loop is executed no more than twice (the second advantage of normalization). The guess for xi–s made in this way is either equal to or one more than the correct value, which is then computed by comparing a with xi–s bℜ^(i–s). The running time of the algorithm is O(s(r – s)). For a fixed r, this is maximum (namely, O(r^2)) when s ≈ r/2.
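The digit-guessing step can be isolated in C as follows (a sketch; the function name and conventions are ours, and the piecewise 96-bit comparison relies on the `unsigned __int128` extension of GCC and Clang rather than on multiple-precision routines):

```c
#include <stdint.h>

/* The guessing step of Algorithm 3.6 in isolation: given the three leading
   digits a2 a1 a0 of the (shifted) dividend and the two leading digits b1 b0
   of the normalized divisor (b1 >= 2^31), return the refined quotient-digit
   guess, which is either exact or one too large. */
static uint32_t guess_digit(uint32_t a2, uint32_t a1, uint32_t a0,
                            uint32_t b1, uint32_t b0)
{
    const uint64_t R = 1ULL << 32;
    uint64_t q = (a2 == b1) ? R - 1
                            : ((((uint64_t)a2 << 32) | a1) / b1);
    /* refine: decrease q while q*(b1*R + b0) > a2*R^2 + a1*R + a0; with a
       normalized divisor this loop runs at most twice */
    unsigned __int128 rhs = ((unsigned __int128)a2 << 64)
                          + (((uint64_t)a1 << 32) | a0);
    while (q > 0) {
        unsigned __int128 lhs =
            (unsigned __int128)q * (((uint64_t)b1 << 32) | b0);
        if (lhs <= rhs) break;
        q--;
    }
    return (uint32_t)q;
}
```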

Bit-wise operations

Multiplication and division by a power of 2 can be carried out more efficiently using bit operations (on words) instead of calling the general procedures just described. It is also often necessary to compute the bit length of a non-zero multiple-precision integer and the multiplicity of 2 in it. In these cases also, one should use bit operations for efficiency. For these implementations, it is advantageous to maintain precomputed tables of the constants 2^i, i = 0, . . . , 31, and of 2^i – 1, i = 0, . . . , 32, rather than computing them in situ every time they are needed. In Algorithm 3.7, we describe an implementation of multiplication by a power of 2 (that is, the left shift operation). We use the symbols OR, ≫ and ≪ to denote bit-wise or, right shift and left shift operations on 32-bit integers.

Algorithm 3.7. Left-shift of multiple-precision integers

Input: Integer a = (ar–1 . . . a0) ≠ 0, ar–1 ≠ 0, and an integer t ≥ 0.

Output: The integer c = (cs–1 . . . c0) = a · 2t, cs–1 ≠ 0.

Steps:

u := t quot 32, v := t rem 32.
if (v = 0) { /* Word-by-word copy */
    s := r + u.
    for (i = r – 1, . . . , 0) ci+u := ai.
} else { /* Use shifts of individual words */
    s := r + u + 1, cs–1 := 0.
    for (i = r – 1, . . . , 0) { ci+u+1 := ci+u+1 OR (ai ≫ (32 – v)), ci+u := (ai ≪ v). }
    if (cs–1 = 0) s––.
}
for (i = u – 1, . . . , 0) ci := 0.
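The other word-level helpers mentioned above, the bit length and the multiplicity of 2, can be sketched in the same spirit (our names; for brevity the per-word scans use loops where a real implementation might use the precomputed tables suggested in the text):

```c
#include <stdint.h>

/* Bit length of a nonzero multiple-precision integer with r digits,
   a[r-1] != 0, digits stored least significant first. */
static int bit_length(const uint32_t *a, int r)
{
    uint32_t w = a[r - 1];       /* most significant (nonzero) word */
    int n = 0;
    while (w) { n++; w >>= 1; }  /* a table lookup can replace this loop */
    return 32 * (r - 1) + n;
}

/* Multiplicity of 2 (2-adic valuation) of a nonzero multiple-precision
   integer: count whole zero words, then trailing zero bits of the next. */
static int twoadic_valuation(const uint32_t *a, int r)
{
    (void)r;                     /* a != 0, so the scans below terminate */
    int i = 0;
    while (a[i] == 0) i++;
    uint32_t w = a[i];
    int n = 0;
    while ((w & 1) == 0) { n++; w >>= 1; }
    return 32 * i + n;
}
```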

Unless otherwise mentioned, we will henceforth forget about the above structural representation of multiple-precision integers and denote arithmetic operations on them by the standard symbols (+, –, * or · or ×, quot, rem and so on).

3.3.3. GCD

Computing the greatest common divisor of two (multiple-precision) integers has important applications. In this section, we assume that we want to compute the (positive) gcd of two positive integers a and b. The Euclidean gcd loop comprising repeated division (Proposition 2.15) is usually not the most efficient way to compute integer gcds. We describe the binary gcd algorithm, which turns out to be faster for practical bit sizes of the operands a and b. If a = 2^r a′ and b = 2^s b′ with a′ and b′ odd, then gcd(a, b) = 2^min(r,s) gcd(a′, b′). Therefore, we may assume that a and b are odd. In that case, if a > b, then gcd(a, b) = gcd(a – b, b) = gcd((a – b)/2^t, b), where t := v2(a – b) is the multiplicity of 2 in a – b. Since the sum of the bit sizes of (a – b)/2^t and b is strictly smaller than that of a and b, repeating this computation terminates the algorithm after finitely many iterations.
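The plain (non-extended) binary gcd just described can be sketched for single words as follows (our illustration; the multiple-precision version replaces the word operations by the routines of Section 3.3.2):

```c
#include <stdint.h>

/* Binary gcd: extract common powers of 2, then repeat
   gcd(a, b) = gcd((b - a)/2^t, a) on odd operands until one becomes 0. */
static uint64_t binary_gcd(uint64_t a, uint64_t b)
{
    if (a == 0) return b;
    if (b == 0) return a;
    int shift = 0;
    while (((a | b) & 1) == 0) { a >>= 1; b >>= 1; shift++; }
    while ((a & 1) == 0) a >>= 1;         /* now a is odd */
    while (b) {
        while ((b & 1) == 0) b >>= 1;     /* strip 2s: gcd unchanged, a odd */
        if (a > b) { uint64_t t = a; a = b; b = t; }
        b -= a;                           /* b becomes even (or zero) */
    }
    return a << shift;                    /* restore the common powers of 2 */
}
```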

Algorithm 3.8. Extended binary gcd

Input: Two positive integers a, b with a ≥ b and b odd.

Output: Integers d, u and v with d = gcd(a, b) = ua + vb > 0. If (a, b) ≠ (1, 1), then |u| < b and |v| < a.

Steps:

/* Initial reduction */
Compute integers q and r satisfying a = bq + r with 0 ≤ r < b.
if (r = 0) { (duv) := (b, 0, 1), return. }

/* Initialize */
(xy) := (br).
v1 := 0, v2 := 1.

/* Main loop */
while (1) {
   if (x ≥ y) {
      x := x – y.   /* x is even here except perhaps in the first iteration */
      v1 := v1 – v2.
      if (x = 0) {   /* End loop and return du and v */
         u2 := (y – v2r)/b.
         (duv) := (yv2u2 – v2q).
         Return.
      } else if (x is even) {
         t := v2(x), x := x/2t.    /* x is odd here */
         for (i = 1, . . . , t) {
            if (v1 is odd) v1 := v1 + b.
            v1 := v1/2.
         }
       }
     } else { /* if (x < y) */
       y := y – x, v2 := v2 – v1.    /* y is even here */
       t := v2(y), y := y/2t.   /* y is odd here */
       for (i = 1, . . . , t) {
          if (v2 is odd) v2 := v2 + b.
          v2 := v2/2.
       }
   }
}

Multiple-precision division is much costlier than subtraction followed by division by a power of 2. This is why the binary gcd algorithm outperforms the Euclidean gcd algorithm. However, if the bit sizes of a and b differ considerably, it is preferable to use Euclidean division once and replace the pair (a, b) by (b, a rem b), before entering the binary gcd loop. Even when the original bit sizes of a and b are not much different, one may carry out this initial reduction, because in this case the Euclidean division does not take much time.

Recall from Proposition 2.16 that if d := gcd(a, b), then for some integers u and v we have d = ua + vb. Computation of d along with a pair of integers u, v is called the extended gcd computation. Both the Euclidean and the binary gcd loops can be augmented to compute these integers u and v. Since binary gcd is faster than Euclidean gcd, we describe an implementation of the extended binary gcd algorithm. We assume that 0 < ba and compute u and v in such a way that if (a, b) ≠ (1, 1), then |u| < b and |v| < a. Algorithm 3.8, which shows the details, requires b to be odd. The other operand a may also be odd, though the working of the algorithm does not require this.

In order to prove the correctness of Algorithm 3.8, we introduce the sequence of integers xk, yk, u1,k, u2,k, v1,k and v2,k for k = 0, 1, 2, . . . , initialized as:

x0 := b,   u1,0 := 1,   v1,0 := 0,
y0 := r,   u2,0 := 0,   v2,0 := 1.

During the k-th iteration of the main loop, k = 1, 2, . . . , we modify the values xk–1, yk–1, u1,k–1, u2,k–1, v1,k–1 and v2,k–1 to xk, yk, u1,k, u2,k, v1,k and v2,k in such a way that we always maintain the relations:

u1,k x0 + v1,k y0 = xk,
u2,k x0 + v2,k y0 = yk.

The main loop terminates when xk = 0, and at that point we have the desired relation yk = gcd(b, r) = u2,kb + v2,kr. For the updating during the k-th iteration, we assume that xk–1 ≥ yk–1. (The converse inequality can be handled analogously.) The x and y values are updated as xk := (xk–1 – yk–1)/2^(tk), yk := yk–1, where tk := v2(xk–1 – yk–1). Thus, we have u2,k = u2,k–1 and v2,k = v2,k–1, whereas if tk > 0, we write

u1,k = [u1,k–1 – u2,k–1 – λk r]/2^(tk),
v1,k = [v1,k–1 – v2,k–1 + λk b]/2^(tk),

for the (unique) integer λk, 0 ≤ λk < 2^(tk), that makes the bracketed numerators divisible by 2^(tk). All the expressions within square brackets in the last equation are integers, since x0 = b is odd.

Algorithm 3.8 continues to work even when a < b, but in that case the initial reduction simply interchanges a and b and we forfeit the possibility of the reduction in size of the arguments (x and y) caused by the initial Euclidean division.

Finally, we remove the restriction that b is odd. We write a = 2^r a′ and b = 2^s b′ with a′, b′ odd and call Algorithm 3.8 with a′ and b′ as parameters (swapping a′ and b′, if a′ < b′) to compute integers d′, u′, v′ with d′ = gcd(a′, b′) = u′a′ + v′b′. Without loss of generality, assume that r ≥ s. Then d := gcd(a, b) = 2^s d′ = u′(2^s a′) + v′b. If r = s, then 2^s a′ = a and we are done. So assume that r > s. If u′ is even, we can extract a power of 2 from u′ and multiply 2^s a′ by this power. So let’s say that we have a situation of the form d = ū(2^t a′) + v̄b for some integers ū and v̄, with ū odd, and for s ≤ t < r. We can rewrite this as d = (ū + b′)(2^t a′) + (v̄ – 2^(t–s) a′)b. Since ū + b′ is even, this gives us d = ū′(2^τ a′) + v̄′b, where τ > t and where ū′ is odd or τ = r. Proceeding in this way, we eventually reach a relation of the form d = u(2^r a′) + vb = ua + vb. It is easy to check that if (a′, b′) ≠ (1, 1), then the integers u and v obtained as above satisfy |u| < b and |v| < a.

3.3.4. Modular Arithmetic

So far, we have described how we can represent and work with the elements of ℤ. In cryptology, we are often more interested in the arithmetic of the rings ℤn for multiple-precision integers n. We canonically represent the elements of ℤn by the integers between 0 and n – 1.

Let a, b ∈ ℤn. In order to compute a + b in ℤn, we compute the integer sum a + b, and, if a + b ≥ n, we subtract n from it. This gives us the desired canonical representative in ℤn. Similarly, for computing a – b in ℤn, we subtract b from a as integers, and, if the difference is negative, we add n to it. For computing ab in ℤn, we multiply a and b as integers and then take the remainder of Euclidean division of this product by n.

Note that a ∈ ℤn is invertible (that is, a ∈ ℤn*) if and only if gcd(a, n) = 1. For a ∈ ℤn, a ≠ 0, we call the extended (binary) gcd algorithm with a and n as the arguments and get integers d, u, v satisfying d = gcd(a, n) = ua + vn. If d > 1, a is not invertible modulo n. Otherwise, we have ua ≡ 1 (mod n), that is, a^–1 ≡ u (mod n). The extended gcd algorithm indeed returns a value of u satisfying |u| < n. Thus if u > 0, it is the canonical representative of a^–1, whereas if u < 0, then u + n is the canonical representative of a^–1.
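For single words this inversion can be sketched with the classical extended Euclidean loop (our illustration; the text's extended *binary* gcd works equally well and is the faster choice for multiple-precision operands):

```c
#include <stdint.h>

/* Modular inverse of a modulo n via the extended Euclidean algorithm.
   Maintains the invariant u*a = x (mod n) for the "x row"; returns the
   canonical representative of a^{-1}, or 0 if gcd(a, n) > 1. */
static uint32_t mod_inverse(uint32_t a, uint32_t n)
{
    int64_t u = 1, v = 0;        /* coefficients of a in the two rows */
    int64_t x = a, y = n;
    while (y != 0) {
        int64_t q = x / y, t;
        t = x - q * y; x = y; y = t;
        t = u - q * v; u = v; v = t;
    }
    if (x != 1) return 0;        /* not invertible */
    return (uint32_t)(u >= 0 ? u : u + (int64_t)n);
}
```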

Modular exponentiation

Another frequently needed operation in ℤn is modular exponentiation, that is, the computation of a^e for some a ∈ ℤn and e ∈ ℤ. Since a^0 = 1 for all a ∈ ℤn and since a^e = (a^–1)^–e for e < 0 and a ∈ ℤn*, we may assume, without loss of generality, that e > 0. Computing the integral power a^e followed by taking the remainder of Euclidean division by n is not an efficient way to compute a^e in ℤn. Instead, after every multiplication, we reduce the product modulo n. This keeps the sizes of the intermediate products small. Furthermore, it is also a bad idea to compute a^e as (· · ·((a·a)·a)· · ·a), which involves e – 1 multiplications. It is possible to compute a^e using O(lg e) multiplications and O(lg e) squarings in ℤn, as Algorithm 3.9 suggests. This algorithm requires the bits of the binary expansion of the exponent e, which are easily obtained by bit operations on the words of e.

The for loop iteratively computes bi := a^((er–1 . . . ei)2) (mod n) starting from the initial value br := 1. Since (er–1 . . . ei)2 = 2(er–1 . . . ei+1)2 + ei, we have bi ≡ (bi+1)^2 · a^ei (mod n). This establishes the correctness of the algorithm. The squaring (b^2) and multiplication (ba) inside the for loop of the algorithm are computed in ℤn (that is, as integer multiplication followed by reduction modulo n). If we assume that er–1 = 1, then r = ⌈lg e⌉. The algorithm carries out r squarings and ρ ≤ r multiplications in ℤn, where ρ is the number of bits of e that are 1. On an average, ρ = r/2. Algorithm 3.9 runs in time O((log e)(log n)^2). Typically, e = O(n), so this running time is O((log n)^3).

Algorithm 3.9. Modular exponentiation: square-and-multiply algorithm

Input: a ∈ ℤn and e ∈ ℕ.

Output: b = a^e (mod n).

Steps:

Let the binary expansion of e be e = (er–1 . . . e1e0)2, where each ei ∈ {0, 1}.
b := 1.
for (i = r – 1, . . . , 0) {
   b := b2 (mod n).    /* Squaring */
   if (ei = 1) b := ba (mod n).    /* Multiplication */
}
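For word-sized operands, Algorithm 3.9 reads as follows in C (a sketch; multiple-precision versions replace the 64-bit arithmetic by the routines of Section 3.3.2, and the modulus is kept below 2^32 so that the products fit in 64 bits):

```c
#include <stdint.h>

/* Square-and-multiply: compute a^e (mod n) by scanning the bits of e from
   the most significant end, squaring at every bit and multiplying by a at
   every 1-bit. */
static uint32_t mod_exp(uint32_t a, uint32_t e, uint32_t n)
{
    uint64_t b = 1, base = a % n;
    for (int i = 31; i >= 0; i--) {
        b = (b * b) % n;                        /* squaring */
        if ((e >> i) & 1) b = (b * base) % n;   /* multiplication */
    }
    return (uint32_t)b;
}
```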

Now, we describe a simple variant of this square-and-multiply algorithm, in which we choose a small t and use the 2^t-ary representation of the exponent e. The case t = 1 corresponds to Algorithm 3.9. In practical situations, t = 4 is a good choice. As in Algorithm 3.9, multiplication and squaring are done in ℤn.

Algorithm 3.10. Modular exponentiation: windowed square-and-multiply algorithm

Input: a ∈ ℤn and e ∈ ℕ.

Output: b = a^e (mod n).

Steps:

Let e = (er–1 . . . e1e0) in the 2^t-ary representation, where each ei ∈ {0, 1, . . . , 2^t – 1}.
Compute and store a^l (mod n) for l = 0, 1, . . . , 2^t – 1.   /* Precomputation */
b := 1.
for (i = r – 1, . . . , 0) {
   for (j = 1, . . . , t) b := b2 (mod n).    /* Squaring */
   b := b·a^ei (mod n).     /* Multiplication: Read a^ei from the precomputed table */
}

In Algorithm 3.10, the powers a^l, l = 0, 1, . . . , 2^t – 1, are precomputed using the formulas: a^0 = 1, a^1 = a and a^l = a^(l–1) · a for l ≥ 2. The number of squarings inside the for loop remains (almost) the same as in Algorithm 3.9. However, the number of multiplications in this loop reduces at the expense of the precomputation step. For example, let n be an integer of bit length 1024 and let e ≈ n. A randomly chosen e of this size has about 512 one-bits. Therefore, the for loop of Algorithm 3.9 does about 512 multiplications, whereas with t = 4 Algorithm 3.10 does only 1024/4 = 256 multiplications, with the precomputation step requiring 14 multiplications. Thus, the total number of multiplications reduces from (about) 512 to 14 + 256 = 270.
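A word-sized C sketch of Algorithm 3.10 with t = 4 follows (our conventions, under the same size assumptions as before, n < 2^32; a 32-bit exponent has eight 4-bit digits):

```c
#include <stdint.h>

/* 2^t-ary ("windowed") square-and-multiply with t = 4: precompute
   a^0, ..., a^15 (mod n), then process e one 4-bit digit at a time,
   squaring four times per digit and multiplying once from the table. */
static uint32_t mod_exp_window(uint32_t a, uint32_t e, uint32_t n)
{
    uint64_t table[16], b = 1;
    table[0] = 1 % n;
    for (int l = 1; l < 16; l++)
        table[l] = (table[l - 1] * (a % n)) % n;   /* a^l = a^(l-1) * a */
    for (int i = 28; i >= 0; i -= 4) {
        for (int j = 0; j < 4; j++) b = (b * b) % n;   /* four squarings */
        b = (b * table[(e >> i) & 0xF]) % n;           /* one table lookup */
    }
    return (uint32_t)b;
}
```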

Montgomery exponentiation

During a modular exponentiation in ℤn, every reduction (computation of a remainder) is with respect to the fixed modulus n. Montgomery exponentiation exploits this fact and speeds up each modular reduction at the cost of some preprocessing overhead.

Assume that the storage of n requires s ℜ-ary digits, that is, n = (ns–1 . . . n0) (with ns–1 ≠ 0). Take R := ℜ^s = 2^32s, so that R > n. As is typical in most cryptographic situations, n is an odd integer (for example, a big prime or a product of two big primes). Then gcd(ℜ, n) = gcd(R, n) = 1. Use the extended gcd algorithm to precompute n′ := –n^–1 (mod ℜ).

We associate with x ∈ ℤn the element x̄ := xR (mod n), called the Montgomery representation of x. Since R is invertible modulo n, this association gives a bijection of ℤn onto itself. This bijection respects the addition in ℤn: that is, x̄ + ȳ (computed in ℤn) is the Montgomery representation of x + y. Multiplication in ℤn, on the other hand, corresponds to the operation x̄ȳR^–1 (mod n), which is the Montgomery representation of xy, and can be implemented as Algorithm 3.11 suggests.

Algorithm 3.11. Montgomery multiplication

Input: x̄ and ȳ (Montgomery representations of x, y ∈ ℤn).

Output: The Montgomery representation x̄ȳR^–1 (mod n) of xy.

Steps:

Montgomery multiplication works as follows. In the first step, it computes the integer product w := x̄ȳ = (w2s–1 . . . w0). The subsequent for loop computes wR^–1 (mod n). Since n′ ≡ –n^–1 (mod ℜ), the i-th iteration of the loop adds a suitable multiple of n to w so as to make wi = 0 (and leaves wi–1, . . . , w0 unchanged). So when the for loop terminates, we have w0 = w1 = · · · = ws–1 = 0: that is, w is a multiple of ℜ^s = R. Therefore, w/R is an integer. Furthermore, this w is obtained by adding to x̄ȳ a multiple of n: that is, w = x̄ȳ + kn for some integer k ≥ 0. Since R is coprime to n, it follows that w/R ≡ x̄ȳR^–1 (mod n). But w/R may be bigger than the canonical representative of x̄ȳR^–1. Since k is an integer with s ℜ-ary digits (so that k < R) and x̄ < n and ȳ < n < R, it follows that w/R = (x̄ȳ + kn)/R < (nR + Rn)/R = 2n. Therefore, if w/R exceeds n – 1, a single subtraction suffices.

Computation of the product w = x̄ȳ requires ≤ s^2 single-precision multiplications. One can use the optimized Algorithm 3.4 for that purpose. In the case of squaring, x̄ = ȳ, and further optimizations (say, in the form of Karatsuba’s method) can be employed.

Each iteration of the for loop carries out s + 1 single-precision multiplications. (The reduction modulo ℜ is just retaining the less significant word of the two-word product win′.) Since the for loop is executed s times, Algorithm 3.11 performs a total of ≤ s^2 + s(s + 1) = 2s^2 + s single-precision multiplications.

Integer multiplication (Algorithm 3.4) followed by classical modular reduction (Algorithm 3.6) does almost an equal number of single-precision multiplications, but also O(s) divisions of double-precision integers by single-precision ones. It turns out that the complicated for loop of Algorithm 3.6 is slower than the much simpler loop in Algorithm 3.11. But if precomputations in the Montgomery multiplication are taken into account, we do not tend to achieve a speed-up with this new technique. For modular exponentiations, however, precomputations need to be done only once: that is, outside the square-and-multiply loop, and Montgomery multiplication pays off. In Algorithm 3.12, we rewrite Algorithm 3.9 in terms of the Montgomery arithmetic. A similar rewriting applies to Algorithm 3.10.
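At the word level (s = 1), the whole of Algorithm 3.11 collapses into a few lines of C. The following sketch is ours: it computes n′ by Hensel lifting instead of the extended gcd algorithm of the text (either works), and it splits the final 64-bit sum so that (w + kn)/R is obtained without overflow:

```c
#include <stdint.h>

/* Precompute n' = -n^{-1} (mod 2^32) for odd n: the iterate inv = n is a
   correct inverse mod 8, and each step inv *= 2 - n*inv doubles the number
   of correct bits (Hensel lifting), so four steps reach 32 bits. */
static uint32_t mont_nprime(uint32_t n)
{
    uint32_t inv = n;
    for (int i = 0; i < 4; i++) inv *= 2 - n * inv;
    return (uint32_t)(0u - inv);              /* n' = -inv mod 2^32 */
}

/* Montgomery multiplication for a single odd word n (R = 2^32), inputs
   being Montgomery residues < n: returns x*y*R^{-1} (mod n). */
static uint32_t mont_mul(uint32_t x, uint32_t y, uint32_t n, uint32_t nprime)
{
    uint64_t w  = (uint64_t)x * y;            /* integer product, < n^2 */
    uint32_t k  = (uint32_t)w * nprime;       /* makes w + k*n divisible by R */
    uint64_t kn = (uint64_t)k * n;
    /* (w + kn)/R without 64-bit overflow: the low words cancel exactly,
       producing a carry of 1 whenever the low word of w is nonzero */
    uint64_t t  = (w >> 32) + (kn >> 32) + (((uint32_t)w != 0) ? 1 : 0);
    return (uint32_t)(t >= n ? t - n : t);    /* t < 2n: one subtraction */
}
```

Converting out of Montgomery representation is the special case of multiplying by 1, since z̄ · 1 · R^–1 ≡ z (mod n).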

Algorithm 3.12. Montgomery exponentiation

Input: a ∈ ℤn and e ∈ ℕ.

Output: b = a^e (mod n).

Steps:

/* Precomputations */
n′ := –n^–1 (mod ℜ), ā := aR (mod n), b̄ := R (mod n).

/* The square-and-multiply loop */

Exercise Set 3.3

3.8 Let ℜ ∈ ℕ, ℜ > 1. Show that every a ∈ ℕ can be represented uniquely as a tuple (as–1, . . . , a1, a0) for some s ∈ ℕ (depending on a) with

a = as–1s–1 + · · · + a1ℜ + a0,

0 ≤ ai < ℜ for all i and as–1 ≠ 0. In this case, we write a as (as–1 . . . a0) or simply as as–1 . . . a0, when ℜ is understood from the context. ℜ is called the radix or base of this representation, as–1, . . . , a0 the (ℜ-ary) digits of a, as–1 the most significant digit, a0 the least significant digit and s the size of a with respect to the radix ℜ.

3.9 Let . Show that every can be written uniquely as

a = asR^s + as–1R^(s–1) + · · · + a1R + a0

with each .

3.10

Negative radix Show that every integer a can be written as

a = as(–2)^s + as–1(–2)^(s–1) + · · · + a1(–2) + a0

with each ai ∈ {0, 1}. Moreover, if we force that as ≠ 0 for a ≠ 0 and that s = 0 for a = 0, argue that this representation is unique.

3.11 Investigate the relative merits and demerits of the following three representations (in C) of multiple-precision integers needed for cryptography. In each case, we have room for storing 256 ℜ-ary words, the actual size and a sign indicator. In the second and third representations, we use two extra locations (sizeIdx and signIdx) in the digit array for holding the size and sign information.
/* Representation 1 */
typedef struct {
   int size;
   boolean sign;
   ulong digits[256];
} cryptInt1;
/* Representation 2 */
typedef ulong cryptInt2[258];
#define signIdx 0
#define sizeIdx 1
/* Representation 3 */
typedef ulong cryptInt3[258];
#define signIdx 256
#define sizeIdx 257

Remark: We recommend the third representation.

3.12 Write an algorithm that prints a multiple-precision integer in decimal and an algorithm that accepts a string of decimal digits (optionally preceded by a + or – sign) and stores the corresponding integer as a multiple-precision integer. Also write algorithms for input and output of multiple-precision integers in hexadecimal, octal and binary.
3.13 Write an algorithm which, given two multiple-precision integers a and b, compares the absolute values |a| and |b|. Also write an algorithm to compare a and b as signed integers.
3.14
  1. Write an algorithm that uses the Euclidean gcd loop (Proposition 2.15) to compute the gcd d of two integers a and b. (Observe that gcd(a, b) = gcd(b, a rem b) for b ≠ 0.)

  2. Modify the Euclidean gcd algorithm of Part (a), so that for given integers a, b we obtain d, u, v with d = gcd(a, b) = ua + vb.

3.15 Describe a representation of rational numbers with exact multiple-precision numerators and denominators. Implement the arithmetic (addition, subtraction, multiplication and division) of rational numbers under this representation.
3.16

Sliding window exponentiation Suppose we want to compute the modular exponentiation a^e (mod n). Consider the following variant of the square-and-multiply algorithm: Choose a small t (say, t = 4) and precompute a^(2^(t–1)), a^(2^(t–1)+1), . . . , a^(2^t–1) modulo n. Do squaring for every bit of e, but skip the multiplication for zero bits in e. Whenever a 1 bit is found, consider the next t bits of e (including the 1 bit). Let these t bits represent the integer l, 2^(t–1) ≤ l ≤ 2^t – 1. Multiply by a^l (mod n) (after computing the usual t squares) and move right in e by t bit positions. Argue that this method works and write an algorithm based on this strategy. What are the advantages and disadvantages of this method over Algorithm 3.10?
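A Python sketch of this strategy follows. As a deviation of ours, the table holds a^l (mod n) for every l in 1, . . . , 2^t – 1 (slightly more than the stated range) so that a short final window near the least significant end is also covered:

```python
def sliding_window_pow(a, e, n, t=4):
    """Sliding-window modular exponentiation sketch (left-to-right scan)."""
    table = [1] * (1 << t)
    for l in range(1, 1 << t):           # table[l] = a^l mod n
        table[l] = (table[l - 1] * a) % n
    bits = bin(e)[2:]
    b, i = 1, 0
    while i < len(bits):
        if bits[i] == '0':               # zero bit: square only
            b = (b * b) % n
            i += 1
        else:                            # 1 bit: take a window of <= t bits
            w = min(t, len(bits) - i)
            l = int(bits[i:i + w], 2)    # window value, 2^(w-1) <= l <= 2^w - 1
            for _ in range(w):           # w squarings ...
                b = (b * b) % n
            b = (b * table[l]) % n       # ... then one table multiplication
            i += w
    return b
```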

3.17 Suppose we want to compute a^e b^f (mod n), where both e and f are positive r-bit integers. One possibility is to compute a^e and b^f modulo n individually, followed by a modular multiplication. This strategy requires the running time of two exponentiations (neglecting the time for the final multiplication). In this exercise, we investigate a trick to reduce this running time to something close to 1.25 times the time for one exponentiation. Precompute ab (mod n). Inside the square-and-multiply loop, either skip the multiplication or multiply by a, b or ab, depending upon the next bits in the two exponents e and f. Complete the details of this algorithm. Deduce that, on average, the running time of this algorithm is as declared above.
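The trick of Exercise 3.17 can be sketched in Python as follows (the function name is ours; the four cases in the loop correspond to the bit pairs (0,0), (1,1), (1,0) and (0,1) of e and f):

```python
def double_exp(a, e, b, f, n):
    """Simultaneous exponentiation: (a^e * b^f) mod n in a single
    left-to-right pass over both exponents."""
    ab = (a * b) % n                       # precomputed once
    r = max(e.bit_length(), f.bit_length())
    c = 1
    for i in range(r - 1, -1, -1):         # scan both exponents MSB first
        c = (c * c) % n
        ei, fi = (e >> i) & 1, (f >> i) & 1
        if ei and fi:
            c = (c * ab) % n
        elif ei:
            c = (c * a) % n
        elif fi:
            c = (c * b) % n
    return c
```

On average, three of every four bit pairs require a multiplication, which is the source of the 1.25 factor claimed in the exercise.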
3.18 Let m ∈ ℕ, m ≠ 1. An addition chain for m of length l is a sequence 1 = a1, a2, . . . , al = m of natural numbers such that for every index i, 2 ≤ i ≤ l, there exist indices i1, i2 < i with ai = ai1 + ai2. (It is allowed to have i1 = i2.)
  1. If 1 = a1, a2, . . . , al = m is an addition chain for m and if j1, j2, . . . , jl is a permutation of 1, 2, . . . , l with aj1 ≤ aj2 ≤ · · · ≤ ajl, show that aj1, aj2, . . . , ajl is also an addition chain for m. It, therefore, suffices to consider sorted addition chains only.

  2. Show that m has an addition chain of length ≤ 2 ⌈lg m⌉. [H]

  3. Let G be a (multiplicative) group and g ∈ G. Design an algorithm for computing gm given an addition chain for m. What is the complexity of the algorithm (in terms of the length of the given addition chain)?

  4. Show that Algorithms 3.9 and 3.10 use addition chains for e of lengths ≤ 2 ⌈lg e⌉.

3.4. Elementary Number-theoretic Computations

Now that we know how to work in ℤ and in the residue class rings ℤn, n ∈ ℕ, we address some important computational problems associated with these rings. In this chapter, we restrict ourselves to those problems that are needed for setting up various cryptographic protocols.

3.4.1. Primality Testing

One of the simplest and oldest questions in algorithmic number theory is to decide if a given integer n ∈ ℕ, n > 1, is prime or composite. Practical primality testing algorithms are based on randomization techniques. In this section, we describe the Monte Carlo algorithm due to Miller and Rabin. The obvious question that comes next is to find one (or all) of the prime factors of an integer, deterministically or probabilistically proven to be composite. This is the celebrated integer factorization problem and will be formally introduced in Section 4.2. In spite of the apparent proximity between the primality testing and the integer factoring problems, they currently have widely different (known) complexities. Primality testing is easy and thereby promotes efficient setting up of cryptographic protocols. On the other hand, the difficulty of factoring integers protects these protocols against cryptanalytic attacks.

Definition 3.2.

Let n be an odd integer greater than 1 and let a ∈ ℤ with gcd(a, n) = 1. Then n is called a pseudoprime to the base a, if a^(n–1) ≡ 1 (mod n).

By Fermat’s little theorem, a prime p is a pseudoprime to every base a with gcd(a, p) = 1. However, the converse of this is not true. By Exercise 3.19, n is not a pseudoprime to at least half of the bases in ℤn*, provided that there is at least one such base in ℤn*. Unfortunately, there exist composite integers m, known as Carmichael numbers, such that m is a pseudoprime to every base a ∈ ℤm*. The smallest Carmichael number is 561 = 3 × 11 × 17. Exercises 3.21 and 3.22 investigate some properties of these numbers. Though Carmichael numbers are not very abundant in nature, they are still infinite in number. So a robust primality test requires n to satisfy certain constraints in addition to being a pseudoprime to one or more bases. The following constraint is due to Solovay and Strassen.

Definition 3.3.

Let n be an odd integer > 1 and let a ∈ ℤ with gcd(a, n) = 1. Then n is called an Euler pseudoprime or a Solovay–Strassen pseudoprime to the base a, if a^((n–1)/2) ≡ (a/n) (mod n), where (a/n) is the Jacobi symbol (Definition 2.32). Clearly, an Euler pseudoprime to the base a is also a pseudoprime to the base a.

By Euler’s criterion (Proposition 2.21), if p is a prime and gcd(a, p) = 1, then p is an Euler pseudoprime to the base a. The converse is not true, in general, but if n is composite, then n is an Euler pseudoprime to at most φ(n)/2 bases in ℤn* (Exercise 3.20). This, in turn, implies that if n is an Euler pseudoprime to t randomly chosen bases in ℤn*, then the chance that n is composite is no more than 1/2^t. This observation leads to a Monte Carlo algorithm for testing the primality of an integer, where the probability of error (1/2^t) can be made arbitrarily small by choosing large values of t. A more efficient algorithm can be developed using the following concept due to Miller and Rabin.

Definition 3.4.

Let n be an odd integer > 1 with n – 1 = 2^r n′, r := v2(n – 1) > 0, n′ odd, and let a ∈ ℤ with gcd(a, n) = 1. Then n is called a strong pseudoprime to the base a, if either a^n′ ≡ 1 (mod n) or a^(2^i n′) ≡ –1 (mod n) for some i, 0 ≤ i < r. It is clear that if n is a strong pseudoprime to the base a, then n is also a pseudoprime to the base a. What is less evident but still true is that if n is a strong pseudoprime to the base a, then n is also an Euler pseudoprime to the base a.

The rationale behind this definition is the following. If for some a ∈ ℤn* we have a^(n–1) ≢ 1 (mod n), we conclude with certainty that n is composite. So assume that a^(n–1) ≡ 1 (mod n) and consider the powers bi := a^(2^i n′) (mod n) for i = 0, 1, . . . , r to see how the sequence b0, b1, . . . eventually reaches br ≡ 1 (mod n). If b0 ≡ 1 (mod n) already, the dynamics is clear. If, on the other hand, we have an i such that bi ≢ 1 (mod n), whereas bi+1 ≡ 1 (mod n), then bi is a square root of 1 modulo n. If n is a prime, the only square roots of 1 modulo n are ±1 and so n must be a strong pseudoprime to the base a. On the other hand, if n is composite but not the power of a prime, then 1 has at least two non-trivial square roots (that is, square roots other than ±1) modulo n (Exercise 3.30). We hope to find one such non-trivial square root of 1 in the sequence b0, b1, . . . , br–1 and if we are successful, the compositeness of n is proved with certainty.

A complete residue system modulo an odd composite n contains at most n/4 bases to which n is a strong pseudoprime. The proof of this fact is somewhat involved (though elementary) and can be found elsewhere, for example, in Chapter V of Koblitz [153]. Here, we concentrate on the Monte Carlo Algorithm 3.13 known as the Miller–Rabin primality test and based on this observation.

Algorithm 3.13. Miller–Rabin primality test

Input: An odd integer n > 1 and an acceptable probability δ of failure.

Output: A certificate that either “n is composite” or “n is prime”.

Steps:

Find n′ and r such that n – 1 = 2^r n′ with r ≥ 1 and n′ odd.
Determine the number t of iterations, so that the probability of failure is ≤ δ.
for (j = 1, . . . , t) {
   Choose a random base a, 1 < a < n.
   b := a^n′ (mod n).   /* Compute b0 */
   if (b ≢ 1 (mod n)) {
      i := 0.
      while (i < r – 1) and (b ≢ –1 (mod n)) {
         i++, b := b2 (mod n).    /* Compute bi by squaring bi–1 */
         if (b ≡ 1 (mod n)) { Return “n is composite”. }
      }
      if (b ≢ –1 (mod n)) { Return “n is composite”. }
   }
}
Return “n is prime”.
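A direct Python transcription of Algorithm 3.13 may look as follows (illustrative only; for tiny n the random range is degenerate, so the base is fixed there):

```python
import random

def miller_rabin(n, t):
    """Miller-Rabin test: returns False ("n is composite") with certainty,
    True ("n is prime") with error probability at most (1/4)**t."""
    if n % 2 == 0:
        return n == 2
    r, n1 = 0, n - 1
    while n1 % 2 == 0:                 # n - 1 = 2^r * n' with n' odd
        r += 1
        n1 //= 2
    for _ in range(t):
        a = random.randrange(2, n - 1) if n > 4 else 2   # base, 1 < a < n
        b = pow(a, n1, n)              # b_0 = a^{n'} mod n
        if b != 1:
            i = 0
            while i < r - 1 and b != n - 1:
                i += 1
                b = (b * b) % n        # b_i from b_{i-1} by squaring
                if b == 1:
                    return False       # non-trivial square root of 1 found
            if b != n - 1:
                return False
    return True
```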

Whenever Algorithm 3.13 outputs “n is composite”, it is correct. On the other hand, if it certifies n as prime, there is a probability of at most δ that n is composite. This probability can be made very small by choosing a suitably large value of the iteration count t. For cryptographic applications, δ ≤ 1/2^80 is considered sufficiently safe. In view of the first statement of the last paragraph, we can take t = 40 to meet this error bound. In practice, much smaller values of t offer the desired confidence. For example, if n is of bit length 250, 500, 750 or 1000, the respective values t = 12, 6, 4 and 3 suffice.

Although, in Algorithm 3.13, we have chosen a to be an arbitrary integer between 2 and n – 2, there is apparently no harm if we choose a randomly in the interval 2 ≤ a < 2^32. In fact, such a choice of single-precision bases is desirable, because it makes the exponentiation a^n′ (mod n) more efficient (see Algorithm 3.9). A typical cryptographic application loads at start-up a precalculated table of small primes (say, the first thousand primes). Choosing the bases randomly from this list of small primes is indeed a good idea.

Deterministic primality proving

While the Miller–Rabin algorithm settles the primality testing problem in a practical sense, it is, after all, a randomized algorithm. It is interesting, at the minimum theoretically, to investigate the deterministic complexity of primality testing. There has been a good amount of research in this line. Let us sketch here the history of deterministic primality proving, without going to rigorous mathematical details.

One natural strategy to check for the primality of a positive integer n is to factor it. However, factoring integers is a computationally difficult problem. Primality proving has been found to be a much easier computational exercise. That is, one need not factorize n explicitly in order to claim about the primality of n.

The (seemingly) first modern primality testing algorithm is due to Miller [204]. This algorithm is deterministic polynomial-time, provided that the extended Riemann hypothesis or ERH (Conjecture 2.3) is true. Since the ERH is still an unsolved problem in mathematics, it cannot be claimed with certainty that Miller’s test is really a polynomial-time algorithm. Rabin [248] provided a version of Miller’s test which is unconditionally polynomial-time, but is, at the same time, randomized. This is what we have discussed earlier under the name Miller–Rabin primality test. It is a Monte Carlo algorithm which produces the answer no (composite) with certainty, but the answer yes (prime) with some (small) probability of error. Solovay and Strassen’s test [287], based on Definition 3.3, is another no-biased randomized polynomial-time primality test and can be made deterministic polynomial-time under the ERH.

Adleman and Huang [3], using the work of Goldwasser and Kilian [116], provide a yes-biased randomized primality-proving algorithm that runs in expected polynomial time unconditionally. Adleman et al. [4] propose the first deterministic algorithm that runs unconditionally in time less than fully exponential (in log n). Its (worst-case) running time is (ln n)^O(ln ln ln n), which is still not polynomial. (The exponent ln ln ln n grows very slowly with n, but still is not a constant.)

In August 2002, Agrawal, Kayal and Saxena came up with the first deterministic primality testing algorithm that runs in polynomial time unconditionally, that is, under no unproven assumptions. This algorithm, popularly abbreviated as the AKS algorithm, is based on the observation that n is prime if and only if (X + a)^n ≡ X^n + a (mod n) for every a coprime to n (Exercise 3.26). A naive application of this observation requires computing an exponential number of coefficients in the binomial expansion of (X + a)^n. The AKS algorithm gets around this difficulty by checking the new congruence

Equation 3.2


for some polynomial h(X) of small degree. Here, the notation (mod n, h(X)) means modulo the ideal of ℤ[X] generated by n and h(X). If deg h(X) is bounded by a polynomial in log n, then (X + a)^n (and also X^n + a) can be computed modulo n, h(X) in polynomial time. However, reduction modulo h(X) may allow a composite n to satisfy the new congruence. Agrawal et al. took h(X) := X^r – 1 for some prime r = O(ln^6 n) with r – 1 having a sufficiently large prime divisor. By a result in analytic number theory due to Fouvry, such a prime r always exists. Congruence (3.2) is verified for this h(X) and for a number of values of a that is polynomial in ln n. An elementary proof presented in Agrawal et al. [5] demonstrates that this suffices to conclude deterministically and unconditionally about the primality of n. The AKS algorithm in this form runs in time O~(ln^12 n).

Lenstra and Pomerance [175] have reduced the running time of the AKS algorithm to O~(ln^6 n). The AKS paper comes with another conjecture which, if true, yields an O~(ln^3 n) deterministic primality-proving algorithm.

Conjecture 3.1. AKS conjecture

Let n be an odd integer > 1, and let r be a prime with r ∤ n. If

(X – 1)^n ≡ X^n – 1 (mod n, X^r – 1),

then either n is prime or n2 ≡ 1 (mod r).

It remains an open question whether a future version of the AKS algorithm will supersede the Miller–Rabin test in terms of performance. As long as the answer is not favourable to the AKS algorithm, these new theoretical endeavours do not seem to have sufficient impact on cryptography. Primes certified by the Miller–Rabin test are at present secure enough for all applications. Nonetheless, the AKS breakthrough has solid theoretical implications and deserves mention in a prime context.

3.4.2. Generating Random Primes

If a random prime of a given bit length t is called for, we can keep on generating random odd integers of bit length t and check these integers for primality using the Miller–Rabin test. The prime number theorem (Theorem 2.20) ascertains that after O(t) iterations we expect to find a prime. A somewhat similar but reasonably faster algorithm is discussed in Exercise 4.14. We will henceforth call random primes of a given bit length with no additional imposed properties naive primes. Naive primes are often not cryptographically secure, because the primes used in many protocols should satisfy certain properties in order to preclude some known cryptanalytic attacks.

Definition 3.5.

Let p be an odd prime. Then p is called a safe prime, if (p – 1)/2 is also a prime, whereas p is called a strong prime, if

  1. p – 1 has a large prime divisor, say, q,

  2. p + 1 has a large prime divisor, say, q′, and

  3. q – 1 has a large prime divisor, say, q″.

In cryptography, a large prime divisor typically refers to one with bit length ≥ 160.

A random safe prime of a given bit length t can be found by generating a random sequence of natural numbers n congruent to 3 modulo 4 and of bit length t, until one is found for which both n and (n – 1)/2 are primes (as certified by the Miller–Rabin primality test). The prime number theorem once again implies that this search is expected to terminate after O(t2) iterations.
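The safe-prime search just described can be sketched in Python, with a compact Miller–Rabin helper standing in for Algorithm 3.13 (all names are ours):

```python
import random

def is_probable_prime(n, t=25):
    """Miller-Rabin used as a black box (cf. Algorithm 3.13)."""
    if n < 2:
        return False
    for q in (2, 3, 5, 7, 11, 13):     # trial division by a few small primes
        if n % q == 0:
            return n == q
    r, m = 0, n - 1
    while m % 2 == 0:                  # n - 1 = 2^r * m, m odd
        r, m = r + 1, m // 2
    for _ in range(t):
        b = pow(random.randrange(2, n - 1), m, n)
        if b in (1, n - 1):
            continue
        for _ in range(r - 1):
            b = (b * b) % n
            if b == n - 1:
                break
        else:
            return False               # no -1 encountered: n is composite
    return True

def random_safe_prime(t):
    """Draw t-bit n with n ≡ 3 (mod 4) until n and (n-1)/2 are both
    (probable) primes; expected O(t^2) draws by the prime number theorem."""
    while True:
        n = (1 << (t - 1)) | random.getrandbits(t - 1) | 3
        if is_probable_prime(n) and is_probable_prime((n - 1) // 2):
            return n
```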

For generating a random strong prime p of bit length t, we first generate q′ and q″ and then q and finally p. (See the notations of Definition 3.5.) Algorithm 3.14 describes Gordon’s algorithm in which the bit lengths l and l′ of q and q′ are nearly t/2 and the bit length l″ of q″ is slightly smaller than l′. In our concrete implementation of the algorithm, we choose l := ⌈t/2⌉ – 2, l′ := ⌊t/2⌋ – 20 and l″ := ⌈t/2⌉ – 22. If t is sufficiently large (say, t ≥ 400), the prime divisors q, q′ and q″ are then cryptographically large.

The simple check that Gordon’s algorithm correctly computes a strong prime of bit length t with q, q′ and q″ as in Definition 3.5 is based on Fermat’s little theorem and is left to the reader. Note that with our choice of l, l′ and l″, the loop variables i and j run through single-precision values only, thereby making arithmetic involving them efficient. Also note that the ranges over which i and j vary are sufficiently large so that we expect the (outer) while loop to be executed only once. This implementation has a tendency to generate smaller values of q and p (with the given bit sizes). In practice, this is not a serious problem and can be avoided, if desired, by choosing random values of i and j from the indicated ranges.

Algorithm 3.14. Gordon’s strong-prime generator

Input: t ∈ ℕ, t ≥ 400.

Output: A strong prime p of bit length t.

Steps:

l := ⌈t/2⌉ – 2, l′ := ⌊t/2⌋ – 20, l″ := ⌈t/2⌉ – 22.

while (1) {
    Find a (random) naive prime q′ of bit length l′.
    Find a (random) naive prime q″ of bit length l″.
    for (i = ⌈(2^(l–1) – 1)/2q″⌉, . . . , ⌊(2^l – 2)/2q″⌋) {                 /* Search for q */
       q := 2iq″ + 1.
       if (q is prime) {
          p′ := 2((q′)^(q–2) mod q)q′ – 1.
          for (j = ⌈(2^(t–1) – p′)/2qq′⌉, . . . , ⌊(2^t – 1 – p′)/2qq′⌋) {     /* Search for p */
             p := p′ + 2jqq′.
             if (p is prime) { Return p. }
          }
       }
    }
}

Gordon’s algorithm takes only nominally more expected running time than that needed by the algorithm discussed at the beginning of Section 3.4.2 for generating naive primes of the same bit length. On the other hand, safe primes are much costlier to generate and may be avoided, unless the situation specifically demands their usage.

3.4.3. Modular Square Roots

Determination of square roots modulo a prime p is frequently needed in cryptographic applications. In this section, we assume that p is an odd prime and want to compute the square roots of a, gcd(a, p) = 1, modulo p, provided that a is a quadratic residue modulo p, that is, if (a/p) = 1. Using the Jacobi symbol, the value (a/p) can be computed efficiently, as Algorithm 3.15 suggests.

The correctness of Algorithm 3.15 follows from the properties of the Jacobi symbol (Proposition 2.22 and Theorem 2.19). The value of (–1)^((b^2–1)/8) is determined by the value of b modulo 8, that is, by the three least significant bits of b: it equals +1 for b ≡ ±1 (mod 8) and –1 for b ≡ ±3 (mod 8).

Similarly, (–1)^((a–1)(b–1)/4) can be computed using only the second least significant bits of a and b: it equals –1 if and only if a ≡ b ≡ 3 (mod 4).

If (a/p) = 1, our next task is to compute x with x^2 ≡ a (mod p). If one such x is found, the other square root of a modulo p is –x ≡ p – x (mod p). If p ≡ 3 (mod 4) or p ≡ 5 (mod 8), we have explicit formulas for a square root x. The remaining case, namely p ≡ 1 (mod 8), is somewhat complicated. In this case, we use the probabilistic algorithm due to Tonelli and Shanks. The details are given in Algorithm 3.16. The explicit formulas for the first two cases are easy to verify. We now prove the correctness of the algorithm in the remaining case.

Algorithm 3.15. Computation of the Legendre symbol

Input: An odd prime p and an integer a, 1 ≤ a < p.

Output: The Legendre symbol .

Steps:

b := p, k := 1./* Initialize */

/* The Euclidean loop */
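The Euclidean loop of Algorithm 3.15 can be sketched in Python using exactly the two word-level observations quoted above: the factor (–1)^((b^2–1)/8) depends only on b mod 8, and the reciprocity sign (–1)^((a–1)(b–1)/4) is –1 precisely when both arguments are ≡ 3 (mod 4). The function name is ours:

```python
def jacobi(a, b):
    """Jacobi symbol (a/b) for odd b > 0; returns 0 when gcd(a, b) > 1."""
    a %= b
    k = 1
    while a != 0:
        while a % 2 == 0:              # pull out factors of 2
            a //= 2
            if b % 8 in (3, 5):        # (2/b) = -1 exactly when b ≡ ±3 (mod 8)
                k = -k
        a, b = b, a                    # quadratic reciprocity: swap ...
        if a % 4 == 3 and b % 4 == 3:  # ... with a sign change iff both ≡ 3 (mod 4)
            k = -k
        a %= b
    return k if b == 1 else 0
```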

Since ℤp* is cyclic and has order p – 1 = 2^v q, the 2-Sylow subgroup G of ℤp* has order 2^v and is also cyclic. Let g be a generator of G. By Euler’s criterion, a^q is a square in G and, therefore, a^q g^e = 1 (in G) for some even integer e, 0 ≤ e < 2^v, and x ≡ a^((q+1)/2) g^(e/2) (mod p) is a square root of a modulo p.

A generator g of G can be obtained by choosing random elements b from ℤp* and computing the Legendre symbol (b/p). It is easy to see that b^q ∈ G. Furthermore, b^q is a generator of G if and only if (b/p) = –1. Finding a quadratic non-residue in ℤp* is the probabilistic part of the algorithm. Since exactly half of the elements of ℤp* are quadratic non-residues, one expects to find one after a few random trials. In order to make the exponentiation b^q efficient, b should be chosen as a single-precision integer. The while loop of the algorithm computes the multiplier g^(e/2) in x using O(v) iterations by successively locating the 1 bits of e starting from the least significant end.

To sum up, square roots modulo a prime can be computed in probabilistic polynomial time. Computing square roots modulo a composite integer n is, on the other hand, a very difficult problem, unless the complete factorization of n is known (see Section 4.2 and Exercise 3.29).

Exercise Set 3.4

3.19 Let n ∈ ℕ be odd and composite and suppose that there exists (at least) one a ∈ ℤn* with a^(n–1) ≢ 1 (mod n). Show that b^(n–1) ≢ 1 (mod n) for at least half of the bases b ∈ ℤn*. [H]

Algorithm 3.16. Modular square root

Input: An odd prime p and an integer a, 1 ≤ a < p.

Output: A square root of a modulo p (if existent).

Steps:

if ((a/p) = –1) { Return “a does not have a square root modulo p”. }

if (p ≡ 3 (mod 4)) { Return a^((p+1)/4) (mod p). }

if (p ≡ 5 (mod 8))
   if (a^((p–1)/4) ≡ 1 (mod p)) { Return a^((p+3)/8) (mod p). }
   else { Return 2a(4a)^((p–5)/8) (mod p). }

/* The case p ≡ 1 (mod 8) */
v := v2(p – 1), q := (p – 1)/2^v.    /* q is odd */
Find a random quadratic non-residue b modulo p and set g := b^q (mod p).
x := a^((q+1)/2) (mod p).
Precompute a^(–1) (mod p).
while (1) {
   find the smallest i ≥ 0 for which (x^2 a^(–1))^(2^i) ≡ 1 (mod p).
   if (i = 0) { Return x. }
   x := x g^(2^(v–i–1)) (mod p).
}
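A Python rendering of Algorithm 3.16 may look as follows; the explicit formulas for p ≡ 3 (mod 4) and p ≡ 5 (mod 8) are the standard ones (our reconstruction), and the last branch is the Tonelli–Shanks loop:

```python
import random

def sqrt_mod(a, p):
    """Square root of a modulo an odd prime p (1 <= a < p).
    Returns x with x*x ≡ a (mod p), or None if a is a non-residue."""
    if pow(a, (p - 1) // 2, p) != 1:
        return None                        # Euler's criterion: no square root
    if p % 4 == 3:
        return pow(a, (p + 1) // 4, p)
    if p % 8 == 5:
        x = pow(a, (p + 3) // 8, p)
        if (x * x) % p != a:               # fix up with a square root of -1
            x = (x * pow(2, (p - 1) // 4, p)) % p
        return x
    # the case p ≡ 1 (mod 8): Tonelli-Shanks
    v, q = 0, p - 1
    while q % 2 == 0:
        v, q = v + 1, q // 2               # p - 1 = 2^v * q, q odd
    b = 2
    while pow(b, (p - 1) // 2, p) != p - 1:
        b = random.randrange(2, p)         # find a quadratic non-residue
    g = pow(b, q, p)                       # generator of the 2-Sylow subgroup
    x = pow(a, (q + 1) // 2, p)
    a_inv = pow(a, -1, p)
    while True:
        t, i = (x * x * a_inv) % p, 0
        while t != 1:                      # smallest i with (x^2/a)^(2^i) = 1
            t, i = (t * t) % p, i + 1
        if i == 0:
            return x
        x = (x * pow(g, 1 << (v - i - 1), p)) % p
```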

3.20 Let n ∈ ℕ be odd and composite.
  1. Show that there exists a ∈ ℤn* such that a^((n–1)/2) ≢ (a/n) (mod n). [H]

  2. Show that b^((n–1)/2) ≢ (b/n) (mod n) for at least half of the bases b ∈ ℤn*. [H]

3.21 Let n ∈ ℕ be a Carmichael number, that is, a composite integer for which a^(n–1) ≡ 1 (mod n) for all a coprime to n, that is, ordn(a) | (n – 1) for all a ∈ ℤn*. Prove that:
  1. (p – 1)|(n – 1) for every prime divisor p of n. [H]

  2. n is odd. [H]

  3. n is square-free. [H]

  4. n has at least three distinct prime divisors.

3.22
  1. Let n ∈ ℕ be a square-free composite integer, such that (p – 1)|(n – 1) for every prime divisor p of n. Show that n is a Carmichael number.

  2. Demonstrate that 561 = 3 × 11 × 17; 2,821 = 7 × 13 × 31; and 172,081 = 7 × 13 × 31 × 61 are Carmichael numbers.

  3. Assume that for some k ∈ ℕ the integers p1 := 6k + 1, p2 := 12k + 1 and p3 := 18k + 1 are prime. Prove that p1p2p3 is a Carmichael number.

  4. Deduce that 1,729 = 7 × 13 × 19 and 294,409 = 37 × 73 × 109 are Carmichael numbers.

3.23

Fermat’s test for prime numbers Let n ∈ ℕ and let n – 1 = p1^e1 · · · pr^er, with p1, . . . , pr distinct primes, be the prime factorization of n – 1. Suppose that there exist integers a1, . . . , ar such that for each i we have ai^(n–1) ≡ 1 (mod n) and ai^((n–1)/pi) ≢ 1 (mod n). Show that n is prime.

3.24

Pépin’s test for Fermat numbers Show that the Fermat number n := 2^(2^k) + 1 is prime if and only if 3^((n–1)/2) ≡ –1 (mod n).

3.25 Write an algorithm that, given natural numbers t, l with l < t, outputs a (probable) prime p of bit length t such that p – 1 has a (probable) prime divisor q of bit length l.
3.26 Let n ∈ ℕ.
  1. Show that the ring ℤn[X] is (canonically) isomorphic to the ring ℤ[X]/⟨n⟩. In view of this, we write f(X) ≡ g(X) (mod n) to mean either that the coefficients of f are congruent modulo n to the respective coefficients of g or that the polynomials f(X) and g(X) are congruent modulo the principal ideal of ℤ[X] generated by n.

  2. Prove that if n is a prime, then (X + a)^n ≡ X^n + a (mod n) for every a ∈ ℤ.

  3. Prove that for composite n there exists k, 1 < k < n, with the binomial coefficient C(n, k) ≢ 0 (mod n). Deduce that in this case (X + a)^n ≢ X^n + a (mod n) for some a ∈ ℤ.

  4. Let h(X) ∈ ℤ[X] and let h̄(X) be the canonical image of h(X) in ℤn[X]. Show that the ring ℤ[X]/⟨n, h(X)⟩ is isomorphic to the ring ℤn[X]/⟨h̄(X)⟩.

3.27 Modify Algorithm 3.15 to compute the (generalized) Jacobi symbol (a/b) for odd b ∈ ℕ and for arbitrary a ∈ ℤ.
3.28A Implement the Chinese remainder theorem for integers, that is, write an algorithm that takes as input pairwise relatively prime moduli n1, . . . , nr and integers a1, . . . , ar and that outputs a ∈ ℤ with a ≡ ai (mod ni) for all i = 1, . . . , r. [H]
3.29 Let f(X) be a non-constant polynomial in ℤ[X].
  1. Let the congruence f(x) ≡ 0 (mod p^e), p a prime and e ∈ ℕ, have a solution x ≡ a (mod p^e). Show that if an integer a′ := a + kp^e solves the congruence f(x) ≡ 0 (mod p^(e+1)), then k satisfies the congruence

    f′(a)k ≡ –f(a)/p^e (mod p).

    Here f(a)/p^e means integer division. Demonstrate that this congruence may have 0, 1 or p solutions (for k) depending on the values of f′(a) and f(a)/p^e. Each such k gives a solution a′ of f(x) ≡ 0 (mod p^(e+1)) with a′ ≡ a (mod p^e). We say that the solution a′ (modulo p^(e+1)) is obtained from the solution a (modulo p^e) by (Hensel) lifting.

  2. Lifting together with the Chinese remainder theorem allows us to reduce the problem of solving a polynomial congruence modulo an arbitrary modulus n to the problem of solving the same congruence modulo the prime divisors of n. More precisely, if the prime factorization of n and all the solutions of the congruences f(x) ≡ 0 (mod pi) for all i = 1, . . . , r are given, design an algorithm to compute all the solutions of the congruence f(x) ≡ 0 (mod n).

3.30 Let n ∈ ℕ be odd and let a ∈ ℤn* be such that the congruence x^2 ≡ a (mod n) is solvable. Deduce that this congruence has exactly 2^ω solutions modulo n, where ω is the number of distinct prime divisors of n.
3.31 Show that Algorithm 3.17 correctly computes ⌊√n⌋ for n ∈ ℕ. Specify a strategy to initialize a before the while loop. Determine how Algorithm 3.17 can be used to check if a given n ∈ ℕ is a perfect square. [H]
Algorithm 3.17. Integer square root

Input: n ∈ ℕ.

Output: ⌊√n⌋.

Steps:

Using bit operations, initialize a to an integral value ≥ ⌊√n⌋.
while (1) {    /* Newton’s iteration loop */
   b := ⌊(a + ⌊n/a⌋)/2⌋.
   if (a ≤ b) { Return a. }
   a := b.
}
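In Python the algorithm reads as follows; the initializer 2^⌈(bit length of n)/2⌉ is one convenient choice satisfying the requirement a ≥ ⌊√n⌋ (an assumption of this sketch):

```python
def isqrt(n):
    """Newton's iteration of Algorithm 3.17: floor(sqrt(n)) for n >= 1."""
    a = 1 << ((n.bit_length() + 1) // 2)   # 2^ceil(len/2) >= floor(sqrt(n))
    while True:
        b = (a + n // a) // 2              # one Newton step, rounded down
        if a <= b:
            return a
        a = b
```

n is then a perfect square exactly when the returned a satisfies a·a = n.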

3.32
  1. Design an algorithm that, given n, k ∈ ℕ, computes ⌊n^(1/k)⌋. [H]

  2. Design an algorithm to check if a given n ∈ ℕ is an integral power of another integer.

3.5. Arithmetic in Finite Fields

Many cryptographic protocols are based on the (apparent) intractability of the discrete logarithm problem (Section 4.2) in the multiplicative group of a finite field. The arithmetic of the finite fields 𝔽p, p prime, and 𝔽2^n, n ∈ ℕ, is easy to implement and runs efficiently. In view of this, these two kinds of finite fields are the most popular in cryptography and we concentrate our algorithmic study on these fields only.

A prime field 𝔽p is the quotient ring ℤ/pℤ = ℤp. In Section 3.3.4, we have already made a thorough study of the arithmetic of the rings ℤn, n ∈ ℕ. We recall that the elements of 𝔽p are represented as integers from the set {0, 1, . . . , p – 1} and the arithmetic in 𝔽p is the modulo p integer arithmetic. Since p is typically multiple-precision, the characteristic p of 𝔽p is odd. The fields of even characteristic that we will study are the non-prime fields 𝔽2^n.

Section 2.9.3 explains several representations of extension fields. The most common one is the polynomial-basis representation 𝔽2^n = 𝔽2[X]/⟨f(X)⟩ for an irreducible polynomial f(X) of degree n in 𝔽2[X]. In that case, an element of 𝔽2^n has the canonical representation as a polynomial a0 + a1X + · · · + an–1X^(n–1), ai ∈ 𝔽2, of degree < n. An arithmetic operation on two elements of 𝔽2^n is the same operation in 𝔽2[X] followed by reduction modulo the defining polynomial f(X). So we start with the implementation of the polynomial arithmetic over 𝔽2.

3.5.1. Arithmetic in the Ring 𝔽2[X]

A polynomial over 𝔽2 (or any field) is identified by its coefficients, of which only finitely many are non-zero. Thus for storing a polynomial g(X) = adX^d + ad–1X^(d–1) + · · · + a1X + a0 it is sufficient to store the finite ordered sequence ad ad–1 . . . a1a0. It is not necessary to demand ad ≠ 0, but the shortest sequence representing a non-zero polynomial corresponds to ad ≠ 0 and in this case deg g = d. On the other hand, as we see later, it is often useful to pad such a sequence with leading zero coefficients. As an example, the polynomial X^2 + 1 is representable as 101 or as 0101 or as 00101 and so on.

Since 𝔽2 can be viewed as the set {0, 1} with operations modulo 2, a polynomial in 𝔽2[X] is essentially a bit string, unique up to insertion (and deletion) of leading zero bits. As in the case of multiple-precision integers, we pack these coefficients in an array of 32-bit words and maintain the number of coefficients belonging to the polynomial. For example, the polynomial g(X) = X^64 + X^31 + X^7 + 1 can be stored in an array w2w1w0 of three 32-bit words. w0 consists of the coefficients of X^0, X^1, . . . , X^31, w1 consists of the coefficients of X^32, X^33, . . . , X^63, and w2 consists of the coefficient of X^64. It is up to the implementation scheme to decide whether the coefficients are to be stored from left to right or from right to left in the bits of a word. We assume that less significant coefficients go to the less significant bits of a word. For the polynomial g above, the word w0 viewed as an unsigned integer will then be w0 = 2^31 + 2^7 + 1, whereas we have w1 = 0. The least significant bit of w2 would be 1. The remaining 31 bits of w2 are not important and can be assigned any value as long as we maintain the information that only the coefficients of X^i, 0 ≤ i ≤ 64, need to be considered. On the other hand, if we want to store the coefficients of g up to that of X^80, then the bits of w2 at locations 1, . . . , 16 must be zero, whereas those at locations 17, . . . , 31 may be of any value. We, however, always recommend the use of leading zero bits to fill the portion of the leading word not belonging to the polynomial.

Such a representation of elements of 𝔽2[X], in addition to being compact, facilitates efficient implementation of arithmetic functions. As we will shortly see, we often need not extract the individual coefficients of a polynomial but can apply bit operations on entire words to process 32 coefficients simultaneously per operation. We usually do not need polynomials of degrees > 4096 for cryptographic applications. It is, therefore, sufficient to declare a static array capable of storing all the 8193 coefficients of a product of two such largest polynomials. The zero polynomial may be represented as one with zero word size, whereas the degree of the zero polynomial is taken to be –∞, which may be represented as –1.

We now describe the arithmetic functions on two non-zero polynomials

Equation 3.3

a(X) = arX^r + · · · + a1X + a0 and b(X) = bsX^s + · · · + b1X + b0, with ar ≠ 0 and bs ≠ 0.
Under our implementation, a and b demand ρ := ⌈(r + 1)/32⌉ and σ := ⌈(s + 1)/32⌉ machine words αρ – 1 . . . α1α0 and βσ – 1 . . . β1β0. We also assume paddings with leading zero bits in the areas not belonging to the operands.

Note that addition in F_2 is the same as the XOR (⊕) of two bits. Applying this bit operation on the words αi and βi adds 32 coefficients of the operand polynomials simultaneously (see Algorithm 3.18). Finally, note that –1 = 1 in any field of characteristic 2, that is, subtraction is the same as addition in such a field.

The product a(X)b(X) can be computed as in Algorithm 3.19. Once again, using wordwise operations yields faster implementation. By AND and OR, we denote the bit-wise and and or operations on 32-bit words. The easy verification of the correctness of this algorithm is left to the reader. As in the case of addition, one might want to make the polynomial c compact after its words γτ – 1, . . . , γ0 are computed.

Algorithm 3.18. Polynomial addition

Input: a(X) and b(X), as in Equation (3.3).

Output: c(X) = a(X) + b(X) (to be stored in the array γτ – 1 . . . γ1γ0).

Steps:

τ := max(ρ, σ).
for (i = 0, . . . , min(ρ, σ) – 1) γi := αi ⊕ βi.
if (ρ > σ) for (i = σ, . . . , ρ – 1) γi := αi,
else if (ρ < σ) for (i = ρ, . . . , σ – 1) γi := βi.
while (τ > 0) and (γτ – 1 = 0) τ – –.       /* Make c compact (optional) */
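As an illustration (ours, not the book's), Algorithm 3.18 can be sketched in Python, with a polynomial stored as a little-endian list of 32-bit words; the names poly_add, g and h are our own choices:

```python
def poly_add(a, b):
    """Algorithm 3.18: add two GF(2)[X] polynomials stored as
    little-endian lists of 32-bit words (bit i of word j holds the
    coefficient of X^(32*j + i))."""
    rho, sigma = len(a), len(b)
    tau = max(rho, sigma)
    c = [0] * tau
    for i in range(min(rho, sigma)):
        c[i] = a[i] ^ b[i]            # one XOR adds 32 coefficients
    for i in range(min(rho, sigma), tau):
        c[i] = a[i] if rho > sigma else b[i]
    while c and c[-1] == 0:           # make c compact (optional)
        c.pop()
    return c

# (X^64 + X^31 + X^7 + 1) + (X^31 + 1) = X^64 + X^7
g = [(1 << 31) | (1 << 7) | 1, 0, 1]
h = [(1 << 31) | 1]
assert poly_add(g, h) == [1 << 7, 0, 1]
```

Note that the zero polynomial comes out as the empty word list, consistent with representing it by zero word size.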

Algorithm 3.19. Polynomial multiplication

Input: a(X) and b(X), as in Equation (3.3).

Output: c(X) = a(X)b(X) (to be stored in the array γτ – 1 . . . γ1γ0).

Steps:

τ := ρ + σ.     /* An upper bound on the word size of the product; the loop below may touch γρ+σ–1, so we allocate ρ + σ words (the top word may turn out to be zero) */
for (i = 0, . . . , τ – 1) γi := 0.     /* Initialize the product */

/* The quadratic multiplication loop */
for (k = 0, . . . , 31) {    /* For each bit position in a word */
   for (j = 0, . . . , σ – 1) {     /* For each word of b */
      if (βj AND 2^k) {     /* if the k-th bit of the word βj is 1 */
         for (i = 0, . . . , ρ – 1) {    /* For each word of a */
            set γi+j := γi+j ⊕ (αi ≪ k) and, if k > 0, γi+j+1 := γi+j+1 ⊕ (αi ≫ (32 – k)).
         }
      }
   }
}
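The quadratic loop above can be sketched in Python as follows (an illustration of ours, with the same word-list representation as before; the product array is allocated with ρ + σ words so that the carry word always exists, and k = 0 is treated specially because a right shift by 32 is undefined for 32-bit words in C):

```python
M32 = 0xFFFFFFFF                              # mask to 32 bits

def poly_mul(a, b):
    """Algorithm 3.19: product in GF(2)[X], operands as little-endian
    lists of 32-bit words."""
    rho, sigma = len(a), len(b)
    c = [0] * (rho + sigma)
    for k in range(32):                       # each bit position in a word
        for j in range(sigma):                # each word of b
            if b[j] & (1 << k):               # k-th bit of b[j] set?
                for i in range(rho):          # add a shifted by 32*j + k
                    c[i + j] ^= (a[i] << k) & M32
                    if k:
                        c[i + j + 1] ^= a[i] >> (32 - k)
    while c and c[-1] == 0:                   # make c compact
        c.pop()
    return c

# (X + 1)^2 = X^2 + 1 over GF(2)
assert poly_mul([0b11], [0b11]) == [0b101]
# (X^31 + 1)(X + 1) = X^32 + X^31 + X + 1: the product spills into a second word
assert poly_mul([(1 << 31) | 1], [0b11]) == [(1 << 31) | 0b11, 1]
```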

The square of a(X) ∈ F_2[X] can be computed very easily using the fact that

a(X)^2 = (arX^r + · · · + a1X + a0)^2 = arX^2r + · · · + a1X^2 + a0.

This gives us a linear-time (in terms of r or ρ) algorithm instead of the quadratic general-purpose multiplication Algorithm 3.19. We leave the implementational details to the reader.
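One such implementation (ours, as an illustration) simply spreads the bits of a(X), the coefficient of X^i becoming the coefficient of X^2i; a production version would typically spread 8 bits at a time through a precomputed 256-entry table:

```python
def poly_sqr(a):
    """Square a GF(2)[X] polynomial held in a Python int
    (bit i = coefficient of X^i): bit i moves to position 2i."""
    r, i = 0, 0
    while a:
        if a & 1:
            r |= 1 << (2 * i)   # X^i -> X^(2i)
        a >>= 1
        i += 1
    return r

# (X^2 + X + 1)^2 = X^4 + X^2 + 1 over GF(2)
assert poly_sqr(0b111) == 0b10101
```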

Division with remainder in F_2[X] is implemented in Algorithm 3.20. As before, we continue to work with the operands a(X) and b(X) as in Equation (3.3). But now we make the further assumptions that bs = 1, so that βσ–1 ≠ 0, and that s ≤ r. When the Euclidean division loop of Algorithm 3.20 terminates, the array locations δσ–1, . . . , δ1, δ0 contain the remainder. The arrays γ and δ may be made compact to discard the leading zero bits, if any.

Algorithm 3.20. Euclidean division of polynomials

Input: a(X) and b(X), as in Equation (3.3), with bs = 1 and s ≤ r.

Output: c(X) = a(X) quot b(X) (to be stored in the array γτ–1 . . . γ1γ0) and d(X) = a(X) rem b(X) (to be stored in the array δρ–1 . . . δ1δ0).

Steps:

τ := ⌈(r – s + 1)/32⌉.    /* The size of the quotient */
for i = 0, . . . , τ – 1 { γi := 0 }    /* Initialize c(X) to 0 */

for i = 0, . . . , ρ – 1 { δi := αi }   /* Copy a(X) to d(X) */

/* Euclidean division loop */
for i = r, r – 1, . . . , s {    /* i goes down from r to s */
   if (the coefficient of X^i in d(X) is 1) {
       j := (i – s) quot 32, k := (i – s) rem 32.

       /* Set the coefficient of X^(i–s) of c(X) */
       γj := γj OR 2^k.

       /* Update d(X) := d(X) – X^(i–s)b(X) */
       for l = 0, . . . , σ – 1 {
          δl+j := δl+j ⊕ (βl ≪ k).
          if (k > 0) δl+j+1 := δl+j+1 ⊕ (βl ≫ (32 – k)).
       }
    }
}
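For brevity, our illustrative Python sketch of Euclidean division uses a single Python integer per polynomial (bit i = coefficient of X^i); Algorithm 3.20 performs exactly the same operations word by word:

```python
def poly_divmod(a, b):
    """Euclidean division in GF(2)[X]: return (quotient, remainder)."""
    assert b != 0
    r, s = a.bit_length() - 1, b.bit_length() - 1
    q, d = 0, a
    for i in range(r, s - 1, -1):        # i = r down to s
        if d & (1 << i):                 # coefficient of X^i in d is 1
            q |= 1 << (i - s)            # set coefficient of X^(i-s) in q
            d ^= b << (i - s)            # d := d - X^(i-s) * b
    return q, d

# X^3 + X + 1 = (X + 1)(X^2 + X) + 1: quotient X^2 + X, remainder 1
q, d = poly_divmod(0b1011, 0b11)
assert q == 0b110 and d == 1
```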

Computing modular inverses requires computation of extended gcds of polynomials in F_2[X]. We again start with non-zero polynomials a(X), b(X) ∈ F_2[X], and compute polynomials d(X), u(X) and v(X) in F_2[X] with d(X) = gcd(a(X), b(X)) = u(X)a(X) + v(X)b(X), deg u < deg b and deg v < deg a. For polynomials, we do not have an equivalent of the binary gcd algorithm (Algorithm 3.8). We use repeated Euclidean divisions instead.

The proof of the correctness of Algorithm 3.21 is similar to that for Algorithm 3.8. Here, we introduce the variables r_k, U_k and V_k for k = 0, 1, 2, . . . . The initialization goes as: r_0 := a, r_1 := b, U_0 := 1, U_1 := 0, V_0 := 0 and V_1 := 1. During the k-th iteration (k = 1, 2, . . .), we first use Euclidean division to get r_{k–1} = q_kr_k + r_{k+1}, which gives r_{k+1} = r_{k–1} – q_kr_k. We also compute U_{k+1} = U_{k–1} – q_kU_k and V_{k+1} = V_{k–1} – q_kV_k using the values available from the previous two iterations, so as to maintain the relation r_{k+1} = U_{k+1}r_0 + V_{k+1}r_1 for all k = 1, 2, . . . . In Algorithm 3.21, the k-th iteration of the while loop begins with x = r_{k–1}, y = r_k, u1 = U_k and u2 = U_{k–1}, and ends after updating the values to x = r_k, y = r_{k+1}, u1 = U_{k+1} and u2 = U_k. It is not necessary to maintain the values V_k in the main loop. After the loop terminates with x = r_k and y = 0, one takes u = U_k = u2 and computes V_k = (r_k – U_kr_0)/r_1.

Modular arithmetic in F_2[X] is very similar to modular arithmetic over the integers. If f(X) is a non-constant polynomial of F_2[X] (not necessarily irreducible) of degree n, we represent the elements of F_2[X]/〈f(X)〉 as polynomials in F_2[X] of degrees < n. Given two such polynomials a and b, we compute the sum a + b simply as the sum in F_2[X]. The product ab is computed by first computing the product ab in F_2[X] and then computing the remainder of Euclidean division of this product by f. The inverse of a modulo f exists if and only if gcd(a, f) = 1 (in F_2[X]). In that case, extended gcd computation gives us polynomials u, v such that 1 = ua + vf, so that ua ≡ 1 (mod f). If a ≠ 0, then Algorithm 3.21 computes u with deg u < deg f = n, so that we take this u to be the canonical representative of a^(–1). Finally, the modular exponentiation a^e (mod f) can be done using an algorithm very similar to Algorithm 3.9 or Algorithm 3.10. We leave the details to the reader.

Algorithm 3.21. Extended gcd of polynomials

Input: Non-zero polynomials a, b ∈ F_2[X].

Output: Polynomials d, u, v ∈ F_2[X] satisfying

d = gcd(a, b) = ua + vb, deg u < deg b, deg v < deg a.

Steps:

/* Initialize */
x := a, y := b, u1 := 0, u2 := 1.    /* u1 = U_1, u2 = U_0 */

/* Repeated Euclidean division */
while (y ≠ 0) {
   Simultaneously compute q := x quot y and r := x rem y (Algorithm 3.20).
   u := u2 – qu1, u2 := u1, u1 := u,
   x := y, y := r.
}
d := x, u := u2, v := (d – ua)/b.
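Algorithm 3.21 translates to Python as follows (an illustrative sketch of ours, with Python ints as GF(2) polynomials; over F_2, subtraction is XOR, so u2 – q·u1 becomes u2 ⊕ q·u1):

```python
def mul(a, b):
    """Schoolbook product in GF(2)[X] (ints as polynomials)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def divmod2(a, b):
    """Euclidean division in GF(2)[X]: return (quotient, remainder)."""
    q, s = 0, b.bit_length() - 1
    while a and a.bit_length() - 1 >= s:
        sh = a.bit_length() - 1 - s
        q |= 1 << sh
        a ^= b << sh
    return q, a

def xgcd(a, b):
    """Return (d, u, v) with d = gcd(a, b) = u*a + v*b in GF(2)[X]."""
    x, y, u1, u2 = a, b, 0, 1            # u1 = U_1, u2 = U_0
    while y:
        q, r = divmod2(x, y)
        u1, u2 = u2 ^ mul(q, u1), u1     # U_{k+1} = U_{k-1} - q_k U_k
        x, y = y, r
    d, u = x, u2
    v, rem = divmod2(d ^ mul(u, a), b)   # v = (d - u*a)/b
    assert rem == 0
    return d, u, v

# X^4 + X + 1 is irreducible, so its gcd with X^2 + X is 1:
d, u, v = xgcd(0b10011, 0b110)
assert (d, u, v) == (1, 1, 0b111)        # 1 = 1*(X^4+X+1) + (X^2+X+1)(X^2+X)
assert mul(u, 0b10011) ^ mul(v, 0b110) == 1
```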

3.5.2. Finite Fields of Characteristic 2

For the polynomial-basis representation of F_2^n, we need an irreducible polynomial of F_2[X] of degree n. We shortly present a probabilistic algorithm that generates a random monic irreducible polynomial in F_q[X] of a given degree n. Although we are interested only in the case q = 2, this algorithm works for any prime or prime power q.

First, we describe a deterministic polynomial-time algorithm for checking the irreducibility of a non-constant polynomial f ∈ F_q[X] of degree n. If f is reducible, it has an irreducible factor of some degree i ≤ ⌊n/2⌋. Also recall (Theorem 2.40, p 82) that X^(q^i) – X is the product of all monic irreducible polynomials of F_q[X] of degrees dividing i. Therefore, if f has an irreducible factor of degree i, then gcd(f, X^(q^i) – X) = gcd(f, (X^(q^i) rem f) – X) is a non-constant polynomial. Algorithm 3.22 employs these simple observations.

Now, recall from Section 2.9.2 that a random monic polynomial of F_q[X] of degree n is irreducible with probability approximately 1/n. Therefore, if we keep on checking random monic polynomials in F_q[X] of degree n for irreducibility, then after O(n) checks we expect to find an irreducible polynomial. This leads to the Las Vegas probabilistic Algorithm 3.23.

Algorithm 3.22. Check for irreducibility of a polynomial

Input: A non-constant polynomial f ∈ F_q[X].

Output: A (deterministic) certificate whether f is irreducible or not.

Steps:

n := deg f, g := X.
for i = 1, . . . , ⌊n/2⌋ {
   g := g^q (mod f).   /* Here g = X^(q^i) rem f */
   if (deg(gcd(f, g – X)) > 0) { Return “f is reducible”. }
}
Return “f is irreducible”.

Algorithm 3.23. Generation of a random irreducible polynomial

Input: A degree n ≥ 2.

Output: A random monic irreducible polynomial f ∈ F_q[X] of degree n.

Steps:

while (1) {
   f := a random monic polynomial in F_q[X] of degree n.
   if (f is irreducible) { Return f. }   /* Algorithm 3.22 */
}
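Algorithms 3.22 and 3.23 can be sketched for q = 2 as follows (our illustration, with Python ints as GF(2) polynomials; for q = 2 the step g := g^q (mod f) is a single modular squaring):

```python
import random

def mul(a, b):                       # schoolbook product in GF(2)[X]
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def mod(a, f):                       # remainder of division by f
    s = f.bit_length() - 1
    while a and a.bit_length() - 1 >= s:
        a ^= f << (a.bit_length() - 1 - s)
    return a

def gcd2(a, b):                      # Euclidean gcd in GF(2)[X]
    while b:
        a, b = b, mod(a, b)
    return a

def is_irreducible(f):
    """Algorithm 3.22 for q = 2 (f non-constant)."""
    n = f.bit_length() - 1
    g = 0b10                         # g = X
    for _ in range(n // 2):
        g = mod(mul(g, g), f)        # g := g^2 (mod f); now g = X^(2^i) rem f
        if gcd2(f, g ^ 0b10).bit_length() - 1 > 0:
            return False             # non-constant gcd(f, g - X)
    return True

def random_irreducible(n, rng=random):
    """Algorithm 3.23: random monic irreducible of degree n over GF(2)."""
    while True:
        f = (1 << n) | rng.getrandbits(n)   # monic, random lower coefficients
        if is_irreducible(f):
            return f

assert is_irreducible(0b1011) and is_irreducible(0b1101)   # X^3+X+1, X^3+X^2+1
assert not is_irreducible(0b10101)   # X^4 + X^2 + 1 = (X^2 + X + 1)^2
f = random_irreducible(8, random.Random(1))
assert f.bit_length() == 9 and is_irreducible(f)
```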

Once the defining irreducible polynomial f is available, we carry out the arithmetic in F_2^n as modular polynomial arithmetic with respect to the modulus f. This is described at the end of Section 3.5.1. Since this modular arithmetic involves taking the remainder of Euclidean division by f, it is sometimes expedient to choose f to be an irreducible polynomial of certain special types. The randomized algorithm described above gives a random monic irreducible polynomial f of degree n having, on average, ≈ n/2 non-zero coefficients. The division algorithm (Algorithm 3.20) in that case takes time O(n^2). On the other hand, if f is a sparse polynomial (like a trinomial), the Euclidean division loop can be rewritten to exploit this sparsity, thereby bringing down the running time of the division procedure to O(n). (See Exercise 3.34. Also see Exercise 3.38 for computing isomorphisms between different polynomial-basis representations of the same field.)

Let p be a prime and let q = p^n, n ∈ N. We have seen how to implement the arithmetic in F_p and hence, by Exercise 3.35, that in F_p[X] too. If f ∈ F_p[X] is an irreducible polynomial of degree n and if q = p^n, then F_q = F_p[X]/〈f(X)〉, and we implement the arithmetic of F_q as the polynomial arithmetic of F_p[X] modulo f. Again by Exercise 3.35, this gives us the arithmetic of F_q[Y]. Now, for m ∈ N and a monic irreducible polynomial g(Y) ∈ F_q[Y] of degree m, we have a representation F_(q^m) = F_q[Y]/〈g(Y)〉. Instead of having such a two-way representation of F_(q^m), we may also represent F_(q^m) as F_p[X]/〈h(X)〉, where h(X) ∈ F_p[X] is a monic irreducible polynomial of degree nm. It usually turns out that the second representation of F_(q^m) is more efficient. However, there are some situations where the two-way representation performs better. This is, in particular, the case when the arithmetic of F_q can be made more efficient than the modular polynomial arithmetic of F_p[X]. For example, we might precompute tables of the arithmetic operations of F_q and use table lookups for performing the coefficient arithmetic of F_q[Y]. This demands O(q^2) storage and is feasible only when q is small. On the other hand, if we find a primitive element γ of F_q^* and precompute a table that maps i ↦ γ^i and another that maps γ^i ↦ i, then products in F_q^* can be computed in time O(1) using table lookups. If, in addition, we store the Zech logarithm table (Section 2.9.3) for F_q, then addition in F_q can also be performed in O(1) time with table lookups. These three tables together take O(q) memory, which (though better than the O(q^2) storage of the previous scheme) is feasible only for small q.
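The log/antilog scheme can be illustrated on a small field (our example, not the book's): GF(16) represented as GF(2)[X]/(X^4 + X + 1), where X happens to be a primitive element, so we tabulate i ↦ X^i and its inverse map; a product then costs one addition modulo 15 and two lookups:

```python
F, N = 0b10011, 15                 # modulus X^4 + X + 1, group order 2^4 - 1

exp = [0] * N                      # exp[i] = X^i as a 4-bit polynomial
log = {}                           # log[g] = i with X^i = g
g = 1
for i in range(N):
    exp[i], log[g] = g, i
    g <<= 1                        # multiply by X ...
    if g & 0b10000:
        g ^= F                     # ... and reduce modulo X^4 + X + 1

def field_mul(a, b):
    """Multiply two non-zero-or-zero elements of GF(16) via table lookups."""
    if a == 0 or b == 0:
        return 0
    return exp[(log[a] + log[b]) % N]

# (X + 1)(X^2 + X) = X^3 + X
assert field_mul(0b011, 0b110) == 0b1010
assert len(log) == N               # X really generates the whole group
```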

3.5.3. Selecting Suitable Finite Fields

Not all finite fields are suitable for cryptographic applications. In this section, we discuss the desirable properties of a field F_q so that secure protocols over F_q can be developed. We first note that such protocols are usually based on the apparent intractability of the so-called discrete logarithm problem (DLP) (Section 4.2). As a result, the selection of suitable fields is dictated by the known cryptanalytic algorithms to solve the DLP (see Section 4.4). We shall mostly concentrate on F_q with either q = p a prime or q = 2^n for some n ∈ N. By the bit size of q, denoted |q|, we mean the number of bits in the binary representation of q, that is, |q| = ⌈lg q⌉. As we have seen, each element of F_q is representable using O(|q|) bits and, therefore, |q| is often also called the size of F_q.

The first requirement on a cryptographically suitable field is that the size |q| should be sufficiently large. Recent cryptanalytic studies show that sizes |q| ≤ 512 are not secure enough. Sizes |q| ≥ 768 are recommended for secure applications. For long-term security, one might even require |q| ≥ 2048.

Not every field of the recommended size is, however, adequately secure. The cardinality #F_q = q must be such that q – 1 has at least one large prime divisor q′ (see the Pohlig–Hellman method in Section 4.4). By large, we usually mean |q′| ≥ 160. In addition, this prime factor q′ of q – 1 should be known to us. If q = p is a prime, then a safe prime or a strong prime serves our purpose (Definition 3.5, Algorithm 3.14). Also see Exercise 3.25. On the other hand, if q = 2^n, the only way to obtain q′ is by factoring the Mersenne number M_n := q – 1 = 2^n – 1. Factoring M_n for n ≥ 768 is a very difficult task. Luckily, extensive tables of complete or partial factorizations of M_n are available. For example, for n = 769 (a prime number), we have

M_769 = 2^769 – 1 = 1,591,805,393 × 6,123,566,623,856,435,977,170,641 × q′,

where q′ is a 657-bit prime. These tables should be consulted for choosing a suitable value of n.

The multiplicative group F_q^* is cyclic (Theorem 2.38). If the complete integer factorization of q – 1 is known, then it is possible to find, in polynomial time (in |q|), a primitive element of F_q^*. Algorithm 3.24 performs r = O(lg m) exponentiations in G in order to conclude whether a given element is a generator of G. For G = F_q^*, we have polynomial-time exponentiation algorithms, so Algorithm 3.24 runs in deterministic polynomial time. By Exercise 2.47, the probability of a randomly chosen element of G being primitive is φ(m)/m. In view of the lower bound on φ(m)/m given in Theorem 3.1, proved by Rosser and Schoenfeld [253], Algorithm 3.25 is expected to return a random primitive element of G after O(ln ln m) iterations.

Theorem 3.1.

Let m ∈ N, m ≥ 5. Then φ(m)/m ≥ 1/(6 ln ln m).

Algorithm 3.24. Check for primitive element

Input: A cyclic group G of cardinality #G = m with known factorization m = p1^e1 · · · pr^er and an element a ∈ G.

Output: A deterministic certificate of whether or not a is a generator of G.

Steps:

/* We assume that G is multiplicatively written and has the identity e */
for i = 1, . . . , r {
   if (a^(m/pi) = e) { Return “a is not a generator of G”. }
}
Return “a is a generator of G”.

Algorithm 3.25. Computation of a generator of a finite cyclic group

Input: A cyclic group G of cardinality #G = m with known factorization m = p1^e1 · · · pr^er.

Output: A generator g of G.

Steps:

while (1) {
    g := a random element of G.
    if (g is a generator of G) { Return g. }   /* Algorithm 3.24 */
}
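Algorithms 3.24 and 3.25, specialized to the cyclic group G = Z_p^*, look as follows in Python (our illustration; the small prime p = 31, with p – 1 = 30 = 2 · 3 · 5, is a toy choice):

```python
import random

def is_generator(a, p, prime_factors):
    """Algorithm 3.24 for G = Z_p^*: a generates the group iff
    a^((p-1)/q) != 1 for every prime q dividing p - 1."""
    return all(pow(a, (p - 1) // q, p) != 1 for q in prime_factors)

def find_generator(p, prime_factors, rng):
    """Algorithm 3.25: keep sampling until a generator is found."""
    while True:
        g = rng.randrange(2, p)
        if is_generator(g, p, prime_factors):
            return g

p, factors = 31, [2, 3, 5]
assert is_generator(3, p, factors)        # 3 is a primitive root mod 31
assert not is_generator(2, p, factors)    # 2 has order 5 mod 31 (2^5 = 32)
g = find_generator(p, factors, random.Random(7))
assert is_generator(g, p, factors)
```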

If, however, the factorization of #G = m is not known, there are no known (deterministic or probabilistic) efficient algorithms for finding a random generator of G or even for checking whether a given element of G is primitive. This is indeed one of the intractable problems of computational algebraic number theory. This problem for G = F_q^* can be bypassed as follows.

Recall that we have chosen q in such a way that q – 1 has a large known prime factor q′. Let H be the unique subgroup of G = F_q^* of order q′. Then H is also cyclic, and we choose to work in H (using the arithmetic of G). It turns out that if q′ ≥ 2^160 and if H is not contained in a proper subfield of F_q, the security of cryptographic protocols over F_q does not degrade too much by the use of H (instead of the full G) as the ground group. But we now face a new problem, that is, the problem of finding a generator of H. Since #H = q′ is a prime, every element of H \ {1} is a generator of H. So the problem essentially reduces to that of finding any non-identity element of H. This latter problem has a simple probabilistic solution. First of all, if q – 1 = q′ is itself prime, choosing any random non-identity element of G will do. So assume q′ < q – 1. Choose a random a ∈ F_q^* and let b := a^((q – 1)/q′). By Lagrange’s theorem (Theorem 2.2, p 24), b^q′ = a^(q–1) = 1 and, therefore, by Proposition 2.5, b ∈ H. Now, F_q being a field, the polynomial X^((q–1)/q′) – 1 can have at most (q – 1)/q′ roots in F_q (that is, in F_q^*), and hence the probability that b = 1 is ≤ ((q – 1)/q′)/(q – 1) = 1/q′. This justifies the randomized polynomial running time of the Las Vegas Algorithm 3.26. Indeed, if q′ ≥ 2^160, the while loop of the algorithm is almost always executed only once.

Algorithm 3.26. Computation of an element of given order

Input: A finite field F_q and an (odd) prime factor q′ of q – 1 with q′ < q – 1.

Output: An element b ∈ F_q^* of multiplicative order q′.

Steps:

while (1) {
   a := a random element of F_q \ {0, ±1}.
   b := a^((q – 1)/q′).
   if (b ≠ 1) { Return b. }
}
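Algorithm 3.26 for a small prime field looks as follows (our toy illustration: p = 23, so p – 1 = 22 = 2 · 11, and we seek an element of order q′ = 11):

```python
import random

def element_of_order(p, qp, rng):
    """Algorithm 3.26 for F_p: return an element of prime order qp,
    where qp is a prime factor of p - 1 with qp < p - 1."""
    while True:
        a = rng.randrange(2, p - 1)          # avoid 0, 1 and -1
        b = pow(a, (p - 1) // qp, p)
        if b != 1:
            return b

p, qp = 23, 11
b = element_of_order(p, qp, random.Random(5))
# qp is prime, so b != 1 with b^qp = 1 means ord(b) is exactly qp
assert b != 1 and pow(b, qp, p) == 1
```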

3.5.4. Factoring Polynomials over Finite Fields

Polynomial factorization over finite fields is an interesting computational problem. All deterministic algorithms known for this purpose are quite poor, that is, fully exponential in the size of the field. However, if randomization is allowed, we have reasonably efficient (polynomial-time) algorithms. In this section, we outline the basic working of the modern probabilistic algorithms for polynomial factorization over finite fields. We assume that a non-constant polynomial f ∈ F_q[X] is to be factored. Without loss of generality, we can take f to be monic. We assume further that the arithmetic of F_q and that of F_q[X] is available. We work with a general value of q = p^n, p prime and n ∈ N, though in some cases we have to treat the case p = 2 separately. Irreducibility (or otherwise) in this section always means irreducibility over F_q.

The factorization algorithm we are going to discuss is a generalization of the root finding algorithm (see Exercise 3.36) and consists of three steps:

Square-free factorization (SFF) Decompose the input polynomial f into a product of square-free polynomials.

Distinct-degree factorization (DDF) Given a square-free polynomial f of degree d, compute f = f1 . . . fd with each fi being a product of irreducible polynomials of degree i.

Equal-degree factorization (EDF) Given a product f of irreducible polynomials of the same degree, find out the irreducible factors of f.

We now provide a separate detailed discussion for each of these three steps.

Square-free factorization

Theorem 3.2 is at the very heart of the square-free factorization algorithm and is a generalization of Exercise 2.61.

Theorem 3.2.

Let K be a field and f ∈ K[X] a non-constant monic polynomial. Then the polynomial f / gcd(f, f′) is square-free, where f′ is the formal derivative of f. In particular, f is square-free if and only if gcd(f, f′) = 1.

Proof

Let f = f1^α1 · · · fr^αr be the factorization of f with pairwise distinct monic irreducible polynomials f1, . . . , fr ∈ K[X] and with multiplicities αi ≥ 1. In order to determine vf1(f′), we employ the usual rules for derivatives to get f′ = α1f1^(α1–1)f1′(f2^α2 · · · fr^αr) + f1^α1 h for some h ∈ K[X]. If α1 = 0 in K (that is, if char K divides α1), then vf1(f′) ≥ α1. Otherwise, vf1(f′) = α1 – 1, since f1 divides neither f1′ (note that deg f1′ < deg f1) nor fi, i > 1. Similar is the case for vfi(f′) for i = 2, . . . , r. It follows that gcd(f, f′) = f1^β1 · · · fr^βr, where each βi ∈ {αi – 1, αi}, so that f / gcd(f, f′) = f1^γ1 · · · fr^γr, γi ∈ {0, 1}, is square-free.

The algorithm for SFF over F_q is now almost immediate, except for one subtlety, namely, the consideration of the case f / gcd(f, f′) = 1, or equivalently, f′ = 0. In order to see when this case can occur, let us write the non-zero terms of f as f = a1X^e1 + · · · + atX^et with distinct exponents e1, . . . , et and ai ∈ F_q^*. Then f′ = a1e1X^(e1–1) + · · · + atetX^(et–1) = 0 if and only if ei ≡ 0 (mod p) for all i, that is, if p divides all of e1, . . . , et. But then f(X) = h(X)^p, where h(X) = b1X^(e1/p) + · · · + btX^(et/p), since every ai ∈ F_q is a p-th power, ai = bi^p, for all i. These observations motivate the recursive Algorithm 3.27. It is easy to check that this (deterministic) algorithm runs in time polynomially bounded by deg f and log q.

Algorithm 3.27. Square-free factorization

Input: A monic non-constant polynomial f ∈ F_q[X], q = p^n, p prime, n ∈ N.

Output: A square-free factorization of f.

Steps:

Compute f′.
if (f′ = 0) {
    Compute h ∈ F_q[X] such that f = h^p.
    Recursively compute a SFF h = h1 · · · hs of h.
    Return the SFF of f as f = (h1 · · · hs)(h1 · · · hs) · · · (h1 · · · hs)  (p times).
} else {
    Recursively compute a SFF gcd(f, f′) = g1 · · · gs of gcd(f, f′).
    Return the SFF of f as f = (f / gcd(f, f′))g1 · · · gs.
}
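For q = 2, the recursion is particularly simple (our illustrative sketch, ints as GF(2) polynomials): the formal derivative keeps exactly the odd-degree terms, f′ = 0 means f = h^2, and the square root h is read off the even-position bits.

```python
def mod2(a, b):
    """Remainder of Euclidean division in GF(2)[X]."""
    s = b.bit_length() - 1
    while a and a.bit_length() - 1 >= s:
        a ^= b << (a.bit_length() - 1 - s)
    return a

def gcd2(a, b):
    while b:
        a, b = b, mod2(a, b)
    return a

def quo2(a, b):
    """Quotient a / b (exact division assumed where we use it)."""
    q, s = 0, b.bit_length() - 1
    while a and a.bit_length() - 1 >= s:
        sh = a.bit_length() - 1 - s
        q |= 1 << sh
        a ^= b << sh
    return q

def deriv(f):
    """Formal derivative over GF(2): only odd-degree terms survive."""
    d, i = 0, 1
    while f >> i:
        if (f >> i) & 1:
            d |= 1 << (i - 1)
        i += 2
    return d

def sqrt2(f):
    """For f with f' = 0 (only even-degree terms), return h with h^2 = f."""
    h, i = 0, 0
    while f >> (2 * i):
        if (f >> (2 * i)) & 1:
            h |= 1 << i
        i += 1
    return h

def sff(f):
    """Algorithm 3.27 for q = 2: square-free factors of a non-constant f,
    listed with multiplicity."""
    d = deriv(f)
    if d == 0:                   # f' = 0 means f = h^2 over GF(2)
        return 2 * sff(sqrt2(f))
    g = gcd2(f, d)
    if g == 1:
        return [f]               # f itself is already square-free
    return [quo2(f, g)] + sff(g)

# (X + 1)^2 (X^2 + X + 1) = X^4 + X^3 + X + 1
assert sorted(sff(0b11011)) == [0b11, 0b11, 0b111]
```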

Distinct-degree factorization

Let f ∈ F_q[X] be a square-free polynomial of degree d. We can write f = f1 · · · fd, where for each i the polynomial fi is the product of all the irreducible factors of f of degree i. If f does not have an irreducible factor of degree i, then we take fi = 1 as usual.[5] In order to compute the polynomials fi, we make use of the fact that X^(q^i) – X is the product of all monic irreducible polynomials in F_q[X] whose degrees divide i (see Theorem 2.40 on p 82). It immediately follows that fi = gcd(f/(f1 · · · fi–1), X^(q^i) – X). Thus a few (at most d) gcd computations give us all the fi. The polynomials X^(q^i) – X are, however, of rather large degrees. But since gcd(g, X^(q^i) – X) = gcd(g, (X^(q^i) rem f) – X) for any factor g of f, keeping polynomials reduced modulo f implies that we take gcds of polynomials of degrees ≤ d. This, in turn, implies that the DDF can be performed in (deterministic) polynomial time (in d and ln q).

[5] Conventionally, an empty product is taken to be the multiplicative identity and an empty sum to be the additive identity.

Algorithm 3.28 shows an implementation of the DDF. Though the algorithm does not require f to be monic, there is no harm in assuming so.

Algorithm 3.28. Distinct-degree factorization

Input: A (non-constant) square-free polynomial f ∈ F_q[X] of degree d.

Output: The DDF of f, that is, the polynomials f1, . . . , fd as explained above.

Steps:

g := f.   /* Make a local copy of f */
h := X, i := 1.
while (deg g ≠ 0) {
   h := h^q (mod f).   /* Modular exponentiation: h = X^(q^i) rem f */
   fi := gcd(h – X, g).
   g := g/fi.    /* Factor out fi from g */
   i++.
}
if (i ≤ d) { fi := 1, . . . , fd := 1. }
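For q = 2, Algorithm 3.28 can be sketched as follows (our illustration, ints as GF(2) polynomials; we return only the non-trivial fi, the omitted ones being 1):

```python
def mulg(a, b):                      # schoolbook product in GF(2)[X]
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def modg(a, f):                      # remainder modulo f
    s = f.bit_length() - 1
    while a and a.bit_length() - 1 >= s:
        a ^= f << (a.bit_length() - 1 - s)
    return a

def gcdg(a, b):
    while b:
        a, b = b, modg(a, b)
    return a

def quog(a, b):                      # exact quotient a / b
    q, s = 0, b.bit_length() - 1
    while a and a.bit_length() - 1 >= s:
        sh = a.bit_length() - 1 - s
        q |= 1 << sh
        a ^= b << sh
    return q

def ddf(f):
    """Algorithm 3.28 for q = 2: f square-free; return {i: f_i} for the
    non-trivial degree-i parts f_i only."""
    out, g, h, i = {}, f, 0b10, 1
    while g.bit_length() - 1 > 0:
        h = modg(mulg(h, h), f)      # h := h^2 (mod f) = X^(2^i) rem f
        fi = gcdg(h ^ 0b10, g)       # gcd(h - X, g)
        if fi != 1:
            out[i] = fi
            g = quog(g, fi)
        i += 1
    return out

# X^4 + X = X (X + 1)(X^2 + X + 1): degree-1 part X^2 + X, degree-2 part X^2 + X + 1
assert ddf(0b10010) == {1: 0b110, 2: 0b111}
```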

This simple-minded implementation of the DDF is theoretically not the most efficient one known. In fact, it turns out that the DDF (and not the seemingly more complicated EDF) is the bottleneck of the entire polynomial factorization process. Therefore, making the DDF more efficient is important, and many improvements have been suggested in the literature. All these improved algorithms essentially do the same thing as above (that is, the computation of gcd(X^(q^i) – X, g)), but they optimize the computation of the polynomials X^(q^i) rem f. The best-known method (due to Kaltofen and Shoup) is based on the observation that, in general, most of the fi are 1. Therefore, instead of computing each fi individually, one may break the interval 1, . . . , d into several subintervals I1, I2, . . . , Il and compute the products Fj of the fi for i ∈ Ij, j = 1, . . . , l. Only those Fj that turn out to be non-constant are further decomposed.

For cryptographic purposes, we will, however, deal with rather small values of d = deg f. (Typically, d is at most a few thousand.) The asymptotically better algorithms usually do not outperform the simple Algorithm 3.28 for these values of d.

Equal-degree factorization

Equal-degree factorization, the last step of the polynomial factorization process, is the only probabilistic part of the algorithm. We may assume that f is a (monic) square-free polynomial of degree d and that each irreducible factor of f has the same (known) degree, say δ. If d = δ, then f is irreducible. So we assume that d > δ, that is, d = rδ for some r ≥ 2. Theorem 3.3 provides the basic foundation for the EDF.

Theorem 3.3.

Let g be any polynomial in F_q[X] and let δ ∈ N. Then X^(q^δ) – X divides g^(q^δ) – g.

Proof

If g = 0, there is nothing to prove. If g = alX^l + · · · + a1X + a0 ≠ 0 with ai ∈ F_q, then g^(q^δ) – g = al(X^(lq^δ) – X^l) + · · · + a1(X^(q^δ) – X). It is easy to verify that X^(q^δ) – X divides X^(iq^δ) – X^i for every i ∈ N.

Now, we have to separate two cases, namely, q odd and q even. Theorem 3.3 is valid for any q, even or odd, but taking q odd allows us to write g^(q^δ) – g = g(g^((q^δ–1)/2) – 1)(g^((q^δ–1)/2) + 1). With the above assumptions on f, we have f | (X^(q^δ) – X) and, therefore, f | (g^(q^δ) – g), so that f = gcd(g^(q^δ) – g, f) = gcd(g, f) gcd(g^((q^δ–1)/2) – 1, f) gcd(g^((q^δ–1)/2) + 1, f). If g is randomly chosen, then gcd(g^((q^δ–1)/2) – 1, f) is, with probability ≈ 1/2, a non-trivial factor of f. The idea is, therefore, to keep on choosing random g and computing f1 := gcd(g^((q^δ–1)/2) – 1, f) until one gets 1 ≤ deg f1 < deg f. One then recursively applies the algorithm to f1 and f/f1. It is sufficient to choose g with deg g < 2δ. Obviously, the exponentiation g^(q^δ) has to be carried out modulo f. We leave the details to the reader, but note that trying O(1) random polynomials g is expected to split f and, therefore, the EDF runs in expected polynomial time.

For the case q = 2^n, essentially the same algorithm works, but we have to use the split g^(q^δ) + g = g^(2^nδ) + g = (g^(2^(nδ–1)) + g^(2^(nδ–2)) + · · · + g^2 + g)(g^(2^(nδ–1)) + g^(2^(nδ–2)) + · · · + g^2 + g + 1). Once again, computing gcd(g^(2^(nδ–1)) + g^(2^(nδ–2)) + · · · + g^2 + g, f) for a random g ∈ F_q[X] splits f with probability ≈ 1/2 and, thus, we get an EDF algorithm that runs in expected polynomial time.
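The trace-based splitting for q = 2 can be sketched as follows (our illustration, ints as GF(2) polynomials; the toy input is the product of the two irreducible cubics X^3 + X + 1 and X^3 + X^2 + 1, and the random generator is seeded only to make the run reproducible):

```python
import random

def mule(a, b):                      # schoolbook product in GF(2)[X]
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def mode(a, f):                      # remainder modulo f
    s = f.bit_length() - 1
    while a and a.bit_length() - 1 >= s:
        a ^= f << (a.bit_length() - 1 - s)
    return a

def gcde(a, b):
    while b:
        a, b = b, mode(a, b)
    return a

def quoe(a, b):                      # exact quotient a / b
    q, s = 0, b.bit_length() - 1
    while a and a.bit_length() - 1 >= s:
        sh = a.bit_length() - 1 - s
        q |= 1 << sh
        a ^= b << sh
    return q

def edf(f, delta, rng):
    """EDF for q = 2: f square-free with all irreducible factors of
    degree delta; return the list of irreducible factors."""
    d = f.bit_length() - 1
    if d == delta:
        return [f]
    while True:
        g = rng.getrandbits(2 * delta)        # random g, deg g < 2*delta
        t, gi = 0, mode(g, f)
        for _ in range(delta):                # t = g + g^2 + ... + g^(2^(delta-1)) mod f
            t ^= gi
            gi = mode(mule(gi, gi), f)
        f1 = gcde(t, f)
        if 0 < f1.bit_length() - 1 < d:       # non-trivial split found
            return edf(f1, delta, rng) + edf(quoe(f, f1), delta, rng)

fs = edf(0b1111111, 3, random.Random(2))      # X^6 + X^5 + ... + X + 1
assert sorted(fs) == [0b1011, 0b1101]         # X^3+X+1 and X^3+X^2+1
```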

Exercise Set 3.5

3.33Find a (polynomial-basis) representation of . Compute a primitive element in this representation.
3.34
  1. Show that the running time of Algorithm 3.20 is O(s(r – s)), which reaches the maximum order of O(r^2) = O(s^2) when s ≈ r/2.

  2. Suppose b is known to have e non-zero coefficients. Modify the Euclidean division loop of Algorithm 3.20 so that the algorithm runs in time O((r – s)e). [H] In particular, if e = O(1), the running time of Algorithm 3.20 becomes linear, namely O(r).

3.35Implement the polynomial arithmetic of F_q[X], given the arithmetic of F_q.
3.36Let q = p^n (p prime and n ∈ N), f ∈ F_q[X] a non-constant polynomial, and let g := gcd(f, X^q – X).
  1. If S is the set of all roots of f in F_q, show that g is the product of the linear factors X – β, β ∈ S. Thus, g is a square-free polynomial which splits over F_q and has the same roots (over F_q) as f. If deg g = 0 or 1, then we know all the roots of g and hence of f. So, for the rest of this exercise, we assume that deg g ≥ 2.

  2. Consider the case that p is odd. Let b ∈ F_q be arbitrary. Show that

    (X + b)((X + b)^((q–1)/2) – 1)((X + b)^((q–1)/2) + 1) = X^q – X

    and that

    g = gcd(g, X + b) gcd(g, (X + b)^((q–1)/2) – 1) gcd(g, (X + b)^((q–1)/2) + 1).

    Explain how Algorithm 3.29 produces two non-trivial factors of g (over F_q) in probabilistic polynomial time. [H] Write an algorithm to compute all the roots of f in F_q.

    Algorithm 3.29. Computing roots of a polynomial: odd characteristic

    Input: A square-free polynomial g ∈ F_q[X] with deg g ≥ 2 that splits over F_q.

    Output: Polynomials g1, g2 ∈ F_q[X] with g = g1g2 and deg gi ≥ 1 for i = 1, 2.

    Steps:

    if (g(0) = 0) { (g1, g2) := (X, g(X)/X), return. }
    while (1) {
      Select a random element b ∈ F_q.
      h := (X + b)^((q–1)/2) – 1 (mod g).
      g1 := gcd(g, h).
      if (1 ≤ deg g1 < deg g) { g2 := g/g1, return. }
    }

  3. Now, assume that p = 2 (so that q = 2^n) and define the polynomial

    H(X) := X + X^2 + X^4 + · · · + X^(2^(n–1)).

    Let b ∈ F_q be arbitrary. Show that

    H(X + b)(H(X + b) + 1) = X^q – X

    [H] and that

    g(X) = gcd(g(X), H(X + b)) gcd(g(X), H(X + b) + 1).

Explain how Algorithm 3.30 produces two non-trivial factors of g (over F_q) in probabilistic polynomial time. Write an algorithm to compute all the roots of f in F_q.

Algorithm 3.30. Computing roots of a polynomial: characteristic 2

Input: A square-free polynomial g ∈ F_q[X] with deg g ≥ 2 that splits over F_q, q = 2^n.

Output: Polynomials g1, g2 ∈ F_q[X] with g = g1g2 and deg gi ≥ 1 for i = 1, 2.

Steps:

if (g(0) = 0) { (g1, g2) := (X, g(X)/X), return. }
while (1) {
   Select a random element b ∈ F_q.
   h := (X + b) + (X + b)^2 + (X + b)^4 + · · · + (X + b)^(2^(n–1)) (mod g).
   g1 := gcd(g, h).
   if (1 ≤ deg g1 < deg g) { g2 := g/g1, return. }
}

3.37Use Exercise 3.36 to compute all the roots of the following polynomials:
  1. X^6 + 6X^4 + 4X^2 + 6 in .

  2. X^3 + (α^2 + α)X^2 + (α^2 + α + 1) in F_8[X], where F_8 is represented as F_2(α), α being a root of the polynomial X^3 + X + 1.

3.38Let f and g be two monic irreducible polynomials over F_q, both of the same degree n ∈ N. Consider the two representations F_(q^n) = F_q[X]/〈f(X)〉 and F_(q^n) = F_q[Y]/〈g(Y)〉. In this exercise, we study how we can compute an isomorphism between these two representations. The polynomial f(Y) splits into linear factors over F_q[Y]/〈g(Y)〉. Consider a root α = α(Y) of f(Y) in F_q[Y]/〈g(Y)〉. Show that 1, α, α^2, . . . , α^(n–1) is an F_q-basis of (the F_q-vector space) F_q[Y]/〈g(Y)〉. For i = 0, . . . , n – 1, write (uniquely) α^i = αi0 + αi1Y + · · · + αi,n–1Y^(n–1) with αij ∈ F_q, and consider the matrix A = (αij)0≤i≤n–1, 0≤j≤n–1. Show that the map that maps (the equivalence class of) a0 + a1X + · · · + an–1X^(n–1) to (the equivalence class of) b0 + b1Y + · · · + bn–1Y^(n–1), where (b0 b1 . . . bn–1) = (a0 a1 . . . an–1)A, is an F_q-isomorphism.
3.39Let q = p^n for a prime p and n ∈ N. We have seen that the elements of F_p can be represented as integers between 0 and p – 1, whereas the elements of F_q can be represented as polynomials modulo some irreducible polynomial of degree n, that is, as polynomials of F_p[X] of degrees < n. Show that the substitution X = p in the polynomial representation of elements of F_q gives a representation of the elements of F_q as integers between 0 and q – 1. We call this latter representation of elements of F_q the packed representation. Compare the advantages and disadvantages of the packed representation over the polynomial representation.
3.40Let G be a cyclic multiplicatively written group of order m (and with the identity element e). Assume that the factorization m = p1^e1 · · · pr^er of m is known. Devise an algorithm that computes the order of an arbitrary element in G. [H]
3.41

Berlekamp’s Q-matrix factorization Let f(X) ∈ F_q[X] be a monic square-free polynomial of degree d that admits a factorization f(X) = f1(X) · · · fr(X) with each fi(X) ∈ F_q[X] monic, non-constant and irreducible. (Note that the fi are pairwise distinct, since f is square-free.) Let di be the degree of fi.

  1. Consider the ring

    A := F_q[X]/〈f(X)〉.

    Show that A ≅ F_q[X]/〈f1(X)〉 × · · · × F_q[X]/〈fr(X)〉 ≅ F_(q^d1) × · · · × F_(q^dr). [H] A is an F_q-vector space of dimension d.

  2. Write x := X + 〈f(X)〉 and consider the map β : A → A that maps a to a^q – a. Show that β is an F_q-linear transformation with Ker β ≅ F_q × · · · × F_q (r copies), and so the nullity of β equals the number r of irreducible factors of f.

  3. Let Q be the matrix of β with respect to the basis 1, x, . . . , x^(d–1) of A. Describe an algorithm to compute Q. Also design an algorithm to compute a basis of Ker β.

  4. Show that if h(X) + 〈f(X)〉 ∈ Ker β, then

    f(X) is the product of the polynomials gcd(f(X), h(X) – c) over all c ∈ F_q.

    For a suitable h(X), this is a non-trivial factorization of f. This procedure is efficient when q is small.

  5. Use Berlekamp’s method to factor X6 + X5 + X2 + 1 over .

*3.6. Arithmetic on Elliptic Curves

The recent popularity of cryptographic systems based on elliptic curve groups over finite fields stems from two considerations. First, discrete logarithms in the multiplicative group F_q^* can be computed in subexponential time. This demands q to be sufficiently large, typically of length 768 bits or more. On the other hand, if the elliptic curve E over F_q is carefully chosen, the only known algorithms for solving the discrete logarithm problem in E(F_q) are fully exponential in lg q. As a result, smaller values of q suffice to achieve the desired level of security. In practice, the length of q is required to be between 160 and 400 bits. This leads to smaller key sizes for elliptic curve cryptosystems. The second advantage of using elliptic curves is that for a given prime power q, there is only one group F_q^*, whereas there are many elliptic curve groups E(F_q) (over the same field F_q) with orders ranging over the interval from q + 1 – 2√q to q + 1 + 2√q. If a particular group E(F_q) is compromised, we can switch to another curve without changing the base field F_q.

In this section, we start with the description of efficient implementation of the arithmetic in the groups E(F_q). Then we concentrate on some algorithms for computing the order #E(F_q). Knowledge of this order is necessary to find out cryptographically suitable elliptic curves. We consider only prime fields F_p or fields of characteristic 2. So we assume that the curve is defined by Equation (2.8) or Equation (2.9) on p 100 (supersingular curves are not used in cryptography) instead of by the general Weierstrass Equation (2.6) on p 98.

3.6.1. Point Arithmetic

Let us first see how we can efficiently represent points on an elliptic curve E over F_q. Since a finite point P = (h, k) on E corresponds to two elements h, k ∈ F_q, and since each element of F_q can be represented using ≤ s = ⌈lg q⌉ bits, 2s bits suffice to represent P. We can do better than this. Substituting X = h in the equation for E leaves us with a quadratic equation in Y. This equation has two roots, of which k is one. If we adopt a convention (for example, see Section 6.2.1) that identifies, using a single bit, which of the two roots the coordinate k is, the storage requirement for P drops to s + 1 bits. During an on-line computation this compressed representation incurs some overhead and may be avoided. However, for off-line storage and transmission (of public keys, for example), this compression may be helpful.

Explicit formulas for the sum of two points and for the opposite of a point on an elliptic curve E are given in Section 2.11.2. These operations in E(F_q) can be implemented using a few operations in the ground field F_q.

Computation of mP for m ∈ N and P ∈ E(F_q) (or, more generally, for m ∈ Z) can be performed using a repeated double-and-add algorithm similar to the repeated square-and-multiply Algorithm 3.9. We leave out the trivial modifications and urge the reader to carry out the details.
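By way of illustration (ours, not the book's), here is a Python sketch of repeated double-and-add in affine coordinates, using the chord-and-tangent formulas of Section 2.11.2; the curve Y^2 = X^3 + 2X + 3 over F_97 and the point P = (3, 6), which happens to have order 5, are toy choices, and the point at infinity is represented as None:

```python
def ec_add(P, Q, a, p):
    """Add two points on Y^2 = X^3 + aX + b over F_p (odd p)."""
    if P is None:
        return Q
    if Q is None:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None                                   # P + (-P) = O
    if P == Q:
        lam = (3 * x1 * x1 + a) * pow(2 * y1, -1, p) % p   # tangent
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, p) % p          # chord
    x3 = (lam * lam - x1 - x2) % p
    return (x3, (lam * (x1 - x3) - y1) % p)

def ec_mul(m, P, a, p):
    """Repeated double-and-add, the elliptic-curve analogue of
    the square-and-multiply Algorithm 3.9."""
    R = None
    while m:
        if m & 1:
            R = ec_add(R, P, a, p)
        P = ec_add(P, P, a, p)
        m >>= 1
    return R

p, a, b, P = 97, 2, 3, (3, 6)
assert (P[1] ** 2 - (P[0] ** 3 + a * P[0] + b)) % p == 0   # P lies on the curve
assert ec_mul(5, P, a, p) is None                          # P has order 5 here
assert ec_mul(6, P, a, p) == P
```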

Finding a random point P ∈ E(F_q) is another useful problem. If q = p is an odd prime and we use the short Weierstrass Equation (2.8), we first choose a random h ∈ F_p and substitute X by h to get Y^2 = h^3 + ah + b. This equation has 2, 0 or 1 solution(s), depending on whether h^3 + ah + b is a quadratic residue, a non-residue, or 0 modulo p. Quadratic residuosity can be checked by computing the Legendre symbol (Algorithm 3.15), whereas square roots modulo p can be computed using Tonelli and Shanks’ Algorithm 3.16.
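A sketch of this procedure (our illustration) for the special case p ≡ 3 (mod 4), where a square root of a quadratic residue c is simply c^((p+1)/4) mod p and no full Tonelli–Shanks computation is needed; Euler's criterion plays the role of the Legendre-symbol test, and the curve parameters below are toy choices:

```python
import random

def random_point(a, b, p, rng):
    """Random point on Y^2 = X^3 + aX + b over F_p, assuming p = 3 (mod 4)."""
    assert p % 4 == 3
    while True:
        h = rng.randrange(p)
        c = (h * h * h + a * h + b) % p
        if c == 0:
            return (h, 0)
        if pow(c, (p - 1) // 2, p) == 1:       # quadratic residue? (Euler)
            k = pow(c, (p + 1) // 4, p)        # a square root of c
            return (h, k)                      # the other solution is (h, p - k)

p, a, b = 103, 1, 18                           # toy parameters, p = 3 (mod 4)
x, y = random_point(a, b, p, random.Random(11))
assert (y * y - (x ** 3 + a * x + b)) % p == 0
```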

For a non-supersingular curve E over F_{2^n} defined by Equation (2.9), a random point is chosen by first choosing a random h ∈ F_{2^n}. Substituting X = h in the defining equation gives Y^2 + hY + (h^3 + ah^2 + b) = 0. If h = 0, then the unique solution for k is b^{2^{n−1}}. If h ≠ 0, replacing Y by hY and dividing by h^2 transforms the equation to the form Y^2 + Y + α = 0 for some α ∈ F_{2^n}. This equation has two or zero solutions depending on whether the absolute trace Tr(α) is 0 or 1. If k is a solution, the other solution is k + 1. In order to find a solution (if it exists), one may use the (probabilistic) root-finding algorithm of Exercise 3.36. Another possibility is discussed now.

We consider two separate cases. First, if n is odd, then k = Σ_{i=0}^{(n−1)/2} α^{2^{2i}} is a solution, since k^2 + k + α = Tr(α) = 0. On the other hand, if n is even, we first find a β ∈ F_{2^n} with Tr(β) = 1. Since Tr is a surjective homomorphism of the additive groups F_{2^n} → F_2, exactly half of the elements of F_{2^n} have trace 1. Therefore, a desired β can be quickly found by selecting elements of F_{2^n} at random and computing their traces. Now, it is easy to check that k = Σ_{i=0}^{n−2} (β^{2^{i+1}} + β^{2^{i+2}} + · · · + β^{2^{n−1}}) α^{2^i} gives a solution of Y^2 + Y + α = 0.
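For odd n, the solution above (the so-called half-trace) costs only about n squarings. A self-contained sketch in F_{2^5}, with elements stored as 5-bit integers; the reduction polynomial X^5 + X^2 + 1 is an arbitrary irreducible choice made for this illustration:

```python
N, MOD = 5, 0b100101      # F_{2^5} = F_2[X]/(X^5 + X^2 + 1); MOD is illustrative

def gf_mul(a, b):
    # carry-less multiplication with reduction modulo the defining polynomial
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if (a >> N) & 1:
            a ^= MOD
    return r

def trace(a):
    # Tr(a) = a + a^2 + a^4 + ... + a^(2^(N-1)); always 0 or 1
    t, x = 0, a
    for _ in range(N):
        t ^= x
        x = gf_mul(x, x)
    return t

def half_trace(a):
    # k = sum_{i=0}^{(N-1)/2} a^(2^(2i)) solves Y^2 + Y = a when N is odd
    k, x = 0, a
    for _ in range((N - 1) // 2 + 1):
        k ^= x
        x = gf_mul(x, x)
        x = gf_mul(x, x)              # x := x^4
    return k

for alpha in range(1, 2 ** N):
    if trace(alpha) == 0:             # a solution exists iff Tr(alpha) = 0
        k = half_trace(alpha)
        assert gf_mul(k, k) ^ k == alpha    # k^2 + k = alpha
```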

3.6.2. Counting Points on Elliptic Curves

Counting points on elliptic curves is a challenging problem, both theoretically and computationally. The first polynomial-time (in log q) algorithm, invented by Schoof and later made efficient by Elkies and Atkin (and many others), is popularly called the SEA algorithm. Unfortunately, even the most efficient implementations of this algorithm remain rather slow, but it is the only known reasonable strategy, in particular, when q = p is a large (odd) prime of a size of cryptographic interest. The more recent Satoh–FGH algorithm, named after its discoverer Satoh and after Fouquet, Gaudry and Harley who proposed its generalized and efficient versions, is a remarkable breakthrough for the case q = 2^n. Both the SEA and the Satoh–FGH algorithms are mathematically quite sophisticated. We now present a brief overview of these algorithms.

The SEA algorithm

We assume that q = p is a large odd prime, this being the typical situation when we apply the SEA algorithm. We also assume that E is given by the short Weierstrass equation Y^2 = X^3 + aX + b. Let q1 = 2, q2 = 3, q3 = 5, . . . be the sequence of prime numbers and t the Frobenius trace of E at p. By Hasse’s theorem (Theorem 2.48, p 106), #E(F_p) = p + 1 − t with |t| ≤ 2√p. A knowledge of t modulo sufficiently many small primes l allows us to reconstruct t using the Chinese remainder theorem. Because of the Hasse bound on t, it is sufficient to choose l from the primes q1, q2, . . . in succession, until the product q1q2 · · · qr exceeds 4√p. By the prime number theorem (Theorem 2.20, p 53), we have r = O(ln p) and also qi = O(ln p) for each i = 1, . . . , r.

The most innovative idea of Algorithm 3.31 is the determination of the integers ti. For l = q1 = 2, the process is easy. We have t1 ≡ t ≡ 0 (mod 2) if and only if E(F_p) contains a point of order 2 (a point of the form (h, 0)), or equivalently, if and only if the polynomial X^3 + aX + b has a root in F_p. We compute the polynomial gcd g(X) := gcd(X^3 + aX + b, X^p − X) over F_p and conclude that t1 = 0 if and only if deg g ≥ 1.

Algorithm 3.31. SEA algorithm for elliptic curve point counting

Input: A prime field F_p, p odd, and an elliptic curve E defined over F_p.

Output: The order of the group E(F_p).

Steps:

Find (the smallest) r such that the product q1q2 · · · qr > 4√p.
for i = 1, 2, . . . , r { Compute ti ∈ {0, 1, . . . , qi − 1} with t ≡ ti (mod qi). }
Compute t by combining t1, t2, . . . , tr using the Chinese Remainder Theorem.
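Two ingredients of Algorithm 3.31 are easy to see in code: the case l = 2 (the gcd computation described above) and the final CRT combination. The following sketch checks them on a toy curve Y^2 = X^3 + X + 1 over F_23, which has t = −4; the residues of t modulo 3, 5 and 7 are supplied by hand, since computing them needs the full torsion machinery:

```python
def polrem(f, g, p):
    # remainder of f modulo g; polynomials are coefficient lists, low degree first
    f, inv = f[:], pow(g[-1], p - 2, p)
    while len(f) >= len(g):
        c, d = f[-1] * inv % p, len(f) - len(g)
        for i, gi in enumerate(g):
            f[i + d] = (f[i + d] - c * gi) % p
        while len(f) > 1 and f[-1] == 0:
            f.pop()
        if len(f) == 1 and f[0] == 0:
            break
    return f

def polgcd(f, g, p):
    while g != [0]:
        f, g = g, polrem(f, g, p)
    return f

def polmulmod(f, g, m, p):
    r = [0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            r[i + j] = (r[i + j] + fi * gj) % p
    while len(r) > 1 and r[-1] == 0:
        r.pop()
    return polrem(r, m, p)

def t_mod_2(a, b, p):
    # t ≡ 0 (mod 2) iff gcd(X^3 + aX + b, X^p - X) is non-trivial
    m = [b % p, a % p, 0, 1]
    xp, base, e = [1], [0, 1], p
    while e:                            # X^p mod m by square-and-multiply
        if e & 1:
            xp = polmulmod(xp, base, m, p)
        base = polmulmod(base, base, m, p)
        e >>= 1
    h = xp + [0] * (2 - len(xp))
    h[1] = (h[1] - 1) % p               # subtract X
    while len(h) > 1 and h[-1] == 0:
        h.pop()
    return 0 if len(polgcd(m, h, p)) > 1 else 1

def crt(residues, moduli):
    x, M = 0, 1
    for r, m in zip(residues, moduli):
        x += M * ((r - x) * pow(M, -1, m) % m)
        M *= m
    return x, M

p, a, b = 23, 1, 1                      # toy curve; #E = 28, so t = 23 + 1 - 28 = -4
t2 = t_mod_2(a, b, p)
x, M = crt([t2, 2, 1, 3], [2, 3, 5, 7]) # residues mod 3, 5, 7 given by hand
t = x if x <= 9 else x - M              # Hasse: |t| <= 2*sqrt(23) < 10
assert t2 == 0 and t == -4
```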

Determination of ti for i > 1 involves more work. We explain here the original idea due to Schoof. We denote by l the i-th prime qi and by E[l] the set of all l-torsion points of E (Definition 2.78, p 105). The Frobenius endomorphism Φ that fixes O and maps (h, k) to (h^p, k^p) satisfies the relation Φ^2 − tΦ + p = 0. If we restrict our attention only to the group E[l], then this relation reduces to Φ^2 − tiΦ + pi = 0, where ti = t rem l and pi = p rem l, that is, Φ^2(P) − tiΦ(P) + piP = O for all P ∈ E[l].

In terms of polynomials, the last relation is equivalent to

Equation 3.4

(X^{p^2}, Y^{p^2}) − ti(X^p, Y^p) + pi(X, Y) = O,
where the sum and difference follow the formulas for the elliptic curve E. Now, one has to calculate symbolically rather than numerically, since X and Y are indeterminates. These computations can be carried out in the ring F_p[X, Y]/(f, fl) (instead of in F_p[X, Y]), where f(X, Y) = Y^2 − (X^3 + aX + b) is the defining polynomial of E and fl = fl(X) is the l-th division polynomial of E (Section 2.11.2 and Theorem 2.47, p 106). Reduction of a polynomial modulo f makes its Y-degree ≤ 1, whereas reduction modulo fl makes the X-degree less than deg fl, which is O(l^2). We can try the values ti = 0, 1, . . . , l − 1 successively until the desired value satisfying Equation (3.4) is found.

It is not difficult to verify that Schoof’s algorithm runs in time O(log^8 p) (under standard arithmetic in F_p) and is thus a deterministic polynomial-time algorithm for the point-counting problem. Essentially the same algorithm works for fields F_q with q = 2^n and has the same running time. Unfortunately, the big exponent (8) in the running time makes Schoof’s algorithm quite impractical. Numerous improvements have been suggested to bring down this exponent. Elkies and Atkin’s modification for the case q = p gives rise to the SEA algorithm, which has a running time of O(log^6 p) under standard arithmetic in F_p. This speed-up is achieved by working in the ring F_p[X, Y]/(f, gl), where gl is a suitable factor of fl and has degree O(l). Couveignes suggests improvements for the fields of characteristic 2. Efficient implementations of the SEA algorithm are reported by Morain, Müller, Dewaghe, Vercauteren and many others. At the time of writing this book, the largest values of q for which the algorithm has been successfully applied are 10^499 + 153 (a prime) and 2^1999 (a power of 2).

The Satoh–FGH algorithm

The Satoh–FGH algorithm is well suited for fields of small characteristic p and, in particular, for the fields of characteristic 2. This algorithm has enabled point counting over considerably larger fields than the SEA algorithm can handle. A generic description of the Satoh–FGH algorithm now follows after the introduction of some mathematical notions. Though our practical interest concentrates on the fields F_{2^n} only, we consider curves over a general F_q with q = p^n, p a prime.

Recall from Section 2.14 that the ring Z_p of p-adic integers is a discrete valuation ring (Exercises 2.133 and 2.148) with the unique maximal ideal generated by p, and the residue field Z_p/pZ_p is isomorphic to F_p.

We represent F_q as a polynomial algebra over F_p, say F_q = F_p[X]/(f̄), where f̄ is an irreducible polynomial of degree n in F_p[X]. We analogously define the p-adic ring Z_q := Z_p[X]/(f), where f is an irreducible polynomial of degree n in Z_p[X] reducing to f̄ modulo p. The elements of Z_q can be viewed as polynomials of degrees < n and with p-adic integers as coefficients. The arithmetic operations in Z_q are polynomial operations in Z_p[X] modulo the defining polynomial f. The ring Z_p is canonically embedded in the ring Z_q (consider constant polynomials).

Z_q turns out to be a discrete valuation ring with maximal ideal pZ_q, and the residue field Z_q/pZ_q is isomorphic to F_q.

Definition 3.6.

The projection map π : Z_p → F_p is defined as the map that takes a p-adic integer α = (a1, a2, . . .) to a1, and can be canonically extended to polynomials by π(α0 + α1X + · · · + αdX^d) := π(α0) + π(α1)X + · · · + π(αd)X^d. In particular, this defines a projection map π : Z_q → F_q.

The (Teichmüller) lift ω : F_q → Z_q is the map that takes 0 ↦ 0 and a ↦ ω(a) for 0 ≠ a, where ω(a) is the unique (q − 1)-th root of unity in Z_q satisfying π(ω(a)) = a (cf. Exercise 2.160).

The semi-Witt decomposition of α ∈ Z_q is defined to be the unique sequence a0, a1, . . . with ai ∈ F_q such that α has the p-adic expansion α = Σ_{i≥0} ω(ai)p^i.

The p-th power Frobenius endomorphism σ̄ : F_q → F_q, a ↦ a^p, can now be extended to an endomorphism σ : Z_q → Z_q as follows. Let α ∈ Z_q have the semi-Witt decomposition a0, a1, . . . with ai ∈ F_q. Then, σ(α) is the unique element of Z_q having the semi-Witt decomposition a0^p, a1^p, . . . . One can show that σ is a ring endomorphism of Z_q. We have π ∘ σ = σ̄ ∘ π and similarly ω ∘ σ̄ = σ ∘ ω.

Now, let E = E0 be an elliptic curve defined over F_q. Application of σ̄ to the coefficients of E0 gives another elliptic curve E1 over F_q whose rational points are (σ̄(h), σ̄(k)) = (h^p, k^p), where (h, k) ∈ E0(F_q), together with the point at infinity. We may apply σ̄ to E1 to get another curve E2 over F_q and so on. Since σ̄^n is the identity on F_q, we get a cycle of elliptic curves defined over F_q:

Equation 3.5

E0 → E1 → E2 → · · · → En−1 → E0,

where each arrow denotes application of σ̄ to the coefficients.
Similarly, if ε = ε0 is an elliptic curve defined over Z_q, application of σ leads to a cycle of elliptic curves defined over Z_q:

Equation 3.6

ε0 → ε1 → ε2 → · · · → εn−1 → ε0.
We need the canonical lifting of an elliptic curve E over F_q to a curve ε over Z_q. Explaining that requires some more mathematical concepts:

Definition 3.7.

Let K be a field and let E and E′ be two elliptic curves defined over K. A morphism ψ : E → E′ (Definition 2.72, p 95) that maps the point at infinity O of E to the point at infinity O′ of E′ is called an isogeny. The zero isogeny E → E′ maps every point of E to O′. A non-zero isogeny is also called a non-constant isogeny. Two curves E and E′ are called isogenous, if there exists a non-constant isogeny E → E′.

The kernel ker ψ of an isogeny ψ is defined to be the set {P ∈ E | ψ(P) = O′}. For every non-constant isogeny ψ, the kernel ker ψ is a finite subgroup of E.

The set Hom(E, E′) of all isogenies E → E′ is an Abelian group under the operation (ψ1 + ψ2)(P) := ψ1(P) + ψ2(P). If E = E′, then End(E) := Hom(E, E) becomes a ring with multiplication defined by composition and is called the endomorphism ring of E.

The multiplication-by-m map of E is an isogeny. If End(E) contains an isogeny not of this type, we call E an elliptic curve with complex multiplication.

Theorem 3.4.

For each prime i, there exists a unique polynomial Φi(X, Y) ∈ Z[X, Y], symmetric and of degree i + 1 in each of X and Y, such that two curves E and E′ (defined over a field K) with j-invariants j and j′ satisfy Φi(j, j′) = 0 if and only if there is an isogeny E → E′ whose kernel is cyclic of order i.

Definition 3.8.

The polynomials Φi(X, Y) of Theorem 3.4 are called modular polynomials. As an example,

Φ2(X, Y) = X^3 + Y^3 − X^2Y^2 + 1488(X^2Y + XY^2) −
  162,000(X^2 + Y^2) + 40,773,375XY + 8,748,000,000(X + Y) −
  157,464,000,000,000.

The next theorem establishes the foundation for lifting curves from F_q to Z_q.

Theorem 3.5. Lubin–Serre–Tate

Let E be an elliptic curve defined over F_q, q = p^n, p a prime, and with j-invariant j(E) ∉ F_{p^2}. There exists an elliptic curve ε defined over Z_q with a unique j-invariant J ∈ Z_q such that π(J) = j(E) and End(ε) ≅ End(E). The curve ε is called the canonical lift of E and is unique up to isomorphism.

With this definition of lifting of elliptic curves, Cycles (3.5) and (3.6) satisfy a commutative diagram, where εi is the canonical lift of Ei for each i = 0, 1, . . . , n.

Algorithm 3.32 outlines the Satoh–FGH algorithm. In order to complete the description of the algorithm, one should specify how to lift curves (that is, a procedural equivalent of Theorem 3.5) and their p-torsion points and how the lifted data can be used to compute the Frobenius trace t. We leave out the details here.

Algorithm 3.32. Satoh–FGH algorithm for elliptic curve point counting

Input: An elliptic curve E over F_q, q = p^n, p prime, with j-invariant j(E) ∉ F_{p^2}.

Output: The cardinality #E(F_q) or, equivalently, the trace t of Frobenius.

Steps:

Compute the curves E0, . . . , En−1 and their j-invariants j0, . . . , jn−1.
Compute the lifted j-invariants J0, . . . , Jn−1.
Compute the lifted curves ε0, . . . , εn−1.
Lift the p-torsion groups Ei[p] for i = 0, . . . , n − 1.
Compute t and hence #E(F_q) from the lifted data.

The elements of Z_p (and hence of Z_q) are infinite sequences and hence cannot be represented in computer memory. However, we make an approximate representation by considering only the first m terms of the sequences representing elements of Z_p. Working in Z_q with this approximate representation is then essentially the same as working in Z_q/p^mZ_q. For the Satoh–FGH algorithm, we need m ≈ n/2.

For small p (for example, p = 2) and with standard arithmetic in F_q, the Satoh–FGH algorithm has a deterministic running time of O(n^5) and a space requirement of O(n^3). With Karatsuba arithmetic the exponent in the running time drops from 5 to nearly 4.17. In addition, this algorithm is significantly easier to implement than optimized versions of the SEA algorithm. These facts are responsible for the superior performance of the Satoh–FGH algorithm over the SEA algorithm (for small p).

3.6.3. Choosing Good Elliptic Curves

Choosing cryptographically suitable elliptic curves is more difficult than choosing good finite fields. First, the order of the elliptic curve group must have a suitably large prime divisor, say, of bit length 160 or more. In addition, the MOV attack applies to supersingular curves and the anomalous attack to anomalous curves (Definition 2.80 and Section 4.5). So a secure curve must be non-supersingular and non-anomalous. Checking all these criteria for a random curve E over F_q requires the group order #E(F_q). One may use either the SEA algorithm or the Satoh–FGH algorithm to compute #E(F_q). Once #E(F_q) is known, it is easy to check whether E is supersingular or anomalous. But factoring #E(F_q) to find its largest prime divisor may be a difficult task and is not recommended. One may instead extract all the small prime factors of #E(F_q) by trial divisions with the primes q1 = 2, q2 = 3, q3 = 5, . . . , qr for a predetermined r and write #E(F_q) = m1m2, where m1 has all prime factors ≤ qr and m2 has all prime factors > qr. If m2 is prime and of the desired size, then E is treated as a good curve. Algorithm 3.33 illustrates these steps.

The computation of the group orders takes up most of the execution time of the above algorithm. It is, therefore, of utmost importance to employ good algorithms for point counting. The best algorithms known till date (the SEA and the Satoh–FGH algorithms) are only reasonable. Further research in this area may lead to better algorithms in future.

Algorithm 3.33. Selecting cryptographically suitable elliptic curves

Input: A suitably large finite field F_q.

Output: A cryptographically good elliptic curve E over F_q.

Steps:

while (1) {
   Generate a random elliptic curve E over F_q.
   Determine #E(F_q).
   if (E is neither supersingular nor anomalous) {
      Try to factorize #E(F_q) using trial division by small primes.
      if (#E(F_q) has a suitably large prime divisor) { Return E }
   }
}

There are ways of generating good curves that do not require point-counting algorithms over large finite fields. One possibility is to use the so-called subfield curves. If F_q has a subfield F_{q′} of relatively small cardinality, one can choose a random curve E over F_{q′} and compute #E(F_{q′}). Since E is also a curve defined over F_q and #E(F_q) can be easily obtained from #E(F_{q′}) using Theorem 2.51 (p 107), we save the lengthy direct computation of #E(F_q). However, the drawback of this method is that since E is now chosen with coefficients from a small field F_{q′}, we do not have many choices. The second drawback is that we must have a small divisor q′ of q. If q is already a prime, this strategy does not work at all. If q = p^n, p a small prime, we need n to have a small divisor n′ that corresponds to q′ = p^{n′}. Sometimes small odd primes p are suggested, but the arithmetic in a non-prime field of some odd characteristic is inherently much slower than that in a field of nearly equal size but of characteristic 2.

Specific curves with complex multiplication (Definition 3.7) over large prime fields have also been suggested in the literature. Finding good curves with complex multiplication involves less computational overhead than Algorithm 3.33, but (like subfield curves) offers limited choice. However, it is important to mention that no special attacks are currently known for subfield curves and also for those chosen by the complex multiplication strategy.

3.7. Arithmetic on Hyperelliptic Curves

Let K = F_q be a finite field and C a hyperelliptic curve of genus g defined over K by Equation (2.13), that is, by

C : Y2 + u(X)Y = v(X)

for suitable polynomials u, v ∈ K[X]. We want to implement the arithmetic in the Jacobian J := J_C(K). Recall from Section 2.12 that an element of J can be represented uniquely as a reduced divisor Div(a, b) for a pair of polynomials a(X), b(X) ∈ K[X] with a monic, deg a ≤ g, deg b < deg a and a | (b^2 + bu − v). Thus, each element of J requires O(g log q) storage.

3.7.1. Arithmetic in the Jacobian

We first present Algorithm 3.34 that, given two elements Div(a1, b1), Div(a2, b2) of J, computes the reduced divisor Div(a, b) which satisfies Div(a, b) ~ Div(a1, b1) + Div(a2, b2). The algorithm proceeds in two steps:

  1. Compute a semi-reduced divisor Div(a′, b′) ~ Div(a1, b1) + Div(a2, b2).

  2. Compute the reduced divisor Div(a, b) ~ Div(a′, b′).

Both these steps can be performed in (deterministic) polynomial time (in the input size, that is, g log q). Algorithm 3.34 implements the first step and continues to work even when the input divisors are semi-reduced (and not completely reduced).

Algorithm 3.34. Sum of semi-reduced divisors

Input: (Semi-)reduced divisors Div(a1, b1) and Div(a2, b2) defined over K.

Output: A semi-reduced divisor Div(a′, b′) ~ Div(a1, b1) + Div(a2, b2).

Steps:

d1 := gcd(a1, a2) = u1a1 + u2a2./* Extended gcd in K[X] */
d2 := gcd(d1, b1 + b2 + u) = v1d1 + v2(b1 + b2 + u)./* Extended gcd in K[X] */
a′ := (a1a2)/(d2^2).
b′ := ((v1u1a1b2 + v1u2a2b1 + v2(b1b2 + v))/d2) rem a′.

It is an easy check that the two expressions appearing between pairs of big parentheses in Algorithm 3.34 are polynomials. This algorithm does only a few gcd calculations and some elementary arithmetic operations on polynomials of K[X]. If the input polynomials (a1, a2, b1, b2) correspond to reduced divisors, then their degrees are ≤ g and hence this algorithm runs in polynomial time in the input size. Furthermore, in that case, the output polynomials a′ and b′ are of degrees ≤ 2g.

We now want to compute the unique reduced divisor Div(a, b) equivalent to the semi-reduced divisor Div(a′, b′). This can be performed using Algorithm 3.35. If the degrees of the input polynomials a′ and b′ are O(g) (as is the case with those output by Algorithm 3.34), Algorithm 3.35 takes a time polynomial in g log q. To sum up, two elements of can be added in polynomial time. The correctness of the two algorithms is not difficult to establish, but the proof is long and involved and hence omitted. Interested readers might look at the appendix of Koblitz’s book [154].

For an element α ∈ J and n ∈ N, one can easily write an algorithm (similar to Algorithm 3.9) to compute nα using O(log n) additions and doublings in J.

3.7.2. Counting Points in Jacobians of Hyperelliptic Curves

For a hyperelliptic curve C of genus g defined over a field F_q, we are interested in the order of the Jacobian J = J_C(F_q) rather than in the cardinality of the curve C(F_q). Algorithmic and implementational studies of counting #J have not received enough research endeavour till date, and though polynomial-time algorithms are known to this effect (at least for curves of small genus), these algorithms are far from practical for hyperelliptic curves of cryptographic sizes. In this section, we look at some of these algorithms.

Algorithm 3.35. Reduction of a semi-reduced divisor

Input: A semi-reduced divisor Div(a′, b′) defined over K.

Output: The reduced divisor Div(a, b) ~ Div(a′, b′).

Steps:

(a, b) := (a′, b′).
while (deg a > g) {
  a′ := (v − ub − b^2)/a.  /* a′ is a polynomial */
  b′ := –(u + b) rem a′.
  (a, b) := (a′, b′).
}
a := [lc(a)]^{−1}a.   /* Make a monic */
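The reduction loop can be sketched over an odd-characteristic field, where one may take u = 0 and C : Y^2 = v(X). The genus-2 curve Y^2 = X^5 + 1 over F_23 and the input divisor (built from four points on the curve, so that a | (b^2 − v) holds) are illustrative values:

```python
p, g = 23, 2
v = [1, 0, 0, 0, 0, 1]                   # X^5 + 1, low degree first

def pmul(f, h):
    r = [0] * (len(f) + len(h) - 1)
    for i, fi in enumerate(f):
        for j, hj in enumerate(h):
            r[i + j] = (r[i + j] + fi * hj) % p
    return r

def psub(f, h):
    n = max(len(f), len(h))
    r = [((f[i] if i < len(f) else 0) - (h[i] if i < len(h) else 0)) % p
         for i in range(n)]
    while len(r) > 1 and r[-1] == 0:
        r.pop()
    return r

def pdivmod(f, h):
    q, r, inv = [0] * max(1, len(f) - len(h) + 1), f[:], pow(h[-1], p - 2, p)
    while len(r) >= len(h) and any(r):
        c, d = r[-1] * inv % p, len(r) - len(h)
        q[d] = c
        for i, hi in enumerate(h):
            r[i + d] = (r[i + d] - c * hi) % p
        while len(r) > 1 and r[-1] == 0:
            r.pop()
    return q, r

def reduce_divisor(a, b):
    # Algorithm 3.35 with u = 0:  a' = (v - b^2)/a,  b' = -b rem a'
    while len(a) - 1 > g:
        a2, rem = pdivmod(psub(v, pmul(b, b)), a)
        assert not any(rem)              # a' is a polynomial
        b = pdivmod([(-c) % p for c in b], a2)[1]
        a = a2
    inv = pow(a[-1], p - 2, p)
    return [c * inv % p for c in a], b   # make a monic

# semi-reduced divisor through the points (0,1), (1,5), (4,6), (6,7) on C
a = [0, 22, 11, 12, 1]                   # X(X-1)(X-4)(X-6) mod 23
b = [1, 9, 5, 13]                        # cubic interpolating the four y-values
A, B = reduce_divisor(a, b)
assert len(A) - 1 <= g and A[-1] == 1 and len(B) < len(A)
assert not any(pdivmod(psub(pmul(B, B), v), A)[1])   # A | (B^2 - v)
```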

We start with some theoretical results which are generalizations of those for elliptic curves. The Frobenius endomorphism σ : x ↦ x^q is a (non-trivial) F_q-automorphism of the algebraic closure of F_q. The map naturally (that is, coordinate-wise) extends to the points on C and also to divisors and, in particular, to the Jacobian. For a reduced divisor Div(a, b), we have σ(Div(a, b)) = Div(σ(a), σ(b)), where for a polynomial h the polynomial σ(h) is obtained by applying the map σ to the coefficients of h. It is known that σ satisfies a monic polynomial χ(X) of degree 2g with integer coefficients. For example, for g = 1 (elliptic curves) we have

χ(X) = X^2 − tX + q,

where t is the trace of Frobenius at q. For g = 2, we have

Equation 3.7

χ(X) = X^4 − t1X^3 + t2X^2 − qt1X + q^2
for integers t1, t2. The cardinality n := #J is related to the polynomial χ(X) as

n = χ(1),

and satisfies the inequalities

Equation 3.8

(√q − 1)^{2g} ≤ n ≤ (√q + 1)^{2g}.

Thus n lies in a rather narrow interval, called the Hasse–Weil interval, of width w := (√q + 1)^{2g} − (√q − 1)^{2g} = O(q^{(2g−1)/2}).

Theorem 2.50 can be generalized as follows:

Theorem 3.6. Structure theorem for J_C(F_q)

The Jacobian J = J_C(F_q) is the direct sum of at most 2g cyclic groups, that is, J ≅ Z_{n1} ⊕ · · · ⊕ Z_{nr} with r ≤ 2g, n1, . . . , nr ≥ 2 and n_{i+1} | n_i for each i = 1, 2, . . . , r − 1.

Let m := Exp J be the exponent of J (see Exercise 3.42). Since m | n, there are at most ⌈(w + 1)/m⌉ possibilities for n for a given m (where w is the width of the Hasse–Weil interval). In particular, n is uniquely determined by m, if m > w. By Theorem 3.6, m can be as small as roughly n^{1/(2g)}, so it is possible to have m ≤ w, though such curves are relatively rare. In the more frequent case (m > w), Algorithm 3.36 determines n.

Algorithm 3.36. Hyperelliptic curve point counting

Input: A hyperelliptic curve C of genus g defined over F_q.

Output: The cardinality n of the Jacobian J = J_C(F_q).

Steps:

m := 1.
while (m ≤ w) {
   Choose a random element x ∈ J.
   Determine ν := ord x.
   m := lcm(m, ν).
}
n := the unique multiple of m in the Hasse–Weil interval.

If Exp J > w, the above algorithm eventually (in practice, after a few executions of the while loop) terminates with m = Exp J. However, if Exp J ≤ w, the algorithm never terminates. Thus, we may forcibly terminate the algorithm by reporting failure, after sufficiently many random elements x are tried (and we continue to have m ≤ w). In order to complete the description of the algorithm, we must specify a strategy to compute ν := ord x for a randomly chosen x ∈ J. Instead of computing ν directly, we compute an (integral) multiple μ of ν, factorize μ and then determine ν. Since nx = 0, we search for a desired multiple μ in the Hasse–Weil interval. This search can be carried out using a baby-step–giant-step (Section 4.4) or a birthday-paradox (Exercise 2.172) method, and the algorithm achieves an expected running time of O(√w) = O(q^{(2g−1)/4}) group operations, which is exponential in the input size. This method, therefore, cannot be used except when n is small.

For hyperelliptic curves of small genus g, generalizations of Schoof’s algorithm (Algorithm 3.31) can be used. Gaudry and Harley [106] describe the case g = 2. One computes the polynomial χ(X) of Equation (3.7), that is, the values of t1 and t2, modulo sufficiently many small primes l. Since the roots of χ(X) are of absolute value √q, we have |t1| ≤ 4√q and |t2| ≤ 6q. Therefore, determination of t1 and t2 modulo O(log q) small primes l uniquely determines χ(X) (as well as n = χ(1)).

Let J[l] be the set of l-torsion points of J. The Frobenius map σ restricted to J[l] satisfies

Equation 3.9

σ^4(D) − t1,lσ^3(D) + t2,lσ^2(D) − qlt1,lσ(D) + ql^2 D = 0 for all D ∈ J[l],
where t1,l := t1 rem l, t2,l := t2 rem l and ql := q rem l. By exhaustively trying all (that is, ≤ l^2) possibilities for t1,l and t2,l, one can find their actual values, that is, those values that cause the left side of Equation (3.9) to vanish (symbolically).

A result by Kampkötter [144] allows us to consider only the reduced divisors of the form D = Div(a, b) with a(X) = X^2 + a1X + a0 and b(X) = b1X + b0. There exists an ideal I of the polynomial ring F_q[A1, A0, B1, B0] such that a reduced divisor D of this special form lies in J[l] if and only if f(a1, a0, b1, b0) = 0 for all f ∈ I. Thus the computation of the left side of Equation (3.9) may be carried out in the ring F_q[A1, A0, B1, B0]/I. An explicit set of generators for I can be found in Kampkötter [144]. To sum up, we get a polynomial-time algorithm.

Working (modulo I) in the 4-variate polynomial ring is, indeed, expensive. Use of Cantor’s division polynomials [43] essentially reduces the arithmetic to a single variable (instead of four). We do not explore further along this line, but only mention that for g = 2 Schoof’s algorithm employing division polynomials runs in time O(log^9 q). Although this is a theoretical breakthrough, the prohibitively large exponent (9) in the running time precludes the feasibility of using the algorithm in the range of interest in cryptography.

Exercise Set 3.7

3.42Let G be a multiplicative group (not necessarily Abelian and/or finite) with identity e.

Let S := {m ∈ Z | x^m = e for all x ∈ G}.

  1. Show that S is a subgroup of the additive group Z.

  2. Show that every subgroup of Z is generated by a single element. In particular, S = 〈m〉 for some integer m. Without loss of generality, we can take m ≥ 0. This m is called the exponent of the group G and is denoted by Exp G.

  3. If G is finite, show that Exp G | ord G.

  4. If G is finite and Abelian, show that Exp G = lcm{ord x | x ∈ G}. Deduce that in this case there exists x ∈ G such that ord x = Exp G.

3.8. Random Numbers

So far we have met several situations where we needed random elements from a (finite) set S, for example, the set Z_n (or Z_n*) or the field F_q (or F_q*) or the set of F_q-rational points on an elliptic (or hyperelliptic) curve. By randomness, we here mean that each element is equally likely to be selected, that is, if #S = n, then each element of S is selected with probability 1/n. Since elements of a set S of cardinality n can be represented as bit strings of length ≤ ⌈lg(n + 1)⌉, the problem of selecting a random element of S essentially reduces to the problem of generating (finite) random sequences of bits. A random sequence of bits is one in which every bit has a probability of 1/2 of being either 0 or 1 (irrespective of the other bits in the sequence).
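This reduction can be sketched directly: draw ⌈lg n⌉ random bits and reject out-of-range values, so that each of the n elements is equally likely. Python's secrets module stands in here for any source of random bits:

```python
import secrets

def random_index(n):
    # draw ceil(lg n) bits; values >= n are rejected and redrawn, so every
    # element of a set of cardinality n is chosen with probability exactly 1/n
    k = (n - 1).bit_length() or 1
    while True:
        r = secrets.randbits(k)
        if r < n:
            return r

samples = [random_index(5) for _ in range(2000)]
assert all(0 <= s < 5 for s in samples)
assert set(samples) == {0, 1, 2, 3, 4}   # all values occur (overwhelmingly likely)
```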

3.8.1. Pseudorandom Bit Generators

Generating a (truly) random sequence of bits seems to be an impossible task. Some natural phenomena, such as electronic noise from a specifically designed integrated circuit, can be used to generate random bit sequences. However, such systems are prone to malfunctioning, often influenced by observations and are, of course, costly. A software solution is definitely the more practical alternative. Phenomena, like the system clock, the work load or memory usage of a machine, that can be captured by programs may be used to generate random bit sequences. But this strategy also suffers from various drawbacks. First of all the sequences generated by these methods would not be (truly) random. Moreover they are vulnerable to attacks by adversaries (for example, if a random bit generator is based on the system clock and if the adversary knows the approximate time when a bit sequence is generated using that generator, she will have to try only a few possibilities to generate the same sequence).

In order to obviate these difficulties, pseudorandom bit generators (PRBG) are commonly used. A bit string a0a1a2 . . . is generated by a PRBG following a specific strategy, which is more often than not a (mathematical) algorithm. The first bit a0 is based on a certain initial value, called a seed, whereas for i ≥ 1 the bit ai is generated as a predetermined function of some or all of the previous bits a0, . . . , ai−1. Since the resulting bit ai is now functionally dependent on the previous bits, the sequence is not at all random (but deterministic); we are, however, happy if the sequence a0a1a2 . . . looks or behaves random. The random behaviour of a sequence is often examined by certain well-known statistical tests. If a generator generates bit sequences that pass these tests, we call it a PRBG and sequences available from such a generator pseudorandom bit sequences. Various kinds of PRBGs are used for generating pseudorandom bit sequences. We won’t describe them here, but concentrate on a particular kind of generator that has a special significance in cryptography.

3.8.2. Cryptographically Strong Pseudorandom Bit Generators

A PRBG is called a cryptographically strong (or secure) pseudorandom bit generator, or a CSPRBG in short, if no polynomial-time algorithm exists (provably or otherwise) that, from a knowledge of the previous bits of a sequence generated by the PRBG (but without the knowledge of the seed), predicts the next bit with probability significantly larger than 1/2. Usually, an intractable computational problem (see Section 4.2) is at the heart of the security of a CSPRBG. As an example, we now explain the Blum–Blum–Shub (or BBS) generator.

Algorithm 3.37. Blum–Blum–Shub pseudorandom bit generator

Input: A positive integer m (the generator outputs the m + 1 bits a0, a1, . . . , am).

Output: A cryptographically strong pseudorandom bit sequence a0a1a2 . . . .

Steps:

Generate two (distinct) large primes p and q each ≡ 3 (mod 4).
n := pq.
Generate a (random) seed s ∈ Z_n*.
x0 := s^2 (mod n).
for i = 0, . . . , m {
   ai := the least significant bit of xi.
   x_{i+1} := xi^2 (mod n).
}
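A direct rendering of Algorithm 3.37 in Python; the primes and seed below are toy values, whereas real use needs secret primes of hundreds of bits:

```python
def bbs(p, q, s, m):
    # p, q: distinct primes, each ≡ 3 (mod 4); s: seed coprime to n = p*q
    assert p % 4 == 3 and q % 4 == 3
    n = p * q
    x = s * s % n                  # x_0 := s^2 (mod n)
    bits = []
    for _ in range(m + 1):
        bits.append(x & 1)         # a_i := least significant bit of x_i
        x = x * x % n              # x_{i+1} := x_i^2 (mod n)
    return bits

out = bbs(19, 23, 101, 7)          # 8 pseudorandom bits from toy parameters
assert len(out) == 8 and set(out) <= {0, 1}
```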

In Algorithm 3.37, we have used indices for the sequence xi for the sake of clarity. In an actual implementation, all indices may be removed, that is, one may use a single variable x to store and update the sequence xi. Furthermore, if there is no harm in altering the value of s, one might even use the same variable for s and x.

The cryptographic security of the BBS generator stems from the presumed intractability of factoring integers or of computing square roots modulo a composite integer (here n = pq) (see Exercise 3.43). Note that p, q and s have to be kept secret, whereas n can be made public. A knowledge of x_{m+1} is also not expected to help an opponent and may likewise be made public. For achieving the desired level of secrecy, p and q should be of nearly equal size and the size of n should be sufficiently large (say, 768 bits or more). Generating each bit by the BBS generator involves a modular squaring and is, therefore, somewhat slow (compared to the traditional PRBGs, which do not guarantee cryptographic security). However, the BBS generator can be used for moderately infrequent purposes, for example, for the generation of a session key. Moreover, a maximum of lg lg n (least significant) bits (instead of 1 as in the above snippet) can be extracted from each xi without degrading the security of the generator.

It is evident that any (infinite) sequence a0a1 · · · generated by the BBS generator must be periodic. As an extreme example, if s = 1, then the BBS generator outputs a sequence of one-bits only. We are interested in rather short (sub)sequences (of such infinite sequences). Therefore, it suffices if the length of the period is reasonably large (for a random seed s). This is guaranteed if one uses strong primes (Definition 3.5).

3.8.3. Seeding Pseudorandom Bit Generators

The way we have defined PRBG (or CSPRBG) makes it evident that the unpredictability of a pseudorandom bit sequence essentially reduces to that of the seed. Care should, therefore, be taken in order to choose the values of the seed. The seed need not be randomly or pseudorandomly generated, but should have a high degree of unpredictability, so that it is infeasible for an adversary to have a reasonably quick guess of it. As an example, assume that we intend to generate a suitable seed s for the BBS generator with a 1024-bit modulus n. If we employ for that purpose a specific algorithm (known to the opponent) using only the built-in random number generator of a standard compiler and if this built-in generator has a 32-bit seed σ, then there are only 232 possibilities for s, even when s itself is 1024 bits long. Thus an adversary has to try at most 232 (231 on an average) values of σ in order to guess the correct value of s. So we must add further unpredictability to the resulting seed value s. This can be done by setting the bits of s depending on several factors, like the system clock, the system load, the memory usage, keyboard inputs from a human user and so on. Each of such factors might not be individually completely unpredictable, but their combined effect should preclude the feasibility of an exhaustive search by the opponent. After all, we have 1024 bits of s to fill up and even if the total search space of possible values of s is as low as 2160, it would be impossible for the opponent to guess s in a reasonable span of time. Note that more often than not the values of the seed need not be remembered: that is, need not be regenerated afterwards. As a result, there is no harm in introducing unpredictability in s caused by certain factors that we would not ourselves be able to reproduce in future.
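The mixing described above can be sketched as follows; hashing with SHA-256 and the particular sources shown (OS entropy, the clock, the process id) are our illustrative choices, not a prescription from the text:

```python
import hashlib, os, time

def make_seed(nbytes=128):
    # combine several hard-to-predict sources; an attacker must now guess
    # their joint state rather than a single 32-bit quantity
    material = (os.urandom(32)
                + str(time.time_ns()).encode()     # high-resolution clock
                + str(os.getpid()).encode())       # process-specific value
    out, counter = b"", 0
    while len(out) < nbytes:                       # stretch to the seed length
        out += hashlib.sha256(material + counter.to_bytes(4, "big")).digest()
        counter += 1
    return int.from_bytes(out[:nbytes], "big")

s = make_seed()
assert 0 <= s < 2 ** (128 * 8)
assert make_seed() != make_seed()   # distinct calls differ (overwhelmingly likely)
```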

Exercise Set 3.8

3.43With the notations of Algorithm 3.37 show that:
  1. Every quadratic residue x modulo n has four distinct square roots modulo n, of which exactly one, say y, is a quadratic residue modulo n. [H]

  2. The square root y of x can be obtained by solving the simultaneous congruences y ≡ x^{(p + 1)/4} (mod p) and y ≡ x^{(q + 1)/4} (mod q).

  3. The bit sequence a0a1 . . . am is uniquely determined by (n and) x_{m+1}.

  4. One can compute in polynomial (in log n and m) time the bit sequence a_0 a_1 . . . a_m from the knowledge of n and x_(m+1), if either

    1. the primes p and q are known, or

    2. one can check in polynomial (in log n) time whether an arbitrary element is a quadratic residue modulo n and, if so, compute in polynomial time its square roots modulo n.

Chapter Summary

This chapter deals with the algorithmic details needed for setting up public-key cryptosystems. We study algorithms for selecting public-key parameters and for carrying out the basic cryptographic primitives. Algorithms required for cryptanalysis are dealt with in Chapters 4 and 7.

We start the chapter with a discussion on algorithms. Time and space complexities of algorithms are discussed first, and the standard order notations are explained. Next we study the class of randomized algorithms, which provide practical solutions to many computational problems that have no known efficient deterministic algorithms. In the worst case, a randomized algorithm may take exponential running time and/or may output an incorrect answer. However, the probability of these bad behaviours can be made arbitrarily low. We finally discuss reductions between computational problems. A reduction helps us draw conclusions about the complexity of one problem relative to that of another.

Many popular public-key cryptosystems are based on arithmetic modulo big integers. These integers have sizes up to several thousand bits. One cannot represent such integers with full precision by the built-in data types supplied by common programming languages. So we require efficient ways of representing and doing arithmetic on big integers. We carefully deal with the implementation of arithmetic on multiple-precision integers. We provide a special treatment of the computation of gcds and extended gcds of integers. We utilize these arithmetic functions in order to implement modular arithmetic. Most public-key primitives involve modular exponentiations as their most time-consuming steps. In addition to the standard square-and-multiply algorithm, certain special tricks (including Montgomery exponentiation) that help speed up modular exponentiation are described at length in this section.
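The square-and-multiply algorithm mentioned above can be illustrated with a short Python sketch (Python integers are arbitrary-precision, so the multiple-precision layer comes for free here):

```python
def square_and_multiply(a, e, n):
    """Left-to-right square-and-multiply: compute a^e mod n.

    The bits of e are scanned from the most significant end; each step
    squares the accumulator, and multiplies in a when the bit is 1.
    """
    result = 1
    for bit in bin(e)[2:]:          # bits of e, most significant first
        result = (result * result) % n
        if bit == "1":
            result = (result * a) % n
    return result
```

The number of modular multiplications is at most twice the bit length of e, which is what makes exponentiation with thousand-bit operands feasible at all.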

In the next section, we deal with some other number-theoretic algorithms. One important topic is the determination of whether a given integer is prime. The Miller–Rabin primality test is an efficient algorithm for primality testing. This algorithm is, however, randomized in the sense that it may declare some composite integers as primes. Using suitable choices of the relevant parameters, the probability of this error may be reduced to very low values (≤ 2^–80). We also briefly introduce the deterministic polynomial-time AKS algorithm for primality testing. Since we can easily check the primality of integers, we can generate random primes by essentially searching in a pool of randomly generated odd integers of a given size. Security in some cryptosystems requires such random primes to possess special properties. We present Gordon’s algorithm for generating cryptographically strong primes. The section ends with a study of the Tonelli–Shanks algorithm for computing square roots modulo a big prime.
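As an illustration of the randomized test mentioned above, here is a compact Python sketch of the Miller–Rabin test; with t = 40 independent rounds the error probability is at most 4^–40 ≤ 2^–80, the bound quoted above.

```python
import random

def miller_rabin(n, t=40):
    """Miller-Rabin probabilistic primality test.

    Returns False if n is certainly composite, and True if n is prime
    with error probability at most 4^(-t).
    """
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13):      # dispose of small cases quickly
        if n % p == 0:
            return n == p
    # Write n - 1 = 2^s * d with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(t):
        a = random.randrange(2, n - 1)  # random base for this round
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = (x * x) % n
            if x == n - 1:
                break
        else:
            return False                # a witnesses compositeness
    return True
```

Note that the test correctly rejects Carmichael numbers such as 561, on which the simpler Fermat test fails for many bases.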

Next, we concentrate on the implementation of finite field arithmetic. The arithmetic of a field of prime cardinality p is the same as integer arithmetic modulo p and is discussed in detail earlier. The other finite fields that are of interest to cryptology are extension fields of characteristic 2. In order to study the arithmetic in these fields, one first requires the arithmetic of the polynomial ring F_2[X]. We discuss the basic operations in this ring. Next we talk about algorithms for checking irreducibility of polynomials and for obtaining (random) irreducible polynomials in F_2[X]. If f(X) is such a polynomial of degree d, the arithmetic of the field F_(2^d) is the same as the arithmetic of F_2[X] modulo the defining polynomial f(X). In order that a finite field F_q is cryptographically safe, we require q – 1 to have a prime factor of sufficiently big size (160 bits or more). Suppose that the factorization of q – 1 is provided. We discuss algorithms that compute the order of elements of F_q^*, that check if a given element is a generator of the cyclic group F_q^*, and that produce random generators of F_q^*. We end the study of finite fields by discussing a way to factor polynomials over finite fields. The standard algorithm comprising the three steps square-free factorization, distinct-degree factorization and equal-degree factorization is explained in detail. The exercises cover the details of an algorithm to compute the roots of polynomials over finite fields.

The arithmetic of elliptic curves over finite fields is dealt with next. Each operation in the elliptic curve group can be realized by a sequence of operations over the underlying field. The multiple of a point on an elliptic curve can be computed by a repeated double-and-add algorithm which is the same as the square-and-multiply algorithm for modular exponentiation, applied to an additive setting. We also discuss ways of selecting random points on elliptic curves. We then present two algorithms for counting points in an elliptic curve group. The SEA algorithm is suitable for curves over prime fields, whereas the Satoh–FGH algorithm works efficiently for curves over fields of characteristic 2. Once we can determine the order of an elliptic curve group, we can choose good elliptic curves for cryptographic usage.
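The double-and-add computation of a point multiple can be sketched in Python using affine coordinates over a prime field. This is a minimal sketch: the curve y^2 = x^3 + 2x + 3 over F_97 and the point (3, 6) used in the test are toy values chosen for illustration, and no curve validation or point counting is attempted.

```python
def ec_add(P, Q, a, p):
    """Add two affine points on y^2 = x^3 + a*x + b over F_p.

    None represents the point at infinity (the group identity); the
    coefficient b never appears in the addition formulas.
    """
    if P is None:
        return Q
    if Q is None:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None                                        # P + (-P) = O
    if P == Q:
        lam = (3 * x1 * x1 + a) * pow(2 * y1, -1, p) % p   # tangent slope
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, p) % p          # chord slope
    x3 = (lam * lam - x1 - x2) % p
    y3 = (lam * (x1 - x3) - y1) % p
    return (x3, y3)

def ec_multiply(k, P, a, p):
    """Compute the scalar multiple kP by double-and-add: scan the bits
    of k from the most significant end, doubling at each step and adding
    P when the bit is 1 (the additive mirror of square-and-multiply)."""
    R = None
    for bit in bin(k)[2:]:
        R = ec_add(R, R, a, p)
        if bit == "1":
            R = ec_add(R, P, a, p)
    return R
```

Each group operation above costs a handful of field operations (one inversion, a few multiplications), which is exactly the reduction of curve arithmetic to field arithmetic described in the text.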

In the next section, we study the arithmetic of hyperelliptic curves. We describe ways to represent elements of the Jacobian by pairs of polynomials and to do arithmetic on elements in this representation. We also discuss two algorithms for counting points in a Jacobian.

In the last section, we address the issue of generation of pseudorandom bits. We define the concept of cryptographically strong pseudorandom bit generator and provide an example, namely the Blum–Blum–Shub generator, which is cryptographically strong under the assumption that taking square roots modulo a big composite integer is computationally intractable.

Suggestions for Further Reading

The basic algorithmic issues discussed in Section 3.2 can be found in any textbook on data structures and algorithms. One can, for example, look at [7, 8, 61]. However, most of these elementary books do not talk about randomization and parallelization issues. We refer to [214] for a recent treatise on randomized algorithms. Also see Rabin’s papers [247, 248].

Complexity theory deals with classifying computational problems based on the known algorithms for solving them and on reduction of one problem to another. A simple introduction to complexity theory is the book [280] by Sipser. Chapter 2 of Koblitz’s book [154] is also a compact introduction to computational complexity meant for cryptographers. Also see [113].

Knuth’s book [147] is seemingly the best resource for a comprehensive treatment of multiple-precision integer arithmetic. The proofs of correctness of many algorithms that we omitted in Section 3.3 can be found in this book. This can be supplemented by the more advanced algorithms and important practical tips compiled in the book [56] by Cohen, who designed a versatile computational number theory package known as PARI. Montgomery’s multiplication algorithm appeared in [210]. Also see Chapter 14 of Menezes et al. [194] for more algorithms and implementation issues.

Most of the important papers on primality testing [3, 4, 5, 116, 175, 204, 248, 287] have been referred to in Section 3.4.1. Also see the survey [164] due to Lenstra and Lenstra. Gordon’s algorithm for generating strong primes appeared in [118]. The book [69] by Crandall and Pomerance is an interesting treatise on prime numbers, written with a computational perspective. The modular square-root Algorithm 3.16 is essentially due to Tonelli (1891). Algebraic number theory is treated from a computational perspective in Cohen [56] and Pohst and Zassenhaus [235].

Arithmetic on finite fields is discussed in many books, including [179, 191]. Finite fields find important applications in cryptography and coding theory, and as such it is necessary to have efficient software and hardware implementations of finite field arithmetic. A huge number of papers addressing these implementation issues have appeared in the last two decades. Chapter 5 of Menezes [191] talks about optimal normal bases (Section 2.9.3 of the current book), which speed up exponentiation in finite fields.

Factoring univariate polynomials over finite fields is a topic that has attracted a lot of research attention. Berlekamp’s Q-matrix method [21] is the first modern algorithm for this purpose. Computationally efficient versions of the algorithm discussed in Section 3.5.4 have been presented in Gathen and Shoup [104] and Kaltofen and Shoup [143]. The best-known running time for a deterministic algorithm for univariate factorization over finite fields is due to Shoup [272]. Shparlinski shows [274] that Shoup’s algorithm on a polynomial of degree d over F_q uses O(q^(1/2) (log q) d^(2+ε)) bit operations. This is fully exponential in log q.

The book [103] by von zur Gathen and Gerhard is a detailed treatise on many topics discussed in Sections 3.2 to 3.5 of the current book. Mignotte’s book [203] and the book [108] by Geddes et al. also have interesting coverage. Also see Chapter 1 of Das [72] for a survey of algorithms for various computational problems on finite fields.

For elliptic curve arithmetic, look at Blake et al. [24], Hankerson et al. [123] and Menezes [192]. The first polynomial-time algorithm for counting points on elliptic curves over a finite field was proposed by Schoof. The original version of this algorithm runs in time O(log^8 q). Later Elkies improved this running time to O(log^6 q) for most elliptic curves. Further modifications due to Atkin gave rise to what we call the SEA algorithm. Schoof’s paper [264] talks about this point-counting algorithm and includes the modifications due to Elkies and Atkin. Also look at the article [85] by Elkies.

The Satoh–FGH algorithm is originally due to Satoh [256]. Fouquet et al. [94] have proposed a modification of Satoh’s algorithm to work for fields of characteristic 2. They also report large-scale implementations of the modified algorithm. Also see Fouquet et al. [95] and Skjernaa [281].

Recently, there has been a lot of progress in point counting algorithms, in particular for fields of characteristic 2. The most recent account of this can be found in Lercier and Lubicz [177]. The authors of this paper later reported an implementation of their algorithm for counting points on an elliptic curve over a very large field of characteristic 2. This computation took nearly 82 hours on a 731 MHz Alpha EV6 processor. With these new developments, the point counting problem is practically solved for fields of small characteristic. However, for prime fields the known algorithms require further enhancements in order to be useful on a wide scale.

Finding good random elliptic curves for cryptographic purposes has also been an area of active research recently. With the current status of solving the elliptic curve discrete-log problem, the strategy we mentioned in Algorithm 3.33 is quite acceptable as long as good point-counting algorithms are at our disposal (they are now). For further discussions on this topic, we refer the reader to two papers [95, 176].

The appendix in Koblitz’s book [154] is seemingly the best source for learning hyperelliptic curve arithmetic. This is also available as a CACR technical report [195]. Gaudry and Harley’s paper [106] has more on the hyperelliptic curve point-counting algorithms we discussed in Section 3.7.2. Hess et al. [126] discuss methods for computing hyperelliptic curves for cryptographic usage.

Chapter 5 of Menezes et al. [194] is devoted to the generation of pseudorandom bits and sequences. This chapter lists the statistical tests for checking the randomness of a bit sequence. It also describes two cryptographically secure pseudorandom bit generators other than the BBS generator (Algorithm 3.37). The BBS generator was originally proposed by Blum et al. [26]. Also see Chapter 3 of Knuth [147].

4. The Intractable Mathematical Problems

4.1 Introduction
4.2 The Problems at a Glance
4.3 The Integer Factorization Problem
4.4 The Finite Field Discrete Logarithm Problem
4.5 The Elliptic Curve Discrete Logarithm Problem
4.6 The Hyperelliptic Curve Discrete Logarithm Problem
4.7 Solving Large Sparse Linear Systems over Finite Rings
4.8 The Subset Sum Problem
 Chapter Summary
 Suggestions for Further Reading

It is insufficient to protect ourselves with laws; we need to protect ourselves with mathematics.

—Bruce Schneier

Most number theorists considered the small group of colleagues that occupied themselves with these problems as being inflicted with an incurable but harmless obsession.

—Arjen K. Lenstra and Hendrik W. Lenstra, Jr. [164]

All mathematics is divided into three parts: cryptography (paid for by CIA, KGB and the like), hydrodynamics (supported by manufacturers of atomic submarines) and celestial mechanics (financed by military and other institutions dealing with missiles, such as NASA).

—V. I. Arnold [13]

4.1. Introduction

Public-key cryptographic systems are based on the apparent intractability of solving certain computational problems. However, there is very little evidence (if any) to corroborate the belief that algorithmic solutions to these problems are really very difficult. In spite of intensive studies over a long period, mathematicians and cryptologists have not come up with good algorithms, and it is their failures that justify the attempts to go on building secure cryptographic protocols based on these problems. The inherent assumption is that it would be infeasible for an opponent having practical amounts of computing resources to break these cryptosystems in a reasonable amount of time. Of course, the fear remains that someone may devise a fast algorithm, and our cryptosystems may then fail to deliver the promised security guarantees. On the other extreme, it is also possible that someone proves the theoretical (and, hence, practical) impossibility of solving such a problem in a small (like polynomial) amount of time, and our cryptosystems become secure forever (well, at least until other paradigms of computing, like the yet practically unimplemented quantum computing, solve the problems efficiently).

Whether you are a cryptographer or a cryptanalyst, it is important, if not essential, to be aware of the best methods available to date for attacking the intractable problems of cryptography. In the first place, this knowledge quantifies the practical security margins of the protocols, for instance, by dictating the determination of the input sizes as a function of the security requirements. Let us take a specific example: With today’s computing power and known integer factorization algorithms, we assert that a message that needs to be kept secret for a day or two may be encrypted by a 768-bit RSA key, whereas if one wants to maintain the security for a year or more, much longer keys are needed. The second point in studying the known cryptanalytic algorithms is that though general-purpose algorithms for solving these problems are still unknown, there are good algorithms for specific cases—the cases to be avoided by the designers of cryptographic applications. For example, there is a linear-time algorithm to attack cryptographic systems based on anomalous elliptic curves. The moral is that one must not employ these curves for cryptographic applications. The third reason for studying cryptanalytic algorithms is sentimental. The fact that we are still unable to answer some simply stated questions even after spending a reasonable amount of collective effort is indeed humbling. To worsen matters, cryptography thrives by exploiting this scientific inadequacy. Cryptanalysis, though seemingly unlawful from a cryptographer’s viewpoint, turns out to be a deep and beautiful area of applied mathematics. Ironically enough, it is quite common that the proponents of cryptographic protocols are themselves most interested to see the end. The journey goes on. . . Read on!

It may appear somewhat unusual to discuss the cryptanalytic algorithms prior to the cryptographic ones (see Chapter 5). We find this order convenient in that one must first know the intractable problems before applying them in cryptographic protocols. Moreover, the known attacks help one fix the parameters for use in the cryptographic algorithms. We defer till Chapter 7 other cryptanalytic techniques which do not directly involve solving these mathematical problems. The full power of the mathematical machinery of Chapters 2 and 3 is felt here in the science of cryptology. Understanding the various aspects of cryptology hence becomes easier.

4.2. The Problems at a Glance

Let us first introduce the intractable problems of cryptology. In the rest of this chapter, we describe some known methods to solve these problems.

The integer factorization problem (IFP) is perhaps the most studied one in the lot. We know that ℤ is a unique factorization domain (UFD) (Definition 2.25, p 40), that is, given a natural number n there are (pairwise distinct) primes p1, . . . , pr (unique up to rearrangement) such that n = p1^α1 · · · pr^αr for some α1, . . . , αr ∈ ℕ. Broadly speaking, the IFP is the determination of these pi and αi from the knowledge of n. Note that once the prime divisors pi of n are known, it is rather easy to compute the multiplicities αi = v_pi(n) by trial divisions. It is, therefore, sufficient to find out the primes pi only. It is easy (Algorithm 3.13) to check if n is composite. If n is already prime, then its prime factorization is known. On the other hand, if n is known to be composite, an algorithm that splits n into two non-trivial factors, that is, that outputs n1, n2 with n = n1n2, n1 < n and n2 < n, can be repeatedly used to compute the complete factorization of n. Once a non-trivial factor n1 of n is made available, the cofactor n2 = n/n1 is obtained by a single division. Finally, it is sometimes known a priori that n is the product of two (distinct odd) primes (as in the RSA protocols). In this case, the non-trivial split of n immediately gives the desired factorization of n. To sum up, the IFP can be stated in various versions, the presumed difficulty of all these versions being essentially the same.
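The observation that the multiplicities αi = v_pi(n) are easy to recover once the prime divisors are known can be made concrete in a few lines of Python:

```python
def multiplicity(p, n):
    """Compute alpha = v_p(n), the exponent of the prime p in n,
    by repeated trial division."""
    alpha = 0
    while n % p == 0:
        n //= p
        alpha += 1
    return alpha

def exponents(primes, n):
    """Given the distinct prime divisors of n, recover the full
    factorization n = prod p_i^alpha_i as a dictionary."""
    return {p: multiplicity(p, n) for p in primes}
```

This costs only O(log n) divisions per prime, which is why the hard part of the IFP is finding the primes, not their exponents.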

Problem 4.1

General integer factorization problem Given an integer n ≥ 2, determine all the prime divisors of n.

Problem 4.2

Integer factorization problem (IFP) Given a composite integer n, find a non-trivial divisor n1 of n (that is, a divisor n1 of n in the range 1 < n1 < n).

Problem 4.3

RSA integer factorization problem Given a product n = pq of two (distinct odd) primes p and q, find the prime divisors p and q of n.

Recall that if n = p1^α1 · · · pr^αr is the prime factorization of n, then the Euler totient function φ(n) of n is φ(n) = n(1 – 1/p1) · · · (1 – 1/pr). Thus, if the prime factorization of n is known, it is easy to compute φ(n). The converse is not known to be true in general. However, if n = pq is the product of two primes, factoring n is polynomial-time equivalent to computing φ(n) (Exercise 3.6).
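One direction of this equivalence for n = pq can be demonstrated by a short Python sketch: knowing φ(n) reveals the sum p + q = n – φ(n) + 1, so p and q are the roots of a quadratic with known coefficients.

```python
from math import isqrt

def factor_from_totient(n, phi):
    """Recover p and q from n = p*q and phi = (p-1)(q-1).

    Since phi = n - (p + q) + 1, the sum s = p + q is known, and
    p, q are the roots of x^2 - s*x + n = 0.
    """
    s = n - phi + 1                 # p + q
    disc = s * s - 4 * n            # (p - q)^2
    root = isqrt(disc)
    assert root * root == disc, "n is not a product of two primes"
    p, q = (s - root) // 2, (s + root) // 2
    return p, q
```

This is exactly why, for RSA moduli, leaking φ(n) is as catastrophic as leaking the factorization itself.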
Problem 4.4

Totient problem Given a natural number n, compute φ(n).

Problem 4.5

RSA totient problem Given a product n = pq of two (distinct odd) primes p and q, compute φ(n).

Note that ℤ[X] is also a UFD. Quite interestingly, it is computationally easy to find a non-trivial factor g of a polynomial f ∈ ℤ[X] (that is, a factor g with 0 < deg g < deg f). One might, for example, use the polynomial-time deterministic L3 algorithm named after Lenstra, Lenstra and Lovász (Section 4.8.2).

Square roots modulo an integer n can be computed in probabilistic polynomial time if n is a prime (Algorithm 3.16). If n is composite, the situation is different. If the factorization of n is known, then the square roots can be computed modulo each prime divisor of n, lifted modulo the appropriate powers of the prime divisors and subsequently combined using the Chinese remainder theorem. On the other hand, if the factorization of n is not known, then computing square roots modulo n turns out to be a very difficult task. Recall that the Blum–Blum–Shub generator (Algorithm 3.37) exploits this fact to provide a cryptographically secure pseudorandom bit generator.
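The known-factorization case can be sketched in Python for the simplest situation, n = pq with p ≡ q ≡ 3 (mod 4) (the setting of the BBS generator), where the square root modulo each prime is a single exponentiation and no lifting is needed:

```python
def sqrt_mod_pq(a, p, q):
    """Compute a square root of a modulo n = p*q, assuming the
    factorization is known, p ≡ q ≡ 3 (mod 4), and a is a quadratic
    residue modulo both p and q.

    A root is found modulo each prime separately and the results are
    combined with the Chinese remainder theorem.
    """
    assert p % 4 == 3 and q % 4 == 3
    rp = pow(a, (p + 1) // 4, p)    # a square root of a mod p
    rq = pow(a, (q + 1) // 4, q)    # a square root of a mod q
    # CRT: find x with x ≡ rp (mod p) and x ≡ rq (mod q).
    n = p * q
    return (rp * q * pow(q, -1, p) + rq * p * pow(p, -1, q)) % n
```

Without p and q, no comparably efficient method is known, which is precisely the intractability assumption behind Problem 4.6.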

Problem 4.6

Modular square root problem (SQRTP) Given a composite integer n and an integer a, compute an integer x, if one exists, such that x^2 ≡ a (mod n).

Let us now look at another class of problems of an apparently distinct flavour. Let G be a finite cyclic group of order n := #G and let g be a generator of G. For a moment, let us assume that G is multiplicatively written. Any element a ∈ G can be written as a = g^x for some integer x unique modulo n. In this case, x is called the discrete logarithm or the index of a with respect to the base g and is denoted by indg a.
Problem 4.7

Discrete logarithm problem (DLP) Given a finite cyclic group G, a generator g of G and an element a ∈ G, compute indg a.

If we now remove the restrictions that G is cyclic and/or that g is a generator of G (if G is cyclic), then we arrive at a generalized version of the DLP. Let us continue to assume that G is Abelian and finite. The subgroup H of G generated by g is anyway cyclic. If a ∈ H, then the discrete logarithm or index of a with respect to the base g is an integer x unique modulo m := ord H such that a = g^x. In this case, we denote such an integer x by indg a. On the other hand, if a ∉ H, then we say that the discrete logarithm indg a is not defined. Recall from Proposition 2.5 that if G is cyclic and if m is known, then checking if a belongs to H amounts to computing an exponentiation in G (that is, a ∈ H if and only if a^m is the identity of G). If G is not cyclic (or if m is not known), then it is not easy, in general, to develop such a nice criterion.
Problem 4.8

Generalized discrete logarithm problem (GDLP) Given a finite Abelian group G and elements g, a ∈ G, determine if a belongs to the subgroup of G generated by g, and if so, compute indg a.

Note that the DLP (or the GDLP) need not be an inherently difficult problem. Its difficulty depends on the choice of the group G and also on the representation of elements of G. For example, if G is the additive (cyclic) group ℤn and g is an integer with gcd(g, n) = 1, then for every integer a we have indg a ≡ g^(–1) a (mod n), where the modular inverse g^(–1) (mod n) can be computed efficiently using the extended gcd algorithm (Algorithm 3.8) on g and n. Also note that if G is cyclic and if each element a of G is represented as indg a for a given generator g of G (see, for example, Section 2.9.3), then computing discrete logarithms in G to the base g is a trivial problem. In that case, it is also trivial to compute discrete logarithms (if existent) to any other base h (Exercise 4.3).
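The easy additive case just described can be written out in Python; here `pow(g, -1, n)` computes the modular inverse that the extended gcd algorithm would deliver:

```python
def additive_dlog(g, a, n):
    """Discrete logarithm in the additive group Z_n: find x with
    x*g ≡ a (mod n), assuming gcd(g, n) = 1.

    In additive notation, 'exponentiation' g^x becomes the multiple x*g,
    so the index is simply a times the modular inverse of g.
    """
    return (pow(g, -1, n) * a) % n
```

The lesson is that the hardness of the DLP lies entirely in the group and its representation, not in the abstract problem statement.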

On the other hand, there are certain groups G in which discrete logarithms cannot be computed so easily; that is, computing indices in G may demand time not bounded by any polynomial in log n, where n = ord G. However, if the group operation on any two elements of G can be performed in time bounded by a polynomial in log n, then cryptographic protocols can be based on G. Typical candidates for such groups are listed below together with the conventional names for the DLP over such groups.

Table 4.1. The discrete logarithm problem in various groups
Group | Name for the DLP
The (cyclic) multiplicative group F_q^* of a finite field F_q | The finite field discrete logarithm problem, or simply the DLP by an abuse of notation
The (not necessarily cyclic) additive group of points of an elliptic curve defined over a finite field | The elliptic curve discrete logarithm problem, or the ECDLP
The Jacobian of a hyperelliptic curve C defined over a finite field | The hyperelliptic curve discrete logarithm problem, or the HECDLP

Note that if we are interested in computing indices to a base g ∈ G, we may indeed replace, at least theoretically, G by the subgroup H of G generated by g and may assume, without loss of generality, that G is cyclic. Now, if we know an isomorphism G → ℤn (together with its inverse), computing discrete logarithms in G is rather easy (Exercise 4.4). However, computing such an isomorphism is, in general, not an easy task and may demand exponential time and/or storage requirements.

Another problem that is widely believed to be computationally equivalent to the DLP (at least for the groups mentioned in the above table) is called the Diffie–Hellman problem (DHP). Similar to the DLP, the DHP is presumably difficult to solve for these groups, and one may introduce the specific names DHP, ECDHP and HECDHP to designate this problem applied to these specific groups.

Problem 4.9

Diffie–Hellman problem (DHP) Let G be a multiplicative group and let g ∈ G. Given g^x and g^y for some (unknown) integers x and y, compute g^(xy).

Clearly, if a solution of the DLP is given, one may compute y = indg(g^y) and, subsequently, g^(xy) = (g^x)^y. That is, the DHP is no harder than the DLP. A proof for the validity or otherwise of the converse relation between these two problems is not known. It is also widely believed that the DLP is computationally equivalent to the IFP. A complete proof of this equivalence is not known, though certain partial results are available in the literature.

There are some other difficult problems on which cryptographic systems can be built. Problem 4.10 deserves specific mention in this regard.

Problem 4.10

Subset sum problem (SSP) Given a set A := {a1, . . . , an} of natural numbers and a natural number s, find out if there exist x1, . . . , xn ∈ {0, 1} such that x1a1 + · · · + xnan = s, that is, if there is a subset B of A whose elements sum to s. The integers a1, . . . , an are called the weights for the SSP.

The Knapsack problem is a related combinatorial optimization problem. In view of this, the set {a1, . . . , an} is often called a knapsack set, and the SSP is, by an abuse of notation, also referred to as the knapsack problem.
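For very small instances the SSP can be decided by exhaustive search over all 2^n subsets, which makes the problem statement concrete and shows why only small n are feasible this way. A minimal Python sketch (an illustration of the problem, not an efficient attack):

```python
from itertools import combinations

def subset_sum(weights, s):
    """Decide the SSP by brute force: return a tuple of weights summing
    to s, or None if no subset works.

    The search enumerates all subsets, so the running time grows as 2^n;
    cryptographic knapsack schemes rely on n being far too large for this.
    """
    for r in range(len(weights) + 1):
        for combo in combinations(weights, r):
            if sum(combo) == s:
                return combo
    return None
```

The cryptanalytically interesting algorithms (meet-in-the-middle, lattice reduction) improve on this dramatically for structured or low-density instances, which is the subject of Section 4.8.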

Some of the early cryptographic systems based on the SSP have succumbed to efficient (even polynomial-time) cryptanalytic attacks. However, some schemes proposed in recent years seem to be resistant to such attacks, or, in other words, good attacks on them are not yet known. As a result, it is important to study the SSP in some detail.

The SSP is often mapped to problems on lattices. Let v1, . . . , vn be linearly independent vectors in ℝ^m. Consider the set

L := {c1v1 + · · · + cnvn | c1, . . . , cn ∈ ℤ}

of integer linear combinations of these vectors. L is called the lattice generated by v1, . . . , vn.

Problem 4.11

Shortest vector problem (SVP) Find a non-zero vector v ∈ L whose length ‖v‖ is smallest.

Problem 4.12

Closest vector problem (CVP) Given a vector w ∈ ℝ^m, find a vector v ∈ L such that the length ‖v – w‖ is smallest over all choices of v ∈ L.

For some other difficult computational problems and their applications to cryptography, we refer the reader to the references suggested at the end of this chapter and of Chapter 5.

Exercise Set 4.2

4.1
  1. Let n ≥ 2 be a square-free integer (that is, a product of pairwise distinct primes) and let a ∈ ℕ. Show that the exponentiation map ℤn → ℤn, x ↦ x^a, is bijective if and only if gcd(a, φ(n)) = 1. [H]

  2. Show that if n ≥ 2 is not square-free, then for no integer a ≥ 2 is the exponentiation map ℤn → ℤn, x ↦ x^a, bijective. [H]

4.2 Show that the following problems are polynomial-time reducible to the IFP.
  1. RSA key inversion problem (RSAKIP) Let n = pq be a product of two (distinct odd) primes p and q. Given e with gcd(e, φ(n)) = 1, compute an integer d such that ed ≡ 1 (mod φ(n)).

  2. RSA problem (RSAP) Let n and e be as in Part (a). Given c ∈ ℤn, compute x ∈ ℤn such that c ≡ x^e (mod n). (By Exercise 4.1, such an x exists and is unique.)

  3. Quadratic residuosity problem (QRP) Given an odd integer n > 1 and an integer a with gcd(a, n) = 1, check if a is a quadratic residue modulo n. (Note that if n is a prime, then this problem reduces to the computation of the Legendre symbol (a/n). If, on the other hand, n is composite and the Jacobi symbol (a/n) equals 1, one cannot conclude that a is a quadratic residue modulo n.)

4.3 Let G be a finite cyclic group of order n and let g, g′ be two arbitrary generators of G.
  1. Show that indg g′ is invertible modulo n and that for every a ∈ G we have indg′ a ≡ (indg a)(indg g′)^(–1) (mod n).

  2. Let h ∈ G, m := ord(h) and y := indg h. Show that m = n/gcd(y, n), that y/gcd(y, n) is invertible modulo m, and that for an arbitrary element a ∈ G the index indh a exists if and only if gcd(y, n) | indg a, and in that case we have

    indh a ≡ (indg a/gcd(y, n))(y/gcd(y, n))^(–1) (mod m).

4.4 Let G be a finite cyclic multiplicatively written group of order n. An algorithm on G is said to be polynomial-time if it runs in time bounded above by a polynomial function of log n. Assume that the product of any two elements in G can be computed in polynomial time. Recall from Exercise 2.47 that G ≅ ℤn. Show that the computation of an isomorphism ψ : G → ℤn is polynomial-time equivalent to computing discrete logarithms in G. (That is, assuming that we are given a (two-way) black box that returns in polynomial time ψ(a) or ψ^(–1)(x) for every a ∈ G and x ∈ ℤn, discrete logarithms in G can be computed in polynomial time. Conversely, if discrete logarithms with respect to a primitive element can be computed in polynomial time, then such a black box can be realized.)
4.5 Let p be an (odd) prime and let g be a primitive root modulo p. Show that a ∈ ℤp^* is a quadratic residue modulo p if and only if the index indg a is even. Hence, conclude that there is a polynomial-time (in log p) algorithm that computes the least significant bit of indg a, given any a ∈ ℤp^*. More generally, let p – 1 = 2^r s, where r ∈ ℕ and s is odd. Show that there exists a polynomial-time algorithm that computes the r least significant bits of indg a given any a ∈ ℤp^*. (This exercise shows that the DLP has a polynomial-time solution for Fermat primes Fn := 2^(2^n) + 1. Note that Fn is prime for n = 0, 1, 2, 3, 4. No other Fermat primes are known.)
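The first part of Exercise 4.5 translates directly into code: by Euler's criterion, a^((p–1)/2) ≡ 1 (mod p) exactly when a is a quadratic residue, which reveals the least significant bit of the index. A Python sketch (p = 23 with primitive root g = 5 in the test is a toy choice for illustration):

```python
def dlog_lsb(g, a, p):
    """Least significant bit of ind_g(a), for a primitive root g mod p.

    By Euler's criterion, a^((p-1)/2) is 1 mod p exactly when a is a
    quadratic residue, i.e. exactly when the index is even.
    """
    return 0 if pow(a, (p - 1) // 2, p) == 1 else 1
```

This single exponentiation leaks one bit of the discrete logarithm for any prime p; the full DLP remains hard because the remaining bits are not accessible this way unless p – 1 is divisible by a high power of 2.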

4.3. The Integer Factorization Problem

The integer factorization problem (IFP) (Problems 4.1, 4.2 and 4.3) is one of the most easily stated and yet hopelessly difficult computational problems that have attracted researchers’ attention for ages, and most notably in the age of electronic computers. A huge number of algorithms, varying widely in basic strategy, mathematical sophistication and implementation intricacy, have been suggested, and, in spite of these, factoring a general integer having only 1000 bits seems to be an impossible task today, even using the fastest computers on earth.

It is important to note here that even proving rigorous bounds on the running times of the integer-factoring algorithms is quite often a very difficult task. In many cases, we have to be satisfied with clever heuristic bounds based on one or more reasonable but unprovable assumptions.

This section highlights human achievements in the battle against the IFP. Before going into the details of this account we want to mention some relevant points. Throughout this section we assume that we want to factor a (positive) integer n. Since such an integer can be represented by ⌈lg(n + 1)⌉ bits, the input size is taken to be lg n (or, ln n, or log n). Most modern factorization algorithms take time given by the following subexponential expression in ln n:

L(n, α, c) := exp((c + o(1)) (ln n)^α (ln ln n)^(1–α)),

where 0 < α < 1 and c > 0 are constants. As described in Section 3.2, the smaller the value of α is, the closer the expression L(n, α, c) is to a polynomial expression (in ln n). If n is understood from the context, we write L(α, c) in place of L(n, α, c). Although the current best-known algorithms correspond to α = 1/3, the algorithms with α = 1/2 are also quite interesting. In this case, we use the shorter notation L[c] := L(1/2, c).
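The growth of the subexponential expression can be explored numerically. The sketch below evaluates L(n, α, c) with the o(1) term dropped, so the values are indicative only:

```python
import math

def L(n, alpha, c):
    """Evaluate L(n, alpha, c) = exp(c * (ln n)^alpha * (ln ln n)^(1-alpha)),
    ignoring the o(1) term in the exponent."""
    ln_n = math.log(n)
    return math.exp(c * ln_n ** alpha * math.log(ln_n) ** (1 - alpha))

# For a 1024-bit n: alpha = 1 gives fully exponential cost n^c, alpha = 0
# gives polynomial cost (a power of ln n), and alpha = 1/3 sits far below
# alpha = 1/2 -- which is why the alpha = 1/3 algorithms dominate today.
```

For n around 2^1024, L(n, 1/2, 1) is roughly 10^30 while L(n, 1/3, 1) is roughly 10^13, illustrating how much the reduction from α = 1/2 to α = 1/3 buys.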

Henceforth we will use, without explicit mention, the notation q1 := 2, q2 := 3, q3 := 5, . . . to denote the sequence of primes. The concept of qt-smoothness (for some t ∈ ℕ) will often be referred to as B-smoothness, where B = {q1, . . . , qt}. Recall from Theorem 2.21 that smaller integers have a higher probability of being B-smooth for a given B. This observation plays an important role in designing integer factoring algorithms. The following special case of Theorem 2.21 is often useful.

Corollary 4.1.

Let α, β > 0 be constants, x = O(n^α) and y = L[β] = L(n, 1/2, β). Then the probability that a random positive integer ≤ x is y-smooth is L[−α/(2β)].

Before any attempt at factoring n is made, it is worthwhile to check for the primality of n. Since probabilistic primality tests (like Algorithm 3.13) are quite efficient, we should first run one such test to make sure that n is really composite. Henceforth, we will assume that n is known to be composite.

4.3.1. Older Algorithms

“Factoring in the dark ages” (a phrase attributed to Hendrik Lenstra) used fully exponential algorithms some of which are discussed now. Though the worst-case performances of these algorithms are quite poor, there are many situations when they might factor even a large integer quite fast. It is, therefore, worthwhile to spend some time on these algorithms.

Trial division

A composite integer n admits a factor ≤ √n, which can be found by trial divisions of n by integers ≤ √n. This demands O(√n) trial divisions and is clearly impractical, even when n contains only 30 decimal digits. It is also true that n has a prime divisor ≤ √n. So it suffices to carry out trial divisions by primes only. Though this modified strategy saves us many unnecessary divisions, the asymptotic complexity does not reduce much, since by the prime number theorem the number of primes ≤ √n is about 2√n/ln n. In addition, we need to have a list of primes ≤ √n or generate the primes on the fly, neither of which is really practical. A trade-off can be made by noting that an integer m ≥ 30 cannot be prime unless m ≡ 1, 7, 11, 13, 17, 19, 23, 29 (mod 30). This means that we need to perform the trial divisions only by those integers m congruent to one of these eight values modulo 30, which reduces the number of candidate divisors to about 27 per cent of all integers. Though trial division is not a practical general-purpose algorithm for factoring large integers, we recommend extracting all the small prime factors of n, if any, by dividing n by a predetermined set {q1, . . . , qt} of small primes. If n is indeed qt-smooth, or has all prime factors ≤ qt except only one, then the trial division method completely factors n quite fast. Even when n is not of this type, trial division might reduce its size, so that other algorithms run somewhat more efficiently.
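The strategy of this paragraph can be sketched in Python (a toy illustration; the function name and the default bound are ours):

```python
def trial_division(n, bound=10**6):
    """Extract small prime factors of n by trial division, dividing first
    by 2, 3, 5 and then only by candidates congruent to
    1, 7, 11, 13, 17, 19, 23, 29 (mod 30), up to the given bound.
    Returns (factors, cofactor), where cofactor is the unfactored part."""
    factors = []
    for q in (2, 3, 5):
        while n % q == 0:
            factors.append(q)
            n //= q
    wheel = (1, 7, 11, 13, 17, 19, 23, 29)
    base = 0
    while True:
        for r in wheel:
            m = base + r
            if m == 1:                    # skip the unit
                continue
            if m * m > n:
                if n > 1:
                    factors.append(n)     # the remaining cofactor is prime
                return factors, 1
            if m > bound:
                return factors, n         # give up on the remaining part
            while n % m == 0:
                factors.append(m)
                n //= m
        base += 30
```

Note that the wheel also tries a few composite candidates (such as 49), but these never divide the remaining cofactor, since all smaller primes have already been removed.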

Pollard’s rho method

Pollard’s rho method solves the IFP in an expected Õ(n^{1/4}) time and is based on the birthday paradox (Exercise 2.172).

Let p be an (unknown) prime divisor of n and let f : Z_n → Z_n be a random map. We start with an initial value x0 ∈ Z_n and generate a sequence xi+1 = f(xi), i ≥ 0, of elements of Z_n. Let yi denote the smallest non-negative integer satisfying yi ≡ xi (mod p). By the birthday paradox, after t = O(√p) iterates x1, . . . , xt are generated, we have a high chance that yi = yj, that is, xi ≡ xj (mod p), for some 1 ≤ i < j ≤ t. This means that p | (xi − xj), and computing gcd(xi − xj, n) splits n into two non-trivial factors with high probability. The method fails if this gcd is n. For a random n, this incident of having a gcd equal to n is of very low probability.

Algorithm 4.1 gives a specific implementation of this method. Computing gcds for all the pairs (xi − xj, n) is a massive investment of time. Instead we store (in the variable ξ) the values xr, r = 2^t, for t = 0, 1, 2, . . . , and compute only gcd(xr+s − xr, n) for s = 1, . . . , r. Since the sequence yi, i ≥ 0, is ultimately periodic with an expected period length τ = O(√p), we eventually reach a t with r = 2^t ≥ τ. In that case, the for loop detects a match. Typically, the update function f is taken to be f(x) = x^2 − 1 (mod n), which, though not a random function, behaves like one. Note that the iterates yi, i ≥ 0, may be visualized as being located on the Greek letter ρ as shown in Figure 4.1 (with a tail of the first μ iterates followed by a cycle of length τ). This is how the method derives its name.

Figure 4.1. Iterates in Pollard’s rho method


Algorithm 4.1 takes an expected running time Õ(√p). Since the smallest prime divisor p of n satisfies p ≤ √n, Pollard’s rho method runs in expected time Õ(n^{1/4}).

Algorithm 4.1. Pollard’s rho method

Input: A composite integer n.

Output: A non-trivial factor of n.

Steps:

Choose a random element x ∈ Z_n and set ξ := x and r := 1.

while (1) {
   for s = 1, . . . , r {
       x := f(x).
       d := gcd(x – ξ, n).
       if (1 < d < n) { Return d. }
   }
   ξ := x.
   r := 2r.
}
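A direct Python transcription of Algorithm 4.1 may look as follows (a sketch; the restart on the rare event gcd = n is our addition):

```python
from math import gcd
import random

def pollard_rho(n):
    """Return a non-trivial factor of the composite integer n,
    following Algorithm 4.1 with f(x) = x^2 - 1 (mod n)."""
    while True:                          # restart on the rare gcd == n event
        x = xi = random.randrange(2, n)  # xi plays the role of the stored ξ
        r = 1
        restart = False
        while not restart:
            for _ in range(r):           # for s = 1, ..., r
                x = (x * x - 1) % n      # x := f(x)
                d = gcd(x - xi, n)
                if 1 < d < n:
                    return d
                if d == n:
                    restart = True       # failure: try a fresh starting value
                    break
            xi = x                       # ξ := x
            r *= 2                       # r := 2r
```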

Many modifications of Pollard’s rho method have been proposed in the literature. Perhaps the most notable one is an idea due to R. P. Brent. All these modifications considerably speed up Algorithm 4.1, though they leave the complexity essentially the same, that is, Õ(n^{1/4}). We will not describe these modifications in this book.

Pollard’s p – 1 method

Pollard’s p − 1 method depends on the prime factors of p − 1 for a prime divisor p of n. Indeed, if p − 1 is rather smooth, this method may extract a (non-trivial) factor of n pretty fast, even when p itself is quite large. To start with, we extend the definition of smoothness as follows.

Definition 4.1.

Let y be a positive integer. An integer x is called y-power-smooth if, whenever a prime power p^e divides x, we have p^e ≤ y. Clearly, a y-power-smooth integer is y-smooth, but not necessarily conversely.

Let p be an (unknown) prime divisor of n. We may assume, without loss of generality, that p ≤ √n. Assume that p − 1 is M-power-smooth. Then (p − 1) | lcm(1, . . . , M) and, therefore, for an integer a with gcd(a, n) = 1 (and hence with gcd(a, p) = 1), we have a^{lcm(1,...,M)} ≡ 1 (mod p) by Fermat’s little theorem, that is, d := gcd(a^{lcm(1,...,M)} − 1, n) > 1. If d ≠ n, then d is a non-trivial factor of n. In case we have d = n (a very rare occurrence), we may try with another a or declare failure.

The problem with this method is that p, and so M, are not known in advance. One may proceed by guessing successively increasing values of M until the method succeeds. In the worst case, that is, when p is a safe prime, we have M = (p − 1)/2. Since p ≤ √n, this algorithm runs in a worst-case time of Õ(√n). However, if M is quite small, then this algorithm is rather efficient, irrespective of how large p itself is.

In Algorithm 4.2, we give a variant of the p − 1 method, where we supply a predetermined value of the bound M. We also assume that we have at our disposal a precalculated list of all primes q1, . . . , qt ≤ M.

There is a modification of this algorithm known as Stage 2 or the second stage. For this, we choose a second bound M′ larger than M. Assume that p − 1 = rq, where r is M-power-smooth and q is a prime in the range M < q ≤ M′. In this case, Stage 2 computes with high probability a factor of n after an expected O(√M′) additional operations, as follows. When Algorithm 4.2 returns “failure” at the last step, it has already computed the value A := a^m (mod n), where m = q1^e1 · · · qt^et, ei = ⌊ln M/ln qi⌋. In this case, A has multiplicative order q modulo p, that is, the subgroup H of Z_p* generated by A has order q. We choose s = O(√q) random integers l1, . . . , ls. By the birthday paradox (Exercise 2.172), we have with high probability A^li ≡ A^lj (mod p) for some i ≠ j. In that case, d := gcd(A^li − A^lj, n) is divisible by p and is a desired factor of n (unless d = n, a case that occurs with a very low probability). In practice, we do not know q and so we determine s and the integers l1, . . . , ls using the bound M′ instead of q.

Algorithm 4.2. Pollard’s p – 1 method

Input: A composite integer n, a bound M, and all primes q1, . . . , qt ≤ M.

Output: A non-trivial factor d of n or “failure”.

Steps:

Select a random integer a, 1 < a < n. /* For example, we may take a := 2 */

if ((d := gcd(a, n)) ≠ 1) { Return d. }
for i = 1, . . . , t {
    ei := ⌊ln M/ln qi⌋.
    a := a^(qi^ei) (mod n).
    d := gcd(a − 1, n).
    if (1 < d < n) { Return d. }
    if (d = n) { Return “failure”. }  /* Or repeat the for loop with another a */
}
Return “failure”.
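In Python, Stage 1 reads roughly as follows (a sketch; the sieve-based prime generation and the test parameters in the usage note are our choices):

```python
from math import gcd, isqrt, log

def small_primes(M):
    """All primes <= M by the sieve of Eratosthenes."""
    flags = [True] * (M + 1)
    flags[0] = flags[1] = False
    for i in range(2, isqrt(M) + 1):
        if flags[i]:
            for j in range(i * i, M + 1, i):
                flags[j] = False
    return [i for i, f in enumerate(flags) if f]

def pollard_p_minus_1(n, M, a=2):
    """Stage 1 of Pollard's p-1 method with power-smoothness bound M.
    Returns a non-trivial factor of n, or None on failure."""
    d = gcd(a, n)
    if d != 1:
        return d                       # lucky: a already shares a factor
    for q in small_primes(M):
        e = int(log(M) / log(q))       # e_i = floor(ln M / ln q_i)
        a = pow(a, q ** e, n)          # a := a^(q_i^e_i) (mod n)
        d = gcd(a - 1, n)
        if 1 < d < n:
            return d
        if d == n:
            return None                # failure; retry with another a
    return None
```

For instance, pollard_p_minus_1(29 · 2879, 50) finds the factor 29, because 29 − 1 = 2^2 · 7 is 50-power-smooth, whereas 2879 is a safe prime (2879 − 1 = 2 · 1439).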

In another variant of Stage 2, we compute the powers A^{q_{t+1}}, . . . , A^{q_{t′}} (mod n), where q_{t+1}, . . . , q_{t′} are all the primes qj satisfying M < qj ≤ M′. If p − 1 = rq is of the desired form, we would find q = qj for some t < j ≤ t′, and then gcd(A^q − 1, n), if not equal to n, would be a non-trivial factor of n.

In practice, one may try one’s luck using this algorithm for some M in the range 10^5 ≤ M ≤ 10^6 (and possibly also the second stage with 10^6 ≤ M′ ≤ 10^8) before attempting a more sophisticated algorithm like the MPQSM, the ECM or the NFSM.

Williams’ p + 1 method

As always, we assume that n is a composite integer and that p is an (unknown) prime divisor of n. Pollard’s p − 1 method works in the group Z_p*, whose order is p − 1. The idea of Williams’ p + 1 method is very similar, except that it works with an element a, this time in F_{p^2}*, whose multiplicative order divides p + 1. If p + 1 is M-power-smooth for a reasonably small bound M, then computing d := gcd(a^{lcm(1,...,M)} − 1, n) splits n with high probability.

In order to find an element of order dividing p + 1, we proceed as follows. Let α be an integer such that α^2 − 4 is a quadratic non-residue modulo p. Then the polynomial f(X) = X^2 − αX + 1 is irreducible in F_p[X] and F_{p^2} = F_p[X]/⟨f(X)⟩. Let a, b ∈ F_{p^2} be the two roots of f. Then ab = 1 and a + b = α. Since f(a^p) = 0 (check it!) and since a ∉ F_p, we have a^p = b = a^{−1}, that is, a^{p+1} = 1.

Unfortunately, p is not known in advance. Therefore, we represent elements of Z_n as integers modulo n and the elements of the ring Z_n[X]/⟨f(X)⟩ as polynomials c0 + c1X with c0, c1 ∈ Z_n. Multiplying two such elements is accomplished by multiplying the two polynomials representing these elements modulo the defining polynomial f(X), the coefficient arithmetic being that of Z_n. This gives us a way to do exponentiations in Z_n[X]/⟨f(X)⟩ in order to compute a^m − 1 for a suitable m (for example, m = lcm(1, . . . , M)).

However, the absence of knowledge of p has a graver consequence, namely, it is impossible to decide whether α^2 − 4 is a quadratic non-residue modulo p for a given integer α. The only thing we can do is to try several random values of α. This is justified, because if k random integers α are tried, then the probability that for all of these α the integers α^2 − 4 are quadratic residues modulo p is only 1/2^k.

The code for the p + 1 method is very similar to Algorithm 4.2. We urge the reader to complete the details. Since p^3 − 1 = (p − 1)(p^2 + p + 1), p^4 − 1 = (p^2 − 1)(p^2 + 1) and so on, we can work in higher extensions like F_{p^3}, F_{p^4} to find elements of order dividing p^2 + p + 1, p^2 + 1 and so on, and thereby generalize the p ± 1 methods. However, the integers p^2 + p + 1 and p^2 + 1, being large (compared to p ± 1), have a smaller chance of being M-smooth (or M-power-smooth) for a given bound M.
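The extension arithmetic described above can be sketched as follows (a toy version; the pair (c0, c1) represents c0 + c1X modulo f(X) = X^2 − αX + 1, and the function names and demonstration parameters are ours):

```python
from math import gcd

def ext_mul(u, v, alpha, n):
    """Multiply u = (c0, c1) and v = (d0, d1), representing c0 + c1*X and
    d0 + d1*X in Z_n[X]/(X^2 - alpha*X + 1): substitute X^2 = alpha*X - 1."""
    a0, a1 = u
    b0, b1 = v
    c2 = a1 * b1
    return ((a0 * b0 - c2) % n, (a0 * b1 + a1 * b0 + alpha * c2) % n)

def ext_pow(base, e, alpha, n):
    """Square-and-multiply exponentiation in the quadratic extension."""
    r = (1, 0)
    while e:
        if e & 1:
            r = ext_mul(r, base, alpha, n)
        base = ext_mul(base, base, alpha, n)
        e >>= 1
    return r

def williams_step(n, alpha, e):
    """Raise X (i.e. the root a of f) to the power e and take the gcd of
    the X-coefficient with n: if a^e ≡ 1 (mod p) for a prime p | n, that
    coefficient is divisible by p."""
    c0, c1 = ext_pow((0, 1), e, alpha, n)
    return gcd(c1, n)
```

For example, with n = 23 · 101, α = 3 (so that α^2 − 4 = 5 is a non-residue modulo 23) and e = lcm(1, . . . , 10) = 2520, the gcd comes out as 23, since 23 + 1 = 24 divides 2520.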

The reader should have recognized why we paid attention to strong primes and safe primes (Definition 3.5, p 199, and Algorithm 3.14, p 200). Let us now concentrate on the recent developments in the IFP arena.

4.3.2. The Quadratic Sieve Method

Carl Pomerance’s quadratic sieve method (QSM) is one of the (reasonably) successful modern methods of factoring integers. Though the number field sieve factoring method is the current champion, there was a time in the recent past when the quadratic sieve method and the elliptic curve method were known to be the fastest algorithms for solving the IFP.

The basic algorithm

We assume that n is a composite integer which is not a perfect square (it is easy to detect whether n is a perfect square and, if so, we can work with √n instead). The basic idea is to arrive at a congruence of the form

Equation 4.1

x^2 ≡ y^2 (mod n)

with x ≢ ±y (mod n). In that case, gcd(x − y, n) is a non-trivial factor of n.

We start with a factor base B = {q1, . . . , qt} comprising the first t primes, and let H := ⌈√n⌉ and J := H^2 − n. Then H and J are each O(√n) and hence, for a small integer c, the right side of the congruence

(H + c)2J + 2cH + c2 (mod n)

is also O(√n). We try to factor T(c) := J + 2cH + c^2 using trial divisions by elements of B. If the factorization is successful, that is, if T(c) is B-smooth, then we get a relation of the form

Equation 4.2

(H + c)^2 ≡ q1^α1 q2^α2 · · · qt^αt (mod n),

where all αi ≥ 0. (Note that T(c) ≠ 0, since n is assumed not to be a perfect square.) If all αi are even, say, αi = 2βi, then we get the desired Congruence (4.1) with x = q1^β1 · · · qt^βt and y = H + c. But this is rarely the case. So we keep on generating other relations. After sufficiently many relations are available, we combine these together (by multiplication) to get Congruence (4.1) and compute gcd(x − y, n). If this does not give a non-trivial factor, we try to recombine the collected relations in order to get another Congruence (4.1). This is how Pomerance’s QSM works.

In order to find suitable combinations yielding Congruence (4.1), we employ a method similar to Gaussian elimination. Assume that we have collected r relations of the form

(H + cj)^2 ≡ q1^α1j q2^α2j · · · qt^αtj (mod n),   j = 1, . . . , r.

We search for integers β1, . . . , βr ∈ {0, 1} such that the product

(H + c1)^(2β1) · · · (H + cr)^(2βr) ≡ q1^(α11β1 + · · · + α1rβr) · · · qt^(αt1β1 + · · · + αtrβr) (mod n)

is a desired Congruence (4.1). The left side of this congruence is already a square. In order to make the right side a square too, we essentially have to solve the following system of linear congruences modulo 2:

αi1β1 + αi2β2 + · · · + αirβr ≡ 0 (mod 2)   for i = 1, . . . , t.

This is a system of t equations over F_2 in the r unknowns β1, . . . , βr and is expected to have solutions if r is slightly larger than t. Note that only the values of αij modulo 2 are needed for solving this linear system. This means that we can have a compact representation of the coefficient matrix (αij) by packing 32 of the coefficients as bits per (32-bit) word. Gaussian elimination (over F_2) can then be done using bit operations only.
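The mod-2 linear algebra can be sketched with Python integers serving as packed bit vectors (a toy version; production code would use structured Gaussian elimination or a sparse iterative method on this system):

```python
def find_dependency(rows):
    """rows: exponent vectors modulo 2, packed as Python integers (bit i of
    a row is the parity of the exponent of the (i+1)-th factor-base prime).
    Returns an integer mask whose set bits index a subset of rows whose XOR
    is zero (i.e. whose product has all-even exponents), or None."""
    work = rows[:]                                # row values, reduced in place
    combos = [1 << i for i in range(len(rows))]   # which originals each row mixes
    pivots = {}                                   # leading-bit position -> row index
    for i in range(len(work)):
        while work[i]:
            b = work[i].bit_length() - 1          # leading bit of this row
            if b not in pivots:
                pivots[b] = i                     # new pivot: keep the row
                break
            j = pivots[b]
            work[i] ^= work[j]                    # eliminate the leading bit
            combos[i] ^= combos[j]
        if work[i] == 0:
            return combos[i]                      # a dependency has appeared
    return None
```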

The running time of this method can be derived using Corollary 4.1. Note that the integers T(c) that are tested for B-smoothness are O(√n), which corresponds to α = 1/2 in the corollary. We take qt = L[1/2] (so that t ≈ L[1/2]/ln L[1/2] = L[1/2] by the prime number theorem), which corresponds to β = 1/2. Assuming that the integers T(c) behave as random integers of magnitude O(√n), the probability that one such T(c) is B-smooth is L[−1/2]. Therefore, if L[1] values of c are tried, we expect to get L[1/2] relations involving the L[1/2] primes q1, . . . , qt. Combining these relations by Gaussian elimination is now expected to produce a non-trivial Congruence (4.1). Since each candidate is trial-divided by L[1/2] primes, this gives a running time of the order of L[3/2] for the relation collection stage. Gaussian elimination in L[1/2] unknowns also takes asymptotically the same time. However, each T(c) can have at most O(log n) distinct prime factors, implying that Relation (4.2) is necessarily sparse. This sparsity can be effectively exploited, and the Gaussian elimination can be done essentially in time L[1]. Nevertheless, the entire procedure runs in time L[3/2], a subexponential expression in ln n.

Sieving

In order to reduce the running time from L[3/2] to L[1], we employ what is known as sieving (from which the algorithm derives its name). Let us fix a priori the sieving interval, that is, the values of c for which T(c) is tested for B-smoothness, to be −M ≤ c ≤ M, where M = L[1]. Let q ∈ B be a small prime (that is, q = qi for some i = 1, . . . , t). We intend to find out the values of c such that q^h | T(c) for small exponents h = 1, 2, . . . . Since T(c) = J + 2cH + c^2 = (c + H)^2 − n, the solvability for c of the condition q^h | T(c) is equivalent to the solvability of the congruence (c + H)^2 ≡ n (mod q^h). If n is a quadratic non-residue modulo q, no c satisfies this condition. Consequently, the factor base B may comprise only those primes q for which n is a quadratic residue modulo q (instead of all primes ≤ qt). So we assume that q meets this condition. We may also assume that q ∤ n, because it is a good strategy to perform trial divisions of n by all the primes in B before we go for sieving. The sieving process makes use of an array A indexed by c. We initialize the array location A[c] := ln |T(c)| for each c, −M ≤ c ≤ M.

We explain the sieving process only for an odd prime q. The modifications for the case q = 2 are left to the reader as an easy exercise. The congruence x^2 − n ≡ 0 (mod q) has two distinct solutions for x, say, x1 and x1′ ≡ −x1 (mod q). These correspond to two solutions for c of (H + c)^2 ≡ n (mod q), namely, c1 ≡ x1 − H (mod q) and c1′ ≡ x1′ − H (mod q). For each value of c in the interval −M ≤ c ≤ M that is congruent either to c1 or to c1′ modulo q, we subtract ln q from A[c]. We then lift the solutions x1 and x1′ to the (unique) solutions x2 and x2′ of the congruence x^2 − n ≡ 0 (mod q^2) (Exercise 3.29), compute c2 ≡ x2 − H (mod q^2) and c2′ ≡ x2′ − H (mod q^2), and for each c in the range −M ≤ c ≤ M congruent to c2 or c2′ modulo q^2 subtract ln q from A[c]. We then again lift to obtain the solutions modulo q^3 and proceed as above. We repeat this process of lifting and subtracting ln q from appropriate locations of A until we reach a sufficiently large h for which neither ch nor ch′ corresponds to any value of c in the range −M ≤ c ≤ M. We then choose another q from the factor base and repeat the procedure explained in this paragraph for this q.

After the sieving procedure is carried out for all small primes q in the factor base B, we check for which c, −M ≤ c ≤ M, the array location A[c] is 0. These are precisely the values of c in the indicated range for which T(c) is B-smooth. For each smooth T(c), we then compute Relation (4.2) using trial division (by the primes of B).
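For toy sizes, the whole sieving loop fits in a few lines (a sketch with simplifications flagged in the comments: roots modulo prime powers are found by brute force instead of lifting, and the sign of T(c) is ignored, whereas a real implementation tracks it with an extra factor −1 in the factor base):

```python
from math import isqrt, log

def sieve_smooth(n, factor_base, M):
    """Return the c in [-M, M] for which |T(c)| = |(H+c)^2 - n| is smooth
    over factor_base.  Assumes n is not a perfect square and no factor-base
    prime divides n."""
    H = isqrt(n) + 1
    T = lambda c: (H + c) ** 2 - n
    Tmax = max(abs(T(c)) for c in range(-M, M + 1))
    A = {c: log(abs(T(c))) for c in range(-M, M + 1)}
    for q in factor_base:
        qh = q
        while qh <= Tmax:                 # powers q, q^2, ..., as in the text
            # roots of x^2 ≡ n (mod q^h), brute force (toy sizes only)
            for x in range(qh):
                if (x * x - n) % qh == 0:
                    c1 = (x - H) % qh
                    start = -M + ((c1 + M) % qh)   # least c >= -M, c ≡ c1 (mod q^h)
                    for c in range(start, M + 1, qh):
                        A[c] -= log(q)
            qh *= q
    # residual log near 0 means T(c) is smooth (up to sign)
    return [c for c in range(-M, M + 1) if A[c] < 0.5]
```

For n = 1649 with B = {2, 5} and M = 3, the routine reports c = −2, 0, 2, corresponding to T(c) = −128, 32 and 200.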

The sieving process replaces trial divisions (of every T(c) by every q) by subtractions (of ln q from appropriate A[c]). This is intuitively the reason why sieving speeds up the relation collection stage. For a more rigorous analysis of the running time, note that in order to get the desired ci and ci′ modulo q^i for each q ∈ B and for each i = 1, . . . , h, we have either to compute a square root modulo q (for i = 1) or to solve a congruence (during lifting, for i ≥ 2), each of which can be done in polynomial time. Also, the bound h on the exponent of q satisfies q^h = O(|T(c)|) = O(√n), that is, h = O(log n). Finally, there are L[1/2] primes in B. Therefore, the computation of the ci and ci′ for all q and i takes a total of L[1/2] time.

Now, we count the total number ν of subtractions of the various ln q values from the locations of the array A. The size of A is 2M + 1. For each qi and each exponent h, we need to subtract ln qi from at most 2⌈(2M + 1)/qi^h⌉ locations (for odd qi); summing over h gives about 2(2M + 1)/(qi − 1) subtractions for this qi. Therefore, ν is of the order of 2(2M + 1)HQ, where Q is the maximum of all the qi and is L[1/2], and where Hm, m ≥ 1, denote the harmonic numbers (Exercise 4.6). But Hm = O(ln m), and so ν = O(2(2M + 1) log n) = L[1], since M = L[1].

The logarithms ln q (as well as the initial array values ln |T(c)|) are irrational numbers and hence need infinite precision for storing. We, however, need to work with only crude approximations of these logarithms, say up to three places after the decimal point. In that case, we cannot take A[c] = 0 as the criterion for selecting smooth values of T(c), because the approximate representation of logarithms leads to truncation (and/or rounding) errors. In practice, this is not a severe problem, because T(c) is not smooth if and only if it has a prime factor at least as large as q_{t+1} (the smallest prime not in B). This implies that at the end of the sieving operation the values of A[c] for smooth T(c) are close to 0, whereas those for non-smooth T(c) are much larger (close to a number at least as large as ln q_{t+1}). Thus we may set the selection criterion for smooth integers as A[c] ≤ 1 or as A[c] ≤ 0.1 ln q_{t+1}. It is also possible to replace floating-point subtraction by integer subtraction by doing the arithmetic on 1000 times the logarithm values. To sum up, the ν = L[1] subtractions the sieving procedure does would be only single-precision operations and hence take a total of L[1] time.

As mentioned earlier, Gaussian elimination with sparse equations can also be performed in time L[1]. So Pomerance’s algorithm with sieving takes time L[1].

Incomplete sieving

Numerous modifications of this basic strategy speed up the algorithm considerably. One possibility is to sieve every time only for h = 1 and to ignore all higher powers of q. That is, for every q we check which of the integers T(c) are divisible by q and then subtract ln q from the corresponding locations of the array A. If some T(c) is divisible by a higher power of q, this strategy fails to subtract ln q the required number of times. As a result, this T(c), even if smooth, may fail to pass the smoothness criterion. This problem can be overcome by increasing the cut-off from 1 (or 0.1 ln q_{t+1}) to a value ξ ln qt for some ξ ≥ 1. But then some non-smooth T(c) will pass the selection criterion, in addition to some smooth ones that could not otherwise be detected. This is reasonable, because the non-smooth ones can later be filtered out from the smooth ones, and one might even use trial divisions to do so. Experiments show that values of ξ ≤ 2.5 work quite well in practice.

The reason why this strategy performs well is as follows. If q is small, for example q = 2, we should subtract only 0.693 from A[c] for every power of 2 dividing T(c). On the other hand, if q is much larger, say q = 1,299,709 (the 10^5-th prime), then ln q ≈ 14.078 is large, but T(c) would not, in general, be divisible by a high power of this q. This modification, therefore, leads to a situation where the probability that a smooth T(c) is actually detected as smooth is quite high. A few relations would still be missed even with the modified selection criterion, but that is more than compensated by the speed-up gained by the method. Henceforth, we will call this modified strategy incomplete sieving and the original strategy (of considering all powers of q) complete sieving.

Large prime variation

Another trick, known as large prime variation, also tends to give more usable relations than are available from the original (complete or incomplete) sieving. In this context, we call a prime q′ large if q′ ∉ B. A value of T(c) is often expected to be B-smooth except for a single large prime factor:

Equation 4.3

T(c) = q1^α1 q2^α2 · · · qt^αt q′
with q′ ∉ B. Such a value of T(c) can be easily detected. For example, incomplete sieving with the relaxed selection criterion is expected to give many such relations naturally, whereas for complete sieving, if the left-over of ln |T(c)| in A[c] at the end of the subtraction steps is < 2 ln qt, then it must correspond to a large prime factor < qt^2. Instead of throwing away an apparently unusable Equation (4.3), we may keep track of such relations. If a large prime q′ is not large enough (that is, not much larger than qt), then it might appear on the right side of Equation (4.3) for more than one value of c, and if that is the case, all these relations taken together become usable for the subsequent Gaussian elimination stage (after including q′ in the factor base). This means that for each large prime occurring more than once, the factor base size increases by 1, whereas the number of relations increases by at least 2. Thus, with a little additional effort, we enrich the factor base and the relations collected, and this, in turn, increases the probability of finding a useful Congruence (4.1), our ultimate goal. Viewed from another angle, the strategy of large prime variation allows us to start with smaller values of t and/or M and thereby speed up the sieving stage, and still end up with a system capable of yielding the desired Congruence (4.1). Note that an increased factor base size leads to a larger system to solve by Gaussian elimination. But this is not a serious problem in practice, because the sieving stage (and not the Gaussian elimination stage) is usually the bottleneck in the running time of the algorithm.
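The bookkeeping for matching large primes is a small exercise with a dictionary (a toy sketch; the names are ours):

```python
from collections import defaultdict

def group_partials(partials):
    """partials: iterable of (c, large_prime) pairs coming from relations
    of the form (4.3).  Returns {large_prime: [c values]} restricted to the
    large primes occurring at least twice, i.e. to the usable groups."""
    groups = defaultdict(list)
    for c, q in partials:
        groups[q].append(c)
    return {q: cs for q, cs in groups.items() if len(cs) >= 2}
```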

Naturally, the above discussion on handling one large prime applies also to situations where a T(c) value has more than one large prime factor, say q′ and q″. Such a T(c) value leads to usable relations if the large primes q′ and q″ can be matched with other partial relations. This situation can be detected by a compositeness test on the non-smooth part of T(c). Subsequently, we have to factor the non-smooth part to obtain the two large primes q′ and q″. This is called the two large prime variation. As the size of the integer n to be factored becomes larger, one may go for three and four large prime variations.

We will shortly encounter many other instances of sieving (for solving the IFP and the DLP). Both incomplete sieving and the use of large primes, if carefully applied, help speed up most of these sieving methods much in the same way as they do in connection with the QSM.

The multiple polynomial quadratic sieve

Easy computations (Exercise 4.11) show that the average and maximum of the integers |T(c)| checked for smoothness in the QSM are approximately MH and 2MH respectively. Though these values are theoretically O(√n L[1]), in practice the factor of M (or 2M) makes the integers |T(c)| somewhat large, leading to a poor yield of B-smooth integers for larger values of |c| in the sieving interval. The multiple-polynomial quadratic sieve method (MPQSM) applies a nice trick to reduce these average and maximum values. In the original QSM, we work with a single polynomial in c, namely,

T(c) = J + 2cH + c^2 = (H + c)^2 − n.

Now, we work with a more general quadratic polynomial

T(c) = U + 2Vc + Wc^2

with W > 0 and V^2 − UW = n. (The original T(c) corresponds to U = J, V = H and W = 1.) Then we have W T(c) = (V + Wc)^2 − n ≡ (V + Wc)^2 (mod n), that is, in this case a relation looks like

(V + Wc)^2 ≡ W q1^α1 q2^α2 · · · qt^αt (mod n).

This relation has an additional factor of W that was absent in Relation (4.2). However, if W is chosen to be a prime (possibly a large one), then the Gaussian elimination stage proceeds exactly as in the original method. Indeed, in this case W appears in every relation and hence poses no problem. Only the integers T(c) need to be checked for B-smoothness and hence should have small values. The sieving procedure (that is, computing the appropriate locations of A for subtracting ln q, q ∈ B) for the general polynomial is very similar to that for the original T(c). The details are left to the reader as an easy exercise.

Let us now explain how we can choose the parameters U, V, W. To start with, we fix a suitable sieving interval −M′ ≤ c ≤ M′ and then choose W to be a prime close to √(2n)/M′ such that n is a quadratic residue modulo W. Then we compute a square root V of n modulo W (Algorithm 3.16) and finally take U = (V^2 − n)/W. This choice clearly gives 0 < V < W and |U| ≈ n/W ≈ M′√(n/2). (Indeed one may choose 0 < V < W/2, but this is not an important issue.) Now, the maximum value of |T(c)| over the sieving interval becomes approximately M′√(n/2). Thus, even for M′ = M, this maximum value is smaller by a factor of 2√2 than the maximum value of |T(c)| in the original QSM. Moreover, we may choose somewhat smaller values of M′ (compared to M) by working with several polynomials corresponding to different choices of the prime W. This is why the MPQSM, despite having the same theoretical running time (L[1]) as the original QSM, runs faster in practice.
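This parameter selection can be sketched as follows (a toy version; the brute-force square root stands in for Algorithm 3.16, and the prime W is supplied by the caller rather than searched for):

```python
def mpqs_polynomial(n, W):
    """Given a prime W with n a quadratic residue modulo W, return
    (U, V, T) with V^2 - U*W = n and T(c) = U + 2*V*c + W*c^2, so that
    W*T(c) = (V + W*c)^2 - n."""
    V = next(v for v in range(W) if (v * v - n) % W == 0)  # sqrt of n mod W
    U = (V * V - n) // W                                   # exact by construction
    T = lambda c: U + 2 * V * c + W * c * c
    return U, V, T
```

For n = 1649 and W = 7 (note 1649 ≡ 4 (mod 7)), this yields V = 2 and U = −235.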

Parallelization

The QSM is highly parallelizable. More specifically, different processors can handle pairwise disjoint subsets of B during the sieving process. That is, each processor P maintains a local array A_P indexed by c, −M ≤ c ≤ M. The (local) sieving process at P starts with initializing all the locations A_P[c] to 0. For each prime q in the subset B_P of the factor base B assigned to P, the processor adds ln q to the appropriate locations (the appropriate numbers of times). After all these processors finish local sieving, a central processor computes, for each c in the sieving interval, the value ln |T(c)| − Σ_P A_P[c] (where the sum extends over all processors P which have done local sieving), based on which T(c) is recognized as smooth or not. For the multiple-polynomial variant of the QSM, different processors might handle different polynomials and/or different subsets of B.

TWINKLE: Shamir’s factoring device

Adi Shamir has proposed the complete design of a (hardware) device, TWINKLE (The Weizmann INstitute Key Location Engine), that can perform the sieving stage of the QSM a hundred to a thousand times faster than software implementations on the usual PCs available nowadays. This speed-up is obtained by using a high clock speed (10 GHz) and opto-electronic technology for detecting smooth integers. Each TWINKLE, if mass-produced, has an estimated cost of US $5,000.

The working of TWINKLE is described in Figure 4.2. It uses an opaque cylinder of a height of about 10 inches and a diameter of about 6 inches. At the bottom of the cylinder is an array of LEDs,[1] each LED representing a prime in the factor base. The i-th LED (corresponding to the i-th prime qi) emits light of intensity proportional to log qi. The device is clocked and the i-th LED emits light only during the clock cycles c for which qi|T(c). The light emitted by all the active LEDs at a given clock cycle is focused by a lens and a photo-detector senses the total emitted light. If this total light exceeds a certain threshold, the corresponding clock cycle (that is, the time c) is reported to a PC attached to TWINKLE. The PC then analyses the particular T(c) for smoothness over {q1, . . . , qt} by trial division.

[1] An LED (light emitting diode) is an electronic device that emits light, when current passes through it. A GaAs(Gallium arsenide)-based LED emits (infra-red) light of wavelength ~870 nano-meters. In the operational range of an LED, the intensity of emitted light is roughly proportional to the current passing through the LED.

Figure 4.2. Working of TWINKLE


Thus, TWINKLE implements incomplete sieving by opto-electronic means. The major difference between TWINKLE’s sieving and software sieving is that in the latter we used an array of times (the c values) and the iteration went over the set of small primes. In TWINKLE, we use an array of small primes and allow time to iterate over the different values of c in the sieving interval −M ≤ c ≤ M. An electronic circuit in TWINKLE computes for each LED the cycles c at which that LED is expected to emit light. That is to say, the i-th LED emits light only in the clock cycles c congruent modulo qi to one of the two solutions c1 and c1′ of T(c) ≡ 0 (mod qi). Shamir’s original design uses two LEDs for each prime qi, one corresponding to c1, the other to c1′. In that case, each LED emits light at regularly spaced clock cycles, and this simplifies the electronic circuitry (at the cost of having twice the number of LEDs).

Another difference of TWINKLE from software sieving is that here we add the log qi values (to zero) instead of subtracting them from log |T(c)|. By Exercise 4.11, the values |T(c)| typically vary by small constant factors. Taking logs reduces this variation further and, therefore, comparing the sum of the active log qi values for a given c with a fixed predefined threshold (say, log(MH)) independent of c is a neat way of bypassing the computation of all log |T(c)|, −M ≤ c ≤ M. (This strategy can also be used for software sieving.)

The reasons, why TWINKLE speeds up the sieving procedure over software implementations in conventional PCs, are the following:

  1. Silicon-based PC chips at present can withstand clock frequencies on the order of 1 GHz. By contrast, a GaAs-based wafer containing the LED array can be clocked faster than 10 GHz.

  2. There is no need to initialize the array (to log |T(c)| or zero). Similarly at the end, there is no need to compare the final values in all these array locations with a threshold.

  3. The addition of all the log qi values effective at a given c is done instantly by analog optical means. We do not require an explicit electronic adder.

Shamir [269] reports the full details of a VLSI[2] design of TWINKLE.

[2] very large-scale integration

*4.3.3. Factorization Using Elliptic Curves

H. W. Lenstra’s elliptic curve method (ECM) is another modern algorithm to solve the IFP. It runs in expected time L(p, 1/2, √2), where p is the smallest prime factor of n (the integer to be factored). Since p ≤ √n, this running time is at most L[1] = L(n, 1/2, 1), that is, the same as that of the QSM. However, if p is small (that is, if p = O(n^α) for some α < 1/2), then the ECM is expected to outperform the QSM, since the working of the QSM is incapable of exploiting smaller values of p.

As before, let n be a composite natural number having no small prime divisors and let p be the smallest prime divisor of n. For denoting subexponential expressions in ln p, we use the symbol Lp[c] := L(p, 1/2, c), whereas the unsubscripted symbol L[c] stands for L(n, 1/2, c). We work with random elliptic curves

E : Y^2 = X^3 + aX + b

and consider the group E(F_p) of rational points on E modulo p. However, since p is not known a priori, we intend to work modulo n. The canonical surjection Z_n → F_p allows us to view the points on E modulo n as points on E over F_p. We now define a bound M := Lp[1/√2] and let B = {q1, . . . , qt} be all the primes smaller than or equal to M, so that by the prime number theorem (Theorem 2.20) #B ≈ M/ln M. Of course, p is not known in advance, so M and B are also not known. We will discuss the choice of M and B later. For the time being, let us assume that we know some approximate value of p, so that M and B can be fixed, at least approximately, at the beginning of the algorithm.

By Hasse’s theorem (Theorem 2.48, p 106), the cardinality ν of the group of points modulo p satisfies |ν − p − 1| ≤ 2√p, that is, ν = O(p). If we make the heuristic assumption that ν is a random integer on the order O(p), then Corollary 4.1 tells us that ν is B-smooth with probability Lp[−1/√2]. This assumption is certainly not rigorous, but accepting it gives us a way to analyse the running time of the algorithm.

If Lp[1/√2] random curves are tried, then we expect to find one B-smooth value of ν. In this case, a non-trivial factor of n can be computed with high probability as follows. Define ei := ⌊ln n/ln qi⌋ for i = 1, . . . , t, and m := q1^e1 ··· qt^et, where t is the number of primes in B. If ν is B-smooth, then ν|m and, therefore, for any point P on E we have mP = O (the point at infinity). Computation of mP involves the computation of many sums P1 + P2 of points P1 := (h1, k1) and P2 := (h2, k2). At some point of time, we would certainly compute P1 + P2 = O, that is, P1 = −P2, that is, h1 ≡ h2 (mod p) and k1 ≡ −k2 (mod p). Since p is unknown, we work modulo n, that is, the values of h1, h2, k1 and k2 are known modulo n. Let d := gcd(h1 − h2, n). Then p|d, and if d ≠ n (the case d = n has a very small probability!), we have the non-trivial factor d of n. The computation of the coordinates of P1 + P2 (assuming P1 ≠ ±P2) demands computing the inverse of h1 − h2 modulo n (Section 2.11.2). However, if d = gcd(h1 − h2, n) ≠ 1, then this inverse does not exist, the computation of P1 + P2 fails, and we have a non-trivial factor of n. If ν is B-smooth, then the computation of mP is bound to fail. The basic steps of the ECM are then as shown in Algorithm 4.3.

Algorithm 4.3. Elliptic curve method (ECM)

Input: A composite integer n (with no small prime factors).

Output: A non-trivial divisor d of n.

Steps:

while (1) {
   Select a random curve E : Y^2 = X^3 + aX + b modulo n.
   Choose a point P on E modulo n.
   Try to compute mP.   /* where m is as defined in the text */
   if (the computation of mP fails) {
       /* We have found a divisor d > 1 of n */
       if (d ≠ n) { Return d. }
   }
}
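The trial loop above can be sketched in Python. This is our illustrative helper (function and variable names are ours, not the book’s), using the curve family Y^2 = X^3 + aX + 1 with P = (0, 1) and the cheaper exponents ei = ⌊ln M/ln qi⌋ discussed in the text; a trial “succeeds” exactly when a modular inverse fails to exist:

```python
from math import gcd, isqrt

class FactorFound(Exception):
    """Raised when an inverse modulo n does not exist: the gcd is a factor."""

def inv_mod(x, n):
    d = gcd(x % n, n)
    if d != 1:
        raise FactorFound(d)            # the failure the ECM waits for
    return pow(x, -1, n)

def ec_add(P, Q, a, n):
    """Add points on Y^2 = X^3 + a*X + b modulo n (None = point at infinity)."""
    if P is None: return Q
    if Q is None: return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % n == 0:
        return None                     # P = -Q, so P + Q = O
    if P == Q:
        lam = (3 * x1 * x1 + a) * inv_mod(2 * y1, n) % n
    else:
        lam = (y2 - y1) * inv_mod(x2 - x1, n) % n
    x3 = (lam * lam - x1 - x2) % n
    return (x3, (lam * (x1 - x3) - y1) % n)

def ec_mul(k, P, a, n):
    R = None                            # repeated doubling and addition
    while k:
        if k & 1:
            R = ec_add(R, P, a, n)
        P = ec_add(P, P, a, n)
        k >>= 1
    return R

def ecm(n, M):
    """ECM trials with curves Y^2 = X^3 + a*X + 1, P = (0, 1), a = 0, 1, 2, ..."""
    s = bytearray([1]) * (M + 1); s[0:2] = b'\x00\x00'   # sieve the primes <= M
    for p in range(2, isqrt(M) + 1):
        if s[p]: s[p*p::p] = bytearray(len(s[p*p::p]))
    primes = [p for p in range(2, M + 1) if s[p]]
    for a in range(10000):
        P = (0, 1)
        try:
            for q in primes:
                e = 1
                while q ** (e + 1) <= M:   # e = floor(ln M / ln q)
                    e += 1
                P = ec_mul(q ** e, P, a, n)
        except FactorFound as exc:
            d = exc.args[0]
            if d != n:
                return d                   # non-trivial divisor of n
    return None
```

Calling, for instance, ecm(91, 20) returns one of the factors 7 or 13: some small value of a soon gives a curve whose group order modulo one prime factor is smooth while the computation of mP breaks down modulo the other.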

Before we derive the running time of the ECM, some comments are in order. A random curve E is chosen by selecting random integers a and b modulo n. It turns out that taking a to be a single-precision integer and b = 1 works quite well in practice. Indeed one can keep trying the values a = 0, 1, 2, . . . successively. Note that the curve E is an elliptic curve, that is, non-singular, if and only if δ := gcd(n, 4a^3 + 27b^2) = 1. However, having δ > 1 is an extremely rare occurrence, and one might skip the computation of δ before starting the trial with a curve. The choice b = 1 is attractive, because in that case we may take the point P = (0, 1). In Section 3.6, we have described a strategy to find a random point on an elliptic curve over a field K. This is based on the assumption that computing square roots in K is easy. The same method can be applied to curves over Zn, but n being composite, it is difficult to compute square roots modulo n. So taking b to be 1 (or the square of a known integer) is indeed a pragmatic decision. After all, we do not need P to be a random point on E.

Recall that we have taken m := q1^e1 ··· qt^et, where ei = ⌊ln n/ln qi⌋. If instead we take ei := ⌊ln M/ln qi⌋ (where M is the bound mentioned earlier), the cost of computing mP per trial drops considerably, whereas the probability of a successful trial (that is, of a failure to compute mP) does not decrease much. The integer m can be quite large. One, however, need not compute m explicitly, but may proceed as follows: first take Q0 := P and subsequently for each i = 1, . . . , t compute Qi := qi^ei Qi−1. One finally gets mP = Qt.

Now comes the analysis of the running time of the ECM. We have fixed the parameter M to be Lp[1/√2], so that B contains Lp[1/√2] small primes. The most expensive part of a trial with a random elliptic curve is the (attempted) computation of the point mP. This involves Lp[1/√2] additions of points. Since an expected number Lp[1/√2] of elliptic curves needs to be tried for finding a non-trivial factor of n, the algorithm performs an expected number Lp[√2] of additions of points on curves modulo n. Since each such addition can be done in polynomial time, the announced running time follows.

Note that Lp[√2] is the optimal running time of the ECM and can be shown to be achieved by taking M = Lp[1/√2]. But, in practice, p is not known a priori. Various ad hoc ways may be adopted to get around this difficulty. One possibility is to use the worst-case bound p ≤ √n. For example, for factoring integers of the form n = pq, where p and q are primes of roughly the same size, √n is a good approximation for p. Another strategy is to start with a small value of M and increase M gradually with the number of trials performed. For larger values of M, the probability of a successful trial increases, implying that fewer elliptic curves need to be tried, whereas the time per trial (that is, for the computation of mP) increases. On balance, the total running time of the ECM is apparently not very sensitive to the choice of M.

A second stage can be used for each elliptic curve in order to increase the probability of a trial being successful. A strategy very similar to the second stage of the p − 1 method can be employed. The reader is urged to fill in the details. Employing the second stage leads to a reasonable speed-up in practice, though it does not affect the asymptotic running time.

The ECM can be effectively parallelized, since different processors can carry out the trials, that is, computations of mP (together with the second stage) with different sets of (random) elliptic curves.

*4.3.4. The Number Field Sieve Method

The number field sieve method (NFSM) is to date the most successful of all integer-factoring algorithms. Under certain heuristic assumptions it achieves a running time of the form L(n, 1/3, c), which is better than the L(n, 1/2, c′) algorithms described so far. The NFSM was first designed for integers of a special form. This variant of the NFSM is called the special NFS method (SNFSM) and was later modified to the general NFS method (GNFSM) that can handle arbitrary integers. The running time of the SNFSM has c = (32/9)^(1/3) ≈ 1.526, whereas that of the GNFSM has c = (64/9)^(1/3) ≈ 1.923. For the sake of simplicity, we describe only the SNFSM in this book (see Cohen [56] and Lenstra and Lenstra [165] for further details).

We choose an integer m and a polynomial f(X) with integer coefficients such that f(m) ≡ 0 (mod n). We assume that f is irreducible in Z[X]; otherwise a non-trivial factor of f yields a non-trivial factor of n. Consider the number field K := Q(α) for a root α of f, and let d := deg f be the degree of the number field K. We use the complex embedding of K that sends α to a fixed complex root of f. The special NFS method makes certain simplifying assumptions:

  1. f is monic, so that α is an algebraic integer, that is, α lies in the ring OK of integers of K.

  2. OK = Z[α], that is, OK is monogenic.

  3. OK is a PID.

Consider the ring homomorphism

Φ : Z[α] → Zn, α ↦ m (mod n).

This is well-defined, since f(m) ≡ 0 (mod n). We choose small coprime (rational) integers a, b and note that Φ(a + bα) ≡ a + bm (mod n). Let B be a predetermined smoothness bound. Assume that for a given pair (a, b), both a + bm and a + bα are B-smooth. For the rational integer a + bm, this means

a + bm = ± ∏ p^vp,

the product running over the set of all rational primes p ≤ B. On the other hand, smoothness of the algebraic integer a + bα means that the principal ideal ⟨a + bα⟩ is a product of prime ideals of OK of prime norms ≤ B; that is, we have a factorization

⟨a + bα⟩ = ∏ 𝔭^w𝔭,

where 𝔭 runs over the set of all prime ideals of OK of prime norms ≤ B. By assumption, each such 𝔭 is a principal ideal. Let G denote a set of generators, one for each of these ideals. Further let U denote a set of generators of the multiplicative group of units of OK. The smoothness of a + bα can, therefore, be rephrased as

Equation 4.4

a + bα = ∏ u^su ∏ g^wg,

where u runs over the chosen generators of the unit group, g runs over the chosen generators of the prime ideals of small norm, and the exponents su, wg are integers. Applying Φ then yields

a + bm ≡ ∏ Φ(u)^su ∏ Φ(g)^wg (mod n).

This is a relation for the SNFSM. After sufficiently many relations are available, Gaussian elimination modulo 2 (as in the case of the QSM) is expected to give us a congruence of the form

x^2 ≡ y^2 (mod n),

and gcd(x − y, n) is possibly a non-trivial factor of n. This is the basic strategy of the SNFSM. We clarify some details now.

Selecting the polynomial f(X)

There is no clearly specified way to select the polynomial f for defining the number field K. We require f to have small coefficients. Typically, m is much smaller than n, and one writes the expansion of n in base m as n = btm^t + bt−1m^(t−1) + ··· + b1m + b0 with 0 ≤ bi < m. Taking f(X) = btX^t + bt−1X^(t−1) + ··· + b1X + b0 is often suggested.
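The base-m construction can be sketched in a few lines of Python (the helper name is ours, for illustration only):

```python
def base_m_poly(n, m):
    """Digits b0, ..., bt of n in base m; then f(X) = bt*X^t + ... + b1*X + b0
    satisfies f(m) = n, so that f(m) = 0 (mod n) as required."""
    digits = []
    while n > 0:
        digits.append(n % m)   # 0 <= bi < m
        n //= m
    return digits              # digits[i] is the coefficient of X^i

# Evaluating f at m recovers n exactly.
n, m = 2**64 + 1, 2**13
f = base_m_poly(n, m)
assert sum(b * m**i for i, b in enumerate(f)) == n
```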

For integers n of certain special forms, we have natural choices for f. The seminal paper on the NFSM by Lenstra et al. [167] assumes that n = r^e − s for a small positive integer r and a non-zero integer s with small absolute value. In this case, one first chooses a small extension degree d and sets m := r^⌈e/d⌉ and f(X) := X^d − s·r^(d⌈e/d⌉−e). Typically, d = 5 works quite well in practice. Lenstra et al. report the implementation of the SNFSM for factoring n = 3^239 − 1. The parameters chosen are d = 5, m = 3^48 and f(X) = X^5 − 3. In this case, OK = Z[α] is monogenic and a PID.

Construction of the prime ideals of small norm

Take a small rational prime p ≤ B. From Section 2.13, it follows that if f(X) ≡ f1(X)^d1 ··· fr(X)^dr (mod p) is the factorization of the canonical image of f(X) modulo p, then 𝔭i := ⟨p, fi(α)⟩, i = 1, . . . , r, are all the primes lying over p. We have also seen that the norm of 𝔭i is prime if and only if deg fi = 1, that is, fi(X) = X − cp for some cp ∈ Zp. Thus, each root cp of f(X) in Zp corresponds to a prime ideal of OK of prime norm p.

To sum up, a prime ideal in OK of prime norm p ≤ B is specified by a pair (p, cp) of values with f(cp) ≡ 0 (mod p). We denote this ideal by 𝔭p,cp. All these ideals can be precomputed by finding the roots of the defining polynomial f(X) modulo the small primes p ≤ B. One can use the root-finding algorithms of Exercise 3.29.

Construction of the ideal generators

Constructing a set of generators of the ideals 𝔭p,cp is a costly operation. We have just seen that each such prime ideal corresponds to a pair (p, cp) and is a principal ideal by assumption. A generator of such an ideal is an element of the form gp,cp = h(α), h(X) ∈ Z[X], with N(gp,cp) = ±p and h(cp) ≡ 0 (mod p). Algorithm 4.4 (quoted from Lenstra et al. [167]) computes the generators gp,cp for all relevant pairs (p, cp). The first for loop exhaustively searches over all small polynomials h(α) in order to locate for each (p, cp) an element of norm kp with |k| as small as possible. If the smallest k (stored in ap,cp) is ±1, the corresponding h(α) is already a generator gp,cp of 𝔭p,cp; else some additional adjustments need to be performed.

Algorithm 4.4. Construction of generators of ideals for the SNFSM

Choose two suitable positive constants aB and CB (depending on B and K).

Initialize an array ap,cp := aB indexed by the relevant pairs (p, cp).

for each h(X) ∈ Z[X] with coefficients bounded by CB and with N(h(α)) = kp,
     p ≤ B a prime, k ∈ Z \ {0}, |k| < min(p, aB) {
    Find cp such that h(cp) ≡ 0 (mod p).    /* Root finding */
    if (|k| < |ap,cp|) {
       /* Store the least k and the corresponding h found so far */
       ap,cp := k, hp,cp := h.
    }
}
for each relevant pair (p, cp) {
    if (ap,cp = ±1) { gp,cp := hp,cp(α). }    /* The more frequent case */
    else {
       Locate a g ∈ Z[α] with N(g) = ap,cp.
       gp,cp := hp,cp(α)/g.
    }
}

Construction of the unit generators

Let K have the signature (r1, r2), and write ρ := r1 + r2 − 1. By Dirichlet’s unit theorem, the group of units of OK is generated by an appropriate root u0 of unity and ρ multiplicatively independent[3] elements u1, . . . , uρ of infinite order. Each unit u of OK has norm N(u) = ±1. Thus, one may keep on generating elements h(α), h(X) ∈ Z[X] with small integer coefficients, of norm ±1, until ρ independent elements are found. Many elements of norm ±1 are available as a by-product during the construction of the ideal generators, which involves the computation of norms of many elements in Z[α]. For a more general exposition on this topic, see Algorithm 6.5.9 of Cohen [56].

[3] The elements u1, . . . , uρ in a (multiplicatively written) group are called (multiplicatively) independent if u1^n1 ··· uρ^nρ, ni ∈ Z, is the group identity only for n1 = ··· = nρ = 0.

Computing the factorization of a + bα

In order to compute the factorization of Equation (4.4), we first factor the integer N(a + bα) = ±b^d f(−a/b). If ⟨a + bα⟩ = 𝔭1^w1 ··· 𝔭s^ws is the prime factorization of ⟨a + bα⟩ with pairwise distinct prime ideals 𝔭i of OK, then by the multiplicative property of norms we obtain |N(a + bα)| = N(𝔭1)^w1 ··· N(𝔭s)^ws.

Now, let p ≤ B be a small prime. If p ∤ N(a + bα), it is clear that no prime ideal of OK of norm p (or a power of p) appears in the factorization of ⟨a + bα⟩. On the other hand, if p | N(a + bα), then 𝔭 | ⟨a + bα⟩ for some prime ideal 𝔭 lying over p. The assumption gcd(a, b) = 1 implies that the inertial degree of 𝔭 is 1: that is, N(𝔭) = p, that is, there is a cp with f(cp) ≡ 0 (mod p) such that the prime ideal 𝔭 corresponds to the pair (p, cp). In this case, we have a ≡ −cpb (mod p). Assume that another prime ideal 𝔭′ of norm p appears in the prime factorization of ⟨a + bα⟩. If 𝔭′ corresponds to the pair (p, c′p), then a ≡ −c′pb (mod p). Since cp and c′p are distinct modulo p, it follows that p | b and hence p | a, that is, p | gcd(a, b), a contradiction, since gcd(a, b) = 1. Thus, a unique ideal 𝔭p,cp of norm p appears in the factorization of ⟨a + bα⟩. Moreover, the multiplicity of 𝔭p,cp in the factorization of ⟨a + bα⟩ is the same as the multiplicity vp(N(a + bα)).

Thus, one may attempt to factorize N(a + bα) using trial divisions by primes ≤ B. If the factorization is successful, that is, if N(a + bα) is B-smooth, then for each prime divisor p of N(a + bα) we find the ideal 𝔭p,cp and its multiplicity in the factorization of ⟨a + bα⟩ as explained above. Since we know a generator of each 𝔭p,cp, we eventually compute a factorization a + bα = u ∏ g^wg, where g runs over the known ideal generators and u is a unit in OK. What remains is to factor u as a product of the unit generators. We do not discuss this step here, but refer the reader to Lenstra et al. [167].
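For a monic f, the integer N(a + bα) = ±b^d f(−a/b) can be computed without fractions by expanding the expression. The following helper (ours, not from [167]) does this, and is checked against the example f(X) = X^5 − 3 from the text, for which N(a + bα) = a^5 + 3b^5:

```python
def norm_a_plus_b_alpha(coeffs, a, b):
    """N(a + b*alpha) for alpha a root of the monic polynomial
    f(X) = sum_i coeffs[i]*X^i (coeffs[-1] = 1), using the identity
    N(a + b*alpha) = (-b)^d f(-a/b) = sum_i c_i (-1)^(d+i) a^i b^(d-i)."""
    d = len(coeffs) - 1
    return sum(c * (-1) ** (d + i) * a ** i * b ** (d - i)
               for i, c in enumerate(coeffs))

# f(X) = X^5 - 3 (the SNFSM example in the text): N(a + b*alpha) = a^5 + 3*b^5.
assert norm_a_plus_b_alpha([-3, 0, 0, 0, 0, 1], 2, 1) == 2**5 + 3
```

Trial division of this rational integer by the primes ≤ B then decides the smoothness of a + bα, as described above.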

Sieving

In the QSM, we check the smoothness of a single integer T(c) per trial, whereas for the NFS method we do so for two integers, namely, a + bm and N(a + bα). However, both these integers are much smaller than T(c), and the probability that they are simultaneously smooth is larger than the probability that T(c) alone is smooth. This accounts for the better asymptotic performance of the NFS method compared to the QSM.

One has to check the smoothness of a + bm and N(a + bα) for each coprime pair (a, b) in a predetermined interval. This check can be carried out efficiently using sieves. We have to use two sieves, one for filtering out the non-smooth a + bm values and the other for filtering out the non-smooth a + bα values. We should have gcd(a, b) = 1, but computing gcd(a, b) for all values of a and b is rather costly. We may instead use a third sieve to throw away, for a given b, those values of a for which gcd(a, b) is divisible by primes ≤ B. This still leaves us with some pairs (a, b) for which gcd(a, b) > 1. But this is not a serious problem, since such values are small in number and can later be discarded from the list of pairs (a, b) selected by the smoothness test.

We fix b and allow a to vary in the interval −M ≤ a ≤ M for a predetermined bound M. We use an array A indexed by a. Before the first sieve, we initialize the location Aa to ln |a + bm|. We may set Aa := +∞ for those values of a for which gcd(a, b) is known to be > 1 (where +∞ stands for a suitably large positive value). For each small prime p ≤ B and small exponent h, we compute a′ := −mb (mod p^h) and subtract ln p from Aa for each a, −M ≤ a ≤ M, with a ≡ a′ (mod p^h). Finally, for each value of a for which Aa is not (close to) 0, that is, for which a + mb is not B-smooth, we set Aa := +∞. For the other values of a, we set Aa := ln |N(a + bα)|. One may use incomplete sieving (with a liberal selection criterion) during the first sieve.
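A sketch of this first (rational-side) sieve in Python, with a floating-point infinity standing in for +∞; the function name and parameters are our own illustration:

```python
import math

INF = float('inf')

def rational_sieve(b, m, M, primes, eps=1e-9):
    """For fixed b, return the values a in [-M, M] with a + b*m smooth over
    `primes`, by subtracting ln p along the progressions a = -b*m (mod p^h)."""
    # A[a + M] plays the role of the array location A_a of the text.
    A = [INF if a + b * m == 0 else math.log(abs(a + b * m))
         for a in range(-M, M + 1)]
    bound = abs(b) * m + M                # |a + b*m| never exceeds this
    for p in primes:
        ph = p
        while ph <= bound:                # sieve by the prime powers p^h
            start = (-b * m + M) % ph     # first index with a = -b*m (mod p^h)
            for i in range(start, 2 * M + 1, ph):
                A[i] -= math.log(p)
            ph *= p
    # survivors: locations sieved (close) to zero, i.e. a + b*m is smooth
    return [i - M for i in range(2 * M + 1) if abs(A[i]) < eps]

# Example: b = 1, m = 31; among a + 31 for -10 <= a <= 10, the value
# 21 = 3*7 (a = -10) is {2,3,5,7}-smooth, while 31 (a = 0) is not.
smooth = rational_sieve(1, 31, 10, [2, 3, 5, 7])
assert -10 in smooth and 0 not in smooth
```

The second sieve described next reuses the same array, which is why the surviving locations are reset to ln |N(a + bα)|.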

The second sieve proceeds as follows. We continue to work with the value of b fixed before the first sieve and with the array A available from the first sieve. For each prime ideal 𝔭p,cp, we compute a″ := −bcp (mod p) and subtract ln p from each location Aa for which a ≡ a″ (mod p). For those a for which Aa ≤ ξ ln B for some real ξ ≥ 1, say ξ = 2, we try to factorize a + bα over the ideal and unit generators. If the attempt is successful, both a + bm and a + bα are smooth. This second sieve is an incomplete one and, therefore, we must use a liberal selection criterion.

The running time of the SNFSM

For deriving the running time of the SNFSM, take d ≈ (3 ln n/(2 ln ln n))^(1/3), m = L(n, 2/3, (2/3)^(1/3)), B = L(n, 1/3, (2/3)^(2/3)) and M = L(n, 1/3, (2/3)^(2/3)). From the prime number theorem and from the fact that d is small, it follows that both the number of rational primes ≤ B and the number of prime ideals of prime norm ≤ B have the same asymptotic bound as B. The number of ideal and unit generators also meets this bound. We then have L(n, 1/3, (2/3)^(2/3)) unknown quantities on which we have to do Gaussian elimination.

The integers a + mb have absolute values ≤ L(n, 2/3, (2/3)^(1/3)). If the coefficients of f are small, then

|N(a + bα)| = |b^d f(−a/b)| ≤ L(n, 1/3, d·(2/3)^(2/3)) = L(n, 2/3, (2/3)^(1/3)).

Under the heuristic assumption that a + mb and N(a + bα) behave as random integers of magnitude L(n, 2/3, (2/3)^(1/3)), the probability that both these are B-smooth turns out to be L(n, 1/3, −(2/3)^(2/3)), and so trying L(n, 1/3, 2(2/3)^(2/3)) pairs (a, b) is expected to give us L(n, 1/3, (2/3)^(2/3)) relations. The entire sieving process takes time L(n, 1/3, 2(2/3)^(2/3)), whereas solving a sparse system in L(n, 1/3, (2/3)^(2/3)) unknowns can be done essentially in the same time. Thus the running time of the SNFSM is L(n, 1/3, 2(2/3)^(2/3)) = L(n, 1/3, (32/9)^(1/3)).

Exercise Set 4.3

4.6 For m ∈ N, define the harmonic numbers Hm := 1 + 1/2 + ··· + 1/m. Show that for each m ∈ N we have ln(m + 1) ≤ Hm ≤ 1 + ln m. [H] Deduce that the sequence Hm, m ∈ N, is not convergent. (Note, however, that the sequence Hm − ln m, m ∈ N, converges to the constant γ = 0.57721566 . . . known as the Euler constant. It is not known whether γ is rational or not.)
4.7 Let k, c, c′, α be positive constants with α < 1. Prove the following assertions.
  1. .

  2. L(n, α, c)L(n, α, c′) is of the form L(n, α, c + c′).

  3. (ln n)kL(n, α, c) is again of the form L(n, α, c).

  4. L(n, α, c)·n^k is of the form n^(k+o(1)).

4.8 Let us assume that an adversary C has the computing power to carry out 10^12 floating point operations (flops) per second. Let A be an algorithm that computes a certain function P(n) using T(n) flops for an input n ∈ N. We say that it is infeasible for C to compute P(n) using algorithm A, if it takes ≥ 100 years for the computation or, equivalently, if T(n) ≥ 3.1536 × 10^21. Find, for the following expressions of T(n), the smallest values of n that make the computation of P(n) by Algorithm A infeasible: T(n) = (ln n)^3, T(n) = (ln n)^10, T(n) = n, T(n) = n^(1/2), T(n) = n^(1/4), T(n) = L[2], T(n) = L[1], T(n) = L[0.5], T(n) = L(n, 1/3, 2) and T(n) = L(n, 1/3, 1). (Neglect the o(1) terms in the definitions of L( ) and L[ ].)
4.9 Let n ∈ N be an odd integer and let r be the total number of distinct (odd) prime divisors of n. Show that for each integer a the congruence x^2 ≡ a^2 (mod n) has ≤ 2^r solutions for x modulo n. If gcd(a, n) = 1, show that this congruence has exactly 2^r solutions. [H]
4.10 Show that the problems IFP and SQRTP are probabilistic polynomial-time equivalent. [H]
4.11 In this exercise, we use the notations introduced in connection with the quadratic sieve method for factoring integers (Section 4.3.2). We assume that M ≪ H, since H ≈ n^(1/2), whereas M = L[1].
  1. Show that J ≤ 2H – 1.

  2. Prove that the average of the integers |T(c)|, −M ≤ c ≤ M, is ≈ MH and that the maximum of the same integers is |T(M)| = J + 2MH + M^2 ≈ J + 2MH.

  3. Prove that the average and the maximum of the integers |T(c)|, 0 ≤ c ≤ 2M, are respectively J + 2MH + M(4M + 1)/3 ≈ J + 2MH and |T(2M)| = J + 4MH + 4M^2 ≈ J + 4MH.

  4. Conclude that it is better to choose the sieving interval as −M ≤ c ≤ M instead of as 0 ≤ c ≤ 2M.

4.12

Reyneri’s cubic sieve method (CSM) Suppose that we want to factor an odd integer n. Suppose also that we know a triple (x, y, z) of integers satisfying x^3 ≡ y^2z (mod n) with x^3 ≠ y^2z (as integers). We assume further that |x|, |y|, |z| are all O(n^ξ) for some ξ, 1/3 < ξ < 1/2.

  1. Show that for integers a, b, c with a + b + c = 0 one has

    (x + ay)(x + by)(x + cy) ≡ y^2 T(a, b, c) (mod n),

    where T(a, b, c) := z + (ab + ac + bc)x + (abc)y = −b(b + c)(x + cy) + (z − c^2x). If x, y, z = O(n^ξ), then T(a, b, c) is O(n^ξ) for small values of a, b, c.

  2. Let α := (ξ/2)^(1/2). Choose a factor base comprising all primes q1, . . . , qt with t = L[α] together with the integers x + ay, −M ≤ a ≤ M, M = L[α]. The size of the factor base is then L[α].

    If T(a, b, c) with −M ≤ a, b, c ≤ M and a + b + c = 0 is qt-smooth, we get a relation for the CSM. Show that trying out the L[2α] triples (a, b, c) gives us a set of linear congruences of the desired size under the heuristic assumption that the T(a, b, c) values behave as random integers on the order of n^ξ.

  3. Propose a strategy for combining these linear congruences (by Gaussian elimination) to get a quadratic congruence of the form u^2 ≡ v^2 (mod n).

  4. Design a sieve for checking the smoothness of the expressions T(a, b, c). [H]

  5. Show that the running time of the CSM is L[(2ξ)^(1/2)]. Since ξ < 1/2, the CSM is more efficient than the QSM. For ξ ≈ 1/3, the running time is L[(2/3)^(1/2)] ≈ L[0.816].

    (Remark: It is not known how we can efficiently obtain a solution of x^3 ≡ y^2z (mod n) with x^3 ≠ y^2z and |x|, |y|, |z| = O(n^ξ), ξ being as small as possible. For some particular values of n, say, for n of the form x^3 − z with small |z|, a solution is naturally available.)

4.13 Sieve of Eratosthenes Two hundred years before Christ, Eratosthenes proposed a sieve (Algorithm 4.5) for computing all primes between 1 and a positive integer n. Prove the correctness of this algorithm and compute its running time. [H]
Algorithm 4.5. The sieve of Eratosthenes

Initialize to zero an array A indexed 2, . . . , n.
for k = 2, . . . , ⌊√n⌋ {
   if (Ak = 0) { for l = 2, . . . , ⌊n/k⌋ { Alk := 1. } }
}
for k = 2, . . . , n { if (Ak = 0) { Print “k is a prime”. } }
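Algorithm 4.5 translates directly into Python (our transcription):

```python
from math import isqrt

def eratosthenes(n):
    """The sieve of Eratosthenes as in Algorithm 4.5: primes in [2, n]."""
    A = bytearray(n + 1)                 # A[k] = 0 means k is not crossed out
    for k in range(2, isqrt(n) + 1):
        if A[k] == 0:                    # k is prime; cross out its multiples
            for l in range(2, n // k + 1):
                A[l * k] = 1
    return [k for k in range(2, n + 1) if A[k] == 0]

assert eratosthenes(30) == [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```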

4.14 This exercise proposes an adaptation of the sieve of Eratosthenes for computing a random prime of a given bit length l. In Section 3.4.2, we have described an algorithm for this computation that generates random (odd) integers of bit length l and checks the primality of each such integer, until a (probable) prime is found. An alternative strategy is to generate a random l-bit odd integer n and check the integers n, n + 2, n + 4, . . . for primality.
  1. Use sieving to design an algorithm that generalizes this second strategy in the sense that it checks for primality only those integers n + r, r = 0, 1, 2, . . . , M (n a random l-bit integer), which are not divisible by any of the first t primes. In practice, the values 100 ≤ t ≤ 10,000 and M = 10l work quite well. For cryptographic sizes, sieving typically speeds up the naive generation of primes 10 to 100 times.

  2. Generalize the sieve of Part (a) for the computation of safe and strong primes.

4.4. The Finite Field Discrete Logarithm Problem

The discrete logarithm problem (DLP) has attracted somewhat less attention from the research community than the IFP. Nonetheless, many algorithms exist to solve the DLP, most of which are direct adaptations of algorithms for solving the IFP. We start with the older algorithms collectively known as the square-root methods, since the worst-case running time of each of these is O~(√q) for the field Fq. The newer family of algorithms based on the index calculus method provides subexponential solutions to the DLP and is described next. For the sake of simplicity, we assume in this section that we want to compute the discrete logarithm indg a of an element a with respect to a primitive element g of Fq*. We concentrate only on the fields Fp, p an odd prime, and F2^n, n ∈ N, since non-prime fields of odd characteristic are only rarely used in cryptography.

4.4.1. Square Root Methods

Square-root methods are applicable to any finite (cyclic) group. To avoid repetition we provide here a generic description. That is, we assume that G is a multiplicatively written cyclic group of order n, generated by g, and that a ∈ G. The identity of G is denoted by 1. It is not strictly necessary to assume that G is cyclic or that g is a generator of G. However, these assumptions make the descriptions of the algorithms somewhat easier, and hence we stick to them. The necessary modifications for non-cyclic groups G or non-primitive elements g are rather easy, and the reader is requested to fill in the details. We assume that each element of G can be represented by O(lg n) bits (so that the input size is taken to be lg n) and that multiplications, exponentiations and inverses in G can be computed in time polynomially bounded by this input size.

Shanks’ baby-step–giant-step method

Let us assume that the elements of G can be (totally) ordered in such a way that comparing two elements of G with respect to this order can be done in time polynomial in the input size. For example, a natural order on Zp = {0, 1, . . . , p − 1} is the relation ≤ on Z. Note that k elements of G can be sorted (under the above order) using O(k log k) comparisons.

Let m := ⌈√n⌉. Then d := indg a is uniquely determined by two (non-negative) integers d0, d1 < m such that d = d0 + d1m (the base-m representation of d). In Shanks’ baby-step–giant-step (BSGS) method, we compute d0 and d1 as follows. To start with, we compute the pairs (d0, g^d0) for d0 = 0, 1, . . . , m − 1 and store these pairs in a table sorted with respect to the second coordinate (the baby steps). Now, for each d1 = 0, 1, . . . , m − 1, we compute ag^(−md1) (the giant steps) and search whether this element is the second coordinate of a pair (d0, g^d0) of some entry in the table mentioned above. If so, we have found the desired d0 and d1; otherwise we try the next value of d1. An optimized implementation of this strategy is given as Algorithm 4.6.

The computation of all the elements of T and the sorting of T can be done in time O~(m). If we use a binary search algorithm (Exercise 4.15), then the search for h in T can be performed using O(lg m) comparisons in G. Therefore, the giant steps also take a total running time of O~(m). Since m ≈ √n, the BSGS method runs in time O~(√n). The memory requirement of the BSGS (that is, of the table T) is O(√n) elements of G. Thus this method becomes impractical even when n contains as few as 30 decimal digits.

Pollard’s rho method

Pollard’s rho method for solving the DLP is similar in idea to the method of the same name for solving the IFP. Let f be a random map updating pairs of exponents, and let us generate a sequence of tuples (ri, si), i = 1, 2, . . . , starting with a random (r1, s1) and subsequently computing (ri+1, si+1) = f(ri, si) for each i = 1, 2, . . . . The elements bi := a^ri g^si for i = 1, 2, . . . can then be thought of as randomly chosen from G. By the birthday paradox (Exercise 2.172), we expect to get a match bi = bj for some i ≠ j, after O(√n) of the elements b1, b2, . . . are generated. But then we have a^(ri−rj) = g^(sj−si), that is, indg a ≡ (ri − rj)^(−1)(sj − si) (mod n), provided that the inverse exists, that is, gcd(ri − rj, n) = 1. The expected running time of this algorithm is O~(√n), the same as that of the BSGS method, but the storage requirement drops to only O(1) elements of G.
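The following Python sketch (our illustration; the names and parameters are ours) uses the common three-way partition of the group as the "random" map f and Floyd's cycle finding, so that only O(1) group elements are stored. When gcd(ri − rj, n) > 1, it enumerates the few candidate values of d instead of giving up:

```python
from math import gcd
import random

def rho_dlog(g, a, p, n, tries=50):
    """Pollard's rho for d = ind_g a in Z_p^*, where n is the order of g."""
    def f(state):
        b, r, s = state                # invariant: b = a^r * g^s (mod p)
        if b % 3 == 0:
            return (b * b % p, 2 * r % n, 2 * s % n)
        if b % 3 == 1:
            return (a * b % p, (r + 1) % n, s)
        return (g * b % p, r, (s + 1) % n)

    for _ in range(tries):
        r0, s0 = random.randrange(n), random.randrange(n)
        x = (pow(a, r0, p) * pow(g, s0, p) % p, r0, s0)
        y = f(x)
        while x[0] != y[0]:            # Floyd: tortoise one step, hare two
            x, y = f(x), f(f(y))
        (_, ri, si), (_, rj, sj) = x, y
        u, v = (ri - rj) % n, (sj - si) % n
        w = gcd(u, n)
        if u == 0 or v % w != 0:
            continue                   # useless collision; restart
        d0 = (v // w) * pow(u // w, -1, n // w) % (n // w) if n // w > 1 else 0
        for k in range(w):             # d is one of w candidates modulo n
            d = d0 + k * (n // w)
            if pow(g, d, p) == a:
                return d
    return None
```

For example, with p = 1019 and g = 2, rho_dlog(2, pow(2, 77, 1019), 1019, 1018) returns an exponent d with 2^d ≡ 2^77 (mod 1019).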

Algorithm 4.6. Shanks’ baby-step–giant-step method

Input: G, g and a as described above.

Output: d = indg a.

Steps:

n := ord G, m := ⌈√n⌉.

/* Baby steps */

Initialize T to an empty table.

Insert the pairs (0, 1) and (1, g) in T.

h := g.
for d0 = 2, . . . , m – 1 {
    h := hg.
    Insert (d0, h) in T.
}
Sort T with respect to the second coordinate.

/* Giant steps */
h := a, l := (g^(−1))^m.
for d1 = 0, . . . , m – 1 {
    if T contains an entry (d0, h) { Return d := d0 + d1m. }
    h := hl.
}
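Algorithm 4.6 can be transcribed into Python for the group G = Zp* (a dictionary stands in for the sorted table T, so the binary search is implicit; the names are ours):

```python
from math import isqrt

def bsgs(g, a, n, p):
    """Shanks' BSGS: returns d with g^d = a in Z_p^*, where n = ord G."""
    m = isqrt(n - 1) + 1                 # m = ceil(sqrt(n))
    T = {}                               # baby steps: g^d0 -> d0
    h = 1
    for d0 in range(m):
        T.setdefault(h, d0)
        h = h * g % p
    h, l = a % p, pow(g, -m, p)          # l = (g^(-1))^m
    for d1 in range(m):                  # giant steps
        if h in T:
            return d1 * m + T[h]
        h = h * l % p
    return None                          # a is not in the subgroup <g>

# The returned d always satisfies g^d = a:
p, g = 1019, 2
a = pow(g, 700, p)
d = bsgs(g, a, p - 1, p)
assert pow(g, d, p) == a
```

The dictionary holds O(√n) group elements, matching the memory bound discussed in the text.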

The Pohlig–Hellman method

The Pohlig–Hellman (PH) method assumes that the prime factorization n = p1^α1 ··· pr^αr of n = ord G is known. Since d := indg a is unique modulo n, we can easily compute d using the CRT from a knowledge of d modulo pj^αj, j = 1, . . . , r. So assume that p is a prime dividing n and that p^α is the largest power of p dividing n. Let d0 + d1p + ··· + dα−1p^(α−1), 0 ≤ di < p, be the p-ary representation of d modulo p^α. The p-ary digits d0, d1, . . . , dα−1 can be successively computed as follows.

Let H be the subgroup of G generated by h := g^(n/p). We have ord H = p (Exercise 2.44). For the computation of di, 0 ≤ i ≤ α − 1, from the knowledge of d0, . . . , di−1, consider the element

b := (ag^(−(d0 + d1p + ··· + di−1p^(i−1))))^(n/p^(i+1)).

But ord(g^(n/p^(i+1))) = p^(i+1) and d ≡ d0 + d1p + ··· + di−1p^(i−1) + dip^i (mod p^(i+1)), so that

b = g^((n/p^(i+1))·dip^i) = (g^(n/p))^di = h^di.

Thus, b ∈ H and di = indh b, that is, each di can be obtained by computing a discrete logarithm in the group H of order p (using the BSGS method or the rho method).

From the prime factorization of n, we see that the computations of d modulo pj^αj for all j = 1, . . . , r can be done in time O~(√q), q being the largest prime factor of n, since the αj and r are O(log n). Combining the values of d modulo pj^αj by the CRT can be done in polynomial time (in log n). In the worst case, q = O(n) and the PH method takes time O~(√n), which is fully exponential in the input size log n. But if q (or, equivalently, all the prime divisors p1, . . . , pr of n) is small, then the PH method runs quite efficiently. In particular, if q = O((log n)^c) for some (small) constant c, then the PH method computes discrete logarithms in G in polynomial time. This fact has an important bearing on the selection of a group G for cryptographic applications, namely, n = ord G is required to have a suitably large prime divisor, so that the PH method cannot compute discrete logarithms in G in feasible time.
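A Python sketch of the whole method for G = Zp* follows (our helpers; for clarity the small discrete logarithms in H are done by brute force, where the text would use the BSGS or rho method):

```python
from math import prod

def dlog_prime_order(h, b, q, p):
    """Brute-force ind_h b in the order-q subgroup of Z_p^* generated by h."""
    x = 1
    for d in range(q):
        if x == b:
            return d
        x = x * h % p
    raise ValueError("b is not in the subgroup generated by h")

def pohlig_hellman(g, a, p, factorization):
    """d = ind_g a in Z_p^*, given n = p - 1 factored as {q: alpha}:
    the digits of d are found in subgroups of order q, then combined by CRT."""
    n = p - 1
    residues, moduli = [], []
    for q, alpha in factorization.items():
        h = pow(g, n // q, p)              # generator of H, ord h = q
        d_mod = 0                          # d modulo q^i, built digit by digit
        for i in range(alpha):
            # b = (a * g^{-d_mod})^{n/q^{i+1}} = h^{d_i}
            b = pow(a * pow(g, -d_mod, p) % p, n // q ** (i + 1), p)
            d_mod += dlog_prime_order(h, b, q, p) * q ** i
        residues.append(d_mod)
        moduli.append(q ** alpha)
    # Chinese remainder theorem
    N = prod(moduli)
    return sum(r * (N // m) * pow(N // m, -1, m)
               for r, m in zip(residues, moduli)) % N

# 6 is primitive modulo 41 and 40 = 2^3 * 5:
assert pohlig_hellman(6, pow(6, 23, 41), 41, {2: 3, 5: 1}) == 23
```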

4.4.2. The Index Calculus Method

The index calculus method (ICM) is not applicable to all (cyclic) groups. But whenever it applies, it usually leads to the fastest algorithms to solve the DLP. Several variants of the ICM are used for prime finite fields and also for finite fields of characteristic 2. On such a field Fq they achieve subexponential running times of the order of L(q, 1/2, c) = L[c] or L(q, 1/3, c) for positive constants c. We start with a generic description of the ICM. We assume that g is a primitive element of Fq* and that we want to compute d := indg a for some a ∈ Fq*.

To start with, we fix a suitable subset B = {b1, . . . , bt} of Fq* of small cardinality, so that a reasonably large fraction of the elements of Fq* can be expressed easily as products of elements of B. We call B a factor base. In the ICM, we search for relations of the form

Equation 4.5

g^α a^β b1^δ1 ··· bt^δt = b1^γ1 ··· bt^γt

for integers α, β, γi and δi. This gives us a linear congruence

Equation 4.6

α + β·indg a ≡ (γ1 − δ1)·indg b1 + ··· + (γt − δt)·indg bt (mod q − 1).
The ICM proceeds in two[4] stages. In the first stage, we compute di := indg bi for each element bi in the factor base B. For that, we collect relations of the form (4.5) with β = 0. When sufficiently many relations are available, the corresponding system of linear congruences (4.6) is solved modulo q − 1 for the unknowns di. In the second stage, a single relation with gcd(β, q − 1) = 1 is found. Substituting the values of di available from the first stage then yields indg a.

[4] Some authors prefer to say that the number of stages in the ICM is actually three, because they decouple the congruence-solving phase from the first stage. This is indeed justified, since implementations by several researchers reveal that for large fields this linear algebra part often demands running time comparable to that needed by the relation collection part. Our philosophy is to call the entire precomputation work the first stage. Now, although it hardly matters, it is up to the reader which camp she wants to join.

Note that as long as q (and g) are fixed, we do not have to carry out the first stage every time the discrete logarithm of an element of Fq* is to be computed. If the values di, i = 1, . . . , t, are stored, then only the second stage needs to be carried out for computing the indices of any number of elements of Fq*. This is why the first stage of the ICM is often called the precomputation stage.

In order to make the algorithm more concrete, we have to specify:

  1. how to choose a factor base B;

  2. how to find Relation (4.5);

  3. how to solve a linear system of congruences modulo q – 1 (in particular, when the system is sparse).

In the rest of this section, we describe variants of the ICM based on their strategies for selecting the factor base and for collecting relations. We discuss the third issue in Section 4.7.

4.4.3. Algorithms for Prime Fields

Let Fp be a finite field of prime cardinality p. For cryptographic applications, p should be quite large, say, of length around a thousand bits or more, and so naturally p is odd. Elements of Fp are canonically represented as integers between 0 and p − 1 (inclusive). The equality x = y in Fp means equality of two integers in the range 0, . . . , p − 1, whereas x ≡ y (mod p) means that the two integers x and y may be different, but they represent the same element of Fp.

The basic ICM

In the basic version of the ICM, we choose the factor base B to comprise the first t primes q1, . . . , qt, where t = L[ζ]. (The optimal value of ζ is determined below.) In the first stage, we choose random values of α modulo p − 1 and compute g^α. Any integer representing g^α can be considered, but we think of g^α as an integer in {1, . . . , p − 1}. We then try to factorize g^α using trial divisions by elements of the factor base B. If g^α is found to be B-smooth, then we get a desired relation for the first stage, namely,

g^α = q1^γ1 ··· qt^γt, that is, α ≡ γ1·indg q1 + ··· + γt·indg qt (mod p − 1).

If g^α is not B-smooth, we try another random α and proceed as above. After sufficiently many relations are available, we solve the resulting system of linear congruences modulo p − 1. This gives us di := indg qi for i = 1, . . . , t.

In the second stage, we again choose random integers α and try to factorize a g^α completely over B. Once the factorization is successful, that is, we have a g^α = q1^f1 q2^f2 · · · qt^ft, we compute ind_g a ≡ −α + f1 d_1 + f2 d_2 + · · · + ft d_t (mod p − 1).
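As an illustration, the two stages can be sketched in a few lines of Python for a toy prime. All parameters here (p = 107, g = 2, the four-prime factor base) are assumptions chosen so that the brute-force index table used for checking stays tiny; a real implementation would obtain the values d_i by solving the linear system modulo p − 1 instead of looking them up.

```python
import random

def trial_factor(n, base):
    """Try to factor n completely over the factor base;
    return the exponent vector, or None if n is not smooth."""
    exps = []
    for q in base:
        e = 0
        while n % q == 0:
            n //= q
            e += 1
        exps.append(e)
    return exps if n == 1 else None

p, g = 107, 2                 # toy prime and a primitive root modulo it
base = [2, 3, 5, 7]           # factor base: first t primes

# brute-force index table -- only for verifying the toy example;
# the real algorithm gets these d_i from the linear system instead
ind = {pow(g, k, p): k for k in range(p - 1)}

random.seed(1)

# first stage: relations  alpha = e1*d1 + ... + et*dt  (mod p-1)
relations = []
while len(relations) < len(base) + 2:
    alpha = random.randrange(1, p - 1)
    exps = trial_factor(pow(g, alpha, p), base)
    if exps is not None:
        relations.append((alpha, exps))
for alpha, exps in relations:
    assert alpha % (p - 1) == sum(e * ind[q] for e, q in zip(exps, base)) % (p - 1)

# second stage: a single smooth a*g^alpha yields ind_g(a)
a = 29
while True:
    alpha = random.randrange(1, p - 1)
    exps = trial_factor(a * pow(g, alpha, p) % p, base)
    if exps is not None:
        break
ind_a = (sum(e * ind[q] for e, q in zip(exps, base)) - alpha) % (p - 1)
assert pow(g, ind_a, p) == a
print("ind_g(29) =", ind_a)
```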

In order to optimize the running time, we note that the relation collection phase of the first stage is usually the bottleneck of the algorithm. If ζ (or equivalently t) is chosen to be too small, then finding B-smooth integers would be very difficult. On the other hand, if ζ is too large, then we have to collect too many relations to have a solvable linear system of congruences. More explicitly, since the integers g^α can be regarded as random integers of the order of p, the probability that g^α is B-smooth is L[−1/(2ζ)] (Corollary 4.1). Thus we expect to get each relation after L[1/(2ζ)] random values of α are tried. Since for each α we need to carry out L[ζ] divisions by elements of the factor base B (the exponentiation g^α can be done in polynomial time and hence can be neglected for this analysis), each relation can be found in expected time L[ζ + 1/(2ζ)]. Now, in order to solve for d_i, i = 1, . . . , t, we must have (slightly more than) t = L[ζ] relations. Thus, the relation collection phase takes a total time of L[2ζ + 1/(2ζ)]. It can be easily checked that 2ζ + 1/(2ζ) is minimized for ζ = 1/2. This gives a running time of L[2] for the relation collection phase.
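A quick way to see the choice ζ = 1/2 is to collect the three factors estimated above into a single exponent; the following display is a sketch of the same computation in the L[·] notation, with all o(1) terms dropped.

```latex
% relations needed  x  trials per relation  x  trial divisions per trial
\[
  L[\zeta]\cdot L\!\left[\tfrac{1}{2\zeta}\right]\cdot L[\zeta]
  \;=\; L\!\left[2\zeta+\tfrac{1}{2\zeta}\right].
\]
% Minimizing the exponent:
%   d/d(zeta) ( 2*zeta + 1/(2*zeta) ) = 2 - 1/(2*zeta^2) = 0
% gives zeta = 1/2, whence the total is L[2*(1/2) + 1/(2*(1/2))] = L[2].
```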

Since each g^α is a positive integer less than p, it is evident that it can have at most O(log p) prime divisors. In other words, the congruences collected are necessarily sparse. As we will see later, such a system can be solved in time O~(t^2), that is, in time L[1] for ζ = 1/2.

In the second stage, it is sufficient to have a single relation to compute d = ind_g a. As explained before, such a relation can be found in expected time L[3/2]. Thus the total running time of the basic ICM is L[2].

The second stage of the basic ICM is much faster than the first stage. In fact, this is a typical phenomenon associated with most variants of the ICM. Speeding up the first stage is, therefore, our primary concern.

Each step in the search for relations consists of an exponentiation (g^α) modulo p followed by trial divisions by q1, . . . , qt. Now, g^α may be non-smooth, but g^α + kp (integer sum) may be smooth for some small integer k. Once g^α is computed and found to be non-smooth, one can check for the smoothness of g^α + kp for k = ±1, ±2, . . ., before another α is tried. Since these integers are available by addition (or subtraction) only (which is much faster than exponentiation), this strategy tends to speed up the relation collection phase. Moreover, information about the divisibility of g^α + kp by qi can be obtained from that of g^α + (k − 1)p by qi. So using suitable tricks one might reduce the cost of trial divisions. Two such possibilities are explored in Exercise 4.18. Though these modifications lead to some speed-up in practice, they have the disadvantage that as |k| increases, the size of |g^α + kp| also increases, so that the chance of getting smooth candidates reduces, and therefore using high values of k does not effectively help.

There are other heuristic modification schemes that help us gain some speed-up in practice. For example, the large prime variation as discussed in connection with the QSM applies equally well here. Another trick is to use the early abort strategy. A random B-smooth integer has a higher probability of having many small prime factors than a few large prime factors. This observation can be incorporated in the smoothness tests as follows. Let us assume that we do trial divisions by the small primes in the order q1, q2, . . . , qt. After we do trial divisions of a candidate x by the first t′ < t primes (say, t′ ≈ t/2), we check how far we have been able to reduce x. If the reduction of x is already substantial, we continue with the trial divisions by the remaining primes q(t′+1), . . . , qt. In the other case, we abort the smoothness test for x and try another candidate. Obviously, this strategy prematurely rejects some smooth candidates (which are anyway rather small in number), but since most candidates are expected to be non-smooth, it saves a lot of trial divisions in the long run. The determination of t′ and the quantification of a “substantial” reduction actually depend on practical experience. With suitable choices one may expect a speed-up of about 2. The drawback of the early abort strategy is that it often does not go well with sieving. Sieving, whenever applicable, should be given higher preference.
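The early abort test can be sketched as follows. The cutoff parameters t_prime and threshold_bits are the tunable knobs mentioned above (to be set experimentally); the six-prime factor base in the usage lines is an assumption for the demonstration.

```python
def early_abort_smooth(x, base, t_prime, threshold_bits):
    """Trial-divide x by the factor base; abort after the first t_prime primes
    unless the unfactored part of x has already dropped below 2**threshold_bits.
    Returns the exponent vector if x is smooth over base, else None."""
    exps = [0] * len(base)
    for i, q in enumerate(base):
        if i == t_prime and x.bit_length() > threshold_bits:
            return None              # early abort: not enough reduction so far
        while x % q == 0:
            x //= q
            exps[i] += 1
    return exps if x == 1 else None

base = [2, 3, 5, 7, 11, 13]
print(early_abort_smooth(1560, base, 3, 6))   # 1560 = 2^3 * 3 * 5 * 13 is smooth
print(early_abort_smooth(202, base, 3, 6))    # 202 = 2 * 101 gets aborted early
```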

To sum up, the basic ICM and all its modifications can be used for computing discrete logarithms only in small fields, say, of size ≤ 80 bits. For bigger fields, we need newer ideas.

The linear sieve method

The linear sieve method (LSM) is a direct adaptation of the quadratic sieve method for factoring integers (Section 4.3.2). In the basic ICM just discussed, we try to find smooth integers from candidates that are on an average as large as O(p). The LSM, on the other hand, finds smooth ones from a pool of integers each of which is of the order of √p (times a subexponential factor). As a result, we expect to have a higher density of smooth integers among the candidates tested in the LSM than those in the basic method. Furthermore, the LSM employs sieving techniques instead of trial divisions. All these help the LSM achieve a running time of L[1], a definite improvement over the L[2] performance of the basic method.

Let H := ⌈√p⌉ and J := H^2 − p. Then 0 < J ≤ 2H. Let us consider the congruence

Equation 4.7

(H + c1)(H + c2) ≡ J + (c1 + c2)H + c1c2 (mod p)
For small integers c1, c2, the right side of the above congruence, henceforth denoted as

T(c1, c2) := J + (c1 + c2)H + c1c2,

is of the order of √p (times a small multiplier determined by c1 and c2). If the integer T(c1, c2) is smooth with respect to the first t primes q1, q2, . . . , qt, that is, if we have a factorization like T(c1, c2) = q1^e1 q2^e2 · · · qt^et, then we have a relation

ind_g(H + c1) + ind_g(H + c2) ≡ e1 ind_g q1 + e2 ind_g q2 + · · · + et ind_g qt (mod p − 1).
For the linear sieve method, the factor base comprises primes less than L[1/2] (so that t = L[1/2] by the prime number theorem) and integers H + c for −M ≤ c ≤ M. The bound M on c is chosen to be of the order of L[1/2]. Each T(c1, c2), being of the order of √p L[1/2] in absolute value, has a probability of L[−1/2] for being qt-smooth. Thus once we check the factorization of T(c1, c2) for all (that is, for a total of L[1]) values of the pair (c1, c2) with −M ≤ c1 ≤ c2 ≤ M, we expect to get L[1/2] Relations (4.7) involving the unknown indices of the factor base elements. If we further assume that the primitive element g is a small prime which itself is in the factor base, then we get a free relation ind_g g = 1. The resulting system is then solved to compute the discrete logarithms of elements in the factor base. This is the basic principle for the first stage of the LSM.

If we compute all T(c1, c2) and use trial divisions by q1, . . . , qt to separate out the smooth ones, we achieve a running time of L[1.5], as can be easily seen. Sieving is employed to reduce the running time to L[1]. First one fixes a value of c1 and initializes to ln |T(c1, c2)| an array A indexed by c2 in the range c1 ≤ c2 ≤ M. One then computes for each prime power q^h, q being a small prime in the factor base and h a small positive exponent, a solution for c2 of the congruence (H + c1)c2 + (J + c1H) ≡ 0 (mod q^h).

If gcd(H + c1, q) = 1, that is, if H + c1 is not a multiple of q, then the solution is given by σ ≡ −(J + c1H)(H + c1)^(−1) (mod q^h). The inverse in the last congruence can be calculated by running the extended gcd algorithm (Algorithm 3.8) on H + c1 and q^h. Then for each value of c2 (in the range c1 ≤ c2 ≤ M) that is congruent to σ (mod q^h), ln q is subtracted from the array location A[c2].

If q | (H + c1), we find out h1 := v_q(H + c1) > 0 and h2 := v_q(J + c1H) ≥ 0. If h1 > h2, then for each value of c2, the expression T(c1, c2) is divisible by q^h2 and by no higher powers of q. So we subtract the quantity h2 ln q from A[c2] for all c2. Finally, if h1 ≤ h2, then we subtract h1 ln q from A[c2] for all c2 and, for h > h1, solve the congruence as σ ≡ −((J + c1H)/q^h1)((H + c1)/q^h1)^(−1) (mod q^(h−h1)).

Once the above procedure is carried out for each small prime q in the factor base and for each small exponent h, we check for which values of c2 the array value A[c2] is equal (that is, sufficiently close) to 0. These are precisely the values of c2 such that, for the given c1, the integer T(c1, c2) factors smoothly over the small primes in the factor base.
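The sieving loop for one fixed value of c1 can be sketched as follows. This is a simplified illustration: primes dividing H + c1 are simply skipped rather than handled via the v_q bookkeeping just described, the sign of T is ignored (a negative smooth T would carry an extra factor −1 in the factor base), and exact logarithms are used, so "sufficiently close to 0" becomes a tiny floating-point tolerance. The prime p = 10007 and the six-prime base in the usage line are assumptions for the demonstration.

```python
import math

def linear_sieve_row(p, c1, M, primes, tol=1e-6):
    """Fix c1 and sieve over c2 in [c1, M] for smooth values of
    T(c1, c2) = J + (c1 + c2)*H + c1*c2 = (H + c1)(H + c2) - p."""
    H = math.isqrt(p - 1) + 1        # H = ceil(sqrt(p)) for non-square p
    J = H * H - p
    # T is never 0: (H+c1)(H+c2) = p is impossible for prime p
    T = {c2: J + (c1 + c2) * H + c1 * c2 for c2 in range(c1, M + 1)}
    A = {c2: math.log(abs(t)) for c2, t in T.items()}
    Tmax = max(abs(t) for t in T.values())
    for q in primes:
        if (H + c1) % q == 0:
            continue                  # simplified: skip this (rare) case
        qh = q
        while qh <= Tmax:
            # c2 with q^h | T(c1,c2):  (H+c1)*c2 + (J+c1*H) = 0  (mod q^h)
            sigma = (-(J + c1 * H) * pow(H + c1, -1, qh)) % qh
            for c2 in range(c1 + (sigma - c1) % qh, M + 1, qh):
                A[c2] -= math.log(q)
            qh *= q
    return [c2 for c2, v in A.items() if abs(v) < tol]

smooth = linear_sieve_row(10007, 0, 30, [2, 3, 5, 7, 11, 13])
print(smooth)   # the c2 for which T(0, c2) = 194 + 101*c2 is smooth
```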

As in the QSM for integer factorization, it is sufficient to have some approximate representations of the logarithms (like ln q). Incomplete sieving and large prime variation can also be adopted as in the QSM.

Finally, we change c1 and repeat the sieving process described above. It is easy to see that the sieving operations for all c1 in the range −M ≤ c1 ≤ M take time L[1] as announced earlier. Gaussian elimination involving sparse congruences in L[1/2] variables also meets the same running time bound.

The second stage of the LSM can be performed in L[1/2] time. Using a method similar to the second stage of the basic ICM leads to a huge running time (L[3/2]), because we have only L[1/2] small primes in the factor base. We instead do the following. We start with a random j and try to obtain a factorization of the form a g^j ≡ (∏_q q^eq)(∏_u u^wu) (mod p), where q runs over the L[1/2] small primes in the factor base and u runs over medium-sized primes, that is, primes less than L[2]. One can use an integer factorization algorithm to this effect. Lenstra’s ECM is, in particular, recommended, since it can detect smooth integers fast. More specifically, about L[1/4] random values of j need to be tried, before we expect to get an integer with the desired factorization. Each attempt of factorization using the ECM takes time less than L[1/4].

Now, we have ind_g a ≡ −j + Σ_q eq ind_g q + Σ_u wu ind_g u (mod p − 1). The indices ind_g q are available from the first stage, whereas for each u (with wu ≠ 0) the index ind_g u is calculated as follows. First we sieve in an interval of size L[1/2] around p/(uH) and collect integers y in this interval which are smooth with respect to the L[1/2] primes in the factor base. A second sieve in an interval of size L[1/2] around H gives us a small integer c, such that (H + c)yu − p is smooth again with respect to the L[1/2] primes in the factor base. Since H + c is in the factor base, we get ind_g u. The reader can easily verify that computing individual logarithms ind_g a using this method takes time L[1/2] as claimed earlier.

There are some other L[1] methods (like the Gaussian integer method and the residue list sieve method) known for computing discrete logarithms in prime fields. We will not discuss these methods in this book. Interested readers may refer to Coppersmith et al. [59] to learn about these L[1] methods. A faster method (running time L[0.816]), namely the cubic sieve method, is covered in Exercise 4.21. Now, we turn our attention to the best method known to date.

** The number field sieve method

The number field sieve method (NFSM) for solving the DLP in a prime field is a direct adaptation of the NFSM used to factor integers (Section 4.3.4). As before, we let g be a generator of F_p* and are interested in computing the index ind_g a for some a ∈ F_p*.

We choose an irreducible polynomial f ∈ Z[X] with small integer coefficients and of small degree, and use the number field K = Q(α) for some root α of f. For the sake of simplicity, we consider the special case (SNFSM) that f is monic, that Z[α] is the ring of integers of K, and that Z[α] is a PID. We also choose an integer m such that f(m) ≡ 0 (mod p) and define the ring homomorphism

Φ : Z[α] → F_p taking α to m (mod p).
Finally, we predetermine a smoothness bound b, and consider the set P of (rational) primes ≤ b, the set I of prime ideals of Z[α] of prime norms ≤ b, a set Γ of generators of the (principal) ideals in I, and a set U of generators of the group of units of Z[α].

We try to find coprime integers c, d of small absolute values such that both c + dα and Φ(c + dα) = c + dm are smooth with respect to I and P, respectively; that is, we have factorizations of the forms c + dα = (∏_{u∈U} u^su)(∏_{γ∈Γ} γ^tγ) and c + dm = ∏_{q∈P} q^eq, or equivalently, (∏_{u∈U} Φ(u)^su)(∏_{γ∈Γ} Φ(γ)^tγ) ≡ ∏_{q∈P} q^eq (mod p). But then the indices of the two sides agree modulo p − 1, that is,

Equation 4.8

Σ_{u∈U} su ind_g Φ(u) + Σ_{γ∈Γ} tγ ind_g Φ(γ) ≡ Σ_{q∈P} eq ind_g q (mod p − 1).

This motivates us to define the factor base as

B := {Φ(u) | u ∈ U} ∪ {Φ(γ) | γ ∈ Γ} ∪ P.

We assume that g ∈ P, so that we have the free relation ind_g g ≡ 1 (mod p − 1).

Trying sufficiently many pairs (c, d) we generate many Relations (4.8). The resulting sparse linear system is solved for the unknown indices of the elements of B. This completes the first stage of the SNFSM.

In the second stage, we bring a to the scene in the following manner. First assume that a is small, such that either a is smooth over the (rational) primes in the factor base, that is,

a = q1^e1 q2^e2 · · · qk^ek with each qi a small prime,

or, for some γ ∈ Z[α] with Φ(γ) ≡ a (mod p), the ideal 〈γ〉 can be written as a product of prime ideals of small norms, that is,

〈γ〉 = p1^e1 p2^e2 · · · pk^ek with each p_i a prime ideal of small norm,

or, equivalently, γ is, up to a unit, a product of powers of the generators of these prime ideals.

In both the cases, taking logarithms and substituting the indices of the elements of the factor base (available from the first stage) yields the desired index ind_g a.

However, a is not small in general, and it is a non-trivial task to find a γ ∈ Z[α] with Φ(γ) ≡ a (mod p) such that 〈γ〉 is smooth (that is, a product of small-norm prime ideals). We instead write a as a product

Equation 4.9

a ≡ a1 a2 · · · ak (mod p),

where each ai is small enough so that ind_g ai can be computed using the method described above. This gives ind_g a ≡ ind_g a1 + ind_g a2 + · · · + ind_g ak (mod p − 1). In order to see how one can find a representation of a as a product of small integers as in Congruence (4.9), we refer the reader to Weber [300].

As in most variants of the ICM, the running time of the SNFSM is dominated by the first stage and under certain heuristic assumptions can be shown to be of the order of L(p, 1/3, (32/9)^(1/3)). Look at Section 4.3.4 to see how the different parameters can be set in order to achieve this running time. For the general NFS method (GNFSM), the running time is L(p, 1/3, (64/9)^(1/3)). The GNFSM has been implemented by Weber and Denny [301] for computing discrete logarithms modulo a particular prime having 129 decimal digits (see McCurley [189]).

4.4.4. Algorithms for Fields of Characteristic 2

We wish to compute the discrete logarithm ind_g a of an element a ∈ F_q*, q = 2^n, with respect to a primitive element g of F_q*. We work with the representation F_q = F_2[X]/〈f(X)〉 for some irreducible polynomial f ∈ F_2[X] with deg f = n. For certain algorithms, we require f to be of special forms. This does not create serious difficulties, since it is easy to compute isomorphisms between two polynomial basis representations of F_q (Exercise 3.38).

Recall that we have defined the smoothness of an integer x in terms of the magnitudes of the prime divisors of x. Now, we deal with polynomials (over F_2) and extend the definition of smoothness in the obvious way: that is, a polynomial is called smooth if it factors into irreducible polynomials of low degrees. The next theorem is an analog of Theorem 2.21 for polynomials. By an abuse of notation, we use ψ(·, ·) here also. The context should make it clear what we are talking about – smoothness of integers or of polynomials.

Theorem 4.1.

Let r, m ∈ N, r^(1/100) ≤ m ≤ r^(99/100), and let u := r/m. Then the number of polynomials f ∈ F_2[X], deg f = r, such that all irreducible factors of f have degrees ≤ m, equals 2^r u^(−u+o(u)) = 2^r e^(−(1+o(1))u ln u) as u → ∞. In particular, the probability that the degrees of all irreducible factors of a randomly chosen polynomial in F_2[X] of degree r are ≤ m is asymptotically equal to

ψ(r, m) := u^(−u+o(u)) = e^(−(1+o(1))u ln u).

The above expression for ψ(r, m), though valid asymptotically, gives good approximations for finite values of r and m. The condition r^(1/100) ≤ m ≤ r^(99/100) is met in most practical situations. The probability ψ(r, m) is a very sensitive function of u = r/m. For a fixed m, polynomials of smaller degrees have higher chances of being smooth (that is, of having all irreducible factors of degrees ≤ m).

Now, let us consider the field F_q with q = 2^n. The elements of F_q are represented as polynomials of degrees ≤ n − 1. For a given m, the probability that a randomly chosen element of F_q has all irreducible factors of degrees ≤ m is then approximately given by ψ(n − 1, m), as n, m → ∞ with n^(1/100) ≤ m ≤ n^(99/100). We can, therefore, approximate this probability by ψ(n, m).

For many algorithms that we will come across shortly, we have r ≈ n/α and m ≈ β√(n lg n) for some positive α and β, so that u = r/m is of the order of √(n/lg n) and, consequently, ψ(r, m) = e^(−(1+o(1))u ln u) is subexponential in n.

The basic ICM

The idea of the basic ICM for F_{2^n} is analogous to that for prime fields. Now, the factor base B comprises all irreducible polynomials of F_2[X] having degrees ≤ m. We choose m to be of the order of √(n lg n). (As in the case of the basic ICM for prime fields, this can be shown to be the optimal choice.) By Approximation (2.5) on p 84, we then have t = |B| ≈ 2^(m+1)/m.

In the first stage, we choose random α, 1 ≤ α ≤ q − 2, compute g^α and check if g^α is B-smooth. If so, we get a relation. For a random α, the polynomial g^α is a random polynomial of degree < n and hence has a probability of nearly ψ(n, m) of being smooth. Note that unlike integers, a polynomial over F_2 can be factored in probabilistic polynomial time (though for small m it may be preferable to do trial division by elements of B). Thus checking the smoothness of a random element of F_q can be done in (probabilistic) polynomial time, and each relation is available in expected time ψ(n, m)^(−1) (up to a polynomial factor). Since we need (slightly more than) t relations for setting up the linear system, the relation collection stage runs in expected time t ψ(n, m)^(−1). A sparse system with t unknowns can also be solved in time O~(t^2).
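For small parameters, the relation search by trial division can be sketched as follows. The bitmask encoding of F_2[X] (bit i = coefficient of X^i) and the toy field F_{2^7} with f = X^7 + X + 1 and g = X are illustrative assumptions, not the data structures of Section 3.5; the sketch also steps through the powers g^1, g^2, . . . deterministically instead of sampling α at random.

```python
def pdeg(a):                     # degree of a GF(2)[X] polynomial (bitmask form)
    return a.bit_length() - 1

def pdivmod(a, b):               # quotient and remainder in GF(2)[X]
    q = 0
    while a and pdeg(a) >= pdeg(b):
        s = pdeg(a) - pdeg(b)
        q ^= 1 << s
        a ^= b << s
    return q, a

def irreducibles_upto(m):
    """All irreducible polynomials over GF(2) of degree <= m, by trial division:
    f is irreducible iff no irreducible of degree <= deg(f)/2 divides it."""
    irr = []
    for f in range(2, 1 << (m + 1)):
        if all(pdivmod(f, g)[1] != 0 for g in irr if 2 * pdeg(g) <= pdeg(f)):
            irr.append(f)
    return irr

def smooth_factor(h, irr):
    """Factor h over the given irreducibles; return {poly: exponent} if h is
    smooth with respect to them, else None."""
    exps = {}
    for g in irr:
        while True:
            q, r = pdivmod(h, g)
            if r != 0:
                break
            h, exps[g] = q, exps.get(g, 0) + 1
    return exps if h == 1 else None

def pmulmod(a, b, f):            # product in GF(2)[X] modulo f
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if pdeg(a) == pdeg(f):
            a ^= f
    return r

# toy demo: F_{2^7} via f = X^7 + X + 1; g = X is primitive (2^7 - 1 = 127 is prime)
f = 0b10000011
irr = irreducibles_upto(3)       # factor base: irreducibles of degree <= 3
h, relations = 1, []
for alpha in range(1, 127):      # step through g^1, ..., g^126 once
    h = pmulmod(h, 0b10, f)
    e = smooth_factor(h, irr)
    if e is not None:
        relations.append((alpha, e))
print(len(relations), "relations collected")
```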

In the second stage, we need a single smooth polynomial of the form g^α a. If α is randomly chosen, we expect to get this relation in time ψ(n, m)^(−1). Therefore, the second stage is again faster than the first, and the basic method takes a total expected running time of t ψ(n, m)^(−1). Recall that the basic method for prime fields requires time L[2]. The difference arises because polynomial factorization is much easier than integer factorization.

We now explain a modification of the basic method, proposed by Blake et al. [23]. Let h ∈ F_q*: that is, h is a non-zero polynomial in F_2[X] of degree < n. If h is randomly chosen from F_q* (as in the case of g^α or g^α a for random α), then we expect the degree of h to be close to n. Let us write h ≡ h1/h2 (mod f) (f being the defining polynomial) with h1 and h2 each having degree ≈ n/2. Then the ratio of the probability that both h1 and h2 are smooth to the probability that h is smooth is ψ(n/2, m)^2/ψ(n, m) ≈ 2^(n/m) (neglecting the o( ) terms). For practical values of n and m, this ratio of probabilities can be substantially large, implying that it is easier to get relations by trying to factor both h1 and h2 instead of trying to factor h. This is the key observation behind the modification due to Blake et al. [23]. Simple calculations show that this modification does not affect the asymptotic behaviour of the basic method, but it leads to considerable speed-up in practice.

In order to complete the description of the modification of Blake et al. [23], we mention an efficient way to write h as h1/h2 (mod f). Since 0 ≤ deg h < n and since f is irreducible of degree n, we must have gcd(h, f) = 1. During the iteration of the extended gcd algorithm we actually compute a sequence of polynomials uk, vk, xk such that uk h + vk f = xk for all k = 0, 1, 2, . . . . At the start of the algorithm we have u0 = 1, v0 = 0 and x0 = h. As the algorithm proceeds, the sequence deg uk changes non-decreasingly, whereas the sequence deg xk changes non-increasingly, and at the end of the extended gcd algorithm we have xk = 1 and the desired Bézout relation uk h + vk f = 1 with deg uk ≤ n − 1. Instead of proceeding till the end of the gcd loop, we stop at the value k = k′ for which deg xk′ is closest to n/2. We will then usually have deg uk′ ≈ n/2, so that taking h1 = xk′ and h2 = uk′ serves our purpose.
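The half-way extended gcd can be sketched as follows; polynomials over F_2 are again encoded as bitmasks, and the toy modulus f = X^7 + X + 1 in the check is an assumption for illustration.

```python
def pdeg(a):            # degree of a GF(2)[X] polynomial stored as a bitmask
    return a.bit_length() - 1

def pdivmod(a, b):      # quotient and remainder in GF(2)[X]
    q = 0
    while a and pdeg(a) >= pdeg(b):
        s = pdeg(a) - pdeg(b)
        q ^= 1 << s
        a ^= b << s
    return q, a

def pmul(a, b):         # product in GF(2)[X]
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def split(h, f):
    """Return (h1, h2) with h = h1/h2 (mod f) and deg h1 <= deg(f)//2,
    by stopping the extended Euclidean algorithm halfway.
    Invariant: u*h = x (mod f) at every step."""
    n = pdeg(f)
    x_prev, x = f, h          # remainder sequence x_k
    u_prev, u = 0, 1          # cofactor sequence u_k
    while pdeg(x) > n // 2:
        q, r = pdivmod(x_prev, x)
        x_prev, x = x, r
        u_prev, u = u, u_prev ^ pmul(q, u)
    return x, u               # h1 = x, h2 = u

def pmulmod(a, b, f):         # product in GF(2)[X] modulo f
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if pdeg(a) == pdeg(f):
            a ^= f
    return r

f = 0b10000011                # f = X^7 + X + 1 (toy modulus)
h = 0b1100101                 # some polynomial of degree 6
h1, h2 = split(h, f)
assert pmulmod(h2, h, f) == h1 and pdeg(h1) <= 3   # h2*h = h1 (mod f)
print(bin(h1), bin(h2))
```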

The concept of large prime variation is applicable for the basic ICM. Moreover, if trial divisions are used for smoothness tests, one can employ the early abort strategy. Despite all these modifications the basic variant continues to be rather slow. Our hunt for faster algorithms continues.

The adaptation of the linear sieve method

The LSM for prime fields can be readily adapted to the fields F_q, q = 2^n. Let us assume that the defining polynomial f is of the special form f(X) = X^n + f1(X), where deg f1 is small. The total number of choices for such f with deg f1 < k is 2^k. Under the assumption that irreducible polynomials (over F_2) of degree n are randomly distributed among the set of polynomials of degree n, we expect to find an irreducible polynomial f = X^n + f1 with deg f1 = O(lg n) (see Approximation (2.5) on p 84). In particular, we may assume that deg f1 ≤ n/2.

Let k := ⌈n/2⌉ and σ := 2k − n ∈ {0, 1}. For polynomials h1, h2 ∈ F_2[X] of small degrees, we then have

(X^k + h1)(X^k + h2) ≡ X^σ f1 + (h1 + h2)X^k + h1h2 (mod f).

The right side of the congruence, namely,

T(h1, h2) := X^σ f1 + (h1 + h2)X^k + h1h2,

has degree slightly larger than n/2. This motivates the following algorithm.

We take a suitable bound m (with 2^m of the order of L[1/2]) and let the factor base B be the (disjoint) union of B1 and B2, where B1 contains the irreducible polynomials of degrees ≤ m, and where B2 contains the polynomials of the form X^k + h, deg h ≤ m. Both B1 and B2 (and hence B) contain L[1/2] elements. For each pair X^k + h1, X^k + h2 ∈ B2, we then check the smoothness of T(h1, h2) over B1. Since deg T(h1, h2) ≈ n/2, the probability of finding a smooth candidate per trial is L[−1/2]. Therefore, trying L[1] values of the pair (h1, h2) is expected to give L[1/2] relations (in L[1/2] variables). Since factoring each T(h1, h2) can be performed in probabilistic polynomial time, the relation collection stage takes time L[1]. Gaussian elimination (with sparse congruences) can be done in the same time. As in the case of the LSM for prime fields, the second stage can be carried out in time L[1/2]. To sum up, the LSM for fields of characteristic 2 takes L[1] running time.

Note that the running time L[1] is achievable in this case without employing any sieving techniques. This is again because checking the smoothness of each T(h1, h2) can simply be performed in polynomial time. Application of polynomial sieving, though unable to improve upon the L[1] running time, often speeds up the method in practice. We will describe such a sieving procedure in connection with Coppersmith’s algorithm, which we take up next.

Coppersmith’s algorithm

Coppersmith’s algorithm is the fastest algorithm known to compute discrete logarithms in finite fields of characteristic 2. Theoretically it achieves the (heuristic) running time L(q, 1/3, c) and is, therefore, subexponentially faster than the L[c′] = L(q, 1/2, c′) algorithms described so far. Gordon and McCurley have made aggressive attempts to compute discrete logarithms in fields as large as F_{2^503} using Coppersmith’s algorithm in tandem with a polynomial sieving procedure and have thereby established the practicality of the algorithm.

In the basic method, each trial during the search for relations involves checking the smoothness of a polynomial of degree nearly n. The modification due to Blake et al. [23] replaces this by checking the smoothness of two polynomials of degree ≈ n/2. For the adaptation of the LSM, on the other hand, we check the smoothness of a single polynomial of degree ≈ n/2. In Coppersmith’s algorithm, each trial consists of checking the smoothness of two polynomials of degrees ≈ n^(2/3). This is the basic reason behind the improved performance of Coppersmith’s algorithm.

To start with, we make the assumption that the defining polynomial f of F_q is of the form f(X) = X^n + f1(X) with deg f1 = O(lg n). We have argued earlier that an irreducible polynomial f of this special form is expected to be available. We now choose three integers m, M, k such that

m ≈ αn^(1/3)(ln n)^(2/3), M ≈ βn^(1/3)(ln n)^(2/3) and 2^k ≈ γn^(1/3)(ln n)^(−1/3),

where the (positive real) constants α, β and γ are to be chosen appropriately to optimize the running time. The factor base B comprises the irreducible polynomials (over F_2) of degrees ≤ m. Let

l := ⌊n/2^k⌋ + 1,

so that l ≈ (1/γ)n^(2/3)(ln n)^(1/3). Choose relatively prime polynomials u1(X) and u2(X) (in F_2[X]) of degrees ≤ M and let

h1(X) := u1(X)X^l + u2(X) and h2(X) := (h1(X))^(2^k) rem f(X).

But then, since ind_g h2 ≡ 2^k ind_g h1 (mod q − 1), we get a relation if both h1 and h2 are smooth over B. By choice, deg h1 is clearly O~(n^(2/3)), whereas

h2(X) ≡ u1(X^(2^k))X^(l·2^k) + u2(X^(2^k)) ≡ u1(X^(2^k))X^(l·2^k − n)f1(X) + u2(X^(2^k)) (mod f)

and, therefore, deg h2 = O~(n^(2/3)) too.

For each pair (u1, u2) of relatively prime polynomials of degrees ≤ M, we compute h1 and h2 as above and collect all the relations corresponding to the smooth values of both h1 and h2. This gives us the desired (sparse) system of linear congruences in the unknown indices of the elements of B, which is subsequently solved modulo q – 1.
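The construction of the pair (h1, h2) rests on the fact that squaring over F_2 merely spreads the exponents, so (h1)^(2^k) rem f is cheap to compute. The sketch below checks the congruence displayed above on a toy field; all parameters (F_{2^7} with f = X^7 + X + 1, k = 1, the particular u1, u2) are assumptions for illustration only.

```python
def pdeg(a):                  # degree of a GF(2)[X] polynomial (bitmask form)
    return a.bit_length() - 1

def pmod(a, f):               # remainder of a modulo f in GF(2)[X]
    while a and pdeg(a) >= pdeg(f):
        a ^= f << (pdeg(a) - pdeg(f))
    return a

def pmul(a, b):               # product in GF(2)[X]
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def psq(a):
    """Squaring over GF(2) spreads the bits: (sum a_i X^i)^2 = sum a_i X^(2i)."""
    r, i = 0, 0
    while a:
        if a & 1:
            r |= 1 << (2 * i)
        a >>= 1
        i += 1
    return r

def subst(u, k):              # u(X^(2^k)), by spreading the bits k times
    for _ in range(k):
        u = psq(u)
    return u

# toy parameters: f = X^7 + X + 1, so f1 = X + 1; k = 1, l = floor(7/2) + 1 = 4
n, f, f1, k = 7, 0b10000011, 0b11, 1
l = n // (1 << k) + 1

u1, u2 = 0b10, 0b11                       # u1 = X, u2 = X + 1 (coprime)
h1 = pmul(u1, 1 << l) ^ u2                # h1 = u1*X^l + u2

h2 = h1                                   # h2 = h1^(2^k) rem f, by k squarings
for _ in range(k):
    h2 = pmod(psq(h2), f)

# closed form from the text: u1(X^(2^k)) * X^(l*2^k - n) * f1 + u2(X^(2^k))
closed = pmod(pmul(pmul(subst(u1, k), 1 << (l * (1 << k) - n)), f1) ^ subst(u2, k), f)
assert h2 == closed
print(bin(h1), bin(h2))
```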

A suitable choice of α and β together with γ = α^(−1/2) gives the optimal running time of the first stage as

e^((2α ln 2 + o(1))n^(1/3)(ln n)^(2/3)) = L(q, 1/3, 2α(ln 2)^(2/3)) ≈ L(q, 1/3, 1.526).

The second stage of Coppersmith’s algorithm is somewhat involved. The factor base now contains only about L(q, 1/3, 0.763) elements. Therefore, finding a relation using a method similar to the second stage of the basic method requires time L(q, 2/3, c) for some c, which is much worse than even L[c′] = L(q, 1/2, c′). To work around this difficulty we start by finding a polynomial g^α a all of whose irreducible factors have degrees ≤ n^(2/3)(ln n)^(1/3). This takes time of the order of L(q, 1/3, c1) (where c1 ≈ 0.377) and gives us g^α a ≡ v1 v2 · · · vs (mod f), where the vi have degrees ≤ n^(2/3)(ln n)^(1/3). Note that the number of vi is less than n, since deg(g^α a) < n. We then have

ind_g a ≡ −α + ind_g v1 + ind_g v2 + · · · + ind_g vs (mod q − 1).

All these vi need not belong to the factor base, so we cannot simply substitute the values of ind_g vi. We instead reduce the problem of computing each ind_g vi to the problem of computing ind_g vi′ for several i′ with deg vi′ ≤ σ deg vi for some constant 0 < σ < 1. Subsequently, computing each ind_g vi′ is reduced to computing ind_g vi″ for several i″ with deg vi″ ≤ σ deg vi′. Repeating this process, we eventually end up with polynomials in the factor base. Because each reduction generates new polynomials with degrees reduced by at least the constant factor σ, it is clear that the recursion depth is O(ln n). Now, if for each i the number of i′ is ≤ n, and for each i′ the number of i″ is ≤ n, and so on, we have to carry out the reduction of ≤ n^(O(ln n)) = e^(O((ln n)^2)) = L(q, 1/3, 0) polynomials. Therefore, if each reduction can be performed in time L(q, 1/3, c2), the second stage will run in time L(q, 1/3, max(c1, c2)).

In order to explain how a polynomial v of degree d ≤ n^(2/3)(ln n)^(1/3) can be reduced in the desired time, we choose k ∈ N such that 2^k ≈ √(n/d), and let l := ⌊n/2^k⌋ + 1. As in the first stage, we fix a suitable bound M, choose relatively prime polynomials u1(X), u2(X) of degrees ≤ M and define

h1(X) := u1(X)X^l + u2(X)

and

h2(X) := (h1(X))^(2^k) rem f(X) = u1(X^(2^k))X^(l·2^k − n)f1(X) + u2(X^(2^k)).

The polynomials u1 and u2 should be so chosen that v | h1. We see that h1 and h2 have low degrees, and we try to factor h1/v and h2. Once we get a factorization of the form

h1/v = ∏_i vi and h2 = ∏_j wj

with deg vi, deg wj < σ deg v, we have the desired reduction of v, namely,

2^k (ind_g v + Σ_i ind_g vi) ≡ Σ_j ind_g wj (mod q − 1),

that is, the reduction of the computation of ind_g v to that of all ind_g vi and ind_g wj. With the choice M ≈ (n^(1/3)(ln n)^(2/3)(ln 2)^(−1) + deg v)/2 and σ = 0.9, reduction of each polynomial can be shown to run in time L(q, 1/3, (ln 2)^(−1/3)) ≈ L(q, 1/3, 1.130). Thus the second stage of Coppersmith’s algorithm runs in time L(q, 1/3, 1.130) and is faster than the first stage.

Large prime variation is a useful strategy to speed up Coppersmith’s algorithm. In the case of trial divisions for smoothness tests, the early abort strategy can also be applied. However, a more efficient idea (though seemingly non-collaborative with the early abort strategy) is to use polynomial sieving as introduced by Gordon and McCurley.

Recall that in the first stage we take relatively prime polynomials u1 and u2 of degrees ≤ M and check the smoothness of both h1(X) = u1(X)X^l + u2(X) and h2(X) = h1(X)^(2^k) rem f(X). We now explain the (incomplete) sieving technique for filtering out the (non-)smooth values of h1 = (h1)_{u1,u2} for the different values of u1 and u2. To start with, we fix u1 and let u2 vary. We need an array A indexed by u2, a polynomial of degree ≤ M. Clearly, u2 can assume 2^(M+1) values, and so A must contain 2^(M+1) elements. To be very concrete, we will denote by A_{u2} the location A[u2(2)], where u2(2) ≥ 0 is the integer obtained canonically by substituting 2 for X in u2(X), considered to be a polynomial in Z[X] with coefficients 0 and 1. We initialize all the locations of A to zero.

Let t = t(X) be a small irreducible polynomial in the factor base B (or a small power of such an irreducible polynomial) with δ := deg t. The values of u2 for which t divides (h1)_{u1,u2} satisfy the polynomial congruence u2(X) ≡ u1(X)X^l (mod t). Let u2* be the solution of this congruence with δ* := deg u2* < δ. If δ* > M, then no value of u2 corresponds to (h1)_{u1,u2} divisible by t. So assume that δ* ≤ M. If δ > M, then the only value of u2 for which t divides (h1)_{u1,u2} is u2 = u2*. So we may also assume that δ ≤ M. Then the values of u2 that make (h1)_{u1,u2} divisible by t are given by u2 = u2* + v(X)t(X) for all polynomials v(X) of degrees ≤ M − δ. For each of these 2^(M−δ+1) values of u2, we add δ = deg t to the location A_{u2}.

When the process mentioned in the last paragraph is completed for all such t, we find out for which values of u2 the array locations A_{u2} contain values close to deg (h1)_{u1,u2}. These values of u2 correspond to the smooth values of (h1)_{u1,u2} for the chosen u1. Finally, we vary u1 and repeat the sieving procedure.

In each sieving process described above, we have to find out all the values u2 = u2* + vt as v runs through all polynomials of degrees ≤ M − δ. We may choose the different possibilities for v in any sequence, compute the products vt and then add these products to u2*. While doing so serves our purpose, it is not very efficient, because computing each u2 involves performing a polynomial multiplication vt. Gordon and McCurley’s trick steps through all the possibilities of v in a clever sequence that helps one get each value of u2 from the previous one by a much reduced effort (compared to polynomial multiplication). The 2^(M−δ+1) choices of v can be naturally mapped to the bit strings of length (exactly) M − δ + 1 (with the coefficients of lower powers of X appearing later in the sequence). This motivates the following concept.

Definition 4.2.

Let d ∈ N. Then the (binary) Gray code of dimension d is a sequence G_d of all (that is, 2^d) bit strings of length d, defined inductively as follows. For d = 1, we define G_1 := (0, 1), whereas for d > 1 we obtain G_d from G_{d−1} = (g1, g2, . . . , g_{2^(d−1)}) as

G_d := (0g1, 0g2, . . . , 0g_{2^(d−1)}, 1g_{2^(d−1)}, . . . , 1g2, 1g1),

where juxtaposition denotes string concatenation.

For example, the Gray code of dimension 2 is 00, 01, 11, 10 and that of dimension 3 is 000, 001, 011, 010, 110, 111, 101, 100. Proposition 4.1 can be easily proved by induction on the dimension d.

Proposition 4.1.

Let d ∈ N and let G_d = (g1, g2, . . . , g_{2^d}) be the Gray code of dimension d. For any i, 1 ≤ i < 2^d, the bit strings g_i and g_{i+1} differ in exactly one bit position b(i) (positions being counted from the right, starting from 0). This position is given by b(i) = v2(i), where v2(i) denotes the multiplicity of 2 in i.

Back to our sieving business! Let us agree to step through the values of v in the sequence v1, v2, . . . , v_{2^(M−δ+1)}, where vi corresponds to the bit string gi of the (M − δ + 1)-dimensional Gray code. Let us also call the corresponding values of u2 as (u2)1, (u2)2, . . . . Now, v1 is 0, and the corresponding (u2)1 = u2* is available at the beginning. By Proposition 4.1, we have for 1 ≤ i < 2^(M−δ+1) the equality v_{i+1} = v_i + X^(v2(i)), so that (u2)_{i+1} = (u2)_i + X^(v2(i))t. Computing the product X^(v2(i))t involves shifting the coefficients of t and is done efficiently using bit operations only (assuming the data structures introduced in Section 3.5). Thus (u2)_{i+1} is obtained from (u2)_i by a shift followed by a polynomial addition. This is much faster than computing (u2)_{i+1} directly as u2* + v_{i+1}t.
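Definition 4.2 and Proposition 4.1 can be checked mechanically; the sketch below builds the Gray code by the reflect-and-prefix recursion of the definition and verifies b(i) = v2(i) for dimension 3.

```python
def gray_code(d):
    """The binary Gray code of dimension d, by the reflect-and-prefix
    recursion of Definition 4.2."""
    if d == 1:
        return ["0", "1"]
    prev = gray_code(d - 1)
    return ["0" + s for s in prev] + ["1" + s for s in reversed(prev)]

def v2(i):
    """Multiplicity of 2 in i."""
    e = 0
    while i % 2 == 0:
        i //= 2
        e += 1
    return e

g = gray_code(3)
assert g == ["000", "001", "011", "010", "110", "111", "101", "100"]

# consecutive strings differ in exactly the bit position b(i) = v2(i),
# counting positions from the right, starting at 0
for i in range(1, 8):
    diff = [b for b in range(3) if g[i - 1][2 - b] != g[i][2 - b]]
    assert diff == [v2(i)]
print("Proposition 4.1 verified for d = 3")
```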

We mentioned earlier that efficient implementations of Coppersmith’s algorithm allow one to compute, in feasible time, discrete logarithms in fields as large as F_{2^503}. However, for much larger fields, say for n ≥ 1024, this algorithm is still not a practical breakthrough. The intractability of the DLP continues to remain cryptographically exploitable.

Exercise Set 4.4

4.15

Binary search Let ≤ be a total order on a set S (finite or infinite) and let a1 ≤ a2 ≤ ··· ≤ am be a given sequence of elements of S. Devise an algorithm that, given an arbitrary element a ∈ S, determines using only O(lg m) comparisons in S whether a = ai for some i = 1, . . . , m and, if so, returns i. [H]

4.16
  1. Show that any map F_q → F_q can be represented uniquely as a polynomial in F_q[X] of degree < q. [H]

  2. The set S of all maps F_q → F_q is a ring under point-wise addition and multiplication. Prove the ring isomorphism S ≅ F_q[X]/〈X^q − X〉.

4.17 Let p be a prime and g a primitive element of . For a , prove the explicit formula (mod p). What is the problem in using this formula for computing indices in ?
4.18 In the basic ICM for the prime field , we try to factor random powers g^α over the factor base B = {q1, . . . , qt}. In addition to the canonical representative of g^α in the set {1, . . . , p – 1}, one can also check for the smoothness of the integers g^α + kp for –M ≤ k ≤ M, where M is a small positive integer (to be determined experimentally).
  1. Let ρk,i := (g^α + kp) rem qi for i = 1, . . . , t and for –M ≤ k ≤ M. How can one compute these remainders ρk,i efficiently? Devise an algorithm that checks the smoothness of all g^α + kp using the values ρk,i. [H]

  2. Devise an algorithm that uses a sieve over the interval –M ≤ k ≤ M.

  3. Explain how the above two strategies can be modified to work for the field .

4.19
  1. Show that for the LSM over the average T̄ and the maximum Tmax of |T(c1, c2)| over all values of c1, c2 (that is, for –M ≤ c1 ≤ c2 ≤ M) are approximately HM and 2HM, respectively. [H]

  2. For real 0 ≤ η ≤ 1, let S(η) := {(c1, c2) : –M ≤ c1 ≤ c2 ≤ M, |T(c1, c2)| ≤ ηTmax} and let t(η) := #S(η)/#S(1). Show that t(η) ≈ η(2 – η). (This shows that the distribution of T(c1, c2) is not really random.)

4.20 Consider the following modification of the LSM for . Define for 1 ≤ r ≤ s the integers Hr and Jr. Choose a small s ∈ N and repeat the linear sieve method for each r, 1 ≤ r ≤ s, that is, check the smoothness (over the first t = L[1] primes) of the integers Tr(c1, c2) := Jr + (c1 + c2)Hr + c1c2 for all 1 ≤ r ≤ s, –μ ≤ c1 ≤ c2 ≤ μ. Let T̄s be the average of |Tr(c1, c2)| over all choices of r, c1 and c2. Show that , where T̄ is as defined in Exercise 4.19. In particular, for both the choices: (1) and (2) μ = ⌊M/s⌋, that is, on an average we check smaller integers for smoothness under this modified strategy. Determine the size of the factor base and the total number of integers Tr(c1, c2) checked for smoothness for the two values of μ given above.
4.21

Cubic sieve method (CSM) for Let the integers x, y, z satisfy x³ ≡ y²z (mod p) with x³ ≠ y²z. Assume that each of x, y, z is O(p^ξ).

  1. Show that for integers a, b, c with a + b + c = 0 one has

    (x + ay)(x + by)(x + cy) ≡ y²T(a, b, c) (mod p),

    where T(a, b, c) := z + (ab + ac + bc)x + (abc)y = –b(b + c)(x + cy) + (z – c²x). Since x, y, z are O(p^ξ), we have T(a, b, c) = O(p^ξ) for small values of a, b, c.

  2. For the CSM, the factor base B comprises all primes q1, . . . , qt with together with the integers x + ay, –M ≤ a ≤ M, . If T(a, b, c) factors completely over q1, . . . , qt, we get a relation. Show that if we check the smoothness of T(a, b, c) for all –M ≤ a ≤ b ≤ c ≤ M with a + b + c = 0, we expect to get enough relations to compute the discrete logarithms of elements of B.

  3. In order to carry out sieving, fix c and let b vary. Specify the details of the sieving process. [H]

  4. Specify an algorithm for the second stage of the CSM. [H]

  5. Show that the expected running time of the CSM is . Therefore, if ξ < 1/2, the CSM is asymptotically faster than the LSM, since the LSM runs in time L[1]. The best possible value ξ = 1/3 corresponds to the fastest running time of the CSM.
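The congruence of Part 1 of the exercise above is easy to verify numerically. A small Python check with illustrative parameters (the prime p and the values x, y are our choices; z is derived so that x³ ≡ y²z (mod p)):

```python
p = 1009                      # a small prime, for illustration only
x, y = 20, 3
# choose z with x^3 ≡ y^2 z (mod p): z = x^3 * (y^2)^(-1) mod p
z = pow(x, 3, p) * pow(y * y, p - 2, p) % p

def T(a, b, c):
    # T(a, b, c) = z + (ab + ac + bc)x + (abc)y
    return z + (a*b + a*c + b*c) * x + (a*b*c) * y

# For every triple with a + b + c = 0, the product of the three linear
# forms is congruent to y^2 T(a, b, c) modulo p.
for a in range(-5, 6):
    for b in range(-5, 6):
        c = -(a + b)
        assert ((x + a*y) * (x + b*y) * (x + c*y)) % p == (y * y * T(a, b, c)) % p
```

The check exercises nothing beyond the algebraic identity; an actual CSM would sieve the values T(a, b, c) for smoothness instead of merely recomputing them.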

4.22 The problem with the CSM is that it is not known how to efficiently compute a solution of the congruence

Equation 4.10

x³ ≡ y²z (mod p)
subject to the condition that x³ ≠ y²z and x, y, z = O(p^ξ) for 1/3 ≤ ξ < 1/2. In this exercise, we estimate the number of solutions of Congruence (4.10).

  1. Show that the total number of solutions of Congruence (4.10) modulo p with x, y, z ∈ {1, . . . , p – 1} is (p – 1)², which is Θ(p²).

  2. Show that the total number of solutions of Congruence (4.10) modulo p with x, y, z ∈ {1, . . . , p – 1} and x³ ≠ y²z is also Θ(p²).

  3. Under the heuristic assumption that the solutions (x, y, z) of Congruence (4.10) are randomly distributed in {1, . . . , p – 1}³, deduce that the expected number of solutions of Congruence (4.10) modulo p with x, y, z ∈ {1, . . . , p – 1}, x³ ≠ y²z, and 1 ≤ x, y, z ≤ p^ξ, 1/3 ≤ ξ ≤ 1, is nearly p^{3ξ–1}. (Therefore, if ξ is slightly larger than 1/3, we expect to get a solution. It is not known how to compute such a solution in polynomial (or even subexponential) time. However, for certain values of p a solution is naturally available, for example, if p (or a small multiple of p) is close to an integer cube.)

4.23

Adaptation of CSM for F_{2^n} Let F_{2^n} be represented as F_2[X]/⟨f(X)⟩, where the defining polynomial f is of the form f(X) = X^n + f1(X) with deg f1 ≤ n/3. Let k := ⌈n/3⌉. Show that for polynomials h1, h2 ∈ F_2[X] of small degrees, the remainder (X^k + h1(X))(X^k + h2(X))(X^k + h1(X) + h2(X)) rem f(X) is of degree slightly larger than n/3. Devise an ICM for solving the DLP in F_{2^n} based on this observation. What is the best running time of this method? [H]

*4.5. The Elliptic Curve Discrete Logarithm Problem (ECDLP)

Unlike the finite field DLP, there are no general-purpose subexponential algorithms to solve the ECDLP. Though good algorithms are known for certain specific types of elliptic curves, all known algorithms that apply to general curves take fully exponential time. The square root methods of Section 4.4 are the fastest known methods for solving the ECDLP over an arbitrary curve. As a result, elliptic curves are gaining popularity for building cryptosystems. The absence of subexponential algorithms implies that smaller fields can be chosen compared to those needed for cryptosystems based on the (finite field) DLP. This, in particular, results in smaller sizes of keys.

We start with Menezes, Okamoto and Vanstone’s (MOV) algorithm that reduces the ECDLP in a curve over F_q to the DLP over an extension field F_{q^k} for some suitable k ∈ N. Since the DLP can be solved in subexponential time, the ECDLP is also solved in that time, provided that the extension degree k is small. For supersingular curves, one can choose k ≤ 6. For non-supersingular curves, this k is quite large, in general, and the MOV reduction takes exponential time.

A linear-time algorithm is known to solve the ECDLP over anomalous curves (that is, curves with trace of Frobenius equal to 1). This algorithm is called the SmartASS method after its inventors Smart, Araki, Satoh and Semaev [257, 265, 282].

J. H. Silverman [277] has proposed an algorithm known as the xedni calculus method for solving the ECDLP over an arbitrary curve. Rigorous running-time estimates for this algorithm are not known; however, heuristic analysis and experiments suggest that the algorithm is not really practical.

Let E be an elliptic curve over a finite field F_q and let P ∈ E(F_q) be of order m. We want to compute indP Q (if it exists) for a point Q ∈ E(F_q). Unless it is necessary, we will not assume any specific defining equation for E or a specific value of q.

**4.5.1. The MOV Reduction

Let us first look at the structure of the group EK̄[m] of m-torsion points on an elliptic curve E defined over K. Here K̄ is the algebraic closure of K.

Theorem 4.2.

Let K be a field of characteristic p ≥ 0, and E an elliptic curve defined over K. We consider two separate cases:[5]

[5] For the MOV reduction, only the first case is important.

  1. If p = 0 or if p > 0 does not divide m, then EK̄[m] ≅ Zm ⊕ Zm. In particular, #EK̄[m] = m² in this case.

  2. If p > 0, then either EK̄[p^r] = {O} for all r ∈ N or EK̄[p^r] ≅ Z_{p^r} for all r ∈ N.

Now, let E be an elliptic curve defined over a finite field K of characteristic p. Let m ∈ N with gcd(m, p) = 1. We use the shorthand notation E[m] for EK̄[m] (and not for EK[m]). We want to define a function

em : E[m] × E[m] → μm,

where μm is the group of m-th roots of unity (Exercise 4.24). This function em, known as the Weil pairing, helps us reduce the ECDLP in E to the DLP in a suitable extension field. Let P, R ∈ E[m]. The definition of em(P, R) calls for using divisors on E. Recall from Exercise 2.125 that a divisor Σ nT[T] is the divisor of a rational function on E if and only if Σ nT = 0 and Σ nTT = O. Since the divisor m[R] – m[O] satisfies both conditions, there is a rational function f such that Div(f) = m[R] – m[O]. Now, gcd(m², p) = 1 as well. Hence, by Theorem 4.2 there exists a point R′ of order m² such that R = mR′. Since #E[m] = m², the divisor Σ_{S ∈ E[m]} ([R′ + S] – [S]) satisfies both conditions too and, therefore, there exists a rational function g with Div(g) = Σ_{S ∈ E[m]} ([R′ + S] – [S]). The functions f and g as introduced above are unique up to multiplication by non-zero constants. One can show that we can choose f and g in such a manner that f ∘ λm = g^m, where λm is the multiplication map Q ↦ mQ. Then for every point U we have g^m(P + U) = f(mP + mU) = f(mU) = g^m(U). Since g has only finitely many poles and zeros (whereas the set of points of E is infinite), we can choose U such that both g(U) and g(P + U) are defined and non-zero. For such a point U, we then have (g(P + U)/g(U))^m = 1 and define

em(P, R) := g(P + U)/g(U).

The right side can be shown to be independent of the choice of U. The relevant properties of the Weil pairing em are now listed.

Proposition 4.2.

Let P, P′, R, R′ ∈ E[m] and a, b ∈ Z. Then we have:

Identity: em(P, P) = 1.
Alternation: em(P, R) = em(R, P)^{–1}.
Bilinearity: em(P + P′, R) = em(P, R)em(P′, R),
 em(P, R + R′) = em(P, R)em(P, R′),
 em(aP, bR) = (em(P, R))^{ab}.
Non-degeneracy: em(P, O) = 1.
 If em(P, T) = 1 for all T ∈ E[m], then P = O.

The above definition of em is not computationally effective. We will see later how we can compute em(P, T) in probabilistic polynomial time using an alternative (but equivalent) definition.

Algorithm 4.7 shows how the MOV reduction algorithm makes use of Weil pairing. We now clarify the subtle details of this algorithm.

Algorithm 4.7. MOV reduction

Input: A point P ∈ E(F_q) of order m, gcd(m, q) = 1, and a multiple Q of P.

Output: The index indP Q, that is, an integer l with Q = lP.

Steps:

Choose the smallest k ∈ N such that E[m] ⊆ E(F_{q^k}).
while (1) {
   Choose a random point R ∈ E[m].
   α := em(P, R),   β := em(Q, R).  /* α, β ∈ μm ⊆ F_{q^k}* */
   l := indα β.   /* Discrete logarithm in F_{q^k}* */
   if (Q = lP) { Return l. }
}

The correctness of the algorithm

From the bilinearity of the Weil pairing, it follows that if Q = lP, 0 ≤ l < m, then β = em(Q, R) = em(lP, R) = em(P, R)^l = α^l. Thus, treating indα β as the least non-negative integer j with α^j = β, we conclude that l = indα β if and only if ord α = m, that is, α is a primitive m-th root of unity. That α is an m-th root of unity for every R ∈ E[m] is obvious from the definition of em. We now show that there exists some R ∈ E[m] for which α = em(P, R) is primitive.
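The endgame of Algorithm 4.7, computing l = indα β among the m-th roots of unity, can be mimicked in Python on a toy scale. The pairing values α, β are simulated directly in a small prime field; no curve or pairing arithmetic is performed, and all parameters are illustrative:

```python
# F_31* is cyclic of order 30 with generator 3; take the subgroup of order m = 5.
p_field, m = 31, 5
alpha = pow(3, (p_field - 1) // m, p_field)   # a primitive m-th root of unity
assert pow(alpha, m, p_field) == 1 and alpha != 1

l_secret = 4
beta = pow(alpha, l_secret, p_field)          # stands in for em(Q, R) = em(P, R)^l

# Recover l by a brute-force discrete logarithm in the subgroup <alpha>;
# a real implementation would use an index calculus method in F_{q^k}*.
l = next(j for j in range(m) if pow(alpha, j, p_field) == beta)
assert l == l_secret
```

Because α has exact order m here, the recovered exponent is unique modulo m, which is precisely the condition discussed above.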

Lemma 4.1.

Let P ∈ E[m] be of order m (so that P generates the subgroup 〈P〉 of order m in E[m]). Then for any R1, R2 ∈ E[m], the cosets R1 + 〈P〉 and R2 + 〈P〉 are equal if and only if em(P, R1) = em(P, R2).

Proof

If R1 + 〈P〉 = R2 + 〈P〉, then R1 = R2 + rP for some integer r and so by bilinearity and identity of Weil pairing em(P, R1) = em(P, R2)em(P, P)r = em(P, R2).

Conversely, let em(P, R1) = em(P, R2). By Theorem 4.2, E[m] is generated by two elements of order m. We can take one of these elements to be P; let P′ be the other element and write R1 – R2 = aP + a′P′ for some a, a′ ∈ Z. Then em(P, R1) = em(P, R2 + aP + a′P′) = em(P, R2)em(P, P)^a em(P, a′P′), whence it follows that em(P, a′P′) = 1. Finally, for an arbitrary T = bP + b′P′ ∈ E[m] with b, b′ ∈ Z, we have em(a′P′, T) = em(a′P′, bP + b′P′) = em(a′P′, P)^b em(P′, P′)^{a′b′} = em(P, a′P′)^{–b} = 1. By the non-degeneracy property of em, it then follows that a′P′ = O, that is, R1 – R2 = aP ∈ 〈P〉, so that R1 + 〈P〉 = R2 + 〈P〉.

As an immediate corollary to Lemma 4.1, the desired result follows.

Proposition 4.3.

Let P ∈ E[m] be of order m and let

S := {R ∈ E[m] : em(P, R) is a primitive m-th root of unity}.

Then #S/#E[m] = φ(m)/m. In particular, S is non-empty.

Proof

There are m distinct cosets of 〈P〉 in E[m]. Now, as R ranges over all points of E[m], the coset R + 〈P〉 ranges over all of these m possibilities and, accordingly, by Lemma 4.1 the value em(P, R) ranges over m distinct values, that is, over all of μm. Since μm is cyclic of order m and hence has φ(m) generators, the theorem follows.

By Theorem 3.1, one should try an expected number of O(ln ln m) random points before a primitive m-th root α = em(P, R) is found.

Choosing k

Since E[m] consists of finitely many (namely, m²) points, it is obvious that there exist finite values of k such that E[m] ⊆ E(F_{q^k}). It can also be shown that if E[m] ⊆ E(F_{q^k}), then m | q^k – 1, that is, μm ⊆ F_{q^k}*, so that em(P, R) ∈ F_{q^k}* for all P, R ∈ E[m]. The computation of the discrete logarithm indα β is then carried out in F_{q^k}*. For Algorithm 4.7 to be efficient, one requires k to be rather small. However, for most curves, k is rather large, implying that the MOV reduction is impractical for these curves. For a specific class of curves, the so-called supersingular curves, one can choose k to be rather small, namely k ≤ 6. We do not go into the details of the choices of k for the various cases of supersingular curves, but refer the reader to Menezes [192].

Computing em(P, R)

We start with an alternative definition of the Weil pairing for P, R ∈ E[m]. First note that if D = Σ nT[T] is a divisor and if f is a rational function on E such that for every pole or zero T of f one has nT = 0 (that is, such that Div(f) and D have disjoint supports), then one can define f(D) := ∏ f(T)^{nT}.

Choose points U, V ∈ E and consider the divisors DP := [P + U] – [U] and DR := [R + V] – [V]. Since E (over the algebraic closure) is infinite, one can choose both P + U and U distinct from R + V and V. Since P, R ∈ E[m], it follows that mDP and mDR are principal; namely, there are rational functions fP and fR such that Div(fP) = mDP = m[P + U] – m[U] and Div(fR) = mDR = m[R + V] – m[V]. One can show that

Equation 4.11

em(P, R) = fP(DR)/fR(DP),
independent of the choice of U and V as long as fP(DR) and fR(DP) are defined. Therefore, em(P, R) can be computed efficiently, if fP and fR can be computed efficiently. To this effect, we now describe an algorithm for computing the rational function f of a principal divisor D = m1[P1] + ··· + mr[Pr], where Σ mi = 0 and Σ miPi = O. Since deg D = 0, we can write D = Σ mi([Pi] – [O]). Suppose that we have an Algorithm A that, for a pair of reduced divisors

D1 = [P1] – [O] + Div(f1)

and

D2 = [P2] – [O] + Div(f2),

computes the sum (a reduced divisor)

D3 = D1 + D2 = [P3] – [O] + Div(f3).
Then, f can be computed by repeated application of Algorithm A as follows.

  1. Compute for each i = 1, . . . , r the reduced divisor Δi corresponding to mi([Pi] – [O]). Let 1 = ai1, ai2, . . . , aiti = |mi| be an addition chain for |mi| (Exercise 3.18). Clearly, ti – 1 applications of Algorithm A compute Δi. Since we can choose ti ≤ 2⌈lg |mi|⌉, each Δi can be computed using O(log |mi|) applications of Algorithm A.

  2. Compute f by computing D = Div(f) = Δ1 + ··· + Δr. This can be done by applying Algorithm A a total of r – 1 times.

What remains is the description of Algorithm A that computes P3 and f3 from a knowledge of P1, P2, f1 and f2. Clearly, if P1 = O, then we have P3 = P2 and f3 = f1f2. Similar is the case for P2 = O. So assume P1 ≠ O and P2 ≠ O. Let l1 be the line passing through P1 and P2, and P′ := –(P1 + P2). First, assume that P′ ≠ O. By Exercise 2.125, we have Div(l1) = [P1] + [P2] + [P′] – 3[O]. Let l2 be the (vertical) line passing through P′ and –P′. Again by Exercise 2.125, we have Div(l2) = [P′] + [–P′] – 2[O]. But then D1 + D2 = [P1 + P2] – [O] + Div(f1f2l1/l2), that is, we take P3 = –P′ = P1 + P2 and f3 = f1f2l1/l2. Finally, if P′ = O, then P2 = –P1 and, therefore, Div(l1) = [P1] + [P2] – 2[O]. Thus, in this case too, we take P3 = O and f3 = f1f2l1/l2 with l2 := 1.

Before we finish the description of the MOV reduction, some comments are in order. First note that if f1, f2 ∈ K(E) and P1, P2 ∈ E(K), then both l1 and l2 are in K(E), and the computation of f3 and P3 can be carried out by working in K only.

Second, consider the (general) case P3 ≠ O. Since l2 vanishes at ±P3, the representation f3 = f1f2l1/l2 cannot be evaluated naively everywhere: f3 is certainly defined at –P3, but l2(–P3) = 0 and, therefore, evaluating f3(–P3) as (f1f2l1)(–P3)/l2(–P3) fails. Of course, there is a rational function g such that both f1f2l1g and l2g are defined and non-zero at –P3, but finding such a rational function is an added headache. So we choose to continue with the representation f3 = f1f2l1/l2 and agree not to evaluate f3 at –P3. Recall from Equation (4.11) that we want to evaluate fP at DR (that is, at R + V and V) and also fR at DP (that is, at P + U and U). Let us assume that we use the addition chain 1 = a1, a2, . . . , at = m for m. This means that we cannot evaluate fP at the points ±ai(P + U) and ±aiU for all i = 1, . . . , t. Therefore, V should be chosen such that neither R + V nor V is one of these points. Similar constraints dictate the choice of U. However, if m is sufficiently large (m ≥ 1024) and if we choose an addition chain of length t ≤ 2⌈lg m⌉, then it can be easily seen that for a random choice of (U, V) the evaluation of fP(DR) or fR(DP) fails with a probability of no more than 1/2. Therefore, a few random choices of (U, V) are expected to make the algorithm work. This is the only place where probabilistic behaviour creeps into the algorithm. In practice, however, this is not a serious problem, since we have much larger values of m (than 1024), and accordingly the above probability of failure becomes negligibly small.

Finally, note that if we multiply the factors f1, f2 and l1 in the numerator, then the coefficients of the numerator grow very rapidly, when the algorithm is applied repeatedly. Thus we prefer to keep the numerator in the factored form. The same applies to the denominator as well.

**4.5.2. The SmartASS Method

The SmartASS method, named after its inventors Smart [282], Satoh and Araki [257] and Semaev [265], is also called the anomalous attack to solve the ECDLP, since it is applicable to anomalous elliptic curves. Let F_p be a finite field of odd prime cardinality p and E an elliptic curve defined over F_p. We assume that E is anomalous: that is, the trace of Frobenius of E at p is 1; that is, #E(F_p) = p. Since p is prime, the group E(F_p) is cyclic and, in particular, isomorphic to the additive group (Zp, +). This isomorphism is effectively exploited by the SmartASS method to give a polynomial-time algorithm for solving the ECDLP in the group E(F_p).

Before proceeding further, we introduce some auxiliary results. Recall (Exercise 2.133) that a local PID is called a discrete valuation ring (DVR). We now give an equivalent definition of a DVR, which justifies its name.

Definition 4.3.

A discrete valuation on a field K is a surjective group homomorphism

v : K* → (Z, +)

such that for every a, b ∈ K* we have v(a + b) ≥ min(v(a), v(b)). We extend the definition of v to a map K → Z ∪ {+∞} by setting v(0) = +∞. The set

Rv := {a ∈ K : v(a) ≥ 0}

is a ring, called the valuation ring of v.

A DVR can be characterized as follows:

Proposition 4.4.

Let R be an integral domain and let K := Q(R) be the field of fractions of R. Then R is a DVR if and only if there exists a discrete valuation of K such that R is the valuation ring of v.

Proof

[if] By definition, R = {a ∈ K : v(a) ≥ 0}. We have v(1) = v(1 · 1) = v(1) + v(1), so that v(1) = 0. If ab = 1 for some a, b ∈ R, then 0 = v(1) = v(ab) = v(a) + v(b). Since v(a), v(b) ≥ 0, it follows that v(a) = v(b) = 0. Conversely, let v(a) = 0 for some a ∈ R, a ≠ 0. Now, a^{–1} ∈ K and we have 0 = v(1) = v(aa^{–1}) = v(a) + v(a^{–1}) = v(a^{–1}): that is, a^{–1} ∈ R. We conclude that a ∈ R is a unit if and only if v(a) = 0. Any proper ideal of R consists only of non-units and hence is contained in the set m1 := {a ∈ R : v(a) ≥ 1}, which is easily seen to be an ideal of R. Thus R is a local domain with maximal ideal m1.

Let i ∈ N and define mi := {a ∈ R : v(a) ≥ i}. Clearly, each mi is an ideal of R. For an arbitrary non-zero ideal I of R, consider i := min{v(a) : a ∈ I, a ≠ 0}. If i = 0, then I contains a unit, that is, I = R. So assume i > 0. Clearly, I ⊆ mi. Conversely, let a ∈ mi, so that v(a) ≥ i. Choose b ∈ I with v(b) = i. But then i ≤ v(a) = v(ab^{–1}) + v(b) = v(ab^{–1}) + i: that is, v(ab^{–1}) ≥ 0; that is, ab^{–1} ∈ R; that is, a = (ab^{–1})b ∈ I. Thus, mi ⊆ I: that is, I = mi. In other words, mi, i ∈ N, are the only non-zero ideals of R. These ideals form the (infinite) descending chain m1 ⊇ m2 ⊇ m3 ⊇ ···.

By definition, v : K* → Z is surjective. Let x ∈ R be such that v(x) = 1. The principal ideal 〈x〉 is not the unit ideal, satisfies 〈x〉 ⊆ m1, and hence equals m1. One can likewise show that mi = 〈x^i〉 for all i ∈ N. Thus R is a PID. [only if] See Exercise 2.133.

Recall that the ring Zp of p-adic integers (Definition 2.111) is a DVR. The field of fractions Qp of Zp is called the field of p-adic numbers. We now explicitly describe a valuation v on Qp of which Zp is the valuation ring. Let the p-adic expansion (Exercises 2.144 and 2.145) of a p-adic integer α be

Equation 4.12

α = k0 + k1p + k2p² + ···,   0 ≤ ki ≤ p – 1.
A non-negative rational integer can be naturally viewed as a p-adic integer with finitely many non-zero terms, that is, one for which ki = 0 for all but finitely many i. However, a p-adic integer with infinitely many non-zero ki does not correspond to a non-negative rational integer. If in Expansion (4.12) we have k0 = k1 = ··· = kr–1 = 0, we can write

α = p^r(kr + k_{r+1}p + k_{r+2}p² + ···).

A p-adic integer is, in general, an infinite series and a representation with finite precision looks like

k0 + k1p + k2p² + ··· + ksp^s + O(p^{s+1}).

Arithmetic on p-adic numbers is done like arithmetic on integers written in base p, but from left to right (that is, starting with the digit k0). Thus, for example, if one wants to add two p-adic integers k0 + k1p + k2p² + ··· and k0′ + k1′p + k2′p² + ···, one may add the base-p integers . . . k2k1k0 and . . . k2′k1′k0′ in the usual manner till the desired level of precision. A p-adic integer α = k0 + k1p + k2p² + ··· is invertible (in Zp) if and only if k0 ≠ 0 (Proposition 2.52).
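This digit-by-digit arithmetic is easy to prototype in Python at a fixed precision s; the representation below (lists of the first s digits k0, . . . , k_{s–1}) and the function names are ours:

```python
p, s = 5, 6   # base and working precision: digits k0, ..., k5

def to_digits(n):
    """First s p-adic digits of a non-negative integer n."""
    digits = []
    for _ in range(s):
        digits.append(n % p)
        n //= p
    return digits

def padic_add(a, b):
    """Add two truncated p-adic integers digit by digit, propagating carries."""
    out, carry = [], 0
    for x, y in zip(a, b):
        t = x + y + carry
        out.append(t % p)
        carry = t // p
    return out

def valuation(digits):
    """v(α): position of the first non-zero digit (None if α ≡ 0 here)."""
    for i, k in enumerate(digits):
        if k:
            return i
    return None

assert padic_add(to_digits(117), to_digits(38)) == to_digits(117 + 38)
assert valuation(to_digits(75)) == 2   # 75 = 3 * 5^2
```

Note how the carries propagate from the digit k0 onwards, matching the left-to-right description above.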

An element β ∈ Qp also has a p-adic expansion, but in this case one has to allow terms involving a finite number of negative exponents of p. That is to say, we have an expansion of the form

β = k_{–t}p^{–t} + k_{–t+1}p^{–t+1} + ··· + k_{–1}p^{–1} + k0 + k1p + k2p² + ···

or

β = p^{–t}(k_{–t} + k_{–t+1}p + ··· + k_{–1}p^{t–1} + k0p^t + k1p^{t+1} + k2p^{t+2} + ···).

Of course, if k_{–t} = k_{–t+1} = ··· = k_{–1} = 0, then β is already in Zp.

From the arguments above, it follows that any non-zero γ ∈ Qp can be written uniquely as γ = p^δ(γ0 + γ1p + γ2p² + ···) with δ ∈ Z, 0 ≤ γi ≤ p – 1 and γ0 ≠ 0. We then set v(γ) := δ. It is easy to see that v defines a discrete valuation on Qp of which Zp is the valuation ring. Moreover, since γ0 + γ1p + γ2p² + ··· is a unit in Zp, the element p = 0 + 1 · p + 0 · p² + ··· plays the role of a uniformizer of the DVR Zp. As usual, we write v(0) = +∞.

Now, back to our ECDLP business. Let E be an elliptic curve defined over F_p. Here we consider the case that E is anomalous. We can naturally think of E as a curve over the field Qp as well and denote this curve by ε. The coordinate-wise application of the canonical surjection Zp → F_p induces the reduction homomorphism ε(Qp) → E(F_p). Now, we define the following subgroups of ε(Qp):

It can be shown that ε1 is a subgroup of ε(Qp) and ε2 is a subgroup of ε1. Furthermore, since E is anomalous, we have

Now, let P ∈ E(F_p) and let Q be a point in the subgroup of E(F_p) generated by P. Our purpose is to find an integer l such that Q = lP. Let P̃, Q̃ ∈ ε(Qp) be such that P̃ reduces to P and Q̃ reduces to Q. It is not difficult to find such points P̃ and Q̃. For example, if P = (a, b), we can take P̃ = (a, b0 + b1p + b2p² + ···), where b0 = b and b1, b2, . . . are successively obtained by Hensel lifting.

Since Q = lP, the point pQ̃ – l(pP̃) reduces to O; moreover, the points pP̃ and pQ̃ themselves lie in ε1 (because #E(F_p) = p). Now, if we take the so-called p-adic elliptic logarithm ψp on both sides, we get ψp(pQ̃) ≡ lψp(pP̃) (mod p²), whence it follows that

provided that is invertible modulo p. The function ψp can be easily calculated. Therefore, this gives a very efficient probabilistic algorithm for computing discrete logarithms over anomalous elliptic curves. Here the most time-consuming step is the linear-time computation of the points pP̃ and pQ̃. For further details on the algorithm (like the computation of P̃ and Q̃ from P and Q, and the definition of p-adic elliptic logarithms), see Blake et al. [24] and Silverman [275].

**4.5.3. The Xedni Calculus Method

Joseph Silverman’s xedni calculus method (XCM) is a recent algorithm for solving the ECDLP in an arbitrary elliptic curve over a finite field. The algorithm is based on some deep mathematical conjectures and heuristic ideas. However, its performance has been experimentally established to be poor. Here we give a sketchy description of the XCM. For simplicity, we concentrate on elliptic curves over prime fields only.

The basic idea of the XCM is to lift an elliptic curve E over F_p to a curve ε over Q. In view of this, we start with a couple of important results regarding elliptic curves over Q (or, more generally, over a number field). See Silverman [275], for example, for the proofs.

Let ε be an elliptic curve defined over a number field K.

Theorem 4.3. Mordell–Weil theorem

The group ε(K) is finitely generated.

The group structure of ε(K) is made explicit by the next theorem. Note that the elements of ε(K) of finite order form a subgroup εtors(K) of ε(K), called the torsion subgroup of ε(K) (Exercise 4.26).

Theorem 4.4.

ε(K) ≅ εtors(K) × Z^ρ for some non-negative integer ρ.

The non-negative integer ρ of Theorem 4.4 is called the rank of ε(K).

Now, let E be an elliptic curve defined over a prime field F_p, P ∈ E(F_p), and Q a multiple of P. Our task is to compute an integer l such that Q = lP. We assume that E is defined by a suitable Weierstrass equation. We consider the projective coordinates of points on E. Let n denote the cardinality of the subgroup 〈P〉.

The basic idea of the XCM is to select r points Rp,1, . . . , Rp,r ∈ E(F_p), compute an elliptic curve ε defined over Q and points S1, . . . , Sr on ε such that modulo p the curve ε reduces to E and the points S1, . . . , Sr reduce to Rp,1, . . . , Rp,r. If the rank of ε is small, then the points S2, . . . , Sr are expected to be linearly dependent. Computing a non-trivial linear dependency among S2, . . . , Sr gives a linear dependency among Rp,1, . . . , Rp,r, which in turn yields indP Q with high probability. The details are now explained. For r points Li := [hi, ki, li], i = 1, . . . , r, we use the notation:

We start by fixing an integer r, 4 ≤ r ≤ 9. We then choose r random pairs (si, ti) of integers and compute the points Rp,i := siP – tiQ, i = 1, . . . , r.

We now apply a change of coordinates of the form

Equation 4.13


so that the first four of the points Rp,i become Rp,1 = [1, 0, 0], Rp,2 = [0, 1, 0], Rp,3 = [0, 0, 1] and Rp,4 = [1, 1, 1]. This change of coordinates fails if some three of the four points Rp,1, Rp,2, Rp,3 and Rp,4 sum to O. But in that case the desired index indP Q can be computed with high probability. If, for example, Rp,1 + Rp,2 + Rp,3 = O, then we have (s1 + s2 + s3)P = (t1 + t2 + t3)Q and, therefore, if gcd(t1 + t2 + t3, n) = 1, then indP Q ≡ (t1 + t2 + t3)^{–1}(s1 + s2 + s3) (mod n). On the other hand, if gcd(t1 + t2 + t3, n) ≠ 1, we repeat with a different set of pairs (si, ti).

Henceforth, we assume that the change of coordinates, as given in Equation (4.13), is successful. This transforms the equation for E to a general cubic equation:

Cp : up,1X³ + up,2X²Y + up,3XY² + up,4Y³ + up,5X²Z + up,6XYZ + up,7Y²Z + up,8XZ² + up,9YZ² + up,10Z³ = 0.

Now, we carry out a step that heuristically ensures that the curve ε over (that we are going to construct) has a small rank. We choose a product M of small primes with pM, a cubic curve

CM : uM,1X³ + uM,2X²Y + uM,3XY² + uM,4Y³ + uM,5X²Z + uM,6XYZ + uM,7Y²Z + uM,8XZ² + uM,9YZ² + uM,10Z³ ≡ 0 (mod M)

over ZM and points RM,1, . . . , RM,r on CM with coordinates in ZM. The first four points should be RM,1 = [1, 0, 0], RM,2 = [0, 1, 0], RM,3 = [0, 0, 1] and RM,4 = [1, 1, 1]. We have to ensure also that for every prime divisor q of M, the matrix B(RM,1, . . . , RM,r) has maximal rank modulo q. In practice, it is easier to choose the points RM,1, . . . , RM,r first and then compute a curve CM passing through these points by solving a set of linear equations in the coefficients uM,1, . . . , uM,10 of CM. The curve CM should be so chosen that it has the minimum possible number of solutions modulo M. This, in conjunction with some deep conjectures in the theory of elliptic curves, guarantees that the curve ε that we will construct shortly will have a rank less than the expected value.

We now combine the curves Cp and CM as follows. Using the Chinese remainder theorem, we compute integers u′1, . . . , u′10 such that u′i ≡ up,i (mod p) and u′i ≡ uM,i (mod M) for each i = 1, . . . , 10. Similarly, we compute points R1, . . . , Rr with integer coefficients such that Ri ≡ Rp,i (mod p) and Ri ≡ RM,i (mod M) for each i = 1, . . . , r, where congruence of points stands for coordinate-wise congruence. Here we have R1 = [1, 0, 0], R2 = [0, 1, 0], R3 = [0, 0, 1] and R4 = [1, 1, 1].

Clearly, the points R1, . . . , Rr are lifts of the points Rp,1, . . . , Rp,r respectively, whereas the cubic curve

C′ : u′1X³ + u′2X²Y + u′3XY² + u′4Y³ + u′5X²Z + u′6XYZ + u′7Y²Z + u′8XZ² + u′9YZ² + u′10Z³ = 0

over Q is a lift of E. However, C′, treated as a curve over Q, need not pass through the points R1, . . . , Rr. In order to ensure this last condition, we modify the coefficients u′1, . . . , u′10 to the (small integer) coefficients u1, . . . , u10 by solving a system of linear equations that forces the modified curve to pass through R1, . . . , Rr, subject to the condition that ui ≡ u′i (mod pM) for each i = 1, . . . , 10. The resulting cubic curve

C : u1X³ + u2X²Y + u3XY² + u4Y³ + u5X²Z + u6XYZ + u7Y²Z + u8XZ² + u9YZ² + u10Z³ = 0

over Q evidently continues to be a lift of E.
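The Chinese-remainder combination of coefficients modulo p and modulo M used above is a routine computation. A minimal Python sketch (the coefficient values are illustrative):

```python
def crt(r1, m1, r2, m2):
    """Smallest non-negative x with x ≡ r1 (mod m1) and x ≡ r2 (mod m2),
    assuming gcd(m1, m2) = 1."""
    # Solve r1 + m1*t ≡ r2 (mod m2) using the inverse of m1 modulo m2.
    t = (r2 - r1) * pow(m1, -1, m2) % m2
    return (r1 + m1 * t) % (m1 * m2)

p, M = 101, 2 * 3 * 5 * 7        # p prime, M a product of small primes coprime to p
u_p, u_M = 42, 13                # one coefficient, known modulo p and modulo M
u = crt(u_p, p, u_M, M)
assert u % p == u_p and u % M == u_M
```

In the XCM this combination is applied to each of the ten coefficients and to every coordinate of the r points.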

Now, we apply a change of coordinates in order to transfer to the standard Weierstrass equation

ε : Y² + a1XY + a3Y = X³ + a2X² + a4X + a6

with integer coefficients ai. This transformation changes the points R1, . . . , Rr to the points S1, . . . , Sr. One should also ensure that ε is non-singular, that is, an elliptic curve.

Finally, we check if S2, . . . , Sr are linearly dependent. If so, we determine a (non-trivial) relation with coefficients n2, . . . , nr ∈ Z. This corresponds to a relation among the points Rp,1, . . . , Rp,r, where n1 := –(n2 + ··· + nr), that is, sP = tQ with s := n1s1 + ··· + nrsr and t := n1t1 + ··· + nrtr. If gcd(t, n) = 1, we have indP Q ≡ t^{–1}s (mod n).

On the other hand, if S2, . . . , Sr are linearly independent or if gcd(t, n) > 1, then the lifted data fail to compute indP Q. In that case, we repeat the entire process by selecting new pairs (si, ti) and/or new points RM,1, . . . , RM,r.

This completes our description of the XCM. See Silverman [277] for further details. No rigorous or heuristic analysis of the running time of the XCM is available in the literature. Practical experience (reported in Jacobson et al. [139]) shows that the algorithm is rather impractical. The predominant cause for failure of a trial of the XCM is that the probability that the points S2, . . . , Sr are linearly dependent is amazingly low. Suitable choices of the curve CM help us to construct curves ε of low rank, but not low enough, in general, to render S2, . . . , Sr linearly dependent. Larger values of r are expected to increase the probability of success in each trial, but it is not clear how to handle the values r > 9. Nevertheless, the XCM is a radically new idea to solve the ECDLP. As Joseph Silverman [277] says, “some of the ideas may prove useful in future work on ECDLP”.

Exercise Set 4.5

4.24 Let K be a field, m ∈ N, and let μm := {x ∈ K̄ : x^m = 1}, where K̄ is the algebraic closure of K. Elements of μm are called the m-th roots of unity. Prove the following assertions.
  1. μm is a subgroup of (K̄*, ·).

  2. If char K = 0, then #μm = m. [H]

  3. If p := char K > 0, then #μm = m/p^{vp(m)}. [H]

  4. μm is cyclic. [H]

  5. The set of all roots of unity in K̄ (that is, the union of the μm over all m ∈ N) is a subgroup of K̄*.

4.25 We use the notations of the last exercise and assume that #μm = m, that is, either char K = 0 or p := char K > 0 is coprime to m. In this case, a generator of μm is called a primitive m-th root of unity. If ω ∈ μm is a primitive m-th root of unity and ω^r = 1 for some r ∈ N, then evidently m | r. In particular, m is the smallest of the exponents r ∈ N for which ω^r = 1. The (monic) polynomial

Φm(X) := ∏ (X – ω),

where the product runs over all primitive m-th roots of unity ω, is called the m-th cyclotomic polynomial (over K). Clearly, deg Φm(X) = φ(m) (where φ is Euler’s totient function).

  1. Show that X^m – 1 = ∏_{d|m} Φd(X). [H] Use the Möbius inversion formula to deduce that Φm(X) = ∏_{d|m} (X^{m/d} – 1)^{μ(d)}, where μ is the Möbius function. Conclude that the coefficients of Φm(X) lie in the prime subfield of K.

  2. If m is a prime, show that Φm(X) = X^{m–1} + ··· + X + 1.

  3. Let m ≠ 1 be odd and char K ≠ 2. Show that Φ2m(X) = Φm(–X). [H]

  4. Show that if K = F_q is a finite field with gcd(q, m) = 1, l is the (multiplicative) order of q modulo m and ω is a primitive m-th root of unity, then [K(ω) : K] = l. [H] In particular, Φm is a product of φ(m)/l (distinct) irreducible polynomials over F_q, each of degree l.
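Part 1 above gives a direct way to compute cyclotomic polynomials: divide X^m – 1 by Φd for all proper divisors d of m. A short Python sketch over Z (coefficient lists, constant term first; names are ours):

```python
def poly_div(num, den):
    """Exact long division of integer polynomials; den must be monic."""
    num = num[:]
    quot = [0] * (len(num) - len(den) + 1)
    for i in range(len(quot) - 1, -1, -1):
        quot[i] = num[i + len(den) - 1]
        for j, c in enumerate(den):
            num[i + j] -= quot[i] * c
    return quot

def cyclotomic(m):
    """Coefficients of the m-th cyclotomic polynomial over Z."""
    num = [-1] + [0] * (m - 1) + [1]          # X^m - 1
    for d in range(1, m):
        if m % d == 0:
            num = poly_div(num, cyclotomic(d))
    return num

assert cyclotomic(5) == [1, 1, 1, 1, 1]       # X^4 + X^3 + X^2 + X + 1
assert cyclotomic(6) == [1, -1, 1]            # X^2 - X + 1 = Φ3(-X), as in Part 3
assert cyclotomic(12) == [1, 0, -1, 0, 1]     # X^4 - X^2 + 1
```

Every Φd is monic, so the divisions are exact over Z, in line with the conclusion of Part 1.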

4.26
  1. Let G be an (additive) Abelian group (not necessarily finite). Show that the subset

    Gtors := {a ∈ G : na = 0 for some n ∈ N}

    is a subgroup of G. Gtors is called the torsion subgroup of G and the elements of Gtors are called torsion elements of G. An element a ∈ G is a torsion element of G if and only if a is of finite order.

  2. Let ε be an elliptic curve defined over a number field K. Show that the torsion subgroup εtors(K) of ε(K) is finite. [H]

  3. Let ε and K be as in Part (b). Show that εtors(K̄) is not finite. [H]

**4.6. The Hyperelliptic Curve Discrete Logarithm Problem

The hyperelliptic curve discrete logarithm problem (HECDLP) has attracted less research attention than the ECDLP. Surprisingly, however, there exist subexponential (index calculus) algorithms for solving the HECDLP over curves of large genus. Adleman, DeMarrais and Huang first proposed such an algorithm [2] (which we will refer to as the ADH algorithm). Enge [86] suggested some modifications of the ADH algorithm and provided rigorous analysis of its running time. Gaudry [105] simplified the ADH algorithm and even implemented it. Gaudry’s experimentation suggests that it is feasible to compute discrete logarithms in Jacobians of almost cryptographic sizes, given that the genus of the underlying curve is high (say, ≥ 6). Enge and Gaudry [87] proved rigorously that as long as the genus g is greater than ln q (F_q being the field over which the curve is defined), the ADH algorithm (and its improvements) runs in time L(q^g, 1/2, ·).

In what follows, we outline Gaudry’s version of the ADH algorithm and refer to this as the ADH–Gaudry algorithm. Let C : Y^2 + u(X)Y = v(X) be a hyperelliptic curve of genus g defined over a finite field 𝔽_q. We assume that the cardinality of the Jacobian is known and has a suitably large prime divisor m. We assume further that a reduced divisor α of order m is available, and we want to compute the discrete logarithm indα β of a given reduced divisor β with respect to α.

4.6.1. Choosing the Factor Base

Recall that every reduced divisor can be written uniquely as D = P1 + ··· + Pl – l∞ with l ≤ g, where for i ≠ j the points Pi and Pj are not opposite of each other. Only ordinary points (not special points) may appear more than once in the list P1, . . . , Pl. We also know that such a divisor can be represented by a unique pair of polynomials a, b ∈ 𝔽_q[X] satisfying deg b < deg a ≤ g and a | (b^2 + bu – v). In that case, we write D = Div(a, b). What interests us is the fact that the roots of the polynomial a are precisely the X-coordinates of the points P1, . . . , Pl. This fact leads to the very useful concepts of prime divisors and smooth divisors.

Definition 4.4.

A divisor D = Div(a, b) is called prime, if the polynomial a is irreducible (that is, prime) over 𝔽_q.

For an arbitrary divisor D = Div(a, b), let a = a1 · · · ar be the factorization of a into irreducible polynomials ai over 𝔽_q. There exist polynomials bi ∈ 𝔽_q[X] such that D = D1 + ··· + Dr, where Di := Div(ai, bi). In that case, the (prime) divisors D1, . . . , Dr are called the prime divisors of D. Moreover, if deg ai ≤ δ for all i = 1, . . . , r and for some δ ∈ ℕ, then D is called δ-smooth. In particular, D = Div(a, b) is 1-smooth if and only if a splits completely over 𝔽_q.

In order to set up a factor base B, we predetermine a smoothness bound δ and let B consist of all the prime divisors Div(a, b) with deg a ≤ δ. For simplicity, we take δ = 1. This is indeed a practical choice, when the genus g is not too large (say, g ≤ 9). Let a = X – h ∈ 𝔽_q[X] be an (irreducible) polynomial of degree 1. In order to find b such that Div(a, b) is a prime divisor, we first note that deg b < deg a, that is, b ∈ 𝔽_q is a constant. Furthermore, a | (b^2 + bu – v): that is, b^2 + bu – v ≡ 0 (mod X – h); that is, b^2 + bu(h) – v(h) = 0. Thus, the desired values of b, if existent, can be found by solving a quadratic equation over 𝔽_q. There are q (monic) irreducible polynomials of degree 1, and for each such a there are either two or no solutions for b. Assuming that both these possibilities are equally likely, we conclude that the size of the factor base is ≈ q.
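The search for the degree-1 prime divisors can be sketched as follows; the toy curve Y^2 + Y = X^5 over 𝔽_31 and the brute-force root search are illustrative choices only (for realistic q one would solve the quadratic with a square-root algorithm instead):

```python
# Building the degree-1 factor base for a toy hyperelliptic curve
# C: Y^2 + u(X)Y = v(X) over F_p. Curve and prime are illustrative choices.

p = 31
u = [1]                        # u(X) = 1, coefficients lowest degree first
v = [0, 0, 0, 0, 0, 1]         # v(X) = X^5, so C: Y^2 + Y = X^5 (genus 2)

def ev(poly, x):
    # Evaluate a coefficient-list polynomial at x over F_p.
    return sum(c * pow(x, i, p) for i, c in enumerate(poly)) % p

factor_base = []
for h in range(p):                     # monic degree-1 candidates a = X - h
    uh, vh = ev(u, h), ev(v, h)
    # solve the quadratic b^2 + b u(h) - v(h) = 0 by exhaustive search
    roots = [b for b in range(p) if (b * b + b * uh - vh) % p == 0]
    for b in roots:                    # either two roots or none (or a double root)
        factor_base.append((h, b))     # the prime divisor Div(X - h, b)

print(len(factor_base))                # heuristically close to p
```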

4.6.2. Checking the Smoothness of a Divisor

In order to check the smoothness of a divisor D = Div(a, b) over the factor base B, we first factor a over 𝔽_q. Under the assumption that δ = 1, the divisor D is smooth if and only if a splits completely over 𝔽_q. Let us write a(X) = (X – h1) ··· (X – hl), hi ∈ 𝔽_q. Then for some k1, . . . , kl ∈ 𝔽_q we have D = D1 + ··· + Dl, where Di := Div(X – hi, ki). We may use trial divisions (that is, trial subtractions in this additive setting) by elements of B in order to determine the prime divisors D1, . . . , Dl of D. Proposition 4.5 establishes the probability that a randomly chosen element of the Jacobian is smooth.

Proposition 4.5.

For q ≫ 4g^2, there are approximately q^g/g! 1-smooth divisors in the Jacobian. In particular, the probability that a randomly chosen divisor in the Jacobian is smooth is approximately 1/g!.

The assumption q ≫ 4g^2 is practical, since we usually employ curves of (fixed) small genus g over finite fields of medium sizes. For example, Koblitz [154] proposed the curve Y^2 + Y = X^13 of genus g = 6 over the prime field 𝔽_2. An interesting consequence of the last proposition is that the proportion of smooth divisors in the Jacobian depends only on the genus g of C (and not on q).

4.6.3. The Algorithm

Now, we have all the machinery required to describe the basic version of the index calculus method for computing indα β in the Jacobian. In the first stage, we choose a random j ∈ {1, . . . , m – 1}, compute the (reduced) divisor jα and check whether jα is smooth over the factor base B. Every smooth jα gives a relation: that is, a linear congruence modulo m involving the (unknown) indices of the elements of B to the base α. After sufficiently many (say, ≥ 2(#B)) such relations are found, the system of linear congruences collected is expected to be of full rank and is solved modulo m. This gives us the indices of the elements of the factor base. Each congruence collected above contains at most g non-zero coefficients, so the system is necessarily sparse. In the second stage, we find a single random j for which β + jα is smooth. The database prepared in the first stage then immediately gives indα β.

The Hasse–Weil bounds (3.8) on p. 226 show that the cardinality of the Jacobian is approximately q^g. Thus O(g log q) bits are needed to represent an element of the Jacobian. This fact is consistent with the representation of reduced divisors by pairs of polynomials. Gaudry [105] calculates that this variant of the ICM performs O(q^2 + g!q) operations, each of which takes time polynomial in the input size g log q. If g is considered to be constant, the running time becomes O(q^2 log^t q) (that is, O~(q^2)) for some real t > 0. A square-root method on the Jacobian runs in (expected) time O~(q^{g/2}). Thus for g > 4 the index calculus method performs better than the square-root methods. Indeed, Gaudry’s implementation of this algorithm is capable of computing in a few days discrete logs in the Jacobian of the curve of genus 6 mentioned above. The Jacobian of this curve is of cardinality ≈ 10^40.

For cryptographic purposes, the Jacobian (of cardinality ≈ q^g) should be sufficiently large. If we want to take q small (so that multi-precision arithmetic can be avoided), we should choose large values of g. But this choice makes the ADH–Gaudry algorithm quite efficient. For achieving the desired level of security in cryptographic applications, hyperelliptic curves of genus 2, 3 and 4 only are recommended.

4.7. Solving Large Sparse Linear Systems over Finite Rings

So far we have seen many algorithms which require solving large systems of linear equations (or congruences). The number n of unknowns in such systems can be as large as several million. Standard Gaussian elimination on such a system takes time O(n^3) and space O(n^2). There are asymptotically faster algorithms, like Strassen’s method [292], which takes time O(n^2.807), and Coppersmith and Winograd’s method [60], which runs in time O(n^2.376). Unfortunately, these asymptotic gains do not show up in the range of practical interest. Moreover, the space requirements of these asymptotically faster methods are prohibitively high (though still O(n^2)).

Luckily enough, cryptanalytic algorithms usually deal with coefficient matrices that are sparse: that is, that have only a small number of non-zero entries in each row. For example, consider the system of linear congruences available from the relation collection stage of an ICM for solving the DLP over a finite field 𝔽_q. The factor base consists of a subexponential (in lg q) number of elements, whereas each relation involves at most O(lg q) non-zero coefficients. Furthermore, the sparsity of the resulting matrix A is somewhat structured, in the sense that the columns of A corresponding to larger primes in the factor base tend to have fewer non-zero entries. In this regard, we refer to the interesting analysis by Odlyzko [225] in connection with the Coppersmith method (Section 4.4.4). Odlyzko took m = 2n equations in n unknown indices and showed that about n/4 columns of A are expected to contain only zero coefficients, implying that the corresponding variables never occurred in any relation collected. Moreover, about 0.346n columns of A are expected to have only a single non-zero coefficient.

The sparsity (as well as the structure of the sparsity) of the coefficient matrix A can be effectively exploited, and the system can be solved in time O~(n^2). In this section, we describe some special algorithms for large sparse linear systems. In what follows, we assume that we want to compute the unknown n-dimensional column vector x from the given system of equations

Ax = b,

where A is an m × n matrix, m ≥ n, and where b is a non-zero m-dimensional column vector. Though this is not the case in general, we will often assume for the sake of simplicity that A has full rank (that is, rank n). We write vectors as column vectors; that is, an l-dimensional vector v with elements v1, . . . , vl is written as v = (v1 v2 . . . vl)^t, where the superscript t denotes matrix transpose.

Before we proceed further, some comments are in order. First note that our system of equations is often one over the finite ring ℤ_r, which is not necessarily a field. Most of the methods we describe below assume that ℤ_r is a field, that is, r is a prime. If r is composite, we can do the following. First, assume that the prime factorization r = p1^{α1} · · · ps^{αs}, αi > 0, of r is known. In that case, we first solve the system over the fields ℤ_{pi} for i = 1, . . . , s. Then for each i we lift the solution modulo pi to a solution modulo pi^{αi}. Finally, all these lifted solutions are combined using the CRT to get the solution modulo r.

Hensel lifting can be used to lift a solution of the system Ax ≡ b (mod p) to a solution of Ax ≡ b (mod p^α), where p is a prime and α ≥ 2. We proceed by induction on α. Let us denote the (or a) solution of Ax ≡ b (mod p) by x1, which can be computed by solving a system in the field ℤ_p. Now, assume that for some i ≥ 1 we know (integer) vectors x1, . . . , xi such that

Equation 4.14

A(x1 + px2 + ··· + p^{i–1}xi) ≡ b (mod p^i).
We then attempt to compute a vector xi+1 such that

Equation 4.15

A(x1 + px2 + ··· + p^{i–1}xi + p^ixi+1) ≡ b (mod p^{i+1}).
Congruence (4.14) shows that the elements of A, x1, . . . , xi, b can be so chosen (as integers) that for some vector yi we have the equality

A(x1 + px2 + ··· + p^{i–1}xi) = b – p^iyi

in ℤ. Substituting this in Congruence (4.15) gives Axi+1 ≡ yi (mod p). Thus the (incremental) vector xi+1 can be obtained by solving a linear system in ℤ_p.

It, therefore, suffices to know how to solve linear congruences modulo a prime p. However, problems arise when we do not know the factorization of r (while solving Ax ≡ b (mod r)). If r is large, it would be a heavy investment to attempt to factor r. What can be done instead is the following. First, we use trial divisions to extract the small prime factors of r. We may, therefore, assume that r has no small prime factors. We proceed to solve Ax ≡ b (mod r) assuming that r is a prime (that is, that ℤ_r is a field). In a field, every non-zero element is invertible. But if r is composite, there are non-zero elements a which are not invertible (that is, for which gcd(a, r) > 1). If, during the course of the computation, we never happen to meet (and try to invert) such non-zero non-invertible elements, then the computation terminates without any trouble. Otherwise, such an element a yields a non-trivial factor gcd(a, r) of r. In that case, we have a partial factorization of r and restart solving the system modulo each suitable factor of r.
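The lifting procedure of Congruences (4.14)–(4.15) can be sketched as follows; ordinary Gaussian elimination modulo p plays the role of the field solver, and for simplicity A is assumed square and invertible modulo p:

```python
# Hensel lifting for linear systems: solve Ax = b (mod p^alpha) by solving
# repeatedly modulo the prime p. Sketch; assumes A is invertible mod p.

def solve_mod_p(A, b, p):
    # Gaussian elimination over the field Z_p (Gauss-Jordan form).
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = next(r for r in range(c, n) if M[r][c] % p != 0)
        M[c], M[piv] = M[piv], M[c]
        inv = pow(M[c][c], p - 2, p)           # inverse via Fermat
        M[c] = [x * inv % p for x in M[c]]
        for r in range(n):
            if r != c and M[r][c] % p:
                f = M[r][c]
                M[r] = [(x - f * y) % p for x, y in zip(M[r], M[c])]
    return [M[i][n] for i in range(n)]

def solve_mod_prime_power(A, b, p, alpha):
    n = len(A)
    x = [0] * n
    for i in range(alpha):
        # residual y_i = (b - A x)/p^i (exact by induction), reduced mod p
        r = [(b[k] - sum(A[k][j] * x[j] for j in range(n))) // p**i % p
             for k in range(n)]
        z = solve_mod_p(A, r, p)               # the increment: A z = y_i (mod p)
        x = [x[k] + p**i * z[k] for k in range(n)]
    return [v % p**alpha for v in x]
```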

Some of the algorithms we discuss below assume that A is a symmetric matrix. In our applications, A is usually not symmetric; indeed, it need not even be square. Both these problems can be overcome by trying to solve the modified system A^tAx = A^tb. If A has full rank, this leads to an equivalent system.

If r = 2 (as in the case of the QSM for factoring integers), using the special methods is often not recommended. In this case, the elements of A are bits and can be packed compactly in machine words, and addition of rows can be done word-wise (say, 32 bits at a time). This leads to an efficient implementation of ordinary Gaussian elimination, which usually runs faster than the more complicated special algorithms described below, at least for the sizes of practical systems.

In what follows, we discuss some well-known methods for solving large sparse linear systems over finite fields (typically prime fields). In order to simplify notations, we will refrain from writing the matrix equalities as congruences, but treat them as equations over the underlying finite fields.

4.7.1. Structured Gaussian Elimination

Structured Gaussian elimination is applied to a sparse system before one of the next three methods is employed to solve the system. If the sparsity of A has some structure (as discussed earlier), then structured Gaussian elimination tends to reduce the size of the system considerably, while maintaining its sparsity. We now describe the essential steps of structured Gaussian elimination. Let us define the weight of a row or column of a matrix to be the number of non-zero entries in that row or column.

First we delete all the columns (together with the corresponding variables) that have weight 0. These variables never occur in the system and need not be considered at all.

Next we delete all the columns that have weight 1 and the rows corresponding to the non-zero entries in these columns. Each such deleted column corresponds to a variable xi that appears in exactly one equation. After the rest of the system is solved, the value of xi is obtained by back substitution. Deleting some rows in this step may expose some new columns of weight 1. So this step should be repeated, until all the columns have weight > 1.

Now, consider each row with weight 1. Such a row gives a direct solution for the variable xi corresponding to its single non-zero entry. We then substitute this value of xi in all the equations where it occurs and subsequently delete the i-th column. We repeat this step, until all rows are of weight > 1.

At this point, the system usually has many more equations than variables. We may make the system a square one by throwing away some rows. Since subtracting multiples of rows of higher weights tends to increase the number of non-zero elements in the matrix, we should throw away the rows with higher weights. While discarding the excess rows, we should be careful to ensure that we are not left with a matrix having columns of weight 0. Some columns in the reduced system may again happen to have weight 1. Thus, we have to repeat the above steps again. And again and again and . . . , until we are left with a square matrix each row and column of which has weight ≥ 2.

This procedure leads to a system which is usually much smaller than the original system. In a typical example quoted in Odlyzko [225], structured Gaussian elimination reduces a system with 16,500 unknowns to one with less than 1,000 unknowns. The resulting reduced system may be solved using ordinary Gaussian elimination which, for smaller systems, appears to be much faster than the following sophisticated methods.
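A minimal sketch of the pruning passes above (weight-0 columns simply never occur in the sparse representation; the handling of excess rows and the final back substitution are omitted):

```python
# Structured Gaussian elimination, pruning passes only (sketch). Equations
# are sparse rows [ {column: coefficient}, rhs ] over Z_p. Deferred
# (column, equation) pairs are resolved later by back substitution.

p = 97

def col_weights(rows):
    w = {}
    for r, _ in rows:
        for c in r:
            w[c] = w.get(c, 0) + 1
    return w

def prune(rows):
    deferred = []
    changed = True
    while changed:
        changed = False
        # a column of weight 1: delete it together with its only equation
        for c, wt in col_weights(rows).items():
            if wt == 1:
                i = next(i for i, (r, _) in enumerate(rows) if c in r)
                deferred.append((c, rows.pop(i)))
                changed = True
                break
        if changed:
            continue
        # a row of weight 1: solve its variable and substitute everywhere
        for i, (r, rhs) in enumerate(rows):
            if len(r) == 1:
                (c, a), = r.items()
                val = rhs * pow(a, p - 2, p) % p
                rows.pop(i)
                for row in rows:
                    if c in row[0]:
                        row[1] = (row[1] - row[0].pop(c) * val) % p
                changed = True
                break
    return rows, deferred
```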

4.7.2. The Conjugate Gradient Method

The conjugate gradient method was originally proposed to solve a linear system Ax = b over ℝ for an n × n (that is, square) symmetric positive definite matrix A and for a non-zero vector b, and is based on the idea of minimizing the quadratic function f(x) := (1/2)x^tAx – b^tx. The minimum is attained when the gradient ∇f = Ax – b equals zero, which corresponds to the solution of the given system.

The conjugate gradient method is an iterative procedure. The iterations start with an initial minimizer x0, which can be any n-dimensional vector. As the iterations proceed, we obtain gradually improved minimizers x0, x1, x2, . . . , until we reach the solution. We also maintain and update two other sequences of vectors ei and di. The vector ei stands for the error b – Axi, whereas the vectors d0, d1, . . . constitute a set of mutually conjugate (that is, orthogonal) directions. We initialize e0 = d0 = b – Ax0 and for i = 0, 1, . . . repeat the steps of Algorithm 4.8, until ei = 0. We denote the inner product of two vectors v = (v1 v2 . . . vn)^t and w = (w1 w2 . . . wn)^t by 〈v, w〉 := v1w1 + ··· + vnwn.

Algorithm 4.8. An iteration in the conjugate gradient method

ai := 〈ei, ei〉/〈di, Adi〉.

xi+1 := xi + aidi.

ei+1 := ei – aiAdi.

bi := 〈ei+1, ei+1〉/〈ei, ei〉.

di+1 := ei+1 + bidi.

This method computes a set of mutually orthogonal directions d0, d1, . . . , and hence it has to stop after at most n – 1 iterations, since we run out of new orthogonal directions after n – 1 iterations. Provided that we work with infinite precision, we must eventually obtain ei = 0 for some i, 0 ≤ in – 1.

If A is sparse, that is, if each row of A has O(log^c n) non-zero entries, c being a positive constant, then the product Adi can be computed using O~(n) field operations. Other operations clearly meet this bound. Since at most n – 1 iterations are necessary, the conjugate gradient method terminates after performing O~(n^2) field operations.

We face some potential problems when we want to apply this method to solve a system over a finite field 𝔽_q. First, the matrix A is usually not symmetric and need not even be square. This problem can be avoided by solving the system A^tAx = A^tb. The new coefficient matrix A^tA may be non-sparse (that is, dense). So instead of computing and working with A^tA explicitly, we compute the product (A^tA)di as A^t(Adi); that is, we avoid multiplication by a (possibly) dense matrix at the cost of multiplications by two sparse matrices.

The second difficulty with a finite field 𝔽_q is that the question of minimizing an 𝔽_q-valued function makes hardly any sense (and so does positive definiteness of a matrix over 𝔽_q). However, the conjugate gradient method is essentially based on the generation of a set of mutually orthogonal vectors d0, d1, . . . . This concept continues to make sense in the setting of a finite field.

If A is a real positive definite matrix, we cannot have 〈di, Adi〉 = 0 for a non-zero vector di. But this condition need not hold for a matrix A over 𝔽_q. Similarly, we may have a non-zero error vector ei over 𝔽_q for which 〈ei, ei〉 = 0. (Again this is not possible for real vectors.) So for the iterations over 𝔽_q (more precisely, the computations of ai and bi) to proceed gracefully, all that we can hope for is that before reaching the solution we never hit a non-zero direction vector di for which 〈di, Adi〉 = 0, nor a non-zero error vector ei for which 〈ei, ei〉 = 0. If q is sufficiently large and if the initial minimizer x0 is chosen sufficiently randomly, then the probability of encountering such a bad di or ei is rather low, and as a result the method is very likely to terminate without problems. If, by a terrible stroke of bad luck, we have to abort the computation prematurely, we should restart the procedure with a new random initial vector x0. If q is small (say, q = 2 as in the case of the QSM), it is a neater idea to select the entries of the initial vector x0 from an extension field of 𝔽_q and to work in this extension. The eventual solution we reach will be in 𝔽_q, but working in the larger field decreases the possibility of an attempted division by 0.

There is, however, a brighter side of using a finite field 𝔽_q in place of ℝ: every calculation we perform in 𝔽_q is exact, and we do not have to bother about a criterion for determining whether an error vector ei is zero, or about the conditioning of the matrix A. One of the biggest headaches of numerical analysis is absent here.
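Algorithm 4.8 over a prime field, together with the restart strategy just described, can be sketched as follows; the prime and the test system are illustrative choices, and the symmetric matrix M (here M = A^tA) is assumed non-singular, since otherwise the restart loop may never terminate:

```python
import random

# Conjugate gradient over a prime field (sketch of Algorithm 4.8). We solve
# M x = c for symmetric non-singular M, restarting from a fresh random x0
# whenever <e, e> = 0 or <d, Md> = 0 is hit before convergence.

p = 10007

def dot(v, w):
    return sum(a * b for a, b in zip(v, w)) % p

def matvec(M, v):
    return [dot(row, v) for row in M]

def cg_mod_p(M, c, rng):
    n = len(M)
    while True:
        x = [rng.randrange(p) for _ in range(n)]      # random initial minimizer
        e = [(ci - mi) % p for ci, mi in zip(c, matvec(M, x))]
        d = e[:]
        for _ in range(n + 1):
            if all(v == 0 for v in e):
                return x                              # exact solution reached
            Md = matvec(M, d)
            ee, dMd = dot(e, e), dot(d, Md)
            if ee == 0 or dMd == 0:
                break                                 # bad luck: restart
            a = ee * pow(dMd, p - 2, p) % p
            x = [(xi + a * di) % p for xi, di in zip(x, d)]
            e = [(ei - a * mi) % p for ei, mi in zip(e, Md)]
            b = dot(e, e) * pow(ee, p - 2, p) % p
            d = [(ei + b * di) % p for ei, di in zip(e, d)]
```

A typical use solves A^tAx = A^tb for a small non-singular A; since A^t is invertible, the computed x also satisfies Ax = b.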

4.7.3. The Lanczos Method

The Lanczos method is another iterative method quite similar to the conjugate gradient method. The basic difference between these methods lies in the way by which the mutually conjugate directions d0, d1, . . . are generated. For the Lanczos method, we start with the initializations: d0 := b, , , x0 = a0d0. Then, for i = 1, 2, . . . , we repeat the steps in Algorithm 4.9 as long as .

Algorithm 4.9. An iteration in the Lanczos method

vi+1 := Adi.

.

.

xi := xi–1 + aidi.

If A is a real positive definite matrix, the termination criterion is equivalent to the condition di = 0. When this is satisfied, the vector xi–1 equals the desired solution x of the system Ax = b. Since d0, d1, . . . are mutually orthogonal, the process must stop after at most n – 1 iterations. Therefore, for a sparse matrix A, the entire procedure performs O~(n^2) field operations.

The problems we face with the Lanczos method applied to a system over 𝔽_q are essentially the same as those discussed in connection with the conjugate gradient method. The problem with a non-symmetric and/or non-square matrix A is solved by multiplying the system by A^t. Instead of working with A^tA explicitly, we prefer to multiply separately by A and A^t.

The more serious problem with a system over 𝔽_q is that of encountering a non-zero direction vector di with 〈di, Adi〉 = 0. If this happens, we have to abort the computation prematurely. In order to restart the procedure, we try to solve the system BAx = Bb, where B is a diagonal matrix whose diagonal elements are chosen randomly from the non-zero elements of the field or of some suitable extension (if q is small).

4.7.4. The Wiedemann Method

The Wiedemann method for solving a sparse system Ax = b over a finite field uses ideas different from those employed by the other methods discussed so far. For the sake of simplicity, we assume that A is a square non-singular matrix (not necessarily symmetric). The Wiedemann method tries to compute the minimal polynomial μA(X) = X^d + c_{d–1}X^{d–1} + ··· + c1X + c0, d ≤ n, of A. To that end, one selects a small positive integer l in the range 10 ≤ l ≤ 20. For i = 0, 1, . . . , 2n, let vi denote the column vector of length l consisting of the first l entries of the vector A^ib. For the working of the Wiedemann method, we need to compute only the vectors v0, . . . , v2n. If A is a sparse matrix, this computation involves a total of O~(n^2) field operations.

Since μA(A) = 0, we have μA(A)A^ib = 0 for every i ≥ 0. Therefore, for each k = 1, . . . , l the sequence v0,k, v1,k, . . . of the k-th entries of v0, v1, . . . satisfies the linear recurrence

vi+d,k + c_{d–1}vi+d–1,k + ··· + c1vi+1,k + c0vi,k = 0 for all i ≥ 0.

But then the minimal polynomial μk(X) of the k-th such sequence is a factor of μA(X). There are methods (most notably the Berlekamp–Massey algorithm) that compute each μk(X) using O(n^2) field operations. We then expect to obtain μA(X) = lcm(μk(X) | 1 ≤ k ≤ l).

The assumption that A is non-singular is equivalent to the condition that c0 ≠ 0. In that case, the solution vector x = –c0^{–1}(A^{d–1}b + c_{d–1}A^{d–2}b + ··· + c1b) can be computed using O~(n^2) arithmetic operations in the field.

If A is singular, we may find out linear dependencies among the rows of A and subsequently throw away suitable rows. Doing this repeatedly eventually gives us a non-singular A. For further details on the Wiedemann method, see [303].
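The whole method can be sketched compactly using a single random projection u in place of the first l coordinates, retrying with a fresh u whenever the recovered polynomial fails to produce a verified solution; the Berlekamp–Massey algorithm serves as the sequence-minimal-polynomial routine:

```python
import random

# Wiedemann's method (sketch): recover a solution of Ax = b over Z_p from the
# minimal polynomial of the scalar sequence s_i = <u, A^i b> for random u.

p = 10007

def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) % p for row in A]

def berlekamp_massey(s):
    # connection polynomial C with s_n + sum_{i>=1} C[i] s_{n-i} = 0
    C, B, L, m, bb = [1], [1], 0, 1, 1
    for n in range(len(s)):
        d = (s[n] + sum(C[i] * s[n - i] for i in range(1, L + 1))) % p
        if d == 0:
            m += 1
            continue
        T = C[:]
        coef = d * pow(bb, p - 2, p) % p
        if len(B) + m > len(C):
            C += [0] * (len(B) + m - len(C))
        for i in range(len(B)):
            C[i + m] = (C[i + m] - coef * B[i]) % p
        if 2 * L <= n:
            L, B, bb, m = n + 1 - L, T, d, 1
        else:
            m += 1
    return C[:L + 1]          # mu(X) = X^L + C[1] X^(L-1) + ... + C[L]

def wiedemann(A, b, rng):
    n = len(A)
    while True:
        u = [rng.randrange(p) for _ in range(n)]
        seq, v = [], b[:]
        for _ in range(2 * n):                       # s_i = <u, A^i b>
            seq.append(sum(ui * vi for ui, vi in zip(u, v)) % p)
            v = matvec(A, v)
        C = berlekamp_massey(seq)
        L = len(C) - 1
        if L == 0 or C[L] == 0:                      # c0 = 0: unlucky u
            continue
        # x = -c0^{-1} (A^(L-1) b + C[1] A^(L-2) b + ... + C[L-1] b)
        w = b[:]
        for i in range(1, L):
            Aw = matvec(A, w)
            w = [(ai + C[i] * bi) % p for ai, bi in zip(Aw, b)]
        inv = pow(-C[L] % p, p - 2, p)
        x = [wi * inv % p for wi in w]
        if matvec(A, x) == [bi % p for bi in b]:     # verify; retry otherwise
            return x
```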

4.8. The Subset Sum Problem

In this section, we assume that A = {a1, . . . , an} ⊆ ℕ is a knapsack set. For s ∈ ℕ, we are required to find ∊1, . . . , ∊n ∈ {0, 1} such that ∊1a1 + ··· + ∊nan = s, provided that a solution exists. In general, finding such a solution ∊1, . . . , ∊n is a very difficult problem.[6] However, if the weights ai satisfy some specific bounds, there exist polynomial-time algorithms for solving the SSP.

[6] In the language of complexity theory, the decision problem of determining whether a solution of the SSP exists is NP-complete.

Let us first define an important quantity associated with a knapsack set:

Definition 4.5.

The density of the knapsack set A = {a1, . . . , an} is defined to be the real number d(A) := n/lg(max(a1, . . . , an)).

If d(A) > 1, then, in general, the SSP has more than one solution (provided that it has one at all). This makes the corresponding knapsack set A unsuitable for cryptographic purposes. So we consider low densities: that is, the case d(A) ≤ 1.

There are certain algorithms that reduce in polynomial time the problem of finding a solution of the SSP to that of finding a shortest (non-zero) vector in a lattice. Assuming that such a vector is computable in polynomial time, Lagarias and Odlyzko’s reduction algorithm [157] solves the SSP in polynomial time with high probability, if d(A) ≤ 0.6463. An improved version of the algorithm works for densities d(A) ≤ 0.9408 (see Coster et al. [64] and Coster et al. [65]). The reduction algorithm is easy and will be described in Section 4.8.1. However, it is not known how to compute a shortest non-zero vector in a lattice efficiently. The Lenstra–Lenstra–Lovasz (L3) polynomial-time lattice-basis reduction algorithm [166] provably finds a non-zero vector whose length is at most the length of a shortest non-zero vector, multiplied by a power of 2. In practice, however, the L3 algorithm tends to compute a shortest vector quite often. Section 4.8.2 deals with the L3 lattice-basis reduction algorithm.

Before providing a treatment on lattices, let us introduce a particular case of the SSP, which is easily (and uniquely) solvable.

Definition 4.6.

A knapsack set {a1, . . . , an} with a1 < ··· < an is said to be superincreasing, if aj > a1 + a2 + ··· + aj–1 for all j = 2, . . . , n.

Algorithm 4.10 solves the SSP for a superincreasing knapsack set in deterministic polynomial time. The proof for the correctness of this algorithm is easy and left to the reader.

Algorithm 4.10. Solving the superincreasing knapsack problem

Input: A superincreasing knapsack set {a1, . . . , an} with a1 < ··· < an and s ∈ ℕ.

Output: The (unique) solution ∊1, . . . , ∊n ∈ {0, 1} of ∊1a1 + ··· + ∊nan = s, if it exists; “failure”, otherwise.

Steps:

for i = n, n – 1, . . . , 1 {
   if (s ≥ ai) { ∊i := 1, s := s – ai. } else { ∊i := 0. }
}
if (s = 0) { Return (∊1, . . . , ∊n). } else { Return “failure”. }
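In Python, Algorithm 4.10 is a few lines (with None standing in for “failure”):

```python
# Algorithm 4.10: solving a superincreasing knapsack by a greedy scan from
# the largest weight downwards.

def solve_superincreasing(a, s):
    # a must be sorted and superincreasing: a[j] > a[0] + ... + a[j-1]
    eps = [0] * len(a)
    for i in range(len(a) - 1, -1, -1):
        if s >= a[i]:
            eps[i] = 1
            s -= a[i]
    return eps if s == 0 else None     # None plays the role of "failure"
```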

4.8.1. The Low-Density Subset Sum Problem

We start by defining a lattice.

Definition 4.7.

Let n, d ∈ ℕ, d ≤ n, and let v1, . . . , vd ∈ ℝ^n be d linearly independent (non-zero) vectors (that is, n-tuples). The lattice L of dimension d spanned by v1, . . . , vd is the set of all ℤ-linear combinations of v1, . . . , vd, that is,

L = {r1v1 + ··· + rdvd | r1, . . . , rd ∈ ℤ}.
We say that v1, . . . , vd constitute a basis of L.

In general, a lattice may have more than one basis. We are interested in bases consisting of short vectors, where the concept of shortness is with respect to the following definition.

Definition 4.8.

Let v := (v1, . . . , vn)^t and w := (w1, . . . , wn)^t be two n-dimensional vectors in ℝ^n. The inner product of v and w is defined to be the real number

v, w〉 := v1w1 + ··· + vnwn,

and the length of v is defined as

‖v‖ := √〈v, v〉.
For the time being, let us assume the availability of a lattice oracle which, given a lattice, returns a shortest non-zero vector in the lattice. The possibilities for realizing such an oracle will be discussed in the next section.

Consider the subset sum problem with the knapsack set A := {a1, . . . , an} and let B be an upper bound on the weights (that is, each ai ≤ B). For s ∈ ℕ, we are supposed to find ∊1, . . . , ∊n ∈ {0, 1} such that ∊1a1 + ··· + ∊nan = s. Let L be the (n + 1)-dimensional lattice in ℝ^{n+1} generated by the vectors

v1 := (1, 0, . . . , 0, Na1)^t,
v2 := (0, 1, . . . , 0, Na2)^t,
. . .
vn := (0, 0, . . . , 1, Nan)^t,
vn+1 := (1/2, 1/2, . . . , 1/2, Ns)^t,

where N is an integer larger than (1/2)√n. The vector v := ∊1v1 + ··· + ∊nvn – vn+1 = (∊1 – 1/2, . . . , ∊n – 1/2, 0)^t is in the lattice L, where ∊1, . . . , ∊n is a solution of the given instance of the SSP. Involved calculations (carried out in Coster et al. [64, 65]) show that the probability P of the existence of a vector w ∈ L, w ∉ {0, ±v}, with ‖w‖ ≤ ‖v‖ satisfies P = O(2^{cn}/B), where c ≈ 1.0628. Now, if the density d(A) of A is less than 1/c ≈ 0.9408, then B = 2^{c′n} for some c′ > c and, therefore, P → 0 as n → ∞. In other words, if d(A) < 0.9408, then, with high probability, ±v are the shortest non-zero vectors of L. The lattice oracle then returns such a vector, from which the solution ∊1, . . . , ∊n can be readily computed.

4.8.2. The Lattice-Basis Reduction Algorithm

Let L be a lattice in ℝ^n specified by a basis of n linearly independent vectors v1, . . . , vn. We now construct a basis v1*, . . . , vn* of ℝ^n such that 〈vi*, vj*〉 = 0 (that is, vi* and vj* are orthogonal to each other) for all i, j, i ≠ j. Note that v1*, . . . , vn* need not be a basis for L. Algorithm 4.11 is known as the Gram–Schmidt orthogonalization procedure.

Algorithm 4.11. Gram–Schmidt orthogonalization

Input: A basis v1, . . . , vn of ℝ^n.

Output: The Gram–Schmidt orthogonalization v1*, . . . , vn* of v1, . . . , vn.

Steps:

v1* := v1.
for i = 2, . . . , n {
   vi* := vi – μi,1v1* – μi,2v2* – ··· – μi,i–1vi–1*, where μi,j := 〈vi, vj*〉/〈vj*, vj*〉.
}

One can easily verify that v1*, . . . , vn* constitute an orthogonal basis of ℝ^n. Using these notations, we introduce the following important concept:
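Algorithm 4.11 in exact rational arithmetic (a sketch; vectors are plain lists):

```python
from fractions import Fraction

# Gram-Schmidt orthogonalization (Algorithm 4.11) over the rationals.

def dot(v, w):
    return sum(a * b for a, b in zip(v, w))

def gram_schmidt(basis):
    vstar, mu = [], []
    for v in basis:
        coeffs = [dot(v, w) / dot(w, w) for w in vstar]     # mu_{i,j}
        w = [Fraction(x) for x in v]
        for m, u in zip(coeffs, vstar):                     # subtract projections
            w = [wi - m * ui for wi, ui in zip(w, u)]
        vstar.append(w)
        mu.append(coeffs)
    return vstar, mu
```

For the basis (3, 1), (2, 2) one gets v1* = (3, 1), μ2,1 = 4/5 and v2* = (–2/5, 6/5), which is indeed orthogonal to v1*.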

Definition 4.9.

The basis v1, . . . , vn is called a reduced basis of L, if

Equation 4.16

|μi,j| ≤ 1/2 for all 1 ≤ j < i ≤ n,
and

Equation 4.17

‖vi* + μi,i–1vi–1*‖^2 ≥ (3/4)‖vi–1*‖^2 for all i = 2, . . . , n.
A reduced basis v1, . . . , vn of L is termed so, because the vectors vi are somewhat short. More precisely, we have Theorem 4.5, the proof of which is not difficult, but is involved, and is omitted here.

Theorem 4.5.

Let v1, . . . , vn be a reduced basis of a lattice L, and let m ∈ {1, . . . , n}. For any m linearly independent vectors w1, . . . , wm of L, we have

‖vi‖^2 ≤ 2^{n–1} max(‖w1‖^2, . . . , ‖wm‖^2)

for all i = 1, . . . , m. In particular, for any non-zero vector w of L we have

‖v1‖^2 ≤ 2^{n–1}‖w‖^2.

That is, for a reduced basis v1, . . . , vn of L the length of v1 is at most 2^{(n–1)/2} times that of a shortest non-zero vector in L.

Given an arbitrary basis v1, . . . , vn of a lattice L, the L3 basis reduction algorithm computes a reduced basis of L. The algorithm starts by computing the Gram–Schmidt orthogonalization of v1, . . . , vn. The rational numbers μi,j are also available from this step. We also obtain as byproducts the numbers Vi := ‖vi*‖^2 for i = 1, . . . , n.

Algorithm 4.12 enforces Condition (4.16) |μk,l| ≤ 1/2 for a given pair of indices k and l. The essential work done by this routine is subtracting a suitable multiple of vl from vk and updating the values μk,1, . . . , μk,l accordingly.

Algorithm 4.12. Subroutine for basis reduction

Input: Two indices k and l.

Output: An update of the basis vectors to ensure |μk,l| ≤ 1/2.

Steps:

r := the integer closest to μk,l.

vk := vk – rvl.

for h = 1, . . . , l – 1 {μk,h := μk,hrμl,h. }

μk,l := μk,lr.

If Condition (4.17) is not satisfied by some k, that is, if Vk < (3/4 – μk,k–1^2)Vk–1, then vk and vk–1 are swapped. The necessary changes in the values Vk, Vk–1 and certain μi,j’s should also be incorporated. This is explained in Algorithm 4.13.

Algorithm 4.13. Subroutine for basis reduction

Input: An index k.

Output: An update of the basis vectors to restore Condition (4.17) for the index k.

Steps:

μ := μk,k–1.   V := Vk + μ2Vk–1.
μk,k–1 := μVk–1/V.   Vk := Vk–1Vk/V.   Vk–1 := V.
Swap (vk ↔ vk–1).
for h = 1, . . . , k – 2 { Swap (μk,h ↔ μk–1,h). }
for h = k + 1, . . . , n {
   μ′ := μh,k–1 – μμh,k.   μh,k–1 := μh,k + μk,k–1μ′.   μh,k := μ′.
}

The main basis reduction algorithm is described in Algorithm 4.14. It is not obvious that this algorithm should terminate at all. Consider the quantity D := d1 · · · dn–1, where di := | det(〈vk, vl〉)1≤k,l≤i| for each i = 1, . . . , n – 1. At the beginning of the basis reduction procedure one has di ≤ B^i for all i, where B := max(‖vi‖^2 | 1 ≤ i ≤ n). It can be shown that an invocation of Algorithm 4.12 does not alter the value of D, whereas interchanging vk and vk–1 in Algorithm 4.13 decreases D by a factor < 3/4. It can also be shown that for any basis of L the value D is bounded from below by a constant which depends only on the lattice. Thus, Algorithm 4.14 stops after finitely many steps.

Algorithm 4.14. Basis reduction in a lattice

Input: A basis v1, . . . , vn of a lattice L.

Output: v1, . . . , vn converted to a reduced basis.

Steps:

Compute the Gram–Schmidt orthogonalization of v1, . . . , vn (Algorithm 4.11).

/* The initial values of μi,j and Vi are available at this point */
i := 2.
while (i < n) {
   if (|μi,i–1| > 1/2) { Call Algorithm 4.12 with k = i and l = i – 1. }
   if (Vi < (3/4 – μi,i–1^2)Vi–1) {
      Call Algorithm 4.13 with k = i.
      i := max(2, i – 1).
   }
   for j = i – 2, i – 3, . . . , 1 {
      if (|μi,j| > 1/2) { Call Algorithm 4.12 with k = i and l = j. }
   }
   i++.
}
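A compact sketch of the whole reduction in exact rational arithmetic; for clarity, the Gram–Schmidt data is recomputed after each change instead of being updated incrementally as in Algorithms 4.12 and 4.13 (which is what makes the real algorithm efficient):

```python
from fractions import Fraction

# The L3 basis reduction algorithm (sketch). The output basis satisfies
# Conditions (4.16) and (4.17).

def dot(v, w):
    return sum(a * b for a, b in zip(v, w))

def gso(b):
    # Gram-Schmidt orthogonalization, returning (b*, mu) as in Algorithm 4.11.
    vstar, mu = [], []
    for v in b:
        coeffs = [dot(v, w) / dot(w, w) for w in vstar]
        w = [Fraction(x) for x in v]
        for m, u in zip(coeffs, vstar):
            w = [wi - m * ui for wi, ui in zip(w, u)]
        vstar.append(w)
        mu.append(coeffs)
    return vstar, mu

def lll(b):
    b = [list(map(Fraction, v)) for v in b]
    n, k = len(b), 1
    while k < n:
        _, mu = gso(b)
        for j in range(k - 1, -1, -1):         # enforce |mu_{k,j}| <= 1/2
            r = round(mu[k][j])
            if r:
                b[k] = [x - r * y for x, y in zip(b[k], b[j])]
                _, mu = gso(b)
        vstar, mu = gso(b)
        Vk = dot(vstar[k], vstar[k])
        Vk1 = dot(vstar[k - 1], vstar[k - 1])
        if Vk >= (Fraction(3, 4) - mu[k][k - 1] ** 2) * Vk1:   # Condition (4.17)
            k += 1
        else:
            b[k], b[k - 1] = b[k - 1], b[k]    # swap and step back
            k = max(k - 1, 1)
    return b
```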

For a more complete treatment of the L3 basis reduction algorithm, we refer the reader to Lenstra et al. [166] (or Mignotte [203]). It is important to note here that the L3 basis reduction algorithm is at the heart of the Lenstra–Lenstra–Lovasz algorithm for factoring polynomials in ℚ[X]. This factoring algorithm indeed runs in time polynomially bounded by the degree (and the coefficient size) of the polynomial to be factored and is one of the major breakthroughs in the history of symbolic computing.

Exercise Set 4.8

4.27 Let A = {a1, . . . , an} be a knapsack set. Show that:
  1. If A is superincreasing with a1 < ··· < an, then ai ≥ 2^{i–1} for all i = 1, . . . , n and hence d(A) ≤ n/(n – 1).

  2. If a1 + ··· + an < 2^n – 1, then there exist two different tuples (∊1, . . . , ∊n) and (∊′1, . . . , ∊′n) in {0, 1}^n such that ∊1a1 + ··· + ∊nan = ∊′1a1 + ··· + ∊′nan.

4.28 Let L be a lattice in ℝ^n and let v1, . . . , vn constitute a basis of L. The determinant of L is defined by

det L := | det(v1, . . . , vn)|.

  1. Show that det L is an invariant of the lattice L (that is, independent of the basis v1, . . . , vn of L).

    Let v1*, . . . , vn* be the Gram–Schmidt orthogonalization of the basis v1, . . . , vn.

  2. Show that det L = ‖v1*‖ · · · ‖vn*‖.

  3. Prove the Hadamard inequality: det L ≤ ‖v1‖ · · · ‖vn‖.

Chapter Summary

This chapter introduces the most common computationally intractable mathematical problems on which the security of public-key cryptosystems rests. We also describe the principal algorithms known to date for solving these difficult computational problems.

To start with, we enumerate these computational problems. The first is the integer factorization problem (IFP) and its several variants. Some problems that are provably or believably equivalent to the IFP are the totient problem, problems associated with the RSA algorithm, and the modular square root problem. The next class of problems includes the discrete logarithm problem (DLP) and its variants on elliptic curves (ECDLP) and hyperelliptic curves (HECDLP). The Diffie–Hellman problem (DHP) and its variants (ECDHP, HECDHP) are believed to be equivalent to the respective variants of the DLP. Finally, the subset sum problem (SSP) and two related problems, namely the shortest vector problem (SVP) and the closest vector problem (CVP) on lattices, are introduced.

The subsequent sections are devoted to an algorithmic study of these difficult problems. We start with IFP. We first present some fully exponential algorithms like trial division, Pollard’s rho method, Pollard’s p – 1 method and Williams’ p + 1 method. Next we describe the modern genre of subexponential algorithms. The quadratic sieve method (QSM) is discussed at length together with its heuristic improvements like incomplete sieving, large prime variation and the multiple polynomial variant. We also describe TWINKLE, a hardware device that efficiently implements the sieving stage of the QSM. We then discuss the elliptic curve method (ECM) and the number field sieve method (NFSM) for factoring integers. The NFSM turns out to be the asymptotically fastest known algorithm for factoring integers.

The (finite field) DLP is discussed next. The older square-root methods, such as Shanks’ baby-step–giant-step method (BSGS), Pollard’s rho method and the Pohlig–Hellman method (PHM), take exponential running times in the worst case. The PHM for a field F_q is, however, efficient if q – 1 has only small prime factors. Next we discuss the modern family of algorithms collectively known as the index calculus method (ICM). For prime fields, we discuss three variants of the ICM, namely the basic method, the linear sieve method (LSM) and the number field sieve method (NFSM). We also discuss three variants of the ICM for fields of characteristic 2: the basic method, the linear sieve method and Coppersmith’s algorithm. Another interesting variant is the cubic sieve method (CSM) covered in the exercises. We explain Gordon and McCurley’s polynomial sieving in connection with Coppersmith’s algorithm.

The next section deals with algorithms for solving the ECDLP. For a general elliptic curve, the exponential square-root methods are the only known algorithms. For some special classes of curves, more efficient methods are proposed in the literature. The MOV reduction based on the Weil pairing reduces the ECDLP on a curve over F_q to the DLP in the finite field F_q^k for some suitable k. This k is small and the reduction is efficient for supersingular curves. The SmartASS method (also called the anomalous method) reduces the ECDLP in an anomalous curve to the computation of p-adic discrete logarithms. This reduction solves the original DLP in polynomial time. In view of these algorithms, it is preferable to avoid supersingular and anomalous curves in cryptographic applications. Finally, the xedni calculus method (XCM) is discussed. This algorithm works by lifting a curve over F_p to a curve over Q. Experimental and theoretical evidence suggests that the XCM is not an efficient solution to the ECDLP.

We then devote a section to the study of an index calculus method to solve the HECDLP. For hyperelliptic curves of small genus, this method leads to a subexponential algorithm (the ADH–Gaudry algorithm).

Many of the above subexponential methods require solving a system of linear congruences over finite rings. This (inherently sequential) linear algebra part often turns out to be the bottleneck of the algorithms. However, the fact that these equations are necessarily sparse can be effectively exploited, and some faster algorithms can be used to solve these systems. We study four such algorithms: structured Gaussian elimination, the conjugate gradient method, the Lanczos method and the Wiedemann method.

In the last section, we study the subset sum problem. We first reduce the SSP to problems associated with lattices. We finally present the lattice-basis reduction algorithm due to Lenstra, Lenstra and Lovász.

Several other computationally intractable problems have been proposed in the literature for building cryptographic systems. Some of these problems are mentioned in the annotated references of Chapter 5. Due to space and time limitations, we will not discuss these problems in this book.

Suggestions for Further Reading

The integer factorization problem is one of the oldest computational problems. Though the exact notion of computational complexity took shape only after the advent of computers, the apparent difficulty of the factorization problem was noticed centuries ago. Crandall and Pomerance [69] call it the fundamental computational problem of arithmetic. Numerous books and articles discuss this subject at varying levels of coverage. Crandall and Pomerance [69] is perhaps the most extensive in this regard. The reader can also take a look at Bressoud’s (much simpler) book [36] or the (compact, yet reasonably detailed) Chapter 10 of Henri Cohen’s book [56]. The articles by Lenstra et al. [164] and by Montgomery [211] are also worth reading.

John M. Pollard has his name attached to three modern inventions in the arena of integer factorization. In [238, 239], he introduces the rho and p – 1 methods. (Later he was part of the team that designed the number-field sieve factoring algorithm.) Williams’ p + 1 method appears in 1982 in [305].

The continued fraction method (CFRAC) is apparently the first known subexponential-time integer factoring algorithm. It is based on the work of Lehmer and Powers [162] and first appears in its currently used form in Morrison and Brillhart’s paper [213]. CFRAC was the most widely used integer factoring algorithm during the late 1970s and early 1980s.

The quadratic sieve method, invented by Carl Pomerance [241] in 1984, supersedes the CFRAC method. The multiple-polynomial QSM appears in Silverman [279]. Hendrik Lenstra’s elliptic curve method [174] was proposed almost concurrently with the QSM. Nowadays, the QSM and the ECM are the most commonly used factoring methods. Reyneri’s cubic sieve method is described in Lenstra and Lenstra [165].

The theoretically superior number field sieve method follows from Pollard’s factoring method using cubic integers [240]. The initial proposal for the NFS method is that of the simple NFS and appears in Lenstra et al. [167]. It is later modified to the general NFS method in Buhler et al. [41]. Lenstra and Lenstra [165] is a compilation of papers on the NFS method. Though the NFS method is the asymptotically fastest factoring method, its fairly complicated implementation makes the algorithm superior to the QSM or the ECM only when the bit size of the integer to be factored is reasonably large.

Shamir’s factoring engine TWINKLE is proposed in [269]. A. K. Lenstra and Shamir analyse and optimize its design in [168]. Shamir and Tromer [270] have proposed a device called TWIRL (The Weizmann Institute Relation Locator) that is geared to the NFS factoring method. It is estimated that a TWIRL implementation costing US$10K can complete the sieving for a 512-bit RSA modulus in less than 10 minutes, whereas one that does the same for a 1024-bit RSA modulus costs US$10–50M and takes about one year. Lenstra et al. [163] provide a more detailed analysis of these estimates. See Lenstra et al. [169] for Bernstein’s factorization circuit, another implementation of the NFS factoring method.

The (finite field) discrete logarithm problem has also attracted much research in the last few decades. The older square-root methods are described well in the book [191] by Menezes. Donald Knuth attributes the baby-step–giant-step method to Daniel Shanks. See Stein and Teske [290] for various optimizations of the baby-step–giant-step method. Pollard’s rho method for discrete logarithms is an adaptation of his rho method for integer factorization. See Pohlig and Hellman [234] for the Pohlig–Hellman method.

The first idea of the index calculus method appears in Western and Miller [302]. Coppersmith et al. [59] describe three variants of the index calculus method: the linear sieve method, the residue list sieve method and the Gaussian integer method. The same paper also proposes the cubic sieve method (CSM). LaMacchia and Odlyzko [158] describe an implementation of the linear sieve and the Gaussian integer methods. Das and Veni Madhavan [73] make an implementation study of the CSM. Also look at the survey [189] by McCurley.

Gordon [119] uses number field sieves for computing discrete logarithms over prime fields. Weber et al. [261, 299, 300, 301] have implemented and proved the practicality of the number field sieve method. Also see Schirokauer’s paper [260].

Odlyzko [225] surveys the algorithms for computing discrete logs in the fields F_2^n. The best algorithm for these fields is Coppersmith’s algorithm [57]. No analog of this algorithm is known for prime fields. Gordon and McCurley [120] use Coppersmith’s algorithm for the computation of discrete logarithms in fields of this form.

The article [226] by Odlyzko and the one [242] by Pomerance are two recent surveys on the finite field discrete logarithm problem. Also see Buchmann and Weber [40].

The elliptic curve discrete logarithm problem seems to be a very difficult computational problem. A direct adaptation of the index calculus method is expected to lead to a running time worse than that of brute-force search (Silverman and Suzuki [278]; Blake et al. [24]). Menezes et al. [193] reduce the problem of computing discrete logs in an elliptic curve over F_q to computing discrete logs in the field F_q^k for some k. For supersingular elliptic curves, this k can be chosen to be small. For a general curve, the MOV reduction takes exponential time (Balasubramanian and Koblitz [16]). The SmartASS method is due to Smart [282], Satoh and Araki [257] and Semaev [265]. Joseph H. Silverman proposes the xedni calculus method in [277]. This method has been experimentally and heuristically shown to be impractical by Jacobson et al. [139].

Adleman et al. [2] propose the first subexponential algorithm for the hyperelliptic curve discrete log problem. This algorithm is applicable to curves of high genus over prime fields. The analysis of its running time is based on certain heuristic assumptions. Enge [86] provides a subexponential algorithm which has a rigorously provable running time and which works for curves over an arbitrary finite field F_q. Again, the algorithm demands curves of high genus. An implementation of the Adleman–DeMarrais–Huang algorithm is given by Gaudry [105]. Also see Enge and Gaudry [87].

Gaudry et al. [107] propose a Weil-descent attack for the hyperelliptic curve discrete log problem. This is modified in Galbraith [100] and Galbraith et al. [101].

Coppersmith et al. [59] describe sparse system solvers. LaMacchia and Odlyzko [159] implement these methods. For further details, see Montgomery [212], Coppersmith [58], Wiedemann [303], and Yang and Brent [306].

That public-key cryptosystems can be based on the subset-sum problem (or the knapsack problem) was recognized at the beginning of the era of public-key cryptography. Historically, the first realization of a public-key system follows this line and is due to Merkle and Hellman [196]. But the Merkle–Hellman system and several variants of it have been broken; see Shamir [266], for example. At present, most public-key systems based on the subset-sum problem are known to be insecure.

The lattice-basis reduction algorithm and the associated L3 algorithm for factoring polynomials appear in the celebrated work [166] of Lenstra, Lenstra and Lovász. Mignotte’s book [203] also describes these topics in good detail.

5. Cryptographic Algorithms

5.1Introduction
5.2Secure Transmission of Messages
5.3Key Exchange
5.4Digital Signatures
5.5Entity Authentication
 Chapter Summary
 Suggestions for Further Reading

An essential element of freedom is the right to privacy, a right that cannot be expected to stand against an unremitting technological attack.

—Whitfield Diffie

Mary had a little key (It’s all she could export), and all the email that she sent was opened at the Fort.

—Ronald L. Rivest

Treat your password like your toothbrush. Don’t let anybody else use it, and get a new one every six months.

—Clifford Stoll

5.1. Introduction

As we pointed out in Chapter 1, cryptography aims to guard sensitive data against unauthorized access. We now describe some algorithms that achieve this goal, restricting ourselves to public-key algorithms. In practice, however, public-key algorithms are used in tandem with secret-key algorithms. In this chapter, we describe only the basic routines, whose inputs are mathematical entities like integers, elements of finite fields, or points on curves. Message encoding will be dealt with in Chapter 6.

5.2. Secure Transmission of Messages

Consider the standard scenario: a party named Alice, called the sender, wishes to send a secret message m to a party named Bob, called the receiver or recipient, over a public communication channel. A third party, Carol, may intercept and read the message. In order to maintain the secrecy of the message, Alice uses a well-defined transform fe to convert the plaintext message m to the ciphertext message c, and sends c to Bob. Bob possesses some secret information with whose help he applies the reverse transform fd in order to get back m. Carol, who does not know the secret information, cannot retrieve m from c by applying the transformation fd.

In a public-key system, the realization of the transforms fe and fd is based on a key pair (e, d) predetermined by Bob. The public key e is made public, whereas the private key d is kept secret. The encryption transform generates c = fe(m, e). Since e is public knowledge, anybody can generate c from a given m, whereas the decryption transform m = fd(c, d) can be performed only by Bob, who possesses the knowledge of d. The key pair has to be chosen so that knowledge of e does not allow Carol to compute d in feasible time. The intractability of the computational problems discussed in Chapter 4 can be exploited to design such key pairs. The exact realization of the keys e, d and the transforms fe, fd depends on the choice of the underlying intractable problem and also on the way the problem is put to use. Since there are several intractable problems suitable for cryptography, there are several encryption schemes varying widely in algorithmic and mathematical details.

5.2.1. The RSA Public-key Encryption Algorithm

RSA has been the most popular encryption algorithm. Historically, it is also the first public-key encryption algorithm published in the literature (see Rivest et al. [252]). Its security is based on the intractability of the RSAP (or the RSAKIP) discussed in Exercise 4.2. Since both these problems are polynomial-time reducible to the IFP, we often say that the RSA algorithm derives its security from the intractability of the IFP. It may, however, be the case that breaking RSA is easier than factoring integers, though no concrete evidence seems to be available.

RSA key pair

Algorithm 5.1 generates a key pair for RSA.

Algorithm 5.1. RSA key generation

Input: A bit length l.

Output: A random RSA key pair.

Steps:

Generate two different random primes p and q each of bit length l.

n := pq.

Choose an integer e coprime to φ(n) = (p – 1)(q – 1).

d := e^(–1) (mod φ(n)).

Return the pair (n, e) as the public key and the pair (n, d) as the private key.

The length l of the primes p and q should be chosen large enough so as to make the factorization of n infeasible. For short-term security, values of l between 256 and 512 suffice. For long-term security, one may choose l as large as 2,048.

The random primes p and q can be generated using a probabilistic algorithm like those described in Section 3.4.2. Naive primes are normally considered to be sufficiently secure in this respect, since p ± 1 and q ± 1 are expected to have large prime factors in general. Gordon’s algorithm (Algorithm 3.14) can also be used for generating strong primes p and q. Since Gordon’s algorithm runs only nominally slower than the algorithm for generating naive primes, there is no harm in using strong primes. Safe primes, on the other hand, are difficult to generate and may be avoided.

The RSA modulus n is public knowledge. Determining d from n and e is easily doable, given the value of φ(n) = (p – 1)(q – 1) which, in turn, is readily computable, if p and q are known. If an adversary can compute φ(n) (with or without factoring n), the security of the RSA protocol based on the modulus n is compromised. However, computing φ(n) without the knowledge of p and q is (at least historically) a very difficult computational problem, and so, if n is reasonably large, RSA encryption is assumed to be sufficiently secure.
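In fact, knowledge of φ(n) is exactly as good as knowledge of p and q: since p + q = n – φ(n) + 1 and pq = n, the primes are the two roots of the quadratic X² – (p + q)X + n. A small Python illustration of this (the function name is ours):

```python
from math import isqrt

def factor_from_phi(n, phi):
    """Recover the prime factors of n = p*q from phi = (p-1)*(q-1)."""
    s = n - phi + 1          # s = p + q
    disc = s * s - 4 * n     # (p + q)^2 - 4pq = (p - q)^2
    r = isqrt(disc)
    assert r * r == disc     # must be a perfect square for a valid phi
    return (s - r) // 2, (s + r) // 2

# Toy example: p = 61, q = 53, so n = 3233 and phi = 3120.
assert factor_from_phi(3233, 3120) == (53, 61)
```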

RSA encryption is done by raising the plaintext message m to the power e modulo n. In order to speed up this (modular) exponentiation, it is often expedient to take a small value for e (like 3, 257 and 65,537). However, in that case one should adopt certain precautions, as Exercise 5.2 suggests. More specifically, if e entities share a common (small) encryption exponent e but different (pairwise coprime) moduli and if the same message m is encrypted using all these public keys, then an eavesdropper can reconstruct m easily from a knowledge of the e ciphertext messages. Another potential problem of using a small e is that if m is small, that is, if m < n^(1/e), then m can be retrieved by taking the integer e-th root of the ciphertext message.
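The common-exponent pitfall can be demonstrated concretely. The sketch below (a toy illustration with our own function names and parameters) combines the e ciphertexts by the Chinese remainder theorem into m^e modulo the product of the moduli; since m is smaller than every modulus, m^e is recovered exactly as an integer, and an integer e-th root then reveals m:

```python
from math import prod

def integer_nth_root(x, k):
    """floor(x**(1/k)) by binary search on integers."""
    lo, hi = 0, 1 << (x.bit_length() // k + 2)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** k <= x:
            lo = mid
        else:
            hi = mid - 1
    return lo

def broadcast_attack(ciphertexts, moduli, e):
    """Recover m from c_i = m^e mod n_i for e pairwise coprime moduli."""
    N = prod(moduli)
    x = 0
    for c, n in zip(ciphertexts, moduli):
        Ni = N // n
        x = (x + c * Ni * pow(Ni, -1, n)) % N   # CRT combination
    return integer_nth_root(x, e)               # x = m^e exactly, since m^e < N

# Toy demonstration with e = 3 and three pairwise coprime moduli.
m, ns = 42, [77, 221, 437]
cs = [pow(m, 3, n) for n in ns]
assert broadcast_attack(cs, ns, 3) == m
```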

Although the pair (n, d) is sufficient for carrying out RSA decryption, maintaining some additional (secret) information significantly speeds up decryption. To this end, it is often recommended that some or all of the values n, e, d, p, q, d1, d2, h be stored, where d1 := d rem (p – 1), d2 := d rem (q – 1) and h := q^(–1) (mod p).

If n can be factored, then d can be easily computed from the public key (n, e). Conversely, if n, e, d are all known, there is an efficient probabilistic algorithm which factors n. This algorithm is based on the fact that if ed – 1 = 2^s t with t odd, then for at least half of the integers a ∈ Z_n*, there exists σ ∈ {0, 1, . . . , s – 1} such that a^(2^σ t) ≢ ±1 (mod n), whereas a^(2^(σ+1) t) ≡ 1 (mod n). But then the gcd of n and a^(2^σ t) – 1 is a non-trivial factor of n. For the details, solve Exercise 7.9.
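This factoring procedure is easy to prototype. The following Python sketch (our own rendering; the names are illustrative) strips the factor 2^s out of ed – 1 and then searches random bases for a square root of 1 other than ±1 modulo n:

```python
import random
from math import gcd

def factor_rsa_modulus(n, e, d):
    """Split n = p*q given a matching RSA exponent pair (e, d)."""
    k = e * d - 1                 # a multiple of the group order; k = 2^s * t
    t = k
    while t % 2 == 0:
        t //= 2
    while True:
        a = random.randrange(2, n - 1)
        g = gcd(a, n)
        if g > 1:                 # lucky hit: a already shares a factor with n
            return min(g, n // g), max(g, n // g)
        x = pow(a, t, n)
        while x != 1:
            y = x
            x = pow(x, 2, n)      # climb the chain a^t, a^(2t), a^(4t), ...
            if x == 1 and y != n - 1:
                p = gcd(y - 1, n) # y^2 = 1 with y != +-1 splits n
                return min(p, n // p), max(p, n // p)

# Toy key: n = 61 * 53 = 3233, e = 17, d = 2753 (e*d = 1 mod phi(n)).
assert factor_rsa_modulus(3233, 17, 2753) == (53, 61)
```

Each random base succeeds with probability at least 1/2, so only a few iterations are expected.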

Different entities in a given network should use different values of n. If two or more entities share a common n but different exponent pairs (ei, di), then each entity can first factor n and then use this factorization to compute the private keys of other entities. Primes are quite abundant in nature and so finding pairwise coprime RSA moduli for all entities is no problem at all. A common value of the encryption exponent e (for example, a small value of e) can, however, be shared by all entities. In that case, for pairwise different moduli ni, the corresponding decryption exponents di will also be pairwise different.

RSA encryption

RSA encryption is rather simple, as Algorithm 5.2 shows.

Algorithm 5.2. RSA encryption

Input: The RSA public key (n, e) of the recipient and the plaintext message m ∈ Z_n.

Output: The ciphertext message c ∈ Z_n.

Steps:

c := m^e (mod n).

By Exercise 4.1, the exponentiation function m ↦ m^e is bijective; so m can be uniquely recovered from c. It is clear why small encryption exponents e speed up RSA encryption. For a general exponent e, the routine takes time O(log^3 n), whereas for a small e (that is, e = O(1)) the running time drops to O(log^2 n).
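A complete toy run of Algorithms 5.1–5.3 can be written in a few lines of Python (the numbers are the usual textbook toy values, far too small for real use):

```python
# Key generation (Algorithm 5.1) with toy primes.
p, q = 61, 53
n = p * q                       # 3233
phi = (p - 1) * (q - 1)         # 3120
e = 17                          # coprime to phi
d = pow(e, -1, phi)             # modular inverse of e modulo phi

# Encryption (Algorithm 5.2) and decryption (Algorithm 5.3).
m = 65                          # plaintext in Z_n
c = pow(m, e, n)                # c = m^e mod n
assert pow(c, d, n) == m        # c^d mod n recovers m
```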

RSA decryption

RSA decryption (Algorithm 5.3) is analogous to RSA encryption.

Algorithm 5.3. RSA decryption

Input: The RSA private key (n, d) of the recipient and the ciphertext message c ∈ Z_n.

Output: The recovered plaintext message m ∈ Z_n.

Steps:

m := c^d (mod n).

The correctness of this decryption procedure follows from Exercise 4.1. As in the case of encryption, one might go for a small decryption exponent d. In general, both e and d cannot be small simultaneously. If e is small, the security of the RSA scheme is expected not to be affected, whereas small values of d are not desirable for several reasons. First, if d is very small, the adversary chooses some m, computes the corresponding ciphertext c (using public knowledge) and then keeps on computing c^x (mod n) for x = 1, 2, . . . until x = d is reached, that is, until the original message m is recovered.

Even when d is not very small so that the possibility of exhaustive search with x = 1, 2, . . . can be precluded, there are several attacks known for small private exponents. Wiener [304] proposes an efficient algorithm in this respect. Boneh and Durfee [32] improve Wiener’s algorithm. Sun et al. [294] propose three variants of the RSA scheme that are resistant to these attacks. Durfee and Nguyen [82] extend the Boneh–Durfee attack to break two of these three variants. To sum up, it is advisable not to use small secret exponents d, that is, the bit length of d should be close to that of n in order to achieve the desired level of security.

There are alternative ways to speed up RSA decryption. If the values p, q, d1 := d rem (p – 1), d2 := d rem (q – 1) and h := q^(–1) (mod p) are all available to the recipient, he can use Algorithm 5.4 for RSA decryption.

Algorithm 5.4. RSA decryption using CRT

Input: The RSA extended private key (p, q, d1, d2, h) of the recipient and the ciphertext message c ∈ Z_n.

Output: The recovered plaintext message m ∈ Z_n.

Steps:

m1 := c^d1 (mod p).

m2 := c^d2 (mod q).

t := h(m1 – m2) (mod p).

m := m2 + tq.

In this modified routine, m1 := m rem p and m2 := m rem q are first computed and then combined using the CRT to get m modulo n = pq. Algorithm 5.3 performs a single modular exponentiation modulo n, whereas in Algorithm 5.4 two exponentiations modulo p and q respectively take the major portion of the running time. Since an exponentiation modulo N to an exponent O(N) runs in time O(log^3 N), and since each of p and q has a bit length (about) half of that of n, Algorithm 5.4 runs about four times as fast as Algorithm 5.3.
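Algorithm 5.4 can be checked against Algorithm 5.3 on toy numbers. In this Python sketch (our own helper names), the step t := h(m1 – m2) (mod p) is exactly the Garner form of the CRT recombination:

```python
p, q = 61, 53                          # toy primes
n, phi = p * q, (p - 1) * (q - 1)
e = 17
d = pow(e, -1, phi)

d1, d2 = d % (p - 1), d % (q - 1)      # d rem (p - 1), d rem (q - 1)
h = pow(q, -1, p)                      # h = q^(-1) mod p

def decrypt_crt(c):
    """RSA decryption via two half-size exponentiations (Algorithm 5.4)."""
    m1 = pow(c, d1, p)
    m2 = pow(c, d2, q)
    t = (h * (m1 - m2)) % p            # Garner recombination step
    return m2 + t * q                  # m = m2 + t*q lies in [0, n)

c = pow(123, e, n)
assert decrypt_crt(c) == pow(c, d, n) == 123
```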

If only the values p, q, d are stored, then d1, d2 and h can be computed on the fly using relatively inexpensive operations and subsequently Algorithm 5.4 can be used. This leads to a decryption routine almost as fast as Algorithm 5.4, with somewhat smaller memory requirements for the storage of the private key.

5.2.2. The Rabin Public-key Encryption Algorithm

The Rabin public-key encryption algorithm is based on the intractability of computing square roots modulo a composite integer (SQRTP). By Exercise 4.10, the SQRTP is probabilistically polynomial-time equivalent to the IFP, that is, breaking the Rabin scheme is provably as hard as factoring integers. Breaking RSA, on the other hand, is only believed to be equivalent to factoring integers. Moreover, Rabin encryption is faster than RSA encryption (for moduli of the same size).

Rabin key pair

Like RSA, Rabin encryption requires a modulus of the form n = pq.

Algorithm 5.5. Rabin key generation

Input: A bit length l.

Output: A random Rabin key pair.

Steps:

Generate two different random primes p and q each of bit length l.

n := pq.

Return n as the public key and the pair (p, q) as the private key.

Here, the choice of the bit length l and the generation of the primes p and q follow the same guidelines as discussed in connection with RSA key generation.

Rabin encryption

Encryption in the Rabin scheme involves a single modular squaring.

Algorithm 5.6. Rabin encryption

Input: The Rabin public key n of the recipient and the plaintext message m ∈ Z_n.

Output: The ciphertext message c ∈ Z_n.

Steps:

c := m^2 (mod n).

Unfortunately, the Rabin encryption map m ↦ m^2 (mod n) is not injective. In general, a ciphertext c has four square roots modulo n.[1] This poses an ambiguity during decryption. In order to work around this difficulty, one adds some distinguishing feature or redundancy to the message m before encryption. One possibility is to duplicate a predetermined number of bits at the least significant end of m. This reduces the message space somewhat, but is rarely a serious issue. Only one of the (four) square roots of the ciphertext c is expected to have the desired redundancy. If none or more than one square root possesses the redundancy, decryption fails. However, this is a very rare phenomenon and can be ignored for all practical purposes.

[1] More specifically, if an element c ∈ Z_n is a square modulo both p and q, then the number of square roots of c equals 1 if c = 0; it is 2 if either c ≡ 0 (mod p) or c ≡ 0 (mod q) but not both; and it is 4 if c ≢ 0 (mod p) and c ≢ 0 (mod q). If c is not a square modulo either p or q, then c does not possess a square root modulo n. These assertions can be readily proved using the Chinese remainder theorem.

Rabin decryption

Rabin decryption (Algorithm 5.7) involves computing square roots modulo n. Since n is composite, this is a very difficult problem (for the eavesdropper). But the knowledge of the prime factors p and q of n allows the recipient to decrypt.

Algorithm 5.7. Rabin decryption

Input: The Rabin private key (p, q) of the recipient and the ciphertext message c ∈ Z_n.

Output: The recovered plaintext message m ∈ Z_n.

Steps:

if ((c/p) = –1 or (c/q) = –1) { Return “c is not a ciphertext message”. }        /* (c/p), (c/q) denote Legendre symbols */

Compute the square roots of c mod p.        /* Algorithm 3.17 */
Compute the square roots of c mod q.        /* Algorithm 3.17 */
Compute the square roots of c mod n from those mod p and q.        /* Use CRT */

if (c has exactly one distinguished square root m mod n) { Return m. }

else { Return “failure”. }
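The whole Rabin cycle can be traced on toy numbers. The sketch below is our own illustration: it specializes to primes p ≡ q ≡ 3 (mod 4), for which a square root of a quadratic residue c is simply c^((p+1)/4) mod p, and it uses a hypothetical redundancy rule in which the low byte of the message is duplicated; decryption keeps the unique square root carrying that pattern.

```python
p, q = 499, 547                  # toy primes with p = q = 3 (mod 4)
n = p * q
h = pow(q, -1, p)                # q^(-1) mod p, for CRT recombination

def pad(m0):
    """Append a duplicate of the low byte of m0 as redundancy."""
    return (m0 << 8) | (m0 & 0xFF)

def encrypt(m):
    return pow(m, 2, n)          # Algorithm 5.6: c = m^2 mod n

def decrypt(c):
    """Algorithm 5.7 for these primes: list all four roots, keep the padded one."""
    rp = pow(c, (p + 1) // 4, p) # a square root of c modulo p
    rq = pow(c, (q + 1) // 4, q) # a square root of c modulo q
    roots = set()
    for a in (rp, p - rp):
        for b in (rq, q - rq):
            t = (h * (a - b)) % p        # CRT: x = a (mod p), x = b (mod q)
            roots.add(b + t * q)
    good = [x for x in roots if (x >> 8) & 0xFF == x & 0xFF]
    return good[0] >> 8 if len(good) == 1 else None   # None models "failure"

m0 = 0xAB
assert decrypt(encrypt(pad(m0))) == m0
```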

5.2.3. The Goldwasser–Micali Encryption Algorithm

So far, we have encountered encryption algorithms that are deterministic in the sense that for a given public key of the recipient the same plaintext message encrypts to the same ciphertext message. In a probabilistic encryption algorithm, different calls of the encryption routine produce different ciphertext messages for the same plaintext message and public key.

The Goldwasser–Micali encryption algorithm is probabilistic and is based on the intractability of the quadratic residuosity problem (QRP) described in Exercise 4.2. If n is a composite integer and a an integer coprime to n, then a Jacobi symbol value (a/n) = –1 implies that a is a quadratic non-residue modulo n. The converse does not hold, that is, one may have (a/n) = +1 even when a is a quadratic non-residue modulo n. For example, if n is the product of two distinct odd primes p and q, then a is a quadratic residue modulo n if and only if a is a quadratic residue modulo both p and q. However, if (a/p) = (a/q) = –1, we continue to have (a/n) = (a/p)(a/q) = +1. There is no easy way to find out if a is a quadratic residue modulo n for an integer a with (a/n) = +1. If the factorization of n is available, the QRP is solvable in polynomial time. These observations lead to the design of the Goldwasser–Micali scheme.

Goldwasser–Micali key pair

The Goldwasser–Micali scheme works in the ring Z_n, where n is the product of two distinct sufficiently large primes. The integer a (resp. b) in Algorithm 5.8 can be found by randomly choosing elements of Z_p* (resp. Z_q*) and computing the Legendre symbol (a/p) (resp. (b/q)). Under the assumption that quadratic non-residues are randomly located in Z_p* and Z_q*, a and b can be found after only a few trials. The integer x is a quadratic non-residue modulo n with (x/n) = +1.

Goldwasser–Micali encryption

Goldwasser–Micali encryption (Algorithm 5.9) is probabilistic, since its output is dependent on a sequence of random elements ai of Z_n*. It generates a tuple (c1, . . . , cr) of elements of Z_n* such that each ci ≡ x^mi ai^2 (mod n). If mi = 0, then ci is a quadratic residue modulo n, whereas if mi = 1, ci is a quadratic non-residue modulo n. Therefore, if the quadratic residuosity of ci modulo n can be computed, the bit mi can be determined. If one (for example, the recipient) knows the factorization of n or equivalently the prime factor p of n, one can perform decryption easily. An eavesdropper, on the other hand, must solve the QRP (or the IFP) in order to find out the bits m1, . . . , mr. This is how Goldwasser–Micali encryption derives its security.

Algorithm 5.8. Goldwasser–Micali key generation

Input: A bit length l.

Output: A random Goldwasser–Micali key pair.

Steps:

Generate two (different) random primes p and q each of bit length l.

n := pq.

Find out integers a and b such that (a/p) = –1 and (b/q) = –1.

Compute an integer x with x ≡ a (mod p) and x ≡ b (mod q).          /* Use CRT */

Return the pair (n, x) as the public key and the prime p as the private key.

Algorithm 5.9. Goldwasser–Micali encryption

Input: The Goldwasser–Micali public key (n, x) of the recipient and the plaintext message m = m1 . . . mr, which is a bit string of length r.

Output: The ciphertext message (c1, . . . , cr) with each ci ∈ Z_n*.

Steps:

for i = 1, . . . , r {
   Select a random element ai ∈ Z_n*.
   ci := x^mi ai^2 (mod n).
}

Since randomly chosen non-zero elements of Z_n are with high probability coprime to n, it is sufficient to draw ai from Z_n \ {0} and skip the check whether gcd(ai, n) = 1. In fact, if an ai with gcd(ai, n) > 1 is somehow located, this gcd equals a non-trivial factor of n, and the security of the scheme is broken.

The Goldwasser–Micali scheme has the drawback that the length of the ciphertext message is much bigger than that of the plaintext message. Thus, for example, for a 1024-bit modulus n and a message m of bit length 64, the output requires a huge 65,536-bit space. This phenomenon is called message expansion and can be a serious limitation in certain circumstances.

Goldwasser–Micali decryption

Goldwasser–Micali decryption (Algorithm 5.10) recovers the bits of the plaintext message by computing Legendre symbols modulo the prime divisor p of n. The correctness of this decryption algorithm is evident from the discussion immediately following Algorithm 5.9.

Algorithm 5.10. Goldwasser–Micali decryption

Input: The Goldwasser–Micali private key p of the recipient and the ciphertext message (c1, . . . , cr) with each ci ∈ Z_n*.

Output: The recovered plaintext message m = m1 . . . mr with each mi ∈ {0, 1}.

Steps:

for i = 1, . . . , r {
   if ((ci/p) = 1) { mi := 0. } else { mi := 1. }
}
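Algorithms 5.8–5.10 fit in a few lines of Python. This toy sketch (our own function names; the modulus is hopelessly small for security) evaluates Legendre symbols by Euler's criterion; each bit mi is hidden as ci = x^mi ai^2 mod n, and the holder of the private prime p reads it back from the symbol (ci/p):

```python
import random
from math import gcd

p, q = 499, 547                        # toy primes only
n = p * q

def legendre(a, r):
    """Legendre symbol (a/r) for an odd prime r, by Euler's criterion."""
    s = pow(a, (r - 1) // 2, r)
    return -1 if s == r - 1 else s

# Key generation (Algorithm 5.8): x a non-residue mod both p and q, so (x/n) = +1.
while True:
    x = random.randrange(2, n)
    if legendre(x, p) == -1 and legendre(x, q) == -1:
        break

def encrypt(bits):                     # Algorithm 5.9, uses public data (n, x) only
    cs = []
    for m in bits:
        while True:
            a = random.randrange(1, n)
            if gcd(a, n) == 1:         # a in Z_n^*
                break
        cs.append((pow(x, m, n) * a * a) % n)
    return cs

def decrypt(cs):                       # Algorithm 5.10, needs the private prime p
    return [0 if legendre(c, p) == 1 else 1 for c in cs]

msg = [1, 0, 1, 1, 0, 0, 1, 0]
assert decrypt(encrypt(msg)) == msg
```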

5.2.4. The Blum–Goldwasser Encryption Algorithm

The Blum–Goldwasser algorithm is another probabilistic encryption algorithm and improves on the Goldwasser–Micali algorithm in that the message expansion is by only a constant number of bits, irrespective of the length of the plaintext message. The Blum–Goldwasser scheme is based on the intractability of the SQRTP (modulo a composite integer).

Blum–Goldwasser key pair

As in the case of the encryption algorithms discussed so far, the Blum–Goldwasser algorithm works in the ring Z_n, where n = pq is the product of two distinct primes p and q. Now, we additionally demand p and q to be both congruent to 3 modulo 4.

Algorithm 5.11. Blum–Goldwasser key generation

Input: A bit length l.

Output: A random Blum–Goldwasser key pair.

Steps:

Generate two (different) random primes p and q each of bit length l and each congruent to 3 mod 4.

n := pq.

Return n as the public key and the pair (p, q) as the private key.

Since p and q are two different primes, there exist integers u and v such that up + vq = 1. In order to speed up decryption, it is often expedient to store u and v along with p and q in the private key. Recall that the solution of the congruences x ≡ a (mod p) and x ≡ b (mod q) is given by x ≡ vqa + upb (mod n).

Blum–Goldwasser encryption

The Blum–Goldwasser encryption algorithm assumes that the input plaintext message m is in the form of a bit string, and breaks m into substrings of a fixed length t. A typical choice for t is t = ⌊lg lg n⌋, where n is the public key of the recipient. Write m = m1 . . . mr, where each mi is a bit string of length t. The ciphertext consists of r bit strings c1, . . . , cr, each of bit length t, and an element d ∈ Z_n.

Algorithm 5.12. Blum–Goldwasser encryption

Input: The Blum–Goldwasser public key n of the recipient and the plaintext message m = m1 . . . mr, where each mi is a bit string of length t.

Output: The ciphertext message (c1, . . . , cr, d), where each ci is a bit string of length t and d ∈ ℤn*.

Steps:

Choose a random element d ∈ ℤn*.

d := d² (mod n).
for i = 1, . . . , r {
   d := d² (mod n).
   δ := the t least significant bits of d.
   ci := mi ⊕ δ.                                            /* Here ⊕ denotes bit-wise XOR of t-bit strings */
}
d := d² (mod n).

Blum–Goldwasser encryption involves computation of r modular squares in ℤn and is quite fast (for example, faster than RSA encryption with a general encryption exponent). It makes sense to assume that the initial choice of d is from ℤn*, since finding a non-zero non-invertible element of ℤn is as difficult as factoring n.

For an intruder to determine the plaintext message m from the corresponding ciphertext message, the values of d inside the for loop are necessary. These can be obtained by taking repeated square roots modulo n. Since the factorization of n is not known to the intruder, this is a difficult problem. On the other hand, since the recipient knows the prime divisors p and q of n, taking square roots modulo n requires only polynomial-time effort.

Blum–Goldwasser decryption

Recall from Exercise 3.43 that a quadratic residue d ∈ ℤn* (where n is the public key of the recipient) has four distinct square roots, of which exactly one is again a quadratic residue modulo n. This distinguished square root y of d satisfies the congruences y ≡ d^{(p+1)/4} (mod p) and y ≡ d^{(q+1)/4} (mod q). In the decryption Algorithm 5.13, we assume that integers u, v with up + vq = 1 are available from the private key.

Algorithm 5.13 assumes that each value of d is a quadratic residue modulo n. This can be verified by inserting in the for loop a check whether (d/p) = (d/q) = 1, before an attempt is made to compute the square root of d modulo n. If (c1, . . . , cr, d) is a valid ciphertext message, this condition necessarily holds, and there is no point wasting time checking obvious things. However, if there is a possibility that d is altered by an (active) adversary (or corrupted during transmission), one may insert this check. In that case, the routine should report failure when the square root of a quadratic non-residue modulo n is about to be computed.

Algorithm 5.13. Blum–Goldwasser decryption

Input: The Blum–Goldwasser private key (p, q) of the recipient and the ciphertext message (c1, . . . , cr, d), where each ci is a bit string of length t and d ∈ ℤn*.

Output: The recovered plaintext message m = m1 . . . mr, where each mi is a bit string of length t.

Steps:

for i = r, r – 1, . . . , 1 {
   a := d^{(p+1)/4} (mod p) and b := d^{(q+1)/4} (mod q).
   Compute d ∈ ℤn with d ≡ a (mod p) and d ≡ b (mod q).  /* Use CRT: d := vqa + upb (mod n) */
   δ := the t least significant bits of d.
   mi := ci ⊕ δ.  /* XOR of t-bit strings */
}
}
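The whole Blum–Goldwasser cycle can be sketched as follows (the tiny primes are illustrative assumptions; the decryption loop recovers the keystream values of the encryption loop in reverse order by repeated extraction of distinguished square roots):

```python
import math, random

# Toy Blum–Goldwasser cycle following Algorithms 5.11-5.13 (assumed
# tiny primes for illustration; real moduli are far larger).
p, q = 499, 547                      # both primes are ≡ 3 (mod 4)
n = p * q
t = int(math.log2(math.log2(n)))     # block length t = floor(lg lg n)

def ext_gcd(a, b):
    if b == 0:
        return a, 1, 0
    g, x, y = ext_gcd(b, a % b)
    return g, y, x - (a // b) * y

_, u, v = ext_gcd(p, q)              # u*p + v*q = 1, stored with the key

def encrypt(blocks):
    # blocks: t-bit integers m_1, ..., m_r
    d = random.randrange(2, n)
    while math.gcd(d, n) != 1:
        d = random.randrange(2, n)
    d = d * d % n                    # initial squaring
    cs = []
    for m in blocks:
        d = d * d % n
        cs.append(m ^ (d % (1 << t)))   # XOR with t least significant bits
    return cs, d * d % n             # final squaring is transmitted

def sqrt_qr(d):
    # distinguished (quadratic-residue) square root modulo n, via CRT
    a = pow(d, (p + 1) // 4, p)
    b = pow(d, (q + 1) // 4, q)
    return (v * q * a + u * p * b) % n

def decrypt(cs, d):
    ms = []
    for c in reversed(cs):
        d = sqrt_qr(d)               # recover the keystream backwards
        ms.append(c ^ (d % (1 << t)))
    return ms[::-1]
```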

5.2.5. The ElGamal Public-key Encryption Algorithm

The ElGamal encryption algorithm works in a group G in which it is difficult to solve the Diffie–Hellman problem (DHP). Typical candidates for G include the multiplicative group 𝔽q* of a finite field 𝔽q (usually q is a prime or a power of 2), the (additive) group of points on an elliptic curve over a finite field and the (additive) group (called the Jacobian) of reduced divisors on a hyperelliptic curve over a finite field. Here we assume that G is multiplicatively written and has order n. It is not necessary for G to be cyclic, but we should have at our disposal an element g ∈ G with a suitably large (preferably prime) order k. We essentially work in the cyclic subgroup H of G generated by g (but using the arithmetic of G). For the ElGamal scheme, G (together with its representation), g, n and k are made public and can be shared by different entities on a network.

ElGamal key pair

Generating a key pair for the ElGamal scheme (Algorithm 5.14) involves an exponentiation in G. In order to make the exponentiation efficient, the exponent (the private key) is often chosen to have a small number of 1 bits. However, if this number is too small, exhaustive search by an adversary may become feasible.

If the DLP can be solved in G, the private key d can be computed from the public key gd. This amounts to breaking a system based on this key pair. This is why we often say that the security of the ElGamal encryption scheme banks on the intractability of the DLP. But as we see shortly, the DHP is the more fundamental computational problem that dictates the security of ElGamal encryption.

Algorithm 5.14. ElGamal key generation

Input: G, g and k as defined above.

Output: A random ElGamal key pair.

Steps:

Generate a random integer d, 2 ≤ d ≤ k – 1.

Return gd as the public key and d as the private key.

ElGamal encryption

Given a message m ∈ G, the ElGamal encryption procedure (Algorithm 5.15) generates a pair (r, s) of elements of G as the ciphertext message and thus corresponds to message expansion by a factor of 2. Clearly, the sender has all the relevant information for computing (r, s). The need for using a different session key for each encryption is explained in Exercise 5.6.

Algorithm 5.15. ElGamal encryption

Input: (G, g, k and) the ElGamal public key g^d of the recipient and the plaintext message m ∈ G.

Output: The ciphertext message (r, s) ∈ H × G (where H = 〈g〉).

Steps:

Generate a (random) session key d′, 2 ≤ d′ ≤ k – 1.

r := g^{d′}.

s := m·g^{dd′} = m(g^d)^{d′}.

Notice that ElGamal encryption uses two exponentiations in G to exponents which are O(k). Therefore, the running time of Algorithm 5.15 reduces if smaller values of k are selected. On the other hand, if k is too small, the square-root methods in H = 〈g〉 may become efficient (see Section 4.4.1). In practice, it is recommended that k be taken as a prime of length 160 bits or more.

ElGamal decryption

ElGamal decryption involves an exponentiation in G to an exponent which is O(k). It is easy to verify that Algorithm 5.16 performs decryption correctly and that the recipient has the necessary information to carry out decryption.

Algorithm 5.16. ElGamal decryption

Input: (G, g, k and) the ElGamal private key d of the recipient and the ciphertext message (r, s) ∈ H × G (where H = 〈g〉).

Output: The recovered plaintext message m ∈ G.

Steps:

m := s·r^{–d} = s·r^{k–d}.
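The three algorithms can be sketched for a prime-order subgroup H = 〈g〉 of ℤp* (the values p = 1019 and k = 509 are illustrative assumptions; practical choices take k to be a prime of 160 or more bits):

```python
import random

# Toy ElGamal in a prime-order subgroup of Z_p^* (assumed tiny
# parameters for illustration only).
p = 1019                     # prime, p - 1 = 2 * 509
k = 509                      # prime order of the subgroup H = <g>
g = pow(2, (p - 1) // k, p)  # an element of order k
assert g != 1 and pow(g, k, p) == 1

# Key generation (Algorithm 5.14)
d = random.randrange(2, k)   # private key
y = pow(g, d, p)             # public key g^d

# Encryption (Algorithm 5.15): the plaintext m is an element of Z_p^*
def encrypt(m, y):
    d1 = random.randrange(2, k)          # session key d'
    return pow(g, d1, p), m * pow(y, d1, p) % p

# Decryption (Algorithm 5.16): m = s * r^(-d), and r^(-d) = r^(k-d)
# since the order of r divides k
def decrypt(r, s, d):
    return s * pow(r, k - d, p) % p
```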

An eavesdropper Carol knows the domain parameters G, g, k and n and also the recipient’s public key g^d. Determining the message m from a knowledge of the corresponding ciphertext (r, s) is then equivalent to computing the element g^{dd′}. This implies that a (quick) solution of the DHP permits Carol to decrypt a ciphertext. If a (quick) solution of the DLP is available, then the element g^{dd′} can be computed fast. The reverse implication is, however, not clear: it may be easier to solve the DHP than the DLP, though no concrete evidence is available to corroborate this possibility.

5.2.6. The Chor–Rivest Public-key Encryption Algorithm

The Chor–Rivest encryption algorithm is based on a variant of the subset sum problem. It selects a prime p and an integer h ≥ 2, uses a knapsack set A = {a0, . . . , ap–1} with 1 ≤ ai ≤ p^h – 2 for each i, and considers sums of the form s = ∊0a0 + · · · + ∊p–1ap–1 (mod p^h – 1) with each ∊i ∈ {0, 1}. In order to construct the set A for which the h-fold sum s is uniquely determined by the binary vector (∊0, . . . , ∊p–1) of weight h (that is, with exactly h bits equal to 1), we take the help of the finite field 𝔽p^h. We represent 𝔽p^h as 𝔽p[X]/〈f(X)〉, where f(X) ∈ 𝔽p[X] is irreducible of degree h and where x is the residue class of X in 𝔽p[X]/〈f(X)〉. The parameters p and h must be so chosen that p^h – 1 is reasonably smooth, so that the integer factorization of p^h – 1 can be easily computed. This helps us in two ways. First, a generator g(x) of the multiplicative group 𝔽p^h* can be made available quickly using Algorithm 3.25. Second, the Pohlig–Hellman method of Section 4.4.1 becomes efficient for computing discrete logarithms in 𝔽p^h*. We can then take ai := indg(x)(x + i), i = 0, 1, . . . , p – 1. If (∊0, . . . , ∊p–1) and (∊′0, . . . , ∊′p–1) are two binary vectors of weight h, then Σ ∊iai ≡ Σ ∊′iai (mod p^h – 1) implies Π (x + i)^{∊i} = Π (x + i)^{∊′i}, that is, Π (X + i)^{∊i} ≡ Π (X + i)^{∊′i} (mod f(X)), that is, ∊i = ∊′i for all i = 0, . . . , p – 1, since otherwise x would satisfy a non-zero polynomial of degree < h.

Chor–Rivest key pair

A randomly permuted version of a0, . . . , ap–1 shifted by a noise (that is, a random bias) d together with p and h constitute the public key of the Chor–Rivest scheme. The private key, on the other hand, comprises the polynomials f(X) and g(x), the permutation just mentioned and the noise d. Algorithm 5.17 elaborates the generation of such a key pair. The same values of p and h can be used by different entities on a network. So we assume that p and h are provided instead of generated by the recipient as a part of his public key. For brevity, we use the notation q := ph.

Key generation may be a long process in the Chor–Rivest scheme depending on how difficult it is to compute all the indexes indg(x)(x + i). Furthermore, the size of the public key is quite large, namely O(ph log p). Typically one may take p ≈ 200 and h ≈ 25. The original paper of Chor and Rivest [54] recommends the possibilities (197, 24), (211, 24), (243, 24) and (256, 25) for (p, h). Note that 243 = 3⁵ and 256 = 2⁸ are not primes, but the Chor–Rivest algorithm works even when p is a power of a prime. For the sake of simplicity, we here stick to the case that p is a prime.

Algorithm 5.17. Chor–Rivest key generation

Input: A prime p and an integer h ≥ 2 such that ph – 1 is smooth.

Output: A Chor–Rivest key pair.

Steps:

Choose an irreducible polynomial f(X) ∈ 𝔽p[X] of degree h.

Use the representation 𝔽p^h = 𝔽p[X]/〈f(X)〉, where x := X + 〈f(X)〉.

Choose a random generator g(x) of 𝔽p^h*.

Compute the indexes ai := indg(x)(x + i) for i = 0, 1, . . . , p – 1.

Select a random permutation π of {0, 1, . . . , p – 1}.

Select a random noise d in the range 0 ≤ d ≤ q – 2.

Compute αi := aπ(i) + d (mod q – 1) for i = 0, 1, . . . , p – 1.

Return (α0, α1, . . . , αp–1) as the public key and (f, g, π, d) as the private key.

Chor–Rivest encryption

The Chor–Rivest encryption procedure (Algorithm 5.18) assumes that the input plaintext message is represented as a binary vector (m0, . . . , mp–1) of weight (that is, number of one-bits) equal to h. Since there are (p choose h) such binary vectors, arbitrary binary strings of bit length ⌊lg (p choose h)⌋ can be encoded into binary vectors of the above special form. See Chor and Rivest [54] for an algorithm that describes how such an encoding can be done. Chor–Rivest encryption is quite fast, since it computes only h integer additions modulo q – 1.

Algorithm 5.18. Chor–Rivest encryption

Input: The Chor–Rivest public key (α0, . . . , αp–1) (together with p and h) and the plaintext message (m0, . . . , mp–1) which is a binary vector of weight h.

Output: The ciphertext message c ∈ {0, 1, . . . , q – 2}.

Steps:

c := m0α0 + m1α1 + · · · + mp–1αp–1 (mod q – 1).

Chor–Rivest decryption

The Chor–Rivest decryption procedure (Algorithm 5.19) generates a monic polynomial of degree h, the h (distinct) roots of which give the non-zero bits mi in the original plaintext message.

In order to prove that the decryption works correctly, note that s ≡ c – hd ≡ Σ{i : mi = 1} aπ(i) (mod q – 1), so that g(X)^s ≡ Π{i : mi = 1} (X + π(i)) (mod f(X)). The polynomial u(X) is computed as one of degree < h. Adding f(X) to u(X) gives a monic polynomial v(X) of degree h, which is congruent modulo f(X), and hence equal, to Π{i : mi = 1} (X + π(i)). The roots of v(X) can be obtained either by a root-finding algorithm or by trial divisions of v(X) by X + i, i = 0, 1, . . . , p – 1. Applying the inverse of π on these roots then reconstructs the plaintext message.

Algorithm 5.19. Chor–Rivest decryption

Input: The Chor–Rivest private key (f, g, π, d) (together with p and h) and the ciphertext message c ∈ {0, 1, . . . , q – 2}.

Output: The recovered plaintext message (m0, . . . , mp–1) which is a binary vector of weight h.

Steps:

s := c – hd (mod q – 1).

u(X) := g(X)^s (mod f(X)).

v(X) := f(X) + u(X).

Factorize v(X) as v(X) = (X + i1) · · · (X + ih), with each ij ∈ {0, 1, . . . , p – 1}.

For i = 0, 1, . . . , p – 1 set mi := 1 if π(i) ∈ {i1, . . . , ih}, and mi := 0 otherwise.
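The whole Chor–Rivest cycle can be sketched with assumed toy parameters p = 7, h = 2 and f(X) = X² + 1 (irreducible over 𝔽7, since –1 is a quadratic non-residue modulo 7). The discrete logarithms are found here by brute force, which is of course feasible only at this toy size; the text's Algorithm 3.25 and Pohlig–Hellman are what one would use in practice:

```python
import random

# Toy Chor–Rivest cycle (Algorithms 5.17-5.19); assumed tiny parameters.
p, h = 7, 2
q = p ** h                      # q - 1 = 48 is smooth

# GF(49) elements as pairs (c0, c1) standing for c0 + c1*x with x^2 = -1.
def mul(a, b):
    return ((a[0]*b[0] - a[1]*b[1]) % p, (a[0]*b[1] + a[1]*b[0]) % p)

def find_generator():
    for c0 in range(p):
        for c1 in range(p):
            cand, seen, t = (c0, c1), set(), (1, 0)
            if cand == (0, 0):
                continue
            for _ in range(q - 1):
                seen.add(t); t = mul(t, cand)
            if len(seen) == q - 1:
                return cand

g = find_generator()

# Brute-force discrete-log table gives a_i = ind_g(x + i).
log, t = {}, (1, 0)
for e in range(q - 1):
    log[t] = e; t = mul(t, g)
a = [log[(i, 1)] for i in range(p)]

# Key generation (Algorithm 5.17): random permutation pi and noise d.
pi = list(range(p)); random.shuffle(pi)
d = random.randrange(q - 1)
alpha = [(a[pi[i]] + d) % (q - 1) for i in range(p)]   # public key

def encrypt(m):               # m: 0/1 vector of weight h (Algorithm 5.18)
    return sum(alpha[i] for i in range(p) if m[i]) % (q - 1)

def decrypt(c):               # Algorithm 5.19
    s = (c - h * d) % (q - 1)
    u = (1, 0)
    for _ in range(s):        # u = g^s (naive power suffices for the toy)
        u = mul(u, g)
    # v(X) = f(X) + u(X) = X^2 + u1*X + (u0 + 1); its roots are -pi(i)
    roots = [j for j in range(p) if (j*j - u[1]*j + u[0] + 1) % p == 0]
    return [1 if pi[i] in roots else 0 for i in range(p)]
```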
An eavesdropper sees only the sum c = Σ miαi (mod q – 1) of the (known) knapsack weights α0, . . . , αp–1. In order to recover m0, . . . , mp–1, she should solve the SSP. By choosing p and h carefully, the density of the knapsack set can be adjusted to be high, that is, larger than what the cryptanalytic routines described in Section 4.8 can handle. Thus, the Chor–Rivest scheme is assumed to be secure. However, as discussed in Chor and Rivest [54], the security of the system breaks down when certain partial information on the private key is available.

*5.2.7. The XTR Public-key Encryption Algorithm

XTR, a phonetic abbreviation of efficient and compact subgroup trace representation, is designed by Arjen Lenstra and Eric Verheul as an attractive alternative to RSA (and similar cryptosystems including the ElGamal scheme over finite fields) and elliptic curve cryptosystems (ECC). The attractiveness of XTR arises mainly from two facts explained below: its compact representation of keys and its efficient arithmetic.

XTR, though not a fundamental breakthrough, deserves treatment in this chapter. The working of XTR is somewhat involved and we plan to present only a conceptual description of the algorithm, hiding the mathematical details.

XTR considers the following tower of field extensions:

𝔽p ⊂ 𝔽p² ⊂ 𝔽p⁶,

where p ≡ 2 (mod 3) is a prime, sufficiently large that computing discrete logs in 𝔽p⁶* using known algorithms is infeasible. We have p⁶ – 1 = (p – 1)(p + 1)(p² – p + 1)(p² + p + 1). Let q be a prime divisor of p² – p + 1 of bit length 160 or more. There is a unique subgroup G of 𝔽p⁶* with #G = q. G is called the XTR (sub)group, whereas the entire group 𝔽p⁶* is called the XTR supergroup. The XTR group G is cyclic (Lemma 2.1, p 27). Let g be a generator of G, that is, G = 〈g〉 = {1, g, g², . . . , g^{q–1}}.

The working of XTR is based on the discrete log problem in G. Since p² – p + 1, and hence q, is relatively prime to the orders of the multiplicative groups of all proper subfields of 𝔽p⁶, computing discrete logs in G is (seemingly) as difficult as computing them in 𝔽p⁶*, that is, one gets the same level of security by the use of G instead of the full XTR supergroup.

The main technical innovation of XTR is the proposal of a compact representation of the elements of G in place of the obvious representation using ⌈6 lg p⌉ bits inherited from that of 𝔽p⁶. This is precisely where the intermediate field 𝔽p² comes into the picture. We require a map Tr : 𝔽p⁶ → 𝔽p², so that we can represent elements of G by those of 𝔽p². This map offers two benefits. First, the elements of G can now be represented using ⌈2 lg p⌉ bits, leading to a three-fold reduction in the key size. Second, the arithmetic of 𝔽p² can be exploited to implement the arithmetic in G, thereby improving the efficiency of encryption and decryption routines (compared to those over the full XTR supergroup).

The map uses the traces of elements of 𝔽p⁶ over 𝔽p² (Definition 2.59). In this section, we use the shorthand notation Tr to stand for Tr_{𝔽p⁶/𝔽p²}. The conjugates of an element h ∈ 𝔽p⁶ over 𝔽p² are h, h^{p²} and h^{p⁴}, and so Tr(h) = h + h^{p²} + h^{p⁴}.

Let us now specialize to h = g^n ∈ G. Since p² ≡ p – 1 (mod p² – p + 1) and p⁴ ≡ –p (mod p² – p + 1), the conjugates of h are g^n, g^{(p–1)n} and g^{–pn}. Thus, Tr(g^n) = g^n + g^{(p–1)n} + g^{–pn}. Moreover, the product of the three conjugates equals g^{n + (p–1)n – pn} = 1, and the sum of their pairwise products equals g^{pn} + g^{(1–p)n} + g^{–n} = Tr(g^n)^p,

so the minimal polynomial of h = g^n over 𝔽p² is

X³ – Tr(g^n)·X² + Tr(g^n)^p·X – 1.

This minimal polynomial is determined uniquely by Tr(g^n), and so we can represent g^n by Tr(g^n) ∈ 𝔽p². Note, however, that this representation is not unique, that is, the map G → 𝔽p², g^n ↦ Tr(g^n), is not injective. More precisely, the only elements of G that map to Tr(g^n) are the conjugates g^n, g^{(p–1)n} and g^{–pn} of g^n. This is often not a serious problem, as we see below.

In order to complete the description of the implementation of the arithmetic of the group G, we need to address two further issues. This is necessary, since the trace representation defined above is not a homomorphism of groups. First, we specify how one can implement the arithmetic of 𝔽p². Since p ≡ 2 (mod 3), X² + X + 1 is irreducible over 𝔽p. If α ∈ 𝔽p² is a root of X² + X + 1, we have the standard representation 𝔽p² = {y0 + y1α | y0, y1 ∈ 𝔽p}. Since 1 + α + α² = 0, we have y0 + y1α = (–α – α²)y0 + y1α = (y1 – y0)α + (–y0)α². This leads to the non-standard representation

𝔽p² = {y1α + y2α² | y1, y2 ∈ 𝔽p}.
Since p ≡ 2 (mod 3) and α³ = 1 + (α – 1)(α² + α + 1) = 1, the 𝔽p-basis {α, α²} of 𝔽p² is the same as the normal basis {α, α^p}. Under this basis, the basic arithmetic operations in 𝔽p² can be implemented using only a few multiplications (and some additions/subtractions) in 𝔽p, as described in Table 5.1. Here, the operands are x = x1α + x2α², y = y1α + y2α² and z = z1α + z2α².

Table 5.1. Basic operations in 𝔽p²

Operation      Number of multiplications in 𝔽p
x^p            0  (since x^p = x2α + x1α²)
x²             2  (since x² = x2(x2 – 2x1)α + x1(x1 – 2x2)α²)
xy             3  (since xy = (x2y2 – x1y2 – x2y1)α + (x1y1 – x1y2 – x2y1)α²; it suffices to compute x1y1, x2y2 and (x1 + x2)(y1 + y2))
xz – yz^p      4  (since xz – yz^p = (z1(y1 – x2 – y2) + z2(x2 – x1 + y2))α + (z1(x1 – x2 + y1) + z2(y2 – x1 – y1))α²)
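The formulas of Table 5.1 can be verified mechanically. A minimal Python sketch (the prime p = 1907 and the operand values are illustrative assumptions) implements the table's rules for x^p, x² and xy in the basis {α, α²} and checks them against a direct computation in the standard basis {1, α}, where α² = –α – 1:

```python
p = 1907                       # any prime p ≡ 2 (mod 3) works here

# Elements in the basis {α, α²}: (x1, x2) stands for x1·α + x2·α².
def frob(x):                   # x^p: 0 multiplications (Table 5.1)
    return (x[1], x[0])

def sqr(x):                    # x²: 2 multiplications
    return (x[1] * (x[1] - 2 * x[0]) % p, x[0] * (x[0] - 2 * x[1]) % p)

def mul(x, y):                 # x·y: 3 multiplications suffice
    return ((x[1]*y[1] - x[0]*y[1] - x[1]*y[0]) % p,
            (x[0]*y[0] - x[0]*y[1] - x[1]*y[0]) % p)

# Reference arithmetic in the standard basis {1, α}, with α² = -α - 1.
def to_std(x):                 # x1·α + x2·α² = -x2 + (x1 - x2)·α
    return (-x[1] % p, (x[0] - x[1]) % p)

def to_basis(y):               # y0 + y1·α = (y1 - y0)·α + (-y0)·α²
    return ((y[1] - y[0]) % p, -y[0] % p)

def mul_std(a, b):             # (a0 + a1·α)(b0 + b1·α)
    return ((a[0]*b[0] - a[1]*b[1]) % p,
            (a[0]*b[1] + a[1]*b[0] - a[1]*b[1]) % p)
```

A quick sanity check: mul(x, x) must agree with sqr(x), converting operands to the standard basis and back must give the same product, and x^p computed by square-and-multiply must agree with the multiplication-free frob(x).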

Now, we explain how arithmetic operations in G translate to those in 𝔽p² under the representation of g^n by Tr(g^n). To start with, we show how the knowledge of Tr(h) and n allows one to compute Tr(h^n) for h ∈ G. This corresponds to an exponentiation in G. For c ∈ 𝔽p², define the polynomial

Fc(X) := X³ – cX² + c^p·X – 1 ∈ 𝔽p²[X],

where h1, h2, h3 are the three roots (not necessarily distinct) of Fc(X). For n ∈ ℤ, we use the notation

cn := h1^n + h2^n + h3^n.

Putting c = Tr(g) yields cn = Tr(g^n), or, more generally, for c = Tr(g^k) we have cn = Tr(g^{kn}). Algorithm 5.20 computes

Sn(c) := (cn–1, cn, cn+1)

given c ∈ 𝔽p² (for example, Tr(g^k)) and n ∈ ℤ. The correctness of the algorithm is based on the following identities, the derivations of which are left to the reader (alternatively, see Lenstra and Verheul [170]).

Equation 5.1


Equation 5.2


Equation 5.3

c–n = (cn)^p

Equation 5.4


Equation 5.5

cn+2 = c·cn+1 – c^p·cn + cn–1

Equation 5.6

c2n = (cn)² – 2(cn)^p

Equation 5.7

c2n–1 = cn–1·cn – c^p·(cn)^p + (cn+1)^p

Equation 5.8

c2n+1 = cn+1·cn – c·(cn)^p + (cn–1)^p

Algorithm 5.20. XTR exponentiation

Input: c ∈ 𝔽p² and n ∈ ℤ.

Output: Sn(c) = (cn–1, cn, cn+1).

Steps:

if (n < 0) {
   Compute S–n(c).
   Use Equation (5.3) to compute and return Sn(c).
}
if (n = 0) { Return (c^p, 3, c). }
if (n = 1) { Return (3, c, c² – 2c^p). }
if (n = 2) {
   Compute S1(c) and hence c3 using Equation (5.5).
   Return (c1, c2, c3).
}
/* Now n ≥ 3 */

/* Initialize */
m := ⌊(n – 1)/2⌋. Let m = (ml ml–1 . . . m1 m0)2 be the binary representation of m, with ml = 1.
k := 1.
Compute S2k+1(c) = S3(c) = (c2, c3, c4) from S2(c) using Equation (5.5).
/* Exponentiation loop */
for j = l – 1, l – 2, . . . , 0 {
   if (mj = 0) {
      Compute S4k+1(c) = (c4k, c4k+1, c4k+2) from S2k+1(c) = (c2k, c2k+1, c2k+2).
      /* Use Equation (5.6) for c4k and c4k+2 and Equation (5.7) for c4k+1 */
   } else {       /* mj = 1 */
      Compute S4k+3(c) = (c4k+2, c4k+3, c4k+4) from S2k+1(c) = (c2k, c2k+1, c2k+2).
      /* Use Equation (5.6) for c4k+2 and c4k+4 and Equation (5.8) for c4k+3 */
   }
   k := 2k + mj.
}

/* Now k = m and we have computed S2k+1(c) = S2m+1(c) = (c2m, c2m+1, c2m+2). */

if (n is even) {
   Compute Sn(c) = (cn–1, cn, cn+1) from Sn–1(c) = (cn–2, cn–1, cn).
   /* Use Equation (5.5) to compute cn+1 from Sn–1(c) */
}
Return Sn(c).

A careful analysis suggests that the computation of cn from c requires 8 lg n multiplications in 𝔽p. An exponentiation in 𝔽p⁶, on the other hand, requires an expected number of 23.4 lg n multiplications in 𝔽p (assuming that, in 𝔽p, the time for squaring is 80 per cent of that for multiplication). Thus, the XTR representation provides a speed-up of about 3.

XTR key pair

The domain parameters for an XTR cryptosystem include primes p and q satisfying the requirements stated at the beginning of this section: p ≡ 2 (mod 3), and q a prime divisor of p² – p + 1 of bit length 160 or more.

We require a generator g of the XTR group G. Since we have planned to replace working in G by working in 𝔽p², the element g is not needed explicitly. The trace Tr(g) suffices for our purpose. Lenstra and Verheul [170, 172] describe several methods for obtaining the domain parameters p, q, Tr(g). We describe here the simplest strategies. Algorithm 5.21 outputs the primes p, q with |p| = lp and |q| = lq for some given lp, lq.

Algorithm 5.21. Generation of XTR primes

Randomly choose r such that q := r² – r + 1 is a prime of size |q| = lq.

Randomly choose k such that p := r + kq is a prime with |p| = lp and p ≡ 2 (mod 3). (Since p ≡ r (mod q), we then have p² – p + 1 ≡ r² – r + 1 = q ≡ 0 (mod q), so that q divides p² – p + 1.)

Determination of Tr(g) for a suitable g requires some mathematics. First, notice that if the polynomial Fc(X) is irreducible (over 𝔽p²) for some c ∈ 𝔽p², then c = Tr(h) for some h ∈ 𝔽p⁶* with ord h | (p² – p + 1). Moreover, c(p²–p+1)/q, if not equal to 3, is the trace of an element (for example, h^{(p²–p+1)/q}) of order q. Thus, we may take Tr(g) = c(p²–p+1)/q. Although we do not need it explicitly, the corresponding g can be taken to be any root of the polynomial FTr(g)(X).

What remains to explain is how one can find an irreducible Fc(X). A randomized algorithm results from the fact that, for a randomly chosen c ∈ 𝔽p², the polynomial Fc(X) is irreducible with probability ≈ 1/3.

Once the domain parameters of an XTR system are set, the recipient chooses a random d, 2 ≤ d ≤ q – 2, and computes Tr(g^d) using Algorithm 5.20. The tuple (p, q, Tr(g), Tr(g^d)) is the public key and d the private key of the recipient.

XTR encryption

XTR encryption (Algorithm 5.22) is very similar to ElGamal encryption. The only difference is that now we work in under the trace representation of the elements of G, that is, one uses Algorithm 5.20 for computing exponentiations in G.

Algorithm 5.22. XTR encryption

Input: The public key (p, q, Tr(g), Tr(g^d)) of the recipient and the message m ∈ 𝔽p² to be encrypted.

Output: The ciphertext message (r, s) with r, s ∈ 𝔽p².

Steps:

Generate a random session key d′, 2 ≤ d′ ≤ q – 2.

Compute r := Tr(g^{d′}) using Algorithm 5.20 with c := Tr(g) and n := d′.

Compute Tr(g^{dd′}) using Algorithm 5.20 with c := Tr(g^d) and n := d′.

Set s := m · Tr(g^{dd′}).

XTR decryption

XTR decryption (Algorithm 5.23) is again analogous to ElGamal decryption except that we have to incorporate the XTR representation of elements of G.

Algorithm 5.23. XTR decryption

Input: The private key d of the recipient and the ciphertext (r, s).

Output: The recovered plaintext message m.

Steps:

Compute Tr(g^{dd′}) using Algorithm 5.20 with c := r = Tr(g^{d′}) and n := d.

Set m := s · Tr(g^{dd′})^{–1}.

Note that XTR encryption and decryption use Algorithm 5.20 for performing exponentiations. Therefore, these routines run about three times faster than the corresponding ElGamal routines based on the standard arithmetic.

*5.2.8. The NTRU Public-key Encryption Algorithm

Hoffstein et al. [130] have proposed the NTRU encryption scheme in which encryption involves a mixing system using the polynomial algebra and reductions modulo two relatively prime integers α and β. The decryption involves an unmixing system and can be proved to be correct with high probability. The security of this scheme banks on the interaction of the mixing system with the independence of the reductions modulo α and β. Attacks against NTRU based on the determination of short vectors in certain lattices are known. However, suitable choices of the parameters make NTRU resistant to these attacks. The most attractive feature of the NTRU scheme is that encryption and decryption in this case are much faster than those in other known schemes (like RSA, ECC and even XTR).

NTRU key pair

NTRU parameters include three positive integers n, α and β with gcd(α, β) = 1 and with β considerably larger than α (see Table 5.2). Consider the polynomial algebra R := ℤ[X]/〈X^n – 1〉. An element of R is represented as a polynomial f = f0 + f1X + · · · + fn–1X^{n–1} or, equivalently, as a vector (f0, f1, . . . , fn–1) of the coefficients. Note that X^n – 1 is not irreducible in ℤ[X] (for n ≥ 2) and so R is not a field, but that does not matter for the NTRU scheme. For two polynomials f, g of degree < n and with integer coefficients, we denote by f g the product of f and g in ℤ[X], whereas f and g as elements of R multiply to fg = h with

hk = Σ{i+j ≡ k (mod n)} fi gj   for k = 0, 1, . . . , n – 1.
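The convolution rule for fg can be sketched directly (a minimal illustration; the example polynomials are arbitrary):

```python
# Product in R = Z[X]/<X^n - 1>: coefficient k of fg collects every
# f_i g_j with i + j ≡ k (mod n), since exponents wrap around (X^n = 1).
def conv(f, g):
    n = len(f)
    h = [0] * n
    for i in range(n):
        for j in range(n):
            h[(i + j) % n] += f[i] * g[j]
    return h

# Example with n = 3: (1 + X)(X + X^2) = X + 2X^2 + X^3 = 1 + X + 2X^2.
```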
Table 5.2. Recommended NTRU parameters

Security       n     α    β     νf    νg    νu
short-term     107   3    64    15    12    5
moderate       167   3    128   61    20    18
standard[*]    263   3    128   50    24    16
high           503   3    256   216   72    55

[*] Assumed to be equivalent to 1024-bit RSA

NTRU works with polynomials having small coefficients. More specifically, we define the following subsets of R. The message space 𝓜 (that is, the set of plaintext messages) consists of all polynomials of R with coefficients reduced modulo α. Unlike our representation of ℤα so far, we use the integers between –α/2 and +α/2 to represent the coefficients of polynomials in 𝓜, that is,

𝓜 := {m0 + m1X + · · · + mn–1X^{n–1} ∈ R | –α/2 < mi ≤ α/2 for all i}.

For ν1, ν2 ∈ ℕ, we also define the subset

𝓛(ν1, ν2) := {f ∈ R | f has ν1 coefficients equal to 1, ν2 coefficients equal to –1, and all other coefficients equal to 0}

of R. For suitably chosen parameters νf, νg and νu (see Table 5.2), we use the special notations:

𝓛f := 𝓛(νf, νf – 1),   𝓛g := 𝓛(νg, νg),   𝓛u := 𝓛(νu, νu).

With these notations we are now ready to describe the NTRU key generation routine. The subsets 𝓜, 𝓛f, 𝓛g and 𝓛u are assumed to be public knowledge (along with the parameters n, α and β).

Algorithm 5.24. NTRU key generation

Input: n, α, β and 𝓛f, 𝓛g as defined above.

Output: A random NTRU key pair.

Steps:

Choose f ∈ 𝓛f and g ∈ 𝓛g randomly.

/* f must be invertible modulo both α and β */

Compute fα and fβ satisfying fαf ≡ 1 (mod α) and fβf ≡ 1 (mod β).

h := fβg (mod β).

Return h as the public key and f (along with fα) as the private key.

The polynomial fα can be computed from f during decryption. However, for the sake of efficiency, it is recommended that fα be stored along with f.

The integers α and β are either small primes or small powers of small primes (Table 5.2). The most time-consuming step in the NTRU key generation procedure is the computation of the inverses fα and fβ. Suppose we want to compute the inverse of f in (ℤ/p^eℤ)[X]/〈X^n – 1〉, where p is a small prime and e is a small exponent (we may have e = 1). We first compute f(X)^{–1} in the ring 𝔽p[X]/〈X^n – 1〉. Since p is a prime, 𝔽p is a field, that is, 𝔽p[X] is a Euclidean domain (Exercise 2.31). We compute the extended Euclidean gcd of f(X) with X^n – 1. If f(X) and X^n – 1 are not coprime modulo p, then f(X) is not invertible in 𝔽p[X]/〈X^n – 1〉, else we get s(X)f(X) + t(X)(X^n – 1) ≡ 1 (mod p), and s(X) is the inverse of f(X) in 𝔽p[X]/〈X^n – 1〉. A randomly chosen f(X) with gcd(f(1), p) = 1 has a high probability of being invertible modulo p. Recall that we have chosen f ∈ 𝓛f, so that f(1) = 1.

If e = 1, we have already computed the desired inverse of f(X). If e > 1, we have to lift the inverse fp(X) = s(X) of f(X) modulo p to the inverse fp²(X) of f(X) modulo p², and then to the inverse fp³(X) of f(X) modulo p³, and so on. Eventually, we get the inverse fp^e(X) of f(X) modulo p^e. Here we describe the generic lift procedure of fp^k(X) to fp^{k+1}(X). In the ring (ℤ/p^{k+1}ℤ)[X]/〈X^n – 1〉, we have fp^k f ≡ 1 (mod p^k). We can write fp^{k+1}(X) = fp^k(X) + p^k a(X) for some a(X) ∈ 𝔽p[X] of degree < n. Substituting this value in fp^{k+1} f ≡ 1 (mod p^{k+1}) gives the unknown polynomial a(X) as

a(X) ≡ s(X)·(1 – fp^k(X)f(X))/p^k (mod p),

where s(X) = fp(X) is the inverse of f modulo p.
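The lifting step can be checked with a minimal sketch over the integers, where the same formula applies verbatim (the values p = 5, e = 4, f = 37 are arbitrary assumptions; in the NTRU setting the products below become convolutions in (ℤ/p^eℤ)[X]/〈X^n – 1〉):

```python
# Lift an inverse modulo p to an inverse modulo p^e, one power at a
# time, using a = s * (1 - f_k * f) / p^k mod p as in the text.
p, e = 5, 4
f = 37
s = pow(f, p - 2, p)       # inverse of f modulo p, by Fermat
fk = s                     # inverse modulo p^k, starting with k = 1
for k in range(1, e):
    pk = p ** k
    # (1 - fk*f) is divisible by p^k by construction, so // is exact
    a = s * (1 - fk * f) // pk % p
    fk = fk + pk * a       # now fk * f ≡ 1 (mod p^(k+1))
assert fk * f % p ** e == 1
```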

It is often recommended that f(X) be taken of the form f = 1 + αf1 for some f1 ∈ R with small coefficients. In this case, fα(X) = 1 is trivially available and need not be computed as mentioned above. Such a choice of f also speeds up NTRU decryption (see Algorithm 5.26) by reducing the number of polynomial multiplications from two to one. The inverse fβ, however, has to be computed (but need not be stored).

NTRU encryption

For NTRU encryption (Algorithm 5.25), the message is encoded as a polynomial in 𝓜. The costliest step in this algorithm is computing the product uh, which can be done in time O(n²). An asymptotically better running time (O(n log n)) is achievable by Algorithm 5.25 if one uses faster polynomial multiplication routines (like those based on fast Fourier transforms). However, for the cryptographic range of values of n, straightforward quadratic multiplication gives better performance. Most other encryption schemes (like RSA) take time O(n³), where n is the size of the modulus. This explains why NTRU encryption is much faster than conventional encryption routines.

Algorithm 5.25. NTRU encryption

Input: (n, α, β and) the NTRU public key h of the recipient and the plaintext message m ∈ 𝓜.

Output: The ciphertext c which is a polynomial in R, reduced modulo β.

Steps:

Randomly select u ∈ 𝓛u.

c := αuh + m (mod β).

NTRU decryption

NTRU decryption (Algorithm 5.26) involves two multiplications in R and runs in time O(n²). In order to prove the correctness of Algorithm 5.26, one needs to verify that v ≡ αug + fm (mod β). With an appropriate choice of the parameters, it can be ensured that almost always the polynomial αug + fm has all its coefficients in the interval between –β/2 and +β/2. In that case, we have the equality v = αug + fm in R. Multiplication of v by fα and reduction modulo α now clearly retrieves m.

Algorithm 5.26. NTRU decryption

Input: The NTRU private key f (and fα) of the recipient and the ciphertext message c.

Output: The recovered plaintext message m ∈ 𝓜.

Steps:

v := fc (mod β).

/* The coefficients of v are chosen to lie between –β/2 and +β/2 */

m := fαv (mod α).
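A complete toy cycle of key generation, encryption and decryption can be sketched as follows. Everything here is an illustrative assumption: the parameters (n = 11, α = 3, β = 128, νf = νg = νu = 3) are so small that the polynomial αug + fm always stays strictly inside (–β/2, β/2), the inverse modulo α is found by Gaussian elimination on the circulant matrix of f (rather than the extended Euclidean computation described above), and the inverse modulo β by Newton lifting of the inverse modulo 2:

```python
import random

n, alpha, beta = 11, 3, 128        # assumed toy parameters
NU_F, NU_G, NU_U = 3, 3, 3

def conv(a, b, m):
    # product in (Z/mZ)[X]/<X^n - 1>
    c = [0] * n
    for i in range(n):
        for j in range(n):
            c[(i + j) % n] = (c[(i + j) % n] + a[i] * b[j]) % m
    return c

def center(a, m):
    # coefficient representatives in [-m/2, m/2)
    return [((x + m // 2) % m) - m // 2 for x in a]

def small_poly(n1, n2):
    # n1 coefficients +1 and n2 coefficients -1, the rest 0
    f = [0] * n
    pos = random.sample(range(n), n1 + n2)
    for i in pos[:n1]: f[i] = 1
    for i in pos[n1:]: f[i] = -1
    return f

def inv_mod_prime(f, p):
    # solve conv(f, s) = 1 (mod p): Gauss-Jordan elimination on the
    # circulant matrix A[i][j] = f[(i - j) mod n], augmented with e_0
    A = [[f[(i - j) % n] % p for j in range(n)] + [1 if i == 0 else 0]
         for i in range(n)]
    for col in range(n):
        piv = next((r for r in range(col, n) if A[r][col] % p), None)
        if piv is None:
            return None                # f is not invertible modulo p
        A[col], A[piv] = A[piv], A[col]
        inv = pow(A[col][col], p - 2, p)
        A[col] = [x * inv % p for x in A[col]]
        for r in range(n):
            if r != col and A[r][col] % p:
                c = A[r][col]
                A[r] = [(x - c * y) % p for x, y in zip(A[r], A[col])]
    return [A[i][n] for i in range(n)]

def lift_to_beta(f, s):
    # Newton lifting: if f*s = 1 (mod m), then f*s(2 - f*s) = 1 (mod m^2)
    m = 2
    while m < beta:
        m = m * m
        fs = conv(f, s, m)
        s = [(2*si - ci) % m for si, ci in zip(s, conv(s, fs, m))]
    return [x % beta for x in s]       # valid since beta divides m

# Key generation (Algorithm 5.24)
while True:
    f = small_poly(NU_F, NU_F - 1)
    f_alpha, f2 = inv_mod_prime(f, alpha), inv_mod_prime(f, 2)
    if f_alpha and f2:
        break
g = small_poly(NU_G, NU_G)
f_beta = lift_to_beta(f, f2)
h = conv(f_beta, g, beta)              # public key

def encrypt(m):                        # Algorithm 5.25
    u = small_poly(NU_U, NU_U)
    return [(alpha*x + y) % beta for x, y in zip(conv(u, h, beta), m)]

def decrypt(c):                        # Algorithm 5.26
    v = center(conv(f, c, beta), beta)     # v = alpha*u*g + f*m exactly
    return center(conv(f_alpha, v, alpha), alpha)
```

With these tiny ν-values, every coefficient of αug + fm is at most 3·6 + 5 = 23 in absolute value, well below β/2 = 64, so this toy never suffers a decryption failure.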

If f is chosen to be of the special form f = 1 + αf1 (for some polynomial f1), then v = αug + αf1m + m. Thus, reduction of v modulo α straightaway gives m, that is, there is no need to multiply v by fα. Also fα (having the trivial value 1) need not be stored in the private key. To sum up, taking f to be of the above special form increases the efficiency of the NTRU scheme without (seemingly) affecting its security. But now f is no longer an element of 𝓛f and some care should be taken to choose suitable values of f.

NTRU decryption may fail, usually when v is not properly centred (around 0). In that case, representing v as a polynomial with coefficients in the range –β/2 + x to +β/2 + x for a small positive or negative value of x may result in correct decryption. If, on the other hand, no value of x works, NTRU decryption cannot recover m easily and is said to suffer from a gap failure. For suitable parameter values, gap failures are very unlikely and can be ignored for all practical purposes.

Now, let us see how the NTRU system can be broken. In order to find out the private key f from the public key h = fβg (mod β), one may keep on searching exhaustively for f′ ∈ 𝓛f, until f′h (mod β) has the shape of an element of 𝓛g. Alternatively, one may try all g′ ∈ 𝓛g, until h^{–1}g′ (mod β) lies in 𝓛f (provided that h is invertible modulo β). In a similar manner, m can be retrieved from c by trying all u′ ∈ 𝓛u, until c – αu′h (mod β) lies in 𝓜. Clearly, such an attack takes expected time proportional to the size of 𝓛f or 𝓛g or 𝓛u.

A baby-step–giant-step strategy reduces the running times to the square roots of the sizes of the above sets. For example, suppose we want to compute f from h. We split f = f1 + f2 into two nearly equal pieces f1 and f2. If n is odd, f1 may contain the (n + 1)/2 most significant terms and f2 the (n – 1)/2 least significant terms of f. Now, we compute (f2, –f2h (mod β)) for all possibilities of f2 and store the pairs sorted by the second component. Next, for each possibility of f1 (baby step) we compute f1h (mod β) and see if there is any f2 (giant step) for which f1h (mod β) and –f2h (mod β) have nearly equal values (this is expected, since f1h + f2h = fh ≡ g (mod β) has all its coefficients in {–1, 0, 1}). If a matching pair (f1, f2) is located, we take f = f1 + f2. A similar method works for guessing m from c.

It is necessary to take the sets 𝓛f, 𝓛g and 𝓛u big enough, so that exhaustive or square-root attacks are not feasible. Typically, choosing the sizes of these sets to be ≥ 2^160 is deemed sufficiently secure.

Another relevant attack is discussed in Exercise 5.11. By far the most sophisticated attack on the NTRU encryption scheme is based on finding short vectors in a lattice. We describe this attack in connection with the computation of the private key f from a knowledge of the public key h. Let L denote the lattice in ℤ^{2n} generated by the rows of the 2n × 2n matrix:

   ( λIn    H  )
   (  0    βIn ),

where In is the n × n identity matrix, H is the n × n circulant matrix with (i, j)-th entry h(j–i) mod n built from h = h0 + h1X + · · · + hn–1X^{n–1} = (h0, h1, . . . , hn–1), and λ is a parameter whose choice is discussed below. Since h ≡ gf^{–1} (mod β), multiplying the i-th row by fi–1 (i = 1, . . . , n), adding, and subtracting suitable multiples of the last n rows, we conclude that the vector v := (λf0, λf1, . . . , λfn–1, g0, g1, . . . , gn–1) is in L. By tuning the value λ, the attacker maximizes the chance for v to be a short vector in L. However, if the system parameters are appropriately selected, lattice reduction algorithms become rather ineffective in finding v. Heuristic evidence suggests that this attack runs in time exponential in n.

Exercise Set 5.2

5.1 Establish the correctness of Algorithm 5.4.
5.2
  1. Assume that the same message m is encrypted using the RSA algorithm and using the public keys (n1, e), . . . , (ne, e) of e entities each of which has the same encryption exponent e. Assume further that the moduli n1, . . . , ne are pairwise coprime. Specify a method by which an adversary can reconstruct the message m from a knowledge of the ciphertext messages c1, . . . , ce. [H]

  2. How can such an attack be prevented? [H]

5.3
  1. Let n, e ∈ ℕ. How many solutions does the polynomial X^e – X have in ℤn? [H]

  2. In particular, conclude that if n = pq is an RSA modulus and e is the encryption exponent, there exist gcd(e – 1, p – 1) × gcd(e – 1, q – 1) messages m for which m^e ≡ m (mod n). Such messages are often called unconcealed. The number of unconcealed messages for random parameters n and e is, in general, vanishingly small compared to n.

5.4 Assume that two parties Bob and Barbara share a common RSA modulus n but relatively prime encryption exponents e1 and e2. Alice encrypts the same message by (n, e1) and (n, e2) and sends the ciphertext messages to Bob and Barbara respectively. Suppose also that Carol intercepts both the ciphertexts. Describe a method by which Carol retrieves the (common) plaintext. [H]
5.5 Let n = pq be a Rabin public key and let c ∈ ℤ_n be a quadratic residue modulo n. Show that the knowledge of the four square roots of c modulo n breaks the Rabin system.
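The point of Exercise 5.5 is that two essentially different square roots x, y of the same residue (y ≢ ±x (mod n)) let one factor n, since n divides (x − y)(x + y) but neither factor alone. A toy sketch (the second root is found by brute force only to stage the attack):

```python
from math import gcd

p, q = 11, 19            # toy private primes
n = p * q
x = 24                   # a message whose square is the residue c
c = (x * x) % n
# Suppose the attacker learns another square root y with y != +-x (mod n);
# here we locate one by exhaustive search just for the demonstration.
y = next(z for z in range(1, n)
         if (z * z) % n == c and z % n not in (x % n, (-x) % n))
f = gcd(x - y, n)        # gcd splits n into its prime factors
assert f in (p, q)
```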
5.6 What is the disadvantage of using the same session key in the ElGamal encryption scheme for encrypting two different messages (for the same recipient)? [H]
5.7 Let p be an odd prime and g a generator of ℤ_p*.
  1. Show that the set S := {g^(2i) | i = 0, 1, . . . , (p − 3)/2} is precisely the set of all quadratic residues modulo p. Show also that S is a subgroup of ℤ_p*.

  2. Assume that y ≡ g^x (mod p) for some x ∈ {0, 1, . . . , p − 2}. Show that the least significant bit of x is 0 or 1 according as y^((p−1)/2) is congruent to 1 or −1 modulo p respectively. Thus, it is easy to determine from y the least significant bit of the discrete logarithm x = ind_g y.

  3. Assume that p ≡ 3 (mod 4) and that only p, g, y are known (but x is not known). Suppose further that there is an oracle (a black box) that, given z ∈ ℤ_p*, returns the second least significant bit of ind_g z. Show that x = ind_g y can be easily computed by making a polynomial (in log p) number of calls to this oracle. [H]
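Part 2 of Exercise 5.7 can be checked exhaustively for a small prime (toy parameters, for illustration only): Euler's criterion y^((p−1)/2) (mod p) reveals the parity of ind_g y.

```python
p, g = 23, 5             # 5 is a generator of Z_23^* (order 22)
for x in range(p - 1):
    y = pow(g, x, p)
    euler = pow(y, (p - 1) // 2, p)   # 1 for a residue, p - 1 (i.e. -1) otherwise
    lsb = 0 if euler == 1 else 1
    assert lsb == x % 2               # the parity of ind_g(y) leaks from y alone
```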

5.8 Show that if the private-key parameters f(X) and d are known to a cryptanalyst of the Chor–Rivest scheme, she can recover the other parts of the private key and thus break the system completely. [H]
5.9 Show that if f(X) is only known to a cryptanalyst of the Chor–Rivest scheme, then also she can recover the full private key. [H]
5.10
  1. Derive the identities of Equations (5.1) through (5.8) (p 325).

  2. With the notations of Section 5.2.7 deduce that:

    c_3 = c^3 − 3c^(p+1) + 3.
    c_4 = c^4 − 4c^(p+2) + 2c^(2p) + 4c.

5.11 In this exercise, we use the notations of Section 5.2.8. Assume that Alice encrypts the same message m several times using the NTRU public key h of Bob, but with different random polynomials u_i, i = 1, . . . , r, and sends the corresponding ciphertext messages c1, . . . , cr. Describe a strategy by which an eavesdropper Carol can recover a considerable part of u_1. [H] Trying all the possibilities for the (relatively small) unknown part of u_1 allows Carol to retrieve m with little effort.

5.3. Key Exchange

Consider the scenario wherein two parties Alice and Bob want to share some secret information (say, a DES key for future correspondence), but it is not possible to communicate this secret by personal contact or by conversing over a secure channel. In other words, Alice and Bob want to arrive at a common secret value by communicating over a public (and hence insecure) channel. A key-exchange or key-agreement protocol allows Alice and Bob to do so. The protocol should be such that an eavesdropper listening to the conversation between Alice and Bob cannot compute the secret value in feasible time.

Public-key technology is used to design a key-exchange protocol in the following way. Alice generates a key pair (eA, dA) and sends the public key eA to Bob. Similarly, Bob generates a random key pair (eB, dB) and sends the public key eB to Alice. Now, Alice and Bob respectively compute the values sA = f(eB, dA) and sB = f(eA, dB), each using her/his own private key, where f is a suitably chosen function. If sA = sB, then this value can be used as the shared secret between Alice and Bob. The intruder Carol can intercept eA and eB, but f should be such that knowledge of eA and eB alone does not allow Carol to compute sA = sB. She needs dA or dB for this computation. Since (eA, dA) and (eB, dB) are key pairs, we assume that it is infeasible to compute dA from eA or dB from eB.

In what follows, we describe some key-exchange protocols. The security of these protocols is dependent on the intractability of the DHP (or the DLP). We provide a generic description, where we work in a finite Abelian multiplicative group G of order n. We write the identity of G as 1. G need not be cyclic, but we assume that an element g ∈ G having suitably large (and preferably prime) multiplicative order m is provided. G, g, n and m may be made publicly available, but G should be a group in which one cannot compute discrete logarithms in feasible time. Typical examples of G are given in Section 5.2.5.

5.3.1. Basic Key-Exchange Protocols

Basic key-exchange protocols provide provable security against passive attacks under the intractability of the DHP. However, several models of active attacks are known for the basic protocols. One requires authentication (validation of the public keys) to eliminate these attacks.

The Diffie–Hellman key-exchange protocol

The Diffie–Hellman (DH) key-exchange algorithm [78] is one of the pioneering discoveries leading to the birth of public-key cryptography.

Algorithm 5.27. Diffie–Hellman key exchange

Input: G, g, n and m as defined above.

Output: A secret element to be shared by Alice and Bob.

Steps:

Alice generates a random dA ∈ {2, . . . , m − 1} and computes eA := g^dA.

Alice sends eA to Bob.

Bob generates a random dB ∈ {2, . . . , m − 1} and computes eB := g^dB.

Bob sends eB to Alice.

Alice computes s := (eB)^dA = g^(dA·dB).

Bob computes s := (eA)^dB = g^(dA·dB).

if (s = 1) { Return “failure”. }

The DH scheme fails if the shared secret turns out to be a trivial element (like the identity) of G. In that case, Alice and Bob should re-execute the protocol with different key pairs. The probability of such an incident is, however, extremely low.

The intruder Carol learns the group elements g^dA and g^dB by listening to the conversation between Alice and Bob and intends to compute s = g^(dA·dB). Thus, she has to solve an instance of the DHP in the group G. By assumption, this is computationally infeasible. This is how the DH scheme derives its security.
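Algorithm 5.27 can be sketched in Python over the toy group G = ℤ_p* (the parameters below are illustrative; real deployments use much larger p and m):

```python
import secrets

p = 2579                 # prime with p - 1 = 2 * 1289
g = 4                    # 4 has prime order m = 1289 modulo p
m = 1289                 # order of g, used to bound the exponents

dA = 2 + secrets.randbelow(m - 2)      # Alice's private key, 2 <= dA <= m - 1
dB = 2 + secrets.randbelow(m - 2)      # Bob's private key
eA, eB = pow(g, dA, p), pow(g, dB, p)  # public keys, exchanged in the clear
sA = pow(eB, dA, p)                    # Alice's view of the secret: g^(dA*dB)
sB = pow(eA, dB, p)                    # Bob's view of the secret
assert sA == sB and sA != 1
```

Since m is prime and 2 ≤ dA, dB ≤ m − 1, the product dA·dB is never divisible by m, so the failure case s = 1 cannot occur with these parameters.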

Small-subgroup attacks

A small-subgroup attack on the DH protocol can be mounted by an active adversary. Assume that the order m of g in G is composite and has known factorization m = uv with u small. Carol intercepts the messages between Alice and Bob, replaces them by their respective v-th powers and retransmits the modified messages.

Algorithm 5.28. A small-subgroup attack by an active eavesdropper

Alice generates a random dA ∈ {2, . . . , m − 1} and computes eA := g^dA.

Alice transmits eA to Bob.

Carol intercepts eA, computes e′A := (eA)^v and sends e′A to Bob.

Bob generates a random dB ∈ {2, . . . , m − 1} and computes eB := g^dB.

Bob transmits eB to Alice.

Carol intercepts eB, computes e′B := (eB)^v and sends e′B to Alice.

Alice computes s′ := (e′B)^dA = g^(v·dA·dB).

Bob computes s′ := (e′A)^dB = g^(v·dA·dB).

if (s′ = 1) { Return “failure”. }

But ord g = uv and so (s′)^u = 1, that is, s′ has only u − 1 non-trivial values. Since u is small, the possibilities for s′ can be exhaustively searched by Carol. The best countermeasure against this attack is to take m to be a prime (of bit length ≥ 160).

Even when m is prime, it may be the case that the cofactor k := n/m has a small divisor u and it is possible that an active attacker intervenes in such a way that Alice and Bob agree upon a secret value of order (equal to or dividing) u. For example, Carol may replace both the transmitted public keys by an element h of order u. If dA and dB are congruent modulo u, the shared secret has only a few possible values and Carol can obtain the correct value by exhaustive search. On the other hand, if dA ≢ dB (mod u), Alice and Bob do not come up with the same secret. However, if Alice uses her secret to encrypt a message for Bob, it remains easy for Carol to decrypt the intercepted ciphertext by trying only a few choices for Alice’s key. Alice and Bob can prevent this attack by refusing to accept as the shared secret not only the trivial value s = 1 but also elements of small orders.

A small-subgroup attack can also be mounted by one of the communicating parties (say, Bob) in an attempt to gain information about the other’s (Alice’s) secret dA. Let us continue to assume that the cofactor k := n/m has a small divisor u. Bob finds an element h in G of order u. Instead of eB = g^dB, Bob now sends h·g^dB to Alice. Alice computes the shared secret as sA := (h·g^dB)^dA = sB·h^(dA rem u). Bob, on the other hand, can normally compute sB := g^(dA·dB). Now, suppose that Alice uses a symmetric cipher with the key sA (or some part of it) and sends the ciphertext to Bob. In order to decrypt, Bob tries all of the u possible keys sB·h^j for j = 0, 1, . . . , u − 1. The value of j for which decryption succeeds equals dA modulo u. A similar attack can be mounted by Bob, when eB is chosen to be an element (like h itself) of order u.
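A toy run of this insider attack can be sketched as follows, under the assumption that Bob transmits h·g^dB in place of g^dB. Here u = 2, so a single bit of dA leaks; the group parameters are illustrative only.

```python
import secrets

p = 2579                   # prime, p - 1 = 2 * 1289
g, m = 4, 1289             # g = 4 has prime order m modulo p
u = 2                      # small divisor of the cofactor k = (p - 1)/m
h = p - 1                  # -1 mod p: an element of order exactly u = 2

dA = 2 + secrets.randbelow(m - 2)          # Alice's secret
dB = 2 + secrets.randbelow(m - 2)          # Bob's secret
eA = pow(g, dA, p)
eB_mal = (h * pow(g, dB, p)) % p           # Bob's dishonest public value

sA = pow(eB_mal, dA, p)                    # Alice's key: g^(dA*dB) * h^dA
sB = pow(eA, dB, p)                        # the value Bob can compute honestly
# Bob tests the u candidate keys sB * h^j; the matching j equals dA mod u.
leaked = next(j for j in range(u) if (sB * pow(h, j, p)) % p == sA)
assert leaked == dA % u
```

In the real attack Bob does not see sA directly; he identifies the correct j as the one whose candidate key decrypts Alice's ciphertext.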

If G is cyclic and H is the subgroup generated by g, then an element a ∈ G is in H if and only if a^m = 1 (Proposition 2.5, p 27). Moreover, if gcd(k, m) = 1, each communicating party can check the validity of the other party’s public key by using an m-th power exponentiation. An element like h or h·g^dB of the last paragraph does not pass this test, and if the test fails, Alice should abandon the protocol. However, the validation of the public key requires a modular exponentiation and thereby slows down the protocol.

Cofactor exponentiation

We now present an efficient modification of the basic Diffie–Hellman scheme that prevents small-subgroup attacks (by a communicating party or an eavesdropper) without calculating an extra exponentiation. We continue with the notation k := n/m and assume that k is coprime to m. Now, the shared secret is computed as g^(dA·dB) or g^(k·dA·dB) depending on whether compatibility with the original DH scheme is desired or not. Algorithm 5.29 describes the modified DH algorithm. Solve Exercise 5.12 in order to establish the effectiveness of this algorithm against small-subgroup attacks.

5.3.2. Authenticated Key-Exchange Protocols

Other active attack models on the (basic or modified) DH protocol can be conceived of. One important class of attacks is now described.

Unknown key-share attacks

An unknown key-share attack on a key-exchange protocol makes a party believe that (s)he shares a secret with another party, whereas the secret is actually shared by a third party. Assume that Carol can monitor and modify every message between Alice and Bob. When Alice and Bob execute Algorithm 5.27 or 5.29, Carol can intervene and pretend to Alice that she is Bob and to Bob that she is Alice. At the end of the protocol, Alice and Carol come up with a shared secret sAC, and Bob and Carol with another shared secret sBC. Alice believes that she shares sAC with Bob, and Bob believes that he shares sBC with Alice.

Algorithm 5.29. Diffie–Hellman key exchange with cofactor exponentiation

Input: G, g, n, m and k as defined above and a flag indicating compatibility with the original DH scheme.

Output: A secret element to be shared by Alice and Bob.

Steps:

Alice generates a random dA ∈ {2, . . . , m − 1} and computes eA := g^dA.

Alice sends eA to Bob.

Bob generates a random dB ∈ {2, . . . , m − 1} and computes eB := g^dB.

Bob sends eB to Alice.

if (compatibility with the original DH algorithm is desired) {
   Alice assigns δA := k^(−1)·dA (mod m).
   Bob assigns δB := k^(−1)·dB (mod m).
}
else {
   Alice assigns δA := dA (mod m).
   Bob assigns δB := dB (mod m).
}
Alice computes s := (eB)^(k·δA).
Bob computes s := (eA)^(k·δB).
if (s = 1) { Return “failure”. }
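Algorithm 5.29 with the compatibility flag set can be sketched as follows (toy parameters; gcd(k, m) = 1 is required for the inverse k^(−1) (mod m) to exist):

```python
import secrets

p = 2579
g, m = 4, 1289            # g of prime order m in Z_p^*
n = p - 1                 # group order
k = n // m                # cofactor, here 2, coprime to m

dA = 2 + secrets.randbelow(m - 2)
dB = 2 + secrets.randbelow(m - 2)
eA, eB = pow(g, dA, p), pow(g, dB, p)

kinv = pow(k, -1, m)                 # k^(-1) (mod m)
deltaA = (kinv * dA) % m             # compatibility with the original DH scheme
deltaB = (kinv * dB) % m
sA = pow(eB, k * deltaA, p)          # (eB)^(k*deltaA) = g^(dA*dB), since k*deltaA = dA (mod m)
sB = pow(eA, k * deltaB, p)
assert sA == sB == pow(g, (dA * dB) % m, p) and sA != 1
```

Raising the received value to the multiple-of-k exponent k·δ kills any small-order component an attacker may have mixed in, which is the point of the modification.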

Now, when Alice wants to send a secret message m to Bob, she encrypts m by sAC and transmits the ciphertext c. Carol intercepts c, decrypts it by sAC to retrieve m, encrypts m by sBC and sends the new ciphertext c′ to Bob. Bob retrieves m by decrypting c′ with his key sBC. The process raises hardly any suspicion in Alice or Bob about the existence of the mediating third party.

In order to avoid this attack, Alice and Bob should each validate the authenticity of the public key of the other party. Public-key certificates can be used to this effect. Unfortunately, using certificates alone may fail to eliminate unknown key-share attacks, as Algorithm 5.30 shows. At the end of this protocol Alice and Bob share a secret s, but Bob believes that he shares it with (the intruder) Carol. Here Carol herself cannot compute the shared secret s (provided that computing discrete logs in G is infeasible). Still there may be situations where this attack can be exploited (see Law et al. [161] for a hypothetical example).

This attack has two potential problems. Under the assumption of the intractability of the DLP in G, Carol cannot compute the private key corresponding to the public key eC, and so her obtaining the certificate CertC from a knowledge of eC alone may be questioned. Furthermore, replacing (eB, CertB) by ((eB)^d, CertB) may render the certificate invalid. If we assume that a certificate authenticates only the entity and not the public key, then these objections can be overruled. In practice, however, a public-key certificate should bind the public key to an entity (who can prove the knowledge of the corresponding private key), and so the above attack cannot be easily mounted. Nonetheless, the attack highlights the need for stronger authenticated key-exchange protocols.

Algorithm 5.30. An unknown key-share attack

Alice generates a random dA ∈ {2, . . . , m − 1} and computes eA := g^dA.

Alice gets the certificate CertA on eA from the certifying authority.

Alice transmits (eA, CertA) to Bob.

Carol intercepts (eA, CertA).

Carol chooses a random d ∈ {2, . . . , m − 1}.

Carol gets the certificate CertC on eC := (eA)^d from the certifying authority.

Carol sends (eC, CertC) to Bob.

Bob generates a random dB ∈ {2, . . . , m − 1} and computes eB := g^dB.

Bob gets the certificate CertB on eB from the certifying authority.

Bob sends (eB, CertB) to Carol.

Carol transmits ((eB)^d, CertB) to Alice.

Alice computes s = ((eB)^d)^dA = g^(d·dA·dB).

Bob computes s = (eC)^dB = ((eA)^d)^dB = g^(d·dA·dB).

The Menezes–Qu–Vanstone key-exchange protocol

The Menezes–Qu–Vanstone (MQV) key-exchange protocol is an extension of the basic DH scheme that incorporates public-key authentication. Though the MQV protocol does not seem to provably achieve its desired security goals, heuristic arguments suggest its effectiveness against active adversaries.

Once again, let Alice and Bob be the two parties who plan to agree on a secret element s ∈ ⟨g⟩, where the domain parameters G, g, n and m are chosen as in the basic DH scheme. In the MQV scheme, each entity uses two key pairs, one of which ((EA, DA) for Alice and (EB, DB) for Bob) is called the static or the long-term key pair, whereas the other ((eA, dA) for Alice and (eB, dB) for Bob) is called the ephemeral or the short-term key pair. The static key is bound to an entity for a certain period of time and is used in every invocation of the MQV protocol during that period. On the other hand, each entity generates and uses a new ephemeral key pair during each invocation of the protocol. The static key of an entity is assumed to be authentic, say, certified by a trusted authority. The ephemeral key, on the other hand, is validated using the static private key.

Assume that there is a (publicly known) function F : ⟨g⟩ → ℤ assigning an integer representative to each element of ⟨g⟩. Let l := ⌊lg m⌋ + 1 denote the bit length of m = ord g. For e ∈ ⟨g⟩, let ē denote the integer (F(e) rem 2^⌈l/2⌉) + 2^⌈l/2⌉. The bit size of ē is about half of that of m. In particular, ē ≢ 0 (mod m) for all e ∈ ⟨g⟩.

In the MQV protocol, Alice and Bob each computes the shared secret s = g^(σA·σB), where σA := dA + ēA·DA (mod m) and σB := dB + ēB·DB (mod m). Here the exponents σA and σB bear the implicit signatures of Alice and Bob, impressed by their respective static private keys. Alice can compute g^σB = eB·(EB)^ēB, since she knows the static public key EB and the ephemeral public key eB of Bob. Similarly, Bob can compute g^σA = eA·(EA)^ēA from a knowledge of the public keys EA and eA of Alice. We summarize the steps in Algorithm 5.31.

Algorithm 5.31. MQV key exchange

Input: G, g, n and m as defined above.

Output: A secret element to be shared by Alice and Bob.

Steps:

Alice obtains Bob’s static public key EB.

Bob obtains Alice’s static public key EA.

Alice generates a random integer dA, 2 ≤ dA ≤ m − 1, and computes eA := g^dA.

Alice sends eA to Bob.

Bob generates a random integer dB, 2 ≤ dB ≤ m − 1, and computes eB := g^dB.

Bob sends eB to Alice.

Alice computes σA := dA + ēA·DA (mod m).

Alice computes s := (eB·(EB)^ēB)^σA = g^(σA·σB).

Bob computes σB := dB + ēB·DB (mod m).

Bob computes s := (eA·(EA)^ēA)^σB = g^(σA·σB).

if (s = 1) { Return “failure”. }

Each participating entity using the MQV protocol performs three exponentiations in G. Alice computes g^dA, (EB)^ēB and (eB·(EB)^ēB)^σA, of which the first and the last ones have exponents of size O(m). On the other hand, the bit size of ēB is about half of that of m, so the middle exponentiation is about twice as fast as a full exponentiation. This performance benefit justifies the use of ēA and ēB instead of eA and eB themselves. It appears that using these half-sized exponents does not affect security. Also note that ēA ≢ 0 (mod m), which implies a non-zero contribution of the static key DA to the expression σA. Similarly for σB.
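A toy sketch of Algorithm 5.31 in ℤ_p* follows; the parameters are illustrative, and F is taken to be the identity on the integer representative of a group element (an assumed choice):

```python
import secrets

p, g, m = 2579, 4, 1289          # g of prime order m in Z_p^*
l = m.bit_length()               # l = floor(lg m) + 1
half = 1 << ((l + 1) // 2)       # 2^(ceil(l/2))

def bar(e):
    # e-bar = (F(e) rem 2^(ceil(l/2))) + 2^(ceil(l/2)), about half the size of m
    return (e % half) + half

def keypair():
    d = 2 + secrets.randbelow(m - 2)
    return d, pow(g, d, p)

DA, EA = keypair()               # Alice's static key pair
DB, EB = keypair()               # Bob's static key pair
dA, eA = keypair()               # ephemeral pairs, fresh for this run
dB, eB = keypair()

sigmaA = (dA + bar(eA) * DA) % m
sigmaB = (dB + bar(eB) * DB) % m
sA = pow(eB * pow(EB, bar(eB), p) % p, sigmaA, p)   # Alice: (g^sigmaB)^sigmaA
sB = pow(eA * pow(EA, bar(eA), p) % p, sigmaB, p)   # Bob:   (g^sigmaA)^sigmaB
assert sA == sB
```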

In order to guard against small-subgroup attacks, the MQV algorithm can incorporate the cofactor k := n/m, that is, assuming gcd(k, m) = 1, the shared secret would now be g^(σA·σB) or g^(k·σA·σB), depending on whether compatibility with the original MQV method is desired or not.

The MQV algorithm can be used in a situation when only one party, say, Alice, is capable of initiating a transmission to the other party (Bob). In that case, Bob’s static key pair is used also as his ephemeral key pair, that is, eB = EB and dB = DB in Algorithm 5.31.

See Raymond and Stiglic [250] for more about the security issues of the DH key-agreement protocol and its variants.

Exercise Set 5.3

5.12 Let G be a multiplicative Abelian group of order n and with identity 1, H the subgroup of G generated by an element g of order m, k := n/m and gcd(k, m) = 1. Further, let a be a non-identity element of G.
  1. Prove that if a^k = 1, then a ∉ H. (The converse of this statement is not true in general, even when G is cyclic. However, if a is an element of small order dividing k, we obviously have a^k = 1.)

  2. Explain how the modified Diffie–Hellman protocol (Algorithm 5.29) prevents an active attack by Bob described in connection with small-subgroup attacks.

5.13 Write the MQV key-exchange protocol with cofactor exponentiation.
5.14 Provide the details of the Diffie–Hellman key-exchange algorithm based on the XTR representation (Section 5.2.7).

5.4. Digital Signatures

Suppose an entity (Alice) is required to be bound to some electronic data (like messages or documents or keys). This binding is achieved by Alice digitally signing the data in such a way that no party other than Alice would be able to generate the signature. The signature should also be such that any entity can easily verify that it was Alice who generated the signature. Digital signatures can be realized using public-key techniques. The entity (Alice) generating a digital signature is called the signer, whereas anybody who wants to verify a signature is called a verifier.

We have seen in Section 5.2 how the encryption and decryption transforms fe, fd achieve confidentiality of sensitive data. If the set of all possible plaintext messages is the same as the set of all ciphertext messages and if fe and fd are bijective maps on that set, then the sequence of encryption and decryption can be reversed in order to realize a digital signature scheme. In order to sign m, Alice uses her private key d and the transform fd to generate s = fd(m, d). Any party who knows the corresponding public key e can recover m as m = fe(s, e). This is broadly how a signature scheme works. Depending on how the representative m is generated from the message M that Alice wants to sign, signature schemes can be classified in two categories.

Signature scheme with message recovery

In this case, one takes m = M. Verification involves getting back the message M. If M is assumed to be (the encoded version of) some human-readable text, then the recovered M = fe(s, e) will also be human-readable. If s is forged, that is, if a private key d′ ≠ d has been used to generate s′ = fd(m, d′), then verification using Alice’s public key yields m′ = fe(s′, e), and typically m′ ≠ m, since d′ and e are not matching keys. The resulting message m′ will, in general, make little or no sense to a human reader. If m is not a human-readable text, one adds some redundancy to it before signing. A forged signature yields m′ during verification, which, with high probability, is expected not to have this redundancy.

Attractive as this scheme looks, it is not suitable when M is a long message. In that case, it is customary to break M into smaller pieces and sign each piece separately. Since public-key operations are slow, signature generation (and also verification) will be time-consuming if there are too many pieces to sign (and verify). This difficulty is overcome using the second scheme described now.

Signature scheme with appendix

In this scheme, a short representative m = H(M) of M is first computed.[2] The function H is usually chosen to be a hash function, that is, one which converts bit strings of arbitrary length to bit strings of a fixed length. H is assumed to be public knowledge, that is, anybody who knows M can compute m. We also assume that H(M) can be computed fast for messages M of practical sizes. Alice uses the decryption transform on m to generate s = fd(m, d). The signature now becomes the pair (M, s). A verifier obtains Alice’s public key e and checks whether H(M) = fe(s, e). The signature is taken to be valid if and only if equality holds. If a forger uses a private key d′ ≠ d to generate a signature (M, s′), s′ = fd(m, d′), on M, then with high probability H(M) ≠ fe(s′, e) and verification fails.

[2] If M is already a short message, one may take m = M. In order to promote uniform treatment, we assume that the function H is always applied for the generation of m. Use of H is also desirable from the point of view of security (Exercise 5.15).

A kind of forgery is possible on signature schemes with appendix. Assume that Alice creates a valid signature (M, s), s = fd(H(M), d), on a message M. The function H is certainly not injective, since its input space is much bigger (infinite) than its output space (finite). Suppose that Carol finds a message M′ ≠ M with H(M′) = H(M). In that case, the pair (M′, s) is a valid signature of Alice on the message M′, though it is not Alice who has generated it. (Indeed it has been generated without the knowledge of the private key d of Alice.) In order to foil such attacks, the function H should have second pre-image resistance. The first pre-image resistance and collision resistance properties of a hash function also turn out to be important in the context of digital signatures. See Sections 1.2.6 and A.4 to know about hash functions.

We now describe some specific algorithms for (generating and verifying) digital signatures. Key pairs used for these algorithms are usually identical to those used for encryption algorithms of Section 5.2 and, therefore, we refrain from a duplicate description of the key-generation procedures. We focus our discussion only on signature schemes with appendix.

5.4.1. The RSA Digital Signature Algorithm

As in the RSA encryption scheme of Section 5.2.1, each entity generates an RSA modulus n = pq, which is the product of two distinct large primes p and q. A key pair consists of an encryption exponent e (the public key) and a decryption exponent d (the private key) satisfying ed ≡ 1 (mod φ(n)).

RSA signature generation involves a modular exponentiation in the ring ℤ_n.

Algorithm 5.32. RSA signature generation

Input: A message M to be signed and the signer’s private key (n, d).

Output: The signature (M, s) on M.

Steps:

m := H(M).   /* m ∈ ℤ_n is the short representative of M */
s := m^d (mod n).

Signature generation can be sped up if the parameters p, q, d1 := d rem (p − 1), d2 := d rem (q − 1) and h := q^(−1) (mod p) are stored (secretly) in the private key. Now, one can use Algorithm 5.4 for signature generation.

The verification routine also involves a modular exponentiation in ℤ_n.

Algorithm 5.33. RSA signature verification

Input: A signature (M, s) and the signer’s public key (n, e).

Output: Verification status of the signature.

Steps:

m := H(M).   /* m ∈ ℤ_n is the short representative of M */
m′ := s^e (mod n).
if (m′ = m) { Return “Signature verified”. }
else { Return “Signature not verified”. }

Small values of e speed up RSA signature verification and are not known to expose the scheme to any special attacks. So values of e like 3, 257 and 65,537 are commonly recommended.
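Algorithms 5.32 and 5.33 can be sketched together as follows. The primes are toy values, and SHA-256 reduced modulo n stands in for the hash function H (an illustrative choice; real implementations also apply standardized padding):

```python
import hashlib

p, q = 2579, 2591          # toy primes; n = p*q is far too small for real use
n = p * q
e = 3                      # public exponent, coprime to phi(n)
phi = (p - 1) * (q - 1)
d = pow(e, -1, phi)        # private exponent, e*d = 1 (mod phi(n))

def H(M: bytes) -> int:
    # Short representative of M in Z_n (an assumed instantiation of H).
    return int.from_bytes(hashlib.sha256(M).digest(), "big") % n

def sign(M: bytes) -> int:
    return pow(H(M), d, n)           # s := m^d (mod n)

def verify(M: bytes, s: int) -> bool:
    return pow(s, e, n) == H(M)      # accept iff s^e = m (mod n)

s = sign(b"attack at dawn")
assert verify(b"attack at dawn", s)
# a signature on a different message would, with overwhelming probability, fail
```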

5.4.2. The Rabin Digital Signature Algorithm

As in the Rabin encryption algorithm, we choose two distinct large primes p and q of nearly equal sizes and take n = pq. The public key is n, whereas the private key is the pair (p, q). The Rabin signature scheme is based on the intractability of computing square roots modulo n in absence of the knowledge of the prime factors p and q of n.

Rabin signature generation involves finding a quadratic residue m modulo n as a representative of the message M and computing a square root of m modulo n.

Algorithm 5.34. Rabin signature generation

Input: A message M to be signed and the signer’s private key (p, q).

Output: The signature (M, s) on M.

Steps:

m := H(M).   /* m ∈ ℤ_n is assumed to be a quadratic residue modulo n */

Compute a square root s1 of m modulo p.   /* Algorithm 3.17 */
Compute a square root s2 of m modulo q.   /* Algorithm 3.17 */
Compute s ∈ ℤ_n satisfying s ≡ s1 (mod p) and s ≡ s2 (mod q).   /* CRT */

Verification (Algorithm 5.35) involves a squaring operation in ℤ_n.

Algorithm 5.35. Rabin signature verification

Input: A signature (M, s) and the signer’s public key n.

Output: Verification status of the signature.

Steps:

m := H(M).   /* m is a quadratic residue modulo n */

m′ := s^2 (mod n).

if (m′ = m) { Return “Signature verified”. }

else { Return “Signature not verified”. }
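A sketch of Algorithms 5.34 and 5.35 follows, for toy primes p, q ≡ 3 (mod 4), where a square root of a residue m modulo a prime r is simply m^((r+1)/4) (mod r). Since H(M) need not be a quadratic residue, the representative is re-hashed with a counter until it is one; this counter-based padding is an assumed illustrative choice, not the book's construction:

```python
import hashlib

p, q = 2579, 2591          # toy primes, both = 3 (mod 4)
n = p * q

def is_qr(m: int, r: int) -> bool:
    # Euler's criterion (0 is treated as a residue).
    return pow(m, (r - 1) // 2, r) in (0, 1)

def H(M: bytes, ctr: int) -> int:
    return int.from_bytes(hashlib.sha256(M + bytes([ctr])).digest(), "big") % n

def sign(M: bytes):
    ctr = 0
    while True:            # find a representative that is a QR mod p and mod q
        m = H(M, ctr)
        if is_qr(m, p) and is_qr(m, q):
            break
        ctr += 1
    s1 = pow(m, (p + 1) // 4, p)       # square root of m modulo p
    s2 = pow(m, (q + 1) // 4, q)       # square root of m modulo q
    # CRT: s = s1 (mod p) and s = s2 (mod q)
    s = (s1 * q * pow(q, -1, p) + s2 * p * pow(p, -1, q)) % n
    return s, ctr

def verify(M: bytes, s: int, ctr: int) -> bool:
    return pow(s, 2, n) == H(M, ctr)   # accept iff s^2 = m (mod n)

s, ctr = sign(b"hello")
assert verify(b"hello", s, ctr)
```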

5.4.3. The ElGamal Digital Signature Algorithm

The ElGamal signature algorithm is based on the intractability of computing discrete logarithms in certain groups G. For a general description, we consider an arbitrary (finite Abelian multiplicative) group G of order n. We assume that G is cyclic and that a generator g of G is provided. A key pair is obtained by selecting a random integer (the private key) d, 2 ≤ d ≤ n − 1, and then computing g^d (the public key). The hash function H is assumed to convert arbitrary bit strings to elements of ℤ_n. We further assume that the elements of G can be identified as bit strings (on which the hash function H can be directly applied). G (together with its representation), g and n are considered to be public knowledge and are not input to the signature generation and verification routines.

ElGamal signatures are generated as in Algorithm 5.36. The appendix consists of a pair (s, t) with s ∈ G and t ∈ ℤ_n.

Algorithm 5.36. ElGamal signature generation

Input: A message M to be signed and the signer’s private key d.

Output: The signature (M, s, t) on M.

Steps:

Generate a random session key d′, 2 ≤ d′ ≤ n − 1, with gcd(d′, n) = 1.

s := g^(d′).

t := (d′)^(−1)·(H(M) − d·H(s)) (mod n).

The costliest step in the ElGamal signature generation algorithm is the exponentiation g^(d′). Here, G is assumed to be cyclic and the exponent d′ to be O(n). We will shortly see modifications of the ElGamal scheme in which the exponent can be chosen to be much smaller, namely O(r), where r is a suitably large (prime) divisor of n.

In order to forge a signature, Carol can generate a random session key pair (d′, g^(d′)) and obtain s. For the computation of t, she requires the private key d of the signer. Conversely, if t (and d′) are available to Carol, she can easily compute the private key d. Thus, forging an ElGamal signature is equivalent to solving the DLP in G.

Each invocation of the ElGamal signature generation algorithm must use a new session key (d′, g^(d′)). If the same session key (d′, g^(d′)) is used to generate the signatures (M1, s1, t1) and (M2, s2, t2) on two different messages M1 and M2, then we have (t1 − t2)·d′ ≡ H(M1) − H(M2) (mod n), whence d′ can be computed, provided that gcd(t1 − t2, n) = 1. If d′ is known, the private key d can be easily computed (see Exercise 5.6 for a similar situation).
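The congruence above can be checked with toy numbers. The sketch below (illustrative parameters; SHA-256 stands in for H) reuses one session key for many messages and solves for d′ from the first message pair with gcd(t1 − t2, n) = 1:

```python
import hashlib, secrets
from math import gcd

p, g = 2579, 2            # toy group Z_p^* of order n = p - 1; 2 generates it
n = p - 1

def H(x) -> int:
    data = x if isinstance(x, bytes) else x.to_bytes(4, "big")
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n

d = 2 + secrets.randbelow(n - 2)             # signer's private key
while True:                                  # session key, invertible modulo n
    dp = 2 + secrets.randbelow(n - 2)
    if gcd(dp, n) == 1:
        break
s = pow(g, dp, p)                            # the SAME s appears in every signature

def t_of(M: bytes) -> int:
    return pow(dp, -1, n) * (H(M) - d * H(s)) % n

recovered = None
msgs = [b"msg-%d" % i for i in range(64)]
for i in range(len(msgs)):
    for j in range(i + 1, len(msgs)):
        t1, t2 = t_of(msgs[i]), t_of(msgs[j])
        if gcd(t1 - t2, n) == 1:             # (t1 - t2) d' = H(M1) - H(M2) (mod n)
            recovered = (H(msgs[i]) - H(msgs[j])) * pow(t1 - t2, -1, n) % n
            break
    if recovered is not None:
        break
assert recovered == dp                       # the session key is exposed
```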

ElGamal signature verification is described in Algorithm 5.37. This is based on the observation that for a (valid) ElGamal signature (M, s, t) on a message M we have g^(H(M)) = (g^d)^(H(s))·s^t. This verification calls for three exponentiations in G to full-size exponents. Working in a suitable (cyclic) subgroup of G makes the algorithm more efficient.

Algorithm 5.37. ElGamal signature verification

Input: A signature (M, s, t) and the signer’s public key gd.

Output: Verification status of the signature.

Steps:

a1 := g^(H(M)).

a2 := (g^d)^(H(s))·s^t.

if (a1 = a2) { Return “Signature verified”. }

else { Return “Signature not verified”. }
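Algorithms 5.36 and 5.37 can be sketched together in the toy group G = ℤ_p* of order n = p − 1 (illustrative parameters; SHA-256 modulo n stands in for H, applied to both messages and group elements):

```python
import hashlib, secrets
from math import gcd

p = 2579
n = p - 1                 # group order
g = 2                     # 2 generates Z_2579^*

def H(x) -> int:
    # Hash messages (bytes) and group elements (ints) into Z_n.
    data = x if isinstance(x, bytes) else x.to_bytes(4, "big")
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n

d = 2 + secrets.randbelow(n - 2)   # private key
y = pow(g, d, p)                   # public key g^d

def sign(M: bytes):
    while True:                    # fresh session key, invertible modulo n
        dp = 2 + secrets.randbelow(n - 2)
        if gcd(dp, n) == 1:
            break
    s = pow(g, dp, p)
    t = pow(dp, -1, n) * (H(M) - d * H(s)) % n
    return s, t

def verify(M: bytes, s: int, t: int) -> bool:
    # accept iff g^H(M) = (g^d)^H(s) * s^t in G
    return pow(g, H(M), p) == (pow(y, H(s), p) * pow(s, t, p)) % p

s, t = sign(b"message")
assert verify(b"message", s, t)
```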

ElGamal signatures use a congruence of the form A ≡ d·B + d′·C (mod n), and verification is done by checking the equality g^A = (g^d)^B·s^C. Our choice for A, B and C was A = H(M), B = H(s) and C = t. Indeed, any permutation of H(M), H(s) and t is acceptable as A, B, C. These give rise to several variants of the ElGamal scheme. It is also allowed to take as A, B, C any permutation of H(M)H(s), t, 1 or H(M)H(s), H(M)t, 1 or H(M)H(s), H(s)t, 1 or H(M)t, H(s)t, 1. Permutations of H(M)H(t), H(s), 1 or H(M), H(s)t, 1, on the other hand, are known to have security bugs. For any allowed combination of A, B, C, the choices ±A, ±B, ±C are also valid. For some other variants, see Horster et al. [132].

5.4.4. The Schnorr Digital Signature Algorithm

The Schnorr signature scheme is a modification of the ElGamal scheme and is faster than the ElGamal scheme, since it works in the subgroup of G generated by an element g of relatively small order. We assume that r := ord g is a prime (though it suffices to have ord g possessing a suitably large prime divisor). We suppose further that the elements of G are represented as bit strings and that we have a hash function H that maps bit strings to elements of ℤ_r. A key pair now consists of an integer d (the private key), 2 ≤ d ≤ r − 1, and the element g^d (the public key).

Schnorr signature generation is described in Algorithm 5.38.

Algorithm 5.38. Schnorr signature generation

Input: A message M to be signed and the signer’s private key d.

Output: The signature (M, s, t) on M.

Steps:

Generate a random session key pair (d′, g^(d′)), 2 ≤ d′ ≤ r − 1.

s := H(M ‖ g^(d′)).   /* Here ‖ denotes string concatenation */
t := d′ − d·s (mod r).

Similar to the ElGamal scheme, the most time-consuming step in this routine is the computation of the session public key g^(d′). But now d′ < r and, therefore, Algorithm 5.38 runs faster than Algorithm 5.36. One can easily check that forging a signature of Alice is computationally equivalent to determining Alice’s private key d from her public key g^d. The importance of using a new session key pair in each run of Algorithm 5.38 is exactly the same as in the case of ElGamal signatures.

The verification of Schnorr signatures (Algorithm 5.39) is based upon the fact that g^(d′) = g^t·(g^d)^s. Thus, the knowledge of g, s, t and g^d allows one to compute g^(d′) and subsequently H(M ‖ g^(d′)). The algorithm involves two exponentiations with both the exponents (t and s) being < r. Thus, signature verification is also faster in the Schnorr scheme than in the ElGamal scheme.

Algorithm 5.39. Schnorr signature verification

Input: A signature (M, s, t) and the signer’s public key gd.

Output: Verification status of the signature.

Steps:

u := g^t·(g^d)^s.

s′ := H(M ‖ u).

if (s′ = s) { Return “Signature verified”. }

else { Return “Signature not verified”. }
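Algorithms 5.38 and 5.39 can be sketched as follows in the order-r subgroup of ℤ_p* (toy parameters; SHA-256 modulo r plays the role of H, and group elements are concatenated as fixed-width byte strings):

```python
import hashlib, secrets

p = 2579
r = 1289                 # prime order of g
g = 4                    # 4 = 2^2 has order r modulo p

def H(data: bytes) -> int:
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % r

d = 2 + secrets.randbelow(r - 2)   # private key
y = pow(g, d, p)                   # public key g^d

def sign(M: bytes):
    dp = 2 + secrets.randbelow(r - 2)             # fresh session key d'
    s = H(M + pow(g, dp, p).to_bytes(4, "big"))   # s := H(M || g^d')
    t = (dp - d * s) % r                          # t := d' - d*s (mod r)
    return s, t

def verify(M: bytes, s: int, t: int) -> bool:
    u = (pow(g, t, p) * pow(y, s, p)) % p         # u = g^t * (g^d)^s = g^d'
    return H(M + u.to_bytes(4, "big")) == s

s, t = sign(b"message")
assert verify(b"message", s, t)
```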

5.4.5. The Nyberg–Rueppel Digital Signature Algorithm

The Nyberg–Rueppel (NR) signature algorithm is another adaptation of the ElGamal signature scheme and is based on the intractability of solving the DLP in a group G. We assume that ord G = n has a large prime divisor r and that an element g ∈ G of order r is available. Here, a key pair is of the form (d, g^d), where the private key d is an integer between 2 and r − 1 (both inclusive) and where the public key g^d is an element of 〈g〉. The hash function H converts bit strings to elements of ℤ_r. We also assume the existence of a (publicly known) function F : 〈g〉 → ℤ_r.

NR signature generation can be performed as in Algorithm 5.40.

Algorithm 5.40. Nyberg–Rueppel signature generation

Input: A message M to be signed and the signer’s private key d.

Output: The signature (M, s, t) on M.

Steps:

Generate a random session key pair (d′, g^(d′)), 2 ≤ d′ ≤ r − 1.

s := H(M) + F(g^(d′)) (mod r).

t := d′ − d·s (mod r).

The only difference between NR signature generation and Schnorr signature generation is the way s is computed. Therefore, our remarks on the security and the efficiency of the Schnorr scheme apply equally well to the NR scheme. Signature verification is also analogous, as Algorithm 5.41 shows.

Algorithm 5.41. Nyberg–Rueppel signature verification

Input: A signature (M, s, t) and the signer’s public key g^d.

Output: Verification status of the signature.

Steps:

u := g^t(g^d)^s.

s′ := H(M) + F(u) (mod r).

if (s′ = s) { Return “Signature verified”. }

else { Return “Signature not verified”. }
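A sketch of NR signing and verification, reusing the toy subgroup from before, makes the one-line difference from the Schnorr scheme visible. The map F(x) = x mod r is an assumed instance of the public function F : 〈g〉 → ℤ_r, and the small parameters are illustrative only.

```python
import hashlib
import secrets

# Toy parameters (illustrative): p = 2r + 1, g of order r in Z_p^*.
p, r, g = 2039, 1019, 4

def H(msg: bytes) -> int:
    return int.from_bytes(hashlib.sha1(msg).digest(), 'big') % r

def F(x: int) -> int:
    # An assumed concrete instance of the public map F : <g> -> Z_r.
    return x % r

def nr_sign(M: bytes, d: int):
    d1 = 2 + secrets.randbelow(r - 2)        # session key d'
    s = (H(M) + F(pow(g, d1, p))) % r        # s := H(M) + F(g^d') (mod r)
    t = (d1 - d * s) % r                     # t := d' - d s (mod r)
    return s, t

def nr_verify(M: bytes, s: int, t: int, y: int) -> bool:
    u = (pow(g, t, p) * pow(y, s, p)) % p    # u = g^t (g^d)^s = g^d'
    return (H(M) + F(u)) % r == s

d = 2 + secrets.randbelow(r - 2)
y = pow(g, d, p)
s, t = nr_sign(b"message", d)
ok = nr_verify(b"message", s, t, y)
```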

5.4.6. The Digital Signature Algorithm (DSA)

The digital signature algorithm (DSA) has been proposed as a standard by the US National Institute of Standards and Technology (NIST) and later accepted as a Federal Information Processing Standard (FIPS) by the US government. This standard is also known as the digital signature standard (DSS). See the NIST document [220] for a complete description of this standard.

Algorithm 5.42. Generation of DSA primes

Input: An integer λ, 0 ≤ λ ≤ 8.

Output: A prime p of bit length l := 512+64λ such that p – 1 has a prime divisor r of length 160 bits.

Steps:

Let l – 1 = 160n + b, 0 ≤ b < 160.     /* n = (l–1) quot 160, b = (l–1) rem 160. */
while (1) {
   do {
       Choose a random seed σ which is a bit string of length k ≥ 160.
       Compute the bit string u := H(σ) ⊕ H((σ + 1) rem 2^k).
       r := u OR 2^159 OR 1.    /* Set the most and the least significant bits of u */
   } while (r is not a prime).
   i := 0, f := 2.
   while (i < 4096) {
       for j = 0, 1, . . . , n { v_j := H((σ + f + j) rem 2^k). }
       v := v_0 + v_1 2^160 + · · · + v_{n–1} 2^{160(n–1)} + (v_n rem 2^b) 2^{160n} + 2^{l–1}.
                                                     /* v is an integer of bit length exactly l */
       p := v – (v rem 2r) + 1.   /* p – 1 is a multiple of 2r */
       if (p is prime) { Return (p, r). }
       i++, f := f + n + 1.
   }
}
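Algorithm 5.42 can be exercised directly in Python: hashlib supplies SHA-1, and a Miller–Rabin routine (our assumption; the NIST document prescribes its own primality checks) stands in for the “is a prime” tests. All helper names are ours, and the sketch is unoptimized.

```python
import hashlib
import random

def sha1_int(x: int, k: int) -> int:
    """SHA-1 of the k-bit big-endian encoding of x, as a 160-bit integer."""
    return int.from_bytes(hashlib.sha1(x.to_bytes(k // 8, 'big')).digest(), 'big')

def is_prime(m: int, rounds: int = 30) -> bool:
    """Miller-Rabin probabilistic primality test (stand-in for the NIST test)."""
    if m < 2:
        return False
    for sp in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31):
        if m % sp == 0:
            return m == sp
    d, s = m - 1, 0
    while d % 2 == 0:
        d, s = d // 2, s + 1
    for _ in range(rounds):
        a = random.randrange(2, m - 1)
        x = pow(a, d, m)
        if x in (1, m - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, m)
            if x == m - 1:
                break
        else:
            return False
    return True

def dsa_primes(lam: int = 0, k: int = 160):
    l = 512 + 64 * lam
    n, b = divmod(l - 1, 160)
    while True:
        # Generate the 160-bit prime r from a random seed sigma.
        while True:
            sigma = random.getrandbits(k)
            u = sha1_int(sigma, k) ^ sha1_int((sigma + 1) % 2**k, k)
            r = u | 2**159 | 1               # force the top and bottom bits
            if is_prime(r):
                break
        f = 2
        for _ in range(4096):
            v = [sha1_int((sigma + f + j) % 2**k, k) for j in range(n + 1)]
            w = sum(v[j] << (160 * j) for j in range(n)) \
                + ((v[n] % 2**b) << (160 * n)) + 2**(l - 1)
            p = w - (w % (2 * r)) + 1        # make p - 1 a multiple of 2r
            if is_prime(p):
                return p, r
            f += n + 1

p, r = dsa_primes(0)                         # lambda = 0: a 512-bit p
```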

DSA is based on the intractability of the DLP in the finite field F_p, where p is a prime of bit length 512 + 64λ with 0 ≤ λ ≤ 8. The cardinality p – 1 of F_p* is required to have a prime divisor r of length (exactly) 160 bits. The NIST document [220] specifies a standard method for obtaining such a field F_p, which we describe in Algorithm 5.42. We denote by H the SHA-1 hash function that converts bit strings of arbitrary length to bit strings of length 160. We will identify (often without explicit mention) the bit string a_1a_2 . . . a_k of length k with the integer a_1 2^{k–1} + a_2 2^{k–2} + · · · + a_{k–1} 2 + a_k.

The DSA prime generation procedure (Algorithm 5.42) starts by selecting the prime divisor r and then tries to find a prime p such that r|(p–1). The outputs of H are utilized as pseudorandomly generated bit strings of length 160.

Once the DSA parameters p and r are available, an element g ∈ F_p* of multiplicative order r can be computed by Algorithm 3.26. Henceforth we assume that p, r and g are public knowledge and need not be supplied as inputs to the signature generation and verification routines. A DSA key pair consists of an integer (the private key) d, 2 ≤ d ≤ r – 1, and the element g^d (the public key) of F_p*.

The DSA signature-generation procedure is given as Algorithm 5.43. One may additionally include a check whether s = 0 or t = 0, and, if so, repeat signature generation with another session key. But this, being an extremely rare phenomenon, can be ignored for all practical purposes. Both s and t are elements of ℤ_r and hence are represented as integers between 0 and r – 1.

Algorithm 5.43. DSA signature generation

Input: A message M to be signed and the signer’s private key d.

Output: The signature (M, s, t) on M.

Steps:

Generate a random session key d′, 2 ≤ d′ ≤ r – 1.

s := (g^d′ (mod p)) (mod r).

t := d′^–1(H(M) + ds) (mod r).

DSA signature verification is described in Algorithm 5.44. For a valid signature (M, s, t) on a message M, the algorithm computes w ≡ d′(H(M) + ds)^–1 (mod r), w1 ≡ H(M)w (mod r) and w2 ≡ sw (mod r). Therefore, g^w1 (g^d)^w2 ≡ g^(w1 + d w2) ≡ g^(w(H(M)+ds)) ≡ g^(d′(H(M)+ds)^–1 (H(M)+ds)) ≡ g^d′ (mod p). Reduction modulo r now gives (g^w1 (g^d)^w2 (mod p)) (mod r) = (g^d′ (mod p)) (mod r) = s.

Algorithm 5.44. DSA signature verification

Input: A signature (M, s, t) and the signer’s public key g^d.

Output: Verification status of the signature.

Steps:

if (s ∉ {1, 2, . . . , r – 1} or t ∉ {1, 2, . . . , r – 1}) { Return “Signature not verified”. }

w := t^–1 (mod r).

w1 := H(M)w (mod r).

w2 := sw (mod r).

if ((g^w1 (g^d)^w2 (mod p)) (mod r) = s) { Return “Signature verified”. }

else { Return “Signature not verified”. }

DSA signature generation performs a single exponentiation and DSA verification does two exponentiations modulo p. All the exponents are positive and ≤ r. Thus, DSA is essentially as fast as the Schnorr scheme or the NR scheme.
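The generation and verification routines fit in a few lines of Python. The toy parameters below (p = 2039, r = 1019 with r | p – 1, and g = 4 of order r) are illustrative assumptions, far below the sizes mandated by the standard; modular inverses via pow(x, -1, m) need Python 3.8 or later.

```python
import hashlib
import secrets

# Toy parameters (illustrative only; real DSA uses a 512- to 1024-bit p
# and a 160-bit r produced by Algorithm 5.42).
p, r, g = 2039, 1019, 4

def H(M: bytes) -> int:
    return int.from_bytes(hashlib.sha1(M).digest(), 'big') % r

def dsa_sign(M: bytes, d: int):
    while True:
        d1 = 2 + secrets.randbelow(r - 2)            # session key d'
        s = pow(g, d1, p) % r                        # s := (g^d' mod p) mod r
        t = (pow(d1, -1, r) * (H(M) + d * s)) % r    # t := d'^-1 (H(M) + d s)
        if s != 0 and t != 0:                        # the rare-case check
            return s, t

def dsa_verify(M: bytes, s: int, t: int, y: int) -> bool:
    if not (1 <= s <= r - 1 and 1 <= t <= r - 1):
        return False
    w = pow(t, -1, r)
    w1, w2 = (H(M) * w) % r, (s * w) % r
    u = (pow(g, w1, p) * pow(y, w2, p)) % p          # g^w1 (g^d)^w2 = g^d'
    return u % r == s

d = 2 + secrets.randbelow(r - 2)                     # private key
y = pow(g, d, p)                                     # public key g^d
s, t = dsa_sign(b"sample", d)
ok = dsa_verify(b"sample", s, t, y)
```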

*5.4.7. The Elliptic Curve Digital Signature Algorithm (ECDSA)

The ECDSA is the elliptic curve analogue of the DSA. Algorithm 5.45 describes the generation of the domain parameters necessary to set up an ECDSA system. One first selects a suitable finite field F_q and takes a random elliptic curve E over F_q. E must be such that the cardinality n of the group E(F_q) has a suitably large prime divisor r. One generates a random point P ∈ E(F_q) of order r and works in the subgroup 〈P〉 of E(F_q) generated by P. It is assumed that q is either a prime p or a power 2^m of 2.

Algorithm 5.45. Generation of ECDSA parameters

Input: A finite field F_q, where q is a prime p or a power 2^m of 2.

Output: A set of parameters E, n, r, P for the ECDSA.

Steps:

while (1) {
  Choose a, b ∈ F_q randomly.
  Consider the elliptic curve E over F_q defined by a and b.
  Compute n := ord E(F_q).
  if (n has a prime divisor r > max(2^160, 4√q)) {
     if (n ∤ (q^k – 1) for k = 1, . . . , 20) and (n ≠ q) {
        do {
          Select P′ ∈ E(F_q) randomly.
          P := (n/r)P′.
        } while (P = O).
        Return (E, n, r, P).
     }
  }
}

The order n = ord E(F_q) can be computed using the SEA algorithm (for q = p) or the Satoh–FGH algorithm (for q = 2^m) described in Section 3.6. The integer n should be factored to check if it has a prime divisor r > max(2^160, 4√q). The condition n ∤ (q^k – 1) for small values of k is necessary to avoid the MOV attack, whereas the condition n ≠ q ensures that the SmartASS attack cannot be mounted. E(F_q) is not necessarily a cyclic group. But, r being a prime, a point P := (n/r)P′ different from the point at infinity O must be one of order r.

An ECDSA key pair consists of a private key d (an integer in the range 2 ≤ d ≤ r – 1) and the corresponding public key dP ∈ 〈P〉. H denotes the hash function SHA-1 that converts bit strings of arbitrary length to bit strings of length 160. As discussed in connection with DSA, we identify bit strings with integers. We also associate elements of F_q with integers in the set {0, 1, . . . , q – 1}. ECDSA signatures can be generated as in Algorithm 5.46. It is necessary to check the conditions s ≠ 0 and t ≠ 0. If either condition fails, one should re-run the procedure with a new session key pair.

Algorithm 5.46. ECDSA signature generation

Input: A message M to be signed and the signer’s private key d.

Output: The signature (M, s, t) on M.

Steps:

Generate a random session key pair (d′, d′P), 2 ≤ d′ ≤ r – 1.

/* Let us denote d′P = (h, k) with h, k ∈ F_q */

s := h (mod r).

t := d′^–1 (H(M) + ds) (mod r).

ECDSA signature verification is explained in Algorithm 5.47. The correctness of this algorithm can be proved like that of Algorithm 5.44.

Algorithm 5.47. ECDSA signature verification

Input: A signature (M, s, t) and the signer’s public key dP.

Output: Verification status of the signature.

Steps:

if (s ∉ {1, 2, . . . , r – 1} or t ∉ {1, 2, . . . , r – 1}) { Return “Signature not verified”. }

w := t^–1 (mod r).

w1 := H(M)w (mod r).

w2 := sw (mod r).

Q := w1 P + w2(dP).

if (Q = O) { Return “Signature not verified”. }

/* Otherwise denote Q = (h, k) */

s′ := h (mod r).

if (s′ = s) { Return “Signature verified”. }

else { Return “Signature not verified”. }
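For a field small enough that points can be counted by brute force, the whole ECDSA pipeline can be sketched end to end. The field q = 233, the threshold r > 20 and the curve coefficients are toy assumptions standing in for the real requirements (SEA/Satoh–FGH point counting and r > max(2^160, 4√q)); the curve equation Y^2 = X^3 + aX + b assumes q is prime.

```python
import hashlib
import random

q, a = 233, 1                              # toy prime field and coefficient a
O = None                                   # the point at infinity

def add(P, Q):
    """Affine point addition on Y^2 = X^3 + aX + b over F_q."""
    if P is O: return Q
    if Q is O: return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % q == 0:
        return O
    if P == Q:
        lam = (3 * x1 * x1 + a) * pow(2 * y1, -1, q) % q
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, q) % q
    x3 = (lam * lam - x1 - x2) % q
    return (x3, (lam * (x1 - x3) - y1) % q)

def mul(k, P):
    """Double-and-add scalar multiplication."""
    R = O
    while k:
        if k & 1: R = add(R, P)
        P, k = add(P, P), k >> 1
    return R

def curve_order(b):
    """Brute-force point count; feasible only for tiny q."""
    n = 1
    for x in range(q):
        z = (x * x * x + a * x + b) % q
        n += 1 if z == 0 else (2 if pow(z, (q - 1) // 2, q) == 1 else 0)
    return n

def largest_prime_factor(n):
    f, m, last = 2, n, 1
    while f * f <= m:
        while m % f == 0:
            last, m = f, m // f
        f += 1
    return m if m > 1 else last

# Parameter generation in the spirit of Algorithm 5.45.
while True:
    b = random.randrange(q)
    if (4 * a**3 + 27 * b**2) % q == 0:    # skip singular curves
        continue
    n = curve_order(b)
    r = largest_prime_factor(n)
    if r <= 20 or r == q:                  # toy analogue of the size checks
        continue
    while True:                            # random point P' of the curve
        x = random.randrange(q)
        z = (x**3 + a * x + b) % q
        if z == 0 or pow(z, (q - 1) // 2, q) != 1:
            continue
        y = next(v for v in range(q) if v * v % q == z)
        P = mul(n // r, (x, y))            # P := (n/r) P'
        if P is not O:
            break
    break

def H(M): return int.from_bytes(hashlib.sha1(M).digest(), 'big') % r

def sign(M, d):
    while True:
        d1 = 2 + random.randrange(r - 2)
        s = mul(d1, P)[0] % r              # x-coordinate of d'P, mod r
        t = pow(d1, -1, r) * (H(M) + d * s) % r
        if s and t:
            return s, t

def verify(M, s, t, Y):
    w = pow(t, -1, r)
    Q = add(mul(H(M) * w % r, P), mul(s * w % r, Y))
    return Q is not O and Q[0] % r == s

d = 2 + random.randrange(r - 2)
Y = mul(d, P)                              # public key dP
s, t = sign(b"ec message", d)
ok = verify(b"ec message", s, t, Y)
```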

*5.4.8. The XTR Signature Algorithm

As discussed in Section 5.2.7, the XTR family of algorithms is an adaptation of other conventional algorithms over finite fields. XTR achieves a speed-up by a factor of about three using a clever way of representing elements of certain finite fields. It is no surprise that the DLP-based signature algorithms described so far can be given efficient XTR renderings. We explain here XTR–DSA, the XTR version of the digital signature algorithm.

In order to set up an XTR system, we need a prime p ≡ 2 (mod 3). The XTR group G is a subgroup of the multiplicative group F_{p^6}* and has a prime order q dividing p^2 – p + 1. For compliance with the original version of DSA, one requires q to be of bit length 160. The trace map Tr : F_{p^6} → F_{p^2} taking x to x + x^{p^2} + x^{p^4} is used to represent an element x ∈ G by the element Tr(x) ∈ F_{p^2}. Under this representation, arithmetic in G translates to that in F_{p^2}. For example, we have seen how exponentiation in G can be efficiently implemented using F_{p^2} arithmetic (Algorithm 5.20). The trace Tr(g) of a generator g of G should also be made available for setting up the XTR domain parameters. In Section 5.2.7, we have discussed how a random set of XTR parameters (p, q, Tr(g)) can be computed.

An XTR key pair comprises a random integer d (the private key) and the trace Tr(g^d) (the public key). Algorithm 5.20 is used to compute Tr(g^d) from Tr(g) and d. This algorithm gives Tr(g^{d–1}) and Tr(g^{d+1}) as by-products. For an implementation of XTR–DSA, we require these two elements of F_{p^2}. So we assume that the public key consists of the three traces S_d(Tr(g)) = (Tr(g^{d–1}), Tr(g^d), Tr(g^{d+1})). As explained in Lenstra and Verheul [172], the values Tr(g^{d–1}) and Tr(g^{d+1}) can be computed easily from Tr(g^d) even when d is unknown, so it suffices to store only Tr(g^d) as the public key. But we avoid the details of this computation here and assume that all the three traces are available to the signature verifier.

Algorithm 5.20 provides an efficient way of computing exponentiations in G. For DSA-like signature verification (cf. Algorithm 5.44), one computes products of the form g^a(g^d)^b with d unknown. In the XTR world, this amounts to computing the trace Tr(g^a(g^d)^b) from the knowledge of a, b, Tr(g) and Tr(g^d) (or S_d(Tr(g))) but without the knowledge of d. The XTR exponentiation algorithm is as such not applicable in this situation. We should, therefore, prescribe a method to compute traces of products in G. Doing that requires some mathematics that we mention now without proofs. See Lenstra and Verheul [170] for the missing details.

Let e := ab^–1 (mod q). Then, a + bd ≡ b(e + d) (mod q), that is, Tr(g^a(g^d)^b) = Tr(g^{b(e+d)}), so that it is sufficient to compute Tr(g^{e+d}) from the knowledge of e, Tr(g) and Tr(g^d). We treat the 3-tuple S_k(Tr(g)) as a row vector (over F_{p^2}). For c ∈ F_{p^2}, let M_c denote the matrix

Equation 5.9


We take c := Tr(g). It can be shown that det M_{Tr(g)} ≠ 0, that is, the matrix M_{Tr(g)} is invertible, and we have:

Equation 5.10


Here the superscript t denotes the transpose of a matrix. With these observations, one can write the procedure for computing Tr(g^a(g^d)^b) as in Algorithm 5.48.

Algorithm 5.48. XTR multiplication

Input: a, b, Tr(g) and S_d(Tr(g)) for some unknown d.

Output: Tr(g^a(g^d)^b).

Steps:

Compute e := ab^–1 (mod q).
Compute S_e(Tr(g)) using Algorithm 5.20 with c := Tr(g) and n := e.
Use Equation (5.10) to compute Tr(g^{e+d}).
Use Algorithm 5.20 with c := Tr(g^{e+d}) and n := b to compute
    Tr(g^{b(e+d)}).
Return Tr(g^{b(e+d)}) = Tr(g^a(g^d)^b).

XTR–DSA signature generation (Algorithm 5.49) is an obvious adaptation of Algorithm 5.43.

Algorithm 5.49. XTR signature generation

Input: A message M to be signed and the signer’s private key d.

Output: The signature (M, s, t) on M with s, t ∈ ℤ_q.

Steps:

do {
  Generate a random d′ ∈ {2, . . . , q – 1}.
  Compute Tr(g^d′).          /* Use Algorithm 5.20 with c := Tr(g) and n := d′ */
  Let Tr(g^d′) = x1α + x2α^2.     /* α is defined in Section 5.2.7 to represent F_{p^2} */
  s := x1 + px2 (mod q).
} while (s = 0).
t := d′^–1(H(M) + ds) (mod q).         /* Here H is the hash function SHA-1 */

The bulk of the time taken by Algorithm 5.49 goes into the computation of Tr(g^d′). Since the trace representation of XTR makes this exponentiation three times as efficient as the corresponding DSA exponentiation, XTR–DSA signature generation runs nearly three times as fast as DSA signature generation.

XTR–DSA signature verification can be easily translated from Algorithm 5.44 and is shown in Algorithm 5.50. The most costly step in the XTR–DSA verification routine is the computation of Tr(g^w1 (g^d)^w2). One uses Algorithm 5.48 for this purpose. This algorithm, in turn, invokes the exponentiation Algorithm 5.20 twice. For the original DSA signature verification (Algorithm 5.44), the costliest step is the computation of g^w1 (g^d)^w2, which involves two exponentiations and a (cheap) multiplication. A careful analysis shows that XTR–DSA signature verification runs nearly 1.75 times as fast as DSA verification.

Algorithm 5.50. XTR signature verification

Input: An XTR–DSA signature (M, s, t) on a message M and the signer’s public key (Tr(g^{d–1}), Tr(g^d), Tr(g^{d+1})).

Output: Verification status of the signature.

Steps:

if (s ∉ {1, 2, . . . , q – 1} or t ∉ {1, 2, . . . , q – 1}) { Return “Signature not verified”. }

w := t^–1 (mod q).

w1 := H(M)w (mod q).

w2 := sw (mod q).

Compute Tr(g^w1 (g^d)^w2).     /* Use Algorithm 5.48 */
Write this trace value as x1α + x2α^2.     /* See Section 5.2.7 */

s′ := x1 + px2 (mod q).

if (s′ = s) { Return “Signature verified”. }

else { Return “Signature not verified”. }

*5.4.9. The NTRUSign Algorithm

The NTRU Signature Scheme (NSS) (Hoffstein et al. [131]) is an adaptation of the NTRU encryption algorithm discussed in Section 5.2.8. Cryptanalytic studies (Gentry et al. [110]) show that the NSS has security flaws. A newer version of the NSS, referred to as NTRUSign and resistant to these attacks, has been proposed by Hoffstein et al. [128]. In this section, we provide a brief overview of NTRUSign.

In order to set up the domain parameters for NTRUSign, we start with a positive integer n and consider the ring R := ℤ[X]/(X^n – 1). Elements of R are polynomials with integer coefficients and of degree ≤ n – 1. The multiplication of R is denoted by ⊛, which is essentially the multiplication of two polynomials of ℤ[X] followed by setting X^n = 1. We also fix a positive integer β to be used as a modulus for the coefficients of the polynomials in R. Two subsets F_f and F_g of R, consisting of polynomials with small coefficients governed by suitably chosen parameters ν_f and ν_g, are of importance for the NTRUSign algorithm. The message space is assumed to consist of pairs of polynomials of R with coefficients reduced modulo β. We further assume that we have at our disposal a hash function H that maps messages (that is, binary strings) to elements (m1, m2) of this message space.

Let a = a_0 + a_1X + · · · + a_{n–1}X^{n–1} ∈ R. The average of the coefficients of a is denoted by ā, that is, ā := (1/n)(a_0 + a_1 + · · · + a_{n–1}). The centred norm ‖a‖ of a is defined by

‖a‖^2 := (a_0 – ā)^2 + (a_1 – ā)^2 + · · · + (a_{n–1} – ā)^2.

For two polynomials a, b ∈ R, one also defines

‖(a, b)‖^2 := ‖a‖^2 + ‖b‖^2.

The parameters ν_f and ν_g should be so chosen that any polynomial f ∈ F_f and any polynomial g ∈ F_g have (centred) norms of the order O(n). An upper bound B on the norms (of pairs of polynomials) should also be predetermined.
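The ring operations just introduced are easy to experiment with. The following minimal sketch implements the convolution product ⊛ and the centred norm for a toy degree n = 7 (real NTRUSign uses n = 251); the sample polynomials are arbitrary illustrations.

```python
# A minimal sketch of arithmetic in R = Z[X]/(X^n - 1).
n = 7

def star(u, v):
    """Convolution product u * v: polynomial product followed by X^n = 1."""
    w = [0] * n
    for i in range(n):
        for j in range(n):
            w[(i + j) % n] += u[i] * v[j]
    return w

def centred_norm_sq(a):
    """||a||^2 = sum_i (a_i - abar)^2, with abar the coefficient average."""
    abar = sum(a) / n
    return sum((c - abar) ** 2 for c in a)

f = [1, 0, 1, -1, 0, 1, 0]
g = [0, 1, 1, 0, -1, 0, 1]
h = star(f, g)
pair_norm_sq = centred_norm_sq(f) + centred_norm_sq(g)   # ||(f, g)||^2
```

Note that the constant polynomial has centred norm 0, which is why the norm is “centred”: it measures deviation from the coefficient average.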

Typical values for NTRUSign parameters are

(n, β, νf, νg, B) = (251, 128, 73, 71, 300).

It is estimated that these choices lead to a security level at least as high as in an RSA scheme with a 1024-bit modulus. For very long-term security, one may go for (n, β) = (503, 256).

In order to set up a key pair, the signer first chooses two random polynomials f ∈ F_f and g ∈ F_g. The polynomial f should be invertible modulo β, and the signer computes f_β ∈ R with the property that f ⊛ f_β ≡ 1 (mod β). The public key of the signer is the polynomial h ≡ f_β ⊛ g (mod β), whereas the private key is the tuple (f, g, F, G), where F and G are two polynomials in R satisfying

f ⊛ G – g ⊛ F = β    and    ‖F‖, ‖G‖ = O(n).

Hoffstein et al. [128] present an algorithm to compute such F and G from the polynomials f and g, with the norms ‖F‖ and ‖G‖ bounded in terms of a given constant c.

Algorithm 5.51. NTRU signature generation

Input: A message M to be signed and the signer’s private key (f, g, F, G).

Output: The signature (M, s) on M.

Steps:

Compute (m1, m2) := H(M).

Compute polynomials A, B, a, b ∈ R satisfying

G ⊛ m1 – F ⊛ m2 = A + βB,
–g ⊛ m1 + f ⊛ m2 = a + βb,

where a and A have coefficients in the range between –β/2 and +β/2.

Compute s ≡ f ⊛ B + F ⊛ b (mod β).

NTRUSign signature generation is described in Algorithm 5.51. It is apparent that the NTRUSign algorithm derives its security from the difficulty of computing a vector v in a certain lattice, close to the vector defined by the hashed message (m1, m2). For defining the lattice, we first note that a polynomial u = u_0 + u_1X + · · · + u_{n–1}X^{n–1} ∈ R can be identified with the vector (u_0, u_1, . . . , u_{n–1}) of dimension n defined by its coefficients. Similarly, two polynomials u, v ∈ R define a vector, denoted by (u, v), of dimension 2n. To the public key h we associate the 2n-dimensional lattice

L_h := {(u, v) ∈ R × R | v ≡ u ⊛ h (mod β)}.

It is clear from the definitions that both (f, g) and (F, G) are in L_h.

If h = h_0 + h_1X + · · · + h_{n–1}X^{n–1}, then for each i = 0, 1, . . . , n – 1 we have

X^i ⊛ h(X) ≡ (h_{n–i}, . . . , h_{n–1}, h_0, . . . , h_{n–i–1}) (mod β), and
0 ⊛ h(X) ≡ βX^i ≡ 0 (mod β),

so that the pairs (X^i, X^i ⊛ h) and (0, βX^i) all belong to L_h. It follows immediately that L_h is generated by the rows of the 2n × 2n matrix

( I_n    H   )
(  0    βI_n ),

where I_n is the n × n identity matrix and H is the n × n matrix whose i-th row is the coefficient vector of X^i ⊛ h(X) (mod β).
Now, consider the signature generation routine (Algorithm 5.51). The hash function H generates from the message M a random 2n-dimensional vector m := (m1, m2), not necessarily in L_h. We then look at the vector v := (s, t) defined as:

s ≡ f ⊛ B + F ⊛ b (mod β), and
t ≡ g ⊛ B + G ⊛ b (mod β).

The lattice L_h has the rotational invariance property, namely, if (u, v) ∈ L_h, then (X^i ⊛ u, X^i ⊛ v) is also in L_h for all i = 0, 1, . . . , n – 1. More generally, if (u, v) ∈ L_h, then (w ⊛ u, w ⊛ v) ∈ L_h for any polynomial w ∈ R. In particular, since v = (s, t) ≡ B ⊛ (f, g) + b ⊛ (F, G) (mod β) and since (f, g), (F, G) ∈ L_h, it follows that v ∈ L_h. Of these two polynomials, only s is produced during the generation of NTRUSign signatures. The other is needed during signature verification and can be computed easily from s using the formula t ≡ h ⊛ s (mod β), the validity of which is established from the definition of the lattice L_h.

The vector v = (s, t) ∈ L_h is close to the message vector m = (m1, m2) in the sense that the norm ‖(m1 – s, m2 – t)‖ is small, the bound depending on the constant c chosen earlier (see Hoffstein et al. [128] for a proof of this relation). The verification routine can, therefore, be designed as in Algorithm 5.52.

Algorithm 5.52. NTRU signature verification

Input: A signature (M, s) and the signer’s public key h.

Output: Verification status of the signature.

Steps:

Compute (m1, m2) := H(M).

Compute t ≡ h ⊛ s (mod β).

if (‖(m1 – s, m2 – t)‖ ≤ B) { Return “Signature verified”. }

else { Return “Signature not verified”. }

For the choice (n, β, c) = (251, 128, 0.45), we have ‖(m1 – s, m2 – t)‖ ≈ 216. Therefore, choosing the norm bound B slightly larger than this value (say, B = 300) allows the verification scheme to work correctly most of the time. The knowledge of the private key (f, g, F, G) allows the legitimate signer to compute the close vector (s, t) easily. On the other hand, for a forger (who lacks the private information), fast computation of a vector v′ = (s′, t′) with small norm ‖(m1 – s′, m2 – t′)‖ (say, ≤ 400 for the above parameter values) appears to be an intractable task. This is precisely why forging an NTRUSign signature is considered infeasible.

An exhaustive search can be mounted for generating a valid signature (s′, t′) on a message M with H(M) = (m1, m2). More precisely, a forger fixes half of the 2n coefficients of the polynomials s′ and t′ and then tries to solve t′ ≡ h ⊛ s′ (mod β) for the remaining half such that the norm ‖(m1 – s′, m2 – t′)‖ is small. It is estimated (see Hoffstein et al. [128] for the details) that the probability that a random guess for the unknown half succeeds is very low (≤ 2^–178.44 for the given parameter values).

Another attack on the NTRUSign scheme is to determine the polynomials f, g from a knowledge of h. Since (f, g) is a short non-zero vector in the lattice Lh, an algorithm that can find such vectors can determine (f, g) (or a rotated version of it). However, for a proper choice of the parameters such an algorithm is deemed infeasible. (Also see the NTRU encryption scheme in Section 5.2.8.)

Like the NTRU encryption scheme, the NTRUSign scheme is fast: both signature generation and verification can be carried out in time O(n^2). This is one of the main reasons why the NTRUSign scheme deserves popularity. Indeed, it may be adopted as an IEEE standard. Unfortunately, however, several attacks on NTRUSign are known. Gentry and Szydlo [111] indicate the possibility of extending the attacks of Gentry et al. [110]. Nguyen [217] proposes a more concrete attack on NTRUSign that is capable of recovering the private key from only 400 signatures. The future of NTRUSign and its modifications remains uncertain.

5.4.10. Blind Signature Schemes

Suppose that an entity (Alice), referred to as the sender or the user, wants to get a message M signed by a second entity (Bob), called the signer, without revealing M to Bob. This can be achieved as follows. First, Alice transforms the message M to M̃ := f(M) and sends M̃ to Bob. Bob generates the signature (M̃, σ) on M̃ and sends this pair back to Alice. Finally, Alice applies a second transform g to σ to generate the signature s of Bob on M. The transform f hides the actual message M from Bob and, thereby, disallows Bob from associating Alice with the signed message (M, s). Such a signature scheme is called a blind signature scheme.

Blind signatures are widely used in electronic payment systems in which Alice (a customer) wants the signature of Bob (the bank) on an electronic coin, but does not want the bank to be capable of associating Alice with the coin. In this way, Alice achieves anonymity while spending an electronic coin.

In a blind signature scheme, Bob does not know M, but his signature on M̃ is essential for Alice to reconstruct the signature on M. Furthermore, the blind signature on M should not allow Alice to compute a blind signature on another message M′. More generally, Alice should not be able to generate l + 1 (or more) blind signatures with only l (or fewer) interactions with Bob. A forgery of this kind is often called an (l, l + 1) forgery, or a one-more forgery (in case l is bounded above by a polynomial in the security parameter), or a strong one-more forgery (in case l is bounded above poly-logarithmically in the security parameter). An (l, l + 1) forgery is mountable on a scheme which is not existentially unforgeable (Exercises 5.15 and 5.19). Usually, existential forgery gives forged signatures on messages over which the forger has no (or little) control (that is, on messages which are likely to be meaningless).

Now, we describe some common blind signature schemes. We provide a brief overview of the algorithms. Detailed analysis of the security of these schemes can be found in the references cited at the end of this chapter.

Chaum’s RSA blind signature protocol

Chaum’s blind signature protocol is based on the intractability of the RSAP (or the IFP). The signer generates two (distinct) large random primes p and q and computes n := pq. He then chooses a random integer e with gcd(e, φ(n)) = 1 and computes an integer d such that ed ≡ 1 (mod φ(n)). The public key (of the signer) is the pair (n, e), whereas the private key is d. Chaum’s protocol works as in Algorithm 5.53.

Algorithm 5.53. Chaum’s RSA blind signature

Input: A message M generated by Alice.

Output: Bob’s blind RSA signature (M, s) on M.

Steps:

Alice hashes the message M to m := H(M) ∈ ℤ_n.

Alice chooses a random ρ ∈ ℤ_n* and computes m̃ := ρ^e m (mod n).

Alice sends m̃ to Bob.

Bob generates the signature σ := m̃^d (mod n) on m̃.

Bob sends σ to Alice.

Alice computes Bob’s (blind) signature s := ρ^–1 σ (mod n) on M.

Since σ ≡ (ρ^e m)^d ≡ ρ m^d (mod n), we have s ≡ ρ^–1 σ ≡ m^d (mod n), that is, s is indeed the RSA signature of Bob on M. Bob receives only m̃ ≡ ρ^e m (mod n) and gains no idea about m, since ρ is randomly and secretly chosen by Alice.
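The blinding and unblinding steps can be traced in a few lines of Python. The small RSA primes and the public exponent e = 17 below are illustrative assumptions only; modular inverses via pow(x, -1, m) require Python 3.8 or later.

```python
import hashlib
import math
import secrets

# Toy RSA parameters (illustrative only; real moduli are >= 1024 bits).
p_, q_ = 1009, 1013
n = p_ * q_
phi = (p_ - 1) * (q_ - 1)
e = 17                                    # gcd(e, phi) = 1 for these primes
d = pow(e, -1, phi)                       # signer's private exponent

def H(M: bytes) -> int:
    return int.from_bytes(hashlib.sha1(M).digest(), 'big') % n

M = b"one electronic coin"
m = H(M)

# Alice blinds: m~ := rho^e m (mod n), for a random rho in Z_n^*.
while True:
    rho = secrets.randbelow(n - 2) + 2
    if math.gcd(rho, n) == 1:
        break
m_blind = (pow(rho, e, n) * m) % n

# Bob signs the blinded value: sigma := (m~)^d (mod n).
sigma = pow(m_blind, d, n)

# Alice unblinds: s := rho^-1 sigma (mod n), which equals m^d (mod n).
s = (pow(rho, -1, n) * sigma) % n
```

Since (ρ^e)^d ≡ ρ (mod n), the factor ρ cancels exactly once, leaving the ordinary RSA signature m^d on the hashed message.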

The Schnorr blind signature protocol

Let G be a finite multiplicative Abelian group and let g ∈ G be of order r (a large prime). We assume that computing discrete logarithms in G is an infeasible task. The key pair of the signer is denoted by (d, g^d), where the integer d, 2 ≤ d ≤ r – 1, is the private key and g^d the public key. The Schnorr blind signature protocol is described in Algorithm 5.54.

Algorithm 5.54. Schnorr blind signature

Input: A message M generated by Alice.

Output: Bob’s blind Schnorr signature (M, s, t) on M.

Steps:

Alice asks Bob to initiate a communication.

Bob chooses a random d̃ ∈ {2, . . . , r – 1} and computes u := g^d̃.

Bob sends u to Alice.

Alice selects α, β ∈ ℤ_r randomly.

Alice computes w := u g^α (g^d)^β.

Alice computes s := H(M || w) (mod r) and s̃ := s – β (mod r).

Alice sends s̃ to Bob.

Bob computes t̃ := d̃ – d s̃ (mod r).

Bob sends t̃ to Alice.

Alice computes t := t̃ + α (mod r).

It is easy to check that the output (M, s, t) of Algorithm 5.54 is a valid Schnorr signature of Bob on the message M. The session key d′ (Algorithm 5.38) for this signature is d̃ + α + dβ. Since d and d̃ are secret values of Bob, Alice must depend on Bob for the computation of t̃. The message M is never sent to Bob. Also, its hash is masked by β. This is how the protocol achieves blindness.
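One consistent way to realize the blinding, sketched under this chapter's Schnorr convention, is the following Python trace of a single protocol run. The relations w := u g^α (g^d)^β, s̃ := s – β and t := t̃ + α are our reconstruction of the masking steps, and the toy group parameters are illustrative only.

```python
import hashlib
import secrets

# Toy parameters (illustrative): p = 2r + 1, g of order r in Z_p^*.
p, r, g = 2039, 1019, 4

def H(M: bytes, elt: int) -> int:
    data = M + elt.to_bytes(2, 'big')
    return int.from_bytes(hashlib.sha1(data).digest(), 'big') % r

d = 2 + secrets.randbelow(r - 2)          # Bob's private key
y = pow(g, d, p)                          # Bob's public key g^d
M = b"blind me"

# Bob: session value d~ and commitment u := g^d~.
d_tilde = 2 + secrets.randbelow(r - 2)
u = pow(g, d_tilde, p)

# Alice: blinding factors alpha, beta; blinded commitment w.
alpha, beta = secrets.randbelow(r), secrets.randbelow(r)
w = (u * pow(g, alpha, p) * pow(y, beta, p)) % p
s = H(M, w)                               # s := H(M || w) mod r
s_tilde = (s - beta) % r                  # masked challenge, sent to Bob

# Bob: t~ := d~ - d s~ (mod r), sent back to Alice.
t_tilde = (d_tilde - d * s_tilde) % r

# Alice: unblind t := t~ + alpha (mod r); (M, s, t) is Bob's signature.
t = (t_tilde + alpha) % r

# Ordinary Schnorr verification: g^t y^s must hash back to s.
u_check = (pow(g, t, p) * pow(y, s, p)) % p
ok = (H(M, u_check) == s)
```

The final check recovers exactly the blinded commitment w, confirming that the session key behind (M, s, t) is d̃ + α + dβ.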

The Okamoto–Schnorr blind signature protocol

Okamoto’s adaptation of the Schnorr scheme is proved to be resistant to an attack by a third entity (Pointcheval and Stern [237]). As in the Schnorr scheme, we fix a (finite multiplicative Abelian) group G (in which it is difficult to compute discrete logarithms). We then choose two elements g1, g2 ∈ G of (large prime) order r. The private key of the signer now comprises a pair (d1, d2) of integers in {2, . . . , r – 1}, whereas the public key y is the group element g1^d1 g2^d2. We assume that there is a hash function H whose outputs are in ℤ_r. We identify elements of G as bit strings. The Okamoto–Schnorr blind signature protocol is explained in Algorithm 5.55.

Algorithm 5.55. Okamoto–Schnorr blind signature

Input: A message M generated by Alice.

Output: Bob’s blind signature (M, s1, s2, s3) on M.

Steps:

Alice asks Bob to initiate a communication.

Bob chooses random k1, k2 ∈ {2, . . . , r – 1} and computes u := g1^k1 g2^k2.

Bob sends u to Alice.

Alice selects α, β, γ ∈ ℤ_r randomly.

Alice computes w := u g1^α g2^β y^γ.

Alice computes s1 := H(M || w) (mod r) and s̃1 := s1 – γ (mod r).

Alice sends s̃1 to Bob.

Bob computes s̃2 := k1 – d1 s̃1 (mod r) and s̃3 := k2 – d2 s̃1 (mod r).

Bob sends s̃2 and s̃3 to Alice.

Alice computes s2 := s̃2 + α (mod r) and s3 := s̃3 + β (mod r).

An Okamoto–Schnorr signature (M, s1, s2, s3) on a message M can be verified by checking the equality s1 = H(M || u′), where u′ := g1^s2 g2^s3 y^s1. Each invocation of the protocol uses a session private key (k1, k2). Alice must depend on Bob for generating s̃2 and s̃3, because she is unaware of the private values d1, d2, k1 and k2. Alice, in an attempt to forge Bob’s blind signature, may start with a random u of her choice. But she still needs the integers d1 and d2 in order to complete the protocol. The blindness of Algorithm 5.55 stems from the fact that the message M is never sent to Bob and its hash is masked by γ.
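A full run of the protocol, with the blinding relations reconstructed in the same spirit as for the blind Schnorr scheme, can be traced as below. The toy group (p = 2039, r = 1019, generators g1 = 4 and g2 = 9, both squares modulo p and hence of order r) is an illustrative assumption.

```python
import hashlib
import secrets

# Toy parameters (illustrative): p = 2r + 1; g1, g2 of order r in Z_p^*.
p, r = 2039, 1019
g1, g2 = 4, 9

def H(M: bytes, elt: int) -> int:
    data = M + elt.to_bytes(2, 'big')
    return int.from_bytes(hashlib.sha1(data).digest(), 'big') % r

d1 = 2 + secrets.randbelow(r - 2)          # Bob's private key (d1, d2)
d2 = 2 + secrets.randbelow(r - 2)
y = (pow(g1, d1, p) * pow(g2, d2, p)) % p  # public key y = g1^d1 g2^d2
M = b"anonymous coin"

# Bob: session pair (k1, k2) and commitment u := g1^k1 g2^k2.
k1 = 2 + secrets.randbelow(r - 2)
k2 = 2 + secrets.randbelow(r - 2)
u = (pow(g1, k1, p) * pow(g2, k2, p)) % p

# Alice: blinding exponents alpha, beta, gamma; blinded commitment w.
alpha, beta, gamma = (secrets.randbelow(r) for _ in range(3))
w = (u * pow(g1, alpha, p) * pow(g2, beta, p) * pow(y, gamma, p)) % p
s1 = H(M, w)
s1_tilde = (s1 - gamma) % r                # masked challenge, sent to Bob

# Bob: responses computed with his private values.
s2_tilde = (k1 - d1 * s1_tilde) % r
s3_tilde = (k2 - d2 * s1_tilde) % r

# Alice: unblind; (M, s1, s2, s3) is the signature.
s2 = (s2_tilde + alpha) % r
s3 = (s3_tilde + beta) % r

# Verification: s1 = H(M || g1^s2 g2^s3 y^s1).
u_check = (pow(g1, s2, p) * pow(g2, s3, p) * pow(y, s1, p)) % p
ok = (H(M, u_check) == s1)
```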

5.4.11. Undeniable Signature Schemes

So far we have seen signature schemes for which any entity with a knowledge of the signer’s public key can verify the authenticity of a signature. There are, however, situations where an active participation of the signer is necessary for the verification of a signature. Moreover, during a verification interaction a signer should not be allowed to deny a legitimate signature made by him. A signature meeting these requirements is called an undeniable signature.

Undeniable signatures are typically used for messages that are too confidential or private to be given unlimited verification facility. In case of a dispute, an entity should be capable of proving a forged signature to be so and at the same time must accept the binding to his own valid signatures. So in addition to the signature generation and verification protocols, an undeniable signature scheme comes with a denial or disavowal protocol to guard against a cheating signer that is unwilling to accept his valid signature either by not taking part in the verification interaction or by responding incorrectly or by claiming a valid signature to be forged.

There are applications where undeniable signatures are useful. For example, a software vendor can use undeniable signatures to prove the authenticity of its products only to its (paying) customers (and not to everybody).

Chaum and van Antwerpen gave the first concrete realization of an undeniable signature scheme [52, 51]. It is based on the intractability of computing discrete logarithms in the group F_p*, p a prime. Gennaro et al. [109] later adapted the algorithm to design an RSA-based undeniable signature scheme. We now describe these two schemes. Rigorous studies of these schemes can be found in the original papers. See also [53, 186, 187, 102, 202, 230].

The Chaum–Van Antwerpen undeniable signature scheme

For setting up the domain parameters for Chaum–Van Antwerpen (CvA) signatures, Bob chooses a (large) prime p of the form p = 2r + 1, where r is also a prime. (Such a prime p is called a safe prime (Definition 3.5).) Bob finds a random element g ∈ F_p* of multiplicative order r, selects a random integer d ∈ {2, . . . , r – 1} and computes y := g^d (mod p). Bob publishes (p, g, y) as his public key and keeps the integer d secret as his private key. The value d^–1 (mod r) is needed during verification and can be precomputed and stored (secretly) along with d. We assume that we have a hash function H that maps messages (that is, bit strings) to elements of the subgroup of order r in F_p*. In order to generate a CvA signature on a message M, Bob carries out the steps given in Algorithm 5.56. Verification of Bob’s CvA signature by Alice involves the interaction given in Algorithm 5.57.

Algorithm 5.56. Chaum–Van Antwerpen undeniable signature generation

Input: The message M to be signed and the signer’s private key (p, d).

Output: The signature (M, s) on M.

Steps:

m := H(M).

s := m^d (mod p).

If (M, s) is a valid CvA signature, then

v ≡ (s^i y^j)^(d^–1 (mod r)) ≡ ((m^d)^i (g^d)^j)^(d^–1 (mod r)) ≡ m^i g^j ≡ v′ (mod p).

On the other hand, if s ≢ m^d (mod p), Bob can guess the element v′ with a probability of only 1/r, even under the assumption that Bob has unbounded computing resources. This means that unless the signature (M, s) is valid, it is extremely unlikely that Bob can make Alice accept the signature.

The denial protocol for the CvA scheme involves an interaction between the prover Bob and the verifier Alice, as given in Algorithm 5.58. In order to see how this denial protocol works, we note that Algorithm 5.58 essentially makes two calls of the verification protocol. First, assume that Bob executes the protocol honestly, that is, Bob follows the steps as indicated. If the signature (M, s) is a valid one, the check v1 ≡ m^i1 g^j1 (mod p) (as well as the check v2 ≡ m^i2 g^j2 (mod p)) should succeed, and Alice’s decision to accept the signature as valid is justified. On the other hand, if (M, s) is a forged signature, that is, if s ≢ m^d (mod p), then the probability that each of these checks succeeds is 1/r, as discussed before. Thus, it is extremely unlikely that a forged signature is accepted as valid by Alice. So Alice eventually computes both w1 and w2 equal to s^(i1 i2 d^–1 (mod r)) (mod p) and accepts the signature to be forged. Finally, suppose that Bob intends to deny the (purported) signature (M, s). If Bob does not fully take part in the interaction, his intention becomes clear. Otherwise, he sends v1 and/or v2 not computed according to the formulas specified. In that case, Bob succeeds in making Alice compute w1 = w2 with a probability of only 1/r. Thus, it is extremely unlikely that Bob, executing this protocol dishonestly, can successfully disavow a valid signature.

Algorithm 5.57. Chaum–Van Antwerpen undeniable signature verification

Input: A CvA signature (M, s) on a message M.

Output: Verification status of the signature.

Steps:

Alice computes m := H(M).

Alice chooses two secret random integers i, j ∈ ℤ_r.

Alice computes u := s^i y^j (mod p).

Alice sends u to Bob.

Bob computes v := u^(d^–1 (mod r)) (mod p).

Bob sends v to Alice.

Alice computes v′ := m^i g^j (mod p).

Alice accepts the signature (M, s) if and only if v = v′.
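A single round of this challenge-response interaction is easy to simulate. The toy safe prime p = 2039 (so r = 1019) and the hash-to-subgroup map (squaring a residue, an assumption of ours) are for illustration only.

```python
import hashlib
import secrets

# Toy parameters (illustrative): safe prime p = 2r + 1, g of order r.
p, r, g = 2039, 1019, 4

def H(M: bytes) -> int:
    # Hash into the order-r subgroup of Z_p^* by squaring a residue
    # (an assumed embedding; any map into the subgroup would do).
    h = int.from_bytes(hashlib.sha1(M).digest(), 'big') % p
    if h == 0:
        h = 1                              # avoid the degenerate residue
    return pow(h, 2, p)

d = 2 + secrets.randbelow(r - 2)           # Bob's private key
y = pow(g, d, p)
d_inv = pow(d, -1, r)                      # precomputed d^-1 (mod r)

# Signature generation (Algorithm 5.56).
M = b"confidential contract"
m = H(M)
s = pow(m, d, p)

# Verification interaction (Algorithm 5.57).
i, j = secrets.randbelow(r), secrets.randbelow(r)
u = (pow(s, i, p) * pow(y, j, p)) % p      # Alice's challenge
v = pow(u, d_inv, p)                       # Bob's response u^(d^-1 mod r)
v_check = (pow(m, i, p) * pow(g, j, p)) % p
ok = (v == v_check)
```

Because every factor of u lies in the subgroup of order r, raising to d^–1 (mod r) undoes the exponent d exactly, so Bob's response matches m^i g^j for a genuine signature.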

Algorithm 5.58. Chaum–Van Antwerpen undeniable signature: denial protocol

Input: A (purported) CvA signature (M, s) of Bob on a message M.

Output: One of the following decisions by Alice:

  1. The signature is valid.

  2. The signature is forged.

  3. Bob is trying to deny the signature.

Steps:

Alice computes m := H(M).

Alice chooses two secret random integers i1, .

Alice computes u1 := si1 yj1 (mod p) and sends u1 to Bob.

Bob computes (mod p) and sends v1 to Alice.

if (v1 ≡ mi1 gj1 (mod p)) {
   Alice accepts the signature (Msto be valid and quits the protocol.
}

Alice chooses two other secret random integers i2, .

Alice computes u2 := si2 yj2 (mod p) and sends u2 to Bob.

Bob computes and sends v2 to Alice.

if (v2 ≡ mi2 gj2 (mod p)) {
   Alice concludes the signature (Msto be valid and quits the protocol.
}

Alice computes w1 := (v1gj1)i2 (mod p) and w2 := (v2gj2)i1 (mod p).

if (w1 = w2) {
   Alice concludes that the signature is forged.
} else {
   Alice concludes that Bob is trying to deny the signature.
}
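The forged-signature branch of the denial protocol can likewise be checked on toy parameters. Below is a hedged Python sketch, assuming illustrative values only (r = 11, p = 23, g = 2); an honest Bob answers both challenges, both checks fail, and Alice finds w1 = w2.

```python
# Toy run of the CvA denial protocol (Algorithm 5.58) on a forged signature.
# All parameters are illustrative, not secure.
import random

p, r, g = 23, 11, 2
d = 7
y = pow(g, d, p)
d_inv = pow(d, -1, r)

m = pow(g, 3, p)               # stands for H(M)
s = pow(g, 5, p)               # forged: s != m^d (mod p)

def bob_respond(u):            # honest Bob: v = u^{d^{-1} mod r} (mod p)
    return pow(u, d_inv, p)

# First call of the verification protocol (i1 != 0, so the check must fail).
i1, j1 = random.randrange(1, r), random.randrange(1, r)
v1 = bob_respond((pow(s, i1, p) * pow(y, j1, p)) % p)
assert v1 != (pow(m, i1, p) * pow(g, j1, p)) % p

# Second call.
i2, j2 = random.randrange(1, r), random.randrange(1, r)
v2 = bob_respond((pow(s, i2, p) * pow(y, j2, p)) % p)
assert v2 != (pow(m, i2, p) * pow(g, j2, p)) % p

# Alice's final comparison: both equal s^{i1 i2 d^{-1} (mod r)} (mod p).
w1 = pow(v1 * pow(pow(g, j1, p), -1, p) % p, i2, p)
w2 = pow(v2 * pow(pow(g, j2, p), -1, p) % p, i1, p)
assert w1 == w2                # Alice concludes the signature is forged
```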

RSA-based undeniable signature scheme

Gennaro, Krawczyk and Rabin’s undeniable signature scheme (the GKR scheme) is based on the (intractability of the) RSA problem.

A GKR key pair differs from a usual RSA key pair. The signer chooses two (large) random primes p and q such that both p′ := (p − 1)/2 and q′ := (q − 1)/2 are also prime, and sets n := pq. Two integers e and d satisfying ed ≡ 1 (mod φ(n)) are then selected. Finally, one requires an element g ∈ Z_n^* with g ≠ 1, and y ≡ g^d (mod n). The public key of the signer is the tuple (n, g, y), whereas the private key is the pair (e, d). It can be shown that g need not be a random element of Z_n^*. Choosing a (fixed) small value of g (for example, g = 2) does not affect the security of the GKR protocol, but makes certain operations (computing powers of g) efficient.

Algorithm 5.59. GKR RSA undeniable signature generation

Input: The message M to be signed and the signer’s private key (e, d).

Output: The signature (M, s) on M.

Steps:

m := H(M).                  /* Hash the message M to an element m of Z_n^* */
s := m^d (mod n).

GKR signature generation (Algorithm 5.59) is the same as in RSA. The verification protocol described in Algorithm 5.60 accepts, in addition to a valid GKR signature (M, s), the signatures (M, αs), where α ∈ Z_n^* has multiplicative order 1 or 2 (there are four such values of α). In view of this, we define the subset

Sig M := {αH(M)^d (mod n) | α ∈ Z_n^*, ord_n α ≤ 2}

of Z_n^*. Any element s ∈ Sig M is considered to be a valid signature on M. Since Bob knows p and q, he can easily find out all the elements α of Z_n^* of order ≤ 2 and can choose to output (M, αH(M)^d) as the GKR signature for any such α. Taking α = 1 (as in Algorithm 5.59) is the canonical choice, but during the execution of the denial protocol Bob will not be allowed to disavow the other valid choices.

The interaction between the prover Bob and the verifier Alice during GKR signature verification is given in Algorithm 5.60. It is easy to see that if (M, s) is a valid GKR signature, then v = v′. On the other hand, if (M, s) is a forged signature, that is, if s ∉ Sig M, then the equality v = v′ occurs with a probability of at most 1/p′ (where p′ ≤ q′; see Exercise 5.23), even in the case that the forger has unbounded computational resources.

Algorithm 5.60. GKR RSA undeniable signature verification

Input: A GKR signature (M, s) on a message M.

Output: Verification status of the signature.

Steps:

Alice computes m := H(M).

Alice chooses random i, j ∈ Z_n.

Alice computes u := s^{2i} y^j (mod n).

Alice sends u to Bob.

Bob computes v := u^e (mod n).

Bob sends v to Alice.

Alice computes v′ := m^{2i} g^j (mod n).

Alice accepts the signature (M, s) if and only if v = v′.
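The GKR key set-up and verification can be exercised end to end on toy parameters. The sketch below assumes illustrative values only (p = 7 and q = 11, so p′ = 3 and q′ = 5); it also shows a forged signature being rejected.

```python
# Toy GKR key generation and verification (Algorithms 5.59 and 5.60).
# All parameters are illustrative, far too small for real use.
import random

p, q = 7, 11
n = p * q                       # n = 77
phi = (p - 1) * (q - 1)         # 60
e = 7
d = pow(e, -1, phi)             # d = 43, so ed = 1 (mod phi(n))
g = 2
y = pow(g, d, n)                # public key component y = g^d (mod n)

m = 3                           # stands for H(M)
s = pow(m, d, n)                # GKR signature s = m^d (mod n)

# i is kept in 1..14 so that the forged check below fails
# deterministically in this tiny group.
i, j = random.randrange(1, 15), random.randrange(n)
u = (pow(s, 2 * i, n) * pow(y, j, n)) % n   # Alice's challenge
v = pow(u, e, n)                            # honest Bob's response
v_prime = (pow(m, 2 * i, n) * pow(g, j, n)) % n
assert v == v_prime                         # valid signature accepted

s_forged = 2                                # not of the form alpha * m^d
u = (pow(s_forged, 2 * i, n) * pow(y, j, n)) % n
v = pow(u, e, n)
assert v != v_prime                         # forged signature rejected
```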

Algorithm 5.61. GKR RSA undeniable signature: denial protocol

Input: A (purported) GKR signature (M, s) of Bob on a message M.

Output: One of the following decisions by Alice:

  1. The signature is forged.

  2. Bob is trying to deny the signature.

Steps:

Alice computes m := H(M).

Alice chooses random i ∈ {1, . . . , k} and j ∈ Z_n.

Alice computes w1 := m^i g^j (mod n) and w2 := s^i y^j (mod n).

Alice sends (w1, w2) to Bob.

Bob computes m := H(M).

Bob determines i′ ∈ {1, . . . , k} such that the following congruence holds:

Equation 5.11

(m^{−1}s^e)^{i′} ≡ w1^{−1}w2^e (mod n)

if (no such i′ is found) {    /* This may happen if Alice has cheated */
   Bob aborts the protocol.
}
Bob sends i′ to Alice.
if (i = i′) {
   Alice concludes that the signature is forged.
} else {
   Alice concludes that Bob is trying to deny the signature.
}

The denial protocol for the GKR scheme is described in Algorithm 5.61. This protocol is executed after verification by Algorithm 5.60 fails. In that case, Alice wants to ascertain whether the signature is actually invalid or whether Bob has denied his valid signature by incorrectly executing the verification protocol. A small integer k is predetermined for the denial protocol. The prover needs a running time proportional to k, whereas the probability of a successful denial of a valid signature decreases with k. Taking k = O(lg n) gives optimal performance.

In order to see how this protocol prevents Bob from denying a valid signature, first consider the case that (M, s) is a valid GKR signature of Bob. In that case, s ≡ αm^d (mod n) for some α ∈ Z_n^* of order ≤ 2. On the other hand, s^e ≡ α^e m^{de} ≡ α^e m (mod n), so that m^{−1}s^e ≡ α^e (mod n) is again of order ≤ 2. Therefore, Congruence (5.11) holds for every i′ ∈ {1, . . . , k} with α^{ei′} ≡ α^{ei} (mod n), that is, for many values of i′. Thus, Bob can only guess the secret value of i chosen by Alice, and the guess is correct with a probability of 1/k. On the other hand, if (M, s) is a forged signature, Congruence (5.11) holds only for a single i′, that is, for i′ = i (Exercise 5.23). Sending this i′ will then convince Alice that the signature is really forged. In both these cases, Congruence (5.11) holds for at least one i′. Failure to detect such an i′ implies that the value(s) of w1 and/or w2 have not been correctly sent by Alice. The protocol should then be aborted.
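The forged-signature case of this argument can be traced in code. The sketch below assumes illustrative toy values (p = 7, q = 11, k = 4); in this particular toy group the element m^{−1}s^e happens to have order 30 > k, so Bob's search finds a unique i′ equal to Alice's i, and Alice correctly concludes forgery.

```python
# Toy run of the GKR denial protocol (Algorithm 5.61) on a forged signature.
# All parameters are illustrative, not secure.
import random

p, q, g, e = 7, 11, 2, 7
n = p * q                       # 77
d = pow(e, -1, (p - 1) * (q - 1))
y = pow(g, d, n)
k = 4                           # toy value; the text recommends k = O(lg n)

m = 3                           # stands for H(M)
s = 2                           # forged: s is not in Sig M

# Alice's side.
i = random.randrange(1, k + 1)
j = random.randrange(n)
w1 = (pow(m, i, n) * pow(g, j, n)) % n
w2 = (pow(s, i, n) * pow(y, j, n)) % n

# Bob's side: search for i' satisfying Congruence (5.11),
#     (m^{-1} s^e)^{i'} = w1^{-1} w2^e (mod n).
beta = (pow(m, -1, n) * pow(s, e, n)) % n
target = (pow(w1, -1, n) * pow(w2, e, n)) % n
matches = [ip for ip in range(1, k + 1) if pow(beta, ip, n) == target]

# Here ord_n(beta) = 30 > k, so the matching i' is unique and equals i.
assert matches == [i]
```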

In order to reduce the probability of successful cheating, it is convenient to repeat the protocol a few times instead of increasing k. If k = 1024, Bob can successfully cheat in eight executions of the denial protocol with a probability of only (1/1024)^8 = 2^{−80}.

5.4.12. Signcryption

The conventional way to ensure both authentication and confidentiality of a message is to sign the message first and then encrypt the signed message. Now that we have many signature and encryption algorithms in our bag, there is hardly any problem in achieving both the goals simultaneously. Zheng proposes signcryption schemes that combine these two operations together. A signcryption scheme is better than a sign-and-encrypt scheme in two aspects. First, the combined primitive takes less running time than the composite primitive comprising signature generation followed by encryption. Second, a signcrypted message is of smaller size than a signed-and-encrypted message. When communication overheads need to be minimized, signcryption proves to be useful.

Before describing the signcryption primitive, let us first review the composite sign-and-encrypt scheme. Let M be the message to be sent. Alice, the sender, generates the signature appendix s on M using one of the signature schemes described earlier. This step can be described as s = fs(M, da), where da is the private key of Alice. Next, a symmetric key k is generated by Alice. The message M is encrypted by a symmetric cipher (like DES) under the key k, that is, C := E(M, k). The key k is then encrypted using an asymmetric routine under the public key eb of Bob, the recipient, that is, c = fe(k, eb). The triple (C, c, s) is then transmitted to Bob.

Upon reception of (C, c, s) Bob first retrieves k using his private key db, that is, k = fd(c, db). The message M is then recovered by symmetric decryption: M = D(C, k). Finally, the authenticity of M is verified from the signature using the verification operation: fv(M, s, ea), where ea is the public key of Alice. Algorithm 5.62 describes the sign-and-encrypt operation and its inverse.

Algorithm 5.62. Sign-and-encrypt

s := fs(M, da).

Generate a random symmetric key k.

c := fe(k, eb).

C := E(M, k).

Send (C, c, s) to the recipient.

Decrypt-and-verify

k := fd(c, db).

M := D(C, k).

Verify the signature: fv(M, s, ea).
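The sign-and-encrypt/decrypt-and-verify pair above can be sketched concretely. The following Python sketch uses textbook RSA in the roles of fs/fv and fe/fd, SHA-256 as the hash, and a toy XOR stream standing in for the symmetric cipher E/D; all key sizes and helper names are illustrative assumptions, not a real implementation.

```python
# A minimal sketch of Algorithm 5.62 (sign-and-encrypt, decrypt-and-verify).
# Textbook RSA and an XOR "cipher" are stand-ins; nothing here is secure.
import hashlib
import random

def rsa_keys(p, q, e=65537):
    phi = (p - 1) * (q - 1)
    return p * q, e, pow(e, -1, phi)

na, ea, da = rsa_keys(104729, 104723)   # Alice (signer)
nb, eb, db = rsa_keys(100003, 100019)   # Bob (recipient)

def xor_cipher(data: bytes, key: bytes) -> bytes:
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

M = b"meet me at noon"

# --- Sign-and-encrypt (Alice) ---
h = int.from_bytes(hashlib.sha256(M).digest(), "big") % na
s = pow(h, da, na)                      # s := fs(M, da)
k = random.randrange(2, nb)             # random symmetric key
c = pow(k, eb, nb)                      # c := fe(k, eb)
C = xor_cipher(M, k.to_bytes(5, "big")) # C := E(M, k)

# --- Decrypt-and-verify (Bob) ---
k2 = pow(c, db, nb)                     # k := fd(c, db)
M2 = xor_cipher(C, k2.to_bytes(5, "big"))
assert M2 == M                          # M := D(C, k)
h2 = int.from_bytes(hashlib.sha256(M2).digest(), "big") % na
assert pow(s, ea, na) == h2             # fv(M, s, ea) succeeds
```

Note that this composite scheme needs one private-key and one public-key operation on each side, plus the transmission of c; signcryption, described next in the text, merges these steps.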

Zheng’s signcryption scheme combines fs and fe to a single operation fse and also fd and fv to another single operation fdv. Each of these combined operations essentially takes the time of a single public- or private-key operation and hence leads to a performance enhancement by a factor of nearly two. Moreover, the encrypted key c need not be sent with the message, that is, C and s are sufficient for both authentication and confidentiality. This reduces communication overhead.

Signcryption is based on shortened digital signature schemes. Table 5.3 describes the shortened versions of DSA (Section 5.4.6). We use the notations of Algorithms 5.43 and 5.44. Also, ‖ denotes concatenation of strings, and H is a hash function (like SHA-1). The shortened schemes have two advantages over the original DSA. First, a DSA signature is of length 2|r|, whereas an SDSA1 or SDSA2 signature has length |r| + |H(·)|. For the current version of the standard, both r and H(·) are of size 160 bits. However, one may use a potentially bigger r, and in that case the shortened schemes give smaller signatures with equivalent security. Second, DSA requires computing a modular inverse during verification, whereas SDSA does not, so verification is more efficient in the shortened schemes.

Table 5.3. Shortened digital signature algorithms
Name    Signature generation                  Signature verification
SDSA1   s := H(g^{d′} (mod p) ‖ M).           w := (ea g^s)^t (mod p).
        t := d′(s + d)^{−1} (mod r).          Verify if s = H(w ‖ M).
SDSA2   s := H(g^{d′} (mod p) ‖ M).           w := (g ea^s)^t (mod p).
        t := d′(1 + ds)^{−1} (mod r).         Verify if s = H(w ‖ M).
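A round trip through SDSA1 can be sketched as follows. SHA-256 plays the role of H, the integer-to-bytes encoding of group elements is an assumption of this sketch, and the group parameters (r = 509 prime, p = 2r + 1 = 1019, g = 4 of order r) are illustrative toys.

```python
# A sketch of SDSA1 generation and verification from Table 5.3.
# All parameters are illustrative, not secure.
import hashlib
import random

p, r, g = 1019, 509, 4          # g generates the order-r subgroup of Z_p^*

def H(x: int, M: bytes) -> int:
    return int.from_bytes(hashlib.sha256(str(x).encode() + M).digest(), "big")

d = 123                         # signer's private key
ea = pow(g, d, p)               # signer's public key e_a = g^d (mod p)
M = b"hello"

# Generation: s := H(g^{d'} mod p || M), t := d'(s + d)^{-1} (mod r).
while True:
    d1 = random.randrange(1, r)              # per-message secret d'
    s = H(pow(g, d1, p), M)
    if (s + d) % r != 0:                     # (s + d) must be invertible mod r
        break
t = d1 * pow((s + d) % r, -1, r) % r

# Verification: w := (e_a g^s)^t (mod p), then check s = H(w || M).
w = pow(ea * pow(g, s, p) % p, t, p)
assert w == pow(g, d1, p)       # w recovers g^{d'}
assert s == H(w, M)             # signature verifies
```

The verification needs no modular inverse, as claimed in the text: (ea g^s)^t = g^{(d+s)t} = g^{d′} (mod p).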

Algorithms 5.63 and 5.64 provide the details of the signcryption algorithm and its inverse, called unsigncryption. The algorithms use a keyed hash function KH. One may implement KH(x, k1) as H(x ‖ k1) using an unkeyed hash function H.

Signcryption differs from the shortened scheme in that eb^{d′} (mod p) is used instead of g^{d′} (mod p) for the computation of s. The running time of the signcryption algorithm is dominated by this modular exponentiation. When signature and encryption are used separately, the encryption operation uses one (or more) additional exponentiations. So signcryption significantly improves upon the sign-and-encrypt scheme of Algorithm 5.62.

Algorithm 5.63. Signcryption

Input: Plaintext message M, the sender’s private key da, the recipient’s public key

eb = gdb (mod p).

Output: The signcrypted message (C, s, t).

Steps:

Select a random d′ ∈ Z_r.
k := H(eb^{d′} (mod p)).                /* Generate keys for both signing and encrypting. */
Write k := k1 ‖ k2 with |k2| equal to the length of a symmetric key.
s := KH(M ‖ N, k1).
                 /* Here N is the public key or the public key certificate of the sender. */
t := d′(s + da)^{−1} (mod r).

C := E(M, k2).                                                          /* Symmetric encryption */

Algorithm 5.64. Unsigncryption

Input: The signcrypted message (C, s, t), the sender’s public key ea = gda (mod p) and the recipient’s private key db.

Output: The plaintext message M and the verification status of the signature.

Steps:

k := H((ea g^s)^{t db} (mod p)).        /* Key recovery: (ea g^s)^t ≡ g^{d′} (mod p) */

Write k := k1 ‖ k2 with |k2| equal to the length of a symmetric key.

M := D(C, k2).                          /* Symmetric decryption */

if (KH(M ‖ N, k1) = s) { Return “Signature verified”. }

else { Return “Signature not verified”. }

The most time-consuming part of unsigncryption is the computation of two modular exponentiations. DSA verification too has this property. However, an additional decryption in the decrypt-and-verify scheme of Algorithm 5.62 calls for one (or more) exponentiations, making it slower than unsigncryption.
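Algorithms 5.63 and 5.64 can be exercised end to end in a toy setting. The sketch below assumes SHA-256 for H, KH(x, k1) implemented as H(x ‖ k1), a toy XOR stream for E/D, and tiny illustrative group parameters; the encoding of group elements as bytes and the key split at 16 bytes are likewise assumptions of this sketch.

```python
# End-to-end sketch of signcryption/unsigncryption (Algorithms 5.63, 5.64),
# built on the SDSA1 construction.  Nothing here is secure.
import hashlib
import random

p, r, g = 1019, 509, 4          # subgroup of prime order r in Z_p^*

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def xor_cipher(data: bytes, key: bytes) -> bytes:
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

da = 123; ea = pow(g, da, p)    # Alice, the sender
db = 456; eb = pow(g, db, p)    # Bob, the recipient
N = b"alice-certificate"        # sender's public-key certificate (placeholder)
M = b"attack at dawn"

# --- Signcryption (Alice) ---
while True:
    d1 = random.randrange(1, r)                  # the random d'
    k = H(str(pow(eb, d1, p)).encode())          # k := H(e_b^{d'} mod p)
    k1, k2 = k[:16], k[16:]                      # signing key / encrypting key
    s = int.from_bytes(H(M + N + k1), "big")     # s := KH(M || N, k1)
    if (s + da) % r != 0:                        # (s + da) must be invertible mod r
        break
t = d1 * pow((s + da) % r, -1, r) % r            # t := d'(s + da)^{-1} (mod r)
C = xor_cipher(M, k2)                            # C := E(M, k2)

# --- Unsigncryption (Bob), on receiving (C, s, t) ---
u = pow(ea * pow(g, s, p) % p, t, p)             # (e_a g^s)^t = g^{d'} (mod p)
k_rec = H(str(pow(u, db, p)).encode())           # raise to db: recovers e_b^{d'}
k1r, k2r = k_rec[:16], k_rec[16:]
M_rec = xor_cipher(C, k2r)
assert M_rec == M                                # message recovered
assert int.from_bytes(H(M_rec + N + k1r), "big") == s   # signature verified
```

Note how Bob never sees d′ or an encrypted key c: the pair (s, t) alone lets him reconstruct g^{d′} and hence the shared key, which is the communication saving claimed for signcryption.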

Exercise Set 5.4

5.15
  1. Show how first pre-image resistance of the hash function H plays an important role for RSA signatures (with appendix) described in Section 5.4.1. More precisely, show that if it is easy to find a pre-image of any hash value, it is easy to generate a valid signature (M, s) from two valid signatures (M1, s1) and (M2, s2) with M ∉ {M1, M2}. This is often referred to as existential forgery of a signature. [H]

  2. Describe how existential forgery is possible for the Rabin signature scheme. [H]

  3. Describe how existential forgery is possible for the ElGamal signature scheme. [H]

5.16Assume that Bob uses the same RSA key pair ((n, e), d) for receiving encrypted messages and for signing. Suppose that Carol intercepts the ciphertext c ≡ m^e (mod n) sent by Alice. Also suppose that Bob is willing to sign any random message presented by Carol. Explain how Carol can choose a message to be signed by Bob in order to retrieve the secret m. [H]
5.17Let G be a finite cyclic group of order n, and g a generator of G. Suppose that Alice’s private and public keys are respectively d and gd.
  1. Consider a variant of the ElGamal signature scheme, in which s is computed as in Algorithm 5.36, but the roles of d and d′ are interchanged in the generation of t, that is, the modified signature (s, t′) on M is generated as:

    s := g^{d′},
    t′ := d^{−1}[H(M) − d′H(s)] (mod n).

    Write the verification routine for the modified scheme.

  2. Show that forging modified ElGamal signatures is as difficult as computing discrete logarithms in G. You may assume that a forger can arrange d′ of her choice.

  3. Explain why signature generation is (a bit) more efficient in the modified scheme. Suppose that because of this enhanced performance Alice decided to switch to the modified scheme, but for backward compatibility she maintained both the original signature (s, t) and the modified signature (s, t′) on a message M. What went wrong?

5.18Show that:
  1. There are two valid ECDSA signatures on each message.

  2. There are three valid XTR–DSA signatures on each message.

(Here we call a signature valid, if it passes the verification routine.)

5.19
  1. Write the versions with message recovery of the RSA, Rabin, Schnorr and Nyberg–Rueppel signature schemes.

  2. Describe the possibilities of existential forgery for these versions. (Since hash functions cannot be inverted, they are not used for signature schemes with message recovery, and so the problem of existential forgery is more acute in this case. To avoid such forgeries the signer should add some redundancy to each message block before signing the same. An existentially forged signature is likely to correspond to a message not containing the redundancy.)

5.20Design the XTR version of the Nyberg–Rueppel signature scheme with appendix (Section 5.4.5). What are the speed-ups achieved by the signature generation and verification routines of the XTR version over the original NR routines?
5.21Repeat Exercise 5.20 with the Schnorr digital signature scheme (Section 5.4.4).
5.22
  1. Deduce that the determinant of the matrix Mc of Equation (5.9) is

  2. Demonstrate that

5.23Let p, q, p′, q′ be distinct odd primes with p = 2p′ + 1 and q = 2q′ + 1, and let n := pq (as in the RSA-based undeniable signature scheme).
  1. Let . Show that . [H]

  2. Argue that there are exactly four elements in Z_n^* of order ≤ 2.

  3. Let α ≢ ±1 (mod n) and ord_n α < p′q′. Show that gcd(α − 1, n) or gcd(α + 1, n) is a non-trivial divisor of n. How many such elements α does Z_n^* contain?

  4. Let have order pq′ or 2pq′. Show that for every .

  5. Look at the denial protocol for the GKR RSA signature scheme (Algorithm 5.61) and assume that p′ < q′. Suppose that (M, s) is a forged signature (that is, s ∉ Sig M) on some message M with s ∈ Z_n^*. Show that s ≡ αm^d (mod n) for some α ∈ Z_n^* with ord_n α ≥ p′. Deduce that ord_n(m^{−1}s^e) ≥ p′. Conclude that if 4k < p′, then there exists a unique i′ ∈ {1, . . . , k} (namely, i′ = i) for which Congruence (5.11) holds.

5.24
  1. Write the shortened versions of ECDSA signature generation and verification.

  2. Write the signcryption and unsigncryption algorithms based on shortened ECDSA.

5.5. Entity Authentication

Entity authentication (also called identification) is a process by means of which an entity Alice, called the claimant, proves her identity to another entity Bob, called the verifier. Alice is assumed to possess some secret piece(s) of information that no intruder is expected to know. During the execution of the identification protocol, an interaction takes place between Alice and Bob. If the interaction allows Bob to conclude (deterministically or with high probability) that the claimant possesses the secret knowledge, he accepts the claimant as Alice. An intruder Carol lacking the secret information is expected (with high probability) to fail to convince Bob of her identity as Alice. This is how entity authentication schemes prevent impersonation attacks by intruders. Typically, identification schemes are used to protect access to some sensitive piece(s) of data, like a user’s (or a group’s) private files in a computer or an account in a bank. Both secret-key and public-key techniques are used for the realization of entity authentication protocols.

5.5.1. Passwords

A password is a small string to be remembered by an entity and produced verbatim to the verifier at the time of identification. The most common example is a computer password used to protect access to a user’s private working area in a file system. In this case, an alphanumeric string (or a string that can be input using a computer keyboard) of length between 4 and 20 characters is normally used as the secret information associated with an entity. Passwords are also used to prevent misuse of certain physical objects (like an ATM card for withdrawing cash from one’s bank account, a prepaid telephone card) by anybody other than the legitimate owners of the objects. In this case, a password usually consists of a sequence of four to ten digits and is also called a personal identification number or a PIN.

In order that Bob can recognize an entity from her password, a possibility for Bob is to store the (entity, password) pairs corresponding to all the entities that are expected to participate in identification interactions with Bob. When Alice enters her password, Bob checks if Alice’s input is the same as what he stores in the pair for Alice. The file(s) storing these private records should be preserved with high secrecy, and neither read nor write access should be granted to any user. But a privileged user (the superuser) is usually given the capability to inspect any file (even read-protected ones) and can, therefore, misuse the passwords.

This problem can be avoided by storing, instead of the passwords themselves, a one-way transform of the passwords.[3] When Alice enters a password P, Bob computes the transform f(P) and compares f(P) with the record stored for Alice. The identity of Alice is accepted if and only if a match occurs. The password file now need not be read-protected, since any intruder (even the superuser) knowing the value f(P) cannot easily compute P.

[3] Informally speaking, a one-way function is one which is computationally infeasible to invert.

Passwords should be chosen from a space large enough to preclude exhaustive search by an intruder in feasible time. Unfortunately, however, it is a common tendency for human users to choose passwords from limited subsets of the allowed space. For example, use of lower-case characters, dictionary words, popular names, birth dates and so on in passwords makes attacks on passwords much easier. A strategy to foil such dictionary-based attacks is to use a pseudorandom bit sequence S known as the salt and apply the one-way function f to a combination of the password P and the salt S. That is, a function f(P, S) is now stored against an entity Alice having a password P. The combination (P, S) is often referred to as a key for the password scheme. Since a password now corresponds to many possible keys, the search space for an intruder increases dramatically. For instance, if S is a pseudorandomly chosen bit string of length 64, the intruder has to compute f(P, S) up to 2^64 times in order to hit the correct candidate for S for each P under trial. It is also necessary that the same key is not chosen for two different entities. If the salt S is a 64-bit string, then by the birthday paradox a collision between two keys is expected to occur only after (at least) 2^32 keys are generated.

A second strategy to strengthen the protection of passwords is to increase the so-called iteration count n, that is, instead of storing f(P, S) for each password P, Bob now stores f^n(P, S), the n-fold composition of f. An n-fold application of the function f increases by a factor of n both the time for password verification and the time for exhaustive search by an intruder. For a legitimate user, this is not really a nuisance, since computing f^n(P, S) only once during identification is tolerable (and may even be unnoticeable), whereas to an intruder breaking a password simply becomes n times as difficult. In typical applications, values of n ≥ 1000 are recommended.
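The salt-and-iteration idea sketched above is exactly what the standard PBKDF2 construction provides; Python's `hashlib.pbkdf2_hmac` is one readily available realization. The record layout below (salt, count, transformed password) is an assumption of this sketch.

```python
# Salted, iterated password verification: the verifier stores the salt S,
# the iteration count n and f^n(P, S), never the password P itself.
import hashlib
import hmac
import os

def enroll(P: bytes, n: int = 10_000):
    S = os.urandom(8)                          # 64-bit pseudorandom salt
    fnPS = hashlib.pbkdf2_hmac("sha256", P, S, n)
    return (S, n, fnPS)                        # record stored by the verifier

def verify(P: bytes, record) -> bool:
    S, n, fnPS = record
    candidate = hashlib.pbkdf2_hmac("sha256", P, S, n)
    return hmac.compare_digest(candidate, fnPS)   # constant-time comparison

record = enroll(b"correct horse")
assert verify(b"correct horse", record)
assert not verify(b"Tr0ub4dor&3", record)
```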

In some situations, it is advisable to lock access to a password-protected area after a predetermined number of (say, three) wrong passwords have been input in succession. This is typically the case with PINs for which the search space is rather small. For unlocking the access (to the legitimate user Alice), a second longer key (again known only to Alice) is used or human intervention is called for.

As a case study, let us briefly describe the password scheme used by the UNIX operating system. During the creation of a password, a user supplies a string P of eight 7-bit ASCII characters as the password. (Longer strings are truncated to the first eight characters.) A 56-bit DES[4] key K is constructed from P. A 12-bit random salt S is obtained from the system clock at the time of the creation of the password. The zero message (that is, a block of 64 zero bits) is then iteratively encrypted n = 25 times using K as the key. The encryption algorithm is a variant of DES that depends on the salt S. The output ciphertext and the salt (which account for a total of 64 + 12 = 76 bits) are then packed into eleven 7-bit ASCII characters and stored in the password file (usually /etc/passwd). When UNIX was designed (in 1970), this algorithm, often referred to as the UNIX crypt password algorithm, was considered to be reasonably safe under the assumption of the difficulty of finding a DES key from a plaintext–ciphertext pair. With today’s hardware and software speed, a motivated attacker can break UNIX passwords in very little time.

[4] The data encryption standard (DES) is a well-known symmetric-key cipher (Section A.2.1).

Password-based authentication schemes suffer from the disadvantage that the user has to disclose her secret P to the verifier. The verifier may misuse the knowledge of P by storing it secretly and deploying it afterwards. During the computation of f^n(P, S), the string P resides in the machine’s memory. An eavesdropper capable of monitoring the temporary storage holding the string P easily gets its value. In view of these shortcomings, password schemes are referred to as weak authentication schemes.

5.5.2. Challenge–Response Algorithms

In a strong authentication scheme, the claimant proves the possession of a secret knowledge to a verifier without disclosing the secret to the verifier. One of the communicating entities generates a random bit string c known as the challenge and sends c (or a function of c) to the other. The latter then reacts to the challenge appropriately, for example, by sending a response string r to the former. Strong authentication schemes are, therefore, also called challenge–response authentication schemes. The communication between the entities depends both on the random challenge and on the secret knowledge of the claimant. An intruder lacking the secret knowledge of a valid claimant cannot take part properly in the interaction. Furthermore, since a random challenge is used during each invocation of the identification protocol, an eavesdropper cannot use the intercepted transcripts of a particular session for a future invocation of the protocol.

Public-key protocols can be used to realize challenge–response schemes. We assume that Alice is the claimant and Bob is the verifier. Without committing to specific algorithms, we denote the public and private keys of Alice by e and d, and the encryption and decryption transforms by fe and fd respectively. Alice proves her identity by demonstrating her knowledge of d (but without revealing d) to Bob. Bob uses the transform fe and Alice the transform fd under the respective keys e and d. If a key d′ other than d is used by Carol in conjunction with e, some step of the interaction detects this and the protocol rejects Carol’s claim to be Alice. We describe two challenge–response schemes that differ in the sequence of applying the transforms fe and fd.

A challenge–response scheme based on encryption–decryption

In this scheme, Bob (the verifier) first generates a random string r, encrypts the same by the public key of Alice (the claimant) and sends the ciphertext c (the challenge) to Alice. Alice uses her private key to decrypt c to the message r′ and sends r′ (the response) back to Bob. Identification of Alice succeeds if and only if r = r′. Algorithm 5.65 illustrates the details of this scheme. It employs a one-way function H (like a hash function) for a reason explained later. This scheme checks whether the claimant can recover the random string r correctly. A knowledge of the decryption key d is needed for that.

Algorithm 5.65. Challenge–response authentication based on encryption

Bob generates a random bit string r and computes w := H(r).

Bob reads Alice’s (authentic) public key e and computes c := fe(r, e).

Bob sends (w, c) to Alice.

Alice computes r′ := fd(c, d).

if (H(r′) ≠ w) { Alice quits the protocol. }

Alice sends r′ to Bob.

Bob identifies Alice if and only if r′ = r.
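A toy run of this protocol can be sketched with textbook RSA as (fe, fd) and SHA-256 as the one-way function H; the parameters are illustrative assumptions, not a secure instantiation.

```python
# Toy run of challenge-response authentication based on encryption
# (Algorithm 5.65).  Textbook RSA is a stand-in; nothing here is secure.
import hashlib
import random

p, q, e = 104729, 104723, 65537
n = p * q
d = pow(e, -1, (p - 1) * (q - 1))       # Alice's private key

def H(x: int) -> bytes:
    return hashlib.sha256(str(x).encode()).digest()

# Bob's side: random string, witness and challenge.
r = random.randrange(2, n)
w = H(r)                                # witness
c = pow(r, e, n)                        # challenge c := fe(r, e)

# Alice's side: decrypt, check the witness, then respond.
r1 = pow(c, d, n)                       # r' := fd(c, d)
assert H(r1) == w                       # otherwise Alice quits the protocol
# Alice sends r' to Bob.

assert r1 == r                          # Bob identifies Alice
```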

The string H(r) = w is called the witness. By sending w to Alice, Bob convinces her of his knowledge about the secret r without disclosing r itself. If Bob (or a third party pretending to be Bob) tries to cheat, Alice has the option to abort the protocol prematurely. In other words, Alice does not have to decrypt an arbitrary ciphertext presented by Bob without confirming that Bob knows the corresponding plaintext.

A challenge–response scheme based on digital signatures

In the scheme explained in Algorithm 5.66, Alice (the claimant) first does the private-key operation, that is, Alice sends her digital signature on a message to Bob (the verifier). Bob then verifies the signature of Alice by employing the encryption transform with Alice’s public key.

Algorithm 5.66. Challenge–response authentication based on signature

Bob selects a random string rB.

Bob sends rB to Alice.

Alice selects a random string rA.

Alice generates the signature s := fd(rA ‖ rB, d).

Alice sends (rA, s) to Bob.

Bob reads Alice’s (authentic) public key e.

Bob retrieves the strings r′A and r′B satisfying r′A ‖ r′B = fe(s, e).

Bob identifies Alice if and only if r′A = rA and r′B = rB.

This authentication scheme is based on the assumption that only a person knowing Alice’s private key d can generate a signature s that leads to the equalities r′A = rA and r′B = rB. Using only rA and the signature s = fd(rA, d) would demonstrate to Bob that Alice possesses the requisite knowledge of d. The random string rB is used to prevent the so-called replay attack. If rB were not used, an eavesdropper Carol intercepting the transcripts of a session could later claim her identity as Alice by simply supplying rA and Alice’s signature on rA to Bob. Using a new rB in every session (and incorporating it in the signature) guarantees that the signature varies in different sessions, even when rA remains the same.
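The signature-based protocol can also be traced on toy parameters. The sketch below assumes a textbook RSA signature with message recovery and packs rA ‖ rB as a 32-bit integer; both choices are illustrative assumptions of this sketch.

```python
# Toy run of challenge-response authentication based on a signature
# (Algorithm 5.66): s := (rA || rB)^d (mod n), recovered as s^e (mod n).
# Nothing here is secure.
import random

p, q, e = 104729, 104723, 65537
n = p * q
d = pow(e, -1, (p - 1) * (q - 1))       # Alice's private key

rB = random.randrange(2**15)            # Bob's random challenge
rA = random.randrange(2**15)            # Alice's random string
msg = (rA << 16) | rB                   # rA || rB packed into one integer
s = pow(msg, d, n)                      # Alice's signature; she sends (rA, s)

# Bob's side: recover rA' || rB' from the signature.
rec = pow(s, e, n)
rA1, rB1 = rec >> 16, rec & 0xFFFF
assert rA1 == rA and rB1 == rB          # Bob identifies Alice
```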

There is an alternative strategy by which the use of the random string rB can be avoided. All we have to ensure is that a value of rA used once cannot be reused in a subsequent session. This can be achieved by using a timestamp, which is a string reflecting the time when a certain event occurs (in our case, when Alice generates the signature). Thus, if Alice gets the local time tA, computes the signature s := fd(tA, d) and sends (tA, s) to Bob, it is sufficient for Bob to check that the timestamp tA is valid. A possible criterion for the validity of Alice’s timestamp tA is that the difference between tA and the time when Bob is verifying the signature is within an allowed bound (predetermined, based on the approximate time for the communication). But it may be possible for an adversary to provide to Bob the timestamp tA and Alice’s signature on tA, before tA expires. Therefore, Bob should additionally ensure that timestamps from Alice come in a strictly ascending order. Maintaining the timestamp for the last interaction with Alice takes care of this requirement. Algorithm 5.67 describes the modified version of Algorithm 5.66, based on timestamps. A problem with timestamps is that (local) clocks across a network have to be properly synchronized.

Algorithm 5.67. Using timestamp in challenge–response authentication

Alice reads the local time tA.

Alice generates the signature s := fd(tA, d).

Alice sends (tA, s) to Bob.

Bob reads Alice’s (authentic) public key e.

Bob retrieves the timestamp t′A := fe(s, e).

Bob identifies Alice if and only if t′A = tA and this timestamp is valid.

Mutual authentication

So far, we have described identification schemes that are unidirectional or unilateral in the sense that only Alice tries to prove her identity to Bob. For mutual authentication between Alice and Bob, the above schemes can be used a second time by reversing the roles of Alice and Bob. Algorithm 5.68 describes an alternative strategy that achieves mutual authentication with reduced communication overhead (compared to two invocations of the unidirectional scheme). Now, the key pairs (eA, dA) and (eB, dB) and the transforms fe,A, fd,A and fe,B, fd,B of both Alice and Bob should be used.

5.5.3. Zero-Knowledge Protocols

The challenge–response schemes described above ensure that the claimant’s secret is not made available to the verifier (or a listener to the communication between the verifier and the claimant). But the claimant uses her private key for generating the response and, therefore, it continues to remain possible that a verifier extracts some partial information on the secret by choosing challenges strategically.

Algorithm 5.68. Mutual authentication

Bob selects a random string rB.

Bob sends rB to Alice.

Alice selects a random string rA.

Alice generates the signature sA := fd,A(rA ‖ rB, dA).

Alice sends (rA, sA) to Bob.

Bob reads Alice’s (authentic) public key eA.

Bob retrieves the strings r′A and r′B satisfying r′A ‖ r′B = fe,A(sA, eA).

Bob identifies Alice if and only if r′A = rA and r′B = rB.

Bob generates the signature sB := fd,B(rB ‖ rA, dB).

Bob sends sB to Alice.

Alice reads Bob’s (authentic) public key eB.

Alice retrieves the strings r″B and r″A satisfying r″B ‖ r″A = fe,B(sB, eB).

Alice identifies Bob if and only if r″B = rB and r″A = rA.

Using a zero-knowledge (ZK) protocol overcomes this difficulty in the sense that (absolutely) no information on the claimant’s secret is leaked out during the conversation between the claimant and the verifier. The verifier (or a listener) remains as ignorant of the secret as he was before the invocation of the protocol. In other words, the verifier (or a listener) does not learn anything from the conversation that he could not learn by himself in the absence of the claimant. The only thing the verifier gains is confidence about whether the claimant actually knows the secret. This is intuitively the defining feature of a ZK protocol.

Similar to other public-key techniques, the security of the ZK protocols is based on the intractability of some difficult computational problems. A repeated use of a public-key scheme with a given set of parameters may degrade the security of the scheme under those parameters. For example, each encryption of a message (or each generation of a signature) makes available a plaintext–ciphertext pair which may eventually help a cryptanalyst. A ZK protocol, on the other hand, does not lead to such a degradation of the security of the protocol, irrespective of how many times it is invoked.

We stick to the usual scenario: Alice is the claimant, Bob is the verifier and Carol is an eavesdropper trying to impersonate Alice. In the jargon of ZK protocols, Alice (and not Bob) is called the prover. In order to avoid confusion, we continue to use the terms claimant and verifier. A ZK protocol is usually a three-pass interactive protocol. To start with, Alice chooses a random commitment and sends a witness of the commitment to Bob. A new commitment should be selected by Alice during each invocation of the protocol in order to guard against an adversarial verifier. Upon receiving the witness, Bob chooses and sends a random challenge to Alice. Finally, Alice replies by sending a response to the challenge. If Alice knows the secret (and performs the protocol steps correctly), her response can be easily proved by Bob to be valid. Carol, in an attempt to impersonate Alice without knowing the secret, can produce a valid response only with a probability P bounded away from 1. If P happens not to be negligibly small, then the protocol can be repeated a sufficient number of times, so that Carol’s probability of giving the correct response on all occasions becomes extremely low.

The parameters and the secrets for a ZK protocol can be set privately by each claimant. Another alternative is that a trusted third party (TTP) generates a set of parameters and makes these parameters available for use by every claimant over a network. A second duty of the TTP is to register a secret against each entity. The secret may be generated either by the TTP or by the respective entity. The knowledge of this (registered) secret by an entity is equivalent to her identity in the network. Finally, the authenticity of the public key of an entity is ensured by the digital signature of the TTP on the public key. For simplicity, however, we will not bother about the existence of the TTP and the way in which the secret (the possession of which by Alice is to be proved) has been created and/or handed over to Alice. We will also assume that each entity’s public key is authentic.

The Feige–Fiat–Shamir (FFS) protocol

The FFS protocol (Algorithm 5.69) is based on the intractability of computing square roots modulo a composite integer n. We take n = pq with two distinct primes p and q each congruent to 3 modulo 4.

Algorithm 5.69. Feige–Fiat–Shamir zero-knowledge protocol

Selection of domain parameters:

Select two large distinct primes p and q each congruent to 3 modulo 4.

n := pq.

Select a small integer t.           /* The probability of a successful cheat is 2^–t */

Selection of Alice’s secret:

Alice selects t random integers x1, . . . , xt ∈ Z_n^*.

Alice selects t random bits b1, . . . , bt ∈ {0, 1}.

Alice computes yi := (–1)^bi (xi^2)^–1 (mod n) for i = 1, . . . , t.

Alice makes (y1, . . . , yt) public and keeps (x1, . . . , xt) secret.

The protocol:

Alice randomly chooses c ∈ Z_n^* and γ ∈ {0, 1}.           /* Commitment */
Alice computes and sends to Bob w := (–1)^γ c^2 (mod n).           /* Witness */
Bob randomly chooses and sends to Alice (∊1, . . . , ∊t) ∈ {0, 1}^t.           /* Challenge */
Alice computes and sends to Bob r := c x1^∊1 x2^∊2 · · · xt^∊t (mod n).           /* Response */

Bob computes w′ := r^2 y1^∊1 y2^∊2 · · · yt^∊t (mod n).

Bob accepts Alice’s identity if and only if w′ ≠ 0 and w′ ≡ ±w (mod n).

It is clear from Algorithm 5.69 that knowing the secret (x1, . . . , xt) allows Alice to let Bob accept her identity (as Alice). The check w′ ≠ 0 in the last line is necessary to preclude the commitment c = 0, which would make any claimant succeed irrespective of whether she knows the secret.
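As an illustration, one round of Algorithm 5.69 can be sketched in Python. The parameters are artificially small, and the public values are formed as yi := (–1)^bi (xi^2)^–1 (mod n), following the standard FFS construction:

```python
import math
import random

# Toy parameters for illustration only; a real modulus needs primes of
# several hundred bits, each congruent to 3 modulo 4.
p, q = 683, 811          # both prime, both ≡ 3 (mod 4)
n = p * q
t = 8                    # per-run cheating probability is 2^-t

def random_unit(n):
    """A random element of Z_n^* (invertible modulo n)."""
    while True:
        a = random.randrange(2, n)
        if math.gcd(a, n) == 1:
            return a

# Selection of Alice's secret: x_i in Z_n^*, bits b_i, and the public
# values y_i := (-1)^{b_i} (x_i^2)^{-1} (mod n).
x = [random_unit(n) for _ in range(t)]
b = [random.randrange(2) for _ in range(t)]
y = [(-1) ** b[i] * pow(x[i], -2, n) % n for i in range(t)]

# One round of the protocol.
c, gamma = random_unit(n), random.randrange(2)
w = (-1) ** gamma * c * c % n                  # witness sent to Bob
eps = [random.randrange(2) for _ in range(t)]  # Bob's challenge bits
r = c                                          # response: c times the x_i with eps_i = 1
for i in range(t):
    if eps[i]:
        r = r * x[i] % n

# Bob's verification: w' := r^2 times the y_i with eps_i = 1.
w1 = r * r % n
for i in range(t):
    if eps[i]:
        w1 = w1 * y[i] % n
print(w1 != 0 and (w1 == w or w1 == (-w) % n))  # True: identity accepted
```

For an honest Alice the check always succeeds, since r^2 y1^∊1 · · · yt^∊t collapses to ±c^2 modulo n.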

Now, let us see how an opponent (Carol), without knowing the secret, can attempt to impersonate Alice by taking part in this protocol. To start with, we consider the simple case t = 1 (which corresponds to Fiat and Shamir’s original scheme). Carol can start the process by generating a random c and γ and computing w = (–1)^γ c^2 (mod n). Now, Carol should send the response c or cx1 depending on whether Bob sends ∊1 = 0 or 1. Her ability to answer both challenges correctly is equivalent to her knowledge of x1. If Bob sends ∊1 = 0, then she can provide the correct response c. Otherwise, Carol can at best select a random response from Z_n^*, and the probability that this is correct is overwhelmingly low. On the other hand, let Carol choose a random c and γ and send the (improper) witness w := (–1)^γ c^2 y1 (mod n). In that case, Carol can give the valid response r = c, if Bob’s challenge is ∊1 = 1. Sending the correct response to the challenge ∊1 = 0 now requires knowledge of x1. Therefore, if ∊1 is randomly chosen by Bob (without the prior knowledge of Carol), Carol can successfully respond with probability (very close to) 1/2. For t ≥ 1, this probability of a cheat by Carol can be easily shown to be (very close to) 1/2^t, which is negligibly small for t ≥ 80.

In practice, however, t is chosen to be O(ln ln n). It is, therefore, necessary to repeat the protocol t′ times, so that the probability of a successful cheat becomes (nearly) 1/2^(tt′). Taking t′ = Θ(ln n) is recommended. It can be shown that these choices of t and t′ give the FFS protocol the desired ZK property. Without going into a proof of this assertion, let us informally explain the ZK property of the FFS protocol. Neither Bob nor a listener to the conversation between Alice and Bob can get any idea of the secret (x1, . . . , xt). Bob gets as a response the product of c and those xi’s for which ∊i = 1. Since c is randomly chosen by Alice and is not available to Bob, a challenge cannot be chosen strategically. However, if Bob could compute a square root of w (or –w), then the interaction might give away partial information on the secret. For example, if Bob chooses the challenge (∊1, ∊2, . . . , ∊t) = (1, 0, . . . , 0), then Alice’s response would be cx1, from which x1 can be computed by Bob, if he knows c. Thus, the security and the ZK property of the FFS protocol are based on the assumption that computing square roots modulo n is an infeasible computational problem.

The Guillou–Quisquater (GQ) protocol

The GQ identification protocol is based on the intractability of the RSA problem. The correctness of Algorithm 5.70 (for a legitimate claimant) is easy to establish. The check w′ ≠ 0 is necessary to avoid the commitment c = 0, which makes a claimant succeed always.

A TTP typically selects the domain parameters p, q, n, e and d. It also selects m and gives s to Alice without revealing d. The execution of the protocol does not require the use of the decryption exponent d. In fact, d is a global secret, whereas s is Alice’s personal secret. Alice tries to prove the knowledge of s (and not of d).

In the GQ algorithm, the power s^∊ is blinded by multiplying it with the random commitment c. As a witness for c, Alice presents its encrypted version w. Under the assumption that RSA decryption without the knowledge of the decryption exponent d is infeasible, Bob (or an eavesdropper) cannot compute c and hence cannot separate out the value of s. Thus, no partial information on s is leaked. Furthermore, each invocation requires a fresh random ∊. In order to compute a strategic witness, Carol can at best guess ∊. The guess is correct with a probability of 1/e. If e is reasonably large, the probability of a successful cheat is low. However, larger values of e lead to more expensive generation of the witness from the commitment (and also of the response). So small values of e (say, 2^16 + 1 = 65,537) are usually recommended. In that case, repeating the protocol a suitable number of times makes Carol’s chance of cheating as small as one desires. Taking t′e (where t′ is the number of iterations of the protocol) of the order of (log n)^α for some constant α gives the GQ protocol the desired zero-knowledge property.

Algorithm 5.70. Guillou–Quisquater zero-knowledge protocol

Selection of domain parameters:

Select two distinct large primes p and q and set the modulus n := pq.

Select an exponent e with gcd(e, φ(n)) = 1 and compute d := e^–1 (mod φ(n)).

The pair (n, e) is made public and d is kept secret.

Selection of Alice’s secret:

Alice selects a random m ∈ Z_n^* and computes s := m^d (mod n).

Alice makes m public and keeps s secret.

The protocol:

Alice selects a random c ∈ Z_n^*.           /* Commitment */
Alice computes and sends to Bob w := c^e (mod n).           /* Witness */
Bob selects and sends to Alice a random ∊ ∈ {1, 2, . . . , e}.           /* Challenge */
Alice computes and sends to Bob r := c s^∊ (mod n).           /* Response */

Bob computes w′ := m^–∊ r^e (mod n).

Bob accepts Alice’s identity if and only if w′ ≠ 0 and w′ = w.
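A minimal Python sketch of one round of Algorithm 5.70, with toy parameters (the modulus size and the identity value m are illustrative only; the protocol uses a personal secret s := m^d mod n and the verification w′ := m^–∊ r^e mod n):

```python
import math
import random

# Toy parameters for illustration; a real modulus needs primes of
# several hundred bits.
p, q = 683, 811
n = p * q
phi = (p - 1) * (q - 1)
e = 65537                    # public exponent with gcd(e, phi) = 1
d = pow(e, -1, phi)          # decryption exponent: the TTP's global secret

def random_unit(n):
    """A random element of Z_n^*."""
    while True:
        a = random.randrange(2, n)
        if math.gcd(a, n) == 1:
            return a

m = random_unit(n)           # Alice's public identity value
s = pow(m, d, n)             # Alice's personal secret s := m^d (mod n)

# One round of the protocol.
c = random_unit(n)           # commitment
w = pow(c, e, n)             # witness w := c^e (mod n)
eps = random.randrange(1, e) # Bob's challenge
r = c * pow(s, eps, n) % n   # response r := c s^eps (mod n)

# Bob's verification: w' := m^{-eps} r^e (mod n) must equal w,
# since r^e = c^e m^{d*e*eps} = c^e m^{eps} (mod n).
w1 = pow(m, -eps, n) * pow(r, e, n) % n
print(w1 != 0 and w1 == w)   # True
```

Note that neither party uses d during the protocol itself: Alice proves knowledge of s, not of d.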

The Schnorr protocol

The Schnorr protocol is based on the intractability of computing discrete logarithms in a large prime field F_p. We assume that a suitably large prime divisor q of p – 1 and an element g ∈ F_p^* of multiplicative order q are known. The algorithm works in the subgroup of F_p^* generated by g. In order to make the known algorithms for solving the DLP in this setting infeasible, one should have q > 2^160.

Algorithm 5.71. Schnorr zero-knowledge protocol

Selection of domain parameters:

Select a large prime p such that p – 1 has a large prime divisor q.

Select an element g having multiplicative order q modulo p.

Publish (p, q, g).


Select a small integer t < lg q.           /* The probability of a successful cheat is 2^–t */

Selection of Alice’s secret:

Alice chooses a random secret integer d ∈ {1, 2, . . . , q – 1}.

Alice computes and makes public the integer y := g^d (mod p).

The protocol:

Alice chooses a random c ∈ {1, 2, . . . , q – 1}.           /* Commitment */
Alice computes and sends to Bob w := g^c (mod p).           /* Witness */
Bob selects and sends to Alice a random ∊ ∈ {0, 1, . . . , 2^t – 1}.           /* Challenge */
Alice computes and sends to Bob r := d∊ + c (mod q).           /* Response */

Bob computes w′ := g^r y^–∊ (mod p).

Bob accepts Alice’s identity if and only if w′ = w.

We leave the analysis of correctness and security of this protocol to the reader. The secret d is masked from Bob and other eavesdroppers by introducing the random additive bias c modulo q. The probability of a successful cheat by an adversary is 2^–t, since ∊ is chosen randomly from a set of cardinality 2^t. Usually the Schnorr protocol is not used iteratively. Therefore, t ≥ 40 is recommended for making the probability of cheating negligible. On the other hand, if t is too large, then the protocol can be shown to lose the ZK property. For the generation of the witness from the commitment, Alice computes a modular exponentiation with an exponent that is O(q). Generating the response, on the other hand, involves a single multiplication (and a single addition) modulo q and is hence very fast.
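As a starting point for that analysis, here is a toy Python run of Algorithm 5.71 (with p = 607, q = 101; real parameters need q > 2^160), assuming the verification w′ := g^r y^–∊ (mod p):

```python
import random

# Toy parameters; in practice p should have >= 1024 bits and q > 2^160.
p, q = 607, 101              # q divides p - 1 = 606
g = pow(2, (p - 1) // q, p)  # an element of multiplicative order q
t = 6                        # challenge length, t < lg q

d = random.randrange(1, q)   # Alice's secret
y = pow(g, d, p)             # public key y := g^d (mod p)

# One run of the protocol.
c = random.randrange(1, q)   # commitment
w = pow(g, c, p)             # witness w := g^c (mod p)
eps = random.randrange(2 ** t)  # Bob's challenge, 0 <= eps < 2^t
r = (d * eps + c) % q        # response r := d*eps + c (mod q)

# Bob's verification: since g has order q,
# g^r y^{-eps} = g^{d*eps + c} g^{-d*eps} = g^c = w (mod p).
w1 = pow(g, r, p) * pow(y, -eps, p) % p
print(w1 == w)   # True
```

The response computation is a single multiplication and addition modulo q, which is why Schnorr identification is attractive for constrained provers.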

Exercise Set 5.5

5.25
  1. Describe how a zero-knowledge witness–challenge–response identification scheme can be converted to a signature scheme. [H]

  2. Write the Feige–Fiat–Shamir, Guillou–Quisquater and Schnorr signature schemes based on the corresponding identification schemes.

5.26 Let n := pq with distinct primes p and q, each congruent to 3 modulo 4.
  1. Show that –1 is a quadratic non-residue modulo p and modulo q.

  2. If a ∈ Z_n^* is a quadratic residue modulo n, prove that a has exactly four square roots modulo n, of which exactly one is a quadratic residue modulo n.

  3. Consider the following identification protocol in which Alice wants to prove to Bob her knowledge of the factorization of n = pq. Assume that p and q are sufficiently large so that computing square roots modulo n is infeasible without the knowledge of the factorization of n. Argue that Alice can prove her identity to Bob if and only if she knows the factorization of n.

    A bad zero-knowledge protocol

    Bob chooses a random x ∈ Z_n^* and computes a := x^4 (mod n).

    Bob sends a to Alice.

    Alice computes four square roots of a modulo n and picks up the unique
           square root b which is a quadratic residue modulo n.

    Alice sends b to Bob.

    Bob accepts Alice’s claim if and only if b ≡ x^2 (mod n).

  4. Conclude that this is not a good zero-knowledge protocol, by demonstrating that Bob can maliciously send a bad a to Alice so that during the execution of the protocol he gathers enough information to factor n. [H]

Chapter Summary

All the material studied in earlier chapters culminates in this relatively short chapter which describes some popular cryptographic algorithms. We address most of the problems relevant in cryptography, namely, encryption, key agreement, digital signatures and entity authentication. Against each algorithm we mention the (provable or alleged) source of security of the algorithm.

Encryption algorithms are treated first. We start with the seemingly most popular RSA algorithm. This algorithm derives its security from the RSA key inversion problem and the RSA problem. The key inversion problem is probabilistic polynomial-time equivalent to the integer factorization problem. The intractability of the RSA problem remains unproven: at present, no algorithm other than factoring the RSA modulus is known for solving it. We subsequently describe Rabin encryption (based on the square-root problem), Goldwasser–Micali encryption (based on the quadratic residuosity problem), Blum–Goldwasser encryption (based on the square-root problem), ElGamal encryption (based on the Diffie–Hellman problem) and Chor–Rivest encryption (based on a variant of the subset sum problem). The XTR encryption algorithm is essentially an efficient implementation of ElGamal encryption and is based on a tricky representation of elements in certain finite fields. The last encryption algorithm we discuss is the NTRU algorithm. It derives its security from a mixing system that uses the algebra of the ring ℤ[X]/〈X^N – 1〉. Attacks on NTRU based on the shortest vector problem are also known.

The basic key-agreement scheme is the Diffie–Hellman scheme. In order to prevent small-subgroup attacks on this scheme, one employs a technique known as cofactor expansion. We then explain unknown key-share attacks against key-agreement schemes. These attacks necessitate the use of authenticated key agreement schemes. The MQV algorithm is presented as an example of an authenticated key-agreement scheme.

Next come digital signature algorithms. Digital signatures may be classified in two broad categories: signature schemes with appendix and signature schemes with message recovery. In this book, we study only the signature schemes with appendix. As specific examples of signature schemes, we first explain RSA and Rabin signatures. Then, we present several variants of discrete-log-based signature schemes: ElGamal signatures, Schnorr signatures, Nyberg–Rueppel signatures, the digital signature algorithm (DSA) and its elliptic curve variant ECDSA. All the discrete-log (over finite fields)-based signature schemes have efficient XTR implementations. The NTRUSign algorithm is the last general-purpose signature scheme discussed in this section.

We then present a treatment of some special signature schemes. Blind signatures are created on messages unknown to the signer. Three blind signature schemes are described: Chaum, Schnorr and Okamoto–Schnorr schemes. An undeniable signature, on the other hand, requires an active participation of the signer at the time of verification and comes with a denial protocol that prevents a signer from denying a valid signature at a later time. The Chaum–Van Antwerpen undeniable signature scheme is based on the discrete-log problem, whereas the GKR scheme is based on the RSA problem.

A way to guarantee both authentication and confidentiality of a message is to sign the message and then encrypt the signed message. This involves two basic operations (signature generation and encryption). Zheng’s signcryption scheme combines these two primitives with a view to reducing both running time and message expansion.

The final topic we discuss in this chapter is entity authentication, a mechanism by means of which an entity can prove its identity to another. Here identity of an entity is considered synonymous with the possession of some secret information by the entity. Passwords are called weak authentication schemes, since the claimant has to disclose the secret straightaway to the verifier. A strong authentication scheme (also called a challenge–response scheme) does not reveal the secret to the verifier. We describe two strong authentication schemes; the first is based on encryption and the second on digital signatures. A way to establish mutual authentication between two entities is also presented. Challenge–response algorithms may be vulnerable to some attacks mounted by the verifier. A zero-knowledge protocol comes with a proof that during the authentication conversation no information is leaked to the verifier. Three zero-knowledge protocols are discussed: the Feige–Fiat–Shamir protocol, the Guillou–Quisquater protocol, and the Schnorr protocol.

Suggestions for Further Reading

Public-key cryptography was born from the seminal works of Diffie and Hellman [78] and Rivest, Shamir and Adleman [252]. Though still young, this area has generated much research in the last three decades. In this chapter, we have made an attempt to summarize some important cryptographic algorithms proposed in the literature. The original papers where these techniques have been introduced are listed below. We do not plan to be exhaustive, but mention only the most relevant resources.

Algorithm                                      Reference(s)
RSA encryption                                 [252]
Rabin encryption                               [246]
Goldwasser–Micali encryption                   [117]
Blum–Goldwasser encryption                     [27]
ElGamal encryption                             [84]
Chor–Rivest encryption                         [54]
XTR encryption                                 [170, 172, 171, 173, 289, 297]
NTRU encryption                                [130]
Identity-based encryption                      [267, 34, 35]
Diffie–Hellman key exchange                    [78]
Menezes–Qu–Vanstone key exchange               [161]
RSA signature                                  [252]
Rabin signature                                [246]
ElGamal signature                              [84]
Schnorr signature                              [263]
Nyberg–Rueppel signature                       [223, 224]
DSA                                            [220]
ECDSA                                          [141]
XTR signature                                  [170, 172, 171, 173, 289, 297]
NTRUSign                                       [110, 111, 128, 129, 131, 217]
Chaum blind signature                          [48, 49, 50]
Schnorr blind signature                        [263, 202]
Okamoto–Schnorr blind signature                [227, 236]
Chaum–Van Antwerpen undeniable signature       [51, 52, 53]
RSA undeniable signature                       [109, 187, 102, 186]
Signcryption                                   [310, 311, 312]
Signcryption based on elliptic curves          [313, 314]
Identity-based signcryption                    [178, 185]
Feige–Fiat–Shamir ZK protocol                  [90, 91]
Guillou–Quisquater ZK protocol                 [122]
Schnorr ZK protocol                            [263]

The Handbook of Applied Cryptography [194] is a single resource where most of the above algorithms are discussed in good detail. See Chapter 8 of that book for encryption algorithms, Chapter 11 for digital signatures and Chapter 10 for identification schemes.

There are several other (allegedly) intractable mathematical problems based on which cryptographic protocols can be built. Some of the promising candidates that we left out in the text are summarized below:

Algorithm                                      Intractable problem
LUC [284, 285, 286]                            RSA and ElGamal-like problems based on Lucas sequences
Goldreich–Goldwasser–Halevi [115]              lattice-basis reduction
Patarin’s hidden field equations (HFE) [232]   solving multivariate polynomial equations
EPOC/ESIGN [97, 228]                           factorization of integers of the form p^2 q
McEliece encryption [190]                      decoding of error-correcting codes
Number field cryptography [38, 39]             discrete-log problem in class groups of quadratic fields
KLCHKP (braid group cryptosystem) [148]        braid conjugacy problem

The Internet site http://www.tcs.hut.fi/~helger/crypto/link/public/index.html is a good place to start, for more information on these (and some other) cryptosystems. Also visit http://www.kisa.or.kr/technology/sub1/index-PKC.htm.

The obvious question that crops up now is, given so many different cryptographic schemes, which one should a user go for?[5] There is no clear-cut answer to this question. One has to study the relative merits and demerits of the systems. If computational efficiency is what matters most, we advise users to go for the NTRU schemes. Having said that, we must also add that the NTRU scheme is relatively new and has not yet withstood sufficient cryptanalytic attack. Various attacks on NSS and NTRUSign cast doubt on the practical safety of deploying such young schemes in serious applications.

[5] It is worthwhile to issue a warning to the readers. Many cryptographic algorithms (and also the idea of public-key cryptography) are/were patented. In order to implement these algorithms (in particular, for commercial purposes), one should take care of the relevant legal issues. We summarize here some of the important patents in this area. The list is far from exhaustive.

Patent No.                  Covers                          Patent holder                   Date of issue
US 4,200,770                Diffie–Hellman key exchange     Stanford University             Apr 29, 1980
                            (includes ElGamal encryption)
US 4,218,582                Public-key cryptography         Stanford University             Aug 19, 1980
US 4,405,829                RSA                             MIT                             Sep 20, 1983
US 5,231,668                DSA                             USA, Secretary of Commerce      Jul 27, 1993
US 5,351,298                LUC                             P. J. Smith                     Sep 27, 1994
US 5,790,675                HFE                             CP8 Transac (France)            Aug 4, 1998
EP 0963635A1 /              XTR                             Citibank (North America)        Dec 15, 1999 /
WO 09836526                                                                                 Aug 20, 1998
US 6,081,597                NTRU                            NTRU Cryptosystems, Inc.        Jun 27, 2000
                            EPOC/ESIGN                      Nippon Telegraph and            Apr 17, 2001
                                                            Telephone Corporation


Our mathematical trapdoors are not provably secure, and this is where the problems begin. We have to rely on historical evidence, which should not be collected too hastily. Slow as it is, RSA has stood the test of time and has successfully survived more than twenty years of cryptanalytic attacks [29]. The risk that an unforeseen attack will break the system tomorrow appears much smaller with RSA than with newer schemes that have received only a little cryptanalytic study. The hidden monomial system proposed by Imai and Matsumoto [188] was broken by Patarin [231]. As a by-product, Patarin came up with the idea of cryptosystems based on hidden field equations (HFE) [232]. No serious attacks on HFE are known to date, but, as we mentioned earlier, only time will show whether HFE is going to survive.

Bruce Schneier asserts in his Crypto-Gram newsletter (15 March 1999, http://www.counterpane.com/crypto-gram.html): “No one can duplicate the confidence that RSA offers after 20 years of cryptanalytic review. A standard security review, even by competent cryptographers, can only prove insecurity; it can never prove security. By following the pack you can leverage the cryptanalytic expertise of the worldwide community, not just a handful of hours of a consultant’s time.”

Twenty-odd years is definitely not a wide span of time in the history of evolution of our knowledge, but public-key cryptography is only as old as RSA is!

6. Standards

6.1Introduction
6.2IEEE Standards
6.3RSA Standards
 Chapter Summary
 Suggestions for Further Reading

In theory, there is no difference between theory and practice. But, in practice, there is.

—Jan L. A. van de Snepscheut

ECC curves are divided into three groups, weak curves, inefficient curves, and curves patented by Certicom.

—Peter Gutmann

Acceptance of prevailing standards often means we have no standards of our own.

—Jean Toomer (1894 – 1967)

6.1. Introduction

Public-key cryptographic protocols deal with sets like the ring of integers modulo n, the multiplicative group of units in a finite field or the group of points on an elliptic curve over a finite field. Messages that need to be encrypted or signed are, on the other hand, usually human-readable text or numbers or keys of secret-key cryptographic protocols, which are typically represented in computers in the form of sequences of bits (or bytes). It is necessary to convert such bit strings (or byte strings) to mathematical elements before the cryptographic algorithms are applied. This conversion is referred to as encoding. The reverse transition, that is, converting mathematical entities back to bit strings, is called decoding.

If Alice and Bob were the only two parties involved in deploying public-key protocols, they could have agreed upon a set of private (not necessarily secret) encoding and decoding rules. In practice, however, when many entities interact over a public network, it is impractical, if not impossible, to have an individual encoding scheme for every pair of communicating parties. This is also unnecessary, because the security of the protocols comes from the encryption process and not from encoding. On the contrary, poorly designed encoding schemes may endanger the security of the underlying protocols.

We, therefore, need a set of standard ways of converting data between various logical formats. This promotes interoperability, removes ambiguities, facilitates simplicity in handling cryptographic data and thereby enhances the applicability and acceptability of public-key algorithms. IEEE (The Institute of Electrical and Electronics Engineers, Inc., pronounced eye-triple-e) and the RSA laboratories have published extensive documents standardizing data conversion and encoding for many popular public-key cryptosystems. Here we summarize the contents of some of these documents. This exposition is meant mostly for software engineers intending to develop cryptographic tool-kits that conform to the accepted standards.

6.2. IEEE Standards

In this section, we outline the first three of the drafts from IEEE, shown in Table 6.1. At the time of writing this book, these are the latest versions of the drafts available from IEEE. In future, these may be superseded by newer documents. We urge the reader to visit the web-site http://grouper.ieee.org/groups/1363/ for more up-to-date information. Also see the standard IEEE 1363–2000: Standards Specifications for Public-key Cryptography [134].

Table 6.1. IEEE drafts on public-key cryptography
Draft         Date               Description
P1363/D13     12 November 1999   Traditional public-key cryptography based on IFP, DLP and ECDLP
P1363a/D12    16 July 2003       Additional techniques on traditional public-key cryptography
P1363.1/D4    7 March 2002       Lattice-based cryptography
P1363.2/D15   25 May 2004        Password-based authentication
P1363.3/D1    May 2008           Identity-based public-key cryptography

6.2.1. The Data Types

Public-key protocols operate on data of various types. The IEEE drafts specify only the logical descriptions of these data types. The realizations of these data types should be taken care of by individual implementations and are left unspecified.

Bit strings

A bit string is a finite ordered sequence a0a1 . . . al–1 of bits, where each bit ai can assume the value 0 or 1. The length of the bit string a0a1 . . . al–1 is l. The bit a0 in the bit string a0a1 . . . al–1 is called the leftmost or the first or the leading or the most significant bit, whereas the bit al–1 is called the rightmost or the last or the trailing or the least significant bit.

The order of appearance of the bits in a bit string is important, rather than the way the bits are indexed or named. That is to say, the most and least significant bits in a given bit string are uniquely determined by their positions of occurrences in the string, and not by the way the individual bits in the string are numbered. Thus, for example, if we call the bit string 01101 as a0a1a2a3a4, then the leading and trailing bits are a0 and a4 respectively. If we index the bits in the same bit string as a2a3a5a7a11, the first bit is a2 and the last bit is a11. Finally, for the indexing a5a4a3a2a1, the leftmost and rightmost bits are a5 and a1 respectively.

Octet strings

Though bits are the basic building blocks of computer memory, programs typically access memory in groups of 8 bits, known as octets. Thus, an octet is a bit string of length 8 and can have one of the 256 values 0000 0000 through 1111 1111. It is convenient to write an octet as a concatenation of two hexadecimal digits, the first (resp. second) digit corresponding to the first (resp. last) 4 bits of the octet read as an integer in base 2. For example, the octet 0010 1011 is represented by 2b. It is also often customary to treat an octet a0a1 . . . a7 as the integer (between 0 and 255, both inclusive) whose binary representation is a0a1 . . . a7.
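The hexadecimal convention can be checked directly in Python:

```python
# The octet 0010 1011 written as two hexadecimal digits, and the same
# octet read as an integer between 0 and 255.
octet = 0b00101011
assert format(octet, "02x") == "2b"   # first digit 0010 -> 2, second digit 1011 -> b
assert octet == 43                    # the octet treated as an integer
```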

An octet string is a finite ordered sequence of octets. The length of an octet string is the number of octets in the string. The leftmost (or first or leading or most significant) and the rightmost (or last or trailing or least significant) octets in an octet string are defined analogously as in the case of bit strings. These octets are dependent solely on their positions in the octet string and are independent of how the individual octets in the octet string are numbered.

Integers

Integers are the whole numbers 0, ±1, ±2, . . . . For cryptographic applications, one typically considers only non-negative integers. Integers used in cryptography may have binary representations requiring as many as several thousand bits.

Prime finite fields

Let p be a prime (typically odd). The elements of F_p are represented as the integers 0, 1, . . . , p – 1 under the standard way of associating the integer a with the congruence class [a]_p in Z_p. Arithmetic operations in F_p are the corresponding integer operations modulo the prime p.

Finite fields of characteristic 2

The elements of the field F_{2^m} are represented as bit strings of length m. In order to provide the mathematical interpretation of these bit strings, we recall that F_{2^m} is an m-dimensional vector space over F_2. Let β0, . . . , βm–1 be an ordered basis of F_{2^m} over F_2. The bit string a0 . . . am–1 is to be identified with the element a0β0 + · · · + am–1βm–1, where the bit ai represents the element [ai]_2 of F_2. Selection of the basis β0, . . . , βm–1 renders a complete meaning to this representation and determines how arithmetic operations on these elements are to be performed. The following two cases are recommended.

For the polynomial-basis representation, one chooses an irreducible polynomial f(X) ∈ F_2[X] of degree m and represents F_{2^m} as F_2[X]/〈f(X)〉. Letting x denote the canonical image of X in F_2[X]/〈f(X)〉, one chooses the ordered basis β0 = x^{m–1}, β1 = x^{m–2}, . . . , βm–1 = 1. Arithmetic operations in F_{2^m} under this representation are those of F_2[X] followed by reduction modulo the defining polynomial f(X). The choice of the irreducible polynomial f(X) is left unspecified in the IEEE drafts.
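As a sketch of polynomial-basis arithmetic, the following Python function multiplies two elements of F_{2^m} stored as m-bit integers (bit i holding the coefficient of x^i, which matches reading the bit string a0 . . . am–1 as an integer). The irreducible polynomial used here, X^8 + X^4 + X^3 + X + 1, is only an example; the drafts leave the choice of f(X) open:

```python
def gf2m_mul(a: int, b: int, f: int = 0b1_0001_1011, m: int = 8) -> int:
    """Multiply two elements of F_{2^m} in polynomial-basis representation
    (bit i of an operand is the coefficient of x^i)."""
    # Carry-less polynomial multiplication over F_2 ...
    prod = 0
    while b:
        if b & 1:
            prod ^= a
        a <<= 1
        b >>= 1
    # ... followed by reduction modulo the defining polynomial f(X).
    for bit in range(2 * m - 2, m - 1, -1):
        if (prod >> bit) & 1:
            prod ^= f << (bit - m)
    return prod

assert gf2m_mul(0x02, 0x03) == 0x06   # x * (x + 1) = x^2 + x
assert gf2m_mul(0x53, 0xCA) == 0x01   # a well-known inverse pair in this field
```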

For the normal-basis representation, one selects an element θ ∈ F_{2^m} which is normal over F_2 (see Definition 2.60, p 86), and takes the ordered basis β0 = θ = θ^(2^0), β1 = θ^(2^1), β2 = θ^(2^2), . . . , βm–1 = θ^(2^{m–1}). Arithmetic in F_{2^m} is carried out as explained in Section 2.9.3.

The IEEE draft P1363a also specifies a composite-basis representation of the elements of F_{2^m}, provided that m is composite. Let m = ds with 1 < d < m. One chooses an (ordered) polynomial or normal basis γ0, γ1, . . . , γs–1 of F_{2^m} over F_{2^d}. An element of F_{2^m} is of the form a0γ0 + a1γ1 + · · · + as–1γs–1 and is represented by a0a1 . . . as–1, where each ai, being an element of F_{2^d}, is represented by a bit string of length d. The interpretation of the representation of ai depends on how F_{2^d} is represented. One can use a polynomial- or normal-basis representation of F_{2^d} (over F_2), or even a composite-basis representation of F_{2^d} over F_{2^{d′}}, if d happens to be composite with a non-trivial divisor d′.

Extension fields of odd characteristic

A non-prime finite field of odd characteristic is one of cardinality p^m for some odd prime p and some integer m > 1. The field F_{p^m} is represented as F_p[X]/〈f(X)〉, where f(X) ∈ F_p[X] is an irreducible polynomial of degree m. An element of F_{p^m} is then of the form α = am–1 x^{m–1} + · · · + a1x + a0, where x := X + 〈f(X)〉 and where each ai is an element of F_p, that is, an integer in the range 0, 1, . . . , p – 1. The element α is represented as an integer by substituting p for x, that is, as the integer am–1 p^{m–1} + · · · + a1 p + a0 (see the packed representation of Exercise 3.39). In order to interpret an integer between 0 and p^m – 1 as an element of F_{p^m}, one expands the integer in base p.
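The packed representation amounts to reading the coefficient vector as a base-p integer. A small Python sketch (the field size F_{7^3} is illustrative):

```python
def pack(coeffs, p):
    """Pack coefficients [a_0, a_1, ..., a_{m-1}] (each in 0..p-1)
    into the integer a_{m-1} p^{m-1} + ... + a_1 p + a_0."""
    value = 0
    for a in reversed(coeffs):
        value = value * p + a
    return value

def unpack(value, p, m):
    """Expand an integer in 0..p^m - 1 back into its m base-p digits."""
    coeffs = []
    for _ in range(m):
        value, a = divmod(value, p)
        coeffs.append(a)
    return coeffs

# Example over F_{7^3}: 2x^2 + 5x + 3  <->  2*49 + 5*7 + 3 = 136
assert pack([3, 5, 2], 7) == 136
assert unpack(136, 7, 3) == [3, 5, 2]
```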

* Elliptic curves

An elliptic curve defined over a finite field F_q is specified by two elements a, b ∈ F_q. Depending on the characteristic of F_q, this pair defines the following curves.

If char F_q ≠ 2, 3, then 4a^3 + 27b^2 must be non-zero in F_q and the equation of the elliptic curve is taken to be Y^2 = X^3 + aX + b.

For char F_q = 2, we must have b ≠ 0 in F_q and we use the non-supersingular curve Y^2 + XY = X^3 + aX^2 + b. Because of the MOV attack (Section 4.5.1), supersingular curves are not recommended for cryptographic applications.

Finally, if F_q has characteristic 3, then both a and b must be non-zero in F_q and the elliptic curve Y^2 = X^3 + aX^2 + b is specified by (a, b).

* Elliptic curve points

A point P on an elliptic curve defined over F_q can be represented either in compressed or in uncompressed form. In the uncompressed form, one represents P as the pair (h, k) of elements of F_q. The compressed form can be either lossy or lossless. In the lossy compressed form, P is represented by its X-coordinate h only. Such a representation is not unique in the sense that there can be two points on the elliptic curve with the same X-coordinate h. In applications where the Y-coordinates of elliptic curve points are not utilized, such a representation can be used. In the lossless compressed form, one represents P as the pair (h, ỹ), where ỹ is a single bit. There are two solutions (perhaps repeated) for Y for a given value h of X. The bit ỹ specifies which of these two values is represented. Depending on how the bit ỹ is computed, we have two different lossless compressed forms.

The LSB compressed form is applicable for odd prime fields or fields of even characteristic. For F_p (p an odd prime), the bit ỹ is taken to be the least significant (that is, rightmost) bit of k (treated as an integer). For F_{2^m}, we have ỹ = 0 if h = 0, whereas if h ≠ 0, then ỹ is the least significant bit of the element kh^–1 treated as an integer via the FE2I conversion primitive described in Section 6.2.2.
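A sketch of LSB compression over a prime field with p ≡ 3 (mod 4), where a square root of a quadratic residue w can be computed as w^((p+1)/4) (mod p). The curve coefficients below are arbitrary toy values, not taken from any standard:

```python
# LSB point compression over F_p (p an odd prime): a point (h, k) is stored
# as h together with the least significant bit of k. Decompression recovers
# k as a square root of h^3 + a*h + b, choosing the root of matching parity
# (the two roots k0 and p - k0 always have opposite parity for odd p).
p, a, b = 10007, 2, 3          # toy curve; p is prime and p ≡ 3 (mod 4)

def compress(h, k):
    return h, k & 1

def decompress(h, ybit):
    w = (pow(h, 3, p) + a * h + b) % p
    k = pow(w, (p + 1) // 4, p)   # one of the two square roots of w
    if k & 1 != ybit:             # pick the root with the stored parity bit
        k = p - k
    return h, k

# Find some point on the curve and check that compression round-trips.
for h in range(p):
    w = (pow(h, 3, p) + a * h + b) % p
    k = pow(w, (p + 1) // 4, p)
    if k * k % p == w:            # h^3 + a*h + b is a quadratic residue
        break
assert decompress(*compress(h, k)) == (h, k)
```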

The SORT compressed form is used for q = p^m, m > 1. Let P′ = (h, k′) be the opposite of P = (h, k), that is, P′ = –P. One converts k and k′ to integers κ and κ′ using the FE2I primitive and sets the bit ỹ according to the position of κ in the sorted order of the two integers κ, κ′.

One may also go for a hybrid representation of the elliptic curve point P = (h, k), in which information for both the compressed and the uncompressed representations of P is stored, that is, P is stored as the triple (h, ỹ, k) with ỹ computed by one of the methods (LSB or SORT) described above.

* Convolution polynomial rings

For NTRU public-key cryptosystems, we work in the ring R := ℤ[X]/〈X^n – 1〉. We denote x := X + 〈X^n – 1〉 as usual. An element of R is a polynomial a(x) = a0 + a1x + a2x^2 + · · · + a_{n–1}x^{n–1} with each ai ∈ ℤ, and is represented by the ordered n-tuple of integers (a0, a1, . . . , a_{n–1}). Addition (resp. subtraction) in R is simply component-wise addition (resp. subtraction), whereas multiplication of a(x) = a0 + a1x + · · · + a_{n–1}x^{n–1} and b(x) = b0 + b1x + · · · + b_{n–1}x^{n–1} gives c(x) = c0 + c1x + · · · + c_{n–1}x^{n–1}, where ci = Σ_{j+k ≡ i (mod n)} aj bk (see Section 5.2.8). The IEEE draft P1363.1 designates elements of R as ring elements.
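The convolution product can be sketched in a few lines of Python; the function name is ours, and coefficients are kept as arbitrary integers, exactly as in the definition above.

```python
def conv_mult(a, b):
    """Multiply two elements of Z[x]/<x^n - 1>, each given as a list of n
    coefficients (a[i] is the coefficient of x^i).  Indices wrap around
    because x^n = 1 in this ring."""
    n = len(a)
    assert len(b) == n
    c = [0] * n
    for j in range(n):
        for k in range(n):
            c[(j + k) % n] += a[j] * b[k]   # x^j * x^k = x^((j+k) mod n)
    return c
```

For n = 3, for example, (1 + x) · x = x + x^2, while x^2 · x^2 = x^4 = x.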

It is customary to deal with polynomials in R with small coefficients. If all the coefficients of a(x) ∈ R are known to be from {0, 1}, it is convenient to represent a(x) as the bit string a0a1 . . . a_{n–1} instead of as an n-tuple of integers. In this case, a(x) is called a binary ring element or simply a binary element.

6.2.2. Conversion Among Data Types

The IEEE drafts P1363 and P1363.1 specify algorithms for converting data among the formats discussed above. The standardized data conversion primitives are summarized in Figure 6.1. Though these drafts support elliptic curve cryptography, it is not specified how data representing elliptic curves can be converted to data of other types (like octet strings and bit strings).

Figure 6.1. IEEE P1363 data types and conversions


We now provide a brief description of the data conversion primitives at a logical level. The implementation details depend on the representations of the data types and are left out here.

Converting bit strings to octet strings (BS2OS)

A bit string a0a1 . . . a_{l–1} can be broken up into groups of eight bits and packed into octets. But we run into difficulty if the length of the input bit string is not an integral multiple of 8. We then have to add extra bits in order to make the length of the augmented bit string an integral multiple of 8. This can be done in several ways, and in this context a standard convention needs to be adopted. The IEEE drafts prescribe the following rules:

  1. Every extra bit added must be the zero bit.

  2. Add the minimal number of extra bits.

  3. Add the extra bits, if any, to the left.[1]

    [1] At the time of writing this book there is a serious conflict between the latest drafts of P1363 and P1363.1 from IEEE. The former asks to add extra bits to the left, the latter to the right. One of the authors of this book raised this issue in the discussion group stds-p1363-discuss maintained by IEEE and was notified that in the next version of the P1363.1 document this conflict would be resolved in favour of P1363.

In order to see what these rules mean, let a0a1 . . . a_{l–1} be a bit string of length l to be converted to the octet string A0A1 . . . A_{d–1}. The length of the output octet string must be d = ⌈l/8⌉. 8d – l zero bits should be added to the left of the input bit string in order to create the augmented bit string 0 . . . 0a0a1 . . . a_{l–1} whose length is 8d. Now, we start from the left and pack blocks of eight consecutive bits in A0, A1, . . . , A_{d–1}. Thus, we have A0 = 0 . . . 0a0 . . . a_{k–1}, A1 = ak . . . a_{k+7}, . . . , A_{d–1} = a_{k+8(d–2)} . . . a_{k+8(d–2)+7}, where k = 8 – (8d – l). Note that if l is already a multiple of 8, then 8d – l = 0, that is, no extra bits need to be added.

As an example, consider the input bit string 01110 01101011 of length 13. The output octet string should be of length ⌈13/8⌉ = 2. Padding gives the augmented bit string 00001110 01101011. The first octet in the output octet string will then be 00001110, that is, 0e; and the second octet will be 01101011, that is, 6b.
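The padding and packing rules translate directly into Python. In the following minimal sketch (names and data modelling are ours, not mandated by the drafts), bit strings are modelled as str objects of '0'/'1' characters and octet strings as bytes.

```python
def bs2os(bits):
    """BS2OS: pack a bit string into d = ceil(l/8) octets, adding the
    minimal number of zero bits to the LEFT of the input."""
    d = (len(bits) + 7) // 8
    padded = bits.zfill(8 * d)          # augmented bit string of length 8d
    return bytes(int(padded[8 * i : 8 * i + 8], 2) for i in range(d))
```

With this sketch, bs2os('0111001101011') returns the two octets 0e 6b, matching the example above.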

Converting octet strings to bit strings (OS2BS)

The OS2BS primitive is designed to ensure that if we convert an octet string generated by BS2OS, we get back the original bit string (that is, the input to BS2OS) with which we started. Suppose that we want to convert an octet string A0A1 . . . A_{d–1}. Let us write the bits of Ai as ai,0ai,1 . . . ai,7. The desired length l of the output bit string also has to be specified. If d ≠ ⌈l/8⌉, the procedure OS2BS reports error and stops. If d = ⌈l/8⌉, we consider the bit string

a0,0 a0,1 . . . a0,7 a1,0 a1,1 . . . a1,7 . . . a_{d–1,0} a_{d–1,1} . . . a_{d–1,7}

of length 8d. If the leftmost 8d – l bits of this flattened bit string are not all zero, OS2BS should quit after reporting error. Otherwise, the trailing l bits of the flattened bit string are returned.

The reader can check that when 0e 6b and l = 13 are input to OS2BS, it returns the bit string 01110 01101011. (See the example in connection with BS2OS.) Notice also that for this input octet string, OS2BS reports error if and only if a value l ≥ 17 or l ≤ 11 is supplied as the desired length of the output bit string.
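A matching Python sketch of OS2BS, again modelling bit strings as str objects of '0'/'1' and octet strings as bytes (names ours):

```python
def os2bs(octets, l):
    """OS2BS: unpack an octet string into a bit string of the specified
    length l; report error if the length or the padding is inconsistent."""
    d = len(octets)
    if d != (l + 7) // 8:
        raise ValueError("error: octet string incompatible with output length")
    flat = "".join(format(A, "08b") for A in octets)   # flattened string of 8d bits
    if "1" in flat[: 8 * d - l]:                       # leftmost 8d - l bits must be zero
        raise ValueError("error: non-zero padding bits")
    return flat[8 * d - l :]
```

Note that os2bs(bytes([0x0E, 0x6B]), 11) fails not because of the length check (⌈11/8⌉ = 2) but because a padding bit would be 1.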

Converting integers to bit strings (I2BS)

Let a non-negative integer n be given. The I2BS primitive outputs a bit string of length l representing n. If n ≥ 2^l, this conversion cannot be done and the primitive reports error and quits. If n < 2^l, we write the binary representation of n as

n = a_{l–1}2^{l–1} + a_{l–2}2^{l–2} + · · · + a1·2 + a0 with each ai ∈ {0, 1}.

Treating each ai as a bit[2], I2BS returns the bit string a_{l–1}a_{l–2} . . . a1a0. One or more leading bits of the binary representation of n may be zero. There is no limit on how many leading zero bits are allowed during the conversion. In particular, the integer 0 gets converted to a sequence of l zero bits for any value of l supplied.

[2] Each ai is logically an integer which happens to assume one of two possible values: 0 and 1. A bit, on the other hand, is a quantity that can also assume only two possible values. Traditionally, the values of a bit are also denoted by 0 and 1. But one has the liberty to call these values off and on, or false and true, or black and white, or even armadillo and platypus. To many people, bit is an abbreviation for binary digit which our ais logically are. To others, binit is a safer and more individualistic acronym for binary digit. For I2BS, we identify the two concepts.

A request to I2BS to convert n = 2357 = 2^11 + 2^8 + 2^5 + 2^4 + 2^2 + 2^0 with l = 12 returns 1001 00110101, one with l = 18 returns 00 00001001 00110101, and finally one with l ≤ 11 reports failure. Note that for a neater look we write bit strings in groups of eight, and grouping starts from the right. This convention reflects the relationship between bit strings and octet strings, as mentioned above.

Converting bit strings to integers (BS2I)

The primitive BS2I converts the bit string a0a1 . . . a_{l–1} to the integer a0·2^{l–1} + a1·2^{l–2} + · · · + a_{l–2}·2 + a_{l–1}, where we again identify a bit with an integer (or a binary digit). As an illustrative example, the bit string 1001 00110101 (or 00 00001001 00110101) gets converted to the integer 2^11 + 2^8 + 2^5 + 2^4 + 2^2 + 2^0 = 2357. The null bit string (that is, the one of zero length) is converted to the integer 0.
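Both primitives become one-liners in Python once the MSB-first convention is fixed; the function names are ours.

```python
def i2bs(n, l):
    """I2BS: non-negative integer -> bit string of length l, most
    significant bit first; error if n does not fit in l bits."""
    if n >= 2 ** l:
        raise ValueError("error: integer too large to fit in l bits")
    return "".join(str((n >> (l - 1 - i)) & 1) for i in range(l))

def bs2i(bits):
    """BS2I: bit string -> integer; the null bit string maps to 0."""
    return int(bits, 2) if bits else 0
```

By construction, bs2i(i2bs(n, l)) = n whenever n < 2^l.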

Converting integers to octet strings (I2OS)

In order to convert a non-negative integer n to an octet string of length d, we write the base-256 expansion of n as

n = A_{d–1}·256^{d–1} + A_{d–2}·256^{d–2} + · · · + A1·256 + A0,

where each Ai ∈ {0, 1, . . . , 255} can be naturally identified with an octet. I2OS returns the octet string A_{d–1}A_{d–2} . . . A1A0. Note that the above representation of n to the base 256 is possible if and only if n < 256^d. If n ≥ 256^d, I2OS should return failure. As with bit strings, an arbitrary number of leading zero octets is allowed.

Consider the integer 2357 = 9 × 256 + 53. The two-digit hexadecimal representations of 9 and 53 are 09 and 35 respectively. Thus, a call of I2OS on this n with d = 3 (resp. d = 2, resp. d = 1) returns 00 09 35 (resp. 09 35, resp. failure).

Converting octet strings to integers (OS2I)

Let an octet string A0A1 . . . A_{d–1} be given. Each Ai can be identified with a 256-ary digit. OS2I returns the integer A0·256^{d–1} + A1·256^{d–2} + · · · + A_{d–2}·256 + A_{d–1}. If d = 0, the integer 0 should be output.
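In Python, the big-endian base-256 conventions of I2OS and OS2I coincide with the built-in int/bytes conversions, so a sketch is immediate (names ours):

```python
def i2os(n, d):
    """I2OS: non-negative integer -> octet string of length d
    (base-256 digits, most significant octet first)."""
    if n >= 256 ** d:
        raise ValueError("error: integer too large to fit in d octets")
    return n.to_bytes(d, "big")

def os2i(octets):
    """OS2I: octet string -> integer; the empty octet string maps to 0."""
    return int.from_bytes(octets, "big")
```

For 2357 = 9 · 256 + 53 this reproduces the example above: i2os(2357, 2) is the octet string 09 35.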

Converting field elements to octet strings (FE2OS)

In the IEEE P1363 jargon, a field element is an element of the finite field F_q, where q is a prime or an integral power of a prime. We want to convert an element β ∈ F_q to an octet string. Depending on the value of q, we have two cases:

If char F_q is odd, β is represented as an integer in {0, 1, . . . , q – 1}. FE2OS converts β to an octet string of length ⌈log_256 q⌉ by calling the primitive I2OS.

If q = 2^m, β is represented as a bit string of length m. The primitive BS2OS is called to convert β to an octet string.

Converting octet strings to field elements (OS2FE)

Assume that an octet string is to be converted to an element of the finite field F_q. Again we have two possibilities depending on q.

If F_q is of odd characteristic, the primitive OS2I is called to convert the given octet string to an integer. This integer is returned as the field element.

If q = 2^m, one calls the primitive OS2BS with the given octet string and with the length m supplied as inputs. The resulting bit string is returned by OS2FE. If OS2BS reports error, so should OS2FE.

Converting field elements to integers (FE2I)

Let β ∈ F_q, and suppose that the integer equivalent of β is sought. If q is odd, then β is already represented as an integer (in {0, 1, . . . , q – 1}) and is itself output. If q = 2^m, one first converts β to an octet string by FE2OS and subsequently converts this octet string to an integer by calling the primitive OS2I.

* Converting elliptic curve points to octet strings (EC2OS)

The point at infinity O (on an elliptic curve over F_q) is encoded by an octet string comprising a single zero octet only. So let P = (h, k) be a finite point. The EC2OS primitive produces an octet string PO = PC ‖ H ‖ K which is the concatenation of a single octet PC with octet strings H and K representing h and k respectively. The values of PC and K depend on the type of compression used. One has PC = 0000 SUCỹ (the last four bits being S, U, C and ỹ), where

S = 1 if and only if the SORT compression is used.

U = 1 if and only if uncompressed or hybrid form is used.

C = 1 if and only if compressed or hybrid form is used.

ỹ = the compression bit defined earlier, if compression is used; it is 0 otherwise.

The first four bits of PC are reserved for (possible) future use and should be set to 0000 for this version of the standard. H is the octet string of length ⌈log_256 q⌉ obtained by converting h using FE2OS. If the compressed form is used, K is the empty octet string, whereas if the uncompressed or hybrid form is used, we have K = FE2OS(k, ⌈log_256 q⌉). Finally, for the lossy compression we have PC = 0000 0001, H = FE2OS(h, ⌈log_256 q⌉) and K is empty. Table 6.2 summarizes all these possibilities. Here, l := ⌈log_256 q⌉, and p is an odd prime.

Table 6.2. The EC2OS primitive
Representation       PC          H             K             q
uncompressed         0000 0100   FE2OS(h, l)   FE2OS(k, l)   All
LSB compressed       0000 001ỹ   FE2OS(h, l)   Empty         p, 2^m
LSB hybrid           0000 011ỹ   FE2OS(h, l)   FE2OS(k, l)   p, 2^m
SORT compressed      0000 101ỹ   FE2OS(h, l)   Empty         2^m, p^m
SORT hybrid          0000 111ỹ   FE2OS(h, l)   FE2OS(k, l)   2^m, p^m
lossy compression    0000 0001   FE2OS(h, l)   Empty         All
point at infinity    0000 0000   Empty         Empty         All

* Converting octet strings to elliptic curve points (OS2EC)

The OS2EC data conversion primitive takes as input an octet string PO, the length l = ⌈log_256 q⌉ and the method of compression. If PO contains only one octet and that octet is zero, the point at infinity O is output. Otherwise, the elliptic curve point P = (h, k) is computed as follows. OS2EC decomposes PO = PC ‖ H ‖ K, with PC the first octet and with H an octet string of length l. If PC does not match the method of compression, OS2EC returns error. Otherwise, it uses OS2FE to compute the field element h. If the uncompressed or hybrid form is used, the Y-coordinate k is also computed by applying OS2FE to K. If (h, k) is not a point on the elliptic curve, error is reported. For the LSB or SORT compression, the Y-coordinate is computed using h and the bit ỹ. If the hybrid scheme is used and the bit ỹ is inconsistent with the recovered k, OS2EC halts after reporting error. If all computations are successful till now, the point (h, k) is output.

Note that the checks for (h, k) lying on the curve and for the consistency of ỹ are optional and may be omitted. For the lossy compression scheme, the Y-coordinate k is not uniquely determined by the input octet string PO. In that case, either of the two possibilities is output.

* Converting ring elements to octet strings (RE2OS)

Ring elements are elements of the convolution polynomial ring R = ℤ[X]/〈X^n – 1〉 and can be identified with polynomials with integer coefficients and of degrees < n. The element a(x) = a0 + a1x + · · · + a_{n–1}x^{n–1} (where each ai ∈ ℤ) is represented by the n-tuple of integers (a0, a1, . . . , a_{n–1}). The IEEE draft P1363.1 assumes that the coefficients ai are available modulo a positive integer β ≤ 256. But then each ai is an integer in {0, 1, . . . , β – 1} and can be naturally encoded by a single octet. RE2OS, upon receiving a(x) as input, outputs the octet string a0a1 . . . a_{n–1} of length n.

An example: Let n = 7 and β = 128. The ring element a(x) = 2 + 11x + 101x^3 + 127x^4 + 71x^5 = (2, 11, 0, 101, 127, 71, 0) is converted to the octet string 02 0b 00 65 7f 47 00.

* Converting octet strings to ring elements (OS2RE)

Let an octet string a0a1 . . . a_{n–1} of length n be given, which we want to convert to an element of R = ℤ[X]/〈X^n – 1〉. Once again a modulus β ≤ 256 is assumed, so that each octet ai can be viewed as an integer reduced modulo β. Making the natural identification of ai with an integer, the polynomial a(x) = a0 + a1x + · · · + a_{n–1}x^{n–1} is output. Thus, for example, the octet string 02 0b 00 65 7f 47 00 gets converted to the ring element 2 + 11x + 101x^3 + 127x^4 + 71x^5.
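Both RE2OS and OS2RE are trivial in Python if ring elements are modelled as coefficient tuples and octet strings as bytes (names and modelling ours):

```python
def re2os(coeffs, beta=256):
    """RE2OS: ring element, given as an n-tuple of coefficients modulo
    beta <= 256, -> octet string of length n (one octet per coefficient)."""
    assert beta <= 256 and all(0 <= a < beta for a in coeffs)
    return bytes(coeffs)

def os2re(octets, beta=256):
    """OS2RE: octet string of length n -> ring element, with each octet
    reduced modulo beta."""
    return tuple(A % beta for A in octets)
```

The two primitives are mutually inverse whenever all coefficients are already reduced modulo β.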

* Converting ring elements to bit strings (RE2BS)

The RE2BS primitive assumes that the modulus β is a power of 2, that is, β = 2^t for some positive integer t ≤ 8. Let a ring element a(x) = a0 + a1x + · · · + a_{n–1}x^{n–1} be given, where each ai ∈ {0, 1, . . . , 2^t – 1}. One applies the I2BS primitive on each ai to generate the bit string ai,0ai,1 . . . ai,t–1 of length t. The concatenated bit string

a0,0a0,1 . . . a0,t–1 a1,0a1,1 . . . a1,t–1 . . . an–1,0an–1,1 . . . an–1,t–1

of length nt is then returned by RE2BS.

As before, take the example of n = 7, β = 128 = 2^7 (so that t = 7) and a(x) = 2 + 11x + 101x^3 + 127x^4 + 71x^5 = (2, 11, 0, 101, 127, 71, 0). The coefficients 2, 11, 0, . . . should first be converted to bit strings of length 7 each, that is, 2 gives 0000010, 11 gives 0001011 and so on. Thus, the bit string output by RE2BS will be 0000010 0001011 0000000 1100101 1111111 1000111 0000000. Note that here we have shown the bits in groups of 7 in order to highlight the intermediate steps (the outputs from I2BS). With the otherwise standard grouping in blocks of 8, the output bit string looks like 0 00001000 01011000 00001100 10111111 11100011 10000000 and hence transforms to the octet string 00 08 58 0c bf d3 80 by an invocation of BS2OS. This example illustrates that RE2BS followed by BS2OS does not necessarily give the same output as the direct conversion RE2OS, even when every underlying parameter (like β) remains unchanged.
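The per-coefficient I2BS calls and the final concatenation can be sketched as follows (function name ours):

```python
def re2bs(coeffs, t):
    """RE2BS: ring element with coefficients modulo beta = 2^t (t <= 8)
    -> bit string of length n*t, each coefficient contributing t bits,
    most significant bit first."""
    return "".join(format(a, "0{}b".format(t)) for a in coeffs)
```

Running it on the example above reproduces the 49-bit string shown, grouped here in blocks of 7.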

* Converting bit strings to ring elements (BS2RE)

Once again we require the modulus β to be a power 2^t of 2. Let a bit string a0a1 . . . a_{l–1} of length l be given, from which we want to compute the equivalent ring element a(x). If l is not an integral multiple of t, the algorithm should quit after reporting error. Otherwise we let l = nt for some positive integer n, and repeatedly call the BS2I primitive on the bit strings a0a1 . . . a_{t–1}, a_t a_{t+1} . . . a_{2t–1}, . . . , a_{nt–t}a_{nt–t+1} . . . a_{nt–1} to get the integers α0, α1, . . . , α_{n–1} respectively. The polynomial a(x) = α0 + α1x + · · · + α_{n–1}x^{n–1} is then output.

We urge the reader to verify that BS2RE with β = 128 and the bit string

0000010 0001011 0000000 1100101 1111111 1000111 0000000

as input produces the ring element 2 + 11x + 101x^3 + 127x^4 + 71x^5.
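The verification can be done mechanically with a small Python sketch of BS2RE (name ours):

```python
def bs2re(bits, t):
    """BS2RE: bit string of length n*t -> ring element with coefficients
    modulo 2^t; error if the length is not a multiple of t."""
    if len(bits) % t != 0:
        raise ValueError("error: bit-string length is not a multiple of t")
    return tuple(int(bits[i : i + t], 2) for i in range(0, len(bits), t))
```

Applied to the 49-bit string above with t = 7, it yields (2, 11, 0, 101, 127, 71, 0), that is, 2 + 11x + 101x^3 + 127x^4 + 71x^5.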

* Converting binary elements to octet strings (BE2OS)

A binary (ring) element is an element a(x) = a0 + a1x + · · · + a_{n–1}x^{n–1} ∈ R with each ai ∈ {0, 1}. One can convert a(x) to an octet string A0A1 . . . A_{l–1} of any desired length l as follows. We denote the bits in the octet Ai as Ai,7Ai,6 . . . Ai,0. Here, the index of the bits increases from right to left.

First we rewrite the polynomial a(x) as one of degree 8l – 1, that is, as a(x) = a0 + a1x + · · · + a_{8l–1}x^{8l–1}. If n ≤ 8l, this can be done by setting a_n = a_{n+1} = · · · = a_{8l–1} = 0. On the other hand, if n > 8l and one or more of the coefficients a_{8l}, a_{8l+1}, . . . , a_{n–1} are non-zero (that is, 1), the above rewriting of a(x) cannot be done and BE2OS terminates after reporting failure.

When the above rewriting of a(x) is successful, one sets the bits of the output octets as A0,0 := a0, A0,1 := a1, . . . , A0,7 := a7, A1,0 := a_8, A1,1 := a_9, . . . , A1,7 := a_15, A2,0 := a_16, A2,1 := a_17, . . . , A2,7 := a_23, . . . , A_{l–1,0} := a_{8l–8}, A_{l–1,1} := a_{8l–7}, . . . , A_{l–1,7} := a_{8l–1}.

As an example, take n = 20 and consider the binary element a(x) = 1 + x + x^2 + x^10 + x^12. First let l = 1. Rewriting a(x) as a polynomial of degree 7 is not possible, since the coefficients of x^10 and x^12 are 1; so BE2OS outputs error in this case. If l = 2, then the output octet string will be 00000111 00010100, that is, 07 14. For l ≥ 3, the first two octets will be 07 and 14 as before, whereas the 3rd through l-th octets will be 00.

The BE2OS primitive can be quite effective for reducing storage requirements. For example, the polynomial a(x) of degree 12 of the previous paragraph, viewed as an element of ℤ[X]/〈X^200 – 1〉 (that is, with n = 200), can be encoded in just two octets. Of course, by specifying l ≥ 3 one may add l – 2 trailing zero octets, if one desires. On the other hand, RE2OS requires exactly 200 octets, whereas RE2BS with β = 128 followed by BS2OS requires exactly ⌈(200 × 7)/8⌉ = 175 octets for storing the same a(x).

* Converting octet strings to binary elements (OS2BE)

Assume that an octet string A0A1 . . . A_{l–1} of length l is given and the equivalent binary element in R = ℤ[X]/〈X^n – 1〉 is to be determined. As in the case of BE2OS, we index the bits in the octet Ai as Ai = Ai,7Ai,6 . . . Ai,0. Now, consider the polynomial a(x) = a0 + a1x + a2x^2 + · · · + a_{8l–1}x^{8l–1}, where a_{8i+j} = Ai,j. If n ≥ 8l, we set a_{8l} = a_{8l+1} = · · · = a_{n–1} = 0 and output the binary element a(x) ∈ R. On the other hand, if n < 8l and a_n = a_{n+1} = · · · = a_{8l–1} = 0, then the polynomial a(x), viewed as an element of R, is returned. Finally, if n < 8l and any of the coefficients a_n, a_{n+1}, . . . , a_{8l–1} is non-zero, then OS2BE returns error.[3]

[3] In this case, it still makes full algebraic sense to treat a(x) as an element of R, though not in the canonical representation.

For example, assume that the octet string 07 14 is given as input to OS2BE. If n ≤ 12, the algorithm outputs error, because the polynomial a(x) in this case has degree 12. For any n ≥ 13, the binary element 1 + x + x^2 + x^10 + x^12 is returned.
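The bit-packing for binary elements (note the reversed, least-significant-first bit order inside each octet) can be sketched as follows; the names are ours.

```python
def be2os(coeffs, l):
    """BE2OS: binary element (tuple of 0/1 coefficients of Z[x]/<x^n - 1>)
    -> octet string of length l; bit j of octet i carries the coefficient
    of x^(8i+j), bit 0 being the least significant."""
    if any(coeffs[8 * l :]):
        raise ValueError("error: non-zero coefficient of degree >= 8l")
    out = []
    for i in range(l):
        A = 0
        for j in range(8):
            if 8 * i + j < len(coeffs) and coeffs[8 * i + j]:
                A |= 1 << j                     # A_{i,j} := a_{8i+j}
        out.append(A)
    return bytes(out)

def os2be(octets, n):
    """OS2BE: octet string -> binary element of Z[x]/<x^n - 1>; error if
    some set bit would land at a position >= n."""
    bits = [(A >> j) & 1 for A in octets for j in range(8)]
    if any(bits[n:]):
        raise ValueError("error: set bit at position >= n")
    return tuple(bits[:n]) + (0,) * (n - len(bits))
```

For the example above, be2os applied to 1 + x + x^2 + x^10 + x^12 with l = 2 gives the octets 07 14, and os2be inverts this for any n ≥ 13.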

6.3. RSA Standards

The public-key cryptography standards (PKCS) [254] refer to a set of standard specifications proposed by the RSA Laboratories. A one-line description of each of these documents is given in Table 6.3. In the rest of this section, we concentrate only on the documents PKCS #1 and #3.

Table 6.3. Public-key cryptography standards from the RSA Laboratories
DocumentDescription
PKCS #1RSA encryption and signature
PKCS #2Merged with PKCS #1
PKCS #3Diffie–Hellman key exchange
PKCS #4Merged with PKCS #1
PKCS #5Password-based cryptography
PKCS #6Extension of X.509 public-key certificates
PKCS #7Syntax of cryptographic messages
PKCS #8Syntax and encryption of private keys
PKCS #9Attribute types for use in PKCS #6, #7, #8 and #10
PKCS #10Syntax for certification requests
PKCS #11Cryptoki, an application programming interface (API)
PKCS #12Syntax of transferring personal information (private keys, certificates and so on)
PKCS #13Elliptic curve cryptography (under preparation)
PKCS #15Syntax for cryptographic token (like integrated circuit card) information

6.3.1. PKCS #1

PKCS #1 describes RSA encryption and RSA signatures. In this section, we summarize Version 2.1 (dated 14 June 2002) of the standard. This version specifies cryptographically stronger encoding procedures than the older versions. More specifically, the optimal asymmetric encryption procedure (OAEP [18]) for RSA encryption was incorporated in Version 2.0 of PKCS #1, whereas the new probabilistic signature scheme (PSS [19]) is introduced in Version 2.1. This latest draft also includes encryption and signature schemes compatible with older versions (1.5 and 2.0). However, adoption of the new algorithms is strongly recommended for enhanced security.

RSA keys

PKCS #1 Version 2.1 introduces the concept of multi-prime RSA, in which the RSA modulus n may have more than two prime divisors. For RSA encryption and decryption to work properly, we only need n to be square-free (Exercise 4.1). Using u > 2 prime divisors of n increases efficiency and does not degrade the security of the resulting system much, as long as u is not very large. More specifically, if T is the time for the RSA private-key operation without CRT, then the cost of this operation with CRT is approximately T/u^2 (neglecting the cost of CRT combination).

So an RSA modulus is of the form n = r1r2 . . . ru with u ≥ 2 and with pairwise distinct primes r1, . . . , ru. For the sake of conformity with the older versions of the standard, the first two primes are given the alternate special names p := r1 and q := r2. PKCS #1 does not mention any specific way of choosing the prime divisors ri of n, but encourages use of primes that make factorization of n difficult.

An RSA public exponent is an integer e, 3 ≤ e ≤ n – 1, with gcd(e, λ(n)) = 1, where λ(n) := lcm(r1 – 1, r2 – 1, . . . , ru – 1). An RSA public key is a pair (n, e) with n and e chosen as above.

The RSA private key corresponding to (n, e) can be stored in one of two formats. In the first format, one maintains the pair (n, d) with the private exponent d chosen so as to make ed ≡ 1 (mod λ(n)). In the second format, one stores the five quantities (p, q, dP, dQ, qInv) and, if u > 2, the triples (ri, di, ti) for each i = 3, . . . , u. The meanings of these quantities are as follows:

p = r1
q = r2
dP ≡ e^{–1} (mod p – 1)
dQ ≡ e^{–1} (mod q – 1)
qInv ≡ q^{–1} (mod p)
di ≡ e^{–1} (mod ri – 1) for i = 3, . . . , u
ti ≡ (r1r2 · · · r_{i–1})^{–1} (mod ri) for i = 3, . . . , u

For the sake of consistency, one should store the CRT coefficient t2 ≡ (r1)^{–1} (mod r2), that is, p^{–1} (mod q). In order to ensure compatibility with older versions of PKCS, q^{–1} (mod p) is stored instead.

RSA key operations

The RSA public-key operation is used to encrypt a message or to verify a signature. The PKCS draft calls these primitives RSAEP (encryption primitive) and RSAVP1 (verification primitive). Both are implemented in a straightforward manner as in Algorithm 6.1.

Algorithm 6.1. RSA encryption/signature verification primitive

Input: RSA public key (n, e) and message/signature representative x.

Output: The ciphertext/message representative y.

Steps:

if (x < 0) or (x ≥ n) { Return “Error: representative out of range”. }

y := x^e (mod n).

The RSA decryption or signature-generation primitive is called RSADP or RSASP1 and is given in Algorithm 6.2. The operation depends on the format in which the private key K is stored. The correctness of the primitive is left to the reader as an easy exercise.

Algorithm 6.2. RSA decryption/signature generation primitive

Input: RSA private key K and the ciphertext/message representative y.

Output: The message/signature representative x.

Steps:

if (y < 0) or (y ≥ n) { Return “Error: representative out of range”. }
if (K is stored in the first format) {
   x := y^d (mod n).
}
else {  /* K is stored in the second format */
   x1 := y^{dP} (mod p).
   x2 := y^{dQ} (mod q).
   h := (x1 – x2)qInv (mod p).
   x := x2 + qh.
   if (u > 2) {
      R := r1.
      for i = 3, . . . , u {
         xi := y^{di} (mod ri).
         R := R × r_{i–1}.
         h := (xi – x)ti (mod ri).
         x := x + Rh.
      }
   }
}
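The second-format private-key operation (Garner's incremental CRT combination) can be sketched in Python. For brevity the exponents d_i and coefficients t_i are derived from e on the fly rather than stored, and the tiny primes in the note below are toys; none of this is meant as a production implementation.

```python
def rsadp_crt(y, primes, e):
    """RSA decryption following the multi-prime CRT of Algorithm 6.2.
    primes = [r1, r2, ..., ru] must be pairwise distinct primes with
    gcd(e, lcm(ri - 1)) = 1.  Requires Python >= 3.8 for pow(a, -1, m)."""
    r = primes
    d = [pow(e, -1, ri - 1) for ri in r]        # d_i = e^(-1) mod (r_i - 1)
    x1 = pow(y, d[0], r[0])                     # x1 = y^dP mod p
    x2 = pow(y, d[1], r[1])                     # x2 = y^dQ mod q
    h = (x1 - x2) * pow(r[1], -1, r[0]) % r[0]  # qInv = q^(-1) mod p
    x = x2 + r[1] * h
    R = r[0]
    for i in range(2, len(r)):                  # i = 3, ..., u in the text
        xi = pow(y, d[i], r[i])
        R *= r[i - 1]                           # R = r1 r2 ... r_{i-1}
        ti = pow(R, -1, r[i])                   # t_i = (r1 ... r_{i-1})^(-1) mod r_i
        h = (xi - x) * ti % r[i]
        x += R * h
    return x
```

With the toy modulus n = 5 · 11 · 17 = 935 and e = 3, encrypting m as m^3 mod 935 and decrypting with rsadp_crt recovers m.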

RSAES–OAEP encryption scheme

The encryption scheme RSAES–OAEP is based on the optimal asymmetric encryption procedure (OAEP) proposed by Bellare and Rogaway [18, 98]. In this procedure, a string of length slightly less than the size of the modulus n is probabilistically encoded using a hash function, and the encoded message is subsequently encrypted. The probabilistic encoding makes the encryption procedure semantically secure and (provably) provides resistance against chosen-ciphertext attacks. Under this scheme, an adversary can produce a valid ciphertext only if she knows the corresponding plaintext. Such an encryption scheme is called plaintext-aware. Given an ideal hash function, Bellare and Rogaway’s OAEP is plaintext-aware.

RSAES–OAEP uses a label L which is hashed by a hash function H. One may take L as the empty string. Other possibilities are not specified in the PKCS draft. SHA-1 (or SHA-256 or SHA-384 or SHA-512) is the recommended hash function. The hash values (in hex) of the empty string under these hash functions are given in Table 6.4.

Table 6.4. Hash values of the empty string
FunctionHash of the empty string
SHA-1da39a3ee 5e6b4b0d 3255bfef 95601890 afd80709
SHA-256e3b0c442 98fc1c14 9afbf4c8 996fb924 27ae41e4 649b934c a495991b 7852b855
SHA-38438b060a7 51ac9638 4cd9327e b1b1e36a 21fdb711 14be0743 4c0cc7bf 63f6e1da 274edebf e76f65fb d51ad2f1 4898b95b
SHA-512cf83e135 7eefb8bd f1542850 d66d8007 d620e405 0b5715dc 83f4a921 d36ce9ce 47d0d13c 5d85f2b0 ff8318d2 877eec2f 63b931bd 47417a81 a538327a f927da3e

The length of the hash output (in octets) is denoted by hLen. For SHA-1, hLen = 20. The RSA modulus n is assumed to be of octet length k. The octet length mLen of the input message M must satisfy mLen ≤ k – 2hLen – 2. RSAES–OAEP uses a mask-generation function designated as MGF (see Algorithm 6.11 for a recommended realization).

Algorithm 6.3 describes the RSA–OAEP encryption scheme which employs the EME–OAEP encoding scheme described in Algorithm 6.4. The use of a random seed makes the encryption probabilistic. We use the notation ‖ to denote string concatenation and ⊕ to denote bit-wise XOR.

Algorithm 6.3. RSA–OAEP encryption scheme

Input: The recipient’s public key (n, e), the message M (an octet string of length mLen) and an optional label L whose default value is the empty string.

Output: The ciphertext C of octet length k.

Steps:

/* Check lengths */

if (L is longer than what H can handle) { Return “Error: label too long”. }

/* For example, for SHA-1 the input must be of length ≤ 2^61 – 1 octets. */

if (mLen > k – 2hLen – 2) { Return “Error: message too long”. }

/* Encode M to EM (EME–OAEP encoding scheme) */

EM := EME-OAEP-encode(M, L)./* Algorithm 6.4 */
/* RSA encryption */ 
m := OS2I(EM)./* Convert octet string to integer */
c := RSAEP((n, e), m)./* RSA encryption primitive */
C := I2OS(c, k)./* Convert integer back to octet string */

The matching decryption operation is shown in Algorithm 6.5 which calls the EME–OAEP decoding procedure of Algorithm 6.6. The only error message that the decryption and decoding algorithms issue is decryption error. This is to ensure that an adversary cannot distinguish between different kinds of errors, because such an ability of the adversary may lead her to guess partial information about the decryption process and thereby mount a chosen-ciphertext attack.

Algorithm 6.4. RSA–OAEP encoding scheme

Input: The message M of octet length mLen, the label L.

Output: The EME–OAEP encoded message EM.

Steps:

lHash := H(L).

Generate the padding string PS with k – mLen – 2hLen – 2 zero octets.

Generate the data block DB := lHash ‖ PS ‖ 01 ‖ M.

Let seed := a random string of length hLen octets.

Generate the data-block mask dbMask := MGF(seed, k – hLen – 1).

Generate the masked data-block maskedDB := DB ⊕ dbMask.

Generate the mask for the seed seedMask := MGF(maskedDB, hLen).

Generate the masked seed maskedSeed := seed ⊕ seedMask.

Generate the encoded message EM := 00 ‖ maskedSeed ‖ maskedDB.
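The encoding steps above can be exercised with a short Python sketch. MGF1 (the mask-generation function recommended by the draft, Algorithm 6.11) is included inline so that the sketch is self-contained, SHA-1 is used as H, and a caller-supplied seed replaces the random one so that runs are reproducible. Function names are ours.

```python
import hashlib
import os

def mgf1(seed, mask_len, hash_fn=hashlib.sha1):
    """MGF1: concatenate H(seed || C) for 4-octet counters C = 0, 1, ...
    and truncate the result to mask_len octets."""
    out = b""
    counter = 0
    while len(out) < mask_len:
        out += hash_fn(seed + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:mask_len]

def eme_oaep_encode(M, k, L=b"", hash_fn=hashlib.sha1, seed=None):
    """EME-OAEP encoding (Algorithm 6.4); k is the octet length of the
    RSA modulus.  Pass seed=None for the standard random seed."""
    hLen = hash_fn().digest_size
    if len(M) > k - 2 * hLen - 2:
        raise ValueError("error: message too long")
    lHash = hash_fn(L).digest()
    PS = b"\x00" * (k - len(M) - 2 * hLen - 2)
    DB = lHash + PS + b"\x01" + M                    # data block
    if seed is None:
        seed = os.urandom(hLen)                      # random seed of hLen octets
    maskedDB = bytes(x ^ y for x, y in zip(DB, mgf1(seed, k - hLen - 1, hash_fn)))
    maskedSeed = bytes(x ^ y for x, y in zip(seed, mgf1(maskedDB, hLen, hash_fn)))
    return b"\x00" + maskedSeed + maskedDB
```

Decoding (Algorithm 6.6) simply recomputes the two masks in the opposite order.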

Algorithm 6.5. RSA–OAEP decryption scheme

Input: The recipient’s private key K, the ciphertext C to be decrypted and an optional label L (the default value of which is the null string).

Output: The decrypted message M.

Steps:

if (the length of L is more than the limitation of H) or (the length of C is not k octets)
        or (k < 2hLen + 2) { Return “Decryption error”. }

c := OS2I(C)./* Convert octet string to integer */
m := RSADP(K, c)./* RSA decryption primitive */
EM := I2OS(m, k)./* Convert integer back to octet string */
M := EME-OAEP-decode(EM, L)./* Algorithm 6.6 */

Algorithm 6.6. RSA–OAEP decoding scheme

Input: The encoded message EM and the label L.

Output: The EME–OAEP decoded message M.

Steps:

lHash := H(L).
Write EM = Y ‖ maskedSeed ‖ maskedDB, where Y is a single octet,
       maskedSeed is a string of length hLen octets and
       maskedDB is a string of length k – hLen – 1 octets.
seedMask := MGF(maskedDB, hLen).
seed := maskedSeed ⊕ seedMask.
dbMask := MGF(seed, k – hLen – 1).
DB := maskedDB ⊕ dbMask.
Try to decompose DB = lHash′ ‖ PS ‖ 01 ‖ M, where lHash′ is of length hLen
       and PS is a (possibly empty) padding string comprising octets 00 only.
if (DB cannot be decomposed as above) or (lHash′ ≠ lHash) or
       (Y ≠ 00) { Return “Decryption error”. }

RSASSA–PSS signature scheme with appendix

RSASSA–PSS employs the probabilistic signature scheme proposed by Bellare and Rogaway [19]. Under suitable assumptions about the hash function and the mask-generation function, the RSASSA–PSS scheme produces secure signatures which are also tight in the sense that forging RSASSA–PSS signatures is computationally equivalent to inverting RSA.

Algorithm 6.7. RSASSA–PSS signature generation

Input: The message M (an octet string) to be signed, the private key K of the signer.

Output: The signature S (an octet string of length k).

Steps:

EM := EMSA–PSS–encode(M, modBits – 1)./* Encode by Algorithm 6.8 */
m := OS2I(EM)./* Convert octet string to integer */
s := RSASP1(K, m)./* RSA signature generation primitive */
S := I2OS(s, k)./* Convert integer back to octet string */

Algorithm 6.8. RSASSA–PSS encoding

Input: The message M to be encoded (an octet string), the maximum bit length emBits of OS2I(EM). One should have emBits ≥ 8hLen + 8sLen + 9.

Output: The encoded message EM, an octet string of length emLen := ⌈emBits/8⌉.

Steps:

if (M is longer than what H can handle) { Return “Error: message too long”. }

Generate the hashed message mHash := H(M).

if (emLen < hLen + sLen + 2) { Return “Encoding error”. }

Let salt := a random string of length sLen octets.

Generate the salted message M′ := 00 00 00 00 00 00 00 00 ‖ mHash ‖ salt.

Generate the hashed salted message mHash′ := H(M′).

Generate the padding string PS with emLen – sLen – hLen – 2 zero octets.

Generate the data block DB := PS ‖ 01 ‖ salt.

Generate the data block mask dbMask := MGF(mHash′, emLen – hLen – 1).

Generate the masked data block maskedDB := DB ⊕ dbMask.

Set to 0 the leftmost 8·emLen – emBits bits of the leftmost octet of maskedDB.

Compute EM := maskedDB ‖ mHash′ ‖ bc.

RSASSA–PSS signature generation (Algorithm 6.7) uses the EMSA–PSS encoding method (Algorithm 6.8). Verification (Algorithm 6.9) uses the EMSA–PSS decoding method (Algorithm 6.10). We assume that k is the octet length of the RSA modulus n. Let modBits denote the bit length of n. The encoded message is of length emLen = ⌈(modBits – 1)/8⌉ octets. The probabilistic behaviour of the encoding scheme is incorporated by the use of a random salt, the octet length of which is sLen. A hash function H that produces hash values of octet length hLen is employed.

Algorithm 6.9. RSASSA–PSS signature verification

Input: The message M, the signature S to be verified and the signer’s public key (n, e).

Output: Verification status of the signature.

Steps:

if (the length of S is not k octets) { Return “Signature not verified”. }

s := OS2I(S). /* Convert octet string to integer */
m := RSAVP1((n, e), s). /* RSA signature verification primitive */
EM := I2OS(m, emLen). /* Convert integer back to octet string */
status := EMSA–PSS–decode(M, EM, modBits – 1). /* Algorithm 6.10 */

if (status is “consistent”) { Return “Signature verified”. }

else { Return “Signature not verified”. }

Algorithm 6.10. RSASSA–PSS decoding

Input: The message M (an octet string), the encoded message EM (an octet string of length emLen = ⌈emBits/8⌉) and the maximum bit length emBits of OS2I(EM). One should have emBits ≥ 8hLen + 8sLen + 9.

Output: Decoding status: “consistent” or “inconsistent”.

Steps:

if (M is longer than what H can handle) { Return “inconsistent”. }
Generate the hashed message mHash := H(M).
if (emLen < hLen + sLen + 2) { Return “inconsistent”. }
Try to decompose EM = maskedDB ‖ mHash′ ‖ Y, where
       maskedDB is an octet string of length emLen – hLen – 1,
       mHash′ is an octet string of length hLen, and Y is a single octet.
if (Y ≠ bc) or (the leftmost 8emLen – emBits bits of the leftmost octet of
       maskedDB are not all 0) { Return “inconsistent”. }
dbMask := MGF(mHash′, emLen – hLen – 1).
DB := maskedDB ⊕ dbMask.
Set to 0 the leftmost 8emLen – emBits bits of the leftmost octet of DB.
Try to decompose DB = PS ‖ 01 ‖ salt, where PS is a string with
       emLen – sLen – hLen – 2 zero octets, and salt is of length sLen octets.
if (the above decomposition is unsuccessful) { Return “inconsistent”. }
Set M′ := 00 00 00 00 00 00 00 00 ‖ mHash ‖ salt.
if (H(M′) = mHash′) { Return “consistent”. } else { Return “inconsistent”. }
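The encoding and consistency check of Algorithms 6.8 and 6.10 can be sketched compactly in Python. This is only an illustration, not a standards-compliant implementation: we assume SHA-256 as H (so hLen = 32), fix sLen at 32 octets, and inline the MGF1 construction of Algorithm 6.11.

```python
import hashlib, os

hLen = 32  # octet length of the SHA-256 output

def mgf1(seed, maskLen):
    # Mask-generation function in the style of Algorithm 6.11, based on SHA-256.
    T = b""
    for i in range((maskLen + hLen - 1) // hLen):
        T += hashlib.sha256(seed + i.to_bytes(4, "big")).digest()
    return T[:maskLen]

def emsa_pss_encode(M, emBits, sLen=32):
    # EMSA-PSS encoding along the lines of Algorithm 6.8.
    emLen = (emBits + 7) // 8
    mHash = hashlib.sha256(M).digest()
    assert emLen >= hLen + sLen + 2, "encoding error"
    salt = os.urandom(sLen)
    mHash2 = hashlib.sha256(b"\x00" * 8 + mHash + salt).digest()  # H(M')
    PS = b"\x00" * (emLen - sLen - hLen - 2)
    DB = PS + b"\x01" + salt
    dbMask = mgf1(mHash2, emLen - hLen - 1)
    maskedDB = bytes(a ^ b for a, b in zip(DB, dbMask))
    # clear the leftmost 8*emLen - emBits bits of the leftmost octet
    maskedDB = bytes([maskedDB[0] & (0xFF >> (8 * emLen - emBits))]) + maskedDB[1:]
    return maskedDB + mHash2 + b"\xbc"

def emsa_pss_verify(M, EM, emBits, sLen=32):
    # EMSA-PSS decoding along the lines of Algorithm 6.10; True iff "consistent".
    emLen = (emBits + 7) // 8
    mHash = hashlib.sha256(M).digest()
    if emLen < hLen + sLen + 2 or EM[-1] != 0xBC:
        return False
    maskedDB, mHash2 = EM[:emLen - hLen - 1], EM[emLen - hLen - 1:-1]
    if maskedDB[0] & ~(0xFF >> (8 * emLen - emBits)) & 0xFF:
        return False
    dbMask = mgf1(mHash2, emLen - hLen - 1)
    DB = bytes(a ^ b for a, b in zip(maskedDB, dbMask))
    DB = bytes([DB[0] & (0xFF >> (8 * emLen - emBits))]) + DB[1:]
    PS, sep, salt = DB[:-sLen - 1], DB[-sLen - 1], DB[-sLen:]
    if PS != b"\x00" * len(PS) or sep != 0x01:
        return False
    return hashlib.sha256(b"\x00" * 8 + mHash + salt).digest() == mHash2
```

With emBits = modBits – 1 = 1023, encoding a message and decoding the result returns “consistent”, while any other message is rejected, which exhibits the probabilistic round trip that the signature scheme relies on.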

A mask-generation function

A mask-generation function (MGF1) is specified in the PKCS #1 draft. It is based on a hash function H. The mask-generation function is deterministic in the sense that its output is completely determined by its input. However, the (provable) security of the OAEP and PSS schemes is based on the pseudorandom nature of the output of the mask-generation function. This means that any part of the output should be statistically independent of the other parts. MGF1 derives this pseudorandomness from that of the underlying hash function H.

Algorithm 6.11. Mask-generation function MGF1

Input: The seed mgfSeed (an octet string) and the desired octet length maskLen of the output mask. One requires maskLen ≤ 2^32 · hLen, where hLen is the octet length of the hash function output.

Output: An octet string mask of length maskLen.

Steps:

if (maskLen > 2^32 · hLen) { Return “Error: mask too long”. }
Initialize T to the empty octet string.
for i = 0, 1, . . . , ⌈maskLen/hLen⌉ – 1 {
    I := I2OS(i, 4).
    T := T ‖ H(mgfSeed ‖ I).
}
mask := the leftmost maskLen octets of T.

The RSA encryption scheme of PKCS #1, Version 1.5

The older encryption scheme RSAES–PKCS1–v1_5 is no longer recommended, since this scheme is not plaintext-aware, that is, an adversary can, with high probability, generate valid ciphertexts without knowing the corresponding plaintexts. This allows the adversary to mount chosen-ciphertext attacks. The new drafts of PKCS #1 include this old scheme for backward compatibility. Encryption and decryption for RSAES–PKCS1–v1_5 are given in Algorithms 6.12 and 6.13. Here, k is the octet length of the modulus.

Algorithm 6.12. RSA–PKCS1 encryption scheme

Input: The recipient’s public key (n, e) and the message M (an octet string).

Output: The ciphertext C which is an octet string of length k.

Steps:

Let mLen denote the octet length of M. if (mLen > k – 11) { Return “Error: message too long”. }
Generate a padding string PS of length k – mLen – 3 ≥ 8 octets consisting of
       random non-zero octets.
Generate the encoded message EM := 00 ‖ 02 ‖ PS ‖ 00 ‖ M.

m := OS2I(EM). /* Convert octet string to integer */
c := RSAEP((n, e), m). /* RSA encryption primitive */
C := I2OS(c, k). /* Convert integer back to octet string */

Algorithm 6.13. RSA–PKCS1 decryption scheme

Input: The recipient’s private key K and the ciphertext C (an octet string).

Output: The plaintext message M (an octet string of length ≤ k – 11).

Steps:

if (the length of the ciphertext is not k octets) { Return “decryption error”. }

c := OS2I(C). /* Convert octet string to integer */
m := RSADP(K, c). /* RSA decryption primitive */
EM := I2OS(m, k). /* Convert integer back to octet string */

Try to decompose EM = 00 ‖ 02 ‖ PS ‖ 00 ‖ M, where PS is an octet string of length ≥ 8 and containing only non-zero octets.

if (the above decomposition is unsuccessful) { Return “decryption error”. }
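The padding and unpadding steps of Algorithms 6.12 and 6.13 can be sketched in Python as follows; the RSA primitives RSAEP/RSADP are deliberately omitted, and the function names are ours, not part of the standard.

```python
import os

def pkcs1_v15_pad(M, k):
    # Encoding step of Algorithm 6.12: EM = 00 || 02 || PS || 00 || M,
    # where PS consists of at least 8 random non-zero octets.
    if len(M) > k - 11:
        raise ValueError("message too long")
    PS = b""
    while len(PS) < k - len(M) - 3:
        b = os.urandom(1)
        if b != b"\x00":      # PS must contain no zero octets
            PS += b
    return b"\x00\x02" + PS + b"\x00" + M

def pkcs1_v15_unpad(EM, k):
    # Decoding step of Algorithm 6.13: recover M or report a decryption error.
    if len(EM) != k or not EM.startswith(b"\x00\x02"):
        raise ValueError("decryption error")
    sep = EM.find(b"\x00", 2)  # end of the non-zero padding string PS
    if sep < 10:               # PS must be at least 8 octets long
        raise ValueError("decryption error")
    return EM[sep + 1:]
```

Note that a careless implementation of the error path here is exactly what Bleichenbacher-style chosen-ciphertext attacks exploit, which is why the scheme is retained only for backward compatibility.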

The RSA signature scheme of PKCS #1, Version 1.5

The older RSA signature scheme RSASSA–PKCS1–v1_5 is not known to have security loopholes. (Nevertheless, the provably secure PSS scheme is recommended for future applications.) RSASSA–PKCS1–v1_5 uses the EMSA–PKCS1–v1_5 message encoding procedure (Algorithm 6.16). The signature generation and verification procedures are given in Algorithms 6.14 and 6.15. Here, k denotes the octet length of the modulus n.

The EMSA–PKCS1–v1_5 message encoding procedure (Algorithm 6.16) uses a hash function H. Although a member of the SHA family is recommended for future applications, MD2 and MD5 are also supported for compliance with older applications. An octet string hashAlgo is used whose value depends on the underlying hash algorithm and is given in Table 6.5.

Table 6.5. The string hashAlgo used by EMSA–PKCS1–v1_5

Function   The string hashAlgo
MD2        30 20 30 0c 06 08 2a 86 48 86 f7 0d 02 02 05 00 04 10
MD5        30 20 30 0c 06 08 2a 86 48 86 f7 0d 02 05 05 00 04 10
SHA-1      30 21 30 09 06 05 2b 0e 03 02 1a 05 00 04 14
SHA-256    30 31 30 0d 06 09 60 86 48 01 65 03 04 02 01 05 00 04 20
SHA-384    30 41 30 0d 06 09 60 86 48 01 65 03 04 02 02 05 00 04 30
SHA-512    30 51 30 0d 06 09 60 86 48 01 65 03 04 02 03 05 00 04 40

Algorithm 6.14. RSA–PKCS1 signature generation

Input: The signer’s private key K and the message M to be signed (an octet string).

Output: The signature S (an octet string of length k).

Steps:

Encode M to EM := EMSA–PKCS1–v1_5(M, k). /* Algorithm 6.16 */
m := OS2I(EM). /* Convert octet string to integer */
s := RSASP1(K, m). /* RSA signature generation primitive */
S := I2OS(s, k). /* Convert integer back to octet string */

Algorithm 6.15. RSA–PKCS1 signature verification

Input: The signer’s public key (n, e), the message M (an octet string) and the signature S to be verified (an octet string of length k).

Output: Verification status of the signature.

Steps:

if (the length of S is not k octets) { Return “Signature not verified”. }

s := OS2I(S). /* Convert octet string to integer */
m := RSAVP1((n, e), s). /* RSA signature verification primitive */
EM′ := I2OS(m, k). /* Convert integer back to octet string */
Encode M to EM := EMSA–PKCS1–v1_5(M, k). /* Algorithm 6.16 */

if (EM′ = EM) { Return “Signature verified”. }

else { Return “Signature not verified”. }

Algorithm 6.16. EMSA–PKCS1 encoding

Input: The message M (an octet string), the intended length emLen of the encoded message. One requires emLen ≥ tLen + 11, where tLen is the octet length of hashAlgo plus the octet length of the hash output.

Output: The encoded message EM (an octet string of length emLen).

Steps:

if (M is longer than what H can handle) { Return “Error: message too long”. }
Compute the hash value mHash := H(M).
Let T := hashAlgo ‖ mHash.
/* Let tLen be the octet length of T. */
if (emLen < tLen + 11) { Return “Error: encoded message length too short”. }
Generate a padding string PS of length emLen – tLen – 3 ≥ 8 octets each
      having the hexadecimal value ff.
Set EM := 00 ‖ 01 ‖ PS ‖ 00 ‖ T.
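The encoding of Algorithm 6.16 is easy to sketch in Python. Here H is assumed to be SHA-256, with the corresponding hashAlgo string taken from Table 6.5; the function name is ours.

```python
import hashlib

# The hashAlgo (DER DigestInfo) prefix for SHA-256 from Table 6.5
HASH_ALGO_SHA256 = bytes.fromhex("3031300d060960864801650304020105000420")

def emsa_pkcs1_v15_encode(M, emLen):
    # EMSA-PKCS1 encoding (Algorithm 6.16) with H = SHA-256.
    T = HASH_ALGO_SHA256 + hashlib.sha256(M).digest()
    if emLen < len(T) + 11:
        raise ValueError("encoded message length too short")
    PS = b"\xff" * (emLen - len(T) - 3)    # padding of ff octets
    return b"\x00\x01" + PS + b"\x00" + T
```

Unlike EMSA–PSS, this encoding is deterministic: the same message and modulus length always yield the same EM, which is why the signature verification of Algorithm 6.15 can simply re-encode M and compare.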

6.3.2. PKCS #3

PKCS #3 describes the Diffie–Hellman key-exchange algorithm. The draft assumes the existence of a central authority which generates the domain parameters that include a prime p of octet length k, an integer g satisfying 0 < g < p and optionally a positive integer l. The integer g need not be a generator of the multiplicative group of integers modulo p, but is expected to be of sufficiently large multiplicative order modulo p. The integer l denotes the bit length of the private Diffie–Hellman key of an entity. Values of l ≪ 8k can be chosen for efficiency. However, for maintaining a desired level of security l should not be too small. Since the central authority determines p, g (and l), individual users need not bother about the generation of these parameters.

During a Diffie–Hellman key-exchange interaction of Alice with Bob, Alice performs the steps described in Algorithm 6.17. Bob performs an identical operation which is omitted here.

Algorithm 6.17. PKCS3 Diffie–Hellman key-exchange scheme

Input: p, g and optionally l.

Output: The shared secret SK (an octet string of length k).

Steps:

Alice generates a random private value x with 0 < x < p – 1.

/* If l is specified, one should have 2^(l–1) ≤ x < 2^l. */

Alice computes y := g^x (mod p).

Alice converts y to an octet string PV := I2OS(y, k).

Alice sends the public value PV to Bob.

Alice receives Bob’s public value PV′.

Alice converts PV′ to the integer y′ := OS2I(PV′).

Alice computes z := (y′)^x (mod p) (with 0 < z < p).

Alice transforms z to the shared secret SK := I2OS(z, k).
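As a toy illustration of Algorithm 6.17, the following Python sketch runs both sides of the exchange over deliberately small domain parameters (a real deployment would use a large prime p supplied by the central authority; the function names and the parameter values are ours):

```python
import secrets

def dh_keypair(p, g, l=None):
    # Alice's first steps of Algorithm 6.17: pick a private x, publish y = g^x mod p.
    if l is not None:
        x = secrets.randbelow(2 ** (l - 1)) + 2 ** (l - 1)   # 2^(l-1) <= x < 2^l
    else:
        x = secrets.randbelow(p - 2) + 1                     # 0 < x < p - 1
    return x, pow(g, x, p)

def dh_shared(x, peer_y, p, k):
    # Remaining steps: z = (y')^x mod p, converted to a k-octet string via I2OS.
    z = pow(peer_y, x, p)
    return z.to_bytes(k, "big")

# toy domain parameters (far too small for real use)
p, g, k = 2579, 2, 2
xa, ya = dh_keypair(p, g)   # Alice
xb, yb = dh_keypair(p, g)   # Bob
assert dh_shared(xa, yb, p, k) == dh_shared(xb, ya, p, k)
```

Both parties arrive at the same k-octet shared secret SK because (g^xb)^xa = (g^xa)^xb (mod p), which is exactly the symmetry Bob's omitted steps rely on.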

Chapter Summary

In this chapter, we describe some standards for representation of cryptographic data in various formats and for conversion of data among different formats. We also present some standard encoding and decoding schemes that are applied before encryption and after decryption. These standards promote easy and unambiguous interfaces with the cryptographic primitives described in the previous chapter.

The IEEE P1363 range of standards defines several data types: bit strings, octet strings, integers, prime finite fields, finite fields of characteristic 2, extension fields of odd characteristic, elliptic curves, elliptic-curve points and polynomial rings. The IEEE drafts also prescribe standard ways of converting data among these formats. For example, the primitive BS2OS converts a bit string to an octet string, and the primitive FE2I converts a finite-field element to an integer.

We subsequently mention some of the public-key cryptography standards (PKCS) propounded by RSA Laboratories. Draft PKCS #1 deals with RSA encryption and signature. In addition to the standard RSA moduli of the form pq, it also suggests the possibility of using multi-prime RSA, that is, moduli which are products of more than two (distinct) primes. The draft recommends use of the optimal asymmetric encryption procedure (OAEP). This probabilistic encryption scheme provides provable security against chosen-ciphertext attacks. A probabilistic signature scheme is also advocated for use. These probabilistic schemes call for using a mask-generation function (MGF). A concrete realization of an MGF is also provided. Draft PKCS #3 standardizes the Diffie–Hellman key-exchange algorithm.

Suggestions for Further Reading

The P1363 class of preliminary drafts [134] published by IEEE and the PKCS standards [254] from RSA Security Inc. are available for free download from Internet sites. However, IEEE’s published standard 1363-2000 is to be purchased against a fee. In addition to the data types and data conversion primitives described in this chapter, the IEEE drafts (P1363, P1363a, P1363.1 and P1363.2) provide encryption/decryption and signature generation/verification primitives and also several encryption and signature schemes based on these primitives. These schemes are very similar to the algorithms that we described in Chapter 5, so we have avoided repeating the same descriptions here. Elaborate encoding procedures are described in the PKCS drafts, but only for RSA- and Diffie–Hellman-based systems. We have reproduced the details in this chapter. The remaining PKCS drafts cover topics that this book does not directly address. A notable exception is PKCS #13, which deals with elliptic-curve cryptography. This draft is not ready yet; when it is, it may be consulted to learn about the RSA Laboratories’ standards on elliptic-curve cryptography.

At present, the different families of standards do not seem to have mutually conflicting specifications. The IEEE has a (free) mailing list for promoting the development and improvement of the IEEE P1363 standards, via e-mail discussions.

Other Internet standards include the Federal Information Processing Standards or FIPS [221] from NIST, and RFCs (Request for Comments) from the Internet Engineering Task Force (IETF) [135].

7. Cryptanalysis in Practice

7.1 Introduction
7.2 Side Channel Attacks
7.3 Backdoor Attacks
 Chapter Summary
 Suggestions for Further Reading

A man cannot be too careful in the choice of his enemies.

—Oscar Wilde (1854–1900), The Picture of Dorian Gray, 1891

If you reveal your secrets to the wind you should not blame the wind for revealing them to the trees.

—Kahlil Gibran (1883–1931)

There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult.

—Charles Antony Richard Hoare

7.1. Introduction

The security of public-key cryptographic protocols is based on the apparent intractability of solving some computational problems. If one can factor large integers efficiently, one breaks RSA. In that sense, seeking good algorithms to solve these problems (like factoring integers) is part of cryptanalysis. Proving that no poly-time algorithm can break RSA enhances the status of the security of the protocol from assumed to provable. On the other hand, developing a poly-time algorithm for breaking RSA (or for factoring integers) makes RSA (and many other protocols) unusable. Though a temporary set-back to our existing cryptographic tools, such a discovery enriches our understanding of the computational problems. In short, breaking the trapdoors of public-key cryptosystems is of both theoretical and practical significance.

But research along these mathematical lines is open-ended. A desperate cryptanalyst may not wait indefinitely for a theoretical resolution. She tries to find loopholes in the systems that she can effectively exploit to gain secret information.

A cryptographic protocol must be implemented (in software or hardware) before it can be used. Careless implementations often supply the loopholes that cryptanalysts wait for. For example, a software implementation of a public-key system may allow the private key to be read only from a secure device (a removable medium, like CDROM), but may make copies of the key in the memory of the machine where the decryption routine is executed. If the decryption routine does not lock and eventually flush the memory holding the key, a second user having access to the machine can simply read off the secrets.

Software and hardware implementations often tend to leak secrets at a level much more subtle than the example just mentioned. A public-key algorithm is a known algorithm and involves a sequence of well-defined steps dictated by the private key. Each step takes its own share of execution time and power consumption. Watching the decrypting device carefully during a private-key operation may reveal information about the exact sequence of basic steps in the algorithm. Random hardware faults during a private-key operation may also compromise security. Such attacks are commonly dubbed side-channel attacks.

Let us now look at another line of attack. Not every user of cryptography is expected to implement all the routines she uses. On the contrary, most users run precompiled programs available from third parties. How will a user assess the soundness of the products she is using, that is, who will guarantee that there are no (intentional or unintentional) security snags in the products? The key-generation software available from a malicious software designer may initiate a clandestine e-mail every time a key pair is generated. It is also possible that a private key supplied by such a program is generated from a small predefined set known to the designer. Even when private keys look random, they need not come with the unpredictability necessary for cryptographic usage. Such attacks during key generation are called backdoor attacks.

In short, public-key cryptanalysis at present encompasses trapdoors, backdoors and side channels. The trapdoor methods have already been discussed in Chapter 4. In this chapter, we concentrate on the other attacks on public-key systems.

7.2. Side-Channel Attacks

Side-channel attacks refer to a class of cryptanalytic tools for determining a private key by measuring signals (like timing, power fluctuation, electromagnetic radiation) from or by inducing faults in the device performing operations involving the private key. In this section, we describe three methods of side-channel cryptanalysis: timing attack, power attack and fault attack.

7.2.1. Timing Attack

Paul C. Kocher introduced the concept of side-channel cryptanalysis in his seminal paper [155] on timing attacks. Though not unreasonable, timing attacks are somewhat difficult to mount in practice.

Details of the attack

The private-key operation in many cryptographic systems (like RSA or discrete-log-based systems) is usually a modular exponentiation of the form

y := x^d (mod n),

where d is the private key. The private-key procedure may involve other overheads (like message decoding), but the running time of the routine is usually dominated by, and so can be approximated by, the time of the modular exponentiation.

Assume that this exponentiation is carried out by a square-and-multiply algorithm known to Carol, the attacker. For example, suppose that Algorithm 3.9 is used. Each iteration of the for loop involves a modular squaring followed conditionally by a modular multiplication. The multiplication is done in an iteration if and only if the corresponding bit ei in the exponent is 1. Thus, an iteration runs slower if ei = 1 than if ei = 0. If Carol could measure the timing of each individual iteration of the for loop, she would correctly guess most (if not all) of the bits in the exponent. But it is unreasonable to assume that an attacker can collect such detailed timing data. Moreover, if Algorithm 3.10 is used, these detailed data do not help much, because in this case the timing of an individual iteration of the for loop can at best differentiate between the two cases ei = 0 and ei ≠ 0. There are 2^t – 1 non-zero values for each ei.
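The data-dependent branch is easy to see in a sketch of the naive left-to-right square-and-multiply method (modelled on the book's Algorithm 3.9; the Python rendition and variable names are ours):

```python
def modexp_binary(x, d, n):
    # Left-to-right square-and-multiply: every iteration squares, and
    # multiplies only when the current exponent bit is 1 -- the
    # data-dependent step that the timing attack exploits.
    y = 1
    for bit in bin(d)[2:]:       # exponent bits, most significant first
        y = (y * y) % n          # always executed
        if bit == "1":
            y = (y * x) % n      # executed iff the bit is 1 (leaks timing)
    return y
```

An iteration with a 1 bit performs two modular operations instead of one, so its running time, power draw and duration all differ from those of a 0-bit iteration.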

However, it is not difficult to think of a situation where the attacker can measure, to a reasonable accuracy, the total time of the exponentiation. In order to guess d, Carol requires the times of the modular exponentiations for several different values of x, say x1, . . . , xk, all known to her. (Note that xi may be messages to be signed or intercepted ciphertexts.) The same exponent d is used for all these exponentiations. Let Ti be the time for computing xi^d (mod n), as measured by Carol. We may assume that all these k exponentiations are carried out on the same machine using the same routine.

Kocher considers the attack on the exponentiation routine of RSAREF, a cryptography toolkit available from the RSA Laboratories. This routine implements Algorithm 3.10 with t = 2. For the sake of convenience, the algorithm is reproduced below. We may assume that the exponent has an even number of bits—if not, pad a leading zero.

Algorithm 7.1. RSAREF’s exponentiation routine

Input: The modulus n, an integer x with 0 ≤ x < n, and d = (d2l–1d2l–2 · · · d1d0)2.

Output: y := x^d (mod n).

Steps:

 (1)  z1 := x.
 (2)  z2 := z1 · x (mod n).
 (3)  z3 := z2 · x (mod n).
 (4)  y := 1.
 (5)  for j = l – 1, . . . , 0 {
 (6)     y := y^2 (mod n).
 (7)     y := y^2 (mod n).
 (8)     if ((d2j+1d2j)2 ≠ 0) {
 (9)         y := y · zb (mod n), where b := (d2j+1d2j)2.
(10)     }
(11)  }
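In Python, the routine may be sketched as follows (an illustration of the algorithm only, not RSAREF's actual code):

```python
def rsaref_modexp(x, d, n):
    # Exponentiation processing the exponent two bits at a time (Algorithm 7.1).
    z = [1, x % n, (x * x) % n, (x * x * x) % n]   # z[i] = x^i mod n
    bits = bin(d)[2:]
    if len(bits) % 2:                              # pad to an even bit length
        bits = "0" + bits
    y = 1
    for j in range(0, len(bits), 2):               # most significant pair first
        y = (y * y) % n
        y = (y * y) % n
        pair = int(bits[j:j + 2], 2)               # the pair (d_{2j+1} d_{2j})_2
        if pair:                                   # multiply only for non-zero pairs
            y = (y * z[pair]) % n
    return y
```

Each iteration performs two unconditional squarings, and the data-dependent multiplication by z[pair] is exactly the step whose timing Kocher's attack models.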

Every step of the above algorithm runs in a time dependent on the operands. For example, the modular multiplication in Step (9) takes time dependent on the operands y and zb, where b = (d2j+1d2j)2. The variation in the timing depends on the implementation of the modular arithmetic routines and also on the machine’s architecture. However, we make the assumption that for fixed operands each step requires a constant time on a given machine (or on identical machines). This is actually a loss of generality, since the running time of a complex step (like modular multiplication or squaring) for fixed operands may vary for various reasons like process scheduling, availability of cache, page faults and so on. It may be difficult, perhaps impossible, for an attacker to arrange for herself a verbatim emulation of the victim’s machine at the time when the latter performed the private-key operations. Let us still proceed with our assumption, say by conceiving of a not-so-unreasonable situation where the effects of these other factors are not sizable enough.

We use the subscript i to denote the i-th private-key operation for 1 ≤ ik. The entire routine takes time Ti for the i-th exponentiation, that is, for the input xi. This measurement may involve some (unknown) error which we denote by ei. The first four steps are executed only once during each call and take a total time of pi (precomputation time). The for loop is executed l times. We ignore the time needed to maintain the loop (like decrementing j) and also the time taken by the if statement in Step (8). Let si,j and ti,j be the times taken respectively by Steps (6) and (7), when the loop variable (j) assumes the value j. If Step (9) is executed, we denote by mi,j the time taken by this step, else we set mi,j := 0. It follows that

Equation 7.1

Ti = ei + pi + Σj (si,j + ti,j + mi,j),

where the index in the sum decreases from l – 1 to 0 in steps of 1. Carol does not know this break-up (that is, the explicit values of ei, si,j, ti,j and mi,j), but she can make an inductive guess in the following way.

Carol manages a machine and a copy of the exponentiation software, both identical to those of the victim. She then successively guesses the secret bit pairs d2l–1d2l–2, d2l–3d2l–4, d2l–5d2l–6 and so on. Assume that at some stage Carol has correctly determined the exponent bits d2j+1d2j for j = l – 1, l – 2, . . . , j′ + 1. Initially j′ = l – 1. Using this information Carol computes d2j′+1d2j′ as follows. Carol’s knowledge at this stage allows her to measure pi and si,j, ti,j, mi,j for j = l – 1, . . . , j′ + 1 — she simply runs Algorithm 7.1 on xi. Carol then enters the loop with j = j′. The squaring operations are unconditional, and Carol has the same operands as the victim for the squaring steps. So Carol also measures si,j′ and ti,j′.

The bit pair d2j′+1d2j′ (considered as a binary integer) can take any one of the four values g = 0, 1, 2, 3. Carol measures the time mi,j′(g) of Step (9) for each of the four choices of g (with mi,j′(g) = 0 for g = 0) and adds this time to the time taken by the algorithm so far, in order to obtain:

Equation 7.2

Ti(g) = pi + si,j′ + ti,j′ + mi,j′(g) + Σj (si,j + ti,j + mi,j),

where the sum runs over j = l – 1, . . . , j′ + 1.

Kocher observed that the distribution of Ti, i = 1, . . . , k, is statistically related to that of Ti(g) only for the correct guess g. In order to see how, we subtract Equation (7.2) from Equation (7.1) to get:

Equation 7.3

Ti – Ti(g) = ei + (mi,j′ – mi,j′(g)) + Σj (si,j + ti,j + mi,j),

where the sum now runs over j = j′ – 1, . . . , 0.

Let us assume that the error term ei is distributed like a random variable E. Similarly suppose that each multiplication (resp. squaring) has the distribution of a random variable M (resp. S). Taking the variance of Equation (7.3) over the values i = 1, 2, . . . , k and assuming that the sample size k is so large that the sample variances are very close to the variances of the respective random variables, we obtain:

Equation 7.4

Var(Ti – Ti(g)) = Var(E) + Var(mi,j′ – mi,j′(g)) + 2j′ Var(S) + λ Var(M),

where λ denotes the number of times Step (9) is executed for j = j′ – 1, . . . , 0. Note that λ is dependent on the private key and not on the arguments to the exponentiation routine. For the correct guess g, we have mi,j′(g) = mi,j′ and so

Var(Ti – Ti(g)) = Var(E) + 2j′ Var(S) + λ Var(M).

On the other hand, for an incorrect guess g we have:

Var(Ti – Ti(g)) = Var(E) + 2j′ Var(S) + (λ + 1) Var(M)

if one of mi,j′ or mi,j′(g) is zero, or

Var(Ti – Ti(g)) = Var(E) + 2j′ Var(S) + (λ + 2) Var(M)

if both mi,j′ and mi,j′(g) are non-zero. (Recall that Var(αX + βY) = α^2 Var(X) + β^2 Var(Y) for independent X and Y and any real α, β.)

Calculation of the sample variances of Ti – Ti(g) for the four choices of g gives Carol a handle to determine (or guess) the correct choice. Carol simply takes the g for which the variance is minimum. This is the fundamental observation that makes the timing attack work.

Of course, statistical irregularities exist in practice, and the approximation of the actual variances by the sample variances introduces errors in Equation (7.4). These errors are of particular concern for large values of j′, that is, during the beginning of the attack. However, if an incorrect guess is made at a certain stage, this is detected soon with high probability, as Carol proceeds further. Suppose that an erroneous guess of d2j″+1d2j″ has been made for some j″ > j′. This means that the values of y are different from the actual values starting from the iteration of the loop with j = j″ – 1. (We may assume that most, if not all, xi ≠ 1.) We then do not have a cancellation of the timings for j = j″ – 1, . . . , j′. More precisely, if the guesses for j = l – 1, . . . , j″ + 1 are correct and the first error occurs at j = j″, then denoting the timings measured by Carol from this point on by ŝi,j, t̂i,j and m̂i,j, one gets

Equation 7.5

Ti – Ti(g) = ei + (mi,j″ – m̂i,j″) + Σj=j″–1,...,j′ (si,j + ti,j + mi,j – ŝi,j – t̂i,j – m̂i,j) + Σj=j′–1,...,0 (si,j + ti,j + mi,j).

Since each of the square and multiplication operations takes y as an operand, the original timings and the measured timings (the ones with a hat) behave like independent variables and, therefore, taking the variance of Equation (7.5) yields

Var(Ti – Ti(g)) = Var(E) + 2(2j″ – j′) Var(S) + λ′ Var(M)

for some λ′ depending on the private key and on the previous guesses, but independent of the current guess g. In other words, Carol loses a meaningful relation of Var(Ti – Ti(g)) with the correctness of the current guess. Once Carol notices this, she backtracks and changes older guesses until the expected behaviour is restored. Thus, the timing attack comes with an error detection and correction strategy.

An analysis done by Kocher (neglecting E and assuming normal distributions for S and M) shows that Carol needs k = O(l) timing samples for a good probability of success.
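A toy simulation illustrates the variance criterion. The model below abstracts away the actual exponentiation: each total timing is the sum of one decisive multiplication time, a fixed number of later squaring times and a measurement error, all normally distributed (the means, variances and sample size are our assumed parameters). Subtracting the true multiplication time (the correct guess) leaves a smaller sample variance than subtracting an independent one (a wrong guess):

```python
import random, statistics

random.seed(1)
k = 3000                              # number of timing samples
VAR_M, VAR_S, VAR_E = 4.0, 1.0, 1.0   # variances of M, S and E

# For each sample i: the decisive multiplication m_i, the rest of the loop
# (modelled here as 12 squaring-like draws) and the measurement error e_i.
m = [random.gauss(10, VAR_M ** 0.5) for _ in range(k)]
rest = [sum(random.gauss(5, VAR_S ** 0.5) for _ in range(12)) for _ in range(k)]
e = [random.gauss(0, VAR_E ** 0.5) for _ in range(k)]
T = [m[i] + rest[i] + e[i] for i in range(k)]

# Correct guess: Carol reproduces exactly the multiplication time m_i.
diff_correct = [T[i] - m[i] for i in range(k)]
# Incorrect guess: she subtracts an independent multiplication time instead.
diff_wrong = [T[i] - random.gauss(10, VAR_M ** 0.5) for i in range(k)]

v_correct = statistics.pvariance(diff_correct)
v_wrong = statistics.pvariance(diff_wrong)
assert v_correct < v_wrong    # the correct guess minimizes the variance
```

In this model v_correct is close to 12·Var(S) + Var(E) while v_wrong picks up an extra 2·Var(M), mirroring the (λ + 2) term in the analysis above.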

Countermeasures

There are several ways in which timing attacks can be prevented. First, the exponentiation routine can be made to run in time independent of the exponent bits, for example by performing the same sequence of operations in every iteration. Second, random delays can be added to the private-key operation so as to corrupt Carol’s timing measurements (at the cost of more samples being needed, not impossibility). Third, the pair (x, y) can be masked by a random pair (u, v) before the exponentiation (blinding), so that the operands, and hence the timings, of the individual steps are unknown to the attacker.

7.2.2. Power Analysis

In connection with timing attacks, we mentioned that if an adversary were able to measure the timing of each iteration of the square-and-multiply loop during an RSA (or discrete-log-based) private-key exponentiation, she could guess the bits in the key quite efficiently from only a few timing measurements. But it is questionable whether such detailed timing data can be made available.

Now, think of a situation where Carol can measure patterns of power consumption made by the decrypting (or signing) device during one or more private-key operations with Alice’s private key. If Alice carries out the private-key operations in her personal workstation, it is difficult for Carol to conduct such measurements. So assume that Alice is using a smart card with a reading device to which Carol has access. Carol inserts a small resistor in series with the line which drives Alice’s smart card. The power consumed by the smart-card circuit is roughly proportional to the current through the resistor. By measuring the voltage across the resistor (and multiplying by a suitable factor), Carol can observe the power consumed by Alice’s decryption device. Carol has to use a power-measuring device that takes readings at a high frequency (100 MHz to several GHz, depending on Carol’s budget). A set of power measurements obtained during a cryptographic operation is called a power trace. We now study how power traces can reveal Alice’s secrets.

Simple power analysis (SPA)

The individual steps in a private-key operation may be nakedly exposed in a power trace. This is, in particular, the case when different steps consume different amounts of power and/or take different times. Obtaining information about the operation of the decrypting device and/or the secrets by a direct interpretation of power traces is referred to as simple power analysis or SPA in short.

As an example of SPA, consider an implementation of RSA exponentiation using the naive square-and-multiply Algorithm 3.9. Here, the most power-consuming operations are modular squaring and modular multiplication. Modular multiplication typically runs slower than modular squaring. Also, modular multiplication requires two different operands to be fetched from memory, whereas modular squaring requires only one operand. Thus, a multiplication operation has more and longer power requirements than a squaring operation.

A hypothetical[1] SPA trace during a portion of an RSA private-key operation is shown in Figure 7.1. Each spike in the trace corresponds to either a square or a multiplication operation. Let us assume that the power consumption is measured with sufficient resolution, so that no spike is missed. Since multiplication runs longer (and requires more operands) than squaring, multiplication spikes are wider than squaring spikes.

[1] SPA traces from real-life experiments on smart cards, as reported in several references, look similar to this. We, however, generated the trace using a random number generator. Absolute conformity to reality is not always crucial for the purposes of illustration.

Figure 7.1. Simulated SPA trace for a portion of an RSA private-key operation


Let us denote a squaring operation by S and a multiplication operation by M. We observe that Alice’s smart card performs the sequence

SMSMSSMSSSSMSSSMSS

of operations during the measurement interval shown. Since multiplication in an iteration of the loop is skipped if and only if the corresponding bit in the exponent is zero, we can group the operations as

(SM)(SM)(S)(SM)(S)(S)(S)(SM)(S)(S)(SM)(S)(S . . .

This, in turn, reveals the bit string 110100010010 in Alice’s private key.
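The grouping step can be mechanized; the following sketch (our own, for illustration) recovers key bits from an observed operation sequence. The trailing squaring may belong to an incomplete iteration, so only a prefix of the output is reliable.

```python
def bits_from_spa_trace(ops):
    # Group an observed S/M operation sequence as in the text: each loop
    # iteration is either (SM), revealing key bit 1, or (S), revealing 0.
    bits, i = "", 0
    while i < len(ops):
        if ops[i:i + 2] == "SM":
            bits += "1"
            i += 2
        else:              # a squaring not followed by a multiplication
            bits += "0"
            i += 1
    return bits
```

Applied to the sequence read off the trace of Figure 7.1, the function returns a bit string beginning with 110100010010, the chunk of Alice's private key identified above.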

Effective as it appears, SPA, in practice, does not pose a huge threat to the security of conventional cryptographic systems. Using algorithms for which power traces do not bear direct relationships with the bits of the private key largely reduces the risk of fruitful SPA. The inefficient Algorithm 7.2 always performs a multiplication after each squaring and thereby eliminates chances of a successful SPA.

Algorithm 7.2. SPA-resistant exponentiation

Input: The modulus n, an integer x with 0 ≤ x < n, and the private key d = (dl–1 · · · d1d0)2.

Output: y := x^d (mod n).

Steps:

y := 1.
for (j = l – 1, . . . , 0) {
    t0 := y^2 (mod n).
    t1 := t0 · x (mod n).
    y := tdj. /* that is, y := t0 if dj = 0, and y := t1 if dj = 1 */
}
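In Python, the same idea reads as follows (a sketch of Algorithm 7.2 only; the selection of tdj is written as a plain conditional expression here, which a hardened constant-time implementation would also avoid):

```python
def spa_resistant_modexp(x, d, n):
    # Square-and-always-multiply (Algorithm 7.2): every iteration performs
    # the same operation sequence -- one squaring and one multiplication --
    # and the key bit only selects which result to keep.
    y = 1
    for bit in bin(d)[2:]:        # key bits, most significant first
        t0 = (y * y) % n          # the "square" result
        t1 = (t0 * x) % n         # the "square then multiply" result
        y = t1 if bit == "1" else t0
    return y
```

The price of SPA resistance is one multiplication per key bit regardless of its value, roughly a 50% slowdown over the naive method for a random exponent.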

Using the (more efficient) Algorithm 7.1 also frustrates SPA. Some chunks of two successive 0 bits are anyway revealed by power traces collected during the execution of this algorithm. But, for a decently large and random private key, this still leaves Carol with many unknown bits to be guessed. Note, however, that none of the three remedies suggested to thwart the timing attack on Algorithm 7.1 seems to be effective in the context of SPA. Delays normally do not consume much power (unless some power-intensive dummy computations fill up the delays). Also, the masking of (x, y) by (u, v) fails to produce any alteration in the power-consumption pattern during exponentiation.

If some private-key algorithm has unavoidable branchings due to individual bits in the private key, SPA can prove to be a notorious botheration.

Differential power analysis (DPA)

A carefully designed algorithm (like Algorithm 7.2) does not reveal key information from a simple observation of power traces. Moreover, the observed power traces may be corrupted by noise to an extent where SPA is not feasible. In such cases, differential power analysis (DPA) often helps the cryptanalyst reduce the effects of noise and exploit subtle correlation of power-consumption patterns with specific bits in the operands. DPA requires the availability of power traces from several private-key operations with the same key.

Consider the SPA-resistant Algorithm 7.2. Suppose that k power traces P1(t), . . . , Pk(t) for the computations of xi^d (mod n), i = 1, . . . , k, are available to Carol, that the ciphertexts x1, . . . , xk are known to Carol and that d = (dl–1 · · · d1d0)2. Carol successively guesses the bits dl–1, dl–2, dl–3, . . . of the exponent. Suppose that Carol has correctly guessed dj for j = l – 1, . . . , j′ + 1. She now uses DPA to guess dj′.

Let e := (dl–1dl–2 · · · dj′+1)2. At the beginning of the for loop with j = j′, the variable y holds the value x^e modulo n. The loop computes x^(2e) and x^(2e+1) and assigns y the appropriate value. If dj′ = 0, then in the next iteration the loop computes x^(4e) and x^(4e+1), whereas if dj′ = 1, then in the next iteration the loop computes x^(4e+2) and x^(4e+3). It follows that the algorithm handles the value x^(4e) if and only if dj′ = 0.

For each i = 1, . . . , k, Carol computes z_i := x_i^{4e} (mod n). Carol then chooses a particular bit position (say, the least significant bit) and considers the bit b_i of z_i at this position. We make the assumption that there is some subsequent step (or substep) in the implementation for which the average power consumption Π0 for b_i = 0 is different from the average power consumption Π1 for b_i = 1.[2]

[2] The exact step which exhibits differential bias toward an individual bit value depends on the implementation. If the implementation does not provide such a step, the attack cannot be mounted in this way. Initially, DPA was proposed for DES, a symmetric encryption algorithm, in which such a dependence is clearly available. With asymmetric-key encryption, such a strong dependence of the power consumed by a step on an individual bit value is not obvious. One may, however, use other dividing criteria, like low versus high Hamming weight (that is, number of one-bits) in the operand, which bear more direct relationships with power consumption.

Carol partitions {1, . . . , k} into two subsets:

I0 := {i | b_i = 0},
I1 := {i | b_i = 1}.

Carol computes the average power traces P̄0(t) := (1/|I0|) Σ_{i ∈ I0} P_i(t) and P̄1(t) := (1/|I1|) Σ_{i ∈ I1} P_i(t), and subsequently the differential power trace

Δ(t) := P̄1(t) – P̄0(t).

First, let d_{j′} = 0. In this case, the routine handles x_i^{4e}, and so the power consumption at some time instant τ is correlated to the bit b_i of z_i. At any other instant, the power consumption is uncorrelated to this particular bit value. Therefore, if the sample size is sufficiently large and if the measurement noise has mean zero, we have:

Δ(τ) ≈ Π1 – Π0 ≠ 0, and Δ(t) ≈ 0 for t ≠ τ.

On the other hand, if d_{j′} = 1, the value x_i^{4e} never appears in the execution of the algorithm, and so at every time t the power consumption is uncorrelated to the particular bit of z_i, and so we expect

Δ(t) ≈ 0 for all t.

Figure 7.2 illustrates the two cases.[3] If the differential power trace has a distinct spike, the guess d_{j′} = 0 is correct. So by observing the existence or otherwise of a spike, Carol determines whether d_{j′} = 0 or d_{j′} = 1.

[3] Once again, these are hypothetical traces obtained by random number generators.

Figure 7.2. Simulated DPA trace for a portion of an RSA private-key operation

(a) for the correct guess
(b) for an incorrect guess


The number k of samples required for a good probability of success depends on the bias Π1 – Π0 relative to the measurement noise. We assume that Π1 ≠ Π0. If the noise has a variance of σ^2, then by the central limit theorem the noise in each average power trace P̄0(t) or P̄1(t) has at each t an approximate variance 2σ^2/k, and so in the differential power trace Δ(t) the noise has an approximate variance 4σ^2/k. In order that the bias Π1 – Π0 stands out against the noise (of standard deviation 2σ/√k), we require |Π1 – Π0| to be several times larger, say |Π1 – Π0| ≥ 8σ/√k, that is, k ≥ 64σ^2/(Π1 – Π0)^2.
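The sample-size estimate can be exercised with a toy Monte Carlo experiment (all numbers below, including Π0, Π1, σ and the trace layout, are hypothetical, in the spirit of the simulated traces of Figure 7.2):

```python
import random

# Toy DPA experiment with hypothetical numbers: each of the k traces has T
# noisy samples; at the single instant tau the consumption is Pi1 or Pi0
# according to the target bit b_i, mimicking the assumption in the text.
random.seed(1)
T, tau = 50, 17
Pi0, Pi1, sigma = 1.0, 1.2, 0.4
k = round(64 * sigma**2 / (Pi1 - Pi0) ** 2)    # the text's sample-size bound

bits = [random.randrange(2) for _ in range(k)]
traces = [[(Pi1 if b else Pi0) if t == tau else 0.0
           for t in range(T)] for b in bits]
traces = [[v + random.gauss(0, sigma) for v in tr] for tr in traces]

I0 = [i for i in range(k) if bits[i] == 0]
I1 = [i for i in range(k) if bits[i] == 1]
avg = lambda I, t: sum(traces[i][t] for i in I) / len(I)
delta = [avg(I1, t) - avg(I0, t) for t in range(T)]    # differential trace

print(k, max(range(T), key=lambda t: delta[t]))        # spike expected near tau
```

With these numbers, k works out to 256, and the differential trace shows a pronounced value at t = τ while the remaining instants fluctuate around zero.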

Countermeasures

Several countermeasures can be adopted to prevent DPA, both at the software level and at the hardware level.

Paul Kocher asserts: “DPA highlights the need for people who design algorithms, protocols, software, and hardware to work closely together when producing security products.”

7.2.3. Fault Analysis

We finally come to the third genre of side-channel cryptanalysis. We investigate how hardware faults occurring during private-key operations can reveal the secret to an adversary. There are situations where a single fault suffices. Boneh et al. [30] classify hardware faults into three broad categories.

  1. Transient faults These are faults caused by random (unpredictable) hardware malfunctioning. These may be the outcomes of occasional flips of bit values in registers or of temporary erroneous outputs from logic or arithmetic circuits in the processor. These faults are called transient, because they are not repeated. It is rather difficult to detect such (silent) faults.

  2. Latent faults These are faults generated by some permanent malfunctioning and/or bugs inherent in the processor. For example, the floating-point bug in the early releases of the Pentium processor may lead to latent faults. Latent faults are permanent, that is, repeated, but may be difficult to locate in practice.

  3. Induced faults An induced fault is deliberately caused by an adversary. For example, a short surge of electromagnetic radiation may cause a smart card to malfunction temporarily. A malicious adversary can induce such temporary hardware faults to extract secret information from the smart card. It is, however, difficult to induce deliberate faults in a remote workstation.

Although induced faults appear to be the ones to guard against most seriously, the other two types of faults are also of relevance. Consider a certifying authority signing many messages. Transient and/or unknown latent faults may reveal the authority’s private key to a user who can later utilize this knowledge to produce false certificates.

Fault attack on RSA based on CRT

Consider the implementation of the RSA private-key operation based on the CRT combination of the values obtained by exponentiation modulo the prime divisors p and q of the modulus n (Algorithm 5.4). Suppose that m is a message to be signed and s := m^d (mod n) the corresponding signature, where d is the signer’s private key. The CRT-based implementation computes s1 := s (mod p) and s2 := s (mod q). Assume that due to hardware fault(s) exactly one of s1 and s2 is wrongly computed. Say, s1 is incorrectly computed as s̃1 ≢ s1 (mod p). The corresponding faulty signature is denoted by s̃. We assume that the CRT combination of s̃1 and s2 is correctly computed.

An adversary requires the faulty signature s̃ and the correct signature s on the same message m in order to obtain the factor q of n. To see how, note that s̃ ≡ s̃1 (mod p), s ≡ s1 (mod p) and s̃1 ≢ s1 (mod p), so that s̃ ≢ s (mod p), that is, p ∤ (s̃ – s). On the other hand, s̃ ≡ s ≡ s2 (mod q), that is, q | (s̃ – s). Therefore,

q = gcd(s̃ – s, n).

This is how the fault analysis of Boneh et al. [30] works.

Arjen K. Lenstra et al. [142] point out that the knowledge of the faulty signature s̃ alone reveals the secret divisor q, that is, one does not require the genuine signature s on m. The verification key e of the signer is publicly known. Since RSA exponentiation is bijective, s̃^e ≢ m (mod n). However, s̃^e ≡ s2^e ≡ m (mod q), and so s̃^e ≢ m (mod p). It follows that

q = gcd(s̃^e – m, n).
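Both variants of the CRT fault attack can be replayed with textbook-size numbers in Python (the tiny parameters p = 61, q = 53, e = 17 are illustrative only; a real fault would of course not be a clean +1, but any corruption of s1 works):

```python
from math import gcd

# Toy RSA signature with CRT; a fault corrupts the mod-p half s1.
p, q, e = 61, 53, 17
n, phi = p * q, (p - 1) * (q - 1)
d = pow(e, -1, phi)
m = 65

s1 = pow(m, d % (p - 1), p)                    # s mod p
s2 = pow(m, d % (q - 1), q)                    # s mod q
crt = lambda a, b: (a * q * pow(q, -1, p) + b * p * pow(p, -1, q)) % n
s = crt(s1, s2)                                # correct signature
s_bad = crt((s1 + 1) % p, s2)                  # faulty signature: s1 corrupted

print(gcd(s_bad - s, n))                       # → 53  (Boneh et al.: q leaks)
print(gcd(pow(s_bad, e, n) - m, n))            # → 53  (Lenstra: no need for s)
```

The first gcd needs both signatures; the second recovers q from the faulty signature and the public data (m, e, n) alone.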

Fault attack on RSA without CRT

Now, consider an implementation of RSA decryption based on a single exponentiation modulo n. For such an implementation, several models of fault attacks have been proposed. These attacks are less practical than the attack on CRT-based RSA just mentioned, because now one requires several faulty signatures in order to deduce the entire private key. Here, we present an attack due to Bao et al. [17].

As usual, the RSA modulus is n = pq and the signer’s key pair is (e, d). Consider a valid signature s on a message m. Let d = (d_{l–1} · · · d_1 d_0)_2 be the binary representation of the private key. Consider the powers:

s_i ≡ m^{2^i} (mod n) for i = 0, 1, . . . , l – 1.

The signature s can be written as:

s ≡ s_0^{d_0} s_1^{d_1} · · · s_{l–1}^{d_{l–1}} (mod n).
We assume that the attacker knows m and s and hence can compute s_i and s_i^{–1} modulo n for i = 0, . . . , l – 1. There is no harm in assuming that the message m is randomly chosen. (We may assume that randomly chosen integers are invertible modulo n, because encountering a non-invertible non-zero integer by chance is a stroke of unimaginable good luck and is tantamount to knowing the factors of n.)

In order to guess a bit of d, the attacker induces a fault in exactly one of the bits d_j, changing it from d_j to d̄_j := 1 – d_j. The position j is random, that is, not under the control of the attacker. Now, the algorithm outputs the faulty signature

s̃ ≡ s s_j (mod n) if d_j = 0, or s̃ ≡ s s_j^{–1} (mod n) if d_j = 1,

and so

s̃ s^{–1} (mod n) equals either s_j or s_j^{–1}.
A repetition in the values s_{l–1}, . . . , s_0, s_{l–1}^{–1}, . . . , s_0^{–1} modulo n is again an incident of minuscule probability. Hence the attacker can uniquely identify the bit position j and the bit value d_j in d by comparing s̃ s^{–1} (mod n) with these 2l values.

Statistical analysis implies that the attacker needs to repeat this procedure about l log l times (on the same or different (m, s) pairs) in order to ensure that the probability of identifying all the bits of d is at least 1/2.
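A sketch of Bao et al.’s bit-flip attack with the same toy RSA parameters (the fault position is fixed explicitly here only for the demonstration; in the attack the attacker neither controls nor knows it):

```python
# Bao et al.'s fault attack on RSA without CRT: one bit of d flips during
# signing; comparing s~ * s^(-1) with the 2l values s_i^(±1) exposes both
# the position and the value of the flipped bit (toy parameters).
p, q, e = 61, 53, 17
n, phi = p * q, (p - 1) * (q - 1)
d = pow(e, -1, phi)
m = 65
s = pow(m, d, n)                         # the known valid signature

j_secret = 5                             # fault position (unknown to attacker)
d_bad = d ^ (1 << j_secret)              # flip bit j of d
s_bad = pow(m, d_bad, n)                 # the faulty signature

ratio = s_bad * pow(s, -1, n) % n        # equals s_j or s_j^(-1) (mod n)
for i in range(d.bit_length()):
    s_i = pow(m, 1 << i, n)              # s_i = m^(2^i) (mod n)
    if ratio == s_i:
        print("bit", i, "of d is 0")     # flip was 0 -> 1
    elif ratio == pow(s_i, -1, n):
        print("bit", i, "of d is 1")     # flip was 1 -> 0
```

With these numbers d = 2753 = (101011000001)_2, whose bit 5 is 0, and the loop prints exactly one line identifying it.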

Fault attack on the Rabin digital signature algorithm

Recall from Algorithm 5.34 that the Rabin signature algorithm uses CRT to combine s1 (mod p) and s2 (mod q). Thus, the attack on CRT-based RSA, described earlier, is applicable mutatis mutandis to the Rabin signature scheme. The computation of the square roots s1 and s2 demands the major portion of the running time of the routine. Inducing a fault during the execution is, therefore, expected to affect exactly one of s1 and s2, as desired by the attacker.

Fault attack on DSA

Bao et al. [17] propose a fault attack on the digital signature algorithm (DSA). We work with the notations of Algorithm 5.43 and Algorithm 5.44, except that, for maintaining uniformity in this section, we use m (instead of M) to denote the message to be signed. The (public) parameters are a prime p, a prime divisor r of p – 1 of length 160 bits, and an element g ∈ Z_p^* of multiplicative order r. The signer’s DSA key pair is (d, y) with y ≡ g^d (mod p) and 1 < d < r.

Suppose that during the generation of a DSA signature, an attacker induces a fault in exactly one bit position of d, changing it to d̃. The routine generates the faulty signature (s, t̃), where

s ≡ (g^{d′} (mod p)) (mod r),
t̃ ≡ d′^{–1}(H(m) + d̃ s) (mod r),

(d′, g^{d′}) being the session key pair (not mutilated). As in the DSA signature-verification scheme, the attacker computes the following:

w ≡ t̃^{–1} (mod r),
u1 ≡ H(m) w (mod r),
u2 ≡ s w (mod r).

For each i = 0, . . . , l – 1 (where the bit length of d is l), the attacker also computes

g_i ≡ g^{u2·2^i} (mod p).

Assume that the j-th bit d_j of d is altered. If d_j = 0, then d̃ = d + 2^j, and so

(g^{u1} y^{u2} g_j (mod p)) (mod r) = s.

On the other hand, if d_j = 1, then d̃ = d – 2^j, and a similar calculation shows that

(g^{u1} y^{u2} g_j^{–1} (mod p)) (mod r) = s.

Thus, the attacker computes (g^{u1} y^{u2} g_j (mod p)) (mod r) and (g^{u1} y^{u2} g_j^{–1} (mod p)) (mod r) for all j = 0, . . . , l – 1 and notices a unique match (with s). This discloses the position j and the corresponding bit d_j.

Fault attack on the ElGamal signature scheme

A fault attack similar to that on DSA can be mounted on the ElGamal signature scheme. Here, we instead present an alternative method proposed by Zheng and Matsumoto [315]. The novelty in their approach is that it cryptanalyzes the ElGamal signature scheme by inducing a fault in the pseudorandom bit generator of the signer’s smart card.

Algorithms 5.36 and 5.37 describe the ElGamal signature scheme on a general cyclic group G. Here, we restrict our attention to the specific group Z_p^* (though the following exposition works perfectly well for a general G). The parameters are a prime modulus p and a generator g of Z_p^*. The signer’s key pair is (d, g^d (mod p)) for some d, 2 ≤ d ≤ p – 2.

In order to generate a signature (s, t) on a message m, a random session key d′ is generated and subsequently the following computations are carried out:

s ≡ g^{d′} (mod p),
t ≡ d′^{–1}(H(m) – d H(s)) (mod p – 1).

Zheng and Matsumoto attack the generation of the session key d′. They propose the possibility that an abnormal physical stress (like low voltage) forces a constant output d0 for d′ from the pseudorandom-bit generator (software or hardware) in the smart card. First, assume that this particular value d0 is known a priori to the attacker. She then lets a message m generate a signature (s, t) with the session secret d0. The private key d is then immediately available from the equation:

d ≡ H(s)^{–1}(H(m) – d0 t) (mod p – 1).

Here, we assume that H(s) is invertible modulo p – 1.

If d0 is not known a priori, the attacker generates two signatures (s1, t1) and (s2, t2) on messages m1 and m2 respectively. Since d′ is always d0, we have s1 = s2 = s0, say. One can then easily calculate

d0 ≡ (t1 – t2)^{–1}(H(m1) – H(m2)) (mod p – 1),

which, in turn, yields

d ≡ H(s0)^{–1}(H(m1) – d0 t1) (mod p – 1).
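The whole attack fits in a few lines of Python (toy prime p = 467, the identity map as a stand-in hash, and an arbitrary stuck value d0 = 157 are assumptions of this sketch):

```python
# Zheng-Matsumoto attack: physical stress freezes the session key at d0.
# Toy parameters; H(x) = x is an illustrative stand-in for the hash.
p, g, d = 467, 2, 99                     # safe prime, generator, Alice's key
d0 = 157                                 # the stuck "random" session key
H = lambda x: x

def sign(m):
    s = pow(g, d0, p)
    t = pow(d0, -1, p - 1) * (H(m) - d * H(s)) % (p - 1)
    return s, t

m1, m2 = 101, 222
(s1, t1), (s2, t2) = sign(m1), sign(m2)
assert s1 == s2                          # the constant d0 betrays itself
s0 = s1

d0_rec = pow(t1 - t2, -1, p - 1) * (H(m1) - H(m2)) % (p - 1)
d_rec = pow(H(s0), -1, p - 1) * (H(m1) - d0_rec * t1) % (p - 1)
print(d0_rec, d_rec)                     # → 157 99
```

Both the stuck session key d0 and the permanent private key d fall out of the two equations above.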

Fault attack on the Feige–Fiat–Shamir identification protocol

Let us conclude our repertoire of fault attack examples by explaining an attack on the FFS zero-knowledge identification protocol. This attack is again from Boneh et al. [30].

We use the notations of Algorithm 5.69. A modulus n = pq, with p and q primes, is first chosen (by Alice or by a trusted third party). Alice selects random x1, . . . , xt ∈ Z_n^* and random bits δ1, . . . , δt, computes yi := (–1)^{δ_i} xi^2 (mod n) for i = 1, . . . , t, publishes (y1, . . . , yt) and keeps (x1, . . . , xt) secret.

During an identification session with Bob, Alice generates a random commitment c and sends to Bob the witness w := c^2 (mod n). (For simplicity, we take γ of Algorithm 5.69 to be 0.) While Alice is waiting for a challenge from Bob, a fault occurs in her smart card, changing the commitment c to c + E. Assume that the fault is at exactly one bit position, that is, E = ±2^j for some j, 0 ≤ j ≤ l – 1, l being the bit length of c (or of n). This fault may be purposely induced by Bob with the malicious intention of guessing Alice’s secret (x1, . . . , xt).

Bob then generates a random challenge (∊1, . . . , ∊t) ∈ {0, 1}^t as usual. Upon reception of this challenge, Alice computes and sends to Bob the faulty response

r̃ := (c + E) x1^{∊_1} · · · xt^{∊_t} (mod n).
The knowledge of r̃ now aids Bob to obtain the product T := x1^{∊_1} · · · xt^{∊_t} (mod n) as follows. First, note that

r̃^2 ≡ (c + E)^2 x1^{2∊_1} · · · xt^{2∊_t} (mod n),

so that

(–1)^δ r̃^2 y1^{–∊_1} · · · yt^{–∊_t} ≡ (c + E)^2 ≡ w + 2cE + E^2 (mod n)

for some δ ∈ {0, 1}.

There are only 4l possible values of (E, δ). Bob tries all these possibilities one by one. To simplify matters, we assume that only one value of (E, δ) with E of the special form ±2^j and with δ ∈ {0, 1} satisfies the last congruence. In practice, the existence of two (or more) solutions for (E, δ) is an extremely improbable phenomenon. For a guess of (E, δ), the commitment c can be computed as

c ≡ (2E)^{–1}((–1)^δ r̃^2 y1^{–∊_1} · · · yt^{–∊_t} – w – E^2) (mod n).
The correctness of the guess (E, δ) can be verified from the relation w ≡ c^2 (mod n). Bob can now compute the desired product

T ≡ r̃ (c + E)^{–1} (mod n).
In order to strengthen the confidence about the correctness of T, Bob may repeat the protocol once more with the same values of ∊1, . . . , ∊t, but under normal conditions (that is, without faults). This time he obtains w′ ≡ (c′)^2 (mod n) and r′ ≡ c′T (mod n), which together give (r′)^2 ≡ w′T^2 (mod n), a relation that proves the correctness of T.

Bob repeats the above procedure t times in order to generate the system:

Equation 7.6

T_k ≡ x1^{∊_{k1}} x2^{∊_{k2}} · · · xt^{∊_{kt}} (mod n), k = 1, 2, . . . , t.
Here, ∊_{ki} and T_k are known to Bob. Moreover, the exponents ∊_{ki} can be so selected that the matrix (∊_{ki}) is invertible modulo 2. In order to determine x1, Bob tries to find bits u1, . . . , ut ∈ {0, 1} satisfying

T_1^{u_1} T_2^{u_2} · · · T_t^{u_t} ≡ x1 (x1^{v_1} x2^{v_2} · · · xt^{v_t})^2 (mod n)

for some integers v1, . . . , vt. Comparing the exponent of each x_i on the two sides gives the linear system

u1 ∊_{1i} + u2 ∊_{2i} + · · · + ut ∊_{ti} = δ_{1i} + 2v_i (i = 1, 2, . . . , t; δ_{1i} = 1 for i = 1, and 0 otherwise),
which, reduced modulo 2, can be solved for u1, . . . , ut, since the matrix (∊_{ki}) is invertible modulo 2. The solution gives v1, . . . , vt and hence

x1 ≡ ± T_1^{u_1} · · · T_t^{u_t} y1^{–v_1} · · · yt^{–v_t} (mod n).
Similarly, x2, . . . , xt can be determined up to sign. Plugging in these values of xi in System (7.6) and solving another linear system modulo 2 gives the exact signs of all xi.

Notice that Bob could have selected ∊_{ki} = δ_{ki} (where δ is the Kronecker delta). For this choice, System (7.6) immediately gives x1, . . . , xt. But, in practice, Alice may refuse to respond to such simplistic challenges. Moreover, Bob must not raise any suspicion about a possible malpractice. For a general choice, all Bob has to do additionally is a small amount of simple linear algebra. The parameter t is rather small (typically less than 20); so this extra effort is of little concern to Bob.

Countermeasures

Fault analysis could be a serious threat, especially to smart-card users and certification authorities. We mention here some precautions that guard against such attacks. Some of these work against fault attacks in general; the others are specific to the algorithms they intend to protect.

Exercise Set 7.2

7.1 Consider the notations of Section 7.2.1. Assume that m_{i,j} is constant for all i, j (and irrespective of d_{2j+1} d_{2j}), but the squaring times s_{i,j} and t_{i,j} vary according to their operands. Devise a timing attack on such a system.
7.2 Show that under reasonable assumptions the SPA-resistant Algorithm 7.2 can be cryptanalyzed by timing attacks.
7.3 Recall that SPA of Algorithm 7.1 may leak partial information on the private key (some 00 sequences in the key). Rewrite the algorithm to prevent this leakage.
7.4 Assume that in Bao et al.’s attack on RSA described in the text, the attacker can induce faults in exactly two bit positions of d. Suggest how the two bits of d at these positions can be revealed from the resulting faulty signature.
7.5 Consider a variant of Bao et al.’s attack on RSA described in the text, in which the valid signature s on m is unknown to the attacker. Explain how the position j of the erroneous bit and the bit dj at this position can still be identified. [H]
7.6 Bao et al. [17] propose an alternative fault analysis on RSA with square-and-multiply exponentiation. Use the notations (n, e, d, m, s, si) as in the text. Assume that the attacker knows an (m, s) pair and can induce a fault in exactly one of the values sj (and nowhere else) and generate the corresponding faulty signature. Suggest a strategy by which the position j and the bit dj can be recovered in this case.
7.7 Propose a fault attack on the ElGamal signature scheme (Algorithms 5.36 and 5.37), similar to the attack on DSA described in the text.

7.3. Backdoor Attacks

Backdoor attacks on a public-key cryptosystem refer to attacks embedded in the key generation procedure (hardware or software) by the designer of the procedure. A contaminated cryptosystem is one in which the key generation procedure comes with hidden backdoors. A good backdoor attack should meet the following criteria:

Young and Yung [307] have proposed using public-key cryptography itself for generating backdoors. In their schemes, the attacker (the designer) embeds the attacker’s encryption routine and encryption key in the key-generation procedure of the contaminated system. The decryption key of the attacker is not embedded in the contaminated system and is known only to the attacker. The attacker’s encryption system is assumed to be honest and unbreakable, and thereby gives the attacker the exclusive power to decrypt contaminated keys. Young and Yung call such a backdoor a secretly embedded trapdoor with universal protection (SETUP). They also coined the term kleptography to denote such use of cryptography against cryptography.

In the rest of this section, we denote the attacker’s encryption and decryption functions by fe and fd respectively. We often do not restrict these functions to public-key routines only. Since public-key routines are slow, symmetric-key routines can be employed in practice. Simple XOR-ing with a fixed bit string (known to the designer) may also suffice. However, for these faster alternatives of fe, fd, reverse engineering reveals the symmetric key or the XOR operand to the user who can subsequently mimic the attacker to steal keys generated elsewhere by the same contaminated system.

We use the following shorthand notations. Here, n stands for a positive integer that can be naturally identified with a unique bit string having the most significant (that is, leftmost) bit equal to 1.

|n|=the bit length of n.
lsbk(n)=the least significant k bits of n.
msbk(n)=the most significant k bits of n.
(a1 ‖ a2 ‖ · · · ‖ ar)=the concatenation of the bit strings a1, a2, . . . , ar.

7.3.1. Attacks on RSA

RSA, (seemingly) being the most popular public-key cryptosystem, has been the target of most cryptanalytic attacks, and backdoor attacks are no exception. The backdoor attacks on RSA work by cleverly hiding some secret information in the public key (n, e) of a user. As earlier, we denote the corresponding private exponent by d and the prime factors of n by p and q.

Hiding prime factor

The simplest attack is to use a fixed prime p known to the designer. The other prime q is generated randomly, and correspondingly n = pq and the key pair (e, d) are computed. Reverse engineering such a scheme is pretty simple, since two different moduli n1 = pq1 and n2 = pq2 reveal p = gcd(n1, n2) immediately.
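A two-line computation shows how quickly such a contaminated generator betrays itself (the small primes below are illustrative):

```python
from math import gcd

# Two moduli from a generator that reuses a fixed prime p (toy primes):
p, q1, q2 = 10007, 10009, 10037
n1, n2 = p * q1, p * q2
print(gcd(n1, n2))                       # → 10007, the shared prime
```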

A better approach is given in Algorithm 7.3. The function fe may be RSA encryption under the designer’s public key. In that case, the RSA modulus of the attacker should be so chosen that the condition e < n is satisfied with good probability. On the other hand, if this modulus is too small, then this scheme will generate values of e much smaller than n.

In order to determine the secret exponent from a public key generated using this scheme, the attacker runs Algorithm 7.4. If fe and fd are RSA functions under the attacker’s keys, nobody other than the attacker can apply fd to generate p from e. This provides the designer with the exclusive capability of stealing keys.

A problem with Algorithm 7.3 is that the attacker has little control over the length of the public exponent e. If the user demands a small exponent (like e = 3 or e = 257), this scheme fails to produce one. Algorithm 7.5 overcomes this difficulty by hiding p in the high-order bits of the modulus n (instead of in the exponent e). Young and Yung [307] proposed this algorithm under the name PAP (pretty awful privacy). The name contrasts with PGP (pretty good privacy), a popular and widely used RSA implementation.

Algorithm 7.3. A simple backdoor attack on RSA

Input:

Output: An RSA modulus n = pq with |p| = |q| = k, and exponents (e, d).

Steps:

Generate a random k-bit prime q.
while (1) {
    Generate a random k-bit prime p.
    n := pq.
    e := fe(p).
    if ((e < n) and (gcd(e, φ(n)) = 1)) {
        Compute d with ed ≡ 1 (mod φ(n)).
        Return (n, e, d).
    }
}

Algorithm 7.4. Retrieving the secret exponent

Input: An RSA public key (n, e).

Output: The corresponding secret (p, q, d) or failure.

Steps:

p := fd(e).
if (p|n) {
    q := n/p.
    φ := (p – 1)(q – 1).
    d := e–1 (mod φ).
    Return (p, q, d).
} else {
    /* The key is not generated by Algorithm 7.3 */
    Return failure.
}

Algorithm 7.5 works as follows. Following Young and Yung [307], we assume that the attacker uses RSA to realize fe and fd. The RSA modulus of the attacker is denoted by N. The attack requires |N| = k, where |p| = |q| = k. To start with, a random prime p of the desired bit length k is generated. This prime is to be encrypted using fe, and so one requires p < N. Instead of encrypting p directly, the attacker first applies a permutation function π keyed by K + i for some fixed K and for i = 1, 2, . . . , B, where B is a small bound (typically B = 16). This permutation helps the attacker in two ways. First, one may now have p > N, so a suspicion regarding bounded values of p does not arise. Second, it is cheaper to apply the permutation than to generate fresh candidates for p. (In an (honest) RSA key-generation routine, the prime-generation part typically takes most of the running time.)

Algorithm 7.5. Backdoor attack on RSA: Young and Yung’s PAP scheme

Input: .

Output: An RSA modulus n = pq with |p| = |q| = k, and exponents (e, d).

Steps:

while (1) {
    /* Try to generate a suitable p */
    Generate a random k-bit prime p.
    i = 1.
    while (i ≤ B) {
        p′ := πK+i(p).    /* Use a keyed permutation πK+i*/
        if (p′ < N) { break } else { i++ }
    }

    /* Try to generate n and q */
    if (i ≤ B) {
        p″ := fe(p′).  /* Encrypt p′ by the designer’s public key */
        j := 1.
        while (j ≤ B′) {
            p‴ := π′K+j(p″).   /* π′ is a keyed permutation and |p‴| = k or k – 1. */
            Generate a pseudorandom bit string a of length k.
            X := (p‴ ‖ a).
            q := X quot p.
            if (|q| = k) and (q is prime) {
                n := pq.
                e := 17.
                while (gcd(e, φ(n)) ≠ 1) { e += 2. }
                d := e–1 (mod φ(n)).
                Return (n, e, d).
            } else { j ++ }
        }
    }
}

Once a suitable p and the corresponding p′ = πK+i(p) are generated, the encryption function fe is applied to generate p″ = fe(p′). Now, instead of embedding p″ directly in the modulus n, another keyed permutation π′ is applied on p″ to generate p‴ = π′K+j(p″). This permutation facilitates investigating several choices for q and so is a faster alternative to restarting the entire process afresh every time an unsuitable q is computed. A pseudorandom bit string a of length k is appended to p‴ to obtain an approximation X for n. If q := ⌊X/p⌋ happens to be a prime of bit length k, the exact n = pq is computed, else another j is tried. If all values of j = 1, 2, . . . , B′ (for some small bound B′) fail, the entire procedure is repeated with a new k-bit prime p.

For random choices of a, the quotients q = ⌊X/p⌋ behave like random integers, and so the probability that q is prime is almost the same as that for random integers of bit length k. Write X = qp + r with r = X rem p. Then n = pq = X – r. If r > a, then n has p‴ – 1 embedded in its higher bits, whereas if r ≤ a, then p‴ itself is embedded in the higher bits of n.
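The embedding property, namely that n = X – r preserves the top bits of X up to a borrow of 1, can be checked directly (toy sizes; the values playing p‴ and a are arbitrary stand-ins, and neither the primality nor the exact bit length of q is enforced here):

```python
# PAP's modulus n = X - (X rem p) keeps the top bits of X = (p''' || a),
# up to a borrow of 1. Toy sizes; p''' and a are arbitrary stand-ins, and
# neither the primality nor the bit length of q is enforced here.
k = 32
p = 2147483647                           # a prime (2^31 - 1) in the role of p
p3 = 0xDEADBEEF                          # stand-in for p''' (encrypted p)
a = 0x12345678                           # pseudorandom padding
X = (p3 << k) | a                        # X = (p''' || a)
q = X // p                               # candidate q
n = p * q                                # n = X - (X rem p)
print(hex(n >> k))                       # top k bits: p''' or p''' - 1
```

The designer later reads either p‴ or p‴ – 1 out of the top bits of the published modulus, which is exactly what Algorithm 7.6 exploits.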

Once suitable p and q are found, the PAP routine generates (like PGP) a small encryption exponent e relatively prime to φ(n) and its inverse d modulo φ(n). One can anyway opt for bigger values of e; in that case, instead of choosing e successively from the sequence 17, 19, 21, 23, . . . , one writes one’s customized steps for generating candidate values for e. Choosing a small e in Algorithm 7.5 merely highlights the resemblance with PGP and the flexibility of doing so.

The authors of PAP compare their implementation of Algorithm 7.5 with an implementation of the honest PGP key-generation procedure. The contaminated routine has been found to run on average only 20 per cent slower than the honest routine.

Algorithm 7.6 recovers the prime factor p of n from a public key (n, e) generated by PAP, using the RSA decryption function fd of the attacker. Reverse engineering may make available to the user the permutation functions π and π′, the fixed constants K, B, B′ and the designer’s public key. But this knowledge alone does not empower the user to steal PAP-generated keys.

Algorithm 7.6. Retrieving the prime divisor

Input: An RSA public key (n, e) with n = pq.

Output: The prime divisor p of n or failure.

Steps:

Write n = (U ‖ V) with |V| = k.
for p‴ ∈ {U, U + 1} {
    for j = 1, 2, . . . , B′ {
        p″ := (π′K+j)–1(p‴).
        p′ := fd(p″).
        for i = 1, 2, . . . , B {
            p := (πK+i)–1(p′).
            if (p|n) { Return p. }
        }
    }
}
/* (n, e) is not generated by Algorithm 7.5 */
Return failure.

Hiding small private exponent

Another possible backdoor is hiding an RSA key pair (∊, δ) with small δ inside a key pair (e, d). Crépeau and Slakmon [70] realize this backdoor using a result of Boneh and Durfee [32], which describes a polynomial-time (in |n|) algorithm for computing δ from the public key (n, ∊), provided that δ is less than n^0.292. This attack is explained in Algorithm 7.7. Here, the modulus n is a genuine random RSA modulus. The mischievous key ∊ is neatly hidden by the attacker’s encryption routine fe. The resulting output key pair (e, d) looks reasonably random. However, this scheme has a drawback similar to Algorithm 7.3; that is, it cannot easily generate small values of e.

Algorithm 7.7. Backdoor attack on RSA: small private exponent

Input: .

Output: An RSA modulus n = pq with |n| = k and a key pair (e, d).

Steps:

Generate random primes p ≠ q of bit length ≈ k/2, such that n := pq has |n| = k.
do {
   Generate random δ with gcd(δ, φ(n)) = 1 and |δ| < 0.292|n|.
   ∊ := δ–1 (mod φ(n)).
   e := fe(∊).    /* Hide ∊ */
} while (gcd(e, φ(n)) ≠ 1).
d := e–1 (mod φ(n)).
Return (n, e, d).

Algorithm 7.8 retrieves d from a public key (n, e) generated by Algorithm 7.7.

Algorithm 7.8. Retrieving the secret exponent

Input: An RSA public key (n, e) generated by Algorithm 7.7.

Output: The corresponding private key d.

Steps:

∊ := fd(e).     /* Recover the hidden exponent */
Use Boneh and Durfee’s algorithm to recover δ ≡ ∊–1 (mod φ(n)).
Use ∊ and δ to compute φ(n).
Compute d ≡ e–1 (mod φ(n)).

The correctness of Algorithm 7.8 is evident. In order to see how the knowledge of ∊ and δ reveals φ(n), note that x := ∊δ – 1 is a multiple of φ(n); that is,

Equation 7.7

x = l φ(n) = ln – l(p + q – 1)
for some integer l. Since δ < n^0.292 and ∊ < n, we have x < n^1.292. But φ(n) ≈ n, and so l cannot be much larger than n^0.292. Since |p| ≈ k/2 ≈ |q|, we have l(p + q – 1) < n. Now, if we write

x = an + b = (a + 1)n – (n – b)

with a = x quot n and b = x rem n, comparison with Equation (7.7) reveals (since 0 < l(p + q – 1) < n) that l = a + 1. This gives φ(n) = x/l.
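The recovery of φ(n) from (∊, δ) can be verified with toy numbers (n = 61 · 53 and δ = 7, which indeed satisfies δ < n^0.292 ≈ 10.6):

```python
# Recovering phi(n) from a hidden key pair (eps, delta) with small delta:
# x = eps*delta - 1 equals l*phi(n) with l = (x quot n) + 1 (toy numbers).
p, q = 61, 53
n, phi = p * q, (p - 1) * (q - 1)
delta = 7                                # small private exponent (7 < n^0.292)
eps = pow(delta, -1, phi)                # the matching public exponent
x = eps * delta - 1                      # a multiple of phi(n)
l = x // n + 1                           # l = a + 1 in the text's notation
print(x // l == phi)                     # → True
```

Here x = 12480 = 4 · 3120, so l = 4 and x/l gives back φ(n) = 3120 exactly.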

Although not needed explicitly here, the factorization of n can be easily obtained by solving the equations pq = n and p + q = n – φ(n) + 1. If ∊ and δ are not small, we may have l(p + q – 1) ≥ n, and φ(n) cannot be calculated as easily as above. A randomized polynomial-time algorithm can still factor n from the knowledge of ∊, δ and n. For the details, solve Exercise 7.9.

Hiding small public exponent

Crépeau and Slakmon propose another backdoor attack based on the following result due to Boneh et al. [33]. Let (∊, δ) be a key pair for an RSA modulus n = pq, and let 2^{t–1} ≤ ∊ < 2^t. There exists a polynomial-time algorithm that, given n, ∊, the t most significant bits of δ and the |n|/4 least significant bits of δ, recovers the full private exponent δ.

Algorithm 7.9. Backdoor attack on RSA: small public exponent

Input: and .

Output: An RSA modulus n = pq with |n| = k and a key pair (e, d).

Steps:

Generate random primes p ≠ q of bit length ≈ k/2, such that n := pq has |n| = k.
do {
   Generate random ∊ with gcd(∊, φ(n)) = 1 and |∊| = t.
   δ := ∊–1 (mod φ(n)).
   e := fe(∊ ‖ msbt(δ) ‖ lsbk/4(δ)).    /* Hide ∊ and partial knowledge of δ */
}
while (gcd(e, φ(n)) ≠ 1).
d := e–1 (mod φ(n)).
Return (n, e, d).

Algorithm 7.9 uses fe to hide in e a small ∊, the t most significant bits of δ and the |n|/4 least significant bits of δ. A string of bit length 2t + k/4 is encrypted by fe. Applying the decryption routine fd on e recovers these hidden values, from which ∊ and δ and hence φ(n) can be obtained. Algorithm 7.10 does this task. This scheme also fails, in general, to produce small public exponents e.

Algorithm 7.10. Retrieving the secret exponent

Input: An RSA public key (n, e) generated by Algorithm 7.9 and the matching .

Output: The corresponding private key d.

Steps:

Compute fd(e) and retrieve the following:
   (a) the hidden public exponent ∊,
   (b) the t most significant bits of the hidden private exponent δ and
   (c) the |n|/4 least significant bits of δ.
Apply the Boneh-Durfee-Frankel algorithm to recover δ completely.
Use ∊ and δ to compute φ(n).       /* See Exercise 7.9 */
Compute d ≡ e–1 (mod φ(n)).

7.3.2. An Attack on ElGamal Signatures

We now describe a backdoor attack on the ElGamal signature Algorithm 5.36. This attack does not tamper with the generation of the user’s permanent key pair. Instead, it manipulates the session-key generation in such a way that the user’s permanent private key is revealed to the attacker from two successive signatures.

Let p be a prime, g a generator of Z_p^*, and (d, g^d (mod p)) the permanent key pair of Alice. The attacker uses the same field and a key pair (D, g^D (mod p)), with g^D supplied to the signing device. Suppose that Alice signs two messages m1 and m2 to generate signatures (s1, t1) and (s2, t2) with session keys d1, d2 respectively, where

si ≡ g^{d_i} (mod p),
ti ≡ d_i^{–1}(H(mi) – d H(si)) (mod p – 1), for i = 1, 2.
The attack proceeds by letting d1 be arbitrary, but by taking

d2 ≡ (g^D)^{d1} (mod p).

Since d2 ≡ (g^{d1})^D ≡ s1^D (mod p), the attacker can recompute d2 from s1 using D. We then have

t2 ≡ d2^{–1}(H(m2) – d H(s2)) (mod p – 1),

that is,

d ≡ H(s2)^{–1}(H(m2) – d2 t2) (mod p – 1).
The private key D of the attacker (or d1) is required for computing d; so nobody other than the designer can retrieve Alice’s secret by observing the contaminated signatures (s1, t1) and (s2, t2).
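The following sketch runs the attack end to end (toy parameters; the hash is a stand-in forced odd so that the required inverses modulo p – 1 exist):

```python
# Contaminated ElGamal signing: d2 = (g^D)^d1 (mod p), so the designer
# recomputes d2 from s1 alone. Toy parameters; H(x) = x | 1 is a stand-in
# hash forced odd so that the inverses modulo p - 1 exist.
p, g, d = 467, 2, 99                     # public parameters, Alice's key d
D = 5                                    # designer's private key
H = lambda x: x | 1

def sign(m, dk):                         # ElGamal signature, session key dk
    s = pow(g, dk, p)
    t = pow(dk, -1, p - 1) * (H(m) - d * H(s)) % (p - 1)
    return s, t

d1 = 157                                 # first session key (arbitrary)
d2 = pow(pow(g, D, p), d1, p)            # contaminated second session key
(s1, t1), (s2, t2) = sign(1001, d1), sign(1002, d2)

d2_rec = pow(s1, D, p)                   # designer: d2 from s1 and D
d_rec = pow(H(s2), -1, p - 1) * (H(1002) - d2_rec * t2) % (p - 1)
print(d_rec)                             # → 99: Alice's permanent key
```

Note that the recovery uses only the two public signatures and the designer’s private key D.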

7.3.3. An Attack on ElGamal Encryption

For ElGamal encryption (Algorithm 5.15) and for Diffie–Hellman key exchange (Algorithm 5.27) over Z_p^*, a party (Alice) generates random session key pairs of the form (d′, g^{d′} (mod p)) and communicates the public session key g^{d′} to the other party. The following backdoor manipulates the session-key generation in such a way that two public session keys reveal the second private session key (but not the permanent private key). We assume that the attacker learns the public session keys by eavesdropping. The attacker’s key pair is (D, g^D (mod p)). The contaminated routine contains the public key g^D (mod p), but not the private key D.

Let (d1, r1) and (d2, r2) be two session keys used by Alice, where

r1 ≡ g^{d1} (mod p),
r2 ≡ g^{d2} (mod p).

The contaminated routine that generates the session keys uses a fixed odd integer u, a hash function H and a random bit b ∈ {0, 1} to generate d2 from d1 as follows:

z ≡ g^{ub} (g^D)^{d1} (mod p),
d2 ≡ H(z) (mod p – 1).

The attacker knows r1 and r2 by eavesdropping. She computes d2 by Algorithm 7.11, the correctness of which follows from the congruence (g^D)^{d1} ≡ r1^D (mod p).

Algorithm 7.11. Backdoor attack on ElGamal encryption

z0 := r1^D (mod p).                                                                     /* corresponding to b = 0 */
if (r2 ≡ g^{H(z0)} (mod p)) { Return H(z0). }
z1 := z0 g^u (mod p).                                                                   /* corresponding to b = 1 */
if (r2 ≡ g^{H(z1)} (mod p)) { Return H(z1). }
Return failure.              /* The attacker’s routine was not used for key generation. */

Algorithm 7.11 requires the attacker’s private key D (or d1) and can be performed only by the attacker. Now, d2 can be analogously used to generate the third session key d3 and so on, that is, the attacker can steal all the private session keys (except the first).
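The contaminated key generation and Algorithm 7.11 can be sketched as follows. The toy prime, the odd bias u = 5 and the use of SHA-256 for H are illustrative assumptions, not from the text.

```python
# Toy demonstration of Algorithm 7.11 (illustrative parameters and hash).
import hashlib, random

p, g = 467, 2                    # 2 generates Z_467^* (p - 1 = 2 * 233)
D = random.randrange(2, p - 1)   # attacker's private key; only g^D is embedded
u = 5                            # fixed odd bias

def H(z):                        # hash onto exponents modulo p - 1
    return int.from_bytes(hashlib.sha256(str(z).encode()).digest(), 'big') % (p - 1)

# Contaminated session-key generation in Alice's device (knows g^D, not D):
d1 = random.randrange(2, p - 1)
b = random.randrange(2)
z = pow(g, d1 + u*b, p) * pow(pow(g, D, p), d1, p) % p
d2 = H(z)
r1, r2 = pow(g, d1, p), pow(g, d2, p)    # public session keys (eavesdropped)

# Attacker's side (Algorithm 7.11): recover d2 from r1, r2 using D.
def backdoor(r1, r2):
    z0 = r1 * pow(r1, D, p) % p          # candidate for b = 0: z0 = r1^(D+1)
    if r2 == pow(g, H(z0), p):
        return H(z0)
    z1 = z0 * pow(g, u, p) % p           # candidate for b = 1
    if r2 == pow(g, H(z1), p):
        return H(z1)
    return None                          # contaminated routine was not used

assert backdoor(r1, r2) == d2
print("stolen session key:", backdoor(r1, r2))
```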

The odd integer u is used for additional safety. In order to see what might happen without it (that is, with b = 0 always), assume that H can be inverted. This gives z and hence y ≡ z r1^−1 ≡ (g^d1)^D (mod p). If D is even, y is always a quadratic residue modulo p. If D is odd, y is a quadratic residue or non-residue modulo p according as d1 is even or odd. The randomly added odd bias ub destroys this correlation of z with quadratic residues.

7.3.4. Countermeasures

Using trustworthy implementations (hardware or software) of cryptographic routines (in particular, of key generation routines) eliminates or reduces the risk of backdoor attacks. Preference should be given to software whose source code is available (rather than to more capable closed-source products). Random-number generators deserve specific attention. Cascading products from different independent sources also minimizes the possibility of hidden backdoors.

If the desired degree of trust is missing from the available products, the only safe alternative is to write the code oneself. Placing complete trust in cryptographic devices and packages and using them as black boxes, without bothering about their internals, is often called black-box cryptography. Users should learn to question black-box cryptography. The motto is: Be aware or bring peril.

Exercise Set 7.3

7.8 Argue that reverse engineering the PAP routine (Algorithm 7.5) can enable a user to distinguish in polynomial time between key pairs generated by PAP and those generated by honest procedures.
7.9 Let n = pq be an RSA modulus and (e, d) a key pair under this modulus. Write ed – 1 = 2^s t, where s = v2(ed – 1) (so that t is odd). Since ed – 1 is a multiple of φ(n) = (p – 1)(q – 1) with odd primes p, q, we have s ≥ 2.
  1. Show that for any a ∈ ℤn*, the multiplicative order ordn(a^t) divides 2^s. [H]

  2. Let a ∈ ℤn* be such that a^t has different orders modulo p and modulo q. Show that gcd(a^(2^σ t) – 1, n) is a non-trivial divisor of n for some σ ∈ {0, 1, . . . , s – 1}.

  3. Let g be a generator of ℤp*. Take a ≡ g^k (mod p) for some k ∈ {1, . . . , p – 1}, and let ordp(a^t) = 2^σ. Show that σ = v2(p – 1) if k is odd, and σ < v2(p – 1) if k is even. [H] An analogous result holds for the other prime q.

  4. Demonstrate that there are at least φ(n)/2 elements a in ℤn* with the property that a^t has different orders modulo p and q. [H]

  5. Suggest a randomized poly-time algorithm for factoring n from the knowledge of n, e and d.
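One way to approach part 5 is sketched below: a randomized factoring routine built on the non-trivial square roots of 1 located in parts 1–4. The toy primes 61 and 53 and the exponent e = 17 are assumed purely for illustration.

```python
# Sketch of the randomized factoring procedure outlined in Exercise 7.9
# (toy RSA parameters; real moduli would be far larger).
import math, random

p, q = 61, 53                       # toy RSA primes
n, phi = p*q, (p - 1)*(q - 1)
e = 17
d = pow(e, -1, phi)                 # matching private exponent

def factor(n, e, d):
    k = e*d - 1                     # a multiple of phi(n)
    t, s = k, 0
    while t % 2 == 0:               # write e*d - 1 = 2^s * t with t odd
        t //= 2
        s += 1
    while True:
        a = random.randrange(2, n - 1)
        if math.gcd(a, n) != 1:
            return math.gcd(a, n)   # lucky draw: a shares a factor with n
        x = pow(a, t, n)
        for _ in range(s):
            y = pow(x, 2, n)
            if y == 1 and x not in (1, n - 1):
                # x is a non-trivial square root of 1 modulo n
                return math.gcd(x - 1, n)
            x = y

f = factor(n, e, d)
assert f in (p, q)
print("factor found:", f)
```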

Chapter Summary

In this chapter, we discuss some indirect ways of attacking public-key cryptosystems. These attacks do not attempt to solve the underlying intractable problems, but watch the decryption device and/or use malicious key generation routines in order to gain information about private keys.

The timing attack is based on the availability of the total times of several private-key operations under the same private key. It guesses the bits of the private key one after another by performing some variance calculations.

The power attack requires the availability of the power consumption patterns (also called power traces) of the decrypting (or signing) device during one or more private-key operations. If the measurements are done with good accuracy and resolution, a single power trace may reveal the private key to the attacker; this is called simple power analysis. In practice, however, such power measurements are often contaminated with noise. Differential power analysis requires power traces from several decryption operations under the same private key. The different traces are combined using a technique that reduces the effect of noise.

A fault attack can be mounted by injecting one or more faults in the device performing private-key operations. Fault attacks are discussed in connection with several encryption (RSA), signature (ElGamal, DSA and so on) and authentication (FFS) schemes.

The above three kinds of attacks are collectively called side-channel attacks. Several general and algorithm-specific countermeasures against side-channel attacks are discussed.

Backdoor attacks, on the other hand, are mounted by malicious key generation routines. Young and Yung propose the concept of secretly embedded trapdoor with universal protection (SETUP). In a SETUP-contaminated system, the designer of the key generation routine possesses the exclusive right to steal keys from users. Several examples of backdoor attacks on RSA and ElGamal cryptosystems are described.

Suggestions for Further Reading

Kocher introduces the concept of side-channel attacks in his seminal paper [155]. This paper describes further details about the timing attack (like a derivation of the choice of the sample size k) and some experimental results.

Timing attacks in various forms are applicable to other systems. Kocher [155] himself suggests a chosen-message attack on an RSA implementation based on CRT (Algorithm 5.4). Carol, in an attempt to obtain Alice’s private key d, tries to guess the factor p (or q) of the modulus n using a timing attack. She starts by letting Alice sign a message y (c in Algorithm 5.4) close to an initial guess of p. The CRT-based algorithm first reduces y modulo p and modulo q before performing the modular exponentiations. If y < p already, then the initial reduction modulo p returns (almost) immediately, whereas if y ≥ p, the reduction involves at least one subtraction. This gives a variation in the timings based on the value of p. The attack exploits this fact to arrive at better and better approximations of p.

A known-message timing attack (in addition to the chosen message attack mentioned in the last paragraph) on the CRT-based RSA signature scheme is proposed by Kocher in the same paper [155]. Kocher also explains a timing attack on the signature algorithm DSA (Algorithm 5.43), based on the dependence of the modular reduction of H(M) + ds modulo r on the bits of the signer’s private key d.

Large scale implementations of timing attacks are reported in the technical reports [77, 259] from the Crypto group of Université catholique de Louvain. These implementations study Montgomery exponentiation.

Kocher [155] mentions the possibility of power attacks. However, a concrete description is first published in Kocher et al. [156], which explains both SPA and DPA. DES is the basic target of this paper, though possibilities for using these techniques against public-key systems are also mentioned.

Several variants of the basic DPA model described in the text have been proposed. Messerges et al. [200] describe attacks against smart-card implementations of exponentiation-based public-key systems. Also consult Aigner and Oswald’s tutorial [9] for a recent survey.

DPA seems to be the most threatening of all side-channel attacks. Many papers suggesting countermeasures against DPA have appeared. Chari et al. [45] propose a masking method. Messerges [199] applies this idea to a form suitable for AES.[4] Messerges’ countermeasure is broken in [63] using a multi-bit DPA. Some other useful papers on DPA include [10, 55, 201].

[4] AES is an abbreviation for advanced encryption standard which is a US-government standard that supersedes the older standard DES. AES uses the Rijndael cipher [219].

Boneh et al. [30, 31] from the Bellcore Lab. announce the first systematic study of fault attacks on asymmetric-key cryptosystems. They explain fault attacks on RSA (with and without CRT), the Rabin signature scheme, the Feige–Fiat–Shamir identification protocol and on the Schnorr identification protocol. These attacks are collectively known as Bellcore attacks.

Arjen K. Lenstra points out that the fault attack on CRT-based RSA does not require the valid signature. Joye and Quisquater propose some generalizations of the Bellcore–Lenstra attack. A form of this attack is applicable to elliptic-curve cryptosystems. The paper [142] talks about these developments.

Bao et al. [17] propose fault attacks on DSA, ElGamal and Schnorr signatures. They also describe variants of the fault analysis of RSA based on square-and-multiply algorithms. Zheng and Matsumoto [315] indicate the possibilities of attacking the random bit generator in a smart card.

Biham and Shamir [22] investigate fault analysis of symmetric-key ciphers and introduce the concept of differential fault analysis. Anderson and Kuhn [11] also study fault analysis of symmetric-key ciphers. Aumüller et al. [15] publish their practical experiences regarding physical realizations of faults in smart cards. They also suggest countermeasures against such attacks.

James A. Muir’s work [215] is a very readable and extensive survey on side-channel cryptanalysis. Also look at Boneh’s survey [29].

Because of small key sizes, elliptic-curve cryptosystems are very attractive for implementation in smart cards. It is, therefore, necessary to provide effective countermeasures against side-channel attacks (most importantly, against the DPA) for elliptic-curve cryptosystems. Many recent articles discuss this issue. Coron [62] suggests the use of random projective coordinates to avoid the costly (and power-consuming) field inversion operation needed for adding and doubling of points. Möller [206] proposes a non-conventional way of carrying out the double-and-add procedure. Izu and Takagi [138] describe a Montgomery-type point addition scheme resistant against side-channel attacks. An improved version of this algorithm, that works for a more general class of elliptic curves, is presented in Izu et al. [137].

Young and Yung introduce the concept of SETUP in [307]. The PAP SETUP on RSA and the ElGamal signature SETUP are from this paper, which also includes attacks on DSA and on the Kerberos authentication protocol. In a later paper [308], Young and Yung categorize SETUPs into three types: regular, weak and strong. Strong SETUPs are proposed for Diffie–Hellman key exchange and for RSA. The third reference [309] from the same authors extends the ideas of kleptography further and provides backdoor routines for several other cryptographic schemes.

Crépeau and Slakmon [70] adopt a more informal approach and discuss several backdoors for RSA key generation. In addition to the trapdoors with hidden small private and public exponents, described in the text, they propose a trapdoor that hides a small prime public exponent. They also present an improved version of the PAP routine. Unlike Young and Yung, they suggest symmetric techniques for designing fe, fd. Symmetric techniques endanger the universal protection of the attacker, but continue to make perfect sense in the context of black-box cryptography.

8. Quantum Computation and Cryptography

8.1 Introduction
8.2 Quantum Computation
8.3 Quantum Cryptography
8.4 Quantum Cryptanalysis
 Chapter Summary
 Suggestions for Further Reading

Our best theories are not only truer than common sense, they make far more sense than common sense does.

—David Deutsch [76]

One can be a masterful practitioner of computer science without having the foggiest notion of what a transistor is, not to mention how it works.

—N. David Mermin [197]

But suppose I could buy a truly powerful quantum computer off the shelf today — what would I do with it? I don’t know, but it appears that I will have plenty of time to think about it!

—John Preskill [243]

8.1. Introduction

So far, we have studied algorithms in the area of cryptology that can be implemented on classical computers (Turing machines or von Neumann’s stored-program computers). Now, we shift our attention to a different paradigm of computation, known as quantum computation. The working of a quantum computer is governed by the laws of quantum mechanics, a branch of physics developed in the twentieth century. However counterintuitive, contrived or artificial these laws sound at first, they have been accepted by the physics community as robust models of certain natural phenomena. A bit, modelled as a quantum mechanical system, appears to be a more powerful building block for a computing device than a classical bit.

This enhanced power of a computing device has many important ramifications in cryptology. On one hand, we have polynomial-time quantum algorithms to solve the integer factorization and the discrete-log problems. This implies that most of the cryptographic algorithms that we discussed earlier become (provably) insecure. On the other hand, there are proposals for a quantum key-exchange method that possesses unconditional (and provable) security.

Unfortunately, it is not clear how one can manufacture a quantum computer. The technological difficulties involved appear enormous, and a section of the community even questions the feasibility of building such a machine. However, no laws or proofs rule out the possibility of success in the (near or distant) future. Legend has it that Thomas Alva Edison, after several hundred futile attempts to manufacture an electric light bulb, asserted that he knew hundreds of ways how one cannot make an electric bulb. Edison succeeded eventually, and the dream turned into reality.

But we will not build quantum computers in this chapter. That is well beyond the scope of this book, or, for that matter, of computer science in general. It is thoroughly unimportant to understand the I-V curves of a transistor (or even to know what a transistor actually is), when one designs and analyses (classical) algorithms. In order to design and analyse quantum algorithms, it is equally unimportant to know how a quantum computer can be realized.

8.2. Quantum Computation

We start with a formal description of quantum computation. Quantum mechanical laws govern this paradigm. We will pay little attention to the physical interpretations of these laws. A mathematical formulation suffices for our purpose.

For defining a quantum mechanical system, we need to enrich our mathematical vocabulary. Let V be a vector space over ℂ (or ℝ). Using Dirac’s ket notation, we denote a vector ψ in V as |ψ〉.

Definition 8.1.

An inner product (also called a dot product or a scalar product) on V is a function 〈·|·〉 : V × V → ℂ satisfying the following properties:

  1. Positivity For any |ψ〉 ∈ V, the inner product 〈ψ|ψ〉 is real and non-negative. Moreover, 〈ψ|ψ〉 = 0 if and only if |ψ〉 = 0.

  2. Linearity For a1, a2 ∈ ℂ and |ψ〉, |φ1〉, |φ2〉 ∈ V, we have 〈ψ| (a1|φ1〉 + a2|φ2〉) = a1〈ψ|φ1〉 + a2〈ψ|φ2〉.

  3. Skew symmetry For any |ψ〉, |φ〉 ∈ V, we have 〈φ|ψ〉 = 〈ψ|φ〉*, where the star denotes complex conjugation.

A vector space V with an inner product is called an inner product space.

Example 8.1.

For n ∈ ℕ, the space ℂ^n is an inner product space with the inner product of |ψ〉 = (ψ1, . . . , ψn) and |φ〉 = (φ1, . . . , φn) defined as

〈ψ|φ〉 = ψ1*φ1 + ψ2*φ2 + · · · + ψn*φn,

where the star denotes complex conjugation.

Definition 8.2.

The inner product on a vector space V induces a norm (Definition 2.115) on V:

‖ψ‖ := √〈ψ|ψ〉.

An inner product space which is complete (Definition 2.119) under the norm induced by its inner product is called a Hilbert space. We will typically consider finite-dimensional Hilbert spaces (over ℂ) and for n ∈ ℕ denote the n-dimensional Hilbert space by ℋn.

Definition 8.3.

We define an equivalence relation ~ on a Hilbert space ℋn as |ψ〉 ~ |φ〉 if and only if |φ〉 = a|ψ〉 for some non-zero a ∈ ℂ. An equivalence class under this relation is called a ray in ℋn. One typically considers a vector |ψ〉 with 〈ψ|ψ〉 = 1 as a representative of its equivalence class. Such a representative is unique up to multiplication by complex numbers of the form e^iθ.

Definition 8.4.

An orthonormal basis of a Hilbert space ℋn is a subset B of ℋn with the following properties:

  1. B is a ℂ-basis of ℋn.

  2. 〈ψ|ψ〉 = 1 for every |ψ〉 ∈ B.

  3. 〈ψ|φ〉 = 0 for every pair of distinct vectors |ψ〉, |φ〉 ∈ B.

It is customary to denote the n vectors in an orthonormal basis of ℋn by the symbols |0〉, |1〉, . . . , |n – 1〉.

Example 8.2.

|0〉 := (1, 0, 0, . . . , 0), |1〉 := (0, 1, 0, . . . , 0), . . . , |n – 1〉 := (0, 0, . . . , 0, 1) form an orthonormal basis of ℂ^n under the inner product of Example 8.1.

8.2.1. System

The following axiom describes the model of a quantum mechanical system.

Axiom 8.1. First axiom of quantum mechanics

A system is a ray in a (finite-dimensional) Hilbert space (over ℂ).

Definition 8.5.

The simplest non-trivial quantum mechanical system is a ray in a 2-dimensional Hilbert space ℋ2. Such a system is assumed to be the basic building block of a quantum computer and is called a quantum bit or a qubit.

In order to distinguish a qubit from a classical bit, we call the latter a cbit.

ℋ2 has an orthonormal basis {|0〉, |1〉}. In the classical interpretation, a cbit can assume only the two values |0〉 and |1〉, whereas a qubit can assume any value of the form

a|0〉 + b|1〉    with    a, b ∈ ℂ, |a|² + |b|² = 1.

Such a state of the qubit is called a superposition of the classical states.

Though we don’t care much, at least for the moment, here are two promising candidates for realizing a qubit:

A conceptual example of a 2-state quantum system is the Schrödinger cat. The two independent states of a cat, as we classically know, are |alive〉 and |dead〉. However, if we think of the cat confined in a closed room and isolated from our observations, quantum mechanics models the state of the cat as a superposition (that is, a complex-linear combination) of these two states. But then, if the quantum model were true, opening the room might reveal the cat in a non-trivial state a|alive〉 + b|dead〉 for some complex numbers a, b with |a|² + |b|² = 1. It would indeed be an exciting experience. But alas, quantum mechanics precludes the possibility of such an observation. Read on to know what we would actually see, if we open the room.

8.2.2. Entanglement

A single qubit is too small to build a useful computer. We need to use several (albeit a finite number of) qubits and hence must have a way to describe the combined system in terms of the individual qubits. As the simplest and basic case, we first concentrate on combining two quantum systems into one.

Axiom 8.2. Second axiom of quantum mechanics

Let A and B be two quantum mechanical systems with respective Hilbert spaces ℋA and ℋB. Let {|i〉A | i = 0, . . . , m – 1} and {|j〉B | j = 0, . . . , n – 1} be orthonormal bases of these Hilbert spaces. The quantum mechanical system AB having A and B as its two parts is described by the tensor product

ℋAB = ℋA ⊗ ℋB,

where ℋAB is an mn-dimensional Hilbert space with an orthonormal basis

{|i〉A ⊗ |j〉B | i = 0, . . . , m – 1 and j = 0, . . . , n – 1}.

It is customary to abbreviate the normalized vector |i〉A ⊗ |j〉B as |i〉A|j〉B or even as |ij〉AB. A general state of AB is of the form

|ψ〉AB = Σi,j ai,j|ij〉AB    with    Σi,j |ai,j|² = 1.

We can generalize this construction to describe a system having components A1, . . . , Ak. If ℋi is the Hilbert space of Ai with an orthonormal basis {|j〉i | 0 ≤ j < ni}, the composite system A1 · · · Ak has the n1 · · · nk-dimensional Hilbert space ℋ1 ⊗ · · · ⊗ ℋk with an orthonormal basis comprising the vectors

|j1〉1 ⊗ |j2〉2 ⊗ · · · ⊗ |jk〉k = |j1〉1|j2〉2 · · · |jk〉k = |j1 j2 . . . jk〉

with 0 ≤ ji < ni for all i = 1, . . . , k.

Definition 8.6.

An n-bit quantum register is a system having exactly n qubits.

Let A1, . . . , An denote the individual bits in an n-bit quantum register A. Each Ai has the Hilbert space ℋ2 with orthonormal basis {|0〉, |1〉}. So A has the 2^n-dimensional Hilbert space ℋ2 ⊗ · · · ⊗ ℋ2 (n copies) with an orthonormal basis consisting of the vectors

|j1〉 ⊗ |j2〉 ⊗ · · · ⊗ |jn〉 = |j1〉|j2〉 · · · |jn〉 = |j1 j2 · · · jn〉

with each ji ∈ {0, 1}. Viewed as an integer in binary notation, j1 j2 . . . jn is an integral value between 0 and 2^n – 1. This gives us a canonical numbering |0〉, |1〉, . . . , |2^n – 1〉 of the basis vectors for the register A. These 2^n values are precisely the states that a classical n-bit register can have. The quantum register can, however, be in any state |ψ〉 which is a superposition of the classical states:

|ψ〉 = a0|0〉 + a1|1〉 + · · · + a(2^n – 1)|2^n – 1〉    with    Σj |aj|² = 1.

Let us once again look at the general composite system A = A1 · · · Ak. In the classical sense, each state of A is composed of the individual states of the subsystems Ai. For example, each of the 2n classical states of an n-bit register corresponds to a choice between |0〉 and |1〉 for each individual bit. That is, each individual component retains its own state in a classical composite system. This is, however, not the case with a quantum composite system. Just think of a 2-bit quantum register C := AB. A state

|ψ〉C = c0|0〉C + c1|1〉C + c2|2〉C + c3|3〉C

of C equals a tensor product

|ψ1〉A ⊗ |ψ2〉B = (a0|0〉A + a1|1〉A) ⊗ (b0|0〉B + b1|1〉B)
             = a0b0|0〉C + a0b1|1〉C + a1b0|2〉C + a1b1|3〉C,

if and only if c0c3 = c1c2.
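The product criterion c0c3 = c1c2 is easy to check numerically. The following sketch (helper names are illustrative, not from the text) verifies it on a product state and on the entangled state (1/√2)(|00〉 + |11〉):

```python
# Numeric check of the product criterion for a 2-qubit state (c0, c1, c2, c3):
# the state is a tensor product of 1-qubit states exactly when c0*c3 == c1*c2.
from math import sqrt

def is_product(c):
    return abs(c[0]*c[3] - c[1]*c[2]) < 1e-12

a = (0.6, 0.8)                  # |psi1> = 0.6|0> + 0.8|1>
b = (1/sqrt(2), 1/sqrt(2))      # |psi2> = (|0> + |1>)/sqrt(2)
product = (a[0]*b[0], a[0]*b[1], a[1]*b[0], a[1]*b[1])
assert is_product(product)      # built as a tensor product, so c0c3 = c1c2

bell = (1/sqrt(2), 0, 0, 1/sqrt(2))   # (|00> + |11>)/sqrt(2)
assert not is_product(bell)           # no tensor-product decomposition exists
print("product criterion verified")
```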

Definition 8.7.

The state |ψ〉 of a quantum register A = A1 · · · An is called entangled, if |ψ〉 cannot be written as a tensor product of the states of any two parts of A. In other words, |ψ〉 is entangled if and only if no set of fewer than n qubits of A possesses its own individual state.

Entanglement essentially implies correlation or interaction between the components. In a composite quantum system, we cannot treat the components individually. A quantum system, as we have defined (axiomatically) earlier, is a completely isolated system. In reality, interactions with the surroundings make a (non-isolated) system change its state and get entangled. This is one of the biggest problems in the realization of a quantum computer. Quantum error correction is an important topic in quantum computation. For our purpose, we stick to the abstract model of an isolated system (quantum register) immune from external disturbances.

8.2.3. Evolution

Quantum registers give us a way to store quantum information. A computation involves manipulating the information stored in the registers. In quantum mechanics, all such operations must be reversible, that is, it must be possible to invert every operation. The invertible operations on the classical states |0〉, |1〉, . . . , |2^n – 1〉 of an n-bit quantum register A are precisely the permutations of the classical states. Now that A can be in many more (quantum) states, there are other allowed operations on A. Any such operation must be reversible and of a particular type. This is the content of the third axiom of quantum mechanics, detailed shortly.

A classical n-bit register supports many non-invertible operations. For example, erasing the content of the register (that is, resetting all the bits to zero) is a non-invertible process, since the pre-erasure state of the register cannot be uniquely determined after the erase operation is carried out. Classical computation is based on (classical) gates (like NOT, AND, OR, XOR, NOR, NAND), most of which are non-invertible. XOR, as an example, requires two input bits and outputs a single bit. It is impossible to determine the inputs uniquely from the output only. All such non-reversible operations are disallowed in the quantum world. An invertible version of the XOR operation takes two bits x and y as input and outputs the two bits x and xy (where ⊕ denotes XOR of bits). Given the output (x, xy), the input can be uniquely determined as (x, y) = (x, x ⊕ (xy)), that is, by applying the reversible XOR operation once more.
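A tiny sketch of the reversible XOR just described; applying the operation twice recovers the original input:

```python
# The reversible XOR: (x, y) -> (x, x XOR y). Since the first output bit
# carries x unchanged, the operation is invertible -- indeed self-inverse.
def rxor(x, y):
    return x, x ^ y

for x in (0, 1):
    for y in (0, 1):
        assert rxor(*rxor(x, y)) == (x, y)   # applying it twice is the identity
print("reversible XOR is its own inverse")
```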

Like XOR, all bit operations that build up a classical computer can be realized using reversible operations only. This gives us the (informal) assurance that quantum computers are at least as powerful as classical computers.

Back to the business—the third axiom of quantum mechanics.

Definition 8.8.

Let U be a square matrix (that is, an m × m matrix for some m ∈ ℕ) with complex entries. The conjugate transpose of U is denoted by the symbol U†, that is, if U = (uij), then U† = (uji*). U is called unitary, if U†U = UU† = I, where I is the m × m identity matrix. Every unitary matrix U is invertible with U⁻¹ = U†, and preserves the inner product of ℂ^m, that is, 〈Uψ|Uφ〉 = 〈ψ|φ〉 for |ψ〉, |φ〉 ∈ ℂ^m.

Let A be a quantum system (like a quantum register) with Hilbert space ℋm. An m × m unitary matrix U defines a unitary linear transformation on ℋm taking a normalized vector |ψ〉 to a normalized vector U|ψ〉. Moreover, the transformation maps an orthonormal basis of ℋm to another orthonormal basis of ℋm (Exercise 8.4).

Axiom 8.3. Third axiom of quantum mechanics

A quantum system evolves unitarily, that is, any operation on a quantum mechanical system is a unitary transformation.

Example 8.3.

The Hadamard transform H on one qubit is defined as:

H|0〉 = (1/√2)(|0〉 + |1〉),
H|1〉 = (1/√2)(|0〉 – |1〉).

(Recall that a linear transformation is completely specified by the images of the elements of a basis.) If one takes |0〉 = (1, 0) and |1〉 = (0, 1), the Hadamard transform corresponds to the unitary matrix

H = (1/√2) [ 1    1 ]
           [ 1   –1 ].

By linearity, H transforms a general state |ψ〉 = a|0〉 + b|1〉 to the state

H|ψ〉 = ((a + b)/√2)|0〉 + ((a – b)/√2)|1〉.
Some other unitary operators are described in Exercises 8.5 and 8.6.

An important consequence of quantum mechanical dynamics is that cloning of a state of a system is not permissible. In other words, there does not exist an operator that copies an arbitrary state (content) of one quantum register to another.

Theorem 8.1. No-cloning theorem

For two n-bit registers A and B, there do not exist a unitary transform U of the composite system AB and a state |s〉 of B, such that U(|ψ〉|s〉) = |ψ〉|ψ〉 for every state |ψ〉 of A.

Proof

Assume that such a state |s〉 of B and a unitary transform U of AB exist. Take two states |ψ1〉 and |ψ2〉 of A. Then, U(|ψ1〉|s〉) = |ψ1〉|ψ1〉 and U(|ψ2〉|s〉) = |ψ2〉|ψ2〉. By linearity, we have U((a|ψ1〉 + b|ψ2〉)|s〉) = a|ψ1〉|ψ1〉 + b|ψ2〉|ψ2〉. Now, since U clones a|ψ1〉 + b|ψ2〉 also, U((a|ψ1〉 + b|ψ2〉)|s〉) = (a|ψ1〉 + b|ψ2〉)(a|ψ1〉 + b|ψ2〉) = a²|ψ1〉|ψ1〉 + ab|ψ1〉|ψ2〉 + ab|ψ2〉|ψ1〉 + b²|ψ2〉|ψ2〉. The two expressions for U((a|ψ1〉 + b|ψ2〉)|s〉) are different, unless a = 0, b = 1 or a = 1, b = 0.

8.2.4. Measurement

We have seen how to represent a quantum mechanical system and do operations on the system. Now comes the final part of the game, namely observing or measuring or reading the state of a quantum system. In classical computation, reading the value stored in a classical register is a trivial exercise—just read it! In quantum mechanics, this is not the case.

Axiom 8.4. Fourth axiom of quantum mechanics—the Born rule

Let A be a quantum mechanical system with an orthonormal basis {|0〉, |1〉, . . . , |m – 1〉}. Assume that A is in a state |ψ〉 = a0|0〉 + a1|1〉 + · · · + a(m–1)|m – 1〉 (so that Σi |ai|² = 1). A measurement of A at this state is a mechanism (or device) that outputs one of the integers 0, 1, . . . , m – 1, and i is output with probability |ai|². If i is output by the measurement, the system collapses from the state |ψ〉 to the state |i〉 after the measurement.

This means that whatever the state |ψ〉 of A was before the measurement, the process of measurement can reveal only one of m possible integer values. Moreover, the measurement causes a total loss of information about the pre-measurement amplitudes ai. Thus, it is impossible to measure A repeatedly at the state |ψ〉 to see a statistical pattern in the occurrences of different values of i so as to guess the probabilities |ai|2.
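The Born rule can be simulated classically by sampling. The following sketch (illustrative helper names, not from the text) draws an outcome i with probability |ai|² and returns the collapsed state:

```python
# Simulated measurement under the Born rule: sample outcome i with
# probability |a_i|^2, then collapse the state to the basis vector |i>.
import random
from math import sqrt

def measure(amplitudes):
    probs = [abs(a)**2 for a in amplitudes]
    r, acc = random.random(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            break
    collapsed = [1 if j == i else 0 for j in range(len(amplitudes))]
    return i, collapsed          # outcome and post-measurement state

# For the state (1/sqrt(2))(|0> + |1>), outcomes 0 and 1 each occur
# with probability 1/2:
counts = [0, 0]
for _ in range(10000):
    i, state = measure([1/sqrt(2), 1/sqrt(2)])
    counts[i] += 1
print(counts)
```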

If we open the room, we can see the Schrödinger cat in only one of the two possible states: |alive〉 or |dead〉. Well, then, what else can we expect? Quantum mechanics only models the cat in the isolated room as a system evolving according to the unitary dynamics.

At first glance, this is rather frustrating. We claim that the system went through a series of classically meaningless states, but the classical states are all we can see. What is the guarantee that the system really evolved in the quantum mechanical way? Well, there is no guarantee actually. The solace is that the axioms of quantum mechanics can explain certain natural phenomena. Also, it is perfectly consistent with the classical behaviour in that if the system A evolves classically and is measured at the state |i〉 (so that ai = 1 and aj = 0 for j ≠ i), measuring A reveals i with probability one and causes the system to collapse to the state |i〉, that is, to remain in the state |i〉 itself.

There is a positive side of the quantum mechanical axioms. A quantum mechanical system is inherently parallel. An n-bit classical register at any point of time can hold only one of the classical values |0〉, . . . , |2^n – 1〉. An n-bit quantum register, on the other hand, can simultaneously hold all these classical values, with respective probabilities. This inherent parallelism seems to impart a good deal of power to a computing device. Of course, as long as we cannot harness some physical objects to build a real quantum mechanical computing device, quantum computation continues to remain science fiction. But on an algorithmic level, the inherent parallelism of a (hypothetical) quantum computer can be exploited to do miracles, for example, to design a polynomial-time integer factorization algorithm. This is where we win—at least conceptually. Our failure to see a cat in the state (1/√2)(|alive〉 – |dead〉) should not bother us at all!

Measurement also gives us a way to initialize a quantum register A to a desired state |ψ〉. Suppose that we get the value i upon measuring A. We then apply to A any unitary transform that changes A from the post-measurement state |i〉 to the desired state |ψ〉.

The measurement described in Axiom 8.4 is called measurement in the classical basis. The system A has, in general, many orthonormal bases other than the classical one {|0〉, . . . , |m – 1〉}. If B is any such basis, we can conceive of measuring A in the basis B. All we need to perform is to rewrite the state of A in terms of the new basis B. This can be achieved by applying to A a unitary transformation (the change-of-basis transformation) before the measurement in the classical basis is carried out.

A generalization of the Born rule is also worth mentioning here. Suppose that we have an (m + n)-bit quantum register A and we want to measure not all but some of the bits of A. To be more specific, let us say that we want to measure the leftmost m bits of A, though the generalized Born rule works for any arbitrary choice of m bit positions in the register A. Denoting by |i〉m, i = 0, . . . , 2^m – 1, the canonical basis vectors for the left m bits and by |j〉n, j = 0, . . . , 2^n – 1, those for the right n bits, a general state of A can be written as

|ψ〉 = Σi,j ai,j|i, j〉m+n

with Σi,j |ai,j|² = 1 and with |i, j〉m+n identified as |i〉m|j〉n = |i〉m ⊗ |j〉n. A measurement of the left m bits of A yields an integer i, 0 ≤ i ≤ 2^m – 1, with probability pi := Σj |ai,j|². Also this measurement causes A to collapse to the state (1/√pi) Σj ai,j|i〉m|j〉n.

Now, if we immediately apply the generalized Born rule once again on the right n bits of A, we get an integer j, 0 ≤ j ≤ 2^n – 1, with probability |ai,j|²/pi, and the system collapses to the state |i〉m|j〉n. The probability of getting |i〉m|j〉n by this two-step process is then pi · |ai,j|²/pi = |ai,j|². This is consistent with a single application of the original Born rule.
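A sketch of the generalized Born rule for measuring the left m bits; the dictionary representation of the amplitudes ai,j is an implementation choice, not from the text:

```python
# Partial measurement of an (m+n)-bit register: outcome i on the left m
# bits occurs with probability p_i = sum_j |a_{i,j}|^2, and the state is
# renormalized to (1/sqrt(p_i)) sum_j a_{i,j} |i>|j>.
import random
from math import sqrt, isclose

def measure_left(amps, m, n):
    # amps[(i, j)] = a_{i,j}; returns (outcome i, p_i, collapsed state)
    probs = [sum(abs(amps.get((i, j), 0))**2 for j in range(2**n))
             for i in range(2**m)]
    i = random.choices(range(2**m), weights=probs)[0]
    p = probs[i]
    collapsed = {(i, j): amps.get((i, j), 0)/sqrt(p) for j in range(2**n)}
    return i, p, collapsed

# Example: the entangled 2-bit state (1/sqrt(2))(|00> + |11>), with m = n = 1.
amps = {(0, 0): 1/sqrt(2), (1, 1): 1/sqrt(2)}
i, p, post = measure_left(amps, 1, 1)
assert isclose(p, 0.5)                   # each left-bit outcome has p_i = 1/2
assert isclose(abs(post[(i, i)]), 1.0)   # the register collapses to |ii>
print("measured left bit:", i)
```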

8.2.5. The Deutsch Algorithm

We start with a general framework for doing computations using quantum registers. Suppose we want to compute a function f which takes an m-bit integer as input and outputs an n-bit integer. A general function f need not be invertible, but we cannot afford non-invertible operations on quantum registers. This is why we work with an (m + n)-bit quantum register A in which the left m bits represent the input and the right n bits the output. Computing f(x) for a given x is tantamount to designing a unitary transformation Uf that acts on A and converts its state from |x〉m|y〉n to |x〉m|f(x) ⊕ y〉n, where ⊕ is the bitwise XOR operation, and where the subscripts (m and n) indicate the number of bits in the input or output part of A. It is easy to verify that Uf is unitary. Moreover, the inverse of Uf is Uf itself. For y = 0, we have, in particular, Uf(|x〉m|0〉n) = |x〉m|f(x)〉n.
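For m = n = 1, Uf can be written out explicitly as a 4 × 4 permutation matrix over the basis states |x〉|y〉, which makes its unitarity and self-inverseness evident. The encoding of basis states as indices 2x + y is an assumption of this sketch:

```python
# U_f on basis state |x>|y> produces |x>|f(x) XOR y>. Being a permutation
# of basis states, U_f is unitary; applying it twice gives the identity.
def make_Uf(f):
    # 4x4 permutation matrix over the basis |x,y>, encoded as index 2*x + y
    U = [[0]*4 for _ in range(4)]
    for x in (0, 1):
        for y in (0, 1):
            U[2*x + (f(x) ^ y)][2*x + y] = 1
    return U

def matmul(A, B):
    return [[sum(A[i][k]*B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

Uf = make_Uf(lambda x: 1 - x)        # f(0) = 1, f(1) = 0
I = [[1 if i == j else 0 for j in range(4)] for i in range(4)]
assert matmul(Uf, Uf) == I           # U_f is its own inverse
# each column of Uf is a distinct standard basis vector, hence Uf is unitary
print("U_f is a self-inverse permutation matrix")
```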

It may still be unclear to the reader what one really gains by using this quantum model. The answer lies in the parallelism inherent in a quantum register. In order to see how this parallelism can be exploited, we describe David Deutsch’s algorithm which, being the first known quantum algorithm, has enough historical importance to be included here in spite of its apparent irrelevance in the context of cryptology.

Assume that f : {0, 1} → {0, 1} is a function that takes one bit as input and outputs one bit. There are four such functions: two of these are constant (f(0) = f(1)) and the remaining two non-constant (f(0) ≠ f(1)). We are given a black box Df representing f. We don’t know which of the four functions Df actually implements, but we can supply a bit to Df as input and read its output. Our task is to determine whether Df represents a constant function or not. Classically, we make two invocations of Df, on the inputs 0 and 1, and compare the output values f(0) and f(1); it is impossible to solve the problem classically using only one invocation of the black box. The Deutsch algorithm accomplishes the task with a single invocation using quantum computational techniques.

Following the general quantum computational model, we assume that Df is a unitary transformation on a 2-bit register A (with m = n = 1) that computes Df|x〉|y〉 = |x〉|f(x) ⊕ y〉, with the left (resp. the right) bit corresponding to the input (resp. the output) of f. Instead of supplying a classical input to Df, we initialize the register A to the state
(H|1〉)(H|1〉) = ½(|0〉 − |1〉)(|0〉 − |1〉).
Linearity shows that on this input, Df ends its execution leaving A in the state
½((−1)^{f(0)}|0〉 − (−1)^{f(1)}|1〉)(|0〉 − |1〉).
Here, the right bit remains in the state H|1〉 = (1/√2)(|0〉 − |1〉). We won’t measure A right now, but apply the Hadamard transform on the left bit. This transforms A to the state
±|1〉(H|1〉) if f is constant, and ±|0〉(H|1〉) if f is not constant.
Now, if we measure the input bit, we deterministically get 1 if f is constant and 0 if it is not. That’s it!
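The whole algorithm can be simulated on four complex amplitudes. The following stdlib-Python sketch (ours, purely illustrative) indexes the basis state |x〉|y〉 by the integer (x << 1) | y:

```python
import math

def deutsch(f):
    """One-oracle-call test of whether f: {0,1} -> {0,1} is constant."""
    s = 1 / math.sqrt(2)
    # initial state (H|1>)(H|1>): amplitude (1/2)(-1)^(x+y) at |x>|y>
    psi = [0.5 * (-1) ** ((i >> 1) + (i & 1)) for i in range(4)]
    # the black box D_f: |x>|y> -> |x>|f(x) XOR y>
    out = [0.0] * 4
    for i, a in enumerate(psi):
        x, y = i >> 1, i & 1
        out[(x << 1) | (f(x) ^ y)] += a
    # Hadamard on the left bit
    psi = [0.0] * 4
    for i, a in enumerate(out):
        x, y = i >> 1, i & 1
        for b in (0, 1):
            psi[(b << 1) | y] += s * (-1) ** (x * b) * a
    # probability that the left bit measures as 1
    p1 = sum(a * a for i, a in enumerate(psi) if i >> 1 == 1)
    return round(p1)                    # 1 => constant, 0 => non-constant

assert deutsch(lambda x: 0) == 1        # constant
assert deutsch(lambda x: 1) == 1        # constant
assert deutsch(lambda x: x) == 0        # non-constant
assert deutsch(lambda x: 1 - x) == 0    # non-constant
```

Note that the oracle is invoked exactly once per run, and the answer is deterministic.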

Deutsch’s algorithm solved a rather artificial problem, but it opened up the possibility of exploring a new paradigm of computation. To date, (good) quantum algorithms are known for many interesting computational problems. In the rest of this chapter, we concentrate on some of the quantum algorithms that have an impact on cryptology.

Exercise Set 8.2

8.1 Let S be a finite set and let l²(S) denote the set of all functions f : S → ℂ.
  1. Show that l²(S) is a Hilbert space under the inner product
〈f, g〉 := Σ_{x∈S} f(x)ḡ(x), where the bar denotes complex conjugation.
  2. Let B := {δx : x ∈ S}, where δx(y) is 1 if y = x, and is 0 otherwise. Show that B is an orthonormal basis of l²(S).

8.2Show that the vectors and form an orthonormal -basis of .
8.3Show that is an entangled state of a 2-bit quantum register.
8.4 Prove the following assertions.
  1. The matrix is unitary.

  2. A unitary matrix preserves inner products, that is, if U is an m × m unitary matrix and |ψ〉, |φ〉 ∈ ℂ^m, then 〈Uψ|Uφ〉 = 〈ψ|φ〉.

  3. The determinant of a unitary matrix has absolute value 1.

  4. Every eigenvalue of a unitary matrix has absolute value 1.

  5. An m × m matrix A is unitary if and only if the columns of A constitute an orthonormal basis of ℂ^m (over ℂ).

8.5
  1. Show that the following operators are unitary on a qubit. Also construct the corresponding transformation matrices.

    Identity operator: I|0〉 = |0〉, I|1〉 = |1〉.
    Exchange operator: X|0〉 = |1〉, X|1〉 = |0〉.
    Z operator: Z|0〉 = |0〉, Z|1〉 = −|1〉.
    Hadamard operator: H|0〉 = (1/√2)(|0〉 + |1〉), H|1〉 = (1/√2)(|0〉 − |1〉).

  2. Deduce the following identities:

  3. Let . Show that defines a unitary operator on a qubit and that , where the last X is the matrix of the exchange operator.

8.6 Let A be an n-bit quantum register. Number the bits of A as 1, . . . , n from left to right. One can apply operators like the X, Z, H of Exercise 8.5 on each individual bit of A. A qubit operation B applied on bit i of A will be denoted by Bi.
  1. Let Sij be the operator that swaps bit i with bit j. Show that

  2. Let C be the reversible XOR operation (also called the controlled-NOT operation) on a two-bit register A = (A1A2), that is, C|x〉|y〉 = |x〉|x ⊕ y〉. Show that C can be realized as

8.7 Suppose that whenever you switch on your quantum computer, every bit in its registers is initialized to the state |0〉. Describe how you can use the operators I, X, Z and H defined in Exercise 8.5, in order to change the state of a qubit from |0〉 to the following:
  1. |1〉

  2. –|1〉

8.8 Let A be an n-bit quantum register in the state |0〉n. Show that the application of the Hadamard transform individually to each bit of A transforms A to the state |ψ〉 = (1/2^{n/2}) Σ_{x=0}^{2^n−1} |x〉. This is precisely the state of A in which all of the 2^n possible outcomes in a measurement of A are equally likely. What happens if we apply H a second time individually to each bit of A, that is, what is H1H2 · · · Hn|ψ〉, where Hi denotes the Hadamard transform on the i-th bit of A?
8.9 We know that any arithmetic or Boolean operation can be implemented using AND and NOT gates. This exercise suggests a reversible way to implement these operations. The Toffoli gate is a function T : {0, 1}³ → {0, 1}³ that maps (x, y, z) ↦ (x, y, z ⊕ xy), where ⊕ means XOR, and xy means the AND of x and y. Thus, T flips the third bit if and only if the first two bits are both 1.
  1. Show that T is a unitary transformation on a 3-bit quantum register. What is the inverse of T?

  2. Use T to realize the Boolean AND and NOT operations.

8.3. Quantum Cryptography

We now describe the quantum key-exchange algorithm due to Bennett and Brassard. The original paper also describes a practical implementation of the algorithm using polarization of photons. For the moment, we do not highlight such specific implementation issues, but describe the algorithm in terms of the conceptual computational units called qubits.

The usual actors Alice and Bob want to agree upon a shared secret using communication over an insecure channel. A third party who gave her name as Carol plans to eavesdrop during the transmission. Alice and Bob repeat the following steps. Here, H stands for the Hadamard transform.

Algorithm 8.1. Quantum key-exchange algorithm

Alice generates a random classical bit i ∈ {0, 1}.

Alice makes a random choice x ∈ {0, 1}.

Alice computes the quantum bit A := H^x|i〉.

Alice sends A to Bob.

Bob makes a random choice y ∈ {0, 1}.

Bob computes B := H^y A.

Bob measures B to get the classical bit j ∈ {0, 1}.

Bob sends y to Alice.

Alice sends x to Bob.

if (x = y) { Alice and Bob retain the bit i = j }

The algorithm works as follows. Alice generates a random bit i and a random decision x whether she is going to use the Hadamard transform H. If x = 0, she sends the quantum bit |0〉 or |1〉 to Bob. If x = 1, she sends either H|0〉 or H|1〉 to Bob. At this point Bob does not know whether Alice applied H before the transmission. So Bob makes a random guess y and accordingly skips/applies the Hadamard transform on the qubit received. If x = y = 0, then Bob has the qubit B = H⁰H⁰|i〉 = |i〉 and a measurement of this qubit reveals i with probability 1. On the other hand, if x = y = 1, then B = H²|i〉 = |i〉, since H² is the identity transform (Exercise 8.5). In this case also, Bob retrieves Alice’s classical bit i with certainty by measuring B.

If xy, then B is generated from Alice’s initial choice |i〉 using a single application of H, that is, in this case. A measurement of this bit outputs 0 or 1, each with probability , that is, Bob gathers no idea about the initial choice of Alice. So after it is established that xy, they both discard the bit.

If we assume that x and y are uniformly chosen, Bob and Alice succeed in having x = y about half of the time. They eventually set up an n-bit secret after about 2n invocations of the above protocol. Table 8.1 illustrates a sample session between Alice and Bob. After 20 iterations of the above procedure, they agree upon the shared secret 0001110111.

Table 8.1. A sample session of the quantum key-exchange algorithm
Iteration  i  x  A      y  B      j  Common bit
    1      0  1  H|0〉  0  H|0〉  1
    2      0  0  |0〉   1  H|0〉  1
    3      0  1  H|0〉  1  |0〉   0  0
    4      0  1  H|0〉  0  H|0〉  0
    5      1  1  H|1〉  0  H|1〉  1
    6      0  0  |0〉   0  |0〉   0  0
    7      0  0  |0〉   0  |0〉   0  0
    8      1  0  |1〉   0  |1〉   1  1
    9      0  0  |0〉   1  H|0〉  0
   10      1  1  H|1〉  0  H|1〉  0
   11      0  1  H|0〉  0  H|0〉  1
   12      0  0  |0〉   1  H|0〉  0
   13      1  0  |1〉   1  H|1〉  1
   14      1  1  H|1〉  1  |1〉   1  1
   15      1  1  H|1〉  1  |1〉   1  1
   16      0  1  H|0〉  1  |0〉   0  0
   17      1  1  H|1〉  1  |1〉   1  1
   18      1  0  |1〉   0  |1〉   1  1
   19      0  1  H|0〉  0  H|0〉  0
   20      1  0  |1〉   0  |1〉   1  1
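A session like the one in Table 8.1 is easy to simulate with single-qubit state vectors. The following stdlib-Python sketch is ours; the seed and round count are arbitrary:

```python
import math, random

def hadamard(psi):
    """Apply H to a single-qubit state given as an amplitude pair."""
    s = 1 / math.sqrt(2)
    a, b = psi
    return (s * (a + b), s * (a - b))

def measure(psi, rng):
    """Measure in the classical basis {|0>, |1>}."""
    return 0 if rng.random() < abs(psi[0]) ** 2 else 1

def bb84_round(rng):
    i, x = rng.randrange(2), rng.randrange(2)
    psi = (1.0, 0.0) if i == 0 else (0.0, 1.0)
    if x:
        psi = hadamard(psi)            # Alice sends A = H^x |i>
    y = rng.randrange(2)
    if y:
        psi = hadamard(psi)            # Bob computes B = H^y A
    return i, x, y, measure(psi, rng)

rng = random.Random(84)                # arbitrary seed
key_a, key_b = [], []
for _ in range(200):
    i, x, y, j = bb84_round(rng)
    if x == y:                         # matching choices: keep the bit
        key_a.append(i)
        key_b.append(j)

assert key_a == key_b                  # agreeing choices never disagree on bits
assert 60 <= len(key_a) <= 140         # about half the rounds survive
```

The first assertion reflects the correctness argument above: whenever x = y, Bob's measurement reproduces i with certainty.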

What remains to be explained is how this protocol guards against eavesdropping by Carol. Let us model Carol as a passive adversary who intercepts the qubit A transmitted by Alice, investigates it to learn about Alice’s secret i, and subsequently transmits a qubit to Bob. In order to guess i, Carol mimics the role of Bob. At this point Carol does not know x, so she makes a guess z about x, accordingly skips/applies the Hadamard transform on the intercepted qubit in order to get a qubit C, measures C to get a bit value k and sends the measured qubit D to Bob. (Recall from Theorem 8.1 that it is impossible for Carol to make a copy of A, work on this copy and transmit the original qubit A to Bob.) Bob receives D, assumes that it is the qubit A transmitted by Alice and carries out his part of the work to generate the bit j. Bob and Alice later reveal x and y. If x ≠ y, they anyway reject the bits obtained from this iteration. Carol should also reject her bit k in this case. So let us concentrate only on the case that x = y. The introduction of Carol in the protocol changes A to D and hence Alice and Bob may eventually agree upon distinct bits. A sample session of the protocol in the presence of Carol is illustrated in Table 8.2. The three parties generate the secrets as:

Alice:  0110 0111 1000 1011
Bob:    0101 1101 1100 1011
Carol:  0100 0101 0100 1011

Table 8.2. Eavesdropping during a key-exchange session
Iteration  i  x  A      z  C = H^z A  k  D     y  B = H^y D  j
    1      0  1  H|0〉  1  |0〉       0  |0〉  1  H|0〉      0
    2      1  0  |1〉   0  |1〉       1  |1〉  0  |1〉       1
    3      1  0  |1〉   1  H|1〉      0  |0〉  0  |0〉       0
    4      0  1  H|0〉  0  H|0〉      0  |0〉  1  H|0〉      1
    5      0  1  H|0〉  1  |0〉       0  |0〉  1  H|0〉      1
    6      1  1  H|1〉  1  |1〉       1  |1〉  1  H|1〉      1
    7      1  1  H|1〉  0  H|1〉      0  |0〉  1  H|0〉      0
    8      1  0  |1〉   0  |1〉       1  |1〉  0  |1〉       1
    9      1  1  H|1〉  0  H|1〉      0  |0〉  1  H|0〉      1
   10      0  1  H|0〉  0  H|0〉      1  |1〉  1  H|1〉      1
   11      0  0  |0〉   1  H|0〉      0  |0〉  0  |0〉       0
   12      0  0  |0〉   0  |0〉       0  |0〉  0  |0〉       0
   13      1  1  H|1〉  1  |1〉       1  |1〉  1  H|1〉      1
   14      0  0  |0〉   0  |0〉       0  |0〉  0  |0〉       0
   15      1  0  |1〉   0  |1〉       1  |1〉  0  |1〉       1
   16      1  0  |1〉   1  H|1〉      1  |1〉  0  |1〉       1

In this example, Alice and Bob’s shared secrets differ in five bit positions. Carol’s intervention causes a shared bit to differ with probability 3/8 (Exercise 8.11). Thus, the more Carol eavesdrops, the more mismatched bits she introduces in the secret shared by Alice and Bob.
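The 3/8 figure can be checked empirically with the same single-qubit simulator; per the model in the text, Carol resends the collapsed qubit |k〉 itself. The seed and sample size below are arbitrary:

```python
import math, random

def hadamard(psi):
    s = 1 / math.sqrt(2)
    a, b = psi
    return (s * (a + b), s * (a - b))

def measure(psi, rng):
    return 0 if rng.random() < abs(psi[0]) ** 2 else 1

rng = random.Random(811)               # arbitrary seed
kept = bad = 0
for _ in range(20000):
    i, x, z, y = (rng.randrange(2) for _ in range(4))
    psi = (1.0, 0.0) if i == 0 else (0.0, 1.0)
    if x:
        psi = hadamard(psi)            # Alice's qubit A = H^x |i>
    c = hadamard(psi) if z else psi    # Carol's qubit C = H^z A
    k = measure(c, rng)                # Carol's guess at i
    d = (1.0, 0.0) if k == 0 else (0.0, 1.0)   # she resends the collapsed |k>
    b = hadamard(d) if y else d        # Bob's B = H^y D
    j = measure(b, rng)
    if x == y:                         # only rounds that would be retained
        kept += 1
        bad += (i != j)

assert abs(bad / kept - 3 / 8) < 0.03  # Exercise 8.11: mismatch rate 3/8
```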

Once Alice and Bob generate a shared secret of the desired bit length, they can check for the equality of their secret values without revealing them. For example, if the shared secret is a 64-bit DES key, Alice can send Bob one or more plaintext–ciphertext pairs generated by the DES algorithm using her shared key. Bob also generates the ciphertexts on Alice’s plaintexts using his secret key. If the ciphertexts generated by Bob differ from those generated by Alice, Bob becomes confident that their shared secrets are different and this happened because of the presence of some adversary (or because of communication errors). They then repeat the key-exchange protocol.

Another possible way in which Alice and Bob can gain confidence about the equality of their shared secrets is the use of parity checks. Suppose Alice breaks up her secret into blocks of eight bits, computes the parity bit of each block, and sends these bits to Bob. Bob generates the parity bits on the blocks of his secret and compares the two sets of parity bits. If the shared secrets of Alice and Bob differ, the difference is revealed by this parity check with high probability.
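The parity check is easy to sketch in Python; the two bit strings below are the Alice/Bob secrets of the eavesdropped session in Table 8.2 (with 4-bit blocks, since those secrets are only 16 bits long):

```python
def parities(bits, block=4):
    """Parity bit of each block of the secret."""
    return [sum(bits[i:i + block]) % 2 for i in range(0, len(bits), block)]

alice = [0,1,1,0, 0,1,1,1, 1,0,0,0, 1,0,1,1]
bob   = [0,1,0,1, 1,1,0,1, 1,1,0,0, 1,0,1,1]

assert parities(alice) == [0, 1, 1, 1]
assert parities(bob)   == [0, 1, 0, 1]
assert parities(alice) != parities(bob)   # the corruption is detected
```

A block whose two copies differ in an odd number of positions is always caught; an even number of errors in one block slips through, which is why the check only succeeds with high probability.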

A minor variant of the key-exchange algorithm just described comes with an implementation strategy. The polarization of a photon is measured by an angle θ, 0° ≤ θ < 180°.[1] A photon polarized at an angle θ passes through a φ-filter with probability cos²(φ − θ) and gets absorbed in the filter with probability sin²(φ − θ). Therefore, photons polarized at the angles 0°, 90°, 45°, 135° can be used to represent the quantum states |0〉, |1〉, H|0〉, H|1〉, respectively. Alice and Bob use 0°- and 45°-filters. Alice makes a random choice (x) between the two filters. If x = 0, she sends a photon polarized at an angle 0° or 90°. If x = 1, a photon polarized at an angle 45° or 135° is sent. When Bob receives the photon transmitted by Alice, he makes a random guess y. If y = 0, he uses the 0°-filter to detect its polarization, and if y = 1, he uses the 45°-filter. Then, Alice and Bob reveal their choices x and y, and if the two choices agree, they share a common secret bit. See Exercise 8.12 for a mathematical formulation of this strategy.

[1] Ask a physicist!

One of the most startling features of this Bennett–Brassard algorithm (often called the BB84 algorithm) is that there have been successful experimental implementations of the strategy. The first prototype was designed by the authors themselves at the T. J. Watson Research Center. They used a quantum channel of length 32 cm. Using longer channels requires many technological barriers to be overcome. For example, fiber-optic cables tend to weaken and may even destroy the polarization of photons. Using boosters to strengthen the signal is impossible in the quantum mechanical world, since doing so produces an effect similar to eavesdropping. Interference patterns (instead of polarization) have been proposed and utilized to build longer quantum channels for key exchange. At present, Stucki et al. [293] hold the world record of performing quantum key exchange over an (underwater) channel of length 67 km between Geneva and Lausanne.

Exercise Set 8.3

8.10 We have exploited the property that H² = I in order to prove the correctness of the quantum key-exchange algorithm. Exercise 8.5 lists some other operators (X and Z) which also satisfy the same property (X² = Z² = I). Can one use one of these transforms in place of H in the quantum key-exchange algorithm?
8.11 Assume that Carol eavesdrops (in the manner described in the text) during the execution of the quantum key-exchange protocol between Alice and Bob. Derive, for the different choices of i, x and z, the following probabilities Pixz of having i ≠ j in the case x = y.

i  x  z  Pixz
0  0  0  0
0  0  1  1/2
0  1  0  1/2
0  1  1  1/2
1  0  0  0
1  0  1  1/2
1  1  0  1/2
1  1  1  1/2

If all these choices of i, x, z are equally likely, show that the probability that Carol introduces a mismatch (that is, i ≠ j) in a shared bit during a random execution of the key-exchange protocol with x = y is 3/8.

(Note that if x = y = z = 0, that is, if the execution of the algorithm proceeds entirely in the classical sense, Carol goes unnoticed. It is the application of the classically meaningless Hadamard transform that introduces the desired security in the protocol.)

8.12 In the key-exchange algorithm described in the text, Bob (and also Carol) always measures qubits in the classical basis {|0〉, |1〉}. Now, consider the following variant of this algorithm. Alice sends, as before, one of the four qubits |0〉, |1〉, H|0〉, H|1〉, depending on her choice of i and x. Bob, upon receiving the qubit A, generates a random guess y ∈ {0, 1}. If y = 0, Bob measures A in the classical basis, whereas if y = 1, Bob measures A in the basis {H|0〉, H|1〉}. After this, they exchange x and y, and retain/discard the bits as in the original algorithm.
  1. Assume that there is no eavesdropping. Argue that this modified strategy works, that is, if x = y, we have i = j, whereas if x ≠ y, then i = j with probability 1/2.

  2. Explain the role of a passive adversary (Carol) in this modified strategy.

  3. Calculate for this variant the probability that Carol introduces an error in a shared bit (when x = y).

8.4. Quantum Cryptanalysis

Quantum parallelism has been effectively exploited to design fast (polynomial-time) algorithms for some of the intractable mathematical problems discussed in Chapter 4. With the availability of quantum computers, cryptographic systems that derive their security from the intractability of these problems will become unusable (completely insecure). Nobody, however, has a proof that these intractable problems cannot have fast classical algorithms. It is interesting to wait and see which (if either) is invented first: a quantum computer or a polynomial-time classical algorithm.

Let us set up some terminology for the rest of this chapter. Let P be a unitary operator on a qubit. One can apply P individually on the i-th bit of an n-bit register. In this case, we denote the operation by Pi. If Pi is operated for each i = 1, . . . , n (in succession or simultaneously), then we abbreviate P1 · · · Pn by the short-hand notation P(n). The parentheses distinguish this operation from P^n, which is the n-fold application of P on a single qubit.

If P and Q are unitary transforms on n1- and n2-bit quantum registers respectively, we let P ⊗ Q denote the unitary transform on an (n1 + n2)-bit register, with P operating on the left n1 bits and Q on the right n2 bits of the register.

8.4.1. Shor’s Algorithm for Computing Period

Let N := 2^n for some n ∈ ℕ. Let f be a periodic function defined on non-negative integers with (least) period r, that is, f(x + kr) = f(x) for every x, k ≥ 0. Suppose further that 1 ≪ r ≤ 2^{n/2} and also that f(0), f(1), . . . , f(r − 1) are pairwise distinct. Shor proposed an algorithm for an efficient computation of the period r in this case.

Let’s first look at the problem classically. If one evaluates f at randomly chosen points, then by the birthday paradox (Exercise 2.172) one requires about √r evaluations of f on an average in order to find two different integers x and y with f(x) = f(y). But then r | (x − y). If sufficiently many such pairs (x, y) are available, the period can be obtained by computing the gcd of the integers x − y. If r is large, say, r = O(2^{n/2}), this gives us an algorithm for computing r in expected time exponential in n. Shor’s quantum algorithm determines r in expected time polynomial in n.
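The classical baseline can be sketched in a few lines of Python; the hidden period and the stand-in function f below are made up for illustration:

```python
import math, random

rng = random.Random(7)                  # arbitrary seed
N = 1 << 40
r = 823_543                             # the hidden period
f = lambda x: x % r                     # stand-in: period r, distinct on [0, r)

seen, g = {}, 0
for _ in range(100_000):
    x = rng.randrange(N)
    v = f(x)
    if v in seen and seen[v] != x:
        g = math.gcd(g, abs(x - seen[v]))   # r divides every such difference
        if g == r:
            break
    seen[v] = x

assert g == r                           # gcd of a few collisions reveals r
```

The expected number of evaluations before the first collision is Θ(√r), which for r near 2^{n/2} is exponential in n.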

Let us assume that we have an oracle Uf which, on input the 2n-bit value |x〉n|y〉n, computes |x〉n|f(x) ⊕ y〉n. We prepare a 2n-bit register A in the state |0〉n|0〉n. Then, we apply the Hadamard transform H(n) on the left n bits. By Exercise 8.8, the state of A becomes
(1/√N) Σ_{x=0}^{N−1} |x〉n |0〉n.
Supplying this state as the input to the oracle Uf yields the state
(1/√N) Σ_{x=0}^{N−1} |x〉n |f(x)〉n.
We then measure the output register (the right n bits). By the generalized Born rule, we get a value f(x0) for some x0 ∈ {0, 1, . . . , r − 1}, and the state of the register A collapses to the uniform superposition of all those |x〉|f(x)〉 for which f(x) = f(x0). By the given periodicity properties of f, the post-measurement state of the input register (the left n bits) can be written as

Equation 8.1
(1/√M) Σ_{j=0}^{M−1} |x0 + jr〉

for some M determined by the relations:

x0 + (M − 1)r < N ≤ x0 + Mr.

This is an interesting state, for if we were allowed to make copies of this state and measure the different copies, we could collect some values x0 + j1r, . . . , x0 + jkr, which in turn would reveal r with high probability. But the no-cloning theorem disallows making copies of quantum states. Shor proposed a trick to work around this difficulty. He considered the following transform:

Equation 8.2

F|x〉 = (1/√N) Σ_{y=0}^{N−1} e^{2πixy/N} |y〉.
By Exercise 8.13, F is a unitary transform. F is known as the Fourier transform. Applying F to State (8.1) transforms the input register to the state
(1/√(MN)) Σ_{y=0}^{N−1} (Σ_{j=0}^{M−1} e^{2πi(x0+jr)y/N}) |y〉.
A measurement of this state gives an integer y ∈ ℤN with probability
p(y) = (1/(MN)) |Σ_{j=0}^{M−1} e^{2πijry/N}|².
Application of the Fourier transform to State (8.1) helps us to concentrate the probabilities of measurement outcomes in strategic states. More precisely, consider a value of y of the form y = kN/r + ∊k, where −1/2 ≤ ∊k < 1/2, that is, a value of y close to an integral multiple of N/r. In this case,
p(y) = (1/(MN)) |Σ_{j=0}^{M−1} e^{2πijr∊k/N}|².
The last summation is that of a geometric series and we have
p(y) = (1/(MN)) · sin²(πMr∊k/N) / sin²(πr∊k/N).
Now, we use the inequalities 2x/π ≤ sin x ≤ x for 0 ≤ x ≤ π/2 and the facts that rM ≈ N and |∊k| ≤ 1/2 to get
p(y) ≥ 4M²/(π²MN) = 4M/(π²N) ≈ 4/(π²r).
Since N/r has about r positive integral multiples less than N and each such multiple has a closest integer yk for some k, the probability that we obtain one such yk as the outcome of the measurement is at least 4/π² = 0.40528 . . . , that is, after O(1) iterations of the above procedure we get some yk. The Fourier transform increases the likelihood of getting some yk to a level bounded below by a positive constant.
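Both key facts (F is unitary, and it concentrates the probability mass on multiples of N/r) can be checked numerically for small sizes. In the following Python sketch the parameters are illustrative and chosen with r | N, so the peaks are exact:

```python
import cmath, math

def qft(N):
    """Matrix of the Fourier transform F on C^N."""
    s = 1 / math.sqrt(N)
    return [[s * cmath.exp(2j * math.pi * x * y / N) for y in range(N)]
            for x in range(N)]

N, r, x0 = 64, 8, 3
F = qft(N)

# F is unitary: F F^dagger = I (Exercise 8.13)
for i in range(N):
    for j in range(N):
        z = sum(F[i][k] * F[j][k].conjugate() for k in range(N))
        assert abs(z - (1 if i == j else 0)) < 1e-9

# Start from State (8.1): uniform superposition over x0, x0 + r, x0 + 2r, ...
M = N // r
psi = [0j] * N
for j in range(M):
    psi[x0 + j * r] = 1 / math.sqrt(M)
out = [sum(F[y][x] * psi[x] for x in range(N)) for y in range(N)]
for y in range(N):
    peak = 1 / r if y % (N // r) == 0 else 0.0
    assert abs(abs(out[y]) ** 2 - peak) < 1e-9   # mass sits on multiples of N/r
```

When r does not divide N, the peaks are no longer exact, which is why the inequalities above are needed to bound the probability from below.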

What remains is to show that r can be retrieved from such a useful observation yk. We have |yk/N − k/r| ≤ 1/(2N). If a/b and c/d are two distinct rationals with b, d ≤ 2^{n/2} and with |yk/N − a/b| ≤ 1/(2N) and |yk/N − c/d| ≤ 1/(2N), then by the triangle inequality we have |a/b − c/d| ≤ 1/N. On the other hand, since a/b ≠ c/d, we have |a/b − c/d| ≥ 1/(bd) > 1/N (for bd < N), a contradiction. Therefore, since r ≤ 2^{n/2}, there is a unique rational k/r satisfying |yk/N − k/r| ≤ 1/(2N), and this rational k/r can be determined by efficient classical algorithms, for example, using the continued fraction expansion[2] of yk/N.

[2] Consult Zuckerman et al. [316] to learn about continued fractions and their applications in approximating real numbers.

If gcd(k, r) = 1, we get r; we can verify this by checking whether f(x) = f(x + r). If gcd(k, r) > 1, we get a factor of r. Repeating the entire procedure gives another k′/r, from which we (hopefully) get another factor of r (if not r itself). After a few (O(1)) iterations, we obtain r as the lcm of the factors obtained.
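This classical post-processing step is conveniently available in Python's standard library: Fraction.limit_denominator performs exactly the continued-fraction approximation needed here. The period and the measurement indices below are made up for illustration:

```python
from fractions import Fraction
from math import gcd

n = 20
N = 1 << n
r = 501                                 # hidden period, r <= 2^(n/2)

r_found = 1
for k in (3, 7, 11):                    # indices behind a few observed y_k
    y = round(k * N / r)                # y_k: the integer closest to kN/r
    approx = Fraction(y, N).limit_denominator(1 << (n // 2))
    # approx equals k/r in lowest terms, so its denominator divides r
    d = approx.denominator
    r_found = r_found * d // gcd(r_found, d)    # lcm of the recovered divisors

assert r_found == r
```

Here k = 3 shares the factor 3 with r = 501, so that observation alone yields only the divisor 167; the lcm over a few observations restores r.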

Much of the quantum magic is obtained by the use of the Fourier transform F on a suitably prepared quantum register. The question is then how easy it is to implement F. We will not go into the details, but only mention that a circuit consisting of basic quantum gates and of size O(n²) can be used to realize the Fourier transform (cf. Exercise 8.14).

To sum up, we have a polynomial-time (in n) randomized quantum algorithm for computing the period r of f. This leads to efficient quantum algorithms for solving many classically intractable problems of cryptographic significance.

8.4.2. Breaking RSA

Let m = pq with distinct primes p and q. We have φ(m) = (p − 1)(q − 1). Choose an RSA key pair (e, d) with gcd(e, φ(m)) = 1 and ed ≡ 1 (mod φ(m)). Given a message a ∈ ℤm, the ciphertext is b ≡ a^e (mod m). The task of a cryptanalyst is to compute a from the knowledge of m, e and b. If gcd(b, m) > 1, then this gcd is a non-trivial factor of m. So assume that gcd(b, m) = 1. But then gcd(a, m) = 1 also. Since b ≡ a^e (mod m), b is in the subgroup of ℤm* generated by a. Similarly, a ≡ b^d (mod m), that is, a is in the subgroup of ℤm* generated by b. It follows that these two subgroups are equal and, in particular, that the multiplicative orders of a and b modulo m are the same. This order, call it r, divides φ(m) and hence is ≤ (p − 1)(q − 1) < m.

Choose n ∈ ℕ with N := 2^n ≥ m² > r². The function sending x ↦ b^x (mod m) is periodic of (least) period r. By Shor’s algorithm, one computes r efficiently. Since gcd(e, φ(m)) = 1 and r | φ(m), we have gcd(e, r) = 1, that is, using the extended gcd algorithm one obtains an integer d′ with d′e ≡ 1 (mod r). But then b^{d′} ≡ a^{d′e} ≡ a (mod m).

The private key d is the inverse of e modulo φ(m). It is not necessary to compute d for decrypting b. The inverse d′ of e modulo r = ordm(a) = ordm(b) suffices.
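Once an order-finding oracle is available, the rest of the attack is pure classical arithmetic. In the Python sketch below, the "quantum step" is brute-forced on tiny hypothetical parameters:

```python
from math import gcd

p, q = 101, 113                         # toy primes (illustrative only)
m = p * q
phi = (p - 1) * (q - 1)
e = 11
assert gcd(e, phi) == 1
a = 42                                  # the plaintext
b = pow(a, e, m)                        # the ciphertext

# The quantum step: r = ord_m(b), brute-forced classically here.
r = next(k for k in range(1, m) if pow(b, k, m) == 1)

d2 = pow(e, -1, r)                      # e is invertible mod r since r | phi(m)
assert pow(b, d2, m) == a               # b^{d'} = a, without ever using phi(m)
```

Note that d2 generally differs from the private key d, yet decrypts b correctly.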

8.4.3. Factoring Integers

Let m be a composite integer that we want to factor. Choose a non-zero integer a ∈ ℤm. If gcd(a, m) > 1, then we already know a non-trivial factor of m. So assume that gcd(a, m) = 1, that is, a ∈ ℤm*. Let r := ordm(a).

As in the case of breaking RSA, choose n ∈ ℕ with N := 2^n ≥ m² > r². The function x ↦ a^x (mod m) is periodic of least period r. Shor’s algorithm computes r. If r is even, we can write:

(a^{r/2} − 1)(a^{r/2} + 1) ≡ 0 (mod m).

Since ordm(a) = r, a^{r/2} − 1 ≢ 0 (mod m). If we also have a^{r/2} + 1 ≢ 0 (mod m), then gcd(a^{r/2} + 1, m) is a non-trivial factor of m. It can be shown that the probability of finding an even r with a^{r/2} + 1 ≢ 0 (mod m) is at least half (cf. Exercise 4.9). Thus, trying a few integers a, one can factor m.
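Again the reduction from order to factor is classical; the following sketch brute-forces the order on a toy modulus of our choosing:

```python
from math import gcd

m = 8051                                # toy composite (83 * 97), illustrative
a = 2
assert gcd(a, m) == 1
# The quantum step: r = ord_m(a), brute-forced classically here.
r = next(k for k in range(1, m) if pow(a, k, m) == 1)

assert r % 2 == 0                       # this a happens to give an even order
t = pow(a, r // 2, m)
assert t != m - 1                       # ... with a^{r/2} != -1 (mod m)
g = gcd(t + 1, m)
assert 1 < g < m and m % g == 0         # a non-trivial factor of m
```

When the chosen a yields an odd order, or a^{r/2} ≡ −1 (mod m), one simply retries with another a.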

8.4.4. Computing Discrete Logarithms

A variant of Shor’s algorithm in Section 8.4.1 can be used to compute discrete logarithms in the finite field 𝔽q, q = p^s, with p prime and s ≥ 1. For the sake of simplicity, let us concentrate only on prime fields (s = 1). Let g be a generator of 𝔽p*. Our task is to compute, for a given a ∈ 𝔽p*, an integer r with a ≡ g^r (mod p). We assume that p is a large prime; in particular, p is odd.

Choose n ∈ ℕ with N := 2^n satisfying p < N < 2p. We use a 3n-bit quantum register A in which the left 2n bits constitute the input part and the right n bits the output part. The input part is initialized to the uniform superposition of all pairs (x, y) with 0 ≤ x, y ≤ p − 2, that is, A has the initial state:
(1/(p − 1)) Σ_{x=0}^{p−2} Σ_{y=0}^{p−2} |x〉n |y〉n |0〉n
(see Exercise 8.15). Then, we use an oracle

Uf : |x〉n|y〉n|z〉n ↦ |x〉n|y〉n|f(x, y) ⊕ z〉n

to compute the function f(x, y) := g^x a^{−y} (mod p) in the output register. Applying Uf transforms A to the state
(1/(p − 1)) Σ_{x=0}^{p−2} Σ_{y=0}^{p−2} |x〉 |y〉 |g^x a^{−y} mod p〉.
Measurement of the output register now gives a value z ≡ g^k (mod p) for some k ∈ {0, 1, . . . , p − 2} and causes the input register to jump to the state
(1/√(p − 1)) Σ_{y=0}^{p−2} |(ry + k) rem (p − 1)〉 |y〉.
Note that g^x a^{−y} ≡ g^k (mod p) if and only if x ≡ ry + k (mod p − 1), that is, only those pairs (x, y) that satisfy this congruence contribute to the post-measurement state. For each value of y modulo p − 1, we get a unique x ≡ ry + k (mod p − 1), that is, there are exactly p − 1 such pairs (x, y).
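This congruence structure can be verified on toy parameters of our own choosing (p, g, r and k below are all illustrative):

```python
p = 23
g = 5                                   # a generator of the nonzero residues mod 23
r = 9                                   # the hidden discrete logarithm
a = pow(g, r, p)
k = 4                                   # some fixed k, as fixed by the measurement

# All pairs (x, y) with x = ry + k mod (p-1) map to the same value g^k.
pairs = [((r * y + k) % (p - 1), y) for y in range(p - 1)]
target = pow(g, k, p)
assert all(pow(g, x, p) * pow(a, -y, p) % p == target for x, y in pairs)

# Two pairs whose y-difference is invertible mod p-1 already reveal r.
(x1, y1), (x2, y2) = pairs[2], pairs[5]
r_rec = (x1 - x2) * pow(y1 - y2, -1, p - 1) % (p - 1)
assert r_rec == r
```

Of course, the quantum state offers only one such pair per run; the Fourier transform below is what substitutes for the forbidden second copy.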

If we were allowed to make copies of this state and observe two copies separately, we would get pairs (x1, y1) and (x2, y2) with x1 − ry1 ≡ x2 − ry2 ≡ k (mod p − 1). Now, if gcd(y1 − y2, p − 1) = 1, we would get r ≡ (y1 − y2)^{−1}(x1 − x2) (mod p − 1). But we are not allowed to copy quantum states. So Shor used his old trick, that is, applied the Fourier transforms
F ⊗ F (one copy of F acting on each n-bit half of the input register)
to obtain the state

A measurement of the input register at this state yields a pair (u, v) with probability:

Equation 8.3


As in Shor’s period-finding algorithm, we now need to identify a set of useful pairs (u, v) which are sufficiently many in number so as to make the probability of observing one of them bounded below by a positive constant. We also need to demonstrate how a useful pair can reveal the unknown discrete logarithm r of a. The jugglery with inequalities and approximations is much more involved in this case. Let us still make a patient attempt to see the end of the story.

First, we eliminate one of x, y from Equation (8.3). Since x ≡ ry + k (mod p − 1) and 0 ≤ x ≤ p − 2, we have x = (ry + k) rem (p − 1), and this value can be substituted for x in Equation (8.3). Let j be the integer closest to u(p − 1)/N, that is, u(p − 1) = jN + ∊ with −N/2 < ∊ ≤ N/2. This yields

Equation 8.4


where

Equation 8.5


Since is an integer, substituting Equation (8.4) in Equation (8.3) gives

Writing S = lN + σ with –N/2 < σ ≤ N/2 then gives

We now impose the usefulness conditions on u, v:

Equation 8.6


Equation 8.7


Involved calculations show that the probability pu,v for a (u, v) satisfying these two conditions is at least . Let us now see how many pairs (u, v) satisfy the conditions. From Equation (8.5), it follows that for each u there exists a unique v, such that Condition (8.6) is satisfied. Condition (8.7), on the other hand, involves only u. If w := v2(p – 1), then 2w must divide ∊. For each multiple of 2w not exceeding N/12 in absolute value, we get 2w distinct solutions for u modulo N. (We are solving for u the congruence u(p – 1) ≡ ∊ (mod 2n).) There is a total of at least N/12 of them. Therefore, the probability of making any one of the useful observations (u, v) is at least , since N < 2p.

We finally explain the extraction of r from a useful observation (u, v). Condition (8.6) and Equation (8.5) give . Dividing throughout by N and using the fact that u(p – 1) = jN + ∊, we get

that is, the fractional part of the left side must lie in an interval of width determined by the bound above. The measurement of the input gives us v and we know N. Approximating to the nearest admissible multiple, we get rj ≡ λ (mod p − 1) for a known integer λ. Now, j, being the integer closest to u(p − 1)/N, is also known to us. If gcd(j, p − 1) = 1, we have r ≡ j^{−1}λ (mod p − 1). We don’t go into the details of determining the likelihood of the invertibility of j modulo p − 1. A careful analysis shows that Shor’s quantum discrete-log algorithm runs in probabilistic polynomial time (in n).

Exercise Set 8.4

8.13 Let F be the Fourier Transform (8.2). For basis vectors |x〉 and |x′〉, show that
〈Fx | Fx′〉 = 1 if x = x′, and 0 otherwise.
Conclude that F is a unitary transform.

8.14 Let N = 2^n. Let x, y ∈ ℤN have binary expansions (xn−1 · · · x1x0)₂ and (yn−1 · · · y1y0)₂ respectively.
  1. Show that xy/N equals an integer plus the quantity

    yn–1 (.x0) + yn–2(.x1x0) + yn–3(.x2x1x0) + · · · + y0(.xn–1 xn–2 . . . x0),

    where (.xj xj−1 . . . x0) denotes the binary fraction xj/2 + xj−1/4 + · · · + x0/2^{j+1}.

  2. Deduce that the quantum Fourier Transform (8.2) can be written as

    where the i-th expression in parentheses applies to the i-th bit from the left.

8.15 Let n ∈ ℕ, N := 2^n and t ∈ {1, 2, . . . , N}. Consider an (n + 1)-bit quantum register with the input consisting of the left n bits and the output the rightmost bit. Suppose there is an oracle Uf that takes an n-bit input x and outputs the bit f(x), which is 1 if x < t, and 0 otherwise.

First prepare the register in the state (1/√N) Σ_{x=0}^{N−1} |x〉 |0〉. Then, apply Uf on this register and finally measure the output bit. Describe the state of the input register after this measurement depending on the outcome of the measurement.

8.16 Recall that the Fourier Transform (8.2) is defined for N equal to a power of 2. It turns out that for such values of N the quantum Fourier transform is easy to implement. For this exercise, assume hypothetically that one can efficiently implement F for other values of N too. In particular, take N = p − 1 in Shor’s quantum discrete-log algorithm. Show that in this case, the probability pu,v of Equation (8.3) becomes:
pu,v = 1/(p − 1) if ru + v ≡ 0 (mod p − 1), and pu,v = 0 otherwise.
Conclude that an outcome (u, v) of measuring the input register yields

r ≡ −u^{−1}v (mod p − 1),

provided gcd(u, p – 1) = 1.

Chapter Summary

This chapter is a gentle introduction to the recent applications of quantum computation in public-key cryptography. These developments have both good and bad impacts for cryptologists. It is still a big question whether a quantum computer can ever be manufactured, so at present a study of quantum cryptology is mostly theoretical in nature.

Quantum mechanics is governed by a set of four axioms that define a quantum mechanical system and prescribe its properties. A quantum bit (qubit) is a quantum mechanical system that has two orthogonal states |0〉 and |1〉. A quantum register is a collection of qubits of a fixed size.

As an example of what we can gain by using quantum algorithms, we first describe the Deutsch algorithm that determines whether a function f : {0, 1} → {0, 1} is constant by invoking f only once. A classical algorithm requires two invocations.

Next we present the BB84 algorithm for key exchange over a quantum mechanical channel. The algorithm guarantees that eavesdropping corrupts the shared secret and is therefore detectable. This algorithm has been implemented in hardware, and key agreement has been carried out over a channel of length 67 km.

Finally, we describe Shor’s polynomial-time quantum algorithms for factoring integers and for computing discrete logarithms in finite fields. These algorithms are based on a technique called quantum Fourier transform.

If quantum computers can ever be realized, RSA and most other popular cryptosystems described and not described in this book will forfeit all security guarantees. And what will happen to this book? If you don’t possess a copy of this wonderful book, just rush to your nearest book store now—they have not yet mastered the quantum technology!

Suggestions for Further Reading

There was a time when the newspapers said that only twelve men understood the theory of relativity. I do not believe there ever was such a time . . . On the other hand, I think I can safely say that nobody understands quantum mechanics.

—Richard Feynman, The Character of Physical Law, BBC, 1965

Quantum mechanics came into existence, when Werner Heisenberg, at the age of 25, proposed the uncertainty principle in 1927. It created an immediate stir in the physics community. Eventually Heisenberg and Niels Bohr came up with an interpretation of quantum mechanics, known as the Copenhagen interpretation. While many physicists (like Max Born, Wolfgang Pauli and John von Neumann) subscribed to this interpretation, many other eminent ones (including Albert Einstein, Erwin Schrödinger, Max Planck and Bertrand Russell) did not. Interested readers may consult textbooks by Sakurai [255] and Schiff [258] to study this fascinating area of fundamental science.[3]

[3] Well! We are not physicists. These books are followed in graduate and advanced undergraduate courses in many institutes and universities.

For a comprehensive treatment of quantum computation (including cryptographic and cryptanalytic quantum algorithms), we refer the reader to the book by Nielsen and Chuang [218]. Mermin’s paper [197] and course notes [198] are also good sources for learning quantum mechanics and computation, and are suitable for computer scientists. Preskill’s course notes [244] are also useful, though a bit more physics-oriented. The very readable article [243] by Preskill on the realizability of quantum computers is also worth mentioning in this context. The first known quantum algorithm is due to Deutsch [75].

Bennett and Brassard’s quantum key-exchange algorithm (BB84) appeared in [20]. The implementation due to Stucki et al. of this algorithm is reported in [293].

Shor’s polynomial-time quantum factorization and discrete-log algorithms are described in [271]. All the details missing in Section 8.4.4 can be found in this paper. No polynomial-time quantum algorithms are known to solve the elliptic curve discrete logarithm problem. Proos and Zalka [245] present an extension of Shor’s algorithm for a special class of elliptic curves. See [146] for an adaptation of this algorithm applicable to fields of characteristic 2.

Appendices

 


A. Symmetric Techniques

A.1Introduction
A.2Block Ciphers
A.3Stream Ciphers
A.4Hash Functions

Sour, sweet, bitter, pungent, all must be tasted.

—Chinese Proverb

Unless we change direction, we are likely to end up where we are going.

—Anonymous

Not everything that can be counted counts, and not everything that counts can be counted.

—Albert Einstein

A.1. Introduction

Cryptography, today, cannot bank solely on public-key (that is, asymmetric) algorithms. Secret-key (that is, symmetric) techniques also have important roles to play. This chapter is an attempt to introduce the reader to some rudimentary notions of symmetric cryptography. The sketchy account that follows lacks both the depth and the breadth of a comprehensive treatment. Given the focus of this book, Appendix A could have been omitted. Nonetheless, some attention to symmetric technology is never irrelevant for any book on cryptology.

It remains debatable whether hash functions can be treated under the banner of this chapter—a hash function need not even use a key. If the reader is willing to accept symmetric as an abbreviation for not asymmetric, some justifications can perhaps be given. How does it matter anyway?

A.2. Block Ciphers

Block ciphers encrypt plaintext messages in blocks of fixed lengths and are more ubiquitously used than public-key encryption routines. In a sense, public-key encryption is also block encryption. Since public-key routines are much slower than (secret-key) block ciphers, it is customary to use public-key algorithms only in specific situations, for example, for encrypting single blocks of data, like keys of symmetric ciphers.

In the rest of this chapter, we use the word bit in the conventional sense, that is, to denote a quantity that can take only two possible values, 0 and 1. It is convenient to use the symbol {0, 1} to refer to this set of two values. We also let {0, 1}^m stand for the set of all bit strings of length m. Whenever we plan to refer to the field (or group) structure of these sets, we will use the alternative notations F2 and F2^m.

Definition A.1.

A block cipher f of block-size n and of key-size r is a map

   f : {0, 1}^n × {0, 1}^r → {0, 1}^n

that encrypts a plaintext block m of bit length n to a ciphertext block c = f(m, K) of bit length n under a key K, a bit string of length r. To ensure unique decryption, the map

   fK : {0, 1}^n → {0, 1}^n,   fK(m) := f(m, K),

for a fixed key K has to be a permutation of (that is, a bijective function on) {0, 1}^n. In that case, the decryption of c to get back m is carried out as m = fK–1(c).

A good block cipher has the following desirable properties:

A block cipher provably possessing all these good characteristics (in particular, the randomness properties) is difficult to construct in practice. Practical block ciphers are designed for reasonably big n and r and come with the hope of representing reasonably unpredictable permutations. We dub a block cipher good or safe if it stands the test of time. Table A.1 lists some widely used block ciphers.

Table A.1. Some popular block ciphers
Name                                                                                                          n              r
DES (Data Encryption Standard)                                                                                64             56
FEAL (Fast Data Encipherment Algorithm)                                                                       64             64
SAFER (Secure And Fast Encryption Routine)                                                                    64             64
IDEA (International Data Encryption Algorithm)                                                                64             128
Blowfish                                                                                                      64             ≤ 448
Rijndael, accepted as AES (Advanced Encryption Standard) by NIST (National Institute of Standards
and Technology, a US government organization)                                                                 128/192/256    128/192/256

A.2.1. A Case Study: DES

The data encryption standard (DES) was proposed as a federal information processing standard (FIPS) in 1975. DES has been the most popular and the most widely used among all block ciphers ever designed. Although its relatively small key-size offers questionable security under today’s computing power, DES still enjoys large-scale deployment in not-so-serious cryptographic applications.

DES encryption requires a 64-bit plaintext block m and a 56-bit key K.[1] Let us plan to use the notations DESK and DESK–1 to stand respectively for the DES encryption and decryption functions under the key K.

[1] A DES key K = k1k2 . . . k64 is actually a 64-bit string. Only 56 bits of K are used for encryption. The remaining 8 bits are used as parity-check bits. Specifically, for each i = 1, . . . , 8 the bit k8i is adjusted so that the i-th byte (k8i–7 k8i–6 . . . k8i) has an odd number of one-bits.
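The parity-check convention of this footnote can be sketched in a few lines of Python; the function name adjust_des_parity is ours, not the book's:

```python
def adjust_des_parity(key8: bytes) -> bytes:
    """Force each byte of a DES key to have an odd number of one-bits.

    In the book's notation k1 k2 ... k64, the parity bit k_8i is the
    least significant bit of the i-th byte; the other 7 bits are key bits.
    """
    out = bytearray()
    for b in key8:
        key_bits = b & 0xFE                     # the 7 actual key bits
        ones = bin(key_bits).count("1")
        out.append(key_bits | (1 if ones % 2 == 0 else 0))  # set parity bit if needed
    return bytes(out)

# After adjustment, every byte has odd parity.
key = adjust_des_parity(bytes.fromhex("0022446688aaccee"))
assert all(bin(b).count("1") % 2 == 1 for b in key)
```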

DES key schedule

The DES algorithm first computes sixteen 48-bit keys K1, K2, . . . , K16 from K using a procedure known as the DES key schedule described in Algorithm A.1. These 16 keys are used in the 16 rounds of encryption. The key schedule uses two fixed permutations PC1 and PC2 described after Algorithm A.1 and to be read in the row-major order. Here, PC is an abbreviation for permuted choice.

Algorithm A.1. The DES key schedule

Input: A DES key K = k1k2 . . . k64 (containing the parity-check bits).

Output: Sixteen 48-bit round keys K1, K2, . . . , K16.

Steps:

Use PC1 to generate the 56-bit string U0 := PC1(K).
Write U0 = C0 ‖ D0 with C0 and D0 each of length 28.
for i = 1, 2, . . . , 16 {
   Take s := 1 if i ∈ {1, 2, 9, 16}, and s := 2 otherwise.
   Cyclically left shift Ci – 1 by s bits to get Ci.
   Cyclically left shift Di – 1 by s bits to get Di.
   Let Ui := Ci ‖ Di.
   Compute the i-th round key Ki := PC2(Ui) = u14 u17 u11 . . . u29 u32.
}

PC1
57 49 41 33 25 17  9
 1 58 50 42 34 26 18
10  2 59 51 43 35 27
19 11  3 60 52 44 36
63 55 47 39 31 23 15
 7 62 54 46 38 30 22
14  6 61 53 45 37 29
21 13  5 28 20 12  4

PC2
14 17 11 24  1  5
 3 28 15  6 21 10
23 19 12  4 26  8
16  7 27 20 13  2
41 52 31 37 47 55
30 40 51 45 33 48
44 49 39 56 34 53
46 42 50 36 29 32

DES encryption

DES encryption, as described in Algorithm A.2, proceeds in 16 rounds. The i-th round uses the key Ki (obtained from the key schedule) in tandem with the encryption primitive e. A fixed permutation IP and its inverse IP–1 are also used.[2]

[2] A block cipher that executes several encryption rounds with the i-th round computing the two halves as Li := Ri – 1 and Ri := Li – 1 ⊕ e(Ri – 1, Ki) for some round key Ki and for some encryption primitive e, is called a Feistel cipher. Most popular block ciphers mentioned earlier are of this type. Rijndael is an exception, and its acceptance as the new standard has been interpreted as an end of the Feistel dynasty.

To complete the description of DES encryption, it remains to specify the round encryption function e, which can be compactly depicted as:

e(X, J) := P(S(E(X) ⊕ J)),

Algorithm A.2. DES encryption

Input: Plaintext block m = m1m2 . . . m64 and the round keys K1, . . . , K16.

Output: The ciphertext block c = c1c2 . . . c64 = DESK(m).

Steps:

Apply the initial permutation on m to get
     V := IP(m).
Write V = L0 ‖ R0 with L0 and R0 each of length 32.
for i = 1, 2, . . . , 16 {
   /* The i-th encryption round */
   Li := Ri – 1.
   Ri := Li – 1 ⊕ e(Ri – 1, Ki).
}
Let W := R16 ‖ L16.
Apply the inverse of the initial permutation on W to get the ciphertext block
   c := IP–1(W).

IP
58 50 42 34 26 18 10  2
60 52 44 36 28 20 12  4
62 54 46 38 30 22 14  6
64 56 48 40 32 24 16  8
57 49 41 33 25 17  9  1
59 51 43 35 27 19 11  3
61 53 45 37 29 21 13  5
63 55 47 39 31 23 15  7

IP–1
40  8 48 16 56 24 64 32
39  7 47 15 55 23 63 31
38  6 46 14 54 22 62 30
37  5 45 13 53 21 61 29
36  4 44 12 52 20 60 28
35  3 43 11 51 19 59 27
34  2 42 10 50 18 58 26
33  1 41  9 49 17 57 25

where E : {0, 1}^32 → {0, 1}^48 is an expansion function, S : {0, 1}^48 → {0, 1}^32 is a contraction function and P is a fixed permutation of {0, 1}^32 (called the permutation function). S uses eight S-boxes (substitution boxes) S1, S2, . . . , S8. Each S-box Sj is a 4 × 16 matrix with each row a permutation of 0, 1, 2, . . . , 15 and is used to convert a 6-bit string y1y2y3y4y5y6 to a 4-bit string z1z2z3z4 as follows. Let μ denote the integer with binary representation y1y6 and ν the integer with binary representation y2y3y4y5. Then, z1z2z3z4 is the 4-bit binary representation of the (μ, ν)-th entry in the matrix Sj. (Here, the numbering of the rows and columns starts from 0.) In this case, we write Sj(y1y2y3y4y5y6) = z1z2z3z4. Algorithm A.3 provides the description of e.
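As a sketch (in Python; the function name and the bit-string interface are ours), the row/column extraction for one S-box looks like this, using S1 from the tables given later in this section:

```python
# DES S-box S1 (the first of the eight boxes tabulated in this section)
S1 = [
    [14,  4, 13,  1,  2, 15, 11,  8,  3, 10,  6, 12,  5,  9,  0,  7],
    [ 0, 15,  7,  4, 14,  2, 13,  1, 10,  6, 12, 11,  9,  5,  3,  8],
    [ 4,  1, 14,  8, 13,  6,  2, 11, 15, 12,  9,  7,  3, 10,  5,  0],
    [15, 12,  8,  2,  4,  9,  1,  7,  5, 11,  3, 14, 10,  0,  6, 13],
]

def sbox_lookup(sbox, y: str) -> str:
    """Apply a DES S-box to a 6-bit string y1...y6, returning 4 bits."""
    mu = int(y[0] + y[5], 2)   # row index from bits y1 and y6
    nu = int(y[1:5], 2)        # column index from bits y2 y3 y4 y5
    return format(sbox[mu][nu], "04b")

# y1 y6 = 01 selects row 1; y2 y3 y4 y5 = 1101 selects column 13; S1[1][13] = 5.
assert sbox_lookup(S1, "011011") == "0101"
```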

Algorithm A.3. The DES round encryption primitive e

Input: X ∈ {0, 1}^32 and J ∈ {0, 1}^48.

Output: e(X, J).

Steps:

Y := E(X) ⊕ J (where E(x1x2 . . . x32) = x32 x1 x2 . . . x32 x1).
Write Y = Y1 ‖ Y2 ‖ . . . ‖ Y8 with each Yj of length 6.
for j = 1, 2, . . . , 8 { Zj := Sj(Yj). }
e(X, J) := P(Z1 ‖ Z2 ‖ . . . ‖ Z8) (where P(z1z2 . . . z32) = z16 z7 z20 . . . z4 z25).

The tables for E and P are as follows.

E
32  1  2  3  4  5
 4  5  6  7  8  9
 8  9 10 11 12 13
12 13 14 15 16 17
16 17 18 19 20 21
20 21 22 23 24 25
24 25 26 27 28 29
28 29 30 31 32  1

P
16  7 20 21
29 12 28 17
 1 15 23 26
 5 18 31 10
 2  8 24 14
32 27  3  9
19 13 30  6
22 11  4 25

Finally, the eight S-boxes are presented:

S1
14  4 13  1  2 15 11  8  3 10  6 12  5  9  0  7
 0 15  7  4 14  2 13  1 10  6 12 11  9  5  3  8
 4  1 14  8 13  6  2 11 15 12  9  7  3 10  5  0
15 12  8  2  4  9  1  7  5 11  3 14 10  0  6 13

S2
15  1  8 14  6 11  3  4  9  7  2 13 12  0  5 10
 3 13  4  7 15  2  8 14 12  0  1 10  6  9 11  5
 0 14  7 11 10  4 13  1  5  8 12  6  9  3  2 15
13  8 10  1  3 15  4  2 11  6  7 12  0  5 14  9

S3
10  0  9 14  6  3 15  5  1 13 12  7 11  4  2  8
13  7  0  9  3  4  6 10  2  8  5 14 12 11 15  1
13  6  4  9  8 15  3  0 11  1  2 12  5 10 14  7
 1 10 13  0  6  9  8  7  4 15 14  3 11  5  2 12

S4
 7 13 14  3  0  6  9 10  1  2  8  5 11 12  4 15
13  8 11  5  6 15  0  3  4  7  2 12  1 10 14  9
10  6  9  0 12 11  7 13 15  1  3 14  5  2  8  4
 3 15  0  6 10  1 13  8  9  4  5 11 12  7  2 14

S5
 2 12  4  1  7 10 11  6  8  5  3 15 13  0 14  9
14 11  2 12  4  7 13  1  5  0 15 10  3  9  8  6
 4  2  1 11 10 13  7  8 15  9 12  5  6  3  0 14
11  8 12  7  1 14  2 13  6 15  0  9 10  4  5  3

S6
12  1 10 15  9  2  6  8  0 13  3  4 14  7  5 11
10 15  4  2  7 12  9  5  6  1 13 14  0 11  3  8
 9 14 15  5  2  8 12  3  7  0  4 10  1 13 11  6
 4  3  2 12  9  5 15 10 11 14  1  7  6  0  8 13

S7
 4 11  2 14 15  0  8 13  3 12  9  7  5 10  6  1
13  0 11  7  4  9  1 10 14  3  5 12  2 15  8  6
 1  4 11 13 12  3  7 14 10 15  6  8  0  5  9  2
 6 11 13  8  1  4 10  7  9  5  0 15 14  2  3 12

S8
13  2  8  4  6 15 11  1 10  9  3 14  5  0 12  7
 1 15 13  8 10  3  7  4 12  5  6 11  0 14  9  2
 7 11  4  1  9 12 14  2  0  6 10 13 15  3  5  8
 2  1 14  7  4 10  8 13 15 12  9  0  3  5  6 11

DES decryption

DES decryption is analogous to DES encryption. To obtain m = DESK–1(c), one first computes the round keys K1, K2, . . . , K16 using Algorithm A.1. One then calls a minor variant of Algorithm A.2. First, the roles of m and c are interchanged. That is, one inputs c instead of m, and obtains m in place of c as output. Moreover, the right half Ri in the i-th round is computed as Ri := Li – 1 ⊕ e(Ri – 1, K17 – i). In other words, DES decryption is the same as DES encryption, only with the sequence of using the keys K1, K2, . . . , K16 reversed. Solve Exercise A.1 in order to establish the correctness of this decryption procedure.
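This key-reversal property holds for any Feistel network (Footnote 2), not just DES. The sketch below checks it on a toy 16-bit Feistel cipher; the round primitive toy_e and all constants are made up for illustration:

```python
def feistel(block, round_keys, e):
    """Generic Feistel network on a pair of 16-bit halves, with final swap."""
    L, R = block
    for k in round_keys:
        L, R = R, L ^ e(R, k)      # Li := Ri-1, Ri := Li-1 XOR e(Ri-1, Ki)
    return (R, L)                  # final swap, mirroring W := R16 || L16

def toy_e(x, k):
    # made-up round primitive; a Feistel round function need not be invertible
    return ((x * 0x9E37 + k) ^ (x >> 3)) & 0xFFFF

keys = [0x1234, 0x5678, 0x9ABC, 0xDEF0]
m = (0xCAFE, 0xBABE)
c = feistel(m, keys, toy_e)

# Decryption is the same procedure with the round keys in reverse order.
assert feistel(c, keys[::-1], toy_e) == m
```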

DES test vectors

Some test vectors for DES are given in Table A.2.

Table A.2. DES test vectors
Key                Plaintext block    Ciphertext block
0101010101010101   0000000000000000   8ca64de9c1b123a7
fefefefefefefefe   ffffffffffffffff   7359b2163e4edc58
3101010101010101   1000000000000001   958e6e627a05557B
1010101010101010   1111111111111111   f40379ab9e0ec533
0123456789abcdef   1111111111111111   17668dfc7292532d
1010101010101010   0123456789abcdef   8a5ae1f81ab8f2dd
fedcba9876543210   0123456789abcdef   ed39d950fa74bcc4

Cryptanalysis of DES

DES, being a popular block cipher, has gone through a good amount of cryptanalytic study. At present, linear cryptanalysis and differential cryptanalysis are the most sophisticated attacks on DES. But the biggest problem with DES is its relatively small key size (56 bits). An exhaustive key search for a given plaintext–ciphertext pair needs carrying out a maximum of 2^56 encryptions in order to obtain the correct key. But how big is this number 2^56 = 72,057,594,037,927,936 (nearly 72 quadrillion) in a cryptographic sense?

In order to review this question, RSA Security Inc. posed several challenges for obtaining the DES key from given plaintext–ciphertext pairs. The first challenge, posed in January 1997, was broken by Rocke Verser of Loveland, Colorado, with approximately 96 days of computing. DES Challenge II-1 was broken in February 1998 by distributed.net with 41 days of computing, and DES Challenge II-2 was cracked in July 1998 by the Electronic Frontier Foundation (EFF) in just 56 hours. Finally, DES Challenge III was broken in a record 22 hours and 15 minutes in January 1999. The computations were carried out on EFF’s supercomputer Deep Crack with collaborative efforts from nearly 10^5 PCs on the Internet guided by distributed.net. These figures demonstrate that DES offers hardly any security against a motivated adversary.

Another problem with DES is that its design criteria (most importantly, the objectives behind choosing the particular S-boxes) were never made public. Chances remain that there are hidden backdoors, though none has been discovered to date.

A.2.2. The Advanced Standard: AES

The advanced encryption standard (AES) [219] has superseded the older standard DES. The Rijndael cipher designed by Daemen and Rijmen has been accepted as the advanced standard. As mentioned in Footnote 2, Rijndael is not a Feistel cipher. Its working is based on the arithmetic in the finite field F256 and in the finite ring A = F256[Y]/〈Y^4 + 1〉.

Data representation

AES encrypts data in blocks of 128 bits. Let B = b0b1 . . . b127 be a block of data, where each bi is a bit. Keeping in view typical 32-bit processors, each such block B is represented as a sequence of four 32-bit words, that is, B = B0B1B2B3, where Bi represents the bit string b32ib32i+1 . . . b32i+31. Each word C = c0c1 . . . c31, in turn, is viewed as a sequence of four octets, that is, C = C0C1C2C3, where Ci stores the bit string c8ic8i+1 . . . c8i+7. Each octet is identified with an element of the field F256, whereas an entire 32-bit word is identified with an element of the ring A, as described next.

The field F256 is represented as F2[X]/〈f(X)〉, where f(X) is the irreducible polynomial X^8 + X^4 + X^3 + X + 1. Let x := X + 〈f(X)〉. The element d7x^7 + d6x^6 + · · · + d1x + d0 (each di ∈ {0, 1}) is identified with the octet d7d6 . . . d1d0. Thus, the i-th octet c8ic8i+1 . . . c8i+7 in a word is treated as the finite field element c8i x^7 + c8i+1 x^6 + · · · + c8i+6 x + c8i+7.

Now, let us explain the interpretation of a 32-bit word C = C0C1C2C3. The F256-algebra A = F256[Y]/〈Y^4 + 1〉 is not a field, since the polynomial Y^4 + 1 is reducible (over F2 and so over F256). However, each element β of A can be uniquely expressed as a polynomial β = α3y^3 + α2y^2 + α1y + α0, where y := Y + 〈Y^4 + 1〉 and where each αi is an element of F256. As described in the last paragraph, each αi is represented as an octet. We take Ci to be the octet representing α3 – i, that is, the 32-bit word α3α2α1α0 stands for the element α3y^3 + α2y^2 + α1y + α0.

F256 and A are rings and hence equipped with arithmetic operations (addition and multiplication). These operations are different from the usual addition and multiplication operations defined on octets and words. For example, the addition of two octets or words under the AES interpretation is the same as bit-wise XOR of octets or words. The AES multiplication of octets and words, on the other hand, involves polynomial arithmetic and reduction modulo the defining polynomials and so cannot be expressed so simply as addition. To resolve ambiguities, let us plan to denote the multiplication of F256 by ⊙ and that of A by ⊗, whereas regular multiplication symbols (·, × and juxtaposition) stand for the standard multiplication on octets or words. Exercises A.5, A.6 and A.7 discuss efficient implementations of the arithmetic in F256 and A.
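A minimal sketch of the octet product ⊙ (essentially Exercise A.5, with shift-and-XOR reduction modulo f(X)); the function name is ours, and the two worked products are standard textbook examples, not taken from this section:

```python
def gf256_mul(a: int, b: int) -> int:
    """Multiply two octets in F256 = F2[X]/<X^8 + X^4 + X^3 + X + 1>."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a            # add this power-of-x multiple of a
        carry = a & 0x80
        a = (a << 1) & 0xFF        # multiply a by x ...
        if carry:
            a ^= 0x1B              # ... reducing X^8 = X^4 + X^3 + X + 1
        b >>= 1
    return result

# Worked examples: {57} . {83} = {c1}, and {53} . {ca} = {01},
# so 53 and ca are multiplicative inverses of each other.
assert gf256_mul(0x57, 0x83) == 0xC1
assert gf256_mul(0x53, 0xCA) == 0x01
```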

Every non-zero element α of F256 is invertible; the inverse is denoted by α–1 and can be computed by the extended gcd algorithm on polynomials over F2. With an abuse of notation, we take 0–1 := 0. Not every non-zero element of A is invertible (under the multiplication of A). The AES algorithm uses the following invertible element β := 03010102 (in hex notation), that is, β = [03]y^3 + [01]y^2 + [01]y + [02]; its inverse is β–1 = 0b0d090e.

The AES algorithm uses an object called a state, comprising 16 octets arranged in a 4 × 4 array. Each message block also consists of 16 octets. Let M = μ0μ1 . . . μ15 be a message block (of 16 octets). This block is translated to a state as follows:

Equation A.1

   sr,c := μ4c+r for 0 ≤ r ≤ 3, 0 ≤ c ≤ 3.
Thus, each word in the block is relocated in a column of the state. At the end of the encryption procedure, AES makes the reverse translation of a state to a block:

Equation A.2

   γ4c+r := sr,c for 0 ≤ r ≤ 3, 0 ≤ c ≤ 3.
AES key schedule

A collection of round keys is generated from the given AES key K. The number of rounds of the AES encryption algorithm depends on the size of the key. Let us denote the number of words in the AES key by Nk and the corresponding number of rounds by Nr. We have:

   Nk = 4 (128-bit key), Nr = 10;
   Nk = 6 (192-bit key), Nr = 12;
   Nk = 8 (256-bit key), Nr = 14.

One first generates an initial 128-bit key K0K1K2K3. Subsequently, for the i-th round, 1 ≤ i ≤ Nr, a 128-bit key K4iK4i+1K4i+2K4i+3 is required. Here, each Kj is a 32-bit word. The key schedule (also called key expansion) generates a total of 4(Nr + 1) words K0, K1, . . . , K4Nr+3 from the given secret key K using a procedure described in Algorithm A.4. Here, (02)^(j – 1) stands for the octet that represents the element x^(j – 1) of F256. The following table summarizes these values for j = 1, 2, . . . , 15.

j           1   2   3   4   5   6   7   8   9   10  11  12  13  14  15
x^(j – 1)   01  02  04  08  10  20  40  80  1b  36  6c  d8  ab  4d  9a
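The octets in this table can be regenerated by repeated multiplication by x; a short sketch (xtime is a conventional name for this doubling, not one used in the text):

```python
def xtime(a: int) -> int:
    """Multiply an octet by x (the octet 02) in F256."""
    a <<= 1
    return (a ^ 0x1B) & 0xFF if a & 0x100 else a   # reduce modulo f(X) if needed

# The round-constant octets (02)^(j-1) for j = 1, ..., 15:
rcon, a = [], 0x01
for _ in range(15):
    rcon.append(a)
    a = xtime(a)

assert rcon == [0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80,
                0x1B, 0x36, 0x6C, 0xD8, 0xAB, 0x4D, 0x9A]
```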

The transformation SubWord on a word T = τ0τ1τ2τ3 is the octet-wise application of AES S-box substitution SubOctet, that is,

SubWord(T) = SubOctet(τ0) ‖ SubOctet(τ1) ‖ SubOctet(τ2) ‖ SubOctet(τ3).

Algorithm A.4. AES key schedule

Input: (Nk and) the secret key K = κ0κ1 ... κ4Nk – 1, where each κi is an octet.

Output: The expanded keys K0, K1, . . . , K4Nr+3.

Steps:

/* Initially copy the bytes of K */
for i = 0, 1, . . . , Nk – 1 { Ki := κ4iκ4i+1κ4i+2κ4i+3. }

/* Recursively define the round keys */
for i = Nk, Nk + 1, . . . , 4Nr + 3 {
      T := Ki – 1;       /* T is a temporary word variable. */
      /* Let T = τ0τ1τ2τ3where each τi is an octet. */
      if (i rem Nk = 0) { T := SubWord(τ1τ2τ3τ0) ⊕ [(02)^((i/Nk) – 1) ‖ 000000]. }
      else if (Nk > 6) and (i rem Nk = 4) { T := SubWord(T). }
      Ki := KiNk ⊕ T.
}

The transformation SubOctet is also used in each encryption round and is now described. Let A = a0a1 . . . a7 be an octet, identified with an element of F256 as mentioned earlier. Let B = b0b1 . . . b7 denote the octet representing the inverse of this finite field element. (We take 0–1 = 0.) One then applies the following affine transformation on B to generate the final value C := SubOctet(A) := c0c1 . . . c7. Here, D = d0d1 . . . d7 is the constant octet 63 = 01100011.

Equation A.3

   ci := bi ⊕ b(i+1) mod 8 ⊕ b(i+2) mod 8 ⊕ b(i+3) mod 8 ⊕ b(i+4) mod 8 ⊕ di,   i = 0, 1, . . . , 7.
In order to speed up this octet substitution, one may use table lookup. Since the output octet C depends only on the input octet A, one can precompute a table of values of SubOctet(A) for the 256 possible values of A. This list is given in Table A.3. The table is to be read in the row-major fashion. In other words, if hi and lo respectively represent the most and the least significant four bits of A, then SubOctet(A) can be read off from the entry in the table having row number hi and column number lo. For example, SubOctet(a7) = 5c. In an actual implementation, a one-dimensional array is to be used. We use a two-dimensional format in Table A.3 for the sake of clarity of presentation.

Table A.3. AES S-box
     0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
0   63 7c 77 7b f2 6b 6f c5 30 01 67 2b fe d7 ab 76
1   ca 82 c9 7d fa 59 47 f0 ad d4 a2 af 9c a4 72 c0
2   b7 fd 93 26 36 3f f7 cc 34 a5 e5 f1 71 d8 31 15
3   04 c7 23 c3 18 96 05 9a 07 12 80 e2 eb 27 b2 75
4   09 83 2c 1a 1b 6e 5a a0 52 3b d6 b3 29 e3 2f 84
5   53 d1 00 ed 20 fc b1 5b 6a cb be 39 4a 4c 58 cf
6   d0 ef aa fb 43 4d 33 85 45 f9 02 7f 50 3c 9f a8
7   51 a3 40 8f 92 9d 38 f5 bc b6 da 21 10 ff f3 d2
8   cd 0c 13 ec 5f 97 44 17 c4 a7 7e 3d 64 5d 19 73
9   60 81 4f dc 22 2a 90 88 46 ee b8 14 de 5e 0b db
a   e0 32 3a 0a 49 06 24 5c c2 d3 ac 62 91 95 e4 79
b   e7 c8 37 6d 8d d5 4e a9 6c 56 f4 ea 65 7a ae 08
c   ba 78 25 2e 1c a6 b4 c6 e8 dd 74 1f 4b bd 8b 8a
d   70 3e b5 66 48 03 f6 0e 61 35 57 b9 86 c1 1d 9e
e   e1 f8 98 11 69 d9 8e 94 9b 1e 87 e9 ce 55 28 df
f   8c a1 89 0d bf e6 42 68 41 99 2d 0f b0 54 bb 16
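The table can be cross-checked by computing SubOctet directly from its definition. The sketch below (function names are ours) computes the inverse in F256 as the 254-th power and then applies the affine map; note that the bits here are indexed from the least significant end, so the rotation offsets differ from the book's a0a1 . . . a7 ordering:

```python
def gf256_mul(a, b):
    """Octet product in F256, reducing modulo X^8 + X^4 + X^3 + X + 1."""
    r = 0
    for _ in range(8):
        if b & 1:
            r ^= a
        hi = a & 0x80
        a = (a << 1) & 0xFF
        if hi:
            a ^= 0x1B
        b >>= 1
    return r

def gf256_inv(a):
    """a^254 = a^(-1) in F256 (with 0^(-1) taken as 0)."""
    r = 1
    for _ in range(254):
        r = gf256_mul(r, a)
    return r

def sub_octet(a: int) -> int:
    b = gf256_inv(a)
    c = 0
    for i in range(8):
        # affine map: XOR of b with four of its bit rotations, plus the constant 63
        bit = ((b >> i) ^ (b >> ((i + 4) % 8)) ^ (b >> ((i + 5) % 8))
               ^ (b >> ((i + 6) % 8)) ^ (b >> ((i + 7) % 8)) ^ (0x63 >> i)) & 1
        c |= bit << i
    return c

assert sub_octet(0x00) == 0x63
assert sub_octet(0xA7) == 0x5C   # the example quoted in the text
```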

AES encryption

AES encryption is described in Algorithm A.5. The algorithm first converts the input plaintext message block to a state, applies a series of transformations on this state and finally converts the state back to a message (the ciphertext).

The individual state transformations are now explained. The transition SubState is an octet-by-octet application of the substitution function SubOctet, that is, SubState replaces each octet sr,c of the state by SubOctet(sr,c) for all r, c. The transform ShiftRows cyclically left rotates the r-th row by r byte positions, that is, it replaces sr,c by sr,(c+r) mod 4 for all r, c.

The AddKey operation uses four 32-bit round keys L0, L1, L2, L3. Name the octets of Li as λi0λi1λi2λi3. The i-th key Li is XORed with the i-th column of the state, that is, AddKey replaces sr,c by sr,c ⊕ λcr for all r, c.

Finally, the MixCols transform multiplies each column of the state, regarded as an element of A, by the element β = [03]y^3 + [01]y^2 + [01]y + [02], where the coefficients (expressions within square brackets) are octet values in hexadecimal that can be identified with elements of F256. For the c-th column, this transformation can be represented as:

   s0,c := (02 ⊙ s0,c) ⊕ (03 ⊙ s1,c) ⊕ s2,c ⊕ s3,c,
   s1,c := s0,c ⊕ (02 ⊙ s1,c) ⊕ (03 ⊙ s2,c) ⊕ s3,c,
   s2,c := s0,c ⊕ s1,c ⊕ (02 ⊙ s2,c) ⊕ (03 ⊙ s3,c),
   s3,c := (03 ⊙ s0,c) ⊕ s1,c ⊕ s2,c ⊕ (02 ⊙ s3,c),

with the four assignments made simultaneously (each right side uses the old values).
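The column transform just described can be sketched as follows; gf256_mul is an assumed helper for the octet product ⊙, and the second test column is a commonly quoted MixColumns example, not taken from the text:

```python
def gf256_mul(a, b):
    """Octet product in F256, reducing modulo X^8 + X^4 + X^3 + X + 1."""
    r = 0
    for _ in range(8):
        if b & 1:
            r ^= a
        hi = a & 0x80
        a = (a << 1) & 0xFF
        if hi:
            a ^= 0x1B
        b >>= 1
    return r

def mix_column(col):
    """Multiply one state column (s0, s1, s2, s3) by [03]y^3+[01]y^2+[01]y+[02]."""
    s0, s1, s2, s3 = col
    return [
        gf256_mul(0x02, s0) ^ gf256_mul(0x03, s1) ^ s2 ^ s3,
        s0 ^ gf256_mul(0x02, s1) ^ gf256_mul(0x03, s2) ^ s3,
        s0 ^ s1 ^ gf256_mul(0x02, s2) ^ gf256_mul(0x03, s3),
        gf256_mul(0x03, s0) ^ s1 ^ s2 ^ gf256_mul(0x02, s3),
    ]

# A column of equal octets is fixed, since 02 + 03 + 01 + 01 = 01 in F256.
assert mix_column([0x01, 0x01, 0x01, 0x01]) == [0x01, 0x01, 0x01, 0x01]
assert mix_column([0xDB, 0x13, 0x53, 0x45]) == [0x8E, 0x4D, 0xA1, 0xBC]
```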

Algorithm A.5. AES encryption

Input: The plaintext message M = μ0μ1 . . . μ15 and the round keys K0, K1, . . . , K4Nr+3.

Output: Ciphertext message C = γ0γ1 . . . γ15.

Steps:

Convert M to the state S.                                      /* Use Transform (A.1) */
S := AddKey(S, K0, K1, K2, K3).
for i = 1, 2, . . . , Nr {
      S := SubState(S).
      S := ShiftRows(S).
      if (i ≠ Nr) { S := MixCols(S). }
      S := AddKey(S, K4i, K4i+1, K4i+2, K4i+3).
}
Convert S to the message C.                                /* Use Transform (A.2) */

AES decryption

AES decryption involves inverting each state transformation performed during encryption. The key schedule needed for encryption is to be used during decryption too. The straightforward decryption routine is given in Algorithm A.6.

Algorithm A.6. AES decryption

Input: The ciphertext message C = γ0γ1 . . . γ15 and the round keys K0, K1, . . . , K4Nr+3.

Output: The recovered plaintext message M = μ0μ1 . . . μ15.

Steps:

Convert C to the state S.                                      /* Use Transform (A.1) */
S := AddKey(S, K4Nr, K4Nr+1, K4Nr+2, K4Nr+3).
for i = Nr – 1, Nr – 2, . . . , 1, 0 {
      S := ShiftRows–1(S).
      S := SubState–1(S).
      S := AddKey(S, K4i, K4i+1, K4i+2, K4i+3).
      if (i ≠ 0) { S := MixCols–1(S). }
}
Convert S to the message M.                                /* Use Transform (A.2) */

What remains is a description of the inverses of the basic state transformations. AddKey involves octet-by-octet XORing and so is its own inverse. Table A.4 summarizes the inverse of the substitution transition SubOctet (Exercise A.8). For computing SubState–1(S), one should apply SubOctet–1 on each octet of S. The inverse of ShiftRows is also straightforward: ShiftRows–1 replaces sr,c by sr,(c – r) mod 4, a cyclic right rotation of the r-th row by r byte positions.

Finally, MixCols–1 involves multiplication of each column by the inverse of the element [03]y^3 + [01]y^2 + [01]y + [02], that is, by the element [0b]y^3 + [0d]y^2 + [09]y + [0e]. So MixCols–1 transforms each column of the state as follows:

   s0,c := (0e ⊙ s0,c) ⊕ (0b ⊙ s1,c) ⊕ (0d ⊙ s2,c) ⊕ (09 ⊙ s3,c),
   s1,c := (09 ⊙ s0,c) ⊕ (0e ⊙ s1,c) ⊕ (0b ⊙ s2,c) ⊕ (0d ⊙ s3,c),
   s2,c := (0d ⊙ s0,c) ⊕ (09 ⊙ s1,c) ⊕ (0e ⊙ s2,c) ⊕ (0b ⊙ s3,c),
   s3,c := (0b ⊙ s0,c) ⊕ (0d ⊙ s1,c) ⊕ (09 ⊙ s2,c) ⊕ (0e ⊙ s3,c),

again with all four assignments made simultaneously.

Table A.4. Inverse of AES S-box
     0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
0   52 09 6a d5 30 36 a5 38 bf 40 a3 9e 81 f3 d7 fb
1   7c e3 39 82 9b 2f ff 87 34 8e 43 44 c4 de e9 cb
2   54 7b 94 32 a6 c2 23 3d ee 4c 95 0b 42 fa c3 4e
3   08 2e a1 66 28 d9 24 b2 76 5b a2 49 6d 8b d1 25
4   72 f8 f6 64 86 68 98 16 d4 a4 5c cc 5d 65 b6 92
5   6c 70 48 50 fd ed b9 da 5e 15 46 57 a7 8d 9d 84
6   90 d8 ab 00 8c bc d3 0a f7 e4 58 05 b8 b3 45 06
7   d0 2c 1e 8f ca 3f 0f 02 c1 af bd 03 01 13 8a 6b
8   3a 91 11 41 4f 67 dc ea 97 f2 cf ce f0 b4 e6 73
9   96 ac 74 22 e7 ad 35 85 e2 f9 37 e8 1c 75 df 6e
a   47 f1 1a 71 1d 29 c5 89 6f b7 62 0e aa 18 be 1b
b   fc 56 3e 4b c6 d2 79 20 9a db c0 fe 78 cd 5a f4
c   1f dd a8 33 88 07 c7 31 b1 12 10 59 27 80 ec 5f
d   60 51 7f a9 19 b5 4a 0d 2d e5 7a 9f 93 c9 9c ef
e   a0 e0 3b 4d ae 2a f5 b0 c8 eb bb 3c 83 53 99 61
f   17 2b 04 7e ba 77 d6 26 e1 69 14 63 55 21 0c 7d

AES decryption is as efficient as AES encryption, since each state transformation primitive has the same structure as its inverse. However, the sequence of application of these primitives in the loop (rounds) for decryption differs from that for encryption. For some implementations, mostly in hardware, this may be a problem. Compare this with DES for which the encryption and decryption algorithms are identical save the sequence of using the round keys (Exercise A.1). With little additional effort AES can also be furnished with this useful property of DES. All we have to do is to use a different key schedule for decryption. The necessary modifications are explored in Exercise A.9.

AES test vectors

Table A.5 provides the ciphertexts for the plaintext block

M = 00112233445566778899aabbccddeeff

under different keys.

Table A.5. AES test vectors
Cipher    Key                                                                Ciphertext block
AES-128   000102030405060708090a0b0c0d0e0f                                   69c4e0d86a7b0430d8cdb78070b4c55a
AES-192   000102030405060708090a0b0c0d0e0f1011121314151617                   dda97ca4864cdfe06eaf70a0ec0d7191
AES-256   000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f   8ea2b7ca516745bfeafc49904b496089

Cryptanalysis of AES

AES has been designed so that linear and differential attacks are infeasible. Another attack known as the square attack has been proposed by Lucks [184] and Ferguson et al. [93], but at present it can tackle fewer rounds than are used in Rijndael encryption. Also see Gilbert and Minier [112] for the collision attack.

The distinct algebraic structure of AES encryption invites special algebraic attacks. One such potential attack (the XSL attack) has been proposed by Courtois and Pieprzyk [68]. Although this attack has not yet been proved to be effective, a better understanding of the algebra may, in the foreseeable future, lead to disturbing consequences for the advanced standard.

For more information on AES, read the book [71] from the designers of the cipher. Also visit the following Internet sites:

http://www.esat.kuleuven.ac.be/~rijmen/rijndael/     (Rijndael home)
http://csrc.nist.gov/CryptoToolkit/aes/index1.html   (NIST site for AES)
http://www.cryptosystem.net/aes/                     (Algebraic attacks)

A.2.3. Multiple Encryption

Multiple encryption presents a way to achieve a desired level of security by using block ciphers of small key sizes. The idea is to cascade several stages of encryption and/or decryption, with different stages working under different keys. Figure A.1 illustrates double and triple encryption for a block cipher f. Each gi or hj represents either the encryption or the decryption function of f under the given key.

Figure A.1. Multiple encryption


For double encryption, we have K1 ≠ K2, and both g1 and g2 are usually the encryption function. Provided that fK2 ο fK1 is not the same as fK for any single key K and that the permutations of f are reasonably random, it appears at first glance that double encryption increases the effective key size by a factor of two. Unfortunately, this is not the case. The meet-in-the-middle attack on double encryption works as follows.

Suppose that an adversary knows a plaintext–ciphertext pair (m, c) under the unknown keys K1, K2. We assume as before that f has block-size n and key-size r. The adversary computes, for each possible key i, the encrypted message xi := fi(m). She also computes, for each possible key j, the decrypted message yj := fj–1(c). Now, (i, j) is a possible value of (K1, K2) if and only if xi = yj.

A given pair (m, c) usually gives many such candidates (i, j) for (K1, K2). More precisely, if each fi is assumed to be a random permutation of {0, 1}^n, for a given i we have the equality xi = yj for an expected number of 2^r/2^n values of j. Considering all possibilities for i gives an expected number of 2^r × 2^r/2^n = 2^(2r – n) candidate pairs (i, j). If f = DES, this number is 2^(2 × 56 – 64) = 2^48.

If a second pair (m′, c′) under (K1, K2) is also known to the adversary, then for a given i an expected number of 2^r/(2^n × 2^n) values of j are consistent with both (m, c) and (m′, c′). Thus, we get an expected number of (2^r × 2^r)/(2^n × 2^n) = 2^(2r – 2n) false candidates (i, j). For DES, this number is 2^(–16). This implies that it is very unlikely that a false candidate (i, j) satisfies both (m, c) and (m′, c′). Thus, with high probability the adversary uniquely identifies the double DES key (K1, K2) from two plaintext–ciphertext pairs.

This attack calls for O(2^r) encryptions and O(2^r) decryptions. With the assumption that each encryption takes roughly the same time as each decryption (as in the case of DES), the adversary spends the time for O(2^r) encryptions. Moreover, she can find all the matches in O(r 2^r) time. This implies that double encryption increases the effective key size (over single encryption) by a few bits only. On the other hand, both the actual key size and the encryption time get doubled. In view of these shortcomings, double encryption is rarely used in practice.
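The attack is easy to watch in miniature. The sketch below uses a toy (and wholly insecure) block cipher with n = r = 8; the permutation pi, the function names and all constants are made up for illustration:

```python
# Toy block cipher: f_k(m) = pi(m XOR k) for a fixed public permutation pi.
def pi(v):     return (167 * v + 13) % 256      # invertible since gcd(167, 256) = 1
def pi_inv(v): return (23 * (v - 13)) % 256     # 23 = 167^(-1) mod 256

def f(m, k):     return pi(m ^ k)
def f_inv(c, k): return pi_inv(c) ^ k

K1, K2 = 0x3A, 0xC5                              # the "unknown" double-encryption keys
pairs = [(m, f(f(m, K1), K2)) for m in (0x11, 0x77)]

def candidates(m, c):
    """All (i, j) with f_i(m) = f_j^(-1)(c), found by meeting in the middle."""
    forward = {}
    for i in range(256):
        forward.setdefault(f(m, i), []).append(i)
    return {(i, j) for j in range(256) for i in forward.get(f_inv(c, j), [])}

# Intersect the candidate sets from two known plaintext-ciphertext pairs;
# the true key pair must survive.
survivors = candidates(*pairs[0]) & candidates(*pairs[1])
assert (K1, K2) in survivors
```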

For the triple encryption scheme of Figure A.1, a meet-in-the-middle attack at x or y demands an effort equivalent to O(2^(2r)) encryptions, that is, the effective key size gets doubled. It is, therefore, customary to take K1 = K3 and K2 different from this common value. The actual key size also gets doubled (rather than tripled) with this choice—one doesn’t have to remember K3 separately. It is also a common practice to take h1 and h3 to be the encryption function (under K1 = K3) and h2 the decryption function (under K2). One often calls this particular triple encryption an E-D-E scheme.

A.2.4. Modes of Operation

In practice, the length of the message m to be encrypted need not equal the block length n of the block cipher f. One then has to break up m into blocks of some fixed length n′ ≤ n and encrypt each block using the block cipher. In order to make the length of m an integral multiple of n′, one may have to pad extra bits to m (say, zero bits at the end). It is often necessary to store the initial size of m in a separate block, say, after the last message block. In what follows, we shall assume that the input message m gives rise to l blocks m1, m2, . . . , ml each of size n′. The corresponding ciphertext blocks c1, c2, . . . , cl will also be of bit length n′ each. The reason for choosing the block size n′ ≤ n will be clear soon.

The ECB mode

The easiest way to encrypt multiple blocks m1, . . . , ml is to take n′ = n and encrypt each block mi as ci := fK(mi). Decryption is analogous: mi := fK–1(ci). This mode of operation of a block cipher is called the electronic code-book or the ECB mode. Algorithms A.7 and A.8 describe this mode.

Algorithm A.7. ECB encryption

Input: The plaintext blocks m1, . . . , ml and the key K.

Output: The ciphertext c = c1 . . . cl.

Steps:

for i = 1, . . . , l { ci := fK(mi) }

Algorithm A.8. ECB decryption

Input: The ciphertext blocks c1, . . . , cl and the key K.

Output: The plaintext m = m1 . . . ml.

Steps:

for i = 1, . . . , l { mi := fK–1(ci) }

In this mode, identical message blocks encrypt to identical ciphertext blocks (under the same key), that is, partial information about the plaintext may be leaked out. The following three modes overcome this problem.
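The leakage can be seen on a toy 8-bit "cipher" (an affine map, wholly insecure and purely illustrative), contrasted with the CBC chaining described in the next subsection:

```python
def f(block, key):
    """Toy 8-bit 'block cipher' (an affine map; insecure, for illustration only)."""
    return (167 * (block ^ key) + 13) % 256

def ecb_encrypt(blocks, key):
    return [f(b, key) for b in blocks]

def cbc_encrypt(blocks, key, iv):
    out, prev = [], iv
    for b in blocks:
        prev = f(b ^ prev, key)   # XOR with previous ciphertext block, then encrypt
        out.append(prev)
    return out

msg = [0x41, 0x41, 0x41, 0x42]    # three identical blocks, then a different one
ecb = ecb_encrypt(msg, key=0x5A)
cbc = cbc_encrypt(msg, key=0x5A, iv=0x99)

assert ecb[0] == ecb[1] == ecb[2]   # ECB leaks the repetition
assert len(set(cbc[:3])) > 1        # CBC hides it
```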

The CBC mode

In the cipher-block chaining or the CBC mode, one takes n′ = n and each plaintext block is first XOR-ed with the previous ciphertext block and then encrypted. In order to XOR the first plaintext block, one needs an n-bit initialization vector (IV). The IV need not be kept secret and may be sent along with the ciphertext blocks.

Algorithm A.9. CBC encryption

Input: The plaintext blocks m1, . . . , ml, the key K and the IV.

Output: The ciphertext c = c1 . . . cl.

Steps:

c0 := IV.

for i = 1, . . . , l { ci := fK(mici – 1). }

Algorithm A.10. CBC decryption

Input: The ciphertext blocks c1, . . . , cl, the key K and the IV.

Output: The plaintext m = m1 . . . ml.

Steps:

c0 := IV.

for i = 1, . . . , l { mi := fK–1(ci) ⊕ ci – 1. }

The CFB mode

In the cipher feedback or the CFB mode, one chooses any n′ ≤ n. In this mode, the plaintext blocks are not encrypted, but masked by XOR-ing with a stream of random keys generated from a (not necessarily secret) n-bit IV. In this sense, the CFB mode works like a stream cipher (see Section A.3).

Algorithm A.11. CFB encryption

Input: The plaintext blocks m1, . . . , ml, the key K and the IV.

Output: The ciphertext c = c1 . . . cl.

Steps:

k0 := IV.   /* Initialize the key stream */
for i = 1, . . . , l {
   /* Mask the current key by block encryption and the message by XOR-ing */
   ci := mi ⊕ msbn′ (fK(ki – 1)).
   /* Generate the next key from the previous key and the current ciphertext block */
   ki := lsbn – n′(ki – 1) ‖ ci.
}

Algorithm A.11 explains CFB encryption. The notation msbk(z) (resp. lsbk(z)) stands for the most (resp. least) significant k bits of a bit string z. For CFB decryption (Algorithm A.12), the identical key stream k0, k1, . . . , kl is generated and used to mask off the message blocks from the ciphertext blocks.

Algorithm A.12. CFB decryption

Input: The ciphertext blocks c1, . . . , cl, the key K and the IV.

Output: The plaintext m = m1 . . . ml.

Steps:

k0 := IV.
for i = 1, . . . , l {
   mi := ci ⊕ msbn′(fK(ki – 1)).
   ki := lsbn – n′(ki – 1) ‖ ci.
}

The OFB mode

The output feedback or the OFB mode also works like a stream cipher by masking the plaintext blocks using a stream of keys. The key stream in the OFB mode is generated by successively applying the block encryption function on an n-bit (not necessarily secret) IV. Here, one chooses any n′ ≤ n.

OFB encryption is explained in Algorithm A.13. OFB decryption (Algorithm A.14) is identical, with only the roles of m and c interchanged, and requires the generation of the same key stream k0, k1, . . . , kl used during encryption.

Algorithm A.13. OFB encryption

Input: The plaintext blocks m1, . . . , ml, the key K and the IV.

Output: The ciphertext c = c1 . . . cl.

Steps:

k0 := IV.      /* Initialize the key stream */
for i = 1, . . . , l {
    ki := fK(ki–1).     /* Generate the next key in the stream */
    ci := mi ⊕ msbn′(ki).    /* Mask the plaintext block */
}

Algorithm A.14. OFB decryption

Input: The ciphertext blocks c1, . . . , cl, the key K and the IV.

Output: The plaintext m = m1 . . . ml.

Steps:

k0 := IV.     /* Initialize the key stream */
for i = 1, . . . , l {
   ki := fK(ki–1).    /* Generate the next key in the stream */
   mi := ci ⊕ msbn′(ki).    /* Remove the mask from the ciphertext block */
}

Exercise Set A.2

A.1 Let us use the notations of Algorithm A.2. For a message m and round keys Ki, we have the values V, Li, Ri, W, c. For another message m′ and another set of round keys K′i, let us denote these values by V′, L′i, R′i, W′, c′. Show that if m′ = c and if K′i = K17 – i for i = 1, . . . , 16, then L′i = R16 – i and R′i = L16 – i for all i = 0, 1, . . . , 16. Deduce that in this case we have c′ = m. (This shows that DES decryption is the same as DES encryption with the key schedule reversed.)
A.2 For a bit string z, let z̄ denote the bit-wise complement of z. Deduce that DESK̄(m̄) is the bit-wise complement of DESK(m), that is, complementing both the plaintext message and the key complements the ciphertext message. [H]
A.3A DES key K is said to be weak, if the DES key schedule on K gives K1 = K2 = · · · = K16. Show that there are exactly four weak DES keys which in hexadecimal notation are:
0101 0101 0101 0101
FEFE FEFE FEFE FEFE
1F1F 1F1F 0E0E 0E0E
E0E0 E0E0 F1F1 F1F1

A.4A DES key K is said to be anti-palindromic, if the DES key schedule on K gives for all i = 1, . . . , 16. Show that the following four DES keys (in hexadecimal notation) are anti-palindromic:
01FE 01FE 01FE 01FE
FE01 FE01 FE01 FE01
1FE0 1FE0 0EF1 0EF1
E01F E01F F10E F10E

A.5Represent 𝔽₂₅₆ = 𝔽₂[X]/⟨f(X)⟩, where f(X) = X^8 + X^4 + X^3 + X + 1 (Section A.2.2).
  1. Show that multiplication by x (the octet 02) in 𝔽₂₅₆ can be computed by a left shift followed conditionally (derive the condition) by XORing with the octet 1b.

  2. Design an algorithm for multiplying two elements of 𝔽₂₅₆ using bit manipulations on octets only.
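As an illustration of the two parts (a sketch, not the asked-for derivation), multiplication by the octet 02 and general shift-and-add multiplication can be coded as follows; the condition in Part 1 is that the bit shifted out of position 7 was 1:

```python
def xtime(a: int) -> int:
    """Multiply an octet by x (the octet 02): left shift, then XOR
    with 1b exactly when the shifted-out bit was 1."""
    a <<= 1
    return (a ^ 0x1b) & 0xff if a & 0x100 else a

def gf_mul(a: int, b: int) -> int:
    """Shift-and-add multiplication of two octets in the AES field."""
    p = 0
    while b:
        if b & 1:
            p ^= a          # add the current multiple of a
        a = xtime(a)        # a := a * x
        b >>= 1
    return p
```

The product 57 · 13 = fe checked below is the worked example from the AES specification (FIPS-197).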

A.6The multiplication in 𝔽₂₅₆ can be made table-driven. Since this field contains 256 elements, a 256 × 256 array suffices to store all the products. That requires a storage of 64 kbytes. We can considerably reduce the storage by using discrete logs.
  1. Show that the multiplicative order of x (in 𝔽₂₅₆*) is 51.

  2. Show that x + 1 is a generator of 𝔽₂₅₆*.

  3. Write a computer program to generate the table of discrete logarithms of elements of 𝔽₂₅₆* to the base x + 1 (Table A.6).

    Table A.6. Discrete-log table for AES
         0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
    0   -- 00 19 01 32 02 1a c6 4b c7 1b 68 33 ee df 03
    1   64 04 e0 0e 34 8d 81 ef 4c 71 08 c8 f8 69 1c c1
    2   7d c2 1d b5 f9 b9 27 6a 4d e4 a6 72 9a c9 09 78
    3   65 2f 8a 05 21 0f e1 24 12 f0 82 45 35 93 da 8e
    4   96 8f db bd 36 d0 ce 94 13 5c d2 f1 40 46 83 38
    5   66 dd fd 30 bf 06 8b 62 b3 25 e2 98 22 88 91 10
    6   7e 6e 48 c3 a3 b6 1e 42 3a 6b 28 54 fa 85 3d ba
    7   2b 79 0a 15 9b 9f 5e ca 4e d4 ac e5 f3 73 a7 57
    8   af 58 a8 50 f4 ea d6 74 4f ae e9 d5 e7 e6 ad e8
    9   2c d7 75 7a eb 16 0b f5 59 cb 5f b0 9c a9 51 a0
    a   7f 0c f6 6f 17 c4 49 ec d8 43 1f 2d a4 76 7b b7
    b   cc bb 3e 5a fb 60 b1 86 3b 52 a1 6c aa 55 29 9d
    c   97 b2 87 90 61 be dc fc bc 95 cf cd 37 3f 5b d1
    d   53 39 84 3c 41 a2 6d 47 14 2a 9e 5d 56 f2 d3 ab
    e   44 11 92 d9 23 20 2e 89 b4 7c b8 26 77 99 e3 a5
    f   67 4a ed de c5 31 fe 18 0d 63 8c 80 c0 f7 70 07

  4. Write a computer program to generate the table of powers of x + 1 (Table A.7).

    Table A.7. Power table for AES
         0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
    0   01 03 05 0f 11 33 55 ff 1a 2e 72 96 a1 f8 13 35
    1   5f e1 38 48 d8 73 95 a4 f7 02 06 0a 1e 22 66 aa
    2   e5 34 5c e4 37 59 eb 26 6a be d9 70 90 ab e6 31
    3   53 f5 04 0c 14 3c 44 cc 4f d1 68 b8 d3 6e b2 cd
    4   4c d4 67 a9 e0 3b 4d d7 62 a6 f1 08 18 28 78 88
    5   83 9e b9 d0 6b bd dc 7f 81 98 b3 ce 49 db 76 9a
    6   b5 c4 57 f9 10 30 50 f0 0b 1d 27 69 bb d6 61 a3
    7   fe 19 2b 7d 87 92 ad ec 2f 71 93 ae e9 20 60 a0
    8   fb 16 3a 4e d2 6d b7 c2 5d e7 32 56 fa 15 3f 41
    9   c3 5e e2 3d 47 c9 40 c0 5b ed 2c 74 9c bf da 75
    a   9f ba d5 64 ac ef 2a 7e 82 9d bc df 7a 8e 89 80
    b   9b b6 c1 58 e8 23 65 af ea 25 6f b1 c8 43 c5 54
    c   fc 1f 21 63 a5 f4 07 09 1b 2d 77 99 b0 cb 46 ca
    d   45 cf 4a de 79 8b 86 91 a8 e3 3e 42 c6 51 f3 0e
    e   12 36 5a ee 29 7b 8d 8c 8f 8a 85 94 a7 f2 0d 17
    f   39 4b dd 7c 84 97 a2 fd 1c 24 6c b4 c7 52 f6 01

  5. Design an algorithm for multiplying two elements of 𝔽₂₅₆ using table lookup.
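Tables A.6 and A.7 can be regenerated in a few lines, and Part 5 then reduces to two lookups and one antilog; a sketch (multiplication by the octet 03, that is, by x + 1, is xtime(a) ⊕ a):

```python
def xtime(a: int) -> int:
    """Multiply an octet by x, reducing modulo f(X) when needed."""
    a <<= 1
    return (a ^ 0x1b) & 0xff if a & 0x100 else a

POW = [0] * 255          # POW[i] = (x+1)^i              (Table A.7)
LOG = [0] * 256          # LOG[g] = dlog of g, base x+1  (Table A.6)
acc = 1
for i in range(255):
    POW[i] = acc
    LOG[acc] = i
    acc = xtime(acc) ^ acc      # multiply the accumulator by x + 1

def gf_mul_table(a: int, b: int) -> int:
    """Multiply two octets via the log/antilog tables."""
    if a == 0 or b == 0:
        return 0
    return POW[(LOG[a] + LOG[b]) % 255]
```

The two 255-entry tables replace the naive 64-kbyte product array, which is exactly the storage saving the exercise is after.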

A.7Denote the multiplication of the ring A by ⊗ (Section A.2.2).
  1. Let α = a3y3 + a2y2 + a1y + a0 and β = b3y3 + b2y2 + b1y + b0 be elements of A and γ = c3y3 + c2y2 + c1y + c0 = α ⊗ β. Show that

         ( c0 )   ( a0 a3 a2 a1 ) ( b0 )
         ( c1 ) = ( a1 a0 a3 a2 ) ( b1 )
         ( c2 )   ( a2 a1 a0 a3 ) ( b2 )
         ( c3 )   ( a3 a2 a1 a0 ) ( b3 )

    where the matrix arithmetic on the right side follows the arithmetic of 𝔽₂₅₆.

  2. Verify that the inverse of the element of A represented by the word 03010102 (in hex) is 0b0d090e.

A.8
  1. Show that Transform (A.3) can be represented as

         ( b′0 )   ( 1 0 0 0 1 1 1 1 ) ( b0 )   ( 1 )
         ( b′1 )   ( 1 1 0 0 0 1 1 1 ) ( b1 )   ( 1 )
         ( b′2 )   ( 1 1 1 0 0 0 1 1 ) ( b2 )   ( 0 )
         ( b′3 ) = ( 1 1 1 1 0 0 0 1 ) ( b3 ) + ( 0 )
         ( b′4 )   ( 1 1 1 1 1 0 0 0 ) ( b4 )   ( 0 )
         ( b′5 )   ( 0 1 1 1 1 1 0 0 ) ( b5 )   ( 1 )
         ( b′6 )   ( 0 0 1 1 1 1 1 0 ) ( b6 )   ( 1 )
         ( b′7 )   ( 0 0 0 1 1 1 1 1 ) ( b7 )   ( 0 )

    where the matrix arithmetic on the right side is that of 𝔽₂.

  2. Let M denote the 8 × 8 matrix of Part (a). Prove that M is invertible over 𝔽₂ with

                ( 0 0 1 0 0 1 0 1 )
                ( 1 0 0 1 0 0 1 0 )
                ( 0 1 0 0 1 0 0 1 )
         M⁻¹ =  ( 1 0 1 0 0 1 0 0 )
                ( 0 1 0 1 0 0 1 0 )
                ( 0 0 1 0 1 0 0 1 )
                ( 1 0 0 1 0 1 0 0 )
                ( 0 1 0 0 1 0 1 0 )

  3. Conclude that the transformation A ↦ SubOctet(A) is invertible.

A.9
  1. Argue that the transforms SubState and ShiftRows commute with one another.

  2. Show that MixCols⁻¹(AddKey(S, L0, L1, L2, L3)) = AddKey(MixCols⁻¹(S), MixCols⁻¹(L0, L1, L2, L3)) for a suitable meaning of the application of MixCols⁻¹ on four 32-bit keys L0, L1, L2 and L3.

  3. Conclude that one can obtain a decryption key schedule in such a way that Algorithm A.15 correctly performs AES decryption. [H]

Algorithm A.15. Equivalent form of AES decryption

Input: The ciphertext message C = γ0γ1 . . . γ15 and the decryption key schedule.

Output: Plaintext message M = μ0μ1 . . . μ15.

Steps:

Convert C to the state S.                                 /* Use Transform (A.1) */

for i = Nr – 1, Nr – 2, . . . , 0 {
      S := SubState⁻¹(S).
      S := ShiftRows⁻¹(S).
      if (i ≠ 0) { S := MixCols⁻¹(S). }
      S := AddKey(S, the decryption round keys for round i).
}
Convert S to the message M.                            /* Use Transform (A.2) */

A.10Show that a multiple encryption scheme with exactly k stages provides an effective security of ⌈k/2⌉ keys against the meet-in-the-middle attack.
A.11Consider a message m broken into blocks m1, . . . , ml, encrypted to c1, . . . , cl and sent to an entity.
  1. Suppose that during the transmission exactly one ciphertext block gets corrupted. Show that for the different modes of encryption, the numbers ν of blocks that are incorrectly decrypted due to this transmission error are as listed in the following table.

    Mode    ν
    ECB     1
    CBC     ≤ 2
    CFB     ≤ 1 + ⌈n/n′⌉
    OFB     1

  2. For each of the four modes, discuss the effects on decryption caused by the insertion or deletion of a ciphertext block during transmission (say, by an active adversary).

A.3. Stream Ciphers

A block cipher encrypts large blocks of data using a fixed key. A stream cipher, on the other hand, encrypts small blocks of data (typically bits or bytes) using a different key for each block. The security of a stream cipher stems from the unpredictability of the key stream. Here, we deal with stream ciphers that encrypt bit-by-bit.

Definition A.2.

A stream cipher F encrypts a plaintext m = m1m2 . . . ml to a ciphertext c = c1c2 . . . cl using a key stream k = k1k2 . . . kl, where each mi, ci, ki ∈ {0, 1}. F uses a function f : {0, 1} × {0, 1} → {0, 1} that yields f(mi, ki) = ci. In order to effect unique decryption, the map fκ : {0, 1} → {0, 1}, μ ↦ f(μ, κ), must be a bijection for each κ ∈ {0, 1}. F encrypts and decrypts bit-by-bit using the formulas ci = fki(mi) and mi = fki⁻¹(ci).

Example A.1.

An obvious choice for fκ is fκ(μ) := μ ⊕ κ, so that fκ⁻¹ = fκ. Suppose that the bits k1, k2, . . . , kl in the key stream are generated randomly and uniformly, independent of the plaintext bits. Let us assume that for an index i the probability Pr(mi = 0) is p, so that Pr(mi = 1) = 1 – p. Since Pr(ki = 0) = Pr(ki = 1) = 1/2, and mi and ki are independent, we have:

Pr(ci = 0) = Pr(mi = 0, ki = 0) + Pr(mi = 1, ki = 1)
           = Pr(mi = 0) Pr(ki = 0) + Pr(mi = 1) Pr(ki = 1)
           = p × (1/2) + (1 – p) × (1/2) = 1/2.

So Pr(ci = 1) is 1/2 too, that is, the two values of ci are equally likely, irrespective of the probability p. This, in turn, implies that the ciphertext bit ci provides absolutely no information about the plaintext bit mi. In this sense, this stream cipher, called Vernam’s one-time pad, offers unconditional security.

Generating a truly random key stream of arbitrary length is a difficult problem. Moreover, the same key stream is used for decryption and has to be reproduced at the recipient’s end. In view of these difficulties, Vernam’s one-time pad is used only very rarely.

A practical solution is to use a pseudorandom key stream k1, k2, k3, . . . generated from a secret key J of fixed small length. The bits in the pseudorandom stream should be sufficiently unpredictable and the length of J adequately large, so as to preclude the possibility of mounting a successful attack in feasible time.

Depending on how the key stream is generated from J, stream ciphers can be broadly classified in two categories. In a synchronous stream cipher, each key in the key stream is generated independent of any plaintext or ciphertext bit, whereas in a self-synchronizing (or asynchronous) stream cipher each key in the stream is generated based only on J and a fixed number of previous ciphertext bits. Algorithms A.16 and A.17 explain the workings of these two classes of stream ciphers.

Algorithm A.16. Encryption in a synchronous stream cipher

Input: The message m = m1m2 . . . ml, the secret key J and a (not necessarily secret) initial state S of the key stream generator.

Output: The ciphertext c = c1c2 . . . cl.

Steps:

s0 := S.                             /* Initialize the state of the key stream generator */
for i = 1, . . . , l {
   ki := g(si–1, J).               /* Generate the key ki */
   si := δ(si–1, J).                /* Transition to the next state */
   ci := fki (mi).                  /* Encrypt the plaintext bit mi */
}

Algorithm A.17. Encryption in an asynchronous stream cipher

Input: The message m = m1m2 . . . ml, the secret key J and a (not necessarily secret) initial state (c–t+1, c–t+2, . . . , c0).

Output: The ciphertext c = c1c2 . . . cl.

Steps:

for i = 1, . . . , l {
   ki := g(ci–t, ci–t+1, . . . , ci–1, J).         /* Generate the key ki */
   ci := fki (mi).                                     /* Encrypt the plaintext bit mi */
}
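Both classes can be sketched with fk(m) = m ⊕ k and hypothetical g and δ built from a hash; these choices are for illustration only and are not proposed as secure constructions:

```python
import hashlib

def sync_encrypt(J: bytes, S: bytes, mbits):
    """Algorithm A.16: the key stream depends only on J and the state,
    so decryption is the same routine applied to the ciphertext bits."""
    s, out = S, []
    for m in mbits:
        d = hashlib.sha256(s + J).digest()
        k = d[0] & 1              # ki := g(s_{i-1}, J)
        s = d[1:]                 # si := delta(s_{i-1}, J)
        out.append(m ^ k)         # ci := f_{ki}(mi) = mi XOR ki
    return out

def async_process(J: bytes, init, bits, decrypt=False):
    """Algorithm A.17: ki depends on J and the t previous CIPHERTEXT
    bits, so the sliding window is always filled with ciphertext."""
    window, out = list(init), []
    for b in bits:
        d = hashlib.sha256(bytes(window) + J).digest()
        k = d[0] & 1              # ki := g(c_{i-t}, ..., c_{i-1}, J)
        o = b ^ k
        out.append(o)
        window = window[1:] + [b if decrypt else o]
    return out
```

The only asymmetry between the two classes is visible in the last line: the asynchronous window advances over ciphertext bits, which is the input when decrypting and the output when encrypting.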

A block cipher in the OFB mode works like a synchronous stream cipher, whereas a block cipher in the CFB mode works like an asynchronous stream cipher.

A.3.1. Linear Feedback Shift Registers

Linear feedback shift registers (LFSRs), being suitable for hardware implementation and possessing good cryptographic properties, are widely used as basic building blocks for many stream ciphers. Figure A.2 depicts an LFSR L with d stages or delay elements D0, D1, . . . , Dd–1, each capable of storing one bit. The state of the LFSR is described by the d-tuple s := (s0, s1, . . . , sd–1), where si is the bit stored in Di. It is often convenient to treat s as the column vector (s0 s1 . . . sd–1)t.

Figure A.2. A linear feedback shift register (LFSR) with d stages


There are d control bits a0, a1, . . . , ad–1. The working of the LFSR is governed by a clock. At every clock pulse the bits stored in the delay elements are bit-wise AND-ed with the respective control bits and the AND gate outputs are XOR-ed to obtain the bit sd. The bit s0 stored in D0 is delivered to the output. Finally, for each i ∈ {0, 1, . . . , d – 2} the delay element Di sets its stored bit to si+1, that is, the register experiences a right shift by one bit with the feedback bit sd filling up the leftmost delay element.

Thus, a clock pulse changes the state of the LFSR from s := (s0, s1, . . . , sd–1) to t := (t0, t1, . . . , td–1), where s and t are related as:

ti = si+1  for i = 0, 1, . . . , d – 2,  and
td–1 = a0s0 ⊕ a1s1 ⊕ · · · ⊕ ad–1sd–1.

If s and t are treated as column vectors, this can be compactly represented as

Equation A.4

t ≡ ΔLs (mod 2),

where the transition matrix ΔL is given by

Equation A.5

        ( 0    1    0    · · ·    0    )
        ( 0    0    1    · · ·    0    )
ΔL  =   ( .    .    .    · · ·    .    )
        ( 0    0    0    · · ·    1    )
        ( a0   a1   a2   · · ·   ad–1  )
When the LFSR L is initialized to a non-zero state, the bit stream output by it can be used as a pseudorandom bit sequence. For a given set of control bits a0, . . . , ad–1, the next state of L is uniquely determined by its previous state only. Since L has only finitely many (2^d – 1) non-zero states, the output bit sequence of L must be (eventually) periodic. For cryptographic use, the period of the bit sequence should be as large as possible. If the period is the maximum possible, namely 2^d – 1, L is called a maximum-length LFSR.

Many properties of the LFSR L can be explained in terms of its connection polynomial defined as:

Equation A.6

CL(X) = a0X^d + a1X^(d–1) + · · · + ad–1X + 1.

For example, assume that a0 = 1, so that deg CL(X) = d. Assume further that CL(X) is irreducible (over 𝔽₂). Consider the extension 𝔽_{2^d} of 𝔽₂, represented as 𝔽₂[X]/⟨CL(X)⟩, where x denotes the class of X. It turns out that if x is a generator of the cyclic group 𝔽_{2^d}*, then L is a maximum-length LFSR. In this case, the polynomial CL(X) is called a primitive polynomial of 𝔽_{2^d}.[3]

[3] A primitive polynomial defined in this way has nothing to do with a primitive polynomial over a UFD, defined in Exercise 2.54. Mathematicians often go for such multiple definitions of the same terms and phrases.
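The maximum-length property is easy to check exhaustively for small d. A sketch with d = 4 and the (illustrative) control bits (a0, a1, a2, a3) = (1, 1, 0, 0), whose output recurrence si+4 = si ⊕ si+1 has the primitive characteristic polynomial X^4 + X + 1:

```python
def lfsr_step(state, taps):
    """One clock pulse of Figure A.2: the feedback bit sd is the XOR of
    the stored bits AND-ed with the control bits; then shift."""
    fb = 0
    for a, s in zip(taps, state):
        fb ^= a & s
    return state[1:] + [fb]

def period(taps, init):
    """Number of clock pulses until the initial state recurs."""
    state, n = lfsr_step(init, taps), 1
    while state != init:
        state, n = lfsr_step(state, taps), n + 1
    return n
```

The choice (1, 1, 0, 0) attains the maximum period 2^4 – 1 = 15, while (1, 0, 1, 0), whose associated polynomial X^4 + X^2 + 1 = (X^2 + X + 1)^2 is not even irreducible, does not.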

A.3.2. Stream Ciphers Based on LFSRs

The bit sequence output by an LFSR L can be used as the key stream k1k2 . . . kl in order to encrypt a plaintext stream m1m2 . . . ml to the ciphertext stream c1c2 . . . cl with ci := mi ⊕ ki. The number d of stages in L should be chosen reasonably large and the control bits a0, . . . , ad–1 should be kept secret. The initial state of L may or may not be a secret. For suitable choices of a0, . . . , ad–1, the output sequences from L possess good statistical properties and hence L appears to be an efficient key stream generator.

Unfortunately, such a key stream generator is vulnerable to a known-plaintext attack as follows. Suppose that mi and ci are known for i = 1, 2, . . . , 2d. One can easily compute ki = mi ⊕ ci for all these i. Let si := (ki, ki+1, . . . , ki+d–1) denote the state of L while outputting ki. By Congruence (A.4), si+1 ≡ ΔLsi (mod 2) for i = 1, 2, . . . , d. Define the d × d matrices S := (s1 s2 . . . sd) and T := (s2 s3 . . . sd+1), where si are treated as column vectors as before. We then have T ≡ ΔLS (mod 2). If S is invertible modulo 2, then ΔL and hence the secret control bits can be easily computed. In order to avoid this known-plaintext attack, one should introduce some non-linearity in the LFSR outputs.
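The attack amounts to a few lines of linear algebra over 𝔽₂. The sketch below recovers the control bits of a d-stage LFSR from 2d keystream bits by inverting S modulo 2 with Gauss–Jordan elimination:

```python
def lfsr_bits(taps, init, n):
    """n output bits of the LFSR of Figure A.2."""
    state, out = list(init), []
    for _ in range(n):
        out.append(state[0])
        fb = 0
        for a, s in zip(taps, state):
            fb ^= a & s
        state = state[1:] + [fb]
    return out

def inverse_mod2(M):
    """Gauss-Jordan inversion of a 0/1 matrix modulo 2."""
    d = len(M)
    A = [row[:] + [int(i == j) for j in range(d)] for i, row in enumerate(M)]
    for col in range(d):
        piv = next(r for r in range(col, d) if A[r][col])   # pivot row
        A[col], A[piv] = A[piv], A[col]
        for r in range(d):
            if r != col and A[r][col]:
                A[r] = [x ^ y for x, y in zip(A[r], A[col])]
    return [row[d:] for row in A]

def recover_taps(keystream, d):
    """Solve T = Delta_L * S (mod 2); the last row of Delta_L is
    exactly the vector of secret control bits (a0, ..., a_{d-1})."""
    state = lambda i: [keystream[i + j] for j in range(d)]
    S = [[state(c)[r] for c in range(d)] for r in range(d)]
    T = [[state(c + 1)[r] for c in range(d)] for r in range(d)]
    Sinv = inverse_mod2(S)
    last = []
    for c in range(d):
        bit = 0
        for k in range(d):
            bit ^= T[d - 1][k] & Sinv[k][c]
        last.append(bit)
    return last
```

With only 2d known plaintext bits, the whole generator falls, which motivates the non-linear constructions that follow.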

A non-linear combination generator combines the output bits u1, u2, . . . , ur from r LFSRs by a non-linear function φ in order to generate the key k = φ(u1, u2, . . . , ur). The Geffe generator of Figure A.3 gives a well-known example. It uses the non-linear function φ(u1, u2, u3) = u1u2 ⊕ u2u3 ⊕ u3, that is, k ≡ u1u2 + u2u3 + u3 (mod 2).

Figure A.3. The Geffe generator


A non-linear filter generator generates the key as k = ψ(s0, s1, . . . , sd–1), where s0, . . . , sd–1 are the bits stored in the delay elements of a single LFSR and where ψ is a non-linear function.

Several other ad hoc schemes can destroy the linearity of an LFSR’s output. The shrinking generator, for example, uses two LFSRs L1 and L2. Both L1 and L2 are simultaneously clocked. If the output of L1 is 1, the output of L2 goes to the key stream, whereas if the output of L1 is 0, the output of L2 is discarded. The resulting key stream is an irregularly (and non-linearly) decimated subsequence of the output sequence of L2.
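Both constructions are one-liners; the truth-table check below also exhibits the well-known 3/4 correlation of the Geffe output with u1 (the leakage of Exercise A.17):

```python
def geffe(u1: int, u2: int, u3: int) -> int:
    """k = u1 u2 XOR u2 u3 XOR u3: selects u1 when u2 = 1, else u3."""
    return (u1 & u2) ^ (u2 & u3) ^ u3

def shrink(bits1, bits2):
    """Shrinking generator: keep the L2 bit exactly when the L1 bit is 1."""
    return [b2 for b1, b2 in zip(bits1, bits2) if b1]
```

The multiplexer reading of geffe makes the correlation obvious: whenever u2 = 1 (half the time) the output simply equals u1.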

The non-linear function (φ or ψ) eliminates the chance of mounting the straightforward known-plaintext attack described above. However, for polynomial non-linearities certain algebraic attacks are known, for example, see Courtois and Pieprzyk [67, 66].[4] Solving non-linear polynomial equations is usually more difficult than solving linear equations, but ample care should be taken to avoid accidental encounters with easily solvable systems. Complacency is a word ever excluded from a cryptologer’s world.

[4] Visit the Internet site http://www.cryptosystem.net/ for more papers in related areas.

Exercise Set A.3

A.12For each of the two classes of stream ciphers (Algorithms A.16, A.17) discuss the effects on decryption of
  1. alteration

  2. insertion or deletion

of a ciphertext bit during transmission.

A.13Suppose that the LFSR L of Figure A.4 is initialized to the state (1, 0, 0, 0). Derive the sequence of state transitions of the LFSR, and hence determine the output bit sequence of L. Argue that L is a maximum-length LFSR. Verify (according to the definition) that the connection polynomial CL(X) is primitive.

Figure A.4. An LFSR with four stages


A.14Let ΔL and CL(X) be as in Equations (A.5) and (A.6). Show that:
  1. ΔL is invertible modulo 2 if and only if a0 = 1.

  2. The characteristic polynomial of ΔL (a matrix over 𝔽₂) is X^d CL(1/X). [H]

A.15Let L be an LFSR with connection polynomial CL(X). Further let S(X) = s0 + s1X + s2X^2 + · · ·, si ∈ 𝔽₂, denote a power series[5] over 𝔽₂. Show that L generates the (infinite) bit sequence s0, s1, s2, . . . if and only if the product CL(X)S(X) modulo 2 is a polynomial of degree < d.

[5] A power series over a ring A is a (formal) expression of the form f = a0 + a1X + a2X^2 + · · · with each ai ∈ A. The set of all such power series is denoted by A[[X]]. For two power series f = a0 + a1X + a2X^2 + · · · and g = b0 + b1X + b2X^2 + · · · over A, the sum f + g is defined to be the power series (a0 + b0) + (a1 + b1)X + (a2 + b2)X^2 + · · ·, and the product fg is defined as the power series c0 + c1X + c2X^2 + · · ·, where ci = a0bi + a1bi–1 + · · · + aib0. Under these operations A[[X]] is a ring. A polynomial over A can be identified with an element of A[[X]], in which all, but finitely many, coefficients are zero.

A.16Let σ = s0s1 . . . sd–1 ≠ 00 . . . 0 be a bit string of length d ≥ 1. The linear complexity L(σ) of σ is defined to be the length of the shortest LFSR that generates σ as the leftmost part of its output (after it is initialized to a suitable state). Prove that:
  1. L(σ) ≤ d.

  2. L(σ) = d if and only if σ = 00 . . . 01. [H]

A.17Assume that the three LFSR outputs u1, u2, u3 in the Geffe generator are uniformly distributed. Show that Pr(k = u1) = 3/4 = Pr(k = u3). Thus, partial information about the internal details of the Geffe generator is leaked out in the key stream.

A.4. Hash Functions

A hash function maps bit strings of any length to bit strings of a fixed length n. For practical uses, hash functions should be easy to compute, that is, computing the hash of x should be doable in time polynomial in the size of x.

Since a hash function H maps an infinite set to a finite set, there must exist pairs (x1, x2) of distinct strings with H(x1) = H(x2). Such a pair is called a collision for H. For cryptographic applications (for example, for generating digital signatures), it should be computationally infeasible to find collisions for hash functions. To elaborate this topic further we mention the following two desirable properties of hash functions used in cryptography.

Definition A.3.

A hash function H is called second pre-image resistant, if it is computationally infeasible[6] to find, for a given bit string x1, a second bit string x2 ≠ x1 with H(x1) = H(x2).

[6] A problem P is said to be computationally infeasible if any known or possible algorithm (deterministic or randomized) to solve P runs in infeasible (like super-polynomial) time, except perhaps for a set of some input instances, the density of which in the input space is zero (or, more generally, negligibly small).

Definition A.4.

A hash function H is called collision resistant, if it is computationally infeasible to find any two distinct bit strings x1 and x2 with H(x1) = H(x2).

In order to prevent existential forgery (Exercise 5.15) of digital signatures, hash functions should also be difficult to invert.

Definition A.5.

An n-bit hash function H is called first pre-image resistant (or simply pre-image resistant), if it is computationally infeasible to find, for almost all bit strings y of length n, a bit string x (of any length) such that y = H(x). The qualification almost all in the last sentence was necessary, since one can compute and store the pairs (xi, H(xi)), i = 1, 2, . . . , k, for some small k and for some xi of one’s choice. If the given y turns out to be one of these hash values H(xi), a pre-image of y is easily available.

A hash function (provably or believably) satisfying all these three properties is called a cryptographic hash function. A hash function having first and second pre-image resistance is often called a one-way hash function. Some authors require both second pre-image resistance and collision resistance to define a collision-resistant hash function, but here we stick to Definitions A.3 and A.4. In what follows, an unqualified use of the phrase hash function indicates a cryptographic hash function.

Most of the properties of a cryptographic hash function are mutually independent. However, we have the following implication.

Proposition A.1.

A collision resistant hash function is second pre-image resistant.

Proof

Let H be a (non-cryptographic) hash function which is not second pre-image resistant. This means that there is an algorithm A that efficiently computes second pre-images, except perhaps for a vanishingly small fraction of inputs. Choose a random bit string x1. The probability that x1 is not a bad input to A is very high and, in that case, A outputs a second pre-image x2 quickly. This gives us an efficient randomized algorithm to compute collisions (x1, x2) for H.

The converse of Proposition A.1 is not true: A second pre-image resistant hash function need not be collision resistant (Exercise A.19). Also collision resistance (or second pre-image resistance) does not imply first pre-image resistance (Exercise A.20), and first pre-image resistance does not imply second pre-image resistance (Exercise A.21).

A hash function may or may not be used in conjunction with a secret key. An unkeyed hash function is typically used to check the integrity of a message and is often called a modification detection code (MDC). A keyed hash function, on the other hand, is usually employed to authenticate the origin of a message (in addition to verifying the integrity of the message) and so is often called a message authentication code (MAC).

A.4.1. Merkle’s Meta Method

Let us now describe a generic method of constructing hash functions. We start by defining the following basic building block.

Definition A.6.

Let m, n ∈ ℕ with m = n + r for some r ∈ ℕ. A function F : {0, 1}^m → {0, 1}^n that maps bit strings of length m to bit strings of length n is called a compression function. Henceforth, we will consider only those compression functions that can be computed easily, that is, in time polynomial in the input size.

Since m > n, collisions must exist for F. For cryptographic use, collisions should be difficult to locate. We can define first and second pre-image resistance and collision resistance of compression functions as before.

Algorithm A.18. Merkle’s meta method

Input: A compression function F : {0, 1}^m → {0, 1}^n with m = n + r and a bit string x of length < 2^r.

Output: The hash value H(x).

Steps:

Let λ be the bit length of x.
Set l := ⌈λ/r⌉.
If (λ is not a multiple of r) { Append rl – λ zero bits to the right of x. }
Break the padded x into blocks x1, . . . , xl each of length r.
Store in a new block xl+1 the r-bit representation of λ.
Initialize h0 := 0^n.
for i = 1, 2, . . . , l + 1 { hi := F (hi–1 ‖ xi) }
Set H(x) := hl+1.

Algorithm A.18 demonstrates how a compression function can be used to design an n-bit hash function H. The input message x is first broken into l ≥ 0 blocks each of bit length r, after padding zero bits, if necessary. The initial bit length λ of x is then stored in a new block. This implies that H cannot handle bit strings of length ≥ 2^r. For a reasonably big r, this is not a practical limitation. Storing λ is necessary for several reasons. First, it ensures that the for loop is executed at least once for any message. This prevents the trivial hash value 0^n (the bit string of length n containing zero bits only) for the null message. Moreover, if hi = 0^n for some i ∈ {1, . . . , l – 1}, then, without the length block, we would have H(x1 ‖ . . . ‖ xl) = H(xi+1 ‖ . . . ‖ xl), that is, a collision for H.
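Algorithm A.18 transcribes almost line by line. The sketch below works at byte rather than bit granularity and borrows SHA-256 as a stand-in compression function F (both are simplifying assumptions; the meta method works for any F : {0, 1}^m → {0, 1}^n):

```python
import hashlib

N_BYTES = R_BYTES = 32        # n = r = 256 bits, so m = 512 bits

def F(block: bytes) -> bytes:
    """Stand-in compression function {0,1}^(n+r) -> {0,1}^n."""
    assert len(block) == N_BYTES + R_BYTES
    return hashlib.sha256(block).digest()

def merkle_hash(x: bytes) -> bytes:
    lam = len(x)                                  # length of x (here in bytes)
    if lam % R_BYTES:
        x += b"\x00" * (R_BYTES - lam % R_BYTES)  # zero padding
    blocks = [x[i:i + R_BYTES] for i in range(0, len(x), R_BYTES)]
    blocks.append(lam.to_bytes(R_BYTES, "big"))   # the length block x_{l+1}
    h = b"\x00" * N_BYTES                         # h0 := 0^n
    for b in blocks:
        h = F(h + b)                              # hi := F(h_{i-1} || xi)
    return h
```

Because of the length block, inputs that pad to the same data blocks (such as b"a" and b"a\x00" here) still hash differently, and even the null message goes through the loop once.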

We now show that if F possesses the desired properties for use in cryptography, then so does H.

Proposition A.2.

If F is first pre-image resistant, then so is H.

Proof

Assume that H is not first pre-image resistant, that is, an efficient algorithm A exists to compute x with H(x) = y for most (if not all) y ∈ {0, 1}^n. Since y = hl+1 = F(hl ‖ xl+1), a pre-image (namely, hl ‖ xl+1) of y under F is easily computable.

Proposition A.3.

If F is collision resistant, then H is collision resistant (and hence also second pre-image resistant).

Proof

Given a collision (x, x′) for H, we can find a collision for F with little additional effort. We use the notations of Algorithm A.18 with primed variables for x′.

First consider the case l ≠ l′. Then, in particular, the length blocks xl+1 and x′l′+1 are different, and thus (hl ‖ xl+1, h′l′ ‖ x′l′+1) is a collision for F. So for the rest of the proof we take l = l′.

Now, suppose that hi ≠ h′i for some i ∈ {1, . . . , l}. Choose the largest such i and note that hi+1 and h′i+1 are defined and equal for this choice. This gives us the collision (hi ‖ xi+1, h′i ‖ x′i+1) for F.

The only case that remains to be treated is hi = h′i for all i ∈ {0, 1, . . . , l + 1}. Since x ≠ x′, there is at least one i ∈ {1, . . . , l} with xi ≠ x′i. For such an i, the equality hi = h′i implies that (hi–1 ‖ xi, h′i–1 ‖ x′i) is a collision for F.

In order to design cryptographic hash functions, it suffices to design cryptographic compression functions. Block ciphers can be used for that purpose. Let f be a block cipher with block size n and key size r. Take m := n + r and consider the map that sends x = L ‖ R, with L ∈ {0, 1}^n and R ∈ {0, 1}^r, to the encrypted bit string fR(L). If the fR are assumed to be random permutations of {0, 1}^n, the resulting compression function F possesses the desirable properties.

A.4.2. The Secure Hash Algorithm

Several custom-designed hash functions have been popularly used by the cryptography community. MD4 and MD5 are somewhat older 128-bit hash functions. Soon after its conception, MD4 was found to be vulnerable to several attacks. Also collisions for the compression function of MD5 are known. Therefore, these two hash functions have lost the desired level of confidence for cryptographic uses.

NIST has proposed a family of four hash algorithms. These algorithms are called secure hash algorithms and have the short names SHA-1, SHA-256, SHA-384 and SHA-512, which respectively produce 160-, 256-, 384- and 512-bit hash values. No collisions for SHA are known till date. In the rest of this section, we explain the SHA-1 algorithm. The workings of the other SHA algorithms are very similar and can be found in the FIPS document [222]. RIPEMD-160 is another popular 160-bit hash function.

SHA-1 (like other custom-designed hash functions mentioned above) is suitable for implementation in 32-bit processors. Suppose that we want to compute the hash SHA-1(M) of a message M of bit length λ. First, M is padded to get the bit string M′ := M ‖ 1 ‖ 0^k ‖ Λ, where Λ is the 64-bit representation of λ, and where k is the smallest non-negative integer for which the bit length of M′, that is, λ + 1 + k + 64, is a multiple of 512. M′ is broken into blocks M(1), M(2), . . . , M(l) each of length 512 bits. Each M(i) is represented as a collection of sixteen 32-bit words M(i)j, j = 0, 1, . . . , 15. SHA-1 uses big-endian packing, that is, M(i)0 stores the leftmost 32 bits of M(i), M(i)1 the next 32 bits, . . . , M(i)15 the rightmost 32 bits of M(i).

The SHA-1 computations are given in Algorithm A.19. One starts with a fixed initial 160-bit hash H(0). Successively for i = 1, 2, . . . , l the i-th message block M(i) is considered and the previous hash value H(i–1) is updated to H(i). At the end of the loop the 160-bit string H(l) is returned as SHA-1(M). Each H(i) is represented by five 32-bit words H(i)j, j = 0, 1, 2, 3, 4. Here also, big-endian notation is used, that is, H(i)0 stores the leftmost 32 bits of H(i), . . . , H(i)4 the rightmost 32 bits of H(i).

The updating procedure uses logical functions fj. Here, product (like xy) implies bit-wise AND, bar (as in x̄) denotes bit-wise complementation and ⊕ denotes bit-wise XOR, each on 32-bit operands. The notation LRk(z) (resp. RRk(z)) stands for a left (resp. right) rotation, that is, a cyclic left (resp. right) shift, of the bit string z of length 32 by k positions.

The bits of H(i) are well-defined transformations of the bits of H(i–1) under the guidance of the bits of M(i). The good amount of non-linearity, introduced by the functions fj and the modulo 2^32 sums, makes it difficult to invert the transformation H(i–1) ↦ H(i) and thereby makes SHA-1 an (apparently) secure hash function.

Algorithm A.19. The SHA-1 algorithm

Input: A message M.

Output: The hash SHA-1(M) of M.

Steps:

Generate the message blocks M(i), i = 1, 2, . . . , l.
/* Initialize the hash value */
H(0) := 0x67452301 efcdab89 98badcfe 10325476 c3d2e1f0.
for i = 1, 2, . . . , l {
   /* Compute the message schedule Wj, 0 ≤ j ≤ 79. */
   for j = 0, 1, . . . , 15 { Wj := M(i)j }
   for j = 16, 17, . . . , 79 { Wj := LR1(Wj–3 ⊕ Wj–8 ⊕ Wj–14 ⊕ Wj–16) }
   /* Store the previous hash words */
   for j = 0, 1, . . . , 4 { tj := H(i–1)j }
   /* Compute the updating values */
   for j = 0, 1, . . . , 79 {
      T := LR5(t0) + fj(t1, t2, t3) + t4 + Kj + Wj (mod 2^32), where
          fj(x, y, z) := xy ⊕ x̄z               for 0 ≤ j ≤ 19,
                         x ⊕ y ⊕ z              for 20 ≤ j ≤ 39 and 60 ≤ j ≤ 79,
                         xy ⊕ xz ⊕ yz           for 40 ≤ j ≤ 59,
          and
          Kj := 0x5a827999 for 0 ≤ j ≤ 19,      0x6ed9eba1 for 20 ≤ j ≤ 39,
                0x8f1bbcdc for 40 ≤ j ≤ 59,     0xca62c1d6 for 60 ≤ j ≤ 79.
      t4 := t3, t3 := t2, t2 := RR2(t1), t1 := t0, t0 := T.
   }
   /* Update the hash value */
   for j = 0, 1, . . . , 4 { H(i)j := tj + H(i–1)j (mod 2^32) }
}
Set SHA-1(M) := H(l).

A test vector for SHA-1 is the following (here 616263 is the string “abc”):

SHA-1(616263) = a9993e364706816aba3e25717850c26c9cd0d89d.
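Algorithm A.19 transcribes directly into code. In the following Python rendering the words t0, . . . , t4 appear as a, . . . , e, and the test vector above is reproduced:

```python
def rotl(x: int, k: int) -> int:
    """LRk: cyclic left shift of a 32-bit word by k positions."""
    return ((x << k) | (x >> (32 - k))) & 0xffffffff

def sha1(message: bytes) -> str:
    lam = 8 * len(message)
    m = message + b"\x80"                         # append the 1-bit
    m += b"\x00" * ((56 - len(m)) % 64)           # zero padding
    m += lam.to_bytes(8, "big")                   # the 64-bit length Lambda

    H = [0x67452301, 0xefcdab89, 0x98badcfe, 0x10325476, 0xc3d2e1f0]
    for i in range(0, len(m), 64):                # one 512-bit block at a time
        W = [int.from_bytes(m[i + 4*j:i + 4*j + 4], "big") for j in range(16)]
        for j in range(16, 80):                   # message schedule
            W.append(rotl(W[j-3] ^ W[j-8] ^ W[j-14] ^ W[j-16], 1))
        a, b, c, d, e = H
        for j in range(80):
            if j < 20:
                f, K = (b & c) | (~b & d), 0x5a827999
            elif j < 40:
                f, K = b ^ c ^ d, 0x6ed9eba1
            elif j < 60:
                f, K = (b & c) | (b & d) | (c & d), 0x8f1bbcdc
            else:
                f, K = b ^ c ^ d, 0xca62c1d6
            T = (rotl(a, 5) + f + e + K + W[j]) & 0xffffffff
            a, b, c, d, e = T, a, rotl(b, 30), c, d   # shift the five words
        H = [(x + y) & 0xffffffff for x, y in zip(H, (a, b, c, d, e))]
    return "".join(x.to_bytes(4, "big").hex() for x in H)
```

Note that RR2 of the algorithm is written here as rotl(·, 30), a rotation by 32 − 2 positions.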

Exercise Set A.4

A.18Let x be a bit string. Break up x into blocks x1, . . . , xl each of bit size n (after padding, if necessary). Define H1(x) := x1 ⊕ . . . ⊕ xl. Show that H1 possesses none of the desirable properties of a cryptographic hash function.
A.19Let H be an n-bit cryptographic hash function and S a finite set of strings with #S ≥ 2. Define the function H2(x) := 0^{n+1} if x ∈ S, and H2(x) := 1 ‖ H(x) otherwise. Here, 0^{n+1} refers to a bit string of length n + 1 containing zero-bits only. Show that H2 is second pre-image resistant, but not collision resistant. [H]
A.20Let H be an n-bit cryptographic hash function. Show that the function H3 defined as H3(x) := 1 ‖ x if x is of length n, and H3(x) := 0 ‖ H(x) otherwise, is collision resistant (and hence second pre-image resistant), but not first pre-image resistant. [H]
A.21Let m be a product of two (unknown) big primes and let the binary representation of m (with leading one-bit) have n bits. Assume that it is computationally infeasible to compute square roots modulo m. We can identify bit strings with integers in a natural way. For a bit string x, take y := 1 ‖ x and let H4(x) denote the n-bit binary representation of y2 (mod m). Show that H4 is first pre-image resistant, but not second pre-image resistant (and hence not collision-resistant). [H]
A.22Let H be an n-bit cryptographic hash function. Assume that H produces random hash values on random input strings. Prove that O(2^{n/2}) hash values need to be computed to detect a collision for H with high probability. [H] Deduce also that nearly 2^{n–1} hash values need to be computed on an average to obtain a second pre-image x′ of H(x).
A.23Let F1 : {0, 1}^{2n} → {0, 1}^n be a collision resistant compression function.
  1. Define a compression function F2 : {0, 1}^{4n} → {0, 1}^n as follows. Let x be a bit string of length 4n. Write x = L ‖ R, where each of L and R is of length 2n bits. Define F2(x) := F1(F1(L) ‖ F1(R)). Show that F2 is also collision-resistant.

  2. Inductively define Fk : {0, 1}^{2^k n} → {0, 1}^n as Fk(x) := F1(Fk–1(L) ‖ Fk–1(R)), where L and R are the left and right halves of x. Show that each Fk is collision resistant.

  3. Show that if F1 is first pre-image resistant, then so is each Fk.

  4. Define an n-bit hash function H as follows. Let x be a bit string of length l. If l < n, take k := 1, else choose k ∈ ℕ such that 2^{k–1}n ≤ l < 2^k n. Construct the string y := x ‖ 1 ‖ 00 . . . 0 of length 2^k n and define H(x) := Fk(y). Is H collision resistant? [H] (Appending a one-bit at the end of x delimits x and thereby prevents trivial collisions.)

A.24
  1. Let F1 : {0, 1}^{m1} → {0, 1}^{n1} and F2 : {0, 1}^{m2} → {0, 1}^{n2} be cryptographic compression functions. Show that F : {0, 1}^{m1+m2} → {0, 1}^{n1+n2} defined as F(L ‖ R) := F1(L) ‖ F2(R) (where L ∈ {0, 1}^{m1} and R ∈ {0, 1}^{m2}) is again a cryptographic compression function.

  2. The hash function H derived from DES (Section A.4.1) produces 64-bit hash values. For reasonable security, we require n-bit hash values with n at least 128. Use Part (a) to propose a method to make H achieve this desired level of security.

A.25Assume that in the SHA-1 algorithm the designers opted for Algorithm A.19 with the following minor modifications: They defined fj as fj(x, y, z) := xyz for all and they replaced all costly mod 232 addition operations (+) by cheap bit-wise XOR operations (⊕). Do you sense anything wrong with this design? [H]

B. Key Exchange in Sensor Networks

B.1Introduction
B.2Security Issues in a Sensor Network
B.3The Basic Bootstrapping Framework
B.4The Basic Random Key Predistribution Scheme
B.5Random Pairwise Scheme
B.6Polynomial-pool-based Key Predistribution
B.7Matrix-based Key Predistribution
B.8Location-aware Key Predistribution

One of the keys to happiness is a bad memory.

—Rita Mae Brown

That theory is worthless. It isn’t even wrong!

—Wolfgang Pauli

You’re only as sick as your secrets.

—Anonymous

B.1. Introduction

Public-key cryptography is not a solution to every security problem. Asymmetric routines are bulky and slow, and, in practice, augment symmetric cryptography by eliminating the need for prior secret establishment of keys between communicating parties. On a workstation of today’s computing technology, this is an interesting and acceptable breakthrough. A 1 GHz processor runs one public-key encryption or key-exchange primitive in tens to hundreds of milliseconds, using at least hundreds of kilobytes of memory. That is reasonable for most applications, given that the routines are invoked rather infrequently.

Now, imagine a situation, where many tiny computing nodes, called sensor nodes, are scattered in an area for the purpose of sensing some data and transmitting the data to nearby base stations for further processing. This transmission is done by short-range radio communications. The base stations are assumed to be computationally well-equipped, but the sensor nodes are resource-starved. Such networks of sensor nodes are used in many important applications including tracking of objects in an enemy’s area for military purposes and scientific, engineering and medical explorations like wildlife monitoring, distributed seismic measurement, pollution tracking, monitoring fire and nuclear power plants and tracking patients. In some cases, mostly for military and medical applications, data collected by sensor nodes need to be encrypted before transmitting to neighbouring nodes and base stations.

Evidently, one has to resort to symmetric-key cryptography in order to meet the security needs of a sensor network. This appendix provides an overview of some key exchange schemes suitable for sensor networks.

B.2. Security Issues in a Sensor Network

Several issues make secure communication in sensor networks different from that in usual networks:

Limited resources in sensor nodes

Each sensor node contains a primitive processor with very low computing speed and only a small amount of memory. The popular Atmel ATmega 128L, for example, is an 8-bit RISC processor clocked at 4 MHz, with 128 kbytes of programmable flash memory and only 4 kbytes of RAM. The processor has no integer division instruction, and supports multiplication only on 8-bit operands. One requires tens of minutes to several hours for performing a single RSA or Diffie–Hellman exponentiation at cryptographic key sizes.

Limited lifetime of sensor nodes

Each sensor node is battery-powered and is expected to operate for only a few days. As deployed sensor nodes die, it becomes necessary to add fresh nodes to the network to continue the data collection operation. This calls for dynamic management of security objects (like keys).

Limited communication ability of sensor nodes

Sensor nodes communicate with each other and the base stations by wireless radio transmission at low bandwidth and over small communication ranges. For typical motes built around the Atmel ATmega 128L, the maximum bandwidth is 40 kbps, and the communication range is at most 100 feet (about 30 m).

Moreover, the deployment area may have irregularities (like physical obstacles) that further limit the communication abilities of the nodes. One therefore expects that a deployed sensor node can directly communicate with only a few other nodes in the network.

Possibility of node capture

A sensor network is vulnerable to capture of nodes by the enemy. The captured nodes may be physically destroyed, or utilized to send misleading signals and/or disrupt the normal activity of the network. As a result, no node should place full trust in the nodes with which it communicates. The relevant security goal in this context is that the captured nodes should not divulge to the enemy enough secrets to jeopardize the communication among the uncaptured nodes.

Lack of knowledge about deployment configuration

In many situations (like scattering of nodes from airplanes or trucks), the post-deployment configuration of the sensor network is not known a priori. It is unreasonable to use security algorithms that have strong dependence on locations of nodes in the network. For example, each sensor node u is expected to have only a few neighbours with which it can directly communicate. This is precisely the set of nodes with which u needs to share keys. However, the list cannot be determined before the actual deployment. An approximate knowledge of the locations of the nodes may strengthen the protocols, but robustness for handling run-time variations must be built in the protocols.

Mobility of sensor nodes

Sensor nodes may be static or mobile. Mobile nodes change the network configurations (like the lists of neighbours) as functions of time and call for time-varying security tools.

Still, sensor nodes need to communicate secretly. The clear impracticality of using public-key routines forces one to use symmetric ciphers. But setting up symmetric keys among communicating nodes is a difficult task. The number n of nodes in a sensor network can range up to several hundred thousand. Storing a symmetric key for each pair of nodes is impossible, since each sensor would then need memory large enough to hold n − 1 keys. At the other extreme, every communication may use a single network-wide symmetric key; in that case, the capture of a single node renders communication over the entire network completely insecure.

The plot thickens. There are graceful ways out. A host of algorithms has recently been proposed to address key establishment issues in sensor networks. In the rest of this appendix, we provide a quick survey of these tools. For the sake of simplicity, we assume here that our sensor network is static, that is, the nodes have no (or negligibly small) mobility. Though the schemes described below may be adapted to mobile networks, the required modifications are not necessarily easy, and the current literature does not seem to be ready to take mobility into account.

We continue to deal with sensor processors of the capability of Atmel ATmega 128L. In practice, better processors (with speed, storage and cost roughly one order of magnitude higher) are available. We assume that the size (number of nodes) n of a sensor network is (usually) not bigger than a million, and also that a sensor node has of the order of 100 neighbours in its communication range.

B.3. The Basic Bootstrapping Framework

Key establishment in a sensor network is effected by a three-stage process called bootstrapping. Subsequent node-to-node communication uses the keys established during the bootstrapping phase. The three stages of bootstrapping are as follows:

Key predistribution

This step is carried out before the deployment of the sensors. A key set-up server chooses a pool 𝒦 of randomly generated keys and assigns to each sensor node ui a subset Ki ⊆ 𝒦. The set Ki is called the key ring of the node ui. The key predistribution algorithms essentially differ in the ways the sets 𝒦 and Ki are selected. Each key is associated with an ID that need not be kept secret and can even be transmitted in plaintext. Similarly, each sensor node is given a unique ID which need not be maintained secretly.

Direct key establishment

Immediately after deployment, each sensor node tries to determine all other sensor nodes with which it can communicate directly and secretly. Two nodes that are within the communication ranges of one another are called physical neighbours, whereas two nodes sharing one (or more) key(s) in their key rings are called key neighbours. Two nodes can secretly (and directly) communicate with one another if and only if they are both physical and key neighbours; let us call such pairs direct neighbours.

In the direct key establishment phase, each sensor node u locates its direct neighbours. To that end u broadcasts its own ID and the IDs of the keys in its key ring. Each physical neighbour v of u responds by mentioning the matching key IDs, if any, stored in the key ring of v. This is how u identifies its direct neighbours.

If sending unencrypted key IDs poses a potential threat to the security of the network, each node u can instead encrypt some fixed plaintext message under each of the keys in its ring and broadcast the corresponding ciphertexts in place of the key IDs. Those physical neighbours of u that can decrypt one of the transmitted ciphertexts using a key in their own key rings establish themselves as direct neighbours of u.
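As a toy illustration of this exchange (the node and key IDs below are invented; real nodes broadcast them over radio), the matching step amounts to a set intersection on key IDs:

```python
# Hypothetical sketch of direct key establishment: each node broadcasts the
# IDs of the keys in its ring, and two physical neighbours become direct
# neighbours precisely when their broadcast ID sets intersect.

def shared_key_ids(ring_u, ring_v):
    """Key IDs common to two key rings (modelled as sets of IDs)."""
    return ring_u & ring_v

ring_u = {17, 42, 101, 256}        # key ring of node u (IDs only; keys stay secret)
ring_v = {42, 99, 256, 300}        # key ring of a physical neighbour v

common = shared_key_ids(ring_u, ring_v)
are_direct_neighbours = bool(common)   # u and v share keys 42 and 256
```

Only the public IDs travel over the air; the keys themselves never leave the nodes.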

Path key establishment

This is an optional stage and, if executed, adds to the connectivity of the network. Suppose that two physical neighbours u and v fail to establish a direct link between them in the direct key establishment phase. But there exists a path u = u0, u1, u2, . . . , uh–1, uh = v in the network with each ui a direct neighbour of ui+1 (for i = 0, 1, . . . , h – 1). The node u then generates a random key k, encrypts k with the key shared between u and u1 and sends the encrypted key to u1. Subsequently, u1 retrieves k by decryption, encrypts k by the key shared by u1 and u2 and sends this encrypted version of k to u2. This process is repeated until the key k reaches the desired destination v. Now, u and v can communicate secretly and directly using k and thereby become direct neighbours.

The main difficulty in this process is the discovery of a path between u and v. Discovery can be initiated by u broadcasting a message reflecting its desire to communicate with v. Let u1 be a direct neighbour of u. If u1 is also a direct neighbour of v, a path between u and v is discovered. Otherwise, u1 retransmits u's request to its own direct neighbours. This process is repeated until a path is established between u and v, or the number of hops exceeds a certain limit. Note that path discovery may incur substantial communication overhead, so the maximum number h of hops allowed needs to be fixed at a modest value. Typically, h = 2 or 3 is recommended.

A bootstrapping algorithm, or more precisely, a key predistribution algorithm must fulfill the following requirements. These requirements often turn out to be mutually contradictory. A key predistribution scheme attempts to achieve suitable trade-offs among them.

Compactness

Each key ring should be small enough to fit in a sensor node's memory. Typically, 50–200 cryptographic keys (say, 128-bit keys of block ciphers) can be stored in each node. This number sits between the two extremes of n − 1 keys (a distinct key for each pair of nodes) and a single master key for the entire network.

Randomness

The key rings of different nodes are to be chosen randomly from a big pool, so that the overlap between the rings of any two nodes remains small.

Network connectivity

The resulting network should be connected: the undirected graph G = (V, E), with V comprising the nodes in the network and E containing a link (u, v) if and only if u and v are direct neighbours, must be connected (or at least connected with high probability).

Resilience against node capture

Ideally, the capture of any number of nodes must not divulge the secret key(s) between uncaptured direct neighbours. Practically, the fraction of communication links among uncaptured nodes, that are compromised because of node captures, must be small, at least as long as the fraction of nodes that are captured is not too high.

Scalability

Arbitrarily (but not impractically) big networks should be supported.

Future addition of nodes

One should allow new nodes to join the network at any point of time after the initial deployment, for example, to replenish captured, faulty and dead nodes.

Additional requirements may also be conceived of in order to take curative measures against active attacks and/or faults. However, a study of active attacks and of countermeasures against those is beyond the scope of our treatment here.

Detection of bad nodes

There should be a mechanism to detect the presence and identities of dead, malfunctioning and rogue nodes. Here, a rogue node stands for a captured node that is used by the enemy to disrupt the natural working of the network. Active attacks mountable by the enemy include transmission of unauthorized and misleading data across the network, making neighbours always busy and letting them run out of battery sooner than the expected lifetime (sleep deprivation attack), and so on.

Revocation of bad nodes

Faulty and rogue nodes must be pruned out of the network before they can cause sizeable harm.

Resilience against node replication

Captured nodes can be replicated and the copies deployed by the enemy with the intention that these added nodes outnumber the legitimate nodes and eventually take control of the network. There should be a strategy to detect and cure replication of malicious nodes.

We now concentrate on some concrete realizations of the bootstrapping scheme. The optional third stage (path key establishment) will often be excluded from our discussion, because there are few algorithm-specific issues in this stage.

Before we introduce specific algorithms, let us summarize the notations we are going to use in the rest of this appendix:

n = Number of nodes in the sensor network
n′ = (Expected) number of nodes in the physical neighbourhood of each node
d = Degree of connectivity of each node in the key/direct neighbourhood graph
Pc = Global connectivity (a high probability like 0.9999)
p′ = Local connectivity (probability that two physical neighbours share a key)
M = Size of the key pool
m = Size of the key ring of each node (in number of cryptographic keys)
𝔽q = The underlying field for the poly-pool and the matrix-pool schemes
S = Size of the polynomial (or matrix) pool
s = Number of polynomial (or matrix) shares in the key ring of each node
t = Degree of a polynomial (or dimension of a matrix)
c = Number of nodes captured
Pe = Probability of successful eavesdropping, expressed as a function of c

B.4. The Basic Random Key Predistribution Scheme

The paper [88] by Eschenauer and Gligor is a pioneering work on bootstrapping in sensor networks. Their scheme, henceforth referred to as the EG scheme, is essentially the basic bootstrapping method just described.

The key set-up server starts with a pool 𝒦 of randomly generated keys. The number M of keys in 𝒦 is taken to be a small multiple of the network size n. For each sensor node u to be deployed, a random subset of m keys from 𝒦 is selected and given to u as its key ring. Upon deployment, each node discovers its direct neighbours as specified in the generic description. We now explain how the parameters M and m are to be chosen so as to make the resulting network connected with high probability.

Let us first look at the key neighbourhood graph Gkey on the n sensor nodes, in which a link exists between two nodes if and only if these nodes are key neighbours. Let p denote the probability that a link exists between two randomly selected nodes of this graph. A result on random graphs due to Erdős and Rényi indicates that in the limit n → ∞, the probability that Gkey is connected is

Equation B.1

    Pc = e^(−e^(−a)),  where p = (ln n)/n + a/n for a real constant a.
We fix Pc at a high value, say, 0.9999 (so that a = −ln(−ln Pc)), and express the expected degree of each node in Gkey as

Equation B.2

    d = p(n − 1) = ((n − 1)/n) (ln n − ln(−ln Pc)).
In practice, we should also bring physical neighbourhood into consideration and look at the direct neighbourhood graph G = Gdirect on the n deployed sensor nodes. In this graph, two nodes are connected by an edge if and only if they are direct neighbours. G is not random, since it depends on the geographical distribution of the nodes in the deployment area. However, we assume that the above result for random graphs continues to hold for G too. In particular, we fix the degree of direct connectivity of each node to be (at least) d and require

Equation B.3

    p′ ≥ d/n′,

where n′ denotes the expected number of physical neighbours of each node, and where p′ is the probability that two physical neighbours share one or more keys in their key rings. (Pc is often called the global connectivity and p′ the local connectivity.)

For the determination of p′, we first note that there is a total of C(M, m) key rings of size m that can be chosen from the pool 𝒦 of size M (here C(a, b) denotes the binomial coefficient "a choose b"). For a fixed ring Ki, the total number of ways of choosing a ring Kj such that Kj does not share a key with Ki is equal to the number of ways of choosing m keys from the M − m keys outside Ki. This number is C(M − m, m). It then follows that

Equation B.4

    p′ = 1 − C(M − m, m)/C(M, m).
Equations (B.2), (B.3) and (B.4) dictate how the key-pool size M is to be chosen, given the values of n, n′ and m.

Example B.1.

As a specific numerical example, consider a sensor network with n = 10,000 nodes. For the desired probability Pc = 0.9999 of connectedness of Gkey, we use Equation (B.2) to obtain the desired degree d as d ≥ 18.419. Let us take d = 20. Now, suppose that the expected number of physical neighbours of each deployed node is n′ = 50. By Equation (B.3), we then require p′ = d/n′ = 0.4. Finally, assume that each sensor can hold m = 150 keys in its memory. Equation (B.4) indicates that we should have M ≤ 44,195 in order to ensure p′ ≥ 0.4. In particular, we may take M = 40,000.
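The arithmetic of this example can be checked with a short script built from Equations (B.2)–(B.4); `math.comb` evaluates the binomial coefficients exactly (the function names are ours, chosen for readability):

```python
import math

def desired_degree(n, Pc):
    """Equation (B.2): d = ((n-1)/n) * (ln n - ln(-ln Pc))."""
    return ((n - 1) / n) * (math.log(n) - math.log(-math.log(Pc)))

def local_connectivity(M, m):
    """Equation (B.4): p' = 1 - C(M-m, m)/C(M, m)."""
    return 1 - math.comb(M - m, m) / math.comb(M, m)

n, Pc, n_prime, m = 10_000, 0.9999, 50, 150
d = desired_degree(n, Pc)                  # ≈ 18.419; round up to d = 20
p_required = 20 / n_prime                  # Equation (B.3): p' >= d/n' = 0.4
p_actual = local_connectivity(40_000, m)   # comfortably above 0.4
```

Larger pools lower the local connectivity, which is why M must be bounded above.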

Let us now study the resilience of the EG scheme against node captures. Assume that c nodes are captured at random from the network and that u and v are two uncaptured nodes that are direct neighbours. We compute the probability Pe that an eavesdropper can decipher encrypted communication between u and v based on the knowledge of the keys available from the c captured key rings. Clearly, smaller values of Pe indicate higher resilience against node captures.

Suppose that u and v use the key k for communication between them. Then, Pe is equal to the probability that k resides in one of the key rings of the c captured nodes. Since each key ring consists of m keys randomly chosen from a pool of M keys, the probability that a particular key k is not available in a key ring is 1 − m/M, and consequently the probability that k does not appear in any of the c compromised key rings is (1 − m/M)^c. Thus, the probability of successful eavesdropping is

    Pe = 1 − (1 − m/M)^c.

Example B.2.

As in Example B.1, take n = 10,000, n′ = 50, m = 150 and M = 40,000. If c = 100 nodes are captured, the fraction of compromised communication is Pe ≈ 0.313. Thus, a capture of only 100 nodes leads to a compromise of about one-third of the traffic. That is not a satisfactory figure. We need better algorithms.
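The figure 0.313 follows directly from the formula Pe = 1 − (1 − m/M)^c; a two-line check:

```python
def eg_eavesdrop_probability(c, m, M):
    """Pe = 1 - (1 - m/M)**c for the EG scheme after c node captures."""
    return 1 - (1 - m / M) ** c

Pe = eg_eavesdrop_probability(100, 150, 40_000)   # ≈ 0.313, as in Example B.2
```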

B.4.1. The q-composite Scheme

Chan et al. [44] propose several modifications of the basic EG scheme in order to improve upon the resilience of the network against node capture. The q-composite scheme, henceforth abbreviated as the qC scheme, is based on the requirement of a bigger overlap of key rings for enabling nodes to communicate.

As in the EG scheme, the key set-up server chooses a pool 𝒦 of M random keys and loads the key ring of each node with a random subset of 𝒦 of size m. Let the network consist of n nodes.

In the direct key establishment phase, each node u discovers all its physical neighbours that share q or more keys with u, where q is a predetermined system-wide parameter. Those physical neighbours are now called direct neighbours of u. Let v be a direct neighbour of u, and let q′ ≥ q be the actual number of keys shared by u and v. Call these keys k1, k2, . . . , kq′. The nodes use the key

k := H(k1 ‖ k2 ‖ · · · ‖ kq′)

for future communication, where ‖ denotes string concatenation and H is a hash function. A pair of physical neighbours that share fewer than q predistributed keys do not communicate directly.

Recall that for the basic EG scheme q = 1 and the key k for communication between direct neighbours is taken to be one shared key instead of a hash value of all shared keys. The motivation behind going for the qC scheme is that requiring a bigger overlap between the key rings of a pair of physical neighbours leads to a smaller probability Pe of successful eavesdropping, since now the eavesdropper has to possess the knowledge of at least q shared keys (not just one). However, the requirement of q (or more) matching keys between communicating nodes restricts the key pool size M more than the EG scheme, and consequently a capture of fewer nodes reveals a bigger fraction of the total key pool to the eavesdropper. Chan et al. [44] report that the best trade-off is achieved for the value q = 2 or 3.
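A sketch of this key derivation, with SHA-256 standing in for the unspecified hash function H; sorting the shared keys by key ID fixes a canonical concatenation order, an assumption the description leaves implicit:

```python
import hashlib

def qc_session_key(shared_keys):
    """Derive k := H(k1 || k2 || ... || kq') over the q' shared keys.

    shared_keys is a list of (key_id, key_bytes) pairs discovered during
    direct key establishment.  Sorting by key ID ensures both endpoints
    concatenate in the same order (our assumption); SHA-256 plays H.
    """
    h = hashlib.sha256()
    for _, key in sorted(shared_keys):
        h.update(key)
    return h.digest()

# The two endpoints may discover the shared keys in different orders,
# yet they derive the same 256-bit session key.
at_u = [(42, b"K" * 16), (7, b"J" * 16), (13, b"L" * 16)]
at_v = [(13, b"L" * 16), (42, b"K" * 16), (7, b"J" * 16)]
```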

Let us now derive the explicit expressions for M and Pe. Equations (B.1), (B.2) and (B.3) hold for the qC scheme with the sole exception that now the interpretation of the probability p′ of direct neighbourhood is different. There is a total of C(M, m)² ways of choosing an ordered pair of key rings of size m from a pool of M keys. Let us compute the number of such pairs of key rings sharing exactly r keys. First, the r shared keys can be chosen in C(M, r) ways. Out of the remaining M − r keys, the remaining m − r keys for the first ring can be chosen in C(M − r, m − r) ways. Finally, the remaining m − r keys for the second ring can be chosen in C(M − m, m − r) ways from the M − m keys not present in the first ring. Thus, we have

    p(r) = C(M, r) C(M − r, m − r) C(M − m, m − r) / C(M, m)²,

that is,

    p′ = 1 − (p(0) + p(1) + · · · + p(q − 1))

is the equivalent of Equation (B.4) for the qC scheme.

Example B.3.

As in Example B.1, consider n = 10,000, n′ = 50, m = 150. For d = 20, we require p′ ≥ 0.4. This, in turn, demands M ≤ 16,387 for q = 2 and M ≤ 9,864 for q = 3. Compare these with the requirement M ≤ 44,195 for the EG scheme.
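These pool-size bounds can be reproduced from the counting argument above; Python's exact big-integer arithmetic handles the huge binomial coefficients without overflow (the loose tolerances allow for rounding at the threshold):

```python
import math

def p_overlap(r, m, M):
    """Probability that two random m-key rings from an M-key pool share
    exactly r keys: the ordered-pair count divided by C(M, m)**2."""
    return (math.comb(M, r) * math.comb(M - r, m - r)
            * math.comb(M - m, m - r)) / math.comb(M, m) ** 2

def qc_local_connectivity(q, m, M):
    """p' = 1 - (p(0) + ... + p(q-1)): rings share at least q keys."""
    return 1 - sum(p_overlap(r, m, M) for r in range(q))
```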

Let us now calculate the probability Pe of successfully deciphering the communication between two uncaptured nodes u and v, given that c nodes are already captured by the eavesdropper. Let q′ ≥ q be the actual number of keys shared by u and v; given that u and v are direct neighbours, this happens with probability p(q′)/p′. Each of these common keys is available to the eavesdropper with probability 1 − (1 − m/M)^c. It follows that

    Pe = Σ_{q′=q}^{m} (1 − (1 − m/M)^c)^{q′} · p(q′)/p′.

Example B.4.

Let us continue with the network of Examples B.1, B.2 and B.3. The following table summarizes the probabilities Pe for various values of c. For the EG scheme, we take M = 40,000, whereas for the qC scheme, we take M = 16,000 for q = 2 and M = 9,800 for q = 3.

                                      Pe
Scheme   c = 10   c = 20   c = 30   c = 40   c = 50   c = 75   c = 100   c = 150
EG       0.037    0.072    0.107    0.140    0.171    0.246    0.313     0.431
2C       0.005    0.019    0.041    0.068    0.101    0.196    0.300     0.499
3C       0.002    0.011    0.032    0.066    0.111    0.255    0.413     0.678

This table indicates that when the number of nodes captured is small, the qC scheme outperforms the EG scheme. However, for large values of c, the effects of smaller values of the key-pool size show up, leading to a poorer performance of the qC schemes compared to the EG scheme.
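The table entries can be regenerated from the expression for Pe given earlier in this section; here we use the equivalent hypergeometric form of p(r), which is simpler to code:

```python
import math

def p_overlap(r, m, M):
    """P[two random m-key rings share exactly r keys]: hypergeometric
    form, equivalent to the ordered-pair counting in the text."""
    return math.comb(m, r) * math.comb(M - m, m - r) / math.comb(M, m)

def qc_eavesdrop_probability(c, q, m, M):
    """Pe = sum over q' >= q of (1-(1-m/M)**c)**q' * p(q')/p'."""
    p_key = 1 - (1 - m / M) ** c               # one given key is exposed
    terms = [p_overlap(r, m, M) for r in range(q, m + 1)]
    p_local = sum(terms)                       # p' = P[at least q shared keys]
    return sum(p_key ** r * t for r, t in zip(range(q, m + 1), terms)) / p_local
```

For instance, with M = 16,000, q = 2 and c = 100 captures, this evaluates to about 0.300, the table's 2C entry.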

B.4.2. Multi-path Key Reinforcement

Another way to improve the resilience of the network against node captures is the multi-path key reinforcement scheme, proposed again by Chan et al. [44]. As in the EG scheme, sensor nodes are deployed, each with m keys in its key ring chosen randomly from a pool of M keys. Let u and v establish themselves as direct neighbours sharing the key k. Instead of using k itself as the key for future communication, the nodes try to locate several pairwise node-disjoint paths between them. Such a path u = v0, v1, . . . , vl = v consists of pairs of direct neighbours (vi, vi+1) for i = 0, . . . , l − 1. A randomly generated key is then routed securely along each such path from u to v.

Assume that r node-disjoint paths between u and v are discovered and that the random keys k′1, k′2, . . . , k′r are transferred securely along these paths. The nodes u and v then use the key

    k′ := k ⊕ k′1 ⊕ k′2 ⊕ · · · ⊕ k′r

for future communication.

The reason why this scheme improves resilience against node captures is that even if the original key k resides in the memory of a captured node, the new key k′ is computable by the adversary if and only if she can also obtain all of the r session secrets k′1, k′2, . . . , k′r. The bigger r is, the more difficult it is for the adversary to eavesdrop on all of the r node-disjoint paths. On the other hand, if the lengths of these paths are large, the probability of eavesdropping at some link of the paths increases. Moreover, increasing the lengths of the paths incurs a bigger communication overhead. The proponents of the scheme recommend only 2-hop multi-path key reinforcement.
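A sketch of the combination step, assuming the reinforced key is formed by XOR-ing the directly established key with the per-path secrets; the adversary recovers k′ only with k and every path secret in hand:

```python
import secrets
from functools import reduce

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def reinforced_key(k, path_secrets):
    """k' = k XOR k'_1 XOR ... XOR k'_r (XOR order does not matter).

    k is the key from direct key establishment; path_secrets are the
    random values routed along the r node-disjoint paths."""
    return reduce(xor_bytes, path_secrets, k)

k = secrets.token_bytes(16)                           # directly shared key
path_secrets = [secrets.token_bytes(16) for _ in range(3)]  # r = 3 paths
k_new = reinforced_key(k, path_secrets)
```

Both endpoints hold k and receive the same path secrets, so both compute the same k′ regardless of arrival order.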

We do not go into the details of the analysis of the multi-path key reinforcement scheme, but refer the reader to Chan et al. [44]. We only note that though it is possible to use multi-path key reinforcement for the q-composite scheme, it is not a lucrative option. The smaller size of the key pool for the q-composite scheme tends to nullify the effects of multi-path key reinforcement.

B.5. Random Pairwise Scheme

A pairwise key predistribution scheme offers perfect resilience against node captures; that is, the capture of any number c of nodes does not reveal any information about the secrets used by uncaptured nodes. This corresponds to Pe = 0 irrespective of c. This desirable property is achieved by placing each key in the key rings of only two nodes. Moreover, the sharing of a key k between a unique pair of nodes u and v implies that these nodes can authenticate themselves to one another: no other node possesses k, so none can pass itself off as u to v or as v to u.

Pairwise keys can be distributed to nodes in many ways. Here, we deal with random distribution. Let m denote the size of the key ring of each sensor node. For each node u in the network, the key set-up server randomly selects m other nodes v1, . . . , vm and distributes a new random key ki to each of the pairs (u, vi) for i = 1, . . . , m. This distribution mechanism should also ensure that any two nodes u, v in the network share at most one key. If k is given to u and v, the set-up server also attaches the ID of v to the copy of k in the key ring of u, and the ID of u to the copy of k in the key ring of v.

In the direct key establishment phase, each node u broadcasts its own ID. Each physical neighbour v of u that finds the ID of u stored against a key in its key ring identifies u as a direct neighbour, along with the unique key shared by u and v.

The analysis of the random pairwise scheme is a bit tricky. Here, the global connectivity graph Gkey is m-regular, that is, each node has degree exactly m, and we cannot expect to maintain this degree locally too. On the other hand, it is reasonable to assume under a random deployment model that the fraction of nodes with which a given node shares pairwise keys remains the same both locally and globally. More precisely, we equate p′ with p, that is,

Equation B.5

    d/n′ = m/(n − 1) ≈ m/n.
Here, d denotes the desired local degree of a node. Equation (B.2) gives the formula for d in terms of the global connectivity Pc. For Pc = 0.9999, we have d = 16.11 for n = 1,000, d = 18.42 for n = 10,000, d = 20.72 for n = 100,000, and d = 23.03 for n = 1,000,000. That is, the value of d does not depend heavily on n, as long as n ranges over practical values. In particular, one may fix d = 20 (or d = 25 more conservatively) for all applications.

Equation (B.5) implies

    n ≈ mn′/d.

This equation reflects the drawback of the random pairwise scheme. The value m is limited by the memory of a sensor node, n′ is dictated by the density of nodes in the deployment area, and d can be taken as a constant; so the network size n is bounded above by the quantity mn′/d, called the maximum supportable network size. The basic scheme (and its variants) support networks of arbitrarily large sizes, whereas the random pairwise scheme offers only limited support.

Example B.5.

Take m = 150, n′ = 50 and d = 20. The maximum supportable network size is then mn′/d = (150 × 50)/20 = 375. This is too small to be useful. We require modifications of the random pairwise scheme in order to be able to use it in practice.

B.5.1. Multi-hop Range Extension

Since m and d are limited by hard constraints, the only way to increase the maximum supportable network size is to increase the effective size n′ of the physical neighbourhood of a node. The multi-hop range extension strategy accomplishes that. In the direct key establishment phase, each node u broadcasts its ID. Each physical neighbour v of u re-broadcasts the ID of u, each physical neighbour w of v re-broadcasts it again, and so on, for a predetermined number r of hops. Any node u′ reachable from u in ≤ r hops and sharing a pairwise key with u can now establish a path of secure communication with u. During a future communication between u and u′, the intermediate nodes in the path simply forward messages encrypted by the pairwise key between u and u′. Using r hops thereby increases the effective radius of physical neighbourhood by a factor of r, and consequently the number of effective neighbours of each node gets multiplied by a factor of r². Thus, the maximum supportable network size now becomes

    n ≈ m(r²n′)/d.

For r = 3 and for the parameters of Example B.5, this size attains the more decent value of 3375.
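The bound, together with the r²-fold gain from range extension, is simple arithmetic:

```python
def max_network_size(m, n_prime, d, r=1):
    """n <= m * (r**2 * n') / d: the maximum supportable network size,
    with r-hop range extension multiplying the neighbourhood by r**2."""
    return (m * r * r * n_prime) // d

single_hop = max_network_size(150, 50, 20)        # 375  (Example B.5)
three_hop = max_network_size(150, 50, 20, r=3)    # 3375 (3-hop extension)
```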

Increasing r incurs some cost. First, the communication overhead increases quadratically with r. Second, since intermediate nodes in a multi-hop path simply retransmit messages without authentication, chances of specific active attacks at these nodes increase. Large values of r are, therefore, discouraged.

B.6. Polynomial-pool-based Key Predistribution

Liu and Ning's polynomial-pool-based key predistribution scheme (abbreviated as the poly-pool scheme) [181, 183] is based on the idea presented by Blundo et al. [28]. Let 𝔽q be a finite field with q just large enough to accommodate a symmetric encryption key. For a 128-bit block cipher, one may take q to be the smallest prime larger than 2^128 (prime field) or 2^128 itself (extension field of characteristic 2). Let f(X, Y) ∈ 𝔽q[X, Y] be a bivariate polynomial that is assumed to be symmetric, that is, f(X, Y) = f(Y, X). Let t be the degree of f in each of X and Y. A polynomial share of f is a univariate polynomial f(α)(X) := f(X, α) for some element α ∈ 𝔽q. Two shares f(α) and f(β) of the same polynomial f satisfy

Equation B.6

    f(α)(β) = f(β, α) = f(α, β) = f(β)(α).

Thus, if the shares f(α) and f(β) are given to two nodes, they can come up with the common value f(α, β) as a shared secret between them.

Given t + 1 or more shares of f, one can reconstruct f(X, Y) uniquely using Lagrange's interpolation formula (Exercise 2.53). On the other hand, if only t or fewer shares are available, there are many (at least q) possibilities for f, and it is impossible to determine f uniquely. So the disclosure of up to t shares does not reveal the polynomial f to an adversary, and uncompromised shared keys based on f remain secure.
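The whole mechanism fits in a few lines of Python. This is a toy sketch with parameters of our own choosing (a Mersenne prime field and degree t = 3); the symmetric coefficient matrix c[i][j] = c[j][i] is what enforces f(X, Y) = f(Y, X):

```python
import random

q = 2**127 - 1          # a Mersenne prime; field elements double as keys

def random_symmetric_poly(t, rng):
    """Coefficients c[i][j] of f(X, Y) = sum c[i][j] X^i Y^j with
    c[i][j] = c[j][i], which makes f symmetric: f(X, Y) = f(Y, X)."""
    c = [[0] * (t + 1) for _ in range(t + 1)]
    for i in range(t + 1):
        for j in range(i, t + 1):
            c[i][j] = c[j][i] = rng.randrange(q)
    return c

def share(c, alpha):
    """The share f(X, alpha): coefficient of X^i is sum_j c[i][j] alpha^j."""
    return [sum(cij * pow(alpha, j, q) for j, cij in enumerate(row)) % q
            for row in c]

def eval_poly(coeffs, x):
    return sum(a * pow(x, i, q) for i, a in enumerate(coeffs)) % q

rng = random.Random(2024)
f = random_symmetric_poly(3, rng)
u, v = 12345, 67890                      # node IDs, elements of F_q
key_at_u = eval_poly(share(f, u), v)     # u computes f(v, u)
key_at_v = eval_poly(share(f, v), u)     # v computes f(u, v)
# By symmetry, key_at_u == key_at_v: the pairwise key f(u, v).
```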

Using a single polynomial for the entire network is not a good proposal, since t is limited by memory constraints in a sensor node. In order to increase resilience against node captures, many bivariate polynomials need to be used, and shares of random subsets of this polynomial pool are assigned to the key rings of individual nodes. This is how the poly-pool scheme works. If the degree t equals 0, this scheme degenerates to the EG scheme.

The key set-up server first selects a random pool ℱ of S symmetric bivariate polynomials in 𝔽q[X, Y], each of degree t in X and Y. Some IDs are also generated for the nodes in the network. For each node u in the network, s polynomials f1, f2, . . . , fs are randomly picked from ℱ, and the polynomial shares f1(X, α), f2(X, α), . . . , fs(X, α) are loaded in the key ring of u, where α is the ID of u. Each key ring now requires space for storing s(t + 1) log q bits, that is, for storing m := s(t + 1) symmetric keys.

Upon deployment, each node u broadcasts the IDs of the polynomials whose shares reside in its key ring. Each physical neighbour v of u that has shares of some common polynomial(s) establishes itself as a direct neighbour of u. The exact pairwise key k between u and v is then calculated using Equation (B.6). If broadcasting polynomial IDs in plaintext is too unsafe, each node u can send some message encrypted by potential pairwise keys based on its polynomial shares. Those physical neighbours that can decrypt one of these encrypted messages have shares of common polynomials.

Like the EG scheme, the poly-pool scheme can be analysed under the framework of random graphs. Equations (B.1), (B.2) and (B.3) continue to hold under the poly-pool scheme. However, in this case the local connection probability p′ is computed as

Equation B.7

    p′ = 1 − C(S − s, s)/C(S, s).
Given constraints on the network and the nodes, the desired size S of the polynomial pool can be determined from this formula.
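Equation (B.7) is cheap to evaluate. For instance, with s = 3 shares per node, a pool of S = 20 polynomials yields p′ ≈ 0.4035, just above the 0.4 threshold of our running examples, while S = 21 already falls below it:

```python
import math

def poly_pool_connectivity(S, s):
    """Equation (B.7): p' = 1 - C(S - s, s)/C(S, s)."""
    return 1 - math.comb(S - s, s) / math.comb(S, s)

p_20 = poly_pool_connectivity(20, 3)   # 1 - 680/1140 ≈ 0.4035
p_21 = poly_pool_connectivity(21, 3)   # 1 - 816/1330 ≈ 0.3865
```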

Let us now compute the probability Pe of compromise of communication between two uncaptured nodes u, v as a function of the number c of captured nodes. If c ≤ t, the eavesdropper cannot gather enough polynomial shares to learn about any polynomial in ℱ, that is, Pe = 0. So assume that c > t, and let pr denote the probability that exactly r shares of a given polynomial f (say, the one whose shares are used by the two uncaptured nodes u, v) are available in the key rings of the c captured nodes. The probability that a share of f is present in a key ring is s/S, and so (by the binomial distribution)

Equation B.8

    pr = C(c, r) (s/S)^r (1 − s/S)^(c − r).

Since t + 1 or more shares of f are required for the determination of f, we have

Equation B.9

    Pe = Σ_{r=t+1}^{c} pr.
Example B.6.

Let n = 10,000 (network size), n′ = 50 (expected size of physical neighbourhood of a node), m = 150 (key ring size in number of symmetric keys) and Pc = 0.9999 (global connectivity). Let us plan to choose bivariate polynomials of degree t = 49, so that each key ring can hold s = 3 polynomial shares.

For the determination of S, we first compute d = 20 as in Example B.1. We then require p′ ≥ d/n′ = 0.4. The biggest size S satisfying this bound, derived from Equation (B.7), is S = 20.

The following table lists the probability Pe for various values of c.

c     50          100         150        200        250      300    350    400
Pe    6.38×10−42  2.30×10−16  1.70×10−8  1.52×10−4  0.0196   0.231  0.668  0.932

The table shows substantial improvement in resilience against node capture as achieved by the poly-pool scheme over the EG and qC schemes.
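The table can be reproduced from Equations (B.8) and (B.9); the exact binomial tail is a one-liner with `math.comb`:

```python
import math

def poly_pool_eavesdrop(c, s, S, t):
    """Pe = sum_{r=t+1}^{c} C(c, r) (s/S)**r (1 - s/S)**(c - r):
    the chance that more than t shares of f leak from c captured rings."""
    p = s / S
    return sum(math.comb(c, r) * p ** r * (1 - p) ** (c - r)
               for r in range(t + 1, c + 1))

Pe_50 = poly_pool_eavesdrop(50, 3, 20, 49)     # = (3/20)**50 ≈ 6.4e-42
Pe_300 = poly_pool_eavesdrop(300, 3, 20, 49)   # ≈ 0.231, matching the table
```

For c = 50 only the single term r = 50 survives, which is why Pe collapses to (s/S)^50.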

B.6.1. Pairwise Key Predistribution

The poly-pool scheme can be made pairwise by allowing no more than t + 1 shares of any polynomial to be distributed among the nodes. The best that the adversary can achieve is a capture of nodes with all these t + 1 shares and a subsequent determination of the corresponding bivariate polynomial. But this knowledge does not help the adversary, since no other node in the network uses a share of this compromised polynomial. That is, two uncaptured nodes continue to communicate with perfect secrecy.

However, like the random pairwise scheme, the pairwise poly-pool scheme suffers from the drawback that the maximum supportable network size is now limited by the quantity ⌊S(t + 1)/s⌋. For the parameters of Example B.6, this size turns out to be an impractically low 333.

B.6.2. Grid-based Key Predistribution

The grid-based key predistribution considerably enhances the resilience of the network against node captures. To start with, let us play a bit with Example B.6.

Example B.7.

Take n = 10,000, n′ = 50 and m = 150. We calculated that the optimal value of S that keeps the network connected with high probability is S = 20. Now, let us instead take a much bigger value of S, say, S = 200. First, let us look at the brighter side of this choice. The probability Pe is listed in the following table as a function of c.

c     500          1000         1500        2000        2500    3000   3500   4000
Pe    1.90×10^–25  4.88×10^–13  3.10×10^–7  4.68×10^–4  0.0282  0.245  0.655  0.917

That is a dramatic improvement in the resilience figures. It, however, comes at a cost. The optimal value S = 20 was selected in Example B.6 in order to achieve a desired connectivity in the network. With S = 200, the probability p′ reduces from 0.404 to about 0.045, and each node is expected to have only about 2 direct neighbours. As a result, the network is likely to remain disconnected with high probability.
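A quick computation (a Python fragment of our own) makes the collapse in connectivity concrete:

```python
from math import comb

S, s, n_nbrs = 200, 3, 50   # pool size, shares per ring, physical neighbours
p_local = 1 - comb(S - s, s) / comb(S, s)   # Equation (B.7)
print(round(n_nbrs * p_local, 1))           # expected direct neighbours: 2.2
```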

The grid-based key predistribution allocates polynomial shares cleverly to the nodes so as to achieve resilience figures of the last example with a reasonable guarantee that the resulting network remains connected. Let n be the size of the network and take σ = ⌈√n⌉. For the sake of simplicity, let us assume that n = σ². The n nodes are then placed on a σ × σ square grid. The node at the (i, j)-th grid location (where 0 ≤ i, j ≤ σ – 1) is identified by the pair (i, j). The set-up server generates 2σ random symmetric bivariate polynomials f_i^r(X, Y) and f_i^c(X, Y), 0 ≤ i ≤ σ – 1, each of degree t in both X and Y. The polynomial f_i^r corresponds to the i-th row and the polynomial f_j^c to the j-th column in the grid. The key ring of the node at location (i, j) in the grid is given the two polynomial shares f_i^r(j, Y) and f_j^c(i, Y). The memory required for this is equivalent to the storage for 2(t + 1) symmetric keys.

Now, look at the key establishment phase. Let two nodes u, v with IDs (i, j) and (i′, j′) be physical neighbours after deployment. First, consider the simple case i = i′. Both the nodes have shares of the row polynomial f_i^r and can arrive at the common secret value f_i^r(j, j′) using the column identities of one another. Similarly, if j = j′, the nodes can compute the shared secret f_j^c(i, i′). It follows that each node can establish keys directly with 2(σ – 1) other nodes in the network. That is, however, a tiny fraction of the entire network.

Assume now that i ≠ i′ and j ≠ j′. If the node w with identity either (i, j′) or (i′, j) is in the physical neighbourhood of both u and v, then there is a secure link between u and w, and also one between w and v. The nodes u and v can then establish a path key via the intermediate node w.

So suppose also that neither (i, j′) nor (i′, j) resides in the communication ranges of both u and v. Consider the nodes w1 := (i, k) and w2 := (i′, k) for some k ≠ j, j′. Suppose further that w1 is in the physical neighbourhood of u, w2 in that of w1 and v in that of w2. But then there is a secure u, v-path comprising the links uw1, w1w2 and w2v. Similarly, the nodes (k, j) and (k, j′) for each k ≠ i, i′ can help u and v establish a path key. To sum up, there are 2(σ – 2) potential three-hop paths between u and v.
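These candidate paths can be enumerated mechanically; the following Python helper is our own illustrative sketch, not part of the original scheme description:

```python
def key_path_candidates(u, v, sigma):
    # u = (i, j) and v = (i2, j2), assumed to satisfy i != i2 and j != j2
    (i, j), (i2, j2) = u, v
    # two-hop paths: one intermediary sharing a row/column with each end-point
    two_hop = [(i, j2), (i2, j)]
    # three-hop paths: (i,k)-(i2,k) for k != j, j2, and (k,j)-(k,j2) for k != i, i2
    three_hop = [((i, k), (i2, k)) for k in range(sigma) if k not in (j, j2)]
    three_hop += [((k, j), (k, j2)) for k in range(sigma) if k not in (i, i2)]
    return two_hop, three_hop

two, three = key_path_candidates((0, 1), (2, 3), sigma=100)
print(len(three))   # 2 * (sigma - 2) = 196
```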

If all these three-hop paths fail, one may go for four-hop, five-hop, . . . paths, but at the cost of increased communication overhead. As argued in Liu and Ning [181, 183], exploring paths with ≤ 3 hops is expected to give the network high connectivity.

For the grid-based scheme, we have S = 2σ (the key pool size) and s = 2 (the number of polynomial shares in each node’s key ring). Thus, the probability Pe can now be derived like Equations (B.8) and (B.9) as

Pe = 1 – (p0 + p1 + · · · + pt) = pt+1 + pt+2 + · · · + pc,

where pr = C(c, r)(1/σ)^r (1 – 1/σ)^(c–r), since a share of a fixed polynomial is present in a captured key ring with probability s/S = 2/(2σ) = 1/σ.

Example B.8.

Take n = 10,000 and m = 150. Since each node has to store only two polynomial shares, we now take t = 74. Moreover, σ = 100, that is, the size of the polynomial pool is S = 200. The probability Pe can now be tabulated as a function of c (number of nodes captured) as follows:

c     1000         2000         3000         4000        5000        6000    7000
Pe    2.45×10^–40  1.99×10^–21  2.68×10^–12  4.35×10^–7  5.41×10^–4  0.0334  0.290

This is an impressive performance. The capture of even 60 per cent of the nodes leads to a compromise of only 3.34 per cent of the communication among uncaptured nodes.

This robustness of the grid-based distribution comes at a cost, though. The path key establishment stage is communication-intensive and is mandatory for ensuring good connectivity. Moreover, this stage is based on the assumption that not many nodes are captured during bootstrapping. If this assumption cannot be enforced, the scheme forfeits much of its expected resilience guarantees.

B.7. Matrix-based Key Predistribution

The matrix-based key predistribution scheme is derived from the idea proposed by Blom [25]. It is similar to the polynomial-based key predistribution and employs symmetric matrices (in place of symmetric polynomials). Let 𝔽q be a finite field with q just large enough to accommodate a symmetric key, and let G be a t × n matrix over 𝔽q, where t is determined by the memory of a sensor node and n is the number of nodes in the network. The matrix G need not be kept secret; anybody, even the enemies, may know G. We only require G to have rank t, that is, any t columns of G must be linearly independent. If g is a primitive element of 𝔽q, the following matrix is recommended.

Equation B.10

G := (gij), where gij := g^(ij) for 1 ≤ i ≤ t, 1 ≤ j ≤ n,

so that the j-th column of G is (g^j, g^(2j), . . . , g^(tj)).


In a memory-starved environment, this G has a compact representation, since its j-th column is uniquely identified by the value g^j. The remaining elements in the column can be easily computed by performing a few multiplications.

Let D be a secret t × t symmetric matrix, and A the n × t matrix defined by:

A := (DG)^T = G^T D^T = G^T D.

Finally, define the n × n matrix

K := AG.

It follows that K = AG = (G^T D)G = G^T DG, and so K^T = G^T D^T G = G^T DG = K, that is, K is a symmetric matrix. If the (i, j)-th element of K is denoted by kij, we have kij = kji, and this common value can be used as a pairwise key between the i-th and j-th nodes.

Let the (i, j)-th element of A be denoted by aij for 1 ≤ i ≤ n and 1 ≤ j ≤ t. Also let gij, 1 ≤ i ≤ t and 1 ≤ j ≤ n, denote the (i, j)-th element of G. But then the pairwise key kij = kji is expressed as:

kij = ai1 g1j + ai2 g2j + · · · + ait gtj.

Thus, the i-th row of A and the j-th column of G suffice for the i-th node to compute kij. Similarly, the j-th row of A and the i-th column of G allow the j-th node to compute kji. In view of this, every node, say, the i-th node, is required to store the i-th row of A and the i-th column of G. If G is as in Equation (B.10), only g^i needs to be stored instead of the full i-th column of G. Thus, the storage of t + 1 elements of 𝔽q (equivalent to t + 1 symmetric keys) suffices.
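The entire construction is small enough to express directly. The following Python sketch is ours, with toy parameters (the field is far too small for real keys); it builds G as in Equation (B.10), a random symmetric D, and checks that the two end-points derive the same key:

```python
import random

q = 8191                 # a small prime; the field here is just integers mod q
t, n = 3, 8              # security threshold and number of nodes
g = 17                   # assumed to generate distinct powers g^1, ..., g^n mod q

# G as in Equation (B.10): (i, j) entry is g^(i*j), 1 <= i <= t, 1 <= j <= n
G = [[pow(g, (i + 1) * (j + 1), q) for j in range(n)] for i in range(t)]

# D: the secret symmetric t x t matrix held only by the set-up server
D = [[0] * t for _ in range(t)]
for i in range(t):
    for j in range(i, t):
        D[i][j] = D[j][i] = random.randrange(q)

# A = G^T D (n x t); node i is preloaded with row i of A (plus the value g^(i+1))
A = [[sum(G[r][i] * D[r][c] for r in range(t)) % q for c in range(t)]
     for i in range(n)]

def pairwise_key(i, j):
    # k_ij = sum_r a_ir * g_rj: node i's private row times node j's public column
    return sum(A[i][r] * G[r][j] for r in range(t)) % q

assert pairwise_key(2, 5) == pairwise_key(5, 2)   # k_ij = k_ji
```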

During direct key establishment, two physical neighbours exchange their respective columns of G for the computation of the common key. Since G is allowed to be a public knowledge, this communication does not reveal secret information to the adversary.

Suppose that the adversary gains knowledge of some t′ ≥ t rows of A (say, by capturing nodes). We also assume that the matrix G is completely known to the adversary. The adversary picks any t known rows of A and constructs a t × t matrix A′ comprising these rows. But then A′ = (G′)^T D, where G′ is the corresponding t × t submatrix of G. Since G is assumed to be of rank t, G′ is invertible, and so the secret matrix D = ((G′)^T)^(–1) A′ can be easily computed. Conversely, if D is known to the adversary, she can compute A and, in particular, any t′ ≥ t rows of A.

If only t′ < t rows are known to the adversary, then any choice of values for t – t′ additional rows of A yields a consistent value of the matrix D, from which the remaining rows of A can then be constructed. In other words, D cannot be uniquely recovered from a knowledge of fewer than t rows of A. This task is infeasible too, since there is an astronomical number of choices for assigning values to the elements of the t – t′ unknown rows of A.

To sum up, the matrix-based key predistribution scheme is completely secure if fewer than t nodes are captured. On the other hand, if t or more nodes are captured, then the system is completely compromised. Thus, the resilience of this scheme against node capture is determined solely by t and is independent of the size n of the network. The parameter t, in turn, is restricted by the memory of a sensor node (a node has to store t + 1 elements of 𝔽q).

In order to overcome this difficulty, Du et al. [79] propose a matrix-pool-based scheme. Here, S matrices A1, A2, . . . , AS are computed from S pairwise different secret matrices D1, D2, . . . , DS. The same G may be used for all these key spaces. Each node is given shares (that is, rows) of s matrices randomly chosen from the pool {A1, A2, . . . , AS}. The resulting details of the matrix-pool-based scheme are quite analogous to those pertaining to the polynomial-pool-based scheme described in the earlier section, and are omitted here.

B.8. Location-aware Key Predistribution

The key predistribution algorithms discussed so far are based on a random deployment model. In practice, the deployment model (like the expected location of each node and the overall geometry of the deployment area) may be known a priori. This knowledge can be effectively exploited to tune the key predistribution algorithms so as to achieve better connectivity and higher resilience against node capture. As an example, consider sensor nodes deployed from airplanes in groups or scattered uniformly from trucks. Since the approximate tracks of these vehicles are planned a priori, the key rings of the nodes can be loaded appropriately to achieve the expected performance enhancements.

Two nodes that are in the physical neighbourhoods of one another need only share a pairwise key. Therefore, the basic objective of designing location-aware schemes is to predistribute keys in such a way that two nodes that are expected to remain close in the deployment area are given common pairwise keys, whereas two nodes that are expected to be far away after deployment need not share any pairwise key. The actual deployment locations of the nodes cannot usually be predicted accurately. Nonetheless, an approximate knowledge of the locations can boost the performance of the network considerably. The smaller the errors between the expected and actual locations of the nodes are, the better a location-aware scheme is expected to perform.

B.8.1. Closest Pairwise Keys Scheme

Liu and Ning [182] propose a modification of the random pairwise key scheme (Section B.5) based on deployment knowledge. Let there be n sensor nodes in the network with each node capable of storing m cryptographic keys. The expected deployment location of each node is provided to the key set-up server. For each node u in the network, the server determines m other nodes whose expected locations of deployment are closest to that of u and for which pairwise keys with u have not already been established. For every such node v, a new random key kuv is generated. The key-plus-ID combination (kuv, v) is loaded in u’s key ring, whereas the pair (kuv, u) is loaded in v’s key ring.

This natural and simple-minded strategy provides complete security against node captures, as it is a pairwise key distribution scheme. Now, there is no limitation on the maximum supportable network size (under the reasonable assumption that there are far fewer than 2^l nodes in the network, where l is the bit length of a cryptographic key, say, 64 or 128). Moreover, the incorporation of deployment knowledge increases the connectivity of the network. In order to analyse this gain, we first introduce some formal notation.

For the sake of simplicity, we assume that the deployment region is two-dimensional, so that every point in that region is expressed by two coordinates x and y. Let u be a sensor node whose expected deployment location is (ux, uy) and whose actual deployment location is (u′x, u′y). This corresponds to a deployment error of eu = √((u′x – ux)² + (u′y – uy)²). The actual location (or equivalently the error eu) is modelled as a continuous random variable that can assume values in ℝ². The probability density function fu of (u′x, u′y) characterizes the pattern of deployment error. One possibility is to assume that (u′x, u′y) is uniformly distributed within a circle with centre at (ux, uy) and of radius ∊, called the maximum deployment error. We then have:

Equation B.11

fu(x, y) = 1/(π∊²) if (x – ux)² + (y – uy)² ≤ ∊², and fu(x, y) = 0 otherwise.


An arguably more realistic strategy is to model (u′x, u′y) as a random variable following the two-dimensional normal (Gaussian) distribution with mean (ux, uy) and variance σ². The corresponding density function is:

fu(x, y) = (1/(2πσ²)) e^(–((x – ux)² + (y – uy)²)/(2σ²)).

Let u and v be two deployed nodes. We assume that each node has a communication range of ρ. We also make the simplifying assumption that the different nodes are deployed independently, that is, (u′x, u′y) and (v′x, v′y) are independent random variables. The probability that u and v lie in the communication ranges of one another can be expressed as a function of the expected locations (ux, uy) and (vx, vy) as:

p(u, v) = ∫∫∫∫_C fu(x1, y1) fv(x2, y2) dx1 dy1 dx2 dy2.

Here, the integral is over the region C of ℝ⁴ defined by (x1 – x2)² + (y1 – y2)² ≤ ρ².
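This four-dimensional integral rarely has a closed form, but it is straightforward to estimate by simulation. The Monte Carlo sketch below (our own, under the uniform-error model of Equation (B.11)) samples actual locations and counts how often the two nodes land within range ρ:

```python
import random

def estimate_p(u, v, eps, rho, trials=10_000):
    # u, v: expected locations; eps: maximum deployment error; rho: radio range
    def actual(x, y):
        # rejection-sample a point uniformly in the disc of radius eps
        while True:
            dx, dy = random.uniform(-eps, eps), random.uniform(-eps, eps)
            if dx * dx + dy * dy <= eps * eps:
                return x + dx, y + dy
    hits = 0
    for _ in range(trials):
        (ax, ay), (bx, by) = actual(*u), actual(*v)
        if (ax - bx) ** 2 + (ay - by) ** 2 <= rho ** 2:
            hits += 1
    return hits / trials
```

With coincident expected locations and ∊ much smaller than ρ, the estimate is 1, consistent with the observation below that p′ ≈ 1 for small deployment errors.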

Let n′ denote the number of physical neighbours of u (or of any sensor node). We know that u shares pairwise keys with exactly m nodes. We assume that these key neighbours of u are distributed uniformly in a circle centred at u and of radius ρ′. The expected value of ρ′ is:

ρ′ = ρ √(m/n′),

since, at the uniform node density n′/(πρ²), the m closest nodes are expected to occupy a disc of area mπρ²/n′.

Let v be a key neighbour of u. The probability that v lies in the physical neighbourhood of u is given by

p(u) = (1/(πρ′²)) ∫∫_{C′} p(u, v) dvx dvy,

where C′ is the region (vx – ux)² + (vy – uy)² ≤ ρ′². Therefore, u is expected to have m × p(u) direct neighbours. Since the size of the physical neighbourhood of u is n′, the local connectivity, that is, the probability that u can establish a pairwise key with a physical neighbour, is given by

p′ = m p(u)/n′.

In general, it is difficult to compute the above integrals. Liu and Ning [182] compute the probability p′ for the density function given by Equation (B.11) and establish that p′ ≈ 1 for small deployment errors, namely ∊ ≤ ρ. As ∊ increases, p′ gradually reduces to the corresponding probability for the random pairwise scheme.

In order to add sensor nodes at a later point of time, the key set-up server again uses deployment knowledge. The key rings of the new nodes are loaded based on the expected deployment locations of these nodes and on the (expected or known) locations of the deployed nodes. Pairwise keys between the new and the deployed nodes are communicated to the deployed nodes over secure channels (routing through uncompromised nodes).

B.8.2. Location-aware Polynomial-pool-based Scheme

Several variants of the closest pairwise keys scheme have been proposed. Liu and Ning themselves propose an extension based on pseudorandom functions [182]. Du et al. propose a variant of the basic (EG) scheme based on a specific model of deployment [80]. We end this section by briefly outlining a location-aware adaptation of the polynomial-pool-based scheme (Section B.6).

For simplicity, let us assume that the deployment region is a rectangular area. This region is partitioned into a 2-dimensional array of rectangular cells. Let the partition consist of R rows and C columns. The cell located at the i-th row and the j-th column is denoted by Ci,j. The neighbours of the cell Ci,j are taken to be the four adjacent cells: Ci–1,j, Ci+1,j, Ci,j–1, Ci,j+1.

The key set-up server first decides on a finite field 𝔽q with q just big enough to accommodate a cryptographic key. The server also chooses R × C random symmetric bivariate polynomials fi,j(X, Y), 1 ≤ i ≤ R, 1 ≤ j ≤ C. The polynomial fi,j is meant for the cell Ci,j. The degree t (in both X and Y) of each fi,j is so chosen that each sensor node has sufficient memory to store the shares of five such polynomials.

Let u be a node to be deployed and let the expected deployment location of u lie in the cell Ci,j called the home cell of u. The key ring of u is loaded with the shares (evaluated at u) of the five polynomials corresponding to the home cell and its four neighbouring cells. More precisely, u gets the five shares: fi,j(X, u), fi–1,j(X, u), fi+1,j(X, u), fi,j–1(X, u), and fi,j+1(X, u). The set-up server also stores in u’s memory the ID (i, j) of its home cell.

In the direct key establishment phase, each node u broadcasts the ID (i, j) of its home cell (or some messages encrypted by potential pairwise keys). Those physical neighbours whose home cells are either the same as or neighbouring to that of u can establish pairwise keys with u.
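As a toy illustration (function names are ours, not from the text), direct key establishment amounts to intersecting the sets of polynomial IDs loaded from the two home cells and their neighbours:

```python
def loaded_polys(i, j):
    # polynomial IDs placed in the key ring of a node with home cell C(i, j):
    # the home cell plus its four adjacent cells
    return {(i, j), (i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)}

def can_pair(cell_u, cell_v):
    # direct key establishment succeeds iff the nodes share some polynomial
    return bool(loaded_polys(*cell_u) & loaded_polys(*cell_v))

print(can_pair((5, 5), (5, 6)))   # neighbouring home cells -> True
print(can_pair((5, 5), (8, 5)))   # distant home cells -> False
```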

An analysis of the performance of this location-aware poly-pool-based scheme can be carried out along similar lines to the closest pairwise scheme. We leave out the details here and refer the reader to Liu and Ning [182].

C. Complexity Theory and Cryptography

C.1 Introduction
C.2 Provably Difficult Computational Problems Are not Suitable
C.3 One-way Functions and the Complexity Class UP

. . . complexity turns out to be most elusive precisely where it would be most welcome.

—C. H. Papadimitriou [229]

Real knowledge is to know the extent of one’s ignorance.

—Confucius

The complex develops out of the simple.

—Colin Wilson

C.1. Introduction

It is worthwhile to ask the question why public-key cryptography must be based on problems that are only believed to be difficult. Complexity theory suggests concrete examples of provably intractable problems. This appendix provides a brief conceptual explanation why these provably difficult problems cannot be used for building cryptographic protocols. We may consequently conclude that at present we cannot prove a public-key cryptosystem to be secure. That is bad news, but we have to live with it.

Here, we make no attempt to furnish definitions of formal complexity classes. The excellent books by Papadimitriou [229] and by Sipser [280] can be consulted for that purpose. Here is a list of the complexity classes that we require for our discussion. The relationships between these classes are depicted in Figure C.1. All the containments shown in this figure are conjectured to be proper. With an abuse of notation, we identify functional problems with decision problems.

Table C.1. Some complexity classes
Class       Brief description
P           Languages accepted by deterministic polynomial-time Turing machines
NP          Languages accepted by non-deterministic polynomial-time Turing machines
coNP        Complements of languages in NP
UP          Languages accepted by unambiguous polynomial-time Turing machines
PSPACE      Languages accepted by polynomial-space Turing machines
EXPTIME     Languages accepted by deterministic exponential-time Turing machines
EXPSPACE    Languages accepted by exponential-space Turing machines

Figure C.1. Relations between complexity classes


C.2. Provably Difficult Computational Problems Are not Suitable

The P =? NP problem, arguably the deepest unsolved problem in theoretical computer science, may be suspected to have some bearing on public-key cryptography. Under the assumption that P ≠ NP, one may feel tempted to use NP-complete problems for building secure cryptosystems. Unfortunately, this tempting invitation does not prove fruitful. Several cryptosystems based on NP-complete problems have been broken, and that is not really a surprise.

It may be the case that P = NP, and, if so, all NP-complete problems are solvable in polynomial time. It may, therefore, seem advisable to select problems that lie outside NP, that is, in strictly bigger complexity classes. By the time and space hierarchy theorems, we have P ⊊ EXPTIME and PSPACE ⊊ EXPSPACE. Both EXPTIME and EXPSPACE have complete problems. An EXPTIME-complete problem cannot be solved in polynomial time, whereas an EXPSPACE-complete problem cannot be solved in polynomial space (and hence not in polynomial time either). How about using these complete problems for designing cryptosystems? The idea may sound interesting, but these provably exponential problems turn out to be even poorer candidates, perhaps irrelevant, for use in cryptography.

Let fe and fd be the encryption and decryption transforms for a public-key cryptosystem. We assume that the set of plaintext messages and the set of ciphertext messages are both finite. (Public-key cryptosystems are like block ciphers in this respect.) Moreover, since a ciphertext c = fe(m, e) is computable in polynomial time, the length of c is bounded by a polynomial in the length of m. An intruder can non-deterministically guess messages m (from the finite space) and check if c = fe(m, e) to validate the correctness of the guess. It, therefore, follows that deciphering a ciphertext message (with no additional information) is a problem in NP. That is the reason why we should not look beyond NP.

However, the full class NP, in particular, the most difficult (that is, complete) problems of NP, may be irrelevant for cryptography, as we argue in the next section. In other words, for building cryptosystems we expect to effectively exploit problems that are believed to be easier than NP-complete ones. Both the integer factoring and the discrete log problems are in the class NP ∩ coNP. We have P ⊆ NP ∩ coNP, and it is widely believed that this containment is proper. Also, NP ∩ coNP is not known (nor expected) to have complete problems. Even if P ≠ NP ∩ coNP, the factoring and discrete log problems need not be outside P, since we are unlikely to produce completeness proofs for them. Only historical evidence exists in favour of the belief that these two problems are difficult. The situation may change tomorrow. Complexity theory does not offer any formal protection.

Exercise Set C.2

C.1 Prove that the primality testing problem

PRIME := {n ∈ ℕ | n is prime}

is in NP ∩ coNP.

(Remark: The AKS algorithm is a deterministic poly-time primality testing algorithm and therefore PRIME is in P and so trivially in NP ∩ coNP too. It can, however, be independently proved that primes have succinct certificates.)

C.2 Consider the decision version of the integer factorization problem:

DIFP := {(n, k) | n has a divisor d with 1 < d ≤ k < n}.

  1. Prove that DIFP ∈ NP ∩ coNP.

  2. Given a poly-time algorithm for DIFP, design a poly-time algorithm that factors an integer (that is, that solves the functional problem IFP).

C.3 Let G be a finite cyclic multiplicative group with a generator g. Assume that one can compute products in G in polynomial time. Consider the decision version of the discrete log problem in G:

DDLP := {(a, k) | a ∈ G and indg a ≤ k}.

Here, indices (indg a) are assumed to lie between 0 and (#G) – 1.

  1. Prove that DDLP ∈ NP ∩ coNP.

  2. Given a poly-time algorithm for DDLP, design a poly-time algorithm that computes indices in G (that is, that solves the functional problem DLP in G).

C.3. One-way Functions and the Complexity Class UP

Any public-key encryption behaves like a one-way function, easy to compute but difficult to invert.

Definition C.1.

Let Σ be an alphabet (a finite set of symbols). One may assume, without loss of generality, that Σ = {0, 1}. Let Σ* denote the set of all strings over Σ. A function f : Σ* → Σ* is called a one-way function, if it satisfies the following properties.

  1. f must be injective, that is, for every β ∈ Σ*, the inverse f–1(β), if existent, is unique.

  2. For some real constant k > 0, we have |α|^(1/k) ≤ |f(α)| ≤ |α|^k for all α ∈ Σ*. (Here, |α| denotes the length of a string α ∈ Σ*.)

  3. f can be computed in deterministic polynomial time, that is, f ∈ P (recall that we identify functional problems with decision problems).

  4. f–1 must not be computable in polynomial time[1], that is, f–1 ∉ P. In view of Property (2), we have f–1 ∈ NP. So we require f–1 ∈ NP \ P.

    [1] A stronger (but essential) requirement is that f–1 must not be computable by polynomial-time probabilistic algorithms.

Property (1) ensures unique decryption. Property (2) implies that the length of f(α) is polynomially bounded both above and below by the length of α. Property (3) suggests ease of encryption, whereas Property (4) suggests difficulty of decryption.

We do not know whether there exists a one-way function. The following functions are strongly suspected to be one-way. However, we do not seem to have any clues about how we can prove these functions to be one-way.

Example C.1.
  1. The function that multiplies two primes p, q with p < q is believed to be one-way. Computing its inverse is the RSA integer factoring problem.

  2. The discrete exponentiation function in a finite field 𝔽q, that maps x, 0 ≤ x ≤ q – 2, to g^x for some fixed primitive element g of 𝔽q, is suspected to be one-way. Its inverse is the discrete logarithm function.

  3. The RSA encryption function m ↦ m^e (mod n) for some fixed parameters n, e is alleged to be one-way. Its inverse is RSA decryption.
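The asymmetry in these examples can be felt even at toy sizes: the forward map of Example C.1(2) is a few modular squarings, while the obvious inversion strategy enumerates exponents. The Python fragment below is our own sketch with toy parameters:

```python
p = 2**61 - 1                     # a Mersenne prime (toy modulus)
g = 3                             # assumed base

x = 20                            # the "secret" exponent (kept tiny here)
y = pow(g, x, p)                  # forward direction: fast modular exponentiation

def discrete_log_by_search(g, y, p, bound):
    # the naive inverse: exhaustive search, exponential in the bit length of x
    acc = 1
    for e in range(bound):
        if acc == y:
            return e
        acc = acc * g % p
    return None

print(discrete_log_by_search(g, y, p, 1000))  # 20
```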

It is evident that if P = NP, there cannot exist one-way functions. The converse of this is not true, that is, even if P ≠ NP, there may exist no one-way functions.

Definition C.2.

A non-deterministic Turing machine which has at most one accepting branch of computation for every input string is called an unambiguous Turing machine. The class of languages accepted by poly-time unambiguous Turing machines is denoted by UP.

Clearly, P ⊆ UP ⊆ NP. Both containments are believed to be proper. The importance of the class UP stems from the following result:

Theorem C.1.

There exists a one-way function if and only if P ≠ UP.

Therefore, the P =? UP question is relevant for cryptography, and not the P =? NP question. The class UP is not known (nor expected) to have complete problems. So locating a one-way function may be a difficult task. But at the minimum we are now on the right track.[2] Complexity theory helped us shift our attention from NP (or bigger classes) to UP.

[2] Well, hopefully!

In order to use a one-way function f for cryptographic purposes, we require additional properties of f. Computing f–1 must be difficult for an intruder, whereas the same computation ought to be easy to the legitimate recipient. Thus, f must support poly-time inversion, provided that some secret piece of information (the trapdoor) is available during the computation of the inverse. A one-way function with a trapdoor is called a trapdoor one-way function.

The first two functions of Example C.1 do not have obvious trapdoors and so cannot be straightaway used for designing cryptosystems. The third function (RSA encryption) has the requisite trapdoor, namely, the decryption exponent d satisfying ed ≡ 1 (mod φ(n)).
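A toy numerical illustration (the classic textbook-sized parameters below are ours and far too small for real security): with the trapdoor d, inverting the RSA function is as easy as applying it.

```python
p, q = 61, 53                  # toy primes
n = p * q                      # 3233
phi = (p - 1) * (q - 1)        # 3120
e = 17                         # public exponent, gcd(e, phi) = 1
d = pow(e, -1, phi)            # the trapdoor: d with ed ≡ 1 (mod phi)

m = 65
c = pow(m, e, n)               # encryption: easy for everyone
assert pow(c, d, n) == m       # decryption: easy only with the trapdoor d
```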

The hunt for a theoretical foundation does not end here. It begins. Most of complexity theory deals with the worst-case complexities of problems, rather than their average or expected complexities. A one-way function, even if existent, may be difficult to invert for only a few instances, whereas cryptography demands the inversion problem to be difficult for most instances. A function meeting even this cryptographic demand need not be suitable, since there may be reductions that map hard instances to easy instances. Moreover, the trapdoors themselves may inject vulnerabilities and prepare room for quick attacks.

There still remains a long way to go!

Exercise Set C.3

C.4 Let f : Σ* → Σ* be a function with the property that f(f(α)) = f(α) for every α ∈ Σ*. Argue that f is not a one-way function.
C.5 Design unambiguous polynomial time Turing machines for computing the inverses of the functions described in Example C.1.
C.6 Show that if there exists a bijective one-way function, then NP ∩ coNP ≠ P. [H]

D. Hints to Selected Exercises

The greatest thing in family life is to take a hint when a hint is intended and not to take a hint when a hint isn’t intended.

—Robert Frost

Teachers open the door, but you must enter by yourself.

—Chinese Proverb

Imagination grows by exercise, and contrary to common belief, is more powerful in the mature than in the young.

—W. Somerset Maugham

2.11 (a)Apply Theorem 2.3 to the restriction to H of the canonical homomorphism GG/K.
2.11 (b)Apply Theorem 2.3 to the canonical homomorphism G/HG/K, aHaK, .
2.14 (c)Consider the canonical surjection GG/H.
2.17 (a)Let ij and . Then ord g divides both and and so is equal to 1, that is, g = e. Now let hi, and with . But then . Thus #(HiHj) = (#Hi)(#Hj). Generalize this argument to show that #(H1 · · · Hr) = n.
2.18First consider the special case #G = pr for some and . For each , the order ordG g is of the form psg for some sgr. Let s be the maximum of the values sg, . Take any element with ordG h = ps. Then e, h, . . . , hps–1 are all the elements x that satisfy xps = e. But by the choice of s every element satisfies xps = e. Hence we must have s = r. This proves the assertion for the special case. For the general case, use this special case in conjunction with Exercise 2.17.
2.19 (b) Show that H1 × · · · × Hr → G, (h1, . . . , hr) ↦ h1 · · · hr, is a group isomorphism.
2.23Use Zorn’s lemma.
2.24 (c)Let be the intersection of all prime ideals of R. First show that . To prove the reverse inclusion take and consider the set S of all non-unit ideals of R such that for all . If f is a non-unit, the set S is non-empty and by Zorn’s lemma has a maximal element, say . Show that is a prime ideal of R.
2.25For , the map RR, bab, is injective and hence surjective by Exercise 2.4.
2.30Apply the isomorphism theorem to the canonical surjection , .
2.33[(1)⇒(2)] Let be an ascending chain of ideals of R. Consider the ideal which is finitely generated by hypothesis.

[(3)⇒(1)] Let be an ideal of R. Consider the set of all finitely generated ideals of R contained in .

2.36 Use the pigeon-hole principle: If there are n + 1 pigeons in n holes, then there exists at least one hole containing more than one pigeon.
2.37 Consider the integer t satisfying 2^t ≤ n < 2^(t+1).
2.39 (e) 1² ≡ (n – 1)² (mod n).
2.39 (f)Apply Wilson’s theorem.
2.40Use Fermat’s little theorem.
2.41Use Wilson’s theorem or Euler’s criterion.
2.45Reduce to the case y2 ≡ α (mod p).
2.49 (a)Consider the canonical group homomorphism and the fact that a surjective group homomorphism from a cyclic group G onto G′ implies that G′ is cyclic.
2.49 (b)Let be a primitive element modulo p. The residue class of a in has order k(p – 1) for some . Show that the order of b := p + 1 modulo pe is pe–1. So the order of akb modulo pe is pe–1(p – 1) = φ(pe).
2.50Use the Chinese remainder theorem in conjunction with Exercises 2.20 and 2.49.
2.53Take . The interpolating polynomial is . Use Exercise 2.52 to establish the uniqueness.
2.56 (b) is irreducible in if and only if f(X + 1) is irreducible in .
2.58Use the fundamental theorem of algebra.
2.63Consider the set of all linearly independent subsets of V that contain T. Show that every chain in has an upper bound in . By Zorn’s Lemma, there exists a maximal element . Show that S generates V.
2.64 (b)Use Exercise 2.63.
2.68Let p1, . . . , pn be n distinct primes. Take and ai := a/pi for i = 1, . . . , n.
2.72 (a)If N is the -submodule of generated by ai/bi, i = 1, . . . , n, with gcd(ai, bi) = 1, then for any prime p that does not divide b1 · · · bn we have 1/pN.
2.72 (b)Any two distinct elements of are linearly dependent over . Now use Exercise 2.69.
2.74 (b)Let the conjugates of over F be α1 = α, α2, . . . , αn. Since is injective, it follows from (a) that makes a permutation of α1, . . . , αn. So is surjective.
2.75 (a)Use Exercise 2.61.
2.76 (b)The if part follows from Exercise 2.61. For proving the only if part, take . If the polynomial f(X) := Xpa splits over F, we are done. So suppose that there exists an irreducible divisor of f(X) of degree ≥ 2. By the separability of F, there exist two distinct roots α, β of g(X). Let K := F (α, β). Show that the Frobenius map , , is an endomorphism of K. Also there exists a field isomorphism τ : F (α) → F (β) which fixes F element-wise and takes α ↦ β. But then . Since any field homomorphism is injective, α equals β, a contradiction. Thus no g(X) chosen as above can exist.
2.77 (a) Let g(X) be an irreducible polynomial with g(α) = 0 for some α ∈ K. Let β be another root of g. We show that β ∈ K. By Lemma 2.5, there is an isomorphism μ : F(α) → F(β). Clearly, K is the splitting field of f over F(α). Let K′ be the splitting field of μ*(f) over F(β). By Proposition 2.33, K ≅ K′. If γ1, . . . , γd are the roots of f, then K′ ≅ F(β, γ1, . . . , γd) = K(β). But then K ≅ K(β).
2.78 (a) Consider transcendental numbers.
2.78 (b) Let . For , we have , implying that for a, with ab. Now assume for some . Choose a rational number b with . Then , a contradiction. Thus . Similarly .
2.80 Use the binomial theorem and induction on n.
2.82 Follow the proof of Theorem 2.37.
2.90 See Example 2.18.
2.91 (b) By the fundamental theorem of Galois theory, # . Now show that are distinct -automorphisms of .
2.92 (a) Assume r > 1. We have the extensions , where is the splitting field of f over and hence over . Consider the minimal polynomial of a root of f over . Conversely, let f be reducible over . Choose an irreducible factor h of f with deg h = s < d. Now h has one (and hence all) roots in and, therefore, d | sm.
2.93 Use Corollary 2.18.
2.98 In each case, the defining polynomial is quadratic in Y, with coefficients in K[X]. If this polynomial admits a non-trivial factorization, one can reach a contradiction by considering the X-degrees of the coefficients of Y^1 and Y^0.
2.103 For simplicity, consider the case char K ≠ 2, 3. Show that the curves Y^2 + Y = X^3 and Y^2 = X^3 + X have j-invariants 0 and 1728 respectively. Finally, for j ≠ 0, 1728, exhibit a curve with j-invariant j. One must also argue that these are actually elliptic curves, that is, have non-zero discriminants.
2.111 Use Theorem 2.51.
2.112 (a) Pair a point with its opposite. This pairing fails for points of orders 1 and 2.
2.112 (c) Consider the elliptic curve E : Y^2 = X^3 + 3 over . We have , whereas X^3 + 3 is irreducible modulo 13.
2.113 (a) Every element of has a unique square root.
2.115 (a) Use Theorem 2.49 or Exercise 2.17.
2.115 (b) Use Theorem 2.50.
2.115 (c) The trace of Frobenius at q is 0 in this case. Now, use Theorem 2.50.
2.123 Factor N(G) in .
2.127 Let . For each i, write , . But then det , where , δij being the Kronecker delta.
2.128 (b) Use Part (a) and Exercise 2.126(c).
2.128 (c) Let . By Exercise 2.130, is integral over . Let be the ideal generated by in and let and be the ideals of generated respectively by and . Now, use Part (b).
2.133 (b) In a PID, non-zero prime ideals are maximal.
2.137 (a) Since and are maximal, we have , that is, a1 + a2 = 1 for some and . Now use the fact that (a1 + a2)^(e1+e2) = 1.
2.137 (b) Use the CRT.
2.138 (a) Since is invertible, for some fractional ideal .
2.140 (a) For , let constitute a complete residue system of modulo . Then also form a complete residue system of modulo .
2.142 (d) Take in Part (b).
2.143 (a) Reduce modulo 4.
2.143 (c) Let divide this gcd. Then divides 2y and . Take norms.
2.144 (b) Look at the expansion of a – 1 in base p. More precisely, let a < p^N for some N. Then –a = (p^N – a) – p^N = [(p^N – 1) – (a – 1)] – p^N.
2.152 (c) First show that .
2.153 Use unique factorization of rationals.
2.154 Show by induction on n that p^(n+1) divides a^(p^(n+1)) – a^(p^n) for all .
2.161 There exists an irreducible polynomial of every degree.
3.7 The forward implication is obvious. For the reverse implication, use Proposition 2.5.
3.18 (b) Consider the binary expansion of m.
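The binary expansion of an exponent is the basis of square-and-multiply (repeated squaring). Assuming the exercise concerns fast exponentiation, a minimal sketch of the idea (function name ours):

```python
def power_mod(a, m, n):
    """Compute a^m mod n by scanning the binary expansion of m."""
    result = 1
    base = a % n
    while m > 0:
        if m & 1:                     # current binary digit of m is 1
            result = (result * base) % n
        base = (base * base) % n      # square for the next binary digit
        m >>= 1
    return result
```

The loop performs O(log m) squarings, one multiplication per 1-bit of m.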
3.19 If n is a pseudoprime to base a and not a pseudoprime to base b, then n is not a pseudoprime to base ab.
3.20 (a) If p^2 | n for some prime p, take a with ord_n(a) = p. If n is square-free, consider a prime divisor p of n and take a with and a ≡ 1 (mod n/p).
3.20 (b) If n is an Euler pseudoprime to base a and not an Euler pseudoprime to base b, then n is not an Euler pseudoprime to base ab.
3.21 (a) Let n = p1^α1 · · · pr^αr be the prime factorization of n. For odd pi, the group of units modulo pi^αi is cyclic of order pi^(αi–1)(pi – 1) and hence contains an element of order pi – 1.
3.21 (b) ord_n(–1) = 2.
3.21 (c) Let v_p(n) ≥ 2 for some odd prime p. Construct an element a with ord_n(a) = p.
3.28 Proceed by induction on i = 1, . . . , r. For 1 ≤ i ≤ r, define νi := n1 · · · ni and let bi be a solution of the congruences bi ≡ aj (mod nj) for j = 1, . . . , i. If i < r, use the combining formula given in Section 2.5 to find bi+1 such that bi+1 ≡ bi (mod νi) and bi+1 ≡ ai+1 (mod ni+1).
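The incremental combination in this hint can be sketched as follows (a minimal illustration with pairwise coprime moduli; function names are ours):

```python
def ext_gcd(a, b):
    """Return (g, x, y) with g = gcd(a, b) and a*x + b*y = g."""
    if b == 0:
        return a, 1, 0
    g, x, y = ext_gcd(b, a % b)
    return g, y, x - (a // b) * y

def crt(residues, moduli):
    """Solve x ≡ a_i (mod n_i) incrementally, as in the induction on i.
    Returns (x, ν) with 0 <= x < ν = n_1 ... n_r."""
    b, nu = residues[0], moduli[0]
    for a, n in zip(residues[1:], moduli[1:]):
        g, u, v = ext_gcd(nu, n)        # u*nu + v*n == 1 (moduli coprime)
        # b_new ≡ b (mod nu) and b_new ≡ a (mod n)
        b = (b * v * n + a * u * nu) % (nu * n)
        nu *= n
    return b, nu
```

Each step plays the role of computing b_{i+1} from b_i via the combining formula.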
3.31 Apply Newton’s iteration to compute a zero of x^2 – n.
3.32 (a) Apply Newton’s iteration to compute a zero of x^k – n.
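Newton’s iteration for a zero of x^k – n, carried out with integer arithmetic, yields the integer k-th root (k = 2 gives the square root of Exercise 3.31). A hedged sketch, not the book’s exact algorithm:

```python
def kth_root(n, k):
    """Largest integer x with x**k <= n, via Newton's iteration on
    f(x) = x**k - n: x <- ((k-1)*x + n // x**(k-1)) // k."""
    if n < 2:
        return n
    x = 1 << (n.bit_length() // k + 1)   # initial overestimate of the root
    while True:
        y = ((k - 1) * x + n // x ** (k - 1)) // k
        if y >= x:                       # iterates stop decreasing: done
            return x
        x = y
```

Starting from an overestimate, the iterates decrease monotonically to the floor of the real k-th root.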
3.34 (b) The updating d(X) := d(X) – Xisb(X) needs to consider only the non-zero words of b.
3.36 (b) First consider b = 0 and note that the roots of X^((q–1)/2) – 1 (resp. X^((q–1)/2) + 1) are all the quadratic residues (resp. non-residues).
3.36 (c) First consider b = 0.
3.40 For , we have ord(a) | m and for each i = 1, . . . , r the multiplicity v_{pi}(ord(a)) is the smallest of the non-negative integers k satisfying .
3.41 (a) Use the CRT.
3.43 (a) Use the CRT and the fact that for an odd prime r ≡ 3 (mod 4).
4.1 (a) Using the CRT, reduce to the case that n is prime. Then the map x ↦ x^a is bijective ⇔ its restriction to the non-zero residues is bijective. Now, if gcd(a, φ(n)) = 1, the inverse of x ↦ x^a is x ↦ x^b, where ab ≡ 1 (mod φ(n)). On the other hand, if q is a prime divisor of gcd(a, φ(n)), choose an element y with ord(y) = q. But then y^a ≡ 1 (mod n), that is, the map is not injective. This exercise provides the foundation for the RSA cryptosystem.
4.1 (b) In view of the CRT, reduce to the case n = p^α for a prime p and α > 1. Then (p^(α–1))^a ≡ 0 (mod n).
4.6 Consider the integral .
4.9 Use the CRT and lifting.
4.10 For proving , let n be an odd composite integer, choose a random y and compute a square root x of y^2 modulo n. By Exercise 4.9, the probability that x ≡ ±y (mod n) is at most 1/2.
4.12 (d) Eliminate a from T(a, b, c) using a + b + c = 0. For each fixed c, allow b to vary and use a sieve to find all the values of b for which T(a, b, c) is smooth for the fixed c.
4.13 You may use the prime number theorem and the fact that the sum of the reciprocals of the first t primes grows asymptotically as ln ln t.
4.15 If a < a1 or a > am, then no such i exists. So assume that a1 ≤ a ≤ am and let d := ⌊(1 + m)/2⌋. If a = ad, return d; else if a < ad, recursively search for a among a1, . . . , ad–1, and if a > ad, recursively search for a among ad+1, . . . , am.
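The recursion in this hint is ordinary binary search. A minimal sketch using the hint’s 1-based indexing (helper name ours):

```python
def find(a, arr, lo=None, hi=None):
    """Recursive binary search in a sorted list arr (1-indexed as in the
    hint). Returns an index i with arr[i-1] == a, or None if a is absent."""
    if lo is None:
        lo, hi = 1, len(arr)
    if lo > hi or a < arr[lo - 1] or a > arr[hi - 1]:
        return None                      # a lies outside the current range
    d = (lo + hi) // 2                   # midpoint, as d := floor((lo+hi)/2)
    if a == arr[d - 1]:
        return d
    if a < arr[d - 1]:
        return find(a, arr, lo, d - 1)   # search the left half
    return find(a, arr, d + 1, hi)       # search the right half
```

Each call halves the range, so the search takes O(log m) comparisons.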
4.16 (a) Use Lagrange’s interpolation formula (Exercise 2.53).
4.18 (a) One may precompute the values σi := p rem qi, i = 1, . . . , t. Note that qi | (gα + kp) if and only if ρk,i = 0.
4.19 (a) Use the approximation T(c1, c2) ≈ (c1 + c2)H.
4.21 (c) T(a, b, c) = –b2c(x + cy)b + (zc2x).
4.21 (d) Imitate the second stage of the LSM.
4.23 Let the factor base consist of all irreducible polynomials over of degrees ≤ m together with the polynomials of the form X^k + h(X) with deg h ≤ m. The optimal running time of this algorithm corresponds to .
4.24 (b) is square-free.
4.24 (c) Use the fact that X^m – 1 = (X^(m/p^(v_p(m))) – 1)^(p^(v_p(m))).
4.24 (d) Use Theorem 2.39.
4.25 (a) Look at the roots of the polynomials on the two sides.
4.25 (c) If ord ω = m, then ord(–ω) = 2m.
4.25 (d) ω, ω^q, . . . , ω^(q^(l–1)) are all the roots of the minimal polynomial of ω over .
4.26 (b) Use the Mordell–Weil theorem.
4.26 (c) Use Theorem 4.2.
5.2 (a) Solve the simultaneous congruences x ≡ ci (mod ni), i = 1, . . . , e, and then take the integer e-th root of the solution x, 1 ≤ x ≤ n1 · · · ne.
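For small e and unpadded messages this is the classic low-exponent broadcast attack: since m^e < n1 · · · ne, the CRT solution equals m^e over the integers. A hedged sketch combining the two steps (requires Python 3.8+ for `pow(x, -1, n)`; function name ours):

```python
def broadcast_attack(cts, mods, e):
    """Recover m from c_i = m^e mod n_i for e pairwise coprime moduli n_i:
    CRT the ciphertexts, then take the integer e-th root."""
    N = 1
    for n in mods:
        N *= n
    # CRT: x ≡ c_i (mod n_i), 0 <= x < N; here x equals m^e exactly
    x = 0
    for c, n in zip(cts, mods):
        Ni = N // n
        x = (x + c * Ni * pow(Ni, -1, n)) % N
    # integer e-th root of x by Newton's iteration
    r = 1 << (x.bit_length() // e + 1)
    while True:
        s = ((e - 1) * r + x // r ** (e - 1)) // e
        if s >= r:
            return r
        r = s
```

Salting the plaintexts, as part (b) suggests, defeats this attack.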
5.2 (b) Append (different) pseudorandom bit strings to m before encryption. This process is often referred to as salting.
5.3 (a) In view of the Chinese remainder theorem, reduce to the case n = p^r for a prime p.
5.4 ue1 + ve2 = 1 for some integers u, v.
5.6 If the same session key is used to generate the ciphertext pairs (r1, s1) and (r2, s2) on two plaintext messages m1 and m2, then m1/m2 = s1/s2.
5.7 (c) Let x = (xl–1 . . . x1x0)2. Define x′ := (xl–1 . . . x2x1)2 and y′ := g^x′ (mod p). Then, y ≡ y′^2 g^x0 (mod p). Since x0 is easily computable, y′ can be obtained by computing a square root of y modulo p. Argue that a call to the oracle helps us choose the correct square root y′ of y. Now, use recursion.
5.8 Let g′ be any randomly chosen generator, where q := p^h. One computes for i = 0, 1, . . . , p – 1. We then have the equality of the sets

modulo q – 1, where l := ind_{g′} g. But then for each i we have a (yet unknown) j such that . Show that by trying all possibilities for i and j one can effectively recover l and hence g = g′^l and hence π.

5.9 Let g′ and l be as in Exercise 5.8. Now, we have the equality of the sets

modulo q – 1.

5.11 (mod β) are polynomials with small coefficients.
5.15 (a) If Alice generates the signatures (M1, s1) and (M2, s2) on two messages M1 and M2, then her signature on a message M with H(M) ≡ H(M1)H(M2) (mod n) is s1s2 (mod n). Thus, without knowing the private key of Alice, an intruder can generate a valid signature (M, s1s2) of Alice, provided that such an M can be computed. Of course, here the intruder has little control over the message M. The PKCS standards from RSA Laboratories add some redundancy to the hash function output before signing. The product of two hash values with redundancy is, in general, not expected to have the redundancy. This increases the security of the scheme against existential forgeries beyond that provided by the first pre-image resistance of the underlying hash function.
5.15 (b) For any s, a valid signature is (M, s), where H(M) ≡ s^2 (mod n).
5.15 (c) Choose random integers u, v with gcd(v, n) = 1 and take d′ := u + dv. Of course, d and hence d′ are unknown to Carol, but she can compute s ≡ g^d′ = g^u (g^d)^v and t ≡ –H(s) v^(–1) (mod n). But then (M, s, t) is a valid ElGamal signature on a message M for which H(M) ≡ tu (mod n).
5.16 Obviously, c itself could be a possible choice, but that is not random and Bob might refuse to sign c. Carol should hide c as c r^e (mod n) for some randomly chosen r known to her.
5.23 (a) By the CRT.
5.25 (a) Replace the random challenge of the verifier by the hash value of the string obtained by concatenating the message to be signed with the witness.
5.26 (d) Bob finds a random b′ and sends a := (b′)^2 (mod n) to Alice. But then Alice’s response b yields a non-trivial factor gcd(b – b′, n) of n.
7.5 (mod n) and mse (mod n).
7.9 (a) Use Exercise 2.44(b).
7.9 (c) Again use Exercise 2.44(b).
7.9 (d) Use Part (c) in conjunction with the CRT, and separately consider the three cases v2(p – 1) = v2(q – 1), v2(p – 1) > v2(q – 1) and v2(p – 1) < v2(q – 1).
A.2 for all X, J. One does not have to look at the S-boxes for proving this.
A.9 (c) For i = 0, 1, 2, 3, 4Nr, 4Nr + 1, 4Nr + 2, 4Nr + 3, take . For other values of i, take .
A.14 (b) Let DL(X) := X^d CL(1/X) = a0 + a1X + a2X^2 + · · · + ad–1X^(d–1) + X^d. Consider the quotient algebra A, where x := X + 〈DL(X)〉. The linear transformation λx : A → A defined by g(x) ↦ xg(x) has the matrix ΔL with respect to the polynomial basis (1, x, . . . , x^(d–1)). If f is the minimal polynomial of λx, then [f(λx)](1) = f(x) = 0. Now, use the fact that 1, x, . . . , x^(d–1) are linearly independent.
A.16 (b) [only if] Take σ ≠ 00 · · · 01. Since σ is non-zero, si = 1 for some i. Construct an LFSR with d – 1 stages initialized to s0s1 · · · sd–2 to generate σ.
A.19 Suppose that we want to compute a second pre-image for H2(x). If , any is a second pre-image for H2(x). If , computing a second pre-image for H2(x) is equivalent to computing a second pre-image for H(x). The density of the (finite) set S is 0 in the (infinite) set of all bit strings. Thus, H2 is second pre-image resistant. On the other hand, for any two distinct x, x′ ∈ S we have a collision (x, x′) for H2.
A.20 Collision resistance of H implies that of H3. On the other hand, for a positive fraction (half) of the (n + 1)-bit strings y, it is easy to compute a pre-image of y under H3.
A.21 If y is a square root of a modulo m, then so is m – y.
A.22 Use the birthday paradox (Exercise 2.172).
A.23 (d) Let L := F1(L′) and R := F1(R′) with both R and R′ non-zero. Then, F1(LR) = F2(L′R′).
A.25 Let h(i) denote the column vector of dimension 160 having the bits of H(i) as its elements and m(i) the column vector of dimension 512 + 160 = 672 having the bits of M(i) and of H(i) as its elements. Show that the modified design of SHA-1 leads to the relation h(i) ≡ Am(i–1) + c (mod 2) for some constant 160 × 672 matrix A and some constant vector c. So what then?
C.6 For α, β ∈ Σ*, call α ≤ β if and only if |α| < |β|, or |α| = |β| and α is lexicographically smaller than β. This ≤ produces a well-ordering of Σ*. For a one-way function f, look at the language for some with γ ≤ β}.

 

References

If you steal from one author, it’s plagiarism; if you steal from many, it’s research.

—Wilson Mizner

Literature is the question minus the answer.

—Roland Barthes

Everything that can be invented, has been invented.

—Charles H. Duell, 1899

[1] Adkins, W. A. and S. H. Weintraub (1992). Algebra: An Approach via Module Theory. Graduate Texts in Mathematics, 136. New York: Springer.

[2] Adleman, L. M., J. DeMarrais and M.-D. A. Huang (1994). “A Subexponential Algorithm for Discrete Logarithms over the Rational Subgroup of the Jacobians of Large Genus Hyperelliptic Curves over Finite Fields”, Algorithmic Number Theory—ANTS-I, Lecture Notes in Computer Science, 877. pp. 28–40. Berlin/Heidelberg: Springer.

[3] Adleman, L. M. and M.-D. A. Huang (1992). “Primality Testing and Two Dimensional Abelian Varieties over Finite Fields”, Lecture Notes in Mathematics, 1512. Berlin: Springer.

[4] Adleman, L. M., C. Pomerance and R. S. Rumely (1983). “On Distinguishing Prime Numbers from Composite Numbers”, Annals of Mathematics, 117: 173–206.

[5] Agrawal, M., N. Kayal and N. Saxena (2002), “PRIMES Is in P” [online document]. Available at http://www.cse.iitk.ac.in/users/manindra/algebra/primality_v6.pdf (October 2008).

[6] * Ahlfors, L. V. (1966). Complex Analysis. New York: McGraw-Hill.

[7] * Aho, A. V., J. E. Hopcroft and J. D. Ullman (1974). The Design and Analysis of Computer Algorithms. Reading, Massachusetts: Addison-Wesley.

[8] * Aho, A. V., J. E. Hopcroft and J. D. Ullman (1983). Data Structures and Algorithms. Reading, Massachusetts: Addison-Wesley.

[9] Aigner, M. and E. Oswald (2007), “Power Analysis Tutorial” [online document]. Available at http://www.iaik.tugraz.at/content/research/implementation_attacks/introduction_to_impa/dpa_tutorial.pdf (October 2008).

[10] Akkar, M.-L., R. Bevan, P. Dischamp and D. Moyart (2000). “Power Analysis, What Is Now Possible”, Advances in Cryptology—ASIACRYPT 2000, Lecture Notes in Computer Science, 1976. pp. 489–502. Berlin/Heidelberg: Springer.

[11] Anderson, R. and M. Kuhn (1997). “Low Cost Attacks on Tamper Resistant Devices”, Security Protocols—5th International Workshop, Lecture Notes in Computer Science, 1361. pp. 125–136. Berlin/Heidelberg: Springer.

[12] * Apostol, T. M. (1976). Introduction to Analytic Number Theory. Undergraduate Texts in Mathematics. New York: Springer.

[13] Arnold, V. I. (1999). “Polymathematics: Is Mathematics a Single Science or a Set of Arts?”, in V. Arnold, M. Atiyah, P. Lax and B. Mazur (eds.), Mathematics: Frontiers and Perspectives, pp. 403–416. Providence, Rhode Island: American Mathematical Society.

[14] Atiyah, M. F. and I. G. MacDonald (1969). Introduction to Commutative Algebra. Reading, Massachusetts: Addison-Wesley.

[15] Aumüller, C., P. Bier, W. Fischer, P. Hofreiter and J.-P. Seifert (2002), “Fault Attacks on RSA with CRT: Concrete Results and Practical Countermeasures” [online document]. Available at http://eprint.iacr.org/2002/073 (October 2008).

[16] Balasubramanian, R. and N. Koblitz (1998). “The Improbability that an Elliptic Curve has Subexponential Discrete Log Problem under the Menezes-Okamoto Vanstone Algorithm”, Journal of Cryptology, 11: 141–145.

[17] Bao, F., R. H. Deng, Y. Han, A. B. Jeng, A. D. Narasimhalu, T.-H. Ngair (1997). “Breaking Public Key Cryptosystems on Tamper Resistant Devices in the Presence of Transient Faults”, Security Protocols—5th International Workshop, Lecture Notes in Computer Science, 1361. pp. 115–124. Berlin/Heidelberg: Springer.

[18] Bellare, M. and P. Rogaway (1995). “Optimal Asymmetric Encryption—How to Encrypt with RSA”, Advances in Cryptology—EUROCRYPT ’94, Lecture Notes in Computer Science, 950. pp. 92–111. Berlin/Heidelberg: Springer. A revised version is available at http://www-cse.ucsd.edu/users/mihir/papers/oaep.html (October 2008).

[19] Bellare, M. and P. Rogaway (1996). “The Exact Security of Digital Signatures: How to Sign with RSA and Rabin”, Advances in Cryptology—EUROCRYPT ’96, Lecture Notes in Computer Science, 1070. pp. 399–416. Berlin/Heidelberg: Springer. A revised version is available at http://www-cse.ucsd.edu/users/mihir/papers/exactsigs.html (October 2008).

[20] Bennett, C. H. and G. Brassard (1984). “Quantum Cryptography: Public Key Distribution and Coin Tossing”, pp. 175–179. Proceedings of the IEEE International Conference on Computers, Systems and Signal Processing, Bangalore, India, December.

[21] Berlekamp, E. R. (1968). Algebraic Coding Theory. New York: McGraw-Hill.

[22] Biham, E. and A. Shamir (1997). “Differential Fault Analysis of Secret Key Cryptosystems”, Advances in Cryptology—CRYPTO ’97, Lecture Notes in Computer Science, 1294. pp. 513–528. Berlin/Heidelberg: Springer.

[23] Blake, I. F., R. Fuji-Hara, R. C. Mullin and S. A. Vanstone (1984). “Computing Logarithms in Finite Fields of Characteristic Two”, SIAM Journal of Algebraic and Discrete Methods, 5: 276–285.

[24] Blake, I. F., G. Seroussi and N. P. Smart (1999). Elliptic Curves in Cryptography. Cambridge: Cambridge University Press.

[25] Blom, R. (1985). “An Optimal Class of Symmetric Key Generation Systems”, Advances in Cryptology—EUROCRYPT ’84, Lecture Notes in Computer Science, 209. pp. 335–338. Berlin/Heidelberg: Springer.

[26] Blum, L., M. Blum, and M. Shub (1986). “A Simple Unpredictable Pseudo-Random Number Generator”, SIAM Journal on Computing, 15: 364–383.

[27] Blum, M. and S. Goldwasser (1985). “An Efficient Probabilistic Public Key Encryption Scheme Which Hides All Partial Information”, Advances in Cryptology—CRYPTO ’84, Lecture Notes in Computer Science, 196. pp. 289–299. Berlin/Heidelberg: Springer.

[28] Blundo, C., A. De Santis, A. Herzberg, S. Kutten, U. Vaccaro and M. Yung (1993). “Perfectly-Secure Key Distribution for Dynamic Conferences”, Advances in Cryptology—CRYPTO ’92, Lecture Notes in Computer Science, 740. pp. 471–486. Berlin/Heidelberg: Springer.

[29] Boneh, D. (1999). “Twenty Years of Attacks on the RSA Cryptosystem”, Notices of the American Mathematical Society, 46 (2): 203–213.

[30] Boneh, D., R. A. DeMillo and R. J. Lipton (1997). “On the Importance of Checking Cryptographic Protocols for Faults”, Advances in Cryptology—EUROCRYPT ’97, Lecture Notes in Computer Science, 1233. pp. 37–51. Berlin/Heidelberg: Springer.

[31] Boneh, D., R. A. DeMillo and R. J. Lipton (2001). “On the Importance of Eliminating Errors in Cryptographic Computations”, Journal of Cryptology, 14 (2): 101–119.

[32] Boneh, D. and G. Durfee (1999). “Cryptanalysis of RSA with Private Key d Less Than N0.292”, Advances in Cryptology—EUROCRYPT ’99, Lecture Notes in Computer Science, 1592. pp. 1–11. Berlin/Heidelberg: Springer.

[33] Boneh, D., G. Durfee and Y. Frankel (1998). “Exposing an RSA Private Key Given a Small Fraction of Its Bits”, Advances in Cryptology—ASIACRYPT ’98, Lecture Notes in Computer Science, 1514. pp. 25–34. Berlin/Heidelberg: Springer.

[34] Boneh, D. and M. K. Franklin (2001). “Identity-based Encryption from the Weil Pairing”, Advances in Cryptology—CRYPTO 2001, Lecture Notes in Computer Science, 2139. pp. 213–229. Berlin/Heidelberg: Springer.

[35] Boneh, D. and M. K. Franklin (2003). “Identity-based Encryption from the Weil Pairing”, SIAM Journal of Computing, (32) 3: 586–615.

[36] Bressoud, D. M. (1989). Factorization and Primality Testing. Undergraduate Texts in Mathematics. New York: Springer.

[37] * Buchmann, J. A. (2004). Introduction to Cryptography. Undergraduate Texts in Mathematics. New York: Springer.

[38] Buchmann, J. A. et al. (2004), “The Number Field Cryptography Project” [online document]. Available at http://www.informatik.tu-darmstadt.de/TI/Forschung/nfc.html (October 2008).

[39] Buchmann, J. A. and S. Hamdy (2001). “A Survey on IQ Cryptography”. Technical report TI-4/01, TU Darmstadt, Fachbereich Informatik.

[40] Buchmann, J. A. and D. Weber (2000). “Discrete Logarithms: Recent Progress”, in J. Buchmann, T. Hoeholdt, H. Stichtenoth and H. Tapia-Recillas (eds.), Coding Theory, Cryptography and Related Areas, pp. 42–56. Proceedings of an International Conference on Coding Theory, Cryptography and Related Areas, Guanajuato, Mexico, April 1998.

[41] Buhler, J., H. W. Lenstra and C. Pomerance (1993). “Factoring Integers with the Number Field Sieve”, in A. K. Lenstra and H. W. Lenstra (eds.), The Development of the Number Field Sieve, Lecture Notes in Mathematics, 1554. pp. 50–94. Berlin: Springer.

[42] * Burton, D. M. (1998). Elementary Number Theory, 4th ed. New York: McGraw-Hill.

[43] Cantor, D. G. (1994). “On the Analogue of Division Polynomials for Hyperelliptic Curves”, Journal für die reine und angewandte Mathematik, 447: 91–145.

[44] Chan, H., A. Perrig and D. Song (2003). “Random Key Predistribution Schemes for Sensor Networks”, pp. 197–213. Proceedings of the 24th IEEE Symposium on Research in Security and Privacy, Berkeley, California, 11–14 May.

[45] Chari, S., C. S. Jutla, J. R. Rao, and P. Rohatgi (1999). “Towards Sound Approaches to Counteract Power-Analysis Attacks”, Advances in Cryptology—CRYPTO ’99, Lecture Notes in Computer Science, 1666. pp. 398–412. Berlin/Heidelberg: Springer.

[46] Charlap, L. S. and R. Coley (1990). “An Elementary Introduction to Elliptic Curves II”, CCR Expository Report 34.

[47] Charlap, L. S. and D. P. Robbins (1988). “An Elementary Introduction to Elliptic Curves”, CRD Expository Report 31.

[48] Chaum, D. (1983). “Blind Signatures for Untraceable Payments”, Advances in Cryptology—CRYPTO ’82. pp. 199–203. New York: Plenum Press.

[49] Chaum, D. (1985). “Security Without Identification: Transaction System to Make Big Brother Obsolete”, Communications of the ACM, 28 (10): 1030–1044.

[50] Chaum, D. (1989). “Privacy Protected Payments: Unconditional Payer and/or Payee Untraceability”, Smart Card 2000: The Future of IC Cards, pp. 69–93. Amsterdam: North-Holland.

[51] Chaum, D. (1990). “Zero-Knowledge Undeniable Signatures”, Advances in Cryptology—CRYPTO ’90, Lecture Notes in Computer Science, 473. pp. 458–464. Berlin/Heidelberg: Springer.

[52] Chaum, D. and H. van Antwerpen (1989). “Undeniable Signatures”, Advances in Cryptology—CRYPTO ’89, Lecture Notes in Computer Science, 435. pp. 212–217. Berlin/Heidelberg: Springer.

[53] Chaum, D., E. van Heijst and B. Pfitzmann (1991). “Cryptographically Strong Undeniable Signatures, Unconditionally Secure for the Signer”, Advances in Cryptology—CRYPTO ’91, Lecture Notes in Computer Science, 576. pp. 470–484. Berlin/Heidelberg: Springer.

[54] Chor, B. and R. L. Rivest (1988). “A Knapsack Type Cryptosystem Based on Arithmetic in Finite Fields”, IEEE Transactions on Information Theory, 34: 901–909.

[55] Clavier, C., J.-S. Coron and N. Dabbous (2000). “Differential Power Analysis in the Presence of Hardware Countermeasures”, Cryptographic Hardware and Embedded Systems—CHES 2000, Lecture Notes in Computer Science, 1965. pp. 252–263. Berlin/Heidelberg: Springer.

[56] Cohen, H. (1993). A Course in Computational Algebraic Number Theory. Graduate Texts in Mathematics, 138. New York: Springer.

[57] Coppersmith, D. (1984). “Fast Evaluation of Logarithms in Fields of Characteristic Two”, IEEE Transactions on Information Theory, 30: 587–594.

[58] Coppersmith, D. (1994). “Solving Homogeneous Equations over GF[2] via Block Wiedemann Algorithm”, Mathematics of Computation, 62: 333–350.

[59] Coppersmith, D., A. M. Odlyzko and R. Schroeppel (1986). “Discrete Logarithms in GF (p)”, Algorithmica, 1: 1–15.

[60] Coppersmith, D. and S. Winograd (1982). “On the Asymptotic Complexity of Matrix Multiplication”, SIAM Journal of Computing, 11 (3): 472–492.

[61] * Cormen, T. H., C. E. Lieserson, R. L. Rivest and C. Stein (2001). Introduction to Algorithms, 2nd ed. Cambridge, Massachusetts: MIT Press.

[62] Coron, J.-S. (1999). “Resistance Against Differential Power Analysis for Elliptic Curve Cryptosystems”, Cryptographic Hardware and Embedded Systems—CHES 1999, Lecture Notes in Computer Science, 1965. pp. 292–302. Berlin/Heidelberg: Springer.

[63] Coron, J.-S., L. Goubin (2000). “On Boolean and Arithmetic Masking Against Differential Power Analysis”, Cryptographic Hardware and Embedded Systems—CHES 2000, Lecture Notes in Computer Science, 1965. pp. 231–237. Berlin/Heidelberg: Springer.

[64] Coster, M. J., A. Joux, B. A. LaMacchia, A. M. Odlyzko, C. P. Schnorr and J. Stern (1992). “Improved Low-Density Subset Sum Algorithms”, Computational Complexity, 2: 111–128.

[65] Coster, M. J., B. A. LaMacchia, A. M. Odlyzko and C. P. Schnorr (1991). “An Improved Low-Density Subset Sum Algorithm”, Advances in Cryptology—EUROCRYPT ’91, Lecture Notes in Computer Science, 547. pp. 54–67. Berlin/Heidelberg: Springer.

[66] Courtois, N. (2003). “Fast Algebraic Attacks on Stream Ciphers with Linear Feedback”, Advances in Cryptology—CRYPTO 2003, Lecture Notes in Computer Science, 2729. pp. 177–194. Berlin/Heidelberg: Springer.

[67] Courtois, N. and W. Meier (2003). “Algebraic Attacks on Stream Ciphers with Linear Feedback”, Advances in Cryptology—EUROCRYPT 2003, Lecture Notes in Computer Science, 2656. pp. 345–359. Berlin/Heidelberg: Springer.

[68] Courtois, N. and J. Pieprzyk (2003). “Cryptanalysis of Block Ciphers with Overdefined Systems of Equations”, Advances in Cryptology—ASIACRYPT 2002, Lecture Notes in Computer Science, 2501. pp. 267–287. Berlin/Heidelberg: Springer.

[69] Crandall, R. and C. Pomerance (2001). Prime Numbers: A Computational Perspective. New York: Springer.

[70] Crépeau, C. and A. Slakmon (2003). “Simple Backdoors for RSA Key Generation”, Topics in Cryptology—CT-RSA 2003, Lecture Notes in Computer Science, 2612. pp. 403–416. Berlin/Heidelberg: Springer.

[71] Daemen, J. and V. Rijmen (2002). The Design of Rijndael: AES—The Advanced Encryption Standard. New York: Springer.

[72] Das, A. (1999). Galois Field Computations: Implementation of a Library and a Study of the Discrete Logarithm Problem [dissertation]. Bangalore, India: Indian Institute of Science.

[73] Das, A. and C. E. Veni Madhavan (1999). “Performance Comparison of Linear Sieve and Cubic Sieve Algorithms for Discrete Logarithms over Prime Fields”, Algorithms and Computation, ISAAC ’99, Lecture Notes in Computer Science, 1741. pp. 295–306. Berlin/Heidelberg: Springer.

[74] * Delfs, H. and H. Knebl (2007). Introduction to Cryptography: Principles and Applications, 2nd ed. Berlin and New York: Springer.

[75] Deutsch, D. (1985). “Quantum Theory, the Church-Turing Principle and the Universal Quantum Computer”. Proceedings of the Royal Society of London, Series A, 400. pp. 97–117.

[76] Deutsch, D. (1998). The Fabric of Reality: The Science of Parallel Universes—and Its Implications. London: Penguin.

[77] Dhem, J.-F., F. Koeune, P.-A. Leroux, P. Mestré, J.-J. Quisquater and J.-L. Willems (2000). “A Practical Implementation of the Timing Attack”, in J.-J. Quisquater and B. Schneier (eds.), Smart Card: Research and Applications, Lecture Notes in Computer Science, 1820. Proceedings of the Third Working Conference on Smart Card Research and Advanced Applications—CARDIS ’98, Louvain-la-Neuve, Belgium, 14–16 September 1998. Springer.

[78] Diffie, W. and M. Hellman (1976). “New Directions in Cryptography”, IEEE Transactions on Information Theory, 22: 644–654.

[79] Du, W., J. Deng, Y. S. Han and P. K. Varshney (2003). “Establishing Pairwise Keys in Distributed Sensor Networks”, pp. 42–51. Proceedings of the 10th ACM Conference on Computer and Communication Security, Washington D.C., USA, 27–30 October.

[80] Du, W., J. Deng, Y. S. Han, S. Chen and P. K. Varshney (2004). “A Key Management Scheme for Wireless Sensor Networks Using Deployment Knowledge”. Proceedings of IEEE INFOCOM 2004, Hong Kong, 7–11 March.

[81] * Dummit, D. and R. Foote (2004). Abstract Algebra, 3rd ed. Somerset, New Jersey: John Wiley & Sons.

[82] Durfee, G. and P. Q. Nguyen (2000). “Cryptanalysis of the RSA Schemes with Short Secret Exponent from Asiacrypt ’99”, Advances in Cryptology—ASIACRYPT 2000, Lecture Notes in Computer Science, 1976. pp. 30–44. Berlin/Heidelberg: Springer.

[83] Dusart, P. (1999). “The kth Prime Is Greater than k(ln k+ln ln k–1) for k > 2”, Mathematics of Computation, 68: 411–415.

[84] ElGamal, T. (1985). “A Public-Key Cryptosystem and a Signature Scheme Based on Discrete Logarithms”, IEEE Transactions on Information Theory, 31: 469–472.

[85] Elkies, N. D. (1998). “Elliptic and Modular Curves over Finite Fields and Related Computational Issues”, AMS/IP Studies in Advanced Mathematics, 7: 21–76.

[86] Enge, A. (1999). “Computing Discrete Logarithms in High-Genus Hyperelliptic Jacobians in Provably Subexponential Time”. Technical report CORR 99-04, University of Waterloo, Canada.

[87] Enge, A. and P. Gaudry (2002). “A General Framework for Subexponential Discrete Logarithm Algorithms”, Acta Arithmetica, 102 (1): 83–103.

[88] Eschenauer, L. and V. D. Gligor (2002). “A Key-Management Scheme for Distributed Sensor Networks”. Proceedings of the 9th ACM Conference on Computer and Communication Security, pp. 41–47. Washington D.C., USA, 18–22 November.

[89] * Esmonde, J. and M. Ram Murty (1999). Problems in Algebraic Number Theory. Graduate Texts in Mathematics, 190. New York: Springer.

[90] Fiat, A. and A. Shamir (1987). “How to Prove Yourself: Practical Solutions to Identification and Signature Problems”, Advances in Cryptology—CRYPTO ’86, Lecture Notes in Computer Science, 263. pp. 186–194. Berlin/Heidelberg: Springer.

[91] Feige, U., A. Fiat, and A. Shamir (1988). “Zero-Knowledge Proofs of Identity”, Journal of Cryptology, 1: 77–94.

[92] * Feller, W. (1966). Introduction to Probability Theory and Its Applications, 3rd ed. New York: John Wiley & Sons.

[93] Ferguson, N., J. Kelsey, S. Lucks, B. Schneier, M. Stay, D. Wagner and D. Whiting (2000). “Improved Cryptanalysis of Rijndael”, Fast Software Encryption—FSE 2000, Lecture Notes in Computer Science, 1978. pp. 213–230. Berlin/Heidelberg: Springer.

[94] Fouquet, M., P. Gaudry and R. Harley (2000). “An Extension of Satoh’s Algorithm and Its Implementation”, Journal of Ramanujan Mathematical Society, 15: 281–318.

[95] Fouquet, M., P. Gaudry and R. Harley (2001). “Finding Secure Curves with the Satoh-FGH Algorithm and an Early-Abort Strategy”, Advances in Cryptology—EUROCRYPT 2001, Lecture Notes in Computer Science, 2045. Berlin/Heidelberg: Springer.

[96] * Fraleigh, J. B. (1998). A First Course in Abstract Algebra, 6th ed. Reading, Massachusetts: Addison-Wesley.

[97] Fujisaki, E., T. Kobayashi, H. Morita, H. Oguro, T. Okamoto, S. Okazaki, D. Pointcheval and S. Uchiyama (1999). “EPOC: Efficient Probabilistic Public-Key Encryption”, contribution to IEEE P1363a.

[98] Fujisaki, E., T. Okamoto, D. Pointcheval, J. Stern (2001). “RSA-OAEP is Secure under the RSA Assumption”, Advances in Cryptology—CRYPTO 2001, Lecture Notes in Computer Science, 2139. pp. 260–274. Berlin/Heidelberg: Springer.

[99] Fulton, W. (1969). Algebraic Curves. Mathematics Lecture Notes Series. New York: W. A. Benjamin.

[100] Galbraith, S. D. (2003). “Weil Descent of Jacobians”, Discrete Applied Mathematics, 128 (1): 165–180.

[101] Galbraith, S. D., F. Hess and N. P. Smart (2002). “Extending the GHS Weil Descent Attack”, Advances in Cryptology—EUROCRYPT 2002, Lecture Notes in Computer Science, 2332. pp. 29–44. Berlin/Heidelberg: Springer.

[102] Galbraith, S. D., W. Mao, and K. G. Paterson (2002). “RSA-based Undeniable Signatures for General Moduli”, Topics in Cryptology—CT-RSA 2002, Lecture Notes in Computer Science, 2271. pp. 200–217. Berlin/Heidelberg: Springer.

[103] Gathen, J. von zur and J. Gerhard (1999). Modern Computer Algebra. Cambridge: Cambridge University Press.

[104] Gathen, J. von zur and V. Shoup (1992). “Computing Frobenius Maps and Factoring Polynomials”, pp. 97–105. Proceedings of the 24th Annual ACM Symposium on Theory of Computing, Victoria, British Columbia, Canada.

[105] Gaudry, P. (2000). “An Algorithm for Solving the Discrete Log Problem on Hyperelliptic Curves”, Advances in Cryptology—EUROCRYPT 2000, Lecture Notes in Computer Science, 1807. pp. 19–34. Berlin/Heidelberg: Springer.

[106] Gaudry, P. and R. Harley (2000). “Counting Points on Hyperelliptic Curves over Finite Fields”, Algorithmic Number Theory—ANTS-IV, Lecture Notes in Computer Science, 1838. pp. 313–332. Berlin/Heidelberg: Springer.

[107] Gaudry, P., F. Hess and N. P. Smart (2002). “Constructive and Destructive Facets of Weil Descent on Elliptic Curves”, Journal of Cryptology, 15 (1): 19–46.

[108] Geddes, K. O., S. R. Czapor and G. Labahn (1992). Algorithms for Computer Algebra. Boston: Kluwer Academic Publishers.

[109] Gennaro, R., H. Krawczyk and T. Rabin (2000). “RSA-based Undeniable Signatures”, Journal of Cryptology, 13 (4): 397–416.

[110] Gentry, C., J. Jonsson, M. Szydlo and J. Stern (2001). “Cryptanalysis of the NTRU Signature Scheme (NSS) from Eurocrypt 2001”, Advances in Cryptology—ASIACRYPT 2001, Lecture Notes in Computer Science, 2248. pp. 1–20. Berlin/Heidelberg: Springer.

[111] Gentry, C. and M. Szydlo (2002). “Cryptanalysis of the NTRU Signature Scheme”, Advances in Cryptology—EUROCRYPT ’02, Lecture Notes in Computer Science, 2332. pp. 299–320. Berlin/Heidelberg: Springer.

[112] Gilbert, H. and M. Minier (2000). “A Collision Attack on Seven Rounds of Rijndael”, pp. 230–241. Proceedings of the 3rd AES Conference, NIST, New York, April 2000.

[113] * Goldreich, O. (2001). Foundations of Cryptography, Volume 1: Basic Tools. Cambridge: Cambridge University Press.

[114] * Goldreich, O. (2004). Foundations of Cryptography, Volume 2: Basic Applications. Cambridge: Cambridge University Press.

[115] Goldreich, O., S. Goldwasser and S. Halevi (1997). “Public-key Cryptosystems from Lattice Reduction Problems”, Advances in Cryptology—CRYPTO ’97, Lecture Notes in Computer Science, 1294. pp. 112–131. Berlin/Heidelberg: Springer.

[116] Goldwasser, S. and J. Kilian (1986). “Almost All Primes Can Be Quickly Certified”, pp. 316–329. Proceedings of the 18th Annual ACM Symposium on Theory of Computing, Berkeley, California.

[117] Goldwasser, S. and S. Micali (1984). “Probabilistic Encryption”, Journal of Computer and Systems Sciences, 28: 270–299.

[118] Gordon, D. M. (1985). “Strong Primes are Easy to Find”, Advances in Cryptology—EUROCRYPT ’84, Lecture Notes in Computer Science, 209. pp. 216–223. Berlin/Heidelberg: Springer.

[119] Gordon, D. M. (1993). “Discrete Logarithms in GF (p) Using the Number Field Sieve”, SIAM Journal of Discrete Mathematics, 6: 124–138.

[120] Gordon, D. M. and K. S. McCurley (1992). “Massively Parallel Computation of Discrete Logarithms”, Advances in Cryptology—CRYPTO ’92, Lecture Notes in Computer Science, 740. pp. 312–323. Berlin/Heidelberg: Springer.

[121] Grinstead, C. M. and J. L. Snell (1997). Introduction to Probability, 2nd revised ed. Providence, Rhode Island: American Mathematical Society. The book is also available at http://www.dartmouth.edu/~chance/book.html (October 2008).

[122] Guillou, L. C. and J.-J. Quisquater (1988). “A Practical Zero-Knowledge Protocol Fitted to Security Microprocessor Minimizing Both Transmission and Memory”, Advances in Cryptology—EUROCRYPT ’88, Lecture Notes in Computer Science, 330. pp. 123–128. Berlin/Heidelberg: Springer.

[123] Hankerson, D., A. J. Menezes and S. Vanstone (2004). Guide to Elliptic Curve Cryptography. New York: Springer.

[124] Hartshorne, R. (1977). Algebraic Geometry. Graduate Texts in Mathematics, 52. New York, Heidelberg and Berlin: Springer.

[125] * Herstein, I. N. (1975). Topics in Algebra. New York: John Wiley & Sons.

[126] Hess, F., G. Seroussi and N. P. Smart (2000). “Two Topics in Hyperelliptic Cryptography”. HP Labs technical report HPL-2000-118.

[127] * Hoffman, K. and R. Kunze (1971). Linear Algebra. Englewood Cliffs, New Jersey: Prentice-Hall.

[128] Hoffstein, J., N. Howgrave-Graham, J. Pipher, J. H. Silverman and W. White (2003). “NTRUSign: Digital Signatures Using the NTRU Lattice”, Topics in Cryptology—CT-RSA 2003, Lecture Notes in Computer Science, 2612. pp. 122–140. Berlin/Heidelberg: Springer.

[129] Hoffstein, J., N. Howgrave-Graham, J. Pipher, J. H. Silverman and W. White (2005). “Performance Improvements and a Baseline Parameter Generation Algorithm for NTRUSign”, Workshop on Mathematical Problems and Techniques in Cryptology, Barcelona, Spain, June 2005. Also available at http://www.ntru.com/cryptolab/articles.htm (October 2008).

[130] Hoffstein, J., J. Pipher and J. H. Silverman (1998). “NTRU: A Ring-Based Public Key Cryptosystem”, Algorithmic Number Theory—ANTS-III, Lecture Notes in Computer Science, 1423. pp. 267–288. Berlin/Heidelberg: Springer.

[131] Hoffstein, J., J. Pipher and J. H. Silverman (2001). “NSS: An NTRU Lattice-Based Signature Scheme”, Advances in Cryptology—EUROCRYPT 2001, Lecture Notes in Computer Science, 2045. pp. 211–228. Berlin/Heidelberg: Springer.

[132] Horster, P., M. Michels and H. Petersen (1994). “Meta-ElGamal Signature Schemes”. Technical report TR-94-5-F, Department of Computer Science, Technische Universität, Chemnitz-Zwickau.

[133] * Hungerford, T. W. (1974). Algebra, 5th ed. Graduate Texts in Mathematics, 73. Berlin: Springer.

[134] IEEE (2008), “Standard Specifications for Public-Key Cryptography” [online document]. Available at http://grouper.ieee.org/groups/1363/index.html (October 2008).

[135] IETF (2008), “The Internet Engineering Task Force” [online document]. Available at http://www.ietf.org/ (October 2008).

[136] * Ireland, K. and M. Rosen (1990). A Classical Introduction to Modern Number Theory. Graduate Texts in Mathematics, 84. New York: Springer.

[137] Izu, T., B. Möller and T. Takagi (2002). “Improved Elliptic Curve Multiplication Methods Resistant Against Side Channel Attacks”, Progress in Cryptology—INDOCRYPT 2002, Lecture Notes in Computer Science, 2551. pp. 296–313. Berlin/Heidelberg: Springer.

[138] Izu, T. and T. Takagi (2002). “A Fast Parallel Elliptic Curve Multiplication Resistant Against Side Channel Attacks”, Public Key Cryptography—PKC 2002, Lecture Notes in Computer Science, 2274. pp. 280–296. Berlin/Heidelberg: Springer. An improved version of this paper is published as the technical report CORR 2002-03 of the Centre for Applied Cryptographic Research, University of Waterloo, Canada, and is available at http://www.cacr.math.uwaterloo.ca/ (October 2008).

[139] Jacobson, M. J., N. Koblitz, J. H. Silverman, A. Stein and E. Teske (2000). “Analysis of the Xedni Calculus Attack”, Designs, Codes and Cryptography, 20: 41–64.

[140] Janusz, G. J. (1995). Algebraic Number Fields. Providence, Rhode Island: American Mathematical Society.

[141] Johnson, D. and A. Menezes (1999). “The Elliptic Curve Digital Signature Algorithm (ECDSA)”. Technical report CORR 99-34, Department of Combinatorics and Optimization, University of Waterloo, Canada. Also published in International Journal on Information Security (2001), 1: 36–63.

[142] Joye, M., A. K. Lenstra and J.-J. Quisquater (1999). “Chinese Remaindering Based Cryptosystems in the Presence of Faults”, Journal of Cryptology, 12 (4): 241–246.

[143] Kaltofen, E. and V. Shoup (1995). “Subquadratic-Time Factoring of Polynomials over Finite Fields”, pp. 398–406. Proceedings of the 27th Annual ACM Symposium on Theory of Computing, Las Vegas, Nevada.

[144] Kampkötter, W. (1991). Explizite Gleichungen für Jacobische Varietäten hyperelliptischer Kurven [dissertation]. Essen: Gesamthochschule.

[145] Katz, J. and Y. Lindell (2007). Introduction to Modern Cryptography. Boca Raton, Florida; London and New York: CRC Press.

[146] Kaye, P. and C. Zalka (2004), “Optimized Quantum Implementation of Elliptic Curve Arithmetic over Binary Fields” [online document]. Available at http://arxiv.org/abs/quant-ph/0407095 (October 2008).

[147] * Knuth, D. E. (1997). The Art of Computer Programming, Volume 2: Seminumerical Algorithms. Reading, Massachusetts: Addison-Wesley.

[148] Ko, K. H., S. J. Lee, J. H. Cheon, J. W. Han, J. S. Kang and C. S. Park (2000). “New Public-Key Cryptosystem Using Braid Groups”, Advances in Cryptology—CRYPTO 2000, Lecture Notes in Computer Science, 1880. pp. 166–183. Berlin/Heidelberg: Springer.

[149] Koblitz, N. (1984). p-adic Numbers, p-adic Analysis, and Zeta-Functions, 2nd ed. Graduate Texts in Mathematics, 58. New York, Heidelberg and Berlin: Springer.

[150] Koblitz, N. (1987). “Elliptic Curve Cryptosystems”, Mathematics of Computation, 48: 203–209.

[151] Koblitz, N. (1989). “Hyperelliptic Cryptosystems”, Journal of Cryptology, 1: 139–150.

[152] Koblitz, N. (1993). Introduction to Elliptic Curves and Modular Forms, 2nd ed. Graduate Texts in Mathematics, 97. Berlin: Springer.

[153] * Koblitz, N. (1994). A Course in Number Theory and Cryptography, 2nd ed. New York: Springer.

[154] Koblitz, N. (1998). Algebraic Aspects of Cryptography. New York: Springer.

[155] Kocher, P. C. (1996). “Timing Attacks on Implementations of Diffie-Hellman, RSA, DSS, and Other Systems”, Advances in Cryptology—CRYPTO ’96, Lecture Notes in Computer Science, 1109. pp. 104–113. Berlin/Heidelberg: Springer.

[156] Kocher, P. C., J. Jaffe and B. Jun (1999). “Differential Power Analysis”, Advances in Cryptology—CRYPTO ’99, Lecture Notes in Computer Science, 1666. pp. 388–397. Berlin/Heidelberg: Springer.

[157] Lagarias, J. C. and A. M. Odlyzko (1985). “Solving Low-Density Subset Sum Problems”, Journal of ACM, 32: 229–246.

[158] LaMacchia, B. A. and A. M. Odlyzko (1991a). “Computation of Discrete Logarithms in Prime Fields”, Designs, Codes and Cryptography, 1: 46–62.

[159] LaMacchia, B. A. and A. M. Odlyzko (1991b). “Solving Large Sparse Linear Systems over Finite Fields”, Advances in Cryptology—CRYPTO ’90, Lecture Notes in Computer Science, 537. pp. 109–133. Berlin/Heidelberg: Springer.

[160] Lang, S. (1994). Algebraic Number Theory. Graduate Texts in Mathematics, 110. New York: Springer.

[161] Law, L., A. Menezes, A. Qu, J. Solinas and S. Vanstone (1998). “An Efficient Protocol for Authenticated Key Agreement”. Technical report CORR 98-05, Department of Combinatorics and Optimization, University of Waterloo, Canada.

[162] Lehmer, D. H. and R. E. Powers (1931). “On Factoring Large Numbers”, Bulletin of the AMS, 37: 770–776.

[163] Lenstra, A. K., E. Tromer, A. Shamir, W. Kortsmit, B. Dodson, J. Hughes and P. Leyland (2003). “Factoring Estimates for a 1024-Bit RSA Modulus”, Advances in Cryptology—ASIACRYPT 2003, Lecture Notes in Computer Science, 2894. pp. 55–74. Berlin/Heidelberg: Springer.

[164] Lenstra, A. K. and H. W. Lenstra (1990). “Algorithms in Number Theory”, in J. van Leeuwen (ed.), Handbook of Theoretical Computer Science, Volume A, pp. 675–715, Amsterdam: Elsevier.

[165] Lenstra, A. K. and H. W. Lenstra (ed.) (1993). The Development of the Number Field Sieve. Lecture Notes in Mathematics, 1554. Berlin: Springer.

[166] Lenstra, A. K., H. W. Lenstra and L. Lovasz (1982). “Factoring Polynomials with Rational Coefficients”, Mathematische Annalen, 261: 515–534.

[167] Lenstra, A. K., H. W. Lenstra, M. S. Manasse and J. M. Pollard (1990). “The Number Field Sieve”, pp. 564–572. Proceedings of the 22nd Annual ACM Symposium on Theory of Computing, Baltimore, Maryland, USA, 13–17 May.

[168] Lenstra, A. K. and A. Shamir (2000). “Analysis and Optimization of the TWINKLE Factoring Device”, Advances in Cryptology—EUROCRYPT 2000, Lecture Notes in Computer Science, 1807. pp. 35–52. Berlin/Heidelberg: Springer.

[169] Lenstra, A. K., A. Shamir, J. Tomlinson and E. Tromer (2002). “Analysis of Bernstein’s Factorization Circuit”, Advances in Cryptology—ASIACRYPT 2002, Lecture Notes in Computer Science, 2501. pp. 1–26. Berlin/Heidelberg: Springer.

[170] Lenstra, A. K. and E. R. Verheul (2000a). “The XTR Public Key System”, Advances in Cryptology—CRYPTO 2000, Lecture Notes in Computer Science, 1880. pp. 1–20. Berlin/Heidelberg: Springer.

[171] Lenstra, A. K. and E. R. Verheul (2000b). “Key Improvements to XTR”, Advances in Cryptology—ASIACRYPT 2000, Lecture Notes in Computer Science, 1976. pp. 220–233. Berlin/Heidelberg: Springer.

[172] Lenstra, A. K. and E. R. Verheul (2001a). “An Overview of the XTR Public Key System”, pp. 151–180. Proceedings of the Public Key Cryptography and Computational Number Theory Conference, Warsaw, Poland, 2000. Berlin: Verlag Walter de Gruyter.

[173] Lenstra, A. K. and E. R. Verheul (2001b). “Fast Irreducibility and Subgroup Membership Testing in XTR”, Public Key Cryptography—PKC 2001, Lecture Notes in Computer Science, 1992. pp. 73–86. Berlin/Heidelberg: Springer.

[174] Lenstra, H. W. (1987). “Factoring Integers with Elliptic Curves”, Annals of Mathematics, 126: 649–673.

[175] Lenstra, H. W. and C. Pomerance (2005), “Primality Testing with Gaussian Periods” [online document]. Available at http://www.math.dartmouth.edu/~carlp/PDF/complexity12.pdf (October 2008).

[176] Lercier, R. (1997). “Finding Good Random Elliptic Curves for Cryptosystems Defined over GF(2^n)”, Advances in Cryptology—EUROCRYPT ’97, Lecture Notes in Computer Science, 1233. pp. 379–392. Berlin/Heidelberg: Springer.

[177] Lercier, R. and D. Lubicz (2003). “Counting Points on Elliptic Curves over Finite Fields of Small Characteristic in Quasi Quadratic Time”, Advances in Cryptology—EUROCRYPT 2003, Lecture Notes in Computer Science, 2656. pp. 360–373. Berlin/Heidelberg: Springer.

[178] Libert, B. and J.-J. Quisquater (2003), “New Identity Based Signcryption Schemes from Pairings” [online document]. Available at http://eprint.iacr.org/2003/023/ (October 2008).

[179] Lidl, R. and H. Niederreiter (1984). Finite Fields, Encyclopedia of Mathematics and Its Applications, 20. Cambridge: Cambridge University Press.

[180] Lidl, R. and H. Niederreiter (1994). Introduction to Finite Fields and Their Applications. Cambridge: Cambridge University Press.

[181] Liu, D. and P. Ning (2003a). “Establishing Pairwise Keys in Distributed Sensor Networks”, pp. 52–61. Proceedings of the 10th ACM Conference on Computer and Communication Security, Washington D.C., USA, October 2003.

[182] Liu, D. and P. Ning (2003b). “Location-Based Pairwise Key Establishments for Static Sensor Networks”, pp. 72–82. Proceedings of the 1st ACM Workshop on Security in Ad Hoc and Sensor Networks, Fairfax, Virginia, 31 October 2003.

[183] Liu, D., P. Ning and R. Li (2005). “Establishing Pairwise Keys in Distributed Sensor Networks”, ACM Transactions on Information and System Security, (8) 1: 41–77.

[184] Lucks, S. (2000). “Attacking Seven Rounds of Rijndael Under 192-bit and 256-bit Keys”, pp. 215–229. Proceedings of the 3rd Advanced Encryption Standard Candidate Conference, New York, April 2000.

[185] Malone-Lee, J. (2002), “Identity-Based Signcryption” [online document]. Available at http://eprint.iacr.org/2002/098/ (October 2008).

[186] Mao, W. (2001). “New Zero-Knowledge Undeniable Signatures—Forgery of Signature Equivalent to Factorisation”. Hewlett-Packard technical report HPL-2001-36.

[187] Mao, W. and K. G. Paterson (2000). “Convertible Undeniable Standard RSA Signatures”. Hewlett-Packard technical report HPL-2000-148.

[188] Matsumoto, T. and H. Imai (1988). “Public Quadratic Polynomial-Tuples for Efficient Signature-Verification and Message-Encryption”, Advances in Cryptology—EUROCRYPT ’88, Lecture Notes in Computer Science, 330. pp. 419–453. Berlin/Heidelberg: Springer.

[189] McCurley, K. S. (1990). “The Discrete Logarithm Problem”, in C. Pomerance and S. Goldwasser (eds.), Cryptology and Computational Number Theory: American Mathematical Society Short Course, Boulder, Colorado, 6–7 August 1989. Proceedings of Symposia in Applied Mathematics, 42. pp. 49–74. Providence, Rhode Island: American Mathematical Society.

[190] McEliece, R. J. (1978). “A Public-Key Cryptosystem Based on Algebraic Coding Theory”. DSN progress report 42–44, Jet Propulsion Laboratory, California Institute of Technology, pp. 114–116.

[191] Menezes, A. J. (ed.) (1993). Applications of Finite Fields. Boston: Kluwer Academic Publishers.

[192] Menezes, A. J. (1993). Elliptic Curve Public Key Cryptosystems. The Springer International Series in Engineering and Computer Science, 234. Springer. Available at http://books.google.co.in/books?id=bIb54ShKS68C (October 2008).

[193] Menezes, A. J., T. Okamoto and S. Vanstone (1993). “Reducing Elliptic Curve Logarithms to a Finite Field”, IEEE Transactions on Information Theory, 39: 1639–1646.

[194] Menezes, A. J., P. van Oorschot and S. Vanstone (1997). Handbook of Applied Cryptography. Boca Raton, Florida: CRC Press.

[195] Menezes, A. J., Y. Wu and R. Zuccherato (1996). “An Elementary Introduction to Hyperelliptic Curves”. CACR technical report CORR 96-19, University of Waterloo, Canada.

[196] Merkle, R. C. and M. E. Hellman (1978). “Hiding Information and Signatures in Trapdoor Knapsacks”, IEEE Transactions on Information Theory, 24 (5): 525–530.

[197] Mermin, N. D. (2003). “From Cbits to Qbits: Teaching Computer Scientists Quantum Mechanics”, American Journal of Physics, 71: 23–30.

[198] Mermin, N. D. (2006), “Phys481-681-CS483 Lecture Notes and Homework Assignments” [online document]. Available at http://people.ccmr.cornell.edu/~mermin/qcomp/CS483.html (October 2008).

[199] Messerges, T. S. (2000). “Securing the AES Finalists Against Power Analysis Attacks”, Fast Software Encryption—FSE 2000, Lecture Notes in Computer Science, 1978. pp. 150–164. Berlin/Heidelberg: Springer.

[200] Messerges, T. S., E. A. Dabbish and R. H. Sloan (1999). “Power Analysis Attacks of Modular Exponentiation in Smartcards”, Cryptographic Hardware and Embedded Systems—CHES 1999, Lecture Notes in Computer Science, 1717. pp. 144–157. Berlin/Heidelberg: Springer.

[201] Messerges, T. S., E. A. Dabbish and R. H. Sloan (2002). “Examining Smart-Card Security Under the Threat of Power Analysis Attacks”, IEEE Transactions on Computers, 51 (4): 541–552.

[202] Michels, M. and M. Stadler (1997). “Efficient Convertible Undeniable Signature Schemes”, pp. 231–244. Proceedings of the 4th International Workshop on Selected Areas in Cryptography, Ottawa, Canada.

[203] Mignotte, M. (1992). Mathematics for Computer Algebra. New York: Springer.

[204] Miller, G. L. (1976). “Riemann’s Hypothesis and Tests for Primality”, Journal of Computer and System Sciences, 13: 300–317.

[205] Miller, V. (1986). “Uses of Elliptic Curves in Cryptography”, Advances in Cryptology—CRYPTO ’85, Lecture Notes in Computer Science, 218. pp. 417–426. Berlin/Heidelberg: Springer.

[206] Möller, B. (2001). “Securing Elliptic Curve Point Multiplication Against Side-Channel Attacks”, Information Security Conference, Lecture Notes in Computer Science, 2200. pp. 324–334. Berlin/Heidelberg: Springer.

[207] Mollin, R. A. (1998). Fundamental Number Theory with Applications. Boca Raton, Florida: Chapman & Hall/CRC.

[208] Mollin, R. A. (1999). Algebraic Number Theory. Boca Raton, Florida: Chapman & Hall/CRC.

[209] Mollin, R. A. (2001). An Introduction to Cryptography. Boca Raton, Florida: Chapman & Hall/CRC.

[210] Montgomery, P. L. (1985). “Modular Multiplication Without Trial Division”, Mathematics of Computation, 44: 519–521.

[211] Montgomery, P. L. (1994). “A Survey of Modern Integer Factorization Algorithms”, CWI Quarterly, 7 (4): 337–366.

[212] Montgomery, P. L. (1995). “A Block Lanczos Algorithm for Finding Dependencies over GF(2)”, Advances in Cryptology—EUROCRYPT ’95, Lecture Notes in Computer Science, 921. pp. 106–120. Berlin/Heidelberg: Springer.

[213] Morrison, M. A. and J. Brillhart (1975). “A Method of Factoring and a Factorization of F7”, Mathematics of Computation, 29: 183–205.

[214] * Motwani, R. and P. Raghavan (1995). Randomized Algorithms. Cambridge: Cambridge University Press.

[215] Muir, J. A. (2001). Techniques of Side Channel Cryptanalysis [dissertation]. Canada: University of Waterloo. Available at http://www.uwspace.uwaterloo.ca/bitstream/10012/1098/1/jamuir2001.pdf (October 2008).

[216] Neukirch, J. (1999). Algebraic Number Theory. Berlin and Heidelberg: Springer.

[217] Nguyen, P. Q. (2006), “A Note on the Security of NTRUSign” [online document]. Available at http://eprint.iacr.org/2006/387 (October 2008).

[218] * Nielsen, M. A. and I. L. Chuang (2000). Quantum Computation and Quantum Information. Cambridge: Cambridge University Press.

[219] NIST (2001), “Advanced Encryption Standard” [online document]. Available at http://csrc.nist.gov/publications/fips/fips197/fips-197.pdf (October 2008).

[220] NIST (2006), “Digital Signature Standard (DSS)” [online document]. Available at http://csrc.nist.gov/publications/drafts/fips_186-3/Draft-FIPS-186-3%20_March2006.pdf (October 2008).

[221] NIST (2007a), “Federal Information Processing Standards” [online document]. Available at http://csrc.nist.gov/publications/PubsFIPS.html (October 2008).

[222] NIST (2007b), “Secure Hash Standard (SHS)” [online document]. Available at http://csrc.nist.gov/publications/drafts/fips_180-3/draft_fips-180-3_June-08-2007.pdf (October 2008).

[223] Nyberg, K. and R. A. Rueppel (1993). “A New Signature Scheme Based on the DSA Giving Message Recovery”, pp. 58–61. Proceedings of the 1st ACM Conference on Computer and Communications Security, Fairfax, Virginia, 3–5 November.

[224] Nyberg, K. and R. A. Rueppel (1995). “Message Recovery for Signature Schemes Based on the Discrete Logarithm Problem”, Advances in Cryptology—EUROCRYPT ’94, Lecture Notes in Computer Science, 950. pp. 182–193. Berlin/Heidelberg: Springer.

[225] Odlyzko, A. M. (1985). “Discrete Logarithms and Their Cryptographic Significance”, Advances in Cryptology—EUROCRYPT ’84, Lecture Notes in Computer Science, 209. pp. 224–314. Berlin/Heidelberg: Springer.

[226] Odlyzko, A. M. (2000). “Discrete Logarithms: The Past and the Future”, Designs, Codes and Cryptography, 19: 129–145.

[227] Okamoto, T. (1992). “Provably Secure and Practical Identification Schemes and Corresponding Signature Schemes”, Advances in Cryptology—CRYPTO ’92, Lecture Notes in Computer Science, 740. pp. 31–53. Berlin/Heidelberg: Springer.

[228] Okamoto, T., E. Fujisaki and H. Morita (1998). “TSH-ESIGN: Efficient Digital Signature Scheme Using Trisection Size Hash”, submission to IEEE P1363a.

[229] Papadimitriou, C. H. (1994). Computational Complexity. Reading, Massachusetts: Addison-Wesley.

[230] Park, S., T. Kim, Y. An and D. Won (1995). “A Provably Entrusted Undeniable Signature”, pp. 644–648. IEEE Singapore International Conference on Network/International Conference on Information Engineering (SICON/ICIE ’95).

[231] Patarin, J. (1995). “Cryptanalysis of the Matsumoto and Imai Public Key Scheme of Eurocrypt’88”, Advances in Cryptology—CRYPTO ’95, Lecture Notes in Computer Science, 963. pp. 248–261. Berlin/Heidelberg: Springer.

[232] Patarin, J. (1996). “Hidden Fields Equations (HFE) and Isomorphisms of Polynomials (IP): Two New Families of Asymmetric Algorithms”, Advances in Cryptology—EUROCRYPT ’96, Lecture Notes in Computer Science, 1070. pp. 33–48. Berlin/Heidelberg: Springer.

[233] Pirsig, R. M. (1974). Zen and the Art of Motorcycle Maintenance: An Inquiry into Values. London: Bodley Head.

[234] Pohlig, S. and M. Hellman (1978). “An Improved Algorithm for Computing Logarithms over GF (p) and its Cryptographic Significance”, IEEE Transactions on Information Theory, 24: 106–110.

[235] Pohst, M. and H. Zassenhaus (1989). Algorithmic Algebraic Number Theory, Encyclopaedia of Mathematics and Its Applications, 30. Cambridge: Cambridge University Press.

[236] Pointcheval, D. and J. Stern (1996). “Provably Secure Blind Signature Schemes”, Advances in Cryptology—ASIACRYPT ’96, Lecture Notes in Computer Science, 1163. pp. 252–265. Berlin/Heidelberg: Springer.

[237] Pointcheval, D. and J. Stern (2000). “Security Arguments for Digital Signatures and Blind Signatures”, Journal of Cryptology, 13 (3): 361–396.

[238] Pollard, J. M. (1974). “Theorems on Factorization and Primality Testing”, Proceedings of the Cambridge Philosophical Society, 76 (2): 521–528.

[239] Pollard, J. M. (1975). “A Monte Carlo Method for Factorization”, BIT, 15 (3): 331–334.

[240] Pollard, J. M. (1993). “Factoring with Cubic Integers”, in A. K. Lenstra and H. W. Lenstra (eds.), The Development of the Number Field Sieve, Lecture Notes in Mathematics, 1554. pp. 4–10. Berlin: Springer.

[241] Pomerance, C. (1985). “The Quadratic Sieve Factoring Algorithm”, Advances in Cryptology—EUROCRYPT ’84, Lecture Notes in Computer Science, 209. pp. 169–182. Berlin/Heidelberg: Springer.

[242] Pomerance, C. (2008). “Elementary Thoughts on Discrete Logarithms”, pp. 385–396. in J. P. Buhler and P. Stevenhagen (eds.), Surveys in Algorithmic Number Theory, Publications of the Research Institute for Mathematical Sciences, 44. New York: Cambridge University Press.

[243] Preskill, J. (1998). “Quantum Computing: Pro and Con”, Proceedings of the Royal Society of London, A454: 469–486.

[244] Preskill, J. (2007), “Course Information for Quantum Computation” [online document]. Available at http://theory.caltech.edu/people/preskill/ph219/ (October 2008).

[245] Proos, J. and C. Zalka (2004), “Shor’s Discrete Logarithm Quantum Algorithm for Elliptic Curves” [online document]. Available at http://arxiv.org/abs/quant-ph/0301141 (October 2008).

[246] Rabin, M. O. (1979). “Digitalized Signatures and Public-Key Functions as Intractable as Factorization”. Technical report MIT/LCS/TR-212, MIT Laboratory for Computer Science, Massachusetts.

[247] Rabin, M. O. (1980a). “Probabilistic Algorithms in Finite Fields”, SIAM Journal of Computing, 9: 273–280.

[248] Rabin, M. O. (1980b). “Probabilistic Algorithm for Testing Primality”, Journal of Number Theory, 12: 128–138.

[249] Ram Murty, M. (2001). Problems in Analytic Number Theory. New York: Springer.

[250] Raymond, J.-F. and A. Stiglic (2000), “Security Issues in the Diffie-Hellman Key Agreement Protocol” [online document]. Available at http://crypto.cs.mcgill.ca/~stiglic/Papers/dhfull.pdf (October 2008).

[251] Ribenboim, P. (2001). Classical Theory of Algebraic Numbers. Universitext. New York: Springer.

[252] Rivest, R. L., A. Shamir, and L. M. Adleman (1978). “A Method for Obtaining Digital Signatures and Public-Key Cryptosystems”, Communications of the ACM, 21 (2): 120–126.

[253] Rosser, J. and L. Schoenfeld (1962). “Approximate Formulas for Some Functions of Prime Numbers”, Illinois Journal of Mathematics, 6: 64–94.

[254] RSA Security Inc. (2008), “Public-Key Cryptography Standards” [online document]. Available at http://www.rsa.com/rsalabs/node.asp?id=2124 (October 2008).

[255] Sakurai, J. J. (1994). Modern Quantum Mechanics. Revised by San-Fu Tuan, Reading, Massachusetts: Addison-Wesley.

[256] Satoh, T. (2000). “The Canonical Lift of an Ordinary Elliptic Curve over a Finite Field and Its Point Counting”, Journal of Ramanujan Mathematical Society, 15: 247–270.

[257] Satoh, T. and K. Araki (1998). “Fermat Quotients and the Polynomial Time Discrete Log Algorithm for Anomalous Elliptic Curves”, Commentarii Mathematici Universitatis Sancti Pauli, 47: 81–92.

[258] Schiff, L. I. (1968). Quantum Mechanics, 3rd ed. New York: McGraw-Hill.

[259] Schindler, W., F. Koeune and J.-J. Quisquater (2001). “Unleashing the Full Power of Timing Attack”. Technical report CG-2001/3, Université Catholique de Louvain, Belgium. Available at http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.23.6622.

[260] Schirokauer, O. (1993). “Discrete Logarithms and Local Units”, Philosophical Transactions of the Royal Society of London, Series A, 345: 409–423.

[261] Schirokauer, O., D. Weber, and T. Denny (1996). “Discrete Logarithms: The Effectiveness of the Index Calculus Method”, Algorithmic Number Theory—ANTS-II, Lecture Notes in Computer Science, 1122. pp. 337–361. Berlin/Heidelberg: Springer.

[262] * Schneier, B. (2006). Applied Cryptography, 2nd ed. New York: John Wiley & Sons.

[263] Schnorr, C. P. (1991). “Efficient Signature Generation for Smart Cards”, Journal of Cryptology, 4: 161–174.

[264] Schoof, R. (1995). “Counting Points on Elliptic Curves over Finite Fields”, Journal de Théorie des Nombres de Bordeaux, 7: 219–254.

[265] Semaev, I. A. (1998). “Evaluation of Discrete Logarithms on Some Elliptic Curves”, Mathematics of Computation, 67: 353–356.

[266] Shamir, A. (1984). “A Polynomial-Time Algorithm for Breaking the Basic Merkle-Hellman Cryptosystem”, IEEE Transactions on Information Theory, 30: 699–704.

[267] Shamir, A. (1984). “Identity-Based Cryptosystems and Signature Schemes”, Advances in Cryptology—CRYPTO ’84, Lecture Notes in Computer Science, 196. pp. 47–53. Berlin/Heidelberg: Springer.

[268] Shamir, A. (1997). “How to Check Modular Exponentiation”, presented at the rump session of Advances in Cryptology—EUROCRYPT ’97, May.

[269] Shamir, A. (1999). “Factoring Large Numbers with the TWINKLE Device”, Cryptographic Hardware and Embedded Systems—CHES ’99, Lecture Notes in Computer Science, 1717. pp. 2–12. Berlin/Heidelberg: Springer.

[270] Shamir, A. and E. Tromer (2003). “Factoring Large Numbers with the TWIRL Device”, Advances in Cryptology—CRYPTO 2003, Lecture Notes in Computer Science, 2729. pp. 1–26. Berlin/Heidelberg: Springer.

[271] Shor, P. W. (1997). “Polynomial-Time Algorithms for Prime Factorization and Discrete Logarithms on a Quantum Computer”, SIAM Journal of Computing, 26: 1484–1509.

[272] Shoup, V. (1990). “On the Deterministic Complexity of Factoring Polynomials over Finite Fields”, Information Processing Letters, 33: 261–267.

[273] Shparlinski, I. E. (1991). “On Some Problems in the Theory of Finite Fields”, Russian Mathematical Surveys, 46 (1): 199–240.

[274] Shparlinski, I. E. (1992). Computational and Algorithmic Problems in Finite Fields, Mathematics and its Applications, 88. Kluwer Academic Publishers.

[275] * Silverman, J. H. (1986). The Arithmetic of Elliptic Curves. Graduate Texts in Mathematics, 106. Berlin and New York: Springer.

[276] Silverman, J. H. (1994). Advanced Topics in the Arithmetic of Elliptic Curves. Graduate Texts in Mathematics, 151. New York: Springer.

[277] Silverman, J. H. (2000). “The Xedni Calculus and the Elliptic Curve Discrete Logarithm Problem”, Designs, Codes and Cryptography, 20: 5–40.

[278] Silverman, J. H. and J. Suzuki (1998). “Elliptic Curve Discrete Logarithms and the Index Calculus”, Advances in Cryptology—ASIACRYPT ’98, Lecture Notes in Computer Science, 1514. pp. 110–125. Berlin/Heidelberg: Springer.

[279] Silverman, R. D. (1987). “The Multiple Polynomial Quadratic Sieve”, Mathematics of Computation, 48: 329–339.

[280] * Sipser, M. (1997). Introduction to the Theory of Computation, 2nd ed. Boston: PWS Publishing Company.

[281] Skjernaa, B. (2003). “Satoh’s Algorithm in Characteristic 2”, Mathematics of Computation, 72: 477–487.

[282] Smart, N. P. (1999). “The Discrete Logarithm Problem on Elliptic Curves of Trace One”, Journal of Cryptology, 12: 193–196.

[283] Smart, N. P. (2002). Cryptography: An Introduction. New York: McGraw-Hill. The 2nd edition of this book is available online at http://www.cs.bris.ac.uk/~nigel/Crypto_Book/ (October 2008).

[284] Smith, P. J. (1993). “LUC Public-Key Encryption: A Secure Alternative to RSA”, Dr. Dobb’s Journal, 18 (1): 44–49.

[285] Smith, P. J. and M. J. J. Lennon (1993). “LUC: A New Public Key System”, IFIP Transactions, A 37. pp. 103–117. Proceedings of the IFIP TC11, 9th International Conference on Information Security. Computer Security. Amsterdam: North-Holland Co.

[286] Smith, P. J. and C. Skinner (1995). “A Public-Key Cryptosystem and Digital Signature System Based on the Lucas Function Analogue to Discrete Logarithms”, Advances in Cryptology—ASIACRYPT ’94, Lecture Notes in Computer Science, 917. pp. 357–364. Berlin/Heidelberg: Springer.

[287] Solovay, R. and V. Strassen (1977). “A Fast Monte Carlo Test for Primality”, SIAM Journal on Computing, 6: 84–86.

[288] * Stallings, W. (2006). Cryptography and Network Security, 4th ed. Upper Saddle River, New Jersey: Prentice-Hall.

[289] Stam, M. and A. K. Lenstra (2001). “Speeding up XTR”, Advances in Cryptology—ASIACRYPT 2001, Lecture Notes in Computer Science, 2248. pp. 125–143. Berlin/Heidelberg: Springer.

[290] Stein, A. and E. Teske (2005). “Optimized Baby Step-Giant Step Methods”, Journal of the Ramanujan Mathematical Society, 20 (1): 27–58.

[291] * Stinson, D. (2005). Cryptography: Theory and Practice, 3rd ed. Boca Raton, Florida: CRC Press.

[292] Strassen, V. (1969). “Gaussian Elimination Is not Optimal”, Numerische Mathematik, 13: 354–356.

[293] Stucki, D., N. Gisin, O. Guinnard, G. Ribordy and H. Zbinden (2002). “Quantum Key Distribution over 67 km with a Plug & Play System”, New Journal of Physics, 4: 41.1–41.8.

[294] Sun, H.-M., W.-C. Yang and C.-S. Laih (1999). “On the Design of RSA with Short Secret Exponent”, Advances in Cryptology—ASIACRYPT ’99, Lecture Notes in Computer Science, 1716. pp. 150–164. Berlin/Heidelberg: Springer.

[295] Swade, D. (2000). The Cogwheel Brain: Charles Babbage and the Quest to Build the First Computer. London: Little, Brown and Company.

[296] Trappe, W. and L. C. Washington (2006). Introduction to Cryptography with Coding Theory, 2nd ed. Upper Saddle River: Prentice-Hall.

[297] Verheul, E. R. (2001). “Evidence that XTR is More Secure than Supersingular Elliptic Curve Cryptosystems”, Advances in Cryptology—EUROCRYPT 2001, Lecture Notes in Computer Science, 2045. pp. 195–210. Berlin/Heidelberg: Springer.

[298] Washington, L. C. (2003). Elliptic Curves: Number Theory and Cryptography. Boca Raton, Florida: Chapman & Hall/CRC.

[299] Weber, D. (1996). “Computing Discrete Logarithms with the General Number Field Sieve”, Algorithmic Number Theory—ANTS-II, Lecture Notes in Computer Science, 1122. pp. 337–361. Berlin/Heidelberg: Springer.

[300] Weber, D. (1998). “Computing Discrete Logarithms with Quadratic Number Rings”, Advances in Cryptology—EUROCRYPT ’98, Lecture Notes in Computer Science, 1403. pp. 171–183. Berlin/Heidelberg: Springer.

[301] Weber, D. and T. Denny (1998). “The Solution of McCurley’s Discrete Log Challenge”, Advances in Cryptology—CRYPTO ’98, Lecture Notes in Computer Science, 1462. pp. 458–471. Berlin/Heidelberg: Springer.

[302] Western, A. E. and J. C. P. Miller (1968). “Tables of Indices and Primitive Roots”, Royal Society Mathematical Tables, 9. Cambridge: Cambridge University Press.

[303] Wiedemann, D. H. (1986). “Solving Sparse Linear Equations over Finite Fields”, IEEE Transactions on Information Theory, 32: 54–62.

[304] Wiener, M. J. (1990). “Cryptanalysis of Short RSA Secret Exponents”, IEEE Transactions on Information Theory, 36: 553–558.

[305] Williams, H. C. (1982). “A p + 1 Method for Factoring”, Mathematics of Computation, 39 (159): 225–234.

[306] Yang, L. T. and R. P. Brent (2001). “The Parallel Improved Lanczos Method for Integer Factorization over Finite Fields for Public Key Cryptosystems”, pp. 106–114. Proceedings of the ICPP Workshops 2001, Valencia, Spain, 3–7 September.

[307] Young, A. and M. Yung (1996). “The Dark Side of ‘Black-Box’ Cryptography, or: Should We Trust Capstone?”, Advances in Cryptology—CRYPTO ’96, Lecture Notes in Computer Science, 1109. pp. 89–103. Berlin/Heidelberg: Springer.

[308] Young, A. and M. Yung (1997a). “Kleptography: Using Cryptography Against Cryptography”, Advances in Cryptology—EUROCRYPT ’97, Lecture Notes in Computer Science, 1233. pp. 62–74. Berlin/Heidelberg: Springer.

[309] Young, A. and M. Yung (1997b). “The Prevalence of Kleptographic Attacks on Discrete-Log Based Cryptosystems”, Advances in Cryptology—CRYPTO ’97, Lecture Notes in Computer Science, 1294. pp. 264–276. Berlin/Heidelberg: Springer.

[310] Zheng, Y. (1997). “Digital Signcryption or How to Achieve Cost(Signature & Encryption) << Cost(Signature) + Cost(Encryption)”, Advances in Cryptology—CRYPTO ’97, Lecture Notes in Computer Science, 1294. pp. 165–179. Berlin/Heidelberg: Springer.

[311] Zheng, Y. (1998a). “Signcryption and Its Applications in Efficient Public Key Solutions”, 1997 Information Security Workshop ISW ’97, Lecture Notes in Computer Science, 1397. pp. 291–312. Berlin/Heidelberg: Springer.

[312] Zheng, Y. (1998b). “Shortened Digital Signature, Signcryption, and Compact and Unforgeable Key Agreement Schemes”, contribution to IEEE P1363 Standard for Public Key Cryptography.

[313] Zheng, Y. and H. Imai (1998a). “Efficient Signcryption Schemes on Elliptic Curves”. Proceedings of the IFIP 14th International Information Security Conference IFIP/SEC ’98, Vienna, Austria, September 1998. Chapman & Hall.

[314] Zheng, Y. and H. Imai (1998b). “How to Construct Efficient Signcryption Schemes on Elliptic Curves”, Information Processing Letters, 68: 227–233.

[315] Zheng, Y. and T. Matsumoto (1996). “Breaking Smartcard Implementations of ElGamal Signatures and Its Variants”, presented at the rump session of Advances in Cryptology—ASIACRYPT ’96. Available at http://www.sis.uncc.edu/~yzheng/publications/ (October 2008).

[316] * Zuckerman, H. S., H. L. Montgomery and I. M. Niven (1991). An Introduction to the Theory of Numbers, 5th ed. New York: John Wiley & Sons.

Books marked by stars have Asian editions (at the time of writing this book).

Index