Library of Congress Cataloging-in-Publication Data
Das, Abhijit.
Public-key cryptography : theory and practice / Abhijit Das, C. E. Veni Madhavan.
p. cm.
Includes bibliographical references and index.
ISBN: 978-8131708323 (pbk.)
1. Public key cryptography. 2. Telecommunication—Security
measures—Mathematics. 3. Computers—Access control—Mathematics. I. Madhavan,
C. E. Veni. II. Title. TK5102.94.D37 2009
005.8'2-dc22
2009012766
Copyright © 2009 Dorling Kindersley (India) Pvt. Ltd.
Licensees of Pearson Education in South Asia
This book is sold subject to the condition that it shall not, by way of trade or otherwise, be lent, resold, hired out, or otherwise circulated without the publisher’s prior written consent in any form of binding or cover other than that in which it is published and without a similar condition including this condition being imposed on the subsequent purchaser. Without limiting the rights under copyright reserved above, no part of this publication may be reproduced, stored in or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording or otherwise), without the prior written permission of both the copyright owner and the above-mentioned publisher of this book.
ISBN 9788131708323
Head Office: 482 FIE, Patparganj, Delhi 110 092, India
Registered Office: 14 Local Shopping Centre, Panchsheel Park, New Delhi 110 017, India
Printed in India.
Pearson Education Inc., Upper Saddle River, NJ
Pearson Education Ltd., London
Pearson Education Australia Pty, Limited, Sydney
Pearson Education Singapore, Pte. Ltd
Pearson Education North Asia Ltd, Hong Kong
Pearson Education Canada, Ltd., Toronto
Pearson Educacion de Mexico, S.A. de C.V.
Pearson Education-Japan, Tokyo
Pearson Education Malaysia, Pte. Ltd.
I can’t understand why a person will take a year to write a novel when he can easily buy one for a few dollars.
—Fred Allen
Like most authors, the first question we faced is: “Why another book?” Available textbooks on public-key cryptography (or on cryptography in general) are many [37, 74, 113, 114, 145, 152, 153, 194, 209, 262, 283, 288, 291, 296]. In the presence of all these books, writing another may sound like a waste of energy and effort.
Fortunately, we have a clear answer. Most cryptography textbooks today, even many of the celebrated ones, essentially take a narrative approach. While such an approach may be suitable for beginners at an undergraduate level, it misses the finer details of this rapidly growing area of applied mathematics. That public-key cryptography is mathematical is hard to deny, and a mathematical subject is best treated mathematically.
This is precisely the point that this book addresses: it proceeds in a canonically mathematical way while developing cryptographic concepts. The mathematics involved is often not simple (which is perhaps why other textbooks do not mention it), but we stick to mathematical rigour as far as possible. A distinctive feature of this book is that it relies on nothing other than the reader’s mathematical intuition; it develops all the mathematical abstractions from scratch. Although computer science and mathematics students nowadays take courses on discrete structures somewhere in their curricula, we do not assume this; instead, we develop the algebra starting at the level of set operations. Simpler structures like groups, rings and fields are followed by more complex concepts like finite fields, algebraic curves, number fields and p-adic numbers. The resulting (long) compilation of abstract mathematical tools relieves cryptography students and researchers of the need to consult many mathematics books for the background concepts. We are happy to offer this self-sufficient treatment complete with proofs and other details. The only place where we had to be somewhat sketchy is the discussion of elliptic and hyperelliptic curves: the mathematics here is too vast to fit in a few pages, and we opted for a deliberate simplification of these topics.
A big problem with discrete mathematics is that many of its proofs are existential. In order to make things work in a practical environment, however, one must undertake algorithmic studies of algebra and number theory. This is what our book does next. While many algorithmic issues in this area are settled favourably, there remain problems whose best known algorithmic complexities are still poor. Some of these so-called computationally difficult problems are used to build secure public-key cryptosystems. The security of these systems is assumed (rather than proven), and so we deal extensively with the algorithms known to date for solving these difficult problems. It is here that the mathematics developed in the earlier chapters is put to greatest use.
Chapter 5 is the culmination of all these mathematical and algorithmic studies: the design of public-key systems for achieving various cryptographic goals. With the theoretical base developed in the earlier chapters, Chapter 5 turns out to be an easy chapter. This is our way of looking at the problem, namely, a formal bottom-up approach. We claim to be different from most textbooks in this regard. Our discussion of mathematics is not for its own sake, but to develop the foundation of cryptographic primitives.
We then turn to implementation and other practical issues of public-key cryptography. Standards proposed by organizations such as IEEE and RSA Security Inc. promote the interoperable use of crypto primitives in Internet applications. We then look at some small applications of the crypto basics. Some indirect ways of cryptanalysis are described next. These techniques (side-channel and backdoor attacks) give the book a strong practical flavour in tandem with its otherwise formal appearance.
As an eleventh-hour decision, we added a final chapter to the book, on quantum computation and its implications for public-key cryptography. Although somewhat theoretical at this point, quantum computation has important ramifications for public-key cryptography. The mathematics behind quantum mechanics and quantum computation is not developed in the earlier chapters, which highlights the distinctive nature of this chapter; it might well be titled “cryptography of the future”.
This schematic description perhaps makes it clear that the book is best suited as a graduate-level textbook. A one- or two-semester graduate or advanced undergraduate course can be based on its contents. Self-study is also possible at an advanced graduate or research level, but is expected to be difficult at the undergraduate level. We highlight the importance of classroom teaching if an undergraduate course is to be based on this textbook.
We have rated the items in the book by their level of difficulty and/or mathematical sophistication. Unstarred items can be covered even in undergraduate courses. Items marked by single stars are suitable for a second course or a second reading. Doubly starred items, on the other hand, are research-level material and can be pursued only in really advanced courses or in research. The inclusion of a good number of these advanced topics is another feature distinguishing this book from other available textbooks.
The book comes with plenty of exercises, and our motivation behind them is two-fold. First, they help readers deepen their understanding of the material discussed in the text. Second, some of the exercises build additional theory that we omit from the text proper; we occasionally use these additional topics in proving or explaining results in the text. We do not classify the exercises into easy and difficult ones, but we do supply hints, some of them quite explicit, for the intellectually challenging parts. The hints are collected in an appendix near the end of the book, and the marker [H] is placed at the appropriate locations in the statements of the exercises. This practice prevents a reader from accidentally seeing a hint; only when stuck need the reader turn to the hints at the end. We believe that the exercises, together with our discussion of algorithms and implementation issues, will offer serious students many opportunities to carry out substantial implementation work furthering their research and development in cryptography.
Every chapter ends with annotated references for further study. We do not claim to be encyclopaedic in this respect. Instead, we mention only those references that, we feel, are directly related to the topics dealt with in the respective chapters.
As a trade-off between bulk and coverage, we had to leave many issues untouched. For example, constraints of space prevented us from presenting symmetric-key cryptography in detail. In view of its importance today, however, we include brief discussions of block ciphers, stream ciphers and hash functions in an appendix. We also do not discuss the formal security of public-key protocols. The issues related to provable security are, at a minimum, theoretically important in the study of cryptography, but they are entirely left out here. Only a brief discussion of the implications of complexity theory for the security of public-key protocols is included in another appendix. The Handbook of Applied Cryptography [194] by Menezes et al. can supplement this book for learning symmetric techniques, whereas the book by Delfs and Knebl [74] or those by Goldreich [113, 114] can be consulted for formal security issues.
We are indebted to everybody whose criticism, encouragement and support made this project possible. Special thanks go to Bimal Roy, Chandan Mazumdar, C. Pandurangan, Debdeep Mukhopadhyay, Dipanwita Roychowdhury, Gagan Garg, Hartmut Wiebe, H. V. Kumar Swamy, Indranil Sengupta, Kapil Paranjape, Manindra Agarwal, Palash Sarkar, Rajesh Pillai, Rana Barua, R. Balasubramanian, Sanjay Barman, Shailesh, Satrajit Ghosh, Souvik Bhattacherjee, Srihari Vavilapalli, Subhamoy Maitra, Surjyakanta Mohapatro, and Uwe Storch. This book has been tested in postgraduate courses at the Indian Institute of Science, Bangalore, and at the Indian Institute of Technology Kharagpur. We sincerely thank all our students for pointing out many errors and suggesting several improvements. We express our deep gratitude to our family members for their constant understanding and moral support. We are also indebted to our institutes for providing the wonderful intellectual climate for completing this work.
A. D.
C. E. V. M.
Any time you are stuck on a problem, introduce more notation.
—Chris Skinner [Plenary Lecture, Aug 1997, Topics in Number Theory, Penn State]
| General | |
| |a| | absolute value of real number a |
| min S | minimum of elements of set S |
| max S | maximum of elements of set S |
| exp(a) | eᵃ, where e = 2.71828 . . . is the base of natural logarithms |
| log x | logarithm of x with respect to some unspecified base (like 10) |
| ln x | logₑ x, the natural logarithm of x |
| lg x | log2 x |
| logᵏ x | (log x)ᵏ (similarly, lnᵏ x = (ln x)ᵏ and lgᵏ x = (lg x)ᵏ) |
| := | is defined as (or “is assigned the value” in code snippets) |
| i | the imaginary unit √−1 |
| z̄ | complex conjugate (x − iy) of the complex number z = x + iy |
| δᵢⱼ | Kronecker delta |
| (aₛ aₛ₋₁ . . . a₀)_b | b-ary representation of a non-negative integer |
| C(n, r) | binomial coefficient (“n choose r”), equals n(n − 1) ··· (n − r + 1)/r! |
| ⌊x⌋ | floor of real number x |
| ⌈x⌉ | ceiling of real number x |
| [a, b] | closed interval, that is, the set of real numbers x in the range a ≤ x ≤ b |
| (a, b) | open interval, that is, the set of real numbers x in the range a < x < b |
| L(t, α, c) | expression of the form exp((c + o(1)) (ln t)^α (ln ln t)^(1−α)) |
| Lₜ[c] | abbreviation for L(t, 1/2, c) (denoted also as L[c] if t is understood) |
| Bit-wise operations (on bit strings a, b) | |
| NAND | negation of AND |
| NOR | negation of OR |
| XOR | exclusive OR |
| a ⊕ b | bit-wise exclusive OR (XOR) of a and b |
| a AND b | bit-wise AND of a and b |
| a OR b | bit-wise inclusive OR of a and b |
| LSk(a) | left shift of a by k bits |
| RSk(a) | right shift of a by k bits |
| LRk(a) | left rotate (cyclic left shift) of a by k bits |
| RRk(a) | right rotate (cyclic right shift) of a by k bits |
| ā | bit-wise complement of a |
| a ‖ b | concatenation of a and b |
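For concreteness, the shift and rotate operations above can be sketched in Python. The function names mirror the notation, and the word size w is made an explicit parameter; the default width of 32 bits is our assumption, since the notation itself leaves the width implicit:

```python
def LS(a, k, w=32):
    """Left shift of a by k bits, truncated to a w-bit word."""
    return (a << k) & ((1 << w) - 1)

def RS(a, k, w=32):
    """Right shift of a by k bits."""
    return a >> k

def LR(a, k, w=32):
    """Left rotate (cyclic left shift) of a by k bits in a w-bit word."""
    k %= w
    return ((a << k) | (a >> (w - k))) & ((1 << w) - 1)

def RR(a, k, w=32):
    """Right rotate (cyclic right shift) of a by k bits in a w-bit word."""
    return LR(a, w - k % w, w)
```

Unlike a plain shift, a rotate re-introduces the bits that fall off one end at the other end, which is why the word size must be fixed in advance.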
| Sets | |
| ∅ | empty set |
| #A | cardinality of set A |
| a ∈ A | a is an element of set A |
| A ⊆ B | set A is contained in set B |
| A ⊈ B | set A is not contained in set B |
| A ⊊ B | set A is properly contained in set B |
| A ∪ B | union of sets A and B |
| A ⊎ B | disjoint union of sets A and B |
| A ∩ B | intersection of sets A and B |
| A \ B | difference of sets A and B |
| Ā | complement of set A (in a bigger set) |
| A × B | (Cartesian) product of sets A and B |
| ℕ | set of all natural numbers, that is, {1, 2, 3, . . .} |
| ℕ₀ | set of all non-negative integers, that is, {0, 1, 2, . . .} |
| ℤ | set of all integers, that is, {. . . , −2, −1, 0, 1, 2, . . .} |
| ℙ | set of all (positive) prime numbers, that is, {2, 3, 5, 7, . . .} |
| ℚ | set of all rational numbers, that is, {a/b | a, b ∈ ℤ, b ≠ 0} |
| ℚ* | set of all non-zero rational numbers |
| ℝ | set of all real numbers |
| ℝ* | set of all non-zero real numbers |
| ℝ≥0 | set of all non-negative real numbers |
| ℂ | set of all complex numbers |
| ℂ* | set of all non-zero complex numbers |
| ℤₙ | ring ℤ/nℤ of integers modulo n, can be represented by the set {0, 1, . . . , n − 1} |
| ℤₙ* | group of units in ℤₙ, can be represented as {a | 0 ≤ a < n, gcd(a, n) = 1} |
| 𝔽_q | finite field of cardinality q |
| 𝔽_q* | multiplicative group of 𝔽_q, that is, 𝔽_q \ {0} |
| 𝒪_K | ring of integers of number field K |
| 𝒪_K* | group of units of 𝒪_K |
| ℤₚ | ring of p-adic integers |
| ℚₚ | field of p-adic numbers |
| Uₚ | group of units of ℤₚ |
| Functions and relations | |
| f : A → B | f is a function from set A to set B |
| f : A ↪ B | f is an injective function from set A to set B |
| f : A ↠ B | f is a surjective function from set A to set B |
| a ↦ b | a is mapped to b (by a function) |
| f ∘ g | composition of functions f and g (applied from right to left) |
| f⁻¹ | inverse of bijective function f |
| Ker f | kernel of function (homomorphism) f |
| Im f | image of function f |
| ~ | equivalent to |
| [a] | equivalence class of a |
| Groups | |
| aH | coset in a multiplicative group |
| a + H | coset in an additive group |
| HK | internal direct product of (sub)groups H and K |
| H × K | external direct product of (sub)groups H and K |
| [G : H] | index of subgroup H in group G |
| G/H | quotient group |
| G1 ≅ G2 | groups G1 and G2 are isomorphic |
| ord G | order (that is, cardinality) of group G |
| ordG a | order of element a in group G |
| Exp G | exponent of group G |
| Z(G) | centre of group G |
| C(a) | centralizer of group element a |
| GLn(K) | general linear group over field K (of n × n matrices) |
| SLn(K) | special linear group over field K (of n × n matrices) |
| Gtors | torsion subgroup of G |
| Rings | |
| char A | characteristic of ring A |
| A × B | direct product of rings A and B |
| A* | multiplicative group of units of ring A |
| 〈S〉 | for ring A, ideal generated by S ⊆ A |
| 〈a〉 | for ring A, principal ideal generated by a ∈ A, also written as aA and Aa |
| a ≡ b (mod 𝔞) | a is congruent to b modulo ideal 𝔞, that is, a − b ∈ 𝔞 |
| A ≅ B | rings A and B are isomorphic |
| A/𝔞 | quotient ring (modulo ideal 𝔞) |
| a|b | a divides b (in some ring) |
| vₚ(a) | multiplicity of prime p in element a |
| pᵏ‖a | k = vₚ(a) |
| 𝔑(A) | nilradical of ring A |
| A_red | reduction of ring A, equals A/𝔑(A) |
| gcd(a, b) | greatest common divisor of elements a and b |
| lcm(a, b) | least common multiple of elements a and b |
| 𝔞 + 𝔟 | sum of ideals 𝔞 and 𝔟 |
| 𝔞 ∩ 𝔟 | intersection of ideals 𝔞 and 𝔟 |
| 𝔞𝔟 | product of ideals 𝔞 and 𝔟 |
| √𝔞 | root (or radical) of ideal 𝔞 |
| Q(A) | total quotient ring of ring A (quotient field of A, if A is an integral domain) |
| S⁻¹A | localization of ring A at multiplicative set S |
| A_𝔭 | localization of ring A at prime ideal 𝔭 |
| 𝒪_K | ring of integers of number field K |
| N(𝔞) | norm of ideal 𝔞 (in a Dedekind domain) |
| CRT | Chinese remainder theorem |
| ED | Euclidean domain |
| DD | Dedekind domain |
| DVD (or DVR) | discrete valuation domain (or ring) |
| PID | principal ideal domain |
| UFD | unique factorization domain |
| Fields | |
| char K | characteristic of field K |
| K* | multiplicative group of units of field K, that is, K \ {0} |
| K̄ | algebraic closure of field K |
| [K : F] | degree of the field extension F ⊆ K |
| K[a] | {f(a) | f(X) ∈ K[X]} |
| K(a) | {f(a)/g(a) | f(X), g(X) ∈ K[X], g(a) ≠ 0} |
| Aut K | group of automorphisms of field K |
| AutF K | for field extension F ⊆ K, group of F-automorphisms of K (also Gal(K|F)) |
| FixF H | for field extension F ⊆ K, fixed field of subgroup H of AutF K |
| 𝔽_q | finite field of cardinality q |
| 𝔽_q* | multiplicative group of units of 𝔽_q, that is, 𝔽_q \ {0} |
| Tr | trace function |
| TrK|F (a) | for field extension F ⊆ K, trace of a ∈ K over F |
| N | norm function |
| NK|F (a) | for field extension F ⊆ K, norm of a ∈ K over F |
| φ | Frobenius automorphism of an extension of 𝔽_q, a ↦ a^q |
| 𝒪_K | ring of integers of number field K |
| 𝒪_K* | group of units of 𝒪_K |
| ΔK | discriminant of number field K |
| ℤₚ | ring of p-adic integers |
| ℚₚ | field of p-adic numbers |
| Uₚ | group of units of ℤₚ |
| | |ₚ | p-adic norm on ℚₚ |
| Integers | |
| a quot b | quotient of Euclidean division of a by b ≠ 0 |
| a rem b | remainder of Euclidean division of a by b ≠ 0 |
| a|b | a divides b in ℤ, that is, b = ca for some c ∈ ℤ |
| vp(a) | multiplicity of prime p in non-zero integer a |
| gcd(a, b) | greatest common divisor of integers a and b (not both zero) |
| lcm(a, b) | least common multiple of integers a and b |
| a ≡ b (mod n) | a is congruent to b modulo n |
| a–1 (mod n) | multiplicative inverse of a modulo n (given that gcd(a, n) = 1) |
| φ(n) | Euler’s totient function |
| (a/n) | Legendre (or Jacobi) symbol |
| [a]ₙ | coset a + nℤ of a modulo n |
| ordn a | multiplicative order of a modulo n (given that gcd(a, n) = 1) |
| μ(n) | Möbius function |
| π(x) | number of primes between 1 and positive real number x |
| Li(x) | Gauss’ Li function |
| ψ(x, y) | fraction of positive integers ≤ x that are y-smooth |
| ζ(s) | Riemann zeta function |
| RH | Riemann hypothesis |
| ERH | extended Riemann hypothesis |
| Mₙ | 2ⁿ − 1 (Mersenne number) |
| ℜ | 2³², standard radix for representation of multiple-precision integers |
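Most of the integer operations above map directly onto built-in operations of common programming languages. As an illustration (a Python sketch, not taken from the text), Euclidean division, gcd, lcm and the multiplicative inverse modulo n can be exercised as follows:

```python
from math import gcd

a, b = 17, 5
q, r = a // b, a % b           # a quot b and a rem b
assert a == q * b + r          # defining property of Euclidean division

assert gcd(12, 18) == 6        # gcd(a, b)
lcm = 12 * 18 // gcd(12, 18)   # lcm(a, b) via gcd

# multiplicative inverse of a modulo n (requires gcd(a, n) = 1);
# Python 3.8+ computes it as pow(a, -1, n)
n = 26
inv = pow(17, -1, n)
assert (17 * inv) % n == 1
```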
| Polynomials | |
| A[X1, . . . , Xn] | polynomial ring in indeterminates X1, . . . , Xn over ring A |
| A(X1, . . . , Xn) | ring of rational functions in indeterminates X1, . . . , Xn over ring A |
| deg f | degree of polynomial f |
| lc f | leading coefficient of polynomial f |
| minpolyα,K(X) | minimal polynomial of α over field K, belongs to K[X] |
| cont f | content of polynomial f |
| pp f | primitive part of polynomial f |
| f′(X) | formal derivative of polynomial f(X) |
| Δ(f) | discriminant of polynomial f |
![]() | the polynomial ![]() |
| μₘ | group of m-th roots of unity |
| Φₘ | m-th cyclotomic polynomial |
| Vector spaces, modules and matrices | |
| dimK V | dimension of vector space V over field K |
| Span S | span of subset S of a vector space |
| HomK(V, W) | set of all K-linear transformations V → W |
| EndK(V) | set of all K-linear transformations V → V |
| M/N | quotient vector space or module |
| M ≅ N | vector spaces or modules M and N are isomorphic |
| ∏ᵢ Mᵢ | direct product of modules Mᵢ, i ∈ I |
| ⊕ᵢ Mᵢ | direct sum of modules Mᵢ, i ∈ I |
| Aᵗ | transpose of matrix (or vector) A |
| A⁻¹ | inverse of matrix A |
| Rank T | rank of matrix or linear transformation T |
| RankA M | rank of A-module M |
| Null T | nullity of matrix or linear transformation T |
| (M : N) | for A-module M and submodule N, the ideal {a ∈ A | aM ⊆ N} of A |
| AnnA(M) | annihilator of A-module M, same as (M : 0) |
| Tors M | torsion submodule of M |
| A[S] | A-algebra generated by set S |
| 〈v, w〉 | inner product of two real vectors v and w |
| Algebraic curves | |
| 𝔸ⁿ(K) | n-dimensional affine space over field K |
| ℙⁿ(K) | n-dimensional projective space over field K |
| (x1, . . . , xn) | affine coordinates of a point in 𝔸ⁿ(K) |
| [x0, x1, . . . , xn] | projective (homogeneous) coordinates of a point in ℙⁿ(K) |
| f(h) | homogenization of polynomial f |
| C(K) | set of K-rational points over curve C defined over field K |
| K[C] | ring of polynomial functions on curve C defined over K |
| K(C) | field of rational functions on curve C defined over K |
| [P] | point P on a curve in formal sums |
| ordP (r) | order of rational function r at point P |
| DivK (C) | group of divisors on curve C defined over field K |
| Div⁰K(C) | group of divisors of degree 0 on curve C defined over field K |
| DivK(r) | divisor of a rational function r |
| PrinK(C) | group of principal divisors on curve C defined over field K |
| JK(C) | Jacobian of curve C defined over field K |
| PicK(C) | Picard group of curve C (equals DivK(C)/ PrinK(C)) |
| Pic⁰K(C) | group of degree-0 divisor classes of C, same as Jacobian JK(C) |
| ∞ | point at infinity on an elliptic or a hyperelliptic curve |
| Δ(E) | discriminant of elliptic curve E |
| j(E) | j-invariant of elliptic curve E |
| E(K) | group of points on elliptic curve E defined over field K |
| P + Q | sum of two points P, Q on an elliptic curve |
| mP | m-th multiple (that is, m-fold sum) of point P |
| ψₘ, fₘ | m-th division polynomials |
| t | trace of Frobenius of elliptic curve E |
| EK[m] | group of m-torsion points in E(K) |
| E[m] | abbreviation for EK̄[m], the group of m-torsion points over the algebraic closure K̄ |
| em | Weil pairing (a map E[m] × E[m] → μm) |
| Div(a, b) | representation of reduced divisor on hyperelliptic curve by polynomials a, b |
| Probability and statistics | |
| Pr(E) | probability of event E |
| Pr(E1|E2) | conditional probability of event E1 given event E2 |
| E(X) | expectation of random variable X |
| Var(X) | variance of random variable X |
| σX | standard deviation of random variable X (equals √Var(X)) |
| Cov(X, Y) | covariance of random variables X, Y |
| ρX,Y | correlation coefficient of random variables X, Y |
| Computational complexity | |
| f = O(g) | big-Oh notation: f is of the order of g |
| f = Ω(g) | big-Omega notation: g is of the order of f |
| f = Θ(g) | big-Theta notation: f and g have the same order |
| f = o(g) | small-oh notation: f is of strictly smaller order than g |
| f = ω(g) | small-omega notation: f is of strictly larger order than g |
| f = Õ(g) | soft-Oh notation: f = O(g logᵏ g) for real constant k ≥ 0 |
| P₁ ≤ₚ P₂ | problem P₁ is polynomial-time reducible to problem P₂ |
| P₁ ≅ P₂ | problems P₁ and P₂ are polynomial-time equivalent |
| Intractable problems | |
| CVP | closest vector problem |
| DHP | (finite field) Diffie–Hellman problem |
| DLP | (finite field) discrete logarithm problem |
| ECDHP | elliptic curve Diffie–Hellman problem |
| ECDLP | elliptic curve discrete logarithm problem |
| HECDHP | hyperelliptic curve Diffie–Hellman problem |
| HECDLP | hyperelliptic curve discrete logarithm problem |
| GIFP | general integer factorization problem |
| IFP | integer factorization problem |
| QRP | quadratic residuosity problem |
| RSAIFP | RSA integer factorization problem |
| RSAKIP | RSA key inversion problem |
| RSAP | RSA problem |
| SQRTP | modular square root problem |
| SSP | subset sum problem |
| SVP | shortest vector problem |
| Algorithms | |
| ADH | Adleman, DeMarrais and Huang’s algorithm |
| AES | advanced encryption standard |
| AKS | Agrawal, Kayal and Saxena’s deterministic primality test |
| BSGS | Shanks’ baby-step–giant-step method |
| CBC | cipher-block chaining mode |
| CFB | cipher feedback mode |
| CSM | cubic sieve method |
| CSPRBG | cryptographically strong pseudorandom bit generator |
| CvA | Chaum and Van Antwerpen’s undeniable signature scheme |
| DDF | distinct-degree factorization |
| DES | data encryption standard |
| DH | Diffie–Hellman key exchange |
| DPA | differential power analysis |
| DSA | digital signature algorithm |
| DSS | digital signature standard |
| ECB | electronic codebook mode |
| ECDSA | elliptic curve digital signature algorithm |
| ECM | elliptic curve method |
| E-D-E | encryption–decryption–encryption scheme of triple encryption |
| EDF | equal-degree factorization |
| EG | Eschenauer and Gligor’s scheme |
| FEAL | fast data encipherment algorithm |
| FFS | Feige, Fiat and Shamir’s zero-knowledge protocol |
| GKR | Gennaro, Krawczyk and Rabin’s RSA-based undeniable signature scheme |
| GNFSM | general number field sieve method |
| GQ | Guillou and Quisquater’s zero-knowledge protocol |
| HFE | cryptosystem based on hidden field equations |
| ICM | index calculus method |
| IDEA | international data encryption algorithm |
| KLCHKP | braid group cryptosystem |
| L³ | Lenstra–Lenstra–Lovász algorithm |
| LFSR | linear feedback shift register |
| LSM | linear sieve method |
| LUC | cryptosystem based on Lucas sequences |
| MOV | Menezes, Okamoto and Vanstone’s reduction |
| MPQSM | multiple polynomial quadratic sieve method |
| MQV | Menezes–Qu–Vanstone key exchange |
| NFSM | number field sieve method |
| NR | Nyberg–Rueppel signature algorithm |
| NTRU | Hoffstein, Pipher and Silverman’s encryption algorithm |
| NTRUSign | NTRU signature algorithm |
| OAEP | optimal asymmetric encryption procedure |
| OFB | output feedback mode |
| PAP | pretty awful privacy |
| PGP | pretty good privacy |
| PH | Pohlig–Hellman method |
| PRBG | pseudorandom bit generator |
| PSS | probabilistic signature scheme |
| QSM | quadratic sieve method |
| RSA | Rivest, Shamir and Adleman’s algorithm |
| SAFER | secure and fast encryption routine |
| Satoh–FGH | point counting algorithm on elliptic curves over fields of characteristic 2 |
| SDSA | shortened digital signature algorithm |
| SEA | Schoof, Elkies and Atkin’s algorithm for point counting on elliptic curves |
| SETUP | secretly embedded trapdoor with universal protection |
| SFF | square-free factorization |
| SHA | secure hash algorithm |
| SmartASS | algorithm for computing discrete logs in anomalous elliptic curves |
| SNFSM | special number field sieve method |
| SPA | simple power analysis |
| TWINKLE | the Weizmann Institute key location engine |
| TWIRL | the Weizmann Institute relation locator |
| XCM | xedni calculus method |
| XSL | extended sparse linearization attack |
| XTR | efficient and compact subgroup trace representation |
| ZK | zero-knowledge |
| Quantum computation | |
| |ψ〉 | ket notation for vector ψ |
| 〈φ|ψ〉 | inner product of vectors |φ〉 and |ψ〉 |
| ‖ψ‖ | norm of vector |ψ〉 (equals √〈ψ|ψ〉) |
| ℋₙ | n-dimensional Hilbert space (over ℂ) |
| |0〉, |1〉, . . . , |n − 1〉 | orthonormal basis of ℋₙ |
| cbit | classical bit |
| qubit | quantum bit |
| ⊗ | tensor product of Hilbert spaces |
| F | Fourier transform |
| H | Hadamard transform |
| I | Identity transform |
| X | Exchange transform |
| Z | Z transform |
| Computational primitives | |
| ulong | 32-bit unsigned integer data type (unsigned long) |
| ullong | 64-bit unsigned integer data type (unsigned long long) |
| a := b | assignment operator (returns the value assigned) |
| +, –, ×, /, % | arithmetic operators |
| ++, – – | increment and decrement operators |
| a ◊= b | a := a ◊ b for ◊ ∈ {+, −, ×, /, %} |
| =, ≠, >, <, ≥, ≤ | comparison operators |
| 1 | True as a condition |
| if | conditional statement: if (condition)··· |
| if-else | conditional statement: if (condition)··· , else··· |
| while | while loop: while (condition)··· |
| do | do loop: do···while (condition) |
| for | for loop: for (range of values)··· |
| {···} | block of statements |
| , or . or new-line | statement terminator |
| /*··· */ | comment |
| return | return from this routine |
| Miscellaneous | |
| ∎ | end of (visible or invisible) proof |
| ⋄ | end of item (like example, definition, assumption) |
| [H] | hint available in Appendix D |
| 1.1 | Introduction |
| 1.2 | Common Cryptographic Primitives |
| 1.3 | Public-key Cryptography |
| 1.4 | Some Cryptographic Terms |
| Chapter Summary |
Aller Anfang ist schwer: All beginnings are difficult.
—German proverb
Defendit numerus: There is safety in numbers.
—Anonymous
The ability to quote is a serviceable substitute for wit.
—W. Somerset Maugham
It is rather difficult to give a precise definition of cryptography. Loosely speaking, it is the science (or art or technology) of preventing access to sensitive data by parties who are not authorized to access the data. Secure transmission of messages over a public channel is the first, simplest and oldest example of a cryptographic protocol. For assessing the security of these protocols, one studies their possible weak points, namely the strategies for breaking them. This study is commonly referred to as cryptanalysis. And, finally, the study of both cryptography and cryptanalysis is known as cryptology.
| Cryptology = Cryptography + Cryptanalysis |
The science of cryptology is rather old. It developed naturally as and when human beings felt the need for privacy and secrecy. The rapid deployment of the Internet in recent years demands that we look at this subject with renewed interest. Newer requirements tailored to Internet applications have started cropping up, and as a result newer methods, protocols and algorithms are coming up. The most startling discoveries include that of the key-exchange protocol by Diffie and Hellman in 1976 and that of the RSA cryptosystem by Rivest, Shamir and Adleman in 1978. These opened up a new branch of cryptology, namely public-key cryptology. Historically, public-key technology came earlier than the Internet, but it is the latter that makes extensive use of the former.
This book is an attempt to introduce the reader to the vast and interesting branch of public-key cryptology. One of the most distinguishing features of public-key cryptology is that it involves a fair amount of abstract mathematics, which often stands in the way of a complete understanding for the uninitiated reader. This book tries to bridge that gap: we develop the required mathematics in necessary and sufficient detail.
This chapter is an overview of the topics that the rest of the book deals with. We start with a description of the most common cryptographic protocols. Then we introduce the public-key paradigm and discuss the source of its security. We use certain mathematical terms and notations throughout this chapter. If the reader is not already familiar with these terms, there is nothing to worry about. As we have just claimed, we will introduce the mathematics in the later chapters. The exposition of this chapter is expected to give the reader an overview of the area of public-key cryptography and also the requisite motivation for learning the mathematical tools that follow.
As claimed at the outset of this chapter, it is rather difficult to give a precise definition of the term cryptography. The best way to understand it is by examples. In this section, we briefly describe the common problems that cryptography deals with.
To start with, we introduce the legendary figures of cryptography: Alice, Bob and Carol. Alice wants to send a message to Bob over a public communication channel like the Internet and wants to ensure that nobody other than Bob can make out the meaning of the message. A third party like Carol, who has access to the communication channel, can intercept the message. But the message should be wrapped or transformed before transmission in such a way that knowledge of some secret piece of information is needed to unwrap or transform back the message. It is Bob who has this information, but not Carol (nor Dorothy nor Emily nor . . .).
It is expedient to point out here that Alice, Bob and Carol need not be human beings. They can stand for organizations (like banks) or, more correctly, for computers or computer programs run by individuals or organizations. It is, therefore, customary to call them parties, entities or subjects instead of persons or characters. In the cryptology jargon, Carol goes by several interchangeable names: adversary, eavesdropper, opponent, intruder, attacker and enemy are the most common ones. When a message transmission like that just mentioned is involved, Alice is called the sender and Bob the receiver of the message.
It is a natural strategy to put the message in a box and lock the box using a key, called the encryption key. A matching decryption key is needed to unlock the box and retrieve the message. The process of putting the message in the box is commonly called encoding and that of locking the box is called encryption. The reverse processes, namely unlocking the box and taking the message out, are respectively called decryption and decoding. This is precisely the classical encryption–decryption protocol of cryptography.[1]
[1] Some people prefer to use the terms enciphering and deciphering in place of the words encryption and decryption respectively.
In the world of electronic communication, a message M is usually a bit string, and encoding, encryption, decryption and decoding are well-defined transformations of bit strings. If we denote by fe the transformation function consisting of encoding and encryption, then we get a new bit string C = fe(M, Ke), where Ke stands for the encryption key. This bit string C is sent over the communication channel. After Bob receives C, he uses the reverse transformation fd (decryption followed by decoding) to get the original message M back; that is, M = fd(C, Kd). Note that the decryption key Kd is needed as an argument to fd. If Carol does not know this, she cannot compute M. We conventionally call M the plaintext message and C the ciphertext message.
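The interface of the transformations fe and fd can be illustrated with a deliberately weak sketch (not a secure scheme): both are realized here as a keyed XOR over bit strings, with Ke = Kd as in the symmetric setting discussed below. The key and message values are illustrative assumptions.

```python
# Toy illustration of C = fe(M, Ke) and M = fd(C, Kd).
# Both transformations are a repeating-key XOR, so Ke = Kd; this models
# the interface only and offers no real security.

def f_e(m: bytes, key: bytes) -> bytes:
    """Encrypt by XORing each message byte with the key (repeated)."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(m))

def f_d(c: bytes, key: bytes) -> bytes:
    """XOR is self-inverse, so decryption reuses the same transformation."""
    return f_e(c, key)

M = b"attack at dawn"
Ke = Kd = b"\x13\x37"
C = f_e(M, Ke)          # what Carol sees on the channel
assert C != M           # the ciphertext differs from the plaintext
assert f_d(C, Kd) == M  # Bob recovers the original message
```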
The encoding and decoding operations do not make use of keys and can be performed by anybody. (It should not be difficult to put a letter in or take a letter out of an unlocked box!) One might then wonder why it is necessary to do these transformations instead of applying the encryption and decryption operations directly on M and C respectively. With whatever we have discussed so far, we cannot give a full answer to this question. For the answer, we will need to wait until we reach the later chapters. We only mention here that the encryption algorithms often require as input some mathematical entities (like integers or elements of a field) which are logically not bit strings. But that’s not all! As we see later, the additional transformations often add to the security of the protocols. On the other hand, for a general discussion, it is often unnecessary to start from the encoding process and end at the decoding process. As a result, we will assume, unless otherwise stated, that M is the input to the encryption routine and the output of the decryption routine, in which case fe and fd stand for the encryption and decryption functions only.
In the simplest form of locking mechanism, one has Ke = Kd. That is, the same key, called the symmetric key or the secret key, is used for both encryption and decryption. Common examples of such symmetric-key algorithms include DES (Data Encryption Standard) together with its various modifications like the Triple DES and DES-X, IDEA (International Data Encryption Algorithm), SAFER (Secure And Fast Encryption Routine), FEAL (Fast Encryption Algorithm), Blowfish, RC5 and AES (Advanced Encryption Standard). We will not describe all these algorithms in this book. Interested readers can look at the abundant literature to know more about them.
The biggest disadvantage of a secret-key system is that Alice and Bob must agree upon the key Ke = Kd secretly, for example by personal contact or over a secure channel. This is a serious limitation, often neither practical nor even possible. Another drawback of secret-key systems is that every pair of parties needs a separate key for communication. Thus, if there are n entities communicating over a network, the total number of keys is of the order of n^2, and each entity has to remember O(n) keys for communicating with the other entities. In practice, an entity does not communicate with every other entity on the network, yet the total number of keys it must remember can still be quite high.
Both these problems can be avoided by using what is called an asymmetric-key or a public-key protocol. In such a protocol, each entity decides a key pair (Ke, Kd), makes the encryption key Ke public and keeps the decryption key Kd secret. Ke is also called the public key and Kd the private key. Anybody who wants to send a message to Bob gets Bob’s public key, encrypts the message with the key, and sends the ciphertext to Bob. Upon receiving the ciphertext, Bob uses his private key to decrypt the message. One may view such a lock as a self-locking padlock. Anybody can lock a box with a self-locking padlock, but opening it requires a key which only Bob possesses.
The security of such a system rests on the difficulty of computing the private key Kd from the public key Ke. It is apparent that Ke and Kd are, in a sense, inverses of each other, because the former is used to generate C from M and the latter to recover M from C. This is where mathematics comes into the picture. We mention a few possible constructions of key pairs in the next section; the rest of the book deals with an in-depth study of these public-key protocols.
Attractive as they look, public-key protocols have a serious drawback: they are orders of magnitude slower than their secret-key counterparts. This is a concern if huge amounts of data need to be encrypted and decrypted. The shortcoming can be overcome by using secret-key and public-key protocols in tandem as follows: Alice generates a secret key (say, for AES), encrypts the message with the secret key and the secret key with Bob's public key, and sends both the encrypted message and the encrypted secret key. Bob first decrypts the encrypted secret key using his private key and then uses the recovered secret key to decrypt the message. Since secret keys are usually short bit strings (most commonly 128 bits long), the slow performance of the public-key algorithms causes little trouble. At the same time, Alice and Bob are relieved of having a prior secret meeting or communication to agree on the secret key. Moreover, neither Alice nor Bob needs to remember the secret key: during every session of message transmission, a random secret key can be generated and destroyed when the communication is over.
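The hybrid message flow can be sketched as follows. A repeating-key XOR stands in for the fast symmetric cipher and a toy RSA pair (with primes far too small for real use) stands in for Bob's public-key system; all parameters here are illustrative assumptions.

```python
import secrets

# Hybrid encryption sketch: a toy XOR "cipher" plays the role of AES, and a
# toy RSA key wraps the session key.  Only the message flow is meaningful.

p, q = 1009, 1013                 # Bob's toy RSA primes (normally enormous)
n, phi = p * q, (p - 1) * (q - 1)
e = 17                            # Bob's public key is (n, e)
d = pow(e, -1, phi)               # Bob's private key

def xor_cipher(data: bytes, key: bytes) -> bytes:
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

# --- Alice's side ---
session_key = secrets.randbelow(n - 2) + 2            # fresh random secret key
ciphertext = xor_cipher(b"meet me at noon", session_key.to_bytes(3, "big"))
wrapped_key = pow(session_key, e, n)                  # key encrypted with Bob's public key

# --- Bob's side ---
recovered_key = pow(wrapped_key, d, n)                # unwrap with the private key
plaintext = xor_cipher(ciphertext, recovered_key.to_bytes(3, "big"))
assert plaintext == b"meet me at noon"
```

Note that the (slow) public-key operation touches only the short session key, while the bulk message goes through the fast symmetric transformation.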
There is an alternative method by which Alice and Bob can exchange secret information (like AES keys) over a public communication channel. Let us first see how this can be done in the physical lock-and-key scenario. Alice generates a secret, puts it in a box, locks the box with her own key and sends it to Bob. Bob, upon receiving the locked box, adds a second lock to it and sends the doubly locked box back to Alice. Alice then removes her lock and again sends the box to Bob. Finally, Bob uses his key to unlock the box and retrieve the secret. A third party (Carol) that can access the box during the three communications finds it locked by Alice or Bob or both. Since Carol does not possess the keys to these locks, she cannot open the box to discover the secret.
This process can be abstractly described as follows: Alice and Bob first independently generate key pairs (AKe, AKd) and (BKe, BKd) respectively. Alice then sends AKe to Bob and Bob sends BKe to Alice. The private keys AKd and BKd are not disclosed. They also agree upon a function g with which Alice computes gA = g(AKd, BKe) and Bob computes gB = g(BKd, AKe). If gA = gB, then this common value can be used as a shared secret between Alice and Bob.
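One concrete instantiation of the abstract function g is the Diffie–Hellman key exchange over a multiplicative group of integers modulo a prime. The sketch below uses a toy Mersenne-prime modulus, and g = 3 is assumed as a generator purely for illustration; real deployments use carefully chosen, much larger parameters.

```python
import secrets

# Diffie-Hellman instantiation of the abstract key agreement:
# the public key is g^d mod p, and g(AKd, BKe) = BKe^AKd mod p.

p = 2**127 - 1                            # toy modulus (a Mersenne prime)
g = 3                                     # assumed generator, for illustration

a_priv = secrets.randbelow(p - 2) + 2     # Alice's private key AKd
b_priv = secrets.randbelow(p - 2) + 2     # Bob's private key BKd
a_pub = pow(g, a_priv, p)                 # AKe, sent openly to Bob
b_pub = pow(g, b_priv, p)                 # BKe, sent openly to Alice

g_A = pow(b_pub, a_priv, p)               # Alice computes g(AKd, BKe)
g_B = pow(a_pub, b_priv, p)               # Bob computes g(BKd, AKe)
assert g_A == g_B                         # both equal g^(a_priv*b_priv) mod p
```

Carol sees p, g, a_pub and b_pub, but recovering either private exponent from them is the discrete logarithm problem discussed later in this chapter.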
Our intruder Carol knows g and taps the values of AKe and BKe. So the function g should be such that a knowledge of these values alone does not suffice for the computation of gA = gB. One of the private keys AKd or BKd is needed for the computation. Since (AKe, AKd) and (BKe, BKd) are key pairs, it is assumed that private keys are difficult to compute from the knowledge of the corresponding public keys.
Such a technique of exchanging secret values over an insecure channel is called a key-exchange or a key-agreement protocol. It is important to point out here that such a protocol is usually based on the public-key paradigm; that is to say, we do not know secret-key counterparts for a key-exchange protocol. Since a shared secret between the communicating parties is usually short, the low speed of public-key algorithms is really not a concern in this case.
A digital signature is yet another application of the public-key paradigm. Suppose Alice wants to sign a message M in such a way that the signature S can be verified by anybody but nobody other than Alice would be able to generate the signature S on the message M. This can be achieved as follows: Alice generates a key pair (Ke, Kd), makes Ke public and keeps Kd secret. She now uses the decryption function fd to generate the signature, that is, S = fd(M, Kd). The signature S is then made public. Anybody who has access to Alice’s public key Ke applies the reverse transformation fe to get back the message M = fe(S, Ke).
If Carol signs the message M with a different key K′d, then she generates the signature S′ = fd(M, K′d). Now, since K′d and Ke are not matching keys, verification using Ke gives M′ = fe(S′, Ke), which is different from M. If we assume that M is a message written in a human-readable language (like English), then M′ would generally look like a meaningless sequence of characters which is neither English nor any sensible string to a human reader. So the signature verifier would then immediately conclude that this is a case of forged signature.
Such a scheme of generating digital signatures is called a signature scheme with message recovery. It is obvious that this is the same as our encrypt–decrypt scheme with the sequence of encryption and decryption steps reversed. If the message M to be signed is quite long, using this algorithm calls for a large execution time both for signature generation and for verification. It is, therefore, customary to use another variant of signature schemes called signature schemes with appendix that we describe now.
Instead of applying the decryption transform directly on M, Alice first computes a short representative H(M) of her message M. Her signature now becomes the pair S = (M, σ), where σ = fd(H(M), Kd). Typically, a hash function (see Section 1.2.6) is used to compute the representative H(M) from M and is assumed to be public knowledge. Now anybody can verify the signature by checking whether the equality H(M) = fe(σ, Ke) holds. If a key different from Kd is used to generate the signature, one would (in general) get a value σ′ ≠ σ, and the forgery will be detected by observing that H(M) ≠ fe(σ′, Ke).
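A signature scheme with appendix can be sketched with a toy RSA key pair standing in for (Ke, Kd) and SHA-256 standing in for H; the primes and the message are illustrative assumptions only.

```python
import hashlib

# Toy RSA signature with appendix: sigma = H(M)^d mod n, verified by
# checking H(M) == sigma^e mod n.  The primes are far too small for real use.

p, q = 10007, 10009
n, phi = p * q, (p - 1) * (q - 1)
e = 65537                            # public verification key (with n)
d = pow(e, -1, phi)                  # private signing key

def H(message: bytes) -> int:
    """Short representative of M: a SHA-256 digest reduced modulo n."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % n

M = b"I owe Bob ten dollars"
sigma = pow(H(M), d, n)              # Alice signs with her private key
assert H(M) == pow(sigma, e, n)      # anyone verifies with her public key
```

A signature produced with any key other than d fails this check except with negligible probability, since it would have to hit the same value H(M) modulo n.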
By entity authentication, we mean a process in which one entity called the claimant proves its identity to another entity called the verifier. Entity-authentication techniques thus aim to prevent impersonation of an entity by an intruder. Both secret-key and public-key techniques are used in entity-authentication schemes.
The simplest example of an entity-authentication scheme is the use of passwords, as when a user (the claimant) tries to gain access to some resources in a computer (the verifier) by proving its identity using a password. Password schemes are mostly based on secret-key techniques. For example, the UNIX password system is based on encrypting the zero message (a string of 64 zero bits) using a repeated application of a variant of the DES algorithm with 64 bits of the user input (the password) as the key. Password-based authentication schemes are fixed and time-invariant and are often called weak authentication schemes.
We see applications of public-key techniques in challenge–response authentication schemes (also called strong authentication schemes). Assume that an entity, Alice, wants to prove her identity to another entity, Bob. Alice generates a key pair (Ke, Kd), makes Ke public and keeps Kd secret. Now, Bob chooses a random message M, encrypts M using Alice’s public key—that is, computes C = fe(M, Ke)—and sends C to Alice. Alice, upon reception of C, decrypts it using her private key Kd; that is, she regenerates M = fd(C, Kd) and sends M to Bob. Bob compares this value of M with the one he generated, and if a match occurs, Bob becomes sure that the entity who is claiming to be Alice possesses the knowledge of Alice’s private key. If Carol uses any private key other than Kd for the decryption, she gets a message M′ different from M and thereby cannot prove to Bob her identity as Alice. This is how this scheme prevents impersonation of Alice by Carol.
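The challenge–response exchange can be sketched with the encryption–decryption pair instantiated by a toy RSA key; the primes are illustrative assumptions and would be enormous in practice.

```python
import secrets

# Challenge-response sketch: Bob encrypts a random challenge with Alice's
# public key; only the holder of the matching private key can invert it.

p, q = 10007, 10009                  # Alice's toy RSA primes
n = p * q
e = 65537                            # Alice's public key (n, e)
d = pow(e, -1, (p - 1) * (q - 1))    # Alice's private key

# --- Bob's side: generate and send the challenge ---
M = secrets.randbelow(n - 2) + 2     # random message kept by Bob
C = pow(M, e, n)                     # challenge C = fe(M, Ke) sent to Alice

# --- Alice's side: respond using the private key ---
response = pow(C, d, n)              # M = fd(C, Kd)

# --- Bob's side: compare with the original ---
assert response == M                 # Bob accepts: the claimant knows Kd
```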
Entity authentication is often carried out using another interesting technique called zero-knowledge proof. In such a protocol, the verifier (or any third party listening to the conversation) gains no knowledge of the secret possessed by the claimant, but develops the desired confidence that the claimant indeed possesses the secret. We provide here an informal example explaining zero-knowledge proofs.
Let us think of a circular cave as shown in Figure 1.1. The cave has two exits, left and right, denoted by L and R respectively. The cave also has a door inside it, which is invisible from outside the cave. Alice (A) wants to prove to Bob (B) that she possesses a key to this door without showing him the key or the process of unlocking the door with the key. Bob stations himself somewhere outside the exits of the cave. Alice enters the cave and randomly chooses the left or right wing of the cave (and goes there). She does not disclose this choice to Bob, because Bob is not allowed to know the session secrets either. Once Alice is placed in the cave, Bob makes a random choice from L and R and asks Alice (using cell phones or by shouting loudly) to come out of the cave via that chosen exit. Suppose Bob challenges Alice to use L. If Alice is in the left wing, she can come out of the cave using L. If Alice is in the right wing, she must use her secret key to open the central door to come to the left wing and then go out using exit L. If Alice does not possess the secret key, she can succeed in obeying Bob's directive only with probability 1/2. If this procedure is repeated t times, then the probability that Alice succeeds on all occasions without possessing the secret key is (1/2)^t = 1/2^t. By choosing t appropriately, Bob can make the probability of accepting a false claim arbitrarily small. For example, if t = 20, then the chance is less than one in a million that Alice can establish a false claim.
[Figure 1.1: A circular cave with two exits, L and R, and a locked door inside that is invisible from outside the cave.]
Thus, if Alice succeeds every time, Bob gains the desired confidence that Alice actually possesses the secret. However, during this entire process, Bob can obtain no information regarding Alice’s secrets (the key and the choices of wings). Another important aspect of this interaction is that Alice has no way of predicting Bob’s questions, preventing impostors (of Alice) from fooling Bob.
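The probability argument above is easy to check by simulation. The sketch models a cheating claimant without the key: she escapes a round only when her randomly chosen wing happens to match Bob's random challenge, so she survives t rounds with probability (1/2)^t.

```python
import random

# Monte-Carlo check of the cave protocol: a cheater survives a round only
# when her initial wing matches Bob's challenge, i.e. with probability 1/2.

def cheater_survives(t: int) -> bool:
    """Simulate t rounds for a claimant who does not possess the key."""
    return all(random.choice("LR") == random.choice("LR") for _ in range(t))

trials, t = 100_000, 10
observed = sum(cheater_survives(t) for _ in range(trials)) / trials
# observed should be close to 2**-10, i.e. roughly 0.001
assert abs(observed - 2**-t) < 0.01
```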
Suppose that a secret piece of information is to be distributed among n entities in such a way that n – 1 (or fewer) entities are unable to reconstruct the secret. All of the n entities must participate to reveal the secret. As usual, let us assume that the secret is an l-bit string. A simple strategy would be to break the string into n parts and provide each entity with a part. This method is, however, not really attractive, because it gives away partial information about the secret. Thus, for example, if a 256-bit string is to be distributed equally among 16 entities, any 15 of them working together can reconstruct the secret by trying only 2^16 = 65536 possibilities for the unknown 16 bits.
We now describe an alternative strategy that does not suffer from this drawback. Once again, we break the secret string into n parts and consider the parts as integers a_0, . . . , a_{n–1}. We construct the polynomial f(x) = x^n + a_{n–1}x^{n–1} + · · · + a_1x + a_0 and give the integers f(1), f(2), . . . , f(n) to the entities. When all of the entities cooperate, the linear system of equations f(i) = i^n + a_{n–1}i^{n–1} + · · · + a_1i + a_0, 1 ≤ i ≤ n, can be solved to find the unknown coefficients a_0, . . . , a_{n–1} which, in turn, reveal the secret. On the other hand, if n – 1 or fewer entities cooperate, they get an underdetermined system of equations in n unknowns, from which the actual solution is not readily available.
The secret-sharing problem can be generalized in the following way: distribute a secret among n parties in such a way that any m or more of the parties can reconstruct the secret (for some m ≤ n), whereas any m – 1 or fewer parties cannot. A polynomial of degree m as in the above example readily adapts to this generalized situation.
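The share-and-reconstruct steps above can be sketched directly. This toy version works over the rationals for brevity (real schemes work modulo a prime to avoid leaking information through share sizes); the example secret is an arbitrary assumption.

```python
from fractions import Fraction

# Sketch of the monic-polynomial sharing scheme: the secret is the
# coefficient list (a_0, ..., a_{n-1}) of f(x) = x^n + a_{n-1}x^{n-1}
# + ... + a_0, and entity i receives the share f(i).

def make_shares(coeffs):
    n = len(coeffs)
    def f(x):
        return x**n + sum(c * x**k for k, c in enumerate(coeffs))
    return [(i, f(i)) for i in range(1, n + 1)]

def reconstruct(shares):
    """Solve f(i) - i^n = sum_k a_k i^k by Gauss-Jordan elimination."""
    n = len(shares)
    # augmented rows: [i^0, i^1, ..., i^(n-1) | f(i) - i^n]
    rows = [[Fraction(i)**k for k in range(n)] + [Fraction(v - i**n)]
            for i, v in shares]
    for col in range(n):
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        rows[col] = [x / rows[col][col] for x in rows[col]]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return [int(row[n]) for row in rows]          # recovered a_0, ..., a_{n-1}

secret = [42, 17, 99]                             # a_0, a_1, a_2 (illustrative)
assert reconstruct(make_shares(secret)) == secret
```

With fewer than n shares the Vandermonde system above has infinitely many solutions, which is exactly why the cooperating minority learns nothing definite.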
A function which converts bit strings of arbitrary lengths to bit strings of a fixed (finite) length is called a hash function. Hash functions play a crucial role in cryptography. We have already seen one application, in the design of a digital signature scheme with appendix. If H is a hash function, a pair of distinct input strings x1 and x2 for which H(x1) = H(x2) is called a collision for H. For any hash function H, collisions must exist, since H is a map from an infinite set to a finite set. For cryptographic purposes, however, collisions should be difficult to obtain. More specifically, a cryptographic hash function H should satisfy the following desirable properties:
Except for a small set of hash values y, it should be difficult to find an input x with H(x) = y. We exclude a small set of values, because an adversary might prepare (and maintain) a list of pairs (x, H(x)) for certain values of x of her choice. If the given value of y is the second coordinate of one pair in her list, she can produce the corresponding input value x easily.
Given a pair (x, H(x)), it should be difficult to find an input x′ different from x with H(x) = H(x′).
It should be difficult to find two different input strings x, x′ with H(x) = H(x′).
The output of a hash function is also called a message digest, and a hash function may be unkeyed or may use a secret key. Popular examples of unkeyed hash functions are SHA-1, MD5 and MD2, whereas keyed hash functions include HMAC and CBC-MAC.
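The difficulty of finding collisions depends critically on the output length. The following sketch deliberately truncates SHA-256 to a hypothetical 16-bit "hash", for which a brute-force search finds a collision almost immediately; for the full 256-bit output the same search is utterly infeasible.

```python
import hashlib

# With only 16 output bits there are just 2^16 = 65536 possible hash values,
# so searching through more inputs than that is guaranteed to collide.

def tiny_hash(x: bytes) -> bytes:
    return hashlib.sha256(x).digest()[:2]        # toy 16-bit "hash"

seen = {}
collision = None
for i in range(100_000):
    m = str(i).encode()
    h = tiny_hash(m)
    if h in seen and seen[h] != m:
        collision = (seen[h], m)                 # two inputs, same hash value
        break
    seen[h] = m

assert collision is not None
m1, m2 = collision
assert m1 != m2 and tiny_hash(m1) == tiny_hash(m2)
```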
So far we have seen several protocols which are based on the use of public keys of remote entities, but have never questioned the authenticity of public keys. In other words, it is necessary to ascertain that a public key is really owned by a remote entity. Public-key certificates are used to that effect. These are data structures that bind public-key values to entities. This binding is achieved by having a trusted certification authority digitally sign each certificate.
Typically a certificate is issued for a period of validity. However, it is possible that a certificate becomes invalid before its date of expiry for several reasons, like possible or suspected compromise of the private key. Under such circumstances it is necessary that the certification authority revokes the certificate and maintains a list called certificate revocation list (CRL) of revoked certificates. When Alice verifies the authenticity of Bob’s public-key certificate by verifying the digital signature of the authority and does not find the certificate in the CRL, she gains the desired confidence in using Bob’s public key.
The X.509 public-key infrastructure specifies Internet standards for certificates and CRLs.
In this section, we give a short introduction to the realization of public-key cryptosystems. More specifically, we list some of the computationally intensive mathematical problems and describe how the (apparent) intractability of these problems can be used for designing key pairs. We use some mathematical terms that we will introduce later in this book.
The security of public-key cryptosystems is based on the presumed difficulty of solving certain mathematical problems, the most important of which are the following.
The integer factorization problem (IFP): Given the product n = pq of two distinct prime integers p and q, find p and q.
The discrete logarithm problem (DLP): Let G be a finite cyclic (multiplicatively written) group of cardinality n with a generator g. Given an element a ∈ G, find an integer x (or the integer x with 0 ≤ x ≤ n – 1) such that a = g^x in G. Three different types of groups are commonly used for cryptographic applications: the multiplicative group of a finite field, the group of rational points on an elliptic curve over a finite field and the Jacobian of a hyperelliptic curve over a finite field. By an abuse of notation, we often denote the DLP over finite fields as simply the DLP, whereas the DLP in elliptic and hyperelliptic curves is referred to as the elliptic curve discrete logarithm problem (ECDLP) and the hyperelliptic curve discrete logarithm problem (HECDLP) respectively.
The Diffie–Hellman problem (DHP): Let G and g be as above. Given elements g^a and g^b of G, compute the element g^ab. As in the case of the DLP, the DHP can be posed in the multiplicative group of a finite field, the group of rational points on an elliptic curve and the Jacobian of a hyperelliptic curve.
We show in the next section how (the intractability of) these problems can be exploited to create key pairs for various cryptosystems. These computational problems are termed difficult, intractable, infeasible or intensive in the sense that there are no known algorithms to solve these problems in time polynomially bounded by the input size. The best-known algorithms are subexponential or even fully exponential in some cases. This means that if the input size is chosen to be sufficiently large, then it is infeasible to compute the private key from a knowledge of the public key in a reasonable amount of time. This, in turn, implies (not provably, but as the current state of the art stands) that encryption or signature verification can be done rather quickly (in polynomial time), but the converse process of decryption or signature generation cannot be done in feasible time, unless one knows the private key. As a result, encryption (or signature verification) is called a trapdoor one-way function, that is, a function which is easy to compute but for which the inverse is computationally infeasible, unless some additional information (the trapdoor) is available.
It is, however, not known that these problems are really computationally infeasible, that is, there is no proof of the fact that these problems cannot be solved in polynomial time. As a result, the public-key cryptographic systems based on these problems are not provably secure.
In RSA and similar cryptosystems, one generates two (distinct) suitably large primes p and q and computes the product n = pq. Then φ(n) = (p – 1)(q – 1), where φ denotes Euler’s totient function. One then chooses a random integer e with gcd(e, φ(n)) = 1. There exists an integer d such that ed ≡ 1 (mod φ(n)). The integer e is used as the public key, whereas the integer d is used as the private key.
If the IFP can be solved fast, one can also compute φ(n) easily, and subsequently d can be computed from e using the (polynomial-time) extended GCD algorithm. This is why[2] we say that the RSA cryptosystem derives its security from the intractability of the IFP.
[2] The problem of factoring n = pq is polynomial-time equivalent to computing φ(n) = (p – 1)(q – 1).
In order to see how RSA encryption and decryption work, let the plaintext message be encoded as an integer m with 2 ≤ m < n. The ciphertext is generated (as an integer) as c = m^e (mod n). Decryption is analogous, that is, m = c^d (mod n). The correctness of the algorithm follows from the fact that ed ≡ 1 (mod φ(n)). It has, however, not been proved that one must know d or φ(n) or the factorization of n in order to decrypt an RSA-encrypted message. But at present no better methods are known.
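The full RSA cycle described above can be sketched in a few lines. The primes here are toy assumptions, orders of magnitude too small for any real security, and the plaintext integer is arbitrary.

```python
from math import gcd

# RSA key generation, encryption and decryption with toy parameters.

p, q = 10007, 10009              # distinct primes (normally 1024+ bits each)
n = p * q
phi = (p - 1) * (q - 1)          # Euler's totient of n

e = 65537                        # public exponent, must be coprime to phi
assert gcd(e, phi) == 1
d = pow(e, -1, phi)              # private exponent: e*d = 1 (mod phi)

m = 42424242                     # plaintext encoded as an integer, 2 <= m < n
assert 2 <= m < n
c = pow(m, e, n)                 # encryption: c = m^e mod n
assert pow(c, d, n) == m         # decryption: c^d mod n recovers m
```

Note that anyone who can factor n recovers p and q, hence phi and d, which is the sense in which RSA rests on the IFP.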
Let us now consider the discrete logarithm problem. Let G be a finite cyclic multiplicative group (like those mentioned above) in which it is easy to multiply two elements, but difficult to compute discrete logarithms. Let g be a generator of G. In order to set up a random key pair over such a group, one chooses the private key as a random integer d, 2 ≤ d < n, where n is the cardinality of G. The public key e is then computed as the element e = g^d of G.
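Such a key pair can be sketched in the multiplicative group of nonzero residues modulo a prime. The modulus and generator below are toy assumptions; recovering d from e here would be easy precisely because the group is small.

```python
import secrets

# Discrete-log key pair: private key d is a random exponent, public key
# is e = g^d.  Recovering d from e is the DLP.

p = 2**127 - 1                      # toy group: nonzero residues mod a Mersenne prime
g = 3                               # assumed generator, for illustration
n = p - 1                           # cardinality of the multiplicative group

d = secrets.randbelow(n - 2) + 2    # private key, 2 <= d < n
e = pow(g, d, p)                    # public key, an element of the group
assert 1 <= e < p
```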
Applications of encryption–decryption schemes based on the key pair (g^d, d) are given in Chapter 5. For now, we only remark that many such schemes (like the ElGamal scheme) derive their security from the DHP instead of the DLP, whereas other schemes (like the Nyberg–Rueppel scheme) do so from the DLP. It is assumed that these two problems are computationally equivalent (at least for the groups of our interest). Obviously, a solution of the DLP yields a solution of the DHP too (g^ab = (g^a)^b). The reverse implication is not known to hold.
As we pointed out earlier, (most of) the public-key cryptosystems are not provably secure, in the sense that they are based on the apparent difficulty of solving certain computational problems. It is expedient to know how difficult these problems are. No non-trivial complexity-theoretic statements are available for these problems, and as such it is worthwhile to study the algorithms known to date for solving them. Unfortunately, however, these cryptanalytic algorithms are often much more complicated than the algorithms for building the corresponding cryptographic systems. One needs to acquire more mathematical machinery in order to understand (and augment) these cryptanalytic algorithms. We devote Chapter 4 to a detailed discussion of these algorithms.
In specific situations, one need not always use these computationally intensive algorithms. Access to a party’s decryption equipment may allow an adversary to gain partial or complete information about the private key by watching a decryption process. For example, an adversary (say, the superuser) might have the capability to read the contents of the memory holding a private key during some decryption process. For another possibility, think of RSA decryption which involves a modular exponentiation. If the standard square-and-multiply algorithm (Algorithm 3.9) is used for this purpose and the adversary can tap some hardware details (like machine cycles or power fluctuations) during a decryption process, she can guess a significant number of the bits in the private key. Such attacks, often called side-channel attacks, are particularly relevant for cryptographic applications based on smart cards.
A cryptographic system is (believed to be) strong if and only if there are no good known mechanisms to break it. It is, therefore, for the sake of security that we must study cryptanalysis. Cryptography and cryptanalysis are deeply intertwined and a complete study of one must involve the other.
In cryptology, there are different models of attacks or attackers.
So far we have assumed that an adversary can only read messages during transmission over a channel. Such an adversary is called a passive adversary. An active adversary, on the other hand, can mutilate or delete messages during transmission and/or generate false messages. An attack mounted by an active (resp.[3] a passive) adversary is called an active (resp. a passive) attack. In this book, we will mostly concentrate on passive attacks.
[3] Throughout the book, resp. stands for respectively.
A two-party communication involves transmission of ciphertext messages over a communication channel. A passive attacker can read these ciphertext messages. In practice, however, an attacker might have more control over the choice of ciphertext and/or plaintext messages. Based on these capabilities of the attacker we have the following types of attacks.
Ciphertext-only attack: This is the weakest model of the adversary. Here the attacker has no control over the ciphertext messages that flow through the channel, nor over the corresponding plaintext messages. Using only these ciphertext messages, the attacker has to obtain a private key and/or a plaintext message corresponding to a new ciphertext message.
Known-plaintext attack: In this kind of attack (also called a known-ciphertext attack), the attacker uses her knowledge of some plaintext–ciphertext pairs. If many such pairs are available, she can use them to deduce patterns from which she can subsequently gain some information about a new plaintext whose ciphertext is available. In a public-key scheme, the adversary can generate as many such pairs as she wants, because generating a pair requires only knowledge of the receiver's public key. Thus a public-key encryption scheme must provide sufficient security against known-plaintext attacks.
Chosen-plaintext attack: In this kind of attack, the attacker knows some plaintext–ciphertext pairs in which the plaintexts are chosen by the attacker. As discussed earlier, such an attack is easily mountable against a public-key encryption scheme.
Adaptive chosen-plaintext attack: This is similar to the chosen-plaintext attack, with the additional possibility that the attacker chooses the plaintexts in the known plaintext–ciphertext pairs sequentially and adaptively, based on the knowledge of the previous pairs. This kind of attack too can be easily mounted on public-key encryption systems.
Chosen-ciphertext attack: The attacker has knowledge of some plaintext–ciphertext pairs in which the ciphertexts are chosen by the attacker. Such an attack is not directly mountable on a public-key scheme, since obtaining a plaintext from a chosen ciphertext requires knowledge of the private key. However, if the attacker has access to the receiver's decryption equipment, the machine can divulge the plaintexts corresponding to the ciphertexts that the attacker supplies to it. In this context, we assume that the machine does not reveal the private key itself; that is, it has the key stored secretly somewhere in its hardware which the attacker cannot directly access. However, the attacker can run the machine to learn the plaintexts corresponding to ciphertexts of her choice. Later (when the attacker no longer has access to the decryption equipment) the known pairs may be exploited to obtain information about the plaintext corresponding to a new ciphertext.
Adaptive chosen-ciphertext attack: This is similar to the chosen-ciphertext attack, with the additional possibility that the attacker chooses the ciphertexts in the known pairs sequentially and adaptively, based on her knowledge of the previously generated plaintext–ciphertext pairs. This attack is mountable in the scenario described in connection with chosen-ciphertext attacks.
For a digital signature scheme, there are equivalent names for these types of attacks. The attacker is assumed to have access to the public key of the signer, because this key is used for signature verification. An attempt to forge signatures based only on the knowledge of this verification key is called a key-only attack. The adversary may additionally possess knowledge of some message–signature pairs. An attack based on this knowledge is called a known-pair or known-message or known-signature attack. If the messages are chosen by the adversary, we call the attack a chosen-message attack. If the adversary generates the sequence of messages in a chosen-message attack adaptively (based on the previously generated message–signature pairs), we have an adaptive chosen-message attack. An (adaptive or non-adaptive) chosen-message attack can be mounted, if the attacker gains access to the signer’s signature generation equipment, or if the signer is willing to sign arbitrary messages provided by the adversary.
The attacker can choose some signatures and generate the corresponding messages by encrypting them with the signer’s public key. The private-key operation on these messages generates the signatures chosen by the attacker. This gives chosen-signature and adaptive chosen-signature attacks on a digital signature scheme. Now the adversary cannot directly control the messages to sign. On the other hand, such an attack is easily mountable, because it utilizes only some public knowledge (the signer’s public key). Indeed, one may treat chosen-signature attacks as variants of key-only attacks.
So far, we have assumed that all the parties connected to a network know the algorithms used in a cryptographic scheme. The security of the scheme is based on the difficulty of obtaining some secret information (the secret or private key).
It, however, remains possible that two parties communicate using an algorithm unknown to other entities. Top-secret communications (for example, during wars or diplomatic transactions) often use private cryptographic algorithms. In this book, we will not deal with such techniques. Our attention is focused mostly on Internet applications in which public knowledge of the algorithms is of paramount importance (for the sake of universal applicability and convenience).
In short, this book is going to deal with a world in which only publicly known public-key algorithms are deployed and in which adversaries are usually passive. A restricted model of the world though it may be, it is general and useful enough to concentrate on. Let us begin our journey!
This chapter provides an overview of the problems that cryptology deals with. The first and oldest cryptographic primitive is encryption for secure transmission of messages. Some other primitives are key exchange, digital signature, authentication, secret sharing, hashing, and digital certificates. We then highlight the difference between symmetric (secret-key) and asymmetric (public-key) cryptography. The relevance of some computationally intractable mathematical problems in public-key cryptography is discussed next, and the working of a prototype public-key cryptosystem (RSA) is explained. We finally discuss different models of attacks on cryptosystems.
Not uncommonly, some people think that cryptology also deals with intrusion, viruses, and Trojan horses. We emphasize that this is not the case. Data and network security is the branch that deals with these topics. Cryptography is a part of this branch, but not conversely. Imagine that your house is to be secured against theft. First, you need a good lock—that is, cryptography. However, a lock can do nothing to prevent a thief from entering the house after breaking the window panes. A bad butler who leaks secret information of the house to the outside world also does not come under the jurisdiction of the lock. Securing your house requires adopting sufficient safeguards against all these possibilities of theft. In this book, we will study only the technology of manufacturing and breaking locks.
| 2.1 | Introduction |
| 2.2 | Sets, Relations and Functions |
| 2.3 | Groups |
| 2.4 | Rings |
| 2.5 | Integers |
| 2.6 | Polynomials |
| 2.7 | Vector Spaces and Modules |
| 2.8 | Fields |
| 2.9 | Finite Fields |
| 2.10 | Affine and Projective Curves |
| 2.11 | Elliptic Curves |
| 2.12 | Hyperelliptic Curves |
| 2.13 | Number Fields |
| 2.14 | p-adic Numbers |
| 2.15 | Statistical Methods |
| Chapter Summary | |
| Suggestions for Further Reading | |
Young man, in mathematics you don’t understand things, you just get used to them.
—John von Neumann
Mathematics contains much that will neither hurt one if one does not know it nor help one if one does know it.
—J. B. Mencken
Mathematics is the Queen of Science but she isn’t very pure; she keeps having babies by handsome young upstarts and various frog princes.
—Donald Kingsbury
In this chapter, we introduce the basic mathematical concepts that one should know in order to understand the public-key cryptographic protocols and the corresponding cryptanalytic algorithms described in the later chapters. If the reader is already familiar with these concepts, she may quickly browse through the chapter in order to know about our notations and conventions.
This chapter is meant for cryptology students and as such does not describe the mathematical topics in their full generality. It is our intention only to state (and, if possible, prove) the relevant results that would be useful for the rest of the book. For further study, we urge the reader to consult the books suggested at the end of this chapter.
Sets are absolutely basic entities used throughout the present-day study of mathematics. Unfortunately, however, we cannot define sets. Loosely speaking, a set is an (unordered) collection of objects. But we run into difficulty with this definition for collections that are too big. Of course, infinite sets like the set of all integers or real numbers are not too big. However, a collection of all sets is too big to be called a set. (Also see Exercise 2.6.) It is, therefore, customary to have an axiomatic definition of sets. That is to say, a collection qualifies to be a set if it satisfies certain axioms. We do not go into the details of this axiomatic definition, but state the axioms as properties of sets. Luckily enough, we won’t have a chance in the rest of this book to deal with collections that are not sets. So the reader can, for the time being, have faith in the above (wrong) identification of a set as a collection.
An object a in a set A is commonly called an element of A. By the notation a ∈ A, we mean that a is an element of the set A. Often a set A can be represented explicitly by writing down its elements within curly brackets or braces. For example, A = {2, 3, 5, 7} denotes the set consisting of the elements 2, 3, 5, 7, which are incidentally all the (positive) prime numbers less than 10. We often use the ellipsis sign (. . .) to denote an infinite (or even a finite) set. For example, ℙ = {2, 3, 5, 7, 11, 13, . . .} would denote the set of all (positive) prime numbers. (We prove later that ℙ is an infinite set.) Alternatively, we often describe a set by mentioning the properties of the elements of the set. For example, the set ℙ can also be described as ℙ = {p | p is a positive prime integer}.
Some frequently occurring sets are denoted by special symbols. We list a few of them here.

| ℕ | the set of natural numbers 1, 2, 3, . . . |
| ℤ | the set of integers |
| ℚ | the set of rational numbers |
| ℝ | the set of real numbers |
| ℂ | the set of complex numbers |
| ℙ | the set of (positive) prime numbers |
The cardinality of a set A is the number of elements in A. We use the symbol #A to denote the cardinality of A. If #A is finite, we call A a finite set. Otherwise A is said to be infinite. The empty set has cardinality zero.
Let A and B be two sets. We say that A is a subset of B and denote this as A ⊆ B, if all elements of A are in B. Two sets A and B are equal (that is, A = B) if and only if A ⊆ B and B ⊆ A. A is said to be a proper subset of B (denoted A ⊊ B), if A ⊆ B and A ≠ B (that is, B ⊈ A).
The union of A and B is the set whose elements are either in A or in B (or both). This set is denoted by A ∪ B. The intersection of A and B is the set consisting of elements that are common to A and B. The intersection of A and B is denoted by A ∩ B. If A ∩ B = ∅, then we say that A and B are disjoint. In that case, the union A ∪ B is also called a disjoint union and is referred to as A ⊎ B. (For a generalization, see Exercise 2.7.) The difference of A and B, denoted A \ B, is the set whose elements are in A but not in B. If A is understood from the context and B ⊆ A, then we denote A \ B by B̄ and refer to B̄ as the complement of B (in A). The product A × B of two sets A and B is the set of all ordered pairs (a, b), where a ∈ A and b ∈ B.
The notion of union, intersection and product of sets can be readily extended to an arbitrary family of sets. Let Ai, i ∈ I, be a family of sets indexed by I. In this case, we denote the union and intersection of Ai, i ∈ I, by ∪i∈I Ai and ∩i∈I Ai respectively. The product of Ai, i ∈ I, is denoted by ∏i∈I Ai. When Ai = A for all i ∈ I, we denote the product also as A^I. If, in addition, I is a finite set of cardinality n, then the product A^I is also written as A^n.
A relation ρ on a set A is a subset of A × A. For (a, b) ∈ ρ, we usually say a ρ b, implying that a is related by ρ to b. Common examples are the standard relations =, ≠, ≤, <, ≥, > on ℤ (or ℚ or ℝ).
A relation ρ on a set A is called reflexive, if a ρ a for all a ∈ A. For example, =, ≤ and ≥ are reflexive relations on ℤ, but the relations ≠, <, > are not.
A relation ρ on A is called symmetric, if a ρ b implies b ρ a. On the other hand, ρ is called anti-symmetric if a ρ b and b ρ a imply a = b. For example, = is symmetric and anti-symmetric, <, ≤, > and ≥ are anti-symmetric but not symmetric, ≠ is symmetric but not anti-symmetric.
A relation ρ on A is called transitive if a ρ b and b ρ c imply a ρ c. For example, =, <, ≤, >, ≥ are all transitive, but ≠ is not transitive.
An equivalence relation is one which is reflexive, symmetric and transitive. For example, = is an equivalence relation on ℤ, but none of the other relations mentioned above (≠, <, ≥ and so on) is an equivalence relation on ℤ.
A partition of a set A is a collection of pairwise disjoint subsets Ai, i ∈ I, of A, such that A = ∪i∈I Ai, that is, A is the union of Ai, i ∈ I, and for i, j ∈ I, i ≠ j, we have Ai ∩ Aj = ∅. The following theorem establishes an important connection between equivalence relations and partitions.
|
An equivalence relation on a set A produces a partition of A. Conversely, every partition of a set A corresponds to an equivalence relation on A. Proof Let ρ be an equivalence relation on a set A. For a ∈ A, define [a] := {b ∈ A | a ρ b}. By reflexivity, a ∈ [a], so that A is the union of the sets [a]. Moreover, if c ∈ [a] ∩ [b], then a ρ c and b ρ c, so that by symmetry and transitivity a ρ b, whence [a] = [b]. Thus the distinct sets [a] are pairwise disjoint and constitute a partition of A. Conversely, let Ai, i ∈ I, be a partition of A. Define a ρ b if and only if a and b belong to the same Ai. It is easy to verify that ρ is reflexive, symmetric and transitive, that is, an equivalence relation on A. |
The subset [a] of A defined in the proof of the above theorem is called the equivalence class of a with respect to the equivalence relation ρ.
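The correspondence between equivalence relations and partitions can be illustrated computationally. A small sketch, not from the book (Python assumed), using congruence modulo 3 on the set {0, . . . , 9} as a hypothetical example relation:

```python
# Sketch: group a finite set into equivalence classes of a given
# equivalence relation, realizing the partition of the theorem above.

def equivalence_classes(elements, related):
    """Group `elements` into the classes of the equivalence relation `related`."""
    classes = []
    for x in elements:
        for cls in classes:
            if related(x, cls[0]):   # by transitivity, one representative suffices
                cls.append(x)
                break
        else:
            classes.append([x])      # x starts a new class [x]
    return classes

A = range(10)
parts = equivalence_classes(A, lambda a, b: (a - b) % 3 == 0)
print(parts)   # [[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]
```

The three classes are pairwise disjoint and their union is all of A, exactly as the theorem asserts.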
An anti-symmetric and transitive relation is called a partial order (or simply an order). All of the relations =, ≤, <, ≥, > are partial orders on ℤ (but ≠ is not). A partial order ρ on A is called a total order or a linear order or a simple order, if for every a, b ∈ A, a ≠ b, either a ρ b or b ρ a. For example, if we take A = {1, 2, 3} and the relation ρ = {(1, 2), (1, 3)}, then ρ is a partial order but not a total order (because it does not specify a relation between 2 and 3). On the other hand, ρ′ = {(1, 2), (1, 3), (2, 3)} is a total order. A set with a partial (resp. total) order is often called a partially ordered (resp. totally ordered or linearly ordered or simply ordered) set.
Let A and B be two sets (not necessarily distinct). A function or a map f from A to B, denoted f : A → B, assigns to each a ∈ A some element b ∈ B. In this case, we write b = f(a) or f maps a ↦ b and say that b is the image of a (under f). For example, for A = B = ℝ, the assignment a ↦ a² is a function. On the other hand, the assignment a ↦ √a (the non-negative square root) is not a function ℝ → ℝ, because it is not defined for negative values of a. However, if A = ℝ and B = ℂ, then the assignment a ↦ √a (with non-negative real and imaginary parts) is a function.
The function f : A → A assigning a ↦ a for all a ∈ A is called the identity map on A and is usually denoted by idA. On the other hand, if f : A → B maps all the elements of A to a fixed element of B, then f is said to be a constant function. A function which is not constant is called a non-constant function.
A function f : A → B that maps different elements of A to different elements of B is called injective or one-one. In other words, f is injective if and only if f(a) = f(a′) implies a = a′. The function f : ℝ → ℝ given by a ↦ a² is not injective, since f(–a) = f(a) for all a ∈ ℝ. On the other hand, the function f : ℤ → ℤ given by a ↦ 2a is injective. An injective map f : A → B is sometimes denoted by the special symbol f : A ↪ B.
The image of a function f : A → B is defined to be the subset {f(a) | a ∈ A} of B. It is denoted by f(A) or by Im f. The function f is said to be surjective or onto or a surjection, if Im f = B, that is, every element b of B has at least one preimage a ∈ A (which means f(a) = b). As an example, the function f : ℤ → ℤ given by a ↦ a/2 (if a is even) and by a ↦ (a – 1)/2 (if a is odd) is surjective, whereas the function f : ℤ → ℤ that maps a ↦ |a| (the absolute value) is not surjective. A surjective map f : A → B is sometimes denoted by the special symbol f : A ↠ B.
A map f : A → B is called bijective or a bijection, if it is both injective and surjective. For example, the identity map on a set is bijective. Another example of a bijective function is the map ℕ → ℙ that takes a to the ath prime.
Let f : A → B and g : B → C be functions. The composition of f and g is the function from A to C that takes a ↦ g(f(a)). It is denoted by g ο f, that is, (g ο f)(a) = g(f(a)). Note that in the notation g ο f one applies f first and then g. The notion of composition of functions can be extended to more than two functions. In particular, if f : A → B, g : B → C and h : C → D are functions, then (h ο g) ο f and h ο (g ο f) are the same function from A to D, so that we can unambiguously write this as h ο g ο f.
The study of mathematics is based on certain axioms. We state four of these axioms. It is not possible to prove the axioms independently, but it can be shown that they are equivalent in the sense that each of them can be proved, if any of the others is assumed to be true.
Let A be a partially ordered set under the relation ≤. An element a ∈ A is called maximal (resp. minimal), if there is no element b ∈ A, b ≠ a, that satisfies a ≤ b (resp. b ≤ a). Let B be a non-empty subset of A. Then an upper bound (resp. a lower bound) for B is an element a ∈ A such that b ≤ a (resp. a ≤ b) for all b ∈ B. If an upper bound (resp. a lower bound) a of B is an element of B, then a is called a last element or a largest element or a maximum element (resp. a first element or a least element or a smallest element or a minimum element) of B. By antisymmetry, it follows that a first (resp. last) element of B, if existent, is unique. A chain of A is a totally ordered (under ≤) subset of A.
Consider the sets ℕ, ℤ and ℚ with the natural order ≤. None of these sets contains a maximal element. ℕ contains a minimal element 1, but ℤ and ℚ do not contain minimal elements. The subset 2ℕ of even natural numbers has two lower bounds in ℕ, namely 1 and 2, of which 2 is the first element of 2ℕ.
A totally ordered set A is said to be well ordered (and the relation is called a well order), if every non-empty subset B of A contains a first element.
|
Every set A can be well ordered, that is, one can define a relation on A with respect to which A is a well-ordered set. |
The set ℕ is well-ordered under the natural relation ≤. The set ℤ can be well ordered by the relation ≼ defined as 0 ≼ 1 ≼ –1 ≼ 2 ≼ –2 ≼ · · · . A well ordering of ℝ is not known.
|
Let A be a partially ordered set. If every chain of A has an upper bound (in A), then A has at least one maximal element. |
To illustrate Zorn’s lemma, consider any non-empty set A and define 𝒫(A) to be the set of all subsets of A. 𝒫(A) is called the power set of A and is partially ordered under containment ⊆. A chain of 𝒫(A) is a set of subsets Ai, i ∈ I, of A such that for all i, j ∈ I either Ai ⊆ Aj or Aj ⊆ Ai. Clearly, the union ∪i∈I Ai is an upper bound of the chain. Then Zorn’s lemma guarantees that 𝒫(A) has at least one maximal element. In this case, the maximal element, namely A, is unique. If A is finite, then for the set 𝒫(A) \ {A} of all proper subsets of A, a maximal element (under the partial order ⊆) exists by Zorn’s lemma, but is not unique, if #A > 1.
|
Let |
Finally, let A be a set and let 𝒫*(A) := 𝒫(A) \ {∅}, that is, 𝒫*(A) is the set of all non-empty subsets of A. A choice function of A is a function f : 𝒫*(A) → A such that for every B ∈ 𝒫*(A) we have f(B) ∈ B.
|
Every set has a choice function. |
So far we have studied sets as unordered collections. However, things start getting interesting when we define one or more binary operations on sets. Such operations define structures on sets, and we compare different sets in the light of their respective structures. Groups are the first (and simplest) examples of sets with binary operations.
|
A binary operation on a set A is a map from A × A to A. If ◊ is a binary operation on A, it is customary to write a ◊ a′ to denote the image of (a, a′) (under ◊). |
For example, addition, subtraction and multiplication are all binary operations on ℤ (or ℚ or ℝ). Subtraction is not a binary operation on ℕ, since, for example, 2 – 3 is not an element of ℕ. Division is not a binary operation on ℝ, since division by zero is not defined. Division is a binary operation on ℝ \ {0}.
|
A group[1] (G, ◊) is a set G together with a binary operation ◊ on G that satisfies the following three conditions:
(1) ◊ is associative, that is, a ◊ (b ◊ c) = (a ◊ b) ◊ c for all a, b, c ∈ G.
(2) G contains an identity element e satisfying a ◊ e = e ◊ a = a for all a ∈ G.
(3) Every a ∈ G has an inverse b ∈ G satisfying a ◊ b = b ◊ a = e.
If, in addition, a ◊ b = b ◊ a for all a, b ∈ G, then the group is called commutative or Abelian.
|
A group (G, ◊) is also written in short as G, when the operation ◊ is understood from the context. More often than not, the operation ◊ is either addition (+) or multiplication (·), in which cases we also say that G is respectively an additive or a multiplicative group. For a multiplicative group, we often omit the multiplication sign and denote a · b simply as ab. The identity in an additive group is usually denoted by 0, whereas that in a multiplicative group by 1. The inverse of an element a is denoted in these cases by –a and a⁻¹ respectively. Groups written additively are usually Abelian, but groups written multiplicatively need not be so.
Note that associativity allows us to write a ◊ b ◊ c unambiguously to represent (a ◊ b) ◊ c = a ◊ (b ◊ c). More generally, if a1, . . . , an ∈ G, then a1 ◊ ··· ◊ an represents a unique element of the group irrespective of how we insert brackets to compute the element a1 ◊ ··· ◊ an.
|
|
Let (G, ◊) be a group. For subsets A and B of G, we denote by A ◊ B the set {a ◊ b | a ∈ A, b ∈ B}. In particular, if A = {a} (resp. B = {b}), then A ◊ B is denoted by a ◊ B (resp. A ◊ b). Note that the sets A ◊ B and B ◊ A are not necessarily equal. If G is Abelian, then A ◊ B = B ◊ A.
|
Let (G, ◊) be a group, H a subgroup of G and a ∈ G. The set a ◊ H is called a left coset of H and the set H ◊ a a right coset of H (in G). |
From now onward, we consider left cosets only and call them cosets. If the underlying group is Abelian, then left and right cosets are the same thing. The theory of right cosets can be developed in parallel, but we choose to omit that here. For simplicity, we also assume that the group G is a multiplicative group, so that the operation ◊ is replaced by · (or by mere juxtaposition).
|
Let G be a (multiplicative) group and H a subgroup of G. Then, the cosets aH, a ∈ G, form a partition of G. Moreover, for any a, b ∈ G, there is a bijective map from aH to bH. Proof We define a relation ~ on G such that a ~ b if and only if a⁻¹b ∈ H. It is easy to verify that ~ is an equivalence relation on G and that the equivalence class of a is precisely the coset aH. Hence the cosets form a partition of G. Now we define a map aH → bH taking ah ↦ bh for h ∈ H. This map is a bijection (with inverse bh ↦ ah). |
The following theorem is an important corollary to the last proposition.
|
Let G be a finite group and H a subgroup of G. Then, the cardinality of G is an integral multiple of the cardinality of H. Proof From Proposition 2.2, the cosets form a partition of G and there is a bijective map from one coset to another. Hence by Exercise 2.3 all cosets have the same cardinality. Finally, note that H is the coset of the identity element. |
|
Let G be a group and H a subgroup of G. The number of distinct cosets of H in G is called the index of H in G and is denoted by [G : H]. If G is finite, then [G : H] = #G/#H. |
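Lagrange’s theorem and the index formula can be verified directly in a small group. A sketch, not from the book (Python assumed), with the hypothetical choices G = (ℤ/15ℤ)* under multiplication and H the subgroup generated by 4:

```python
# Verify Lagrange's theorem in G = (Z/15Z)*: the cosets aH partition G,
# each has #H elements, and the index [G : H] equals #G / #H.

from math import gcd

n = 15
G = {a for a in range(1, n) if gcd(a, n) == 1}       # units modulo 15; #G = 8

def subgroup_generated_by(g):
    H, x = {1}, g
    while x not in H:                                 # collect powers of g
        H.add(x)
        x = x * g % n
    return H

H = subgroup_generated_by(4)                          # H = {1, 4}
cosets = {frozenset(a * h % n for h in H) for a in G}

assert len(G) % len(H) == 0                           # #H divides #G
assert set().union(*cosets) == G                      # the cosets cover G
assert len(cosets) == len(G) // len(H)                # index [G : H] = #G/#H
print(len(G), len(H), len(cosets))                    # 8 2 4
```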
|
Let H be a subgroup of a (multiplicative) group G. Then H is called a normal subgroup of G, if (aH)(bH) = (ab)H for all a, b ∈ G. If H is a normal subgroup of a group G, then the cosets aH, a ∈ G, form a group under the operation (aH)(bH) = (ab)H. This group is called the quotient group of G by H and is denoted by G/H. |
|
|
Let (G, ◊) and (G′, ⊙) be groups. A function f : G → G′ is called a homomorphism (of groups), if f(a ◊ b) = f(a) ⊙ f(b) for all a, b ∈ G. A group homomorphism f : G → G′ is called an isomorphism, if there exists a group homomorphism g : G′ → G such that g ο f = idG and f ο g = idG′. It can be easily seen that a homomorphism f : G → G′ is an isomorphism if and only if f is bijective as a function.[2] If there exists an isomorphism f : G → G′, we say that the groups G and G′ are isomorphic and write G ≅ G′.
A homomorphism f from G to itself is called an endomorphism (of G). An endomorphism which is also an isomorphism is called an automorphism. The set of all automorphisms of a group G is a group under function composition. We denote this group by Aut G. |
|
|
Let f be a group homomorphism from (G, ◊) to (G′, ⊙). Let e and e′ denote the identity elements of G and G′ respectively. Then f(e) = e′. Moreover, if a, b ∈ G satisfy a ◊ b = e, and if c := f(a) and d := c⁻¹ in G′, then f(b) = d; that is, f maps inverses to inverses. Proof We have e′ ⊙ f(e) = f(e) = f(e ◊ e) = f(e) ⊙ f(e), so that by right cancellation f(e) = e′. To prove the second assertion we note that c ⊙ d = e′ = f(e) = f(a ◊ b) = f(a) ⊙ f(b) = c ⊙ f(b). Thus f(b) = d. |
|
Ker f is a normal subgroup of G, Im f is a subgroup of G′, and G/Ker f ≅ Im f. Proof In order to simplify notations, let us assume that G and G′ are multiplicatively written groups. For u, v ∈ Ker f, we have f(uv⁻¹) = f(u)f(v)⁻¹ = e′, so that uv⁻¹ ∈ Ker f, that is, Ker f is a subgroup of G. Moreover, for any a ∈ G and u ∈ Ker f, we have f(aua⁻¹) = f(a)f(u)f(a)⁻¹ = f(a)f(a)⁻¹ = e′, so that aua⁻¹ ∈ Ker f, that is, Ker f is a normal subgroup of G. That Im f is a subgroup of G′ follows similarly, since f(u)f(v)⁻¹ = f(uv⁻¹) ∈ Im f for all u, v ∈ G. Now define a map φ : G/Ker f → Im f taking the coset a Ker f to f(a). One checks that φ is well-defined, a homomorphism and bijective, whence G/Ker f ≅ Im f. |
|
Let G be a group. In this section, we assume, unless otherwise stated, that G is multiplicatively written and has identity e. Let ai, i ∈ I, be elements of G, and let H denote the set of all finite products ai1^±1 · · · air^±1 (with i1, . . . , ir ∈ I and r ≥ 0),
with the empty product (corresponding to r = 0) being treated as e. It is easy to check that H is a subgroup of G and contains all ai, i ∈ I. |
|
With these notations we prove the following important proposition.
|
The order m := ordG a of any element a of a finite group G of order n divides n. In particular, a^n = e. Proof Let H be the (cyclic) subgroup of G generated by a. Then by Example 2.5, H = {a^r | r = 0, . . . , m – 1} and m is the smallest of the positive integers r for which a^r = e. By Lagrange’s theorem (Theorem 2.2), n is an integral multiple of m. That is, n = km for some k ∈ ℕ. Therefore a^n = a^km = (a^m)^k = e^k = e. |
|
Let G be a finite cyclic multiplicative group with identity e and let H be a subgroup of G of order m. Then an element a ∈ G belongs to H if and only if a^m = e. Proof If a ∈ H, then a^m = e by the preceding proposition. Conversely, let g be a generator of G, let n := #G, and let a = g^s satisfy a^m = e. Then g^sm = e, so that n | sm, that is, (n/m) | s (note that m | n by Lagrange’s theorem). Hence a = g^s is a power of g^(n/m). But the powers of g^(n/m) constitute the unique subgroup of G of order m, namely H. Thus a ∈ H. |
Finite cyclic groups play a crucial role in public-key cryptography. To see how, let G be a group which is finite, cyclic with generator g and multiplicatively written. Given r ∈ ℕ, one can compute g^r using ≤ 2 lg r + 2 group multiplications (see Algorithms 3.9 and 3.10). This means that if it is easy to multiply elements of G, then it is also easy to compute g^r. On the other hand, there are certain groups for which it is very difficult to find out the integer r from the knowledge of g and g^r, even when one is certain that such an integer exists. This is the basic source of security in many cryptographic protocols, like those based on finite fields, elliptic and hyperelliptic curves.
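The bound of about 2 lg r + 2 multiplications comes from binary (square-and-multiply) exponentiation; the book’s own Algorithms 3.9 and 3.10 appear later. A minimal sketch, not the book’s code (Python assumed), with the group taken to be integers under multiplication modulo n:

```python
# Left-to-right binary exponentiation: one squaring per bit of r, plus one
# extra multiplication per 1-bit, so roughly 2*lg(r) group operations in all.

def group_exp(g: int, r: int, n: int) -> int:
    """Compute g^r in the multiplicative group of integers modulo n."""
    result = 1
    for bit in bin(r)[2:]:               # bits of r, most significant first
        result = result * result % n     # squaring step
        if bit == '1':
            result = result * g % n      # multiplication step
    return result

assert group_exp(5, 117, 1009) == pow(5, 117, 1009)
```

Computing g^r is thus cheap, whereas recovering r from g and g^r (the discrete logarithm problem) is believed to be hard in suitably chosen groups.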
Sylow’s theorem is a powerful tool for studying the structure of finite groups. Recall that if G is a finite group of order n and if H is a subgroup of G of order m, then by Lagrange’s theorem m divides n. But given any divisor m′ of n, there need not exist a subgroup of G of order m′. However, for certain special values of m′, we can prove the existence of subgroups of order m′. Sylow’s theorem considers the case that m′ is a power of a prime.
|
Let G be a finite group of cardinality n and let p be a prime. Write n = p^r m, where r ≥ 0 and p does not divide m. A subgroup of G of cardinality p^r is called a p-Sylow subgroup of G. |
We shortly prove that p-Sylow subgroups always exist. Before doing that, we prove a simpler result.
Now we are in a position to prove the general theorem.
|
Let G be a finite group of order n and let p be a prime dividing n. Then there exists a p-Sylow subgroup of G. Proof We proceed by induction on n. If n = p, then G itself is a p-Sylow subgroup of G. So we assume n > p and write n = p^r m, where p does not divide m. If r = 1, then the theorem follows from Cauchy’s theorem (Theorem 2.4). So we assume r > 1 and consider the class equation of G, namely, n = #Z(G) + Σ [G : CG(a)], where Z(G) is the centre of G, CG(a) is the centralizer of a, and the sum runs over representatives a of the conjugacy classes containing more than one element. If p divides #Z(G), then by Cauchy’s theorem Z(G) contains a (normal) subgroup N of order p; by induction G/N has a subgroup of order p^(r–1), whose preimage in G is a p-Sylow subgroup of G. Otherwise, some centralizer CG(a) ≠ G has index coprime to p, so that p^r divides #CG(a); by induction CG(a) contains a subgroup of order p^r, which is a p-Sylow subgroup of G. |
Note that if H is a p-Sylow subgroup of G and g ∈ G, then gHg⁻¹ is also a p-Sylow subgroup of G. The converse is also true, that is, if H and H′ are two p-Sylow subgroups of G, then there exists a g ∈ G such that H′ = gHg⁻¹. We do not prove this assertion here, but mention the following important consequence of it. If G is Abelian, then H′ = gHg⁻¹ = gg⁻¹H = H, that is, there is only one p-Sylow subgroup of G. If G is Abelian and n = p1^e1 · · · pt^et with pairwise distinct primes pi and with ei ∈ ℕ, then G is the internal direct product of its pi-Sylow subgroups, i = 1, . . . , t (Exercises 2.17 and 2.19).
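The Abelian case can be checked concretely in a small group. A sketch, not from the book (Python assumed), using the hypothetical additive group ℤ/60ℤ with 60 = 2² · 3 · 5:

```python
# In the additive Abelian group Z/60Z, the unique p-Sylow subgroup for each
# prime p consists of the elements annihilated by the prime power p^r exactly
# dividing 60, and the group is the internal direct product of these subgroups.

n = 60
prime_powers = {2: 4, 3: 3, 5: 5}     # p : p^r exactly dividing 60

sylow = {p: {x for x in range(n) if (q * x) % n == 0}
         for p, q in prime_powers.items()}

for p, q in prime_powers.items():
    assert len(sylow[p]) == q          # the p-Sylow subgroup has order p^r

# Every g in Z/60Z is a sum a + b + c (mod 60) with a, b, c taken from the
# 2-, 3- and 5-Sylow subgroups, in exactly one way: 4 * 3 * 5 = 60 sums.
sums = {(a + b + c) % n for a in sylow[2] for b in sylow[3] for c in sylow[5]}
assert len(sums) == n
```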
| 2.8 | Let G be a multiplicatively written group (not necessarily Abelian). Prove the following assertions. |
| 2.9 | Let G be a multiplicatively written group and let H and K be subgroups of G. Show that:
|
| 2.10 |
|
| 2.11 | Let G be a (multiplicative) group.
|
| 2.12 | |
| 2.13 | Let H be the subgroup of G generated by ai, i ∈ I. Show that H is the smallest subgroup of G that contains all of ai, i ∈ I.
|
| 2.14 | Let f : G → G′ be a homomorphism of (multiplicative) groups. Show that:
|
| 2.15 | Let G be a cyclic group. Show that G is isomorphic to ℤ or to ℤn for some n ∈ ℕ, depending on whether G is infinite or finite.
|
| 2.16 | Let G be a finite (multiplicative) group (not necessarily Abelian).
|
| 2.17 | Let G be a (multiplicative) Abelian group with identity e and order n = p1^e1 · · · pt^et, where pi are distinct primes and ei ∈ ℕ. For each i, let Hi be the pi-Sylow subgroup of G. Show that:
|
| 2.18 | Let G be a finite (multiplicative) Abelian group with identity e. Assume that for every n ∈ ℕ there are at most n elements x of G satisfying x^n = e. Show that G is cyclic. [H]
|
| 2.19 | Let G be a (multiplicative) group and let H1, . . . , Hr be normal subgroups of G. If G = H1 · · · Hr and every element g ∈ G can be written uniquely as g = h1 · · · hr with hi ∈ Hi, then G is called the internal direct product of H1, . . . , Hr. (For example, if G is finite and Abelian, then by Exercise 2.17 it is the internal direct product of its Sylow subgroups.) Show that:
|
| 2.20 | Let Hi, i = 1, . . . , r, be finite Abelian groups of orders mi and let H := H1 × · · ·× Hr be their direct product. Show that H is cyclic if and only if each Hi is cyclic and m1, . . . , mr are pairwise coprime. |
So far we have studied algebraic structures with only one operation. Now we study rings, which are sets with two (compatible) binary operations. These two operations are usually denoted by + and ·. One can, of course, go for more general notations for these operations. However, that generality does not seem to pay much, but complicates matters. We stick to the conventions.
|
A ring (R, +, ·) (or R in short) is a set R together with two binary operations + and · on R, such that the following conditions are satisfied. As in the case of multiplicative groups, we write ab for a · b.
(1) (R, +) is an Abelian group (with additive identity 0).
(2) · is associative, that is, a(bc) = (ab)c for all a, b, c ∈ R.
(3) · is commutative, that is, ab = ba for all a, b ∈ R.
(4) R contains a multiplicative identity 1, that is, 1a = a1 = a for all a ∈ R.
(5) · distributes over +, that is, a(b + c) = ab + ac and (a + b)c = ac + bc for all a, b, c ∈ R.
|
Notice that it is more conventional to define a ring as an algebraic structure (R, +, ·) that satisfies conditions (1), (2) and (5) only. A ring (by the conventional definition) is called a commutative ring (resp. a ring with identity), if it (additionally) satisfies condition (3) (resp. (4)). As per our definition, a ring is always a commutative ring with identity. Rings that are not commutative or that do not contain the identity element are not used in the rest of the book. So let us be happy with our unconventional definition of a ring.[3]
[3] Cool! But what’s circular in a ring? Historically, such algebraic structures were introduced by Hilbert to designate a Zahlring (a number ring, see Section 2.13). If α is an algebraic integer (Definition 2.95) and we take a Zahlring of the form ℤ[α] and consider the powers α, α², α³, . . . , we eventually get a power α^d which can be expressed as a linear combination of the previous (that is, smaller) powers of α. This is perhaps the reason that prompted Hilbert to call such structures “rings”. Also see Footnote 1.
We do not rule out the possibility that 0 = 1 in R. In that case, for any a ∈ R, we have a = a · 1 = a · 0 = 0 (see Proposition 2.6), that is to say, the set R consists of the single element 0. In this case, R is called the zero ring and is denoted (by an abuse of notation) by 0.
Finally, note that R is, in general, not a group under multiplication. This is because we do not expect a ring R to contain the multiplicative inverse of every element of R. Indeed the multiplicative inverse of the element 0 exists if and only if R = 0.
|
|
Let R be a ring. For all a, b ∈ R, we have a0 = 0a = 0, a(–b) = (–a)b = –(ab), and (–a)(–b) = ab.

Proof a0 = a(0 + 0) = a0 + a0, so that a0 = 0 by cancellation in the group (R, +); similarly 0a = 0. Next, ab + a(–b) = a(b + (–b)) = a0 = 0, so that a(–b) = –(ab); similarly (–a)b = –(ab). Finally, (–a)(–b) = –(a(–b)) = –(–(ab)) = ab.
|
|
|
Let R be a ring.
(1) An element a ∈ R is called a unit (or an invertible element), if ab = 1 for some b ∈ R.
(2) A non-zero element a ∈ R is called a zero-divisor, if ab = 0 for some non-zero b ∈ R.
(3) R is called an integral domain, if R ≠ 0 and R contains no zero-divisors.
(4) R is called a field, if R ≠ 0 and every non-zero element of R is a unit.
|
|
A field is an integral domain. Proof Recall from Definition 2.13 that an element in a ring cannot be simultaneously a unit and a zero-divisor. |
|
Let R be a non-zero ring. The characteristic of R, denoted char R, is the smallest positive integer n such that 1 + 1 + · · · + 1 (n times) = 0. If no such integer exists, then we take char R = 0. |
ℤ, ℚ, ℝ and ℂ are rings of characteristic zero. If R is a non-zero finite ring, then the elements 1, 1 + 1, 1 + 1 + 1, · · · cannot be all distinct. This shows that there are positive integers m and n, m < n, such that 1 + 1 + · · · + 1 (n times) = 1 + 1 + · · · + 1 (m times). But then 1 + 1 + · · · + 1 (n – m times) = 0. Thus any non-zero finite ring has positive (that is, non-zero) characteristic. If char R = t is positive, then for any a ∈ R one has ta = a + a + · · · + a (t times) = 0.
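A quick computational check, not from the book (Python assumed), using the hypothetical finite ring ℤ/12ℤ, whose characteristic is 12:

```python
# The characteristic of Z/nZ is n: the least positive t with
# 1 + 1 + ... + 1 (t times) = 0, and then t*a = 0 for every ring element a.

def characteristic(n: int) -> int:
    total, t = 1 % n, 1
    while total != 0:            # keep adding 1 until the sum reaches 0
        total = (total + 1) % n
        t += 1
    return t

n = 12
assert characteristic(n) == 12
assert all((characteristic(n) * a) % n == 0 for a in range(n))
```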
In what follows, we will often denote by n the element 1 + 1 + · · · + 1 (n times) of any ring. One should not confuse this with the integer n. One can similarly identify a negative integer –n with the ring element –(1 + 1 + · · · + 1)(n times) = (–1) + (–1) + · · · + (–1)(n times).
|
Let R be an integral domain of positive characteristic p. Then p is a prime. Proof If p is composite, then we can write p = mn with 1 < m < p and 1 < n < p. But then p = mn = 0 (in R). Since R is an integral domain, we must have m = 0 or n = 0 (in R). This contradicts the minimality of p. |
Just as we studied subgroups of groups, it is now time to study subrings of rings. It, however, turns out that subrings are not as important for the study of rings as the subsets called ideals. In fact, ideals (and not subrings) help us construct quotient rings. This does not mean that ideals are “normal” subrings! In fact, ideals are, in general, not subrings at all, and conversely. The formal definitions are waiting!
ℤ is a subring of ℚ, ℝ and ℂ, whereas ℝ (over ℚ) and ℂ (over ℝ) are field extensions.
We demand that a ring always contains the multiplicative identity (Definition 2.12). This implies that if S is a subring of R, then for all integers n, the ring elements n = ±(1 + 1 + · · · + 1) are also in S (though they need not be pairwise distinct). Similarly, if R and S are fields, then S contains all the elements of the form mn⁻¹ for m, n ∈ ℤ, n ≠ 0 (cf. Exercise 2.26). Thus 2ℤ, the set of all even integers, is not a subring of ℤ, though it is a subgroup of (ℤ, +) (Example 2.2).
|
Let R be a ring. A subset 𝔞 of R is called an ideal of R, if 𝔞 is a subgroup of (R, +) and if ra ∈ 𝔞 for all r ∈ R and a ∈ 𝔞.
|
In this book, we will use Gothic letters (usually lower case) like 𝔞, 𝔟, 𝔠, 𝔪, 𝔭 to denote ideals.[5]
[5] Mathematicians always run out of symbols. Many believe if it is Gothic, it is just ideal!
The condition for being an ideal is in one sense more stringent than that for being a subring, that is, an ideal has to be closed under multiplication by any element of the entire ring. On the other hand, we do not demand an ideal to necessarily contain the identity element 1. In fact, 2ℤ is an ideal of ℤ (not containing 1). Conversely, ℤ is a subring of ℚ but not an ideal of ℚ. Subrings and ideals are different things.
|
|
The only ideals of a field are the zero ideal and the unit ideal. Proof By definition, every non-zero element of a field is a unit. |
|
Let R be a ring and ai, i ∈ I, elements of R. The set of all finite sums r1 ai1 + · · · + rk aik (with k ≥ 0, rj ∈ R and ij ∈ I) is an ideal of R; it is the smallest ideal of R containing all the ai and is called the ideal generated by ai, i ∈ I. The elements ai constitute a generating set of this ideal. An ideal generated by a single element a is called a principal ideal and is often denoted by aR.
An integral domain every ideal of which is principal is called a principal ideal domain or PID in short. A ring every ideal of which is finitely generated is called Noetherian. Thus principal ideal domains are Noetherian. |
Note that an ideal may have different generating sets of varying cardinalities. For example, the unit ideal in any ring is principal, since it is generated by 1. The integers 2 and 3 generate the unit ideal of ℤ, since (–1) · 2 + 1 · 3 = 1. However, neither 2 nor 3 individually generates the unit ideal of ℤ. Indeed, using Bézout’s relation (Proposition 2.16) one can show that for every n ∈ ℕ there is a (minimal) generating set of the unit ideal of ℤ that contains exactly n integers. Interested readers may try to construct such generating sets as an (easy) exercise.
A very similar argument proves the following theorem. The details are left to the reader. Also see Exercise 2.31.
We now prove a very important theorem:
Two particular types of ideals are very important in algebra.
|
Let R be a ring.
(1) An ideal 𝔭 of R is called a prime ideal, if 𝔭 ≠ R and if for all a, b ∈ R, ab ∈ 𝔭 implies a ∈ 𝔭 or b ∈ 𝔭.
(2) An ideal 𝔪 of R is called a maximal ideal, if 𝔪 ≠ R and if there is no ideal 𝔞 of R with 𝔪 ⊊ 𝔞 ⊊ R.
|
Prime and maximal ideals can be characterized by some nice equivalent criteria. See Proposition 2.9.
|
Let R be a ring and 𝔞 an ideal of R. Since (R, +) is Abelian, 𝔞 is a normal subgroup of (R, +), and the additive cosets a + 𝔞 form a ring under the operations (a + 𝔞) + (b + 𝔞) := (a + b) + 𝔞 and (a + 𝔞)(b + 𝔞) := ab + 𝔞. This ring is called the quotient ring of R by 𝔞 and is denoted by R/𝔞. We say that two elements a, b ∈ R are congruent modulo 𝔞, if a – b ∈ 𝔞, that is, if a + 𝔞 = b + 𝔞. |
|
The last proposition in conjunction with Corollary 2.1 indicates:
|
Maximal ideals are prime. |
|
For every ideal 𝔞 ≠ R of a ring R, there exists a maximal ideal 𝔪 of R with 𝔞 ⊆ 𝔪. Proof Since 𝔞 ≠ R, the set S of all ideals 𝔟 of R with 𝔞 ⊆ 𝔟 and 𝔟 ≠ R is non-empty and partially ordered under ⊆. The union of any chain in S is again an ideal belonging to S and is an upper bound of the chain. By Zorn’s lemma, S contains a maximal element 𝔪, which is a maximal ideal of R containing 𝔞. |
Recall how we have defined homomorphisms of groups. In a similar manner, we define homomorphisms of rings. A ring homomorphism is a map from one ring to another, which respects addition, multiplication and the identity element. More precisely:
|
Let R and S be rings. A function f : R → S is called a homomorphism (of rings), if f(a + b) = f(a) + f(b) and f(ab) = f(a)f(b) for all a, b ∈ R, and if f(1) = 1. |
Let f : R → S be a ring homomorphism. If 𝔟 is an ideal of S, then 𝔞 := f⁻¹(𝔟) is an ideal of R.

Proof For a, a′ ∈ 𝔞, we have f(a – a′) = f(a) – f(a′) ∈ 𝔟, so that a – a′ ∈ 𝔞, that is, 𝔞 is a subgroup of (R, +). Moreover, for r ∈ R and a ∈ 𝔞, we have f(ra) = f(r)f(a) ∈ 𝔟, so that ra ∈ 𝔞.
|
The ideal 𝔞 = f⁻¹(𝔟) of the above proposition is called the contraction of 𝔟 and is often denoted by 𝔟ᶜ. If R ⊆ S and f is the inclusion homomorphism, then 𝔟ᶜ = 𝔟 ∩ R.
|
Let f : R → S be a ring homomorphism. The set Ker f := f⁻¹(0) = {a ∈ R | f(a) = 0} is called the kernel of f, and the set Im f := f(R) is called the image of f. |
|
With the notations of the last definition, Ker f is an ideal of R, Im f is a subring of S and R/Ker f ≅ Im f. Proof Consider the map φ : R/Ker f → Im f that takes the coset a + Ker f to f(a). One checks, as in the group-theoretic case, that φ is well-defined and bijective, and that it respects addition, multiplication and the identity element. Hence φ is an isomorphism of rings. |
|
Two ideals 𝔞 and 𝔟 of a ring R are called coprime, if 𝔞 + 𝔟 = R. If 𝔞 and 𝔟 are coprime ideals of R, then R/(𝔞 ∩ 𝔟) ≅ (R/𝔞) × (R/𝔟), the isomorphism being induced by the map that takes a to the pair (a + 𝔞, a + 𝔟). |
In Section 2.5, we will see an interesting application of this theorem. Notice that the injectivity of the map in the last proof does not require the coprimality of 𝔞 and 𝔟; only the surjectivity of the map requires this condition.
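For R = ℤ, this theorem specializes to the classical Chinese remainder theorem, the application anticipated for Section 2.5. A sketch, not from the book (Python assumed; `pow(m, -1, n)` computes a modular inverse and needs Python 3.8 or later):

```python
# Chinese remainder theorem in Z: for coprime moduli m and n, the map
# x -> (x mod m, x mod n) is a bijection from Z/mnZ onto Z/mZ x Z/nZ,
# so a pair of residues determines a unique x modulo m*n.

from math import gcd

def crt(r1: int, m: int, r2: int, n: int) -> int:
    """Return the unique x mod m*n with x = r1 (mod m) and x = r2 (mod n)."""
    assert gcd(m, n) == 1
    u, v = pow(m, -1, n), pow(n, -1, m)       # v*n = 1 (mod m), u*m = 1 (mod n)
    return (r1 * v * n + r2 * u * m) % (m * n)

x = crt(2, 3, 3, 5)        # x = 2 (mod 3) and x = 3 (mod 5)
assert x % 3 == 2 and x % 5 == 3
print(x)                   # 8
```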
Now we introduce the concept of divisibility in a ring. We also discuss an important type of rings known as unique factorization domains. This study is a natural generalization of that of the rings ℤ and K[X], K a field.
Note that for ℤ the concepts of prime and irreducible elements are the same. This is indeed true for any PID (Proposition 2.12). Thus our conventional definition of a prime integer p > 0 as one which has only 1 and p as (positive) divisors tallies with the definition of irreducible elements above. For the ring K[X], on the other hand, it is more customary to talk about irreducible polynomials instead of prime polynomials; they are the same thing anyway.
|
Let R be an integral domain and Proof Let p = ab. Then p|(ab), so that by hypothesis p|a or p|b. If p|a, then a = up for some |
|
Let R be a PID. An element Proof [if] Let p be irreducible, but not prime. Then there are a, [only if] Immediate from Proposition 2.11. |
|
Let R be a UFD. An element Proof The only if part is immediate from Proposition 2.11. For proving the if part, let p = up1 · · · pr ( |
A classical example of an integral domain that is not a UFD is
. In this ring, we have two essentially different factorizations
of 6 into irreducible elements. The failure of irreducible elements to be prime in such rings is a serious defect!
|
A PID is a UFD Proof Let R be a PID and |
The converse of the above theorem is not necessarily true. For example, the polynomial ring K[X1, . . . , Xn] over a field K is a UFD for every
, but not a PID for n ≥ 2.
Divisibility in a UFD can be rephrased in terms of prime factorizations. Let R be a UFD and let the non-zero elements a,
have the prime factorizations
and
with units u, u′, pairwise non-associate primes p1, . . . , pr and with αi ≥ 0 and βi ≥ 0. Then a|b if and only if αi ≤ βi for all i = 1, . . . , r. This notion leads to the following definitions.
It is clear that these definitions of gcd and lcm can be readily generalized for any arbitrary finite number of elements.
|
Let R be a UFD and a, Proof Immediate from the definitions. |
|
Let R be a UFD and a, b, Proof Consider the prime factorizations of a, b and c. |
For a PID, the gcd and lcm have equivalent characterizations.
A direct corollary to the last proposition is the following.
This completes our short survey of factorization in rings. Note that
and K[X] (for a field K) are PIDs and hence UFDs. Thus all the results we have proved in this section apply equally well to both these rings. It is because of this (and not mere coincidence) that these two rings enjoy many common properties. Our abstract treatment thus saves us the duplicate effort of proving the same results once for integers (Section 2.5) and once more for polynomials (Section 2.6).
| 2.21 | For a non-zero ring R, prove the following assertions:
Let K be a field. What are the units in the polynomial ring K[X]? In K[X1, . . . , Xn]? In the ring K(X) of rational functions? In K(X1, . . . , Xn)? |
| 2.22 | Binomial theorem Let R be a ring, a,
are the binomial coefficients. |
| 2.23 | Show that every non-zero ring has a maximal (and hence prime) ideal. More generally, show that every non-unit ideal of a non-zero ring is contained in a maximal ideal. [H] |
| 2.24 | Let R be a ring.
|
| 2.25 | Show that a finite integral domain R is a field. [H] |
| 2.26 | Let R be a ring of characteristic 0. Show that:
|
| 2.27 | Let f : R → S be a ring-homomorphism and let and be ideals in R and S respectively. Find examples to corroborate the following statements.
|
| 2.28 | Let K be a field.
|
| 2.29 |
|
| 2.30 | Let R be a ring and let and be ideals of R with . Show that is an ideal of and that . [H]
|
| 2.31 | An integral domain R is called a Euclidean domain (ED) if there is a map satisfying the following two conditions:
Show that:
|
| 2.32 | Let R be a ring and an ideal. Consider the set
Show that
|
| 2.33 | Let R be a ring. An ascending chain of ideals is a sequence . The ascending chain is called stationary, if there is some such that for all n ≥ n0. Show that the following conditions are equivalent. [H]
|
| 2.34 |
|
The set
of integers is the main object of study in this section. We use many results from previous sections to derive properties of integers. Recall that
is a PID and hence a UFD.
The notions of divisibility, prime and relatively prime integers, gcd and lcm of integers are essentially the same as discussed in connection with a PID or a UFD. We avoid repeating the definitions here, but concentrate on other useful properties of integers, not covered so far. We only mention that whenever we talk about a prime integer, or the gcd or lcm of two or more integers, we will usually refer to a non-negative integer. This convention makes primes, gcds and lcms unique.
|
There are infinitely many prime integers. Proof Let |
The integers q and r in the above theorem are respectively called the quotient and the remainder of Euclidean division of a by b and are denoted respectively by a quot b and a rem b. Do not confuse Euclidean division with the division (that is, the inverse of multiplication) of the ring
. Euclidean division is the basis of the Euclidean gcd algorithm. More specifically:
|
For integers a, b with b ≠ 0, let r be the remainder of Euclidean division of a by b. Then gcd(a, b) = gcd(b, r). Proof Clearly, 〈a〉 + 〈b〉 = 〈r〉 + 〈b〉. Now use Proposition 2.14. |
|
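The lemma above is the engine of the Euclidean gcd algorithm: replace (a, b) by (b, a rem b) until the remainder vanishes. A minimal sketch follows (we use Python purely for illustration; the book itself prescribes no programming language):

```python
def gcd(a: int, b: int) -> int:
    """Euclidean gcd: gcd(a, b) = gcd(b, a rem b), iterated until b = 0."""
    a, b = abs(a), abs(b)
    while b != 0:
        a, b = b, a % b
    return a
```

For example, gcd(360, 997) = 1, so 360 and 997 are relatively prime.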
Let a and b be two integers, not both zero, and let d be the (positive) gcd of a and b. Then there are integers u and v such that d = ua + vb. (Such an equality is called a Bézout relation.) Furthermore, if a and b are both non-zero and (|a|, |b|) ≠ (1, 1), then u and v can be so chosen that |u| < |b| and |v| < |a|. Proof The existence of u and v follows immediately from Proposition 2.14. If a = qb, then u = 0 and v = 1 is a suitable choice. So assume that a ∤ b and b ∤ a, in which case d < |a| and d < |b|. We may assume, without loss of generality, that a and b are positive. First note that if (u, v) satisfies the Bézout relation, then for any |
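The Bézout coefficients u and v can be computed alongside the gcd by the extended Euclidean algorithm; a sketch under the same illustrative conventions (non-negative inputs assumed):

```python
def ext_gcd(a: int, b: int) -> tuple[int, int, int]:
    """Return (d, u, v) with d = gcd(a, b) and d = u*a + v*b, for a, b >= 0."""
    if b == 0:
        return (a, 1, 0)
    # If d = u'*b + v'*(a rem b), then d = v'*a + (u' - (a quot b)*v')*b.
    d, u, v = ext_gcd(b, a % b)
    return (d, v, u - (a // b) * v)
```

For a = 240, b = 46 this returns d = 2 together with a valid Bézout pair (u, v).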
The notions of the gcd and of the Bézout relation can be generalized to any finite number of integers a1, . . . , an as
gcd(a1, . . . , an) = gcd(· · · (gcd(gcd(a1, a2), a3) · · ·), an) = u1a1 + · · · + unan
for some integers u1, . . . , un (provided that all the gcds mentioned are defined).
Since
is a PID, congruence modulo a non-zero ideal of
can be rephrased in terms of congruence modulo a positive integer as follows.
|
Let |
By an abuse of notation, we often denote the equivalence class [a] of
simply by a. The following are some basic properties of congruent integers.
|
Let
Proof (1) and (2) follow from the consideration of the quotient ring |
Let
with gcd(ni, nj) = 1 for i ≠ j. Then lcm(n1, . . . , nr) = n1 · · · nr, and by the Chinese remainder theorem (Theorem 2.10), we have

This implies that, given integers a1, . . . , ar, there exists an integer x unique modulo n1 · · · nr such that x satisfies the following congruences simultaneously:
| x | ≡ | a1 (mod n1) |
| x | ≡ | a2 (mod n2) |
| ⋮ | ||
| x | ≡ | ar (mod nr) |
We now give a procedure for constructing the integer x explicitly. Define N := n1 · · · nr and Ni := N/ni for 1 ≤ i ≤ r. Then for each i we have gcd(ni, Ni) = 1 and, therefore, there are integers ui and vi with uini + viNi = 1. Then
x ≡ a1v1N1 + · · · + arvrNr (mod N) is the desired solution.
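The construction above is directly algorithmic; a sketch (Python used only for illustration, with pow(·, -1, n) computing a modular inverse):

```python
from math import prod

def crt(residues: list[int], moduli: list[int]) -> int:
    """Solve x ≡ a_i (mod n_i) for pairwise coprime moduli n_i, by the
    construction in the text: x = sum(a_i * v_i * N_i) mod N, where
    N_i = N/n_i and v_i is an inverse of N_i modulo n_i."""
    N = prod(moduli)
    x = 0
    for a_i, n_i in zip(residues, moduli):
        N_i = N // n_i
        v_i = pow(N_i, -1, n_i)   # v_i * N_i ≡ 1 (mod n_i)
        x += a_i * v_i * N_i
    return x % N
```

For example, crt([2, 3, 2], [3, 5, 7]) returns 23, the unique solution modulo 105.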
Let
. We now study the multiplicative group
of the ring
. We say that an integer
has a multiplicative inverse modulo n, if
, or, equivalently, if there is an integer b with ab ≡ 1 (mod n). The following proposition is an important characterization of the elements of
.
|
(The equivalence class of) an integer a belongs to Proof [if] By Proposition 2.16, there exist integers u and v such that ua + vn = 1. But then ua ≡ 1 (mod n). [only if] For some integers u and v, we have ua + vn = 1, which implies that the gcd of a and n divides 1 and hence is equal to 1. |
|
The cardinality of |
The following two theorems are immediate consequences of Proposition 2.4.
|
Let aφ(n) ≡ 1 (mod n). |
|
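Euler's theorem is easy to verify numerically; the following check (an illustration we add, not from the text) enumerates the units modulo 20 and raises each to the power φ(20):

```python
from math import gcd

n = 20
units = [a for a in range(1, n) if gcd(a, n) == 1]   # elements of (Z/20Z)*
phi_n = len(units)                                   # phi(20) = 8
assert phi_n == 8
for a in units:
    # Euler's theorem: a^phi(n) ≡ 1 (mod n) whenever gcd(a, n) = 1
    assert pow(a, phi_n, n) == 1
```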
Let p be a prime and ap–1 ≡ 1 (mod p). For any integer |
|
For every prime p, we have (p – 1)! ≡ –1 (mod p). Proof The result holds for p = 2. So assume that p is an odd prime. Since Equation 2.1
Looking at the constant terms on the two sides proves Wilson’s theorem. |
The structure of the group
,
, can be easily deduced from Fermat’s little theorem. This gives us the following important result.
|
For a prime p, the group Proof For every divisor d of p –1, we have Xp–1–1 = (Xd–1)f(X) for some |
Euler’s totient function plays an extremely important role in number theory (and cryptology). We now describe a method for computing it.
|
If p is a prime and Proof Integers between 0 and pe – 1, which are relatively prime to pe are precisely those that are not multiples of p. |
|
Let
Proof Immediate from Lemmas 2.2 and 2.3. |
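Combining the two lemmas gives the familiar product formula φ(n) = n · Π(1 − 1/p), the product running over the distinct primes p dividing n. A direct sketch by trial-division factorization (our illustration; practical only for moderate n):

```python
def phi(n: int) -> int:
    """Euler's totient via phi(p^e) = p^(e-1)*(p-1) and multiplicativity,
    i.e. phi(n) = n * prod(1 - 1/p) over the distinct primes p dividing n."""
    result, m, p = n, n, 2
    while p * p <= m:
        if m % p == 0:
            while m % p == 0:
                m //= p
            result -= result // p      # multiply result by (1 - 1/p)
        p += 1
    if m > 1:                          # leftover prime factor of n
        result -= result // m
    return result
```

For example, phi(360) = 360 · (1/2) · (2/3) · (4/5) = 96.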
By Proposition 2.18, the linear congruence ax ≡ 1 (mod n) is solvable for x if and only if gcd(a, n) = 1. In such a case, the solution is unique modulo n. Now, let us concentrate on the solutions of the general linear congruence:
ax ≡ b (mod n).
Theorem 2.17 characterizes the solutions of this congruence.
|
Let d := gcd(a, n). Then the congruence ax ≡ b (mod n) is solvable for x if and only if d|b. A solution of the congruence, if existent, is unique modulo n/d. Proof [if] By Proposition 2.17, (a/d)x ≡ b/d (mod n/d). Since gcd(a/d, n/d) = 1, the congruence (a/d)x′ ≡ 1 (mod n/d) is solvable for x′. Then a solution for x is x ≡ (b/d)x′ (mod n/d). [only if] There exists an integer k such that ax + kn = b. This shows that d|b. To prove the uniqueness let x and x′ be two integers satisfying the given congruence. But then a(x – x′) ≡ 0 (mod n), that is, (a/d)(x – x′) ≡ 0 (mod n/d), that is, x – x′ ≡ 0 (mod n/d), since gcd(a/d, n/d) = 1. |
The last theorem implies that if d|b, then the congruence ax ≡ b (mod n) has d solutions modulo n. These solutions are given by ξ + r(n/d), r = 0, . . . , d – 1, where ξ is the solution modulo n/d of the congruence (a/d)ξ ≡ b/d (mod n/d).
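Theorem 2.17 together with the remark above gives a complete procedure for ax ≡ b (mod n); a sketch (illustrative Python, using the built-in modular inverse):

```python
from math import gcd

def solve_linear(a: int, b: int, n: int) -> list[int]:
    """All solutions modulo n of a*x ≡ b (mod n): solvable iff
    d = gcd(a, n) divides b, in which case the d solutions are
    xi + r*(n/d), r = 0..d-1, with xi solving (a/d)*xi ≡ b/d (mod n/d)."""
    d = gcd(a, n)
    if b % d != 0:
        return []                      # no solution
    m = n // d
    xi = (b // d) * pow(a // d, -1, m) % m
    return [xi + r * m for r in range(d)]
```

For example, solve_linear(6, 4, 10) returns [4, 9], while 2x ≡ 1 (mod 4) has no solution.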
In this section, we consider quadratic congruences, that is, congruences of the form ax2+bx+c ≡ 0 (mod n). We start with the simple case n = p, a prime. We assume further that p is odd, so that 2 has a multiplicative inverse mod p. Since we are considering quadratic equations, we are interested only in those integers a for which gcd(a, p) = 1. In that case, a also has a multiplicative inverse mod p and the above congruence can be written as y2 ≡ α (mod p), where y ≡ x + b(2a)–1 (mod p) and α ≡ b2(4a2)–1 – ca–1 (mod p). This motivates us to provide Definition 2.29.
If a is a quadratic residue modulo an odd prime p, then the equation x2 ≡ a (mod p) has exactly two solutions. If ξ is one solution, the other solution is p – ξ. It is, therefore, evident that there are exactly (p – 1)/2 quadratic residues and exactly (p – 1)/2 quadratic non-residues modulo p. For example, the quadratic residues modulo p = 11 are 1 = 1² = 10², 3 = 5² = 6², 4 = 2² = 9², 5 = 4² = 7² and 9 = 3² = 8². The quadratic non-residues modulo 11 are, therefore, 2, 6, 7, 8 and 10. We treat 0 neither as a quadratic residue nor as a quadratic non-residue.
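These counts are easy to confirm by brute force; the following check (our illustration) recovers the lists for p = 11:

```python
p = 11
# square every non-zero residue; duplicates collapse in the set
quadratic_residues = sorted({x * x % p for x in range(1, p)})
assert quadratic_residues == [1, 3, 4, 5, 9]     # exactly (p-1)/2 = 5 residues
non_residues = sorted(set(range(1, p)) - set(quadratic_residues))
assert non_residues == [2, 6, 7, 8, 10]
```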
|
Let p be an odd prime and a an integer with gcd(a, p) = 1. The Legendre symbol
|
|
Let p be an odd prime and a and b integers coprime to p.
Proof If a is a quadratic residue modulo p, then a ≡ b2 (mod p) for some integer b (coprime to p) and by Fermat’s little theorem we have a(p–1)/2 ≡ bp–1 ≡ 1 (mod p). Conversely, the polynomial Xp–1 – 1 = (X(p–1)/2 – 1)(X(p–1)/2 + 1) has p – 1 (distinct) roots mod p (again by Fermat’s little theorem). We have seen that no quadratic residues are roots of X(p–1)/2 + 1. Since |
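Euler's criterion immediately gives a one-exponentiation computation of the Legendre symbol; a sketch (Python's built-in three-argument pow performs the modular exponentiation):

```python
def legendre(a: int, p: int) -> int:
    """Legendre symbol of a modulo an odd prime p, gcd(a, p) = 1,
    by Euler's criterion: a^((p-1)/2) ≡ +1 or -1 (mod p)."""
    r = pow(a, (p - 1) // 2, p)
    return -1 if r == p - 1 else r
```

For example, legendre(360, 997) evaluates to 1, so 360 is a quadratic residue modulo 997.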
Euler’s criterion gives us a nice way to check if a given integer is a quadratic residue modulo an odd prime. While this is much faster than the brute-force strategy of enumerating all the quadratic residues, it is still not the best solution, because it involves a modular exponentiation. We can, however, employ a gcd-like procedure for a faster computation. The development of this method demands further results which are otherwise interesting in themselves as well. The first important result is known as the law of quadratic reciprocity (Theorem 2.18 below). Gauss was the first to prove it and he deemed the result so important that he gave eight proofs for it. At present about two hundred published proofs of this law exist in the literature. We go in the classical way, that is, the Gaussian way, because the proof, though somewhat long, is elementary.
|
Let p be an odd prime and a an integer with gcd(a, p) = 1. Let us denote t := (p – 1)/2. For an integer i, let ri be the unique integer with ri ≡ ia (mod p) and –t ≤ ri ≤ t. Let n be the number of i, 1 ≤ i ≤ t, for which ri is negative. Then Proof It is easy to check that ri ≢ ±rj (mod p) for all i ≠ j with 1 ≤ i, j ≤ t. Thus |ri|, i = 1, . . . , t, are precisely (a permuted version of) the integers 1, . . . , t. Thus |
|
Let |
|
With the notations of Lemma 2.4 we have Proof Since If a is odd, p + a is even. Also 4 is a quadratic residue modulo p. So |
|
Let p and q be distinct odd primes. Then Proof By Corollary 2.7, |
To demonstrate how we can use the results deduced so far, let us compute
. Since 360 = 23 · 32 · 5, we have

Thus 360 is a quadratic residue modulo 997. The apparent attractiveness of this method is offset by the fact that it demands the factorization of several integers and so does not lead to a practical algorithm. We need further machinery in order to obtain an efficient algorithm. First, we define a generalization of the Legendre symbol.
|
Let a, b be integers with b > 0 and odd. We define the Jacobi symbol
where, in the last case, p1, . . . , pt are all the prime factors of b (not necessarily all distinct). |
Note that if
, then a is not a quadratic residue mod b. However, the converse is not always true, that is,
does not necessarily imply that a is a quadratic residue modulo b (Example: a = 2 and b = 9). Of course, if b is an odd prime and if gcd(a, b) = 1, the Legendre and Jacobi symbols
coincide.
The Jacobi symbol enjoys many properties similar to the Legendre symbol.
|
For integers a, a′ and positive odd integers b, b′, we have:
Proof Immediate from the definition and Proposition 2.21. |
Proof
|
Now, we can calculate
without factoring as follows.

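The properties above yield a gcd-like algorithm for the Jacobi symbol that never factors its arguments; a sketch (illustrative Python):

```python
def jacobi(a: int, b: int) -> int:
    """Jacobi symbol of a modulo odd b > 0, without factoring: repeatedly
    strip factors of 2 using the rule for the symbol of 2, then swap the
    arguments using the reciprocity law for the Jacobi symbol."""
    assert b > 0 and b % 2 == 1
    a %= b
    sign = 1
    while a != 0:
        while a % 2 == 0:
            a //= 2
            if b % 8 in (3, 5):        # symbol of 2 is -1 iff b ≡ ±3 (mod 8)
                sign = -sign
        a, b = b, a                    # reciprocity: flip iff both ≡ 3 (mod 4)
        if a % 4 == 3 and b % 4 == 3:
            sign = -sign
        a %= b
    return sign if b == 1 else 0       # gcd(a, b) > 1 gives symbol 0
```

This recovers jacobi(360, 997) = 1 from the worked example, and jacobi(2, 9) = 1 even though 2 is not a quadratic residue modulo 9.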
So far, we have studied some elementary properties of integers. Number theory is, however, one of the oldest and widest branches of mathematics. Various complex-analytic and algebraic tools have been employed to derive more complicated properties of integers. In Section 2.13, we give a short introductory exposition to algebraic number theory. Here, we mention a collection of useful results from analytic number theory. The proofs of these analytic results would lead us too far away and hence are omitted here. Inquisitive (and/or cynical) readers may consult textbooks on analytic number theory for the details missing here.
The famous prime number theorem gives an asymptotic estimate of the density of primes smaller than or equal to a positive real number. Gauss conjectured this result in 1791. Many mathematicians tried to prove it during the 19th century and came up with partial results. Riemann made reasonable progress towards proving the theorem, but could not furnish a complete proof before he died in 1866. It is interesting to mention here that a good portion of the theory of analytic functions (also called holomorphic functions) in complex analysis was developed during these attempts to prove the prime number theorem. The first complete proof of the theorem (based mostly on the ideas of Riemann and Chebyshev) was given independently by the French mathematician Hadamard and by the Belgian mathematician de la Vallée Poussin in 1896. Their proof is regarded as one of the major achievements of modern mathematics. People started believing that any proof of the prime number theorem has to be analytic. Erdös and Selberg destroyed this belief by independently providing the first elementary proof of the theorem in 1949. Here (and elsewhere in mathematics), the adjective elementary refers to something which does not depend on results from analysis or algebra. Caution: Elementary is not synonymous with easy !
|
Let π(x) denote the number of primes less than or equal to a real number x > 0. As x → ∞ we have π(x) ~ x/ln x (that is, the ratio π(x)/(x/ln x) → 1). In particular, for |
Though the prime number theorem provides an asymptotic estimate (that is, one for x → ∞), for finite values of x (for example, for the values of x in the cryptographic range) it does give good approximations for π(x). Table 2.1 lists π(x) against the rounded values of x/ ln x for x equal to small powers of 10.
| x | π(x) | x/ ln x | x/(ln x – 1) | Li(x) |
|---|---|---|---|---|
| 10³ | 168 | 145 | 169 | 178 |
| 10⁴ | 1229 | 1086 | 1218 | 1246 |
| 10⁵ | 9592 | 8686 | 9512 | 9630 |
| 10⁶ | 78,498 | 72,382 | 78,030 | 78,628 |
| 10⁷ | 664,579 | 620,421 | 661,458 | 664,918 |
| 10⁸ | 5,761,455 | 5,428,681 | 5,740,304 | 5,762,209 |
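The first rows of Table 2.1 can be reproduced directly; a small sieve of Eratosthenes (our illustration, practical only for small x) recovers π(10³) = 168 and π(10⁴) = 1229:

```python
def primepi(x: int) -> int:
    """pi(x): count of primes <= x, by a sieve of Eratosthenes."""
    if x < 2:
        return 0
    is_prime = bytearray([1]) * (x + 1)
    is_prime[0] = is_prime[1] = 0
    for p in range(2, int(x ** 0.5) + 1):
        if is_prime[p]:
            # cross out every multiple of p starting at p*p
            is_prime[p * p :: p] = bytearray(len(range(p * p, x + 1, p)))
    return sum(is_prime)
```

Against these exact counts, the estimate x/ln x gives 145 and 1086 respectively, as listed in the table.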
Given the prime number theorem it follows that π(x) approaches x/(ln x – ξ) for any real ξ. It turns out that ξ = 1 is the best choice. Gauss’ Li function is also an asymptotic estimate for π(x), where for real x > 0 one defines:

Gauss conjectured that Li(x) asymptotically equals π(x). The prime number theorem is, in fact, equivalent to this conjecture. Furthermore, de la Vallée Poussin proved that Li(x) is a better approximation to π(x) than x/(ln x – ξ) for any real ξ. Table 2.1 also lists x/(ln x – 1) and Li(x) against the actual values of π(x).
The asymptotic formula
does not rule out the possibility that the error π(x)–(x/ ln x) tends to zero as x → ∞. It has been shown by Dusart [83] that (x/ ln x) – 0.992(x/ ln2 x) ≤ π(x) ≤ (x/ ln x) + 1.2762(x/ ln2 x) for all x > 598.
Integers having only small prime divisors play an interesting role in cryptography and in number theory in general.
|
Let |
The following theorem gives an asymptotic estimate for ψ(x, y).
|
Let u := ln x/ ln y. Then ψ(x, y)/x = u^(–u+o(u)) = e^(–(1+o(1))u ln u) as x → ∞. |
In Theorem 2.21, the notation g(u) = o(f(u)) implies that the ratio g(u)/f(u) tends to 0 as u approaches ∞. See Definition 3.1 for more details. An interesting special case of the formula for ψ(x, y) will be used quite often in this book and is given as Corollary 4.1 in Chapter 4.
Like the prime number theorem, Theorem 2.21 gives only asymptotic estimates, but is indeed a good approximation for finite values of x, y and u (that is, for the values of practical interest). The most important implication of this theorem is that the density of y-smooth integers in the set {1, . . . , x} is a very sensitive function of u = ln x/ ln y and decreases very rapidly as x increases. For example, if y = 15,485,863, the millionth prime, then a random integer ≤ 2²⁵⁰ is y-smooth with probability approximately 2.12 × 10⁻¹¹, whereas a random integer ≤ 2⁵⁰⁰ is y-smooth with probability approximately 2.23 × 10⁻²⁸. (These figures are computed neglecting the o(u) term in the expression of ψ(x, y).) In other words, smaller integers have higher probability of being smooth (that is, y-smooth for a given y).
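The figures quoted above can be reproduced from the u-to-the-power-minus-u approximation, dropping the o(u) term exactly as the text does; an illustrative sketch:

```python
from math import exp, log

def smooth_probability(log2_x: float, y: float) -> float:
    """Approximate probability that a random integer <= 2^log2_x is
    y-smooth, via psi(x, y)/x ≈ u^(-u) with u = ln x / ln y
    (the o(u) correction is ignored, as in the text's figures)."""
    u = log2_x * log(2.0) / log(y)
    return exp(-u * log(u))
```

With y = 15,485,863 this gives roughly 2.1 × 10⁻¹¹ for 250-bit integers and roughly 2.2 × 10⁻²⁸ for 500-bit ones, matching the text.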
The Riemann hypothesis (RH) is one of the deepest unsolved problems in mathematics. An extended version of this hypothesis has important bearings on the solvability of certain computational problems in polynomial time.
|
The Euler zeta function ζ(s) is defined for a complex variable s with Re s > 1 as
The reader may already be familiar with the results: ζ(1) = ∞, ζ(2) = π2/6 and ζ(4) = π4/90. Riemann (analytically) extended the Euler Zeta function for all complex values of s (except at s = 1, where the function has a simple pole). This extended function, called the Riemann zeta function, is known to have zeros at s = –2, –4, –6, . . . . These are called the trivial zeros of ζ(s). It can be proved that all non-trivial zeros of ζ(s) must lie in the so-called critical strip : 0 ≤ Re s ≤ 1, and are symmetric about the critical line: Re s = 1/2. |
In 1900, Hilbert asserted that proving or disproving the RH is one of the most important problems confronting 20th-century mathematicians. It remains just as important for the mathematicians of the 21st century.
In 1901, von Koch proved that the RH is equivalent to the formula:
|
π(x) = Li(x) + O(x1/2 ln x) |
Here the order notation f(x) = O(g(x)) means that |f(x)/g(x)| is less than a constant for all sufficiently large x (See Definition 3.1).
Hadamard and de la Vallée Poussin proved that

for some positive constant α. While this estimate was sufficient to prove the prime number theorem, the tighter bound of Conjecture 2.2 continues to remain unproved.
|
Let a, |
Dirichlet’s theorem is a powerful generalization of Theorem 2.12 (which corresponds to a = b = 1). One can accordingly generalize the notation π(x) as follows:
|
Let a, |
The prime number theorem gives the estimate:

where φ is Euler’s totient function. The RH now generalizes to:
|
For a,
|
Some authors use the expression Generalized Riemann hypothesis (GRH) in place of ERH. Taking b = 1 demonstrates that the ERH implies the RH. The ERH also implies the following:
|
The smallest positive quadratic non-residue modulo a prime p is < 2 ln2 p. |
| 2.35 |
|
| 2.36 | Let and S a subset of {1, 2, ..., 2n} of cardinality n + 1. Show that: [H]
|
| 2.37 | Show that for any , n > 1, the rational number is not an integer. [H]
|
| 2.38 |
|
| 2.39 | Let n ≥ 2 be a natural number. A complete residue system modulo n is a set of n integers a1, . . . , an such that ai ≢ aj (mod n) for i ≠ j. Similarly, a reduced residue system modulo n is a set of φ(n) integers b1, . . . , bφ(n) such that gcd(bi, n) = 1 for all i = 1, . . . , φ(n) and bi ≢ bj (mod n) for i ≠ j. Show that:
|
| 2.40 | Prove that the decimal expansion of any rational number a/b is recurring, that is, (eventually) periodic. (A terminating expansion may be viewed as one with recurring 0.) [H] |
| 2.41 | Let p be an odd prime. Show that the congruence x2 ≡ –1 (mod p) is solvable if and only if p ≡ 1 (mod 4). [H] |
| 2.42 | Let .
|
| 2.43 | For , show that .
|
| 2.44 | Let n > 2 and gcd(a, n) = 1. Let h be the multiplicative order of a modulo n (that is, in the group ). Show that:
|
| 2.45 | Devise a criterion for the solvability of ax2 + bx + c ≡ 0 (mod p), where p is an odd prime and gcd(a, p) = 1. [H] |
| 2.46 | Let p be a prime and . An integer a with gcd(a, p) = 1 is called an r-th power residue modulo p, if the congruence xr ≡ a (mod p) has a solution. Show that a is an r-th power residue modulo p if and only if a(p–1)/ gcd(r, p–1) ≡ 1 (mod p). This is a generalization of Euler’s criterion for quadratic residues.
|
| 2.47 | Let G be a finite cyclic group of cardinality n. Show that and that there are exactly φ(n) generators (that is, primitive elements) of G.
|
| 2.48 | Let m, with m|n. Show that the canonical (surjective) ring homomorphism induces a surjective group homomorphism of the respective groups of units. (Note that every ring homomorphism induces a group homomorphism , where A* and B* are the groups of units of A and B respectively. Even when is surjective, need not be surjective, in general. As an example consider the canonical surjection for a prime p > 3.)
|
| 2.49 | In this exercise, we investigate which of the groups is cyclic for a prime p and .
|
| 2.50 | Show that the multiplicative group , n ≥ 2, is cyclic if and only if n = 2, 4, pe, 2pe, where p is an odd prime and . [H]
|
Unless otherwise stated, in this section we denote by K an arbitrary field and by K[X] the ring of polynomials in one indeterminate X and with coefficients from K. Since K[X] is a PID, it enjoys many properties similar to those of
. To start with, we take a look at these properties. Then we introduce the concept of algebraic elements and discuss how irreducible polynomials can be used to construct (algebraic) extensions of fields. When no confusion is likely, we denote a polynomial
by f only.
Since K[X] is a PID and hence a UFD, every polynomial in K[X] can be written essentially uniquely as a product of prime polynomials. Prime polynomials are conventionally referred to as irreducible polynomials. Similar to the case of
the ring K[X] contains an infinite number of irreducible elements, for if K is infinite, then
is an infinite set of irreducible polynomials of K[X], and if K is finite, then as we will see later, there is an irreducible polynomial of degree d in K[X] for every
.
It is important to note here that the concept of irreducibility of a polynomial is very much dependent on the field K. If K ⊆ L is a field extension, then a polynomial in K[X] is naturally an element of L[X] also. A polynomial which is irreducible over K need not remain so over L. For example, the polynomial X2 – 2 is irreducible over
, but reducible over
, since
,
being a real number but not a rational number. As a second example, the polynomial X2 + 1 is irreducible over both
and
but not over
. In fact, we will show shortly that an irreducible polynomial in K[X] of degree > 1 becomes reducible over a suitable extension of K.
For polynomials f(X),
with g(X) ≠ 0, there exist unique polynomials q(X) and r(X) in K[X] such that f(X) = q(X)g(X) + r(X) with r(X) = 0 or deg r(X) < deg g(X). The polynomials q(X) and r(X) are respectively called the quotient and remainder of polynomial division of f(X) by g(X) and can be obtained by the so-called long division procedure. We use the notations: q(X) = f(X) quot g(X) and r(X) = f(X) rem g(X).
Whenever we talk about the gcd of two non-zero polynomials, we usually refer to the monic gcd, that is, a polynomial with leading coefficient 1. This makes the gcd of two polynomials unique. We have gcd(f(X), g(X)) = gcd(g(X), r(X)), where r(X) = f(X) rem g(X). This gives rise to an algorithm (similar to the Euclidean gcd algorithm for integers) for computing the gcd of two polynomials. Bézout relations also hold for polynomials. More specifically:
|
Let f(X),
Proof Similar to the proof of Proposition 2.16. |
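The polynomial Euclidean algorithm is identical in shape to the integer one; a sketch over the prime field GF(p), with a polynomial stored as its coefficient list, lowest degree first (our own illustrative representation):

```python
def poly_rem(f: list[int], g: list[int], p: int) -> list[int]:
    """f rem g in GF(p)[X]; coefficients listed from degree 0 upward."""
    f = [c % p for c in f]
    g = [c % p for c in g]
    while g and g[-1] == 0:
        g.pop()
    inv = pow(g[-1], -1, p)            # inverse of the leading coefficient
    while True:
        while f and f[-1] == 0:        # drop leading zeros
            f.pop()
        if len(f) < len(g):
            return f
        c = f[-1] * inv % p
        shift = len(f) - len(g)
        for i, gi in enumerate(g):     # subtract c * X^shift * g(X)
            f[i + shift] = (f[i + shift] - c * gi) % p

def poly_gcd(f: list[int], g: list[int], p: int) -> list[int]:
    """Monic gcd in GF(p)[X]: gcd(f, g) = gcd(g, f rem g), as for integers."""
    while any(g):
        f, g = g, poly_rem(f, g, p)
    while f and f[-1] == 0:
        f.pop()
    inv = pow(f[-1], -1, p)
    return [c * inv % p for c in f]    # normalize to leading coefficient 1
```

Over GF(2), for instance, the gcd of X² + 1 and X + 1 comes out as X + 1, since X² + 1 = (X + 1)² in GF(2)[X].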
The concept of congruence can be extended to polynomials, namely, if
, then two polynomials g(X),
are said to be congruent modulo f(X), denoted g(X) ≡ h(X) (mod f(X)), if f(X)|(g(X) – h(X)), that is, if there exists
with g(X) – h(X) = u(X)f(X), or equivalently, if g(X) rem f(X) = h(X) rem f(X).
The principal ideals 〈f(X)〉 of K[X] play an important role (as do the ideals 〈n〉 of
). Let us investigate the structure of the quotient ring R := K[X]/〈f(X)〉 for a non-constant polynomial
. If r(X) denotes the remainder of division of
by f(X), then it is clear that the residue classes of g(X) and r(X) are the same in R. On the other hand, two polynomials g(X),
with deg g(X) < deg f(X) and deg h(X) < deg f(X) represent the same residue class in R if and only if g(X) = h(X). Thus elements of R are uniquely representable as polynomials of degrees < deg f(X). In other words, we may represent the ring R as the set
together with addition and multiplication modulo the polynomial f(X). The ring R contains all the constant polynomials
, that is, the field K is canonically embedded in R. In general, R is not a field. The next theorem gives the criterion for R to be a field.
|
For a non-constant polynomial Proof If f(X) is reducible over K, then we can write f(X) = g(X)h(X) for some polynomials g(X), Conversely, if f(X) is irreducible over K and if g(X) is a non-zero polynomial of degree < deg f(X), then gcd(f(X), g(X)) = 1, so that by Proposition 2.23 there exist polynomials u(X), |
Let L := K[X]/〈f(X)〉 with f(X) irreducible over K. Then K ⊆ L is a field extension. If deg f(X) = 1, then L is isomorphic to K. If deg f(X) ≥ 2, then L is a proper extension of K. This gives us a useful and important way of representing the extension field L, given a representation for K. (For example, see Section 2.9.)
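As a concrete instance of this construction (our example, not the text's): X² + 1 has no root in GF(3), hence is irreducible there, so GF(3)[X]/〈X² + 1〉 is a field with 9 elements of the form a + bX. A brute-force check:

```python
p = 3
# X^2 + 1 has no root mod 3 (0^2, 1^2, 2^2 are 0, 1, 1), so it is
# irreducible over GF(3) and the quotient ring is a field of 9 elements.

def mul(u: tuple[int, int], v: tuple[int, int]) -> tuple[int, int]:
    """(a0 + a1*X)(b0 + b1*X) reduced modulo X^2 + 1 over GF(3):
    the X^2 term folds back via X^2 ≡ -1."""
    a0, a1 = u
    b0, b1 = v
    return ((a0 * b0 - a1 * b1) % p, (a0 * b1 + a1 * b0) % p)

elements = [(a, b) for a in range(p) for b in range(p)]
for u in elements:
    if u != (0, 0):
        # field check: every non-zero element has a multiplicative inverse
        assert any(mul(u, v) == (1, 0) for v in elements)
```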
The study of the roots of a polynomial is the central objective in algebra. We now derive some elementary properties of roots of polynomials.
|
Let |
|
Let Proof Polynomial division of f(X) by X – a gives f(X) = (X – a)q(X) + r(X) with deg r(X) < deg(X – a) = 1. Thus r(X) is a constant polynomial. Let us denote r(X) by |
|
A non-zero polynomial Proof We proceed by induction on d. The result clearly holds for d = 0. So assume that d ≥ 1 and that the result holds for all polynomials of degree d – 1. If f has no roots in K, we are done. So assume that f has a root, say, |
In the last proof, the only consequence of K being a field that we have used is that K contains no non-zero zero divisors. This, however, holds for every integral domain. Thus Proposition 2.25 continues to hold if K is any integral domain (not necessarily a field). If K is not an integral domain, however, the proposition need not be true. For example, if ab = 0 with a,
, a ≠ b, then the polynomial X2 + (b – a)X has at least three roots: 0, a and a – b.
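The failure is easy to witness in the ring of integers modulo 6, where 2 · 3 = 0; taking a = 2 and b = 3 along the lines of the text (our concrete choice), the degree-2 polynomial X² + X acquires four roots:

```python
n = 6
# In Z/6Z we have 2 * 3 = 0; with a = 2, b = 3 the polynomial
# X^2 + (b - a)X = X^2 + X has roots beyond the expected two.
roots = [x for x in range(n) if (x * x + x) % n == 0]
assert roots == [0, 2, 3, 5]    # four roots for a degree-2 polynomial
```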
For a field extension K ⊆ L and for a polynomial
, we may think of the roots of f in L, since
too. Clearly, all the roots of f in K are also roots of f in L. However, the converse is not true in general. For example, the only roots of X4 – 1 in
are ±1, whereas the roots of the same polynomial in
are ±1, ±i. Indeed we have the following important result.
|
For any non-constant polynomial Proof If f has a root in K, taking K′ = K proves the proposition. So we assume that f has no root in K (which implies that deg f ≥ 2). In principle, we do not require f to be irreducible. But if we consider a non-constant factor g of f, irreducible over K, we see that the roots of g in any extension L of K are roots of f in L too. Thus we may replace f by g and assume, without loss of generality, that f is irreducible. We construct the field extension K′ := K[X]/〈f〉 of K and denote the equivalence class of X in K′ by α. (One also writes x, X or [X] to denote this equivalence class.) It is clear that |
We say that the field K′ in the proof of the last proposition is obtained by adjoining the root α of f and denote this as K′ = K(α). We can write f(X) = (X – α)f1(X), where
and deg f1 = (deg f) – 1. Now there is a field extension K″ of K′, where f1 has a root. Proceeding in this way we prove the following result.
|
A non-constant polynomial f in K[X] with deg f = d has d roots (not necessarily all distinct) in some field extension L of K. |
If a polynomial
of degree d ≥ 1 has all its roots α1, . . . , αd in L, then f(X) = a(X – α1) · · · (X – αd) for some
(actually
). In this case, we say that f splits (completely or into linear factors) over L.
|
Let
|
Every non-constant polynomial
has a splitting field L over K. Quite importantly, this field L is unique in some sense. This allows us to speak of the splitting field of f instead of a splitting field of f. We discuss these topics further in Section 2.8.
|
Let f be a non-constant polynomial in K[X] and let α be a root of f (in some extension of K). The largest natural number n for which (X –α)n|f(X) is called the multiplicity of the root α (in f). If n = 1 (resp. n > 1), then α is called a simple (resp. multiple) root of f. If all the roots of f are simple, then we call f a square-free polynomial. It is easy to see that f is square-free only if f is not divisible by the square of a non-constant polynomial in K[X]. The reverse implication also holds if char K = 0 or if K is a finite field (or, more generally, if K is a perfect field—see Exercise 2.76).
The notion of multiplicity can be extended to a non-root β of f by setting the multiplicity of β to zero.
Here we assume, unless otherwise stated, that K ⊆ L is a field extension.
|
|
Let |
|
Let Proof Let f = f1f2 for some non-constant polynomials f1, Using polynomial division one can write h(X) = q(X)f(X) + r(X) for some polynomials q, Finally, if f and g are two minimal polynomials of α over K, then f|g and g|f and it follows that g(X) = cf(X) for some unit c of K[X]. But the only units of K[X] are the non-zero elements of K. |
By Proposition 2.28, a monic minimal polynomial f of α over K is uniquely determined by α and K. It is, therefore, customary to define the minimal polynomial of α over K to be this (unique) monic polynomial. Unless otherwise stated, we will stick to this revised definition and write f(X) = minpolyα, K(X).
|
For a field K, the following conditions are equivalent.
Proof [(a)⇒(b)] Consider a non-constant irreducible polynomial [(b)⇒(c)] Let [(c)⇒(d)] Obvious. [(d)⇒(a)] Let |
|
A field K satisfying the equivalent conditions of Proposition 2.29 is called an algebraically closed field. For an arbitrary field K, a minimal algebraically closed field |
We will see in Section 2.8 that an algebraic closure of every field exists and is unique in some sense. The algebraic closure of an algebraically closed field K is K itself. We end this section with the following well-known theorem. We will not prove the theorem in this book, because every known proof of it uses some kind of complex analysis which this book does not deal with.
|
The field ℂ of complex numbers is algebraically closed.
|
| 2.51 | Let R be a ring and f, g ∈ R[X]. Show that:
|
| 2.52 | Let f, g ∈ R[X], where R is an integral domain. Show that if f(ai) = g(ai) for i = 1, . . . , n, where n > max(deg f, deg g) and where a1, . . . , an are distinct elements of R, then f = g. In particular, if f(a) = g(a) for an infinite number of a ∈ R, then f = g.
|
| 2.53 | Lagrange’s interpolation formula Let K be a field and let a0, . . . , an be distinct elements of K. Show that for any b0, . . . , bn ∈ K there is a unique polynomial f ∈ K[X] of degree ≤ n with f(ai) = bi for all i, namely f(X) = Σ_{i=0}^{n} bi Π_{j≠i} (X – aj)/(ai – aj). |
| 2.54 | Polynomials over a UFD Let R be a UFD. For a non-zero polynomial |
| 2.55 | Let R be a UFD. Show that a non-constant polynomial is irreducible over R if and only if f is irreducible over Q(R), where Q(R) denotes the quotient field of R (see Exercise 2.34).
|
| 2.56 |
|
| 2.57 | Let K ⊆ L be a field extension and f1, . . . , fn non-constant polynomials in K[X]. Show that each fi, i = 1, . . . , n, splits over L if and only if the product f1 · · · fn splits over L. |
| 2.58 | Show that the irreducible polynomials in ℝ[X] have degrees ≤ 2. [H]
|
| 2.59 | Show that a finite field (that is, a field with finite cardinality) is not algebraically closed. In particular, the algebraic closure of a finite field is infinite. |
| 2.60 | A complex number z is called an algebraic number, if z is algebraic over ℚ. An algebraic number z is called an algebraic integer, if z is a root of a monic polynomial in ℤ[X]. Show that:
|
| 2.61 | Let K be a field and f(X) = adX^d + · · · + a1X + a0 ∈ K[X]. The formal derivative f′ of f is defined to be the polynomial f′(X) = d·adX^{d–1} + · · · + 2a2X + a1. Show that:
|
| 2.62 | Let be a non-constant polynomial of degree d and let α1, . . . , αd be the roots of f (in some extension field of K). The quantity is called the discriminant of f. Prove the following assertions:
|
Vector spaces and linear transformations between them are the central objects of study in linear algebra. In this section, we investigate the basic properties of vector spaces. We also generalize the concept of vector spaces to get another useful class of objects called modules. A module which also carries a (compatible) ring structure is referred to as an algebra. The study of algebras over fields (or, more generally, over rings) is important in commutative algebra, algebraic geometry and algebraic number theory.
Unless otherwise specified, K denotes a field in this section.
|
|
Let V be a K-vector space. For every
Proof Easy verification. |
|
Let V be a vector space over K and S a subset of V. We say that S is a generating set or a set of generators of V (over K), or that S generates V (over K), if every element of V can be expressed as a finite linear combination a1x1 + · · · + anxn with ai ∈ K and xi ∈ S. |
|
If 0 ∈ S, then S is linearly dependent, since a · 0 = 0 for any non-zero a ∈ K. One can easily check that all the generating sets of Example 2.13 are linearly independent too. This is, however, not a mere coincidence, as the following result demonstrates.
|
A subset S of a K-vector space V is a minimal generating set for V if and only if S is a maximal linearly independent set of V. Proof [if] Given a maximal linearly independent subset S of V, we first show that S is a generating set for V. Take any non-zero x ∈ V \ S. By the maximality of S, the set S ∪ {x} is linearly dependent, that is, ax + a1x1 + · · · + anxn = 0 for some distinct x1, . . . , xn ∈ S and a, a1, . . . , an ∈ K, not all zero. If a = 0, this relation contradicts the linear independence of S; so a ≠ 0 and x = –a^{–1}(a1x1 + · · · + anxn). Thus S generates V. Moreover, S is a minimal generating set, for if some x ∈ S were a linear combination of elements of S \ {x}, then S would be linearly dependent. [only if] Given a minimal generating set S of V, we first show that S is linearly independent. Assume not, that is, a1x1 + · · · + anxn = 0 for some distinct x1, . . . , xn ∈ S and a1, . . . , an ∈ K, not all zero. If ai ≠ 0, then xi is a linear combination of the remaining xj, so S \ {xi} still generates V, contradicting the minimality of S. Finally, S is a maximal linearly independent set, since any x ∈ V \ S is a linear combination of elements of S, so that S ∪ {x} is linearly dependent. |
|
Let V be a K-vector space. A minimal generating set S of V is called a basis of V over K (or a K-basis of V). By Theorem 2.25, S is a basis of V if and only if S is a maximal linearly independent subset of V. Equivalently, S is a basis of V if and only if S is a generating set of V and is linearly independent. |
Any element of a vector space can be written uniquely as a finite linear combination of elements of a basis, since two different ways of writing the same element contradict the linear independence of the basis elements.
A K-vector space V may have many K-bases. For example, the elements 1, aX + b, (aX + b)^2, · · · form a K-basis of K[X] for any a, b ∈ K, a ≠ 0. However, what is unique about any basis of a given K-vector space V is the cardinality[8] of the basis, as shown in Theorem 2.26.
[8] Two sets (finite or not) S1 and S2 are said to be of the same cardinality, if there exists a bijective map S1 → S2.
For the sake of simplicity, we sometimes assume that V is a finitely generated K-vector space. This assumption simplifies certain proofs greatly. But it is important to highlight here that, unless otherwise stated, all the results continue to remain valid without the assumption. For example, it is a fact that every vector space has a basis. For finitely generated vector spaces, this is a trivial statement to prove, whereas without our assumption we need to use arguments that are not so simple. (A possible proof follows from Exercise 2.63 with U = {0}.)
|
Let V be a K-vector space. Then any K-basis of V has the same cardinality. Proof We assume that V is finitely generated. Let S = {x1, . . . , xn} be a minimal finite generating set, that is, a basis, of V. Let T be another basis of V. Assume that m := #T > n. (We might even have m = ∞.) We can choose distinct elements |
Theorem 2.26 holds even when V is not finitely generated. We omit the proof for this case here.
For example, dim_K K^n = n and dim_K K[X] = ∞.
|
Let V be a K-vector space. A subgroup U of V, which is closed under the scalar multiplication of V, is again a K-vector space and is called a (vector) subspace of V. In this case, we have dimK U ≤ dimK V (Exercise 2.63). |
|
Let V be a vector space over K.
|
|
Let V and W be K-vector spaces. A map f : V → W is called a homomorphism (of vector spaces) or a linear transformation or a linear map over K, if f(ax + by) = af(x) + bf(y) for all a, b ∈ K and all x, y ∈ V.
|
|
Let V and W be K-vector spaces. Then V and W are isomorphic if and only if dimK V = dimK W. Proof If dimK V = dimK W and S and T are bases of V and W respectively, then there exists a bijection f : S → T. One can extend f to a linear map V → W by sending a1x1 + · · · + anxn ↦ a1f(x1) + · · · + anf(xn); this extension is an isomorphism. Conversely, an isomorphism V → W maps any basis of V onto a basis of W, so that dimK V = dimK W. |
|
A K-vector space V with n := dimK V < ∞ is isomorphic to Kn. |
Let V be a K-vector space and U a subspace. As in Section 2.3 we construct the quotient group V/U. This group can be given a K-vector space structure under the scalar multiplication map a(x + U) := ax + U for a ∈ K and x ∈ V. If T ⊆ V is such that the residue classes of the elements of T form a K-basis of V/U and if S is a K-basis of U, then it is easy to see that S ∪ T is a K-basis of V. In particular,
Equation 2.2
dim_K V = dim_K U + dim_K (V/U).
For a K-linear map f : V → W, the set Ker f := {x ∈ V | f(x) = 0} is called the kernel of f, and the set Im f := {f(x) | x ∈ V} is called the image of f. We have the isomorphism theorem for vector spaces:
|
Ker f is a subspace of V, Im f is a subspace of W, and V/Ker f ≅ Im f. Proof Similar to Theorem 2.3 and Theorem 2.9. |
|
For a K-linear map f : V → W, we define the rank and the nullity of f as Rank f := dimK Im f and Null f := dimK Ker f. |
|
Rank f + Null f = dimK V for any K-linear map f : V → W. |
If we drop the restriction that K is a field and allow K to be an arbitrary ring, the resulting structure is called a K-module. More specifically, we have:
|
Let R be a ring. A module over R (or an R-module) is an (additively written) Abelian group M together with a multiplication map · : R × M → M, called the scalar multiplication map, such that for every a, b ∈ R and x, y ∈ M one has a(x + y) = ax + ay, (a + b)x = ax + bx, (ab)x = a(bx), and 1x = x. |
Modules are a powerful generalization of vector spaces. Any result we prove for modules is equally valid for vector spaces, ideals and Abelian groups. On the other hand, since we do not demand that the ring R be necessarily a field, certain results for vector spaces are not applicable for all modules.
It is easy to see that Corollary 2.8 continues to hold for modules. An R-submodule of an R-module M is a subgroup of M that is closed under the scalar multiplication of M. For a subset S ⊆ M, the set of all finite linear combinations a1x1 + · · · + anxn with ai ∈ R and xi ∈ S is an R-submodule N of M, denoted by RS. We say that N is generated by S (or by the elements of S). If S is finite, then N is said to be finitely generated. A (sub)module generated by a single element is called cyclic. It is important to note that, unlike for vector spaces, the cardinality of a minimal generating set of a module is not necessarily unique. (See Exercise 2.68 for an example.) Moreover, given a minimal generating set S of M, there may be more than one way of writing an element of M as a finite linear combination of elements of S. For example, if M = ℤ and S = {2, 3}, then 1 = (–1)·2 + 1·3 = 2·2 + (–1)·3. The nice theory of dimensions developed for vector spaces does not carry over to modules.
For an R-submodule N of M, the Abelian group M/N is given an R-module structure by the scalar multiplication map a(x + N) := ax + N. This module is called the quotient module of M by N.
For R-modules M and N, an R-linear map or an R-module homomorphism (from M to N) is defined as a map f : M → N with f(ax + by) = af(x) + bf(y) for all a, b ∈ R and x, y ∈ M (or, equivalently, with f(x + y) = f(x) + f(y) and f(ax) = af(x) for all a ∈ R and x, y ∈ M). An isomorphism, an endomorphism and an automorphism are defined in ways analogous to the case of vector spaces. The set of all (R-module) homomorphisms M → N is denoted by HomR(M, N) and the set of all (R-module) endomorphisms of M is denoted by EndR M. These sets are again R-modules under the definitions (f + g)(x) := f(x) + g(x) and (af)(x) := af(x) for all a ∈ R and x ∈ M (and f, g in HomR(M, N) or EndR M).
The kernel and image of an R-linear map f : M → N are defined as the sets Ker f := {x ∈ M | f(x) = 0} and Im f := {f(x) | x ∈ M}. With these notations we have the isomorphism theorem for modules:
|
Ker f and Im f are submodules of M and N respectively and M / Ker f ≅ Im f. |
For an R-module M and an ideal 𝔞 of R, the set 𝔞M consisting of all finite linear combinations a1x1 + · · · + anxn with ai ∈ 𝔞 and xi ∈ M is a submodule of M. On the other hand, for a submodule N of M the set (M : N) := {a ∈ R | aM ⊆ N} is an ideal of R. In particular, the ideal (M : 0) is called the annihilator of M and is denoted as AnnR M (or as Ann M). For any ideal 𝔞 ⊆ AnnR M, one can view M as an R/𝔞-module under the map (a + 𝔞)x := ax. One can easily check that this map is well-defined, that is, the product (a + 𝔞)x is independent of the choice of the representative a of the equivalence class a + 𝔞.
|
A free module M over a ring R is defined to be a direct sum of copies of R, that is, M ≅ ⊕_{i∈I} R for some index set I. |
Any vector space is a free module (Theorem 2.27 and Corollary 2.9). The Abelian groups ℤ/nℤ, n ≥ 2, are not free as ℤ-modules.
|
M is a finitely generated R-module if and only if M is a quotient of a free module R^n for some n ∈ ℕ. Proof [if] The free module R^n has a canonical generating set e1, . . . , en, where ei = (0, . . . , 0, 1, 0, . . . , 0) (1 in the i-th position). If M = R^n/N, then the equivalence classes ei + N, i = 1, ..., n, constitute a finite set of generators of M. [only if] If x1, ..., xn generate M, then the R-linear map f : R^n → M defined by (a1, ..., an) ↦ a1x1 + · · · + anxn is surjective. Hence by the isomorphism theorem M ≅ R^n / Ker f. |
Let φ : R → A be a homomorphism of rings. The ring A can be given an R-module structure with the multiplication map a · x := φ(a)x for a ∈ R and x ∈ A. This R-module structure of A is compatible with the ring structure of A in the sense that for every a, b ∈ R and x, y ∈ A one has (ax)(by) = (ab)(xy).
Conversely, if a ring A has an R-module structure with (ax)(by) = (ab)(xy) for every a, b ∈ R and x, y ∈ A, then there is a unique ring homomorphism R → A taking a ↦ a · 1 (where 1 denotes the identity of A). This motivates us to define the following.
|
Let R be a ring. An algebra over R or an R-algebra is a ring A together with a ring homomorphism R → A, called the structure homomorphism of the algebra. |
|
Let R be a ring.
|
An R-algebra A is an R-module with the added property that elements of A can also be multiplied with one another. Exploiting this new feature leads to the concept of algebra generators.
One may proceed to define kernels and images of R-algebra homomorphisms and formulate and prove the isomorphism theorem for R-algebras. We leave the details to the reader. We only note that algebra homomorphisms are essentially ring homomorphisms with the added condition of commutativity with the structure homomorphisms.
|
A ring A is a finitely generated R-algebra if and only if A is a quotient of a polynomial algebra (over R). Proof [if] Immediate from Example 2.17. [only if] Let A := R[x1, . . . , xn]. The map η : R[X1, . . . , Xn] → A that takes f(X1, . . . , Xn) ↦ f(x1, . . . , xn) is a surjective R-algebra homomorphism. By the isomorphism theorem, one has the isomorphism A ≅ R[X1, . . . , Xn]/Ker η of R-algebras. |
This theorem suggests that for the study of finitely generated algebras it suffices to investigate only the polynomial algebras and their quotients.
| 2.63 | Let V be a K-vector space, U a subspace of V, and T an arbitrary K-basis of U. Show that there is a K-basis of V, that contains T. [H] |
| 2.64 |
|
| 2.65 | Let V and W be K-vector spaces and f : V → W a K-linear map. Show that f is uniquely determined by the images f(x), , where S is a basis of V.
|
| 2.66 | Let V and W be K-vector spaces. Check that HomK(V, W) is a vector space over K. Show that dimK(HomK(V, W)) = (dimK V)(dimK W). In particular, if W = K, then HomK(V, K) is isomorphic to V. The space HomK(V, K) is called the dual space of V. |
| 2.67 | Let V and W be m- and n-dimensional K-vector spaces, S = {x1, . . . , xm} a K-basis of V, T = {y1, . . . , yn} a K-basis of W, and f : V → W a K-linear map. For each i = 1, . . . , m, write f(xi) = ai1y1 + · · · + ainyn, aij ∈ K. The m × n matrix (aij) is called the transformation matrix of f (with respect to the bases S and T). We have:
Let V1, V2, V3 be K-vector spaces, f, f1,
(Remark: This exercise explains that the linear transformations of finite-dimensional vector spaces can be explained in terms of matrices.) |
| 2.68 | Show that for every n ∈ ℕ there are integers a1, . . . , an that constitute a minimal set of generators for the unit ideal in ℤ. [H]
|
| 2.69 | Let M be an R-module. A subset S of M is called a basis of M, if S generates M and is linearly independent over R in the sense that a1x1 + · · · + anxn = 0, with ai ∈ R and x1, . . . , xn distinct elements of S, implies a1 = · · · = an = 0. Show that M has a basis if and only if M is a free R-module.
|
| 2.70 | We define the rank of a finitely generated R-module M as RankR M := min{#S | M is generated by S}. If N is a submodule of M, show that RankR M ≤ RankR N + RankR(M/N). Give an example where the strict inequality holds. |
| 2.71 | Let M be an R-module. An element x ∈ M is called a torsion element of M, if Ann Rx ≠ 0, that is, if there is a non-zero a ∈ R with ax = 0. The set of all torsion elements of M is denoted by Tors M. M is called torsion-free if Tors M = {0}, and a torsion module if Tors M = M.
|
| 2.72 | Show that:
This shows that the converse of Exercise 2.71(c) is not true in general. |
In this section, we study some important properties of field extensions. We also give an introduction to Galois theory. Unless otherwise stated, the letters F, K and L stand for fields in this section.
We have seen that if F ⊆ K is a field extension, then K is a vector space over F. This observation leads to the following very useful definitions.
|
For a field extension F ⊆ K, the cardinality of any F-basis of K is called the degree of the extension F ⊆ K and is denoted by [K : F]. If [K : F] is finite, K is called a finite extension of F. Otherwise, K is called an infinite extension of F. |
|
Let F ⊆ K ⊆ L be a tower of field extensions. Then [L : F] = [L : K] [K : F]. In particular, the extension F ⊆ L is finite if and only if the extensions F ⊆ K and K ⊆ L are finite. In that case, [L : K] | [L : F] and [K : F] | [L : F]. Proof One can easily check that if S is an F-basis of K and S′ a K-basis of L, then the set {ss′ | s ∈ S, s′ ∈ S′} is an F-basis of L. |
Recall the definitions of the rings F[X] of polynomials and F(X) of rational functions in one indeterminate X. These notations are now generalized. For a field extension F ⊆ K and for a ∈ K, we define:
F[a] := {f(a) | f(X) ∈ F[X]}
and
Equation 2.3
F(a) := {f(a)/g(a) | f(X), g(X) ∈ F[X], g(a) ≠ 0}.
It is easy to see that F[a] is the smallest (with respect to inclusion) of the integral domains that contain F and a. Similarly F(a) is the smallest of the fields that contain F and a. We also have F[a] ⊆ F(a). Now we state the following important characterization of algebraic elements.
|
For a field extension F ⊆ K and an element
Proof [(a)⇒(b)] Let [(b)⇒(c)] Let d := [F(a) : F]. Since the elements 1, a, a2, . . . , ad are linearly dependent over F, there exists [(c)⇒(a)] Clearly, the element 0 is algebraic over F. So assume a ≠ 0. Since |
|
For a field extension F ⊆ K, the set of elements in K that are algebraic over F is a field. Proof It is sufficient to show that if a, b ∈ K are algebraic over F, then so are a ± b, ab and (for b ≠ 0) a/b. All these elements lie in the field F(a)(b), which is a finite extension of F, and every element of a finite extension of F is algebraic over F. |
The field F(a)(b) in the proof of the last corollary is also denoted as F(a, b). It is the smallest subfield of K that contains F, a and b, and it follows that F(a, b) = F(b, a). More generally, for a field extension F ⊆ K and for
, each algebraic over F, the field F(a1, . . . , an) is defined as F(a1)(a2) . . . (an) and is independent of the order in which ai are adjoined.
|
Let F ⊆ K be a finite extension. Then K is algebraic over F. Proof For any a ∈ K, the n + 1 elements 1, a, a^2, . . . , a^n, where n := [K : F], are linearly dependent over F. A non-trivial linear relation among them gives a non-zero polynomial in F[X] having a as a root, so a is algebraic over F. |
The converse of the last corollary is not true, that is, it is possible that an algebraic extension has infinite extension degree. Exercise 2.59 gives an example.
|
A field extension F ⊆ K is called simple, if K = F(a) for some a ∈ K. |
|
Let F be a field of characteristic 0 and let a, b (belonging to some extension of F) be algebraic over F. Then the extension F(a, b) of F is simple. Proof Let p(X) and q(X) be the minimal polynomials (over F) of a and b respectively. Let d := deg p and d′ := deg q. The polynomials p and q are irreducible over F and hence by Exercise 2.61 have no multiple roots. Let a1, . . . , ad be the roots of p and b1, . . . , bd′ the roots of q with a = a1 and b = b1. For each i, j with j ≠ 1, the equation ai + λbj = a + λb has a unique solution for λ (not necessarily in F). Since F is infinite, we can choose a λ ∈ F different from all these finitely many solutions. One then verifies that c := a + λb satisfies F(a, b) = F(c). |
Let f(X) be a non-constant polynomial of degree d in F[X]. Assume that f does not split over F. Consider an irreducible (in F[X]) factor f′ of f of degree d′ > 1. F′ := F[X]/〈f′〉 is a field extension of F. Furthermore, if α1 denotes the residue class of X in F′, the elements 1, α1, . . . , α1^{d′–1} constitute a basis of F′ over F. In particular, [F′ : F] = d′ ≤ d. Now, one can write f(X) = (X – α1)g(X) for some g(X) ∈ F′[X]. If g splits over F′, then so does f. Otherwise, choose any irreducible (in F′[X]) factor g′ of g with deg g′ > 1 and consider the field extension F″ := F′[X]/〈g′〉. Then [F″ : F′] = deg g′ ≤ deg g = d – 1, so that [F″ : F] ≤ d(d – 1). Moreover, if α2 denotes the residue class of X in F″, then f(X) = (X – α1)(X – α2)h(X) for some h(X) ∈ F″[X]. Proceeding in this way we get:
|
For a polynomial f ∈ F[X] of degree d ≥ 1, there exists a splitting field K of f over F with [K : F] ≤ d!. |
We now establish the uniqueness of the splitting field of a polynomial f ∈ F[X]. To start with, we set up certain notations. An isomorphism μ : F → F′ of fields induces an isomorphism μ* : F[X] → F′[Y] of polynomial rings, defined by adX^d + ad–1X^{d–1} + · · · + a0 ↦ μ(ad)Y^d + μ(ad–1)Y^{d–1} + · · · + μ(a0). We have μ*(a) = μ(a) for all a ∈ F. Note also that f ∈ F[X] is irreducible over F if and only if μ*(f) ∈ F′[Y] is irreducible over F′. With these notations we state the following important lemma.
|
Let the non-constant polynomial f ∈ F[X] be irreducible, let α be a root of f (in some extension of F), and let β be a root of μ*(f) (in some extension of F′). Then μ extends to an isomorphism τ : F(α) → F′(β) with τ(α) = β. Proof Since F(α) = F[α] and F′(β) = F′[β], we can define the map τ : F[α] → F′[β] by g(α) ↦ (μ*(g))(β) for each g ∈ F[X]; the irreducibility of f and of μ*(f) guarantees that τ is well-defined and bijective. |
Roots of an irreducible polynomial are called conjugates (of each other). If α and β are two roots of an irreducible polynomial
, the last lemma guarantees the existence of an isomorphism τ : F(α) → F(β) that fixes all the elements of F and that maps α ↦ β.
|
We use the maps μ : F → F′ and μ* : F[X] → F′[Y] as defined above. Let K be a splitting field of a non-constant polynomial f ∈ F[X] over F and K′ a splitting field of μ*(f) over F′. Then μ extends to an isomorphism τ : K → K′. Proof We proceed by induction on n := [K : F]. (By Proposition 2.32 n is finite.) If n = 1, then K = F, that is, the polynomial f splits over F itself and so does μ*(f) over F′, that is, K′ = F′. Thus τ = μ is the desired isomorphism. Now assume that n > 1 and that the result holds for all fields L and for all polynomials in L[X] with splitting fields (over L) of extension degrees less than n. Consider an irreducible factor g of f with 1 < deg g ≤ deg f. Note that g also splits over K. We take any root α ∈ K of g and any root β ∈ K′ of μ*(g). By the last lemma, μ extends to an isomorphism ν : F(α) → F′(β) with ν(α) = β. Since [K : F(α)] = n/[F(α) : F] < n and K is a splitting field of f over F(α), the induction hypothesis yields an isomorphism τ : K → K′ that extends ν and hence μ. |
The results pertaining to the splitting field of a polynomial can be generalized in the following way. Let S be a non-empty subset of F[X]. A splitting field of S over F is a minimal field K containing F such that each polynomial f ∈ S splits in K. If S = {f1, . . . , fr} is a finite set, the splitting field of S is the same as the splitting field of f = f1 · · · fr (Exercise 2.57). But the situation is different if S is infinite. Of particular interest is the set S consisting of all irreducible polynomials in F[X]. In this case, the splitting field of S is an algebraic closure of F.
We give a sketch of the proof that even when S is infinite, a splitting field for S can be constructed. This, in particular, establishes the existence of an algebraic closure of any field. We may assume that S comprises non-constant polynomials only. For each f ∈ S, we define an indeterminate Xf and consider the polynomial ring A := F[Xf | f ∈ S] and the ideal 𝔞 of A generated by f(Xf) for all f ∈ S. We have 𝔞 ≠ A and, therefore, there is a maximal ideal 𝔪 of A containing 𝔞 (Exercise 2.23). Consider the field F1 := A/𝔪 containing F. Every polynomial f ∈ S has at least one root in F1. Now we replace F by F1 and as above get another field F2 containing F1 (and hence F), such that every polynomial in S (of degree ≥ 2) has at least two roots in F2. We continue this procedure (infinitely often, if necessary) and obtain a sequence of fields F ⊆ F1 ⊆ F2 ⊆ F3 ⊆ · · ·. Define K to be the field consisting of all elements of the union ∪_{i≥1} Fi that are algebraic over F. Each polynomial in S splits in K, but in no proper subfield of K, that is, K is a splitting field of S.
It turns out that the splitting field of S is unique up to isomorphisms that fix elements of F. In particular, the algebraic closure of F is unique up to isomorphisms that fix elements of F, and is denoted by F̄.
For a field K, the set Aut K of all automorphisms of K is a group under (functional) composition. We extend this concept now. Let F ⊆ K be an extension of fields. The automorphisms of K that fix every element of F form a subgroup AutF K of Aut K, and for a subgroup H of AutF K the fixed field FixF H := {a ∈ K | σ(a) = a for all σ ∈ H} satisfies F ⊆ FixF H ⊆ K.
For every intermediate field L (that is, a field L with F ⊆ L ⊆ K), we have a subgroup AutL K of AutF K. Conversely, given a subgroup H of AutF K we have the intermediate fixed field FixF H. It is a relevant question to ask if there is any relationship between the subgroups of AutF K and the intermediate fields. A nice correspondence exists for a particular type of extensions that we define now.
|
A field extension F ⊆ K is said to be a Galois extension (or K is said to be a Galois extension over F), if FixF (AutF K) = F. Thus K is Galois over F if and only if for every a ∈ K \ F there exists an automorphism σ ∈ AutF K with σ(a) ≠ a. |
|
Let K be the splitting field over F of a non-constant polynomial f ∈ F[X] having no multiple roots. Then F ⊆ K is a Galois extension. |
The following theorem establishes the correspondence we are looking for.
|
For a finite Galois extension F ⊆ K, there is a bijective correspondence between the set of all intermediate fields and the set of all subgroups of AutF K (given by L ↦ AutL K and H ↦ FixF H) such that the following assertions hold:
|
A proof of this theorem is rather long and uses many auxiliary results which we would not need otherwise. We, therefore, choose to omit the proof here.
| 2.73 | Let α be transcendental over F. Show that the domain F[α] and the field F(α) are respectively isomorphic to the polynomial ring F[X] and the field F(X) of rational functions in one indeterminate X. Generalize the result for an arbitrary family αi, i ∈ I, of elements each of which is transcendental over F.
|
| 2.74 | Let F ⊆ K be a field extension and let σ be an endomorphism of K with σ(a) = a for every a ∈ F.
|
| 2.75 | Let F ⊆ K be a field extension.
|
| 2.76 | F is called a perfect field, if every irreducible polynomial in F[X] is separable over F. |
| 2.77 | A field extension F ⊆ K is called normal, if every irreducible polynomial in F[X], that has a root in K, splits in K[X].
|
| 2.78 | Prove the following assertions:
|
| 2.79 | Let F ⊆ K be a field extension and let L be the fixed field of AutF K over F. Show that K is a Galois extension of L. |
Finite fields are perhaps the most important types of fields used in cryptography. They enjoy certain nice properties that infinite fields (in particular, the well-known fields ℚ, ℝ and ℂ) do not. We concentrate on some properties of finite fields in this section. As we see later, arithmetic over a finite field K is fast, when char K = 2 or when #K is a prime. As a result, these two classes of fields are the most common ones employed in cryptography. However, in this section, we do not restrict ourselves to these specific fields only, but provide a general treatment valid for all finite fields. As in the previous section, we continue to use the letters F, K, L to denote fields. In addition, we use the letter p to denote a prime number and q a power of p: that is, q = p^n for some n ∈ ℕ.
Let K be a finite field of cardinality q. Then p := char K > 0. By Proposition 2.7, p is a prime, that is, K contains an isomorphic copy of the prime field 𝔽_p. If n := dim_{𝔽_p} K, we have q = p^n. Therefore, we have proved the first statement of the following important result.
|
The cardinality of a finite field is a power p^n, n ∈ ℕ, of its characteristic p. Conversely, for every prime p and every n ∈ ℕ, there exists a finite field of cardinality p^n. Proof In order to construct a finite field of cardinality q := p^n, we start with 𝔽_p and take a splitting field K of the polynomial X^q – X over 𝔽_p. Since the formal derivative of X^q – X is –1, this polynomial has q distinct roots in K, and one readily checks that these roots themselves form a field, which must then be K itself; thus #K = q. |
|
Let K be a finite field of cardinality q. Then every a ∈ K satisfies a^q = a. Proof Clearly, 0^q = 0. Take a ≠ 0. K* being a group of order q – 1, by Proposition 2.4 ord_{K*}(a) divides q – 1. In particular, a^{q–1} = 1, that is, a^q = a. |
|
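For a prime cardinality q = p, this theorem is just Fermat's little theorem and is easy to check computationally. The following sketch (the function name is our own) verifies a^p = a for every element of the prime field 𝔽_p, represented as {0, 1, . . . , p – 1} with arithmetic modulo p:

```python
# Verifying a^q = a for all elements of a prime field F_p.
# F_p is represented as the residues {0, 1, ..., p-1} modulo p.

def check_fermat(p: int) -> bool:
    """Return True if a**p == a (mod p) holds for every element a of F_p."""
    return all(pow(a, p, p) == a for a in range(p))

if __name__ == "__main__":
    for p in (2, 3, 5, 7, 11, 13):
        assert check_fermat(p)
```

Note that the field structure is essential: in ℤ/4ℤ, which is not a field, the element 2 satisfies 2^4 ≡ 0 ≢ 2 (mod 4).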
Let K be a finite field of cardinality q = p^n and let F be the subfield of K isomorphic to 𝔽_p. Then K is the splitting field of the polynomial f(X) := X^q – X over F. In particular, any two finite fields of the same cardinality are isomorphic. Proof By Theorem 2.37, each of the q elements of K is a root of f and consequently K is the splitting field of f. The last assertion in the theorem follows from the uniqueness of splitting fields (Proposition 2.33). |
This uniqueness allows us to talk about the finite field of cardinality q (rather than a finite field of cardinality q). We denote this (unique) field by 𝔽_q.
The results proved so far can be generalized for arbitrary extensions 𝔽_q ⊆ 𝔽_{q^m}, where q = p^n and n, m ∈ ℕ. We leave the details to the reader (Exercise 2.82). It is important to point out here that since 𝔽_{q^m} is the splitting field of X^{q^m} – X over 𝔽_q, by Exercise 2.77 we have:
|
Every finite extension of finite fields is normal. |
This implies that an irreducible polynomial in 𝔽_q[X] has either none or all of its roots in 𝔽_{q^m}. Also, if α ∈ 𝔽_q with q = p^n, then α^q = α^{p^n} = α. Therefore, α^{p^{n–1}} is a p-th root of α. By Exercise 2.76(b), we then conclude:
|
Every finite field is perfect. |
|
Consider the extension 𝔽_q ⊆ K := 𝔽_{q^m}. Then the intermediate fields of this extension are precisely the fields 𝔽_{q^d} for the divisors d of m. Proof For d|m, we have (X^{q^d} – X)|(X^{q^m} – X). The q^d roots of X^{q^d} – X in K constitute an intermediate field L. If L′ ≠ L is another intermediate field with q^d elements, by Theorem 2.36 there are more than q^d elements of K that are roots of X^{q^d} – X, a contradiction. Conversely, an intermediate field L contains q^d elements, where d := [L : 𝔽_q] divides m by the degree formula for the tower 𝔽_q ⊆ L ⊆ K. |
|
The multiplicative group 𝔽_q* of a finite field 𝔽_q is cyclic. Proof Modify the proof of Proposition 2.19 or use the following more general result. |
|
Let K be a field (not necessarily finite). Then any finite subgroup G of the multiplicative group K* is cyclic. Proof Since K is a field, for any d ∈ ℕ the polynomial X^d – 1 has at most d roots in K*. Let n := #G and, for each d|n, let ψ(d) denote the number of elements of G of order d. If ψ(d) > 0, the cyclic subgroup generated by an element of order d has d elements, all of them roots of X^d – 1; so every element of G of order d lies in this subgroup, and ψ(d) = φ(d) (the Euler totient function). Thus ψ(d) ≤ φ(d) for every d|n, and since Σ_{d|n} ψ(d) = n = Σ_{d|n} φ(d), we must have ψ(n) = φ(n) > 0, that is, G contains an element of order n. |
|
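The cyclicity of 𝔽_p* can be illustrated computationally. The sketch below (all names are our own, not the book's) finds a generator of 𝔽_p* for a prime p by testing candidates g against the prime factors of p – 1; g generates the group exactly when g^((p–1)/r) ≠ 1 for every prime r dividing p – 1.

```python
# Finding a generator of the cyclic group F_p* for a prime p.

def prime_factors(n):
    """Return the set of prime divisors of n (trial division)."""
    fs, d = set(), 2
    while d * d <= n:
        while n % d == 0:
            fs.add(d)
            n //= d
        d += 1
    if n > 1:
        fs.add(n)
    return fs

def generator(p):
    """Return a generator of F_p* (p must be prime)."""
    n = p - 1
    rs = prime_factors(n)
    for g in range(2, p):
        # g has order p-1 iff g^((p-1)/r) != 1 for every prime r | p-1
        if all(pow(g, n // r, p) != 1 for r in rs):
            return g
    raise ValueError("p is not prime")
```

For example, generator(13) returns 2, and the powers of 2 modulo 13 indeed run through all of 𝔽_13*.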
Every finite extension 𝔽_q ⊆ 𝔽_{q^m} is simple. Proof Let α be a generator of the cyclic group 𝔽_{q^m}*. Then clearly 𝔽_{q^m} = 𝔽_q(α). |
In this section, we study some useful properties of polynomials over finite fields. We concentrate on polynomials in 𝔽_q[X] for an arbitrary q = p^n, p prime, n ∈ ℕ. We have seen how the polynomials X^{q^m} – X proved to be important for understanding the structures of finite fields. But that is not all; these polynomials indeed have further roles to play, and we will refer to them repeatedly in what follows.
Let 𝔽_q ⊆ 𝔽_{q^m} be a finite extension of finite fields and let α ∈ 𝔽_{q^m} be a root of a polynomial f(X) ∈ 𝔽_q[X]. Since each coefficient a of f satisfies a^q = a, we have f(α^q) = (f(α))^q = 0. Therefore, α^q is also a root of f(X). More generally, for each r = 0, 1, 2, · · · the element α^{q^r} is a root of f(X). This gives us a nice procedure for computing the minimal polynomial of α, as the following corollary suggests.
We now prove a theorem which has important consequences.
|
For every m ∈ ℕ, the polynomial X^{q^m} – X is the product of all monic irreducible polynomials in 𝔽_q[X] whose degrees divide m. Proof The polynomial X^{q^m} – X has no multiple roots, since its formal derivative is –1. A monic irreducible polynomial g ∈ 𝔽_q[X] of degree d divides X^{q^m} – X if and only if its roots lie in 𝔽_{q^m}, that is, if and only if 𝔽_{q^d} is a subfield of 𝔽_{q^m}, which by the characterization of intermediate fields above holds precisely when d|m. |
The first consequence of Theorem 2.40 is that it leads to a procedure for checking the irreducibility of a polynomial f ∈ 𝔽_q[X]. Let d := deg f. If f(X) is reducible, it admits an irreducible factor of degree ≤ ⌊d/2⌋. Since gcd(f(X), X^{q^m} – X) is the product of all distinct irreducible factors of f with degrees dividing m, we compute the gcds g_m := gcd(f(X), X^{q^m} – X) for m = 1, . . . , ⌊d/2⌋. If all these gcds are 1, we conclude that f is irreducible. Otherwise f is reducible. We will see an optimized implementation of this procedure in Chapter 3. Besides irreducibility testing, the above theorem also leads to algorithms for finding random irreducible polynomials and for factoring polynomials, as we will also discuss in Chapter 3.
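The gcd-based test just described can be sketched for the case q = 2, with polynomials over 𝔽_2 encoded as integer bitmasks (bit i holding the coefficient of X^i, so 0b1011 is X^3 + X + 1); all function names here are illustrative, not the optimized routines of Chapter 3.

```python
# Irreducibility test over F_2: f is irreducible iff
# gcd(f, X^(2^m) - X) = 1 for m = 1, ..., deg(f) // 2.

def pdeg(a):
    """Degree of the polynomial encoded by the int a (deg 0 = -1)."""
    return a.bit_length() - 1

def pmod(a, b):
    """Remainder of polynomial a modulo polynomial b over F_2."""
    db = pdeg(b)
    while pdeg(a) >= db:
        a ^= b << (pdeg(a) - db)
    return a

def pmulmod(a, b, f):
    """(a * b) mod f over F_2, keeping intermediate results reduced."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a = pmod(a << 1, f)
    return pmod(r, f)

def pgcd(a, b):
    while b:
        a, b = b, pmod(a, b)
    return a

def is_irreducible(f):
    d = pdeg(f)
    x = 0b10                      # the polynomial X
    xq = x
    for m in range(1, d // 2 + 1):
        xq = pmulmod(xq, xq, f)   # now xq = X^(2^m) mod f
        if pgcd(f, xq ^ x) != 1:  # over F_2, X^(2^m) - X = X^(2^m) + X
            return False
    return True
```

For instance, X^3 + X + 1 (0b1011) passes the test, while X^2 + 1 = (X + 1)^2 (0b101) fails at m = 1.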
The second consequence of Theorem 2.40 is that it gives us a formula for calculating the number of monic irreducible polynomials of a given degree over a given field. First we need to define a function on ℕ.
|
The Möbius function μ : ℕ → {–1, 0, 1} is defined as follows: μ(1) := 1; μ(n) := 0, if n is divisible by the square of a prime; and μ(n) := (–1)^r, if n is a product of r distinct primes.
It follows that μ(n) ≠ 0 if and only if n is square-free. |
|
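A direct implementation of this definition (our own helper, using trial division) may clarify it:

```python
# The Moebius function mu(n): 1 for n = 1, 0 if a squared prime divides n,
# and (-1)^r if n is a product of r distinct primes.

def mobius(n):
    if n < 1:
        raise ValueError("n must be a positive integer")
    r, d = 0, 2
    while d * d <= n:
        if n % d == 0:
            n //= d
            if n % d == 0:      # d^2 divides the original n
                return 0
            r += 1
        d += 1
    if n > 1:                   # one leftover prime factor
        r += 1
    return -1 if r % 2 else 1
```

Lemma 2.6 can then be spot-checked, e.g. the values of μ over the divisors 1, 2, 3, 4, 6, 12 of 12 sum to 0.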
For n ∈ ℕ, we have Σ_{d|n} μ(d) = 1 for n = 1 and Σ_{d|n} μ(d) = 0 for n > 1, where d runs over the positive divisors of n. Proof The result follows immediately for n = 1. For n > 1, write n = p1^{e1} · · · pr^{er} with distinct primes pi and ei ≥ 1. The divisors d of n with μ(d) ≠ 0 are exactly the products of distinct primes among p1, . . . , pr, so Σ_{d|n} μ(d) = Σ_{k=0}^{r} C(r, k)(–1)^k = (1 – 1)^r = 0. |
|
Let f and g be maps from ℕ to an (additively written) Abelian group satisfying g(n) = Σ_{d|n} f(d) for all n ∈ ℕ. Then f(n) = Σ_{d|n} μ(n/d)g(d) = Σ_{d|n} μ(d)g(n/d) for all n ∈ ℕ (the additive Möbius inversion formula); an analogous multiplicative formula holds for maps into a multiplicatively written Abelian group. Proof To prove the additive formula we note that
Σ_{d|n} μ(d)g(n/d) = Σ_{d|n} μ(d) Σ_{e|(n/d)} f(e) = Σ_{e|n} f(e) Σ_{d|(n/e)} μ(d) = f(n),
where the last equality follows from Lemma 2.6. The multiplicative formula can be proved similarly. |
Let us denote by ν_{q,m} the number of monic irreducible polynomials in 𝔽_q[X] of degree m and by I_{q,m}(X) the product of all monic irreducible polynomials in 𝔽_q[X] of degree m. By Theorem 2.40, we have q^m = Σ_{d|m} d·ν_{q,d} and X^{q^m} – X = Π_{d|m} I_{q,d}(X). Applications of the (additive and multiplicative) Möbius inversion formulas then yield the following formulas:
Equation 2.4
ν_{q,m} = (1/m) Σ_{d|m} μ(m/d) q^d  and  I_{q,m}(X) = Π_{d|m} (X^{q^d} – X)^{μ(m/d)}.
If p1, . . . , pr are the distinct prime divisors of m, then Equation (2.4) implies the lower bound
m·ν_{q,m} ≥ q^m – q^{m/p1} – · · · – q^{m/pr} ≥ q^m – r·q^{m/2};
indeed, m·ν_{q,m} counts the elements of 𝔽_{q^m} of degree exactly m over 𝔽_q, and every element of smaller degree lies in one of the r maximal proper subfields 𝔽_{q^{m/pi}}. But each pi ≥ 2, so that m ≥ 2^r, that is, r ≤ lg m < q^{m/2}, and hence ν_{q,m} > 0. We, therefore, have an independent proof of the second statement in Corollary 2.17. Moreover, for practical values of q and m we have the good approximation:
Equation 2.5
ν_{q,m} ≈ q^m/m.
Since the total number of monic polynomials of degree m in 𝔽_q[X] is q^m, a randomly chosen monic polynomial in 𝔽_q[X] of degree m is irreducible with probability approximately 1/m; that is, one expects to find an irreducible polynomial of degree m after O(m) random monic polynomials are picked up from 𝔽_q[X]. These observations have an important bearing on devising efficient algorithms for finding irreducible polynomials over finite fields. (See Chapter 3.)
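Equation (2.4) is easy to evaluate directly. The sketch below (function names are ours) computes ν_{q,m} and lets us check it against the identity q^m = Σ_{d|m} d·ν_{q,d}:

```python
# nu(q, m) = (1/m) * sum_{d | m} mu(m/d) * q^d, from Equation (2.4).

def mobius(n):
    r, d = 0, 2
    while d * d <= n:
        if n % d == 0:
            n //= d
            if n % d == 0:
                return 0
            r += 1
        d += 1
    if n > 1:
        r += 1
    return -1 if r % 2 else 1

def num_irreducibles(q, m):
    """Number of monic irreducible polynomials of degree m over F_q."""
    total = sum(mobius(m // d) * q**d
                for d in range(1, m + 1) if m % d == 0)
    assert total % m == 0        # the count is always an integer
    return total // m
```

For example, num_irreducibles(2, 3) gives 2, corresponding to X^3 + X + 1 and X^3 + X^2 + 1, and num_irreducibles(2, 10) gives 99, close to 2^10/10 = 102.4 as Equation (2.5) predicts.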
The conjugates of α ∈ 𝔽_{q^m} over 𝔽_q are the elements α^{q^i}, i = 0, 1, . . . , m – 1. It is interesting to look at the sum and the product of the conjugates of α. By Corollary 2.18, the minimal polynomial of α over 𝔽_q has degree d for some d|m. Since this polynomial has its coefficients in 𝔽_q, the elements α + α^q + · · · + α^{q^{d–1}} and α · α^q · · · α^{q^{d–1}} belong to 𝔽_q. Since α^{q^d} = α, for any (positive) integral multiple δ of d, the sum α + α^q + · · · + α^{q^{δ–1}} and the product α · α^q · · · α^{q^{δ–1}} are elements of 𝔽_q too.
|
Let α ∈ 𝔽_{q^m}. The trace of α over 𝔽_q is defined as Tr(α) := α + α^q + · · · + α^{q^{m–1}}, and the norm of α over 𝔽_q as N(α) := α · α^q · · · α^{q^{m–1}} = α^{(q^m–1)/(q–1)}.
In view of the preceding discussion, the trace and norm of α are elements of 𝔽_q. |
The trace and norm functions play an important role in the theory of finite fields. See Exercise 2.86 for some elementary properties of these functions.
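As a concrete illustration, one can compute traces and norms in the small field 𝔽_8 = 𝔽_2[X]/〈X^3 + X + 1〉; the defining polynomial and all names below are our own choices for this sketch, with elements encoded as 3-bit integers (bit i holding the coefficient of α^i).

```python
# Trace and norm in F_8 over F_2, with Tr(a) = a + a^2 + a^4 and
# N(a) = a * a^2 * a^4 = a^7.

MOD = 0b1011  # X^3 + X + 1, irreducible over F_2

def mul(a, b):
    """Multiply two elements of F_8 in the polynomial basis 1, alpha, alpha^2."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0b1000:           # reduce modulo X^3 + X + 1
            a ^= MOD
    return r

def power(a, e):
    r = 1
    while e:
        if e & 1:
            r = mul(r, a)
        a = mul(a, a)
        e >>= 1
    return r

def trace(a):
    """Tr(a) over F_2; the result always lies in {0, 1}."""
    return a ^ power(a, 2) ^ power(a, 4)

def norm(a):
    """N(a) = a^7; equals 1 for every non-zero a, since #F_8* = 7."""
    return mul(a, mul(power(a, 2), power(a, 4)))
```

For example, with α = 0b010 one finds Tr(α) = 0 and N(α) = 1, consistent with the trace being the sum of the roots of the defining polynomial (whose X^2-coefficient is 0) and the norm being its constant term.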
𝔽_{q^m} is a vector space of dimension m over 𝔽_q. Let β0, . . . , βm–1 be an 𝔽_q-basis of 𝔽_{q^m}. Each element a ∈ 𝔽_{q^m} has a unique representation a = a0β0 + · · · + am–1βm–1 with each ai ∈ 𝔽_q. Therefore, if we have a representation of the elements of 𝔽_q, we can also represent the elements of 𝔽_{q^m}. Thus elements of any finite field can be represented, if we have representations of elements of prime fields. But the set {0, 1, . . . , p – 1} under the modulo p arithmetic represents 𝔽_p.
So our problem reduces to selecting suitable bases β0, . . . , βm–1 of 𝔽_{q^m} over 𝔽_q. In order to illustrate how we can do that, let us choose a priori a fixed monic irreducible polynomial f ∈ 𝔽_q[X] with deg f = m. We then represent 𝔽_{q^m} = 𝔽_q[X]/〈f〉, where α (the residue class of X) is a root of f in 𝔽_{q^m}. The elements 1, α, . . . , α^{m–1} are linearly independent over 𝔽_q, since otherwise we would have a non-zero polynomial of degree less than m, of which α is a root. The 𝔽_q-basis 1, α, . . . , α^{m–1} of 𝔽_{q^m} is called a polynomial basis (with respect to the defining polynomial f). The elements of 𝔽_{q^m} are then represented by polynomials in α of degrees < m. The arithmetic in 𝔽_{q^m} is carried out as the polynomial arithmetic of 𝔽_q[X] modulo the irreducible polynomial f.
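For instance, arithmetic in 𝔽_{2^8} with the defining polynomial f(X) = X^8 + X^4 + X^3 + X + 1 (the choice made in the AES standard; any monic irreducible polynomial of degree 8 would serve) can be sketched as follows, with bit i of a byte holding the coefficient of α^i:

```python
# Polynomial-basis arithmetic in F_2^8 = F_2[X]/<X^8 + X^4 + X^3 + X + 1>.

F = 0x11B  # X^8 + X^4 + X^3 + X + 1, irreducible over F_2

def gf_add(a, b):
    """Addition is coefficient-wise mod 2, i.e., bitwise XOR."""
    return a ^ b

def gf_mul(a, b):
    """Polynomial product of a and b, reduced modulo F."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:            # degree reached 8: reduce
            a ^= F
    return r

def gf_inv(a):
    """Inverse via a^(q-2): in F_256, a^254 = a^(-1) for a != 0."""
    r, base, e = 1, a, 254
    while e:
        if e & 1:
            r = gf_mul(r, base)
        base = gf_mul(base, base)
        e >>= 1
    return r
```

As a check, gf_mul(0x57, 0x83) yields 0xC1 (the worked example in the AES standard), and gf_inv reproduces inverses such as gf_inv(0x53) = 0xCA.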
|
Polynomial bases are most common in finite field implementations. Some other types of bases also deserve specific mention in this context.
It can be shown that normal bases exist for all finite extensions 𝔽_q ⊆ 𝔽_{q^m}. It can even be shown that primitive normal bases exist for all such extensions.
|
Consider the representation of 𝔽_{2^3} = 𝔽_2(α) for a root α of a suitable irreducible polynomial of degree 3 over 𝔽_2. The conjugates α, α^2, α^4, expressed in the polynomial basis 1, α, α^2, give a 3×3 transformation matrix having determinant 1 modulo 2. Thus α is a normal element of 𝔽_{2^3} over 𝔽_2. On the other hand, α + 1 is not a normal element of 𝔽_{2^3} over 𝔽_2, the corresponding transformation matrix having determinant zero modulo 2. |
Computations over finite fields often call for exponentiations of elements a = a_0β_0 + · · · + a_{m−1}β_{m−1}. If β_i = α^{q^i}, i = 0, . . . , m − 1, constitute a normal basis, then a^q = a_{m−1}β_0 + a_0β_1 + · · · + a_{m−2}β_{m−1}, since α^{q^m} = α and a_i^q = a_i for each i. Thus the coefficients of a^q (in the representation under the given normal basis) are obtained simply by cyclically shifting the coefficients a_0, . . . , a_{m−1} in the representation of a. This leads to a considerable saving of time. In particular, this trick becomes most meaningful for q = 2 (a case of high importance in cryptography).
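The cyclic-shift trick can be checked directly. In the sketch below (our own illustration), F_8 is realized as F_2[X]/(X³ + X + 1); one verifies that β = X + 1 is a normal element for this particular defining polynomial, and that squaring (the q-th power map with q = 2) cyclically shifts the normal-basis coordinates.

```python
# Frobenius (squaring) as a cyclic shift of normal-basis coordinates in
# F_8 = F_2[X]/(X^3 + X + 1).  beta = X + 1 is a normal element for this
# (assumed) defining polynomial; its conjugates beta, beta^2, beta^4 form a basis.

F, M = 0b1011, 3

def mul(a, b):
    r = 0
    for i in range(M):
        if (b >> i) & 1:
            r ^= a << i
    for i in range(2 * M - 2, M - 1, -1):
        if (r >> i) & 1:
            r ^= F << (i - M)
    return r

beta = 0b011                      # X + 1
basis = [beta, mul(beta, beta), mul(mul(beta, beta), mul(beta, beta))]

def from_coords(c):               # element with normal-basis coordinates c
    e = 0
    for ci, bi in zip(c, basis):
        if ci:
            e ^= bi
    return e

# the three conjugates span all 8 elements -> they are linearly independent
assert len({from_coords((a, b, c)) for a in (0, 1) for b in (0, 1) for c in (0, 1)}) == 8

for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            e = from_coords((a, b, c))
            # squaring cyclically shifts the coordinate vector (a, b, c) -> (c, a, b)
            assert mul(e, e) == from_coords((c, a, b))
```

In hardware, this shift is a single rotation of a register, which is why normal bases are attractive for q = 2.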
Now that exponentiations become cheaper with normal bases, one should not let the common operations (addition and multiplication) turn significantly slower. The sum of a = a_0β_0 + · · · + a_{m−1}β_{m−1} and b = b_0β_0 + · · · + b_{m−1}β_{m−1} remains as easy as in the case of a polynomial basis, namely, a + b = (a_0 + b_0)β_0 + · · · + (a_{m−1} + b_{m−1})β_{m−1}, where each a_i + b_i is calculated in F_q. However, computing the product ab introduces difficulty. In particular, it requires the representation of the products β_iβ_j, 0 ≤ i, j ≤ m − 1, in the basis β_0, . . . , β_{m−1}, say, β_iβ_j = Σ_k c_{ij}^{(k)} β_k with c_{ij}^{(k)} ∈ F_q. For i ≤ j, we have β_iβ_j = (β_0β_{j−i})^{q^i}. It is thus sufficient to look only at the coefficients c_{0j}^{(k)}, 0 ≤ j, k ≤ m − 1. We denote by C_α the number of non-zero c_{0j}^{(k)}. From practical considerations (for example, for hardware implementations), C_α should be as small as possible. For q = 2, one can show that 2m − 1 ≤ C_α ≤ m². If, for this special case, C_α = 2m − 1, the normal basis α, α^q, . . . , α^{q^{m−1}} is called an optimal normal basis. Unlike normal (or primitive normal) bases, optimal normal bases do not exist for all values of q and m.
We finally mention another representation of elements of a finite field F_q that does not depend on the vector space representation discussed so far, but which is based on the fact that the group F_q^* is cyclic. If we are given a primitive element (that is, a generator) γ of F_q^*, then the elements of F_q are 0, 1 = γ^0, γ, . . . , γ^{q−2}. Multiplication and exponentiation become easy with this representation, since 0 · a = 0 for all a ∈ F_q, whereas γ^i · γ^j = γ^k with k ≡ i + j (mod q − 1). Unfortunately, this representation provides no clue on how to compute γ^i + γ^j. One possibility is to store a table consisting of the values z_k satisfying 1 + γ^k = γ^{z_k} for all k = 0, . . . , q − 2 (with γ^k ≠ −1), so that for i ≤ j one can compute γ^i + γ^j = γ^i(1 + γ^{j−i}) = γ^i γ^{z_{j−i}} = γ^l, where l ≡ i + z_{j−i} (mod q − 1). Such a table, called Zech's logarithm table, can be maintained for small values of q and may facilitate computations in extensions of F_q. But if q is large (or, more correctly, if p is large, where q = p^n), this representation of elements of F_q is neither practical nor often feasible. Another difficulty of this representation is that it calls for a primitive element γ. If q is large and the integer factorization of q − 1 is not available, there are no efficient methods known for finding such an element or even for checking whether a given element is primitive.
|
Consider the representation of F_9 = F_3(β), where β is a root of T² + 1 ∈ F_3[T] (so that β² = −1). One checks that γ := β + 1 is a primitive element of F_9^*. The corresponding Zech's logarithm table is: |
| k | γk | 1 + γk | zk |
|---|---|---|---|
| 0 | 1 | 2 | 4 |
| 1 | β + 1 | β + 2 | 7 |
| 2 | 2β | 2β + 1 | 3 |
| 3 | 2β + 1 | 2β + 2 | 5 |
| 4 | 2 | 0 | – |
| 5 | 2β + 2 | 2β | 2 |
| 6 | β | β + 1 | 1 |
| 7 | β + 2 | β | 6 |
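The table can be reproduced mechanically. The following sketch (our own; it hard-codes the representation F_9 = F_3(β) with β² = −1 and the primitive element γ = β + 1) builds Zech's logarithm table and then uses it to add two powers of γ.

```python
# Zech's logarithm table for F_9 = F_3(beta), beta^2 = -1, gamma = beta + 1.
# An element c0 + c1*beta is the pair (c0, c1).
p = 3

def mul(a, b):
    a0, a1 = a
    b0, b1 = b
    # (a0 + a1*beta)(b0 + b1*beta) with beta^2 = -1
    return ((a0 * b0 - a1 * b1) % p, (a0 * b1 + a1 * b0) % p)

def add(a, b):
    return ((a[0] + b[0]) % p, (a[1] + b[1]) % p)

gamma = (1, 1)
powers = [(1, 0)]
for _ in range(7):
    powers.append(mul(powers[-1], gamma))
assert len(set(powers)) == 8              # gamma has order 8: it is primitive

log = {e: k for k, e in enumerate(powers)}
zech = {k: log.get(add((1, 0), powers[k])) for k in range(8)}
# matches the table: z_0..z_7 = 4, 7, 3, 5, (undefined), 2, 1, 6
assert [zech[k] for k in range(8)] == [4, 7, 3, 5, None, 2, 1, 6]

# addition via the table: gamma^i + gamma^j = gamma^(i + z_{j-i}) for i <= j
i, j = 2, 5
assert add(powers[i], powers[j]) == powers[(i + zech[j - i]) % 8]
```

The entry z_4 is undefined because 1 + γ⁴ = 1 + 2 = 0, exactly the excluded case γ^k = −1.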
| 2.80 | Let F be a field (not necessarily finite) of characteristic p > 0, and let a, b ∈ F. Prove that (a + b)^p = a^p + b^p, or, more generally, (a + b)^{p^n} = a^{p^n} + b^{p^n} for all n ∈ ℕ. [H]
|
| 2.81 | Let p be a prime, n ∈ ℕ, and q := p^n. Prove that:
|
| 2.82 | Let p be a prime, n ∈ ℕ, and q := p^n. Let F ⊆ K be an extension of finite fields with #F = q and #K = q^m. Show that K is the splitting field of X^{q^m} − X over F. [H]
|
| 2.83 | Write the addition and multiplication tables of (some representations of) the fields and . Use these tables to find a primitive element in each of these fields and a normal element in (over ).
|
| 2.84 | Let K be a field (not necessarily finite or of positive characteristic).
|
| 2.85 | In this exercise, one studies the arithmetic in the finite field .
|
| 2.86 | Let F ⊆ K ⊆ L be finite extensions of finite fields with [L : K] = s. Let α, β ∈ L and a ∈ K. Prove the following assertions:
|
| 2.87 | Let be a finite extension of finite fields. In this exercise, we treat both K and L as vector spaces over K. Show that:
|
| 2.88 | Let K and L be as in Exercise 2.87 and let β ∈ L. Show that Tr_{L|K}(β) = 0 if and only if β = γ^q − γ for some γ ∈ L.
|
| 2.89 | Let K and L be as in Exercise 2.87. Two K-bases (β0, . . . , βm–1) and (γ0, . . . , γm–1) of L are called dual or complementary, if TrL|K(βiγj) = δij.[10] Show that every K-basis of L has a unique dual basis.
|
| 2.90 | Prove that every finite extension of finite fields is Galois. [H] |
| 2.91 | For the extension F_q ⊆ F_{q^m}, consider the (Frobenius) map σ : F_{q^m} → F_{q^m}, α ↦ α^q.
|
| 2.92 | Let f ∈ F_q[X] be irreducible with deg f = d. Consider the extension F_{q^m} of F_q and let r := gcd(d, m).
|
| 2.93 | Consider the representation of in Example 2.19. Construct the minimal polynomials over of the elements of . [H]
|
| 2.94 | Show that the number of (ordered) F_q-bases of F_{q^m} is
(q^m − 1)(q^m − q)(q^m − q²) · · · (q^m − q^{m−1}). |
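For small parameters, the count of Exercise 2.94 can be confirmed by brute force. The sketch below (our own illustration, valid for prime q so that arithmetic modulo q is field arithmetic) enumerates all ordered m-tuples of vectors in F_q^m and counts those that span the whole space.

```python
# Brute-force check of the ordered-basis count (q^m - 1)(q^m - q)...(q^m - q^(m-1)),
# viewing F_{q^m} as the F_q-vector space F_q^m (q prime here).
from itertools import product

def count_ordered_bases(q, m):
    vectors = list(product(range(q), repeat=m))

    def span(vecs):
        s = {tuple([0] * m)}
        for v in vecs:
            s = {tuple((x + c * y) % q for x, y in zip(u, v))
                 for u in s for c in range(q)}
        return s

    # an ordered m-tuple is a basis iff it spans all q^m vectors
    return sum(1 for basis in product(vectors, repeat=m)
               if len(span(basis)) == q ** m)

def formula(q, m):
    r = 1
    for i in range(m):
        r *= q ** m - q ** i
    return r

for q, m in [(2, 2), (3, 2), (2, 3)]:
    assert count_ordered_bases(q, m) == formula(q, m)
```

The i-th factor q^m − q^i counts the choices for the i-th basis vector: anything outside the span of the previous i vectors.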
In this section, we introduce some elementary concepts from algebraic geometry, which facilitate the treatment of elliptic and hyperelliptic curves in the next two sections. We concentrate only on plane curves, because these are the only curves we need in this book. Throughout this section, K denotes a field (finite or infinite) and K̄ the algebraic closure of K.
The set of solutions of a polynomial equation f(X, Y) = 0 is one of the central objects of study in algebraic geometry. For example, we know that in ℝ² the equation X² + Y² − 1 = 0 represents a circle centred at (0, 0) and of radius 1. When we pass to an arbitrary field, it is often not possible to visualize such plots, but it still makes sense to talk about the set of solutions of such an equation. For example, the solutions of the above circle equation over the field F_3 are the four discrete points (0, 1), (0, 2), (1, 0) and (2, 0). (This solution set does not really look round.)
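The four points are found by direct enumeration; a small Python check (our own illustration):

```python
# Enumerate the F_3-rational points of the "circle" X^2 + Y^2 - 1 = 0.
p = 3
points = [(x, y) for x in range(p) for y in range(p)
          if (x * x + y * y - 1) % p == 0]
assert sorted(points) == [(0, 1), (0, 2), (1, 0), (2, 0)]
```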
One can generalize this study by considering polynomials in n indeterminates and by investigating the simultaneous solutions of m polynomials. We, however, do not intend to be so general here and concentrate only on curves defined by a single polynomial equation in two indeterminates.
|
For n ∈ ℕ, the affine n-space A^n(K) := K^n is an n-dimensional vector space over K. For example, the affine plane A²(K) can be identified with the conventional X-Y plane. |
|
An affine plane (algebraic) curve C over K is defined by a non-zero polynomial f(X, Y) ∈ K[X, Y]. | K-rational points on a plane curve are precisely the solutions of the defining polynomial equation. Standard examples of affine plane curves include the straight lines given by aX + bY + c = 0, a, b, c ∈ K, a and b not both 0, and the conic sections (circles, ellipses, parabolas and hyperbolas) given by aX² + bXY + cY² + dX + eY + f = 0, a, b, c, d, e, f ∈ K, with at least one of a, b, c non-zero. For K = ℝ, the set of K-rational points can be drawn as a graph of the polynomial equation, whereas for an arbitrary field K (in particular, for finite fields) such drawings make little or no sense. However, it is often helpful to visualize curves as curves over ℝ (also called real curves) and then generalize the situation to an arbitrary field K.
The number ∞ is not treated as a real number (or integer or natural number). But it is often helpful to extend the definition of ℝ by including two points that are infinitely far away from the origin, one in each direction. This gives us the so-called extended real line ℝ ∪ {−∞, +∞}. An immediate advantage of such a completion of ℝ is that every Cauchy sequence converges in it. But for studying the roots of polynomial equations it is helpful to add only a single point at infinity to ℝ in order to get what is called the projective line P¹(ℝ) over ℝ. Similarly, if we start with the affine plane ℝ² and add a point at infinity for each slope a ∈ ℝ of straight lines Y = aX + b and one more for the vertical lines X = c, we get the so-called projective plane P²(ℝ) over ℝ. We also call the line passing through all the points at infinity in P²(ℝ) the line at infinity. An immediate benefit of passing from ℝ² to P²(ℝ) is that in P²(ℝ) any two distinct lines (parallel or not in ℝ²) meet at exactly one point, and through any two distinct points of P²(ℝ) passes a unique line.

Now it is time to replace ℝ by an arbitrary field K and rephrase our definitions in such a way that it continues to make sense to talk about the points and the line at infinity, even when K itself contains only finitely many points.
It is evident that the projective n-space P^n(K) can be identified with the set of all 1-dimensional vector subspaces (that is, lines through the origin) of the affine space A^{n+1}(K). To argue that this formal definition tallies with the intuitive notion for n = 2 and K = ℝ, consider the affine 3-space ℝ³ referred to by the coordinates X, Y, Z. Look at the family of planes ε_λ : Z = λ, λ ∈ ℝ, parallel to the X-Y plane. (ε_0 is the X-Y plane itself.) First take a non-zero value of λ, say λ = 1. Every line in ℝ³ passing through the origin and not parallel to the X-Y plane meets ε_1 at exactly one point. Conversely, a unique line passes through each point on ε_1 and the origin. In this way, we associate points of ℝ² with points on ε_1. These are all the finite points of P²(ℝ). On the other hand, the lines passing through the origin and lying in the X-Y plane (ε_0 : Z = 0) do not meet ε_1 and correspond to the points at infinity of P²(ℝ).
In the last paragraph, we obtained the canonical embedding of the affine plane A²(ℝ) in P²(ℝ) by setting Z = 1. By definition, P²(ℝ) is symmetric in X, Y and Z. This means that we can as well set X = 1 or Y = 1 and obtain other embeddings of A²(ℝ) in P²(ℝ). This observation often proves to be useful (for example, see Definition 2.66).
Now that we have passed from the affine plane to the projective plane, we should be able to carry (affine) plane curves to the projective plane. For this, we need some definitions.
|
Let R denote the polynomial ring K[X_0, X_1, . . . , X_n] over a field K. A monomial of R is an element of R of the form aX_0^{e_0}X_1^{e_1} · · · X_n^{e_n} with a ∈ K; its degree is e_0 + e_1 + · · · + e_n. A polynomial of R is called homogeneous of degree d if all of its non-zero monomial terms are of degree d. Let C : f(X, Y) = 0 be an affine plane curve over a field K defined by a non-zero polynomial f ∈ K[X, Y] of degree d. The homogenization of f is the homogeneous polynomial f^(h)(X, Y, Z) := Z^d f(X/Z, Y/Z) ∈ K[X, Y, Z]. |
Take a non-zero λ ∈ K and a point [x, y, z] ∈ P²(K). By definition, [x, y, z] = [λx, λy, λz]. Since f^(h)(λx, λy, λz) = λ^d f^(h)(x, y, z) = 0 if and only if f^(h)(x, y, z) = 0, it makes sense to talk about the zeros of the homogeneous polynomial f^(h) in the projective plane P²(K). This motivates us to define projective plane curves:
|
A projective plane curve C over K is defined by a non-zero homogeneous polynomial f(X, Y, Z) ∈ K[X, Y, Z]. The K-rational points on C are the points [x, y, z] ∈ P²(K) with f(x, y, z) = 0. |
Let C : f(X, Y) = 0 be an affine plane curve. The projective plane curve defined by f^(h)(X, Y, Z) is, by an abuse of notation, denoted also by C. The zeros of the affine curve C : f(X, Y) = 0 in A²(K) are in one-to-one correspondence with the finite zeros of C : f^(h)(X, Y, Z) = 0 in P²(K) (that is, the zeros with Z = 1). The projective curve contains some more point(s), namely those at infinity, which can be obtained by putting Z = 0 in f^(h)(X, Y, Z). Passage from the affine plane to the projective plane is just that: a systematic inclusion of the points at infinity.
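This passage can be made concrete for the curve Y² = X³ − X of Figure 2.1(b), taken here over F_5 (an assumed choice of field). The sketch below (our own) enumerates one representative per point of P²(F_5), recovers the affine solutions from the zeros with Z = 1, and the single point at infinity from those with Z = 0.

```python
# Finite vs. infinite points of the projective closure of C : Y^2 = X^3 - X
# over F_5; f_h(X, Y, Z) = Y^2 Z - X^3 + X Z^2 is the homogenization of f.
p = 5

def f_h(x, y, z):
    return (y * y * z - x ** 3 + x * z * z) % p

# one representative per projective point: [x, y, 1], [x, 1, 0], [1, 0, 0]
reps = [(x, y, 1) for x in range(p) for y in range(p)] \
     + [(x, 1, 0) for x in range(p)] + [(1, 0, 0)]

zeros = [P for P in reps if f_h(*P) == 0]
finite = [P for P in zeros if P[2] == 1]
infinite = [P for P in zeros if P[2] == 0]

# setting Z = 1 recovers the affine solutions of f(X, Y) = 0
assert all((y * y - x ** 3 + x) % p == 0 for x, y, _ in finite)
# setting Z = 0 leaves -X^3 = 0, so the only point at infinity is [0, 1, 0]
assert infinite == [(0, 1, 0)]
```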
It is often customary to write an affine plane curve as C : f(X, Y) = g(X, Y) and a projective plane curve as C : f(h)(X, Y, Z) = g(h)(X, Y, Z) with f(h) and g(h) of the same degree. The former is the same as the curve C : f – g = 0, and the latter the same as C : f(h) – g(h) = 0.
A homogeneous polynomial f(X, Y, Z) ∈ K[X, Y, Z] can be viewed as the homogenization of any of the polynomials
f_Z(X, Y) = f(X, Y, 1), f_Y(X, Z) = f(X, 1, Z) and f_X(Y, Z) = f(1, Y, Z).
Consider a point P = [a, b, c] on the projective curve C : f(X, Y, Z) = 0. Since a, b and c are not all 0, P is a finite point on at least one of the affine curves f_X = 0, f_Y = 0 and f_Z = 0.
Throughout the rest of Section 2.10 we make the following assumption:
|
K is an algebraically closed field, that is, K = K̄. |
Although many of the results we state now are valid for fields that are not algebraically closed, it is convenient to make this assumption in order to avoid unnecessary complications.
Let C : f(X, Y) = 0 be a curve defined over K. Henceforth we assume that the polynomial f(X, Y) is irreducible over K. Though we write the affine equation for the curve for notational simplicity, we usually work with the set C(K) of the K-rational points on the corresponding projective curve. We refer to the solutions of C in the affine plane A²(K) as the finite points on the curve.

Now we define polynomial functions on C. For a moment, we concentrate on the affine curve, that is, only the finite points on C. Let g, h ∈ K[X, Y] with g ≡ h (mod f) (that is, f | (g − h)). Since for any point P on C we have f(P) = 0, it follows that g(P) = h(P). This motivates us to define the following.
|
The ring K[X, Y]/〈f〉 is called the affine coordinate ring of C and is denoted by K[C]. Elements of K[C] are called polynomial functions on C. If we denote by x and y the residue classes of X and Y respectively in K[C], then a polynomial function on C is given by a polynomial g(x, y) with coefficients from K.
The quotient field (Exercise 2.34) of K[C] is called the function field of C and is denoted by K(C). An element of K(C) is of the form g(x, y)/h(x, y) with g(x, y), h(x, y) ∈ K[C] and h(x, y) ≠ 0, and is called a rational function on C. |
By definition, two rational functions g_1(x, y)/h_1(x, y) and g_2(x, y)/h_2(x, y) are equal if and only if g_1(x, y)h_2(x, y) − g_2(x, y)h_1(x, y) = 0 in K[C] or, equivalently, if and only if f | (g_1h_2 − g_2h_1) in K[X, Y]. We define addition and multiplication of rational functions by the usual rules (Exercise 2.34).
|
Let P = (a, b) be a finite point on the curve C. Given a polynomial function g(x, y) ∈ K[C], the value of g at P is defined as g(P) := g(a, b). Given a rational function r = g(x, y)/h(x, y) ∈ K(C) with h(P) ≠ 0, the value of r at P is defined as r(P) := g(P)/h(P); if r cannot be written with a denominator that is non-zero at P, we set r(P) := ∞. |
By definition, K[C] and K(C) are collections of equivalence classes. However, the value of a polynomial or a rational function on C is independent of the representatives of the equivalence classes and is, therefore, a well-defined concept.
The above definitions can be extended to the corresponding projective curve C : f(h)(X, Y, Z) = 0. By Exercise 2.96(e), the polynomial f(h) is irreducible, since we assumed f to be so.
|
The function field (denoted again by K(C)) of the projective curve C is the set of quotients (called rational functions) of the form g(X, Y, Z)/h(X, Y, Z), where g, h ∈ K[X, Y, Z] are homogeneous polynomials of the same degree and h is not divisible by f^(h). A rational function r = g/h is defined at a point P on C if h(P) ≠ 0, and its value there is r(P) := g(P)/h(P). |
One can define polynomial functions on a projective curve (as we did for affine curves), but it makes no sense to talk about the value of such a polynomial function at a point P on the curve, because this value depends on the choice of the homogeneous coordinates of P (Exercise 2.95). This problem is eliminated for a rational function g/h by assuming g and h to be of the same degree.
|
Let C be a projective plane curve, r be a non-zero rational function and P a point on C. P is called a zero of r if r(P) = 0, and a pole of r if r(P) = ∞. |
Now we define the multiplicities of zeros and poles of a rational function or, more generally, the order of any point on a projective plane curve. This is based on the following result, the proof of which is long and difficult, and is omitted.
|
The function u_P of the last theorem is called a uniformizing variable or a uniformizing parameter or simply a uniformizer of C at P. For any non-zero rational function r ∈ K(C), there is a unique integer d and a rational function s with s(P) ≠ 0, ∞ such that r = u_P^d s. The integer d is called the order of r at P and is denoted by ord_P(r). |
The connection of poles and zeros with orders is established by the following theorem, which we again state without proof.
|
P is neither a pole nor a zero of r if and only if ordP(r) = 0. P is a zero of r if and only if ordP(r) > 0. P is a pole of r if and only if ordP(r) < 0. |
If P is a zero (resp. a pole) of r, the integer ordP(r) (resp. – ordP(r)) is called the multiplicity of the zero (resp. pole) P.
|
Let r be a non-zero rational function on the projective plane curve C defined over K. Then r has only finitely many poles and zeros. Furthermore, Σ_P ord_P(r) = 0, the sum being taken over all points P on C. |
This is one of the theorems that demand K to be algebraically closed. More explicitly, if K is not algebraically closed, any rational function r ∈ K(C) continues to have only finitely many zeros and poles, but the sum of the orders of r at these points is not necessarily equal to 0. Also note that this sum, if taken over only the finite points of C, need not be 0, even when K is algebraically closed.
Now that we know how to define and evaluate rational functions on a curve, we are in a position to define rational maps between two curves. Let C_1 : f_1(X, Y, Z) = 0 and C_2 : f_2(X, Y, Z) = 0 be two projective plane curves defined over K by irreducible homogeneous polynomials f_1, f_2 ∈ K[X, Y, Z].
Isomorphism is an equivalence relation on the set of all projective plane curves defined over K. Since two isomorphic curves share many common algebraic and geometric properties, it is of interest in algebraic geometry to study the equivalence classes (rather than the individual curves). If C1 ≅ C2 and C2 has a simpler representation than C1, then studying the properties of C2 makes our job simpler and at the same time reveals all the common properties of C1. (See Section 2.11 for an example.)
Let a be a symbol and n a positive integer. We represent by na the formal sum a + · · · + a (n times). We also define 0a := 0 and −na := n(−a), where the symbol −a satisfies a + (−a) = (−a) + a = 0. For n_1, n_2 ∈ ℤ, we define n_1a + n_2a := (n_1 + n_2)a. The set ℤa := {na | n ∈ ℤ} under these definitions becomes an Abelian group. If we are given two symbols a, b, we can analogously define formal sums na + mb, n, m ∈ ℤ, and the sum of formal sums as (n_1a + m_1b) + (n_2a + m_2b) := (n_1 + n_2)a + (m_1 + m_2)b. With these definitions the set ℤa + ℤb := {na + mb | n, m ∈ ℤ} becomes an Abelian group. These constructions can be generalized as follows:

Now let the symbols be the K-rational points on a projective plane curve C defined over K. For notational convenience, we represent by [P] the symbol corresponding to the point P on C. This removes confusion in connection with elliptic curves C (see Section 2.11), for which we intend to make a distinction between P + Q and [P] + [Q] for two points P, Q ∈ C(K). The former sum is again a point on C, whereas the latter is never (the symbol corresponding to) a point on C.
Now we define divisors of rational functions on C. Henceforth we assume that C is smooth (that is, smooth at all K-rational points on C).
|
The divisor of a non-zero rational function r ∈ K(C) is the formal sum div(r) := Σ_P ord_P(r)[P], the sum being over the (finitely many) points P on C with ord_P(r) ≠ 0. A divisor of the form div(r) is called a principal divisor. |
Though the Jacobian is defined for an arbitrary smooth curve C (defined by an irreducible polynomial), it is a special class of curves, called hyperelliptic curves, for which it is particularly easy to represent and do arithmetic in this group. This gives us yet another family of groups on which cryptographic protocols can be built.

If K is not algebraically closed, we need not have Σ_P ord_P(r) = 0 for a rational function r ∈ K(C). This means that in that case the Jacobian cannot be defined in the above manner. However, since C is also a curve defined over K̄, we can define the Jacobian over K̄ as above and call a particular subgroup of it the Jacobian of C over K. We defer this discussion until Section 2.12.
In this exercise set, we do not assume (unless otherwise stated) that K is necessarily algebraically closed.
| 2.95 |
|
| 2.96 | In this exercise, we generalize the notion of homogenization and dehomogenization of polynomials. Let K[X_1, . . . , X_n] denote the polynomial ring in n indeterminates. Introducing another indeterminate X_0, we define the homogenization of a non-zero polynomial f ∈ K[X_1, . . . , X_n] of degree d as f^(h)(X_0, X_1, . . . , X_n) := X_0^d f(X_1/X_0, . . . , X_n/X_0).
Prove the following assertions.
|
| 2.97 | Let C : f(X, Y) = 0 be an affine plane curve defined by a non-zero polynomial f ∈ K[X, Y], and C : f^(h)(X, Y, Z) = 0 the corresponding projective plane curve. Let d := deg f = deg f^(h) and f_d the sum of the non-zero terms of f of degree d. Show that:
|
| 2.98 | Show that the defining polynomial of the elliptic curve in Exercise 2.97(e) is irreducible. Prove the same for the hyperelliptic curve of Exercise 2.97(f). [H] |
| 2.99 | Show that for an ideal of K[X_1, . . . , X_n] the following two conditions are equivalent:
An ideal satisfying the above equivalent conditions is called a homogeneous ideal. Construct an example to demonstrate that all ideals of K[X1, . . . , Xn] need not be homogeneous. |
The mathematics of elliptic curves is vast and complicated. A reasonably complete understanding of elliptic curves would require a book of a size comparable to this one. So we plan to be rather informal while talking about elliptic curves and about their generalizations called hyperelliptic curves. Interested readers can go through the books suggested at the end of this chapter to learn more about these curves. In this section, K stands for a field (finite or infinite) and K̄ for the algebraic closure of K.
An elliptic curve E over K is a plane curve defined by the polynomial equation

Equation 2.6

E : Y² + a₁XY + a₃Y = X³ + a₂X² + a₄X + a₆, with a₁, a₂, a₃, a₄, a₆ ∈ K,
or by the corresponding homogeneous equation
E : Y²Z + a₁XYZ + a₃YZ² = X³ + a₂X²Z + a₄XZ² + a₆Z³.
These equations are called the Weierstrass equations for E. In order that E qualifies as an elliptic curve, we additionally require that it is smooth at all K̄-rational points (Definition 2.66).[12] Two elliptic curves defined over the field ℝ are shown in Figure 2.1.
[12] Ellipses are not elliptic curves.
Figure 2.1: Two elliptic curves over ℝ: (a) Y² = X³ − X + 1, (b) Y² = X³ − X
E contains a single point at infinity, namely O = [0, 1, 0] (Exercise 2.97(e)). The set of K-rational points on E in the projective plane P²(K) is denoted by E(K) and is the central object of study in the theory of elliptic curves. We shortly endow E(K) with a group structure, and this group is used extensively in cryptography.
Let us first see how we can simplify the equation for E. The simplification depends on the characteristic of K. Because fields of characteristic 3 are only rarely used in cryptography, we will not deal with such fields. Simplification of the Weierstrass equation is effected by suitable changes of coordinates. A special kind of transformation is allowed in order to preserve the geometric and algebraic properties of an elliptic curve.
|
Two elliptic curves E and Ẽ, both defined over K by equations of the form (2.6), are isomorphic (Definition 2.72) if and only if there exist u, r, s, t ∈ K, u ≠ 0, such that the substitution (X, Y) ← (u²X + r, u³Y + u²sX + t) transforms the equation of E into that of Ẽ, the coefficients of the two equations being related by

Equation 2.7
|
The theorem is not proved here. Formulas (2.7) can be checked by tedious calculations. A change of variables as in Theorem 2.44 is referred to as an admissible change of variables. We denote this by
(X, Y) ← (u²X + r, u³Y + u²sX + t).
The inverse transformation is also admissible and is given by
(X, Y) ← (u^{−2}(X − r), u^{−3}(Y − sX + sr − t)).
Isomorphism is an equivalence relation on the set of all elliptic curves over K.
Consider the elliptic curve E over K given by Equation (2.6). If char K ≠ 2, the admissible change (X, Y) ← (X, Y − (a₁X + a₃)/2) transforms E to the form
E₁ : Y² = X³ + b₂X² + b₄X + b₆.
If, in addition, char K ≠ 3, the admissible change (X, Y) ← (X − b₂/3, Y) transforms E₁ to E₂ : Y² = X³ + aX + b. We henceforth assume that an elliptic curve over a field of characteristic ≠ 2, 3 is defined by
Equation 2.8

E : Y² = X³ + aX + b
(instead of by the original Weierstrass Equation (2.6)).
If char K = 2, the Weierstrass equation cannot be simplified as in Equation (2.8). In this case, we consider two cases separately, namely a₁ ≠ 0 and a₁ = 0. In the former case, a suitable admissible change of variables allows us to write Equation (2.6) in the simplified form
Equation 2.9

E : Y² + XY = X³ + aX² + b
On the other hand, if a₁ = 0, then the admissible change (X, Y) ← (X + a₂, Y) shows that E can be written in the form
Equation 2.10

E : Y² + aY = X³ + bX + c
A curve defined by Equation (2.9) is called non-supersingular, whereas one defined by Equation (2.10) is called supersingular.
Now we associate two quantities with an elliptic curve. The importance of these quantities follows from the subsequent theorem. We start with the generic Weierstrass equation and later specialize to the simplified formulas.
|
For the curve given by Equation (2.6), we define the following quantities:

Equation 2.11

b₂ := a₁² + 4a₂,
b₄ := 2a₄ + a₁a₃,
b₆ := a₃² + 4a₆,
b₈ := a₁²a₆ + 4a₂a₆ − a₁a₃a₄ + a₂a₃² − a₄²,
Δ(E) := −b₂²b₈ − 8b₄³ − 27b₆² + 9b₂b₄b₆,
j(E) := (b₂² − 24b₄)³/Δ(E) (defined when Δ(E) ≠ 0).

Δ(E) is called the discriminant of the curve E, and j(E) the j-invariant of E. |
For the special cases given by the simplified equations above, these quantities have more compact formulas as given in Table 2.5.
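For the short form E : Y² = X³ + aX + b (char K ≠ 2, 3), the general expressions collapse to Δ = −16(4a³ + 27b²) and j = −1728(4a)³/Δ. A small sketch (our own illustration) evaluates these for the curves of Figure 2.1:

```python
# Discriminant and j-invariant specialized to E : Y^2 = X^3 + aX + b
# (characteristic != 2, 3): Delta = -16(4a^3 + 27b^2), j = -1728(4a)^3 / Delta.
from fractions import Fraction

def delta(a, b):
    return -16 * (4 * a ** 3 + 27 * b ** 2)

def j_invariant(a, b):
    d = delta(a, b)
    assert d != 0, "Delta = 0: the curve is singular, not elliptic"
    return Fraction(-1728 * (4 * a) ** 3, d)

# the curves of Figure 2.1 have non-zero discriminant, hence are elliptic
assert delta(-1, 1) != 0 and delta(-1, 0) != 0
assert j_invariant(-1, 0) == 1728      # Y^2 = X^3 - X has j = 1728
assert delta(0, 0) == 0                # Y^2 = X^3 is singular (a cusp)
```

The assertion Δ ≠ 0 ⟺ smoothness is exactly the first property of the theorem that follows.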
|
For the curve E defined by Equation (2.6), the following properties hold:
Proof
|
Consider an elliptic curve E over a field K. We now define an operation (which is conventionally denoted by +) on the set E(K) of K-rational points on E in the projective plane P²(K). This operation provides a group structure on E(K). It is important to point out that this group is not the same as the group Div_K(E) of divisors on E(K) (Definition 2.74), since the sum of points we are going to define is not formal. However, there is a connection between these two groups (see Exercise 2.125).
|
Let E be the elliptic curve defined by Equation (2.6) and O its point at infinity. For P, Q ∈ E(K), let L be the line through P and Q (the tangent to E at P, if P = Q). L intersects E at a third point R (counting multiplicities). Define P + Q to be the opposite of R, and set P + O := O + P := P. Under this operation, E(K) is an Abelian group with identity O.
|
No simple proof of this theorem is known. Indeed, the only group axiom that is difficult to verify is associativity, that is, that (P + Q) + R = P + (Q + R) for all P, Q, R ∈ E(K). An elementary strategy would be to write explicit formulas for (P + Q) + R and P + (Q + R) (using the formulas for P + Q given below) and show that they are equal, but this process involves a lot of tedious calculations and the consideration of many cases.
There are other proofs that are more elegant, but not as elementary. One possibility is to use the theory of divisors and is outlined now. It turns out that the Jacobian of E has a bijective correspondence with the set E(K) via the map which takes P ∈ E(K) to [P] − [O] (more correctly, to the equivalence class of the divisor [P] − [O] in the Jacobian). Furthermore, this map takes P + Q to ([P] − [O]) + ([Q] − [O]), where the addition on the left is the addition on E(K) as defined above and the addition on the right is that in the Jacobian. By definition, the Jacobian is naturally an additive Abelian group. It immediately follows that E(K) is an additive Abelian group too. (See Exercise 2.125.)
We now give the formulas for the coordinates of the points −P and P + Q on E(K). The derivation of these formulas for the general case is left to the reader (Exercise 2.102). We concentrate on the important special cases. We assume that P = (h₁, k₁) and Q = (h₂, k₂) are finite points on E(K) with Q ≠ −P, so that P + Q ≠ O.
If char K ≠ 2, 3 and E is defined by Equation 2.8, we have:

−P = (h₁, −k₁).
For P + Q = (h₃, k₃), let λ := (k₂ − k₁)/(h₂ − h₁) if P ≠ Q, and λ := (3h₁² + a)/(2k₁) if P = Q. Then
h₃ = λ² − h₁ − h₂, k₃ = λ(h₁ − h₃) − k₁.
Next, we consider char K = 2 and non-supersingular curves (Equation 2.9). The formulas in this case are:

−P = (h₁, h₁ + k₁).
If P ≠ Q, let λ := (k₁ + k₂)/(h₁ + h₂); then h₃ = λ² + λ + h₁ + h₂ + a and k₃ = λ(h₁ + h₃) + h₃ + k₁.
If P = Q, let λ := h₁ + k₁/h₁; then h₃ = λ² + λ + a and k₃ = h₁² + (λ + 1)h₃.
Finally, for supersingular curves (Equation 2.10) with char K = 2, we have:

−P = (h₁, k₁ + a).
If P ≠ Q, let λ := (k₁ + k₂)/(h₁ + h₂); then h₃ = λ² + h₁ + h₂ and k₃ = λ(h₁ + h₃) + k₁ + a.
If P = Q, let λ := (h₁² + b)/a; then h₃ = λ² and k₃ = λ(h₁ + h₃) + k₁ + a.
We denote by mP the sum P + · · · + P (m times) for a point P ∈ E(K) and for m ∈ ℕ. We also define 0P := O and (−m)P := −(mP) (for m ∈ ℕ).
|
|
Let |
Multiples mP of a point P ∈ E(K) can be expressed using nice formulas.
|
For an elliptic curve defined over K by the equation E : f(X, Y) = 0 and for a finite point P = (h, k) ∈ E(K) with mP ≠ O, there exist polynomials θ_m, ψ_m, ω_m such that mP = (θ_m(h, k)/ψ_m(h, k)², ω_m(h, k)/ψ_m(h, k)³). The polynomial ψ_m is called the m-th division polynomial of E. |
Using the addition formula one can verify the following recursive description for ψm and the expressions for θm and ωm in terms of ψm.
|
For an elliptic curve E defined by the general Weierstrass Equation (2.6) over a field K, the division polynomials ψ_m, m ≥ 0, can be computed by the recurrence below, where the d_i are as in Definition 2.76. The polynomials θ_m satisfy
and for char K ≠ 2, one has
|
It follows by induction on m that these formulas really give polynomial expressions for ψm, θm and ωm for all
. For even m, the polynomial ψm is divisible by ψ2. Furthermore, for
the polynomials
defined as

can be expressed as polynomials in x only. These univariate polynomials
are easier to handle than the bivariate ones ψm and, by an abuse of notation, are also called division polynomials. The degrees of
satisfy the inequality:

Points of E[m] can be characterized in terms of the division polynomials:
|
| Let |
We finally define polynomials f_m as follows. If char K ≠ 2, then f_m is taken to be the univariate division polynomial introduced above, for all m ∈ ℕ. On the other hand, for char K = 2 and for non-supersingular curves over K, we already have ψ_m ∈ K[x] (Exercise 2.107), and it is customary to define f_m(x) := ψ_m(x, y) for all m ∈ ℕ. By further abuse of notation, we also call f_m the m-th division polynomial of E.
In this section, we take K = F_q, a finite field of cardinality q and characteristic p. We do not deal with the case p = 3. Let E be an elliptic curve defined over F_q. If p > 3, we assume that E is defined by Equation (2.8), whereas for p = 2, we assume that E is defined by Equation (2.10) or Equation (2.9), depending on whether E is supersingular or not.

Since E(F_q) is a subset of P²(F_q), the cardinality #E(F_q) is finite. The next theorem shows that #E(F_q) is quite close to q.
| Define t := q + 1 − #E(F_q). Then |t| ≤ 2√q. (Hasse's theorem) |
The implication of this theorem is that the possible cardinalities of E(F_q) lie in the rather narrow interval [q + 1 − 2√q, q + 1 + 2√q]. If q = p is a prime, then for every n in this interval, there is at least one curve E with #E(F_p) = n. Moreover, the values of #E(F_p) are distributed almost uniformly in the interval. However, if q is not a prime, these nice results do not continue to hold.
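These statements are easy to test empirically for a small prime. The sketch below (our own illustration) computes #E(F_7) for every smooth curve Y² = X³ + aX + b over F_7, checks Hasse's bound, and confirms that every order in the interval actually occurs.

```python
# Empirical check of the Hasse bound |#E(F_p) - (p + 1)| <= 2*sqrt(p)
# over all smooth curves Y^2 = X^3 + aX + b over F_7.
from math import isqrt

p = 7
orders = set()
for a in range(p):
    for b in range(p):
        if (4 * a ** 3 + 27 * b ** 2) % p == 0:
            continue                      # Delta = 0: singular, not elliptic
        n = 1 + sum(1 for x in range(p) for y in range(p)
                    if (y * y - x ** 3 - a * x - b) % p == 0)
        orders.add(n)
        assert (n - (p + 1)) ** 2 <= 4 * p            # Hasse's bound

# for a prime field, every order in the Hasse interval is attained
lo, hi = p + 1 - isqrt(4 * p), p + 1 + isqrt(4 * p)
assert orders == set(range(lo, hi + 1))
```

Here the interval is [3, 13], and all eleven group orders indeed occur among the 42 smooth curves.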
|
If t = 1 (that is, if #E(F_q) = q), the curve E is called anomalous. |
Anomalous and supersingular curves are cryptographically weak, because for these curves certain algorithms are known, with running time better than exponential, for solving the so-called elliptic curve discrete logarithm problem. Determination of the order #E(F_q) gives t, from which one can easily check whether E is anomalous or supersingular. If p = 2, we have an easier check for supersingularity.
|
An elliptic curve E over a finite field of characteristic 2 is supersingular if and only if j(E) = 0 or, equivalently, if and only if a1 = 0 in Equation (2.6). |
For arbitrary characteristic p, we have the following characterization.
|
An elliptic curve E over F_q (where q = p^n) is supersingular if and only if p divides t. |
By Theorem 2.38, the group F_q^* is always cyclic. However, the group E(F_q) is not always cyclic, but is of a special kind. We need a few definitions to explain the structure of E(F_q). The notion of internal direct product for multiplicative groups (Exercise 2.19) can be readily applied to additive groups as follows.
|
The elliptic curve group E(F_q) is isomorphic to ℤ_{n₁} ⊕ ℤ_{n₂} for unique positive integers n₁, n₂ with n₂ | n₁ and n₂ | q − 1. |
Once we know the order #E(F_q) of the group E(F_q), it is easy to compute the order of any point in E(F_q), as the following theorem suggests.
|
Let α, |
| 2.100 | Show that the following curves over K are not smooth (and hence not elliptic curves): |
| 2.101 |
|
| 2.102 | Let P = (h₁, k₁) and Q = (h₂, k₂) be two points (different from O) in E(K) defined by the Weierstrass Equation (2.6). Assume that Q ≠ −P. Determine R = (h₃, k₃) = P + Q as follows:
|
| 2.103 | Let . Show that there exists an elliptic curve E over K such that . [H]
|
| 2.104 | Assume that char K ≠ 2, 3 and consider the elliptic curve E given by Equation (2.8). Let K[E] be the affine coordinate ring and K(E) the field of rational functions on E.
|
| 2.105 | Show that the division polynomials for the general Weierstrass equation can be recursively defined as
where F = 4x3 + d2x2 + 2d4x + d6. |
| 2.106 | Write the recursive formulas for the division polynomials ψm(x, y) and for the elliptic curve E defined by Equation 2.8 over a field K of characteristic ≠ 2, 3. Show that for m ≥ 2 and for we have
|
| 2.107 | Write the recursive formulas for the division polynomials ψm(x, y) and for the elliptic curve E defined by Equation 2.9 over a field K of characteristic 2. Conclude that ψm are polynomials in only x for all . With fm := ψm for all show that for m ≥ 2 and for we have
|
| 2.108 | Consider the elliptic curve defined over the field :
Ea,b : Y² = X³ + aX + b. Verify the following assertions: (You may write a computer program.)
|
| 2.109 | Consider the representation of F_8 as F_2(ξ), where ξ is a root of T³ + T + 1 ∈ F_2[T]. Identify an element a₂ξ² + a₁ξ + a₀ (where a₀, a₁, a₂ ∈ F_2) with the integer (a₂a₁a₀)₂ = a₂·2² + a₁·2 + a₀. For a, b ∈ F_8, b ≠ 0, define the non-supersingular elliptic curve:
Ea,b : Y² + XY = X³ + aX² + b. Verify the following assertions: (You may write a computer program.)
|
| 2.110 | Consider the representation of F_8 and the identification of elements of F_8 with integers as in Exercise 2.109. For a, b, c ∈ F_8, a ≠ 0, define the supersingular elliptic curve:
Ea,b,c : Y² + aY = X³ + bX + c. Verify the following assertions: (You may write a computer program.)
|
| 2.111 | Consider the elliptic curve E : Y² + XY = X³ + X² + 1 defined over F_{2^n} for all n ∈ ℕ. Show that
where r = ⌊n/2⌋. [H] Conclude that E is anomalous over F_2. |
| 2.112 | Let K be a finite field of characteristic ≠ 2, 3 and E : Y² = X³ + aX + b an elliptic curve defined over K. Prove that:
|
| 2.113 | Let E : Y² + XY = X³ + aX² + b be a non-supersingular elliptic curve defined over F_{2^n}. Prove that:
|
| 2.114 | Let E : Y² + aY = X³ + bX + c be a supersingular elliptic curve over F_{2^n}. Prove that:
|
| 2.115 |
|
| 2.116 | Let p be a prime with p ≡ 3 (mod 4), and let a ∈ F_p, a ≠ 0. Consider the elliptic curve E : Y² = X³ − a²X over F_p (or over an extension of F_p). Prove that:
|
| 2.117 | A Weierstrass equation of an elliptic curve defined over a field K is said to be in the Legendre form, if it can be written as
Equation 2.12
E : Y² = X(X − 1)(X − λ)
for some λ ∈ K, λ ≠ 0, 1. |
Hyperelliptic curves are generalizations of elliptic curves. We cannot define a group structure on a general hyperelliptic curve in the way we did for elliptic curves. We instead work in the Jacobian of a hyperelliptic curve. For an elliptic curve E over an algebraically closed field K, the Jacobian of E is canonically isomorphic to the group E(K). Thus one can as well use the techniques for hyperelliptic curves for describing and working in elliptic curve groups. However, the exposition of the previous section turns out to be more intuitive and computationally oriented.
A hyperelliptic curve C of genus
over a field K is defined by a polynomial equation of the form
Equation 2.13

In order that C qualifies as a hyperelliptic curve, we additionally require that C (as a projective curve) be smooth over
. The set of K-rational points on C is denoted as usual by C(K). For g = 1, Equation (2.13) is the same as the Weierstrass Equation (2.6) on page 98; that is, elliptic curves are hyperelliptic curves of genus one. A hyperelliptic curve of genus 2 over
is shown in Figure 2.2.
: Y² = X(X² – 1)(X² – 2)

A hyperelliptic curve has only one point at infinity
(Exercise 2.97(f)) and is smooth at
. If char K ≠ 2, substituting
simplifies Equation (2.13) as
. Since
is a monic polynomial in K[X] of degree 2g + 1, we may assume that if char K ≠ 2, the equation for C is of the form:
Equation 2.14

|
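The genus-2 curve of Figure 2.2 can be explored numerically in the same spirit. A small sketch, assuming we reduce Y² = X(X² − 1)(X² − 2) modulo the illustrative prime p = 11 and list its affine rational points:

```python
# Affine F_p-rational points of C: Y^2 = X(X^2 - 1)(X^2 - 2).
# The prime p = 11 is an illustrative choice; C also has one point at infinity.

def affine_points(p):
    pts = []
    for x in range(p):
        v = (x * (x * x - 1) * (x * x - 2)) % p
        for y in range(p):
            if (y * y) % p == v:
                pts.append((x, y))
    return pts

pts = affine_points(11)
print(len(pts))  # 11 affine points
```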
If char K ≠ 2, then the hyperelliptic curve C defined by Equation (2.14) is smooth if and only if v has no multiple roots (in Proof First, consider char K ≠ 2. If v has a multiple root, say For char K = 2 and |
|
Let P = (h, k) be a finite point on the hyperelliptic curve C defined by Equation (2.13). The point
|
All the general theory we described in Section 2.10 continues to be valid for hyperelliptic curves. However, since we are now given an explicit equation describing the curves, we can give more explicit expressions for polynomial and rational functions on hyperelliptic curves. For simplicity, we consider the affine equation and extend our definitions separately for the point at infinity.
Consider the hyperelliptic curve C defined by Equation (2.13). By Exercise 2.98, the defining polynomial f(X, Y) := Y2 + u(X)Y – v(X) (or its homogenization) is irreducible over
, so that the affine (or projective) coordinate ring of C is an integral domain and the corresponding function field is simply the field of fractions of the coordinate ring.
Let
. Since y² + u(x)y – v(x) = 0 in K[C], we can repeatedly replace y² by –u(x)y + v(x) in G(x, y) until the y-degree of G(x, y) becomes less than 2. This proves part of the following:
|
Every polynomial function Proof In order to establish the uniqueness, note that if G(x, y) = a1(x) + yb1(x) = a2(x) + yb2(x), then |
|
Let |
Some useful properties of the norm function are listed in the following lemma, the proof of which is left to the reader as an easy exercise.
|
For G,
|
We also have an easy description of the rational functions on C.
|
Every rational function Proof We can write r(x, y) = G(x, y)/H(x, y) for G, |
The value of a rational function on C at a finite point on C can be defined as in the case of general curves (See Definition 2.68). In order to define the value of a rational function at the point
, we need some other concepts.
For a moment, let us assume that
. From the equation of C, we see that k² ≈ h²ᵍ⁺¹ (neglecting lower-degree terms) for sufficiently large coordinates h, k of a point
. This means that k grows roughly as the (2g + 1)/2-th power of h. So it is customary to give Y a weight (2g + 1)/2 times the weight we give to X. The smallest integral weights of X and Y satisfying this are 2 and 2g + 1 respectively. This motivates Definition 2.84 (generalized for any K).
|
| Let If 0 ≠ G = a(x) + yb(x), d₁ = degx a and d₂ = degx b, then the leading coefficient of G is taken to be the coefficient of x^{d₁} in a(x) if deg G = 2d₁, or to be the coefficient of x^{d₂} in b(x) if deg G = 2g + 1 + 2d₂. (We cannot have 2d₁ = 2g + 1 + 2d₂, since the left side is even and the right side is odd.) |
Some basic properties of the degree function follow.
|
For G,
Proof Easy exercise. |
Now we are in a position to give an explicit definition of the value of a rational function at
.
|
For If deg(G) < deg(H), then If deg(G) > deg(H), then If deg(G) = deg(H), then |
Now that we have a complete description of the value of a rational function at any point on C, poles and zeros of rational functions on C can be defined as in Definition 2.70. In order to define the order of a polynomial or rational function at a point P on C, we need a uniformizing parameter uP at P. Tedious but routine calculations yield the following explicit expressions for uP.
|
Let
as a uniformizing parameter at P. Finally, |
We give an alternative definition of the order (independent of uP), which is computationally useful and which is equivalent to Definition 2.71 for a hyperelliptic curve.
|
| Let Now consider r = (x – h)^m for some m < 0. Write r = G/H with G = 1 and H = (x – h)^{–m}. Since ordQ(r) = ordQ(G) – ordQ(H), we continue to have
If m ≥ 0, then r is a polynomial function and has zeros P and |
|
A non-constant polynomial function |
We continue to work with the hyperelliptic curve C of Equation (2.13). We first impose the restriction that K is algebraically closed and use the theory of Section 2.10 to define the set Div(C) of divisors on C, the degree-zero part Div⁰(C) of Div(C), the divisor Div(r) of a rational function
, the set Prin(C) of principal divisors on C, the Picard group Pic(C) = Div(C)/Prin(C) and the Jacobian
.
|
For the rational function r := (x – h)^m of Example 2.23, we have:
|
The Jacobian
is the set of all cosets of Prin(C) in Div⁰(C). It is inconvenient to work directly with cosets (which are equivalence classes). Recall that in the case of
, we represented a coset
by the remainder of Euclidean division of a by n. In the case of the representation
, we took polynomials of smallest degree as canonical representatives of the cosets of 〈f(X)〉. In the case of
too, we intend to find such good representatives, one from each coset. We now introduce the concept of reduced divisors for this purpose.
|
Two divisors D1, |
Our goal is to associate to every divisor
some unique reduced divisor
with D ~ Dred, that is, Dred plays the role of the canonical representative of
. We start with the following definition.
|
A divisor |
|
Every divisor Proof Let
and
with m1 and m2 so chosen that D1,
Now, we explain how we can represent a semi-reduced divisor by a pair of polynomials a(x), |
|
Let
|
We denote the divisor gcd
by Div(a, b). The zero divisor has the representation Div(1, 0).
A representation of the elements of
by semi-reduced divisors (that is, by pairs of polynomials in K[x]) suffers from two disadvantages. First, the representation is not unique, and second, the degrees of the representing polynomials may be quite large. These difficulties are removed if we consider semi-reduced divisors of a special kind.
|
A semi-reduced divisor |
The following theorem establishes the desirable properties of a reduced divisor.
|
For Proof We only prove the existence of reduced divisors. For the proof of the uniqueness, one may, for example, see Koblitz [154]. The norm of a divisor Let |
From the viewpoint of cryptography, the field K should be a finite field which is never algebraically closed. So we must remove the restriction
. Since C is naturally defined over
as well, we start with the Jacobian
and define a particular subgroup of
to be the Jacobian
of C over K.
|
Let |
Every element of
can be represented uniquely as a reduced divisor Div(a, b) for polynomials a(x),
with degx a ≤ g and degx b < degx a.
is, therefore, a finite Abelian group. For suitably chosen hyperelliptic curves, these groups can be used to build cryptographic protocols.
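The defining conditions of a (semi-)reduced divisor Div(a, b) can be verified mechanically. Below is a minimal sketch, assuming a hypothetical coefficient-list representation of polynomials over F_p (lowest degree first); the curve y² = x⁵ + 1 over F₇ and all helper names are illustrative choices, not constructions from the text.

```python
def trim(f):
    """Drop trailing zero coefficients (lists are low-degree first)."""
    while f and f[-1] == 0:
        f.pop()
    return f

def pmul(f, g, p):
    if not f or not g:
        return []
    res = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            res[i + j] = (res[i + j] + a * b) % p
    return trim(res)

def psub(f, g, p):
    n = max(len(f), len(g))
    res = [((f[i] if i < len(f) else 0) - (g[i] if i < len(g) else 0)) % p
           for i in range(n)]
    return trim(res)

def prem(f, g, p):
    """Remainder of f divided by g over F_p (g non-zero, p prime)."""
    f = f[:]
    while len(f) >= len(g):
        c = f[-1] * pow(g[-1], p - 2, p) % p
        shift = len(f) - len(g)
        for i, b in enumerate(g):
            f[shift + i] = (f[shift + i] - c * b) % p
        trim(f)
    return f

def is_semi_reduced(a, b, u, v, p):
    """Div(a, b) semi-reduced: a monic, deg b < deg a, a | b^2 + b*u - v."""
    if a[-1] != 1 or len(b) >= len(a):
        return False
    w = psub(pmul(b, b, p), psub(v, pmul(b, u, p), p), p)  # b^2 + b*u - v
    return prem(w, a, p) == []

u = []                      # u(x) = 0 (valid since char K != 2)
v = [1, 0, 0, 0, 0, 1]      # v(x) = x^5 + 1, a genus-2 curve over F_7
p = 7
# P = (0, 1) lies on C, and the divisor P - infinity is Div(x - 0, 1);
# it is even reduced, since deg a = 1 <= g = 2.
print(is_semi_reduced([0, 1], [1], u, v, p))   # True
print(is_semi_reduced([0, 1], [2], u, v, p))   # False: (0, 2) is not on C
```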
In this exercise set, we let C denote a hyperelliptic curve of genus g defined by Equation (2.13) over a field K (not necessarily algebraically closed).
| 2.118 |
|
| 2.119 | Represent as , where ξ is a root of the irreducible polynomial .
|
| 2.120 | Let . Prove the following assertions:
|
| 2.121 | Prove Lemmas 2.9 and 2.10. |
| 2.122 | Let and .
|
| 2.123 | Prove Theorem 2.52. [H] |
| 2.124 | A line on C is a polynomial function of the form with a, b, , a and b not both 0.
|
| 2.125 | Let E be an elliptic curve (that is, a hyperelliptic curve of genus 1) defined over K.
|
In this section, we develop the theory of number fields and number rings. Our aim is to make accessible to the reader the workings of the cryptanalytic algorithms based on the number field sieve.
Commutative algebra is the study of commutative rings with identity (rings by our definition). Modern number theory and geometry are based on results from this area of mathematics. Here we give a brief sketch of some commutative algebra tools that we need for developing the theory of number fields.
We start with some basic operations on ideals (cf. Example 2.7, Definition 2.23).
One can readily check that the operations intersection, sum and product on ideals in a ring are associative and commutative.
Commutative algebra extensively uses the theory of prime and maximal ideals (Definition 2.19, Proposition 2.9, Corollary 2.2 and Exercise 2.23). The set of all prime ideals in A is called the (prime) spectrum of A and is denoted by Spec A. The set of all maximal ideals of A is called the maximal spectrum of A and denoted by Spm A. We have Spm A ⊆ Spec A. These two sets play an extremely useful role in the study of the ring A. If A is non-zero, both these sets are non-empty.
The construction of the rationals as fractions of integers can be applied in a more general setting. Instead of allowing any non-zero element as the denominator of a fraction, we may allow only elements from a specific subset. All we require for the collection of fractions to form a ring is that the allowed denominators be closed under multiplication.
|
Let A be a ring. A non-empty subset S of A is called multiplicatively closed or simply multiplicative, if |
|
Let A be a ring and S a multiplicative subset of A. We define a relation ~ on A × S as: (a, s) ~ (b, t) if and only if u(at – bs) = 0 for some
. (If A is an integral domain, one may take u = 1 in the definition of ~.) It is easy to check that ~ is an equivalence relation on A × S. The set of equivalence classes of A × S under ~ is denoted by S⁻¹A, whereas the equivalence class of
is denoted as a/s. For a/s,
, define (a/s) + (b/t) := (at + bs)/(st) and (a/s)(b/t) := (ab)/(st). It is easy to check that these operations are well-defined and make S⁻¹A a ring with identity 1/1, in which each s/1,
, is invertible. There is a canonical ring homomorphism
taking a ↦ a/1. In general,
is not injective. However, if A is an integral domain and 0 ∉ S, then the injectivity of
can be proved easily, and we say that the ring A is canonically embedded in the ring S⁻¹A.
|
| Let A be a ring and S a multiplicative subset of A. The ring S⁻¹A constructed as above is called the localization of A away from S or the ring of fractions of A with respect to S. |
|
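The localization construction can be illustrated concretely. A small sketch for A = ℤ and the multiplicative set S = {1, 2, 4, 8, . . .}, using Python's Fraction type as a stand-in for the fractions a/s (an illustrative modelling choice):

```python
# Model S^{-1}Z for S = the powers of two: fractions a/s whose reduced
# denominator is a power of two. The set is closed under + and *, and each
# s/1 with s in S becomes invertible, as the construction promises.
from fractions import Fraction

def in_localization(q):
    d = q.denominator
    while d % 2 == 0:
        d //= 2
    return d == 1  # reduced denominator is a power of two

a, b = Fraction(3, 4), Fraction(5, 8)
assert in_localization(a + b) and in_localization(a * b)
assert not in_localization(Fraction(1, 3))   # 1/3 is not in S^{-1}Z
assert Fraction(2, 1) * Fraction(1, 2) == 1  # 2/1 is invertible in S^{-1}Z
```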
The concept of integral dependence generalizes the notion of integers. Recall that for a field extension K ⊆ L, an element
is called algebraic over K, if α is a root of a non-zero polynomial
. Since K is a field, the polynomial f can be divided by its leading coefficient, giving a monic polynomial in K[X] of which α is a root. However, if K is not a field, division by the leading coefficient is not always permissible. So we require α to be a root of a monic polynomial in order to define a special class of objects.
|
Let A ⊆ B be an extension of rings. An element
|
|
Now let A ⊆ B be an extension of rings and let C consist of all the elements of B that are integral over A. Clearly, A ⊆ C ⊆ B. It turns out that C is again a ring. This result is not at all immediate from the definition of integral elements. We prove this by using the following lemma which generalizes Theorem 2.33.
|
For a ring extension A ⊆ B and for
Proof [(a)⇒(b)] Let αⁿ + aₙ₋₁αⁿ⁻¹ + · · · + a₁α + a₀ = 0, [(b)⇒(c)] Take C := A[α]. [(c)⇒(a)] Let |
|
For an extension A ⊆ B of rings, the set
is a subring of B containing A. Proof Clearly, A ⊆ C ⊆ B as sets. To show that C is a ring let α, |
|
The ring C of Proposition 2.42 is called the integral closure of A in B. A is called integrally closed in B, if C = A. On the other hand, if C = B, we say that B is an integral extension of A or that B is integral over A. An integral domain A is called integrally closed (without specific mention of the ring in which it is so), if A is integrally closed in its quotient field Q(A). An integrally closed integral domain is called a normal domain (ND). |
|
Recall that a PID is an integral domain in which every ideal is principal, that is, generated by a single element. We now relax this requirement and demand only that every ideal be finitely generated. A ring meeting this demand is called a Noetherian ring. These rings are named after Emmy Noether (1882–1935), one of the most celebrated mathematicians of all time, whose work in this area is fundamental to modern algebra. Emmy’s father Max Noether (1844–1921) was also an eminent mathematician.
|
For a ring A, the following conditions are equivalent:
Proof [(a)⇒(b)] Let [(b)⇒(c)] Let S be a non-empty set of ideals of A. Order S by inclusion. The ACC implies that every chain in S has an upper bound in S. By Zorn’s lemma, S has a maximal element. [(c)⇒(a)] Let |
|
A ring A is called Noetherian, if A satisfies (one and hence all of) the equivalent conditions of Proposition 2.43. |
We have seen that if A is a PID, the polynomial ring A[X] need not be a PID. However, the property of being Noetherian is preserved during the passage from A to A[X] (Theorem 2.8).
A class of rings proves to be vital in the study of number fields:
|
An integral domain A is called a Dedekind domain, if it satisfies all of the following three conditions: |
After much ado we are finally in a position to define the basic objects of study in this section.
|
A number field K is defined to be a finite (and hence algebraic) extension of the field |
Note that there is considerable disagreement among mathematicians about this definition of number fields. Some insist that any field K satisfying
should be called a number field. Others restrict the definition by demanding that K be algebraic over
; however, fields K with infinite extension degree
are allowed. We restrict the definition further by imposing the condition that
be finite. Our restricted definition is seemingly the most widely accepted one. In this book, we study only the number fields of Definition 2.100; accepting this definition at the very least saves us from writing unwieldy expressions like “(algebraic) number fields of finite extension degree over
” to denote number fields.
For number fields, the notion of integral closure leads to the following definition.
|
A number field K contains |
By Example 2.27(2), the ring of integers of the number field
is
, that is,
. It is, therefore, customary to call the elements of
rational integers. Since
is naturally embedded in
for any number field K, it is important to notice the distinction between the integers of K (that is, the elements of
) and the rational integers of K (that is, the images of the canonical inclusion
).
Some simple properties of number rings are listed below.
|
For a number field K, we have:
Proof (1) follows immediately from Example 2.27(2), (2) follows from Exercise 2.60, and (3) follows from Exercise 2.126(b). |
Let K be a number field of degree d. By Corollary 2.13, K is a simple extension of
, that is, there exists an element
with a minimal polynomial f(X) over
such that deg
and
. The field K is a
-vector space of dimension d with basis 1, α, . . . , αᵈ⁻¹. There exists a nonzero integer a such that
is an algebraic integer and we continue to have
. Thus, without loss of generality, we may take α to be an algebraic integer. In this case, the
-basis 1, α, . . . , αᵈ⁻¹ of K consists only of algebraic integers.
Conversely, let
be an irreducible polynomial of degree d ≥ 1. The field
is a number field of degree d and the elements of K can be represented by polynomials with rational coefficients and of degrees < d. Arithmetic in K is carried out as the polynomial arithmetic of
followed by reduction modulo the defining irreducible polynomial f(X). This gives us an algebraic representation of K independent of any element of K. Now, K can also be viewed as a subfield of
and the elements of K can be represented as complex numbers.[16] A representation
with a field isomorphism
is called a complex embedding of K in
.[17] Such a representation is not unique as Proposition 2.45 demonstrates.
[16] A complex number
has a representation by a pair (a, b) of real numbers. Here,
plays the role of X + 〈X² + 1〉 in
. Finally, every real number has a decimal (or binary or hexadecimal or . . .) representation.
[17] The field
is canonically embedded in K. It is evident that the embedding σ : K → K′ fixes
element-wise.
|
A number field K of degree d ≥ 1 has exactly d distinct complex embeddings. Proof As above we take |
This proposition says that the conjugates α1, . . . , αd are algebraically indistinguishable. For example, X² + 1 has two roots ±i, where
. But it makes little sense to talk about the positive and the negative square roots of –1: the two roots are algebraically indistinguishable, and if one calls one of them i, the other becomes –i.[18] However, if a representation of
is given, we can distinguish between
and
by associating these quantities with the elements
and
respectively, where
is the positive real square root of 5 and where
is the imaginary unit available from the given representation of
.
[18] In a number theory seminar in 1996, Hendrik W. Lenstra, Jr. commented:
Suppose the Martians defined the complex numbers by adjoining a root of –1 they called j. And when the Earth and Martians start talking, they have to translate i to be either j or –j. So we take i to j, because I think that’s what the scientists will decide. ··· But it was later discovered that most Martians are left handed, so the philosophers decide it’s better to send i to –j instead.
It is also quite customary to start with
for some algebraic
and seek for the complex embeddings of K in
. One then considers the minimal polynomial f(X) of α (over
) and proceeds as in the proof of Proposition 2.45 but now defining the map
as the unique field isomorphism that fixes
and takes α ↦ αi. If we take α = α1, then σ1 is the identity map, whereas σ2, . . . , σd are non-identity field isomorphisms.
The moral of this story is that whether one wants to view the number field K as
or as
for any
is one’s personal choice. In any case, one is dealing with the same mathematical object, and as long as representation issues do not enter the picture, all these definitions of a number field are equivalent.
The embeddings
need not be all distinct as sets. For example, the two embeddings
and
of
are identical as sets. But the maps x ↦ i and x ↦ –i are distinct (where x := X + 〈X² + 1〉). Thus while specifying a complex embedding of a number field K, it is necessary to mention not only the subfield K′ of
isomorphic to K, but also the explicit field isomorphism K → K′.
|
|
The simplest examples of number fields are the quadratic number fields, that is, number fields of degree 2. Some special properties of quadratic number fields are covered in the exercises. It follows from Exercise 2.136 that every quadratic number field is of the form
for some non-zero square-free integer D ≠ 1.
Now we investigate the
-module structure of
for a number field K of degree d. Let σ1, . . . , σd be the complex embeddings of K.
|
For an element Equation 2.15
and the norm of α (over
|
If g(X) is the minimal polynomial of α over
and r := deg g, then r|d. Moreover,
. So Tr(α) and N(α) belong to
. If α is an algebraic integer, then
, that is, Tr(α),
.
The following properties of the norm and trace functions can be readily verified. Here α,
and
.
| Tr(α + β) | = | Tr(α) + Tr(β), |
| N(αβ) | = | N(α)N(β), |
| Tr(cα) | = | c Tr(α), |
| N(cα) | = | cᵈN(α), |
| Tr(c) | = | dc, |
| N(c) | = | cᵈ. |
|
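For a quadratic field K = Q(√D) the two complex embeddings send √D to ±√D, so the trace and norm have closed forms, and the properties in the table can be checked by direct computation. A sketch with elements stored as pairs (a, b) representing a + b√D (an illustrative representation, not the book's notation):

```python
# Trace and norm in K = Q(sqrt(D)), computed symbolically from the two
# embeddings sqrt(D) -> +sqrt(D) and sqrt(D) -> -sqrt(D).

def trace(a, b, D):
    # sigma1(alpha) + sigma2(alpha) = (a + b*w) + (a - b*w) = 2a
    return 2 * a

def norm(a, b, D):
    # sigma1(alpha) * sigma2(alpha) = (a + b*w)(a - b*w) = a^2 - D*b^2
    return a * a - D * b * b

def mul(x, y, D):
    # (a1 + b1*w)(a2 + b2*w) = (a1*a2 + D*b1*b2) + (a1*b2 + a2*b1)*w
    (a1, b1), (a2, b2) = x, y
    return (a1 * a2 + D * b1 * b2, a1 * b2 + a2 * b1)

# multiplicativity of the norm: N(alpha*beta) = N(alpha)N(beta)
alpha, beta, D = (1, 2), (3, -1), 5
prod = mul(alpha, beta, D)
assert norm(*prod, D) == norm(*alpha, D) * norm(*beta, D)
assert trace(*alpha, D) == 2
```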
Let |
|
Δ(β1, . . . , βd) = (det(σj(βi)))². Proof Consider the matrices D := (Tr(βiβj)) and E := (σj(βi)). By definition, we have Δ(β1, . . . , βd) = det D. We show that D = EEᵗ, which implies that det D = (det E)². The ij-th entry of EEᵗ is
where the last equality follows from Equation (2.15). |
Let
for some
and let f(X) be the minimal polynomial of α over
. We define the discriminant of f as
Δ(f) := Δ(1, α, α², . . . , αᵈ⁻¹).
We have to show that the quantity Δ(f) is well-defined, that is, independent of the choice of the root α of f(X). Let α = α1, α2, . . . , αd be all the roots of f(X), and let the complex embedding σj of K map α to αj. By Proposition 2.46, we have Δ(f) = (det E)², where
. Computing the determinant of E gives
, which implies that Δ(f) is independent of the permutations of the conjugates α1, . . . , αd of α. Notice that since α1, . . . , αd are all distinct, Δ(f) ≠ 0.
Let us deduce a useful formula for Δ(f). Write
and take formal derivative to get
, that is,
. Therefore,
, that is,
Equation 2.16

For arbitrary
, the discriminant Δ(β1, . . . , βd) distinguishes the case that β1, . . . , βd form a
-basis of K from the case that they do not.
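Equation (2.16) can be sanity-checked numerically. The sketch below takes f(X) = X³ − 2, whose discriminant is −108, and evaluates both the product-of-root-differences form of Δ(f) and the (−1)^{d(d−1)/2}N(f′(α)) form with floating-point roots; the specific polynomial is an illustrative choice.

```python
# Numerical check of Delta(f) for f(X) = X^3 - 2 (discriminant -108),
# using approximate complex roots, so the comparison is up to rounding.
import cmath

d = 3
w = cmath.exp(2j * cmath.pi / 3)                 # primitive cube root of unity
roots = [2 ** (1 / 3) * w ** k for k in range(d)]  # the roots of X^3 - 2

# Delta(f) = product over i < j of (alpha_i - alpha_j)^2
disc1 = 1
for i in range(d):
    for j in range(i + 1, d):
        disc1 *= (roots[i] - roots[j]) ** 2

# N(f'(alpha)) = product of f'(alpha_i) over all conjugates, f'(X) = 3X^2
norm_fprime = 1
for r in roots:
    norm_fprime *= 3 * r * r
disc2 = (-1) ** (d * (d - 1) // 2) * norm_fprime

print(round(disc1.real), round(disc2.real))  # both approximately -108
```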
|
Let Proof Let E₁ := (σj(βi)) and E₂ := (σj(γi)). Now
is the ij-th entry of the matrix TE₁, that is, E₂ = TE₁. Hence Δ(γ1, . . . , γd) = (det E₂)² = (det T)²(det E₁)² = (det T)²Δ(β1, . . . , βd). |
|
Let |
|
Proof Let |
Finally comes the desired characterization of
.
|
For a number field K of degree d, the ring Proof Let Claim:
Claim: Assume not, that is, there exists
by Lemma 2.12, we have Δ(γ1, . . . , γd) = (det T)²Δ(β1, . . . , βd) = r²Δ(β1, . . . , βd). Since r ≠ 0, Δ(γ1, . . . , γd) ≠ 0, that is, (γ1, . . . , γd) is again a
|
Every integral basis of K has the same discriminant (for a given K). Proof Let |
|
Let |
Recall that K, as a vector space over
, always possesses a
-basis of the form 1, α, . . . , αᵈ⁻¹.
, as a
-module, is free of rank d, but not every number field K possesses an integral basis of the form 1, α, . . . , αᵈ⁻¹. Whenever it does,
is called monogenic, and an integral basis 1, α, . . . , αᵈ⁻¹ of K is called a power integral basis. Clearly, if K has a power integral basis 1, α, . . . , αᵈ⁻¹, then
. But the converse is not true; that is, for
with
, the elements 1, α, . . . , αᵈ⁻¹ need not form an integral basis of K, even when
is monogenic.
|
Consider the quadratic number field Case 1: D ≡ 2, 3 (mod 4) Here
Case 2: D ≡ 1 (mod 4) In this case,
|
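The dichotomy in the example above comes down to whether ω = (1 + √D)/2 is an algebraic integer: ω satisfies the monic polynomial X² − X + (1 − D)/4, which has integer coefficients exactly when D ≡ 1 (mod 4). A one-line check (the sample values of D are illustrative):

```python
# omega = (1 + sqrt(D))/2 satisfies X^2 - X + (1 - D)/4 = 0; it is an
# algebraic integer exactly when the constant term (1 - D)/4 is an integer,
# i.e. when D = 1 (mod 4).

def half_sum_is_integral(D):
    return (1 - D) % 4 == 0

assert half_sum_is_integral(5) and half_sum_is_integral(-3)
assert not half_sum_is_integral(2) and not half_sum_is_integral(3)
```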
Ideals in a number ring possess a very rich structure. We prove that number rings are Dedekind domains (Definition 2.99). A Dedekind domain (henceforth abbreviated DD) need not be a UFD (or a PID). However, it is a ring in which ideals admit unique factorization into products of prime ideals.
Let K be a number field of degree
and
its ring of integers. If
is a homomorphism of rings and if
is a prime ideal of B, then the contraction
is a prime ideal of A. We say that
lies above or over
. If A ⊆ B and
is the inclusion homomorphism, then
. For a number field K, we consider the natural inclusion
.
|
Let Proof Let |
|
Proof Let |
|
The ring Proof We have proved that |
Now we derive the unique factorization theorem for ideals in a DD. It is going to be a long story. We refer the reader to Definition 2.92 to recall how the product of two ideals is defined.
|
Let A be a ring, Proof The proof is obvious for r = 1. So assume that r > 1. If |
We now generalize the concept of ideals.
|
Let A be an integral domain and K := Q(A). An A-submodule |
Every ideal of A is evidently a fractional ideal of A and hence is often called an integral ideal of A. Conversely, every fractional ideal of A contained in A is an integral ideal of A. The principal fractional ideal Ax is the A-submodule of K generated by
. If A is a Noetherian domain, we have the following equivalent characterization of fractional ideals.
|
Let A be a Noetherian integral domain, K := Q(A) and Proof [if] Let [only if] Let |
We define the product of two fractional ideals
,
of an integral domain A as we did for integral ideals:

It is easy to check that
is again a fractional ideal of A. Let
denote the set of non-zero fractional ideals of A. The product of fractional ideals defines a commutative and associative binary operation on
. The ideal A acts as a (multiplicative) identity in
. A fractional ideal
of A is called invertible, if
for some fractional ideal
of A. We deduce shortly that if A is a DD, then every non-zero fractional ideal of A is invertible and, therefore,
is a group under multiplication of fractional ideals.
|
Let A be a Noetherian domain and Proof Let S be the set of ideals of A for which the lemma does not hold. Assume that |
Note that the condition “each containing
” was necessary in Lemma 2.16 in order to rule out the trivial possibility that
for some
.
|
Let A be a DD, K := Q(A) and
Then we have:
Proof
|
|
Every non-zero ideal Proof If In order to prove the uniqueness of this product, let |
In the factorization of a non-zero ideal of a DD, we do not rule out the possibility of repeated occurrences of factors. Taking this into account shows that every non-zero ideal
in a DD A admits a unique factorization

with distinct non-zero prime ideals
and with exponents
. Here uniqueness is up to permutation of the indices 1, . . . , r. This factorization can be extended to fractional ideals, but this time we have to allow non-positive exponents. First note that for integers e1, . . . , er and non-zero prime ideals
of A the product
is well-defined and is a fractional ideal of
. The converse is proved in the following corollary.
|
Every non-zero fractional ideal Proof By definition, there exists |
The fractional ideal
in Corollary 2.22 is denoted by
. We have
. One can easily verify that
defined as above is equal to the set

In fact, one can use the last equality as the definition for
.
To sum up, every non-zero fractional ideal of a DD A is invertible and the set
of all non-zero fractional ideals of A is a group. The unit ideal A acts as the identity in
.
As in every group, we have the cancellation law(s) in
.
|
Let A be a DD and |
In view of unique factorization of ideals in A, we can speak of the divisibility of integral ideals in A. Let
and
be two integral ideals of A. We say that
divides
and write
, if
for some integral ideal
of A. We now show that the condition
is equivalent to the condition
. Thus for ideals in a DD the term divides is synonymous with contains.
|
Let Proof [if] If Also [only if] If |
As we pass from
to
, the notion of unique factorization passes from the element level to the ideal level. If a DD is already a PID, these two concepts are equivalent. (Non-zero prime ideals in a PID are generated by prime elements.) Though a UFD need not be a PID, we have the following result for a DD.
|
A Dedekind domain A is a UFD, if and only if A is a PID. Proof [if] Every PID is a UFD (Theorem 2.11). [only if] Let A be a UFD. In order to show that A is a PID, it suffices (in view of Theorem 2.57) to show that every non-zero prime ideal |
In the rest of this section, we abbreviate
as
, if K is implicit in the context.
We have seen that the ring
is a free
-module of rank d. The same result holds for every non-zero ideal
of
. Let β1, . . . , βd constitute an integral basis of K.
One can choose rational integers aij with each aii positive such that
Equation 2.17

constitute a
-basis of
. Moreover, the discriminant Δ(γ1, . . . , γd) is independent of the choice of an integral basis γ1, . . . , γd of
and is called the discriminant of
, denoted
. It follows that
can be generated as an ideal (that is, as an
-module) by at most d elements. We omit the proof of the following tighter result.
|
Every (integral) ideal in a DD A is generated by (at most) two elements. More precisely, for a proper non-zero ideal |
|
The norm |
Using the integers aij of Equations (2.17), we can write
Equation 2.18

|
For every non-zero ideal |
It is tempting to define the norm of an element
to be the norm of the principal ideal
. It turns out that this new definition is (almost) the same as the old definition of N(α). More precisely:
|
For any element Proof The result is obvious for α = 0. So assume that α ≠ 0 and call
It follows that |
|
For any |
Like the norm of elements, the norm of ideals is also multiplicative. We omit the (not-so-difficult) proof here.
The following immediate corollary often comes handy.
|
Let |
The behaviour of rational primes in number rings is an interesting topic of study in algebraic number theory. Let K be a number field of degree d and
. Consider a rational prime p and denote by 〈p〉 the ideal
generated by p in
. We use the symbol
to denote the (prime) ideal of
generated by p. Further let
Equation 2.19

be the prime factorization of 〈p〉 with
, with pairwise distinct non-zero prime ideals
of
and with
. For each i, we have
, that is,
, that is,
(Lemma 2.13), that is,
lies over
. Conversely if
is an ideal of
lying over
, then
, that is,
, that is,
, that is,
for some i. Thus,
are precisely all the prime ideals of
that lie over
.
By Corollary 2.27, N(〈p〉) = pᵈ. By Corollary 2.28, each
divides pᵈ and is again a power pᵈⁱ of p.
|
We define the ramification index of |
By the multiplicative property of norms, we have
pᵈ = N(〈p〉) = N(𝔭₁)^{e₁} · · · N(𝔭ᵣ)^{eᵣ} = p^{e₁d₁ + · · · + eᵣdᵣ}, so that d = e₁d₁ + · · · + eᵣdᵣ.
|
| If r = d, so that each eᵢ = dᵢ = 1, we say that the prime p (or
The following important result is due to Dedekind. Its proof is long and complicated and is omitted here.
|
A rational prime p ramifies in |
Though this is not the case in general, let us assume that the ring
is monogenic (that is,
for some
) and try to compute the explicit factorization (Equality (2.19)) of 〈p〉 in
. In this case,
and let
be the minimal polynomial of α. We then have
.
Let us agree to write the canonical image of any polynomial
in
as
. We write the factorization of
as

with
and with pairwise distinct irreducible polynomials
. If
, then
. For each i = 1, . . . , r choose
whose reduction modulo p is
. Define the ideals

of
. Since
, we have

and

Therefore,
are non-zero prime ideals of
with
. Thus
. On the other hand,
, since f(α) = 0 and
. Thus we must have
, that is, we have obtained the desired factorization of 〈p〉.
Let us now concentrate on an example of this explicit factorization.
|
Let D ≠ 0, 1 be a square-free integer congruent to 2 or 3 modulo 4. If Case 1: In this case, p|D, that is, Case 2: Since p is assumed to be an odd prime, the two square roots of D modulo p are distinct. Let δ be an integer with δ2 ≡ D (mod p). Then Case 3: The polynomial Thus the quadratic residuosity of D modulo p dictates the behaviour of p in Let us finally look at the fate of the even prime 2 in Recall from Example 2.31 that ΔK = 4D. Thus we have a confirmation of the fact that a rational prime p ramifies in |
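The case analysis of the example above can be automated with Euler's criterion: an odd prime p ramifies, splits or stays inert according to whether D is zero, a non-zero square, or a non-square modulo p. A sketch (the value D = 10 and the sample primes are illustrative choices):

```python
# Behaviour of an odd prime p in the ring of integers of Q(sqrt(D)),
# for square-free D with D = 2 or 3 (mod 4), read off from X^2 - D mod p.

def behaviour(p, D):
    """Classify an odd prime p as 'ramified', 'split' or 'inert'."""
    if D % p == 0:
        return "ramified"          # <p> is the square of a prime ideal
    ls = pow(D, (p - 1) // 2, p)   # Euler's criterion for the symbol (D/p)
    return "split" if ls == 1 else "inert"

assert behaviour(5, 10) == "ramified"   # 5 divides 10
assert behaviour(3, 10) == "split"      # 10 = 1 is a square mod 3
assert behaviour(7, 10) == "inert"      # 10 = 3 is a non-square mod 7
```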
One can similarly study the behaviour of rational primes in
,
where D ≡ 1 (mod 4) is a square-free integer ≠ 0, 1.
There are just two units in
, namely ±1. In a general number ring, there may be many more units. For example, all the units in the ring
of Gaussian integers are ±1, ±i. There may even be an infinite number of units in a number ring. It can be shown that
,
, are all the units of
. (Note that for all n ≠ 0 the absolute values of
are different from 1.)
is a PID. So we can think of factorizations in
as element-wise factorizations. To start with, we fix a set of pairwise non-associate prime elements of
. Every non-zero element of
admits a factorization
for prime “representatives” pi and for a unit u of the form
. Thus, in order to complete the picture of factorization, we need machinery to handle the units in a number ring.
Let K be a number field of degree d and signature (r₁, r₂). We have d = r₁ + 2r₂. The set of units in
is denoted by
. We know that
is an (Abelian) group under (complex) multiplication. Our basic aim now is to reveal the structure of the group
.
Every Abelian group is a
-module and, if finitely generated and not free, contains torsion elements, that is, non-identity elements of finite order.[19]
always contains the element –1 of order 2. The torsion subgroup of
is denoted by
. We have
, where
is a torsion-free group. It turns out that ℜ is a finite group (and hence cyclic) and that
is finitely generated and hence free, that is,
for some
. From Dirichlet’s unit theorem (which we do not prove), it follows that ρ = r1 + r2 – 1. Thus,
has a
-basis consisting of ρ elements, say ξ1, . . . , ξρ, and every unit of
can be uniquely expressed as
, where ω is a root of unity and
. A set of generators of
is called a set of fundamental units.
[19] Every finitely generated torsion-free module over a PID is free.
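For a real quadratic field K = ℚ(√D), D > 0, we have (r1, r2) = (2, 0), so ρ = r1 + r2 – 1 = 1 and there is a single fundamental unit. As an illustrative sketch (not part of the text, and the function names are ours), units x + y√D of ℤ[√D] with D ≡ 2, 3 (mod 4) correspond to integer solutions of x² – Dy² = ±1, so a brute-force search recovers a fundamental unit, and its powers give infinitely many units:

```python
def fundamental_unit(D, bound=10**6):
    """Search for the smallest unit x + y*sqrt(D) (x, y > 0) of Z[sqrt(D)].

    Units of Z[sqrt(D)] have norm x^2 - D*y^2 = +1 or -1; we brute-force
    over y, which is fine for small D.
    """
    for y in range(1, bound):
        for n in (1, -1):
            x2 = n + D * y * y
            if x2 > 0:
                x = round(x2 ** 0.5)
                if x * x == x2:
                    return x, y, n   # the unit x + y*sqrt(D) has norm n
    raise ValueError("no unit found below bound")

def unit_power(x, y, k, D):
    """(x + y*sqrt(D))^k written as a + b*sqrt(D); returns (a, b)."""
    a, b = 1, 0
    for _ in range(k):
        a, b = a * x + D * b * y, a * y + b * x
    return a, b

print(fundamental_unit(2))     # (1, 1, -1): the unit 1 + sqrt(2), of norm -1
print(fundamental_unit(3))     # (2, 1, 1):  the unit 2 + sqrt(3), of norm +1
print(unit_power(1, 1, 3, 2))  # (7, 5):     (1 + sqrt(2))^3 = 7 + 5*sqrt(2)
```

Each power of the fundamental unit again has norm ±1, illustrating that the unit group is infinite.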
|
Let D ≠ 0, 1 be a square-free integer, Now, suppose D > 0. K is a real field in this case, so that |
| 2.126 |
|
| 2.127 | Let A ⊆ B be an extension of integral domains, a finitely generated non-zero ideal of A and . If , show that γ is integral over A. [H]
|
| 2.128 |
|
| 2.129 | Let A be a ring and S a multiplicatively closed subset of A. Show that:
|
| 2.130 | Let A ⊆ B be a ring extension and C the integral closure of A in B. Show that for any multiplicative subset S of A (and hence of B and C) the integral closure of S–1A in S–1B is S–1C. In particular, if A is integrally closed in B, then so is S–1A in S–1B. |
| 2.131 | Recall that an integrally closed integral domain is called a normal domain (ND).
(Remark: The reader should note the following important implications:
That is, a Euclidean domain is a PID, a PID is a UFD and a UFD is a normal domain. None of the reverse implications is true. For example, the ring |
| 2.132 | A (non-zero) ring A with a unique maximal ideal m is called a local ring. In that case, the field A/m is called the residue field of A.
Let A be a ring and |
| 2.133 | A ring A is called a discrete valuation ring (DVR) or a discrete valuation domain (DVD), if A is a local principal ideal domain. Let A be a DVR with maximal ideal m = 〈p〉. Prove the following assertions:
(Remark: The prime p of A is called a uniformizing parameter or a uniformizer for A and is unique up to multiplication by units. The map |
| 2.134 |
|
| 2.135 |
|
| 2.136 |
(In particular, the ring of integers of |
| 2.137 | Let A be a Dedekind domain.
|
| 2.138 | Let A be a Dedekind domain and a non-zero (integral) ideal of A. Show that:
|
| 2.139 | Let and , ei, , be the prime decompositions of two non-zero ideals , of a DD A. Define the gcd and lcm of and as
Show that |
| 2.140 | Let K be a number field and .
|
| 2.141 | Let K be a number field, , , and . Show that:
|
| 2.142 | Let K be a number field. We say that K is norm-Euclidean if for every α, β in the ring of integers of K with β ≠ 0, there exist q, r in the ring of integers such that α = qβ + r and |N(r)| < |N(β)|.
|
| 2.143 | In this exercise, one derives that the only (rational) integer solutions of Bachet’s equation
Equation 2.20
y2 = x3 – 2
are x = 3, y = ±5.
|
Let us now study a different area of algebraic number theory, introduced by Kurt Hensel in an attempt to apply power-series expansions to numbers. While trying to explain the properties of (rational) integers, mathematicians started embedding ℤ in bigger and bigger structures, richer and richer in properties. ℚ came in a natural attempt to form quotients, and for some time people believed that this was all of reality. Pythagoras was seemingly the first to locate and prove the irrationality of a number, namely √2. It took humankind centuries to complete the picture of the real line. One possibility is to view ℝ as the completion of ℚ. A sequence an, n ∈ ℕ, of rational numbers is called a Cauchy sequence if for every real ε > 0, there exists N ∈ ℕ such that |am – an| ≤ ε for all m, n ∈ ℕ with m, n ≥ N. Every Cauchy sequence should converge to a limit, and it is in ℝ (and not in ℚ) that this happens. Even with the convergence of Cauchy sequences, people were not wholeheartedly happy, because the real polynomial X² + 1 did not have—it continues not to have—roots in ℝ. So the next question that arose was that of algebraic closure. ℂ was invented and turned out to be a nice field which is both algebraically closed and complete.
Throughout the above business, we were led by the conventional notion of distance between points (that is, between numbers)—the so-called Archimedean distance or the absolute value. For every rational prime p, there exists a p-adic distance which leads to a ring ℤp strictly bigger than and containing ℤ. This is the ring of p-adic integers. The quotient field of ℤp is the field ℚp of p-adic numbers. ℚp is complete in the sense of convergence of Cauchy sequences (under the p-adic distance), but is not algebraically closed. We know anyway that a (unique) algebraic closure of ℚp exists. We have ℂ = ℝ(i), that is, it was necessary and sufficient to add the imaginary quantity i to ℝ to get an algebraically closed field. Unfortunately, in the case of the p-adic distance the algebraic closure of ℚp is of infinite extension degree over ℚp. In addition, this closure is not complete. An attempt to make it complete gives an even bigger field Ωp, and the story stops here, Ωp being both algebraically closed and complete. But Ωp is already a pretty huge field and very little is known about it.
Throughout the rest of this section, p denotes, without specific mention, an arbitrary rational prime.
There are various ways in which p-adic integers can be defined. A simple way is to use infinite sequences.
|
A p-adic integer is defined as an infinite sequence (a1, a2, a3, . . .) of integers which is p-coherent in the sense that an+1 ≡ an (mod pⁿ) for every n ∈ ℕ. Two p-coherent sequences (an) and (bn) represent the same p-adic integer if and only if an ≡ bn (mod pⁿ) for all n ∈ ℕ. The set of all p-adic integers is denoted by ℤp.
|
See Exercise 2.144 for another way of defining p-adic integers. We now show that ℤp is a ring. Before doing that, we mention that the ring ℤ is canonically embedded in ℤp by the injective map ℤ → ℤp, a ↦ (a).
It turns out that ℤp is an integral domain. In order to see why, let us focus our attention on the units of ℤp. Let us denote ℤp* (the multiplicative group of units of ℤp) by Up. The next result characterizes elements of Up.
|
For a p-adic integer x = (an), the following conditions are equivalent: (a) x ∈ Up; (b) p ∤ an for every n ∈ ℕ; (c) p ∤ a1.
Proof [(a)⇒(b)] Let (an)(bn) = (anbn) = 1 = (1) for some [(b)⇒(c)] Obvious. [(c)⇒(a)] Let us construct a p-coherent sequence bn, We also have an+1bn+1 ≡ 1 (mod pn), that is, anbn+1 ≡ 1 (mod pn), that is, |
|
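The inverse constructed in the proof can be computed concretely by working with truncations modulo pⁿ. A small illustrative sketch (the function name is ours, not from the text):

```python
def inverse_truncations(a, p, N):
    """Truncations b_n := a^(-1) mod p^n for n = 1, ..., N.

    Requires p not dividing a, i.e., a must be a unit of Z_p.
    The b_n form a p-coherent sequence representing the p-adic inverse of a.
    """
    if a % p == 0:
        raise ValueError("a is not a p-adic unit")
    return [pow(a, -1, p ** n) for n in range(1, N + 1)]

p, a, N = 7, 10, 6
bs = inverse_truncations(a, p, N)
# p-coherence: b_(n+1) is congruent to b_n modulo p^n
assert all(bs[n] % p ** n == bs[n - 1] for n in range(1, N))
# each truncation really inverts a modulo p^n
assert all(a * bs[n - 1] % p ** n == 1 for n in range(1, N + 1))
print(bs)
```

The coherence of the inverses is exactly the condition an+1bn+1 ≡ anbn ≡ 1 (mod pⁿ) used in the proof.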
Every non-zero x = (an) ∈ ℤp can be written as x = pʳy for some r ∈ ℕ ∪ {0} and y ∈ Up. Proof If p ∤ a1, take r := 0 and y := x. So assume that p|a1. Choose |
|
ℤp is an integral domain. Proof Let x1 and x2 be non-zero elements of |
|
The quotient field of ℤp is called the field of p-adic numbers and is denoted by ℚp. |
|
Every non-zero x ∈ ℚp can be written as x = pʳy for some r ∈ ℤ and y ∈ Up. Proof One can write x = a/b for some a, |
The canonical inclusion ℤ ⊆ ℤp naturally extends to the canonical inclusion ℚ ⊆ ℚp. We can identify the quotient (a)/(b) with the rational a/b and say that ℚ is contained in ℚp. Being a field of characteristic 0, ℚp contains an isomorphic copy of ℚ. The map taking a/b to (a)/(b) gives this isomorphism explicitly. Note that the ring ℤp is strictly bigger than ℤ and the field ℚp is strictly bigger than the field ℚ (Exercise 2.147).
Proposition 2.55 leads to the notion of p-adic distance between pairs of points in ℚp. Let us start with some formal definitions.
|
A metric on a set S is a map
A set S together with a metric d is called a metric space (with metric d). |
|
A norm on a field K is a map
It is an easy check that for a norm ‖ ‖ on K the function d(x, y) := ‖x – y‖ is a metric on K. A norm ‖ ‖ on a field K is called non-Archimedean (or a finite valuation), if ‖x + y‖ ≤ max(‖x‖, ‖y‖) for all x, y ∈ K. |
|
|
The p-adic norm | |p on ℚp is defined as follows: write a non-zero x ∈ ℚp as x = pʳy with r ∈ ℤ and y ∈ Up, and set |x|p := p⁻ʳ. Also set |0|p := 0.
|
|
The p-adic norm | |p is a non-Archimedean norm on ℚp. Proof Non-negativity, non-degeneracy and multiplicativity of | |p are immediate. For proving the triangle inequality, it is sufficient to prove the non-Archimedean condition. Take x, y ∈ ℚp. |
|
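For rational arguments, the p-adic valuation and norm are easy to compute by stripping factors of p from numerator and denominator. A short illustrative sketch (function names are ours):

```python
from fractions import Fraction

def vp(x, p):
    """p-adic valuation of a non-zero rational x: the r with x = p^r * (unit)."""
    x = Fraction(x)
    a, b, r = x.numerator, x.denominator, 0
    while a % p == 0:
        a //= p
        r += 1
    while b % p == 0:
        b //= p
        r -= 1
    return r

def norm_p(x, p):
    """The p-adic norm |x|_p = p^(-vp(x)), with |0|_p = 0."""
    x = Fraction(x)
    return Fraction(0) if x == 0 else Fraction(1, p) ** vp(x, p)

assert vp(Fraction(63, 550), 5) == -2        # 550 = 2 * 5^2 * 11
assert norm_p(Fraction(63, 550), 5) == 25
# the non-Archimedean inequality |x + y|_p <= max(|x|_p, |y|_p)
x, y = Fraction(7, 5), Fraction(3, 25)
assert norm_p(x + y, 5) <= max(norm_p(x, 5), norm_p(y, 5))
```

Note that a rational is p-adically small exactly when it is divisible by a high power of p, the opposite of Archimedean intuition.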
Two metrics d1 and d2 on a metric space S are called equivalent if a sequence (xn) from S is Cauchy with respect to d1 if and only if it is Cauchy with respect to d2. Two norms on a field are called equivalent if they induce equivalent metrics. |
For every rational prime p, the field ℚ is canonically embedded in ℚp and thus we have a notion of a p-adic distance on ℚ. We also have the usual Archimedean distance | |∞ on ℚ. We now state an interesting result (Ostrowski’s theorem) without proof, which asserts that any distance on ℚ must be essentially the same as either the usual Archimedean distance or one of the p-adic distances.
The notions of sequences and series and their convergence can be readily extended to ℚp under the norm | |p. Since the p-adic distance assumes only the discrete values pʳ, r ∈ ℤ, it is often customary to restrict ourselves only to these values while talking about the convergence criteria of sequences and series; that is, instead of an infinitesimally small real ε > 0, one can talk about an arbitrarily large M ∈ ℕ with p–M ≤ ε.
|
Let x1, x2, . . . be a sequence of elements of Consider the partial sums A sequence x1, x2, . . . of elements of |
|
A field K is called complete under a norm ‖ ‖ if every sequence of elements of K, which is Cauchy under ‖ ‖, converges to an element in K. |
For example, ℝ is complete under | |∞. We shortly demonstrate that ℚp is complete under | |p.
Consider a field K not (necessarily) complete under a norm ‖ ‖. Let C denote the set of all Cauchy sequences (an) from K. Define addition and multiplication in C as (an) + (bn) := (an + bn) and (an)(bn) := (anbn). Under these operations C becomes a commutative ring with identity having a maximal ideal m consisting of the sequences that converge to 0. The field L := C/m is called the completion of K with respect to the norm ‖ ‖. K is canonically embedded in L via the map a ↦ (a, a, a, . . .). The norm ‖ ‖ on K extends to elements (an) of L as limn→∞ ‖an‖. L is a complete field under this extended norm. In fact, it is the smallest field containing K and complete under ‖ ‖.
ℝ is the completion of ℚ with respect to the Archimedean norm | |∞. On the other hand, ℚp turns out to be the completion of ℚ with respect to the p-adic norm | |p. Before proving this, let us first prove that ℚp itself is a complete field under the p-adic norm. Let us start with a lemma.
|
A sequence (an) of p-adic numbers is a Cauchy sequence if and only if the sequence (an+1 – an) converges to 0. Proof [if] Take any Thus (an) is a Cauchy sequence. [only if] Take any |
|
The field Proof Let (an) be a Cauchy sequence in
It then follows that |an|p ≤ p–m for all Let an = an,0+an,1p+an,2p2+· · · be the p-adic expansion of an (Exercise 2.145). Since (an) is Cauchy, for every |
|
Proof Let C denote the ring of Cauchy sequences from If What remains is to show that the map |
|
The p-adic series a1 + a2 + a3 + · · · converges if and only if |an|p → 0. Proof The only if part is obvious. For the if part, take a sequence (an) of p-adic numbers with |an|p → 0. Define |
This is quite unlike the Archimedean norm | |∞. For example, with respect to this norm the series 1 + p + p² + p³ + · · · converges (to 1/(1 – p)), whereas the series 1 + 1/p + 1/p² + 1/p³ + · · · diverges.
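The convergence can be checked numerically (a small illustrative sketch): each partial sum of 1 + p + p² + · · · agrees with the inverse of 1 – p modulo an ever-higher power of p, so the partial sums converge p-adically to 1/(1 – p).

```python
def partial_sum(p, N):
    """S_N = 1 + p + p^2 + ... + p^(N-1) of the geometric series in p."""
    return sum(p ** k for k in range(N))

p = 5
for N in range(1, 8):
    S = partial_sum(p, N)
    # (1 - p) * S_N = 1 - p^N, so S_N differs from 1/(1 - p) by a multiple of p^N
    assert (1 - p) * S == 1 - p ** N
    # equivalently, S_N equals the inverse of (1 - p) modulo p^N
    assert S % p ** N == pow(1 - p, -1, p ** N)
print(partial_sum(5, 4))  # 156 = 1 + 5 + 25 + 125
```

The terms pⁿ have p-adic norm p⁻ⁿ → 0, exactly the convergence criterion just proved.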
Let us conclude our short study of p-adic methods by proving an important theorem due to Hensel. This theorem talks about the solvability of polynomial equations f(X) = 0 for f ∈ ℤp[X]. Before proceeding further, let us introduce a notation. Recall that every a ∈ ℤp has a unique p-adic expansion of the form a = a0 + a1p + a2p² + · · · with 0 ≤ an < p (Exercises 2.144 and 2.145). If a0 = a1 = · · · = an–1 = 0, then a = anpⁿ + an+1pⁿ⁺¹ + an+2pⁿ⁺² + · · · = pⁿb, where b ∈ ℤp. Thus pⁿ|a in ℤp. We denote this by saying that a ≡ 0 (mod pⁿ). Notice that a ≡ 0 (mod pⁿ) if and only if |a|p ≤ p⁻ⁿ. We write a ≡ b (mod pⁿ) for a, b ∈ ℤp, if a – b ≡ 0 (mod pⁿ). Since pⁿ can be viewed as the element (pⁿ) of ℤp, this congruence notation conforms to that for a general PID. (ℤp is a PID by Exercise 2.148.)
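p-adic expansions of rationals whose denominators are coprime to p can be computed digit by digit: the next digit is determined modulo p, and the remainder is divided by p. A short illustrative sketch (our own function name and the example 1/3 ∈ ℤ7 are ours):

```python
def padic_digits(num, den, p, N):
    """First N digits of the p-adic expansion of num/den, gcd(den, p) = 1.

    Returns [a_0, ..., a_(N-1)] with num/den = a_0 + a_1*p + a_2*p^2 + ...
    and 0 <= a_i < p.
    """
    digits = []
    for _ in range(N):
        a = num * pow(den, -1, p) % p    # the next digit
        digits.append(a)
        num = (num - a * den) // p       # num - a*den is divisible by p
    return digits

# 1/3 in Z_7: the digit sequence is 5, 4, 4, 4, ... (eventually periodic)
print(padic_digits(1, 3, 7, 6))          # [5, 4, 4, 4, 4, 4]
# check: the truncated expansion is congruent to 1/3 modulo 7^6
val = sum(d * 7 ** i for i, d in enumerate(padic_digits(1, 3, 7, 6)))
assert 3 * val % 7 ** 6 == 1
```

Rationals always have eventually periodic digit sequences; a generic p-adic integer need not.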
Since by our assumption any ring A comes with identity (which we denote by 1 = 1A), it makes sense to talk, for every n ∈ ℕ, about an element n = nA in A, which is the n-fold sum of 1. More precisely:

Given any polynomial f(X) = a0 + a1X + · · · + adXᵈ ∈ A[X], one can define the formal derivative of f as f′(X) := a1 + 2a2X + · · · + d·adXᵈ⁻¹. Properties of formal derivatives of polynomials are covered in Exercise 2.61.
|
Let f ∈ ℤp[X], and let α0 ∈ ℤp and M ∈ ℕ ∪ {0} be such that |f(α0)|p ≤ p–(2M+1) and |f′(α0)|p = p–M.
Then there exists a unique α ∈ ℤp with f(α) = 0 and α ≡ α0 (mod pM+1). Proof Let us inductively construct a sequence α0, α1, α2, · · · of p-adic integers with the properties that |f(αn)|p ≤ p–(2M+n+1) and |f′(αn)|p = p–M for every n.
We want to find a suitable kn for which |f(αn)|p ≤ p–(2M+n+1). Taylor expansion gives f(αn) = f(αn–1) + knpM+nf′(αn–1) + cnp2(M+n) for some
Since pM+1 ∤ f′(αn–1), the element
This value of kn yields f (αn) = p2M + n(bnp + cnpn) ≡ 0 (mod p2M+n+1) for some Since |αn – αn–1|p ≤ p–(M+n), it follows that αn – αn–1 → 0, that is, (αn) is a Cauchy sequence (under | |p). By the completeness of For proving the uniqueness of α, let |
Note that αn in the last proof satisfies the congruence
f(αn) ≡ 0 (mod p2M+n+1)
for each n ∈ ℕ ∪ {0}. We are given the solution α0 corresponding to n = 0. From this, we inductively construct the solutions α1, α2, . . . corresponding to n = 1, 2, . . . , respectively. The process for computing αn from αn–1 as described in the proof of Hensel’s lemma is referred to as Hensel lifting. The given conditions ensure that this lifting is possible (and uniquely doable) for every n ≥ 1, and in the limit n → ∞ we get a root α ∈ ℤp of f. Since each kn is required modulo p, we can take 0 ≤ kn < p. So α admits a p-adic expansion of the form α = α0 + k1pM+1 + k2pM+2 + k3pM+3 + · · ·.
The special case M = 0 for Hensel’s lemma is now singled out:
|
Let f ∈ ℤp[X] and let α0 ∈ ℤp satisfy f(α0) ≡ 0 (mod p) and f′(α0) ≢ 0 (mod p).
Then there exists a unique α ∈ ℤp with f(α) = 0 and α ≡ α0 (mod p). |
For this special case, we compute solutions αn of f(x) ≡ 0 (mod pn+1) inductively for n = 1, 2, 3, . . . , given a suitable solution α0 of this congruence for n = 0. The lifting formula is now:
Equation 2.21
αn = αn–1 + knpⁿ, where kn ≡ –(f(αn–1)/pⁿ) · [f′(αn–1)]⁻¹ (mod p) and 0 ≤ kn < p.
|
For example, let p be an odd prime and a an integer coprime to p that is a quadratic residue modulo p. Applying Hensel’s lemma to f(X) = X2 – a shows that a has a square root in ℤp, obtained by lifting a square root α0 of a modulo p. As a specific numerical example, take p = 7, a = 2 and α0 = 3. Using Formula (2.21), we compute k1 = 1, α1 = 10, k2 = 2, α2 = 108, k3 = 6, α3 = 2166, and so on. Thus a square root of 2 in ℤ7 is 3 + 1·7 + 2·7² + 6·7³ + · · · . |
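The lifting in this example is mechanical and easy to script. A minimal sketch for the special case M = 0 (the function name is ours), reproducing the numbers above:

```python
def hensel_sqrt(a, p, alpha0, steps):
    """Hensel-lift a square root of a mod p up to a root modulo p^(steps+1).

    Assumes p is odd, alpha0^2 = a (mod p) and p does not divide 2*alpha0.
    Returns [alpha_0, ..., alpha_steps] with alpha_n^2 = a (mod p^(n+1)).
    """
    f = lambda x: x * x - a
    fprime = lambda x: 2 * x
    alphas = [alpha0]
    for n in range(1, steps + 1):
        prev = alphas[-1]
        # k_n = -(f(alpha_(n-1)) / p^n) * [f'(alpha_(n-1))]^(-1)  (mod p)
        k = -(f(prev) // p ** n) * pow(fprime(prev), -1, p) % p
        alphas.append(prev + k * p ** n)
    return alphas

print(hensel_sqrt(2, 7, 3, 3))   # [3, 10, 108, 2166], as in the text
```

Each step costs one modular inversion modulo p, so computing many p-adic digits of the root is cheap.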
| 2.144 |
|
| 2.145 | In view of Exercise 2.144, every admits a unique expansion of the form x = x0 + x1p + x2p2 + · · · , where each . This notion of p-adic expansion can be extended to the elements of .
|
| 2.146 | Let p be an odd prime and with . From elementary number theory we know that the congruence x2 ≡ a (mod pn) has two solutions for every . Let x1 be a solution of x2 ≡ a (mod p). We know that a solution xn of x2 ≡ a (mod pn) lifts uniquely to a solution xn+1 of x2 ≡ a (mod pn+1). Thus we can inductively compute a sequence x1, x2, x3, · · · of integers. Show that (xn) is a p-adic integer and that (xn)2 = (a).
|
| 2.147 |
|
| 2.148 | Prove the following assertions:
|
| 2.149 | Compute the p-adic expansion of 1/3 in and of –2/5 in .
|
| 2.150 | Show that is dense in under the p-adic norm | |p, that is, show that given any and real ε > 0, there exists with |x – a|p < ε. Show also that is dense in .
|
| 2.151 | Prove the following assertions that establish that is the closure of in under | |p.
|
| 2.152 | Show that:
|
| 2.153 | Prove that for any non-zero . [H]
|
| 2.154 | Prove that for any the sequence (apn) converges in . [H]
|
| 2.155 | Let p, , p ≠ q. Show that the fields and are not isomorphic.
|
| 2.156 | Let a be an integer congruent to 1 modulo 8. Show that there exists an such that α2 = a and .
|
| 2.157 | Compute with α2 + α + 223 = 0 and α ≡ 4 (mod 243).
|
| 2.158 | Let p be an odd prime and . Show that the polynomial X2 – a has exactly root in .
|
| 2.159 | Show that the polynomial X2 – p is irreducible in .
|
| 2.160 | Teichmüller representative Let |
| 2.161 | Show that the algebraic closure of is of infinite extension degree over . [H]
|
Many attacks on cryptosystems involve statistical analysis of ciphertexts and also of data collected from the victim’s machine during one or more private-key operations. For a proper understanding of these analysis techniques, one requires some knowledge of statistics and random variables. In this section, we provide a quick overview of some statistical gadgets. We make the assumption that the reader is already familiar with the elementary notion of probability. We denote the probability of an event E by Pr(E).
An experiment whose outcome is random is referred to as a random experiment. The set of all possible outcomes of a random experiment is called the sample space of the experiment. For example, the outcomes of tossing a coin can be mapped to the set {H, T} with H and T standing respectively for head and tail. It is convenient to assign numerical values to the outcomes of a random experiment. Identifying head with 0 and tail with 1, one can view coin tossing as a random experiment with sample space {0, 1}. Some other random experiments include throwing a die (with sample space {1, 2, 3, 4, 5, 6}), the life of an electric bulb (with sample space ℝ≥0, the set of all non-negative real numbers), and so on. Unless otherwise specified, we henceforth assume that sample spaces are subsets of ℝ.
A random variable is a variable which can assume (all and only) the values from a (given) sample space.
A discrete random variable can assume only countably many values, that is, the sample space SX of a discrete random variable X either is finite or is in bijective correspondence with ℕ; in other words, we can enumerate the elements of SX as x1, x2, x3, . . . .
The probability distribution function or the probability mass function
fX : SX → [0, 1]
of a discrete random variable X assigns to each x in the sample space SX of X the probability of the occurrence of the value x in a random experiment.[21] We have
[21] [a, b] is the closed interval consisting of all real numbers u satisfying a ≤ u ≤ b. Similarly, the open interval (a, b) is the set of all real values u satisfying a < u < b. In order to make a distinction between the open interval (a, b) and the ordered pair (a, b), many—mostly Europeans—use the notation ]a, b[ for denoting open intervals.

A continuous random variable assumes an uncountable number of values, that is, the sample space SX of a continuous random variable X cannot be put in bijective correspondence with a subset of ℕ. Typically SX is an interval [a, b] or (a, b) with –∞ ≤ a < b ≤ +∞.
One does not assign individual probabilities Pr(X = x) to a value assumed by a continuous random variable X.[22] The probabilistic behaviour of X is in this case described by the probability density function
[22] More correctly, Pr(X = x) = 0 for each x ∈ SX.

with the implication that the probability that X occurs in the interval [c, d] (or (c, d)) is given by the integral

that is, by the area between the x-axis, the curve fX(x) and the vertical lines x = c and x = d. We have

It is sometimes useful to set fX(x) := 0 for x ∉ SX, so that fX is defined on the entire real line ℝ.
The cumulative probability distribution FX of a random variable X (discrete or continuous) is the function FX(x) := Pr(X ≤ x) for all x ∈ ℝ. If X is continuous, we have

which implies that

Let X and Y be discrete random variables. The joint probability distribution of X, Y refers to a random variable Z with SZ = SX × SY. For z = (x, y), the probability of Z = z is denoted by fZ(z) = Pr(Z = z) = Pr(X = x, Y = y). The probability Pr(X = x, Y = y) stands for the probability that X = x and Y = y. The random variables X and Y are called independent, if
Pr(X = x, Y = y) = Pr(X = x) Pr(Y = y)
for all x, y.
|
Suppose that we have an urn containing three identical balls with labels 1, 2, 3. We draw two balls randomly from the urn. Let us denote the outcome of the first drawing by X and that of the second drawing by Y. We consider the joint distribution X, Y of the two outcomes in the two following cases:
|
For continuous random variables X and Y, the joint distribution is defined by the probability density function fX,Y (x, y) and the cumulative distribution is obtained by the double integral

X and Y are independent, if fX,Y (x, y) = fX(x)fY (y) for all x, y. In this case, we also have FX,Y (c, d) = FX(c)FY (d) for all c, d.
Now, we define arithmetic operations on random variables. First, let X and Y be discrete random variables. The sum X + Y is defined to be a random variable U which assumes the values u = x + y for x ∈ SX and y ∈ SY with probability

The product XY of X and Y is defined to be a random variable V which assumes the values v = xy for x ∈ SX and y ∈ SY with probability

For α ∈ ℝ, the random variable W = αX assumes the values w = αx for x ∈ SX with probability
fW(w) = Pr(W = αx) = Pr(X = x) = fX(x).
|
Let us consider the random variables X and Y of Example 2.36. For the sake of brevity, we denote Pr(X = x, Y = y) by Pxy. The distributions of U = X + Y in the two cases are as follows:
|
Now, let us consider continuous random variables X and Y. In this case, it is easier to define first the cumulative density functions of U = X + Y, V = XY and W = αX and then the probability density functions by taking derivatives:

One can easily generalize sums and products to an arbitrary finite number of random variables. More generally, if X1, . . . , Xn are random variables and g : ℝⁿ → ℝ, one can talk about the probability distribution or density function of the random variable g(X1, . . . , Xn). (See Exercise 2.163.)
Now, we introduce the important concept of conditional probability. Let X and Y be two random variables. To start with, suppose that they are discrete. We denote by f(x, y) = Pr(X = x, Y = y) the joint probability distribution function of X, Y. For y ∈ SY with Pr(Y = y) > 0, we define the conditional probability of X = x given Y = y as:

For a fixed y ∈ SY, the probabilities fX|y(x), x ∈ SX, constitute the probability distribution function of the random variable X|y (X given Y = y). If X and Y are independent, f(x, y) = fX(x)fY (y) and so fX|y(x) = fX(x) for all x ∈ SX, that is, the random variables X and X|y have the same probability distribution. This is expected, because in this case the probability of X = x does not depend on whatever value y the variable Y takes.
If X and Y are continuous random variables with joint density f(x, y), and y is a value with fY (y) ≠ 0, the conditional probability density function of X|y (X given Y = y) is defined by

Again if X and Y are independent, we have fX|y(x) = fX(x) for all x, y.
For a fixed x ∈ SX, one can likewise define the conditional probabilities fY|x (y) := f(x, y)/fX (x) for all y ∈ SY.
Let X and Y be discrete random variables with joint distribution f(x, y). Also let Γ ⊆ SX and Δ ⊆ SY. One defines the probability fX(Γ) as:

The joint probability f(Γ, Δ), is defined as:

If Γ = {x} is a singleton, we prefer to write f(x, Δ) instead of f({x}, Δ). Similarly, f(Γ, y) stands for f (Γ,{y}). We also define the conditional distributions:

We abbreviate fX|Δ (Γ) as Pr(Γ|Δ) and fY|Γ (Δ) as Pr(Δ|Γ).
|
Let X, Y be discrete random variables and Δ ⊆ SY with fY (Δ) > 0. Also let Γ1,..., Γn form a partition of SX with fX (Γi) > 0 for all i = 1, . . . , n. Then we have:
that is, in terms of probability:
Proof Pr(Γi, Δ) = Pr(Δ|Γi) Pr(Γi) = Pr(Γi|Δ) Pr(Δ). So it is sufficient to show that Pr(Δ) equals the sum in the denominator. The event Δ is the union of the pairwise disjoint events (Γj, Δ), j = 1,..., n, and so |
The Bayes rule relates the a priori probabilities Pr(Γj) and Pr(Δ|Γj) to the a posteriori probabilities Pr(Γi|Δ). The following example demonstrates this terminology.
|
Consider the random experiment of Example 2.36(2). Take Γj := {j} for j = 1, 2, 3, and take Δ := {2, 3}.
The a posteriori probability Pr(Γ1|Δ) that the first ball was obtained in the first draw, given that the ball obtained in the second draw is the second or the third one, is calculated using the Bayes rule as:
One can similarly calculate |
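For the drawing-without-replacement experiment, the Bayes computation can be checked directly from the joint distribution. A small illustrative script (ours; the numerical answer 1/2 is derived from the distribution of Example 2.36, not quoted from the text):

```python
from fractions import Fraction

# joint distribution Pr(X = x, Y = y) for drawing two balls without
# replacement from an urn containing balls 1, 2, 3: each ordered pair
# of distinct labels has probability 1/6
joint = {(x, y): Fraction(1, 6) for x in (1, 2, 3) for y in (1, 2, 3) if x != y}

def pr(event):
    """Probability of a set of (x, y) outcomes."""
    return sum(joint.get(xy, Fraction(0)) for xy in event)

gammas = [{1}, {2}, {3}]   # a partition of S_X
delta = {2, 3}             # the event on Y: second draw gives ball 2 or 3

def pr_joint(gamma, dlt):
    return pr({(x, y) for x in gamma for y in dlt})

# Bayes rule: Pr(Gamma_1 | Delta) = Pr(Gamma_1, Delta) / sum_j Pr(Gamma_j, Delta)
posterior = pr_joint(gammas[0], delta) / sum(pr_joint(g, delta) for g in gammas)
print(posterior)   # 1/2
```

Using exact fractions avoids floating-point noise in such small probability calculations.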
Let X be a random variable. The expectation E(X) of X is defined as follows:

E(X) is also called the (arithmetic) mean or average of X. One uses the alternative symbols μX and X̄ to denote E(X). More generally, let X1, . . . , Xn be n random variables with joint probability distribution/density function f(x1, . . . , xn). Also let g : ℝⁿ → ℝ. We define the following expectations:
X is discrete:

X is continuous:
Let g(X) and h(Y) be real polynomial functions of the random variables X and Y, and let α ∈ ℝ. Then
| E(g(X) + h(Y)) | = | E(g(X)) + E(h(Y)), |
| E(g(X)h(Y)) | = | E(g(X)) E(h(Y)) if X and Y are independent, |
| E(αg(X)) | = | αE(g(X)). |
Let us derive the sum and product formulas for discrete variables X and Y.

If X and Y are independent, then

The variance Var(X) of a random variable X is defined as
Var (X) := E[(X – E(X))2].
From the observation that E[(X – E(X))2] = E[X2 – 2 E(X)X + [E(X)]2] = E(X2) – 2 E(X) E(X) + [E(X)]2, we derive the computational formula:
Var (X) = E[X2] – [E(X)]2.
Var(X) is a measure of how the values of X are dispersed about the mean E(X) and is always a non-negative quantity. The (non-negative) square root of Var(X) is called the standard deviation σX of X:

The following formulas can be easily verified:
| Var(X + α) | = | Var(X). |
| Var(αX) | = | α2 Var(X). |
| Var(X + Y) | = | Var(X) + Var(Y) + 2 Cov(X, Y), |
where α ∈ ℝ, and where the covariance Cov(X, Y) of X and Y is defined as:
Cov(X, Y) := E[(X – E(X))(Y – E(Y))] = E(XY) – E(X) E(Y).
Normalized covariance is a measure of correlation between the two random variables X and Y. More precisely, the correlation coefficient ρX,Y is defined as:

If X and Y are independent, E(XY) = E(X) E(Y) so that Cov(X, Y) = 0 and so ρX,Y = 0. The converse of this is, however, not true, that is, ρX,Y = 0 does not necessarily imply that X and Y are independent. ρX,Y is a real value in the interval [–1, 1] and is a measure of linear relationship between X and Y. If larger (resp. smaller) values of X are (in general) associated with larger (resp. smaller) values of Y, then ρX,Y is positive. On the other hand, if larger (resp. smaller) values of X are (in general) associated with smaller (resp. larger) values of Y, then ρX,Y is negative.
|
Once again consider the drawing of two balls from an urn containing three balls labelled {1, 2, 3} (Examples 2.36, 2.37 and 2.38). Look at the second case (drawing without replacement). We use the shorthand notation Pxy for Pr(X = x, Y = y). The individual probability distributions of X and Y can be obtained from the joint distribution as follows:
Thus E(X) = 1 × (1/3) + 2 × (1/3) + 3 × (1/3) = 2. Similarly, E(Y) = 2. Therefore, E(X + Y) = E(X) + E(Y) = 4. This can also be verified by direct calculations: E(X + Y) = 3 × (1/3) + 4 × (1/3) + 5 × (1/3) = 4. E(X2) = E(Y2) = 12 × (1/3) + 22 × (1/3) + 32 × (1/3) = 14/3 and Var(X) = Var(Y) = (14/3) – 22 = 2/3. The probability distribution for XY is
so that E(XY) = 2 × (1/3) + 3 × (1/3) + 6 × (1/3) = 11/3. Therefore, Cov(X, Y) = E(XY) – E(X) E(Y) = (11/3) – 2 × 2 = –1/3, that is,
The negative correlation between X and Y is expected. If X = 1 (small), Y takes bigger values (2, 3). On the other hand, if X = 3 (large), Y assumes smaller values (1, 2). Of course, the correlation is not perfect, since for X = 2 the values of Y can be smaller (1) or larger (3). So, we should feel happy to see a not-so-negative correlation of –1/2 between X and Y. |
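All the numbers in this example can be verified mechanically from the joint distribution. A small check script (ours, not part of the text):

```python
from fractions import Fraction

# joint distribution of (X, Y) for drawing without replacement from {1, 2, 3}
joint = {(x, y): Fraction(1, 6) for x in (1, 2, 3) for y in (1, 2, 3) if x != y}

def expect(g):
    """E[g(X, Y)] under the discrete joint distribution above."""
    return sum(p * g(x, y) for (x, y), p in joint.items())

EX = expect(lambda x, y: x)
EY = expect(lambda x, y: y)
EXY = expect(lambda x, y: x * y)
VarX = expect(lambda x, y: x * x) - EX ** 2
cov = EXY - EX * EY

assert EX == 2 and EY == 2
assert expect(lambda x, y: x + y) == 4        # E(X + Y) = E(X) + E(Y)
assert VarX == Fraction(2, 3)
assert EXY == Fraction(11, 3)
assert cov == Fraction(-1, 3)
assert cov / VarX == Fraction(-1, 2)          # rho, since Var(X) = Var(Y)
```

The last line uses Var(X) = Var(Y), so the correlation coefficient reduces to Cov(X, Y)/Var(X) = –1/2.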
Some probability distributions that occur frequently in statistical theory and in practice are described now. Some other useful probability distributions are considered in the Exercises 2.169, 2.170 and 2.171.
A discrete uniform random variable U has sample space SU := {x1, . . . , xn} and probability distribution

A continuous uniform random variable U has sample space SU and probability density function

where A > 0 is the size[23] of SU. For example, if SU is the real interval [a, b] for a < b, we have
[23] If SU ⊆ ℝ, “size” means length. If SU ⊆ ℝ² or SU ⊆ ℝ³, “size” refers to area or volume, respectively. We assume that the size of SU is “measurable”.

In this case, we have
| E(U) = (a + b)/2 | and | Var(U) = (b – a)2/12. |
Uniform random variables often occur naturally. For example, if we throw an unbiased die, the six possible outcomes (1 through 6) are equally likely, that is, each possible outcome has the probability 1/6. Similarly, if a real number is chosen randomly in the interval [0, 1], we have a continuous uniform random variable. The built-in C library call rand() (pretends to) return an integer between 0 and 2³¹ – 1, each with equal probability (namely, 2⁻³¹).
The Bernoulli random variable B = B(n, p) is a discrete random variable characterized by two parameters n ∈ ℕ and p ∈ [0, 1], where p stands for the probability of a certain event E and n represents the number of (independent) trials. It is assumed that the probability of E remains constant (namely, p) in each of the n trials. The sample space SB = {0, 1, . . . , n} comprises the (exact) numbers of occurrences of E in the n trials. B has the probability distribution

as follows from simple combinatorial arguments. The mean and variance of B are:
| E(B) = np | and | Var(B) = np(1 – p). |
This distribution is commonly called the binomial distribution; strictly speaking, the name Bernoulli distribution refers to the single-trial case n = 1.
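The formulas E(B) = np and Var(B) = np(1 – p) can be checked by summing directly over the probability distribution. A quick illustrative sketch (ours):

```python
from math import comb

def binom_pmf(n, p):
    """f_B(k) = C(n, k) * p^k * (1 - p)^(n - k) for k = 0, ..., n."""
    return [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]

def mean_var(pmf):
    """Mean and variance of a distribution on {0, 1, ..., n}."""
    mean = sum(k * f for k, f in enumerate(pmf))
    var = sum(k * k * f for k, f in enumerate(pmf)) - mean ** 2
    return mean, var

n, p = 20, 0.3
m, v = mean_var(binom_pmf(n, p))
assert abs(sum(binom_pmf(n, p)) - 1) < 1e-12   # the probabilities sum to 1
assert abs(m - n * p) < 1e-9                   # E(B) = np
assert abs(v - n * p * (1 - p)) < 1e-9         # Var(B) = np(1 - p)
```

The same two helper functions work for any distribution on {0, 1, . . . , n}, not just the binomial one.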
The normal random variable or the Gaussian random variable N = N (μ, σ2) is a continuous random variable characterized by two real parameters μ and σ with σ > 0. The density function of N is

The cumulative distribution for N can be expressed in terms of the error function erf():

The error function does not have a known closed-form expression. Figure 2.3 shows the curves for fN (x) and FN (x) for the parameter values μ = 0 and σ = 1 (in this case, N is called the standard normal variable).
Some statistical properties of N are:
| E(N) = μ | and | Var(N) = σ2. |
The curve fN (x) is symmetric about x = μ. Most of the area under the curve is concentrated in the region μ – 3σ ≤ x ≤ μ + 3σ. More precisely:
| Pr(μ – σ ≤ X ≤ μ + σ) | ≈ | 0.68, |
| Pr(μ – 2σ ≤ X ≤ μ + 2σ) | ≈ | 0.95, |
| Pr(μ – 3σ ≤ X ≤ μ + 3σ) | ≈ | 0.997. |
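These probabilities follow from the cumulative distribution; using the error function from the standard library, one can confirm them numerically (a quick sketch, with our own function names):

```python
from math import erf, sqrt

def normal_cdf(x, mu=0.0, sigma=1.0):
    """F_N(x) = (1/2) * (1 + erf((x - mu) / (sigma * sqrt(2))))."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

def prob_within(k, mu=0.0, sigma=1.0):
    """Pr(mu - k*sigma <= X <= mu + k*sigma) for X ~ N(mu, sigma^2)."""
    return normal_cdf(mu + k * sigma, mu, sigma) - normal_cdf(mu - k * sigma, mu, sigma)

assert abs(prob_within(1) - 0.6827) < 5e-4
assert abs(prob_within(2) - 0.9545) < 5e-4
assert abs(prob_within(3) - 0.9973) < 5e-4
# the parameters mu and sigma do not affect these probabilities
assert abs(prob_within(2, mu=10.0, sigma=3.0) - prob_within(2)) < 1e-12
```

The last assertion reflects the symmetry noted above: the k-sigma probabilities are the same for every normal variable after standardization.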
Many distributions occurring in practice (and in nature) approximately follow normal distributions. For example, the height of (adult) people in a given community is roughly normally distributed. Of course, the height of a person cannot be negative, whereas a normal random variable may assume negative values. But, in practice, the probability that such an approximating normal variable assumes a negative value is typically negligibly low.
In practice, we often do not know a priori the probability distribution or density function of a random variable X. In some cases, we do not have the complete data, whereas in some other cases we need an infinite amount of data to obtain the actual probability distribution of a random variable. For example, let X represent the life of an electric bulb manufactured by a given company in the last ten years. Even though there are only finitely many such bulbs and even if we assume that it is possible to trace the working of every such bulb, we have to wait until all these bulbs burn out, before we know the actual distribution of X. That is certainly impractical. Instead, if we have data on the life-times of some sample bulbs, we can approximate the properties of X by those of the samples.
Suppose that S := (x1, x2, . . . , xn) is a sample of size n. We assume that all xi are real numbers. We define the following quantities for S:
Here x̄ is the mean of the collection S = (x1, x2, . . . , xn).
If T := (y1, y2, . . . , ym) is another sample (of real numbers), the (linear) relationship between S and T is measured by the following quantities:

Here the mean in question is that of the collection ST := (xiyj | i = 1, . . . , n, j = 1, . . . , m).
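The standard sample quantities can be computed as follows (a generic sketch of ours for paired samples; the precise normalizations in the text's formulas may differ, for example n versus n – 1 in the variance):

```python
def mean(xs):
    return sum(xs) / len(xs)

def variance(xs):
    """Sample variance with the 1/n normalization."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def covariance(xs, ys):
    """Sample covariance of two paired samples of equal length."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def correlation(xs, ys):
    """Sample correlation coefficient, a value in [-1, 1]."""
    return covariance(xs, ys) / (variance(xs) ** 0.5 * variance(ys) ** 0.5)

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]        # ys is exactly linear in xs
assert mean(xs) == 2.5
assert variance(xs) == 1.25
assert abs(correlation(xs, ys) - 1.0) < 1e-12
```

A perfectly linear relationship yields correlation ±1, mirroring the behaviour of ρX,Y for random variables.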
An important property of the normal distribution is the following:
|
Let X be any random variable with mean μ and variance σ2 and let |
| 2.162 | An urn contains n1 red balls and n2 black balls. We draw k balls sequentially and randomly from the urn, where 1 ≤ k ≤ n1 + n2.
| ||||||||||||||||||
| 2.163 | Let X and Y be the random variables of Example 2.36. For each of the two cases, calculate the probability distribution functions, expectations and variances of the following random variables:
| ||||||||||||||||||
| 2.164 | Let X and Y be continuous random variables, g(X) and h(Y) non-constant real polynomials and α, β, . Prove that:
| ||||||||||||||||||
| 2.165 | Let X be a random variable and Y := αX + β for some α, . What is ρX,Y ?
| ||||||||||||||||||
| 2.166 |
| ||||||||||||||||||
| 2.167 | Let X and Y be continuous random variables whose joint distribution is the uniform distribution in the triangle 0 ≤ X ≤ Y ≤ 1.
| ||||||||||||||||||
| 2.168 | Let X, Y, Z be random variables. Show that:
| ||||||||||||||||||
| 2.169 | Geometric distribution Assume that in each trial of an experiment, an event E has a constant probability p of occurrence. Let G = G(p) denote the random variable with
| ||||||||||||||||||
| 2.170 | Poisson distribution Let P = P (λ) be the discrete random variable with | ||||||||||||||||||
| 2.171 | Exponential distribution
Show that the exponential variable X of Part (a) is memoryless. | ||||||||||||||||||
| 2.172 | The birthday paradox Let S be a finite set of cardinality n.
|
This chapter provides the foundations of public-key cryptology. The long compilation of mathematical concepts presented in the chapter will be indispensable for understanding the topics that follow in the subsequent chapters.
This chapter begins with the basic concepts of sets, functions and relations. We also present the fundamental axioms of mathematics. Although the curricula of plus-two courses of many examination boards do include these topics, we have included a discussion of them in order to make our treatment self-contained.
Next comes a study of groups which are sets with binary operations satisfying some nice properties (associativity, identity, inverse and optionally commutativity). Groups are extremely important for cryptology. In particular, all discrete-log-based cryptosystems use suitable groups. Subgroups, cosets and formation of quotient groups constitute a prototypical feature that illustrates the basic paradigm of modern algebra. Secure cryptographic algorithms on groups rely on the availability of elements of large orders: for example, generators of big cyclic groups. We study these topics at length. Finally, we present Sylow’s theorem. For us, this theorem has only theoretical significance; it is used for proving some other theorems.
A set with a single operation (like a group) is often too restrictive. Many mathematical structures we are familiar with (like integers, polynomials) are endowed with two basic operations addition and multiplication. A set with two such (compatible) operations is called a ring. A study of rings, fields, ideals and quotient rings is essential in algebra (and so in cryptography too). Three important types of rings, namely unique factorization domains, principal ideal domains and Euclidean domains, are also discussed. Euclidean division is an important property of integers and polynomials, and is useful from a computational perspective.
Then, as a specific example, we study the properties of ℤ, the ring of integers. We concentrate mostly on elementary properties of integers like divisibility, congruences, the Chinese remainder theorem, Fermat’s and Euler’s theorems, quadratic residues and the law of quadratic reciprocity. We finally discuss some assorted topics from analytic number theory. In cryptography, we require many big randomly generated primes. The prime number theorem guarantees that there is essentially an abundant source of primes. Smooth integers (that is, integers having only small prime divisors) are useful for modern algorithms that compute factorizations and discrete logarithms. We present an estimate on the density of smooth integers. The last topic we study is the Riemann hypothesis and its generalizations. This as-yet-unproven hypothesis has a bearing on the running times of many number-theoretic algorithms relevant to cryptology.
The next example is the ring of polynomials over a ring. Polynomials over a field admit Euclidean division and consequently unique factorization. Irreducible polynomials are useful for constructing field extensions. Extension fields of characteristic 2 are quite frequently used in cryptographic systems.
We subsequently study the theory of vector spaces. Linear transformations are appropriate maps between vector spaces and necessitate the theory of matrices. Matrix algebra is widely useful in cryptology as it is in any other branch of algorithmic computer science. Algorithms to solve linear systems over rings and fields constitute a basic computational tool. A study of modules and algebras at the end of this section is mostly theoretical and can be avoided if the reader is willing to accept some theorems without proofs.
In the next section, we discuss the theory of field extensions. As mentioned earlier, cryptography relies heavily on extension fields of characteristic 2. Some related topics include splitting fields and algebraic closure of fields. At the end of this section, we have a short theoretical treatment of Galois theory.
Many popular cryptosystems are based on the multiplicative groups of finite fields. We study these fields as the next topic. Polynomials over finite fields are extremely useful for the construction and representation of finite fields. At the end of this section, we discuss several ways in which (elements of) finite fields can be represented in a computer’s memory. This study expedites the design, analysis and efficient implementation of finite-field arithmetic.
Since elliptic- and hyperelliptic-curve cryptography has gained popularity in recent years, one needs to study the theory of plane algebraic curves. This is what we do in the next three sections. To start with, we define affine and projective spaces and curves. Going from the affine space to the projective space is necessitated by a systematic (algebraic) inclusion of the points at infinity on a plane curve. We also discuss the theory of divisors and the Jacobian on plane curves. For elliptic curves, the Jacobian can be replaced by the equivalent group described in terms of the chord-and-tangent rule. For hyperelliptic curves, on the other hand, we have little option other than understanding the Jacobian itself.
Two kinds of elliptic curves that must be avoided in cryptography are supersingular curves and anomalous curves. The elliptic curve group (over a finite field) is the basic set used in elliptic curve cryptosystems. The orders (cardinality) of these groups are given by Hasse’s theorem. The structure theorem establishes that an elliptic curve group (over a finite field) is not necessarily cyclic, but has a rank of at most two.
We then study Jacobians of hyperelliptic curves over finite fields. This study supplements the theory of divisors on general curves. Reduced and semi-reduced divisors are expedient for the representation of the elements in the Jacobian of a hyperelliptic curve.
Many popular cryptosystems (including RSA) derive their security (presumably) from the intractability of the integer factorization problem. The best algorithm known till date for factoring integers is the number-field sieve method. An understanding of this algorithm requires the knowledge of number fields and number rings. We devote a section to the study of these mathematical objects. We start with some necessary commutative algebra including localization, integral dependence and Noetherian rings. Next, we deal with Dedekind domains. All number rings are Dedekind domains in which ideals admit unique factorization. We also discuss the factorization of ideals in number rings generated by rational primes and the structure of units in number rings (Dirichlet’s unit theorem).
The next section is a gentle introduction to the theory of p-adic numbers. These numbers are useful, for example, for designing attacks against elliptic curve cryptosystems.
In the last section, we summarize some statistical tools. Under the assumption that the reader is already familiar with the elementary notion of probability, we discuss properties of random variables and of some common probability distributions (including uniform and normal distributions). The birthday paradox described in an exercise is often useful in cryptographic context (for example, for collision attacks on hash functions).
That is the end of this chapter. The compilation may initially look long and boring, perhaps intimidating too. The unfortunate reality is that public-key cryptology is mathematical, and it is arguably better to treat it in the formal way. If the reader is not comfortable with mathematics (in general), cryptology is perhaps not her cup of tea. An elementary approach to cryptology is what many other books have adopted. This book aims at being different in that respect. It is up to the reader to decide to what level of detail she is willing to study cryptography.
Knowledge is of two kinds. We know a subject ourselves, or we know where we can find information upon it.
—Samuel Johnson
In this chapter, we have summarized the basic mathematical facts that cryptologists are expected to know in order to have a decent understanding of the present-day public-key technology. Our discussion has often been more intuitive than mathematically complete. A reader willing to gain further insight into these areas should look at materials written specifically to deal with the specialized topics. Here are our (biased) suggestions.
There are numerous textbooks on introductory algebra. The books by Herstein [125], Fraleigh [96], Dummit and Foote [81], Hungerford [133] and Adkins and Weintraub [1] are some of our favourites. The algebra of commutative rings with identity (rings by our definition) is called commutative algebra and is the basis for learning advanced areas of mathematics like algebraic geometry and algebraic number theory. A serious study of these disciplines demands more in-depth knowledge of commutative algebra than we have presented in Section 2.13.1. Atiyah and MacDonald’s book [14] is a de facto standard on commutative algebra. Hoffman and Kunze’s book [127] is a good reference for linear algebra and matrix algebra.
Elementary number theory deals with the theory of (natural) numbers without using sophisticated techniques from complex analysis and algebra. Zuckerman et al. [316] can be consulted for a lucid introduction to this subject. The books by Burton [42] and Mollin [207] are good alternatives.
A thorough mathematical treatment of finite fields can be found in the books by Lidl and Niederreiter [179, 180], of which the second also deals with computational issues. Other books of computational flavour include those by Menezes [191] and by Shparlinski [274]. Also see the paper [273] by Shparlinski.
The use of elliptic curves in cryptography was proposed by Koblitz [150] and Miller [205], and that of hyperelliptic curves by Koblitz [151]. A fair mathematical understanding of elliptic curves banks on a knowledge of commutative algebra (see above) and algebraic geometry. Hartshorne’s book [124] is a detailed introduction to algebraic geometry. Fulton’s book [99] on algebraic curves is another good reference. Rigorous mathematical treatments of elliptic curves can be found in Silverman’s books [275, 276]. The book by Koblitz [152] is elementary, but has a somewhat different focus than needed in cryptology. By far, the best short-cut is the recent textbook by Washington [298]. Some other books by Koblitz [150, 153, 154], Blake et al. [24], Menezes [192] and Hankerson et al. [123] are written for non-experts in algebraic geometry (and hence lack mathematical details), but are good from a computational viewpoint. The expository reports [46, 47] by Charlap et al. provide a nice elementary introduction to elliptic curves. For hyperelliptic curves, on the other hand, no such books are available. Koblitz’s book [154] includes a chapter on hyperelliptic curves. In addition, an appendix in the same book, written by Menezes et al. much in the style of Charlap et al. [46, 47], provides an introductory and elementary coverage.
In an oversimplified sense, algebraic number theory deals with the study of number fields. The books by Janusz [140], Lang [160], Mollin [208] and Ribenboim [251] go well beyond what we cover in Section 2.13. Also see [89]. For a more modern and sophisticated treatment, look at Neukirch’s book [216]. A book dedicated to p-adic numbers is due to Koblitz [149]. Course notes from one of the authors of this book can also be useful in this regard. The notes are freely downloadable from:
http://www.facweb.iitkgp.ernet.in/~adas/IITK/course/MTH617/SS02/
Analytic number theory deals with the application of complex analytic techniques to solve problems in number theory. Although we do not explicitly need this branch of mathematics (apart from a few theorems that we mention without proofs), it is rather important for the study of numbers. Consult the books by Apostol [12] and by Ireland and Rosen [136] for this. Also see [249]. For complex analysis, we recommend the book by Ahlfors [6].
Feller’s celebrated book [92] is a classical reference on probability theory. Grinstead and Snell’s book [121] is available on the Internet.
3.1 Introduction
3.2 Complexity Issues
3.3 Multiple-precision Integer Arithmetic
3.4 Elementary Number-theoretic Computations
3.5 Arithmetic in Finite Fields
3.6 Arithmetic on Elliptic Curves
3.7 Arithmetic on Hyperelliptic Curves
3.8 Random Numbers
Chapter Summary
Suggestions for Further Reading
From the start there has been a curious affinity between mathematics, mind and computing . . . It is perhaps no accident that Pascal and Leibniz in the seventeenth century, Babbage and George Boole in the nineteenth, and Alan Turing and John von Neumann in the twentieth – seminal figures in the history of computing – were all, among their other accomplishments, mathematicians, possessing a natural affinity for symbol, representation, abstraction and logic.
—Doron Swade [295]
. . . the laws of physics and of logic . . . the number system . . . the principle of algebraic substitution. These are ghosts. We just believe in them so thoroughly they seem real.
—Robert M. Pirsig [233]
The world is continuous, but the mind is discrete.
—David Mumford
Now that we have studied the properties of important mathematical objects that play vital roles in public-key cryptology, it is time to concentrate on the algorithmic and implementation issues for working with these objects. We need well-defined schemes (data structures) to represent these objects and well-defined procedures (algorithms) to manipulate them. While a theoretical analysis of the performance of our data structures and algorithms is of great concern, it still leaves us in the abstract domain. In the long run, one has to translate the abstract statements in the algorithms to machine codes that the computer understands, and this is where the implementation tidbits come into the picture. It is our personal experience that a naive implementation of an algorithm may run a hundred times slower than a carefully optimized implementation of the same algorithm. In certain specific applications (like those based on smart cards), where memory is a scarce resource, one should also pay attention to the storage requirements of the data structures and code segments. This chapter is an introduction to all these specialized topics.
Before we proceed further, certain comments are in order. In this book, we describe algorithms using a pseudocode that closely resembles the syntax of the programming language C. The biggest difference between C and our pseudocode is that we have given preference to mathematical notations in place of C syntax. For example, = means equality in our codes, whereas assignment is denoted by :=. Similarly, our while and for loops look more human-readable, for example, for i = 0, 1, . . . , m – 1 instead of C’s for (i=0; i<m; i++). In order to understand our pseudocode, a knowledge of C (or a similar programming language) is helpful, but not essential, on the part of the reader.
For certain implementations, we assume that the target machine carries out 32-bit 2’s-complement arithmetic. This is indeed true for most modern PCs and personal workstations. By the term word, we mean a 32-bit unit in the computer memory. We will also assume that the compiler provides facilities for storing and doing arithmetic with unsigned 64-bit integers. Though this is not an ANSI C feature, most popular compilers used today do support this built-in data type (Examples: unsigned __int64 for the Microsoft Visual C++ Compiler and unsigned long long for the GNU C Compiler). Though it is apparently desirable to be more generic and to avoid these specific assumptions on the part of the machine and the compiler, our exposition highlights the power of fine-tuning based on the knowledge of the underlying system.
Given an algorithm (or an implementation of the same), the time and space required for the execution of the algorithm on a machine depend very much on the machine’s architecture and on the compiler. But this does not mean that we cannot make some general theoretical estimates. The so-called asymptotic estimates that we are going to introduce now tend to approach the real situation as the input size tends to infinity. For finite input sizes (which is always the case in practice), these theoretical predictions turn out to provide valuable guidelines.
We start with the following important definitions.
Let f and g be positive real-valued functions of natural numbers. We say that f = O(g) if there exist a constant c > 0 and a natural number n0 such that f(n) ≤ c g(n) for all n ≥ n0; that f = Ω(g) if g = O(f); and that f = Θ(g) if both f = O(g) and f = Ω(g).
The order notation is used to analyse algorithms in the following way. For an algorithm, the input size is defined as the total number of bits needed to represent the input of the algorithm. We find asymptotic estimates of the running time and the memory requirement of the algorithm in terms of its input size. Let f(n) denote the running time[2] of an algorithm A for an input of size n. If f(n) = Θ(n^a) (or, more generally, if f = O(n^a)) for some a > 0, A is called a polynomial-time algorithm. If a = 1 (resp. 2, 3, . . .), then A is specifically called a linear-time (resp. quadratic-time, cubic-time, . . .) algorithm. A Θ(1) algorithm is often called a constant-time algorithm. If f = Θ(b^n) for some b > 1, A is called an exponential-time algorithm. Similarly, if f satisfies Equation (3.1) with 0 < α < 1, A is called a subexponential-time algorithm.
[2] The practical running time of an algorithm may vary widely depending on its implementation and also on the processor, the compiler and even on run-time conditions. Since we are talking about the order of growth of running times in relation to the input size, we neglect the constants of proportionality and so these variations are usually not a problem. If one plans to be more concrete, one may measure the running time by the number of bit operations needed by the algorithm.
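For reference, subexponential running times of the kind meant in Equation (3.1) are conventionally expressed through the L-notation; a common form (our notation, which may differ cosmetically from the equation as numbered in the text) is:

```latex
L_n[\alpha, c] \;=\; \exp\!\bigl( c\,(\ln n)^{\alpha}\,(\ln\ln n)^{1-\alpha} \bigr),
\qquad 0 < \alpha < 1,\; c > 0.
```

Setting α = 0 makes this polynomial in ln n, while α = 1 makes it fully exponential; the factoring and discrete-logarithm algorithms discussed later have running times of this intermediate shape.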
One has similar classifications of an algorithm in terms of its space requirements, namely, polynomial-space, linear-space, exponential-space, and so on. We can afford to be lazy and drop -time from the adjectives introduced in the previous paragraph. Thus, an exponential algorithm is an exponential-time algorithm, not an exponential-space algorithm.
It is expedient to note here that the running time of an algorithm may depend on the particular instance of the input, even when the input size is kept fixed. For an example, see Exercise 3.3. We should, therefore, be prepared to distinguish, for a given algorithm and for a given input size n, between the best (that is, shortest) running time fb(n), the worst (that is, longest) running time fw(n), the average running time fa(n) on all possible inputs (of size n) and the expected running time fe(n) for a randomly chosen input (of size n). In typical situations, fw(n), fa(n) and fe(n) are of the same order, in which case we simply denote, by running time, one of these functions. If this is not the case, an unqualified use of the phrase running time would denote the worst running time fw(n).
The order notation, though apparently attractive and useful, has certain drawbacks. First, it depicts the behaviour of functions (like running times) as the input size tends to infinity. In practice, one always has finite input sizes. One can check that if f(n) = n^100 and g(n) = (1.01)^n are the running times of two algorithms A and B respectively (for solving the same problem), then f(n) ≤ g(n) if and only if n = 1 or n ≥ 117,309. But then, if the input size is only 1,000, one would prefer the exponential-time algorithm B over the polynomial-time algorithm A. Thus asymptotic estimates need not guarantee correct suggestions at practical ranges of interest. On the other hand, an algorithm which is a product of human intellect does not tend to have such extreme values for the parameters; that is, in a polynomial-time algorithm, the degree is usually ≤ 10 and the base for an exponential-time algorithm is usually not as close to 1 as 1.01 is. If we have f(n) = n^5 and g(n) = 2^n as the respective running times of the algorithms A and B, then A outperforms B (in terms of speed) for all n ≥ 23.
The second drawback of the order notation is that it suppresses the constant of proportionality; that is, an algorithm whose running time is 100n2 has the same order as one whose running time is n2. This is, however, a situation that we cannot neglect in practice. In particular, when we compare two different implementations of the same algorithm, the one with a smaller constant of proportionality is more desirable than the one with a larger constant. This is where implementation tricks prove to be important and even indispensable for large-scale applications.
A deterministic algorithm is one that always follows the same sequence of computations (and thereby produces the same output) for a given input. The deterministic running time of a computational problem P is the fastest of the running times (in order notation) of the known algorithms to solve P.
If an algorithm makes some random choices during execution, we call the algorithm randomized or probabilistic. The exact sequence of computations followed by the algorithm depends on these random choices and as a result different executions of the same algorithm may produce different outputs for a given input. At first glance, randomized algorithms look useless, because getting different outputs for a given input is apparently not what one would really want. But there are situations where this is desirable. For example, in an implementation of the RSA protocol, one generates random primes p and q of given bit lengths. Here we require our prime generation procedure to produce different primes during different executions (that is, for different entities on the net).
More importantly, randomized algorithms often provide practical computational solutions to many problems for which no practical deterministic algorithms are known. We will shortly encounter many such situations where randomized algorithms are the simplest and/or fastest known algorithms. However, this sudden enhancement in performance by random choices does not come for free. To explain the so-called darker sides of randomization, we describe two different types of randomized algorithms.
A Monte Carlo algorithm is a randomized algorithm that may produce incorrect outputs. However, for such an algorithm to be useful, we require that the running time be always small and the probability of an error sufficiently low. A good example of a Monte Carlo algorithm is the Miller–Rabin algorithm (Algorithm 3.13) for testing the primality of an integer. For an integer of bit size n, the Miller–Rabin test with t iterations runs in time O(tn^3). Whenever the algorithm outputs false, it is always correct. But an answer of true is incorrect with an error probability ≤ 2^−2t, that is, it certifies a composite integer as a prime with probability ≤ 2^−2t. For t = 20, an error is expected to occur less than once in every 10^12 executions. With this little sacrifice we achieve a running time of O(n^3) (for a fixed t), whereas the best deterministic primality-testing algorithm (known to the authors at the time of writing this book) takes time O(n^7.5) and hence is not practical.
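To make the structure of such a test concrete, here is a small-integer sketch of a Miller–Rabin-style test. It is not the book's Algorithm 3.13: for reproducibility it uses the fixed bases 2, 3, 5 and 7 instead of t random bases, and it relies on a compiler-specific 128-bit intermediate for the modular products.

```c
#include <stdint.h>

/* modular multiplication and exponentiation on 64-bit words;
   the __int128 intermediate is a GCC/Clang extension */
static uint64_t mulmod(uint64_t a, uint64_t b, uint64_t m)
{
    return (uint64_t)((unsigned __int128)a * b % m);
}

static uint64_t powmod(uint64_t b, uint64_t e, uint64_t m)
{
    uint64_t r = 1;
    b %= m;
    while (e) {
        if (e & 1) r = mulmod(r, b, m);
        b = mulmod(b, b, m);
        e >>= 1;
    }
    return r;
}

/* Miller-Rabin: write n-1 = 2^s * d with d odd; n passes base a if
   a^d = 1 or a^(2^j * d) = -1 (mod n) for some 0 <= j < s. */
int is_probable_prime(uint64_t n)
{
    static const uint64_t bases[] = {2, 3, 5, 7};
    if (n < 2) return 0;
    for (int i = 0; i < 4; i++) {
        if (n == bases[i]) return 1;
        if (n % bases[i] == 0) return 0;
    }
    uint64_t d = n - 1;
    int s = 0;
    while ((d & 1) == 0) { d >>= 1; s++; }
    for (int i = 0; i < 4; i++) {
        uint64_t x = powmod(bases[i], d, n);
        if (x == 1 || x == n - 1) continue;
        int witness = 1;
        for (int j = 1; j < s; j++) {
            x = mulmod(x, x, n);
            if (x == n - 1) { witness = 0; break; }
        }
        if (witness) return 0;   /* composite for certain */
    }
    return 1;                    /* probably prime */
}
```

A false answer is always correct here too; only a true answer carries an error probability, exactly as described above.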
A Las Vegas algorithm is a randomized algorithm which always produces the correct output. However, the running time of such an algorithm depends on the random choices made. For such an algorithm to be useful, we expect that for most random choices the running time is small. As an example, consider the problem of finding a random (monic) irreducible polynomial of degree n over a finite field. Algorithm 3.22 tests the irreducibility of such a polynomial in deterministic polynomial time. We generate random polynomials of degree n and check their irreducibility by Algorithm 3.22. From Section 2.9.2, we know that a randomly chosen monic polynomial of degree n over a finite field is irreducible with an approximate probability of 1/n. This implies that after O(n) random polynomials are tried, one expects to find an irreducible polynomial. The resulting Las Vegas algorithm (Algorithm 3.23) runs in expected polynomial time. It may, however, happen that for certain random choices we keep on generating reducible polynomials an exponential number of times, but the likelihood of such an accident is very, very low (Exercise 3.5).
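The Las Vegas strategy can be illustrated over the field with two elements by packing a polynomial into a machine word. The sketch below is our own toy version: it uses exponential-time trial division in place of the polynomial-time irreducibility test of Algorithm 3.22, so it is practical only for small degrees.

```c
#include <stdint.h>
#include <stdlib.h>

/* A polynomial over F_2 is stored as a bit mask: bit i holds the
   coefficient of x^i. E.g. 0xB = x^3 + x + 1. */

static int poly_deg(uint32_t p)
{
    int d = -1;
    while (p) { p >>= 1; d++; }
    return d;
}

/* remainder of a modulo b, coefficient arithmetic mod 2 (XOR) */
static uint32_t poly_mod(uint32_t a, uint32_t b)
{
    int db = poly_deg(b);
    for (int da = poly_deg(a); da >= db; da = poly_deg(a))
        a ^= b << (da - db);
    return a;
}

/* trial division by every polynomial of degree 1 .. deg/2 */
int poly_irreducible(uint32_t p)
{
    int d = poly_deg(p);
    if (d < 1) return 0;
    for (uint32_t q = 2; poly_deg(q) <= d / 2; q++)
        if (poly_mod(p, q) == 0) return 0;
    return 1;
}

/* Las Vegas search: sample random monic degree-n polynomials with
   nonzero constant term until an irreducible one appears. */
uint32_t random_irreducible(int n)
{
    for (;;) {
        uint32_t p = ((uint32_t)rand() & ((1u << n) - 1)) | (1u << n) | 1u;
        if (poly_irreducible(p)) return p;
    }
}
```

The loop always returns a correct answer; only the number of iterations is random, which is the defining trait of a Las Vegas algorithm.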
An algorithm is said to be a probabilistic or randomized polynomial-time algorithm, if it is either a Monte Carlo algorithm with polynomial worst running time or a Las Vegas algorithm with polynomial expected running time. Both the above examples of randomized algorithms are probabilistic polynomial-time algorithms. A combination of these two types of algorithms can also be conceived; namely, algorithms that produce correct outputs with high probability and have polynomial expected running time. Some computational problems are so challenging that even such probably correct and probably fast algorithms are quite welcome.
We finally note that there are certain computational problems for which the deterministic running time is exponential and for which randomization also does not help much. In some cases, we have subexponential randomized algorithms which are still too slow to be of reasonable practical use. Some of these so-called intractable problems are at the heart of the security of many public-key cryptographic protocols.
In the last two sections, we have introduced theoretical measures (the order notations) for estimating the (known) difficulty of solving computational problems. In this section, we introduce another concept by which we can compare the relative difficulty of two computational problems.
Let P1 and P2 be two computational problems. We say that P1 is polynomial-time reducible to P2, denoted P1 ≤P P2, if there is a polynomial-time algorithm which, given a solution of P2, provides a solution of P1. This means that if P1 ≤P P2, then the problem P1 is no more difficult than P2, apart from the extra polynomial-time reduction effort. In that case, if we know an algorithm to solve P2 in polynomial time, then we have a polynomial-time algorithm for P1 too. If P1 ≤P P2 and P2 ≤P P1, we say that the problems P1 and P2 are polynomial-time equivalent and write P1 ≅ P2.
In order to give an example of these concepts, we let G be a finite cyclic multiplicative group of order n and g a generator of G. The discrete logarithm problem (DLP) is the problem of computing, for a given a ∈ G, an integer x such that a = g^x. The Diffie–Hellman problem (DHP), on the other hand, is the problem of computing g^xy from the given values of g^x and g^y. If one can compute y from g^y, one can also compute g^xy = (g^x)^y by performing an exponentiation in the group G. Therefore, DHP ≤P DLP, if exponentiations in G can be computed in polynomial time. In other words, if a solution for DLP is known, a solution for DHP is also available: that is, DHP is no more difficult than DLP except for the additional exponentiation effort. However, the reverse implication (that is, whether DLP ≤P DHP) is not known for many groups.
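This reduction can be made concrete in a toy group. In the sketch below (illustrative code of our own), a brute-force discrete-logarithm routine stands in for an arbitrary DLP solver in the multiplicative group modulo a small prime p, and one extra exponentiation then solves the DHP:

```c
#include <stdint.h>

/* modular exponentiation by repeated squaring (p must fit in 32 bits
   so that the products below do not overflow 64 bits) */
static uint64_t powmod(uint64_t b, uint64_t e, uint64_t p)
{
    uint64_t r = 1 % p;
    b %= p;
    while (e) {
        if (e & 1) r = r * b % p;
        b = b * b % p;
        e >>= 1;
    }
    return r;
}

/* brute-force DLP "oracle": return x with g^x = a (mod p), or -1 */
static int64_t dlog(uint64_t g, uint64_t a, uint64_t p)
{
    uint64_t t = 1 % p;
    for (uint64_t x = 0; x < p; x++) {
        if (t == a) return (int64_t)x;
        t = t * g % p;
    }
    return -1;
}

/* DHP <= DLP: recover y from g^y via the oracle, then a single
   exponentiation gives g^(xy) = (g^x)^y. */
uint64_t solve_dhp(uint64_t g, uint64_t gx, uint64_t gy, uint64_t p)
{
    int64_t y = dlog(g, gy, p);
    return powmod(gx, (uint64_t)y, p);
}
```

For example, with p = 101, g = 2, x = 13 and y = 22, we have g^x = 11 and g^y = 77, and the routine recovers g^xy from these two values alone.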
So far we have assumed that our reduction algorithms are deterministic. If we allow randomized (that is, probabilistic) polynomial-time reduction algorithms, we can similarly introduce the concepts of randomized polynomial-time reducibility and of randomized polynomial-time equivalence. We urge the reader to formulate the formal definitions for these concepts.
3.1

3.2

3.3 Suppose that an algorithm A takes as input a bit string and runs in time g(t), where t is the number of one-bits in the input string. Let fb(n), fw(n), fa(n) and fe(n) respectively denote the best, worst, average and expected running times of A for inputs of size n. Derive the following table under the assumption that each of the 2^n bit strings of length n is equally likely.

3.4

3.5 Consider the Las Vegas algorithm discussed in Section 3.2.2 for generating a random irreducible polynomial of degree n over a finite field. Assume that a randomly chosen polynomial of degree n has (an exact) probability of 1/n of being irreducible. Find the probability pr that r polynomials chosen randomly (with repetition) are all reducible. For n = 1000, calculate the numerical values of pr for r = 10^i, i = 1, . . . , 6, and find the smallest integers r for which pr ≤ 1/2 and pr ≤ 10^−12. Find the expected number of polynomials tested for irreducibility before the algorithm terminates.

3.6 Let n = pq be the product of two distinct primes p and q. Show that factoring n is polynomial-time equivalent to computing φ(n) = (p–1)(q–1), where φ is Euler’s totient function. (Assume that an arithmetic operation (including the computation of integer square roots) on integers of bit size t can be performed in polynomial time (in t).)

3.7 Let G be a finite cyclic multiplicative group and let H be the subgroup of G generated by an element h ∈ G whose order is known. The generalized discrete logarithm problem (GDLP) is the following: Given a ∈ G, find out if a ∈ H and, if so, find an integer x for which a = h^x. Show that GDLP ≅ DLP, if exponentiations in G can be carried out in polynomial time and if DLP in H is polynomial-time equivalent to DLP in G. [H]
Cryptographic protocols based on the rings ℤn and ℤp demand n and p to be sufficiently large (of bit length ≥ 512) in order to achieve the desired level of security. However, standard compilers do not support data types that hold integers of this size with full precision. For example, C compilers support integers of size ≤ 64 bits. So one must employ custom-designed data types for representing and working with such big integers. Many libraries are already available that can handle integers of arbitrary length. FREELIP, GMP, LiDIA, NTL and ZEN are some such libraries that are freely available.
Alternatively, one may design one’s own functions for multiple-precision integers. Such a programming exercise is not very difficult, but making the functions run efficiently is a huge challenge. Several tricks and optimization techniques can turn a naive implementation into a much faster and more memory-efficient code, and it takes years of experimentation to discover the subtleties. Theoretical asymptotic estimates may serve as a guideline, but only experimentation can settle the relative merits and demerits of the available algorithms for input sizes of practical interest. For example, the theoretically fastest algorithm known for multiplying two multiple-precision integers is based on the so-called fast Fourier transform (FFT) techniques. But our experience shows that this algorithm starts to outperform other common but asymptotically slower algorithms only when the input size is at least several thousand bits. Since such very large integers are rarely needed by cryptographic protocols, FFT-based multiplication is not useful in this context.
In order to represent a large integer, we break it up into small parts and store each part in a memory word[3] accessible by built-in data types. The simplest way to break up a (positive) integer a is to predetermine a radix ℜ and compute the ℜ-ary representation (as–1, . . . , a0)ℜ of a (see Exercise 3.8). One should have ℜ ≤ 2^32 so that each ℜ-ary digit ai can be stored in a memory word. For the sake of efficiency, it is advisable to take ℜ to be a power of 2. It is also expedient to take ℜ as large as possible, because smaller values of ℜ lead to (possibly) longer size s and thereby add to the storage requirement and also to the running time of arithmetic functions. The best choice is ℜ = 2^32. We denote by ulong a built-in unsigned integer data type provided by the compiler (like the ANSI C standard unsigned long). We use an array of ulong for storing the digits. The array can be static or dynamic. Though dynamic arrays are more storage-efficient (because they can be allocated only as much memory as needed), they have memory allocation and deallocation overheads and are somewhat more complicated to programme than static arrays. Moreover, for cryptographic protocols one typically needs integers no longer than 4096 bits. Since the product of two integers of bit size t has bit size ≤ 2t, a static array of 8192/32 = 256 ulong suffices for storing cryptographic integers. It is also necessary to keep track of the actual size of an integer, since filling up with leading 0 digits is not an efficient strategy. Finally, it is often useful to have a signed representation of integers. A sign bit is also necessary for this case. We state three possible declarations in Exercise 3.11.
[3] We assume that a word in the memory is 32 bits long.
We now describe the implementations of addition, subtraction, multiplication and Euclidean division of multiple-precision integers. Every other complex operation (like modular arithmetic and gcd computation) is based on these primitives. It is, therefore, of utmost importance to write efficient code for these basic operations.
For integers of cryptographic sizes, the most efficient algorithms are the standard ones we use for doing arithmetic on decimal numbers, that is, for two positive integers a = as–1 . . . a0 and b = bt–1 . . . b0 we compute the sum c = a + b = cr–1 . . . c0 as follows. We first compute a0 + b0. If this sum is ≥ ℜ, then c0 = a0 + b0 – ℜ and the carry is 1, otherwise c0 = a0 + b0 and the carry is 0. We then compute a1 + b1 plus the carry available from the previous digit, and compute c1 and the next carry as before.
For computing the product d = ab = dl–1 . . . d0, we do the usual quadratic procedure; namely, we initialize all the digits of d to 0 and for each i = 0, . . . , s – 1 and j = 0, . . . , t – 1 we compute aibj and add it to the (i + j)-th digit of d. If this sum (call it σ) at the (i + j)-th location exceeds ℜ – 1, we find out q, r with σ = qℜ + r, r < ℜ. Then di+j is assigned r, and q is added to the (i + j + 1)-st location. If that addition results in a carry, we propagate the carry to higher locations until it gets fully absorbed in some word of d.
All this sounds simple, but complications arise when we consider the fact that the sum of two 32-bit words (and a possible carry from the previous location) may be 33 bits long. For multiplication, the situation is even worse, because the product aibj can be 64 bits long. Since our machine word can hold only 32 bits, it becomes problematic to hold all these intermediate sums and products to full precision. We assume that the least significant 32 bits are correctly returned and assigned to the output variable (ulong), whereas the leading 32 bits are lost.[4] The most efficient way to keep track of these overflows is to use assembly instructions, and this is what many number theory packages (like PARI and UBASIC) do. But this means that for every target architecture we have to write different assembly code. Here we describe certain tricks that make it possible to grab the overflow information with only high-level languages, without significantly degrading the performance compared to assembly instructions.
[4] This is the typical behaviour of a CPU that supports 2’s complement arithmetic.
First consider the sum ai + bi. We compute the least significant 32 bits by assigning ci = ai + bi. It is easy to see that an overflow occurs during this sum if and only if ci < ai. We set the output carry accordingly. Now, let us consider the situation when we have an input carry: that is, when we compute the sum ci = ai + bi + 1. Here an overflow occurs if and only if ci ≤ ai. Algorithm 3.1 performs this addition of words.
Algorithm 3.1 (addition of words with carry)
Input: Words ai and bi, and the input carry γi.
Output: Word ci and the output carry δi.
Steps:
    ci := ai + bi.
    if (γi) { ci ++, δi := ( (ci ≤ ai) ? 1 : 0 ). }
    else { δi := ( (ci < ai) ? 1 : 0 ). }
Algorithm 3.1 assumes that ci and ai are stored in different memory words. If this is not the case, we should store ai + bi in a temporary variable and, after the second line, ci should be assigned the value of this temporary variable. Note also that many processors provide an increment primitive which is faster than the general addition primitive. In that case, the statement ci++ is preferable to ci := ci+1.
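The carry-detection trick of Algorithm 3.1 can be rendered in C along the following lines (a sketch using the C99 uint32_t type; the function and parameter names are our own, not the book’s):

```c
#include <stdint.h>

/* Add two 32-bit words with an input carry.  After c = a + b (mod 2^32),
   an overflow occurred iff c < a; when an input carry was also added,
   the test becomes c <= a, exactly as in Algorithm 3.1. */
uint32_t add_word(uint32_t a, uint32_t b, unsigned carry_in,
                  unsigned *carry_out)
{
    uint32_t c = a + b;            /* least significant 32 bits of the sum */
    if (carry_in) {
        c++;
        *carry_out = (c <= a);     /* overflow iff c <= a */
    } else {
        *carry_out = (c < a);      /* overflow iff c < a */
    }
    return c;
}
```

Note that `c` here is a separate variable, so the caveat about `ci` and `ai` sharing a memory word does not arise.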
For subtraction, we proceed analogously from right to left and keep track of the borrow. Here the check for overflow can be done before the subtraction of words is carried out (and, therefore, no temporary variable is needed, if we assume that the output carry is not stored in the location of the operands).
Algorithm 3.2 (subtraction of words with borrow)
Input: Words ai and bi, and the input borrow γi.
Output: Word ci and the output borrow δi.
Steps:
    if (γi) { δi := ( (ai ≤ bi) ? 1 : 0 ), ci := ai – bi, ci – –. }
    else { δi := ( (ai < bi) ? 1 : 0 ), ci := ai – bi. }
We urge the reader to develop the complete addition and subtraction procedures for multiple-precision integers, based on the above primitives for words.
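As a starting point for that exercise, here is one possible C sketch of the complete word-array addition and subtraction loops (little-endian digit order and all names are our own choices, not the book’s):

```c
#include <stdint.h>
#include <stddef.h>

/* c = a + b over n words (a[0] least significant); returns the final carry. */
unsigned mp_add(uint32_t *c, const uint32_t *a, const uint32_t *b, size_t n)
{
    unsigned carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t t = a[i] + b[i];              /* low 32 bits */
        unsigned ov;
        if (carry) { t++; ov = (t <= a[i]); }  /* overflow iff t <= a[i] */
        else       {      ov = (t <  a[i]); }  /* overflow iff t <  a[i] */
        c[i] = t;
        carry = ov;
    }
    return carry;
}

/* c = a - b over n words; returns the final borrow (1 exactly when a < b). */
unsigned mp_sub(uint32_t *c, const uint32_t *a, const uint32_t *b, size_t n)
{
    unsigned borrow = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned ov;
        /* the borrow test can be done before the subtraction */
        if (borrow) { ov = (a[i] <= b[i]); c[i] = a[i] - b[i] - 1; }
        else        { ov = (a[i] <  b[i]); c[i] = a[i] - b[i]; }
        borrow = ov;
    }
    return borrow;
}
```

A full library would, of course, also track the actual sizes of the operands and the sign, as discussed earlier.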
The product of two 32-bit words can be as long as 64 bits, and we plan to (compute and) store this product in two words. Assuming the availability of a built-in 64-bit unsigned integer data type (which we will henceforth denote as ullong), this can be performed as in Algorithm 3.3.
Algorithm 3.3 (multiplication of two words)
Input: Words a and b.
Output: Words c and d with ab = cℜ + d.
Steps:
    /* We use a temporary variable t of data type ullong */
    t := (ullong)(a) * (ullong)(b),
    c := (ulong)(t ≫ 32),
    d := (ulong)t.
We use a temporary 64-bit integer variable t to store the product ab. The lower 32 bits of t are stored in d by simple typecasting, whereas the higher 32 bits of t are obtained by right-shifting t (the operator ≫) by 32 bits. This is a reasonable strategy, given that we do not explore assembly-level instructions. Algorithm 3.4 describes a multiplication algorithm for two multiple-precision integer operands that does not directly use the word-multiplying primitive of Algorithm 3.3.
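In C, the typecast-and-shift strategy of Algorithm 3.3 may look as follows (uint64_t playing the role of ullong; names ours):

```c
#include <stdint.h>

/* Split the 64-bit product of two 32-bit words into a high word c and a
   low word d, so that a*b = c*2^32 + d. */
void mul_word(uint32_t a, uint32_t b, uint32_t *c, uint32_t *d)
{
    uint64_t t = (uint64_t)a * (uint64_t)b;  /* full 64-bit product */
    *c = (uint32_t)(t >> 32);                /* high 32 bits */
    *d = (uint32_t)t;                        /* low 32 bits */
}
```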
The reader can verify easily that this code properly computes the product. We now highlight how this makes the computation efficient. The intermediate results are stored in the array t of 64-bit ullong. This means that after the 64-bit product aibj of the words ai and bj is computed (in the temporary variable T), we directly add T to the location ti+j. If the sum at the (i + j)-th location exceeds ℜ^2 – 1 = 2^64 – 1, that is, if an overflow occurs, we should add ℜ to ti+j+1 or, equivalently, 1 to ti+j+2. This last addition is one of ullong integers; it becomes more efficient when replaced by ulong increments, and this is what we do using the temporary array u. Since the quadratic loop is the bottleneck of the multiplication procedure, it is absolutely necessary to make this loop as efficient as possible.
Algorithm 3.4 (multiplication of two multiple-precision integers)
Input: Integers a = (ar–1 . . . a0)ℜ and b = (bs–1 . . . b0)ℜ.
Output: The product c = (cr+s–1 . . . c0)ℜ = ab.
Steps:
    /* Let T be a variable and t0, . . . , tr+s–1 an array of ullong variables */
    /* Let v be a variable and u0, . . . , ur+s–1 an array of ulong variables */
    Initialize the array locations ci, ti and ui to 0 for all i = 0, . . . , r + s – 1.
    /* The quadratic loop */
After the quadratic loop, we do deferred normalization from the array of 64-bit double-words ti to the array of 32-bit words ci. This is done using the typecasting and right-shift strategy mentioned in Algorithm 3.3. We should also take care of the intermediate carries stored in the array u. The normalization loop takes a total time of O(r + s), whereas the quadratic loop takes time O(rs). If we had done normalization inside the quadratic loop itself, that would incur an additional O(rs) cost (which is significantly more than that of deferred normalization).
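A compact C sketch in the spirit of Algorithm 3.4, with 64-bit accumulators, the auxiliary carry array u and a single deferred normalization pass, might look like this (the fixed array bound, which assumes r + s ≤ 64, and all names are our own simplifications):

```c
#include <stdint.h>
#include <stddef.h>

/* Schoolbook multiplication of little-endian 32-bit digit arrays a (r digits)
   and b (s digits) into c (r+s digits).  64-bit accumulators t[] collect the
   partial products; each overflow past 2^64 = R^2 at position i+j is counted
   as one unit in u[i+j+2]; normalization to 32-bit digits is deferred to one
   final O(r+s) pass. */
void mp_mul(uint32_t *c, const uint32_t *a, size_t r,
            const uint32_t *b, size_t s)
{
    uint64_t t[64] = {0};
    uint32_t u[64] = {0};

    /* the quadratic loop */
    for (size_t i = 0; i < r; i++)
        for (size_t j = 0; j < s; j++) {
            uint64_t T = (uint64_t)a[i] * b[j];
            t[i + j] += T;
            if (t[i + j] < T)      /* lost R^2 at position i+j */
                u[i + j + 2]++;
        }

    /* deferred normalization */
    uint64_t carry = 0;
    for (size_t k = 0; k < r + s; k++) {
        uint64_t T  = t[k] + carry;
        uint64_t hi = (T < carry);         /* 64-bit overflow of this add */
        uint64_t T2 = T + u[k];
        hi += (T2 < T);
        c[k]  = (uint32_t)T2;
        carry = (T2 >> 32) + (hi << 32);   /* carry into position k+1 */
    }
}
```

The point of the sketch is the structure: the inner loop does nothing but one word product, one 64-bit addition and one compare-and-increment.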
If both the operands a and b of the multiplication are the same, it is not necessary to compute aibj and ajbi separately. We should add to ti+j the product ai^2, if i = j, or the product 2aiaj, if i < j. Note that 2aiaj can be computed by left-shifting aiaj by one bit. This might result in an overflow, which can be checked before shifting by looking at the most significant (64th) bit of aiaj. Algorithm 3.5 incorporates these changes.
For the multiplication of two multiple-precision integers, there are algorithms that are asymptotically faster than the quadratic Algorithms 3.4 and 3.5. However, not all these theoretically faster algorithms are practical for the sizes of integers used in cryptology. Our practical experience shows that a strategy due to Karatsuba outperforms the quadratic algorithm, if both the operands are of roughly equal sizes and if the bit lengths of the operands are 300 or more. We describe Karatsuba’s algorithm in connection with squaring, where the two operands are the same (and hence of the same size). Suppose we want to compute a^2 for a multiple-precision integer a = (ar–1 . . . a0)ℜ. We first break a into two integers of almost equal sizes, namely, α := (ar–1 . . . at)ℜ and β := (at–1 . . . a0)ℜ, so that a = ℜ^t α + β. Now, a^2 = α^2 ℜ^2t + 2αβℜ^t + β^2 and 2αβ = (α^2 + β^2) – (α – β)^2. We recursively invoke Karatsuba’s multiplication with the operands α, β and α – β. Recursion continues as long as the operands are not too small and the depth of recursion is within a prescribed limit. One can check that Karatsuba’s algorithm runs in time O(r^(lg 3) lg r) = O(r^1.585 lg r), which is a definite improvement over the O(r^2) running time taken by the quadratic algorithm.
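The identity 2αβ = (α^2 + β^2) – (α – β)^2 at the heart of this squaring variant can be illustrated by a toy C function that performs a single Karatsuba step on one 32-bit word, split at the 16-bit boundary (this only demonstrates the identity with three smaller squares; it is not the recursive multiple-precision routine, and all names are ours):

```c
#include <stdint.h>

/* One Karatsuba squaring step: a = alpha*2^16 + beta, and
   a^2 = alpha^2 * 2^32 + (alpha^2 + beta^2 - (alpha-beta)^2) * 2^16 + beta^2,
   so only the three squares alpha^2, beta^2, (alpha-beta)^2 are computed. */
uint64_t kara_square(uint32_t a)
{
    uint32_t alpha = a >> 16, beta = a & 0xFFFFu;
    uint64_t a2 = (uint64_t)alpha * alpha;
    uint64_t b2 = (uint64_t)beta * beta;
    int64_t  d  = (int64_t)alpha - beta;       /* may be negative */
    uint64_t d2 = (uint64_t)(d * d);
    uint64_t cross = a2 + b2 - d2;             /* equals 2*alpha*beta */
    return (a2 << 32) + (cross << 16) + b2;    /* no overflow: result < 2^64 */
}
```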
Algorithm 3.5 (squaring: the quadratic loop runs only over j ≥ i)
    for (i = 0, . . . , r – 1) and (j = i, . . . , r – 1) {
The best-known algorithm for the multiplication of two multiple-precision integers is based on the fast Fourier transform (FFT) techniques and has running time Õ(r) (that is, r up to polylogarithmic factors). However, for the integers used in cryptology this algorithm is usually not practical. Therefore, we will not discuss FFT multiplication in this book.
Euclidean division with remainder of multiple-precision integers is somewhat cumbersome, although conceptually as difficult (that is, as simple) as the division procedure for decimal integers taught in the early days of school. The most challenging part of the procedure is guessing the next digit of the quotient. For decimal integers, we usually do this by looking at the first few (decimal) digits of the divisor and the dividend. This need not give us the correct digit, but something close to it. In the case of ℜ-ary digits, we also make a guess of the quotient digit based on a few leading ℜ-ary digits of the divisor and the dividend, but certain precautions have to be taken to ensure that the guess is not too different from the correct one.
Suppose we are given positive integers a = (ar–1 . . . a0)ℜ and b = (bs–1 . . . b0)ℜ with ar–1 ≠ 0 and bs–1 ≠ 0, and we want to compute the integers x = (xr–s . . . x0)ℜ and y = (ys–1 . . . y0)ℜ with a = xb + y, 0 ≤ y < b. First, we want that bs–1 ≥ ℜ/2 (the reason will become clear shortly). If this condition is not already met, we force it by multiplying both a and b by 2^t for some suitable t, 0 < t < 32. In that case, the quotient remains the same, but the remainder gets multiplied by 2^t. The desired remainder can later be recovered easily by right-shifting the computed remainder by t bits. The process of making bs–1 ≥ ℜ/2 is often called normalization (of b). Henceforth, we will assume that b is normalized. Note that normalization may increase the word-size of a by 1.
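The normalization shift t can be computed by examining the leading word of b, for instance as in this C sketch (the function name is ours):

```c
#include <stdint.h>

/* Find the shift t (0 <= t < 32) such that shifting the divisor left by
   t bits makes its leading digit >= R/2 = 2^31, i.e. shift until the top
   bit of the leading word is set.  top_digit must be non-zero. */
unsigned norm_shift(uint32_t top_digit)
{
    unsigned t = 0;
    while (!(top_digit & 0x80000000u)) {  /* top bit not yet set */
        top_digit <<= 1;
        t++;
    }
    return t;
}
```

On many compilers a count-leading-zeros intrinsic does the same job in one instruction.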
Algorithm 3.6 (Euclidean division of multiple-precision integers)
Input: Integers a = (ar–1 . . . a0)ℜ and b = (bs–1 . . . b0)ℜ with r ≥ 3, s ≥ 2, ar–1 ≠ 0, bs–1 ≥ ℜ/2 and a ≥ b.
Output: The quotient x = (xr–s . . . x0)ℜ = a quot b and the remainder y = (ys–1 . . . y0)ℜ = a rem b of the Euclidean division of a by b.
Steps:
    Initialize the quotient digits xi to 0 for i = 0, . . . , r – s.
Algorithm 3.6 implements multiple-precision division. It is not difficult to prove the correctness of the algorithm. We refrain from doing so, but make some useful comments. The initial check inside the main loop may cause the increment of xi–s+1. This may lead to a carry which has to be propagated to the higher digits. This carry propagation is not mentioned in the code for simplicity. Since b is assumed to be normalized, this initial check needs to be carried out only once; that is, for a non-normalized b we would have to replace the if statement by a while loop. This is the first advantage of normalization. In the first step of guessing the quotient digit xi–s, we compute ⌊(aiℜ + ai–1)/bs–1⌋ using ullong arithmetic. At this point, the guess is based only on two leading digits of a and one leading digit of b. In the while loop, we refine this guess by considering one more digit of each of a and b. Since b is normalized, this while loop is executed no more than twice (the second advantage of normalization). The guess for xi–s made in this way is either equal to or one more than the correct value, which is then computed by comparing a with xi–s b ℜ^(i–s). The running time of the algorithm is O(s(r – s)). For a fixed r, this is maximized (namely, O(r^2)) when s ≈ r/2.
Multiplication and division by a power of 2 can be carried out more efficiently using bit operations (on words) instead of calling the general procedures just described. It is also often necessary to compute the bit length of a non-zero multiple-precision integer and the multiplicity of 2 in it. In these cases also, one should use bit operations for efficiency. For these implementations, it is advantageous to maintain precomputed tables of the constants 2i, i = 0, . . . , 31, and of 2i – 1, i = 0, . . . , 32, rather than computing them in situ every time they are needed. In Algorithm 3.7, we describe an implementation of multiplication by a power of 2 (that is, the left shift operation). We use the symbols OR, ≫ and ≪ to denote bit-wise or, right shift and left shift operations on 32-bit integers.
Algorithm 3.7 (left shift: multiplication by a power of 2)
Input: Integer a = (ar–1 . . . a0)ℜ ≠ 0 with ar–1 ≠ 0, and the shift amount t ≥ 0.
Output: The integer c = (cs–1 . . . c0)ℜ = a · 2^t, cs–1 ≠ 0.
Steps:
    u := t quot 32, v := t rem 32.
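The split t = 32u + v and the shift across word boundaries of Algorithm 3.7 can be sketched in C as follows (little-endian digit order; the names and the result-size convention are ours):

```c
#include <stdint.h>
#include <stddef.h>

/* c = a * 2^t for a little-endian digit array a of r words.  The result
   array c must have room for r + t/32 + 1 words; the actual word size of
   the result is returned. */
size_t mp_shift_left(uint32_t *c, const uint32_t *a, size_t r, unsigned t)
{
    unsigned u = t / 32, v = t % 32;      /* whole-word and in-word parts */
    size_t s = r + u + 1;                 /* upper bound on result size */
    for (size_t i = 0; i < s; i++) c[i] = 0;
    for (size_t i = 0; i < r; i++) {
        c[i + u] |= a[i] << v;
        if (v)                            /* avoid undefined shift by 32 */
            c[i + u + 1] |= a[i] >> (32 - v);
    }
    while (s > 1 && c[s - 1] == 0) s--;   /* drop leading zero words */
    return s;
}
```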
Unless otherwise mentioned, we will henceforth forget about the above structural representation of multiple-precision integers and denote arithmetic operations on them by the standard symbols (+, –, * or · or ×, quot, rem and so on).
Computing the greatest common divisor of two (multiple-precision) integers has important applications. In this section, we assume that we want to compute the (positive) gcd of two positive integers a and b. The Euclidean gcd loop comprising repeated division (Proposition 2.15) is not usually the most efficient way to compute integer gcds. We describe the binary gcd algorithm that turns out to be faster for practical bit sizes of the operands a and b. If a = 2ra′ and b = 2sb′ with a′ and b′ odd, then gcd(a, b) = 2min(r,s) gcd(a′, b′). Therefore, we may assume that a and b are odd. In that case, if a > b, then gcd(a, b) = gcd(a – b, b) = gcd((a – b)/2t, b), where t := v2(a – b) is the multiplicity of 2 in a – b. Since the sum of the bit sizes of (a – b)/2t and b is strictly smaller than that of a and b, repeating the above computation eventually terminates the algorithm after finitely many iterations.
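The loop just described can be sketched in C for word-sized operands as follows (a multiple-precision version would replace the built-in operations by the word-array subtraction and shifting routines of this section; names ours):

```c
#include <stdint.h>

/* Binary gcd: pull out the common power of 2, make both operands odd,
   then repeatedly replace the larger operand a by (a - b) / 2^t. */
uint64_t binary_gcd(uint64_t a, uint64_t b)
{
    if (a == 0) return b;
    if (b == 0) return a;
    unsigned shift = 0;
    while (((a | b) & 1) == 0) { a >>= 1; b >>= 1; shift++; }  /* 2^min(r,s) */
    while ((a & 1) == 0) a >>= 1;     /* make a odd */
    while ((b & 1) == 0) b >>= 1;     /* make b odd */
    while (a != b) {
        if (a < b) { uint64_t t = a; a = b; b = t; }
        a -= b;                       /* a - b is even (both were odd) */
        while ((a & 1) == 0) a >>= 1; /* divide by 2^v2(a-b) */
    }
    return a << shift;
}
```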
Algorithm 3.8 (extended binary gcd)
Input: Two positive integers a, b with a ≥ b and b odd.
Output: Integers d, u and v with d = gcd(a, b) = ua + vb > 0. If (a, b) ≠ (1, 1), then |u| < b and |v| < a.
Steps:
    /* Initial reduction */
Multiple-precision division is much costlier than subtraction followed by division by a power of 2. This is why the binary gcd algorithm outperforms the Euclidean gcd algorithm. However, if the bit sizes of a and b differ considerably, it is preferable to use Euclidean division once and replace the pair (a, b) by (b, a rem b) before entering the binary gcd loop. Even when the original bit sizes of a and b are not much different, one may carry out this initial reduction, because in this case the Euclidean division does not take much time.
Recall from Proposition 2.16 that if d := gcd(a, b), then for some integers u and v we have d = ua + vb. Computation of d along with a pair of integers u, v is called the extended gcd computation. Both the Euclidean and the binary gcd loops can be augmented to compute these integers u and v. Since binary gcd is faster than Euclidean gcd, we describe an implementation of the extended binary gcd algorithm. We assume that 0 < b ≤ a and compute u and v in such a way that if (a, b) ≠ (1, 1), then |u| < b and |v| < a. Algorithm 3.8, which shows the details, requires b to be odd. The other operand a may also be odd, though the working of the algorithm does not require this.
In order to prove the correctness of Algorithm 3.8, we introduce the sequence of integers xk, yk, u1,k, u2,k, v1,k and v2,k for k = 0, 1, 2, . . . , initialized as:
    x0 := b,    u1,0 := 1,    v1,0 := 0,
    y0 := r,    u2,0 := 0,    v2,0 := 1.
During the k-th iteration of the main loop, k = 1, 2, . . . , we modify the values xk–1, yk–1, u1,k–1, u2,k–1, v1,k–1 and v2,k–1 to xk, yk, u1,k, u2,k, v1,k and v2,k in such a way that we always maintain the relations:
    u1,k x0 + v1,k y0 = xk,
    u2,k x0 + v2,k y0 = yk.
The main loop terminates when xk = 0, and at that point we have the desired relation yk = gcd(b, r) = u2,k b + v2,k r. For the updating during the k-th iteration, we assume that xk–1 ≥ yk–1. (The opposite inequality is handled analogously.) The x and y values are updated as xk := (xk–1 – yk–1)/2^tk, yk := yk–1, where tk := v2(xk–1 – yk–1). Thus, we have u2,k = u2,k–1 and v2,k = v2,k–1, whereas the coefficients of xk are obtained from u1 := u1,k–1 – u2,k–1 and v1 := v1,k–1 – v2,k–1 by halving tk times, replacing at each halving
    u1 := u1/2, v1 := v1/2, if u1 and v1 are both even,
    u1 := [(u1 + y0)/2], v1 := [(v1 – x0)/2], otherwise.
All the expressions within square brackets in the last equation are integers, since x0 = b is odd. Note that updating the variables in the loop requires only the values of these variables available from the previous iteration. Therefore, we may drop the suffix k and call these variables x, y, u1, u2, v1 and v2. Moreover, the variables u1 and u2 need not be maintained and updated in every iteration, since the updating procedure for the other variables does not depend on the values of u1 and u2. We need the value of u2 only at the end of the main loop, and this is available from the relation y = u2b + v2r maintained throughout the loop. The formula u2b + v2r = y = gcd(b, r) is then combined with the relations a = qb + r and gcd(a, b) = gcd(b, r) to get the final relation gcd(a, b) = v2a + (u2 – v2q)b.
Algorithm 3.8 continues to work even when a < b, but in that case the initial reduction simply interchanges a and b and we forfeit the possibility of the reduction in size of the arguments (x and y) caused by the initial Euclidean division.
Finally, we remove the restriction that b is odd. We write a = 2^r a′ and b = 2^s b′ with a′, b′ odd, and call Algorithm 3.8 with a′ and b′ as parameters (swapping a′ and b′, if a′ < b′) to compute integers d′, u′, v′ with d′ = gcd(a′, b′) = u′a′ + v′b′. Without loss of generality, assume that r ≥ s. Then d := gcd(a, b) = 2^s d′ = u′(2^s a′) + v′b. If r = s, then 2^s a′ = a and we are done. So assume that r > s. If u′ is even, we can extract a power of 2 from u′ and multiply 2^s a′ by this power. So suppose that we have a situation of the form
    d = ũ(2^t a′) + ṽb
for some integers ũ and ṽ, with ũ odd, and for s ≤ t < r. We can rewrite this as
    d = (ũ + b′)(2^t a′) + (ṽ – 2^(t–s) a′)b.
Since ũ + b′ is even, this gives us a relation of the form
    d = ū(2^τ a′) + v̄b,
where τ > t, and where ū is odd or τ = r. Proceeding in this way, we eventually reach a relation of the form d = u(2^r a′) + vb = ua + vb. It is easy to check that if (a′, b′) ≠ (1, 1), then the integers u and v obtained as above satisfy |u| < b and |v| < a.
So far, we have described how we can represent and work with the elements of ℤ. In cryptology, we are more interested in the arithmetic of the residue class rings ℤn for multiple-precision integers n. We canonically represent the elements of ℤn by the integers between 0 and n – 1.
Let a, b ∈ ℤn. In order to compute a + b in ℤn, we compute the integer sum a + b and, if a + b ≥ n, subtract n from it. This gives us the desired canonical representative in ℤn. Similarly, for computing a – b in ℤn, we subtract b from a as integers and, if the difference is negative, add n to it. For computing ab in ℤn, we multiply a and b as integers and then take the remainder of Euclidean division of this product by n.
Note that a ∈ ℤn is invertible (that is, a ∈ ℤn*) if and only if gcd(a, n) = 1. For a ∈ ℤn, a ≠ 0, we call the extended (binary) gcd algorithm with a and n as the arguments and get integers d, u, v satisfying d = gcd(a, n) = ua + vn. If d > 1, then a is not invertible modulo n. Otherwise, we have ua ≡ 1 (mod n), that is, a^(–1) ≡ u (mod n). The extended gcd algorithm indeed returns a value of u satisfying |u| < n. Thus, if u > 0, it is the canonical representative of a^(–1), whereas if u < 0, then u + n is the canonical representative of a^(–1).
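For word-sized operands, the inversion procedure can be sketched in C as follows; for brevity this sketch uses the plain Euclidean extended gcd rather than the binary variant described above (names ours):

```c
#include <stdint.h>

/* Compute a^(-1) (mod n) from the extended gcd relation d = u*a + v*n.
   Returns 0 when gcd(a, n) > 1, i.e. when no inverse exists.  Signed
   64-bit arithmetic suffices for word-sized a and n. */
uint64_t mod_inverse(uint64_t a, uint64_t n)
{
    int64_t u0 = 1, u1 = 0;              /* coefficients of a */
    int64_t r0 = (int64_t)a, r1 = (int64_t)n;
    while (r1 != 0) {
        int64_t q = r0 / r1, t;
        t = r0 - q * r1; r0 = r1; r1 = t;
        t = u0 - q * u1; u0 = u1; u1 = t;
    }
    if (r0 != 1) return 0;               /* a not invertible modulo n */
    /* u0 satisfies u0*a == 1 (mod n) with |u0| < n; make it canonical */
    return (uint64_t)(u0 >= 0 ? u0 : u0 + (int64_t)n);
}
```

The final sign adjustment mirrors the remark above: a negative coefficient u becomes canonical by adding n once.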
Another frequently needed operation in ℤn is modular exponentiation, that is, the computation of a^e for some a ∈ ℤn and e ∈ ℤ. Since a^0 = 1 for all a ∈ ℤn and since a^e = (a^(–1))^(–e) for e < 0 and a ∈ ℤn*, we may assume, without loss of generality, that e > 0. Computing the integral power a^e followed by taking the remainder of Euclidean division by n is not an efficient way to compute a^e in ℤn. Instead, after every multiplication, we reduce the product modulo n. This keeps the sizes of the intermediate products small. Furthermore, it is also a bad idea to compute a^e as (· · ·((a · a) · a) · · · a), which involves e – 1 multiplications. It is possible to compute a^e using O(lg e) multiplications and O(lg e) squarings in ℤn, as Algorithm 3.9 suggests. This algorithm requires the bits of the binary expansion of the exponent e, which are easily obtained by bit operations on the words of e.
The for loop iteratively computes bi := a^((er–1 . . . ei)2) (mod n), starting from the initial value br := 1. Since (er–1 . . . ei)2 = 2(er–1 . . . ei+1)2 + ei, we have bi ≡ (bi+1)^2 a^(ei) (mod n). This establishes the correctness of the algorithm. The squaring (b^2) and the multiplication (ba) inside the for loop of the algorithm are computed in ℤn (that is, as integer multiplication followed by reduction modulo n). If we assume that er–1 = 1, then r = ⌈lg e⌉. The algorithm carries out r squarings and ρ ≤ r multiplications in ℤn, where ρ is the number of bits of e that are 1. On an average, ρ = r/2. Algorithm 3.9 runs in time O((log e)(log n)^2). Typically, e = O(n), so this running time is O((log n)^3).
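A word-sized C sketch of this left-to-right square-and-multiply loop (with n < 2^32 so that all intermediate products fit in uint64_t; names ours):

```c
#include <stdint.h>

/* Compute a^e (mod n) by scanning the bits of e from the most significant
   end: square at every bit, multiply by a at the 1 bits, reducing modulo
   n throughout so intermediate values stay below n. */
uint64_t pow_mod(uint64_t a, uint64_t e, uint64_t n)
{
    uint64_t b = 1 % n;
    a %= n;
    for (int i = 63; i >= 0; i--) {
        b = (b * b) % n;                   /* squaring in Z_n */
        if ((e >> i) & 1)
            b = (b * a) % n;               /* multiply at a 1 bit */
    }
    return b;
}
```

Scanning all 64 bit positions (instead of starting at the top set bit) wastes a few trivial squarings of 1 but keeps the sketch short.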
Algorithm 3.9 (left-to-right square-and-multiply exponentiation)
Input: a ∈ ℤn and an exponent e ∈ ℕ.
Output: b = a^e (mod n).
Steps:
    Let the binary expansion of e be e = (er–1 . . . e1e0)2, where each ei ∈ {0, 1}.
Now, we describe a simple variant of this square-and-multiply algorithm, in which we choose a small t and use the 2^t-ary representation of the exponent e. The case t = 1 corresponds to Algorithm 3.9. In practical situations, t = 4 is a good choice. As in Algorithm 3.9, multiplication and squaring are done in ℤn.
Algorithm 3.10 (2^t-ary exponentiation)
Input: a ∈ ℤn, an exponent e ∈ ℕ and a small t ∈ ℕ.
Output: b = a^e (mod n).
Steps:
    Let e = (er–1 . . . e1e0)2^t, where each ei ∈ {0, 1, . . . , 2^t – 1}.
In Algorithm 3.10, the powers a^l, l = 0, 1, . . . , 2^t – 1, are precomputed using the formulas: a^0 = 1, a^1 = a and a^l = a^(l–1) · a for l ≥ 2. The number of squarings inside the for loop remains (almost) the same as in Algorithm 3.9. However, the number of multiplications in this loop reduces at the expense of the precomputation step. For example, let n be an integer of bit length 1024 and let e ≈ n. A randomly chosen e of this size has about 512 one-bits. Therefore, the for loop of Algorithm 3.9 does about 512 multiplications, whereas with t = 4 Algorithm 3.10 does at most 1024/4 = 256 multiplications, with the precomputation step requiring 14 multiplications. Thus, the total number of multiplications reduces from (about) 512 to at most 14 + 256 = 270.
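For word-sized operands, the 2^t-ary method with t = 4 can be sketched in C as follows (n < 2^32 so that products fit in uint64_t; names ours):

```c
#include <stdint.h>

/* Compute a^e (mod n) processing the exponent four bits at a time.
   The table holds a^0, ..., a^15 (mod n); each 4-bit digit of e costs
   four squarings and at most one table multiplication. */
uint64_t pow_mod_16ary(uint64_t a, uint64_t e, uint64_t n)
{
    uint64_t table[16];
    table[0] = 1 % n;
    for (int l = 1; l < 16; l++)
        table[l] = (table[l - 1] * (a % n)) % n;   /* a^l (mod n) */

    uint64_t b = 1 % n;
    for (int i = 60; i >= 0; i -= 4) {             /* 16 hex digits of e */
        for (int k = 0; k < 4; k++) b = (b * b) % n;
        uint64_t digit = (e >> i) & 0xF;
        if (digit) b = (b * table[digit]) % n;     /* skip zero digits */
    }
    return b;
}
```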
During a modular exponentiation in ℤn, every reduction (computation of remainder) is done by the fixed modulus n. Montgomery exponentiation exploits this fact and speeds up each modular reduction at the cost of some preprocessing overhead.
Assume that the storage of n requires s ℜ-ary digits, that is, n = (ns–1 . . . n0)ℜ (with ns–1 ≠ 0). Take R := ℜs = 232s, so that R > n. As is typical in most cryptographic situations, n is an odd integer (for example, a big prime or a product of two big primes). Then gcd(ℜ, n) = gcd(R, n) = 1. Use the extended gcd algorithm to precompute n′ := –n–1 (mod ℜ).
We associate a ∈ ℤn with its Montgomery representation ā ∈ ℤn, where ā ≡ aR (mod n). Since R is invertible modulo n, this association gives a bijection of ℤn onto itself. This bijection respects the addition in ℤn: that is, ā + b̄ ≡ (a + b)R (mod n). Multiplication in ℤn, on the other hand, corresponds to the Montgomery product āb̄R^(–1) ≡ (ab)R (mod n), and can be implemented as Algorithm 3.11 suggests.
Algorithm 3.11 (Montgomery multiplication)
Input: Montgomery representations ā and b̄ of a, b ∈ ℤn.
Output: The Montgomery representation c̄ of c = ab (mod n).
Steps:
Montgomery multiplication works as follows. In the first step, it computes the integer product w := āb̄. The subsequent for loop adds suitable multiples of n to w: during the i-th iteration (i = 0, . . . , s – 1), it adds (wi n′ rem ℜ) n ℜ^i. Since n′ ≡ –n^(–1) (mod ℜ), the i-th iteration of the loop makes wi = 0 (and leaves wi–1, . . . , w0 unchanged). So when the for loop terminates, we have w0 = w1 = · · · = ws–1 = 0: that is, w is a multiple of ℜ^s = R. Therefore, c̄ := w/R is an integer. Furthermore, w is obtained by adding to āb̄ a multiple of n: that is, w = āb̄ + kn for some integer k ≥ 0. Since R is coprime to n, it follows that c̄ ≡ āb̄R^(–1) (mod n). But this c̄ may be bigger than the canonical representative of āb̄R^(–1). Since k is an integer with s ℜ-ary digits (so that k < R) and āb̄ < n^2 and n < R, it follows that c̄ = (āb̄ + kn)/R < n^2/R + n < 2n. Therefore, if c̄ exceeds n – 1, a single subtraction suffices.
Computation of the product āb̄ requires ≤ s^2 single-precision multiplications. One can use the optimized Algorithm 3.4 for that purpose. In the case of squaring, we have ā = b̄, and further optimizations (say, in the form of Karatsuba’s method) can be employed.
Each iteration of the for loop carries out s + 1 single-precision multiplications. (The reduction modulo ℜ is just returning the less significant word in the two-word product win′.) Since the for loop is executed s times, Algorithm 3.11 performs a total of ≤ s^2 + s(s + 1) = 2s^2 + s single-precision multiplications.
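The case s = 1 already shows the whole structure of the method. Below is a one-word C sketch of Montgomery multiplication with R = 2^32, including a Newton-iteration computation of n′ = –n^(–1) (mod 2^32); the pre-reduced value is below 2n, so one conditional subtraction suffices. All names are ours; the multiple-precision version replaces the built-in operations by digit loops:

```c
#include <stdint.h>

/* n' = -n^{-1} (mod 2^32) for odd n: each Newton step
   inv := inv*(2 - n*inv) doubles the number of correct low bits,
   so five steps starting from inv = 1 give all 32 bits. */
uint32_t mont_nprime(uint32_t n)
{
    uint32_t inv = 1;
    for (int i = 0; i < 5; i++)
        inv *= 2 - n * inv;
    return (uint32_t)(0u - inv);          /* negate modulo 2^32 */
}

/* Return a*b*R^{-1} (mod n) for a, b < n, n odd < 2^32, R = 2^32. */
uint32_t mont_mul(uint32_t a, uint32_t b, uint32_t n, uint32_t nprime)
{
    uint64_t w  = (uint64_t)a * b;
    uint32_t m  = (uint32_t)w * nprime;   /* w0 * n' rem R: low word only */
    uint64_t mn = (uint64_t)m * n;
    /* w + mn is divisible by R; add the low words to recover the carry */
    uint64_t carry = ((w & 0xFFFFFFFFu) + (mn & 0xFFFFFFFFu)) >> 32;
    uint64_t t = (w >> 32) + (mn >> 32) + carry;   /* = (w + mn)/R < 2n */
    if (t >= n) t -= n;                   /* single conditional subtraction */
    return (uint32_t)t;
}
```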
Integer multiplication (Algorithm 3.4) followed by classical modular reduction (Algorithm 3.6) does almost an equal number of single-precision multiplications, but also O(s) divisions of double-precision integers by single-precision ones. It turns out that the complicated for loop of Algorithm 3.6 is slower than the much simpler loop of Algorithm 3.11. But if the precomputations in Montgomery multiplication are taken into account, the new technique does not, by itself, yield a speed-up. For modular exponentiations, however, the precomputations need to be done only once, that is, outside the square-and-multiply loop, and Montgomery multiplication pays off. In Algorithm 3.12, we rewrite Algorithm 3.9 in terms of Montgomery arithmetic. A similar rewriting applies to Algorithm 3.10.
Algorithm 3.12 (Montgomery exponentiation)
Input: a ∈ ℤn and an exponent e ∈ ℕ.
Output: b = a^e (mod n).
Steps:
    /* Precomputations */
3.8  Let ℜ ∈ ℕ, ℜ > 1. Show that every a ∈ ℕ can be represented uniquely as a tuple (as–1, . . . , a1, a0) for some s ∈ ℕ (depending on a) with
     a = as–1ℜ^(s–1) + · · · + a1ℜ + a0,
     0 ≤ ai < ℜ for all i and as–1 ≠ 0. In this case, we write a as (as–1 . . . a0)ℜ or simply as as–1 . . . a0, when ℜ is understood from the context. ℜ is called the radix or base of this representation, as–1, . . . , a0 the (ℜ-ary) digits of a, as–1 the most significant digit, a0 the least significant digit, and s the size of a with respect to the radix ℜ.
3.9  Let R ∈ ℕ, R > 1. Show that every a ∈ ℤ can be written uniquely as
     a = asR^s + as–1R^(s–1) + · · · + a1R + a0 with each
3.10  Negative radix: Show that every integer with
3.11  Investigate the relative merits and demerits of the following three representations (in C) of multiple-precision integers needed for cryptography. In each case, we have room for storing 256 ℜ-ary words, the actual size and a sign indicator. In the second and third representations, we use two extra locations (sizeIdx and signIdx) in the digit array for holding the size and sign information.
      Remark: We recommend the third representation.
3.12  Write an algorithm that prints a multiple-precision integer in decimal, and an algorithm that accepts a string of decimal digits (optionally preceded by a + or – sign) and stores the corresponding integer as a multiple-precision integer. Also write algorithms for the input and output of multiple-precision integers in hexadecimal, octal and binary.
3.13  Write an algorithm which, given two multiple-precision integers a and b, compares the absolute values |a| and |b|. Also write an algorithm to compare a and b as signed integers.
3.14
3.15  Describe a representation of rational numbers with exact multiple-precision numerators and denominators. Implement the arithmetic (addition, subtraction, multiplication and division) of rational numbers under this representation.
3.16  Sliding window exponentiation: Suppose we want to compute the modular exponentiation a^e (mod n). Consider the following variant of the square-and-multiply algorithm. Choose a small t (say, t = 4) and precompute a^(2^(t–1)), a^(2^(t–1)+1), . . . , a^(2^t–1) modulo n. Do a squaring for every bit of e, but skip the multiplication for the zero bits in e. Whenever a 1 bit is found, consider the next t bits of e (including the 1 bit). Let these t bits represent the integer l, 2^(t–1) ≤ l ≤ 2^t – 1. Multiply by a^l (mod n) (after the usual t squarings) and move right in e by t bit positions. Argue that this method works, and write an algorithm based on this strategy. What are the advantages and disadvantages of this method over Algorithm 3.10?
3.17  Suppose we want to compute a^e b^f (mod n), where both e and f are positive r-bit integers. One possibility is to compute a^e and b^f modulo n individually, followed by a modular multiplication. This strategy requires the running time of two exponentiations (neglecting the time for the final multiplication). In this exercise, we investigate a trick to reduce this running time to something close to 1.25 times the time for one exponentiation. Precompute ab (mod n). Inside the square-and-multiply loop, either skip the multiplication or multiply by a, b or ab, depending upon the next bits of the two exponents e and f. Complete the details of this algorithm. Deduce that, on an average, the running time of this algorithm is as declared above.
3.18  Let m ∈ ℕ, m ≠ 1. An addition chain for m of length l is a sequence 1 = a1, a2, . . . , al = m of natural numbers such that for every index i, 2 ≤ i ≤ l, there exist indices i1, i2 < i with ai = ai1 + ai2. (It is allowed to have i1 = i2.)
Now that we know how to work in ℤ and in the residue class rings ℤn, n ∈ ℕ, we address some important computational problems associated with these rings. In this chapter, we restrict ourselves to those problems that are needed for setting up various cryptographic protocols.
One of the simplest and oldest questions in algorithmic number theory is to decide whether a given integer n ∈ ℕ, n > 1, is prime or composite. Practical primality testing algorithms are based on randomization techniques. In this section, we describe the Monte Carlo algorithm due to Miller and Rabin. The obvious next question is to find one (or all) of the prime factors of an integer deterministically or probabilistically proven to be composite. This is the celebrated integer factorization problem and will be formally introduced in Section 4.2. In spite of the apparent proximity between the primality testing and the integer factoring problems, they currently have widely different (known) complexities. Primality testing is easy and thereby promotes efficient setting up of cryptographic protocols. On the other hand, the difficulty of factoring integers protects these protocols against cryptanalytic attacks.
Definition (pseudoprime)
Let n be an odd integer greater than 1 and let a ∈ ℤn*. Then n is called a pseudoprime to the base a if a^(n–1) ≡ 1 (mod n).
By Fermat’s little theorem, a prime p is a pseudoprime to every base a ∈ ℤp*, that is, to every base a with gcd(a, p) = 1. However, the converse of this is not true. By Exercise 3.19, n is not a pseudoprime to at least half of the bases in ℤn*, provided that there is at least one such base in ℤn*. Unfortunately, there exist composite integers m, known as Carmichael numbers, such that m is a pseudoprime to every base a ∈ ℤm*. The smallest Carmichael number is 561 = 3 × 11 × 17. Exercises 3.21 and 3.22 investigate some properties of these numbers. Though Carmichael numbers are not very abundant, they are still infinite in number. So a robust primality test requires n to satisfy certain constraints in addition to being a pseudoprime to one or more bases. The following constraint is due to Solovay and Strassen.
Definition (Euler pseudoprime)
Let n be an odd integer > 1 and let a ∈ ℤn*. Then n is called an Euler pseudoprime to the base a if a^((n–1)/2) ≡ (a/n) (mod n), where (a/n) denotes the Jacobi symbol.
By Euler’s criterion (Proposition 2.21), if p is a prime and gcd(a, p) = 1, then p is an Euler pseudoprime to the base a. The converse is not true in general, but if n is composite, then n is an Euler pseudoprime to at most φ(n)/2 bases in ℤn* (Exercise 3.20). This, in turn, implies that if n is an Euler pseudoprime to t randomly chosen bases in ℤn*, then the chance that n is composite is no more than 1/2^t. This observation leads to a Monte Carlo algorithm for testing the primality of an integer, where the probability of error (1/2^t) can be made arbitrarily small by choosing large values of t. A more efficient algorithm can be developed using the following concept due to Miller and Rabin.
|
Let n be an odd integer > 1 with n – 1 = 2^r n′, r := v_2(n – 1) > 0, n′ odd, and let |
The rationale behind this definition is the following. If for some
we have a^(n–1) ≢ 1 (mod n), we conclude with certainty that n is composite. So assume that a^(n–1) ≡ 1 (mod n) and consider the powers b_i := a^(2^i n′) (mod n) for i = 0, 1, . . . , r to see how the sequence b_0, b_1, . . . eventually reaches b_r ≡ 1 (mod n). If b_0 ≡ 1 (mod n) already, the behaviour of the sequence is clear. If, on the other hand, we have an i such that b_i ≢ 1 (mod n), whereas b_(i+1) ≡ 1 (mod n), then b_i is a square root of 1 modulo n. If n is a prime, the only square roots of 1 modulo n are ±1 and so n must be a strong pseudoprime to the base a. On the other hand, if n is composite but not the power of a prime, then 1 has at least two non-trivial square roots (that is, square roots other than ±1) modulo n (Exercise 3.30). We hope to find one such non-trivial square root of 1 in the sequence b_0, b_1, . . . , b_(r–1) and if we are successful, the compositeness of n is proved with certainty.
A complete residue system modulo an odd composite n contains at most n/4 bases to which n is a strong pseudoprime. The proof of this fact is somewhat involved (though elementary) and can be found elsewhere, for example, in Chapter V of Koblitz [153]. Here, we concentrate on the Monte Carlo Algorithm 3.13 known as the Miller–Rabin primality test and based on this observation.
|
Input: An odd integer Output: A certificate that either “n is composite” or “n is prime”. Steps: Find out n′ and r such that n – 1 = 2^r n′ with |
Whenever Algorithm 3.13 outputs n is composite, it is correct. On the other hand, if it certifies n as prime, there is a probability δ that n is composite. This probability can be made very small by choosing a suitably large value of the iteration count t. For cryptographic applications, δ ≤ 1/2^80 is considered sufficiently safe. In view of the first statement of the last paragraph, we can take t = 40 to meet this error bound. In practice, much smaller values of t offer the desired confidence. For example, if n is of bit length 250, 500, 750 or 1000, the respective values t = 12, 6, 4 and 3 suffice.
Although, in Algorithm 3.13, we have chosen a to be an arbitrary integer between 2 and n – 2, there is apparently no harm if we choose a randomly in the interval 2 ≤ a < 2^32. In fact, such a choice of single-precision bases is desirable, because that makes the exponentiation a^n′ (mod n) more efficient (see Algorithm 3.9). A typical cryptographic application loads at start-up a precalculated table of small primes (say, the first thousand primes). Choosing the bases randomly from this list of small primes is indeed a good idea.
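For concreteness, the test can be rendered compactly in Python. This is a hedged sketch of the same idea, not the book's word-level implementation; the function name, the small trial-division list and the default iteration count are our own choices.

```python
import random

def is_probable_prime(n, t=40):
    """Miller-Rabin test: write n - 1 = 2^r * n1 with n1 odd and
    probe t random bases; the error probability is at most 1/4^t."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13):
        if n % p == 0:
            return n == p
    r, n1 = 0, n - 1
    while n1 % 2 == 0:
        n1 //= 2
        r += 1
    for _ in range(t):
        a = random.randrange(2, n - 1)
        b = pow(a, n1, n)
        if b == 1 or b == n - 1:
            continue                     # a is not a witness
        for _ in range(r - 1):
            b = b * b % n
            if b == n - 1:
                break
        else:
            return False                 # compositeness proved with certainty
    return True
```

Note that a "prime" verdict is only probabilistic, exactly as discussed above, while a "composite" verdict is always correct.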
While the Miller–Rabin algorithm settles the primality testing problem in a practical sense, it is, after all, a randomized algorithm. It is interesting, at the minimum theoretically, to investigate the deterministic complexity of primality testing. There has been a good amount of research in this line. Let us sketch here the history of deterministic primality proving, without going to rigorous mathematical details.
One natural strategy to check the primality of a positive integer n is to factor it. However, factoring integers is a computationally difficult problem, whereas primality proving has been found to be a much easier computational exercise. That is, one need not factorize n explicitly in order to decide the primality of n.
The (seemingly) first modern primality testing algorithm is due to Miller [204]. This algorithm is deterministic polynomial-time, provided that the extended Riemann hypothesis or ERH (Conjecture 2.3) is true. Since the ERH is still an unsolved problem in mathematics, it cannot be claimed with certainty whether Miller’s test is really a polynomial-time algorithm. Rabin [248] provided a version of Miller’s test which is unconditionally polynomial-time, but is, at the same time, randomized. This is what we have discussed earlier under the name Miller–Rabin primality test. It is a Monte Carlo algorithm which produces the answer no (composite) with certainty, but the answer yes (prime) with some (small) probability of error. Solovay and Strassen’s test [287] based on Definition 3.3 is another no-biased randomized polynomial-time primality test and can be made deterministic polynomial-time under the ERH.
Adleman and Huang [3], using the work of Goldwasser and Kilian [116], provide a yes-biased randomized primality-proving algorithm that runs in expected polynomial time unconditionally. Adleman et al. [4] propose the first deterministic algorithm that runs unconditionally in time less than fully exponential (in log n). Its (worst-case) running time is (ln n)^O(ln ln ln n), which is still not polynomial. (The exponent ln ln ln n grows very slowly with n, but still is not a constant.)
In August 2002, Agrawal, Kayal and Saxena came up with the first deterministic primality testing algorithm that runs in polynomial time unconditionally, that is, under no unproven assumptions. This algorithm, popularly abbreviated as the AKS algorithm, is based on the observation that n is prime if and only if (X + a)^n ≡ X^n + a (mod n) for every
(Exercise 3.26). A naive application of this observation requires computing an exponential number of coefficients in the binomial expansion of (X + a)^n. The AKS algorithm gets around this difficulty by checking the new congruence
Equation 3.2

for some polynomial h(X) of small degree. Here the notation (mod n, h(X)) means modulo the ideal
of
. If deg h(X) is bounded by a polynomial in log n, (X + a)^n (and also X^n + a) can be computed modulo n, h(X) in polynomial time. However, reduction modulo h(X) may allow a composite n to satisfy the new congruence. Agrawal et al. took h(X) := X^r – 1 for some prime r = O(ln^6 n) with r – 1 having a prime divisor
ln n. From a result in analytic number theory due to Fouvry, such a prime r always exists. Congruence (3.2) is verified for this h(X) and for at most
ln n values of a. An elementary proof presented in Agrawal et al. [5] demonstrates that this suffices to conclude deterministically and unconditionally about the primality of n. The AKS algorithm in this form runs in time O~(ln^12 n).
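To make congruence (3.2) concrete, the following Python sketch checks (X + a)^n ≡ X^n + a (mod n, X^r – 1) by square-and-multiply on coefficient vectors of length r. This is a toy illustration of the congruence only; the schoolbook polynomial product used here gives none of the stated running-time guarantees, and the names are ours.

```python
def aks_congruence_holds(n, r, a):
    """Check (X + a)^n == X^n + a modulo (n, X^r - 1), with r >= 2.
    Polynomials are coefficient lists of length r (index = exponent)."""
    def polymul(f, g):
        h = [0] * r
        for i, fi in enumerate(f):
            if fi:
                for j, gj in enumerate(g):
                    h[(i + j) % r] = (h[(i + j) % r] + fi * gj) % n
        return h

    base = [0] * r
    base[0] = a % n
    base[1] = (base[1] + 1) % n          # base = X + a
    result = [0] * r
    result[0] = 1
    e = n
    while e:                             # square-and-multiply
        if e & 1:
            result = polymul(result, base)
        base = polymul(base, base)
        e >>= 1
    rhs = [0] * r
    rhs[0] = a % n
    rhs[n % r] = (rhs[n % r] + 1) % n    # X^n + a reduced modulo X^r - 1
    return result == rhs
```

For prime n the congruence holds for every a, while a composite n typically fails it once the reduction modulo X^r – 1 is taken into account.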
Lenstra and Pomerance [175] have reduced the running time of the AKS algorithm to O~(ln^6 n). The AKS paper comes with another conjecture which, if true, yields an O~(ln^3 n) deterministic primality-proving algorithm.
|
Let n be an odd integer > 1. If (X – 1)^n ≡ X^n – 1 (mod n, X^r – 1), then either n is prime or n^2 ≡ 1 (mod r). |
It remains an open question whether a future version of the AKS algorithm will supersede the Miller–Rabin test in terms of performance. As long as the answer is not favourable to the AKS algorithm, these new theoretical endeavours do not seem to have a sufficient impact on cryptography. Primes certified by the Miller–Rabin test are at present secure enough for all applications. Nonetheless, the AKS breakthrough has solid theoretical implications and deserves mention in a prime context.
If a random prime of a given bit length t is called for, we can keep on generating random odd integers of bit length t and checking these integers for primality using the Miller–Rabin test. The prime number theorem (Theorem 2.20) ascertains that after O(t) iterations we expect to find a prime. A somewhat similar but reasonably faster algorithm is discussed in Exercise 4.14. We will henceforth call random primes of a given bit length having no additional imposed properties naive primes. Naive primes are often not cryptographically secure, because the primes used in many protocols should satisfy certain properties in order to preclude some known cryptanalytic attacks.
|
Let p be an odd prime. Then p is called a safe prime, if (p – 1)/2 is also a prime, whereas p is called a strong prime, if
In cryptography, a large prime divisor typically refers to one with bit length ≥ 160. |
A random safe prime of a given bit length t can be found by generating a random sequence of natural numbers n congruent to 3 modulo 4 and of bit length t, until one is found for which both n and (n – 1)/2 are primes (as certified by the Miller–Rabin primality test). The prime number theorem once again implies that this search is expected to terminate after O(t2) iterations.
For generating a random strong prime p of bit length t, we first generate q′ and q″ and then q and finally p. (See the notations of Definition 3.5.) Algorithm 3.14 describes Gordon’s algorithm in which the bit lengths l and l′ of q and q′ are nearly t/2 and the bit length l″ of q″ is slightly smaller than l′. In our concrete implementation of the algorithm, we choose l := ⌈t/2⌉ – 2, l′ := ⌊t/2⌋ – 20 and l″ := ⌈t/2⌉ – 22. If t is sufficiently large (say, t ≥ 400), the prime divisors q, q′ and q″ are then cryptographically large.
The simple check that Gordon’s algorithm correctly computes a strong prime of bit length t with q, q′ and q″ as in Definition 3.5 is based on Fermat’s little theorem and is left to the reader. Note that with our choice of l, l′ and l″, the loop variables i and j run through single-precision values only, thereby making arithmetic involving them efficient. Also note that the ranges over which i and j vary are sufficiently large so that we expect the (outer) while loop to be executed only once. This implementation has a tendency to generate smaller values of q and p (with the given bit sizes). In practice, this is not a serious problem and can be avoided, if desired, by choosing random values of i and j from the indicated ranges.
|
Output: A strong prime p of bit length t. Steps: l := ⌈t/2⌉ – 2, l′ := ⌊t/2⌋ – 20, l″ := ⌈t/2⌉ – 22. while (1) { |
Gordon’s algorithm takes only nominally more expected running time than that needed by the algorithm discussed at the beginning of Section 3.4.2 for generating naive primes of the same bit length. On the other hand, safe primes are much costlier to generate and may be avoided, unless the situation specifically demands their usage.
Determination of square roots modulo a prime p is frequently needed in cryptographic applications. In this section, we assume that p is an odd prime and want to compute the square roots of
, gcd(a, p) = 1, modulo p, provided that a is a quadratic residue modulo p, that is, if
. Using the Jacobi symbol the value
can be computed efficiently as Algorithm 3.15 suggests.
The correctness of Algorithm 3.15 follows from the properties of the Jacobi symbol (Proposition 2.22 and Theorem 2.19). The value of (–1)^((b^2–1)/8) is determined by the value of b modulo 8, that is, by the three least significant bits of b:

Similarly, (–1)^((a – 1)(b – 1)/4) can be computed using only the second least significant bits of a and b as:

If
, our next task is to compute
with x2 ≡ a (mod p). If one such x is found, the other square root of a modulo p is –x ≡ p – x (mod p). If p ≡ 3 (mod 4) or p ≡ 5 (mod 8), we have explicit formulas for a square root x. The remaining case, namely p ≡ 1 (mod 8), is somewhat complicated. In this case, we use the probabilistic algorithm due to Tonelli and Shanks. The details are given in Algorithm 3.16. The explicit formulas for the first two cases are easy to verify. We now prove the correctness of the algorithm in the remaining case.
|
Input: An odd prime p and an integer a, 1 ≤ a < p. Output: The Legendre symbol Steps:
/* The Euclidean loop */
|
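The Euclidean loop of Algorithm 3.15, together with the two sign rules above, can be sketched as follows; this is a compact illustration, and the function name is ours.

```python
def jacobi(a, b):
    """Jacobi symbol (a/b) for odd b > 0, using quadratic reciprocity
    and the sign rules based on b mod 8 and on a, b mod 4."""
    a %= b
    result = 1
    while a != 0:
        while a % 2 == 0:                  # pull out factors of 2
            a //= 2
            if b % 8 in (3, 5):            # (-1)^((b^2-1)/8) = -1
                result = -result
        a, b = b, a                        # reciprocity step
        if a % 4 == 3 and b % 4 == 3:      # (-1)^((a-1)(b-1)/4) = -1
            result = -result
        a %= b
    return result if b == 1 else 0         # 0 when gcd(a, b) > 1
```

For prime b the result agrees with Euler's criterion, which gives a convenient consistency check.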
Since
is cyclic and has order p – 1 = 2^v q, the 2-Sylow subgroup G of
has order 2^v and is also cyclic. Let g be a generator of G. By Euler’s criterion, a^q is a square in G and, therefore, a^q g^e = 1 (in G) for some even integer e, 0 ≤ e < 2^v, and x ≡ a^((q + 1)/2) g^(e/2) (mod p) is a square root of a modulo p.
A generator g of G can be obtained by choosing random elements b from
and computing the Legendre symbol
. It is easy to see that
. Furthermore, b^q is a generator of G if and only if
. Finding a quadratic non-residue in
is the probabilistic part of the algorithm. Since exactly half of the elements of
are quadratic non-residues, one expects to find one after a few random trials. In order to make the exponentiation b^q efficient, b should be chosen as a single-precision integer. The while loop of the algorithm computes the multiplier g^(e/2) in x using O(v) iterations by successively locating the 1 bits of e starting from the least significant end.
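The complete square-root procedure, the explicit formulas for p ≡ 3 (mod 4) and p ≡ 5 (mod 8) plus the Tonelli-Shanks loop for p ≡ 1 (mod 8), can be sketched as follows. This is our own arrangement of the standard method, not the book's Algorithm 3.16 verbatim.

```python
def sqrt_mod_prime(a, p):
    """Square root of a quadratic residue a modulo an odd prime p."""
    a %= p
    if p % 4 == 3:
        return pow(a, (p + 1) // 4, p)
    if p % 8 == 5:
        x = pow(a, (p + 3) // 8, p)
        if x * x % p != a:                       # multiply by a root of -1
            x = x * pow(2, (p - 1) // 4, p) % p
        return x
    # p == 1 (mod 8): write p - 1 = 2^v * q with q odd (Tonelli-Shanks)
    q, v = p - 1, 0
    while q % 2 == 0:
        q //= 2
        v += 1
    b = 2
    while pow(b, (p - 1) // 2, p) != p - 1:      # find a non-residue
        b += 1
    g = pow(b, q, p)                             # generates the 2-Sylow subgroup
    x = pow(a, (q + 1) // 2, p)
    t = pow(a, q, p)
    m = v
    while t != 1:
        i, t2 = 0, t
        while t2 != 1:                           # order of t is 2^i
            t2 = t2 * t2 % p
            i += 1
        c = pow(g, 1 << (m - i - 1), p)
        x = x * c % p
        t = t * c * c % p
        g = c * c % p
        m = i
    return x
```

The invariant x^2 ≡ a·t (mod p) is maintained throughout the loop, so x is a square root of a once t reaches 1.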
To sum up, square roots modulo a prime can be computed in probabilistic polynomial time. Computing square roots modulo a composite integer n is, on the other hand, a very difficult problem, unless the complete factorization of n is known (see Section 4.2 and Exercise 3.29).
| 3.19 | Let be odd and composite and suppose that there exists (at least) one with a^(n–1) ≢ 1 (mod n). Show that b^(n–1) ≢ 1 (mod n) for at least half of the bases . [H]
Algorithm 3.16. Modular square root
| |
| 3.20 | Let be odd and composite.
| |
| 3.21 | Let be a Carmichael number, that is, a composite integer for which a^(n–1) ≡ 1 (mod n) for all a coprime to n, that is, ord_n(a) | (n – 1) for all . Prove that:
| |
| 3.22 |
| |
| 3.23 | Fermat’s test for prime numbers Let | |
| 3.24 | Pépin’s test for Fermat numbers Show that the Fermat number n := 2^(2^k) + 1 is prime if and only if 3^((n – 1)/2) ≡ –1 (mod n). | |
| 3.25 | Write an algorithm that, given natural numbers t, l with l < t, outputs a (probable) prime p of bit length t such that p – 1 has a (probable) prime divisor q of bit length l. | |
| 3.26 | Let .
| |
| 3.27 | Modify Algorithm 3.15 to compute the (generalized) Jacobi symbol for odd and for arbitrary .
| |
| 3.28 | A Implement the Chinese remainder theorem for integers, that is, write an algorithm that takes as input pairwise relatively prime moduli and integers for i = 1, . . . , r and that outputs with a ≡ ai (mod ni) for all i = 1, . . . , r. [H]
| |
| 3.29 | Let f(X) be a non-constant polynomial in .
| |
| 3.30 | Let be odd and . Deduce that the congruence x2 ≡ a (mod n) has exactly solutions modulo n.
| |
| 3.31 | Show that Algorithm 3.17 correctly computes for . Specify a strategy to initialize a before the while loop. Determine how Algorithm 3.17 can be used to check if a given is a perfect square. [H]
Algorithm 3.17. Integer square root
| |
| 3.32 |
|
Many cryptographic protocols are based on the (apparent) intractability of the discrete logarithm problem (Section 4.2) in the multiplicative group of a finite field
. The arithmetic of the finite fields
,
, and
,
, is easy to implement and runs efficiently. In view of this, these two kinds of finite fields are the most popular in cryptography and we concentrate our algorithmic study on these fields only.
A prime field
is the quotient ring
. In Section 3.3.4, we have already made a thorough study of the arithmetic of the rings
,
. We recall that the elements of
are represented as integers from the set {0, 1, . . . , p – 1} and the arithmetic in
is the modulo p integer arithmetic. Since p is typically multiple-precision, the characteristic p of
is odd. The fields of even characteristic that we will study are the non-prime fields
.
Section 2.9.3 explains several representations of extension fields. The most common one is the polynomial-basis representation
for an irreducible polynomial f(X) of degree n in
. In that case, an element of
has the canonical representation as a polynomial a_0 + a_1X + · · · + a_(n–1)X^(n–1),
, of degree < n. An arithmetic operation on two elements of
is the same operation in
followed by reduction modulo the defining polynomial f(X). So we start with the implementation of the polynomial arithmetic over
.

A polynomial over
(or any field) is identified by its coefficients, of which only finitely many are non-zero. Thus for storing a polynomial g(X) = a_dX^d + a_(d–1)X^(d–1) + · · · + a_1X + a_0 it is sufficient to store the finite ordered sequence a_d a_(d–1) . . . a_1 a_0. It is not necessary to demand a_d ≠ 0, but the shortest sequence representing a non-zero polynomial corresponds to a_d ≠ 0 and in this case deg g = d. On the other hand, as we see later, it is often useful to pad such a sequence with leading zero coefficients. As an example, the polynomial
is representable as 101 or as 0101 or as 00101 or · · ·.
Since
can be viewed as the set {0, 1} with operations modulo 2, a polynomial in
is essentially a bit string unique up to insertion (and deletion) of leading zero bits. As in the case of multiple-precision integers, we pack these coefficients in an array of 32-bit words and maintain the number of coefficients belonging to the polynomial. For example, the polynomial g(X) = X^64 + X^31 + X^7 + 1 can be stored in an array w2w1w0 of three 32-bit words. w0 consists of the coefficients of X^0, X^1, . . . , X^31, w1 consists of the coefficients of X^32, X^33, . . . , X^63, and w2 consists of the coefficient of X^64. It is up to the implementation scheme to decide whether the coefficients are to be stored from left to right or from right to left in the bits of a word. We assume that less significant coefficients go to the less significant bits of a word. For the polynomial g above, the word w0 viewed as an unsigned integer will then be w0 = 2^31 + 2^7 + 1, whereas we have w1 = 0. The least significant bit of w2 would be 1. The remaining 31 bits of w2 are not important and can be assigned any value as long as we maintain the information that only the coefficients of X^i, 0 ≤ i ≤ 64, need to be considered. On the other hand, if we want to store the coefficients of g up to that of X^80, then the bits of w2 at locations 1, . . . , 16 must be zero, whereas those at locations 17, . . . , 31 may be of any value. We, however, always recommend the use of leading zero bits to fill the portion of the leading word not belonging to the polynomial.
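Python's unbounded integers model such a bit string directly, with bit i holding the coefficient of X^i; the word splitting of the example above can then be checked in a few lines.

```python
# g(X) = X^64 + X^31 + X^7 + 1 packed as a bit string
g = (1 << 64) | (1 << 31) | (1 << 7) | 1

w0 = g & 0xFFFFFFFF          # coefficients of X^0 .. X^31
w1 = (g >> 32) & 0xFFFFFFFF  # coefficients of X^32 .. X^63
w2 = (g >> 64) & 0xFFFFFFFF  # coefficient of X^64 (leading bits zero)

assert w0 == 2**31 + 2**7 + 1
assert w1 == 0
assert w2 == 1
```

In the sketches that follow we keep this integer representation, leaving the 32-bit word layout implicit.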
Such a representation of elements of
, in addition to being compact, facilitates efficient implementation of arithmetic functions. As we will shortly see, we need not often extract the individual coefficients of a polynomial but apply bit operations on entire words to process 32 coefficients simultaneously per operation. We usually do not need polynomials of degrees > 4096 for cryptographic applications. It is, therefore, sufficient to declare a static array capable of storing all the 8193 coefficients of a product of two such largest polynomials. The zero polynomial may be represented as one with zero word size, whereas the degree of the zero polynomial is taken to be –∞ which may be representable as –1.
We now describe the arithmetic functions on two non-zero polynomials
Equation 3.3

Under our implementation, a and b demand ρ := ⌈(r + 1)/32⌉ and σ := ⌈(s + 1)/32⌉ machine words αρ – 1 . . . α1α0 and βσ – 1 . . . β1β0. We also assume paddings with leading zero bits in the areas not belonging to the operands.
Note that the addition of
is the same as the XOR (⊕) of two bits. Applying this bit operation on words αi and βi adds 32 coefficients of the operand polynomials simultaneously (see Algorithm 3.18). Finally note that –1 = 1 in any field of characteristic 2, that is, subtraction is the same as addition in such a field.
The product a(X)b(X) can be computed as in Algorithm 3.19. Once again, using wordwise operations yields faster implementation. By AND and OR, we denote the bit-wise and and or operations on 32-bit words. The easy verification of the correctness of this algorithm is left to the reader. As in the case of addition, one might want to make the polynomial c compact after its words γτ – 1, . . . , γ0 are computed.
|
Input: a(X), Output: c(X) = a(X) + b(X) (to be stored in the array γτ – 1 . . . γ1γ0). Steps: τ := max(ρ, σ). |
|
Input: a(X), Output: c(X) = a(X)b(X) (to be stored in the array γτ – 1 . . . γ1γ0). Steps: τ := ρ + σ – 1. /* The size of the product */ |
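With Python's unbounded integers, the wordwise loops of Algorithms 3.18 and 3.19 collapse into single big-integer operations; the following sketch (names ours) illustrates the idea.

```python
def gf2_add(a, b):
    """Addition in GF(2)[X]: coefficient-wise XOR (Algorithm 3.18)."""
    return a ^ b

def gf2_mul(a, b):
    """Schoolbook product in GF(2)[X]: shift-and-XOR (Algorithm 3.19)."""
    result = 0
    while b:
        if b & 1:
            result ^= a    # add a * X^i whenever the i-th bit of b is 1
        a <<= 1
        b >>= 1
    return result
```

For instance, (X + 1)(X + 1) = X^2 + 1 in characteristic 2, since the cross terms cancel.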
The square of
can be computed very easily using the fact that
a(X)^2 = (a_rX^r + · · · + a_1X + a_0)^2 = a_rX^(2r) + · · · + a_1X^2 + a_0.
This gives us a linear-time (in terms of r or ρ) algorithm instead of the quadratic general-purpose multiplication Algorithm 3.19. We leave the implementational details to the reader.
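One possible rendering of this squaring trick is bit interleaving: the coefficient of X^i simply moves to position 2i. The helper name is ours.

```python
def gf2_square(a):
    """Square in GF(2)[X]: the coefficient of X^i moves to X^(2i),
    since all cross terms cancel in characteristic 2."""
    result, i = 0, 0
    while a:
        if a & 1:
            result |= 1 << (2 * i)
        a >>= 1
        i += 1
    return result
```

For example, (X + 1)^2 = X^2 + 1 and (X^2 + 1)^2 = X^4 + 1.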
Division with remainder in
is implemented in Algorithm 3.20. As before, we continue to work with the operands a(X) and b(X) as in Equation (3.3). But now we make a further assumption that bs = 1, so that βσ–1 ≠ 0, and also that s ≤ r. When the Euclidean division loop of Algorithm 3.20 terminates, the array locations δσ–1, . . . , δ1, δ0 contain the remainder. The arrays γ and δ may be made compact to discard the leading zero bits, if any.
|
Input: a(X), Output: c(X) = a(X) quot b(X) (to be stored in the array γτ – 1 . . . γ1γ0) and d(X) = a(X) rem b(X) (to be stored in the array δρ–1 . . . δ1δ0). Steps: τ := ⌈(r – s + 1)/32⌉. /* The size of the quotient */ |
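The Euclidean division loop may be sketched over the integer representation as follows; this is a hedged illustration of the idea, not the word-level Algorithm 3.20 itself.

```python
def gf2_divmod(a, b):
    """Euclidean division in GF(2)[X] (b != 0): returns (quot, rem)."""
    q = 0
    db = b.bit_length() - 1                  # deg b
    while a != 0 and a.bit_length() - 1 >= db:
        shift = (a.bit_length() - 1) - db    # align leading terms
        q ^= 1 << shift
        a ^= b << shift                      # subtraction is XOR here
    return q, a
```

For example, X^3 + X + 1 = (X^2 + X)(X + 1) + 1 over GF(2).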
Computing modular inverses requires computation of extended gcds of polynomials in
. We again start with the non-zero polynomials a(X),
and compute polynomials d(X), u(X) and v(X) in
with d(X) = gcd(a(X), b(X)) = u(X)a(X) + v(X)b(X), deg u < deg b and deg v < deg a. For polynomials, we do not have an equivalent of the binary gcd algorithm (Algorithm 3.8). We use repeated Euclidean divisions instead.
The proof for the correctness of Algorithm 3.21 is similar to that for Algorithm 3.8. Here, we introduce the variables r_k, U_k and V_k for k = 0, 1, 2, . . . . The initialization goes as: r_0 := a, r_1 := b, U_0 := 1, U_1 := 0, V_0 := 0 and V_1 := 1. During the k-th iteration (k = 1, 2, . . .), we first use Euclidean division to get r_(k–1) = q_k r_k + r_(k+1), which gives r_(k+1) = r_(k–1) – q_k r_k. We also compute U_(k+1) = U_(k–1) – q_k U_k and V_(k+1) = V_(k–1) – q_k V_k using the values available from the previous two iterations so as to maintain the relation r_(k+1) = U_(k+1) r_0 + V_(k+1) r_1 for all k = 1, 2, . . . . In Algorithm 3.21, the k-th iteration of the while loop begins with x = r_(k–1), y = r_k, u1 = U_k and u2 = U_(k–1) and ends after updating the values to x = r_k, y = r_(k+1), u1 = U_(k+1) and u2 = U_k. It is not necessary to maintain the values V_k in the main loop. After the loop terminates, one computes V_k = (r_k – U_k r_0)/r_1.
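The iteration just described translates directly into the integer representation; gf2_egcd below returns d and u with u·a ≡ d (mod b), and v can be recovered as v = (d – u·a)/b. The helper routines are repeated so that the sketch is self-contained; all names are ours.

```python
def gf2_mul(a, b):
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result

def gf2_divmod(a, b):
    q, db = 0, b.bit_length() - 1
    while a != 0 and a.bit_length() - 1 >= db:
        shift = (a.bit_length() - 1) - db
        q ^= 1 << shift
        a ^= b << shift
    return q, a

def gf2_egcd(a, b):
    """Extended gcd in GF(2)[X] by repeated Euclidean division:
    returns (d, u) with d = gcd(a, b) and u*a == d (mod b)."""
    x, y = a, b        # r_(k-1), r_k
    u1, u2 = 0, 1      # U_k, U_(k-1); subtraction is XOR in characteristic 2
    while y:
        q, r = gf2_divmod(x, y)
        x, y = y, r
        u1, u2 = u2 ^ gf2_mul(q, u1), u1
    return x, u2
```

As a check, the inverse of X modulo X^2 + X + 1 is X + 1, since X(X + 1) = X^2 + X ≡ 1.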
Modular arithmetic in
is very much similar to the modular arithmetic in
. If f(X) is a non-constant polynomial of
(not necessarily irreducible), we represent elements of
as polynomials in
of degrees < n. Given two such polynomials a and b, we compute the sum a + b simply as the sum in
. The product ab is computed by first computing the product ab in
and then computing the remainder of Euclidean division of this product by f. Inverse of a modulo f exists if and only if gcd(a, f) = 1 (in
). In that case, extended gcd computation gives us polynomials u, v such that 1 = ua + vf, so that ua ≡ 1 (mod f). If a ≠ 0, then Algorithm 3.21 computes u with deg u < deg f = n, so that we take this u to be the canonical representative of a–1 in
. Finally, for
the computation of the modular exponentiation ae (mod f) can be done using an algorithm very similar to Algorithm 3.9 or Algorithm 3.10. We leave the details to the reader.
|
Input: Nonzero polynomials a, Output: Polynomials d, u, d = gcd(a, b) = ua + vb, deg u < deg b, deg v < deg a. Steps: /* Initialize */ |
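Putting the pieces together, arithmetic in F_(2^n) is modular polynomial arithmetic. The sketch below uses the degree-8 polynomial f(X) = X^8 + X^4 + X^3 + X + 1 (the AES reduction polynomial) purely as an example modulus; the function names are ours.

```python
def gf2_mul(a, b):
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result

def gf2_divmod(a, b):
    q, db = 0, b.bit_length() - 1
    while a != 0 and a.bit_length() - 1 >= db:
        shift = (a.bit_length() - 1) - db
        q ^= 1 << shift
        a ^= b << shift
    return q, a

F = 0b100011011   # f(X) = X^8 + X^4 + X^3 + X + 1, irreducible over GF(2)

def field_mul(a, b):
    """Multiplication in GF(2^8) = GF(2)[X]/(f): multiply, then reduce."""
    return gf2_divmod(gf2_mul(a, b), F)[1]

def field_inv(a):
    """Inverse modulo f via the extended Euclidean algorithm;
    gcd(a, f) = 1 for a != 0 since f is irreducible."""
    x, y, u1, u2 = a, F, 0, 1
    while y:
        q, r = gf2_divmod(x, y)
        x, y = y, r
        u1, u2 = u2 ^ gf2_mul(q, u1), u1
    return u2
```

The returned inverse automatically has degree < 8, so it is the canonical representative mentioned above.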
For the polynomial basis representation
, we need an irreducible polynomial
of degree n. We shortly present a probabilistic algorithm that generates a random monic irreducible polynomial in
of given degree
. Although we are interested only in the case q = 2, this algorithm holds even if q is any arbitrary prime or an arbitrary prime power.
First, we describe a deterministic polynomial-time algorithm for checking the irreducibility of a non-constant polynomial
(over
). If f is reducible, it has a factor of degree i ≤ ⌊n/2⌋. Also recall (Theorem 2.40, p 82) that X^(q^i) – X is the product of all monic irreducible polynomials of
of degrees dividing i. Therefore, if f has an irreducible factor of degree i, then gcd(f, X^(q^i) – X) = gcd(f, X^(q^i) – X rem f) will be a non-constant polynomial. Algorithm 3.22 employs these simple observations.
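For q = 2 the test reads as follows: X^(2^i) rem f is maintained by repeated squaring modulo f, and subtracting X is an XOR in the integer representation. This is a hedged sketch with our own helper names.

```python
def gf2_mul(a, b):
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result

def gf2_rem(a, b):
    db = b.bit_length() - 1
    while a != 0 and a.bit_length() - 1 >= db:
        a ^= b << ((a.bit_length() - 1) - db)
    return a

def gf2_gcd(a, b):
    while b:
        a, b = b, gf2_rem(a, b)
    return a

def gf2_irreducible(f):
    """Irreducibility over GF(2): for i = 1 .. n/2 check that
    gcd(f, X^(2^i) - X rem f) is constant (Algorithm 3.22 with q = 2)."""
    n = f.bit_length() - 1
    g = 0b10                             # g = X
    for _ in range(n // 2):
        g = gf2_rem(gf2_mul(g, g), f)    # g = g^2 rem f
        if gf2_gcd(f, g ^ 0b10) != 1:    # g - X is g XOR X here
            return False
    return True
```

For example, X^2 + X + 1 passes, whereas X^4 + X^2 + 1 = (X^2 + X + 1)^2 fails at i = 2.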
Now, recall from Section 2.9.2 that a random monic polynomial of
of degree n is irreducible with probability approximately 1/n. Therefore, if we keep on checking for irreducibility random monic polynomials in
of degree n, then after O(n) checks we expect to find an irreducible polynomial. This leads to the Las Vegas probabilistic Algorithm 3.23.
|
Input: A non-constant polynomial Output: A (deterministic) certificate whether f is irreducible or not. Steps: n := deg f, g := X. |
|
Input: Output: A random monic irreducible polynomial Steps: while (1) { |
Once the defining irreducible polynomial f is available, we carry out the arithmetic in
as modular polynomial arithmetic with respect to the modulus f. This is described at the end of Section 3.5.1. Since this modular arithmetic involves taking the remainder of Euclidean division by f, it is sometimes expedient to choose f to be an irreducible polynomial of certain special types. The randomized algorithm described above gives a random monic irreducible polynomial f of degree n having on an average ≈ n/2 non-zero coefficients. The division algorithm (Algorithm 3.20) in that case takes time O(n2). On the other hand, if f is a sparse polynomial (like a trinomial), the Euclidean division loop can be rewritten to exploit this sparsity, thereby bringing down the running time of the division procedure to O(n). (See Exercise 3.34. Also see Exercise 3.38 for computing isomorphisms between different polynomial-basis representations of the same field.)
Let p be a prime and let
. We have seen how to implement arithmetic in
and hence by Exercise 3.35 that in
too. If
is an irreducible polynomial of degree n and if q = p^n, then
and we implement the arithmetic of
as the polynomial arithmetic of
modulo f. Again by Exercise 3.35, this gives us the arithmetic of
. Now, for
and a monic irreducible polynomial
we have a representation
. Instead of having such a two-way representation of
we may also represent
as
, where
is a monic irreducible polynomial of degree nm. It usually turns out that the second representation of
is more efficient. However, there are some situations where the two-way representation performs better. This is, in particular, the case when the arithmetic of
can be made more efficient than the modular polynomial arithmetic of
. For example, we might precompute tables of arithmetic operations of
and use table lookups for performing the coefficient arithmetic of
. This demands O(q2) storage and is feasible only when q is small. On the other hand, if we find a primitive element γ of
and precompute a table that maps i ↦ γ^i and another that maps γ^i ↦ i, then products in
can be computed in time O(1) using table lookups. If, in addition, we store the Zech’s logarithm table (Section 2.9.3) for
, then addition in
can also be performed in O(1) time with table lookup. Together these tables take O(q) memory which (though better than the O(q^2) storage of the previous scheme) is feasible only for small q.
Not all finite fields are suitable for cryptographic applications. In this section, we discuss the desirable properties of a field
so that secure protocols over
can be developed. We first note that such protocols are usually based on the apparent intractability of the so-called discrete logarithm problem (DLP) (Section 4.2). As a result, selections of suitable fields are dictated by the known cryptanalytic algorithms to solve the DLP (See Section 4.4). We shall mostly concentrate on
with either q = p a prime or q = 2^n for some
. By the bit size of q, denoted |q|, we mean the number of bits in the binary representation of q, that is, |q| = ⌈lg q⌉. As we have seen, each element of
is representable using O(|q|) bits and, therefore, |q| is often also called the size of
.
The first requirement on a cryptographically suitable field
is that the size |q| should be sufficiently large. Recent cryptanalytic studies show that sizes |q| ≤ 512 are not secure enough. Sizes |q| ≥ 768 are recommended for secure applications. For long-term security, one might even require |q| ≥ 2048.
However, a field of the recommended size need not be adequately secure. The cardinality #F_q = q must be such that q – 1 has at least one large prime divisor q′ (see the Pohlig–Hellman method in Section 4.4). By large, we usually mean |q′| ≥ 160. In addition, this prime factor q′ of q – 1 should be known to us. If q = p is a prime, then a safe prime or a strong prime serves our purpose (Definition 3.5, Algorithm 3.14). Also see Exercise 3.25. On the other hand, if q = 2^n, the only way to obtain q′ is by factorizing the Mersenne number M_n := q – 1 = 2^n – 1. Factorizing M_n for n ≥ 768 is a very difficult task. Luckily, extensive tables of complete or partial factorizations of M_n are available. For example, for n = 769 (a prime number), we have
M_769 = 2^769 – 1 = 1,591,805,393 × 6,123,566,623,856,435,977,170,641 × q′,
where q′ is a 657-bit prime. These tables should be consulted for choosing a suitable value of n.
The multiplicative group
is cyclic (Theorem 2.38). If the complete integer factorization of q – 1 is known, then it is possible to find, in polynomial time (in |q|), a primitive element of
. Algorithm 3.24 performs r = O(lg m) exponentiations in G in order to conclude whether a given element
is a generator of G. For
, we have polynomial-time exponentiation algorithms, so Algorithm 3.24 runs in deterministic polynomial time. By Exercise 2.47, the probability of a randomly chosen element of G being primitive is φ(m)/m. In view of the lower bound on φ(m)/m given in Theorem 3.1 and proved by Rosser and Schoenfeld [253], Algorithm 3.25 is expected to return a random primitive element of G after O(ln ln m) iterations.
|
Let |
|
Input: A cyclic group G of cardinality #G = m with known factorization Output: A deterministic certificate that a is a generator of G. Steps: /* We assume that G is multiplicatively written and has the identity e */ |
|
Input: A cyclic group G of cardinality #G = m with known factorization Output: A generator g of G. Steps: while (1) { |
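Algorithms 3.24 and 3.25 can be sketched for the concrete case G = Z_p^* as follows; we pass the distinct prime factors of m = p – 1 explicitly, and the names are ours.

```python
import random

def is_generator(a, p, prime_factors):
    """a generates the cyclic group Z_p^* (of order m = p - 1) iff
    a^(m/l) != 1 (mod p) for every prime l dividing m."""
    m = p - 1
    return all(pow(a, m // l, p) != 1 for l in prime_factors)

def random_generator(p, prime_factors):
    """Algorithm 3.25 in miniature: sample until a generator appears."""
    while True:
        a = random.randrange(2, p)
        if is_generator(a, p, prime_factors):
            return a
```

For p = 13 we have m = 12 = 2^2 · 3, so only the two exponentiations a^6 and a^4 are needed per candidate.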
If, however, the factorization of #G = m is not known, there are no known (deterministic or probabilistic) algorithms for finding a random generator of G or even for checking if a given element of G is primitive. This is indeed one of the intractable problems of computational algebraic number theory. This problem for
can be bypassed as follows.
Recall that we have chosen q in such a way that
has a large known prime factor q′. Let H be the unique subgroup of G of order q′. Then H is also cyclic and we choose to work in H (using the arithmetic of G). It turns out that if q′ ≥ 2^160 and if H is not contained in a proper subfield of
, the security of cryptographic protocols over
does not degrade too much by the use of H (instead of the full G) as the ground group. But we now face a new problem, that is, the problem of finding a generator of H. Since #H = q′ is a prime, every element of H \ {1} is a generator of H. So the problem essentially reduces to that of finding any non-identity element of H. This latter problem has a simple probabilistic solution. First of all, if q – 1 = q′ is itself prime, choosing any random non-identity element of
will do. So assume q′ < q – 1. Choose a random
and let b := a(q – 1)/q′. By Lagrange’s theorem (Theorem 2.2, p 24), bq′ = aq–1 = 1 and, therefore, by Proposition 2.5
. Now,
being a field, the polynomial
can have at most (q – 1)/q′ roots in
(that is, in
) and hence the probability that b = 1 is ≤ ((q – 1)/q′)/(q – 1) = 1/q′. This justifies the randomized polynomial running time of the Las Vegas Algorithm 3.26. Indeed if q′ ≥ 2^160, the while loop of the algorithm is executed only once almost always.
|
Input: A finite field Output: An element Steps: while (1) { |
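Algorithm 3.26 then amounts to a few lines. In this sketch q1 stands for the prime divisor q′ of q – 1, and we work in Z_p^* (the case q = p).

```python
import random

def subgroup_generator(p, q1):
    """Las Vegas search for a generator of the unique subgroup H of
    prime order q1 in Z_p^*, where q1 divides p - 1."""
    cofactor = (p - 1) // q1
    while True:
        a = random.randrange(2, p - 1)
        b = pow(a, cofactor, p)
        if b != 1:            # fails with probability at most 1/q1
            return b
```

By Lagrange's theorem b^q1 = 1, and since q1 is prime any b ≠ 1 generates all of H.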
Polynomial factorization over finite fields is an interesting computational problem. All deterministic algorithms known for this purpose are quite inefficient, that is, fully exponential in the size of the field. However, if randomization is allowed, we have reasonably efficient (polynomial-time) algorithms. In this section, we outline the basic working of the modern probabilistic algorithms for polynomial factorization over finite fields. We assume that a non-constant polynomial
is to be factored. Without loss of generality, we can take f to be monic. We assume further that the arithmetic of
and that of
is available. We work with a general value of q = pn, p prime and
, though in some cases we have to treat the case p = 2 separately. Irreducibility (or otherwise) in this section means the same over
.
The factorization algorithm we are going to discuss is a generalization of the root-finding algorithm (see Exercise 3.36) and consists of three steps: square-free factorization (SFF), which reduces the problem to factoring polynomials with no repeated irreducible factors; distinct-degree factorization (DDF), which splits a square-free polynomial into products of irreducible factors of the same degree; and equal-degree factorization (EDF), which separates the individual irreducible factors of each such product.
We now provide a separate detailed discussion for each of these three steps.
Theorem 3.2 is at the very heart of the square-free factorization algorithm and is a generalization of Exercise 2.61.
|
Let K be a field and Proof Let |
The algorithm for SFF over F_q is now almost immediate except for one subtlety, namely, the consideration of the case f/gcd(f, f′) = 1, or equivalently, f′ = 0. In order to see when this case can occur, let us write the non-zero terms of f as f = a_1X^e_1 + · · · + a_tX^e_t with distinct exponents e_1, . . . , e_t and a_i ≠ 0. Then f′ = a_1e_1X^(e_1−1) + · · · + a_te_tX^(e_t−1) = 0 if and only if e_i ≡ 0 (mod p) for all i, that is, if p divides all of e_1, . . . , e_t. But then f(X) = h(X)^p, where h(X) = a_1^(1/p)X^(e_1/p) + · · · + a_t^(1/p)X^(e_t/p), since each a_i has a p-th root a_i^(1/p) = a_i^(p^(n−1)) in F_q and raising h to the p-th power in characteristic p reproduces the terms a_iX^e_i. These observations motivate the recursive Algorithm 3.27. It is easy to check that this (deterministic) algorithm runs in time polynomially bounded by deg f and log q.
|
Input: A monic non-constant polynomial Output: A square-free factorization of f. Steps: Compute f′. |
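The key computation in SFF, namely gcd(f, f′) and the square-free part f/gcd(f, f′) of Theorem 3.2, can be sketched as follows for a prime field F_p. This is a minimal sketch, not the book's pseudocode: polynomials are coefficient lists from the constant term upward, all helper names are ours, and the f′ = 0 (p-th power) branch of Algorithm 3.27 is omitted.

```python
def trim(a):
    """Drop trailing zero coefficients (in place) and return the list."""
    while a and a[-1] == 0:
        a.pop()
    return a

def pdiv(a, b, m):
    """Divide a by b over F_m; return (quotient, remainder)."""
    a = [c % m for c in a]
    trim(a)
    q = [0] * max(len(a) - len(b) + 1, 1)
    inv = pow(b[-1], -1, m)
    while a and len(a) >= len(b):
        c = a[-1] * inv % m
        d = len(a) - len(b)
        q[d] = c
        for i, bc in enumerate(b):
            a[i + d] = (a[i + d] - c * bc) % m
        trim(a)
    return trim(q), a

def pgcd(a, b, m):
    """Monic gcd of a and b over F_m (Euclidean algorithm)."""
    while b:
        a, b = b, pdiv(a, b, m)[1]
    inv = pow(a[-1], -1, m)
    return [c * inv % m for c in a]

def pderiv(a, m):
    """Formal derivative of a over F_m."""
    return trim([(i * c) % m for i, c in enumerate(a)][1:])

def square_free_part(f, p):
    """Return f / gcd(f, f'), square-free whenever f' != 0."""
    g = pgcd(f, pderiv(f, p), p)
    return pdiv(f, g, p)[0]
```

For f = (X + 1)^2 (X + 2) = X^3 + 4X^2 + 2 over F_5, the gcd is X + 1 and the square-free part is (X + 1)(X + 2) = X^2 + 3X + 2.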
Let f ∈ F_q[X] be a square-free polynomial of degree d. We can write f = f_1 · · · f_d, where for each i the polynomial f_i is the product of all the irreducible factors of f of degree i. If f does not have an irreducible factor of degree i, then we take f_i = 1 as usual.[5] In order to compute the polynomials f_i, we make use of the fact that X^(q^i) − X is the product of all monic irreducible polynomials in F_q[X] whose degrees divide i (see Theorem 2.40 on p 82). It immediately follows that f_i = gcd(f/(f_1 · · · f_(i−1)), X^(q^i) − X). Thus a few (at most d) gcd computations give us all f_i. But the polynomials X^(q^i) − X are of rather large degrees. However, since gcd(g, X^(q^i) − X) = gcd(g, (X^(q^i) rem f) − X) for any divisor g of f, keeping polynomials reduced modulo f implies that we take gcds of polynomials of degrees ≤ d. This, in turn, implies that the DDF can be performed in (deterministic) polynomial time (in d and ln q).
[5] Conventionally, an empty product is taken to be the multiplicative identity and an empty sum to be the additive identity.
Algorithm 3.28 shows an implementation of the DDF. Though the algorithm does not require f to be monic, there is no harm in assuming so.
|
Input: A (non-constant) square-free polynomial Output: The DDF of f, that is, the polynomials f1, . . . , fd as explained above. Steps: g := f. /* Make a local copy of f */ |
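The DDF loop can be sketched as follows for a prime field F_p (polynomials as coefficient lists from the constant term up; all names are ours, not the book's). The invariant is that h equals X^(p^i) reduced modulo the current g, so each stage costs one modular exponentiation and one gcd on polynomials of degree ≤ deg g:

```python
def trim(a):
    while a and a[-1] == 0:
        a.pop()
    return a

def pdiv(a, b, m):
    """Divide a by b over F_m; return (quotient, remainder)."""
    a = [c % m for c in a]
    trim(a)
    q = [0] * max(len(a) - len(b) + 1, 1)
    inv = pow(b[-1], -1, m)
    while a and len(a) >= len(b):
        c = a[-1] * inv % m
        d = len(a) - len(b)
        q[d] = c
        for i, bc in enumerate(b):
            a[i + d] = (a[i + d] - c * bc) % m
        trim(a)
    return trim(q), a

def pgcd(a, b, m):
    """Monic gcd over F_m."""
    while b:
        a, b = b, pdiv(a, b, m)[1]
    inv = pow(a[-1], -1, m)
    return [c * inv % m for c in a]

def psub(a, b, m):
    r = [0] * max(len(a), len(b))
    for i, c in enumerate(a): r[i] = c % m
    for i, c in enumerate(b): r[i] = (r[i] - c) % m
    return trim(r)

def pmulmod(a, b, f, m):
    """Product of a and b reduced modulo f over F_m."""
    r = [0] * (len(a) + len(b) - 1) if a and b else []
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            r[i + j] = (r[i + j] + ai * bj) % m
    return pdiv(r, f, m)[1]

def ppowmod(a, e, f, m):
    """a^e modulo f over F_m (square and multiply)."""
    r = [1]
    while e:
        if e & 1:
            r = pmulmod(r, a, f, m)
        a = pmulmod(a, a, f, m)
        e >>= 1
    return r

def ddf(f, p):
    """Distinct-degree factorization of a monic square-free f over F_p:
    returns the list of pairs (i, f_i) for the non-trivial f_i."""
    out, g, h, i = [], f[:], [0, 1], 0          # h = X
    while len(g) - 1 >= 2 * (i + 1):
        i += 1
        h = ppowmod(h, p, g, p)                 # h = X^(p^i) rem g
        fi = pgcd(g, psub(h, [0, 1], p), p)     # gcd(g, X^(p^i) - X)
        if fi != [1]:
            out.append((i, fi))
            g = pdiv(g, fi, p)[0]               # strip the degree-i part
            h = pdiv(h, g, p)[1]
    if g != [1]:
        out.append((len(g) - 1, g))             # leftover g is irreducible
    return out
```

For f = X^3 + 1 = (X + 1)(X^2 + X + 1) over F_2 this returns the degree-1 part X + 1 and the degree-2 part X^2 + X + 1.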
This simple-minded implementation of the DDF is theoretically not the most efficient one known. In fact, it turns out that the DDF (and not the seemingly more complicated EDF) is the bottleneck of the entire polynomial factorization process. Therefore, making the DDF more efficient is important, and many improvements have been suggested in the literature. All these improved algorithms essentially do the same thing as above (that is, the computation of gcds with X^(q^i) − X), but they optimize the computation of the polynomials X^(q^i) rem f. The best-known method (due to Kaltofen and Shoup) is based on the observation that, in general, most of the f_i are 1. Therefore, instead of computing each f_i individually, one may break the interval 1, . . . , d into several subintervals I_1, I_2, . . . , I_l and compute the products F_j of the f_i for i ∈ I_j, j = 1, . . . , l. Only those F_j that turn out to be non-constant are further decomposed.
For cryptographic purposes, we will, however, deal with rather small values of d = deg f. (Typically d is at most a few thousand.) The asymptotically better algorithms usually do not outperform the simple Algorithm 3.28 for these values of d.
Equal-degree factorization, the last step of the polynomial factorization process, is the only probabilistic part of the algorithm. We may assume that f is a (monic) square-free polynomial of degree d and that each irreducible factor of f has the same (known) degree, say δ. If d = δ, then f is irreducible. So we assume that d > δ, that is, d = rδ for some integer r ≥ 2. Theorem 3.3 provides the basic foundations for the EDF.
|
Let g be any polynomial in Proof If g = 0, there is nothing to prove. If g = alXl + · · · + a1X + a0 ≠ 0 with |
Now, we have to separate two cases, namely, q odd and q even. Theorem 3.3 is valid for any q, even or odd, but taking q odd allows us to write g^(q^δ) − g = g(g^((q^δ−1)/2) − 1)(g^((q^δ−1)/2) + 1). With the above assumptions on f we have f | (X^(q^δ) − X) and, therefore, f | (g^(q^δ) − g), so that f = gcd(g^(q^δ) − g, f) = gcd(g, f) gcd(g^((q^δ−1)/2) − 1, f) gcd(g^((q^δ−1)/2) + 1, f). If g is randomly chosen, then gcd(g^((q^δ−1)/2) − 1, f) is with probability ≈ 1/2 a non-trivial factor of f. The idea is, therefore, to keep on choosing random g and computing f_1 := gcd(g^((q^δ−1)/2) − 1, f) until one gets a non-trivial factor f_1 of f. One then recursively applies the algorithm to f_1 and f/f_1. It is sufficient to choose g with deg g < 2δ. Obviously, the exponentiation g^(q^δ) has to be carried out modulo f. We leave the details to the reader, but note that trying O(1) random polynomials g is expected to split f and, therefore, the EDF runs in expected polynomial time.
For the case q = 2^n, essentially the same algorithm works, but we have to use the split g^(q^δ) + g = g^(2^(nδ)) + g = (g^(2^(nδ−1)) + g^(2^(nδ−2)) + · · · + g^2 + g)(g^(2^(nδ−1)) + g^(2^(nδ−2)) + · · · + g^2 + g + 1). Once again computing gcd(g^(2^(nδ−1)) + g^(2^(nδ−2)) + · · · + g^2 + g, f) for a random g splits f with probability ≈ 1/2 and, thus, we get an EDF algorithm that runs in expected polynomial time.
| 3.33 | Find a (polynomial-basis) representation of . Compute a primitive element in this representation.
| ||
| 3.34 |
| ||
| 3.35 | Implement the polynomial arithmetic of given that of .
| ||
| 3.36 | Let q = p^n (p prime and n ≥ 1), f ∈ F_q[X] a non-constant polynomial and let g := gcd(f, X^q – X).
Explain how Algorithm 3.30 produces two non-trivial factors of g (over F_q).
Algorithm 3.30. Computing roots of a polynomial: characteristic 2
| ||
| 3.37 | Use Exercise 3.36 to compute all the roots of the following polynomials: | ||
| 3.38 | Let f and g be two monic irreducible polynomials over F_p and of the same degree n. Consider the two representations F_q = F_p[X]/⟨f(X)⟩ = F_p[Y]/⟨g(Y)⟩. In this exercise, we study how we can compute an isomorphism between these two representations. The polynomial f(Y) splits into linear factors over F_q. Consider a root α = α(Y) of f(Y) in F_p[Y]/⟨g(Y)⟩. Show that 1, α, α^2, . . . , α^(n–1) is an F_p-basis of (the F_p-vector space) F_p[Y]/⟨g(Y)⟩. For i = 0, . . . , n – 1, write (uniquely) α^i = α_i0 + α_i1Y + · · · + α_i,n–1Y^(n–1) with α_ij ∈ F_p, and consider the matrix A = (α_ij), 0 ≤ i ≤ n–1, 0 ≤ j ≤ n–1. Show that the map that maps (the equivalence class of) a_0 + a_1X + · · · + a_(n–1)X^(n–1) to (the equivalence class of) b_0 + b_1Y + · · · + b_(n–1)Y^(n–1), where (b_0 b_1 . . . b_(n–1)) = (a_0 a_1 . . . a_(n–1))A, is an F_p-isomorphism.
| ||
| 3.39 | Let q = p^n for a prime p and n ≥ 1. We have seen that the elements of F_p can be represented as integers between 0 and p – 1, whereas the elements of F_q can be represented as polynomials modulo some irreducible polynomial of degree n, that is, as polynomials of degrees < n. Show that the substitution X = p in the polynomial representation of elements of F_q gives a representation of elements of F_q as integers between 0 and q – 1. We call this latter representation of elements of F_q the packed representation. Compare the advantages and disadvantages of the packed representation over the polynomial representation.
| ||
| 3.40 | Let G be a cyclic multiplicatively written group of order m (and with the identity element e). Assume that the factorization of m is known. Devise an algorithm that computes the order of an arbitrary element in G. [H]
| ||
| 3.41 | Berlekamp’s Q-matrix factorization Let
|
The recent popularity of cryptographic systems based on elliptic curve groups over finite fields stems from two considerations. First, discrete logarithms in F_q^* can be computed in subexponential time. This demands q to be sufficiently large, typically of length 768 bits or more. On the other hand, if the elliptic curve E over F_q is carefully chosen, the only known algorithms for solving the discrete logarithm problem in E(F_q) are fully exponential in lg q. As a result, smaller values of q suffice to achieve the desired level of security. In practice, the length of q is required to be between 160 and 400 bits. This leads to smaller key sizes for elliptic curve cryptosystems. The second advantage of using elliptic curves is that for a given prime power q, there is only one group F_q^*, whereas there are many elliptic curve groups E(F_q) (over the same field F_q) with orders ranging from q + 1 − 2√q to q + 1 + 2√q. If a particular group E(F_q) is compromised, we can switch to another curve without changing the base field F_q.
In this section, we start with a description of efficient implementation of the arithmetic in the groups E(F_q). Then we concentrate on some algorithms for counting the order #E(F_q). Knowledge of this order is necessary to find cryptographically suitable elliptic curves. We consider only prime fields F_p or fields F_{2^n} of characteristic 2. So we assume that the curve is defined by Equation (2.8) or Equation (2.9) on p 100 (supersingular curves are not used in cryptography) instead of by the general Weierstrass Equation (2.6) on p 98.
Let us first see how we can efficiently represent points on an elliptic curve E over F_q. Since a finite point P = (h, k) of E corresponds to two elements h, k ∈ F_q, and since each element of F_q can be represented using ≤ s = ⌈lg q⌉ bits, 2s bits suffice to represent P. We can do better than this. Substituting X = h in the equation for E leaves us with a quadratic equation in Y. This equation has two roots of which k is one. If we adopt a convention (for example, see Section 6.2.1) that identifies, using a single bit, which of the two roots the coordinate k is, the storage requirement for P drops to s + 1 bits. During an on-line computation this compressed representation incurs some overhead and may be avoided. However, for off-line storage and transmission (of public keys, for example), this compression may be helpful.
Explicit formulas for the sum of two points and for the opposite of a point on an elliptic curve E are given in Section 2.11.2. These operations in E(F_q) can be implemented using a few arithmetic operations in the ground field F_q.
Computation of mP for m ∈ ℕ and P ∈ E(F_q) (or, more generally, for m ∈ ℤ) can be performed using a repeated-double-and-add algorithm similar to the repeated-square-and-multiply Algorithm 3.9. We leave out the trivial modifications and urge the reader to carry out the details.
Finding a random point P ∈ E(F_q) is another useful problem. If q = p is an odd prime and we use the short Weierstrass Equation (2.8), we first choose a random h ∈ F_p and substitute X by h to get Y^2 = h^3 + ah + b. This equation has 2, 0 or 1 solution(s) depending on whether h^3 + ah + b is a quadratic residue or non-residue or 0 modulo p. Quadratic residuosity can be checked by computing the Legendre symbol (Algorithm 3.15), whereas square roots modulo p can be computed using Tonelli and Shanks’ Algorithm 3.16.
For a non-supersingular curve E over F_{2^n} defined by Equation (2.9), a random point P = (h, k) is chosen by first choosing a random h ∈ F_{2^n}. Substituting X = h in the defining equation gives Y^2 + hY + (h^3 + ah^2 + b) = 0. If h = 0, then the unique solution for k is b^(2^(n–1)). If h ≠ 0, replacing Y by hY and dividing by h^2 transforms the equation to the form Y^2 + Y + α = 0 for some α ∈ F_{2^n}. This equation has two or zero solutions depending on whether the absolute trace Tr(α) is 0 or 1. If k is a solution, the other solution is k + 1. In order to find a solution (if it exists), one may use the (probabilistic) root-finding algorithm of Exercise 3.36. Another possibility is discussed now.
We consider two separate cases. First, if n is odd, then k = α + α^4 + α^16 + · · · + α^(4^((n–1)/2)) is a solution, since Tr(α) = k^2 + k + α. On the other hand, if n is even, we first find a β ∈ F_{2^n} with Tr(β) = 1. Since Tr is a surjective homomorphism of the additive groups F_{2^n} → F_2, exactly half of the elements of F_{2^n} have trace 1. Therefore, a desired β can be quickly found by selecting elements of F_{2^n} at random and computing their traces. Now, it is easy to check that k = Σ_{i=0}^{n–2} (Σ_{j=i+1}^{n–1} α^(2^j)) β^(2^i) gives a solution of Y^2 + Y + α = 0.
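For odd n, the "half-trace" solution k = α + α^4 + α^16 + · · · can be sketched with elements of F_{2^n} packed as n-bit integers (bit i holding the coefficient of X^i); the field multiplication and all names are ours, and the irreducible polynomial f must be supplied:

```python
def gf2_mul(a, b, f, n):
    """Multiply a and b in F_{2^n} = F_2[X]/(f), elements packed as
    n-bit integers; f includes the leading X^n bit."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if (a >> n) & 1:
            a ^= f                      # reduce modulo f
    return r

def half_trace(alpha, f, n):
    """For odd n and Tr(alpha) = 0, return k with k^2 + k = alpha:
    k = alpha + alpha^4 + ... + alpha^(4^((n-1)/2))."""
    k, t = 0, alpha
    for _ in range((n - 1) // 2 + 1):
        k ^= t
        t = gf2_mul(t, t, f, n)         # t -> t^2
        t = gf2_mul(t, t, f, n)         # t -> t^4
    return k
```

In F_8 with f = X^3 + X + 1, the element α = X has trace 0, and half_trace returns k = X^2; indeed k^2 + k = X.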
Counting points on elliptic curves is a challenging problem, both theoretically and computationally. The first polynomial-time (in log q) algorithm, invented by Schoof and later made efficient by Elkies and Atkin (and many others), is popularly called the SEA algorithm. Unfortunately, even the most efficient implementations of this algorithm are rather slow, but it is the only known reasonable strategy, in particular, when q = p is a large (odd) prime of a size of cryptographic interest. The more recent Satoh–FGH algorithm, named after its discoverer Satoh and after Fouquet, Gaudry and Harley who proposed its generalized and efficient versions, is a remarkable breakthrough for the case q = 2^n. Both the SEA and the Satoh–FGH algorithms are mathematically quite sophisticated. We now present a brief overview of these algorithms.
We assume that q = p is a large odd prime, this being the typical situation when we apply the SEA algorithm. We also assume that E is given by the short Weierstrass equation Y^2 = X^3 + aX + b. Let q_1 = 2, q_2 = 3, q_3 = 5, . . . be the sequence of prime numbers and t the Frobenius trace of E at p. By Hasse’s theorem (Theorem 2.48, p 106), #E(F_p) = p + 1 – t with |t| ≤ 2√p. A knowledge of t modulo sufficiently many small primes l allows us to reconstruct t using the Chinese remainder theorem. Because of the Hasse bound on t, it is sufficient to choose l from the primes q_1, q_2, . . . in succession, until the product q_1q_2 · · · q_r exceeds 4√p. By the prime number theorem (Theorem 2.20, p 53), we have r = O(ln p) and also q_i = O(ln p) for each i = 1, . . . , r.
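The reconstruction step can be sketched as follows (function names ours): the residues t mod l are combined by the Chinese remainder theorem, and the centred representative is taken, since t may be negative within the Hasse bound |t| ≤ 2√p:

```python
def crt_combine(r1, m1, r2, m2):
    """Combine x = r1 (mod m1) and x = r2 (mod m2), m1 and m2 coprime."""
    k = (r2 - r1) * pow(m1, -1, m2) % m2
    return (r1 + m1 * k) % (m1 * m2), m1 * m2

def frobenius_trace(residues, p):
    """residues = [(t rem l, l), ...] over small primes l whose product
    exceeds 4*sqrt(p); returns the trace t with |t| <= 2*sqrt(p)."""
    t, M = 0, 1
    for r, l in residues:
        t, M = crt_combine(t, M, r, l)
    if t > M // 2:
        t -= M                       # centre: t may be negative
    return t
```

For instance, if a hypothetical curve over F_97 had t ≡ 1, 2, 0, 5 modulo 2, 3, 5, 7 respectively (product 210 > 4√97), the trace would be t = 5 and the group order p + 1 − t = 93.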
The most innovative idea of Algorithm 3.31 is the determination of the integers t_i. For l = q_1 = 2, the process is easy. We have t_1 ≡ t ≡ 0 (mod 2) if and only if E(F_p) contains a point of order 2 (a point of the form (h, 0)), or equivalently, if and only if the polynomial X^3 + aX + b has a root in F_p. We compute the polynomial gcd g(X) := gcd(X^3 + aX + b, X^p – X) over F_p and conclude that t_1 = 0 if g(X) ≠ 1 and t_1 = 1 otherwise.
|
Input: A prime field Output: The order of the group Steps: Find (the smallest) |
Determination of t_i for i > 1 involves more work. We explain here the original idea due to Schoof. We denote by l the i-th prime q_i and by E[l] the set of all l-torsion points of E (Definition 2.78, p 105). The Frobenius endomorphism Φ that fixes the point at infinity O and maps (h, k) to (h^p, k^p) satisfies the relation Φ^2 – tΦ + p = 0. If we restrict our attention only to the group E[l], then this relation reduces to Φ^2 – t_iΦ + p_i = 0, where t_i = t rem l and p_i = p rem l, that is, Φ^2(P) – t_iΦ(P) + p_iP = O for all P ∈ E[l].
In terms of polynomials, the last relation is equivalent to

Equation 3.4

(X^(p^2), Y^(p^2)) + p_i(X, Y) = t_i(X^p, Y^p),
where the sum and difference follow the formulas for the elliptic curve E. Now, one has to calculate symbolically rather than numerically, since X and Y are indeterminates. These computations can be carried out in the ring F_p[X, Y]/⟨f(X, Y), f_l(X)⟩ (instead of in F_p[X, Y]), where f(X, Y) = Y^2 – (X^3 + aX + b) is the defining polynomial of E and f_l = f_l(X) is the l-th division polynomial of E (Section 2.11.2 and Theorem 2.47, p 106). Reduction of a polynomial in F_p[X, Y] modulo f makes its Y-degree ≤ 1, whereas reduction modulo f_l makes the X-degree less than deg f_l which is O(l^2). We can try the values t_i = 0, 1, . . . , l – 1 successively until the desired value satisfying Equation (3.4) is found.
It is not difficult to verify that Schoof’s algorithm runs in time O(log^8 p) (under standard arithmetic in F_p) and is thus a deterministic polynomial-time algorithm for the point-counting problem. Essentially the same algorithm works for fields F_q with q = 2^n and has the same running time. Unfortunately, the big exponent (8) in the running time makes Schoof’s algorithm quite impractical. Numerous improvements have been suggested to bring down this exponent. Elkies and Atkin’s modification for the case q = p gives rise to the SEA algorithm, which has a running time of O(log^6 p) under the standard arithmetic in F_p. This speed-up is achieved by working in the ring F_p[X, Y]/⟨f(X, Y), g_l(X)⟩, where g_l is a suitable factor of f_l and has degree O(l). Couveignes suggests improvements for the fields of characteristic 2. Efficient implementations of the SEA algorithm are reported by Morain, Müller, Dewaghe, Vercauteren and many others. At the time of writing this book, the largest values of q for which the algorithm has been successfully applied are 10^499 + 153 (a prime) and 2^1999 (a power of 2).
The Satoh–FGH algorithm is well suited for fields
of small characteristic p and, in particular, for the fields
of characteristic 2. This algorithm has enabled point-counting over fields as large as
. A generic description of the Satoh–FGH algorithm now follows after the introduction of some mathematical notions. Though our practical interest concentrates on the fields
only, we consider curves over a general
with q = pn, p a prime.
Recall from Section 2.14 that the ring Z_p of p-adic integers is a discrete valuation ring (Exercises 2.133 and 2.148) with the unique maximal ideal generated by p, and the residue field Z_p/⟨p⟩ is isomorphic to F_p.

We represent F_q as a polynomial algebra over F_p. We analogously define the p-adic ring Z_q := Z_p[X]/⟨f(X)⟩, where f is an irreducible polynomial of degree n in Z_p[X] whose reduction modulo p defines F_q. The elements of Z_q can be viewed as polynomials of degrees < n and with p-adic integers as coefficients. The arithmetic operations in Z_q are polynomial operations in Z_p[X] modulo the defining polynomial f. The ring Z_p is canonically embedded in the ring Z_q (consider constant polynomials). Z_q turns out to be a discrete valuation ring with maximal ideal ⟨p⟩, and the residue field Z_q/⟨p⟩ is isomorphic to F_q.
|
The projection map The (Teichmüller) lift The semi-Witt decomposition of |
The p-th power Frobenius endomorphism
, a ↦ ap, can now be extended to an endomorphism
as follows. Let
have the semi-Witt decomposition a0, a1, . . . with
. Then,
is the unique element
having the semi-Witt decomposition
One can show that
. We have
and similarly
.
Now, let E = E_0 be an elliptic curve defined over F_q. Application of the Frobenius map σ : a ↦ a^p to the coefficients of E_0 gives another elliptic curve E_1 over F_q whose rational points are the images σ(P) = (h^p, k^p) of the finite points P = (h, k) of E_0, together with the point at infinity. We may apply σ to E_1 to get another curve E_2 over F_q and so on. Since σ^n is the identity map on F_q, we get a cycle of elliptic curves defined over F_q:

Equation 3.5

E_0 → E_1 → E_2 → · · · → E_(n–1) → E_n = E_0.
Similarly, if ε = ε_0 is an elliptic curve defined over Z_q, application of the lifted Frobenius leads to a sequence of elliptic curves defined over Z_q:

Equation 3.6

ε_0 → ε_1 → ε_2 → · · · → ε_(n–1) → ε_n = ε_0.
We need the canonical lifting of an elliptic curve E over
to a curve ε over
. Explaining that requires some more mathematical concepts:
|
Let K be a field and let E and E′ be two elliptic curves defined over K. A morphism (Definition 2.72, p 95) that maps the point The kernel ker The set Hom(E, E′) of all isogenies E → E′ is an Abelian group defined as The multiplication-by-m map of E is an isogeny. If End(E) contains an isogeny not of this type, we call E an elliptic curve with complex multiplication. |
|
For each |
|
The polynomials
|
The next theorem establishes the foundation for lifting curves from
to
.
|
Let E be an elliptic curve defined over |
With this definition of lifting of elliptic curves, Cycles (3.5) and (3.6) satisfy the following commutative diagram, where εi is the canonical lift of Ei for each i = 0, 1, . . . , n.

Algorithm 3.32 outlines the Satoh–FGH algorithm. In order to complete the description of the algorithm, one should specify how to lift curves (that is, a procedural equivalent of Theorem 3.5) and their p-torsion points and how the lifted data can be used to compute the Frobenius trace t. We leave out the details here.
|
Input: An elliptic curve E over Output: The cardinality Steps: Compute the curves E0, . . . , En–1 and their j-invariants j0, . . . , jn–1. |
The elements of Z_p (and hence of Z_q) are infinite sequences and hence cannot be represented in computer memory. However, we make an approximate representation by considering only the first m terms of the sequences representing elements of Z_p. Working in Z_q with this approximate representation is then essentially the same as working in Z_q modulo p^m. For the Satoh–FGH algorithm, we need m ≈ n/2.
For small p (for example, p = 2) and with standard arithmetic in
, the Satoh–FGH algorithm has a deterministic running time O(n5) and space requirement O(n3). With Karatsuba arithmetic the exponent in the running time drops from 5 to nearly 4.17. In addition, this algorithm is significantly easier to implement than optimized versions of the SEA algorithm. These facts are responsible for a superior performance of the Satoh–FGH algorithm over the SEA algorithm (for small p).
Choosing cryptographically suitable elliptic curves is more difficult than choosing good finite fields. First, the order #E(F_q) of the elliptic curve group must have a suitably large prime divisor, say, of bit length 160 or more. In addition, the MOV attack applies to supersingular curves and the anomalous attack to anomalous curves (Definition 2.80 and Section 4.5). So a secure curve must be non-supersingular and non-anomalous. Checking all these criteria for a random curve E over F_q requires the group order #E(F_q). One may use either the SEA algorithm or the Satoh–FGH algorithm to compute this order. Once #E(F_q) is known, it is easy to check whether E is supersingular or anomalous. But factoring #E(F_q) to find its largest prime divisor may be a difficult task and is not recommended. One may instead extract all the small prime factors of #E(F_q) by trial divisions with the primes q_1 = 2, q_2 = 3, q_3 = 5, . . . , q_r for a predetermined r and write #E(F_q) = m_1m_2, where m_1 has all prime factors ≤ q_r and m_2 has all prime factors > q_r. If m_2 is prime and of the desired size, then E is treated as a good curve. Algorithm 3.33 illustrates these steps.
The computation of the group orders #E(F_q) takes up most of the execution time of the above algorithm. It is, therefore, of utmost importance to employ good algorithms for point counting. The best algorithms known to date (the SEA and the Satoh–FGH algorithms) are only reasonably practical. Further research in this area may lead to better algorithms in future.
|
Input: A suitably large finite field Output: A cryptographically good elliptic curve E over Steps: while (1) { |
There are ways of generating good curves without requiring the point-counting algorithms over large finite fields. One possibility is to use the so-called subfield curves. If F_q has a subfield F_q′ of relatively small cardinality, one can choose a random curve E over F_q′ and compute #E(F_q′). Since E is also a curve defined over F_q and #E(F_q) can be easily obtained from #E(F_q′) using Theorem 2.51 (p 107), we save the lengthy direct computation of #E(F_q). However, the drawback of this method is that since E is now chosen with coefficients from a small field F_q′, we do not have many choices. The second drawback is that we must have a small divisor q′ of q. If q is already a prime, this strategy does not work at all. If q = p^n, p a small prime, we need n to have a small divisor n′ that corresponds to q′ = p^n′. Sometimes small odd primes p are suggested, but the arithmetic in a non-prime field of some odd characteristic is inherently much slower than that in a field of nearly equal size but of characteristic 2.
Specific curves with complex multiplication (Definition 3.7) over large prime fields have also been suggested in the literature. Finding good curves with complex multiplication involves less computational overhead than Algorithm 3.33, but (like subfield curves) offers limited choice. However, it is important to mention that no special attacks are currently known for subfield curves and also for those chosen by the complex multiplication strategy.
Let K = F_q be a finite field and C a hyperelliptic curve of genus g defined over K by Equation (2.13), that is, by

C : Y^2 + u(X)Y = v(X)

for suitable polynomials u, v ∈ K[X]. We want to implement the arithmetic in the Jacobian J_C(K). Recall from Section 2.12 that an element of J_C(K) can be represented uniquely as a reduced divisor Div(a, b) for a pair of polynomials a(X), b(X) ∈ K[X] with a monic, deg_X a ≤ g, deg_X b < deg_X a and a | (b^2 + bu – v). Thus, each element of J_C(K) requires O(g log q) storage.
We first present Algorithm 3.34 that, given two elements Div(a_1, b_1), Div(a_2, b_2) of J_C(K), computes the reduced divisor Div(a, b) which satisfies Div(a, b) ~ Div(a_1, b_1) + Div(a_2, b_2). The algorithm proceeds in two steps:
Compute a semi-reduced divisor Div(a′, b′) ~ Div(a1, b1) + Div(a2, b2).
Compute the reduced divisor Div(a, b) ~ Div(a′, b′).
Both these steps can be performed in (deterministic) polynomial time (in the input size, that is, g log q). Algorithm 3.34 implements the first step and continues to work even when the input divisors are semi-reduced (and not completely reduced).
|
Input: (Semi-)reduced divisors Div(a1, b1) and Div(a2, b2) defined over K. Output: A semi-reduced divisor Div(a′, b′) ~ Div(a1, b1) + Div(a2, b2). Steps:
|
It is easy to check that the two expressions appearing between pairs of big parentheses in Algorithm 3.34 are polynomials. This algorithm does only a few gcd calculations and some elementary arithmetic operations on polynomials in K[X]. If the input polynomials (a_1, a_2, b_1, b_2) correspond to reduced divisors, then their degrees are ≤ g and hence this algorithm runs in polynomial time in the input size. Furthermore, in that case, the output polynomials a′ and b′ are of degrees ≤ 2g.
We now want to compute the unique reduced divisor Div(a, b) equivalent to the semi-reduced divisor Div(a′, b′). This can be performed using Algorithm 3.35. If the degrees of the input polynomials a′ and b′ are O(g) (as is the case with those output by Algorithm 3.34), Algorithm 3.35 takes a time polynomial in g log q. To sum up, two elements of J_C(K) can be added in polynomial time. The correctness of the two algorithms is not difficult to establish, but the proof is long and involved and hence omitted. Interested readers might look at the appendix of Koblitz’s book [154].
For an element α ∈ J_C(K) and n ∈ ℕ, one can easily write an algorithm (similar to Algorithm 3.9) to compute nα using O(log n) additions and doublings in J_C(K).
For a hyperelliptic curve C of genus g defined over a finite field F_q, we are interested in the order of the Jacobian J_C(F_q) rather than in the cardinality of the curve itself. Algorithmic and implementational studies of counting #J_C(F_q) have not received enough research attention to date, and though polynomial-time algorithms are known to this effect (at least for curves of small genus), these algorithms are far from practical for hyperelliptic curves of cryptographic sizes. In this section, we look at some of these algorithms.
|
Input: A semi-reduced divisor Div(a′, b′) defined over K. Output: The reduced divisor Div(a, b) ~ Div(a′, b′). Steps: (a, b) := (a′, b′). |
We start with some theoretical results which are generalizations of those for elliptic curves. The Frobenius endomorphism φ, x ↦ x^q, is a (non-trivial) F_q-automorphism of the algebraic closure of F_q. The map φ naturally (that is, coordinate-wise) extends to the points on C and also to divisors and, in particular, to the Jacobian as well as to J_C(F_q). For a reduced divisor Div(a, b), we have φ(Div(a, b)) = Div(φ(a), φ(b)), where for a polynomial h the polynomial φ(h) is obtained by applying the map φ to the coefficients of h. It is known that φ satisfies a monic polynomial χ(X) of degree 2g with integer coefficients. For example, for g = 1 (elliptic curves) we have
χ(X) = X^2 – tX + q,

where t is the trace of Frobenius at q. For g = 2, we have

Equation 3.7

χ(X) = X^4 – t_1X^3 + t_2X^2 – qt_1X + q^2

for integers t_1, t_2. The cardinality n := #J_C(F_q) is related to the polynomial χ(X) as

n = χ(1)

and satisfies the inequalities

Equation 3.8

(√q – 1)^(2g) ≤ n ≤ (√q + 1)^(2g).

Thus n lies in a rather narrow interval, called the Hasse–Weil interval, of width w := (√q + 1)^(2g) – (√q – 1)^(2g).
Theorem 2.50 can be generalized as follows:

|
The Jacobian |
The exponent of J_C(F_q) (see Exercise 3.42) is clearly m := Exp(J_C(F_q)). Since m | n, there are ≤ ⌈(w + 1)/m⌉ possibilities for n for a given m (where w is the width of the Hasse–Weil interval). In particular, n is uniquely determined by m if m > w. From the Hasse–Weil bound, we have
, that is,
. There are examples with
. On the other hand,
. So it is possible to have m ≤ w, though such curves are relatively rare. In the more frequent case (m > w), Algorithm 3.36 determines n.
|
Input: A hyperelliptic curve C of genus g defined over Output: The cardinality n of the Jacobian Steps: m := 1. |
Since ord x | Exp(J_C(F_q)) for every x, the above algorithm eventually (in practice, after a few executions of the while loop) computes this exponent. However, if Exp(J_C(F_q)) ≤ w, the algorithm never terminates. Thus, we may forcibly terminate the algorithm by reporting failure after sufficiently many random elements x are tried (while we continue to have m ≤ w). In order to complete the description of the algorithm, we must specify a strategy to compute ν := ord x for a randomly chosen x ∈ J_C(F_q). Instead of computing ν directly, we compute an (integral) multiple μ of ν, factorize μ and then determine ν. Since nx = 0, we search for a desired multiple μ in the Hasse–Weil interval. This search can be carried out using a baby-step–giant-step (Section 4.4) or a birthday-paradox (Exercise 2.172) method, and the algorithm achieves an expected running time which is exponential in the input size. This method, therefore, cannot be used except when n is small.
For hyperelliptic curves of small genus g, generalizations of Schoof’s algorithm (Algorithm 3.31) can be used. Gaudry and Harley [106] describe the case g = 2. One computes the polynomial χ(X) of Equation (3.7), that is, the values of t_1 and t_2, modulo sufficiently many small primes l. Since the roots of χ(X) are of absolute value √q, we have |t_1| ≤ 4√q and |t_2| ≤ 6q. Therefore, determination of t_1 and t_2 modulo O(log q) small primes l uniquely determines χ(X) (as well as n = χ(1)).
Let J_C[l] be the set of l-torsion points of the Jacobian. The Frobenius map restricted to J_C[l] satisfies

Equation 3.9

φ^4 – t_(1,l)φ^3 + t_(2,l)φ^2 – q_l t_(1,l)φ + q_l^2 = 0,

where t_(1,l) := t_1 rem l, t_(2,l) := t_2 rem l and q_l := q rem l. By exhaustively trying all (that is, ≤ l^2) possibilities for t_(1,l) and t_(2,l), one can find their actual values, that is, those values that cause the left side of Equation (3.9) to vanish (symbolically).
A result by Kampkötter [144] allows us to consider only the reduced divisors D ∈ J_C[l] of the form D = Div(a, b) with a(X) = X^2 + a_1X + a_0 and b(X) = b_1X + b_0. There exists an ideal I_l of the polynomial ring in the four indeterminates standing for the coefficients a_1, a_0, b_1, b_0 such that a reduced divisor D of this special form lies in J_C[l] if and only if f(a_1, a_0, b_1, b_0) = 0 for all f ∈ I_l. Thus the computation of the left side of Equation (3.9) may be carried out in the quotient of this polynomial ring modulo I_l. An explicit set of generators for I_l can be found in Kampkötter [144]. To sum up, we get a polynomial-time algorithm.
Working (modulo the ideal I_l) in the 4-variate polynomial ring is, indeed, expensive. Use of Cantor’s division polynomials [43] essentially reduces the arithmetic to a single variable (instead of four). We do not explore further along this line, but only mention that for g = 2 Schoof’s algorithm employing division polynomials runs in time O(log^9 q). Although this is a theoretical breakthrough, the prohibitively large exponent (9) in the running time precludes the feasibility of using the algorithm in the range of interest in cryptography.
| 3.42 | Let G be a multiplicative group (not necessarily Abelian and/or finite) with identity e.
Let
|
So far we have met several situations where we needed random elements from a (finite) set S, for example, the set Z_n (or Z_n^*), the set F_q (or F_q^*), or the set of F_q-rational points on an elliptic (or hyperelliptic) curve. By randomness, we here mean that each element of S is equally likely to get selected, that is, if #S = n, then each element of S is selected with probability 1/n. Since elements of a set S of cardinality n can be represented as bit strings of length ≤ ⌈lg(n + 1)⌉, the problem of selecting a random element of S essentially reduces to the problem of generating (finite) random sequences of bits. A random sequence of bits is one in which every bit has a probability of 1/2 of being either 0 or 1 (irrespective of the other bits in the sequence).
Generating a (truly) random sequence of bits seems to be an impossible task. Some natural phenomena, such as electronic noise from a specifically designed integrated circuit, can be used to generate random bit sequences. However, such systems are prone to malfunctioning, are often influenced by observation and are, of course, costly. A software solution is definitely the more practical alternative. Phenomena that can be captured by programs, like the system clock or the work load or memory usage of a machine, may be used to generate random bit sequences. But this strategy also suffers from various drawbacks. First of all, the sequences generated by these methods are not (truly) random. Moreover, they are vulnerable to attacks by adversaries (for example, if a random bit generator is based on the system clock and if the adversary knows the approximate time when a bit sequence is generated using that generator, she will have to try only a few possibilities to generate the same sequence).
In order to obviate these difficulties, pseudorandom bit generators (PRBG) are commonly used. A bit string a0a1a2 . . . is generated by a PRBG following a specific strategy, which is more often than not a (mathematical) algorithm. The first bit a0 is based on a certain initial value, called a seed, whereas for i ≥ 1 the bit ai is generated as a predetermined function of some or all of the previous bits a0, . . . , ai–1. Since the resulting bit ai is now functionally dependent on the previous bits, the sequence is not at all random (but deterministic); still, we are happy if the sequence a0a1a2 . . . looks or behaves random. The random behaviour of a sequence is often examined by certain well-known statistical tests. If a generator produces bit sequences that pass these tests, we call it a PRBG and call the sequences available from such a generator pseudorandom bit sequences. Various kinds of PRBGs are used for generating pseudorandom bit sequences. We won’t describe them here, but concentrate on a particular kind of generator that has a special significance in cryptography.
A PRBG is called a cryptographically strong (or secure) pseudorandom bit generator, or a CSPRBG in short, if no polynomial-time algorithm exists (provably or otherwise) that predicts a bit in a generated sequence with probability significantly larger than 1/2 from a knowledge of the previous bits (but without a knowledge of the seed). Usually, an intractable computational problem (see Section 4.2) is at the heart of the security of a CSPRBG. As an example, we now explain the Blum–Blum–Shub (or BBS) generator.
| Algorithm 3.37 | The Blum–Blum–Shub (BBS) pseudorandom bit generator
Input: A seed s.
Output: A cryptographically strong pseudorandom bit sequence a0a1a2 . . . .
Steps:
Generate two (distinct) large primes p and q each ≡ 3 (mod 4), and compute n = pq.
Choose the seed s with gcd(s, n) = 1, and set x0 = s^2 mod n.
For i = 0, 1, 2, . . . , output the bit ai = xi mod 2 and compute xi+1 = xi^2 mod n. |
In Algorithm 3.37, we have used indices for the sequence xi for the sake of clarity. In an actual implementation, all indices may be removed, that is, one may use a single variable x to store and update the sequence xi. Furthermore, if there is no harm in altering the value of s, one might even use the same variable for s and x.
The cryptographic security of the BBS generator stems from the presumed intractability of factoring integers or of computing square roots modulo a composite integer (here n = pq) (see Exercise 3.43). Note that p, q and s have to be kept secret, whereas n can be made public. A knowledge of xm+1 is also not expected to help an opponent and may likewise be made public. For achieving the desired level of secrecy, p and q should be of nearly equal size and the size of n should be sufficiently large (say, 768 bits or more). Generating each bit by the BBS generator involves a modular squaring and is, therefore, somewhat slow (compared to traditional PRBGs, which do not guarantee cryptographic security). However, the BBS generator can be used for moderately infrequent purposes, for example, for the generation of a session key. Moreover, a maximum of lg lg n (least significant) bits (instead of 1 as in the above algorithm) can be extracted from each xi without degrading the security of the generator.
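The BBS construction (square modulo n = pq, emit the least significant bit, repeat) can be sketched in a few lines. The primes and seed below are toy values chosen purely for illustration; real use requires p and q of hundreds of bits each.

```python
def bbs_bits(p, q, s, m):
    """Generate m pseudorandom bits with the Blum-Blum-Shub construction:
    x_{i+1} = x_i^2 mod n, outputting the least significant bit of each x_i.
    Requires p, q prime with p ≡ q ≡ 3 (mod 4), and gcd(s, n) = 1."""
    assert p % 4 == 3 and q % 4 == 3
    n = p * q
    x = (s * s) % n                 # x_0 = s^2 mod n
    out = []
    for _ in range(m):
        out.append(x & 1)           # emit the least significant bit of x_i
        x = (x * x) % n             # one modular squaring per output bit
    return out

# Toy parameters only: 499 ≡ 547 ≡ 3 (mod 4), and gcd(159201, 499*547) = 1.
bits = bbs_bits(499, 547, 159201, 16)
```

As remarked above, a single variable x suffices for the whole sequence; no index bookkeeping is needed.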
It is evident that any (infinite) sequence a0a1 · · · generated by the BBS generator must be periodic. As an extreme example, if s = 1, then the BBS generator outputs a sequence of one-bits only. We are interested in rather short (sub)sequences (of such infinite sequences). Therefore, it suffices if the length of the period is reasonably large (for a random seed s). This is guaranteed if one uses strong primes (Definition 3.5).
The way we have defined a PRBG (or CSPRBG) makes it evident that the unpredictability of a pseudorandom bit sequence essentially reduces to that of the seed. Care should, therefore, be taken in choosing the value of the seed. The seed need not be randomly or pseudorandomly generated, but should have a high degree of unpredictability, so that it is infeasible for an adversary to make a reasonably quick guess of it. As an example, assume that we intend to generate a suitable seed s for the BBS generator with a 1024-bit modulus n. If we employ for that purpose a specific algorithm (known to the opponent) using only the built-in random number generator of a standard compiler and if this built-in generator has a 32-bit seed σ, then there are only 2^32 possibilities for s, even when s itself is 1024 bits long. Thus an adversary has to try at most 2^32 (2^31 on an average) values of σ in order to guess the correct value of s. So we must add further unpredictability to the resulting seed value s. This can be done by setting the bits of s depending on several factors, like the system clock, the system load, the memory usage, keyboard inputs from a human user and so on. Each such factor might not be individually completely unpredictable, but their combined effect should preclude the feasibility of an exhaustive search by the opponent. After all, we have 1024 bits of s to fill up, and even if the total search space of possible values of s is as low as 2^160, it would be impossible for the opponent to guess s in a reasonable span of time. Note that more often than not the value of the seed need not be remembered, that is, need not be regenerated afterwards. As a result, there is no harm in introducing unpredictability in s caused by certain factors that we would not ourselves be able to reproduce in future.
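A common way to combine several individually weak sources into seed material is to hash them together. The sketch below is one illustration of this idea; the particular sources (high-resolution clock, process identity, OS entropy) and the use of SHA-256 as the mixer are our choices for the example, not a prescription from the text.

```python
import hashlib
import os
import time

def gather_seed(nbits=1024):
    """Derive nbits of seed material by hashing together several
    hard-to-reproduce quantities.  No single source need be fully
    unpredictable; the hash mixes their combined entropy."""
    h = hashlib.sha256()
    h.update(str(time.time_ns()).encode())   # high-resolution clock
    h.update(str(os.getpid()).encode())      # process identity
    h.update(os.urandom(32))                 # OS-provided entropy
    material = b""
    counter = 0
    while len(material) * 8 < nbits:
        # Expand the mixed state by hashing it with a running counter.
        material += hashlib.sha256(h.digest() + counter.to_bytes(4, "big")).digest()
        counter += 1
    # Keep exactly nbits of the expanded material.
    return int.from_bytes(material, "big") >> (len(material) * 8 - nbits)

s = gather_seed(1024)
```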
| 3.43 | With the notations of Algorithm 3.37 show that:
|
This chapter deals with the algorithmic details needed for setting up public-key cryptosystems. We study algorithms for selecting public-key parameters and for carrying out the basic cryptographic primitives. Algorithms required for cryptanalysis are dealt with in Chapters 4 and 7.
We start the chapter with a discussion on algorithms. Time and space complexities of algorithms are discussed first and the standard order notations are explained. Next we study the class of randomized algorithms, which provide practical solutions to many computational problems that have no known efficient deterministic algorithms. In the worst case, a randomized algorithm may take exponential running time and/or may output an incorrect answer. However, the probability of these bad behaviours of a randomized algorithm can be made arbitrarily low. We finally discuss reductions between computational problems. A reduction lets us relate the complexity of one problem to that of another.
Many popular public-key cryptosystems are based on arithmetic modulo big integers. These integers have sizes up to several thousand bits. One cannot represent such integers with full precision using the built-in data types supplied by common programming languages. So we require efficient ways of representing and doing arithmetic on big integers. We carefully deal with the implementation of arithmetic on multiple-precision integers. We provide a special treatment of the computation of gcds and extended gcds of integers. We utilize these arithmetic functions in order to implement modular arithmetic. Most public-key primitives involve modular exponentiations as their most time-consuming steps. In addition to the standard square-and-multiply algorithm, certain special tricks (including Montgomery exponentiation) that help speed up modular exponentiation are described at length.
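The square-and-multiply idea mentioned here fits in a few lines. The sketch below is the left-to-right binary variant: scan the exponent bits from the most significant end, square at every step, and multiply whenever the current bit is 1 (Python's built-in pow does the same job with further optimizations).

```python
def mod_exp(a, e, n):
    """Compute a^e mod n by left-to-right square-and-multiply."""
    result = 1
    for bit in bin(e)[2:]:               # bits of e, most significant first
        result = (result * result) % n   # square at every bit
        if bit == "1":
            result = (result * a) % n    # multiply only when the bit is 1
    return result
```

This uses about lg e squarings and at most lg e multiplications, instead of the e − 1 multiplications of the naive method.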
In the next section, we deal with some other number-theoretic algorithms. One important topic is the determination of whether a given integer is prime. The Miller–Rabin primality test is an efficient algorithm for primality testing. This algorithm is, however, randomized in the sense that it may declare some composite integers to be prime. With suitable choices of the relevant parameters, the probability of this error can be reduced to very low values (≤ 2^–80). We also briefly introduce the deterministic polynomial-time AKS algorithm for primality testing. Since we can easily check integers for primality, we can generate random primes by essentially searching in a pool of randomly generated odd integers of a given size. Security in some cryptosystems requires such random primes to possess special properties. We present Gordon’s algorithm for generating cryptographically strong primes. The section ends with a study of the Tonelli–Shanks algorithm for computing square roots modulo a big prime.
Next, we concentrate on the implementation of finite field arithmetic. The arithmetic of a field of prime cardinality p is the same as integer arithmetic modulo p and is discussed in detail earlier. The other finite fields that are of interest to cryptology are extension fields of characteristic 2. In order to study the arithmetic of these fields, one first requires the arithmetic of the polynomial ring F2[X]. We discuss the basic operations in this ring. Next we talk about algorithms for checking irreducibility of polynomials and for obtaining (random) irreducible polynomials in F2[X]. If f(X) is such a polynomial of degree d, the arithmetic of the field F2^d is the same as the arithmetic of F2[X] modulo the defining polynomial f(X). In order that a finite field Fq is cryptographically safe, we require q – 1 to have a prime factor of sufficiently big size (160 bits or more). Suppose that the factorization of q – 1 is provided. We discuss algorithms that compute the order of elements of Fq*, that check if a given element is a generator of the cyclic group Fq*, and that produce random generators of Fq*. We end the study of finite fields by discussing a way to factor polynomials over finite fields. The standard algorithm comprising the three steps square-free factorization, distinct-degree factorization and equal-degree factorization is explained in detail. The exercises cover the details of an algorithm to compute the roots of polynomials over finite fields.
The arithmetic of elliptic curves over finite fields is dealt with next. Each operation in the elliptic curve group can be realized by a sequence of operations over the underlying field. The multiple of a point on an elliptic curve can be computed by a repeated double-and-add algorithm which is the same as the square-and-multiply algorithm for modular exponentiation, applied to an additive setting. We also discuss ways of selecting random points on elliptic curves. We then present two algorithms for counting points in an elliptic curve group. The SEA algorithm is suitable for curves over prime fields, whereas the Satoh–FGH algorithm works efficiently for curves over fields of characteristic 2. Once we can determine the order of an elliptic curve group, we can choose good elliptic curves for cryptographic usage.
In the next section, we study the arithmetic of hyperelliptic curves. We describe ways to represent elements of the Jacobian by pairs of polynomials and to do arithmetic on elements in this representation. We also discuss two algorithms for counting points in a Jacobian.
In the last section, we address the issue of generation of pseudorandom bits. We define the concept of cryptographically strong pseudorandom bit generator and provide an example, namely the Blum–Blum–Shub generator, which is cryptographically strong under the assumption that taking square roots modulo a big composite integer is computationally intractable.
The basic algorithmic issues discussed in Section 3.2 can be found in any textbook on data structures and algorithms. One can, for example, look at [7, 8, 61]. However, most of these elementary books do not talk about randomization and parallelization issues. We refer to [214] for a recent treatise on randomized algorithms. Also see Rabin’s papers [247, 248].
Complexity theory deals with classifying computational problems based on the known algorithms for solving them and on reduction of one problem to another. A simple introduction to complexity theory is the book [280] by Sipser. Chapter 2 of Koblitz’s book [154] is also a compact introduction to computational complexity meant for cryptographers. Also see [113].
Knuth’s book [147] is seemingly the best resource to look at for a comprehensive treatment of multiple-precision integer arithmetic. The proofs of correctness of many algorithms that we omitted in Section 3.3 can be found in this book. This can be supplemented by the more advanced algorithms and important practical tips compiled in the book [56] by Cohen, who designed a versatile computational number theory package known as PARI. Montgomery’s multiplication algorithm appeared in [210]. Also see Chapter 14 of Menezes et al. [194] for more algorithms and implementation issues.
Most of the important papers on primality testing [3, 4, 5, 116, 175, 204, 248, 287] have been referred to in Section 3.4.1. Also see the survey [164] due to Lenstra and Lenstra. Gordon’s algorithm for generating strong primes appeared in [118]. The book [69] by Crandall and Pomerance is an interesting treatise on prime numbers, written with a computational perspective. The modular square-root Algorithm 3.16 is essentially due to Tonelli (1891). Algebraic number theory is treated from a computational perspective in Cohen [56] and Pohst and Zassenhaus [235].
Arithmetic on finite fields is discussed in many books including [179, 191]. Finite fields find modern applications in cryptography and coding theory, and as such it is necessary to have efficient software and hardware implementations of finite field arithmetic. A huge number of papers have appeared in the last two decades that address these implementation issues. Chapter 5 of Menezes [191] talks about optimal normal bases (Section 2.9.3 of the current book), which speed up exponentiation in finite fields.
Factoring univariate polynomials over finite fields is a topic that has attracted a lot of research attention. Berlekamp’s Q-matrix method [21] is the first modern algorithm for this purpose. Computationally efficient versions of the algorithm discussed in Section 3.5.4 have been presented in Gathen and Shoup [104] and Kaltofen and Shoup [143]. The best-known running time for a deterministic algorithm for univariate factorization over finite fields is due to Shoup [272]. Shparlinski shows [274] that Shoup’s algorithm on a polynomial in Fq[X] of degree d uses O(q^(1/2) (log q) d^(2+ε)) bit operations. This is fully exponential in log q.
The book [103] by von zur Gathen and Gerhard is a detailed treatise on many topics discussed in Sections 3.2 to 3.5 of the current book. Mignotte’s book [203] and the one by [108] by Geddes et al. also have interesting coverage. Also see Chapter 1 of Das [72] for a survey of algorithms for various computational problems on finite fields.
For elliptic curve arithmetic, look at Blake et al. [24], Hankerson et al. [123] and Menezes [192]. The first polynomial-time algorithm for counting points on elliptic curves over a finite field Fq was proposed by Schoof. The original version of this algorithm runs in time O(log^8 q). Later, Elkies improved the running time to O(log^6 q) for most elliptic curves. Further modifications due to Atkin gave rise to what we call the SEA algorithm. Schoof’s paper [264] discusses this point-counting algorithm and includes the modifications due to Elkies and Atkin. Also look at the article [85] by Elkies.
The Satoh–FGH algorithm is originally due to Satoh [256]. Fouquet et al. [94] have proposed a modification of Satoh’s algorithm to work for fields of characteristic 2. They also report large-scale implementations of the modified algorithm. Also see Fouquet et al. [95] and Skjernaa [281].
Recently, there has been a lot of progress in point-counting algorithms, in particular for fields of characteristic 2. The most recent account of this can be found in Lercier and Lubicz [177]. The authors of this paper later reported an implementation of their algorithm for counting points on an elliptic curve over
. This computation took nearly 82 hours on a 731 MHz Alpha EV6 processor. With these new developments, the point counting problem is practically solved for fields of small characteristics. However, for prime fields the known algorithms require further enhancements in order to be useful on a wide scale.
Finding good random elliptic curves for cryptographic purposes has also been an area of active research recently. With the current status of solving the elliptic curve discrete-log problem, the strategy we mentioned in Algorithm 3.33 is quite acceptable as long as good point-counting algorithms are at our disposal (they are now). For further discussions on this topic, we refer the reader to two papers [95, 176].
The appendix in Koblitz’s book [154] is seemingly the best source for learning hyperelliptic curve arithmetic. This is also available as a CACR technical report [195]. Gaudry and Harley’s paper [106] has more on the hyperelliptic curve point-counting algorithms we discussed in Section 3.7.2. Hess et al. [126] discuss methods for computing hyperelliptic curves for cryptographic usage.
Chapter 5 of Menezes et al. [194] is devoted to the generation of pseudorandom bits and sequences. This chapter lists the statistical tests for checking the randomness of a bit sequence. It also describes two cryptographically secure pseudorandom bit generators other than the BBS generator (Algorithm 3.37). The BBS generator was originally proposed by Blum et al. [26]. Also see Chapter 3 of Knuth [147].
| 4.1 | Introduction |
| 4.2 | The Problems at a Glance |
| 4.3 | The Integer Factorization Problem |
| 4.4 | The Finite Field Discrete Logarithm Problem |
| 4.5 | The Elliptic Curve Discrete Logarithm Problem |
| 4.6 | The Hyperelliptic Curve Discrete Logarithm Problem |
| 4.7 | Solving Large Sparse Linear Systems over Finite Rings |
| 4.8 | The Subset Sum Problem |
| Chapter Summary | |
| Suggestions for Further Reading |
It is insufficient to protect ourselves with laws; we need to protect ourselves with mathematics.
—Bruce Schneier
Most number theorists considered the small group of colleagues that occupied themselves with these problems as being inflicted with an incurable but harmless obsession.
—Arjen K. Lenstra and Hendrik W. Lenstra, Jr. [164]
All mathematics is divided into three parts: cryptography (paid for by CIA, KGB and the like), hydrodynamics (supported by manufacturers of atomic submarines) and celestial mechanics (financed by military and other institutions dealing with missiles, such as NASA).
—V. I. Arnold [13]
Public-key cryptographic systems are based on the apparent intractability of solving certain computational problems. However, there is very little evidence (if any) to support the belief that algorithmic solutions to these problems are really very difficult. In spite of intensive studies over a long period, mathematicians and cryptologists have not come up with good algorithms, and it is their failure that justifies the attempts to go on building secure cryptographic protocols based on these problems. The inherent assumption is that it would be infeasible for an opponent with practical amounts of computing resources to break these cryptosystems in a reasonable amount of time. Of course, the fear remains that someone may devise a fast algorithm and our cryptosystems may no longer meet their security guarantees. On the other extreme, it is also possible that someone proves the theoretical (and, hence, practical) impossibility of solving such a problem in a small (like polynomial) amount of time, and our cryptosystems become secure for ever (well, at least until other paradigms of computing, like the yet practically unrealized quantum computing, solve the problems efficiently).
Whether you are a cryptographer or a cryptanalyst, it is important, if not essential, to be aware of the best methods available to date for attacking the intractable problems of cryptography. In the first place, this knowledge quantifies the practical security margins of the protocols, for instance, by dictating the choice of the input sizes as a function of the security requirements. Let us take a specific example: with today’s computing power and known integer factorization algorithms, we assert that a message that needs to be kept secret for a day or two may be encrypted with a 768-bit RSA key, whereas if one wants to maintain the security for a year or more, much longer keys are needed. The second point in studying the known cryptanalytic algorithms is that though general-purpose algorithms for solving these problems are still unknown, there are good algorithms for specific cases, namely the cases to be avoided by the designers of cryptographic applications. For example, there is a linear-time algorithm to attack cryptographic systems based on anomalous elliptic curves. The moral is that one must not employ these curves in cryptographic applications. The third reason for studying cryptanalytic algorithms is sentimental. The fact that we are still unable to answer some simply stated questions even after spending a reasonable amount of collective effort is indeed humbling. To worsen matters, cryptography thrives by exploiting this scientific inadequacy. Cryptanalysis, though seemingly unlawful from a cryptographer’s viewpoint, turns out to be a deep and beautiful area of applied mathematics. Ironically enough, it is quite common that the proponents of cryptographic protocols are themselves the most interested in seeing how the battle ends. The journey goes on. . . Read on!
It may appear somewhat unusual to discuss the cryptanalytic algorithms prior to the cryptographic ones (see Chapter 5). We find this order convenient in that one must first know the intractable problems before applying them in cryptographic protocols. Moreover, the known attacks help one fix the parameters for use in the cryptographic algorithms. We defer till Chapter 7 other cryptanalytic techniques which do not directly involve solving these mathematical problems. The full power of the mathematical machinery of Chapters 2 and 3 is felt here in the science of cryptology. Understanding the various aspects of cryptology hence becomes easier.
Let us first introduce the intractable problems of cryptology. In the rest of this chapter, we describe some known methods to solve these problems.
The integer factorization problem (IFP) is perhaps the most studied one in the lot. We know that ℤ is a unique factorization domain (UFD) (Definition 2.25, p 40), that is, given a natural number n there are (pairwise distinct) primes p1, . . . , pk (unique up to rearrangement) such that n = p1^α1 · · · pk^αk for some α1, . . . , αk ∈ ℕ. Broadly speaking, the IFP is the determination of these pi and αi from the knowledge of n. Note that once the prime divisors pi of n are known, it is rather easy to compute the multiplicities αi = vpi(n) by trial divisions. It is, therefore, sufficient to find the primes pi only. It is easy (Algorithm 3.13) to check if n is composite. If n is already prime, then its prime factorization is known. On the other hand, if n is known to be composite, an algorithm that splits n into two non-trivial factors, that is, that outputs n1, n2 with n = n1n2, n1 < n and n2 < n, can be used repeatedly to compute the complete factorization of n. It is enough that a non-trivial factor n1 of n is made available; the cofactor n2 = n/n1 is then obtained by a single division. Finally, it is sometimes known a priori that n is the product of two (distinct odd) primes (as in the RSA protocols). In this case, a non-trivial split of n immediately gives the desired factorization of n. To sum up, the IFP can be stated in various versions, the presumed difficulty of all these versions being essentially the same.
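The reduction just described (split repeatedly until all pieces are prime) can be sketched as follows. The splitter shown is plain trial division, standing in for any algorithm that returns a non-trivial factor of a composite number, and is_prime is a naive test standing in for a proper primality test such as Algorithm 3.13.

```python
def is_prime(n):
    # Naive stand-in for a real primality test (e.g. Miller-Rabin).
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def split(n):
    # Stand-in splitter: returns a non-trivial factor of composite n.
    d = 2
    while n % d != 0:
        d += 1
    return d

def factorize(n):
    """Complete prime factorization of n >= 2 by repeated splitting."""
    if is_prime(n):
        return [n]
    n1 = split(n)                 # non-trivial factor; cofactor by one division
    return sorted(factorize(n1) + factorize(n // n1))

# factorize(1212) -> [2, 2, 3, 101]
```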
| Problem 4.1 | General integer factorization problem Given an integer n > 1, determine the complete prime factorization n = p1^α1 · · · pk^αk of n. |
| Problem 4.2 | Integer factorization problem (IFP) Given a composite integer n, find a non-trivial factor n1 of n (that is, an n1 with 1 < n1 < n and n1 | n). |
| Problem 4.3 | RSA integer factorization problem Given a product n = pq of two (distinct odd) primes p and q, find the prime divisors p and q of n. Recall that if n = p1^α1 · · · pk^αk is the prime factorization of n, then the Euler totient function of n is φ(n) = p1^(α1–1)(p1 – 1) · · · pk^(αk–1)(pk – 1). Thus, if the prime factorization of n is known, it is easy to compute φ(n). The converse is not known to be true in general. However, if n = pq is the product of two primes, factoring n is polynomial-time equivalent to computing φ(n) (Exercise 3.6).
|
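One direction of the equivalence just mentioned admits a short worked example: for n = pq we have φ(n) = (p − 1)(q − 1), so p + q = n − φ(n) + 1, and p and q are the roots of X^2 − (p + q)X + n. The sketch below recovers them with an integer square root.

```python
from math import isqrt

def pq_from_phi(n, phi):
    """Recover the prime factors of n = pq from n and phi(n) = (p-1)(q-1)."""
    s = n - phi + 1                  # s = p + q
    d = isqrt(s * s - 4 * n)         # d = |p - q|, since (p-q)^2 = s^2 - 4n
    p, q = (s - d) // 2, (s + d) // 2
    assert p * q == n                # sanity check on the recovered factors
    return p, q

# Example: n = 11 * 19 = 209, phi(n) = 10 * 18 = 180.
# pq_from_phi(209, 180) -> (11, 19)
```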
| Problem 4.4 | Totient problem Given a natural number n, compute φ(n). |
| Problem 4.5 | RSA totient problem Given a product n = pq of two (distinct odd) primes p and q, compute φ(n). Note that ℚ[X] is also a UFD. Quite interestingly, it is computationally easy to find a non-trivial factor g of a polynomial f ∈ ℚ[X] (that is, a factor with 0 < deg g < deg f). One might, for example, use the polynomial-time deterministic L3 algorithm named after Lenstra, Lenstra and Lovász (Section 4.8.2). |
Square roots modulo an integer | ||||||||
| Problem 4.6 | Modular square root problem (SQRTP) Given a composite integer n and a quadratic residue a modulo n, compute an x with x^2 ≡ a (mod n). Now let G be a finite cyclic group of order n with a generator g. Every a ∈ G can be written as a = g^x for some integer x unique modulo n. In this case, x is called the discrete logarithm or the index of a with respect to the base g and is denoted by indg a.
|
| Problem 4.7 | Discrete logarithm problem (DLP) Given a finite cyclic group G, a generator g of G and an element a ∈ G, compute indg a. More generally, let G be a finite Abelian group and g ∈ G. The subgroup H := ⟨g⟩ generated by g is anyway cyclic. If a ∈ H, then the discrete logarithm or index of a with respect to the base g is an integer x unique modulo m := ord H such that a = g^x. In this case, we denote such an integer x by indg a. On the other hand, if a ∉ H, then we say that the discrete logarithm indg a is not defined. Recall from Proposition 2.5 that if G is cyclic and if m is known, then checking if a belongs to H amounts to computing an exponentiation in G (that is, a ∈ H if and only if a^m is the identity of G). If G is not cyclic (or if m is not known), then it is not easy, in general, to develop such a nice criterion.
|
| Problem 4.8 | Generalized discrete logarithm problem (GDLP) Given a finite Abelian group G and elements g, a ∈ G, determine whether indg a is defined and, if so, compute it. Note that if G is the additive group ℤn and g is an integer with gcd(g, n) = 1, then for every integer a we have indg a ≡ g^(–1)a (mod n), where the modular inverse g^(–1) (mod n) can be computed efficiently using the extended gcd algorithm (Algorithm 3.8) on g and n. Also note that if G is cyclic and if each element a of G is represented as indg a for a given generator g of G (see, for example, Section 2.9.3), then computing discrete logarithms in G to the base g is a trivial problem. In that case, it is also trivial to compute discrete logarithms (if existent) to any other base h (Exercise 4.3).
|
On the other hand, there are certain groups G in which discrete logarithms cannot be computed so easily; that is, computing indices in G may demand time not bounded by any polynomial in log n, where n = ord G. However, if the group operation on any two elements of G can be performed in time bounded by a polynomial in log n, then cryptographic protocols can be based on G. Typical candidates for such groups are listed below together with the conventional names for the DLP over such groups.
| The multiplicative group Fq* of a finite field | Finite field discrete logarithm problem (DLP) |
| The group E(Fq) of rational points on an elliptic curve | Elliptic curve discrete logarithm problem (ECDLP) |
| The Jacobian of a hyperelliptic curve over Fq | Hyperelliptic curve discrete logarithm problem (HECDLP) |
Note that if we are interested in computing indices to a base g that is not necessarily a generator, we are in the setting of the GDLP (Problem 4.8). Another problem that is widely believed to be computationally equivalent to the DLP (at least for the groups mentioned in the above table) is called the Diffie–Hellman problem (DHP). Similar to the DLP, the DHP is presumably difficult to solve for these groups. |
| Problem 4.9 | Diffie–Hellman problem (DHP) Let G be a multiplicative group and let g ∈ G. Given the elements g, g^x and g^y (but not the exponents x and y), compute g^(xy). There are some other difficult problems on which cryptographic systems can be built. Problem 4.10 deserves specific mention in this regard. |
| Problem 4.10 | Subset sum problem (SSP) Given a set A := {a1, . . . , an} of natural numbers and a natural number s, determine whether there is a subset of A whose elements sum to s. Some of the early cryptographic systems based on the SSP have succumbed to efficient (even polynomial-time) cryptanalytic attacks. However, some schemes proposed in recent years seem to be resistant to such attacks, or, in other words, good attacks on them are not yet known. As a result, it is important to study the SSP in some detail. The SSP is often mapped to problems on lattices. Let v1, . . . , vn be linearly independent vectors in ℝ^m. The set L of all integer linear combinations of v1, . . . , vn is called the lattice generated by these vectors.
|
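For small instances, the SSP can of course be decided directly. The sketch below uses the standard dynamic-programming set of achievable sums; its cost grows with the target s and the number of elements, so it poses no threat to well-chosen cryptographic parameters.

```python
def subset_sum(a, s):
    """Decide whether some subset of the natural numbers in a sums to s."""
    reachable = {0}                              # sums achievable so far
    for x in a:
        # Either skip x (keep old sums) or include it (shift old sums by x).
        reachable |= {r + x for r in reachable if r + x <= s}
    return s in reachable

# subset_sum([3, 34, 4, 12, 5, 2], 9) -> True  (4 + 5 = 9)
```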
| Problem 4.11 | Shortest vector problem (SVP) Find a non-zero vector in a given lattice L having the smallest (Euclidean) norm. |
| Problem 4.12 | Closest vector problem (CVP) Given a vector u ∈ ℝ^m (not necessarily in a given lattice L), find a vector in L closest to u (with respect to the Euclidean norm). |
| 4.1 |
|
| 4.2 | Show that the following problems are polynomial-time reducible to the IFP.
|
| 4.3 | Let G be a finite cyclic group of order n and let g, g′ be two arbitrary generators of G.
|
| 4.4 | Let G be a finite cyclic multiplicatively written group of order n. An algorithm on G is said to be polynomial-time if it runs in time bounded above by a polynomial function of log n. Assume that the product of any two elements in G can be computed in polynomial time. Recall from Exercise 2.47 that G ≅ ℤn. Show that the computation of an isomorphism f : G → ℤn is polynomial-time equivalent to computing discrete logarithms in G. (That is, assuming that we are given a (two-way) black box that returns in polynomial time f(a) or f^(–1)(x) for every a ∈ G and x ∈ ℤn, discrete logarithms in G can be computed in polynomial time. Conversely, if discrete logarithms with respect to a primitive element can be computed in polynomial time, then such a black box can be realized.)
|
| 4.5 | Let p be an (odd) prime and let g be a primitive root modulo p. Show that an element a ∈ ℤp* is a quadratic residue modulo p if and only if the index indg a is even. Hence, conclude that there is a polynomial-time (in log p) algorithm that computes the least significant bit of indg a, given any a ∈ ℤp*. More generally, let p – 1 = 2^r s, where r ∈ ℕ and s is odd. Show that there exists a polynomial-time algorithm that computes the r least significant bits of indg a given any a ∈ ℤp*. (This exercise shows that the DLP has a polynomial-time solution for Fermat primes Fn := 2^(2^n) + 1. Note that Fn is prime for n = 0, 1, 2, 3, 4. No other Fermat primes are known.)
|
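The first part of Exercise 4.5 can be checked numerically: by Euler's criterion, a is a quadratic residue mod p exactly when a^((p−1)/2) ≡ 1 (mod p), which happens exactly when indg a is even. A small sanity check with p = 19 and primitive root g = 2:

```python
def lsb_of_index(a, p):
    """Least significant bit of indg(a) for a prime p and primitive root g:
    0 iff a is a quadratic residue mod p, by Euler's criterion."""
    return 0 if pow(a, (p - 1) // 2, p) == 1 else 1

p, g = 19, 2                         # 2 is a primitive root modulo 19
for x in range(1, p - 1):
    a = pow(g, x, p)
    assert lsb_of_index(a, p) == x % 2   # bit agrees with parity of the index
```

Note that the computation never needs the index itself, only one modular exponentiation, which is the polynomial-time algorithm the exercise asks for.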
The integer factorization problem (IFP) (Problems 4.1, 4.2 and 4.3) is one of the most easily stated and yet hopelessly difficult computational problems that have attracted researchers’ attention for ages, most notably in the age of electronic computers. A huge number of algorithms, varying widely in basic strategy, mathematical sophistication and implementation intricacy, have been suggested, and, in spite of these, factoring a general integer having only 1000 bits seems to be an impossible task today even using the fastest computers on earth.
It is important to note here that even proving rigorous bounds on the running times of the integer-factoring algorithms is quite often a very difficult task. In many cases, we have to be satisfied with clever heuristic bounds based on one or more reasonable but unprovable assumptions.
This section highlights human achievements in the battle against the IFP. Before going into the details of this account, we want to mention some relevant points. Throughout this section we assume that we want to factor a (positive) integer n. Since such an integer can be represented by ⌈lg(n + 1)⌉ bits, the input size is taken to be lg n (or ln n, or log n). Most modern factorization algorithms take time given by the following subexponential expression in ln n:

L(n, α, c) := exp((c + o(1)) (ln n)^α (ln ln n)^(1–α)),

where 0 < α < 1 and c > 0 are constants. As described in Section 3.2, the smaller the value of α, the closer the expression L(n, α, c) is to a polynomial expression (in ln n). If n is understood from the context, we write L(α, c) in place of L(n, α, c). Although the current best-known algorithms correspond to α = 1/3, the algorithms with α = 1/2 are also quite interesting. In this case, we use the shorter notation L[c] := L(1/2, c).
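To get a feel for these quantities, one can evaluate L(n, α, c) numerically. The figures below are back-of-the-envelope estimates (the o(1) term is dropped, and the constant 1.92 for the α = 1/3 case is the one commonly quoted for the number field sieve), showing the gap between α = 1/2 and α = 1/3 for a 1024-bit n.

```python
from math import exp, log

def L(n, alpha, c):
    """Subexponential expression L(n, alpha, c) with the o(1) term dropped."""
    ln_n = log(n)
    return exp(c * ln_n ** alpha * log(ln_n) ** (1 - alpha))

n = 2 ** 1024
cost_half = log(L(n, 1 / 2, 1), 2)      # roughly 2^98 operations
cost_third = log(L(n, 1 / 3, 1.92), 2)  # roughly 2^87 operations
```

Both values dwarf any polynomial in lg n = 1024, yet remain far below the 2^512 of naive trial division.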
Henceforth we will use, without explicit mention, the notation q1 := 2, q2 := 3, q3 := 5, . . . to denote the sequence of primes. The concept of qt-smoothness (for some t ∈ ℕ) will often be referred to as B-smoothness, where B = {q1, . . . , qt}. Recall from Theorem 2.21 that smaller integers have a higher probability of being B-smooth for a given B. This observation plays an important role in designing integer-factoring algorithms. The following special case of Theorem 2.21 is often useful.
Before any attempt at factoring n is made, it is worthwhile to check n for primality. Since probabilistic primality tests (like Algorithm 3.13) are quite efficient, we should first run one such test to make sure that n is really composite. Henceforth, we will assume that n is known to be composite.
“Factoring in the dark ages” (a phrase attributed to Hendrik Lenstra) relied on fully exponential algorithms, some of which are discussed now. Though the worst-case performances of these algorithms are quite poor, there are many situations in which they might factor even a large integer quite fast. It is, therefore, worthwhile to spend some time on these algorithms.
A composite integer n admits a factor ≤ √n that can be found by trial divisions of n by integers ≤ √n. This demands O(√n) trial divisions and is clearly impractical, even when n contains only 30 decimal digits. It is also true that n has a prime divisor ≤ √n, so it suffices to carry out trial divisions by primes only. Though this modified strategy saves us many unnecessary divisions, the asymptotic complexity does not reduce much, since by the prime number theorem the number of primes ≤ √n is about 2√n/ln n. In addition, we need to have a list of primes ≤ √n or generate the primes on the fly, neither of which is really practical. A trade-off can be made by noting that an integer m ≥ 30 cannot be prime unless m ≡ 1, 7, 11, 13, 17, 19, 23, 29 (mod 30). This means that we need to perform the trial divisions only by those integers m congruent to one of these values modulo 30, and this reduces the number of trial divisions to about 8/30 ≈ 27 per cent of the original. Though trial division is not a practical general-purpose algorithm for factoring large integers, we recommend extracting all the small prime factors of n, if any, by dividing n by a predetermined set {q1, . . . , qt} of small primes. If n is indeed qt-smooth or has all prime factors ≤ qt except only one, then the trial division method completely factors n quite fast. Even when n is not of this type, trial division might reduce its size, so that other algorithms run somewhat more efficiently.
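The mod-30 wheel just described combines naturally with a small-primes pass. The following sketch (the function name is ours) returns the extracted small factors together with the unfactored cofactor:

```python
def trial_division(n, bound):
    """Strip all prime factors <= bound from n by trial division.
    Candidates > 5 are generated by the mod-30 wheel, i.e., only the
    residues 1, 7, 11, 13, 17, 19, 23, 29 modulo 30 are tried."""
    factors = []
    for d in (2, 3, 5):
        while n % d == 0:
            factors.append(d)
            n //= d
    # gaps between 7, 11, 13, 17, 19, 23, 29, 31, 37, ... (period 8)
    gaps = (4, 2, 4, 2, 4, 6, 2, 6)
    d, i = 7, 0
    while d <= bound and d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += gaps[i]
        i = (i + 1) % 8
    return factors, n
```

For example, trial_division(2**4 * 3 * 10403, 100) peels off the small factors and leaves the cofactor 10403 = 101 · 103 for a heavier algorithm.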
Pollard’s rho method solves the IFP in an expected Õ(n^{1/4}) time and is based on the birthday paradox (Exercise 2.172).
Let p be an (unknown) prime divisor of n, and let f : Z_n → Z_n be a random map. We start with an initial value x0 ∈ Z_n and generate a sequence x_{i+1} = f(x_i), i ≥ 0, of elements of Z_n. Let y_i denote the smallest non-negative integer satisfying y_i ≡ x_i (mod p). By the birthday paradox, after t = O(√p) iterates x1, . . . , xt are generated, we have a high chance that y_i = y_j, that is, x_i ≡ x_j (mod p), for some 1 ≤ i < j ≤ t. This means that p | (x_i – x_j), and computing gcd(x_i – x_j, n) splits n into two non-trivial factors with high probability. The method fails if this gcd is n. For a random n, this incident of having a gcd equal to n is of very low probability.
Algorithm 4.1 gives a specific implementation of this method. Computing gcds for all the pairs (x_i – x_j, n) is a massive investment of time. Instead we store (in the variable ξ) the values x_r, r = 2^t, for t = 0, 1, 2, . . . , and compute only gcd(x_{r+s} – x_r, n) for s = 1, . . . , r. Since the sequence y_i, i ≥ 0, is ultimately periodic with an expected period length τ = O(√p), we eventually reach a t with r = 2^t ≥ τ. In that case, the for loop detects a match. Typically, the update function f is taken to be f(x) = x^2 – 1 (mod n), which, though not a random function, behaves like one. Note that the iterates y_i, i ≥ 0, may be visualized as being located on the Greek letter ρ as shown in Figure 4.1 (with a tail of the first μ iterates followed by a cycle of length τ). This is how this method derives its name.

Algorithm 4.1 takes an expected running time of Õ(√p). Since p ≤ √n, Pollard’s rho method runs in expected time Õ(n^{1/4}).
Algorithm 4.1: Pollard’s rho method
Input: A composite integer n.
Output: A non-trivial factor of n.
Steps:
Choose a random element x ∈ Z_n. Set ξ := x and r := 1.
while (1) {
    for s = 1, . . . , r {
        x := x^2 – 1 (mod n).
        d := gcd(x – ξ, n).
        if (1 < d < n) { Return d. }
    }
    ξ := x. r := 2r.
}
Many modifications of Pollard’s rho method have been proposed in the literature. Perhaps the most notable one is an idea due to R. P. Brent. All these modifications considerably speed up Algorithm 4.1, though leaving the complexity essentially the same, that is, Õ(n^{1/4}). We will not describe these modifications in this book.
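The doubling strategy of Algorithm 4.1 admits a compact rendering, sketched below (variable names are ours; a production version would batch the gcd computations):

```python
from math import gcd

def pollard_rho(n, x0=2):
    """Pollard's rho method: iterate f(x) = x^2 - 1 mod n, remember the
    iterate reached at the end of each block of length r = 2^t, and gcd
    the next r iterates against it."""
    f = lambda x: (x * x - 1) % n
    x = x0
    saved = x0          # plays the role of the stored value xi in the text
    r = 1
    while True:
        for _ in range(r):
            x = f(x)
            d = gcd(x - saved, n)
            if 1 < d < n:
                return d
            if d == n:
                return None   # failure: retry with another x0 or another f
        saved = x
        r *= 2
```

On n = 10403 = 101 · 103 with x0 = 2 this finds the factor 101 after a few dozen iterations.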
Pollard’s p – 1 method is dependent on the prime factors of p – 1 for a prime divisor p of n. Indeed if p – 1 is rather smooth, this method may extract a (non-trivial) factor of n pretty fast, even when p itself is quite large. To start with we extend the definition of smoothness as follows.
Definition Let M be a positive integer. An integer m is called M-power-smooth if every prime power dividing m is ≤ M, that is, if q^e | m for a prime q and e ≥ 1, then q^e ≤ M.
Let p be an (unknown) prime divisor of n. We may assume, without loss of generality, that p ≤ √n. Assume that p – 1 is M-power-smooth. Then (p – 1) | lcm(1, . . . , M) and, therefore, for an integer a with gcd(a, n) = 1 (and hence with gcd(a, p) = 1), we have a^{lcm(1,...,M)} ≡ 1 (mod p) by Fermat’s little theorem, that is, d := gcd(a^{lcm(1,...,M)} – 1, n) > 1. If d ≠ n, then d is a non-trivial factor of n. In case we have d = n (a very rare occurrence), we may try with another a or declare failure.
The problem with this method is that p, and so M, are not known in advance. One may proceed by guessing successively increasing values of M till the method succeeds. In the worst case, that is, when p is a safe prime, we have M = (p – 1)/2. Since p ≤ √n, this algorithm runs in a worst-case time of Õ(n^{1/2}). However, if M is quite small, then this algorithm is rather efficient, irrespective of how large p itself is.
In Algorithm 4.2, we give a variant of the p – 1 method, where we supply a predetermined value of the bound M. We also assume that we have at our disposal a precalculated list of all primes q1, . . . , qt ≤ M.
There is a modification of this algorithm known as Stage 2 or the second stage. For this, we choose a second bound M′ larger than M. Assume that p – 1 = rq, where r is M-power-smooth and q is a prime in the range M < q ≤ M′. In this case, Stage 2 computes with high probability a factor of n after doing O(√M′) additional operations as follows. When Algorithm 4.2 returns “failure” at the last step, it has already computed the value A := a^m (mod n), where m = q1^{e1} · · · qt^{et}, ei = ⌊ln M/ln qi⌋. In this case, A has multiplicative order q modulo p, that is, the subgroup H of Z_p^* generated by A has order q. We choose s = O(√M′) random integers l1, . . . , ls. By the birthday paradox (Exercise 2.172), we have with high probability A^{li} ≡ A^{lj} (mod p) for some i ≠ j. In that case, d := gcd(A^{li} – A^{lj}, n) is divisible by p and is a desired factor of n (unless d = n, a case that occurs with a very low probability). In practice, we do not know q and so we determine s and the integers l1, . . . , ls using the bound M′ instead of q.
Algorithm 4.2: Pollard’s p – 1 method
Input: A composite integer n and a bound M.
Output: A non-trivial factor d of n or “failure”.
Steps:
Select a random integer a, 1 < a < n. /* For example, we may take a := 2 */
if ((d := gcd(a, n)) ≠ 1) { Return d. }
for i = 1, . . . , t {
    ei := ⌊ln M/ln qi⌋.
    a := a^{qi^{ei}} (mod n).
}
d := gcd(a – 1, n).
if (1 < d < n) { Return d. }
Return “failure”.
In another variant of Stage 2, we compute the powers A^{q_{t+1}}, . . . , A^{q_{t′}} (mod n), where q_{t+1}, . . . , q_{t′} are all the primes qj satisfying M < qj ≤ M′. If p – 1 = rq is of the desired form, we would find q = qj for some t < j ≤ t′, and then gcd(A^q – 1, n), if not equal to n, would be a non-trivial factor of n.
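A Stage-1-only sketch of the p − 1 method in this spirit (the helper `primes_upto` and all other names are ours):

```python
from math import gcd

def primes_upto(M):
    """All primes <= M by the sieve of Eratosthenes."""
    flags = [True] * (M + 1)
    flags[0:2] = [False, False]
    for i in range(2, int(M**0.5) + 1):
        if flags[i]:
            flags[i * i :: i] = [False] * len(flags[i * i :: i])
    return [i for i, f in enumerate(flags) if f]

def pollard_p_minus_1(n, M, a=2):
    """Stage 1 of Pollard's p-1 method with bound M: raise a to q^e for
    every prime q <= M, q^e being the largest power of q not exceeding M."""
    d = gcd(a, n)
    if d != 1:
        return d                      # lucky: a already shares a factor with n
    for q in primes_upto(M):
        e = 1
        while q ** (e + 1) <= M:      # e = floor(ln M / ln q)
            e += 1
        a = pow(a, q ** e, n)         # a becomes a^(prod q^e), one prime at a time
    d = gcd(a - 1, n)
    return d if 1 < d < n else None   # None signals failure
```

For n = 5899 = 17 · 347 the factor 17 is found with M = 16, because 17 − 1 = 16 = 2^4 is 16-power-smooth while 347 − 1 = 2 · 173 is not.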
In practice, one may try one’s luck using this algorithm for some M in the range 105 ≤ M ≤ 106 (and possibly also the second stage with 106 ≤ M′ ≤ 108) before attempting a more sophisticated algorithm like the MPQSM, the ECM or the NFSM.
As always, we assume that n is a composite integer and that p is an (unknown) prime divisor of n. Pollard’s p – 1 method uses an element a of the group Z_p^* whose multiplicative order divides p – 1. The idea of Williams’ p + 1 method is very similar, that is, it works with an element a, this time in F_{p^2}^*, whose multiplicative order divides p + 1. If p + 1 is M-power-smooth for a reasonably small bound M, then computing d := gcd(a^{lcm(1,...,M)} – 1, n) splits n with high probability.
In order to find an element a ∈ F_{p^2}^* of order dividing p + 1, we proceed as follows. Let α be an integer such that α^2 – 4 is a quadratic non-residue modulo p. Then the polynomial f(X) := X^2 – αX + 1 is irreducible in F_p[X] and F_{p^2} ≅ F_p[X]/⟨f(X)⟩. Let a, b ∈ F_{p^2} be the two roots of f. Then ab = 1 and a + b = α. Since f(a^p) = 0 (check it!) and since a^p ≠ a, we have a^p = b = a^{–1}, that is, a^{p+1} = 1.
Unfortunately, p is not known in advance. Therefore, we represent elements of F_p as integers modulo n and elements of F_{p^2} as polynomials c0 + c1X with c0, c1 ∈ Z_n. Multiplying two such elements of F_{p^2} is accomplished by multiplying the two polynomials representing these elements modulo the defining polynomial f(X), the coefficient arithmetic being that of Z_n. This gives us a way to do exponentiations in F_{p^2} in order to compute a^m – 1 for a suitable m (for example, m = lcm(1, . . . , M)).
However, the absence of knowledge of p has a graver consequence, namely, it is impossible to decide whether α^2 – 4 is a quadratic non-residue modulo p for a given integer α. The only thing we can do is to try several random values of α. This is justified, because if k random integers α are tried, then the probability that for all of these α the integers α^2 – 4 are quadratic residues modulo p is only 1/2^k.
The code for the p + 1 method is very similar to Algorithm 4.2. We urge the reader to complete the details. Since p^3 – 1 = (p – 1)(p^2 + p + 1), p^4 – 1 = (p^2 – 1)(p^2 + 1) and so on, we can work in higher extensions like F_{p^3}, F_{p^4} to find elements of order p^2 + p + 1, p^2 + 1 and so on, and thus generalize the p ± 1 methods. However, the integers p^2 + p + 1, p^2 + 1, being large (compared to p ± 1), have a smaller chance of being M-smooth (or M-power-smooth) for a given bound M.
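The F_{p^2} arithmetic described above is easy to simulate with coefficients taken modulo n. The sketch below (all names ours; the exponent m is built as in the p − 1 code) represents c0 + c1X as a pair and reduces products via X^2 = αX − 1; if a^m ≡ 1 (mod p), then p divides both c1 and c0 − 1 of the result, so a gcd reveals a factor:

```python
from math import gcd

def primes_upto(M):
    """All primes <= M by a simple sieve."""
    flags = [True] * (M + 1)
    flags[0:2] = [False, False]
    for i in range(2, int(M**0.5) + 1):
        if flags[i]:
            flags[i * i :: i] = [False] * len(flags[i * i :: i])
    return [i for i, f in enumerate(flags) if f]

def mul(u, v, alpha, n):
    """(u0 + u1*X)(v0 + v1*X) modulo X^2 - alpha*X + 1, coefficients mod n."""
    u0, u1 = u
    v0, v1 = v
    w = u1 * v1                       # X^2 coefficient, folded back via X^2 = alpha*X - 1
    return ((u0 * v0 - w) % n, (u0 * v1 + u1 * v0 + alpha * w) % n)

def williams_p_plus_1(n, M, alpha_range=range(3, 50)):
    """Sketch of Williams' p+1 method: for several alpha, raise a = X to
    m = product of maximal prime powers <= M, and take gcds with n."""
    m = 1
    for q in primes_upto(M):
        e = 1
        while q ** (e + 1) <= M:
            e += 1
        m *= q ** e
    for alpha in alpha_range:         # hope that alpha^2 - 4 is a non-residue mod p
        a, r = (0, 1), (1, 0)         # a = X, r = 1
        mm = m
        while mm:                     # square-and-multiply in Z_n[X]/(f)
            if mm & 1:
                r = mul(r, a, alpha, n)
            a = mul(a, a, alpha, n)
            mm >>= 1
        c0, c1 = r
        for d in (gcd(c1, n), gcd(c0 - 1, n)):
            if 1 < d < n:
                return d
    return None
```

For n = 242107 = 239 · 1013 and M = 16, the factor 239 is caught because 239 + 1 = 240 = 2^4 · 3 · 5 is 16-power-smooth, while 1013 + 1 = 2 · 3 · 13^2 is not.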
The reader should have recognized why we paid attention to strong primes and safe primes (Definition 3.5, p 199, and Algorithm 3.14, p 200). Let us now concentrate on the recent developments in the IFP arena.
Carl Pomerance’s quadratic sieve method (QSM) is one of the (reasonably) successful modern methods of factoring integers. Though the number field sieve factoring method is the current champion, there was a time in the recent past when the quadratic sieve method and the elliptic curve method were known to be the fastest algorithms for solving the IFP.
We assume that n is a composite integer which is not a perfect square (it is easy to detect whether n is a perfect square, and if so, we replace n by √n). The basic idea is to arrive at a congruence of the form

Equation 4.1

x^2 ≡ y^2 (mod n)

with x ≢ ±y (mod n). In that case, gcd(x – y, n) is a non-trivial factor of n.
We start with a factor base B = {q1, . . . , qt} comprising the first t primes, and let H := ⌈√n⌉ and J := H^2 – n. Then J is O(√n), and hence for a small integer c the right side of the congruence

(H + c)^2 ≡ J + 2cH + c^2 (mod n)

is also O(√n). We try to factor T(c) := J + 2cH + c^2 using trial divisions by elements of B. If the factorization is successful, that is, if T(c) is B-smooth, then we get a relation of the form
Equation 4.2

(H + c)^2 ≡ q1^{α1} · · · qt^{αt} (mod n),

where αi ≥ 0. (Note that T(c) ≠ 0, since n is assumed not to be a perfect square.) If all αi are even, say, αi = 2βi, then we get the desired Congruence (4.1) with x = q1^{β1} · · · qt^{βt} and y = H + c. But this is rarely the case. So we keep on generating other relations. After sufficiently many relations are available, we combine these together (by multiplication) to get Congruence (4.1) and compute gcd(x – y, n). If this does not give a non-trivial factor, we try to recombine the collected relations in order to get another Congruence (4.1). This is how Pomerance’s QSM works.
In order to find suitable combinations yielding Congruence (4.1), we employ a method similar to Gaussian elimination. Assume that we have collected r relations of the form

(H + cj)^2 ≡ q1^{α1j} · · · qt^{αtj} (mod n),    j = 1, . . . , r.

We search for integers β1, . . . , βr ∈ {0, 1} such that the product

(H + c1)^{2β1} · · · (H + cr)^{2βr} ≡ q1^{α11β1 + ··· + α1rβr} · · · qt^{αt1β1 + ··· + αtrβr} (mod n)

is a desired Congruence (4.1). The left side of this congruence is already a square. In order to make the right side a square too, we have to essentially solve the following system of linear congruences modulo 2:

α11β1 + α12β2 + · · · + α1rβr ≡ 0 (mod 2),
. . .
αt1β1 + αt2β2 + · · · + αtrβr ≡ 0 (mod 2).

This is a system of t equations over F_2 in r unknowns β1, . . . , βr and is expected to have solutions if r is slightly larger than t. Note that only the values of αij modulo 2 are needed for solving this linear system. This means that we can have a compact representation of the coefficient matrix (αij) by packing 32 of the coefficients as bits per word. Gaussian elimination (over F_2) can be done using bit operations only.
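In a language with big integers the packing idea can be pushed further: storing each relation's parity vector as a single integer makes every row operation one XOR. A sketch (names ours) that returns one dependency among the relations:

```python
def find_dependency(vectors):
    """Each vectors[j] is an integer whose bit i is alpha_ij mod 2.
    Returns a set S of relation indices whose exponent vectors sum to
    zero modulo 2 (so the product of those relations has a square right
    side), or None if the vectors are independent. XOR on Python integers
    performs the word-parallel GF(2) row addition described in the text."""
    basis = {}                        # pivot bit -> (reduced vector, combination mask)
    for j, v in enumerate(vectors):
        comb = 1 << j                 # records which original relations are mixed into v
        while v:
            pivot = v.bit_length() - 1
            if pivot not in basis:
                basis[pivot] = (v, comb)
                break
            bv, bc = basis[pivot]
            v ^= bv
            comb ^= bc
        else:                         # v reduced to 0: a dependency was found
            return {k for k in range(len(vectors)) if (comb >> k) & 1}
    return None
```

For instance, the three parity vectors 011, 101 and 110 XOR to zero, and the function reports the dependency {0, 1, 2}.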
The running time of this method can be derived using Corollary 4.1. Note that the integers T(c) that are tested for B-smoothness are O(n1/2) which corresponds to α = 1/2 in the corollary. We take qt = L[1/2] (so that t = L[1/2]/ ln L[1/2] = L[1/2] by the prime number theorem) which corresponds to β = 1/2. Assuming that the integers T(c) behave as random integers of magnitude O(n1/2), the probability that one such T(c) is B-smooth is L[–1/2]. Therefore, if L[1] values of c are tried, we expect to get L[1/2] relations involving the L[1/2] primes q1, . . . , qt. Combining these relations by Gaussian elimination is now expected to produce a non-trivial Congruence (4.1). This gives us a running-time of the order of L[3/2] for the relation collection stage. Gaussian elimination using L[1/2] unknowns also takes asymptotically the same time. However, each T(c) can have at most O(log n) distinct prime factors, implying that Relation (4.2) is necessarily sparse. This sparsity can be effectively exploited and the Gaussian elimination can be done essentially in time L[1]. Nevertheless, the entire procedure runs in time L[3/2], a subexponential expression in ln n.
In order to reduce the running time from L[3/2] to L[1], we employ what is known as sieving (and from which the algorithm derives its name). Let us fix a priori the sieving interval, that is, the values of c for which T(c) is tested for B-smoothness, to be –M ≤ c ≤ M, where M = L[1]. Let q ∈ B be a small prime (that is, q = qi for some i = 1, . . . , t). We intend to find out the values of c such that q^h | T(c) for small exponents h = 1, 2, . . . . Since T(c) = J + 2cH + c^2 = (c + H)^2 – n, the solvability for c of the condition q^h | T(c) or of q | T(c) is equivalent to the solvability of the congruence (c + H)^2 ≡ n (mod q). If n is a quadratic non-residue modulo q, no c satisfies the above condition. Consequently, the factor base B may comprise only those primes q for which n is a quadratic residue modulo q (instead of all primes ≤ qt). So we assume that q meets this condition. We may also assume that q ∤ n, because it is a good strategy to perform trial divisions of n by all the primes in B before we go for sieving. The sieving process makes use of an array A indexed by c. We initialize the array location A[c] := ln |T(c)| for each c, –M ≤ c ≤ M.
We explain the sieving process only for an odd prime q. The modifications for the case q = 2 are left to the reader as an easy exercise. The congruence x^2 – n ≡ 0 (mod q) has two distinct solutions for x, say, x1 and x1′ ≡ –x1 (mod q). These correspond to two solutions for c of (H + c)^2 ≡ n (mod q), namely, c1 ≡ x1 – H (mod q) and c1′ ≡ x1′ – H (mod q). For each value of c in the interval –M ≤ c ≤ M that is congruent either to c1 or c1′ modulo q, we subtract ln q from A[c]. We then lift the solutions x1 and x1′ to the (unique) solutions x2 and x2′ of the congruence x^2 – n ≡ 0 (mod q^2) (Exercise 3.29), compute c2 ≡ x2 – H (mod q^2) and c2′ ≡ x2′ – H (mod q^2), and for each c in the range –M ≤ c ≤ M congruent to c2 or c2′ modulo q^2 subtract ln q from A[c]. We then again lift to obtain the solutions modulo q^3 and proceed as above. We repeat this process of lifting and subtracting ln q from appropriate locations of A until we reach a sufficiently large h for which neither ch nor ch′ corresponds to any value of c in the range –M ≤ c ≤ M. We then choose another q from the factor base and repeat the procedure explained in this paragraph for this q.
After the sieving procedure is carried out for all small primes q in the factor base B, we check for which c, –M ≤ c ≤ M, the array location A[c] is 0. These are precisely the values of c in the indicated range for which T(c) is B-smooth. For each smooth T(c), we then compute Relation (4.2) using trial division (by primes of B).
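For very small numbers the whole sieving loop fits in a few lines. The toy sketch below (names ours; roots modulo q^h are found by brute force instead of Algorithm 3.16 and lifting) reproduces the procedure just described:

```python
import math

def smooth_candidates(n, B, M):
    """Sieve T(c) = (H + c)^2 - n for -M <= c <= M over the factor base B:
    initialize A[c] = ln|T(c)|, subtract ln q for every prime power q^h
    dividing T(c), and report the c whose residue is (close to) zero."""
    H = math.isqrt(n) + 1
    T = lambda c: (H + c) ** 2 - n
    A = {c: math.log(abs(T(c))) for c in range(-M, M + 1) if T(c) != 0}
    bound = max(abs(T(-M)), abs(T(M)))        # largest |T(c)| on the interval
    for q in B:
        qh = q
        while qh <= bound:                    # consider the powers q, q^2, q^3, ...
            # brute-force roots of x^2 = n (mod q^h); fine for a toy example
            for x in [x for x in range(qh) if (x * x - n) % qh == 0]:
                c = -M + ((x - H + M) % qh)   # smallest c >= -M with c = x - H (mod q^h)
                while c <= M:
                    if c in A:
                        A[c] -= math.log(q)
                    c += qh
            qh *= q
    return sorted(c for c, v in A.items() if v < 0.5)
```

For n = 1649 (so H = 41) with B = {2, 5, 7} and M = 3, the sieve flags c = −2, −1, 0, 2, corresponding to T(c) = −128, −49, 32 and 200, all of which are indeed B-smooth.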
The sieving process replaces trial divisions (of every T(c) by every q) by subtractions (of ln q from appropriate locations A[c]). This is intuitively the reason why sieving speeds up the relation collection stage. For a more rigorous analysis of the running time, note that in order to get the desired ci and ci′ modulo q^i for each q ∈ B and for each i = 1, . . . , h, we have either to compute a square root modulo q (for i = 1) or to solve a congruence (during lifting for i ≥ 2), each of which can be done in polynomial time. Also the bound h on the exponent of q satisfies q^h = O(n), that is, h = O(log n). Finally, there are L[1/2] primes in B. Therefore, the computation of the ci and ci′ for all q and i takes a total of L[1/2] time.
Now, we count the total number ν of subtractions of different ln q values from all the locations of the array A. The size of A is 2M + 1. For each qi and each power qi^h, we need to subtract ln qi from at most 2⌈(2M + 1)/qi^h⌉ locations (for odd qi), and we also have h = O(log n). Therefore, ν is of the order of 2(2M + 1)H_Q, where Q is the maximum of all the qi (that is, Q = qt), and where Hm, m ≥ 1, denote the harmonic numbers (Exercise 4.6). But Hm = O(ln m), and so ν = O(2(2M + 1) log n) = L[1], since M = L[1].
The logarithms ln q (as well as the initial array values ln |T(c)|) are irrational numbers and hence need infinite precision for storing. We, however, need to work with only crude approximations of these logarithms, say up to three places after the decimal point. In that case, we cannot take A[c] = 0 as the criterion for selecting smooth values of T(c), because the approximate representation of logarithms leads to truncation (and/or rounding) errors. In practice, this is not a severe problem, because T(c) is not smooth if and only if it has a prime factor at least as large as q_{t+1} (the smallest prime not in B). This implies that at the end of the sieving operation the values of A[c] for smooth T(c) are close to 0, whereas those for non-smooth T(c) are much larger (close to a number at least as large as ln q_{t+1}). Thus we may set the selection criterion for smooth integers as A[c] ≤ 1 or as A[c] ≤ 0.1 ln q_{t+1}. It is also possible to replace floating-point subtraction by integer subtraction by doing the arithmetic on 1000 times the logarithm values. To sum up, the ν = L[1] subtractions the sieving procedure does would be only single-precision operations and hence take a total of L[1] time.
As mentioned earlier, Gaussian elimination with sparse equations can also be performed in time L[1]. So Pomerance’s algorithm with sieving takes time L[1].
Numerous modifications over this basic strategy speed up the algorithm considerably. One possibility is to do sieving every time only for h = 1 and ignore all higher powers of q. That is, for every q we check which of the integers T(c) are divisible by q and then subtract ln q from the corresponding locations of the array A. If some T(c) is divisible by a higher power of q, this strategy fails to subtract ln q the required number of times. As a result, this T(c), even if smooth, may fail to pass the smoothness criterion. This problem can be overcome by increasing the cut-off from 1 (or 0.1 ln q_{t+1}) to a value ξ ln qt for some ξ ≥ 1. But then some non-smooth T(c) will pass the selection criterion, in addition to some smooth ones that could not otherwise be detected. This is reasonable, because the non-smooth ones can later be filtered out from the smooth ones, and one might use even trial divisions to do so. Experiments show that values of ξ ≤ 2.5 work quite well in practice.
The reason why this strategy performs well is as follows. If q is small, for example q = 2, we should subtract only 0.693 from A[c] for every power of 2 dividing T(c). On the other hand, if q is much larger, say q = 1,299,709 (the 10^5-th prime), then ln q ≈ 14.078 is large. But T(c) would not, in general, be divisible by a high power of this q. This modification, therefore, leads to a situation where the probability that a smooth T(c) is actually detected as smooth is quite high. A few relations would still be missed even with the modified selection criterion, but that is more than compensated by the speed-up gained by the method. Henceforth, we will call this modified strategy incomplete sieving and the original strategy (of considering all powers of q) complete sieving.
Another trick known as the large prime variation also tends to give more usable relations than are available from the original (complete or incomplete) sieving. In this context, we call a prime q′ large if q′ ∉ B. A value of T(c) is often expected to be B-smooth except for a single large prime factor:

Equation 4.3

T(c) = q1^{α1} · · · qt^{αt} q′

with q′ ∉ B. Such a value of T(c) can be easily detected. For example, incomplete sieving with the relaxed selection criterion is expected to give many such relations naturally, whereas for complete sieving, if the left-over of ln |T(c)| in A[c] at the end of the subtraction steps is < 2 ln qt, then this left-over must correspond to a large prime factor < qt^2. Instead of throwing away an apparently unusable Equation (4.3), we may keep track of them. If a large prime q′ is not large enough (that is, not much larger than qt), then it might appear on the right side of Equation (4.3) for more than one value of c, and if that is the case, all these relations taken together become usable for the subsequent Gaussian elimination stage (after including q′ in the factor base). This means that for each large prime occurring more than once, the factor base size increases by 1, whereas the number of relations increases by at least 2. Thus with a little additional effort we enrich the factor base and the relations collected, and this, in turn, increases the probability of finding a useful Congruence (4.1), our ultimate goal. Viewed from another angle, the strategy of large prime variation allows us to start with smaller values of t and/or M and thereby speed up the sieving stage and still end up with a system capable of yielding the desired Congruence (4.1). Note that an increased factor base size leads to a larger system to solve by Gaussian elimination. But this is not a serious problem in practice, because the sieving stage (and not the Gaussian elimination stage) is usually the bottleneck in the running time of the algorithm.
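Matching up partial relations is a simple grouping step; the sketch below (all names ours) keeps exactly the large primes that occur at least twice and reports how the relation count and factor base grow:

```python
from collections import defaultdict

def combine_partials(full_count, partials):
    """partials: list of (c, large_prime) pairs, one per partial relation
    of the form (4.3). A large prime seen at least twice makes its whole
    group usable. Returns the usable groups, the new relation total, and
    the number of primes added to the factor base."""
    groups = defaultdict(list)
    for c, q in partials:
        groups[q].append(c)
    usable = {q: cs for q, cs in groups.items() if len(cs) >= 2}
    extra_relations = sum(len(cs) for cs in usable.values())
    extra_primes = len(usable)        # each repeated large prime enlarges B by 1
    return usable, full_count + extra_relations, extra_primes
```

With 10 full relations and partial relations carrying the large primes 587, 641, 587, 823, 587, the prime 587 repeats, so the factor base grows by one prime while three more relations become usable.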
The above discussion on handling one large prime naturally applies to situations where a T(c) value has more than one large prime factor, say q′ and q″. Such a T(c) value leads to a usable relation if each of q′ and q″ also occurs in other collected relations. This situation can be detected by a compositeness test on the non-smooth part of T(c). Subsequently, we have to factor the non-smooth part to obtain the two large primes q′ and q″. This is called the two large prime variation. As the size of the integer n to be factored becomes larger, one may go for three and four large prime variations.
We will shortly encounter many other instances of sieving (for solving the IFP and the DLP). Both incomplete sieving and the use of large primes, if carefully applied, help speed up most of these sieving methods much in the same way as they do in connection with the QSM.
Easy computations (Exercise 4.11) show that the average and maximum of the integers |T(c)| checked for smoothness in the QSM are approximately MH and 2MH respectively. Though these values are theoretically n^{1/2+o(1)}, in practice the factor of M (or 2M) makes the integers |T(c)| somewhat large, leading to a poor yield of B-smooth integers for larger values of |c| in the sieving interval. The multiple-polynomial quadratic sieve method (MPQSM) applies a nice trick to reduce these average and maximum values. In the original QSM, we work with a single polynomial in c, namely,

T(c) = J + 2cH + c^2 = (H + c)^2 – n.

Now, we work with a more general quadratic polynomial

g(c) := U + 2Vc + Wc^2

with W > 0 and V^2 – UW = n. (The original T(c) corresponds to U = J, V = H and W = 1.) Then we have (V + Wc)^2 ≡ Wg(c) (mod n), that is, in this case a relation looks like

(V + Wc)^2 ≡ W q1^{α1} · · · qt^{αt} (mod n).

This relation has an additional factor of W that was absent in Relation (4.2). However, if W is chosen to be a prime (possibly a large one), then the Gaussian elimination stage proceeds exactly as in the original method. Indeed in this case W appears in every relation and hence poses no problem. Only the integers g(c) need to be checked for B-smoothness and hence should have small values. The sieving procedure (that is, computing the appropriate locations of A for subtracting ln q, q ∈ B) for the general polynomial g(c) is very much similar to that for T(c). The details are left to the reader as an easy exercise.
Let us now explain how we can choose the parameters U, V, W. To start with, we fix a suitable sieving interval –M′ ≤ c ≤ M′ and then choose W to be a prime close to √(2n)/M′ such that n is a quadratic residue modulo W. Then we compute a square root V of n modulo W (Algorithm 3.16) and finally take U = (V^2 – n)/W. This choice clearly gives 0 < V < W and V^2 ≡ n (mod W), so that U is an integer. (Indeed one may choose 0 < V < W/2, but this is not an important issue.) Now, the maximum value of |g(c)| over the sieving interval becomes approximately (M′/√2)√n. Thus even for M′ = M, this maximum value is smaller by a factor of 2√2 than the maximum value of |T(c)| in the original QSM. Moreover, we may choose somewhat smaller values of M′ (compared to M) by working with several polynomials g(c) corresponding to different choices for the prime W. This is why the MPQSM, despite having the same theoretical running time (L[1]) as the original QSM, runs faster in practice.
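The parameter choice can be sketched directly. The toy below (all names ours) uses a naive primality test and a brute-force square root instead of Algorithm 3.16, which is adequate for small sizes:

```python
import math

def is_prime(m):
    """Naive primality test, fine for toy sizes."""
    return m >= 2 and all(m % d for d in range(2, math.isqrt(m) + 1))

def mpqs_polynomial(n, M1):
    """Choose (U, V, W) for one MPQSM polynomial g(c) = U + 2Vc + Wc^2:
    W is the first prime >= sqrt(2n)/M1 modulo which n is a quadratic
    residue (and which does not divide n), V is a square root of n mod W,
    and U = (V^2 - n)/W. Then V^2 - UW = n, so (V + Wc)^2 = n + W*g(c)
    as an identity of integers."""
    W = max(3, round(math.sqrt(2 * n) / M1))
    while True:
        if is_prime(W) and n % W != 0:
            roots = [x for x in range(W) if (x * x - n) % W == 0]
            if roots:
                V = roots[0]
                return (V * V - n) // W, V, W
        W += 1

U, V, W = mpqs_polynomial(1649, 5)
g = lambda c: U + 2 * V * c + W * c * c
# sanity check of the defining identity for a few values of c
assert all((V + W * c) ** 2 - 1649 == W * g(c) for c in range(-5, 6))
```

For n = 1649 and M′ = 5 this search yields W = 23, V = 4 and U = −71, and the values |g(c)| over the interval are markedly smaller than the corresponding |T(c)|.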
The QSM is highly parallelizable. More specifically, different processors can handle pairwise disjoint subsets of B during the sieving process. That is, each processor P maintains a local array A_P indexed by c, –M ≤ c ≤ M. The (local) sieving process at P starts with initializing all the locations A_P[c] to 0. For each prime q in the subset B_P of the factor base B assigned to P, one adds ln q to appropriate locations (and appropriate numbers of times). After all these processors finish local sieving, a central processor computes, for each c in the sieving interval, the value ln |T(c)| – Σ_P A_P[c] (where the sum extends over all processors P which have done local sieving), based on which T(c) is recognized as smooth or not. For the multiple-polynomial variant of the QSM, different processors might handle different polynomials g(c) and/or different subsets of B.
Adi Shamir has proposed the complete design of a (hardware) device, TWINKLE (The Weizmann INstitute Key Location Engine), that can perform the sieving stage of the QSM a hundred to a thousand times faster than software implementations on the usual PCs available nowadays. This speed-up is obtained by using a high clock speed (10 GHz) and opto-electronic technology for detecting smooth integers. Each TWINKLE device, if mass produced, has an estimated cost of US $5,000.
The working of TWINKLE is described in Figure 4.2. It uses an opaque cylinder of a height of about 10 inches and a diameter of about 6 inches. At the bottom of the cylinder is an array of LEDs,[1] each LED representing a prime in the factor base. The i-th LED (corresponding to the i-th prime qi) emits light of intensity proportional to log qi. The device is clocked and the i-th LED emits light only during the clock cycles c for which qi|T(c). The light emitted by all the active LEDs at a given clock cycle is focused by a lens and a photo-detector senses the total emitted light. If this total light exceeds a certain threshold, the corresponding clock cycle (that is, the time c) is reported to a PC attached to TWINKLE. The PC then analyses the particular T(c) for smoothness over {q1, . . . , qt} by trial division.
[1] An LED (light emitting diode) is an electronic device that emits light, when current passes through it. A GaAs(Gallium arsenide)-based LED emits (infra-red) light of wavelength ~870 nano-meters. In the operational range of an LED, the intensity of emitted light is roughly proportional to the current passing through the LED.

Thus, TWINKLE implements incomplete sieving by opto-electronic means. The major difference between TWINKLE’s sieving and software sieving is that in the latter we used an array of times (the c values) and the iteration went over the set of small primes. In TWINKLE, we use an array of small primes and allow time to iterate over the different values of c in the sieving interval –M ≤ c ≤ M. An electronic circuit in TWINKLE computes for each LED the cycles c at which that LED is expected to emanate light. That is to say that the i-th LED emits light only in the clock cycles c congruent modulo qi to one of the two solutions c1 and c1′ of T(c) ≡ 0 (mod qi). Shamir’s original design uses two LEDs for each prime qi, one corresponding to c1, the other to c1′. In that case, each LED emits light at regularly spaced clock cycles, and this simplifies the electronic circuitry (at the cost of having twice the number of LEDs).
Another difference of TWINKLE from software sieving is that here we add the log qi values (to zero) instead of subtracting them from log |T(c)|. By Exercise 4.11, the values |T(c)| typically have variations by small constant factors. Taking logs reduces this variation further and, therefore, comparing the sum of the active log qi values for a given c with a fixed predefined threshold (say log M H) independent of c is a neat way of bypassing the computation of all log |T(c)|, –M ≤ c ≤ M. (This strategy can also be used for software sieving.)
The reasons why TWINKLE speeds up the sieving procedure over software implementations on conventional PCs are the following:
Silicon-based PC chips at present can withstand clock frequencies on the order of 1 GHz. By contrast, a GaAs-based wafer containing the LED array can be clocked faster than 10 GHz.
There is no need to initialize the array locations A[c] (to log |T(c)| or zero). Similarly, at the end, there is no need to compare the final values in all these array locations with a threshold.
The addition of all the log qi values effective at a given c is done instantly by analog optical means. We do not require an explicit electronic adder.
Shamir [269] reports the full details of a VLSI[2] design of TWINKLE.
[2] very large-scale integration
H. W. Lenstra’s elliptic curve method (ECM) is another modern algorithm to solve the IFP and runs in expected time L(p, 1/2, √2), where p is the smallest prime factor of n (the integer to be factored). Since p ≤ √n, this running time is at most L[1] = L(n, 1/2, 1), that is, the same as for the QSM. However, if p is small (that is, if p = O(n^α) for some α < 1/2), then the ECM is expected to outperform the QSM, since the working of the QSM is incapable of exploiting smaller values of p.
As before, let n be a composite natural number having no small prime divisors, and let p be the smallest prime divisor of n. For denoting subexponential expressions in ln p, we use the symbol Lp[c] := L(p, 1/2, c), whereas the unsubscripted symbol L[c] stands for L(n, 1/2, c). We work with random elliptic curves

E : Y^2 = X^3 + aX + b

and consider the group E(F_p) of rational points on E modulo p. However, since p is not known a priori, we intend to work modulo n. The canonical surjection Z_n → F_p allows us to identify the Z_n-rational points on E as points on E over F_p. We now define a bound M := Lp[1/√2] and let B = {q1, . . . , qt} be all the primes smaller than or equal to M, so that by the prime number theorem (Theorem 2.20) #B ≈ M/ln M. Of course, p is not known in advance, so that M and B are also not known. We will discuss the choice of M and B later. For the time being, let us assume that we know some approximate value of p, so that M and B can be fixed, at least approximately, at the beginning of the algorithm.
By Hasse’s theorem (Theorem 2.48, p 106), the cardinality ν := #E(F_p) satisfies |ν – p – 1| ≤ 2√p, that is, ν = O(p). If we make the heuristic assumption that ν behaves as a random integer of the order O(p), then Corollary 4.1 tells us that ν is B-smooth with probability approximately Lp[–1/√2]. This assumption is certainly not rigorous, but accepting it gives us a way to analyse the running time of the algorithm.
If Lp[1/√2] random curves are tried, then we expect to find one B-smooth value of ν. In this case, a non-trivial factor of n can be computed with high probability as follows. Define ei := ⌊ln n/ln qi⌋ for i = 1, . . . , t, and m := q1^{e1} · · · qt^{et}, where t is the number of primes in B. If ν is B-smooth, then ν | m and, therefore, for any point P ∈ E(F_p) we have mP = O, the point at infinity. Computation of mP involves computation of many sums P1 + P2 of points P1 := (h1, k1) and P2 := (h2, k2). At some point of time, we would certainly compute a sum P1 + P2 = O, that is, P1 = –P2, that is, h1 ≡ h2 (mod p) and k1 ≡ –k2 (mod p). Since p is unknown, we work modulo n, that is, the values of h1, h2, k1 and k2 are known modulo n. Let d := gcd(h1 – h2, n). Then p | d, and if d ≠ n (the case d = n has a very small probability!), we have the non-trivial factor d of n. The computation of the coordinates of P1 + P2 (assuming P1 ≠ P2) demands computing the inverse of h1 – h2 modulo n (Section 2.11.2). However, if d = gcd(h1 – h2, n) ≠ 1, then this inverse does not exist, the computation of P1 + P2 fails, and we have a non-trivial factor of n. If ν is B-smooth, then the computation of mP is bound to fail. The basic steps of the ECM are then as shown in Algorithm 4.3.
Algorithm 4.3
Input: A composite integer n. Output: A non-trivial divisor d of n. Steps: while (1) { . . .
Before we derive the running time of the ECM, some comments are in order. A random curve E is chosen by selecting random integers a and b modulo n. It turns out that taking a to be a single-precision integer and b = 1 works quite well in practice. Indeed, one can keep on trying the values a = 0, 1, 2, . . . successively. Note that the curve E is an elliptic curve, that is, non-singular, if and only if δ := gcd(n, 4a^3 + 27b^2) = 1. However, δ > 1 is an extremely rare occurrence, and one might skip the computation of δ before starting the trial with a curve. The choice b = 1 is attractive, because in that case we may take the point P = (0, 1). In Section 3.6, we have described a strategy to find a random point on an elliptic curve over a field K. This is based on the assumption that computing square roots in K is easy. The same method can be applied to curves over Zn, but since n is composite, it is difficult to compute square roots modulo n. So taking b to be 1 (or the square of a known integer) is indeed a pragmatic decision. After all, we do not need P to be a random point on E.
Recall that we have taken m := q1^e1 ··· qt^et, where ei = ⌊ln n/ln qi⌋. If instead we take ei := ⌊ln M/ln qi⌋ (where M is the bound mentioned earlier), the computation of mP per trial becomes much cheaper, whereas the probability of a successful trial (that is, of a failure in computing mP) does not decrease much. The integer m can be quite large. One, however, need not compute m explicitly, but may proceed as follows: first take Q0 := P, and subsequently for each i = 1, . . . , t compute Qi := qi^ei Qi–1. One finally gets mP = Qt.
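The trial just described — curves with b = 1 and P = (0, 1), and the computation of mP one factor-base prime at a time — can be sketched as follows. This is only an illustrative toy implementation (the names ecm, _add, _mul and the parameter sizes are ours, not the book's Algorithm 4.3); the factor is discovered exactly as explained above, when a group-law denominator fails to be invertible modulo n.

```python
from math import gcd, log

class _FoundFactor(Exception):
    """Raised when a point addition needs a non-invertible denominator."""

def _add(P, Q, a, n):
    # Group law on y^2 = x^3 + a*x + b modulo n; None plays the identity O.
    if P is None:
        return Q
    if Q is None:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % n == 0:
        return None                      # P + (-P) = O
    num, den = (3 * x1 * x1 + a, 2 * y1) if P == Q else (y2 - y1, x2 - x1)
    d = gcd(den % n, n)
    if d > 1:
        raise _FoundFactor(d)            # the hoped-for failure: d | n
    lam = num * pow(den, -1, n) % n
    x3 = (lam * lam - x1 - x2) % n
    return x3, (lam * (x1 - x3) - y1) % n

def _mul(k, P, a, n):
    # Scalar multiplication kP by double-and-add.
    R = None
    while k:
        if k & 1:
            R = _add(R, P, a, n)
        P = _add(P, P, a, n)
        k >>= 1
    return R

def ecm(n, M=100, max_curves=1000):
    # Small primes up to M (trial division is fine at this toy scale).
    primes = [q for q in range(2, M + 1)
              if all(q % r for r in range(2, int(q ** 0.5) + 1))]
    for a in range(max_curves):          # b = 1 and P = (0, 1), as in the text
        Q = (0, 1)
        try:
            for q in primes:
                e = int(log(M) / log(q))      # e_i = floor(ln M / ln q_i)
                Q = _mul(q ** e, Q, a, n)     # Q_i := q_i^e_i Q_{i-1}
        except _FoundFactor as exc:
            d = exc.args[0]
            if d < n:
                return d                      # non-trivial divisor of n
    return None
```

For instance, ecm(1037) quickly returns one of the prime factors 17 or 61.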
Now comes the analysis of the running time of the ECM. We have fixed the parameter M to be Lp[1/√2], so that B contains about Lp[1/√2] small primes. The most expensive part of a trial with a random elliptic curve is the (attempted) computation of the point mP. This involves Lp[1/√2] additions of points. Since an expected number of Lp[1/√2] elliptic curves needs to be tried for finding a non-trivial factor of n, the algorithm performs an expected number of Lp[√2] additions of points on curves modulo n. Since each such addition can be done in polynomial time, the announced running time follows.
Note that Lp[√2] is the optimal running time of the ECM and can be shown to be achieved by taking M = Lp[1/√2]. But, in practice, p is not known a priori. Various ad hoc ways may be adopted to get around this difficulty. One possibility is to use the worst-case bound p ≤ √n. For example, for factoring integers of the form n = pq, where p and q are primes of roughly the same size, this is a good approximation for p. Another strategy is to start with a small value of M and to increase M gradually with the number of trials performed. For larger values of M, the probability of a successful trial increases, so that fewer elliptic curves need to be tried, whereas the time per iteration (that is, for the computation of mP) increases. Consequently, the total running time of the ECM is apparently not very sensitive to the choice of M.
A second stage can be used with each elliptic curve in order to increase the probability of a trial being successful. A strategy very similar to the second stage of the p – 1 method can be employed; the reader is urged to fill in the details. Employing the second stage leads to a reasonable speed-up in practice, though it does not affect the asymptotic running time.
The ECM can be effectively parallelized, since different processors can carry out the trials, that is, computations of mP (together with the second stage) with different sets of (random) elliptic curves.
The number field sieve method (NFSM) is to date the most successful of all integer factoring algorithms. Under certain heuristic assumptions, it achieves a running time of the form L(n, 1/3, c), which is better than that of the L(n, 1/2, c′) algorithms described so far. The NFSM was first designed for integers of a special form. This variant of the NFSM is called the special NFS method (SNFSM); it was later modified to the general NFS method (GNFSM), which can handle arbitrary integers. The running time of the SNFSM has c = (32/9)^(1/3) ≈ 1.526, whereas that of the GNFSM has c = (64/9)^(1/3) ≈ 1.923. For the sake of simplicity, we describe only the SNFSM in this book (see Cohen [56] and Lenstra and Lenstra [165] for further details).
We choose an integer m and a polynomial f(X) ∈ Z[X] such that f(m) ≡ 0 (mod n). We assume that f is irreducible in Z[X]; otherwise a non-trivial factor of f yields a non-trivial factor of n. Consider the number field K := Q(α) for some (complex) root α of f. Let d := deg f be the degree of the number field K. We use the complex embedding K → C taking α to this root of f. The special NFS method makes certain simplifying assumptions:
f is monic, so that α is an algebraic integer and Z[α] ⊆ OK.
OK (the ring of integers of K) is monogenic, that is, OK = Z[α].
OK is a PID.
Consider the ring homomorphism
Φ : Z[α] → Zn, α ↦ m (mod n).
This is well-defined, since f(m) ≡ 0 (mod n). We choose small coprime (rational) integers a, b and note that Φ(a + bα) ≡ a + bm (mod n). Let B be a predetermined smoothness bound. Assume that for a given pair (a, b), both a + bm and a + bα are B-smooth. For the rational integer a + bm, this means
a + bm = ± ∏p∈P p^(ep),
P being the set of all rational primes ≤ B. On the other hand, smoothness of the algebraic integer a + bα means that the principal ideal ⟨a + bα⟩ is a product of prime ideals of prime norms ≤ B; that is, we have a factorization
⟨a + bα⟩ = p1^(e1) ··· pk^(ek),
where p1, . . . , pk are prime ideals of OK of prime norms ≤ B. By assumption, each such prime ideal is principal. Let G denote a set of generators, one for each prime ideal of OK of prime norm ≤ B. Further, let U denote a set of generators of the multiplicative group of units of OK. The smoothness of a + bα can, therefore, be rephrased as
Equation 4.4
a + bα = ∏u∈U u^(βu) · ∏g∈G g^(γg).
Applying Φ then yields
a + bm ≡ ∏u∈U Φ(u)^(βu) · ∏g∈G Φ(g)^(γg) (mod n).
This is a relation for the SNFSM. After sufficiently many relations are available, Gaussian elimination modulo 2 (as in the case of the QSM) is expected to give us a congruence of the form
x^2 ≡ y^2 (mod n),
and gcd(x – y, n) is possibly a non-trivial factor of n. This is the basic strategy of the SNFSM. We clarify some details now.
There is no clearly specified way to select the polynomial f defining the number field K. We require f to have small coefficients. Typically, m is much smaller than n, and one writes the expansion of n in base m as n = bt m^t + bt–1 m^(t–1) + ··· + b1 m + b0 with 0 ≤ bi < m. Taking f(X) = bt X^t + bt–1 X^(t–1) + ··· + b1 X + b0 is often suggested.
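The base-m construction above is easy to carry out exactly. The following sketch (the helper names int_root and base_m_poly are ours, and the toy value of n is chosen only for illustration) computes the digits bi, which then serve as the coefficients of f:

```python
def int_root(n, d):
    """Floor of the d-th root of n by bisection (exact integer arithmetic)."""
    lo, hi = 1, 1 << (n.bit_length() // d + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** d <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo

def base_m_poly(n, d):
    """Base-m expansion of n with m = floor(n^(1/d)); the digits b_i become
    the coefficients of f(X), so that f(m) = n ≡ 0 (mod n)."""
    m = int_root(n, d)
    coeffs = []
    x = n
    while x:
        coeffs.append(x % m)   # 0 <= b_i < m by construction
        x //= m
    return m, coeffs           # f(X) = sum_i coeffs[i] * X^i

m, c = base_m_poly(2 ** 64 + 1, 5)
assert sum(ci * m ** i for i, ci in enumerate(c)) == 2 ** 64 + 1
assert len(c) == 6             # f has degree d = 5
```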
For integers n of certain special forms, we have natural choices for f. The seminal paper on the NFSM by Lenstra et al. [167] assumes that n = r^e – s for a small integer r ≥ 2 and a non-zero integer s with small absolute value. In this case, one first chooses a small extension degree d and sets m := r^⌈e/d⌉ and f(X) := X^d – s·r^(⌈e/d⌉d–e). Typically, d = 5 works quite well in practice. Lenstra et al. report the implementation of the SNFSM for factoring n = 3^239 – 1. The parameters chosen are d = 5, m = 3^48 and f(X) = X^5 – 3. In this case, OK is monogenic and a PID.
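The identity f(m) ≡ 0 (mod n) holds here because m^d = r^(⌈e/d⌉d) = r^(⌈e/d⌉d–e) · r^e ≡ s·r^(⌈e/d⌉d–e) (mod n). A few lines of exact arithmetic confirm the parameters quoted from Lenstra et al. for n = 3^239 – 1:

```python
n = 3 ** 239 - 1                 # the example of Lenstra et al.
r, e, s, d = 3, 239, 1, 5
k = -(-e // d)                   # ceil(e/d) = 48, so m = 3^48
m = r ** k
c = s * r ** (k * d - e)         # f(X) = X^d - c; here k*d - e = 1, so c = 3
assert k * d - e == 1 and c == 3
assert (m ** d - c) % n == 0     # f(m) = 3^240 - 3 = 3(3^239 - 1) ≡ 0 (mod n)
```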

Take a small rational prime p ≤ B. From Section 2.13, it follows that if f(X) ≡ f1(X)^e1 ··· fr(X)^er (mod p) is the factorization of the canonical image of f(X) modulo p into monic irreducible polynomials fi(X) of degrees di, then pi := ⟨p, fi(α)⟩, i = 1, . . . , r, are all the primes lying over p. We have also seen that N(pi) = p^di, i = 1, . . . , r, is prime if and only if di = 1, that is, fi(X) = X – cp for some cp ∈ Zp. Thus, each root cp of f(X) in Zp corresponds to a prime ideal of OK of prime norm p.
To sum up, a prime ideal in OK of prime norm is specified by a pair (p, cp) of values with f(cp) ≡ 0 (mod p). We denote this ideal by p(p,cp). All the relevant ideals can be precomputed by finding the roots of the defining polynomial f(X) modulo the small primes p ≤ B. One can use the root-finding algorithms of Exercise 3.29.

Constructing the set G of generators of these prime ideals is a costly operation. We have just seen that each such prime ideal of OK corresponds to a pair (p, cp) and is, by assumption, a principal ideal. A generator gp,cp of the ideal p(p,cp) is an element of the form h(α), h(X) ∈ Z[X] of degree < d, with N(gp,cp) = ±p and h(cp) ≡ 0 (mod p). Algorithm 4.4 (quoted from Lenstra et al. [167]) computes the generators gp,cp for all relevant pairs (p, cp). The first for loop exhaustively searches over all small polynomials h(α) in order to locate, for each (p, cp), an element h(α) of norm kp with |k| as small as possible. If the smallest k (stored in ap,cp) is ±1, the element found is already a generator gp,cp of p(p,cp); otherwise, some additional adjustments need to be performed.
Algorithm 4.4
Choose two suitable positive constants aB and CB (depending on B and K). Initialize an array ap,cp := aB indexed by the relevant pairs (p, cp). for each . . .

Let K have the signature (r1, r2). Write ρ = r1 + r2 – 1. By Dirichlet’s unit theorem, the group of units of OK is generated by an appropriate root of unity u0 and ρ multiplicatively independent[3] elements u1, . . . , uρ of infinite order. Each unit u of OK has norm N(u) = ±1. Thus, one may keep on generating elements h0 + h1α + ··· + hd–1α^(d–1), hi small integers, of norm ±1, until ρ independent elements are found. Many elements of norm ±1 are available as a by-product during the construction of the generators gp,cp, which involves the computation of norms of many elements of Z[α]. For a more general exposition on this topic, see Algorithm 6.5.9 of Cohen [56].
[3] The elements u1, . . . , uρ in a (multiplicatively written) group are called (multiplicatively) independent if u1^(n1) ··· uρ^(nρ), with integers n1, . . . , nρ, is the group identity only for n1 = ··· = nρ = 0.
In order to compute the factorization of Equation (4.4), we first factor the integer N(a + bα) = ±b^d f(–a/b). If ⟨a + bα⟩ = p1^(e1) ··· pk^(ek) is the prime factorization of ⟨a + bα⟩ with pairwise distinct prime ideals p1, . . . , pk of OK, by the multiplicative property of norms we obtain
|N(a + bα)| = N(p1)^(e1) ··· N(pk)^(ek).
Now, let p ≤ B be a small prime. If p ∤ N(a + bα), it is clear that no prime ideal of OK of norm p (or a power of p) appears in the factorization of ⟨a + bα⟩. On the other hand, if p | N(a + bα), then p | N(pi) for some i. The assumption that ⟨a + bα⟩ factors into prime ideals of prime norms implies that the inertial degree of pi is 1: that is, N(pi) = p, that is, there is a cp with f(cp) ≡ 0 (mod p) such that the prime ideal pi corresponds to the pair (p, cp). In this case, we have a ≡ –cpb (mod p). Assume that another prime ideal of norm p appears in the prime factorization of ⟨a + bα⟩, corresponding to a pair (p, c′p); then a ≡ –c′pb (mod p). Since cp and c′p are distinct modulo p, it follows that p | gcd(a, b), a contradiction, since gcd(a, b) = 1. Thus, a unique ideal p(p,cp) of norm p appears in the factorization of ⟨a + bα⟩. Moreover, the multiplicity of p(p,cp) in the factorization of ⟨a + bα⟩ is the same as the multiplicity vp(N(a + bα)).
Thus, one may attempt to factorize N(a + bα) using trial divisions by primes ≤ B. If the factorization is successful, that is, if N(a + bα) is B-smooth, then for each prime divisor p of N(a + bα) we find the corresponding ideal p(p,cp) and its multiplicity in the factorization of ⟨a + bα⟩, as explained above. Since we know a generator of each such ideal, we eventually compute a factorization a + bα = u g1^(e1) ··· gk^(ek), where the gi are the known generators of the ideals appearing in ⟨a + bα⟩ and u is a unit in OK. What remains is to factor u as a product of the chosen generators of the unit group. We do not discuss this step here, but refer the reader to Lenstra et al. [167].
In the QSM, we check the smoothness of a single integer T(c) per trial, whereas for the NFS method we do so for two integers, namely, a + bm and N(a + bα). However, both these integers are much smaller than T(c), and the probability that they are simultaneously smooth is larger than the probability that T(c) alone is smooth. This accounts for the better asymptotic performance of the NFS method compared to the QSM.
One has to check the smoothness of a + bm and N(a + bα) for each coprime a, b in a predetermined interval. This check can be carried out efficiently using sieves. We have to use two sieves, one for filtering out the non-smooth a + bm values and the other for filtering out the non-smooth a + bα values. We should have gcd(a, b) = 1, but computing gcd(a, b) for all values of a and b is rather costly. We may instead use a third sieve to throw away the values of a for a given b for which gcd(a, b) is divisible by primes ≤ B. This still leaves us with some pairs (a, b) for which gcd(a, b) > 1. But this is not a serious problem, since such values are small in number and can be later discarded from the list of pairs (a, b) selected by the smoothness test.
We fix b and allow a to vary in the interval –M ≤ a ≤ M for a predetermined bound M. We use an array S indexed by a. Before the first sieve, we initialize this array to S(a) := ln |a + mb|. We may set S(a) := +∞ for those values of a for which gcd(a, b) is known to be > 1 (where +∞ stands for a suitably large positive value). For each small prime p ≤ B and small exponent h ≥ 1, we compute a′ := –mb (mod p^h) and subtract ln p from S(a) for each a, –M ≤ a ≤ M, with a ≡ a′ (mod p^h). Finally, for each value of a for which S(a) is not (close to) 0, that is, for which a + mb is not B-smooth, we set S(a) := +∞. For the other values of a, we set S(a) := 0. One may use incomplete sieving (with a liberal selection criterion) during the first sieve.
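The first (rational) sieve can be sketched as follows; the function name, the dictionary representation of S, and the toy threshold are our illustrative choices. Each cell starts at ln |a + bm|, and ln p is subtracted along the arithmetic progressions a ≡ –mb (mod p^h); cells that end near 0 correspond to B-smooth values of a + bm.

```python
from math import log

def rational_sieve(b, M, primes, m, threshold=1.0):
    """Sketch of the first sieve for fixed b: returns the values of a in
    [-M, M] for which a + b*m is smooth over `primes`."""
    S = {}
    for a in range(-M, M + 1):
        v = a + b * m
        S[a] = log(abs(v)) if v != 0 else float("inf")
    bound = abs(b) * m + M              # |a + bm| never exceeds this
    for p in primes:
        ph = p
        while ph <= bound:              # also sieve with prime powers p^h
            a0 = (-m * b) % ph          # a' := -mb (mod p^h)
            start = -M + ((a0 + M) % ph)
            for a in range(start, M + 1, ph):
                S[a] -= log(p)
            ph *= p
    return sorted(a for a in S if S[a] <= threshold and a + b * m != 0)

smooth = rational_sieve(1, 5, [2, 3, 5, 7], 10)
```

Here a + 10 runs over 5, . . . , 15, and the sieve keeps exactly the a for which a + 10 is {2, 3, 5, 7}-smooth (everything except 11 and 13).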
The second sieve proceeds as follows. We continue to work with the value of b fixed before the first sieve and with the array S available from the first sieve. For each prime ideal p(p,cp), we compute a″ := –bcp (mod p) and subtract ln p from each location S(a) for which a ≡ a″ (mod p). For those a for which the accumulated value comes within ξ ln B of –ln |N(a + bα)|, for some real ξ ≥ 1, say ξ = 2, we try to factorize a + bα over the ideals p(p,cp) and the units. If the attempt is successful, both a + bm and a + bα are smooth. This second sieve is an incomplete one and, therefore, we must use a liberal selection criterion.
For deriving the running time of the SNFSM, take d ≤ (3 ln n/(2 ln ln n))^(1/3), m = L(n, 2/3, (2/3)^(1/3)), B = L(n, 1/3, (2/3)^(2/3)) and M = L(n, 1/3, (2/3)^(2/3)). From the prime number theorem and from the fact that d is small, it follows that both the number of rational primes ≤ B and the number of prime ideals of prime norm ≤ B have the same asymptotic bound as B. Also the number of unit generators meets this bound. We then have L(n, 1/3, (2/3)^(2/3)) unknown quantities on which we have to do Gaussian elimination.
The integers a + mb have absolute values ≤ L(n, 2/3, (2/3)^(1/3)). If the coefficients of f are small, then
|N(a + bα)| = |b^d f(–a/b)| ≤ L(n, 1/3, d · (2/3)^(2/3)) = L(n, 2/3, (2/3)^(1/3)).
Under the heuristic assumption that a + mb and N(a + bα) behave as random integers of magnitude L(n, 2/3, (2/3)^(1/3)), the probability that both these are B-smooth turns out to be L(n, 1/3, –(2/3)^(2/3)), and so trying L(n, 1/3, 2(2/3)^(2/3)) pairs (a, b) is expected to give us L(n, 1/3, (2/3)^(2/3)) relations. The entire sieving process takes time L(n, 1/3, 2(2/3)^(2/3)), whereas solving a sparse system in L(n, 1/3, (2/3)^(2/3)) unknowns can be done in essentially the same time. Thus the running time of the SNFSM is L(n, 1/3, 2(2/3)^(2/3)) = L(n, 1/3, (32/9)^(1/3)).
| 4.6 | For m ∈ N, define the harmonic numbers Hm := 1 + 1/2 + ··· + 1/m. Show that for each m ∈ N we have ln(m + 1) ≤ Hm ≤ 1 + ln m. [H] Deduce that the sequence Hm, m ∈ N, is not convergent. (Note, however, that the sequence Hm – ln m, m ∈ N, converges to the constant γ = 0.57721566 . . . known as the Euler constant. It is not known whether γ is rational or not.)
| |
| 4.7 | Let k, c, c′, α be positive constants with α < 1. Prove the following assertions.
| |
| 4.8 | Let us assume that an adversary C has the computing power to carry out 10^12 floating point operations (flops) per second. Let A be an algorithm that computes a certain function P(n) using T(n) flops for an input n. We say that it is infeasible for C to compute P(n) using algorithm A, if it takes ≥ 100 years for the computation or, equivalently, if T(n) ≥ 3.1536 × 10^21. Find, for the following expressions of T(n), the smallest values of n that make the computation of P(n) by Algorithm A infeasible: T(n) = (ln n)^3, T(n) = (ln n)^10, T(n) = n, , T(n) = n^(1/4), T(n) = L[2], T(n) = L[1], T(n) = L[0.5], T(n) = L(n, 1/3, 2) and T(n) = L(n, 1/3, 1). (Neglect the o(1) terms in the definitions of L( ) and L[ ].)
| |
| 4.9 | Let n be an odd integer and let r be the total number of distinct (odd) prime divisors of n. Show that for each integer a the congruence x^2 ≡ a^2 (mod n) has ≤ 2^r solutions for x modulo n. If gcd(a, n) = 1, show that this congruence has exactly 2^r solutions. [H]
| |
| 4.10 | Show that the problems IFP and SQRTP are probabilistic polynomial-time equivalent. [H] | |
| 4.11 | In this exercise, we use the notations introduced in connection with the Quadratic Sieve method for factoring integers (Section 4.3.2). We assume that M ≪ H, since H ≈ √n, whereas M = L[1].
| |
| 4.12 | Reyneri’s cubic sieve method (CSM) Suppose that we want to factor an odd integer n. Suppose also that we know a triple (x, y, z) of integers satisfying x^3 ≡ y^2 z (mod n) with x^3 ≠ y^2 z (as integers). We assume further that |x|, |y|, |z| are all O(n^ξ) for some ξ, 1/3 < ξ < 1/2.
| |
| 4.13 | Sieve of Eratosthenes Two hundred years before Christ, Eratosthenes proposed a sieve (Algorithm 4.5) for computing all primes between 1 and a positive integer n. Prove the correctness of this algorithm and compute its running time. [H]
Algorithm 4.5. The sieve of Eratosthenes
| |
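Since the body of Algorithm 4.5 is not reproduced here, the following short sketch shows the standard form of the sieve (our own rendering, not the book's exact pseudocode): mark the multiples of every prime p with p^2 ≤ n, and the unmarked survivors are the primes.

```python
def eratosthenes(n):
    """Sketch of the sieve of Eratosthenes; runs in time O(n log log n)."""
    flags = bytearray([1]) * (n + 1)
    flags[0:2] = b"\x00\x00"                 # 0 and 1 are not primes
    p = 2
    while p * p <= n:
        if flags[p]:
            # Cross out p^2, p^2 + p, p^2 + 2p, ... (smaller multiples of p
            # were already crossed out by smaller primes).
            flags[p * p::p] = bytearray(len(range(p * p, n + 1, p)))
        p += 1
    return [i for i in range(2, n + 1) if flags[i]]
```

For example, eratosthenes(30) yields the ten primes 2, 3, 5, 7, 11, 13, 17, 19, 23, 29.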
| 4.14 | This exercise proposes an adaptation of the sieve of Eratosthenes for computing a random prime of a given bit length l. In Section 3.4.2, we described an algorithm for this computation that generates random (odd) integers of bit length l and checks the primality of each such integer until a (probable) prime is found. An alternative strategy is to generate a random l-bit odd integer n and check the integers n, n + 2, n + 4, . . . for primality.
|
The discrete logarithm problem (DLP) has attracted somewhat less attention from the research community than the IFP. Nonetheless, many algorithms exist to solve the DLP, most of which are direct adaptations of algorithms for solving the IFP. We start with the older algorithms, collectively known as the square-root methods, since the worst-case running time of each of these is O(√q) for the field Fq. The newer family of algorithms based on the index calculus method provides subexponential solutions to the DLP and is described next. For the sake of simplicity, we assume in this section that we want to compute the discrete logarithm indg a of an element a ∈ Fq* with respect to a primitive element g of Fq. We concentrate only on the fields Fp, p an odd prime, and F2^n, n ≥ 1, since non-prime fields of odd characteristic are only rarely used in cryptography.
Square-root methods are applicable to any finite (cyclic) group. To avoid repetition, we provide here a generic description. That is, we assume that G is a multiplicatively written group of order n, and that we are to compute d := indg a for given elements g, a ∈ G. The identity of G is denoted as 1. It is not necessary to assume that G is cyclic or that g is a generator of G. However, these assumptions are expected to make the descriptions of the algorithms somewhat easier, and hence we will stick to them. The necessary modifications for non-cyclic groups G or non-primitive elements g are rather easy, and the reader is requested to fill in the details. We assume that each element of G can be represented by O(lg n) bits (so that the input size is taken to be lg n) and that multiplications, exponentiations and inverses in G can be computed in time polynomially bounded by this input size.
Let us assume that the elements of G can be (totally) ordered in such a way that comparing two elements of G with respect to this order can be done in time polynomial in the input size. For example, a natural order on Fp* is the relation ≤ on the integer representatives {1, . . . , p – 1}. Note that k elements of G can be sorted (under the above order) using O(k log k) comparisons.
Let m := ⌈√n⌉. Then d := indg a is uniquely determined by two (nonnegative) integers d0, d1 < m such that d = d0 + d1m (the base-m representation of d). In Shanks’ baby-step–giant-step (BSGS) method, we compute d0 and d1 as follows. To start with, we compute a list of pairs (d0, g^d0) for d0 = 0, 1, . . . , m – 1 and store these pairs in a table sorted with respect to the second coordinate (the baby steps). Now, for each d1 = 0, 1, . . . , m – 1, we compute g^(–md1) (the giant steps) and search whether ag^(–md1) is the second coordinate of a pair (d0, g^d0) of some entry in the table mentioned above. If so, we have found the desired d0 and d1; otherwise, we try the next value of d1. An optimized implementation of this strategy is given as Algorithm 4.6.
The computation of all the elements of T and the sorting of T can be done in time O~(m). If we use a binary search algorithm (Exercise 4.15), then the search for h in T can be performed using O(lg m) comparisons in G. Therefore, the giant steps also take a total running time of O~(m). Since m ≈ √n, the BSGS method runs in time O~(√n). The memory requirement of the BSGS (that is, of the table T) is O(√n) elements of G. Thus this method becomes impractical even when n contains as few as 30 decimal digits.
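The strategy above can be sketched in a few lines. The sketch below is ours, not Algorithm 4.6 itself: a hash table (dictionary) replaces the sorted table T, which gives O(1) expected-time lookups in place of the O(lg m) binary-search comparisons, at the same O~(√n) total cost.

```python
from math import isqrt

def bsgs(g, a, n, p):
    """Shanks' baby-step-giant-step for d = ind_g(a) in Z_p^* of order n."""
    m = isqrt(n - 1) + 1                  # m = ceil(sqrt(n))
    table = {}
    h = 1
    for d0 in range(m):                   # baby steps: store g^d0 -> d0
        table.setdefault(h, d0)
        h = h * g % p
    g_inv_m = pow(g, -m, p)               # g^(-m) mod p (needs Python 3.8+)
    h = a % p
    for d1 in range(m):                   # giant steps: a * g^(-m*d1)
        if h in table:
            return (d1 * m + table[h]) % n
        h = h * g_inv_m % p
    return None                           # a is not in the subgroup <g>

d = bsgs(3, 37, 100, 101)                 # 3 is a primitive root modulo 101
```

Here pow(3, d, 101) == 37, as required.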
Pollard’s rho method for solving the DLP is similar in idea to the method of the same name for solving the IFP. Let f be a random map on pairs of exponents modulo n, and let us generate a sequence of tuples (ri, si), i = 1, 2, . . ., starting with a random (r1, s1) and subsequently computing (ri+1, si+1) = f(ri, si) for each i = 1, 2, . . . . The elements bi := a^ri g^si for i = 1, 2, . . . can then be thought of as randomly chosen ones from G. By the birthday paradox (Exercise 2.172), we expect to get a match bi = bj for some i ≠ j after O(√n) of the elements b1, b2, . . . are generated. But then we have a^(ri–rj) = g^(sj–si), that is, indg a ≡ (ri – rj)^(–1)(sj – si) (mod n), provided that the inverse exists, that is, gcd(ri – rj, n) = 1. The expected running time of this algorithm is O~(√n), the same as that of the BSGS method, but the storage requirement drops to only O(1) elements of G.
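A minimal sketch of the rho method follows; the customary 3-way partition of the group stands in for the random map, and Floyd's cycle-finding detects the match bi = bj without storing the sequence. The function name, the restart loop, and the toy parameters are our illustrative assumptions.

```python
import random

def rho_dlp(g, a, n, p, tries=50):
    """Pollard-rho sketch for d = ind_g(a) in Z_p^* of order n."""
    def f(t):
        b, r, s = t                       # invariant: b = a^r * g^s (mod p)
        if b % 3 == 0:
            return (b * b % p, 2 * r % n, 2 * s % n)   # b -> b^2
        if b % 3 == 1:
            return (b * g % p, r, (s + 1) % n)         # b -> b*g
        return (b * a % p, (r + 1) % n, s)             # b -> b*a
    for _ in range(tries):
        r0, s0 = random.randrange(n), random.randrange(n)
        x = (pow(a, r0, p) * pow(g, s0, p) % p, r0, s0)
        y = f(x)
        while x[0] != y[0]:               # Floyd: tortoise x, hare y
            x, y = f(x), f(f(y))
        (_, ri, si), (_, rj, sj) = x, y
        try:                              # d = (ri - rj)^(-1) (sj - si) mod n
            return pow(ri - rj, -1, n) * (sj - si) % n
        except ValueError:                # gcd(ri - rj, n) > 1: restart
            continue
    return None

d = rho_dlp(3, 37, 100, 101)
```

When the inverse of ri – rj does not exist, the sketch simply restarts from a fresh random point, as suggested by the "provided that the inverse exists" caveat above.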
Algorithm 4.6
Input: G, g and a as described above. Output: d = indg a.
Steps: n := ord(G). /* Baby steps */ Initialize T to an empty table. Insert the pairs (0, 1) and (1, g) in T. h := g. . . .
The Pohlig–Hellman (PH) method assumes that the prime factorization n = ord G = p1^α1 ··· pr^αr is known. Since d := indg a is unique modulo n, we can easily compute d using the CRT from a knowledge of d modulo pj^αj, j = 1, . . . , r. So assume that p is a prime dividing n and that p^α is the exact power of p dividing n. Let d0 + d1p + ··· + dα–1p^(α–1), 0 ≤ di < p, be the p-ary representation of d modulo p^α. The p-ary digits d0, d1, . . . , dα–1 can be successively computed as follows.
Let H be the subgroup of G generated by h := g^(n/p). We have ord H = p (Exercise 2.44). For the computation of di, 0 ≤ i ≤ α – 1, from the knowledge of d0, . . . , di–1, consider the element
b := (a · g^(–(d0 + d1p + ··· + di–1p^(i–1))))^(n/p^(i+1)).
But ord(g^(n/p^(i+1))) = p^(i+1), so that
b = (g^(n/p^(i+1)))^((d – (d0 + d1p + ··· + di–1p^(i–1))) mod p^(i+1)) = (g^(n/p^(i+1)))^(di p^i) = h^(di).
Thus, b ∈ H and di = indh b, that is, each di can be obtained by computing a discrete logarithm in the group H of order p (using the BSGS method or the rho method).
From the prime factorization of n, we see that the computations of d modulo pj^αj for all j = 1, . . . , r can be done in time O~(√q), q being the largest prime factor of n, since the αj and r are O(log n). Combining the values of d modulo pj^αj by the CRT can be done in polynomial time (in log n). In the worst case, q = O(n), and the PH method takes time O~(√n), which is fully exponential in the input size log n. But if q (or, equivalently, all the prime divisors p1, . . . , pr of n) is small, then the PH method runs quite efficiently. In particular, if q = O((log n)^c) for some (small) constant c, then the PH method computes discrete logarithms in G in polynomial time. This fact has an important bearing on the selection of a group G for cryptographic applications, namely, n = ord G is required to have a suitably large prime divisor, so that the PH method cannot compute discrete logarithms in G in feasible time.
The index calculus method (ICM) is not applicable to all (cyclic) groups. But whenever it applies, it usually leads to the fastest algorithms to solve the DLP. Several variants of the ICM are used for prime finite fields and also for finite fields of characteristic 2. On such a field Fq, they achieve subexponential running times of the order of L(q, 1/2, c) = L[c] or L(q, 1/3, c) for positive constants c. We start with a generic description of the ICM. We assume that g is a primitive element of Fq and want to compute d := indg a for some a ∈ Fq*.
To start with, we fix a suitable subset B = {b1, . . . , bt} of Fq* of small cardinality, so that a reasonably large fraction of the elements of Fq* can be expressed easily as products of elements of B. We call B a factor base. In the ICM, we search for relations of the form
Equation 4.5
g^α a^β b1^(γ1) ··· bt^(γt) = b1^(δ1) ··· bt^(δt)
for integers α, β, γi and δi. This gives us a linear congruence
Equation 4.6
α + β indg a + γ1d1 + ··· + γtdt ≡ δ1d1 + ··· + δtdt (mod q – 1), where di := indg bi.
The ICM proceeds in two[4] stages. In the first stage, we compute di := indg bi for each element bi in the factor base B. For that, we collect Relation (4.5) with β = 0. When sufficiently many relations are available, the corresponding system of linear Congruences (4.6) is solved mod q – 1 for the unknowns di. In the second stage, a single relation with gcd(β, q – 1) = 1 is found. Substituting the values of di available from the first stage yields indg a.
[4] Some authors prefer to say that the number of stages in the ICM is actually three, because they decouple the congruence-solving phase from the first stage. This is indeed justified, since implementations by several researchers reveal that for large fields this linear-algebra part often demands running time comparable to that needed by the relation-collection part. Our philosophy is to call the entire precomputation work the first stage. Now, although it hardly matters, it is up to the reader which camp she wants to join.
Note that as long as q (and g) are fixed, we do not have to carry out the first stage every time the discrete logarithm of an element of Fq* is to be computed. If the values di, i = 1, . . . , t, are stored, then only the second stage needs to be carried out for computing the indices of any number of elements of Fq*. This is the reason why the first stage of the ICM is often called the precomputation stage.
In order to make the algorithm more concrete, we have to specify:
how to choose a factor base B;
how to find Relation (4.5);
how to solve the resulting system of Congruences (4.6).
In the rest of this section, we describe variants of the ICM based on their strategies for selecting the factor base and for collecting relations. We discuss the third issue in Section 4.7.
Let Fp be a finite field of prime cardinality. For cryptographic applications, p should be quite large, say, of length around a thousand bits or more, and so naturally p is odd. Elements of Fp are canonically represented as integers between (and including) 0 and p–1. The equality x = y in Fp means equality of two integers in the range 0, . . . , p–1, whereas x ≡ y (mod p) means that the two integers x and y may be different, but their equivalence classes in Fp are the same.
In the basic version of the ICM, we choose the factor base B to comprise the first t primes q1, . . . , qt, where t = L[ζ]. (The optimal value of ζ is determined below.) In the first stage, we choose random values of α ∈ {0, . . . , p – 2} and compute g^α. Any integer representing g^α can be considered, but we think of g^α as an integer in {1, . . . , p – 1}. We then try to factorize g^α using trial divisions by elements of the factor base B. If g^α is found to be B-smooth, then we get a desired relation for the first stage, namely,
g^α = q1^(γ1) ··· qt^(γt), that is, α ≡ γ1d1 + ··· + γtdt (mod p – 1).
If g^α is not B-smooth, we try another random α and proceed as above. After sufficiently many relations are available, we solve the resulting system of linear congruences modulo p – 1. This gives us di := indg qi for i = 1, . . . , t.
In the second stage, we again choose random integers α and try to factorize ag^α completely over B. Once the factorization is successful, that is, we have ag^α = q1^(γ1) ··· qt^(γt), we compute d = indg a ≡ γ1d1 + ··· + γtdt – α (mod p – 1).
In order to optimize the running time, we note that the relation-collection phase of the first stage is usually the bottleneck of the algorithm. If ζ (or, equivalently, t) is chosen too small, then finding B-smooth integers becomes very difficult. On the other hand, if ζ is too large, then we have to collect too many relations to have a solvable linear system of congruences. More explicitly, since the integers g^α can be regarded as random integers of the order of p, the probability that g^α is B-smooth is L[–1/(2ζ)] (Corollary 4.1). Thus we expect to get each relation after L[1/(2ζ)] random values of α are tried. Since for each α we need to carry out L[ζ] divisions by elements of the factor base B (the exponentiation g^α can be done in polynomial time and hence can be neglected in this analysis), each relation can be found in expected time L[ζ + 1/(2ζ)]. Now, in order to solve for di, i = 1, . . . , t, we must have (slightly more than) t = L[ζ] relations. Thus, the relation-collection phase takes a total time of L[2ζ + 1/(2ζ)]. It can be easily checked that 2ζ + 1/(2ζ) is minimized for ζ = 1/2. This gives a running time of L[2] for the relation-collection phase.
Since each g^α is a positive integer less than p, it is evident that it can have at most O(log p) prime divisors. In other words, the congruences collected are necessarily sparse. As we will see later, such a system can be solved in time O~(t^2), that is, in time L[1] for ζ = 1/2.
In the second stage, it is sufficient to have a single relation to compute d = indg a. As explained before, such a relation can be found in expected time L[1.5]. Thus the total running time of the basic ICM is L[2].
The second stage of the basic ICM is much faster than the first stage. In fact, this is a typical phenomenon associated with most variants of the ICM. Speeding up the first stage is, therefore, our primary concern.
Each step in the search for relations consists of an exponentiation (g^α) modulo p followed by trial divisions by q1, . . . , qt. Now, g^α may be non-smooth, but g^α + kp (integer sum) may be smooth for some small non-zero integer k. Once g^α is computed and found to be non-smooth, one can check the smoothness of g^α + kp for k = ±1, ±2, . . ., before another α is tried. Since these integers are obtained by additions (or subtractions) only (which are much faster than exponentiation), this strategy tends to speed up the relation-collection phase. Moreover, information about the divisibility of g^α + kp by qi can be obtained from that of g^α + (k – 1)p by qi. So using suitable tricks one might reduce the cost of trial divisions. Two such possibilities are explored in Exercise 4.18. Though these modifications lead to some speed-up in practice, they have the disadvantage that as |k| increases, the size of |g^α + kp| also increases, so that the chance of getting smooth candidates reduces; therefore, using large values of |k| does not effectively help.
There are other heuristic modification schemes that help us gain some speed-up in practice. For example, the large prime variation as discussed in connection with the QSM applies equally well here. Another trick is to use the early abort strategy. A random B-smooth integer has higher probability of having many small prime factors rather than a few large prime factors. This observation can be incorporated in the smoothness tests as follows. Let us assume that we do trial divisions by the small primes in the order q1, q2, . . . , qt. After we do trial divisions of a candidate x by the first t′ < t primes (say, t′ ≈ t/2), we check how far we have been able to reduce x. If the reduction of x is already substantial, we continue with the trial divisions by the remaining primes qt′+1, . . . , qt. In the other case, we abort the smoothness test for x and try another candidate. Obviously, this strategy prematurely rejects some smooth candidates (which are anyway rather small in number), but since most candidates are expected to be non-smooth, it saves a lot of trial divisions in the long run. Determination of t′ and/or the quantization of “substantial” reduction actually depend on practical experiences. With suitable choices one may expect to get a speed-up of about 2. The drawback with the early abort strategy is that it often does not go well with sieving. Sieving, whenever applicable, should be given higher preference.
To sum up, the basic ICM and all its modifications can be used for computing discrete logarithms only in small fields, say, of size ≤ 80 bits. For bigger fields, we need newer ideas.
The linear sieve method (LSM) is a direct adaptation of the quadratic sieve method for factoring integers (Section 4.3.2). In the basic ICM just discussed, we try to find smooth integers from candidates that are on an average as large as O(p). The LSM, on the other hand, finds smooth ones from a pool of integers each of which is O(√p) times a subexponential factor. As a result, we expect a higher density of smooth integers among the candidates tested in the LSM than among those in the basic method. Furthermore, the LSM employs sieving techniques instead of trial divisions. All this helps the LSM achieve a running time of L[1], a definite improvement over the L[2] performance of the basic method.
Let H := ⌈√p⌉ and J := H^2 – p. Then J = O(√p). Let us consider the congruence
Equation 4.7
(H + c1)(H + c2) ≡ J + (c1 + c2)H + c1c2 (mod p).
For small integers c1, c2, the right side of the above congruence, henceforth denoted as
T(c1, c2) := J + (c1 + c2)H + c1c2,
is of the order of √p times a factor of size L[1/2]. If the integer T(c1, c2) is smooth with respect to the first t primes q1, q2, . . . , qt, that is, if we have a factorization like T(c1, c2) = q1^β1 q2^β2 · · · qt^βt, then we have a relation
indg(H + c1) + indg(H + c2) ≡ β1 indg q1 + β2 indg q2 + · · · + βt indg qt (mod p – 1).
For the linear sieve method, the factor base comprises the primes less than L[1/2] (so that t = L[1/2] by the prime number theorem) and the integers H + c for –M ≤ c ≤ M. The bound M on c is chosen to be of the order of L[1/2]. Each T(c1, c2), being of the order of √p L[1/2] in absolute value, has a probability of L[–1/2] of being qt-smooth. Thus once we check the factorization of T(c1, c2) for all (that is, for a total of L[1]) values of the pair (c1, c2) with –M ≤ c1 ≤ c2 ≤ M, we expect to get L[1/2] Relations (4.7) involving the unknown indices of the factor base elements. If we further assume that the primitive element g is a small prime which is itself in the factor base, then we get a free relation indg g = 1. The resulting system is then solved to compute the discrete logarithms of the elements in the factor base. This is the basic principle of the first stage of the LSM.
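The identity behind Relations (4.7) and the relation search can be checked numerically. A toy sketch (the prime, bounds and factor base are far smaller than the L[·]-sized choices in the text, and the sign of negative T values is simply ignored rather than tracked as a –1 factor):

```python
from math import isqrt

p = 10007                        # toy prime
H = isqrt(p) + 1                 # H = ceil(sqrt(p)) = 101 (p is not a square)
J = H * H - p                    # J = 194, of the order of sqrt(p)

def T(c1, c2):
    return J + (c1 + c2) * H + c1 * c2

# Congruence (4.7): (H + c1)(H + c2) = T(c1, c2) + p
for c1 in range(-5, 6):
    for c2 in range(c1, 6):
        assert (H + c1) * (H + c2) % p == T(c1, c2) % p

def smooth_exps(x, primes):
    """Exponent vector of |x| over `primes`, or None if not smooth."""
    x = abs(x)
    exps = [0] * len(primes)
    for i, q in enumerate(primes):
        while x % q == 0:
            x //= q
            exps[i] += 1
    return exps if x == 1 else None

base = [2, 3, 5, 7, 11, 13]
M = 30
relations = []
for c1 in range(-M, M + 1):
    for c2 in range(c1, M + 1):
        e = smooth_exps(T(c1, c2), base) if T(c1, c2) != 0 else None
        if e is not None:
            # indg(H+c1) + indg(H+c2) = sum_i e_i * indg q_i (mod p-1)
            relations.append((c1, c2, e))
assert len(relations) > 0
```

Each collected triple is one linear congruence in the unknown indices; with enough of them the system is solved modulo p – 1.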
If we compute all T(c1, c2) and use trial divisions by q1, . . . , qt to separate out the smooth ones, we achieve a running time of L[1.5], as can be easily seen. Sieving is employed to reduce the running time to L[1]. First one fixes a value of c1 and initializes to ln |T(c1, c2)| an array indexed by c2 in the range c1 ≤ c2 ≤ M. One then computes, for each prime power q^h (q being a small prime in the factor base and h a small positive exponent), a solution for c2 of the congruence (H + c1)c2 + (J + c1H) ≡ 0 (mod q^h).
If gcd(H + c1, q) = 1, that is, if H + c1 is not a multiple of q, then the solution is given by σ ≡ –(J + c1H)(H + c1)^(–1) (mod q^h). The inverse in the last congruence can be calculated by running the extended gcd algorithm (Algorithm 3.8) on H + c1 and q^h. Then for each value of c2 (in the range c1 ≤ c2 ≤ M) that is congruent to σ modulo q^h, the quantity ln q is subtracted from the array location indexed by c2.
If q | (H + c1), we find h1 := vq(H + c1) > 0 and h2 := vq(J + c1H) ≥ 0. If h1 > h2, then for every value of c2 the expression T(c1, c2) is divisible by q^h2 and by no higher power of q. So we subtract the quantity h2 ln q from the array location indexed by c2 for all c2. Finally, if h1 ≤ h2, then we subtract h1 ln q from the array location indexed by c2 for all c2, and for h > h1 we solve the congruence ((H + c1)/q^h1)c2 + (J + c1H)/q^h1 ≡ 0 (mod q^(h–h1)) as before; since vq(H + c1) is exactly h1, the coefficient of c2 here is invertible modulo q^(h–h1).
Once the above procedure is carried out for each small prime q in the factor base and for each small exponent h, we check for which values of c2 the array entry is equal (that is, sufficiently close) to 0. These are precisely the values of c2 such that for the given c1 the integer T(c1, c2) factors smoothly over the small primes in the factor base.
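The sieve over c2 can be sketched as follows, again with a toy prime. Only the gcd(H + c1, q) = 1 case is implemented: primes dividing H + c1 are skipped, which happens to be harmless for the c1 chosen here since they do not divide J + c1H either. The bounds and the zero-closeness threshold are illustrative assumptions:

```python
from math import isqrt, log

p = 10007
H = isqrt(p) + 1                 # 101
J = H * H - p                    # 194
base = [2, 3, 5, 7, 11, 13]
c1, M = -2, 30                   # fix c1, sieve over c2 in c1..M

def T(c2):                       # T(c1, c2) for the fixed c1
    return J + (c1 + c2) * H + c1 * c2

lo, hi = c1, M
acc = [log(abs(T(c2))) for c2 in range(lo, hi + 1)]   # T never vanishes here

for q in base:
    if (H + c1) % q == 0:
        continue                 # q | H + c1: handled separately in the text
    for h in range(1, 20):
        qh = q ** h
        if qh > abs(T(hi)) + abs(T(lo)):
            break                # no T value is divisible by a larger q^h
        # solve (H + c1) c2 + (J + c1 H) = 0 (mod q^h)
        sigma = (-(J + c1 * H) * pow(H + c1, -1, qh)) % qh
        for c2 in range(lo + (sigma - lo) % qh, hi + 1, qh):
            acc[c2 - lo] -= log(q)

smooth = [c2 for c2 in range(lo, hi + 1) if acc[c2 - lo] < 0.01]
assert smooth == [0, 1, 8]       # T = -8, 91 = 7*13, 784 = 2^4 * 7^2
```

Subtracting ln q once at every level h charges a total of vq(T) ln q to each location, so smooth entries sink to (approximately) zero without a single division.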
As in the QSM for integer factorization, it is sufficient to have some approximate representations of the logarithms (like ln q). Incomplete sieving and large prime variation can also be adopted as in the QSM.
Finally, we change c1 and repeat the sieving process described above. It is easy to see that the sieving operations for all c1 in the range –M ≤ c1 ≤ M take time L[1] as announced earlier. Gaussian elimination involving sparse congruences in L[1/2] variables also meets the same running time bound.
The second stage of the LSM can be performed in L[1/2] time. Using a method similar to the second stage of the basic ICM leads to a huge running time (L[3/2]), because we have only L[1/2] small primes in the factor base. We instead do the following. We start with a random j and try to obtain a factorization of the form
a g^j ≡ (∏_q q^(e_q)) ∏_u u^(w_u) (mod p),
where q runs over the L[1/2] small primes in the factor base and u runs over medium-sized primes, that is, primes less than L[2]. One can use an integer factorization algorithm to this effect. Lenstra’s ECM is particularly recommended, since it can detect smooth integers fast. More specifically, about L[1/4] random values of j need to be tried before we expect to get an integer with the desired factorization. Each attempted factorization using the ECM takes time less than L[1/4].
Now, we have indg a ≡ –j + Σ_q e_q indg q + Σ_u w_u indg u (mod p – 1). The indices indg q are available from the first stage, whereas for each u (with w_u ≠ 0) the index indg u is calculated as follows. First we sieve in an interval of size L[1/2] around H/u and collect integers y in this interval which are smooth with respect to the L[1/2] primes in the factor base. A second sieve in an interval of size L[1/2] around H gives us a small integer c such that (H + c)yu – p is smooth, again with respect to the L[1/2] primes in the factor base. Since H + c is in the factor base, we get indg u. The reader can easily verify that computing individual logarithms indg a by this method takes time L[1/2], as claimed earlier.
There are some other L[1] methods (like the Gaussian integer method and the residue list sieve method) known for computing discrete logarithms in prime fields. We will not discuss these methods in this book; interested readers may refer to Coppersmith et al. [59]. A faster method (running time L[0.816]), namely the cubic sieve method, is covered in Exercise 4.21. Now, we turn our attention to the best method known to date.
The number field sieve method (NFSM) for solving the DLP in a prime field F_p is a direct adaptation of the NFSM used to factor integers (Section 4.3.4). As before, we let g be a generator of F_p* and are interested in computing the index indg a for some a ∈ F_p*.
We choose an irreducible polynomial f(X) ∈ Z[X] with small integer coefficients and of degree d, and use the number field K = Q(α) for some root α of f. For the sake of simplicity, we consider the special case (SNFSM) in which f is monic, the ring O_K of integers of K is a PID, and O_K = Z[α]. We also choose an integer m such that f(m) ≡ 0 (mod p) and define the ring homomorphism
Φ : Z[α] → F_p, α ↦ m (mod p).
Finally, we predetermine a smoothness bound, and let P denote the set of (rational) primes below this bound, Q the set of prime ideals of O_K whose norms are primes below this bound, G a set of generators of the (principal) ideals in Q, and U a set of generators of the group of units of O_K.
We try to find coprime integers c, d of small absolute values such that both c + dα and Φ(c + dα) = c + dm are smooth, with respect to Q and to P respectively, that is, we have factorizations of the forms
〈c + dα〉 = ∏_{q∈Q} q^(b_q) and c + dm = ∏_{q∈P} q^(e_q),
or equivalently, c + dα = (∏_{u∈U} u^(a_u)) ∏_{γ∈G} γ^(b_γ). But then, applying Φ and taking indices of both sides of the resulting congruence modulo p, we obtain
Equation 4.8
Σ_{u∈U} a_u indg Φ(u) + Σ_{γ∈G} b_γ indg Φ(γ) ≡ Σ_{q∈P} e_q indg q (mod p – 1).
This motivates us to define the factor base as
B := P ∪ {Φ(u) | u ∈ U} ∪ {Φ(γ) | γ ∈ G},
regarded as a subset of F_p*.
We assume that g itself belongs to the factor base B, so that we have the free relation indg g ≡ 1 (mod p – 1).
Trying sufficiently many pairs (c, d) we generate many Relations (4.8). The resulting sparse linear system is solved for the unknown indices of the elements of B. This completes the first stage of the SNFSM.
In the second stage, we bring a into the picture in the following manner. First assume that a is small, so that either a is smooth with respect to the small rational primes in the factor base, that is,
a = ∏_q q^(e_q) (the product running over these primes),
or for some γ ∈ Z[α] with Φ(γ) ≡ a (mod p) the ideal 〈γ〉 can be written as a product of the small-norm prime ideals of O_K, so that γ itself is a product of powers of the chosen unit and ideal generators.
In both cases, taking logarithms and substituting the indices of the elements of the factor base (available from the first stage) yields indg a.
However, a is not small in general, and it is a non-trivial task to find a γ ∈ Z[α] with Φ(γ) ≡ a (mod p) such that 〈γ〉 is smooth over the small-norm prime ideals. We instead write a as a product
Equation 4.9
a ≡ a1 a2 · · · ar (mod p),
where each ai is small enough so that indg ai can be computed using the method described above. This gives indg a ≡ indg a1 + indg a2 + · · · + indg ar (mod p – 1). In order to see how one can find a representation of a as a product of small integers as in Congruence (4.9), we refer the reader to Weber [300].
As in most variants of the ICM, the running time of the SNFSM is dominated by the first stage and, under certain heuristic assumptions, can be shown to be of the order of L(p, 1/3, (32/9)^(1/3)). Look at Section 4.3.4 to see how the different parameters can be set in order to achieve this running time. For the general NFS method (GNFSM), the running time is L(p, 1/3, (64/9)^(1/3)). The GNFSM has been implemented by Weber and Denny [301] for computing discrete logarithms modulo a particular prime having 129 decimal digits (see McCurley [189]).
We wish to compute the discrete logarithm indg a of an element a ∈ F_q*, q = 2^n, with respect to a primitive element g of F_q*. We work with the representation F_q = F_2[X]/〈f(X)〉 for some irreducible polynomial f ∈ F_2[X] with deg f = n. For certain algorithms, we require f to be of special forms. This does not create serious difficulties, since it is easy to compute isomorphisms between two polynomial basis representations of F_q (Exercise 3.38).
Recall that we have defined the smoothness of an integer x in terms of the magnitudes of the prime divisors of x. Now, we deal with polynomials (over F_2) and extend the definition of smoothness in the obvious way: a polynomial is called smooth if it factors into irreducible polynomials of low degrees. The next theorem is an analog of Theorem 2.21 for polynomials. By an abuse of notation, we use ψ(·, ·) here also; the context should make it clear whether we are talking about smoothness of integers or of polynomials.
|
Let r and m be positive integers with r^(1/100) ≤ m ≤ r^(99/100), and let u := r/m. Then, as r, m → ∞, the probability that a random non-zero polynomial of degree r over F_2 has all its irreducible factors of degrees ≤ m is ψ(r, m) := u^(–u+o(u)) = e^(–(1+o(1))u ln u). |
The above expression for ψ(r, m), though valid asymptotically, gives good approximations for finite values of r and m. The condition r^(1/100) ≤ m ≤ r^(99/100) is met in most practical situations. The probability ψ(r, m) is a very sensitive function of u = r/m. For a fixed m, polynomials of smaller degrees have higher chances of being smooth (that is, of having all irreducible factors of degrees ≤ m).
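For intuition, dropping the o( ) terms gives the crude estimate ψ(r, m) ≈ u^(–u), which can be evaluated directly; the degrees below are arbitrary illustrations. The last check previews the gain ψ(n/2, m)^2/ψ(n, m) ≈ 2^(n/m) exploited by the modification of Blake et al. discussed later in this section:

```python
def psi_approx(r, m):
    """Crude smoothness probability u^(-u), u = r/m, ignoring o() terms."""
    u = r / m
    return u ** (-u)

# A random degree-1000 polynomial is far likelier to be 100-smooth
# (u = 10) than 50-smooth (u = 20):
assert abs(psi_approx(1000, 100) - 1e-10) < 1e-19
assert psi_approx(1000, 100) > psi_approx(1000, 50)

# Halving the degree boosts the smoothness probability dramatically:
# psi(r/2, m)^2 / psi(r, m) ~ 2^(r/m)   (here 2^10 = 1024)
ratio = psi_approx(500, 100) ** 2 / psi_approx(1000, 100)
assert abs(ratio - 1024) < 1e-9
```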
Now, let us consider the field F_q with q = 2^n. The elements of F_q are represented as polynomials of degrees ≤ n – 1. For a given m, the probability that a randomly chosen element of F_q has all irreducible factors of degrees ≤ m is then approximately ψ(n – 1, m) ≈ ψ(n, m), as n, m → ∞ with n^(1/100) ≤ m ≤ n^(99/100). We can, therefore, approximate this probability by ψ(n, m).
For many algorithms that we will come across shortly, we have r ≈ n/α and m ≈ β√(n ln n) for some positive α and β, so that u = r/m ≈ (1/(αβ))√(n/ln n) and, consequently, ψ(r, m) = u^(–u+o(u)) is subexponential in n.
The idea of the basic ICM for F_(2^n) is analogous to that for prime fields. Now, the factor base B comprises all irreducible polynomials of F_2[X] having degrees ≤ m. We choose m of the order of √(n ln n). (As in the case of the basic ICM for prime fields, this can be shown to be the optimal choice.) By Approximation (2.5) on p 84, we then have t := #B ≈ 2^(m+1)/m.
In the first stage, we choose random α, 1 ≤ α ≤ q – 2, compute gα and check if gα is B-smooth. If so, we get a relation. For a random α, the polynomial gα is a random polynomial of degree < n and hence has a probability of nearly ψ(n, m) of being smooth. Note that unlike integers, a polynomial over F_2 can be factored in probabilistic polynomial time (though for small m it may be preferable to do trial division by elements of B). Thus checking the smoothness of a random element of F_q can be done in (probabilistic) polynomial time, and each relation is available in expected time ψ(n, m)^(–1) times a polynomial in n. Since we need (slightly more than) t relations for setting up the linear system, the relation collection stage runs in expected time about t ψ(n, m)^(–1). A sparse system with t unknowns can also be solved within a similar time bound.
In the second stage, we need a single smooth polynomial of the form gαa. If α is randomly chosen, we expect to get this relation in time ψ(n, m)^(–1). Therefore, the second stage is again faster than the first and, with the optimal choice of m, the basic method takes a total expected running time of L[c] for a constant c < 2. Recall that the basic method for F_p requires time L[2]. The difference arises because polynomial factorization is much easier than integer factorization.
We now explain a modification of the basic method, proposed by Blake et al. [23]. Let h ∈ F_q*: that is, a non-zero polynomial in F_2[X] of degree < n. If h is randomly chosen from F_q* (as in the case of gα or gαa for random α), then we expect the degree of h to be close to n. Let us write h ≡ h1/h2 (mod f) (f being the defining polynomial) with h1 and h2 each having degree ≈ n/2. Then the ratio of the probability that both h1 and h2 are smooth to the probability that h is smooth is ψ(n/2, m)^2/ψ(n, m) ≈ 2^(n/m) (neglecting the o( ) terms). For practical values of n and m, this ratio can be substantially large, implying that it is easier to get relations by trying to factor both h1 and h2 instead of trying to factor h. This is the key observation behind the modification due to Blake et al. [23]. Simple calculations show that this modification does not affect the asymptotic
behaviour of the basic method, but it leads to considerable speed-up in practice.
In order to complete the description of the modification of Blake et al. [23], we mention an efficient way to write h as h1/h2 (mod f). Since 0 ≤ deg h < n and since f is irreducible of degree n, we must have gcd(h, f) = 1. During the iteration of the extended gcd algorithm we actually compute a sequence of polynomials uk, vk, xk such that ukh + vkf = xk for all k = 0, 1, 2, . . . . At the start of the algorithm we have u0 = 1, v0 = 0 and x0 = h. As the algorithm proceeds, the sequence deg uk changes non-decreasingly, whereas the sequence deg xk changes non-increasingly and at the end of the extended gcd algorithm we have xk = 1 and the desired Bézout relation ukh + vkf = 1 with deg uk ≤ n – 1. Instead of proceeding till the end of the gcd loop, we stop at the value k = k′ for which deg xk′ is closest to n/2. We will then usually have deg uk′ ≈ n/2, so that taking h1 = xk′ and h2 = uk′ serves our purpose.
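The truncated extended gcd described above is easy to prototype with F_2[X] polynomials encoded as Python integers (bit i being the coefficient of X^i). This is a toy sketch over a degree-8 field, whereas real instances use n in the hundreds; the helper names are ours:

```python
def pdeg(a):
    return a.bit_length() - 1            # degree; pdeg(0) = -1

def pmul(a, b):                          # carry-less multiplication in F_2[X]
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def pdivmod(a, b):                       # quotient and remainder in F_2[X]
    q = 0
    while pdeg(a) >= pdeg(b):
        s = pdeg(a) - pdeg(b)
        q ^= 1 << s
        a ^= b << s
    return q, a

def half_gcd_split(h, f):
    """Stop the extended gcd of f and h midway to get h1, h2 with
    h = h1/h2 (mod f) and deg h1, deg h2 roughly (deg f)/2."""
    r0, r1 = f, h
    u0, u1 = 0, 1                        # invariant: u_k * h = r_k (mod f)
    while pdeg(r1) > pdeg(f) // 2:
        q, r = pdivmod(r0, r1)
        r0, r1 = r1, r
        u0, u1 = u1, u0 ^ pmul(q, u1)
    return r1, u1                        # h1 = r_k', h2 = u_k'

f = 0b100011011                          # X^8 + X^4 + X^3 + X + 1 (irreducible)
h = 0b10110101                           # a degree-7 element
h1, h2 = half_gcd_split(h, f)
assert pdivmod(pmul(h2, h), f)[1] == h1  # h * h2 = h1 (mod f)
assert pdeg(h1) <= 4 and pdeg(h2) <= 4   # both halves have degree <= n/2
```

Here n = 8, so both halves come out with degree at most 4; in the ICM one applies this to h = gα rem f and then factors h1 and h2 separately.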
The concept of large prime variation is applicable to the basic ICM as well. Moreover, if trial divisions are used for smoothness tests, one can employ the early abort strategy. Despite all these modifications, the basic variant remains rather slow, and our hunt for faster algorithms continues.
The LSM for prime fields can be readily adapted to the fields F_q, q = 2^n. Let us assume that the defining polynomial f is of the special form f(X) = X^n + f1(X), where deg f1 is small. The total number of choices for such f with deg f1 < k is 2^k. Under the assumption that irreducible polynomials (over F_2) of degree n are randomly distributed among the set of polynomials of degree n, we expect to find an irreducible polynomial f = X^n + f1 with deg f1 = O(lg n) (see Approximation (2.5) on p 84). In particular, we may assume that deg f1 ≤ n/2.
Let k := ⌈n/2⌉ and σ := 2k – n (so that σ ∈ {0, 1}). For polynomials h1, h2 ∈ F_2[X] of small degrees, we then have
(X^k + h1)(X^k + h2) ≡ X^σ f1 + (h1 + h2)X^k + h1h2 (mod f).
The right side of the congruence, namely,
T(h1, h2) := X^σ f1 + (h1 + h2)X^k + h1h2,
has degree slightly larger than n/2. This motivates the following algorithm.
We take a suitable smoothness bound m and let the factor base B be the (disjoint) union of B1 and B2, where B1 contains the irreducible polynomials of degrees ≤ m, and where B2 contains the polynomials of the form X^k + h, deg h ≤ m. Both B1 and B2 (and hence B) contain L[1/2] elements. For each pair X^k + h1, X^k + h2 from B2, we then check the smoothness of T(h1, h2) over B1. Since deg T(h1, h2) ≈ n/2, the probability of finding a smooth candidate per trial is L[–1/2]. Therefore, trying L[1] values of the pair (h1, h2) is expected to give L[1/2] relations (in L[1/2] variables). Since factoring each T(h1, h2) can be performed in probabilistic polynomial time, the relation collection stage takes time L[1]. Gaussian elimination (with sparse congruences) can be done in the same time. As in the case of the LSM for prime fields, the second stage can be carried out in time L[1/2]. To sum up, the LSM for fields of characteristic 2 takes L[1] running time.
Note that the running time L[1] is achievable in this case without employing any sieving techniques. This is again because checking the smoothness of each T(h1, h2) can be performed in polynomial time. Polynomial sieving, though unable to improve upon the L[1] running time, often speeds up the method in practice; we describe such a sieving procedure in connection with Coppersmith’s algorithm, which comes next.
Coppersmith’s algorithm is the fastest algorithm known for computing discrete logarithms in finite fields F_q of characteristic 2. Theoretically it achieves the (heuristic) running time L(q, 1/3, c) and is, therefore, subexponentially faster than the L[c′] = L(q, 1/2, c′) algorithms described so far. Gordon and McCurley have made aggressive attempts to compute discrete logarithms in fields as large as F_(2^503) using Coppersmith’s algorithm in tandem with a polynomial sieving procedure, and have thereby established the practicality of the algorithm.
In the basic method, each trial during the search for relations involves checking the smoothness of a polynomial of degree nearly n. The modification due to Blake et al. [23] replaces this by checking the smoothness of two polynomials of degree ≈ n/2. For the adaptation of the LSM, on the other hand, we check the smoothness of a single polynomial of degree ≈ n/2. In Coppersmith’s algorithm, each trial consists of checking the smoothness of two polynomials of degrees ≈ n^(2/3). This is the basic reason behind the improved performance of Coppersmith’s algorithm.
To start with, we make the assumption that the defining polynomial f of F_q is of the form f(X) = X^n + f1(X) with deg f1 = O(lg n). We have argued earlier that an irreducible polynomial f of this special form is expected to be available. We now choose three integers m, M, k such that
m ≈ αn^(1/3)(ln n)^(2/3), M ≈ βn^(1/3)(ln n)^(2/3) and 2^k ≈ γn^(1/3)(ln n)^(–1/3),
where the (positive real) constants α, β and γ are to be chosen appropriately to optimize the running time. The factor base B comprises irreducible polynomials (over F_2) of degrees ≤ m. Let
l := ⌊n/2^k⌋ + 1,
so that l ≈ (1/γ)n^(2/3)(ln n)^(1/3). Choose relatively prime polynomials u1(X) and u2(X) (in F_2[X]) of degrees ≤ M and let
h1(X) := u1(X)X^l + u2(X) and h2(X) := (h1(X))^(2^k) rem f(X).
But then, since indg h2 ≡ 2^k indg h1 (mod q – 1), we get a relation if both h1 and h2 are smooth over B. By choice, deg h1 is clearly O~(n^(2/3)), whereas
h2(X) ≡ u1(X^(2^k))X^(l·2^k) + u2(X^(2^k)) ≡ u1(X^(2^k))X^(l·2^k–n)f1(X) + u2(X^(2^k)) (mod f)
and, therefore, deg h2 = O~(n^(2/3)) too.
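The coefficient-spreading identity used here — over F_2, u(X)^(2^k) = u(X^(2^k)) — is easy to verify with carry-less integer arithmetic. A small sketch (the bitmask encoding and helper names are our own, not from the text):

```python
def pmul(a, b):                      # carry-less (F_2[X]) product of int bitmasks
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def spread(a, k):                    # substitute X -> X^(2^k): bit i moves to i * 2^k
    r, i = 0, 0
    while a >> i:
        if (a >> i) & 1:
            r |= 1 << (i << k)
        i += 1
    return r

u = 0b1011001                        # u(X) = X^6 + X^4 + X^3 + 1
assert pmul(u, u) == spread(u, 1)    # u(X)^2 = u(X^2): cross terms cancel mod 2

v = u
for _ in range(3):                   # u(X)^(2^3) by repeated squaring
    v = pmul(v, v)
assert v == spread(u, 3)             # u(X)^8 = u(X^8)
```

This is why h2 can be written down directly from u1, u2 and f1 without performing a 2^k-th power modulo f.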
For each pair (u1, u2) of relatively prime polynomials of degrees ≤ M, we compute h1 and h2 as above and collect all the relations corresponding to the smooth values of both h1 and h2. This gives us the desired (sparse) system of linear congruences in the unknown indices of the elements of B, which is subsequently solved modulo q – 1.
The choice of optimal values for α and β, together with γ = α^(–1/2), gives the running time of the first stage as
e^[(2α ln 2 + o(1))n^(1/3)(ln n)^(2/3)] = L(q, 1/3, 2α(ln 2)^(2/3)) ≈ L(q, 1/3, 1.526).
The second stage of Coppersmith’s algorithm is somewhat involved. The factor base now contains only about L(q, 1/3, 0.763) elements. Therefore, finding a relation using a method similar to the second stage of the basic method requires time L(q, 2/3, c) for some c, which is much worse than even L[c′] = L(q, 1/2, c′). To work around this difficulty we start by finding a polynomial gαa all of whose irreducible factors have degrees ≤ n^(2/3)(ln n)^(1/3). This takes time of the order of L(q, 1/3, c1) (where c1 ≈ 0.377) and gives us gαa ≡ v1v2 · · · vr (mod f), where the vi have degrees ≤ n^(2/3)(ln n)^(1/3). Note that the number of vi is less than n, since deg(gαa) < n. We then have
indg a ≡ –α + indg v1 + indg v2 + · · · + indg vr (mod q – 1).
All these vi need not belong to the factor base, so we cannot simply substitute the values of indg vi. We instead reduce the problem of computing each indg vi to that of computing indg vi,i′ for several polynomials vi,i′ with deg vi,i′ ≤ σ deg vi for some constant 0 < σ < 1. Subsequently, computing each indg vi,i′ is reduced to computing indg vi,i′,i″ for several i″ with deg vi,i′,i″ < σ deg vi,i′. Repeating this process, we eventually end up with polynomials in the factor base. Because each reduction step generates new polynomials with degrees reduced by at least the constant factor σ, the recursion depth is O(ln n). Now, if for each i the number of i′ is ≤ n, for each i′ the number of i″ is ≤ n, and so on, we have to carry out the reduction of at most n^(O(ln n)) = e^(O((ln n)^2)) = L(q, 1/3, 0) polynomials. Therefore, if each reduction can be performed in time L(q, 1/3, c2), the second stage will run in time L(q, 1/3, max(c1, c2)).
In order to explain how a polynomial v of degree ≤ d ≤ n^(2/3)(ln n)^(1/3) can be reduced in the desired time, we choose a suitable integer k and let l := ⌊n/2^k⌋ + 1. As in the first stage, we fix a suitable bound M, choose relatively prime polynomials u1(X), u2(X) of degrees ≤ M and define
h1(X) := u1(X)X^l + u2(X)
and
h2(X) := (h1(X))^(2^k) rem f(X) = u1(X^(2^k))X^(l·2^k–n)f1(X) + u2(X^(2^k)).
The polynomials u1 and u2 should be so chosen that v | h1. We see that h1 and h2 have low degrees, and we try to factor h1/v and h2. Once we get a factorization of the form
h1/v = v1 v2 · · · vr and h2 = w1 w2 · · · ws
with deg vi, deg wj < σ deg v, we have the desired reduction of v, namely,
2^k (indg v + indg v1 + · · · + indg vr) ≡ indg w1 + · · · + indg ws (mod q – 1),
that is, the reduction of the computation of indg v to that of all indg vi and indg wj (note that 2^k is invertible modulo q – 1, since q – 1 is odd). With the choice M ≈ (n^(1/3)(ln n)^(2/3)(ln 2)^(–1) + deg v)/2 and σ = 0.9, reduction of each polynomial can be shown to run in time L(q, 1/3, (ln 2)^(–1/3)) ≈ L(q, 1/3, 1.130). Thus the second stage of Coppersmith’s algorithm runs in time L(q, 1/3, 1.130) and is faster than the first stage.
Large prime variation is a useful strategy to speed up Coppersmith’s algorithm. If trial divisions are used for smoothness tests, the early abort strategy can also be applied. However, a more efficient idea (though one that does not combine well with early abort) is to use polynomial sieving, as introduced by Gordon and McCurley.
Recall that in the first stage we take relatively prime polynomials u1 and u2 of degrees ≤ M and check the smoothness of both h1(X) = u1(X)X^l + u2(X) and h2(X) = h1(X)^(2^k) rem f(X). We now explain the (incomplete) sieving technique for filtering out the (non-)smooth values of h1 = (h1)u1,u2 for the different values of u1 and u2. To start with, we fix u1 and let u2 vary. We need an array indexed by u2, a polynomial of degree ≤ M. Clearly, u2 can assume 2^(M+1) values, and so the array must contain 2^(M+1) elements. To be very concrete, we let the location for u2 be the one numbered u2(2), where u2(2) ≥ 0 is the integer obtained canonically by substituting 2 for X in u2(X), considered as a polynomial in Z[X] with coefficients 0 and 1. We initialize all the locations of the array to zero.
Let t = t(X) be a small irreducible polynomial in the factor base B (or a small power of such an irreducible polynomial) with δ := deg t. The values of u2 for which t divides (h1)u1,u2 satisfy the polynomial congruence u2(X) ≡ u1(X)X^l (mod t). Let u2* be the solution of this congruence with δ* := deg u2* < δ. If δ* > M, then no admissible value of u2 gives (h1)u1,u2 divisible by t. So assume that δ* ≤ M. If δ > M, then the only value of u2 for which t divides (h1)u1,u2 is u2 = u2*. So we may also assume that δ ≤ M. Then the values of u2 that make (h1)u1,u2 divisible by t are given by u2 = u2* + v(X)t(X) for all polynomials v(X) of degrees ≤ M – δ. For each of these 2^(M–δ+1) values of u2, we add δ = deg t to the array location indexed by u2.
When the process mentioned in the last paragraph is completed for all such t (the small irreducible polynomials in B and their small powers), we find out for which values of u2 the array locations contain values close to deg (h1)u1,u2. These values of u2 correspond to the smooth values of (h1)u1,u2 for the chosen u1. Finally, we vary u1 and repeat the sieving procedure.
In each sieving step described above, we have to find all the values u2 = u2* + vt, where u2* is the solution of the above congruence and v runs through all polynomials of degrees ≤ M – δ. We may choose the different possibilities for v in any sequence, compute the products vt and then add these products to u2*. While doing so serves our purpose, it is not very efficient, because computing each u2 involves a polynomial multiplication vt. Gordon and McCurley’s trick steps through the possibilities for v in a clever sequence that lets one obtain each value of u2 from the previous one with much less effort than a polynomial multiplication. The 2^(M–δ+1) choices of v can be naturally mapped to the bit strings of length (exactly) M – δ + 1 (with the coefficients of lower powers of X appearing later in the sequence). This motivates the following concept.
|
Let d ≥ 1. The Gray code of dimension d is a sequence Gd of all the 2^d bit strings of length d, defined recursively by G1 := 0, 1 and Gd+1 := 0Gd, 1Gd^R, where Gd^R denotes the sequence Gd written in the reverse order, and where juxtaposition denotes string concatenation (the prefixed bit being attached to each string of the sequence). |
For example, the Gray code of dimension 2 is 00, 01, 11, 10 and that of dimension 3 is 000, 001, 011, 010, 110, 111, 101, 100. Proposition 4.1 can be easily proved by induction on the dimension d.
|
Let b1, b2, . . . , b_(2^d) be the strings of the Gray code of dimension d, and for a positive integer i let v2(i) denote the largest integer h with 2^h | i. Then for 1 ≤ i < 2^d, the strings bi and bi+1 differ in exactly one bit, namely, in the bit at position v2(i) (counted from the right, starting from 0). |
Back to our sieving business! Let us agree to step through the values of v in the sequence v1, v2, . . . , v_(2^(M–δ+1)), where vi corresponds to the i-th bit string of the (M – δ + 1)-dimensional Gray code. Let us also call the corresponding values of u2 as (u2)1, (u2)2, . . . . Now, v1 = 0 and the corresponding (u2)1 = u2* is available at the beginning. By Proposition 4.1 we have, for 1 ≤ i < 2^(M–δ+1), the equality vi+1 = vi + X^(v2(i)), so that (u2)i+1 = (u2)i + X^(v2(i))t. Computing the product X^(v2(i))t involves only shifting the coefficients of t and is done efficiently using bit operations (assuming the data structures introduced in Section 3.5). Thus (u2)i+1 is obtained from (u2)i by a shift followed by a polynomial addition. This is much faster than computing (u2)i+1 directly as u2* + vi+1 t.
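As a quick check of Proposition 4.1 and of the shift-and-add update, here is a toy simulation (integers encode bit strings and polynomials, bit i being the coefficient of X^i; for simplicity we take u2* = 0 and a tiny t):

```python
def v2(i):
    """2-adic valuation: the largest h with 2^h dividing i."""
    return (i & -i).bit_length() - 1

def gray_walk(d):
    """Yield the 2^d Gray-code patterns; step i flips exactly bit v2(i)."""
    v = 0
    yield v
    for i in range(1, 1 << d):
        v ^= 1 << v2(i)              # v_{i+1} = v_i + X^{v2(i)}
        yield v

codes = list(gray_walk(3))
assert codes[:4] == [0b000, 0b001, 0b011, 0b010]
assert len(set(codes)) == 8          # every 3-bit pattern exactly once
assert all(bin(a ^ b).count("1") == 1 for a, b in zip(codes, codes[1:]))

def pmul(a, b):                      # carry-less product in F_2[X]
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

# Stepping u2 = u2* + v*t by shifts instead of full multiplications:
t = 0b111                            # t(X) = X^2 + X + 1
u2, walked = 0, [0]
for i in range(1, 1 << 3):
    u2 ^= t << v2(i)                 # add X^{v2(i)} * t: a shift and an xor
    walked.append(u2)
assert walked == [pmul(v, t) for v in codes]
```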
We mentioned earlier that efficient implementations of Coppersmith’s algorithm allow one to compute, in feasible time, discrete logarithms in fields as large as F_(2^503). However, for much larger fields, say for n ≥ 1024, this algorithm is still not a practical breakthrough. The intractability of the DLP continues to remain cryptographically exploitable.
| 4.15 | Binary search Let ≤ be a total order on a set S (finite or infinite) and let a1 ≤ a2 ≤ ··· ≤ am be a given sequence of elements of S. Devise an algorithm that, given an arbitrary element |
| 4.16 | |
| 4.17 | Let p be a prime and g a primitive element of . For a , prove the explicit formula (mod p). What is the problem in using this formula for computing indices in ?
|
| 4.18 | In the basic ICM for the prime field F_p, we try to factor random powers gα over the factor base B = {q1, . . . , qt}. In addition to the canonical representative of gα in the set {1, . . . , p – 1}, one can also check for the smoothness of the integers gα + kp for –M ≤ k ≤ M, where M is a small positive integer (to be determined experimentally).
|
| 4.19 |
|
| 4.20 | Consider the following modification of the LSM for . Define for the integers and . Choose a small and repeat the linear sieve method for each r, 1 ≤ r ≤ s, that is, check the smoothness (over the first t = L[1] primes) of the integers Tr(c1, c2) := Jr + (c1 + c2)Hr + c1c2 for all 1 ≤ r ≤ s, –μ ≤ c1 ≤ c2 ≤ μ. Let be the average of |Tr(c1, c2)| over all choices of r, c1 and c2. Show that , where is as defined in Exercise 4.19. In particular, for both the choices: (1) and (2) μ = ⌊M/s⌋, that is, on an average we check smaller integers for smoothness under this modified strategy. Determine the size of the factor base and the total number of integers Tr(c1, c2) checked for smoothness for the two values of μ given above.
|
| 4.21 | Cubic sieve method (CSM) for
|
| 4.22 | The problem with the CSM is that it is not known how to efficiently compute a solution of the congruence
Equation 4.10
x^3 ≡ y^2 z (mod p)
subject to the condition that x^3 ≠ y^2 z and x, y, z = O(p^ξ) for 1/3 ≤ ξ < 1/2. In this exercise, we estimate the number of solutions of Congruence (4.10).
|
| 4.23 | Adaptation of CSM for |
Unlike the finite field DLP, there are no general-purpose subexponential algorithms to solve the ECDLP. Though good algorithms are known for certain specific types of elliptic curves, all known algorithms that apply to general curves take fully exponential time. The square root methods of Section 4.4 are the fastest known methods for solving the ECDLP over an arbitrary curve. As a result, elliptic curves are gaining popularity for building cryptosystems. The absence of subexponential algorithms implies that smaller fields can be chosen compared to those needed for cryptosystems based on the (finite field) DLP. This, in particular, results in smaller sizes of keys.
We start with the Menezes–Okamoto–Vanstone (MOV) algorithm, which reduces the ECDLP in a curve over F_q to the DLP in the field F_(q^k) for some suitable extension degree k. Since the finite field DLP can be solved in subexponential time, the ECDLP is then also solved in subexponential time, provided that the extension degree k is small. For supersingular curves, one can choose k ≤ 6. For non-supersingular curves, this k is quite large in general, and the MOV reduction takes exponential time.
A linear-time algorithm is known to solve the ECDLP over anomalous curves (that is, curves with trace of Frobenius equal to 1). This algorithm is called the SmartASS method after its inventors Smart, Araki, Satoh and Semaev [257, 265, 282].
J. H. Silverman [277] has proposed an algorithm known as the xedni calculus method for solving the ECDLP over an arbitrary curve. Rigorous running-time bounds for this algorithm are not known; however, heuristic analysis and experiments suggest that the algorithm is not really practical.
Let E be an elliptic curve over a finite field F_q, and let P ∈ E(F_q) be a point of order m. We want to compute indP Q (if it exists) for a point Q ∈ E(F_q). Unless it is necessary, we will not assume any specific defining equation for E or a specific value of q.
Let us first look at the structure of the group E_K̄[m] of m-torsion points on an elliptic curve E defined over a field K. Here K̄ denotes the algebraic closure of K.
|
Let K be a field of characteristic p ≥ 0, E an elliptic curve defined over K, and m a positive integer with gcd(m, p) = 1 (no restriction if p = 0). Then E_K̄[m] ≅ Z_m ⊕ Z_m; in particular, #E_K̄[m] = m^2. |
Now, let E be an elliptic curve defined over a finite field K = F_q of characteristic p, and let m be a positive integer with gcd(m, p) = 1. We use the shorthand notation E[m] for E_K̄[m] (and not for E_K[m]). We want to define a function
em : E[m] × E[m] → μm,
where μm ⊆ K̄* is the group of m-th roots of unity (Exercise 4.24). This function em, known as the Weil pairing, helps us reduce the ECDLP in E(F_q) to the DLP in a suitable extension field F_(q^k). Let P, R ∈ E[m].
The definition of em(P, R) calls for using divisors on E. Recall from Exercise 2.125 that a divisor D = Σ_T nT [T] belongs to Prin(E) (that is, is the divisor of a rational function on E) if and only if Σ_T nT = 0 and Σ_T nT T = O. Since the divisor m[R] – m[O] has degree 0 and point-sum mR = O, there is a rational function f ∈ K̄(E) such that Div(f) = m[R] – m[O]. Now, gcd(m^2, p) = 1 as well, since p ∤ m^2. Hence, by Theorem 4.2 there exists a point R′ ∈ E[m^2] such that R = mR′. Since #E[m] = m^2, it follows that the divisor Σ_{T∈E[m]}([R′ + T] – [T]) has degree 0 and point-sum m^2 R′ = O and, therefore, there exists a rational function g ∈ K̄(E) with Div(g) = Σ_{T∈E[m]}([R′ + T] – [T]). The functions f and g as introduced above are unique up to multiplication by elements of K̄*. One can show that we can choose f and g in such a manner that f ∘ λm = g^m, where λm : E → E is the multiplication map Q ↦ mQ. Then for every U ∈ E(K̄) (using mP = O) we have g^m(P + U) = f(mP + mU) = f(mU) = g^m(U). Since g has only finitely many poles and zeros (whereas E(K̄) is infinite), we can choose U such that both g(U) and g(P + U) are defined and non-zero. For such a point U, we then have (g(P + U)/g(U))^m = 1, and define
em(P, R) := g(P + U)/g(U).
The right side can be shown to be independent of the choice of U. The relevant properties of the Weil pairing em are now listed.
|
Let P, P′, R, R′ ∈ E[m]. Then the Weil pairing satisfies the following properties.
(1) Bilinearity: em(P + P′, R) = em(P, R)em(P′, R) and em(P, R + R′) = em(P, R)em(P, R′).
(2) Identity: em(P, P) = 1; consequently em(P, R) = em(R, P)^(–1).
(3) Non-degeneracy: if em(P, R) = 1 for all R ∈ E[m], then P = O.
(4) Galois invariance: em(P, R)^τ = em(P^τ, R^τ) for every automorphism τ of K̄ fixing K. |
The above definition of em is not computationally effective. We will see later how we can compute em(P, R) in probabilistic polynomial time using an alternative (but equivalent) definition.
Algorithm 4.7 shows how the MOV reduction algorithm makes use of Weil pairing. We now clarify the subtle details of this algorithm.
|
Input: A point P ∈ E(F_q) of order m, and a point Q ∈ 〈P〉.
Output: The index indP Q, that is, an integer l with Q = lP.
Steps:
Choose the smallest k with E[m] ⊆ E(F_(q^k)).
Repeat: choose a random point R ∈ E[m] and compute α := em(P, R), until α is a primitive m-th root of unity.
Compute β := em(Q, R).
Compute and return l := indα β in F_(q^k)*. |
From the bilinearity of the Weil pairing, it follows that if Q = lP, 0 ≤ l < m, then β = em(Q, R) = em(lP, R) = em(P, R)^l = α^l. Thus, treating indα β as the least non-negative integer modulo ord α, we conclude that l = indα β if and only if ord α = m, that is, α is a primitive m-th root of unity. That α is an m-th root of unity for any R ∈ E[m] is obvious from the definition of em. We now show that there exists some R ∈ E[m] for which α = em(P, R) is primitive.
|
Let R1, R2 ∈ E[m]. Then em(P, R1) = em(P, R2) if and only if R1 + 〈P〉 = R2 + 〈P〉.
Proof If R1 + 〈P〉 = R2 + 〈P〉, then R1 = R2 + rP for some integer r, and so by bilinearity and identity of the Weil pairing em(P, R1) = em(P, R2)em(P, P)^r = em(P, R2). Conversely, let em(P, R1) = em(P, R2). By Theorem 4.2, E[m] ≅ Z_m ⊕ Z_m, and the homomorphism E[m] → μm, R ↦ em(P, R), has a kernel containing 〈P〉; by non-degeneracy of em this kernel has exactly m elements and hence equals 〈P〉. By bilinearity em(P, R1 – R2) = 1, so that R1 – R2 ∈ 〈P〉, that is, R1 + 〈P〉 = R2 + 〈P〉. |
As an immediate corollary to Lemma 4.1, the desired result follows.
Let
Then #S/#E[m] = φ(m)/m. In particular, S is non-empty.
Proof There are m distinct cosets of 〈P〉 in E[m]. Now, as R ranges over all points of E[m], the coset R + 〈P〉 ranges over all of these m possibilities and, accordingly, by Lemma 4.1 the value em(P, R) ranges over m distinct values. Since μm is cyclic of order m and hence has φ(m) generators, the theorem follows.
By Theorem 3.1, one should try an expected number of O(ln ln m) random points
before a primitive m-th root α = em(P, R) is found.
Since E[m] consists of finitely many (m2) points, it is obvious that there exist finite values of k such that
. It can also be shown that if
, then
that is,
for all P,
. The computation of the discrete logarithm indα β is then carried out in
. For Algorithm 4.7 to be efficient, one requires k to be rather small. However, for most curves, k is rather large, implying that the MOV reduction is impractical for these curves. For a specific class of curves, the so-called supersingular curves, one can choose k to be rather small, namely k ≤ 6. We do not go into the details of the choice of k for the various cases of supersingular curves, but refer the reader to Menezes [192].
We start with an alternative definition of the Weil pairing
for P,
. First note that if
is a divisor and if
is a rational function on E such that for every pole or zero T of f one has nT = 0 (that is, such that Div(f) and D have disjoint supports), then one can define

Choose points U,
(where
) and consider the divisors DP := [P + U] – [U] and DR := [R + V] – [V]. Since
) is infinite, one can choose both P + U and U distinct from R + V and V. Since P,
, it follows that mDP and mDR are principal, namely, there are rational functions fP and fR such that Div(fP) = mDP = m[P + U] – m[U] and Div(fR) = mDR = m[R + V] – m[V]. One can show that
Equation 4.11
em(P, R) = fP(DR)/fR(DP),
independent of the choice of U and V as long as fP (DR) and fR(DP) are defined. Therefore, em(P, R) can be computed efficiently, if fP and fR can be computed efficiently. To this effect we now describe an algorithm for computing the rational function f of a principal divisor
, where
. Since deg
, we can write
. Suppose that we have an Algorithm A that, for a pair of reduced divisors
D1 = [P1] – [O] + Div(f1)
and
D2 = [P2] – [O] + Div(f2)
computes the sum (a reduced divisor)
D3 = D1 + D2 = [P3] – [O] + Div(f3).
Then, f can be computed by repeated application of Algorithm A as follows.
Compute for each i = 1, . . . , r the reduced divisor
. Let 1 = ai1, ai2, . . . , aiti = |mi| be an addition chain for |mi| (Exercise 3.18). Clearly, ti – 1 applications of Algorithm A compute Δi. Since we can choose ti ≤ 2 ⌈lg |mi|⌉, each Δi can be computed using O(log |mi|) applications of Algorithm A.
Compute f by computing D = Div(f) = Δ1 + ··· + Δr. This can be done by applying Algorithm A a total of r – 1 times.
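The addition chains called for above can be produced, for instance, from the binary expansion of |mi|. The following routine is our own illustrative sketch (not the solution of Exercise 3.18): each bit after the leading one contributes a doubling and, for a one bit, an addition of the chain element 1, so the chain length stays within 2 ⌈lg m⌉.

```python
def addition_chain(m):
    """Return an addition chain 1 = a_1, ..., a_t = m built from the
    binary expansion of m: a doubling per bit, plus an add of the chain
    element 1 for each one bit.  Length is at most 2*ceil(lg m)."""
    assert m >= 1
    chain = [1]
    for bit in bin(m)[3:]:              # bits of m after the leading 1
        chain.append(2 * chain[-1])     # doubling step
        if bit == '1':
            chain.append(chain[-1] + 1) # addition of the chain element 1
    return chain
```

For example, addition_chain(13) yields 1, 2, 3, 6, 12, 13, in which every element is the sum of two earlier (not necessarily distinct) elements.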
What remains is the description of Algorithm A that computes P3 and f3 from a knowledge of P1, P2, f1 and f2. Clearly, if
, then we have P3 = P2 and f3 = f1f2. Similar is the case for
. So assume
and
. Let l1 be the line passing through P1 and P2 and P′ := –(P1 + P2). First, assume that
. By Exercise 2.125, we have
. Let l2 be the (vertical) line passing through P′ and –P′. Again by Exercise 2.125, we have
. But then
, that is, we take P3 = –P′ = P1+P2 and f3 = f1f2l1/l2. Finally, if
, then
and, therefore,
. Thus, in this case too, we take
and f3 = f1f2l1/l2 with l2 := 1.
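As a concrete illustration, the following sketch performs one application of Algorithm A for a curve in short Weierstrass form y2 = x3 + ax + b over a prime field (a simplifying assumption on our part; the text treats general curves). Points are affine pairs, None stands for the point at infinity O, and instead of a symbolic rational function we return a callable evaluating the multiplier l1/l2, so that f3 = f1 · f2 · (l1/l2).

```python
# One application of Algorithm A on E: y^2 = x^3 + a*x + b over F_p (p odd).
def algorithm_a_step(P1, P2, a, p):
    if P1 is None:                      # P1 = O: P3 = P2, f3 = f1*f2
        return P2, (lambda Q: 1)
    if P2 is None:                      # P2 = O: symmetric case
        return P1, (lambda Q: 1)
    x1, y1 = P1
    x2, y2 = P2
    if x1 == x2 and (y1 + y2) % p == 0:
        # P2 = -P1: P3 = O, l1 is the vertical line through P1, l2 := 1.
        return None, (lambda Q: (Q[0] - x1) % p)
    if (x1, y1) == (x2, y2):
        lam = (3 * x1 * x1 + a) * pow(2 * y1, -1, p) % p   # tangent slope
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, p) % p          # chord slope
    x3 = (lam * lam - x1 - x2) % p
    y3 = (lam * (x1 - x3) - y1) % p
    def l1_over_l2(Q):
        # l1: line through P1 and P2 (or the tangent at P1 = P2);
        # l2: vertical line through P3 and -P3.  By the convention above,
        # we never evaluate at -P3, where l2 vanishes.
        num = (Q[1] - y1 - lam * (Q[0] - x1)) % p
        return num * pow((Q[0] - x3) % p, -1, p) % p
    return (x3, y3), l1_over_l2
```

On the toy curve y2 = x3 + 1 over F7, for instance, the step applied to (0, 1) and (2, 3) returns their sum (6, 0) together with the evaluator for l1/l2.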
Before we finish the description of the MOV reduction, some comments are in order. First note that if f1,
and P1,
, then both l1 and l2 are in K(E) and the computation of f3 and P3 can be carried out by working in K only.
Second, consider the (general) case
. Since
, the rational function f3 has poles and is, therefore, undefined only at the points P3 and
. f3 is certainly defined at –P3, but l2(–P3) = 0 and, therefore, evaluating f3(–P3) as (f1f2l1)(–P3)/l2(–P3) fails. Of course, there is a rational function g such that both f1f2l1g and l2g are defined and non-zero at –P3, but finding such a rational function is an added headache. So we choose to retain the representation f3 = f1f2l1/l2 and agree not to evaluate f3 at –P3. Recall from Equation (4.11) that we want to evaluate fP at DR (that is, at R + V and V) and also fR at DP (that is, at P + U and U). Let us assume that we use the addition chain 1 = a1, a2, . . . , at = m for m. This means that we cannot evaluate fP at the points ±ai(P + U) and ±aiU for all i = 1, . . . , t. Therefore, V should be chosen such that neither R + V nor V is one of these points. Similar constraints dictate the choice of U. However, if m is sufficiently large (m ≥ 1024) and if we choose an addition chain of length t ≤ 2 ⌈lg m⌉, then it is easy to see that for a random choice of (U, V) the evaluation of fP(DR) or fR(DP) fails with probability no more than 1/2. Therefore, a few random choices of (U, V) are expected to make the algorithm work. This is the only place where probabilistic behaviour creeps into the algorithm. In practice, however, this is not a serious problem, since we have much larger values of m (than 1024) and accordingly the above probability of failure becomes negligibly small.
Finally, note that if we multiply out the factors f1, f2 and l1 in the numerator, then the coefficients of the numerator grow very rapidly when the algorithm is applied repeatedly. Thus we prefer to keep the numerator in factored form. The same applies to the denominator as well.
The SmartASS method, named after its inventors Smart [282], Satoh and Araki [257], and Semaev [265], is also called the anomalous attack on the ECDLP, since it is applicable to anomalous elliptic curves. Let
be a finite field of odd prime cardinality p and E an elliptic curve defined over
. We assume that E is anomalous: that is, the trace of Frobenius of E at p is 1; that is,
. Since p is prime, the group
is cyclic and, in particular, isomorphic to the additive group (
, +). This isomorphism is effectively exploited by the SmartASS method to give a polynomial time algorithm for computing ECDLP in the group
.
Before proceeding further, we introduce some auxiliary results. Recall (Exercise 2.133) that a local PID is called a discrete valuation ring (DVR). We now give an equivalent definition of a DVR, one that justifies its name.
A discrete valuation on a field K is a surjective group homomorphism
such that for every a,
is a ring called the valuation ring of v.
A DVR can be characterized as follows:
Let R be an integral domain and let K := Q(R) be the field of fractions of R. Then R is a DVR if and only if there exists a discrete valuation
Proof [if] By definition,
Let
By definition,
Recall that the ring
of p-adic integers (Definition 2.111) is a DVR. The field
of fractions of
is called the field of p-adic numbers. We now explicitly describe a valuation v on
of which
is the valuation ring. Let the p-adic expansion (Exercises 2.144 and 2.145) of a p-adic integer α be
Equation 4.12

A rational integer can be naturally viewed as a p-adic integer with finitely many nonzero terms, that is, one for which ki = 0 except for finitely many
. However, a p-adic integer with infinitely many non-zero ki does not correspond to a rational integer. If in Expansion (4.12) we have k0 = k1 = ··· = kr–1 = 0, we can write
α = pr(kr + kr+1p + kr+2p2 + ···).
A p-adic integer is, in general, an infinite series and a representation with finite precision looks like
k0 + k1p + k2p2 + ··· + ksps + O(ps+1).
Arithmetic on p-adic numbers is done like integers written in base p, but from left to right. Thus, for example, if one wants to add two p-adic integers k0 + k1p + k2p2 + ... and
, one may add the base-p integers ... k2k1k0 and
in the usual manner till the desired level of precision. A p-adic integer α = k0 + k1p + k2p2 + ··· is invertible (in
) if and only if k0 ≠ 0 (Proposition 2.52).
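The left-to-right, carry-propagating arithmetic just described can be sketched as follows; this is our own illustration, with p-adic integers truncated to a fixed precision s and stored as digit lists [k0, k1, . . . , ks–1]. Inversion of a unit uses the quadratically convergent Hensel/Newton iteration x → x(2 – nx), starting from the inverse of k0 modulo p.

```python
def padic_add(a, b, p):
    """Add two truncated p-adic integers (equal-length digit lists),
    digit by digit from k0 onward, propagating carries exactly as in
    base-p integer addition (any final carry falls off the precision)."""
    out, carry = [], 0
    for x, y in zip(a, b):
        t = x + y + carry
        out.append(t % p)
        carry = t // p
    return out

def padic_inverse(a, p, s):
    """Invert a unit (a[0] != 0) to precision s, i.e. modulo p^s, by the
    Hensel/Newton iteration x -> x*(2 - n*x), doubling the precision."""
    n = sum(d * p**i for i, d in enumerate(a))
    x = pow(a[0], -1, p)        # inverse modulo p; exists since a[0] != 0
    prec = 1
    while prec < s:
        prec *= 2
        x = x * (2 - n * x) % p**prec
    x %= p**s
    return [(x // p**i) % p for i in range(s)]
```

For example, the inverse of 3 in the 7-adic integers to precision 4 has digits [5, 4, 4, 4], that is, 3–1 = 5 + 4·7 + 4·72 + 4·73 + O(74).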
An element
also has a p-adic expansion, but in this case one has to allow terms involving a finite number of negative exponents of p. That is to say, we have an expansion of the form
β = k–tp–t + k–t+1p–t+1 + ··· + k–1p–1 + k0 + k1p + k2p2 + ···
or
β = p–t(k–t + k–t+1p + ··· + k–1pt–1 + k0pt + k1pt+1 + k2pt+2 + ···).
Of course, if k–t = k–t+1 = ··· = k–1 = 0, then β is already in
.
From the arguments above, it follows that any non-zero
can be written uniquely as γ = pδ(γ0 + γ1p + γ2p2 + ···) with
and γ0 ≠ 0. We then set v(γ) := δ. It is easy to see that v defines a discrete valuation on
of which
is the valuation ring. Moreover, since γ0 + γ1p + γ2p2 + ··· is a unit in
, p = 0 + 1 · p + 0 · p2 + ··· plays the role of a uniformizer of the DVR
. As usual, we write v(0) = +∞.
Now, back to our ECDLP business. Let E be an elliptic curve defined over
. Here we consider the case that E is anomalous. We can naturally think of E as a curve over the field
as well and denote this curve by ε. The coordinate-wise application of the canonical surjection
induces the reduction homomorphism
. Now, we define the following subgroups of
:

It can be shown that
is a subgroup of
and
is a subgroup of
. Furthermore, since E is anomalous, we have

Now, let
and Q a point in the subgroup of
generated by P. Our purpose is to find an integer l such that Q = lP. Let
,
be such that
and
. It is not difficult to find such points
and
. For example, if P = (a, b), we can take
, where b0 = b and b1, b2, . . . are successively obtained by Hensel lifting.
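The successive digits b1, b2, . . . mentioned here come from the classical Hensel lifting of a simple root. The following generic sketch (our illustration, with a hypothetical polynomial f) lifts a root b0 of f modulo p, with f′(b0) ≢ 0 (mod p), to a root modulo pk, one power of p at a time.

```python
def hensel_lift(f, df, b0, p, k):
    """Lift a root b0 of f modulo p (with df(b0) not divisible by p)
    to a root of f modulo p^k, adding one base-p digit per step."""
    assert f(b0) % p == 0 and df(b0) % p != 0
    b = b0
    for i in range(1, k):
        # Choose the next digit t so that f(b + t*p^i) = 0 mod p^(i+1).
        # The division f(b) // p^i is exact by the induction hypothesis.
        t = (-(f(b) // p**i) * pow(df(b), -1, p)) % p
        b += t * p**i
    return b  # a root of f modulo p^k

# Illustration: lift the square root 3 of 2 modulo 7 to modulo 7^4.
f = lambda y: y * y - 2
df = lambda y: 2 * y
b = hensel_lift(f, df, 3, 7, 4)
```

In the anomalous-curve setting, f(Y) would be Y2 minus the right side of the curve equation evaluated at the lifted x-coordinate; the quadratic here merely illustrates the mechanics.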
Since
, the point
and, therefore,
. Now, if we take the so-called p-adic elliptic logarithm ψp on both sides, we get
(mod p2), whence it follows that

provided that
is invertible modulo p. The function ψp can be easily calculated. Therefore, this gives a very efficient probabilistic algorithm for computing discrete logarithms over anomalous elliptic curves. Here the most time-consuming step is the linear-time computation of the points p
and p
. For further details on the algorithm (like the computation of
and
from P and Q, and the definition of p-adic elliptic logarithms), see Blake et al. [24] and Silverman [275].
Joseph Silverman’s xedni calculus method (XCM) is a recent algorithm for solving the ECDLP on an arbitrary elliptic curve over a finite field. The algorithm is based on some deep mathematical conjectures and heuristic ideas. However, its performance has been experimentally established to be poor. Here we give a sketchy description of the XCM. For simplicity, we concentrate on elliptic curves over prime fields
only.
The basic idea of the XCM is to lift an elliptic curve E over
to a curve ε over
. In view of this, we start with a couple of important results regarding elliptic curves over
(or, more generally, over a number field). See Silverman [275], for example, for the proofs.
Let ε be an elliptic curve defined over a number field K.
The group ε(K) is finitely generated. The group structure of ε(K) is made explicit by the next theorem. Note that the elements of ε(K) of finite order form a subgroup εtors(K) of ε(K), called the torsion subgroup of ε(K) (Exercise 4.26).
The non-negative integer ρ of Theorem 4.4 is called the rank of ε(K). Now, let E be an elliptic curve defined over a prime field
The basic idea of the XCM is to select r points
We start by fixing an integer r, 4 ≤ r ≤ 9. We then choose r random pairs (si, ti) of integers and compute the points

We now apply a change of coordinates of the form
Equation 4.13

so that the first four of the points Rp,i become Rp,1 = [1, 0, 0], Rp,2 = [0, 1, 0], Rp,3 = [0, 0, 1] and Rp,4 = [1, 1, 1]. This change of coordinates fails if some three of the four points Rp,1, Rp,2, Rp,3 and Rp,4 sum to
. But in that case the desired index indP Q can be computed with high probability. If, for example,
, then we have (s1 + s2 + s3)P = (t1 + t2 + t3)Q and, therefore, if gcd(t1 + t2 + t3, n) = 1, then indP Q ≡ (t1 + t2 + t3)–1(s1 + s2 + s3) (mod n). On the other hand, if gcd(t1 + t2 + t3, n) ≠ 1, we repeat with a different set of pairs (si, ti).
Henceforth, we assume that the change of coordinates, as given in Equation (4.13), is successful. This transforms the equation for E to a general cubic equation:
Cp : up,1X3 + up,2X2Y + up,3XY2 + up,4Y3 + up,5X2Z + up,6XYZ + up,7Y2Z + up,8XZ2 + up,9YZ2 + up,10Z3 = 0.
Now, we carry out a step that heuristically ensures that the curve ε over
(that we are going to construct) has a small rank. We choose a product M of small primes with p ∤ M, a cubic curve
CM : uM,1X3 + uM,2X2Y + uM,3XY2 + uM,4Y3 + uM,5X2Z + uM,6XYZ + uM,7Y2Z + uM,8XZ2 + uM,9YZ2 + uM,10Z3 ≡ 0 (mod M)
over
and points RM,1, . . . , RM,r on CM and with coordinates in
. The first four points should be RM,1 = [1, 0, 0], RM,2 = [0, 1, 0], RM,3 = [0, 0, 1] and RM,4 = [1, 1, 1]. We have to ensure also that for every prime divisor q of M, the matrix B(RM,1, . . . , RM,r) has maximal rank modulo q. In practice, it is easier to choose the points RM,1, . . . , RM,r first and then compute a curve CM passing through these points by solving a set of linear equations in the coefficients uM,1, . . . , uM,10 of CM. The curve CM should be so chosen that it has the minimum possible number of solutions modulo M. This, in conjunction with some deep conjectures in the theory of elliptic curves, guarantees that the curve ε that we will construct shortly will have a rank less than the expected value.
We now combine the curves Cp and CM as follows. Using the Chinese remainder theorem, we compute integers
such that
(mod p) and
(mod M) for each i = 1, . . . , 10. Similarly, we compute points R1, . . . , Rr with integer coordinates such that Ri ≡ Rp,i (mod p) and Ri ≡ RM,i (mod M) for each i = 1, . . . , r, where congruence of points stands for coordinate-wise congruence. Here we have R1 = [1, 0, 0], R2 = [0, 1, 0], R3 = [0, 0, 1] and R4 = [1, 1, 1].
Clearly, the points R1, . . . , Rr are lifts of the points Rp,1, . . . , Rp,r respectively, whereas the cubic curve

over
is a lift of E. However,
, treated as a curve over
, need not pass through the points R1, . . . , Rr. In order to ensure this last condition, we modify the coefficients
of
to the (small integer) coefficients u1, . . . , u10 by solving the system of linear equations

subject to the condition that
(mod pM) for each i = 1, . . . , 10. The resulting cubic curve
C : u1X3 + u2X2Y + u3XY2 + u4Y3 + u5X2Z + u6XYZ + u7Y2Z + u8XZ2 + u9YZ2 + u10Z3 = 0
over
evidently continues to be a lift of E.
Now, we apply a change of coordinates in order to transfer
to the standard Weierstrass equation
ε : Y2 + a1XY + a3Y = X3 + a2X2 + a4X + a6
with integer coefficients ai. This transformation changes the points R1, . . . , Rr to the points S1, . . . , Sr. One should also ensure that
.
Finally, we check if S2, . . . , Sr are linearly dependent. If so, we determine a (non-trivial) relation
with
. This corresponds to the relation
, where n1 := –(n2 + ··· + nr), that is, sP = tQ with s := n1s1 + ··· + nrsr and t := n1t1 + ··· + nrtr. If gcd(t, n) = 1, we have indP Q ≡ t–1s (mod n).
On the other hand, if S2, . . . , Sr are linearly independent or if gcd(t, n) > 1, then the lifted data fail to compute indP Q. In that case, we repeat the entire process by selecting new pairs (si, ti) and/or new points RM,1, . . . , RM,r.
This completes our description of the XCM. See Silverman [277] for further details. No rigorous or heuristic analysis of the running time of the XCM is available in the literature. Practical experience (reported in Jacobson et al. [139]) shows that the algorithm is rather impractical. The predominant cause for failure of a trial of the XCM is that the probability that the points S2, . . . , Sr are linearly dependent is amazingly low. Suitable choices of the curve CM help us to construct curves ε of low rank, but not low enough, in general, to render S2, . . . , Sr linearly dependent. Larger values of r are expected to increase the probability of success in each trial, but it is not clear how to handle the values r > 9. Nevertheless, the XCM is a radically new idea to solve the ECDLP. As Joseph Silverman [277] says, “some of the ideas may prove useful in future work on ECDLP”.
| 4.24 | Let K be a field, and . Elements of μm are called the m-th roots of unity. Prove the following assertions.
| 4.25 | We use the notations of the last exercise and assume that #μm = m, that is, either char K = 0 or p := char K > 0 is coprime to m. In this case, a generator of μm is called a primitive m-th root of unity. If is a primitive m-th root of unity and ωr = 1 for some , then evidently m|r. In particular, m is the smallest of the exponents such that ωr = 1. The (monic) polynomial
where the product runs over all primitive m-th roots of unity, is called the m-th cyclotomic polynomial (over K). Clearly, deg Φm(X) = φ(m) (where φ is Euler’s totient function).
| 4.26 |
The hyperelliptic curve discrete logarithm problem (HECDLP) has attracted less research attention than the ECDLP. Surprisingly, however, there exist subexponential (index calculus) algorithms for solving the HECDLP over curves of large genus. Adleman, DeMarrais and Huang first proposed such an algorithm [2] (which we will refer to as the ADH algorithm). Enge [86] suggested some modifications of the ADH algorithm and provided a rigorous analysis of its running time. Gaudry [105] simplified the ADH algorithm and even implemented it. Gaudry’s experiments suggest that it is feasible to compute discrete logarithms in Jacobians of almost cryptographic sizes, provided that the genus of the underlying curve is high (say, ≥ 6). Enge and Gaudry [87] proved rigorously that as long as the genus g is greater than ln q (
being the field over which the curve is defined), the ADH algorithm (and its improvements) runs in time L(qg, 1/2,
).
In what follows, we outline Gaudry’s version of the ADH algorithm and refer to this as the ADH–Gaudry algorithm. Let C : Y2 + u(X)Y = v(X) be a hyperelliptic curve of genus g defined over a finite field
. We assume that the cardinality of the Jacobian
is known and has a suitably large prime divisor m. We assume further that a reduced divisor
of order m is available, and we want to compute the discrete logarithm indα β of
with respect to α.
Recall that every reduced divisor
can be written uniquely as
, l ≤ g, where for i ≠ j the points Pi and Pj are not opposites of each other. Only ordinary points (not special points) may appear more than once in the list P1, . . . , Pl. We also know that such a divisor can be represented by a unique pair of polynomials a,
satisfying deg b < deg a ≤ g and a|(b2 + bu – v). In that case, we write D = Div(a, b). What interests us is the fact that the roots of the polynomial a are precisely the X-coordinates of the points P1, . . . , Pl. This fact leads to the very useful concepts of prime divisors and smooth divisors.
A divisor
For an arbitrary divisor
In order to set up a factor base B, we predetermine a smoothness bound δ and let B consist of all the prime divisors
with deg a ≤ δ. For simplicity, we take δ = 1. This is indeed a practical choice, when the genus g is not too large (say, g ≤ 9). Let
be an (irreducible) polynomial of degree 1. In order to find out
such that Div(a, b) is a prime divisor, we first see that deg b < deg a, that is,
. Furthermore, a|(b2 + bu – v): that is, b2 + bu – v ≡ 0 (mod X – h); that is, b2 + bu(h) – v(h) = 0. Thus, the desired values of
, if existent, can be found by solving a quadratic equation over
. There are q irreducible polynomials
of degree 1 and for each such a there are either two or no solutions for
. Assuming that both these possibilities are equally likely, we conclude that the size of the factor base is ≈ q.
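The construction of this degree-1 factor base can be sketched directly. The toy parameters below are illustrative assumptions on our part: we take u = 0 and odd q (so the condition b2 + bu(h) – v(h) = 0 reduces to b2 = v(h)) and a hypothetical degree-5 polynomial v, giving a genus-2 curve; the quadratic for b is settled by brute force rather than a square-root algorithm.

```python
# Degree-1 prime divisors Div(X - h, b) of the toy hyperelliptic curve
# Y^2 = v(X) over F_q: the divisibility a | (b^2 + b*u - v) with
# a(X) = X - h and u = 0 amounts to b^2 = v(h) in F_q.
q = 31                                   # illustrative small prime
v = lambda x: (x**5 + 3*x + 1) % q       # deg v = 5: a genus-2 curve

factor_base = []
for h in range(q):                       # one candidate a(X) = X - h per h
    for b in range(q):                   # brute-force the quadratic for b
        if (b * b) % q == v(h):
            factor_base.append((h, b))
# Each h contributes either two or no values of b, so len(factor_base) ≈ q.
```

Running this confirms the estimate in the text: roughly half the values of h admit two square roots of v(h), and the factor base has size on the order of q.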
In order to check for the smoothness of a divisor
over the factor base B, we first factor a over
. Under the assumption that δ = 1, the divisor D is smooth if and only if a splits completely over
. Let us write a(X) = (X –h1) ··· (X –hl),
. Then for some
we have
, where Di := Div(X – hi, ki). We may use trial divisions (that is, trial subtractions in this additive setting) by elements of B in order to determine the prime divisors D1, . . . , Dl of D. Proposition 4.5 establishes the probability that a randomly chosen element of
is smooth.
For q ≫ 4g2, there are approximately qg/g! (1-)smooth divisors in
The assumption q ≫ 4g2 is practical, since we usually employ curves of (fixed) small genus g over finite fields
Now, we have all the machinery required to describe the basic version of the index calculus method for computing indα β in
. In the first stage, we choose a random
and compute the (reduced) divisor jα, checking whether jα is smooth over the factor base B. Every smooth jα gives a relation: that is, a linear congruence modulo m involving the (unknown) indices of the elements of B to the base α. After sufficiently many (say, ≥ 2(#B)) such relations are found, the system of linear congruences collected is expected to be of full rank and is solved modulo m. This gives us the indices of the elements of the factor base. Each congruence collected above contains at most g non-zero coefficients, so the system is necessarily sparse. In the second stage, we find a single random j for which β + jα is smooth. The database prepared in the first stage then immediately gives indα β.
The Hasse–Weil Bounds (3.8) on p 226 show that the cardinality of
is approximately qg. Thus O(g log q) bits are needed to represent an element of
. This fact is consistent with the representation of reduced divisors by pairs of polynomials. Gaudry [105] calculates that this variant of the ICM does O(q2 + g!q) operations, each of which takes polynomial time in the input size g log q. If g is considered to be constant, the running time becomes O(q2 logt q) (that is, O~(q2)) for some real t > 0. A square root method on
runs in (expected) time O~(qg/2). Thus for g > 4 the index calculus method performs better than the square root methods. Indeed Gaudry’s implementation of this algorithm is capable of computing in a few days discrete logs in the curve of genus 6 mentioned above. The Jacobian of this curve is of cardinality ≈ 1040.
For cryptographic purposes, we should have
. If we want to take q small (so that multi-precision arithmetic can be avoided), we should choose large values of g. But this choice makes the ADH–Gaudry algorithm quite efficient. For achieving the desired level of security in cryptographic applications, only hyperelliptic curves of genus 2, 3 and 4 are recommended.
So far we have seen many algorithms which require solving large systems of linear equations (or congruences). The number n of unknowns in such systems can be as large as several million. Standard Gaussian elimination on such a system takes time O(n3) and space O(n2). There are asymptotically faster algorithms, like Strassen’s method [292], which takes time O(n2.807), and Coppersmith and Winograd’s method [60], which has a running time of O(n2.376). Unfortunately, these asymptotic estimates do not show up in the range of practical interest. Moreover, the space requirements of these asymptotically faster methods are prohibitively high (though still O(n2)).
Luckily enough, cryptanalytic algorithms usually deal with coefficient matrices that are sparse: that is, that have only a small number of non-zero entries in each row. For example, consider the system of linear congruences available from the relation collection stage of an ICM for solving the DLP over a finite field
. The factor base consists of a subexponential (in lg q) number of elements, whereas each relation involves at most O(lg q) non-zero coefficients. Furthermore, the sparsity of the resulting matrix A is somewhat structured in the sense that the columns of A corresponding to larger primes in the factor base tend to have fewer non-zero entries. In this regard, we refer to the interesting analysis by Odlyzko [225] in connection with the Coppersmith method (Section 4.4.4). Odlyzko took m = 2n equations in n unknown indices and showed that about n/4 columns of A are expected to contain only zero coefficients, implying that these variables never occurred in any relation collected. Moreover, about 0.346n columns of A are expected to have only single non-zero coefficients.
The sparsity (as well as the structure of the sparsity) of the coefficient matrix A can be effectively exploited and the system can be solved in time O~(n2). In this section, we describe some special algorithms for large sparse linear systems. In what follows, we assume that we want to compute the unknown n-dimensional column vector x from the given system of equations
Ax = b,
where A is an m × n matrix, m ≥ n, and where b is a non-zero m-dimensional column vector. Though this is not the case in general, we will often assume for the sake of simplicity that A has full rank (that is, n). We write vectors as column vectors, that is, an l-dimensional vector v with elements v1, . . . , vl is written as v = (v1 v2 . . . vl)t, where the superscript t denotes matrix transpose.
Before we proceed further, some comments are in order. First note that our system of equations is often one over the finite ring
which is not necessarily a field. Most of the methods we describe below assume that
is a field, that is, r is a prime. If r is composite, we can do the following. First, assume that the prime factorization
, αi > 0, of r is known. In that case, we first solve the system over the fields
for i = 1, . . . , s. Then for each i we lift the solution modulo pi to the solution modulo
. Finally, all these lifted solutions are combined using the CRT to get the solution modulo r.
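The final CRT recombination can be sketched as follows; the helper names are our own, and the vectors are assumed to be solutions of the same system modulo pairwise coprime moduli.

```python
def crt_pair(x1, m1, x2, m2):
    """Combine x = x1 (mod m1) and x = x2 (mod m2) for coprime m1, m2."""
    t = (x2 - x1) * pow(m1, -1, m2) % m2
    return (x1 + m1 * t) % (m1 * m2), m1 * m2

def crt(residues, moduli):
    """Chinese remainder combination for pairwise coprime moduli."""
    x, m = residues[0], moduli[0]
    for xi, mi in zip(residues[1:], moduli[1:]):
        x, m = crt_pair(x, m, xi, mi)
    return x

def crt_vector(solutions, moduli):
    """Merge solution vectors of Ax = b modulo the coprime moduli into a
    single solution vector modulo their product, component-wise."""
    return [crt(list(component), moduli) for component in zip(*solutions)]
```

For instance, combining the residues 2 (mod 5) and 3 (mod 7) gives 17 (mod 35), and the same combination applies independently to each coordinate of a solution vector.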
Hensel lifting can be used to lift a solution of the system Ax ≡ b (mod p) to a solution of Ax ≡ b (mod pα), where p is a prime and
. We proceed by induction on α. Let us denote the (or a) solution of Ax ≡ b (mod p) by x1, which can be computed by solving a system in the field
. Now, assume that for some
we know (integer) vectors x1, . . . , xi such that
Equation 4.14

We then attempt to compute a vector xi+1 such that
Equation 4.15

Congruence (4.14) shows that the elements of A, x1, . . . , xi, b can be so chosen (as integers) that for some vector yi we have the equality
A(x1 + px2 + ··· + pi–1xi) = b – piyi
in
. Substituting this in Congruence (4.15) gives Axi+1 ≡ yi (mod p). Thus the (incremental) vector xi+1 can be obtained by solving a linear system in
.
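The lifting loop just described can be written out as follows. The solver for the base field is a black box here; for this self-contained sketch we use plain dense Gaussian elimination (an assumption on our part — the systems in the text are sparse and would use the methods of this section instead).

```python
def solve_mod_p(A, b, p):
    """Solve A x = b over F_p by Gaussian elimination
    (A square and invertible modulo p)."""
    n = len(A)
    M = [[A[i][j] % p for j in range(n)] + [b[i] % p] for i in range(n)]
    for c in range(n):
        piv = next(r for r in range(c, n) if M[r][c])   # non-zero pivot
        M[c], M[piv] = M[piv], M[c]
        inv = pow(M[c][c], -1, p)
        M[c] = [v * inv % p for v in M[c]]
        for r in range(n):
            if r != c and M[r][c]:
                f = M[r][c]
                M[r] = [(M[r][j] - f * M[c][j]) % p for j in range(n + 1)]
    return [M[i][n] for i in range(n)]

def hensel_solve(A, b, p, alpha):
    """Lift a solution of A x = b (mod p) to one modulo p^alpha, following
    the iteration x = x1 + p x2 + ... + p^(alpha-1) x_alpha of the text."""
    n = len(A)
    x = solve_mod_p(A, b, p)
    for i in range(1, alpha):
        # y_i = (b - A x) / p^i; the division is exact by induction.
        r = [(b[k] - sum(A[k][j] * x[j] for j in range(n))) // p**i
             for k in range(n)]
        xi = solve_mod_p(A, r, p)                 # incremental vector x_{i+1}
        x = [x[k] + p**i * xi[k] for k in range(n)]
    return [v % p**alpha for v in x]
```

For example, for A = [[2, 1], [1, 1]] and b = [3, 7], the solution [1, 1] modulo 5 lifts to [121, 11] modulo 53 = 125.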
It, therefore, suffices to know how to solve linear congruences modulo a prime p. However, problems arise when we do not know the factorization of r (while solving Ax ≡ b (mod r)). If r is large, attempting to factor it would be a heavy investment. What can be done instead is the following. First, we use trial divisions to extract the small prime factors of r. We may, therefore, assume that r has no small prime factors. We proceed to solve Ax ≡ b (mod r) assuming that r is a prime (that is, that
is a field). In a field, every non-zero element is invertible. But if r is composite, there are non-zero elements
which are not invertible (that is, for which gcd(a, r) > 1). If, during the course of the computation, we never happen to meet (and try to invert) such non-zero non-invertible elements, then the computation terminates without any trouble. Otherwise, such an element a corresponds to a non-trivial factor gcd(a, r) of r. In that case, we have a partial factorization of r and restart solving the system modulo each suitable factor of r.
Some of the algorithms we discuss below assume that A is a symmetric matrix. For us, this is usually not the case; indeed, we have matrices A which are not even square. Both these problems can be overcome by trying to solve the modified system AtAx = Atb. If A has full rank, this leads to an equivalent system.
If r = 2 (as in the case of the QSM for factoring integers), using the special methods is often not recommended. In this case, the elements of A are bits and can be packed compactly in machine words, and addition of rows can be done word-wise (say, 32 bits at a time). This leads to an efficient implementation of ordinary Gaussian elimination, which usually runs faster than the more complicated special algorithms described below, at least for the sizes of practical systems.
In what follows, we discuss some well-known methods for solving large sparse linear systems over finite fields (typically prime fields). In order to simplify notations, we will refrain from writing the matrix equalities as congruences, but treat them as equations over the underlying finite fields.
Structured Gaussian elimination is applied to a sparse system before one of the next three methods is employed to solve the system. If the sparsity of A has some structures (as discussed earlier), then structured Gaussian elimination tends to reduce the size of the system considerably, while maintaining the sparsity of the system. We now describe the essential steps of structured Gaussian elimination. Let us define the weight of a row or column of a matrix to be the number of non-zero entries in that row or column.
First we delete all the columns (together with the corresponding variables) that have weight 0. These variables never occur in the system and need not be considered at all.
Next we delete all the columns that have weight 1 and the rows corresponding to the non-zero entries in these columns. Each such deleted column corresponds to a variable xi that appears in exactly one equation. After the rest of the system is solved, the value of xi is obtained by back substitution. Deleting some rows in this step may expose some new columns of weight 1, so this step should be repeated until all the columns have weight > 1.
Now, we take each row of weight 1. Such a row gives a direct solution for the variable xi corresponding to the non-zero entry of the row. We then substitute this value of xi in all the equations where it occurs and subsequently delete the ith column. We repeat this step until all rows are of weight > 1.
At this point, the system usually has many more equations than variables. We may make the system a square one by throwing away some rows. Since subtracting multiples of rows of higher weights tends to increase the number of non-zero elements in the matrix, we should throw away the rows with higher weights. While discarding the excess rows, we should be careful to ensure that we are not left with a matrix having columns of weight 0. Some columns in the reduced system may again happen to have weight 1. Thus, we have to repeat the above steps again. And again and again and . . . , until we are left with a square matrix each row and column of which has weight ≥ 2.
This procedure leads to a system which is usually much smaller than the original system. In a typical example quoted in Odlyzko [225], structured Gaussian elimination reduces a system with 16,500 unknowns to one with fewer than 1,000 unknowns. The resulting reduced system may be solved using ordinary Gaussian elimination which, for smaller systems, appears to be much faster than the following sophisticated methods.
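The structural part of these reductions can be sketched as follows. This is purely an illustration of the pruning logic: it tracks only the positions of non-zero entries (as sets of column indices), and omits the coefficient values and the back-substitution bookkeeping a real implementation must keep.

```python
def structured_prune(rows, n_cols):
    """Apply the structural reductions: repeatedly drop weight-0 columns,
    weight-1 columns (with their single row), and weight-1 rows (with the
    column of the solved variable), until every remaining row and column
    has weight >= 2.  Returns the surviving row and column index sets."""
    active_rows = set(range(len(rows)))
    active_cols = set(range(n_cols))

    def weight(c):
        return sum(1 for r in active_rows if c in rows[r])

    changed = True
    while changed:
        changed = False
        for c in list(active_cols):
            w = weight(c)
            if w == 0:                    # variable never occurs: drop it
                active_cols.discard(c); changed = True
            elif w == 1:                  # solved later by back substitution
                r = next(r for r in active_rows if c in rows[r])
                active_rows.discard(r); active_cols.discard(c); changed = True
        for r in list(active_rows):       # weight-1 row: direct solution
            live = rows[r] & active_cols
            if len(live) == 1:
                active_rows.discard(r); active_cols -= live; changed = True
    return active_rows, active_cols
```

On a tree-like system every row and column eventually gets pruned away, whereas a system in which every row and column has weight at least 2 survives untouched — exactly the fixed point described in the text.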
The conjugate gradient method was originally proposed to solve a linear system Ax = b over
for an n × n (that is, square) symmetric positive definite matrix A and for a nonzero vector b and is based on the idea of minimizing the quadratic function
. The minimum is attained, when the gradient ∇f = Ax – b equals zero, which corresponds to the solution of the given system.
The conjugate gradient method is an iterative procedure. The iterations start with an initial minimizer x0, which can be any n-dimensional vector. As the iterations proceed, we obtain gradually improved minimizers x0, x1, x2, . . . , until we reach the solution. We also maintain and update two other sequences of vectors ei and di. The vector ei stands for the error b − Axi, whereas the vectors d0, d1, . . . constitute a set of mutually conjugate (that is, orthogonal) directions. We initialize e0 = d0 = b − Ax0 and for i = 0, 1, . . . repeat the steps of Algorithm 4.8, until ei = 0. We denote the inner product of two vectors v = (v1 v2 . . . vn)^t and w = (w1 w2 . . . wn)^t by 〈v, w〉 := v1w1 + v2w2 + ··· + vnwn.
Algorithm 4.8:
ai := 〈ei, ei〉/〈di, Adi〉.
xi+1 := xi + aidi.
ei+1 := ei − aiAdi.
bi := 〈ei+1, ei+1〉/〈ei, ei〉.
di+1 := ei+1 + bidi.
This method computes a set of mutually orthogonal directions d0, d1, . . . , and hence it has to stop after at most n − 1 iterations, since by then we run out of new orthogonal directions. Provided that we work with infinite precision, we must eventually obtain ei = 0 for some i, 0 ≤ i ≤ n − 1.
If A is sparse, that is, if each row of A has O(log^c n) non-zero entries, c being a positive constant, then the product Adi can be computed using O~(n) field operations. The other operations clearly meet this bound. Since at most n − 1 iterations are necessary, the conjugate gradient method terminates after performing O~(n^2) field operations.
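Algorithm 4.8 is short enough to transcribe directly. The sketch below runs the iteration in exact rational arithmetic, the real-field setting of the original method; the function and variable names are illustrative:

```python
from fractions import Fraction

def inner(v, w):
    return sum(a * b for a, b in zip(v, w))

def mat_vec(A, v):
    return [inner(row, v) for row in A]

def conjugate_gradient(A, b):
    """Algorithm 4.8 for a symmetric positive definite A, in exact
    rational arithmetic so that e_i = 0 is reached exactly."""
    n = len(b)
    x = [Fraction(0)] * n     # initial minimizer x_0 := 0
    e = list(b)               # e_i = b - A x_i  (error)
    d = list(b)               # mutually conjugate directions
    for _ in range(n):
        if not any(e):        # e_i = 0: x solves Ax = b
            break
        Ad = mat_vec(A, d)
        a = inner(e, e) / inner(d, Ad)
        x = [xi + a * di for xi, di in zip(x, d)]
        e_next = [ei - a * v for ei, v in zip(e, Ad)]
        bi = inner(e_next, e_next) / inner(e, e)
        d = [ei + bi * di for ei, di in zip(e_next, d)]
        e = e_next
    return x

A = [[Fraction(4), Fraction(1)], [Fraction(1), Fraction(3)]]
b = [Fraction(1), Fraction(2)]
x = conjugate_gradient(A, b)    # exact solution of the toy system
```

With exact arithmetic the loop necessarily ends with a zero error vector, mirroring the infinite-precision argument above.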
We face some potential problems when we want to apply this method to solve a system over a finite field 𝔽_q. First, the matrix A is usually not symmetric and need not even be square. This problem can be avoided by solving the system A^t Ax = A^t b. The new coefficient matrix A^t A may be non-sparse (that is, dense). So instead of computing and working with A^t A explicitly, we compute the product (A^t A)di as A^t (Adi), that is, we avoid multiplication by a (possibly) dense matrix at the cost of multiplications by two sparse matrices.
The second difficulty with a finite field 𝔽_q is that the question of minimizing an 𝔽_q-valued function makes hardly any sense (and so does positive definiteness of a matrix over 𝔽_q). However, the conjugate gradient method is essentially based on the generation of a set of mutually orthogonal vectors d0, d1, . . . . This concept continues to make sense in the setting of a finite field.
If A is a real positive definite matrix, we cannot have 〈di, Adi〉 = 0 for a non-zero vector di. But this condition need not hold for a matrix A over 𝔽_q. Similarly, we may have a non-zero error vector ei over 𝔽_q for which 〈ei, ei〉 = 0. (Again, this is not possible for real vectors.) So for the iterations over 𝔽_q (more precisely, the computations of ai and bi) to proceed gracefully, all that we can hope for is that, before reaching the solution, we never hit a non-zero direction vector di for which 〈di, Adi〉 = 0, nor a non-zero error vector ei for which 〈ei, ei〉 = 0. If q is sufficiently large and the initial minimizer x0 is chosen sufficiently randomly, then the probability of encountering such a bad di or ei is rather low and, as a result, the method is very likely to terminate without problems. If, by a terrible stroke of bad luck, we have to abort the computation prematurely, we should restart the procedure with a new random initial vector x0. If q is small (say, q = 2, as in the case of the QSM), it is a neater idea to select the entries of the initial vector x0 from a field extension 𝔽_(q^m) and to work in this extension. The eventual solution we reach will be in 𝔽_q^n, but working in the larger field decreases the possibility of an attempted division by 0.
There is, however, a brighter side to using a finite field 𝔽_q in place of ℝ: every calculation we perform in 𝔽_q is exact, and we do not have to bother about a criterion for determining whether an error vector ei is zero, or about the conditioning of the matrix A. One of the biggest headaches of numerical analysis is absent here.
The Lanczos method is another iterative method, quite similar to the conjugate gradient method. The basic difference between these methods lies in the way in which the mutually conjugate directions d0, d1, . . . are generated. For the Lanczos method, we start with the initializations d0 := b, v1 := Ad0, a0 := 〈d0, b〉/〈d0, v1〉 and x0 := a0d0. Then, for i = 1, 2, . . . , we repeat the steps in Algorithm 4.9 as long as 〈di, Adi〉 ≠ 0.
Algorithm 4.9:
di := vi − (〈vi, vi〉/〈di−1, vi〉)di−1 − (〈vi, vi−1〉/〈di−2, vi−2... vi−1〉)di−2, the last term being absent for i = 1.
vi+1 := Adi.
ai := 〈di, b〉/〈di, vi+1〉.
xi := xi−1 + aidi.
If A is a real positive definite matrix, the termination criterion 〈di, Adi〉 = 0 is equivalent to the condition di = 0. When this is satisfied, the vector xi−1 equals the desired solution x of the system Ax = b. Since d0, d1, . . . are mutually orthogonal, the process must stop after at most n − 1 iterations. Therefore, for a sparse matrix A, the entire procedure performs O~(n^2) field operations.
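For concreteness, here is a sketch of the Lanczos iteration, written in the common three-term-recurrence formulation over exact rationals; the update formulas follow the standard presentation rather than the book's exact notation, and the names are ours:

```python
from fractions import Fraction

def inner(v, w):
    return sum(a * b for a, b in zip(v, w))

def mat_vec(A, v):
    return [inner(row, v) for row in A]

def lanczos(A, b):
    """Solve Ax = b for a symmetric positive definite A by generating
    mutually conjugate directions w_0, w_1, ... (three-term recurrence).
    Over a finite field one would have to restart on <w, Aw> = 0; over
    the rationals, used here, that cannot happen for w != 0."""
    x = [Fraction(0)] * len(b)
    w_prev, Aw_prev = None, None
    w = list(b)                        # w_0 := b
    while any(w):
        Aw = mat_vec(A, w)
        wAw = inner(w, Aw)
        # accumulate the projection of the solution on this direction
        a = inner(w, b) / wAw
        x = [xi + a * wi for xi, wi in zip(x, w)]
        # next direction: A w, re-conjugated against the last two
        c1 = inner(Aw, Aw) / wAw
        w_next = [u - c1 * wi for u, wi in zip(Aw, w)]
        if w_prev is not None:
            c2 = inner(Aw, Aw_prev) / inner(w_prev, Aw_prev)
            w_next = [u - c2 * wp for u, wp in zip(w_next, w_prev)]
        w_prev, Aw_prev = w, Aw
        w = w_next
    return x

A = [[Fraction(4), Fraction(1)], [Fraction(1), Fraction(3)]]
b = [Fraction(1), Fraction(2)]
x = lanczos(A, b)
```

Each new direction needs re-conjugation only against the previous two, which is what makes the method as cheap per iteration as the conjugate gradient method.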
The problems we face with the Lanczos method applied to a system over 𝔽_q are essentially the same as those discussed in connection with the conjugate gradient method. The problem with a non-symmetric and/or non-square matrix A is solved by multiplying the system by A^t. Instead of working with A^t A explicitly, we prefer to multiply separately by A and A^t.
The more serious problem with a system over 𝔽_q is that of encountering a non-zero direction vector di with 〈di, Adi〉 = 0. If this happens, we have to abort the computation prematurely. In order to restart the procedure, we try to solve the system BAx = Bb, where B is a diagonal matrix whose diagonal elements are chosen randomly from the non-zero elements of the field 𝔽_q or of some suitable extension 𝔽_(q^m) (if q is small).
The Wiedemann method for solving a sparse system Ax = b over 𝔽_q uses ideas different from those employed by the other methods discussed so far. For the sake of simplicity, we assume that A is a square non-singular matrix (not necessarily symmetric). The Wiedemann method tries to compute the minimal polynomial μA(X) = X^d + cd−1X^(d−1) + ··· + c1X + c0, d ≤ n, of A. To that end, one selects a small positive integer l in the range 10 ≤ l ≤ 20. For i = 0, 1, . . . , 2n, let vi denote the column vector of length l consisting of the first l entries of the vector A^i b. For the working of the Wiedemann method, we need to compute only the vectors v0, . . . , v2n. If A is a sparse matrix, this computation involves a total of O~(n^2) operations in 𝔽_q.
Since μA(A) = 0, we have μA(A)A^i b = 0 for every i ≥ 0. Therefore, for each k = 1, . . . , l, the sequence v0,k, v1,k, . . . of the k-th entries of v0, v1, . . . satisfies the linear recurrence

vi+d,k + cd−1vi+d−1,k + ··· + c1vi+1,k + c0vi,k = 0 for all i ≥ 0.

But then the minimal polynomial μk(X) of the k-th such sequence is a factor of μA(X). There are methods that compute each μk(X) using O(n^2) field operations. We then expect to obtain μA(X) = lcm(μk(X) | 1 ≤ k ≤ l).
The assumption that A is non-singular is equivalent to the condition that c0 ≠ 0. In that case, the solution vector

x = −c0^(−1)(A^(d−1)b + cd−1A^(d−2)b + ··· + c2Ab + c1b)

can be computed using O~(n^2) arithmetic operations in the field 𝔽_q.
If A is singular, we may detect linear dependencies among the rows of A and subsequently throw away suitable rows. Doing this repeatedly eventually leaves us with a non-singular matrix. For further details on the Wiedemann method, see [303].
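The recovery of x from the coefficients of the minimal polynomial is easy to demonstrate. The sketch below works over a small prime field and, to keep the example tiny, takes the minimal polynomial as given — for the 2 × 2 matrix used here it is just the characteristic polynomial X^2 + 2X + 5 (mod 7) — instead of deriving it from the sequences vi,k:

```python
def mat_vec_mod(A, v, p):
    return [sum(a * x for a, x in zip(row, v)) % p for row in A]

def wiedemann_solution(A, b, c, p):
    """Given mu_A(X) = X^d + c[d-1] X^(d-1) + ... + c[1] X + c[0] with
    c[0] != 0, return x = -c[0]^(-1) (A^(d-1) b + c[d-1] A^(d-2) b
    + ... + c[1] b) over GF(p), so that A x = b."""
    d = len(c)                      # degree of mu_A; c = [c0, ..., c_{d-1}]
    acc = [0] * len(b)
    power = [v % p for v in b]      # A^(j-1) b, starting with j = 1
    for j in range(1, d + 1):
        cj = 1 if j == d else c[j]  # leading coefficient is 1 (monic)
        acc = [(s + cj * t) % p for s, t in zip(acc, power)]
        power = mat_vec_mod(A, power, p)
    factor = pow(-c[0] % p, -1, p)  # -c0^(-1) mod p
    return [factor * s % p for s in acc]

# toy instance over GF(7); mu_A(X) = X^2 + 2X + 5
A, b, p = [[2, 1], [1, 3]], [1, 2], 7
x = wiedemann_solution(A, b, [5, 2], p)
```

Only matrix-by-vector products appear, which is why sparsity of A translates directly into the O~(n^2) operation count.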
In this section, we assume that A := {a1, . . . , an} is a knapsack set. For a target sum s ∈ ℕ, we are required to find ∊1, . . . , ∊n ∈ {0, 1} such that ∊1a1 + ··· + ∊nan = s, provided that a solution exists. In general, finding such a solution ∊1, . . . , ∊n is a very difficult problem.[6] However, if the weights satisfy some specific bounds, there exist polynomial-time algorithms for solving the SSP.
[6] In the language of complexity theory, the decision problem of determining whether a solution of the SSP exists is NP-complete.
Let us first define an important quantity associated with a knapsack set:

Definition The density of the knapsack set A := {a1, . . . , an} is defined as d(A) := n / log_2 (max{a1, . . . , an}).
If d(A) > 1, then the SSP has, in general, more than one solution (provided that it has any solution at all). This makes the corresponding knapsack set A unsuitable for cryptographic purposes. So we consider low densities, that is, the case d(A) ≤ 1.
There are certain algorithms that reduce, in polynomial time, the problem of finding a solution of the SSP to that of finding a shortest (non-zero) vector in a lattice. Assuming that such a vector is computable in polynomial time, Lagarias and Odlyzko’s reduction algorithm [157] solves the SSP in polynomial time with high probability, if d(A) ≤ 0.6463. An improved version of the algorithm adapts to densities d(A) ≤ 0.9408 (see Coster et al. [64] and Coster et al. [65]). The reduction algorithm is easy and will be described in Section 4.8.1. However, it is not known how to compute a shortest non-zero vector in a lattice efficiently. The Lenstra–Lenstra–Lovasz (L3) polynomial-time lattice-basis reduction algorithm [166] provably finds a non-zero vector whose length is at most the length of a shortest non-zero vector multiplied by a power of 2. In practice, however, the L3 algorithm tends to compute a shortest vector quite often. Section 4.8.2 deals with the L3 lattice-basis reduction algorithm.
Before providing a treatment of lattices, let us introduce a particular case of the SSP which is easily (and uniquely) solvable.
Definition A knapsack set {a1, . . . , an} with a1 < ··· < an is said to be superincreasing, if ai > a1 + a2 + ··· + ai−1 for all i = 2, . . . , n.
Algorithm 4.10 solves the SSP for a superincreasing knapsack set in deterministic polynomial time. The proof for the correctness of this algorithm is easy and left to the reader.
Algorithm 4.10:
Input: A superincreasing knapsack set {a1, . . . , an} with a1 < ··· < an, and a target sum s.
Output: The (unique) solution ∊1, . . . , ∊n of ∊1a1 + ··· + ∊nan = s, provided one exists.
Steps:
for i = n, n − 1, . . . , 1 {
 if (s ≥ ai) { ∊i := 1. s := s − ai. } else { ∊i := 0. }
}
if (s = 0), return (∊1, . . . , ∊n), else report that no solution exists.
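Algorithm 4.10 translates into a few lines (a sketch; the function name is ours):

```python
def solve_superincreasing(a, s):
    """Algorithm 4.10: solve the SSP for a superincreasing knapsack set
    a[0] < a[1] < ... by scanning the weights from the largest down."""
    eps = [0] * len(a)
    for i in reversed(range(len(a))):
        if s >= a[i]:
            eps[i] = 1
            s -= a[i]
    return eps if s == 0 else None    # None: no solution exists

eps = solve_superincreasing([2, 3, 7, 15, 31], 24)   # 24 = 2 + 7 + 15
```

The greedy choice is forced: if s ≥ ai, then ∊i must be 1, because the remaining smaller weights sum to less than ai.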
We start by defining a lattice.
Definition Let n, d ∈ ℕ with d ≤ n, and let v1, . . . , vd be linearly independent vectors in ℝ^n. The set L := {r1v1 + ··· + rdvd | r1, . . . , rd ∈ ℤ} is called a (d-dimensional) lattice in ℝ^n. We say that v1, . . . , vd constitute a basis of L.
In general, a lattice may have more than one basis. We are interested in bases consisting of short vectors, where the concept of shortness is with respect to the following definition.
Definition Let v := (v1, . . . , vn)^t and w := (w1, . . . , wn)^t be two n-dimensional vectors in ℝ^n. Their inner product is 〈v, w〉 := v1w1 + ··· + vnwn, and the length of v is defined as ‖v‖ := √〈v, v〉 = √(v1^2 + ··· + vn^2).
For the time being, let us assume the availability of a lattice oracle which, given a lattice, returns a shortest non-zero vector in the lattice. The possibilities for realizing such an oracle will be discussed in the next section.
Consider the subset sum problem with the knapsack set A := {a1, . . . , an}, and let B be an upper bound on the weights (that is, each ai ≤ B). For a target sum s, we are supposed to find ∊1, . . . , ∊n ∈ {0, 1} such that ∊1a1 + ··· + ∊nan = s. Let L be the (n + 1)-dimensional lattice in ℝ^(n+1) generated by the vectors

b1 := (1, 0, . . . , 0, Na1)^t,
b2 := (0, 1, . . . , 0, Na2)^t,
 . . .
bn := (0, 0, . . . , 1, Nan)^t,
bn+1 := (1/2, 1/2, . . . , 1/2, Ns)^t,

where N is an integer larger than (1/2)√n.
The vector v := (∊1 − 1/2, ∊2 − 1/2, . . . , ∊n − 1/2, 0)^t is in the lattice L, where ∊1, . . . , ∊n is a solution of the SSP (indeed, v = ∊1b1 + ··· + ∊nbn − bn+1). Involved calculations (carried out in Coster et al. [64, 65]) show that the probability P of the existence of a vector w ∈ L, w ≠ ±v, with ‖w‖ ≤ ‖v‖ satisfies P ≤ α(n)2^(cn)/B for a polynomially bounded function α, where c ≈ 1.0628. Now, if the density d(A) of A is less than 1/c ≈ 0.9408, then B = 2^(c′n) for some c′ > c and, therefore, P → 0 as n → ∞. In other words, if d(A) < 0.9408, then, with high probability, ±v are the shortest non-zero vectors of L. The lattice oracle then returns such a vector, from which the solution ∊1, . . . , ∊n can be readily computed.
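The Lagarias–Odlyzko-style construction is easy to set up in code. The sketch below builds the basis with exact rational entries and checks that the vector v obtained from a known solution lies in L, has last coordinate 0, and has squared length n/4; the shortest-vector oracle itself is not implemented (function and variable names are ours):

```python
from fractions import Fraction

def subset_sum_basis(a, s, N):
    """Basis b_1, ..., b_{n+1} of the lattice used in the reduction of
    the SSP to the shortest-vector problem (density < 0.9408 variant)."""
    n = len(a)
    basis = []
    for i in range(n):
        bi = [Fraction(0)] * (n + 1)
        bi[i] = Fraction(1)
        bi[n] = Fraction(N * a[i])
        basis.append(bi)
    basis.append([Fraction(1, 2)] * n + [Fraction(N * s)])
    return basis

# knapsack set, target sum and a known solution (2 + 7 = 9)
a, s, eps = [2, 3, 7], 9, [1, 0, 1]
N = 2                                 # any integer > sqrt(n)/2 works
B = subset_sum_basis(a, s, N)
# v = eps_1 b_1 + ... + eps_n b_n - b_{n+1}
n = len(a)
v = [sum(e * B[i][j] for i, e in enumerate(eps)) - B[n][j]
     for j in range(n + 1)]
```

The large multiplier N forces every short lattice vector to have last coordinate 0, that is, to encode a subset with the correct sum.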
Let L be a lattice in ℝ^n specified by a basis of n linearly independent vectors v1, . . . , vn. We now construct a basis v1*, . . . , vn* of ℝ^n such that 〈vi*, vj*〉 = 0 (that is, vi* and vj* are orthogonal to each other) for all i, j, i ≠ j. Note that v1*, . . . , vn* need not be a basis for L. Algorithm 4.11 is known as the Gram–Schmidt orthogonalization procedure.
Algorithm 4.11:
Input: A basis v1, . . . , vn of ℝ^n.
Output: The Gram–Schmidt orthogonalization v1*, . . . , vn*.
Steps:
v1* := v1.
for i = 2, . . . , n {
 for j = 1, . . . , i − 1 { μi,j := 〈vi, vj*〉/〈vj*, vj*〉. }
 vi* := vi − (μi,1v1* + ··· + μi,i−1vi−1*).
}
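Algorithm 4.11 in exact rational arithmetic (a direct transcription with 0-based indices; names are ours):

```python
from fractions import Fraction

def inner(v, w):
    return sum(a * b for a, b in zip(v, w))

def gram_schmidt(vs):
    """Return (vstar, mu): the orthogonalized v_i^* and the rational
    coefficients mu[i, j] = <v_i, v_j^*> / <v_j^*, v_j^*> for j < i."""
    vstar, mu = [], {}
    for i, v in enumerate(vs):
        w = [Fraction(t) for t in v]
        for j in range(i):
            # since the v_j^* are mutually orthogonal, the partially
            # reduced w satisfies <w, v_j^*> = <v_i, v_j^*>
            mu[i, j] = inner(w, vstar[j]) / inner(vstar[j], vstar[j])
            w = [wi - mu[i, j] * u for wi, u in zip(w, vstar[j])]
        vstar.append(w)
    return vstar, mu

vstar, mu = gram_schmidt([[3, 1], [2, 2]])
```

Working with Fraction keeps the μi,j exact, which matters later because the reduction conditions compare them against 1/2 exactly.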
One can easily verify that v1*, . . . , vn* constitute an orthogonal basis of ℝ^n. Using these notations, we introduce the following important concept:
Definition The basis v1, . . . , vn is called a reduced basis of L, if

Equation 4.16 |μi,j| ≤ 1/2 for all 1 ≤ j < i ≤ n,

and

Equation 4.17 ‖vi* + μi,i−1vi−1*‖^2 ≥ (3/4)‖vi−1*‖^2 for all i = 2, . . . , n.
A reduced basis v1, . . . , vn of L is so termed because the vectors vi are somewhat short. More precisely, we have Theorem 4.5, whose proof is not difficult but is involved, and is omitted here.
Theorem 4.5 Let v1, . . . , vn be a reduced basis of a lattice L, and let w1, . . . , wm be linearly independent vectors in L. Then ‖vj‖^2 ≤ 2^(n−1) max(‖w1‖^2, . . . , ‖wm‖^2) for all j = 1, . . . , m. In particular, for any non-zero vector w of L we have ‖v1‖^2 ≤ 2^(n−1)‖w‖^2. That is, for a reduced basis v1, . . . , vn of L, the length of v1 is at most 2^((n−1)/2) times that of a shortest non-zero vector in L.
Given an arbitrary basis v1, . . . , vn of a lattice L, the L3 basis reduction algorithm computes a reduced basis of L. The algorithm starts by computing the Gram–Schmidt orthogonalization v1*, . . . , vn* of v1, . . . , vn. The rational numbers μi,j are also available from this step. We also obtain as byproducts the numbers Vi := ‖vi*‖^2 for i = 1, . . . , n.
Algorithm 4.12 enforces Condition (4.16), |μk,l| ≤ 1/2, for a given pair of indices k and l. The essential work done by this routine is subtracting a suitable integral multiple of vl from vk and updating the values μk,1, . . . , μk,l accordingly.
Algorithm 4.12:
Input: Two indices k and l.
Output: An update of the basis vectors to ensure |μk,l| ≤ 1/2.
Steps:
if (|μk,l| > 1/2) {
 r := the integer nearest to μk,l.
 vk := vk − rvl.
 for h = 1, . . . , l − 1 { μk,h := μk,h − rμl,h. }
 μk,l := μk,l − r.
}
If Condition (4.17) is not satisfied for some k, that is, if Vk < (3/4 − μk,k−1^2)Vk−1, then vk and vk−1 are swapped. The necessary changes in the values Vk, Vk−1 and certain μi,j must also be incorporated. This is explained in Algorithm 4.13.
Algorithm 4.13:
Input: An index k.
Output: The vectors vk−1 and vk swapped, with the values Vk−1, Vk and the affected μi,j updated.
Steps:
μ := μk,k−1. V := Vk + μ^2Vk−1.
μk,k−1 := μVk−1/V. Vk := Vk−1Vk/V. Vk−1 := V.
Swap vk−1 and vk.
for j = 1, . . . , k − 2 { swap μk−1,j and μk,j. }
for i = k + 1, . . . , n { t := μi,k. μi,k := μi,k−1 − μt. μi,k−1 := t + μk,k−1μi,k. }
The main basis reduction algorithm is described in Algorithm 4.14. It is not obvious that this algorithm should terminate at all. Consider the quantity D := d1 ··· dn−1, where di := |det(〈vk, vl〉)1≤k,l≤i| for each i = 1, . . . , n. At the beginning of the basis reduction procedure, one has di ≤ B^i for all i = 1, . . . , n, where B := max(‖vi‖^2 | 1 ≤ i ≤ n). It can be shown that an invocation of Algorithm 4.12 does not alter the value of D, whereas interchanging vk and vk−1 in Algorithm 4.13 multiplies D by a factor less than 3/4. It can also be shown that, for any basis of L, the value of D is bounded from below by a positive constant that depends only on the lattice. Thus, Algorithm 4.14 stops after finitely many steps.
Algorithm 4.14:
Input: A basis v1, . . . , vn of a lattice L.
Output: v1, . . . , vn converted to a reduced basis.
Steps:
Compute the Gram–Schmidt orthogonalization of v1, . . . , vn (Algorithm 4.11). /* The initial values of μi,j and Vi are available at this point */
k := 2.
while (k ≤ n) {
 Ensure |μk,k−1| ≤ 1/2 (Algorithm 4.12 with l = k − 1).
 if (Vk < (3/4 − μk,k−1^2)Vk−1) {
  Swap vk and vk−1 (Algorithm 4.13). k := max(k − 1, 2).
 } else {
  Ensure |μk,l| ≤ 1/2 for l = k − 2, . . . , 1 (Algorithm 4.12). k := k + 1.
 }
}
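The reduction procedure can be given a compact, deliberately inefficient sketch: instead of the incremental updates of Algorithms 4.12 and 4.13, it simply recomputes the Gram–Schmidt data whenever it is needed, which keeps the code short at the cost of speed (0-based indices, names ours):

```python
from fractions import Fraction

def inner(v, w):
    return sum(a * b for a, b in zip(v, w))

def gso(B):
    """Gram-Schmidt data of B: the vectors v_i^* and the mu[i][j]."""
    n = len(B)
    star, mu = [], [[Fraction(0)] * n for _ in range(n)]
    for i in range(n):
        w = [Fraction(t) for t in B[i]]
        for j in range(i):
            mu[i][j] = inner(w, star[j]) / inner(star[j], star[j])
            w = [a - mu[i][j] * u for a, u in zip(w, star[j])]
        star.append(w)
    return star, mu

def lll(B):
    """Return a reduced basis (Conditions (4.16) and (4.17)) of the
    lattice spanned by the integer basis B."""
    B = [[Fraction(t) for t in v] for v in B]
    n, k = len(B), 1
    while k < n:
        for j in range(k - 1, -1, -1):    # enforce |mu[k][j]| <= 1/2
            star, mu = gso(B)             # recomputed, not updated
            r = round(mu[k][j])           # integer nearest to mu[k][j]
            if r:
                B[k] = [a - r * u for a, u in zip(B[k], B[j])]
        star, mu = gso(B)
        bound = (Fraction(3, 4) - mu[k][k - 1] ** 2) * inner(star[k - 1], star[k - 1])
        if inner(star[k], star[k]) >= bound:  # Condition (4.17) holds
            k += 1
        else:                             # swap and step back
            B[k], B[k - 1] = B[k - 1], B[k]
            k = max(k - 1, 1)
    return B

B = lll([[1, 1, 1], [-1, 0, 2], [3, 5, 6]])
```

On exit, both reduction conditions hold for the returned basis, which is all that Theorem 4.5 needs.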
For a more complete treatment of the L3 basis reduction algorithm, we refer the reader to Lenstra et al. [166] (or Mignotte [203]). It is important to note here that the L3 basis reduction algorithm is at the heart of the Lenstra–Lenstra–Lovasz algorithm for factoring a polynomial in ℚ[X]. This factoring algorithm indeed runs in time polynomially bounded by the degree of the polynomial to be factored and is one of the major breakthroughs in the history of symbolic computing.
4.27 Let A be a knapsack set. Show that:
4.28 Let L be a lattice in ℝ^n, and let v1, . . . , vn constitute a basis of L. The determinant of L is defined by
 det L := det(v1, . . . , vn).
This chapter introduces the most common computationally intractable mathematical problems on which the security of public-key cryptosystems rests. We also describe some algorithms known to date for solving these difficult computational problems.
To start with, we enumerate these computational problems. The first in the list is the integer factorization problem (IFP) and its several variants. Problems that are provably or plausibly equivalent to the IFP include the totient problem, problems associated with the RSA algorithm, and the modular square root problem. The next class of problems includes the discrete logarithm problem (DLP) and its variants on elliptic curves (ECDLP) and hyperelliptic curves (HECDLP). The Diffie–Hellman problem (DHP) and its variants (ECDHP, HECDHP) are believed to be equivalent to the respective variants of the DLP. Finally, the subset sum problem (SSP) and two related problems, namely the shortest vector problem (SVP) and the closest vector problem (CVP) on lattices, are introduced.
The subsequent sections are devoted to an algorithmic study of these difficult problems. We start with IFP. We first present some fully exponential algorithms like trial division, Pollard’s rho method, Pollard’s p – 1 method and Williams’ p + 1 method. Next we describe the modern genre of subexponential algorithms. The quadratic sieve method (QSM) is discussed at length together with its heuristic improvements like incomplete sieving, large prime variation and the multiple polynomial variant. We also describe TWINKLE, a hardware device that efficiently implements the sieving stage of the QSM. We then discuss the elliptic curve method (ECM) and the number field sieve method (NFSM) for factoring integers. The NFSM turns out to be the asymptotically fastest known algorithm for factoring integers.
The (finite field) DLP is discussed next. The older square-root methods, such as Shanks’ baby-step–giant-step method (BSGS), Pollard’s rho method and the Pohlig–Hellman method (PHM), take exponential running times in the worst case. The PHM for a field 𝔽_q is, however, efficient if q − 1 has only small prime factors. Next, we discuss the modern family of algorithms collectively known as the index calculus method (ICM). For prime fields, we discuss three variants of the ICM, namely the basic method, the linear sieve method (LSM) and the number field sieve method (NFSM). We also discuss three variants of the ICM for fields of characteristic 2: the basic method, the linear sieve method and Coppersmith’s algorithm. Another interesting variant is the cubic sieve method (CSM), covered in the exercises. We explain Gordon and McCurley’s polynomial sieving in connection with Coppersmith’s algorithm.
The next section deals with algorithms for solving the ECDLP. For a general elliptic curve, the exponential square-root methods are the only known algorithms. For some special classes of curves, more efficient methods have been proposed in the literature. The MOV reduction, based on the Weil pairing, reduces the ECDLP on a curve over 𝔽_q to the DLP in the finite field 𝔽_(q^k) for some suitable integer k. This k is small, and the reduction efficient, for supersingular curves. The SmartASS method (also called the anomalous method) reduces the ECDLP on an anomalous curve to the computation of p-adic discrete logarithms; this reduction solves the original DLP in polynomial time. In view of these algorithms, it is preferable to avoid supersingular and anomalous curves in cryptographic applications. The xedni calculus method (XCM) is discussed last. This algorithm works by lifting a curve over 𝔽_p to a curve over ℚ. Experimental and theoretical evidence suggests that the XCM is not an efficient solution to the ECDLP.
We then devote a section to the study of an index calculus method to solve the HECDLP. For hyperelliptic curves of small genus, this method leads to a subexponential algorithm (the ADH–Gaudry algorithm).
Many of the above subexponential methods require solving a system of linear congruences over finite rings. This (inherently sequential) linear algebra part often turns out to be the bottleneck of the algorithms. However, the fact that these equations are necessarily sparse can be effectively exploited, and some faster algorithms can be used to solve these systems. We study four such algorithms: structured Gaussian elimination, the conjugate gradient method, the Lanczos method and the Wiedemann method.
In the last section, we study the subset sum problem. We first reduce the SSP to problems associated with lattices. We finally present the lattice-basis reduction algorithm due to Lenstra, Lenstra and Lovasz.
Several other computationally intractable problems have been proposed in the literature for building cryptographic systems. Some of these problems are mentioned in the annotated references of Chapter 5. Due to space and time limitations, we will not discuss these problems in this book.
The integer factorization problem is one of the oldest computational problems. Though the exact notion of computational complexity took shape only after the advent of computers, the apparent difficulty of solving the factorization problem was noticed centuries ago. Crandall and Pomerance [69] call it the fundamental computational problem of arithmetic. Numerous books and articles provide discussions on this subject at varying levels of coverage. Crandall and Pomerance [69] is perhaps the most extensive in this regard. The reader can also take a look at Bressoud’s (much simpler) book [36] or the (compact, yet reasonably detailed) Chapter 10 of Henri Cohen’s book [56]. The articles by Lenstra et al. [164] and by Montgomery [211] are also worth reading.
John M. Pollard has his name attached to three modern inventions in the arena of integer factorization. In [238, 239], he introduces the rho and p − 1 methods. (Later, he was part of the team that designed the number field sieve factoring algorithm.) Williams’ p + 1 method appears in 1982 in [305].
The continued fraction method (CFRAC) is apparently the first known subexponential-time integer factoring algorithm. It is based on the work of Lehmer and Powers [162] and first appears in its currently used form in Morrison and Brillhart’s paper [213]. CFRAC was the most widely used integer factoring algorithm during the late 1970s and early 1980s.
The quadratic sieve method, invented by Carl Pomerance [241] in 1984, supersedes the CFRAC method. The multiple-polynomial QSM appears in Silverman [279]. Hendrik Lenstra’s elliptic curve method [174] was proposed almost concurrently with the QSM. Nowadays, the QSM and the ECM are the most commonly used factoring methods. Reyneri’s cubic sieve method is described in Lenstra and Lenstra [165].
The theoretically superior number field sieve method follows from Pollard’s factoring method using cubic integers [240]. The initial proposal for the NFS method is that of the simple NFS and appears in Lenstra et al. [167]. It is later modified to the general NFS method in Buhler et al. [41]. Lenstra and Lenstra [165] is a compilation of papers on the NFS method. Though the NFS method is the asymptotically fastest factoring method, its fairly complicated implementation makes the algorithm superior to the QSM or the ECM only when the bit size of the integer to be factored is reasonably large.
Shamir’s factoring engine TWINKLE is proposed in [269]. A. K. Lenstra and Shamir analyse and optimize its design in [168]. Shamir and Tromer [270] have proposed a device called TWIRL (The Weizmann Institute Relation Locator) that is geared to the NFS factoring method. It is estimated that a TWIRL implementation costing US$10K can complete the sieving for a 512-bit RSA modulus in less than 10 minutes, whereas one that does the same for a 1024-bit RSA modulus costs US$10–50M and takes a time of one year. Lenstra et al. [163] provide a more detailed analysis of these estimates. See Lenstra et al. [169] to know about Bernstein’s factorization circuit which is another implementation of the NFS factoring method.
The (finite field) discrete logarithm problem has also attracted much research in the last few decades. The older square-root methods are described well in the book [191] by Menezes. Donald Knuth attributes the baby-step–giant-step method to Daniel Shanks. See Stein and Teske [290] for various optimizations of the baby-step–giant-step method. Pollard’s rho method is an adaptation of his rho method for integer factorization. See Pohlig and Hellman [234] for the Pohlig–Hellman method.
The first idea of the index calculus method appears in Western and Miller [302]. Coppersmith et al. [59] describe three variants of the index calculus method: the linear sieve method, the residue list sieve method and the Gaussian integer method. The same paper also proposes the cubic sieve method (CSM). LaMacchia and Odlyzko [158] describe an implementation of the linear sieve and the Gaussian integer methods. Das and Veni Madhavan [73] make an implementation study of the CSM. Also look at the survey [189] by McCurley.
Gordon [119] uses number field sieves for computing discrete logarithms over prime fields. Weber et al. [261, 299, 300, 301] have implemented and proved the practicality of the number field sieve method. Also see Schirokauer’s paper [260].
Odlyzko [225] surveys the algorithms for computing discrete logs in the fields 𝔽_(2^n). The best algorithm for these fields is Coppersmith’s algorithm [57]. No analogue of this algorithm is known for prime fields. Gordon and McCurley [120] use Coppersmith’s algorithm for the computation of discrete logarithms in fields of the form 𝔽_(2^n).
The article [226] by Odlyzko and the one [242] by Pomerance are two recent surveys on the finite field discrete logarithm problem. Also see Buchmann and Weber [40].
The elliptic curve discrete logarithm problem seems to be a very difficult computational problem. A direct adaptation of the index calculus method is expected to lead to a running time worse than that of brute-force search (Silverman and Suzuki [278] and Blake et al. [24]). Menezes et al. [193] reduce the problem of computing discrete logs on an elliptic curve over 𝔽_q to computing discrete logs in the field 𝔽_(q^k) for some k. For supersingular elliptic curves, this k can be chosen to be small. For a general curve, the MOV reduction takes exponential time (Balasubramanian and Koblitz [16]). The SmartASS method is due to Smart [282], Satoh and Araki [257] and Semaev [265]. Joseph H. Silverman proposes the xedni calculus method in [277]. This method has been experimentally and heuristically shown to be impractical by Jacobson et al. [139].
Adleman et al. [2] propose the first subexponential algorithm for the hyperelliptic curve discrete log problem. This algorithm is applicable for curves of high genus over prime fields. The analysis of its running time is based on certain heuristic assumptions. Enge [86] provides a subexponential algorithm which has a rigorously provable running time and which works for curves over an arbitrary finite field 𝔽_q. Again, the algorithm demands curves of high genus. An implementation of the Adleman–DeMarrais–Huang algorithm is given by Gaudry [105]. Also see Enge and Gaudry [87].
Gaudry et al. [107] propose a Weil-descent attack for the hyperelliptic curve discrete log problem. This is modified in Galbraith [100] and Galbraith et al. [101].
Coppersmith et al. [59] describe sparse system solvers. LaMacchia and Odlyzko [159] implement these methods. For further details, see Montgomery [212], Coppersmith [58], Wiedemann [303], and Yang and Brent [306].
That public-key cryptosystems can be based on the subset-sum problem (or the knapsack problem) was realized at the beginning of the era of public-key cryptography. Historically, the first realization of a public-key system is along these lines and is due to Merkle and Hellman [196]. But the Merkle–Hellman system and several variants of it have been broken; see Shamir [266], for example. At present, most public-key systems based on the subset-sum problem are known to be insecure.
The lattice-basis reduction algorithm and the associated L3 algorithm for factoring polynomials appear in the celebrated work [166] of Lenstra, Lenstra and Lovasz. Mignotte’s book [203] also describes these topics in good detail.
| 5.1 | Introduction |
| 5.2 | Secure Transmission of Messages |
| 5.3 | Key Exchange |
| 5.4 | Digital Signatures |
| 5.5 | Entity Authentication |
| Chapter Summary | |
| Suggestions for Further Reading | |
An essential element of freedom is the right to privacy, a right that cannot be expected to stand against an unremitting technological attack.
—Whitfield Diffie
Mary had a little key (It’s all she could export), and all the email that she sent was opened at the Fort.
—Ronald L. Rivest
Treat your password like your toothbrush. Don’t let anybody else use it, and get a new one every six months.
—Clifford Stoll
As we pointed out in Chapter 1, cryptography aims to guard sensitive data from unauthorized access. We now describe some algorithms that achieve this goal, restricting ourselves to public-key algorithms. In practice, however, public-key algorithms are used in tandem with secret-key algorithms. In this chapter, we describe only the basic routines, whose inputs are mathematical entities like integers, elements of finite fields, or points on curves. Message encoding will be dealt with in Chapter 6.
Consider the standard scenario: a party named Alice, called the sender, wishes to send a secret message m to a party named Bob, called the receiver or recipient, over a public communication channel. A third party, Carol, may intercept and read the message. In order to maintain the secrecy of the message, Alice uses a well-defined transform fe to convert the plaintext message m to the ciphertext message c and sends c to Bob. Bob possesses some secret information with the help of which he applies the reverse transform fd in order to get back m. Carol, who is expected not to know the secret information, cannot retrieve m from c by applying the transform fd.
In a public-key system, the realization of the transforms fe and fd is based on a key pair (e, d) predetermined by Bob. The public key e is made public, whereas the private key d is kept secret. The encryption transform generates c = fe(m, e). Since e is public knowledge, anybody can generate c from a given m, whereas the decryption transform m = fd(c, d) can be performed only by Bob, who possesses the knowledge of d. The key pair has to be so chosen that knowledge of e does not allow Carol to compute d in feasible time. The intractability of the computational problems discussed in Chapter 4 can be exploited to design such key pairs. The exact realization of the keys e, d and the transforms fe, fd depends on the choice of the underlying intractable problem and also on the way the problem is put to use. Since there are several intractable problems suitable for cryptography, there are several encryption schemes, varying widely in algorithmic and mathematical details.
RSA has been the most popular encryption algorithm. Historically, too, it is the first public-key encryption algorithm published in the literature (see Rivest et al. [252]). Its security is based on the intractability of the RSAP (or the RSAKIP) discussed in Exercise 4.2. Since both these problems are polynomial-time reducible to the IFP, we often say that the RSA algorithm derives its security from the intractability of the IFP. It may, however, be the case that breaking RSA is easier than factoring integers, though no concrete evidence seems to be available.
Algorithm 5.1 generates a key pair for RSA.
Algorithm 5.1:
Output: A random RSA key pair.
Steps:
Generate two different random primes p and q, each of bit length l.
n := pq.
Choose an integer e coprime to φ(n) = (p − 1)(q − 1).
d := e^(−1) (mod φ(n)).
Return the pair (n, e) as the public key and the pair (n, d) as the private key.
The length l of the primes p and q should be chosen large enough so as to make the factorization of n infeasible. For short-term security, values of l between 256 and 512 suffice. For long-term security, one may choose l as large as 2,048.
The random primes p and q can be generated using a probabilistic algorithm like those described in Section 3.4.2. Naive primes are normally considered to be sufficiently secure in this respect, since p ± 1 and q ± 1 are expected to have large prime factors in general. Gordon’s algorithm (Algorithm 3.14) can also be used for generating strong primes p and q. Since Gordon’s algorithm runs only nominally slower than the algorithm for generating naive primes, there is no harm in using strong primes. Safe primes, on the other hand, are difficult to generate and may be avoided.
The RSA modulus n is public knowledge. Determining d from n and e is easily doable, given the value of φ(n) = (p – 1)(q – 1) which, in turn, is readily computable, if p and q are known. If an adversary can compute φ(n) (with or without factoring n), the security of the RSA protocol based on the modulus n is compromised. However, computing φ(n) without the knowledge of p and q is (at least historically) a very difficult computational problem, and so, if n is reasonably large, RSA encryption is assumed to be sufficiently secure.
RSA encryption is done by raising the plaintext message m to the power e modulo n. In order to speed up this (modular) exponentiation, it is often expedient to take a small value for e (like 3, 257 and 65,537). However, in that case one should adopt certain precautions as Exercise 5.2 suggests. More specifically, if e entities share a common (small) encryption key e but different (pairwise coprime) moduli and if the same message m is encrypted using all these public keys, then an eavesdropper can reconstruct m easily from a knowledge of the e ciphertext messages. Another potential problem of using small e is that if m is small, that is, if m < n1/e, then m can be retrieved by taking the integer e-th root of the ciphertext message.
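The e-ciphertext attack mentioned above is easy to demonstrate. The sketch below (toy numbers, names ours) recovers m from three encryptions of the same message under e = 3 with pairwise coprime moduli, by Chinese remaindering followed by an integer cube root:

```python
def crt(rems, mods):
    """Chinese remaindering for pairwise coprime moduli."""
    M = 1
    for m in mods:
        M *= m
    x = 0
    for r, m in zip(rems, mods):
        Mi = M // m
        x += r * Mi * pow(Mi, -1, m)   # pow(..., -1, m): inverse mod m
    return x % M

def icbrt(x):
    """Floor of the integer cube root, by Newton's method."""
    r = 1 << ((x.bit_length() + 2) // 3)
    while True:
        s = (2 * r + x // (r * r)) // 3
        if s >= r:
            return r
        r = s

# three recipients share e = 3 but use pairwise coprime moduli
mods = [55, 391, 1189]
m = 42                                # common plaintext; m**3 < 55*391*1189
cts = [pow(m, 3, n) for n in mods]    # the three intercepted ciphertexts
recovered = icbrt(crt(cts, mods))     # m**3 reconstructed, then its root
```

Since m^3 is smaller than the product of the three moduli, Chinese remaindering reconstructs m^3 as an integer, and the cube root is taken over ℤ, with no modular arithmetic left to resist.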
Although the pair (n, d) is sufficient for carrying out RSA decryption, maintaining some additional (secret) information significantly speeds up decryption. To this end, it is often recommended that some or all of the values n, e, d, p, q, d1, d2, h be stored, where d1 := d rem (p – 1), d2 := d rem (q – 1) and h := q^(–1) (mod p).
If n can be factored, then d can be easily computed from the public key (n, e). Conversely, if n, e, d are all known, there is an efficient probabilistic algorithm which factors n. This algorithm is based on the fact that if ed – 1 = 2^s·t with t odd, then for at least half of the integers a, 2 ≤ a ≤ n – 2, coprime to n, there exists a σ, 0 ≤ σ ≤ s – 1, such that a^(2^σ·t) ≢ ±1 (mod n), whereas a^(2^(σ+1)·t) ≡ 1 (mod n). But then the gcd of n and a^(2^σ·t) – 1 is a non-trivial factor of n. For the details, solve Exercise 7.9.
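This factoring procedure can be sketched as follows (a toy illustration with insecure parameters; the function name is ours):

```python
import math
import random

def factor_with_d(n, e, d):
    # write ed - 1 = 2^s * t with t odd
    k = e * d - 1
    s = (k & -k).bit_length() - 1
    t = k >> s
    while True:
        a = random.randrange(2, n - 1)
        g = math.gcd(a, n)
        if g > 1:
            return g                        # lucky: a already shares a factor with n
        x = pow(a, t, n)
        if x in (1, n - 1):
            continue                        # this a reveals nothing; try another
        for _ in range(s):
            y = pow(x, 2, n)
            if y == 1:
                # x ≢ ±1 (mod n) but x² ≡ 1 (mod n): gcd(x - 1, n) is non-trivial
                return math.gcd(x - 1, n)
            if y == n - 1:
                break                       # dead end for this a
            x = y
```

Since at least half of the choices of a succeed, the expected number of iterations is small.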
Different entities in a given network should use different values of n. If two or more entities share a common n but different exponent pairs (ei, di), then each entity can first factor n and then use this factorization to compute the private keys of other entities. Primes are quite abundant in nature and so finding pairwise coprime RSA moduli for all entities is no problem at all. A common value of the encryption exponent e (for example, a small value of e) can, however, be shared by all entities. In that case, for pairwise different moduli ni, the corresponding decryption exponents di will also be pairwise different.
RSA encryption is rather simple, as Algorithm 5.2 shows.
Algorithm 5.2  RSA encryption
Input: The RSA public key (n, e) of the recipient and the plaintext message m.
Output: The ciphertext message c.
Steps:
    c := m^e (mod n).
By Exercise 4.1, the exponentiation function m ↦ m^e is bijective; so m can be uniquely recovered from c. It is clear why small encryption exponents e speed up RSA encryption. For a general exponent e, the routine takes time O(log^3 n), whereas for a small e (that is, e = O(1)) the running time drops to O(log^2 n).
RSA decryption (Algorithm 5.3) is analogous to RSA encryption.
Algorithm 5.3  RSA decryption
Input: The RSA private key (n, d) of the recipient and the ciphertext message c.
Output: The recovered plaintext message m.
Steps:
    m := c^d (mod n).
The correctness of this decryption procedure follows from Exercise 4.1. As in the case of encryption, one might go for small decryption exponents d. In general, both e and d cannot be small simultaneously. If e is small, the security of the RSA scheme is expected not to be affected, whereas small values of d are not desirable for several reasons. First, if d is very small, the adversary chooses some m, computes the corresponding ciphertext c (using public knowledge) and then keeps on computing c^x (mod n) for x = 1, 2, . . . until x = d is reached, that is, until the original message m is recovered.
Even when d is not very small so that the possibility of exhaustive search with x = 1, 2, . . . can be precluded, there are several attacks known for small private exponents. Wiener [304] proposes an efficient algorithm in this respect. Boneh and Durfee [32] improve Wiener’s algorithm. Sun et al. [294] propose three variants of the RSA scheme that are resistant to these attacks. Durfee and Nguyen [82] extend the Boneh–Durfee attack to break two of these three variants. To sum up, it is advisable not to use small secret exponents d, that is, the bit length of d should be close to that of n in order to achieve the desired level of security.
There are alternative ways to speed up RSA decryption. If the values p, q, d1 := d rem (p – 1), d2 := d rem (q – 1) and h := q^(–1) (mod p) are all available to the recipient, he can use Algorithm 5.4 for RSA decryption.
Algorithm 5.4  RSA decryption using the CRT
Input: The RSA extended private key (p, q, d1, d2, h) of the recipient and the ciphertext message c.
Output: The recovered plaintext message m.
Steps:
    m1 := c^d1 (mod p).
    m2 := c^d2 (mod q).
    t := h(m1 – m2) (mod p).
    m := m2 + tq.
In this modified routine, m1 := m rem p and m2 := m rem q are first computed and then combined using the CRT to get m modulo n = pq. Algorithm 5.3 performs a single modular exponentiation modulo n, whereas in Algorithm 5.4 two exponentiations modulo p and q respectively take the major portion of the running time. Since an exponentiation modulo N to an exponent O(N) runs in time O(log^3 N), and since each of p and q has bit length (about) half of that of n, Algorithm 5.4 runs about four times as fast as Algorithm 5.3.
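Algorithm 5.4 can be sketched as follows (a toy illustration with insecure parameters; the function name is ours):

```python
def rsa_decrypt_crt(c, p, q, d1, d2, h):
    # d1 = d rem (p - 1), d2 = d rem (q - 1), h = q^{-1} (mod p)
    m1 = pow(c, d1, p)        # m rem p
    m2 = pow(c, d2, q)        # m rem q
    t = h * (m1 - m2) % p     # CRT recombination step
    return m2 + t * q         # m modulo n = pq
```

The result satisfies m ≡ m1 (mod p) and m ≡ m2 (mod q), and lies in the range 0 ≤ m < pq.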
If only the values p, q, d are stored, then d1, d2 and h can be computed on the fly using relatively inexpensive operations and subsequently Algorithm 5.4 can be used. This leads to a decryption routine almost as fast as Algorithm 5.4, with somewhat smaller memory requirements for the storage of the private key.
The Rabin public-key encryption algorithm is based on the intractability of computing square roots modulo a composite integer (SQRTP). By Exercise 4.10, the SQRTP is probabilistically polynomial-time equivalent to the IFP, that is, breaking the Rabin scheme is provably as hard as factoring integers. Breaking RSA, on the other hand, is only believed to be equivalent to factoring integers. Moreover, Rabin encryption is faster than RSA encryption (for moduli of the same size).
Like RSA, Rabin encryption requires a modulus of the form n = pq.
Algorithm 5.5  Rabin key generation
Input: A bit length l.
Output: A random Rabin key pair.
Steps:
    Generate two different random primes p and q each of bit length l.
    n := pq.
    Return n as the public key and the pair (p, q) as the private key.
Here, the choice of the bit length l and the generation of the primes p and q follow the same guidelines as discussed in connection with RSA key generation.
Encryption in the Rabin scheme involves a single modular squaring.
Algorithm 5.6  Rabin encryption
Input: The Rabin public key n of the recipient and the plaintext message m.
Output: The ciphertext message c.
Steps:
    c := m^2 (mod n).
Unfortunately, the Rabin encryption map m ↦ m^2 (mod n) is not injective. In general, a ciphertext c has four square roots modulo n.[1] This poses ambiguity during decryption. In order to work around this difficulty, one adds some distinguishing feature or redundancy to the message m before encryption. One possibility is to duplicate a predetermined number of bits at the least significant end of m. This reduces the message space somewhat, but is rarely a serious issue. Only one of the (four) square roots of the ciphertext c is expected to have the desired redundancy. If none or more than one square root possesses the redundancy, decryption fails. However, this is a very rare phenomenon and can be ignored for all practical purposes.
[1] More specifically, if an element c ∈ Z_n is a square modulo both p and q, then the number of square roots of c equals 1 if c = 0; it is 2 if either c ≡ 0 (mod p) or c ≡ 0 (mod q) but not both; and it is 4 if c ≢ 0 (mod p) and c ≢ 0 (mod q). If c is not a square modulo either p or q, then c does not possess a square root modulo n. These assertions can be readily proved using the Chinese remainder theorem.
Rabin decryption (Algorithm 5.7) involves computing square roots modulo n. Since n is composite, this is a very difficult problem (for the eavesdropper). But the knowledge of the prime factors p and q of n allows the recipient to decrypt.
Algorithm 5.7  Rabin decryption
Input: The Rabin private key (p, q) of the recipient and the ciphertext message c.
Output: The recovered plaintext message m, or failure.
Steps:
    Compute the square roots of c modulo n (using the knowledge of p and q).
    if (c has exactly one distinguished square root m mod n) { Return m. }
    else { Return “failure”. }
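For p ≡ q ≡ 3 (mod 4), the four square roots can be computed as in the following sketch (toy parameters; the function name is ours). A full implementation would then search these roots for the one carrying the agreed redundancy.

```python
def rabin_roots(c, p, q):
    # all square roots of c modulo n = pq, assuming p ≡ q ≡ 3 (mod 4)
    n = p * q
    rp = pow(c, (p + 1) // 4, p)   # a square root of c modulo p
    rq = pow(c, (q + 1) // 4, q)   # a square root of c modulo q
    yp, yq = pow(p, -1, q), pow(q, -1, p)
    # combine the ± choices modulo p and q by the CRT
    return {(sp * q * yq + sq * p * yp) % n
            for sp in (rp, (p - rp) % p)
            for sq in (rq, (q - rq) % q)}
```

For p and q not congruent to 3 modulo 4, a general modular square-root algorithm (such as Tonelli–Shanks) replaces the two `pow` calls.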
So far, we have encountered encryption algorithms that are deterministic in the sense that for a given public key of the recipient the same plaintext message encrypts to the same ciphertext message. In a probabilistic encryption algorithm, different calls of the encryption routine produce different ciphertext messages for the same plaintext message and public key.
The Goldwasser–Micali encryption algorithm is probabilistic and is based on the intractability of the quadratic residuosity problem (QRP) described in Exercise 4.2. If n is a composite integer and a an integer coprime to n, then a Jacobi symbol (a/n) = –1 implies that a is a quadratic non-residue modulo n. The converse does not hold, that is, one may have (a/n) = 1 even when a is a quadratic non-residue modulo n. For example, if n is the product of two distinct odd primes p and q, then a is a quadratic residue modulo n if and only if a is a quadratic residue modulo both p and q. However, if (a/p) = (a/q) = –1, we continue to have (a/n) = 1. There is no easy way to find out if a is a quadratic residue modulo n for an integer a with (a/n) = 1. If the factorization of n is available, the QRP is solvable in polynomial time. These observations lead to the design of the Goldwasser–Micali scheme.
The Goldwasser–Micali scheme works in the ring Z_n, where n is the product of two distinct sufficiently large primes. The integer a (resp. b) in Algorithm 5.8 can be found by randomly choosing elements of Z_p* (resp. Z_q*) and computing the Legendre symbol modulo p (resp. q). Under the assumption that quadratic non-residues are randomly located in Z_p* and Z_q*, a and b can be found after only a few trials. The integer x is a quadratic non-residue modulo n with (x/n) = 1.
Goldwasser–Micali encryption (Algorithm 5.9) is probabilistic, since its output is dependent on a sequence of random elements ai of Z_n*. It generates a tuple (c1, . . . , cr) of elements of Z_n* such that each (ci/n) = 1. If mi = 0, then ci is a quadratic residue modulo n, whereas if mi = 1, ci is a quadratic non-residue modulo n. Therefore, if the quadratic residuosity of ci modulo n can be computed, the bit mi can be determined. If one (for example, the recipient) knows the factorization of n or equivalently the prime factor p of n, one can perform decryption easily. An eavesdropper, on the other hand, must solve the QRP (or the IFP) in order to find out the bits m1, . . . , mr. This is how Goldwasser–Micali encryption derives its security.
Algorithm 5.8  Goldwasser–Micali key generation
Input: A bit length l.
Output: A random Goldwasser–Micali key pair.
Steps:
    Generate two (different) random primes p and q each of bit length l.
    n := pq.
    Find integers a and b with (a/p) = –1 and (b/q) = –1.
    Compute an integer x with x ≡ a (mod p) and x ≡ b (mod q). /* Use CRT */
    Return the pair (n, x) as the public key and the prime p as the private key.
Algorithm 5.9  Goldwasser–Micali encryption
Input: The Goldwasser–Micali public key (n, x) of the recipient and the plaintext message m = m1 . . . mr (a string of bits).
Output: The ciphertext message (c1, . . . , cr).
Steps:
    for i = 1, . . . , r {
        Choose a random ai ∈ Z_n*.
        ci := x^mi · ai^2 (mod n).
    }
    Return (c1, . . . , cr).
Since randomly chosen non-zero elements of Z_n are with high probability coprime to n, it is sufficient to draw ai from Z_n \ {0} and skip the check whether gcd(ai, n) = 1. In fact, if an ai with gcd(ai, n) > 1 is somehow located, this gcd equals a non-trivial factor of n, and the security of the scheme is broken.
The Goldwasser–Micali scheme has the drawback that the length of the ciphertext message is much bigger than that of the plaintext message. Thus, for example, for a 1024-bit modulus n and a message m of bit length 64, the output requires a huge 65,536-bit space. This phenomenon is called message expansion and can be a serious limitation in certain circumstances.
Goldwasser–Micali decryption (Algorithm 5.10) recovers the bits of the plaintext message by computing Legendre symbols modulo the prime divisor p of n. The correctness of this decryption algorithm is evident from the discussion immediately following Algorithm 5.9.
Algorithm 5.10  Goldwasser–Micali decryption
Input: The Goldwasser–Micali private key p of the recipient and the ciphertext message (c1, . . . , cr).
Output: The recovered plaintext message m = m1 . . . mr.
Steps:
    for i = 1, . . . , r {
        Compute the Legendre symbol (ci/p).
        if ((ci/p) = 1) { mi := 0. } else { mi := 1. }
    }
    Return m = m1 . . . mr.
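Algorithms 5.9 and 5.10 can be sketched as follows (toy parameters; function names are ours). The gcd check is retained for robustness, although, as noted above, it may be skipped.

```python
import math
import random

def gm_encrypt(bits, n, x):
    # bit 0 → a random square a²; bit 1 → x·a², a quadratic non-residue
    cipher = []
    for m in bits:
        a = random.randrange(1, n)
        while math.gcd(a, n) != 1:
            a = random.randrange(1, n)
        cipher.append(pow(x, m, n) * pow(a, 2, n) % n)
    return cipher

def gm_decrypt(cipher, p):
    # mi = 0 iff the Legendre symbol (ci/p) equals +1
    return [0 if pow(c, (p - 1) // 2, p) == 1 else 1 for c in cipher]
```

Each plaintext bit occupies an entire element of Z_n in the ciphertext, which is the message expansion discussed next.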
The Blum–Goldwasser algorithm is another probabilistic encryption algorithm and is better than the Goldwasser–Micali algorithm in the sense that in this case the message expansion is by only a constant number of bits irrespective of the length of the plaintext message. The Blum–Goldwasser scheme is based on the intractability of the SQRTP (modulo a composite integer).
As in the case of the encryption algorithms discussed so far, the Blum–Goldwasser algorithm works in the ring Z_n, where n = pq is the product of two distinct primes p and q. Now, we additionally demand p and q to be both congruent to 3 modulo 4.
Algorithm 5.11  Blum–Goldwasser key generation
Input: A bit length l.
Output: A random Blum–Goldwasser key pair.
Steps:
    Generate two (different) random primes p and q each of bit length l and each congruent to 3 mod 4.
    n := pq.
    Return n as the public key and the pair (p, q) as the private key.
Since p and q are two different primes, there exist integers u and v such that up + vq = 1. In order to speed up decryption, it is often expedient to store u and v along with p and q in the private key. Recall that the solution of the congruences x ≡ a (mod p) and x ≡ b (mod q) is given by x ≡ vqa + upb (mod n).
The Blum–Goldwasser encryption algorithm assumes that the input plaintext message m is in the form of a bit string, and breaks m into substrings of a fixed length t. A typical choice for t is t = ⌊lg lg n⌋, where n is the public key of the recipient. Write m = m1 . . . mr, where each mi is a bit string of length t. The ciphertext consists of r bit strings c1, . . . , cr, each of bit length t, and an element d ∈ Z_n.
Algorithm 5.12  Blum–Goldwasser encryption
Input: The Blum–Goldwasser public key n of the recipient and the plaintext message m = m1 . . . mr, where each mi is a bit string of length t.
Output: The ciphertext message (c1, . . . , cr, d), where each ci is a bit string of length t and d ∈ Z_n.
Steps:
    Choose a random element d ∈ Z_n*.
    d := d^2 (mod n).
    for i = 1, . . . , r {
        ci := mi XOR (the t least significant bits of d).
        d := d^2 (mod n).
    }
    Return (c1, . . . , cr, d).
Blum–Goldwasser encryption involves computation of r modular squares in Z_n and is quite fast (for example, faster than RSA encryption with a general encryption exponent). It makes sense to assume that the initial choice of d is from Z_n*, since finding a non-zero non-invertible element of Z_n is as difficult as factoring n.
For an intruder to determine the plaintext message m from the corresponding ciphertext message, the values of d inside the for loop are necessary. These can be obtained by taking repeated square roots modulo n. Since n is composite, this is a difficult problem. On the other hand, since the recipient knows the prime divisors p and q of n, taking square roots modulo n requires only polynomial-time effort.
Recall from Exercise 3.43 that a quadratic residue d ∈ Z_n* (where n is the public key of the recipient) has four distinct square roots of which exactly one is again a quadratic residue modulo n. This distinguished square root y of d satisfies the congruences y ≡ d^((p+1)/4) (mod p) and y ≡ d^((q+1)/4) (mod q). In the decryption Algorithm 5.13, we assume that d ∈ Z_n*.
Algorithm 5.13 assumes that each value of d is a quadratic residue modulo n. This can be verified by inserting in the for loop a check whether d^((p–1)/2) ≡ 1 (mod p) and d^((q–1)/2) ≡ 1 (mod q), before an attempt is made to compute the square root of d modulo n. If (c1, . . . , cr, d) is a valid ciphertext message, this condition necessarily holds, and there is no point wasting time checking obvious things. However, if there is a possibility that d is altered by an (active) adversary (or corrupted during transmission), one may insert this check. In that case, the routine should report failure when the square root of a quadratic non-residue modulo n is to be computed.
Algorithm 5.13  Blum–Goldwasser decryption
Input: The Blum–Goldwasser private key (p, q) of the recipient and the ciphertext message (c1, . . . , cr, d), where each ci is a bit string of length t and d ∈ Z_n.
Output: The recovered plaintext message m = m1 . . . mr, where each mi is a bit string of length t.
Steps:
    for i = r, r – 1, . . . , 1 {
        Replace d by its distinguished square root modulo n, computed from d^((p+1)/4) (mod p) and d^((q+1)/4) (mod q) using the CRT.
        mi := ci XOR (the t least significant bits of d).
    }
    Return m = m1 . . . mr.
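Algorithms 5.12 and 5.13 can be sketched as follows, assuming p ≡ q ≡ 3 (mod 4) (toy parameters; function names are ours):

```python
def crt2(a, p, b, q):
    # the unique x modulo pq with x ≡ a (mod p) and x ≡ b (mod q)
    return (a * q * pow(q, -1, p) + b * p * pow(p, -1, q)) % (p * q)

def bg_encrypt(blocks, n, t, seed):
    d = pow(seed, 2, n)                  # start the squaring chain at a quadratic residue
    cipher = []
    for m in blocks:                     # each block is an integer of t bits
        cipher.append(m ^ (d % (1 << t)))    # pad with the t least significant bits of d
        d = pow(d, 2, n)
    return cipher, d                     # d is one squaring past the last pad

def bg_decrypt(cipher, d, p, q, t):
    out = []
    for c in reversed(cipher):
        # step back along the chain: the distinguished (QR) square root of d
        d = crt2(pow(d % p, (p + 1) // 4, p), p,
                 pow(d % q, (q + 1) // 4, q), q)
        out.append(c ^ (d % (1 << t)))
    return out[::-1]
```

The chain values are all quadratic residues, so each backward step recovers exactly the value used as a pad during encryption.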
The ElGamal encryption algorithm works in a group G in which it is difficult to solve the Diffie–Hellman problem (DHP). Typical candidates for G include the multiplicative group F_q* of a finite field F_q (usually q is a prime or a power of 2), the (additive) group of points on an elliptic curve over a finite field and the (additive) group (called the Jacobian) of reduced divisors on a hyperelliptic curve over a finite field. Here we assume that G is multiplicatively written and has order n. It is not necessary for G to be cyclic, but we should have at our disposal an element g ∈ G with a suitably large (preferably prime) order k. We essentially work in the cyclic subgroup H of G generated by g (but using the arithmetic of G). For the ElGamal scheme, G (together with its representation), g, n and k are made public and can be shared by different entities on a network.
Generating a key pair for the ElGamal scheme (Algorithm 5.14) involves an exponentiation in G. In order to make the exponentiation efficient, the exponent (the private key) is often chosen to have a small number of 1 bits. However, if this number is too small, exhaustive search by an adversary may become feasible.
If the DLP can be solved in G, the private key d can be computed from the public key gd. This amounts to breaking a system based on this key pair. This is why we often say that the security of the ElGamal encryption scheme banks on the intractability of the DLP. But as we see shortly, the DHP is the more fundamental computational problem that dictates the security of ElGamal encryption.
Algorithm 5.14  ElGamal key generation
Input: G, g and k as defined above.
Output: A random ElGamal key pair.
Steps:
    Generate a random integer d, 2 ≤ d ≤ k – 1.
    Return g^d as the public key and d as the private key.
Given a message m ∈ G, the ElGamal encryption procedure (Algorithm 5.15) generates a pair (r, s) of elements of G as the ciphertext message and thus corresponds to message expansion by a factor of 2. Clearly, the sender has all the relevant information for computing (r, s). The need for using a different session key for each encryption is explained in Exercise 5.6.
Algorithm 5.15  ElGamal encryption
Input: (G, g, k and) the ElGamal public key g^d of the recipient and the plaintext message m ∈ G.
Output: The ciphertext message (r, s).
Steps:
    Generate a (random) session key d′, 2 ≤ d′ ≤ k – 1.
    r := g^d′.
    s := m·g^(dd′) = m·(g^d)^d′.
    Return (r, s).
Notice that ElGamal encryption uses two exponentiations in G to exponents which are O(k). Therefore, the running time of Algorithm 5.15 decreases if smaller values of k are selected. On the other hand, if k is too small, the square-root methods in H = 〈g〉 may become efficient (see Section 4.4.1). In practice, it is recommended that k be taken as a prime of length 160 bits or more.
ElGamal decryption involves an exponentiation in G to an exponent which is O(k). It is easy to verify that Algorithm 5.16 performs decryption correctly and that the recipient has the necessary information to carry out decryption.
Algorithm 5.16  ElGamal decryption
Input: (G, g, k and) the ElGamal private key d of the recipient and the ciphertext message (r, s).
Output: The recovered plaintext message m.
Steps:
    m := s·r^(–d) = s·r^(k–d).
An eavesdropper Carol knows the domain parameters G, g, k and n and also the recipient’s public key g^d. Determining the message m from a knowledge of the corresponding ciphertext (r, s) is then equivalent to computing the element g^(dd′). This implies that a (quick) solution of the DHP permits Carol to decrypt a ciphertext. If a (quick) solution of the DLP is available, then the element g^(dd′) can be computed fast. The reverse implication is, however, not clear: it may be easier to solve the DHP than the DLP, though no concrete evidence is available to corroborate this possibility.
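For the concrete choice G = Z_p* with g of prime order k, ElGamal encryption and decryption can be sketched as follows (toy parameters; function names are ours):

```python
import random

def elgamal_encrypt(m, p, g, k, pub):
    # pub = g^d is the recipient's public key; d' is the fresh session key
    dp = random.randrange(2, k)
    r = pow(g, dp, p)                  # r = g^{d'}
    s = m * pow(pub, dp, p) % p        # s = m * g^{d d'}
    return r, s

def elgamal_decrypt(r, s, p, d):
    # m = s * r^{-d}; pow with a negative exponent computes the modular inverse
    return s * pow(r, -d, p) % p
```

Because a fresh d′ is drawn every call, encrypting the same m twice yields different ciphertexts, as the probabilistic schemes above also do.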
The Chor–Rivest encryption algorithm is based on a variant of the subset sum problem. It selects a prime p and an integer h ≥ 2, uses a knapsack set A = {a0, . . . , ap–1} with 1 ≤ ai ≤ p^h – 2 for each i, and considers sums of the form s := ∊0·a0 + ∊1·a1 + · · · + ∊p–1·ap–1 (mod p^h – 1), with each ∊i ∈ {0, 1}. In order to construct the set A for which the h-fold sum s is uniquely determined by the binary vector (∊0, . . . , ∊p–1) of weight h (that is, with exactly h bits equal to 1), we take the help of the finite field F_(p^h). We represent F_(p^h) as F_p[X]/〈f(X)〉, where f(X) ∈ F_p[X] is irreducible of degree h and where x is the residue class of X in F_p[X]/〈f(X)〉. The parameters p and h must be so chosen that p^h – 1 is reasonably smooth, so that the integer factorization of p^h – 1 can be easily computed. This helps us in two ways. First, a generator g(x) of the multiplicative group F_(p^h)* can be made available quickly using Algorithm 3.25. Second, the Pohlig–Hellman method of Section 4.4.1 becomes efficient for computing discrete logarithms in F_(p^h)*. We can then take ai := ind_(g(x))(x + i), i = 0, 1, . . . , p – 1. If (∊0, . . . , ∊p–1) and (∊′0, . . . , ∊′p–1) are two binary vectors of weight h, then ∊0·a0 + · · · + ∊p–1·ap–1 ≡ ∊′0·a0 + · · · + ∊′p–1·ap–1 (mod p^h – 1) implies (x + 0)^∊0 · · · (x + p – 1)^∊p–1 = (x + 0)^∊′0 · · · (x + p – 1)^∊′p–1, that is, (X + 0)^∊0 · · · (X + p – 1)^∊p–1 ≡ (X + 0)^∊′0 · · · (X + p – 1)^∊′p–1 (mod f(X)), that is, ∊i = ∊′i for all i = 0, . . . , p – 1, since otherwise x would satisfy a non-zero polynomial of degree < h.
A randomly permuted version of a0, . . . , ap–1 shifted by a noise (that is, a random bias) d together with p and h constitute the public key of the Chor–Rivest scheme. The private key, on the other hand, comprises the polynomials f(X) and g(x), the permutation just mentioned and the noise d. Algorithm 5.17 elaborates the generation of such a key pair. The same values of p and h can be used by different entities on a network. So we assume that p and h are provided instead of generated by the recipient as a part of his public key. For brevity, we use the notation q := p^h.
Key generation may be a long process in the Chor–Rivest scheme depending on how difficult it is to compute all the indexes ind_(g(x))(x + i). Furthermore, the size of the public key is quite large, namely O(ph log p). Typically one may take p ≈ 200 and h ≈ 25. The original paper of Chor and Rivest [54] recommends the possibilities (197, 24), (211, 24), (243, 24) and (256, 25) for (p, h). Note that 243 = 3^5 and 256 = 2^8 are not primes, but the Chor–Rivest algorithm works even when p is a power of a prime. For the sake of simplicity, we here stick to the case that p is a prime.
Algorithm 5.17  Chor–Rivest key generation
Input: A prime p and an integer h ≥ 2 such that p^h – 1 is smooth.
Output: A Chor–Rivest key pair.
Steps:
    Choose an irreducible polynomial f(X) of degree h in F_p[X].
    Use the representation F_(p^h) = F_p[X]/〈f(X)〉.
    Choose a random generator g(x) of F_(p^h)*.
    Compute the indexes ai := ind_(g(x))(x + i) for i = 0, 1, . . . , p – 1.
    Select a random permutation π of {0, 1, . . . , p – 1}.
    Select a random noise d in the range 0 ≤ d ≤ q – 2.
    Compute αi := a_π(i) + d (mod q – 1) for i = 0, 1, . . . , p – 1.
    Return (α0, α1, . . . , αp–1) as the public key and (f, g, π, d) as the private key.
The Chor–Rivest encryption procedure (Algorithm 5.18) assumes that the input plaintext message is represented as a binary vector (m0, . . . , mp–1) of weight (that is, number of one-bits) equal to h. Since there are C(p, h) (the binomial coefficient) such binary vectors, arbitrary binary strings of bit length ⌊lg C(p, h)⌋ can be encoded into binary vectors of the above special form. See Chor and Rivest [54] for an algorithm that describes how such an encoding can be done. Chor–Rivest encryption is quite fast, since it computes only h integer additions modulo q – 1.
Algorithm 5.18  Chor–Rivest encryption
Input: The Chor–Rivest public key (α0, . . . , αp–1) (together with p and h) and the plaintext message (m0, . . . , mp–1) which is a binary vector of weight h.
Output: The ciphertext message c.
Steps:
    c := m0·α0 + m1·α1 + · · · + mp–1·αp–1 (mod q – 1).
    Return c.
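Chor–Rivest encryption itself is only a modular sum of the selected knapsack weights; a minimal sketch (toy values; the function name is ours):

```python
def cr_encrypt(m_bits, alpha, q):
    # c = Σ mi·αi (mod q - 1), summed over the h positions with mi = 1
    return sum(a for m, a in zip(m_bits, alpha) if m) % (q - 1)
```

All of the scheme's mathematical machinery lives in key generation and decryption; encryption stays this cheap.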
The Chor–Rivest decryption procedure (Algorithm 5.19) generates a monic polynomial v(X) ∈ F_p[X] of degree h, the h (distinct) roots of which give the non-zero bits mi in the original plaintext message.
In order to prove that the decryption correctly works, note that s = c – hd ≡ m0·a_π(0) + · · · + mp–1·a_π(p–1) (mod q – 1), so that g(X)^s ≡ (X + π(0))^m0 · · · (X + π(p – 1))^mp–1 (mod f(X)). The polynomial u(X) is computed as one of degree < h. Adding f(X) to u(X) gives a monic polynomial v(X) of degree h, which is congruent modulo f(X) to (X + π(0))^m0 · · · (X + π(p – 1))^mp–1; since both polynomials are monic of degree h, they are actually equal. The roots of v(X) can be obtained either by a root finding algorithm or by trial divisions of v(X) by X + i, i = 0, 1, . . . , p – 1. Applying the inverse of π on these roots then reconstructs the plaintext message.
Algorithm 5.19  Chor–Rivest decryption
Input: The Chor–Rivest private key (f, g, π, d) (together with p and h) and the ciphertext message c.
Output: The recovered plaintext message (m0, . . . , mp–1) which is a binary vector of weight h.
Steps:
    s := c – hd (mod q – 1).
    u(X) := g(X)^s (mod f(X)).
    v(X) := f(X) + u(X).
    Factorize v(X) as v(X) = (X + i1) · · · (X + ih).
    For i = 0, 1, . . . , p – 1, set mi := 1 if π(i) ∈ {i1, . . . , ih}, and mi := 0 otherwise.
    Return (m0, . . . , mp–1).
An eavesdropper sees only the sum c = m0·α0 + · · · + mp–1·αp–1 (mod q – 1) of the (known) knapsack weights α0, . . . , αp–1. In order to recover m0, . . . , mp–1, she should solve the SSP. By choosing p and h carefully, the density of the knapsack set can be adjusted to be high, that is, larger than what the cryptanalytic routines described in Section 4.8 can handle. Thus, the Chor–Rivest scheme is assumed to be secure. However, as discussed in Chor and Rivest [54], the security of the system breaks down when certain partial information on the private key is available.
XTR, a phonetic abbreviation of efficient and compact subgroup trace representation, is designed by Arjen Lenstra and Eric Verheul as an attractive alternative to RSA (and similar cryptosystems including the ElGamal scheme over finite fields) and elliptic curve cryptosystems (ECC). The attractiveness of XTR arises from the following facts:
XTR runs (about three times) faster than RSA or ECC.
XTR has shorter keys (comparable with ECC).
The security of XTR is based on the DLP/DHP over finite fields of sufficiently big sizes and not on a new allegedly difficult computational problem.
The parameter and key generation for XTR is orders of magnitude faster than that for RSA/ECC.
XTR, though not a fundamental breakthrough, deserves treatment in this chapter. The working of XTR is somewhat involved and we plan to present only a conceptual description of the algorithm, hiding the mathematical details.
XTR considers the following tower of field extensions:

    F_p ⊂ F_(p^2) ⊂ F_(p^6),

where p ≡ 2 (mod 3) is a prime, sufficiently large so that computing discrete logs in F_(p^6)* using known algorithms is infeasible. We have p^6 – 1 = (p – 1)(p + 1)(p^2 – p + 1)(p^2 + p + 1). Let q be a prime divisor of p^2 – p + 1 of bit length 160 or more. There is a unique subgroup G of F_(p^6)* with #G = q. G is called the XTR (sub)group, whereas the entire group F_(p^6)* is called the XTR supergroup. The XTR group G is cyclic (Lemma 2.1, p 27). Let g be a generator of G, that is, G = 〈g〉 = {1, g, g^2, . . . , g^(q–1)}.
The working of XTR is based on the discrete log problem in G. Since p^2 – p + 1 and hence q are relatively prime to the orders of the multiplicative groups of all proper subfields of F_(p^6), computing discrete logs in G is (seemingly) as difficult as that in F_(p^6)*, that is, one gets the same level of security by the use of G instead of the full XTR supergroup.
The main technical innovation of XTR is the proposal of a compact representation of the elements of G in place of the obvious representation using ⌈6 lg p⌉ bits inherited from that of F_(p^6). This is precisely where the intermediate field F_(p^2) comes into the picture. We require a map G → F_(p^2), so that we can represent elements of G by those of F_(p^2). This map offers two benefits. First, the elements of G can now be represented using ⌈2 lg p⌉ bits leading to a three-fold reduction in the key size. Second, the arithmetic of F_(p^2) can be exploited to implement the arithmetic in G, thereby improving the efficiency of encryption and decryption routines (compared to those over the full XTR supergroup).
The map G → F_(p^2) uses the traces of elements of F_(p^6) over F_(p^2) (Definition 2.59). In this section, we use the shorthand notation Tr to stand for Tr_(F_(p^6)/F_(p^2)). The conjugates of an element h ∈ F_(p^6) over F_(p^2) are h, h^(p^2), h^(p^4) and so

    Tr(h) = h + h^(p^2) + h^(p^4).
Let us now specialize to h = g^n ∈ G. Since p^2 ≡ p – 1 (mod p^2 – p + 1) and p^4 ≡ –p (mod p^2 – p + 1), the conjugates of h are g^n, g^((p–1)n), g^(–pn). Thus, Tr(g^n) = g^n + g^((p–1)n) + g^(–pn). Moreover,

    g^n · g^((p–1)n) · g^(–pn) = 1  and  g^n·g^((p–1)n) + g^n·g^(–pn) + g^((p–1)n)·g^(–pn) = Tr(g^n)^p,

so the minimal polynomial of h = g^n over F_(p^2) is

    X^3 – Tr(g^n)·X^2 + Tr(g^n)^p·X – 1.
This minimal polynomial is determined uniquely by Tr(g^n) and so we can represent g^n by Tr(g^n). Note, however, that this representation is not unique, that is, the map G → F_(p^2), g^n ↦ Tr(g^n), is not injective. More precisely, the only elements of G that map to Tr(g^n) are the conjugates g^n, g^((p–1)n), g^(–pn) of g^n. This is often not a serious problem, as we see below.
In order to complete the description of the implementation of the arithmetic of the group G, we need to address two further issues. This is necessary, since the trace representation G → F_(p^2) defined above is not a homomorphism of groups. First, we specify how one can implement the arithmetic of F_(p^2). Since p ≡ 2 (mod 3), X^2 + X + 1 is irreducible over F_p. If α ∈ F_(p^2) is a root of X^2 + X + 1, we have the standard representation F_(p^2) = {y0 + y1·α : y0, y1 ∈ F_p}. That is, we can represent an element of F_(p^2) by a pair of elements of F_p. Since 1 + α + α^2 = 0, we have y0 + y1·α = (–α – α^2)y0 + y1·α = (y1 – y0)α + (–y0)α^2. This leads to the non-standard representation

    F_(p^2) = {x1·α + x2·α^2 : x1, x2 ∈ F_p}.

Since p ≡ 2 (mod 3) and α^3 = 1 + (α – 1)(α^2 + α + 1) = 1, the F_p-basis {α, α^2} of F_(p^2) is the same as the normal basis {α, α^p}. Under this basis, the basic arithmetic operations in F_(p^2) can be implemented using only a few multiplications (and some additions/subtractions) in F_p, as described in Table 5.1. Here, the operands are x = x1α + x2α^2, y = y1α + y2α^2 and z = z1α + z2α^2.
| Operation | Number of multiplications |
| xp | 0 (since xp = x2α + x1α2.) |
| x2 | 2 (since x2 = x2(x2 – 2x1)α + x1(x1 – 2x2)α2.) |
| xy | 3 (since xy = (x2y2–x1y2–x2y1)α + (x1y1–x1y2–x2y1)α2, that is, it suffices to compute x1y1, x2y2, (x1 + x2)(y1 + y2).) |
| xz – yzp | 4 (since xz – yzp = (z1(y1 – x2 – y2) + z2(x2 – x1 + y2))α + (z1(x1 – x2 + y1) + z2(y2 – x1 – y1))α2.) |
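The formulas of Table 5.1 can be checked against a straightforward implementation in the standard basis {1, α} with α^2 = –1 – α; a sketch (the prime and the operands below are arbitrary; function names are ours):

```python
# Elements in the basis {alpha, alpha^2}: x = x1*alpha + x2*alpha^2 as a pair (x1, x2).

def frob(x, p):
    return (x[1], x[0])                       # x^p just swaps the coordinates

def sqr(x, p):
    x1, x2 = x
    return (x2 * (x2 - 2 * x1) % p, x1 * (x1 - 2 * x2) % p)

def mul(x, y, p):
    x1, x2 = x
    y1, y2 = y
    return ((x2 * y2 - x1 * y2 - x2 * y1) % p,
            (x1 * y1 - x1 * y2 - x2 * y1) % p)

# Reference arithmetic in the standard basis {1, alpha}, using alpha^2 = -1 - alpha.
def to_std(x, p):
    # x1*alpha + x2*alpha^2 = -x2 + (x1 - x2)*alpha
    return (-x[1] % p, (x[0] - x[1]) % p)

def from_std(c, p):
    return ((c[1] - c[0]) % p, -c[0] % p)

def mul_std(a, b, p):
    # (a0 + a1*alpha)(b0 + b1*alpha) with alpha^2 = -1 - alpha
    return ((a[0] * b[0] - a[1] * b[1]) % p,
            (a[0] * b[1] + a[1] * b[0] - a[1] * b[1]) % p)
```

Multiplication in the {α, α^2} basis indeed needs only the three F_p-products x1y1, x2y2, (x1 + x2)(y1 + y2), as the table claims.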
Now, we explain how arithmetic operations in G translate to those in F_(p^2) under the representation of g^n by Tr(g^n). To start with, we show how the knowledge of Tr(h) and n allows one to compute Tr(h^n) for h ∈ G. This corresponds to an exponentiation in G. For c ∈ F_(p^2), define the polynomial

    Fc(X) := X^3 – c·X^2 + c^p·X – 1 ∈ F_(p^2)[X],

and let h1, h2, h3 be the three roots (not necessarily distinct) of Fc(X). For n ∈ Z, we use the notation

    c_n := h1^n + h2^n + h3^n.

Putting c = Tr(g) yields c_n = Tr(g^n), or, more generally, for c = Tr(g^k) we have c_n = Tr(g^(kn)). Algorithm 5.20 computes c_n given c (for example, Tr(g^k)) and n (typically 1 ≤ n ≤ q – 1). The correctness of the algorithm is based on the following identities, the derivations of which are left to the reader (alternatively, see Lenstra and Verheul [170]).
Equation 5.1

    c_0 = 3.

Equation 5.2

    c_1 = c.

Equation 5.3

    c_2 = c^2 – 2c^p.

Equation 5.4

    c_(–n) = c_(np) = (c_n)^p.

Equation 5.5

    c_(2n) = (c_n)^2 – 2(c_n)^p.

Equation 5.6

    c_(n+2) = c·c_(n+1) – c^p·c_n + c_(n–1).

Equation 5.7

    c_(2n–1) = c_(n–1)·c_n – c^p·(c_n)^p + (c_(n+1))^p.

Equation 5.8

    c_(2n+1) = c_(n+1)·c_n – c·(c_n)^p + (c_(n–1))^p.
Algorithm 5.20  Computing c_n from c and n
Input: c ∈ F_(p^2) and an integer n.
Output: c_n ∈ F_(p^2).
Steps:
    if (n < 0) { n := –n. c := c^p. } /* Equation 5.4 */
    if (n = 0) { Return 3. }
    /* Initialize */
    k := 1. (u, v, w) := (c_0, c_1, c_2) = (3, c, c^2 – 2c^p).
    Scan the bits of n from the second most significant one down to the least significant one:
        if the bit is 0, update (u, v, w) := (c_(2k–1), c_(2k), c_(2k+1)) and k := 2k;
        if the bit is 1, update (u, v, w) := (c_(2k), c_(2k+1), c_(2k+2)) and k := 2k + 1;
        where the new values are computed from (u, v, w) = (c_(k–1), c_k, c_(k+1)) using Equations 5.5–5.8.
    Return v.
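Such an exponentiation can be sketched as a ladder on the triple (c_(k–1), c_k, c_(k+1)), using the Table 5.1 formulas together with the standard Lenstra–Verheul recurrences c_(2n) = c_n^2 – 2c_n^p, c_(2n–1) = c_(n–1)·c_n – c^p·c_n^p + c_(n+1)^p and c_(2n+1) = c_(n+1)·c_n – c·c_n^p + c_(n–1)^p (toy prime; names ours; n ≥ 1 only):

```python
# Arithmetic in F_{p^2} in the basis {alpha, alpha^2}; elements are pairs (x1, x2).
def frob(x, p):                       # x^p: swap the two coordinates
    return (x[1], x[0])

def add2(x, y, p):
    return ((x[0] + y[0]) % p, (x[1] + y[1]) % p)

def sub2(x, y, p):
    return ((x[0] - y[0]) % p, (x[1] - y[1]) % p)

def sqr(x, p):
    x1, x2 = x
    return (x2 * (x2 - 2 * x1) % p, x1 * (x1 - 2 * x2) % p)

def mul(x, y, p):
    x1, x2 = x
    y1, y2 = y
    return ((x2 * y2 - x1 * y2 - x2 * y1) % p,
            (x1 * y1 - x1 * y2 - x2 * y1) % p)

def dbl(x, p):                        # c_{2n} = c_n^2 - 2*c_n^p, applied to x = c_n
    xp = frob(x, p)
    return sub2(sqr(x, p), add2(xp, xp, p), p)

def xtr_cn(c, n, p):
    # left-to-right ladder maintaining (c_{k-1}, c_k, c_{k+1}), for n >= 1
    three = (-3 % p, -3 % p)          # the integer 3, since 1 = -alpha - alpha^2
    cp = frob(c, p)
    t = (three, c, dbl(c, p))         # (c_0, c_1, c_2), k = 1
    for b in bin(n)[3:]:              # bits of n below the leading one
        cm, ck, cn1 = t               # c_{k-1}, c_k, c_{k+1}
        lo = add2(sub2(mul(cm, ck, p), mul(cp, frob(ck, p), p), p), frob(cn1, p), p)  # c_{2k-1}
        hi = add2(sub2(mul(cn1, ck, p), mul(c, frob(ck, p), p), p), frob(cm, p), p)   # c_{2k+1}
        if b == '0':
            t = (lo, dbl(ck, p), hi)              # k := 2k
        else:
            t = (dbl(ck, p), hi, dbl(cn1, p))     # k := 2k + 1
    return t[1]
```

Each ladder step costs a constant number of F_p-multiplications, which is the source of the 8 lg n operation count quoted below.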
A careful analysis suggests that the computation of c_n from c requires 8 lg n multiplications in F_p. An exponentiation in F_(p^6), on the other hand, requires an expected number of 23.4 lg n multiplications in F_p (assuming that, in F_(p^6), the time for squaring is 80 per cent of that of multiplication). Thus, the XTR representation provides a speed-up of about 3.
The domain parameters for an XTR cryptosystem include primes p and q satisfying the following requirements:
|q| ≥ 160 (where |a| = ⌈lg a⌉ denotes the bit size of a positive integer a).
|p6| ≥ 1024.
p ≡ 2 (mod 3).
q|(p2 – p + 1).
We require a generator g of the XTR group G. Since we plan to replace working in G by working in F_(p^2), the element g is not needed explicitly. The trace Tr(g) suffices for our purpose. Lenstra and Verheul [170, 172] describe several methods for obtaining the domain parameters p, q, Tr(g). We describe here the naivest strategies. Algorithm 5.21 outputs the primes p, q with |p| = lp and |q| = lq for some given lengths lp and lq.
Algorithm 5.21  Generating the XTR primes p and q
Input: Bit lengths lp and lq.
Output: Primes p and q with |p| = lp, |q| = lq, p ≡ 2 (mod 3) and q | (p^2 – p + 1).
Steps:
    Randomly choose a prime q with |q| = lq and q ≡ 1 (mod 3), and compute a root r of X^2 – X + 1 modulo q.
    Randomly choose primes p of the form p = r + kq with |p| = lp, until one with p ≡ 2 (mod 3) is found.
    Return (p, q).
Determination of Tr(g) for a suitable g requires some mathematics. First, notice that if the polynomial Fc(X) is irreducible (over F_(p^2)) for some c ∈ F_(p^2), then c = Tr(h) for some h ∈ F_(p^6)* with ord h | (p^2 – p + 1). Moreover, c_((p^2–p+1)/q), if not equal to 3, is the trace of an element (for example, h^((p^2–p+1)/q)) of order q. Thus, we may take Tr(g) = c_((p^2–p+1)/q). Although we do not need it explicitly, the corresponding g can be taken to be any root of the polynomial F_(Tr(g))(X).
What remains to explain is how one can find an irreducible Fc(X). A randomized algorithm results from the fact that for a randomly chosen c ∈ F_(p^2) the polynomial Fc(X) is irreducible with probability ≈ 1/3.
Once the domain parameters of an XTR system are set, the recipient chooses a random d, 2 ≤ d ≤ q – 1, and computes Tr(g^d) using Algorithm 5.20. The tuple (p, q, Tr(g), Tr(g^d)) is the public key and d the private key of the recipient.
XTR encryption (Algorithm 5.22) is very similar to ElGamal encryption. The only difference is that now we work in F_(p^2) under the trace representation of the elements of G, that is, one uses Algorithm 5.20 for computing exponentiations in G.
Algorithm 5.22  XTR encryption
Input: The public key (p, q, Tr(g), Tr(g^d)) of the recipient and the message m ∈ F_(p^2).
Output: The ciphertext message (r, s).
Steps:
    Generate a random session key d′, 2 ≤ d′ ≤ q – 1.
    Compute r := Tr(g^d′) using Algorithm 5.20 with c := Tr(g) and n := d′.
    Compute Tr(g^(dd′)) using Algorithm 5.20 with c := Tr(g^d) and n := d′.
    Set s := m·Tr(g^(dd′)).
    Return (r, s).
XTR decryption (Algorithm 5.23) is again analogous to ElGamal decryption except that we have to incorporate the XTR representation of elements of G.
|
Input: The private key d of the recipient and the ciphertext Output: The recovered plaintext message m. Steps: Compute Tr(gdd′) using Algorithm 5.20 with c := r = Tr(gd′) and n := d. Set |
Note that XTR encryption and decryption use Algorithm 5.20 for performing exponentiations. Therefore, these routines run about three times faster than the corresponding ElGamal routines based on the standard
arithmetic.
Hoffstein et al. [130] have proposed the NTRU encryption scheme in which encryption involves a mixing system using the polynomial algebra
and reductions modulo two relatively prime integers α and β. The decryption involves an unmixing system and can be proved to be correct with high probability. The security of this scheme rests on the interaction of the mixing system with the independence of the reductions modulo α and β. Attacks against NTRU based on the determination of short vectors in certain lattices are known. However, suitable choices of the parameters make NTRU resistant to these attacks. The most attractive feature of the NTRU scheme is that its encryption and decryption are much faster than those of other known schemes (like RSA, ECC and even XTR).
NTRU parameters include three positive integers n, α and β with gcd(α, β) = 1 and with β considerably larger than α (see Table 5.2). Consider the polynomial algebra
. An element of
is represented as a polynomial f = f0 + f1X + · · · + fn–1Xn–1 or, equivalently, as a vector (f0, f1, . . . , fn–1) of the coefficients. Note that Xn – 1 is not irreducible in
(for n ≥ 2) and so R is not a field, but that does not matter for the NTRU scheme. For two polynomials f, g of degree < n and with integer coefficients, we denote by f g the product of f and g in
, whereas f and g, as elements of R, multiply to f ⊛ g = h, where hk = Σi+j≡k (mod n) figj for k = 0, 1, . . . , n – 1.
| Security | n | α | β | νf | νg | νu |
| short-term | 107 | 3 | 64 | 15 | 12 | 5 |
| moderate | 167 | 3 | 128 | 61 | 20 | 18 |
| standard[*] | 263 | 3 | 128 | 50 | 24 | 16 |
| high | 503 | 3 | 256 | 216 | 72 | 55 |
[*] Assumed to be equivalent to 1024-bit RSA
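The cyclic convolution ⊛ defined above is straightforward to implement; a toy sketch (illustrative, not part of the text's algorithms):

```python
def conv(f, g, n, mod=None):
    """Cyclic convolution f * g in Z[X]/(X^n - 1): h_k = sum of f_i*g_j over i+j = k (mod n)."""
    h = [0] * n
    for i in range(n):
        for j in range(n):
            h[(i + j) % n] += f[i] * g[j]
    if mod is not None:
        h = [c % mod for c in h]
    return h

# (1 + X)(X + X^2) = X + X^2 + X^2 + X^3 = 1 + X + 2X^2 in Z[X]/(X^3 - 1)
print(conv([1, 1, 0], [0, 1, 1], 3))  # [1, 1, 2]
```

Passing `mod` gives the same product with coefficients reduced, as needed for the reductions modulo α and β below.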
NTRU works with polynomials having small coefficients. More specifically, we define the following subsets of R. The message space (that is, the set of plaintext messages)
consists of all polynomials of R with coefficients reduced modulo α. Unlike our representation of
so far, we use the integers between –α/2 and +α/2 to represent the coefficients of polynomials in
, that is,
ℳ = {m0 + m1X + · · · + mn–1Xn–1 ∈ R | mi ∈ {–⌊α/2⌋, . . . , ⌊α/2⌋} for all i}.
For ν1,
, we also define the subset
L(ν1, ν2) = {f ∈ R | f has ν1 coefficients equal to 1, ν2 coefficients equal to –1, and all other coefficients equal to 0}
of R. For suitably chosen parameters νf, νg and νu (see Table 5.2), we use the special notations:
Lf := L(νf, νf – 1),  Lg := L(νg, νg)  and  Lu := L(νu, νu).
With these notations we are now ready to describe the NTRU key generation routine. The subsets
,
,
and
are assumed to be public knowledge (along with the parameters n, α and β).
|
Input: n, α, β and Output: A random NTRU key pair. Steps: Choose /* f must be invertible modulo both α and β */ Compute fα and fβ satisfying fα ⊛ f ≡ 1 (mod α) and fβ ⊛ f ≡ 1 (mod β). h := fβ ⊛ g (mod β). Return h as the public key and f (along with fα) as the private key. |
The polynomial fα can be computed from f during decryption. However, for the sake of efficiency, it is recommended that fα be stored along with f.
The integers α and β are either small primes or small powers of small primes (Table 5.2). The most time-consuming step in the NTRU key generation procedure is the computation of the inverses fα and fβ. Suppose we want to compute the inverse of f in
, where p is a small prime and e is a small exponent (we may have e = 1). We first compute f(X)–1 in the ring
. Since p is a prime,
is a field, that is,
is a Euclidean domain (Exercise 2.31). We compute the extended Euclidean gcd of f(X) with Xn – 1. If f(X) and Xn – 1 are not coprime modulo p, then f(X) is not invertible in
, else the extended Euclidean computation yields polynomials s(X) and t(X) with s(X)f(X) + t(X)(Xn – 1) ≡ 1 (mod p),
and s(X) is the inverse of f(X) in
. A randomly chosen f(X) with gcd(f(1), p) = 1 has high probability of being invertible modulo p. Recall that we have chosen
, so that f(1) = 1.
If e = 1, we have already computed the desired inverse of f(X). If e > 1, we have to lift the inverse fp(X) = u(X) of f(X) modulo p to the inverse fp2 (X) of f(X) modulo p2, and then to the inverse fp3 (X) of f(X) modulo p3, and so on. Eventually, we get the inverse fpe (X) of f(X) modulo pe. Here we describe the generic lift procedure of fpk (X) to fpk+1 (X). In the ring
, we have fpk ⊛ f ≡ 1 (mod pk). We can write fpk+1 (X) = fpk (X) + pka(X) for some
. Substituting this value in fpk+1 ⊛ f ≡ 1 (mod pk+1) gives the unknown polynomial a(X) as
a(X) ≡ s(X) ⊛ ((1 – fpk(X) ⊛ f(X))/pk) (mod p),
where s(X) = fp(X) is the inverse of f modulo p.
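The lifting step can be illustrated with toy parameters. In the sketch below, an exhaustive search stands in for the extended Euclidean inverse modulo p (feasible only because n and p are tiny), and the lift computes a = s ⊛ ((1 – fpk ⊛ f)/pk) (mod p); all concrete values are illustrative:

```python
from itertools import product

def conv(f, g, n, mod):
    # cyclic convolution in Z_mod[X]/(X^n - 1)
    h = [0] * n
    for i in range(n):
        for j in range(n):
            h[(i + j) % n] = (h[(i + j) % n] + f[i] * g[j]) % mod
    return h

def invert_mod_p(f, n, p):
    # toy stand-in for the extended Euclidean inverse in Z_p[X]/(X^n - 1)
    one = [1] + [0] * (n - 1)
    for cand in product(range(p), repeat=n):
        if conv(f, list(cand), n, p) == one:
            return list(cand)
    return None  # f is not invertible modulo p

def lift(f, s, fpk, pk, p, n):
    # given fpk with fpk * f = 1 (mod p^k) and s = inverse of f mod p,
    # return f_{p^{k+1}} = fpk + p^k a, where a = s * ((1 - fpk*f)/p^k) (mod p)
    one = [1] + [0] * (n - 1)
    t = conv(f, fpk, n, pk * p)                       # = 1 + p^k b (mod p^{k+1})
    b = [((t[i] - one[i]) // pk) % p for i in range(n)]
    a = conv(s, [(-bi) % p for bi in b], n, p)
    return [(fpk[i] + pk * a[i]) % (pk * p) for i in range(n)]

n, p = 5, 3
f = [1, 1, 2, 0, 0]              # 1 + X - X^2, invertible modulo 3
s = invert_mod_p(f, n, p)        # inverse modulo 3
f9 = lift(f, s, s, p, p, n)      # lifted inverse modulo 9
print(conv(f, f9, n, p * p))     # [1, 0, 0, 0, 0]
```

Repeating the lift gives the inverse modulo p^3, p^4, and so on up to p^e.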
It is often recommended that f(X) be taken of the form
for some
. In this case, fα(X) = 1 is trivially available and need not be computed as mentioned above. Such a choice of f also speeds up NTRU decryption (see Algorithm 5.26) by reducing the number of polynomial multiplications from two to one. The inverse fβ, however, has to be computed (but need not be stored).
For NTRU encryption (Algorithm 5.25), the message is encoded to a polynomial in
. The costliest step in this algorithm is computing the product u ⊛ h, which can be done in time O(n2). Asymptotically better running time (O(n log n)) is achievable in Algorithm 5.25 if one uses faster polynomial multiplication routines (like those based on fast Fourier transforms). However, for the cryptographic range of values of n, straightforward quadratic multiplication gives better performance. Most other encryption schemes (like RSA) take time O(n3), where n is the size of the modulus. This explains why NTRU encryption is much faster than conventional encryption routines.
|
Input: (n, α, β and) the NTRU public key h of the recipient and the plaintext message Output: The ciphertext c which is a polynomial in R, reduced modulo β. Steps: Randomly select c := αu ⊛ h + m (mod β). |
NTRU decryption (Algorithm 5.26) involves two multiplications in R and runs in time O(n2). In order to prove the correctness of Algorithm 5.26, one needs to verify that v ≡ αu ⊛ g + f ⊛ m (mod β). With an appropriate choice of the parameters, it can be ensured that almost always the polynomial
has all its coefficients lying between –β/2 and +β/2. In that case, we have the equality v = αu ⊛ g + f ⊛ m in R. Multiplication of v by fα and reduction modulo α now clearly retrieves m.
|
Input: The NTRU private key f (and fα) of the recipient and the ciphertext message c. Output: The recovered plaintext message Steps: v := f ⊛ c (mod β). /* The coefficients of v are chosen to lie between –β/2 and +β/2 */ m := fα ⊛ v (mod α). |
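Algorithms 5.24–5.26 can be exercised end to end with tiny (utterly insecure) parameters. The sketch below assumes f of the special form 1 + αf1 (so that fα = 1) and computes fβ by brute force modulo 2 followed by Hensel-style doubling of the modulus; all concrete values are illustrative:

```python
from itertools import product

n, alpha, beta = 7, 3, 32            # toy parameters; gcd(alpha, beta) = 1
one = [1] + [0] * (n - 1)

def conv(f, g, mod):
    # cyclic convolution in Z_mod[X]/(X^n - 1)
    h = [0] * n
    for i in range(n):
        for j in range(n):
            h[(i + j) % n] = (h[(i + j) % n] + f[i] * g[j]) % mod
    return h

def center(f, mod):
    # represent coefficients in the centred range (-mod/2, mod/2]
    return [c - mod if c > mod // 2 else c for c in [x % mod for x in f]]

def invert(f, q):
    # inverse of f in Z_q[X]/(X^n - 1) for q a power of 2:
    # brute-force the inverse modulo 2, then repeatedly square the modulus
    s = next(list(c) for c in product(range(2), repeat=n)
             if conv(f, list(c), 2) == one)
    k = 2
    while k < q:
        k = k * k                    # if f*s = 1 (mod k), then s*(2 - f*s)
        s = conv(s, [(2 * one[i] - t) % k
                     for i, t in enumerate(conv(f, s, k))], k)
    return [c % q for c in s]

# key generation: f = 1 + alpha*f1 (so f_alpha = 1), small g
f1 = [0, 1, 1, 0, 0, 0, 0]                        # X + X^2
f = [one[i] + alpha * f1[i] for i in range(n)]    # 1 + 3X + 3X^2
g = [0, 1, 0, 0, -1, 0, 0]                        # X - X^4
h = conv(invert(f, beta), g, beta)                # public key h = f_beta * g

# encryption: c = alpha*(u * h) + m (mod beta)
m = [1, 0, -1, 1, 0, 0, 0]
u = [1, 0, -1, 0, 0, 0, 0]
c = [(alpha * x + m[i]) % beta for i, x in enumerate(conv(u, h, beta))]

# decryption: v = centre(f * c mod beta); since f_alpha = 1, m = centre(v mod alpha)
v = center(conv(f, c, beta), beta)
print(center(v, alpha) == m)                      # True
```

With these tiny polynomials the coefficients of αu ⊛ g + f ⊛ m stay well inside (–β/2, +β/2], so no gap failure can occur.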
If f is chosen to be of the special form f = 1 + αf1 (for some polynomial f1), then v = αu ⊛ g + αf1 ⊛ m + m. Thus, reduction of v modulo α straightaway gives m, that is, there is no need to multiply v by fα. Also fα (having the trivial value 1) need not be stored in the private key. To sum up, taking f to be of the above special form increases the efficiency of the NTRU scheme without (seemingly) affecting its security. But now f is no longer an element of
and some care should be taken to choose suitable values of f.
NTRU decryption occasionally fails, usually when v is not properly centred (around 0). In that case, representing v as a polynomial with coefficients in the range –β/2 + x to +β/2 + x for a small positive or negative value of x may result in correct decryption. If, on the other hand, no value of x works, NTRU decryption cannot recover m easily and is said to suffer from a gap failure. For suitable parameter values, gap failures are very unlikely and can be ignored for all practical purposes.
Now, let us see how the NTRU system can be broken. In order to find out the private key f from the public key h = fβ ⊛ g, one may keep on searching for
exhaustively, until
. Alternatively, one may try all
, until
. In a similar manner, m can be retrieved from c by trying all
, until
. Clearly, such an attack takes expected time proportional to the size of
or
or
.
A baby-step–giant-step strategy reduces the running times to the square roots of the sizes of the above sets. For example, suppose we want to compute f from h. We split f = f1 + f2 into two nearly equal pieces f1 and f2. If n is odd, f1 may contain the (n + 1)/2 most significant terms and f2 the (n – 1)/2 least significant terms of f. Now, we compute (f2, –f2 ⊛ h (mod β)) for all possibilities of f2 and store the pairs sorted by the second component. Next, for each possibility of f1 (baby step) we compute f1 ⊛ h (mod β) and see if there is any f2 (giant step) for which f1 ⊛ h (mod β) and –f2 ⊛ h (mod β) have nearly equal values. If a matching pair (f1, f2) is located, we take f = f1 + f2. A similar method works for guessing m from c.
It is necessary to take the sets
,
and
big enough, so that exhaustive or square root attacks are not feasible. Typically, choosing the sizes of these sets to be at least 2^160 is deemed sufficiently secure.
Another relevant attack is discussed in Exercise 5.11. By far, the most sophisticated attack on the NTRU encryption scheme is based on finding short vectors in a lattice. We describe this attack in connection with the computation of the private key f from a knowledge of the public key h. Let L denote the lattice in
generated by the rows of the 2n × 2n matrix:
| λIn   H  |
| 0    βIn |
Here, In denotes the n × n identity matrix and H the n × n circulant matrix whose i-th row (i = 1, . . . , n) is the coefficient vector of Xi–1 ⊛ h,
where h = h0 + h1X + · · · + hn–1Xn–1 = (h0, h1, . . . , hn–1) and where λ is a parameter whose choice is discussed below. Since h = g ⊛ f–1 (mod β), multiplying the i-th row by fi–1 (i = 1, . . . , n) and adding, we conclude that the vector v := (λf0, λf1, . . . , λfn–1, g0, g1, . . . , gn–1) is in L. By tuning the value of λ, the attacker maximizes the chance for v to be a short vector in L. However, if the system parameters are appropriately selected, lattice reduction algorithms become rather ineffective in finding v. Heuristic evidence suggests that this attack runs in time exponential in n.
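The membership claim for v can be checked numerically. The sketch below builds the basis rows for toy values (λ = 1, an arbitrary illustrative h, and g taken as f ⊛ h reduced modulo β, so g is not small here; only lattice membership is being demonstrated):

```python
n, beta, lam = 7, 32, 1

def conv(f, g):
    # cyclic convolution over Z in Z[X]/(X^n - 1)
    h = [0] * n
    for i in range(n):
        for j in range(n):
            h[(i + j) % n] += f[i] * g[j]
    return h

# basis rows: (lam*e_i | X^{i-1} * h) for i = 1..n, then (0 | beta*e_i)
h = [5, 1, 30, 0, 7, 2, 11]          # an illustrative public key
rows = []
for i in range(n):
    e = [0] * n; e[i] = lam
    shift = [0] * n; shift[i] = 1
    rows.append(e + conv(shift, h))
for i in range(n):
    e = [0] * n; e[i] = beta
    rows.append([0] * n + e)

# a pair (f, g) with g = f * h (mod beta), and the vector v = (lam*f | g)
f = [1, 3, 3, 0, 0, 0, 0]
fh = conv(f, h)
g = [c % beta for c in fh]
v = [lam * c for c in f] + g

# integer combination: sum f_i*rows[i] minus ((f*h - g)/beta)_i times the beta-rows
coeff = f + [-(fh[i] - g[i]) // beta for i in range(n)]
w = [sum(coeff[k] * rows[k][j] for k in range(2 * n)) for j in range(2 * n)]
print(w == v)  # True: v lies in the lattice L
```

In the real attack f and g are ternary, so v is unusually short for a lattice of this determinant.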
| 5.1 | Establish the correctness of Algorithm 5.4. |
| 5.2 |
|
| 5.3 |
|
| 5.4 | Assume that two parties Bob and Barbara share a common RSA modulus n but relatively prime encryption exponents e1 and e2. Alice encrypts the same message by (n, e1) and (n, e2) and sends the ciphertext messages to Bob and Barbara respectively. Suppose also that Carol intercepts both the ciphertexts. Describe a method by which Carol retrieves the (common) plaintext. [H] |
| 5.5 | Let n = pq be a Rabin public key and let be a quadratic residue modulo n. Show that the knowledge of the four square roots of c modulo n breaks the Rabin system.
|
| 5.6 | What is the disadvantage of using the same session key in the ElGamal encryption scheme for encrypting two different messages (for the same recipient)? [H] |
| 5.7 | Let p be an odd prime and g a generator of .
|
| 5.8 | Show that if the private-key parameters f(X) and d are known to a cryptanalyst of the Chor–Rivest scheme, she can recover the other parts of the private key and thus break the system completely. [H] |
| 5.9 | Show that if f(X) is only known to a cryptanalyst of the Chor–Rivest scheme, then also she can recover the full private key. [H] |
| 5.10 |
|
| 5.11 | In this exercise, we use the notations of Section 5.2.8. Assume that Alice encrypts the same message m several times using the NTRU public key h of Bob, but with different random polynomials , i = 1, . . . , r, and sends the corresponding ciphertext messages c1, . . . , cr. Describe a strategy by which an eavesdropper Carol can recover a considerable part of u1. [H] Trying all the possibilities for the (relatively small) unknown part of u1 allows Carol to retrieve m with little effort.
|
Consider the scenario wherein two parties Alice and Bob want to share some secret information (say, a DES key for future correspondence), but it is not possible to communicate this secret by personal contact or by conversing over a secure channel. In other words, Alice and Bob want to arrive at a common secret value by communicating over a public (and hence insecure) channel. A key-exchange or a key-agreement protocol allows Alice and Bob to do so. The protocol should be such that an eavesdropper listening to the conversation between Alice and Bob cannot compute the secret value in feasible time.
Public-key technology is used to design a key-exchange protocol in the following way. Alice generates a key pair (eA, dA) and sends the public key eA to Bob. Similarly, Bob generates a random key pair (eB, dB) and sends the public key eB to Alice. Now, Alice and Bob respectively compute the values sA = f(eB, dA) and sB = f(eA, dB) from their respective knowledge, where f is a suitably chosen function. If sA = sB, then this value can be used as the shared secret between Alice and Bob. The intruder Carol can intercept eA and eB, but f should be such that a knowledge of eA and eB alone does not allow Carol to compute sA = sB. She needs dA or dB for this computation. Since (eA, dA) and (eB, dB) are key pairs, we assume that it is infeasible to compute dA from eA or dB from eB.
In what follows, we describe some key-exchange protocols. The security of these protocols is dependent on the intractability of the DHP (or the DLP). We provide a generic description, where we work in a finite Abelian multiplicative group G of order n. We write the identity of G as 1. G need not be cyclic, but we assume that an element
having suitably large (and preferably prime) multiplicative order m is provided. G, g, n and m may be made publicly available, but G should be a group in which one cannot compute discrete logarithms in feasible time. Typical examples of G are given in Section 5.2.5.
Basic key-exchange protocols provide provable security against passive attacks under the intractability of the DHP. However, several models of active attacks are known for the basic protocols. One requires authentication (validation of the public keys) to eliminate these attacks.
The Diffie–Hellman (DH) key-exchange algorithm [78] is one of the pioneering discoveries leading to the birth of public-key cryptography.
|
Input: G, g, n and m as defined above. Output: A secret element Steps: Alice generates a random Alice sends eA to Bob. Bob generates a random Bob sends eB to Alice. Alice computes s := (eB)dA = gdAdB. Bob computes s := (eA)dB = gdAdB. if (s = 1) { Return “failure”. } |
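A minimal sketch of Algorithm 5.27 in a small prime-order subgroup of Z_p* (the parameters are far too small for real security and are purely illustrative):

```python
# toy subgroup of Z_p^*: p = 467 = 2*233 + 1 is prime, and g = 4 has prime order m = 233
p, m, g = 467, 233, 4

d_A = 101                       # Alice's private key (randomly chosen in practice)
e_A = pow(g, d_A, p)            # Alice's public key, sent to Bob
d_B = 57                        # Bob's private key
e_B = pow(g, d_B, p)            # Bob's public key, sent to Alice

s_A = pow(e_B, d_A, p)          # Alice's view of the shared secret g^(dA*dB)
s_B = pow(e_A, d_B, p)          # Bob's view
print(s_A == s_B and s_A != 1)  # True
```

An eavesdropper sees only e_A and e_B and must solve an instance of the DHP to obtain the secret.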
The DH scheme fails if the shared secret turns out to be a trivial element (like the identity) of G. In that case, Alice and Bob should re-execute the protocol with different key pairs. The probability of such an incident is, however, extremely low.
The intruder Carol learns the group elements gdA and gdB by listening to the conversation between Alice and Bob and intends to compute s = gdAdB. Thus, she has to solve an instance of the DHP in the group G. By assumption, this is computationally infeasible. This is how the DH scheme derives its security.
A small-subgroup attack on the DH protocol can be mounted by an active adversary. Assume that the order m of g in G is composite and has known factorization m = uv with u small. Carol intercepts the messages between Alice and Bob, replaces them by their respective v-th powers and retransmits the modified messages.
|
Alice generates a random Alice transmits eA for Bob. Carol intercepts eA, computes Bob generates a random Bob transmits eB for Alice. Carol intercepts eB, computes Alice computes Bob computes if (s′ = 1) { Return “failure”. } |
But ord g = uv and so (s′)u = 1, that is, s′ has only u – 1 non-trivial values. Since u is small, the possibilities for s′ can be exhaustively searched by Carol. The best countermeasure against this attack is to take m to be a prime (of bit length ≥ 160).
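The attack is easy to reproduce in a toy group where ord g = 15 = 3 · 5 (all values illustrative):

```python
# in Z_31^*, g = 9 has composite order m = 15 = u*v with u = 3, v = 5
p, g, u, v = 31, 9, 3, 5

d_A, d_B = 7, 11                          # honest private keys
e_A, e_B = pow(g, d_A, p), pow(g, d_B, p)

# Carol replaces each transmitted public key by its v-th power
e_A2, e_B2 = pow(e_A, v, p), pow(e_B, v, p)

s_A = pow(e_B2, d_A, p)                   # what Alice computes
s_B = pow(e_A2, d_B, p)                   # what Bob computes
assert s_A == s_B                         # they still agree ...
print(pow(s_A, u, p))                     # 1: the secret lies in the order-u subgroup

# ... so Carol needs to try only the u elements of that subgroup
candidates = [x for x in range(1, p) if pow(x, u, p) == 1]
print(len(candidates), s_A in candidates)  # 3 True
```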
Even when m is prime, it may be the case that the cofactor k := n/m has a small divisor u and it is possible that an active attacker intervenes in such a way that Alice and Bob agree upon a secret value of order (equal to or dividing) u. For example, Carol may replace both the transmitted public keys by an element h of order u. If dA and dB are congruent modulo u, the shared secret has only a few possible values and Carol can obtain the correct value by exhaustive search. On the other hand, if dA ≢ dB (mod u), Alice and Bob do not come up with the same secret. However, if Alice uses her secret to encrypt a message for Bob, it remains easy for Carol to decrypt the intercepted ciphertext by trying only a few choices for Alice’s key. Alice and Bob can prevent this attack by refusing to accept as the shared secret not only the trivial value s = 1 but also elements of small orders.
A small-subgroup attack can also be mounted by one of the communicating parties (say, Bob) in an attempt to gain information about the other’s (Alice’s) secret dA. Let us continue to assume that the cofactor k := n/m has a small divisor u. Bob finds an element h in G of order u. Instead of eB = gdB Bob now sends
h to Alice. Alice computes the shared secret as
. Bob, on the other hand, can normally compute sB := gdAdB. Now, suppose that Alice uses a symmetric cipher with the key
(or some part of it) and sends the ciphertext to Bob. In order to decrypt, Bob tries all of the u possible keys sBhj for j = 0, 1, . . . , u – 1. The value of j for which decryption succeeds equals dA modulo u. A similar attack can be mounted by Bob, when
is chosen to be an element (like h itself) of order u.
If G is cyclic and H is the subgroup generated by g, then an element
is in H if and only if am = 1 (Proposition 2.5, p 27). Moreover, if gcd(k, m) = 1, each communicating party can check the validity of the other party’s public key by using an m-th power exponentiation. An element like
h or h of the last paragraph does not pass this test, in which case Alice should abandon the protocol. However, this validation of the public key requires an extra modular exponentiation and thereby slows down the protocol.
We now present an efficient modification of the basic Diffie–Hellman scheme that prevents small-subgroup attacks (by a communicating party or an eavesdropper) without calculating an extra exponentiation. We continue with the notation k := n/m and assume that k is coprime to m. Now, the shared secret is computed as gdAdB or gkdAdB depending on whether compatibility with the original DH scheme is desired or not. Algorithm 5.29 describes the modified DH algorithm. Solve Exercise 5.12 in order to establish the effectiveness of this algorithm against small-subgroup attacks.
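Algorithm 5.29 itself appears below in abridged form; the following toy sketch illustrates only the cofactor idea, namely that exponentiation by k annihilates any injected small-order element while leaving honest keys usable (all values are illustrative):

```python
# toy setting: G = Z_31^* has order n = 30; g = 2 has prime order m = 5,
# so the cofactor is k = n/m = 6, which has the small divisor u = 3
p, g, m, k = 31, 2, 5, 6

d_A = 3                            # Alice's private key
h = 25                             # attacker-injected element of small order 3

# cofactor exponentiation maps the small-order element to the identity:
s = pow(pow(h, k, p), d_A, p)
print(s)                           # 1 -> Alice detects the attack and aborts

# an honest public key survives the cofactor exponentiation:
e_B = pow(g, 4, p)                 # Bob's public key with d_B = 4
s_honest = pow(pow(e_B, k, p), d_A, p)
print(s_honest != 1)               # True
```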
Other active attack models on the (basic or modified) DH protocol can be conceived of. One important class of attacks is now described.
An unknown key-share attack on a key-exchange protocol makes a party believe that (s)he shares a secret with another party, whereas the secret is actually shared by a third party. Assume that Carol can monitor and modify every message between Alice and Bob. When Alice and Bob execute Algorithm 5.27 or 5.29, Carol can intervene and pretend to Alice that she is Bob and to Bob that she is Alice. At the end of the protocol, Alice and Carol come up with a shared secret sAC, and Bob and Carol with another shared secret sBC. Alice believes that she shares sAC with Bob, and Bob believes that he shares sBC with Alice.
|
Input: G, g, n, m and k as defined above and a flag indicating compatibility with the original DH scheme. Output: A secret element Steps: Alice generates a random Alice sends eA to Bob. Bob generates a random Bob sends eB to Alice. if (compatibility with the original DH algorithm is desired) { |
Now, when Alice wants to send a secret message m to Bob, she encrypts m by sAC and transmits the ciphertext c. Carol intercepts c, decrypts it by sAC to retrieve m, encrypts m by sBC and sends the new ciphertext c′ to Bob. Bob retrieves m by decrypting c′ with his key sBC. The process raises hardly any suspicion in Alice or Bob about the existence of the mediating third party.
In order to avoid this attack, Alice and Bob should each validate the authenticity of the public key of the other party. Public-key certificates can be used to this effect. Unfortunately, using certificates alone may fail to eliminate unknown key-share attacks, as Algorithm 5.30 shows. At the end of this protocol Alice and Bob share a secret s, but Bob believes that he shares it with (the intruder) Carol. Here Carol herself cannot compute the shared secret s (provided that computing discrete logs in G is infeasible). Still there may be situations where this attack can be exploited (see Law et al. [161] for a hypothetical example).
This attack has two potential problems. Under the assumption of intractability of the DLP in G, Carol cannot compute the private key corresponding to the public key eC, and so her obtaining the certificate CertC with a knowledge of eC alone may be questioned. Furthermore, replacing (eB, CertB) by ((eB)d, CertB) may make the certificate invalid. If we assume that a certificate authenticates only the entity and not the public key, then these objections can be overruled. In practice, however, a public-key certificate should bind the public key to an entity (who can prove the knowledge of the corresponding private key), and so the above attack cannot be easily mounted. Nonetheless, the need for stronger authenticated key-exchange protocols is highlighted by the attack.
|
Alice generates a random Alice gets the certificate CertA on eA from the certifying authority. Alice transmits (eA, CertA) for Bob. Carol intercepts (eA, CertA). Carol chooses a random Carol gets the certificate CertC on eC := (eA)d from the certifying authority. Carol sends (eC, CertC) to Bob. Bob generates a random Bob gets the certificate CertB on eB from the certifying authority. Bob sends (eB, CertB) to Carol. Carol transmits ((eB)d, CertB) to Alice. Alice computes s = ((eB)d)dA = gddAdB. Bob computes s = (eC)dB = ((eA)d)dB = gddAdB. |
The Menezes–Qu–Vanstone (MQV) key-exchange protocol is an improved extension of the basic DH scheme that incorporates public-key authentication. Although the security goals of the MQV protocol do not seem to be provable, heuristic arguments suggest that the protocol is effective against active adversaries.
Once again, let Alice and Bob be the two parties who plan to agree on a secret element
, where the domain parameters G, g, n and m are chosen as in the basic DH scheme. In the MQV scheme, each entity uses two key pairs, one of which ((EA, DA) for Alice and (EB, DB) for Bob) is called the static or the long-term key pair, whereas the other ((eA, dA) for Alice and (eB, dB) for Bob) is called the ephemeral or the short-term key pair. The static key is bound to an entity for a certain period of time and is used in every invocation of the MQV protocol during that period. On the other hand, each entity generates and uses a new ephemeral key pair during each invocation of the protocol. The static key of an entity is assumed to be authentic, say, certified by a trusted authority. The ephemeral key, on the other hand, is validated using the static private key.
Assume that there is a (publicly known) function
. Let l := ⌊lg m⌋ + 1 denote the bit length of m = ord g. For
, let
denote the integer
. The bit size of
is about half of that of m. In particular,
(mod m) for all
.
In the MQV protocol, Alice and Bob each computes the shared secret s = gσAσB, where
and
. Here the exponents σA and σB bear the implicit signatures of Alice and Bob, impressed by their respective static private keys. Alice can compute
, since she knows the static public key EB and the ephemeral public key eB of Bob. Similarly, Bob can compute
from a knowledge of the public keys EA and eA of Alice. We summarize the steps in Algorithm 5.31.
|
Input: G, g, n and m as defined above. Output: A secret element Steps: Alice obtains Bob’s static public key EB. Bob obtains Alice’s static public key EA. Alice generates a random integer dA, 2 ≤ dA ≤ m – 1, and computes eA := gdA. Alice sends eA to Bob. Bob generates a random integer dB, 2 ≤ dB ≤ m– 1, and computes eB := gdB. Bob sends eB to Alice. Alice computes Alice computes Bob computes Bob computes if (s = 1) { Return “failure”. } |
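Several formulas above are abridged. The sketch below assumes the standard MQV definitions σA = (dA + āA DA) (mod m) and s = (eB EB^āB)^σA, with ā(a) = (a mod 2^⌈l/2⌉) + 2^⌈l/2⌉ applied to an integer representation of a; these concrete choices are assumptions consistent with the surrounding description, not quoted from it, and the parameters are toy values:

```python
# toy MQV sketch in the order-233 subgroup of Z_467^* generated by g = 4
p, m, g = 467, 233, 4
l = m.bit_length()                     # 8
half = 1 << ((l + 1) // 2)             # 2^ceil(l/2) = 16

def abar(a):
    # half-length exponent: low ceil(l/2) bits of a, with the top bit forced
    return (a % half) + half

# static and ephemeral key pairs (random in practice; fixed here)
D_A, D_B = 19, 151
E_A, E_B = pow(g, D_A, p), pow(g, D_B, p)
d_A, d_B = 88, 42
e_A, e_B = pow(g, d_A, p), pow(g, d_B, p)

# implicit signatures sigma_A, sigma_B
sigma_A = (d_A + abar(e_A) * D_A) % m
sigma_B = (d_B + abar(e_B) * D_B) % m

# each side reconstructs g^sigma of the other side from public data only
s_A = pow(e_B * pow(E_B, abar(e_B), p) % p, sigma_A, p)   # Alice's computation
s_B = pow(e_A * pow(E_A, abar(e_A), p) % p, sigma_B, p)   # Bob's computation
print(s_A == s_B)   # True: both equal g^(sigma_A * sigma_B)
```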
Each participating entity using the MQV protocol performs three exponentiations in G. Alice computes gdA,
and
, of which the first and the last ones have exponents O(m). On the other hand,
is
, so that the middle exponentiation is about twice as fast as a full exponentiation. This performance benefit justifies the use of
and
instead of eA and eB themselves. It appears that using these half-sized exponents does not affect security. Also note that
(mod m), which implies a non-zero contribution of the static key DA in the expression σA. Similarly for σB.
In order to guard against small-subgroup attacks, the MQV algorithm can incorporate the cofactor k := n/m, that is, assuming gcd(k, m) = 1, the shared secret would now be gσAσB or gkσAσB, depending on whether compatibility with the original MQV method is desired or not.
The MQV algorithm can be used in situations where only one party, say, Alice, is capable of initiating a transmission to the other party (Bob). In that case, Bob’s static key pair is used also as his ephemeral key pair, that is, the secret element shared between Alice and Bob is
.
See Raymond and Stiglic [250] for more on the security issues of the DH key-agreement protocol and its variants.
| 5.12 | Let G be a multiplicative Abelian group of order n and with identity 1, H the subgroup of G generated by an element of order n, k := n/m and gcd(k, m) = 1. Further let a be a non-identity element of G.
|
| 5.13 | Write the MQV key-exchange protocol with cofactor exponentiation. |
| 5.14 | Provide the details of the Diffie–Hellman key-exchange algorithm based on the XTR representation (Section 5.2.7). |
Suppose an entity (Alice) is required to be bound to some electronic data (like messages or documents or keys). This binding is achieved by Alice digitally signing the data in such a way that no party other than Alice would be able to generate the signature. The signature should also be such that any entity can easily verify that it was Alice who generated the signature. Digital signatures can be realized using public-key techniques. The entity (Alice) generating a digital signature is called the signer, whereas anybody who wants to verify a signature is called a verifier.
We have seen in Section 5.2 how the encryption and decryption transforms fe, fd achieve confidentiality of sensitive data. If the set of all possible plaintext messages is the same as the set of all ciphertext messages and if fe and fd are bijective maps on that set, then the sequence of encryption and decryption can be reversed in order to realize a digital signature scheme. In order to sign m, Alice uses her private key d and the transform fd to generate s = fd(m, d). Any party who knows the corresponding public key e can recover m as m = fe(s, e). This is broadly how a signature scheme works. Depending on how the representative m is generated from the message M that Alice wants to sign, signature schemes can be classified into two categories.
In a signature scheme with message recovery, one takes m = M. Verification involves getting back the message M. If M is assumed to be (the encoded version of) some human-readable text, then the recovered M = fe(s, e) will also be human-readable. If s is forged, that is, if a private key d′ ≠ d has been used to generate s′ = fd(m, d′), then verification using Alice’s public key yields m′ = fe(s′, e), and typically m′ ≠ m, since d′ and e are not matching keys. The resulting message m′ will, in general, make little or no sense to a human reader. If m is not human-readable text, one adds some redundancy to it before signing. A forged signature yields m′ during verification, which, with high probability, is expected not to have this redundancy.
Attractive as it looks, this scheme is not suitable when M is a long message. In that case, it is customary to break M into smaller pieces and sign each piece separately. Since public-key operations are slow, signature generation (and also verification) will be time-consuming if there are too many pieces to sign (and verify). This difficulty is overcome using the second scheme, described now.
In a signature scheme with appendix, a short representative m = H(M) of M is first computed.[2] The function H is usually chosen to be a hash function, that is, one which converts bit strings of arbitrary length to bit strings of a fixed length. H is assumed to be public knowledge, that is, anybody who knows M can compute m. We also assume that H(M) can be computed fast for messages M of practical sizes. Alice uses the decryption transform on m to generate s = fd(m, d). The signature now becomes the pair (M, s). A verifier obtains Alice’s public key e and checks if H(M) = fe(s, e). The signature is taken to be valid if and only if equality holds. If a forger uses a private key d′ ≠ d, she generates a signature (M, s′), s′ = fd(m, d′), on M, and a verifier finds, with high probability, that H(M) ≠ fe(s′, e).
[2] If M is already a short message, one may simply take m = M. In order to promote uniform treatment, we assume that the function H is always applied for the generation of m. Use of H is also desirable from security considerations (Exercise 5.15).
A kind of forgery is possible on signature schemes with appendix. Assume that Alice creates a valid signature (M, s), s = fd(H(M), d), on a message M. The function H is certainly not injective, since its input space is much bigger (infinite) than its output space (finite). Suppose that Carol finds a message M′ ≠ M with H(M′) = H(M). In that case, the pair (M′, s) is a valid signature of Alice on the message M′, though it is not Alice who has generated it. (Indeed it has been generated without the knowledge of the private key d of Alice.) In order to foil such attacks, the function H should have second pre-image resistance. The first pre-image resistance and collision resistance properties of a hash function also turn out to be important in the context of digital signatures. See Sections 1.2.6 and A.4 for more on hash functions.
We now describe some specific algorithms for (generating and verifying) digital signatures. Key pairs used for these algorithms are usually identical to those used for encryption algorithms of Section 5.2 and, therefore, we refrain from a duplicate description of the key-generation procedures. We focus our discussion only on signature schemes with appendix.
As in the RSA encryption scheme of Section 5.2.1, each entity generates an RSA modulus n = pq, which is the product of two distinct large primes p and q. A key pair consists of an encryption exponent e (the public key) and a decryption exponent d (the private key) satisfying ed ≡ 1 (mod φ(n)).
RSA signature generation involves a modular exponentiation in the ring
.
|
Input: A message M to be signed and the signer’s private key (n, d). Output: The signature (M, s) on M. Steps: m := H(M). /* |
Signature generation can be sped up if the parameters p, q, d1 := d rem (p – 1), d2 := d rem (q – 1) and h := q–1 (mod p) are stored (secretly) in the private key. Now, one can use Algorithm 5.4 for signature generation.
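The CRT speed-up can be sketched with a toy key (Garner recombination; the key values below are illustrative, and Algorithm 5.4 itself is not reproduced here):

```python
# toy RSA key: n = 61*53 = 3233, e = 17, d = 2753
p, q = 61, 53
n, d = p * q, 2753
d1, d2 = d % (p - 1), d % (q - 1)      # d rem (p-1) = 53, d rem (q-1) = 49
h = pow(q, -1, p)                      # q^{-1} (mod p) = 38  (Python 3.8+)

def sign_crt(m):
    sp = pow(m % p, d1, p)             # m^d mod p, via Fermat's little theorem
    sq = pow(m % q, d2, q)             # m^d mod q
    return sq + q * ((h * (sp - sq)) % p)   # Garner's CRT recombination

m = 1234
print(sign_crt(m) == pow(m, d, n))     # True, at roughly a quarter of the cost
```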
The verification routine also involves a modular exponentiation in
.
|
Input: A signature (M, s) and the signer’s public key (n, e). Output: Verification status of the signature. Steps: m := H(M). /* |
Small values of e speed up RSA signature verification and are not known to subject the scheme to any special attacks. So values of e like 3, 257 and 65,537 are commonly recommended.
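Algorithms 5.32 and 5.33 together can be sketched with a toy key; reducing a SHA-256 digest modulo n stands in for a proper hash-to-Z_n encoding (which real schemes specify carefully):

```python
import hashlib

n, e, d = 3233, 17, 2753               # toy RSA key pair (n = 61*53)

def H(M: bytes) -> int:
    # hash M and reduce into Z_n (illustrative encoding only)
    return int.from_bytes(hashlib.sha256(M).digest(), "big") % n

def sign(M, d, n):
    return pow(H(M), d, n)             # s := m^d (mod n)

def verify(M, s, e, n):
    return H(M) == pow(s, e, n)        # check m = s^e (mod n)

M = b"attack at dawn"
s = sign(M, d, n)
print(verify(M, s, e, n))              # True
```

A small e such as 17 makes the `pow(s, e, n)` in verification far cheaper than the signing exponentiation.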
As in the Rabin encryption algorithm, we choose two distinct large primes p and q of nearly equal sizes and take n = pq. The public key is n, whereas the private key is the pair (p, q). The Rabin signature scheme is based on the intractability of computing square roots modulo n in absence of the knowledge of the prime factors p and q of n.
Rabin signature generation involves finding a quadratic residue m modulo n as a representative of the message M and computing a square root of m modulo n.

Input: A message M to be signed and the signer's private key (p, q).
Output: The signature (M, s) on M.
Steps:
m := H(M). /* m must be a quadratic residue modulo n; otherwise M is suitably modified and rehashed */
Compute a square root s of m modulo n using the knowledge of p and q.
Return (M, s).
Verification (Algorithm 5.35) involves a squaring operation in ℤ_n.

Input: A signature (M, s) and the signer's public key n.
Output: Verification status of the signature.
Steps:
m := H(M).
if (s^2 ≡ m (mod n)) { Return "Signature verified". } else { Return "Signature not verified". }
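The following sketch realizes the scheme with Blum primes p ≡ q ≡ 3 (mod 4), so that square roots are easy to extract. Appending a counter to the message before hashing (to force a quadratic residue) is one standard workaround, not the book's exact formulation; the counter then travels with the signature. SHA-256 again stands in for H.

```python
import hashlib

# Toy Blum primes (p ≡ q ≡ 3 (mod 4)); far too small for real use.
p, q = 499, 547
n = p * q

def H(M: bytes, counter: int) -> int:
    data = M + counter.to_bytes(4, "big")
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n

def is_qr(m: int, p: int) -> bool:
    return pow(m, (p - 1) // 2, p) == 1     # Euler's criterion

def sign(M: bytes):
    # Re-hash with a counter until H(M, c) is a quadratic residue modulo n.
    c = 0
    while True:
        m = H(M, c)
        if m % p and m % q and is_qr(m % p, p) and is_qr(m % q, q):
            break
        c += 1
    sp = pow(m, (p + 1) // 4, p)            # sqrt mod p, valid since p ≡ 3 (mod 4)
    sq = pow(m, (q + 1) // 4, q)
    h = pow(q, -1, p)                       # CRT recombination
    s = sq + q * ((h * (sp - sq)) % p)
    return s, c

def verify(M: bytes, s: int, c: int) -> bool:
    return pow(s, 2, n) == H(M, c)          # one squaring modulo n

M = b"pay 100 rupees"
s, c = sign(M)
assert verify(M, s, c)
```

Verification costs a single modular squaring, which makes Rabin verification even cheaper than small-exponent RSA verification.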
The ElGamal signature algorithm is based on the intractability of computing discrete logarithms in certain groups G. For a general description, we consider an arbitrary (finite Abelian multiplicative) group G of order n. We assume that G is cyclic and that a generator g of G is provided. A key pair is obtained by selecting a random integer (the private key) d, 2 ≤ d ≤ n − 1, and then computing g^d (the public key). The hash function H is assumed to convert arbitrary bit strings to elements of ℤ_n. We further assume that the elements of G can be identified with bit strings (on which the hash function H can be directly applied). G (together with its representation), g and n are considered to be public knowledge and are not input to the signature generation and verification routines.
ElGamal signatures are generated as in Algorithm 5.36. The appendix consists of the pair (s, t).

Input: A message M to be signed and the signer's private key d.
Output: The signature (M, s, t) on M.
Steps:
Generate a random session key d′, 2 ≤ d′ ≤ n − 1.
s := g^d′.
t := d′^(−1) (H(M) − d·H(s)) (mod n).
Return (M, s, t).
The costliest step in the ElGamal signature generation algorithm is the exponentiation g^d′. Here, G is assumed to be cyclic and the exponent d′ to be O(n). We will shortly see modifications of the ElGamal scheme in which the exponent can be chosen to be much smaller, namely O(r), where r is a suitably large (prime) divisor of n.
In order to forge a signature, Carol can generate a random session key (d′, g^d′) and obtain s. For the computation of t, she requires the private key d of the signer. Conversely, if t (and d′) are available to Carol, she can easily compute the private key d. Thus, forging an ElGamal signature is equivalent to solving the DLP in G.
Each invocation of the ElGamal signature generation algorithm must use a new session key (d′, g^d′). If the same session key (d′, g^d′) is used to generate the signatures (M1, s1, t1) and (M2, s2, t2) on two different messages M1 and M2, then we have (t1 − t2)d′ ≡ H(M1) − H(M2) (mod n), whence d′ can be computed, provided that gcd(t1 − t2, n) = 1. If d′ is known, the private key d can be easily computed (see Exercise 5.6 for a similar situation).
ElGamal signature verification is described in Algorithm 5.37. This is based on the observation that for a (valid) ElGamal signature (M, s, t) on a message M we have g^H(M) = (g^d)^H(s) · s^t. This verification calls for three exponentiations in G with full-size exponents. Working in a suitable (cyclic) subgroup of G makes the algorithm more efficient.
Input: A signature (M, s, t) and the signer's public key g^d.
Output: Verification status of the signature.
Steps:
a1 := g^H(M).
a2 := (g^d)^H(s) · s^t.
if (a1 = a2) { Return "Signature verified". } else { Return "Signature not verified". }
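The generation and verification routines above can be sketched concretely with G = ℤ_p^* for a toy prime p, so that n = p − 1. The prime, the private key and the use of SHA-256 reduced modulo n as the hash H are illustrative assumptions.

```python
import hashlib
import random
from math import gcd

# Toy group G = Z_p^* of order n = p - 1 = 466 = 2 * 233.
p = 467
n = p - 1

def find_generator() -> int:
    for g in range(2, p):
        # ord(g) divides 466; excluding orders 1, 2 and 233 leaves order 466.
        if pow(g, 2, p) != 1 and pow(g, 233, p) != 1:
            return g

g = find_generator()

def H(x) -> int:
    data = x.to_bytes(8, "big") if isinstance(x, int) else x
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n

d = 123                                 # private key
y = pow(g, d, p)                        # public key g^d

def sign(M: bytes):
    while True:
        dp = random.randrange(2, n)     # session key d', invertible mod n
        if gcd(dp, n) == 1:
            break
    s = pow(g, dp, p)                   # s := g^d'
    t = (pow(dp, -1, n) * (H(M) - d * H(s))) % n
    return s, t

def verify(M: bytes, s: int, t: int) -> bool:
    a1 = pow(g, H(M), p)                          # g^H(M)
    a2 = (pow(y, H(s), p) * pow(s, t, p)) % p     # (g^d)^H(s) * s^t
    return a1 == a2

M = b"hello"
s, t = sign(M)
assert verify(M, s, t)
```

Note that the session key must be invertible modulo n; the sketch enforces this with a gcd test, a detail glossed over in the general description.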
ElGamal signatures use a congruence of the form A ≡ dB + d′C (mod n), and verification is done by checking the equality g^A = (g^d)^B s^C. Our choice for A, B and C was A = H(M), B = H(s) and C = t. Indeed, any permutation of H(M), H(s) and t is acceptable as A, B, C. These give rise to several variants of the ElGamal scheme. It is also allowed to take as A, B, C any permutation of H(M)H(s), t, 1 or H(M)H(s), H(M)t, 1 or H(M)H(s), H(s)t, 1 or H(M)t, H(s)t, 1. Permutations of H(M)H(t), H(s), 1 or H(M), H(s)t, 1, on the other hand, are known to have security weaknesses. For any allowed combination of A, B, C, the choices ±A, ±B, ±C are also valid. For some other variants, see Horster et al. [132].
The Schnorr signature scheme is a modification of the ElGamal scheme and is faster, since it works in the subgroup of G generated by g, whose order is much smaller than that of G. We assume that r := ord g is a prime (though it suffices to have ord g possess a suitably large prime divisor). We suppose further that the elements of G are represented as bit strings and that we have a hash function H that maps bit strings to elements of ℤ_r. A key pair now consists of an integer d (the private key), 2 ≤ d ≤ r − 1, and the element g^d (the public key).
Schnorr signature generation is described in Algorithm 5.38.
Input: A message M to be signed and the signer's private key d.
Output: The signature (M, s, t) on M.
Steps:
Generate a random session key pair (d′, g^d′), 2 ≤ d′ ≤ r − 1.
s := H(M ‖ g^d′).
t := d′ − d·s (mod r).
Return (M, s, t).
Similar to the ElGamal scheme, the most time-consuming step in this routine is the computation of the session public key g^d′. But now d′ < r and, therefore, Algorithm 5.38 runs faster than Algorithm 5.36. One can easily check that forging a signature of Alice is computationally equivalent to determining Alice's private key d from her public key g^d. The importance of using a new session key pair in each run of Algorithm 5.38 is exactly the same as in the case of ElGamal signatures.
The verification of Schnorr signatures (Algorithm 5.39) is based upon the fact that g^t = g^d′ (g^d)^(−s). Thus, the knowledge of g, s, t and g^d allows one to compute g^d′ and subsequently H(M ‖ g^d′). The algorithm involves two exponentiations with both the exponents (t and s) being ≤ r. Thus, signature verification is also faster in the Schnorr scheme than in the ElGamal scheme.
|
Input: A signature (M, s, t) and the signer's public key g^d.
Output: Verification status of the signature.
Steps:
u := g^t (g^d)^s.
if (s = H(M ‖ u)) { Return "Signature verified". } else { Return "Signature not verified". }
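The scheme can be sketched in a toy prime-order subgroup of ℤ_p^*: with p = 2r + 1 = 2039 and r = 1019 both prime, the element g = 4 = 2^2 has order exactly r. These parameters and the SHA-256 stand-in for H are illustrative assumptions.

```python
import hashlib
import random

# Toy subgroup parameters: p = 2r + 1 with r prime; g = 4 has order r in Z_p^*.
p, r, g = 2039, 1019, 4

def H(M: bytes, u: int) -> int:
    data = M + b"|" + u.to_bytes(8, "big")      # hash of M || g^d'
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % r

d = 777                          # private key, 2 <= d <= r - 1
y = pow(g, d, p)                 # public key g^d

def sign(M: bytes):
    dp = random.randrange(2, r)          # session key d'
    u = pow(g, dp, p)                    # session public key g^d'
    s = H(M, u)                          # s := H(M || g^d')
    t = (dp - d * s) % r                 # t := d' - d s (mod r)
    return s, t

def verify(M: bytes, s: int, t: int) -> bool:
    u = (pow(g, t, p) * pow(y, s, p)) % p    # g^t (g^d)^s = g^d'
    return s == H(M, u)

M = b"schnorr"
s, t = sign(M)
assert verify(M, s, t)
```

Both exponents in the verification (t and s) are smaller than r, which is the source of the speed advantage over plain ElGamal verification.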
The Nyberg–Rueppel (NR) signature algorithm is another adaptation of the ElGamal signature scheme and is based on the intractability of solving the DLP in a group G. We assume that ord G = n has a large prime divisor r and that an element g ∈ G of order r is available. Here, a key pair is of the form (d, g^d), where the private key d is an integer between 2 and r − 1 (both inclusive) and where the public key g^d is an element of 〈g〉. The hash function H converts bit strings to elements of ℤ_r. We also assume the existence of a (publicly known) function F : G → ℤ_r.
NR signature generation can be performed as in Algorithm 5.40.
Input: A message M to be signed and the signer's private key d.
Output: The signature (M, s, t) on M.
Steps:
Generate a random session key pair (d′, g^d′), 2 ≤ d′ ≤ r − 1.
s := H(M) + F(g^d′) (mod r).
t := d′ − d·s (mod r).
Return (M, s, t).
The only difference between NR signature generation and Schnorr signature generation is the way in which s is computed. Therefore, whatever we remarked in connection with the security and the efficiency of the Schnorr scheme applies equally well to the NR scheme. Signature verification is also closely analogous, as Algorithm 5.41 explains.
|
Input: A signature (M, s, t) and the signer's public key g^d.
Output: Verification status of the signature.
Steps:
u := g^t (g^d)^s. /* u equals g^d′ for a valid signature */
if (H(M) ≡ s − F(u) (mod r)) { Return "Signature verified". } else { Return "Signature not verified". }
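Reusing the toy subgroup from the Schnorr sketch, the NR routine differs only in the computation of s. The choice F(x) = x mod r is our illustrative instance of the publicly known function F; it is not prescribed by the text.

```python
import hashlib
import random

# Same toy subgroup as for Schnorr: p = 2r + 1, g of prime order r.
p, r, g = 2039, 1019, 4

def H(M: bytes) -> int:
    return int.from_bytes(hashlib.sha256(M).digest(), "big") % r

def F(x: int) -> int:
    return x % r                 # an assumed publicly known map G -> Z_r

d = 555                          # private key
y = pow(g, d, p)                 # public key g^d

def sign(M: bytes):
    dp = random.randrange(2, r)                 # session key d'
    s = (H(M) + F(pow(g, dp, p))) % r           # s := H(M) + F(g^d') (mod r)
    t = (dp - d * s) % r                        # t := d' - d s (mod r)
    return s, t

def verify(M: bytes, s: int, t: int) -> bool:
    u = (pow(g, t, p) * pow(y, s, p)) % p       # recovers g^d'
    return H(M) == (s - F(u)) % r

M = b"nyberg-rueppel"
s, t = sign(M)
assert verify(M, s, t)
```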
The digital signature algorithm (DSA) has been proposed as a standard by the US National Institute of Standards and Technology (NIST) and later accepted as a Federal Information Processing Standard (FIPS) by the US government. This standard is also known as the digital signature standard (DSS). See the NIST document [220] for a complete description of this standard.
Input: An integer λ, 0 ≤ λ ≤ 8.
Output: A prime p of bit length l := 512 + 64λ such that p − 1 has a prime divisor r of length 160 bits.
Steps:
Let l − 1 = 160n + b, 0 ≤ b < 160. /* n = (l−1) quot 160, b = (l−1) rem 160. */
DSA is based on the intractability of the DLP in the finite field ℤ_p, where p is a prime of bit length 512 + 64λ with 0 ≤ λ ≤ 8. The cardinality p − 1 of ℤ_p^* is required to have a prime divisor r of length (exactly) 160 bits. The NIST document [220] specifies a standard method for obtaining such a field ℤ_p, which we describe in Algorithm 5.42. We denote by H the SHA-1 hash function that converts bit strings of arbitrary length to bit strings of length 160. We will identify (often without explicit mention) the bit string a_1 a_2 . . . a_k of length k with the integer a_1 2^(k−1) + a_2 2^(k−2) + · · · + a_(k−1) 2 + a_k.
The DSA prime generation procedure (Algorithm 5.42) starts by selecting the prime divisor r and then tries to find a prime p such that r|(p–1). The outputs of H are utilized as pseudorandomly generated bit strings of length 160.
Once the DSA parameters p and r are available, an element g ∈ ℤ_p^* of multiplicative order r can be computed by Algorithm 3.26. Henceforth we assume that p, r and g are public knowledge and need not be supplied as inputs to the signature generation and verification routines. A DSA key pair consists of an integer (the private key) d, 2 ≤ d ≤ r − 1, and the element g^d (the public key) of 〈g〉.
The DSA signature-generation procedure is given as Algorithm 5.43. One may additionally check whether s = 0 or t = 0 and, if so, repeat signature generation with another session key. But this, being an extremely rare phenomenon, can be ignored for all practical purposes. Both s and t are elements of ℤ_r and hence are represented as integers between 0 and r − 1.
Input: A message M to be signed and the signer's private key d.
Output: The signature (M, s, t) on M.
Steps:
Generate a random session key d′, 2 ≤ d′ ≤ r − 1.
s := (g^d′ (mod p)) (mod r).
t := d′^(−1) (H(M) + d·s) (mod r).
Return (M, s, t).
DSA signature verification is described in Algorithm 5.44. For a valid signature (M, s, t) on a message M, the algorithm computes w ≡ d′(H(M) + ds)^(−1) (mod r), w1 ≡ H(M)·w (mod r) and w2 ≡ s·w (mod r). Therefore, g^w1 (g^d)^w2 ≡ g^(w1 + d·w2) ≡ g^(w(H(M) + ds)) ≡ g^(d′(H(M)+ds)^(−1)(H(M)+ds)) ≡ g^d′ (mod p). Reduction modulo r now gives ((g^w1 (g^d)^w2) (mod p)) (mod r) = s.
Input: A signature (M, s, t) and the signer's public key g^d.
Output: Verification status of the signature.
Steps:
if ((s ∉ {1, . . . , r − 1}) or (t ∉ {1, . . . , r − 1})) { Return "Signature not verified". }
w := t^(−1) (mod r).
w1 := H(M)·w (mod r).
w2 := s·w (mod r).
v := ((g^w1 (g^d)^w2) (mod p)) (mod r).
if (v = s) { Return "Signature verified". } else { Return "Signature not verified". }
DSA signature generation performs a single exponentiation and DSA verification does two exponentiations modulo p. All the exponents are positive and ≤ r. Thus, DSA is essentially as fast as the Schnorr scheme or the NR scheme.
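The DSA routines can be sketched with the same toy subgroup used earlier; real DSA parameters would instead come from Algorithm 5.42, with a 512–1024 bit p and a 160-bit r, and SHA-1 as H (SHA-256 is used here as a stand-in).

```python
import hashlib
import random

# Toy DSA-style parameters: r | p - 1 with r prime; g = 4 has order r in Z_p^*.
p, r, g = 2039, 1019, 4

def H(M: bytes) -> int:
    return int.from_bytes(hashlib.sha256(M).digest(), "big") % r

d = 999                           # private key
y = pow(g, d, p)                  # public key g^d

def sign(M: bytes):
    while True:
        dp = random.randrange(2, r)                    # session key d'
        s = pow(g, dp, p) % r                          # s := (g^d' mod p) mod r
        t = (pow(dp, -1, r) * (H(M) + d * s)) % r
        if s and t:                                    # extremely rare rejections
            return s, t

def verify(M: bytes, s: int, t: int) -> bool:
    if not (0 < s < r and 0 < t < r):
        return False
    w = pow(t, -1, r)
    w1, w2 = (H(M) * w) % r, (s * w) % r
    v = (pow(g, w1, p) * pow(y, w2, p)) % p % r        # ((g^w1 (g^d)^w2) mod p) mod r
    return v == s

M = b"dsa"
s, t = sign(M)
assert verify(M, s, t)
```

All exponents stay below r, mirroring the remark that DSA is essentially as fast as the Schnorr and NR schemes.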
The ECDSA is the elliptic curve analog of the DSA. Algorithm 5.45 describes the generation of the domain parameters necessary to set up an ECDSA system. One first selects a suitable finite field F_q and takes a random elliptic curve E over F_q. E must be such that the cardinality n of the group E(F_q) has a suitably large prime divisor r. One generates a random point P ∈ E(F_q) of order r and works in the subgroup 〈P〉 of E(F_q) generated by P. It is assumed that q is either a prime p or a power 2^m of 2.
Input: A finite field F_q.
Output: A set of parameters E, n, r, P for the ECDSA.
Steps:
while (1) {
Select a random elliptic curve E over F_q.
Compute n := |E(F_q)|.
if (n has a prime divisor r > max(2^160, 4√q), n ∤ (q^k − 1) for small k, and n ≠ q) { break. }
}
Select a random point Q ∈ E(F_q) and set P := (n/r)Q; repeat until P ≠ O.
Return E, n, r, P.
The order n = |E(F_q)| can be computed using the SEA algorithm (for q = p) or the Satoh–FGH algorithm (for q = 2^m) described in Section 3.6. The integer n should be factored to check whether it has a prime divisor r > max(2^160, 4√q). The condition n ∤ (q^k − 1) for small values of k is necessary to avoid the MOV attack, whereas the condition n ≠ q ensures that the anomalous-curve attack of Smart and of Satoh–Araki–Semaev cannot be mounted.
E(F_q) is not necessarily a cyclic group. But, r being a prime, any point P := (n/r)Q with Q ∈ E(F_q) and P ≠ O must be one of order r.
An ECDSA key pair consists of a private key d (an integer in the range 2 ≤ d ≤ r − 1) and the corresponding public key dP ∈ 〈P〉. H denotes the hash function SHA-1 that converts bit strings of arbitrary length to bit strings of length 160. As discussed in connection with DSA, we identify bit strings with integers. We also associate elements of F_q with integers in the set {0, 1, . . . , q − 1}. ECDSA signatures can be generated as in Algorithm 5.46. It is necessary to check the conditions s ≠ 0 and t ≠ 0. If these conditions are not both satisfied, one re-runs the procedure with a new session key pair.
Input: A message M to be signed and the signer's private key d.
Output: The signature (M, s, t) on M.
Steps:
Generate a random session key pair (d′, d′P), 2 ≤ d′ ≤ r − 1. /* Let us denote d′P = (h, k) with h, k ∈ F_q, and treat h as an integer */
s := h (mod r).
t := d′^(−1) (H(M) + d·s) (mod r).
Return (M, s, t).
ECDSA signature verification is explained in Algorithm 5.47. The correctness of this algorithm can be proved like that of Algorithm 5.44.
Input: A signature (M, s, t) and the signer's public key dP.
Output: Verification status of the signature.
Steps:
if ((s ∉ {1, . . . , r − 1}) or (t ∉ {1, . . . , r − 1})) { Return "Signature not verified". }
w := t^(−1) (mod r).
w1 := H(M)·w (mod r).
w2 := s·w (mod r).
Q := w1·P + w2·(dP).
if (Q = O) { Return "Signature not verified". } /* Otherwise denote Q = (h, k) and treat h as an integer */
v := h (mod r).
if (v = s) { Return "Signature verified". } else { Return "Signature not verified". }
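A self-contained sketch follows, using the classic textbook curve y^2 = x^3 + x + 1 over F_23 (whose group has order 28) and the prime-order subgroup r = 7 generated by 4·(0, 1). The tiny field, the key values and the SHA-256 stand-in for H are illustrative assumptions; real ECDSA needs r > 2^160.

```python
import hashlib
import random

# Toy curve y^2 = x^3 + x + 1 over F_23; |E(F_23)| = 28, subgroup order r = 7.
p, a, b = 23, 1, 1
O = None                                   # point at infinity

def add(P, Q):
    if P is None: return Q
    if Q is None: return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None                        # P + (-P) = O (also doubles y = 0 points)
    if P == Q:
        lam = (3 * x1 * x1 + a) * pow(2 * y1, -1, p) % p
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, p) % p
    x3 = (lam * lam - x1 - x2) % p
    return x3, (lam * (x1 - x3) - y1) % p

def mul(k, P):                             # double-and-add scalar multiplication
    R = None
    while k:
        if k & 1: R = add(R, P)
        P = add(P, P); k >>= 1
    return R

G = mul(4, (0, 1))                         # base point P of prime order r = 7
r = 7

def H(M: bytes) -> int:
    return int.from_bytes(hashlib.sha256(M).digest(), "big") % r

d = 3                                      # private key
Y = mul(d, G)                              # public key dP

def sign(M: bytes):
    while True:
        dp = random.randrange(2, r)        # session key d'
        s = mul(dp, G)[0] % r              # x-coordinate of d'P, reduced mod r
        if s == 0: continue
        t = pow(dp, -1, r) * (H(M) + d * s) % r
        if t: return s, t

def verify(M: bytes, s: int, t: int) -> bool:
    if not (0 < s < r and 0 < t < r): return False
    w = pow(t, -1, r)
    Q = add(mul(H(M) * w % r, G), mul(s * w % r, Y))   # w1*P + w2*(dP) = d'P
    return Q is not None and Q[0] % r == s

M = b"ecdsa"
s, t = sign(M)
assert verify(M, s, t)
```

The verification recovers d′P as w1·P + w2·(dP), exactly as in the correctness argument for Algorithm 5.44 transplanted to the elliptic curve group.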
As discussed in Section 5.2.7, the XTR family of algorithms is an adaptation of other conventional algorithms over finite fields. XTR achieves a speed-up by a factor of about three using a clever way of representing elements in certain finite fields. It is no surprise that the DLP-based signature algorithms described so far can be given efficient XTR renderings. We explain here XTR–DSA, the XTR version of the digital signature algorithm.
In order to set up an XTR system, we need a prime p ≡ 2 (mod 3). The XTR group G is a subgroup of the multiplicative group F_{p^6}^* and has a prime order q dividing p^2 − p + 1. For compliance with the original version of DSA, one requires q to be of bit length 160. The trace map Tr : F_{p^6} → F_{p^2} taking h ↦ h + h^(p^2) + h^(p^4) is used to represent an element h ∈ G by the element Tr(h) ∈ F_{p^2}. Under this representation, arithmetic in G translates to that in F_{p^2}. For example, we have seen how exponentiation in G can be efficiently implemented using F_{p^2} arithmetic (Algorithm 5.20). The trace Tr(g) of a generator g of G should also be made available as part of the XTR domain parameters. In Section 5.2.7, we have discussed how a random set of XTR parameters (p, q, Tr(g)) can be computed.
An XTR key pair comprises a random integer d, 2 ≤ d ≤ q − 1 (the private key), and the trace Tr(g^d) (the public key). Algorithm 5.20 is used to compute Tr(g^d) from Tr(g) and d. This algorithm gives Tr(g^(d−1)) and Tr(g^(d+1)) as by-products. For an implementation of XTR–DSA, we require these two elements of F_{p^2} as well. So we assume that the public key consists of the three traces S_d(Tr(g)) = (Tr(g^(d−1)), Tr(g^d), Tr(g^(d+1))). As explained in Lenstra and Verheul [172], the values Tr(g^(d−1)) and Tr(g^(d+1)) can be computed easily from Tr(g^d) even when d is unknown, so it suffices to store only Tr(g^d) as the public key. But we avoid the details of this computation here and assume that all three traces are available to the signature verifier.
Algorithm 5.20 provides an efficient way of computing exponentiations in G. For DSA-like signature verification (cf. Algorithm 5.44), one computes products of the form g^a (g^d)^b with d unknown. In the XTR world, this amounts to computing the trace Tr(g^a (g^d)^b) from the knowledge of a, b, Tr(g) and Tr(g^d) (or S_d(Tr(g))), but without the knowledge of d. The XTR exponentiation algorithm is as such not applicable in this situation. We should, therefore, prescribe a method to compute traces of products in G. Doing that requires some mathematics that we now mention without proofs. See Lenstra and Verheul [170] for the missing details.
Let e := ab^(−1) (mod q). Then, a + bd ≡ b(e + d) (mod q), that is, Tr(g^a (g^d)^b) = Tr(g^(b(e+d))); it is thus sufficient to compute Tr(g^(e+d)) from the knowledge of e, Tr(g) and Tr(g^d). We treat the 3-tuple S_k(Tr(g)) as a row vector (over F_{p^2}). For c ∈ F_{p^2}, let M_c denote the matrix

Equation 5.9

We take c := Tr(g). It can be shown that det M_{Tr(g)} ≠ 0, that is, the matrix M_{Tr(g)} is invertible, and we have:

Equation 5.10

Here the superscript t denotes the transpose of a matrix. With these observations, one can write the procedure for computing Tr(g^a (g^d)^b) as in Algorithm 5.48.
Input: a, b, Tr(g) and S_d(Tr(g)) for some unknown d.
Output: Tr(g^a (g^d)^b).
Steps:
Compute e := ab^(−1) (mod q).
Compute S_e(Tr(g)) using Algorithm 5.20.
Using Equations (5.9) and (5.10), compute S_(e+d)(Tr(g)), and from it obtain Tr(g^(b(e+d))) = Tr(g^a (g^d)^b) by another application of Algorithm 5.20.
XTR–DSA signature generation (Algorithm 5.49) is an obvious adaptation of Algorithm 5.43.
Input: A message M to be signed and the signer's private key d.
Output: The signature (M, s, t) on M with s, t ∈ ℤ_q.
Steps:
do {
Generate a random session key d′, 2 ≤ d′ ≤ q − 1, and compute S_d′(Tr(g)) using Algorithm 5.20.
Convert Tr(g^d′) to an integer and set s to this integer reduced modulo q.
t := d′^(−1) (H(M) + d·s) (mod q).
} while (s = 0 or t = 0);
Return (M, s, t).
The bulk of the time taken by Algorithm 5.49 goes into the computation of Tr(g^d′). Since the trace representation of XTR makes this exponentiation three times as efficient as the corresponding DSA exponentiation, XTR–DSA signature generation runs nearly three times as fast as DSA signature generation.
XTR–DSA signature verification can be easily translated from Algorithm 5.44 and is shown in Algorithm 5.50. The most costly step in the XTR–DSA verification routine is the computation of Tr(g^w1 (g^d)^w2). One uses Algorithm 5.48 for this purpose. This algorithm, in turn, invokes the exponentiation Algorithm 5.20 twice. For the original DSA signature verification (Algorithm 5.44), the costliest step is the computation of g^w1 (g^d)^w2, which involves two exponentiations and a (cheap) multiplication. A careful analysis shows that XTR–DSA signature verification runs nearly 1.75 times faster than DSA verification.
|
Input: XTR–DSA signature (M, s, t) on a message M and the signer's public key (Tr(g^(d−1)), Tr(g^d), Tr(g^(d+1))).
Output: Verification status of the signature.
Steps:
if ((s ∉ {1, . . . , q − 1}) or (t ∉ {1, . . . , q − 1})) { Return "Signature not verified". }
w := t^(−1) (mod q).
w1 := H(M)·w (mod q).
w2 := s·w (mod q).
Compute Tr(g^w1 (g^d)^w2) using Algorithm 5.48, and convert it to an integer v reduced modulo q.
if (v = s) { Return "Signature verified". } else { Return "Signature not verified". }
The NTRU Signature Scheme (NSS) (Hoffstein et al. [131]) is an adaptation of the NTRU encryption algorithm discussed in Section 5.2.8. Cryptanalytic studies (Gentry et al. [110]) show that the NSS has security flaws. A newer version of the NSS, referred to as NTRUSign and resistant to these attacks, has been proposed by Hoffstein et al. [128]. In this section, we provide a brief overview of NTRUSign.
In order to set up the domain parameters for NTRUSign, we start with a positive integer n and consider the ring R := ℤ[X]/(X^n − 1). Elements of R are polynomials with integer coefficients and of degrees ≤ n − 1. The multiplication of R is denoted by ⊛, which is essentially the multiplication of two polynomials of ℤ[X] followed by setting X^n = 1. We also fix a positive integer β to be used as a modulus for the coefficients of the polynomials in R. The subsets F_νf and F_νg of R are of importance for the NTRUSign algorithm, where for a positive integer ν one defines F_ν to be the set of polynomials of R with exactly ν coefficients equal to 1 and all the remaining coefficients equal to 0, and where νf and νg are suitably chosen parameters. The message space is assumed to consist of pairs of polynomials of R with coefficients reduced modulo β. We further assume that we have at our disposal a hash function H that maps messages (that is, binary strings) to elements of this message space.
Let a = a_0 + a_1 X + · · · + a_(n−1) X^(n−1) ∈ R. The average of the coefficients of a is denoted by ā := (1/n) Σ_(i=0)^(n−1) a_i. The centred norm ‖a‖ of a is defined by

‖a‖^2 := Σ_(i=0)^(n−1) (a_i − ā)^2.

For two polynomials a, b ∈ R, one also defines

‖(a, b)‖^2 := ‖a‖^2 + ‖b‖^2.
The parameters νf and νg should be so chosen that any polynomial f ∈ F_νf and any polynomial g ∈ F_νg have (centred) norms on the order O(n). An upper bound B on the norms (of pairs of polynomials) should also be predetermined.
Typical values for NTRUSign parameters are
(n, β, νf, νg, B) = (251, 128, 73, 71, 300).
It is estimated that these choices lead to a security level at least as high as in an RSA scheme with a 1024-bit modulus. For very long-term security, one may go for (n, β) = (503, 256).
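The two basic operations on R, the convolution product ⊛ and the centred norm, can be sketched as follows; the toy value n = 7 (in place of the recommended 251) and the sample binary polynomials are illustrative choices.

```python
# Sketch of the ring R = Z[X]/(X^n - 1): convolution product and centred norm.
n = 7                                     # toy degree; NTRUSign recommends n = 251

def star(a, b):
    # a (*) b: ordinary polynomial product followed by setting X^n = 1,
    # i.e. coefficient k of the result collects all a_i * b_j with i + j ≡ k (mod n).
    c = [0] * n
    for i in range(n):
        for j in range(n):
            c[(i + j) % n] += a[i] * b[j]
    return c

def centred_norm_sq(a):
    mean = sum(a) / n                     # average of the coefficients
    return sum((x - mean) ** 2 for x in a)

f = [1, 0, 1, 1, 0, 0, 1]                 # a binary polynomial (4 coefficients equal to 1)
g = [0, 1, 1, 0, 1, 0, 0]

conv = star(f, g)                         # product in R
pair_norm_sq = centred_norm_sq(f) + centred_norm_sq(g)   # ||(f, g)||^2
```

Since ⊛ is an n-point cyclic convolution, both signing and verification cost O(n^2) coefficient operations with this schoolbook method, which is the running time quoted for NTRUSign below.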
In order to set up a key pair, the signer first chooses two random polynomials f ∈ F_νf and g ∈ F_νg. The polynomial f should be invertible modulo β, and the signer computes f_β ∈ R with the property that f_β ⊛ f ≡ 1 (mod β). The public key of the signer is the polynomial h ≡ f_β ⊛ g (mod β), whereas the private key is the tuple (f, g, F, G), where F and G are two polynomials in R satisfying

f ⊛ G − g ⊛ F = β  and  ‖F‖, ‖G‖ = O(n).
Hoffstein et al. [128] present an algorithm that, given the polynomials f and g, computes F and G whose norms ‖F‖ and ‖G‖ are bounded in terms of n and a given constant c.
Input: A message M to be signed and the signer's private key (f, g, F, G).
Output: The signature (M, s) on M.
Steps:
Compute (m1, m2) := H(M).
Compute polynomials A, B, a, b ∈ R, where a and A have coefficients in the range between −β/2 and +β/2.
Compute s ≡ f ⊛ B + F ⊛ b (mod β).
Return (M, s).
NTRUSign signature generation is described in Algorithm 5.51. It is apparent that the NTRUSign algorithm derives its security from the difficulty of computing a vector v in a certain lattice, close to the vector defined by the hashed message (m1, m2). For defining the lattice, we first note that a polynomial u = u_0 + u_1 X + · · · + u_(n−1) X^(n−1) ∈ R can be identified with the vector (u_0, u_1, . . . , u_(n−1)) of dimension n defined by its coefficients. Similarly, two polynomials u, v ∈ R define a vector, denoted by (u, v), of dimension 2n. To the public key h we associate the 2n-dimensional lattice

L_h := {(u, v) : u, v ∈ R, v ≡ u ⊛ h (mod β)}.
It is clear from the definitions that both (f, g) and (F, G) are in Lh.
If h = h_0 + h_1 X + · · · + h_(n−1) X^(n−1), then for each i = 0, 1, . . . , n − 1 we have

X^i ⊛ h(X) ≡ (h_(n−i), . . . , h_(n−1), h_0, . . . , h_(n−i−1)) (mod β),

so that the vectors (X^i, X^i ⊛ h) and (0, βX^i) all belong to L_h. It follows immediately that L_h is generated by the rows of the 2n × 2n block matrix

( I_n   H  )
( 0   βI_n ),

where I_n is the n × n identity matrix and where the i-th row of H consists of the coefficients of X^i ⊛ h (mod β).
Now, consider the signature generation routine (Algorithm 5.51). The hash function H generates from the message M a random 2n-dimensional vector m := (m1, m2) not necessarily on L_h. We then look at the vector v := (s, t) defined as:

s ≡ f ⊛ B + F ⊛ b (mod β), and
t ≡ g ⊛ B + G ⊛ b (mod β).
The lattice L_h has the rotational invariance property, namely, if (u, v) ∈ L_h, then (X^i ⊛ u, X^i ⊛ v) is also in L_h for all i = 0, 1, . . . , n − 1. More generally, if (u, v) ∈ L_h, then (w ⊛ u, w ⊛ v) ∈ L_h for any polynomial w ∈ R. In particular, since v = (s, t) = B ⊛ (f, g) + b ⊛ (F, G) (mod β) and since (f, g), (F, G) ∈ L_h, it follows that (s, t) ∈ L_h. Of these two polynomials only s is needed for the generation of NTRUSign signatures. The other is needed during signature verification and can be computed easily from s using the formula t ≡ h ⊛ s (mod β), the validity of which is established from the definition of the lattice L_h.
The vector v = (s, t) ∈ L_h is close to the message vector m in the sense that the norm ‖(m1 − s, m2 − t)‖ admits an upper bound determined by n, β and the constant c chosen earlier (see Hoffstein et al. [128] for a proof of this relation). The verification routine can, therefore, be designed as in Algorithm 5.52.
Input: A signature (M, s) and the signer's public key h.
Output: Verification status of the signature.
Steps:
Compute (m1, m2) := H(M).
Compute t ≡ h ⊛ s (mod β).
if (‖(m1 − s, m2 − t)‖ ≤ B) { Return "Signature verified". } else { Return "Signature not verified". }
For the choice (n, β, c) = (251, 128, 0.45), we have ‖(m1 – s, m2 – t)‖ ≈ 216. Therefore, choosing the norm bound B slightly larger than this value (say, B = 300) allows the verification scheme to work correctly most of the time. The knowledge of the private key (f, g, F, G) allows the legitimate signer to compute the close vector (s, t) easily. On the other hand, for a forger (who is lacking the private information) fast computation of a vector v′ = (s′, t′) with small norm ‖(m1 – s′, m2 – t′)‖ (say ≤ 400 for the above parameter values) seems to be an intractable task. This is precisely why forging an NTRUSign signature is considered infeasible.
An exhaustive search can be mounted for generating a valid signature (s′, t′) on a message M with H(M) = (m1, m2). More precisely, a forger fixes half of the 2n coefficients of the polynomials s′ and t′ and then tries to solve t′ ≡ h ⊛ s′ (mod β) for the remaining half such that the norm ‖(m1 – s′, m2 – t′)‖ is small. It is estimated (see Hoffstein et al. [128] for the details) that the probability that a random guess for the unknown half succeeds is very low (≤ 2–178.44 for the given parameter values).
Another attack on the NTRUSign scheme is to determine the polynomials f, g from a knowledge of h. Since (f, g) is a short non-zero vector in the lattice Lh, an algorithm that can find such vectors can determine (f, g) (or a rotated version of it). However, for a proper choice of the parameters such an algorithm is deemed infeasible. (Also see the NTRU encryption scheme in Section 5.2.8.)
Similar to the NTRU encryption scheme, the NTRUSign scheme is fast: both signature generation and verification can be carried out in time O(n^2). This is one of the main reasons why the NTRUSign scheme has attracted attention; indeed, it has been considered for adoption as an IEEE standard. Unfortunately, however, several attacks on NTRUSign are known. Gentry and Szydlo [111] indicate the possibility of extending the attacks of Gentry et al. [110]. Nguyen [217] proposes a more concrete attack on NTRUSign that is capable of recovering the private key from only 400 signatures. The future of NTRUSign and its modifications remains uncertain.
Suppose that an entity (Alice), referred to as the sender or the user, wants to get a message M signed by a second entity (Bob), called the signer, without revealing M to Bob. This can be achieved as follows. First, Alice transforms the message M to M̃ := f(M) and sends M̃ to Bob. Bob generates the signature (M̃, σ) on M̃ and sends this pair back to Alice. Finally, Alice applies a second transform g to σ to generate the signature s of Bob on M. The transform f hides the actual message M from Bob and thereby disallows Bob from associating Alice with the signed message (M, s). Such a signature scheme is called a blind signature scheme.
Blind signatures are widely used in electronic payment systems in which Alice (a customer) wants the signature of Bob (the bank) on an electronic coin, but does not want the bank to be capable of associating Alice with the coin. In this way, Alice achieves anonymity while spending an electronic coin.
In a blind signature scheme, Bob does not know M, but his signature on the transformed message is essential for Alice to reconstruct the signature on M. Furthermore, the blind signature on M should not allow Alice to compute the blind signature on another message M′. More generally, Alice should not be able to generate l + 1 (or more) blind signatures with only l (or fewer) interactions with Bob. A forgery of this kind is often called an (l, l + 1)-forgery, or a one-more forgery (in case l is bounded above by a polynomial in the security parameter), or a strong one-more forgery (in case l is bounded above poly-logarithmically in the security parameter). An (l, l + 1)-forgery is mountable on a scheme which is not existentially unforgeable (Exercises 5.15 and 5.19). Usually, existential forgery gives forged signatures on messages over which the forger has no (or little) control (that is, on messages which are likely to be meaningless).
Now, we describe some common blind signature schemes. We provide a brief overview of the algorithms. Detailed analysis of the security of these schemes can be found in the references cited at the end of this chapter.
Chaum’s blind signature protocol is based on the intractability of the RSAP (or the IFP). The signer generates two (distinct) large random primes p and q and computes n := pq. He then chooses a random integer e with gcd(e, φ(n)) = 1 and computes an integer d such that ed ≡ 1 (mod φ(n)). The public key (of the signer) is the pair (n, e), whereas the private key is d. Chaum’s protocol works as in Algorithm 5.53.
Input: A message M generated by Alice.
Output: Bob's blind RSA signature (M, s) on M.
Steps:
Alice hashes the message M to m := H(M).
Alice chooses a random ρ with gcd(ρ, n) = 1.
Alice sends m̃ := ρ^e m (mod n) to Bob.
Bob generates the signature σ := m̃^d (mod n) on m̃.
Bob sends σ to Alice.
Alice computes Bob's (blind) signature s := ρ^(−1) σ (mod n) on M.
Since σ ≡ (ρ^e m)^d ≡ ρ m^d (mod n), we have s ≡ ρ^(−1) σ ≡ m^d (mod n), that is, s is indeed the RSA signature of Bob on M. Bob sees only the blinded value ρ^e m (mod n) and gains no idea about m, since ρ is randomly and secretly chosen by Alice.
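The blinding and unblinding steps can be sketched as follows; the toy modulus and the SHA-256 stand-in for the hash are illustrative assumptions.

```python
import hashlib
import random
from math import gcd

# Toy RSA parameters for Bob the signer; SHA-256 mod n stands in for the hash.
p, q, e = 61, 53, 17
n = p * q
d = pow(e, -1, (p - 1) * (q - 1))      # Bob's private key

def H(M: bytes) -> int:
    return int.from_bytes(hashlib.sha256(M).digest(), "big") % n

# --- Alice blinds her message ---
M = b"one electronic coin"
m = H(M)
while True:
    rho = random.randrange(2, n)
    if gcd(rho, n) == 1:
        break
m_blind = (pow(rho, e, n) * m) % n     # rho^e m (mod n), sent to Bob

# --- Bob signs the blinded value (he learns nothing about m) ---
sigma = pow(m_blind, d, n)             # sigma ≡ (rho^e m)^d ≡ rho m^d (mod n)

# --- Alice unblinds ---
s = (pow(rho, -1, n) * sigma) % n      # s ≡ rho^{-1} sigma ≡ m^d (mod n)
assert s == pow(m, d, n)               # a valid ordinary RSA signature on M
```

Anyone can now verify (M, s) with Bob's public key as a normal RSA signature, yet Bob cannot link s to the blinded value he actually signed.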
Let G be a finite multiplicative Abelian group and let g ∈ G be of order r (a large prime). We assume that computing discrete logarithms in G is an infeasible task. The key pair of the signer is denoted by (d, g^d), where the integer d, 2 ≤ d ≤ r − 1, is the private key and g^d the public key. The Schnorr blind signature protocol is described in Algorithm 5.54.
Input: A message M generated by Alice.
Output: Bob's blind Schnorr signature (M, s, t) on M.
Steps:
Alice asks Bob to initiate a communication.
Bob chooses a random session key d̄, 2 ≤ d̄ ≤ r − 1.
Bob sends ū := g^d̄ to Alice.
Alice selects α, β randomly from {2, . . . , r − 1}.
Alice computes u := ū g^α (g^d)^β.
Alice computes s := H(M ‖ u) and s̄ := s − β (mod r).
Alice sends s̄ to Bob.
Bob computes t̄ := d̄ − d·s̄ (mod r).
Bob sends t̄ to Alice.
Alice computes t := t̄ + α (mod r) and obtains the signature (M, s, t).
It is easy to check that the output (M, s, t) of Algorithm 5.54 is a valid Schnorr signature of Bob on the message M. The session key d′ (Algorithm 5.38) implicit in this signature is determined jointly by Bob's session key and Alice's masking values α and β. Since d and his session key are secrets of Bob, Alice must depend on Bob for the computation of the t-component. The message M is never sent to Bob, and its hash is masked by β. This is how the protocol achieves blindness.
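The following sketch instantiates one standard form of the masking steps (consistent with the Schnorr verification equation); the toy subgroup, the choice of masking formulas and the SHA-256 stand-in for H are illustrative assumptions.

```python
import hashlib
import random

# Toy subgroup: p = 2r + 1 with r prime; g = 4 has order r in Z_p^*.
p, r, g = 2039, 1019, 4

def H(M: bytes, u: int) -> int:
    data = M + b"|" + u.to_bytes(8, "big")
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % r

d = 321                                # Bob's private key
y = pow(g, d, p)                       # Bob's public key g^d

# Bob initiates with a secret session key dbar.
dbar = random.randrange(2, r)
u_bar = pow(g, dbar, p)                # sent to Alice

# Alice blinds with alpha and beta.
alpha, beta = random.randrange(2, r), random.randrange(2, r)
u = (u_bar * pow(g, alpha, p) * pow(y, beta, p)) % p
M = b"blind me"
s = H(M, u)                            # the final s-component
s_bar = (s - beta) % r                 # masked challenge, sent to Bob

# Bob answers with the masked t-component.
t_bar = (dbar - d * s_bar) % r

# Alice unblinds.
t = (t_bar + alpha) % r

# (M, s, t) verifies as an ordinary Schnorr signature of Bob.
u_check = (pow(g, t, p) * pow(y, s, p)) % p
assert s == H(M, u_check)
```

Bob sees only s̄ = s − β and never sees M or u, so he cannot later link the published signature (M, s, t) to this protocol run.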
Okamoto's adaptation of the Schnorr scheme is proven resistant to attacks by a third entity (Pointcheval and Stern [237]). As in the Schnorr scheme, we fix a (finite multiplicative Abelian) group G (in which it is difficult to compute discrete logarithms). We then choose two elements g1, g2 ∈ G of (large prime) order r. The private key of the signer now comprises a pair (d1, d2) of integers in {2, . . . , r − 1}, whereas the public key y is the group element y := g1^d1 g2^d2. We assume that there is a hash function H whose outputs are in ℤ_r. We identify elements of G with bit strings. The Okamoto–Schnorr blind signature protocol is explained in Algorithm 5.55.
Input: A message M generated by Alice.
Output: Bob's blind signature (M, s1, s2, s3) on M.
Steps:
Alice asks Bob to initiate a communication.
Bob chooses random session values d̄1, d̄2 ∈ {2, . . . , r − 1}.
Bob sends ū := g1^d̄1 g2^d̄2 to Alice.
Alice selects α, β, γ randomly from {2, . . . , r − 1}.
Alice computes u := ū g1^α g2^β y^γ.
Alice computes s1 := H(M ‖ u) and s̄1 := s1 − γ (mod r).
Alice sends s̄1 to Bob.
Bob computes s̄2 := d̄1 − d1·s̄1 (mod r) and s̄3 := d̄2 − d2·s̄1 (mod r).
Bob sends (s̄2, s̄3) to Alice.
Alice computes s2 := s̄2 + α (mod r) and s3 := s̄3 + β (mod r), and obtains the signature (M, s1, s2, s3).
An Okamoto–Schnorr signature (M, s1, s2, s3) on a message can be verified by checking the equality s1 = H(M ‖ u), where u := g1^s2 g2^s3 y^s1. Each invocation of the protocol uses a fresh pair of session values chosen by Bob. Alice must depend on Bob for generating s2 and s3, because she is unaware of the private values d1 and d2 and of Bob's session values. Alice, in an attempt to forge Bob's blind signature, may start with random values of her own choice, but she still needs the integers d1 and d2 in order to complete the protocol. The blindness of Algorithm 5.55 stems from the fact that the message M is never sent to Bob and its hash is masked by γ.
So far we have seen signature schemes for which any entity with a knowledge of the signer’s public key can verify the authenticity of a signature. There are, however, situations where an active participation of the signer is necessary for the verification of a signature. Moreover, during a verification interaction a signer should not be allowed to deny a legitimate signature made by him. A signature meeting these requirements is called an undeniable signature.
Undeniable signatures are typically used for messages that are too confidential or private to be given an unlimited verification facility. In case of a dispute, an entity should be capable of proving a forged signature to be forged, and at the same time must accept the binding to his own valid signatures. So, in addition to the signature generation and verification protocols, an undeniable signature scheme comes with a denial (or disavowal) protocol to guard against a cheating signer who is unwilling to accept his valid signature, either by not taking part in the verification interaction, or by responding incorrectly, or by claiming a valid signature to be forged.
There are applications where undeniable signatures are useful. For example, a software vendor can use undeniable signatures to prove the authenticity of its products only to its (paying) customers (and not to everybody).
Chaum and van Antwerpen [52, 51] gave the first concrete realization of an undeniable signature scheme. It is based on the intractability of computing discrete logarithms in the group ℤ_p^*, p a prime. Gennaro et al. [109] later adapted the algorithm to design an RSA-based undeniable signature scheme. We now describe these two schemes. Rigorous studies of these schemes can be found in the original papers. See also [53, 186, 187, 102, 202, 230].
For setting up the domain parameters for Chaum–van Antwerpen (CvA) signatures, Bob chooses a (large) prime p of the form p = 2r + 1, where r is also a prime. (Such a prime p is called a safe prime (Definition 3.5).) Bob finds a random element g ∈ ℤ_p^* of multiplicative order r, selects a random integer d, 2 ≤ d ≤ r − 1, and computes y := g^d (mod p). Bob publishes (p, g, y) as his public key and keeps the integer d secret as his private key. The value d^(−1) (mod r) is needed during verification and can be precomputed and stored (secretly) along with d. We assume that we have a hash function H that maps messages (that is, bit strings) to elements of the subgroup of order r in ℤ_p^*. In order to generate a CvA signature on a message M, Bob carries out the steps given in Algorithm 5.56. Verification of Bob's CvA signature by Alice involves the interaction given in Algorithm 5.57.
Algorithm 5.56: CvA signature generation

Input: The message M to be signed and the signer’s private key (p, d).
Output: The signature (M, s) on M.
Steps:
  m := H(M).
  s := m^d (mod p).
If (M, s) is a valid CvA signature, then

v ≡ (s^i y^j)^(d⁻¹ (mod r)) ≡ ((m^d)^i (g^d)^j)^(d⁻¹ (mod r)) ≡ m^i g^j ≡ v′ (mod p).
On the other hand, if s ≢ m^d (mod p), Bob can guess the element v′ with a probability of only 1/r, even under the assumption that Bob has unbounded computing resources. This means that unless the signature (M, s) is valid, it is extremely unlikely that Bob can make Alice accept the signature.
The denial protocol for the CvA scheme involves an interaction between the prover Bob and the verifier Alice, as given in Algorithm 5.58. In order to see how this denial protocol works, we note that Algorithm 5.58 essentially makes two calls of the verification protocol. First assume that Bob executes the protocol honestly, that is, Bob follows the steps as indicated. If the signature (M, s) is a valid one, the check v1 ≡ m^i1 g^j1 (mod p) (as well as the check v2 ≡ m^i2 g^j2 (mod p)) should succeed, and Alice’s decision to accept the signature as valid is justified. On the other hand, if (M, s) is a forged signature, that is, if s ≢ m^d (mod p), then the probability that each of these checks succeeds is 1/r, as discussed before. Thus, it is extremely unlikely that a forged signature is accepted as valid by Alice. So Alice eventually computes both w1 and w2 equal to s^(i1 i2 d⁻¹ (mod r)) (mod p) and accepts the signature to be forged. Finally, suppose that Bob intends to deny the (purported) signature (M, s). If Bob does not fully take part in the interaction, then his intention becomes clear. Otherwise, he sends v1 and/or v2 not computed according to the specified formulas. In that case, Bob succeeds in making Alice compute w1 = w2 with a probability of only 1/r. Thus, it is extremely unlikely that Bob, executing this protocol dishonestly, can successfully disavow a valid signature.
Algorithm 5.57: CvA signature verification

Input: A CvA signature (M, s) on a message M.
Output: Verification status of the signature.
Steps:
  Alice computes m := H(M).
  Alice chooses two secret random integers i, j ∈ {1, 2, . . . , r − 1}.
  Alice computes u := s^i y^j (mod p).
  Alice sends u to Bob.
  Bob computes v := u^(d⁻¹ (mod r)) (mod p).
  Bob sends v to Alice.
  Alice computes v′ := m^i g^j (mod p).
  Alice accepts the signature (M, s) if and only if v = v′.
Algorithm 5.58: Denial protocol for CvA signatures

Input: A (purported) CvA signature (M, s) of Bob on a message M.
Output: One of the following decisions by Alice: the signature is valid, the signature is forged, or Bob is dishonestly trying to disavow his valid signature.
Steps:
  Alice computes m := H(M).
  Alice chooses two secret random integers i1, j1 ∈ {1, 2, . . . , r − 1}.
  Alice computes u1 := s^i1 y^j1 (mod p) and sends u1 to Bob.
  Bob computes v1 := u1^(d⁻¹ (mod r)) (mod p) and sends v1 to Alice.
  if (v1 ≡ m^i1 g^j1 (mod p)) { Alice declares the signature valid and quits. }
  Alice chooses two other secret random integers i2, j2 ∈ {1, 2, . . . , r − 1}.
  Alice computes u2 := s^i2 y^j2 (mod p) and sends u2 to Bob.
  Bob computes v2 := u2^(d⁻¹ (mod r)) (mod p) and sends v2 to Alice.
  if (v2 ≡ m^i2 g^j2 (mod p)) { Alice declares the signature valid and quits. }
  Alice computes w1 := (v1 g^(−j1))^i2 (mod p) and w2 := (v2 g^(−j2))^i1 (mod p).
  if (w1 = w2) { Alice declares the signature forged. }
  else { Alice concludes that Bob is dishonestly trying to disavow his valid signature. }
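As an illustration, the following Python sketch simulates the CvA protocols with toy parameters (our own assumptions: p = 2039 = 2·1019 + 1, g = 4 of order r = 1019, and a SHA-256-based hash into the order-r subgroup; such sizes offer no real security). Both parties are played by one program, so Bob's private key d is passed to the simulated verification and denial rounds.

```python
import hashlib
from random import randrange

# Toy CvA parameters (illustrative only): p = 2r + 1 is a safe prime,
# and g = 2^2 generates the subgroup of squares, which has prime order r.
p, r, g = 2039, 1019, 4

def H(msg: bytes) -> int:
    """Hash into the order-r subgroup of Z_p^* (the squares mod p)."""
    h = int.from_bytes(hashlib.sha256(msg).digest(), "big") % (p - 1) + 1
    return pow(h, 2, p)

def keygen():
    d = randrange(1, r)
    return d, pow(g, d, p)                    # private d, public y = g^d mod p

def sign(msg: bytes, d: int) -> int:
    return pow(H(msg), d, p)                  # Algorithm 5.56: s = H(M)^d mod p

def bob_respond(u: int, d: int) -> int:
    return pow(u, pow(d, -1, r), p)           # v = u^(d^-1 mod r) mod p

def verify(msg: bytes, s: int, d: int, y: int) -> bool:
    """One honest round of Algorithm 5.57 (both parties simulated)."""
    m = H(msg)
    i, j = randrange(1, r), randrange(1, r)
    u = pow(s, i, p) * pow(y, j, p) % p       # Alice's challenge
    v = bob_respond(u, d)                     # Bob's response
    return v == pow(m, i, p) * pow(g, j, p) % p

def denial(msg: bytes, s: int, d: int, y: int) -> str:
    """One honest round of the denial protocol (Algorithm 5.58)."""
    m = H(msg)
    i1, j1 = randrange(1, r), randrange(1, r)
    v1 = bob_respond(pow(s, i1, p) * pow(y, j1, p) % p, d)
    if v1 == pow(m, i1, p) * pow(g, j1, p) % p:
        return "valid"
    i2, j2 = randrange(1, r), randrange(1, r)
    v2 = bob_respond(pow(s, i2, p) * pow(y, j2, p) % p, d)
    if v2 == pow(m, i2, p) * pow(g, j2, p) % p:
        return "valid"
    w1 = pow(v1 * pow(g, -j1, p) % p, i2, p)  # w1 = (v1 g^-j1)^i2 mod p
    w2 = pow(v2 * pow(g, -j2, p) % p, i1, p)  # w2 = (v2 g^-j2)^i1 mod p
    return "forged" if w1 == w2 else "signer is cheating"
```

A valid signature passes `verify`, while a forgery such as s·g (mod p) fails verification and is declared forged by the denial round when Bob plays honestly.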
Gennaro, Krawczyk and Rabin’s undeniable signature scheme (the GKR scheme) is based on the (intractability of the) RSA problem.
A GKR key pair differs from a usual RSA key pair. The signer chooses two (large) random primes p and q such that both p′ := (p − 1)/2 and q′ := (q − 1)/2 are also prime, and sets n := pq. Two integers e and d satisfying ed ≡ 1 (mod φ(n)) are then selected. Finally, one requires an element g ∈ ℤ_n^*, g ≠ 1, and y ≡ g^d (mod n). The public key of the signer is the tuple (n, g, y), whereas the private key is the pair (e, d). It can be shown that g need not be a random element of ℤ_n^*. Choosing a (fixed) small value of g (for example, g = 2) does not affect the security of the GKR protocol, but makes certain operations (computing powers of g) efficient.
Algorithm 5.59: GKR signature generation

Input: The message M to be signed and the signer’s private key (e, d).
Output: The signature (M, s) on M.
Steps:
  m := H(M).
  s := m^d (mod n).
GKR signature generation (Algorithm 5.59) is the same as in RSA. The verification protocol described in Algorithm 5.60 accepts, in addition to a valid GKR signature (M, s), the signatures (M, αs), where α ∈ ℤ_n^* has multiplicative order 1 or 2 (there are four such values of α). In view of this, we define the subset

Sig_M := { αH(M)^d (mod n) | α ∈ ℤ_n^* of order ≤ 2 }

of ℤ_n^*. Any element s ∈ Sig_M is considered to be a valid signature on M. Since Bob knows p and q, he can easily find out all the elements α of ℤ_n^* of order ≤ 2 and can choose to output (M, αH(M)^d (mod n)) as the GKR signature for any such α. Taking α = 1 (as in Algorithm 5.59) is the canonical choice, but during the execution of the denial protocol Bob will not be allowed to disavow other valid choices.
The interaction between the prover Bob and the verifier Alice during GKR signature verification is given in Algorithm 5.60. It is easy to see that if (M, s) is a valid GKR signature, then v = v′. On the other hand, if (M, s) is a forged signature, that is, if s ∉ Sig_M, then the equality v = v′ occurs with only negligible probability, even in the case that the forger has unbounded computational resources.
Algorithm 5.60: GKR signature verification

Input: A GKR signature (M, s) on a message M.
Output: Verification status of the signature.
Steps:
  Alice computes m := H(M).
  Alice chooses random integers i, j.
  Alice computes u := s^(2i) y^j (mod n).
  Alice sends u to Bob.
  Bob computes v := u^e (mod n).
  Bob sends v to Alice.
  Alice computes v′ := m^(2i) g^j (mod n).
  Alice accepts the signature (M, s) if and only if v = v′.
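A toy simulation of GKR signing and the verification protocol of Algorithm 5.60 is sketched below. The parameters (p = 23, q = 47, e = 3, g = 2) and the SHA-256-based hash are our own illustrative assumptions; real parameters must be enormously larger.

```python
import hashlib
from math import gcd
from random import randrange

# Toy GKR parameters (illustrative only): p = 2p'+1, q = 2q'+1 with p', q' prime.
p, q = 23, 47                        # p' = 11, q' = 23
n = p * q                            # 1081
e, d = 3, 675                        # e*d = 2025 = 2*phi(n) + 1, phi(n) = 1012
g = 2
y = pow(g, d, n)                     # public y = g^d mod n

def H(msg: bytes) -> int:
    """Hash into Z_n^* (skip the rare values sharing a factor with n)."""
    h = int.from_bytes(hashlib.sha256(msg).digest(), "big") % n
    while h == 0 or gcd(h, n) != 1:
        h += 1
    return h

def sign(msg: bytes) -> int:
    return pow(H(msg), d, n)         # Algorithm 5.59: s = H(M)^d mod n

def verify(msg: bytes, s: int) -> bool:
    """One honest round of Algorithm 5.60 (both parties simulated)."""
    m = H(msg)
    i, j = randrange(1, 253), randrange(1, 253)   # 253 = p'q' (toy range)
    u = pow(s, 2 * i, n) * pow(y, j, n) % n       # Alice's challenge
    v = pow(u, e, n)                              # Bob's response
    return v == pow(m, 2 * i, n) * pow(g, j, n) % n
```

For a valid signature, v = u^e = m^(2i) g^j because de ≡ 1 (mod φ(n)); a tampered signature makes the check fail.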
Algorithm 5.61: Denial protocol for GKR signatures

Input: A (purported) GKR signature (M, s) of Bob on a message M.
Output: One of the following decisions by Alice: the signature is forged, or Bob is trying to disavow his valid signature.
Steps:
  Alice computes m := H(M).
  Alice chooses a random integer i ∈ {1, 2, . . . , k} and a random integer j.
  Alice computes w1 := m^i g^j (mod n) and w2 := s^i y^j (mod n).
  Alice sends (w1, w2) to Bob.
  Bob computes m := H(M).
  Bob determines i′ ∈ {1, 2, . . . , k} satisfying Congruence (5.11).
  if (no such i′ is found) { /* This may happen, if Alice has cheated */ Bob aborts the protocol. }
  Bob sends i′ to Alice.
  if (i′ = i) { Alice concludes that the signature is forged. }
  else { Alice concludes that Bob is trying to disavow his valid signature. }
The denial protocol for the GKR scheme is described in Algorithm 5.61. This protocol is executed after verification by Algorithm 5.60 fails. In that case, Alice wants to ascertain whether the signature is actually invalid or whether Bob has denied his valid signature by incorrectly executing the verification protocol. A small integer k is predetermined for the denial protocol. The prover needs a running time proportional to k, whereas the probability of a successful denial of a valid signature decreases with k. Taking k = O(lg n) gives optimal performance.
In order to see how this protocol prevents Bob from denying a valid signature, first consider the case that (M, s) is a valid GKR signature of Bob. In that case, s ≡ αm^d (mod n) for some α ∈ ℤ_n^* of order ≤ 2. On the other hand, s^e ≡ α^e m^(de) ≡ α^e m (mod n). Therefore, for every i′ ∈ {1, 2, . . . , k}, Congruence (5.11) is satisfied. Thus, Bob can only guess the secret value of i chosen by Alice, and the guess is correct with a probability of 1/k. On the other hand, if (M, s) is a forged signature, Congruence (5.11) holds only for a single i′, that is, for i′ = i (Exercise 5.23). Sending i′ will then convince Alice that the signature is really forged. In both these cases, Congruence (5.11) holds for at least one i′. Failure to detect such an i′ implies that the value(s) of w1 and/or w2 have not been correctly sent by Alice. The protocol should then be aborted.
In order to reduce the probability of successful cheating, it is convenient to repeat the protocol a few times instead of increasing k. If k = 1024, Bob can successfully cheat in eight executions of the denial protocol with a probability of only 2⁻⁸⁰.
The conventional way to ensure both authentication and confidentiality of a message is to sign the message first and then encrypt the signed message. Now that we have many signature and encryption algorithms at our disposal, there is hardly any problem in achieving both goals simultaneously. Zheng proposes signcryption schemes that combine these two operations. A signcryption scheme is better than a sign-and-encrypt scheme in two respects. First, the combined primitive takes less running time than the composite primitive comprising signature generation followed by encryption. Second, a signcrypted message is smaller than a signed-and-encrypted message. When communication overheads need to be minimized, signcryption proves to be useful.
Before describing the signcryption primitive, let us first review the composite sign-and-encrypt scheme. Let M be the message to be sent. Alice the sender generates the signature appendix s on M using one of the signature schemes described earlier. This step can be described as s = fs(M, da), where da is the private key of Alice. Next a symmetric key k is generated by Alice. The message M is encrypted by a symmetric cipher (like DES) under the key k, that is, C := E(M, k). The key k is then encrypted using an asymmetric routine under the public key eb of Bob the recipient, that is, c = fe(k, eb). The triple (C, c, s) is then transmitted to Bob.
Upon reception of (C, c, s) Bob first retrieves k using his private key db, that is, k = fd(c, db). The message M is then recovered by symmetric decryption: M = D(C, k). Finally, the authenticity of M is verified from the signature using the verification operation: fv(M, s, ea), where ea is the public key of Alice. Algorithm 5.62 describes the sign-and-encrypt operation and its inverse.
Algorithm 5.62: The composite sign-and-encrypt scheme and its inverse

Sign-and-encrypt:
  s := fs(M, da).
  Generate a random symmetric key k.
  c := fe(k, eb).
  C := E(M, k).
  Send (C, c, s) to the recipient.
Decrypt-and-verify:
  k := fd(c, db).
  M := D(C, k).
  Verify the signature: fv(M, s, ea).
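The composite scheme can be sketched as follows, with textbook RSA standing in for the asymmetric transforms fs/fv and fe/fd, and a SHA-256 keystream XOR standing in for the symmetric cipher E/D. All of these concrete choices, the tiny key sizes and the function names are our own illustrative assumptions.

```python
import hashlib
from random import randrange

# Toy RSA key pairs (illustrative sizes only -- never use such sizes in practice).
na, ea, da = 3233, 17, 2753          # Alice's signing pair: n = 61*53
nb, eb, db = 1081, 3, 675            # Bob's encryption pair: n = 23*47

def Hm(M: bytes) -> int:
    return int.from_bytes(hashlib.sha256(M).digest(), "big") % na

def E(M: bytes, k: int) -> bytes:
    """'Symmetric cipher': XOR with a SHA-256 keystream derived from k."""
    stream, ctr = b"", 0
    while len(stream) < len(M):
        stream += hashlib.sha256(k.to_bytes(8, "big") + ctr.to_bytes(4, "big")).digest()
        ctr += 1
    return bytes(a ^ b for a, b in zip(M, stream))

D = E                                 # an XOR stream cipher is its own inverse

def sign_and_encrypt(M: bytes):
    s = pow(Hm(M), da, na)            # s = fs(M, da): RSA signature on H(M)
    k = randrange(1, nb)              # random symmetric key
    c = pow(k, eb, nb)                # c = fe(k, eb)
    C = E(M, k)                       # C = E(M, k)
    return C, c, s

def decrypt_and_verify(C: bytes, c: int, s: int):
    k = pow(c, db, nb)                # k = fd(c, db)
    M = D(C, k)                       # M = D(C, k)
    ok = pow(s, ea, na) == Hm(M)      # fv(M, s, ea)
    return M, ok
```

A round trip recovers the plaintext and verifies the signature; note that the triple (C, c, s) is what travels to Bob, exactly as in Algorithm 5.62.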
Zheng’s signcryption scheme combines fs and fe into a single operation fse, and also fd and fv into another single operation fdv. Each of these combined operations essentially takes the time of a single public- or private-key operation and hence leads to a performance enhancement by a factor of nearly two. Moreover, the encrypted key c need not be sent with the message, that is, C and s are sufficient for both authentication and confidentiality. This reduces communication overhead.
Signcryption is based on shortened digital signature schemes. Table 5.3 describes the shortened versions of DSA (Section 5.4.6). We use the notations of Algorithms 5.43 and 5.44. Also, ‖ denotes concatenation of strings, and H is a hash function (like SHA-1). The shortened schemes have two advantages over the original DSA. First, a DSA signature is of length 2|r|, whereas an SDSA1 or SDSA2 signature has length |r| + |H(·)|. For the current version of the standard, both r and H(·) are of size 160 bits. However, one may use a potentially bigger r, and in that case the shortened schemes give smaller signatures with equivalent security. Second, DSA requires computing a modular inverse during verification, whereas SDSA does not. So verification is more efficient in the shortened schemes.
| Name | Signature generation | Signature verification |
|---|---|---|
| SDSA1 | s := H(g^d′ (mod p) ‖ M). t := d′(s + d)⁻¹ (mod r). | w := (ea·g^s)^t (mod p). Verify if s = H(w ‖ M). |
| SDSA2 | s := H(g^d′ (mod p) ‖ M). t := d′(1 + ds)⁻¹ (mod r). | w := (g·ea^s)^t (mod p). Verify if s = H(w ‖ M). |
Algorithms 5.63 and 5.64 provide the details of the signcryption algorithm and its inverse, called unsigncryption. The algorithms use a keyed hash function KH. One may implement KH(x, k) as H(x ‖ k) using an unkeyed hash function H.
Signcryption differs from the shortened scheme in that eb^d′ (mod p) is used instead of g^d′ for the computation of s. The running time of the signcryption algorithm is dominated by this modular exponentiation. When signature and encryption are used separately, the encryption operation uses one (or more) exponentiations. So signcryption significantly improves upon the sign-and-encrypt scheme of Algorithm 5.62.
Algorithm 5.63: Signcryption

Input: Plaintext message M, the sender’s private key da, the recipient’s public key eb = g^db (mod p).
Output: The signcrypted message (C, s, t).
Steps:
  Select a random d′ ∈ {1, 2, . . . , r − 1}.
Algorithm 5.64: Unsigncryption

Input: The signcrypted message (C, s, t), the sender’s public key ea = g^da (mod p) and the recipient’s private key db.
Output: The plaintext message M and the verification status of the signature.
Steps:
  Write k := k1 ‖ k2 with |k2| equal to the length of a symmetric key.
  if (KH(M‖N, k1) = s) { Return “Signature verified”. } else { Return “Signature not verified”. }
The most time-consuming part of unsigncryption is the computation of two modular exponentiations. DSA verification too has this property. However, an additional decryption in the decrypt-and-verify scheme of Algorithm 5.62 calls for one (or more) exponentiations, making it slower than unsigncryption.
Exercises

5.15
5.16 Assume that Bob uses the same RSA key pair ((n, e), d) both for receiving encrypted messages and for signing. Suppose that Carol intercepts the ciphertext c ≡ m^e (mod n) sent by Alice. Also suppose that Bob is willing to sign any random message presented by Carol. Explain how Carol can choose a message to be signed by Bob in order to retrieve the secret m. [H]
5.17 Let G be a finite cyclic group of order n, and g a generator of G. Suppose that Alice’s private and public keys are respectively d and g^d.
5.18 Show that:
     (Here we call a signature valid, if it passes the verification routine.)
5.19
5.20 Design the XTR version of the Nyberg–Rueppel signature scheme with appendix (Section 5.4.5). What are the speed-ups achieved by the signature generation and verification routines of the XTR version over the original NR routines?
5.21 Repeat Exercise 5.20 with the Schnorr digital signature scheme (Section 5.4.4).
5.22
5.23 Let p, q, p′, q′ be distinct odd primes with p = 2p′ + 1 and q = 2q′ + 1, and let n := pq (as in the RSA-based undeniable signature scheme).
5.24
Entity authentication (also called identification) is a process by means of which an entity Alice, called the claimant, proves her identity to another entity Bob, called the verifier. Alice is assumed to possess some secret piece(s) of information that no intruder is expected to know. During the execution of the identification protocol, an interaction takes place between Alice and Bob. If the interaction allows Bob to conclude (deterministically or with high probability) that the claimant possesses the secret knowledge, he accepts the claimant as Alice. An intruder Carol lacking the secret information is expected (with high probability) to fail to convince Bob of her identity as Alice. This is how entity authentication schemes prevent impersonation attacks by intruders. Typically, identification schemes are used to protect access to some sensitive piece(s) of data, like a user’s (or a group’s) private files in a computer or an account in a bank. Both secret-key and public-key techniques are used for realizing entity authentication protocols.
A password is a small string to be remembered by an entity and produced verbatim to the verifier at the time of identification. The most common example is a computer password used to protect access to a user’s private working area in a file system. In this case, an alphanumeric string (or a string that can be input using a computer keyboard) of length between 4 and 20 characters is normally used as the secret information associated with an entity. Passwords are also used to prevent misuse of certain physical objects (like an ATM card for withdrawing cash from one’s bank account, a prepaid telephone card) by anybody other than the legitimate owners of the objects. In this case, a password usually consists of a sequence of four to ten digits and is also called a personal identification number or a PIN.
In order that Bob can recognize an entity from her password, one possibility is for Bob to store the (entity, password) pairs for all the entities that are expected to participate in identification interactions with Bob. When Alice enters her password, Bob checks whether Alice’s input matches what he has stored for Alice. The file(s) storing these private records should be preserved with high secrecy, and neither read nor write access should be granted to any user. But a privileged user (the superuser) is usually given the capability to inspect any file (even read-protected ones) and can, therefore, misuse the passwords.
This problem can be avoided by storing, instead of the passwords themselves, a one-way transform of the passwords.[3] When Alice enters a password P, Bob computes the transform f(P) and compares f(P) with the record stored for Alice. The identity of Alice is accepted if and only if a match occurs. The password file now need not be read-protected, since any intruder (even the superuser) knowing the value f(P) cannot easily compute P.
[3] Informally speaking, a one-way function is one which is computationally infeasible to invert.
Passwords should be chosen from a space large enough to preclude exhaustive search by an intruder in feasible time. Unfortunately, however, it is a common tendency for human users to choose passwords from limited subsets of the allowed space. For example, use of lower case characters, dictionary words, popular names, birth dates and so on in passwords makes attacks on passwords much easier. A strategy to foil such dictionary-based attacks is to use a pseudorandom bit sequence S known as the salt and apply the one-way function f to a combination of the password P and the salt S. That is, a function f(P, S) is now stored against an entity Alice having a password P. The combination (P, S) is often referred to as a key for the password scheme. Since a password now corresponds to many possible keys, the search space for an intruder increases dramatically. For instance, if S is a pseudorandomly chosen bit string of length 64, the intruder has to compute f(P, S) for up to 2^64 values of S for each P under trial. It is also necessary that the same key is not chosen for two different entities. If the salt S is a 64-bit string, then by the birthday paradox a collision between two keys is expected to occur only after (at least) 2^32 keys are generated.
A second strategy to strengthen the protection of passwords is to increase the so-called iteration count n, that is, instead of storing f(P, S) for each password P, Bob now stores f^n(P, S). An n-fold application of the function f increases both the time for password verification and the time for exhaustive search by an intruder by a factor of n. For a legitimate user, this is not really a nuisance, since computing f^n(P, S) only once during identification is tolerable (and may even be unnoticeable), whereas for an intruder breaking a password simply becomes n times as difficult. In typical applications, values of n ≥ 1000 are recommended.
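The salt-plus-iteration-count idea is exactly what modern password-storage primitives implement. The sketch below uses Python's PBKDF2 as one possible realization of f^n(P, S); the 64-bit salt and the iteration count are illustrative choices, not values mandated by the text.

```python
import hashlib
import hmac
import os

ITERATIONS = 10_000                   # the iteration count n

def store_password(P: str):
    """Compute f^n(P, S): a salted, iterated one-way transform of the password."""
    S = os.urandom(8)                 # 64-bit pseudorandom salt
    h = hashlib.pbkdf2_hmac("sha256", P.encode(), S, ITERATIONS)
    return S, h                       # store (salt, hash); P itself is never stored

def check_password(P: str, S: bytes, h: bytes) -> bool:
    cand = hashlib.pbkdf2_hmac("sha256", P.encode(), S, ITERATIONS)
    return hmac.compare_digest(cand, h)   # constant-time comparison
```

Because the salt is stored alongside the hash, a legitimate verification costs one PBKDF2 call, while a dictionary attacker must pay the n-fold cost for every (P, S) candidate.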
In some situations, it is advisable to lock access to a password-protected area after a predetermined number of (say, three) wrong passwords have been input in succession. This is typically the case with PINs for which the search space is rather small. For unlocking the access (to the legitimate user Alice), a second longer key (again known only to Alice) is used or human intervention is called for.
As a case study, let us briefly describe the password scheme used by the UNIX operating system. During the creation of a password, a user supplies a string P of eight 7-bit ASCII characters as the password. (Longer strings are truncated to the first 8 characters.) A 56-bit DES[4] key K is constructed from P. A 12-bit random salt S is obtained from the system clock at the time of the creation of the password. The zero message (that is, a block of 64 zero bits) is then iteratively encrypted n = 25 times using K as the key. The encryption algorithm is a variant of DES that depends on the salt S. The output ciphertext and the salt (which account for a total of 64 + 12 = 76 bits) are then packed into eleven 7-bit ASCII characters and stored in the password file (usually /etc/passwd). When UNIX was designed (in 1970), this algorithm, often referred to as the UNIX crypt password algorithm, was considered reasonably safe under the assumption that finding a DES key from a plaintext–ciphertext pair is difficult. With today’s hardware and software speed, a motivated attacker can break UNIX passwords in very little time.
[4] The data encryption standard (DES) is a well-known symmetric-key cipher (Section A.2.1).
Password-based authentication schemes suffer from the disadvantage that the user has to disclose her secret P to the verifier. The verifier may misuse the knowledge of P by storing it secretly and deploying it afterwards. During the computation of f^n(P, S), the string P resides in the machine’s memory. An eavesdropper capable of monitoring the temporary storage holding the string P easily obtains its value. In view of these shortcomings, password schemes are referred to as weak authentication schemes.
In a strong authentication scheme, the claimant proves the possession of a secret knowledge to a verifier without disclosing the secret to the verifier. One of the communicating entities generates a random bit string c known as the challenge and sends c (or a function of c) to the other. The latter then reacts to the challenge appropriately, for example, by sending a response string r to the former. Strong authentication schemes are, therefore, also called challenge–response authentication schemes. The communication between the entities depends both on the random challenge and on the secret knowledge of the claimant. An intruder lacking the secret knowledge of a valid claimant cannot take part properly in the interaction. Furthermore, since a random challenge is used during each invocation of the identification protocol, an eavesdropper cannot use the intercepted transcripts of a particular session for a future invocation of the protocol.
Public-key protocols can be used to realize challenge–response schemes. We assume that Alice is the claimant and Bob is the verifier. Without committing to specific algorithms, we denote the public and private keys of Alice by e and d, and the encryption and decryption transforms by fe and fd respectively. Alice proves her identity by demonstrating her knowledge of d (but without revealing d) to Bob. Bob uses the transform fe and Alice the transform fd under the respective keys e and d. If a key d′ other than d is used by Carol in conjunction with e, some step of the interaction detects this and the protocol rejects Carol’s claim to be Alice. We describe two challenge–response schemes that differ in the sequence of applying the transforms fe and fd.
In this scheme, Bob (the verifier) first generates a random string r, encrypts the same by the public key of Alice (the claimant) and sends the ciphertext c (the challenge) to Alice. Alice uses her private key to decrypt c to the message r′ and sends r′ (the response) back to Bob. Identification of Alice succeeds if and only if r = r′. Algorithm 5.65 illustrates the details of this scheme. It employs a one-way function H (like a hash function) for a reason explained later. This scheme checks whether the claimant can recover the random string r correctly. A knowledge of the decryption key d is needed for that.
Algorithm 5.65: Challenge–response identification based on public-key decryption

Bob generates a random bit string r and computes w := H(r).
Bob reads Alice’s (authentic) public key e and computes c := fe(r, e).
Bob sends (w, c) to Alice.
Alice computes r′ := fd(c, d).
if (H(r′) ≠ w) { Alice quits the protocol. }
Alice sends r′ to Bob.
Bob identifies Alice if and only if r′ = r.
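The witness-based scheme of Algorithm 5.65 can be sketched with textbook RSA playing the roles of fe and fd. The toy key sizes and the function names are our own assumptions.

```python
import hashlib
from random import randrange

# Toy RSA pair for Alice (illustrative sizes only): fe/fd are RSA with (e, d).
n, e, d = 3233, 17, 2753

def H(x: int) -> str:
    """One-way function used for the witness."""
    return hashlib.sha256(x.to_bytes(4, "big")).hexdigest()

def bob_challenge():
    r = randrange(2, n)
    return r, H(r), pow(r, e, n)      # Bob keeps r; sends witness w and challenge c

def alice_respond(w: str, c: int):
    r1 = pow(c, d, n)                 # decrypt the challenge with Alice's private key
    if H(r1) != w:
        return None                   # Bob cannot prove knowledge of r: Alice quits
    return r1                         # the response sent back to Bob
```

The witness check lets Alice refuse to decrypt an arbitrary ciphertext: a challenger who does not know the plaintext r cannot supply a matching w.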
The string H(r) = w is called the witness. By sending w to Alice, Bob convinces her of his knowledge about the secret r without disclosing r itself. If Bob (or a third party pretending to be Bob) tries to cheat, Alice has the option to abort the protocol prematurely. In other words, Alice does not have to decrypt an arbitrary ciphertext presented by Bob without confirming that Bob knows the corresponding plaintext.
In the scheme explained in Algorithm 5.66, Alice (the claimant) first does the private key operation, that is, Alice sends her digital signature on a message to Bob (the verifier). Bob then verifies the signature of Alice by employing the encryption transform with Alice’s public key.
Algorithm 5.66: Challenge–response identification based on digital signatures

Bob selects a random string rB.
Bob sends rB to Alice.
Alice selects a random string rA.
Alice generates the signature s := fd(rA‖rB, d).
Alice sends (rA, s) to Bob.
Bob reads Alice’s (authentic) public key e.
Bob retrieves the strings r′A and r′B from fe(s, e).
Bob identifies Alice if and only if r′A = rA and r′B = rB.
This authentication scheme is based on the assumption that only a person knowing Alice’s private key d can generate a signature s that leads to the equalities r′A = rA and r′B = rB. Using only rA and the signature s = fd(rA, d) would demonstrate to Bob that Alice possesses the requisite knowledge of d. The random string rB is used to prevent the so-called replay attack. If rB were not used, an eavesdropper Carol intercepting the transcripts of a session could later claim her identity as Alice by simply supplying rA and Alice’s signature on rA to Bob. Using a new rB in every session (and incorporating it in the signature) guarantees that the signature varies in different sessions, even when rA remains the same.
There is an alternative strategy by which the use of the random string rB can be avoided. All we have to ensure is that a value of rA used once cannot be reused in a subsequent session. This can be achieved by using a timestamp, which is a string reflecting the time when a certain event occurs (in our case, when Alice generates the signature). Thus, if Alice gets the local time tA, computes the signature s := fd(tA, d) and sends (tA, s) to Bob, it is sufficient for Bob to check that the timestamp tA is valid. A possible criterion for the validity of Alice’s timestamp tA is that the difference between tA and the time when Bob is verifying the signature is within an allowed bound (predetermined, based on the approximate time for the communication). But it may be possible for an adversary to provide to Bob the timestamp tA and Alice’s signature on tA, before tA expires. Therefore, Bob should additionally ensure that timestamps from Alice come in a strictly ascending order. Maintaining the timestamp for the last interaction with Alice takes care of this requirement. Algorithm 5.67 describes the modified version of Algorithm 5.66, based on timestamps. A problem with timestamps is that (local) clocks across a network have to be properly synchronized.
Algorithm 5.67: Timestamp-based identification

Alice reads the local time tA.
Alice generates the signature s := fd(tA, d).
Alice sends (tA, s) to Bob.
Bob reads Alice’s (authentic) public key e.
Bob retrieves the timestamp t′A from fe(s, e).
Bob identifies Alice if and only if t′A = tA and the timestamp tA is valid.
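A minimal sketch of the timestamp variant, again with textbook RSA as the signature transform. The toy modulus, the clock-skew bound, and the shortcut of signing tA reduced mod n are illustrative assumptions only.

```python
import time

# Toy RSA signing pair for Alice (illustrative only); fd(x, d) = x^d mod n.
n, e, d = 3233, 17, 2753
WINDOW = 5          # assumed bound on clock skew plus transmission time (seconds)
last_seen = 0       # Bob's record of the last timestamp accepted from Alice

def alice_identify():
    tA = int(time.time())
    s = pow(tA % n, d, n)        # sign the timestamp (reduced mod n as a toy shortcut)
    return tA, s

def bob_verify(tA: int, s: int) -> bool:
    global last_seen
    if abs(int(time.time()) - tA) > WINDOW:
        return False             # timestamp outside the allowed bound
    if tA <= last_seen:
        return False             # replayed or out-of-order timestamp
    if pow(s, e, n) != tA % n:
        return False             # signature check: fe(s, e) must recover tA
    last_seen = tA
    return True
```

Note how keeping `last_seen` enforces the strictly ascending order of timestamps: presenting the same (tA, s) pair a second time is rejected even though the signature is genuine.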
So far, we have described identification schemes that are unidirectional or unilateral in the sense that only Alice tries to prove her identity to Bob. For mutual authentication between Alice and Bob, the above schemes can be used a second time by reversing the roles of Alice and Bob. Algorithm 5.68 describes an alternative strategy that achieves mutual authentication with reduced communication overhead (compared to two invocations of the unidirectional scheme). Now, the key pairs (eA, dA) and (eB, dB) and the transforms fe,A, fd,A and fe,B, fd,B of both Alice and Bob should be used.
The challenge–response schemes described above ensure that the claimant’s secret is not made available to the verifier (or a listener to the communication between the verifier and the claimant). But the claimant uses her private key for generating the response and, therefore, it continues to remain possible that a verifier extracts some partial information on the secret by choosing challenges strategically.
Algorithm 5.68: Mutual identification

Bob selects a random string rB.
Bob sends rB to Alice.
Alice selects a random string rA.
Alice generates the signature sA := fd,A(rA‖rB, dA).
Alice sends (rA, sA) to Bob.
Bob reads Alice’s (authentic) public key eA.
Bob retrieves the strings r′A and r′B from fe,A(sA, eA).
Bob identifies Alice if and only if r′A = rA and r′B = rB.
Bob generates the signature sB := fd,B(rB‖rA, dB).
Bob sends sB to Alice.
Alice reads Bob’s (authentic) public key eB.
Alice retrieves the strings r″B and r″A from fe,B(sB, eB).
Alice identifies Bob if and only if r″B = rB and r″A = rA.
Using a zero-knowledge (ZK) protocol overcomes this difficulty in the sense that (absolutely) no information on the claimant’s secret is leaked during the conversation between the claimant and the verifier. The verifier (or a listener) remains as ignorant of the secret as he was before the invocation of the protocol. In other words, the verifier (or a listener) does not learn anything from the conversation that he could not learn by himself in the absence of the claimant. The only thing the verifier gains is confidence about whether the claimant actually knows the secret or not. This is, intuitively, the defining feature of a ZK protocol.
Similar to other public-key techniques, the security of the ZK protocols is based on the intractability of some difficult computational problems. A repeated use of a public-key scheme with a given set of parameters may degrade the security of the scheme under those parameters. For example, each encryption of a message (or each generation of a signature) makes available a plaintext–ciphertext pair which may eventually help a cryptanalyst. A ZK protocol, on the other hand, does not lead to such a degradation of the security of the protocol, irrespective of how many times it is invoked.
We stick to the usual scenario: Alice is the claimant, Bob is the verifier and Carol is an eavesdropper trying to impersonate Alice. In the jargon of ZK protocols, Alice (and not Bob) is called the prover. In order to avoid confusion, we continue to use the terms claimant and verifier. A ZK protocol is usually a three-pass interactive protocol. To start with, Alice chooses a random commitment and sends a witness of the commitment to Bob. A new commitment should be selected by Alice during each invocation of the protocol in order to guard against an adversarial verifier. Upon receiving the witness, Bob chooses and sends a random challenge to Alice. Finally, Alice replies by sending a response to the challenge. If Alice knows the secret (and performs the protocol steps correctly), her response can be easily proved by Bob to be valid. Carol, in an attempt to impersonate Alice without knowing the secret, can produce the valid response only with a probability P bounded away from 1. If P happens not to be negligibly small, then the protocol can be repeated a sufficient number of times, so that Carol’s probability of giving the correct response on all occasions becomes extremely low.
The parameters and the secrets for a ZK protocol can be set privately by each claimant. Another alternative is that a trusted third party (TTP) generates a set of parameters and makes these parameters available for use by every claimant over a network. A second duty of the TTP is to register a secret against each entity. The secret may be generated either by the TTP or by the respective entity. The knowledge of this (registered) secret by an entity is equivalent to her identity in the network. Finally, the authenticity of the public key of an entity is ensured by the digital signature of the TTP on the public key. For simplicity, however, we will not bother about the existence of the TTP and the way in which the secret (the possession of which by Alice is to be proved) has been created and/or handed over to Alice. We will also assume that each entity’s public key is authentic.
The FFS (Feige–Fiat–Shamir) protocol (Algorithm 5.69) is based on the intractability of computing square roots modulo a composite integer n. We take n = pq with two distinct primes p and q, each congruent to 3 modulo 4.
Algorithm 5.69: The Feige–Fiat–Shamir (FFS) identification protocol

Selection of domain parameters:
  Select two large distinct primes p and q each congruent to 3 modulo 4.
  n := pq.
Selection of Alice’s secret:
  Alice selects t random integers x1, . . . , xt ∈ ℤ_n^*.
  Alice selects t random bits b1, . . . , bt.
  Alice computes yi := (−1)^bi (xi^2)⁻¹ (mod n) for i = 1, . . . , t.
  Alice makes (y1, . . . , yt) public and keeps (x1, . . . , xt) secret.
The protocol:
  Alice chooses a random commitment c, 1 ≤ c ≤ n − 1, and a random bit γ.
  Alice sends the witness w := (−1)^γ c^2 (mod n) to Bob.
  Bob sends t random challenge bits ∊1, . . . , ∊t to Alice.
  Alice sends the response r := c x1^∊1 · · · xt^∊t (mod n) to Bob.
  Bob computes w′ := r^2 y1^∊1 · · · yt^∊t (mod n).
  Bob accepts Alice’s identity if and only if w′ ≠ 0 and w′ ≡ ±w (mod n).
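One round of the FFS protocol can be simulated as follows. The toy modulus n = 23 · 31 and t = 8 are our own illustrative choices; an honest round always makes Bob accept.

```python
from math import gcd
from random import randrange

p, q = 23, 31                 # toy primes, both congruent to 3 (mod 4)
n = p * q                     # 713
t = 8                         # number of secrets (toy value)

def keygen():
    """Alice's secrets x_i and public values y_i = (-1)^b_i (x_i^2)^-1 mod n."""
    xs, ys = [], []
    for _ in range(t):
        x = randrange(2, n)
        while gcd(x, n) != 1:
            x = randrange(2, n)
        b = randrange(2)
        xs.append(x)
        ys.append((-1) ** b * pow(x, -2, n) % n)
    return xs, ys

def ffs_round(xs, ys) -> bool:
    """One three-pass round: commitment/witness, challenge, response."""
    c = randrange(1, n)                             # Alice's commitment
    w = (-1) ** randrange(2) * pow(c, 2, n) % n     # witness sent to Bob
    eps = [randrange(2) for _ in range(t)]          # Bob's challenge bits
    r = c
    for x, e in zip(xs, eps):
        if e:
            r = r * x % n                           # response c * prod of chosen x_i
    w2 = pow(r, 2, n)
    for y, e in zip(ys, eps):
        if e:
            w2 = w2 * y % n                         # Bob recomputes r^2 * prod y_i
    return w2 != 0 and w2 in (w, -w % n)
```

Since r^2 ∏ yi^∊i = c^2 ∏ xi^(2∊i) ∏ (±xi^(−2∊i)) = ±c^2 ≡ ±w (mod n), an honest prover passes every round, while an impostor can answer a random challenge with probability close to 1/2^t.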
It is clear from Algorithm 5.69 that knowing the secret (x1, . . . , xt) allows Alice to make Bob accept her identity (as Alice). The check w′ ≠ 0 in the last line is necessary to preclude the commitment c = 0, which would make any claimant succeed irrespective of any knowledge of the secret.
Now, let us see how an opponent (Carol), without knowing the secret, can attempt to impersonate Alice by taking part in this protocol. To start with, we consider the simple case t = 1 (which corresponds to Fiat and Shamir’s original scheme). Carol can start the process by generating a random c and a random bit γ and computing w = (−1)^γ c^2. Now, Carol should send the response c or cx1 depending on whether Bob sends ∊1 = 0 or 1. Her capability of sending both correctly is equivalent to her knowledge of x1. If Bob sends ∊1 = 0, then she can provide the correct response c. Otherwise, Carol can at best select a random response from Z_n^*, and the probability that this is correct is overwhelmingly low. On the other hand, let Carol choose a random c and a random bit γ and send the (improper) witness w := (−1)^γ c^2 y1 (mod n). In that case, Carol can provide the valid response r = c, if Bob’s challenge is ∊1 = 1. Sending the correct response c x1^−1 (mod n) to the challenge ∊1 = 0 now requires knowledge of x1. Therefore, if ∊1 is randomly chosen by Bob (without the prior knowledge of Carol), Carol can successfully respond with probability (very close to) 1/2. For t ≥ 1, this probability of a cheat by Carol can be easily shown to be (very close to) 1/2^t, which is negligibly small for t ≥ 80.
In practice, however, t is chosen to be O(ln ln n). It is, therefore, necessary to repeat the protocol t′ times, so that the probability of a successful cheat becomes (nearly) 1/2^(tt′). Taking t′ = Θ(ln n) is recommended. It can be shown that these choices for t and t′ offer the FFS protocol the desired ZK property. Without going into a proof of this assertion, let us informally explain the ZK property of the FFS protocol. Neither Bob nor a listener to the conversation between Alice and Bob can get any idea of the secret (x1, . . . , xt). Bob gets as a response the product of c and those xi’s for which ∊i = 1. Since c is randomly chosen by Alice and is not available to Bob, this product reveals nothing about the individual xi’s. However, if the square root of w (or −w) can be computed by Bob, then the interaction may give away partial information on the secret. For example, if Bob chooses the challenge (∊1, ∊2, . . . , ∊t) = (1, 0, . . . , 0), then Alice’s response would be cx1, from which Bob can compute x1, if he knows c. Thus, the security and the ZK property of the FFS protocol are based on the assumption that computing square roots modulo n is an infeasible computational problem.
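The three-pass round and Bob’s check w′ ≡ ±w can be sketched in Python. This is a toy simulation with insecure, tiny parameters, intended only to illustrate the algebra; the helper names are ours.

```python
import math
import random

def ffs_keygen(p, q, t):
    """Toy FFS setup: n = pq with p and q each congruent to 3 mod 4.

    Alice's secret: x_1, ..., x_t in Z_n^*;
    public key: y_i = (-1)^(b_i) * x_i^(-2) mod n for random bits b_i.
    """
    n = p * q
    xs, ys = [], []
    for _ in range(t):
        x = random.randrange(2, n)
        while math.gcd(x, n) != 1:
            x = random.randrange(2, n)
        b = random.randrange(2)
        xs.append(x)
        ys.append((-1) ** b * pow(x, -2, n) % n)
    return n, xs, ys

def ffs_round(n, xs, ys):
    """One witness-challenge-response round; returns Bob's verdict."""
    c = random.randrange(2, n)                    # Alice's random commitment
    gamma = random.randrange(2)
    w = (-1) ** gamma * pow(c, 2, n) % n          # witness sent to Bob
    eps = [random.randrange(2) for _ in xs]       # Bob's random challenge bits
    r = c
    for x, e in zip(xs, eps):
        if e:
            r = r * x % n                         # response r = c * prod of chosen x_i
    w1 = pow(r, 2, n)
    for y, e in zip(ys, eps):
        if e:
            w1 = w1 * y % n                       # Bob recomputes w' = r^2 * prod of chosen y_i
    return w1 != 0 and (w1 == w or w1 == n - w)   # accept iff w' = +/-w and w' != 0
```

With the xi known, every round succeeds; an impersonator who guesses the challenge passes a round with probability about 1/2^t.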
The GQ identification protocol is based on the intractability of the RSA problem. The correctness of Algorithm 5.70 (for a legitimate claimant) is easy to establish. The check w′ ≠ 0 is necessary to preclude the commitment c = 0, which would make any claimant always succeed.
A TTP typically selects the domain parameters p, q, n, e and d. It also selects m and gives s to Alice without revealing d. The execution of the protocol does not require the use of the decryption exponent d. In fact, d is a global secret, whereas s is Alice’s personal secret. Alice tries to prove the knowledge of s (and not of d).
In the GQ algorithm, the power s^∊ is blinded by multiplying it with the random commitment c. As a witness for c, Alice presents its encrypted version w. Under the assumption that RSA decryption without the knowledge of the decryption exponent d is infeasible, Bob (or an eavesdropper) cannot compute c and hence cannot separate out the value of s^∊. Thus, no partial information on s is leaked. Furthermore, each invocation requires a fresh random ∊. In order to compute a strategic witness, Carol can at best guess ∊. The guess is correct with a probability of 1/e. If e is reasonably large, the probability of a successful cheat is low. However, larger values of e lead to more expensive generation of the witness from the commitment (and also of the response). So small values of e (say, 2^16 + 1 = 65537) are usually recommended. In that case, repeating the protocol a suitable number of times makes Carol’s chance of cheating as small as one desires. Taking t′e (where t′ is the number of iterations of the protocol) of the order of (log n)^α for some constant α gives the GQ protocol the desired zero-knowledge property.
|
Selection of domain parameters: Select two distinct large primes p and q and set the modulus n := pq. Select an exponent e with gcd(e, φ(n)) = 1 and compute d := e^−1 (mod φ(n)). The pair (n, e) is made public and d is kept secret.
Selection of Alice’s secret: Alice selects a random m ∈ Z_n^* and obtains s := m^−d (mod n). Alice makes m public and keeps s secret.
The protocol:
Alice chooses a random commitment c ∈ Z_n^* and sends the witness w := c^e (mod n) to Bob.
Bob sends a random challenge ∊ ∈ {1, 2, . . . , e} to Alice.
Alice sends the response r := c s^∊ (mod n) to Bob.
Bob computes w′ := m^∊ r^e (mod n). Bob accepts Alice’s identity if and only if w′ ≠ 0 and w′ = w. |
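The GQ round can be sketched in Python with toy, insecure parameters; the helper names are ours. The check exploits m^∊ r^e = m^∊ c^e s^{e∊} = c^e = w (mod n).

```python
import math
import random

def gq_setup(p, q, e):
    """Toy GQ setup: RSA modulus n = pq; s = m^(-d) mod n is Alice's secret."""
    n = p * q
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)              # global secret (held by the TTP)
    m = random.randrange(2, n)       # Alice's public identity value
    while math.gcd(m, n) != 1:
        m = random.randrange(2, n)
    s = pow(m, -d, n)                # Alice's personal secret
    return n, m, s

def gq_round(n, e, m, s):
    """One witness-challenge-response round; returns Bob's verdict."""
    c = random.randrange(1, n)       # Alice's random commitment
    while math.gcd(c, n) != 1:
        c = random.randrange(1, n)
    w = pow(c, e, n)                 # witness w = c^e mod n
    eps = random.randrange(1, e + 1) # Bob's random challenge in {1, ..., e}
    r = c * pow(s, eps, n) % n       # response r = c * s^eps mod n
    w1 = pow(m, eps, n) * pow(r, e, n) % n   # Bob: w' = m^eps * r^e mod n
    return w1 != 0 and w1 == w
```

An impersonator who must commit to a witness before seeing ∊ succeeds with probability about 1/e per round.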
The Schnorr protocol is based on the intractability of computing discrete logarithms in a large prime field F_p. We assume that a suitably large prime divisor q of p − 1 and an element g ∈ F_p^* of multiplicative order q are known. The algorithm works in the subgroup of F_p^* generated by g. In order to make the known algorithms for solving the DLP infeasible for the field F_p, one should have q > 2^160.
|
Selection of domain parameters: Select a large prime p such that p − 1 has a large prime divisor q. Select an element g ∈ F_p^* of multiplicative order q. Publish (p, q, g).
Selection of Alice’s secret: Alice chooses a random secret integer d ∈ {1, 2, . . . , q − 1}. Alice computes and makes public the integer y := g^−d (mod p).
The protocol:
Alice chooses a random commitment c ∈ {0, 1, . . . , q − 1} and sends the witness w := g^c (mod p) to Bob.
Bob sends a random challenge ∊ ∈ {0, 1, . . . , 2^t − 1} to Alice.
Alice sends the response r := c + d∊ (mod q) to Bob.
Bob computes w′ := g^r y^∊ (mod p). Bob accepts Alice’s identity if and only if w′ = w. |
We leave the analysis of correctness and security of this protocol to the reader. The secret d is masked from Bob and other eavesdroppers by the random additive bias c modulo q. The probability of a successful cheat by an adversary is 2^−t, since ∊ is chosen randomly from a set of cardinality 2^t. Usually the Schnorr protocol is not used iteratively. Therefore, t ≥ 40 is recommended for making the probability of cheating negligible. On the other hand, if t is too large, then the protocol can be shown to lose the ZK property. For the generation of the witness from the commitment, Alice computes a modular exponentiation to an exponent which is O(q). Generating the response, on the other hand, involves a single multiplication (and a single addition) modulo q and hence is very fast.
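A Schnorr round can be sketched with toy parameters (insecure sizes; helper names ours), writing d for Alice’s secret exponent and y = g^−d for her public key. Bob’s check uses g^r y^∊ = g^(c + d∊) g^(−d∊) = g^c = w (mod p).

```python
import random

def schnorr_setup(p, q, h=2):
    """Toy Schnorr setup: q | p - 1; g = h^((p-1)/q) has order q when g != 1."""
    g = pow(h, (p - 1) // q, p)
    d = random.randrange(1, q)       # Alice's secret exponent
    y = pow(g, -d, p)                # public key y = g^(-d) mod p
    return g, d, y

def schnorr_round(p, q, g, d, y, t=8):
    """One witness-challenge-response round; returns Bob's verdict."""
    c = random.randrange(q)          # Alice's random commitment
    w = pow(g, c, p)                 # witness w = g^c mod p
    eps = random.randrange(2 ** t)   # Bob's challenge in {0, ..., 2^t - 1}
    r = (c + d * eps) % q            # response r = c + d*eps mod q (one mul, one add)
    return pow(g, r, p) * pow(y, eps, p) % p == w   # w' = g^r * y^eps mod p
```

Note that generating the response is only modular arithmetic in Z_q; the single exponentiation g^c can even be precomputed offline.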
| 5.25 | |
| 5.26 | Let n := pq with distinct primes p and q each congruent to 3 modulo 4. |
All the material studied in earlier chapters culminates in this relatively short chapter which describes some popular cryptographic algorithms. We address most of the problems relevant in cryptography, namely, encryption, key agreement, digital signatures and entity authentication. Against each algorithm we mention the (provable or alleged) source of security of the algorithm.
Encryption algorithms are treated first. We start with the arguably most popular RSA algorithm. This algorithm derives its security from the RSA key inversion problem and the RSA problem. The key inversion problem is probabilistic polynomial-time equivalent to the integer factorization problem. The exact intractability of the RSA problem is unknown; at present, no algorithm other than factoring the RSA modulus is known for solving it. We subsequently describe Rabin encryption (based on the square root problem), Goldwasser–Micali encryption (based on the quadratic residuosity problem), Blum–Goldwasser encryption (based on the square root problem), ElGamal encryption (based on the Diffie–Hellman problem) and Chor–Rivest encryption (based on a variant of the subset sum problem). The XTR encryption algorithm is essentially an efficient implementation of ElGamal encryption and is based on a tricky representation of elements in certain finite fields. The last encryption algorithm we discuss is the NTRU algorithm. It derives its security from a mixing system that uses the algebra of the polynomial ring Z[x]/〈x^n − 1〉. Attacks on NTRU based on the shortest vector problem are also known.
The basic key-agreement scheme is the Diffie–Hellman scheme. In order to prevent small-subgroup attacks on this scheme, one employs a technique known as cofactor expansion. We then explain unknown key-share attacks against key-agreement schemes. These attacks necessitate the use of authenticated key agreement schemes. The MQV algorithm is presented as an example of an authenticated key-agreement scheme.
Next come digital signature algorithms. Digital signatures may be classified in two broad categories: signature schemes with appendix and signature schemes with message recovery. In this book, we study only the signature schemes with appendix. As specific examples of signature schemes, we first explain RSA and Rabin signatures. Then, we present several variants of discrete-log-based signature schemes: ElGamal signatures, Schnorr signatures, Nyberg–Rueppel signatures, the digital signature algorithm (DSA) and its elliptic curve variant ECDSA. All the discrete-log (over finite fields)-based signature schemes have efficient XTR implementations. The NTRUSign algorithm is the last general-purpose signature scheme discussed in this section.
We then present a treatment of some special signature schemes. Blind signatures are created on messages unknown to the signer. Three blind signature schemes are described: Chaum, Schnorr and Okamoto–Schnorr schemes. An undeniable signature, on the other hand, requires an active participation of the signer at the time of verification and comes with a denial protocol that prevents a signer from denying a valid signature at a later time. The Chaum–Van Antwerpen undeniable signature scheme is based on the discrete-log problem, whereas the GKR scheme is based on the RSA problem.
A way to guarantee both authentication and confidentiality of a message is to sign the message and then encrypt the signed message. This involves two basic operations (signature generation and encryption). Zheng’s signcryption scheme combines these two primitives with a view to reducing both running time and message expansion.
The final topic we discuss in this chapter is entity authentication, a mechanism by means of which an entity can prove its identity to another. Here identity of an entity is considered synonymous with the possession of some secret information by the entity. Passwords are called weak authentication schemes, since the claimant has to disclose the secret straightaway to the verifier. A strong authentication scheme (also called a challenge–response scheme) does not reveal the secret to the verifier. We describe two strong authentication schemes; the first is based on encryption and the second on digital signatures. A way to establish mutual authentication between two entities is also presented. Challenge–response algorithms may be vulnerable to some attacks mounted by the verifier. A zero-knowledge protocol comes with a proof that during the authentication conversation no information is leaked to the verifier. Three zero-knowledge protocols are discussed: the Feige–Fiat–Shamir protocol, the Guillou–Quisquater protocol, and the Schnorr protocol.
Public-key cryptography was born from the seminal works of Diffie and Hellman [78] and Rivest, Shamir and Adleman [252]. Though still young, this area has induced much research in the last three decades. In this chapter, we have made an attempt to summarize some important cryptographic algorithms proposed in the literature. The original papers where these techniques have been introduced are listed below. We don’t plan to be exhaustive, but mention only the most relevant resources.
| Algorithm | Reference(s) |
|---|---|
| RSA encryption | [252] |
| Rabin encryption | [246] |
| Goldwasser–Micali encryption | [117] |
| Blum–Goldwasser encryption | [27] |
| ElGamal encryption | [84] |
| Chor–Rivest encryption | [54] |
| XTR encryption | [170, 172, 171, 173, 289, 297] |
| NTRU encryption | [130] |
| Identity-based encryption | [267, 34, 35] |
| Diffie–Hellman key exchange | [78] |
| Menezes–Qu–Vanstone key exchange | [161] |
| RSA signature | [252] |
| Rabin signature | [246] |
| ElGamal signature | [84] |
| Schnorr signature | [263] |
| Nyberg–Rueppel signature | [223, 224] |
| DSA | [220] |
| ECDSA | [141] |
| XTR signature | [170, 172, 171, 173, 289, 297] |
| NTRUSign | [110, 111, 128, 129, 131, 217] |
| Chaum blind signature | [48, 49, 50] |
| Schnorr blind signature | [263, 202] |
| Okamoto–Schnorr blind signature | [227, 236] |
| Chaum–Van Antwerpen undeniable signature | [51, 52, 53] |
| RSA undeniable signature | [109, 187, 102, 186] |
| Signcryption | [310, 311, 312] |
| Signcryption based on elliptic curves | [313, 314] |
| Identity-based signcryption | [178, 185] |
| Feige–Fiat–Shamir ZK protocol | [90, 91] |
| Guillou–Quisquater ZK protocol | [122] |
| Schnorr ZK protocol | [263] |
The Handbook of Applied Cryptography [194] is a single resource where most of the above algorithms are discussed in good detail. See Chapter 8 of that book for encryption algorithms, Chapter 11 for digital signatures and Chapter 10 for identification schemes.
There are several other (allegedly) intractable mathematical problems based on which cryptographic protocols can be built. Some of the promising candidates that we left out in the text are summarized below:
| Algorithm | Intractable problem |
|---|---|
| LUC [284, 285, 286] | RSA and ElGamal-like problems based on Lucas sequences |
| Goldreich–Goldwasser–Halevi [115] | lattice-basis reduction |
| Patarin’s hidden field equation (HFE) [232] | solving multivariate polynomial equations |
| EPOC/ESIGN [97, 228] | factorization of integers of the form p^2 q |
| McEliece encryption [190] | decoding of error-correcting codes |
| Number field cryptography [38, 39] | discrete log problem in class groups of quadratic fields |
| KLCHKP (Braid group cryptosystem) [148] | Braid conjugacy problem |
The Internet site http://www.tcs.hut.fi/~helger/crypto/link/public/index.html is a good place to start, for more information on these (and some other) cryptosystems. Also visit http://www.kisa.or.kr/technology/sub1/index-PKC.htm.
The obvious question that crops up now is, given so many different cryptographic schemes, which one a user should go for.[5] There is no clear-cut answer to this question. One has to study the relative merits and demerits of the systems. If computational efficiency is what matters most, we advocate the NTRU schemes. Having said that, we must also add that the NTRU scheme is relatively new and has not yet withstood sufficient cryptanalytic attacks. Various attacks on NSS and NTRUSign cast doubt on the practical safety of applying such young schemes in serious applications.
[5] It is worthwhile to issue a warning to the readers. Many cryptographic algorithms (and also the idea of public-key cryptography) are/were patented. In order to implement these algorithms (in particular, for commercial purposes), one should take care of the relevant legal issues. We summarize here some of the important patents in this area. The list is far from exhaustive.
| Patent No. | Covers | Patent holder | Date of issue |
|---|---|---|---|
| US 4,200,770 | Diffie–Hellman key exchange (includes ElGamal encryption) | Stanford University | Apr 29, 1980 |
| US 4,218,582 | Public-key cryptography | Stanford University | Aug 19, 1980 |
| US 4,405,829 | RSA | MIT | Sep 20, 1983 |
| US 5,231,668 | DSA | USA, Secretary of Commerce | Jul 27, 1993 |
| US 5,351,298 | LUC | P. J. Smith | Sep 27, 1994 |
| US 5,790,675 | HFE | CP8 Transac (France) | Aug 4, 1998 |
| EP 0963635A1 / WO 09836526 | XTR | Citibank (North America) | Dec 15, 1999 / Aug 20, 1998 |
| US 6,081,597 | NTRU | NTRU Cryptosystems, Inc. | Jun 27, 2000 |
| — | EPOC/ESIGN | Nippon Telegraph and Telephone Corporation | Apr 17, 2001 |
Our mathematical trapdoors are not provably secure, and this is where the problems begin. We have to rely on historical evidence that should not be collected too hastily. Slow as it is, RSA has stood the test of time, having successfully survived more than twenty years of cryptanalytic attacks [29]. The risk that an unforeseen attack will break the system tomorrow appears much smaller with RSA than with newer schemes that have enjoyed only little cryptanalytic study. The hidden monomial system proposed by Imai and Matsumoto [188] was broken by Patarin [231]. As a by-product, Patarin came up with the idea of cryptosystems based on hidden field equations (HFE) [232]. No serious attacks on HFE are known to date, but as we mentioned earlier, only time will tell whether HFE is going to survive.
Bruce Schneier asserts in his Crypto-Gram newsletter (15 March 1999, http://www.counterpane.com/crypto-gram.html): No one can duplicate the confidence that RSA offers after 20 years of cryptanalytic review. A standard security review, even by competent cryptographers, can only prove insecurity; it can never prove security. By following the pack you can leverage the cryptanalytic expertise of the worldwide community, not just a handful of hours of a consultant’s time.
Twenty-odd years is definitely not a wide span of time in the history of evolution of our knowledge, but public-key cryptography is only as old as RSA is!
| 6.1 | Introduction |
| 6.2 | IEEE Standards |
| 6.3 | RSA Standards |
| | Chapter Summary |
| | Suggestions for Further Reading |
In theory, there is no difference between theory and practice. But, in practice, there is.
—Jan L. A. van de Snepscheut
ECC curves are divided into three groups, weak curves, inefficient curves, and curves patented by Certicom.
—Peter Gutmann
Acceptance of prevailing standards often means we have no standards of our own.
—Jean Toomer (1894 – 1967)
Public-key cryptographic protocols deal with sets like the ring Z_n of integers modulo n, the multiplicative group F_q^* of units in a finite field, or the group of points on an elliptic curve over a finite field. Messages that need to be encrypted or signed are, on the other hand, usually human-readable text or numbers or keys of secret-key cryptographic protocols, which are typically represented in computers in the form of sequences of bits (or bytes). It is necessary to convert such bit strings (or byte strings) to mathematical elements before the cryptographic algorithms are applied. This conversion is referred to as encoding. The reverse transition, that is, converting mathematical entities back to bit strings, is called decoding.
If Alice and Bob were the only two parties involved in deploying public-key protocols, they could have agreed upon a set of private (not necessarily secret) encoding and decoding rules. In practice, however, when many entities interact over a public network, it is impractical, if not impossible, to have an individual encoding scheme for every pair of communicating parties. This is also unnecessary, because the security of the protocols comes from the encryption process and not from encoding. On the contrary, poorly designed encoding schemes may endanger the security of the underlying protocols.
We, therefore, need a set of standard ways of converting data between various logical formats. This promotes interoperability, removes ambiguities, facilitates simplicity in handling cryptographic data and thereby enhances the applicability and acceptability of public-key algorithms. IEEE (The Institute of Electrical and Electronics Engineers, Inc., pronounced eye-triple-e) and the RSA laboratories have published extensive documents standardizing data conversion and encoding for many popular public-key cryptosystems. Here we summarize the contents of some of these documents. This exposition is meant mostly for software engineers intending to develop cryptographic tool-kits that conform to the accepted standards.
In this section, we outline the first three of the drafts from IEEE, shown in Table 6.1. At the time of writing this book, these are the latest versions of the drafts available from IEEE. In future, these may be superseded by newer documents. We urge the reader to visit the web-site http://grouper.ieee.org/groups/1363/ for more up-to-date information. Also see the standard IEEE 1363–2000: Standards Specifications for Public-key Cryptography [134].
| Draft | Date | Description |
|---|---|---|
| P1363 / D13 | 12 November 1999 | Traditional public-key cryptography based on IFP, DLP and ECDLP |
| P1363a/D12 | 16 July 2003 | Additional techniques on traditional public-key cryptography |
| P1363.1/D4 | 7 March 2002 | Lattice-based cryptography |
| P1363.2/D15 | 25 May 2004 | Password-based authentication |
| P1363.3/D1 | May 2008 | Identity-based public-key cryptography |
Public-key protocols operate on data of various types. The IEEE drafts specify only the logical descriptions of these data types. The realizations of these data types should be taken care of by individual implementations and are left unspecified.
A bit string is a finite ordered sequence a0a1 . . . al–1 of bits, where each bit ai can assume the value 0 or 1. The length of the bit string a0a1 . . . al–1 is l. The bit a0 in the bit string a0a1 . . . al–1 is called the leftmost or the first or the leading or the most significant bit, whereas the bit al–1 is called the rightmost or the last or the trailing or the least significant bit.
The order of appearance of the bits in a bit string is important, rather than the way the bits are indexed or named. That is to say, the most and least significant bits in a given bit string are uniquely determined by their positions of occurrences in the string, and not by the way the individual bits in the string are numbered. Thus, for example, if we call the bit string 01101 as a0a1a2a3a4, then the leading and trailing bits are a0 and a4 respectively. If we index the bits in the same bit string as a2a3a5a7a11, the first bit is a2 and the last bit is a11. Finally, for the indexing a5a4a3a2a1, the leftmost and rightmost bits are a5 and a1 respectively.
Though bits are the basic building blocks in computer memory, programs typically access memory in groups of 8 bits, known as octets. Thus, an octet is a bit string of length 8 and can have one of the 256 values 0000 0000 through 1111 1111. It is convenient to write an octet as a concatenation of two hexadecimal digits, the first (resp. second) one corresponding to the first (resp. last) 4 bits of the octet treated as an integer in base 2. For example, the octet 0010 1011 is represented by 2b. It is also often customary to treat an octet a0a1 . . . a7 as the integer (between 0 and 255, both inclusive) whose binary representation is a0a1 . . . a7.
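The two views of an octet (a pair of hex digits, and an 8-bit integer) can be checked with a minimal sketch; the helper names are ours:

```python
def octet_to_hex(octet):
    """Write an 8-bit string as two hex digits, one per 4-bit half."""
    assert len(octet) == 8 and set(octet) <= {"0", "1"}
    return format(int(octet, 2), "02x")

def octet_to_int(octet):
    """Treat an octet as the integer whose binary representation it is."""
    return int(octet, 2)
```

For the example in the text, octet_to_hex("00101011") gives "2b", and octet_to_int("00101011") gives 43.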
An octet string is a finite ordered sequence of octets. The length of an octet string is the number of octets in the string. The leftmost (or first or leading or most significant) and the rightmost (or last or trailing or least significant) octets in an octet string are defined analogously as in the case of bit strings. These octets are dependent solely on their positions in the octet string and are independent of how the individual octets in the octet string are numbered.
Integers are the whole numbers 0, ±1, ±2, . . . . For cryptographic applications, one typically considers only non-negative integers. Integers used in cryptography may have binary representations requiring as many as several thousand bits.
Let p be a prime (typically, odd). The elements of F_p are represented as integers 0, 1, . . . , p − 1 under the standard way of associating the integer a ∈ {0, 1, . . . , p − 1} with the congruence class [a]_p in Z_p. Arithmetic operations in F_p are the corresponding integer operations modulo the prime p.
The elements of the field F_{2^m} are represented as bit strings of length m. In order to provide the mathematical interpretation of these bit strings, we recall that F_{2^m} is an m-dimensional F_2-vector space. Let β0, . . . , βm−1 be an ordered basis of F_{2^m} over F_2. The bit string a0 . . . am−1 is to be identified with the element a0β0 + · · · + am−1βm−1, where the bit ai represents the element [ai]_2 of F_2. Selection of the basis β0, . . . , βm−1 renders a complete meaning to this representation and determines how arithmetic operations on these elements are to be performed. The following two cases are recommended.
For the polynomial-basis representation, one chooses an irreducible polynomial f(X) ∈ F_2[X] of degree m and represents F_{2^m} as F_2[X]/〈f(X)〉. Letting x denote the canonical image of X in F_2[X]/〈f(X)〉, one chooses the ordered basis β0 = x^{m−1}, β1 = x^{m−2}, . . . , βm−1 = 1. Arithmetic operations in F_{2^m} under this representation are those of F_2[X] followed by reduction modulo the defining polynomial f(X). Choice of the irreducible polynomial f(X) is left unspecified in the IEEE drafts.
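Polynomial-basis arithmetic can be sketched with coefficients bit-packed into integers. The sketch below multiplies two elements of F_{2^8} by carry-less multiplication followed by reduction; since the drafts leave f(X) open, we pick f(X) = X^8 + X^4 + X^3 + X + 1 (the AES polynomial) purely as an example, and we pack bit i as the coefficient of X^i, which is our own convention here.

```python
def gf2m_mul(a, b, f, m):
    """Multiply a, b in F_2[X]/<f(X)>, with elements bit-packed as integers.

    Bit i of a, b, f is the coefficient of X^i; f has degree m.
    """
    # carry-less (XOR-based) polynomial multiplication over F_2
    prod = 0
    while b:
        if b & 1:
            prod ^= a
        a <<= 1
        b >>= 1
    # reduce modulo the defining polynomial f, clearing bits from the top
    for i in range(prod.bit_length() - 1, m - 1, -1):
        if (prod >> i) & 1:
            prod ^= f << (i - m)
    return prod
```

For the AES field (f = 0x11b, m = 8), gf2m_mul(0x57, 0x83, 0x11b, 8) evaluates to 0xc1, matching the worked example in the AES specification.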
For the normal-basis representation, one selects an element θ ∈ F_{2^m} which is normal over F_2 (see Definition 2.60, p 86), and takes the ordered basis β0 = θ = θ^{2^0}, β1 = θ^{2^1}, β2 = θ^{2^2}, . . . , βm−1 = θ^{2^{m−1}}. Arithmetic in F_{2^m} is carried out as explained in Section 2.9.3.
The IEEE draft P1363a also specifies a composite-basis representation of elements of F_{2^m}, provided that m is composite. Let m = ds with 1 < d < m. One chooses an (ordered) polynomial or normal basis γ0, γ1, . . . , γs−1 of F_{2^m} over F_{2^d}. An element of F_{2^m} is of the form a0γ0 + a1γ1 + · · · + as−1γs−1 and is represented by a0a1 . . . as−1, where each ai, being an element of F_{2^d}, is represented by a bit string of length d. The interpretation of the representation of ai is dependent on how F_{2^d} is represented. One can use a polynomial- or normal-basis representation of F_{2^d} (over F_2), or even a composite-basis representation of F_{2^d} over F_{2^{d′}}, if d happens to be composite with a non-trivial divisor d′.
A non-prime finite field of odd characteristic is one with cardinality p^m for some odd prime p and for some integer m > 1. The field F_{p^m} is represented as F_p[X]/〈f(X)〉, where f(X) ∈ F_p[X] is an irreducible polynomial of degree m. An element of F_{p^m} is then of the form α = a_{m−1}x^{m−1} + · · · + a1x + a0, where x := X + 〈f(X)〉 and where each ai is an element of F_p, that is, an integer in the range 0, 1, . . . , p − 1. The element α is represented as an integer by substituting p for x, that is, as the integer a_{m−1}p^{m−1} + · · · + a1p + a0 (see the packed representation of Exercise 3.39). In order to interpret an integer between 0 and p^m − 1 as an element of F_{p^m}, one has to expand the integer in base p.
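This packed representation and its inverse (base-p expansion) can be sketched as follows; the helper names are ours:

```python
def fe_to_int(coeffs, p):
    """Pack the element a_{m-1}x^{m-1} + ... + a1*x + a0, given as
    coeffs = [a0, a1, ..., a_{m-1}], into an integer by substituting p for x."""
    n = 0
    for a in reversed(coeffs):
        n = n * p + a      # Horner evaluation at x = p
    return n

def int_to_fe(n, p, m):
    """Recover the coefficient list [a0, ..., a_{m-1}] by base-p expansion."""
    coeffs = []
    for _ in range(m):
        coeffs.append(n % p)
        n //= p
    return coeffs
```

For example, with p = 7 and m = 3, the element 5x^2 + 3 packs to 5·49 + 3 = 248, and expanding 248 in base 7 recovers the coefficients.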
An elliptic curve defined over a finite field F_q is specified by two elements a, b ∈ F_q. Depending on the characteristic of F_q, this pair defines the following curves.
If char F_q ≠ 2, 3, then 4a^3 + 27b^2 must be non-zero in F_q, and the equation of the elliptic curve is taken to be Y^2 = X^3 + aX + b.
For char F_q = 2, we must have b ≠ 0 in F_q, and we use the non-supersingular curve Y^2 + XY = X^3 + aX^2 + b. Because of the MOV attack (Section 4.5.1), supersingular curves are not recommended for cryptographic applications.
Finally, if F_q has characteristic 3, then both a and b must be non-zero in F_q, and the elliptic curve Y^2 = X^3 + aX^2 + b is specified by (a, b).
A point P = (h, k) on an elliptic curve defined over F_q can be represented either in compressed or in uncompressed form. In the uncompressed form, one represents P as the pair (h, k) of elements of F_q. The compressed form can be either lossy or lossless. In the lossy compressed form, P is represented by its X-coordinate h only. Such a representation is not unique in the sense that there can be two points on the elliptic curve with the same X-coordinate h. In applications where Y-coordinates of elliptic curve points are not utilized, such a representation can be used. In the lossless compressed form, one represents P as the pair (h, ỹ), where ỹ is a single bit. There are two solutions (perhaps repeated) for Y for a given value h of X. The bit ỹ specifies which of these two values is represented. Depending on how the bit ỹ is computed, we have two different lossless compressed forms.
The LSB compressed form is applicable for odd prime fields F_p or fields F_{2^m} of even characteristic. For F_p, the bit ỹ is taken to be the least significant (that is, rightmost) bit of k (treated as an integer). For F_{2^m}, we have ỹ := 0, if h = 0, whereas if h ≠ 0, then ỹ is the least significant bit of the element kh^{−1} treated as an integer via the FE2I conversion primitive described in Section 6.2.2.
The SORT compressed form is used for q = p^m, m > 1. Let P′ = (h, k′) be the opposite of P = (h, k), that is, P′ = −P. One converts k and k′ to integers k̄ and k̄′ using the FE2I primitive and sets ỹ := 1 if k̄ > k̄′, and ỹ := 0 otherwise.
One may also go for a hybrid representation of the elliptic curve point P = (h, k), in which information for both the compressed and the uncompressed representations of P is stored, that is, P is stored as (h, k, ỹ) with ỹ computed by one of the methods (LSB or SORT) described above.
For NTRU public-key cryptosystems, we work in the quotient ring Z[x]/〈x^n − 1〉, which we denote by R as usual. An element of R is a polynomial a(x) = a0 + a1x + a2x^2 + · · · + a_{n−1}x^{n−1} with each ai ∈ Z, and is represented by the ordered n-tuple of integers (a0, a1, . . . , a_{n−1}). Addition (resp. subtraction) in R is simply component-wise addition (resp. subtraction), whereas multiplication of a(x) = a0 + a1x + · · · + a_{n−1}x^{n−1} and b(x) = b0 + b1x + · · · + b_{n−1}x^{n−1} gives c(x) = c0 + c1x + · · · + c_{n−1}x^{n−1}, where ci = Σ_{j+k ≡ i (mod n)} ajbk (see Section 5.2.8). The IEEE draft P1363.1 designates elements of R as ring elements.
It is customary to deal with polynomials in R with small coefficients. If all the coefficients of a(x) ∈ R are known to be from {0, 1}, it is convenient to represent a(x) as the bit string a0a1 . . . a_{n−1} instead of as an n-tuple of integers. In this case, a(x) is called a binary ring element or simply a binary element.
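Multiplication in R is the cyclic convolution of coefficient vectors, since x^n wraps around to 1. A direct quadratic-time sketch (our naming):

```python
def ring_mul(a, b):
    """Multiply ring elements of Z[x]/<x^n - 1>, given as coefficient lists
    of length n: c_i = sum of a_j * b_k over all j + k congruent to i mod n."""
    n = len(a)
    c = [0] * n
    for j in range(n):
        for k in range(n):
            c[(j + k) % n] += a[j] * b[k]   # exponent j + k reduced mod n
    return c
```

For example, multiplying 1 + 2x + 3x^2 by x in Z[x]/〈x^3 − 1〉 wraps 3x^3 around to 3, giving 3 + x + 2x^2.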
The IEEE drafts P1363 and P1363.1 specify algorithms for converting data among the formats discussed above. The standardized data conversion primitives are summarized in Figure 6.1. Though these drafts support elliptic curve cryptography, it is not specified how data representing elliptic curves can be converted to data of other types (like octet strings and bit strings).

We now provide a brief description of the data conversion primitives at a logical level. The implementation details depend on the representations of the data types and are left out here.
A bit string a0a1 . . . al–1 can be broken up in groups of eight bits and packed into octets. But we run into difficulty if the length of the input bit string is not an integral multiple of 8. We have to add extra bits in order to make the length of the augmented bit string an integral multiple of 8. This can be done in several ways, and in this context a standard convention needs to be adopted. The IEEE drafts prescribe the following rules:
Every extra bit added must be the zero bit.
Add the minimal number of extra bits.
Add the extra bits, if any, to the left.[1]
[1] At the time of writing this book there is a serious conflict between the latest drafts of P1363 and P1363.1 from IEEE. The former asks to add extra bits to the left, the latter to the right. One of the authors of this book raised this issue in the discussion group stds-p1363-discuss maintained by IEEE and was notified that in the next version of the P1363.1 document this conflict would be resolved in favour of P1363.
In order to see what these rules mean, let a0a1 . . . al–1 be a bit string of length l to be converted to the octet string A0A1 . . . Ad–1. The length of the output octet string must be d = ⌈l/8⌉. 8d – l zero bits should be added to the left of the input bit string in order to create the augmented bit string 0 . . . 0a0a1 . . . al–1 whose length is 8d. Now, we start from the left and pack blocks of consecutive eight bits in A0, A1, . . . , Ad–1. Thus, we have A0 = 0 . . . 0a0 . . . ak–1, A1 = ak . . . ak+7, . . . , Ad–1 = ak+8(d–2) . . . ak+8(d–2)+7, where k = 8 – (8d – l). Note that if l is already a multiple of 8, then 8d – l = 0, that is, no extra bits need to be added.
As an example, consider the input bit string 01110 01101011 of length 13. The output octet string should be of length ⌈13/8⌉ = 2. Padding gives the augmented bit string 00001110 01101011. The first octet in the output octet string will then be 00001110, that is, 0e; and the second octet will be 01101011, that is, 6b.
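The padding and packing rules above can be sketched directly (our helper name; octets are returned as two-hex-digit strings for readability):

```python
def bs2os(bits):
    """BS2OS sketch: left-pad with the minimal number of zero bits to reach
    a multiple of 8, then pack successive groups of eight bits into octets."""
    d = -(-len(bits) // 8)           # ceil(l / 8) octets in the output
    padded = bits.rjust(8 * d, "0")  # minimal zero padding, added on the left
    return [format(int(padded[8 * i: 8 * i + 8], 2), "02x") for i in range(d)]
```

On the example in the text, bs2os("0111001101011") returns ["0e", "6b"].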
The OS2BS primitive is designed to ensure that if we convert an octet string generated by BS2OS, we should get back the original bit string (that is, the input to BS2OS) with which we started. Suppose that we want to convert an octet string A0A1 . . . Ad–1. Let us write the bits of Ai as ai,0ai,1 . . . ai,7. The desired length l of the output bit string has to be also specified. If d ≠ ⌈l/8⌉, the procedure OS2BS reports error and stops. If d = ⌈l/8⌉, we consider the bit string
a0,0a0,1 . . . a0,7a1,0a1,1 . . . a1,7 . . . ad–1,0ad–1,1 . . . ad–1,7
of length 8d. If the leftmost 8d – l bits of this flattened bit string are not all zero, OS2BS should quit after reporting error. Otherwise, the trailing l bits of the flattened bit string are returned.
The reader can check that when 0e 6b and l = 13 are input to OS2BS, it returns the bit string 01110 01101011. (See the example in connection with BS2OS.) Notice also that for this input octet string, OS2BS reports error if and only if a value l ≥ 17 or l ≤ 11 is supplied as the desired length of the output bit string.
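A matching sketch of OS2BS, with both error conditions (wrong octet-string length and non-zero padding bits); as before, the string-of-bits representation is our own convention:

```python
def os2bs(octets: bytes, l: int) -> str:
    """OS2BS sketch: recover a bit string of length l from d octets."""
    d = len(octets)
    if d != (l + 7) // 8:                 # require d = ceil(l/8)
        raise ValueError("error: octet string has wrong length")
    flat = "".join(format(o, "08b") for o in octets)
    if "1" in flat[:8 * d - l]:           # the 8d - l padding bits must be zero
        raise ValueError("error: non-zero padding bits")
    return flat[8 * d - l:]
```

With the octets 0e 6b, the call succeeds exactly for 12 ≤ l ≤ 16, matching the discussion above.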
Let a non-negative integer n be given. The I2BS primitive outputs a bit string of length l representing n. If n ≥ 2^l, this conversion cannot be done and the primitive reports error and quits. If n < 2^l, we write the binary representation of n as
n = al–1·2^(l–1) + al–2·2^(l–2) + · · · + a1·2 + a0 with each ai ∈ {0, 1}.
Treating each ai as a bit[2], I2BS returns the bit string al–1al–2 . . . a1a0. One or more leading bits of the binary representation of n may be zero. There is no limit on how many leading zero bits are allowed during the conversion. In particular, the integer 0 gets converted to a sequence of l zero bits for any value of l supplied.
[2] Each ai is logically an integer which happens to assume one of two possible values: 0 and 1. A bit, on the other hand, is a quantity that can also assume only two possible values. Traditionally, the values of a bit are also denoted by 0 and 1. But one has the liberty to call these values off and on, or false and true, or black and white, or even armadillo and platypus. To many people, bit is an abbreviation for binary digit which our ais logically are. To others, binit is a safer and more individualistic acronym for binary digit. For I2BS, we identify the two concepts.
A request to I2BS to convert n = 2357 = 2^11 + 2^8 + 2^5 + 2^4 + 2^2 + 2^0 with l = 12 returns 1001 00110101, one with l = 18 returns 00 00001001 00110101, and one with l ≤ 11 reports failure. Note that for a neater look we write bit strings in groups of eight, with the grouping starting from the right. This convention reflects the relationship between bit strings and octet strings, as mentioned above.
The primitive BS2I converts the bit string a0a1 . . . al–1 to the integer a0·2^(l–1) + a1·2^(l–2) + · · · + al–2·2 + al–1, where we again identify a bit with an integer (or a binary digit). As an illustrative example, the bit string 1001 00110101 (or 00 00001001 00110101) gets converted to the integer 2^11 + 2^8 + 2^5 + 2^4 + 2^2 + 2^0 = 2357. The null bit string (that is, the one of zero length) is converted to the integer 0.
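The I2BS/BS2I pair can be sketched as follows, again with bit strings as '0'/'1' Python strings (our convention):

```python
def i2bs(n: int, l: int) -> str:
    """I2BS sketch: the l-bit representation of n; fails if n >= 2^l."""
    if n >= 1 << l:
        raise ValueError("error: integer too large for l bits")
    return format(n, "0{}b".format(l))    # leading zero bits are allowed

def bs2i(bits: str) -> int:
    """BS2I sketch: most-significant-bit-first value; empty string gives 0."""
    return int(bits, 2) if bits else 0
```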
In order to convert a non-negative integer n to an octet string of length d, we write the base-256 expansion of n as
n = Ad–1·256^(d–1) + Ad–2·256^(d–2) + · · · + A1·256 + A0,
where each Ai ∈ {0, 1, . . . , 255} and can be naturally identified with an octet. I2OS returns the octet string Ad–1Ad–2 . . . A1A0. Note that the above representation of n to the base 256 is possible if and only if n < 256^d. If n ≥ 256^d, I2OS should return failure. As with bit strings, an arbitrary number of leading zero octets is allowed.
Consider the integer 2357 = 9 × 256 + 53. The two-digit hexadecimal representations of 9 and 53 are 09 and 35 respectively. Thus, a call of I2OS on this n with d = 3 (resp. d = 2, resp. d = 1) returns 00 09 35 (resp. 09 35, resp. failure).
Let an octet string A0A1 . . . Ad–1 be given. Each Ai can be identified with a 256-ary digit. OS2I returns the integer A0·256^(d–1) + A1·256^(d–2) + · · · + Ad–2·256 + Ad–1. If d = 0, the integer 0 should be output.
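Both primitives map directly onto Python's built-in big-endian integer conversions; a sketch:

```python
def i2os(n: int, d: int) -> bytes:
    """I2OS sketch: big-endian base-256 digits of n in exactly d octets."""
    if n >= 1 << (8 * d):                 # possible only if n < 256^d
        raise ValueError("error: integer too large for d octets")
    return n.to_bytes(d, "big")

def os2i(octets: bytes) -> int:
    """OS2I sketch: the integer with the given big-endian base-256 digits."""
    return int.from_bytes(octets, "big")  # empty string gives 0
```

For n = 2357 this reproduces the example above: d = 3 gives 00 09 35, d = 2 gives 09 35, and d = 1 fails.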
In the IEEE P1363 jargon, a field element is an element of the finite field Fq, where q is a prime or an integral power of a prime. We want to convert an element β ∈ Fq to an octet string. Depending on the value of q, we have two cases:
If char Fq is odd, β is represented as an integer in {0, 1, . . . , q – 1}. FE2OS converts β to an octet string of length ⌈log256 q⌉ by calling the primitive I2OS.
If q = 2^m, β is represented as a bit string of length m. The primitive BS2OS is called to convert β to an octet string.
Assume that an octet string is to be converted to an element of the finite field Fq. Again we have two possibilities depending on q.
If Fq is of odd characteristic, the primitive OS2I is called to convert the given octet string to an integer. This integer is returned as the field element.
If q = 2^m, one calls the primitive OS2BS with the given octet string and with the length m supplied as inputs. The resulting bit string is returned by OS2FE. If OS2BS reports error, so does OS2FE.
Let β ∈ Fq, and suppose that the integer equivalent of β is sought. If q is odd, then β is already represented as an integer (in {0, 1, . . . , q – 1}) and is itself output. If q = 2^m, one first converts β to an octet string by FE2OS and subsequently converts this octet string to an integer by calling the primitive OS2I.
The point O at infinity (on an elliptic curve over Fq) is encoded by an octet string comprising a single zero octet only. So let P = (h, k) be a finite point. The EC2OS primitive produces an octet string PO = PC ‖ H ‖ K which is the concatenation of a single octet PC with octet strings H and K representing h and k respectively. The values of PC and K depend on the type of compression used. One has PC = 0000 SUCỹ, where
S = 1 if and only if the SORT compression is used.
U = 1 if and only if the uncompressed or hybrid form is used.
C = 1 if and only if the compressed or hybrid form is used.
ỹ = the compressed representation bit of the Y-coordinate k if compression is used, and 0 otherwise.
The first four bits of PC are reserved for (possible) future use and should be set to 0000 for this version of the standard. H is the octet string of length ⌈log256 q⌉ obtained by converting h using FE2OS. If the compressed form is used, K is the empty octet string, whereas if the uncompressed or hybrid form is used, we have K = FE2OS(k, ⌈log256 q⌉). Finally, for the lossy compression we have PC = 0000 0001, H = FE2OS(h, ⌈log256 q⌉) and K is empty. Table 6.2 summarizes all these possibilities. Here, l := ⌈log256 q⌉, and p is an odd prime.
| Representation | PC | H | K | q |
|---|---|---|---|---|
| uncompressed | 0000 0100 | FE2OS(h, l) | FE2OS(k, l) | All |
| LSB compressed | 0000 001ỹ | FE2OS(h, l) | Empty | p, 2^m |
| LSB hybrid | 0000 011ỹ | FE2OS(h, l) | FE2OS(k, l) | p, 2^m |
| SORT compressed | 0000 101ỹ | FE2OS(h, l) | Empty | 2^m, p^m |
| SORT hybrid | 0000 111ỹ | FE2OS(h, l) | FE2OS(k, l) | 2^m, p^m |
| lossy compression | 0000 0001 | FE2OS(h, l) | Empty | All |
| point at infinity O | 0000 0000 | Empty | Empty | All |
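The leading octet PC can be assembled directly from the flag bits. The following sketch assumes the layout PC = 0000 SUCỹ described above; the `kind` names and the `y_bit` parameter are our own, not from the standard.

```python
def ec2os_pc(kind: str, y_bit: int = 0) -> int:
    """Compute the leading octet PC = 0000 SUCy~ for an EC2OS encoding."""
    table = {                        # (S, U, C) flag bits per representation
        "uncompressed":    (0, 1, 0),
        "lsb-compressed":  (0, 0, 1),
        "lsb-hybrid":      (0, 1, 1),
        "sort-compressed": (1, 0, 1),
        "sort-hybrid":     (1, 1, 1),
    }
    s, u, c = table[kind]
    yb = y_bit if c else 0           # y~ is meaningful only when compression is used
    return (s << 3) | (u << 2) | (c << 1) | yb
```

The lossy form (fixed octet 0000 0001) and the point at infinity (single octet 0000 0000) do not follow this flag pattern and are handled separately.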
The OS2EC data conversion primitive takes as input an octet string PO, the length l = ⌈log256 q⌉ and the method of compression. If PO contains only one octet and that octet is zero, the point O at infinity is output. Otherwise, the elliptic curve point P = (h, k) is computed as follows. OS2EC decomposes PO = PC ‖ H ‖ K, with PC the first octet and with H an octet string of length l. If PC does not match the method of compression, OS2EC returns error. Otherwise, it uses OS2FE to compute the field element h. If no compression or hybrid compression is used, the Y-coordinate k is also computed by applying OS2FE to K. If (h, k) is not a point on the elliptic curve, error is reported. For the LSB or SORT compression, the Y-coordinate k is computed using h and the compression bit ỹ. If the hybrid scheme is used and the Y-coordinate k̃ computed from h and ỹ does not equal k, OS2EC halts after reporting error. If all computations are successful till now, the point (h, k) is output.
Note that the checks for (h, k) being on the curve and for the equality k = k̃ are optional and may be omitted. For the lossy compression scheme, the Y-coordinate k is not necessarily uniquely determined from the input octet string PO. In that case, either of the two possibilities is output.
Ring elements are elements of the convolution polynomial ring R = Z[x]/(x^n – 1) and can be identified with polynomials with integer coefficients and of degrees < n. The element a(x) = a0 + a1x + · · · + an–1x^(n–1) ∈ R (where each ai ∈ Z) is represented by the n-tuple of integers (a0, a1, . . . , an–1). The IEEE draft P1363.1 assumes that the coefficients ai are available modulo a positive integer β ≤ 256. But then each ai is an integer in {0, 1, . . . , β – 1} and can be naturally encoded by a single octet. RE2OS, upon receiving a(x) as input, outputs the octet string a0a1 . . . an–1 of length n.
An example: Let n = 7 and β = 128. The ring element a(x) = 2 + 11x + 101x^3 + 127x^4 + 71x^5 = (2, 11, 0, 101, 127, 71, 0) is converted to the octet string 02 0b 00 65 7f 47 00.
Let an octet string a0a1 . . . an–1 of length n be given, which we want to convert to an element of R = Z[x]/(x^n – 1). Once again a modulus β ≤ 256 is assumed, so that each octet ai can be viewed as an integer reduced modulo β. Making the natural identification of ai with an integer, the polynomial a(x) = a0 + a1x + · · · + an–1x^(n–1) is output. Thus, for example, the octet string 02 0b 00 65 7f 47 00 gets converted to the ring element 2 + 11x + 101x^3 + 127x^4 + 71x^5.
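The RE2OS/OS2RE pair is almost a direct byte copy; a sketch, with coefficient tuples as Python lists (our convention):

```python
def re2os(coeffs, n: int, beta: int = 256) -> bytes:
    """RE2OS sketch: n coefficients, each reduced modulo beta <= 256, one per octet."""
    a = list(coeffs) + [0] * (n - len(coeffs))   # pad with zeros up to degree < n
    if any(not 0 <= c < beta for c in a):
        raise ValueError("coefficient out of range")
    return bytes(a)

def os2re(octets: bytes, beta: int = 256):
    """OS2RE sketch: read each octet back as a coefficient modulo beta."""
    return [o % beta for o in octets]
```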
The RE2BS primitive assumes that the modulus β is a power of 2, that is, β = 2^t for some positive integer t ≤ 8. Let a ring element a(x) = a0 + a1x + · · · + an–1x^(n–1) be given, where each ai ∈ {0, 1, . . . , 2^t – 1}. One applies the I2BS primitive on each ai to generate the bit string ai,0ai,1 . . . ai,t–1 of length t. The concatenated bit string
a0,0a0,1 . . . a0,t–1 a1,0a1,1 . . . a1,t–1 . . . an–1,0an–1,1 . . . an–1,t–1
of length nt is then returned by RE2BS.
As before, take the example of n = 7, β = 128 = 2^7 (so that t = 7) and a(x) = 2 + 11x + 101x^3 + 127x^4 + 71x^5 = (2, 11, 0, 101, 127, 71, 0). The coefficients 2, 11, 0, . . . should first be converted to bit strings of length 7 each, that is, 2 gives 0000010, 11 gives 0001011 and so on. Thus, the bit string output by RE2BS will be 0000010 0001011 0000000 1100101 1111111 1000111 0000000. Note that here we have shown the bits in groups of 7 in order to highlight the intermediate steps (the outputs from I2BS). With the otherwise standard grouping in blocks of 8, the output bit string looks like 0 00001000 01011000 00001100 10111111 11100011 10000000 and hence transforms to the octet string 00 08 58 0c bf e3 80 by an invocation of BS2OS. This example illustrates that RE2BS followed by BS2OS does not necessarily give the same output as the direct conversion RE2OS, even when every underlying parameter (like β) remains unchanged.
Once again we require the modulus β to be a power 2^t of 2. Let a bit string a0a1 . . . al–1 of length l be given, and we want to compute the ring element a(x) equivalent to this. If l is not an integral multiple of t, the algorithm should quit after reporting error. Otherwise, we let l = nt for some positive integer n, and repeatedly call the BS2I primitive on the bit strings a0a1 . . . at–1, atat+1 . . . a2t–1, . . . , ant–tant–t+1 . . . ant–1 to get the integers α0, α1, . . . , αn–1 respectively. The polynomial a(x) = α0 + α1x + · · · + αn–1x^(n–1) is then output.
We urge the reader to verify that BS2RE with β = 128 and the bit string
0000010 0001011 0000000 1100101 1111111 1000111 0000000
as input produces the ring element 2 + 11x + 101x^3 + 127x^4 + 71x^5.
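This verification is easy to script. A sketch of RE2BS and BS2RE, with bit strings as '0'/'1' Python strings and coefficient tuples as lists (our conventions):

```python
def re2bs(coeffs, t: int) -> str:
    """RE2BS sketch (beta = 2^t): concatenate the t-bit string of each coefficient."""
    return "".join(format(c, "0{}b".format(t)) for c in coeffs)

def bs2re(bits: str, t: int):
    """BS2RE sketch: split into t-bit blocks; error if t does not divide the length."""
    if len(bits) % t:
        raise ValueError("error: length not a multiple of t")
    return [int(bits[i:i + t], 2) for i in range(0, len(bits), t)]
```

On the 49-bit string above with t = 7, `bs2re` recovers the coefficient tuple (2, 11, 0, 101, 127, 71, 0).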
A binary (ring) element is an element a(x) = a0 + a1x + · · · + an–1x^(n–1) ∈ R with each ai ∈ {0, 1}. One can convert a(x) to an octet string A0A1 . . . Al–1 of any desired length l as follows. We denote the bits in the octet Ai as Ai,7Ai,6 . . . Ai,0. Here, the index of the bits increases from right to left.
First we rewrite the polynomial a(x) as one of degree 8l – 1, that is, as a(x) = a0 + a1x + · · · + a8l–1x^(8l–1). If n ≤ 8l, this can be done by setting an = an+1 = · · · = a8l–1 = 0. On the other hand, if n > 8l and one or more of the coefficients a8l, a8l+1, . . . , an–1 are non-zero (that is, 1), the above rewriting of a(x) cannot be done and BE2OS terminates after reporting failure.
When the above rewriting of a(x) becomes successful, one sets the bits of the output octets as A0,0 := a0, A0,1 := a1, . . . , A0,7 := a7, A1,0 := a8, A1,1 := a9, . . . , A1,7 := a15, A2,0 := a16, A2,1 := a17, . . . , A2,7 := a23, . . . , Al–1,0 := a8l–8, Al–1,1 := a8l–7, . . . , Al–1,7 := a8l–1.
As an example, take n = 20 and consider the binary element a(x) = 1 + x + x^2 + x^10 + x^12. First let l = 1. Rewriting a(x) as a polynomial of degree 7 is not possible, since the coefficients of x^10 and x^12 are 1; so BE2OS outputs error in this case. If l = 2, then the output octet string will be 00000111 00010100, that is, 07 14. For l ≥ 3, the first two octets will be 07 and 14 as before, whereas the 3rd through l-th octets will be 00.
The BE2OS primitive can be quite effective for reducing storage requirements. For example, the polynomial a(x) of degree 12 of the previous paragraph, viewed as an element of Z[x]/(x^200 – 1) (that is, with n = 200), can be encoded in just two octets. Of course, by specifying l ≥ 3 one may add l – 2 trailing zero octets, if one desires. On the other hand, RE2OS requires exactly 200 octets, whereas RE2BS with β = 128 followed by BS2OS requires exactly ⌈(200 × 7)/8⌉ = 175 octets for storing the same a(x).
Assume that an octet string A0A1 . . . Al–1 of length l is given and the equivalent binary element in R = Z[x]/(x^n – 1) is to be determined. As in the case with BE2OS, we index the bits in the octet Ai as Ai = Ai,7Ai,6 . . . Ai,0. Now, consider the polynomial a(x) = a0 + a1x + a2x^2 + · · · + a8l–1x^(8l–1), where a8i+j = Ai,j. If n ≥ 8l, we set a8l = a8l+1 = · · · = an–1 = 0 and output the binary element a0 + a1x + · · · + an–1x^(n–1). On the other hand, if n < 8l and an = an+1 = · · · = a8l–1 = 0, then a0 + a1x + · · · + an–1x^(n–1) equals the polynomial a(x) and is returned. Finally, if n < 8l and if any of the coefficients an, an+1, . . . , a8l–1 is non-zero, then OS2BE returns error.[3]
[3] In this case, it still makes full algebraic sense to treat a(x) as an element of R, though not in the canonical representation.
For example, assume that the octet string 07 14 is given as input to OS2BE. If n ≤ 12, the algorithm outputs error, because the polynomial a(x) in this case has degree 12. For any n ≥ 13, the binary element 1 + x + x^2 + x^10 + x^12 is returned.
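A matching sketch of OS2BE, again with the set-of-exponents representation (our convention):

```python
def os2be(octets: bytes, n: int):
    """OS2BE sketch: recover the exponent set; error if a one-bit lands at an
    index >= n (the coefficients a_n, ..., a_{8l-1} must all be zero)."""
    exps = {8 * i + j for i, o in enumerate(octets)
            for j in range(8) if (o >> j) & 1}
    if any(e >= n for e in exps):
        raise ValueError("error: non-zero coefficient beyond degree n - 1")
    return exps
```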
The public-key cryptography standards (PKCS) [254] refer to a set of standard specifications proposed by the RSA Laboratories. A one-line description of each of these documents is given in Table 6.3. In the rest of this section, we concentrate only on the documents PKCS #1 and #3.
| Document | Description |
|---|---|
| PKCS #1 | RSA encryption and signature |
| PKCS #2 | Merged with PKCS #1 |
| PKCS #3 | Diffie–Hellman key exchange |
| PKCS #4 | Merged with PKCS #1 |
| PKCS #5 | Password-based cryptography |
| PKCS #6 | Extension of X.509 public-key certificates |
| PKCS #7 | Syntax of cryptographic messages |
| PKCS #8 | Syntax and encryption of private keys |
| PKCS #9 | Attribute types for use in PKCS #6, #7, #8 and #10 |
| PKCS #10 | Syntax for certification requests |
| PKCS #11 | Cryptoki, an application programming interface (API) |
| PKCS #12 | Syntax of transferring personal information (private keys, certificates and so on) |
| PKCS #13 | Elliptic curve cryptography (under preparation) |
| PKCS #15 | Syntax for cryptographic token (like integrated circuit card) information |
PKCS #1 describes RSA encryption and RSA signatures. In this section, we summarize Version 2.1 (dated 14 June 2002) of the standard. This version specifies cryptographically stronger encoding procedures compared to the older versions. More specifically, the optimal asymmetric encryption procedure (OAEP [18]) for RSA encryption is incorporated in Version 2.0 of PKCS #1, whereas the new probabilistic signature scheme (PSS [19]) is introduced in Version 2.1. This latest draft also includes encryption and signature schemes compatible with older versions (1.5 and 2.0). However, adoption of the new algorithms is strongly recommended for enhanced security.
PKCS #1 Version 2.1 introduces the concept of multi-prime RSA, in which the RSA modulus n may have more than two prime divisors. For RSA encryption and decryption to work properly, we only need n to be square-free (Exercise 4.1). Using u > 2 prime divisors of n increases efficiency and does not degrade the security of the resulting system much, as long as u is not very large. More specifically, if T is the time for the RSA private-key operation without CRT, then the cost of this operation with CRT is approximately T/u^2 (neglecting the cost of CRT combination).
So an RSA modulus is of the form n = r1r2 . . . ru with u ≥ 2 and with pairwise distinct primes r1, . . . , ru. For the sake of conformity with the older versions of the standard, the first two primes are given the alternate special names p := r1 and q := r2. PKCS #1 does not mention any specific way of choosing the prime divisors ri of n, but encourages use of primes that make factorization of n difficult.
An RSA public exponent is an integer e, 3 ≤ e ≤ n – 1, with gcd(e, λ(n)) = 1, where λ(n) := lcm(r1 – 1, r2 – 1, . . . , ru – 1). An RSA public key is a pair (n, e) with n and e chosen as above.
The RSA private key corresponding to (n, e) can be stored in one of the two formats. In the first format, one maintains the pair (n, d) with the private exponent d so chosen as to make ed ≡ 1 (mod λ(n)). In the second format, one stores the five quantities (p, q, dP, dQ, qInv) and, if u > 2, the triples (ri, di, ti) for each i = 3, . . . , u. The meanings of these quantities are as follows:
| p | = | r1 |
| q | = | r2 |
| dP | ≡ | e^(–1) (mod p – 1) |
| dQ | ≡ | e^(–1) (mod q – 1) |
| qInv | ≡ | q^(–1) (mod p) |
| di | ≡ | e^(–1) (mod ri – 1) |
| ti | ≡ | (r1 . . . ri–1)^(–1) (mod ri) |
For the sake of consistency, one should store the CRT coefficient r1^(–1) (mod r2), that is, p^(–1) (mod q). In order to ensure compatibility with older versions of PKCS, q^(–1) (mod p) is stored instead.
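All these quantities are modular inverses and can be computed with Python's three-argument `pow`. A sketch with toy primes (the function name is ours; real moduli use primes of hundreds of bits):

```python
from math import gcd

def rsa_private_quantities(primes, e):
    """Compute the second-format private key for n = r1 r2 ... ru:
    (dP, dQ, qInv) plus, for u > 2, the triples (ri, di, ti), i >= 3."""
    r = list(primes)
    assert all(gcd(e, ri - 1) == 1 for ri in r)   # e must be a valid public exponent
    p, q = r[0], r[1]
    dP = pow(e, -1, p - 1)             # dP := e^(-1) (mod p - 1)
    dQ = pow(e, -1, q - 1)             # dQ := e^(-1) (mod q - 1)
    qInv = pow(q, -1, p)               # qInv := q^(-1) (mod p)
    triples, prod = [], p * q
    for ri in r[2:]:
        di = pow(e, -1, ri - 1)        # di := e^(-1) (mod ri - 1)
        ti = pow(prod, -1, ri)         # ti := (r1 ... r_{i-1})^(-1) (mod ri)
        triples.append((ri, di, ti))
        prod *= ri
    return dP, dQ, qInv, triples
```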
The RSA public-key operation is used to encrypt a message or to verify a signature. The PKCS draft calls these primitives RSAEP (encryption primitive) and RSAVP1 (verification primitive). Both are implemented in a straightforward manner, as in Algorithm 6.1.
Algorithm 6.1 [RSAEP / RSAVP1]
Input: RSA public key (n, e) and message/signature representative x.
Output: The ciphertext/message representative y.
Steps:
  if (x < 0) or (x ≥ n) { Return “Error: representative out of range”. }
  y := x^e (mod n).
  Return y.
The RSA decryption or signature-generation primitive is called RSADP or RSASP1 and is given in Algorithm 6.2. The operation depends on the format in which the private key K is stored. The correctness of the primitive is left to the reader as an easy exercise.
Algorithm 6.2 [RSADP / RSASP1]
Input: RSA private key K and the ciphertext/message representative y.
Output: The message/signature representative x.
Steps:
  if (y < 0) or (y ≥ n) { Return “Error: representative out of range”. }
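For u = 2, the second-format private-key operation reduces to two half-size exponentiations followed by a recombination step. The draft's own step-by-step listing is elided from the box above, so the following is a sketch (using the standard Garner recombination) rather than the draft's exact wording:

```python
def rsadp_crt(y: int, p: int, q: int, dP: int, dQ: int, qInv: int) -> int:
    """RSADP/RSASP1 sketch for the second key format with u = 2 primes."""
    if not 0 <= y < p * q:
        raise ValueError("error: representative out of range")
    xp = pow(y, dP, p)                 # x_p := y^dP (mod p)
    xq = pow(y, dQ, q)                 # x_q := y^dQ (mod q)
    h = (qInv * (xp - xq)) % p         # Garner's recombination coefficient
    return xq + q * h                  # x with x = x_p (mod p), x = x_q (mod q)
```

With the toy key p = 11, q = 13, e = 7 (so d = 43), the CRT result agrees with the first-format computation y^d (mod n).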
The encryption scheme RSAES–OAEP is based on the optimal asymmetric encryption procedure (OAEP) proposed by Bellare and Rogaway [18, 98]. In this procedure, a string of length slightly less than the size of the modulus n is probabilistically encoded using a hash function, and the encoded message is subsequently encrypted. The probabilistic encoding makes the encryption procedure semantically secure and (provably) provides resistance against chosen-ciphertext attacks. Under this scheme, an adversary can produce a valid ciphertext only if she knows the corresponding plaintext. Such an encryption scheme is called plaintext-aware. Given an ideal hash function, Bellare and Rogaway’s OAEP is plaintext-aware.
RSAES–OAEP uses a label L which is hashed by a hash function H. One may take L as the empty string. Other possibilities are not specified in the PKCS draft. SHA-1 (or SHA-256 or SHA-384 or SHA-512) is the recommended hash function. The hash values (in hex) of the empty string under these hash functions are given in Table 6.4.
| Function | Hash of the empty string |
|---|---|
| SHA-1 | da39a3ee 5e6b4b0d 3255bfef 95601890 afd80709 |
| SHA-256 | e3b0c442 98fc1c14 9afbf4c8 996fb924 27ae41e4 649b934c a495991b 7852b855 |
| SHA-384 | 38b060a7 51ac9638 4cd9327e b1b1e36a 21fdb711 14be0743 4c0cc7bf 63f6e1da 274edebf e76f65fb d51ad2f1 4898b95b |
| SHA-512 | cf83e135 7eefb8bd f1542850 d66d8007 d620e405 0b5715dc 83f4a921 d36ce9ce 47d0d13c 5d85f2b0 ff8318d2 877eec2f 63b931bd 47417a81 a538327a f927da3e |
The length of the hash output (in octets) is denoted by hLen. For SHA-1, hLen = 20. The RSA modulus n is assumed to be of octet length k. The octet length mLen of the input message M must be ≤ k–2hLen–2. RSAES–OAEP uses a mask-generation function designated as MGF (see Algorithm 6.11 for a recommended realization).
Algorithm 6.3 describes the RSA–OAEP encryption scheme which employs the EME–OAEP encoding scheme described in Algorithm 6.4. The use of a random seed makes the encryption probabilistic. We use the notation ‖ to denote string concatenation and ⊕ to denote bit-wise XOR.
Algorithm 6.3 [RSAES–OAEP encryption]
Input: The recipient’s public key (n, e), the message M (an octet string of length mLen) and an optional label L whose default value is the empty string.
Output: The ciphertext C of octet length k.
Steps:
  /* Check lengths */
  if (L is longer than what H can handle) { Return “Error: label too long”. }
  /* For example, for SHA-1 the input must be of length ≤ 2^61 – 1 octets. */
  if (mLen > k – 2hLen – 2) { Return “Error: message too long”. }
  /* Encode M to EM (EME–OAEP encoding scheme) */
The matching decryption operation is shown in Algorithm 6.5 which calls the EME–OAEP decoding procedure of Algorithm 6.6. The only error message that the decryption and decoding algorithms issue is decryption error. This is to ensure that an adversary cannot distinguish between different kinds of errors, because such an ability of the adversary may lead her to guess partial information about the decryption process and thereby mount a chosen-ciphertext attack.
Algorithm 6.4 [EME–OAEP encoding]
Input: The message M of octet length mLen, the label L.
Output: The EME–OAEP encoded message EM.
Steps:
  lHash := H(L).
  Generate the padding string PS with k – mLen – 2hLen – 2 zero octets.
  Generate the data block DB := lHash ‖ PS ‖ 01 ‖ M.
  Let seed := a random string of length hLen octets.
  Generate the data-block mask dbMask := MGF(seed, k – hLen – 1).
  Generate the masked data block maskedDB := DB ⊕ dbMask.
  Generate the mask for the seed seedMask := MGF(maskedDB, hLen).
  Generate the masked seed maskedSeed := seed ⊕ seedMask.
  Generate the encoded message EM := 00 ‖ maskedSeed ‖ maskedDB.
Algorithm 6.5 [RSAES–OAEP decryption]
Input: The recipient’s private key K, the ciphertext C to be decrypted and an optional label L (the default value of which is the null string).
Output: The decrypted message M.
Steps:
  if (the length of L is more than the limitation of H) or (the length of C is not k octets)
Algorithm 6.6 [EME–OAEP decoding]
Input: The encoded message EM and the label L.
Output: The EME–OAEP decoded message M.
Steps:
  lHash := H(L).
RSASSA–PSS employs the probabilistic signature scheme proposed by Bellare and Rogaway [19]. Under suitable assumptions about the hash function and the mask-generation function, the RSASSA–PSS scheme produces secure signatures which are also tight in the sense that forging RSASSA–PSS signatures is computationally equivalent to inverting RSA.
Algorithm 6.7 [RSASSA–PSS signature generation]
Input: The message M (an octet string) to be signed, the private key K of the signer.
Output: The signature S (an octet string of length k).
Steps:
Algorithm 6.8 [EMSA–PSS encoding]
Input: The message M to be encoded (an octet string), the maximum bit length emBits of OS2I(EM). One should have emBits ≥ 8hLen + 8sLen + 9.
Output: The encoded message EM, an octet string of length emLen := ⌈emBits/8⌉.
Steps:
  if (M is longer than what H can handle) { Return “Error: message too long”. }
  Generate the hashed message mHash := H(M).
  if (emLen < hLen + sLen + 2) { Return “Encoding error”. }
  Let salt := a random string of length sLen octets.
  Generate the salted message M′ := 00 00 00 00 00 00 00 00 ‖ mHash ‖ salt.
  Generate the hashed salted message mHash′ := H(M′).
  Generate the padding string PS with emLen – sLen – hLen – 2 zero octets.
  Generate the data block DB := PS ‖ 01 ‖ salt.
  Generate the data-block mask dbMask := MGF(mHash′, emLen – hLen – 1).
  Generate the masked data block maskedDB := DB ⊕ dbMask.
  Set to 0 the leftmost 8emLen – emBits bits of the leftmost octet of maskedDB.
  Compute EM := maskedDB ‖ mHash′ ‖ bc.
RSASSA–PSS signature generation (Algorithm 6.7) uses the EMSA–PSS encoding method (Algorithm 6.8). Verification (Algorithm 6.9) uses the EMSA–PSS decoding method (Algorithm 6.10). We assume that k is the octet length of the RSA modulus n. Let modBits denote the bit length of n. The encoded message is of length emLen = ⌈(modBits – 1)/8⌉ octets. The probabilistic behaviour of the encoding scheme is incorporated by the use of a random salt, the octet length of which is sLen. A hash function H that produces hash values of octet length hLen is employed.
Algorithm 6.9 [RSASSA–PSS signature verification]
Input: The message M, the signature S to be verified and the signer’s public key (n, e).
Output: Verification status of the signature.
Steps:
  if (the length of S is not k octets) { Return “Signature not verified”. }
  if (status is “consistent”) { Return “Signature verified”. }
  else { Return “Signature not verified”. }
Algorithm 6.10 [EMSA–PSS decoding]
Input: The message M (an octet string), the encoded message EM (an octet string of length emLen = ⌈emBits/8⌉) and the maximum bit length emBits of OS2I(EM). One should have emBits ≥ 8hLen + 8sLen + 9.
Output: Decoding status: “consistent” or “inconsistent”.
Steps:
  if (M is longer than what H can handle) { Return “inconsistent”. }
A mask-generation function (MGF1) is specified in the PKCS #1 draft. It is based on a hash function H. The mask-generation function is deterministic in the sense that its output is completely determined by its input. However, the (provable) security of the OAEP and PSS schemes is based on the pseudorandom nature of the output of the mask-generation function. This means that any part of the output should be statistically independent of the other parts. MGF1 derives this pseudorandomness from that of the underlying hash function H.
Algorithm 6.11 [MGF1: a mask-generation function]
Input: The seed mgfSeed (an octet string) and the desired octet length maskLen of the output mask. One requires maskLen ≤ 2^32 hLen, where hLen is the octet length of the hash function output.
Output: An octet string mask of length maskLen.
Steps:
  if (maskLen > 2^32 hLen) { Return “Error: mask too long”. }
The older encryption scheme RSAES–PKCS1–v1_5 is no longer recommended, since this scheme is not plaintext-aware, that is, with high probability, an adversary can generate ciphertexts without knowing the corresponding plaintexts. This allows the adversary to mount chosen-ciphertext attacks. The new drafts of PKCS #1 include this old scheme for backward compatibility. Encryption and decryption for RSAES–PKCS1–v1_5 are given in Algorithms 6.12 and 6.13. Here, k is the octet length of the modulus.
Algorithm 6.12 [RSAES–PKCS1–v1_5 encryption]
Input: The recipient’s public key (n, e) and the message M (an octet string).
Output: The ciphertext C, which is an octet string of length k.
Steps:
  if (mLen > k – 11) { Return “Error: message too long”. }
Algorithm 6.13 [RSAES–PKCS1–v1_5 decryption]
Input: The recipient’s private key K and the ciphertext C (an octet string).
Output: The plaintext message M (an octet string of length ≤ k – 11).
Steps:
  if (the length of the ciphertext is not k octets) { Return “decryption error”. }
  Try to decompose EM = 00 ‖ 02 ‖ PS ‖ 00 ‖ M, where PS is an octet string of length ≥ 8 containing only non-zero octets.
  if (the above decomposition is unsuccessful) { Return “decryption error”. }
The older RSA signature scheme RSASSA–PKCS1–v1_5 is not known to have security loopholes. (Nevertheless, the provably secure PSS scheme is recommended for future applications.) RSASSA–PKCS1–v1_5 uses EMSA–PKCS1–v1_5 message encoding procedure (Algorithm 6.16). The signature generation and verification procedures are given in Algorithms 6.14 and 6.15. Here, k denotes the octet length of the modulus n.
The EMSA–PKCS1–v1_5 message encoding procedure (Algorithm 6.16) uses a hash function H. Although a member of the SHA family is recommended for future applications, MD2 and MD5 are also supported for compliance with older applications. An octet string hashAlgo is used whose value depends on the underlying hash algorithm and is given in Table 6.5.
| Function | The string hashAlgo |
|---|---|
| MD2 | 30 20 30 0c 06 08 2a 86 48 86 f7 0d 02 02 05 00 04 10 |
| MD5 | 30 20 30 0c 06 08 2a 86 48 86 f7 0d 02 05 05 00 04 10 |
| SHA-1 | 30 21 30 09 06 05 2b 0e 03 02 1a 05 00 04 14 |
| SHA-256 | 30 31 30 0d 06 09 60 86 48 01 65 03 04 02 01 05 00 04 20 |
| SHA-384 | 30 41 30 0d 06 09 60 86 48 01 65 03 04 02 02 05 00 04 30 |
| SHA-512 | 30 51 30 0d 06 09 60 86 48 01 65 03 04 02 03 05 00 04 40 |
Algorithm 6.14 [RSASSA–PKCS1–v1_5 signature generation]
Input: The signer’s private key K and the message M to be signed (an octet string).
Output: The signature S (an octet string of length k).
Steps:
Algorithm 6.15 [RSASSA–PKCS1–v1_5 signature verification]
Input: The signer’s public key (n, e), the message M (an octet string) and the signature S to be verified (an octet string of length k).
Output: Verification status of the signature.
Steps:
  if (the length of S is not k octets) { Return “Signature not verified”. }
  if (EM = EM′) { Return “Signature verified”. }
  else { Return “Signature not verified”. }
Algorithm 6.16 [EMSA–PKCS1–v1_5 encoding]
Input: The message M (an octet string), the intended length emLen of the encoded message. One requires emLen ≥ tLen + 11, where tLen is the octet length of hashAlgo plus the octet length of the hash output.
Output: The encoded message EM (an octet string of length emLen).
Steps:
  if (M is longer than what H can handle) { Return “Error: message too long”. }
PKCS #3 describes the Diffie–Hellman key-exchange algorithm. The draft assumes the existence of a central authority which generates the domain parameters that include a prime p of octet length k, an integer g satisfying 0 < g < p and optionally a positive integer l. The integer g need not be a generator of Zp* (the multiplicative group of integers modulo p), but is expected to be of sufficiently large multiplicative order modulo p. The integer l denotes the bit length of the private Diffie–Hellman key of an entity. Values of l ≪ 8k can be chosen for efficiency. However, for maintaining a desired level of security, l should not be too small. Since the central authority determines p, g (and l), individual users need not bother about the generation of these parameters.
During a Diffie–Hellman key-exchange interaction of Alice with Bob, Alice performs the steps described in Algorithm 6.17. Bob performs an identical operation which is omitted here.
Algorithm 6.17 [Diffie–Hellman key exchange: Alice’s part]
Input: p, g and optionally l.
Output: The shared secret SK (an octet string of length k).
Steps:
  Alice generates a random private value x. /* If l is specified, one should have 2^(l–1) ≤ x < 2^l. */
  Alice computes y := g^x (mod p).
  Alice converts y to an octet string PV := I2OS(y, k).
  Alice sends the public value PV to Bob.
  Alice receives Bob’s public value PV′.
  Alice converts PV′ to the integer y′ := OS2I(PV′).
  Alice computes z := (y′)^x (mod p) (with 0 < z < p).
  Alice transforms z to the shared secret SK := I2OS(z, k).
In this chapter, we describe some standards for representation of cryptographic data in various formats and for conversion of data among different formats. We also present some standard encoding and decoding schemes that are applied before encryption and after decryption. These standards promote easy and unambiguous interfaces with the cryptographic primitives described in the previous chapter.
The IEEE P1363 range of standards defines several data types: bit strings, octet strings, integers, prime finite fields, finite fields of characteristic 2, extension fields of odd characteristic, elliptic curves, elliptic curve points and polynomial rings. The IEEE drafts also prescribe standard ways of converting data among these formats. For example, the primitive BS2OS converts a bit string to an octet string, and the primitive FE2I converts a finite-field element to an integer.
We subsequently mention some of the public-key cryptography standards (PKCS) propounded by RSA Laboratories. Draft PKCS #1 deals with RSA encryption and signature. In addition to the standard RSA moduli of the form pq, it also suggests possibility of using multi-prime RSA, that is, moduli which are products of more than two (distinct) primes. The draft recommends use of the optimal asymmetric encryption procedure (OAEP). This probabilistic encryption scheme provides provable security against chosen-ciphertext attacks. A probabilistic signature scheme is also advocated for use. These probabilistic schemes call for using a mask-generation function (MGF). A concrete realization of an MGF is also provided. Draft PKCS #3 standardizes the Diffie–Hellman key exchange algorithm.
The P1363 class of preliminary drafts [134] published by IEEE and the PKC standards [254] from RSA Security Inc. are available for free download from Internet sites. However, IEEE's published standard 1363-2000 must be purchased for a fee. In addition to the data types and data-conversion primitives described in this chapter, the IEEE drafts (P1363, P1363a, P1363.1 and P1363.2) provide encryption/decryption and signature generation/verification primitives and also several encryption and signature schemes based on these primitives. These schemes are very similar to the algorithms that we described in Chapter 5, so we avoid repeating the same descriptions here. Elaborate encoding procedures are described in the PKCS drafts, but only for RSA- and Diffie–Hellman-based systems. We have reproduced the details in this chapter. The remaining PKCS drafts deal with topics that this book does not directly cover. A notable exception is PKCS #13, which deals with elliptic-curve cryptography. This draft is not ready yet; when it is, it may be consulted to learn about RSA Laboratories' standards on elliptic-curve cryptography.
At present, the different families of standards do not seem to have mutually conflicting specifications. The IEEE has a (free) mailing list for promoting the development and improvement of the IEEE P1363 standards, via e-mail discussions.
Other Internet standards include the Federal Information Processing Standards (FIPS) [221] from NIST, and RFCs (Requests for Comments) from the Internet Engineering Task Force (IETF) [135].
| 7.1 | Introduction |
| 7.2 | Side Channel Attacks |
| 7.3 | Backdoor Attacks |
| Chapter Summary | |
| Suggestions for Further Reading | |
A man cannot be too careful in the choice of his enemies.
—Oscar Wilde (1854–1900), The Picture of Dorian Gray, 1891
If you reveal your secrets to the wind you should not blame the wind for revealing them to the trees.
—Kahlil Gibran (1883–1931)
There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult.
—Charles Antony Richard Hoare
The security of public-key cryptographic protocols is based on the apparent intractability of solving some computational problems. If one can factor large integers efficiently, one breaks RSA. In that sense, seeking good algorithms to solve these problems (like factoring integers) is part of cryptanalysis. Proving that no poly-time algorithm can break RSA would elevate the security of the protocol from assumed to provable. On the other hand, developing a poly-time algorithm for breaking RSA (or for factoring integers) would make RSA (and many other protocols) unusable. Though a temporary set-back to our existing cryptographic tools, such a discovery would enrich our understanding of the computational problems. In short, breaking the trapdoors of public-key cryptosystems is of both theoretical and practical significance.
But research along these mathematical lines is open-ended. A desperate cryptanalyst may not wait indefinitely for a theoretical resolution. She instead tries to find loopholes in the systems that she can effectively exploit to gain secret information.
A cryptographic protocol must be implemented (in software or hardware) before it can be used. Careless implementations often supply the loopholes that cryptanalysts wait for. For example, a software implementation of a public-key system may allow the private key to be read only from a secure device (a removable medium, like a CD-ROM), but may make copies of the key in the memory of the machine where the decryption routine is executed. If the decryption routine does not lock and eventually flush the memory holding the key, a second user having access to the machine can simply read off the secrets.
Software and hardware implementations often tend to leak out secrets at a level much more subtle than the example just mentioned. A public-key algorithm is a known algorithm and involves a sequence of well-defined steps dictated by the private key. Each step requires its own share of execution time and power consumption. Watching the decrypting device carefully during a private-key operation may reveal information about the exact sequence of basic steps in the algorithm. Random hardware faults during a private-key operation may also compromise security. Such attacks are commonly dubbed side-channel attacks.
Let us now look at another line of attack. Not every user of cryptography is expected to implement all the routines she uses. On the contrary, most users run precompiled programs available from third parties. How will a user assess the soundness of the products she is using, that is, who will guarantee that there are no (intentional or unintentional) security snags in the products? The key generation software available from a malicious software designer may initiate a clandestine e-mail every time a key pair is generated. It is also possible that a private key supplied by such a program is generated from a small predefined set known to the designer. Even when private keys look random, they need not come with the desired unpredictability necessary for cryptographic usage. Such attacks during key generation are called backdoor attacks.
In short, public-key cryptanalysis at present encompasses trapdoors, backdoors and side channels. The trapdoor methods have already been discussed in Chapter 4. In this chapter, we concentrate on the other attacks on public-key systems.
Side-channel attacks refer to a class of cryptanalytic tools for determining a private key by measuring signals (like timing, power fluctuation, electromagnetic radiation) from or by inducing faults in the device performing operations involving the private key. In this section, we describe three methods of side-channel cryptanalysis: timing attack, power attack and fault attack.
Paul C. Kocher introduced the concept of side-channel cryptanalysis in his seminal paper [155] on timing attacks. Though not unreasonable, timing attacks are somewhat difficult to mount in practice.
The private-key operation in many cryptographic systems (like RSA or discrete-log-based systems) is usually a modular exponentiation of the form
y := x^d (mod n),
where d is the private key. The private-key procedure may involve other overheads (like message decoding), but the running time of the routine is usually dominated by, and so can be approximated by, the time of the modular exponentiation.
Assume that this exponentiation is carried out by a square-and-multiply algorithm known to Carol, the attacker. For example, suppose that Algorithm 3.9 is used. Each iteration of the for loop involves a modular squaring followed conditionally by a modular multiplication. The multiplication is done in an iteration if and only if the corresponding bit ei in the exponent is 1. Thus, an iteration runs slower if ei = 1 than if ei = 0. If Carol could measure the timing of each individual iteration of the for loop, she would correctly guess most (if not all) of the bits in the exponent. But it is unreasonable to assume that an attacker can collect such detailed timing data. Moreover, if Algorithm 3.10 is used, these detailed data do not help much, because in this case the timing of an individual iteration of the for loop can at best differentiate between the two cases ei = 0 and ei ≠ 0. There are 2^t – 1 non-zero values for each ei.
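For concreteness, here is a Python sketch of the left-to-right binary square-and-multiply loop, instrumented to record the operation sequence that a per-iteration side channel would expose. The trace parameter is our illustrative addition, not part of Algorithm 3.9 itself.

```python
def modexp_sqr_mul(x: int, d: int, n: int, trace=None):
    """Left-to-right binary square-and-multiply.  The optional trace
    list records 'S'/'M' per operation, mimicking what a per-iteration
    timing (or power) channel would reveal about the exponent bits."""
    y = 1
    for bit in bin(d)[2:]:          # exponent bits, most significant first
        y = (y * y) % n             # unconditional squaring
        if trace is not None:
            trace.append("S")
        if bit == "1":              # data-dependent extra multiplication
            y = (y * x) % n
            if trace is not None:
                trace.append("M")
    return y

ops = []
y = modexp_sqr_mul(5, 0b1011, 1000003, ops)
assert y == pow(5, 0b1011, 1000003)
# The 'M's mark exactly the 1 bits of the exponent 1011:
assert "".join(ops) == "SMSSMSM"
```

Anyone who can distinguish the slow (SM) iterations from the fast (S) ones reads the exponent directly off the trace.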
However, it is not difficult to think of a situation where the attacker can measure, to a reasonable accuracy, the total time of the exponentiation. In order to guess d, Carol requires the times of the modular exponentiations for several different values of x, say x1, . . . , xk, all known to her. (Note that the xi may be messages to be signed or intercepted ciphertexts.) The same exponent d is used for all these exponentiations. Let Ti be the time for computing xi^d (mod n), as measured by Carol. We may assume that all these k exponentiations are carried out on the same machine using the same routine.
Kocher considers the attack on the exponentiation routine of RSAREF, a cryptography toolkit available from the RSA Laboratories. This routine implements Algorithm 3.10 with t = 2. For the sake of convenience, the algorithm is reproduced below. We may assume that the exponent has an even number of bits—if not, pad a leading zero.
Algorithm 7.1
Input: x, d = (d2l–1d2l–2 · · · d1d0)2 and n.
Output: y := x^d (mod n).
Steps:
(1) z1 := x.
(2) z2 := x^2 (mod n).
(3) z3 := x^3 (mod n).
(4) y := 1.
(5) For j := l – 1 down to 0, repeat Steps (6) through (9):
(6)   y := y^2 (mod n).
(7)   y := y^2 (mod n).
(8)   If ((d2j+1d2j)2 ≠ 0), then
(9)     y := y × z_(d2j+1d2j)2 (mod n).
(10) Return y.
Every step of the above algorithm runs in a time dependent on the operands. For example, the modular multiplication in Step (9) takes time dependent on the operands y and z_(d2j+1d2j)2. The variation in the timing depends on the implementation of the modular arithmetic routines and also on the machine's architecture. However, we make the assumption that for fixed operands each step requires a constant time on a given machine (or on identical machines). This is actually a loss of generality, since the running time of a complex step (like modular multiplication or squaring) for fixed operands may vary for various reasons like process scheduling, availability of cache, page faults and so on. It may be difficult, perhaps impossible, for an attacker to arrange for herself a verbatim emulation of the victim's machine at the time when the latter performed the private-key operations. Let us still proceed with our assumption, say by conceiving of a not-so-unreasonable situation where the effects of these other factors are not sizable enough.
We use the subscript i to denote the i-th private-key operation for 1 ≤ i ≤ k. The entire routine takes time Ti for the i-th exponentiation, that is, for the input xi. This measurement may involve some (unknown) error which we denote by ei. The first four steps are executed only once during each call and take a total time of pi (precomputation time). The for loop is executed l times. We ignore the time needed to maintain the loop (like decrementing j) and also the time taken by the if statement in Step (8). Let si,j and ti,j be the times taken respectively by Steps (6) and (7), when the loop variable (j) assumes the value j. If Step (9) is executed, we denote by mi,j the time taken by this step, else we set mi,j := 0. It follows that
Equation 7.1

Ti = ei + pi + Σj (si,j + ti,j + mi,j),

where the index j in the sum decreases from l – 1 to 0 in steps of 1. Carol does not know this break-up (that is, the explicit values of ei, si,j, ti,j and mi,j), but she can make an inductive guess in the following way.
Carol manages a machine and a copy of the exponentiation software, both identical to those of the victim. She then successively guesses the secret bit pairs d2l–1d2l–2, d2l–3d2l–4, d2l–5d2l–6, and so on. Assume that at some stage Carol has correctly determined the exponent bits d2j+1d2j for j = l–1, l–2, . . . , j′+1. Initially j′ = l–1. Using this information, Carol computes d2j′+1d2j′ as follows. Carol's knowledge at this stage allows her to measure pi and si,j, ti,j, mi,j for j = l – 1, . . . , j′ + 1: she simply runs Algorithm 7.1 on xi. Carol then enters the loop with j = j′. The squaring operations are unconditional. Carol has the exact operands as the victim for the squaring steps. So Carol also measures si,j′ and ti,j′.
The bit pair d2j′+1d2j′ (considered as a binary integer) can take any one of the four values g = 0, 1, 2, 3. Carol measures the time m̂i,j′(g) of Step (9) for each of the four choices of g and adds this time to the time taken by the algorithm so far, in order to obtain:

Equation 7.2

T̂i(g) = pi + Σj=l–1,...,j′+1 (si,j + ti,j + mi,j) + si,j′ + ti,j′ + m̂i,j′(g).
Kocher observed that the distribution of Ti, i = 1, . . . , k, is statistically related to that of T̂i(g) only for the correct guess g. In order to see how, we subtract Equation (7.2) from Equation (7.1) to get:

Equation 7.3

Ti – T̂i(g) = ei + (mi,j′ – m̂i,j′(g)) + Σj=j′–1,...,0 (si,j + ti,j + mi,j).
Let us assume that the error term ei is distributed like a random variable E. Similarly suppose that each multiplication (resp. squaring) has the distribution of a random variable M (resp. S). Taking the variance of Equation (7.3) over the values i = 1, 2, . . . , k and assuming that the sample size k is so large that the sample variances are very close to the variances of the respective random variables, we obtain:
Equation 7.4

Var(Ti – T̂i(g)) ≈ Var(E) + 2j′ Var(S) + λ Var(M) + Var(mi,j′ – m̂i,j′(g)),
where λ denotes the number of times Step (9) is executed for j = j′ – 1, . . . , 0. Note that λ is dependent on the private key and not on the arguments to the exponentiation routine. For the correct guess g, we have mi,j′ = m̂i,j′(g), and so

Var(Ti – T̂i(g)) ≈ Var(E) + 2j′ Var(S) + λ Var(M).

On the other hand, for an incorrect guess g we have:

Var(Ti – T̂i(g)) ≈ Var(E) + 2j′ Var(S) + (λ + 1) Var(M)

if one of mi,j′ and m̂i,j′(g) is zero, or

Var(Ti – T̂i(g)) ≈ Var(E) + 2j′ Var(S) + (λ + 2) Var(M)

if both mi,j′ and m̂i,j′(g) are non-zero. (Recall that Var(αX + βY) = α^2 Var(X) + β^2 Var(Y) for independent random variables X, Y and any real α, β.)
Calculation of the sample variances of Ti – T̂i(g) for the four choices of g gives Carol a handle to determine (or guess) the correct choice. Carol simply takes the g for which the variance is minimum. This is the fundamental observation that makes the timing attack work.
Of course, statistical irregularities exist in practice, and the approximation of the actual variances by the sample variances introduces errors in Equation (7.4). These errors are of particular concern for large values of j′, that is, during the beginning of the attack. However, if an incorrect guess is made at a certain stage, this is detected soon with high probability, as Carol proceeds further. Suppose that an erroneous guess of d2j″+1d2j″ has been made for some j″ > j′. This means that the values of y are different from the actual values starting from the iteration of the loop with j = j″ – 1. (We may assume that most, if not all, xi ≠ 1.) We then do not have a cancellation of the timings for j = j″ – 1, . . . , j′. More correctly, if the guesses for j = l – 1, . . . , j″ + 1 are correct and the first error occurs at j = j″, then denoting the subsequent timings by ŝi,j, t̂i,j and m̂i,j, one gets

Equation 7.5

Ti – T̂i = ei + Σj=j″–1,...,j′ (si,j + ti,j + mi,j – ŝi,j – t̂i,j – m̂i,j) + Σj=j′–1,...,0 (si,j + ti,j + mi,j).
Since each of the square and multiplication operations takes y as an operand, the original timings and the measured timings (the ones with hat) behave like independent variables and, therefore, taking the variance of Equation (7.5) yields

Var(Ti – T̂i) ≈ Var(E) + (2j′ + 4(j″ – j′)) Var(S) + λ′ Var(M)

for some λ′ depending on the private key and on the previous guesses, but independent of the current guess g. In other words, Carol loses a meaningful relation of Var(Ti – T̂i(g)) with the correctness of the current guess. Once Carol notices this, she backtracks and changes older guesses until the expected behaviour is restored. Thus, the timing attack comes with an error detection and correction strategy.
An analysis done by Kocher (neglecting E and assuming normal distributions for S and M) shows that Carol needs k = O(l) samples for a good probability of success.
There are several ways in which timing attacks can be prevented.
If every multiplication step takes exactly the same time and so does every squaring step, the above timing attack does not work. Thus, forcing each multiplication and each squaring to take the same respective times, independent of their operands, prevents Carol from mounting the timing attack. Making mi,j constant alone does not suffice, for differences in squaring timings can be exploited in subsequent iterations to correct a guess. Forcing every operation to take exactly as long as the slowest possibility makes the implementation run slower. Moreover, finding the slowest possibility may be difficult.
Interleaving random delays also makes timing attacks difficult to mount, because the attacker then requires more samples in order to smooth out the effect of the delays. But again, adding delays harms performance and does not completely rule out the possibility of timing attacks.
Perhaps the best strategy to thwart timing attacks is to use a random pair (u, v) with v := u^(–d) (mod n) for each private-key operation. Initially x is multiplied by u, and then the product ux is exponentiated to get (ux)^d ≡ u^d x^d ≡ v^(–1) y (mod n). Multiplication by v then yields the desired y. A new random pair (u, v) must be used for every exponentiation. However, the exponentiation v := u^(–d) (mod n) is too costly to be performed during every private-key operation and may itself invite timing attacks. A good trade-off is to choose (u, v) once, keep it secret, and for the next private-key operation update (and replace) the old (u, v) by (u′, v′) with u′ ≡ u^e (mod n) and v′ ≡ v^e (mod n) for some small e (random or deterministic). The choice e = 2 is quite satisfactory in practice: performing two modular squarings is much cheaper than computing the full exponentiation v := u^(–d) (mod n).
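This blinding countermeasure can be sketched in Python with toy RSA-style parameters (all values hypothetical, chosen only for illustration); the mask is refreshed with the e = 2 squaring update suggested above.

```python
import secrets
from math import gcd

# Toy RSA-style parameters (hypothetical, for illustration only).
p, q = 1009, 1013
n, phi = p * q, (p - 1) * (q - 1)
e = 17
d = pow(e, -1, phi)                  # private exponent

# One-time setup: a random u with v := u^(-d) (mod n).
while True:
    u = secrets.randbelow(n - 2) + 2
    if gcd(u, n) == 1:
        break
v = pow(pow(u, d, n), -1, n)

def blinded_private_op(x: int) -> int:
    """Exponentiate the blinded input u*x, then unblind with v;
    the square-and-multiply loop never sees x itself."""
    global u, v
    y = (v * pow((u * x) % n, d, n)) % n   # v * (ux)^d = x^d (mod n)
    u, v = (u * u) % n, (v * v) % n        # refresh the mask (e = 2)
    return y

x = 123456
assert blinded_private_op(x) == pow(x, d, n)
assert blinded_private_op(x) == pow(x, d, n)   # still correct after update
```

Since u′ = u^2 and v′ = v^2 = (u′)^(–d) (mod n), the invariant v ≡ u^(–d) is preserved across calls.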
In connection with timing attacks, we mentioned that if an adversary were able to measure the timing of each iteration of the square-and-multiply loop during an RSA (or discrete-log-based) private-key exponentiation, she could guess the bits in the key quite efficiently from only a few timing measurements. But it is questionable whether such detailed timing data can be made available.
Now, think of a situation where Carol can measure patterns of power consumption made by the decrypting (or signing) device during one or more private-key operations with Alice's private key. If Alice carries out the private-key operations in her personal workstation, it is difficult for Carol to conduct such measurements. So assume that Alice is using a smart card driven by a device over which Carol has some control. Carol inserts a small resistor in series with the line which drives Alice's smart card. The power consumed by the smart-card circuit is roughly proportional to the current through the resistor. Measuring the voltage across the resistor (and multiplying by a suitable factor), Carol can observe the power consumed by Alice's decryption device. Carol has to use a power measuring device that takes readings at a high frequency (100 MHz to several GHz, depending on Carol's budget). A set of power measurements obtained during a cryptographic operation is called a power trace. We now study how power traces can reveal Alice's secrets.
The individual steps in a private-key operation may be nakedly exposed in a power trace. This is, in particular, the case when different steps consume different amounts of power and/or take different times. Obtaining information about the operation of the decrypting device and/or the secrets by a direct interpretation of power traces is referred to as simple power analysis or SPA in short.
As an example of SPA, consider an implementation of RSA exponentiation using the naive square-and-multiply Algorithm 3.9. Here, the most power-consuming operations are modular squaring and modular multiplication. Modular multiplication typically runs slower than modular squaring. Also, modular multiplication requires two different operands to be fetched from memory, whereas modular squaring requires only one operand. Thus, a multiplication operation has more and longer power requirements than a squaring operation.
A hypothetical[1] SPA trace during a portion of an RSA private-key operation is shown in Figure 7.1. Each spike in the trace corresponds to either a square or a multiplication operation. Let us assume that the power consumption is measured with sufficient resolution, so that no spike is missed. Since multiplication runs longer (and requires more operands) than squaring, multiplication spikes are wider than squaring spikes.
[1] SPA traces from real-life experiments on smart cards, as reported in several references, look similar to this. We, however, generated the trace using a random number generator. Absolute conformity to reality is not always crucial for the purposes of illustration.
[Figure 7.1: A hypothetical SPA power trace during a portion of an RSA private-key operation]
Let us denote a squaring operation by S and a multiplication operation by M. We observe that Alice’s smart card performs the sequence
SMSMSSMSSSSMSSSMSS
of operations during the measurement interval shown. Since multiplication in an iteration of the loop is skipped if and only if the corresponding bit in the exponent is zero, we can group the operations as
(SM)(SM)(S)(SM)(S)(S)(S)(SM)(S)(S)(SM)(S)(S . . .
This, in turn, reveals the bit string 110100010010 in Alice's private key (the final S begins an iteration that extends beyond the measurement interval).
Effective as it appears, SPA, in practice, does not pose a huge threat to the security of conventional cryptographic systems. Using algorithms for which power traces do not bear direct relationships with the bits of the private key largely reduces risks of fruitful SPA. The inefficient repeated square-and-multiply Algorithm 7.2 always performs a multiplication after squaring and thereby eliminates chances of a successful SPA.
Algorithm 7.2
Input: x, d = (dl–1dl–2 · · · d1d0)2 and n.
Output: y := x^d (mod n).
Steps:
(1) y := 1.
(2) For j := l – 1 down to 0, repeat Steps (3) through (5):
(3)   y := y^2 (mod n).
(4)   z := xy (mod n).
(5)   If (dj = 1), then y := z.
(6) Return y.
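The square-and-always-multiply idea can be sketched as follows (our own Python rendering of this loop structure): every iteration performs one squaring and one multiplication, so the S/M pattern of a power trace no longer depends on the key bits.

```python
def modexp_always_multiply(x: int, d: int, n: int) -> int:
    """Square-and-always-multiply exponentiation: each iteration does
    one squaring and one multiplication regardless of the key bit, so
    the operation sequence leaks nothing through an SPA trace."""
    y = 1
    for bit in bin(d)[2:]:
        y = (y * y) % n      # always square
        z = (y * x) % n      # always multiply ...
        if bit == "1":
            y = z            # ... but keep the product only for a 1 bit
    return y

assert modexp_always_multiply(7, 45, 999983) == pow(7, 45, 999983)
```

Note that on real hardware even the conditional assignment can leak; a constant-time selection of y or z would be needed there.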
Using the (more efficient) Algorithm 7.1 also frustrates SPA. Some chunks of two successive 0 bits are anyway revealed by power traces collected during the execution of this algorithm. But, for a decently large and random private key, this still leaves Carol with many unknown bits to guess. Note, however, that none of the three remedies suggested to thwart the timing attack on Algorithm 7.1 seems to be effective in the context of SPA. Delays normally do not consume much power (unless some power-intensive dummy computations fill up the delays). Also, the masking of (x, y) by (u, v) fails to produce any alteration in the power-consumption pattern during exponentiation.
If some private-key algorithm has unavoidable branchings due to individual bits in the private key, SPA can prove to be a notorious botheration.
A carefully designed algorithm (like Algorithm 7.2) does not reveal key information from a simple observation of power traces. Moreover, the observed power traces may be corrupted by noise to an extent where SPA is not feasible. In such cases, differential power analysis (DPA) often helps the cryptanalyst reduce the effects of noise and exploit subtle correlation of power consumption patterns with specific bits in the operands. DPA requires availability of power traces from several private-key operations with the same key.
Consider the SPA-resistant Algorithm 7.2. Suppose that k power traces P1(t), . . . , Pk(t) for the computations of xi^d (mod n), i = 1, . . . , k, are available to Carol, that the ciphertexts x1, . . . , xk are known to Carol and that d = (dl–1 · · · d1d0)2. Carol successively guesses the bits dl–1, dl–2, dl–3, . . . of the exponent. Suppose that Carol has correctly guessed dj for j = l – 1, . . . , j′ + 1. She now uses DPA to guess dj′.
Let e := (dl–1dl–2 · · · dj′+1)2. At the beginning of the for loop with j = j′ the variable y holds the value x^e modulo n. The loop computes x^(2e) and x^(2e+1) and assigns y the appropriate value. If dj′ = 0, then in the next iteration the loop computes x^(4e) and x^(4e+1), whereas if dj′ = 1, then in the next iteration the loop computes x^(4e+2) and x^(4e+3). It follows that the algorithm handles the value x^(4e) if and only if dj′ = 0.
For each i = 1, . . . , k, Carol computes zi := xi^(4e) (mod n). Carol then chooses a particular bit position (say, the least significant bit) and considers the bit bi of zi at this position. We make the assumption that there is some subsequent step (or substep) in the implementation for which the average power consumption Π0 for bi = 0 is different from the average power consumption Π1 for bi = 1.[2]
[2] The exact step which exhibits differential bias toward an individual bit value is dependent on the implementation. If the implementation does not provide such a step, the attack cannot be mounted in this way. Initially, DPA was proposed for DES, a symmetric encryption algorithm, in which such a dependence is clearly available. With asymmetric-key encryption, such a strong dependence of the power consumed by a step on an individual bit value is not obvious. One may, however, use other dividing criteria, like low versus high Hamming weight (that is, number of one-bits) in the operand, which bear more direct relationships with power consumption.
Carol partitions {1, . . . , k} into two subsets:
| I0 | := | {i | bi = 0}, |
| I1 | := | {i | bi = 1}. |
Carol computes the average power traces

P̄0(t) := (1/|I0|) Σi∈I0 Pi(t)  and  P̄1(t) := (1/|I1|) Σi∈I1 Pi(t),

and subsequently the differential power trace

Δ(t) := P̄1(t) – P̄0(t).
First, let dj′ = 0. In this case, the routine handles zi = xi^(4e) (mod n), and so the power consumption at some time τ is correlated to the bit bi of zi. At any other instant, the power consumption is uncorrelated to this particular bit value. Therefore, if the sample size is sufficiently large and if the measurement noise has mean zero, we have:

Δ(τ) ≈ Π1 – Π0, and Δ(t) ≈ 0 for t ≠ τ.
On the other hand, if dj′ = 1, the value xi^(4e) (mod n) never appears in the execution of the algorithm, and so at every time t the power consumption is uncorrelated to the particular bit of zi; we therefore expect

Δ(t) ≈ 0 for all t.
Figure 7.2 illustrates the two cases.[3] If the differential power trace has a distinct spike, the guess dj′ = 0 is correct. So by observing the existence or otherwise of a spike, Carol determines whether dj′ = 0 or dj′ = 1.
[3] Once again, these are hypothetical traces obtained by random number generators.
[Figure 7.2: Hypothetical differential power traces: (a) for the correct guess; (b) for an incorrect guess]
The number k of samples required for a good probability of success depends on the bias Π1 – Π0 relative to the measurement noise. We assume that Π1 > Π0. If the noise has a variance of σ^2, then by the central limit theorem the noise in each average power trace P̄0(t) or P̄1(t) has at each t an approximate variance 2σ^2/k, and so in the differential power trace Δ(t) the noise has an approximate variance 4σ^2/k. In order that the bias Π1 – Π0 stands out against the noise, we require Π1 – Π0 to be a few times the noise standard deviation 2σ/√k, say, Π1 – Π0 ≥ 8σ/√k, that is, k ≥ 64σ^2/(Π1 – Π0)^2.
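The statistics above can be checked with a toy Monte Carlo simulation (all numbers below are hypothetical and the trace model is deliberately simplistic: Gaussian noise everywhere, plus a bit-dependent bias at a single instant τ).

```python
import random

random.seed(1)                     # reproducible toy experiment
k, T = 10000, 50                   # number of traces, samples per trace
tau = 23                           # instant where consumption depends on b_i
PI0, PI1, sigma = 1.0, 1.3, 2.0    # hypothetical biases and noise level
# Here 64*sigma^2/(PI1 - PI0)^2 is about 2845, so k = 10000 is comfortable.

bits = [random.randrange(2) for _ in range(k)]
traces = []
for b in bits:
    t = [random.gauss(0.0, sigma) for _ in range(T)]
    t[tau] += PI1 if b else PI0    # bit-dependent consumption at time tau
    traces.append(t)

# Partition the traces by the predicted bit and average each class.
I0 = [i for i in range(k) if bits[i] == 0]
I1 = [i for i in range(k) if bits[i] == 1]
avg = lambda I, t: sum(traces[i][t] for i in I) / len(I)

# The differential trace shows a spike at tau and only noise elsewhere.
delta = [avg(I1, t) - avg(I0, t) for t in range(T)]
assert max(range(T), key=lambda t: abs(delta[t])) == tau
```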
Several countermeasures, at both the software and the hardware level, can be adopted to prevent DPA.
Interleaving random delays between instructions destroys the alignment of the time τ in different power traces. Using a clock with randomly varying tick-rate has a similar effect. The delays should be such that they cannot be easily analyzed and subsequently removed. Random delays increase the number of samples required for a successful DPA to an infeasible value.
Suitable implementations of the power-critical steps destroy the power consumption signature of these steps. For example, one may go for an implementation that exhibits a constant power consumption pattern irrespective of the operands. Another possibility is the replacement of complex critical instructions by atomic instructions (like assembly instructions) for which the dependence of power consumption on operands is weaker or more difficult to analyze. However, the assumption that one can measure power at any resolution (perhaps at infinite resolution, say, using an analog device) indicates that this countermeasure challenges only the attacker's budget.
Masking (x, y) by multiplying with (u, v) (as we did to prevent timing attacks) also eliminates chances of mounting successful DPA. One has to use a fresh mask for each private-key operation. Random unknown masks destroy the correlation of the bit values bi with power consumption. That is, the chosen bit bi of xi^(4e) behaves randomly in relation to the same bit of (ui xi)^(4e), and so the differential power trace no longer leaks the bias Π1 – Π0.
Another strategy to foil DPA is to use randomization in the private exponent d. Instead of computing y := x^d (mod n), one chooses a small random integer r (typically of bit size ≤ 20) and computes y := x^(d+rh) (mod n), where h is φ(n) for RSA or the order of the discrete-log (sub)group. Since d = O(h) typically, the performance of the exponentiation routine does not deteriorate much. But random values of r during different private-key operations change the exponent bits in an unpredictable manner.
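A Python sketch of this exponent randomization with toy parameters (all values hypothetical): the answer is unchanged because x^φ(n) ≡ 1 (mod n) for gcd(x, n) = 1, yet the exponent's bit pattern differs on every call.

```python
import secrets

# Toy RSA-style setting (hypothetical small numbers for illustration).
p, q = 1013, 1019
n, phi = p * q, (p - 1) * (q - 1)
e = 17
d = pow(e, -1, phi)

def randomized_private_op(x: int) -> int:
    """Exponent blinding: use d + r*h with h = phi(n) and a fresh small
    random r.  Since x^phi(n) = 1 (mod n) for gcd(x, n) = 1, the result
    is unchanged, but the exponent bits differ on every call."""
    r = secrets.randbelow(1 << 20)        # random r of bit size <= 20
    return pow(x, d + r * phi, n)

x = 424242
assert randomized_private_op(x) == pow(x, d, n)
assert randomized_private_op(x) == pow(x, d, n)
```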
Quick changes in the exponent (the private key, that is, the key pair) also prevent the attacker from gathering sufficiently many power traces for mounting a successful DPA. A key-use counter can be employed for this purpose. Whenever a given private key has been used on a small predetermined number of occasions, the key pair is updated.
Hardware shielding of the decrypting device also reduces DPA possibilities. For example, in-chip buffers between the external power source and the chip processor have been proposed to mask off the variation of internal power from external measurements. Such hardware countermeasures are, in general, somewhat costlier than software countermeasures.
Paul Kocher asserts: "DPA highlights the need for people who design algorithms, protocols, software, and hardware to work closely together when producing security products."
We finally come to the third genre of side-channel cryptanalysis. We investigate how hardware faults occurring during private-key operations can reveal the secret to an adversary. There are situations where a single fault suffices. Boneh et al. [30] classify hardware faults into three broad categories.
Transient faults These are faults caused by random (unpredictable) hardware malfunctioning. These may be the outcomes of occasional flips of bit values in registers or of temporary erroneous outputs from logic or arithmetic circuits in the processor. These faults are called transient, because they are not repeated. It is rather difficult to detect such (silent) faults.
Latent faults These are faults generated by some permanent malfunctioning and/or bugs inherent in the processor. For example, the floating-point bug in the early releases of the Pentium processor may lead to latent faults. Latent faults are permanent, that is, repeated, but may be difficult to locate in practice.
Induced faults An induced fault is deliberately caused by an adversary. For example, a short surge of electromagnetic radiation may cause a smart card to malfunction temporarily. A malicious adversary can induce such temporary hardware faults to extract secret information from the smart card. It is, however, difficult to induce deliberate faults in a remote workstation.
Although induced faults appear to be the ones to guard against most seriously, the other two types of faults are also of relevance. Consider a certifying authority signing many messages. Transient and/or unknown latent faults may reveal the authority’s private key to a user who can later utilize this knowledge to produce false certificates.
Consider the implementation of the RSA private-key operation based on the CRT combination of the values obtained by exponentiation modulo the prime divisors p and q of the modulus n (Algorithm 5.4). Suppose that m is a message to be signed and s := m^d (mod n) the corresponding signature, where d is the signer's private key. The CRT-based implementation computes s1 := s (mod p) and s2 := s (mod q). Assume that due to hardware fault(s) exactly one of s1 and s2 is wrongly computed. Say, s1 is incorrectly computed as ŝ1. The corresponding faulty signature is denoted by ŝ. We assume that the CRT combination of ŝ1 and s2 is correctly computed.
An adversary requires the faulty signature ŝ and the correct signature s on the same message m in order to obtain the factor q of n. To see how, note that ŝ ≡ ŝ1 (mod p), s ≡ s1 (mod p) and ŝ1 ≢ s1 (mod p), so that ŝ ≢ s (mod p), that is, p ∤ (s – ŝ). On the other hand, ŝ ≡ s ≡ s2 (mod q), that is, q | (s – ŝ). Therefore,

q = gcd(s – ŝ, n).

This is how the fault analysis of Boneh et al. [30] works.
Arjen K. Lenstra et al. [142] point out that the knowledge of the faulty signature ŝ alone reveals the secret divisor q, that is, one does not require the genuine signature s on m. The verification key e of the signer is publicly known. Since RSA exponentiation is bijective, ŝ^e ≢ m (mod n). However, ŝ^e ≡ s^e ≡ m (mod q), and so ŝ^e ≢ m (mod p). It follows that

q = gcd(ŝ^e – m, n).
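Both variants of the attack are easy to verify with toy numbers (all values below are hypothetical; the "fault" is simulated by flipping a bit of s1 before the CRT combination).

```python
from math import gcd

# Toy RSA parameters (hypothetical; for illustration only).
p, q = 1009, 1013
n = p * q
e = 17
d = pow(e, -1, (p - 1) * (q - 1))

m = 123456                        # message representative
s1 = pow(m, d % (p - 1), p)       # exponentiation modulo p
s2 = pow(m, d % (q - 1), q)       # exponentiation modulo q

def crt(a, b):
    """Combine residues a mod p and b mod q into a residue mod n."""
    return (a * q * pow(q, -1, p) + b * p * pow(p, -1, q)) % n

s = crt(s1, s2)                   # correct signature
assert s == pow(m, d, n)
s_hat = crt(s1 ^ 1, s2)           # fault: s1 corrupted before the CRT step

# Boneh et al.: both signatures give q = gcd(s - s_hat, n).
assert gcd(s - s_hat, n) == q
# Lenstra: the faulty signature alone gives q = gcd(s_hat^e - m, n).
assert gcd(pow(s_hat, e, n) - m, n) == q
```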
Now, consider an implementation of RSA decryption based on a single exponentiation modulo n. For such an implementation, several models of fault attacks have been proposed. These attacks are less practical than the attack on CRT-based RSA just mentioned, because now one requires several faulty signatures in order to deduce the entire private key. Here, we present an attack due to Bao et al. [17].
As usual, the RSA modulus is n = pq and the signer's key pair is (e, d). Consider a valid signature s on a message m. Let d = (dl–1 · · · d1d0)2 be the binary representation of the private key. Consider the powers:

si ≡ m^(2^i) (mod n) for i = 0, 1, . . . , l – 1.

The signature s can be written as:

s ≡ ∏i : di = 1 si (mod n).
We assume that the attacker knows m and s and hence can compute si and si^(–1) modulo n for i = 0, . . . , l – 1. There is no harm in assuming that the message m is randomly chosen. (We may assume that randomly chosen integers are invertible modulo n, because encountering a non-invertible non-zero integer by chance is a stroke of unimaginable good luck and is tantamount to knowing the factors of n.)
In order to guess a bit of d, the attacker induces a fault in exactly one of the bits dj, changing it from dj to 1 – dj. The position j is random, that is, not under the control of the attacker. Now, the algorithm outputs the faulty signature
s̃ ≡ s sj (mod n) if dj = 0, and s̃ ≡ s sj^{–1} (mod n) if dj = 1,
and so
s̃ s^{–1} ≡ sj^{±1} (mod n),
the sign of the exponent being + if dj = 0 and – if dj = 1. A repetition in the values sl–1, . . . , s0, sl–1^{–1}, . . . , s0^{–1} modulo n is again an incident of minuscule probability. Hence the attacker can uniquely identify the bit position j and the bit value dj in d by comparing s̃s^{–1} with these 2l values.
Statistical analysis implies that the attacker needs to repeat this procedure about l log l times (on the same or different (m, s) pairs) in order to ensure that the probability of identifying all the bits of d is at least 1/2.
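The comparison of s̃s^{–1} against the 2l precomputed values can be sketched as follows (toy parameters; the faulted bit position j is fixed here only to make the run reproducible):

```python
# Toy demonstration (parameters illustrative): recovering one bit of d
# from a signature computed with a single bit of d flipped.
p, q = 1009, 1013
n, e = p * q, 17
phi = (p - 1) * (q - 1)
d = pow(e, -1, phi)
l = d.bit_length()

m = 987654 % n
s = pow(m, d, n)                       # correct signature

j = 9                                  # faulted bit position (random in practice)
d_faulty = d ^ (1 << j)
s_faulty = pow(m, d_faulty, n)         # faulty signature

# Attacker precomputes s_i = m^(2^i) and the inverses modulo n.
table = {}
x = m % n
for i in range(l):
    table[(i, 0)] = x                  # matches if bit d_i was 0 (flipped to 1)
    table[(i, 1)] = pow(x, -1, n)      # matches if bit d_i was 1 (flipped to 0)
    x = (x * x) % n

ratio = (s_faulty * pow(s, -1, n)) % n
matches = [k for k, v in table.items() if v == ratio]
assert (j, (d >> j) & 1) in matches    # position and bit value identified
```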
Recall from Algorithm 5.34 that the Rabin signature algorithm uses CRT to combine s1 (mod p) and s2 (mod q). Thus, the attack on CRT-based RSA, described earlier, is applicable mutatis mutandis to the Rabin signature scheme. The computation of the square roots s1 and s2 demands the major portion of the running time of the routine. Inducing a fault during the execution is, therefore, expected to affect exactly one of s1 and s2, as desired by the attacker.
Bao et al. [17] propose a fault attack on the digital signature algorithm (DSA). We work with the notations of Algorithm 5.43 and Algorithm 5.44, except that, for maintaining uniformity in this section, we use m (instead of M) to denote the message to be signed. The (public) parameters are a prime p, a prime divisor r of p – 1 of length 160 bits, and an element g of Z_p^* of multiplicative order r. The signer’s DSA key pair is (d, g^d (mod p)) with 1 < d < r.
Suppose that during the generation of a DSA signature, an attacker induces a fault in exactly one bit position of d, changing it to d̃ = d ± 2^j. The routine generates the faulty signature (s, t̃), where
t̃ ≡ d′^{–1}(H(m) + d̃s) (mod r),
(d′, g^{d′} (mod p)) being the session key pair (not mutilated). As in the DSA signature-verification scheme, the attacker computes the following:
w ≡ t̃^{–1} (mod r), u1 ≡ wH(m) (mod r), u2 ≡ ws (mod r), v ≡ g^{u1} y^{u2} (mod p),
where y ≡ g^d (mod p) is the signer’s public key. For each i = 0, . . . , l – 1 (where the bit length of d is l), the attacker also computes
vi+ ≡ v g^{2^i u2} (mod p) and vi– ≡ v g^{–2^i u2} (mod p).
Assume that the j-th bit dj of d is altered. If dj = 0, then d̃ = d + 2^j, and so
vj+ ≡ g^{w(H(m)+ds)} g^{2^j ws} ≡ g^{w(H(m)+d̃s)} ≡ g^{d′} (mod p),
so that vj+ (mod p), reduced modulo r, equals s. On the other hand, if dj = 1, then d̃ = d – 2^j, and a similar calculation shows that
vj– ≡ g^{d′} (mod p).
Thus, the attacker computes vj+ and vj– for all j = 0, . . . , l – 1 and notices a unique match (with s). This discloses the position j and the corresponding bit dj.
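A toy run of this guessing procedure (the DSA parameters are illustrative and far too small for real use; H(m) is replaced by a fixed toy value):

```python
# Toy DSA fault attack (all parameters illustrative).
p, r, g = 23, 11, 2            # g = 2 has order 11 modulo 23
d = 5                          # signer's private key, 1 < d < r
y = pow(g, d, p)               # public key
l = 4                          # assumed bit length of d
H = 7                          # H(m) for the signed message (toy value)

k = 3                          # session key d' (not faulted)
s = pow(g, k, p) % r

j = 1                          # faulted bit of d (random in practice)
d_faulty = d ^ (1 << j)
t_faulty = (pow(k, -1, r) * (H + d_faulty * s)) % r

# Attacker: recompute g^{d'} for every guess of (position, bit).
w = pow(t_faulty, -1, r)
v = (pow(g, (w * H) % r, p) * pow(y, (w * s) % r, p)) % p
matches = []
for i in range(l):
    for bit in (0, 1):
        sign = 1 if bit == 0 else -1   # d̃ = d + 2^i if d_i = 0, else d - 2^i
        cand = (v * pow(g, (sign * (1 << i) * w * s) % r, p)) % p
        if cand % r == s:
            matches.append((i, bit))
assert (j, (d >> j) & 1) in matches    # position and bit value disclosed
```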
A fault attack similar to that on the DSA scheme can be mounted on the ElGamal signature scheme. Here, we instead describe an alternative method proposed by Zheng and Matsumoto [315]. The novelty in their approach is that it performs the cryptanalysis of the ElGamal signature scheme by inducing a fault on the pseudorandom bit generator of the signer’s smart card.
Algorithms 5.36 and 5.37 describe the ElGamal signature scheme on a general cyclic group G. Here, we restrict our attention to the specific group Z_p^* (though the following exposition works perfectly well for a general G). The parameters are a prime modulus p and a generator g of Z_p^*. The signer’s key pair is (d, g^d (mod p)) for some d, 2 ≤ d ≤ p – 2.
In order to generate a signature (s, t) on a message m, a random session key d′ is generated and subsequently the following computations are carried out:
s ≡ g^{d′} (mod p),
t ≡ d′^{–1}(H(m) – dH(s)) (mod p – 1).
Zheng and Matsumoto attack the generation of the session key d′. They propose the possibility that an abnormal physical stress (like low voltage) forces a constant output d0 for d′ from the pseudorandom bit generator (software or hardware) in the smart card. First, assume that this particular value d0 is known a priori to the attacker. She then gets a signature (s, t) generated on a message m with the session secret d0. The private key d is then immediately available from the equation:
d ≡ H(s)^{–1}(H(m) – d0t) (mod p – 1).
Here, we assume that H(s) is invertible modulo p – 1.
If d0 is not known a priori, the attacker generates two signatures (s1, t1) and (s2, t2) on messages m1 and m2 respectively. Since d′ is always d0, we have s1 = s2 = s0, say. One can then easily calculate
d0 ≡ (t1 – t2)^{–1}(H(m1) – H(m2)) (mod p – 1),
which, in turn, yields
d ≡ H(s0)^{–1}(H(m1) – d0t1) (mod p – 1).
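The two-signature recovery can be sketched as follows (toy prime, toy identity-style “hash”, and the constant session key d0 are all illustrative assumptions):

```python
# Toy demonstration of the Zheng-Matsumoto attack (parameters illustrative).
p, g = 23, 5                   # 5 generates Z_23^*
d = 9                          # Alice's private key
H = lambda x: x % (p - 1)      # stand-in for a real hash function

d0 = 7                         # the constant "random" session key under stress

def sign(m):
    # ElGamal signature with the (stuck) session key d0.
    s = pow(g, d0, p)
    t = (pow(d0, -1, p - 1) * (H(m) - d * H(s))) % (p - 1)
    return s, t

(s1, t1), (s2, t2) = sign(6), sign(15)
assert s1 == s2                # constant session key => identical first parts
s0 = s1

# Attacker recovers first d0, then d, from the two signatures.
d0_rec = (pow(t1 - t2, -1, p - 1) * (H(6) - H(15))) % (p - 1)
d_rec = (pow(H(s0), -1, p - 1) * (H(6) - d0_rec * t1)) % (p - 1)
assert (d0_rec, d_rec) == (d0, d)
```

The sketch silently assumes that t1 – t2 and H(s0) are invertible modulo p – 1, as the text requires.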
Let us conclude our repertoire of fault attack examples by explaining an attack on the FFS zero-knowledge identification protocol. This attack is again from Boneh et al. [30].
We use the notations of Algorithm 5.69. A modulus n = pq with p ≡ q ≡ 3 (mod 4) is first chosen (by Alice or by a trusted third party). Alice selects random x1, . . . , xt invertible modulo n and random bits δ1, . . . , δt, computes yi ≡ (–1)^{δi} xi^2 (mod n) for i = 1, . . . , t, publishes (y1, . . . , yt) and keeps (x1, . . . , xt) secret.
During an identification session with Bob, Alice generates a random commitment c and sends to Bob the witness w := c^2 (mod n). (For simplicity, we take γ of Algorithm 5.69 to be 0.) While Alice is waiting for a challenge from Bob, a fault occurs in her smart card, changing the commitment c to c + E. Assume that the fault is at exactly one bit position, that is, E = ±2^j for some j, 0 ≤ j ≤ l – 1, l being the bit length of c (or of n). This fault may be purposely induced by Bob with the malicious intention of guessing Alice’s secret (x1, . . . , xt).
Bob then generates a random challenge (∊1, . . . , ∊t) ∈ {0, 1}^t as usual. Upon reception of this challenge, Alice computes and sends to Bob the faulty response
r̃ ≡ (c + E) x1^{∊1} · · · xt^{∊t} (mod n).
The knowledge of r̃ now aids Bob to obtain the product T ≡ x1^{∊1} · · · xt^{∊t} (mod n) as follows. First, note that
r̃^2 ≡ (c + E)^2 x1^{2∊1} · · · xt^{2∊t} (mod n),
so that
r̃^2 ≡ (–1)^δ (c + E)^2 y1^{∊1} · · · yt^{∊t} (mod n) for some δ ∈ {0, 1}.
There are only 4l possible values of (E, δ). Bob tries all these possibilities one by one. To simplify matters, we assume that only one value of (E, δ) with E of the special form ±2^j and with δ ∈ {0, 1} satisfies the last congruence. In practice, the existence of two (or more) solutions for (E, δ) is an extremely improbable phenomenon. For a guess of (E, δ), the commitment c can be computed as
c ≡ (2E)^{–1} ((–1)^δ r̃^2 y1^{–∊1} · · · yt^{–∊t} – E^2 – w) (mod n),
since (c + E)^2 ≡ w + 2cE + E^2 (mod n). The correctness of the guess (E, δ) can be verified from the relation w ≡ c^2 (mod n). Bob can now compute the desired product
T ≡ r̃ (c + E)^{–1} ≡ x1^{∊1} · · · xt^{∊t} (mod n).
In order to strengthen the confidence about the correctness of T, Bob may repeat the protocol once more with the same values of ∊1, . . . , ∊t, but under normal conditions (that is, without faults). This time he obtains w′ ≡ (c′)^2 (mod n) and r′ ≡ c′T (mod n), which together give (r′)^2 ≡ w′T^2 (mod n), a relation that proves the correctness of T.
Bob repeats the above procedure t times in order to generate the system:
Equation 7.6
Tk ≡ x1^{∊k1} x2^{∊k2} · · · xt^{∊kt} (mod n), k = 1, 2, . . . , t.
Here, ∊ki and Tk are known to Bob. Moreover, the exponents ∊ki can be so selected that the matrix (∊ki) is invertible modulo 2. In order to determine x1, Bob tries to find bits u1, . . . , ut ∈ {0, 1} satisfying
T1^{u1} T2^{u2} · · · Tt^{ut} ≡ x1 x1^{2v1} x2^{2v2} · · · xt^{2vt} (mod n)
for some integers v1, . . . , vt. Comparing the exponents gives the linear system
∊11u1 + ∊21u2 + · · · + ∊t1ut = 1 + 2v1,
∊1iu1 + ∊2iu2 + · · · + ∊tiut = 2vi for i = 2, . . . , t,
which can be solved for u1, . . . , ut, since the matrix (∊ki) is invertible modulo 2. The solution gives v1, . . . , vt and hence
x1 ≡ ±T1^{u1} T2^{u2} · · · Tt^{ut} y1^{–v1} · · · yt^{–vt} (mod n),
since xi^2 ≡ (–1)^{δi} yi (mod n).
Similarly, x2, . . . , xt can be determined up to sign. Plugging in these values of xi in System (7.6) and solving another linear system modulo 2 gives the exact signs of all xi.
Notice that Bob could have selected ∊ki = δki (where δki is the Kronecker delta). For this choice, System (7.6) immediately gives x1, . . . , xt. But, in practice, Alice may refuse to respond to such simplistic challenges. Moreover, Bob must not raise any suspicion about a possible malpractice. For a general choice, all Bob has to do additionally is a small amount of simple linear algebra. The parameter t is rather small (typically less than 20), so this extra effort is of little concern to Bob.
Fault analysis could be a serious threat, especially to smart-card users and certification authorities. We mention here some precautions to guard against such attacks. Some of these work for a general kind of fault attack, the others are specific to the algorithms they plan to protect.
One obvious general strategy is to perform the private-key operation twice and compare the results from the two executions. If the two results disagree, a fault must have taken place. It is then necessary to restart the computation from the beginning. This strategy slows down the implementation by a factor of two. Moreover, latent (permanent) faults cannot be detected by this method—the same error creeps in during every run.
It is sometimes easier to verify the correctness of the output by performing the reverse operation. For instance, after an RSA signature s ≡ m^d (mod n) is generated, one can check whether m ≡ s^e (mod n). If so, one can be reasonably confident about the correctness of s. If the RSA encryption exponent e is small (like 3 or 257), this verification is quite efficient.
Ad hoc algorithm-specific tricks often offer effective and efficient checks for errors. Shamir [268] proposes the following check for CRT-based RSA signature generation. One chooses a small random prime r (say, of length ~ 32 bits) and computes s1 ≡ m^d (mod pr) and s2 ≡ m^d (mod qr). If s1 ≢ s2 (mod r), then one or both of the exponentiations went wrong. If, on the other hand, s1 ≡ s2 (mod r), then s1 (mod p) and s2 (mod q) are combined by the CRT.
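Shamir’s check can be sketched as follows (toy parameters; a real implementation would also reduce the exponent d modulo φ(pr) and φ(qr) for speed):

```python
# Sketch of Shamir's countermeasure with toy numbers (r would be ~32 bits
# and p, q ~1024 bits in practice; these values are illustrative).
p, q, r = 1009, 1013, 101      # r: small random prime
n = p * q
e = 17
d = pow(e, -1, (p - 1) * (q - 1))
m = 55555 % n

s1 = pow(m, d, p * r)          # exponentiation modulo pr
s2 = pow(m, d, q * r)          # exponentiation modulo qr

if s1 % r != s2 % r:
    raise RuntimeError("fault detected: restart the signature")

# Both agree modulo r; combine s1 mod p and s2 mod q by the CRT.
a1, a2 = s1 % p, s2 % q
h = (pow(p, -1, q) * (a2 - a1)) % q
s = (a1 + p * h) % n
assert s == pow(m, d, n)       # the correctly combined signature
```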
Maintaining extraneous error-checking data can guard against random bit flips. Parity check bits can detect the existence of single bit flips. Retaining a verbatim copy of a piece of secret information d and comparing the two copies at strategic instants can help detect more serious faults. It appears unlikely that both the copies can be affected by faults in exactly the same way. For discrete-log-based systems, maintaining d^{–1} in tandem with d appears to be a sound approach. Since the bits of d^{–1} are not in direct relationship with those of d, an attack on d cannot easily produce the corresponding changes in d^{–1}. As an example, consider the attack on DSA effected by toggling a bit of the secret key d. The second part of the signature can be generated in two ways: by computing t1 ≡ d′^{–1}(H(m) + ds) (mod r) using d, and by computing t2 ≡ d′^{–1}(d^{–1})^{–1}(d^{–1}H(m) + s) (mod r) using d^{–1}. If t1 ≡ t2 (mod r), we can be pretty confident that this common value is the correct signature.
Appending random strings to the messages being signed also guards against fault attacks of the above kinds. Such random strings are not known to the adversary and cannot be easily recovered from a faulty signature. Also, in this case the signer signs different strings on different occasions, even when the message remains the same.
Hardware countermeasures can also be adopted. Adequately shielded cards resist induced faults. In the situation described by Zheng and Matsumoto, the card should refuse to work instead of generating constant random bits. In the scenario of fault analysis, however, it appears that robustness can be implanted more easily at the software level. At any rate, sloppy hardware designs are never advocated.
| 7.1 | Consider the notations of Section 7.2.1. Assume that mi,j is constant for all i, j (and irrespective of d2j+1d2j), but the square times si,j and ti,j vary according to their operands. Devise a timing attack on such a system. |
| 7.2 | Show that under reasonable assumptions the SPA-resistant Algorithm 7.2 can be cryptanalyzed by timing attacks. |
| 7.3 | Recall that SPA of Algorithm 7.1 may leak partial information on the private key (some 00 sequences in the key). Rewrite the algorithm to prevent this leakage. |
| 7.4 | Assume that in Bao et al.’s attack on RSA described in the text, the attacker can induce faults in exactly two bit positions of d. Suggest how the two bits of d at these positions can be revealed from the resulting faulty signature. |
| 7.5 | Consider a variant of Bao et al.’s attack on RSA described in the text, in which the valid signature s on m is unknown to the attacker. Explain how the position j of the erroneous bit and the bit dj at this position can still be identified. [H] |
| 7.6 | Bao et al. [17] propose an alternate fault analysis on RSA with square-and-multiply exponentiation. Use the notations (n, e, d, m, s, si) as in the text. Assume that the attacker knows an (m, s) pair, can induce a fault in exactly one of the values sj (and nowhere else), and can generate the corresponding faulty signature. Suggest how the position j and the bit dj can be recovered in this case. |
| 7.7 | Propose a fault attack on the ElGamal signature scheme (Algorithms 5.36 and 5.37), similar to the attack on DSA described in the text. |
Backdoor attacks on a public-key cryptosystem refer to attacks embedded in the key generation procedure (hardware or software) by the designer of the procedure. A contaminated cryptosystem is one in which the key generation procedure comes with hidden backdoors. A good backdoor attack should meet the following criteria:
To a user, keys generated by the contaminated system should be indistinguishable from those generated by an honest version of the cryptosystem. For example, the parameters and keys must look sufficiently random.
Keys generated by the contaminated system should satisfy the input/output requirements of an honest system. For example, for the RSA cryptosystem the user should be allowed to opt for small public exponents.
A contaminated key generation procedure should not run (on an average) much slower than the honest procedure.
The designer (and nobody else) should have the exclusive capability of determining the secret information from a contaminated published public key.
A user (other than the designer), detecting or suspecting information leakage from a contaminated system, may reverse-engineer the binaries or the smart card to identify the contaminated key generation procedure. The user may even be given the source code of the contaminated routine. Still the user should not be able to steal keys from other users of the same contaminated system. In this sense, a good backdoor protects the designer universally.
A stronger requirement is that reverse-engineering (or source code) should also not allow a user to distinguish (in poly-time) between keys generated by the contaminated procedure and those generated by a genuine procedure. It is exclusively the designer who should possess the capability to make such distinctions in poly-time.
Young and Yung [307] have proposed using public-key cryptography itself for generating backdoors. In their schemes, the attacker (the designer) embeds the encryption routine and the encryption key of the attacker in the key generation procedure of the contaminated system. The decryption key of the attacker is not embedded in the contaminated system and is known only to the attacker. The attacker’s encryption system is assumed to be honest and unbreakable and, thereby, it gives the attacker the exclusive power to decrypt contaminated keys. Young and Yung call such a backdoor a secretly embedded trapdoor with universal protection (SETUP). They also coined the term kleptography to denote such use of cryptography against cryptography.
In the rest of this section, we denote the attacker’s encryption and decryption functions by fe and fd respectively. We often do not restrict these functions to public-key routines only. Since public-key routines are slow, symmetric-key routines can be employed in practice. Simple XOR-ing with a fixed bit string (known to the designer) may also suffice. However, for these faster alternatives of fe, fd, reverse engineering reveals the symmetric key or the XOR operand to the user who can subsequently mimic the attacker to steal keys generated elsewhere by the same contaminated system.
We use the following shorthand notations. Here, n stands for a positive integer that can be naturally identified with a unique bit string having the most significant (that is, leftmost) bit equal to 1.
| |n| | = | the bit length of n. |
| lsbk(n) | = | the least significant k bits of n. |
| msbk(n) | = | the most significant k bits of n. |
| (a1 ‖ a2 ‖ · · · ‖ ar) | = | the concatenation of the bit strings a1, a2, . . . , ar. |
RSA, (seemingly) being the most popular public-key cryptosystem, has been the target of most cryptanalytic attacks. Backdoor attacks are not an exception. The backdoor attacks on RSA work by cleverly hiding some secret information in the public key (n, e) of a user. As earlier, we denote the corresponding private exponent by d and the prime factors of n by p and q.
The simplest attack is to choose a fixed p known to the designer. The other prime q is generated randomly, and correspondingly n = pq and the key pairs (e, d) are computed. Reverse engineering such a scheme is pretty simple, since two different moduli n1 = pq1 and n2 = pq2 readily reveal p = gcd(n1, n2).
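A two-line illustration of why the fixed-p backdoor is so fragile (toy primes, illustrative):

```python
from math import gcd

# If the contaminated generator always reuses the same prime p,
# any two public moduli expose it at once.
p = 1009                       # the fixed, designer-known prime
n1 = p * 1013                  # first user's modulus
n2 = p * 1019                  # second user's modulus
assert gcd(n1, n2) == p        # anyone can recover p from the two moduli
```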
A better approach is given in Algorithm 7.3. The function fe may be RSA encryption under the designer’s public key. In that case, the RSA modulus of the attacker should be so chosen that the condition e < n is satisfied with good probability. On the other hand, if this modulus is too small, then this scheme will generate values of e much smaller than n.
In order to determine the secret exponent from a public key generated using this scheme, the attacker runs Algorithm 7.4. If fe and fd are RSA functions under the attacker’s keys, nobody other than the attacker can apply fd to generate p from e. This provides the designer with the exclusive capability of stealing keys.
A problem with Algorithm 7.3 is that the attacker has little control over the length of the public exponent e. If the user demands a small exponent (like e = 3 or e = 257), this scheme fails to produce one. Algorithm 7.5 overcomes this difficulty by hiding p in the high-order bits of the modulus n (instead of in the exponent e). Young and Yung [307] proposed this algorithm under the name PAP (pretty awful privacy). The name contrasts with PGP (pretty good privacy), a popular and widely used RSA implementation.
Algorithm 7.3
Input:
Output: An RSA modulus n = pq with |p| = |q| = k, and exponents (e, d).
Steps: Generate a random k-bit prime q.
Algorithm 7.4
Input: An RSA public key (n, e).
Output: The corresponding secret (p, q, d) or failure.
Steps: p := fd(e).
Algorithm 7.5 works as follows. Following Young and Yung [307], we assume that the attacker uses RSA to realize fe and fd. The RSA modulus of the attacker is denoted by N. The attack requires |N| = k, where |p| = |q| = k. To start with, a random prime p of the desired bit length k is generated. This prime is to be encrypted using fe, and so one requires the value being encrypted to be less than N. Instead of encrypting p directly, the attacker uses a permutation function π keyed by K + i for some fixed K and for i = 1, 2, . . . , B, where B is a small bound (typically B = 16). This permutation helps the attacker in two ways. First, one may now have p > N, so a suspicion regarding bounded values of p does not arise. Second, it is cheaper to apply the permutation than to generate fresh candidates for p. (In an honest RSA key generation routine, the prime generation part typically takes most of the running time.)
Algorithm 7.5
Input:
Output: An RSA modulus n = pq with |p| = |q| = k, and exponents (e, d).
Steps: while (1) {
Once a suitable p and the corresponding p′ = πK+i(p) are generated, the encryption function fe is applied to generate p″ = fe(p′). Now, instead of embedding p″ directly in the modulus n, another keyed permutation is applied on p″ to generate p‴ = π′K+j(p″). This permutation facilitates investigating several choices for q and so is a faster alternative than restarting the entire process afresh every time an unsuitable q is computed. A pseudorandom bit string a of length k is appended to p‴ to obtain an approximation X for n. If q := ⌊X/p⌋ happens to be a prime of bit length k, the exact n = pq is computed, else another j is tried. If all values of j = 1, 2, . . . , B′ (for some small bound B′) fail, the entire procedure is repeated with a new k-bit prime p.
For random choices of a, the quotients q = ⌊X/p⌋ behave like random integers, and so the probability that q is prime is almost the same as that for random integers of bit length k. Write X = qp + r with r = X rem p. If r > a, then n = X – r has p‴ – 1 embedded in its higher bits, whereas if r ≤ a, then p‴ itself is embedded in the higher bits of n.
Once suitable p and q are found, the PAP routine generates (like PGP) a small encryption exponent e relatively prime to φ(n), and its inverse d modulo φ(n). One can anyway opt for bigger values of e: in that case, instead of choosing e successively from the sequence 17, 19, 21, 23, . . . , one writes one’s own customized steps for generating candidate values of e. Choosing small e in Algorithm 7.5 demonstrates both the resemblance with PGP and the flexibility of the scheme.
The authors of PAP compare their implementation of Algorithm 7.5 with that of the honest PGP key generation procedure. The contaminated routine has been found to run on an average only 20 per cent slower than the honest routine.
Algorithm 7.6 recovers the prime factor p of n from a public key (n, e) generated by PAP, using the RSA decryption function fd of the attacker. Reverse engineering may make available to the user the permutation functions π and π′, the fixed constants K, B, B′ and the designer’s public key. But this knowledge alone does not empower the user to steal PAP-generated keys.
Algorithm 7.6
Input: An RSA public key (n, e) with n = pq.
Output: The prime divisor p of n or failure.
Steps: Write n = (U ‖ V) with |V| = k.
Another possible backdoor is hiding an RSA key pair (∊, δ) with small δ inside a key pair (e, d). Crépeau and Slakmon [70] realize this backdoor using a result from Boneh and Durfee [32], which describes a polynomial-time (in |n|) algorithm for computing δ from the public key (n, ∊), provided that δ is less than n^0.292. This attack is explained in Algorithm 7.7. Here, the modulus is a genuine random RSA modulus. The mischievous key ∊ is neatly hidden by the attacker’s encryption routine fe. The resulting output key pair (e, d) looks reasonably random. However, this scheme has a drawback similar to Algorithm 7.3; that is, it cannot easily generate small values of e.
Algorithm 7.7
Input:
Output: An RSA modulus n = pq with |n| = k and a key pair (e, d).
Steps: Generate random primes p, q of bit length ~ k/2, such that n := pq has |n| = k.
Algorithm 7.8 retrieves d from a public key (n, e) generated by Algorithm 7.7.
Algorithm 7.8
Input: An RSA public key (n, e) generated by Algorithm 7.7.
Output: The corresponding private key d.
Steps: ∊ := fd(e). /* Recover the hidden exponent */
The correctness of Algorithm 7.8 is evident. In order to see how the knowledge of ∊ and δ reveals φ(n), note that x := ∊δ – 1 is a multiple of φ(n); that is,
Equation 7.7
x = lφ(n) = ln – l(p + q – 1)
for some integer l. Since δ < n^0.292 and ∊ < n, we have x < n^1.292. But φ(n) ≈ n, and so l cannot be much larger than n^0.292. Since |p| ≈ k/2 ≈ |q|, we have l(p + q – 1) < n. Now, if we write
x = an + b = (a + 1)n – (n – b)
with a = x quot n and b = x rem n, comparison with Equation (7.7) reveals that l = a + 1. This gives φ(n) = x/l.
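The whole recovery, from (∊, δ) to the factorization of n, can be traced with toy numbers satisfying the smallness condition on δ (all values illustrative):

```python
# Recovering phi(n), and then p and q, from a hidden key pair (eps, delta)
# with small delta.
from math import isqrt

p, q = 61, 53
n = p * q                      # 3233
phi = (p - 1) * (q - 1)        # 3120
delta = 7                      # small hidden exponent; 7 < n^0.292 ~ 10.6
eps = pow(delta, -1, phi)      # its partner: eps*delta = 1 (mod phi)

x = eps * delta - 1            # a multiple of phi(n): x = l*phi(n)
a = x // n
l = a + 1                      # by comparison with x = l*n - l*(p+q-1)
phi_rec = x // l
assert phi_rec == phi

# Factor n from phi(n): p and q are the roots of z^2 - (n - phi + 1)z + n.
s = n - phi_rec + 1            # p + q
disc = isqrt(s * s - 4 * n)
p_rec, q_rec = (s + disc) // 2, (s - disc) // 2
assert {p_rec, q_rec} == {p, q}
```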
Although not needed explicitly here, the factorization of n can be easily obtained by solving the equations pq = n and p + q = n – φ(n) + 1. If ∊ and δ are not small, we may have l(p + q – 1) ≥ n, and φ(n) cannot be calculated as easily as above. A randomized polynomial-time algorithm can still factor n from the knowledge of ∊, δ and n. For the details, solve Exercise 7.9.
Crépeau and Slakmon propose another backdoor attack based on the following result due to Boneh et al. [33]. Let (∊, δ) be a key pair for an RSA modulus n = pq, and let t be the bit length of ∊, that is, 2^{t–1} ≤ ∊ < 2^t. There exists a polynomial-time algorithm that, given n, ∊, and the t most significant and the |n|/4 least significant bits of δ, recovers the full private exponent δ.
Algorithm 7.9
Input:
Output: An RSA modulus n = pq with |n| = k and a key pair (e, d).
Steps: Generate random primes p, q of bit length ~ k/2, such that n := pq has |n| = k.
Algorithm 7.9 uses fe to hide in e a small ∊, the t most significant bits of δ and the |n|/4 least significant bits of δ. A string of bit length 2t + k/4 is encrypted by fe. Applying the decryption routine fd to e recovers these hidden values, from which ∊ and δ, and hence φ(n), can be obtained. Algorithm 7.10 does this task. This scheme also fails, in general, to produce small public exponents e.
Algorithm 7.10
Input: An RSA public key (n, e) generated by Algorithm 7.9 and the matching
Output: The corresponding private key d.
Steps: Compute fd(e) and retrieve the following:
We now describe a backdoor attack on the ElGamal signature Algorithm 5.36. This attack does not work when the user’s permanent key pair is generated. It manipulates the session-key generation in such a way that the user’s permanent private key is revealed to the attacker from two successive signatures.
Let p be a prime, g a generator of Z_p^*, and (d, g^d (mod p)) the permanent key pair of Alice. The attacker uses the same field and a key pair (D, g^D (mod p)), with g^D supplied to the signing device. Suppose that Alice signs two messages m1 and m2, using session keys d1 and d2, to generate signatures (s1, t1) and (s2, t2) respectively, where
si ≡ g^{di} (mod p) and ti ≡ di^{–1}(H(mi) – dH(si)) (mod p – 1) for i = 1, 2.
The attack proceeds by letting d1 be arbitrary, but by taking
d2 ≡ (g^D)^{d1} (mod p).
Since d2 ≡ (g^{d1})^D ≡ s1^D (mod p), the attacker can compute d2 from the public value s1 using her private key D. We then have
d2t2 ≡ H(m2) – dH(s2) (mod p – 1),
that is,
d ≡ H(s2)^{–1}(H(m2) – d2t2) (mod p – 1).
The private key D of the attacker (or d1) is required for computing d; so nobody other than the designer can retrieve Alice’s secret by observing the contaminated signatures (s1, t1) and (s2, t2).
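A toy run of this backdoor (the parameters and the stand-in hash are illustrative assumptions):

```python
# Toy run of the ElGamal session-key backdoor.
p, g = 23, 5
d = 9                          # Alice's permanent private key
D = 7                          # designer's private key; only g^D is in the device
H = lambda x: (x + 3) % (p - 1)  # stand-in for a real hash function

def sign(m, k):
    # ElGamal signature of m with session key k.
    s = pow(g, k, p)
    t = (pow(k, -1, p - 1) * (H(m) - d * H(s))) % (p - 1)
    return s, t

d1 = 5                                # first session key: arbitrary
s1, t1 = sign(6, d1)
d2 = pow(pow(g, D, p), d1, p)         # second session key: (g^D)^{d1} mod p
s2, t2 = sign(15, d2)

# Designer, knowing D, recomputes d2 from the public s1 and recovers d.
d2_rec = pow(s1, D, p)
d_rec = (pow(H(s2), -1, p - 1) * (H(15) - d2_rec * t2)) % (p - 1)
assert d_rec == d
```

The sketch assumes, as the text does, that H(s2) and the session keys are invertible modulo p – 1.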
For ElGamal encryption (Algorithm 5.15) and for Diffie–Hellman key exchange (Algorithm 5.27) over Z_p^*, a party (Alice) generates random session key pairs of the form (d′, g^{d′} (mod p)) and communicates the public session key g^{d′} to another party. The following backdoor manipulates the session-key generation in such a way that two public session keys reveal the second private session key (but not the permanent private key). We assume that the attacker learns the public session keys by eavesdropping. The attacker’s key pair is (D, g^D (mod p)). The contaminated routine contains the public key g^D (mod p), but not the private key D.
Let (d1, r1) and (d2, r2) be two session key pairs used by Alice, where
r1 ≡ g^{d1} (mod p),
r2 ≡ g^{d2} (mod p).
The contaminated routine that generates the session keys uses a fixed odd integer u, a hash function H and a random bit b ∈ {0, 1} to generate d2 from d1 as follows:
z ≡ g^{d1+ub} (g^D)^{d1} (mod p),
d2 ≡ H(z) (mod p – 1).
The attacker knows r1 and r2 by eavesdropping. She computes d2 by Algorithm 7.11, the correctness of which is established from the observation that z ≡ r1^{D+1} g^{ub} (mod p).
Algorithm 7.11
Input: The session public keys r1 and r2, the attacker’s private key D, and the parameters p, g, u, H.
Output: The session key d2.
Steps: For b = 0, 1, compute z := r1^{D+1} g^{ub} (mod p) and d := H(z) (mod p – 1); if g^d ≡ r2 (mod p), return d2 := d.
Algorithm 7.11 requires the attacker’s private key D (or d1) and can be performed only by the attacker. Now, d2 can be analogously used to generate the third session key d3 and so on, that is, the attacker can steal all the private session keys (except the first).
The odd integer u is used for additional safety. In order to see what might happen without it (that is, with b = 0 always), assume that H can be inverted. This gives z and hence y ≡ zr1^{–1} ≡ (g^D)^{d1} (mod p). If D is even, y is always a quadratic residue modulo p. If D is odd, y is a quadratic residue or non-residue modulo p depending on whether d1 is even or odd. The randomly added odd bias ub destroys this correlation of z with quadratic residues.
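The recovery of d2 by trying both values of the unknown bit b can be sketched as follows (toy parameters; H is a stand-in for a real hash):

```python
# Toy run of the session-key SETUP.
p, g = 23, 5
D = 7                          # designer's private key
u = 9                          # fixed odd constant in the device
H = lambda x: x % (p - 1)      # stand-in for a real hash function

d1, b = 8, 1                   # first session key and the device's random bit
r1 = pow(g, d1, p)
z = (pow(g, d1 + u * b, p) * pow(pow(g, D, p), d1, p)) % p
d2 = H(z) % (p - 1)            # second session key, as the backdoor computes it
r2 = pow(g, d2, p)

# Eavesdropper with D: z = r1^{D+1} * g^{u*b} (mod p); b unknown, so try both.
recovered = []
for bb in (0, 1):
    z_guess = (pow(r1, D + 1, p) * pow(g, u * bb, p)) % p
    cand = H(z_guess) % (p - 1)
    if pow(g, cand, p) == r2:  # check the guess against the public r2
        recovered.append(cand)
assert d2 in recovered
```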
Using trustworthy implementations (hardware or software) of cryptographic routines (in particular, key generation routines) eliminates or reduces the risk of backdoor attacks. Preferences should be given to software applications with source codes (rather than to the more capable ones without source codes). Random number generators should be given specific attention. Cascading products from different independent sources also minimizes the possibility of hidden backdoors.
If the desired grain of trust is missing from the available products, the only safe alternative is to write the code oneself. Complete trust in cryptographic devices and packages, using them as black boxes without bothering about the internals, is often called black-box cryptography. Users should learn to question black-box cryptography. The motto is: Be aware or bring peril.
| 7.8 | Argue that reverse engineering the PAP routine (Algorithm 7.5) can enable a user to distinguish in polynomial time between key pairs generated by PAP and those generated by honest procedures. |
| 7.9 | Let n = pq be an RSA modulus and (e, d) a key pair under this modulus. Write ed – 1 = 2^s t, where s = v2(ed – 1) (so that t is odd). Since ed – 1 is a multiple of φ(n) = (p – 1)(q – 1) with p, q odd, we have s ≥ 2.
|
In this chapter, we discuss some indirect ways of attacking public-key cryptosystems. These attacks do not attempt to solve the underlying intractable problems, but watch the decryption device and/or use malicious key generation routines in order to gain information about private keys.
The timing attack is based on the availability of the total times of several private-key operations under the same private key. It keeps guessing successive bits of the private key by performing some variance calculations.
The power attack requires the availability of the power consumption patterns (also called power traces) of the decrypting (or signing) device during one or more private-key operations. If the measurements are done with good accuracy and resolution, a single power trace may reveal the private key to the attacker; this is called simple power analysis. In practice, however, such power measurements are often contaminated with noise. Differential power analysis requires power traces from several decryption operations under the same private key. The different traces are combined using a technique that reduces the effect of noise.
A fault attack can be mounted by injecting one or more faults in the device performing private-key operations. Fault attacks are discussed in connection with several encryption (RSA), signature (ElGamal, DSA and so on) and authentication (FFS) schemes.
The above three kinds of attacks are collectively called side-channel attacks. Several general and algorithm-specific countermeasures against side-channel attacks are discussed.
Backdoor attacks, on the other hand, are mounted by malicious key generation routines. Young and Yung propose the concept of secretly embedded trapdoor with universal protection (SETUP). In a SETUP-contaminated system, the designer of the key generation routine possesses the exclusive right to steal keys from users. Several examples of backdoor attacks on RSA and ElGamal cryptosystems are described.
Kocher introduces the concept of side-channel attacks in his seminal paper [155]. This paper describes further details about the timing attack (like a derivation of the choice of the sample size k) and some experimental results.
Timing attacks in various forms are applicable to other systems. Kocher [155] himself suggests a chosen-message attack on an RSA implementation based on the CRT (Algorithm 5.4). Carol, in an attempt to obtain Alice’s private key d, tries to guess the factor p (or q) of the modulus n using a timing attack. She starts by letting Alice sign a message y (c in Algorithm 5.4) close to an initial guess of p. The CRT-based algorithm first reduces y modulo p and modulo q before performing the modular exponentiations. If y < p already, then the initial reduction modulo p returns (almost) immediately, whereas if y ≥ p, the reduction involves at least one subtraction. This gives a variation in the timings based on the value of p. The attack exploits this fact to arrive at better and better approximations of p.
A known-message timing attack (in addition to the chosen message attack mentioned in the last paragraph) on the CRT-based RSA signature scheme is proposed by Kocher in the same paper [155]. Kocher also explains a timing attack on the signature algorithm DSA (Algorithm 5.43), based on the dependence of the modular reduction of H(M) + ds modulo r on the bits of the signer’s private key d.
Large scale implementations of timing attacks are reported in the technical reports [77, 259] from the Crypto group of Université catholique de Louvain. These implementations study Montgomery exponentiation.
Kocher [155] mentions the possibility of power attacks. However, a concrete description is first published in Kocher et al. [156], which explains both SPA and DPA. DES is the basic target of this paper, though possibilities for using these techniques against public-key systems are also mentioned.
Several variants of the basic DPA model described in the text have been proposed. Messerges et al. [200] describe attacks against smart-card implementations of exponentiation-based public-key systems. Also consult Aigner and Oswald’s tutorial [9] for a recent survey.
DPA seems to be the most threatening of all side-channel attacks. Many papers suggesting countermeasures against DPA have appeared. Chari et al. [45] propose a masking method. Messerges [199] adapts this idea into a form suitable for AES.[4] Messerges’ countermeasure is broken in [63] using a multi-bit DPA. Some other useful papers on DPA include [10, 55, 201].
[4] AES is an abbreviation for the Advanced Encryption Standard, a US-government standard that supersedes the older standard DES. AES uses the Rijndael cipher [219].
Boneh et al. [30, 31] from Bellcore announce the first systematic study of fault attacks on asymmetric-key cryptosystems. They explain fault attacks on RSA (with and without CRT), the Rabin signature scheme, the Feige–Fiat–Shamir identification protocol and the Schnorr identification protocol. These attacks are collectively known as Bellcore attacks.
Arjen K. Lenstra points out that the fault attack on CRT-based RSA does not require a valid signature. Joye and Quisquater propose some generalizations of the Bellcore–Lenstra attack. A form of this attack is applicable to elliptic-curve cryptosystems. The paper [142] discusses these developments.
Bao et al. [17] propose fault attacks on DSA, ElGamal and Schnorr signatures. They also describe variants of the fault analysis of RSA based on square-and-multiply algorithms. Zheng and Matsumoto [315] indicate the possibilities of attacking the random bit generator in a smart card.
Biham and Shamir [22] investigate fault analysis of symmetric-key ciphers and introduce the concept of differential fault analysis. Anderson and Kuhn [11] also study fault analysis of symmetric-key ciphers. Aumüller et al. [15] publish their practical experiences regarding physical realizations of faults in smart cards. They also suggest countermeasures against such attacks.
James A. Muir’s work [215] is a very readable and extensive survey on side-channel cryptanalysis. Also look at Boneh’s survey [29].
Because of small key sizes, elliptic-curve cryptosystems are very attractive for implementation in smart cards. It is, therefore, necessary to provide effective countermeasures against side-channel attacks (most importantly, against the DPA) for elliptic-curve cryptosystems. Many recent articles discuss this issue. Coron [62] suggests the use of random projective coordinates to avoid the costly (and power-consuming) field inversion operation needed for adding and doubling of points. Möller [206] proposes a non-conventional way of carrying out the double-and-add procedure. Izu and Takagi [138] describe a Montgomery-type point addition scheme resistant against side-channel attacks. An improved version of this algorithm, that works for a more general class of elliptic curves, is presented in Izu et al. [137].
Young and Yung introduce the concept of SETUP in [307]. The PAP SETUP on RSA and the ElGamal signature SETUP are from this paper, which also includes attacks on DSA and on the Kerberos authentication protocol. In a later paper [308], Young and Yung categorize SETUPs into three types: regular, weak and strong. Strong SETUPs are proposed for Diffie–Hellman key exchange and for RSA. The third reference [309] from the same authors extends the ideas of kleptography further and provides backdoor routines for several other cryptographic schemes.
Crépeau and Slakmon [70] adopt a more informal approach and discuss several backdoors for RSA key generation. In addition to the trapdoors with hidden small private and public exponents, described in the text, they propose a trapdoor that hides a small prime public exponent. They also present an improved version of the PAP routine. Unlike Young and Yung, they suggest symmetric techniques for designing fe, fd. Symmetric techniques forfeit the attacker’s universal protection, but continue to make perfect sense in the context of black-box cryptography.
| 8.1 | Introduction |
| 8.2 | Quantum Computation |
| 8.3 | Quantum Cryptography |
| 8.4 | Quantum Cryptanalysis |
| Chapter Summary | |
| Suggestions for Further Reading | |
Our best theories are not only truer than common sense, they make far more sense than common sense does.
—David Deutsch [76]
One can be a masterful practitioner of computer science without having the foggiest notion of what a transistor is, not to mention how it works.
—N. David Mermin [197]
But suppose I could buy a truly powerful quantum computer off the shelf today — what would I do with it? I don’t know, but it appears that I will have plenty of time to think about it!
—John Preskill [243]
So far, we have studied cryptologic algorithms that can be implemented on classical computers (Turing machines or von Neumann’s stored-program computers). Now, we shift our attention to a different paradigm of computation, known as quantum computation. The working of a quantum computer is specified by the laws of quantum mechanics, a branch of physics developed in the 20th century. However counterintuitive, contrived or artificial these laws initially sound, they have been accepted by the physics community as robust models of certain natural phenomena. A bit, modelled as a quantum mechanical system, appears to be a more powerful unit than a classical bit for building a computing device.
This enhanced power of a computing device has many important ramifications in cryptology. On one hand, we have polynomial-time quantum algorithms to solve the integer factorization and the discrete-log problems. This implies that most of the cryptographic algorithms that we discussed earlier become (provably) insecure. On the other hand, there are proposals for a quantum key-exchange method that possesses unconditional (and provable) security.
Unfortunately, it is not clear how one can manufacture a quantum computer. The technological difficulties involved appear enormous, and a section of the community even questions the feasibility of building such a machine. However, no laws or proofs rule out the possibility of success in the (near or distant) future. Legend has it that Thomas Alva Edison, after several hundred futile attempts to manufacture an electric light bulb, asserted that he now knew hundreds of ways one cannot make an electric bulb. Edison succeeded eventually, and the dream turned into reality.
But we will not build quantum computers in this chapter. That is well beyond the scope of this book, or, for that matter, of computer science in general. It is thoroughly unimportant to understand the I-V curves of a transistor (or even to know what a transistor actually is), when one designs and analyses (classical) algorithms. In order to design and analyse quantum algorithms, it is equally unimportant to know how a quantum computer can be realized.
We start with a formal description of quantum computation. Quantum mechanical laws govern this paradigm. We will pay little attention to the physical interpretations of these laws. A mathematical formulation suffices for our purpose.
For defining a quantum mechanical system, we need to enrich our mathematical vocabulary. Let V be a vector space over ℂ (or ℝ). Using Dirac’s ket notation, we denote a vector ψ in V as |ψ〉.

An inner product (also called a dot product or a scalar product) on V is a function 〈·|·〉 : V × V → ℂ that is conjugate-symmetric, linear in one of its arguments, and positive-definite. A vector space V with an inner product is called an inner product space.

For |ψ〉, |φ〉 ∈ V, we denote the inner product of |ψ〉 and |φ〉 by 〈ψ|φ〉.

The inner product on a vector space V induces a norm (Definition 2.115) on V: ‖|ψ〉‖ := √〈ψ|ψ〉. An inner product space that is complete with respect to this norm is called a Hilbert space. Every finite-dimensional inner product space is complete and is, therefore, a Hilbert space.

We define an equivalence relation ~ on a Hilbert space ℋ: |ψ〉 ~ |φ〉 if and only if |ψ〉 = c|φ〉 for some non-zero scalar c. An equivalence class under ~ is called a ray in ℋ.

An orthonormal basis of a Hilbert space ℋ of dimension n is a basis |e0〉, . . . , |en–1〉 of ℋ with 〈ei|ej〉 equal to 1 if i = j and to 0 otherwise. It is customary to denote the n vectors in an orthonormal basis of ℂⁿ by |0〉, |1〉, . . . , |n – 1〉. Indeed, the vectors

|0〉 := (1, 0, 0, . . . , 0), |1〉 := (0, 1, 0, . . . , 0), . . . , |n – 1〉 := (0, 0, . . . , 0, 1)

form an orthonormal basis of ℂⁿ.
The following axiom describes the model of a quantum mechanical system.
A system is a ray in a (finite-dimensional) Hilbert space over ℂ.
The simplest non-trivial quantum mechanical system is a ray in the 2-dimensional Hilbert space ℂ². Such a system is called a quantum bit or, in short, a qubit. In order to distinguish a qubit from a classical bit, we call the latter a cbit. The space ℂ² has an orthonormal basis {|0〉, |1〉}. In the classical interpretation, a cbit can assume only the two values |0〉 and |1〉, whereas a qubit can assume any value of the form

a|0〉 + b|1〉  with  a, b ∈ ℂ, |a|² + |b|² = 1.
Such a state of the qubit is called a superposition of the classical states.
Though we don’t care much, at least for the moment, here are two promising candidates for realizing a qubit:
Spin of an electron: The spin of a particle (like electron) in a given direction, say, along the Z-axis, is modelled as a quantum mechanical system with an orthonormal basis consisting of spin up and spin down.
Polarization of a photon: Photons constitute another class of quantum systems, where the two independent states are provided by the polarization of a photon.
A conceptual example of a 2-state quantum system is the Schrödinger cat. The two independent states of a cat, as we classically know, are |alive〉 and |dead〉. However, if we think of the cat confined in a closed room and isolated from our observations, quantum mechanics models the state of the cat as a superposition (that is, a complex-linear combination) of these two states. But then if the quantum model were true, opening the room might reveal the cat in a non-trivial state a|alive〉 + b|dead〉 for some complex numbers a, b with |a|² + |b|² = 1. It would indeed be an exciting experience. But alas, quantum mechanics precludes the possibility of such an observation. Read on to know what we would actually see if we open the room.
A single qubit is too small to build a useful computer. We need to use several (albeit a finite number of) qubits and hence must have a way to describe the combined system in terms of the individual qubits. As the simplest and most basic case, we first concentrate on combining two quantum systems into one.
Let A and B be two quantum mechanical systems with respective Hilbert spaces ℂᵐ and ℂⁿ. The composite system AB has the mn-dimensional Hilbert space ℂᵐ ⊗ ℂⁿ with an orthonormal basis

{|i〉A ⊗ |j〉B | i = 0, . . . , m – 1 and j = 0, . . . , n – 1}.
It is customary to abbreviate the normalized vector |i〉A ⊗ |j〉B as |i〉A|j〉B or even as |ij〉AB. A general state of AB is of the form

Σi,j ci,j |ij〉AB  with  ci,j ∈ ℂ and Σi,j |ci,j|² = 1.
We can generalize this construction to describe a system having k components A1, . . . , Ak. If ℂ^ni is the Hilbert space of Ai with an orthonormal basis {|j〉i | 0 ≤ j < ni}, the composite system A1 · · · Ak has the n1 · · · nk-dimensional Hilbert space ℂ^n1 ⊗ · · · ⊗ ℂ^nk with an orthonormal basis comprising the vectors

|j1〉1 ⊗ |j2〉2 ⊗ · · · ⊗ |jk〉k = |j1〉1|j2〉2 · · · |jk〉k = |j1j2 . . . jk〉

with 0 ≤ ji < ni for all i = 1, . . . , k.
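As a concrete sketch (in Python, with states stored as plain amplitude lists, a representation of our own choosing), the tensor product of two state vectors is the Kronecker product of their amplitude sequences; the basis vector |i〉 ⊗ |j〉 of the composite system sits at index i·n + j, where n is the dimension of the second space.

```python
def kron(u, v):
    """Kronecker (tensor) product of two amplitude vectors."""
    return [a * b for a in u for b in v]

# |1> tensored with |0> (both 2-dimensional) gives |10>, that is, the
# basis vector with index 1*2 + 0 = 2 of the 4-dimensional system.
ket1, ket0 = [0, 1], [1, 0]
assert kron(ket1, ket0) == [0, 0, 1, 0]
```

Iterating kron over k amplitude vectors yields the n1 · · · nk-dimensional composite state described above.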
An n-bit quantum register is a system having exactly n qubits.
Let A1, . . . , An denote the individual bits in an n-bit quantum register A. Each Ai has the Hilbert space ℂ² with orthonormal basis {|0〉, |1〉}. So A has the 2ⁿ-dimensional Hilbert space ℂ² ⊗ · · · ⊗ ℂ² = ℂ^2ⁿ with an orthonormal basis consisting of the vectors

|j1〉 ⊗ |j2〉 ⊗ · · · ⊗ |jn〉 = |j1〉|j2〉 · · · |jn〉 = |j1j2 · · · jn〉

with each ji ∈ {0, 1}. Viewed as an integer in binary notation, j1j2 . . . jn is an integral value between 0 and 2ⁿ – 1. This gives us a canonical numbering |0〉, |1〉, . . . , |2ⁿ – 1〉 of the basis vectors for the register A. These 2ⁿ values are precisely the states that a classical n-bit register can have. The quantum register can, however, be in any state |ψ〉 which is a superposition of the classical states:

|ψ〉 = Σj aj|j〉, the sum over j = 0, 1, . . . , 2ⁿ – 1, with aj ∈ ℂ and Σj |aj|² = 1.
Let us once again look at the general composite system A = A1 · · · Ak. In the classical sense, each state of A is composed of the individual states of the subsystems Ai. For example, each of the 2ⁿ classical states of an n-bit register corresponds to a choice between |0〉 and |1〉 for each individual bit. That is, each individual component retains its own state in a classical composite system. This is, however, not the case with a quantum composite system. Just think of a 2-bit quantum register C := AB. A state
|ψ〉C = c0|0〉C + c1|1〉C + c2|2〉C + c3|3〉C
of C equals a tensor product

|ψ1〉A ⊗ |ψ2〉B = (a0|0〉A + a1|1〉A) ⊗ (b0|0〉B + b1|1〉B)
             = a0b0|0〉C + a0b1|1〉C + a1b0|2〉C + a1b1|3〉C,

only if the coefficients match: c0 = a0b0, c1 = a0b1, c2 = a1b0 and c3 = a1b1, which forces c0c3 = c1c2. A general state of C need not satisfy this condition, that is, the components A and B need not possess individual states of their own.
The state |ψ〉 of a quantum register A = A1 · · · An is called entangled, if |ψ〉 cannot be written as a tensor product of the states of any two parts of A. In other words, |ψ〉 is entangled if and only if no set of fewer than n qubits of A possesses its individual state.
Entanglement essentially implies correlation or interaction between the components. In a composite quantum system, we cannot treat the components individually. A quantum system, as we have defined (axiomatically) earlier, is a completely isolated system. In reality, interactions with the surroundings make a (non-isolated) system change its state and get entangled. This is one of the biggest problems in the realization of a quantum computer. Quantum error correction is an important topic in quantum computation. For our purpose, we stick to the abstract model of an isolated system (quantum register) immune from external disturbances.
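For two qubits, entanglement can be checked directly: comparing a state c0|0〉 + c1|1〉 + c2|2〉 + c3|3〉 with the product expansion (a0|0〉 + a1|1〉) ⊗ (b0|0〉 + b1|1〉) shows that the state is a tensor product precisely when c0c3 = c1c2. A minimal sketch (helper name our own, and only up to floating-point tolerance):

```python
import math

def is_product(c, tol=1e-12):
    """A 2-qubit state c[0]|0> + c[1]|1> + c[2]|2> + c[3]|3> is a tensor
    product of one-qubit states iff c0*c3 == c1*c2."""
    return abs(c[0] * c[3] - c[1] * c[2]) < tol

s = 1 / math.sqrt(2)
assert is_product([0.5, 0.5, 0.5, 0.5])   # (H|0>) tensor (H|0>): a product state
assert not is_product([s, 0.0, 0.0, s])   # (|0> + |3>)/sqrt(2) is entangled
```

The second state above (both bits 0 or both bits 1, in even superposition) is the standard example of a maximally entangled pair.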
Quantum registers give us a way to store quantum information. A computation involves manipulating the information stored in the registers. In quantum mechanics, all such operations must be reversible, that is, it must be possible to invert every operation. The only invertible operations on the classical states |0〉, |1〉, . . . , |2ⁿ – 1〉 of an n-bit quantum register A are precisely the permutations of the classical states. Now that A can be in many more (quantum) states, there are other allowed operations on A. Any such operation must be reversible and of a particular type. This is the third axiom of quantum mechanics, which is detailed shortly.
A classical n-bit register supports many non-invertible operations. For example, erasing the content of the register (that is, resetting all the bits to zero) is a non-invertible process, since the pre-erasure state of the register cannot be uniquely determined after the erase operation is carried out. Classical computation is based on (classical) gates (like NOT, AND, OR, XOR, NOR, NAND), most of which are non-invertible. XOR, as an example, requires two input bits and outputs a single bit. It is impossible to determine the inputs uniquely from the output only. All such non-reversible operations are disallowed in the quantum world. An invertible version of the XOR operation takes two bits x and y as input and outputs the two bits x and x ⊕ y (where ⊕ denotes XOR of bits). Given the output (x, x ⊕ y), the input can be uniquely determined as (x, y) = (x, x ⊕ (x ⊕ y)), that is, by applying the reversible XOR operation once more.
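The reversible XOR just described is, in circuit terms, the controlled-NOT (CNOT) operation. A two-line sketch verifies that it is its own inverse:

```python
def cnot(x, y):
    """Reversible XOR: map the bit pair (x, y) to (x, x XOR y)."""
    return x, x ^ y

# Applying the operation twice recovers the input, so it is invertible
# (indeed, it is its own inverse).
for x in (0, 1):
    for y in (0, 1):
        assert cnot(*cnot(x, y)) == (x, y)
```

Reading off the second output bit of cnot(x, y) gives the ordinary XOR, so no computational power is lost by insisting on reversibility.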
Like XOR, all bit operations that build up a classical computer can be realized using reversible operations only. This gives us the (informal) assurance that quantum computers are at least as powerful as classical computers.
Back to the business—the third axiom of quantum mechanics.
Let U be a square matrix (that is, an m × m matrix for some positive integer m) with entries from ℂ. U is called unitary if U†U = Im, where U† denotes the conjugate transpose of U and Im the m × m identity matrix.

Let A be a quantum system (like a quantum register) with Hilbert space ℂᵐ. An m × m unitary matrix U defines a unitary linear transformation on ℂᵐ, taking a normalized vector |ψ〉 to a normalized vector U|ψ〉. Moreover, the transformation maps an orthonormal basis of ℂᵐ to another orthonormal basis of ℂᵐ (Exercise 8.4).
A quantum system evolves unitarily, that is, any operation on a quantum mechanical system is a unitary transformation.
The Hadamard transform H on one qubit is defined as:

H|0〉 = (1/√2)(|0〉 + |1〉),  H|1〉 = (1/√2)(|0〉 – |1〉).

(Recall that a linear transformation is completely specified by its images of the elements of a basis.) If one takes the basis {|0〉, |1〉} of ℂ², the matrix of H has first row (1/√2)(1, 1) and second row (1/√2)(1, –1), and is easily checked to be unitary. By linearity, H transforms a general state |ψ〉 = a|0〉 + b|1〉 to the state

(1/√2)((a + b)|0〉 + (a – b)|1〉).

Some other unitary operators are described in Exercises 8.5 and 8.6.
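A one-qubit state can be simulated by its amplitude pair, and H applied directly to it. The sketch below (plain Python, helper name our own) also verifies that applying H twice is the identity:

```python
import math

def hadamard(a, b):
    """Apply H to the amplitude pair (a, b) of the state a|0> + b|1>."""
    s = 1 / math.sqrt(2)
    return s * (a + b), s * (a - b)

# H maps |0> to (|0> + |1>)/sqrt(2) ...
a, b = hadamard(1.0, 0.0)
assert abs(a - 1 / math.sqrt(2)) < 1e-12 and abs(b - a) < 1e-12
# ... and H applied twice is the identity (up to floating-point error).
a2, b2 = hadamard(a, b)
assert abs(a2 - 1.0) < 1e-12 and abs(b2) < 1e-12
```

Note that the transform preserves |a|² + |b|², as any unitary operation must.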
An important consequence of quantum mechanical dynamics is that cloning of a state of a system is not permissible. In other words, there does not exist an operator that copies an arbitrary state (content) of one quantum register to another.
For two n-bit registers A and B, there do not exist a unitary transform U of the composite system AB and a state |s〉 of B such that U(|ψ〉A|s〉B) = |ψ〉A|ψ〉B for every state |ψ〉 of A.

Proof Assume that such a state |s〉 and such a transform U exist. Take any two states |ψ〉 and |φ〉 of A. A unitary transform preserves inner products, so 〈ψ|φ〉 = 〈ψ|φ〉〈s|s〉 = 〈ψ|φ〉〈ψ|φ〉 = 〈ψ|φ〉². Therefore 〈ψ|φ〉 is 0 or 1, that is, any two states of A are either orthogonal or identical. This is absurd: |0〉 and (1/√2)(|0〉 + |1〉) are neither.
We have seen how to represent a quantum mechanical system and do operations on the system. Now comes the final part of the game, namely observing or measuring or reading the state of a quantum system. In classical computation, reading the value stored in a classical register is a trivial exercise—just read it! In quantum mechanics, this is not the case.
Let A be a quantum mechanical system with an orthonormal basis {|0〉, |1〉, . . . , |m – 1〉}. Assume that A is in a state |ψ〉 = a0|0〉 + a1|1〉 + · · · + am–1|m – 1〉 with Σi |ai|² = 1. A measurement of A yields the integer i with probability |ai|², and causes the system to collapse to the state |i〉.
This means that whatever the state |ψ〉 of A was before the measurement, the process of measurement can reveal only one of m possible integer values. Moreover, the measurement causes a total loss of information about the pre-measurement amplitudes ai. Thus, it is impossible to measure A repeatedly at the state |ψ〉 so as to see a statistical pattern in the occurrences of different values of i and thereby guess the probabilities |ai|².
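The measurement postulate is easy to simulate (a sketch; the amplitude list is assumed normalized, and the helper name is our own):

```python
import random

def measure(amplitudes):
    """Born rule: return an outcome i with probability |a_i|^2,
    together with the collapsed post-measurement state."""
    r, acc = random.random(), 0.0
    for i, a in enumerate(amplitudes):
        acc += abs(a) ** 2
        if r < acc:
            break
    collapsed = [0.0] * len(amplitudes)
    collapsed[i] = 1.0
    return i, collapsed

# A classical state is measured deterministically and is left unchanged.
assert measure([1.0, 0.0]) == (0, [1.0, 0.0])
```

Note that the function discards the amplitudes and returns a classical basis state, mirroring the irreversible loss of information described above.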
If we open the room, we can see the Schrödinger cat in only one of the two possible states: |alive〉 or |dead〉. Well, then, what else can we expect? Quantum mechanics only models the cat in the isolated room as evolving under the unitary dynamics.
At first glance, this is rather frustrating. We claim that the system went through a series of classically meaningless states, but the classical states are all we can see. What is the guarantee that the system really evolved in the quantum mechanical way? Well, there is no guarantee actually. The solace is that the axioms of quantum mechanics can explain certain natural phenomena. Also it is perfectly consistent with the classical behaviour in that if the system A evolves classically and is measured at the state |i〉 (so that ai = 1 and aj = 0 for j ≠ i), measuring A reveals i with probability one and causes the system to collapse to the state |i〉, that is, to remain in the state |i〉 itself.
There is a positive side of the quantum mechanical axioms. A quantum mechanical system is inherently parallel. An n-bit classical register at any point of time can hold only one of the classical values |0〉, . . . , |2ⁿ – 1〉. An n-bit quantum register, on the other hand, can simultaneously hold all these classical values, with respective probabilities. This inherent parallelism seems to impart a good deal of power to a computing device. Of course, as long as we cannot harness some physical objects to build a real quantum mechanical computing device, quantum computation continues to remain science fiction. But on an algorithmic level, the inherent parallelism of a (hypothetical) quantum computer can be exploited to do miracles, for example, to design a polynomial-time integer factorization algorithm. This is where we win, at least conceptually. Our failure to see a cat in the state (1/√2)(|alive〉 – |dead〉) should not bother us at all!
Measurement also gives us a way to initialize a quantum register A to a desired state |ψ〉. Suppose that we get the value i upon measuring A. We then apply to A any unitary transform that changes A from the post-measurement state |i〉 to the desired state |ψ〉.
The measurement described in Axiom 8.4 is called measurement in the classical basis. The system A has, in general, many orthonormal bases other than the classical one {|0〉, . . . , |m – 1〉}. If B is any such basis, we can conceive of measuring A in the basis B. All we need to perform is to rewrite the state of A in terms of the new basis B. This can be achieved by applying to A a unitary transformation (the change-of-basis transformation) before the measurement in the classical basis is carried out.
A generalization of the Born rule is also worth mentioning here. Suppose that we have an (m + n)-bit quantum register A and we want to measure not all but some of the bits of A. To be more specific, let us say that we want to measure the leftmost m bits of A, though the generalized Born rule works for any arbitrary choice of m bit positions in the register A. Denoting by |i〉m, i = 0, . . . , 2ᵐ – 1, the canonical basis vectors for the left m bits and by |j〉n, j = 0, . . . , 2ⁿ – 1, those for the right n bits, a general state of A can be written as

|ψ〉 = Σi,j ai,j |i, j〉m+n

with Σi,j |ai,j|² = 1 and with |i, j〉m+n identified as |i〉m|j〉n = |i〉m ⊗ |j〉n. A measurement of the left m bits of A yields an integer i, 0 ≤ i ≤ 2ᵐ – 1, with probability pi = Σj |ai,j|². Also this measurement causes A to collapse to the state (1/√pi) Σj ai,j |i〉m|j〉n.

Now, if we immediately apply the generalized Born rule once again on the right n bits of A, we get an integer j, 0 ≤ j ≤ 2ⁿ – 1, with probability |ai,j|²/pi, and the system collapses to the state |i〉m|j〉n. The probability of getting |i〉m|j〉n by this two-step process is then pi · |ai,j|²/pi = |ai,j|². This is consistent with a single application of the original Born rule.
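The generalized Born rule can be sketched for a register whose state is stored as a map from basis labels (i, j) to amplitudes (a toy representation of our own choosing):

```python
import math, random

def measure_left(state):
    """Generalized Born rule: measure the left part of a register whose
    state is the dict {(i, j): amplitude}.  Returns the outcome i
    (probability p_i = sum_j |a_ij|^2) and the collapsed, renormalized state."""
    probs = {}
    for (i, j), a in state.items():
        probs[i] = probs.get(i, 0.0) + abs(a) ** 2
    r, acc = random.random(), 0.0
    outcome = max(probs)              # fallback for rounding at the tail
    for i in sorted(probs):
        acc += probs[i]
        if r < acc:
            outcome = i
            break
    norm = math.sqrt(probs[outcome])
    collapsed = {(i, j): a / norm
                 for (i, j), a in state.items() if i == outcome}
    return outcome, collapsed

# (|0>|0> + |0>|1>)/sqrt(2): the left part measures 0 with certainty,
# and the right part remains in an even superposition.
s = 1 / math.sqrt(2)
i, post = measure_left({(0, 0): s, (0, 1): s})
assert i == 0 and abs(post[(0, 0)] - s) < 1e-12
```

Measuring the right part of the collapsed state afterwards reproduces the two-step probabilities computed above.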
We start with a general framework of doing computations using quantum registers. Suppose we want to compute a function f which takes an m-bit integer as input and outputs an n-bit integer. A general function f need not be invertible, but we cannot afford non-invertible operations on quantum registers. This is why we work on an (m + n)-bit quantum register A in which the left m bits represent the input and the right n bits the output. Computing f(x) for a given x is tantamount to designing a unitary transformation Uf that acts on A and converts its state from |x〉m|y〉n to |x〉m|f(x) ⊕ y〉n, where ⊕ is the bitwise XOR operation, and where the subscripts (m and n) indicate the number of bits in the input or output part of A. It is easy to verify that Uf is unitary. Moreover, the inverse of Uf is Uf itself. For y = 0, we, in particular, have Uf(|x〉m|0〉n) = |x〉m|f(x)〉n.
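Since Uf only permutes the classical basis states, a sketch can tabulate it as a permutation and check the properties claimed above (the particular f below is an arbitrary example of ours):

```python
def U_f(f, m, n):
    """The action of U_f on the classical basis states, as a permutation:
    (x, y) -> (x, f(x) XOR y) for an m-bit input x and n-bit output y."""
    return {(x, y): (x, f(x) ^ y)
            for x in range(2 ** m) for y in range(2 ** n)}

perm = U_f(lambda x: (3 * x) % 4, m=2, n=2)
# U_f is its own inverse: applying the permutation twice is the identity.
assert all(perm[perm[key]] == key for key in perm)
# On |x>|0> it writes f(x) into the output part.
assert perm[(2, 0)] == (2, (3 * 2) % 4)
```

A permutation of an orthonormal basis extends by linearity to a unitary transformation, which is why this table fully specifies Uf.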
It may still be unclear to the reader what one really gains by using this quantum model. The answer lies in the parallelism inherent in a quantum register. In order to see how this parallelism can be exploited, we describe David Deutsch’s algorithm which, being the first known quantum algorithm, has enough historical importance to be included here in spite of its apparent irrelevance in the context of cryptology.
Assume that f : {0, 1} → {0, 1} is a function that takes one bit as input and outputs one bit. There are four such functions: two of these are constant (f(0) = f(1)) and the remaining two non-constant (f(0) ≠ f(1)). We are given a black box Df representing f. We don’t know which of the four functions Df actually implements, but we can supply a bit to Df as input and read its output. Our task is to determine whether Df represents a constant function or not. Classically, we make two invocations of Df, on the inputs 0 and 1, and compare the output values f(0) and f(1). It is impossible to solve the problem classically using only one invocation of the black box. The Deutsch algorithm makes this possible using quantum computational techniques.
Following the general quantum computational model, we assume that Df is a unitary transformation on a 2-bit register A (with m = n = 1) that computes Df |x〉|y〉 = |x〉|f(x) ⊕ y〉 with the left (resp. the right) bit corresponding to the input (resp. the output) of f. Instead of supplying a classical input to Df, we initialize the register A to the state

(1/√2)(|0〉 – |1〉) ⊗ (1/√2)(|0〉 – |1〉).

Linearity shows that on this input, Df ends its execution leaving A in the state

(–1)^f(0) (1/√2)(|0〉 – (–1)^(f(0) ⊕ f(1))|1〉) ⊗ (1/√2)(|0〉 – |1〉).

Here, f(0) ⊕ f(1) equals 0 or 1 according as f is constant or not. We won’t measure A right now, but apply the Hadamard transform on the left bit. This transforms A to the state

(–1)^f(0) |f(0) ⊕ f(1) ⊕ 1〉 ⊗ (1/√2)(|0〉 – |1〉).
Now, if we measure the input bit, we deterministically get the integer 1 or 0 according as whether f is constant or not respectively. That’s it!
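The whole run can be simulated on four amplitudes. The sketch below assumes, consistently with the stated measurement outcome (1 for constant f), that both qubits of A are initialized to (1/√2)(|0〉 – |1〉):

```python
import math

def deutsch(f):
    """Simulate Deutsch's algorithm; returns 1 iff f is constant.
    The 2-qubit state is a list of 4 amplitudes, index = 2*x + y."""
    s = 1 / math.sqrt(2)
    minus = [s, -s]                          # H|1> = (|0> - |1>)/sqrt(2)
    state = [minus[x] * minus[y] for x in (0, 1) for y in (0, 1)]
    # Apply D_f : |x>|y> -> |x>|f(x) XOR y>.
    out = [0.0] * 4
    for x in (0, 1):
        for y in (0, 1):
            out[2 * x + (f(x) ^ y)] += state[2 * x + y]
    # Apply H to the left (input) qubit: |x> -> (|0> + (-1)^x |1>)/sqrt(2).
    final = [0.0] * 4
    for x in (0, 1):
        for y in (0, 1):
            final[y] += s * out[2 * x + y]
            final[2 + y] += s * (-1) ** x * out[2 * x + y]
    # Probability that the input bit is measured as 1.
    p1 = final[2] ** 2 + final[3] ** 2
    return 1 if p1 > 0.5 else 0

assert deutsch(lambda x: 0) == 1 and deutsch(lambda x: 1) == 1      # constant
assert deutsch(lambda x: x) == 0 and deutsch(lambda x: 1 - x) == 0  # non-constant
```

In all four cases the probability p1 is exactly 0 or 1 (up to rounding), so a single quantum invocation of Df decides the question that classically needs two.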
Deutsch’s algorithm solved a rather artificial problem, but it opened up the possibilities of exploring a new paradigm of computation. To date, (good) quantum algorithms are known for many interesting computational problems. In the rest of this chapter, we concentrate on some of the quantum algorithms that have an impact on cryptology.
| 8.1 | Let S be a finite set and let l2(S) denote the set of all functions .
| 8.2 | Show that the vectors and form an orthonormal -basis of .
| 8.3 | Show that is an entangled state of a 2-bit quantum register.
| 8.4 | Prove the following assertions.
| 8.5 |
| 8.6 | Let A be an n-bit quantum register. Let us plan to number the bits of A as 1, . . . , n from left to right. One can apply the operators like X, Z, H of Exercise 8.5 on each individual bit of A. A qubit operation B applied on bit i of A will be denoted by Bi.
| 8.7 | Suppose that whenever you switch on your quantum computer, every bit in its registers is initialized to the state |0〉. Describe how you can use the operators I, X, Z and H defined in Exercise 8.5, in order to change the state of a qubit from |0〉 to the following:
| 8.8 | Let A be an n-bit quantum register at the state |0〉n. Show that the application of the Hadamard transform individually to each bit of A transforms A to the state (1/√(2ⁿ)) Σj |j〉, the sum over j = 0, 1, . . . , 2ⁿ – 1. This is precisely the state of A in which all of the 2ⁿ possible outcomes in a measurement of A are equally likely. What happens if we apply H a second time individually to each bit of A, that is, what is H1H2 · · · Hn|ψ〉, where Hi denotes the Hadamard transform on the i-th bit of A?
| 8.9 | We know that any arithmetic or Boolean operation can be implemented using AND and NOT gates. This exercise suggests a reversible way to implement these operations. The Toffoli gate is a function T : {0, 1}3 → {0, 1}3 that maps (x, y, z) ↦ (x, y, z ⊕ xy), where ⊕ means XOR, and xy means AND of x and y. Thus, T flips the third bit, if and only if the first two bits are both 1.
|
We now describe the quantum key-exchange algorithm due to Bennett and Brassard. The original paper also discusses a practical implementation of the algorithm, one using polarization of photons. For the moment, we do not highlight such implementation-specific issues, but describe the algorithm in terms of the conceptual computational units called qubits.
The usual actors Alice and Bob want to agree upon a shared secret using communication over an insecure channel. A third party who gave her name as Carol plans to eavesdrop during the transmission. Alice and Bob repeat the following steps. Here, H stands for the Hadamard transform.
Alice generates a random classical bit i ∈ {0, 1}.
Alice makes a random choice x ∈ {0, 1}.
Alice computes the quantum bit A := Hx|i〉.
Alice sends A to Bob.
Bob makes a random choice y ∈ {0, 1}.
Bob computes B := HyA.
Bob measures B to get the classical bit j.
Bob sends y to Alice.
Alice sends x to Bob.
if (x = y) { Bob and Alice retain the common bit i = j }
The algorithm works as follows. Alice generates a random bit i and a random decision x whether she is going to use the Hadamard transform H. If x = 0, she sends the quantum bit |0〉 or |1〉 to Bob. If x = 1, she sends either (1/√2)(|0〉 + |1〉) or (1/√2)(|0〉 – |1〉) to Bob. At this point Bob does not know whether Alice applied H before the transmission. So Bob makes a random guess y ∈ {0, 1} and accordingly skips/applies the Hadamard transform on the qubit received. If x = y = 0, then Bob has the qubit B = H0H0|i〉 = |i〉 and a measurement of this qubit reveals i with probability 1. On the other hand, if x = y = 1, then B = H2|i〉 = |i〉, since H2 is the identity transform (Exercise 8.5). In this case also, Bob retrieves Alice’s classical bit i with certainty by measuring B.
If x ≠ y, then B is generated from Alice’s initial choice |i〉 using a single application of H, that is, B = H|i〉 in this case. A measurement of this bit outputs 0 or 1, each with probability 1/2, that is, Bob gathers no idea about the initial choice of Alice. So after it is established that x ≠ y, they both discard the bit.
If we assume that x and y are uniformly chosen, Bob and Alice succeed in having x = y about half of the time. They eventually set up an n-bit secret after about 2n invocations of the above protocol. Table 8.1 illustrates a sample session between Alice and Bob. After 20 iterations of the above procedure, they agree upon the shared secret 0001110111.
| Iteration | i | x | A | y | B | j | Common bit |
|---|---|---|---|---|---|---|---|
| 1 | 0 | 1 | H|0〉 | 0 | H|0〉 | 1 | |
| 2 | 0 | 0 | |0〉 | 1 | H|0〉 | 1 | |
| 3 | 0 | 1 | H|0〉 | 1 | |0〉 | 0 | 0 |
| 4 | 0 | 1 | H|0〉 | 0 | H|0〉 | 0 | |
| 5 | 1 | 1 | H|1〉 | 0 | H|1〉 | 1 | |
| 6 | 0 | 0 | |0〉 | 0 | |0〉 | 0 | 0 |
| 7 | 0 | 0 | |0〉 | 0 | |0〉 | 0 | 0 |
| 8 | 1 | 0 | |1〉 | 0 | |1〉 | 1 | 1 |
| 9 | 0 | 0 | |0〉 | 1 | H|0〉 | 0 | |
| 10 | 1 | 1 | H|1〉 | 0 | H|1〉 | 0 | |
| 11 | 0 | 1 | H|0〉 | 0 | H|0〉 | 1 | |
| 12 | 0 | 0 | |0〉 | 1 | H|0〉 | 0 | |
| 13 | 1 | 0 | |1〉 | 1 | H|1〉 | 1 | |
| 14 | 1 | 1 | H|1〉 | 1 | |1〉 | 1 | 1 |
| 15 | 1 | 1 | H|1〉 | 1 | |1〉 | 1 | 1 |
| 16 | 0 | 1 | H|0〉 | 1 | |0〉 | 0 | 0 |
| 17 | 1 | 1 | H|1〉 | 1 | |1〉 | 1 | 1 |
| 18 | 1 | 0 | |1〉 | 0 | |1〉 | 1 | 1 |
| 19 | 0 | 1 | H|0〉 | 0 | H|0〉 | 0 | |
| 20 | 1 | 0 | |1〉 | 0 | |1〉 | 1 | 1 |

Here H|0〉 = (1/√2)(|0〉 + |1〉) and H|1〉 = (1/√2)(|0〉 – |1〉).
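The protocol is straightforward to simulate with one-qubit amplitude pairs (a sketch; helper names are our own):

```python
import math, random

S = 1 / math.sqrt(2)

def H(a):
    """Hadamard transform on an amplitude pair (a0, a1)."""
    return [S * (a[0] + a[1]), S * (a[0] - a[1])]

def measure(a):
    """Born rule for one qubit: 0 with probability |a0|^2, else 1."""
    return 0 if random.random() < a[0] ** 2 else 1

def exchange_round():
    """One iteration of the protocol; returns (i, x, y, j)."""
    i, x, y = random.randrange(2), random.randrange(2), random.randrange(2)
    a = [1.0 - i, float(i)]          # the qubit |i>
    if x:
        a = H(a)                     # Alice sends A = H^x |i>
    if y:
        a = H(a)                     # Bob computes B = H^y A
    return i, x, y, measure(a)

key_alice, key_bob = [], []
while len(key_alice) < 10:
    i, x, y, j = exchange_round()
    if x == y:                       # bases agree: the bit is kept
        key_alice.append(i)
        key_bob.append(j)
assert key_alice == key_bob          # without eavesdropping the keys match
```

On average half the iterations are discarded, so gathering a 10-bit key takes about 20 rounds, as in Table 8.1.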
What remains to explain is how this protocol guards against eavesdropping by Carol. Let us model Carol as an adversary who intercepts the qubit A transmitted by Alice, investigates the bit to learn about Alice’s secret i, and subsequently transmits a qubit to Bob. In order to guess i, Carol mimics the role of Bob. At this point Carol does not know x, so she makes a guess z about x, accordingly skips/applies the Hadamard transform on the intercepted qubit in order to get a qubit C, measures C to get a bit value k and sends the measured qubit D to Bob. (Recall from Theorem 8.1 that it is impossible for Carol to make a copy of A, work on this copy and transmit the original qubit A to Bob.) Bob receives D, assumes that it is the qubit A transmitted by Alice and carries out his part of the work to generate the bit j. Bob and Alice later reveal x and y. If x ≠ y, they anyway reject the bits obtained from this iteration. Carol should also reject her bit k in this case. So let us concentrate only on the case that x = y. The introduction of Carol in the protocol changes A to D, and hence Alice and Bob may eventually end up with distinct bits. A sample session of the protocol in the presence of Carol is illustrated in Table 8.2. The three parties generate the secrets as:
| Alice | 0110 0111 1000 1011 |
| Bob | 0101 1101 1100 1011 |
| Carol | 0100 0101 0100 1011 |
| Iteration | i | x | A | z | C = HzA | k | D | y | B = HyD | j |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0 | 1 | H|0〉 | 1 | |0〉 | 0 | |0〉 | 1 | H|0〉 | 0 |
| 2 | 1 | 0 | |1〉 | 0 | |1〉 | 1 | |1〉 | 0 | |1〉 | 1 |
| 3 | 1 | 0 | |1〉 | 1 | H|1〉 | 0 | |0〉 | 0 | |0〉 | 0 |
| 4 | 0 | 1 | H|0〉 | 0 | H|0〉 | 0 | |0〉 | 1 | H|0〉 | 1 |
| 5 | 0 | 1 | H|0〉 | 1 | |0〉 | 0 | |0〉 | 1 | H|0〉 | 1 |
| 6 | 1 | 1 | H|1〉 | 1 | |1〉 | 1 | |1〉 | 1 | H|1〉 | 1 |
| 7 | 1 | 1 | H|1〉 | 0 | H|1〉 | 0 | |0〉 | 1 | H|0〉 | 0 |
| 8 | 1 | 0 | |1〉 | 0 | |1〉 | 1 | |1〉 | 0 | |1〉 | 1 |
| 9 | 1 | 1 | H|1〉 | 0 | H|1〉 | 0 | |0〉 | 1 | H|0〉 | 1 |
| 10 | 0 | 1 | H|0〉 | 0 | H|0〉 | 1 | |1〉 | 1 | H|1〉 | 1 |
| 11 | 0 | 0 | |0〉 | 1 | H|0〉 | 0 | |0〉 | 0 | |0〉 | 0 |
| 12 | 0 | 0 | |0〉 | 0 | |0〉 | 0 | |0〉 | 0 | |0〉 | 0 |
| 13 | 1 | 1 | H|1〉 | 1 | |1〉 | 1 | |1〉 | 1 | H|1〉 | 1 |
| 14 | 0 | 0 | |0〉 | 0 | |0〉 | 0 | |0〉 | 0 | |0〉 | 0 |
| 15 | 1 | 0 | |1〉 | 0 | |1〉 | 1 | |1〉 | 0 | |1〉 | 1 |
| 16 | 1 | 0 | |1〉 | 1 | H|1〉 | 1 | |1〉 | 0 | |1〉 | 1 |
In this example, Alice and Bob’s shared secrets differ in five bit positions. Carol’s intervention causes a shared bit to differ with a probability of 3/8 (Exercise 8.11). Thus, the more Carol eavesdrops, the more she introduces different bits in the secret shared by Alice and Bob.
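Carol’s effect can be estimated by simulating the x = y rounds exactly as tabulated in Table 8.2 (Carol forwards the collapsed qubit |k〉 as D, without undoing her transform). Under this model, the disagreement rate on kept bits comes out near 3/8 (a sketch; helper names are our own):

```python
import math, random

S = 1 / math.sqrt(2)
H = lambda a: [S * (a[0] + a[1]), S * (a[0] - a[1])]
measure = lambda a: 0 if random.random() < a[0] ** 2 else 1

def kept_round_with_carol():
    """One x = y iteration with Carol intercepting; True if i != j."""
    i, x, z = random.randrange(2), random.randrange(2), random.randrange(2)
    a = [1.0 - i, float(i)]
    if x:
        a = H(a)                   # Alice sends A = H^x |i>
    if z:
        a = H(a)                   # Carol computes C = H^z A ...
    k = measure(a)                 # ... and measures it
    a = [1.0 - k, float(k)]        # the collapsed qubit D = |k> goes to Bob
    if x:
        a = H(a)                   # Bob applies H^y with y = x
    return measure(a) != i

rounds = 20000
rate = sum(kept_round_with_carol() for _ in range(rounds)) / rounds
assert 0.34 < rate < 0.41          # close to 3/8 under this model
```

Averaging over the four equally likely (x, z) pairs reproduces the figure: only x = z = 0 leaves the bit undisturbed, and each of the other three cases flips it with probability 1/2.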
Once Alice and Bob generate a shared secret of the desired bit length, they can check for the equality of their secret values without revealing them. For example, if the shared secret is a 64-bit DES key, Alice can send Bob one or more plaintext–ciphertext pairs generated by the DES algorithm using her shared key. Bob also generates the ciphertexts on Alice’s plaintexts using his secret key. If the ciphertexts generated by Bob differ from those generated by Alice, Bob becomes confident that their shared secrets are different and this happened because of the presence of some adversary (or because of communication errors). They then repeat the key-exchange protocol.
Another possible way in which Alice and Bob can gain confidence about the equality of their shared secrets is the use of parity checks. Suppose Alice breaks up her secret into blocks of eight bits, computes the parity bit of each block, and sends these bits to Bob. Bob generates the parity bits of the blocks of his secret and compares the two sets of parity bits. If the shared secrets of Alice and Bob differ, this parity check reveals the fact with high probability.
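As a small illustration, the parity comparison can be sketched as follows, using the 16-bit Alice/Bob secrets from the sample session above (the function name is our own):

```python
def parity_bits(bits, block=8):
    """Parity (sum mod 2) of each 8-bit block of a secret, as in the text."""
    return [sum(bits[i:i + block]) % 2 for i in range(0, len(bits), block)]

# The 16-bit secrets generated in the sample session of Table 8.2:
alice = [int(b) for b in "0110011110001011"]
bob   = [int(b) for b in "0101110111001011"]

print(parity_bits(alice), parity_bits(bob))  # differing parities expose a mismatch
```

Note that a block differing in an even number of positions escapes this check, which is why the detection is only claimed with high probability.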
A minor variant of the key-exchange algorithm just described comes with an implementation strategy. The polarization of a photon is measured by an angle θ, 0° ≤ θ < 180°.[1] A photon polarized at an angle θ passes through a φ-filter with probability cos²(φ − θ) and gets absorbed in the filter with probability sin²(φ − θ). Therefore, photons polarized at the angles 0°, 90°, 45° and 135° can be used to represent the quantum states |0〉, |1〉, H|0〉 = (|0〉 + |1〉)/√2 and H|1〉 = (|0〉 − |1〉)/√2, respectively. Alice and Bob use 0°- and 45°-filters. Alice makes a random choice (x) between the two filters. If x = 0, she sends a photon polarized at an angle 0° or 90°. If x = 1, a photon polarized at an angle 45° or 135° is sent. When Bob receives the photon transmitted by Alice, he makes a random guess y. If y = 0, he uses the 0°-filter to detect its polarization, and if y = 1, he uses the 45°-filter. Then, Alice and Bob reveal their choices x and y, and if the two choices agree, they share a common secret bit. See Exercise 8.12 for a mathematical formulation of this strategy.
[1] Ask a physicist!
One of the most startling features of this Bennett–Brassard algorithm (often called the BB84 algorithm) is that there have been successful experimental implementations of the strategy. The first prototype was designed by the authors themselves at the T. J. Watson Research Center. They used a quantum channel of length 32 cm. Using longer channels requires many technological barriers to be overcome. For example, fiber-optic cables tend to weaken and may even destroy the polarization of photons. Using boosters to strengthen the signal is impossible in the quantum mechanical world, since doing so produces an effect similar to eavesdropping. Interference patterns (instead of polarization) have been proposed and utilized to build longer quantum channels for key exchange. At present, Stucki et al. [293] hold the world record for performing quantum key exchange over an (underwater) channel of length 67 km between Geneva and Lausanne.
| 8.10 | We have exploited the property that H² = I in order to prove the correctness of the quantum key-exchange algorithm. Exercise 8.5 lists some other operators (X and Z) which also satisfy the same property (X² = Z² = I). Can one use one of these transforms in place of H in the quantum key-exchange algorithm? |
| 8.11 | Assume that Carol eavesdrops (in the manner described in the text) during the execution of the quantum key-exchange protocol between Alice and Bob. Derive, for the different choices of i, x and z, the probabilities P_{ixz} of having i ≠ j in the case x = y. If all these choices of i, x, z are equally likely, show that the probability that Carol introduces a mismatch (that is, i ≠ j) in a shared bit during a random execution of the key-exchange protocol with x = y is 3/8. (Note that if x = y = z = 0, that is, if the execution of the algorithm proceeds entirely in the classical sense, Carol goes unnoticed. It is the application of the classically meaningless Hadamard transform that introduces the desired security in the protocol.) |
| 8.12 | In the key-exchange algorithm described in the text, Bob (and also Carol) always measures qubits in the classical basis {|0〉, |1〉}. Now, consider the following variant of this algorithm. Alice sends, as before, one of the four qubits |0〉, |1〉, H|0〉, H|1〉, depending on her choice of i and x. Bob, upon receiving the qubit A, generates a random guess y ∈ {0, 1}. If y = 0, Bob measures A in the classical basis, whereas if y = 1, Bob measures A in the basis {H|0〉, H|1〉}. After this, they exchange x and y, and retain/discard the bits as in the original algorithm. |
Quantum parallelism has been effectively exploited to design fast (polynomial-time) algorithms for some of the intractable mathematical problems discussed in Chapter 4. With the availability of quantum computers, cryptographic systems that derive their security from the intractability of these problems will become unusable (completely insecure). Nobody, however, has proved that these intractable problems cannot have fast classical algorithms. It is interesting to wait and see which (if any) is invented first: a quantum computer or a polynomial-time classical algorithm.
Let us set up some terminology for the rest of this chapter. Let P be a unitary operator on a qubit. One can apply P individually to the i-th bit of an n-bit register, in which case we denote the operation by P_i. If P_i is applied for each i = 1, . . . , n (in succession or simultaneously), we abbreviate P_1 · · · P_n by the shorthand notation P^(n). The parentheses distinguish this operation from P^n, the n-fold application of P to a single qubit.
If P and Q are unitary transforms on n_1- and n_2-bit quantum registers respectively, we let P ⊗ Q denote the unitary transform on an (n_1 + n_2)-bit register, with P operating on the left n_1 bits and Q on the right n_2 bits of the register.
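This notation admits a quick numerical sanity check (plain-Python matrices; the helper names are our own): H² = I on one qubit, and H ⊗ H equals H^(2) on a two-qubit register, mapping |00〉 to the uniform superposition.

```python
H = [[2 ** -0.5, 2 ** -0.5],
     [2 ** -0.5, -(2 ** -0.5)]]

def matmul(P, Q):
    return [[sum(P[i][t] * Q[t][j] for t in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def kron(P, Q):
    """P (x) Q: P acts on the left register, Q on the right one."""
    return [[P[i][j] * Q[k][l] for j in range(len(P[0])) for l in range(len(Q[0]))]
            for i in range(len(P)) for k in range(len(Q))]

H2 = matmul(H, H)                # H^2 = I on a single qubit
HH = kron(H, H)                  # H (x) H, i.e. H^(2) on a 2-qubit register
state = [row[0] for row in HH]   # H^(2)|00>: all four amplitudes equal 1/2
print(state)
```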
Let N := 2^n for some n ∈ ℕ. Let f : ℕ → ℕ be a periodic function with (least) period r, that is, f(x + kr) = f(x) for every x, k ∈ ℕ. Suppose further that 1 ≪ r ≤ 2^(n/2) and also that f(0), f(1), . . . , f(r − 1) are pairwise distinct. Shor proposed an algorithm for an efficient computation of the period r in this case.
Let’s first look at the problem classically. If one evaluates f at randomly chosen points, by the birthday paradox (Exercise 2.172) one requires about O(√r) evaluations of f on an average in order to find two different integers x and y with f(x) = f(y). But then r | (x − y). If sufficiently many such pairs (x, y) are available, the period can be obtained by computing the gcd of the differences x − y. If r is large, say, r = O(2^(n/2)), this gives us an algorithm for computing r in expected time exponential in n. Shor’s quantum algorithm determines r in expected time polynomial in n.
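The classical collision approach can be sketched as follows (the helper name, the toy period r = 21 and the collision budget are our own choices):

```python
import math
import random

def classical_period(f, N, rng, collisions=40):
    """Collect birthday collisions f(x) == f(x'); each difference x - x'
    is a multiple of the period r, so the gcd of the differences
    converges to r with high probability."""
    seen, g = {}, 0
    while collisions > 0:
        x = rng.randrange(N)
        v = f(x)
        if v in seen and seen[v] != x:
            g = math.gcd(g, abs(x - seen[v]))
            collisions -= 1
        seen[v] = x
    return g

rng = random.Random(7)
r_found = classical_period(lambda x: x % 21, 2 ** 10, rng)  # toy f with period 21
print(r_found)
```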
Let us assume that we have an oracle U_f which, on input the 2n-bit value |x〉_n|y〉_n, computes |x〉_n|f(x) ⊕ y〉_n. We prepare a 2n-bit register A in the state |0〉_n|0〉_n. Then, we apply the Hadamard transform H^(n) on the left n bits. By Exercise 8.8, the state of A becomes

(1/√N) Σ_{x=0}^{N−1} |x〉_n|0〉_n.

Supplying this state as the input to the oracle U_f yields the state

(1/√N) Σ_{x=0}^{N−1} |x〉_n|f(x)〉_n.
We then measure the output register (the right n bits). By the generalized Born rule, we get a value f(x_0) for some x_0 ∈ {0, 1, . . . , r − 1}, and the state of the register A collapses to the uniform superposition of all those |x〉|f(x)〉 for which f(x) = f(x_0). By the given periodicity properties of f, the post-measurement state of the input register (the left n bits) can be written as

Equation 8.1

(1/√M) Σ_{j=0}^{M−1} |x_0 + jr〉

for some M determined by the relations:

x_0 + (M − 1)r < N ≤ x_0 + Mr.
This is an interesting state, for if we were allowed to make copies of this state and measure the different copies, we could collect some values x_0 + j_1r, . . . , x_0 + j_kr, which in turn would reveal r with high probability. But the no-cloning theorem disallows making copies of quantum states. Shor proposed a trick to work around this difficulty. He considered the following transform:
Equation 8.2

F|x〉 := (1/√N) Σ_{y=0}^{N−1} e^{2πixy/N} |y〉, 0 ≤ x ≤ N − 1.

By Exercise 8.13, F is a unitary transform. F is known as the Fourier transform. Applying F to State (8.1) transforms the input register to the state

(1/√(MN)) Σ_{y=0}^{N−1} ( Σ_{j=0}^{M−1} e^{2πi(x_0+jr)y/N} ) |y〉.
A measurement of this state gives an integer y ∈ {0, 1, . . . , N − 1} with the probability

p_y = (1/(MN)) | Σ_{j=0}^{M−1} e^{2πijry/N} |².
Application of the Fourier transform to State (8.1) helps us to concentrate the probabilities of measurement outcomes in strategic states. More precisely, consider a value y_k = kN/r + ∊_k of y, where −1/2 ≤ ∊_k < 1/2, that is, a value of y close to an integral multiple of N/r. In this case,

p_{y_k} = (1/(MN)) | Σ_{j=0}^{M−1} e^{2πijr∊_k/N} |².
The last summation is that of a geometric series, and we have

p_{y_k} = (1/(MN)) · sin²(πMr∊_k/N) / sin²(πr∊_k/N).
Now, we use the inequalities (2/π)x ≤ sin x ≤ x for 0 ≤ x ≤ π/2 and the facts that rM ≈ N and that |∊_k| ≤ 1/2 to get

p_{y_k} ≥ (1/(MN)) · ((2/π) · πMr|∊_k|/N)² / (πr|∊_k|/N)² = 4M/(π²N) ≈ 4/(π²r).
Since N/r has about r positive integral multiples less than N and each such multiple has a closest integer y_k for some k, the probability that we obtain one such y_k as the outcome of the measurement is at least 4/π² = 0.40528 . . . , that is, after O(1) iterations of the above procedure we get some y_k. The Fourier transform thus boosts the likelihood of getting some y_k to a level bounded below by a positive constant.
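The concentration effect is easy to observe numerically. The toy computation below (with hypothetical small parameters N = 2⁹, r = 21, taking x_0 = 0) evaluates p_y directly from the formula above and adds up the probability mass on the integers nearest the multiples of N/r:

```python
import cmath

n, r = 9, 21                   # toy parameters: N = 2^n, period r
N = 2 ** n
M = -(-N // r)                 # ceil(N/r): number of surviving terms when x0 = 0

def p(y):
    """Measurement probability of |y> after the Fourier transform."""
    s = sum(cmath.exp(2j * cmath.pi * j * r * y / N) for j in range(M))
    return abs(s) ** 2 / (M * N)

total = sum(p(y) for y in range(N))                 # the p_y form a distribution
good = {round(k * N / r) % N for k in range(r)}     # integers nearest kN/r
mass = sum(p(y) for y in good)
print(round(mass, 3))
```

The printed mass comfortably exceeds the 4/π² ≈ 0.405 lower bound derived above.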
What remains is to show that r can be retrieved from such a useful observation y_k. We have |y_k/N − k/r| = |∊_k|/N ≤ 1/(2N). If a/b and c/d are two distinct rationals with b, d ≤ 2^(n/2) and with |a/b − y_k/N| ≤ 1/(2N) and |c/d − y_k/N| ≤ 1/(2N), then by the triangle inequality we have |a/b − c/d| ≤ 1/N. On the other hand, since a/b ≠ c/d, we have |ad − bc| ≥ 1, so that |a/b − c/d| ≥ 1/(bd) ≥ 1/2^n = 1/N, a contradiction. Therefore, since r ≤ 2^(n/2), there is a unique rational k/r with denominator at most 2^(n/2) satisfying |k/r − y_k/N| ≤ 1/(2N), and this rational k/r can be determined by efficient classical algorithms, for example, using the continued fraction expansion[2] of y_k/N.
[2] Consult Zuckerman et al. [316] to learn about continued fractions and their applications in approximating real numbers.
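A sketch of this classical post-processing, using Python’s `Fraction.limit_denominator` as a stand-in for the continued-fraction computation (the toy values of n, r and k below are hypothetical):

```python
from fractions import Fraction

n = 16                          # toy register size
N = 2 ** n
r, k = 201, 7                   # hidden period r <= 2^(n/2) and an unknown k
yk = round(k * N / r)           # the observed value, within 1/2 of kN/r

# |yk/N - k/r| <= 1/(2N), and distinct rationals with denominators <= 2^(n/2)
# are at least 1/N apart, so the best approximation below is unambiguous.
frac = Fraction(yk, N).limit_denominator(2 ** (n // 2))
print(frac)                     # k/r in lowest terms: 7/201
```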
If gcd(k, r) = 1, the reduced denominator of k/r is r itself and we obtain r; we can verify the value by checking whether f(x) = f(x + r). If gcd(k, r) > 1, the reduced denominator is only a divisor of r. Repeating the entire procedure gives another fraction k′/r, from which we obtain (hopefully) another divisor of r (if not r itself). After a few (O(1)) iterations, we obtain r as the lcm of the divisors collected.
Much of the quantum magic is obtained by the use of the Fourier transform F on a suitably prepared quantum register. The question then is how easy it is to implement F. We will not go into the details, but only mention that a circuit consisting of basic quantum gates and of size O(n²) can be used to realize the Fourier transform (cf. Exercise 8.14).
To sum up, we have a polynomial-time (in n) randomized quantum algorithm for computing the period r of f. This leads to efficient quantum algorithms for solving many classically intractable problems of cryptographic significance.
Let m = pq with p, q distinct primes. We have φ(m) = (p − 1)(q − 1). Choose an RSA key pair (e, d) with gcd(e, φ(m)) = 1 and ed ≡ 1 (mod φ(m)). Given a message a ∈ Z_m, the ciphertext message is b ≡ a^e (mod m). The task of a cryptanalyst is to compute a from the knowledge of m, e and b. If gcd(b, m) > 1, then this gcd is a non-trivial factor of m. So assume that gcd(b, m) = 1. But then gcd(a, m) = 1 also. Since b ≡ a^e (mod m), b is in the subgroup of Z_m* generated by a. Similarly, a ≡ b^d (mod m), that is, a is in the subgroup of Z_m* generated by b. It follows that these two subgroups are equal and, in particular, the multiplicative orders of a and b modulo m are the same. This order—call it r—divides φ(m) and hence is ≤ (p − 1)(q − 1) < m.
Choose n with N := 2^n ≥ m² > r². The function f : ℕ → Z_m sending x ↦ b^x (mod m) is periodic with (least) period r. By Shor’s algorithm, one computes r efficiently. Since gcd(e, φ(m)) = 1 and r | φ(m), we have gcd(e, r) = 1, that is, using the extended gcd algorithm one obtains an integer d′ with d′e ≡ 1 (mod r). But then b^{d′} ≡ a^{d′e} ≡ a (mod m).
The private key d is the inverse of e modulo φ(m). It is not necessary to compute d for decrypting b. The inverse d′ of e modulo r = ord_m(a) = ord_m(b) suffices.
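A toy run of this attack (the small parameters are hypothetical, and the order r is found by brute force in place of Shor’s quantum step):

```python
p, q, e = 61, 53, 17             # toy RSA parameters
m = p * q                        # 3233
a = 1234                         # a plaintext with gcd(a, m) = 1
b = pow(a, e, m)                 # ciphertext

# Shor's algorithm would supply r = ord_m(b); here we find it classically.
r = next(t for t in range(1, m) if pow(b, t, m) == 1)

d_prime = pow(e, -1, r)          # e is invertible mod r, since r | phi(m)
recovered = pow(b, d_prime, m)   # b^{d'} = a^{d'e} = a (mod m)
print(recovered == a)            # decryption without knowing phi(m) or d
```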
Let m be a composite integer that we want to factor. Choose a non-zero integer a ∈ Z_m. If gcd(a, m) > 1, then we already know a non-trivial factor of m. So assume that gcd(a, m) = 1, that is, a ∈ Z_m*. Let r := ord_m(a).
As in the case of breaking RSA, choose n with N := 2^n ≥ m² > r². The function f : ℕ → Z_m, x ↦ a^x (mod m), is periodic with least period r. Shor’s algorithm computes r. If r is even, we can write:

(a^{r/2} − 1)(a^{r/2} + 1) ≡ 0 (mod m).

Since ord_m(a) = r, we have a^{r/2} − 1 ≢ 0 (mod m). If we also have a^{r/2} + 1 ≢ 0 (mod m), then gcd(a^{r/2} + 1, m) is a non-trivial factor of m. It can be shown that the probability of finding an even r with a^{r/2} + 1 ≢ 0 (mod m) is at least half (cf. Exercise 4.9). Thus, trying a few integers a, one can factor m.
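The same toy modulus as above illustrates this factoring reduction (the order r is again found by brute force in place of the quantum step; the base a = 3 is an arbitrary choice with gcd(a, m) = 1):

```python
from math import gcd

m, a = 3233, 3                   # toy composite (61 * 53) and a base coprime to m
r = next(t for t in range(1, m) if pow(a, t, m) == 1)   # Shor supplies this order

factor = None
if r % 2 == 0:
    s = pow(a, r // 2, m)        # a^{r/2}: a square root of 1 modulo m
    if s not in (1, m - 1):      # s != +-1 (mod m) yields a non-trivial factor
        factor = gcd(s + 1, m)
print(r, factor)                 # here r is even and a factor is found
```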
A variant of Shor’s algorithm in Section 8.4.1 can be used to compute discrete logarithms in the finite field F_{p^s}, p prime, s ≥ 1. For the sake of simplicity, let us concentrate only on prime fields (s = 1). Let g be a generator of Z_p*, and let our task be to compute, for a given a ∈ Z_p*, an integer r with a ≡ g^r (mod p). We assume that p is a large prime; in particular, p is odd.
Choose n with N := 2^n satisfying p < N < 2p. We use a 3n-bit quantum register A in which the left 2n bits constitute the input part and the right n bits the output part. The input part is initialized to the uniform superposition of all pairs (x, y), 0 ≤ x, y ≤ p − 2, that is, A has the initial state:

(1/(p − 1)) Σ_{x=0}^{p−2} Σ_{y=0}^{p−2} |x〉_n|y〉_n|0〉_n

(see Exercise 8.15).
(see Exercise 8.15). Then, we use an oracle
Uf : |x〉n|y〉n|z〉n ↦ |x〉n|y〉n|f(x, y) ⊕ z〉n
to compute the function f(x, y) := gxa–y (mod p) in the output register. Applying Uf transforms A to the state

Measurement of the output register now gives a value z ≡ g^k (mod p) for some k ∈ {0, 1, . . . , p − 2} and causes the input register to jump to the state

(1/√(p − 1)) Σ_{y=0}^{p−2} |(ry + k) rem (p − 1)〉_n|y〉_n.
Note that g^x a^{−y} ≡ g^k (mod p) if and only if x − ry ≡ k (mod p − 1), that is, only those pairs (x, y) that satisfy this congruence contribute to the post-measurement state. For each value of y modulo p − 1, we get a unique x ≡ ry + k (mod p − 1), that is, there are exactly p − 1 such pairs (x, y).
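The congruence x − ry ≡ k (mod p − 1), and the fact that two such pairs would already determine r classically, can be checked on a toy instance (p = 23, g = 5; the exponents below are hypothetical choices of ours):

```python
from math import gcd

p, g = 23, 5                   # small prime field; 5 generates Z_23*
r = 13                         # the hidden discrete logarithm
a = pow(g, r, p)

# Pairs (x, y) in the collapsed state satisfy x = ry + k (mod p - 1).
k, y1, y2 = 3, 4, 9
x1 = (r * y1 + k) % (p - 1)
x2 = (r * y2 + k) % (p - 1)
assert pow(g, x1, p) * pow(a, -y1, p) % p == pow(g, k, p)

# Two pairs eliminate k; r follows when y1 - y2 is invertible mod p - 1.
d = (y1 - y2) % (p - 1)
assert gcd(d, p - 1) == 1
recovered = pow(d, -1, p - 1) * (x1 - x2) % (p - 1)
print(recovered)               # equals r
```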
If we were allowed to make copies of this state and observe two copies separately, we would get pairs (x_1, y_1) and (x_2, y_2) with x_1 − ry_1 ≡ x_2 − ry_2 ≡ k (mod p − 1). Now, if gcd(y_1 − y_2, p − 1) = 1, we would get r ≡ (y_1 − y_2)^{−1}(x_1 − x_2) (mod p − 1). But we are not allowed to copy quantum states. So Shor used his old trick, that is, applied the Fourier transform F ⊗ F (Transform (8.2) on each n-bit half of the input register) to obtain the state

(1/(N√(p − 1))) Σ_{u=0}^{N−1} Σ_{v=0}^{N−1} ( Σ_{y=0}^{p−2} e^{2πi(u((ry+k) rem (p−1)) + vy)/N} ) |u〉_n|v〉_n.
A measurement of the input register at this state yields a pair (u, v), 0 ≤ u, v ≤ N − 1, with probability:

Equation 8.3

p_{u,v} = (1/((p − 1)N²)) | Σ_{y=0}^{p−2} e^{2πi(u((ry+k) rem (p−1)) + vy)/N} |².
As in Shor’s period-finding algorithm, we now need to identify a set of useful pairs (u, v) which are sufficiently many in number so as to make the probability of observing one of them bounded below by a positive constant. We also need to demonstrate how a useful pair reveals the unknown discrete logarithm r of a. The jugglery with inequalities and approximations is much more involved in this case. Let us still make a patient attempt to see the end of the story.
First, we eliminate one of x, y from Equation (8.3). Since x ≡ ry + k (mod p − 1) and 0 ≤ x ≤ p − 2, we have x = (ry + k) rem (p − 1), that is, x = ry + k − l_y(p − 1) with l_y := ⌊(ry + k)/(p − 1)⌋. Let j be the integer closest to u(p − 1)/N, that is, u(p − 1) = jN + ∊ with j ∈ ℕ, −N/2 < ∊ ≤ N/2. This yields

Equation 8.4

ux + vy = Sy + uk − l_y jN − l_y∊,

where

Equation 8.5

S := ur + v.

Since l_y j is an integer, substituting Equation (8.4) in Equation (8.3) gives

p_{u,v} = (1/((p − 1)N²)) | Σ_{y=0}^{p−2} e^{2πi(Sy − l_y∊)/N} |².

Writing S = lN + σ with −N/2 < σ ≤ N/2 then gives

p_{u,v} = (1/((p − 1)N²)) | Σ_{y=0}^{p−2} e^{2πi(σy − l_y∊)/N} |².
We now impose the usefulness conditions on u, v:

Equation 8.6

|σ − r∊/(p − 1)| ≤ 1/2,

Equation 8.7

|∊| ≤ N/12.

Involved calculations show that the probability p_{u,v} for a pair (u, v) satisfying these two conditions is at least a positive constant times (p − 1)/N². Let us now see how many pairs (u, v) satisfy the conditions. From Equation (8.5), it follows that for each u there exists a unique v such that Condition (8.6) is satisfied. Condition (8.7), on the other hand, involves only u. If w := v₂(p − 1) (the multiplicity of 2 in p − 1), then 2^w must divide ∊. For each multiple of 2^w not exceeding N/12 in absolute value, we get 2^w distinct solutions for u modulo N. (We are solving for u the congruence u(p − 1) ≡ ∊ (mod 2^n).) There is a total of at least N/12 of them. Therefore, the probability of making any one of the useful observations (u, v) is bounded below by a positive constant, since N < 2p.
We finally explain the extraction of r from a useful observation (u, v). Condition (8.6) and Equation (8.5) give |ur + v − lN − r∊/(p − 1)| ≤ 1/2. Dividing throughout by N and using the fact that u(p − 1) = jN + ∊, we get

| rj/(p − 1) + v/N − l | ≤ 1/(2N),

that is, the fractional part of rj/(p − 1) must lie between (N − v)/N − 1/(2N) and (N − v)/N + 1/(2N). The measurement of the input register gives us v, and we know N. We approximate (N − v)/N to the nearest multiple λ/(p − 1) of 1/(p − 1) and get rj ≡ λ (mod p − 1). Now, j, being the integer closest to u(p − 1)/N, is also known to us. If gcd(j, p − 1) = 1, we have r ≡ j^{−1}λ (mod p − 1). We do not go into the details of determining the likelihood of the invertibility of j modulo p − 1. A careful analysis shows that Shor’s quantum discrete-log algorithm runs in probabilistic polynomial time (in n).
| 8.13 | Let F be the Fourier transform (8.2). For basis vectors |x〉 and |x′〉, show that the inner product of F|x〉 and F|x′〉 equals 1 if x = x′, and 0 otherwise. Conclude that F is unitary. |
| 8.14 | Let N = 2^n, and let x, y ∈ {0, 1, . . . , N − 1} have binary expansions (x_{n−1} · · · x_1x_0)₂ and (y_{n−1} · · · y_1y_0)₂ respectively. |
| 8.15 | Let N := 2^n and let f : {0, 1}^n → {0, 1}. Consider an (n + 1)-bit quantum register with the input consisting of the left n bits and the output the rightmost bit. Suppose there is an oracle U_f that takes an n-bit input x and outputs the bit:
First prepare the register in the state |
| 8.16 | Recall that the Fourier transform (8.2) is defined for N equal to a power of 2. It turns out that for such values of N the quantum Fourier transform is easy to implement. For this exercise, assume hypothetically that one can efficiently implement F for other values of N too. In particular, take N = p − 1 in Shor’s quantum discrete-log algorithm. Show that in this case, the probability p_{u,v} of Equation (8.3) becomes:
Conclude that an outcome (u, v) of measuring the input register yields r ≡ −u^{−1}v (mod p − 1), provided gcd(u, p − 1) = 1. |
This chapter is a gentle introduction to the recent applications of quantum computation in public-key cryptography. These developments have both good and bad impacts on cryptology. It is still a big question whether a quantum computer can ever be manufactured. So, at present, the study of quantum cryptology is mostly theoretical in nature.
Quantum mechanics is governed by a set of four axioms that define a quantum mechanical system and prescribe its properties. A quantum bit (qubit) is a quantum mechanical system that has two orthogonal states |0〉 and |1〉. A quantum register is a collection of qubits of a fixed size.
As an example of what we can gain by using quantum algorithms, we first describe the Deutsch algorithm that determines whether a function f : {0, 1} → {0, 1} is constant by invoking f only once. A classical algorithm requires two invocations.
Next we present the BB84 algorithm for key exchange over a quantum mechanical channel. The algorithm guarantees perfect security. This algorithm has been implemented in hardware, and key agreement is carried out over a channel of length 67 km.
Finally, we describe Shor’s polynomial-time quantum algorithms for factoring integers and for computing discrete logarithms in finite fields. These algorithms are based on a technique called quantum Fourier transform.
If quantum computers can ever be realized, RSA and most other popular cryptosystems described and not described in this book will forfeit all security guarantees. And what will happen to this book? If you don’t possess a copy of this wonderful book, just rush to your nearest book store now—they have not yet mastered the quantum technology!
There was a time when the newspapers said that only twelve men understood the theory of relativity. I do not believe there ever was such a time . . . On the other hand, I think I can safely say that nobody understands quantum mechanics.
—Richard Feynman, The Character of Physical Law, BBC, 1965
Quantum mechanics came into existence when Werner Heisenberg, at the age of 25, proposed the uncertainty principle in 1927. It created an immediate stir in the physics community. Eventually Heisenberg and Niels Bohr came up with an interpretation of quantum mechanics, known as the Copenhagen interpretation. While many physicists (like Max Born, Wolfgang Pauli and John von Neumann) subscribed to this interpretation, many other eminent ones (including Albert Einstein, Erwin Schrödinger, Max Planck and Bertrand Russell) did not. Interested readers may consult the textbooks by Sakurai [255] and Schiff [258] to study this fascinating area of fundamental science.[3]
[3] Well! We are not physicists. These books are followed in graduate and advanced undergraduate courses in many institutes and universities.
For a comprehensive treatment of quantum computation (including cryptographic and cryptanalytic quantum algorithms), we refer the reader to the book by Nielsen and Chuang [218]. Mermin’s paper [197] and course notes [198] are also good sources for learning quantum mechanics and computation, and are suitable for computer scientists. Preskill’s course notes [244] are also useful, though a bit more physics-oriented. The very readable article [243] by Preskill on the realizability of quantum computers is also worth mentioning in this context. The first known quantum algorithm is due to Deutsch [75].
Bennett and Brassard’s quantum key-exchange algorithm (BB84) appeared in [20]. The implementation due to Stucki et al. of this algorithm is reported in [293].
Shor’s polynomial-time quantum factorization and discrete-log algorithms are described in [271]. All the details missing in Section 8.4.4 can be found in this paper. No polynomial-time quantum algorithms are known to solve the elliptic curve discrete logarithm problem. Proos and Zalka [245] present an extension of Shor’s algorithm for a special class of elliptic curves. See [146] for an adaptation of this algorithm applicable to fields of characteristic 2.
| A.1 | Introduction |
| A.2 | Block Ciphers |
| A.3 | Stream Ciphers |
| A.4 | Hash Functions |
Sour, sweet, bitter, pungent, all must be tasted.
—Chinese Proverb
Unless we change direction, we are likely to end up where we are going.
—Anonymous
Not everything that can be counted counts, and not everything that counts can be counted.
—Albert Einstein
Cryptography, today, cannot bank solely on public-key (that is, asymmetric) algorithms. Secret-key (that is, symmetric) techniques also have important roles to play. This chapter is an attempt to introduce the reader to some rudimentary notions of symmetric cryptography. The sketchy account that follows lacks both the depth and the breadth of a comprehensive treatment. Given the focus of this book, Appendix A could have been omitted. Nonetheless, some attention to symmetric technology is never irrelevant for a book on cryptology.
It remains debatable whether hash functions can be treated under the banner of this chapter—a hash function need not even use a key. If the reader is willing to accept symmetric as an abbreviation for not asymmetric, some justifications can perhaps be given. How does it matter anyway?
Block ciphers encrypt plaintext messages in blocks of fixed lengths and are more ubiquitously used than public-key encryption routines. In a sense, public-key encryption is also block encryption. Since public-key routines are much slower than (secret-key) block ciphers, it is customary to use public-key algorithms only in specific situations, for example, for encrypting single blocks of data, like keys of symmetric ciphers.
In the rest of this chapter, we use the word bit in the conventional sense, that is, to denote a quantity that can take only two possible values, 0 and 1. It is convenient to use the symbol B to refer to the set {0, 1}. We also let B^m stand for the set of all bit strings of length m. Whenever we plan to refer to the field (or group) structure of B (or B^m), we will use the alternative notation F_2 (or F_2^m).
|
A block cipher f of block-size n and of key-size r is a map f : B^n × B^r → B^n that encrypts a plaintext block m of bit length n to a ciphertext block c of bit length n under a key K, a bit string of length r. To ensure unique decryption, the map f_K := f(·, K) for a fixed key K has to be a permutation of (that is, a bijective function on) B^n.
|
A good block cipher has the following desirable properties:
The sizes n and r should be big enough, so that an adversary cannot exhaustively check all possibilities of m or K in feasible time.
For most, if not all, keys K, the permutations f_K should be sufficiently random. In other words, if the key K is not known, it should be computationally infeasible to guess the functions f_K and f_K^{−1}. That is, it should be difficult to guess c from m or m from c, unless the key K is provided. The identity map on B^n, though a permutation of B^n, is a bad candidate for an encryption function f_K. It is also desirable that the functions f_K for different values of K are unpredictably selected from the set of all permutations of B^n. Thus, for example, taking f_K to be a fixed permutation for all choices of K leads to a poor design of a block cipher f.
For most, if not all, pairs of distinct keys K_1 and K_2, the composition g_{K_1} ∘ g_{K_2} should not equal g_K for any key K, where g stands for f or f^{−1} with independent choices in the three uses. A more stringent demand is that the subgroup generated by the permutations f_K for all possible keys K should be a very big subset of the group of all permutations of B^n. If g_K = g_{K_1} ∘ g_{K_2} ∘ · · · ∘ g_{K_t} for some t ≥ 2, multiple encryption (see Section A.3) forfeits its expected benefits.
A block cipher provably possessing all these good characteristics (in particular, the randomness properties) is difficult to construct in practice. Practical block ciphers are manufactured for reasonably big n and r and come with the hope of representing reasonably unpredictable permutations. We dub a block cipher good or safe, if it stands the test of time. Table A.1 lists some widely used block ciphers.
| Name | n | r |
|---|---|---|
| DES (Data Encryption Standard) | 64 | 56 |
| FEAL (Fast Data Encipherment Algorithm) | 64 | 64 |
| SAFER (Secure And Fast Encryption Routine) | 64 | 64 |
| IDEA (International Data Encryption Algorithm) | 64 | 128 |
| Blowfish | 64 | ≤ 448 |
| Rijndael, accepted as AES (Advanced Encryption Standard) by NIST (National Institute of Standards and Technology, a US government organization) | 128/192/256 | 128/192/256 |
The data encryption standard (DES) was proposed as a federal information processing standard (FIPS) in 1975. DES has been the most popular and the most widely used among all block ciphers ever designed. Although its relatively small key-size offers questionable security under today’s computing power, DES still enjoys large-scale deployment in not-so-serious cryptographic applications.
DES encryption requires a 64-bit plaintext block m and a 56-bit key K.[1] Let us use the notations DES_K and DES_K^{−1} to stand respectively for the DES encryption and decryption functions under the key K.
[1] A DES key K = k_1k_2 . . . k_64 is actually a 64-bit string. Only 56 bits of K are used for encryption. The remaining 8 bits are used as parity-check bits. Specifically, for each i = 1, . . . , 8 the bit k_{8i} is adjusted so that the i-th byte (k_{8i−7}k_{8i−6} . . . k_{8i}) has an odd number of one-bits.
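A sketch of this parity adjustment (the function name and the convention that k_1 is the most significant bit are our own assumptions):

```python
def add_parity(key56):
    """Spread a 56-bit key over 8 bytes, choosing each eighth bit so that
    every byte (k_{8i-7} ... k_{8i}) has an odd number of one-bits."""
    out = []
    for i in range(8):
        seven = (key56 >> (49 - 7 * i)) & 0x7F      # the next 7 key bits
        byte = seven << 1                           # parity bit goes in position k_{8i}
        if bin(seven).count("1") % 2 == 0:          # make the 1-bit count odd
            byte |= 1
        out.append(byte)
    return bytes(out)

key = add_parity(0x00FFEEDDCCBBAA)                  # an arbitrary 56-bit value
print(all(bin(b).count("1") % 2 == 1 for b in key))
```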
The DES algorithm first computes sixteen 48-bit keys K1, K2, . . . , K16 from K using a procedure known as the DES key schedule described in Algorithm A.1. These 16 keys are used in the 16 rounds of encryption. The key schedule uses two fixed permutations PC1 and PC2 described after Algorithm A.1 and to be read in the row-major order. Here, PC is an abbreviation for permuted choice.
|
Input: A DES key K = k_1k_2 . . . k_64 (containing the parity-check bits). Output: Sixteen 48-bit round keys K_1, K_2, . . . , K_16. Steps: Use PC1 to select 56 bits of K and split them into two 28-bit halves C_0 and D_0; for i = 1, . . . , 16, obtain C_i and D_i by rotating C_{i−1} and D_{i−1} left by one or two positions (depending on the round), and apply PC2 to C_iD_i to obtain K_i. |
| PC1 | ||||||
|---|---|---|---|---|---|---|
| 57 | 49 | 41 | 33 | 25 | 17 | 9 |
| 1 | 58 | 50 | 42 | 34 | 26 | 18 |
| 10 | 2 | 59 | 51 | 43 | 35 | 27 |
| 19 | 11 | 3 | 60 | 52 | 44 | 36 |
| 63 | 55 | 47 | 39 | 31 | 23 | 15 |
| 7 | 62 | 54 | 46 | 38 | 30 | 22 |
| 14 | 6 | 61 | 53 | 45 | 37 | 29 |
| 21 | 13 | 5 | 28 | 20 | 12 | 4 |
| PC2 | |||||
|---|---|---|---|---|---|
| 14 | 17 | 11 | 24 | 1 | 5 |
| 3 | 28 | 15 | 6 | 21 | 10 |
| 23 | 19 | 12 | 4 | 26 | 8 |
| 16 | 7 | 27 | 20 | 13 | 2 |
| 41 | 52 | 31 | 37 | 47 | 55 |
| 30 | 40 | 51 | 45 | 33 | 48 |
| 44 | 49 | 39 | 56 | 34 | 53 |
| 46 | 42 | 50 | 36 | 29 | 32 |
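Algorithm A.1 can be sketched as follows. The PC1 and PC2 tables are exactly those above; the per-round left-rotation amounts (1, 1, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 1) are taken from the DES standard, since the full step list of Algorithm A.1 is not reproduced here:

```python
PC1 = [57,49,41,33,25,17,9, 1,58,50,42,34,26,18,
       10,2,59,51,43,35,27, 19,11,3,60,52,44,36,
       63,55,47,39,31,23,15, 7,62,54,46,38,30,22,
       14,6,61,53,45,37,29, 21,13,5,28,20,12,4]
PC2 = [14,17,11,24,1,5, 3,28,15,6,21,10,
       23,19,12,4,26,8, 16,7,27,20,13,2,
       41,52,31,37,47,55, 30,40,51,45,33,48,
       44,49,39,56,34,53, 46,42,50,36,29,32]
SHIFTS = [1,1,2,2,2,2,2,2,1,2,2,2,2,2,2,1]   # from the DES standard

def bits_of(x, width):
    """Big-endian bit list: bit 1 is the most significant."""
    return [(x >> (width - 1 - i)) & 1 for i in range(width)]

def key_schedule(key64):
    """Return the sixteen 48-bit DES round keys K1..K16 as integers."""
    k = bits_of(key64, 64)
    cd = [k[i - 1] for i in PC1]             # 56 bits; parity bits dropped
    c, d = cd[:28], cd[28:]
    round_keys = []
    for s in SHIFTS:
        c, d = c[s:] + c[:s], d[s:] + d[:s]  # rotate each 28-bit half left by s
        ki = [(c + d)[i - 1] for i in PC2]
        round_keys.append(int("".join(map(str, ki)), 2))
    return round_keys

ks = key_schedule(0x133457799BBCDFF1)        # the classic worked-example key
print(hex(ks[0]))                            # → 0x1b02effc7072
```

The final line checks the schedule against the widely circulated worked example for the key 0x133457799BBCDFF1, whose first round key K_1 is 000110 110000 001011 101111 111111 000111 000001 110010.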
DES encryption, as described in Algorithm A.2, proceeds in 16 rounds. The i-th round uses the key Ki (obtained from the key schedule) in tandem with the encryption primitive e. A fixed permutation IP and its inverse IP–1 are also used.[2]
[2] A block cipher that executes several encryption rounds with the i-th round computing the two halves as L_i := R_{i−1} and R_i := L_{i−1} ⊕ e(R_{i−1}, K_i) for some round key K_i and for some encryption primitive e is called a Feistel cipher. Most of the popular block ciphers mentioned earlier are of this type. Rijndael is an exception, and its acceptance as the new standard has been interpreted as an end of the Feistel dynasty.
It requires a specification of the round encryption function e to complete the description of DES encryption. The function e can be compactly depicted as:
e(X, J) := P(S(E(X) ⊕ J)),
|
Input: Plaintext block m = m_1m_2 . . . m_64 and the round keys K_1, . . . , K_16. Output: The ciphertext block c = DES_K(m). Steps: Apply the initial permutation IP on m and split the result into the halves L_0R_0; for i = 1, . . . , 16, compute L_i := R_{i−1} and R_i := L_{i−1} ⊕ e(R_{i−1}, K_i); output c := IP^{−1}(R_16L_16). |
| IP | |||||||
|---|---|---|---|---|---|---|---|
| 58 | 50 | 42 | 34 | 26 | 18 | 10 | 2 |
| 60 | 52 | 44 | 36 | 28 | 20 | 12 | 4 |
| 62 | 54 | 46 | 38 | 30 | 22 | 14 | 6 |
| 64 | 56 | 48 | 40 | 32 | 24 | 16 | 8 |
| 57 | 49 | 41 | 33 | 25 | 17 | 9 | 1 |
| 59 | 51 | 43 | 35 | 27 | 19 | 11 | 3 |
| 61 | 53 | 45 | 37 | 29 | 21 | 13 | 5 |
| 63 | 55 | 47 | 39 | 31 | 23 | 15 | 7 |
| IP–1 | |||||||
|---|---|---|---|---|---|---|---|
| 40 | 8 | 48 | 16 | 56 | 24 | 64 | 32 |
| 39 | 7 | 47 | 15 | 55 | 23 | 63 | 31 |
| 38 | 6 | 46 | 14 | 54 | 22 | 62 | 30 |
| 37 | 5 | 45 | 13 | 53 | 21 | 61 | 29 |
| 36 | 4 | 44 | 12 | 52 | 20 | 60 | 28 |
| 35 | 3 | 43 | 11 | 51 | 19 | 59 | 27 |
| 34 | 2 | 42 | 10 | 50 | 18 | 58 | 26 |
| 33 | 1 | 41 | 9 | 49 | 17 | 57 | 25 |
where E : B^32 → B^48 is an expansion function, S : B^48 → B^32 is a contraction function and P is a fixed permutation of B^32 (called the permutation function). S uses eight S-boxes (substitution boxes) S_1, S_2, . . . , S_8. Each S-box S_j is a 4 × 16 matrix with each row a permutation of 0, 1, 2, . . . , 15 and is used to convert a 6-bit string y_1y_2y_3y_4y_5y_6 to a 4-bit string z_1z_2z_3z_4 as follows. Let μ denote the integer with binary representation y_1y_6 and ν the integer with binary representation y_2y_3y_4y_5. Then, z_1z_2z_3z_4 is the 4-bit binary representation of the (μ, ν)-th entry in the matrix S_j. (Here, the numbering of the rows and columns starts from 0.) In this case, we write S_j(y_1y_2y_3y_4y_5y_6) = z_1z_2z_3z_4. Algorithm A.3 provides the description of e.
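The row/column indexing can be sketched with S_1 (tabulated below); the classic example S_1(011011) selects row 01 = 1 and column 1101 = 13:

```python
# S1 exactly as tabulated below; the input y1...y6 is packed as a 6-bit integer.
S1 = [[14, 4,13, 1, 2,15,11, 8, 3,10, 6,12, 5, 9, 0, 7],
      [ 0,15, 7, 4,14, 2,13, 1,10, 6,12,11, 9, 5, 3, 8],
      [ 4, 1,14, 8,13, 6, 2,11,15,12, 9, 7, 3,10, 5, 0],
      [15,12, 8, 2, 4, 9, 1, 7, 5,11, 3,14,10, 0, 6,13]]

def sbox(S, y):
    """Row index mu = (y1 y6)_2, column index nu = (y2 y3 y4 y5)_2."""
    mu = ((y >> 5) & 1) << 1 | (y & 1)
    nu = (y >> 1) & 0xF
    return S[mu][nu]

out = format(sbox(S1, 0b011011), "04b")
print(out)                      # S1(011011) = 0101
```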
|
Input: Output: e(X, J). Steps: Y := E(X) ⊕ J (where E(x1x2 . . . x32) = x32x1x2 . . . x32x1). |
The tables for E and P are as follows.
| E | |||||
|---|---|---|---|---|---|
| 32 | 1 | 2 | 3 | 4 | 5 |
| 4 | 5 | 6 | 7 | 8 | 9 |
| 8 | 9 | 10 | 11 | 12 | 13 |
| 12 | 13 | 14 | 15 | 16 | 17 |
| 16 | 17 | 18 | 19 | 20 | 21 |
| 20 | 21 | 22 | 23 | 24 | 25 |
| 24 | 25 | 26 | 27 | 28 | 29 |
| 28 | 29 | 30 | 31 | 32 | 1 |
| P | |||
|---|---|---|---|
| 16 | 7 | 20 | 21 |
| 29 | 12 | 28 | 17 |
| 1 | 15 | 23 | 26 |
| 5 | 18 | 31 | 10 |
| 2 | 8 | 24 | 14 |
| 32 | 27 | 3 | 9 |
| 19 | 13 | 30 | 6 |
| 22 | 11 | 4 | 25 |
Finally, the eight S-boxes are presented:
| S1 | |||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 14 | 4 | 13 | 1 | 2 | 15 | 11 | 8 | 3 | 10 | 6 | 12 | 5 | 9 | 0 | 7 |
| 0 | 15 | 7 | 4 | 14 | 2 | 13 | 1 | 10 | 6 | 12 | 11 | 9 | 5 | 3 | 8 |
| 4 | 1 | 14 | 8 | 13 | 6 | 2 | 11 | 15 | 12 | 9 | 7 | 3 | 10 | 5 | 0 |
| 15 | 12 | 8 | 2 | 4 | 9 | 1 | 7 | 5 | 11 | 3 | 14 | 10 | 0 | 6 | 13 |
| S2 | |||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 15 | 1 | 8 | 14 | 6 | 11 | 3 | 4 | 9 | 7 | 2 | 13 | 12 | 0 | 5 | 10 |
| 3 | 13 | 4 | 7 | 15 | 2 | 8 | 14 | 12 | 0 | 1 | 10 | 6 | 9 | 11 | 5 |
| 0 | 14 | 7 | 11 | 10 | 4 | 13 | 1 | 5 | 8 | 12 | 6 | 9 | 3 | 2 | 15 |
| 13 | 8 | 10 | 1 | 3 | 15 | 4 | 2 | 11 | 6 | 7 | 12 | 0 | 5 | 14 | 9 |
| S3 | |||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 10 | 0 | 9 | 14 | 6 | 3 | 15 | 5 | 1 | 13 | 12 | 7 | 11 | 4 | 2 | 8 |
| 13 | 7 | 0 | 9 | 3 | 4 | 6 | 10 | 2 | 8 | 5 | 14 | 12 | 11 | 15 | 1 |
| 13 | 6 | 4 | 9 | 8 | 15 | 3 | 0 | 11 | 1 | 2 | 12 | 5 | 10 | 14 | 7 |
| 1 | 10 | 13 | 0 | 6 | 9 | 8 | 7 | 4 | 15 | 14 | 3 | 11 | 5 | 2 | 12 |
| S4 | |||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 7 | 13 | 14 | 3 | 0 | 6 | 9 | 10 | 1 | 2 | 8 | 5 | 11 | 12 | 4 | 15 |
| 13 | 8 | 11 | 5 | 6 | 15 | 0 | 3 | 4 | 7 | 2 | 12 | 1 | 10 | 14 | 9 |
| 10 | 6 | 9 | 0 | 12 | 11 | 7 | 13 | 15 | 1 | 3 | 14 | 5 | 2 | 8 | 4 |
| 3 | 15 | 0 | 6 | 10 | 1 | 13 | 8 | 9 | 4 | 5 | 11 | 12 | 7 | 2 | 14 |
| S5 | |||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | 12 | 4 | 1 | 7 | 10 | 11 | 6 | 8 | 5 | 3 | 15 | 13 | 0 | 14 | 9 |
| 14 | 11 | 2 | 12 | 4 | 7 | 13 | 1 | 5 | 0 | 15 | 10 | 3 | 9 | 8 | 6 |
| 4 | 2 | 1 | 11 | 10 | 13 | 7 | 8 | 15 | 9 | 12 | 5 | 6 | 3 | 0 | 14 |
| 11 | 8 | 12 | 7 | 1 | 14 | 2 | 13 | 6 | 15 | 0 | 9 | 10 | 4 | 5 | 3 |
| S6 | |||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 12 | 1 | 10 | 15 | 9 | 2 | 6 | 8 | 0 | 13 | 3 | 4 | 14 | 7 | 5 | 11 |
| 10 | 15 | 4 | 2 | 7 | 12 | 9 | 5 | 6 | 1 | 13 | 14 | 0 | 11 | 3 | 8 |
| 9 | 14 | 15 | 5 | 2 | 8 | 12 | 3 | 7 | 0 | 4 | 10 | 1 | 13 | 11 | 6 |
| 4 | 3 | 2 | 12 | 9 | 5 | 15 | 10 | 11 | 14 | 1 | 7 | 6 | 0 | 8 | 13 |
| S7 | |||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4 | 11 | 2 | 14 | 15 | 0 | 8 | 13 | 3 | 12 | 9 | 7 | 5 | 10 | 6 | 1 |
| 13 | 0 | 11 | 7 | 4 | 9 | 1 | 10 | 14 | 3 | 5 | 12 | 2 | 15 | 8 | 6 |
| 1 | 4 | 11 | 13 | 12 | 3 | 7 | 14 | 10 | 15 | 6 | 8 | 0 | 5 | 9 | 2 |
| 6 | 11 | 13 | 8 | 1 | 4 | 10 | 7 | 9 | 5 | 0 | 15 | 14 | 2 | 3 | 12 |
| S8 | |||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 13 | 2 | 8 | 4 | 6 | 15 | 11 | 1 | 10 | 9 | 3 | 14 | 5 | 0 | 12 | 7 |
| 1 | 15 | 13 | 8 | 10 | 3 | 7 | 4 | 12 | 5 | 6 | 11 | 0 | 14 | 9 | 2 |
| 7 | 11 | 4 | 1 | 9 | 12 | 14 | 2 | 0 | 6 | 10 | 13 | 15 | 3 | 5 | 8 |
| 2 | 1 | 14 | 7 | 4 | 10 | 8 | 13 | 15 | 12 | 9 | 0 | 3 | 5 | 6 | 11 |
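As a concrete illustration of the S-box lookup described above (the book itself uses language-neutral pseudocode; Python is used here as a sketch), the following applies S1, copied from the table above, to a 6-bit string:

```python
# S1 from the table above: four rows, each a permutation of 0..15.
S1 = [
    [14, 4, 13, 1, 2, 15, 11, 8, 3, 10, 6, 12, 5, 9, 0, 7],
    [0, 15, 7, 4, 14, 2, 13, 1, 10, 6, 12, 11, 9, 5, 3, 8],
    [4, 1, 14, 8, 13, 6, 2, 11, 15, 12, 9, 7, 3, 10, 5, 0],
    [15, 12, 8, 2, 4, 9, 1, 7, 5, 11, 3, 14, 10, 0, 6, 13],
]

def sbox_lookup(sbox, y):
    """Apply a DES S-box to a 6-bit string y = y1y2y3y4y5y6 (e.g. '011011')."""
    mu = int(y[0] + y[5], 2)   # row index: bits y1 y6
    nu = int(y[1:5], 2)        # column index: bits y2 y3 y4 y5
    return format(sbox[mu][nu], '04b')
```

For example, the input 011011 selects row 01 = 1 and column 1101 = 13, whose entry in S1 is 5, giving the output 0101.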
DES decryption is analogous to DES encryption. To obtain the plaintext block m back from the ciphertext block c,
one first computes the round keys K1, K2, . . . , K16 using Algorithm A.1. One then calls a minor variant of Algorithm A.2. First, the roles of m and c are interchanged. That is, one inputs c instead of m, and obtains m in place of c as output. Moreover, the right half Ri in the i-th round is computed as Ri := Li – 1 ⊕ e(Ri – 1, K17 – i). In other words, DES decryption is the same as DES encryption, only with the sequence of using the keys K1, K2, . . . , K16 reversed. Solve Exercise A.1 in order to establish the correctness of this decryption procedure.
Some test vectors for DES are given in Table A.2.
| Key | Plaintext block | Ciphertext block |
|---|---|---|
| 0101010101010101 | 0000000000000000 | 8ca64de9c1b123a7 |
| fefefefefefefefe | ffffffffffffffff | 7359b2163e4edc58 |
| 3101010101010101 | 1000000000000001 | 958e6e627a05557b |
| 1010101010101010 | 1111111111111111 | f40379ab9e0ec533 |
| 0123456789abcdef | 1111111111111111 | 17668dfc7292532d |
| 1010101010101010 | 0123456789abcdef | 8a5ae1f81ab8f2dd |
| fedcba9876543210 | 0123456789abcdef | ed39d950fa74bcc4 |
DES, being a popular block cipher, has gone through a good amount of cryptanalytic study. At present, linear cryptanalysis and differential cryptanalysis are the most sophisticated attacks on DES. But the biggest problem with DES is its relatively small key size (56 bits). An exhaustive key search for a given plaintext–ciphertext pair needs to carry out a maximum of 2^56 encryptions in order to obtain the correct key. But how big is this number 2^56 = 72,057,594,037,927,936 (nearly 72 quadrillion) in a cryptographic sense?
In order to review this question, RSA Security Inc. posed several challenges for obtaining the DES key from given plaintext–ciphertext pairs. The first challenge, posed in January 1997, was broken by Rocke Verser of Loveland, Colorado, with approximately 96 days of computing. DES Challenge II-1 was broken in February 1998 by distributed.net with 41 days of computing, and DES Challenge II-2 was cracked in July 1998 by the Electronic Frontier Foundation (EFF) in just 56 hours. Finally, DES Challenge III was broken in a record 22 hours 15 minutes in January 1999. The computations were carried out on EFF’s special-purpose machine Deep Crack with collaborative efforts from nearly 10^5 PCs on the Internet coordinated by distributed.net. These figures demonstrate that DES offers hardly any security against a motivated adversary.
Another problem with DES is that its design criteria (most importantly, the objectives behind choosing the particular S-boxes) were never made public. Chances remain that there are hidden backdoors, though none has been discovered to date.
The advanced encryption standard (AES) [219] has superseded the older standard DES. The Rijndael cipher designed by Daemen and Rijmen has been accepted as the advanced standard. As mentioned in Footnote 2, Rijndael is not a Feistel cipher. Its working is based on the arithmetic in the finite field F_{2^8} of 256 elements and in the finite ring A := F_{2^8}[Y]/⟨Y^4 + 1⟩.
AES encrypts data in blocks of 128 bits. Let B = b0b1 . . . b127 be a block of data, where each bi is a bit. Keeping in view typical 32-bit processors, each such block B is represented as a sequence of four 32-bit words, that is, B = B0B1B2B3, where Bi represents the bit string b32ib32i+1 . . . b32i+31. Each word C = c0c1 . . . c31, in turn, is viewed as a sequence of four octets, that is, C = C0C1C2C3, where Ci stores the bit string c8ic8i+1 . . . c8i+7. Each octet is identified as an element of F_{2^8}, whereas an entire 32-bit word is identified with an element of the ring A = F_{2^8}[Y]/⟨Y^4 + 1⟩.
The field F_{2^8} is represented as F_2[X]/⟨f(X)⟩, where f(X) is the irreducible polynomial X^8 + X^4 + X^3 + X + 1. Let x := X + ⟨f(X)⟩. The element d7x^7 + d6x^6 + · · · + d1x + d0 is identified with the octet d7d6 . . . d1d0. Thus, the i-th octet c8ic8i+1 . . . c8i+7 in a word is treated as the finite field element c8i x^7 + c8i+1 x^6 + · · · + c8i+6 x + c8i+7.
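With this polynomial representation in place, the multiplication ⊙ of F_{2^8} can be sketched in Python by the standard shift-and-reduce method, with the reduction polynomial f(X) = X^8 + X^4 + X^3 + X + 1 written as the bit mask 0x11b (the function name gmul is ours, not the book's):

```python
def gmul(a, b):
    """Multiply two octets as elements of F_{2^8} = F_2[X]/<X^8 + X^4 + X^3 + X + 1>."""
    result = 0
    while b:
        if b & 1:
            result ^= a      # add (XOR) the current multiple of a
        a <<= 1
        if a & 0x100:
            a ^= 0x11b       # reduce modulo X^8 + X^4 + X^3 + X + 1
        b >>= 1
    return result
```

For instance, gmul(0x57, 0x83) evaluates to 0xc1, and multiplying by 0x02 realizes multiplication by x.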
Now, let us explain the interpretation of a 32-bit word C = C0C1C2C3. The F_{2^8}-algebra A = F_{2^8}[Y]/⟨Y^4 + 1⟩ is not a field, since the polynomial Y^4 + 1 is reducible (over F_2 and so over F_{2^8}). However, each element β of A can be uniquely expressed as a polynomial β = α3y^3 + α2y^2 + α1y + α0, where y := Y + ⟨Y^4 + 1⟩ and where each αi is an element of F_{2^8}. As described in the last paragraph, each αi is represented as an octet. We take Ci to be the octet representing α3 – i, that is, the 32-bit word α3α2α1α0 stands for the element α3y^3 + α2y^2 + α1y + α0.
Both F_{2^8} and A are rings and hence equipped with arithmetic operations (addition and multiplication). These operations are different from the usual addition and multiplication operations defined on octets and words. For example, the addition of two octets or words under the AES interpretation is the same as the bit-wise XOR of octets or words. The AES multiplication of octets and words, on the other hand, involves polynomial arithmetic and reduction modulo the defining polynomials and so cannot be expressed as simply as addition. To resolve ambiguities, let us denote the multiplication of F_{2^8} by ⊙ and that of A by ⊗, whereas regular multiplication symbols (·, × and juxtaposition) stand for the standard multiplication on octets or words. Exercises A.5, A.6 and A.7 discuss efficient implementations of the arithmetic in F_{2^8} and A.
Every non-zero element α ∈ F_{2^8} is invertible; the inverse is denoted by α^(–1) and can be computed by the extended gcd algorithm on polynomials over F_2. With an abuse of notation, we take 0^(–1) := 0. Not every non-zero element of A is invertible (under the multiplication ⊗ of A). The AES algorithm uses the following invertible element β := 03010102 (in hex notation); its inverse is β^(–1) = 0b0d090e.
The AES algorithm uses an object called a state, comprising 16 octets arranged in a 4 × 4 array. Each message block also consists of 16 octets. Let M = μ0μ1 . . . μ15 be a message block (of 16 octets). This block is translated to a state as follows:
Equation A.1
sr,c := μr+4c for 0 ≤ r ≤ 3, 0 ≤ c ≤ 3,
where sr,c denotes the octet in the r-th row and c-th column of the state.
Thus, each word in the block is relocated in a column of the state. At the end of the encryption procedure, AES makes the reverse translation of a state to a block:
Equation A.2
γr+4c := sr,c for 0 ≤ r ≤ 3, 0 ≤ c ≤ 3.
A collection of round keys is generated from the given AES key K. The number of rounds of the AES encryption algorithm depends on the size of the key. Let us denote the number of words in the AES key by Nk and the corresponding number of rounds by Nr. We have:

| Key size (bits) | Nk | Nr |
|---|---|---|
| 128 | 4 | 10 |
| 192 | 6 | 12 |
| 256 | 8 | 14 |
One first generates an initial 128-bit key K0K1K2K3. Subsequently, for the i-th round, 1 ≤ i ≤ Nr, a 128-bit key K4iK4i+1K4i+2K4i+3 is required. Here, each Kj is a 32-bit word. The key schedule (also called key expansion) generates a total of 4(Nr + 1) words K0, K1, . . . , K4Nr+3 from the given secret key K using a procedure described in Algorithm A.4. Here, (02)^(j – 1) stands for the octet that represents the element x^(j – 1) of F_{2^8}. The following table summarizes these values for j = 1, 2, . . . , 15.
| j | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
| x^(j – 1) | 01 | 02 | 04 | 08 | 10 | 20 | 40 | 80 | 1b | 36 | 6c | d8 | ab | 4d | 9a |
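The entries of this table can be regenerated by repeated doubling in F_{2^8}; the helper name xtime below is the conventional name for multiplication by the octet 02 (a Python sketch):

```python
def xtime(a):
    """Multiply an octet by 02 (i.e., by x) in F_{2^8}."""
    a <<= 1
    if a & 0x100:
        a ^= 0x11b   # reduce modulo X^8 + X^4 + X^3 + X + 1
    return a

rc = [0x01]              # x^0
for _ in range(14):
    rc.append(xtime(rc[-1]))   # x^1, x^2, ..., x^14
```

The reduction kicks in for the first time at x^8, producing the octet 1b rather than wrapping around to 00.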
The transformation SubWord on a word T = τ0τ1τ2τ3 is the octet-wise application of AES S-box substitution SubOctet, that is,
SubWord(T) = SubOctet(τ0) ‖ SubOctet(τ1) ‖ SubOctet(τ2) ‖ SubOctet(τ3).
Algorithm A.4. The AES key schedule (key expansion)
Input: (Nk and) the secret key K = κ0κ1 . . . κ4Nk – 1, where each κi is an octet.
Output: The expanded keys K0, K1, . . . , K4Nr+3.
Steps:
/* Initially copy the bytes of K */
for j = 0, 1, . . . , Nk – 1 { Kj := κ4jκ4j+1κ4j+2κ4j+3 }
/* Generate the remaining words */
for j = Nk, Nk + 1, . . . , 4Nr + 3 {
T := Kj – 1.
if (j ≡ 0 (mod Nk)) { T := SubWord(RotWord(T)) ⊕ ((02)^(j/Nk – 1) ‖ 00 ‖ 00 ‖ 00) }
else if (Nk > 6 and j ≡ 4 (mod Nk)) { T := SubWord(T) }
Kj := Kj – Nk ⊕ T.
}
Here, RotWord(τ0τ1τ2τ3) := τ1τ2τ3τ0 denotes a cyclic left rotation by one octet.
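A Python sketch of the expansion for Nk = 4 (AES-128), following the standard procedure above; the S-box table is Table A.3 flattened row-major, and the function names (sub_word, rot_word, expand_key_128) are ours, not the book's:

```python
SBOX = [  # Table A.3, read row-major
0x63,0x7c,0x77,0x7b,0xf2,0x6b,0x6f,0xc5,0x30,0x01,0x67,0x2b,0xfe,0xd7,0xab,0x76,
0xca,0x82,0xc9,0x7d,0xfa,0x59,0x47,0xf0,0xad,0xd4,0xa2,0xaf,0x9c,0xa4,0x72,0xc0,
0xb7,0xfd,0x93,0x26,0x36,0x3f,0xf7,0xcc,0x34,0xa5,0xe5,0xf1,0x71,0xd8,0x31,0x15,
0x04,0xc7,0x23,0xc3,0x18,0x96,0x05,0x9a,0x07,0x12,0x80,0xe2,0xeb,0x27,0xb2,0x75,
0x09,0x83,0x2c,0x1a,0x1b,0x6e,0x5a,0xa0,0x52,0x3b,0xd6,0xb3,0x29,0xe3,0x2f,0x84,
0x53,0xd1,0x00,0xed,0x20,0xfc,0xb1,0x5b,0x6a,0xcb,0xbe,0x39,0x4a,0x4c,0x58,0xcf,
0xd0,0xef,0xaa,0xfb,0x43,0x4d,0x33,0x85,0x45,0xf9,0x02,0x7f,0x50,0x3c,0x9f,0xa8,
0x51,0xa3,0x40,0x8f,0x92,0x9d,0x38,0xf5,0xbc,0xb6,0xda,0x21,0x10,0xff,0xf3,0xd2,
0xcd,0x0c,0x13,0xec,0x5f,0x97,0x44,0x17,0xc4,0xa7,0x7e,0x3d,0x64,0x5d,0x19,0x73,
0x60,0x81,0x4f,0xdc,0x22,0x2a,0x90,0x88,0x46,0xee,0xb8,0x14,0xde,0x5e,0x0b,0xdb,
0xe0,0x32,0x3a,0x0a,0x49,0x06,0x24,0x5c,0xc2,0xd3,0xac,0x62,0x91,0x95,0xe4,0x79,
0xe7,0xc8,0x37,0x6d,0x8d,0xd5,0x4e,0xa9,0x6c,0x56,0xf4,0xea,0x65,0x7a,0xae,0x08,
0xba,0x78,0x25,0x2e,0x1c,0xa6,0xb4,0xc6,0xe8,0xdd,0x74,0x1f,0x4b,0xbd,0x8b,0x8a,
0x70,0x3e,0xb5,0x66,0x48,0x03,0xf6,0x0e,0x61,0x35,0x57,0xb9,0x86,0xc1,0x1d,0x9e,
0xe1,0xf8,0x98,0x11,0x69,0xd9,0x8e,0x94,0x9b,0x1e,0x87,0xe9,0xce,0x55,0x28,0xdf,
0x8c,0xa1,0x89,0x0d,0xbf,0xe6,0x42,0x68,0x41,0x99,0x2d,0x0f,0xb0,0x54,0xbb,0x16,
]

def sub_word(w):
    """Apply SubOctet to each octet of a 32-bit word."""
    return ((SBOX[(w >> 24) & 0xff] << 24) | (SBOX[(w >> 16) & 0xff] << 16) |
            (SBOX[(w >> 8) & 0xff] << 8) | SBOX[w & 0xff])

def rot_word(w):
    """Cyclic left rotation of a 32-bit word by one octet."""
    return ((w << 8) | (w >> 24)) & 0xffffffff

def expand_key_128(key):
    """AES-128 key schedule: a 16-byte key -> 44 words K0..K43."""
    w = [int.from_bytes(key[4 * i:4 * i + 4], 'big') for i in range(4)]
    rcon = 0x01                       # the octet (02)^(j/Nk - 1)
    for j in range(4, 44):
        t = w[j - 1]
        if j % 4 == 0:
            t = sub_word(rot_word(t)) ^ (rcon << 24)
            rcon = (rcon << 1) ^ (0x11b if rcon & 0x80 else 0)  # next power of x
        w.append(w[j - 4] ^ t)
    return w

w = expand_key_128(bytes(range(16)))  # key 000102...0f
```

With the key 000102030405060708090a0b0c0d0e0f, the first generated word K4 works out to d6aa74fd: RotWord(0c0d0e0f) = 0d0e0f0c, SubWord of that is d7ab76fe, XORing the round constant 01000000 gives d6ab76fe, and XORing K0 = 00010203 gives d6aa74fd.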
The transformation SubOctet is also used in each encryption round and is now described. Let A = a0a1 . . . a7 be an octet that can be identified with an element of F_{2^8} as mentioned earlier. Let B = b0b1 . . . b7 denote the octet representing the inverse of this finite field element. (We take 0^(–1) = 0.) One then applies the following affine transformation on B to generate the final value C := SubOctet(A) := c0c1 . . . c7. Here, D = d0d1 . . . d7 is the constant octet 63 = 01100011.
Equation A.3
ci = bi ⊕ b(i+1) mod 8 ⊕ b(i+2) mod 8 ⊕ b(i+3) mod 8 ⊕ b(i+4) mod 8 ⊕ di for i = 0, 1, . . . , 7.
In order to speed up this octet substitution, one may use table lookup. Since the output octet C depends only on the input octet A, one can precompute a table of values of SubOctet(A) for the 256 possible values of A. This list is given in Table A.3. The table is to be read in the row-major fashion. In other words, if hi and lo respectively represent the most and the least significant four bits of A, then SubOctet(A) can be read off from the entry in the table having row number hi and column number lo. For example, SubOctet(a7) = 5c. In an actual implementation, a one-dimensional array is to be used. We use a two-dimensional format in Table A.3 for the sake of clarity of presentation.
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | a | b | c | d | e | f | |
| 0 | 63 | 7c | 77 | 7b | f2 | 6b | 6f | c5 | 30 | 01 | 67 | 2b | fe | d7 | ab | 76 |
| 1 | ca | 82 | c9 | 7d | fa | 59 | 47 | f0 | ad | d4 | a2 | af | 9c | a4 | 72 | c0 |
| 2 | b7 | fd | 93 | 26 | 36 | 3f | f7 | cc | 34 | a5 | e5 | f1 | 71 | d8 | 31 | 15 |
| 3 | 04 | c7 | 23 | c3 | 18 | 96 | 05 | 9a | 07 | 12 | 80 | e2 | eb | 27 | b2 | 75 |
| 4 | 09 | 83 | 2c | 1a | 1b | 6e | 5a | a0 | 52 | 3b | d6 | b3 | 29 | e3 | 2f | 84 |
| 5 | 53 | d1 | 00 | ed | 20 | fc | b1 | 5b | 6a | cb | be | 39 | 4a | 4c | 58 | cf |
| 6 | d0 | ef | aa | fb | 43 | 4d | 33 | 85 | 45 | f9 | 02 | 7f | 50 | 3c | 9f | a8 |
| 7 | 51 | a3 | 40 | 8f | 92 | 9d | 38 | f5 | bc | b6 | da | 21 | 10 | ff | f3 | d2 |
| 8 | cd | 0c | 13 | ec | 5f | 97 | 44 | 17 | c4 | a7 | 7e | 3d | 64 | 5d | 19 | 73 |
| 9 | 60 | 81 | 4f | dc | 22 | 2a | 90 | 88 | 46 | ee | b8 | 14 | de | 5e | 0b | db |
| a | e0 | 32 | 3a | 0a | 49 | 06 | 24 | 5c | c2 | d3 | ac | 62 | 91 | 95 | e4 | 79 |
| b | e7 | c8 | 37 | 6d | 8d | d5 | 4e | a9 | 6c | 56 | f4 | ea | 65 | 7a | ae | 08 |
| c | ba | 78 | 25 | 2e | 1c | a6 | b4 | c6 | e8 | dd | 74 | 1f | 4b | bd | 8b | 8a |
| d | 70 | 3e | b5 | 66 | 48 | 03 | f6 | 0e | 61 | 35 | 57 | b9 | 86 | c1 | 1d | 9e |
| e | e1 | f8 | 98 | 11 | 69 | d9 | 8e | 94 | 9b | 1e | 87 | e9 | ce | 55 | 28 | df |
| f | 8c | a1 | 89 | 0d | bf | e6 | 42 | 68 | 41 | 99 | 2d | 0f | b0 | 54 | bb | 16 |
AES encryption is described in Algorithm A.5. The algorithm first converts the input plaintext message block to a state, applies a series of transformations on this state and finally converts the state back to a message (the ciphertext).
The individual state transition transformations are now explained. The transition SubState is an octet-by-octet application of the substitution function SubOctet, that is, SubState maps
s′r,c = SubOctet(sr,c) for 0 ≤ r, c ≤ 3.
The transform ShiftRows cyclically left rotates the r-th row by r byte positions, that is, maps
s′r,c = sr,(c+r) mod 4 for 0 ≤ r, c ≤ 3.
The AddKey operation uses four 32-bit round keys L0, L1, L2, L3. Name the octets of Li as λi0λi1λi2λi3. The i-th key Li is XORed with the i-th column of the state, that is, AddKey transforms
s′r,c = sr,c ⊕ λcr for 0 ≤ r, c ≤ 3.
Finally, the MixCols transform multiplies each column of the state, regarded as an element of A, by the element [03]y^3 + [01]y^2 + [01]y + [02] (this is the element β = 03010102 mentioned earlier), where the coefficients (expressions within square brackets) are octet values in hexadecimal that can be identified with elements of F_{2^8}. For the c-th column, this transformation can be represented as:
s′0,c = ([02] ⊙ s0,c) ⊕ ([03] ⊙ s1,c) ⊕ s2,c ⊕ s3,c
s′1,c = s0,c ⊕ ([02] ⊙ s1,c) ⊕ ([03] ⊙ s2,c) ⊕ s3,c
s′2,c = s0,c ⊕ s1,c ⊕ ([02] ⊙ s2,c) ⊕ ([03] ⊙ s3,c)
s′3,c = ([03] ⊙ s0,c) ⊕ s1,c ⊕ s2,c ⊕ ([02] ⊙ s3,c)
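The four column equations of MixCols translate directly into code; a Python sketch for one column, with gmul implementing ⊙ (names are ours):

```python
def gmul(a, b):
    """Multiplication in F_{2^8} (reduction polynomial 0x11b)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11b
        b >>= 1
    return result

def mix_column(col):
    """Multiply a column [s0, s1, s2, s3] by [03]y^3 + [01]y^2 + [01]y + [02] in A."""
    s0, s1, s2, s3 = col
    return [
        gmul(0x02, s0) ^ gmul(0x03, s1) ^ s2 ^ s3,
        s0 ^ gmul(0x02, s1) ^ gmul(0x03, s2) ^ s3,
        s0 ^ s1 ^ gmul(0x02, s2) ^ gmul(0x03, s3),
        gmul(0x03, s0) ^ s1 ^ s2 ^ gmul(0x02, s3),
    ]
```

For example, the column db 13 53 45 transforms to 8e 4d a1 bc.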
Algorithm A.5. AES encryption
Input: The plaintext message M = μ0μ1 . . . μ15 and the round keys K0, K1, . . . , K4Nr+3.
Output: The ciphertext message C = γ0γ1 . . . γ15.
Steps:
Convert M to the state S. /* Use Transform (A.1) */
AddKey(S, K0K1K2K3).
for i = 1, . . . , Nr – 1 {
SubState(S). ShiftRows(S). MixCols(S). AddKey(S, K4iK4i+1K4i+2K4i+3).
}
SubState(S). ShiftRows(S). AddKey(S, K4NrK4Nr+1K4Nr+2K4Nr+3). /* The last round omits MixCols */
Convert the state S to the ciphertext C. /* Use Transform (A.2) */
AES decryption involves taking the inverse of each state transformation performed during encryption. The key schedule needed for encryption is to be used during decryption too. The straightforward decryption routine is given in Algorithm A.6.
Algorithm A.6. AES decryption (straightforward version)
Input: The ciphertext message C = γ0γ1 . . . γ15 and the round keys K0, K1, . . . , K4Nr+3.
Output: The recovered plaintext message M = μ0μ1 . . . μ15.
Steps:
Convert C to the state S. /* Use Transform (A.1) */
AddKey(S, K4NrK4Nr+1K4Nr+2K4Nr+3). ShiftRows^(–1)(S). SubState^(–1)(S).
for i = Nr – 1, . . . , 1 {
AddKey(S, K4iK4i+1K4i+2K4i+3). MixCols^(–1)(S). ShiftRows^(–1)(S). SubState^(–1)(S).
}
AddKey(S, K0K1K2K3).
Convert the state S to the message M. /* Use Transform (A.2) */
What remains is a description of the inverses of the basic state transformations. AddKey involves octet-by-octet XOR-ing and so is its own inverse. Table A.4 summarizes the inverse of the substitution transition SubOctet (Exercise A.8). For computing SubState^(–1)(S), one should apply SubOctet^(–1) on each octet of S. The inverse of ShiftRows is also straightforward and can be given by
s′r,c = sr,(c–r) mod 4 for 0 ≤ r, c ≤ 3.
Finally, MixCols^(–1) involves multiplication of each column by the inverse of the element [03]y^3 + [01]y^2 + [01]y + [02], that is, by the element [0b]y^3 + [0d]y^2 + [09]y + [0e]. So MixCols^(–1) transforms each column of the state as follows:
s′0,c = ([0e] ⊙ s0,c) ⊕ ([0b] ⊙ s1,c) ⊕ ([0d] ⊙ s2,c) ⊕ ([09] ⊙ s3,c)
s′1,c = ([09] ⊙ s0,c) ⊕ ([0e] ⊙ s1,c) ⊕ ([0b] ⊙ s2,c) ⊕ ([0d] ⊙ s3,c)
s′2,c = ([0d] ⊙ s0,c) ⊕ ([09] ⊙ s1,c) ⊕ ([0e] ⊙ s2,c) ⊕ ([0b] ⊙ s3,c)
s′3,c = ([0b] ⊙ s0,c) ⊕ ([0d] ⊙ s1,c) ⊕ ([09] ⊙ s2,c) ⊕ ([0e] ⊙ s3,c)
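A matching Python sketch for the inverse transformation of one column (self-contained; the names are ours):

```python
def gmul(a, b):
    """Multiplication in F_{2^8} (reduction polynomial 0x11b)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11b
        b >>= 1
    return result

def inv_mix_column(col):
    """Multiply a column by [0b]y^3 + [0d]y^2 + [09]y + [0e] in A."""
    s0, s1, s2, s3 = col
    return [
        gmul(0x0e, s0) ^ gmul(0x0b, s1) ^ gmul(0x0d, s2) ^ gmul(0x09, s3),
        gmul(0x09, s0) ^ gmul(0x0e, s1) ^ gmul(0x0b, s2) ^ gmul(0x0d, s3),
        gmul(0x0d, s0) ^ gmul(0x09, s1) ^ gmul(0x0e, s2) ^ gmul(0x0b, s3),
        gmul(0x0b, s0) ^ gmul(0x0d, s1) ^ gmul(0x09, s2) ^ gmul(0x0e, s3),
    ]
```

This undoes the forward example: the column 8e 4d a1 bc maps back to db 13 53 45.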
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | a | b | c | d | e | f | |
| 0 | 52 | 09 | 6a | d5 | 30 | 36 | a5 | 38 | bf | 40 | a3 | 9e | 81 | f3 | d7 | fb |
| 1 | 7c | e3 | 39 | 82 | 9b | 2f | ff | 87 | 34 | 8e | 43 | 44 | c4 | de | e9 | cb |
| 2 | 54 | 7b | 94 | 32 | a6 | c2 | 23 | 3d | ee | 4c | 95 | 0b | 42 | fa | c3 | 4e |
| 3 | 08 | 2e | a1 | 66 | 28 | d9 | 24 | b2 | 76 | 5b | a2 | 49 | 6d | 8b | d1 | 25 |
| 4 | 72 | f8 | f6 | 64 | 86 | 68 | 98 | 16 | d4 | a4 | 5c | cc | 5d | 65 | b6 | 92 |
| 5 | 6c | 70 | 48 | 50 | fd | ed | b9 | da | 5e | 15 | 46 | 57 | a7 | 8d | 9d | 84 |
| 6 | 90 | d8 | ab | 00 | 8c | bc | d3 | 0a | f7 | e4 | 58 | 05 | b8 | b3 | 45 | 06 |
| 7 | d0 | 2c | 1e | 8f | ca | 3f | 0f | 02 | c1 | af | bd | 03 | 01 | 13 | 8a | 6b |
| 8 | 3a | 91 | 11 | 41 | 4f | 67 | dc | ea | 97 | f2 | cf | ce | f0 | b4 | e6 | 73 |
| 9 | 96 | ac | 74 | 22 | e7 | ad | 35 | 85 | e2 | f9 | 37 | e8 | 1c | 75 | df | 6e |
| a | 47 | f1 | 1a | 71 | 1d | 29 | c5 | 89 | 6f | b7 | 62 | 0e | aa | 18 | be | 1b |
| b | fc | 56 | 3e | 4b | c6 | d2 | 79 | 20 | 9a | db | c0 | fe | 78 | cd | 5a | f4 |
| c | 1f | dd | a8 | 33 | 88 | 07 | c7 | 31 | b1 | 12 | 10 | 59 | 27 | 80 | ec | 5f |
| d | 60 | 51 | 7f | a9 | 19 | b5 | 4a | 0d | 2d | e5 | 7a | 9f | 93 | c9 | 9c | ef |
| e | a0 | e0 | 3b | 4d | ae | 2a | f5 | b0 | c8 | eb | bb | 3c | 83 | 53 | 99 | 61 |
| f | 17 | 2b | 04 | 7e | ba | 77 | d6 | 26 | e1 | 69 | 14 | 63 | 55 | 21 | 0c | 7d |
AES decryption is as efficient as AES encryption, since each state transformation primitive has the same structure as its inverse. However, the sequence of application of these primitives in the loop (rounds) for decryption differs from that for encryption. For some implementations, mostly in hardware, this may be a problem. Compare this with DES, for which the encryption and decryption algorithms are identical save the sequence of using the round keys (Exercise A.1). With a little additional effort, AES can also be furnished with this useful property of DES. All we have to do is to use a different key schedule for decryption. The necessary modifications are explored in Exercise A.9.
Table A.5 provides the ciphertexts for the plaintext block
M = 00112233445566778899aabbccddeeff
under different keys.
| Cipher | Key | Ciphertext block |
|---|---|---|
| AES-128 | 0001020304050607 \ 08090a0b0c0d0e0f | 69c4e0d86a7b0430 \ d8cdb78070b4c55a |
| AES-192 | 0001020304050607 \ 08090a0b0c0d0e0f \ 1011121314151617 | dda97ca4864cdfe0 \ 6eaf70a0ec0d7191 |
| AES-256 | 0001020304050607 \ 08090a0b0c0d0e0f \ 1011121314151617 \ 18191a1b1c1d1e1f | 8ea2b7ca516745bf \ eafc49904b496089 |
AES has been designed so that linear and differential attacks are infeasible. Another attack, known as the square attack, has been proposed by Lucks [184] and Ferguson et al. [93], but at present it can tackle fewer rounds than are used in Rijndael encryption. Also see Gilbert and Minier [112] for the collision attack.
The distinct algebraic structure of AES encryption invites special algebraic attacks. One such potential attack (the XSL attack) has been proposed by Courtois and Pieprzyk [68]. Although this attack has not yet been proved to be effective, a better understanding of the underlying algebra may, in the foreseeable future, lead to disturbing consequences for the advanced standard.
For more information on AES, read the book [71] from the designers of the cipher. Also visit the following Internet sites:
| http://www.esat.kuleuven.ac.be/~rijmen/rijndael/ | Rijndael home |
| http://csrc.nist.gov/CryptoToolkit/aes/index1.html | NIST site for AES |
| http://www.cryptosystem.net/aes/ | Algebraic attacks |
Multiple encryption presents a way to achieve a desired level of security by using block ciphers of small key sizes. The idea is to cascade several stages of encryption and/or decryption, with different stages working under different keys. Figure A.1 illustrates double and triple encryption for a block cipher f. Each gi or hj represents either the encryption or the decryption function of f under the given key.
[Figure A.1: Double and triple encryption for a block cipher f]
For double encryption, we have K1 ≠ K2, and both g1 and g2 are usually the encryption function. Provided that fK2 ο fK1 is not the same as fK for any single key K and that the permutations of f are reasonably random, it appears at first glance that double encryption increases the effective key size by a factor of two. Unfortunately, this is not the case. The meet-in-the-middle attack on double encryption works as follows.
Suppose that an adversary knows a plaintext–ciphertext pair (m, c) under the unknown keys K1, K2. We assume as before that f has block-size n and key-size r. The adversary computes, for each possible key i ∈ {0, 1}^r, the encrypted message xi := fi(m). She also computes, for each possible key j ∈ {0, 1}^r, the decrypted message yj := fj^(–1)(c). Now, (i, j) is a possible value of (K1, K2) if and only if xi = yj.
A given pair (m, c) usually gives many such candidates (i, j) for (K1, K2). More precisely, if each fi is assumed to be a random permutation of {0, 1}^n, for a given i we have the equality xi = yj for an expected number of 2^r/2^n values of j. Considering all possibilities for i gives an expected number of 2^r × 2^r/2^n = 2^(2r – n) candidate pairs (i, j). If f = DES, this number is 2^(2 × 56 – 64) = 2^48.
If a second pair (m′, c′) under (K1, K2) is also known to the adversary, then a given candidate (i, j) is consistent with both (m, c) and (m′, c′) with probability 1/(2^n × 2^n). Thus, we get an expected number of (2^r × 2^r)/(2^n × 2^n) = 2^(2r – 2n) false candidates (i, j). For DES, this number is 2^(–16). This implies that it is very unlikely that a false candidate (i, j) satisfies both (m, c) and (m′, c′). Thus, with high probability the adversary uniquely identifies the double DES key (K1, K2) from two plaintext–ciphertext pairs.
This attack calls for O(2^r) encryptions and O(2^r) decryptions. With the assumption that each encryption takes roughly the same time as each decryption (as in the case of DES), the adversary spends the time of O(2^r) encryptions. Moreover, she can find all the matches xi = yj in O(r2^r) time (for example, by sorting). This implies that double encryption increases the effective key size (over single encryption) by a few bits only. On the other hand, both the actual key size and the encryption time get doubled. In view of these shortcomings, double encryption is rarely used in practice.
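The attack is easy to demonstrate on a toy cipher. The 16-bit block cipher below is a hypothetical construction, wholly insecure and chosen only so that the 2^16-key search finishes instantly; the attack structure is exactly the one described above:

```python
def toy_enc(m, k):
    """A toy 16-bit block cipher (hypothetical; for demonstration only)."""
    for _ in range(4):
        m = (m + k) & 0xffff
        m = ((m << 3) | (m >> 13)) & 0xffff   # rotate left by 3
        m ^= k
    return m

def toy_dec(c, k):
    """Inverse of toy_enc: undo each round step in reverse order."""
    for _ in range(4):
        c ^= k
        c = ((c >> 3) | (c << 13)) & 0xffff   # rotate right by 3
        c = (c - k) & 0xffff
    return c

def meet_in_the_middle(pairs):
    """Recover candidate (K1, K2) of double encryption from known pairs."""
    m, c = pairs[0]
    table = {}
    for i in range(1 << 16):                  # forward half: x_i = f_i(m)
        table.setdefault(toy_enc(m, i), []).append(i)
    candidates = []
    for j in range(1 << 16):                  # backward half: y_j = f_j^{-1}(c)
        for i in table.get(toy_dec(c, j), []):
            candidates.append((i, j))
    for m2, c2 in pairs[1:]:                  # filter survivors with more pairs
        candidates = [(i, j) for (i, j) in candidates
                      if toy_enc(toy_enc(m2, i), j) == c2]
    return candidates

K1, K2 = 0x3a7f, 0xc412
pairs = [(m, toy_enc(toy_enc(m, K1), K2)) for m in (0x0123, 0x4567, 0x89ab)]
candidates = meet_in_the_middle(pairs)
print(candidates)   # the true (K1, K2) is among the survivors
```

Here r = n = 16, so one pair leaves about 2^(2r – n) = 2^16 candidates; the extra known pairs whittle these down, and the total work stays near 2^17 cipher evaluations instead of 2^32.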
For the triple encryption scheme of Figure A.1, a meet-in-the-middle attack at x or y demands an effort equivalent to O(2^(2r)) encryptions, that is, the effective key size gets doubled. It is, therefore, customary to take K1 = K3 and K2 different from this common value. The actual key size also gets only doubled with this choice, since one does not have to remember K3 separately. It is also a common practice to take h1 and h3 to be the encryption function (under K1 = K3) and h2 the decryption function (under K2). One often calls this particular triple encryption an E-D-E scheme.
In practice, the length of the message m to be encrypted need not equal the block length n of the block cipher f. One then has to break up m into blocks of some fixed length n′ ≤ n and encrypt each block using the block cipher. In order to make the length of m an integral multiple of n′, one may have to pad extra bits to m (say, zero bits at the end). It is often necessary to store the initial size of m in a separate block, say, after the last message block. In what follows, we shall assume that the input message m gives rise to l blocks m1, m2, . . . , ml each of size n′. The corresponding ciphertext blocks c1, c2, . . . , cl will also be of bit length n′ each. The reason for choosing the block size n′ ≤ n will be clear soon.
The easiest way to encrypt multiple blocks m1, . . . , ml is to take n′ = n and encrypt each block mi as ci := fK(mi). Decryption is analogous: mi := fK^(–1)(ci). This mode of operation of a block cipher is called the electronic code-book or the ECB mode. Algorithms A.7 and A.8 describe this mode.
Algorithm A.7. ECB encryption
Input: The plaintext blocks m1, . . . , ml and the key K.
Output: The ciphertext c = c1 . . . cl.
Steps: for i = 1, . . . , l { ci := fK(mi) }
Algorithm A.8. ECB decryption
Input: The ciphertext blocks c1, . . . , cl and the key K.
Output: The plaintext m = m1 . . . ml.
Steps: for i = 1, . . . , l { mi := fK^(–1)(ci) }
In this mode, identical message blocks encrypt to identical ciphertext blocks (under the same key), that is, partial information about the plaintext may be leaked out. The following three modes overcome this problem.
In the cipher-block chaining or the CBC mode, one takes n′ = n and each plaintext block is first XOR-ed with the previous ciphertext block and then encrypted. In order to XOR the first plaintext block, one needs an n-bit initialization vector (IV). The IV need not be kept secret and may be sent along with the ciphertext blocks.
Algorithm A.9. CBC encryption
Input: The plaintext blocks m1, . . . , ml, the key K and the IV.
Output: The ciphertext c = c1 . . . cl.
Steps: c0 := IV. for i = 1, . . . , l { ci := fK(mi ⊕ ci – 1) }
Algorithm A.10. CBC decryption
Input: The ciphertext blocks c1, . . . , cl, the key K and the IV.
Output: The plaintext m = m1 . . . ml.
Steps: c0 := IV. for i = 1, . . . , l { mi := fK^(–1)(ci) ⊕ ci – 1 }
In the cipher feedback or the CFB mode, one chooses any n′ with 1 ≤ n′ ≤ n. In this mode, the plaintext blocks are not encrypted, but masked by XOR-ing with a stream of random keys generated from a (not necessarily secret) n-bit IV. In this sense, the CFB mode works like a stream cipher (see Section A.3).
Algorithm A.11. CFB encryption
Input: The plaintext blocks m1, . . . , ml, the key K and the IV.
Output: The ciphertext c = c1 . . . cl.
Steps:
k0 := IV. /* Initialize the key stream */
for i = 1, . . . , l {
ci := mi ⊕ msbn′(fK(ki – 1)).
ki := lsbn – n′(ki – 1) ‖ ci. /* Feed the ciphertext block back */
}
Algorithm A.11 explains CFB encryption. The notation msbk(z) (resp. lsbk(z)) stands for the most (resp. least) significant k bits of a bit string z. For CFB decryption (Algorithm A.12), the identical key stream k0, k1, . . . , kl is generated and used to mask off the message blocks from the ciphertext blocks.
Algorithm A.12. CFB decryption
Input: The ciphertext blocks c1, . . . , cl, the key K and the IV.
Output: The plaintext m = m1 . . . ml.
Steps:
k0 := IV.
for i = 1, . . . , l {
mi := ci ⊕ msbn′(fK(ki – 1)).
ki := lsbn – n′(ki – 1) ‖ ci.
}
The output feedback or the OFB mode also works like a stream cipher by masking the plaintext blocks using a stream of keys. The key stream in the OFB mode is generated by successively applying the block encryption function on an n-bit (not necessarily secret) IV. Here, one chooses any n′ with 1 ≤ n′ ≤ n.
OFB encryption is explained in Algorithm A.13. OFB decryption (Algorithm A.14) is identical, with only the roles of m and c interchanged, and requires the generation of the same key stream k0, k1, . . . , kl used during encryption.
Algorithm A.13. OFB encryption
Input: The plaintext blocks m1, . . . , ml, the key K and the IV.
Output: The ciphertext c = c1 . . . cl.
Steps:
k0 := IV. /* Initialize the key stream */
for i = 1, . . . , l {
ki := fK(ki – 1).
ci := mi ⊕ msbn′(ki).
}
Algorithm A.14. OFB decryption
Input: The ciphertext blocks c1, . . . , cl, the key K and the IV.
Output: The plaintext m = m1 . . . ml.
Steps:
k0 := IV. /* Initialize the key stream */
for i = 1, . . . , l {
ki := fK(ki – 1).
mi := ci ⊕ msbn′(ki).
}
| A.1 | Let us use the notations of Algorithm A.2. For a message m and round keys Ki, we have the values V, Li, Ri, W, c. For another message m′ and another set of round keys K′i, let us denote these values by V′, L′i, R′i, W′, c′. Show that if m′ = c and if K′i = K17 – i for i = 1, . . . , 16, then L′i = R16 – i and R′i = L16 – i for all i = 0, 1, . . . , 16. Deduce that in this case we have c′ = m. (This shows that DES decryption is the same as DES encryption with the key schedule reversed.) |
| A.2 | For a bit string z, let z̄ denote the bit-wise complement of z. Deduce that if c = DESK(m), then c̄ = DESK̄(m̄), that is, complementing both the plaintext message and the key complements the ciphertext message. [H] |
| A.3 | A DES key K is said to be weak if the DES key schedule on K gives K1 = K2 = · · · = K16. Show that there are exactly four weak DES keys, which in hexadecimal notation are: 0101 0101 0101 0101, FEFE FEFE FEFE FEFE, 1F1F 1F1F 0E0E 0E0E, E0E0 E0E0 F1F1 F1F1. |
| A.4 | A DES key K is said to be anti-palindromic if the DES key schedule on K gives Ki = K̄17 – i for all i = 1, . . . , 16 (where K̄j denotes the bit-wise complement of Kj). Show that the following four DES keys (in hexadecimal notation) are anti-palindromic: 01FE 01FE 01FE 01FE, FE01 FE01 FE01 FE01, 1FE0 1FE0 0EF1 0EF1, E01F E01F F10E F10E. |
| A.5 | Represent F_{2^8} = F_2[X]/⟨f(X)⟩, where f(X) = X^8 + X^4 + X^3 + X + 1 (Section A.2.2). |
| A.6 | The multiplication of F_{2^8} can be made table-driven. Since this field contains 256 elements, a 256 × 256 array suffices to store all the products. That requires a storage of 64 KB. We can considerably reduce the storage by using discrete logs. |
| A.7 | Denote the multiplication of A by ⊗ (Section A.2.2). |
| A.8 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| A.9 |
Algorithm A.15. Equivalent form of AES decryption
| A.10 | Show that a multiple encryption scheme with exactly k stages provides an effective security of ⌈k/2⌉ keys against the meet-in-the-middle attack. |
| A.11 | Consider a message m broken into blocks m1, . . . , ml, encrypted to c1, . . . , cl and sent to an entity.
|
A block cipher encrypts large blocks of data using a fixed key. A stream cipher, on the other hand, encrypts small blocks of data (typically bits or bytes) using a different key for each block. The security of a stream cipher stems from the unpredictability of the keys in the key stream. Here, we deal with stream ciphers that encrypt bit-by-bit.
|
A stream cipher F encrypts a plaintext m = m1m2 . . . ml to a ciphertext c = c1c2 . . . cl using a key stream k = k1k2 . . . kl, where each mi, ci, ki is a single bit and ci := fki(mi) for some keyed function f. |
|
An obvious choice for fκ is fκ(μ) := μ ⊕ κ, so that ci = mi ⊕ ki. Suppose that the key bits ki are independent and uniformly random, and let p := Pr(mi = 1). Then Pr(ci = 0) = Pr(mi = ki) = p · (1/2) + (1 − p) · (1/2) = 1/2. So Pr(ci = 1) is 1/2 too, that is, the two values of ci are equally likely, irrespective of the probability p. This, in turn, implies that the ciphertext bit ci provides absolutely no information about the plaintext bit mi. In this sense, this stream cipher, called Vernam’s one-time pad, offers unconditional security. |
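As a concrete illustration, Vernam’s scheme fits in a few lines of Python; the byte-wise XOR below acts bit-by-bit, and the key stream here is drawn from the operating system’s randomness source (any truly random source would do):

```python
import os

def otp_encrypt(plaintext: bytes, key: bytes) -> bytes:
    # ci = mi XOR ki, applied byte-wise (hence bit-wise)
    assert len(key) == len(plaintext), "a one-time pad key must be as long as the message"
    return bytes(m ^ k for m, k in zip(plaintext, key))

# Decryption is the same operation, since (m XOR k) XOR k = m
otp_decrypt = otp_encrypt

message = b"attack at dawn"
key = os.urandom(len(message))          # fresh, never-reused key stream
ciphertext = otp_encrypt(message, key)
assert otp_decrypt(ciphertext, key) == message
```

Note that the key must be as long as the message and must never be reused, which is exactly the practical difficulty discussed next.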
Generating a truly random key stream of arbitrary length is a difficult problem. Moreover, the same key stream is used for decryption and has to be reproduced at the recipient’s end. In view of these difficulties, Vernam’s one-time pad is used only very rarely.
A practical solution is to use a pseudorandom key stream k1, k2, k3, . . . generated from a secret key J of fixed small length. The bits in the pseudorandom stream should be sufficiently unpredictable and the length of J adequately large, so as to preclude the possibility of mounting a successful attack in feasible time.
Depending on how the key stream is generated from J, stream ciphers can be broadly classified into two categories. In a synchronous stream cipher, each key in the key stream is generated independently of any plaintext or ciphertext bit, whereas in a self-synchronizing (or asynchronous) stream cipher each key in the stream is generated based only on J and a fixed number of previous ciphertext bits. Algorithms A.16 and A.17 explain the workings of these two classes of stream ciphers.
|
Algorithm A.16. Synchronous stream cipher
Input: The message m = m1m2 . . . ml, the secret key J and a (not necessarily secret) initial state S of the key stream generator.
Output: The ciphertext c = c1c2 . . . cl.
Steps:
s0 := S. /* Initialize the state of the key stream generator */ |
|
Algorithm A.17. Self-synchronizing stream cipher
Input: The message m = m1m2 . . . ml, the secret key J and a (not necessarily secret) initial state (c–t+1, c–t+2, . . . , c0).
Output: The ciphertext c = c1c2 . . . cl.
Steps:
for i = 1, . . . , l { |
A block cipher in the OFB mode works like a synchronous stream cipher, whereas a block cipher in the CFB mode works like an asynchronous stream cipher.
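The two patterns in Algorithms A.16 and A.17 can be sketched as follows. The state-update function g and the key-extraction functions below are toy stand-ins chosen only for illustration; real designs derive them from cryptographic primitives:

```python
def sync_encrypt(m_bits, J, S, g, h):
    """Synchronous pattern (Algorithm A.16): key stream from J and state only."""
    c, s = [], S
    for m in m_bits:
        k = h(J, s)          # key bit depends only on J and the current state
        c.append(m ^ k)
        s = g(J, s)          # state update ignores plaintext and ciphertext
    return c

def selfsync_encrypt(m_bits, J, init, h, t):
    """Self-synchronizing pattern (Algorithm A.17): key from J and last t ciphertext bits."""
    c = list(init)                     # the t initial "ciphertext" bits
    for m in m_bits:
        k = h(J, tuple(c[-t:]))
        c.append(m ^ k)
    return c[t:]

def selfsync_decrypt(c_bits, J, init, h, t):
    full = list(init) + list(c_bits)
    return [full[i] ^ h(J, tuple(full[i - t:i])) for i in range(t, len(full))]

# Toy state-update and key-extraction functions (illustrative only)
g = lambda J, s: (5 * s + J) % 64
h_sync = lambda J, s: (J + s) & 1
h_self = lambda J, window: (J + sum(window)) & 1

msg = [1, 0, 1, 1, 0, 0, 1]
ct = sync_encrypt(msg, 37, 11, g, h_sync)
assert sync_encrypt(ct, 37, 11, g, h_sync) == msg   # XOR: decryption = encryption

ct2 = selfsync_encrypt(msg, 5, [0, 0, 0], h_self, 3)
assert selfsync_decrypt(ct2, 5, [0, 0, 0], h_self, 3) == msg
```

In the self-synchronizing variant the receiver regenerates each key bit from ciphertext bits it has already seen, which is why it resynchronizes after a transmission error affecting only t positions.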
Linear feedback shift registers (LFSRs), being suitable for hardware implementation and possessing good cryptographic properties, are widely used as basic building blocks for many stream ciphers. Figure A.2 depicts an LFSR L with d stages or delay elements D0, D1, . . . , Dd–1, each capable of storing one bit. The state of the LFSR is described by the d-tuple s := (s0, s1, . . . , sd–1), where si is the bit stored in Di. It is often convenient to treat s as the column vector (s0 s1 . . . sd–1)t.
[Figure A.2. An LFSR with d stages]
There are d control bits a0, a1, . . . , ad–1. The working of the LFSR is governed by a clock. At every clock pulse the bits stored in the delay elements are bit-wise AND-ed with the respective control bits and the AND gate outputs are XOR-ed to obtain the bit sd. The bit s0 stored in D0 is delivered to the output. Finally, for each i (0 ≤ i ≤ d – 2), the delay element Di sets its stored bit to si+1, that is, the register experiences a right shift by one bit with the feedback bit sd filling up the leftmost delay element.
Thus, a clock pulse changes the state of the LFSR from s := (s0, s1, . . . , sd–1) to t := (t0, t1, . . . , td–1), where s and t are related as:
ti = si+1 for i = 0, 1, . . . , d – 2, and
td–1 = sd = a0s0 ⊕ a1s1 ⊕ · · · ⊕ ad–1sd–1.
If s and t are treated as column vectors, this can be compactly represented as
Equation A.4
t ≡ ΔLs (mod 2),
where the transition matrix ΔL is given by
Equation A.5
       [ 0    1    0    · · ·  0    ]
       [ 0    0    1    · · ·  0    ]
ΔL  =  [ ·    ·    ·           ·    ]
       [ 0    0    0    · · ·  1    ]
       [ a0   a1   a2   · · ·  ad–1 ]
When the LFSR L is initialized to a non-zero state, the bit stream output by it can be used as a pseudorandom bit sequence. For a given set of control bits a0, . . . , ad–1, the next state of L is uniquely determined by its previous state only. Since L has only finitely many (2d – 1) non-zero states, the output bit sequence of L must be (eventually) periodic. For cryptographic use, the period of the bit sequence should be as large as possible. If the period is maximum possible, namely 2d – 1, L is called a maximum-length LFSR.
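The maximum-length behaviour is easy to observe experimentally. The Python sketch below steps a four-stage LFSR with feedback sd = s0 ⊕ s1 (that is, a0 = a1 = 1, a2 = a3 = 0, a choice whose connection polynomial is primitive), and checks that every non-zero state recurs only after 2^4 – 1 = 15 steps; a non-primitive choice of taps is shown for contrast:

```python
def lfsr_states(taps, state):
    """Generate successive LFSR states; taps[i] plays the role of the control bit ai."""
    while True:
        yield state
        fb = 0
        for a, s in zip(taps, state):
            fb ^= a & s                      # feedback bit sd = XOR of the tapped bits
        state = state[1:] + (fb,)            # right shift; sd enters at the left end

def period(taps, start):
    """Number of clock pulses before the starting state recurs."""
    gen = lfsr_states(taps, start)
    first = next(gen)
    count = 1
    for s in gen:
        if s == first:
            return count
        count += 1

# Four stages, feedback sd = s0 XOR s1: maximum period 2^4 - 1 = 15
assert period((1, 1, 0, 0), (1, 0, 0, 0)) == 15
# Feedback sd = s0 alone merely rotates the register: period only 4
assert period((1, 0, 0, 0), (1, 0, 0, 0)) == 4
```

This is essentially the experiment asked for in Exercise A.13, here with an illustrative tap pattern rather than the one in Figure A.4.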
Many properties of the LFSR L can be explained in terms of its connection polynomial defined as:
Equation A.6
CL(X) = 1 + ad–1X + ad–2X2 + · · · + a1Xd–1 + a0Xd.
For example, assume that a0 = 1, so that deg CL(X) = d. Assume further that CL(X) is irreducible (over F2). Consider the extension F2d of F2, represented as F2[X]/⟨CL(X)⟩, where x := X + ⟨CL(X)⟩. It turns out that if x is a generator of the cyclic group F*2d, then L is a maximum-length LFSR. In this case, the polynomial CL(X) is called a primitive polynomial of F2[X].[3]
[3] A primitive polynomial defined in this way has nothing to do with a primitive polynomial over a UFD, defined in Exercise 2.54. Mathematicians often go for such multiple definitions of the same terms and phrases.
The bit sequence output by an LFSR L can be used as the key stream k1k2 . . . kl in order to encrypt a plaintext stream m1m2 . . . ml to the ciphertext stream c1c2 . . . cl with ci := mi ⊕ ki. The number d of stages in L should be chosen reasonably large and the control bits a0, . . . , ad–1 should be kept secret. The initial state of L may or may not be a secret. For suitable choices of a0, . . . , ad–1, the output sequences from L possess good statistical properties and hence L appears to be an efficient key stream generator.
Unfortunately, such a key stream generator is vulnerable to a known-plaintext attack as follows. Suppose that mi and ci are known for i = 1, 2, . . . , 2d. One can easily compute ki = mi⊕ci for all these i. Let si := (ki, ki+1, . . . , ki+d–1) denote the state of L while outputting ci. By Congruence (A.4), si+1 ≡ ΔLsi (mod 2) for i = 1, 2, . . . , d. Define the d × d matrices S := (s1 s2 . . . sd) and T := (s2 s3 . . . sd+1), where si are treated as column vectors as before. We then have T ≡ ΔLS (mod 2). If S is invertible modulo 2, then ΔL and hence the secret control bits can be easily computed. In order to avoid this known-plaintext attack, one should introduce some non-linearity in the LFSR outputs.
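This attack is straightforward to carry out. The toy Python sketch below (an illustration, not an implementation from the book) recovers the secret control bits from 2d known key-stream bits by solving the linear system ki+d = a0ki ⊕ a1ki+1 ⊕ · · · ⊕ ad–1ki+d–1 by Gauss–Jordan elimination modulo 2:

```python
def keystream(taps, state, nbits):
    """The bits an attacker learns as ki = mi XOR ci in a known-plaintext setting."""
    out = []
    for _ in range(nbits):
        out.append(state[0])
        fb = 0
        for a, s in zip(taps, state):
            fb ^= a & s
        state = state[1:] + [fb]
    return out

def recover_taps(ks, d):
    """Solve ks[i+d] = sum_j a_j ks[i+j] (mod 2) for the control bits a_j."""
    A = [[ks[i + j] for j in range(d)] for i in range(d)]
    b = [ks[i + d] for i in range(d)]
    for col in range(d):                     # Gauss-Jordan elimination over GF(2)
        piv = next((r for r in range(col, d) if A[r][col]), None)
        if piv is None:
            return None                      # singular system: take more key-stream bits
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(d):
            if r != col and A[r][col]:
                A[r] = [x ^ y for x, y in zip(A[r], A[col])]
                b[r] ^= b[col]
    return b

secret_taps = [1, 1, 0, 0]                   # unknown to the attacker
ks = keystream(secret_taps, [1, 0, 0, 0], 8) # 2d = 8 key-stream bits suffice here
assert recover_taps(ks, 4) == secret_taps
```

Once the control bits are known, the attacker can reproduce the entire key stream, so the whole message falls.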
A non-linear combination generator combines the output bits u1, u2, . . . , ur from r LFSRs by a non-linear function f in order to generate the key k := f(u1, u2, . . . , ur). The Geffe generator of Figure A.3 gives a well-known example. It uses the non-linear function f(u1, u2, u3) := u1u2 ⊕ u2u3 ⊕ u3, that is, k = u1u2 + u2u3 + u3 (mod 2).
[Figure A.3. The Geffe generator]
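The correlation weakness of this combiner (the subject of Exercise A.17) can be checked by brute force over all eight inputs; the function below is the standard Geffe combiner f(u1, u2, u3) = u1u2 ⊕ u2u3 ⊕ u3:

```python
from itertools import product

def geffe(u1, u2, u3):
    # Standard Geffe combiner: u2 selects between u1 and u3
    return (u1 & u2) ^ (u2 & u3) ^ u3

# Over uniform inputs, the key bit agrees with u1 (and with u3) 3/4 of the time
agree_u1 = sum(geffe(*u) == u[0] for u in product((0, 1), repeat=3))
agree_u3 = sum(geffe(*u) == u[2] for u in product((0, 1), repeat=3))
assert agree_u1 == 6 and agree_u3 == 6     # 6 of 8 cases, that is, probability 3/4
```

These correlations are what make the Geffe generator vulnerable to correlation attacks on the individual LFSRs.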
A non-linear filter generator generates the key as k = ψ(s0, s1, . . . , sd–1), where s0, . . . , sd–1 are the bits stored in the delay elements of a single LFSR and where ψ is a non-linear function.
Several other ad hoc schemes can destroy the linearity of an LFSR’s output. The shrinking generator, for example, uses two LFSRs L1 and L2. Both L1 and L2 are simultaneously clocked. If the output of L1 is 1, the output of L2 goes to the key stream, whereas if the output of L1 is 0, the output of L2 is discarded. The resulting key stream is an irregularly (and non-linearly) decimated subsequence of the output sequence of L2.
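A few lines of Python make the decimation explicit; the two bit lists below stand in for the outputs of L1 and L2:

```python
def shrink(sel_bits, data_bits):
    """Shrinking generator: keep a bit of L2's output only when L1 outputs 1."""
    return [d for s, d in zip(sel_bits, data_bits) if s == 1]

# L1's output selects which of L2's output bits survive into the key stream
assert shrink([1, 0, 1, 1, 0, 1], [0, 1, 1, 0, 1, 1]) == [0, 1, 0, 1]
```

The irregular rate at which key bits appear is itself a design consideration: in hardware, buffering is needed to smooth the output.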
The non-linear function (f or ψ) eliminates the chance of mounting the straightforward known-plaintext attack described above. However, for polynomial non-linearities certain algebraic attacks are known; for example, see Courtois and Pieprzyk [67, 66].[4] Solving non-linear polynomial equations is usually more difficult than solving linear equations, but ample care should be taken to avoid accidental encounters with easily solvable systems. Complacency is a word ever excluded from a cryptologist’s world.
[4] Visit the Internet site http://www.cryptosystem.net/ for more papers in related areas.
| A.12 | For each of the two classes of stream ciphers (Algorithms A.16, A.17), discuss the effects on decryption of the corruption or loss of a ciphertext bit during transmission. |
| A.13 | Suppose that the LFSR L of Figure A.4 is initialized to the state (1, 0, 0, 0). Derive the sequence of state transitions of the LFSR, and hence determine the output bit sequence of L. Argue that L is a maximum-length LFSR. Verify (according to the definition) that the connection polynomial CL(X) is primitive.
Figure A.4. An LFSR with four stages
|
| A.14 | Let ΔL and CL(X) be as in Equations (A.5) and (A.6). Show that:
|
| A.15 | Let L be an LFSR with connection polynomial CL(X). Further let S(X) := s0 + s1X + s2X2 + · · · denote a power series[5] over F2. Show that L generates the (infinite) bit sequence s0, s1, s2, . . . if and only if the product CL(X)S(X) modulo 2 is a polynomial of degree < d.
|
| A.16 | Let σ = s0s1 . . . sd–1 ≠ 00 . . . 0 be a bit string of length d ≥ 1. The linear complexity L(σ) of σ is defined to be the length of the shortest LFSR that generates σ as the leftmost part of its output (after it is initialized to a suitable state). Prove that:
|
| A.17 | Assume that the three LFSR outputs u1, u2, u3 in the Geffe generator are uniformly distributed. Show that Pr(k = u1) = 3/4 = Pr(k = u3). Thus, partial information about the internal details of the Geffe generator is leaked out in the key stream. |
A hash function maps bit strings of any length to bit strings of a fixed length n. For practical uses, hash functions should be easy to compute, that is, computing the hash of x should be doable in time polynomial in the size of x.
Since a hash function H maps an infinite set to a finite set, there must exist pairs (x1, x2) of distinct strings with H(x1) = H(x2). Such a pair is called a collision for H. For cryptographic applications (for example, for generating digital signatures), it should be computationally infeasible to find collisions for hash functions. To elaborate this topic further we mention the following two desirable properties of hash functions used in cryptography.
|
A hash function H is called second pre-image resistant, if it is computationally infeasible[6] to find, for a given bit string x1, a second bit string x2 with H(x1) = H(x2).
|
|
A hash function H is called collision resistant, if it is computationally infeasible to find any two distinct bit strings x1 and x2 with H(x1) = H(x2). |
In order to prevent existential forgery (Exercise 5.15) of digital signatures, hash functions should also be difficult to invert.
|
An n-bit hash function H is called first pre-image resistant (or simply pre-image resistant), if it is computationally infeasible to find, for almost all bit strings y of length n, a bit string x (of any length) such that y = H(x). The qualification almost all in the last sentence was necessary, since one can compute and store the pairs (xi, H(xi)), i = 1, 2, . . . , k, for some small k and for some xi of one’s choice. If the given y turns out to be one of these hash values H(xi), a pre-image of y is easily available. |
A hash function (provably or believably) satisfying all these three properties is called a cryptographic hash function. A hash function having first and second pre-image resistance is often called a one-way hash function. Some authors require both second pre-image resistance and collision resistance to define a collision-resistant hash function, but here we stick to Definitions A.3 and A.4. In what follows, an unqualified use of the phrase hash function indicates a cryptographic hash function.
Most of the properties of a cryptographic hash function are mutually independent. However, we have the following implication.
|
A collision resistant hash function is second pre-image resistant. Proof Let H be a (non-cryptographic) hash function which is not second pre-image resistant. This means that there is an algorithm A that efficiently computes second pre-images, except perhaps for a vanishingly small fraction of inputs. Choose a random bit string x1. The probability that x1 is not a bad input to A is very high and, in that case, A outputs a second pre-image x2 quickly. This gives us an efficient randomized algorithm to compute collisions (x1, x2) for H. |
The converse of Proposition A.1 is not true: A second pre-image resistant hash function need not be collision resistant (Exercise A.19). Also collision resistance (or second pre-image resistance) does not imply first pre-image resistance (Exercise A.20), and first pre-image resistance does not imply second pre-image resistance (Exercise A.21).
A hash function may or may not be used in conjunction with a secret key. An unkeyed hash function is typically used to check the integrity of a message and is often called a modification detection code (MDC). A keyed hash function, on the other hand, is usually employed to authenticate the origin of a message (in addition to verifying the integrity of the message) and so is often called a message authentication code (MAC).
Let us now describe a generic method of constructing hash functions. We start by defining the following basic building block.
|
Let m, n be positive integers with m > n. A compression function is a map F : {0, 1}m → {0, 1}n. |
Since m > n, collisions must exist for F. For cryptographic use, collisions should be difficult to locate. We can define first and second pre-image resistance and collision resistance of compression functions as before.
|
Algorithm A.18. Hashing by a compression function
Input: A compression function F and a message x.
Output: The hash value H(x).
Steps:
Let λ be the bit length of x. |
Algorithm A.18 demonstrates how a compression function can be used to design an n-bit hash function H. The input message x is first broken into l ≥ 0 blocks each of bit length r, after padding zero bits, if necessary. The initial bit length λ of x is then stored in a new block. This implies that H cannot handle bit strings of length ≥ 2r. For a reasonably big r, this is not a practical limitation. Storing λ is necessary for several reasons. First, it ensures that the for loop is executed at least once for any message. This prevents the trivial hash value 0r (the bit string of length r containing zero bits only) for the null message. Moreover, if hi = 0r for some i, then, without the length block, we would get H(x1 ‖ . . . ‖ xl) = H(xi+1 ‖ . . . ‖ xl), which leads to a collision for H.
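The iteration can be sketched as follows. The compression function toy_F below is an insecure stand-in (a real design would use a cryptographic F); the zero-padding and the trailing length block follow the description above:

```python
def hash_md(msg_bits, F, r):
    """Chain a compression function over r-bit blocks, then over a length block."""
    lam = len(msg_bits)
    padded = msg_bits + [0] * (-lam % r)         # pad with zero bits to a block boundary
    blocks = [padded[i:i + r] for i in range(0, len(padded), r)]
    length_block = [int(b) for b in format(lam, "0{}b".format(r))]
    blocks.append(length_block)                  # store the original bit length
    h = [0] * r                                  # initial chaining value
    for x in blocks:
        h = F(h + x)                             # h_i := F(h_{i-1} || x_i)
    return h

def toy_F(bits):
    """Illustrative (insecure!) compression of 2r bits to r bits."""
    r = len(bits) // 2
    return [bits[i] ^ bits[i + r] ^ (bits[i - 1] & bits[i]) for i in range(r)]

h1 = hash_md([1, 0, 1, 1, 0], toy_F, 8)
assert len(h1) == 8                              # fixed-length output
assert h1 != hash_md([1, 0, 1, 1], toy_F, 8)     # the length block separates these inputs
```

The second assertion shows the role of the length block: both messages pad to the same data block, and only the stored length keeps their hashes apart.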
We now show that if F possesses the desired properties for use in cryptography, then so does H.
|
If F is first pre-image resistant, then so is H. Proof Assume that H is not first pre-image resistant, that is, an efficient algorithm A exists to compute x with H(x) = y for most (if not all) n-bit strings y. |
|
If F is collision resistant, then H is collision resistant (and hence also second pre-image resistant). Proof Given a collision (x, x′) for H, we can find a collision for F with little additional effort. We use the notations of Algorithm A.18 with primed variables for x′.
First consider l ≠ l′. But then, in particular, the length blocks xl+1 and
Now, suppose that
The only case that remains to be treated is |
In order to design cryptographic hash functions, it suffices to design cryptographic compression functions. Block ciphers can be used for that purpose. Let f be a block cipher with block size n and key size r. Take m := n + r and consider the map F : {0, 1}m → {0, 1}n that sends x = L ‖ R, with L of length n and R of length r, to the encrypted bit string fR(L). If the functions fR are assumed to be random permutations of {0, 1}n, the resulting compression function F possesses the desirable properties.
Several custom-designed hash functions have been popularly used by the cryptography community. MD4 and MD5 are somewhat older 128-bit hash functions. Soon after its conception, MD4 was found to be vulnerable to several attacks. Also collisions for the compression function of MD5 are known. Therefore, these two hash functions have lost the desired level of confidence for cryptographic uses.
NIST has proposed a family of four hash algorithms. These algorithms are called secure hash algorithms and have the short names SHA-1, SHA-256, SHA-384 and SHA-512, which respectively produce 160-, 256-, 384- and 512-bit hash values. No collisions for these SHA algorithms are known to date. In the rest of this section, we explain the SHA-1 algorithm. The workings of the other SHA algorithms are very similar and can be found in the FIPS document [222]. RIPEMD-160 is another popular 160-bit hash function.
SHA-1 (like the other custom-designed hash functions mentioned above) is suitable for implementation on 32-bit processors. Suppose that we want to compute the hash SHA-1(M) of a message M of bit length λ. First, M is padded to get the bit string M′ := M ‖ 1 ‖ 0k ‖ Λ, where Λ is the 64-bit representation of λ, and where k is the smallest non-negative integer for which the bit length of M′, that is, λ + 1 + k + 64, is a multiple of 512. M′ is broken into blocks M(1), M(2), . . . , M(l), each of length 512 bits. Each M(i) is represented as a collection of sixteen 32-bit words M(i)j, j = 0, 1, . . . , 15. SHA-1 uses big-endian packing, that is, M(i)0 stores the leftmost 32 bits of M(i), M(i)1 the next 32 bits, . . . , M(i)15 the rightmost 32 bits of M(i).
The SHA-1 computations are given in Algorithm A.19. One starts with a fixed initial 160-bit hash H(0). Successively for i = 1, 2, . . . , l the i-th message block M(i) is considered and the previous hash value H(i–1) is updated to H(i). At the end of the loop the 160-bit string H(l) is returned as SHA-1(M). Each H(i) is represented by five 32-bit words H(i)j, j = 0, 1, 2, 3, 4. Here also, big-endian notation is used, that is, H(i)0 stores the leftmost 32 bits of H(i), . . . , H(i)4 the rightmost 32 bits of H(i).
The updating procedure uses logical functions fj. Here, product (like xy) implies bit-wise AND, bar (as in x̄) denotes bit-wise complementation and ⊕ denotes bit-wise XOR, each on 32-bit operands. The notation LRk(z) (resp. RRk(z)) stands for a left (resp. right) rotation, that is, a cyclic left (resp. right) shift, of the bit string z of length 32 by k positions.
The bits of H(i) are well-defined transformations of the bits of H(i–1) under the guidance of the bits of M(i). The good amount of non-linearity, introduced by the functions fj and the modulo 232 sums, makes it difficult to invert the transformation H(i–1) ↦ H(i) and thereby makes SHA-1 an (apparently) secure hash function.
|
Algorithm A.19. The SHA-1 algorithm
Input: A message M of bit length λ.
Output: The hash SHA-1(M) of M.
Steps:
Generate the message blocks M(i), i = 1, 2, . . . , l. |
A test vector for SHA-1 is the following (here 616263 is the string “abc”):
SHA-1(616263) = a9993e364706816aba3e25717850c26c9cd0d89d.
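This test vector is easy to confirm with Python’s standard hashlib module:

```python
import hashlib

# "abc" is the byte string 616263 in hexadecimal
digest = hashlib.sha1(b"abc").hexdigest()
assert digest == "a9993e364706816aba3e25717850c26c9cd0d89d"
```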
| A.18 | Let x be a bit string. Break up x into blocks x1, . . . , xl each of bit size n (after padding, if necessary). Define H1(x) := x1 ⊕ . . . ⊕ xl. Show that H1 possesses none of the desirable properties of a cryptographic hash function. |
| A.19 | Let H be an n-bit cryptographic hash function and S a finite set of strings with #S ≥ 2. Define the function H2(x) := 0n+1 if x ∈ S, and H2(x) := 1 ‖ H(x) otherwise. Here, 0n+1 refers to a bit string of length n + 1 containing zero-bits only. Show that H2 is second pre-image resistant, but not collision resistant. [H]
|
| A.20 | Let H be an n-bit cryptographic hash function. Show that the function H3 defined as H3(x) := 1 ‖ x if x is of length n, and H3(x) := 0 ‖ H(x) otherwise, is collision resistant (and hence second pre-image resistant), but not first pre-image resistant. [H]
|
| A.21 | Let m be a product of two (unknown) big primes and let the binary representation of m (with leading one-bit) have n bits. Assume that it is computationally infeasible to compute square roots modulo m. We can identify bit strings with integers in a natural way. For a bit string x, take y := 1 ‖ x and let H4(x) denote the n-bit binary representation of y2 (mod m). Show that H4 is first pre-image resistant, but not second pre-image resistant (and hence not collision-resistant). [H] |
| A.22 | Let H be an n-bit cryptographic hash function. Assume that H produces random hash values on random input strings. Prove that O(2n/2) hash values need to be computed to detect a collision for H with high probability. [H] Deduce also that nearly 2n–1 hash values need to be computed on an average to obtain a second pre-image x′ of H(x). |
| A.23 | Let F : {0, 1}m → {0, 1}n be a collision resistant compression function.
|
| A.24 |
|
| A.25 | Assume that in the SHA-1 algorithm the designers opted for Algorithm A.19 with the following minor modifications: They defined fj as fj(x, y, z) := x ⊕ y ⊕ z for all j, and they replaced all costly mod 232 addition operations (+) by cheap bit-wise XOR operations (⊕). Do you sense anything wrong with this design? [H]
|
| B.1 | Introduction |
| B.2 | Security Issues in a Sensor Network |
| B.3 | The Basic Bootstrapping Framework |
| B.4 | The Basic Random Key Predistribution Scheme |
| B.5 | Random Pairwise Scheme |
| B.6 | Polynomial-pool-based Key Predistribution |
| B.7 | Matrix-based Key Predistribution |
| B.8 | Location-aware Key Predistribution |
One of the keys to happiness is a bad memory.
—Rita Mae Brown
That theory is worthless. It isn’t even wrong!
—Wolfgang Pauli
You’re only as sick as your secrets.
—Anonymous
Public-key cryptography is not a solution to every security problem. Asymmetric routines are bulky and slow, and, in practice, augment symmetric cryptography by eliminating the need for prior secret establishment of keys between communicating parties. On a workstation of today’s computing technology, this is an interesting and acceptable breakthrough. A 1 GHz processor runs one public-key encryption or key-exchange primitive in tens to hundreds of milliseconds, using at least hundreds of kilobytes of memory. That is reasonable for most applications, given that the routines are invoked rather infrequently.
Now, imagine a situation, where many tiny computing nodes, called sensor nodes, are scattered in an area for the purpose of sensing some data and transmitting the data to nearby base stations for further processing. This transmission is done by short-range radio communications. The base stations are assumed to be computationally well-equipped, but the sensor nodes are resource-starved. Such networks of sensor nodes are used in many important applications including tracking of objects in an enemy’s area for military purposes and scientific, engineering and medical explorations like wildlife monitoring, distributed seismic measurement, pollution tracking, monitoring fire and nuclear power plants and tracking patients. In some cases, mostly for military and medical applications, data collected by sensor nodes need to be encrypted before transmitting to neighbouring nodes and base stations.
Evidently one has to resort to symmetric-key cryptography in order to meet the security needs in a sensor network. Appendix B provides an overview of some key exchange schemes suitable for sensor networks.
Several issues make secure communication in sensor networks different from that in usual networks:
Each sensor node contains a primitive processor featuring very low computing speed and only a small amount of programmable memory. The popular Atmel ATmega 128L, as an example, is an 8-bit 4 MHz RISC processor with only 128 kbytes of programmable flash memory (and a few kbytes of RAM). The processor has no hardware support for dividing integers. One requires tens of minutes to several hours for performing a single RSA or Diffie–Hellman exponentiation at cryptographic key sizes.
Each sensor node is battery-powered and is expected to operate for only a few days. Once the deployed sensor nodes die, it becomes necessary to add fresh nodes to the network to continue the data collection operation. This calls for dynamic management of security objects (like keys).
Sensor nodes communicate with each other and the base stations by wireless radio transmission at low bandwidth and over small communication ranges. For the Atmel ATmega 128L processor, the maximum bandwidth is 40 kbps, and the communication range is at most 100 feet (30 m).
Moreover, the deployment area may have irregularities (like physical obstacles) that further limit the communication abilities of the nodes. One, therefore, expects that a deployed sensor node can directly communicate with only a few other nodes in the network.
A sensor network is vulnerable to capture of nodes by the enemy. The captured nodes may be physically destroyed or utilized to send misleading signals and/or disrupt the normal activity of the network. As a result, no node should have full trust in the nodes with which it communicates. The relevant security goal in this context is that the captured nodes should not divulge to the enemy enough secrets to jeopardize the communication among the uncaptured nodes.
In many situations (like scattering of nodes from airplanes or trucks), the post-deployment configuration of the sensor network is not known a priori. It is unreasonable to use security algorithms that have strong dependence on locations of nodes in the network. For example, each sensor node u is expected to have only a few neighbours with which it can directly communicate. This is precisely the set of nodes with which u needs to share keys. However, the list cannot be determined before the actual deployment. An approximate knowledge of the locations of the nodes may strengthen the protocols, but robustness for handling run-time variations must be built in the protocols.
Sensor nodes may be static or mobile. Mobile nodes change the network configurations (like the lists of neighbours) as functions of time and call for time-varying security tools.
Still, sensor nodes need to communicate secretly. The clear impracticality of using public-key routines forces one to use symmetric ciphers. But setting up symmetric keys among communicating nodes is a difficult task. The number n of nodes in a sensor network can range up to several hundred thousand. Storing a symmetric key for each pair of nodes is impossible, since that requires each sensor to have a memory large enough to store n – 1 keys. On the other extreme, every communication may use a single network-wide symmetric key. In that case the capture of a single node makes communication over the entire network completely insecure.
The plot thickens. There are graceful ways out. A host of algorithms has been recently proposed to address key establishment issues in sensor networks. In the rest of this appendix, we provide a quick survey of these tools. For the sake of simplicity, we assume here that our sensor network is static, that is, the nodes have no (or negligibly small) mobility. Though the schemes described below may be adapted to mobile networks, the required modifications are not necessarily easy and the current literature does not seem to be ready to take mobility into account.
We continue to deal with sensor processors of the capability of Atmel ATmega 128L. In practice, better processors (with speed, storage and cost roughly one order of magnitude higher) are available. We assume that the size (number of nodes) n of a sensor network is (usually) not bigger than a million, and also that a sensor node has of the order of 100 neighbours in its communication range.
Key establishment in a sensor network is effected by a three-stage process called bootstrapping. Subsequent node-to-node communication uses the keys established during the bootstrapping phase. The three stages of bootstrapping are as follows:
This step is carried out before the deployment of the sensors. A key set-up server chooses a pool K of randomly generated keys and assigns to each sensor node ui a subset Ki of K. The set Ki is called the key ring of the node ui. The key predistribution algorithms essentially differ in the ways the sets K and Ki are selected. Each key in K is associated with an ID that need not be kept secret and can even be transmitted in plaintext. Similarly, each sensor node is given a unique ID which need not be maintained secretly.
Immediately after deployment, each sensor node tries to determine all other sensor nodes with which it can communicate directly and secretly. Two nodes that are within the communication ranges of one another are called physical neighbours, whereas two nodes sharing one (or more) key(s) in their key rings are called key neighbours. Two nodes can secretly (and directly) communicate with one another if and only if they are both physical and key neighbours; let us call such pairs direct neighbours.
In the direct key establishment phase, each sensor node u locates its direct neighbours. To that end u broadcasts its own ID and the IDs of the keys in its key ring. Each physical neighbour v of u responds by mentioning the matching key IDs, if any, stored in the key ring of v. This is how u identifies its direct neighbours.
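A toy simulation of random key predistribution and direct key establishment might look as follows; the pool size, ring size, and analytic check below are illustrative choices, not parameters of any particular scheme:

```python
import random
from math import comb

random.seed(1)                                  # deterministic toy run

POOL, RING = 100, 15
pool_ids = range(POOL)

def key_ring(rng):
    """Key predistribution: each node receives a random subset of the key pool."""
    return set(rng.sample(pool_ids, RING))

def shared_keys(ring_u, ring_v):
    """Direct key establishment: nodes compare their (public) key IDs."""
    return ring_u & ring_v

# Probability that two nodes share at least one key:
# p' = 1 - C(POOL - RING, RING) / C(POOL, RING)
p_shared = 1 - comb(POOL - RING, RING) / comb(POOL, RING)

rings = [key_ring(random) for _ in range(200)]
pairs = [(i, j) for i in range(200) for j in range(i + 1, 200)]
observed = sum(1 for i, j in pairs if shared_keys(rings[i], rings[j])) / len(pairs)
assert abs(observed - p_shared) < 0.1           # simulation tracks the analytic value
```

The analytic expression counts the rings of the second node that avoid the first node’s ring entirely; it reappears later as the local connectivity p′.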
If sending unencrypted key IDs can be a potential threat to the security of the network, each node u can instead encrypt some plaintext message m with the keys in its ring and broadcast the corresponding ciphertexts. Those physical neighbours of u that can decrypt one of the transmitted ciphertexts using one of the keys in their respective key rings establish themselves as direct neighbours of u.
This is an optional stage and, if executed, adds to the connectivity of the network. Suppose that two physical neighbours u and v fail to establish a direct link between them in the direct key establishment phase. But there exists a path u = u0, u1, u2, . . . , uh–1, uh = v in the network with each ui a direct neighbour of ui+1 (for i = 0, 1, . . . , h – 1). The node u then generates a random key k, encrypts k with the key shared between u and u1 and sends the encrypted key to u1. Subsequently, u1 retrieves k by decryption, encrypts k by the key shared by u1 and u2 and sends this encrypted version of k to u2. This process is repeated until the key k reaches the desired destination v. Now, u and v can communicate secretly and directly using k and thereby become direct neighbours.
The main difficulty in this process is the discovery of a path between u and v. This can be achieved by u initiating a message reflecting its desire to communicate with v. Let u1 be a direct neighbour of u. If u1 is also a direct neighbour of v, a path between u and v is discovered. Else u1 retransmits u’s request to the direct neighbours u2 of u1. This process is repeated, until a path is established between u and v, or the number of hops exceeds a certain limit. Note that path discovery may incur substantial communication overhead and so the maximum number h of hops allowed needs to be fixed at a not-so-big value. Typically, the values h = 2, 3 are recommended.
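Hop-limited path discovery over the direct-neighbour graph is essentially a bounded breadth-first search; the graph below is a made-up example in which u and v have no shared key but are connected through intermediaries:

```python
from collections import deque

def find_path(graph, u, v, max_hops):
    """Breadth-first search from u to v, abandoning paths longer than max_hops edges."""
    queue = deque([(u, [u])])
    seen = {u}
    while queue:
        node, path = queue.popleft()
        if node == v:
            return path
        if len(path) > max_hops:                # hop budget exhausted
            continue
        for w in graph.get(node, ()):
            if w not in seen:
                seen.add(w)
                queue.append((w, path + [w]))
    return None                                 # no path within the hop limit

# Direct-neighbour graph: u reaches v only through a and b (3 hops)
graph = {"u": ["a"], "a": ["u", "b"], "b": ["a", "v"], "v": ["b"]}
assert find_path(graph, "u", "v", max_hops=3) == ["u", "a", "b", "v"]
assert find_path(graph, "u", "v", max_hops=2) is None
```

The second assertion mirrors the recommendation to cap h at a small value: with h = 2 this particular path key cannot be established, trading connectivity for lower communication overhead.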
A bootstrapping algorithm, or more precisely, a key predistribution algorithm must fulfill the following requirements. These requirements often turn out to be mutually contradictory. A key predistribution scheme attempts to achieve suitable trade-offs among them.
Each key ring should be small enough to fit in a sensor node’s memory. Typically 50–200 cryptographic keys (say, 128-bit keys of block ciphers) can be stored in each processor. That number is between n – 1 (a key for each pair) and 1 (a master key for the entire network).
The key rings in different nodes are to be randomly chosen from a big pool, so that there is not too much overlap between the rings of two nodes.
The resulting network should be connected in the sense that the undirected graph G = (V, E), with V comprising the nodes in the network and E containing a link (u, v) if and only if u and v are direct neighbours, must be connected (or at least connected with high probability).
Ideally, the capture of any number of nodes must not divulge the secret key(s) between uncaptured direct neighbours. Practically, the fraction of communication links among uncaptured nodes, that are compromised because of node captures, must be small, at least as long as the fraction of nodes that are captured is not too high.
Arbitrarily (but not impractically) big networks should be supported.
One should allow new nodes to join the network at any time after the initial deployment, for example, to replenish captured, faulty and dead nodes.
Additional requirements may also be conceived of in order to take curative measures against active attacks and/or faults. However, a study of active attacks and of countermeasures against those is beyond the scope of our treatment here.
There should be a mechanism to detect the presence and identities of dead, malfunctioning and rogue nodes. Here, a rogue node refers to a captured node that is used by the enemy to disrupt the normal working of the network. Active attacks mountable by the enemy include transmission of unauthorized and misleading data across the network, keeping neighbours always busy so that they run out of battery sooner than their expected lifetime (the sleep deprivation attack), and so on.
Faulty and rogue nodes must be pruned out of the network before they can cause sizeable harm.
Captured nodes can be replicated and the copies deployed by the enemy with the intention that these added nodes outnumber the legitimate nodes and eventually take control of the network. There should be a strategy to detect and cure replication of malicious nodes.
We now concentrate on some concrete realizations of the bootstrapping scheme. The optional third stage (path key establishment) will often be excluded from our discussion, because this stage involves few algorithm-specific issues.
Before we introduce specific algorithms, let us summarize the notations we are going to use in the rest of this chapter:
n  = number of nodes in the sensor network
n′ = (expected) number of nodes in the physical neighbourhood of each node
d  = degree of connectivity of each node in the key/direct neighbourhood graph
Pc = global connectivity (a high probability like 0.9999)
p′ = local connectivity (probability that two physical neighbours share a key)
M  = size of the key pool
m  = size of the key ring of each node (in number of cryptographic keys)
Fq = the underlying field for the poly-pool and the matrix-pool schemes
S  = size of the polynomial (or matrix) pool
s  = number of polynomial (or matrix) shares in the key ring of each node
t  = degree of a polynomial (or dimension of a matrix)
c  = number of nodes captured
Pe = probability of successful eavesdropping, expressed as a function of c
The paper [88] by Eschenauer and Gligor is pioneering research on bootstrapping in sensor networks. Their scheme, henceforth referred to as the EG scheme, is essentially the basic bootstrapping method just described.
The key set-up server starts with a pool 𝒦 of randomly generated keys. The number M of keys in 𝒦 is taken to be a small multiple of the network size n. For each sensor node u to be deployed, a random subset of m keys from 𝒦 is selected and given to u as its key ring. Upon deployment, each node discovers its direct neighbours as specified in the generic description. We now explain how the parameters M and m are to be chosen so as to make the resulting network connected with high probability.
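The predistribution and shared-key discovery just described can be sketched as follows. This is an illustrative simulation under assumed identifiers (key IDs as pool indices, 128-bit keys as integers), not the EG implementation itself.

```python
import random

def make_pool(M):
    """Key set-up server: a pool of M randomly generated 128-bit keys,
    indexed by key IDs 0..M-1."""
    return {kid: random.getrandbits(128) for kid in range(M)}

def assign_ring(pool, m):
    """Load a key ring with a random m-subset of the pool; key IDs are
    retained so neighbours can discover overlaps."""
    ids = random.sample(sorted(pool), m)
    return {kid: pool[kid] for kid in ids}

def shared_key(ring_u, ring_v):
    """Direct key establishment: the smallest shared key ID (an arbitrary
    tie-breaking rule assumed here) yields the link key, if any."""
    common = sorted(set(ring_u) & set(ring_v))
    return ring_u[common[0]] if common else None
```

Two physical neighbours compare key IDs (for instance, by broadcasting them) and, if their rings overlap, both arrive at the same link key.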
Let us first look at the key neighbourhood graph Gkey on the n sensor nodes, in which a link exists between two nodes if and only if these nodes are key neighbours. Let p denote the probability that a link exists between two randomly selected nodes of this graph. A result on random graphs due to Erdős and Rényi indicates that if the link probability is written as p = (ln n + a)/n for a real constant a, then, in the limit n → ∞, the probability that Gkey is connected is

Equation B.1

    Pc = e^(−e^(−a)).
We fix Pc at a high value, say, 0.9999, solve Equation (B.1) for a = −ln(−ln Pc), and express the expected degree d = p(n − 1) of each node in Gkey as

Equation B.2

    d = ((n − 1)/n) (ln n − ln(−ln Pc)).
In practice, we should also bring physical neighbourhood into consideration and look at the direct neighbourhood graph G = Gdirect on the n deployed sensor nodes. In this graph, two nodes are connected by an edge if and only if they are direct neighbours. G is not random, since it depends on the geographical distribution of the nodes in the deployment area. However, we assume that the above result for random graphs continues to hold for G too. In particular, we fix the degree of direct connectivity of each node to be (at least) d and require

Equation B.3

    p′ ≥ d/n′,
where n′ denotes the expected number of physical neighbours of each node, and where p′ is the probability that two physical neighbours u and v share one or more keys in their key rings Ku and Kv. (Pc is often called the global connectivity and p′ the local connectivity.)
For the determination of p′, we first note that there is a total of C(M, m) key rings of size m that can be chosen from the pool 𝒦 of size M. (Here and in what follows, C(a, b) denotes the binomial coefficient a choose b.) For a fixed ring Ki, the total number of ways of choosing a ring Kj such that Kj does not share a key with Ki is equal to the number of ways of choosing m keys from 𝒦 \ Ki. This number is C(M − m, m). It then follows that

Equation B.4

    p′ = 1 − C(M − m, m)/C(M, m).
Equations (B.2), (B.3) and (B.4) dictate how the key-pool size M is to be chosen, given the values of n, n′ and m.
Example B.1 As a specific numerical example, consider a sensor network with n = 10,000 nodes. For the desired probability Pc = 0.9999 of connectedness of Gkey, we use Equation (B.2) to obtain the desired degree d as d ≥ 18.419. Let us take d = 20. Now, suppose that the expected number of physical neighbours of each deployed node is n′ = 50. By Equation (B.3), we then require p′ = d/n′ = 0.4. Finally, assume that each sensor can hold m = 150 keys in its memory. Equation (B.4) indicates that we should have M ≤ 44,195 in order to ensure p′ ≥ 0.4. In particular, we may take M = 40,000.
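The arithmetic of this example can be reproduced directly from Equations (B.2) and (B.4). The following sketch (function names are ours) checks the degree bound and the local connectivity for the chosen parameters.

```python
import math

def required_degree(n, Pc):
    """Equation (B.2): d = ((n-1)/n) * (ln n - ln(-ln Pc))."""
    return ((n - 1) / n) * (math.log(n) - math.log(-math.log(Pc)))

def local_connectivity(M, m):
    """Equation (B.4): p' = 1 - C(M-m, m) / C(M, m).
    Python integers are exact, so the huge binomials divide cleanly."""
    return 1 - math.comb(M - m, m) / math.comb(M, m)
```

For n = 10,000 and Pc = 0.9999 this gives d ≈ 18.419, and for M = 40,000, m = 150 a local connectivity comfortably above the required 0.4.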
Let us now study the resilience of the EG scheme against node captures. Assume that c nodes are captured at random from the network and that u and v are two uncaptured nodes that are direct neighbours. We compute the probability Pe that an eavesdropper can decipher encrypted communication between u and v based on the knowledge of the keys available from the c captured key rings. Clearly, smaller values of Pe indicate higher resilience against node captures.
Suppose that u and v use the key k for communication between them. Then, Pe is equal to the probability that k resides in one of the key rings of the c captured nodes. Since each key ring consists of m keys randomly chosen from a pool of M keys, the probability that a particular key k is not available in a key ring is 1 − m/M, and consequently the probability that k does not appear in any of the c compromised key rings is (1 − m/M)^c. Thus, the probability of successful eavesdropping is

    Pe = 1 − (1 − m/M)^c.
Example B.2 As in Example B.1, take n = 10,000, n′ = 50, m = 150 and M = 40,000. If c = 100 nodes are captured, the fraction of compromised communication is Pe ≈ 0.313. Thus, a capture of only 100 nodes leads to a compromise of about one-third of the traffic. That is not a satisfactory figure. We need better algorithms.
Chan et al. [44] propose several modifications of the basic EG scheme in order to improve upon the resilience of the network against node capture. The q-composite scheme, henceforth abbreviated as the qC scheme, is based on the requirement of a bigger overlap of key rings for enabling nodes to communicate.
As in the EG scheme, the key set-up server decides a pool 𝒦 of M random keys and loads the key ring of each node with a random subset of 𝒦 of size m. Let the network consist of n nodes.
In the direct key establishment phase, each node u discovers all its physical neighbours that share q or more keys with u, where q is a predetermined system-wide parameter. Those physical neighbours that do so are now called direct neighbours of u. Let v be a direct neighbour of u and let q′ ≥ q be the actual number of keys shared by u and v. Call these keys k1, k2, . . . , kq′. The nodes use the key
k := H(k1‖k2‖ · · · ‖kq′)
for future communication, where ‖ denotes string concatenation and H is a hash function. A pair of physical neighbours sharing fewer than q predistributed keys do not communicate directly.
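The derivation k := H(k1‖k2‖· · ·‖kq′) can be sketched as below. The choice of SHA-256 for H, and the convention of sorting the shared keys so that both nodes concatenate them in the same order, are our assumptions; the text leaves H and the ordering unspecified.

```python
import hashlib

def q_composite_key(shared_keys):
    """k := H(k1 || k2 || ... || kq'), with the shared keys sorted so
    that both endpoints derive the same link key."""
    h = hashlib.sha256()          # H assumed to be SHA-256
    for k in sorted(shared_keys):
        h.update(k)               # streaming concatenation
    return h.digest()
```

Both direct neighbours hold the same set of q′ shared keys, so after sorting they feed identical input to H and obtain the same k.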
Recall that for the basic EG scheme, q = 1 and the key k for communication between direct neighbours is taken to be one shared key instead of a hash value of all shared keys. The motivation behind the qC scheme is that requiring a bigger overlap between the key rings of a pair of physical neighbours leads to a smaller probability Pe of successful eavesdropping, since the eavesdropper now has to possess at least q shared keys (not just one). However, the requirement of q (or more) matching keys between communicating nodes restricts the key-pool size M more than in the EG scheme, and consequently a capture of fewer nodes reveals a bigger fraction of the total key pool 𝒦 to the eavesdropper. Chan et al. [44] report that the best trade-off is achieved for the value q = 2 or 3.
Let us now derive the explicit expressions for M and Pe. Equations (B.1), (B.2) and (B.3) hold for the qC scheme with the sole exception that the interpretation of the probability p′ of direct neighbourhood is now different. There is a total of C(M, m)² ways of choosing an ordered pair of key rings of size m from a pool of M keys. Let us compute the probability p(r) that such a pair of key rings shares exactly r keys. First, the r shared keys can be chosen in C(M, r) ways. Out of the remaining M − r keys, the remaining m − r keys for the first ring can be chosen in C(M − r, m − r) ways. Finally, the remaining m − r keys for the second ring can be chosen in C(M − m, m − r) ways from the M − m keys not present in the first ring. Thus, we have

    p(r) = C(M, r) C(M − r, m − r) C(M − m, m − r) / C(M, m)²,

and

    p′ = 1 − (p(0) + p(1) + · · · + p(q − 1))

is the equivalent of Equation (B.4) for the qC scheme.
Example B.3 As in Example B.1, consider n = 10,000, n′ = 50, m = 150. For d = 20, we require p′ ≥ 0.4. This, in turn, demands M ≤ 16,387 for q = 2 and M ≤ 9,864 for q = 3. Compare these with the requirement M ≤ 44,195 for the EG scheme.
Let us now calculate the probability Pe of successfully deciphering the communication between two uncaptured nodes u and v, given that c nodes are already captured by the eavesdropper. Let q′ ≥ q be the actual number of keys shared by u and v; given that u and v are direct neighbours, this happens with probability p(q′)/p′. Each of these common keys is available to the eavesdropper with probability 1 − (1 − m/M)^c. It follows that

    Pe = Σ_{q′=q}^{m} (1 − (1 − m/M)^c)^{q′} (p(q′)/p′).
Example B.4 Let us continue with the network of Examples B.1, B.2 and B.3. The following table summarizes the probabilities Pe for various values of c. For the EG scheme, we take M = 40,000, whereas for the qC scheme, we take M = 16,000 for q = 2 and M = 9,800 for q = 3.
This table indicates that when the number of captured nodes is small, the qC scheme outperforms the EG scheme. However, for large values of c, the effects of the smaller key-pool size show up, leading to a poorer performance of the qC schemes compared to the EG scheme.
Another way to improve the resilience of the network against node captures is the multi-path key reinforcement scheme, also proposed by Chan et al. [44]. As in the EG scheme, sensor nodes are deployed, each with m keys in its key ring chosen randomly from a pool of M keys. Let u and v establish themselves as direct neighbours sharing the key k. Instead of using k itself as the key for future communication, the nodes try to locate several pairwise node-disjoint paths between them. Such a path u = v0, v1, . . . , vl = v consists of pairs of direct neighbours (vi, vi+1) for i = 0, . . . , l – 1. A randomly generated key is then routed securely along each such path from u to v.
Assume that r node-disjoint paths between u and v are discovered and the random keys k1, k2, . . . , kr are transferred securely along these paths. The nodes u and v then use the key

    k′ := k ⊕ k1 ⊕ k2 ⊕ · · · ⊕ kr.
The reason why this scheme improves resilience against node captures is that even if the original key k resides in the memory of a captured node, the new key k′ is computable by the adversary if and only if she can obtain all of the r session secrets k1, k2, . . . , kr. The bigger r is, the more difficult it is for the adversary to eavesdrop on all of the r node-disjoint paths. On the other hand, if the lengths of these paths are large, the probability of eavesdropping at some link of the paths increases. Moreover, increasing the lengths of the paths incurs a bigger communication overhead. The proponents of the scheme recommend only 2-hop multi-path key reinforcement.
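The XOR combination k′ = k ⊕ k1 ⊕ · · · ⊕ kr can be sketched as below (function name assumed). Because XOR is commutative and self-inverse, both endpoints derive the same k′ regardless of the order in which the session secrets arrive.

```python
import os
from functools import reduce

def reinforce(k, session_secrets):
    """Multi-path key reinforcement: k' = k XOR k1 XOR ... XOR kr.
    All keys are assumed to be byte strings of equal length."""
    xor = lambda a, b: bytes(x ^ y for x, y in zip(a, b))
    return reduce(xor, session_secrets, k)
```

Reinforcing twice with the same secrets recovers k, reflecting that an adversary holding k still needs every one of the r session secrets to obtain k′.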
We do not go into the details of the analysis of the multi-path key reinforcement scheme, but refer the reader to Chan et al. [44]. We only note that though it is possible to use multi-path key reinforcement with the q-composite scheme, it is not an attractive option: the smaller size of the key pool in the q-composite scheme tends to nullify the effects of multi-path key reinforcement.
A pairwise key predistribution scheme offers perfect resilience against node captures; that is, the capture of any number c of nodes does not reveal any information about the secrets used by uncaptured nodes. This corresponds to Pe = 0 irrespective of c. This desirable property is achieved by placing each key in the key rings of exactly two nodes. Moreover, the sharing of a key k between two unique nodes u and v implies that these nodes can authenticate themselves to one another: no other node possesses k, so no other node can pass itself off as u to v or as v to u.
Pairwise keys can be distributed to nodes in many ways. Here, we deal with random distribution. Let m denote the size of the key ring of each sensor node. For each node u in the network, the key set-up server randomly selects m other nodes v1, . . . , vm and distributes a new random key ki to each of the pairs (u, vi) for i = 1, . . . , m. This distribution mechanism should also ensure that two nodes u, v in the network share at most one key. If k is given to u and v, the set-up server also attaches the ID of v to the copy of k in the key ring of u, and the ID of u to the copy of k in the key ring of v.
In the direct key establishment phase, each node u broadcasts its own ID. Each physical neighbour v of u that finds the ID of u stored against a key in its key ring identifies u as a direct neighbour, together with the unique key shared by u and v.
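The server-side distribution just described can be sketched as follows. This is a simplified greedy version under our own naming: each node is paired with up to m partners, and the key-plus-ID convention is modelled by storing the key under the partner's ID.

```python
import random

def predistribute_pairwise(n, m):
    """Random pairwise predistribution: rings[u][v] is the key shared
    by the unique pair (u, v); it equals rings[v][u] by construction."""
    rings = {u: {} for u in range(n)}
    for u in range(n):
        # candidate partners: distinct, not already paired, ring not full
        candidates = [v for v in range(n)
                      if v != u and v not in rings[u] and len(rings[v]) < m]
        random.shuffle(candidates)
        for v in candidates[:m - len(rings[u])]:
            k = random.getrandbits(128)
            rings[u][v] = k       # (k, ID of v) in u's ring
            rings[v][u] = k       # (k, ID of u) in v's ring
    return rings
```

Direct key establishment then amounts to a dictionary lookup: when u broadcasts its ID, a neighbour v checks `rings[v].get(u)`.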
The analysis of the random pairwise scheme is a bit tricky. Here, the global key neighbourhood graph Gkey is m-regular, that is, each node has degree exactly m, and we cannot expect to maintain this degree locally too. On the other hand, it is reasonable to assume under a random deployment model that the fraction of nodes with which a given node shares pairwise keys remains the same both locally and globally. More precisely, we equate p′ with p, that is,

Equation B.5

    d/n′ = m/(n − 1).
Here, d denotes the desired local degree of a node. Equation (B.2) gives the formula for d in terms of the global connectivity Pc. For Pc = 0.9999, we have d = 16.11 for n = 1,000, d = 18.42 for n = 10,000, d = 20.72 for n = 100,000, and d = 23.03 for n = 1,000,000. That is, the value of d does not depend heavily on n, as long as n ranges over practical values. In particular, one may fix d = 20 (or d = 25 more conservatively) for all applications.
Equation (B.5) implies

    n = (n′ m / d) + 1 ≈ n′ m / d.

This equation reflects the drawback of the random pairwise scheme. The value of m is limited by the memory of a sensor node, n′ is dictated by the density of nodes in the deployment area, and d can be taken as a constant; so the network size n is bounded above by the quantity

    n′ m / d,

called the maximum supportable network size. The basic scheme (and its variants) supports networks of arbitrarily large sizes, whereas the random pairwise scheme offers only limited support.
Example B.5 Take m = 150, n′ = 50 and d = 20. The maximum supportable network size is then n′m/d = (50 × 150)/20 = 375.
Since m and d are limited by hard constraints, the only way to increase the maximum supportable network size is to increase the effective size n′ of the physical neighbourhood of a node. The multi-hop range extension strategy accomplishes that. In the direct key establishment phase, each node u broadcasts its ID. Each physical neighbour v of u re-broadcasts the ID of u. Each physical neighbour w of v then re-re-broadcasts the ID of u. This process is continued for a predetermined number r of hops. Any node u′ reachable from u in ≤ r hops and sharing a pairwise key with u can now establish a path of secure communication with u. During a future communication between u and u′, the intermediate nodes in the path simply forward a message encrypted by the pairwise key between u and u′. Using r hops thereby increases the effective radius of physical neighbourhood by a factor of r, and consequently the number of effective neighbours of each node gets multiplied by a factor of r². Thus, the maximum supportable network size now becomes

    r² n′ m / d.
For r = 3 and the parameters of Example B.5, this size attains the more respectable value of 3375.
Increasing r incurs some cost. First, the communication overhead increases quadratically with r. Second, since intermediate nodes in a multi-hop path simply retransmit messages without authentication, chances of specific active attacks at these nodes increase. Large values of r are, therefore, discouraged.
Liu and Ning’s polynomial-pool-based key predistribution scheme (abbreviated as the poly-pool scheme) [181, 183] is based on the idea presented by Blundo et al. [28]. Let Fq be a finite field with q just large enough to accommodate a symmetric encryption key. For a 128-bit block cipher, one may take q to be the smallest prime larger than 2^128 (prime field) or 2^128 itself (extension field of characteristic 2). Let f(X, Y) ∈ Fq[X, Y] be a bivariate polynomial that is assumed to be symmetric, that is, f(X, Y) = f(Y, X). Let t be the degree of f in each of X and Y. A polynomial share of f is a univariate polynomial f(α)(X) := f(X, α) for some element α ∈ Fq. Two shares f(α) and f(β) of the same polynomial f satisfy

Equation B.6

    f(α)(β) = f(β, α) = f(α, β) = f(β)(α).
Thus, if the shares f(α) and f(β) are given to two nodes, they can come up with the common value f(α, β) = f(β, α) as a shared secret between them.
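The agreement of Equation (B.6) is easy to demonstrate concretely. The sketch below works over a prime field whose modulus is our own stand-in (a 61-bit Mersenne prime rather than a 128-bit field), with assumed function names; only the symmetry of the coefficient matrix matters.

```python
import random

P = (1 << 61) - 1   # a Mersenne prime standing in for the field order q (assumption)

def random_symmetric_poly(t):
    """Coefficient matrix c[i][j] = c[j][i] guarantees f(X, Y) = f(Y, X)."""
    c = [[0] * (t + 1) for _ in range(t + 1)]
    for i in range(t + 1):
        for j in range(i, t + 1):
            c[i][j] = c[j][i] = random.randrange(P)
    return c

def share(c, alpha):
    """The share f^(alpha)(X) = f(X, alpha): collapse Y at Y = alpha,
    returning coefficients of a univariate polynomial in X."""
    t = len(c) - 1
    return [sum(c[i][j] * pow(alpha, j, P) for j in range(t + 1)) % P
            for i in range(t + 1)]

def evaluate(coeffs, x):
    return sum(a * pow(x, i, P) for i, a in enumerate(coeffs)) % P
```

Evaluating node α's share at β and node β's share at α yields the same field element f(α, β), the pairwise secret.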
Given t + 1 or more shares of f, one can reconstruct f(X, Y) uniquely using Lagrange’s interpolation formula (Exercise 2.53). On the other hand, if only t or fewer shares are available, there are many (at least q) possibilities for f, and it is impossible to determine f uniquely. So the disclosure of up to t shares does not reveal the polynomial f to an adversary, and uncompromised shared keys based on f remain secure.
Using a single polynomial for the entire network is not a good idea, since t is limited by the memory constraints of a sensor node. In order to increase resilience against node captures, many bivariate polynomials need to be used, with shares of random subsets of this polynomial pool assigned to the key rings of individual nodes. This is how the poly-pool scheme works. If the degree t equals 0, the scheme degenerates to the EG scheme.
The key set-up server first selects a random pool ℱ of S symmetric bivariate polynomials in Fq[X, Y], each of degree t in both X and Y. IDs α ∈ Fq are also generated for the nodes in the network. For each node u in the network, s polynomials f1, f2, . . . , fs are randomly picked from ℱ, and the polynomial shares f1(X, α), f2(X, α), . . . , fs(X, α) are loaded in the key ring of u, where α is the ID of u. Each key ring now requires space for storing s(t + 1) log₂ q bits, that is, for storing m := s(t + 1) symmetric keys.
Upon deployment, each node u broadcasts the IDs of the polynomials whose shares reside in its key ring. Each physical neighbour v of u that has a share of some common polynomial establishes itself as a direct neighbour of u. The exact pairwise key k between u and v is then calculated using Equation (B.6). If broadcasting polynomial IDs in plaintext is deemed too unsafe, each node u can instead send messages encrypted by the potential pairwise keys based on its polynomial shares. Those physical neighbours that can decrypt one of these encrypted messages have shares of common polynomials.
Like the EG scheme, the poly-pool scheme can be analysed in the framework of random graphs. Equations (B.1), (B.2) and (B.3) continue to hold under the poly-pool scheme. However, in this case the local connection probability p′ is computed as

Equation B.7

    p′ = 1 − C(S − s, s)/C(S, s).
Given constraints on the network and the nodes, the desired size S of the polynomial pool ℱ can be determined from this formula.
Let us now compute the probability Pe of compromise of communication between two uncaptured nodes u and v as a function of the number c of captured nodes. If c ≤ t, the eavesdropper cannot gather enough polynomial shares to learn anything about any polynomial in ℱ, that is, Pe = 0. So assume that c > t, and let pr denote the probability that exactly r shares of a given polynomial f (say, the one whose shares are used by the two uncaptured nodes u and v) are available in the key rings of the c captured nodes. The probability that a share of f is present in a key ring is s/S, and so (by the binomial distribution)

Equation B.8

    pr = C(c, r) (s/S)^r (1 − s/S)^(c−r).
Since t + 1 or more shares of f are required for the determination of f, we have

Equation B.9

    Pe = pt+1 + pt+2 + · · · + pc = 1 − (p0 + p1 + · · · + pt).
Example B.6 Let n = 10,000 (network size), n′ = 50 (expected size of the physical neighbourhood of a node), m = 150 (key-ring size in number of symmetric keys) and Pc = 0.9999 (global connectivity). Let us choose bivariate polynomials of degree t = 49, so that each key ring can hold s = 3 polynomial shares. For the determination of S, we first compute d = 20 as in Example B.1, so that p′ ≥ 0.4 is required. By Equation (B.7), this demands S ≤ 20, so we take S = 20. The following table lists the probability Pe for various values of c.
The table shows the substantial improvement in resilience against node capture achieved by the poly-pool scheme over the EG and qC schemes.
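Tables like the one referenced here can be regenerated directly from Equations (B.8) and (B.9); a short sketch (function name assumed):

```python
from math import comb

def poly_pool_pe(c, s, S, t):
    """Equations (B.8)/(B.9):
       p_r = C(c,r) (s/S)^r (1 - s/S)^(c-r),
       Pe  = sum of p_r for r = t+1 .. c  (zero whenever c <= t)."""
    p = s / S
    return sum(comb(c, r) * p**r * (1 - p)**(c - r)
               for r in range(t + 1, c + 1))
```

For c ≤ t the sum is empty and Pe = 0, and Pe grows monotonically with the number of captured nodes, as the analysis predicts.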
The poly-pool scheme can be made pairwise by allowing no more than t + 1 shares of any polynomial to be distributed among the nodes. The best that the adversary can achieve is a capture of nodes with all these t + 1 shares and a subsequent determination of the corresponding bivariate polynomial. But this knowledge does not help the adversary, since no other node in the network uses a share of this compromised polynomial. That is, two uncaptured nodes continue to communicate with perfect secrecy.
However, like the random pairwise scheme, the pairwise poly-pool scheme suffers from the drawback that the maximum supportable network size is now limited by the quantity S(t + 1)/s. For the parameters of Example B.6, this size turns out to be an impractically low 333.
The grid-based key predistribution considerably enhances the resilience of the network against node captures. To start with, let us play a bit with Example B.6.
Example B.7 Take n = 10,000, n′ = 50 and m = 150. We calculated that the optimal value of S that keeps the network connected with high probability is S = 20. Now, let us instead take a much bigger value of S, say, S = 200. First, let us look at the brighter side of this choice. The probability Pe is listed in the following table as a function of c.
That is a dramatic improvement in the resilience figures. It, however, comes at a cost. The optimal value S = 20 was selected in Example B.6 in order to achieve the desired connectivity in the network. With S = 200, the probability p′ reduces from 0.404 to about 0.045.
The grid-based key predistribution allocates polynomial shares cleverly to the nodes so as to achieve the resilience figures of the last example with a reasonable guarantee that the resulting network remains connected. Let n be the size of the network, and take σ = ⌈√n⌉. For the sake of simplicity, let us assume that n = σ². The n nodes are then placed on a σ × σ square grid. The node at the (i, j)-th grid location (where i, j ∈ {0, 1, . . . , σ − 1}) is identified by the pair (i, j). The set-up server generates 2σ random symmetric bivariate polynomials f_1^r, . . . , f_σ^r, f_1^c, . . . , f_σ^c, each of degree t in both X and Y. The i-th polynomial f_i^r corresponds to the i-th row, and the j-th polynomial f_j^c to the j-th column in the grid. The key ring of the node at location (i, j) in the grid is given the two polynomial shares f_i^r(X, j) and f_j^c(X, i). The memory required for this is equivalent to the storage for 2(t + 1) symmetric keys.
Now, look at the key establishment phase. Let two nodes u and v with IDs (i, j) and (i′, j′) be physical neighbours after deployment. First, consider the simple case i = i′. Both nodes hold shares of the row polynomial f_i^r and can arrive at the common secret value f_i^r(j, j′) = f_i^r(j′, j) using the column identities of one another. Similarly, if j = j′, the nodes can compute the shared secret f_j^c(i, i′). It follows that each node can establish keys directly with 2(σ − 1) other nodes in the network. That is, however, a truly small fraction of the entire network.
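The grid allocation and same-row/same-column key agreement can be sketched as follows. The field modulus and all names are our own stand-ins; the point is only that node (i, j) holds f_i^r(X, j) and f_j^c(X, i).

```python
import random

P = (1 << 61) - 1   # prime standing in for the field order q (assumption)

def sym_poly(t):
    """Random symmetric bivariate polynomial as a coefficient matrix."""
    c = [[0] * (t + 1) for _ in range(t + 1)]
    for i in range(t + 1):
        for j in range(i, t + 1):
            c[i][j] = c[j][i] = random.randrange(P)
    return c

def poly_share(c, alpha):
    """f(X, alpha) as a coefficient list in X."""
    return [sum(cij * pow(alpha, j, P) for j, cij in enumerate(row)) % P
            for row in c]

def ev(coeffs, x):
    return sum(a * pow(x, i, P) for i, a in enumerate(coeffs)) % P

def deploy(sigma, t):
    """Grid-based predistribution on a sigma x sigma grid:
    node (i, j) stores the shares f_i^r(X, j) and f_j^c(X, i)."""
    rows = [sym_poly(t) for _ in range(sigma)]
    cols = [sym_poly(t) for _ in range(sigma)]
    return {(i, j): (poly_share(rows[i], j), poly_share(cols[j], i))
            for i in range(sigma) for j in range(sigma)}
```

Two nodes in row i evaluate their row shares at each other's column index and obtain the same secret f_i^r(j, j′); the column case is symmetric.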
Assume now that i ≠ i′ and j ≠ j′. If the node w with identity either (i, j′) or (i′, j) is in the physical neighbourhood of both u and v, then there is a secure link between u and w, and also one between w and v. The nodes u and v can then establish a path key via the intermediate node w.
So suppose also that neither (i, j′) nor (i′, j) resides in the communication ranges of both u and v. Consider the nodes w1 := (i, k) and w2 := (i′, k) for some column index k ≠ j, j′. Suppose further that w1 is in the physical neighbourhood of u, w2 in that of w1, and v in that of w2. Then there is a secure u, v-path comprising the links u → w1, w1 → w2 and w2 → v. Similarly, the nodes (k, j) and (k, j′) for each k ≠ i, i′ can help u and v establish a path key. To sum up, there are 2(σ − 2) potential three-hop paths between u and v.
If all these three-hop paths fail, one may go for four-hop, five-hop, . . . paths, but at the cost of increased communication overhead. As argued in Liu and Ning [181, 183], exploring paths with ≤ 3 hops is expected to give the network high connectivity.
For the grid-based scheme, we have S = 2σ (the pool size) and s = 2 (the number of polynomial shares in each node’s key ring), so s/S = 1/σ. Thus, the probability Pe can now be derived like Equations (B.8) and (B.9) as

    Pe = 1 − (p0 + p1 + · · · + pt) = pt+1 + pt+2 + · · · + pc,

where

    pr = C(c, r) (1/σ)^r (1 − 1/σ)^(c−r).
Example B.8 Take n = 10,000 and m = 150. Since each node has to store only two polynomial shares, we can now take t = 74. Moreover, σ = 100, that is, the size of the polynomial pool is S = 200. The probability Pe can now be tabulated as a function of c (the number of nodes captured) as follows:
This is a very pretty performance. The capture of even 60 per cent of the nodes leads to a compromise of only 3.34 per cent of the communication among uncaptured nodes.
This robustness of the grid-based distribution comes at a cost, though. The path key establishment stage is communication-intensive and is mandatory for ensuring good connectivity. Moreover, this stage is based on the assumption that not many nodes are captured during bootstrapping. If this assumption cannot be enforced, the scheme forfeits much of its expected resilience guarantee.
The matrix-based key predistribution scheme is derived from an idea proposed by Blom [25]. It is similar to polynomial-based key predistribution, but employs symmetric matrices in place of symmetric polynomials. Let Fq be a finite field with q just large enough to accommodate a symmetric key, and let G be a t × n matrix over Fq, where t is determined by the memory of a sensor node and n is the number of nodes in the network. It is not required to keep G secret; anybody, even the enemy, may know G. We only require G to have rank t, that is, any t columns of G must be linearly independent. If g is a primitive element of Fq (and n < q), the following matrix is recommended:

Equation B.10

    G = [ 1        1           1           · · ·   1
          g        g²          g³          · · ·   g^n
          g²       (g²)²       (g³)²       · · ·   (g^n)²
          ·        ·           ·                   ·
          g^(t−1)  (g²)^(t−1)  (g³)^(t−1)  · · ·   (g^n)^(t−1) ].
In a memory-starved environment, this G has a compact representation, since its j-th column is uniquely identified by the value g^j. The remaining elements in the column can be easily computed by performing a few multiplications.
Let D be a secret t × t symmetric matrix, and let A be the n × t matrix defined by

    A := (DG)^T = G^T D^T = G^T D.

Finally, define the n × n matrix

    K := AG.
It follows that K^T = (AG)^T = (G^T D G)^T = G^T D^T G = G^T D G = K, that is, K is a symmetric matrix. If the (i, j)-th element of K is denoted by kij, we have kij = kji, so this common value can be used as a pairwise key between the i-th and the j-th nodes.
Let the (i, j)-th element of A be denoted by aij for 1 ≤ i ≤ n and 1 ≤ j ≤ t. Also let glj, for 1 ≤ l ≤ t and 1 ≤ j ≤ n, denote the (l, j)-th element of G. Then the pairwise key kij = kji is expressed as

    kij = Σ_{l=1}^{t} ail glj.
Thus, the i-th row of A and the j-th column of G suffice for the i-th node to compute kij. Similarly, the j-th row of A and the i-th column of G allow the j-th node to compute kji. In view of this, every node, say the i-th, is required to store the i-th row of A and the i-th column of G. If G is as in Equation (B.10), only g^i needs to be stored instead of the full i-th column of G. Thus, the storage of t + 1 elements of Fq (equivalent to t + 1 symmetric keys) suffices.
During direct key establishment, two physical neighbours exchange their respective columns of G for the computation of the common key. Since G is allowed to be public knowledge, this communication does not reveal secret information to the adversary.
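The whole construction, A = G^T D and kij = (row i of A)·(column j of G), can be sketched over a small prime field. This is a toy-scale illustration with our own names; for the column structure we use distinct evaluation points, which makes every t columns of the Vandermonde matrix linearly independent (the primitive-element choice of Equation (B.10) is one such instance).

```python
import random

P = 2_147_483_647   # prime modulus standing in for the field order q (assumption)

def vandermonde(t, n):
    """Column j is (1, x_j, x_j^2, ..., x_j^(t-1)) for distinct x_j,
    so any t columns are linearly independent (rank t)."""
    xs = [j + 1 for j in range(n)]
    return [[pow(x, i, P) for x in xs] for i in range(t)]

def blom_setup(t, n):
    """Server side: secret symmetric D (t x t), public G (t x n),
    and A = G^T D (n x t); node i keeps row i of A and column i of G."""
    D = [[0] * t for _ in range(t)]
    for i in range(t):
        for j in range(i, t):
            D[i][j] = D[j][i] = random.randrange(P)
    G = vandermonde(t, n)
    A = [[sum(G[l][i] * D[l][j] for l in range(t)) % P for j in range(t)]
         for i in range(n)]
    return A, G

def pairwise_key(A, G, i, j):
    """k_ij = (row i of A) . (column j of G) mod P."""
    return sum(A[i][l] * G[l][j] for l in range(len(G))) % P
```

Since K = G^T D G is symmetric, node i's computation of kij and node j's computation of kji agree for every pair.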
Suppose that the adversary gains knowledge of some t′ ≥ t rows of A (say, by capturing nodes). We also assume that the matrix G is completely known to the adversary. The adversary picks up any t known rows of A and constructs a t × t matrix A′ comprising these rows. But then A′ = G′D, where G′ is a suitable t × t submatrix of G. Since G is assumed to be of rank t, G′ is invertible and so the secret matrix D can be easily computed. Conversely, if D is known to the adversary, she can compute A and, in particular, any t′ ≥ t rows of A.
If only t′ < t rows of A are known to the adversary, then every choice of values for t − t′ additional rows of A yields a consistent candidate for the matrix D, and subsequently for the remaining unknown rows of A. In other words, D cannot be uniquely recovered from a knowledge of fewer than t rows of A. Guessing is infeasible too, since there is an astronomically large number of choices for assigning values to the elements of the t − t′ missing rows of A.
To sum up, the matrix-based key predistribution scheme is completely secure if fewer than t nodes are captured. On the other hand, if t or more nodes are captured, the system is completely compromised. Thus, the resilience of this scheme against node capture is determined solely by t and is independent of the size n of the network. The parameter t, in turn, is restricted by the memory of a sensor node (a node has to store t + 1 elements of Fq).
In order to overcome this difficulty, Du et al. [79] propose a matrix-pool-based scheme. Here, S matrices A1, A2, . . . , AS are computed from S pairwise different secret matrices D1, D2, . . . , DS. The same G may be used for all these key spaces. Each node is given shares (that is, rows) of s matrices randomly chosen from the pool {A1, A2, . . . , AS}. The resulting details of the matrix-pool-based scheme are quite analogous to those pertaining to the polynomial-pool-based scheme described in the earlier section, and are omitted here.
The key predistribution algorithms discussed so far are based on a random deployment model. In practice, the deployment model (like the expected location of each node and the overall geometry of the deployment area) may be known a priori. This knowledge can be effectively exploited to tune the key predistribution algorithms so as to achieve better connectivity and higher resilience against node capture. As an example, consider sensor nodes deployed from airplanes in groups or scattered uniformly from trucks. Since the approximate tracks of these vehicles are planned a priori, the key rings of the nodes can be loaded appropriately to achieve the expected performance enhancements.
Two nodes that are in the physical neighbourhoods of one another need only share a pairwise key. Therefore, the basic objective of designing location-aware schemes is to predistribute keys in such a way that two nodes that are expected to remain close in the deployment area are given common pairwise keys, whereas two nodes that are expected to be far away after deployment need not share any pairwise key. The actual deployment locations of the nodes cannot usually be predicted accurately. Nonetheless, an approximate knowledge of the locations can boost the performance of the network considerably. The smaller the errors between the expected and actual locations of the nodes are, the better a location-aware scheme is expected to perform.
Liu and Ning [182] propose a modification of the random pairwise key scheme (Section B.5) based on deployment knowledge. Let there be n sensor nodes in the network with each node capable of storing m cryptographic keys. The expected deployment location of each node is provided to the key set-up server. For each node u in the network, the server determines m other nodes whose expected locations of deployment are closest to that of u and for which pairwise keys with u have not already been established. For every such node v, a new random key kuv is generated. The key-plus-ID combination (kuv, v) is loaded in u’s key ring, whereas the pair (kuv, u) is loaded in v’s key ring.
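The server-side loading step just described can be sketched as follows (the node IDs, the 128-bit key size and the Euclidean-distance model are our own illustrative assumptions):

```python
# A minimal sketch of closest-pairwise-key predistribution in the spirit of
# Liu and Ning; not the authors' implementation.
import math, os

def predistribute(expected_loc, m):
    """expected_loc: {node_id: (x, y)}; each ring holds at most m (key, id) pairs."""
    rings = {u: {} for u in expected_loc}
    for u, (ux, uy) in expected_loc.items():
        # candidate peers: closest expected locations, no key established yet
        others = sorted((v for v in expected_loc if v != u and v not in rings[u]),
                        key=lambda v: math.dist((ux, uy), expected_loc[v]))
        for v in others:
            if len(rings[u]) >= m:
                break
            if len(rings[v]) < m:
                k_uv = os.urandom(16)          # fresh random 128-bit pairwise key
                rings[u][v] = k_uv             # (k_uv, v) into u's key ring
                rings[v][u] = k_uv             # (k_uv, u) into v's key ring
    return rings

locs = {i: (i % 4, i // 4) for i in range(12)}   # a 4x3 grid of expected spots
rings = predistribute(locs, m=4)
# every shared key is identical on both sides, and ring sizes respect the bound m
assert all(rings[v][u] == k for u in rings for v, k in rings[u].items())
assert all(len(r) <= 4 for r in rings.values())
```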
This natural and simple-minded strategy provides complete security against node capture, as it is a pairwise key distribution scheme. Moreover, there is no limit on the maximum supportable network size (under the reasonable assumption that there are far fewer than 2^l nodes in the network, where l is the bit length of a cryptographic key, say, 64 or 128). Finally, the incorporation of deployment knowledge increases the connectivity of the network. In order to analyse this gain, we first introduce some formal notation.
For the sake of simplicity, we assume that the deployment region is two-dimensional, so that every point in that region is expressed by two coordinates x and y. Let u be a sensor node whose expected deployment location is (ux, uy) and whose actual deployment location is (u′x, u′y). This corresponds to a deployment error of eu = √((u′x – ux)² + (u′y – uy)²). The actual location (u′x, u′y) (or equivalently the error eu) is modelled as a continuous random variable. The probability density function fu of (u′x, u′y) characterizes the pattern of deployment error. One possibility is to assume that (u′x, u′y) is uniformly distributed within a circle with centre at (ux, uy) and of radius ∊ called the maximum deployment error. We then have:
Equation B.11

fu(x, y) = 1/(π∊²) if (x – ux)² + (y – uy)² ≤ ∊², and fu(x, y) = 0 otherwise.
An arguably more realistic strategy is to model (u′x, u′y) as a random variable following the two-dimensional normal (Gaussian) distribution with mean (ux, uy) and variance σ². The corresponding density function is:

fu(x, y) = (1/(2πσ²)) exp(–[(x – ux)² + (y – uy)²]/(2σ²)).
Let u and v be two deployed nodes. We assume that each node has a communication range of ρ. We also make the simplifying assumption that the different nodes are deployed independently, that is, (u′x, u′y) and (v′x, v′y) are independent random variables. The probability that u and v lie in the communication ranges of one another can be expressed as a function of the expected locations (ux, uy) and (vx, vy) as:

p(u, v) = ∫C fu(x, y) fv(x′, y′) dx dy dx′ dy′.

Here, the integral is over the region C of ℝ⁴ defined by (x – x′)² + (y – y′)² ≤ ρ².
Let n′ denote the number of physical neighbours of u (or of any sensor node). We know that u shares pairwise keys with exactly m nodes. We assume that these key neighbours of u are distributed uniformly in a circle centred at u and of radius ρ′. Equating the densities of key neighbours and of physical neighbours, m/(πρ′²) = n′/(πρ²), the expected value of ρ′ is:

ρ′ = ρ √(m/n′).
Let v be a key neighbour of u. The probability that v lies in the physical neighbourhood of u is given by

p(u) = (1/(πρ′²)) ∫C′ p(u, v) dvx dvy,

where C′ is the region (vx – ux)² + (vy – uy)² ≤ ρ′². Therefore, u is expected to have m × p(u) direct neighbours. Since the size of the physical neighbourhood of u is n′, the local connectivity, that is, the probability that u can establish a pairwise key with a physical neighbour, is given by

p′ = m p(u) / n′.
In general, it is difficult to compute the above integrals. Liu and Ning [182] compute the probability p′ for the density function given by Equation (B.11) and establish that p′ ≈ 1 for small deployment errors, namely ∊ ≤ ρ. As ∊ increases, p′ gradually reduces to the corresponding probability for the random pairwise scheme.
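Although the integrals are hard in closed form, they are easy to approximate numerically. Below is a small Monte Carlo sketch of the connection probability under the uniform-disc error model of Equation (B.11); all parameter values are our own illustrative choices:

```python
# Monte Carlo estimate of the probability that two nodes with uniform-disc
# deployment error land within communication range rho of each other.
import math, random

def sample_disc(cx, cy, eps):
    """Uniform point in the disc of radius eps centred at (cx, cy)."""
    r = eps * math.sqrt(random.random())        # sqrt makes the radius area-uniform
    theta = 2 * math.pi * random.random()
    return cx + r * math.cos(theta), cy + r * math.sin(theta)

def connect_prob(u, v, eps, rho, trials=100_000):
    """Estimate Pr[actual positions of u and v are within distance rho]."""
    hits = 0
    for _ in range(trials):
        ux, uy = sample_disc(*u, eps)
        vx, vy = sample_disc(*v, eps)
        if math.dist((ux, uy), (vx, vy)) <= rho:
            hits += 1
    return hits / trials

# coinciding expected locations with eps <= rho: the nodes always connect,
# matching the observation that p' is about 1 for small deployment errors
assert connect_prob((0, 0), (0, 0), eps=1.0, rho=2.0) == 1.0
# expected locations far apart: the nodes can never connect
assert connect_prob((0, 0), (10, 0), eps=1.0, rho=2.0) == 0.0
```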
In order to add sensor nodes at a later point in time, the key set-up server again uses deployment knowledge. The key rings of the new nodes are loaded based on the expected deployment locations of these nodes and on the (expected or known) locations of the deployed nodes. Pairwise keys between the new and the deployed nodes are communicated to the deployed nodes over secure channels (routed through uncompromised nodes).
Several variants of the closest pairwise keys scheme have been proposed. Liu and Ning themselves propose an extension based on pseudorandom functions [182]. Du et al. propose a variant of the basic (EG) scheme based on a specific model of deployment [80]. We end this section by briefly outlining a location-aware adaptation of the polynomial-pool-based scheme (Section B.6).
For simplicity, let us assume that the deployment region is a rectangular area. This region is partitioned into a 2-dimensional array of rectangular cells. Let the partition consist of R rows and C columns. The cell located at the i-th row and the j-th column is denoted by Ci,j. The neighbours of the cell Ci,j are taken to be the four adjacent cells: Ci–1,j, Ci+1,j, Ci,j–1, Ci,j+1.
The key set-up server first decides on a finite field Fq with q just big enough to accommodate a cryptographic key. The server also chooses R × C random symmetric bivariate polynomials fi,j(X, Y) over Fq for 1 ≤ i ≤ R, 1 ≤ j ≤ C. The polynomial fi,j is meant for the cell Ci,j. The degree t (in both X and Y) of each fi,j is so chosen that each sensor node has sufficient memory to store the shares of five such polynomials.
Let u be a node to be deployed and let the expected deployment location of u lie in the cell Ci,j called the home cell of u. The key ring of u is loaded with the shares (evaluated at u) of the five polynomials corresponding to the home cell and its four neighbouring cells. More precisely, u gets the five shares: fi,j(X, u), fi–1,j(X, u), fi+1,j(X, u), fi,j–1(X, u), and fi,j+1(X, u). The set-up server also stores in u’s memory the ID (i, j) of its home cell.
In the direct key establishment phase, each node u broadcasts the ID (i, j) of its home cell (or some messages encrypted by potential pairwise keys). Those physical neighbours whose home cells are either the same as or neighbouring to that of u can establish pairwise keys with u.
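A toy sketch of the share-loading and key-derivation steps is given below (the field size, degree, grid dimensions and node IDs are illustrative; only one of the five shares per node is shown):

```python
# Sketch of the location-aware polynomial-pool scheme: each cell (i, j) owns a
# random symmetric bivariate polynomial f_ij over GF(q); a node stores the
# univariate shares f(X, u) of its home cell and the four adjacent cells.
import random

q = 2**13 - 1                 # small prime standing in for the key field size
t = 3                         # polynomial degree, set by the memory budget

def sym_poly():
    """Random symmetric coefficient matrix: f(X, Y) = sum c[i][j] X^i Y^j."""
    c = [[0] * (t + 1) for _ in range(t + 1)]
    for i in range(t + 1):
        for j in range(i, t + 1):
            c[i][j] = c[j][i] = random.randrange(q)
    return c

def share(c, u):
    """Coefficients of the share f(X, u): coefficient of X^i is sum_j c[i][j] u^j."""
    return [sum(c[i][j] * pow(u, j, q) for j in range(t + 1)) % q
            for i in range(t + 1)]

def key(sh, v):
    """Evaluate a stored share at the peer's ID, giving f(v, u)."""
    return sum(sh[i] * pow(v, i, q) for i in range(t + 1)) % q

R, C = 4, 4
pool = {(i, j): sym_poly() for i in range(R) for j in range(C)}
u, v = 17, 42                 # two node IDs whose home cells share cell (1, 1)
sh_u = share(pool[(1, 1)], u) # u stores f_{1,1}(X, u) among its five shares
sh_v = share(pool[(1, 1)], v) # v's home cell is adjacent, so v also holds f_{1,1}
assert key(sh_u, v) == key(sh_v, u)   # f(v, u) = f(u, v): a common pairwise key
```

The symmetry f(u, v) = f(v, u) is exactly what lets two nodes holding shares of the same cell polynomial agree on a key without further interaction.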
An analysis of the performance of this location-aware poly-pool-based scheme can be carried out along similar lines to the closest pairwise scheme. We leave out the details here and refer the reader to Liu and Ning [182].
| C.1 | Introduction |
| C.2 | Provably Difficult Computational Problems Are Not Suitable |
| C.3 | One-way Functions and the Complexity Class UP |
. . . complexity turns out to be most elusive precisely where it would be most welcome.
—C. H. Papadimitriou [229]
Real knowledge is to know the extent of one’s ignorance.
—Confucius
The complex develops out of the simple.
—Colin Wilson
It is worthwhile to ask why public-key cryptography must be based on problems that are only believed to be difficult. Complexity theory suggests concrete examples of provably intractable problems. This appendix provides a brief conceptual explanation of why these provably difficult problems cannot be used for building cryptographic protocols. We may consequently conclude that, at present, we cannot prove a public-key cryptosystem to be secure. That is bad news, but we have to live with it.
Here, we make no attempt to furnish definitions of formal complexity classes. The excellent books by Papadimitriou [229] and by Sipser [280] can be consulted for that purpose. Here is a list of the complexity classes that we require for our discussion. The relationships between these classes are depicted in Figure C.1. All the containments shown in this figure are conjectured to be proper. With an abuse of notation, we identify functional problems with decision problems.
| Class | Brief description |
|---|---|
| P | Languages accepted by deterministic polynomial-time Turing machines |
| NP | Languages accepted by non-deterministic polynomial-time Turing machines |
| coNP | Complements of languages in NP |
| UP | Languages accepted by unambiguous polynomial-time Turing machines |
| PSPACE | Languages accepted by polynomial-space Turing machines |
| EXPTIME | Languages accepted by deterministic exponential-time Turing machines |
| EXPSPACE | Languages accepted by exponential-space Turing machines |

The P =? NP problem, arguably the deepest unsolved problem in theoretical computer science, may be suspected to have some bearing on public-key cryptography. Under the assumption that P ≠ NP, one may feel tempted to use NP-complete problems for building secure cryptosystems. Unfortunately, this temptation does not prove to be fruitful. Several cryptosystems based on NP-complete problems have been broken, and that is not really a surprise.
It may be the case that P = NP, and, if so, all NP-complete problems are solvable in polynomial time. One may, therefore, be advised to select problems that lie outside NP, that is, in strictly bigger complexity classes. By the time and space hierarchy theorems, we have P ⊊ EXPTIME and PSPACE ⊊ EXPSPACE. Both EXPTIME and EXPSPACE have complete problems. An EXPTIME-complete problem cannot be solved in polynomial time, whereas an EXPSPACE-complete problem cannot be solved in polynomial space, nor, therefore, in polynomial time. How about using these complete problems for designing cryptosystems? The idea may sound interesting, but these provably exponential problems turn out to be even poorer candidates, perhaps irrelevant, for use in cryptography.
Let fe and fd be the encryption and decryption transforms for a public-key cryptosystem. We assume that the set of plaintext messages and the set of ciphertext messages are both finite. (Public-key cryptosystems are like block ciphers in this respect.) Moreover, since a ciphertext c = fe(m, e) is computable in polynomial time, the length of c is bounded by a polynomial in the length of m. An intruder can non-deterministically guess messages m (from the finite space) and check if c = fe(m, e) to validate the correctness of the guess. It, therefore, follows that deciphering a ciphertext message (with no additional information) is a problem in NP. That is the reason why we should not look beyond NP.
However, the full class NP, in particular, the most difficult (that is, complete) problems in NP, may be irrelevant for cryptography, as we argue in the next section. In other words, for building cryptosystems we expect to effectively exploit problems that are believed to be easier than NP-complete ones. Both the integer factoring and the discrete logarithm problems are in the class NP ∩ coNP. We have P ⊆ NP ∩ coNP, and it is widely believed that this containment is proper. Also, NP ∩ coNP is not known (nor expected) to have complete problems. Even if P ≠ NP ∩ coNP, the factoring and the discrete logarithm problems need not be outside P, since we are unlikely to produce completeness proofs for them. Only historical evidence supports the belief that these two problems are difficult. The situation may change tomorrow. Complexity theory does not offer any formal protection.
| C.1 | Prove that the primality-testing problem PRIME is in NP ∩ coNP. (Remark: The AKS algorithm is a deterministic poly-time primality-testing algorithm, and therefore PRIME is in P and so trivially in NP ∩ coNP too. It can, however, be independently proved that primes have succinct certificates.) |
| C.2 | Consider the decision version of the integer factorization problem:
|
| C.3 | Let G be a finite cyclic multiplicative group with a generator g. Assume that one can compute products in G in polynomial time. Consider the decision version of the discrete log problem in G:
Here, indices (indg a) are assumed to lie between 0 and (#G) – 1.
|
Any public-key encryption function behaves like a one-way function: easy to compute but difficult to invert.

Let Σ be an alphabet (a finite set of symbols). One may assume, without loss of generality, that Σ = {0, 1}. Let Σ* denote the set of all strings over Σ. A function f : Σ* → Σ* is called a one-way function if it satisfies the following properties.

1. f is injective.
2. There exists a constant k such that |α|^(1/k) ≤ |f(α)| ≤ |α|^k for all α ∈ Σ*.
3. f(α) can be computed in time polynomial in the length of α.
4. f^(–1) cannot be computed in polynomial time, that is, there is no polynomial-time algorithm that, given β ∈ Σ*, computes the α ∈ Σ* (if it exists) with f(α) = β.

Property (1) ensures unique decryption. Property (2) implies that the length of f(α) is polynomially bounded both above and below by the length of α. Property (3) suggests ease of encryption, whereas Property (4) suggests difficulty of decryption.
We do not know whether one-way functions exist at all. The functions in the following example are strongly suspected to be one-way. However, we do not seem to have any clue about how to prove these functions to be one-way.

Example C.1 The following functions are conjectured to be one-way.

1. Multiplication of two primes: f(p, q) = pq. Inverting f is the integer factorization problem.
2. Modular exponentiation: f(x) = g^x in a suitable finite group. Inverting f is the discrete logarithm problem.
3. The RSA encryption function: f(m) ≡ m^e (mod n) for an RSA modulus n and encryption exponent e.
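One such candidate, modular exponentiation, can be illustrated at toy scale: the forward direction is polynomial-time (square-and-multiply), while the inversion sketched here is brute force (the prime p and the base g below are our own small choices; g = 3 is not verified to generate the whole group):

```python
# Toy illustration of a candidate one-way function x -> g^x mod p.
# The asymmetry is only suggestive at this size.
p, g = 2**13 - 1, 3           # 8191 is prime; 3 is an illustrative base

def forward(x):
    return pow(g, x, p)       # square-and-multiply: polynomial time

def invert(y):
    """Brute-force discrete log: exponential in the bit length of p."""
    acc = 1
    for x in range(p - 1):
        if acc == y:
            return x
        acc = acc * g % p
    raise ValueError("y is not in the subgroup generated by g")

y = forward(4321)
x = invert(y)
assert forward(x) == y        # a valid preimage was recovered by exhaustion
```

At cryptographic sizes (say, a 2048-bit p) the forward computation remains cheap while the exhaustive loop becomes utterly infeasible; that gap is precisely the conjectured one-wayness.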
It is evident that if P = NP, there cannot exist one-way functions. The converse is not known to hold, that is, even if P ≠ NP, one-way functions may fail to exist.
A non-deterministic Turing machine which has at most one accepting branch of computation for every input string is called an unambiguous Turing machine. The class of languages accepted by poly-time unambiguous Turing machines is denoted by UP.
Clearly, P ⊆ UP ⊆ NP. Both containments are believed to be proper. The importance of the class UP stems from the following result:

There exists a one-way function if and only if P ≠ UP.
Therefore, it is the P =? UP question that is relevant for cryptography, and not the P =? NP question. The class UP is not known (nor expected) to have complete problems, so locating a one-way function may be a difficult task. But at least we are now on the right track.[2] Complexity theory has helped us shift our attention from NP (or bigger classes) to UP.
[2] Well, hopefully!
In order to use a one-way function f for cryptographic purposes, we require additional properties of f. Computing f^(–1) must be difficult for an intruder, whereas the same computation ought to be easy for the legitimate recipient. Thus, f must support poly-time inversion, provided that some secret piece of information (the trapdoor) is available during the computation of the inverse. A one-way function with a trapdoor is called a trapdoor one-way function.
The first two functions of Example C.1 do not have obvious trapdoors and so cannot be straightaway used for designing cryptosystems. The third function (RSA encryption) has the requisite trapdoor, namely, the decryption exponent d satisfying ed ≡ 1 (mod φ(n)).
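The RSA trapdoor can be exhibited with toy parameters (the primes below are far too small for any real use; they only show the mechanics):

```python
# Toy RSA trapdoor one-way function: anyone can compute m -> m^e mod n, but
# inversion is easy only with the trapdoor d satisfying e*d = 1 (mod phi(n)).
p, q = 1009, 1013
n, phi = p * q, (p - 1) * (q - 1)
e = 17                                    # gcd(e, phi) = 1
d = pow(e, -1, phi)                       # the trapdoor exponent

def f(m):
    return pow(m, e, n)                   # public, easy direction

def f_inv(c):
    return pow(c, d, n)                   # easy only given the trapdoor d

m = 123456
assert f_inv(f(m)) == m                   # (m^e)^d = m (mod n) by Euler's theorem
```

Without d, inverting f at this size is still easy by factoring n; the point is that for properly sized primes the trapdoor is the only known shortcut.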
The hunt for a theoretical foundation does not end here; it begins. Most of complexity theory deals with worst-case complexities of problems, rather than their average or expected complexities. A one-way function, even if it exists, may be difficult to invert for only a few instances, whereas cryptography demands that the inversion problem be difficult for most instances. A function meeting even this cryptographic demand need not be suitable, since there may be reductions that map hard instances to easy instances. Moreover, the trapdoors themselves may inject vulnerabilities and prepare room for quick attacks.
There still remains a long way to go!
| C.4 | Let f : Σ* → Σ* be a function with the property that f(f(α)) = f(α) for every α ∈ Σ*. Argue that f is not a one-way function.
|
| C.5 | Design unambiguous polynomial time Turing machines for computing the inverses of the functions described in Example C.1. |
| C.6 | Show that if there exists a bijective one-way function, then NP ∩ coNP ≠ P. [H] |
The greatest thing in family life is to take a hint when a hint is intended and not to take a hint when a hint isn’t intended.
—Robert Frost
Teachers open the door, but you must enter by yourself.
—Chinese Proverb
Imagination grows by exercise, and contrary to common belief, is more powerful in the mature than in the young.
—W. Somerset Maugham
| 2.11 (a) | Apply Theorem 2.3 to the restriction to H of the canonical homomorphism G → G/K. |
| 2.11 (b) | Apply Theorem 2.3 to the canonical homomorphism G/H → G/K, aH ↦ aK, .
|
| 2.14 (c) | Consider the canonical surjection G → G/H. |
| 2.17 (a) | Let i ≠ j and . Then ord g divides both and and so is equal to 1, that is, g = e. Now let hi, and with . But then . Thus #(HiHj) = (#Hi)(#Hj). Generalize this argument to show that #(H1 · · · Hr) = n.
|
| 2.18 | First consider the special case #G = pr for some and . For each , the order ordG g is of the form psg for some sg ≤ r. Let s be the maximum of the values sg, . Take any element with ordG h = ps. Then e, h, . . . , hps–1 are all the elements x that satisfy xps = e. But by the choice of s every element satisfies xps = e. Hence we must have s = r. This proves the assertion for the special case. For the general case, use this special case in conjunction with Exercise 2.17.
|
| 2.19 (b) | Show that , (h1, . . . , hr) ↦ h1 . . . hr, is a group isomorphism.
|
| 2.23 | Use Zorn’s lemma. |
| 2.24 (c) | Let be the intersection of all prime ideals of R. First show that . To prove the reverse inclusion take and consider the set S of all non-unit ideals of R such that for all . If f is a non-unit, the set S is non-empty and by Zorn’s lemma has a maximal element, say . Show that is a prime ideal of R.
|
| 2.25 | For , the map R → R, b ↦ ab, is injective and hence surjective by Exercise 2.4.
|
| 2.30 | Apply the isomorphism theorem to the canonical surjection , .
|
| 2.33 | [(1)⇒(2)] Let be an ascending chain of ideals of R. Consider the ideal which is finitely generated by hypothesis.
[(3)⇒(1)] Let |
| 2.36 | Use the pigeon-hole principle: If there are n + 1 pigeons in n holes, then there exists at least one hole containing more than one pigeon. |
| 2.37 | Consider the integer t satisfying 2^t ≤ n < 2^(t+1).
|
| 2.39 (e) | 1² ≡ (n – 1)² (mod n). |
| 2.39 (f) | Apply Wilson’s theorem. |
| 2.40 | Use Fermat’s little theorem. |
| 2.41 | Use Wilson’s theorem or Euler’s criterion. |
| 2.45 | Reduce to the case y2 ≡ α (mod p). |
| 2.49 (a) | Consider the canonical group homomorphism and the fact that a surjective group homomorphism from a cyclic group G onto G′ implies that G′ is cyclic.
|
| 2.49 (b) | Let a be a primitive element modulo p. The residue class of a modulo p^e has order k(p – 1) for some k. Show that the order of b := p + 1 modulo p^e is p^(e–1). So the order of a^k b modulo p^e is p^(e–1)(p – 1) = φ(p^e).
|
| 2.50 | Use the Chinese remainder theorem in conjunction with Exercises 2.20 and 2.49. |
| 2.53 | Take . The interpolating polynomial is . Use Exercise 2.52 to establish the uniqueness.
|
| 2.56 (b) | is irreducible in if and only if f(X + 1) is irreducible in .
|
| 2.58 | Use the fundamental theorem of algebra. |
| 2.63 | Consider the set of all linearly independent subsets of V that contain T. Show that every chain in has an upper bound in . By Zorn’s Lemma, there exists a maximal element . Show that S generates V.
|
| 2.64 (b) | Use Exercise 2.63. |
| 2.68 | Let p1, . . . , pn be n distinct primes. Take and ai := a/pi for i = 1, . . . , n.
|
| 2.72 (a) | If N is the -submodule of generated by ai/bi, i = 1, . . . , n, with gcd(ai, bi) = 1, then for any prime p that does not divide b1 · · · bn we have 1/p ∉ N.
|
| 2.72 (b) | Any two distinct elements of are linearly dependent over . Now use Exercise 2.69.
|
| 2.74 (b) | Let the conjugates of over F be α1 = α, α2, . . . , αn. Since is injective, it follows from (a) that makes a permutation of α1, . . . , αn. So is surjective.
|
| 2.75 (a) | Use Exercise 2.61. |
| 2.76 (b) | The if part follows from Exercise 2.61. For proving the only if part, take . If the polynomial f(X) := Xp – a splits over F, we are done. So suppose that there exists an irreducible divisor of f(X) of degree ≥ 2. By the separability of F, there exist two distinct roots α, β of g(X). Let K := F (α, β). Show that the Frobenius map , , is an endomorphism of K. Also there exists a field isomorphism τ : F (α) → F (β) which fixes F element-wise and takes α ↦ β. But then . Since any field homomorphism is injective, α equals β, a contradiction. Thus no g(X) chosen as above can exist.
|
| 2.77 (a) | Let be an irreducible polynomial with g(α) = 0 for some . Let β be another root of g. We show that . By Lemma 2.5, there is an isomorphism μ : F(α) → F(β). Clearly, K is the splitting field of f over F(α). Let K′ be the splitting field of μ*(f) over F (β). By Proposition 2.33, K ≅ K′. If are the roots of f, then K′ ≅ F (β, γ1, . . . , γd) = K(β). But then K ≅ K(β).
|
| 2.78 (a) | Consider transcendental numbers. |
| 2.78 (b) | Let . For , we have , implying that for a, with a ≤ b. Now assume for some . Choose a rational number b with . Then , a contradiction. Thus . Similarly .
|
| 2.80 | Use the binomial theorem and induction on n. |
| 2.82 | Follow the proof of Theorem 2.37. |
| 2.90 | Example 2.18. |
| 2.91 (b) | By the fundamental theorem of Galois theory, # . Now show that are distinct -automorphisms of .
|
| 2.92 (a) | Assume r > 1. We have the extensions , where is the splitting field of f over and hence over . Consider the minimal polynomial of a root of f over . Conversely, let f be reducible over . Choose an irreducible factor of f with deg h = s < d. Now h has one (and hence all) roots in and, therefore, d|sm.
|
| 2.93 | Use Corollary 2.18. |
| 2.98 | In each case, the defining polynomial is quadratic in Y (and with coefficients in K[X]). If this polynomial admits a non-trivial factorization, one can reach a contradiction by considering the degrees of X in the coefficients of Y1 and Y0. |
| 2.103 | For simplicity, consider the case char K ≠ 2, 3. Show that the curves Y2 + Y = X3 and Y2 = X3 + X have j-invariants 0 and 1728 respectively. Finally, if , 1728, then the curve has j-invariant . One must also argue that these are actually elliptic curves, that is, have non-zero discriminants.
|
| 2.111 | Use Theorem 2.51. |
| 2.112 (a) | Pair a point with its opposite. This pairing fails for points of orders 1 and 2. |
| 2.112 (c) | Consider the elliptic curve E : Y2 = X3 + 3 over . We have , whereas X3 + 3 is irreducible modulo 13.
|
| 2.113 (a) | Every element of has a unique square root.
|
| 2.115 (a) | Use Theorem 2.49 or Exercise 2.17. |
| 2.115 (b) | Use Theorem 2.50. |
| 2.115 (c) | The trace of Frobenius at q is 0 in this case. Now, use Theorem 2.50. |
| 2.123 | Factor N(G) in .
|
| 2.127 | Let . For each i, write , . But then det , where , δij being the Kronecker delta.
|
| 2.128 (b) | Use Part (a) and Exercise 2.126(c). |
| 2.128 (c) | Let . By Exercise 2.130, is integral over . Let be the ideal generated by in and let and be the ideals of generated respectively by and . Now, use Part (b).
|
| 2.133 (b) | In a PID, non-zero prime ideals are maximal. |
| 2.137 (a) | Since and are maximal, we have , that is, a1 + a2 = 1 for some and . Now use the fact that (a1 + a2)e1 + e2 = 1.
|
| 2.137 (b) | Use CRT. |
| 2.138 (a) | Since is invertible, for some fractional ideal .
|
| 2.140 (a) | For , let constitute a complete residue system of modulo . Then also form a complete residue system of modulo .
|
| 2.142 (d) | Take in Part (b).
|
| 2.143 (a) | Reduce modulo 4. |
| 2.143 (c) | Let divide this gcd. Then divides 2y and . Take norms.
|
| 2.144 (b) | Look at the expansion of a – 1 in base p. More precisely, let a < p^N for some N. Then –a = (p^N – a) – p^N = [(p^N – 1) – (a – 1)] – p^N.
|
| 2.152 (c) | First show that .
|
| 2.153 | Use unique factorization of rationals. |
| 2.154 | Show by induction on n that pn+1 divides apn+1 – apn in for all .
|
| 2.161 | There exists an irreducible polynomial in of every degree .
|
| 3.7 | The implication is obvious. For the reverse implication, use Proposition 2.5.
|
| 3.18 (b) | Consider the binary expansion of m. |
| 3.19 | If n is a pseudoprime to base a and not a pseudoprime to base b, then n is not a pseudoprime to base ab. |
| 3.20 (a) | If p2|n for some , take with ordn(a) = p. If n is square-free, consider a prime divisor p of n and take with and a ≡ 1 (mod n/p).
|
| 3.20 (b) | If n is an Euler pseudoprime to base a and not an Euler pseudoprime to base b, then n is not an Euler pseudoprime to base ab. |
| 3.21 (a) | Let be the prime factorization of n with r and each αi in . Then, . For odd pi, the group is cyclic of order and hence contains an element of order pi – 1.
|
| 3.21 (b) | ordn(–1) = 2. |
| 3.21 (c) | Let vp(n) ≥ 2 for some odd prime p. Construct an element with ordn(a) = p.
|
| 3.28 | Proceed by induction on i = 1, . . . , r. For 1 ≤ i ≤ r, define νi := n1 · · · ni and let be a solution of the congruences bi ≡ aj (mod nj) for j = 1, . . . , i. If i < r, use the combining formula given in Section 2.5 to find such that bi+1 ≡ bi (mod νi) and bi+1 ≡ ai+1 (mod ni+1).
|
| 3.31 | Apply Newton’s iteration to compute a zero of x2 – n. |
| 3.32 (a) | Apply Newton’s iteration to compute a zero of xk – n. |
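The Newton iteration suggested in these two hints, specialized to integer arithmetic, can be sketched as follows (our own illustrative implementation):

```python
def iroot(n, k=2):
    """Floor of the k-th root of n >= 0 via an integer Newton iteration."""
    if n < 2:
        return n
    x = 1 << (n.bit_length() // k + 1)        # initial overestimate of n^(1/k)
    while True:
        # Newton step for x^k - n:  x <- x - (x^k - n)/(k x^(k-1)), in integers
        y = ((k - 1) * x + n // x**(k - 1)) // k
        if y >= x:                            # the sequence stops decreasing
            return x
        x = y

assert iroot(10**18) == 10**9                 # exact square root
assert iroot(2**60 + 1, 3) == 2**20           # floor of a cube root
```

Starting from an overestimate, the integer Newton sequence decreases monotonically and stabilizes exactly at the floor of the root, so no floating-point precision issues arise even for very large n.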
| 3.34 (b) | The updating d(X) := d(X) – Xi–sb(X) needs to consider only the non-zero words of b. |
| 3.36 (b) | First consider b = 0 and note that the roots of X(q–1)/2 – 1 (resp. X(q–1)/2 + 1) are all the quadratic residues (resp. non-residues) of .
|
| 3.36 (c) | First consider b = 0. |
| 3.40 | For , we have ord(a)|m and for each i = 1, . . . , r the multiplicity vpi (ord(a)) is the smallest of the non-negative integers k satisfying .
|
| 3.41 (a) | Use the CRT. |
| 3.43 (a) | Use the CRT and the fact that for an odd prime r ≡ 3 (mod 4).
|
| 4.1 (a) | Using the CRT, reduce to the case that n is prime. Then is bijective ⇔ the restriction is bijective. Now, if gcd(a, φ(n)) = 1, the inverse of is given by , where ab ≡ 1 (mod φ(n)). On the other hand, if q is a prime divisor of gcd(a, φ(n)), choose an element with ord(y) = q. But then ya ≡ 1 (mod n), that is, is not injective. This exercise provides the foundation for the RSA cryptosystems.
|
| 4.1 (b) | In view of the CRT, reduce to the case n = pα for and α > 1. Then (pα–1)a ≡ 0 (mod n).
|
| 4.6 | Consider the integral .
|
| 4.9 | Use the CRT and lifting. |
| 4.10 | For proving , let n be an odd composite integer, choose a random and compute a square root x of y2 modulo n. By Exercise 4.9, the probability that x ≡ ±y (mod n) is at most 1/2.
|
| 4.12 (d) | Eliminate a from T (a, b, c) using a + b + c = 0. For each fixed c, allow b to vary and use a sieve to find out all the values of b for which T (a, b, c) is smooth for the fixed c. |
| 4.13 | You may use the prime number theorem and the fact that the sum of the reciprocals of the first t primes asymptotically approaches ln ln t.
|
| 4.15 | If a < a1 or a > am, then no i exists. So assume that a1 ≤ a ≤ am and let d := ⌊(1 + m)/2⌋. If a = ad, return d, else if a < ad, recursively search a among the elements a1, . . . , ad–1, and if a > ad, recursively search a among the elements ad+1, . . . , am. |
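The recursion in this hint is the classical binary search; a minimal iterative sketch follows (0-based indices, unlike the 1-based hint):

```python
def binary_search(a, x):
    """Return a 0-based index i with a[i] == x, or -1 if x is absent.
    Assumes a is sorted in increasing order."""
    lo, hi = 0, len(a) - 1
    while lo <= hi:
        mid = (lo + hi) // 2          # the element a_d of the hint
        if a[mid] == x:
            return mid
        if x < a[mid]:
            hi = mid - 1              # search a_1, ..., a_(d-1)
        else:
            lo = mid + 1              # search a_(d+1), ..., a_m
    return -1

primes = [2, 3, 5, 7, 11, 13, 17, 19]
assert binary_search(primes, 13) == 5
assert binary_search(primes, 4) == -1
```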
| 4.16 (a) | Use Lagrange’s interpolation formula (Exercise 2.53). |
| 4.18 (a) | One may precompute the values σi := p rem qi, i = 1, . . . , t. Note that qi|(gα + kp) if and only if ρk,i = 0. |
| 4.19 (a) | Use the approximation T (c1, c2) ≈ (c1 + c2)H. |
| 4.21 (c) | T(a, b, c) = –b² – c(x + cy)b + (z – c²x). |
| 4.21 (d) | Imitate the second stage of the LSM. |
| 4.23 | Let the factor base consist of all irreducible polynomials over of degrees ≤ m together with the polynomials of the form Xk + h(X), , deg h ≤ m. The optimal running time of this algorithm corresponds to .
|
| 4.24 (b) | is square-free.
|
| 4.24 (c) | Use the fact Xm – 1 = (Xm/pvp(m) – 1)pvp(m). |
| 4.24 (d) | Theorem 2.39. |
| 4.25 (a) | Look at the roots of the polynomials on the two sides. |
| 4.25 (c) | If ord ω = m, then ord(–ω) = 2m. |
| 4.25 (d) | ω, ωq, . . . , ωql–1 are all the roots of the minimal polynomial of ω over .
|
| 4.26 (b) | Use the Mordell–Weil theorem. |
| 4.26 (c) | Use Theorem 4.2. |
| 5.2 (a) | Solve the simultaneous congruences x ≡ ci (mod ni), i = 1, . . . , e, and then take the integer e-th root of the solution x, 1 ≤ x ≤ n1 · · · ne. |
| 5.2 (b) | Append (different) pseudorandom bit strings to m before encryption. This process is often referred to as salting. |
| 5.3 (a) | In view of the Chinese remainder theorem, reduce to the case n = pr for some and .
|
| 5.4 | ue1 + ve2 = 1 for some u, .
|
| 5.6 | If the same session key is used to generate the ciphertext pairs (r1, s1) and (r2, s2) on two plaintext messages m1 and m2, then m1/m2 = s1/s2. |
| 5.7 (c) | Let x = (x_(l–1) . . . x_1 x_0)₂. Define x′ := (x_(l–1) . . . x_2 x_1)₂ and y′ := g^x′ (mod p). Then, y ≡ y′² g^(x_0) (mod p). Since x_0 is easily computable, y′ can be obtained by computing a square root of y modulo p. Argue that a call to the oracle helps us choose the correct square root y′ of y. Now, use recursion. |
| 5.8 | Let g′ be any randomly chosen generator of , where q := ph. One computes for i = 0, 1, . . . , p – 1. We then have the equality of the sets
modulo q – 1, where l := indg′ g. But then for each i we have a (yet unknown) j such that |
| 5.9 | Let g′, and l be as in Exercise 5.8. Now, we have the equality of the sets
modulo q – 1. |
| 5.11 | (mod β) are polynomials with small coefficients.
|
| 5.15 (a) | If Alice generates the signatures (M1, s1) and (M2, s2) on two messages M1 and M2, then her signature on a message M with H(M) ≡ H(M1)H(M2) (mod n) is s1s2 (mod n). Thus, without knowing the private key of Alice, an intruder can generate a valid signature (M, s1s2) of Alice, provided that such an M can be computed. Of course, here the intruder has little control over the message M. The PKC standards form RSA Laboratories add some redundancy to the hash function output before signing. The product of two hash values with redundancy is, in general, expected not to have the redundancy. This increases the security of the scheme against existential forgeries beyond that provided by the first pre-image resistance of the underlying hash function. |
| 5.15 (b) | For any , a valid signature is (M, s), where H(M) ≡ s2 (mod n).
|
| 5.15 (c) | Choose random integers u, v with gcd(v, n) = 1 and take d′ := u + dv. Of course, d and hence d′ are unknown to Carol, but she can compute s = gd′ = gu(gd)v and t ≡ –H(s)v–1 (mod n). But then (M, s, t) is a valid ElGamal signature on a message M for which H(M) ≡ tu (mod n). |
| 5.16 | Obviously, c itself could be a possible choice, but that is not random and Bob might refuse to sign c. Carol should hide c by cre (mod n) for some randomly chosen r known to her. |
| 5.23 (a) | by the CRT.
|
| 5.25 (a) | Replace the random challenge of the verifier by the hash value of the string obtained by concatenating the message to be signed with the witness. |
| 5.26 (d) | Bob finds a random b′ with and sends a := (b′)2 (mod n) to Alice. But then Alice’s response b yields a non-trivial factor gcd(b – b′, n) of n.
|
| 7.5 | (mod n) and m ≡ se (mod n).
|
| 7.9 (a) | Use Exercise 2.44(b). |
| 7.9 (c) | Again use Exercise 2.44(b). |
| 7.9 (d) | Use Part (c) in conjunction with the CRT, and separately consider the three cases v2(p–1) = v2(q – 1), v2(p – 1) > v2(q – 1) and v2(p – 1) < v2(q – 1). |
| A.2 | for all X, J. One does not have to look at the S-boxes for proving this.
|
| A.9 (c) | For i = 0, 1, 2, 3, 4Nr, 4Nr + 1, 4Nr + 2, 4Nr + 3, take . For other values of i, take .
|
| A.14 (b) | Let DL(X) := XdCL(1/X) = a0 + a1X + a2X2 + · · · + ad–1Xd–1 + Xd. Consider the -algebra , where x := X + 〈DL(X)〉. The -linear transformation λx : A → A defined by g(x) ↦ xg(x) has the matrix ΔL with respect to the polynomial basis (1, x, . . . , xd–1). If is the minimal polynomial of λx, then [f(λx)](1) = f(x) = 0. Now, use the fact that 1, x, . . . , xd–1 are linearly independent over .
|
| A.16 (b) | [only if] Take σ ≠ 00 · · · 01. Since σ is non-zero, si = 1 for some . Construct an LFSR with d – 1 stages initialized to s0s1 · · · sd–2 to generate σ.
|
| A.19 | Suppose that we want to compute a second pre-image for H2(x). If , any is a second pre-image for H2(x). If , computing a second pre-image for H2(x) is equivalent to computing a second pre-image for H(x). The density of the (finite) set S is 0 in the (infinite) set of all bit strings. Thus, H2 is second pre-image resistant. On the other hand, for any two distinct x, we have a collision (x, x′) for H2.
|
| A.20 | Collision resistance of H implies that of H3. On the other hand, for a positive fraction (half) of the (n + 1)-bit strings y, it is easy to compute a pre-image of y under H3. |
| A.21 | If y is a square root of a modulo m, then so is m – y too. |
| A.22 | Use the birthday paradox (Exercise 2.172). |
| A.23 (d) | Let L := F1(L′) and R := F1(R′) with both R and R′ non-zero. Then, F1(L ‖ R) = F2(L′ ‖ R′). |
| A.25 | Let h(i) denote the column vector of dimension 160 having the bits of H(i) as its elements and m(i) the column vector of dimension 512 + 160 = 672 having the bits of M(i) and of H(i) as its elements. Show that the modified design of SHA-1 leads to the relation h(i) ≡ Am(i–1) + c (mod 2) for some constant 160 × 672 matrix A over and for some constant vector c. So what then?
|
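The observation in the hint for A.21 is that squaring modulo m cannot be collision resistant, because y and m – y are distinct inputs with the same square. A minimal sketch, with an arbitrary toy modulus and input chosen purely for illustration:

```python
# Toy modulus m = p*q; the factors play no role in the collision itself.
p, q = 101, 103
m = p * q

def h(x):
    """'Hash' x to its square modulo m."""
    return (x * x) % m

y = 42
# (m - y)^2 = m^2 - 2*m*y + y^2 = y^2 (mod m), so y and m - y collide.
assert y != m - y
assert h(y) == h(m - y)
```

The identity (m – y)2 ≡ y2 (mod m) holds for every y, so collisions can be written down at will without knowing the factorization of m.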
| C.6 | For α, , call α ≤ β if and only if |α| < |β| or |α| = |β| and α is lexicographically smaller than β. This ≤ produces a well-ordering of Σ*. For a one-way function f, look at the language for some with γ ≤ β}.
|
If you steal from one author, it’s plagiarism; if you steal from many, it’s research.
—Wilson Mizner
Literature is the question minus the answer.
—Roland Barthes
Everything that can be invented, has been invented.
—Charles H. Duell, 1899
[1] Adkins, W. A. and S. H. Weintraub (1992). Algebra: An Approach via Module Theory. Graduate Texts in Mathematics, 136. New York: Springer.
[2] Adleman, L. M., J. DeMarrais and M.-D. A. Huang (1994). “A Subexponential Algorithm for Discrete Logarithms over the Rational Subgroup of the Jacobians of Large Genus Hyperelliptic Curves over Finite Fields”, Algorithmic Number Theory—ANTS-I, Lecture Notes in Computer Science, 877. pp. 28–40. Berlin/Heidelberg: Springer.
[3] Adleman, L. M. and M.-D. A. Huang (1992). “Primality Testing and Two Dimensional Abelian Varieties over Finite Fields”, Lecture Notes in Mathematics, 1512. Berlin: Springer.
[4] Adleman, L. M., C. Pomerance and R. S. Rumely (1983). “On Distinguishing Prime Numbers from Composite Numbers”, Annals of Mathematics, 117: 173–206.
[5] Agrawal, M., N. Kayal and N. Saxena (2002), “PRIMES Is in P” [online document]. Available at http://www.cse.iitk.ac.in/users/manindra/algebra/primality_v6.pdf (October 2008).
[6] * Ahlfors, L. V. (1966). Complex Analysis. New York: McGraw-Hill.
[7] * Aho, A. V., J. E. Hopcroft and J. D. Ullman (1974). The Design and Analysis of Computer Algorithms. Reading, Massachusetts: Addison-Wesley.
[8] * Aho, A. V., J. E. Hopcroft and J. D. Ullman (1983). Data Structures and Algorithms. Reading, Massachusetts: Addison-Wesley.
[9] Aigner, M. and E. Oswald (2007), “Power Analysis Tutorial” [online document]. Available at http://www.iaik.tugraz.at/content/research/implementation_attacks/introduction_to_impa/dpa_tutorial.pdf (October 2008).
[10] Akkar, M.-L., R. Bevan, P. Dischamp and D. Moyart (2000). “Power Analysis, What Is Now Possible”, Advances in Cryptology—ASIACRYPT 2000, Lecture Notes in Computer Science, 1976. pp. 489–502. Berlin/Heidelberg: Springer.
[11] Anderson, R. and M. Kuhn (1997). “Low Cost Attacks on Tamper Resistant Devices”, Security Protocols—5th International Workshop, Lecture Notes in Computer Science, 1361. pp. 125–136. Berlin/Heidelberg: Springer.
[12] * Apostol, T. M. (1976). Introduction to Analytic Number Theory. Undergraduate Texts in Mathematics. New York: Springer.
[13] Arnold, V. I. (1999). “Polymathematics: Is Mathematics a Single Science or a Set of Arts?”, in V. Arnold, M. Atiyah, P. Lax and B. Mazur (eds.), Mathematics: Frontiers and Perspectives, pp. 403–416. Providence, Rhode Island: American Mathematical Society.
[14] Atiyah, M. F. and I. G. MacDonald (1969). Introduction to Commutative Algebra. Reading, Massachusetts: Addison-Wesley.
[15] Aumüller, C., P. Bier, W. Fischer, P. Hofreiter and J.-P. Seifert (2002), “Fault Attacks on RSA with CRT: Concrete Results and Practical Countermeasures” [online document]. Available at http://eprint.iacr.org/2002/073 (October 2008).
[16] Balasubramanian, R. and N. Koblitz (1998). “The Improbability that an Elliptic Curve has Subexponential Discrete Log Problem under the Menezes-Okamoto-Vanstone Algorithm”, Journal of Cryptology, 11: 141–145.
[17] Bao, F., R. H. Deng, Y. Han, A. B. Jeng, A. D. Narasimhalu, T.-H. Ngair (1997). “Breaking Public Key Cryptosystems on Tamper Resistant Devices in the Presence of Transient Faults”, Security Protocols—5th International Workshop, Lecture Notes in Computer Science, 1361. pp. 115–124. Berlin/Heidelberg: Springer.
[18] Bellare, M. and P. Rogaway (1995). “Optimal Asymmetric Encryption—How to Encrypt with RSA”, Advances in Cryptology—EUROCRYPT ’94, Lecture Notes in Computer Science, 950. pp. 92–111. Berlin/Heidelberg: Springer. A revised version is available at http://www-cse.ucsd.edu/users/mihir/papers/oaep.html (October 2008).
[19] Bellare, M. and P. Rogaway (1996). “The Exact Security of Digital Signatures: How to Sign with RSA and Rabin”, Advances in Cryptology—EUROCRYPT ’96, Lecture Notes in Computer Science, 1070. pp. 399–416. Berlin/Heidelberg: Springer. A revised version is available at http://www-cse.ucsd.edu/users/mihir/papers/exactsigs.html (October 2008).
[20] Bennett, C. H. and G. Brassard (1984). “Quantum Cryptography: Public Key Distribution and Coin Tossing”, pp. 175–179. Proceedings of the IEEE International Conference on Computers, Systems and Signal Processing, Bangalore, India, December.
[21] Berlekamp, E. R. (1968). Algebraic Coding Theory. New York: McGraw-Hill.
[22] Biham, E. and A. Shamir (1997). “Differential Fault Analysis of Secret Key Cryptosystems”, Advances in Cryptology—CRYPTO ’97, Lecture Notes in Computer Science, 1294. pp. 513–528. Berlin/Heidelberg: Springer.
[23] Blake, I. F., R. Fuji-Hara, R. C. Mullin and S. A. Vanstone (1984). “Computing Logarithms in Finite Fields of Characteristic Two”, SIAM Journal of Algebraic and Discrete Methods, 5: 276–285.
[24] Blake, I. F., G. Seroussi and N. P. Smart (1999). Elliptic Curves in Cryptography. Cambridge: Cambridge University Press.
[25] Blom, R. (1985). “An Optimal Class of Symmetric Key Generation Systems”, Advances in Cryptology—EUROCRYPT ’84, Lecture Notes in Computer Science, 209. pp. 335–338. Berlin/Heidelberg: Springer.
[26] Blum, L., M. Blum, and M. Shub (1986). “A Simple Unpredictable Pseudo-Random Number Generator”, SIAM Journal on Computing, 15: 364–383.
[27] Blum, M. and S. Goldwasser (1985). “An Efficient Probabilistic Public Key Encryption Scheme Which Hides All Partial Information”, Advances in Cryptology—CRYPTO ’84, Lecture Notes in Computer Science, 196. pp. 289–299. Berlin/Heidelberg: Springer.
[28] Blundo, C., A. De Santis, A. Herzberg, S. Kutten, U. Vaccaro and M. Yung (1993). “Perfectly-Secure Key Distribution for Dynamic Conferences”, Advances in Cryptology—CRYPTO ’92, Lecture Notes in Computer Science, 740. pp. 471–486. Berlin/Heidelberg: Springer.
[29] Boneh, D. (1999). “Twenty Years of Attacks on the RSA Cryptosystem”, Notices of the American Mathematical Society, 46 (2): 203–213.
[30] Boneh, D., R. A. DeMillo and R. J. Lipton (1997). “On the Importance of Checking Cryptographic Protocols for Faults”, Advances in Cryptology—EUROCRYPT ’97, Lecture Notes in Computer Science, 1233. pp. 37–51. Berlin/Heidelberg: Springer.
[31] Boneh, D., R. A. DeMillo and R. J. Lipton (2001). “On the Importance of Eliminating Errors in Cryptographic Computations”, Journal of Cryptology, 14 (2): 101–119.
[32] Boneh, D. and G. Durfee (1999). “Cryptanalysis of RSA with Private Key d Less Than N0.292”, Advances in Cryptology—EUROCRYPT ’99, Lecture Notes in Computer Science, 1592. pp. 1–11. Berlin/Heidelberg: Springer.
[33] Boneh, D., G. Durfee and Y. Frankel (1998). “Exposing an RSA Private Key Given a Small Fraction of Its Bits”, Advances in Cryptology—ASIACRYPT ’98, Lecture Notes in Computer Science, 1514. pp. 25–34. Berlin/Heidelberg: Springer.
[34] Boneh, D. and M. K. Franklin (2001). “Identity-based Encryption from the Weil Pairing”, Advances in Cryptology—CRYPTO 2001, Lecture Notes in Computer Science, 2139. pp. 213–229. Berlin/Heidelberg: Springer.
[35] Boneh, D. and M. K. Franklin (2003). “Identity-based Encryption from the Weil Pairing”, SIAM Journal on Computing, (32) 3: 586–615.
[36] Bressoud, D. M. (1989). Factorization and Primality Testing. Undergraduate Texts in Mathematics. New York: Springer.
[37] * Buchmann, J. A. (2004). Introduction to Cryptography. Undergraduate Texts in Mathematics. New York: Springer.
[38] Buchmann, J. A. et al. (2004), “The Number Field Cryptography Project” [online document]. Available at http://www.informatik.tu-darmstadt.de/TI/Forschung/nfc.html (October 2008).
[39] Buchmann, J. A. and S. Hamdy (2001). “A Survey on IQ Cryptography”. Technical report TI-4/01, TU Darmstadt, Fachbereich Informatik.
[40] Buchmann, J. A. and D. Weber (2000). “Discrete Logarithms: Recent Progress”, in J. Buchmann, T. Hoeholdt, H. Stichtenoth and H. Tapia-Recillas (eds.), Coding Theory, Cryptography and Related Areas, pp. 42–56. Proceedings of an International Conference on Coding Theory, Cryptography and Related Areas, Guanajuato, Mexico, April 1998.
[41] Buhler, J., H. W. Lenstra and C. Pomerance (1993). “Factoring Integers with the Number Field Sieve”, in A. K. Lenstra and H. W. Lenstra (eds.), The Development of the Number Field Sieve, Lecture Notes in Mathematics, 1554. pp. 50–94. Berlin: Springer.
[42] * Burton, D. M. (1998). Elementary Number Theory, 4th ed. New York: McGraw-Hill.
[43] Cantor, D. G. (1994). “On the Analogue of Division Polynomials for Hyperelliptic Curves”, Journal für die reine und angewandte Mathematik, 447: 91–145.
[44] Chan, H., A. Perrig and D. Song (2003). “Random Key Predistribution Schemes for Sensor Networks”, pp. 197–213. Proceedings of the 24th IEEE Symposium on Research in Security and Privacy, Berkeley, California, 11–14 May.
[45] Chari, S., C. S. Jutla, J. R. Rao, and P. Rohatgi (1999). “Towards Sound Approaches to Counteract Power-Analysis Attacks”, Advances in Cryptology—CRYPTO ’99, Lecture Notes in Computer Science, 1666. pp. 398–412. Berlin/Heidelberg: Springer.
[46] Charlap, L. S. and R. Coley (1990). “An Elementary Introduction to Elliptic Curves II”, CCR Expository Report 34.
[47] Charlap, L. S. and D. P. Robbins (1988). “An Elementary Introduction to Elliptic Curves”, CRD Expository Report 31.
[48] Chaum, D. (1983). “Blind Signatures for Untraceable Payments”, Advances in Cryptology—CRYPTO ’82. pp. 199–203. New York: Plenum Press.
[49] Chaum, D. (1985). “Security Without Identification: Transaction System to Make Big Brother Obsolete”, Communications of the ACM, 28 (10): 1030–1044.
[50] Chaum, D. (1989). “Privacy Protected Payments: Unconditional Payer and/or Payee Untraceability”, Smart Card 2000: The Future of IC Cards, pp. 69–93. Amsterdam: North-Holland.
[51] Chaum, D. (1990). “Zero-Knowledge Undeniable Signatures”, Advances in Cryptology—CRYPTO ’90, Lecture Notes in Computer Science, 473. pp. 458–464. Berlin/Heidelberg: Springer.
[52] Chaum, D. and H. van Antwerpen (1989). “Undeniable Signatures”, Advances in Cryptology—CRYPTO ’89, Lecture Notes in Computer Science, 435. pp. 212–217. Berlin/Heidelberg: Springer.
[53] Chaum, D., E. van Heijst and B. Pfitzmann (1991). “Cryptographically Strong Undeniable Signatures, Unconditionally Secure for the Signer”, Advances in Cryptology—CRYPTO ’91, Lecture Notes in Computer Science, 576. pp. 470–484. Berlin/Heidelberg: Springer.
[54] Chor, B. and R. L. Rivest (1988). “A Knapsack Type Cryptosystem Based on Arithmetic in Finite Fields”, IEEE Transactions on Information Theory, 34: 901–909.
[55] Clavier, C., J.-S. Coron and N. Dabbous (2000). “Differential Power Analysis in the Presence of Hardware Countermeasures”, Cryptographic Hardware and Embedded Systems—CHES 2000, Lecture Notes in Computer Science, 1965. pp. 252–263. Berlin/Heidelberg: Springer.
[56] Cohen, H. (1993). A Course in Computational Algebraic Number Theory. Graduate Texts in Mathematics, 138. New York: Springer.
[57] Coppersmith, D. (1984). “Fast Evaluation of Logarithms in Fields of Characteristic Two”, IEEE Transactions on Information Theory, 30: 587–594.
[58] Coppersmith, D. (1994). “Solving Homogeneous Equations over GF[2] via Block Wiedemann Algorithm”, Mathematics of Computation, 62: 333–350.
[59] Coppersmith, D., A. M. Odlyzko and R. Schroeppel (1986). “Discrete Logarithms in GF (p)”, Algorithmica, 1: 1–15.
[60] Coppersmith, D. and S. Winograd (1982). “On the Asymptotic Complexity of Matrix Multiplication”, SIAM Journal on Computing, 11 (3): 472–492.
[61] * Cormen, T. H., C. E. Leiserson, R. L. Rivest and C. Stein (2001). Introduction to Algorithms, 2nd ed. Cambridge, Massachusetts: MIT Press.
[62] Coron, J.-S. (1999). “Resistance Against Differential Power Analysis for Elliptic Curve Cryptosystems”, Cryptographic Hardware and Embedded Systems—CHES 1999, Lecture Notes in Computer Science, 1965. pp. 292–302. Berlin/Heidelberg: Springer.
[63] Coron, J.-S., L. Goubin (2000). “On Boolean and Arithmetic Masking Against Differential Power Analysis”, Cryptographic Hardware and Embedded Systems—CHES 2000, Lecture Notes in Computer Science, 1965. pp. 231–237. Berlin/Heidelberg: Springer.
[64] Coster, M. J., A. Joux, B. A. LaMacchia, A. M. Odlyzko, C. P. Schnorr and J. Stern (1992). “Improved Low-Density Subset Sum Algorithms”, Computational Complexity, 2: 111–128.
[65] Coster, M. J., B. A. LaMacchia, A. M. Odlyzko and C. P. Schnorr (1991). “An Improved Low-Density Subset Sum Algorithm”, Advances in Cryptology—EUROCRYPT ’91, Lecture Notes in Computer Science, 547. pp. 54–67. Berlin/Heidelberg: Springer.
[66] Courtois, N. (2003). “Fast Algebraic Attacks on Stream Ciphers with Linear Feedback”, Advances in Cryptology—CRYPTO 2003, Lecture Notes in Computer Science, 2729. pp. 177–194. Berlin/Heidelberg: Springer.
[67] Courtois, N. and W. Meier (2003). “Algebraic Attacks on Stream Ciphers with Linear Feedback”, Advances in Cryptology—EUROCRYPT 2003, Lecture Notes in Computer Science, 2656. pp. 345–359. Berlin/Heidelberg: Springer.
[68] Courtois, N. and J. Pieprzyk (2003). “Cryptanalysis of Block Ciphers with Overdefined Systems of Equations”, Advances in Cryptology—ASIACRYPT 2002, Lecture Notes in Computer Science, 2501. pp. 267–287. Berlin/Heidelberg: Springer.
[69] Crandall, R. and C. Pomerance (2001). Prime Numbers: A Computational Perspective. New York: Springer.
[70] Crépeau, C. and A. Slakmon (2003). “Simple Backdoors for RSA Key Generation”, Topics in Cryptology—CT-RSA 2003, Lecture Notes in Computer Science, 2612. pp. 403–416. Berlin/Heidelberg: Springer.
[71] Daemen, J. and V. Rijmen (2002). The Design of Rijndael: AES—The Advanced Encryption Standard. New York: Springer.
[72] Das, A. (1999). Galois Field Computations: Implementation of a Library and a Study of the Discrete Logarithm Problem [dissertation]. Bangalore, India: Indian Institute of Science.
[73] Das, A. and C. E. Veni Madhavan (1999). “Performance Comparison of Linear Sieve and Cubic Sieve Algorithms for Discrete Logarithms over Prime Fields”, Algorithms and Computation, ISAAC ’99, Lecture Notes in Computer Science, 1741. pp. 295–306. Berlin/Heidelberg: Springer.
[74] * Delfs, H. and H. Knebl (2007). Introduction to Cryptography: Principles and Applications, 2nd ed. Berlin and New York: Springer.
[75] Deutsch, D. (1985). “Quantum Theory, the Church-Turing Principle and the Universal Quantum Computer”. Proceedings of the Royal Society of London, Series A, 400. pp. 97–117.
[76] Deutsch, D. (1998). The Fabric of Reality: The Science of Parallel Universes—and Its Implications. London: Penguin.
[77] Dhem, J.-F., F. Koeune, P.-A. Leroux, P. Mestré, J.-J. Quisquater and J.-L. Willems (2000). “A Practical Implementation of the Timing Attack”, in J.-J. Quisquater and B. Schneier (eds.), Smart Card: Research and Applications, Lecture Notes in Computer Science, 1820. Proceedings of the Third Working Conference on Smart Card Research and Advanced Applications—CARDIS ’98, Louvain-la-Neuve, Belgium, 14–16 September 1998. Springer.
[78] Diffie, W. and M. Hellman (1976). “New Directions in Cryptography”, IEEE Transactions on Information Theory, 22: 644–654.
[79] Du, W., J. Deng, Y. S. Han and P. K. Varshney (2003). “Establishing Pairwise Keys in Distributed Sensor Networks”, pp. 42–51. Proceedings of the 10th ACM Conference on Computer and Communication Security, Washington D.C., USA, 27–30 October.
[80] Du, W., J. Deng, Y. S. Han, S. Chen and P. K. Varshney (2004). “A Key Management Scheme for Wireless Sensor Networks Using Deployment Knowledge”. Proceedings of IEEE INFOCOM 2004, Hong Kong, 7–11 March.
[81] * Dummit, D. and R. Foote (2004). Abstract Algebra, 3rd ed. Somerset, New Jersey: John Wiley & Sons.
[82] Durfee, G. and P. Q. Nguyen (2000). “Cryptanalysis of the RSA Schemes with Short Secret Exponent from Asiacrypt ’99”, Advances in Cryptology—ASIACRYPT 2000, Lecture Notes in Computer Science, 1976. pp. 30–44. Berlin/Heidelberg: Springer.
[83] Dusart, P. (1999). “The kth Prime Is Greater than k(ln k + ln ln k – 1) for k > 2”, Mathematics of Computation, 68: 411–415.
[84] ElGamal, T. (1985). “A Public-Key Cryptosystem and a Signature Scheme Based on Discrete Logarithms”, IEEE Transactions on Information Theory, 31: 469–472.
[85] Elkies, N. D. (1998). “Elliptic and Modular Curves over Finite Fields and Related Computational Issues”, AMS/IP Studies in Advanced Mathematics, 7: 21–76.
[86] Enge, A. (1999). “Computing Discrete Logarithms in High-Genus Hyperelliptic Jacobians in Provably Subexponential Time”. Technical report CORR 99-04, University of Waterloo, Canada.
[87] Enge, A. and P. Gaudry (2002). “A General Framework for Subexponential Discrete Logarithm Algorithms”, Acta Arithmetica, 102 (1): 83–103.
[88] Eschenauer, L. and V. D. Gligor (2002). “A Key-Management Scheme for Distributed Sensor Networks”. Proceedings of the 9th ACM Conference on Computer and Communication Security, pp. 41–47. Washington D.C., USA, 18–22 November.
[89] * Esmonde, J. and M. Ram Murty (1999). Problems in Algebraic Number Theory. Graduate Texts in Mathematics, 190. New York: Springer.
[90] Fiat, A. and A. Shamir (1987). “How to Prove Yourself: Practical Solutions to Identification and Signature Problems”, Advances in Cryptology—CRYPTO ’86, Lecture Notes in Computer Science, 263. pp. 186–194. Berlin/Heidelberg: Springer.
[91] Feige, U., A. Fiat, and A. Shamir (1988). “Zero-Knowledge Proofs of Identity”, Journal of Cryptology, 1: 77–94.
[92] * Feller, W. (1966). Introduction to Probability Theory and Its Applications, 3rd ed. New York: John Wiley & Sons.
[93] Ferguson, N., J. Kelsey, S. Lucks, B. Schneier, M. Stay, D. Wagner and D. Whiting (2000). “Improved Cryptanalysis of Rijndael”, Fast Software Encryption—FSE 2000, Lecture Notes in Computer Science, 1978. pp. 213–230. Berlin/Heidelberg: Springer.
[94] Fouquet, M., P. Gaudry and R. Harley (2000). “An Extension of Satoh’s Algorithm and Its Implementation”, Journal of the Ramanujan Mathematical Society, 15: 281–318.
[95] Fouquet, M., P. Gaudry and R. Harley (2001). “Finding Secure Curves with the Satoh-FGH Algorithm and an Early-Abort Strategy”, Advances in Cryptology—EUROCRYPT 2001, Lecture Notes in Computer Science, 2045. Berlin/Heidelberg: Springer.
[96] * Fraleigh, J. B. (1998). A First Course in Abstract Algebra, 6th ed. Reading, Massachusetts: Addison-Wesley.
[97] Fujisaki, E., T. Kobayashi, H. Morita, H. Oguro, T. Okamoto, S. Okazaki, D. Pointcheval and S. Uchiyama (1999). “EPOC: Efficient Probabilistic Public-Key Encryption”, contribution to IEEE P1363a.
[98] Fujisaki, E., T. Okamoto, D. Pointcheval, J. Stern (2001). “RSA-OAEP is Secure under the RSA Assumption”, Advances in Cryptology—CRYPTO 2001, Lecture Notes in Computer Science, 2139. pp. 260–274. Berlin/Heidelberg: Springer.
[99] Fulton, W. (1969). Algebraic Curves. Mathematics Lecture Notes Series. New York: W. A. Benjamin.
[100] Galbraith, S. D. (2003). “Weil Descent of Jacobians”, Discrete Applied Mathematics, 128 (1): 165–180.
[101] Galbraith, S. D., F. Hess and N. P. Smart (2002). “Extending the GHS Weil Descent Attack”, Advances in Cryptology—EUROCRYPT 2002, Lecture Notes in Computer Science, 2332. pp. 29–44. Berlin/Heidelberg: Springer.
[102] Galbraith, S. D., W. Mao, and K. G. Paterson (2002). “RSA-based Undeniable Signatures for General Moduli”, Topics in Cryptology—CT-RSA 2002, Lecture Notes in Computer Science, 2271. pp. 200–217. Berlin/Heidelberg: Springer.
[103] Gathen, J. von zur and J. Gerhard (1999). Modern Computer Algebra. Cambridge: Cambridge University Press.
[104] Gathen, J. von zur and V. Shoup (1992). “Computing Frobenius Maps and Factoring Polynomials”, pp. 97–105. Proceedings of the 24th Annual ACM Symposium on Theory of Computing, Victoria, British Columbia, Canada.
[105] Gaudry, P. (2000). “An Algorithm for Solving the Discrete Log Problem on Hyperelliptic Curves”, Advances in Cryptology—EUROCRYPT 2000, Lecture Notes in Computer Science, 1807. pp. 19–34. Berlin/Heidelberg: Springer.
[106] Gaudry, P. and R. Harley (2000). “Counting Points on Hyperelliptic Curves over Finite Fields”, Algorithmic Number Theory—ANTS-IV, Lecture Notes in Computer Science, 1838. pp. 313–332. Berlin/Heidelberg: Springer.
[107] Gaudry, P., F. Hess and N. P. Smart (2002). “Constructive and Destructive Facets of Weil Descent on Elliptic Curves”, Journal of Cryptology, 15 (1): 19–46.
[108] Geddes, K. O., S. R. Czapor and G. Labahn (1992). Algorithms for Computer Algebra. Boston: Kluwer Academic Publishers.
[109] Gennaro, R., H. Krawczyk and T. Rabin (2000). “RSA-based Undeniable Signatures”, Journal of Cryptology, 13 (4): 397–416.
[110] Gentry, C., J. Jonsson, M. Szydlo and J. Stern (2001). “Cryptanalysis of the NTRU Signature Scheme (NSS) from Eurocrypt 2001”, Advances in Cryptology—ASIACRYPT 2001, Lecture Notes in Computer Science, 2248. pp. 1–20. Berlin/Heidelberg: Springer.
[111] Gentry, C. and M. Szydlo (2002). “Cryptanalysis of the NTRU Signature Scheme”, Advances in Cryptology—EUROCRYPT 2002, Lecture Notes in Computer Science, 2332. pp. 299–320. Berlin/Heidelberg: Springer.
[112] Gilbert, H. and M. Minier (2000). “A Collision Attack on Seven Rounds of Rijndael”, pp. 230–241. Proceedings of the 3rd AES Conference, NIST, New York, April 2000.
[113] * Goldreich, O. (2001). Foundations of Cryptography, Volume 1: Basic Tools. Cambridge: Cambridge University Press.
[114] * Goldreich, O. (2004). Foundations of Cryptography, Volume 2: Basic Applications. Cambridge: Cambridge University Press.
[115] Goldreich, O., S. Goldwasser and S. Halevi (1997). “Public-key Cryptosystems from Lattice Reduction Problems”, Advances in Cryptology—CRYPTO ’97, Lecture Notes in Computer Science, 1294. pp. 112–131. Berlin/Heidelberg: Springer.
[116] Goldwasser, S. and J. Kilian (1986). “Almost All Primes Can Be Quickly Certified”, pp. 316–329. Proceedings of the 18th Annual ACM Symposium on Theory of Computing, Berkeley, California.
[117] Goldwasser, S. and S. Micali (1984). “Probabilistic Encryption”, Journal of Computer and Systems Sciences, 28: 270–299.
[118] Gordon, D. M. (1985). “Strong Primes are Easy to Find”, Advances in Cryptology—EUROCRYPT ’84, Lecture Notes in Computer Science, 209. pp. 216–223. Berlin/Heidelberg: Springer.
[119] Gordon, D. M. (1993). “Discrete Logarithms in GF (p) Using the Number Field Sieve”, SIAM Journal on Discrete Mathematics, 6: 124–138.
[120] Gordon, D. M. and K. S. McCurley (1992). “Massively Parallel Computation of Discrete Logarithms”, Advances in Cryptology—CRYPTO ’92, Lecture Notes in Computer Science, 740. pp. 312–323. Berlin/Heidelberg: Springer.
[121] Grinstead, C. M. and J. L. Snell (1997). Introduction to Probability, 2nd revised ed. Providence, Rhode Island: American Mathematical Society. The book is also available at http://www.dartmouth.edu/~chance/book.html (October 2008).
[122] Guillou, L. C. and J.-J. Quisquater (1988). “A Practical Zero-Knowledge Protocol Fitted to Security Microprocessor Minimizing Both Transmission and Memory”, Advances in Cryptology—EUROCRYPT ’88, Lecture Notes in Computer Science, 330. pp. 123–128. Berlin/Heidelberg: Springer.
[123] Hankerson, D., A. J. Menezes and S. Vanstone (2004). Guide to Elliptic Curve Cryptography. New York: Springer.
[124] Hartshorne, R. (1977). Algebraic Geometry. Graduate Texts in Mathematics, 52. New York, Heidelberg and Berlin: Springer.
[125] * Herstein, I. N. (1975). Topics in Algebra. New York: John Wiley & Sons.
[126] Hess, F., G. Seroussi and N. P. Smart (2000). “Two Topics in Hyperelliptic Cryptography”. HP Labs technical report HPL-2000-118.
[127] * Hoffman, K. and R. Kunze (1971). Linear Algebra. Englewood Cliffs, New Jersey: Prentice-Hall.
[128] Hoffstein, J., N. Howgrave-Graham, J. Pipher, J. H. Silverman and W. White (2003). “NTRUSign: Digital Signatures Using the NTRU Lattice”, Topics in Cryptology—CT-RSA 2003, Lecture Notes in Computer Science, 2612. pp. 122–140. Berlin/Heidelberg: Springer.
[129] Hoffstein, J., N. Howgrave-Graham, J. Pipher, J. H. Silverman and W. White (2005). “Performance Improvements and a Baseline Parameter Generation Algorithm for NTRUSign”, Workshop on Mathematical Problems and Techniques in Cryptology, Barcelona, Spain, June 2005. Also available at http://www.ntru.com/cryptolab/articles.htm (October 2008).
[130] Hoffstein, J., J. Pipher and J. H. Silverman (1998). “NTRU: A Ring-Based Public Key Cryptosystem”, Algorithmic Number Theory—ANTS-III, Lecture Notes in Computer Science, 1423. pp. 267–288. Berlin/Heidelberg: Springer.
[131] Hoffstein, J., J. Pipher and J. H. Silverman (2001). “NSS: An NTRU Lattice-Based Signature Scheme”, Advances in Cryptology—EUROCRYPT 2001, Lecture Notes in Computer Science, 2045. pp. 211–228. Berlin/Heidelberg: Springer.
[132] Horster, P., M. Michels and H. Petersen (1994). “Meta-ElGamal Signature Schemes”. Technical report TR-94-5-F, Department of Computer Science, Technische Universität, Chemnitz-Zwickau.
[133] * Hungerford, T. W. (1974). Algebra, 5th ed. Graduate Texts in Mathematics, 73. Berlin: Springer.
[134] IEEE (2008), “Standard Specifications for Public-Key Cryptography” [online document]. Available at http://grouper.ieee.org/groups/1363/index.html (October 2008).
[135] IETF (2008), “The Internet Engineering Task Force” [online document]. Available at http://www.ietf.org/ (October 2008).
[136] * Ireland, K. and M. Rosen (1990). A Classical Introduction to Modern Number Theory. Graduate Texts in Mathematics, 84. New York: Springer.
[137] Izu, T., B. Möller and T. Takagi (2002). “Improved Elliptic Curve Multiplication Methods Resistant Against Side Channel Attacks”, Progress in Cryptology—INDOCRYPT 2002, Lecture Notes in Computer Science, 2551. pp. 296–313. Berlin/Heidelberg: Springer.
[138] Izu, T. and T. Takagi (2002). “A Fast Parallel Elliptic Curve Multiplication Resistant Against Side Channel Attacks”, Public Key Cryptography—PKC 2002, Lecture Notes in Computer Science, 2274. pp. 280–296. Berlin/Heidelberg: Springer. An improved version of this paper is published as the technical report CORR 2002-03 of the Centre for Applied Cryptographic Research, University of Waterloo, Canada, and is available at http://www.cacr.math.uwaterloo.ca/ (October 2008).
[139] Jacobson, M. J., N. Koblitz, J. H. Silverman, A. Stein and E. Teske (2000). “Analysis of the Xedni Calculus Attack”, Designs, Codes and Cryptography, 20: 41–64.
[140] Janusz, G. J. (1995). Algebraic Number Fields. Providence, Rhode Island: American Mathematical Society.
[141] Johnson, D. and A. Menezes (1999). “The Elliptic Curve Digital Signature Algorithm (ECDSA)”. Technical report CORR 99-34, Department of Combinatorics and Optimization, University of Waterloo, Canada. Also published in International Journal on Information Security (2001), 1: 36–63.
[142] Joye, M., A. K. Lenstra and J.-J. Quisquater (1999). “Chinese Remaindering Based Cryptosystems in the Presence of Faults”, Journal of Cryptology, 12 (4): 241–246.
[143] Kaltofen, E. and V. Shoup (1995). “Subquadratic-Time Factoring of Polynomials over Finite Fields”, pp. 398–406. Proceedings of the 27th Annual ACM Symposium on Theory of Computing, Las Vegas, Nevada.
[144] Kampkötter, W. (1991). Explizite Gleichungen für Jacobische Varietäten hyperelliptischer Kurven [dissertation]. Essen: Gesamthochschule.
[145] Katz, J. and Y. Lindell (2007). Introduction to Modern Cryptography. Boca Raton, Florida; London and New York: CRC Press.
[146] Kaye, P. and C. Zalka (2004), “Optimized Quantum Implementation of Elliptic Curve Arithmetic over Binary Fields” [online document]. Available at http://arxiv.org/abs/quant-ph/0407095 (October 2008).
[147] * Knuth, D. E. (1997). The Art of Computer Programming, Volume 2: Seminumerical Algorithms. Reading, Massachusetts: Addison-Wesley.
[148] Ko, K. H., S. J. Lee, J. H. Cheon, J. W. Han, J. S. Kang and C. S. Park (2000). “New Public-Key Cryptosystem Using Braid Groups”, Advances in Cryptology—CRYPTO 2000, Lecture Notes in Computer Science, 1880. pp. 166–183. Berlin/Heidelberg: Springer.
[149] Koblitz, N. (1984). p-adic Numbers, p-adic Analysis, and Zeta-Functions, 2nd ed. Graduate Texts in Mathematics, 58. New York, Heidelberg and Berlin: Springer.
[150] Koblitz, N. (1987). “Elliptic Curve Cryptosystems”, Mathematics of Computation, 48: 203–209.
[151] Koblitz, N. (1989). “Hyperelliptic Cryptosystems”, Journal of Cryptology, 1: 139–150.
[152] Koblitz, N. (1993). Introduction to Elliptic Curves and Modular Forms, 2nd ed. Graduate Texts in Mathematics, 97. Berlin: Springer.
[153] * Koblitz, N. (1994). A Course in Number Theory and Cryptography, 2nd ed. New York: Springer.
[154] Koblitz, N. (1998). Algebraic Aspects of Cryptography. New York: Springer.
[155] Kocher, P. C. (1996). “Timing Attacks on Implementations of Diffie-Hellman, RSA, DSS, and Other Systems”, Advances in Cryptology—CRYPTO ’96, Lecture Notes in Computer Science, 1109. pp. 104–113. Berlin/Heidelberg: Springer.
[156] Kocher, P. C., J. Jaffe and B. Jun (1999). “Differential Power Analysis”, Advances in Cryptology—CRYPTO ’99, Lecture Notes in Computer Science, 1666. pp. 388–397. Berlin/Heidelberg: Springer.
[157] Lagarias, J. C. and A. M. Odlyzko (1985). “Solving Low-Density Subset Sum Problems”, Journal of the ACM, 32: 229–246.
[158] LaMacchia, B. A. and A. M. Odlyzko (1991a). “Computation of Discrete Logarithms in Prime Fields”, Designs, Codes and Cryptography, 1: 46–62.
[159] LaMacchia, B. A. and A. M. Odlyzko (1991b). “Solving Large Sparse Linear Systems over Finite Fields”, Advances in Cryptology—CRYPTO ’90, Lecture Notes in Computer Science, 537. pp. 109–133. Berlin/Heidelberg: Springer.
[160] Lang, S. (1994). Algebraic Number Theory. Graduate Texts in Mathematics, 110. New York: Springer.
[161] Law, L., A. Menezes, A. Qu, J. Solinas and S. Vanstone (1998). “An Efficient Protocol for Authenticated Key Agreement”. Technical report CORR 98-05, Department of Combinatorics and Optimization, University of Waterloo, Canada.
[162] Lehmer, D. H. and R. E. Powers (1931). “On Factoring Large Numbers”, Bulletin of the AMS, 37: 770–776.
[163] Lenstra, A. K., E. Tromer, A. Shamir, W. Kortsmit, B. Dodson, J. Hughes and P. Leyland (2003). “Factoring Estimates for a 1024-Bit RSA Modulus”, Advances in Cryptology—ASIACRYPT 2003, Lecture Notes in Computer Science, 2894. pp. 55–74. Berlin/Heidelberg: Springer.
[164] Lenstra, A. K. and H. W. Lenstra (1990). “Algorithms in Number Theory”, in J. van Leeuwen (ed.), Handbook of Theoretical Computer Science, Volume A, pp. 675–715, Amsterdam: Elsevier.
[165] Lenstra, A. K. and H. W. Lenstra (ed.) (1993). The Development of the Number Field Sieve. Lecture Notes in Mathematics, 1554. Berlin: Springer.
[166] Lenstra, A. K., H. W. Lenstra and L. Lovász (1982). “Factoring Polynomials with Rational Coefficients”, Mathematische Annalen, 261: 515–534.
[167] Lenstra, A. K., H. W. Lenstra, M. S. Manasse and J. M. Pollard (1990). “The Number Field Sieve”, pp. 564–572. Proceedings of the 22nd Annual ACM Symposium on Theory of Computing, Baltimore, Maryland, USA, 13–17 May.
[168] Lenstra, A. K. and A. Shamir (2000). “Analysis and Optimization of the TWINKLE Factoring Device”, Advances in Cryptology—EUROCRYPT 2000, Lecture Notes in Computer Science, 1807. pp. 35–52. Berlin/Heidelberg: Springer.
[169] Lenstra, A. K., A. Shamir, J. Tomlinson and E. Tromer (2002). “Analysis of Bernstein’s Factorization Circuit”, Advances in Cryptology—ASIACRYPT 2002, Lecture Notes in Computer Science, 2501. pp. 1–26. Berlin/Heidelberg: Springer.
[170] Lenstra, A. K. and E. R. Verheul (2000a). “The XTR Public Key System”, Advances in Cryptology—CRYPTO 2000, Lecture Notes in Computer Science, 1880. pp. 1–20. Berlin/Heidelberg: Springer.
[171] Lenstra, A. K. and E. R. Verheul (2000b). “Key Improvements to XTR”, Advances in Cryptology—ASIACRYPT 2000, Lecture Notes in Computer Science, 1976. pp. 220–233. Berlin/Heidelberg: Springer.
[172] Lenstra, A. K. and E. R. Verheul (2001a). “An Overview of the XTR Public Key System”, pp. 151–180. Proceedings of the Public Key Cryptography and Computational Number Theory Conference, Warsaw, Poland, 2000. Berlin: Walter de Gruyter.
[173] Lenstra, A. K. and E. R. Verheul (2001b). “Fast Irreducibility and Subgroup Membership Testing in XTR”, Public Key Cryptography—PKC 2001, Lecture Notes in Computer Science, 1992. pp. 73–86. Berlin/Heidelberg: Springer.
[174] Lenstra, H. W. (1987). “Factoring Integers with Elliptic Curves”, Annals of Mathematics, 126: 649–673.
[175] Lenstra, H. W. and C. Pomerance (2005), “Primality Testing with Gaussian Periods” [online document]. Available at http://www.math.dartmouth.edu/~carlp/PDF/complexity12.pdf (October 2008).
[176] Lercier, R. (1997). “Finding Good Random Elliptic Curves for Cryptosystems Defined over GF(2^n)”, Advances in Cryptology—EUROCRYPT ’97, Lecture Notes in Computer Science, 1233. pp. 379–392. Berlin/Heidelberg: Springer.
[177] Lercier, R. and D. Lubicz (2003). “Counting Points on Elliptic Curves over Finite Fields of Small Characteristic in Quasi Quadratic Time”, Advances in Cryptology—EUROCRYPT 2003, Lecture Notes in Computer Science, 2656. pp. 360–373. Berlin/Heidelberg: Springer.
[178] Libert, B. and J.-J. Quisquater (2003), “New Identity Based Signcryption Schemes from Pairings” [online document]. Available at http://eprint.iacr.org/2003/023/ (October 2008).
[179] Lidl, R. and H. Niederreiter (1984). Finite Fields, Encyclopedia of Mathematics and Its Applications, 20. Cambridge: Cambridge University Press.
[180] Lidl, R. and H. Niederreiter (1994). Introduction to Finite Fields and Their Applications. Cambridge: Cambridge University Press.
[181] Liu, D. and P. Ning (2003a). “Establishing Pairwise Keys in Distributed Sensor Networks”, pp. 52–61. Proceedings of the 10th ACM Conference on Computer and Communication Security, Washington D.C., USA, October 2003.
[182] Liu, D. and P. Ning (2003b). “Location-Based Pairwise Key Establishments for Static Sensor Networks”, pp. 72–82. Proceedings of the 1st ACM Workshop on Security in Ad Hoc and Sensor Networks, Fairfax, Virginia, 31 October 2003.
[183] Liu, D., P. Ning and R. Li (2005). “Establishing Pairwise Keys in Distributed Sensor Networks”, ACM Transactions on Information and System Security, (8) 1: 41–77.
[184] Lucks, S. (2000). “Attacking Seven Rounds of Rijndael Under 192-bit and 256-bit Keys”, pp. 215–229. Proceedings of the 3rd Advanced Encryption Standard Candidate conference, New York, April 2000.
[185] Malone-Lee, J. (2002), “Identity-Based Signcryption” [online document]. Available at http://eprint.iacr.org/2002/098/ (October 2008).
[186] Mao, W. (2001). “New Zero-Knowledge Undeniable Signatures—Forgery of Signature Equivalent to Factorisation”. Hewlett-Packard technical report HPL-2001-36.
[187] Mao, W. and K. G. Paterson (2000). “Convertible Undeniable Standard RSA Signatures”. Hewlett-Packard technical report HPL-2000-148.
[188] Matsumoto, T. and H. Imai (1988). “Public Quadratic Polynomial-Tuples for Efficient Signature-Verification and Message-Encryption”, Advances in Cryptology—EUROCRYPT ’88, Lecture Notes in Computer Science, 330. pp. 419–453. Berlin/Heidelberg: Springer.
[189] McCurley, K. S. (1990). “The Discrete Logarithm Problem”, in C. Pomerance and S. Goldwasser (eds.), Cryptology and Computational Number Theory: American Mathematical Society Short Course, Boulder, Colorado, 6–7 August 1989. Proceedings of Symposia in Applied Mathematics, 42. pp. 49–74. Providence, Rhode Island: American Mathematical Society.
[190] McEliece, R. J. (1978). “A Public-Key Cryptosystem Based on Algebraic Coding Theory”. DSN progress report 42–44, Jet Propulsion Laboratory, California Institute of Technology, pp. 114–116.
[191] Menezes, A. J. (ed.) (1993). Applications of Finite Fields. Boston: Kluwer Academic Publishers.
[192] Menezes, A. J. (1993). Elliptic Curve Public Key Cryptosystems. The Springer International Series in Engineering and Computer Science, 234. Springer. Available at http://books.google.co.in/books?id=bIb54ShKS68C (October 2008).
[193] Menezes, A. J., T. Okamoto and S. Vanstone (1993). “Reducing Elliptic Curve Logarithms to a Finite Field”, IEEE Transactions on Information Theory, 39: 1639–1646.
[194] Menezes, A. J., P. van Oorschot and S. Vanstone (1997). Handbook of Applied Cryptography. Boca Raton, Florida: CRC Press.
[195] Menezes, A. J., Y. Wu and R. Zuccherato (1996). “An Elementary Introduction to Hyperelliptic Curves”. CACR technical report CORR 96-19, University of Waterloo, Canada.
[196] Merkle, R. C. and M. E. Hellman (1978). “Hiding Information and Signatures in Trapdoor Knapsacks”, IEEE Transactions on Information Theory, 24 (5): 525–530.
[197] Mermin, N. D. (2003). “From Cbits to Qbits: Teaching Computer Scientists Quantum Mechanics”, American Journal of Physics, 71: 23–30.
[198] Mermin, N. D. (2006), “Phys481-681-CS483 Lecture Notes and Homework Assignments” [online document]. Available at http://people.ccmr.cornell.edu/~mermin/qcomp/CS483.html (October 2008).
[199] Messerges, T. S. (2000). “Securing the AES Finalists Against Power Analysis Attacks”, Fast Software Encryption—FSE 2000, Lecture Notes in Computer Science, 1978. pp. 150–164. Berlin/Heidelberg: Springer.
[200] Messerges, T. S., E. A. Dabbish and R. H. Sloan (1999). “Power Analysis Attacks of Modular Exponentiation in Smartcards”, Cryptographic Hardware and Embedded Systems—CHES 1999, Lecture Notes in Computer Science, 1717. pp. 144–157. Berlin/Heidelberg: Springer.
[201] Messerges, T. S., E. A. Dabbish and R. H. Sloan (2002). “Examining Smart-Card Security Under the Threat of Power Analysis Attacks”, IEEE Transactions on Computers, 51 (4): 541–552.
[202] Michels, M. and M. Stadler (1997). “Efficient Convertible Undeniable Signature Schemes”, pp. 231–244. Proceedings of the 4th International Workshop on Selected Areas in Cryptography, Ottawa, Canada.
[203] Mignotte, M. (1992). Mathematics for Computer Algebra. New York: Springer.
[204] Miller, G. L. (1976). “Riemann’s Hypothesis and Tests for Primality”, Journal of Computer and System Sciences, 13: 300–317.
[205] Miller, V. (1986). “Uses of Elliptic Curves in Cryptography”, Advances in Cryptology—CRYPTO ’85, Lecture Notes in Computer Science, 218. pp. 417–426. Berlin/Heidelberg: Springer.
[206] Möller, B. (2001). “Securing Elliptic Curve Point Multiplication Against Side-Channel Attacks”, Information Security Conference, Lecture Notes in Computer Science, 2200. pp. 324–334. Berlin/Heidelberg: Springer.
[207] Mollin, R. A. (1998). Fundamental Number Theory with Applications. Boca Raton, Florida: Chapman & Hall/CRC.
[208] Mollin, R. A. (1999). Algebraic Number Theory. Boca Raton, Florida: Chapman & Hall/CRC.
[209] Mollin, R. A. (2001). An Introduction to Cryptography. Boca Raton, Florida: Chapman & Hall/CRC.
[210] Montgomery, P. L. (1985). “Modular Multiplication Without Trial Division”, Mathematics of Computation, 44: 519–521.
[211] Montgomery, P. L. (1994). “A Survey of Modern Integer Factorization Algorithms”, CWI Quarterly, 7 (4): 337–366.
[212] Montgomery, P. L. (1995). “A Block Lanczos Algorithm for Finding Dependencies over GF(2)”, Advances in Cryptology—EUROCRYPT ’95, Lecture Notes in Computer Science, 921. pp. 106–120. Berlin/Heidelberg: Springer.
[213] Morrison, M. A. and J. Brillhart (1975). “A Method of Factoring and a Factorization of F7”, Mathematics of Computation, 29: 183–205.
[214] * Motwani, R. and P. Raghavan (1995). Randomized Algorithms. Cambridge: Cambridge University Press.
[215] Muir, J. A. (2001). Techniques of Side Channel Cryptanalysis [dissertation]. Canada: University of Waterloo. Available at http://www.uwspace.uwaterloo.ca/bitstream/10012/1098/1/jamuir2001.pdf (October 2008).
[216] Neukirch, J. (1999). Algebraic Number Theory. Berlin and Heidelberg: Springer.
[217] Nguyen, P. Q. (2006), “A Note on the Security of NTRUSign” [online document]. Available at http://eprint.iacr.org/2006/387 (October 2008).
[218] * Nielsen, M. A. and I. L. Chuang (2000). Quantum Computation and Quantum Information. Cambridge: Cambridge University Press.
[219] NIST (2001), “Advanced Encryption Standard” [online document]. Available at http://csrc.nist.gov/publications/fips/fips197/fips-197.pdf (October 2008).
[220] NIST (2006), “Digital Signature Standard (DSS)” [online document]. Available at http://csrc.nist.gov/publications/drafts/fips_186-3/Draft-FIPS-186-3%20_March2006.pdf (October 2008).
[221] NIST (2007a), “Federal Information Processing Standards” [online document]. Available at http://csrc.nist.gov/publications/PubsFIPS.html (October 2008).
[222] NIST (2007b), “Secure Hash Standard (SHS)” [online document]. Available at http://csrc.nist.gov/publications/drafts/fips_180-3/draft_fips-180-3_June-08-2007.pdf (October 2008).
[223] Nyberg, K. and R. A. Rueppel (1993). “A New Signature Scheme Based on the DSA Giving Message Recovery”, pp. 58–61. Proceedings of the 1st ACM Conference on Computer and Communications Security, Fairfax, Virginia, 3–5 November.
[224] Nyberg, K. and R. A. Rueppel (1995). “Message Recovery for Signature Schemes Based on the Discrete Logarithm Problem”, Advances in Cryptology—EUROCRYPT ’94, Lecture Notes in Computer Science, 950. pp. 182–193. Berlin/Heidelberg: Springer.
[225] Odlyzko, A. M. (1985). “Discrete Logarithms and Their Cryptographic Significance”, Advances in Cryptology—EUROCRYPT ’84, Lecture Notes in Computer Science, 209. pp. 224–314. Berlin/Heidelberg: Springer.
[226] Odlyzko, A. M. (2000). “Discrete Logarithms: The Past and the Future”, Designs, Codes and Cryptography, 19: 129–145.
[227] Okamoto, T. (1992). “Provably Secure and Practical Identification Schemes and Corresponding Signature Schemes”, Advances in Cryptology—CRYPTO ’92, Lecture Notes in Computer Science, 740. pp. 31–53. Berlin/Heidelberg: Springer.
[228] Okamoto, T., E. Fujisaki and H. Morita (1998). “TSH-ESIGN: Efficient Digital Signature Scheme Using Trisection Size Hash”, submission to IEEE P1363a.
[229] Papadimitriou, C. H. (1994). Computational Complexity. Reading, Massachusetts: Addison-Wesley.
[230] Park, S., T. Kim, Y. An and D. Won (1995). “A Provably Entrusted Undeniable Signature”, pp. 644–648. IEEE Singapore International Conference on Network/International Conference on Information Engineering (SICON/ICIE ’95).
[231] Patarin, J. (1995). “Cryptanalysis of the Matsumoto and Imai Public Key Scheme of Eurocrypt’88”, Advances in Cryptology—CRYPTO ’95, Lecture Notes in Computer Science, 963. pp. 248–261. Berlin/Heidelberg: Springer.
[232] Patarin, J. (1996). “Hidden Fields Equations (HFE) and Isomorphisms of Polynomials (IP): Two New Families of Asymmetric Algorithms”, Advances in Cryptology—EUROCRYPT ’96, Lecture Notes in Computer Science, 1070. pp. 33–48. Berlin/Heidelberg: Springer.
[233] Pirsig, R. M. (1974). Zen and the Art of Motorcycle Maintenance: An Inquiry into Values. London: Bodley Head.
[234] Pohlig, S. and M. Hellman (1978). “An Improved Algorithm for Computing Logarithms over GF (p) and its Cryptographic Significance”, IEEE Transactions on Information Theory, 24: 106–110.
[235] Pohst, M. and H. Zassenhaus (1989). Algorithmic Algebraic Number Theory, Encyclopaedia of Mathematics and Its Applications, 30. Cambridge: Cambridge University Press.
[236] Pointcheval, D. and J. Stern (1996). “Provably Secure Blind Signature Schemes”, Advances in Cryptology—ASIACRYPT ’96, Lecture Notes in Computer Science, 1163. pp. 252–265. Berlin/Heidelberg: Springer.
[237] Pointcheval, D. and J. Stern (2000). “Security Arguments for Digital Signatures and Blind Signatures”, Journal of Cryptology, 13 (3): 361–396.
[238] Pollard, J. M. (1974). “Theorems on Factorization and Primality Testing”, Proceedings of the Cambridge Philosophical Society, 76 (2): 521–528.
[239] Pollard, J. M. (1975). “A Monte Carlo Method for Factorization”, BIT, 15 (3): 331–334.
[240] Pollard, J. M. (1993). “Factoring with Cubic Integers”, in A. K. Lenstra and H. W. Lenstra (eds.), The Development of the Number Field Sieve, Lecture Notes in Mathematics, 1554. pp. 4–10. Berlin: Springer.
[241] Pomerance, C. (1985). “The Quadratic Sieve Factoring Algorithm”, Advances in Cryptology—EUROCRYPT ’84, Lecture Notes in Computer Science, 209. pp. 169–182. Berlin/Heidelberg: Springer.
[242] Pomerance, C. (2008). “Elementary Thoughts on Discrete Logarithms”, pp. 385–396. in J. P. Buhler and P. Stevenhagen (eds.), Surveys in Algorithmic Number Theory, Publications of the Research Institute for Mathematical Sciences, 44. New York: Cambridge University Press.
[243] Preskill, J. (1998). “Quantum Computing: Pro and Con”, Proceedings of the Royal Society of London, A454:469–486.
[244] Preskill, J. (2007), “Course Information for Quantum Computation” [online document]. Available at http://theory.caltech.edu/people/preskill/ph219/ (October 2008).
[245] Proos, J. and C. Zalka (2004), “Shor’s Discrete Logarithm Quantum Algorithm for Elliptic Curves” [online document]. Available at http://arxiv.org/abs/quant-ph/0301141 (October 2008).
[246] Rabin, M. O. (1979). “Digitalized Signatures and Public-Key Functions as Intractable as Factorization”. Technical report MIT/LCS/TR-212, MIT Laboratory for Computer Science, Massachusetts.
[247] Rabin, M. O. (1980a). “Probabilistic Algorithms in Finite Fields”, SIAM Journal of Computing, 9: 273–280.
[248] Rabin, M. O. (1980b). “Probabilistic Algorithm for Testing Primality”, Journal of Number Theory, 12: 128–138.
[249] Ram Murty, M. (2001). Problems in Analytic Number Theory. New York: Springer.
[250] Raymond, J.-F. and A. Stiglic (2000), “Security Issues in the Diffie-Hellman Key Agreement Protocol” [online document]. Available at http://crypto.cs.mcgill.ca/~stiglic/Papers/dhfull.pdf (October 2008).
[251] Ribenboim, P. (2001). Classical Theory of Algebraic Numbers. Universitext. New York: Springer.
[252] Rivest, R. L., A. Shamir and L. M. Adleman (1978). “A Method for Obtaining Digital Signatures and Public-Key Cryptosystems”, Communications of the ACM, 21 (2): 120–126.
[253] Rosser, J. B. and L. Schoenfeld (1962). “Approximate Formulas for Some Functions of Prime Numbers”, Illinois Journal of Mathematics, 6: 64–94.
[254] RSA Security Inc. (2008), “Public-Key Cryptography Standards” [online document]. Available at http://www.rsa.com/rsalabs/node.asp?id=2124 (October 2008).
[255] Sakurai, J. J. (1994). Modern Quantum Mechanics. Revised by San-Fu Tuan, Reading, Massachusetts: Addison-Wesley.
[256] Satoh, T. (2000). “The Canonical Lift of an Ordinary Elliptic Curve over a Finite Field and Its Point Counting”, Journal of Ramanujan Mathematical Society, 15: 247–270.
[257] Satoh, T. and K. Araki (1998). “Fermat Quotients and the Polynomial Time Discrete Log Algorithm for Anomalous Elliptic Curves”, Commentarii Mathematici Universitatis Sancti Pauli, 47: 81–92.
[258] Schiff, L. I. (1968). Quantum Mechanics, 3rd ed. New York: McGraw-Hill.
[259] Schindler, W., F. Koeune and J.-J. Quisquater (2001). “Unleashing the Full Power of Timing Attack”. Technical report CG-2001/3, Université Catholique de Louvain, Belgium. Available at http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.23.6622.
[260] Schirokauer, O. (1993). “Discrete Logarithms and Local Units”, Philosophical Transactions of the Royal Society of London, Series A, 345: 409–423.
[261] Schirokauer, O., D. Weber, and T. Denny (1996). “Discrete Logarithms: The Effectiveness of the Index Calculus Method”, Algorithmic Number Theory—ANTS-II, Lecture Notes in Computer Science, 1122. pp. 337–361. Berlin/Heidelberg: Springer.
[262] * Schneier, B. (2006). Applied Cryptography, 2nd ed. New York: John Wiley & Sons.
[263] Schnorr, C. P. (1991). “Efficient Signature Generation for Smart Cards”, Journal of Cryptology, 4: 161–174.
[264] Schoof, R. (1995). “Counting Points on Elliptic Curves over Finite Fields”, Journal de Théorie des Nombres de Bordeaux, 7: 219–254.
[265] Semaev, I. A. (1998). “Evaluation of Discrete Logarithms on Some Elliptic Curves”, Mathematics of Computation, 67: 353–356.
[266] Shamir, A. (1984). “A Polynomial-Time Algorithm for Breaking the Basic Merkle-Hellman Cryptosystem”, IEEE Transactions on Information Theory, 30: 699–704.
[267] Shamir, A. (1984). “Identity-Based Cryptosystems and Signature Schemes”, Advances in Cryptology—CRYPTO ’84, Lecture Notes in Computer Science, 196. pp. 47–53. Berlin/Heidelberg: Springer.
[268] Shamir, A. (1997). “How to Check Modular Exponentiation”, presented at the rump session of Advances in Cryptology—EUROCRYPT ’97, May.
[269] Shamir, A. (1999). “Factoring Large Numbers with the TWINKLE Device”, Cryptographic Hardware and Embedded Systems—CHES ’99, Lecture Notes in Computer Science, 1717. pp. 2–12. Berlin/Heidelberg: Springer.
[270] Shamir, A. and E. Tromer (2003). “Factoring Large Numbers with the TWIRL Device”, Advances in Cryptology—CRYPTO 2003, Lecture Notes in Computer Science, 2729. pp. 1–26. Berlin/Heidelberg: Springer.
[271] Shor, P. W. (1997). “Polynomial-Time Algorithms for Prime Factorization and Discrete Logarithms on a Quantum Computer”, SIAM Journal of Computing, 26: 1484–1509.
[272] Shoup, V. (1990). “On the Deterministic Complexity of Factoring Polynomials over Finite Fields”, Information Processing Letters, 33: 261–267.
[273] Shparlinski, I. E. (1991). “On Some Problems in the Theory of Finite Fields”, Russian Mathematical Surveys, 46 (1): 199–240.
[274] Shparlinski, I. E. (1992). Computational and Algorithmic Problems in Finite Fields, Mathematics and its Applications, 88. Kluwer Academic Publishers.
[275] * Silverman, J. H. (1986). The Arithmetic of Elliptic Curves. Graduate Texts in Mathematics, 106. Berlin and New York: Springer.
[276] Silverman, J. H. (1994). Advanced Topics in the Arithmetic of Elliptic Curves. Graduate Texts in Mathematics, 151. New York: Springer.
[277] Silverman, J. H. (2000). “The Xedni Calculus and the Elliptic Curve Discrete Logarithm Problem”, Design, Codes and Cryptography, 20: 5–40.
[278] Silverman, J. H. and J. Suzuki (1998). “Elliptic Curve Discrete Logarithms and the Index Calculus”, Advances in Cryptology—ASIACRYPT ’98, Lecture Notes in Computer Science, 1514. pp. 110–125. Berlin/Heidelberg: Springer.
[279] Silverman, R. D. (1987). “The Multiple Polynomial Quadratic Sieve”, Mathematics of Computation, 48: 329–339.
[280] * Sipser, M. (1997). Introduction to the Theory of Computation, 2nd ed. Boston: PWS Publishing Company.
[281] Skjernaa, B. (2003). “Satoh’s Algorithm in Characteristic 2”, Mathematics of Computation, 72: 477–487.
[282] Smart, N. P. (1999). “The Discrete Logarithm Problem on Elliptic Curves of Trace One”, Journal of Cryptology, 12: 193–196.
[283] Smart, N. P. (2002). Cryptography: An Introduction. New York: McGraw-Hill. The 2nd edition of this book is available online at http://www.cs.bris.ac.uk/~nigel/Crypto_Book/ (October 2008).
[284] Smith, P. J. (1993). “LUC Public-Key Encryption: A Secure Alternative to RSA”, Dr. Dobb’s Journal, 18 (1): 44–49.
[285] Smith, P. J. and M. J. J. Lennon (1993). “LUC: A New Public Key System”, IFIP Transactions, A 37. pp. 103–117. Proceedings of the IFIP TC11, 9th International Conference on Information Security. Computer Security. Amsterdam: North-Holland Co.
[286] Smith, P. J. and C. Skinner (1995). “A Public-Key Cryptosystem and Digital Signature System Based on the Lucas Function Analogue to Discrete Logarithms”, Advances in Cryptology—ASIACRYPT ’94, Lecture Notes in Computer Science, 917. pp. 357–364. Berlin/Heidelberg: Springer.
[287] Solovay, R. and V. Strassen (1977). “A Fast Monte Carlo Test for Primality”, SIAM Journal of Computing, 6: 84–86.
[288] * Stallings, W. (2006). Cryptography and Network Security, 4th ed. Upper Saddle River, New Jersey: Prentice-Hall.
[289] Stam, M. and A. K. Lenstra (2001). “Speeding up XTR”, Advances in Cryptology—ASIACRYPT 2001, Lecture Notes in Computer Science, 2248. pp. 125–143. Berlin/Heidelberg: Springer.
[290] Stein, A. and E. Teske (2005). “Optimized Baby Step-Giant Step Methods”, Journal of Ramanujan Mathematical Society, 20 (1): 27–58.
[291] * Stinson, D. (2005). Cryptography: Theory and Practice, 3rd ed. Boca Raton, Florida: CRC Press.
[292] Strassen, V. (1969). “Gaussian Elimination Is not Optimal”, Numerische Mathematik, 13: 354–356.
[293] Stucki, D., N. Gisin, O. Guinnard, G. Ribordy and H. Zbinden (2002). “Quantum Key Distribution over 67 km with a Plug & Play System”, New Journal of Physics, 4: 41.1–41.8.
[294] Sun, H.-M., W.-C. Yang and C.-S. Laih (1999). “On the Design of RSA with Short Secret Exponent”, Advances in Cryptology—ASIACRYPT ’99, Lecture Notes in Computer Science, 1716. pp. 150–164. Berlin/Heidelberg: Springer.
[295] Swade, D. (2000). The Cogwheel Brain: Charles Babbage and the Quest to Build the First Computer. London: Little, Brown and Company.
[296] Trappe, W. and L. C. Washington (2006). Introduction to Cryptography with Coding Theory, 2nd ed. Upper Saddle River: Prentice-Hall.
[297] Verheul, E. R. (2001). “Evidence that XTR is More Secure than Supersingular Elliptic Curve Cryptosystems”, Advances in Cryptology—EUROCRYPT 2001, Lecture Notes in Computer Science, 2045. pp. 195–210. Berlin/Heidelberg: Springer.
[298] Washington, L. C. (2003). Elliptic Curves: Number Theory and Cryptography. Boca Raton, Florida: Chapman & Hall/CRC.
[299] Weber, D. (1996). “Computing Discrete Logarithms with the General Number Field Sieve”, Algorithmic Number Theory—ANTS-II, Lecture Notes in Computer Science, 1122. pp. 337–361. Berlin/Heidelberg: Springer.
[300] Weber, D. (1998). “Computing Discrete Logarithms with Quadratic Number Rings”, Advances in Cryptology—EUROCRYPT ’98, Lecture Notes in Computer Science, 1403. pp. 171–183. Berlin/Heidelberg: Springer.
[301] Weber, D. and T. Denny (1998). “The Solution of McCurley’s Discrete Log Challenge”, Advances in Cryptology—CRYPTO ’98, Lecture Notes in Computer Science, 1462. pp. 458–471. Berlin/Heidelberg: Springer.
[302] Western, A. E. and J. C. P. Miller (1968). “Tables of Indices and Primitive Roots”, Royal Society Mathematical Tables, 9, Cambridge: Cambridge University Press.
[303] Wiedemann, D. H. (1986). “Solving Sparse Linear Equations over Finite Fields”, IEEE Transactions on Information Theory, 32: 54–62.
[304] Wiener, M. J. (1990). “Cryptanalysis of Short RSA Secret Exponents”, IEEE Transactions on Information Theory, 36: 553–558.
[305] Williams, H. C. (1982). “A p + 1 Method for Factoring”, Mathematics of Computation, 39 (159): 225–234.
[306] Yang, L. T. and R. P. Brent (2001). “The Parallel Improved Lanczos Method for Integer Factorization over Finite Fields for Public Key Cryptosystems”, pp. 106–114. Proceedings of the ICPP Workshops 2001, Valencia, Spain, 3–7 September.
[307] Young, A. and M. Yung (1996). “The Dark Side of ‘Black-Box’ Cryptography, or: Should We Trust Capstone?”, Advances in Cryptology—CRYPTO ’96, Lecture Notes in Computer Science, 1109. pp. 89–103. Berlin/Heidelberg: Springer.
[308] Young, A. and M. Yung (1997a). “Kleptography: Using Cryptography Against Cryptography”, Advances in Cryptology—EUROCRYPT ’97, Lecture Notes in Computer Science, 1233. pp. 62–74. Berlin/Heidelberg: Springer.
[309] Young, A. and M. Yung (1997b). “The Prevalence of Kleptographic Attacks on Discrete-Log Based Cryptosystems”, Advances in Cryptology—CRYPTO ’97, Lecture Notes in Computer Science, 1294. pp. 264–276. Berlin/Heidelberg: Springer.
[310] Zheng, Y. (1997). “Digital Signcryption or How to Achieve Cost(Signature & Encryption) << Cost(Signature) + Cost(Encryption)”, Advances in Cryptology—CRYPTO ’97, Lecture Notes in Computer Science, 1294. pp. 165–179. Berlin/Heidelberg: Springer.
[311] Zheng, Y. (1998a). “Signcryption and Its Applications in Efficient Public Key Solutions”, 1997 Information Security Workshop ISW ’97, Lecture Notes in Computer Science, 1397. pp. 291–312. Berlin/Heidelberg: Springer.
[312] Zheng, Y. (1998b). “Shortened Digital Signature, Signcryption, and Compact and Unforgeable Key Agreement Schemes”, contribution to IEEE P1363 Standard for Public Key Cryptography.
[313] Zheng, Y. and H. Imai (1998a). “Efficient Signcryption Schemes on Elliptic Curves”. Proceedings of the IFIP 14th International Information Security Conference IFIP/SEC ’98, Vienna, Austria, September 1998. Chapman & Hall.
[314] Zheng, Y. and H. Imai (1998b). “How to Construct Efficient Signcryption Schemes on Elliptic Curves”, Information Processing Letters, 68: 227–233.
[315] Zheng, Y. and T. Matsumoto (1996). “Breaking Smartcard Implementations of ElGamal Signatures and Its Variants”, presented at the rump session of Advances in Cryptology—ASIACRYPT ’96. Available at http://www.sis.uncc.edu/~yzheng/publications/ (October 2008).
[316] * Zuckerman, H. S., H. L. Montgomery and I. Niven (1991). An Introduction to the Theory of Numbers. New York: John Wiley & Sons.
Books marked with stars have Asian editions (at the time of writing).
Fortunately, we have a convincing answer. Most cryptography textbooks today, even many of the celebrated ones, take an essentially narrative approach. While such an approach may suit beginners at the undergraduate level, it misses the finer details of this rapidly growing area of applied mathematics. Public-key cryptography is undeniably mathematical, and a mathematical subject is best treated mathematically.
This is precisely the gap that this book addresses: it proceeds in a canonically mathematical way while developing cryptographic concepts. The mathematics is often not simple (which is perhaps why other textbooks omit it), but we adhere to mathematical rigour as far as possible. A distinctive feature of this book is that it relies on nothing beyond the reader’s mathematical intuition; it develops all the mathematical abstractions from scratch. Although computer science and mathematics students nowadays take courses on discrete structures somewhere in their curricula, we do not assume this; instead, we develop the algebra starting at the level of set operations. Simpler structures like groups, rings and fields are followed by more advanced concepts like finite fields, algebraic curves, number fields and p-adic numbers. The resulting (long) compilation of abstract mathematical tools should relieve cryptography students and researchers of having to consult many mathematics books for the background concepts. We are happy to offer this self-sufficient treatment, complete with proofs and other details. The only place where we had to be somewhat sketchy is the discussion of elliptic and hyperelliptic curves; the mathematics there is too vast to fit in a few pages, so we opted for a deliberate simplification of these topics.
A big problem with discrete mathematics is that many of its proofs are existential. To make things work in a practical setting, however, one must study algebra and number theory algorithmically, and that is what this book does next. While many algorithmic issues in this area have been settled favourably, some problems remain whose best known algorithms are still of poor complexity. Some of these so-called computationally difficult problems are used to build secure public-key cryptosystems. The security of these systems is assumed (rather than proven), so we deal extensively with the algorithms known to date for solving these difficult problems. It is precisely here that the mathematics developed in the earlier chapters is put to greatest use.
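The computational asymmetry alluded to above can be sketched in a few lines of code. The following Python fragment (an illustration of ours, not drawn from the book, with toy parameters chosen only for demonstration) contrasts modular exponentiation, which takes time logarithmic in the exponent, with a brute-force search for the discrete logarithm, whose cost grows linearly with the group size:

```python
# Toy illustration of a computationally asymmetric operation:
# computing y = g^x mod p is cheap, but recovering x from y by
# exhaustive search takes time proportional to the group order.
# The parameters below are tiny and purely illustrative; real
# cryptosystems use moduli hundreds of digits long.

def mod_exp(base, exp, mod):
    """Square-and-multiply: O(log exp) modular multiplications."""
    result, base = 1, base % mod
    while exp > 0:
        if exp & 1:
            result = (result * base) % mod
        base = (base * base) % mod
        exp >>= 1
    return result

def brute_force_dlog(g, y, p):
    """Find x with g**x % p == y by exhaustive search: O(p) steps."""
    acc = 1
    for x in range(p - 1):
        if acc == y:
            return x
        acc = (acc * g) % p
    return None

# 2 is a primitive root modulo the prime 101, so the logarithm is unique.
p, g, secret = 101, 2, 73
y = mod_exp(g, secret, p)                   # fast: a handful of squarings
assert brute_force_dlog(g, y, p) == secret  # slow: up to p - 1 steps
```

For toy sizes the brute-force search finishes instantly, but its running time doubles with every extra bit of p, whereas the exponentiation grows only linearly in the bit length; this gap is exactly what the cryptosystems studied later exploit.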
Chapter 5 is the culmination of all these mathematical and algorithmic studies: the design of public-key systems for achieving various cryptographic goals. With the theoretical base developed in the earlier chapters, Chapter 5 turns out to be an easy chapter. This is our way of approaching the subject, namely, a formal bottom-up approach, and we claim to be different from most textbooks in this regard. Our discussion of mathematics is not for its own sake, but to lay the foundation of cryptographic primitives.
We then turn to practical and implementation issues of public-key cryptography. Standards proposed by organizations such as IEEE and RSA Security Inc. promote the interoperability of cryptographic primitives in Internet applications. We then look at some small applications of these cryptographic basics. Some indirect ways of cryptanalysis are described next. These techniques (side-channel and backdoor attacks) give the book a strong practical flavour in tandem with its otherwise formal appearance.
As an eleventh-hour decision, we added a final chapter on quantum computation and its implications for public-key cryptography. Although somewhat theoretical at this point, quantum computation has important ramifications for public-key cryptography. The mathematics behind quantum mechanics and computation is deliberately not developed in the earlier chapters, which highlights the distinctive nature of this chapter; it might perhaps be titled “cryptography in the future”.
This schematic description perhaps makes it clear that the book is best suited as a graduate-level textbook. A one- or two-semester graduate or advanced undergraduate course can be based on its contents. Self-study is also possible at an advanced graduate or research level, but is expected to be difficult at the undergraduate level; we stress the importance of classroom teaching if an undergraduate course is to be based on this textbook.
We have rated the items in the book by their levels of difficulty and/or mathematical sophistication. Unstarred items can be covered even in undergraduate courses. Items marked with a single star are suitable for a second course or a second reading. Doubly starred items, on the other hand, are research-level material and can be pursued only in really advanced courses or in research. The inclusion of a good amount of this advanced material marks another distinction of this book from other available textbooks.
The book comes with plenty of exercises, behind which our motivation is two-fold. In the first place, they help readers deepen their understanding of the material discussed in the text. In the second place, some of the exercises build additional theory that we omit from the text proper; we occasionally make use of these additional topics in proving and/or explaining results in the text. We do not classify the exercises into easy and difficult ones, but provide hints, some of which are quite explicit, for the intellectually challenging parts. We collect the hints in an appendix near the end of the book and leave the marker [H] at the appropriate locations in the statements of the exercises. This practice prevents a reader from accidentally seeing a hint; only when stuck need the reader turn to the hints at the end. We believe that the exercises, together with our discussion of algorithms and implementation issues, will offer serious students many ways to carry out substantial implementation work to further their research and development in cryptography.
Every chapter ends with annotated references for further studies. We do not claim to be encyclopaedic in this respect. Instead we mention only those references that, we feel, are directly related to the topics dealt with in the respective chapters.
As a trade-off between bulk and coverage, we had to leave many issues untouched. For example, constraints of space prevented us from presenting symmetric-key cryptography in detail; however, in view of its importance today, we include brief discussions of block ciphers, stream ciphers and hash functions in an appendix. We also do not discuss the formal security of public-key protocols. The issues related to provable security are, at the minimum, theoretically important in the study of cryptography, but are left out here; only a brief discussion of the implications of complexity theory for the security of public-key protocols is included in another appendix. The Handbook of Applied Cryptography [194] by Menezes et al. can supplement this book for learning symmetric techniques, whereas the book by Delfs and Knebl [74] or those by Goldreich [113, 114] can be consulted for formal security issues.
We are indebted to everybody whose criticism, encouragement and support helped this project materialize. Special thanks go to Bimal Roy, Chandan Mazumdar, C. Pandurangan, Debdeep Mukhopadhyay, Dipanwita Roychowdhury, Gagan Garg, Hartmut Wiebe, H. V. Kumar Swamy, Indranil Sengupta, Kapil Paranjape, Manindra Agarwal, Palash Sarkar, Rajesh Pillai, Rana Barua, R. Balasubramanian, Sanjay Barman, Shailesh, Satrajit Ghosh, Souvik Bhattacherjee, Srihari Vavilapalli, Subhamoy Maitra, Surjyakanta Mohapatro, and Uwe Storch. This book has been tested in postgraduate courses in the Indian Institute of Science, Bangalore, and in the Indian Institute of Technology Kharagpur. We sincerely thank all our students for pointing out many errors and suggesting several improvements. We express our deep gratitude to our family members for their constant understanding and moral support. We are also indebted to our institutes for providing the wonderful intellectual climate for completing this work.
A. D.
C. E. V. M.
Any time you are stuck on a problem, introduce more notation.
—Chris Skinner [Plenary Lecture, Aug 1997, Topics in Number Theory, Penn State]
| General | |
| |a| | absolute value of real number a |
| min S | minimum of elements of set S |
| max S | maximum of elements of set S |
| exp(a) | e^a, where e is the base of natural logarithms |
| log x | logarithm of x with respect to some unspecified base (like 10) |
| ln x | loge x, the natural logarithm of x |
| lg x | log2 x |
| logk x | (log x)k (similarly, lnk x = (ln x)k and lgk x = (lg x)k) |
| := | is defined as (or “is assigned the value” in code snippets) |
| i | √–1, the imaginary unit |
| z̄ | complex conjugate (x – iy) of the complex number z = x + iy |
| δij | Kronecker delta |
| (asas–1 . . . a0)b | b-ary representation of a non-negative integer |
| (n r) | binomial coefficient, equals n(n – 1) ··· (n – r + 1)/r! |
| ⌊x⌋ | floor of real number x |
| ⌈x⌉ | ceiling of real number x |
| [a, b] | closed interval, that is, the set of real numbers x in the range a ≤ x ≤ b |
| (a, b) | open interval, that is, the set of real numbers x in the range a < x < b |
| L(t, α, c) | expression of the form exp((c + o(1))(ln t)^α (ln ln t)^(1–α)) |
| Lt[c] | abbreviation for L(t, 1/2, c) (denoted also as L[c] if t is understood) |
| Bit-wise operations (on bit strings a, b) | |
| NAND | negation of AND |
| NOR | negation of OR |
| XOR | exclusive OR |
| a ⊕ b | bit-wise exclusive OR (XOR) of a and b |
| a AND b | bit-wise AND of a and b |
| a OR b | bit-wise inclusive OR of a and b |
| LSk(a) | left shift of a by k bits |
| RSk(a) | right shift of a by k bits |
| LRk(a) | left rotate (cyclic left shift) of a by k bits |
| RRk(a) | right rotate (cyclic right shift) of a by k bits |
| ā | bit-wise complement of a |
| a ‖ b | concatenation of a and b |
| Sets | |
| ∅ | empty set |
| #A | cardinality of set A |
| a ∈ A | a is an element of set A |
| A ⊆ B | set A is contained in set B |
| A ⊈ B | set A is not contained in set B |
| A ⊊ B | set A is properly contained in set B |
| A ∪ B | union of sets A and B |
| A ⊎ B | disjoint union of sets A and B |
| A ∩ B | intersection of sets A and B |
| A \ B | difference of sets A and B |
| Ā | complement of set A (in a bigger set) |
| A × B | (Cartesian) product of sets A and B |
| ℕ | set of all natural numbers, that is, {1, 2, 3, . . .} |
| ℕ0 | set of all non-negative integers, that is, {0, 1, 2, . . .} |
| ℤ | set of all integers, that is, {. . . , –2, –1, 0, 1, 2, . . .} |
| ℙ | set of all (positive) prime numbers, that is, {2, 3, 5, 7, . . .} |
| ℚ | set of all rational numbers, that is, {a/b | a, b ∈ ℤ, b ≠ 0} |
| ℚ* | set of all non-zero rational numbers |
| ℝ | set of all real numbers |
| ℝ* | set of all non-zero real numbers |
| ℝ≥0 | set of all non-negative real numbers |
| ℂ | set of all complex numbers |
| ℂ* | set of all non-zero complex numbers |
| ℤn | ring ℤ/nℤ of integers modulo n, can be represented by the set {0, 1, . . . , n – 1} |
| ℤn* | group of units in ℤn, can be represented as {a | 0 ≤ a < n, gcd(a, n) = 1} |
| 𝔽q | finite field of cardinality q |
| 𝔽q* | multiplicative group of 𝔽q, that is, 𝔽q \ {0} |
| 𝒪K | ring of integers of number field K |
| 𝒪K* | group of units of 𝒪K |
| ℤp | ring of p-adic integers |
| ℚp | field of p-adic numbers |
| Up | group of units of ℤp |
| Functions and relations | |
| f : A → B | f is a function from set A to set B |
| f : A ↪ B | f is an injective function from set A to set B |
| f : A ↠ B | f is a surjective function from set A to set B |
| a ↦ b | a is mapped to b (by a function) |
| f ο g | composition of functions f and g (applied from right to left) |
| f–1 | inverse of bijective function f |
| Ker f | kernel of function (homomorphism) f |
| Im f | image of function f |
| ~ | equivalent to |
| [a] | equivalence class of a |
| Groups | |
| aH | coset in a multiplicative group |
| a + H | coset in an additive group |
| HK | internal direct product of (sub)groups H and K |
| H × K | external direct product of (sub)groups H and K |
| [G : H] | index of subgroup H in group G |
| G/H | quotient group |
| G1 ≅ G2 | groups G1 and G2 are isomorphic |
| ord G | order (that is, cardinality) of group G |
| ordG a | order of element a in group G |
| Exp G | exponent of group G |
| Z(G) | centre of group G |
| C(a) | centralizer of group element a |
| GLn(K) | general linear group over field K (of n × n matrices) |
| SLn(K) | special linear group over field K (of n × n matrices) |
| Gtors | torsion subgroup of G |
| Rings | |
| char A | characteristic of ring A |
| A × B | direct product of rings A and B |
| A* | multiplicative group of units of ring A |
| 〈S〉 | for ring A, ideal generated by S ⊆ A |
| 〈a〉 | for ring A, principal ideal generated by a ∈ A, also written as aA and Aa |
| a ≡ b (mod 𝔞) | a is congruent to b modulo ideal 𝔞, that is, a – b ∈ 𝔞 |
| A ≅ B | rings A and B are isomorphic |
| A/𝔞 | quotient ring (modulo ideal 𝔞) |
| a|b | a divides b (in some ring) |
| vp(a) | multiplicity of prime p in element a |
| p^k‖a | k = vp(a) |
| nil(A) | nilradical of ring A |
| Ared | reduction of ring A, equals A/nil(A) |
| gcd(a, b) | greatest common divisor of elements a and b |
| lcm(a, b) | least common multiple of elements a and b |
| 𝔞 + 𝔟 | sum of ideals 𝔞 and 𝔟 |
| 𝔞 ∩ 𝔟 | intersection of ideals 𝔞 and 𝔟 |
| 𝔞𝔟 | product of ideals 𝔞 and 𝔟 |
| √𝔞 | root (or radical) of ideal 𝔞 |
| Q(A) | total quotient ring of ring A (quotient field of A, if A is an integral domain) |
| S–1A | localization of ring A at multiplicative set S |
| A𝔭 | localization of ring A at prime ideal 𝔭 |
| 𝒪K | ring of integers of number field K |
| N(𝔞) | norm of ideal 𝔞 (in a Dedekind domain) |
| CRT | Chinese remainder theorem |
| ED | Euclidean domain |
| DD | Dedekind domain |
| DVD (or DVR) | discrete valuation domain (or ring) |
| PID | principal ideal domain |
| UFD | unique factorization domain |
| Fields | |
| char K | characteristic of field K |
| K* | multiplicative group of units of field K, that is, K \ {0} |
| K̄ | algebraic closure of field K |
| [K : F] | degree of the field extension F ⊆ K |
| K[a] | {f(a) | f(X) ∈ K[X]} |
| K(a) | {f(a)/g(a) | f(X), g(X) ∈ K[X], g(a) ≠ 0} |
| Aut K | group of automorphisms of field K |
| AutF K | for field extension F ⊆ K, group of F-automorphisms of K (also Gal(K|F)) |
| FixF H | for field extension F ⊆ K, fixed field of subgroup H of AutF K |
| 𝔽q | finite field of cardinality q |
| 𝔽q* | multiplicative group of units of 𝔽q, that is, 𝔽q \ {0} |
| Tr | trace function |
| TrK|F (a) | for field extension F ⊆ K, trace of a ∈ K over F |
| N | norm function |
| NK|F (a) | for field extension F ⊆ K, norm of a ∈ K over F |
| φq | Frobenius automorphism a ↦ a^q |
| 𝒪K | ring of integers of number field K |
| 𝒪K* | group of units of 𝒪K |
| ΔK | discriminant of number field K |
| ℤp | ring of p-adic integers |
| ℚp | field of p-adic numbers |
| Up | group of units of ℤp |
| | |p | p-adic norm on ℚp |
| Integers | |
| a quot b | quotient of Euclidean division of a by b ≠ 0 |
| a rem b | remainder of Euclidean division of a by b ≠ 0 |
| a|b | a divides b in ℤ, that is, b = ca for some c ∈ ℤ |
| vp(a) | multiplicity of prime p in non-zero integer a |
| gcd(a, b) | greatest common divisor of integers a and b (not both zero) |
| lcm(a, b) | least common multiple of integers a and b |
| a ≡ b (mod n) | a is congruent to b modulo n |
| a–1 (mod n) | multiplicative inverse of a modulo n (given that gcd(a, n) = 1) |
| φ(n) | Euler’s totient function |
| (a/n) | Legendre (or Jacobi) symbol |
| [a]n | coset a + nℤ |
| ordn a | multiplicative order of a modulo n (given that gcd(a, n) = 1) |
| μ(n) | Möbius function |
| π(x) | number of primes between 1 and positive real number x |
| Li(x) | Gauss’ Li function |
| ψ(x, y) | fraction of positive integers ≤ x that are y-smooth |
| ζ(s) | Riemann zeta function |
| RH | Riemann hypothesis |
| ERH | extended Riemann hypothesis |
| Mn | 2^n – 1 (Mersenne number) |
| ℜ | 2^32, standard radix for representation of multiple-precision integers |
| Polynomials | |
| A[X1, . . . , Xn] | polynomial ring in indeterminates X1, . . . , Xn over ring A |
| A(X1, . . . , Xn) | ring of rational functions in indeterminates X1, . . . , Xn over ring A |
| deg f | degree of polynomial f |
| lc f | leading coefficient of polynomial f |
| minpolyα,K(X) | minimal polynomial of α over field K, belongs to K[X] |
| cont f | content of polynomial f |
| pp f | primitive part of polynomial f |
| f′(X) | formal derivative of polynomial f(X) |
| Δ(f) | discriminant of polynomial f |
![]() | the polynomial ![]() |
| μm | group of m-th roots of unity |
| Фm | m-th cyclotomic polynomial |
| Vector spaces, modules and matrices | |
| dimK V | dimension of vector space V over field K |
| Span S | span of subset S of a vector space |
| HomK(V, W) | set of all K-linear transformations V → W |
| EndK(V) | set of all K-linear transformations V → V |
| M/N | quotient vector space or module |
| M ≅ N | vector spaces or modules M and N are isomorphic |
| ∏i∈I Mi | direct product of modules Mi, i ∈ I |
| ⊕i∈I Mi | direct sum of modules Mi, i ∈ I |
| At | transpose of matrix (or vector) A |
| A–1 | inverse of matrix A |
| Rank T | rank of matrix or linear transformation T |
| RankA M | rank of A-module M |
| Null T | nullity of matrix or linear transformation T |
| (M : N) | for A-module M and submodule N, the ideal {a ∈ A | aM ⊆ N} of A |
| AnnA(M) | annihilator of A-module M, same as (M : 0) |
| Tors M | torsion submodule of M |
| A[S] | A-algebra generated by set S |
| 〈v, w〉 | inner product of two real vectors v and w |
| Algebraic curves | |
| 𝔸^n(K) | n-dimensional affine space over field K |
| ℙ^n(K) | n-dimensional projective space over field K |
| (x1, . . . , xn) | affine coordinates of a point in 𝔸^n |
| [x0, x1, . . . , xn] | homogeneous (projective) coordinates of a point in ℙ^n |
| f(h) | homogenization of polynomial f |
| C(K) | set of K-rational points on curve C defined over field K |
| K[C] | ring of polynomial functions on curve C defined over K |
| K(C) | field of rational functions on curve C defined over K |
| [P] | point P on a curve, as it appears in formal sums |
| ordP (r) | order of rational function r at point P |
| DivK (C) | group of divisors on curve C defined over field K |
| Div0K (C) | group of divisors of degree 0 on curve C defined over field K |
| DivK(r) | divisor of a rational function r |
| PrinK(C) | group of principal divisors on curve C defined over field K |
| JK(C) | Jacobian of curve C defined over field K |
| PicK(C) | Picard group of curve C (equals DivK(C)/PrinK(C)) |
| Pic0K(C) | group of divisor classes of degree 0, same as the Jacobian JK(C) |
| O | point at infinity on an elliptic or a hyperelliptic curve |
| Δ(E) | discriminant of elliptic curve E |
| j(E) | j-invariant of elliptic curve E |
| E(K) | group of points on elliptic curve E defined over field K |
| P + Q | sum of two points P, Q ∈ E(K) |
| mP | m-th multiple (that is, m-fold sum) of point P ∈ E(K) |
| ψm, fm | m-th division polynomials |
| t | trace of Frobenius of elliptic curve E |
| EK[m] | group of m-torsion points in E(K) |
| E[m] | abbreviation for EK̄[m] |
| em | Weil pairing (a map E[m] × E[m] → μm) |
| Div(a, b) | representation of a reduced divisor on a hyperelliptic curve by polynomials a, b |
| Probability and statistics | |
| Pr(E) | probability of event E |
| Pr(E1|E2) | conditional probability of event E1 given event E2 |
| E(X) | expectation of random variable X |
| Var(X) | variance of random variable X |
| σX | standard deviation of random variable X (equals √Var(X)) |
| Cov(X, Y) | covariance of random variables X, Y |
| ρX,Y | correlation coefficient of random variables X, Y |
| Computational complexity | |
| f = O(g) | big-Oh notation: f is of the order of g |
| f = Ω(g) | big-Omega notation: g is of the order of f |
| f = Θ(g) | big-Theta notation: f and g have the same order |
| f = o(g) | small-oh notation: f is of strictly smaller order than g |
| f = ω(g) | small-omega notation: f is of strictly larger order than g |
| f = O~(g) | soft-Oh notation: f = O(g logk g) for real constant k ≥ 0 |
| P1 ≤P P2 | problem P1 is polynomial-time reducible to problem P2 |
| P1 ≅ P2 | problems P1 and P2 are polynomial-time equivalent |
| Intractable problems | |
| CVP | closest vector problem |
| DHP | (finite field) Diffie–Hellman problem |
| DLP | (finite field) discrete logarithm problem |
| ECDHP | elliptic curve Diffie–Hellman problem |
| ECDLP | elliptic curve discrete logarithm problem |
| HECDHP | hyperelliptic curve Diffie–Hellman problem |
| HECDLP | hyperelliptic curve discrete logarithm problem |
| GIFP | general integer factorization problem |
| IFP | integer factorization problem |
| QRP | quadratic residuosity problem |
| RSAIFP | RSA integer factorization problem |
| RSAKIP | RSA key inversion problem |
| RSAP | RSA problem |
| SQRTP | modular square root problem |
| SSP | subset sum problem |
| SVP | shortest vector problem |
| Algorithms | |
| ADH | Adleman, DeMarrais and Huang’s algorithm |
| AES | advanced encryption standard |
| AKS | Agarwal, Kayal and Saxena’s deterministic primality test |
| BSGS | Shanks’ baby-step–giant-step method |
| CBC | cipher-block chaining mode |
| CFB | cipher feedback mode |
| CSM | cubic sieve method |
| CSPRBG | cryptographically strong pseudorandom bit generator |
| CvA | Chaum and Van Antwerpen’s undeniable signature scheme |
| DDF | distinct-degree factorization |
| DES | data encryption standard |
| DH | Diffie–Hellman key exchange |
| DPA | differential power analysis |
| DSA | digital signature algorithm |
| DSS | digital signature standard |
| ECB | electronic codebook mode |
| ECDSA | elliptic curve digital signature algorithm |
| ECM | elliptic curve method |
| E-D-E | encryption–decryption–encryption scheme of triple encryption |
| EDF | equal-degree factorization |
| EG | Eschenauer and Gligor’s scheme |
| FEAL | fast data encipherment algorithm |
| FFS | Feige, Fiat and Shamir’s zero-knowledge protocol |
| GKR | Gennaro, Krawczyk and Rabin’s RSA-based undeniable signature scheme |
| GNFSM | general number field sieve method |
| GQ | Guillou and Quisquater’s zero-knowledge protocol |
| HFE | cryptosystem based on hidden field equations |
| ICM | index calculus method |
| IDEA | international data encryption algorithm |
| KLCHKP | braid group cryptosystem |
| L3 | Lenstra–Lenstra–Lovasz algorithm |
| LFSR | linear feedback shift register |
| LSM | linear sieve method |
| LUC | cryptosystem based on Lucas sequences |
| MOV | Menezes, Okamoto and Vanstone’s reduction |
| MPQSM | multiple polynomial quadratic sieve method |
| MQV | Menezes–Qu–Vanstone key exchange |
| NFSM | number field sieve method |
| NR | Nyberg–Rueppel signature algorithm |
| NTRU | Hoffstein, Pipher and Silverman’s encryption algorithm |
| NTRUSign | NTRU signature algorithm |
| OAEP | optimal asymmetric encryption procedure |
| OFB | output feedback mode |
| PAP | pretty awful privacy |
| PGP | pretty good privacy |
| PH | Pohlig–Hellman method |
| PRBG | pseudorandom bit generator |
| PSS | probabilistic signature scheme |
| QSM | quadratic sieve method |
| RSA | Rivest, Shamir and Adleman’s algorithm |
| SAFER | secure and fast encryption routine |
| Satoh–FGH | point counting algorithm on elliptic curves over fields of characteristic 2 |
| SDSA | shortened digital signature algorithm |
| SEA | Schoof, Elkies and Atkins’ algorithm for point counting on elliptic curves |
| SETUP | secretly embedded trapdoor with universal protection |
| SFF | square-free factorization |
| SHA | secure hash algorithm |
| SmartASS | algorithm for computing discrete logs in anomalous elliptic curves |
| SNFSM | special number field sieve method |
| SPA | simple power analysis |
| TWINKLE | the Weizmann Institute key location engine |
| TWIRL | the Weizmann Institute relation locator |
| XCM | xedni calculus method |
| XSL | extended sparse linearization attack |
| XTR | efficient and compact subgroup trace representation |
| ZK | zero-knowledge |
| Quantum computation | |
| |ψ〉 | ket notation for vector ψ |
| 〈φ|ψ〉 | inner product of vectors |ψ〉 and |φ〉 |
| ‖ψ‖ | norm of vector |ψ〉 (equals √〈ψ|ψ〉) |
| ℋn | n-dimensional Hilbert space (over ℂ) |
| |0〉, |1〉, . . . , |n – 1〉 | orthonormal basis of ℋn |
| cbit | classical bit |
| qubit | quantum bit |
| ⊗ | tensor product of Hilbert spaces |
| F | Fourier transform |
| H | Hadamard transform |
| I | Identity transform |
| X | Exchange transform |
| Z | Z transform |
| Computational primitives | |
| ulong | 32-bit unsigned integer data type (unsigned long) |
| ullong | 64-bit unsigned integer data type (unsigned long long) |
| a := b | assignment operator (returns the value assigned) |
| +, –, ×, /, % | arithmetic operators |
| ++, – – | increment and decrement operators |
| a ◊= b | a := a ◊ b for ◊ ∈ {+, –, ×, /, %} |
| =, ≠, >, <, ≥, ≤ | comparison operators |
| 1 | True as a condition |
| if | conditional statement: if (condition)··· |
| if-else | conditional statement: if (condition)··· , else··· |
| while | while loop: while (condition)··· |
| do | do loop: do···while (condition) |
| for | for loop: for (range of values)··· |
| {···} | block of statements |
| , or ; or new-line | statement terminator |
| /*··· */ | comment |
| return | return from this routine |
| Miscellaneous | |
![]() | end of (visible or invisible) proof |
![]() | end of item (like example, definition, assumption) |
| [H] | hint available in Appendix D |
| 1.1 | Introduction |
| 1.2 | Common Cryptographic Primitives |
| 1.3 | Public-key Cryptography |
| 1.4 | Some Cryptographic Terms |
| Chapter Summary |
Aller Anfang ist schwer: All beginnings are difficult.
—German proverb
Defendit numerus: There is safety in numbers.
—Anonymous
The ability to quote is a serviceable substitute for wit.
—W. Somerset Maugham
It is rather difficult to give a precise definition of cryptography. Loosely speaking, it is the science (or art or technology) of preventing access to sensitive data by parties who are not authorized to access the data. Secure transmission of messages over a public channel is the first, simplest and oldest example of a cryptographic protocol. For assessing the security of these protocols, one studies their possible weak points, namely the strategies for breaking them. This study is commonly referred to as cryptanalysis. And, finally, the study of both cryptography and cryptanalysis is known as cryptology.
| Cryptology = Cryptography + Cryptanalysis |
The science of cryptology is rather old. It naturally developed as and when human beings felt the need for privacy and secrecy. The rapid deployment of the Internet in recent years demands that we look at this subject with renewed interest. Newer requirements tailored to Internet applications keep cropping up and, as a result, newer methods, protocols and algorithms continue to appear. The most startling discoveries include the key-exchange protocol of Diffie and Hellman in 1976 and the RSA cryptosystem of Rivest, Shamir and Adleman in 1978. These opened up a new branch of cryptology: public-key cryptology. Historically, public-key technology predates the widespread Internet, but it is the latter that makes extensive use of the former.
This book is an attempt to introduce to the reader the vast and interesting branch of public-key cryptology. One of the most distinguishing features of public-key cryptology is that it involves a fair amount of abstract mathematics, which often stands in the way of a complete understanding for the uninitiated reader. This book tries to bridge that gap: we develop the required mathematics in necessary and sufficient detail.
This chapter is an overview of the topics that the rest of the book deals with. We start with a description of the most common cryptographic protocols. Then we introduce the public-key paradigm and discuss the source of its security. We use certain mathematical terms and notations throughout this chapter. If the reader is not already familiar with these terms, there is nothing to worry about. As we have just claimed, we will introduce the mathematics in the later chapters. The exposition of this chapter is expected to give the reader an overview of the area of public-key cryptography and also the requisite motivation for learning the mathematical tools that follow.
As claimed at the outset of this chapter, it is rather difficult to give a precise definition of the term cryptography. The best way to understand it is by examples. In this section, we briefly describe the common problems that cryptography deals with.
To start with, we introduce the legendary figures of cryptography: Alice, Bob and Carol. Alice wants to send a message to Bob over a public communication channel like the Internet and wants to ensure that nobody other than Bob can make out the meaning of the message. A third party like Carol, who has access to the communication channel, can intercept the message. But the message should be wrapped or transformed before transmission in such a way that knowledge of some secret piece of information is needed to unwrap or transform back the message. It is Bob who has this information, but not Carol (nor Dorothy nor Emily nor . . .).
It is expedient to point out here that Alice, Bob and Carol need not be human beings. They can stand for organizations (like banks) or, more correctly, for computers or computer programs run by individuals or organizations. It is, therefore, customary to call them parties, entities or subjects instead of persons or characters. In the cryptology jargon, Carol has got several names used interchangeably: adversary, eavesdropper, opponent, intruder, attacker and enemy are the most common ones. When a message transmission like that just mentioned is involved, Alice is called the sender and Bob is called the receiver of the message.
It is a natural strategy to put the message in a box and lock the box using a key, called the encryption key. A matching decryption key is needed to unlock the box and retrieve the message. The process of putting the message in the box is commonly called encoding and that of locking the box is called encryption. The reverse processes, namely unlocking the box and taking the message out of the box are respectively called decryption and decoding. This is precisely the classical encryption–decryption protocol of cryptography.[1]
[1] Some people prefer to use the terms enciphering and deciphering in place of the words encryption and decryption respectively.
In the world of electronic communication, a message M is usually a bit string, and encoding, encryption, decryption and decoding are well-defined transformations of bit strings. If we denote by fe the transformation function consisting of encoding and encryption, then we get a new bit string C = fe(M, Ke), where Ke stands for the encryption key. This bit string C is sent over the communication channel. After Bob receives C, he uses the reverse transformation fd (decryption followed by decoding) to get the original message M back; that is, M = fd(C, Kd). Note that the decryption key Kd is needed as an argument to fd. If Carol does not know this, she cannot compute M. We conventionally call M the plaintext message and C the ciphertext message.
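The round trip C = fe(M, Ke), M = fd(C, Kd) can be made concrete with a toy sketch. The function names below are ours and mirror the notation fe and fd; the cipher itself, a bit-wise XOR with a random one-time key (so that here Ke = Kd), is only a stand-in for a real algorithm.

```python
import os

def f_e(M: bytes, Ke: bytes) -> bytes:
    # Toy encryption: XOR each message byte with the corresponding key byte.
    return bytes(m ^ k for m, k in zip(M, Ke))

def f_d(C: bytes, Kd: bytes) -> bytes:
    # XOR is self-inverse, so decryption is the same transformation.
    return bytes(c ^ k for c, k in zip(C, Kd))

M = b"attack at dawn"
Ke = Kd = os.urandom(len(M))   # a fresh random key as long as the message
C = f_e(M, Ke)                 # ciphertext sent over the public channel
assert f_d(C, Kd) == M         # Bob recovers the plaintext
```

Without Kd, Carol learns nothing useful from C in this sketch; the systems studied in this book replace the XOR by mathematically richer transformations.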
The encoding and decoding operations do not make use of keys and can be performed by anybody. (It should not be difficult to put a letter in or take a letter out of an unlocked box!) One might then wonder why it is necessary to do these transformations instead of applying the encryption and decryption operations directly on M and C respectively. With whatever we have discussed so far, we cannot give a full answer to this question. For the answer, we will need to wait until we reach the later chapters. We only mention here that the encryption algorithms often require as input some mathematical entities (like integers or elements of a field) which are logically not bit strings. But that’s not all! As we see later, the additional transformations often add to the security of the protocols. On the other hand, for a general discussion, it is often unnecessary to start from the encoding process and end at the decoding process. As a result, we will assume, unless otherwise stated, that M is the input to the encryption routine and the output of the decryption routine, in which case fe and fd stand for the encryption and decryption functions only.
In the simplest form of the locking mechanism, one has Ke = Kd. That is, the same key, called the symmetric key or the secret key, is used for both encryption and decryption. Common examples of such symmetric-key algorithms include DES (Data Encryption Standard) together with its various modifications like Triple DES and DES-X, IDEA (International Data Encryption Algorithm), SAFER (Secure And Fast Encryption Routine), FEAL (Fast Data Encipherment Algorithm), Blowfish, RC5 and AES (Advanced Encryption Standard). We do not describe all these algorithms in this book; interested readers can consult the abundant literature to learn more about them.
The biggest disadvantage of a secret-key system is that Alice and Bob must agree upon the key Ke = Kd secretly, for example, by personal contact or over a secure channel. This is a serious limitation and is often neither practical nor even possible. Another drawback of secret-key systems is that every pair of parties needs a separate key for communication. Thus, if there are n entities communicating over a network, the number of keys is of the order of n², and each entity has to remember O(n) keys for communicating with the other entities. In practice, an entity does not communicate with every other entity on the network, yet the total number of keys it must remember could be quite high.
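The quadratic growth is easy to quantify: with one shared key per unordered pair of parties, n entities need n(n – 1)/2 keys in all, each entity holding n – 1 of them. A two-line check (function name ours):

```python
def symmetric_key_count(n: int) -> int:
    # one secret key for every unordered pair of communicating parties
    return n * (n - 1) // 2

print(symmetric_key_count(1000))  # → 499500 keys for a thousand entities
```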
Both these problems can be avoided by using what is called an asymmetric-key or a public-key protocol. In such a protocol, each entity decides a key pair (Ke, Kd), makes the encryption key Ke public and keeps the decryption key Kd secret. Ke is also called the public key and Kd the private key. Anybody who wants to send a message to Bob gets Bob’s public key, encrypts the message with the key, and sends the ciphertext to Bob. Upon receiving the ciphertext, Bob uses his private key to decrypt the message. One may view such a lock as a self-locking padlock. Anybody can lock a box with a self-locking padlock, but opening it requires a key which only Bob possesses.
The source of security of such a system is based on the difficulty of computing the private key Kd given the public key Ke. It is apparent that Ke and Kd are sort of inverses of each other, because the former is used to generate C from M and the latter is used to generate M from C. This is where mathematics comes into the picture. We mention a few possible constructions of key pairs in the next section and the rest of the book deals with an in-depth study of these public-key protocols.
Attractive as they look, public-key protocols have a serious drawback: they are orders of magnitude slower than their secret-key counterparts. This is of concern if huge amounts of data need to be encrypted and decrypted. The shortcoming can be overcome by using secret-key and public-key protocols in tandem as follows: Alice generates a secret key (say, for AES), encrypts the message with the secret key and the secret key with Bob’s public key, and sends both the encrypted message and the encrypted secret key. Bob first recovers the secret key using his private key and then uses it to decrypt the message. Since secret keys are usually short bit strings (most commonly 128 bits long), the slow performance of the public-key algorithm causes little trouble. At the same time, Alice and Bob are relieved of a prior secret meeting or communication for agreeing upon the secret key. Moreover, neither Alice nor Bob needs to remember the secret key: during every session of message transmission, a fresh random secret key can be generated and destroyed when the communication is over.
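This hybrid scheme can be sketched end to end. The sketch below is ours and deliberately toy-sized: the public-key layer is textbook RSA with tiny parameters (p = 61, q = 53), the secret-key layer is a repeating XOR standing in for AES, and the session key is wrapped byte by byte. None of this is secure; it only traces the data flow.

```python
import os

# Bob's toy RSA key pair (tiny, insecure parameters for illustration only).
p, q = 61, 53
n = p * q            # modulus 3233
e = 17               # public exponent (with n, Bob's public key)
d = 2753             # private exponent: e*d ≡ 1 (mod (p-1)(q-1))

def pk_encrypt(m: int) -> int:      # public-key operation (anyone)
    return pow(m, e, n)

def pk_decrypt(c: int) -> int:      # private-key operation (Bob only)
    return pow(c, d, n)

def sym_cipher(M: bytes, K: bytes) -> bytes:
    # Repeating-key XOR as a stand-in for a real symmetric cipher like AES.
    stream = (K * (len(M) // len(K) + 1))[: len(M)]
    return bytes(m ^ k for m, k in zip(M, stream))

# Alice: encrypt the bulk message under a fresh session key,
# and wrap the short session key with Bob's public key.
session_key = os.urandom(2)
C = sym_cipher(b"a long message ...", session_key)
wrapped = [pk_encrypt(b) for b in session_key]   # byte-wise wrapping (toy)

# Bob: unwrap the session key with his private key, then decrypt the message.
recovered = bytes(pk_decrypt(c) for c in wrapped)
assert sym_cipher(C, recovered) == b"a long message ..."
```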
There is an alternative method by which Alice and Bob can exchange secret information (like AES keys) over a public communication channel. Let us first see how this can be done in the physical lock-and-key scenario. Alice generates a secret, puts it in a box, locks the box with her own key and sends it to Bob. Bob, upon receiving the locked box, adds a second lock to it and sends the doubly locked box back to Alice. Alice then removes her lock and again sends the box to Bob. Finally, Bob uses his key to unlock the box and retrieve the secret. A third party (Carol) that can access the box during the three communications finds it locked by Alice or Bob or both. Since Carol does not possess the keys to these locks, she cannot open the box to discover the secret.
This process can be abstractly described as follows: Alice and Bob first independently generate key pairs (AKe, AKd) and (BKe, BKd) respectively. Alice then sends AKe to Bob and Bob sends BKe to Alice. The private keys AKd and BKd are not disclosed. They also agree upon a function g with which Alice computes gA = g(AKd, BKe) and Bob computes gB = g(BKd, AKe). If gA = gB, then this common value can be used as a shared secret between Alice and Bob.
Our intruder Carol knows g and taps the values of AKe and BKe. So the function g should be such that a knowledge of these values alone does not suffice for the computation of gA = gB. One of the private keys AKd or BKd is needed for the computation. Since (AKe, AKd) and (BKe, BKd) are key pairs, it is assumed that private keys are difficult to compute from the knowledge of the corresponding public keys.
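The best-known instantiation of this abstract protocol is Diffie–Hellman key exchange, where the public keys are g^AKd and g^BKd in a cyclic group and the function g(·,·) raises the other party's public key to one's own private key. A minimal sketch over a toy group (the prime p = 23 and generator g = 5 are illustrative values, far too small for real use):

```python
import secrets

# Public parameters: a prime p and a generator g of the multiplicative group mod p.
p, g = 23, 5

# Each party generates a private key and publishes g^key mod p.
a_private = secrets.randbelow(p - 2) + 1   # Alice's AKd
b_private = secrets.randbelow(p - 2) + 1   # Bob's BKd
a_public = pow(g, a_private, p)            # AKe, sent to Bob in the clear
b_public = pow(g, b_private, p)            # BKe, sent to Alice in the clear

# The function g(.,.) of the text: raise the received public key to
# one's own private key.  Both sides arrive at g^(ab) mod p.
g_A = pow(b_public, a_private, p)          # Alice computes g(AKd, BKe)
g_B = pow(a_public, b_private, p)          # Bob computes g(BKd, AKe)
```

Carol sees p, g, a_public and b_public, but recovering g_A = g_B from these is exactly the Diffie–Hellman problem discussed later in this chapter.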
Such a technique of exchanging secret values over an insecure channel is called a key-exchange or a key-agreement protocol. It is important to point out here that such a protocol is usually based on the public-key paradigm; that is to say, we do not know secret-key counterparts for a key-exchange protocol. Since a shared secret between the communicating parties is usually short, the low speed of public-key algorithms is really not a concern in this case.
A digital signature is yet another application of the public-key paradigm. Suppose Alice wants to sign a message M in such a way that the signature S can be verified by anybody but nobody other than Alice would be able to generate the signature S on the message M. This can be achieved as follows: Alice generates a key pair (Ke, Kd), makes Ke public and keeps Kd secret. She now uses the decryption function fd to generate the signature, that is, S = fd(M, Kd). The signature S is then made public. Anybody who has access to Alice’s public key Ke applies the reverse transformation fe to get back the message M = fe(S, Ke).
If Carol signs the message M with a different key K′d, then she generates the signature S′ = fd(M, K′d). Now, since K′d and Ke are not matching keys, verification using Ke gives M′ = fe(S′, Ke), which is different from M. If we assume that M is a message written in a human-readable language (like English), then M′ would generally look like a meaningless sequence of characters which is neither English nor any sensible string to a human reader. The signature verifier would then immediately conclude that the signature is forged.
Such a scheme of generating digital signatures is called a signature scheme with message recovery. It is obvious that this is the same as our encrypt–decrypt scheme with the sequence of encryption and decryption steps reversed. If the message M to be signed is quite long, using this algorithm calls for a large execution time both for signature generation and for verification. It is, therefore, customary to use another variant of signature schemes called signature schemes with appendix that we describe now.
Instead of applying the decryption transform directly on M, Alice first computes a short representative H(M) of her message M. Her signature now becomes the pair S = (M, σ), where σ = fd(H(M), Kd). Typically, a hash function (see Section 1.2.6) is used to compute the representative H(M) from M and is assumed to be a public knowledge. Now anybody can verify the signature by checking if the equality H(M) = fe(σ, Ke) holds. If a key different from Kd is used to generate the signature, one would (in general) get a value σ′ ≠ σ and the signature forging will be detected by observing that H(M) ≠ fe(σ′, Ke).
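A signature scheme with appendix can be sketched with textbook RSA as the underlying key pair and SHA-256 as the hash H. The tiny primes and the reduction of the digest modulo n are purely for illustration; real schemes use large keys and a proper padding of the digest.

```python
import hashlib

# Signer's textbook RSA key pair with tiny primes (illustration only).
p, q = 1009, 1013
n = p * q
phi = (p - 1) * (q - 1)
e = 5                     # public (verification) key
d = pow(e, -1, phi)       # private (signing) key

def H(message: bytes) -> int:
    # Short representative of the message, reduced to fit modulo n.
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % n

def sign(message: bytes) -> int:
    return pow(H(message), d, n)            # sigma = f_d(H(M), K_d)

def verify(message: bytes, sigma: int) -> bool:
    return H(message) == pow(sigma, e, n)   # check H(M) = f_e(sigma, K_e)

msg = b"pay Bob 100 rupees"
sigma = sign(msg)
ok = verify(msg, sigma)
forged = verify(b"pay Carol 100 rupees", sigma)   # reusing sigma on another M
```

Only the short value sigma goes through the slow private-key operation, regardless of how long M is; the appendix (M, sigma) is then published together.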
By entity authentication, we mean a process in which one entity, called the claimant, proves its identity to another entity, called the verifier. Entity-authentication techniques thus tend to prevent impersonation of an entity by an intruder. Both secret-key and public-key techniques are used in entity-authentication schemes.
The simplest example of an entity-authentication scheme is the use of passwords, as in a computer where a user (the claimant) tries to gain access to some resources in a computer (the verifier) by proving its identity using a password. Password schemes are mostly based on secret-key techniques. For example, the UNIX password system is based on encrypting the zero message (a string of 64 zero bits) using a repeated application of a variant of the DES algorithm with 64 bits of the user input (the password) as the key. Password-based authentication schemes are fixed and time-invariant and are often called weak authentication schemes.
We see applications of public-key techniques in challenge–response authentication schemes (also called strong authentication schemes). Assume that an entity, Alice, wants to prove her identity to another entity, Bob. Alice generates a key pair (Ke, Kd), makes Ke public and keeps Kd secret. Now, Bob chooses a random message M, encrypts M using Alice’s public key—that is, computes C = fe(M, Ke)—and sends C to Alice. Alice, upon reception of C, decrypts it using her private key Kd; that is, she regenerates M = fd(C, Kd) and sends M to Bob. Bob compares this value of M with the one he generated, and if a match occurs, Bob becomes sure that the entity who is claiming to be Alice possesses the knowledge of Alice’s private key. If Carol uses any private key other than Kd for the decryption, she gets a message M′ different from M and thereby cannot prove to Bob her identity as Alice. This is how this scheme prevents impersonation of Alice by Carol.
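The challenge–response exchange just described is a direct transcription of the protocol in the text, again using textbook RSA with tiny illustrative primes. (Note that answering arbitrary challenges turns Alice into a decryption oracle; practical protocols refine this basic idea.)

```python
import secrets

# Alice's textbook RSA key pair (tiny primes; illustration only).
p, q = 2003, 2011
n = p * q
phi = (p - 1) * (q - 1)
e = 17                    # Alice's public key (e, n)
d = pow(e, -1, phi)       # Alice's private key

# Bob: pick a random challenge M and send C = f_e(M, Ke) to Alice.
challenge = secrets.randbelow(n - 2) + 2
C = pow(challenge, e, n)

# Alice: decrypt with her private key and send the result back.
response = pow(C, d, n)

# Bob: accept the claimed identity iff the response matches his challenge.
authenticated = (response == challenge)
```

Only the holder of d can map C back to the challenge, so a correct response convinces Bob that the claimant knows Alice's private key.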
Entity authentication is often carried out using another interesting technique called zero-knowledge proof. In such a protocol, the verifier (or any third party listening to the conversation) gains no knowledge of the secret possessed by the claimant, but develops the desired confidence that the claimant indeed possesses the secret. We provide here an informal example explaining zero-knowledge proofs.
Let us think of a circular cave as shown in Figure 1.1. The cave has two exits, left and right, denoted by L and R respectively. The cave also has a door inside it, which is invisible from outside the cave. Alice (A) wants to prove to Bob (B) that she possesses a key to this door without showing him the key or the process of unlocking the door with the key. Bob stations himself somewhere outside the exits of the cave. Alice enters the cave and randomly chooses the left or right wing of the cave (and goes there). She does not disclose this choice to Bob, because Bob is not allowed to know the session secrets either. Once Alice is placed in the cave, Bob makes a random choice from L and R and asks Alice (using cell phones or by shouting loudly) to come out of the cave via that chosen exit. Suppose Bob challenges Alice to use L. If Alice is in the left wing, she can come out of the cave using L. If Alice is in the right wing, she must use her secret key to open the central door to come to the left wing and then go out using exit L. If Alice does not possess the secret key, she can succeed in obeying Bob’s directive only with probability one half. If this procedure is repeated t times, then the probability that Alice succeeds on all occasions without possessing the secret key is (1/2)^t = 1/2^t. By choosing t appropriately, Bob can make the probability of accepting a false claim arbitrarily small. For example, if t = 20, then the chance is less than one in a million that Alice can establish a false claim.

Thus, if Alice succeeds every time, Bob gains the desired confidence that Alice actually possesses the secret. However, during this entire process, Bob can obtain no information regarding Alice’s secrets (the key and the choices of wings). Another important aspect of this interaction is that Alice has no way of predicting Bob’s questions, preventing impostors (of Alice) from fooling Bob.
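The soundness argument above is easy to check empirically. The simulation below plays the cave game with a keyless Alice, who survives a round only when Bob's random challenge happens to match the wing she chose; the success rate over many trials should be close to (1/2)^t.

```python
import random

def cheating_alice_succeeds(t: int, rng: random.Random) -> bool:
    """Simulate t rounds in which Alice has NO key: she passes a round
    only if Bob's challenge matches the wing she randomly entered."""
    return all(rng.choice("LR") == rng.choice("LR") for _ in range(t))

rng = random.Random(42)      # fixed seed for reproducibility
trials = 10_000
t = 10
successes = sum(cheating_alice_succeeds(t, rng) for _ in range(trials))
# Expected success rate is (1/2)^10 = 1/1024, roughly 10 per 10,000 trials.
```

With t = 20 rounds, a full success by a keyless claimant becomes a less-than-one-in-a-million event, matching the figure in the text.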
Suppose that a secret piece of information is to be distributed among n entities in such a way that n – 1 (or fewer) entities are unable to reconstruct the secret. All of the n entities must participate to reveal the secret. As usual, let us assume that the secret is an l-bit string. A simple strategy would be to break the string into n parts and provide each entity with one part. This method is, however, not really attractive, because it gives away partial information about the secret. Thus, for example, if a 256-bit string is to be distributed equally among 16 entities, any 15 of them working together can reconstruct the secret by trying only 2^16 = 65536 possibilities for the unknown 16 bits.
We now describe an alternative strategy that does not suffer from this drawback. Once again, we break the secret string into n parts and consider the parts as integers a_0, . . . , a_{n–1}. We construct the monic polynomial f(x) = x^n + a_{n–1}x^{n–1} + · · · + a_1x + a_0 and give the integers f(1), f(2), . . . , f(n) to the entities. When all of the entities cooperate, the linear system of equations f(i) = i^n + a_{n–1}i^{n–1} + · · · + a_1i + a_0, 1 ≤ i ≤ n, can be solved for the unknown coefficients a_0, . . . , a_{n–1}, which, in turn, reveal the secret. On the other hand, if n – 1 or fewer entities cooperate, they get an underdetermined system of equations in n unknowns, from which the actual solution is not readily available.
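This share-and-solve procedure can be sketched directly. The example below uses a three-part secret and exact rational arithmetic (Gauss–Jordan elimination over fractions) to recover the coefficients from the n shares; the concrete numbers are arbitrary illustrations.

```python
from fractions import Fraction

# The secret, split into n integer parts a_0, ..., a_{n-1}.
coeffs = [11, 42, 7]
n = len(coeffs)

def f(x: int) -> int:
    # f(x) = x^n + a_{n-1} x^{n-1} + ... + a_1 x + a_0
    return x ** n + sum(a * x ** i for i, a in enumerate(coeffs))

# Entity i (1 <= i <= n) receives the share f(i).
shares = [(i, f(i)) for i in range(1, n + 1)]

def reconstruct(shares):
    # All n entities solve  sum_j a_j * i^j = f(i) - i^n
    # by Gauss-Jordan elimination over the rationals.
    k = len(shares)
    rows = [[Fraction(i ** j) for j in range(k)] + [Fraction(y - i ** k)]
            for i, y in shares]
    for col in range(k):
        pivot_row = next(r for r in range(col, k) if rows[r][col] != 0)
        rows[col], rows[pivot_row] = rows[pivot_row], rows[col]
        piv = rows[col][col]
        rows[col] = [v / piv for v in rows[col]]
        for r in range(k):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [v - factor * w for v, w in zip(rows[r], rows[col])]
    return [int(row[k]) for row in rows]

recovered = reconstruct(shares)
```

The coefficient matrix is a Vandermonde matrix in the distinct points 1, . . . , n, so it is always invertible and the full set of shares determines the secret uniquely.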
The secret-sharing problem can be generalized in the following way: distribute a secret among n parties in such a way that any m or more of the parties can reconstruct the secret (for some m ≤ n), whereas any m – 1 or fewer parties cannot do the same. A monic polynomial of degree m as in the above example readily adapts to this generalized situation.
A function which converts bit strings of arbitrary lengths to bit strings of a fixed (finite) length is called a hash function. Hash functions play a crucial role in cryptography. We have already seen an application of it for designing a digital signature scheme with appendix. If H is a hash function, a pair of input values (strings) x1 and x2 for which H(x1) = H(x2) is called a collision for H. For any hash function H, collisions must exist, since H is a map from an infinite set to a finite set. However, for cryptographic purposes we want that collisions should be difficult to obtain. More specifically, a cryptographic hash function H should satisfy the following desirable properties:
Except for a small set of hash values y it should be difficult to find an input x with H(x) = y. We exclude a small set of values, because an adversary might prepare (and maintain) a list of pairs (x, H(x)) for certain values of x of her choice. If the given value of y is the second coordinate of one pair in her list, she can produce the corresponding input value x easily.
Given a pair (x, H(x)), it should be difficult to find an input x′ different from x with H(x) = H(x′).
It should be difficult to find two different input strings x, x′ with H(x) = H(x′).
Hash functions are also called message digests and can be used with a secret key. Popular examples of unkeyed hash functions are SHA-1, MD5 and MD2, whereas keyed hash functions include HMAC and CBC-MAC.
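The fixed-length and avalanche properties just described are easy to observe with the standard library. The snippet below uses SHA-256 (a modern sibling of the SHA-1 family mentioned above) and HMAC; the key and messages are arbitrary illustrations.

```python
import hashlib
import hmac

# Arbitrary-length inputs map to a fixed-length (256-bit) digest.
d1 = hashlib.sha256(b"short").hexdigest()
d2 = hashlib.sha256(b"a much longer input string " * 100).hexdigest()

# Changing a single character of the input changes the digest completely.
d3 = hashlib.sha256(b"shoru").hexdigest()

# A keyed hash (HMAC): the same idea, parameterized by a secret key.
tag = hmac.new(b"secret key", b"message", hashlib.sha256).hexdigest()
```

Finding another input with digest d1 (a second preimage) or any colliding pair is believed infeasible for SHA-256, which is exactly the property the signature scheme with appendix relies on.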
So far we have seen several protocols which are based on the use of public keys of remote entities, but have never questioned the authenticity of public keys. In other words, it is necessary to ascertain that a public key is really owned by a remote entity. Public-key certificates are used to that effect. These are data structures that bind public-key values to entities. This binding is achieved by having a trusted certification authority digitally sign each certificate.
Typically a certificate is issued for a period of validity. However, it is possible that a certificate becomes invalid before its date of expiry for several reasons, like possible or suspected compromise of the private key. Under such circumstances it is necessary that the certification authority revokes the certificate and maintains a list called certificate revocation list (CRL) of revoked certificates. When Alice verifies the authenticity of Bob’s public-key certificate by verifying the digital signature of the authority and does not find the certificate in the CRL, she gains the desired confidence in using Bob’s public key.
The X.509 public-key infrastructure specifies Internet standards for certificates and CRLs.
In this section, we give a short introduction to the realization of public-key cryptosystems. More specifically, we list some of the computationally intensive mathematical problems and describe how the (apparent) intractability of these problems can be used for designing key pairs. We use some mathematical terms that we will introduce later in this book.
The security of the public-key cryptosystems is based on the presumed difficulty of solving certain mathematical problems.
The integer factorization problem (IFP): Given the product n = pq of two distinct prime integers p and q, find p and q.
The discrete logarithm problem (DLP): Let G be a finite cyclic (multiplicatively written) group with cardinality n and a generator g. Given an element a ∈ G, find an integer x (or the integer x with 0 ≤ x ≤ n – 1) such that a = g^x in G. Three different types of groups are commonly used for cryptographic applications: the multiplicative group of a finite field, the group of rational points on an elliptic curve over a finite field, and the Jacobian of a hyperelliptic curve over a finite field. By an abuse of notation, we often denote the DLP over finite fields as simply the DLP, whereas the DLP in elliptic curves and hyperelliptic curves is referred to as the elliptic curve discrete logarithm problem (ECDLP) and the hyperelliptic curve discrete logarithm problem (HECDLP) respectively.
The Diffie–Hellman problem (DHP): Let G and g be as above. Given elements g^a and g^b of G, compute the element g^{ab}. As in the case of the DLP, the DHP can be posed in the multiplicative group of a finite field, the group of rational points on an elliptic curve and the Jacobian of a hyperelliptic curve.
We show in the next section how (the intractability of) these problems can be exploited to create key pairs for various cryptosystems. These computational problems are termed difficult, intractable, infeasible or intensive in the sense that there are no known algorithms to solve these problems in time polynomially bounded by the input size. The best-known algorithms are subexponential or even fully exponential in some cases. This means that if the input size is chosen to be sufficiently large, then it is infeasible to compute the private key from a knowledge of the public key in a reasonable amount of time. This, in turn, implies (not provably, but as the current state of the art stands) that encryption or signature verification can be done rather quickly (in polynomial time), but the converse process of decryption or signature generation cannot be done in feasible time, unless one knows the private key. As a result, encryption (or signature verification) is called a trapdoor one-way function, that is, a function which is easy to compute but for which the inverse is computationally infeasible, unless some additional information (the trapdoor) is available.
It is, however, not known that these problems are really computationally infeasible, that is, there is no proof of the fact that these problems cannot be solved in polynomial time. As a result, the public-key cryptographic systems based on these problems are not provably secure.
In RSA and similar cryptosystems, one generates two (distinct) suitably large primes p and q and computes the product n = pq. Then φ(n) = (p – 1)(q – 1), where φ denotes Euler’s totient function. One then chooses a random integer e with gcd(e, φ(n)) = 1. There exists an integer d such that ed ≡ 1 (mod φ(n)). The integer e is used as the public key, whereas the integer d is used as the private key.
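The key-generation steps above can be sketched in a few lines. The primes here are tiny toys chosen only so the arithmetic is visible; real RSA keys use primes of a thousand or more bits, generated by probabilistic primality tests.

```python
from math import gcd

# Toy RSA key generation (tiny primes; illustration only).
p, q = 1013, 1019
n = p * q
phi = (p - 1) * (q - 1)          # Euler's totient of n, since p and q are prime

# Choose a public exponent e coprime to phi(n) ...
e = 3
while gcd(e, phi) != 1:
    e += 2

# ... and compute the private exponent d with e*d = 1 (mod phi(n)).
# Python's three-argument pow with exponent -1 runs the extended GCD.
d = pow(e, -1, phi)
```

Publishing (e, n) while keeping d (and p, q, phi) secret gives exactly the key pair described in the text; anyone who can factor n recovers phi and hence d.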
If the IFP can be solved fast, one can also compute φ(n) easily, and subsequently d can be computed from e using the (polynomial-time) extended GCD algorithm. This is why[2] we say that the RSA cryptosystem derives its security from the intractability of the IFP.
[2] The problem of factoring n = pq is polynomial-time equivalent to computing φ(n) = (p – 1)(q – 1).
In order to see how RSA encryption and decryption work, let the plaintext message be encoded as an integer m with 2 ≤ m < n. The ciphertext is generated (as an integer) as c = m^e (mod n). Decryption is analogous, that is, m = c^d (mod n). The correctness of the algorithm follows from the fact that ed ≡ 1 (mod φ(n)). It has, however, not been proved that one must know d or φ(n) or the factorization of n in order to decrypt an RSA-encrypted message. But at present no better methods are known.
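A complete worked instance of this encrypt–decrypt cycle, using the classic small parameters p = 61, q = 53 that appear in many expositions of RSA:

```python
# Toy RSA encryption/decryption with the classic small parameters.
p, q = 61, 53
n = p * q                 # 3233
phi = (p - 1) * (q - 1)   # 3120
e = 17                    # public exponent, gcd(17, 3120) = 1
d = pow(e, -1, phi)       # private exponent: 17*d = 1 (mod 3120)

m = 65                    # plaintext encoded as an integer, 2 <= m < n
c = pow(m, e, n)          # encryption:  c = m^e mod n
m_back = pow(c, d, n)     # decryption:  m = c^d mod n
```

The single call pow(m, e, n) performs the whole modular exponentiation; correctness of m_back = m is guaranteed by ed ≡ 1 (mod φ(n)).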
Let us now consider the discrete logarithm problem. Let G be a finite cyclic multiplicative group (as those mentioned above) where it is easy to multiply two elements, but where it is difficult to compute discrete logarithms. Let g be a generator of G. In order to set up a random key pair over such a group, one chooses the private key as a random integer d, 2 ≤ d < n, where n is the cardinality of G. The public key e is then computed as the element e = g^d of G.
Applications of encryption–decryption schemes based on the key pair (g^d, d) are given in Chapter 5. For now, we only remark that many such schemes (like the ElGamal scheme) derive their security from the DHP instead of the DLP, whereas other schemes (like the Nyberg–Rueppel scheme) do so from the DLP. It is assumed that these two problems are computationally equivalent (at least for the groups of our interest). Obviously, if one assumes availability of a solution of the DLP, one has a solution for the DHP too (g^{ab} = (g^a)^b). The reverse implication is not clear.
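As a preview of Chapter 5, here is a sketch of ElGamal encryption built on the key pair (g^d, d). The group parameters p = 23, g = 5 are toy illustrations; real deployments use groups of cryptographic size.

```python
import secrets

# Toy group parameters: generator g of the multiplicative group mod p.
p, g = 23, 5

# Key pair (g^d, d) as described in the text.
d = secrets.randbelow(p - 2) + 1     # private key
e_key = pow(g, d, p)                 # public key e = g^d

# Encrypt a message m (a nonzero residue mod p): pick an ephemeral k
# and send the pair (g^k, m * e^k).
m = 7
k = secrets.randbelow(p - 2) + 1
c1 = pow(g, k, p)
c2 = (m * pow(e_key, k, p)) % p

# Decrypt: the receiver computes c1^d = g^(kd) = e^k and divides it out.
shared = pow(c1, d, p)
m_back = (c2 * pow(shared, -1, p)) % p
```

An eavesdropper holding g^d and g^k must compute g^{kd} to undo the mask, which is precisely an instance of the DHP.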
As we pointed out earlier, (most of) the public-key cryptosystems are not provably secure, in the sense that they are based on the apparent difficulty of solving certain computational problems. It is expedient to know how difficult these problems are. No non-trivial complexity-theoretic statements are available for these problems, and as such it is worthwhile to study the algorithms known to date for solving them. Unfortunately, many of these cryptanalytic algorithms are much more complicated than the algorithms for building the corresponding cryptographic systems. One needs to acquire more mathematical machinery in order to understand (and augment) them. We devote Chapter 4 to a detailed discussion of these algorithms.
In specific situations, one need not always use these computationally intensive algorithms. Access to a party’s decryption equipment may allow an adversary to gain partial or complete information about the private key by watching a decryption process. For example, an adversary (say, the superuser) might have the capability to read the contents of the memory holding a private key during some decryption process. For another possibility, think of RSA decryption which involves a modular exponentiation. If the standard square-and-multiply algorithm (Algorithm 3.9) is used for this purpose and the adversary can tap some hardware details (like machine cycles or power fluctuations) during a decryption process, she can guess a significant number of the bits in the private key. Such attacks, often called side-channel attacks, are particularly relevant for cryptographic applications based on smart cards.
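The leak exploited by such timing and power attacks is visible even in a naive implementation of square-and-multiply (the standard exponentiation method referenced above as Algorithm 3.9; the instrumentation below is our own illustration, not the book's algorithm). The extra multiplication performed for each 1-bit of the exponent is exactly the side-channel signal.

```python
def square_and_multiply(base: int, exponent: int, modulus: int):
    """Left-to-right binary exponentiation, also counting the extra
    multiplications performed for 1-bits of the (secret) exponent."""
    result, multiplies = 1, 0
    for bit in bin(exponent)[2:]:
        result = (result * result) % modulus        # every bit: square
        if bit == "1":
            result = (result * base) % modulus      # only 1-bits: multiply
            multiplies += 1
    return result, multiplies

# Two exponents of the same bit length but different Hamming weight:
r1, ops1 = square_and_multiply(5, 0b1111, 1000003)
r2, ops2 = square_and_multiply(5, 0b1000, 1000003)
# ops1 != ops2, so the running time (or power trace) betrays the bits of d.
```

Countermeasures such as constant-time "square-and-always-multiply" ladders remove this data-dependent branch, at the cost of some performance.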
A cryptographic system is (believed to be) strong if and only if there are no good known mechanisms to break it. It is, therefore, for the sake of security that we must study cryptanalysis. Cryptography and cryptanalysis are deeply intertwined and a complete study of one must involve the other.
In cryptology, there are different models of attacks or attackers.
So far we have assumed that an adversary can only read messages during transmission over a channel. Such an adversary is called a passive adversary. An active adversary, on the other hand, can mutilate or delete messages during transmission and/or generate false messages. An attack mounted by an active (resp.[3] a passive) adversary is called an active (resp. a passive) attack. In this book, we will mostly concentrate on passive attacks.
[3] Throughout the book, resp. stands for respectively.
A two-party communication involves transmission of ciphertext messages over a communication channel. A passive attacker can read these ciphertext messages. In practice, however, an attacker might have more control over the choice of ciphertext and/or plaintext messages. Based on these capabilities of the attacker we have the following types of attacks.
This is the weakest model of the adversary. Here the attacker has absolutely no choices on the ciphertext messages that flow in the channel and also on the corresponding plaintext messages. Using only these ciphertext messages the attacker has to obtain a private key and/or a plaintext message corresponding to a new ciphertext message.
In this kind of attack (also called known-plaintext or known-ciphertext attack), the attacker uses her knowledge of some plaintext–ciphertext pairs. If many such pairs are available to the attacker, she can use these pairs to deduce a pattern based on which she can subsequently gain some information on a new plaintext for which the ciphertext is available. In a public-key scheme, the adversary can generate as many such pairs as she wants, because in order to generate such a pair it is sufficient to have a knowledge of the receiver’s public key. Thus a public-key encryption scheme must provide sufficient security against known plaintext attacks.
In this kind of attack, the attacker knows some plaintext–ciphertext pairs in which the plaintexts are chosen by the attacker. As discussed earlier, such an attack is easily mountable for a public-key encryption scheme.
This is similar to the chosen-plaintext attack with the additional possibility that the attacker chooses the plaintexts in the known plaintext–ciphertext pairs sequentially and adaptively based on the knowledge of the previous pairs. This kind of attack can be easily mounted on public-key encryption systems.
The attacker has knowledge of some plaintext–ciphertext pairs in which the ciphertexts are chosen by the attacker. Such an attack is not directly mountable on a public-key scheme, since obtaining a plaintext from a chosen ciphertext requires knowledge of the private key. However, if the attacker has access to the receiver’s decryption equipment, the machine can divulge the plaintexts corresponding to the ciphertexts that the attacker supplies to the machine. In this context, we assume that the machine does not reveal the private key itself, that is, it has the key stored secretly somewhere in its hardware which the attacker cannot directly access. However, the attacker can run the machine to know the plaintexts corresponding to the ciphertexts of her choice. Later (when the attacker no longer has access to the decryption equipment) the known pairs may be exploited to obtain information about the plaintext corresponding to a new ciphertext.
This is similar to the chosen-ciphertext attack with the additional possibility that the attacker chooses the ciphertexts in the known pairs sequentially and adaptively based on her knowledge of the previously generated plaintext–ciphertext pairs. This attack is mountable in a scenario described in connection with chosen-ciphertext attacks.
For a digital signature scheme, there are equivalent names for these types of attacks. The attacker is assumed to have access to the public key of the signer, because this key is used for signature verification. An attempt to forge signatures based only on the knowledge of this verification key is called a key-only attack. The adversary may additionally possess knowledge of some message–signature pairs. An attack based on this knowledge is called a known-pair or known-message or known-signature attack. If the messages are chosen by the adversary, we call the attack a chosen-message attack. If the adversary generates the sequence of messages in a chosen-message attack adaptively (based on the previously generated message–signature pairs), we have an adaptive chosen-message attack. An (adaptive or non-adaptive) chosen-message attack can be mounted, if the attacker gains access to the signer’s signature generation equipment, or if the signer is willing to sign arbitrary messages provided by the adversary.
The attacker can choose some signatures and generate the corresponding messages by encrypting them with the signer’s public key. The private-key operation on these messages generates the signatures chosen by the attacker. This gives chosen-signature and adaptive chosen-signature attacks on a digital signature scheme. Now the adversary cannot directly control the messages to sign. On the other hand, such an attack is easily mountable, because it utilizes only some public knowledge (the signer’s public key). Indeed, one may treat chosen-signature attacks as variants of key-only attacks.
So far, we have assumed that all the parties connected to a network know the algorithms used in a cryptographic scheme. The security of the scheme is based on the difficulty of obtaining some secret information (the secret or private key).
It, however, remains possible that two parties communicate using an algorithm unknown to other entities. Top-secret communications (for example, during wars or diplomatic transactions) often use private cryptographic algorithms. In this book, we will not deal with such techniques. Our attention is focused mostly on Internet applications in which public knowledge of the algorithms is of paramount importance (for the sake of universal applicability and convenience).
In short, this book is going to deal with a world in which only public public-key algorithms are deployed and in which adversaries are usually passive. A restricted model of the world though it may be, it is general and useful enough to concentrate on. Let us begin our journey!
This chapter provides an overview of the problems that cryptology deals with. The first and oldest cryptographic primitive is encryption for secure transmission of messages. Some other primitives are key exchange, digital signature, authentication, secret sharing, hashing, and digital certificates. We then highlight the difference between symmetric (secret-key) and asymmetric (public-key) cryptography. The relevance of some computationally intractable mathematical problems in public-key cryptography is discussed next, and the working of a prototype public-key cryptosystem (RSA) is explained. We finally discuss different models of attacks on cryptosystems.
Not uncommonly, some people think that cryptology also deals with intrusion, viruses and Trojan horses. We emphasize that this is not the case. Data and network security is the branch that deals with these topics. Cryptography is a part of this branch, but not conversely. Imagine that your house is to be secured against theft. First, you need a good lock; that is cryptography. However, a lock does nothing to prevent a thief from entering the house after breaking the window panes. A bad butler who leaks secret information of the house to the outside world also does not come under the jurisdiction of the lock. Securing your house requires adopting sufficient guards against all these possibilities of theft. In this book, we will study only the technology of manufacturing and breaking locks.
| 2.1 | Introduction |
| 2.2 | Sets, Relations and Functions |
| 2.3 | Groups |
| 2.4 | Rings |
| 2.5 | Integers |
| 2.6 | Polynomials |
| 2.7 | Vector Spaces and Modules |
| 2.8 | Fields |
| 2.9 | Finite Fields |
| 2.10 | Affine and Projective Curves |
| 2.11 | Elliptic Curves |
| 2.12 | Hyperelliptic Curves |
| 2.13 | Number Fields |
| 2.14 | p-adic Numbers |
| 2.15 | Statistical Methods |
| Chapter Summary | |
| Suggestions for Further Reading | |
Young man, in mathematics you don’t understand things, you just get used to them.
—John von Neumann
Mathematics contains much that will neither hurt one if one does not know it nor help one if one does know it.
—J. B. Mencken
Mathematics is the Queen of Science but she isn’t very pure; she keeps having babies by handsome young upstarts and various frog princes.
—Donald Kingsbury
In this chapter, we introduce the basic mathematical concepts that one should know in order to understand the public-key cryptographic protocols and the corresponding cryptanalytic algorithms described in the later chapters. If the reader is already familiar with these concepts, she may quickly browse through the chapter in order to know about our notations and conventions.
This chapter is meant for cryptology students and as such does not describe the mathematical topics in their full generality. It is our intention only to state (and, if possible, prove) the relevant results that would be useful for the rest of the book. For further study, we urge the reader to consult the books suggested at the end of this chapter.
Sets are absolutely basic entities used throughout the present-day study of mathematics. Unfortunately, however, we cannot define sets. Loosely speaking, a set is an (unordered) collection of objects. But we run into difficulty with this definition for collections that are too big. Of course, infinite sets like the set of all integers or real numbers are not too big. However, the collection of all sets is too big to be called a set. (Also see Exercise 2.6.) It is, therefore, customary to have an axiomatic definition of sets. That is to say, a collection qualifies to be a set if it satisfies certain axioms. We do not go into the details of this axiomatic definition, but state the axioms as properties of sets. Luckily enough, we will have no occasion in the rest of this book to deal with collections that are not sets. So the reader can, for the time being, have faith in the above (imprecise) identification of a set with a collection.
An object in a set A is commonly called an element of A. By the notation a ∈ A, we mean that a is an element of the set A. Often a set A can be represented explicitly by writing down its elements within curly brackets or braces. For example, A = {2, 3, 5, 7} denotes the set consisting of the elements 2, 3, 5, 7, which are incidentally all the (positive) prime numbers less than 10. We often use the ellipsis sign (. . .) to denote an infinite (or even a finite) set. For example, ℙ = {2, 3, 5, 7, 11, . . .} would denote the set of all (positive) prime numbers. (We prove later that ℙ is an infinite set.) Alternatively, we often describe a set by mentioning the properties of its elements. For example, the set ℙ can also be described as ℙ = {p | p is a (positive) prime number}.
Some frequently occurring sets are denoted by special symbols. We list a few of them here.
| ℕ | The set of all natural numbers, that is, {1, 2, 3, . . .} |
| ℕ₀ | The set of all non-negative integers, that is, {0, 1, 2, . . .} |
| ℤ | The set of all integers, that is, {. . . , –2, –1, 0, 1, 2, . . .} |
| ℙ | The set of all (positive) prime numbers, that is, {2, 3, 5, 7, . . .} |
| ℚ | The set of all rational numbers, that is, {a/b | a, b ∈ ℤ, b ≠ 0} |
| ℚ* | The set of all non-zero rational numbers |
| ℝ | The set of all real numbers |
| ℝ* | The set of all non-zero real numbers |
| ℂ | The set of all complex numbers |
| ℂ* | The set of all non-zero complex numbers |
| ∅ | The empty set |
The cardinality of a set A is the number of elements in A. We use the symbol #A to denote the cardinality of A. If #A is finite, we call A a finite set. Otherwise A is said to be infinite. The empty set has cardinality zero.
Let A and B be two sets. We say that A is a subset of B, denoted A ⊆ B, if all elements of A are in B. Two sets A and B are equal (that is, A = B) if and only if A ⊆ B and B ⊆ A. A is said to be a proper subset of B, denoted A ⊊ B, if A ⊆ B and A ≠ B (that is, B ⊈ A).
The union of A and B, denoted A ∪ B, is the set whose elements are either in A or in B (or both). The intersection of A and B, denoted A ∩ B, is the set consisting of the elements common to A and B. If A ∩ B = ∅, then we say that A and B are disjoint. In that case, the union A ∪ B is also called a disjoint union and is denoted by A ⊎ B. (For a generalization, see Exercise 2.7.) The difference of A and B, denoted A \ B, is the set whose elements are in A but not in B. If A is understood from the context and B ⊆ A, then we denote A \ B by B̄ and refer to B̄ as the complement of B (in A). The product A × B of two sets A and B is the set of all ordered pairs (a, b) with a ∈ A and b ∈ B.
The notion of union, intersection and product of sets can be readily extended to an arbitrary family of sets. Let Ai, i ∈ I, be a family of sets indexed by I. In this case, we denote the union and intersection of Ai, i ∈ I, by ⋃_{i∈I} Ai and ⋂_{i∈I} Ai respectively. The product of Ai, i ∈ I, is denoted by ∏_{i∈I} Ai. When Ai = A for all i ∈ I, we denote the product also as A^I. If, in addition, I is a finite set of cardinality n, then the product A^I is also written as A^n.
A relation ρ on a set A is a subset of A × A. For (a, b) ∈ ρ, we usually say a ρ b, implying that a is related by ρ to b. Common examples are the standard relations =, ≠, ≤, <, ≥, > on ℤ (or ℚ or ℝ).
A relation ρ on a set A is called reflexive, if a ρ a for all a ∈ A. For example, =, ≤ and ≥ are reflexive relations on ℤ, but the relations ≠, <, > are not.
A relation ρ on A is called symmetric, if a ρ b implies b ρ a. On the other hand, ρ is called anti-symmetric if a ρ b and b ρ a imply a = b. For example, = is symmetric and anti-symmetric, <, ≤, > and ≥ are anti-symmetric but not symmetric, ≠ is symmetric but not anti-symmetric.
A relation ρ on A is called transitive if a ρ b and b ρ c imply a ρ c. For example, =, <, ≤, >, ≥ are all transitive, but ≠ is not transitive.
An equivalence relation is one which is reflexive, symmetric and transitive. For example, = is an equivalence relation on ℤ, but none of the other relations mentioned above (≠, <, ≥ and so on) is an equivalence relation on ℤ.
A partition of a set A is a collection of pairwise disjoint subsets Ai, i ∈ I, of A, such that A = ⋃_{i∈I} Ai, that is, A is the union of Ai, i ∈ I, and for i, j ∈ I, i ≠ j, Ai ∩ Aj = ∅. The following theorem establishes an important connection between equivalence relations and partitions.
|
An equivalence relation on a set A produces a partition of A. Conversely, every partition of a set A corresponds to an equivalence relation on A. Proof Let ρ be an equivalence relation on a set A. For a ∈ A, define [a] := {b ∈ A | a ρ b}. By reflexivity, a ∈ [a], so that A is the union of these subsets. If c ∈ [a] ∩ [b] for some a, b ∈ A, then symmetry and transitivity give a ρ b, whence [a] = [b]. Thus the distinct subsets [a] are pairwise disjoint and constitute a partition of A. Conversely, let Ai, i ∈ I, be a partition of A. Define a ρ b if and only if a and b belong to the same subset Ai. One easily checks that ρ is reflexive, symmetric and transitive. |
The subset [a] of A defined in the proof of the above theorem is called the equivalence class of a with respect to the equivalence relation ρ.
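The construction of equivalence classes in the proof can be made concrete. The following sketch is our own illustration (not from the book); the set {0, . . . , 9} and the relation of congruence modulo 3 are chosen merely as examples.

```python
# A sketch: computing the partition of A induced by an equivalence relation.
# The relation "congruent modulo 3" on A = {0, ..., 9} is our own choice.

def equivalence_classes(A, related):
    """Group the elements of A into classes [a] = {b in A | a rho b}."""
    classes = []
    for a in A:
        for cls in classes:
            if related(a, cls[0]):   # by transitivity, testing one member suffices
                cls.append(a)
                break
        else:
            classes.append([a])      # a starts a new equivalence class
    return classes

parts = equivalence_classes(range(10), lambda a, b: (a - b) % 3 == 0)
# parts == [[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]] -- a partition of {0, ..., 9}
```

Note that the classes are pairwise disjoint and their union is all of A, exactly as the theorem asserts.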
An anti-symmetric and transitive relation is called a partial order (or simply an order). All of the relations =, ≤, <, ≥, > are partial orders on ℤ (but ≠ is not). A partial order ρ on A is called a total order or a linear order or a simple order, if for every a, b ∈ A, a ≠ b, either a ρ b or b ρ a. For example, if we take A = {1, 2, 3} and the relation ρ = {(1, 2), (1, 3)}, then ρ is a partial order but not a total order (because it does not specify a relation between 2 and 3). On the other hand, ρ′ = {(1, 2), (1, 3), (2, 3)} is a total order. A set with a partial (resp. total) order is often called a partially ordered (resp. totally ordered or linearly ordered or simply ordered) set.
Let A and B be two sets (not necessarily distinct). A function or a map f from A to B, denoted f : A → B, assigns to each a ∈ A some element b ∈ B. In this case, we write b = f(a) or f maps a ↦ b and say that b is the image of a (under f). For example, if A = B = ℝ, then the assignment a ↦ a² is a function. On the other hand, the assignment a ↦ √a (the non-negative square root) is not a function from ℝ to ℝ, because it is not defined for negative values of a. However, if A = ℝ and B = ℂ, then the assignment a ↦ √a (with non-negative real and imaginary parts) is a function.
The function f : A → A assigning a ↦ a for all a ∈ A is called the identity map on A and is usually denoted by idA. On the other hand, if f : A → B maps all the elements of A to a fixed element of B, then f is said to be a constant function. A function which is not constant is called a non-constant function.
A function f : A → B that maps different elements of A to different elements of B is called injective or one-one. In other words, f is injective if and only if f(a) = f(a′) implies a = a′. The function f : ℝ → ℝ given by a ↦ a² is not injective, since f(–a) = f(a) for all a ∈ ℝ. On the other hand, the function f : ℝ → ℝ given by a ↦ 2a is injective. An injective map f : A → B is sometimes denoted by the special symbol f : A ↪ B.
The image of a function f : A → B is defined to be the subset {f(a) | a ∈ A} of B. It is denoted by f(A) or by Im f. The function f is said to be surjective or onto or a surjection, if Im f = B, that is, every element b of B has at least one preimage a ∈ A (which means f(a) = b). As an example, the function ℤ → ℤ given by a ↦ a/2 (if a is even) and by a ↦ (a – 1)/2 (if a is odd) is surjective, whereas the function ℤ → ℤ that maps a ↦ |a| (the absolute value) is not surjective. A surjective map f : A → B is sometimes denoted by the special symbol f : A ↠ B.
A map f : A → B is called bijective or a bijection, if it is both injective and surjective. For example, the identity map on a set is bijective. Another example of a bijective function is ℕ → ℙ that maps a to the a-th prime.
Let f : A → B and g : B → C be functions. The composition of f and g is the function from A to C that takes a ↦ g(f(a)). It is denoted by g ο f, that is, (g ο f)(a) = g(f(a)). Note that in the notation g ο f one applies f first and then g. The notion of composition of functions can be extended to more than two functions. In particular, if f : A → B, g : B → C and h : C → D are functions, then (h ο g) ο f and h ο (g ο f) are the same function from A to D, so that we can unambiguously write this as h ο g ο f.
The study of mathematics is based on certain axioms. We state four of these axioms. The axioms themselves cannot be proved, but they can be shown to be equivalent in the sense that each of them can be derived, if any of the others is assumed to be true.
Let A be a partially ordered set under the relation ≤. An element a ∈ A is called maximal (resp. minimal), if there is no element b ∈ A, b ≠ a, that satisfies a ≤ b (resp. b ≤ a). Let B be a non-empty subset of A. Then an upper bound (resp. a lower bound) for B is an element a ∈ A such that b ≤ a (resp. a ≤ b) for all b ∈ B. If an upper bound (resp. a lower bound) a of B is an element of B, then a is called a last element or a largest element or a maximum element (resp. a first element or a least element or a smallest element or a minimum element) of B. By antisymmetry, it follows that a first (resp. last) element of B, if existent, is unique. A chain of A is a totally ordered (under ≤) subset of A.
Consider the sets ℕ, ℤ and ℚ with the natural order ≤. None of these sets contains a maximal element. ℕ contains a minimal element 1, but ℤ and ℚ do not contain minimal elements. The subset 2ℕ of even natural numbers has two lower bounds, namely 1 and 2, of which 2 is the first element of 2ℕ.
A totally ordered set A is said to be well ordered (and the relation is called a well order), if every non-empty subset B of A contains a first element.
|
Every set A can be well ordered, that is, there is a relation ≤ on A with respect to which A is a well ordered set. |
The set ℕ is well ordered under the natural relation ≤. The set ℤ can be well ordered by the relation ≼ defined as 0 ≼ 1 ≼ –1 ≼ 2 ≼ –2 ≼ 3 ≼ –3 ≼ · · ·. A well ordering of ℝ is not known.
|
Let A be a partially ordered set. If every chain of A has an upper bound (in A), then A has at least one maximal element. |
To illustrate Zorn’s lemma, consider any non-empty set A and define 𝒫(A) to be the set of all subsets of A. 𝒫(A) is called the power set of A and is partially ordered under containment ⊆. A chain of 𝒫(A) is a set {Ai | i ∈ I} of subsets of A such that for all i, j ∈ I either Ai ⊆ Aj or Aj ⊆ Ai. Clearly, the union ⋃_{i∈I} Ai is an upper bound of the chain. Then Zorn’s lemma guarantees that 𝒫(A) has at least one maximal element. In this case, the maximal element, namely A, is unique. If A is finite, then for the set of all proper subsets of A, a maximal element (under the partial order ⊆) exists by Zorn’s lemma, but is not unique, if #A > 1.
|
Let |
Finally, let A be a set and 𝒫*(A) := 𝒫(A) \ {∅}, that is, 𝒫*(A) is the set of all non-empty subsets of A. A choice function of A is a function f : 𝒫*(A) → A such that for every B ∈ 𝒫*(A) we have f(B) ∈ B.
|
Every set has a choice function. |
| 2.1 |
|
| 2.2 | Let f : A → B and g : B → A be functions. Show that if f ο g = idB, then g is injective and f is surjective. In particular, f (and also g) is bijective, if f ο g = idB and g ο f = idA. In this case, we call g the inverse of f and denote this as g = f⁻¹. Show by examples that both the conditions f ο g = idB and g ο f = idA are necessary for f to be bijective. |
| 2.3 | Let f : A → B be a map from a finite set A to a finite set B. Prove that
|
| 2.4 | Let A be a finite set and let f : A → A be a map. Show that the following conditions are equivalent.
Show by examples that this equivalence need not hold, if A is an infinite set. |
| 2.5 | Let A and B be two arbitrary sets, f : A → B a map, A′ ⊆ A and B′ ⊆ B. We define f(A′) := {f(a) | a ∈ A′} and f⁻¹(B′) := {a ∈ A | f(a) ∈ B′}. Show that:
|
| 2.6 | Russell’s paradox A collection C is called ordinary, if C is not a member of C. A collection which is not ordinary is called extraordinary. Show that the collection of all ordinary collections is neither ordinary nor extraordinary. |
| 2.7 | Let Ai, i ∈ I, be a family of sets (not necessarily pairwise disjoint). For each i ∈ I, consider the set Bi := Ai × {i}. Show that the family Bi, i ∈ I, are pairwise disjoint. The union ⋃_{i∈I} Bi is called the disjoint union of Ai, i ∈ I.
|
So far we have studied sets as unordered collections. However, things start getting interesting if we define one or more binary operations on sets. Such operations define structures on sets, and we compare different sets in the light of their respective structures. Groups are the first (and simplest) examples of sets with binary operations.
|
A binary operation on a set A is a map from A × A to A. If ◊ is a binary operation on A, it is customary to write a ◊ a′ to denote the image of (a, a′) (under ◊). |
For example, addition, subtraction and multiplication are all binary operations on ℤ (or ℚ or ℝ). Subtraction is not a binary operation on ℕ, since, for example, 2 – 3 is not an element of ℕ. Division is not a binary operation on ℚ, since division by zero is not defined. Division is a binary operation on ℚ*.
|
A group[1] (G, ◊) is a set G together with a binary operation ◊ on G that satisfies the following three conditions:
|
A group (G, ◊) is also written in short as G, when the operation ◊ is understood from the context. More often than not, the operation ◊ is either addition (+) or multiplication (·), in which cases we also say that G is respectively an additive or a multiplicative group. For a multiplicative group, we often omit the multiplication sign and denote a · b simply as ab. The identity in an additive group is usually denoted by 0, whereas that in a multiplicative group by 1. The inverse of an element a in these cases is denoted respectively by –a and a⁻¹. Groups written additively are usually Abelian, but groups written multiplicatively need not be so.
Note that associativity allows us to write a ◊ b ◊ c unambiguously to represent (a ◊ b) ◊ c = a ◊ (b ◊ c). More generally, if a1, . . . , an ∈ G, then a1 ◊ ··· ◊ an represents a unique element of the group irrespective of how we insert brackets to compute the element a1 ◊ ··· ◊ an.
|
|
Let (G, ◊) be a group and let a, b, c ∈ G. Then a ◊ b = a ◊ c implies b = c (the left cancellation law), and b ◊ a = c ◊ a implies b = c (the right cancellation law). Proof We prove only the left cancellation law. The proof of the other law is similar. Let e denote the identity of G and d the inverse of a. Then b = e ◊ b = (d ◊ a) ◊ b = d ◊ (a ◊ b) = d ◊ (a ◊ c) = (d ◊ a) ◊ c = e ◊ c = c. |
|
Let (G, ◊) be a group. Then a subset H of G is called a subgroup of G, if H is a group under the operation ◊ inherited from G. For a subset H of G to be a subgroup, it is necessary and sufficient that H is closed under the operation ◊ and under inverse. Any subgroup of an Abelian group is also Abelian. |
|
Let (G, ◊) be a group. For subsets A and B of G, we denote by A ◊ B the set {a ◊ b | a ∈ A, b ∈ B}. In particular, if A = {a} (resp. B = {b}), then A ◊ B is denoted by a ◊ B (resp. A ◊ b). Note that the sets A ◊ B and B ◊ A are not necessarily equal. If G is Abelian, then A ◊ B = B ◊ A.
|
Let (G, ◊) be a group, H a subgroup of G and a ∈ G. The set a ◊ H is called a left coset of H in G, and the set H ◊ a a right coset of H in G. |
From now onward, we consider left cosets only and call them simply cosets. If the underlying group is Abelian, then left and right cosets are the same thing. The theory of right cosets can be developed in parallel, but we choose to omit that here. For simplicity, we also assume that the group G is a multiplicative group, so that the operation ◊ is replaced by · (or by mere juxtaposition).
|
Let G be a (multiplicative) group and H a subgroup of G. Then, the cosets aH, a ∈ G, form a partition of G, and any two cosets have the same cardinality. Proof We define a relation ~ on G such that a ~ b if and only if a⁻¹b ∈ H. It is easy to check that ~ is an equivalence relation on G and that the equivalence class of a is precisely the coset aH. Thus the distinct cosets form a partition of G. Now we define a map aH → bH that takes x ↦ ba⁻¹x. This map is a bijection, so that any two cosets have the same cardinality. |
The following theorem is an important corollary to the last proposition.
|
Let G be a finite group and H a subgroup of G. Then, the cardinality of G is an integral multiple of the cardinality of H. Proof From Proposition 2.2, the cosets form a partition of G and there is a bijective map from one coset to another. Hence by Exercise 2.3 all cosets have the same cardinality. Finally, note that H is the coset of the identity element. |
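Lagrange’s theorem is easy to observe computationally. The following sketch is our own illustration (not from the book): the multiplicative group of units modulo 15 and the subgroup H = {1, 4} are chosen merely as examples.

```python
# A sketch: the cosets of H = {1, 4} in G = (Z/15Z)^* partition G into
# equal-sized blocks, so #G is an integral multiple of #H (Lagrange).
from math import gcd

n = 15
G = {a for a in range(1, n) if gcd(a, n) == 1}   # units modulo 15; #G = 8
H = {1, 4}                                        # a subgroup: 4 * 4 = 16 = 1 (mod 15)

cosets = {frozenset((a * h) % n for h in H) for a in G}
assert all(len(c) == len(H) for c in cosets)      # all cosets have the same size
assert len(G) == len(H) * len(cosets)             # #G = #H * [G : H]
```

Here [G : H] = 4: the cosets are {1, 4}, {2, 8}, {7, 13} and {11, 14}.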
|
Let G be a group and H a subgroup of G. The number of distinct cosets of H in G is called the index of H in G and is denoted by [G : H]. If G is finite, then [G : H] = #G/#H. |
|
Let H be a subgroup of a (multiplicative) group G. Then H is called a normal subgroup of G, if (aH)(bH) = (abH) for all a, b ∈ G. If H is a normal subgroup of a group G, then the cosets aH, a ∈ G, form a group under this operation. This group is called the quotient group of G by H and is denoted by G/H. |
|
|
Let (G, ◊) and (G′, ⊙) be groups. A function f : G → G′ is called a homomorphism (of groups), if f(a ◊ b) = f(a) ⊙ f(b) for all a, b ∈ G. A group homomorphism f : G → G′ is called an isomorphism, if there exists a group homomorphism g : G′ → G such that g ο f = idG and f ο g = idG′. It can be easily seen that a homomorphism f : G → G′ is an isomorphism if and only if f is bijective as a function.[2] If there exists an isomorphism f : G → G′, we say that the groups G and G′ are isomorphic and write G ≅ G′.
A homomorphism f from G to itself is called an endomorphism (of G). An endomorphism which is also an isomorphism is called an automorphism. The set of all automorphisms of a group G is a group under function composition. We denote this group by Aut G. |
|
|
Let f be a group homomorphism from (G, ◊) to (G′, ⊙). Let e and e′ denote the identity elements of G and G′ respectively. Then f(e) = e′. If a, b ∈ G satisfy a ◊ b = e, and if we write c := f(a) and let d denote the inverse of c in G′, then f(b) = d; that is, f maps the inverse of a to the inverse of f(a). Proof We have e′ ⊙ f(e) = f(e) = f(e ◊ e) = f(e) ⊙ f(e), so that by right cancellation f(e) = e′. To prove the second assertion we note that c ⊙ d = e′ = f(e) = f(a ◊ b) = f(a) ⊙ f(b) = c ⊙ f(b). Thus f(b) = d. |
|
With the notations of the last proposition we define the kernel of f to be the following subset of G: Ker f := {a ∈ G | f(a) = e′}. We also define the image of f to be the subset Im f := f(G) = {f(a) | a ∈ G} of G′. Then we have the following important theorem. |
|
Ker f is a normal subgroup of G, Im f is a subgroup of G′, and G/ Ker f ≅ Im f. Proof In order to simplify notations, let us assume that G and G′ are multiplicatively written groups. For u, v ∈ Ker f we have f(uv⁻¹) = f(u)f(v)⁻¹ = e′, so that uv⁻¹ ∈ Ker f, that is, Ker f is a subgroup of G. Moreover, for u ∈ Ker f and a ∈ G we have f(aua⁻¹) = f(a)e′f(a)⁻¹ = e′, so that Ker f is a normal subgroup of G. That Im f is a subgroup of G′ follows similarly. Now define a map φ : G/ Ker f → Im f taking a Ker f ↦ f(a). One checks that φ is well defined (independent of the choice of the coset representative), a homomorphism, and bijective, whence G/ Ker f ≅ Im f. |
|
Let G be a group. In this section, we assume, unless otherwise stated, that G is multiplicatively written and has identity e. Let ai, i ∈ I, be a family of elements of G, and let H denote the set of all finite products a_{i_1}^{±1} ··· a_{i_r}^{±1} with i_1, . . . , i_r ∈ I,
with the empty product (corresponding to r = 0) being treated as e. It is easy to check that H is a subgroup of G and contains all ai, i ∈ I. H is called the subgroup of G generated by ai, i ∈ I. |
|
|
Let G be a finite group with identity e. The order of G is defined to be the cardinality of the set G and is denoted by ord G. The order of an element a ∈ G, denoted ordG a, is defined to be the smallest positive integer r such that a^r = e. |
With these notations we prove the following important proposition.
|
The order m := ordG a of any element a ∈ G divides the order n := ord G of G. In particular, a^n = e. Proof Let H be the (cyclic) subgroup of G generated by a. Then by Example 2.5, H = {a^r | r = 0, . . . , m – 1} and m is the smallest of the positive integers r for which a^r = e. By Lagrange’s theorem (Theorem 2.2), n is an integral multiple of m. That is, n = km for some k ∈ ℕ, so that a^n = a^{km} = (a^m)^k = e^k = e. |
|
Let G be a finite cyclic group. Then any subgroup of G is also cyclic. Proof Let G be generated by g and ord G = n. Then G = {g^r | r = 0, . . . , n – 1}. The subgroup {e} of G is clearly cyclic. For an arbitrary subgroup H ≠ {e} of G, define d to be the smallest positive integer for which g^d ∈ H. Division with remainder shows that every element of H is a power of g^d, that is, H is the cyclic subgroup generated by g^d. |
|
Let G be a finite cyclic multiplicative group with identity e and let H be a subgroup of order m. Then an element a ∈ G belongs to H if and only if a^m = e. Proof If a ∈ H, then a^m = e by Proposition 2.3. Conversely, let G be generated by g with ord G = n, so that by Proposition 2.4 H is generated by g^d with d = n/m. If a = g^r satisfies a^m = e, then n | rm, that is, d | r, so that a ∈ H. |
Finite cyclic groups play a crucial role in public-key cryptography. To see how, let G be a group which is finite, cyclic with generator g and multiplicatively written. Given r ∈ ℕ, one can compute g^r using ≤ 2 lg r + 2 group multiplications (see Algorithms 3.9 and 3.10). This means that if it is easy to multiply elements of G, then it is also easy to compute g^r. On the other hand, there are certain groups for which it is very difficult to find out the integer r from the knowledge of g and g^r, even when one is certain that such an integer exists. This is the basic source of security in many cryptographic protocols, like those based on finite fields, elliptic and hyperelliptic curves.
Sylow’s theorem is a powerful tool for studying the structure of finite groups. Recall that if G is a finite group of order n and if H is a subgroup of G of order m, then by Lagrange’s theorem m divides n. But given any divisor m′ of n, there need not exist a subgroup of G of order m′. However, for certain special values of m′, we can prove the existence of subgroups of order m′. Sylow’s theorem considers the case that m′ is a power of a prime.
|
Let G be a finite group of cardinality n and let p be a prime. If n = p^r for some r ∈ ℕ, then G is called a p-group. If n = p^r m with r ≥ 1 and with p not dividing m, then a subgroup of G of cardinality p^r is called a p-Sylow subgroup of G. |
We shortly prove that p-Sylow subgroups always exist. Before doing that, we prove a simpler result.
|
Let G be a finite group and p a prime dividing ord G. Then G has a subgroup of order p. Proof Let n := ord G. Note that if we can find an element a ∈ G, a ≠ e, with a^p = e, then the cyclic subgroup of G generated by a has order p. |
Now we are in a position to prove the general theorem.
|
Let G be a finite group of order n and let p be a prime dividing n. Then there exists a p-Sylow subgroup of G. Proof We proceed by induction on n. If n = p, then G itself is a p-Sylow subgroup of G. So we assume n > p and write n = p^r m, where p does not divide m. If r = 1, then the theorem follows from Cauchy’s theorem (Theorem 2.4). So we assume r > 1 and consider the class equation of G, namely, ord G = ord Z(G) + Σ [G : C(a)], where Z(G) is the centre of G, C(a) is the centralizer of a, and the sum runs over representatives a of the conjugacy classes containing more than one element. |
Note that if H is a p-Sylow subgroup of G and g ∈ G, then gHg⁻¹ is also a p-Sylow subgroup of G. The converse is also true, that is, if H and H′ are two p-Sylow subgroups of G, then there exists a g ∈ G such that H′ = gHg⁻¹. We do not prove this assertion here, but mention the following important consequence of it. If G is Abelian, then H′ = gHg⁻¹ = gg⁻¹H = H, that is, there is only one p-Sylow subgroup of G. If G is Abelian of order n = p1^{e1} ··· pt^{et} with pairwise distinct primes pi and with ei ∈ ℕ, then G is the internal direct product of its pi-Sylow subgroups, i = 1, . . . , t (Exercises 2.17 and 2.19).
| 2.8 | Let G be a multiplicatively written group (not necessarily Abelian). Prove the following assertions. |
| 2.9 | Let G be a multiplicatively written group and let H and K be subgroups of G. Show that:
|
| 2.10 |
|
| 2.11 | Let G be a (multiplicative) group.
|
| 2.12 | |
| 2.13 | Let H be a subgroup of G generated by ai, i ∈ I. Show that H is the smallest subgroup of G that contains all of ai, i ∈ I.
|
| 2.14 | Let f : G → G′ be a homomorphism of (multiplicative) groups. Show that:
|
| 2.15 | Let G be a cyclic group. Show that G is isomorphic to ℤ or to ℤn for some n ∈ ℕ, depending on whether G is infinite or finite.
|
| 2.16 | Let G be a finite (multiplicative) group (not necessarily Abelian).
|
| 2.17 | Let G be a (multiplicative) Abelian group with identity e and order n = p1^{e1} ··· pt^{et}, where pi are distinct primes and ei ∈ ℕ. For each i, let Hi be the pi-Sylow subgroup of G. Show that:
|
| 2.18 | Let G be a finite (multiplicative) Abelian group with identity e. Assume that for every n ∈ ℕ there are at most n elements x of G satisfying x^n = e. Show that G is cyclic. [H]
|
| 2.19 | Let G be a (multiplicative) group and let H1, . . . , Hr be normal subgroups of G. If G = H1 · · · Hr and every element g ∈ G can be written uniquely as g = h1 · · · hr with hi ∈ Hi, then G is called the internal direct product of H1, . . . , Hr. (For example, if G is finite and Abelian, then by Exercise 2.17 it is the internal direct product of its Sylow subgroups.) Show that:
|
| 2.20 | Let Hi, i = 1, . . . , r, be finite Abelian groups of orders mi and let H := H1 × · · ·× Hr be their direct product. Show that H is cyclic if and only if each Hi is cyclic and m1, . . . , mr are pairwise coprime. |
So far we have studied algebraic structures with only one operation. Now we study rings, which are sets with two (compatible) binary operations. Unlike groups, these two operations are usually denoted by + and · . One can, of course, go for more general notations for these operations. However, that generalization does not seem to pay much, but complicates matters. We stick to the conventions.
|
A ring (R, +, ·) (or R in short) is a set R together with two binary operations + and · on R such that the following conditions are satisfied. As in the case of multiplicative groups we write ab for a · b.
|
Notice that it is more conventional to define a ring as an algebraic structure (R, +, ·) that satisfies conditions (1), (2) and (5) only. A ring (by the conventional definition) is called a commutative ring (resp. a ring with identity), if it (additionally) satisfies condition (3) (resp. (4)). As per our definition, a ring is always a commutative ring with identity. Rings that are not commutative or that do not contain the identity element are not used in the rest of the book. So let us be happy with our unconventional definition of a ring.[3]
[3] Cool! But what’s circular in a ring? Historically, such algebraic structures were introduced by Hilbert to designate a Zahlring (a number ring, see Section 2.13). If α is an algebraic integer (Definition 2.95) and we take a Zahlring of the form ℤ[α] and consider the powers α, α², α³, . . . , we eventually get an α^d which can be expressed as a linear combination of the previous (that is, smaller) powers of α. This is perhaps the reason that prompted Hilbert to call such structures “rings”. Also see Footnote 1.
We do not rule out the possibility that 0 = 1 in R. In that case, for any a ∈ R, we have a = a · 1 = a · 0 = 0 (see Proposition 2.6), that is to say, the set R consists of the single element 0. In this case, R is called the zero ring and is denoted (by an abuse of notation) by 0.
Finally, note that R is, in general, not a group under multiplication. This is because we do not expect a ring R to contain the multiplicative inverse of every element of R. Indeed, the multiplicative inverse of the element 0 exists if and only if R = 0.
|
|
Let R be a ring. For all a, b ∈ R, we have a · 0 = 0 · a = 0, a(–b) = (–a)b = –(ab), and (–a)(–b) = ab.
Proof a · 0 = a · (0 + 0) = a · 0 + a · 0, so that a · 0 = 0 by cancellation in the group (R, +). The remaining assertions follow similarly from distributivity.
|
|
|
Let R be a ring. An element a ∈ R is called a unit (or an invertible element) of R, if ab = 1 for some b ∈ R. A non-zero element a ∈ R is called a zero-divisor of R, if ab = 0 for some non-zero b ∈ R. A non-zero ring without zero-divisors is called an integral domain. A non-zero ring in which every non-zero element is a unit is called a field.
|
|
A field is an integral domain. Proof Recall from Definition 2.13 that an element in a ring cannot be simultaneously a unit and a zero-divisor. |
|
Let R be a non-zero ring. The characteristic of R, denoted char R, is the smallest positive integer n such that 1 + 1 + · · · + 1 (n times) = 0. If no such integer exists, then we take char R = 0. |
ℤ, ℚ, ℝ and ℂ are rings of characteristic zero. If R is a non-zero finite ring, then the elements 1, 1 + 1, 1 + 1 + 1, · · · cannot be all distinct. This shows that there are positive integers m and n, m < n, such that 1 + 1 + · · · + 1 (n times) = 1 + 1 + · · · + 1 (m times). But then 1 + 1 + · · · + 1 (n – m times) = 0. Thus any non-zero finite ring has positive (that is, non-zero) characteristic. If char R = t > 0, then for any a ∈ R one has ta = a + a + · · · + a (t times) = 0.
In what follows, we will often denote by n the element 1 + 1 + · · · + 1 (n times) of any ring. One should not confuse this with the integer n. One can similarly identify a negative integer –n with the ring element –(1 + 1 + · · · + 1)(n times) = (–1) + (–1) + · · · + (–1)(n times).
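The definition of the characteristic can be tested directly on the finite rings ℤ/nℤ (studied in Section 2.5), whose characteristic is n. A small sketch of our own:

```python
# A sketch: char(Z/nZ) computed exactly as in the definition, by adding 1
# to itself until the sum reaches 0; for the non-zero ring Z/nZ (n >= 2)
# this returns n.

def characteristic(n):
    """Smallest t > 0 with 1 + ... + 1 (t times) = 0 in Z/nZ."""
    total, t = 1 % n, 1
    while total != 0:
        total = (total + 1) % n
        t += 1
    return t

assert characteristic(6) == 6
assert characteristic(7) == 7
```

For n = 7 the ring is even an integral domain, and the characteristic 7 is prime, as Proposition 2.7 requires.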
|
Let R be an integral domain of positive characteristic p. Then p is a prime. Proof If p is composite, then we can write p = mn with 1 < m < p and 1 < n < p. But then p = mn = 0 (in R). Since R is an integral domain, we must have m = 0 or n = 0 (in R). This contradicts the minimality of p. |
Just as we studied subgroups of groups, it is now time to study subrings of rings. It turns out, however, that subrings are not as important for the study of rings as the subsets called ideals are. In fact, ideals (and not subrings) help us construct quotient rings. This does not mean that ideals are “normal” subrings! In fact, ideals are, in general, not subrings at all, and conversely. The formal definitions are waiting!
|
Let R be a ring. A subset S of R is called a subring of R, if S is a ring under the ring operations of R. In this case, one calls R a superring or a ring extension of S. If R and S are both fields, then S is often called a subfield of R and R a field extension (or simply an extension) of S. In that case, one also says that S ⊆ R is a field extension or that R is an extension over S. |
ℤ is a subring of ℚ, ℝ and ℂ, whereas ℚ ⊆ ℝ and ℝ ⊆ ℂ are field extensions.
We demand that a ring always contains the multiplicative identity (Definition 2.12). This implies that if S is a subring of R, then for all integers n, the elements n · 1 are also in S (though they need not be pairwise distinct). Similarly, if R and S are fields, then S contains all the elements of the form mn⁻¹ for m, n ∈ ℤ, n ≠ 0 (cf. Exercise 2.26). Thus 2ℤ, the set of all even integers, is not a subring of ℤ, though it is a subgroup of (ℤ, +) (Example 2.2).
|
Let R be a ring. A subset 𝔞 of R is called an ideal of R, if 𝔞 is a subgroup of R under addition, and if for every a ∈ 𝔞 and every r ∈ R we have ra ∈ 𝔞.
|
In this book, we will use Gothic letters (usually lower case) like 𝔞, 𝔟, 𝔠, 𝔭, 𝔪 to denote ideals.[5]
[5] Mathematicians always run out of symbols. Many believe if it is Gothic, it is just ideal!
The condition for being an ideal is in one sense more stringent than that for being a subring, that is, an ideal has to be closed under multiplication by any element of the entire ring. On the other hand, we do not demand an ideal to necessarily contain the identity element 1. In fact, 2ℤ is an ideal of ℤ. Conversely, ℤ is a subring of ℚ but not an ideal of ℚ. Subrings and ideals are different things.
|
|
The only ideals of a field are the zero ideal and the unit ideal. Proof By definition, every non-zero element of a field is a unit. |
|
Let R be a ring and ai, i ∈ I, elements of R. The set of all finite sums r1 a_{i_1} + · · · + rs a_{i_s} with rj ∈ R and ij ∈ I is an ideal of R, called the ideal generated by ai, i ∈ I. An ideal generated by finitely many elements a1, . . . , an is said to be finitely generated and is denoted by 〈a1, . . . , an〉. An ideal 〈a〉 generated by a single element is called a principal ideal.
An integral domain every ideal of which is principal is called a principal ideal domain or PID in short. A ring every ideal of which is finitely generated is called Noetherian. Thus principal ideal domains are Noetherian. |
Note that an ideal may have different generating sets of varying cardinalities. For example, the unit ideal in any ring is principal, since it is generated by 1. The integers 2 and 3 generate the unit ideal of ℤ, since 1 = (–1) · 2 + 1 · 3. However, neither 2 nor 3 individually generates the unit ideal of ℤ. Indeed, using Bézout’s relation (Proposition 2.16) one can show that for every n ∈ ℕ there is a (minimal) generating set of the unit ideal of ℤ that contains exactly n integers. Interested readers may try to construct such generating sets as an (easy) exercise.
|
ℤ is a principal ideal domain. Proof The zero ideal is generated by 0. Let 𝔞 be a non-zero ideal of ℤ and let d be the smallest positive integer in 𝔞. For any a ∈ 𝔞, division with remainder gives a = qd + s with 0 ≤ s < d. Since s = a – qd ∈ 𝔞, the minimality of d forces s = 0. Thus 𝔞 = 〈d〉. |
A very similar argument proves the following theorem. The details are left to the reader. Also see Exercise 2.31.
We now prove a very important theorem:
|
If R is a Noetherian ring, then so is the polynomial ring R[X1, . . . , Xn] for every n ∈ ℕ. Proof Using induction on n we can reduce to the case n = 1. So we prove that if R is Noetherian, then R[X] is also Noetherian. Let 𝔞 be an ideal of R[X]. For each i ≥ 0, the leading coefficients of the polynomials of degree i in 𝔞, together with 0, form an ideal 𝔩i of R, and 𝔩0 ⊆ 𝔩1 ⊆ 𝔩2 ⊆ · · ·. Since R is Noetherian, this chain is stationary (Exercise 2.33) and each 𝔩i is finitely generated. Finitely many polynomials in 𝔞 realizing these leading coefficients can then be shown to generate 𝔞. |
Two particular types of ideals are very important in algebra.
|
Let R be a ring. An ideal 𝔭 of R with 𝔭 ≠ R is called a prime ideal of R, if for all a, b ∈ R, ab ∈ 𝔭 implies a ∈ 𝔭 or b ∈ 𝔭. An ideal 𝔪 of R with 𝔪 ≠ R is called a maximal ideal of R, if there is no ideal 𝔞 of R with 𝔪 ⊊ 𝔞 ⊊ R.
|
Prime and maximal ideals can be characterized by some nice equivalent criteria. See Proposition 2.9.
|
Let R be a ring and 𝔞 an ideal of R. The cosets a + 𝔞, a ∈ R, form a ring under the operations (a + 𝔞) + (b + 𝔞) := (a + b) + 𝔞 and (a + 𝔞)(b + 𝔞) := ab + 𝔞. This ring is called the quotient ring of R by 𝔞 and is denoted by R/𝔞.
We say that two elements a, b ∈ R are congruent modulo 𝔞, if a – b ∈ 𝔞, that is, if a + 𝔞 = b + 𝔞. |
|
|
Let R be a ring and 𝔞 an ideal of R. Then 𝔞 is a prime ideal if and only if R/𝔞 is an integral domain, and 𝔞 is a maximal ideal if and only if R/𝔞 is a field.
Proof The ideal 𝔞 is prime if and only if (a + 𝔞)(b + 𝔞) = 0 in R/𝔞 implies a + 𝔞 = 0 or b + 𝔞 = 0, that is, if and only if R/𝔞 has no zero-divisors. Similarly, 𝔞 is maximal if and only if every non-zero residue a + 𝔞 generates the unit ideal of R/𝔞, that is, if and only if every non-zero element of R/𝔞 is a unit.
|
The last proposition in conjunction with Corollary 2.1 indicates:
|
Maximal ideals are prime. |
|
For every Proof Since |
Recall how we have defined homomorphisms of groups. In a similar manner, we define homomorphisms of rings. A ring homomorphism is a map from one ring to another, which respects addition, multiplication and the identity element. More precisely:
|
Let R and S be rings. A map f : R → S is called a (ring) homomorphism, if f(a + b) = f(a) + f(b) and f(ab) = f(a)f(b) for all a, b ∈ R, and if f(1) = 1. A homomorphism f : R → R is called an endomorphism of R. An automorphism is a bijective endomorphism. |
|
|
Let f : R → S be a ring homomorphism. If 𝔟 is an ideal of S, then 𝔞 := f⁻¹(𝔟) is an ideal of R.
Proof For a, a′ ∈ 𝔞 we have f(a – a′) = f(a) – f(a′) ∈ 𝔟, and for a ∈ 𝔞 and r ∈ R we have f(ra) = f(r)f(a) ∈ 𝔟, since 𝔟 is an ideal of S.
|
The ideal 𝔞 = f⁻¹(𝔟) of the above proposition is called the contraction of 𝔟 (to R) and is often denoted by 𝔟ᶜ. If R ⊆ S and f is the inclusion homomorphism, then 𝔟ᶜ = 𝔟 ∩ R.
|
Let f : R → S be a ring homomorphism. The set Ker f := {a ∈ R | f(a) = 0} is called the kernel of f, and the set Im f := f(R) = {f(a) | a ∈ R} is called the image of f. |
|
With the notations of the last definition, Ker f is an ideal of R, Im f is a subring of S and R/ Ker f ≅ Im f. Proof Consider the map φ : R/ Ker f → Im f taking a + Ker f ↦ f(a), and argue as in the proof of Theorem 2.3. |
|
Two ideals 𝔞 and 𝔟 of a ring R are called coprime (or comaximal), if 𝔞 + 𝔟 = R, that is, if a + b = 1 for some a ∈ 𝔞 and b ∈ 𝔟. |
|
Let R be a ring and 𝔞1, . . . , 𝔞n pairwise coprime ideals of R. Then R/(𝔞1 ∩ · · · ∩ 𝔞n) ≅ (R/𝔞1) × · · · × (R/𝔞n). Proof The assertion is obvious for n = 1. So assume that n ≥ 2 and define the map φ : R → (R/𝔞1) × · · · × (R/𝔞n) taking a ↦ (a + 𝔞1, . . . , a + 𝔞n). The kernel of φ is 𝔞1 ∩ · · · ∩ 𝔞n, and the pairwise coprimality of the 𝔞i makes φ surjective, so that the assertion follows from the previous theorem (R/ Ker φ ≅ Im φ). |
In Section 2.5, we will see an interesting application of this theorem. Notice that the injectivity of the induced map in the last proof does not require the coprimality of 𝔞1, . . . , 𝔞n; only the surjectivity requires this condition.
Now we introduce the concept of divisibility in a ring. We also discuss an important type of rings known as unique factorization domains. This study is a natural generalization of that of the rings ℤ and K[X], K a field.
|
Let R be a ring, a, b ∈ R. We say that a divides b and write a|b, if b = ac for some c ∈ R. The elements a and b are called associates (of one another), if a = ub for some unit u ∈ R. A non-zero non-unit p ∈ R is called irreducible, if p = ab implies that a or b is a unit. A non-zero non-unit p ∈ R is called prime, if p|(ab) implies p|a or p|b.
|
Note that for ℤ the concepts of prime and irreducible elements are the same. This is indeed true for any PID (Proposition 2.12). Thus our conventional definition of a prime integer p > 0 as one which has only 1 and p as (positive) divisors tallies with the definition of irreducible elements above. For the ring K[X], on the other hand, it is more customary to talk about irreducible polynomials instead of prime polynomials; they are the same thing anyway.
|
Let R be an integral domain and p ∈ R a prime element. Then p is irreducible. Proof Let p = ab. Then p|(ab), so that by hypothesis p|a or p|b. If p|a, then a = up for some u ∈ R, so that p = ab = upb, and cancellation (R is an integral domain) gives 1 = ub, that is, b is a unit. Similarly, if p|b, then a is a unit. |
|
Let R be a PID. An element p ∈ R is prime if and only if it is irreducible. Proof [if] Let p be irreducible, but not prime. Then there are a, b ∈ R with p|(ab), but with p dividing neither a nor b. Write 〈p, a〉 = 〈d〉. Then d|p, so that d is either a unit or an associate of p. The latter possibility would imply p|a; so d is a unit and 〈p, a〉 = R, that is, 1 = up + va for some u, v ∈ R. Multiplying by b gives b = upb + v(ab). Since p divides both summands, p|b, a contradiction. [only if] Immediate from Proposition 2.11. |
|
An integral domain R is called a unique factorization domain or a UFD in short, if every non-zero element a ∈ R can be written as a = up1 · · · pr with a unit u and irreducible elements p1, . . . , pr (the case r = 0 corresponding to a unit a), and if this factorization is essentially unique: whenever up1 · · · pr = u′q1 · · · qs with units u, u′ and irreducible elements pi, qj, then r = s and, after a suitable reordering of the qj, each pi is an associate of qi. |
|
Let R be a UFD. An element p ∈ R is prime if and only if it is irreducible. Proof The only if part is immediate from Proposition 2.11. For proving the if part, let p be irreducible with p|(ab), say ab = pc. Factoring a, b and c into irreducible elements and comparing with the factorization of ab, the uniqueness of factorization shows that p is an associate of an irreducible factor of a or of b, that is, p|a or p|b. |
A classical example of an integral domain that is not a UFD is ℤ[√–5]. In this ring, we have two essentially different factorizations 6 = 2 · 3 = (1 + √–5)(1 – √–5) of 6 into irreducible elements. The failure of irreducible elements to be primes in such rings is a serious thing to patch up!
|
A PID is a UFD. Proof Let R be a PID and a a non-zero non-unit of R. Since a PID is Noetherian, an ascending chain argument (Exercise 2.33) shows that a admits a factorization into irreducible elements. The uniqueness of the factorization follows from Proposition 2.12, since the irreducible elements of R are prime. |
The converse of the above theorem is not necessarily true. For example, the polynomial ring K[X1, . . . , Xn] over a field K is a UFD for every n ∈ ℕ, but not a PID for n ≥ 2.
Divisibility in a UFD can be rephrased in terms of prime factorizations. Let R be a UFD and let the non-zero elements a, b ∈ R have the prime factorizations a = u p1^{α1} · · · pr^{αr} and b = u′ p1^{β1} · · · pr^{βr} with units u, u′, pairwise non-associate primes p1, . . . , pr and with αi ≥ 0 and βi ≥ 0. Then a|b if and only if αi ≤ βi for all i = 1, . . . , r. This notion leads to the following definitions.
|
Let R be a UFD and let a, b be non-zero elements of R with prime factorizations as above. A greatest common divisor (gcd) of a and b is gcd(a, b) := p1^{min(α1, β1)} · · · pr^{min(αr, βr)}, and a least common multiple (lcm) of a and b is lcm(a, b) := p1^{max(α1, β1)} · · · pr^{max(αr, βr)}. The gcd and lcm of a and b are determined only up to multiplication by units, that is, up to associates. |
It is clear that these definitions of gcd and lcm can be readily generalized for any arbitrary finite number of elements.
|
Let R be a UFD and a, Proof Immediate from the definitions. |
|
Let R be a UFD and a, b, Proof Consider the prime factorizations of a, b and c. |
For a PID, the gcd and lcm have equivalent characterizations.
|
Let R be a PID and a, b be non-zero elements of R. Let d be a gcd of a and b. Then 〈d〉 = 〈a〉 + 〈b〉. If f is an lcm of a and b, then 〈f〉 = 〈a〉 ∩ 〈b〉. Proof Let 〈a〉 + 〈b〉 = 〈c〉. We show that c and d are associates. There exist u, v ∈ R with c = ua + vb. Since d|a and d|b, we get d|c. On the other hand, a, b ∈ 〈c〉 gives c|a and c|b, so that c|d by the definition of the gcd. Thus c and d are associates, and 〈d〉 = 〈c〉 = 〈a〉 + 〈b〉. The assertion about the lcm follows similarly. |
A direct corollary to the last proposition is the following.
|
Let R be a PID, a, b non-zero elements of R, and d a gcd of a and b. Then there exist u, v ∈ R such that d = ua + vb (Bézout’s relation). |
This completes our short survey of factorization in rings. Note that ℤ and K[X] (for a field K) are PIDs and hence UFDs. Thus all the results we have proved in this section apply equally well to both these rings. It is because of this (and not mere coincidence) that these two rings enjoy many common properties. Thus our abstract treatment saves us from the duplicate effort of proving the same results once for integers (Section 2.5) and once more for polynomials (Section 2.6).
| 2.21 | For a non-zero ring R, prove the following assertions:
Let K be a field. What are the units in the polynomial ring K[X]? In K[X1, . . . , Xn]? In the ring K(X) of rational functions? In K(X1, . . . , Xn)? |
| 2.22 | Binomial theorem Let R be a ring, a, b ∈ R and n a positive integer. Show that (a + b)n = C(n, 0)bn + C(n, 1)abn–1 + · · · + C(n, n)an, where C(n, i) = n!/(i!(n – i)!) are the binomial coefficients. |
| 2.23 | Show that every non-zero ring has a maximal (and hence prime) ideal. More generally, show that every non-unit ideal of a non-zero ring is contained in a maximal ideal. [H] |
| 2.24 | Let R be a ring.
|
| 2.25 | Show that a finite integral domain R is a field. [H] |
| 2.26 | Let R be a ring of characteristic 0. Show that:
|
| 2.27 | Let f : R → S be a ring-homomorphism and let and be ideals in R and S respectively. Find examples to corroborate the following statements.
|
| 2.28 | Let K be a field.
|
| 2.29 |
|
| 2.30 | Let R be a ring and let and be ideals of R with . Show that is an ideal of and that . [H]
|
| 2.31 | An integral domain R is called a Euclidean domain (ED) if there is a map δ : R \ {0} → N satisfying the following two conditions: (i) δ(a) ≤ δ(ab) for all non-zero a, b ∈ R; (ii) for all a, b ∈ R with b ≠ 0, there exist q, r ∈ R with a = qb + r and either r = 0 or δ(r) < δ(b).
Show that:
|
| 2.32 | Let R be a ring and an ideal. Consider the set
Show that
|
| 2.33 | Let R be a ring. An ascending chain of ideals is a sequence . The ascending chain is called stationary, if there is some such that for all n ≥ n0. Show that the following conditions are equivalent. [H]
|
| 2.34 |
|
The set Z of integers is the main object of study in this section. We use many results from previous sections to derive properties of integers. Recall that Z is a PID and hence a UFD.
The notions of divisibility, prime and relatively prime integers, gcd and lcm of integers are essentially the same as discussed in connection with a PID or a UFD. We avoid repeating the definitions here, but concentrate on other useful properties of integers, not covered so far. We only mention that whenever we talk about a prime integer, or the gcd or lcm of two or more integers, we will usually refer to a non-negative integer. This convention makes primes, gcds and lcms unique.
|
There are infinitely many prime integers. Proof Let p1, . . . , pr be any finite collection of primes and consider N := p1 · · · pr + 1. No pi divides N, so any prime divisor of N is a prime outside the collection. |
|
For an integer a and an integer b ≠ 0, there exist unique integers q and r such that a = qb + r with 0 ≤ r < |b|. Proof Call the smallest non-negative element in the set {a – qb | q ∈ Z} the remainder r; if r ≥ |b|, then r – |b| would be a smaller non-negative element of this set, a contradiction. Uniqueness is routine. |
The integers q and r in the above theorem are respectively called the quotient and the remainder of Euclidean division of a by b and are denoted respectively by a quot b and a rem b. Do not confuse Euclidean division with exact division (that is, the inverse of multiplication). Euclidean division is the basis of the Euclidean gcd algorithm. More specifically:
|
For integers a, b with b ≠ 0, let r be the remainder of Euclidean division of a by b. Then gcd(a, b) = gcd(b, r). Proof Clearly, 〈a〉 + 〈b〉 = 〈r〉 + 〈b〉. Now use Proposition 2.14. |
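The proposition above is exactly what drives the Euclidean gcd algorithm: repeatedly replace (a, b) by (b, a rem b). A minimal sketch in Python (the function name is ours):

```python
def euclid_gcd(a: int, b: int) -> int:
    """Non-negative gcd of a and b via repeated Euclidean division."""
    a, b = abs(a), abs(b)
    while b != 0:
        # gcd(a, b) = gcd(b, a rem b), by the proposition above
        a, b = b, a % b
    return a
```

For example, euclid_gcd(360, 997) returns 1, so 360 and 997 are relatively prime.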
|
Let a and b be two integers, not both zero, and let d be the (positive) gcd of a and b. Then there are integers u and v such that d = ua + vb. (Such an equality is called a Bézout relation.) Furthermore, if a and b are both non-zero and (|a|, |b|) ≠ (1, 1), then u and v can be so chosen that |u| < |b| and |v| < |a|. Proof The existence of u and v follows immediately from Proposition 2.14. If a = qb, then u = 0 and v = 1 is a suitable choice. So assume that a ∤ b and b ∤ a, in which case d < |a| and d < |b|. We may assume, without loss of generality, that a and b are positive. First note that if (u, v) satisfies the Bézout relation, then so does (u – kb, v + ka) for every integer k; choosing k so that 0 ≤ u – kb < b and noting that d = ua + vb then forces |v| < a completes the proof. |
The notions of the gcd and of the Bézout relation can be generalized to any finite number of integers a1, . . . , an as
gcd(a1, . . . , an) = gcd(· · · (gcd(gcd(a1, a2), a3) · · ·), an) = u1a1 + · · · + unan
for some integers u1, . . . , un (provided that all the gcds mentioned are defined).
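A Bézout relation d = ua + vb can be computed alongside the gcd itself by the extended Euclidean algorithm. The following sketch (names ours) carries the Bézout coefficients of the two most recent remainders through the division steps:

```python
def ext_gcd(a: int, b: int):
    """Return (d, u, v) with d = gcd(a, b) and d = u*a + v*b.

    Assumes a, b >= 0, not both zero."""
    r0, r1 = a, b          # remainder sequence
    u0, u1 = 1, 0          # coefficients of a: ri = ui*a + vi*b
    v0, v1 = 0, 1          # coefficients of b
    while r1 != 0:
        q = r0 // r1
        r0, r1 = r1, r0 - q * r1
        u0, u1 = u1, u0 - q * u1
        v0, v1 = v1, v0 - q * v1
    return r0, u0, v0
```

For example, ext_gcd(240, 46) yields (2, −9, 47): indeed −9 · 240 + 47 · 46 = 2.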
Since
is a PID, congruence modulo a non-zero ideal of
can be rephrased in terms of congruence modulo a positive integer as follows.
|
Let n be a positive integer. Two integers a and b are called congruent modulo n, written a ≡ b (mod n), if n|(a – b), that is, if a and b leave the same remainder of Euclidean division by n. |
By an abuse of notation, we often denote the equivalence class [a] of an integer a in Zn simply by a. The following are some basic properties of congruent integers.
|
Let
Proof (1) and (2) follow from the consideration of the quotient ring |
Let n := n1 · · · nr with gcd(ni, nj) = 1 for i ≠ j. Then lcm(n1, . . . , nr) = n1 · · · nr, and by the Chinese remainder theorem (Theorem 2.10), we have

Zn ≅ Zn1 × · · · × Znr.
This implies that, given integers a1, . . . , ar, there exists an integer x unique modulo n1 · · · nr such that x satisfies the following congruences simultaneously:
x ≡ a1 (mod n1)
x ≡ a2 (mod n2)
⋮
x ≡ ar (mod nr)
We now give a procedure for constructing the integer x explicitly. Define N := n1 · · · nr and Ni := N/ni for 1 ≤ i ≤ r. Then for each i we have gcd(ni, Ni) = 1 and, therefore, there are integers ui and vi with uini + viNi = 1. Then x ≡ a1v1N1 + · · · + arvrNr (mod N) is the desired solution, since viNi ≡ 1 (mod ni) and viNi ≡ 0 (mod nj) for j ≠ i.
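This construction translates directly into code. A sketch (function name ours) assuming pairwise coprime moduli; the inverse vi of Ni modulo ni is obtained with Python's three-argument pow:

```python
from math import prod

def crt(residues, moduli):
    """The x modulo N = n1*...*nr with x ≡ ai (mod ni) for all i,
    computed as x = a1*v1*N1 + ... + ar*vr*Nr mod N."""
    N = prod(moduli)
    x = 0
    for a_i, n_i in zip(residues, moduli):
        N_i = N // n_i
        v_i = pow(N_i, -1, n_i)   # vi*Ni ≡ 1 (mod ni), since gcd(Ni, ni) = 1
        x += a_i * v_i * N_i      # this term is ≡ ai (mod ni), ≡ 0 (mod nj)
    return x % N
```

For example, crt([2, 3, 2], [3, 5, 7]) returns 23, the unique solution modulo 105.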
Let n ≥ 2 be an integer. We now study the multiplicative group Zn* of the ring Zn. We say that an integer a has a multiplicative inverse modulo n, if [a] ∈ Zn*, or, equivalently, if there is an integer b with ab ≡ 1 (mod n). The following proposition is an important characterization of the elements of Zn*.
|
(The equivalence class of) an integer a belongs to Zn* if and only if gcd(a, n) = 1. Proof [if] By Proposition 2.16, there exist integers u and v such that ua + vn = 1. But then ua ≡ 1 (mod n). [only if] Since a is invertible modulo n, we have ua ≡ 1 (mod n) for some integer u, that is, ua + vn = 1 for some integers u and v, which implies that the gcd of a and n divides 1 and hence is equal to 1. |
|
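The [if] part of the proof is effective: running the extended Euclidean algorithm on n and a and reducing the coefficient of a modulo n yields the inverse. A sketch (name ours):

```python
def mod_inverse(a: int, n: int) -> int:
    """Return u with u*a ≡ 1 (mod n); raise ValueError if gcd(a, n) > 1."""
    r0, r1 = n, a % n
    u0, u1 = 0, 1          # invariant: ui*a ≡ ri (mod n)
    while r1 != 0:
        q = r0 // r1
        r0, r1 = r1, r0 - q * r1
        u0, u1 = u1, u0 - q * u1
    if r0 != 1:
        raise ValueError("a has no inverse modulo n")
    return u0 % n
```

For instance, mod_inverse(3, 7) returns 5, and indeed 3 · 5 = 15 ≡ 1 (mod 7).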
The cardinality of Zn* is denoted by φ(n); the function φ is called Euler’s totient function. |
The following two theorems are immediate consequences of Proposition 2.4.
|
Let n be a positive integer and a an integer with gcd(a, n) = 1. Then aφ(n) ≡ 1 (mod n). |
|
Let p be a prime and a an integer with gcd(a, p) = 1. Then ap–1 ≡ 1 (mod p). For any integer a (coprime to p or not), we have ap ≡ a (mod p). |
|
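Both theorems are easy to check numerically with fast modular exponentiation (Python's built-in three-argument pow); the sample values below are our own:

```python
p = 101                     # an odd prime
a = 7                       # gcd(a, p) = 1
fermat = pow(a, p - 1, p)   # a^(p-1) mod p; Fermat's little theorem gives 1

n = 10                      # phi(10) = 4: the units mod 10 are 1, 3, 7, 9
euler = pow(3, 4, n)        # 3^phi(10) mod 10; Euler's theorem gives 1
```

Both fermat and euler evaluate to 1.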
For every prime p, we have (p – 1)! ≡ –1 (mod p). Proof The result holds for p = 2. So assume that p is an odd prime. Since each of 1, 2, . . . , p – 1 is a root of Xp–1 – 1 modulo p, we have Xp–1 – 1 ≡ (X – 1)(X – 2) · · · (X – (p – 1)) (mod p). Looking at the constant terms on the two sides proves Wilson’s theorem. |
The structure of the group Zp*, for a prime p, can be easily deduced from Fermat’s little theorem. This gives us the following important result.
|
For a prime p, the group Zp* is cyclic. Proof For every divisor d of p –1, we have Xp–1–1 = (Xd–1)f(X) for some f(X) ∈ Zp[X]. |
Euler’s totient function plays an extremely important role in number theory (and cryptology). We now describe a method for computing it.
|
If n and n′ are relatively prime positive integers, then φ(nn′) = φ(n)φ(n′). Proof If a is invertible modulo nn′, then clearly it is invertible modulo both n and n′. Conversely, if ua ≡ 1 (mod n) and u′a′ ≡ 1 (mod n′), then by the Chinese remainder theorem there are integers x and α, unique modulo nn′, satisfying x ≡ u (mod n), x ≡ u′ (mod n′), α ≡ a (mod n) and α ≡ a′ (mod n′). But then xα ≡ 1 (mod nn′). Therefore, reduction modulo n and n′ establishes a bijection between Znn′* and Zn* × Zn′*, whence φ(nn′) = φ(n)φ(n′). |
|
If p is a prime and e a positive integer, then φ(pe) = pe – pe–1 = pe–1(p – 1). Proof Integers between 0 and pe – 1 that are relatively prime to pe are precisely those that are not multiples of p, and there are pe – pe–1 of them. |
|
Let n = p1e1 · · · prer be the prime factorization of a positive integer n (with distinct primes pi and ei ≥ 1). Then φ(n) = p1e1–1(p1 – 1) · · · prer–1(pr – 1).
Proof Immediate from Lemmas 2.2 and 2.3. |
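Once n is factored, φ(n) follows immediately: multiply pe−1(p − 1) over the prime powers pe exactly dividing n. A sketch using trial-division factoring (name ours; practical only for small n):

```python
def phi(n: int) -> int:
    """Euler's totient of n, computed from the prime factorization of n."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            pe = 1
            while n % p == 0:      # pull out the full power p^e dividing n
                n //= p
                pe *= p
            result *= (pe // p) * (p - 1)   # contributes p^(e-1) * (p-1)
        p += 1
    if n > 1:                      # one prime factor > sqrt(original n) remains
        result *= n - 1
    return result
```

For example, phi(36) = 2 · 1 · 3 · 2 = 12 and phi(10) = 4, matching the units 1, 3, 7, 9 modulo 10.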
By Proposition 2.18, the linear congruence ax ≡ 1 (mod n) is solvable for x if and only if gcd(a, n) = 1. In such a case, the solution is unique modulo n. Now, let us concentrate on the solutions of the general linear congruence:
ax ≡ b (mod n).
Theorem 2.17 characterizes the solutions of this congruence.
|
Let d := gcd(a, n). Then the congruence ax ≡ b (mod n) is solvable for x if and only if d|b. A solution of the congruence, if one exists, is unique modulo n/d. Proof [if] Since d|b, the given congruence is equivalent to (a/d)x ≡ b/d (mod n/d). Since gcd(a/d, n/d) = 1, the congruence (a/d)x′ ≡ 1 (mod n/d) is solvable for x′. Then a solution for x is x ≡ (b/d)x′ (mod n/d). [only if] There exists an integer k such that ax + kn = b. This shows that d|b. To prove the uniqueness, let x and x′ be two integers satisfying the given congruence. But then a(x – x′) ≡ 0 (mod n), that is, (a/d)(x – x′) ≡ 0 (mod n/d), that is, x – x′ ≡ 0 (mod n/d), since gcd(a/d, n/d) = 1. |
The last theorem implies that if d|b, then the congruence ax ≡ b (mod n) has d solutions modulo n. These solutions are given by ξ + r(n/d), r = 0, . . . , d – 1, where ξ is the solution modulo n/d of the congruence (a/d)ξ ≡ b/d (mod n/d).
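The theorem and the remark above combine into a small solver (name ours) that returns all d solutions modulo n, or an empty list when d ∤ b:

```python
from math import gcd

def solve_linear_congruence(a: int, b: int, n: int):
    """All x modulo n with a*x ≡ b (mod n)."""
    d = gcd(a, n)
    if b % d != 0:
        return []                 # solvable iff d | b
    nd = n // d
    # xi solves (a/d)*xi ≡ b/d (mod n/d); gcd(a/d, n/d) = 1, so the
    # inverse exists (three-argument pow computes it, Python 3.8+)
    xi = (b // d) * pow(a // d, -1, nd) % nd
    return [xi + r * nd for r in range(d)]
```

For example, 6x ≡ 4 (mod 10) has d = 2 solutions, x = 4 and x = 9, while 2x ≡ 3 (mod 4) has none.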
In this section, we consider quadratic congruences, that is, congruences of the form ax2 + bx + c ≡ 0 (mod n). We start with the simple case where the modulus n = p is a prime. We assume further that p is odd, so that 2 has a multiplicative inverse mod p. Since we are considering quadratic equations, we are interested only in those integers a for which gcd(a, p) = 1. In that case, a also has a multiplicative inverse mod p and the above congruence can be written as y2 ≡ α (mod p), where y ≡ x + b(2a)–1 (mod p) and α ≡ b2(4a2)–1 – ca–1 (mod p). This motivates us to provide Definition 2.29.
|
Let p be an odd prime and a an integer with gcd(a, p) = 1. We say that a is a quadratic residue modulo p, if the congruence x2 ≡ a (mod p) has a solution (for x). Otherwise we say that a is a quadratic non-residue modulo p. |
If a is a quadratic residue modulo an odd prime p, then the equation x2 ≡ a (mod p) has exactly two solutions. If ξ is one solution, the other solution is p – ξ. It is, therefore, evident that there are exactly (p – 1)/2 quadratic residues and exactly (p – 1)/2 quadratic non-residues modulo p. For example, the quadratic residues modulo p = 11 are 1 = 12 = 102, 3 = 52 = 62, 4 = 22 = 92, 5 = 42 = 72 and 9 = 32 = 82. The quadratic non-residues modulo 11 are, therefore, 2, 6, 7, 8 and 10. We treat 0 neither as a quadratic residue nor as a quadratic non-residue.
|
Let p be an odd prime and a an integer with gcd(a, p) = 1. The Legendre symbol (a/p) is defined to be +1 if a is a quadratic residue modulo p, and –1 if a is a quadratic non-residue modulo p.
|
|
Let p be an odd prime and a and b integers coprime to p. Then: (1) (Euler’s criterion) (a/p) ≡ a(p–1)/2 (mod p); (2) (ab/p) = (a/p)(b/p); (3) if a ≡ b (mod p), then (a/p) = (b/p).
Proof If a is a quadratic residue modulo p, then a ≡ b2 (mod p) for some integer b (coprime to p) and by Fermat’s little theorem we have a(p–1)/2 ≡ bp–1 ≡ 1 (mod p). Conversely, the polynomial Xp–1 – 1 = (X(p–1)/2 – 1)(X(p–1)/2 + 1) has p – 1 (distinct) roots mod p (again by Fermat’s little theorem). We have seen that no quadratic residues are roots of X(p–1)/2 + 1. Since the (p – 1)/2 quadratic residues already account for all the roots of X(p–1)/2 – 1, the quadratic non-residues are precisely the roots of X(p–1)/2 + 1, that is, a(p–1)/2 ≡ –1 (mod p) for every non-residue a. This proves (1), and (2) and (3) follow from it. |
Euler’s criterion gives us a nice way to check if a given integer is a quadratic residue modulo an odd prime. While this is much faster than the brute-force strategy of enumerating all the quadratic residues, it is still not the best solution, because it involves a modular exponentiation. We can, however, employ a gcd-like procedure for a faster computation. The development of this method demands further results which are otherwise interesting in themselves as well. The first important result is known as the law of quadratic reciprocity (Theorem 2.18 below). Gauss was the first to prove it and he deemed the result so important that he gave eight proofs for it. At present about two hundred published proofs of this law exist in the literature. We go in the classical way, that is, the Gaussian way, because the proof, though somewhat long, is elementary.
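Before turning to the reciprocity law, note that Euler's criterion itself is a one-liner with fast modular exponentiation. A sketch (name ours):

```python
def legendre(a: int, p: int) -> int:
    """Legendre symbol (a/p) for an odd prime p, via Euler's criterion."""
    a %= p
    if a == 0:
        return 0                    # p | a: neither residue nor non-residue
    t = pow(a, (p - 1) // 2, p)     # ≡ +1 or -1 (mod p) by Euler's criterion
    return 1 if t == 1 else -1
```

For example, legendre(3, 11) is 1 and legendre(2, 11) is −1, matching the lists of residues and non-residues modulo 11 given earlier.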
|
Let p be an odd prime and a an integer with gcd(a, p) = 1. Let us denote t := (p – 1)/2. For an integer i, let ri be the unique integer with ri ≡ ia (mod p) and –t ≤ ri ≤ t. Let n be the number of i, 1 ≤ i ≤ t, for which ri is negative. Then (a/p) = (–1)n. Proof It is easy to check that ri ≢ ±rj (mod p) for all i ≠ j with 1 ≤ i, j ≤ t. Thus |ri|, i = 1, . . . , t, are precisely (a permuted version of) the integers 1, . . . , t. Thus att! ≡ r1 · · · rt ≡ (–1)n|r1| · · · |rt| ≡ (–1)nt! (mod p), so that at ≡ (–1)n (mod p), and Euler’s criterion completes the proof. |
|
Let |
|
With the notations of Lemma 2.4 we have Proof Since If a is odd, p + a is even. Also 4 is a quadratic residue modulo p. So |
|
Let p and q be distinct odd primes. Then (p/q)(q/p) = (–1)((p–1)/2)((q–1)/2). In other words, (p/q) = (q/p) unless p ≡ q ≡ 3 (mod 4), in which case (p/q) = –(q/p). Proof By Corollary 2.7, |
To demonstrate how we can use the results deduced so far, let us compute (360/997). Since 360 = 23 · 32 · 5, we have

(360/997) = (2/997)3(3/997)2(5/997) = (2/997)(5/997) = (–1) · (–1) = 1,

where (2/997) = –1 since 997 ≡ 5 (mod 8), and (5/997) = (997/5) = (2/5) = –1 by the reciprocity law and the formula for (2/p).
Thus 360 is a quadratic residue modulo 997. The apparent attractiveness of this method is beset by the fact that it demands the factorization of several integers and as such does not lead to a practical algorithm. We indeed need further machinery in order to have an efficient algorithm. First, we define a generalization of the Legendre symbol.
|
Let a, b be integers with b > 0 and odd. We define the Jacobi symbol (a/b) by: (a/b) := 1 if b = 1, (a/b) := 0 if gcd(a, b) > 1, and (a/b) := (a/p1) · · · (a/pt) (a product of Legendre symbols) otherwise, where, in the last case, p1, . . . , pt are all the prime factors of b (not necessarily all distinct). |
Note that if (a/b) = –1, then a is not a quadratic residue mod b. However, the converse is not always true, that is, (a/b) = 1 does not necessarily imply that a is a quadratic residue modulo b (Example: a = 2 and b = 9). Of course, if b is an odd prime and if gcd(a, b) = 1, the Legendre and Jacobi symbols (a/b) correspond to the same value and meaning.
The Jacobi symbol enjoys many properties similar to the Legendre symbol.
|
For integers a, a′ and positive odd integers b, b′, we have: (1) (aa′/b) = (a/b)(a′/b); (2) (a/bb′) = (a/b)(a/b′); (3) if a ≡ a′ (mod b), then (a/b) = (a′/b).
Proof Immediate from the definition and Proposition 2.21. |
Proof
|
Now, we can calculate (360/997) without factoring as follows (pulling out only factors of 2 and swapping with the reciprocity law for the Jacobi symbol at every step):

(360/997) = (2/997)3(45/997) = (–1)(45/997) = –(997/45) = –(7/45) = –(45/7) = –(3/7) = (7/3) = (1/3) = 1.
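This gcd-like procedure is easily mechanized: strip factors of 2 with the formula for (2/b), then swap numerator and denominator using reciprocity. A sketch (name ours):

```python
def jacobi(a: int, b: int) -> int:
    """Jacobi symbol (a/b) for odd b > 0, computed without factoring a."""
    assert b > 0 and b % 2 == 1
    a %= b
    result = 1
    while a != 0:
        while a % 2 == 0:           # pull out factors of 2
            a //= 2
            if b % 8 in (3, 5):     # (2/b) = -1 iff b ≡ ±3 (mod 8)
                result = -result
        a, b = b, a                 # reciprocity for odd coprime a, b:
        if a % 4 == 3 and b % 4 == 3:   # sign flips iff both ≡ 3 (mod 4)
            result = -result
        a %= b
    return result if b == 1 else 0  # b > 1 at the end means gcd(a, b) > 1
```

jacobi(360, 997) returns 1, reproducing the computation above, and jacobi(2, 9) returns 1 even though 2 is not a quadratic residue modulo 9.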
So far, we have studied some elementary properties of integers. Number theory is, however, one of the oldest and widest branches of mathematics. Various complex-analytic and algebraic tools have been employed to derive more complicated properties of integers. In Section 2.13, we give a short introductory exposition to algebraic number theory. Here, we mention a collection of useful results from analytic number theory. The proofs of these analytic results would lead us too far away and hence are omitted here. Inquisitive (and/or cynical) readers may consult textbooks on analytic number theory for the details missing here.
The famous prime number theorem gives an asymptotic estimate of the density of primes smaller than or equal to a positive real number. Gauss conjectured this result in 1791. Many mathematicians tried to prove it during the 19th century and came up with partial results. Riemann made reasonable progress towards proving the theorem, but could not furnish a complete proof before he died in 1866. It is interesting to mention here that a good portion of the theory of analytic functions (also called holomorphic functions) in complex analysis was developed during these attempts to prove the prime number theorem. The first complete proof of the theorem (based mostly on the ideas of Riemann and Chebyshev) was given independently by the French mathematician Hadamard and by the Belgian mathematician de la Vallée Poussin in 1896. Their proof is regarded as one of the major achievements of modern mathematics. People started believing that any proof of the prime number theorem has to be analytic. Erdös and Selberg destroyed this belief by independently providing the first elementary proof of the theorem in 1949. Here (and elsewhere in mathematics), the adjective elementary refers to something which does not depend on results from analysis or algebra. Caution: Elementary is not synonymous with easy !
|
Let π(x) denote the number of primes less than or equal to a real number x > 0. Then, as x → ∞, we have π(x) ~ x/ln x (that is, the ratio π(x)/(x/ln x) → 1). |
Though the prime number theorem provides an asymptotic estimate (that is, one for x → ∞), for finite values of x (for example, for the values of x in the cryptographic range) it does give good approximations for π(x). Table 2.1 lists π(x) against the rounded values of x/ ln x for x equal to small powers of 10.
| x | π(x) | x/ ln x | x/(ln x – 1) | Li(x) |
|---|---|---|---|---|
| 10^3 | 168 | 145 | 169 | 178 |
| 10^4 | 1229 | 1086 | 1218 | 1246 |
| 10^5 | 9592 | 8686 | 9512 | 9630 |
| 10^6 | 78,498 | 72,382 | 78,030 | 78,628 |
| 10^7 | 664,579 | 620,421 | 661,458 | 664,918 |
| 10^8 | 5,761,455 | 5,428,681 | 5,740,304 | 5,762,209 |
Given the prime number theorem, it follows that π(x) is asymptotic to x/(ln x – ξ) for any fixed real ξ. It turns out that ξ = 1 is the best choice. Gauss’ Li function is also an asymptotic estimate for π(x), where for real x > 0 one defines:

Li(x) := ∫0x dt/ln t,

the integral being understood as a principal value across the singularity at t = 1.
Gauss conjectured that Li(x) asymptotically equals π(x). The prime number theorem is, in fact, equivalent to this conjecture. Furthermore, de la Vallée Poussin proved that Li(x) is a better approximation to π(x) than x/(ln x – ξ) for any real ξ. Table 2.1 also lists x/(ln x – 1) and Li(x) against the actual values of π(x).
The asymptotic formula π(x) ~ x/ln x does not by itself say how large the error π(x) – (x/ln x) is. It has been shown by Dusart [83] that (x/ln x)(1 + 0.992/ln x) ≤ π(x) ≤ (x/ln x)(1 + 1.2762/ln x) for all x ≥ 599.
Integers having only small prime divisors play an interesting role in cryptography and in number theory in general.
|
Let x and y be positive real numbers. A positive integer all of whose prime divisors are ≤ y is called y-smooth. By ψ(x, y) we denote the number of y-smooth integers in the set {1, . . . , x}. |
The following theorem gives an asymptotic estimate for ψ(x, y).
|
Let x, y and u := ln x/ln y be such that x, y → ∞ with u → ∞ sufficiently slowly. Then ψ(x, y)/x → u–u+o(u) = e–(1+o(1))u ln u. |
In Theorem 2.21, the notation g(u) = o(f(u)) implies that the ratio g(u)/f(u) tends to 0 as u approaches ∞. See Definition 3.1 for more details. An interesting special case of the formula for ψ(x, y) will be used quite often in this book and is given as Corollary 4.1 in Chapter 4.
Like the prime number theorem, Theorem 2.21 gives only asymptotic estimates, but is indeed a good approximation for finite values of x, y and u (that is, for the values of practical interest). The most important implication of this theorem is that the density of y-smooth integers in the set {1, . . . , x} is a very sensitive function of u = ln x/ln y and decreases very rapidly as x increases. For example, if y = 15,485,863, the millionth prime, then a random integer ≤ 2^250 is y-smooth with probability approximately 2.12 × 10^–11, whereas a random integer ≤ 2^500 is y-smooth with probability approximately 2.23 × 10^–28. (These figures are computed neglecting the o(u) term in the expression for ψ(x, y).) In other words, smaller integers have a higher probability of being smooth (that is, y-smooth for a given y).
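The quoted figures are reproduced by the crude estimate u−u with u = ln x/ln y, neglecting the o(u) term exactly as the text does (function name ours):

```python
from math import log

def smooth_prob(log2_x: float, y: float) -> float:
    """Approximate probability that a random integer <= 2**log2_x is
    y-smooth, using the estimate u**(-u) with u = ln x / ln y."""
    u = log2_x * log(2) / log(y)
    return u ** (-u)
```

With y = 15,485,863 this gives about 2.1 × 10−11 for x = 2^250 and about 2.2 × 10−28 for x = 2^500.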
The Riemann hypothesis (RH) is one of the deepest unsolved problems in mathematics. An extended version of this hypothesis has important bearings on the solvability of certain computational problems in polynomial time.
|
The Euler zeta function ζ(s) is defined for a complex variable s with Re s > 1 as ζ(s) := Σn≥1 n–s = 1 + 2–s + 3–s + · · · .
The reader may already be familiar with the results: ζ(1) = ∞ (that is, the series diverges at s = 1), ζ(2) = π2/6 and ζ(4) = π4/90. Riemann (analytically) extended the Euler zeta function to all complex values of s (except at s = 1, where the function has a simple pole). This extended function, called the Riemann zeta function, is known to have zeros at s = –2, –4, –6, . . . . These are called the trivial zeros of ζ(s). It can be proved that all non-trivial zeros of ζ(s) must lie in the so-called critical strip: 0 ≤ Re s ≤ 1, and are symmetric about the critical line: Re s = 1/2. The Riemann hypothesis asserts that every non-trivial zero actually lies on the critical line. |
In 1900, Hilbert asserted that proving or disproving the RH is one of the most important problems confronting 20th century mathematicians. The problem continues to remain so even to the 21st century mathematicians.
In 1901, von Koch proved that the RH is equivalent to the formula:
|
π(x) = Li(x) + O(x1/2 ln x) |
Here the order notation f(x) = O(g(x)) means that |f(x)/g(x)| is less than a constant for all sufficiently large x (See Definition 3.1).
Hadamard and de la Vallée Poussin proved that

π(x) = Li(x) + O(x e–α√ln x)

for some positive constant α. While this estimate was sufficient to prove the prime number theorem, the tighter bound of Conjecture 2.2 continues to remain unproved.
|
Let a and b be relatively prime positive integers. Then the arithmetic progression a, a + b, a + 2b, a + 3b, . . . contains infinitely many primes. |
Dirichlet’s theorem is a powerful generalization of Theorem 2.12 (which corresponds to a = b = 1). One can accordingly generalize the notation π(x) as follows:
|
Let a and b be relatively prime positive integers. By πa,b(x) we denote the number of primes of the form a + kb (k ≥ 0) that are less than or equal to x. |
The prime number theorem generalizes to the estimate:

πa,b(x) ~ (1/φ(b)) (x/ln x),
where φ is Euler’s totient function. The RH now generalizes to:
|
For relatively prime positive integers a and b,

πa,b(x) = (1/φ(b)) Li(x) + O(x1/2 ln x).
Some authors use the expression Generalized Riemann hypothesis (GRH) in place of ERH. Taking b = 1 demonstrates that the ERH implies the RH. The ERH also implies the following:
|
The smallest positive quadratic non-residue modulo a prime p is < 2 ln2 p. |
| 2.35 |
|
| 2.36 | Let and S a subset of {1, 2, ..., 2n} of cardinality n + 1. Show that: [H]
|
| 2.37 | Show that for any n ∈ N, n > 1, the rational number 1 + 1/2 + 1/3 + · · · + 1/n is not an integer. [H]
|
| 2.38 |
|
| 2.39 | Let n ≥ 2 be a natural number. A complete residue system modulo n is a set of n integers a1, . . . , an such that ai ≢ aj (mod n) for i ≠ j. Similarly, a reduced residue system modulo n is a set of φ(n) integers b1, . . . , bφ(n) such that gcd(bi, n) = 1 for all i = 1, . . . , φ(n) and bi ≢ bj (mod n) for i ≠ j. Show that:
|
| 2.40 | Prove that the decimal expansion of any rational number a/b is recurring, that is, (eventually) periodic. (A terminating expansion may be viewed as one with recurring 0.) [H] |
| 2.41 | Let p be an odd prime. Show that the congruence x2 ≡ –1 (mod p) is solvable if and only if p ≡ 1 (mod 4). [H] |
| 2.42 | Let .
|
| 2.43 | For , show that .
|
| 2.44 | Let n > 2 and gcd(a, n) = 1. Let h be the multiplicative order of a modulo n (that is, in the group ). Show that:
|
| 2.45 | Devise a criterion for the solvability of ax2 + bx + c ≡ 0 (mod p), where p is an odd prime and gcd(a, p) = 1. [H] |
| 2.46 | Let p be a prime and . An integer a with gcd(a, p) = 1 is called an r-th power residue modulo p, if the congruence xr ≡ a (mod p) has a solution. Show that a is an r-th power residue modulo p if and only if a(p–1)/ gcd(r, p–1) ≡ 1 (mod p). This is a generalization of Euler’s criterion for quadratic residues.
|
| 2.47 | Let G be a finite cyclic group of cardinality n. Show that G ≅ Zn (the additive group of integers modulo n) and that there are exactly φ(n) generators (that is, primitive elements) of G.
|
| 2.48 | Let m, with m|n. Show that the canonical (surjective) ring homomorphism induces a surjective group homomorphism of the respective groups of units. (Note that every ring homomorphism induces a group homomorphism , where A* and B* are the groups of units of A and B respectively. Even when is surjective, need not be surjective, in general. As an example consider the canonical surjection for a prime p > 3.)
|
| 2.49 | In this exercise, we investigate which of the groups is cyclic for a prime p and .
|
| 2.50 | Show that the multiplicative group , n ≥ 2, is cyclic if and only if n = 2, 4, pe, 2pe, where p is an odd prime and . [H]
|
Unless otherwise stated, in this section we denote by K an arbitrary field and by K[X] the ring of polynomials in one indeterminate X and with coefficients from K. Since K[X] is a PID, it enjoys many properties similar to those of Z. To start with, we take a look at these properties. Then we introduce the concept of algebraic elements and discuss how irreducible polynomials can be used to construct (algebraic) extensions of fields. When no confusions are likely, we denote a polynomial f(X) ∈ K[X] by f only.
Since K[X] is a PID and hence a UFD, every polynomial in K[X] can be written essentially uniquely as a product of prime polynomials. Conventionally, prime polynomials are more commonly referred to as irreducible polynomials. Similar to the case of Z, the ring K[X] contains an infinite number of irreducible elements: if K is infinite, then {X – a | a ∈ K} is an infinite set of irreducible polynomials of K[X], and if K is finite, then, as we will see later, there is an irreducible polynomial of degree d in K[X] for every d ≥ 1.
It is important to note here that the concept of irreducibility of a polynomial is very much dependent on the field K. If K ⊆ L is a field extension, then a polynomial in K[X] is naturally an element of L[X] also. A polynomial which is irreducible over K need not continue to remain so over L. For example, the polynomial X2 – 2 is irreducible over Q, but reducible over R, since X2 – 2 = (X – √2)(X + √2), √2 being a real number but not a rational number. As a second example, the polynomial X2 + 1 is irreducible over both Q and R, but not over C. In fact, we will show shortly that an irreducible polynomial in K[X] of degree > 1 becomes reducible over a suitable extension of K.
For polynomials f(X), g(X) ∈ K[X] with g(X) ≠ 0, there exist unique polynomials q(X) and r(X) in K[X] such that f(X) = q(X)g(X) + r(X) with r(X) = 0 or deg r(X) < deg g(X). The polynomials q(X) and r(X) are respectively called the quotient and remainder of polynomial division of f(X) by g(X) and can be obtained by the so-called long division procedure. We use the notations: q(X) = f(X) quot g(X) and r(X) = f(X) rem g(X).
Whenever we talk about the gcd of two non-zero polynomials, we usually refer to the monic gcd, that is, a polynomial with leading coefficient 1. This makes the gcd of two polynomials unique. We have gcd(f(X), g(X)) = gcd(g(X), r(X)), where r(X) = f(X) rem g(X). This gives rise to an algorithm (similar to the Euclidean gcd algorithm for integers) for computing the gcd of two polynomials. Bézout relations also hold for polynomials. More specifically:
|
Let f(X), g(X) ∈ K[X] be polynomials, not both zero, and let d(X) be their (monic) gcd. Then there exist polynomials u(X), v(X) ∈ K[X] such that d(X) = u(X)f(X) + v(X)g(X).
Proof Similar to the proof of Proposition 2.16. |
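Over a finite field the polynomial Euclidean algorithm is easy to make concrete. This sketch (names and representation ours) stores a polynomial in GF(p)[X] as a list of coefficients, lowest degree first, with the zero polynomial as the empty list:

```python
def poly_rem(f, g, p):
    """Remainder of f modulo a non-zero g in GF(p)[X]."""
    f = [c % p for c in f]
    g = [c % p for c in g]
    while f and f[-1] == 0:
        f.pop()
    while g and g[-1] == 0:
        g.pop()
    inv_lead = pow(g[-1], -1, p)          # leading coefficient is invertible
    while len(f) >= len(g):
        c = f[-1] * inv_lead % p          # cancel the leading term of f
        shift = len(f) - len(g)
        for i, gi in enumerate(g):
            f[shift + i] = (f[shift + i] - c * gi) % p
        while f and f[-1] == 0:
            f.pop()
    return f

def poly_gcd(f, g, p):
    """Monic gcd of f and g in GF(p)[X] via the Euclidean algorithm."""
    while g:
        f, g = g, poly_rem(f, g, p)
    inv = pow(f[-1], -1, p)               # normalize: leading coefficient 1
    return [c * inv % p for c in f]
```

For example, the monic gcd of X2 – 1 and X – 1 over GF(7) comes out as [6, 1], that is, X + 6 = X – 1.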
The concept of congruence can be extended to polynomials, namely, if f(X) ∈ K[X] is non-zero, then two polynomials g(X), h(X) ∈ K[X] are said to be congruent modulo f(X), denoted g(X) ≡ h(X) (mod f(X)), if f(X)|(g(X) – h(X)), that is, if there exists u(X) ∈ K[X] with g(X) – h(X) = u(X)f(X), or equivalently, if g(X) rem f(X) = h(X) rem f(X).
The principal ideals 〈f(X)〉 of K[X] play an important role (as do the ideals 〈n〉 of Z). Let us investigate the structure of the quotient ring R := K[X]/〈f(X)〉 for a non-constant polynomial f(X) ∈ K[X]. If r(X) denotes the remainder of division of g(X) ∈ K[X] by f(X), then it is clear that the residue classes of g(X) and r(X) are the same in R. On the other hand, two polynomials g(X), h(X) ∈ K[X] with deg g(X) < deg f(X) and deg h(X) < deg f(X) represent the same residue class in R if and only if g(X) = h(X). Thus elements of R are uniquely representable as polynomials of degrees < deg f(X). In other words, we may represent the ring R as the set of all polynomials in K[X] of degrees < deg f(X), together with addition and multiplication modulo the polynomial f(X). The ring R contains all the constant polynomials a ∈ K, that is, the field K is canonically embedded in R. In general, R is not a field. The next theorem gives the criterion for R to be a field.
|
For a non-constant polynomial f(X) ∈ K[X], the quotient ring R = K[X]/〈f(X)〉 is a field if and only if f(X) is irreducible over K. Proof If f(X) is reducible over K, then we can write f(X) = g(X)h(X) for some polynomials g(X), h(X) ∈ K[X] of degrees < deg f(X); the residue classes of g(X) and h(X) are then non-zero in R, but their product is zero, so R is not even an integral domain. Conversely, if f(X) is irreducible over K and if g(X) is a non-zero polynomial of degree < deg f(X), then gcd(f(X), g(X)) = 1, so that by Proposition 2.23 there exist polynomials u(X), v(X) ∈ K[X] with u(X)g(X) + v(X)f(X) = 1, that is, the residue class of u(X) is the inverse of that of g(X) in R. |
Let L := K[X]/〈f(X)〉 with f(X) irreducible over K. Then K ⊆ L is a field extension. If deg f(X) = 1, then L is isomorphic to K. If deg f(X) ≥ 2, then L is a proper extension of K. This gives us a useful and important way of representing the extension field L, given a representation for K. (For example, see Section 2.9.)
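As a tiny concrete instance (our own example, not from the text): taking K = GF(2) and the irreducible polynomial f(X) = X2 + X + 1 gives the four-element field GF(2)[X]/〈f〉, whose elements a + bX we encode as pairs (a, b):

```python
def mul_gf4(u, v):
    """Multiply a + bX and c + dX in GF(2)[X]/<X^2 + X + 1>.
    (a + bX)(c + dX) = ac + (ad + bc)X + bd*X^2, and X^2 ≡ X + 1."""
    a, b = u
    c, d = v
    return ((a * c + b * d) % 2, (a * d + b * c + b * d) % 2)
```

Every non-zero element is invertible, as the theorem promises: for instance X · (X + 1) = X2 + X = 1, that is, mul_gf4((0, 1), (1, 1)) == (1, 0).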
The study of the roots of a polynomial is the central objective in algebra. We now derive some elementary properties of roots of polynomials.
|
Let f(X) ∈ K[X] and a ∈ K. We call a a root (or a zero) of f, if f(a) = 0. |
|
Let f(X) ∈ K[X] and a ∈ K. Then f(X) rem (X – a) = f(a). In particular, a is a root of f if and only if (X – a)|f(X). Proof Polynomial division of f(X) by X – a gives f(X) = (X – a)q(X) + r(X) with deg r(X) < deg(X – a) = 1. Thus r(X) is a constant polynomial. Let us denote r(X) by r ∈ K; substituting a for X gives f(a) = r. |
|
A non-zero polynomial f ∈ K[X] of degree d has at most d roots in K. Proof We proceed by induction on d. The result clearly holds for d = 0. So assume that d ≥ 1 and that the result holds for all polynomials of degree d – 1. If f has no roots in K, we are done. So assume that f has a root, say, a ∈ K. Then f(X) = (X – a)g(X) with deg g = d – 1, and every root of f other than a is a root of g. By the induction hypothesis, g has at most d – 1 roots in K, so f has at most d roots in K. |
In the last proof, the only result we have used to exploit the fact that K is a field is that K contains no non-zero zero divisors. This is, however, true for every integral domain. Thus Proposition 2.25 continues to hold if K is any integral domain (not necessarily a field). However, if K is not an integral domain, the proposition is not necessarily true. For example, if ab = 0 with non-zero a, b ∈ K, a ≠ b, then the polynomial X2 + (b – a)X has at least three roots: 0, a and a – b.
For a field extension K ⊆ L and for a polynomial f ∈ K[X], we may think of the roots of f in L, since f ∈ L[X] too. Clearly, all the roots of f in K are also roots of f in L. However, the converse is not true in general. For example, the only roots of X4 – 1 in R are ±1, whereas the roots of the same polynomial in C are ±1, ±i. Indeed we have the following important result.
|
For any non-constant polynomial f ∈ K[X], there is a field extension K′ of K in which f has a root. Proof If f has a root in K, taking K′ = K proves the proposition. So we assume that f has no root in K (which implies that deg f ≥ 2). In principle, we do not require f to be irreducible. But if we consider a non-constant factor g of f, irreducible over K, we see that the roots of g in any extension L of K are roots of f in L too. Thus we may replace f by g and assume, without loss of generality, that f is irreducible. We construct the field extension K′ := K[X]/〈f〉 of K and denote the equivalence class of X in K′ by α. (One also writes x, X or [X] to denote this equivalence class.) It is clear that f(α) = 0 in K′. |
We say that the field K′ in the proof of the last proposition is obtained by adjoining the root α of f and denote this as K′ = K(α). We can write f(X) = (X – α)f1(X), where f1(X) ∈ K′[X] and deg f1 = (deg f) – 1. Now there is a field extension K″ of K′, where f1 has a root. Proceeding in this way we prove the following result.
|
A non-constant polynomial f in K[X] with deg f = d has d roots (not necessarily all distinct) in some field extension L of K. |
If a polynomial f ∈ K[X] of degree d ≥ 1 has all its roots α1, . . . , αd in L, then f(X) = a(X – α1) · · · (X – αd) for some a ∈ L (actually a ∈ K, namely the leading coefficient of f). In this case, we say that f splits (completely or into linear factors) over L.
|
Let f ∈ K[X] be a non-constant polynomial. An extension L of K is called a splitting field of f over K, if f splits over L, but not over any proper subfield of L containing K.
|
Every non-constant polynomial
has a splitting field L over K. Quite importantly, this field L is unique in some sense. This allows us to call the splitting field of f instead of a splitting field of f. We discuss these topics further in Section 2.8.
|
Let f be a non-constant polynomial in K[X] and let α be a root of f (in some extension of K). The largest natural number n for which (X –α)n|f(X) is called the multiplicity of the root α (in f). If n = 1 (resp. n > 1), then α is called a simple (resp. multiple) root of f. If all the roots of f are simple, then we call f a square-free polynomial. It is easy to see that f is square-free only if f is not divisible by the square of a non-constant polynomial in K[X]. The reverse implication also holds, if char K = 0 or if K is a finite field (or, more generally, if K is a perfect field—see Exercise 2.76). |
The notion of multiplicity can be extended to a non-root β of f by setting the multiplicity of β to zero.
Here we assume, unless otherwise stated, that K ⊆ L is a field extension.
|
An element α ∈ L is called algebraic over K, if f(α) = 0 for some non-zero polynomial f ∈ K[X]. If every element of L is algebraic over K, we call L an algebraic extension of K. |
|
|
Let α ∈ L be algebraic over K. A non-zero polynomial f ∈ K[X] of the smallest degree with f(α) = 0 is called a minimal polynomial of α over K. |
|
Let α ∈ L be algebraic over K with a minimal polynomial f ∈ K[X]. Then f is irreducible over K; a polynomial h ∈ K[X] satisfies h(α) = 0 if and only if f|h; and f is unique up to multiplication by non-zero elements of K. Proof Let f = f1f2 for some non-constant polynomials f1, f2 ∈ K[X]. Then f1(α)f2(α) = f(α) = 0, so f1(α) = 0 or f2(α) = 0, contradicting the minimality of deg f; hence f is irreducible. Using polynomial division one can write h(X) = q(X)f(X) + r(X) for some polynomials q, r ∈ K[X] with r = 0 or deg r < deg f. If h(α) = 0, then r(α) = 0, and the minimality of deg f forces r = 0, that is, f|h; the converse is obvious. Finally, if f and g are two minimal polynomials of α over K, then f|g and g|f and it follows that g(X) = cf(X) for some unit c of K[X]. But the only units of K[X] are the non-zero elements of K. |
By Proposition 2.28, a monic minimal polynomial f of α over K is uniquely determined by α and K. It is, therefore, customary to define the minimal polynomial of α over K to be this (unique) monic polynomial. Unless otherwise stated, we will stick to this revised definition and write f(X) = minpolyα, K(X).
|
For a field K, the following conditions are equivalent.
(a) Every non-constant polynomial in K[X] has a root in K.
(b) Every non-constant polynomial in K[X] splits over K.
(c) The irreducible polynomials in K[X] are precisely the polynomials of degree one.
(d) K has no proper algebraic extension.
Proof [(a)⇒(b)] Consider a non-constant irreducible polynomial f ∈ K[X]. [(b)⇒(c)] Let f ∈ K[X] be irreducible. [(c)⇒(d)] Obvious. [(d)⇒(a)] Let f ∈ K[X] be non-constant. |
|
A field K satisfying the equivalent conditions of Proposition 2.29 is called an algebraically closed field. For an arbitrary field K, a minimal algebraically closed field containing K is called an algebraic closure of K. |
We will see in Section 2.8 that an algebraic closure of every field exists and is unique in some sense. The algebraic closure of an algebraically closed field K is K itself. We end this section with the following well-known theorem. We will not prove the theorem in this book, because every known proof of it uses some kind of complex analysis which this book does not deal with.
|
The field C of complex numbers is algebraically closed.
|
| 2.51 | Let R be a ring and f, . Show that:
|
| 2.52 | Let f, , where R is an integral domain. Show that if f(ai) = g(ai) for i = 1, . . . , n, where n > max(deg f, deg g) and where a1, . . . , an are distinct elements of R, then f = g. In particular, if f(a) = g(a) for an infinite number of , then f = g.
|
| 2.53 | Lagrange’s interpolation formula Let K be a field and let a0, . . . , an be distinct elements of K. Show that for |
| 2.54 | Polynomials over a UFD Let R be a UFD. For a non-zero polynomial |
| 2.55 | Let R be a UFD. Show that a non-constant polynomial is irreducible over R if and only if f is irreducible over Q(R), where Q(R) denotes the quotient field of R (see Exercise 2.34).
|
| 2.56 |
|
| 2.57 | Let K ⊆ L be a field extension and f1, . . . , fn non-constant polynomials in K[X]. Show that each fi, i = 1, . . . , n, splits over L if and only if the product f1 · · · fn splits over L. |
| 2.58 | Show that the irreducible polynomials in ℝ[X] have degrees ≤ 2. [H]
|
| 2.59 | Show that a finite field (that is, a field with finite cardinality) is not algebraically closed. In particular, the algebraic closure of a finite field is infinite. |
| 2.60 | A complex number z is called an algebraic number, if z is algebraic over ℚ. An algebraic number z is called an algebraic integer, if z is a root of a monic polynomial in ℤ[X]. Show that:
|
| 2.61 | Let K be a field and f(X) = a0 + a1X + · · · + adXd ∈ K[X]. The formal derivative f′ of f is defined to be the polynomial f′(X) := a1 + 2a2X + · · · + dadXd–1. Show that:
|
| 2.62 | Let f ∈ K[X] be a non-constant polynomial of degree d and let α1, . . . , αd be the roots of f (in some extension field of K). The quantity ∏1≤i<j≤d (αi – αj)2 is called the discriminant of f. Prove the following assertions:
|
Vector spaces and linear transformations between them are the central objects of study in linear algebra. In this section, we investigate the basic properties of vector spaces. We also generalize the concept of vector spaces to get another useful class of objects called modules. A module which also carries a (compatible) ring structure is referred to as an algebra. Study of algebras over fields (or more generally over rings) is of importance in commutative algebra, algebraic geometry and algebraic number theory.
Unless otherwise specified, K denotes a field in this section.
|
A vector space V over a field K (or a K-vector space, in short) is an (additively written) Abelian group V together with a multiplication map · : K × V → V called the scalar multiplication map, such that the following properties are satisfied by every a, b ∈ K and every x, y ∈ V:
where ab denotes the product of a and b in the field K. When no confusions are likely, we omit the scalar multiplication sign · and write a · x simply as ax. |
|
|
Let V be a K-vector space. For every a ∈ K and x ∈ V, we have a · 0 = 0, 0 · x = 0, and (–a)x = a(–x) = –(ax).
Proof Easy verification. |
|
Let V be a vector space over K and S a subset of V. We say that S is a generating set or a set of generators of V (over K), or that S generates V (over K), if every element of V can be written as a finite linear combination a1x1 + · · · + anxn with a1, . . . , an ∈ K and x1, . . . , xn ∈ S. |
|
|
A subset S of a K-vector space V is called linearly independent (over K), if whenever a1x1 + · · · + anxn = 0 for some a1, . . . , an ∈ K and distinct x1, . . . , xn ∈ S, we have a1 = · · · = an = 0. Otherwise, S is called linearly dependent (over K). |
If 0 ∈ S, then S is linearly dependent, since a · 0 = 0 for any a ∈ K. One can easily check that all the generating sets of Example 2.13 are linearly independent too. This is, however, not a mere coincidence, as the following result demonstrates.
|
A subset S of a K-vector space V is a minimal generating set for V if and only if S is a maximal linearly independent set of V. Proof [if] Given a maximal linearly independent subset S of V, we first show that S is a generating set for V. Take any non-zero [only if] Given a minimal generating set S of V, we first show that S is linearly independent. Assume not, that is, a1x1 + · · · + anxn = 0 for some |
|
Let V be a K-vector space. A minimal generating set S of V is called a basis of V over K (or a K-basis of V). By Theorem 2.25, S is a basis of V if and only if S is a maximal linearly independent subset of V. Equivalently, S is a basis of V if and only if S is a generating set of V and is linearly independent. |
Any element of a vector space can be written uniquely as a finite linear combination of elements of a basis, since two different ways of writing the same element contradict the linear independence of the basis elements.
A K-vector space V may have many K-bases. For example, the elements 1, aX + b, (aX + b)2, · · · form a K-basis of K[X] for any a, b ∈ K with a ≠ 0. However, what is unique about any basis of a given K-vector space V is the cardinality[8] of the basis, as shown in Theorem 2.26.
[8] Two sets (finite or not) S1 and S2 are said to be of the same cardinality, if there exists a bijective map S1 → S2.
For the sake of simplicity, we sometimes assume that V is a finitely generated K-vector space. This assumption simplifies certain proofs greatly. But it is important to highlight here that, unless otherwise stated, all the results continue to remain valid without the assumption. For example, it is a fact that every vector space has a basis. For finitely generated vector spaces, this is a trivial statement to prove, whereas without our assumption we need to use arguments that are not so simple. (A possible proof follows from Exercise 2.63 with U = {0}.)
|
Let V be a K-vector space. Then any K-basis of V has the same cardinality. Proof We assume that V is finitely generated. Let S = {x1, . . . , xn} be a minimal finite generating set, that is, a basis, of V. Let T be another basis of V. Assume that m := #T > n. (We might even have m = ∞.) We can choose distinct elements |
Theorem 2.26 holds even when V is not finitely generated. We omit the proof for this case here.
|
Let V be a K-vector space. The cardinality of any K-basis of V is called the dimension of V over K and is denoted by dimK V (or by dim V, if K is understood from the context). We call V finite-dimensional (resp. infinite-dimensional), if dimK V is finite (resp. infinite). |
For example, dimK Kn = n and dimK K[X] = ∞.
|
Let V be a K-vector space. A subgroup U of V, which is closed under the scalar multiplication of V, is again a K-vector space and is called a (vector) subspace of V. In this case, we have dimK U ≤ dimK V (Exercise 2.63). |
|
Let V be a vector space over K.
|
|
Let V and W be K-vector spaces. A map f : V → W is called a homomorphism (of vector spaces) or a linear transformation or a linear map over K, if f(ax + by) = af(x) + bf(y) for all a, b ∈ K and all x, y ∈ V.
|
|
Let V and W be K-vector spaces. Then V and W are isomorphic if and only if dimK V = dimK W. Proof If dimK V = dimK W and S and T are bases of V and W respectively, then there exists a bijection f : S → T. One can extend f to a linear map V → W, and this extension is an isomorphism. Conversely, an isomorphism V → W maps any basis of V onto a basis of W, so that dimK V = dimK W. |
|
A K-vector space V with n := dimK V < ∞ is isomorphic to Kn. |
Let V be a K-vector space and U a subspace. As in Section 2.3 we construct the quotient group V/U. This group can be given a K-vector space structure under the scalar multiplication map a(x + U) := ax + U for all a ∈ K and x ∈ V. If T ⊆ V is such that the residue classes of the elements of T form a K-basis of V/U and if S is a K-basis of U, then it is easy to see that S ∪ T is a K-basis of V. In particular,
Equation 2.2

dimK V = dimK U + dimK(V/U)
For a K-linear map f : V → W, the set Ker f := {x ∈ V | f(x) = 0} is called the kernel of f, and the set Im f := {f(x) | x ∈ V} is called the image of f. We have the isomorphism theorem for vector spaces:
|
Ker f is a subspace of V, Im f is a subspace of W, and V/Ker f ≅ Im f. Proof Similar to Theorem 2.3 and Theorem 2.9. |
|
For a K-linear map f : V → W, the dimension dimK(Im f) is called the rank of f, denoted Rank f, and the dimension dimK(Ker f) is called the nullity of f, denoted Null f. |
|
Rank f + Null f = dimK V for any K-linear map f : V → W. |
If we remove the restriction that K is a field and allow the scalars to come from an arbitrary ring R, then the corresponding structure is called an R-module. More specifically, we have:
|
Let R be a ring. A module over R (or an R-module) is an (additively written) Abelian group M together with a multiplication map · : R × M → M, called the scalar multiplication map, such that the same properties as in the definition of a vector space are satisfied by every a, b ∈ R and every x, y ∈ M. |
|
Modules are a powerful generalization of vector spaces. Any result we prove for modules is equally valid for vector spaces, ideals and Abelian groups. On the other hand, since we do not demand that the ring R be a field, certain results for vector spaces are not applicable to all modules.
It is easy to see that Corollary 2.8 continues to hold for modules. An R-submodule of an R-module M is a subgroup of M that is closed under the scalar multiplication of M. For a subset S ⊆ M, the set of all finite linear combinations of the form a1x1 + · · · + anxn with a1, . . . , an ∈ R and x1, . . . , xn ∈ S is an R-submodule N of M, denoted by RS. We say that N is generated by S (or by the elements of S). If S is finite, then N is said to be finitely generated. A (sub)module generated by a single element is called cyclic. It is important to note that, unlike for vector spaces, the cardinality of a minimal generating set of a module is not necessarily unique. (See Exercise 2.68 for an example.) It is also true that, given a minimal generating set S of M, there may be more than one way of writing an element of M as a finite linear combination of elements of S. For example, if M = R = ℤ and S = {2, 3}, then 1 = (–1)·2 + 1·3 = 2·2 + (–1)·3. The nice theory of dimensions developed in connection with vector spaces does not apply to modules.
For an R-submodule N of M, the Abelian group M/N is given an R-module structure by the scalar multiplication map a(x + N) := ax + N. This module is called the quotient module of M by N.
For R-modules M and N, an R-linear map or an R-module homomorphism (from M to N) is defined as a map f : M → N with f(ax + by) = af(x) + bf(y) for all a, b ∈ R and x, y ∈ M (or, equivalently, with f(x + y) = f(x) + f(y) and f(ax) = af(x) for all a ∈ R and x, y ∈ M). An isomorphism, an endomorphism and an automorphism are defined in ways analogous to the case of vector spaces. The set of all (R-module) homomorphisms M → N is denoted by HomR(M, N) and the set of all (R-module) endomorphisms of M is denoted by EndR M. These sets are again R-modules under the definitions (f + g)(x) := f(x) + g(x) and (af)(x) := af(x) for all a ∈ R and x ∈ M (and f, g in HomR(M, N) or EndR M).
The kernel and image of an R-linear map f : M → N are defined as the sets Ker f := {x ∈ M | f(x) = 0} and Im f := {f(x) | x ∈ M}. With these notations we have the isomorphism theorem for modules:
|
Ker f and Im f are submodules of M and N respectively and M / Ker f ≅ Im f. |
For an R-module M and an ideal 𝔞 of R, the set 𝔞M consisting of all finite linear combinations a1x1 + · · · + anxn with a1, . . . , an ∈ 𝔞 and x1, . . . , xn ∈ M is a submodule of M. On the other hand, for a submodule N of M the set (M : N) := {a ∈ R | aM ⊆ N} is an ideal of R. In particular, the ideal (M : 0) is called the annihilator of M and is denoted as AnnR M (or as Ann M). For any ideal 𝔞 ⊆ AnnR M, one can view M as an R/𝔞-module under the map (a + 𝔞, x) ↦ ax. One can easily check that this map is well-defined, that is, the product ax is independent of the choice of the representative a of the equivalence class a + 𝔞.
|
A free module M over a ring R is defined to be a direct sum of copies of R, that is, a module isomorphic to ⊕i∈I R for some index set I. |
Any vector space is a free module (Theorem 2.27 and Corollary 2.9). The Abelian groups ℤn, n ≥ 2, are not free ℤ-modules.
|
M is a finitely generated R-module if and only if M is a quotient of a free module Rn for some n ∈ ℕ. Proof [if] The free module Rn has a canonical generating set e1, . . . , en, ei = (0, . . . , 0, 1, 0, . . . , 0) (1 in the i-th position). If M = Rn/N, then the equivalence classes ei + N, i = 1, . . . , n, constitute a finite set of generators of M. [only if] If x1, . . . , xn generate M, then the R-linear map f : Rn → M defined by (a1, . . . , an) ↦ a1x1 + · · · + anxn is surjective. Hence by the isomorphism theorem M ≅ Rn/Ker f. |
Let f : R → A be a homomorphism of rings. The ring A can be given an R-module structure with the multiplication map a · x := f(a)x for a ∈ R and x ∈ A. This R-module structure of A is compatible with the ring structure of A in the sense that for every a, b ∈ R and x, y ∈ A one has (ax)(by) = (ab)(xy).
Conversely, if a ring A has an R-module structure with (ax)(by) = (ab)(xy) for every a, b ∈ R and x, y ∈ A, then there is a unique ring homomorphism f : R → A taking a ↦ a · 1 (where 1 denotes the identity of A). This motivates us to define the following.
|
Let R be a ring. An algebra over R or an R-algebra is a ring A together with a ring homomorphism f : R → A, called the structure homomorphism of the algebra. |
|
Let R be a ring.
|
An R-algebra A is an R-module with the added property that multiplication of elements of A is now legal. Exploiting this new feature leads to the following concept of algebra generators.
|
Let A be an R-algebra with the structure homomorphism f : R → A. For a subset S ⊆ A, the smallest R-subalgebra of A containing S is called the R-algebra generated by S. If this subalgebra is the whole of A, we say that S generates A as an R-algebra. If S = {x1, . . . , xn} is finite, we write A = R[x1, . . . , xn] and call A a finitely generated R-algebra. |
|
One may proceed to define kernels and images of R-algebra homomorphisms and frame and prove the isomorphism theorem for R-algebras. We leave the details to the reader. We only note that algebra homomorphisms are essentially ring homomorphisms with the added condition of commutativity with the structure homomorphisms.
|
A ring A is a finitely generated R-algebra if and only if A is a quotient of a polynomial algebra (over R). Proof [if] Immediate from Example 2.17. [only if] Let A := R[x1, . . . , xn]. The map η : R[X1, . . . , Xn] → A that takes f(X1, . . . , Xn) ↦ f(x1, . . . , xn) is a surjective R-algebra homomorphism. By the isomorphism theorem, one has the isomorphism A ≅ R[X1, . . . , Xn]/Ker η of R-algebras. |
This theorem suggests that for the study of finitely generated algebras it suffices to investigate only the polynomial algebras and their quotients.
| 2.63 | Let V be a K-vector space, U a subspace of V, and T an arbitrary K-basis of U. Show that there is a K-basis of V, that contains T. [H] |
| 2.64 |
|
| 2.65 | Let V and W be K-vector spaces and f : V → W a K-linear map. Show that f is uniquely determined by the images f(x), x ∈ S, where S is a basis of V.
|
| 2.66 | Let V and W be K-vector spaces. Check that HomK(V, W) is a vector space over K. Show that dimK(HomK(V, W)) = (dimK V)(dimK W). In particular, if W = K, then HomK(V, K) is isomorphic to V. The space HomK(V, K) is called the dual space of V. |
| 2.67 | Let V and W be m- and n-dimensional K-vector spaces, S = {x1, . . . , xm} a K-basis of V, T = {y1, . . . , yn} a K-basis of W, and f : V → W a K-linear map. For each i = 1, . . . , m, write f(xi) = ai1y1 + · · · + ainyn with aij ∈ K. The m × n matrix (aij) is called the transformation matrix of f (with respect to the bases S and T). We have:
Let V1, V2, V3 be K-vector spaces, f, f1,
(Remark: This exercise explains that the linear transformations of finite-dimensional vector spaces can be explained in terms of matrices.) |
| 2.68 | Show that for every n ∈ ℕ there are integers a1, . . . , an that constitute a minimal set of generators for the unit ideal in ℤ. [H]
|
| 2.69 | Let M be an R-module. A subset S of M is called a basis of M, if S generates M and is linearly independent over R in the sense that a1x1 + · · · + anxn = 0 with a1, . . . , an ∈ R and distinct x1, . . . , xn ∈ S implies a1 = · · · = an = 0. Show that M has a basis if and only if M is a free R-module.
|
| 2.70 | We define the rank of a finitely generated R-module M as
RankR M := min{#S | M is generated by S}. If N is a submodule of M, show that RankR M ≤ RankR N + RankR(M/N). Give an example where the strict inequality holds. |
| 2.71 | Let M be an R-module. An element x ∈ M is called a torsion element of M, if Ann Rx ≠ 0, that is, if there is a non-zero a ∈ R with ax = 0. The set of all torsion elements of M is denoted by Tors M. M is called torsion-free if Tors M = {0}, and a torsion module if Tors M = M.
|
| 2.72 | Show that:
This shows that the converse of Exercise 2.71(c) is not true in general. |
In this section, we study some important properties of field extensions. We also give an introduction to Galois theory. Unless otherwise stated, the letters F, K and L stand for fields in this section.
We have seen that if F ⊆ K is a field extension, then K is a vector space over F. This observation leads to the following very useful definitions.
|
For a field extension F ⊆ K, the cardinality of any F-basis of K is called the degree of the extension F ⊆ K and is denoted by [K : F]. If [K : F] is finite, K is called a finite extension of F. Otherwise, K is called an infinite extension of F. |
|
Let F ⊆ K ⊆ L be a tower of field extensions. Then [L : F] = [L : K] [K : F]. In particular, the extension F ⊆ L is finite if and only if the extensions F ⊆ K and K ⊆ L are finite. In that case, [L : K] | [L : F] and [K : F] | [L : F]. Proof One can easily check that if S is an F-basis of K and S′ a K-basis of L, then the set |
Recall the definitions of the rings F[X] of polynomials and F(X) of rational functions in one indeterminate X. These notations are now generalized. For a field extension F ⊆ K and for a ∈ K, we define:

F[a] := {f(a) | f(X) ∈ F[X]}

and

Equation 2.3

F(a) := {f(a)/g(a) | f(X), g(X) ∈ F[X], g(a) ≠ 0}
It is easy to see that F[a] is the smallest (with respect to inclusion) of the integral domains that contain F and a. Similarly F(a) is the smallest of the fields that contain F and a. We also have F[a] ⊆ F(a). Now we state the following important characterization of algebraic elements.
|
For a field extension F ⊆ K and an element a ∈ K, the following conditions are equivalent: (a) a is algebraic over F; (b) the extension F ⊆ F(a) is finite; (c) F[a] = F(a).
Proof [(a)⇒(b)] Let [(b)⇒(c)] Let d := [F(a) : F]. Since the elements 1, a, a2, . . . , ad are linearly dependent over F, there exists [(c)⇒(a)] Clearly, the element 0 is algebraic over F. So assume a ≠ 0. Since |
|
For a field extension F ⊆ K, the set of elements in K that are algebraic over F is a field. Proof It is sufficient to show that if a, b ∈ K are algebraic over F, then so are a ± b, ab and (for b ≠ 0) a/b. All these elements lie in the field F(a)(b), which is a finite, hence algebraic, extension of F. |
The field F(a)(b) in the proof of the last corollary is also denoted as F(a, b). It is the smallest subfield of K that contains F, a and b, and it follows that F(a, b) = F(b, a). More generally, for a field extension F ⊆ K and for a1, . . . , an ∈ K, each algebraic over F, the field F(a1, . . . , an) is defined as F(a1)(a2) . . . (an) and is independent of the order in which the ai are adjoined.
|
Let F ⊆ K be a finite extension. Then K is algebraic over F. Proof For any a ∈ K, the powers 1, a, a2, . . . cannot be linearly independent over F, since [K : F] is finite. A non-trivial linear dependence among them yields a non-zero polynomial in F[X] having a as a root. |
The converse of the last corollary is not true, that is, it is possible that an algebraic extension has infinite extension degree. Exercise 2.59 gives an example.
|
If F ⊆ K and K ⊆ L are algebraic field extensions, then F ⊆ L is also algebraic. Proof Take an arbitrary a ∈ L. The coefficients of the minimal polynomial of a over K are algebraic over F and generate a finite extension F′ of F. Since a is algebraic over F′, the extension F ⊆ F′(a) is finite, and so a is algebraic over F. |
|
A field extension F ⊆ K is called simple, if K = F(a) for some a ∈ K. Such an element a is called a primitive element of the extension. |
|
Let F be a field of characteristic 0 and let a, b (belonging to some extension of F) be algebraic over F. Then the extension F(a, b) of F is simple. Proof Let p(X) and q(X) be the minimal polynomials (over F) of a and b respectively. Let d := deg p and d′ := deg q. The polynomials p and q are irreducible over F and hence by Exercise 2.61 have no multiple roots. Let a1, . . . , ad be the roots of p and b1, . . . , bd′ the roots of q with a = a1 and b = b1. For each i, j with j ≠ 1, the equation ai + λbj = a + λb has a unique solution for λ (not necessarily in F). Since F is infinite, we can choose a λ ∈ F different from all these finitely many solutions; then c := a + λb satisfies F(a, b) = F(c). |
|
A finite extension F ⊆ K of fields of characteristic 0 is simple. Proof We proceed by induction on d := [K : F]. The result vacuously holds for d = 1. So let us assume that d > 1 and that the result holds for all smaller values of d. Choose an element a ∈ K \ F. Then [K : F(a)] < d, so by the induction hypothesis K = F(a)(b) for some b ∈ K, and the last theorem gives K = F(a, b) = F(c) for a single element c. |
Let f(X) be a non-constant polynomial of degree d in F[X]. Assume that f does not split over F. Consider an irreducible (in F[X]) factor f′ of f of degree d′ > 1. F′ := F[X]/⟨f′⟩ is a field extension of F. Furthermore, if α1 denotes the residue class of X in F′, the elements 1, α1, . . . , α1d′–1 constitute a basis of F′ over F. In particular, [F′ : F] = d′ ≤ d. Now, one can write f(X) = (X – α1)g(X) for some g(X) ∈ F′[X]. If g splits over F′, then so does f. Otherwise, choose any irreducible (in F′[X]) factor g′ of g with deg g′ > 1 and consider the field extension F″ := F′[X]/⟨g′⟩. Then [F″ : F′] = deg g′ ≤ deg g = d – 1, so that [F″ : F] ≤ d(d – 1). Moreover, if α2 denotes a root of g′ in F″, then f(X) = (X – α1)(X – α2)h(X) for some h(X) ∈ F″[X]. Proceeding in this way we get:
|
For a non-constant polynomial f ∈ F[X] of degree d, there exists a field K ⊇ F with [K : F] ≤ d! such that f splits over K. A minimal such field is called a splitting field of f over F. |
We now establish the uniqueness of the splitting field of a polynomial f ∈ F[X]. To start with, we set up certain notations. An isomorphism μ : F → F′ of fields induces an isomorphism μ* : F[X] → F′[Y] of polynomial rings, defined by adXd + ad–1Xd–1 + · · · + a0 ↦ μ(ad)Yd + μ(ad–1)Yd–1 + · · · + μ(a0). We have μ*(a) = μ(a) for all a ∈ F. Note also that f ∈ F[X] is irreducible over F if and only if μ*(f) ∈ F′[Y] is irreducible over F′. With these notations we state the following important lemma.
|
Let the non-constant polynomial f ∈ F[X] be irreducible over F, let α be a root of f in some extension of F, and let β be a root of μ*(f) in some extension of F′. Then there exists an isomorphism τ : F(α) → F′(β) with τ(α) = β and τ(a) = μ(a) for all a ∈ F. Proof Since F(α) = F[α] and F′(β) = F′[β], we can define the map τ : F[α] → F′[β] by g(α) ↦ (μ*(g))(β) for each g(X) ∈ F[X]; one checks that τ is a well-defined isomorphism with the stated properties. |
Roots of an irreducible polynomial are called conjugates (of each other). If α and β are two roots of an irreducible polynomial
, the last lemma guarantees the existence of an isomorphism τ : F(α) → F(β) that fixes all the elements of F and that maps α ↦ β.
|
We use the maps μ : F → F′ and μ* : F[X] → F′[Y] as defined above. Let K be a splitting field of a non-constant polynomial f ∈ F[X] over F and K′ a splitting field of μ*(f) over F′. Then there exists an isomorphism τ : K → K′ with τ(a) = μ(a) for all a ∈ F. Proof We proceed by induction on n := [K : F]. (By Proposition 2.32 n is finite.) If n = 1, then K = F, that is, the polynomial f splits over F itself and so does μ*(f) over F′, that is, K′ = F′. Thus τ = μ is the desired isomorphism. Now assume that n > 1 and that the result holds for all fields L and for all polynomials in L[X] with splitting fields (over L) of extension degrees less than n. Consider an irreducible factor g of f with 1 < deg g ≤ deg f. Note that g also splits over K. We take any root α of g in K and any root β of μ*(g) in K′; by the last lemma, μ extends to an isomorphism F(α) → F′(β), and the induction hypothesis applies to the splitting field K of f over F(α). |
The results pertaining to the splitting field of a polynomial can be generalized in the following way. Let S be a non-empty subset of F[X]. A splitting field of S over F is a minimal field K containing F such that each polynomial f ∈ S splits in K. If S = {f1, . . . , fr} is a finite set, the splitting field of S is the same as the splitting field of f = f1 · · · fr (Exercise 2.57). But the situation is different, if S is infinite. Of particular interest is the set S consisting of all irreducible polynomials in F[X]. In this case, the splitting field of S is an algebraic closure of F.
We give a sketch of the proof that even when S is infinite, a splitting field for S can be constructed. This, in particular, establishes the existence of an algebraic closure of any field. We may assume that S comprises non-constant polynomials only. For each f ∈ S, we define an indeterminate Xf and consider the polynomial ring A := F[Xf | f ∈ S] and the ideal 𝔞 of A generated by the elements f(Xf) for all f ∈ S. We have 𝔞 ≠ A and, therefore, there is a maximal ideal 𝔪 of A containing 𝔞 (Exercise 2.23). Consider the field F1 := A/𝔪 containing F. Every polynomial f ∈ S has at least one root in F1. Now we replace F by F1 and as above get another field F2 containing F1 (and hence F), such that every polynomial in S (of degree ≥ 2) has at least two roots in F2. We continue this procedure (infinitely often, if necessary) and obtain a sequence of fields F ⊆ F1 ⊆ F2 ⊆ F3 ⊆ · · ·. Define K to be the field consisting of all elements of the union of the Fi that are algebraic over F. Each polynomial in S splits in K, but in no proper subfield of K, that is, K is a splitting field of S.
It turns out that the splitting field of S is unique up to isomorphisms that fix the elements of F. In particular, the algebraic closure of F is unique up to isomorphisms that fix the elements of F, and is denoted by F̄.
For a field K, the set Aut K of all automorphisms of K is a group under (functional) composition. We extend this concept now. Let F ⊆ K be an extension of fields.
|
An automorphism σ of K satisfying σ(a) = a for every a ∈ F is called an F-automorphism of K. The F-automorphisms of K form a subgroup AutF K of Aut K. Conversely, for a subgroup H of AutF K the set of elements of K that are fixed by all the automorphisms of H, that is, the set of all x ∈ K with σ(x) = x for every σ ∈ H, is an intermediate field of the extension F ⊆ K, called the fixed field of H and denoted FixF H. |
For every intermediate field L (that is, a field L with F ⊆ L ⊆ K), we have a subgroup AutL K of AutF K. Conversely, given a subgroup H of AutF K we have the intermediate fixed field FixF H. It is a relevant question to ask if there is any relationship between the subgroups of AutF K and the intermediate fields. A nice correspondence exists for a particular type of extensions that we define now.
|
A field extension F ⊆ K is said to be a Galois extension (or K is said to be a Galois extension over F), if FixF (AutF K) = F. Thus K is Galois over F if and only if for every x ∈ K \ F there exists an automorphism σ ∈ AutF K with σ(x) ≠ x. |
|
Let K be the splitting field of a non-constant polynomial f ∈ F[X], where F is a field of characteristic 0. Then K is a Galois extension of F. |
The following theorem establishes the correspondence we are looking for.
|
For a finite Galois extension F ⊆ K, there is a bijective correspondence between the set of all intermediate fields and the set of all subgroups of AutF K (given by L ↦ AutL K and H ↦ FixF H) such that the following assertions hold:
|
A proof of this theorem is rather long and uses many auxiliary results which we would not need otherwise. We, therefore, choose to omit the proof here.
| 2.73 | Let α be transcendental over F. Show that the domain F[α] and the field F(α) are respectively isomorphic to the polynomial ring F[X] and the field F(X) of rational functions in one indeterminate X. Generalize the result for an arbitrary family αi, i ∈ I, of elements each of which is transcendental over F.
|
| 2.74 | Let F ⊆ K be a field extension and let σ be an endomorphism of K with σ(a) = a for every a ∈ F.
|
| 2.75 | Let F ⊆ K be a field extension.
|
| 2.76 | F is called a perfect field, if every irreducible polynomial in F[X] is separable over F. |
| 2.77 | A field extension F ⊆ K is called normal, if every irreducible polynomial in F[X], that has a root in K, splits in K[X].
|
| 2.78 | Prove the following assertions:
|
| 2.79 | Let F ⊆ K be a field extension and let L be the fixed field of AutF K over F. Show that K is a Galois extension of L. |
Finite fields are seemingly the most important types of fields used in cryptography. They enjoy certain nice properties that infinite fields (in particular, the well-known fields ℚ, ℝ and ℂ) do not. We concentrate on some properties of finite fields in this section. As we see later, arithmetic over a finite field K is fast, when char K = 2 or when #K is a prime. As a result, these two classes of fields are the most common ones employed in cryptography. However, in this section, we do not restrict ourselves to these specific fields only, but provide a general treatment valid for all finite fields. As in the previous section, we continue to use the letters F, K, L to denote fields. In addition, we use the letter p to denote a prime number and q a power of p, that is, q = pn for some n ∈ ℕ.
Let K be a finite field of cardinality q. Then p := char K > 0. By Proposition 2.7, p is a prime, and K contains an isomorphic copy of the field Fp. If n := [K : Fp], we have q = pn. Therefore, we have proved the first statement of the following important result.
|
The cardinality of a finite field is a power pn, n ∈ ℕ, of a prime p. Conversely, for every prime power q = pn there exists a finite field of cardinality q. Proof In order to construct a finite field of cardinality q := pn, we start with the splitting field K of the polynomial Xq – X over Fp and check that the roots of Xq – X in K form a subfield of cardinality exactly q. |
|
Let K be a finite field of cardinality q. Then aq = a for every a ∈ K. Proof Clearly, 0q = 0. Take a ≠ 0. K* being a group of order q – 1, by Proposition 2.4 ordK*(a) divides q – 1. In particular, aq–1 = 1, that is, aq = a. |
|
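For a prime field Fq (q = p a prime) the theorem can be checked directly with modular exponentiation. The following minimal sketch (plain Python; the sample primes are our own choice) verifies aq = a for every element:

```python
# Verify a^q = a for every a in the prime field F_q = Z/qZ (q prime).
for q in (2, 3, 5, 7, 101):
    assert all(pow(a, q, q) == a for a in range(q))
print("a^q = a verified for q in (2, 3, 5, 7, 101)")
```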
Let K be a finite field of cardinality q = pn and let F be the subfield of K isomorphic to Fp. Then K is the splitting field of the polynomial f(X) := Xq – X over F. In particular, any two finite fields of the same cardinality are isomorphic. Proof By Theorem 2.37, each of the q elements of K is a root of f and consequently K is the splitting field of f. The last assertion in the theorem follows from the uniqueness of splitting fields (Proposition 2.33). |
This uniqueness allows us to talk about the finite field of cardinality q (rather than a finite field of cardinality q). We denote this (unique) field by Fq.
The results proved so far can be generalized to arbitrary extensions Fq ⊆ Fqm, where q = pn and n, m ∈ ℕ. We leave the details to the reader (Exercise 2.82). It is important to point out here that since Fqm is the splitting field of Xqm – X over Fq, by Exercise 2.77 we have:
|
Every finite extension of finite fields is normal. |
This implies that an irreducible polynomial f ∈ Fq[X] has either none or all of its roots in Fqm. Also if α ∈ Fq with q = pn, then αq = αpn = α. Therefore, αpn–1 is a p-th root of α. By Exercise 2.76(b), we then conclude:
|
Every finite field is perfect. |
|
Consider the extension Fq ⊆ K := Fqm. Then the intermediate fields of this extension are precisely the fields Fqd for the positive divisors d of m. Proof For d|m, we have (Xqd – X)|(Xqm – X). The qd roots of Xqd – X in K constitute an intermediate field L. If L′ ≠ L is another intermediate field with qd elements, by Theorem 2.36 there are more than qd elements of K that are roots of Xqd – X, a contradiction. Conversely, an intermediate field L contains qd elements, where d := [L : Fq], and d | m by the multiplicativity of extension degrees. |
|
Let Proof Consider the extension Now we will prove a very important result concerning the multiplicative group |
|
The multiplicative group F*q of the finite field Fq is cyclic. Proof Modify the proof of Proposition 2.19 or use the following more general result. |
|
Let K be a field (not necessarily finite). Then any finite subgroup G of the multiplicative group K* is cyclic. Proof Since K is a field, for any d ∈ ℕ the polynomial Xd – 1 has at most d roots in K, so G contains at most d elements x with xd = 1. A finite Abelian group with this property is necessarily cyclic. |
|
Every finite extension Fq ⊆ Fqm is simple. Proof Let α be a generator of the cyclic group F*qm. Then Fqm = Fq(α). |
In this section, we study some useful properties of polynomials over finite fields. We concentrate on polynomials in Fq[X] for an arbitrary prime power q = pn, p prime, n ∈ ℕ. We have seen how the polynomials Xqm – X proved to be important for understanding the structures of finite fields. But that is not all; these polynomials indeed have further roles to play, and this prompts us to reserve a special symbol for them.
Let Fq ⊆ Fqm be a finite extension of finite fields and let α ∈ Fqm be a root of the polynomial f(X) ∈ Fq[X]. Since each coefficient a of f satisfies aq = a, we have f(αq) = (f(α))q = 0. Therefore, αq is also a root of f(X). More generally, for each r = 0, 1, 2, · · · the element αqr is a root of f(X). This gives us a nice procedure for computing the minimal polynomial of α as the following corollary suggests.
|
The minimal polynomial of α ∈ Fqm over Fq is f(X) := (X – α)(X – αq) · · · (X – αqd–1), where d is the smallest positive integer with αqd = α. Proof Let f be as above. The map x ↦ xq permutes the roots α, αq, . . . , αqd–1 of f, so the coefficients of f are fixed by this map and hence belong to Fq. Any polynomial in Fq[X] with α as a root has all the conjugates αqi as roots and is, therefore, divisible by f. |
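As a concrete illustration, take F8 realized as F2[X]/⟨X3 + X + 1⟩ (our choice of defining polynomial), with field elements encoded as 3-bit masks. The sketch below forms the product (Y – α)(Y – α2)(Y – α4) with coefficients computed in F8 and recovers the minimal polynomial of α, which is X3 + X + 1 itself:

```python
F = 0b1011  # assumed defining polynomial X^3 + X + 1, irreducible over GF(2)

def mul(a, b):
    """Multiplication in GF(8) = GF(2)[X]/(F); elements are 3-bit masks."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0b1000:
            a ^= F
    return r

alpha = 0b010                                          # residue class of X
conjugates = [alpha, mul(alpha, alpha)]
conjugates.append(mul(conjugates[1], conjugates[1]))   # alpha, alpha^2, alpha^4

# minpoly = (Y - alpha)(Y - alpha^2)(Y - alpha^4); over GF(2), minus is XOR.
minpoly = [1]        # coefficient list in GF(8), constant term first
for c in conjugates:
    new = [0] * (len(minpoly) + 1)
    for k, coef in enumerate(minpoly):
        new[k + 1] ^= coef          # contribution of Y * coef
        new[k] ^= mul(c, coef)      # contribution of c * coef
    minpoly = new

print(minpoly)   # -> [1, 1, 0, 1], i.e. X^3 + X + 1 with coefficients in GF(2)
```

Note that, as the corollary predicts, all the coefficients collapse into the prime field {0, 1} even though the intermediate arithmetic happens in F8.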
We now prove a theorem which has important consequences.
|
The polynomial Xqm – X is the product of all monic irreducible polynomials in Fq[X] whose degrees divide m. Proof We have seen that Xqm – X splits over Fqm and has no multiple roots. Every root of a monic irreducible polynomial of degree d|m lies in Fqd ⊆ Fqm and is, therefore, a root of Xqm – X; conversely, the minimal polynomial over Fq of any root of Xqm – X has degree dividing m. |
The first consequence of Theorem 2.40 is that it leads to a procedure for checking the irreducibility of a polynomial f(X) ∈ Fq[X]. Let d := deg f. If f(X) is reducible, it admits an irreducible factor of degree ≤ ⌊d/2⌋. Since gm := gcd(f(X), Xqm – X) is the product of all distinct irreducible factors of f with degrees dividing m, we compute the gcds g1, . . . , g⌊d/2⌋. If all these gcds are 1, we conclude that f is irreducible. Otherwise f is reducible. We will see an optimized implementation of this procedure in Chapter 3. Besides irreducibility testing, the above theorem also leads to algorithms for finding random irreducible polynomials and for factoring polynomials, as we will also discuss in Chapter 3.
The second consequence of Theorem 2.40 is that it gives us a formula for the number of monic irreducible polynomials of a given degree over a given finite field. First we need to define a function on ℕ.
|
The Möbius function μ : ℕ → {–1, 0, 1} is defined by μ(1) := 1, μ(n) := 0 if n is divisible by the square of a prime, and μ(n) := (–1)r if n is a product of r distinct primes. It follows that μ(n) ≠ 0 if and only if n is square-free. |
|
For n ∈ ℕ, we have Σd|n μ(d) = 1 if n = 1, and Σd|n μ(d) = 0 if n > 1, where d runs over the positive divisors of n. Proof The result follows immediately for n = 1. For n > 1, write n = p1e1 · · · prer with distinct primes p1, . . . , pr and exponents ei ≥ 1. Only the square-free divisors of n contribute to the sum, so Σd|n μ(d) = Σk=0,...,r (r choose k)(–1)k = (1 – 1)r = 0. |
|
Let f and g be maps from ℕ to an (additively written) Abelian group such that g(n) = Σd|n f(d) for every n ∈ ℕ. Then f(n) = Σd|n μ(n/d) g(d) for every n ∈ ℕ. An analogous multiplicative inversion formula holds for maps into a multiplicatively written Abelian group.
Proof To prove the additive formula we note that
Σd|n μ(n/d) g(d) = Σd|n μ(n/d) Σe|d f(e) = Σe|n f(e) Σc|(n/e) μ(c) = f(n),
where the last equality follows from Lemma 2.6. The multiplicative formula can be proved similarly. |
Let us denote by νq,m the number of monic irreducible polynomials in Fq[X] of degree m and by Iq,m(X) the product of all monic irreducible polynomials in Fq[X] of degree m. By Theorem 2.40, we have X^{q^m} – X = ∏_{d|m} Iq,d(X) and, comparing degrees, q^m = Σ_{d|m} d νq,d. Applications of the Möbius inversion formula then yield the following formulas:
Equation 2.4

νq,m = (1/m) Σ_{d|m} μ(m/d) q^d  and  Iq,m(X) = ∏_{d|m} (X^{q^d} – X)^{μ(m/d)}
If p1, . . . , pr are the distinct prime divisors of m, Equation (2.4) together with the observation that μ(n) ≥ –1 for all n ∈ ℕ implies that m νq,m ≥ q^m – (q^{m/p1} + · · · + q^{m/pr}) ≥ q^m – r q^{m/2}. But each pi ≥ 2, so that 2^r ≤ p1 · · · pr ≤ m, that is, r ≤ lg m, and hence νq,m > 0. We, therefore, have an independent proof of the second statement in Corollary 2.17. Moreover, for practical values of q and m we have the good approximation:
Equation 2.5

νq,m ≈ q^m / m
Since the total number of monic polynomials of degree m in Fq[X] is qm, a randomly chosen monic polynomial in Fq[X] of degree m has an approximate probability of 1/m of being irreducible; that is, one expects to find an irreducible polynomial of degree m after O(m) random monic polynomials are picked from Fq[X]. These observations have an important bearing on devising efficient algorithms for finding irreducible polynomials over finite fields. (See Chapter 3.)
The conjugates of α ∈ Fqm over Fq are αqi, i = 0, 1, . . . , m – 1. It is interesting to look at the sum and the product of the conjugates of α. By Corollary 2.18, α ∈ Fqd for some d|m. Since the map x ↦ xq permutes the conjugates α, αq, . . . , αqd–1, the elements α + αq + · · · + αqd–1 and α · αq · · · αqd–1 are fixed by this map and belong to Fq. Since αqd = α, for any (positive) integral multiple δ of d, the sum α + αq + · · · + αqδ–1 and the product α · αq · · · αqδ–1 are elements of Fq too.
|
Let α ∈ Fqm. The trace of α over Fq is defined as Tr(α) := α + αq + αq2 + · · · + αqm–1, and the norm of α over Fq as N(α) := α · αq · αq2 · · · αqm–1.
In view of the preceding discussion, the trace and norm of α are elements of Fq. |
The trace and norm functions play an important role in the theory of finite fields. See Exercise 2.86 for some elementary properties of these functions.
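A small sketch over F8 = F2[X]/⟨X3 + X + 1⟩ (our choice of defining polynomial, with 3-bit mask encoding) computes traces and norms by repeated squaring and confirms that both land in the prime field F2, and that the trace is additive and the norm multiplicative:

```python
F = 0b1011   # assumed defining polynomial X^3 + X + 1 for GF(8)

def mul(a, b):
    """Multiplication in GF(8), bitmask encoding."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0b1000:
            a ^= F
    return r

def trace(a):
    """Tr(a) = a + a^2 + a^4 (addition in GF(2^n) is XOR)."""
    t, x = 0, a
    for _ in range(3):
        t ^= x
        x = mul(x, x)
    return t

def norm(a):
    """N(a) = a * a^2 * a^4 = a^7."""
    n, x = 1, a
    for _ in range(3):
        n = mul(n, x)
        x = mul(x, x)
    return n

# Both functions take values in the prime field {0, 1} only.
assert all(trace(a) in (0, 1) and norm(a) in (0, 1) for a in range(8))
# Trace is additive, norm is multiplicative.
assert all(trace(a ^ b) == trace(a) ^ trace(b)
           for a in range(8) for b in range(8))
assert all(norm(mul(a, b)) == mul(norm(a), norm(b))
           for a in range(8) for b in range(8))
print([trace(a) for a in range(8)])   # -> [0, 1, 0, 1, 0, 1, 0, 1]
```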
Fqm is a vector space of dimension m over Fq. Let β0, . . . , βm–1 be an Fq-basis of Fqm. Each element a ∈ Fqm has a unique representation a = a0β0 + · · · + am–1βm–1 with each ai ∈ Fq. Therefore, if we have a representation of the elements of Fq, we can also represent the elements of Fqm. Thus elements of any finite field can be represented, if we have representations of elements of prime fields. But the set {0, 1, . . . , p – 1} under the modulo p arithmetic represents Fp.
So our problem reduces to selecting suitable bases β0, . . . , βm–1 of Fqm over Fq. In order to illustrate how we can do that, let us choose a priori a fixed monic irreducible polynomial f(X) ∈ Fq[X] with deg f = m. We then represent Fqm = Fq[X]/⟨f(X)⟩, where α (the residue class of X) is a root of f in Fqm. The elements 1, α, . . . , αm–1 are linearly independent over Fq, since otherwise we would have a non-zero polynomial of degree less than m, of which α is a root. The Fq-basis 1, α, . . . , αm–1 of Fqm is called a polynomial basis (with respect to the defining polynomial f). The elements of Fqm are then polynomials of degrees < m. The arithmetic in Fqm is carried out as the polynomial arithmetic of Fq[X] modulo the irreducible polynomial f.
|
Polynomial bases are most common in finite field implementations. Some other types of bases also deserve specific mention in this context.
|
An element α ∈ F_{q^m} is called a normal element of F_{q^m} over F_q if the conjugates α, α^q, . . . , α^{q^{m–1}} constitute an F_q-basis of F_{q^m}; such a basis is called a normal basis. A normal basis generated by a primitive element is called a primitive normal basis. It can be shown that normal bases exist for all finite extensions F_{q^m} of F_q. It can even be shown that primitive normal bases exist for all such extensions.
|
Consider the representation of F_{2^3} described above. The conjugates α, α^2, α^4, expressed in the polynomial basis, give a 3×3 transformation matrix having determinant 1 modulo 2. Thus α is a normal element of F_{2^3}. On the other hand, α + 1 is not a normal element of F_{2^3}, the corresponding transformation matrix having determinant zero modulo 2.
Computations over finite fields often call for exponentiations of elements a = a0β0 + · · · + am–1βm–1. If βi = α^{q^i}, i = 0, . . . , m – 1, constitute a normal basis, then a^q = am–1β0 + a0β1 + · · · + am–2βm–1, since α^{q^m} = α and βi^q = βi+1 for each i < m – 1. Thus the coefficients of a^q (in the representation under the given normal basis) are obtained simply by cyclically shifting the coefficients a0, . . . , am–1 in the representation of a. This leads to a considerable saving of time. In particular, this trick is most useful for q = 2 (a case of high importance in cryptography), where squaring becomes a single cyclic shift.
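The cyclic-shift property can be checked on a toy field. In the sketch below, F_8 = F_2[x]/(x^3 + x + 1) and β = x + 1 is taken as the normal element; both choices are assumptions of this example, and the code itself verifies that β is indeed normal:

```python
import itertools

# GF(8) = GF(2)[x]/(x^3 + x + 1); elements are 3-bit masks (bit i = coeff of x^i).
MOD = 0b1011

def gmul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0b1000:
            a ^= MOD
        b >>= 1
    return r

beta = 0b011                     # beta = x + 1, the chosen normal element
basis = [beta, gmul(beta, beta), gmul(gmul(beta, beta), gmul(beta, beta))]

def from_coords(c):              # (c0, c1, c2) -> c0*beta + c1*beta^2 + c2*beta^4
    v = 0
    for ci, bi in zip(c, basis):
        if ci:
            v ^= bi
    return v

# The 8 coordinate vectors hit all 8 field elements, so beta is indeed normal.
coords = {from_coords(c): c for c in itertools.product((0, 1), repeat=3)}
assert len(coords) == 8

# Squaring is a cyclic shift of the coordinate vector: (c0,c1,c2) -> (c2,c0,c1).
for v, (c0, c1, c2) in coords.items():
    assert coords[gmul(v, v)] == (c2, c0, c1)
```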
Now that exponentiations become cheaper with normal bases, one should not let the common operations (addition and multiplication) become significantly slower. The sum of a = a0β0 + · · · + am–1βm–1 and b = b0β0 + · · · + bm–1βm–1 remains as easy as in the case of a polynomial basis, namely, a + b = (a0 + b0)β0 + · · · + (am–1 + bm–1)βm–1, where each ai + bi is calculated in F_q. However, computing the product ab introduces difficulty. In particular, it requires the representation of the products βiβj, 0 ≤ i, j ≤ m – 1, in the basis β0, . . . , βm–1, say, βiβj = Σk λij(k)βk with λij(k) ∈ F_q. For i ≤ j, we have βiβj = (β0βj–i)^{q^i}. It is thus sufficient to look only at the coefficients λ0j(k), 0 ≤ j, k ≤ m – 1. We denote by Cα the number of non-zero coefficients λ0j(k). From practical considerations (for example, for hardware implementations), Cα should be as small as possible. For q = 2, one can show that 2m – 1 ≤ Cα ≤ m^2. If, for this special case, Cα = 2m – 1, the normal basis α, α^q, . . . , α^{q^{m–1}} is called an optimal normal basis. Unlike normal (or primitive normal) bases, optimal normal bases do not exist for all q and m.
We finally mention another representation of the elements of a finite field F_q that does not depend on the vector-space representation discussed so far, but is based on the fact that the multiplicative group F_q* is cyclic. If we are given a primitive element (that is, a generator) γ of F_q*, then the elements of F_q are 0, 1 = γ^0, γ, . . . , γ^{q–2}. Multiplication and exponentiation become easy with this representation, since 0 · a = 0 for all a ∈ F_q, whereas γ^i · γ^j = γ^k with k ≡ i + j (mod q – 1). Unfortunately, this representation provides no clue on how to compute γ^i + γ^j. One possibility is to store a table of the values zk satisfying 1 + γ^k = γ^{zk} for all k = 0, . . . , q – 2 (with γ^k ≠ –1), so that for i ≤ j one can compute γ^i + γ^j = γ^i(1 + γ^{j–i}) = γ^iγ^{z(j–i)} = γ^l, where l ≡ i + z(j–i) (mod q – 1). Such a table, called Zech’s logarithm table, can be maintained for small values of q and may facilitate computations in extensions of F_q. But if q is large (or, more correctly, if p is large, where q = p^n), this representation of the elements of F_q is neither practical nor often feasible. Another difficulty with this representation is that it calls for a primitive element γ. If q is large and the integer factorization of q – 1 is not available, no efficient methods are known for finding such an element, or even for checking whether a given element is primitive.
|
Consider the representation F_9 = F_3(β), where β is a root of the irreducible polynomial T^2 + 1 ∈ F_3[T], and take the primitive element γ = β + 1. The Zech logarithm table for F_9 with respect to γ is shown below.
| k | γk | 1 + γk | zk |
|---|---|---|---|
| 0 | 1 | 2 | 4 |
| 1 | β + 1 | β + 2 | 7 |
| 2 | 2β | 2β + 1 | 3 |
| 3 | 2β + 1 | 2β + 2 | 5 |
| 4 | 2 | 0 | – |
| 5 | 2β + 2 | 2β | 2 |
| 6 | β | β + 1 | 1 |
| 7 | β + 2 | β | 6 |
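The table above can be reproduced mechanically. The following sketch builds the Zech table for F_9 = F_3[β]/(β^2 + 1) with γ = β + 1 and uses it to add powers of γ:

```python
# GF(9) = GF(3)[beta]/(beta^2 + 1); elements are pairs (a, b) meaning a + b*beta.
p = 3

def mul(u, v):
    (a, b), (c, d) = u, v
    # (a + b*beta)(c + d*beta) = ac - bd + (ad + bc)*beta, using beta^2 = -1
    return ((a*c - b*d) % p, (a*d + b*c) % p)

gamma = (1, 1)                       # gamma = beta + 1
powers = [(1, 0)]                    # successive powers gamma^0, gamma^1, ...
while True:
    nxt = mul(powers[-1], gamma)
    if nxt == (1, 0):
        break
    powers.append(nxt)
assert len(powers) == 8              # order q - 1 = 8, so gamma is primitive

log = {g: k for k, g in enumerate(powers)}
zech = {k: log[((1 + g[0]) % p, g[1])]         # z_k with 1 + gamma^k = gamma^(z_k)
        for k, g in enumerate(powers)
        if ((1 + g[0]) % p, g[1]) != (0, 0)}

def add_pow(i, j):
    """gamma^i + gamma^j as an exponent of gamma, or None if the sum is 0."""
    i, j = sorted((i, j))
    d = (j - i) % 8
    if d not in zech:                # happens exactly when gamma^j = -gamma^i
        return None
    return (i + zech[d]) % 8
```

The computed dictionary reproduces the z_k column above (for example z_0 = 4 and z_1 = 7), and the entry k = 4 is absent because 1 + γ^4 = 1 + 2 = 0.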
| 2.80 | Let F be a field (not necessarily finite) of characteristic p > 0 and let a, b ∈ F. Prove that (a + b)^p = a^p + b^p, or, more generally, (a + b)^{p^n} = a^{p^n} + b^{p^n} for all n ≥ 1. [H]
|
| 2.81 | Let p be a prime, n ≥ 1, and q := p^n. Prove that:
|
| 2.82 | Let p be a prime, n ≥ 1, and q := p^n. Let F ⊆ K be an extension of finite fields with #F = q and #K = q^m. Show that K is the splitting field of X^{q^m} – X over F. [H]
|
| 2.83 | Write the addition and multiplication tables of (some representations of) the fields and . Use these tables to find a primitive element in each of these fields and a normal element in (over ).
|
| 2.84 | Let K be a field (not necessarily finite or of positive characteristic).
|
| 2.85 | In this exercise, one studies the arithmetic in the finite field .
|
| 2.86 | Let F ⊆ K ⊆ L be finite extensions of finite fields with [L : K] = s. Let α, β ∈ L and a ∈ K. Prove the following assertions:
|
| 2.87 | Let K ⊆ L be a finite extension of finite fields. In this exercise, we treat L as a vector space over K. Show that:
|
| 2.88 | Let K and L be as in Exercise 2.87 with #K = q, and let β ∈ L. Show that TrL|K(β) = 0 if and only if β = γ^q – γ for some γ ∈ L.
|
| 2.89 | Let K and L be as in Exercise 2.87. Two K-bases (β0, . . . , βm–1) and (γ0, . . . , γm–1) of L are called dual or complementary, if TrL|K(βiγj) = δij.[10] Show that every K-basis of L has a unique dual basis.
|
| 2.90 | Prove that every finite extension of finite fields is Galois. [H] |
| 2.91 | For the extension F_q ⊆ F_{q^m}, consider the Frobenius map Φ : F_{q^m} → F_{q^m}, α ↦ α^q.
|
| 2.92 | Let f ∈ F_q[X] be irreducible with deg f = d. Consider the extension F_q ⊆ F_{q^m} and let r := gcd(d, m).
|
| 2.93 | Consider the representation of in Example 2.19. Construct the minimal polynomials over of the elements of . [H]
|
| 2.94 | Show that the number of (ordered) F_q-bases of F_{q^m} is
(q^m – 1)(q^m – q)(q^m – q^2) · · · (q^m – q^{m–1}). |
In this section, we introduce some elementary concepts from algebraic geometry, which facilitate the treatment of elliptic and hyperelliptic curves in the next two sections. We concentrate only on plane curves, because these are the only curves we need in this book. Throughout this section, K denotes a field (finite or infinite) and
the algebraic closure of K.
The solution set of a polynomial equation f(X, Y) = 0 is one of the central objects of study in algebraic geometry. For example, we know that in ℝ^2 the equation X^2 + Y^2 – 1 = 0 represents the circle centred at the origin (0, 0) and of radius 1. When we pass to an arbitrary field, it is often not possible to visualize such plots, but it still makes sense to talk about the set of solutions of such an equation. For example, the solutions of the above circle equation in F_3 are the four discrete points (0, 1), (0, 2), (1, 0) and (2, 0). (This solution set does not really look round.)
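Since F_3 is tiny, the four solutions can be found by brute-force enumeration:

```python
# Solutions of X^2 + Y^2 - 1 = 0 over GF(3)
p = 3
points = [(x, y) for x in range(p) for y in range(p)
          if (x*x + y*y - 1) % p == 0]
print(points)   # [(0, 1), (0, 2), (1, 0), (2, 0)]
```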
One can generalize this study by considering polynomials in n indeterminates and by investigating the simultaneous solutions of m polynomials. We, however, do not intend to be so general here and concentrate only on curves defined by a single polynomial equation in two indeterminates.
|
For |
is an n-dimensional vector space over K. For example, the affine plane can be identified with the conventional X-Y plane.
|
An affine plane (algebraic) curve C over K is defined by a polynomial |
K-rational points on a plane curve are precisely the solutions of the defining polynomial equation. Standard examples of affine plane curves include the straight lines given by aX + bY + c = 0, a,
, not both 0, and the conic sections (circles, ellipses, parabolas and hyperbolas) given by aX2 + bXY + cY2 + dX + eY + f = 0, a, b, c, d, e,
with at least one of a, b, c non-zero. For
, the set of K-rational points can be drawn as a graph of the polynomial equation, whereas for an arbitrary field K (in particular, for finite fields) such drawings make little or no sense. However, it is often helpful to visualize curves as curves over
(also called real curves) and then generalize the situation to an arbitrary field K.
The number ∞ is not treated as a real number (or integer or natural number). But it is often helpful to extend the real line ℝ by including two points infinitely far away from the origin, one in each direction. This gives us the so-called extended real line ℝ ∪ {–∞, +∞}. An immediate advantage of such a completion of ℝ is that every monotonic sequence converges in it. But for studying the roots of polynomial equations it is helpful to add only a single point at infinity to ℝ
in order to get what is called the projective line
over
. Similarly, if we start with the affine plane
and add a point at infinity for each slope
of straight lines Y = aX + b and one more for the lines X = c, we get the so-called projective plane
over
The line passing through all the points at infinity in
is called the line at infinity. An immediate benefit of passing from
to
is that in
any two distinct lines (parallel or not in
) meet at exactly one point and through any two distinct points of
passes a unique line.
Now it is time to replace
by an arbitrary field K and rephrase our definitions in such a way that it still makes sense to talk about the points and the line at infinity, even when K itself contains only finitely many points.
|
Let |
It is evident that
can be identified with the set of all 1-dimensional vector subspaces (that is, lines through the origin) of the affine space
. To argue that this formal definition tallies with the intuitive notion for n = 2 and
, consider the affine 3-space
referred to by the coordinates X, Y, Z. Look at the family of planes ελ : Z = λ, λ ∈ ℝ, parallel to the X-Y plane. (ε0 is the X-Y plane itself.) First take a non-zero value of λ, say λ = 1. Every line in this 3-space passing through the origin and not parallel to the X-Y plane meets ε1 at exactly one point. Conversely, a unique line passes through each point on ε1 and the origin. In this way, we associate points of
with points on ε1. These are all the finite points of
. On the other hand, the lines passing through the origin and lying in the X-Y plane (ε0 : Z = 0) do not meet ε1 and correspond to the points at infinity of
.
In the last paragraph, we obtained the canonical embedding of the affine plane
in
by setting Z = 1. By definition,
is symmetric in X, Y and Z. This means that we can as well set X = 1 or Y = 1 and see that there are other embeddings of
in
. This observation often proves to be useful (for example, see Definition 2.66).
Now that we have passed from the affine plane to the projective plane, we should be able to carry (affine) plane curves to the projective plane. For this, we need some definitions.
|
Let R denote the polynomial ring K[X0, X1, . . . , Xn] over a field K. A monomial of R is an element of R of the form Let C : f(X, Y) = 0 be an affine plane curve over a field K defined by a non-zero polynomial |
Take
and
. By definition, [x, y, z] = [λx, λy, λz]. Since f(h)(λx, λy, λz) = λdf(h)(x, y, z) = 0 if and only if f(h)(x, y, z) = 0, it makes sense to talk about the zeros of the homogeneous polynomial f(h) in the projective plane
. This motivates us to define projective plane curves:
|
A projective plane curve C over K is defined by a homogeneous polynomial |
Let C : f(X, Y) = 0 be an affine plane curve. The projective plane curve defined by f(h)(X, Y, Z) is by an abuse of notation denoted also by C. The zeros of the affine curve C : f(X, Y) = 0 in
are in one-to-one correspondence with the finite zeros of C : f(h)(X, Y, Z) = 0 in
(that is, zeros with Z = 1). The projective curve contains some more point(s), namely those at infinity, that can be obtained by putting Z = 0 in f(h)(X, Y, Z). Passage from the affine plane to the projective plane is just that: a systematic inclusion of the points at infinity.
It is often customary to write an affine plane curve as C : f(X, Y) = g(X, Y) and a projective plane curve as C : f(h)(X, Y, Z) = g(h)(X, Y, Z) with f(h) and g(h) of the same degree. The former is the same as the curve C : f – g = 0, and the latter the same as C : f(h) – g(h) = 0.
A homogeneous polynomial
can be viewed as the homogenization of any of the polynomials
fZ(X, Y) = f(X, Y, 1), fY (X, Z) = f(X, 1, Z) and fX(Y, Z) = f(1, Y, Z).
Consider a point P = [a, b, c] on the projective curve C : f(X, Y, Z) = 0. Since a, b and c are not all 0, P corresponds to a finite point on at least one of the affine curves fX = 0, fY = 0 and fZ = 0.
Throughout the rest of Section 2.10 we make the following assumption:
|
K is an algebraically closed field, that is, |
Although many of the results we state now are valid for fields that are not algebraically closed, it is convenient to make this assumption in order to avoid unnecessary complications.
Let C : f(X, Y) = 0 be a curve defined over K. Henceforth we assume that the polynomial f(X, Y) is irreducible over K. Though we write the affine equation for the curve for notational simplicity, we usually work with the set C(K) of the K-rational points on the corresponding projective curve. We refer to the solutions of C in the affine plane
as the finite points on the curve.
|
Let P = [a, b, c] be a point on a curve C defined over K. We call P a smooth or regular or non-singular point of C, if P satisfies the following conditions.
A non-smooth point on C is also called non-regular or singular. C is called smooth or regular or non-singular, if all points (finite and infinite) on C are smooth. |
Now we define polynomial functions on C. For a moment, we concentrate on the affine curve, that is, only on the finite points of C. Let g, h ∈ K[X, Y] with g ≡ h (mod f) (that is, f | (g – h)). Since f(P) = 0 for any point P on C, it follows that g(P) = h(P). This motivates the following definition.
|
The ring K[X, Y]/〈f〉 is called the affine coordinate ring of C and is denoted by K[C]. Elements of K[C] are called polynomial functions on C. If we denote by x and y the residue classes of X and Y respectively in K[C], then a polynomial function on C is given by a polynomial
The quotient field (Exercise 2.34) of K[C] is called the function field of C and is denoted by K(C). An element of K(C) is of the form g(x, y)/h(x, y) with g(x, y), |
By definition, two rational functions
are equal if and only if g1(x, y)h2(x, y) – g2(x, y)h1(x, y) = 0 in K[C] or, equivalently, if and only if
. We define addition and multiplication of rational functions by the usual rules (Exercise 2.34).
|
Let P = (a, b) be a finite point on the curve C. Given a polynomial function |
By definition, K[C] and K(C) are collections of equivalence classes. However, the value of a polynomial or a rational function on C is independent of the representatives of the equivalence classes and is, therefore, a well-defined concept.
The above definitions can be extended to the corresponding projective curve C : f(h)(X, Y, Z) = 0. By Exercise 2.96(e), the polynomial f(h) is irreducible, since we assumed f to be so.
|
The function field (denoted again by K(C)) of the projective curve C is the set of quotients (called rational functions) of the form g(X, Y, Z)/h(X, Y, Z), where g, A rational function |
One can define polynomial functions on a projective curve (as we did for affine curves), but it makes no sense to talk about the value of such a polynomial function at a point P on the curve, because this value depends on the choice of the homogeneous coordinates of P (Exercise 2.95). This problem is eliminated for a rational function g/h by assuming g and h to be of the same degree.
|
Let C be a projective plane curve, r be a non-zero rational function and P a point on C. P is called a zero of r if r(P) = 0, and a pole of r if r(P) = ∞. |
Now we define the multiplicities of zeros and poles of a rational function or, more generally, the order of any point on a projective plane curve. This is based on the following result, the proof of which is long and difficult, and is omitted.
|
Let C be a projective plane curve defined by an irreducible polynomial over K and P a smooth point on C. Then there exists a rational function
|
|
The function uP of the last theorem is called a uniformizing variable or a uniformizing parameter or simply a uniformizer of C at P. For any non-zero rational function |
The connection of poles and zeros with orders is established by the following theorem, which we again state without proof.
|
P is neither a pole nor a zero of r if and only if ordP(r) = 0. P is a zero of r if and only if ordP(r) > 0. P is a pole of r if and only if ordP(r) < 0. |
If P is a zero (resp. a pole) of r, the integer ordP(r) (resp. – ordP(r)) is called the multiplicity of the zero (resp. pole) P.
|
Let r be a rational function on the projective plane curve C defined over K. Then r has finitely many poles and zeros. Furthermore, |
This is one of the theorems that demand K to be algebraically closed. More explicitly, if K is not algebraically closed, any rational function
continues to have only finitely many zeros and poles, but the sum of the orders of r at these points is not necessarily equal to 0. Also note that this sum, if taken over only the finite points of C, need not be 0, even when K is algebraically closed.
Now that we know how to define and evaluate rational functions on a curve, we are in a position to define rational maps between two curves. Let C1 : f1(X, Y, Z) = 0 and C2 : f2(X, Y, Z) = 0 be two projective plane curves defined over K by irreducible homogeneous polynomials f1,
.
|
A rational map This, however, is not the complete story. A more precise characterization of a rational map is as follows: A rational map The curves C1 and C2 are said to be isomorphic (denoted C1 ≅ C2), if there exist morphisms |
Isomorphism is an equivalence relation on the set of all projective plane curves defined over K. Since two isomorphic curves share many common algebraic and geometric properties, it is of interest in algebraic geometry to study the equivalence classes (rather than the individual curves). If C1 ≅ C2 and C2 has a simpler representation than C1, then studying the properties of C2 makes our job simpler and at the same time reveals all the common properties of C1. (See Section 2.11 for an example.)
Let a be a symbol and n a positive integer. We represent by na the formal sum a+···+a (n times). We also define 0a := 0 and –na := n(–a), where the symbol –a satisfies a + (–a) = (–a) + a = 0. For n1,
, we define n1a + n2a := (n1 + n2)a. The set
under these definitions becomes an Abelian group. If we are given two symbols a, b we can analogously define formal sums na + mb, n,
, and the sum of formal sums as (n1a + m1b) + (n2a + m2b) := (n1 + n2)a + (m1 + m2)b. With these definitions the set
becomes an Abelian group. These constructions can be generalized as follows:
|
Given a set (not necessarily finite) of symbols ai, |
Now let ai be the K-rational points on a projective plane curve C defined over K. For notational convenience, we represent by [P] the symbol corresponding to the point P on C. This avoids confusion in connection with elliptic curves C (see Section 2.11), for which we intend to make a distinction between P + Q and [P] + [Q] for two points P, Q on C. The former sum is again a point on C, whereas the latter is never (the symbol corresponding to) a point on C.
|
A formal sum Let The degree of D is defined as the integer |
Now we define divisors of rational functions on C. Henceforth we assume that C is smooth (that is, smooth at all K-rational points on C).
|
The divisor of a rational function A divisor |
Though the Jacobian
is defined for an arbitrary smooth curve C (defined by an irreducible polynomial), it is a special class of curves called hyperelliptic curves for which it is particularly easy to represent and do arithmetic in the group
. This gives us yet another family of groups on which cryptographic protocols can be built.
If K is not algebraically closed, we need not have
for a rational function
. This means that in that case the group
cannot be defined in the above manner. However, since C is also a curve defined over
, we can define
as above and call a particular subgroup of
as the Jacobian
of C over K. We defer this discussion until Section 2.12.
In this exercise set, we do not assume (unless otherwise stated) that K is necessarily algebraically closed.
| 2.95 |
|
| 2.96 | In this exercise, we generalize the notion of homogenization and dehomogenization of polynomials. Let K[X1, . . . , Xn] denote the polynomial ring in n indeterminates. Introducing another indeterminate X0, we define the homogenization of a polynomial as
Prove the following assertions.
|
| 2.97 | Let C : f(X, Y) = 0 be an affine plane curve defined by a non-zero polynomial and C : f(h)(X, Y, Z) = 0 the corresponding projective plane curve. Let d := deg f = deg f(h) and fd the sum of non-zero terms of f of degree d. Show that:
|
| 2.98 | Show that the defining polynomial of the elliptic curve in Exercise 2.97(e) is irreducible. Prove the same for the hyperelliptic curve of Exercise 2.97(f). [H] |
| 2.99 | Show that for an ideal the following two conditions are equivalent:
An ideal satisfying the above equivalent conditions is called a homogeneous ideal. Construct an example to demonstrate that all ideals of K[X1, . . . , Xn] need not be homogeneous. |
The mathematics of elliptic curves is vast and complicated. A reasonably complete understanding of elliptic curves would require a book of size comparable to this one. So we plan to be rather informal while talking about elliptic curves and their generalizations called hyperelliptic curves. Interested readers can go through the books suggested at the end of this chapter to learn more about these curves. In this section, K stands for a field (finite or infinite) and
the algebraic closure of K.
An elliptic curve E over K is a plane curve defined by the polynomial equation
Equation 2.6

or by the corresponding homogeneous equation
E : Y2Z + a1XYZ + a3YZ2 = X3 + a2X2Z + a4XZ2 + a6Z3.
These equations are called the Weierstrass equations for E. In order that E qualifies as an elliptic curve, we additionally require that it is smooth at all
-rational points (Definition 2.66).[12] Two elliptic curves defined over the field
are shown in Figure 2.1.
[12] Ellipses are not elliptic curves.

(a) Y2 = X3 – X + 1
(b) Y2 = X3 – X

E contains a single point at infinity, namely O = [0, 1, 0]
(Exercise 2.97(e)). The set of K-rational points on E in the projective plane
is denoted by E(K) and is the central object of study in the theory of elliptic curves. We shortly endow E(K) with a group structure and this group is used extensively in cryptography.
Let us first see how we can simplify the equation for E. The simplification depends on the characteristic of K. Because fields of characteristic 3 are only rarely used in cryptography, we will not deal with such fields. Simplification of the Weierstrass equation is effected by suitable changes of coordinates. A special kind of transformation is allowed in order to preserve the geometric and algebraic properties of an elliptic curve.
|
Two elliptic curves
defined over K are isomorphic (Definition 2.72) if and only if there exist Equation 2.7
|
The theorem is not proved here. Formulas (2.7) can be checked by tedious calculations. A change of variables as in Theorem 2.44 is referred to as an admissible change of variables. We denote this by
(X, Y) ← (u2X + r, u3Y + u2sX + t).
The inverse transformation is also admissible and is given by

Isomorphism is an equivalence relation on the set of all elliptic curves over K.
Consider the elliptic curve E over K given by Equation (2.6). If char K ≠ 2, the admissible change
transforms E to the form
E1 : Y2 = X3 + b2X2 + b4X + b6.
If, in addition, char K ≠ 3, the admissible change
transforms E1 to E2 : Y2 = X3 + aX + b. We henceforth assume that an elliptic curve over a field of characteristic ≠ 2, 3 is defined by
Equation 2.8

(instead of by the original Weierstrass Equation (2.6)).
If char K = 2, the Weierstrass equation cannot be simplified as in Equation (2.8). In this case, we consider two cases separately, namely a1 ≠ 0 or otherwise. In the former case, the admissible change
allows us to write Equation (2.6) in the simplified form
Equation 2.9

On the other hand, if a1 = 0, then the admissible change (X, Y) ← (X + a2, Y) shows that E can be written in the form
Equation 2.10

A curve defined by Equation (2.9) is called non-supersingular, whereas one defined by Equation (2.10) is called supersingular.
Now we associate two quantities with an elliptic curve. The importance of these quantities follows from the subsequent theorem. We start with the generic Weierstrass equation and later specialize to the simplified formulas.
|
For the curve given by Equation (2.6), we define the following quantities: Equation 2.11
d2 := a1^2 + 4a2,
d4 := 2a4 + a1a3,
d6 := a3^2 + 4a6,
d8 := a1^2a6 + 4a2a6 – a1a3a4 + a2a3^2 – a4^2,
c4 := d2^2 – 24d4,
Δ(E) := –d2^2d8 – 8d4^3 – 27d6^2 + 9d2d4d6,
j(E) := c4^3/Δ(E).
Δ(E) is called the discriminant of the curve E, and j(E) the j-invariant of E. |
For the special cases given by the simplified equations above, these quantities have more compact formulas as given in Table 2.5.
|
For the curve E defined by Equation (2.6), the following properties hold:
Proof
|
Consider an elliptic curve E over a field K. We now define an operation (which is conventionally denoted by +) on the set E(K) of K-rational points on E in the projective plane
. This operation provides a group structure on E(K). It is important to point out that this group is not the same as the group DivK(E) of divisors on E(K) (Definition 2.74), since the sum of points we are going to define is not formal. However, there is a connection between these two groups (See Exercise 2.125).
|
Let E be the elliptic curve defined by Equation (2.6) and
|
No simple proof of this theorem is known. Indeed, the only group axiom that is difficult to verify is associativity, that is, that (P + Q) + R = P + (Q + R) for all P, Q, R ∈ E(K). An elementary strategy would be to write explicit formulas for (P + Q) + R and P + (Q + R) (using the formulas for P + Q given below) and show that they are equal, but this process involves a large amount of tedious calculation and the consideration of many cases.
There are other proofs that are more elegant, but not as elementary. One possibility is to use the theory of divisors and is outlined now. It turns out that the Jacobian
has a bijective correspondence with the set E(K) via the map
which takes
to
(more correctly to the equivalence class of the divisor
in
). Furthermore,
, where the addition on the left is the addition on E(K) as defined above and the addition on the right is that in the Jacobian
. By definition,
is naturally an additive Abelian group. It immediately follows that E(K) is an additive Abelian group too. (See Exercise 2.125.)
We now give the formulas for the coordinates of the points –P and P + Q on E(K). The derivation of these formulas for the general case is left to the reader (Exercise 2.102). We concentrate on the important special cases. We assume that P = (h1, k1) and Q = (h2, k2) are finite points on E(K) with Q ≠ –P so that
.
If char K ≠ 2, 3 and E is defined by Equation 2.8, we have:

Next, we consider char K = 2 and non-supersingular curves (Equation 2.9). The formulas in this case are:

Finally, for supersingular curves (Equation 2.10) with char K = 2, we have:

We denote by mP the sum P + · · · + P (m times) for a point P ∈ E(K) and an integer m ≥ 1. We also define 0P := O and (–m)P := –(mP) (for m ≥ 1).
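The multiple mP is computed efficiently by the standard double-and-add method, using O(log |m|) group operations. The sketch below again uses the illustrative curve Y^2 = X^3 + 2X + 3 over F_97 (an assumption of this example):

```python
# Double-and-add computation of mP, with 0P = O and (-m)P = -(mP).
p, a = 97, 2
O = None

def add(P, Q):
    if P is O:
        return Q
    if Q is O:
        return P
    (h1, k1), (h2, k2) = P, Q
    if h1 == h2 and (k1 + k2) % p == 0:
        return O
    lam = ((3*h1*h1 + a) * pow(2*k1, -1, p) if P == Q
           else (k2 - k1) * pow(h2 - h1, -1, p)) % p
    h3 = (lam*lam - h1 - h2) % p
    return (h3, (lam*(h1 - h3) - k1) % p)

def smul(m, P):
    """Compute mP by the binary (double-and-add) ladder."""
    if m < 0:                    # (-m)P = -(mP) = m(-P)
        m = -m
        if P is not O:
            P = (P[0], -P[1] % p)
    R = O
    while m:
        if m & 1:
            R = add(R, P)
        P = add(P, P)
        m >>= 1
    return R
```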
|
| P | 2P | 3P | 4P | 5P | 6P | 7P | 8P | 9P | ord P |
|---|---|---|---|---|---|---|---|---|---|
| P0 = O | O |  |  |  |  |  |  |  | 1 |
| P1 = (0, ξ2 + ξ) | P5 | P4 | P7 | P8 | P3 | P6 | P2 | O | 9 |
| P2 = (0, ξ2 + ξ + 1) | P6 | P3 | P8 | P7 | P4 | P5 | P1 | O | 9 |
| P3 = (ξ + 1, ξ) | P4 | O |  |  |  |  |  |  | 3 |
| P4 = (ξ + 1, ξ + 1) | P3 | O |  |  |  |  |  |  | 3 |
| P5 = (ξ2, ξ2) | P7 | P3 | P2 | P1 | P4 | P8 | P6 | O | 9 |
| P6 = (ξ2, ξ2 + 1) | P8 | P4 | P1 | P2 | P3 | P7 | P5 | O | 9 |
| P7 = (ξ2 + ξ, ξ2 + ξ) | P2 | P4 | P6 | P5 | P3 | P1 | P8 | O | 9 |
| P8 = (ξ2 + ξ, ξ2 + ξ + 1) | P1 | P3 | P5 | P6 | P4 | P2 | P7 | O | 9 |
|
Let |
Multiples mP of a point P ∈ E(K) can be expressed using nice formulas.
|
For an elliptic curve defined over K by the equation E : f(X, Y) = 0 and for every integer m ≥ 1, there exist polynomials θm, ωm, ψm such that, for any finite point P = (h, k) on E with mP ≠ O, mP = (θm(h, k)/ψm(h, k)^2, ωm(h, k)/ψm(h, k)^3). The polynomial ψm is called the m-th division polynomial of E. |
Using the addition formula one can verify the following recursive description for ψm and the expressions for θm and ωm in terms of ψm.
|
For an elliptic curve E defined by the general Weierstrass Equation (2.6) over a field K, the division polynomials ψm,
where di are as in Definition 2.76. The polynomials θm satisfy
and for char K ≠ 2, one has
|
It follows by induction on m that these formulas really give polynomial expressions for ψm, θm and ωm for all m ≥ 1. For even m, the polynomial ψm is divisible by ψ2. Furthermore, the polynomials ψm/ψ2 (for even m) and ψm (for odd m) can be expressed as polynomials in x only. These univariate polynomials are easier to handle than the bivariate ones ψm and, by an abuse of notation, are also called division polynomials. Their degrees satisfy the inequality: the degree is at most (m^2 – 1)/2 for odd m and at most (m^2 – 4)/2 for even m.
Points of E[m] can be characterized in terms of the division polynomials:
|
Let m ≥ 2 and let P = (h, k) be a finite point on E. Then mP = O if and only if ψm(h, k) = 0. |
We finally define polynomials fm(x) as follows. If char K ≠ 2, then fm(x) := ψm(x, y) for odd m and fm(x) := ψm(x, y)/ψ2(x, y) for even m. On the other hand, for char K = 2 and for non-supersingular curves over K we already have ψm ∈ K[x] (Exercise 2.107), and it is customary to define fm(x) := ψm(x, y) for all m ≥ 1. By a further abuse of notation, we also call fm the m-th division polynomial of E.
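The torsion characterization can be checked exhaustively on a toy curve. The sketch below uses the hypothetical curve Y^2 = X^3 + 2X + 3 over F_7 (an illustrative choice) and the standard third division polynomial ψ3(x) = 3x^4 + 6ax^2 + 12bx – a^2 for char K ≠ 2, 3, verifying that a finite point P satisfies 3P = O exactly when ψ3 vanishes at its x-coordinate:

```python
# Illustrative curve Y^2 = X^3 + 2X + 3 over GF(7) (not taken from the text).
p, a, b = 7, 2, 3
O = None

def add(P, Q):
    if P is O:
        return Q
    if Q is O:
        return P
    (h1, k1), (h2, k2) = P, Q
    if h1 == h2 and (k1 + k2) % p == 0:
        return O
    lam = ((3*h1*h1 + a) * pow(2*k1, -1, p) if P == Q
           else (k2 - k1) * pow(h2 - h1, -1, p)) % p
    h3 = (lam*lam - h1 - h2) % p
    return (h3, (lam*(h1 - h3) - k1) % p)

def f3(h):
    # third division polynomial for char K != 2, 3
    return (3*h**4 + 6*a*h*h + 12*b*h - a*a) % p

# P lies in E[3] exactly when f3 vanishes at its x-coordinate.
for h in range(p):
    for k in range(p):
        if (k*k - (h**3 + a*h + b)) % p:
            continue
        P = (h, k)
        assert (add(P, add(P, P)) is O) == (f3(h) == 0)
```

On this curve the point (3, 1) has order 3 (so f3(3) = 0), while (2, 1) does not.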
In this section, we take K = F_q, a finite field of cardinality q and characteristic p. We do not deal with the case p = 3. Let E be an elliptic curve defined over F_q. If p > 3, we assume that E is defined by Equation (2.8), whereas for p = 2 we assume that E is defined by Equation (2.10) or Equation (2.9) according as E is supersingular or not.
Since E(F_q) is a subset of the finite set P^2(F_q), the cardinality #E(F_q) is finite. The next theorem shows that #E(F_q) is quite close to q.
|
(Hasse) #E(F_q) = q + 1 – t for some integer t with |t| ≤ 2√q.
|
The implication of this theorem is that the possible cardinalities of E(F_q) lie in the rather narrow interval [q + 1 – 2√q, q + 1 + 2√q]. If q = p is a prime, then for every integer N in this interval there is at least one curve E with #E(F_p) = N. Moreover, the values of #E(F_p) are distributed almost uniformly in this interval as E ranges over the elliptic curves defined over F_p. However, if q is not a prime, these nice results do not continue to hold.
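Hasse's bound is easy to verify exhaustively for a small prime. The sketch below counts points on every non-singular curve Y^2 = X^3 + aX + b over F_11 and also confirms that every order in the Hasse interval is attained, as claimed above for prime q:

```python
# Count points on every non-singular curve Y^2 = X^3 + aX + b over GF(11).
p = 11
counts = []
for a in range(p):
    for b in range(p):
        if (4*a**3 + 27*b*b) % p == 0:         # singular curve: skip
            continue
        N = 1 + sum(1 for x in range(p) for y in range(p)
                    if (y*y - (x**3 + a*x + b)) % p == 0)
        assert (N - (p + 1)) ** 2 <= 4 * p     # Hasse: |t| <= 2*sqrt(q)
        counts.append(N)

# For prime q = 11, every order in [q + 1 - 2*sqrt(q), q + 1 + 2*sqrt(q)],
# that is, every integer from 6 to 18, actually occurs.
assert sorted(set(counts)) == list(range(6, 19))
```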
|
If t = 1 (that is, if #E(F_q) = q), the curve E is called anomalous. If p | t, the curve E is called supersingular; otherwise, E is called non-supersingular. |
Anomalous and supersingular curves are cryptographically weak, because for these curves algorithms are known that solve the so-called elliptic curve discrete logarithm problem in time better than that of the generic (fully exponential) methods. Determining the order #E(F_q) gives t, from which one can easily check whether E is anomalous or supersingular. If p = 2, we have an easier check for supersingularity.
|
An elliptic curve E over a finite field of characteristic 2 is supersingular if and only if j(E) = 0 or, equivalently, if and only if a1 = 0 in Equation (2.6). |
For arbitrary characteristic p, we have the following characterization.
|
An elliptic curve E over F_q is supersingular if and only if E(F̄_q) contains no point of order p. |
By Theorem 2.38, the multiplicative group F_q* is always cyclic. However, the additive group E(F_q) is not always cyclic, but is of a special kind. We need a few definitions to explain the structure of E(F_q). The notion of internal direct product for multiplicative groups (Exercise 2.19) can be readily applied to additive groups as follows.
|
Let G be an additive group and let H1, . . . , Hr be subgroups of G. If every element of G can be written uniquely as h1 + · · · + hr with |
|
Let G be a finite additive Abelian group of cardinality #G = n. Then there exist integers n1, . . . , nr with ni ≥ 2, ni | ni+1 for 1 ≤ i < r, and n1n2 · · · nr = n, such that
G ≅ Zn1 ⊕ Zn2 ⊕ · · · ⊕ Znr.
|
The elliptic curve group E(F_q) is either cyclic or the internal direct sum of two cyclic groups: E(F_q) ≅ Zn1 ⊕ Zn2 with n1 | n2 and n1 | q – 1. |
Once we know the order #E(F_q) of the group E(F_q), it is easy to compute the order of E(F_{q^n}) for any n ≥ 1, as the following theorem suggests.
|
Let α, β ∈ ℂ be the roots of the polynomial T^2 – tT + q, where t := q + 1 – #E(F_q). Then #E(F_{q^n}) = q^n + 1 – (α^n + β^n) for every n ≥ 1. |
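The theorem can be verified numerically. The sketch below takes the hypothetical curve Y^2 = X^3 + 2X + 3 over F_7, computes t by brute-force counting, predicts #E(F_49) from α^2 + β^2 = t^2 – 2q (since α + β = t and αβ = q), and confirms the prediction by a direct count over F_49 = F_7(i) with i^2 = –1:

```python
# Illustrative curve Y^2 = X^3 + 2X + 3 over GF(7); t from a brute-force count.
q, a, b = 7, 2, 3

n1 = 1 + sum(1 for x in range(q) for y in range(q)
             if (y*y - (x**3 + a*x + b)) % q == 0)     # #E(GF(7))
t = q + 1 - n1

# alpha + beta = t and alpha*beta = q, so alpha^2 + beta^2 = t^2 - 2q.
predicted = q**2 + 1 - (t*t - 2*q)                     # predicted #E(GF(49))

# GF(49) = GF(7)[i]/(i^2 + 1) (valid since -1 is a non-residue mod 7);
# elements are pairs (u, v) meaning u + v*i.
def fmul(s, w):
    (u1, v1), (u2, v2) = s, w
    return ((u1*u2 - v1*v2) % q, (u1*v2 + v1*u2) % q)

def fadd(s, w):
    return ((s[0] + w[0]) % q, (s[1] + w[1]) % q)

def rhs(x):                                            # x^3 + a*x + b in GF(49)
    return fadd(fadd(fmul(fmul(x, x), x), fmul((a, 0), x)), (b, 0))

F49 = [(u, v) for u in range(q) for v in range(q)]
actual = 1 + sum(1 for x in F49 for y in F49 if fmul(y, y) == rhs(x))
assert predicted == actual
```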
| 2.100 | Show that the following curves over K are not smooth (and hence not elliptic curves): |
| 2.101 |
|
| 2.102 | Let P = (h1, k1) and Q = (h2, k2) be two points (different from ) in E(K) defined by the Weierstrass Equation (2.6). Assume that Q ≠ –P. Determine R = (h3, k3) = P + Q as follows:
|
| 2.103 | Let . Show that there exists an elliptic curve E over K such that . [H]
|
| 2.104 | Assume that char K ≠ 2, 3 and consider the elliptic curve E given by Equation (2.8). Let K[E] be the affine coordinate ring and K(E) the field of rational functions on E.
|
| 2.105 | Show that the division polynomials for the general Weierstrass equation can be recursively defined as
where F = 4x3 + d2x2 + 2d4x + d6. |
| 2.106 | Write the recursive formulas for the division polynomials ψm(x, y) and for the elliptic curve E defined by Equation 2.8 over a field K of characteristic ≠ 2, 3. Show that for m ≥ 2 and for we have
|
| 2.107 | Write the recursive formulas for the division polynomials ψm(x, y) and for the elliptic curve E defined by Equation 2.9 over a field K of characteristic 2. Conclude that ψm are polynomials in only x for all . With fm := ψm for all show that for m ≥ 2 and for we have
|
| 2.108 | Consider the elliptic curve defined over the field :
Ea,b : Y² = X³ + aX + b. Verify the following assertions: (You may write a computer program.)
|
| 2.109 | Consider the representation of as , where ξ is a root of T³ + T + 1 in . Identify an element a₂ξ² + a₁ξ + a₀ (where ) with the integer (a₂a₁a₀)₂ = a₂·2² + a₁·2 + a₀. For integers a, , b ≠ 0, define the non-supersingular elliptic curve:
Ea,b : Y² + XY = X³ + aX² + b. Verify the following assertions: (You may write a computer program.)
|
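Assertions of this kind are easy to check by brute force. The sketch below (with the illustrative choice a = b = 1, which happens to be the curve of Exercise 2.111) realizes the field of 8 elements as 3-bit integers with multiplication modulo T³ + T + 1, exactly as in the exercise, and counts the points of E₁,₁ over it.

```python
# F_8 as 3-bit integers; multiplication is carry-less product mod T^3 + T + 1.
def gf8_mul(u, v):
    r = 0
    while v:
        if v & 1:
            r ^= u
        v >>= 1
        u <<= 1
        if u & 0b1000:           # reduce by T^3 + T + 1 (bit pattern 0b1011)
            u ^= 0b1011
    return r

def count_points_f8(a, b):
    """#E_{a,b}(F_8) for E: Y^2 + XY = X^3 + a*X^2 + b, including infinity."""
    n = 1                        # the point at infinity
    for x in range(8):
        for y in range(8):
            lhs = gf8_mul(y, y) ^ gf8_mul(x, y)           # Y^2 + XY
            x2 = gf8_mul(x, x)
            rhs = gf8_mul(x2, x) ^ gf8_mul(a, x2) ^ b     # X^3 + a X^2 + b
            if lhs == rhs:
                n += 1
    return n

n = count_points_f8(1, 1)        # 14: consistent with #E(F_2) = 2 and t = 1
```

The count 14 agrees with lifting #E(𝔽₂) = 2 to 𝔽₈ via the trace recurrence of the preceding section.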
| 2.110 | Consider the representation of and the identification of elements of with integers as in Exercise 2.109. For a, b, c, a ≠ 0, define the supersingular elliptic curve:
Ea,b,c : Y² + aY = X³ + bX + c. Verify the following assertions: (You may write a computer program.)
|
| 2.111 | Consider the elliptic curve E : Y² + XY = X³ + X² + 1 defined over for all . Show that
where r = ⌊n/2⌋. [H] Conclude that E is anomalous over |
| 2.112 | Let K be a finite field of characteristic ≠ 2, 3 and E : Y² = X³ + aX + b an elliptic curve defined over K. Prove that:
|
| 2.113 | Let E : Y² + XY = X³ + aX² + b be a non-supersingular elliptic curve defined over . Prove that:
|
| 2.114 | Let E : Y² + aY = X³ + bX + c be a supersingular elliptic curve over . Prove that:
|
| 2.115 |
|
| 2.116 | Let , p ≡ 3 (mod 4), and a, . Consider the elliptic curve E : Y² = X³ – a²X over (or over ). Prove that:
|
| 2.117 | A Weierstrass equation of an elliptic curve defined over a field K is said to be in the Legendre form, if it can be written as
Equation 2.12
Y² = X(X – 1)(X – λ)
for some λ ∈ K, λ ≠ 0, 1. |
Hyperelliptic curves are generalizations of elliptic curves. We cannot define a group structure on a general hyperelliptic curve in the same way as we did for elliptic curves. We instead work in the Jacobian of a hyperelliptic curve. For an elliptic curve E over an algebraically closed field K, the Jacobian
is canonically isomorphic to the group E(K). Thus one can just as well use the techniques for hyperelliptic curves to describe and work in elliptic curve groups. However, the exposition of the previous section turns out to be more intuitive and computationally oriented.
A hyperelliptic curve C of genus
over a field K is defined by a polynomial equation of the form
Equation 2.13

In order that C qualifies as a hyperelliptic curve, we additionally require that C (as a projective curve) be smooth over
. The set of K-rational points on C is denoted as usual by C(K). For g = 1, Equation (2.13) is the same as the Weierstrass Equation (2.6) on p. 98, that is, elliptic curves are hyperelliptic curves of genus one. A hyperelliptic curve of genus 2 over
is shown in Figure 2.2.
: Y² = X(X² – 1)(X² – 2)

A hyperelliptic curve has only one point at infinity
(Exercise 2.97(f)) and is smooth at
. If char K ≠ 2, substituting
simplifies Equation (2.13) as
. Since
is a monic polynomial in K[X] of degree 2g + 1, we may assume that if char K ≠ 2, the equation for C is of the form:
Equation 2.14

|
If char K ≠ 2, then the hyperelliptic curve C defined by Equation (2.14) is smooth if and only if v has no multiple roots (in Proof First, consider char K ≠ 2. If v has a multiple root, say For char K = 2 and |
|
Let P = (h, k) be a finite point on the hyperelliptic curve C defined by Equation (2.13). The point
|
All the general theory we described in Section 2.10 continues to be valid for hyperelliptic curves. However, since we are now given an explicit equation describing the curves, we can give more explicit expressions for polynomial and rational functions on hyperelliptic curves. For simplicity, we consider the affine equation and extend our definitions separately for the point at infinity.
Consider the hyperelliptic curve C defined by Equation (2.13). By Exercise 2.98, the defining polynomial f(X, Y) := Y2 + u(X)Y – v(X) (or its homogenization) is irreducible over
, so that the affine (or projective) coordinate ring of C is an integral domain and the corresponding function field is simply the field of fractions of the coordinate ring.
Let
. Since y² + u(x)y – v(x) = 0 in K[C], we can repeatedly substitute y² by –u(x)y + v(x) in G(x, y) until the y-degree of G(x, y) becomes less than 2. This proves part of the following:
|
Every polynomial function Proof In order to establish the uniqueness, note that if G(x, y) = a1(x) + yb1(x) = a2(x) + yb2(x), then |
|
Let |
Some useful properties of the norm function are listed in the following lemma, the proof of which is left to the reader as an easy exercise.
|
For G,
|
We also have an easy description of the rational functions on C.
|
Every rational function Proof We can write r(x, y) = G(x, y)/H(x, y) for G, |
The value of a rational function on C at a finite point on C can be defined as in the case of general curves (See Definition 2.68). In order to define the value of a rational function at the point
, we need some other concepts.
For a moment, let us assume that
. From the equation of C, we see that k² ≈ h^(2g+1) (neglecting lower-degree terms) for sufficiently large coordinates h, k of a point
. This means that k tends to infinity (2g + 1)/2 times as fast as h does (on a logarithmic scale). So it is customary to give Y a weight (2g + 1)/2 times the weight we give to X. The smallest integral weights of X and Y satisfying this ratio are 2 and 2g + 1 respectively. This motivates Definition 2.84 (generalized for any K).
|
Let If 0 ≠ G = a(x) + yb(x), d₁ = deg_x a and d₂ = deg_x b, then the leading coefficient of G is taken to be the coefficient of x^d₁ in a(x) if deg G = 2d₁, or to be the coefficient of x^d₂ in b(x) if deg G = 2g + 1 + 2d₂. (We cannot have 2d₁ = 2g + 1 + 2d₂, since the left side is even and the right side is odd.) |
Some basic properties of the degree function follow.
|
For G,
Proof Easy exercise. |
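The degree function is easy to implement once polynomial functions are stored in the canonical form a(x) + y·b(x). The sketch below (an illustrative genus-2 example of ours, C : y² = x⁵ + 1, so u = 0 and v = x⁵ + 1) multiplies two such functions, reducing y² to v(x), and checks the property deg(GH) = deg G + deg H.

```python
# Polynomial functions on C: y^2 = v(x), u = 0, genus g = 2 (so deg_x v = 2g+1 = 5).
# A function G = a(x) + y*b(x) is a pair of coefficient lists (low degree first).
g = 2
v = [1, 0, 0, 0, 0, 1]             # v(x) = x^5 + 1

def padd(f, h):
    n = max(len(f), len(h))
    return [(f[i] if i < len(f) else 0) + (h[i] if i < len(h) else 0)
            for i in range(n)]

def pmul(f, h):
    r = [0] * (len(f) + len(h) - 1)
    for i, fi in enumerate(f):
        for j, hj in enumerate(h):
            r[i + j] += fi * hj
    return r

def pdeg(f):
    d = -1
    for i, c in enumerate(f):
        if c != 0:
            d = i
    return d

def deg(G):
    a, b = G
    return max(2 * pdeg(a), 2 * g + 1 + 2 * pdeg(b))   # weights: x -> 2, y -> 2g+1

def mul(G, H):
    """(a1 + y b1)(a2 + y b2) with y^2 replaced by v(x) (here u = 0)."""
    (a1, b1), (a2, b2) = G, H
    a = padd(pmul(a1, a2), pmul(v, pmul(b1, b2)))
    b = padd(pmul(a1, b2), pmul(a2, b1))
    return (a, b)

G = ([0, 1], [1])                  # G = x + y,       deg = max(2, 5) = 5
H = ([0, 0, 1], [0, 1])            # H = x^2 + x*y,   deg = max(4, 7) = 7
assert deg(mul(G, H)) == deg(G) + deg(H)   # = 12
```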
Now we are in a position to give an explicit definition of the value of a rational function at
.
|
For If deg(G) < deg(H), then If deg(G) > deg(H), then If deg(G) = deg(H), then |
Now that we have a complete description of the value of a rational function at any point on C, poles and zeros of rational functions on C can be defined as in Definition 2.70. In order to define the order of a polynomial or rational function at a point P on C, we should find a uniformizing parameter uP at P. Tedious calculations help one deduce the following explicit expressions for uP.
|
Let
as a uniformizing parameter at P. Finally, |
We give an alternative definition of the order (independent of uP), which is computationally useful and which is equivalent to Definition 2.71 for a hyperelliptic curve.
|
Let
Finally, we define Now, let r(x, y) = G(x, y)/H(x, y) be a rational function on C and |
|
Let Now consider r = (x – h)^m for some m < 0. Write r = G/H with G = 1 and H = (x – h)^(–m). Since ordQ(r) = ordQ(G) – ordQ(H), we continue to have
If m ≥ 0, then r is a polynomial function and has zeros P and |
|
A non-constant polynomial function |
We continue to work with the hyperelliptic curve C of Equation (2.13). We first impose the restriction that K is algebraically closed and use the theory of Section 2.10 to define the set Div(C) of divisors on C, the degree zero part Div0(C) of Div(C), the divisor Div(r) of a rational function
, the set Prin(C) of principal divisors on C, the Picard group Pic(C) = Div(C)/ Prin(C) and the Jacobian
.
|
For the rational function r := (x – h)^m of Example 2.23, we have:
|
The Jacobian
is the set of all cosets of Prin(C) in Div0(C). It is not a good idea to work with cosets (which are equivalence classes). Recall that in the case of
, we represented a coset
by the remainder of Euclidean division of a by n. In the case of the representation
, we took polynomials of smallest degrees as canonical representatives of the cosets of 〈f(X)〉. In the case of
too, we intend to find such good representatives, one from each coset. We now introduce the concept of reduced divisors for that purpose.
|
Two divisors D1, |
Our goal is to associate to every divisor
some unique reduced divisor
with D ~ Dred, that is, Dred plays the role of the canonical representative of
. We start with the following definition.
|
A divisor |
|
Every divisor Proof Let
and
with m1 and m2 so chosen that D1,
Now, we explain how we can represent a semi-reduced divisor by a pair of polynomials a(x), |
|
Let
|
|
Let
Conversely, if a(x), |
We denote the divisor gcd
by Div(a, b). The zero divisor has the representation Div(1, 0).
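The conditions characterizing such a pair, namely that a(x) is monic, deg_x b < deg_x a, and a(x) divides b(x)² + b(x)u(x) – v(x), are straightforward to test mechanically. A sketch (illustrative curve y² = x⁵ + 1 over ℚ, with u = 0; the helper names are ours):

```python
from fractions import Fraction as F

def pmul(f, h):
    r = [F(0)] * (len(f) + len(h) - 1)
    for i, fi in enumerate(f):
        for j, hj in enumerate(h):
            r[i + j] += F(fi) * F(hj)
    return r

def padd(f, h, sign=1):
    n = max(len(f), len(h))
    return [F(f[i] if i < len(f) else 0) + sign * F(h[i] if i < len(h) else 0)
            for i in range(n)]

def prem(f, h):
    """Remainder of f modulo h (leading coefficient of h non-zero)."""
    f = [F(c) for c in f]
    while True:
        while f and f[-1] == 0:
            f.pop()
        if len(f) < len(h):
            return f
        q = f[-1] / F(h[-1])
        for i in range(len(h)):
            f[len(f) - len(h) + i] -= q * F(h[i])

def is_mumford(a, b, u, v):
    """Is (a, b) a valid representation Div(a, b) of a semi-reduced divisor on
    C: y^2 + u(x) y = v(x)?  (a monic, deg b < deg a, a | b^2 + b*u - v)"""
    if F(a[-1]) != 1:
        return False
    bb = [F(c) for c in b]
    while bb and bb[-1] == 0:
        bb.pop()
    if len(bb) >= len(a):
        return False
    expr = padd(padd(pmul(b, b), pmul(b, u)), v, sign=-1)   # b^2 + b*u - v
    return not any(prem(expr, a))

# Illustrative genus-2 curve y^2 = x^5 + 1 (u = 0); P = (-1, 0) lies on it.
u = [0]
v = [1, 0, 0, 0, 0, 1]
assert is_mumford([1, 1], [0], u, v)       # Div(x + 1, 0), the divisor of P
assert not is_mumford([1, 1], [1], u, v)   # b = 1 fails: b(-1) != 0 = k
```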
A representation of the elements of
by semi-reduced divisors (that is, by pairs of polynomials in K[x]) suffers from two disadvantages. First, the representation is not unique, and second, the degrees of the representing polynomials may be quite large. These difficulties are removed if we consider semi-reduced divisors of a special kind.
|
A semi-reduced divisor |
The following theorem establishes the desirable properties of a reduced divisor.
|
For Proof We only prove the existence of reduced divisors. For the proof of the uniqueness, one may, for example, see Koblitz [154]. The norm of a divisor Let |
From the viewpoint of cryptography, the field K should be a finite field, which is never algebraically closed. So we must remove the restriction
. Since C is naturally defined over
as well, we start with the Jacobian
and define a particular subgroup of
to be the Jacobian
of C over K.
|
Let |
Every element of
can be represented uniquely as a reduced divisor Div(a, b) for polynomials a(x),
with deg_x a ≤ g and deg_x b < deg_x a.
is, therefore, a finite Abelian group. For suitably chosen hyperelliptic curves, these groups can be used to build cryptographic protocols.
In this exercise set, we let C denote a hyperelliptic curve of genus g defined by Equation (2.13) over a field K (not necessarily algebraically closed).
| 2.118 |
|
| 2.119 | Represent as , where ξ is a root of the irreducible polynomial .
|
| 2.120 | Let . Prove the following assertions:
|
| 2.121 | Prove Lemmas 2.9 and 2.10. |
| 2.122 | Let and .
|
| 2.123 | Prove Theorem 2.52. [H] |
| 2.124 | A line on C is a polynomial function of the form with a, b, , a and b not both 0.
|
| 2.125 | Let E be an elliptic curve (that is, a hyperelliptic curve of genus 1) defined over K.
|
In this section, we develop the theory of number fields and rings. Our aim is to make accessible to the reader the workings of the cryptanalytic algorithms based on the number field sieve.
Commutative algebra is the study of commutative rings with identity (rings by our definition). Modern number theory and geometry are based on results from this area of mathematics. Here we give a brief sketch of some commutative algebra tools that we need for developing the theory of number fields.
We start with some basic operations on ideals (cf. Example 2.7, Definition 2.23).
|
Let A be a ring and let The set-theoretic intersection The sum of the family
Two ideals If I = {1, 2, . . . , n} is finite, the product
If |
One can readily check that the operations intersection, sum and product on ideals in a ring are associative and commutative.
Commutative algebra extensively uses the theory of prime and maximal ideals (Definition 2.19, Proposition 2.9, Corollary 2.2 and Exercise 2.23). The set of all prime ideals in A is called the (prime) spectrum of A and is denoted by Spec A. The set of all maximal ideals of A is called the maximal spectrum of A and is denoted by Spm A. We have Spm A ⊆ Spec A. These two sets play an extremely useful role in the study of the ring A. If A is non-zero, both sets are non-empty.
The concept of forming fractions of integers to obtain the rationals can be applied in a more general setting. Instead of allowing any non-zero element in the denominator of a fraction, we may allow only elements from a specific subset. All we require to make the collection of fractions a ring is that the allowed denominators be closed under multiplication.
|
Let A be a ring. A non-empty subset S of A is called multiplicatively closed or simply multiplicative, if |
|
Let A be a ring and S a multiplicative subset of A. We define a relation ~ on A × S as: (a, s) ~ (b, t) if and only if u(at – bs) = 0 for some
. (If A is an integral domain, one may take u = 1 in the definition of ~.) It is easy to check that ~ is an equivalence relation on A × S. The set of equivalence classes of A × S under ~ is denoted by S⁻¹A, whereas the equivalence class of
is denoted as a/s. For a/s,
, define (a/s) + (b/t) := (at + bs)/(st) and (a/s)(b/t) := (ab)/(st). It is easy to check that these operations are well-defined and make S⁻¹A a ring with identity 1/1, in which each s/1,
, is invertible. There is a canonical ring homomorphism
taking a ↦ a/1. In general,
is not injective. However, if A is an integral domain and 0 ∉ S, then the injectivity of
can be proved easily and we say that the ring A is canonically embedded in the ring S⁻¹A.
|
Let A be a ring and S a multiplicative subset of A. The ring S⁻¹A constructed as above is called the localization of A away from S or the ring of fractions of A with respect to S. |
|
The concept of integral dependence generalizes the notion of integers. Recall that for a field extension K ⊆ L, an element
is called algebraic over K, if α is a root of a non-zero polynomial
. Since K is a field, the polynomial f can be divided by its leading coefficient, giving a monic polynomial in K[X] of which α is a root. However, if K is not a field, division by the leading coefficient is not always permissible. So we require the minimal polynomial to be monic in order to define a special class of objects.
|
Let A ⊆ B be an extension of rings. An element
|
|
Now let A ⊆ B be an extension of rings and let C consist of all the elements of B that are integral over A. Clearly, A ⊆ C ⊆ B. It turns out that C is again a ring. This result is not at all immediate from the definition of integral elements. We prove this by using the following lemma which generalizes Theorem 2.33.
|
For a ring extension A ⊆ B and for
Proof [(a)⇒(b)] Let αn + an–1αn–1 + · · · + a1α + a0 = 0, [(b)⇒(c)] Take C := A[α]. [(c)⇒(a)] Let |
|
For an extension A ⊆ B of rings, the set
is a subring of B containing A. Proof Clearly, A ⊆ C ⊆ B as sets. To show that C is a ring let α, |
|
The ring C of Proposition 2.42 is called the integral closure of A in B. A is called integrally closed in B, if C = A. On the other hand, if C = B, we say that B is an integral extension of A or that B is integral over A. An integral domain A is called integrally closed (without specific mention of the ring in which it is so), if A is integrally closed in its quotient field Q(A). An integrally closed integral domain is called a normal domain (ND). |
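For instance, α = (1 + √5)/2 is integral over ℤ, being a root of the monic polynomial X² – X – 1, even though α ∉ ℤ. A quick exact check, representing elements of ℚ(√5) as pairs (p, q) standing for p + q√5 (a sketch; the representation is our own):

```python
from fractions import Fraction as F

# p + q*sqrt(5) represented as the pair (p, q)
def mul(a, b):
    return (a[0] * b[0] + 5 * a[1] * b[1], a[0] * b[1] + a[1] * b[0])

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1])

alpha = (F(1, 2), F(1, 2))        # (1 + sqrt(5))/2
one = (F(1), F(0))

# alpha^2 - alpha - 1 = 0: a monic integral dependence over Z
assert sub(sub(mul(alpha, alpha), alpha), one) == (0, 0)
```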
|
Recall that a PID is a ring (integral domain) in which every ideal is principal, that is, generated by a single element. We now want to be a bit more general and demand only that every ideal be finitely generated. If a ring meets our demand, we call it a Noetherian ring. These rings are named after Emmy Noether (1882–1935), one of the most celebrated mathematicians of her era, whose work on such rings was fundamental and deep in the branch of algebra. Emmy’s father Max Noether (1844–1921) was also an eminent mathematician.
|
Let A be a ring and let |
|
For a ring A, the following conditions are equivalent:
Proof [(a)⇒(b)] Let [(b)⇒(c)] Let S be a non-empty set of ideals of A. Order S by inclusion. The ACC implies that every chain in S has an upper bound in S. By Zorn’s lemma, S has a maximal element. [(c)⇒(a)] Let |
|
A ring A is called Noetherian, if A satisfies (one and hence all of) the equivalent conditions of Proposition 2.43. |
|
We have seen that if A is a PID, the polynomial ring A[X] need not be a PID. However, the property of being Noetherian is preserved during the passage from A to A[X] (Theorem 2.8).
A class of rings proves to be vital in the study of number fields:
|
An integral domain A is called a Dedekind domain, if it satisfies all of the following three conditions: |
After much ado we are finally in a position to define the basic objects of study in this section.
|
A number field K is defined to be a finite (and hence algebraic) extension of the field |
Note that there is considerable disagreement among mathematicians over this definition of number fields. Some insist that any field K satisfying
should be called a number field. Some others restrict the definition by demanding that one must have K algebraic over
; however, fields K with infinite extension degree
are allowed. We restrict the definition further by imposing the condition that
has to be finite. Our restricted definition is seemingly the most widely accepted one. In this book, we study only the number fields of Definition 2.100 and accepting this definition would at the minimum save us from writing huge expressions like “(algebraic) number fields of finite extension degree over
” to denote number fields.
For number fields, the notion of integral closure leads to the following definition.
|
A number field K contains |
By Example 2.27(2), the ring of integers of the number field
is
, that is,
. It is, therefore, customary to call the elements of
rational integers. Since
is naturally embedded in
for any number field K, it is important to notice the distinction between the integers of K (that is, the elements of
) and the rational integers of K (that is, the images of the canonical inclusion
).
Some simple properties of number rings are listed below.
|
For a number field K, we have:
Proof (1) follows immediately from Example 2.27(2), (2) follows from Exercise 2.60, and (3) follows from Exercise 2.126(b). |
Let K be a number field of degree d. By Corollary 2.13, K is a simple extension of
, that is, there exists an element
with a minimal polynomial f(X) over
such that deg
and
. The field K is a
-vector space of dimension d with basis 1, α, . . . , α^(d–1). There exists a nonzero integer a such that
is an algebraic integer and we continue to have
. Thus, without loss of generality, we may take α to be an algebraic integer. In this case, the
-basis 1, α, . . . , α^(d–1) of K consists only of algebraic integers.
Conversely, let
be an irreducible polynomial of degree d ≥ 1. The field
is a number field of degree d and the elements of K can be represented by polynomials with rational coefficients and of degrees < d. Arithmetic in K is carried out as the polynomial arithmetic of
followed by reduction modulo the defining irreducible polynomial f(X). This gives us an algebraic representation of K independent of any element of K. Now, K can also be viewed as a subfield of
and the elements of K can be represented as complex numbers.[16] A representation
with a field isomorphism
is called a complex embedding of K in
.[17] Such a representation is not unique as Proposition 2.45 demonstrates.
[16] A complex number
has a representation by a pair (a, b) of real numbers. Here,
plays the role of X + 〈X2 + 1〉 in
. Finally, every real number has a decimal (or binary or hexadecimal or . . .) representation.
[17] The field
is canonically embedded in K. It is evident that the embedding σ : K → K′ fixes
element-wise.
|
A number field K of degree d ≥ 1 has exactly d distinct complex embeddings. Proof As above we take |
This proposition says that the conjugates α1, . . . , αd are algebraically indistinguishable. For example, X² + 1 has two roots ±i, where
. But it makes little sense to talk about the positive and the negative square roots of –1. They are algebraically indistinguishable, and if one calls one of these i, the other becomes –i.[18] However, if a representation of
is given, we can distinguish between
and
by associating these quantities with the elements
and
respectively, where
is the positive real square root of 5 and where
is the imaginary unit available from the given representation of
.
[18] In a number theory seminar in 1996, Hendrik W. Lenstra, Jr. commented:
Suppose the Martians defined the complex numbers by adjoining a root of –1 they called j. And when the Earth and Martians start talking, they have to translate i to be either j or –j. So we take i to j, because I think that’s what the scientists will decide. ··· But it was later discovered that most Martians are left handed, so the philosophers decide it’s better to send i to –j instead.
It is also quite customary to start with
for some algebraic
and seek the complex embeddings of K in
. One then considers the minimal polynomial f(X) of α (over
) and proceeds as in the proof of Proposition 2.45 but now defining the map
as the unique field isomorphism that fixes
and takes α ↦ αi. If we take α = α1, then σ1 is the identity map, whereas σ2, . . . , σd are non-identity field isomorphisms.
The moral of this story is that whether one wants to view the number field K as
or as
for any
is one’s personal choice. In any case, one will be dealing with the same mathematical object and as long as representation issues are not brought into the scene, all these definitions of a number field are absolutely equivalent.
The embeddings
need not be all distinct as sets. For example, the two embeddings
and
of
are identical as sets. But the maps x ↦ i and x ↦ –i are distinct (where x := X + 〈X2 + 1〉). Thus while specifying a complex embedding of a number field K, it is necessary to mention not only the subfield K′ of
isomorphic to K, but also the explicit field isomorphism K → K′.
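Numerically, the d embeddings amount to sending α to each of the d roots of its minimal polynomial. For K = ℚ(2^(1/3)) (an illustrative choice of ours), f = X³ – 2 has one real root and one conjugate pair of complex roots, and each choice of root gives a different embedding:

```python
import cmath

# The three complex embeddings of Q(cbrt(2)) send cbrt(2) to the roots of X^3 - 2.
r = 2 ** (1 / 3)
w = cmath.exp(2j * cmath.pi / 3)       # primitive cube root of unity
roots = [r, r * w, r * w.conjugate()]

for z in roots:
    assert abs(z ** 3 - 2) < 1e-9      # each is (numerically) a root of X^3 - 2

real_roots = [z for z in roots if abs(complex(z).imag) < 1e-9]
assert len(real_roots) == 1            # one real and one conjugate pair of embeddings
```

The first embedding lands inside ℝ; the other two have the same image in ℂ as sets but are distinct as maps, exactly as discussed above.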
|
Let K be a number field of degree d defined by an irreducible polynomial |
|
|
The simplest examples of number fields are the quadratic number fields, that is, number fields of degree 2. Some special properties of quadratic number fields are covered in the exercises. It follows from Exercise 2.136 that every quadratic number field is of the form
for some non-zero square-free integer D ≠ 1.
Now we investigate the
-module structure of
for a number field K of degree d. Let σ1, . . . , σd be the complex embeddings of K.
|
For an element Equation 2.15
and the norm of α (over
|
If g(X) is the minimal polynomial of α over
and r := deg g, then r|d. Moreover,
. So Tr(α) and N(α) belong to
. If α is an algebraic integer, then
, that is, Tr(α),
.
The following properties of the norm and trace functions can be readily verified. Here α,
and
.
| Tr(α + β) | = | Tr(α) + Tr(β), |
| N(αβ) | = | N(α)N(β), |
| Tr(cα) | = | c Tr(α), |
| N(cα) | = | c^d N(α), |
| Tr(c) | = | dc, |
| N(c) | = | c^d. |
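For a quadratic field K = ℚ(√D) the two embeddings send p + q√D to p ± q√D, so Tr(α) = 2p and N(α) = p² – Dq². The properties above can then be verified exactly (a sketch with the illustrative choice D = 5):

```python
from fractions import Fraction as F

D = 5                              # K = Q(sqrt(5)); elements are pairs (p, q)

def mul(a, b):                     # multiplication in K
    return (a[0] * b[0] + D * a[1] * b[1], a[0] * b[1] + a[1] * b[0])

def tr(a):                         # Tr(p + q*sqrt(D)) = 2p
    return 2 * a[0]

def norm(a):                       # N(p + q*sqrt(D)) = p^2 - D*q^2
    return a[0] * a[0] - D * a[1] * a[1]

alpha = (F(2), F(3))
beta = (F(-1), F(4))
c = F(7)

assert tr((alpha[0] + beta[0], alpha[1] + beta[1])) == tr(alpha) + tr(beta)
assert norm(mul(alpha, beta)) == norm(alpha) * norm(beta)
assert norm((c * alpha[0], c * alpha[1])) == c**2 * norm(alpha)   # d = 2 here
assert tr((c, F(0))) == 2 * c and norm((c, F(0))) == c**2
```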
|
Let |
|
Δ(β1, . . . , βd) = (det(σj(βi)))². Proof Consider the matrices D := (Tr(βiβj)) and E := (σj(βi)). By definition, we have Δ(β1, . . . , βd) = det D. We show that D = EEᵗ, which implies that det D = (det E)². The ij-th entry of EEᵗ is
where the last equality follows from Equation (2.15). |
Let
for some
and let f(X) be the minimal polynomial of α over
. We define the discriminant of f as
Δ(f) := Δ(1, α, α², . . . , α^(d–1)).
We have to show that the quantity Δ(f) is well-defined, that is, independent of the choice of the root α of f(X). Let α = α1, α2, . . . , αd be all the roots of f(X) and let the complex embedding σj of K map α to αj. By Proposition 2.46, we have Δ(f) = (det E)², where E = (σj(α^(i–1))) = (αj^(i–1)) is a Vandermonde matrix. Computing the determinant of E gives det E = ±∏_{i<j}(αj – αi), which implies that Δ(f) is independent of the permutations of the conjugates α1, . . . , αd of α. Notice that since α1, . . . , αd are all distinct, Δ(f) ≠ 0.
Let us deduce a useful formula for Δ(f). Write f(X) = (X – α1)(X – α2) · · · (X – αd) and take the formal derivative to get f′(X) = ∑_{j} ∏_{i≠j}(X – αi), that is, f′(αj) = ∏_{i≠j}(αj – αi). Therefore, N(f′(α)) = ∏_{j} f′(αj) = ∏_{j} ∏_{i≠j}(αj – αi) = (–1)^{d(d–1)/2} ∏_{i<j}(αi – αj)² = (–1)^{d(d–1)/2} Δ(f), that is,
Equation 2.16
Δ(f) = (–1)^{d(d–1)/2} N(f′(α))
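For a quadratic f(X) = X² + pX + q this formula can be checked directly: Δ(f) = (α₁ – α₂)² = p² – 4q, while N(f′(α)) = (2α₁ + p)(2α₂ + p) = 4q – p². A sketch (illustrative polynomial X² – X – 1):

```python
# Delta(f) = (-1)^{d(d-1)/2} N(f'(alpha)) checked for f = X^2 + p*X + q.
p, q = -1, -1                      # f(X) = X^2 - X - 1, roots (1 +/- sqrt(5))/2

disc = p * p - 4 * q               # (alpha1 - alpha2)^2, the usual discriminant

# N(f'(alpha)) = (2*alpha1 + p)(2*alpha2 + p)
#              = 4*alpha1*alpha2 + 2*p*(alpha1 + alpha2) + p^2 = 4*q - p^2,
# using alpha1 + alpha2 = -p and alpha1*alpha2 = q.
norm_fprime = 4 * q - p * p

assert disc == (-1) ** 1 * norm_fprime     # d = 2, so the sign is (-1)^{2*1/2}
assert disc == 5
```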
For arbitrary
, the discriminant Δ(β1, . . . , βd) discriminates between the cases that β1, . . . , βd form a
-basis of K and that they do not.
|
Let Proof Let E₁ := (σj(βi)) and E₂ := (σj(γi)). Now
is the ij-th entry of the matrix TE₁, that is, E₂ = TE₁. Hence Δ(γ1, . . . , γd) = (det E₂)² = (det T)²(det E₁)² = (det T)²Δ(β1, . . . , βd). |
|
Let |
|
Proof Let |
Finally comes the desired characterization of
.
|
For a number field K of degree d, the ring Proof Let Claim:
Claim: Assume not, that is, there exists
by Lemma 2.12, we have Δ(γ1, . . . , γd) = (det T)²Δ(β1, . . . , βd) = r²Δ(β1, . . . , βd). Since r ≠ 0, Δ(γ1, . . . , γd) ≠ 0, that is, (γ1, . . . , γd) is again a |
|
Every integral basis of K has the same discriminant (for a given K). Proof Let |
|
Let |
Recall that K, as a vector space over
, always possesses a
-basis of the form 1, α, . . . , αd–1.
, as a
-module, is free of rank d, but a number field K need not possess an integral basis of the form 1, α, . . . , α^(d–1). Whenever it does,
is called monogenic and an integral basis 1, α, . . . , α^(d–1) of K is called a power integral basis. Clearly, if K has a power integral basis 1, α, . . . , α^(d–1), then
. But the converse is not true, that is, for
with
, 1, α, . . . , α^(d–1) need not be an integral basis of K, even when
is monogenic.
|
Consider the quadratic number field Case 1: D ≡ 2, 3 (mod 4) Here
Case 2: D ≡ 1 (mod 4) In this case,
|
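The two cases can be packaged into a small helper returning a generator of the ring of integers (a sketch; the function name is ours). When D ≡ 1 (mod 4), ω = (1 + √D)/2 is indeed integral: it is a root of the monic polynomial X² – X + (1 – D)/4, whose coefficients are integers exactly in that case.

```python
def integral_basis_generator(D):
    """For K = Q(sqrt(D)), D squarefree, D != 0, 1: a generator of O_K over Z."""
    if D % 4 == 1:                 # Python's % gives a result in {0,1,2,3} even for D < 0
        return "(1 + sqrt(D))/2"   # root of the monic X^2 - X + (1 - D)/4
    return "sqrt(D)"               # D = 2, 3 (mod 4): root of the monic X^2 - D

# (1 - D)/4 is an integer precisely when D = 1 (mod 4)
for D in (5, 13, -3):
    assert (1 - D) % 4 == 0
for D in (2, 3, -1):
    assert (1 - D) % 4 != 0
```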
Ideals in a number ring possess very rich structures. We prove that number rings are Dedekind domains (Definition 2.99). A Dedekind domain (henceforth abbreviated as DD) need not be a UFD (or a PID). However, it is a ring in which ideals admit unique factorizations into products of prime ideals.
Let K be a number field of degree
and
its ring of integers. If
is a homomorphism of rings and if
is a prime ideal of B, then the contraction
is a prime ideal of A. We say that
lies above or over
. If A ⊆ B and
is the inclusion homomorphism, then
. For a number field K, we consider the natural inclusion
.
|
Let Proof Let |
|
Proof Let |
|
The ring Proof We have proved that |
Now we derive the unique factorization theorem for ideals in a DD. It is going to be a long story. We refer the reader to Definition 2.92 to recall how the product of two ideals is defined.
|
Let A be a ring, Proof The proof is obvious for r = 1. So assume that r > 1. If |
We now generalize the concept of ideals.
|
Let A be an integral domain and K := Q(A). An A-submodule |
Every ideal of A is evidently a fractional ideal of A and hence is often called an integral ideal of A. Conversely, every fractional ideal of A contained in A is an integral ideal of A. The principal fractional ideal Ax is the A-submodule of K generated by
. If A is a Noetherian domain, we have the following equivalent characterization of fractional ideals.
|
Let A be a Noetherian integral domain, K := Q(A) and Proof [if] Let [only if] Let |
We define the product of two fractional ideals
,
of an integral domain A as we did for integral ideals:

It is easy to check that
is again a fractional ideal of A. Let
denote the set of non-zero fractional ideals of A. The product of fractional ideals defines a commutative and associative binary operation on
. The ideal A acts as a (multiplicative) identity in
. A fractional ideal
of A is called invertible, if
for some fractional ideal
of A. We deduce shortly that if A is a DD, then every non-zero fractional ideal of A is invertible and, therefore,
is a group under multiplication of fractional ideals.
|
Let A be a Noetherian domain and Proof Let S be the set of ideals of A for which the lemma does not hold. Assume that |
Note that the condition “each containing
” was necessary in Lemma 2.16 in order to rule out the trivial possibility that
for some
.
|
Let A be a DD, K := Q(A) and
Then we have:
Proof
|
|
Every non-zero ideal Proof If In order to prove the uniqueness of this product, let |
In the factorization of a non-zero ideal of a DD, we do not rule out the possibility of repeated occurrences of factors. Taking this into account shows that every non-zero ideal
in a DD A admits a unique factorization

with distinct non-zero prime ideals
and with exponents
. Here uniqueness is up to permutations of the indexes 1, . . . , r. This factorization can be extended to fractional ideals, but this time we have to allow non-positive exponents. First note that for integers e1, . . . , er and non-zero prime ideals
of A the product
is well-defined and is a fractional ideal of
. The converse is proved in the following corollary.
|
Every non-zero fractional ideal Proof By definition, there exists |
The fractional ideal
in Corollary 2.22 is denoted by
. We have
. One can easily verify that
defined as above is equal to the set

In fact, one can use the last equality as the definition for
.
To sum up, every non-zero fractional ideal of a DD A is invertible and the set
of all non-zero fractional ideals of A is a group. The unit ideal A acts as the identity in
.
As in every group, we have the cancellation law(s) in
.
|
Let A be a DD and |
In view of unique factorization of ideals in A, we can speak of the divisibility of integral ideals in A. Let
and
be two integral ideals of A. We say that
divides
and write
, if
for some integral ideal
of A. We now show that the condition
is equivalent to the condition
. Thus for ideals in a DD the term divides is synonymous with contains.
|
Let Proof [if] If Also [only if] If |
|
Let Proof [if] We have [only if] Let |
As we pass from
to
, the notion of unique factorization passes from the element level to the ideal level. If a DD is already a PID, these two concepts are equivalent. (Non-zero prime ideals in a PID are generated by prime elements.) Though a UFD need not be a PID, we have the following result for a DD.
|
A Dedekind domain A is a UFD, if and only if A is a PID. Proof [if] Every PID is a UFD (Theorem 2.11). [only if] Let A be a UFD. In order to show that A is a PID, it suffices (in view of Theorem 2.57) to show that every non-zero prime ideal |
In the rest of this section, we abbreviate
as
, if K is implicit in the context.
We have seen that the ring
is a free
-module of rank d. The same result holds for every non-zero ideal
of
. Let β1, . . . , βd constitute an integral basis of K.
One can choose rational integers aij with each aii positive such that
Equation 2.17

constitute a
-basis of
. Moreover, the discriminant Δ(γ1, . . . , γd) is independent of the choice of an integral basis γ1, . . . , γd of
and is called the discriminant of
, denoted
. It follows that
can be generated as an ideal (that is, as an
-module) by at most d elements. We omit the proof of the following tighter result.
|
Every (integral) ideal in a DD A is generated by (at most) two elements. More precisely, for a proper non-zero ideal |
|
The norm |
Using the integers aij of Equations (2.17), we can write
Equation 2.18

|
For every non-zero ideal |
It is tempting to define the norm of an element
to be the norm of the principal ideal
. It turns out that this new definition is (almost) the same as the old definition of N(α). More precisely:
|
For any element Proof The result is obvious for α = 0. So assume that α ≠ 0 and call
It follows that |
|
For any |
Like the norm of elements, the norm of ideals is also multiplicative. We omit the (not-so-difficult) proof here.
The following immediate corollary often comes handy.
|
Let |
The behaviour of rational primes in number rings is an interesting topic of study in algebraic number theory. Let K be a number field of degree d and
. Consider a rational prime p and denote by 〈p〉 the ideal
generated by p in
. We use the symbol
to denote the (prime) ideal of
generated by p. Further let
Equation 2.19

be the prime factorization of 〈p〉 with
, with pairwise distinct non-zero prime ideals
of
and with
. For each i, we have
, that is,
, that is,
(Lemma 2.13), that is,
lies over
. Conversely if
is an ideal of
lying over
, then
, that is,
, that is,
, that is,
for some i. Thus,
are precisely all the prime ideals of
that lie over
.
By Corollary 2.27, N(〈p〉) = p^d. By Corollary 2.28, each
divides p^d and is again a power p^(di) of p.
|
We define the ramification index of |
By the multiplicative property of norms, we have

|
If r = d, so that each ei = di = 1, we say that the prime p (or |
The following important result is due to Dedekind. Its proof is long and complicated and is omitted here.
|
A rational prime p ramifies in |
Though this is not the case in general, let us assume that the ring O of integers of K is monogenic (that is, O = ℤ[α] for some α ∈ O) and try to compute the explicit factorization (Equality (2.19)) of 〈p〉 in O. In this case, K = ℚ(α); let f(X) ∈ ℤ[X] be the minimal polynomial of α. We then have O = ℤ[α] ≅ ℤ[X]/〈f(X)〉.
Let us agree to write the canonical image of any polynomial g(X) ∈ ℤ[X] in (ℤ/pℤ)[X] as ḡ(X). We write the factorization of f̄(X) as

f̄(X) = f̄1(X)^e1 · · · f̄r(X)^er

with e1, . . . , er ∈ ℕ and with pairwise distinct (monic) irreducible polynomials f̄1(X), . . . , f̄r(X) ∈ (ℤ/pℤ)[X]. If di := deg f̄i(X), then e1d1 + · · · + erdr = deg f̄ = d. For each i = 1, . . . , r choose a polynomial fi(X) ∈ ℤ[X] whose reduction modulo p is f̄i(X). Define the ideals

𝔭i := 〈p, fi(α)〉

of O. Since O ≅ ℤ[X]/〈f(X)〉, we have

O/𝔭i ≅ ℤ[X]/〈p, f(X), fi(X)〉 ≅ (ℤ/pℤ)[X]/〈f̄i(X)〉

and

N(𝔭i) = |O/𝔭i| = p^di.

Therefore, 𝔭1, . . . , 𝔭r are non-zero prime ideals of O with N(𝔭i) = p^di. Thus 𝔭1^e1 · · · 𝔭r^er ⊆ 〈p, f1(α)^e1 · · · fr(α)^er〉. On the other hand, f1(α)^e1 · · · fr(α)^er ∈ 〈p〉, since f(α) = 0 and f(X) ≡ f1(X)^e1 · · · fr(X)^er (mod p). Thus 𝔭1^e1 · · · 𝔭r^er ⊆ 〈p〉, and since N(𝔭1^e1 · · · 𝔭r^er) = p^(e1d1 + · · · + erdr) = p^d = N(〈p〉), we must have 〈p〉 = 𝔭1^e1 · · · 𝔭r^er, that is, we have obtained the desired factorization of 〈p〉.
Let us now concentrate on an example of this explicit factorization.
|
Let D ≠ 0, 1 be a square-free integer congruent to 2 or 3 modulo 4. If Case 1: In this case, p|D, that is, Case 2: Since p is assumed to be an odd prime, the two square roots of D modulo p are distinct. Let δ be an integer with δ2 ≡ D (mod p). Then Case 3: The polynomial Thus the quadratic residuosity of D modulo p dictates the behaviour of p in Let us finally look at the fate of the even prime 2 in Recall from Example 2.31 that ΔK = 4D. Thus we have a confirmation of the fact that a rational prime p ramifies in |
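The trichotomy of the example can be checked mechanically. The sketch below (function name and structure ours, not the book's) classifies an odd prime p in ℤ[√D] for a square-free D ≡ 2, 3 (mod 4) by deciding how X^2 − D factors modulo p via Euler's criterion; the even prime 2 always ramifies since the discriminant is 4D.

```python
def behaviour(D, p):
    """Behaviour of the rational prime p in Z[sqrt(D)], for square-free
    D congruent to 2 or 3 modulo 4 (so the field discriminant is 4D)."""
    if p == 2 or D % p == 0:
        return "ramified"          # p divides the discriminant 4D
    # Euler's criterion: D is a quadratic residue modulo the odd prime p
    # if and only if D^((p-1)/2) is congruent to 1 modulo p
    return "split" if pow(D, (p - 1) // 2, p) == 1 else "inert"

# Gaussian integers Z[i] correspond to D = -1: 5 splits, 3 stays inert
print(behaviour(-1, 5), behaviour(-1, 3), behaviour(-1, 2))
```

For D = −1 this reproduces the classical behaviour of primes in the Gaussian integers: primes p ≡ 1 (mod 4) split, primes p ≡ 3 (mod 4) remain inert, and 2 ramifies.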
One can similarly study the behaviour of rational primes in the ring of integers of ℚ(√D), where D ≡ 1 (mod 4) is a square-free integer ≠ 1.
There are just two units in ℤ, namely ±1. In a general number ring, there may be many more units. For example, all the units in the ring ℤ[i] of Gaussian integers are ±1, ±i. There may even be an infinite number of units in a number ring. It can be shown that ±(1 + √2)^n, n ∈ ℤ, are all the units of ℤ[√2]. (Note that for all n ≠ 0 the absolute values of (1 + √2)^n and (1 − √2)^n are different from 1.)
ℤ[√2] is a PID. So we can think of factorizations in ℤ[√2] as element-wise factorizations. To start with, we fix a set of pairwise non-associate prime elements of ℤ[√2]. Every non-zero element of ℤ[√2] admits a factorization u p1^e1 · · · pt^et for prime “representatives” pi and for a unit u of the form ±(1 + √2)^n. Thus, in order to complete the picture of factorization, we need machinery to handle the units in a number ring.
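The claim that all powers of 1 + √2 are units of ℤ[√2] can be verified numerically: an element is a unit exactly when its norm is ±1, and the norm of a + b√2 is a^2 − 2b^2. A small sketch (the pair representation and helper names are ours):

```python
# Represent a + b*sqrt(2) in Z[sqrt(2)] as the integer pair (a, b)
def mul(u, v):
    (a, b), (c, d) = u, v
    return (a * c + 2 * b * d, a * d + b * c)

def norm(u):
    a, b = u
    return a * a - 2 * b * b          # N(a + b*sqrt(2)) = a^2 - 2b^2

# Powers of the fundamental unit 1 + sqrt(2) all have norm +-1,
# hence each is invertible: infinitely many units in Z[sqrt(2)].
x = (1, 0)
for n in range(1, 8):
    x = mul(x, (1, 1))
    assert norm(x) == (-1) ** n       # the norm alternates between -1 and +1
print(x)                              # (239, 169), i.e. (1 + sqrt(2))^7
```

The alternating norm reflects the multiplicativity of the norm together with N(1 + √2) = −1.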
Let K be a number field of degree d and signature (r1, r2). We have d = r1 + 2r2. The set of units in the ring of integers O of K is denoted by O*. We know that O* is an (Abelian) group under (complex) multiplication. Our basic aim now is to reveal the structure of the group O*.
Every Abelian group is a ℤ-module and, if finitely generated and not free, contains torsion elements, that is, (non-identity) elements of finite order > 1.[19] O* always contains the element −1 of order 2. The torsion subgroup of O* is denoted by ℜ. We have O* ≅ ℜ × 𝔉, where 𝔉 is a torsion-free group. It turns out that ℜ is a finite group (and hence cyclic) and that 𝔉 is finitely generated and hence free, that is, 𝔉 ≅ ℤ^ρ for some non-negative integer ρ. From Dirichlet’s unit theorem (which we do not prove), it follows that ρ = r1 + r2 − 1. Thus, 𝔉 has a ℤ-basis consisting of ρ elements, say ξ1, . . . , ξρ, and every unit of O can be uniquely expressed as ω ξ1^n1 · · · ξρ^nρ, where ω is a root of unity and n1, . . . , nρ ∈ ℤ. A set of generators ξ1, . . . , ξρ of 𝔉 is called a set of fundamental units.
[19] Every finitely generated torsion-free module over a PID is free.
|
Let D ≠ 0, 1 be a square-free integer, Now, suppose D > 0. K is a real field in this case, so that |
| 2.126 |
|
| 2.127 | Let A ⊆ B be an extension of integral domains, a finitely generated non-zero ideal of A and . If , show that γ is integral over A. [H]
|
| 2.128 |
|
| 2.129 | Let A be a ring and S a multiplicatively closed subset of A. Show that:
|
| 2.130 | Let A ⊆ B be a ring extension and C the integral closure of A in B. Show that for any multiplicative subset S of A (and hence of B and C) the integral closure of S–1A in S–1B is S–1C. In particular, if A is integrally closed in B, then so is S–1A in S–1B. |
| 2.131 | Recall that an integrally closed integral domain is called a normal domain (ND).
(Remark: The reader should note the following important implications:
That is, a Euclidean domain is a PID, a PID is a UFD and a UFD is a normal domain. Neither of the reverse implications is true. For example, the ring |
| 2.132 | A (non-zero) ring A with a unique maximal ideal m is called a local ring. In that case, the field A/m is called the residue field of A.
Let A be ring and |
| 2.133 | A ring A is called a discrete valuation ring (DVR) or a discrete valuation domain (DVD), if A is a local principal ideal domain. Let A be a DVR with maximal ideal m = 〈p〉. Prove the following assertions:
(Remark: The prime p of A is called a uniformizing parameter or a uniformizer for A and is unique up to multiplication by units. The map |
| 2.134 |
|
| 2.135 |
|
| 2.136 |
(In particular, the ring of integers of |
| 2.137 | Let A be a Dedekind domain.
|
| 2.138 | Let A be a Dedekind domain and a non-zero (integral) ideal of A. Show that:
|
| 2.139 | Let and , ei, , be the prime decompositions of two non-zero ideals , of a DD A. Define the gcd and lcm of and as
Show that |
| 2.140 | Let K be a number field and .
|
| 2.141 | Let K be a number field, , , and . Show that:
|
| 2.142 | Let K be a number field. We say that K is norm-Euclidean, if for every α, , β ≠ 0, there exist q, such that α = qβ + r and | N(r)| < | N(β)|.
|
| 2.143 | In this exercise, one derives that the only (rational) integer solutions of Bachet’s equation
Equation 2.20
are x = 3, y = ±5.
|
Let us now study a different area of algebraic number theory, introduced by Kurt Hensel in an attempt to apply power-series expansions in connection with numbers. While trying to explain the properties of the (rational) integers, mathematicians started embedding ℤ in bigger and bigger structures, richer and richer in properties. The field ℚ came in a natural attempt to form quotients, and for some time people believed that this was all there is to numbers. Pythagoras was seemingly the first to locate and prove the irrationality of a number, namely √2. It took humankind centuries to complete the picture of the real line. One possibility is to look at ℝ as the completion of ℚ. A sequence an, n ∈ ℕ, of rational numbers is called a Cauchy sequence if for every real ε > 0, there exists N ∈ ℕ such that |am − an| ≤ ε for all m, n ∈ ℕ with m, n ≥ N. Every Cauchy sequence should converge to a limit, and it is ℝ (and not ℚ) where this happens. Despite the convergence of Cauchy sequences, people were not wholeheartedly happy, because the real polynomial X^2 + 1 did not have—it continues not to have—roots in ℝ. So the next question that arose was that of algebraic closure. The field ℂ was invented and turned out to be a nice field which is both algebraically closed and complete.
Throughout the above business, we were led by the conventional notion of distance between points (that is, between numbers), the so-called Archimedean distance or absolute value. For every rational prime p, there exists a p-adic distance which leads to a ring ℤp strictly bigger than, and containing, ℤ. This is the ring of p-adic integers. The quotient field of ℤp is the field ℚp of p-adic numbers. ℚp is complete in the sense of convergence of Cauchy sequences (under the p-adic distance), but is not algebraically closed. We know anyway that a (unique) algebraic closure of ℚp exists. We have ℂ = ℝ(i), that is, it was necessary and sufficient to add the imaginary quantity i to ℝ to get an algebraically closed field. Unfortunately, in the case of the p-adic distance the algebraic closure of ℚp is of infinite extension degree over ℚp. In addition, this closure is not complete. An attempt to make it complete gives an even bigger field Ωp, and the story stops here, Ωp being both algebraically closed and complete. But Ωp is already a pretty huge field and very little is known about it.
In the rest of this section, we, without specific mention, denote by p an arbitrary rational prime.
There are various ways in which p-adic integers can be defined. A simple way is to use infinite sequences.
|
A p-adic integer is defined as an infinite sequence (a1, a2, a3, . . .) with an ∈ ℤ/p^nℤ for each n ∈ ℕ and with an+1 ≡ an (mod p^n) for all n; such sequences are called p-coherent. The set of all p-adic integers is denoted by ℤp.
|
See Exercise 2.144 for another way of defining p-adic integers. We now show that ℤp is a ring. Before doing that, we mention that the ring ℤ is canonically embedded in ℤp by the injective map ℤ → ℤp, a ↦ (a), where (a) stands for the sequence whose n-th term is a mod p^n.
|
Let (an) and (bn) be two p-adic integers. Define:
One can easily check that these operations are well-defined, that is, independent of the choice of the representatives of an and bn. It also follows easily that these operations make |
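The componentwise operations can be illustrated by truncating p-coherent sequences at a finite level; the helper names below are ours. Note, for instance, that the sequence (p^n − 1) is p-coherent and behaves exactly as −1:

```python
p, N = 5, 8      # work with the first N terms of 5-adic integers

def embed(a):
    """Canonical image of an ordinary integer: the sequence a mod p^n."""
    return [a % p ** n for n in range(1, N + 1)]

def add(x, y):
    return [(xn + yn) % p ** n for n, (xn, yn) in enumerate(zip(x, y), 1)]

def mul(x, y):
    return [(xn * yn) % p ** n for n, (xn, yn) in enumerate(zip(x, y), 1)]

minus_one = embed(-1)                  # the p-coherent sequence (p^n - 1)
assert add(minus_one, embed(1)) == embed(0)
assert mul(minus_one, minus_one) == embed(1)
```

The asserts confirm that the operations are performed level by level, each level n being ordinary arithmetic modulo p^n.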
It turns out that ℤp is an integral domain. In order to see why, let us focus our attention on the units of ℤp. Let us plan to denote ℤp* (the multiplicative group of units of ℤp) by Up. The next result characterizes the elements of Up.
|
For
Proof [(a)⇒(b)] Let (an)(bn) = (anbn) = 1 = (1) for some [(b)⇒(c)] Obvious. [(c)⇒(a)] Let us construct a p-coherent sequence bn, We also have an+1bn+1 ≡ 1 (mod pn), that is, anbn+1 ≡ 1 (mod pn), that is, |
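The construction of the inverse sequence (bn) in the proof is precisely the computation of a^(−1) modulo p^n at every level. In Python, the built-in `pow(a, -1, p**n)` performs this computation directly (the function name below is ours):

```python
p = 7

def unit_inverse(a, N):
    """First N terms of the inverse of a in Z_p, for an ordinary integer a
    not divisible by p (the unit criterion discussed above)."""
    assert a % p != 0
    return [pow(a, -1, p ** n) for n in range(1, N + 1)]

b = unit_inverse(3, 5)
for n, bn in enumerate(b, start=1):
    assert (3 * bn) % p ** n == 1      # a * b_n = 1 (mod p^n) at every level
```

One can also check that the computed sequence is p-coherent, as the proof requires.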
|
Every Proof If p a1, take r := 0 and y := x. So assume that p|a1. Choose |
|
Proof Let x1 and x2 be non-zero elements of |
|
The quotient field |
|
Every non-zero Proof One can write x = a/b for some a, |
The canonical inclusion ℤ → ℤp naturally extends to the canonical inclusion ℚ → ℚp. We can identify the element (a)(b)^(−1) of ℚp with the rational a/b and say that ℚ is contained in ℚp. Being a field of characteristic 0, ℚp contains an isomorphic copy of ℚ. The map a/b ↦ (a)(b)^(−1) gives this isomorphism explicitly. Note that the ring ℤp is strictly bigger than ℤ and the field ℚp is strictly bigger than the field ℚ (Exercise 2.147).
Proposition 2.55 leads to the notion of p-adic distance between pairs of points in ℚp. Let us start with some formal definitions.
|
A metric on a set S is a map
A set S together with a metric d is called a metric space (with metric d). |
|
A norm on a field K is a map
It is an easy check that for a norm ‖ ‖ on K the function A norm ‖ ‖ on a field K is called non-Archimedean (or a finite valuation), if ‖x + y‖ ≤ max(‖x‖, ‖y‖) for all x, |
|
|
The p-adic norm | |p on ℚ is defined by |0|p := 0 and, for non-zero x = p^v (a/b) with v ∈ ℤ and integers a, b coprime to p, |x|p := p^(−v).
|
|
The p-adic norm | |p is a non-Archimedean norm on Proof Non-negative-ness, non-degeneracy and multiplicativity of | |p are immediate. For proving the triangle inequality, it is sufficient to prove the non-Archimedean condition. Take x, |
|
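The p-adic norm of a rational number is computed by extracting the power of p from the numerator and denominator; a minimal sketch (function names ours) using exact rational arithmetic:

```python
from fractions import Fraction

def vp(x, p):
    """p-adic valuation of a non-zero rational x."""
    x = Fraction(x)
    assert x != 0
    num, den, v = x.numerator, x.denominator, 0
    while num % p == 0:
        num //= p
        v += 1
    while den % p == 0:
        den //= p
        v -= 1
    return v

def p_norm(x, p):
    """|x|_p = p^(-vp(x)) for x != 0, and |0|_p = 0."""
    return Fraction(0) if Fraction(x) == 0 else Fraction(p) ** (-vp(x, p))

print(p_norm(50, 5), p_norm(Fraction(3, 10), 5))   # 1/25 5
```

Observe that highly divisible numbers are p-adically small: |50|_5 = 1/25, whereas |3/10|_5 = 5.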
Two metrics d1 and d2 on a metric space S are called equivalent if a sequence (xn) from S is Cauchy with respect to d1 if and only if it is Cauchy with respect to d2. Two norms on a field are called equivalent if they induce equivalent metrics. |
For every rational prime p, the field ℚ is canonically embedded in ℚp and thus we have a notion of a p-adic distance on ℚ. We also have the usual Archimedean distance | |∞ on ℚ. We now state an interesting result without a proof, which asserts that any distance on ℚ must be essentially the same as either the usual Archimedean distance or one of the p-adic distances.
The notions of sequences and series and their convergence can be readily extended to ℚp under the norm | |p. Since the p-adic distance assumes only the discrete values p^r, r ∈ ℤ, it is often customary to restrict ourselves only to these values while talking about the convergence criteria of sequences and series, that is, instead of an infinitesimally small real ε > 0 one can talk about an arbitrarily large M ∈ ℕ with p^(−M) ≤ ε.
|
Let x1, x2, . . . be a sequence of elements of Consider the partial sums A sequence x1, x2, . . . of elements of |
|
A field K is called complete under a norm ‖ ‖ if every sequence of elements of K, which is Cauchy under ‖ ‖, converges to an element in K. |
For example, ℝ is complete under | |∞. We shortly demonstrate that ℚp is complete under | |p.
Consider a field K not (necessarily) complete under a norm ‖ ‖. Let C denote the set of all Cauchy sequences (an) from K. Define addition and multiplication in C as (an) + (bn) := (an + bn) and (an)(bn) := (anbn). Under these operations C becomes a commutative ring with identity having a maximal ideal 𝔪 consisting of the sequences converging to 0. The field L := C/𝔪 is called the completion of K with respect to the norm ‖ ‖. K is canonically embedded in L via the map a ↦ (a, a, a, . . .). The norm ‖ ‖ on K extends to elements (an) of L as limn→∞ ‖an‖. L is a complete field under this extended norm. In fact, it is the smallest field containing K and complete under ‖ ‖.
ℝ is the completion of ℚ with respect to the Archimedean norm | |∞. On the other hand, ℚp turns out to be the completion of ℚ with respect to the p-adic norm | |p. Before proving this, let us first prove that ℚp itself is a complete field under the p-adic norm. Let us start with a lemma.
|
A sequence (an) of p-adic numbers is a Cauchy sequence if and only if the sequence (an+1 – an) converges to 0. Proof [if] Take any Thus (an) is a Cauchy sequence. [only if] Take any |
|
The field Proof Let (an) be a Cauchy sequence in
It then follows that |an|p ≤ p–m for all Let an = an,0+an,1p+an,2p2+· · · be the p-adic expansion of an (Exercise 2.145). Since (an) is Cauchy, for every |
|
Proof Let C denote the ring of Cauchy sequences from If What remains is to show that the map |
|
The p-adic series Proof The only if part is obvious. For the if part, take a sequence (an) of p-adic numbers with |an|p → 0. Define |
This is quite unlike the Archimedean norm | |∞. For example, with respect to this norm the series 1 + p + p^2 + p^3 + · · · converges (its terms tend to 0 p-adically), whereas the series 1 + 1/p + 1/p^2 + · · · diverges (since |p^(−n)|p = p^n → ∞).
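As a concrete instance of this convergence criterion, the partial sums of 1 + p + p^2 + · · · already agree, modulo p^k, with the p-adic limit 1/(1 − p) of the series; a quick check:

```python
p = 5

for k in range(1, 10):
    S = sum(p ** n for n in range(k))        # partial sum 1 + p + ... + p^(k-1)
    # (1 - p) * S = 1 - p^k, so modulo p^k the partial sum is already
    # indistinguishable from the p-adic limit 1/(1 - p)
    assert ((1 - p) * S) % p ** k == 1
    assert S % p ** k == pow(1 - p, -1, p ** k)
```

This is the p-adic geometric series: the "tail" p^k is p-adically tiny, so the familiar formula 1/(1 − p) holds in ℤp even though the sum diverges in ℝ.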
Let us conclude our short study of p-adic methods by proving an important theorem due to Hensel. This theorem talks about the solvability of polynomial equations f(X) = 0 for f(X) ∈ ℤp[X]. Before proceeding further, let us introduce a notation. Recall that every a ∈ ℤp has a unique p-adic expansion of the form a = a0 + a1p + a2p^2 + · · · with 0 ≤ an < p (Exercises 2.144 and 2.145). If a0 = a1 = · · · = an−1 = 0, then a = an p^n + an+1 p^(n+1) + an+2 p^(n+2) + · · · = p^n b, where b ∈ ℤp. Thus p^n | a in ℤp. We denote this by saying that a ≡ 0 (mod p^n). Notice that a ≡ 0 (mod p^n) if and only if |a|p ≤ p^(−n). We write a ≡ b (mod p^n) for a, b ∈ ℤp, if a − b ≡ 0 (mod p^n). Since p^n can be viewed as an element of ℤp, this congruence notation conforms to that for a general PID. (ℤp is a PID by Exercise 2.148.)
Since by our assumption any ring A comes with identity (that we denote by 1 = 1A), it makes sense to talk, for every n ∈ ℕ, about an element n = nA in A, which is the n-fold sum of 1. More precisely:

nA := 1A + 1A + · · · + 1A (n summands).

Given any f(X) = a0 + a1X + · · · + adX^d ∈ A[X], one can define the formal derivative of f as f′(X) := a1 + 2a2X + · · · + d ad X^(d−1). Properties of formal derivatives of polynomials are covered in Exercise 2.61.
|
Let
Then there exists a unique Proof Let us inductively construct a sequence α0, α1, α2, · · · of p-adic integers with the properties that |f(αn)|p ≤ p–(2M+n+1) and |f′(αn)|p = p–M for every
We want to find a suitable kn for which |f(αn)|p ≤ p–(2M+n+1). Taylor expansion gives f(αn) = f(αn–1) + knpM+nf′(αn–1) + cnp2(M+n) for some
Since pM+1 f′(αn–1), the element
This value of kn yields f (αn) = p2M + n(bnp + cnpn) ≡ 0 (mod p2M+n+1) for some Since |αn – αn–1|p ≤ p–(M+n), it follows that αn – αn–1 → 0, that is, (αn) is a Cauchy sequence (under | |p). By the completeness of For proving the uniqueness of α, let |
Note that αn in the last proof satisfies the congruence
f(αn) ≡ 0 (mod p^(2M+n+1))
for each n ∈ ℕ. We are given the solution α0 corresponding to n = 0. From this, we inductively construct the solutions α1, α2, . . . corresponding to n = 1, 2, . . . respectively. The process of computing αn from αn−1 as described in the proof of Hensel’s lemma is referred to as Hensel lifting. The given conditions ensure that this lifting is possible (and uniquely doable) for every n ∈ ℕ, and in the limit n → ∞ we get a root α ∈ ℤp of f. Since each kn is required only modulo p, we can take kn ∈ {0, 1, . . . , p − 1}. So α admits a p-adic expansion of the form α = α0 + k1 p^(M+1) + k2 p^(M+2) + k3 p^(M+3) + · · ·.
The special case M = 0 for Hensel’s lemma is now singled out:
|
Let
Then there exists a unique |
For this special case, we compute solutions αn of f(x) ≡ 0 (mod p^(n+1)) inductively for n = 1, 2, 3, . . . , given a suitable solution α0 of this congruence for n = 0. The lifting formula is now:
Equation 2.21

αn = αn−1 + kn p^n, where kn ≡ −(f(αn−1)/p^n) · f′(αn−1)^(−1) (mod p).
|
For example, let p be an odd prime and As a specific numerical example, take p = 7, a = 2 and α0 = 3. Using Formula (2.21), we compute k1 = 1, α1 = 10, k2 = 2, α2 = 108, k3 = 6, α3 = 2166, and so on. Thus a square root of 2 in |
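The numerical computation in the example can be reproduced directly. The sketch below (function name ours) implements the M = 0 lifting step for f(X) = X^2 − a and recovers the values k1 = 1, α1 = 10, k2 = 2, α2 = 108, k3 = 6, α3 = 2166 quoted in the text:

```python
def hensel_sqrt(a, p, alpha0, steps):
    """Hensel lifting for f(X) = X^2 - a over Z_p (case M = 0):
    alpha_n = alpha_(n-1) + k_n * p^n with
    k_n = -(f(alpha_(n-1)) / p^n) * f'(alpha_(n-1))^(-1)  (mod p)."""
    f = lambda x: x * x - a
    fprime = lambda x: 2 * x
    alphas = [alpha0]
    for n in range(1, steps + 1):
        alpha = alphas[-1]
        # f(alpha) is divisible by p^n, so the integer division is exact
        k = (-(f(alpha) // p ** n) * pow(fprime(alpha), -1, p)) % p
        alphas.append(alpha + k * p ** n)
    return alphas

# The example from the text: p = 7, a = 2, alpha_0 = 3
print(hensel_sqrt(2, 7, 3, 3))       # [3, 10, 108, 2166]
```

Each αn satisfies αn^2 ≡ 2 (mod 7^(n+1)), so the list of values is a sequence of better and better 7-adic approximations to √2.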
| 2.144 |
|
| 2.145 | In view of Exercise 2.144, every admits a unique expansion of the form x = x0 + x1p + x2p2 + · · · , where each . This notion of p-adic expansion can be extended to the elements of .
|
| 2.146 | Let p be an odd prime and with . From elementary number theory we know that the congruence x2 ≡ a (mod pn) has two solutions for every . Let x1 be a solution of x2 ≡ a (mod p). We know that a solution xn of x2 ≡ a (mod pn) lifts uniquely to a solution xn+1 of x2 ≡ a (mod pn+1). Thus we can inductively compute a sequence x1, x2, x3, · · · of integers. Show that (xn) is a p-adic integer and that (xn)2 = (a).
|
| 2.147 |
|
| 2.148 | Prove the following assertions:
|
| 2.149 | Compute the p-adic expansion of 1/3 in and of –2/5 in .
|
| 2.150 | Show that is dense in under the p-adic norm | |p, that is, show that given any and real ε > 0, there exists with |x – a|p < ε. Show also that is dense in .
|
| 2.151 | Prove the following assertions that establish that is the closure of in under | |p.
|
| 2.152 | Show that:
|
| 2.153 | Prove that for any non-zero . [H]
|
| 2.154 | Prove that for any the sequence (apn) converges in . [H]
|
| 2.155 | Let p, , p ≠ q. Show that the fields and are not isomorphic.
|
| 2.156 | Let a be an integer congruent to 1 modulo 8. Show that there exists an such that α2 = a and .
|
| 2.157 | Compute with α2 + α + 223 = 0 and α ≡ 4 (mod 243).
|
| 2.158 | Let p be an odd prime and . Show that the polynomial X2 – a has exactly root in .
|
| 2.159 | Show that the polynomial X2 – p is irreducible in .
|
| 2.160 | Teichmüller representative Let |
| 2.161 | Show that the algebraic closure of is of infinite extension degree over . [H]
|
Many attacks on cryptosystems involve statistical analysis of ciphertexts and also of data collected from the victim’s machine during one or more private-key operations. For a proper understanding of these analysis techniques, one requires some knowledge of statistics and random variables. In this section, we provide a quick overview of some statistical gadgets. We make the assumption that the reader is already familiar with the elementary notion of probability. We denote the probability of an event E by Pr(E).
An experiment whose outcome is random is referred to as a random experiment. The set of all possible outcomes of a random experiment is called the sample space of the experiment. For example, the outcomes of tossing a coin can be mapped to the set {H, T} with H and T standing respectively for head and tail. It is convenient to assign numerical values to the outcomes of a random experiment. Identifying head with 0 and tail with 1, one can view coin tossing as a random experiment with sample space {0, 1}. Some other random experiments include throwing a die (with sample space {1, 2, 3, 4, 5, 6}), the life of an electric bulb (with sample space the set of all non-negative real numbers), and so on. Unless otherwise specified, we henceforth assume that sample spaces are subsets of ℝ.
A random variable is a variable which can assume (all and only) the values from a (given) sample space.
A discrete random variable can assume only countably many values, that is, the sample space SX of a discrete random variable X either is finite or is in bijection with ℕ; that is, we can enumerate the elements of SX as x1, x2, x3, . . ..
The probability distribution function or the probability mass function
fX : SX → [0, 1]
of a discrete random variable X assigns to each x in the sample space SX of X the probability of the occurrence of the value x in a random experiment.[21] We have
[21] [a, b] is the closed interval consisting of all real numbers u satisfying a ≤ u ≤ b. Similarly, the open interval (a, b) is the set of all real values u satisfying a < u < b. In order to make a distinction between the open interval (a, b) and the ordered pair (a, b), many—mostly Europeans—use the notation ]a, b[ for denoting open intervals.

Σ_{x ∈ SX} fX(x) = 1.
A continuous random variable assumes an uncountable number of values, that is, the sample space SX of a continuous random variable X cannot be in bijective correspondence with a subset of ℕ. Typically SX is an interval [a, b] or (a, b) with −∞ ≤ a < b ≤ +∞.
One does not assign individual probabilities Pr(X = x) to the values assumed by a continuous random variable X.[22] The probabilistic behaviour of X is in this case described by the probability density function
[22] More correctly, Pr(X = x) = 0 for each x ∈ SX.

fX : SX → ℝ≥0

with the implication that the probability that X occurs in the interval [c, d] (or (c, d)) is given by the integral

Pr(c ≤ X ≤ d) = ∫_c^d fX(x) dx,

that is, by the area between the x-axis, the curve fX(x) and the vertical lines x = c and x = d. We have

∫_{SX} fX(x) dx = 1.

It is sometimes useful to set fX(x) := 0 for x ∉ SX, so that fX is defined on the entire real line ℝ.
The cumulative probability distribution FX of a random variable X (discrete or continuous) is the function FX(x) := Pr(X ≤ x) for all x ∈ ℝ. If X is continuous, we have

FX(x) = ∫_{−∞}^{x} fX(u) du,

which implies that

fX(x) = dFX(x)/dx.
Let X and Y be discrete random variables. The joint probability distribution of X, Y refers to a random variable Z with SZ = SX × SY. For z = (x, y), the probability of Z = z is denoted by fZ(z) = Pr(Z = z) = Pr(X = x, Y = y). The probability Pr(X = x, Y = y) stands for the probability that X = x and Y = y. The random variables X and Y are called independent, if
Pr(X = x, Y = y) = Pr(X = x) Pr(Y = y)
for all x, y.
|
Suppose that we have an urn containing three identical balls with labels 1, 2, 3. We draw two balls randomly from the urn. Let us denote the outcome of the first drawing by X and that of the second drawing by Y. We consider the joint distribution X, Y of the two outcomes in the two following cases:
|
For continuous random variables X and Y, the joint distribution is defined by the probability density function fX,Y(x, y), and the cumulative distribution is obtained by the double integral

FX,Y(c, d) = ∫_{−∞}^{c} ∫_{−∞}^{d} fX,Y(x, y) dy dx.

X and Y are independent if fX,Y(x, y) = fX(x)fY(y) for all x, y. In this case, we also have FX,Y(c, d) = FX(c)FY(d) for all c, d.
Now, we define arithmetic operations on random variables. First, let X and Y be discrete random variables. The sum X + Y is defined to be a random variable U which assumes the values u = x + y for x ∈ SX and y ∈ SY with probability

fU(u) = Pr(U = u) = Σ_{(x, y) : x + y = u} Pr(X = x, Y = y).

The product XY of X and Y is defined to be a random variable V which assumes the values v = xy for x ∈ SX and y ∈ SY with probability

fV(v) = Pr(V = v) = Σ_{(x, y) : xy = v} Pr(X = x, Y = y).

For α ∈ ℝ, α ≠ 0, the random variable W = αX assumes the values w = αx for x ∈ SX with probability
fW(w) = Pr(W = w) = Pr(X = x) = fX(x).
|
Let us consider the random variables X and Y of Example 2.36. For the sake of brevity, we denote Pr(X = x, Y = y) by Pxy. The distributions of U = X + Y in the two cases are as follows:
|
Now, let us consider continuous random variables X and Y. In this case, it is easier to define first the cumulative distribution functions of U = X + Y, V = XY and W = αX and then the probability density functions by taking derivatives. For example,

FU(u) = Pr(X + Y ≤ u) = ∫∫_{x + y ≤ u} fX,Y(x, y) dx dy, with fU(u) = dFU(u)/du.
One can easily generalize sums and products to an arbitrary finite number of random variables. More generally, if X1, . . . , Xn are random variables and
, one can talk about the probability distribution or density function of the random variable g(X1, . . . , Xn). (See Exercise 2.163.)
Now, we introduce the important concept of conditional probability. Let X and Y be two random variables. To start with, suppose that they are discrete. We denote by f(x, y) = Pr(X = x, Y = y) the joint probability distribution function of X, Y. For y ∈ SY with Pr(Y = y) > 0, we define the conditional probability of X = x given Y = y as:

fX|y(x) := Pr(X = x | Y = y) = f(x, y)/fY(y).

For a fixed y ∈ SY, the probabilities fX|y(x), x ∈ SX, constitute the probability distribution function of the random variable X|y (X given Y = y). If X and Y are independent, f(x, y) = fX(x)fY(y) and so fX|y(x) = fX(x) for all x ∈ SX, that is, the random variables X and X|y have the same probability distribution. This is expected, because in this case the probability of X = x does not depend on whatever value y the variable Y takes.
If X and Y are continuous random variables with joint density f(x, y) and fY(y) > 0, the conditional probability density function of X|y (X given Y = y) is defined by

fX|y(x) := f(x, y)/fY(y).

Again, if X and Y are independent, we have fX|y(x) = fX(x) for all x, y.
For a fixed x ∈ SX with fX(x) > 0, one can likewise define the conditional probabilities fY|x(y) := f(x, y)/fX(x) for all y ∈ SY.
Let X and Y be discrete random variables with joint distribution f(x, y). Also let Γ ⊆ SX and Δ ⊆ SY. One defines the probability fX(Γ) as:

fX(Γ) := Σ_{x ∈ Γ} fX(x).

The joint probability f(Γ, Δ) is defined as:

f(Γ, Δ) := Σ_{x ∈ Γ} Σ_{y ∈ Δ} f(x, y).

If Γ = {x} is a singleton, we prefer to write f(x, Δ) instead of f({x}, Δ). Similarly, f(Γ, y) stands for f(Γ, {y}). We also define the conditional distributions:

fX|Δ(Γ) := f(Γ, Δ)/fY(Δ) and fY|Γ(Δ) := f(Γ, Δ)/fX(Γ).

We abbreviate fX|Δ(Γ) as Pr(Γ|Δ) and fY|Γ(Δ) as Pr(Δ|Γ).
|
Let X, Y be discrete random variables and Δ ⊆ SY with fY (Δ) > 0. Also let Γ1,..., Γn form a partition of SX with fX (Γi) > 0 for all i = 1, . . . , n. Then we have:
that is, in terms of probability:
Proof Pr(Γi, Δ) = Pr(Δ|Γi) Pr(Γi) = Pr(Γi|Δ) Pr(Δ). So it is sufficient to show that Pr(Δ) equals the sum in the denominator. The event Δ is the union of the pairwise disjoint events (Γj, Δ), j = 1,..., n, and so |
The Bayes rule relates the a priori probabilities Pr(Γj) and Pr(Δ|Γj) to the a posteriori probabilities Pr(Γi|Δ). The following example demonstrates this terminology.
|
Consider the random experiment of Example 2.36(2). Take Γj := {j} for
The a posteriori probability Pr(Γ1|Δ) that the first ball was obtained in the first draw given that the ball obtained in the second draw is the second or the third one is calculated using the Bayes rule as:
One can similarly calculate |
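The Bayes-rule computation of this example can be replayed on the explicit joint distribution of the two draws without replacement (a sketch; the event encoding and names are ours):

```python
from fractions import Fraction
from itertools import permutations

# Joint distribution of (X, Y): two draws without replacement from {1, 2, 3}
joint = {xy: Fraction(1, 6) for xy in permutations((1, 2, 3), 2)}

def pr(event):
    """Probability that the outcome pair (x, y) satisfies the predicate."""
    return sum((p for xy, p in joint.items() if event(xy)), Fraction(0))

in_delta = lambda xy: xy[1] in (2, 3)          # the event Delta: Y in {2, 3}

# Priors Pr(Gamma_j) and likelihoods Pr(Delta | Gamma_j), Gamma_j = {X = j}
prior = {j: pr(lambda xy, j=j: xy[0] == j) for j in (1, 2, 3)}
likeli = {j: pr(lambda xy, j=j: xy[0] == j and in_delta(xy)) / prior[j]
          for j in (1, 2, 3)}

# Bayes rule: Pr(Gamma_1 | Delta)
num = likeli[1] * prior[1]
den = sum(likeli[j] * prior[j] for j in (1, 2, 3))
print(num / den)                                # 1/2
```

The denominator is exactly Pr(Δ) = 2/3 decomposed over the partition Γ1, Γ2, Γ3, which is the content of the Bayes rule stated above.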
Let X be a random variable. The expectation E(X) of X is defined as follows:

E(X) := Σ_{x ∈ SX} x fX(x) if X is discrete, and E(X) := ∫_{SX} x fX(x) dx if X is continuous.

E(X) is also called the (arithmetic) mean or average of X. One uses the alternative symbol μX to denote E(X). More generally, let X1, . . . , Xn be n random variables with joint probability distribution/density function f(x1, . . . , xn). Also let g : ℝ^n → ℝ. We define the following expectations:
X is discrete:
E(g(X1, . . . , Xn)) := Σ_{x1} · · · Σ_{xn} g(x1, . . . , xn) f(x1, . . . , xn).
X is continuous:
E(g(X1, . . . , Xn)) := ∫ · · · ∫ g(x1, . . . , xn) f(x1, . . . , xn) dx1 · · · dxn.
Let g(X) and h(Y) be real polynomial functions of the random variables X and Y, and let α ∈ ℝ. Then
| E(g(X) + h(Y)) | = | E(g(X)) + E(h(Y)), |
| E(g(X)h(Y)) | = | E(g(X)) E(h(Y)) if X and Y are independent, |
| E(αg(X)) | = | αE(g(X)). |
Let us derive the sum and product formulas for discrete variables X and Y.

E(g(X) + h(Y)) = Σ_x Σ_y [g(x) + h(y)] f(x, y) = Σ_x g(x) fX(x) + Σ_y h(y) fY(y) = E(g(X)) + E(h(Y)).

If X and Y are independent, then

E(g(X)h(Y)) = Σ_x Σ_y g(x)h(y) fX(x) fY(y) = [Σ_x g(x) fX(x)] [Σ_y h(y) fY(y)] = E(g(X)) E(h(Y)).
The variance Var(X) of a random variable X is defined as
Var(X) := E[(X − E(X))^2].
From the observation that E[(X − E(X))^2] = E[X^2 − 2E(X)X + [E(X)]^2] = E(X^2) − 2E(X)E(X) + [E(X)]^2, we derive the computational formula:
Var(X) = E[X^2] − [E(X)]^2.
Var(X) is a measure of how the values of X are dispersed about the mean E(X) and is always a non-negative quantity. The (non-negative) square root of Var(X) is called the standard deviation σX of X:

σX := √Var(X).
The following formulas can be easily verified:
| Var(X + α) | = | Var(X). |
| Var(αX) | = | α^2 Var(X). |
| Var(X + Y) | = | Var(X) + Var(Y) + 2 Cov(X, Y), |
where α ∈ ℝ, and where the covariance Cov(X, Y) of X and Y is defined as:
Cov(X, Y) := E[(X – E(X))(Y – E(Y))] = E(XY) – E(X) E(Y).
Normalized covariance is a measure of correlation between the two random variables X and Y. More precisely, the correlation coefficient ρX,Y is defined as:

ρX,Y := Cov(X, Y)/(σX σY).
If X and Y are independent, E(XY) = E(X) E(Y) so that Cov(X, Y) = 0 and so ρX,Y = 0. The converse of this is, however, not true, that is, ρX,Y = 0 does not necessarily imply that X and Y are independent. ρX,Y is a real value in the interval [–1, 1] and is a measure of linear relationship between X and Y. If larger (resp. smaller) values of X are (in general) associated with larger (resp. smaller) values of Y, then ρX,Y is positive. On the other hand, if larger (resp. smaller) values of X are (in general) associated with smaller (resp. larger) values of Y, then ρX,Y is negative.
|
Once again consider the drawing of two balls from an urn containing three balls labelled {1, 2, 3} (Examples 2.36, 2.37 and 2.38). Look at the second case (drawing without replacement). We use the shorthand notation Pxy for Pr(X = x, Y = y). The individual probability distributions of X and Y can be obtained from the joint distribution as follows:
Thus E(X) = 1 × (1/3) + 2 × (1/3) + 3 × (1/3) = 2. Similarly, E(Y) = 2. Therefore, E(X + Y) = E(X) + E(Y) = 4. This can also be verified by direct calculation: E(X + Y) = 3 × (1/3) + 4 × (1/3) + 5 × (1/3) = 4. E(X^2) = E(Y^2) = 1^2 × (1/3) + 2^2 × (1/3) + 3^2 × (1/3) = 14/3 and Var(X) = Var(Y) = (14/3) − 2^2 = 2/3. The probability distribution of XY is
so that E(XY) = 2 × (1/3) + 3 × (1/3) + 6 × (1/3) = 11/3. Therefore, Cov(X, Y) = E(XY) − E(X) E(Y) = (11/3) − 2 × 2 = −1/3, that is, ρX,Y = (−1/3)/(2/3) = −1/2.
The negative correlation between X and Y is expected. If X = 1 (small), Y takes bigger values (2, 3). On the other hand, if X = 3 (large), Y assumes smaller values (1, 2). Of course, the correlation is not perfect, since for X = 2 the values of Y can be smaller (1) or larger (3). So, we should feel happy to see a not-so-negative correlation of –1/2 between X and Y. |
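The arithmetic of this example is easily mechanized by summing over the explicit joint distribution (a sketch in exact rational arithmetic; the helper `E` is our name):

```python
from fractions import Fraction
from itertools import permutations

# Joint distribution: two draws without replacement from {1, 2, 3}
joint = {(x, y): Fraction(1, 6) for x, y in permutations((1, 2, 3), 2)}

def E(g):
    """Expectation of g(X, Y) under the joint distribution."""
    return sum(p * g(x, y) for (x, y), p in joint.items())

EX, EY = E(lambda x, y: x), E(lambda x, y: y)
VarX = E(lambda x, y: x * x) - EX ** 2
VarY = E(lambda x, y: y * y) - EY ** 2
Cov = E(lambda x, y: x * y) - EX * EY
# Here Var(X) = Var(Y), so sigma_X * sigma_Y = Var(X) and no square root is needed
rho = Cov / VarX
print(EX, VarX, Cov, rho)    # 2 2/3 -1/3 -1/2
```

The output reproduces E(X) = 2, Var(X) = 2/3, Cov(X, Y) = −1/3 and ρX,Y = −1/2 computed by hand above.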
Some probability distributions that occur frequently in statistical theory and in practice are described now. Some other useful probability distributions are considered in the Exercises 2.169, 2.170 and 2.171.
A discrete uniform random variable U has sample space SU := {x1, . . . , xn} and probability distribution

fU(xi) := 1/n for each i = 1, . . . , n.

A continuous uniform random variable U has sample space SU and probability density function

fU(x) := 1/A for x ∈ SU,

where A > 0 is the size[23] of SU. For example, if SU is the real interval [a, b] for a < b, we have
[23] If SU ⊆ ℝ, “size” means length. If SU ⊆ ℝ^2 or SU ⊆ ℝ^3, “size” refers to area or volume respectively. We assume that the size of SU is “measurable”.

fU(x) = 1/(b − a) for a ≤ x ≤ b.

In this case, we have
| E(U) = (a + b)/2 | and | Var(U) = (b − a)^2/12. |
Uniform random variables often occur naturally. For example, if we throw an unbiased die, the six possible outcomes (1 through 6) are equally likely, that is, each possible outcome has the probability 1/6. Similarly, if a real number is chosen randomly in the interval [0, 1], we have a continuous uniform random variable. The built-in C library call rand() (pretends to) return an integer between 0 and 2^31 − 1, each with equal probability (namely, 2^−31).
The Bernoulli random variable B = B(n, p) is a discrete random variable characterized by two parameters n ∈ ℕ and p ∈ [0, 1], where p stands for the probability of a certain event E and n represents the number of (independent) trials. It is assumed that the probability of E remains constant (namely, p) in each of the n trials. The sample space SB = {0, 1, . . . , n} comprises the (exact) numbers of occurrences of E in the n trials. B has the probability distribution

fB(i) = Pr(B = i) = C(n, i) p^i (1 − p)^(n−i) for i = 0, 1, . . . , n,
as follows from simple combinatorial arguments. The mean and variance of B are:
| E(B) = np | and | Var(B) = np(1 – p). |
The Bernoulli distribution is also called the binomial distribution.
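These formulas can be verified exactly for small parameters; the sketch below (names ours) checks that the distribution sums to 1 and that its mean and variance match np and np(1 − p):

```python
from fractions import Fraction
from math import comb

def binomial_pmf(n, p):
    """f_B(i) = C(n, i) p^i (1 - p)^(n - i) for i = 0, 1, ..., n."""
    return [comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(n + 1)]

n, p = 10, Fraction(1, 4)
pmf = binomial_pmf(n, p)
mean = sum(i * q for i, q in enumerate(pmf))
var = sum(i * i * q for i, q in enumerate(pmf)) - mean ** 2
assert sum(pmf) == 1
assert mean == n * p and var == n * p * (1 - p)
```

Using `Fraction` keeps the check exact, so the identities E(B) = np and Var(B) = np(1 − p) hold with equality rather than up to floating-point error.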
The normal random variable or the Gaussian random variable N = N(μ, σ^2) is a continuous random variable characterized by two real parameters μ and σ with σ > 0. The density function of N is

fN(x) = (1/(σ√(2π))) e^(−(x − μ)^2/(2σ^2)).

The cumulative distribution for N can be expressed in terms of the error function erf():

FN(x) = (1/2)[1 + erf((x − μ)/(σ√2))], where erf(z) := (2/√π) ∫_0^z e^(−t^2) dt.
The error function does not have a known closed-form expression. Figure 2.3 shows the curves for fN (x) and FN (x) for the parameter values μ = 0 and σ = 1 (in this case, N is called the standard normal variable).
Some statistical properties of N are:
| E(N) = μ | and | Var(N) = σ2. |
The curve fN (x) is symmetric about x = μ. Most of the area under the curve is concentrated in the region μ – 3σ ≤ x ≤ μ + 3σ. More precisely:
| Pr(μ – σ ≤ X ≤ μ + σ) | ≈ | 0.68, |
| Pr(μ – 2σ ≤ X ≤ μ + 2σ) | ≈ | 0.95, |
| Pr(μ – 3σ ≤ X ≤ μ + 3σ) | ≈ | 0.997. |
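Given the erf() form of the cumulative distribution, the three probabilities above reduce to erf(k/√2), independently of μ and σ; a quick check with Python's math.erf (function name ours):

```python
from math import erf, sqrt

def pr_within(k):
    """Pr(mu - k*sigma <= N <= mu + k*sigma) for N ~ N(mu, sigma^2);
    with the erf() form of F_N this equals erf(k / sqrt(2))."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    print(k, round(pr_within(k), 4))      # 0.6827, 0.9545, 0.9973
```

The printed values recover the familiar 68-95-99.7 rule quoted above.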
Many distributions occurring in practice (and in nature) approximately follow normal distributions. For example, the height of (adult) people in a given community is roughly normally distributed. Of course, the height of a person cannot be negative, whereas a normal random variable may assume negative values. But, in practice, the probability that such an approximating normal variable assumes a negative value is typically negligibly low.
In practice, we often do not know a priori the probability distribution or density function of a random variable X. In some cases, we do not have the complete data, whereas in some other cases we need an infinite amount of data to obtain the actual probability distribution of a random variable. For example, let X represent the life of an electric bulb manufactured by a given company in the last ten years. Even though there are only finitely many such bulbs and even if we assume that it is possible to trace the working of every such bulb, we have to wait until all these bulbs burn out, before we know the actual distribution of X. That is certainly impractical. On the contrary, if we have data on the life-times of some sample bulbs, we can approximate the properties of X by those of the samples.
Suppose that S := (x₁, x₂, . . . , xₙ) is a sample of size n. We assume that all xᵢ are real numbers. We define the following quantities for S:

    Mean(S) := (1/n) ∑_{i=1,...,n} xᵢ  and  Var(S) := M(S²) − Mean(S)².

Here M(S²) := (1/n) ∑_{i=1,...,n} xᵢ² is the mean of the collection S² := (x₁², . . . , xₙ²).
If T := (y₁, y₂, . . . , yₙ) is another sample (of real numbers) of the same size n, the (linear) relationship between S and T is measured by the following quantities:

    Cov(S, T) := M(ST) − Mean(S) Mean(T)  (the covariance of S and T),
    ρ_{S,T} := Cov(S, T)/√(Var(S) Var(T))  (the correlation coefficient of S and T).

Here M(ST) := (1/n) ∑_{i=1,...,n} xᵢyᵢ is the mean of the collection ST := (x₁y₁, . . . , xₙyₙ).
An important property of the normal distribution is the following (the central limit theorem):

    Let X be any random variable with mean μ and variance σ², and let X₁, X₂, . . . , Xₙ denote the outcomes of n independent trials of X. Then the distribution of the normalized sum (X₁ + · · · + Xₙ − nμ)/(σ√n) approaches that of the standard normal variable N(0, 1) as n → ∞.
2.162 An urn contains n₁ red balls and n₂ black balls. We draw k balls sequentially and randomly from the urn, where 1 ≤ k ≤ n₁ + n₂.

2.163 Let X and Y be the random variables of Example 2.36. For each of the two cases, calculate the probability distribution functions, expectations and variances of the following random variables:

2.164 Let X and Y be continuous random variables, g(X) and h(Y) non-constant real polynomials, and α, β ∈ ℝ. Prove that:

2.165 Let X be a random variable and Y := αX + β for some α, β ∈ ℝ. What is ρ_{X,Y}?

2.166

2.167 Let X and Y be continuous random variables whose joint distribution is the uniform distribution in the triangle 0 ≤ X ≤ Y ≤ 1.

2.168 Let X, Y, Z be random variables. Show that:

2.169 Geometric distribution: Assume that in each trial of an experiment, an event E has a constant probability p of occurrence. Let G = G(p) denote the random variable with Pr(G = k) = (1 − p)^(k−1) p for k = 1, 2, . . . .

2.170 Poisson distribution: Let P = P(λ) be the discrete random variable with Pr(P = k) = e^(−λ) λ^k / k! for k = 0, 1, 2, . . . .

2.171 Exponential distribution: Show that the exponential variable X of Part (a) is memoryless.

2.172 The birthday paradox: Let S be a finite set of cardinality n.
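In connection with Exercise 2.172: if k elements are drawn uniformly and independently from a set of size n, the probability that at least two of them coincide is 1 − ∏_{i=0,...,k−1} (1 − i/n). The following C sketch (ours, not part of the exercise) evaluates this product directly; for k = 23 and n = 365 it gives the familiar value ≈ 0.507.

```c
/* Probability that k elements drawn uniformly and independently from a
 * set of size n contain at least one repetition (the birthday paradox):
 * 1 - prod_{i=0}^{k-1} (1 - i/n). */
double birthday_collision(unsigned k, double n)
{
    double q = 1.0;                    /* probability of no repetition */
    for (unsigned i = 0; i < k; i++)
        q *= 1.0 - (double)i / n;
    return 1.0 - q;
}
```

This quantity exceeds 1/2 as soon as k is around 1.18√n, which is the fact exploited by square-root collision attacks on hash functions.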
This chapter provides the foundations of public-key cryptology. The long compilation of mathematical concepts presented here is indispensable for understanding the topics in the chapters that follow.
The chapter begins with the basic concepts of sets, functions and relations. We also present the fundamental axioms of mathematics. Although many higher-secondary (plus-two) curricula already cover these topics, we discuss them here in order to keep our treatment self-contained.
Next comes a study of groups which are sets with binary operations satisfying some nice properties (associativity, identity, inverse and optionally commutativity). Groups are extremely important for cryptology. In particular, all discrete-log-based cryptosystems use suitable groups. Subgroups, cosets and formation of quotient groups constitute a prototypical feature that illustrates the basic paradigm of modern algebra. Secure cryptographic algorithms on groups rely on the availability of elements of large orders: for example, generators of big cyclic groups. We study these topics at length. Finally, we present Sylow’s theorem. For us, this theorem has only theoretical significance; it is used for proving some other theorems.
A set with a single operation (like a group) is often too restrictive. Many mathematical structures we are familiar with (like integers, polynomials) are endowed with two basic operations addition and multiplication. A set with two such (compatible) operations is called a ring. A study of rings, fields, ideals and quotient rings is essential in algebra (and so in cryptography too). Three important types of rings, namely unique factorization domains, principal ideal domains and Euclidean domains, are also discussed. Euclidean division is an important property of integers and polynomials, and is useful from a computational perspective.
Then, as a specific example, we study the properties of ℤ, the ring of integers. We concentrate mostly on elementary properties of integers like divisibility, congruence, the Chinese remainder theorem, Fermat's and Euler's theorems, quadratic residues and the law of quadratic reciprocity. We finally discuss some assorted topics from analytic number theory. In cryptography, we require many big randomly generated primes. The prime number theorem guarantees that there is an essentially abundant supply of primes. Smooth integers (that is, integers having only small prime divisors) are useful for modern algorithms that compute factorizations and discrete logarithms. We present an estimate on the density of smooth integers. The last topic we study is the Riemann hypothesis and its generalizations. This as-yet unproven hypothesis has a bearing on the running times of many number-theoretic algorithms relevant to cryptology.
The next example is the ring of polynomials over a ring. Polynomials over a field admit Euclidean division and consequently unique factorization. Irreducible polynomials are useful for constructing field extensions. Extension fields of characteristic 2 are quite frequently used in cryptographic systems.
We subsequently study the theory of vector spaces. Linear transformations are appropriate maps between vector spaces and necessitate the theory of matrices. Matrix algebra is widely useful in cryptology as it is in any other branch of algorithmic computer science. Algorithms to solve linear systems over rings and fields constitute a basic computational tool. A study of modules and algebras at the end of this section is mostly theoretical and can be avoided if the reader is willing to accept some theorems without proofs.
In the next section, we discuss the theory of field extensions. As mentioned earlier, cryptography relies heavily on extension fields of characteristic 2. Some related topics include splitting fields and algebraic closure of fields. At the end of this section, we have a short theoretical treatment of Galois theory.
Many popular cryptosystems are based on the multiplicative groups of finite fields. We study these fields as the next topic. Polynomials over finite fields are extremely useful for the construction and representation of finite fields. At the end of this section, we discuss several ways in which (elements of) finite fields can be represented in a computer’s memory. This study expedites the design, analysis and efficient implementation of finite-field arithmetic.
Since elliptic- and hyperelliptic-curve cryptography has gained popularity in recent years, one needs to study the theory of plane algebraic curves. This is what we do in the next three sections. To start with, we define affine and projective spaces and curves. Going from the affine space to the projective space is necessitated by a systematic (algebraic) inclusion of points at infinity on a plane curve. We also discuss the theory of divisors and the Jacobian on plane curves. For elliptic curves, the Jacobian can be replaced by the equivalent group described in terms of the chord-and-tangent rule. For hyperelliptic curves, on the other hand, we have little option other than understanding the Jacobian itself.
Two kinds of elliptic curves that must be avoided in cryptography are supersingular curves and anomalous curves. The elliptic curve group (over a finite field) is the basic set used in elliptic curve cryptosystems. The orders (cardinality) of these groups are given by Hasse’s theorem. The structure theorem establishes that an elliptic curve group (over a finite field) is not necessarily cyclic, but has a rank of at most two.
We then study Jacobians of hyperelliptic curves over finite fields. This study supplements the theory of divisors on general curves. Reduced and semi-reduced divisors are expedient for the representation of the elements in the Jacobian of a hyperelliptic curve.
Many popular cryptosystems (including RSA) derive their security (presumably) from the intractability of the integer factorization problem. The best algorithm known till date for factoring integers is the number-field sieve method. An understanding of this algorithm requires the knowledge of number fields and number rings. We devote a section to the study of these mathematical objects. We start with some necessary commutative algebra including localization, integral dependence and Noetherian rings. Next, we deal with Dedekind domains. All number rings are Dedekind domains in which ideals admit unique factorization. We also discuss the factorization of ideals in number rings generated by rational primes and the structure of units in number rings (Dirichlet’s unit theorem).
The next section is a gentle introduction to the theory of p-adic numbers. These numbers are useful, for example, for designing attacks against elliptic curve cryptosystems.
In the last section, we summarize some statistical tools. Under the assumption that the reader is already familiar with the elementary notion of probability, we discuss properties of random variables and of some common probability distributions (including uniform and normal distributions). The birthday paradox described in an exercise is often useful in cryptographic context (for example, for collision attacks on hash functions).
That is the end of this chapter. The compilation may initially look long and boring, perhaps intimidating too. The unfortunate reality is that public-key cryptology is mathematical, and it is arguably better to treat it in the formal way. If the reader is not comfortable with mathematics (in general), cryptology is perhaps not her cup of tea. An elementary approach to cryptology is what many other books have adopted. This book aims at being different in that respect. It is up to the reader to decide to what level of detail she is willing to study cryptography.
Knowledge is of two kinds. We know a subject ourselves, or we know where we can find information upon it.
—Samuel Johnson
In this chapter, we have summarized the basic mathematical facts that cryptologists are expected to know in order to have a decent understanding of the present-day public-key technology. Our discussion has been often more intuitive than mathematically complete. A reader willing to gain further insight in these areas should look at materials written specifically to deal with the specialized topics. Here are our (biased) suggestions.
There are numerous textbooks on introductory algebra. The books by Herstein [125], Fraleigh [96], Dummit and Foote [81], Hungerford [133] and Adkins and Weintraub [1] are some of our favourites. The algebra of commutative rings with identity (rings by our definition) is called commutative algebra and is the basis for learning advanced areas of mathematics like algebraic geometry and algebraic number theory. A serious study of these disciplines demands more in-depth knowledge of commutative algebra than we have presented in Section 2.13.1. Atiyah and MacDonald's book [14] is a de facto standard on commutative algebra. Hoffman and Kunze's book [127] is a good reference for linear algebra and matrix algebra.
Elementary number theory deals with the theory of (natural) numbers without using sophisticated techniques from complex analysis and algebra. Zuckerman et al. [316] can be consulted for a lucid introduction to this subject. The books by Burton [42] and Mollin [207] are good alternatives.
A thorough mathematical treatment of finite fields can be found in the books by Lidl and Niederreiter [179, 180], of which the second also deals with computational issues. Other books of a computational flavour include those by Menezes [191] and by Shparlinski [274]. Also see the paper [273] by Shparlinski.
The use of elliptic curves in cryptography was proposed by Koblitz [150] and Miller [205], and that of hyperelliptic curves by Koblitz [151]. A fair mathematical understanding of elliptic curves relies on knowledge of commutative algebra (see above) and algebraic geometry. Hartshorne's book [124] is a detailed introduction to algebraic geometry. Fulton's book [99] on algebraic curves is another good reference. Rigorous mathematical treatments of elliptic curves can be found in Silverman's books [275, 276]. The book by Koblitz [152] is elementary, but has a somewhat different focus than needed in cryptology. By far, the best short-cut is the recent textbook by Washington [298]. Some other books, by Koblitz [150, 153, 154], Blake et al. [24], Menezes [192] and Hankerson et al. [123], are written for non-experts in algebraic geometry (and hence lack mathematical details), but are good from a computational viewpoint. The expository reports [46, 47] by Charlap et al. provide a nice elementary introduction to elliptic curves. For hyperelliptic curves, on the other hand, no such books are available. Koblitz's book [154] includes a chapter on hyperelliptic curves. In addition, an appendix in the same book, written by Menezes et al. much in the style of Charlap et al. [46, 47], provides an introductory and elementary coverage.
In an oversimplified sense, algebraic number theory deals with the study of number fields. The books by Janusz [140], Lang [160], Mollin [208] and Ribenboim [251] go well beyond what we cover in Section 2.13. Also see [89]. For a more modern and sophisticated treatment, look at Neukirch’s book [216]. A book dedicated to p-adic numbers is due to Koblitz [149]. Course notes from one of the authors of this book can also be useful in this regard. The notes are freely downloadable from:
http://www.facweb.iitkgp.ernet.in/~adas/IITK/course/MTH617/SS02/
Analytic number theory deals with the application of complex-analytic techniques to solve problems in number theory. Although we do not explicitly need this branch of mathematics (apart from a few theorems that we mention without proofs), it is rather important for the study of numbers. Consult the books by Apostol [12] and by Ireland and Rosen [136] for this. Also see [249]. For complex analysis, we recommend the book by Ahlfors [6].
Feller's celebrated book [92] is a classical reference on probability theory. Grinstead and Snell's book [121] is available on the Internet.
| 3.1 | Introduction |
| 3.2 | Complexity Issues |
| 3.3 | Multiple-precision Integer Arithmetic |
| 3.4 | Elementary Number-theoretic Computations |
| 3.5 | Arithmetic in Finite Fields |
| 3.6 | Arithmetic on Elliptic Curves |
| 3.7 | Arithmetic on Hyperelliptic Curves |
| 3.8 | Random Numbers |
| Chapter Summary | |
| Suggestions for Further Reading |
From the start there has been a curious affinity between mathematics, mind and computing . . . It is perhaps no accident that Pascal and Leibniz in the seventeenth century, Babbage and George Boole in the nineteenth, and Alan Turing and John von Neumann in the twentieth – seminal figures in the history of computing – were all, among their other accomplishments, mathematicians, possessing a natural affinity for symbol, representation, abstraction and logic.
—Doron Swade [295]
. . . the laws of physics and of logic . . . the number system . . . the principle of algebraic substitution. These are ghosts. We just believe in them so thoroughly they seem real.
—Robert M. Pirsig [233]
The world is continuous, but the mind is discrete.
—David Mumford
Now that we have studied the properties of important mathematical objects that play vital roles in public-key cryptology, it is time to concentrate on the algorithmic and implementation issues for working with these objects. We need well-defined schemes (data structures) to represent these objects and well-defined procedures (algorithms) to manipulate them. While a theoretical analysis of the performance of our data structures and algorithms is of great concern, it still leaves us in the abstract domain. In the long run, one has to translate the abstract statements in the algorithms to machine code that the computer understands, and this is where the implementation tidbits come into the picture. It is our personal experience that a naive implementation of an algorithm may run a hundred times slower than a carefully optimized implementation of the same algorithm. In certain specific applications (like those based on smart cards), where memory is a scarce resource, one should also pay attention to the storage requirements of the data structures and code segments. This chapter is an introduction to all these specialized topics.
Before we proceed further, certain comments are in order. In this book, we describe algorithms using a pseudocode that closely resembles the syntax of the programming language C. The biggest difference between C and our pseudocode is that we have given preference to mathematical notations in place of C syntax. For example, = means equality in our codes, whereas assignment is denoted by :=. Similarly, our while and for loops look more human-readable, for example, for i = 0, 1, . . . , m – 1 instead of C’s for (i=0; i<m; i++). In order to understand our pseudocode, a knowledge of C (or a similar programming language) is helpful, but not essential, on the part of the reader.
For certain implementations, we assume that the target machine carries out 32-bit 2’s-complement arithmetic. This is indeed true for most modern PCs and personal work stations. By the term word, we mean a 32-bit unit in the computer memory. We will also assume that the compiler provides facilities for storing and doing arithmetic with unsigned 64-bit integers. Though this is not an ANSI C feature, most popular compilers used today do support this built-in data type (Examples: unsigned __int64 for the Microsoft Visual C++ Compiler and unsigned long long for the GNU C Compiler). Though it is apparently desirable to be more generic and to avoid these specific assumptions on the part of the machine and the compiler, our exposition highlights the power of fine-tuning based on the knowledge of the underlying system.
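As a small illustration of the assumption just stated, the product of two 32-bit words fits exactly in an unsigned 64-bit integer; splitting that product back into a low word and a high (carry) word is the core step of multiple-precision multiplication. The sketch below (ours; it uses the portable C99 types from stdint.h rather than a particular compiler's built-in name) shows this widening multiplication.

```c
#include <stdint.h>

/* Multiply two 32-bit words into an exact 64-bit product, and split the
 * result into a low word and a high (carry) word -- the basic step of
 * multiple-precision multiplication on a 32-bit machine. */
void mul32(uint32_t a, uint32_t b, uint32_t *lo, uint32_t *hi)
{
    uint64_t p = (uint64_t)a * (uint64_t)b;   /* no overflow possible */
    *lo = (uint32_t)p;                        /* low 32 bits of p  */
    *hi = (uint32_t)(p >> 32);                /* high 32 bits of p */
}
```

For example, multiplying 0xFFFFFFFF by itself gives the 64-bit value 0xFFFFFFFE00000001, that is, a low word of 1 and a high word of 0xFFFFFFFE.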
Given an algorithm (or an implementation of the same), the time and space required for the execution of the algorithm on a machine depend very much on the machine’s architecture and on the compiler. But this does not mean that we cannot make some general theoretical estimates. The so-called asymptotic estimates that we are going to introduce now tend to approach the real situation as the input size tends to infinity. For finite input sizes (which is always the case in practice), these theoretical predictions turn out to provide valuable guidelines.
We start with the following important definitions.
|
Let f and g be positive real-valued functions of natural numbers.
|
|
The order notation is used to analyse algorithms in the following way. For an algorithm, the input size is defined as the total number of bits needed to represent the input of the algorithm. We find asymptotic estimates of the running time and the memory requirement of the algorithm in terms of its input size. Let f(n) denote the running time[2] of an algorithm A for an input of size n. If f(n) = Θ(n^a) (or, more generally, if f = O(n^a)) for some a > 0, A is called a polynomial-time algorithm. If a = 1 (resp. 2, 3, . . .), then A is specifically called a linear-time (resp. quadratic-time, cubic-time, . . .) algorithm. A Θ(1) algorithm is often called a constant-time algorithm. If f = Θ(b^n) for some b > 1, A is called an exponential-time algorithm. Similarly, if f satisfies Equation (3.1) with 0 < α < 1, A is called a subexponential-time algorithm.
[2] The practical running time of an algorithm may vary widely depending on its implementation and also on the processor, the compiler and even on run-time conditions. Since we are talking about the order of growth of running times in relation to the input size, we neglect the constants of proportionality and so these variations are usually not a problem. If one plans to be more concrete, one may measure the running time by the number of bit operations needed by the algorithm.
One has similar classifications of an algorithm in terms of its space requirements, namely, polynomial-space, linear-space, exponential-space, and so on. We can afford to be lazy and drop -time from the adjectives introduced in the previous paragraph. Thus, an exponential algorithm is an exponential-time algorithm, not an exponential-space algorithm.
It is expedient to note here that the running time of an algorithm may depend on the particular instance of the input, even when the input size is kept fixed. For an example, see Exercise 3.3. We should, therefore, be prepared to distinguish, for a given algorithm and for a given input size n, between the best (that is, shortest) running time fb(n), the worst (that is, longest) running time fw(n), the average running time fa(n) on all possible inputs (of size n) and the expected running time fe(n) for a randomly chosen input (of size n). In typical situations, fw(n), fa(n) and fe(n) are of the same order, in which case we simply denote, by running time, one of these functions. If this is not the case, an unqualified use of the phrase running time would denote the worst running time fw(n).
The order notation, though apparently attractive and useful, has certain drawbacks. First, it depicts the behaviour of functions (like running times) as the input size tends to infinity. In practice, one always has finite input sizes. One can check that if f(n) = n^100 and g(n) = (1.01)^n are the running times of two algorithms A and B respectively (for solving the same problem), then f(n) ≤ g(n) if and only if n = 1 or n ≥ 117,309. But then if the input size is only 1,000, one would prefer the exponential-time algorithm B over the polynomial-time algorithm A. Thus asymptotic estimates need not guarantee correct suggestions at practical ranges of interest. On the other hand, an algorithm which is a product of human intellect does not tend to have such extreme values for the parameters; that is, in a polynomial-time algorithm, the degree is usually ≤ 10, and the base for an exponential-time algorithm is usually not as close to 1 as 1.01 is. If we have f(n) = n^5 and g(n) = 2^n as the respective running times of the algorithms A and B, then A outperforms B (in terms of speed) for all n ≥ 23.
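The crossover point n = 23 quoted above is easy to verify by direct computation. The following C sketch (ours) scans n up to 63 with exact 64-bit arithmetic and finds the first n beyond which n^5 < 2^n holds permanently.

```c
#include <stdint.h>

/* Find the first n after which n^5 < 2^n holds for good: scan n = 2..63,
 * record the last n at which the "polynomial" cost n^5 still meets or
 * exceeds the "exponential" cost 2^n, and return the next n. */
unsigned poly_vs_exp_crossover(void)
{
    unsigned last = 1;
    for (unsigned n = 2; n < 64; n++) {
        uint64_t poly = 1;
        for (int i = 0; i < 5; i++) poly *= n;   /* poly = n^5, exact */
        if (poly >= ((uint64_t)1 << n))          /* n^5 >= 2^n ?      */
            last = n;
    }
    return last + 1;
}
```

The scan finds that 22^5 = 5,153,632 > 2^22 = 4,194,304 while 23^5 = 6,436,343 < 2^23 = 8,388,608, confirming the threshold n = 23.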
The second drawback of the order notation is that it suppresses the constant of proportionality; that is, an algorithm whose running time is 100n2 has the same order as one whose running time is n2. This is, however, a situation that we cannot neglect in practice. In particular, when we compare two different implementations of the same algorithm, the one with a smaller constant of proportionality is more desirable than the one with a larger constant. This is where implementation tricks prove to be important and even indispensable for large-scale applications.
A deterministic algorithm is one that always follows the same sequence of computations (and thereby produces the same output) for a given input. The deterministic running time of a computational problem P is the fastest of the running times (in order notation) of the known algorithms to solve P.
If an algorithm makes some random choices during execution, we call the algorithm randomized or probabilistic. The exact sequence of computations followed by the algorithm depends on these random choices and as a result different executions of the same algorithm may produce different outputs for a given input. At first glance, randomized algorithms look useless, because getting different outputs for a given input is apparently not what one would really want. But there are situations where this is desirable. For example, in an implementation of the RSA protocol, one generates random primes p and q of given bit lengths. Here we require our prime generation procedure to produce different primes during different executions (that is, for different entities on the net).
More importantly, randomized algorithms often provide practical computational solutions for many problems for which no practical deterministic algorithms are known. We will shortly encounter many such situations, where randomized algorithms are the simplest and/or fastest algorithms known. However, this sudden enhancement in performance by random choices does not come for free. To explain the so-called darker sides of randomization, we describe two different types of randomized algorithms.
A Monte Carlo algorithm is a randomized algorithm that may produce incorrect outputs. However, for such an algorithm to be useful, we require that the running time be always small and the probability of an error sufficiently low. A good example of a Monte Carlo algorithm is the Miller–Rabin algorithm (Algorithm 3.13) for testing the primality of an integer. For an integer of bit size n, the Miller–Rabin test with t iterations runs in time O(tn³). Whenever the algorithm outputs false, it is always correct. But an answer of true is incorrect with an error probability ≤ 2^(−2t), that is, it certifies a composite integer as a prime with probability ≤ 2^(−2t). For t = 20, an error is expected to occur less than once in every 10^12 executions. With this little sacrifice we achieve a running time of O(n³) (for a fixed t), whereas the best deterministic primality-testing algorithm (known to the authors at the time of writing this book) takes time O(n^7.5) and hence is not practical.
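To make the Monte Carlo flavour concrete, here is a compact C sketch of the Miller–Rabin test restricted to 32-bit integers (so that all intermediate products fit in 64 bits). This is only an illustration of the idea, not the book's Algorithm 3.13: a false answer is always correct, while a true answer errs with probability at most 4^(−t) over the random choice of bases.

```c
#include <stdint.h>
#include <stdlib.h>

/* One round: is n a strong probable prime to base a (n odd, n > 3)? */
int strong_probable_prime(uint32_t n, uint32_t a)
{
    uint32_t d = n - 1;
    int s = 0;
    while ((d & 1) == 0) { d >>= 1; s++; }   /* n - 1 = 2^s * d, d odd */
    uint64_t x = 1, b = a % n, e = d;
    while (e) {                              /* x = a^d mod n, by        */
        if (e & 1) x = (x * b) % n;          /* square-and-multiply      */
        b = (b * b) % n;
        e >>= 1;
    }
    if (x == 1 || x == n - 1) return 1;
    for (int i = 1; i < s; i++) {            /* square s - 1 more times  */
        x = (x * x) % n;
        if (x == n - 1) return 1;
    }
    return 0;
}

/* Monte Carlo primality test with t random bases: an answer of 0
 * (composite) is always correct; an answer of 1 (prime) errs with
 * probability at most 4^(-t). */
int miller_rabin(uint32_t n, int t)
{
    if (n < 2 || (n > 2 && n % 2 == 0)) return 0;
    if (n < 4) return 1;                     /* 2 and 3 are prime */
    for (int i = 0; i < t; i++) {
        uint32_t a = 2 + (uint32_t)(rand() % (n - 3));  /* a in [2, n-2] */
        if (!strong_probable_prime(n, a)) return 0;
    }
    return 1;
}
```

Note the asymmetry typical of Monte Carlo algorithms: composites are rejected with certainty once a witness is found, while "prime" is only a high-confidence verdict.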
A Las Vegas algorithm is a randomized algorithm which always produces the correct output. However, the running time of such an algorithm depends on the random choices made. For such an algorithm to be useful, we expect that for most random choices the running time is small. As an example, consider the problem of finding a random (monic) irreducible polynomial of degree n over a finite field. Algorithm 3.22 tests the irreducibility of such a polynomial in deterministic polynomial time. We generate random polynomials of degree n and check the irreducibility of these polynomials by Algorithm 3.22. From Section 2.9.2, we know that a randomly chosen monic polynomial of degree n over a finite field is irreducible with an approximate probability of 1/n. This implies that after O(n) random polynomials are tried, one expects to find an irreducible polynomial. The resulting Las Vegas algorithm (Algorithm 3.23) runs in expected polynomial time. It may, however, happen that for certain random choices we keep on generating reducible polynomials an exponential number of times, but the likelihood of such an accident is very, very low (Exercise 3.5).
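The Las Vegas pattern can be illustrated over the field with two elements, where a polynomial of small degree can be packed into the bits of a machine word. The sketch below (ours; it uses naive trial division for the irreducibility test, not the book's Algorithm 3.22, and is practical only for small n) keeps drawing random monic polynomials until an irreducible one appears — the output is always correct, only the number of trials is random.

```c
#include <stdint.h>
#include <stdlib.h>

/* Polynomials over the binary field, packed into bits:
 * bit i of the word is the coefficient of x^i. */

int poly_deg(uint32_t f) { int d = -1; while (f) { d++; f >>= 1; } return d; }

/* Remainder of f modulo g (g != 0), by repeated XOR of shifted g. */
uint32_t poly_mod(uint32_t f, uint32_t g)
{
    int dg = poly_deg(g);
    for (int d = poly_deg(f); d >= dg; d--)
        if (f & ((uint32_t)1 << d))
            f ^= g << (d - dg);
    return f;
}

/* Naive irreducibility test: trial division by every polynomial of
 * degree 1 .. n/2 (a reducible f has a factor in this range). */
int poly_irreducible(uint32_t f)
{
    int n = poly_deg(f);
    for (uint32_t g = 2; poly_deg(g) <= n / 2; g++)
        if (poly_mod(f, g) == 0) return 0;
    return 1;
}

/* Las Vegas search: draw random monic degree-n polynomials (with nonzero
 * constant term, since otherwise x divides f) until an irreducible one
 * appears; by the 1/n density estimate this takes O(n) expected trials. */
uint32_t random_irreducible(int n)
{
    for (;;) {
        uint32_t f = ((uint32_t)1 << n)                       /* monic  */
                   | ((uint32_t)rand() & (((uint32_t)1 << n) - 1))
                   | 1u;                                      /* f(0)=1 */
        if (poly_irreducible(f)) return f;
    }
}
```

For instance, x² + x + 1 (bit pattern 0b111) passes the test, while x⁴ + 1 (bit pattern 0x11) fails, being the fourth power of x + 1.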
An algorithm is said to be a probabilistic or randomized polynomial-time algorithm, if it is either a Monte Carlo algorithm with polynomial worst running time or a Las Vegas algorithm with polynomial expected running time. Both the above examples of randomized algorithms are probabilistic polynomial-time algorithms. A combination of these two types of algorithms can also be conceived; namely, algorithms that produce correct outputs with high probability and have polynomial expected running time. Some computational problems are so challenging that even such probably correct and probably fast algorithms are quite welcome.
We finally note that there are certain computational problems for which the deterministic running time is exponential and for which randomization also does not help much. In some cases, we have subexponential randomized algorithms which are still too slow to be of reasonable practical use. Some of these so-called intractable problems are at the heart of the security of many public-key cryptographic protocols.
In the last two sections, we have introduced theoretical measures (the order notations) for estimating the (known) difficulty of solving computational problems. In this section, we introduce another concept by which we can compare the relative difficulty of two computational problems.
Let P₁ and P₂ be two computational problems. We say that P₁ is polynomial-time reducible to P₂, and denote this as P₁ ≤_P P₂, if there is a polynomial-time algorithm which, given a solution of P₂, provides a solution for P₁. This means that if P₁ ≤_P P₂, then the problem P₁ is no more difficult than P₂, apart from the extra polynomial-time reduction effort. In that case, if we know an algorithm to solve P₂ in polynomial time, then we have a polynomial-time algorithm for P₁ too. If P₁ ≤_P P₂ and P₂ ≤_P P₁, we say that the problems P₁ and P₂ are polynomial-time equivalent and write P₁ ≅ P₂.
In order to give an example of these concepts, we let G be a finite cyclic multiplicative group of order n and g a generator of G. The discrete logarithm problem (DLP) is the problem of computing, for a given a ∈ G, an integer x such that a = g^x. The Diffie–Hellman problem (DHP), on the other hand, is the problem of computing g^xy from the given values of g^x and g^y. If one can compute y from g^y, one can also compute g^xy = (g^x)^y by performing an exponentiation in the group G. Therefore, DHP ≤_P DLP, if exponentiations in G can be computed in polynomial time. In other words, if a solution for DLP is known, a solution for DHP is also available: that is, DHP is no more difficult than DLP except for the additional exponentiation effort. However, the reverse implication (that is, whether DLP ≤_P DHP) is not known for many groups.
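The exponentiation underlying both DLP and DHP is itself a polynomial-time operation, computed by square-and-multiply. The following C sketch (ours; restricted to moduli below 2³² so that products fit in 64 bits) computes g^e mod p and can be used to verify the identity (g^x)^y = (g^y)^x on which the Diffie–Hellman key exchange rests.

```c
#include <stdint.h>

/* Square-and-multiply exponentiation in the multiplicative group modulo p.
 * Requires p < 2^32 so that every intermediate product fits in 64 bits. */
uint64_t pow_mod(uint64_t g, uint64_t e, uint64_t p)
{
    uint64_t r = 1 % p;
    g %= p;
    while (e) {
        if (e & 1) r = (r * g) % p;   /* multiply in the current bit */
        g = (g * g) % p;              /* square for the next bit     */
        e >>= 1;
    }
    return r;
}
```

Both parties arrive at the same group element: (g^x)^y = g^xy = (g^y)^x, each using only its own secret exponent and the other party's public value.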
So far we have assumed that our reduction algorithms are deterministic. If we allow randomized (that is, probabilistic) polynomial-time reduction algorithms, we can similarly introduce the concepts of randomized polynomial-time reducibility and of randomized polynomial-time equivalence. We urge the reader to formulate the formal definitions for these concepts.
3.1

3.2

3.3 Suppose that an algorithm A takes as input a bit string and runs in time g(t), where t is the number of one-bits in the input string. Let f_b(n), f_w(n), f_a(n) and f_e(n) respectively denote the best, worst, average and expected running times of A for inputs of size n. Derive the following table under the assumption that each of the 2^n bit strings of length n is equally likely.

3.4

3.5 Consider the Las Vegas algorithm discussed in Section 3.2.2 for generating a random irreducible polynomial of degree n over a finite field. Assume that a randomly chosen polynomial of degree n has (an exact) probability of 1/n of being irreducible. Find the probability p_r that r polynomials chosen randomly (with repetition) are all reducible. For n = 1000, calculate the numerical values of p_r for r = 10^i, i = 1, . . . , 6, and find the smallest integers r for which p_r ≤ 1/2 and p_r ≤ 10^(−12). Find the expected number of polynomials tested for irreducibility before the algorithm terminates.

3.6 Let n = pq be the product of two distinct primes p and q. Show that factoring n is polynomial-time equivalent to computing φ(n) = (p−1)(q−1), where φ is Euler's totient function. (Assume that an arithmetic operation (including computation of integer square roots) on integers of bit size t can be performed in polynomial time (in t).)

3.7 Let G be a finite cyclic multiplicative group and let H be the subgroup of G generated by an element h ∈ G whose order is known. The generalized discrete logarithm problem (GDLP) is the following: Given a ∈ G, find out if a ∈ H and, if so, find an integer x for which a = h^x. Show that GDLP ≅ DLP, if exponentiations in G can be carried out in polynomial time and if DLP in H is polynomial-time equivalent to DLP in G. [H]
Cryptographic protocols based on the rings ℤ_n and the fields 𝔽_p demand n and p to be sufficiently large (of bit length ≥ 512) in order to achieve the desired level of security. However, standard compilers do not support data types that hold integers of this size with full precision. For example, C compilers support integers of size ≤ 64 bits. So one must employ custom-designed data types for representing and working with such big integers. Many libraries are already available that can handle integers of arbitrary length. FREELIP, GMP, LiDIA, NTL and ZEN are some such libraries that are even freely available.
Alternatively, one may design one's own functions for multiple-precision integers. Such a programming exercise is not very difficult, but making the functions run efficiently is a huge challenge. Several tricks and optimization techniques can turn a naive implementation into a much faster and more memory-efficient code, and it takes years of experimental experience to master the subtleties. Theoretical asymptotic estimates might serve as a guideline, but only experimentation can settle the relative merits and demerits of the available algorithms for input sizes of practical interest. For example, the theoretically fastest algorithm known for multiplying two multiple-precision integers is based on the so-called fast Fourier transform (FFT) techniques. But our experience shows that this algorithm starts to outperform other common but asymptotically slower algorithms only when the input size is at least several thousand bits. Since such very large integers are rarely needed by cryptographic protocols, FFT-based multiplication is not useful in this context.
In order to represent a large integer, we break it up into small parts and store each part in a memory word[3] accessible by built-in data types. The simplest way to break up a (positive) integer a is to predetermine a radix ℜ and compute the ℜ-ary representation (as–1, . . . , a0)ℜ of a (see Exercise 3.8). One should have ℜ ≤ 2^32 so that each ℜ-ary digit ai can be stored in a memory word. For the sake of efficiency, it is advisable to take ℜ to be a power of 2. It is also expedient to take ℜ as large as possible, because smaller values of ℜ lead to (possibly) larger sizes s and thereby add to the storage requirement and also to the running time of arithmetic functions. The best choice is ℜ = 2^32. We denote by ulong a built-in unsigned integer data type provided by the compiler (like the ANSI C standard unsigned long). We use an array of ulong for storing the digits. The array can be static or dynamic. Though dynamic arrays are more storage-efficient (because they can be allocated only as much memory as needed), they have memory allocation and deallocation overheads and are somewhat more complicated to programme than static arrays. Moreover, for cryptographic protocols one typically needs integers no longer than 4096 bits. Since the product of two integers of bit size t has bit size ≤ 2t, a static array of 8192/32 = 256 ulong suffices for storing cryptographic integers. It is also necessary to keep track of the actual size of an integer, since filling up with leading 0 digits is not an efficient strategy. Finally, it is often useful to have a signed representation of integers. A sign bit is also necessary in this case. We state three possible declarations in Exercise 3.11.
[3] We assume that a word in the memory is 32 bits long.
We now describe the implementations of addition, subtraction, multiplication and Euclidean division of multiple-precision integers. Every other complex operation (like modular arithmetic and gcd) is based on these primitives. It is, therefore, of utmost importance to write efficient code for these basic operations.
For integers of cryptographic sizes, the most efficient algorithms are the standard ones we use for doing arithmetic on decimal numbers, that is, for two positive integers a = as–1 . . . a0 and b = bt–1 . . . b0 we compute the sum c = a + b = cr–1 . . . c0 as follows. We first compute a0 + b0. If this sum is ≥ ℜ, then c0 = a0 + b0 – ℜ and the carry is 1, otherwise c0 = a0 + b0 and the carry is 0. We then compute a1 + b1 plus the carry available from the previous digit, and compute c1 and the next carry as before.
For computing the product d = ab = dl–1 . . . d0, we do the usual quadratic procedure; namely, we initialize all the digits of d to 0 and for each i = 0, . . . , s – 1 and j = 0, . . . , t – 1 we compute aibj and add it to the (i + j)-th digit of d. If this sum (call it σ) at the (i + j)-th location exceeds ℜ – 1, we find out q, r with σ = qℜ + r, r < ℜ. Then di+j is assigned r, and q is added to the (i + j + 1)-st location. If that addition results in a carry, we propagate the carry to higher locations until it gets fully absorbed in some word of d.
All this sounds simple, but complications arise when we consider the fact that the sum of two 32-bit words (and a possible carry from the previous location) may be 33 bits long. For multiplication, the situation is even worse, because the product aibj can be 64 bits long. Since our machine word can hold only 32 bits, it becomes problematic to hold all these intermediate sums and products to full precision. We assume that the least significant 32 bits are correctly returned and assigned to the output variable (ulong), whereas the leading 32 bits are lost.[4] The most efficient way to keep track of these overflows is to use assembly instructions, and this is what many number theory packages (like PARI and UBASIC) do. But this means that for every target architecture we have to write different assembly code. Here we describe certain tricks that make it possible to grab the overflow information with only high-level languages, without significantly degrading the performance compared to assembly instructions.
[4] This is the typical behaviour of a CPU that supports 2’s complement arithmetic.
First consider the sum ai + bi. We compute the least significant 32 bits by assigning ci = ai + bi. It is easy to see that an overflow occurs during this sum if and only if ci < ai. We set the output carry accordingly. Now, let us consider the situation when we have an input carry: that is, when we compute the sum ci = ai + bi + 1. Here an overflow occurs if and only if ci ≤ ai. Algorithm 3.1 performs this addition of words.
|
Input: Words ai and bi and the input carry γi. Output: Word ci and the output carry δi. Steps: ci := ai + bi. if (γi) { ci++, δi := ( (ci ≤ ai) ? 1 : 0 ). } else { δi := ( (ci < ai) ? 1 : 0 ). } |
Algorithm 3.1 assumes that ci and ai are stored in different memory words. If this is not the case, we should store ai + bi in a temporary variable and, after the second line, ci should be assigned the value of this temporary variable. Note also that many processors provide an increment primitive which is faster than the general addition primitive. In that case, the statement ci++ is preferable to ci := ci+1.
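The carry-detection trick of Algorithm 3.1 can be sketched in C as follows. The function name add_word and the use of the C99 fixed-width type uint32_t (in the role of ulong) are our choices; a temporary variable is used so that the output may alias an operand, as discussed above.

```c
#include <stdint.h>

/* A sketch of the word-addition primitive of Algorithm 3.1.  The sum of
   a and b (plus the input carry gin) is computed modulo 2^32; the output
   carry is recovered from the comparisons described in the text. */
static unsigned add_word(uint32_t a, uint32_t b, unsigned gin, uint32_t *c)
{
    uint32_t t = a + b;               /* low 32 bits of a + b       */
    unsigned gout;
    if (gin) {
        t++;                          /* absorb the input carry     */
        gout = (t <= a) ? 1 : 0;      /* overflow iff c_i <= a_i    */
    } else {
        gout = (t < a) ? 1 : 0;       /* overflow iff c_i <  a_i    */
    }
    *c = t;
    return gout;
}
```

A complete multiple-precision addition then simply chains this primitive from the least significant word upwards.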
For subtraction, we proceed analogously from right to left and keep track of the borrow. Here the check for overflow can be done before the subtraction of words is carried out (and, therefore, no temporary variable is needed, if we assume that the output carry is not stored in the location of the operands).
|
Input: Words ai and bi and the input borrow γi. Output: Word ci and the output borrow δi. Steps: if (γi) { δi := ( (ai ≤ bi) ? 1 : 0 ), ci := ai – bi, ci – –. } else { δi := ( (ai < bi) ? 1 : 0 ), ci := ai – bi. } |
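A matching C sketch of the word-subtraction primitive; as noted above, the borrow can be tested before the subtraction is carried out, so no temporary variable is needed (sub_word is our name, not the book's):

```c
#include <stdint.h>

/* Word subtraction with borrow: the output borrow is decided before
   a - b is formed, exactly as in the box above. */
static unsigned sub_word(uint32_t a, uint32_t b, unsigned gin, uint32_t *c)
{
    unsigned gout;
    if (gin) {
        gout = (a <= b) ? 1 : 0;   /* borrow iff a_i <= b_i */
        *c = a - b - 1;
    } else {
        gout = (a < b) ? 1 : 0;    /* borrow iff a_i <  b_i */
        *c = a - b;
    }
    return gout;
}
```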
We urge the reader to develop the complete addition and subtraction procedures for multiple-precision integers, based on the above primitives for words.
The product of two 32-bit words can be as long as 64 bits, and we plan to (compute and) store this product in two words. Assuming the availability of a built-in 64-bit unsigned integer data type (which we will henceforth denote as ullong), this can be performed as in Algorithm 3.3.
|
Input: Words a and b. Output: Words c and d with ab = cℜ + d. Steps: /* We use a temporary variable t of data type ullong */ t := (ullong)(a) * (ullong)(b), c := (ulong)(t ≫ 32), d := (ulong)t. |
We use a temporary 64-bit integer variable t to store the product ab. The lower 32 bits of t are stored in d by simple typecasting, whereas the higher 32 bits of t are obtained by right-shifting t (the operator ≫) by 32 bits. This is a reasonable strategy, given that we do not explore assembly-level instructions. Algorithm 3.4 describes a multiplication algorithm for two multiple-precision integer operands that does not directly use the word-multiplying primitive of Algorithm 3.3.
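In C, the typecasting-and-shift strategy of Algorithm 3.3 can be sketched as follows, with uint64_t in the role of ullong (mul_word is our name):

```c
#include <stdint.h>

/* Word multiplication as in Algorithm 3.3: the 64-bit product of two
   32-bit words is split into a high word c and a low word d, so that
   a*b = c*2^32 + d. */
static void mul_word(uint32_t a, uint32_t b, uint32_t *c, uint32_t *d)
{
    uint64_t t = (uint64_t)a * (uint64_t)b;
    *c = (uint32_t)(t >> 32);   /* high 32 bits */
    *d = (uint32_t)t;           /* low 32 bits  */
}
```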
The reader can verify easily that this code properly computes the product. We now highlight how this makes the computation efficient. The intermediate results are stored in the array t of 64-bit ullong. This means that after the 64-bit product aibj of words ai and bj is computed (in the temporary variable T), we directly add T to the location ti+j. If the sum exceeds ℜ^2 – 1 = 2^64 – 1, that is, if an overflow occurs, we should add ℜ to ti+j+1 or, equivalently, 1 to ti+j+2. This last addition is one of ullong integers and can be made more efficient if it is replaced by ulong increments, and this is what we do using the temporary array u. Since the quadratic loop is the bottleneck of the multiplication procedure, it is absolutely necessary to make this loop as efficient as possible.
|
Input: Integers a = (ar–1 . . . a0)ℜ and b = (bs–1 . . . b0)ℜ Output: The product c = (cr+s–1 . . . c0)ℜ = ab. Steps: /* Let T be a variable and t0, . . . , tr+s–1 an array of ullong variables */ /* Let v be a variable and u0, . . . , ur+s–1 an array of ulong variables */ Initialize the array locations ci, ti and ui to 0 for all i = 0, . . . , r + s – 1. /* The quadratic loop */ |
After the quadratic loop, we do deferred normalization from the array of 64-bit double-words ti to the array of 32-bit words ci. This is done using the typecasting and right-shift strategy mentioned in Algorithm 3.3. We should also take care of the intermediate carries stored in the array u. The normalization loop takes a total time of O(r + s), whereas the quadratic loop takes time O(rs). If we had done normalization inside the quadratic loop itself, that would incur an additional O(rs) cost (which is significantly more than that of deferred normalization).
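The quadratic loop with deferred normalization can be sketched in C as follows. The 64-bit accumulators t[] play the role of the book's ullong array, and overflows past 2^64 are recorded as increments in u[]; the function name mp_mul and the fixed bound MP_MAX are our assumptions.

```c
#include <stdint.h>

enum { MP_MAX = 64 };   /* assumed maximum operand size in 32-bit words */

/* Schoolbook multiplication with deferred normalization.  The word
   products a[i]*b[j] are accumulated in 64-bit slots t[i+j]; if such a
   slot overflows 2^64 = R^2, the overflow is recorded as a unit at
   position i+j+2 in u[].  A single O(r+s) pass then normalizes t and u
   into the radix-2^32 digits of the product c (least significant first). */
static void mp_mul(const uint32_t *a, int r, const uint32_t *b, int s,
                   uint32_t *c /* r + s words */)
{
    uint64_t t[2 * MP_MAX + 1] = {0}, T, carry;
    uint32_t u[2 * MP_MAX + 1] = {0};
    int i, j;

    for (i = 0; i < r; i++)                 /* the quadratic loop */
        for (j = 0; j < s; j++) {
            T = (uint64_t)a[i] * b[j];
            t[i + j] += T;
            if (t[i + j] < T)               /* 64-bit overflow    */
                u[i + j + 2]++;
        }

    carry = 0;                              /* deferred normalization */
    for (i = 0; i < r + s; i++) {
        T = carry + (t[i] & 0xFFFFFFFFu) + u[i];
        c[i] = (uint32_t)T;
        carry = (T >> 32) + (t[i] >> 32);
    }
}
```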
If both the operands a and b of the multiplication are the same, it is not necessary to compute aibj and ajbi separately. We should add to ti+j the product ai^2, if i = j, or the product 2aiaj, if i < j. Note that 2aiaj can be computed by left-shifting aiaj by one bit. This might result in an overflow, which can be checked before shifting by looking at the 64th bit of aiaj. Algorithm 3.5 incorporates these changes.
For the multiplication of two multiple-precision integers, there are algorithms that are asymptotically faster than the quadratic Algorithms 3.4 and 3.5. However, not all these theoretically faster algorithms are practical for the sizes of integers used in cryptology. Our practical experience shows that a strategy due to Karatsuba outperforms the quadratic algorithm, if both the operands are of roughly equal sizes and if the bit lengths of the operands are 300 or more. We describe Karatsuba’s algorithm in connection with squaring, where the two operands are the same (and hence of the same size). Suppose we want to compute a^2 for a multiple-precision integer a = (ar–1 . . . a0)ℜ. We first break a into two integers of almost equal sizes, namely, α := (ar–1 . . . at)ℜ and β := (at–1 . . . a0)ℜ, so that a = ℜ^t α + β. Now, a^2 = α^2 ℜ^{2t} + 2αβ ℜ^t + β^2 and 2αβ = (α^2 + β^2) – (α – β)^2. We recursively invoke Karatsuba’s multiplication with operands α, β and α – β. Recursion continues as long as the operands are not too small and the depth of recursion is within a prescribed limit. One can check that Karatsuba’s algorithm runs in time O(r^{lg 3} lg r) = O(r^{1.585} lg r), which is a definite improvement over the O(r^2) running time of the quadratic algorithm.
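The identity 2αβ = (α^2 + β^2) – (α – β)^2 can be seen in miniature at the word level. The toy function below squares a 32-bit integer from its 16-bit halves with three squarings and no general multiplication; it illustrates only a single recursion step of Karatsuba's method, and its name and sizes are our choices.

```c
#include <stdint.h>

/* One level of Karatsuba squaring in miniature: a = 2^16*alpha + beta,
   a^2 = alpha^2*2^32 + 2*alpha*beta*2^16 + beta^2, where the middle
   term comes from 2*alpha*beta = (alpha^2 + beta^2) - (alpha - beta)^2. */
static uint64_t karatsuba_square32(uint32_t a)
{
    uint32_t alpha = a >> 16, beta = a & 0xFFFFu;
    uint64_t a2  = (uint64_t)alpha * alpha;       /* alpha^2          */
    uint64_t b2  = (uint64_t)beta * beta;         /* beta^2           */
    int32_t  d   = (int32_t)alpha - (int32_t)beta;
    uint64_t d2  = (uint64_t)((int64_t)d * d);    /* (alpha - beta)^2 */
    uint64_t mid = a2 + b2 - d2;                  /* = 2*alpha*beta   */
    return (a2 << 32) + (mid << 16) + b2;
}
```

In the real algorithm, the three squarings become three recursive calls on operands of half the size, which is where the exponent lg 3 comes from.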
|
for (i = 0, . . . , r – 1) and (j = i, . . . , r – 1) { |
The best-known algorithm for multiplication of two multiple-precision integers is based on the fast Fourier transform (FFT) techniques and has running time O~(r). However, for integers used in cryptology this algorithm is usually not practical. Therefore, we will not discuss FFT multiplication in this book.
Euclidean division with remainder of multiple-precision integers is somewhat cumbersome, although conceptually as difficult (that is, as simple) as the division procedure for decimal integers taught in the early days of school. The most challenging part of the procedure is guessing the next digit of the quotient. For decimal integers, we usually do this by looking at the first few (decimal) digits of the divisor and the dividend. This need not give us the correct digit, but something close to it. In the case of ℜ-ary digits, we also make a guess of the quotient digit based on a few leading ℜ-ary digits of the divisor and the dividend, but certain precautions have to be taken to ensure that the guess is not too different from the correct one.
Suppose we are given positive integers a = (ar–1 . . . a0)ℜ and b = (bs–1 . . . b0)ℜ with ar–1 ≠ 0 and bs–1 ≠ 0, and we want to compute the integers x = (xr–s . . . x0)ℜ and y = (ys–1 . . . y0)ℜ with a = xb + y, 0 ≤ y < b. First, we want that bs–1 ≥ ℜ/2 (we will see why later). If this condition is not already met, we force it by multiplying both a and b by 2^t for some suitable t, 0 < t < 32. In that case, the quotient remains the same, but the remainder gets multiplied by 2^t. The desired remainder can later be recovered easily by right-shifting the computed remainder by t bits. The process of making bs–1 ≥ ℜ/2 is often called normalization (of b). Henceforth, we will assume that b is normalized. Note that normalization may increase the word size of a by 1.
|
Input: Integers a = (ar–1 . . . a0)ℜ and b = (bs–1 . . . b0)ℜ with r ≥ 3, s ≥ 2, ar–1 ≠ 0, bs–1 ≥ ℜ/2 and a ≥ b. Output: The quotient x = (xr–s . . . x0)ℜ = a quot b and the remainder y = (ys–1 . . . y0)ℜ = a rem b of Euclidean division of a by b. Steps: Initialize the quotient digits xi to 0 for i = 0, . . . , r – s. |
Algorithm 3.6 implements multiple-precision division. It is not difficult to prove the correctness of the algorithm. We refrain from doing so, but make some useful comments. The initial check inside the main loop may cause the increment of xi–s+1. This may lead to a carry which has to be adjusted into higher digits. This carry propagation is not mentioned in the code for simplicity. Since b is assumed to be normalized, this initial check needs to be carried out only once; that is, for a non-normalized b we would have to replace the if statement by a while loop. This is the first advantage of normalization. In the first step of guessing the quotient digit xi–s, we compute ⌊(aiℜ + ai–1)/bs–1⌋ using ullong arithmetic. At this point, the guess is based only on two leading digits of a and one leading digit of b. In the while loop, we refine this guess by considering one more digit of a and of b. Since b is normalized, this while loop is executed no more than twice (the second advantage of normalization). The guess for xi–s made in this way is either equal to or one more than the correct value, which is then computed by comparing a with xi–s b ℜ^{i–s}. The running time of the algorithm is O(s(r – s)). For a fixed r, this is maximized (at O(r^2)) when s ≈ r/2.
Multiplication and division by a power of 2 can be carried out more efficiently using bit operations (on words) instead of calling the general procedures just described. It is also often necessary to compute the bit length of a non-zero multiple-precision integer and the multiplicity of 2 in it. In these cases too, one should use bit operations for efficiency. For these implementations, it is advantageous to maintain precomputed tables of the constants 2^i, i = 0, . . . , 31, and of 2^i – 1, i = 0, . . . , 32, rather than computing them in situ every time they are needed. In Algorithm 3.7, we describe an implementation of multiplication by a power of 2 (that is, the left shift operation). We use the symbols OR, ≫ and ≪ to denote the bit-wise or, right shift and left shift operations on 32-bit integers.
|
Input: Integer a = (ar–1 . . . a0)ℜ ≠ 0, ar–1 ≠ 0, and an integer t ≥ 0. Output: The integer c = (cs–1 . . . c0)ℜ = a · 2^t, cs–1 ≠ 0. Steps: u := t quot 32, v := t rem 32. |
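A C sketch of the left shift of Algorithm 3.7, with t decomposed as u = t quot 32 and v = t rem 32 as in the box above; the name mp_shl and the return convention (the new word size) are ours.

```c
#include <stdint.h>

/* Multiplies a = (a[r-1] ... a[0]) in radix 2^32 by 2^t, writing the
   result into c and returning its word size.  The whole-word part of
   the shift becomes an offset u; the remaining v bits are handled with
   the <<, >> and | operations on words. */
static int mp_shl(const uint32_t *a, int r, unsigned t, uint32_t *c)
{
    unsigned u = t / 32, v = t % 32;
    int i;
    for (i = 0; i < (int)u; i++)
        c[i] = 0;                       /* u least significant zero words */
    if (v == 0) {
        for (i = 0; i < r; i++)
            c[i + u] = a[i];
        return r + u;
    } else {
        uint32_t carry = 0;
        for (i = 0; i < r; i++) {
            c[i + u] = (a[i] << v) | carry;
            carry = a[i] >> (32 - v);   /* bits shifted out of word i */
        }
        c[r + u] = carry;
        return r + u + (carry ? 1 : 0);
    }
}
```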
Unless otherwise mentioned, we will henceforth forget about the above structural representation of multiple-precision integers and denote arithmetic operations on them by the standard symbols (+, –, * or · or ×, quot, rem and so on).
Computing the greatest common divisor of two (multiple-precision) integers has important applications. In this section, we assume that we want to compute the (positive) gcd of two positive integers a and b. The Euclidean gcd loop comprising repeated division (Proposition 2.15) is not usually the most efficient way to compute integer gcds. We describe the binary gcd algorithm that turns out to be faster for practical bit sizes of the operands a and b. If a = 2^r a′ and b = 2^s b′ with a′ and b′ odd, then gcd(a, b) = 2^min(r,s) gcd(a′, b′). Therefore, we may assume that a and b are odd. In that case, if a > b, then gcd(a, b) = gcd(a – b, b) = gcd((a – b)/2^t, b), where t := v2(a – b) is the multiplicity of 2 in a – b. Since the sum of the bit sizes of (a – b)/2^t and b is strictly smaller than that of a and b, repeating the above computation terminates the algorithm after finitely many iterations.
|
Input: Two positive integers a, b with a ≥ b and b odd. Output: Integers d, u and v with d = gcd(a, b) = ua + vb > 0. If (a, b) ≠ (1, 1), then |u| < b and |v| < a. Steps: /* Initial reduction */ |
Multiple-precision division is much costlier than subtraction followed by division by a power of 2. This is why the binary gcd algorithm outperforms the Euclidean gcd algorithm. However, if the bit sizes of a and b differ considerably, it is preferable to use Euclidean division once and replace the pair (a, b) by (b, a rem b) before entering the binary gcd loop. Even when the original bit sizes of a and b are not much different, one may carry out this initial reduction, because in this case Euclidean division does not take much time.
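The plain (non-extended) binary gcd loop can be sketched on single words as follows; only shifts, subtractions and comparisons are used, which is exactly why the method beats repeated Euclidean division (binary_gcd is our name):

```c
#include <stdint.h>

/* Binary gcd, mirroring the reductions in the text: strip the common
   power of 2, then repeat gcd(a, b) = gcd((a - b)/2^t, b) for odd a, b. */
static uint64_t binary_gcd(uint64_t a, uint64_t b)
{
    unsigned shift = 0;
    uint64_t t;
    if (a == 0) return b;
    if (b == 0) return a;
    while (((a | b) & 1u) == 0) { a >>= 1; b >>= 1; shift++; }
    while ((a & 1u) == 0) a >>= 1;     /* make a odd */
    while ((b & 1u) == 0) b >>= 1;     /* make b odd */
    while (a != b) {
        if (a < b) { t = a; a = b; b = t; }
        a -= b;                        /* a - b is even: both were odd */
        while ((a & 1u) == 0) a >>= 1;
    }
    return a << shift;                 /* restore the factor 2^min(r,s) */
}
```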
Recall from Proposition 2.16 that if d := gcd(a, b), then for some integers u and v we have d = ua + vb. Computation of d along with a pair of integers u, v is called the extended gcd computation. Both the Euclidean and the binary gcd loops can be augmented to compute these integers u and v. Since binary gcd is faster than Euclidean gcd, we describe an implementation of the extended binary gcd algorithm. We assume that 0 < b ≤ a and compute u and v in such a way that if (a, b) ≠ (1, 1), then |u| < b and |v| < a. Algorithm 3.8, which shows the details, requires b to be odd. The other operand a may also be odd, though the working of the algorithm does not require this.
In order to prove the correctness of Algorithm 3.8, we introduce the sequence of integers xk, yk, u1,k, u2,k, v1,k and v2,k for k = 0, 1, 2, . . . , initialized as:
| x0 := b, | u1, 0 := 1, | v1, 0 := 0, |
| y0 := r, | u2, 0 := 0, | v2, 0 := 1. |
During the k-th iteration of the main loop, k = 1, 2, . . . , we modify the values xk–1, yk–1, u1,k–1, u2,k–1, v1,k–1 and v2,k–1 to xk, yk, u1,k, u2,k, v1,k and v2,k in such a way that we always maintain the relations:
| u1,kx0 + v1,ky0 | = | xk, |
| u2,kx0 + v2,ky0 | = | yk. |
The main loop terminates when xk = 0, and at that point we have the desired relation yk = gcd(b, r) = u2,kb + v2,kr. For the updating during the k-th iteration, we assume that xk–1 ≥ yk–1. (The converse inequality can be handled analogously.) The x and y values are updated as xk := (xk–1 – yk–1)/2tk, yk := yk–1, where tk := v2(xk–1 – yk–1). Thus, we have u2,k = u2,k–1 and v2,k = v2,k–1, whereas if tk > 0, we write
u1,k = [u1,k–1 – u2,k–1 + λk y0]/2^{tk} and v1,k = [v1,k–1 – v2,k–1 – λk x0]/2^{tk}, where λk is an integer chosen so that v1,k–1 – v2,k–1 – λk x0 ≡ 0 (mod 2^{tk}).
All the expressions within square brackets in the last equation are integers, since x0 = b is odd. Note that updating the variables in the loop requires only the values of these variables available from the previous iteration. Therefore, we may drop the prefix k and call these variables x, y, u1, u2, v1 and v2. Moreover, the variables u1 and u2 need not be maintained and updated in every iteration, since the updating procedure for the other variables does not depend on the values of u1 and u2. We need the value of u2 only at the end of the main loop, and this is available from the relation y = u2b + v2r maintained throughout the loop. The formula u2b + v2r = y = gcd(b, r) is then combined with the relations a = qb + r and gcd(a, b) = gcd(b, r) to get the final relation gcd(a, b) = v2a + (u2 – v2q)b.
Algorithm 3.8 continues to work even when a < b, but in that case the initial reduction simply interchanges a and b and we forfeit the possibility of the reduction in size of the arguments (x and y) caused by the initial Euclidean division.
Finally, we remove the restriction that b is odd. We write a = 2^r a′ and b = 2^s b′ with a′, b′ odd and call Algorithm 3.8 with a′ and b′ as parameters (swapping a′ and b′, if a′ < b′) to compute integers d′, u′, v′ with d′ = gcd(a′, b′) = u′a′ + v′b′. Without loss of generality, assume that r ≥ s. Then d := gcd(a, b) = 2^s d′ = u′(2^s a′) + v′b. If r = s, then 2^s a′ = a and we are done. So assume that r > s. If u′ is even, we can extract a power of 2 from u′ and multiply 2^s a′ by this power. So let’s say that we have a situation of the form d = u(2^t a′) + vb for some integers u and v, with u odd, and for s ≤ t < r. We can rewrite this as d = (u + b′)(2^t a′) + (v – 2^{t–s} a′)b. Since u + b′ is even, this gives us d = u″(2^τ a′) + v″b, where τ > t and where u″ is odd or τ = r. Proceeding in this way, we eventually reach a relation of the form d = u(2^r a′) + vb = ua + vb. It is easy to check that if (a′, b′) ≠ (1, 1), then the integers u and v obtained as above satisfy |u| < b and |v| < a.
So far, we have described how we can represent and work with the elements of ℤ. In cryptology, however, we are mostly interested in the arithmetic of the rings ℤn for multiple-precision integers n. We canonically represent the elements of ℤn by the integers between 0 and n – 1.
Let a, b ∈ ℤn. In order to compute a + b in ℤn, we compute the integer sum a + b, and, if a + b ≥ n, we subtract n from a + b. This gives us the desired canonical representative in ℤn. Similarly, for computing a – b in ℤn, we subtract b from a as integers, and, if the difference is negative, we add n to it. For computing ab in ℤn, we multiply a and b as integers and then take the remainder of Euclidean division of this product by n.
Note that a ∈ ℤn is invertible (that is, a ∈ ℤn*) if and only if gcd(a, n) = 1. For a ∈ ℤn, a ≠ 0, we call the extended (binary) gcd algorithm with a and n as the arguments and get integers d, u, v satisfying d = gcd(a, n) = ua + vn. If d > 1, then a is not invertible modulo n. Otherwise, we have ua ≡ 1 (mod n), that is, a^{–1} ≡ u (mod n). The extended gcd algorithm indeed returns a value of u satisfying |u| < n. Thus, if u > 0, it is the canonical representative of a^{–1}, whereas if u < 0, then u + n is the canonical representative of a^{–1}.
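For single words, this inversion procedure can be sketched with the plain extended Euclidean algorithm (the binary variant of the text works equally well); the name mod_inverse and its error convention are our choices:

```c
#include <stdint.h>

/* Computes the canonical representative of a^{-1} modulo n from the
   relation d = gcd(a, n) = u*a + v*n: if d = 1 then u*a = 1 (mod n),
   and u shifted into [0, n) is the inverse.  Returns -1 when d > 1. */
static int64_t mod_inverse(int64_t a, int64_t n)
{
    int64_t u0 = 1, u1 = 0, r0 = a, r1 = n, q, t;
    while (r1 != 0) {                    /* extended Euclidean loop */
        q = r0 / r1;
        t = r0 - q * r1;  r0 = r1;  r1 = t;
        t = u0 - q * u1;  u0 = u1;  u1 = t;
    }
    if (r0 != 1)
        return -1;                       /* gcd(a, n) > 1: no inverse */
    return ((u0 % n) + n) % n;           /* canonical representative  */
}
```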
Another frequently needed operation in ℤn is modular exponentiation, that is, the computation of a^e for some a ∈ ℤn and e ∈ ℤ. Since a^0 = 1 for all a ∈ ℤn and since a^e = (a^{–1})^{–e} for e < 0 and a ∈ ℤn*, we may assume, without loss of generality, that e > 0. Computing the integral power a^e followed by taking the remainder of Euclidean division by n is not an efficient way to compute a^e in ℤn. Instead, after every multiplication, we reduce the product modulo n. This keeps the sizes of the intermediate products small. Furthermore, it is also a bad idea to compute a^e as (· · ·((a·a)·a)· · ·a), which involves e – 1 multiplications. It is possible to compute a^e using O(lg e) multiplications and O(lg e) squarings in ℤn, as Algorithm 3.9 suggests. This algorithm requires the bits of the binary expansion of the exponent e, which are easily obtained by bit operations on the words of e.
The for loop iteratively computes bi := a^{(er–1 ... ei)2} (mod n) starting from the initial value br := 1. Since (er–1 . . . ei)2 = 2(er–1 . . . ei+1)2 + ei, we have bi ≡ (bi+1)^2 · a^{ei} (mod n). This establishes the correctness of the algorithm. The squaring (b^2) and multiplication (ba) inside the for loop of the algorithm are computed in ℤn (that is, as integer multiplication followed by reduction modulo n). If we assume that er–1 = 1, then r = ⌊lg e⌋ + 1. The algorithm carries out r squarings and ρ ≤ r multiplications in ℤn, where ρ is the number of bits of e that are 1. On an average, ρ = r/2. Algorithm 3.9 runs in time O((log e)(log n)^2). Typically, e = O(n), so this running time is O((log n)^3).
|
Input: a ∈ ℤn and an integer e > 0. Output: b = a^e (mod n). Steps: Let the binary expansion of e be e = (er–1 . . . e1e0)2, where each ei ∈ {0, 1}. |
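On single words, the left-to-right square-and-multiply loop of Algorithm 3.9 can be sketched as follows; 64-bit intermediates keep the products exact, and mod_exp is our name:

```c
#include <stdint.h>

/* Left-to-right binary exponentiation: one squaring per bit of e and
   one extra multiplication per one-bit, everything reduced modulo n. */
static uint32_t mod_exp(uint32_t a, uint32_t e, uint32_t n)
{
    uint64_t b = 1 % n;                 /* b_r := 1 */
    int i;
    for (i = 31; i >= 0; i--) {
        b = (b * b) % n;                /* square for every bit  */
        if ((e >> i) & 1u)
            b = (b * a) % n;            /* multiply on a one-bit */
    }
    return (uint32_t)b;
}
```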
Now, we describe a simple variant of this square-and-multiply algorithm, in which we choose a small t and use the 2^t-ary representation of the exponent e. The case t = 1 corresponds to Algorithm 3.9. In practical situations, t = 4 is a good choice. As in Algorithm 3.9, multiplication and squaring are done in ℤn.
|
Input: a ∈ ℤn and an integer e > 0. Output: b = a^e (mod n). Steps: Let e = (er–1 . . . e1e0)2^t, where each ei ∈ {0, 1, . . . , 2^t – 1}. |
In Algorithm 3.10, the powers a^l, l = 0, 1, . . . , 2^t – 1, are precomputed using the formulas a^0 = 1, a^1 = a and a^l = a^{l–1} · a for l ≥ 2. The number of squarings inside the for loop remains (almost) the same as in Algorithm 3.9. However, the number of multiplications in this loop reduces at the expense of the precomputation step. For example, let n be an integer of bit length 1024 and let e ≈ n. A randomly chosen e of this size has about 512 one-bits. Therefore, the for loop of Algorithm 3.9 does about 512 multiplications, whereas with t = 4 Algorithm 3.10 does only 1024/4 = 256 multiplications, with the precomputation step requiring 14 multiplications. Thus, the total number of multiplications reduces from (about) 512 to 14 + 256 = 270.
During a modular exponentiation in ℤn, every reduction (computation of remainder) is done by the fixed modulus n. Montgomery exponentiation exploits this fact and speeds up each modular reduction at the cost of some preprocessing overhead.
Assume that the storage of n requires s ℜ-ary digits, that is, n = (ns–1 . . . n0)ℜ (with ns–1 ≠ 0). Take R := ℜ^s = 2^{32s}, so that R > n. As is typical in most cryptographic situations, n is an odd integer (for example, a big prime or a product of two big primes). Then gcd(ℜ, n) = gcd(R, n) = 1. Use the extended gcd algorithm to precompute n′ := –n^{–1} (mod ℜ).
We associate a ∈ ℤn with its Montgomery representation ā := aR (mod n). Since R is invertible modulo n, this association gives a bijection of ℤn onto itself. This bijection respects the addition in ℤn: that is, the Montgomery representation of a + b is ā + b̄ in ℤn. Multiplication in ℤn, on the other hand, corresponds to computing ā b̄ R^{–1} (mod n), the Montgomery representation of ab, and can be implemented as Algorithm 3.11 suggests.
|
Input: Montgomery representations ā and b̄ of a, b ∈ ℤn. Output: Montgomery representation of ab, that is, ā b̄ R^{–1} (mod n). Steps:
|
Montgomery multiplication works as follows. In the first step, it computes the integer product w := ā b̄. The subsequent for loop transforms w so that in the end w/R ≡ ā b̄ R^{–1} (mod n). Since n′ ≡ –n^{–1} (mod ℜ), the i-th iteration of the loop makes wi = 0 (and leaves wi–1, . . . , w0 unchanged). So when the for loop terminates, we have w0 = w1 = · · · = ws–1 = 0: that is, w is a multiple of ℜ^s = R. Therefore, w/R is an integer. Furthermore, this w is obtained by adding to ā b̄ a multiple of n: that is, w = ā b̄ + kn for some integer k ≥ 0. Since R is coprime to n, it follows that w/R ≡ ā b̄ R^{–1} (mod n). But this w/R may be bigger than the canonical representative of ā b̄ R^{–1}. Since k is an integer with s ℜ-ary digits (so that k < R) and ā, b̄ < n and n < R, it follows that w/R = (ā b̄ + kn)/R < (n^2 + Rn)/R < 2n. Therefore, if w/R exceeds n – 1, a single subtraction suffices.
Computation of ā b̄ requires ≤ s^2 single-precision multiplications. One can use the optimized Algorithm 3.4 for that purpose. In the case of squaring, ā = b̄, and further optimizations (say, in the form of Karatsuba’s method) can be employed.
Each iteration of the for loop carries out s + 1 single-precision multiplications. (The reduction modulo ℜ is just returning the less significant word of the two-word product win′.) Since the for loop is executed s times, Algorithm 3.11 performs a total of ≤ s^2 + s(s + 1) = 2s^2 + s single-precision multiplications.
Integer multiplication (Algorithm 3.4) followed by classical modular reduction (Algorithm 3.6) performs almost the same number of single-precision multiplications, but also O(s) divisions of double-precision integers by single-precision ones. It turns out that the complicated for loop of Algorithm 3.6 is slower than the much simpler loop in Algorithm 3.11. But if the precomputations of Montgomery multiplication are taken into account, we do not achieve a speed-up with this new technique for a single multiplication. For modular exponentiations, however, the precomputations need to be done only once, that is, outside the square-and-multiply loop, and Montgomery multiplication pays off. In Algorithm 3.12, we rewrite Algorithm 3.9 in terms of Montgomery arithmetic. A similar rewriting applies to Algorithm 3.10.
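A one-word model of Montgomery arithmetic (R = 2^32, that is, s = 1) may make the mechanism concrete. We restrict n to 31 bits so that the intermediate sum cannot overflow 64 bits; the Newton iteration for n′ = –n^{–1} (mod 2^32) and all function names are our choices.

```c
#include <stdint.h>

/* Precomputes n' = -n^{-1} mod 2^32 for odd n.  Newton's iteration
   inv <- inv*(2 - n*inv) doubles the number of correct low bits, and
   inv = n is already an inverse modulo 8, so four steps give 32 bits. */
static uint32_t mont_nprime(uint32_t n)
{
    uint32_t inv = n;
    int i;
    for (i = 0; i < 4; i++)
        inv *= 2u - n * inv;
    return 0u - inv;                    /* -n^{-1} mod 2^32 */
}

/* One-word Montgomery reduction: for x < n*R (n odd, n < 2^31),
   returns x*R^{-1} mod n.  Adding m*n clears the low word of x, so
   the division by R = 2^32 is an exact shift; the result is < 2n,
   so a single conditional subtraction suffices. */
static uint32_t mont_redc(uint64_t x, uint32_t n, uint32_t nprime)
{
    uint32_t m = (uint32_t)x * nprime;          /* mod 2^32 */
    uint64_t t = (x + (uint64_t)m * n) >> 32;   /* exact    */
    return (t >= n) ? (uint32_t)(t - n) : (uint32_t)t;
}
```

To multiply two Montgomery representatives ā, b̄, one calls mont_redc on the ordinary product ā·b̄, exactly as Algorithm 3.11 does word by word.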
|
Input: a ∈ ℤn and an integer e > 0. Output: b = a^e (mod n). Steps: /* Precomputations */
|
| 3.8 | Let ℜ ∈ ℕ, ℜ > 1. Show that every positive integer a can be represented uniquely as a tuple (as–1, . . . , a1, a0) for some s ∈ ℕ (depending on a) with
a = as–1ℜ^{s–1} + · · · + a1ℜ + a0, 0 ≤ ai < ℜ for all i and as–1 ≠ 0. In this case, we write a as (as–1 . . . a0)ℜ or simply as as–1 . . . a0, when ℜ is understood from the context. ℜ is called the radix or base of this representation, as–1, . . . , a0 the (ℜ-ary) digits of a, as–1 the most significant digit, a0 the least significant digit and s the size of a with respect to the radix ℜ. | |||
| 3.9 | Let . Show that every can be written uniquely as
a = asR^s + as–1R^{s–1} + · · · + a1R + a0 with each |
| 3.10 | Negative radix Show that every integer with | |||
| 3.11 | Investigate the relative merits and demerits of the following three representations (in C) of multiple-precision integers needed for cryptography. In each case, we have room for storing 256 ℜ-ary words, the actual size and a sign indicator. In the second and third representations, we use two extra locations (sizeIdx and signIdx) in the digit array for holding the size and sign information.
Remark: We recommend the third representation. | |||
| 3.12 | Write an algorithm that prints a multiple-precision integer in decimal and an algorithm that accepts a string of decimal digits (optionally preceded by a + or – sign) and stores the corresponding integer as a multiple-precision integer. Also write algorithms for input and output of multiple-precision integers in hexadecimal, octal and binary. | |||
| 3.13 | Write an algorithm which, given two multiple-precision integers a and b, compares the absolute values |a| and |b|. Also write an algorithm to compare a and b as signed integers. | |||
| 3.14 |
| |||
| 3.15 | Describe a representation of rational numbers with exact multiple-precision numerators and denominators. Implement the arithmetic (addition, subtraction, multiplication and division) of rational numbers under this representation. | |||
| 3.16 | Sliding window exponentiation Suppose we want to compute the modular exponentiation a^e (mod n). Consider the following variant of the square-and-multiply algorithm: Choose a small t (say, t = 4) and precompute a^{2^{t–1}}, a^{2^{t–1}+1}, . . . , a^{2^t–1} modulo n. Do squaring for every bit of e, but skip the multiplication for zero bits in e. Whenever a 1 bit is found, consider the next t bits of e (including the 1 bit). Let these t bits represent the integer l, 2^{t–1} ≤ l ≤ 2^t – 1. Multiply by a^l (mod n) (after computing the usual t squarings) and move right in e by t bit positions. Argue that this method works and write an algorithm based on this strategy. What are the advantages and disadvantages of this method over Algorithm 3.10? | |||
| 3.17 | Suppose we want to compute aebf (mod n), where both e and f are positive r-bit integers. One possibility is to compute ae and bf modulo n individually, followed by a modular multiplication. This strategy requires the running time of two exponentiations (neglecting the time for the final multiplication). In this exercise, we investigate a trick to reduce this running time to something close to 1.25 times the time for one exponentiation. Precompute ab (mod n). Inside the square-and-multiply loop, either skip the multiplication or multiply by a, b or ab, depending upon the next bits in the two exponents e and f. Complete the details of this algorithm. Deduce that, on an average, the running time of this algorithm is as declared above. | |||
| 3.18 | Let m ∈ ℕ, m ≠ 1. An addition chain for m of length l is a sequence 1 = a1, a2, . . . , al = m of natural numbers such that for every index i, 2 ≤ i ≤ l, there exist indices i1, i2 < i with ai = ai1 + ai2. (It is allowed to have i1 = i2.)
|
Now that we know how to work in ℤ and in the residue class rings ℤn, n ∈ ℕ, we address some important computational problems associated with these rings. In this chapter, we restrict ourselves only to those problems that are needed for setting up various cryptographic protocols.
One of the simplest and oldest questions in algorithmic number theory is to decide whether a given integer n ∈ ℕ, n > 1, is prime or composite. Practical primality testing algorithms are based on randomization techniques. In this section, we describe the Monte Carlo algorithm due to Miller and Rabin. The obvious question that comes next is to find one (or all) of the prime factors of an integer, deterministically or probabilistically proven to be composite. This is the celebrated integer factorization problem and will be formally introduced in Section 4.2. In spite of the apparent proximity between the primality testing and the integer factoring problems, they currently have widely different (known) complexities. Primality testing is easy and thereby promotes efficient setting up of cryptographic protocols. On the other hand, the difficulty of factoring integers protects these protocols against cryptanalytic attacks.
|
Let n be an odd integer greater than 1 and let |
By Fermat’s little theorem, a prime p is a pseudoprime to every base
with gcd(a, p) = 1. However, the converse of this is not true. By Exercise 3.19, n is not a pseudoprime to at least half of the bases in
, provided that there is at least one such base in
. Unfortunately, there exist composite integers m, known as Carmichael numbers, such that m is a pseudoprime to every base
. The smallest Carmichael number is 561 = 3 × 11 × 17. Exercises 3.21 and 3.22 investigate some properties of these numbers. Though Carmichael numbers are not very abundant in nature (
), they are still infinite in number. So a robust primality test requires n to satisfy certain constraints in addition to being a pseudoprime to one or more bases. The following constraint is due to Solovay and Strassen.
|
Let n be an odd integer > 1 and let |
By Euler’s criterion (Proposition 2.21), if p is a prime and gcd(a, p) = 1, then p is an Euler pseudoprime to the base a. The converse is not true, in general, but if n is composite, then n is an Euler pseudoprime to at most φ(n)/2 bases in
(Exercise 3.20). This, in turn, implies that if n is an Euler pseudoprime to t randomly chosen bases in
, then the chance that n is composite is no more than 1/2t. This observation leads to a Monte Carlo algorithm for testing the primality of an integer, where the probability of error (1/2t) can be made arbitrarily small by choosing large values of t. A more efficient algorithm can be developed using the following concept due to Miller and Rabin.
|
Let n be an odd integer > 1 with n – 1 = 2^r·n′, r := v_2(n – 1) > 0, n′ odd, and let |
The rationale behind this definition is the following. If for some
we have a^(n–1) ≢ 1 (mod n), we conclude with certainty that n is composite. So assume that a^(n–1) ≡ 1 (mod n) and consider the powers b_i := a^(2^i·n′) (mod n) for i = 0, 1, . . . , r to see how the sequence b_0, b_1, . . . eventually reaches b_r ≡ 1 (mod n). If b_0 ≡ 1 (mod n) already, the dynamics is clear. If, on the other hand, we have an i such that b_i ≢ 1 (mod n), whereas b_{i+1} ≡ 1 (mod n), then b_i is a square root of 1 modulo n. If n is a prime, the only square roots of 1 modulo n are ±1, and so n must be a strong pseudoprime to the base a. On the other hand, if n is composite but not the power of a prime, then 1 has at least two non-trivial square roots (that is, square roots other than ±1) modulo n (Exercise 3.30). We hope to find one such non-trivial square root of 1 in the sequence b_0, b_1, . . . , b_{r–1}, and if we are successful, the compositeness of n is proved with certainty.
A complete residue system modulo an odd composite n contains at most n/4 bases to which n is a strong pseudoprime. The proof of this fact is somewhat involved (though elementary) and can be found elsewhere, for example, in Chapter V of Koblitz [153]. Here, we concentrate on the Monte Carlo Algorithm 3.13 known as the Miller–Rabin primality test and based on this observation.
|
Input: An odd integer Output: A certificate that either “n is composite” or “n is prime”. Steps: Find out n′ and r such that n – 1 = 2^r·n′ with |
Whenever Algorithm 3.13 outputs n is composite, it is correct. On the other hand, if it certifies n as prime, there is a probability δ that n is composite. This probability can be made very small by choosing a suitably large value of the iteration count t. For cryptographic applications, δ ≤ 1/2^80 is considered sufficiently safe. In view of the first statement of the last paragraph, we can take t = 40 to meet this error bound. In practice, much smaller values of t offer the desired confidence. For example, if n is of bit length 250, 500, 750 or 1000, the respective values t = 12, 6, 4 and 3 suffice.
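The test just analysed translates almost line for line into code. The following Python sketch is an illustration in the spirit of Algorithm 3.13 (the function name and defaults are ours, not the book's):

```python
import random

def miller_rabin(n, t=40):
    """Monte Carlo compositeness test: False is a certificate that n is
    composite; True means n passed t rounds as a strong pseudoprime."""
    if n < 4:
        return n in (2, 3)
    if n % 2 == 0:
        return False
    # Write n - 1 = 2^r * n1 with n1 odd.
    r, n1 = 0, n - 1
    while n1 % 2 == 0:
        r += 1
        n1 //= 2
    for _ in range(t):
        a = random.randrange(2, n - 1)
        b = pow(a, n1, n)                 # b0 = a^n1 (mod n)
        if b == 1 or b == n - 1:
            continue                      # strong pseudoprime to this base
        for _ in range(r - 1):            # square b0 up to r - 1 times
            b = b * b % n
            if b == n - 1:
                break
        else:
            return False                  # certified composite
    return True

assert miller_rabin(97)
assert not miller_rabin(561)              # Carmichael numbers are caught
```

With t = 40 the error probability stays below the 1/2^80 mark mentioned above, since each round errs with probability at most 1/4.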
Although, in Algorithm 3.13, we have chosen a to be an arbitrary integer between 2 and n – 2, there is apparently no harm if we choose a randomly in the interval 2 ≤ a < 2^32. In fact, such a choice of single-precision bases is desirable, because it makes the exponentiation a^n′ (mod n) more efficient (see Algorithm 3.9). A typical cryptographic application loads at start-up a precalculated table of small primes (say, the first thousand primes). Choosing the bases randomly from this list of small primes is indeed a good idea.
While the Miller–Rabin algorithm settles the primality testing problem in a practical sense, it is, after all, a randomized algorithm. It is interesting, at the minimum theoretically, to investigate the deterministic complexity of primality testing. There has been a good amount of research in this line. Let us sketch here the history of deterministic primality proving, without going to rigorous mathematical details.
One natural strategy to check the primality of a positive integer n is to factor it. However, factoring integers is a computationally difficult problem. Primality proving has been found to be a much easier computational exercise. That is, one need not factor n explicitly in order to decide whether n is prime.
The (seemingly) first modern primality testing algorithm is due to Miller [204]. This algorithm is deterministic polynomial-time, provided that the extended Riemann hypothesis or ERH (Conjecture 2.3) is true. Since the ERH is still an unsolved problem in mathematics, it cannot be claimed with certainty whether Miller’s test is really a polynomial-time algorithm. Rabin [248] provided a version of Miller’s test which is unconditionally polynomial-time, but is, at the same time, randomized. This is what we have discussed earlier under the name Miller–Rabin primality test. It is a Monte Carlo algorithm which produces the answer no (composite) with certainty, but the answer yes (prime) with some (small) probability of error. Solovay and Strassen’s test [287], based on Definition 3.3, is another no-biased randomized polynomial-time primality test and can be made deterministic polynomial-time under the ERH.
Adleman and Huang [3], using the work of Goldwasser and Kilian [116], provide a yes-biased randomized primality-proving algorithm that runs in expected polynomial time unconditionally. Adleman et al. [4] propose the first deterministic algorithm that runs unconditionally in time less than fully exponential (in log n). Its (worst-case) running time is (ln n)^O(ln ln ln n), which is still not polynomial. (The exponent ln ln ln n grows very slowly with n, but is still not a constant.)
In August 2002, Agrawal, Kayal and Saxena came up with the first deterministic primality testing algorithm that runs in polynomial time unconditionally, that is, under no unproven assumptions. This algorithm, popularly abbreviated as the AKS algorithm, is based on the observation that n is prime if and only if (X + a)^n ≡ X^n + a (mod n) for every
(Exercise 3.26). A naive application of this observation requires computing an exponential number of coefficients in the binomial expansion of (X + a)^n. The AKS algorithm gets around this difficulty by checking the new congruence
Equation 3.2

(X + a)^n ≡ X^n + a (mod n, h(X))
for some polynomial h(X) of small degree. Here the notation (mod n, h(X)) means modulo the ideal
of
. If deg h(X) is bounded by a polynomial in log n, (X + a)^n (and also X^n + a) can be computed modulo (n, h(X)) in polynomial time. However, reduction modulo h(X) may allow a composite n to satisfy the new congruence. Agrawal et al. took h(X) := X^r – 1 for some prime r = O(ln^6 n) with r – 1 having a prime divisor
ln n. From a result in analytic number theory due to Fouvry, such a prime r always exists. Congruence (3.2) is verified for this h(X) and for at most
ln n values of a. An elementary proof presented in Agrawal et al. [5] demonstrates that this suffices to conclude deterministically and unconditionally about the primality of n. The AKS algorithm in this form runs in time O~(ln^12 n).
Lenstra and Pomerance [175] have reduced the running time of the AKS algorithm to O~(ln^6 n). The AKS paper comes with another conjecture which, if true, yields an O~(ln^3 n) deterministic primality-proving algorithm.
|
Let n be an odd integer > 1. If (X – 1)^n ≡ X^n – 1 (mod n, X^r – 1), then either n is prime or n^2 ≡ 1 (mod r). |
It remains an open question whether a future version of the AKS algorithm will supersede the Miller–Rabin test in terms of performance. Until the answer is favourable to the AKS algorithm, these new theoretical endeavours do not seem to have sufficient impact on cryptography. Primes certified by the Miller–Rabin test are at present secure enough for all applications. Nonetheless, the AKS breakthrough has solid theoretical implications and deserves mention in a prime context.
If a random prime of a given bit length t is called for, we can keep generating random odd integers of bit length t and check these integers for primality using the Miller–Rabin test. The prime number theorem (Theorem 2.20) ascertains that after O(t) iterations we expect to find a prime. A somewhat similar but reasonably faster algorithm is discussed in Exercise 4.14. We will henceforth call random primes of a given bit length and having no additional imposed properties naive primes. Naive primes are often not cryptographically secure, because the primes used in many protocols should satisfy certain properties in order to preclude some known cryptanalytic attacks.
|
Let p be an odd prime. Then p is called a safe prime, if (p – 1)/2 is also a prime, whereas p is called a strong prime, if
In cryptography, a large prime divisor typically refers to one with bit length ≥ 160. |
A random safe prime of a given bit length t can be found by generating a random sequence of natural numbers n congruent to 3 modulo 4 and of bit length t, until one is found for which both n and (n – 1)/2 are primes (as certified by the Miller–Rabin primality test). The prime number theorem once again implies that this search is expected to terminate after O(t2) iterations.
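This search loop can be sketched in Python as follows. The helper is_probable_prime is a compact stand-in for the Miller–Rabin test of Algorithm 3.13, and the 32-bit size is a toy value chosen so that the example runs quickly:

```python
import random

def is_probable_prime(n, t=25):
    """Compact Miller-Rabin stand-in (see Algorithm 3.13)."""
    if n < 2:
        return False
    for q in (2, 3, 5, 7, 11, 13, 17, 19):
        if n % q == 0:
            return n == q
    r, m = 0, n - 1
    while m % 2 == 0:
        r, m = r + 1, m // 2
    for _ in range(t):
        x = pow(random.randrange(2, n - 1), m, n)
        if x in (1, n - 1):
            continue
        for _ in range(r - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False
    return True

def random_safe_prime(t_bits):
    """Draw n = 3 (mod 4) of bit length t_bits until both n and
    (n - 1)/2 are certified as probable primes."""
    while True:
        n = random.getrandbits(t_bits) | (1 << (t_bits - 1)) | 3
        if is_probable_prime((n - 1) // 2) and is_probable_prime(n):
            return n

p = random_safe_prime(32)
assert p % 4 == 3 and p.bit_length() == 32
```

Testing (n − 1)/2 first is the cheaper order on average, since most candidates fail already at that stage.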
For generating a random strong prime p of bit length t, we first generate q′ and q″ and then q and finally p. (See the notations of Definition 3.5.) Algorithm 3.14 describes Gordon’s algorithm in which the bit lengths l and l′ of q and q′ are nearly t/2 and the bit length l″ of q″ is slightly smaller than l′. In our concrete implementation of the algorithm, we choose l := ⌈t/2⌉ – 2, l′ := ⌊t/2⌋ – 20 and l″ := ⌈t/2⌉ – 22. If t is sufficiently large (say, t ≥ 400), the prime divisors q, q′ and q″ are then cryptographically large.
The simple check that Gordon’s algorithm correctly computes a strong prime of bit length t with q, q′ and q″ as in Definition 3.5 is based on Fermat’s little theorem and is left to the reader. Note that with our choice of l, l′ and l″, the loop variables i and j run through single-precision values only, thereby making arithmetic involving them efficient. Also note that the ranges over which i and j vary are sufficiently large so that we expect the (outer) while loop to be executed only once. This implementation has a tendency to generate smaller values of q and p (with the given bit sizes). In practice, this is not a serious problem and can be avoided, if desired, by choosing random values of i and j from the indicated ranges.
|
Output: A strong prime p of bit length t. Steps: l := ⌈t/2⌉ – 2, l′ := ⌊t/2⌋ – 20, l″ := ⌈t/2⌉ – 22. while (1) { |
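The standard form of Gordon's algorithm can be sketched in Python as follows. The names q, q1 and q2 stand for q, q′ and q″ of Definition 3.5, the bit lengths are toy values (a real implementation would use the l, l′, l″ of the text), and the primality helpers are simplified stand-ins for the book's algorithms:

```python
import random

def is_probable_prime(n, t=25):
    """Compact Miller-Rabin stand-in."""
    if n < 2:
        return False
    for q in (2, 3, 5, 7, 11, 13):
        if n % q == 0:
            return n == q
    r, m = 0, n - 1
    while m % 2 == 0:
        r, m = r + 1, m // 2
    for _ in range(t):
        x = pow(random.randrange(2, n - 1), m, n)
        if x in (1, n - 1):
            continue
        for _ in range(r - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False
    return True

def random_prime(bits):
    while True:
        n = random.getrandbits(bits) | (1 << (bits - 1)) | 1
        if is_probable_prime(n):
            return n

def gordon_strong_prime(l1=24, l2=20):
    """Returns (p, q, q1, q2) with q | p - 1, q1 | p + 1 and
    q2 | q - 1, all prime."""
    q1 = random_prime(l1)              # q' will divide p + 1
    q2 = random_prime(l2)              # q'' will divide q - 1
    i = 1
    while True:                        # q = 2*i*q2 + 1 divides p - 1
        q = 2 * i * q2 + 1
        if is_probable_prime(q):
            break
        i += 1
    # p0 = 2*(q1^(q-2) mod q)*q1 - 1 is 1 (mod q) and -1 (mod q1),
    # by Fermat's little theorem
    p0 = 2 * pow(q1, q - 2, q) * q1 - 1
    j = 0
    while True:
        p = p0 + 2 * j * q * q1
        if is_probable_prime(p):
            return p, q, q1, q2
        j += 1

p, q, q1, q2 = gordon_strong_prime()
assert (p - 1) % q == 0 and (p + 1) % q1 == 0 and (q - 1) % q2 == 0
```

The final assertions check exactly the three divisibility conditions of Definition 3.5.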
Gordon’s algorithm takes only nominally more expected running time than that needed by the algorithm discussed at the beginning of Section 3.4.2 for generating naive primes of the same bit length. On the other hand, safe primes are much costlier to generate and may be avoided, unless the situation specifically demands their usage.
Determination of square roots modulo a prime p is frequently needed in cryptographic applications. In this section, we assume that p is an odd prime and want to compute the square roots of
, gcd(a, p) = 1, modulo p, provided that a is a quadratic residue modulo p, that is, if
. Using the Jacobi symbol the value
can be computed efficiently as Algorithm 3.15 suggests.
The correctness of Algorithm 3.15 follows from the properties of the Jacobi symbol (Proposition 2.22 and Theorem 2.19). The value of (–1)^((b^2–1)/8) is determined by the value of b modulo 8, that is, by the three least significant bits of b:

(–1)^((b^2–1)/8) = 1 if b ≡ 1 or 7 (mod 8), and –1 if b ≡ 3 or 5 (mod 8).
Similarly, (–1)^((a–1)(b–1)/4) can be computed using only the second least significant bits of a and b as:

(–1)^((a–1)(b–1)/4) = –1 if a ≡ b ≡ 3 (mod 4), and 1 otherwise.
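Putting these two update rules together with the Euclidean loop gives a compact Jacobi-symbol routine. This Python sketch mirrors the structure of Algorithm 3.15, though not its exact word-level steps:

```python
def jacobi(a, b):
    """Jacobi symbol (a | b) for odd b > 0: the factor for each power
    of 2 depends only on b mod 8, and quadratic reciprocity flips the
    sign exactly when a = b = 3 (mod 4)."""
    assert b > 0 and b % 2 == 1
    a %= b
    s = 1
    while a != 0:
        while a % 2 == 0:                # pull out factors of 2
            a //= 2
            if b % 8 in (3, 5):          # (2 | b) = -1 exactly here
                s = -s
        a, b = b, a                      # quadratic reciprocity
        if a % 4 == 3 and b % 4 == 3:
            s = -s
        a %= b
    return s if b == 1 else 0            # gcd(a, b) > 1 gives 0

assert jacobi(1001, 9907) == -1          # a standard worked example
assert jacobi(19, 45) == 1
```

For a prime modulus the result agrees with the Legendre symbol, which gives an easy consistency check against Euler's criterion.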
If
, our next task is to compute
with x2 ≡ a (mod p). If one such x is found, the other square root of a modulo p is –x ≡ p – x (mod p). If p ≡ 3 (mod 4) or p ≡ 5 (mod 8), we have explicit formulas for a square root x. The remaining case, namely p ≡ 1 (mod 8), is somewhat complicated. In this case, we use the probabilistic algorithm due to Tonelli and Shanks. The details are given in Algorithm 3.16. The explicit formulas for the first two cases are easy to verify. We now prove the correctness of the algorithm in the remaining case.
|
Input: An odd prime p and an integer a, 1 ≤ a < p. Output: The Legendre symbol Steps:
/* The Euclidean loop */
|
Since
is cyclic and has order p – 1 = 2^v·q, the 2-Sylow subgroup G of
has order 2^v and is also cyclic. Let g be a generator of G. By Euler’s criterion, a^q is a square in G and, therefore, a^q·g^e = 1 (in G) for some even integer e, 0 ≤ e < 2^v, and x ≡ a^((q + 1)/2)·g^(e/2) (mod p) is a square root of a modulo p.
A generator g of G can be obtained by choosing random elements b from
and computing the Legendre symbol
. It is easy to see that
. Furthermore, b^q is a generator of G if and only if
. Finding a quadratic non-residue in
is the probabilistic part of the algorithm. Since exactly half of the elements of
are quadratic non-residues, one expects to find one after a few random trials. In order to make the exponentiation b^q efficient, b should be chosen as a single-precision integer. The while loop of the algorithm computes the multiplier g^(e/2) in x using O(v) iterations, by successively locating the 1 bits of e starting from the least significant end.
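A Python sketch of the whole square-root procedure might look as follows: the two explicit formulas handle p ≡ 3 (mod 4) and p ≡ 5 (mod 8), and the remaining case p ≡ 1 (mod 8) runs the Tonelli–Shanks loop described above (variable names are ours):

```python
def sqrt_mod_prime(a, p):
    """Square root of a quadratic residue a modulo an odd prime p."""
    a %= p
    assert pow(a, (p - 1) // 2, p) == 1, "a must be a quadratic residue"
    if p % 4 == 3:
        return pow(a, (p + 1) // 4, p)
    if p % 8 == 5:
        x = pow(a, (p + 3) // 8, p)
        if x * x % p != a:                   # correct by a 4th root of 1
            x = x * pow(2, (p - 1) // 4, p) % p
        return x
    # p = 1 (mod 8): write p - 1 = 2^v * q with q odd
    v, q = 0, p - 1
    while q % 2 == 0:
        v, q = v + 1, q // 2
    b = 2                                    # find a quadratic non-residue
    while pow(b, (p - 1) // 2, p) != p - 1:
        b += 1
    g = pow(b, q, p)                         # generates the 2-Sylow subgroup
    x = pow(a, (q + 1) // 2, p)
    t = pow(a, q, p)
    m = v
    while t != 1:
        i, s = 0, t                          # least i with t^(2^i) = 1
        while s != 1:
            s, i = s * s % p, i + 1
        gs = pow(g, 1 << (m - i - 1), p)
        x = x * gs % p                       # absorb g^(e/2) into x
        t = t * gs % p * gs % p
        g, m = gs * gs % p, i
    return x

for p in (23, 29, 41, 73, 97):               # all three cases exercised
    for a in range(1, p):
        if pow(a, (p - 1) // 2, p) == 1:
            x = sqrt_mod_prime(a, p)
            assert x * x % p == a
```

For simplicity the non-residue b is searched deterministically from 2 upwards; the book's algorithm draws it at random, which is what makes the method Las Vegas.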
To sum up, square roots modulo a prime can be computed in probabilistic polynomial time. Computing square roots modulo a composite integer n is, on the other hand, a very difficult problem, unless the complete factorization of n is known (see Section 4.2 and Exercise 3.29).
| 3.19 | Let be odd and composite and suppose that there exists (at least) one with an–1 ≢ 1 (mod n). Show that bn–1 ≢ 1 (mod n) for at least half of the bases . [H]
Algorithm 3.16. Modular square root
| |
| 3.20 | Let be odd and composite.
| |
| 3.21 | Let be a Carmichael number, that is, a composite integer for which an–1 ≡ 1 (mod n) for all a coprime to n, that is, ordn(a)|(n – 1) for all . Prove that:
| |
| 3.22 |
| |
| 3.23 | Fermat’s test for prime numbers Let | |
| 3.24 | Pépin’s test for Fermat numbers Show that the Fermat number n := 2^(2^k) + 1 is prime if and only if 3^((n – 1)/2) ≡ –1 (mod n). |
| 3.25 | Write an algorithm that, given natural numbers t, l with l < t, outputs a (probable) prime p of bit length t such that p – 1 has a (probable) prime divisor q of bit length l. | |
| 3.26 | Let .
| |
| 3.27 | Modify Algorithm 3.15 to compute the (generalized) Jacobi symbol for odd and for arbitrary .
| |
| 3.28 | A Implement the Chinese remainder theorem for integers, that is, write an algorithm that takes as input pairwise relatively prime moduli and integers for i = 1, . . . , r and that outputs with a ≡ ai (mod ni) for all i = 1, . . . , r. [H]
| |
| 3.29 | Let f(X) be a non-constant polynomial in .
| |
| 3.30 | Let be odd and . Deduce that the congruence x2 ≡ a (mod n) has exactly solutions modulo n.
| |
| 3.31 | Show that Algorithm 3.17 correctly computes for . Specify a strategy to initialize a before the while loop. Determine how Algorithm 3.17 can be used to check if a given is a perfect square. [H]
Algorithm 3.17. Integer square root
| |
| 3.32 |
|
Many cryptographic protocols are based on the (apparent) intractability of the discrete logarithm problem (Section 4.2) in the multiplicative group of a finite field
. The arithmetic of the finite fields
,
, and
,
, is easy to implement and efficient to run. In view of this, these two kinds of finite fields are the most popular in cryptography, and we concentrate our algorithmic study on these fields only.
A prime field
is the quotient ring
. In Section 3.3.4, we have already made a thorough study of the arithmetic of the rings
,
. We recall that the elements of
are represented as integers from the set {0, 1, . . . , p – 1} and the arithmetic in
is the modulo p integer arithmetic. Since p is typically a multiple-precision prime (in particular, p > 2), the characteristic p of
is odd. The fields of even characteristic that we will study are the non-prime fields
.
Section 2.9.3 explains several representations of extension fields. The most common one is the polynomial-basis representation
for an irreducible polynomial f(X) of degree n in
. In that case, an element of
has the canonical representation as a polynomial a0 + a1X + · · · + an–1Xn–1,
, of degree < n. An arithmetic operation on two elements of
is the same operation in
followed by reduction modulo the defining polynomial f(X). So we start with the implementation of the polynomial arithmetic over
.
A polynomial over
(or any field) is identified by its coefficients, of which only finitely many are non-zero. Thus for storing a polynomial g(X) = a_d X^d + a_{d–1} X^(d–1) + · · · + a_1 X + a_0 it is sufficient to store the finite ordered sequence a_d a_{d–1} . . . a_1 a_0. It is not necessary to demand a_d ≠ 0, but the shortest sequence representing a non-zero polynomial corresponds to a_d ≠ 0, and in this case deg g = d. On the other hand, as we see later, it is often useful to pad such a sequence with leading zero coefficients. As an example, the polynomial
is representable as 101 or as 0101 or as 00101 or · · ·.
Since
can be viewed as the set {0, 1} with operations modulo 2, a polynomial in
is essentially a bit string, unique up to insertion (and deletion) of leading zero bits. As in the case of multiple-precision integers, we pack these coefficients in an array of 32-bit words and maintain the number of coefficients belonging to the polynomial. For example, the polynomial g(X) = X^64 + X^31 + X^7 + 1 can be stored in an array w2w1w0 of three 32-bit words. w0 consists of the coefficients of X^0, X^1, . . . , X^31, w1 consists of the coefficients of X^32, X^33, . . . , X^63, and w2 consists of the coefficient of X^64. It is up to the implementation scheme to decide whether the coefficients are to be stored from left to right or from right to left in the bits of a word. We assume that less significant coefficients go to the less significant bits of a word. For the polynomial g above, the word w0 viewed as an unsigned integer will then be w0 = 2^31 + 2^7 + 1, whereas we have w1 = 0. The least significant bit of w2 would be 1. The remaining 31 bits of w2 are not important and can be assigned any value, as long as we maintain the information that only the coefficients of X^i, 0 ≤ i ≤ 64, need to be considered. On the other hand, if we want to store the coefficients of g up to that of X^80, then the bits of w2 at locations 1, . . . , 16 must be zero, whereas those at locations 17, . . . , 31 may have any value. We, however, always recommend the use of leading zero bits to fill the portion of the leading word not belonging to the polynomial.
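As a quick sanity check of this layout, we can let a Python integer play the role of the packed word array (the 32-bit word extraction below is only illustrative; the book's implementation works directly with C-style words):

```python
# g(X) = X^64 + X^31 + X^7 + 1, with the coefficient of X^i kept in bit i
g = (1 << 64) | (1 << 31) | (1 << 7) | 1

w0 = g & 0xFFFFFFFF           # coefficients of X^0 .. X^31
w1 = (g >> 32) & 0xFFFFFFFF   # coefficients of X^32 .. X^63
w2 = (g >> 64) & 0xFFFFFFFF   # coefficient of X^64, leading bits zero-padded

assert w0 == 2**31 + 2**7 + 1
assert w1 == 0
assert w2 == 1
```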
Such a representation of elements of
, in addition to being compact, facilitates efficient implementation of arithmetic functions. As we will shortly see, we need not often extract the individual coefficients of a polynomial but apply bit operations on entire words to process 32 coefficients simultaneously per operation. We usually do not need polynomials of degrees > 4096 for cryptographic applications. It is, therefore, sufficient to declare a static array capable of storing all the 8193 coefficients of a product of two such largest polynomials. The zero polynomial may be represented as one with zero word size, whereas the degree of the zero polynomial is taken to be –∞ which may be representable as –1.
We now describe the arithmetic functions on two non-zero polynomials
Equation 3.3
a(X) = a_r X^r + · · · + a_1 X + a_0 and b(X) = b_s X^s + · · · + b_1 X + b_0, of degrees r and s, respectively.
Under our implementation, a and b demand ρ := ⌈(r + 1)/32⌉ and σ := ⌈(s + 1)/32⌉ machine words αρ – 1 . . . α1α0 and βσ – 1 . . . β1β0. We also assume paddings with leading zero bits in the areas not belonging to the operands.
Note that the addition of
is the same as the XOR (⊕) of two bits. Applying this bit operation on words αi and βi adds 32 coefficients of the operand polynomials simultaneously (see Algorithm 3.18). Finally note that –1 = 1 in any field of characteristic 2, that is, subtraction is the same as addition in such a field.
The product a(X)b(X) can be computed as in Algorithm 3.19. Once again, using wordwise operations yields faster implementation. By AND and OR, we denote the bit-wise and and or operations on 32-bit words. The easy verification of the correctness of this algorithm is left to the reader. As in the case of addition, one might want to make the polynomial c compact after its words γτ – 1, . . . , γ0 are computed.
|
Input: a(X), Output: c(X) = a(X) + b(X) (to be stored in the array γτ – 1 . . . γ1γ0). Steps: τ := max(ρ, σ). |
|
Input: a(X), Output: c(X) = a(X)b(X) (to be stored in the array γτ – 1 . . . γ1γ0). Steps: τ := ρ + σ – 1. /* The size of the product */ |
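For illustration, the same two operations can be sketched in Python with whole integers standing in for the packed word arrays: a single XOR performs the coefficientwise addition of Algorithm 3.18, and a shift-and-XOR loop mimics the product computation of Algorithm 3.19:

```python
def gf2_add(a, b):
    """Addition in F2[X]: coefficientwise XOR. A Python int XORs all
    'words' at once, which is what Algorithm 3.18 does word by word."""
    return a ^ b

def gf2_mul(a, b):
    """Schoolbook product in F2[X]: for each 1 coefficient of b, XOR
    in a suitably shifted copy of a (carry-less multiplication)."""
    c = 0
    while b:
        if b & 1:
            c ^= a
        a <<= 1
        b >>= 1
    return c

# (X + 1)(X + 1) = X^2 + 1 over F2: the cross terms cancel
assert gf2_mul(0b11, 0b11) == 0b101
assert gf2_add(0b1011, 0b0110) == 0b1101
```

Note that subtraction coincides with addition here, as remarked above for fields of characteristic 2.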
The square of
can be computed very easily using the fact that
a(X)^2 = (a_r X^r + · · · + a_1 X + a_0)^2 = a_r X^(2r) + · · · + a_1 X^2 + a_0,

since the cross terms vanish in characteristic 2 and a_i^2 = a_i in F2.
This gives us a linear-time (in terms of r or ρ) algorithm instead of the quadratic general-purpose multiplication Algorithm 3.19. We leave the implementational details to the reader.
Division with remainder in
is implemented in Algorithm 3.20. As before, we continue to work with the operands a(X) and b(X) as in Equation (3.3). But now we make a further assumption that bs = 1, so that βσ–1 ≠ 0, and also that s ≤ r. When the Euclidean division loop of Algorithm 3.20 terminates, the array locations δσ–1, . . . , δ1, δ0 contain the remainder. The arrays γ and δ may be made compact to discard the leading zero bits, if any.
|
Input: a(X), Output: c(X) = a(X) quot b(X) (to be stored in the array γτ – 1 . . . γ1γ0) and d(X) = a(X) rem b(X) (to be stored in the array δρ–1 . . . δ1δ0). Steps: τ := ⌈(r – s + 1)/32⌉. /* The size of the quotient */ |
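A bit-packed sketch of this Euclidean division: the book's Algorithm 3.20 works word by word, whereas here Python integers hide the word boundaries, leaving only the align-and-XOR structure of the loop visible:

```python
def gf2_divmod(a, b):
    """Euclidean division in F2[X] on bit-packed operands:
    returns (quotient, remainder) with deg rem < deg b."""
    assert b != 0
    q = 0
    while a.bit_length() >= b.bit_length():
        shift = a.bit_length() - b.bit_length()
        q ^= 1 << shift            # quotient gains the term X^shift
        a ^= b << shift            # subtract (= XOR) the aligned divisor
    return q, a

# (X^4 + X + 1) = (X^2 + X)(X^2 + X + 1) + 1
quot, rem = gf2_divmod(0b10011, 0b111)
assert (quot, rem) == (0b110, 0b1)
```

Each loop iteration cancels the current leading coefficient of a, so the loop runs at most deg a − deg b + 1 times.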
Computing modular inverses requires computation of extended gcds of polynomials in
. We again start with the non-zero polynomials a(X),
and compute polynomials d(X), u(X) and v(X) in
with d(X) = gcd(a(X), b(X)) = u(X)a(X) + v(X)b(X), deg u < deg b and deg v < deg a. For polynomials, we do not have an equivalent of the binary gcd algorithm (Algorithm 3.8). We use repeated Euclidean divisions instead.
The proof for the correctness of Algorithm 3.21 is similar to that for Algorithm 3.8. Here, we introduce the variables r_k, U_k and V_k for k = 0, 1, 2, . . . . The initialization goes as: r_0 := a, r_1 := b, U_0 := 1, U_1 := 0, V_0 := 0 and V_1 := 1. During the k-th iteration (k = 1, 2, . . .), we first use Euclidean division to get r_{k–1} = q_k r_k + r_{k+1}, which gives r_{k+1} = r_{k–1} – q_k r_k. We also compute U_{k+1} = U_{k–1} – q_k U_k and V_{k+1} = V_{k–1} – q_k V_k using the values available from the previous two iterations, so as to maintain the relation r_{k+1} = U_{k+1} r_0 + V_{k+1} r_1 for all k = 1, 2, . . . . In Algorithm 3.21, the k-th iteration of the while loop begins with x = r_{k–1}, y = r_k, u1 = U_k and u2 = U_{k–1} and ends after updating the values to x = r_k, y = r_{k+1}, u1 = U_{k+1} and u2 = U_k. It is not necessary to maintain the values V_k in the main loop. After the loop terminates, one computes V_k = (r_k – U_k r_0)/r_1.
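The iteration just described can be sketched as follows; for clarity this version maintains both cofactor sequences U_k and V_k inside the loop, whereas the text's Algorithm 3.21 keeps only U_k and recovers V at the end (helper names are ours):

```python
def gf2_mul(a, b):
    """Carry-less product in F2[X] (bit-packed)."""
    c = 0
    while b:
        if b & 1:
            c ^= a
        a, b = a << 1, b >> 1
    return c

def gf2_divmod(a, b):
    """Euclidean division in F2[X]: returns (quotient, remainder)."""
    q = 0
    while a.bit_length() >= b.bit_length():
        shift = a.bit_length() - b.bit_length()
        q ^= 1 << shift
        a ^= b << shift
    return q, a

def gf2_xgcd(a, b):
    """Extended gcd in F2[X]: returns (d, u, v) with
    d = gcd(a, b) = u*a + v*b (note: minus = XOR over F2)."""
    r0, r1 = a, b
    u0, u1 = 1, 0
    v0, v1 = 0, 1
    while r1:
        qk, r2 = gf2_divmod(r0, r1)
        r0, r1 = r1, r2                      # r_{k+1} = r_{k-1} - q_k r_k
        u0, u1 = u1, u0 ^ gf2_mul(qk, u1)    # U_{k+1} = U_{k-1} - q_k U_k
        v0, v1 = v1, v0 ^ gf2_mul(qk, v1)    # V_{k+1} = V_{k-1} - q_k V_k
    return r0, u0, v0

a, b = 0b10011, 0b111          # X^4 + X + 1 and X^2 + X + 1
d, u, v = gf2_xgcd(a, b)
assert d == 1                  # X^4 + X + 1 is irreducible, so gcd = 1
assert gf2_mul(u, a) ^ gf2_mul(v, b) == d
```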
Modular arithmetic in
is very much similar to the modular arithmetic in
. If f(X) is a non-constant polynomial of
(not necessarily irreducible), we represent elements of
as polynomials in
of degrees < n. Given two such polynomials a and b, we compute the sum a + b simply as the sum in
. The product ab is computed by first computing the product ab in
and then computing the remainder of Euclidean division of this product by f. Inverse of a modulo f exists if and only if gcd(a, f) = 1 (in
). In that case, extended gcd computation gives us polynomials u, v such that 1 = ua + vf, so that ua ≡ 1 (mod f). If a ≠ 0, then Algorithm 3.21 computes u with deg u < deg f = n, so that we take this u to be the canonical representative of a–1 in
. Finally, for
the computation of the modular exponentiation a^e (mod f) can be done using an algorithm very similar to Algorithm 3.9 or Algorithm 3.10. We leave the details to the reader.
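A sketch of this modular arithmetic in Python: a product with on-the-fly reduction by the defining polynomial f, and square-and-multiply exponentiation built on top of it (illustrative stand-ins, not the book's exact algorithms):

```python
def gf2_mulmod(a, b, f):
    """Product in F2[X]/(f): shift-and-XOR, reducing by f whenever the
    shifted operand reaches deg f; operands assumed already reduced."""
    deg = f.bit_length() - 1
    c = 0
    while b:
        if b & 1:
            c ^= a
        b >>= 1
        a <<= 1
        if (a >> deg) & 1:
            a ^= f
    return c

def gf2_powmod(a, e, f):
    """Square-and-multiply exponentiation modulo f, the F2[X]
    analogue of Algorithm 3.9."""
    r = 1
    while e:
        if e & 1:
            r = gf2_mulmod(r, a, f)
        a = gf2_mulmod(a, a, f)
        e >>= 1
    return r

f = 0b1011                                # f(X) = X^3 + X + 1, irreducible
assert gf2_mulmod(0b10, 0b101, f) == 1    # X * (X^2 + 1) = 1 in GF(8)
for a in range(1, 8):                     # Lagrange: a^7 = 1 for a != 0
    assert gf2_powmod(a, 7, f) == 1
```

Since the multiplicative group of GF(2^3) has order 7, the inverse of X can also be read off as X^(7−1) = X^6 = X^2 + 1, agreeing with the extended-gcd route described above.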
|
Input: Nonzero polynomials a, Output: Polynomials d, u, d = gcd(a, b) = ua + vb, deg u < deg b, deg v < deg a. Steps: /* Initialize */ |
For the polynomial basis representation
, we need an irreducible polynomial
of degree n. We shortly present a probabilistic algorithm that generates a random monic irreducible polynomial in
of given degree
. Although we are interested only in the case q = 2, this algorithm holds even if q is any arbitrary prime or an arbitrary prime power.
First, we describe a deterministic polynomial-time algorithm for checking the irreducibility of a non-constant polynomial
(over
). If f is reducible, it has a factor of degree i ≤ ⌊n/2⌋. Also recall (Theorem 2.40, p 82) that X^(q^i) – X is the product of all monic irreducible polynomials of
of degrees dividing i. Therefore, if f has an irreducible factor of degree i, then gcd(f, X^(q^i) – X) = gcd(f, (X^(q^i) – X) rem f) will be a non-constant polynomial. Algorithm 3.22 employs these simple observations.
Now, recall from Section 2.9.2 that a random monic polynomial of
of degree n is irreducible with probability approximately 1/n. Therefore, if we keep checking for irreducibility random monic polynomials in
of degree n, then after O(n) checks we expect to find an irreducible polynomial. This leads to the Las Vegas probabilistic Algorithm 3.23.
|
Input: A non-constant polynomial Output: A (deterministic) certificate whether f is irreducible or not. Steps: n := deg f, g := X. |
|
Input: Output: A random monic irreducible polynomial Steps: while (1) { |
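For q = 2, the test underlying Algorithm 3.22 can be sketched as follows with bit-packed polynomials (helper names are ours): repeated squaring computes X^(2^i) mod f, and over F2 the difference g – X is just an XOR with the bit pattern of X:

```python
def gf2_mod(a, b):
    """Remainder of Euclidean division in F2[X]."""
    while a.bit_length() >= b.bit_length():
        a ^= b << (a.bit_length() - b.bit_length())
    return a

def gf2_gcd(a, b):
    while b:
        a, b = b, gf2_mod(a, b)
    return a

def gf2_mulmod(a, b, f):
    """Product modulo f with on-the-fly reduction."""
    deg = f.bit_length() - 1
    c = 0
    while b:
        if b & 1:
            c ^= a
        b >>= 1
        a <<= 1
        if (a >> deg) & 1:
            a ^= f
    return c

def is_irreducible_gf2(f):
    """f reducible iff gcd(f, X^(2^i) - X mod f) is non-constant for
    some i <= deg(f)/2 (the idea of Algorithm 3.22, with q = 2)."""
    n = f.bit_length() - 1
    g = 0b10                            # g = X
    for _ in range(n // 2):
        g = gf2_mulmod(g, g, f)         # g := g^2 mod f
        if gf2_gcd(f, g ^ 0b10) != 1:   # gcd(f, g - X); minus = XOR
            return False
    return n >= 1

assert is_irreducible_gf2(0b111)        # X^2 + X + 1
assert is_irreducible_gf2(0b10011)      # X^4 + X + 1
assert not is_irreducible_gf2(0b101)    # X^2 + 1 = (X + 1)^2
```

Algorithm 3.23 then amounts to drawing random monic degree-n polynomials (a non-zero constant term is an obvious prefilter) until this test succeeds, which takes O(n) draws in expectation.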
Once the defining irreducible polynomial f is available, we carry out the arithmetic in
as modular polynomial arithmetic with respect to the modulus f. This is described at the end of Section 3.5.1. Since this modular arithmetic involves taking the remainder of Euclidean division by f, it is sometimes expedient to choose f to be an irreducible polynomial of certain special types. The randomized algorithm described above gives a random monic irreducible polynomial f of degree n having on average ≈ n/2 non-zero coefficients. The division algorithm (Algorithm 3.20) in that case takes time O(n^2). On the other hand, if f is a sparse polynomial (like a trinomial), the Euclidean division loop can be rewritten to exploit this sparsity, thereby bringing down the running time of the division procedure to O(n). (See Exercise 3.34. Also see Exercise 3.38 for computing isomorphisms between different polynomial-basis representations of the same field.)
Let p be a prime and let
. We have seen how to implement arithmetic in
and hence by Exercise 3.35 that in
too. If
is an irreducible polynomial of degree n and if q = pn, then
and we implement the arithmetic of
as the polynomial arithmetic of
modulo f. Again by Exercise 3.35, this gives us the arithmetic of
. Now, for
and a monic irreducible polynomial
we have a representation
. Instead of having such a two-way representation of
we may also represent
as
, where
is a monic irreducible polynomial of degree nm. It usually turns out that the second representation of
is more efficient. However, there are some situations where the two-way representation performs better. This is, in particular, the case when the arithmetic of
can be made more efficient than the modular polynomial arithmetic of
. For example, we might precompute tables of arithmetic operations of
and use table lookups for performing the coefficient arithmetic of
. This demands O(q^2) storage and is feasible only when q is small. On the other hand, if we find a primitive element γ of
and precompute a table that maps i ↦ γ^i and another that maps γ^i ↦ i, then products in
can be computed in time O(1) using table lookups. If, in addition, we store the Zech’s logarithm table (Section 2.9.3) for
, then addition in
can also be performed in O(1) time with a table lookup. All three tables together take O(q) memory, which (though better than the O(q^2) storage of the previous scheme) is feasible only for small q.
Not all finite fields are suitable for cryptographic applications. In this section, we discuss the desirable properties of a field
so that secure protocols on
can be developed. We first note that such protocols are usually based on the apparent intractability of the so-called discrete logarithm problem (DLP) (Section 4.2). As a result, selections of suitable fields are dictated by the known cryptanalytic algorithms to solve the DLP (See Section 4.4). We shall mostly concentrate on
with either q = p a prime or q = 2n for some
. By the bit size of q, denoted |q|, we mean the number of bits in the binary representation of q, that is, |q| = ⌈lg q⌉. As we have seen, each element of
is representable using O(|q|) bits and, therefore, |q| is often also called the size of
.
The first requirement on a cryptographically suitable field
is that the size |q| should be sufficiently large. Recent cryptanalytic studies show that sizes |q| ≤ 512 are not secure enough. Sizes |q| ≥ 768 are recommended for secure applications. For long-term security, one might even require |q| ≥ 2048.
Any field of the recommended size is, however, not adequately secure. The cardinality #Fq = q must be such that q – 1 has at least one large prime divisor q′ (see the Pohlig–Hellman method in Section 4.4). By large, we usually mean |q′| ≥ 160. In addition, this prime factor q′ of q – 1 should be known to us. If q = p is a prime, then a safe prime or a strong prime serves our purpose (Definition 3.5, Algorithm 3.14). Also see Exercise 3.25. On the other hand, if q = 2^n, the only way to obtain q′ is by factorizing the Mersenne number M_n := q – 1 = 2^n – 1. Factorizing M_n for n ≥ 768 is a very difficult task. Luckily, extensive tables of complete or partial factorizations of M_n are available. For example, for n = 769 (a prime number), we have
M_769 = 2^769 – 1 = 1,591,805,393 × 6,123,566,623,856,435,977,170,641 × q′,
where q′ is a 657-bit prime. These tables should be consulted for choosing a suitable value of n.
The multiplicative group
is cyclic (Theorem 2.38). If the complete integer factorization of q – 1 is known, then it is possible to find, in polynomial time (in |q|), a primitive element of
. Algorithm 3.24 computes r = O(lg m) exponentiations in G, where m = #G, in order to conclude whether a given element
is a generator of G. For
, we have polynomial-time exponentiation algorithms, so Algorithm 3.24 runs in deterministic polynomial time. By Exercise 2.47, the probability of a randomly chosen element of G being primitive is φ(m)/m. In view of the lower bound on φ(m)/m given in Theorem 3.1 and proved by Rosser and Schoenfeld [253], Algorithm 3.25 is expected to return a random primitive element of G after O(ln ln m) iterations.
|
Let |
|
Input: A cyclic group G of cardinality #G = m with known factorization Output: A deterministic certificate that a is a generator of G. Steps: /* We assume that G is multiplicatively written and has the identity e */ |
|
Input: A cyclic group G of cardinality #G = m with known factorization Output: A generator g of G. Steps: while (1) { |
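For the concrete case G = Z_p^* with the factorization of m = p – 1 known, Algorithms 3.24 and 3.25 can be sketched as follows (function names and the prime_factors parameter are ours):

```python
import random

def is_generator(a, p, prime_factors):
    """Algorithm 3.24 for G = Z_p^*: a generates G iff a^(m/r) != 1
    for every prime r dividing m = p - 1 (distinct prime divisors of
    m are assumed known and passed in prime_factors)."""
    m = p - 1
    return all(pow(a, m // r, p) != 1 for r in prime_factors)

def random_generator(p, prime_factors):
    """Algorithm 3.25: sample until a generator appears; expected
    O(ln ln m) rounds by the Rosser-Schoenfeld bound."""
    while True:
        a = random.randrange(2, p)
        if is_generator(a, p, prime_factors):
            return a

# p = 7: 3 is a primitive root, while 2 is not (2^3 = 1 mod 7)
assert is_generator(3, 7, [2, 3])
assert not is_generator(2, 7, [2, 3])

g = random_generator(23, [2, 11])
assert sorted(pow(g, i, 23) for i in range(22)) == list(range(1, 23))
```

The final check confirms that the powers of g sweep out all of Z_23^*, which is exactly what it means for g to be primitive.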
If, however, the factorization of #G = m is not known, there are no known (deterministic or probabilistic) algorithms for finding a random generator of G or even for checking if a given element of G is primitive. This is indeed one of the intractable problems of computational algebraic number theory. This problem for
can be bypassed as follows.
Recall that we have chosen q in such a way that
has a large known prime factor q′. Let H be the unique subgroup of G of order q′. Then H is also cyclic and we choose to work in H (using the arithmetic of G). It turns out that if q′ ≥ 2^160 and if H is not contained in a proper subfield of
, the security of cryptographic protocols over
does not degrade too much by the use of H (instead of the full G) as the ground group. But we now face a new problem, that is, the problem of finding a generator of H. Since #H = q′ is a prime, every element of H \ {1} is a generator of H. So the problem essentially reduces to that of finding any non-identity element of H. This latter problem has a simple probabilistic solution. First of all, if q – 1 = q′ is itself prime, choosing any random non-identity element of
will do. So assume q′ < q – 1. Choose a random
and let b := a(q – 1)/q′. By Lagrange’s theorem (Theorem 2.2, p 24), bq′ = aq–1 = 1 and, therefore, by Proposition 2.5
. Now,
being a field, the polynomial
can have at most (q – 1)/q′ roots in
(that is, in
) and hence the probability that b = 1 is ≤ ((q – 1)/q′)/(q – 1) = 1/q′. This justifies the randomized polynomial running time of the Las Vegas Algorithm 3.26. Indeed if q′ ≥ 2160, the while loop of the algorithm is executed only once almost always.
Algorithm 3.26. Finding a generator of the subgroup H of prime order q′
Input: A finite field Fq and the large prime divisor q′ of q − 1.
Output: An element b generating the subgroup H of Fq* of order q′.
Steps:
while (1) {
   Choose a random a ∈ Fq*.
   b := a^((q−1)/q′).
   If b ≠ 1, return b.
}
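The whole procedure amounts to one exponentiation per trial. A sketch for the prime-field case q = p (function name ours):

```python
import random

def subgroup_generator(p, q_prime):
    """Algorithm 3.26 for q = p prime: return a generator of the subgroup H
    of Z_p^* of prime order q_prime, where q_prime divides p - 1."""
    assert (p - 1) % q_prime == 0
    while True:
        a = random.randrange(2, p)
        b = pow(a, (p - 1) // q_prime, p)  # b lies in H by Lagrange's theorem
        if b != 1:                         # b = 1 with probability <= 1/q_prime
            return b

# p = 23, q' = 11: H is the group of quadratic residues modulo 23
b = subgroup_generator(23, 11)
print(pow(b, 11, 23))   # 1, so ord b divides 11; since b != 1, ord b = 11
```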
Polynomial factorization over finite fields is an interesting computational problem. All known deterministic algorithms for this purpose are quite poor, that is, fully exponential in the size of the field. However, if randomization is allowed, we have reasonably efficient (polynomial-time) algorithms. In this section, we outline the basic working of the modern probabilistic algorithms for polynomial factorization over finite fields. We assume that a non-constant polynomial f ∈ Fq[X] is to be factored. Without loss of generality, we can take f to be monic. We assume further that the arithmetic of Fq and that of Fq[X] is available. We work with a general value of q = p^n, p prime and n ∈ N, though in some cases we have to treat the case p = 2 separately. Irreducibility (or otherwise) in this section is always meant over Fq.
The factorization algorithm we are going to discuss is a generalization of the root-finding algorithm (see Exercise 3.36) and consists of three steps:

1. Square-free factorization (SFF)
2. Distinct-degree factorization (DDF)
3. Equal-degree factorization (EDF)

We now provide a separate detailed discussion of each of these three steps.
Theorem 3.2 is at the very heart of the square-free factorization algorithm and is a generalization of Exercise 2.61.

Theorem 3.2 Let K be a field and f ∈ K[X] a non-constant polynomial. Then every repeated irreducible factor of f divides gcd(f, f′). In particular, if gcd(f, f′) = 1, then f is square-free.

Proof Let h be an irreducible polynomial with h^2 | f, and write f = h^2 u. Then f′ = 2hh′u + h^2 u′ is divisible by h, so that h | gcd(f, f′).
The algorithm for SFF over Fq is now almost immediate except for one subtlety, namely, the consideration of the case f/gcd(f, f′) = 1, or equivalently, f′ = 0. In order to see when this case can occur, let us write the non-zero terms of f as f = a1X^e1 + · · · + atX^et with distinct exponents e1, . . . , et and a1, . . . , at ∈ Fq*. Then f′ = a1e1X^(e1−1) + · · · + atetX^(et−1) = 0 if and only if aiei = 0 for all i, that is, if and only if p divides all of e1, . . . , et. But then f(X) = h(X)^p, where h(X) = a1^(1/p)X^(e1/p) + · · · + at^(1/p)X^(et/p), since (ai^(1/p))^p = ai for all i (the p-th roots exist because Fq is perfect). These observations motivate the recursive Algorithm 3.27. It is easy to check that this (deterministic) algorithm runs in time polynomially bounded by deg f and log q.
Algorithm 3.27. Square-free factorization (SFF)
Input: A monic non-constant polynomial f ∈ Fq[X].
Output: A square-free factorization of f.
Steps:
Compute f′.
If f′ = 0, then f = h(X)^p; recursively compute a square-free factorization of h.
Otherwise, compute g := gcd(f, f′), output the square-free polynomial f/g, and (unless g = 1) recursively factor g.
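The core step f ↦ f/gcd(f, f′) can be sketched over a prime field F_p with polynomials held as coefficient lists (lowest degree first). This is an illustration only, not the book's Algorithm 3.27: the f′ = 0 case (f a p-th power), which the full algorithm handles recursively, is merely flagged here.

```python
# Square-free part of a monic polynomial over F_p (p prime); polynomials are
# coefficient lists [a0, a1, ...], lowest degree first. Illustration only.

def trim(f):
    while f and f[-1] == 0:
        f.pop()
    return f

def pdivmod(f, g, p):
    """Quotient and remainder of f divided by g over F_p."""
    f = f[:]
    q = [0] * max(len(f) - len(g) + 1, 0)
    inv = pow(g[-1], -1, p)
    while f and len(f) >= len(g):
        c = f[-1] * inv % p
        s = len(f) - len(g)
        q[s] = c
        for i, gi in enumerate(g):
            f[s + i] = (f[s + i] - c * gi) % p
        trim(f)
    return trim(q), f

def pgcd(f, g, p):
    """Monic gcd of f and g over F_p (Euclidean algorithm)."""
    f, g = trim(f[:]), trim(g[:])
    while g:
        f, g = g, pdivmod(f, g, p)[1]
    inv = pow(f[-1], -1, p)
    return [c * inv % p for c in f]

def squarefree_part(f, p):
    d = trim([(i * c) % p for i, c in enumerate(f)][1:])   # f'
    if not d:
        raise ValueError("f' = 0: f is a p-th power (handled recursively)")
    return pdivmod(f, pgcd(f, d, p), p)[0]

# f = (X + 1)^2 (X + 2) = X^3 + 4X^2 + 2 over F_5
print(squarefree_part([2, 0, 4, 1], 5))   # [2, 3, 1], i.e. (X + 1)(X + 2)
```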
Let f ∈ Fq[X] be a square-free polynomial of degree d. We can write f = f1 · · · fd, where for each i the polynomial fi ∈ Fq[X] is the product of all the irreducible factors of f of degree i. If f does not have an irreducible factor of degree i, then we take fi = 1 as usual.[5] In order to compute the polynomials fi, we make use of the fact that X^(q^i) − X is the product of all monic irreducible polynomials in Fq[X] whose degrees divide i (see Theorem 2.40 on p 82). It immediately follows that fi = gcd(f/(f1 · · · fi−1), X^(q^i) − X). Thus a few (at most d) gcd computations give us all the fi. The polynomials X^(q^i) − X are, however, of rather large degrees. But since gcd(g, X^(q^i) − X) = gcd(g, (X^(q^i) rem g) − X) for any divisor g of f, keeping polynomials reduced modulo f implies that we take gcds of polynomials of degrees ≤ d. This, in turn, implies that the DDF can be performed in (deterministic) polynomial time (in d and ln q).
[5] Conventionally, an empty product is taken to be the multiplicative identity and an empty sum to be the additive identity.
Algorithm 3.28 shows an implementation of the DDF. Though the algorithm does not require f to be monic, there is no harm in assuming so.
Algorithm 3.28. Distinct-degree factorization (DDF)
Input: A (non-constant) square-free polynomial f ∈ Fq[X] of degree d.
Output: The DDF of f, that is, the polynomials f1, . . . , fd as explained above.
Steps:
g := f. /* Make a local copy of f */
h := X.
for i = 1, 2, . . . , d {
   h := h^q rem g. /* Now h = X^(q^i) rem g */
   fi := gcd(g, h − X).
   g := g/fi.
}
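Algorithm 3.28 can be sketched over a prime field F_p in the same coefficient-list style (helper names ours). The Frobenius powers X^(p^i) are maintained reduced modulo the shrinking polynomial g, exactly as discussed above:

```python
# Distinct-degree factorization over F_p (sketch of Algorithm 3.28);
# polynomials are coefficient lists over F_p, lowest degree first.

def trim(f):
    while f and f[-1] == 0:
        f.pop()
    return f

def pdivmod(f, g, p):
    f = f[:]
    q = [0] * max(len(f) - len(g) + 1, 0)
    inv = pow(g[-1], -1, p)
    while f and len(f) >= len(g):
        c = f[-1] * inv % p
        s = len(f) - len(g)
        q[s] = c
        for i, gi in enumerate(g):
            f[s + i] = (f[s + i] - c * gi) % p
        trim(f)
    return trim(q), f

def pgcd(f, g, p):
    f, g = trim(f[:]), trim(g[:])
    while g:
        f, g = g, pdivmod(f, g, p)[1]
    inv = pow(f[-1], -1, p)
    return [c * inv % p for c in f]

def pmul(f, g, p):
    if not f or not g:
        return []
    r = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            r[i + j] = (r[i + j] + a * b) % p
    return trim(r)

def psub(f, g, p):
    r = list(f) + [0] * max(0, len(g) - len(f))
    for i, b in enumerate(g):
        r[i] = (r[i] - b) % p
    return trim(r)

def ppow(h, e, g, p):
    """h^e rem g over F_p by repeated squaring."""
    r, h = [1], pdivmod(h, g, p)[1]
    while e:
        if e & 1:
            r = pdivmod(pmul(r, h, p), g, p)[1]
        h = pdivmod(pmul(h, h, p), g, p)[1]
        e >>= 1
    return r

def ddf(f, p):
    """DDF of a monic square-free f: return {i: f_i}, f_i the product of
    the irreducible factors of f of degree i."""
    g = f[:]                    # make a local copy of f
    h, out, i = [0, 1], {}, 0   # h starts as X
    while len(g) > 1 and i < len(f):
        i += 1
        h = ppow(h, p, g, p)    # now h = X^(p^i) rem g
        fi = pgcd(g, psub(h, [0, 1], p), p)
        if len(fi) > 1:
            out[i] = fi
            g = pdivmod(g, fi, p)[0]
            h = pdivmod(h, g, p)[1]
    return out

# f = (X + 1)(X^2 + 1) over F_3: degree-1 part X + 1, degree-2 part X^2 + 1
print(ddf([1, 1, 1, 1], 3))   # {1: [1, 1], 2: [1, 0, 1]}
```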
This simple-minded implementation of the DDF is theoretically not the most efficient one known. In fact, it turns out that the DDF (and not the seemingly more complicated EDF) is the bottleneck of the entire polynomial factorization process. Therefore, making the DDF more efficient is important, and many improvements have been suggested in the literature. All these improved algorithms essentially do the same thing as above (that is, compute the gcds of f with the polynomials X^(q^i) − X), but they optimize the computation of the polynomials X^(q^i) rem f. The best-known method (due to Kaltofen and Shoup) is based on the observation that, in general, most of the fi are 1. Therefore, instead of computing each fi individually, one may break the interval 1, . . . , d into several subintervals I1, I2, . . . , Il and compute the product Fj of the fi for i ∈ Ij, j = 1, . . . , l. Only those Fj that turn out to be non-constant are further decomposed.
For cryptographic purposes, we will, however, deal with rather small values of d = deg f. (Typically d is at most a few thousand.) The asymptotically better algorithms usually do not outperform the simple Algorithm 3.28 for these values of d.
Equal-degree factorization, the last step of the polynomial factorization process, is the only probabilistic part of the algorithm. We may assume that f is a (monic) square-free polynomial of degree d and that each irreducible factor of f has the same (known) degree, say δ. If d = δ, then f is irreducible. So we assume that d > δ, that is, d = rδ for some integer r ≥ 2. Theorem 3.3 provides the basic foundation for the EDF.
Theorem 3.3 Let g be any polynomial in Fq[X]. Then g(X)^(q^δ) = g(X^(q^δ)), and consequently (X^(q^δ) − X) divides (g^(q^δ) − g).

Proof If g = 0, there is nothing to prove. If g = alX^l + · · · + a1X + a0 ≠ 0 with ai ∈ Fq, then g^(q^δ) = al^(q^δ)X^(lq^δ) + · · · + a0^(q^δ) = al(X^(q^δ))^l + · · · + a0 = g(X^(q^δ)), since the Frobenius map is additive and a^(q^δ) = a for all a ∈ Fq. The second assertion follows since g(X^(q^δ)) ≡ g(X) (mod X^(q^δ) − X).
Now, we have to separate two cases, namely, q odd and q even. Theorem 3.3 is valid for any q, even or odd, but taking q odd allows us to write g^(q^δ) − g = g(g^((q^δ−1)/2) − 1)(g^((q^δ−1)/2) + 1). With the above assumptions on f we have f | (X^(q^δ) − X) and, therefore, f | (g^(q^δ) − g), so that f = gcd(g^(q^δ) − g, f) = gcd(g, f) gcd(g^((q^δ−1)/2) − 1, f) gcd(g^((q^δ−1)/2) + 1, f). If g is randomly chosen, then gcd(g^((q^δ−1)/2) − 1, f) is with probability ≈ 1/2 a non-trivial factor of f. The idea is, therefore, to keep on choosing random g and computing gcd(g^((q^δ−1)/2) − 1, f) until one obtains a non-trivial factor f1 of f. One then recursively applies the algorithm to f1 and f/f1. It is sufficient to choose g with deg g < 2δ. Obviously, the exponentiation g^(q^δ) has to be carried out modulo f. We leave the details to the reader, but note that trying O(1) random polynomials g is expected to split f and, therefore, the EDF runs in expected polynomial time.
For the case q = 2^n, essentially the same algorithm works, but we have to use the split g^(q^δ) + g = g^(2^(nδ)) + g = (g^(2^(nδ−1)) + g^(2^(nδ−2)) + · · · + g^2 + g)(g^(2^(nδ−1)) + g^(2^(nδ−2)) + · · · + g^2 + g + 1). Once again, computing gcd(g^(2^(nδ−1)) + g^(2^(nδ−2)) + · · · + g^2 + g, f) for a random g splits f with probability ≈ 1/2 and, thus, we get an EDF algorithm that runs in expected polynomial time.
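One splitting step of the EDF for odd q = p can be sketched in the same coefficient-list style. For reproducibility the sketch scans small polynomials g deterministically, whereas the text chooses g at random:

```python
# One EDF splitting step over F_p (p odd): find a non-trivial factor of a
# monic square-free f all of whose irreducible factors have degree delta.

def trim(f):
    while f and f[-1] == 0:
        f.pop()
    return f

def pdivmod(f, g, p):
    f = f[:]
    q = [0] * max(len(f) - len(g) + 1, 0)
    inv = pow(g[-1], -1, p)
    while f and len(f) >= len(g):
        c = f[-1] * inv % p
        s = len(f) - len(g)
        q[s] = c
        for i, gi in enumerate(g):
            f[s + i] = (f[s + i] - c * gi) % p
        trim(f)
    return trim(q), f

def pgcd(f, g, p):
    f, g = trim(f[:]), trim(g[:])
    while g:
        f, g = g, pdivmod(f, g, p)[1]
    inv = pow(f[-1], -1, p)
    return [c * inv % p for c in f]

def pmul(f, g, p):
    if not f or not g:
        return []
    r = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            r[i + j] = (r[i + j] + a * b) % p
    return trim(r)

def psub(f, g, p):
    r = list(f) + [0] * max(0, len(g) - len(f))
    for i, b in enumerate(g):
        r[i] = (r[i] - b) % p
    return trim(r)

def ppow(h, e, g, p):
    r, h = [1], pdivmod(h, g, p)[1]
    while e:
        if e & 1:
            r = pdivmod(pmul(r, h, p), g, p)[1]
        h = pdivmod(pmul(h, h, p), g, p)[1]
        e >>= 1
    return r

def edf_split(f, p, delta):
    """Return one non-trivial monic factor via gcd(g^((p^delta-1)/2) - 1, f).
    The text picks g at random; we scan small linear g deterministically."""
    e = (p ** delta - 1) // 2
    for c0 in range(p):
        for c1 in range(1, p):
            g = [c0, c1]                        # g = c1*X + c0
            h = psub(ppow(g, e, f, p), [1], p)  # g^e - 1 rem f
            w = pgcd(f, h, p)
            if 1 <= len(w) - 1 <= len(f) - 2:   # non-trivial degree
                return w
    return None

# f = (X + 1)(X + 2) over F_5 (delta = 1)
print(edf_split([2, 3, 1], 5, 1))   # [1, 1], i.e. the factor X + 1
```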
3.33 Find a (polynomial-basis) representation of … . Compute a primitive element in this representation.

3.34 …

3.35 Implement the polynomial arithmetic of Fq[X] given that of Fq.

3.36 Let q = p^n (p prime and n ∈ N), f ∈ Fq[X] a non-constant polynomial, and let g := gcd(f, X^q − X). Explain how Algorithm 3.30 produces two non-trivial factors of g (over Fq).

Algorithm 3.30. Computing roots of a polynomial: characteristic 2

3.37 Use Exercise 3.36 to compute all the roots of the following polynomials: …

3.38 Let f and g be two monic irreducible polynomials over Fp and of the same degree n. Consider the two representations Fp[X]/⟨f(X)⟩ and Fp[Y]/⟨g(Y)⟩ of F_(p^n). In this exercise, we study how we can compute an isomorphism between these two representations. The polynomial f(Y) splits into linear factors over Fp[Y]/⟨g(Y)⟩. Consider a root α = α(Y) of f(Y) in Fp[Y]/⟨g(Y)⟩. Show that 1, α, α^2, . . . , α^(n−1) is an Fp-basis of (the Fp-vector space) Fp[Y]/⟨g(Y)⟩. For i = 0, . . . , n − 1, write (uniquely) α^i = αi0 + αi1Y + · · · + αi,n−1Y^(n−1) with αij ∈ Fp, and consider the matrix A = (αij), 0 ≤ i ≤ n−1, 0 ≤ j ≤ n−1. Show that the map that maps (the equivalence class of) a0 + a1X + · · · + an−1X^(n−1) to (the equivalence class of) b0 + b1Y + · · · + bn−1Y^(n−1), where (b0 b1 . . . bn−1) = (a0 a1 . . . an−1)A, is an Fp-isomorphism.

3.39 Let q = p^n for a prime p and n ∈ N. We have seen that the elements of Fp can be represented as integers between 0 and p − 1, whereas the elements of Fq can be represented as polynomials modulo some irreducible polynomial of Fp[X] of degree n, that is, as polynomials of Fp[X] of degrees < n. Show that the substitution X = p in the polynomial representation of the elements of Fq gives a representation of the elements of Fq as integers between 0 and q − 1. We call this latter representation the packed representation. Compare the advantages and disadvantages of the packed representation over the polynomial representation.

3.40 Let G be a cyclic multiplicatively written group of order m (and with the identity element e). Assume that the factorization of m is known. Devise an algorithm that computes the order of an arbitrary element in G. [H]

3.41 Berlekamp's Q-matrix factorization Let …
The recent popularity of cryptographic systems based on elliptic curve groups over finite fields stems from two considerations. First, discrete logarithms in the multiplicative group Fq* can be computed in subexponential time. This demands that q be sufficiently large, typically of length 768 bits or more. On the other hand, if the elliptic curve E over Fq is carefully chosen, the only known algorithms for solving the discrete logarithm problem in E(Fq) are fully exponential in lg q. As a result, smaller values of q suffice to achieve the desired level of security. In practice, the length of q is required to be between 160 and 400 bits. This leads to smaller key sizes for elliptic curve cryptosystems. The second advantage of using elliptic curves is that for a given prime power q, there is only one group Fq*, whereas there are many elliptic curve groups E(Fq) (over the same field Fq) with orders ranging from q + 1 − 2√q to q + 1 + 2√q. If a particular group E(Fq) is compromised, we can switch to another curve without changing the base field Fq.
In this section, we start with a description of efficient implementations of the arithmetic in the groups E(Fq). We then concentrate on some algorithms for computing the order #E(Fq). Knowledge of this order is necessary for identifying cryptographically suitable elliptic curves. We consider only prime fields Fp and fields F_(2^n) of characteristic 2. So we assume that the curve is defined by Equation (2.8) or Equation (2.9) on p 100 (supersingular curves are not used in cryptography) instead of by the general Weierstrass Equation (2.6) on p 98.
Let us first see how we can efficiently represent points on an elliptic curve E over Fq. Since a finite point P = (h, k) ∈ E(Fq) corresponds to two elements h, k ∈ Fq, and since each element of Fq can be represented using ≤ s = ⌈lg q⌉ bits, 2s bits suffice to represent P. We can do better than this. Substituting X = h in the equation for E leaves us with a quadratic equation in Y. This equation has two roots, of which k is one. If we adopt a convention (for example, see Section 6.2.1) that identifies, using a single bit, which of the two roots the coordinate k is, the storage requirement for P drops to s + 1 bits. During an on-line computation this compressed representation incurs some overhead and may be avoided. However, for off-line storage and transmission (of public keys, for example), this compression may be helpful.
Explicit formulas for the sum of two points and for the opposite of a point on an elliptic curve E are given in Section 2.11.2. These operations in E(Fq) can be implemented using a few operations in the ground field Fq.
Computation of mP for m ∈ N and P ∈ E(Fq) (or, more generally, for m ∈ Z) can be performed using a repeated double-and-add algorithm similar to the repeated square-and-multiply Algorithm 3.9. We leave out the trivial modifications and urge the reader to carry out the details.
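The double-and-add computation can be sketched in affine coordinates over a small prime field. The curve parameters below are chosen only for illustration (they are not from the text):

```python
# Repeated double-and-add on an affine short-Weierstrass curve over F_p.
# The point at infinity is represented by None. Example curve:
# y^2 = x^3 + x + 1 over F_23.
p, a = 23, 1

def ec_add(P, Q):
    if P is None: return Q
    if Q is None: return P
    x1, y1 = P
    x2, y2 = Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None                                   # P + (-P) = O
    if P == Q:
        lam = (3*x1*x1 + a) * pow(2*y1, -1, p) % p    # tangent slope
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, p) % p     # chord slope
    x3 = (lam*lam - x1 - x2) % p
    return (x3, (lam*(x1 - x3) - y1) % p)

def ec_mul(m, P):
    """Compute mP by repeated doubling and adding (cf. Algorithm 3.9)."""
    R = None
    while m:
        if m & 1:
            R = ec_add(R, P)
        P = ec_add(P, P)
        m >>= 1
    return R

P = (0, 1)              # lies on y^2 = x^3 + x + 1 over F_23
print(ec_mul(2, P))     # (6, 19)
print(ec_mul(3, P))     # (3, 13)
```

Only O(lg m) group operations are performed, mirroring the square-and-multiply analysis.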
Finding a random point P ∈ E(Fq) is another useful problem. If q = p is an odd prime and we use the short Weierstrass Equation (2.8), we first choose a random h ∈ Fp and substitute X by h to get Y^2 = h^3 + ah + b. This equation has 2, 0 or 1 solution(s) depending on whether h^3 + ah + b is a quadratic residue, a quadratic non-residue, or 0 modulo p. Quadratic residuosity can be checked by computing the Legendre symbol (Algorithm 3.15), whereas square roots modulo p can be computed using Tonelli and Shanks' Algorithm 3.16.
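For the special case p ≡ 3 (mod 4), a square root of a quadratic residue r is simply r^((p+1)/4), so random-point selection needs only a Legendre-symbol test and one exponentiation. A sketch (for general p, Tonelli–Shanks is required; curve parameters ours):

```python
import random

# Find a random point on y^2 = x^3 + a*x + b over F_p for a prime
# p = 3 (mod 4). The same square-root step also serves point decompression.
p, a, b = 23, 1, 1          # y^2 = x^3 + x + 1 over F_23

def random_point():
    while True:
        h = random.randrange(p)
        rhs = (h*h*h + a*h + b) % p
        if rhs == 0:
            return (h, 0)                        # exactly one solution
        if pow(rhs, (p - 1) // 2, p) == 1:       # Legendre symbol = +1
            k = pow(rhs, (p + 1) // 4, p)        # valid since p = 3 (mod 4)
            return (h, k) if random.randrange(2) == 0 else (h, p - k)
        # rhs is a non-residue: try another h

x, y = random_point()
assert (y*y - (x**3 + a*x + b)) % p == 0         # the point lies on E
```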
For a non-supersingular curve E over F_(2^n) defined by Equation (2.9), a random point P = (h, k) is chosen by first choosing a random h ∈ F_(2^n). Substituting X = h in the defining equation gives Y^2 + hY + (h^3 + ah^2 + b) = 0. If h = 0, then the unique solution for k is b^(2^(n−1)). If h ≠ 0, replacing Y by hY and dividing by h^2 transforms the equation to the form Y^2 + Y + α = 0 for some α ∈ F_(2^n). This equation has two or zero solutions depending on whether the absolute trace Tr(α) is 0 or 1. If k is a solution, the other solution is k + 1. In order to find a solution (if it exists), one may use the (probabilistic) root-finding algorithm of Exercise 3.36. Another possibility is discussed now.
We consider two separate cases. First, if n is odd, then the half-trace k = α + α^(2^2) + α^(2^4) + · · · + α^(2^(n−1)) is a solution, since k^2 + k = Tr(α) + α. On the other hand, if n is even, we first find a β ∈ F_(2^n) with Tr(β) = 1. Since Tr is a surjective homomorphism of the additive groups F_(2^n) → F_2, exactly half of the elements of F_(2^n) have trace 1. Therefore, a desired β can be quickly found by selecting elements of F_(2^n) at random and computing their traces. Now, it is easy to check that an explicit expression built from the conjugates α^(2^i) and β^(2^i) gives a solution of Y^2 + Y + α = 0.
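For odd n, the half-trace solution can be checked exhaustively in a toy field such as F_8 = F_2[x]/(x^3 + x + 1), with field elements packed into machine integers (a sketch; the representation choices are ours):

```python
# Solving Y^2 + Y = alpha over F_(2^n) for odd n via the half-trace
# k = alpha + alpha^(2^2) + alpha^(2^4) + ... + alpha^(2^(n-1)).
# Elements of F_8 = F_2[x]/(x^3 + x + 1) are 3-bit integers.
N, MOD = 3, 0b1011            # reduction polynomial x^3 + x + 1

def gmul(u, v):
    """Carry-less (XOR) multiplication with reduction modulo MOD."""
    r = 0
    while v:
        if v & 1:
            r ^= u
        u <<= 1
        if u >> N:
            u ^= MOD
        v >>= 1
    return r

def gsq(u):
    return gmul(u, u)

def trace(u):
    """Absolute trace Tr(u) = u + u^2 + u^4 (for N = 3)."""
    t, w = 0, u
    for _ in range(N):
        t ^= w
        w = gsq(w)
    return t

def half_trace(u):
    """k = u + u^4 (+ u^16 + ...): exponents 2^(2i), i = 0..(N-1)/2."""
    k, w = 0, u
    for _ in range((N + 1) // 2):
        k ^= w
        w = gsq(gsq(w))       # w -> w^4
    return k

# Y^2 + Y = alpha is solvable iff Tr(alpha) = 0; check the solution for all
# such alpha in F_8.
for alpha in range(8):
    if trace(alpha) == 0:
        k = half_trace(alpha)
        assert gsq(k) ^ k == alpha
```

The identity k^2 + k = Tr(α) + α follows by expanding the conjugates, since squaring permutes them cyclically.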
Counting points on elliptic curves is a challenging problem, both theoretically and computationally. The first polynomial-time (in log q) algorithm, invented by Schoof and later made efficient by Elkies and Atkin (and many others), is popularly called the SEA algorithm. Unfortunately, even the best implementations of this algorithm remain expensive, but it is the only known reasonable strategy, in particular when q = p is a large (odd) prime of a size of cryptographic interest. The more recent Satoh–FGH algorithm, named after its discoverer Satoh and after Fouquet, Gaudry and Harley, who proposed its generalized and efficient versions, is a remarkable breakthrough for the case q = 2^n. Both the SEA and the Satoh–FGH algorithms are mathematically quite sophisticated. We now present a brief overview of these algorithms.
We assume that q = p is a large odd prime, this being the typical situation when we apply the SEA algorithm. We also assume that E is given by the short Weierstrass equation Y^2 = X^3 + aX + b. Let q1 = 2, q2 = 3, q3 = 5, . . . be the sequence of prime numbers and t the Frobenius trace of E at p. By Hasse's theorem (Theorem 2.48, p 106), #E(Fp) = p + 1 − t with |t| ≤ 2√p. A knowledge of t modulo sufficiently many small primes l allows us to reconstruct t using the Chinese remainder theorem. Because of the Hasse bound on t, it is sufficient to choose l from the primes q1, q2, . . . in succession, until the product q1q2 · · · qr exceeds 4√p. By the prime number theorem (Theorem 2.20, p 53), we have r = O(ln p) and also qi = O(ln p) for each i = 1, . . . , r.
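The outer CRT step can be sketched independently of the torsion computations. In the toy example below, the residues of t are supplied by hand for the curve Y^2 = X^3 + X + 1 over F_23, which has 28 points and hence t = p + 1 − 28 = −4 (in the real algorithm the residues come from computations in the groups E[l]):

```python
from math import isqrt

def reconstruct_trace(p, residues):
    """residues: list of (l, t mod l) with l pairwise coprime small primes
    whose product exceeds 4*sqrt(p); returns t with |t| <= 2*sqrt(p)."""
    M, T = 1, 0
    for l, tl in residues:
        # incremental CRT: adjust T by a multiple of M to match tl modulo l
        k = ((tl - T) * pow(M, -1, l)) % l
        T += k * M
        M *= l
    assert M > 4 * isqrt(p)      # Hasse bound: t is determined uniquely
    return T if T <= M // 2 else T - M

# t = -4, so t mod 2 = 0, t mod 3 = 2, t mod 5 = 1
print(reconstruct_trace(23, [(2, 0), (3, 2), (5, 1)]))   # -4
```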
The most innovative idea of Algorithm 3.31 is the determination of the integers ti := t rem qi. For l = q1 = 2, the process is easy. We have t1 ≡ t ≡ 0 (mod 2) if and only if E(Fp) contains a point of order 2 (a point of the form (h, 0)), or equivalently, if and only if the polynomial X^3 + aX + b has a root in Fp. We compute the polynomial gcd g(X) := gcd(X^3 + aX + b, X^p − X) over Fp and conclude that t1 = 0 if g ≠ 1, and t1 = 1 otherwise.
Algorithm 3.31. Schoof's point-counting algorithm
Input: A prime field Fp and an elliptic curve E : Y^2 = X^3 + aX + b defined over Fp.
Output: The order of the group E(Fp).
Steps:
Find (the smallest) r with q1q2 · · · qr > 4√p.
For i = 1, 2, . . . , r, compute ti := t rem qi.
Combine t1, . . . , tr by the CRT to obtain t.
Return p + 1 − t.
Determination of ti for i > 1 involves more work. We explain here the original idea due to Schoof. We denote by l the i-th prime qi and by E[l] the set of all l-torsion points of E (Definition 2.78, p 105). The Frobenius endomorphism φ, which fixes the point at infinity and maps (h, k) to (h^p, k^p), satisfies the relation φ^2 − tφ + p = 0. If we restrict our attention only to the group E[l], then this relation reduces to φ^2 − tiφ + pi = 0, where ti = t rem l and pi = p rem l, that is,

(h^(p^2), k^(p^2)) + pi(h, k) = ti(h^p, k^p)

for all (h, k) ∈ E[l].
In terms of polynomials, the last relation is equivalent to

Equation 3.4

(X^(p^2), Y^(p^2)) + pi(X, Y) = ti(X^p, Y^p),
where the sum and the difference follow the formulas for the elliptic curve E. One now has to calculate symbolically rather than numerically, since X and Y are indeterminates. These computations can be carried out in the ring Fp[X, Y]/⟨f(X, Y), fl(X)⟩ (instead of over the algebraic closure of Fp), where f(X, Y) = Y^2 − (X^3 + aX + b) is the defining polynomial of E and fl = fl(X) is the l-th division polynomial of E (Section 2.11.2 and Theorem 2.47, p 106). Reduction of a polynomial modulo f makes its Y-degree ≤ 1, whereas reduction modulo fl makes the X-degree less than deg fl, which is O(l^2). We can try the values ti = 0, 1, . . . , l − 1 successively until the desired value satisfying Equation (3.4) is found.
It is not difficult to verify that Schoof's algorithm runs in time O(log^8 p) (under standard arithmetic in Fp) and is thus a deterministic polynomial-time algorithm for the point-counting problem. Essentially the same algorithm works for fields Fq with q = 2^n and has the same running time. Unfortunately, the big exponent (8) in the running time makes Schoof's algorithm quite impractical. Numerous improvements have been suggested to bring down this exponent. Elkies and Atkin's modification for the case q = p gives rise to the SEA algorithm, which has a running time of O(log^6 p) under standard arithmetic in Fp. This speed-up is achieved by working in the ring Fp[X, Y]/⟨f(X, Y), gl(X)⟩, where gl is a suitable factor of fl of degree O(l). Couveignes suggests improvements for fields of characteristic 2. Efficient implementations of the SEA algorithm are reported by Morain, Müller, Dewaghe, Vercauteren and many others. At the time of writing this book, the largest values of q for which the algorithm has been successfully applied are 10^499 + 153 (a prime) and 2^1999 (a power of 2).
The Satoh–FGH algorithm is well suited to fields Fq of small characteristic p and, in particular, to the fields F_(2^n) of characteristic 2. This algorithm has enabled point counting over fields as large as … . A generic description of the Satoh–FGH algorithm now follows, after the introduction of some mathematical notions. Though our practical interest concentrates on the fields F_(2^n) only, we consider curves over a general Fq with q = p^n, p a prime.
Recall from Section 2.14 that the ring Z_p of p-adic integers is a discrete valuation ring (Exercises 2.133 and 2.148) with the unique maximal ideal generated by p, and that the residue field Z_p/⟨p⟩ is isomorphic to Fp.
We represent Fq as a polynomial algebra over Fp, namely, as Fp[X] modulo an irreducible polynomial of degree n. We analogously define the p-adic ring Z_q := Z_p[X]/⟨f⟩, where f is an irreducible polynomial of degree n in Z_p[X] (whose reduction modulo p defines Fq). The elements of Z_q can be viewed as polynomials of degrees < n with p-adic integers as coefficients. The arithmetic operations in Z_q are polynomial operations in Z_p[X] modulo the defining polynomial f. The ring Z_p is canonically embedded in the ring Z_q (consider constant polynomials). Z_q turns out to be a discrete valuation ring with maximal ideal ⟨p⟩, and the residue field Z_q/⟨p⟩ is isomorphic to Fq.
Definition 3.6
The projection map π : Z_q → Z_q/⟨p⟩ ≅ Fq takes an element of Z_q to its residue modulo the maximal ideal ⟨p⟩.
The (Teichmüller) lift of an element a ∈ Fq is the unique element ω(a) ∈ Z_q satisfying ω(a)^q = ω(a) and π(ω(a)) = a.
The semi-Witt decomposition of an element a ∈ Z_q is the unique sequence a0, a1, a2, . . . of elements of Fq for which a = ω(a0) + ω(a1)p + ω(a2)p^2 + · · · .
The p-th power Frobenius endomorphism σ : Fq → Fq, a ↦ a^p, can now be extended to an endomorphism Σ : Z_q → Z_q as follows. Let a ∈ Z_q have the semi-Witt decomposition a0, a1, . . . with ai ∈ Fq. Then, Σ(a) is the unique element of Z_q having the semi-Witt decomposition

a0^p, a1^p, a2^p, . . . .

One can show that Σ is a ring endomorphism of Z_q satisfying π ∘ Σ = σ ∘ π. We have σ^n = id on Fq and similarly Σ^n = id on Z_q.
Now, let E = E0 be an elliptic curve defined over Fq. Application of σ to the coefficients of E0 gives another elliptic curve E1 over Fq, whose rational points are the points (σ(h), σ(k)) = (h^p, k^p), where (h, k) runs over the finite points of E0, together with the point at infinity. We may apply σ to E1 to get another curve E2 over Fq, and so on. Since σ^n = id, we get a cycle of elliptic curves defined over Fq:

Equation 3.5

E0 → E1 → E2 → · · · → En−1 → En = E0 (each arrow given by σ).
Similarly, if ε = ε0 is an elliptic curve defined over Z_q, application of Σ leads to a cycle of elliptic curves defined over Z_q:

Equation 3.6

ε0 → ε1 → ε2 → · · · → εn−1 → εn = ε0 (each arrow given by Σ).
We need the canonical lifting of an elliptic curve E over Fq to a curve ε over Z_q. Explaining that requires some more mathematical concepts.
Definition 3.7
Let K be a field and let E and E′ be two elliptic curves defined over K. A morphism (Definition 2.72, p 95) ψ : E → E′ that maps the point at infinity of E to the point at infinity of E′ is called an isogeny.
The kernel ker ψ of an isogeny ψ consists of the points of E that ψ maps to the point at infinity of E′.
The set Hom(E, E′) of all isogenies E → E′ is an Abelian group under pointwise addition. For E′ = E, the isogenies E → E form a ring End(E) under pointwise addition and composition, called the endomorphism ring of E.
The multiplication-by-m map of E is an isogeny. If End(E) contains an isogeny not of this type, we call E an elliptic curve with complex multiplication.
The next theorem establishes the foundation for lifting curves from Fq to Z_q.

Theorem 3.5 Let E be an ordinary (that is, non-supersingular) elliptic curve defined over Fq. Then there exists an elliptic curve ε defined over Z_q, unique up to isomorphism, whose reduction modulo ⟨p⟩ is E and for which End(ε) ≅ End(E). The curve ε is called the canonical lift of E.
With this definition of lifting of elliptic curves, Cycles (3.5) and (3.6) fit into the following commutative diagram, where εi is the canonical lift of Ei for each i = 0, 1, . . . , n:

ε0 → ε1 → · · · → εn = ε0      (arrows given by Σ)
 ↓    ↓           ↓            (reduction modulo ⟨p⟩)
E0 → E1 → · · · → En = E0      (arrows given by σ)
Algorithm 3.32 outlines the Satoh–FGH algorithm. In order to complete the description of the algorithm, one should specify how to lift curves (that is, a procedural equivalent of Theorem 3.5) and their p-torsion points and how the lifted data can be used to compute the Frobenius trace t. We leave out the details here.
Algorithm 3.32. Outline of the Satoh–FGH point-counting algorithm
Input: An elliptic curve E over Fq, q = p^n.
Output: The cardinality #E(Fq).
Steps:
Compute the curves E0, . . . , En−1 and their j-invariants j0, . . . , jn−1.
Lift Cycle (3.5) to Cycle (3.6) of canonical lifts ε0, . . . , εn−1 (to a suitable p-adic precision).
Use the lifted data to compute the Frobenius trace t, and return q + 1 − t.
The elements of Z_p (and hence of Z_q) are infinite sequences and hence cannot be represented in computer memory. However, we make an approximate representation by considering only the first m terms of the sequences representing elements of Z_q. Working in Z_q with this approximate representation is then essentially the same as working in Z_q/⟨p^m⟩. For the Satoh–FGH algorithm, we need m ≈ n/2.
For small p (for example, p = 2) and with standard arithmetic in Z_q, the Satoh–FGH algorithm has a deterministic running time of O(n^5) and a space requirement of O(n^3). With Karatsuba arithmetic the exponent in the running time drops from 5 to nearly 4.17. In addition, this algorithm is significantly easier to implement than optimized versions of the SEA algorithm. These facts are responsible for the superior performance of the Satoh–FGH algorithm over the SEA algorithm (for small p).
Choosing cryptographically suitable elliptic curves is more difficult than choosing good finite fields. First, the order of the elliptic curve group E(Fq) must have a suitably large prime divisor, say, of bit length 160 or more. In addition, the MOV attack applies to supersingular curves and the anomalous attack to anomalous curves (Definition 2.80 and Section 4.5). So a secure curve must be non-supersingular and non-anomalous. Checking all these criteria for a random curve E over Fq requires computing the group order #E(Fq). One may use either the SEA algorithm or the Satoh–FGH algorithm for this purpose. Once #E(Fq) is known, it is easy to check whether E is supersingular or anomalous. But factoring #E(Fq) to find its largest prime divisor may be a difficult task and is not recommended. One may instead extract all the small prime factors of #E(Fq) by trial divisions with the primes q1 = 2, q2 = 3, q3 = 5, . . . , qr for a predetermined r, and write

#E(Fq) = m1m2,

where m1 has all prime factors ≤ qr and m2 has all prime factors > qr. If m2 is prime and of the desired size, then E is treated as a good curve. Algorithm 3.33 illustrates these steps.
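The order-screening step (write n = m1m2 by trial division and test the cofactor for primality) can be sketched as follows; the Miller–Rabin helper and the particular bounds are ours:

```python
# Screen a candidate group order n = #E(F_q): strip the prime factors up to
# a bound, then check whether the remaining cofactor m2 is a large prime.

def is_probable_prime(n, bases=(2, 3, 5, 7, 11, 13, 17)):
    """Miller-Rabin test with a few fixed bases."""
    if n < 2:
        return False
    for q in (2, 3, 5, 7, 11, 13, 17):
        if n % q == 0:
            return n == q
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for a in bases:
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False
    return True

def screen_order(n, bound, min_bits):
    """Write n = m1 * m2 with m1 smooth up to `bound`; accept the order iff
    the cofactor m2 is a prime of at least `min_bits` bits."""
    m1, m2 = 1, n
    for q in range(2, bound + 1):
        while m2 % q == 0:
            m1 *= q
            m2 //= q
    ok = m2.bit_length() >= min_bits and is_probable_prime(m2)
    return ok, m1, m2

# 2^2 * 1000003: the cofactor 1000003 is prime, so this order is accepted
print(screen_order(4 * 1000003, 100, 20))   # (True, 4, 1000003)
```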
The computation of the group orders #E(Fq) takes up most of the execution time of the above algorithm. It is, therefore, of utmost importance to employ good algorithms for point counting. The best algorithms known to date (the SEA and the Satoh–FGH algorithms) are only barely practical. Further research in this area may lead to better algorithms in the future.
Algorithm 3.33. Choosing a cryptographically good elliptic curve
Input: A suitably large finite field Fq.
Output: A cryptographically good elliptic curve E over Fq.
Steps:
while (1) {
   Choose a random elliptic curve E over Fq.
   Compute n := #E(Fq) (using the SEA or the Satoh–FGH algorithm).
   If E is supersingular or anomalous, continue.
   By trial division with the primes q1, . . . , qr, write n = m1m2.
   If m2 is a prime of the desired size, return E.
}
There are ways of generating good curves that do not require point-counting algorithms over large finite fields. One possibility is to use the so-called subfield curves. If Fq has a subfield F_q′ of relatively small cardinality, one can choose a random curve E over F_q′ and compute #E(F_q′). Since E is also a curve defined over Fq, and since #E(Fq) can be easily obtained from #E(F_q′) using Theorem 2.51 (p 107), we save the lengthy direct computation of #E(Fq). However, one drawback of this method is that since E is chosen with coefficients from the small field F_q′, we do not have many choices. The second drawback is that we must have a subfield of small cardinality q′. If q is already a prime, this strategy does not work at all. If q = p^n, p a small prime, we need n to have a small divisor n′ that corresponds to q′ = p^n′. Sometimes small odd primes p are suggested, but the arithmetic in a non-prime field of odd characteristic is inherently much slower than that in a field of nearly equal size but of characteristic 2.
Specific curves with complex multiplication (Definition 3.7) over large prime fields have also been suggested in the literature. Finding good curves with complex multiplication involves less computational overhead than Algorithm 3.33, but (like the subfield curves) offers limited choice. However, it is important to mention that no special attacks are currently known either for subfield curves or for curves chosen by the complex multiplication strategy.
Let K = Fq be a finite field and C a hyperelliptic curve of genus g defined over K by Equation (2.13), that is, by

C : Y^2 + u(X)Y = v(X)

for suitable polynomials u, v ∈ K[X]. We want to implement the arithmetic in the Jacobian J = J_C(K). Recall from Section 2.12 that an element of J can be represented uniquely as a reduced divisor Div(a, b) for a pair of polynomials a(X), b(X) ∈ K[X] with a monic, deg a ≤ g, deg b < deg a and a | (b^2 + bu − v). Thus, each element of J requires O(g log q) storage.
We first present Algorithm 3.34 that, given two elements Div(a1, b1) and Div(a2, b2) of J, computes the reduced divisor Div(a, b) which satisfies Div(a, b) ~ Div(a1, b1) + Div(a2, b2). The algorithm proceeds in two steps:

Compute a semi-reduced divisor Div(a′, b′) ~ Div(a1, b1) + Div(a2, b2).

Compute the reduced divisor Div(a, b) ~ Div(a′, b′).

Both these steps can be performed in (deterministic) polynomial time (in the input size, that is, g log q). Algorithm 3.34 implements the first step and continues to work even when the input divisors are semi-reduced (and not completely reduced).
Algorithm 3.34. Composition of (semi-)reduced divisors
Input: (Semi-)reduced divisors Div(a1, b1) and Div(a2, b2) defined over K.
Output: A semi-reduced divisor Div(a′, b′) ~ Div(a1, b1) + Div(a2, b2).
Steps:
Compute d := gcd(a1, a2, b1 + b2 + u) together with polynomials s1, s2, s3 satisfying d = s1a1 + s2a2 + s3(b1 + b2 + u).
a′ := (a1a2/d^2).
b′ := ((s1a1b2 + s2a2b1 + s3(b1b2 + v))/d) rem a′.
It is an easy check that the two expressions appearing between pairs of big parentheses in Algorithm 3.34 are polynomials. This algorithm does only a few gcd calculations and some elementary arithmetic operations on polynomials of K[X]. If the input polynomials (a1, a2, b1, b2) correspond to reduced divisors, then their degrees are ≤ g and hence this algorithm runs in polynomial time in the input size. Furthermore, in that case, the output polynomials a′ and b′ are of degrees ≤ 2g.
We now want to compute the unique reduced divisor Div(a, b) equivalent to the semi-reduced divisor Div(a′, b′). This can be performed using Algorithm 3.35. If the degrees of the input polynomials a′ and b′ are O(g) (as is the case with those output by Algorithm 3.34), Algorithm 3.35 takes time polynomial in g log q. To sum up, two elements of J can be added in polynomial time. The correctness of the two algorithms is not difficult to establish, but the proof is long and involved and hence omitted. Interested readers might look at the appendix of Koblitz's book [154].
For an element α ∈ J and n ∈ N, one can easily write an algorithm (similar to Algorithm 3.9) to compute nα using O(log n) additions and doublings in J.
For a hyperelliptic curve C of genus g defined over a field Fq, we are interested in the order of the Jacobian J = J_C(Fq) rather than in the cardinality of the curve itself. Algorithmic and implementational studies of computing #J have not received enough research attention to date, and though polynomial-time algorithms are known to this effect (at least for curves of small genus), these algorithms are far from practical for hyperelliptic curves of cryptographic sizes. In this section, we look at some of these algorithms.
Algorithm 3.35. Reduction of a semi-reduced divisor
Input: A semi-reduced divisor Div(a′, b′) defined over K.
Output: The reduced divisor Div(a, b) ~ Div(a′, b′).
Steps:
(a, b) := (a′, b′).
while (deg a > g) {
   a := (v − bu − b^2)/a.
   b := (−u − b) rem a.
}
Normalize a to a monic polynomial, and return Div(a, b).
We start with some theoretical results which are generalizations of those for elliptic curves. The Frobenius endomorphism φ, x ↦ x^q, is a (non-trivial) Fq-automorphism of the algebraic closure of Fq. The map φ naturally (that is, coordinate-wise) extends to the points on C and also to divisors and, in particular, to the Jacobian of C over the algebraic closure as well as to J = J_C(Fq). For a reduced divisor Div(a, b), we have φ(Div(a, b)) = Div(φ(a), φ(b)), where for a polynomial h the polynomial φ(h) is obtained by applying the map φ to the coefficients of h. It is known that φ satisfies a monic polynomial χ(X) of degree 2g with integer coefficients. For example, for g = 1 (elliptic curves) we have

χ(X) = X^2 − tX + q,

where t is the trace of Frobenius at q. For g = 2, we have

Equation 3.7

χ(X) = X^4 − t1X^3 + t2X^2 − t1qX + q^2
for integers t1, t2. The cardinality n := #J_C(Fq) is related to the polynomial χ(X) as

n = χ(1)

and satisfies the inequalities

Equation 3.8

(√q − 1)^2g ≤ n ≤ (√q + 1)^2g.

Thus n lies in a rather narrow interval, called the Hasse–Weil interval, of width w := (√q + 1)^2g − (√q − 1)^2g.
Theorem 2.50 can be generalized as follows:

The Jacobian J_C(Fq) is isomorphic to a product Z_n1 × Z_n2 × · · · × Z_n2g of cyclic groups, where ni | ni+1 for 1 ≤ i ≤ 2g − 1 and ni | q − 1 for 1 ≤ i ≤ g.
The exponent of J_C(Fq) (see Exercise 3.42) is clearly m := Exp(J_C(Fq)) = n2g. Since m | n, there are ≤ ⌈(w + 1)/m⌉ possibilities for n for a given m (where w is the width of the Hasse–Weil interval). In particular, n is uniquely determined by m if m > w. From the Hasse–Weil bound, we have n ≥ (√q − 1)^2g and, since n | m^2g, also m ≥ n^(1/2g) ≥ √q − 1. There are examples of curves for which m is as small as this bound allows. On the other hand, w = O(q^((2g−1)/2)). So it is possible to have m ≤ w, though such curves are relatively rare. In the more frequent case (m > w), Algorithm 3.36 determines n.
Algorithm 3.36. Computing #J_C(Fq) from the group exponent
Input: A hyperelliptic curve C of genus g defined over Fq.
Output: The cardinality n of the Jacobian J_C(Fq).
Steps:
m := 1.
while (m ≤ w) {
   Choose a random element x ∈ J_C(Fq).
   Compute ν := ord x.
   m := lcm(m, ν).
}
Return the unique multiple n of m lying in the Hasse–Weil interval.
Since Exp(J_C(Fq)) is the lcm of the orders of all the elements of the Jacobian, the above algorithm eventually (in practice, after a few executions of the while loop) computes this exponent. However, if Exp(J_C(Fq)) ≤ w, the algorithm never terminates. Thus, we may forcibly terminate the algorithm by reporting failure after sufficiently many random elements x are tried (and we continue to have m ≤ w). In order to complete the description of the algorithm, we must specify a strategy to compute ν := ord x for a randomly chosen x ∈ J_C(Fq). Instead of computing ν directly, we compute an (integral) multiple μ of ν, factorize μ and then determine ν. Since nx = 0, we search for a desired multiple μ in the Hasse–Weil interval. This search can be carried out using a baby-step–giant-step (Section 4.4) or a birthday-paradox (Exercise 2.172) method, and the algorithm achieves an expected running time of O(q^((2g−1)/4)), which is exponential in the input size. This method, therefore, cannot be used except when n is small.
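The search for a multiple μ of ord x in an interval, followed by the refinement of μ to the exact order, can be sketched in the toy group Z_p* (in the algorithm above the group is the Jacobian and the interval is the Hasse–Weil interval):

```python
from math import isqrt

def prime_factors(n):
    fs, f = set(), 2
    while f * f <= n:
        while n % f == 0:
            fs.add(f)
            n //= f
        f += 1
    if n > 1:
        fs.add(n)
    return fs

def order_in_interval(x, p, lo, hi):
    """Find a multiple mu of ord(x) in [lo, hi] by baby-step-giant-step,
    then strip primes from mu to obtain the exact order of x in Z_p^*."""
    s = isqrt(hi - lo) + 1
    baby, e = {}, pow(x, lo, p)
    for j in range(s):                     # baby steps: e = x^(lo + j)
        baby.setdefault(e, j)
        e = e * x % p
    ginv, cur, mu = pow(x, -s, p), 1, None
    for i in range(s + 1):                 # giant steps: cur = x^(-i*s)
        if cur in baby:
            m = lo + baby[cur] + i * s
            if m <= hi and pow(x, m, p) == 1:
                mu = m
                break
        cur = cur * ginv % p
    if mu is None:
        return None
    nu = mu                                # refine: ord(x) divides mu
    for q in prime_factors(mu):
        while nu % q == 0 and pow(x, nu // q, p) == 1:
            nu //= q
    return nu

print(order_in_interval(10, 101, 90, 101))   # 4 (the order of 10 mod 101)
print(order_in_interval(2, 101, 90, 101))    # 100
```

The BSGS phase costs O(√(hi − lo)) group operations, which is the source of the exponential running time quoted above.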
For hyperelliptic curves of small genus g, generalizations of Schoof's algorithm (Algorithm 3.31) can be used. Gaudry and Harley [106] describe the case g = 2. One computes the polynomial χ(X) of Equation (3.7), that is, the values of t1 and t2, modulo sufficiently many small primes l. Since the roots of χ(X) are of absolute value √q, we have |t1| ≤ 4√q and |t2| ≤ 6q. Therefore, determination of t1 and t2 modulo O(log q) small primes l uniquely determines χ(X) (as well as n = χ(1)).
Let J[l] be the set of l-torsion points of the Jacobian. The Frobenius map restricted to J[l] satisfies

Equation 3.9

φ^4 − t1,lφ^3 + t2,lφ^2 − t1,lqlφ + ql^2 = 0,

where t1,l := t1 rem l, t2,l := t2 rem l and ql := q rem l. By exhaustively trying all (that is, ≤ l^2) possibilities for t1,l and t2,l, one can find their actual values, that is, those values that cause the left side of Equation (3.9) to vanish (symbolically).
A result by Kampkötter [144] allows us to consider only the reduced divisors D ∈ J[l] of the form D = Div(a, b) with a(X) = X^2 + a1X + a0 and b(X) = b1X + b0. There exists an ideal I of the polynomial ring Fq[A1, A0, B1, B0] such that a reduced divisor D of this special form lies in J[l] if and only if f(a1, a0, b1, b0) = 0 for all f ∈ I. Thus the computation of the left side of Equation (3.9) may be carried out in the ring Fq[A1, A0, B1, B0]/I. An explicit set of generators for I can be found in Kampkötter [144]. To sum up, we get a polynomial-time algorithm.
Working (modulo I) in the 4-variate polynomial ring Fq[A1, A0, B1, B0] is, indeed, expensive. Use of Cantor's division polynomials [43] essentially reduces the arithmetic to a single variable (instead of four). We do not explore further along this line, but only mention that for g = 2, Schoof's algorithm employing division polynomials runs in time O(log^9 q). Although this is a theoretical breakthrough, the prohibitively large exponent (9) in the running time precludes the feasibility of using the algorithm in the range of interest in cryptography.
| 3.42 | Let G be a multiplicative group (not necessarily Abelian and/or finite) with identity e.
Let
|
So far we have met several situations where we needed random elements from a (finite) set S, for example, the set Z_n (or Z_n*), or the set F_q (or F_q*), or the set of F_q-rational points on an elliptic (or hyperelliptic) curve. By randomness, we here mean that each element of S is equally likely to get selected, that is, if #S = n, then each element of S is selected with probability 1/n. Since elements of a set S of cardinality n can be represented as bit strings of length ≤ ⌈lg(n + 1)⌉, the problem of selecting a random element of S essentially reduces to the problem of generating (finite) random sequences of bits. A random sequence of bits is one in which every bit takes each of the values 0 and 1 with probability 1/2 (irrespective of the other bits in the sequence).
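The reduction from random selection to random bits can be sketched by rejection sampling. The use of Python's `secrets` module as the bit source is an assumption for illustration; any (pseudo)random bit generator could take its place.

```python
# Sketch: select a uniformly random element of a finite set S using only
# random bits, by rejection sampling.  secrets.randbits stands in for any
# (pseudo)random bit source.
import secrets

def random_element(S):
    S = list(S)
    n = len(S)
    k = (n - 1).bit_length() or 1   # bits needed to cover indices 0..n-1
    while True:
        r = secrets.randbits(k)     # k random bits -> integer in [0, 2^k)
        if r < n:                   # accept; each index has probability 1/n
            return S[r]

print(random_element({10, 20, 30, 40, 50}) in {10, 20, 30, 40, 50})
```

Rejection (rather than reducing r modulo n) is what keeps the selection exactly uniform; each trial succeeds with probability at least 1/2.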
Generating a (truly) random sequence of bits seems to be an impossible task. Some natural phenomena, such as electronic noise from a specifically designed integrated circuit, can be used to generate random bit sequences. However, such systems are prone to malfunction, are often influenced by observation and are, of course, costly. A software solution is definitely the more practical alternative. Phenomena that can be captured by programs, like the system clock or the work load or memory usage of a machine, may be used to generate random bit sequences. But this strategy also suffers from various drawbacks. First of all, the sequences generated by these methods are not (truly) random. Moreover, they are vulnerable to attacks by adversaries (for example, if a random bit generator is based on the system clock and if the adversary knows the approximate time when a bit sequence is generated using that generator, she will have to try only a few possibilities to generate the same sequence).
In order to obviate these difficulties, pseudorandom bit generators (PRBG) are commonly used. A bit string a0a1a2 . . . is generated by a PRBG following a specific strategy, which is more often than not a (mathematical) algorithm. The first bit a0 is based on a certain initial value, called a seed, whereas for i ≥ 1 the bit ai is generated as a predetermined function of some or all of the previous bits a0, . . . , ai–1. Since the resulting bit ai is now functionally dependent on the previous bits, the sequence is not at all random (but deterministic); still, we are happy if the sequence a0a1a2 . . . looks or behaves random. The random behaviour of a sequence is often examined by certain well-known statistical tests. If a generator generates bit sequences that pass these tests, we call it a PRBG and the sequences available from such a generator pseudorandom bit sequences. Various kinds of PRBGs are used for generating pseudorandom bit sequences. We won’t describe them all here, but concentrate on a particular kind of generator that has a special significance in cryptography.
A PRBG is called a cryptographically strong (or secure) pseudorandom bit generator, or a CSPRBG in short, if there exists no polynomial-time algorithm (provably or otherwise) that, from a knowledge of the previous bits of a generated sequence (but without the knowledge of the seed), predicts the next bit with probability significantly larger than 1/2. Usually, an intractable computational problem (see Section 4.2) is at the heart of the security of a CSPRBG. As an example, we now explain the Blum–Blum–Shub (or BBS) generator.
| Algorithm 3.37 | The Blum–Blum–Shub (BBS) pseudorandom bit generator
Input: A (random) seed s.
Output: A cryptographically strong pseudorandom bit sequence a0a1a2 . . . .
Steps:
Generate two (distinct) large primes p and q, each ≡ 3 (mod 4), and compute n = pq.
Compute x0 := s^2 rem n.
For i = 0, 1, 2, . . . , compute xi+1 := xi^2 rem n and output the bit ai := xi+1 rem 2. |
In Algorithm 3.37, we have used indices for the sequence xi for the sake of clarity. In an actual implementation, all indices may be removed, that is, one may use a single variable x to store and update the sequence xi. Furthermore, if there is no harm in altering the value of s, one might even use the same variable for s and x.
The cryptographic security of the BBS generator stems from the presumed intractability of factoring integers or of computing square roots modulo a composite integer (here n = pq) (see Exercise 3.43). Note that p, q and s have to be kept secret, whereas n can be made public. A knowledge of xm+1 is also not expected to help an opponent, and it too may be made public. For achieving the desired level of secrecy, p and q should be of nearly equal size, and the size of n should be sufficiently large (say, 768 bits or more). Generating each bit by the BBS generator involves a modular squaring and is, therefore, somewhat slow (compared to the traditional PRBGs, which do not guarantee cryptographic security). However, the BBS generator can be used for moderately infrequent purposes, for example, for the generation of a session key. Moreover, a maximum of lg lg n (least significant) bits (instead of 1 as in the above description) can be extracted from each xi without degrading the security of the generator.
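A minimal sketch of the BBS generator follows; the primes here are toy values, far too small for any real security, and serve only to trace the squaring-and-extract-a-bit loop.

```python
# Blum-Blum-Shub sketch with toy parameters.  Real use needs large random
# primes p, q = 3 (mod 4); the values below are only for illustration.
def bbs_bits(p, q, seed, count):
    assert p % 4 == 3 and q % 4 == 3
    n = p * q
    x = seed * seed % n              # x0 := s^2 rem n
    bits = []
    for _ in range(count):
        x = x * x % n                # one modular squaring per output bit
        bits.append(x & 1)           # emit the least significant bit
    return bits

print(bbs_bits(11, 19, 3, 8))
```

As noted in the text, a single variable x suffices for the whole state; the indices of the description are only for clarity.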
It is evident that any (infinite) sequence a0a1 · · · generated by the BBS generator must be periodic. As an extreme example, if s = 1, then the BBS generator outputs a sequence of one-bits only. We are interested in rather short (sub)sequences (of such infinite sequences). Therefore, it suffices if the length of the period is reasonably large (for a random seed s). This is guaranteed if one uses strong primes (Definition 3.5).
The way we have defined PRBG (or CSPRBG) makes it evident that the unpredictability of a pseudorandom bit sequence essentially reduces to that of the seed. Care should, therefore, be taken in order to choose the values of the seed. The seed need not be randomly or pseudorandomly generated, but should have a high degree of unpredictability, so that it is infeasible for an adversary to have a reasonably quick guess of it. As an example, assume that we intend to generate a suitable seed s for the BBS generator with a 1024-bit modulus n. If we employ for that purpose a specific algorithm (known to the opponent) using only the built-in random number generator of a standard compiler and if this built-in generator has a 32-bit seed σ, then there are only 232 possibilities for s, even when s itself is 1024 bits long. Thus an adversary has to try at most 232 (231 on an average) values of σ in order to guess the correct value of s. So we must add further unpredictability to the resulting seed value s. This can be done by setting the bits of s depending on several factors, like the system clock, the system load, the memory usage, keyboard inputs from a human user and so on. Each of such factors might not be individually completely unpredictable, but their combined effect should preclude the feasibility of an exhaustive search by the opponent. After all, we have 1024 bits of s to fill up and even if the total search space of possible values of s is as low as 2160, it would be impossible for the opponent to guess s in a reasonable span of time. Note that more often than not the values of the seed need not be remembered: that is, need not be regenerated afterwards. As a result, there is no harm in introducing unpredictability in s caused by certain factors that we would not ourselves be able to reproduce in future.
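One possible way to combine such weakly unpredictable sources, sketched here as an assumption rather than as a prescription of the text, is to hash them together and stretch the digest to the desired seed length. The particular sources and the function name `derive_seed` are hypothetical.

```python
# Sketch: derive a seed by hashing together several weakly unpredictable
# sources.  The particular sources here (clock, process id, a user string)
# are illustrative; combine as many independent sources as available.
import hashlib, os, time

def derive_seed(user_input, nbytes=128):
    h = hashlib.sha256()
    h.update(str(time.time_ns()).encode())   # high-resolution clock
    h.update(str(os.getpid()).encode())      # process id
    h.update(user_input.encode())            # e.g., keyboard timings
    digest = h.digest()
    # Stretch the digest to nbytes (128 bytes = 1024 bits of seed).
    out = b""
    while len(out) < nbytes:
        digest = hashlib.sha256(digest).digest()
        out += digest
    return int.from_bytes(out[:nbytes], "big")

s = derive_seed("keystroke timings 183 92 41")
print(s.bit_length() <= 1024)
```

Hashing ensures that guessing the seed requires guessing all the sources simultaneously, which is the combined-unpredictability effect described above.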
| 3.43 | With the notations of Algorithm 3.37 show that:
|
This chapter deals with the algorithmic details needed for setting up public-key cryptosystems. We study algorithms for selecting public-key parameters and for carrying out the basic cryptographic primitives. Algorithms required for cryptanalysis are dealt with in Chapters 4 and 7.
We start the chapter with a discussion on algorithms. Time and space complexities of algorithms are discussed first and the standard order notations are explained. Next we study the class of randomized algorithms, which provide practical solutions to many computational problems that do not have known efficient deterministic algorithms. In the worst case, a randomized algorithm may take exponential running time and/or may output an incorrect answer. However, the probability of these bad behaviours of a randomized algorithm can be made arbitrarily low. We finally discuss reduction between computational problems. A reduction lets us relate the complexity of one problem to that of another.
Many popular public-key cryptosystems are based on working modulo big integers. These integers have sizes up to several thousand bits. One cannot represent such integers with full precision by the built-in data types supplied by common programming languages. So we require efficient ways of representing and doing arithmetic on big integers. We carefully deal with the implementation of arithmetic on multiple-precision integers. We provide a special treatment of the computation of gcds and extended gcds of integers. We utilize these arithmetic functions in order to implement modular arithmetic. Most public-key primitives involve modular exponentiations as their most time-consuming steps. In addition to the standard square-and-multiply algorithm, certain special tricks (including Montgomery exponentiation) that help speed up modular exponentiation are described at length in this section.
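The square-and-multiply algorithm mentioned above can be sketched in a few lines; in practice Python's built-in pow(g, e, n) performs the same computation.

```python
# Square-and-multiply sketch for modular exponentiation: process the bits
# of the exponent, squaring for every bit and multiplying when the bit is 1.
def modexp(g, e, n):
    result = 1
    base = g % n
    while e > 0:
        if e & 1:                   # current exponent bit is 1
            result = result * base % n
        base = base * base % n      # square for the next bit
        e >>= 1
    return result

print(modexp(7, 560, 561))          # matches Python's built-in pow(7, 560, 561)
```

The loop performs O(log e) modular multiplications, which is what makes exponentiation with thousand-bit operands feasible at all.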
In the next section, we deal with some other number-theoretic algorithms. One important topic is the determination of whether a given integer is prime. The Miller–Rabin primality test is an efficient algorithm for primality testing. This algorithm is, however, randomized in the sense that it may declare some composite integers as primes. Using suitable choices of the relevant parameters, the probability of this error may be reduced to very low values (≤ 2^–80). We also briefly introduce the deterministic polynomial-time AKS algorithm for primality testing. Since we can easily check the primality of integers, we can generate random primes by essentially searching in a pool of randomly generated odd integers of a given size. Security in some cryptosystems requires such random primes to possess some special properties. We present Gordon’s algorithm for generating cryptographically strong primes. The section ends with a study of the Tonelli–Shanks algorithm for computing square roots modulo a big prime.
Next, we concentrate on the implementation of finite field arithmetic. The arithmetic of a field of prime cardinality p is the same as integer arithmetic modulo p and is discussed in detail earlier. The other finite fields that are of interest to cryptology are extension fields of characteristic 2. In order to study the arithmetic of these fields, one first requires the arithmetic of the polynomial ring F2[X]. We discuss the basic operations in this ring. Next we talk about algorithms for checking irreducibility of polynomials and for obtaining (random) irreducible polynomials in F2[X]. If f(X) is such a polynomial of degree d, the arithmetic of the field F_{2^d} is the same as the arithmetic of F2[X] modulo the defining polynomial f(X). In order that a finite field F_q is cryptographically safe, we require q – 1 to have a prime factor of sufficiently big size (160 bits or more). Suppose that the factorization of q – 1 is provided. We discuss algorithms that compute the order of elements in F_q*, that check if a given element is a generator of the cyclic group F_q*, and that produce random generators of F_q*. We end the study of finite fields by discussing a way to factor polynomials over finite fields. The standard algorithm comprising the three steps square-free factorization, distinct-degree factorization and equal-degree factorization is explained in detail. The exercises cover the details of an algorithm to compute the roots of polynomials over finite fields.
The arithmetic of elliptic curves over finite fields is dealt with next. Each operation in the elliptic curve group can be realized by a sequence of operations over the underlying field. The multiple of a point on an elliptic curve can be computed by a repeated double-and-add algorithm which is the same as the square-and-multiply algorithm for modular exponentiation, applied to an additive setting. We also discuss ways of selecting random points on elliptic curves. We then present two algorithms for counting points in an elliptic curve group. The SEA algorithm is suitable for curves over prime fields, whereas the Satoh–FGH algorithm works efficiently for curves over fields of characteristic 2. Once we can determine the order of an elliptic curve group, we can choose good elliptic curves for cryptographic usage.
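The double-and-add computation of a point multiple can be sketched as follows for a curve in short Weierstrass form over a prime field; the curve, point and modulus below are illustrative toy values.

```python
# Double-and-add sketch for point multiplication on y^2 = x^3 + ax + b over
# F_p (affine coordinates, odd prime p).  None represents the point at
# infinity, the group identity.
def ec_add(P, Q, a, p):
    if P is None: return Q
    if Q is None: return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None                     # P + (-P) = O
    if P == Q:
        lam = (3 * x1 * x1 + a) * pow(2 * y1, -1, p) % p   # tangent slope
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, p) % p          # chord slope
    x3 = (lam * lam - x1 - x2) % p
    return (x3, (lam * (x1 - x3) - y1) % p)

def ec_mul(k, P, a, p):
    R = None                            # accumulator, starts at O
    while k > 0:
        if k & 1:
            R = ec_add(R, P, a, p)      # "add" step for a 1-bit
        P = ec_add(P, P, a, p)          # "double" step for every bit
        k >>= 1
    return R

# y^2 = x^3 + 2x + 3 over F_97; P = (3, 6) lies on the curve.
print(ec_mul(2, (3, 6), 2, 97))
```

This is the additive analogue of square-and-multiply: doubling replaces squaring, and point addition replaces multiplication.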
In the next section, we study the arithmetic of hyperelliptic curves. We describe ways to represent elements of the Jacobian by pairs of polynomials and to do arithmetic on elements in this representation. We also discuss two algorithms for counting points in a Jacobian.
In the last section, we address the issue of generation of pseudorandom bits. We define the concept of cryptographically strong pseudorandom bit generator and provide an example, namely the Blum–Blum–Shub generator, which is cryptographically strong under the assumption that taking square roots modulo a big composite integer is computationally intractable.
The basic algorithmic issues discussed in Section 3.2 can be found in any textbook on data structures and algorithms. One can, for example, look at [7, 8, 61]. However, most of these elementary books do not talk about randomization and parallelization issues. We refer to [214] for a recent treatise on randomized algorithms. Also see Rabin’s papers [247, 248].
Complexity theory deals with classifying computational problems based on the known algorithms for solving them and on reduction of one problem to another. A simple introduction to complexity theory is the book [280] by Sipser. Chapter 2 of Koblitz’s book [154] is also a compact introduction to computational complexity meant for cryptographers. Also see [113].
Knuth’s book [147] is seemingly the best resource to look at for a comprehensive treatment on multiple-precision integer arithmetic. The proofs of correctness of many algorithms, that we omitted in Section 3.3, can be obtained in this book. This can be supplemented by the more advanced algorithms and important practical tips compiled in the book [56] by Cohen who designed a versatile computational number theory package known as PARI. Montgomery’s multiplication algorithm appeared in [210]. Also see Chapter 14 of Menezes et al. [194] for more algorithms and implementation issues.
Most of the important papers on primality testing [3, 4, 5, 116, 175, 204, 248, 287] have been referred to in Section 3.4.1. Also see the survey [164] due to Lenstra and Lenstra. Gordon’s algorithm for generating strong primes appeared in [118]. The book [69] by Crandall and Pomerance is an interesting treatise on prime numbers, written with a computational perspective. The modular square-root Algorithm 3.16 is essentially due to Tonelli (1891). Algebraic number theory is treated from a computational perspective in Cohen [56] and Pohst and Zassenhaus [235].
Arithmetic on finite fields is discussed in many books including [179, 191]. Finite fields have found widespread recent application in cryptography and coding theory, and as such it is necessary to have efficient software and hardware implementations of finite field arithmetic. A huge number of papers addressing these implementation issues have appeared in the last two decades. Chapter 5 of Menezes [191] discusses optimal normal bases (Section 2.9.3 of the current book), which speed up exponentiation in finite fields.
Factoring univariate polynomials over finite fields is a topic that has attracted a lot of research attention. Berlekamp’s Q-matrix method [21] is the first modern algorithm for this purpose. Computationally efficient versions of the algorithm discussed in Section 3.5.4 have been presented in Gathen and Shoup [104] and Kaltofen and Shoup [143]. The best-known running time for a deterministic algorithm for univariate factorization over finite fields is due to Shoup [272]. Shparlinski shows [274] that Shoup’s algorithm on a polynomial of degree d over F_q uses O(q^{1/2}(log q)d^{2+ε}) bit operations. This is fully exponential in log q.
The book [103] by von zur Gathen and Gerhard is a detailed treatise on many topics discussed in Sections 3.2 to 3.5 of the current book. Mignotte’s book [203] and the book [108] by Geddes et al. also have interesting coverage. Also see Chapter 1 of Das [72] for a survey of algorithms for various computational problems on finite fields.
For elliptic curve arithmetic, look at Blake et al. [24], Hankerson et al. [123] and Menezes [192]. The first polynomial-time algorithm for counting points on elliptic curves over a finite field F_q was proposed by Schoof. The original version of this algorithm runs in time O(log^8 q). Later, Elkies improved this running time to O(log^6 q) for most elliptic curves. Further modifications due to Atkin gave rise to what we call the SEA algorithm. Schoof’s paper [264] describes this point-counting algorithm and includes the modifications due to Elkies and Atkin. Also look at the article [85] by Elkies.
The Satoh–FGH algorithm is originally due to Satoh [256]. Fouquet et al. [94] have proposed a modification of Satoh’s algorithm to work for fields of characteristic 2. They also report large-scale implementations of the modified algorithm. Also see Fouquet et al. [95] and Skjernaa [281].
Recently, there has been a lot of progress in point-counting algorithms, in particular for fields of characteristic 2. The most recent account of this can be found in Lercier and Lubicz [177]. The authors of this paper later reported an implementation of their algorithm for counting points on an elliptic curve over
. This computation took nearly 82 hours on a 731 MHz Alpha EV6 processor. With these new developments, the point counting problem is practically solved for fields of small characteristics. However, for prime fields the known algorithms require further enhancements in order to be useful on a wide scale.
Finding good random elliptic curves for cryptographic purposes has also been an area of active research recently. With the current status of solving the elliptic curve discrete-log problem, the strategy we mentioned in Algorithm 3.33 is quite acceptable as long as good point-counting algorithms are at our disposal (they are now). For further discussions on this topic, we refer the reader to two papers [95, 176].
The appendix in Koblitz’s book [154] is seemingly the best source for learning hyperelliptic curve arithmetic. This is also available as a CACR technical report [195]. Gaudry and Harley’s paper [106] has more on the hyperelliptic curve point-counting algorithms we discussed in Section 3.7.2. Hess et al. [126] discuss methods for computing hyperelliptic curves for cryptographic usage.
Chapter 5 of Menezes et al. [194] is devoted to the generation of pseudorandom bits and sequences. This chapter lists the statistical tests for checking the randomness of a bit sequence. It also describes two cryptographically secure pseudorandom bit generators other than the BBS generator (Algorithm 3.37). The BBS generator was originally proposed by Blum et al. [26]. Also see Chapter 3 of Knuth [147].
| 4.1 | Introduction |
| 4.2 | The Problems at a Glance |
| 4.3 | The Integer Factorization Problem |
| 4.4 | The Finite Field Discrete Logarithm Problem |
| 4.5 | The Elliptic Curve Discrete Logarithm Problem |
| 4.6 | The Hyperelliptic Curve Discrete Logarithm Problem |
| 4.7 | Solving Large Sparse Linear Systems over Finite Rings |
| 4.8 | The Subset Sum Problem |
| Chapter Summary | |
| Suggestions for Further Reading |
It is insufficient to protect ourselves with laws; we need to protect ourselves with mathematics.
—Bruce Schneier
Most number theorists considered the small group of colleagues that occupied themselves with these problems as being inflicted with an incurable but harmless obsession.
—Arjen K. Lenstra and Hendrik W. Lenstra, Jr. [164]
All mathematics is divided into three parts: cryptography (paid for by CIA, KGB and the like), hydrodynamics (supported by manufacturers of atomic submarines) and celestial mechanics (financed by military and other institutions dealing with missiles, such as NASA).
—V. I. Arnold [13]
Public-key cryptographic systems are based on the apparent intractability of solving certain computational problems. However, there is very little evidence (if any) to corroborate the belief that algorithmic solutions to these problems are really very difficult. In spite of intensive studies over a long period, mathematicians and cryptologists have not come up with good algorithms, and it is their failures that justify the attempts to go on building secure cryptographic protocols based on these problems. The inherent assumption is that it would be infeasible for an opponent having practical amounts of computing resources to break these cryptosystems in a reasonable amount of time. Of course, the fear remains that someone may devise a fast algorithm and our cryptosystems may no longer meet their security guarantees. On the other extreme, it is also possible that someone proves the theoretical (and, hence, practical) impossibility of solving such a problem in a small (say, polynomial) amount of time, and our cryptosystems become secure forever (well, at least until other paradigms of computing, like the yet practically unimplementable quantum computing, solve the problems efficiently).
Whether you are a cryptographer or a cryptanalyst, it is important, if not essential, to be aware of the best methods available to date for attacking the intractable problems of cryptography. In the first place, this knowledge quantifies the practical security margins of the protocols, for instance, by dictating the choice of input sizes as a function of the security requirements. Let us take a specific example: with today’s computing power and known integer factorization algorithms, we assert that a message that needs to be kept secret for a day or two may be encrypted by a 768-bit RSA key, whereas if one wants to maintain the security for a year or more, much longer keys are needed. The second point in studying the known cryptanalytic algorithms is that though general-purpose algorithms for solving these problems are still unknown, there are good algorithms for specific cases: the cases to be avoided by the designers of cryptographic applications. For example, there is a linear-time algorithm to attack cryptographic systems based on anomalous elliptic curves. The moral is that one must not employ these curves for cryptographic applications. The third reason for studying cryptanalytic algorithms is sentimental. The fact that we are still unable to answer some simply stated questions even after spending a reasonable amount of collective effort is indeed humbling. To worsen matters, cryptography thrives by exploiting this scientific inadequacy. Cryptanalysis, though seemingly unlawful from a cryptographer’s viewpoint, turns out to be a deep and beautiful area of applied mathematics. Ironically enough, it is quite common that the proponents of cryptographic protocols are themselves most interested to see the end. The journey goes on. . . Read on!
It may appear somewhat unusual to discuss the cryptanalytic algorithms prior to the cryptographic ones (see Chapter 5). We find this order convenient in that one must first know the intractable problems before applying them in cryptographic protocols. Moreover, the known attacks help one fix the parameters for use in the cryptographic algorithms. We defer till Chapter 7 other cryptanalytic techniques which do not directly involve solving these mathematical problems. The full power of the mathematical machinery of Chapters 2 and 3 is felt here in the science of cryptology. Understanding the various aspects of cryptology hence becomes easier.
Let us first introduce the intractable problems of cryptology. In the rest of this chapter, we describe some known methods to solve these problems.
The integer factorization problem (IFP) is perhaps the most studied one in the lot. We know that Z is a unique factorization domain (UFD) (Definition 2.25, p 40), that is, given a natural number n there are (pairwise distinct) primes p1, . . . , pr (unique up to rearrangement) such that n = p1^α1 · · · pr^αr for some α1, . . . , αr ∈ N. Broadly speaking, the IFP is the determination of these pi and αi from the knowledge of n. Note that once the prime divisors pi of n are known, it is rather easy to compute the multiplicities αi = v_pi(n) by trial divisions. It is, therefore, sufficient to find the primes pi only. It is easy (Algorithm 3.13) to check whether n is composite. If n is already prime, then its prime factorization is known. On the other hand, if n is known to be composite, an algorithm that splits n into two non-trivial factors, that is, that outputs n1, n2 with n = n1n2, n1 < n and n2 < n, can be repeatedly used to compute the complete factorization of n. It is enough that a non-trivial factor n1 of n is made available; the cofactor n2 = n/n1 is obtained by a single division. Finally, it is sometimes known a priori that n is the product of two (distinct odd) primes (as in the RSA protocols). In this case, a non-trivial split of n immediately gives the desired factorization of n. To sum up, the IFP can be stated in various versions, the presumed difficulty of all these versions being essentially the same.
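The reduction from splitting to complete factorization described above can be sketched as follows; plain trial division stands in for the splitter, whereas a real factoring algorithm (Pollard rho, the quadratic sieve, the number field sieve) would take its place.

```python
# Sketch: turn any "splitter" (which returns a non-trivial factor of a
# composite, or None for a prime) into complete factorization by recursion.
def split(n):
    """Return a non-trivial factor of composite n (trial division here)."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return None                     # n is prime

def factorize(n):
    """Prime factorization of n >= 2 as {prime: multiplicity}."""
    d = split(n)
    if d is None:
        return {n: 1}               # n itself is prime
    factors = factorize(d)          # factor the split piece...
    for p, a in factorize(n // d).items():   # ...and the cofactor
        factors[p] = factors.get(p, 0) + a
    return factors

print(factorize(720))               # {2: 4, 3: 2, 5: 1}
```

In practice one would test each recursive piece for primality (Algorithm 3.13) before trying to split it further, exactly as the text recommends.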
| Problem 4.1 | General integer factorization problem Given an integer n ≥ 2, determine the complete prime factorization of n. |
| Problem 4.2 | Integer factorization problem (IFP) Given a composite integer n, find a non-trivial factor n1 of n (that is, with 1 < n1 < n). |
| Problem 4.3 | RSA integer factorization problem Given a product n = pq of two (distinct odd) primes p and q, find the prime divisors p and q of n. Recall that if n = p1^α1 · · · pr^αr is the prime factorization of n, then the Euler totient function φ(n) of n is φ(n) = ∏ pi^(αi–1)(pi – 1). Thus, if the prime factorization of n is known, it is easy to compute φ(n). The converse is not known to be true in general. However, if n = pq is the product of two primes, factoring n is polynomial-time equivalent to computing φ(n) (Exercise 3.6).
| ||||||||
| Problem 4.4 | Totient problem Given a natural number n, compute φ(n). |
| Problem 4.5 | RSA totient problem Given a product n = pq of two (distinct odd) primes p and q, compute φ(n). Note that Z[X] is also a UFD. Quite interestingly, it is computationally easy to find a non-trivial factor g of a polynomial f ∈ Z[X] (that is, with 0 < deg g < deg f). One might, for example, use the polynomial-time deterministic L3 algorithm named after Lenstra, Lenstra and Lovász (Section 4.8.2).
Square roots modulo an integer | ||||||||
| Problem 4.6 | Modular square root problem (SQRTP) Given a composite integer n and an element a ∈ Z_n* that is a square modulo n, compute a square root of a modulo n. Now let G be a finite cyclic group of order n generated by g. Every a ∈ G can be written as a = g^x for some integer x unique modulo n. In this case, x is called the discrete logarithm or the index of a with respect to the base g and is denoted by indg a.
| ||||||||
| Problem 4.7 | Discrete logarithm problem (DLP) Given a finite cyclic group G, a generator g of G and an element a ∈ G, compute indg a. More generally, let G be a finite group and g ∈ G; the subgroup H := ⟨g⟩ of G is anyway cyclic. If a ∈ H, then the discrete logarithm or index of a with respect to the base g is an integer x unique modulo m := ord H such that a = g^x. In this case, we denote such an integer x by indg a. On the other hand, if a ∉ H, then we say that the discrete logarithm indg a is not defined. Recall from Proposition 2.5 that if G is cyclic and if m is known, then checking whether a belongs to H amounts to computing an exponentiation in G (a ∈ H if and only if a^m is the identity of G). If G is not cyclic (or if m is not known), then it is not easy, in general, to develop such a nice criterion.
| ||||||||
| Problem 4.8 | Generalized discrete logarithm problem (GDLP) Given a finite Abelian group G and elements g, a ∈ G, determine whether indg a is defined, and, if so, compute it. Note that if G is the additive group Z_n and g is an integer with gcd(g, n) = 1, then for every integer a we have indg a ≡ g^(–1)a (mod n), where the modular inverse g^(–1) (mod n) can be computed efficiently using the extended gcd algorithm (Algorithm 3.8) on g and n. Also note that if G is cyclic and if each element a of G is represented as indg a for a given generator g of G (see, for example, Section 2.9.3), then computing discrete logarithms in G to the base g is a trivial problem. In that case, it is also trivial to compute discrete logarithms (if existent) to any other base h (Exercise 4.3).
On the other hand, there are certain groups G in which discrete logarithms cannot be computed so easily; that is, computing indices in G may demand time not bounded by any polynomial in log n, where n = ord G. However, if the group operation on any two elements of G can be performed in time bounded by a polynomial in log n, then cryptographic protocols can be based on G. Typical candidates for such groups are listed below together with the conventional names for the DLP over such groups.
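By contrast, the easy additive case noted above amounts to a single extended-gcd computation; a small sketch (Python's three-argument pow computes the modular inverse via the extended gcd):

```python
# The additive group Z_n: ind_g(a) = g^{-1} a (mod n) whenever gcd(g, n) = 1,
# so this "discrete logarithm" costs one extended-gcd computation.
def additive_index(g, a, n):
    x = pow(g, -1, n) * a % n       # pow(g, -1, n): inverse via extended gcd
    assert x * g % n == a % n       # adding g to itself x times gives a
    return x

print(additive_index(7, 5, 30))
```

This is exactly why Z_n with addition is useless for discrete-log cryptography, while the multiplicative and elliptic-curve groups listed by the text are not.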
Note that if we are interested in computing indices to a base g that is not necessarily a generator, we are in the setting of the GDLP. Another problem that is widely believed to be computationally equivalent to the DLP (at least for the groups mentioned in the above table) is called the Diffie–Hellman problem (DHP). Similar to the DLP, the DHP is presumably difficult to solve for the groups
| Problem 4.9 | Diffie–Hellman problem (DHP) Let G be a multiplicative group and let g ∈ G. Given the elements g^x and g^y (but not the exponents x and y themselves), compute g^(xy). There are some other difficult problems on which cryptographic systems can be built. Problem 4.10 deserves specific mention in this regard. |
| Problem 4.10 | Subset sum problem (SSP) Given a set A := {a1, . . . , an} of natural numbers and a natural number s, decide whether some subset of A has elements summing to s (and, if so, find one). Some of the early cryptographic systems based on the SSP have succumbed to efficient (even polynomial-time) cryptanalytic attacks. However, some schemes have been proposed in recent years which seem to be resistant to such attacks, or, in other words, for which good attacks are not yet known. As a result, it is important to study the SSP in some detail. The SSP is often mapped to problems on lattices. Let v1, . . . , vn be linearly independent vectors in R^m (for some m ≥ n), and let L be the lattice of all integer linear combinations of v1, . . . , vn.
| ||||||||
| Problem 4.11 | Shortest vector problem (SVP) Find a non-zero vector in L of smallest norm. |
| Problem 4.12 | Closest vector problem (CVP) Given a vector w ∈ R^m, find a vector in L closest to w. |
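As a taste of the SSP, here is a meet-in-the-middle sketch (not one of the lattice methods of Section 4.8): splitting A into halves cuts the search from 2^n subsets to roughly 2^(n/2) per half.

```python
# Meet-in-the-middle sketch for the subset sum problem: enumerate all
# subset sums of each half of A and match s - (right sum) against a table
# of left sums.  Time and space roughly O(2^(n/2)) instead of O(2^n).
from itertools import combinations

def subset_sum(A, s):
    half = len(A) // 2
    left, right = A[:half], A[half:]
    sums = {}
    for r in range(len(left) + 1):
        for comb in combinations(left, r):
            sums.setdefault(sum(comb), comb)   # left sum -> one witness
    for r in range(len(right) + 1):
        for comb in combinations(right, r):
            rest = sums.get(s - sum(comb))
            if rest is not None:
                return rest + comb             # a subset of A summing to s
    return None

print(subset_sum([3, 34, 4, 12, 5, 2], 9))
```

Even this improvement is exponential, which is why the SSP remains a candidate hard problem; the broken knapsack cryptosystems fell to structural (lattice) attacks, not to brute force.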
| 4.1 |
|
| 4.2 | Show that the following problems are polynomial-time reducible to the IFP.
|
| 4.3 | Let G be a finite cyclic group of order n and let g, g′ be two arbitrary generators of G.
|
| 4.4 | Let G be a finite cyclic multiplicatively written group of order n. An algorithm on G is said to be polynomial-time if it runs in time bounded above by a polynomial function of log n. Assume that the product of any two elements in G can be computed in polynomial time. Recall from Exercise 2.47 that G ≅ Z_n. Show that the computation of an isomorphism G → Z_n is polynomial-time equivalent to computing discrete logarithms in G. (That is, assuming that we are given a (two-way) black box that returns in polynomial time the image in Z_n of any element of G and the preimage in G of any element of Z_n, discrete logarithms in G can be computed in polynomial time. Conversely, if discrete logarithms with respect to a primitive element can be computed in polynomial time, then such a black box can be realized.)
|
| 4.5 | Let p be an (odd) prime and let g be a primitive root modulo p. Show that a ∈ Z_p* is a quadratic residue modulo p if and only if the index indg a is even. Hence, conclude that there is a polynomial-time (in log p) algorithm that computes the least significant bit of indg a, given any a ∈ Z_p*. More generally, let p – 1 = 2^r s, where r, s ∈ N and s is odd. Show that there exists a polynomial-time algorithm that computes the r least significant bits of indg a given any a ∈ Z_p*. (This exercise shows that the DLP has a polynomial-time solution for Fermat primes F_n := 2^(2^n) + 1. Note that F_n is prime for n = 0, 1, 2, 3, 4. No other Fermat primes are known.)
|
The integer factorization problem (IFP) (Problems 4.1, 4.2 and 4.3) is one of the most easily stated and yet hopelessly difficult computational problems that have attracted researchers’ attention for ages, most notably in the age of electronic computers. A huge number of algorithms, varying widely in basic strategy, mathematical sophistication and implementation intricacy, have been suggested; in spite of these, factoring a general integer having only 1000 bits seems to be an impossible task today, even using the fastest computers on earth.
It is important to note here that even proving rigorous bounds on the running times of the integer-factoring algorithms is quite often a very difficult task. In many cases, we have to be satisfied with clever heuristic bounds based on one or more reasonable but unprovable assumptions.
This section highlights human achievements in the battle against the IFP. Before going into the details of this account we want to mention some relevant points. Throughout this section we assume that we want to factor a (positive) integer n. Since such an integer can be represented by ⌈lg(n + 1)⌉ bits, the input size is taken to be lg n (or, ln n, or log n). Most modern factorization algorithms take time given by the following subexponential expression in ln n:
L(n, α, c) := exp((c + o(1)) (ln n)^α (ln ln n)^(1−α)),
where 0 < α < 1 and c > 0 are constants. As described in Section 3.2, the smaller the value of α is, the closer the expression L(n, α, c) is to a polynomial expression (in ln n). If n is understood from the context, we write L(α, c) in place of L(n, α, c). Although the current best-known algorithms correspond to α = 1/3, the algorithms with α = 1/2 are also quite interesting. In this case, we use the shorter notation L[c] := L(1/2, c).
Henceforth we will use, without explicit mention, the notation q1 := 2, q2 := 3, q3 := 5, . . . to denote the sequence of primes. The concept of qt-smoothness (for some t ≥ 1) will often be referred to as B-smoothness, where B = {q1, . . . , qt}. Recall from Theorem 2.21 that smaller integers have a higher probability of being B-smooth for a given B. This observation plays an important role in designing integer-factoring algorithms. The following special case of Theorem 2.21 is often useful.
|
| Corollary 4.1 Let α, β > 0 be real constants. Then a random integer m ≤ n^α is L[β]-smooth with probability L[−α/(2β)] (as n → ∞). |
Before any attempt at factoring n is made, it is worthwhile to check the primality of n. Since probabilistic primality tests (like Algorithm 3.13) are quite efficient, we should first run one such test to make sure that n is really composite. Henceforth, we will assume that n is known to be composite.
“Factoring in the dark ages” (a phrase attributed to Hendrik Lenstra) used fully exponential algorithms, some of which we discuss now. Though the worst-case performances of these algorithms are quite poor, there are many situations in which they might factor even a large integer quite fast. It is, therefore, worthwhile to spend some time on these algorithms.
A composite integer n admits a factor ≤ √n, which can be found by trial divisions of n by integers ≤ √n. This demands O(√n) trial divisions and is clearly impractical, even when n contains only 30 decimal digits. It is also true that n has a prime divisor ≤ √n. So it suffices to carry out trial divisions by primes only. Though this modified strategy saves us many unnecessary divisions, the asymptotic complexity does not reduce much, since by the prime number theorem the number of primes ≤ √n is about 2√n/ln n. In addition, we need to have a list of primes ≤ √n or generate the primes on the fly, neither of which is really practical. A trade-off can be made by noting that an integer m ≥ 30 cannot be prime unless m ≡ 1, 7, 11, 13, 17, 19, 23, 29 (mod 30). This means that we need to perform the trial divisions only by those integers m congruent to one of these values modulo 30, and this reduces the number of trial divisions to about 25 per cent. Though trial division is not a practical general-purpose algorithm for factoring large integers, we recommend extracting all the small prime factors of n, if any, by dividing n by a predetermined set {q1, . . . , qt} of small primes. If n is indeed qt-smooth or has all prime factors ≤ qt except only one, then the trial division method completely factors n quite fast. Even when n is not of this type, trial division might reduce its size, so that other algorithms run somewhat more efficiently.
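The recommended preprocessing step of stripping small prime factors can be sketched in Python as follows (a minimal illustration of ours; the function name and the sample prime list are not from the text):

```python
def trial_divide(n, small_primes):
    """Strip from n all factors lying in the given list of small primes.
    Returns (factors, cofactor), where factors maps each prime found to
    its exponent and cofactor is what remains of n."""
    factors = {}
    for q in small_primes:
        while n % q == 0:
            factors[q] = factors.get(q, 0) + 1
            n //= q
    return factors, n

# 6860 = 2^2 * 5 * 7^3 is completely smooth over the chosen prime set.
factors, cofactor = trial_divide(6860, [2, 3, 5, 7, 11])
```

If the cofactor returned is greater than 1, the remaining part of n is handed to a more sophisticated algorithm.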
Pollard’s rho method solves the IFP in an expected Õ(n^(1/4)) time and is based on the birthday paradox (Exercise 2.172).
Let p be an (unknown) prime divisor of n and let f : ℤ_n → ℤ_n be a random map. We start with an initial value x_0 ∈ ℤ_n and generate a sequence x_{i+1} = f(x_i), i ≥ 0, of elements of ℤ_n. Let y_i denote the smallest non-negative integer satisfying y_i ≡ x_i (mod p). By the birthday paradox, after t = O(√p) iterates x_1, . . . , x_t are generated, we have a high chance that y_i = y_j, that is, x_i ≡ x_j (mod p) for some 1 ≤ i < j ≤ t. This means that p | (x_i − x_j), and computing gcd(x_i − x_j, n) splits n into two non-trivial factors with high probability. The method fails if this gcd is n. For a random n, this incident of having a gcd equal to n is of very low probability.
Algorithm 4.1 gives a specific implementation of this method. Computing gcds for all the pairs (x_i − x_j, n) is a massive investment of time. Instead, we store (in the variable ξ) the values x_r, r = 2^t, for t = 0, 1, 2, . . . , and compute only gcd(x_{r+s} − x_r, n) for s = 1, . . . , r. Since the sequence y_i, i ≥ 0, is ultimately periodic with an expected period length of τ = O(√p), we eventually reach a t with r = 2^t ≥ τ. In that case, the for loop detects a match. Typically, the update function f is taken to be f(x) = x^2 − 1 (mod n), which, though not a random function, behaves like one. Note that the iterates y_i, i ≥ 0, may be visualized as being located on the Greek letter ρ as shown in Figure 4.1 (with a tail of the first μ iterates followed by a cycle of length τ). This is how the method derives its name.
Figure 4.1: The iterates y_i located on the Greek letter ρ: a tail of μ elements followed by a cycle of length τ
Algorithm 4.1 takes an expected running time of Õ(√p). Since p ≤ √n (the smallest prime divisor of a composite n is ≤ √n), Pollard’s rho method runs in expected time Õ(n^(1/4)).
|
| Input: A composite integer n. Output: A non-trivial factor of n. Steps: Choose a random element x ∈ ℤ_n. ξ := x. r := 1. while (1) { for s = 1, . . . , r { x := x^2 − 1 (mod n). d := gcd(ξ − x, n). if (1 < d < n) { Return d. } } ξ := x. r := 2r. } |
Many modifications of Pollard’s rho method have been proposed in the literature. Perhaps the most notable one is an idea due to R. P. Brent. All these modifications considerably speed up Algorithm 4.1, though they leave the complexity essentially the same, that is, Õ(n^(1/4)). We will not describe these modifications in this book.
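The doubling strategy just described can be sketched in Python as follows (a minimal illustration of ours, not Algorithm 4.1 verbatim; the update f(x) = x^2 − 1 and the restart on gcd = n follow the text):

```python
from math import gcd
from random import randrange

def pollard_rho(n):
    """Pollard's rho with doubling: keep xi = x_r for r = 2^t and test
    gcd(x_r - x_{r+s}, n) for s = 1, ..., r before doubling r."""
    while True:
        x = randrange(2, n)          # random starting point x_0
        xi, r = x, 1                 # stored iterate x_r, with r = 2^t
        restart = False
        while not restart:
            for s in range(r):
                x = (x * x - 1) % n  # f(x) = x^2 - 1 (mod n)
                d = gcd(xi - x, n)
                if 1 < d < n:
                    return d
                if d == n:           # rare failure: retry with a new x_0
                    restart = True
                    break
            else:
                xi, r = x, 2 * r     # store x_r and double r

d = pollard_rho(1037)                # 1037 = 17 * 61
```

Because the method only ever returns a proper divisor, d is one of the two prime factors of 1037.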
Pollard’s p – 1 method is dependent on the prime factors of p – 1 for a prime divisor p of n. Indeed if p – 1 is rather smooth, this method may extract a (non-trivial) factor of n pretty fast, even when p itself is quite large. To start with we extend the definition of smoothness as follows.
|
| Let M be a positive integer. An integer m is called M-power-smooth if q^e ≤ M for every prime power q^e dividing m. |
Let p be an (unknown) prime divisor of n. We may assume, without loss of generality, that p ≤ √n. Assume that p − 1 is M-power-smooth. Then (p − 1) | lcm(1, . . . , M) and, therefore, for an integer a with gcd(a, n) = 1 (and hence with gcd(a, p) = 1), we have a^lcm(1,...,M) ≡ 1 (mod p) by Fermat’s little theorem, that is, d := gcd(a^lcm(1,...,M) − 1, n) > 1. If d ≠ n, then d is a non-trivial factor of n. In case we have d = n (a very rare occurrence), we may try with another a or declare failure.
The problem with this method is that p, and so M, are not known in advance. One may proceed by guessing successively increasing values of M till the method succeeds. In the worst case, that is, when p is a safe prime, we have M = (p − 1)/2. Since M = O(p) = O(√n), this algorithm runs in a worst-case time of Õ(√n). However, if M is quite small, then this algorithm is rather efficient, irrespective of how large p itself is.
In Algorithm 4.2, we give a variant of the p – 1 method, where we supply a predetermined value of the bound M. We also assume that we have at our disposal a precalculated list of all primes q1, . . . , qt ≤ M.
There is a modification of this algorithm known as Stage 2 or the second stage. For this, we choose a second bound M′ larger than M. Assume that p − 1 = rq, where r is M-power-smooth and q is a prime in the range M < q ≤ M′. In this case, Stage 2 computes with high probability a factor of n after doing O(√M′) additional operations as follows. When Algorithm 4.2 returns “failure” at the last step, it has already computed the value A := a^m (mod n), where m = q1^e1 · · · qt^et, ei = ⌊ln M/ln qi⌋. In this case, A has multiplicative order q modulo p, that is, the subgroup H of ℤ_p^* generated by A has order q. We choose s = O(√M′) random integers l1, . . . , ls. By the birthday paradox (Exercise 2.172), we have with high probability A^li ≡ A^lj (mod p) for some i ≠ j. In that case, d := gcd(A^li − A^lj, n) is divisible by p and is a desired factor of n (unless d = n, a case that occurs with a very low probability). In practice, we do not know q, and so we determine s and the integers l1, . . . , ls using the bound M′ instead of q.
|
| Input: A composite integer n, a bound M and the primes q1, . . . , qt ≤ M. Output: A non-trivial factor d of n or “failure”. Steps: Select a random integer a, 1 < a < n. /* For example, we may take a := 2 */ if (d := gcd(a, n) ≠ 1) { Return d. } for i = 1, . . . , t { a := a^(qi^ei) (mod n), where ei := ⌊ln M/ln qi⌋. } if (1 < (d := gcd(a − 1, n)) < n) { Return d. } Return “failure”. |
In another variant of Stage 2, we compute the powers A^q(t+1), . . . , A^qt′ (mod n), where q(t+1), . . . , qt′ are all the primes qj satisfying M < qj ≤ M′. If p − 1 = rq is of the desired form, we would find q = qj for some t < j ≤ t′, and then gcd(A^q − 1, n), if not equal to n, would be a non-trivial factor of n.
In practice, one may try one’s luck using this algorithm for some M in the range 10^5 ≤ M ≤ 10^6 (and possibly also the second stage with 10^6 ≤ M′ ≤ 10^8) before attempting a more sophisticated algorithm like the MPQSM, the ECM or the NFSM.
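Stage 1 of the p − 1 method, in the spirit of Algorithm 4.2, can be sketched in Python as follows (our own illustration; the sample modulus 713 = 23 · 31 is chosen so that 31 − 1 = 2 · 3 · 5 is power-smooth below the bound M = 8 while 23 − 1 = 2 · 11 is not):

```python
from math import gcd

def pollard_p_minus_1(n, M, primes):
    """One run of Pollard's p-1 stage 1: raise a := 2 to q^e for every
    prime q <= M, with e the largest exponent keeping q^e <= M, and then
    test gcd(a - 1, n).  Returns a factor or None on failure."""
    a = 2
    d = gcd(a, n)
    if d != 1:
        return d
    for q in primes:
        if q > M:
            break
        qe = q
        while qe * q <= M:       # largest power of q not exceeding M
            qe *= q
        a = pow(a, qe, n)
    d = gcd(a - 1, n)
    return d if 1 < d < n else None   # None signals "failure"

# 713 = 23 * 31; 31 - 1 = 2 * 3 * 5 is 8-power-smooth, 23 - 1 = 2 * 11 is not.
d = pollard_p_minus_1(713, 8, [2, 3, 5, 7])
```

Here the accumulated exponent is m = 8 · 3 · 5 · 7 = 840, a multiple of 31 − 1 = 30 but not of 23 − 1 = 22, so the gcd isolates the factor 31.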
As always, we assume that n is a composite integer and that p is an (unknown) prime divisor of n. Pollard’s p − 1 method uses an element a in the group ℤ_p^* whose multiplicative order is p − 1. The idea of Williams’ p + 1 method is very similar, that is, it works with an element a, this time in 𝔽_{p^2}^*, whose multiplicative order is p + 1. If p + 1 is M-power-smooth for a reasonably small bound M, then computing d := gcd(a^(p+1) − 1, n) > 1 splits n with high probability.
In order to find an element a ∈ 𝔽_{p^2}^* of order p + 1, we proceed as follows. Let α be an integer such that α^2 − 4 is a quadratic non-residue modulo p. Then the polynomial f(X) := X^2 − αX + 1 is irreducible in 𝔽_p[X] and 𝔽_{p^2} ≅ 𝔽_p[X]/⟨f(X)⟩. Let a, b ∈ 𝔽_{p^2} be the two roots of f. Then ab = 1 and a + b = α. Since f(a^p) = 0 (check it!) and since a^p ≠ a, we have a^p = b = a^(−1), that is, a^(p+1) = 1.
Unfortunately, p is not known in advance. Therefore, we represent elements of 𝔽_p as integers modulo n and the elements of 𝔽_{p^2} as polynomials c0 + c1X with c0, c1 ∈ ℤ_n. Multiplying two such elements of 𝔽_{p^2} is accomplished by multiplying the two polynomials representing these elements modulo the defining polynomial f(X), the coefficient arithmetic being that of ℤ_n. This gives us a way to do exponentiations in 𝔽_{p^2} in order to compute a^m − 1 for a suitable m (for example, m = lcm(1, . . . , M)).
However, the absence of knowledge of p has a graver consequence, namely, it is impossible to decide whether α^2 − 4 is a quadratic non-residue modulo p for a given integer α. The only thing we can do is to try several random values of α. This is justified, because if k random integers α are tried, then the probability that for all of these α the integers α^2 − 4 are quadratic residues modulo p is only 1/2^k.
The code for the p + 1 method is very similar to Algorithm 4.2. We urge the reader to complete the details. Since p^3 − 1 = (p − 1)(p^2 + p + 1), p^4 − 1 = (p^2 − 1)(p^2 + 1) and so on, we can work in higher extensions like 𝔽_{p^3}, 𝔽_{p^4} to find elements of order p^2 + p + 1, p^2 + 1 and so on, and thus generalize the p ± 1 methods. However, the integers p^2 + p + 1, p^2 + 1, being large (compared to p ± 1), have a smaller chance of being M-smooth (or M-power-smooth) for a given bound M.
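Following the invitation to complete the details, here is a minimal Python sketch of ours of one attempt of the p + 1 method, representing elements of 𝔽_{p^2} as coefficient pairs modulo n. The sample parameters n = 1403 = 23 · 61, α = 5 (for which α^2 − 4 = 21 is a non-residue modulo 23) and m = 24 = 23 + 1 are purely illustrative:

```python
from math import gcd

def fp2_mul(u, v, alpha, n):
    """Multiply u0 + u1*X and v0 + v1*X in Z_n[X]/(X^2 - alpha*X + 1),
    reducing with X^2 = alpha*X - 1."""
    u0, u1 = u
    v0, v1 = v
    return ((u0 * v0 - u1 * v1) % n,
            (u0 * v1 + u1 * v0 + alpha * u1 * v1) % n)

def fp2_pow(u, e, alpha, n):
    """Square-and-multiply exponentiation in the same quotient ring."""
    result = (1, 0)
    while e:
        if e & 1:
            result = fp2_mul(result, u, alpha, n)
        u = fp2_mul(u, u, alpha, n)
        e >>= 1
    return result

def williams_p_plus_1(n, m, alpha):
    """One attempt with a = X, a root of f.  If a^m = 1 modulo a prime
    p | n, both coordinates of a^m - 1 vanish modulo p, so a gcd with n
    exposes p."""
    a0, a1 = fp2_pow((0, 1), m, alpha, n)
    d = gcd(gcd((a0 - 1) % n, a1), n)
    return d if 1 < d < n else None

# 1403 = 23 * 61; the order of a divides 23 + 1 = 24 modulo 23.
d = williams_p_plus_1(1403, 24, 5)
```

In a real run, m = lcm(1, . . . , M) and several random α would be tried, as the text explains.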
The reader should have recognized why we paid attention to strong primes and safe primes (Definition 3.5, p 199, and Algorithm 3.14, p 200). Let us now concentrate on the recent developments in the IFP arena.
Carl Pomerance’s quadratic sieve method (QSM) is one of the (reasonably) successful modern methods of factoring integers. Though the number field sieve factoring method is the current champion, there was a time in the recent past when the quadratic sieve method and the elliptic curve method were known to be the fastest algorithms for solving the IFP.
We assume that n is a composite integer which is not a perfect square (it is easy to detect whether n is a perfect square, and, if so, we replace n by √n). The basic idea is to arrive at a congruence of the form
Equation 4.1
x^2 ≡ y^2 (mod n)
with x ≢ ±y (mod n). In that case, gcd(x − y, n) is a non-trivial factor of n.
We start with a factor base B = {q1, . . . , qt} comprising the first t primes, and let H := ⌈√n⌉ and J := H^2 − n. Then H and J are each O(√n), and hence for a small integer c the right side of the congruence
(H + c)^2 ≡ J + 2cH + c^2 (mod n)
is also O(√n). We try to factor T(c) := J + 2cH + c^2 using trial divisions by elements of B. If the factorization is successful, that is, if T(c) is B-smooth, then we get a relation of the form
Equation 4.2
(H + c)^2 ≡ T(c) = q1^α1 · · · qt^αt (mod n),
where αi ≥ 0 for all i. (Note that T(c) ≠ 0, since n is assumed not to be a perfect square.) If all αi are even, say, αi = 2βi, then we get the desired Congruence (4.1) with x = q1^β1 · · · qt^βt and y = H + c. But this is rarely the case. So we keep on generating other relations. After sufficiently many relations are available, we combine these together (by multiplication) to get Congruence (4.1) and compute gcd(x − y, n). If this does not give a non-trivial factor, we try to recombine the collected relations in order to get another Congruence (4.1). This is how Pomerance’s QSM works.
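Relation generation by plain trial division (before sieving is introduced) can be sketched in Python as follows; the function name and the toy modulus n = 10001 = 73 · 137 are ours:

```python
from math import isqrt

def collect_relations(n, B, count):
    """Collect QSM relations (H + c)^2 ≡ q1^a1 ... qt^at (mod n) by
    trial-dividing T(c) = (H + c)^2 - n over the factor base B."""
    H = isqrt(n) + 1
    relations = []
    for c in range(10000):
        if len(relations) == count:
            break
        T = (H + c) ** 2 - n
        t, exps = T, []
        for q in B:
            e = 0
            while t % q == 0:       # extract the exponent of q in T(c)
                t //= q
                e += 1
            exps.append(e)
        if t == 1:                  # T(c) is B-smooth
            relations.append((H + c, exps))
    return relations

rels = collect_relations(10001, [2, 3, 5, 7, 11, 13], 3)
```

For this toy n, the relation found at c = 4 is 105^2 ≡ 2^10 (mod 10001), whose exponents are already all even; it alone yields x = 105, y = 32 and gcd(105 − 32, 10001) = 73.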
In order to find suitable combinations yielding Congruence (4.1), we employ a method similar to Gaussian elimination. Assume that we have collected r relations of the form
(H + cj)^2 ≡ q1^α1j · · · qt^αtj (mod n), j = 1, . . . , r.
We search for integers β1, . . . , βr ∈ {0, 1} such that the product
(∏_{j=1,...,r} (H + cj)^βj)^2 ≡ q1^(Σj α1j βj) · · · qt^(Σj αtj βj) (mod n)
is a desired Congruence (4.1). The left side of this congruence is already a square. In order to make the right side a square too, we have to essentially solve the following system of linear congruences modulo 2:
α11 β1 + α12 β2 + · · · + α1r βr ≡ 0 (mod 2),
. . .
αt1 β1 + αt2 β2 + · · · + αtr βr ≡ 0 (mod 2).
This is a system of t equations over 𝔽_2 in the r unknowns β1, . . . , βr and is expected to have solutions if r is slightly larger than t. Note that only the values of αij modulo 2 are needed for solving the above linear system. This means that we can have a compact representation of the coefficient matrix (αij) by packing 32 of the coefficients as bits per word. Gaussian elimination (over 𝔽_2) can be done using bit operations only.
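The bit-packed elimination over 𝔽_2 can be sketched in Python as follows (our own illustration; Python integers serve as arbitrarily wide bit words, mirroring the 32-coefficients-per-word trick):

```python
def find_dependency(exponent_vectors):
    """Given the exponent vectors of collected relations, return the
    indices of a non-empty subset whose exponent sums are all even
    (a dependency over F_2), or None if no dependency exists yet.
    Elimination keeps a history mask recording which original relations
    were combined into each reduced vector."""
    basis = {}                        # pivot bit -> (vector, history mask)
    for j, exps in enumerate(exponent_vectors):
        v = 0
        for i, e in enumerate(exps):  # pack the parities alpha_ij mod 2
            v |= (e & 1) << i
        mask = 1 << j
        while v:
            piv = v.bit_length() - 1
            if piv not in basis:
                basis[piv] = (v, mask)
                break
            bv, bm = basis[piv]
            v ^= bv                   # eliminate the pivot coordinate
            mask ^= bm
        else:                         # v reduced to zero: dependency found
            return [k for k in range(len(exponent_vectors))
                    if (mask >> k) & 1]
    return None

dep = find_dependency([[1, 0, 1], [0, 1, 0], [1, 1, 1]])
```

Multiplying the relations selected by the returned indices produces a right side with all exponents even, that is, a candidate Congruence (4.1).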
The running time of this method can be derived using Corollary 4.1. Note that the integers T(c) that are tested for B-smoothness are O(n^(1/2)), which corresponds to α = 1/2 in the corollary. We take qt = L[1/2] (so that t = L[1/2]/ln L[1/2] = L[1/2] by the prime number theorem), which corresponds to β = 1/2. Assuming that the integers T(c) behave as random integers of magnitude O(n^(1/2)), the probability that one such T(c) is B-smooth is L[−1/2]. Therefore, if L[1] values of c are tried, we expect to get L[1/2] relations involving the L[1/2] primes q1, . . . , qt. Combining these relations by Gaussian elimination is now expected to produce a non-trivial Congruence (4.1). This gives us a running time of the order of L[3/2] for the relation collection stage. Gaussian elimination with L[1/2] unknowns also takes asymptotically the same time. However, each T(c) can have at most O(log n) distinct prime factors, implying that Relation (4.2) is necessarily sparse. This sparsity can be effectively exploited, and the Gaussian elimination can be done essentially in time L[1]. Nevertheless, the entire procedure runs in time L[3/2], a subexponential expression in ln n.
In order to reduce the running time from L[3/2] to L[1], we employ what is known as sieving (from which the algorithm derives its name). Let us fix a priori the sieving interval, that is, the values of c for which T(c) is tested for B-smoothness, to be −M ≤ c ≤ M, where M = L[1]. Let q ∈ B be a small prime (that is, q = qi for some i = 1, . . . , t). We intend to find out the values of c such that q^h | T(c) for small exponents h = 1, 2, . . . . Since T(c) = J + 2cH + c^2 = (c + H)^2 − n, the solvability for c of the condition q^h | T(c) or of q | T(c) is equivalent to the solvability of the congruence (c + H)^2 ≡ n (mod q). If n is a quadratic non-residue modulo q, no c satisfies this condition. Consequently, the factor base B may comprise only those primes q for which n is a quadratic residue modulo q (instead of all primes ≤ qt). So we assume that q meets this condition. We may also assume that q ∤ n, because it is a good strategy to perform trial divisions of n by all the primes in B before we go for sieving. The sieving process makes use of an array A indexed by c. We initialize the array location A(c) := ln |T(c)| for each c, −M ≤ c ≤ M.
We explain the sieving process only for an odd prime q. The modifications for the case q = 2 are left to the reader as an easy exercise. The congruence x^2 − n ≡ 0 (mod q) has two distinct solutions for x, say, x1 and x1′ ≡ −x1 (mod q). These correspond to two solutions for c of (H + c)^2 ≡ n (mod q), namely, c1 ≡ x1 − H (mod q) and c1′ ≡ x1′ − H (mod q). For each value of c in the interval −M ≤ c ≤ M that is congruent either to c1 or to c1′ modulo q, we subtract ln q from the array location corresponding to c. We then lift the solutions x1 and x1′ to the (unique) solutions x2 and x2′ of the congruence x^2 − n ≡ 0 (mod q^2) (Exercise 3.29), compute c2 ≡ x2 − H (mod q^2) and c2′ ≡ x2′ − H (mod q^2), and for each c in the range −M ≤ c ≤ M congruent to c2 or c2′ modulo q^2 we subtract ln q from the array location corresponding to c. We then again lift to obtain the solutions modulo q^3 and proceed as above. We repeat this process of lifting and subtracting ln q from the appropriate array locations until we reach a sufficiently large h for which neither ch nor ch′ corresponds to any value of c in the range −M ≤ c ≤ M. We then choose another q from the factor base and repeat the procedure explained in this paragraph for this q.
After the sieving procedure is carried out for all small primes q in the factor base B, we check for which c, −M ≤ c ≤ M, the array location corresponding to c contains 0. These are precisely the values of c in the indicated range for which T(c) is B-smooth. For each smooth T(c), we then compute Relation (4.2) using trial division (by primes of B).
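The whole procedure can be sketched in Python as follows (a simplified illustration of ours: the interval is restricted to 0 ≤ c ≤ M to avoid handling the sign of T(c), and square roots modulo q^h are found by brute force, whereas the real algorithm computes one root and lifts it in polynomial time):

```python
from math import isqrt, log

def sieve_interval(n, B, M):
    """QSM sieving sketch: start A[c] at ln T(c), subtract ln q once for
    every prime power q^h dividing T(c), and report near-zero entries."""
    H = isqrt(n) + 1
    A = {c: log((H + c) ** 2 - n) for c in range(M + 1)}
    Tmax = (H + M) ** 2 - n
    for q in B:
        qh = q
        while qh <= Tmax:
            # all roots of x^2 ≡ n (mod q^h), by brute force for this toy
            roots = [x for x in range(qh) if (x * x - n) % qh == 0]
            if not roots:
                break
            for x in roots:
                first = (x - H) % qh        # least c >= 0 with H + c ≡ x
                for c in range(first, M + 1, qh):
                    A[c] -= log(q)
            qh *= q
    return sorted(c for c in A if A[c] < 0.5)   # near-zero => smooth

# Factor base restricted to primes modulo which n = 10001 is a
# quadratic residue: 2, 5 and 13 among the small primes.
smooth = sieve_interval(10001, [2, 5, 13], 20)
```

For every reported c, the value T(c) factors completely over the chosen base, as trial division then confirms.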
The sieving process replaces trial divisions (of every T(c) by every q) with subtractions (of ln q from the appropriate array locations). This is intuitively the reason why sieving speeds up the relation collection stage. For a more rigorous analysis of the running time, note that in order to get the desired ci and ci′ modulo q^i for each q ∈ B and for each i = 1, . . . , h, we have either to compute a square root modulo q (for i = 1) or to solve a congruence (during lifting, for i ≥ 2), each of which can be done in polynomial time. Also, the bound h on the exponent of q satisfies q^h ≤ max |T(c)| = O(√n), that is, h = O(log n). Finally, there are L[1/2] primes in B. Therefore, the computation of the ci and ci′ for all q and i takes a total of L[1/2] time.
Now, we count the total number ν of subtractions of the different ln q values from the locations of the array. The size of the array is 2M + 1. For each prime power qi^h, we need to subtract ln qi from at most 2⌈(2M + 1)/qi^h⌉ locations (for odd qi). Therefore, ν is of the order of 2(2M + 1)H_Q, where Q is the maximum of all the prime powers qi^h considered, and where Hm, m ≥ 1, denote the harmonic numbers (Exercise 4.6). But Hm = O(ln m), and so ν = O(2(2M + 1) log n) = L[1], since M = L[1].
The logarithms ln q (as well as the initial array values ln |T(c)|) are irrational numbers and hence need infinite precision for storing. We, however, need to work with only crude approximations of these logarithms, say up to three places after the decimal point. In that case, we cannot take an array value of exactly 0 as the criterion for selecting smooth values of T(c), because the approximate representation of logarithms leads to truncation (and/or rounding) errors. In practice, this is not a severe problem, because T(c) is not smooth if and only if it has a prime factor at least as large as q(t+1) (the smallest prime not in B). This implies that at the end of the sieving operation the array values for smooth T(c) are close to 0, whereas those for non-smooth T(c) are much larger (close to a number at least as large as ln q(t+1)). Thus we may set the selection criterion for smooth integers as: the left-over array value is < 1, or < 0.1 ln q(t+1). It is also possible to replace floating-point subtraction by integer subtraction by doing the arithmetic on 1000 times the logarithm values. To sum up, the ν = L[1] subtractions the sieving procedure does would be only single-precision operations and hence take a total of L[1] time.
As mentioned earlier, Gaussian elimination with sparse equations can also be performed in time L[1]. So Pomerance’s algorithm with sieving takes time L[1].
Numerous modifications over this basic strategy speed up the algorithm considerably. One possibility is to do sieving every time only for h = 1 and ignore all higher powers of q. That is, for every q we check which of the integers T(c) are divisible by q and then subtract ln q from the corresponding locations of the array. If some T(c) is divisible by a higher power of q, this strategy fails to subtract ln q the required number of times. As a result, this T(c), even if smooth, may fail to pass the smoothness criterion. This problem can be overcome by increasing the cut-off from 1 (or 0.1 ln q(t+1)) to a value ξ ln qt for some ξ ≥ 1. But then some non-smooth T(c) will pass through the selection criterion in addition to some smooth ones that could not, otherwise, be detected. This is reasonable, because the non-smooth ones can later be filtered out from the smooth ones, and one might even use trial divisions to do so. Experiments show that values of ξ ≤ 2.5 work quite well in practice.
The reason why this strategy performs well is as follows. If q is small, for example q = 2, we should subtract only 0.693 from the array location for every power of 2 dividing T(c). On the other hand, if q is much larger, say q = 1,299,709 (the 10^5-th prime), then ln q ≈ 14.078 is large. But T(c) would not, in general, be divisible by a high power of this q. This modification, therefore, leads to a situation where the probability that a smooth T(c) is actually detected as smooth is quite high. A few relations would still be missed even with the modified selection criterion, but that is more than compensated by the speed-up gained by the method. Henceforth, we will call this modified strategy incomplete sieving and the original strategy (of considering all powers of q) complete sieving.
Another trick known as the large prime variation also tends to give more usable relations than are available from the original (complete or incomplete) sieving. In this context, we call a prime q′ large if q′ ∉ B. A value of T(c) is often expected to be B-smooth except for a single large prime factor:
Equation 4.3
(H + c)^2 ≡ q1^α1 · · · qt^αt q′ (mod n)
with q′ ∉ B. Such a value of T(c) can be easily detected. For example, incomplete sieving with the relaxed selection criterion is expected to give many such relations naturally, whereas for complete sieving, if the left-over of ln |T(c)| in the array at the end of the subtraction steps is < 2 ln qt, then this must correspond to a large prime factor < qt^2. Instead of throwing away an apparently unusable Equation (4.3), we may keep track of them. If a large prime q′ is not large enough (that is, not much larger than qt), then it might appear on the right side of Equation (4.3) for more than one value of c, and if that is the case, all these relations taken together become usable in the subsequent Gaussian elimination stage (after including q′ in the factor base). This means that for each large prime occurring more than once, the factor base size increases by 1, whereas the number of relations increases by at least 2. Thus with a little additional effort we enrich the factor base and the relations collected, and this, in turn, increases the probability of finding a useful Congruence (4.1), our ultimate goal. Viewed from another angle, the strategy of large prime variation allows us to start with smaller values of t and/or M and thereby speed up the sieving stage and still end up with a system capable of yielding the desired Congruence (4.1). Note that an increased factor base size leads to a larger system to solve by Gaussian elimination. But this is not a serious problem in practice, because the sieving stage (and not the Gaussian elimination stage) is usually the bottleneck of the running time of the algorithm.
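The bookkeeping for single large primes can be sketched in Python as follows (our own illustration with hypothetical data; a partial relation is stored as (y, exponent vector, large prime)):

```python
from collections import defaultdict

def combine_partials(partials):
    """A partial relation (y, exps, qp) stands for
    y^2 ≡ q1^e1 ... qt^et * qp (mod n), with qp outside the factor base.
    Two partials sharing qp multiply into a relation in which qp appears
    with exponent 2, hence usable once qp joins the factor base."""
    by_prime = defaultdict(list)
    for rel in partials:
        by_prime[rel[2]].append(rel)
    combined = []
    for qp, rels in by_prime.items():
        # pair up consecutive partials sharing the same large prime
        for (y1, e1, _), (y2, e2, _) in zip(rels, rels[1:]):
            exps = [a + b for a, b in zip(e1, e2)] + [2]  # exponent of qp
            combined.append((y1 * y2, exps, qp))
    return combined

# Hypothetical partials over a two-prime base; 31 repeats, 19 does not.
out = combine_partials([(103, [1, 0], 31), (115, [3, 0], 31),
                        (106, [0, 1], 19)])
```

Only the large prime 31, which occurs twice, contributes a usable combined relation; the partial with 19 is held back in case 19 reappears later.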
It is natural that the above discussion on handling one large prime is applicable to situations where a T(c) value has more than one large prime factor, say q′ and q″. Such a T(c) value leads to a usable relation if each of q′ and q″ also occurs in other relations. This situation can be detected by a compositeness test on the non-smooth part of T(c). Subsequently, we have to factor the non-smooth part to obtain the two large primes q′ and q″. This is called the two large prime variation. As the size of the integer n to be factored becomes larger, one may go for three and four large prime variations.
We will shortly encounter many other instances of sieving (for solving the IFP and the DLP). Both incomplete sieving and the use of large primes, if carefully applied, help speed up most of these sieving methods in much the same way as they do in connection with the QSM.
Easy computations (Exercise 4.11) show that the average and maximum of the integers |T(c)| checked for smoothness in the QSM are approximately MH and 2MH respectively. Though these values are theoretically n^(1/2 + o(1)), in practice the factor of M (or 2M) makes the integers |T(c)| somewhat large, leading to a poor yield of B-smooth integers for larger values of |c| in the sieving interval. The multiple-polynomial quadratic sieve method (MPQSM) applies a nice trick to reduce these average and maximum values. In the original QSM, we work with a single polynomial in c, namely,
T(c) = J + 2cH + c^2 = (H + c)^2 − n.
Now, we work with a more general quadratic polynomial
T̃(c) := U + 2cV + c^2 W
with W > 0 and V^2 − UW = n. (The original T(c) corresponds to U = J, V = H and W = 1.) Then we have (V + cW)^2 ≡ W T̃(c) (mod n), that is, in this case a relation looks like
(V + cW)^2 ≡ W q1^α1 · · · qt^αt (mod n).
This relation has an additional factor of W that was absent in Relation (4.2). However, if W is chosen to be a prime (possibly a large one), then the Gaussian elimination stage proceeds exactly as in the original method. Indeed, in this case W appears in every relation and hence poses no problem. Only the integers T̃(c) need to be checked for B-smoothness and hence should have small values. The sieving procedure (that is, computing the appropriate array locations from which ln q, q ∈ B, is to be subtracted) for the general polynomial T̃(c) is very much similar to that for T(c). The details are left to the reader as an easy exercise.
Let us now explain how we can choose the parameters U, V, W. To start with, we fix a suitable sieving interval −M′ ≤ c ≤ M′ and then choose W to be a prime close to √(2n)/M′ such that n is a quadratic residue modulo W. Then we compute a square root V of n modulo W (Algorithm 3.16) and finally take U := (V^2 − n)/W. This choice clearly gives V^2 − UW = n and |U| ≈ M′√(n/2). (Indeed one may choose 0 < V < W/2, but this is not an important issue.) Now, the maximum value of |U + 2cV + c^2W| over the sieving interval becomes about M′√(n/2). Thus even for M′ = M, this maximum value is smaller by a factor of 2√2 than the maximum value of |T(c)| in the original QSM. Moreover, we may choose somewhat smaller values of M′ (compared to M) by working with several polynomials corresponding to different choices for the prime W. This is why the MPQSM, despite having the same theoretical running time (L[1]) as the original QSM, runs faster in practice.
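The parameter selection can be sketched in Python as follows (our own illustration; the sample n = 10001 and the prime W = 19 are hypothetical, and the square root of n modulo W uses the shortcut n^((W+1)/4) mod W valid for primes W ≡ 3 (mod 4)):

```python
def mpqs_polynomial(n, W):
    """Pick MPQSM parameters for a prime W with n a quadratic residue
    modulo W.  For W ≡ 3 (mod 4), a square root of n modulo W is
    n^((W+1)/4) mod W.  Returns (U, V, W) satisfying V^2 - U*W = n."""
    V = pow(n % W, (W + 1) // 4, W)
    assert (V * V - n) % W == 0, "n must be a quadratic residue modulo W"
    U = (V * V - n) // W            # exact division by construction
    return U, V, W

# Illustrative choice: for n = 10001 and a half-length near 10, W should
# be a prime near sqrt(2n)/10 ~ 14; W = 19 (≡ 3 mod 4) qualifies.
U, V, W = mpqs_polynomial(10001, 19)
```

Different primes W give different polynomials U + 2cV + c^2W, all sharing the invariant V^2 − UW = n, which is what makes the multi-polynomial variant work.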
The QSM is highly parallelizable. More specifically, different processors can handle pairwise disjoint subsets of B during the sieving process. That is, each processor P maintains a local array indexed by c, −M ≤ c ≤ M. The (local) sieving process at P starts with initializing all the local array locations to 0. For each prime q in the subset B_P of the factor base B assigned to P, one adds ln q to the appropriate locations (the appropriate numbers of times). After all these processors finish local sieving, a central processor computes, for each c in the sieving interval, the value ln |T(c)| minus the sum of the local array entries for c (the sum extending over all processors P which have done local sieving), based on which T(c) is recognized as smooth or not. For the multiple-polynomial variant of the QSM, different processors might handle different polynomials (for different choices of the prime W) and/or different subsets of B.
Adi Shamir has proposed the complete design of a (hardware) device, TWINKLE (The Weizmann INstitute Key Location Engine), that can perform the sieving stage of the QSM a hundred to a thousand times faster than software implementations on the PCs available nowadays. This speed-up is obtained by using a high clock speed (10 GHz) and opto-electronic technology for detecting smooth integers. Each TWINKLE device, if mass-produced, has an estimated cost of US $5,000.
The working of TWINKLE is described in Figure 4.2. It uses an opaque cylinder of a height of about 10 inches and a diameter of about 6 inches. At the bottom of the cylinder is an array of LEDs,[1] each LED representing a prime in the factor base. The i-th LED (corresponding to the i-th prime qi) emits light of intensity proportional to log qi. The device is clocked and the i-th LED emits light only during the clock cycles c for which qi|T(c). The light emitted by all the active LEDs at a given clock cycle is focused by a lens and a photo-detector senses the total emitted light. If this total light exceeds a certain threshold, the corresponding clock cycle (that is, the time c) is reported to a PC attached to TWINKLE. The PC then analyses the particular T(c) for smoothness over {q1, . . . , qt} by trial division.
[1] An LED (light emitting diode) is an electronic device that emits light, when current passes through it. A GaAs(Gallium arsenide)-based LED emits (infra-red) light of wavelength ~870 nano-meters. In the operational range of an LED, the intensity of emitted light is roughly proportional to the current passing through the LED.
Figure 4.2: The TWINKLE device
Thus, TWINKLE implements incomplete sieving by opto-electronic means. The major difference between TWINKLE’s sieving and software sieving is that in the latter we used an array of times (the c values) and the iteration went over the set of small primes. In TWINKLE, we use an array of small primes and allow time to iterate over the different values of c in the sieving interval −M ≤ c ≤ M. An electronic circuit in TWINKLE computes for each LED the cycles c at which that LED is expected to emanate light. That is to say that the i-th LED emits light only in the clock cycles c congruent modulo qi to one of the two solutions c1 and c1′ of T(c) ≡ 0 (mod qi). Shamir’s original design uses two LEDs for each prime qi, one corresponding to c1, the other to c1′. In that case, each LED emits light at regularly spaced clock cycles, and this simplifies the electronic circuitry (at the cost of having twice the number of LEDs).
Another difference of TWINKLE from software sieving is that here we add the log qi values (to zero) instead of subtracting them from log |T(c)|. By Exercise 4.11, the values |T(c)| typically have variations by small constant factors. Taking logs reduces this variation further and, therefore, comparing the sum of the active log qi values for a given c with a fixed predefined threshold (say log M H) independent of c is a neat way of bypassing the computation of all log |T(c)|, –M ≤ c ≤ M. (This strategy can also be used for software sieving.)
The reasons why TWINKLE speeds up the sieving procedure over software implementations on conventional PCs are the following:
Silicon-based PC chips at present can withstand clock frequencies on the order of 1 GHz. On the contrary, a GaAs-based wafer containing the LED array can be clocked at faster than 10 GHz.
There is no need to initialize the sieving array (to log |T(c)| or to zero). Similarly, at the end, there is no need to compare the final values in all the array locations with a threshold.
The addition of all the log qi values effective at a given c is done instantly by analog optical means. We do not require an explicit electronic adder.
Shamir [269] reports the full details of a VLSI[2] design of TWINKLE.
[2] very large-scale integration
H. W. Lenstra’s elliptic curve method (ECM) is another modern algorithm to solve the IFP and runs in expected time L(p, 1/2, √2) = exp((√2 + o(1)) √(ln p ln ln p)), where p is the smallest prime factor of n (the integer to be factored). Since p ≤ √n, this running time is at most L[1] = L(n, 1/2, 1), that is, the same as the QSM. However, if p is small (that is, if p = O(n^α) for some α < 1/2), then the ECM is expected to outperform the QSM, since the working of the QSM is incapable of exploiting smaller values of p.
As before, let n be a composite natural number having no small prime divisors and let p be the smallest prime divisor of n. For denoting subexponential expressions in ln p, we use the symbol Lp[c] := L(p, 1/2, c), whereas the unsubscripted symbol L[c] stands for L(n, 1/2, c). We work with random elliptic curves
E : Y^2 = X^3 + aX + b with a, b ∈ ℤ_n,
and consider the group E(𝔽_p) of rational points on E modulo p. However, since p is not known a priori, we intend to work modulo n. The canonical surjection ℤ_n → 𝔽_p allows us to identify the ℤ_n-rational points on E as points on E over 𝔽_p. We now define a bound M := Lp[1/√2] and let B = {q1, . . . , qt} be all the primes smaller than or equal to M, so that by the prime number theorem (Theorem 2.20) #B ≈ M/ln M. Of course, p is not known in advance, so that M and B are also not known. We will discuss the choice of M and B later. For the time being, let us assume that we know some approximate value of p, so that M and B can be fixed, at least approximately, at the beginning of the algorithm.
By Hasse’s theorem (Theorem 2.48, p 106), the cardinality
satisfies
, that is, ν = O(p). If we make the heuristic assumption that ν is a random integer on the order O(p), then Corollary 4.1 tells us that ν is B-smooth with probability
. This assumption is certainly not rigorous, but accepting it gives us a way to analyse the running time of the algorithm.
If Lp[1/√2] random curves are tried, then we expect to find one B-smooth value of ν. In this case, a non-trivial factor of n can be computed with high probability as follows. Define ei := ⌊ln n/ln qi⌋ for i = 1, . . . , t, and m := q1^e1 q2^e2 ··· qt^et, where t is the number of primes in B. If ν is B-smooth, then ν|m and, therefore, for any point P ∈ E(F_p) we have mP = O, the point at infinity. Computation of mP involves the computation of many sums P1 + P2 of points P1 := (h1, k1) and P2 := (h2, k2). At some point of time, we would certainly compute a sum P1 + P2 = O, that is, P1 = –P2, that is, h1 ≡ h2 (mod p) and k1 ≡ –k2 (mod p). Since p is unknown, we work modulo n, that is, the values of h1, h2, k1 and k2 are known modulo n. Let d := gcd(h1 – h2, n). Then p|d, and if d ≠ n (the case d = n has a very small probability!), we have the non-trivial factor d of n. The computation of the coordinates of P1 + P2 (assuming P1 ≠ P2) demands computing the inverse of h1 – h2 modulo n (Section 2.11.2). However, if d = gcd(h1 – h2, n) ≠ 1, then this inverse does not exist, so the computation of P1 + P2 fails, and we have a non-trivial factor of n. If ν is B-smooth, then the computation of mP is bound to fail. The basic steps of the ECM are then as shown in Algorithm 4.3.
Input: A composite integer n.
Output: A non-trivial divisor d of n.
Steps: while (1) {
Before we derive the running time of the ECM, some comments are in order. A random curve E is chosen by selecting random integers a and b modulo n. It turns out that taking a to be a single-precision integer and b = 1 works quite well in practice. Indeed, one can keep trying the values a = 0, 1, 2, . . . successively. Note that the curve E is an elliptic curve, that is, non-singular, if and only if δ := gcd(n, 4a^3 + 27b^2) = 1. However, δ > 1 is an extremely rare occurrence, and one might skip the computation of δ before starting the trial with a curve. The choice b = 1 is attractive, because in that case we may take the point P = (0, 1). In Section 3.6, we have described a strategy to find a random point on an elliptic curve over a field K. This is based on the assumption that computing square roots in K is easy. The same method can be applied to curves over Z_n, but n being composite, it is difficult to compute square roots modulo n. So taking b to be 1 (or the square of a known integer) is indeed a pragmatic decision. After all, we do not need P to be a random point on E.

Recall that we have taken m = q1^e1 ··· qt^et, where ei = ⌊ln n/ln qi⌋. If instead we take ei := ⌊ln M/ln qi⌋ (where M is the bound mentioned earlier), the computation of mP per trial becomes much cheaper, whereas the probability of a successful trial (that is, of a failure while computing mP) does not decrease much. The integer m can be quite large. One, however, need not compute m explicitly, but may proceed as follows: first take Q0 := P, and subsequently, for each i = 1, . . . , t, compute Qi := qi^ei · Qi–1. One finally gets mP = Qt.
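The whole trial loop — curve after curve, with mP computed as Qt through the successive multiplications Qi := qi^ei · Qi–1, and a factor emerging exactly when a point addition fails to invert its denominator modulo n — can be sketched in Python as follows (the choices b = 1 and P = (0, 1) follow the discussion above; the bounds M and max_curves are illustrative, not the book’s):

```python
from math import gcd, isqrt, log

def ec_add(P, Q, a, n):
    # Add P and Q on Y^2 = X^3 + aX + b over Z_n; return (point, factor).
    # A failed inversion of the denominator reveals gcd(den, n) > 1.
    if P is None:
        return Q, None
    if Q is None:
        return P, None
    x1, y1 = P
    x2, y2 = Q
    if x1 == x2 and (y1 + y2) % n == 0:
        return None, None                      # P + (-P) = O
    if P == Q:
        num, den = (3 * x1 * x1 + a) % n, (2 * y1) % n
    else:
        num, den = (y2 - y1) % n, (x2 - x1) % n
    d = gcd(den, n)
    if d > 1:
        return None, d                         # the hoped-for failure
    lam = num * pow(den, -1, n) % n
    x3 = (lam * lam - x1 - x2) % n
    return (x3, (lam * (x1 - x3) - y1) % n), None

def ec_mul(k, P, a, n):
    # Double-and-add; propagate any factor discovered along the way.
    R = None
    while k:
        if k & 1:
            R, d = ec_add(R, P, a, n)
            if d:
                return None, d
        P, d = ec_add(P, P, a, n)
        if d:
            return None, d
        k >>= 1
    return R, None

def ecm(n, M=100, max_curves=200):
    # Trial curves Y^2 = X^3 + aX + 1 with P = (0, 1), a = 0, 1, 2, ...
    primes = [q for q in range(2, M + 1)
              if all(q % r for r in range(2, isqrt(q) + 1))]
    for a in range(max_curves):
        P = (0, 1)
        for q in primes:
            e = int(log(M) / log(q))           # ei := floor(ln M / ln qi)
            P, d = ec_mul(q ** e, P, a, n)     # Qi := qi^ei * Q(i-1)
            if d and d < n:
                return d                       # non-trivial factor found
            if d or P is None:
                break                          # try the next curve
    return None
```

On a small composite such as n = 1003 = 17 · 59, the loop quickly finds one of the prime factors.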
Now comes the analysis of the running time of the ECM. We have fixed the parameter M to be Lp[1/√2], so that B contains Lp[1/√2] small primes. The most expensive part of a trial with a random elliptic curve is the (attempted) computation of the point mP. This involves Lp[1/√2] additions of points (up to factors polynomial in ln n). Since an expected number of Lp[1/√2] elliptic curves needs to be tried for finding a non-trivial factor of n, the algorithm performs an expected number of Lp[1/√2] × Lp[1/√2] = Lp[√2] additions of points on curves modulo n. Since each such addition can be done in polynomial time, the announced running time follows.
Note that Lp[√2] is the optimal running time of the ECM and can be shown to be achieved by taking M = Lp[1/√2]. But, in practice, p is not known a priori. Various ad hoc ways may be adopted to get around this difficulty. One possibility is to use the worst-case bound p ≈ √n. For example, for factoring integers of the form n = pq, where p and q are primes of roughly the same size, this is a good approximation for p. Another strategy is to start with a small value of M and increase M gradually with the number of trials performed. For larger values of M, the probability of a successful trial increases, implying that fewer elliptic curves need to be tried, whereas the time per trial (that is, for the computation of mP) increases. These two effects tend to balance each other, so that the total running time of the ECM is apparently not very sensitive to the choice of M.

A second stage can be used for each elliptic curve in order to increase the probability of a trial being successful. A strategy very similar to the second stage of the p – 1 method can be employed. The reader is urged to fill in the details. Employing the second stage leads to a reasonable speed-up in practice, though it does not affect the asymptotic running time.
The ECM can be effectively parallelized, since different processors can carry out the trials, that is, computations of mP (together with the second stage) with different sets of (random) elliptic curves.
The number field sieve method (NFSM) is to date the most successful of all integer factoring algorithms. Under certain heuristic assumptions, it achieves a running time of the form L(n, 1/3, c), which is better than that of the L(n, 1/2, c′) algorithms described so far. The NFSM was first designed for integers of a special form. This variant of the NFSM is called the special NFS method (SNFSM) and was later modified to the general NFS method (GNFSM) that can handle arbitrary integers. The running time of the SNFSM has c = (32/9)^{1/3} ≈ 1.526, whereas that of the GNFSM has c = (64/9)^{1/3} ≈ 1.923. For the sake of simplicity, we describe only the SNFSM in this book (see Cohen [56] and Lenstra and Lenstra [165] for further details).
We choose an integer m and a polynomial f(X) ∈ Z[X] such that f(m) ≡ 0 (mod n). We assume that f is irreducible in Q[X]; otherwise a non-trivial factor of f yields a non-trivial factor of n. Consider the number field K := Q(α), where α is a root of f. Let d := deg f be the degree of the number field K. We use the complex embedding K → C that takes α to some root of f in C. The special NFS method makes certain simplifying assumptions:

f is monic, so that α is an algebraic integer, that is, Z[α] ⊆ O_K.

O_K = Z[α], that is, O_K is monogenic.

O_K is a PID.
Consider the ring homomorphism

Φ : Z[α] → Z_n, α ↦ m (mod n).

This is well-defined, since f(m) ≡ 0 (mod n). We choose small coprime (rational) integers a, b and note that Φ(a + bα) ≡ a + bm (mod n). Let B be a predetermined smoothness bound. Assume that for a given pair (a, b), both a + bm and a + bα are B-smooth. For the rational integer a + bm, this means

a + bm = ± ∏_{p ∈ P} p^{ep},

P being the set of all rational primes ≤ B. On the other hand, smoothness of the algebraic integer a + bα means that the principal ideal ⟨a + bα⟩ is a product of prime ideals of prime norms ≤ B; that is, we have a factorization

⟨a + bα⟩ = ∏_{𝔭 ∈ P′} 𝔭^{e𝔭},

where P′ is the set of all prime ideals of Z[α] of prime norms ≤ B. By assumption, each 𝔭 ∈ P′ is a principal ideal. Let G denote a set of generators, one for each ideal in P′. Further let U denote a set of generators of the multiplicative group of units of Z[α]. The smoothness of a + bα can, therefore, be rephrased as

Equation 4.4

a + bα = ∏_{u ∈ U} u^{su} × ∏_{g ∈ G} g^{tg}
Applying Φ then yields

∏_{p ∈ P} p^{ep} ≡ ± ∏_{u ∈ U} Φ(u)^{su} × ∏_{g ∈ G} Φ(g)^{tg} (mod n).

This is a relation for the SNFSM. After slightly more than #P + #G + #U relations are available, Gaussian elimination modulo 2 (as in the case of the QSM) is expected to give us a congruence of the form

x^2 ≡ y^2 (mod n),

and gcd(x – y, n) is possibly a non-trivial factor of n. This is the basic strategy of the SNFSM. We clarify some details now.
There is no clearly specified way to select the polynomial f defining the number field K. We require f to have small coefficients. Typically, m is much smaller than n, and one writes the expansion of n in base m as n = bt m^t + bt–1 m^{t–1} + ··· + b1 m + b0 with 0 ≤ bi < m. Taking f(X) = bt X^t + bt–1 X^{t–1} + ··· + b1 X + b0 is often suggested.
For integers n of certain special forms, we have natural choices for f. The seminal paper on the NFSM by Lenstra et al. [167] assumes that n = r^e – s for a small integer r ≥ 2 and a non-zero integer s with small absolute value. In this case, one first chooses a small extension degree d and sets m := r^⌈e/d⌉ and f(X) := X^d – s·r^{⌈e/d⌉d–e}. Typically, d = 5 works quite well in practice. Lenstra et al. report the implementation of the SNFSM for factoring n = 3^239 – 1. The parameters chosen are d = 5, m = 3^48 and f(X) = X^5 – 3. In this case, O_K is monogenic and a PID.

Take a small rational prime p ≤ B. From Section 2.13, it follows that if f(X) ≡ f1(X) ··· fr(X) (mod p) is the factorization of the canonical image of f(X) modulo p into irreducible factors fi(X) of degrees di, then the ideals ⟨p, fi(α)⟩, i = 1, . . . , r, are all the primes lying over p. We have also seen that the norm p^di of ⟨p, fi(α)⟩, i = 1, . . . , r, is prime if and only if di = 1, that is, fi(X) = X – cp for some cp ∈ Z_p. Thus, each root cp of f(X) in Z_p corresponds to a prime ideal of Z[α] of prime norm p.

To sum up, a prime ideal in P′ of prime norm is specified by a pair (p, cp) of values (in Z × Z_p). We denote this ideal by 𝔭p,cp. All ideals in P′ can be precomputed by finding the roots of the defining polynomial f(X) modulo the small primes p ≤ B. One can use the root-finding algorithms of Exercise 3.29.

Constructing the set G of generators of the ideals in P′ is a costly operation. We have just seen that each prime ideal 𝔭p,cp in P′ corresponds to a pair (p, cp) and is a principal ideal by assumption. A generator gp,cp of such an ideal is an element of the form h(α), h(X) ∈ Z[X] of degree < d, with N(gp,cp) = ±p and h(cp) ≡ 0 (mod p). Algorithm 4.4 (quoted from Lenstra et al. [167]) computes the generators gp,cp for all relevant pairs (p, cp). The first for loop exhaustively searches over all small polynomials h(α) in order to locate, for each (p, cp), an element h(α) of norm kp with |k| as small as possible. If the smallest k (stored in ap,cp) is ±1, the corresponding h(α) is already a generator gp,cp of 𝔭p,cp; else some additional adjustments need to be performed.
Choose two suitable positive constants aB and CB (depending on B and K). Initialize an array ap,cp := aB indexed by the relevant pairs (p, cp). for each

Let K have the signature (r1, r2). Write ρ = r1 + r2 – 1. By Dirichlet’s unit theorem, the group U of units of Z[α] is generated by an appropriate root u0 of unity and ρ multiplicatively independent[3] elements u1, . . . , uρ of infinite order. Each unit u of Z[α] has norm N(u) = ±1. Thus, one may keep on generating elements h0 + h1α + ··· + hd–1α^{d–1}, hi small integers, of norm ±1, until ρ independent elements are found. Many such elements are available as a by-product during the construction of G, which involves the computation of norms of many elements in Z[α]. For a more general exposition on this topic, see Algorithm 6.5.9 of Cohen [56].
[3] The elements u1, . . . , uρ in a (multiplicatively written) group are called (multiplicatively) independent if u1^n1 ··· uρ^nρ, n1, . . . , nρ ∈ Z, is the group identity only for n1 = ··· = nρ = 0.
In order to compute the factorization of Equation (4.4), we first factor the integer N(a + bα) = ± b^d f(–a/b). If ⟨a + bα⟩ = 𝔭1^e1 ··· 𝔭k^ek is the prime factorization of ⟨a + bα⟩ with pairwise distinct prime ideals 𝔭i of Z[α], then by the multiplicative property of norms we obtain |N(a + bα)| = N(𝔭1)^e1 ··· N(𝔭k)^ek.
Now, let p ≤ B be a small prime. If p ∤ N(a + bα), it is clear that no prime ideal of Z[α] of norm p (or a power of p) appears in the factorization of ⟨a + bα⟩. On the other hand, if p | N(a + bα), then 𝔭 | ⟨a + bα⟩ for some prime ideal 𝔭 of Z[α] lying over p. The assumption gcd(a, b) = 1 implies that the inertial degree of 𝔭 is 1: that is, N(𝔭) = p, that is, there is a cp with f(cp) ≡ 0 (mod p) such that the prime ideal 𝔭 corresponds to the pair (p, cp). In this case, we have a ≡ –cpb (mod p). Assume that another prime ideal 𝔭′ of norm p appears in the prime factorization of ⟨a + bα⟩. If 𝔭′ corresponds to the pair (p, c′p), then a ≡ –c′pb (mod p). Since cp and c′p are distinct modulo p, it follows that p|gcd(a, b), a contradiction, since gcd(a, b) = 1. Thus, a unique ideal 𝔭p,cp of norm p appears in the factorization of ⟨a + bα⟩. Moreover, the multiplicity of 𝔭p,cp in the factorization of ⟨a + bα⟩ is the same as the multiplicity vp(N(a + bα)).
Thus, one may attempt to factorize N(a + bα) using trial divisions by primes ≤ B. If the factorization is successful, that is, if N(a + bα) is B-smooth, then for each prime divisor p of N(a + bα) we find out the ideal 𝔭p,cp and its multiplicity in the factorization of ⟨a + bα⟩, as explained above. Since we know a generator of each 𝔭p,cp, we eventually compute a factorization of a + bα as a unit u of Z[α] times a product of generators from G. What remains is to factor u as a product of elements of U (the generators of the unit group). We don’t discuss this step here, but refer the reader to Lenstra et al. [167].
In the QSM, we check the smoothness of a single integer T(c) per trial, whereas for the NFS method we do so for two integers, namely, a + bm and N(a + bα). However, both these integers are much smaller than T(c), and the probability that they are simultaneously smooth is larger than the probability that T(c) alone is smooth. This accounts for the better asymptotic performance of the NFS method compared to the QSM.
One has to check the smoothness of a + bm and N(a + bα) for each coprime a, b in a predetermined interval. This check can be carried out efficiently using sieves. We have to use two sieves, one for filtering out the non-smooth a + bm values and the other for filtering out the non-smooth a + bα values. We should have gcd(a, b) = 1, but computing gcd(a, b) for all values of a and b is rather costly. We may instead use a third sieve to throw away the values of a for a given b for which gcd(a, b) is divisible by primes ≤ B. This still leaves us with some pairs (a, b) for which gcd(a, b) > 1. But this is not a serious problem, since such values are small in number and can be later discarded from the list of pairs (a, b) selected by the smoothness test.
We fix b and allow a to vary in the interval –M ≤ a ≤ M for a predetermined bound M. We use an array A indexed by a. Before the first sieve, we initialize this array to A[a] := ln |a + mb|. We may set A[a] := +∞ for those values of a for which gcd(a, b) is known to be > 1 (where +∞ stands for a suitably large positive value). For each small prime p ≤ B and small exponent h, we compute a′ := –mb (mod p^h) and subtract ln p from A[a] for each a, –M ≤ a ≤ M, with a ≡ a′ (mod p^h). Finally, for each value of a for which A[a] is not (close to) 0, that is, for which a + mb is not B-smooth, we set A[a] := +∞. For the other values of a, we set A[a] := ln |N(a + bα)|. One may use incomplete sieving (with a liberal selection criterion) during the first sieve.
The second sieve proceeds as follows. We continue to work with the value of b fixed before the first sieve and with the array A available from the first sieve. For each prime ideal 𝔭p,cp in P′, we compute a″ := –bcp (mod p) and subtract ln p from each location A[a] for which a ≡ a″ (mod p). For those a for which A[a] ≤ ξ ln B for some real ξ ≥ 1, say ξ = 2, we try to factorize a + bα over the generators in G and the units generated by U. If the attempt is successful, both a + bm and a + bα are smooth. This second sieve is an incomplete one and, therefore, we must use a liberal selection criterion.
For deriving the running time of the SNFSM, take d ≤ (3 ln n/(2 ln ln n))^{1/3}, m = L(n, 2/3, (2/3)^{1/3}), B = L(n, 1/3, (2/3)^{2/3}) and M = L(n, 1/3, (2/3)^{2/3}). From the prime number theorem and from the fact that d is small, it follows that both #P and #P′ have the same asymptotic bound as B. Also #U lies within this bound. We then have L(n, 1/3, (2/3)^{2/3}) unknown quantities on which we have to do Gaussian elimination.

The integers a + mb have absolute values ≤ L(n, 2/3, (2/3)^{1/3}). If the coefficients of f are small, then

|N(a + bα)| = |b^d f(–a/b)| ≤ L(n, 1/3, d · (2/3)^{2/3}) = L(n, 2/3, (2/3)^{1/3}).

Under the heuristic assumption that a + mb and N(a + bα) behave as random integers of magnitude L(n, 2/3, (2/3)^{1/3}), the probability that both are B-smooth turns out to be L(n, 1/3, –(2/3)^{2/3}), and so trying L(n, 1/3, 2(2/3)^{2/3}) pairs (a, b) is expected to give us L(n, 1/3, (2/3)^{2/3}) relations. The entire sieving process takes time L(n, 1/3, 2(2/3)^{2/3}), whereas solving a sparse system in L(n, 1/3, (2/3)^{2/3}) unknowns can be done essentially in the same time. Thus the running time of the SNFSM is L(n, 1/3, 2(2/3)^{2/3}) = L(n, 1/3, (32/9)^{1/3}).
4.6 For m ∈ N, define the harmonic numbers Hm := 1 + 1/2 + ··· + 1/m. Show that for each m ∈ N we have ln(m + 1) ≤ Hm ≤ 1 + ln m. [H] Deduce that the sequence Hm, m ∈ N, is not convergent. (Note, however, that the sequence Hm – ln m, m ∈ N, converges to the constant γ = 0.57721566 . . ., known as the Euler constant. It is not known whether γ is rational or not.)
4.7 Let k, c, c′, α be positive constants with α < 1. Prove the following assertions.

4.8 Let us assume that an adversary C has the computing power to carry out 10^12 floating point operations (flops) per second. Let A be an algorithm that computes a certain function P(n) using T(n) flops for an input n ∈ N. We say that it is infeasible for C to compute P(n) using Algorithm A if it takes ≥ 100 years for the computation or, equivalently, if T(n) ≥ 3.1536 × 10^21. Find, for the following expressions of T(n), the smallest values of n that make the computation of P(n) by Algorithm A infeasible: T(n) = (ln n)^3, T(n) = (ln n)^10, T(n) = n, T(n) = n^{1/2}, T(n) = n^{1/4}, T(n) = L[2], T(n) = L[1], T(n) = L[0.5], T(n) = L(n, 1/3, 2) and T(n) = L(n, 1/3, 1). (Neglect the o(1) terms in the definitions of L( ) and L[ ].)
4.9 Let n ∈ N be an odd integer and let r be the total number of distinct (odd) prime divisors of n. Show that for each integer a the congruence x^2 ≡ a^2 (mod n) has ≤ 2^r solutions for x modulo n. If gcd(a, n) = 1, show that this congruence has exactly 2^r solutions. [H]

4.10 Show that the problems IFP and SQRTP are probabilistic polynomial-time equivalent. [H]

4.11 In this exercise, we use the notations introduced in connection with the quadratic sieve method for factoring integers (Section 4.3.2). We assume that M ≪ H, since H is of the order of √n, whereas M = L[1].

4.12 Reyneri’s cubic sieve method (CSM) Suppose that we want to factor an odd integer n. Suppose also that we know a triple (x, y, z) of integers satisfying x^3 ≡ y^2z (mod n) with x^3 ≠ y^2z (as integers). We assume further that |x|, |y|, |z| are all O(n^ξ) for some ξ, 1/3 < ξ < 1/2.
4.13 Sieve of Eratosthenes Two hundred years before Christ, Eratosthenes proposed a sieve (Algorithm 4.5) for computing all primes between 1 and a positive integer n. Prove the correctness of this algorithm and compute its running time. [H]

Algorithm 4.5. The sieve of Eratosthenes
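A minimal Python sketch of the sieve (marking, for each surviving p ≤ √n, all multiples of p starting from p^2, since smaller multiples were already marked by smaller primes):

```python
def eratosthenes(n):
    # is_prime[i] records whether i is still unmarked (potentially prime).
    is_prime = [True] * (n + 1)
    is_prime[0:2] = [False, False]
    p = 2
    while p * p <= n:
        if is_prime[p]:
            # Multiples of p below p*p carry a smaller prime factor
            # and have therefore been marked already.
            for m in range(p * p, n + 1, p):
                is_prime[m] = False
        p += 1
    return [i for i in range(2, n + 1) if is_prime[i]]
```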
4.14 This exercise proposes an adaptation of the sieve of Eratosthenes for computing a random prime of a given bit length l. In Section 3.4.2, we have described an algorithm for this computation that generates random (odd) integers of bit length l and checks the primality of each such integer, until a (probable) prime is found. An alternative strategy is to generate a random l-bit odd integer n and check the integers n, n + 2, n + 4, . . . for primality.
The discrete logarithm problem (DLP) has attracted somewhat less attention from the research community than the IFP. Nonetheless, many algorithms exist to solve the DLP, most of which are direct adaptations of algorithms for solving the IFP. We start with the older algorithms collectively known as the square-root methods, since the worst-case running time of each of these is O(√q) for the field F_q. The newer family of algorithms based on the index calculus method provides subexponential solutions to the DLP and is described next. For the sake of simplicity, we assume in this section that we want to compute the discrete logarithm indg a of a ∈ F_q* with respect to a primitive element g of F_q*. We concentrate only on the fields F_p, p an odd prime, and F_{2^n}, n ≥ 1, since non-prime fields of odd characteristic are only rarely used in cryptography.
Square-root methods are applicable to any finite (cyclic) group. To avoid repetition, we provide here a generic description. That is, we assume that G is a multiplicatively written group of order n and that g, a ∈ G. The identity of G is denoted by 1. It is not necessary to assume that G is cyclic or that g is a generator of G. However, these assumptions are expected to make the descriptions of the algorithms somewhat easier, and hence we will stick to them. The necessary modifications for non-cyclic groups G or non-primitive elements g are rather easy, and the reader is requested to fill in the details. We assume that each element of G can be represented by O(lg n) bits (so that the input size is taken to be lg n) and that multiplications, exponentiations and inverses in G can be computed in time polynomially bounded by this input size.

Let us assume that the elements of G can be (totally) ordered in such a way that comparing two elements of G with respect to this order can be done in time polynomial in the input size. For example, a natural order on F_p* = {1, . . . , p – 1} is the relation ≤ on Z. Note that k elements of G can be sorted (under the above order) using O(k log k) comparisons.
Let m := ⌈√n⌉. Then d := indg a is uniquely determined by two (non-negative) integers d0, d1 < m such that d = d0 + d1m (the base-m representation of d). In Shanks’ baby-step–giant-step (BSGS) method, we compute d0 and d1 as follows. To start with, we compute a list of pairs (d0, g^d0) for d0 = 0, 1, . . . , m – 1 and store these pairs in a table sorted with respect to the second coordinate (the baby steps). Now, for each d1 = 0, 1, . . . , m – 1, we compute g^{–md1} (the giant steps) and search whether a·g^{–md1} is the second coordinate of a pair (d0, g^d0) of some entry in the table mentioned above. If so, we have found the desired d0 and d1; otherwise we try the next value of d1. An optimized implementation of this strategy is given as Algorithm 4.6.

The computation of all the elements of T and sorting T can be done in time O~(m). If we use a binary search algorithm (Exercise 4.15), then the search for h in T can be performed using O(lg m) comparisons in G. Therefore, the giant steps also take a total running time of O~(m). Since m ≈ √n, the BSGS method runs in time O~(√n). The memory requirement of the BSGS method (that is, of the table T) is O(√n) elements of G. Thus this method becomes impractical even when n contains as few as 30 decimal digits.
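The baby-step–giant-step strategy can be sketched for G = F_p* as follows (a hypothetical helper, not Algorithm 4.6 itself; a hash table replaces the sorted list, trading the O(lg m) binary search for expected O(1) lookups):

```python
from math import isqrt

def bsgs(g, a, n, p):
    # Solve g^d = a in F_p*, where g has order n; return d or None.
    m = isqrt(n) + 1                  # m := ceil(sqrt(n))
    # Baby steps: store g^j for j = 0, ..., m-1.
    table = {}
    e = 1
    for j in range(m):
        table.setdefault(e, j)
        e = e * g % p
    # Giant steps: examine a * g^(-m*i) for i = 0, ..., m-1.
    gm_inv = pow(g, -m, p)            # g^(-m) mod p (Python 3.8+)
    h = a % p
    for i in range(m):
        if h in table:
            return i * m + table[h]   # d = d0 + d1*m
        h = h * gm_inv % p
    return None
```

For example, with the primitive element 2 of F_101 (group order 100), the search recovers any exponent with only about √100 stored baby steps.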
Pollard’s rho method for solving the DLP is similar in idea to the method of the same name for solving the IFP. Let f : Z_n × Z_n → Z_n × Z_n be a random map, and let us generate a sequence of tuples (ri, si), i = 1, 2, . . ., starting with a random (r1, s1) and subsequently computing (ri+1, si+1) = f(ri, si) for each i = 1, 2, . . . . The elements bi := a^{ri} g^{si} for i = 1, 2, . . . can then be thought of as randomly chosen ones from G. By the birthday paradox (Exercise 2.172), we expect to get a match bi = bj for some i ≠ j, after O(√n) of the elements b1, b2, . . . are generated. But then we have a^{ri–rj} = g^{sj–si}, that is, indg a ≡ (ri – rj)^{–1}(sj – si) (mod n), provided that the inverse exists, that is, gcd(ri – rj, n) = 1. The expected running time of this algorithm is O~(√n), the same as that of the BSGS method, but the storage requirement drops to only O(1) elements of G.
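A sketch of the rho method for F_p*, with customary concrete choices standing in for the random map f — a 3-way partition of the group by the residue of the representative modulo 3, and Floyd’s cycle finding (these choices, and the function name, are illustrative):

```python
from math import gcd

def rho_dlog(g, a, n, p):
    # Find d with g^d = a in F_p*, where g has order n; may return None,
    # in which case one restarts with another random map or start tuple.
    def step(x, r, s):
        # Maintain the invariant x = a^r * g^s (mod p).
        if x % 3 == 0:
            return x * x % p, 2 * r % n, 2 * s % n
        if x % 3 == 1:
            return x * a % p, (r + 1) % n, s
        return x * g % p, r, (s + 1) % n
    x, r, s = 1, 0, 0                 # tortoise
    X, R, S = 1, 0, 0                 # hare (moves twice per round)
    while True:
        x, r, s = step(x, r, s)
        X, R, S = step(*step(X, R, S))
        if x == X:
            break
    # Collision: a^(r-R) = g^(S-s), so d = (r-R)^(-1)(S-s) mod n,
    # provided the inverse exists.
    if gcd((r - R) % n, n) != 1:
        return None
    return (S - s) * pow((r - R) % n, -1, n) % n
```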
Algorithm 4.6. The baby-step–giant-step method
Input: G, g and a as described above.
Output: d = indg a.
Steps: n := ord(G). /* Baby steps */ Initialize T to an empty table. Insert the pairs (0, 1) and (1, g) in T. h := g.
The Pohlig–Hellman (PH) method assumes that the prime factorization n = ord G = p1^{α1} ··· pr^{αr} is known. Since d := indg a is unique modulo n, we can easily compute d using the CRT from a knowledge of d modulo pj^{αj}, j = 1, . . . , r. So assume that p is a prime dividing n and that p^α exactly divides n. Let d0 + d1p + ··· + dα–1 p^{α–1}, 0 ≤ di < p, be the p-ary representation of d modulo p^α. The p-ary digits d0, d1, . . . , dα–1 can be successively computed as follows.

Let H be the subgroup of G generated by h := g^{n/p}. We have ord H = p (Exercise 2.44). For the computation of di, 0 ≤ i ≤ α – 1, from the knowledge of d0, . . . , di–1, consider the element

b := (a · g^{–(d0 + d1p + ··· + di–1 p^{i–1})})^{n/p^{i+1}}.

But ord(g^{n/p^{i+1}}) = p^{i+1}, so that

b = g^{(d – d0 – d1p – ··· – di–1 p^{i–1}) n/p^{i+1}} = g^{di p^i · n/p^{i+1}} = (g^{n/p})^{di} = h^{di}.

Thus, b ∈ H and di = indh b, that is, each di can be obtained by computing a discrete logarithm in the group H of order p (using the BSGS method or the rho method).
From the prime factorization of n, we see that the computations of d modulo pj^{αj} for all j = 1, . . . , r can be done in time O~(√q), q being the largest prime factor of n, since the αj and r are O(log n). Combining the values of d modulo pj^{αj} by the CRT can be done in polynomial time (in log n). In the worst case, q = O(n), and the PH method takes time O~(√n), which is fully exponential in the input size log n. But if q (or, equivalently, all the prime divisors p1, . . . , pr of n) is small, then the PH method runs quite efficiently. In particular, if q = O((log n)^c) for some (small) constant c, then the PH method computes discrete logarithms in G in polynomial time. This fact has an important bearing on the selection of a group G for cryptographic applications, namely, n = ord G is required to have a suitably large prime divisor, so that the PH method cannot compute discrete logarithms in G in feasible time.
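The digit-by-digit computation combined with the CRT recombination can be sketched as follows for G = F_p* (the function names and the factors argument — a dictionary {q: α} with n = ∏ q^α — are illustrative; the order-q logarithms are found by a small BSGS subroutine):

```python
from math import isqrt

def dlog_prime_order(h, b, q, p):
    # BSGS in the subgroup of F_p* of prime order q: solve h^i = b.
    m = isqrt(q) + 1
    table, e = {}, 1
    for j in range(m):
        table.setdefault(e, j)
        e = e * h % p
    h_inv_m = pow(h, -m, p)
    x = b % p
    for i in range(m):
        if x in table:
            return i * m + table[x]
        x = x * h_inv_m % p
    return None

def pohlig_hellman(g, a, n, p, factors):
    residues = []                      # pairs (d mod q^alpha, q^alpha)
    for q, alpha in factors.items():
        h = pow(g, n // q, p)          # generator of the order-q subgroup
        d_q, qi = 0, 1                 # digits found so far; qi = q^i
        for _ in range(alpha):
            # b = (a * g^(-d_q))^(n/q^(i+1)) equals h^(d_i).
            b = pow(pow(g, -d_q, p) * a % p, n // (qi * q), p)
            d_q += dlog_prime_order(h, b, q, p) * qi
            qi *= q
        residues.append((d_q, qi))
    # CRT: combine the values of d modulo the prime powers q^alpha.
    d, M = 0, 1
    for r, m in residues:
        d += (r - d) * pow(M, -1, m) % m * M
        M *= m
    return d % M
```

For instance, in F_101 with the primitive element 2 and 100 = 2^2 · 5^2, the digits of d modulo 4 and modulo 25 are peeled off separately and then recombined.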
The index calculus method (ICM) is not applicable to all (cyclic) groups. But whenever it applies, it usually leads to the fastest algorithms to solve the DLP. Several variants of the ICM are used for prime finite fields and also for finite fields of characteristic 2. On such a field F_q, they achieve subexponential running times of the order of L(q, 1/2, c) = L[c] or L(q, 1/3, c) for positive constants c. We start with a generic description of the ICM. We assume that g is a primitive element of F_q and want to compute d := indg a for some a ∈ F_q*.
To start with, we fix a suitable subset B = {b1, . . . , bt} of F_q* of small cardinality, chosen so that a reasonably large fraction of the elements of F_q* can be expressed easily as products of elements of B. We call B a factor base. In the ICM, we search for relations of the form

Equation 4.5

g^α a^β b1^{γ1} ··· bt^{γt} = b1^{δ1} ··· bt^{δt}

for integers α, β, γi and δi. Writing di := indg bi, this gives us a linear congruence

Equation 4.6

α + βd + γ1d1 + ··· + γtdt ≡ δ1d1 + ··· + δtdt (mod q – 1).
The ICM proceeds in two[4] stages. In the first stage, we compute di := indg bi for each element bi in the factor base B. For that, we collect Relation (4.5) with β = 0. When sufficiently many relations are available, the corresponding system of linear Congruences (4.6) is solved mod q – 1 for the unknowns di. In the second stage, a single relation with gcd(β, q – 1) = 1 is found. Substituting the values of di available from the first stage yields indg a.
[4] Some authors prefer to say that the number of stages in the ICM is actually three, because they decouple the congruence-solving phase from the first stage. This is indeed justified, since implementations by several researchers reveal that for large fields this linear-algebra part often demands running time comparable to that needed by the relation-collection part. Our philosophy is to call the entire precomputation work the first stage. Now, although it hardly matters, it is up to the reader which camp she wants to join.
Note that as long as q (and g) are fixed, we don’t have to carry out the first stage every time the discrete logarithm of an element of F_q* is to be computed. If the values di, i = 1, . . . , t, are stored, then only the second stage needs to be carried out for computing the indices of any number of elements of F_q*. This is the reason why the first stage of the ICM is often called the precomputation stage.
In order to make the algorithm more concrete, we have to specify:

how to choose a factor base B;

how to find Relation (4.5);

how to solve the resulting system of Congruences (4.6).

In the rest of this section, we describe variants of the ICM based on their strategies for selecting the factor base and for collecting relations. We discuss the third issue in Section 4.7.
Let F_p be a finite field of prime cardinality. For cryptographic applications, p should be quite large, say, of length around a thousand bits or more, and so naturally p is odd. Elements of F_p are canonically represented as integers between (and including) 0 and p – 1. The equality x = y in F_p means equality of two integers in the range 0, . . . , p – 1, whereas x ≡ y (mod p) means that the two integers x and y may be different, but their equivalence classes in F_p are the same.
In the basic version of the ICM, we choose the factor base B to comprise the first t primes q1, . . . , qt, where t = L[ζ]. (The optimal value of ζ is determined below.) In the first stage, we choose random values of α ∈ {0, 1, . . . , p – 2} and compute g^α. Any integer representing g^α can be considered, but we think of g^α as an integer in {1, . . . , p – 1}. We then try to factorize g^α using trial divisions by elements of the factor base B. If g^α is found to be B-smooth, then we get a desired relation for the first stage, namely,

g^α = q1^{γ1} q2^{γ2} ··· qt^{γt}.

If g^α is not B-smooth, we try another random α and proceed as above. After sufficiently many relations are available, we solve the resulting system of linear congruences modulo p – 1. This gives us di := indg qi for i = 1, . . . , t.

In the second stage, we again choose random integers α and try to factorize ag^α completely over B. Once the factorization is successful, that is, once we have ag^α = q1^{δ1} ··· qt^{δt}, we compute d = indg a ≡ δ1d1 + ··· + δtdt – α (mod p – 1).
In order to optimize the running time, we note that the relation-collection phase of the first stage is usually the bottleneck of the algorithm. If ζ (or, equivalently, t) is chosen to be too small, then finding B-smooth integers would be very difficult. On the other hand, if ζ is too large, then we have to collect too many relations to have a solvable linear system of congruences. More explicitly, since the integers g^α can be regarded as random integers of the order of p, the probability that g^α is B-smooth is L[–1/(2ζ)] (Corollary 4.1). Thus we expect to get each relation after L[1/(2ζ)] random values of α are tried. Since for each α we need to carry out L[ζ] divisions by elements of the factor base B (the exponentiation g^α can be done in polynomial time and hence can be neglected in this analysis), each relation can be found in expected time L[ζ + 1/(2ζ)]. Now, in order to solve for di, i = 1, . . . , t, we must have (slightly more than) t = L[ζ] relations. Thus, the relation-collection phase takes a total time of L[2ζ + 1/(2ζ)]. It can be easily checked that 2ζ + 1/(2ζ) is minimized for ζ = 1/2. This gives a running time of L[2] for the relation-collection phase.

Since each g^α is a positive integer less than p, it is evident that it can have at most O(log p) prime divisors. In other words, the congruences collected are necessarily sparse. As we will see later, such a system can be solved in time O~(t^2), that is, in time L[1] for ζ = 1/2.

In the second stage, it is sufficient to have a single relation to compute d = indg a. As explained before, such a relation can be found in expected time L[ζ + 1/(2ζ)] = L[3/2]. Thus the total running time of the basic ICM is L[2].
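The two stages can be illustrated over a toy field (the field F_107, the base and all names here are hypothetical; the first-stage logs di are brute-forced purely for the demonstration, whereas a real implementation would obtain them by solving the collected congruences modulo p – 1):

```python
from random import randrange

def smooth_exponents(x, base):
    # Exponent vector of x over the factor base, or None if not smooth.
    exps = []
    for q in base:
        e = 0
        while x % q == 0:
            x //= q
            e += 1
        exps.append(e)
    return exps if x == 1 else None

def second_stage(g, a, p, base, logs):
    # Search for alpha with a*g^alpha B-smooth, then read off ind_g(a)
    # as sum(delta_i * d_i) - alpha modulo p - 1.
    while True:
        alpha = randrange(1, p - 1)
        exps = smooth_exponents(a * pow(g, alpha, p) % p, base)
        if exps is not None:
            return (sum(d * e for d, e in zip(logs, exps)) - alpha) % (p - 1)

p, g = 107, 2                      # tiny demo field; 2 is primitive mod 107
base = [2, 3, 5, 7]                # factor base of the first few primes
# First-stage logs d_i = ind_g(q_i), brute-forced only for this demo:
logs = [next(d for d in range(p - 1) if pow(g, d, p) == q) for q in base]
```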
The second stage of the basic ICM is much faster than the first stage. In fact, this is a typical phenomenon associated with most variants of the ICM. Speeding up the first stage is, therefore, our primary concern.
Each step in the search for relations consists of an exponentiation (g^α) modulo p followed by trial divisions by q1, . . . , qt. Now, g^α may be non-smooth, but g^α + kp (integer sum) may be smooth for some small non-zero integer k. Once g^α is computed and found to be non-smooth, one can check the smoothness of g^α + kp for k = ±1, ±2, . . ., before another α is tried. Since these integers are available by addition (or subtraction) only (which is much faster than exponentiation), this strategy tends to speed up the relation-collection phase. Moreover, information about the divisibility of g^α + kp by qi can be obtained from that of g^α + (k – 1)p by qi. So using suitable tricks one might reduce the cost of trial divisions. Two such possibilities are explored in Exercise 4.18. Though these modifications lead to some speed-up in practice, they have the disadvantage that as |k| increases, the size of |g^α + kp| also increases, so that the chance of getting smooth candidates reduces; therefore, using high values of k does not effectively help.
There are other heuristic modification schemes that provide some speed-up in practice. For example, the large prime variation discussed in connection with the QSM applies equally well here. Another trick is to use the early abort strategy. A random B-smooth integer is more likely to have many small prime factors than a few large prime factors. This observation can be incorporated in the smoothness tests as follows. Let us assume that we do trial divisions by the small primes in the order q1, q2, . . . , qt. After we do trial divisions of a candidate x by the first t′ < t primes (say, t′ ≈ t/2), we check how far we have been able to reduce x. If the reduction of x is already substantial, we continue with the trial divisions by the remaining primes qt′+1, . . . , qt. Otherwise, we abort the smoothness test for x and try another candidate. Obviously, this strategy prematurely rejects some smooth candidates (which are anyway rather small in number), but since most candidates are expected to be non-smooth, it saves a lot of trial divisions in the long run. The determination of t′ and/or the quantification of a “substantial” reduction actually depends on practical experience. With suitable choices one may expect a speed-up of about a factor of 2. The drawback of the early abort strategy is that it often does not go well with sieving. Sieving, whenever applicable, should be given higher preference.
To sum up, the basic ICM and all its modifications can be used for computing discrete logarithms only in small fields, say, of size ≤ 80 bits. For bigger fields, we need newer ideas.
The linear sieve method (LSM) is a direct adaptation of the quadratic sieve method for factoring integers (Section 4.3.2). In the basic ICM just discussed, we try to find smooth integers from candidates that are on average as large as O(p). The LSM, on the other hand, finds smooth ones from a pool of integers each of which is of the order of p^{1/2+o(1)}. As a result, we expect a higher density of smooth integers among the candidates tested in the LSM than in the basic method. Furthermore, the LSM employs sieving techniques instead of trial divisions. All these help the LSM achieve a running time of L[1], a definite improvement over the L[2] performance of the basic method.
Let H := ⌈√p⌉ and J := H^2 − p. Then 0 < J ≤ 2H. Let us consider the congruence
Equation 4.7
(H + c1)(H + c2) ≡ J + (c1 + c2)H + c1c2 (mod p).
For small integers c1, c2, the right side of the above congruence, henceforth denoted as
T(c1, c2) := J + (c1 + c2)H + c1c2,
is of the order of p^{1/2+o(1)}. If the integer T(c1, c2) is smooth with respect to the first t primes q1, q2, . . . , qt, that is, if we have a factorization T(c1, c2) = q1^{β1} q2^{β2} · · · qt^{βt}, then we have a relation
indg(H + c1) + indg(H + c2) ≡ β1 indg q1 + β2 indg q2 + · · · + βt indg qt (mod p − 1).
For the linear sieve method, the factor base comprises the primes less than L[1/2] (so that t = L[1/2] by the prime number theorem) and the integers H + c for −M ≤ c ≤ M. The bound M on c is chosen to be of the order of L[1/2]. Each T(c1, c2), being p^{1/2+o(1)} in absolute value, has a probability of L[−1/2] of being qt-smooth. Thus, once we check the factorization of T(c1, c2) for all (that is, for a total of L[1]) values of the pair (c1, c2) with −M ≤ c1 ≤ c2 ≤ M, we expect to get L[1/2] Relations (4.7) involving the unknown indices of the factor base elements. If we further assume that the primitive element g is a small prime which itself is in the factor base, then we get the free relation indg g = 1. The resulting system is then solved to compute the discrete logarithms of the elements in the factor base. This is the basic principle of the first stage of the LSM.
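The first-stage relation collection can be illustrated on a toy prime (all numerical values below are for illustration only); a full implementation would also record the sign of T(c1, c2):

```python
from math import isqrt

def lsm_relations(p, base, M):
    """Collect linear-sieve relations for a toy prime p:
    (H+c1)(H+c2) ≡ T(c1,c2) (mod p) with T smooth over `base`."""
    H = isqrt(p) + 1          # H = ceil(sqrt(p)) for non-square p
    J = H * H - p
    rels = []
    for c1 in range(-M, M + 1):
        for c2 in range(c1, M + 1):
            T = J + (c1 + c2) * H + c1 * c2
            n = abs(T)        # the sign should be tracked in a real run
            if n == 0:
                continue
            exps = []
            for q in base:
                e = 0
                while n % q == 0:
                    n //= q
                    e += 1
                exps.append(e)
            if n == 1:        # T is smooth: record the relation
                rels.append((c1, c2, exps))
    return rels

p = 10007                      # toy prime (illustrative)
rels = lsm_relations(p, base=[2, 3, 5, 7, 11, 13], M=10)
```

Every recorded triple satisfies the defining congruence, i.e., (H + c1)(H + c2) ≡ ∏ qi^{βi} (mod p) up to sign.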
If we compute all the T(c1, c2) and use trial divisions by q1, . . . , qt to separate out the smooth ones, we achieve a running time of L[1.5], as can be easily seen. Sieving is employed to reduce the running time to L[1]. First one fixes a value of c1 and initializes an array, indexed by c2 in the range c1 ≤ c2 ≤ M, with the values ln |T(c1, c2)|. One then computes, for each prime power q^h (q being a small prime in the factor base and h a small positive exponent), a solution for c2 of the congruence (H + c1)c2 + (J + c1H) ≡ 0 (mod q^h).
If gcd(H + c1, q) = 1, that is, if H + c1 is not a multiple of q, then the solution is given by σ ≡ −(J + c1H)(H + c1)^{−1} (mod q^h). The inverse in the last congruence can be calculated by running the extended gcd algorithm (Algorithm 3.8) on H + c1 and q^h. Then, for each value of c2 (in the range c1 ≤ c2 ≤ M) that is congruent to σ (mod q^h), ln q is subtracted from the corresponding array location.
If q | (H + c1), we find h1 := vq(H + c1) > 0 and h2 := vq(J + c1H) ≥ 0. If h1 > h2, then for each value of c2 the expression T(c1, c2) is divisible by q^{h2} and by no higher power of q. So we subtract the quantity h2 ln q from the array location of every c2. Finally, if h1 ≤ h2, then we subtract h1 ln q from the array location of every c2 and, for h > h1, solve the congruence as c2 ≡ −((J + c1H)/q^{h1}) · ((H + c1)/q^{h1})^{−1} (mod q^{h−h1}).
Once the above procedure is carried out for each small prime q in the factor base and for each small exponent h, we check for which values of c2 the array value is equal (that is, sufficiently close) to 0. These are precisely the values of c2 such that, for the given c1, the integer T(c1, c2) factors smoothly over the small primes in the factor base.
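The sieve for a fixed c1 can be sketched as follows; for simplicity this sketch handles only the case gcd(H + c1, q) = 1 (the valuation bookkeeping for the other case is described above), and it uses Python's three-argument pow for the modular inverse:

```python
import math

def sieve_c1(p, H, J, base, c1, M, tol=1e-6):
    """Sieve over c2 for a fixed c1, assuming gcd(H + c1, q) = 1 for every
    prime q in `base`.  Returns the values of c2 with T(c1, c2) base-smooth."""
    T = lambda c2: J + (c1 + c2) * H + c1 * c2
    # array of logarithms, indexed by c2
    A = {c2: math.log(abs(T(c2))) for c2 in range(c1, M + 1) if T(c2) != 0}
    for q in base:
        qh = q
        while qh < p:                         # small prime powers q^h
            # solve (H + c1)*c2 + (J + c1*H) ≡ 0 (mod q^h)
            sigma = (-(J + c1 * H) * pow(H + c1, -1, qh)) % qh
            start = c1 + (sigma - c1) % qh    # first c2 ≥ c1 with c2 ≡ sigma
            for c2 in range(start, M + 1, qh):
                if c2 in A:
                    A[c2] -= math.log(q)      # one ln q per prime power
            qh *= q
    return [c2 for c2, v in A.items() if v < tol]
```

Locations whose accumulated logarithm drops to (nearly) zero correspond exactly to the smooth T(c1, c2); no division is ever performed during the sieve.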
As in the QSM for integer factorization, it is sufficient to have some approximate representations of the logarithms (like ln q). Incomplete sieving and large prime variation can also be adopted as in the QSM.
Finally, we change c1 and repeat the sieving process described above. It is easy to see that the sieving operations for all c1 in the range –M ≤ c1 ≤ M take time L[1] as announced earlier. Gaussian elimination involving sparse congruences in L[1/2] variables also meets the same running time bound.
The second stage of the LSM can be performed in L[1/2] time. Using a method similar to the second stage of the basic ICM leads to a huge running time (L[3/2]), because we have only L[1/2] small primes in the factor base. We instead do the following. We start with a random j and try to obtain a factorization of the form g^j a ≡ ∏q q^{eq} ∏u u^{wu} (mod p), where q runs over the L[1/2] small primes in the factor base and u runs over medium-sized primes, that is, primes less than L[2]. One can use an integer factorization algorithm for this purpose. Lenstra’s ECM is, in particular, recommended, since it can detect smooth integers fast. More specifically, about L[1/4] random values of j need to be tried before we expect to get an integer with the desired factorization. Each attempt of factorization using the ECM takes time less than L[1/4].
Now, we have indg a ≡ −j + Σq eq indg q + Σu wu indg u (mod p − 1). The indices indg q are available from the first stage, whereas for each u (with wu ≠ 0) the index indg u is calculated as follows. First we sieve in an interval of size L[1/2] around √p/u and collect integers y in this interval which are smooth with respect to the L[1/2] primes in the factor base. A second sieve in an interval of size L[1/2] around H gives us a small integer c such that (H + c)yu − p is smooth, again with respect to the L[1/2] primes in the factor base. Since H + c is in the factor base, we get indg u. The reader can easily verify that computing individual logarithms indg a using this method takes time L[1/2] as claimed earlier.
There are some other L[1] methods (like the Gaussian integer method and the residue list sieve method) known for computing discrete logarithms in prime fields. We will not discuss these methods in this book. Interested readers may refer to Coppersmith et al. [59] for details of these L[1] methods. A faster method (running time L[0.816]), namely the cubic sieve method, is covered in Exercise 4.21. Now we turn our attention to the best method known to date.
The number field sieve method (NFSM) for solving the DLP in a prime field
is a direct adaptation of the NFSM used to factor integers (Section 4.3.4). As before, we let g be a generator of
and are interested in computing the index indg a for some
.
We choose an irreducible polynomial
with small integer coefficients and of degree d, and use the number field
for some root
of f. For the sake of simplicity, we consider the special case (SNFSM) that f is monic,
is a PID, and
. We also choose an integer m such that f(m) ≡ 0 (mod p) and define the ring homomorphism

Finally, we predetermine a bound
and let
be the set of (rational) primes
,
the set of prime ideals of
of prime norms
,
a set of generators of the (principal) ideals
and
a set of generators of the group of units of
.
We try to find coprime integers c, d of small absolute values such that both c + dα and Φ(c + dα) = c + dm are smooth with respect to
and
respectively, that is, we have factorizations of the forms
and
or equivalently,
. But then
, that is,
Equation 4.8

This motivates us to define the factor base as

We assume that
so that we have the free relation indg g ≡ 1 (mod p – 1).
Trying sufficiently many pairs (c, d) we generate many Relations (4.8). The resulting sparse linear system is solved for the unknown indices of the elements of B. This completes the first stage of the SNFSM.
In the second stage, we bring a to the scene in the following manner. First assume that a is small such that either a is
-smooth, that is,

or for some
the ideal
can be written as a product of prime ideals of
, that is,


In both the cases, taking logarithms and substituting the indices of the elements of the factor base (available from the first stage) yields d = indg a.
However, a is not small, in general, and it is a non-trivial task to find a
such that 〈γ〉 is
-smooth. We instead write a as a product
Equation 4.9

where each ai is small enough so that indg ai can be computed using the method described above. This gives
. In order to see how one can find a representation of a as a product of small integers as in Congruence (4.9), we refer the reader to Weber [300].
As in most variants of the ICM, the running time of the SNFSM is dominated by the first stage and under certain heuristic assumptions can be shown to be of the order of L(p, 1/3, (32/9)1/3). Look at Section 4.3.4 to see how the different parameters can be set in order to achieve this running time. For the general NFS method (GNFSM), the running time is L(p, 1/3, (64/9)1/3). The GNFSM has been implemented by Weber and Denny [301] for computing discrete logarithms modulo a particular prime having 129 decimal digits (see McCurley [189]).
We wish to compute the discrete logarithm indg a of an element
, q = 2n, with respect to a primitive element g of
. We work with the representation
for some irreducible polynomial
with deg f = n. For certain algorithms, we require f to be of special forms. This does not create serious difficulties, since it is easy to compute isomorphisms between two polynomial-basis representations of
(Exercise 3.38).
Recall that we have defined the smoothness of an integer x in terms of the magnitudes of the prime divisors of x. Now, we deal with polynomials (over
) and extend the definition of smoothness in the obvious way: that is, a polynomial is called smooth if it factors into irreducible polynomials of low degrees. The next theorem is an analog of Theorem 2.21 for polynomials. By an abuse of notation, we use ψ(·, ·) here also. The context should make it clear what we are talking about – smoothness of integers or of polynomials.
|
Let r, m be positive integers with r^{1/100} ≤ m ≤ r^{99/100}, let u := r/m, and let ψ(r, m) denote the probability that a random polynomial of degree r over F2 has all its irreducible factors of degrees ≤ m. Then, as r, m → ∞, ψ(r, m) = u^{−u+o(u)} = e^{−(1+o(1))u ln u}. |
The above expression for ψ(r, m), though valid asymptotically, gives good approximations for finite values of r and m. The condition r^{1/100} ≤ m ≤ r^{99/100} is met in most practical situations. The probability ψ(r, m) is a very sensitive function of u = r/m. For a fixed m, polynomials of smaller degrees have higher chances of being smooth (that is, of having all irreducible factors of degrees ≤ m).
Now, let us consider the field
with q = 2n. The elements of
are represented as polynomials of degrees ≤ n–1. For a given m, the probability
that a randomly chosen element of
has all irreducible factors of degrees ≤ m is then approximately given by
, as n, m → ∞ with n^{1/100} ≤ m ≤ n^{99/100}. We can, therefore, approximate
by ψ(n, m).
For many algorithms that we will come across shortly, we have r ≈ n/α and
for some positive α and β, so that
and, consequently,
The idea of the basic ICM for
is analogous to that for prime fields. Now, the factor base B comprises all irreducible polynomials of
having degrees ≤ m. We choose
. (As in the case of the basic ICM for prime fields, this can be shown to be the optimal choice.) By Approximation (2.5) on p 84, we then have
.
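The size of this factor base can also be computed exactly from Gauss’s formula for the number of monic irreducible polynomials of degree d over F_q, namely (1/d) Σ_{e|d} μ(d/e) q^e; a small self-contained sketch:

```python
def mobius(n):
    """Möbius function via trial factorization (fine for small n)."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0      # squared prime factor: mu = 0
            result = -result
        p += 1
    return -result if n > 1 else result

def num_irreducible(d, q=2):
    """Number of monic irreducible polynomials of degree d over F_q
    (Gauss's formula: (1/d) * sum over e | d of mu(d/e) * q^e)."""
    return sum(mobius(d // e) * q**e for e in range(1, d + 1) if d % e == 0) // d

def factor_base_size(m, q=2):
    """Size of the factor base: all irreducibles of degrees <= m."""
    return sum(num_irreducible(d, q) for d in range(1, m + 1))
```

For instance, over F2 there are 2, 1, 2, 3, 6 irreducibles of degrees 1 through 5, consistent with the ≈ 2^{m+1}/m growth implied by Approximation (2.5).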
In the first stage, we choose random α, 1 ≤ α ≤ q – 2, compute gα and check if gα is B-smooth. If so, we get a relation. For a random α, the polynomial gα is a random polynomial of degree < n and hence has a probability of nearly
of being smooth. Note that, unlike integers, a polynomial over
can be factored in probabilistic polynomial time (though for small m it may be preferable to do trial division by elements of B). Thus checking the smoothness of a random element of
can be done in (probabilistic) polynomial time, and each relation is available in expected time
. Since we need (slightly more than)
relations for setting up the linear system, the relation collection stage runs in expected time
. A sparse system with
unknowns can also be solved in time
.
In the second stage, we need a single smooth polynomial of the form gαa. If α is randomly chosen, we expect to get this relation in time
. Therefore, the second stage is again faster than the first and the basic method takes a total expected running time of
. Recall that the basic method for
requires time L[2]. The difference arises because polynomial factorization is much easier than integer factorization.
We now explain a modification of the basic method, proposed by Blake et al. [23]. Let
: that is, a non-zero polynomial in
of degree < n. If h is randomly chosen from
(as in the case of gα or gαa for random α), then we expect the degree of h to be close to n. Let us write h ≡ h1/h2 (mod f) (f being the defining polynomial) with h1 and h2 each having degree ≈ n/2. Then the ratio of the probability that both h1 and h2 are smooth to the probability that h is smooth is ψ(n/2, m)^2/ψ(n, m) ≈ 2^{n/m} (neglecting the o( ) terms). For practical values of n and m, this ratio can be substantially large, implying that it is easier to get relations by trying to factor both h1 and h2 instead of trying to factor h. This is the key observation behind the modification due to Blake et al. [23]. Simple calculations show that this modification does not affect the asymptotic behaviour of the basic method, but it leads to considerable speed-up in practice.
In order to complete the description of the modification of Blake et al. [23], we mention an efficient way to write h as h1/h2 (mod f). Since 0 ≤ deg h < n and since f is irreducible of degree n, we must have gcd(h, f) = 1. During the iteration of the extended gcd algorithm we actually compute a sequence of polynomials uk, vk, xk such that ukh + vkf = xk for all k = 0, 1, 2, . . . . At the start of the algorithm we have u0 = 1, v0 = 0 and x0 = h. As the algorithm proceeds, the sequence deg uk changes non-decreasingly, whereas the sequence deg xk changes non-increasingly and at the end of the extended gcd algorithm we have xk = 1 and the desired Bézout relation ukh + vkf = 1 with deg uk ≤ n – 1. Instead of proceeding till the end of the gcd loop, we stop at the value k = k′ for which deg xk′ is closest to n/2. We will then usually have deg uk′ ≈ n/2, so that taking h1 = xk′ and h2 = uk′ serves our purpose.
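The partial extended gcd can be sketched with polynomials over F2 encoded as Python integers (bit i holding the coefficient of X^i); the AES polynomial X^8 + X^4 + X^3 + X + 1 used below is merely a convenient small irreducible example:

```python
def pdeg(a):        # degree of a GF(2) polynomial encoded as an int
    return a.bit_length() - 1

def pmul(a, b):     # carry-less (GF(2)) polynomial multiplication
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def pdivmod(a, b):  # polynomial division with remainder over GF(2)
    q = 0
    while pdeg(a) >= pdeg(b):
        shift = pdeg(a) - pdeg(b)
        q ^= 1 << shift
        a ^= b << shift
    return q, a

def half_gcd_split(h, f):
    """Partial extended gcd: find (h1, h2) with h ≡ h1/h2 (mod f) and
    deg h1 as close to (deg f)/2 as possible.  Assumes gcd(h, f) = 1,
    which holds automatically when f is irreducible and deg h < deg f."""
    n = pdeg(f)
    u0, u1 = 1, 0          # u-sequence of the extended gcd: u*h ≡ x (mod f)
    x0, x1 = h, f          # remainder sequence
    best = (x0, u0)
    while x1:
        q, r = pdivmod(x0, x1)
        x0, x1 = x1, r
        u0, u1 = u1, u0 ^ pmul(q, u1)
        if x1 and abs(pdeg(x1) - n // 2) < abs(pdeg(best[0]) - n // 2):
            best = (x1, u1)
    return best            # (h1, h2) with h*h2 ≡ h1 (mod f)
```

Stopping the remainder sequence near degree n/2 gives both numerator and denominator of degree ≈ n/2, exactly as required by the modification of Blake et al.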
The concept of large prime variation is applicable for the basic ICM. Moreover, if trial divisions are used for smoothness tests, one can employ the early abort strategy. Despite all these modifications the basic variant continues to be rather slow. Our hunt for faster algorithms continues.
The LSM for prime fields can be readily adapted to the fields
, q = 2^n. Let us assume that the defining polynomial f is of the special form f(X) = X^n + f1(X), where deg f1 is small. The total number of choices for such f with deg f1 < k is 2^k. Under the assumption that irreducible polynomials (over
) of degree n are randomly distributed among the set of polynomials of degree n, we expect to find an irreducible polynomial f = X^n + f1 with deg f1 = O(lg n) (see Approximation (2.5) on p 84). In particular, we may assume that deg f1 ≤ n/2.
Let k := ⌈n/2⌉ and σ := 2k − n. For polynomials h1, h2 of small degrees over F2, we then have
(X^k + h1)(X^k + h2) ≡ X^σ f1 + (h1 + h2)X^k + h1h2 (mod f).
The right side of the congruence, namely,
T(h1, h2) := X^σ f1 + (h1 + h2)X^k + h1h2,
has degree slightly larger than n/2. This motivates the following algorithm.
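The congruence underlying T(h1, h2) is easy to verify mechanically for a toy field; below, n = 7 and f = X^7 + X + 1 (so f1 = X + 1, k = ⌈n/2⌉ = 4, and the exponent of X multiplying f1 is σ = 2k − n = 1) are illustrative choices, with polynomials over F2 encoded as integers:

```python
def pmul(a, b):                      # GF(2)[X] multiplication (ints as bit vectors)
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def pmod(a, f):                      # remainder of a modulo f over GF(2)
    while a.bit_length() >= f.bit_length():
        a ^= f << (a.bit_length() - f.bit_length())
    return a

# Toy setting (illustrative): n = 7, f = X^7 + X + 1 (irreducible over F_2),
# so f1 = X + 1, k = 4 and sigma = 2k - n = 1.
n, f, f1 = 7, 0b10000011, 0b11
k, sigma = 4, 1

for h1 in range(8):                  # all small polynomials of degree <= 2
    for h2 in range(8):
        lhs = pmod(pmul((1 << k) ^ h1, (1 << k) ^ h2), f)
        T = pmod(pmul(1 << sigma, f1) ^ pmul(h1 ^ h2, 1 << k) ^ pmul(h1, h2), f)
        assert lhs == T              # (X^k+h1)(X^k+h2) ≡ T(h1,h2) (mod f)
```

The check works because X^{2k} = X^{2k−n} · X^n ≡ X^σ f1 (mod f), which is the whole content of the congruence.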
We take
and let the factor base B be the (disjoint) union of B1 and B2, where B1 contains the irreducible polynomials of degrees ≤ m, and where B2 contains the polynomials of the form X^k + h, deg h ≤ m. Both B1 and B2 (and hence B) contain L[1/2] elements. For each pair X^k + h1, X^k + h2 of elements of B2, we then check the smoothness of T(h1, h2) over B1. Since deg T(h1, h2) ≈ n/2, the probability of finding a smooth candidate per trial is L[−1/2]. Therefore, trying L[1] values of the pair (h1, h2) is expected to give L[1/2] relations (in L[1/2] variables). Since factoring each T(h1, h2) can be performed in probabilistic polynomial time, the relation collection stage takes time L[1]. Gaussian elimination (with sparse congruences) can be done in the same time. As in the case of the LSM for prime fields, the second stage can be carried out in time L[1/2]. To sum up, the LSM for fields of characteristic 2 takes L[1] running time.
Note that the running time L[1] is achievable in this case without employing any sieving techniques. This is again because checking the smoothness of each T(h1, h2) can simply be performed in polynomial time. Application of polynomial sieving, though unable to improve upon the L[1] running time, often speeds up the method in practice. We will describe such a sieving procedure in connection with Coppersmith’s algorithm, which we take up next.
Coppersmith’s algorithm is the fastest algorithm known to compute discrete logarithms in finite fields
of characteristic 2. Theoretically it achieves the (heuristic) running time L(q, 1/3, c) and is, therefore, subexponentially faster than the L[c′] = L(q, 1/2, c′) algorithms described so far. Gordon and McCurley have made aggressive attempts to compute discrete logarithms in fields as large as
using Coppersmith’s algorithm in tandem with a polynomial sieving procedure and, thereby, established the practicality of the algorithm.
In the basic method, each trial during the search for relations involves checking the smoothness of a polynomial of degree nearly n. The modification due to Blake et al. [23] replaces this by checking the smoothness of two polynomials of degrees ≈ n/2. For the adaptation of the LSM, on the other hand, we check the smoothness of a single polynomial of degree ≈ n/2. In Coppersmith’s algorithm, each trial consists of checking the smoothness of two polynomials of degrees ≈ n^{2/3}. This is the basic reason behind the improved performance of Coppersmith’s algorithm.
To start with we make the assumption that the defining polynomial f of
is of the form f(X) = Xn + f1(X) with deg f1 = O(lg n). We have argued earlier that an irreducible polynomial f of this special form is expected to be available. We now choose three integers m, M, k such that
m ≈ αn^{1/3}(ln n)^{2/3}, M ≈ βn^{1/3}(ln n)^{2/3} and 2^k ≈ γn^{1/3}(ln n)^{−1/3},
where the (positive real) constants α, β and γ are to be chosen appropriately to optimize the running time. The factor base B comprises irreducible polynomials (over
) of degrees ≤ m. Let
l := ⌊n/2^k⌋ + 1,
so that l ≈ (1/γ)n^{2/3}(ln n)^{1/3}. Choose relatively prime polynomials u1(X) and u2(X) (in
) of degrees ≤ M and let
h1(X) := u1(X)X^l + u2(X) and h2(X) := (h1(X))^{2^k} rem f(X).
But then, since indg h2 ≡ 2^k indg h1 (mod q − 1), we get a relation if both h1 and h2 are smooth over B. By choice, deg h1 is clearly O~(n^{2/3}), whereas
h2(X) ≡ u1(X^{2^k})X^{l·2^k} + u2(X^{2^k}) ≡ u1(X^{2^k})X^{l·2^k−n}f1(X) + u2(X^{2^k}) (mod f)
and, therefore, deg h2 = O~(n^{2/3}) too.
For each pair (u1, u2) of relatively prime polynomials of degrees ≤ M, we compute h1 and h2 as above and collect all the relations corresponding to the smooth values of both h1 and h2. This gives us the desired (sparse) system of linear congruences in the unknown indices of the elements of B, which is subsequently solved modulo q – 1.
The choice
and γ = α^{−1/2} gives the optimal running time of the first stage as
e^{(2α ln 2 + o(1))n^{1/3}(ln n)^{2/3}} = L(q, 1/3, 2α ln 2/(ln 2)^{1/3}) ≈ L(q, 1/3, 1.526).
The second stage of Coppersmith’s algorithm is somewhat involved. The factor base now contains only about L(q, 1/3, 0.763) elements. Therefore, finding a relation using a method similar to the second stage of the basic method requires time L(q, 2/3, c) for some c, which is much worse than even L[c′] = L(q, 1/2, c′). To work around this difficulty, we start by finding a polynomial gαa all of whose irreducible factors have degrees ≤ n^{2/3}(ln n)^{1/3}. This takes time of the order of L(q, 1/3, c1) (where c1 ≈ 0.377) and gives us a factorization gαa ≡ v1v2 · · · vr (mod f), where the vi have degrees ≤ n^{2/3}(ln n)^{1/3}. Note that the number of vi is less than n, since deg(gαa) < n. We then have
α + indg a ≡ indg v1 + indg v2 + · · · + indg vr (mod q − 1).
All these vi need not belong to the factor base, so we cannot simply substitute the values of indg vi. We instead reduce the problem of computing each indg vi to the problem of computing indg vii′ for several i′, with deg vii′ ≤ σ deg vi for some constant 0 < σ < 1. Subsequently, computing each indg vii′ is reduced to computing indg vii′i″ for several i″ with deg vii′i″ < σ deg vii′. Repeating this process, we eventually end up with polynomials in the factor base. Because each reduction step generates new polynomials with degrees reduced by at least the constant factor σ, the recursion depth is O(ln n). Now, if for each i the number of i′ is ≤ n, and for each i′ the number of i″ is ≤ n, and so on, we have to carry out the reduction of at most n^{O(ln n)} = e^{O((ln n)^2)} = L(q, 1/3, 0) polynomials. Therefore, if each reduction can be performed in time L(q, 1/3, c2), the second stage runs in time L(q, 1/3, max(c1, c2)).
In order to explain how a polynomial v of degree ≤ d ≤ n^{2/3}(ln n)^{1/3} can be reduced in the desired time, we choose
such that
, and let l := ⌊n/2^k⌋ + 1. As in the first stage, we fix a suitable bound M, choose relatively prime polynomials u1(X), u2(X) of degrees ≤ M and define
h1(X) := u1(X)Xl + u2(X)
and
h2(X) := (h1(X))^{2^k} rem f(X) = u1(X^{2^k})X^{l·2^k−n}f1(X) + u2(X^{2^k}).
The polynomials u1 and u2 should be so chosen that v|h1. We see that h1 and h2 have low degrees and we try to factor h1/v and h2. Once we get a factorization of the form

with deg vi, deg wj < σ deg v, we have the desired reduction of v, namely,

that is, the reduction of the computation of indg v to that of all the indg vi and indg wj. With the choice M ≈ (n^{1/3}(ln n)^{2/3}(ln 2)^{−1} + deg v)/2 and σ = 0.9, the reduction of each polynomial can be shown to run in time L(q, 1/3, (ln 2)^{−1/3}) ≈ L(q, 1/3, 1.130). Thus the second stage of Coppersmith’s algorithm runs in time L(q, 1/3, 1.130) and is faster than the first stage.
The large prime variation is a useful strategy to speed up Coppersmith’s algorithm. If trial divisions are used for smoothness tests, the early abort strategy can also be applied. However, a more efficient idea (though one that does not seem to combine well with the early abort strategy) is to use polynomial sieving, as introduced by Gordon and McCurley.
Recall that in the first stage we take relatively prime polynomials u1 and u2 of degrees ≤ M and check the smoothness of both h1(X) = u1(X)X^l + u2(X) and h2(X) = h1(X)^{2^k} rem f(X). We now explain the (incomplete) sieving technique for filtering out the (non-)smooth values of h1 = (h1)u1,u2 for the different values of u1 and u2. To start with, we fix u1 and let u2 vary. We need an array indexed by u2, a polynomial of degree ≤ M. Clearly, u2 can assume 2^{M+1} values, and so the array must contain 2^{M+1} elements. To be concrete, we index the array by the integer u2(2) ≥ 0 obtained canonically by substituting 2 for X in u2(X), considered as a polynomial with coefficients 0 and 1. We initialize all the locations of the array to zero.
Let t = t(X) be a small irreducible polynomial in the factor base B (or a small power of such an irreducible polynomial) with δ := deg t. The values of u2 for which t divides (h1)u1,u2 satisfy the polynomial congruence u2(X) ≡ u1(X)X^l (mod t). Let w = w(X) be the solution of this congruence with deg w < δ. If deg w > M, then no value of u2 of degree ≤ M makes (h1)u1,u2 divisible by t. So assume that deg w ≤ M. If δ > M, then the only value of u2 for which t divides (h1)u1,u2 is u2 = w. So we may also assume that δ ≤ M. Then the values of u2 that make (h1)u1,u2 divisible by t are given by u2 = w + vt for all polynomials v(X) of degrees ≤ M − δ. For each of these 2^{M−δ+1} values of u2, we add δ = deg t to the corresponding array location.
When the process mentioned in the last paragraph is completed for all choices of t (small irreducible polynomials in B and their small powers), we find out for which values of u2 the array locations contain values close to deg (h1)u1,u2. These values of u2 correspond to the smooth values of (h1)u1,u2 for the chosen u1. Finally, we vary u1 and repeat the sieving procedure.
In each sieving process described above, we have to step through all the values u2 = w + vt (w being the base solution of the congruence modulo t) as v runs through all polynomials of degrees ≤ M − δ. We may choose the different possibilities for v in any sequence, compute the products vt, and add them to w. While doing so serves our purpose, it is not very efficient, because computing each u2 involves a polynomial multiplication vt. Gordon and McCurley’s trick steps through the possibilities of v in a clever sequence that lets one obtain each value of u2 from the previous one with much less effort than a polynomial multiplication. The 2^{M−δ+1} choices of v can be naturally mapped to the bit strings of length (exactly) M − δ + 1 (with the coefficients of lower powers of X appearing later in the string). This motivates the following concept.
|
Let G1 := 0, 1 and, for d ≥ 2, let Gd := 0Gd−1, 1G′d−1, where G′d−1 is the sequence Gd−1 in the reverse order, and where juxtaposition denotes string concatenation (applied to each string in the sequence). The sequence Gd is called the Gray code of dimension d. |
For example, the Gray code of dimension 2 is 00, 01, 11, 10 and that of dimension 3 is 000, 001, 011, 010, 110, 111, 101, 100. Proposition 4.1 can be easily proved by induction on the dimension d.
|
Proposition 4.1 Let b1, b2, . . . , b2^d denote the strings of the Gray code of dimension d, and for a positive integer i let v2(i) denote the exponent of 2 in i (the 2-adic valuation). Then, for 1 ≤ i < 2^d, the string bi+1 is obtained from bi by flipping the single bit at position v2(i) (positions counted from the right, starting from 0). |
Back to our sieving business! Let us agree to step through the values of v in the sequence v1, v2, . . . , v_{2^{M−δ+1}}, where vi corresponds to the i-th bit string of the (M − δ + 1)-dimensional Gray code. Let us also call the corresponding values of u2 as (u2)1, (u2)2, . . . . Now, v1 is 0 and the corresponding (u2)1 is available at the beginning. By Proposition 4.1 we have, for 1 ≤ i < 2^{M−δ+1}, the equality vi+1 = vi + X^{v2(i)}, so that (u2)i+1 = (u2)i + X^{v2(i)}t. Computing the product X^{v2(i)}t involves shifting the coefficients of t and is done efficiently using bit operations only (assuming the data structures introduced in Section 3.5). Thus (u2)i+1 is obtained from (u2)i by a shift followed by a polynomial addition. This is much faster than computing (u2)i+1 directly via the polynomial multiplication vi+1t.
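A compact way to generate the Gray code is the standard formula i ⊕ (i ≫ 1); the stepping rule of Proposition 4.1 can then be checked directly:

```python
def gray_code(d):
    """The Gray code of dimension d: 2^d bit strings (as integers),
    successive ones differing in exactly one bit."""
    return [i ^ (i >> 1) for i in range(1 << d)]

def v2(i):
    """2-adic valuation: the number of trailing zero bits of i > 0."""
    return (i & -i).bit_length() - 1

# Stepping rule used in the sieve: the (i+1)-st code word is obtained
# from the i-th by flipping the single bit in position v2(i).
g = gray_code(3)
for i in range(1, len(g)):
    assert g[i] == g[i - 1] ^ (1 << v2(i))
```

Flipping bit v2(i) corresponds exactly to adding X^{v2(i)} to vi, which is why each (u2)i+1 costs only a shift of t and one polynomial addition.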
We mentioned earlier that efficient implementations of Coppersmith’s algorithm allow one to compute, in feasible time, discrete logarithms in fields as large as
. However, for much larger fields, say for n ≥ 1024, this algorithm is still not a practical breakthrough. The intractability of the DLP continues to remain cryptographically exploitable.
| 4.15 | Binary search Let ≤ be a total order on a set S (finite or infinite) and let a1 ≤ a2 ≤ ··· ≤ am be a given sequence of elements of S. Devise an algorithm that, given an arbitrary element |
| 4.16 | |
| 4.17 | Let p be a prime and g a primitive element of . For a , prove the explicit formula (mod p). What is the problem in using this formula for computing indices in ?
|
| 4.18 | In the basic ICM for the prime field , we try to factor random powers gα over the factor base B = {q1, . . . , qt}. In addition to the canonical representative of gα in the set {1, . . . , p – 1}, one can also check for the smoothness of the integers gα + kp for –M ≤ k ≤ M, where M is a small positive integer (to be determined experimentally).
|
| 4.19 |
|
| 4.20 | Consider the following modification of the LSM for . Define for the integers and . Choose a small and repeat the linear sieve method for each r, 1 ≤ r ≤ s, that is, check the smoothness (over the first t = L[1] primes) of the integers Tr(c1, c2) := Jr + (c1 + c2)Hr + c1c2 for all 1 ≤ r ≤ s, –μ ≤ c1 ≤ c2 ≤ μ. Let be the average of |Tr(c1, c2)| over all choices of r, c1 and c2. Show that , where is as defined in Exercise 4.19. In particular, for both the choices: (1) and (2) μ = ⌊M/s⌋, that is, on an average we check smaller integers for smoothness under this modified strategy. Determine the size of the factor base and the total number of integers Tr(c1, c2) checked for smoothness for the two values of μ given above.
|
| 4.21 | Cubic sieve method (CSM) for
|
| 4.22 | The problem with the CSM is that it is not known how to efficiently compute a solution of the congruence
Equation 4.10
subject to the condition that x^3 ≠ y^2 z and x, y, z = O(p^ξ) for 1/3 ≤ ξ < 1/2. In this exercise, we estimate the number of solutions of Congruence (4.10).
|
| 4.23 | Adaptation of CSM for |
Unlike the finite field DLP, there are no general-purpose subexponential algorithms to solve the ECDLP. Though good algorithms are known for certain specific types of elliptic curves, all known algorithms that apply to general curves take fully exponential time. The square root methods of Section 4.4 are the fastest known methods for solving the ECDLP over an arbitrary curve. As a result, elliptic curves are gaining popularity for building cryptosystems. The absence of subexponential algorithms implies that smaller fields can be chosen compared to those needed for cryptosystems based on the (finite field) DLP. This, in particular, results in smaller sizes of keys.
We start with Menezes, Okamoto and Vanstone’s (MOV) algorithm that reduces the ECDLP in a curve over
to the DLP over the field
for some suitable
. Since the DLP can be solved in subexponential time, the ECDLP is also solved in that time, provided that the extension degree
is small. For supersingular curves, one can choose k ≤ 6. For non-supersingular curves, this k is quite large, in general, and the MOV reduction takes exponential time.
A linear-time algorithm is known to solve the ECDLP over anomalous curves (that is, curves with trace of Frobenius equal to 1). This algorithm is called the SmartASS method after its inventors Smart, Araki, Satoh and Semaev [257, 265, 282].
J. H. Silverman [277] has proposed an algorithm, known as the xedni calculus method, for solving the ECDLP over an arbitrary curve. Rigorous running times of this algorithm are not known; however, heuristic analysis and experiments suggest that this algorithm is not really practical.
Let E be an elliptic curve over a finite field
and let
be of order m. We want to compute indP Q (if it exists) for a point
. Unless it is necessary, we will not assume any specific defining equation for E or a specific value of q.
Let us first look at the structure of the group
of m-torsion points on an elliptic curve defined over K. Here
is the algebraic closure of K.
|
Let K be a field of characteristic
|
Now, let E be an elliptic curve defined over a finite field K of characteristic p. Let
with gcd(m, p) = 1. We use the shorthand notation E[m] for
(and not for EK[m]). We want to define a function
em : E[m] × E[m] → μm,
where
is the group of m-th roots of unity (Exercise 4.24). This function em, known as the Weil pairing, helps us reduce the ECDLP in
to the DLP in a suitable field
. Let P,
. The definition of em(P, R) calls for using divisors on E. Recall from Exercise 2.125 that a divisor
belongs to
(that is, is the divisor of a rational function on E) if and only if
and
. Since
, there is a rational function
such that
. Now,
as well and p ∤ m². Hence, by Theorem 4.2, there exists a point R′ of order m² such that R = mR′. Since #E[m] = m², it follows that
and, therefore, there exists a rational function
with
. The functions f and g as introduced above are unique up to multiplication by elements of
. One can show that we can choose f and g in such a manner that f ∘ λm = g^m, where
is the multiplication map Q ↦ mQ. Then for
and
we have g^m(P + U) = f(mP + mU) = f(mU) = g^m(U). Since g has only finitely many poles and zeros (whereas
is infinite), we can choose U such that both g(U) and g(P + U) are defined and non-zero. For such a point U, we then have
and define
em(P, R) := g(P + U)/g(U).
The right side can be shown to be independent of the choice of U. The relevant properties of the Weil pairing em are now listed.
|
Let P, P′, R,
| |||||||||||||||||||||
The above definition of em is not computationally effective. We will see later how we can compute em(P, T) in probabilistic polynomial time using an alternative (but equivalent) definition.
Algorithm 4.7 shows how the MOV reduction algorithm makes use of Weil pairing. We now clarify the subtle details of this algorithm.
|
Input: A point Output: The index indP Q, that is, an integer l with Q = lP. Steps: Choose the smallest |
From the bilinearity of the Weil pairing, it follows that if Q = lP, 0 ≤ l < m, then β = em(Q, R) = em(lP, R) = em(P, R)^l = α^l. Thus, treating indα β as the least non-negative residue modulo ord α, we conclude that l = indα β if and only if ord α = m, that is, α is a primitive m-th root of unity. That α is an m-th root of unity for any
is obvious from the definition of em. We now show that there exists some
for which α = em(P, R) is primitive.
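Once a primitive α = em(P, R) and β = em(Q, R) are in hand, the remaining work is an ordinary DLP in the multiplicative group of a field; any standard method applies. A minimal baby-step/giant-step sketch in a toy prime field standing in for F_{q^k} (the modulus 101 and generator 2 are illustrative assumptions, not values from the text):

```python
# Baby-step/giant-step for beta = alpha^l in a toy prime field.
# bsgs is an illustrative helper, not the book's notation.
from math import isqrt

def bsgs(alpha, beta, m, p):
    """Return l in [0, m) with alpha^l == beta (mod p), or None."""
    s = isqrt(m) + 1
    baby = {}
    t = 1
    for j in range(s):                 # baby steps: record alpha^j -> j
        baby.setdefault(t, j)
        t = t * alpha % p
    giant = pow(alpha, -s, p)          # multiplier alpha^(-s)
    g = beta
    for i in range(s):                 # giant steps: beta * alpha^(-i*s)
        if g in baby:
            return (i * s + baby[g]) % m
        g = g * giant % p
    return None
```

In roughly 2√m multiplications this recovers l, which is why the reduction is only useful when k (and hence the field F_{q^k}) is small enough for subexponential DLP methods to take over.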
|
Let P ∈ E[m] be a point of order m, and let R1, R2 ∈ E[m]. Then em(P, R1) = em(P, R2) if and only if R1 + 〈P〉 = R2 + 〈P〉. Proof If R1 + 〈P〉 = R2 + 〈P〉, then R1 = R2 + rP for some integer r and so, by bilinearity and identity of the Weil pairing, em(P, R1) = em(P, R2)em(P, P)r = em(P, R2). Conversely, let em(P, R1) = em(P, R2). By Theorem 4.2, E[m] has a basis P, S; writing Ri = riP + siS and using bilinearity and identity gives em(P, S)s1 = em(P, S)s2. Since em(P, S) is a primitive m-th root of unity, s1 ≡ s2 (mod m), that is, R1 + 〈P〉 = R2 + 〈P〉. |
As an immediate corollary to Lemma 4.1, the desired result follows.
|
Let P ∈ E[m] be a point of order m and let S := {R ∈ E[m] | em(P, R) is a primitive m-th root of unity}.
Then #S/#E[m] = φ(m)/m. In particular, S is non-empty. Proof There are m distinct cosets of 〈P〉 in E[m]. Now, as R ranges over all points of E[m], the coset R + 〈P〉 ranges over all of these m possibilities and, accordingly, by Lemma 4.1 the value em(P, R) ranges over m distinct values, namely all the m-th roots of unity. Since μm is cyclic of order m and hence has φ(m) generators, the theorem follows. |
By Theorem 3.1, one should try an expected number of O(ln ln m) random points R ∈ E[m] before a primitive m-th root α = em(P, R) is found.
Since E[m] consists of finitely many (m2) points, it is obvious that there exist finite values of k such that E[m] ⊆ E(F_{q^k}). It can also be shown that if E[m] ⊆ E(F_{q^k}), then μm ⊆ F_{q^k}*, that is, em(P, R) ∈ F_{q^k}* for all P, R ∈ E[m]. The computation of the discrete logarithm indα β is then carried out in F_{q^k}*. For Algorithm 4.7 to be efficient, one requires k to be rather small. However, for most curves, k is rather large, implying that the MOV reduction is impractical for these curves. For a specific class of curves, the so-called supersingular curves, one can choose k to be rather small, namely k ≤ 6. We do not go into the details of the choices of k for the various cases of supersingular curves, but refer the reader to Menezes [192].
We start with an alternative definition of the Weil pairing em(P, R) for P, R ∈ E[m]. First note that if D = ΣT nT[T] is a divisor and if f is a rational function on E such that Div(f) and D have disjoint supports (that is, such that nT = 0 for every pole or zero T of f), then one can define
f(D) := ΠT f(T)nT.
Choose points U, V ∈ E(F̄q) and consider the divisors DP := [P + U] – [U] and DR := [R + V] – [V]. Since E(F̄q) is infinite, one can choose both P + U and U distinct from both R + V and V. Since P, R ∈ E[m], it follows that mDP and mDR are principal, namely, there are rational functions fP and fR such that Div(fP) = mDP = m[P + U] – m[U] and Div(fR) = mDR = m[R + V] – m[V]. One can show that
Equation 4.11
em(P, R) = fP(DR)/fR(DP),
independent of the choice of U and V as long as fP(DR) and fR(DP) are defined. Therefore, em(P, R) can be computed efficiently, if fP and fR can be computed efficiently. To this effect, we now describe an algorithm for computing the rational function f of a principal divisor D = Div(f) = m1[P1] + ··· + mr[Pr], where mi ∈ Z. Since deg D = 0, we can write
D = m1([P1] – [O]) + ··· + mr([Pr] – [O]).
Suppose that we have an Algorithm A that, for a pair of reduced divisors
D1 = [P1] – [O] + Div(f1)
and
D2 = [P2] – [O] + Div(f2),
computes the sum (a reduced divisor)
D1 + D2 = [P3] – [O] + Div(f3).
Then, f can be computed by repeated application of Algorithm A as follows.
Compute for each i = 1, . . . , r the reduced divisor Δi := |mi|([Pi] – [O]). Let 1 = ai1, ai2, . . . , aiti = |mi| be an addition chain for |mi| (Exercise 3.18). Clearly, ti – 1 applications of Algorithm A compute Δi. Since we can choose ti ≤ 2 ⌈lg |mi|⌉, each Δi can be computed using O(log |mi|) applications of Algorithm A.
Compute f by computing D = Div(f) = Δ1 + ··· + Δr. This can be done by applying Algorithm A a total of r – 1 times.
What remains is the description of Algorithm A that computes P3 and f3 from a knowledge of P1, P2, f1 and f2. Clearly, if P1 = O, then we have P3 = P2 and f3 = f1f2. Similar is the case for P2 = O. So assume that P1 ≠ O and P2 ≠ O. Let l1 be the line passing through P1 and P2 (the tangent to E at P1 if P1 = P2) and P′ := –(P1 + P2). First, assume that P′ ≠ O. By Exercise 2.125, we have Div(l1) = [P1] + [P2] + [P′] – 3[O]. Let l2 be the (vertical) line passing through P′ and –P′. Again by Exercise 2.125, we have Div(l2) = [P′] + [–P′] – 2[O]. But then D1 + D2 = [–P′] – [O] + Div(f1f2l1/l2), that is, we take P3 = –P′ = P1 + P2 and f3 = f1f2l1/l2. Finally, if P′ = O, then P2 = –P1 and, therefore, Div(l1) = [P1] + [P2] – 2[O]. Thus, in this case too, we take P3 = O and f3 = f1f2l1/l2 with l2 := 1.
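The step above can be sketched concretely if we represent each function f by its value at a fixed evaluation point S, so that one application of Algorithm A turns (P1, f1(S)) and (P2, f2(S)) into (P3, f3(S)). The curve y2 = x3 + 30x + 34 over F631 and all points below are illustrative assumptions (a short Weierstrass model, not the text's general setting), and degenerate evaluation points are not handled:

```python
# One step of "Algorithm A" with functions carried as values at S.
p, a, b = 631, 30, 34          # illustrative toy curve y^2 = x^3 + a x + b

def ec_add(P, Q):
    """Affine point addition on the toy curve; None denotes O."""
    if P is None: return Q
    if Q is None: return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None                      # Q = -P, so P + Q = O
    if P == Q:
        lam = (3 * x1 * x1 + a) * pow(2 * y1, -1, p) % p
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, p) % p
    x3 = (lam * lam - x1 - x2) % p
    return (x3, (lam * (x1 - x3) - y1) % p)

def line_through(P, Q, S):
    """Evaluate at S the line l1 through P and Q (tangent if P == Q)."""
    (x1, y1), (xs, ys) = P, S
    if P != Q and P[0] == Q[0]:
        return (xs - x1) % p             # vertical line
    if P == Q:
        lam = (3 * x1 * x1 + a) * pow(2 * y1, -1, p) % p
    else:
        lam = (Q[1] - y1) * pow(Q[0] - x1, -1, p) % p
    return (ys - y1 - lam * (xs - x1)) % p

def algorithm_A(P1, f1, P2, f2, S):
    """From (P1, f1(S)) and (P2, f2(S)) compute (P3, f3(S))."""
    if P1 is None: return P2, f1 * f2 % p
    if P2 is None: return P1, f1 * f2 % p
    P3 = ec_add(P1, P2)
    l1 = line_through(P1, P2, S)
    if P3 is None:                       # P' = O: take l2 := 1
        return None, f1 * f2 * l1 % p
    l2 = (S[0] - P3[0]) % p              # vertical line through P3, -P3
    return P3, f1 * f2 * l1 * pow(l2, -1, p) % p
```

Repeating this step along an addition chain, as described above, accumulates the value f(S) of the rational function of a principal divisor.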
Before we finish the description of the MOV reduction, some comments are in order. First note that if f1, f2 ∈ K(E) and P1, P2 ∈ E(K) for a field K, then both l1 and l2 are in K(E) and the computation of f3 and P3 can be carried out by working in K only.
Second, consider the (general) case P3 ≠ O. Since Div(f3) = [P1] + [P2] – [P3] – [O] + Div(f1f2), the rational function f3 has poles and is, therefore, undefined only at the points P3 and O (and at the poles of f1 and f2). f3 is certainly defined at –P3, but l2(–P3) = 0 and, therefore, evaluating f3(–P3) as (f1f2l1)(–P3)/l2(–P3) fails. Of course, there is a rational function g such that both f1f2l1g and l2g are defined and non-zero at –P3, but finding such a rational function is an added headache. So we choose to continue with the representation f3 = f1f2l1/l2 and agree not to evaluate f3 at –P3. Recall from Equation (4.11) that we want to evaluate fP at DR (that is, at R + V and V) and also fR at DP (that is, at P + U and U). Let us assume that we use the addition chain 1 = a1, a2, . . . , at = m for m. This means that we cannot evaluate fP at the points ±ai(P + U) and ±aiU for all i = 1, . . . , t. Therefore, V should be chosen such that neither R + V nor V is one of these points. Similar constraints dictate the choice of U. However, if m is sufficiently large (m ≥ 1024) and if we choose an addition chain of length t ≤ 2 ⌈lg m⌉, then it can be easily seen that for a random choice of (U, V) the evaluation of fP(DR) or fR(DP) fails with a probability of no more than 1/2. Therefore, a few random choices of (U, V) are expected to make the algorithm work. This is the only place where a probabilistic behaviour of the algorithm creeps in. In practice, however, this is not a serious problem, since we have much larger values of m (than 1024) and accordingly the above probability of failure becomes negligibly small.
Finally, note that if we multiply the factors f1, f2 and l1 in the numerator, then the coefficients of the numerator grow very rapidly, when the algorithm is applied repeatedly. Thus we prefer to keep the numerator in the factored form. The same applies to the denominator as well.
The SmartASS method, named after its inventors Smart [282], Satoh and Araki [257], and Semaev [265], is also called the anomalous attack on the ECDLP, since it is applicable to anomalous elliptic curves. Let Fp be a finite field of odd prime cardinality p and E an elliptic curve defined over Fp. We assume that E is anomalous: that is, the trace of Frobenius of E at p is 1; that is, #E(Fp) = p. Since p is prime, the group E(Fp) is cyclic and, in particular, isomorphic to the additive group (Fp, +). This isomorphism is effectively exploited by the SmartASS method to give a polynomial-time algorithm for computing the ECDLP in the group E(Fp).
Before proceeding further, we introduce some auxiliary results. Recall (Exercise 2.133) that a local PID is called a discrete valuation ring (DVR). We now see an equivalent definition of a DVR, one that justifies its name.
|
A discrete valuation on a field K is a surjective group homomorphism v : K* → Z such that for every a, b ∈ K* with a + b ≠ 0 one has v(a + b) ≥ min(v(a), v(b)). The set Rv := {a ∈ K* | v(a) ≥ 0} ∪ {0} is a ring called the valuation ring of v. |
A DVR can be characterized as follows:
|
Let R be an integral domain and let K := Q(R) be the field of fractions of R. Then R is a DVR if and only if there exists a discrete valuation v on K such that R is the valuation ring of v. Proof [if] One checks that the valuation ring of v is a local PID whose maximal ideal {a ∈ Rv | v(a) > 0} is generated by any element π with v(π) = 1 (a uniformizer). [only if] Conversely, if R is a DVR with maximal ideal generated by π, then every non-zero γ ∈ K can be written uniquely as γ = uπδ with u a unit of R and δ ∈ Z, and setting v(γ) := δ defines a discrete valuation on K with valuation ring R. |
Recall that the ring Zp of p-adic integers (Definition 2.111) is a DVR. The field Qp of fractions of Zp is called the field of p-adic numbers. We now explicitly describe a valuation v on Qp of which Zp is the valuation ring. Let the p-adic expansion (Exercises 2.144 and 2.145) of a p-adic integer α be
Equation 4.12
α = k0 + k1p + k2p2 + ···, where 0 ≤ ki ≤ p – 1.
A rational integer can be naturally viewed as a p-adic integer with finitely many non-zero terms, that is, one for which ki = 0 for all but finitely many i. However, a p-adic integer with infinitely many non-zero ki does not correspond to a rational integer. If in Expansion (4.12) we have k0 = k1 = ··· = kr–1 = 0, we can write
α = pr(kr + kr+1p + kr+2p2 + ···).
A p-adic integer is, in general, an infinite series, and a representation with finite precision looks like
k0 + k1p + k2p2 + ··· + ksps + O(ps+1).
Arithmetic on p-adic numbers is done like arithmetic on integers written in base p, but from left to right (that is, starting from the digit k0). Thus, for example, if one wants to add two p-adic integers k0 + k1p + k2p2 + ··· and k0′ + k1′p + k2′p2 + ···, one may add the base-p integers ··· k2k1k0 and ··· k2′k1′k0′ in the usual manner till the desired level of precision. A p-adic integer α = k0 + k1p + k2p2 + ··· is invertible (in Zp) if and only if k0 ≠ 0 (Proposition 2.52).
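The left-to-right digit arithmetic can be sketched as follows, with a p-adic integer truncated to a fixed precision and stored as a digit list (a representation chosen here purely for illustration):

```python
# Digit-wise p-adic addition at finite precision.  A list [k0, k1, ..., ks]
# stands for k0 + k1*p + ... + ks*p^s + O(p^(s+1)).

def padic_add(x, y, p):
    """Add two truncated p-adic integers of the same precision."""
    assert len(x) == len(y)
    out, carry = [], 0
    for a, b in zip(x, y):           # process digits from k0 upward
        s = a + b + carry
        out.append(s % p)
        carry = s // p
    return out                       # any final carry falls in O(p^(s+1))

def padic_invertible(x, p):
    """x = k0 + k1 p + ... is a unit in Zp iff k0 != 0 (Proposition 2.52)."""
    return x[0] % p != 0
```

For example, with p = 5 the integers 7 = 2 + 1·5 and 8 = 3 + 1·5 add digit by digit, with a carry out of the first position, to 15 = 0 + 3·5.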
An element β ∈ Qp also has a p-adic expansion, but in this case one has to allow terms involving a finite number of negative exponents of p. That is to say, we have an expansion of the form
β = k–tp–t + k–t+1p–t+1 + ··· + k–1p–1 + k0 + k1p + k2p2 + ···
or
β = p–t(k–t + k–t+1p + ··· + k–1pt–1 + k0pt + k1pt+1 + k2pt+2 + ···).
Of course, if k–t = k–t+1 = ··· = k–1 = 0, then β is already in Zp.
From the arguments above, it follows that any non-zero γ ∈ Qp can be written uniquely as γ = pδ(γ0 + γ1p + γ2p2 + ···) with δ ∈ Z, 0 ≤ γi ≤ p – 1 and γ0 ≠ 0. We then set v(γ) := δ. It is easy to see that v defines a discrete valuation on Qp of which Zp is the valuation ring. Moreover, since γ0 + γ1p + γ2p2 + ··· is a unit in Zp, the element p = 0 + 1 · p + 0 · p2 + ··· plays the role of a uniformizer of the DVR Zp. As usual, we write v(0) = +∞.
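Restricted to the rational numbers sitting inside Qp, this valuation can be computed directly (vp_int and vp_rat are illustrative helper names): for γ = m/n one has v(γ) = vp(m) – vp(n), where vp counts the factors of p.

```python
# The p-adic valuation v on nonzero rationals m/n inside Qp.

def vp_int(m, p):
    """Largest delta with p^delta dividing the non-zero integer m."""
    delta = 0
    while m % p == 0:
        m //= p
        delta += 1
    return delta

def vp_rat(m, n, p):
    """v(m/n) for non-zero integers m, n; v(0) = +infinity by convention."""
    return vp_int(m, p) - vp_int(n, p)
```

Note that v(γγ′) = v(γ) + v(γ′), as a group homomorphism to (Z, +) must satisfy.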
Now, back to our ECDLP business. Let E be an elliptic curve defined over Fp. Here we consider the case that E is anomalous. We can naturally think of E as a curve over the field Qp as well and denote this curve by ε. The coordinate-wise application of the canonical surjection Zp → Fp induces the reduction homomorphism ε(Qp) → E(Fp). Now, we define the following subgroups of ε(Qp):

It can be shown that ε1(Qp) is a subgroup of ε(Qp) and ε2(Qp) is a subgroup of ε1(Qp). Furthermore, since E is anomalous, we have

Now, let P ∈ E(Fp) and Q a point in the subgroup of E(Fp) generated by P. Our purpose is to find an integer l such that Q = lP. Let P̃, Q̃ ∈ ε(Qp) be such that P̃ reduces to P and Q̃ reduces to Q. It is not difficult to find such points P̃ and Q̃. For example, if P = (a, b), we can take P̃ = (a, b0 + b1p + b2p2 + ···), where b0 = b and b1, b2, . . . are successively obtained by Hensel lifting.
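The successive digits b1, b2, . . . come from the familiar Newton-style Hensel step. A minimal sketch for the quadratic case relevant here (lifting a square root of c := a3 + Aa + B, assuming an odd prime p and 2b invertible; all numeric values in the test are illustrative):

```python
# Hensel lifting of a square root: from b with b^2 ≡ c (mod p) to a
# root modulo p^k, via the Newton step y <- y - (y^2 - c) * (2y)^(-1).

def hensel_sqrt(b, c, p, k):
    """Lift b, b^2 ≡ c (mod p), to y with y^2 ≡ c (mod p^k)."""
    target = p ** k
    mod = p
    y = b % p
    while mod < target:
        mod = min(mod * mod, target)   # precision doubles at each step
        y = (y - (y * y - c) * pow(2 * y, -1, mod)) % mod
    return y
```

Each step doubles the p-adic precision, so O(lg k) iterations produce the truncation of the p-adic y-coordinate to precision p^k.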
Since Q – lP = O, the point Q̃ – lP̃ lies in ε1(Qp) and, therefore, p(Q̃ – lP̃) lies in ε2(Qp). Now, if we take the so-called p-adic elliptic logarithm ψp on both sides, we get ψp(pQ̃) ≡ lψp(pP̃) (mod p2), whence it follows that
l ≡ ψp(pQ̃)/ψp(pP̃) (mod p),
provided that ψp(pP̃) is invertible modulo p. The function ψp can be easily calculated. Therefore, this gives a very efficient probabilistic algorithm for computing discrete logarithms over anomalous elliptic curves. Here the most time-consuming step is the linear-time computation of the points pP̃ and pQ̃. For further details on the algorithm (like the computation of P̃ and Q̃ from P and Q, and the definition of p-adic elliptic logarithms), see Blake et al. [24] and Silverman [275].
Joseph Silverman’s xedni calculus method (XCM) is a recent algorithm for solving the ECDLP in an arbitrary elliptic curve over a finite field. The algorithm is based on some deep mathematical conjectures and heuristic ideas. However, its performance has been experimentally established to be poor. Here we give a sketchy description of the XCM. For simplicity, we concentrate on elliptic curves over prime fields Fp only.
The basic idea of the XCM is to lift an elliptic curve E over Fp to a curve ε over Q. In view of this, we start with a couple of important results regarding elliptic curves over Q (or, more generally, over a number field). See Silverman [275], for example, for the proofs.
Let ε be an elliptic curve defined over a number field K.
|
The group ε(K) is finitely generated. The group structure of ε(K) is made explicit by the next theorem. Note that the elements of ε(K) of finite order form a subgroup εtors(K) of ε(K), called the torsion subgroup of ε(K) (Exercise 4.26). |
|
The non-negative integer ρ of Theorem 4.4 is called the rank of ε(K). Now, let E be an elliptic curve defined over a prime field Fp. The basic idea of the XCM is to select r points
|
We start by fixing an integer r, 4 ≤ r ≤ 9. We then choose r random pairs (si, ti) of integers and compute the points
Rp,i := siP – tiQ, i = 1, . . . , r.
We now apply a change of coordinates of the form
Equation 4.13
[X, Y, Z] ↦ [X, Y, Z]M, M an invertible 3 × 3 matrix,
so that the first four of the points Rp,i become Rp,1 = [1, 0, 0], Rp,2 = [0, 1, 0], Rp,3 = [0, 0, 1] and Rp,4 = [1, 1, 1]. This change of coordinates fails if some three of the four points Rp,1, Rp,2, Rp,3 and Rp,4 sum to O. But in that case the desired index indP Q can be computed with high probability. If, for example, Rp,1 + Rp,2 + Rp,3 = O, then we have (s1 + s2 + s3)P = (t1 + t2 + t3)Q and, therefore, if gcd(t1 + t2 + t3, n) = 1, then indP Q ≡ (t1 + t2 + t3)–1(s1 + s2 + s3) (mod n). On the other hand, if gcd(t1 + t2 + t3, n) ≠ 1, we repeat with a different set of pairs (si, ti).
Henceforth, we assume that the change of coordinates, as given in Equation (4.13), is successful. This transforms the equation for E to a general cubic equation:
Cp : up,1X3 + up,2X2Y + up,3XY2 + up,4Y3 + up,5X2Z + up,6XYZ + up,7Y2Z + up,8XZ2 + up,9YZ2 + up,10Z3 = 0.
Now, we carry out a step that heuristically ensures that the curve ε over Q (that we are going to construct) has a small rank. We choose a product M of small primes with p ∤ M, a cubic curve
CM : uM,1X3 + uM,2X2Y + uM,3XY2 + uM,4Y3 + uM,5X2Z + uM,6XYZ + uM,7Y2Z + uM,8XZ2 + uM,9YZ2 + uM,10Z3 ≡ 0 (mod M)
over Z/MZ and points RM,1, . . . , RM,r on CM with coordinates in Z/MZ. The first four points should be RM,1 = [1, 0, 0], RM,2 = [0, 1, 0], RM,3 = [0, 0, 1] and RM,4 = [1, 1, 1]. We have to ensure also that for every prime divisor q of M, the matrix B(RM,1, . . . , RM,r) has maximal rank modulo q. In practice, it is easier to choose the points RM,1, . . . , RM,r first and then compute a curve CM passing through these points by solving a set of linear equations in the coefficients uM,1, . . . , uM,10 of CM. The curve CM should be so chosen that it has the minimum possible number of solutions modulo M. This, in conjunction with some deep conjectures in the theory of elliptic curves, makes it likely that the curve ε that we will construct shortly has a rank less than the expected value.
We now combine the curves Cp and CM as follows. Using the Chinese remainder theorem, we compute integers ũ1, . . . , ũ10 such that ũi ≡ up,i (mod p) and ũi ≡ uM,i (mod M) for each i = 1, . . . , 10. Similarly, we compute points R1, . . . , Rr with integer coordinates such that Ri ≡ Rp,i (mod p) and Ri ≡ RM,i (mod M) for each i = 1, . . . , r, where congruence of points stands for coordinate-wise congruence. Here we have R1 = [1, 0, 0], R2 = [0, 1, 0], R3 = [0, 0, 1] and R4 = [1, 1, 1].
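The CRT step used here can be sketched for a single coefficient (crt2 is an illustrative helper; the text's p and M play the roles of the two coprime moduli):

```python
# Combine u ≡ u_p (mod p) and u ≡ u_M (mod M), gcd(p, M) = 1, into
# the unique residue u modulo p*M.

def crt2(u_p, p, u_M, M):
    """Chinese remainder combination of two coprime congruences."""
    # Write u = u_p + p*t; then p*t ≡ u_M - u_p (mod M).
    t = (u_M - u_p) * pow(p, -1, M) % M
    return (u_p + p * t) % (p * M)
```

Applying this coefficient-wise (and coordinate-wise for the points) produces the combined lift modulo pM.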
Clearly, the points R1, . . . , Rr are lifts of the points Rp,1, . . . , Rp,r respectively, whereas the cubic curve
C̃ : ũ1X3 + ũ2X2Y + ũ3XY2 + ũ4Y3 + ũ5X2Z + ũ6XYZ + ũ7Y2Z + ũ8XZ2 + ũ9YZ2 + ũ10Z3 = 0
over Q is a lift of E. However, C̃, treated as a curve over Q, need not pass through the points R1, . . . , Rr. In order to ensure this last condition, we modify the coefficients ũ1, . . . , ũ10 of C̃ to the (small integer) coefficients u1, . . . , u10 by solving the system of linear equations
u1Xi3 + u2Xi2Yi + u3XiYi2 + u4Yi3 + u5Xi2Zi + u6XiYiZi + u7Yi2Zi + u8XiZi2 + u9YiZi2 + u10Zi3 = 0, i = 1, . . . , r (where Ri = [Xi, Yi, Zi]),
subject to the condition that ui ≡ ũi (mod pM) for each i = 1, . . . , 10. The resulting cubic curve
C : u1X3 + u2X2Y + u3XY2 + u4Y3 + u5X2Z + u6XYZ + u7Y2Z + u8XZ2 + u9YZ2 + u10Z3 = 0
over Q evidently continues to be a lift of E.
Now, we apply a change of coordinates in order to transform C to the standard Weierstrass equation
ε : Y2 + a1XY + a3Y = X3 + a2X2 + a4X + a6
with integer coefficients ai. This transformation takes the points R1, . . . , Rr to points S1, . . . , Sr on ε. One should also ensure that the cubic C is non-singular, so that ε is indeed an elliptic curve.
Finally, we check whether S2, . . . , Sr are linearly dependent in ε(Q). If so, we determine a (non-trivial) relation n2S2 + ··· + nrSr = O with n2, . . . , nr ∈ Z. This corresponds to the relation n1Rp,1 + n2Rp,2 + ··· + nrRp,r = O, where n1 := –(n2 + ··· + nr), that is, sP = tQ with s := n1s1 + ··· + nrsr and t := n1t1 + ··· + nrtr. If gcd(t, n) = 1, we have indP Q ≡ t–1s (mod n).
On the other hand, if S2, . . . , Sr are linearly independent or if gcd(t, n) > 1, then the lifted data fail to compute indP Q. In that case, we repeat the entire process by selecting new pairs (si, ti) and/or new points RM,1, . . . , RM,r.
This completes our description of the XCM. See Silverman [277] for further details. No rigorous or heuristic analysis of the running time of the XCM is available in the literature. Practical experience (reported in Jacobson et al. [139]) shows that the algorithm is rather impractical. The predominant cause for failure of a trial of the XCM is that the probability that the points S2, . . . , Sr are linearly dependent is amazingly low. Suitable choices of the curve CM help us to construct curves ε of low rank, but not low enough, in general, to render S2, . . . , Sr linearly dependent. Larger values of r are expected to increase the probability of success in each trial, but it is not clear how to handle the values r > 9. Nevertheless, the XCM is a radically new idea to solve the ECDLP. As Joseph Silverman [277] says, “some of the ideas may prove useful in future work on ECDLP”.
| 4.24 | Let K be a field and let μm := {x ∈ K̄ | xm = 1}, where K̄ denotes an algebraic closure of K. Elements of μm are called the m-th roots of unity. Prove the following assertions.
|
| 4.25 | We use the notations of the last exercise and assume that #μm = m, that is, either char K = 0 or p := char K > 0 is coprime to m. In this case, a generator of μm is called a primitive m-th root of unity. If ω ∈ μm is a primitive m-th root of unity and ωr = 1 for some r ∈ N, then evidently m|r. In particular, m is the smallest positive exponent r for which ωr = 1. The (monic) polynomial
Φm(X) := Π (X – ω),
where the product runs over all primitive m-th roots of unity ω, is called the m-th cyclotomic polynomial (over K). Clearly, deg Φm(X) = φ(m) (where φ is Euler’s totient function).
|
| 4.26 |
|
The hyperelliptic curve discrete logarithm problem (HECDLP) has attracted less research attention than the ECDLP. Surprisingly, however, there exist subexponential (index calculus) algorithms for solving the HECDLP over curves of large genus. Adleman, DeMarrais and Huang first proposed such an algorithm [2] (which we will refer to as the ADH algorithm). Enge [86] suggested some modifications of the ADH algorithm and provided a rigorous analysis of its running time. Gaudry [105] simplified the ADH algorithm and even implemented it. Gaudry’s experimentation suggests that it is feasible to compute discrete logarithms in Jacobians of almost cryptographic sizes, provided that the genus of the underlying curve is high (say, ≥ 6). Enge and Gaudry [87] proved rigorously that as long as the genus g is greater than ln q (Fq being the field over which the curve is defined), the ADH algorithm (and its improvements) runs in time L(qg, 1/2, c) for a constant c.
In what follows, we outline Gaudry’s version of the ADH algorithm and refer to it as the ADH–Gaudry algorithm. Let C : Y2 + u(X)Y = v(X) be a hyperelliptic curve of genus g defined over a finite field Fq. We assume that the cardinality of the Jacobian J of C over Fq is known and has a suitably large prime divisor m. We assume further that a reduced divisor α ∈ J of order m is available, and we want to compute the discrete logarithm indα β of an element β ∈ 〈α〉 with respect to α.
Recall that every reduced divisor D ∈ J can be written uniquely as D = [P1] + ··· + [Pl] – l[∞], l ≤ g, where for i ≠ j the points Pi and Pj are not opposite of each other. Only ordinary points (not special points) may appear more than once in the list P1, . . . , Pl. We also know that such a divisor can be represented by a unique pair of polynomials a, b ∈ Fq[X] satisfying deg b < deg a ≤ g and a|(b2 + bu – v). In that case, we write D = Div(a, b). What interests us is the fact that the roots of the polynomial a are precisely the X-coordinates of the points P1, . . . , Pl. This fact leads to the very useful concepts of prime divisors and smooth divisors.
|
A divisor For an arbitrary divisor |
In order to set up a factor base B, we predetermine a smoothness bound δ and let B consist of all the prime divisors Div(a, b) with deg a ≤ δ. For simplicity, we take δ = 1. This is indeed a practical choice, when the genus g is not too large (say, g ≤ 9). Let a = X – h ∈ Fq[X] be an (irreducible) polynomial of degree 1. In order to find out b ∈ Fq[X] such that Div(a, b) is a prime divisor, we first see that deg b < deg a, that is, b ∈ Fq is a constant. Furthermore, a|(b2 + bu – v): that is, b2 + bu – v ≡ 0 (mod X – h); that is, b2 + bu(h) – v(h) = 0. Thus, the desired values of b, if existent, can be found by solving a quadratic equation over Fq. There are q (irreducible) monic polynomials a = X – h of degree 1 and for each such a there are either two or no solutions for b. Assuming that both these possibilities are equally likely, we conclude that the size of the factor base is ≈ q.
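This factor-base construction can be sketched as follows, under the illustrative assumptions that q = p is an odd prime with p ≡ 3 (mod 4) (so that square roots cost one exponentiation) and that the curve is given by callables u and v; the genus-2 curve Y2 = X5 + 1 over F7 in the test is a toy example:

```python
# Build the degree-1 factor base for C : Y^2 + u(X) Y = v(X) over GF(p),
# p ≡ 3 (mod 4): for each h, solve b^2 + b*u(h) - v(h) = 0 in GF(p).

def factor_base(p, u, v):
    assert p % 4 == 3
    base = []
    inv2 = pow(2, -1, p)
    for h in range(p):
        uh, vh = u(h) % p, v(h) % p
        disc = (uh * uh + 4 * vh) % p        # discriminant of the quadratic
        if pow(disc, (p - 1) // 2, p) == p - 1:
            continue                         # non-residue: no prime divisor at h
        r = pow(disc, (p + 1) // 4, p)       # a square root of disc
        for s in sorted({r, (p - r) % p}):
            b = (-uh + s) * inv2 % p
            base.append((h, b))              # the prime divisor Div(X - h, b)
    return base
```

Roughly half the values of h contribute two divisors and half contribute none, which matches the size estimate ≈ q for the factor base.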
In order to check the smoothness of a divisor D = Div(a, b) over the factor base B, we first factor a over Fq. Under the assumption that δ = 1, the divisor D is smooth if and only if a splits completely over Fq. Let us write a(X) = (X – h1) ··· (X – hl), hi ∈ Fq. Then for some k1, . . . , kl ∈ Fq we have D = D1 + ··· + Dl, where Di := Div(X – hi, ki). We may use trial divisions (that is, trial subtractions in this additive setting) by elements of B in order to determine the prime divisors D1, . . . , Dl of D. Proposition 4.5 establishes the probability that a randomly chosen element of J is smooth.
|
For q ≫ 4g2, there are approximately qg/g! (1-)smooth divisors in The assumption q ≫ 4g2 is practical, since we usually employ curves of (fixed) small genus g over finite fields |
Now, we have all the machinery required to describe the basic version of the index calculus method for computing indα β in J. In the first stage, we choose a random integer j, compute the (reduced) divisor jα, and check if jα is smooth over the factor base B. Every smooth jα gives a relation: that is, a linear congruence modulo m involving the (unknown) indices of the elements of B to the base α. After sufficiently many (say, ≥ 2(#B)) such relations are found, the system of linear congruences collected is expected to be of full rank and is solved modulo m. This gives us the indices of the elements of the factor base. Each congruence collected above contains at most g non-zero coefficients and so the system is necessarily sparse. In the second stage, we find a single random j for which β + jα is smooth. The database prepared in the first stage then immediately gives indα β.
The Hasse–Weil bounds (3.8) on page 226 show that the cardinality of J is approximately qg. Thus, O(g log q) bits are needed to represent an element of J. This fact is consistent with the representation of reduced divisors by pairs of polynomials. Gaudry [105] calculates that this variant of the ICM performs O(q2 + g!q) operations, each of which takes time polynomial in the input size g log q. If g is considered to be constant, the running time becomes O(q2 logt q) (that is, O~(q2)) for some real t > 0. A square root method on J runs in (expected) time O~(qg/2). Thus, for g > 4 the index calculus method performs better than the square root methods. Indeed, Gaudry’s implementation of this algorithm is capable of computing in a few days discrete logarithms in the curve of genus 6 mentioned above. The Jacobian of this curve has cardinality ≈ 1040.
For cryptographic purposes, we should have qg sufficiently large. If we want to take q small (so that multi-precision arithmetic can be avoided), we should choose large values of g. But this choice makes the ADH–Gaudry algorithm quite efficient. For achieving the desired level of security in cryptographic applications, hyperelliptic curves of genus 2, 3 and 4 only are recommended.
So far we have seen many algorithms which require solving large systems of linear equations (or congruences). The number n of unknowns in such systems can be as large as several millions. Standard Gaussian elimination on such a system takes time O(n3) and space O(n2). There are asymptotically faster algorithms like Strassen’s method [292], which takes time O(n2.807), and Coppersmith and Winograd’s method [60], with a running time of O(n2.376). Unfortunately, these asymptotic gains do not show up in the range of practical interest. Moreover, the space requirements of these asymptotically faster methods are prohibitively high (though still O(n2)).
Luckily enough, cryptanalytic algorithms usually deal with coefficient matrices that are sparse: that is, that have only a small number of non-zero entries in each row. For example, consider the system of linear congruences available from the relation-collection stage of an ICM for solving the DLP over a finite field Fq. The factor base consists of a subexponential (in lg q) number of elements, whereas each relation involves at most O(lg q) non-zero coefficients. Furthermore, the sparsity of the resulting matrix A is somewhat structured in the sense that the columns of A corresponding to larger primes in the factor base tend to have fewer non-zero entries. In this regard, we refer to the interesting analysis by Odlyzko [225] in connection with the Coppersmith method (Section 4.4.4). Odlyzko took m = 2n equations in n unknown indices and showed that about n/4 columns of A are expected to contain only zero coefficients, implying that these variables never occurred in any relation collected. Moreover, about 0.346n columns of A are expected to have only a single non-zero coefficient.
The sparsity (as well as the structure of the sparsity) of the coefficient matrix A can be effectively exploited and the system can be solved in time O~(n2). In this section, we describe some special algorithms for large sparse linear systems. In what follows, we assume that we want to compute the unknown n-dimensional column vector x from the given system of equations
Ax = b,
where A is an m × n matrix, m ≥ n, and where b is a non-zero m-dimensional column vector. Though this is not the case in general, we will often assume for the sake of simplicity that A has full rank (that is, n). We write vectors as column vectors, that is, an l-dimensional vector v with elements v1, . . . , vl is written as v = (v1 v2 . . . vl)t, where the superscript t denotes matrix transpose.
Before we proceed further, some comments are in order. First note that our system of equations is often one over the finite ring Zr, which is not necessarily a field. Most of the methods we describe below assume that Zr is a field, that is, that r is a prime. If r is composite, we can do the following. First, assume that the prime factorization r = p1α1 ··· psαs, αi > 0, of r is known. In that case, we first solve the system over the fields Zpi for i = 1, . . . , s. Then for each i we lift the solution modulo pi to a solution modulo piαi. Finally, all these lifted solutions are combined using the CRT to get the solution modulo r.
Hensel lifting can be used to lift a solution of the system Ax ≡ b (mod p) to a solution of Ax ≡ b (mod pα), where p is a prime and α ∈ N. We proceed by induction on α. Let us denote the (or a) solution of Ax ≡ b (mod p) by x1, which can be computed by solving a system in the field Zp. Now, assume that for some i ≥ 1 we know (integer) vectors x1, . . . , xi such that
Equation 4.14
A(x1 + px2 + ··· + pi–1xi) ≡ b (mod pi).
We then attempt to compute a vector xi+1 such that
Equation 4.15
A(x1 + px2 + ··· + pi–1xi + pixi+1) ≡ b (mod pi+1).
Congruence (4.14) shows that the elements of A, x1, . . . , xi, b can be so chosen (as integers) that for some integer vector yi we have the equality
A(x1 + px2 + ··· + pi–1xi) = b – piyi
in Z. Substituting this in Congruence (4.15) gives Axi+1 ≡ yi (mod p). Thus, the (incremental) vector xi+1 can be obtained by solving a linear system in Zp.
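The induction above can be sketched directly; the helper solve_mod_p below is an illustrative stand-in (brute force over tiny systems) for any solver over GF(p), and the 2 × 2 matrix in the test is a toy assumption:

```python
# Lift a solution of A x ≡ b (mod p) to A x ≡ b (mod p^alpha),
# recomputing one correction digit x_{i+1} per step as in (4.14)-(4.15).
from itertools import product

def matvec(A, x):
    return [sum(a * v for a, v in zip(row, x)) for row in A]

def solve_mod_p(A, y, p):
    """Brute-force solver over GF(p); fine only for tiny toy systems."""
    for x in product(range(p), repeat=len(A[0])):
        if all(s % p == yy % p for s, yy in zip(matvec(A, x), y)):
            return list(x)
    return None

def hensel_solve(A, b, p, alpha):
    """Return x with A x ≡ b (mod p^alpha), built digit by digit."""
    x = solve_mod_p(A, b, p)
    for i in range(1, alpha):
        # residual y_i = (b - A x) / p^i is integral by the induction
        y = [(bb - s) // p**i for bb, s in zip(b, matvec(A, x))]
        xi = solve_mod_p(A, y, p)          # A x_{i+1} ≡ y_i (mod p)
        x = [(v + p**i * w) % p**(i + 1) for v, w in zip(x, xi)]
    return x
```

Each lifting step costs one solve modulo p, so the whole lift needs α solves over the prime field.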
It, therefore, suffices to know how to solve linear congruences modulo a prime p. However, problems arise when we do not know the factorization of r (while solving Ax ≡ b (mod r)). If r is large, it would be a heavy investment to attempt to factor r. What can be done instead is the following. First, we use trial divisions to extract the small prime factors of r. We may, therefore, assume that r has no small prime factors. We proceed to solve Ax ≡ b (mod r) assuming that r is a prime (that is, that Zr is a field). In a field, every non-zero element is invertible. But if r is composite, there are non-zero elements a ∈ Zr which are not invertible (that is, for which gcd(a, r) > 1). If, during the course of the computation, we never happen to meet (and try to invert) such non-zero non-invertible elements, then the computation terminates without any trouble. Otherwise, such an element a yields a non-trivial factor gcd(a, r) of r. In that case, we have a partial factorization of r and restart solving the system modulo each suitable factor of r.
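The "pretend r is prime" inversion step can be sketched as a small guarded helper (try_invert is an illustrative name): either the inverse exists, or the failed inversion hands us a non-trivial factor of r for free.

```python
# Attempt an inversion modulo a possibly composite r.  A failure is
# not wasted: gcd(a, r) > 1 is a non-trivial factor of r.
from math import gcd

def try_invert(a, r):
    """Return ('inv', a^-1 mod r) if gcd(a, r) = 1, else ('factor', g)."""
    g = gcd(a, r)
    if g == 1:
        return ('inv', pow(a, -1, r))
    return ('factor', g)
```

On the 'factor' outcome one splits r and restarts the linear solve modulo each factor, exactly as described above.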
Some of the algorithms we discuss below assume that A is a symmetric matrix. In our case, this is usually not the case. Indeed we have matrices A which are not even square. Both these problems can be overcome by trying to solve the modified system AtAx = At b. If A has full rank, this leads to an equivalent system.
If r = 2 (as in the case of the QSM for factoring integers), using the special methods is often not recommended. In this case, the elements of A are bits and can be packed compactly in machine words, and addition of rows can be done word-wise (say, 32 bits at a time). This leads to an efficient implementation of ordinary Gaussian elimination, which usually runs faster than the more complicated special algorithms described below, at least for the sizes of practical systems.
In what follows, we discuss some well-known methods for solving large sparse linear systems over finite fields (typically prime fields). In order to simplify notations, we will refrain from writing the matrix equalities as congruences, but treat them as equations over the underlying finite fields.
Structured Gaussian elimination is applied to a sparse system before one of the next three methods is employed to solve the system. If the sparsity of A has some structures (as discussed earlier), then structured Gaussian elimination tends to reduce the size of the system considerably, while maintaining the sparsity of the system. We now describe the essential steps of structured Gaussian elimination. Let us define the weight of a row or column of a matrix to be the number of non-zero entries in that row or column.
First we delete all the columns (together with the corresponding variables) that have weight 0. These variables never occur in the system and need not be considered at all.
Next we delete all the columns that have weight 1 and the rows corresponding to the non-zero entries in these columns. Each such deleted column corresponds to a variable xi that appears in exactly one equation. After the rest of the system is solved, the value of xi is obtained by back substitution. Deleting some rows in the matrix in this step may expose some new columns of weight 1. So this step should be repeated, until all the columns have weight > 1.
Now, consider each row of weight 1. Such a row gives a direct solution for the variable xi corresponding to its non-zero entry. We then substitute this value of xi in all the equations where it occurs and subsequently delete the i-th column. We repeat this step, until all rows are of weight > 1.
At this point, the system usually has many more equations than variables. We may make the system square by throwing away some rows. Since subtracting multiples of rows of higher weights tends to increase the number of non-zero elements in the matrix, we should throw away the rows with higher weights. While discarding the excess rows, we should be careful to ensure that we are not left with a matrix having columns of weight 0. Some columns in the reduced system may again happen to have weight 1. Thus, we have to repeat the above steps, again and again, until we are left with a square matrix each row and column of which has weight ≥ 2.
This procedure leads to a system which is usually much smaller than the original system. In a typical example quoted in Odlyzko [225], structured Gaussian elimination reduces a system with 16,500 unknowns to one with less than 1,000 unknowns. The resulting reduced system may be solved using ordinary Gaussian elimination which, for smaller systems, appears to be much faster than the following sophisticated methods.
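The first two pruning steps above (dropping weight-0 columns, and weight-1 columns together with their unique rows, until stable) can be sketched on a 0/1 incidence structure; representing rows as sets of column indices is an illustrative choice that matches the sparsity assumption:

```python
# Structured-Gaussian-elimination pruning: repeatedly delete columns of
# weight 0, and columns of weight 1 together with their single row
# (those variables are recovered later by back substitution).

def prune(rows, ncols):
    """rows: list of sets of column indices.  Return surviving rows, cols."""
    cols = set(range(ncols))
    changed = True
    while changed:
        changed = False
        weight = {c: 0 for c in cols}
        for r in rows:
            for c in r & cols:
                weight[c] += 1
        zero = {c for c in cols if weight[c] == 0}     # never occur
        light = {c for c in cols if weight[c] == 1}    # occur exactly once
        if zero or light:
            changed = True
            cols -= zero | light
            rows = [r for r in rows if not (r & light)]
    return rows, cols
```

Deleting a row can expose new light columns, hence the outer loop; the surviving core is what gets handed to the heavier solvers below.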
The conjugate gradient method was originally proposed to solve a linear system Ax = b over R for an n × n (that is, square) symmetric positive definite matrix A and for a non-zero vector b, and is based on the idea of minimizing the quadratic function f(x) := (1/2)xtAx – xtb. The minimum is attained when the gradient ∇f = Ax – b equals zero, which corresponds to the solution of the given system.
The conjugate gradient method is an iterative procedure. The iterations start with an initial minimizer x0, which can be any n-dimensional vector. As the iterations proceed, we obtain gradually improved minimizers x0, x1, x2, . . . , until we reach the solution. We also maintain and update two other sequences of vectors ei and di. The vector ei stands for the error b – Axi, whereas the vectors d0, d1, . . . constitute a set of mutually conjugate (that is, orthogonal with respect to A) directions. We initialize e0 = d0 = b – Ax0 and for i = 0, 1, . . . repeat the steps of Algorithm 4.8, until ei = 0. We denote the inner product of two vectors v = (v1 v2 . . . vn)t and w = (w1 w2 . . . wn)t by 〈v, w〉 := vtw = v1w1 + ··· + vnwn.
Algorithm 4.8 One iteration of the conjugate gradient method
Steps:
    ai := 〈ei, ei〉/〈di, Adi〉.
    xi+1 := xi + aidi.
    ei+1 := ei − aiAdi.
    bi := 〈ei+1, ei+1〉/〈ei, ei〉.
    di+1 := ei+1 + bidi.
This method computes a set of mutually orthogonal directions d0, d1, . . . , and hence it has to stop after at most n – 1 iterations, since we run out of new orthogonal directions after n – 1 iterations. Provided that we work with infinite precision, we must eventually obtain ei = 0 for some i, 0 ≤ i ≤ n – 1.
If A is sparse, that is, if each row of A has O(log^c n) non-zero entries for some positive constant c, then the product Adi can be computed using O~(n) field operations. The other operations clearly meet this bound. Since at most n − 1 iterations are necessary, the conjugate gradient method terminates after performing O~(n^2) field operations.
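The iteration of Algorithm 4.8 can be sketched as follows, over the reals for concreteness; the helper names (inner, matvec, conjugate_gradient) and the floating-point tolerance are our own illustrative choices.

```python
# A minimal conjugate gradient solver following Algorithm 4.8, assuming a
# symmetric positive definite matrix A given as a list of rows.

def inner(v, w):
    return sum(x * y for x, y in zip(v, w))

def matvec(A, v):
    return [inner(row, v) for row in A]

def conjugate_gradient(A, b, tol=1e-12):
    n = len(b)
    x = [0.0] * n                                       # initial minimizer x0
    e = [bi - ti for bi, ti in zip(b, matvec(A, x))]    # e0 = b - A x0
    d = e[:]                                            # d0 = e0
    for _ in range(n):
        Ad = matvec(A, d)
        a = inner(e, e) / inner(d, Ad)
        x = [xi + a * di for xi, di in zip(x, d)]
        e_new = [ei - a * adi for ei, adi in zip(e, Ad)]
        if inner(e_new, e_new) < tol:                   # error (numerically) zero
            break
        bc = inner(e_new, e_new) / inner(e, e)
        d = [ei + bc * di for ei, di in zip(e_new, d)]
        e = e_new
    return x

A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = conjugate_gradient(A, b)        # x is approximately [1/11, 7/11]
```

In exact arithmetic the loop needs at most n passes, in agreement with the iteration count discussed above.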
We face some potential problems when we want to apply this method to solve a system over a finite field 𝔽_q. First, the matrix A is usually not symmetric and need not even be square. This problem can be avoided by solving the system A^tAx = A^tb. The new coefficient matrix A^tA may be non-sparse (that is, dense). So instead of computing and working with A^tA explicitly, we compute the product (A^tA)di as A^t(Adi), that is, we avoid multiplication by a (possibly) dense matrix at the cost of multiplications by two sparse matrices.
The second difficulty with a finite field 𝔽_q is that the question of minimizing an 𝔽_q-valued function makes hardly any sense (and so does positive definiteness of a matrix over 𝔽_q). However, the conjugate gradient method is essentially based on the generation of a set of mutually orthogonal vectors d0, d1, . . . . This concept continues to make sense in the setting of a finite field.
If A is a real positive definite matrix, we cannot have 〈di, Adi〉 = 0 for a non-zero vector di. But this condition need not hold for a matrix A over 𝔽_q. Similarly, we may have a non-zero error vector ei over 𝔽_q for which 〈ei, ei〉 = 0. (Again, this is not possible for real vectors.) So for the iterations over 𝔽_q (more precisely, the computations of ai and bi) to proceed gracefully, all that we can hope for is that before reaching the solution we hit neither a non-zero direction vector di with 〈di, Adi〉 = 0 nor a non-zero error vector ei with 〈ei, ei〉 = 0. If q is sufficiently large and the initial minimizer x0 is chosen sufficiently randomly, then the probability of encountering such a bad di or ei is rather low, and the method is very likely to terminate without problems. If, by a terrible stroke of bad luck, we have to abort the computation prematurely, we restart the procedure with a new random initial vector x0. If q is small (say, q = 2, as in the case of the QSM), it is a neater idea to select the entries of the initial vector x0 from an extension field of 𝔽_q and to work in this extension. The eventual solution we reach will be in 𝔽_q, but working in the larger field decreases the possibility of an attempted division by 0.
There is, however, a brighter side to using a finite field 𝔽_q in place of ℝ: every calculation we perform in 𝔽_q is exact, so we need not bother about a criterion for determining whether an error vector ei is zero, nor about the conditioning of the matrix A. One of the biggest headaches of numerical analysis is absent here.
The Lanczos method is another iterative method, quite similar to the conjugate gradient method. The basic difference between the two lies in the way the mutually conjugate directions d0, d1, . . . are generated. For the Lanczos method, we start with the initializations d0 := b, d1 := Ad0 − (〈Ad0, Ad0〉/〈d0, Ad0〉)d0, a0 := 〈d0, b〉/〈d0, Ad0〉 and x0 := a0d0. Then, for i = 1, 2, . . . , we repeat the steps in Algorithm 4.9 as long as 〈di, Adi〉 ≠ 0.
Algorithm 4.9 One iteration of the Lanczos method
Steps:
    vi+1 := Adi.
    ai := 〈di, b〉/〈di, vi+1〉.
    di+1 := vi+1 − (〈vi+1, vi+1〉/〈di, vi+1〉)di − (〈vi+1, vi〉/〈di−1, vi〉)di−1.
    xi := xi−1 + aidi.
If A is a real positive definite matrix, the termination criterion 〈di, Adi〉 = 0 is equivalent to the condition di = 0. When this is satisfied, the vector xi−1 equals the desired solution x of the system Ax = b. Since d0, d1, . . . are mutually orthogonal, the process must stop after at most n − 1 iterations. Therefore, for a sparse matrix A, the entire procedure performs O~(n^2) field operations.
The problems we face with the Lanczos method applied to a system over 𝔽_q are essentially the same as those discussed in connection with the conjugate gradient method. The problem with a non-symmetric and/or non-square matrix A is solved by multiplying the system by A^t. Instead of working with A^tA explicitly, we prefer to multiply separately by A and A^t.
The more serious problem with a system over 𝔽_q is that of encountering a non-zero direction vector di with 〈di, Adi〉 = 0. If this happens, we have to abort the computation prematurely. In order to restart the procedure, we try to solve the system BAx = Bb, where B is a diagonal matrix whose diagonal elements are chosen randomly from the non-zero elements of the field 𝔽_q or of some suitable extension of 𝔽_q (if q is small).
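A minimal sketch of the Lanczos iteration over a prime field GF(p) follows, assuming a symmetric matrix A and that no self-orthogonal direction is encountered; the three-term recurrence used for the next direction is the standard one, and the helper names are ours.

```python
P = 10007   # an arbitrary prime modulus for illustration

def inner(v, w):
    return sum(x * y for x, y in zip(v, w)) % P

def matvec(A, v):
    return [inner(row, v) for row in A]

def lanczos(A, b):
    # Assumes A symmetric over GF(P) and that no bad direction occurs.
    d_prev, v_prev = None, None
    d = b[:]                                    # d0 = b
    x = [0] * len(b)
    while True:
        v = matvec(A, d)                        # v_{i+1} = A d_i
        dAd = inner(d, v)
        if dAd == 0:                            # termination (or bad luck)
            return x
        a = inner(d, b) * pow(dAd, -1, P) % P   # a_i = <d_i, b>/<d_i, A d_i>
        x = [(xi + a * di) % P for xi, di in zip(x, d)]
        # next direction: A d_i orthogonalized against d_i and d_{i-1}
        c1 = inner(v, v) * pow(dAd, -1, P) % P
        d_next = [(vi - c1 * di) % P for vi, di in zip(v, d)]
        if d_prev is not None:
            c2 = inner(v, v_prev) * pow(inner(d_prev, v_prev), -1, P) % P
            d_next = [(dn - c2 * dp) % P for dn, dp in zip(d_next, d_prev)]
        d_prev, v_prev, d = d, v, d_next

A = [[2, 1], [1, 3]]
b = [5, 10]
x = lanczos(A, b)                               # solves Ax = b over GF(P)
```

All divisions are exact field inversions, illustrating the "no conditioning worries" remark made earlier.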
The Wiedemann method for solving a sparse system Ax = b over 𝔽_q uses ideas different from those employed by the other methods discussed so far. For the sake of simplicity, we assume that A is a square non-singular matrix (not necessarily symmetric). The Wiedemann method tries to compute the minimal polynomial μ_A(X) = X^d + c_{d−1}X^{d−1} + ··· + c_1X + c_0, d ≤ n, of A. To that end, one selects a small positive integer l in the range 10 ≤ l ≤ 20. For i = 0, 1, . . . , 2n, let vi denote the column vector of length l consisting of the first l entries of the vector A^i b. For the working of the Wiedemann method, we need to compute only the vectors v0, . . . , v2n. If A is a sparse matrix, this computation involves a total of O~(n^2) operations in 𝔽_q.
Since μ_A(A) = 0, we have μ_A(A)A^i b = 0 for every i ≥ 0. Therefore, for each k = 1, . . . , l, the sequence v0,k, v1,k, . . . of the k-th entries of v0, v1, . . . satisfies the linear recurrence

    v_{i+d,k} + c_{d−1}v_{i+d−1,k} + ··· + c_1v_{i+1,k} + c_0v_{i,k} = 0 for all i ≥ 0.

But then the minimal polynomial μk(X) of the k-th such sequence is a factor of μ_A(X). There are methods (such as the Berlekamp–Massey algorithm) that compute each μk(X) using O(n^2) field operations. We then expect to obtain μ_A(X) = lcm(μk(X) | 1 ≤ k ≤ l).
The assumption that A is non-singular is equivalent to the condition that c0 ≠ 0. In that case, the solution vector x = −c0^{−1}(A^{d−1} + c_{d−1}A^{d−2} + ··· + c_1I)b can be computed using O~(n^2) arithmetic operations in the field 𝔽_q.
If A is singular, we may find out linear dependencies among the rows of A and subsequently throw away suitable rows. Doing this repeatedly eventually gives us a non-singular A. For further details on the Wiedemann method, see [303].
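The Wiedemann method can be sketched as follows over a prime field GF(p). For brevity we track only one projected sequence (that is, l = 1), which already suffices for generic small examples; the minimal polynomial of the sequence is found with the Berlekamp–Massey algorithm, and all names are our own.

```python
P = 10007   # an arbitrary prime modulus for illustration

def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) % P for row in A]

def berlekamp_massey(s):
    # Minimal linear recurrence of the sequence s over GF(P); returns the
    # monic characteristic polynomial as the list [c0, c1, ..., c_{d-1}, 1].
    C, B = [1], [1]
    L, m, b = 0, 1, 1
    for i, sv in enumerate(s):
        delta = sv
        for j in range(1, L + 1):
            delta = (delta + C[j] * s[i - j]) % P
        if delta == 0:
            m += 1
            continue
        coef = delta * pow(b, -1, P) % P
        T = C[:]
        C = C + [0] * (len(B) + m - len(C))
        for j, bj in enumerate(B):
            C[j + m] = (C[j + m] - coef * bj) % P
        if 2 * L <= i:
            L, B, b, m = i + 1 - L, T, delta, 1
        else:
            m += 1
    C = C + [0] * (L + 1 - len(C))
    return C[: L + 1][::-1]

def wiedemann(A, b):
    n = len(b)
    seq, v = [], b[:]
    for _ in range(2 * n):          # first entries of b, Ab, A^2 b, ...
        seq.append(v[0])
        v = matvec(A, v)
    c = berlekamp_massey(seq)       # c0 != 0 iff this works (A non-singular)
    # x = -c0^{-1} (c1 I + c2 A + ... + A^{d-1}) b
    x, v = [0] * n, b[:]
    for ci in c[1:]:
        x = [(xi + ci * vi) % P for xi, vi in zip(x, v)]
        v = matvec(A, v)
    return [(-pow(c[0], -1, P) * xi) % P for xi in x]
```

A production version would combine l projected sequences via an lcm, as described above, instead of trusting a single projection.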
In this section, we assume that A = {a1, . . . , an} is a knapsack set, that is, a set of pairwise distinct positive integers. For a given positive integer s, we are required to find out ∊1, . . . , ∊n ∈ {0, 1} such that ∊1a1 + ··· + ∊nan = s, provided that a solution exists. In general, finding such a solution ∊1, . . . , ∊n is a very difficult problem.[6] However, if the weights satisfy some specific bounds, there exist polynomial-time algorithms for solving the SSP.
[6] In the language of complexity theory, the decision problem of determining whether a solution of the SSP exists is NP-complete.
Let us first define an important quantity associated with a knapsack set.

Definition The density of the knapsack set A = {a1, . . . , an} is defined as d(A) := n/log2(max(a1, . . . , an)).
If d(A) > 1, then, in general, there is more than one solution for the SSP (provided that a solution exists at all). This makes the corresponding knapsack set A unsuitable for cryptographic purposes. So we consider low densities, that is, the case d(A) ≤ 1.
There are certain algorithms that reduce in polynomial time the problem of finding a solution of the SSP to that of finding a shortest (non-zero) vector in a lattice. Assuming that such a vector is computable in polynomial time, Lagarias and Odlyzko’s reduction algorithm [157] solves the SSP in polynomial time with high probability, if d(A) ≤ 0.6463. An improved version of the algorithm adapts to densities d(A) ≤ 0.9408 (see Coster et al. [64] and Coster et al. [65]). The reduction algorithm is easy and will be described in Section 4.8.1. However, it is not known how to efficiently compute a shortest non-zero vector in a lattice. The Lenstra–Lenstra–Lovasz (L3) polynomial-time lattice basis reduction algorithm [166] provably finds out a non-zero vector whose length is at most the length of a shortest non-zero vector, multiplied by a power of 2. In practice, however, the L3 algorithm tends to compute a shortest vector quite often. Section 4.8.2 deals with the L3 lattice basis reduction algorithm.
Before providing a treatment on lattices, let us introduce a particular case of the SSP, which is easily (and uniquely) solvable.
Definition A knapsack set {a1, . . . , an} with a1 < ··· < an is said to be superincreasing, if ai > a1 + a2 + ··· + ai−1 for every i = 2, . . . , n.
Algorithm 4.10 solves the SSP for a superincreasing knapsack set in deterministic polynomial time. The proof for the correctness of this algorithm is easy and left to the reader.
Algorithm 4.10 Solving the SSP for a superincreasing knapsack set
Input: A superincreasing knapsack set {a1, . . . , an} with a1 < ··· < an and a sum s.
Output: The (unique) solution ∊1, . . . , ∊n for ∊1a1 + ··· + ∊nan = s, if one exists.
Steps:
for i = n, n − 1, . . . , 1 {
    if (s ≥ ai) { ∊i := 1. s := s − ai. } else { ∊i := 0. }
}
if (s = 0) { return (∊1, . . . , ∊n). } else { return "no solution exists". }
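Algorithm 4.10 translates almost line by line into code; the following sketch (with our own function name) scans the weights from the largest to the smallest.

```python
# Greedy solution of the SSP for a superincreasing knapsack set.

def solve_superincreasing(a, s):
    # a: superincreasing weights a1 < a2 < ... < an; s: the target sum.
    eps = [0] * len(a)
    for i in range(len(a) - 1, -1, -1):
        if s >= a[i]:               # ai must be used, or s is unreachable
            eps[i] = 1
            s -= a[i]
    return eps if s == 0 else None  # None signals "no solution exists"

a = [2, 3, 7, 15, 31]
eps = solve_superincreasing(a, 24)  # 24 = 2 + 7 + 15
```

The greedy choice is forced: if s ≥ ai, then omitting ai makes the remaining weights sum to less than ai ≤ s, so no solution could exist without it.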
We start by defining a lattice.
Definition Let n, d ∈ ℕ with d ≤ n, and let v1, . . . , vd be linearly independent vectors in ℝ^n. The set L := {r1v1 + ··· + rdvd | r1, . . . , rd ∈ ℤ} is called a lattice of dimension d in ℝ^n. We say that v1, . . . , vd constitute a basis of L.
In general, a lattice may have more than one basis. We are interested in bases consisting of short vectors, where the concept of shortness is with respect to the following definition.
Definition Let v := (v1, . . . , vn)^t and w := (w1, . . . , wn)^t be two n-dimensional vectors in ℝ^n. The inner product of v and w is 〈v, w〉 := v1w1 + ··· + vnwn, and the length of v is defined as ‖v‖ := √〈v, v〉.
For the time being, let us assume the availability of a lattice oracle which, given a lattice, returns a shortest non-zero vector in the lattice. The possibilities for realizing such an oracle will be discussed in the next section.
Consider the subset sum problem with the knapsack set A := {a1, . . . , an}, and let B be an upper bound on the weights (that is, each ai ≤ B). For a given sum s, we are supposed to find out ∊1, . . . , ∊n ∈ {0, 1} such that ∊1a1 + ··· + ∊nan = s. Let L be the (n + 1)-dimensional lattice in ℝ^{n+1} generated by the vectors

    v1 := (1, 0, . . . , 0, Na1)^t,
    v2 := (0, 1, . . . , 0, Na2)^t,
    . . .
    vn := (0, 0, . . . , 1, Nan)^t,
    vn+1 := (1/2, 1/2, . . . , 1/2, Ns)^t,

where N is an integer larger than √n/2. The vector v := (∊1 − 1/2, . . . , ∊n − 1/2, 0)^t = ∊1v1 + ··· + ∊nvn − vn+1 is in the lattice L, where ‖v‖ = √n/2. Involved calculations (carried out in Coster et al. [64, 65]) show that the probability P of the existence of a vector w ∈ L, w ≠ 0, ±v, with ‖w‖ ≤ ‖v‖ satisfies a bound of the form P ≤ poly(n)2^{cn}/B, where c ≈ 1.0628. Now, if the density d(A) of A is less than 1/c ≈ 0.9408, then B = 2^{c′n} for some c′ > c and, therefore, P → 0 as n → ∞. In other words, if d(A) < 0.9408, then, with high probability, ±v are the shortest non-zero vectors of L. The lattice oracle then returns such a vector, from which the solution ∊1, . . . , ∊n can be readily computed.
Let L be a lattice in ℝ^n specified by a basis of n linearly independent vectors v1, . . . , vn. We now construct a basis v1*, . . . , vn* of ℝ^n such that 〈vi*, vj*〉 = 0 (that is, vi* and vj* are orthogonal to each other) for all i, j, i ≠ j. Note that v1*, . . . , vn* need not be a basis for L. Algorithm 4.11, which computes this basis, is known as the Gram–Schmidt orthogonalization procedure.
Algorithm 4.11 Gram–Schmidt orthogonalization
Input: A basis v1, . . . , vn of ℝ^n.
Output: The Gram–Schmidt orthogonalization v1*, . . . , vn* and the coefficients μi,j.
Steps:
v1* := v1.
for i = 2, . . . , n {
    μi,j := 〈vi, vj*〉/〈vj*, vj*〉 for j = 1, . . . , i − 1.
    vi* := vi − μi,1v1* − ··· − μi,i−1v*i−1.
}
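A direct rendition of Algorithm 4.11 in exact rational arithmetic may look as follows; the function and variable names are ours.

```python
# Gram-Schmidt orthogonalization with exact rationals, returning both the
# orthogonal vectors v* and the coefficients mu[i, j] = <vi, vj*>/<vj*, vj*>.
from fractions import Fraction

def inner(v, w):
    return sum(x * y for x, y in zip(v, w))

def gram_schmidt(basis):
    vs, mu = [], {}
    for i, vec in enumerate(basis):
        v = [Fraction(x) for x in vec]
        for j in range(i):
            # Subtracting earlier projections first does not change mu[i, j],
            # since the vj* are already pairwise orthogonal.
            mu[i, j] = inner(v, vs[j]) / inner(vs[j], vs[j])
            v = [x - mu[i, j] * y for x, y in zip(v, vs[j])]
        vs.append(v)
    return vs, mu

vs, mu = gram_schmidt([[3, 1], [2, 2]])
# vs[0] = (3, 1), vs[1] = (-2/5, 6/5), mu[1, 0] = 4/5
```

Exact rationals matter here: the μi,j feed directly into the L3 reduction below, where rounding errors would be fatal.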
One can easily verify that v1*, . . . , vn* constitute an orthogonal basis of ℝ^n. Using these notations, we introduce the following important concept:
Definition The basis v1, . . . , vn is called a reduced basis of L, if

    |μi,j| ≤ 1/2 for all 1 ≤ j < i ≤ n,    (4.16)

and

    ‖vi* + μi,i−1v*i−1‖^2 ≥ (3/4)‖v*i−1‖^2 for all 1 < i ≤ n.    (4.17)
A reduced basis v1, . . . , vn of L is termed so, because the vectors vi are somewhat short. More precisely, we have Theorem 4.5, the proof of which is not difficult, but is involved, and is omitted here.
Theorem 4.5 Let v1, . . . , vn be a reduced basis of a lattice L, and let w1, . . . , wm be linearly independent vectors in L. Then ‖vi‖^2 ≤ 2^{n−1} max(‖w1‖^2, . . . , ‖wm‖^2) for all i = 1, . . . , m. In particular, for any non-zero vector w of L we have ‖v1‖^2 ≤ 2^{n−1}‖w‖^2. That is, for a reduced basis v1, . . . , vn of L, the length of v1 is at most 2^{(n−1)/2} times that of a shortest non-zero vector in L.
Given an arbitrary basis v1, . . . , vn of a lattice L, the L3 basis reduction algorithm computes a reduced basis of L. The algorithm starts by computing the Gram–Schmidt orthogonalization v1*, . . . , vn* of v1, . . . , vn. The rational numbers μi,j are also available from this step. We also obtain as byproducts the numbers Vi := ‖vi*‖^2 for i = 1, . . . , n.
Algorithm 4.12 enforces Condition (4.16), that is, |μk,l| ≤ 1/2, for a given pair of indices k and l. The essential work done by this routine is subtracting a suitable integer multiple of vl from vk and updating the values μk,1, . . . , μk,l accordingly.
Algorithm 4.12 Enforcing Condition (4.16)
Input: Two indices k and l.
Output: An update of the basis vectors to ensure |μk,l| ≤ 1/2.
Steps:
if (|μk,l| > 1/2) {
    r := the integer nearest to μk,l.
    vk := vk − rvl.
    for h = 1, . . . , l − 1 { μk,h := μk,h − rμl,h. }
    μk,l := μk,l − r.
}
If Condition (4.17) is not satisfied by some k, that is, if Vk < (3/4 − μk,k−1^2)Vk−1, then vk and vk−1 are swapped. The necessary changes in the values Vk, Vk−1 and certain μi,j should also be incorporated. This is explained in Algorithm 4.13.
Algorithm 4.13 Swapping vk−1 and vk
Input: An index k.
Output: An update of the basis vectors with vk−1 and vk interchanged.
Steps:
μ := μk,k−1. V := Vk + μ^2Vk−1.
μk,k−1 := μVk−1/V. Vk := Vk−1Vk/V. Vk−1 := V.
Swap (vk−1, vk).
for j = 1, . . . , k − 2 { Swap (μk−1,j, μk,j). }
for i = k + 1, . . . , n {
    t := μi,k. μi,k := μi,k−1 − μt. μi,k−1 := t + μk,k−1μi,k.
}
The main basis reduction algorithm is described in Algorithm 4.14. It is not obvious that this algorithm should terminate at all. Consider the quantity D := d1 ··· dn−1, where di := |det(〈vk, vl〉)1≤k,l≤i| for each i = 1, . . . , n. At the beginning of the basis reduction procedure one has di ≤ B^i for all i = 1, . . . , n, where B := max(‖vi‖^2 | 1 ≤ i ≤ n). It can be shown that an invocation of Algorithm 4.12 does not alter the value of D, whereas interchanging vi and vi−1 in Algorithm 4.13 multiplies D by a factor less than 3/4. It can also be shown that for any basis of L the value D is bounded from below by a positive constant which depends only on the lattice. Thus, Algorithm 4.14 stops after finitely many steps.
Algorithm 4.14 The L3 basis reduction algorithm
Input: A basis v1, . . . , vn of a lattice L.
Output: v1, . . . , vn converted to a reduced basis.
Steps:
Compute the Gram–Schmidt orthogonalization of v1, . . . , vn (Algorithm 4.11). /* The initial values of μi,j and Vi are available at this point */
k := 2.
while (k ≤ n) {
    Enforce |μk,k−1| ≤ 1/2 (Algorithm 4.12 with l := k − 1).
    if (Vk < (3/4 − μk,k−1^2)Vk−1) {
        Interchange vk and vk−1 (Algorithm 4.13).
        k := max(k − 1, 2).
    } else {
        Enforce |μk,l| ≤ 1/2 for l = k − 2, . . . , 1 (Algorithm 4.12).
        k := k + 1.
    }
}
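The following is a hedged sketch of the complete reduction (Algorithms 4.11 through 4.14 combined) in exact rational arithmetic. For simplicity it recomputes the Gram–Schmidt data from scratch after a swap instead of applying the incremental updates of Algorithm 4.13; this changes the efficiency, not the result.

```python
from fractions import Fraction

def lll_reduce(basis, delta=Fraction(3, 4)):
    b = [[Fraction(x) for x in v] for v in basis]
    n = len(b)

    def inner(v, w):
        return sum(x * y for x, y in zip(v, w))

    def gso():
        # Gram-Schmidt data: coefficients mu and squared lengths V = |v*|^2.
        vs, mu, V = [], {}, []
        for i in range(n):
            v = b[i][:]
            for j in range(i):
                mu[i, j] = inner(b[i], vs[j]) / V[j]
                v = [x - mu[i, j] * y for x, y in zip(v, vs[j])]
            vs.append(v)
            V.append(inner(v, v))
        return mu, V

    mu, V = gso()
    k = 1                                    # 0-based index; corresponds to v2
    while k < n:
        for l in range(k - 1, -1, -1):       # size reduction (Condition 4.16)
            r = round(mu[k, l])
            if r:
                b[k] = [x - r * y for x, y in zip(b[k], b[l])]
                for h in range(l):
                    mu[k, h] -= r * mu[l, h]
                mu[k, l] -= r
        if V[k] >= (delta - mu[k, k - 1] ** 2) * V[k - 1]:
            k += 1                           # Lovasz condition (4.17) holds
        else:
            b[k], b[k - 1] = b[k - 1], b[k]
            mu, V = gso()                    # lazy recomputation after a swap
            k = max(k - 1, 1)
    return b

reduced = lll_reduce([[1, 1, 1], [-1, 0, 2], [3, 5, 6]])
```

Size reductions leave the quantity D of the termination argument unchanged, while each swap shrinks it, which is exactly why the loop must stop.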
For a more complete treatment of the L3 basis reduction algorithm, we refer the reader to Lenstra et al. [166] (or Mignotte [203]). It is important to note here that the L3 basis reduction algorithm is at the heart of the Lenstra–Lenstra–Lovasz algorithm for factoring polynomials in ℚ[X]. This factoring algorithm indeed runs in time polynomially bounded by the degree of the polynomial to be factored and is one of the major breakthroughs in the history of symbolic computing.
| 4.27 | Let be a knapsack set. Show that:
|
| 4.28 | Let L be a lattice in and let v1, . . . , vn constitute a basis of L. The determinant of L is defined by
det L := det(v1, . . . , vn).
|
This chapter introduces the most common computationally intractable mathematical problems on which the security of public-key cryptosystems is based. We also describe the algorithms known to date for solving these difficult computational problems.
To start with, we enumerate these computational problems. The first is the integer factorization problem (IFP) and its several variants. Some problems that are provably or believably equivalent to the IFP are the totient problem, problems associated with the RSA algorithm, and the modular square root problem. The next class of problems includes the discrete logarithm problem (DLP) and its variants on elliptic curves (ECDLP) and hyperelliptic curves (HECDLP). The Diffie–Hellman problem (DHP) and its variants (ECDHP, HECDHP) are believed to be equivalent to the respective variants of the DLP. Finally, the subset sum problem (SSP) and two related problems, namely the shortest vector problem (SVP) and the closest vector problem (CVP) on lattices, are introduced.
The subsequent sections are devoted to an algorithmic study of these difficult problems. We start with IFP. We first present some fully exponential algorithms like trial division, Pollard’s rho method, Pollard’s p – 1 method and Williams’ p + 1 method. Next we describe the modern genre of subexponential algorithms. The quadratic sieve method (QSM) is discussed at length together with its heuristic improvements like incomplete sieving, large prime variation and the multiple polynomial variant. We also describe TWINKLE, a hardware device that efficiently implements the sieving stage of the QSM. We then discuss the elliptic curve method (ECM) and the number field sieve method (NFSM) for factoring integers. The NFSM turns out to be the asymptotically fastest known algorithm for factoring integers.
The (finite field) DLP is discussed next. The older square-root methods, such as Shanks’ baby-step–giant-step method (BSGS), Pollard’s rho method and the Pohlig–Hellman method (PHM), take exponential running times in the worst case. The PHM for a field 𝔽_q is, however, efficient if q − 1 has only small prime factors. Next we discuss the modern family of algorithms collectively known as the index calculus method (ICM). For prime fields, we discuss three variants of the ICM, namely the basic method, the linear sieve method (LSM) and the number field sieve method (NFSM). We also discuss three variants of the ICM for fields of characteristic 2: the basic method, the linear sieve method and Coppersmith’s algorithm. Another interesting variant is the cubic sieve method (CSM) covered in the exercises. We explain Gordon and McCurley’s polynomial sieving in connection with Coppersmith’s algorithm.
The next section deals with algorithms for solving the ECDLP. For a general elliptic curve, the exponential square-root methods are the only known algorithms. For some special classes of curves, more efficient methods are proposed in the literature. The MOV reduction based on Weil pairing reduces the ECDLP on a curve over 𝔽_q to the DLP in the finite field 𝔽_{q^k} for some suitable k. This k is small, and the reduction is efficient, for supersingular curves. The SmartASS method (also called the anomalous method) reduces the ECDLP on an anomalous curve to the computation of p-adic discrete logarithms. This reduction solves the original DLP in polynomial time. In view of these algorithms, it is preferable to avoid supersingular and anomalous curves in cryptographic applications. The xedni calculus method (XCM) is discussed finally. This algorithm works by lifting a curve over 𝔽_p to a curve over ℚ. Experimental and theoretical evidence suggests that the XCM is not an efficient solution to the ECDLP.
We then devote a section to the study of an index calculus method to solve the HECDLP. For hyperelliptic curves of small genus, this method leads to a subexponential algorithm (the ADH–Gaudry algorithm).
Many of the above subexponential methods require solving a system of linear congruences over finite rings. This (inherently sequential) linear algebra part often turns out to be the bottleneck of the algorithms. However, the fact that these equations are necessarily sparse can be effectively exploited, and some faster algorithms can be used to solve these systems. We study four such algorithms: structured Gaussian elimination, the conjugate gradient method, the Lanczos method and the Wiedemann method.
In the last section, we study the subset sum problem. We first reduce the SSP to problems associated with lattices. We finally present the lattice-basis reduction algorithm due to Lenstra, Lenstra and Lovasz.
Several other computationally intractable problems have been proposed in the literature for building cryptographic systems. Some of these problems are mentioned in the annotated references of Chapter 5. Due to space and time limitations, we will not discuss these problems in this book.
The integer factorization problem is one of the oldest computational problems. Though the exact notion of computational complexity took shape only after the advent of computers, the apparent difficulty of solving the factorization problem was noticed centuries ago. Crandall and Pomerance [69] call it the fundamental computational problem of arithmetic. Numerous books and articles provide discussions on this subject at varying levels of coverage. Crandall and Pomerance [69] is perhaps the most extensive in this regard. The reader can also take a look at Bressoud’s (much simpler) book [36] or the (compact, yet reasonably detailed) Chapter 10 of Henri Cohen’s book [56]. The articles by Lenstra et al. [164] and by Montgomery [211] are also worth reading.
John M. Pollard has his name attached to three modern inventions in the arena of integer factorization. In [238, 239], he introduces the rho and p − 1 methods. (Later, he was part of the team that designed the number-field sieve factoring algorithm.) Williams’ p + 1 method appears in 1982 in [305].
The continued fraction method (CFRAC) is apparently the first known subexponential-time integer factoring algorithm. It is based on the work of Lehmer and Powers [162] and first appears in the currently used form in Morrison and Brillhart’s paper [213]. CFRAC was the most widely used integer factoring algorithm during the late 1970s and early 1980s.
The quadratic sieve method, invented by Carl Pomerance [241] in 1984, supersedes the CFRAC method. The multiple-polynomial QSM appears in Silverman [279]. Hendrik Lenstra’s elliptic curve method [174] was proposed almost concurrently with the QSM. Nowadays, the QSM and the ECM are the most commonly used factoring methods. Reyneri’s cubic sieve method is described in Lenstra and Lenstra [165].
The theoretically superior number field sieve method follows from Pollard’s factoring method using cubic integers [240]. The initial proposal for the NFS method is that of the simple NFS and appears in Lenstra et al. [167]. It is later modified to the general NFS method in Buhler et al. [41]. Lenstra and Lenstra [165] is a compilation of papers on the NFS method. Though the NFS method is the asymptotically fastest factoring method, its fairly complicated implementation makes the algorithm superior to the QSM or the ECM only when the bit size of the integer to be factored is reasonably large.
Shamir’s factoring engine TWINKLE is proposed in [269]. A. K. Lenstra and Shamir analyse and optimize its design in [168]. Shamir and Tromer [270] have proposed a device called TWIRL (The Weizmann Institute Relation Locator) that is geared to the NFS factoring method. It is estimated that a TWIRL implementation costing US$10K can complete the sieving for a 512-bit RSA modulus in less than 10 minutes, whereas one that does the same for a 1024-bit RSA modulus costs US$10–50M and takes a time of one year. Lenstra et al. [163] provide a more detailed analysis of these estimates. See Lenstra et al. [169] to know about Bernstein’s factorization circuit which is another implementation of the NFS factoring method.
The (finite field) discrete logarithm problem also invoked much research in the last few decades. The older square-root methods are described well in the book [191] by Menezes. Donald Knuth attributes the baby-step–giant-step method to Daniel Shanks. See Stein and Teske [290] for various optimizations of the baby-step–giant-step method. Pollard’s rho method is an adaptation of the same method for integer factorization. See Pohlig and Hellman [234] for the Pohlig–Hellman method.
The first idea of the index calculus method appears in Western and Miller [302]. Coppersmith et al. [59] describe three variants of the index calculus method: the linear sieve method, the residue list sieve method and the Gaussian integer method. The same paper also proposes the cubic sieve method (CSM). LaMacchia and Odlyzko [158] describe an implementation of the linear sieve and the Gaussian integer methods. Das and Veni Madhavan [73] make an implementation study of the CSM. Also look at the survey [189] by McCurley.
Gordon [119] uses number field sieves for computing discrete logarithms over prime fields. Weber et al. [261, 299, 300, 301] have implemented and proved the practicality of the number field sieve method. Also see Schirokauer’s paper [260].
Odlyzko [225] surveys the algorithms for computing discrete logs in the fields 𝔽_{2^n}. The best algorithm for these fields is Coppersmith’s algorithm [57]. No analogue of this algorithm is known for prime fields. Gordon and McCurley [120] use Coppersmith’s algorithm for the computation of discrete logarithms in large fields of characteristic 2.
The article [226] by Odlyzko and the one [242] by Pomerance are two recent surveys on the finite field discrete logarithm problem. Also see Buchmann and Weber [40].
The elliptic curve discrete logarithm problem seems to be a very difficult computational problem. A direct adaptation of the index calculus method is expected to lead to a running time worse than that of brute-force search (Silverman and Suzuki [278] and Blake et al. [24]). Menezes et al. [193] reduce the problem of computing discrete logs on an elliptic curve over 𝔽_q to that of computing discrete logs in the field 𝔽_{q^k} for some k. For supersingular elliptic curves, this k can be chosen to be small. For a general curve, the MOV reduction takes exponential time (Balasubramanian and Koblitz [16]). The SmartASS method is due to Smart [282], Satoh and Araki [257] and Semaev [265]. Joseph H. Silverman proposes the xedni calculus method in [277]. This method has been shown, experimentally and heuristically, to be impractical by Jacobson et al. [139].
Adleman et al. [2] propose the first subexponential algorithm for the hyperelliptic curve discrete log problem. This algorithm is applicable to curves of high genus over prime fields. The analysis of its running time is based on certain heuristic assumptions. Enge [86] provides a subexponential algorithm which has a rigorously provable running time and which works for curves over an arbitrary field 𝔽_q. Again, the algorithm demands curves of high genus. An implementation of the Adleman–DeMarrais–Huang algorithm is given by Gaudry [105]. Also see Enge and Gaudry [87].
Gaudry et al. [107] propose a Weil-descent attack for the hyperelliptic curve discrete log problem. This is modified in Galbraith [100] and Galbraith et al. [101].
Coppersmith et al. [59] describe sparse system solvers. LaMacchia and Odlyzko [159] implement these methods. For further details, see Montgomery [212], Coppersmith [58], Wiedemann [303], and Yang and Brent [306].
That public-key cryptosystems can be based on the subset-sum problem (or the knapsack problem) was considered at the beginning of the era of public-key cryptography. Historically the first realization of a public-key system is based along this line and is due to Merkle and Hellman [196]. But the Merkle–Hellman system and several variants of it are broken; see Shamir [266], for example. At present, most public-key systems based on the subset-sum problem are known to be insecure.
The lattice-basis reduction algorithm and the associated L3 algorithm for factoring polynomials appear in the celebrated work [166] of Lenstra, Lenstra and Lovasz. Mignotte’s book [203] also describes these topics in good detail.
| 5.1 | Introduction |
| 5.2 | Secure Transmission of Messages |
| 5.3 | Key Exchange |
| 5.4 | Digital Signatures |
| 5.5 | Entity Authentication |
| Chapter Summary | |
| Suggestions for Further Reading |
An essential element of freedom is the right to privacy, a right that cannot be expected to stand against an unremitting technological attack.
—Whitfield Diffie
Mary had a little key (It’s all she could export), and all the email that she sent was opened at the Fort.
—Ronald L. Rivest
Treat your password like your toothbrush. Don’t let anybody else use it, and get a new one every six months.
—Clifford Stoll
As we pointed out in Chapter 1, cryptography aims to guard sensitive data from unauthorized access. We now describe some algorithms that achieve this goal, restricting ourselves to public-key algorithms. In practice, however, public-key algorithms are used in tandem with secret-key algorithms. In this chapter, we describe only the basic routines, whose inputs are mathematical entities like integers, elements of finite fields, or points on curves. Message encoding will be dealt with in Chapter 6.
Consider the standard scenario: a party named Alice, called the sender, wants to send a secret message m to a party named Bob, called the receiver or recipient, over a public communication channel. A third party, Carol, may intercept and read the message. In order to maintain the secrecy of the message, Alice uses a well-defined transform fe to convert the plaintext message m to the ciphertext message c and sends c to Bob. Bob possesses some secret information with the help of which he applies the reverse transform fd in order to get back m. Carol, who is expected not to know the secret information, cannot retrieve m from c by applying the transformation fd.
In a public-key system, the realization of the transforms fe and fd is based on a key pair (e, d) predetermined by Bob. The public key e is made public, whereas the private key d is kept secret. The encryption transform generates c = fe(m, e). Since e is public knowledge, anybody can generate c from a given m, whereas the decryption transform m = fd(c, d) can be performed only by Bob, who possesses the knowledge of d. The key pair has to be so chosen that knowledge of e does not allow Carol to compute d in feasible time. The intractability of the computational problems discussed in Chapter 4 can be exploited to design such key pairs. The exact realization of the keys e, d and the transforms fe, fd depends on the choice of the underlying intractable problem and also on the way to make use of the problem. Since there are several intractable problems suitable for cryptography, there are several encryption schemes varying widely in algorithmic and mathematical details.
RSA has been the most popular encryption algorithm. Historically, it is also the first public-key encryption algorithm published in the literature (see Rivest et al. [252]). Its security is based on the intractability of the RSAP (or the RSAKIP) discussed in Exercise 4.2. Since both these problems are polynomial-time reducible to the IFP, we often say that the RSA algorithm derives its security from the intractability of the IFP. It may, however, be the case that breaking RSA is easier than factoring integers, though no concrete evidence seems to be available.
Algorithm 5.1 generates a key pair for RSA.
Algorithm 5.1 RSA key generation
Output: A random RSA key pair.
Steps:
Generate two different random primes p and q, each of bit length l.
n := pq.
Choose an integer e coprime to φ(n) = (p − 1)(q − 1).
d := e^(−1) (mod φ(n)).
Return the pair (n, e) as the public key and the pair (n, d) as the private key.
The length l of the primes p and q should be chosen large enough so as to make the factorization of n infeasible. For short-term security, values of l between 256 and 512 suffice. For long-term security, one may choose l as large as 2,048.
The random primes p and q can be generated using a probabilistic algorithm like those described in Section 3.4.2. Naive primes are normally considered to be sufficiently secure in this respect, since p ± 1 and q ± 1 are expected to have large prime factors in general. Gordon’s algorithm (Algorithm 3.14) can also be used for generating strong primes p and q. Since Gordon’s algorithm runs only nominally slower than the algorithm for generating naive primes, there is no harm in using strong primes. Safe primes, on the other hand, are difficult to generate and may be avoided.
The RSA modulus n is public knowledge. Determining d from n and e is easily doable, given the value of φ(n) = (p – 1)(q – 1) which, in turn, is readily computable, if p and q are known. If an adversary can compute φ(n) (with or without factoring n), the security of the RSA protocol based on the modulus n is compromised. However, computing φ(n) without the knowledge of p and q is (at least historically) a very difficult computational problem, and so, if n is reasonably large, RSA encryption is assumed to be sufficiently secure.
RSA encryption is done by raising the plaintext message m to the power e modulo n. In order to speed up this (modular) exponentiation, it is often expedient to take a small value for e (like 3, 257 and 65,537). However, in that case one should adopt certain precautions as Exercise 5.2 suggests. More specifically, if e entities share a common (small) encryption key e but different (pairwise coprime) moduli and if the same message m is encrypted using all these public keys, then an eavesdropper can reconstruct m easily from a knowledge of the e ciphertext messages. Another potential problem of using small e is that if m is small, that is, if m < n1/e, then m can be retrieved by taking the integer e-th root of the ciphertext message.
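The broadcast scenario mentioned above (a common exponent e = 3, one message, pairwise coprime moduli) can be sketched as follows; the moduli are toy products of small primes, far below any secure size.

```python
# Recover m from three encryptions of the same m under e = 3: the CRT yields
# m^3 over the integers (since m^3 is below the product of the moduli), and an
# integer cube root finishes the job.

def crt(residues, moduli):
    # Chinese Remainder Theorem for pairwise coprime moduli.
    M = 1
    for n in moduli:
        M *= n
    x = 0
    for r, n in zip(residues, moduli):
        Mi = M // n
        x = (x + r * Mi * pow(Mi, -1, n)) % M
    return x

def icbrt(n):
    # Integer cube root by binary search.
    lo, hi = 0, 1 << (n.bit_length() // 3 + 2)
    while lo < hi:
        mid = (lo + hi) // 2
        if mid ** 3 < n:
            lo = mid + 1
        else:
            hi = mid
    return lo

m = 1234567                    # the secret message; m^3 exceeds each modulus
moduli = [1000003 * 1000033, 1000037 * 1000039, 1000081 * 1000099]
ciphertexts = [pow(m, 3, n) for n in moduli]
recovered = icbrt(crt(ciphertexts, moduli))
assert recovered == m
```

Note that no single ciphertext leaks m here, since m^3 wraps around each individual modulus; it is the combination of e = 3 ciphertexts that is fatal.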
Although the pair (n, d) is sufficient for carrying out RSA decryption, maintaining some additional (secret) information significantly speeds up decryption. To this end, it is often recommended that some or all of the values n, e, d, p, q, d1, d2, h be stored, where d1 := d rem (p – 1), d2 := d rem (q – 1) and h := q^(–1) (mod p).
If n can be factored, then d can be easily computed from the public key (n, e). Conversely, if n, e, d are all known, there is an efficient probabilistic algorithm which factors n. This algorithm is based on the fact that if ed – 1 = 2^s t with t odd, then for at least half of the integers a in {1, . . . , n – 1} coprime to n, there exists a σ in {0, 1, . . . , s – 1} such that a^(2^σ t) ≢ ±1 (mod n), whereas a^(2^(σ+1) t) ≡ 1 (mod n). But then the gcd of n and a^(2^σ t) – 1 is a non-trivial factor of n. For the details, solve Exercise 7.9.
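This factoring procedure can be sketched in Python. The sketch below is illustrative (not the book's code), and the tiny parameters at the end are hypothetical toy values:

```python
import math, random

# Sketch of the probabilistic factoring of n from (n, e, d).
# Write e*d - 1 = 2^s * t with t odd, then search for an a whose
# power chain reveals a non-trivial square root of 1 modulo n.
def factor_rsa_modulus(n, e, d):
    t = e * d - 1
    s = 0
    while t % 2 == 0:
        t //= 2
        s += 1
    while True:
        a = random.randrange(2, n - 1)
        if math.gcd(a, n) > 1:
            return math.gcd(a, n)          # lucky hit: a shares a factor with n
        b = pow(a, t, n)
        if b == 1 or b == n - 1:
            continue                       # this a reveals nothing; try another
        for _ in range(s):
            c = pow(b, 2, n)
            if c == 1:
                return math.gcd(b - 1, n)  # b is a square root of 1, b != +/-1
            if c == n - 1:
                break                      # chain hits -1: try another a
            b = c

# hypothetical toy parameters: p = 61, q = 53
p, q, e = 61, 53, 17
n = p * q
d = pow(e, -1, (p - 1) * (q - 1))          # d = e^{-1} mod phi(n), Python 3.8+
assert factor_rsa_modulus(n, e, d) in (p, q)
```

Since at least half of the choices of a expose a non-trivial square root of 1, only a few iterations of the outer loop are expected.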
Different entities in a given network should use different values of n. If two or more entities share a common n but different exponent pairs (ei, di), then each entity can first factor n and then use this factorization to compute the private keys of other entities. Primes are quite abundant in nature and so finding pairwise coprime RSA moduli for all entities is no problem at all. A common value of the encryption exponent e (for example, a small value of e) can, however, be shared by all entities. In that case, for pairwise different moduli ni, the corresponding decryption exponents di will also be pairwise different.
RSA encryption is rather simple, as Algorithm 5.2 shows.
Algorithm 5.2  RSA encryption
Input: The RSA public key (n, e) of the recipient and the plaintext message m ∈ Z_n.
Output: The ciphertext message c.
Steps:
c := m^e (mod n).
Return c.
By Exercise 4.1, the exponentiation function m ↦ m^e is bijective; so m can be uniquely recovered from c. It is clear why small encryption exponents e speed up RSA encryption. For a general exponent e, the routine takes time O(log^3 n), whereas for a small e (that is, e = O(1)) the running time drops to O(log^2 n).
RSA decryption (Algorithm 5.3) is analogous to RSA encryption.
Algorithm 5.3  RSA decryption
Input: The RSA private key (n, d) of the recipient and the ciphertext message c.
Output: The recovered plaintext message m.
Steps:
m := c^d (mod n).
Return m.
The correctness of this decryption procedure follows from Exercise 4.1. As in the case of encryption, one might opt for a small decryption exponent d. In general, e and d cannot both be small simultaneously. If e is small, the security of the RSA scheme is not expected to be affected, whereas small values of d are undesirable for several reasons. First, if d is very small, the adversary chooses some m, computes the corresponding ciphertext c (using public knowledge) and then keeps computing c^x (mod n) for x = 1, 2, . . . until x = d is reached, that is, until the original message m is recovered.
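Algorithms 5.2 and 5.3 can be exercised end to end with Python's built-in modular exponentiation. The primes below are hypothetical toy values, far too small for real security:

```python
import math

# Toy RSA round trip for Algorithms 5.2 and 5.3 (insecure toy primes).
p, q = 1009, 1013
n = p * q
phi = (p - 1) * (q - 1)
e = 5                      # small public exponent, coprime to phi(n)
assert math.gcd(e, phi) == 1
d = pow(e, -1, phi)        # private exponent d = e^{-1} mod phi(n)

m = 123456                 # plaintext, an element of Z_n
c = pow(m, e, n)           # Algorithm 5.2: c := m^e (mod n)
assert pow(c, d, n) == m   # Algorithm 5.3: m := c^d (mod n)
```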
Even when d is not very small so that the possibility of exhaustive search with x = 1, 2, . . . can be precluded, there are several attacks known for small private exponents. Wiener [304] proposes an efficient algorithm in this respect. Boneh and Durfee [32] improve Wiener’s algorithm. Sun et al. [294] propose three variants of the RSA scheme that are resistant to these attacks. Durfee and Nguyen [82] extend the Boneh–Durfee attack to break two of these three variants. To sum up, it is advisable not to use small secret exponents d, that is, the bit length of d should be close to that of n in order to achieve the desired level of security.
There are alternative ways to speed up RSA decryption. If the values p, q, d1 := d rem (p – 1), d2 := d rem (q – 1) and h := q–1 (mod p) are all available to the recipient, he can use Algorithm 5.4 for RSA decryption.
Algorithm 5.4  Fast RSA decryption using the CRT
Input: The RSA extended private key (p, q, d1, d2, h) of the recipient and the ciphertext message c.
Output: The recovered plaintext message m.
Steps:
m1 := c^(d1) (mod p).
m2 := c^(d2) (mod q).
t := h(m1 – m2) (mod p).
m := m2 + tq.
Return m.
In this modified routine, m1 := m rem p and m2 := m rem q are first computed and then combined using the CRT to get m modulo n = pq. Algorithm 5.3 performs a single modular exponentiation modulo n, whereas in Algorithm 5.4 two exponentiations modulo p and q respectively take the major portion of the running time. Since an exponentiation modulo N to an exponent O(N) runs in time O(log3 N), and since each of p and q has bit length (about) half of that of n, Algorithm 5.4 runs about four times as fast as Algorithm 5.3.
If only the values p, q, d are stored, then d1, d2 and h can be computed on the fly using relatively inexpensive operations, after which Algorithm 5.4 can be used. This leads to a decryption routine almost as fast as Algorithm 5.4, but with somewhat smaller memory requirements for the storage of the private key.
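A minimal sketch of the CRT speed-up of Algorithm 5.4, again with hypothetical toy primes:

```python
# CRT-based RSA decryption in the style of Algorithm 5.4 (toy parameters).
p, q = 1009, 1013
n = p * q
e = 5
d = pow(e, -1, (p - 1) * (q - 1))
d1, d2 = d % (p - 1), d % (q - 1)   # d1 := d rem (p-1), d2 := d rem (q-1)
h = pow(q, -1, p)                   # h := q^{-1} (mod p)

def decrypt_crt(c):
    m1 = pow(c, d1, p)              # half-length exponentiation modulo p
    m2 = pow(c, d2, q)              # half-length exponentiation modulo q
    t = (h * (m1 - m2)) % p
    return m2 + t * q               # recombination (Garner's form of the CRT)

m = 424242
assert decrypt_crt(pow(m, e, n)) == m
```

Each of the two exponentiations works with operands half the length of n, which is the source of the roughly four-fold speed-up mentioned above.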
The Rabin public-key encryption algorithm is based on the intractability of computing square roots modulo a composite integer (SQRTP). By Exercise 4.10, the SQRTP is probabilistically polynomial-time equivalent to the IFP, that is, breaking the Rabin scheme is provably as hard as factoring integers. Breaking RSA, on the other hand, is only believed to be equivalent to factoring integers. Moreover, Rabin encryption is faster than RSA encryption (for moduli of the same size).
Like RSA, Rabin encryption requires a modulus of the form n = pq.
Algorithm 5.5  Rabin key generation
Input: A bit length l.
Output: A random Rabin key pair.
Steps:
Generate two different random primes p and q, each of bit length l.
n := pq.
Return n as the public key and the pair (p, q) as the private key.
Here, the choice of the bit length l and the generation of the primes p and q follow the same guidelines as discussed in connection with RSA key generation.
Encryption in the Rabin scheme involves a single modular squaring.
Algorithm 5.6  Rabin encryption
Input: The Rabin public key n of the recipient and the plaintext message m ∈ Z_n.
Output: The ciphertext message c.
Steps:
c := m^2 (mod n).
Return c.
Unfortunately, the Rabin encryption map m ↦ m^2 (mod n) is not injective. In general, a ciphertext c has four square roots modulo n.[1] This poses an ambiguity during decryption. In order to work around this difficulty, one adds some distinguishing feature or redundancy to the message m before encryption. One possibility is to duplicate a predetermined number of bits at the least significant end of m. This reduces the message space somewhat, but is rarely a serious issue. Only one of the (four) square roots of the ciphertext c is expected to have the desired redundancy. If none or more than one square root possesses the redundancy, decryption fails. However, this is a very rare phenomenon and can be ignored for all practical purposes.
[1] More specifically, if an element c ∈ Z_n is a square modulo both p and q, then the number of square roots of c equals 1 if c = 0; it is 2 if either c ≡ 0 (mod p) or c ≡ 0 (mod q) but not both; and it is 4 if c ≢ 0 (mod p) and c ≢ 0 (mod q). If c is not a square modulo either p or q, then c does not possess a square root modulo n. These assertions can be readily proved using the Chinese remainder theorem.
Rabin decryption (Algorithm 5.7) involves computing square roots modulo n. Since n is composite, this is a very difficult problem (for the eavesdropper). But the knowledge of the prime factors p and q of n allows the recipient to decrypt.
Algorithm 5.7  Rabin decryption
Input: The Rabin private key (p, q) of the recipient and the ciphertext message c.
Output: The recovered plaintext message m, or "failure".
Steps:
Compute the (at most four) square roots of c modulo n, using the knowledge of p and q.
if (c has exactly one square root m modulo n possessing the agreed redundancy) { Return m. } else { Return "failure". }
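The whole Rabin cycle, with a simple bit-duplication redundancy, can be sketched as follows. The primes are hypothetical toy values chosen ≡ 3 (mod 4), so that each square root modulo p or q comes from a single exponentiation; the general case needs a full modular square-root routine:

```python
# Toy Rabin encryption/decryption with bit-duplication redundancy.
p, q = 499, 547                      # hypothetical primes, both = 3 (mod 4)
n = p * q
R = 8                                # duplicate the low 8 bits of m

def pad(m):
    return (m << R) | (m & ((1 << R) - 1))

def encrypt(m):
    return pow(pad(m), 2, n)         # Algorithm 5.6: c := m^2 (mod n)

def decrypt(c):
    # all four square roots of c modulo n, combined via the CRT
    rp = pow(c, (p + 1) // 4, p)
    rq = pow(c, (q + 1) // 4, q)
    u = pow(q, -1, p)
    roots = set()
    for sp in (rp, (p - rp) % p):
        for sq in (rq, (q - rq) % q):
            roots.add(sq + ((u * (sp - sq)) % p) * q)
    # keep the candidates carrying the redundancy; exactly one is expected
    return [r >> R for r in roots if pad(r >> R) == r]

m = 201                              # pad(m) must be smaller than n
assert m in decrypt(encrypt(m))
```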
So far, we have encountered encryption algorithms that are deterministic in the sense that for a given public key of the recipient the same plaintext message encrypts to the same ciphertext message. In a probabilistic encryption algorithm, different calls of the encryption routine produce different ciphertext messages for the same plaintext message and public key.
The Goldwasser–Micali encryption algorithm is probabilistic and is based on the intractability of the quadratic residuosity problem (QRP) described in Exercise 4.2. If n is a composite integer and a an integer coprime to n, then the Jacobi symbol (a/n) = –1 implies that a is a quadratic non-residue modulo n. The converse does not hold, that is, one may have (a/n) = 1 even when a is a quadratic non-residue modulo n. For example, if n is the product of two distinct odd primes p and q, then a is a quadratic residue modulo n if and only if a is a quadratic residue modulo both p and q. However, if (a/p) = (a/q) = –1, we continue to have (a/n) = (a/p)(a/q) = 1. There is no easy way to find out whether a is a quadratic residue modulo n for an integer a with (a/n) = 1. If the factorization of n is available, the QRP is solvable in polynomial time. These observations lead to the design of the Goldwasser–Micali scheme.
The Goldwasser–Micali scheme works in the ring Z_n, where n is the product of two distinct sufficiently large primes. The integer a (resp. b) in Algorithm 5.8 can be found by randomly choosing elements of Z_p* (resp. Z_q*) and computing the Legendre symbol (a/p) (resp. (b/q)). Under the assumption that quadratic non-residues are randomly located in Z_p* and Z_q*, a and b can be found after only a few trials. The integer x is a quadratic non-residue modulo n with (x/n) = 1.
Goldwasser–Micali encryption (Algorithm 5.9) is probabilistic, since its output depends on a sequence of random elements ai of Z_n*. It generates a tuple (c1, . . . , cr) of elements of Z_n such that each (ci/n) = 1. If mi = 0, then ci is a quadratic residue modulo n, whereas if mi = 1, ci is a quadratic non-residue modulo n. Therefore, if the quadratic residuosity of ci modulo n can be computed, the bit mi can be determined. If one (for example, the recipient) knows the factorization of n or, equivalently, the prime factor p of n, one can perform decryption easily. An eavesdropper, on the other hand, must solve the QRP (or the IFP) in order to find out the bits m1, . . . , mr. This is how Goldwasser–Micali encryption derives its security.
Algorithm 5.8  Goldwasser–Micali key generation
Input: A bit length l.
Output: A random Goldwasser–Micali key pair.
Steps:
Generate two (different) random primes p and q, each of bit length l.
n := pq.
Find integers a and b with (a/p) = –1 and (b/q) = –1.
Compute an integer x with x ≡ a (mod p) and x ≡ b (mod q). /* Use CRT */
Return the pair (n, x) as the public key and the prime p as the private key.
Algorithm 5.9  Goldwasser–Micali encryption
Input: The Goldwasser–Micali public key (n, x) of the recipient and the plaintext message m = m1 . . . mr (a string of bits mi).
Output: The ciphertext message (c1, . . . , cr).
Steps:
for i = 1, . . . , r {
    Choose a random ai ∈ Z_n*.
    If mi = 0, set ci := ai^2 (mod n), else set ci := ai^2 x (mod n).
}
Return (c1, . . . , cr).
Since randomly chosen non-zero elements of Z_n are with high probability coprime to n, it is sufficient to draw ai from Z_n \ {0} and skip the check of whether gcd(ai, n) = 1. In fact, if an ai with gcd(ai, n) > 1 is somehow located, this gcd equals a non-trivial factor of n, and the security of the scheme is broken.
The Goldwasser–Micali scheme has the drawback that the length of the ciphertext message is much bigger than that of the plaintext message. Thus, for example, for a 1024-bit modulus n and a message m of bit length 64, the output requires a huge 65,536-bit space. This phenomenon is called message expansion and can be a serious limitation in certain circumstances.
Goldwasser–Micali decryption (Algorithm 5.10) recovers the bits of the plaintext message by computing Legendre symbols modulo the prime divisor p of n. The correctness of this decryption algorithm is evident from the discussion immediately following Algorithm 5.9.
Algorithm 5.10  Goldwasser–Micali decryption
Input: The Goldwasser–Micali private key p of the recipient and the ciphertext message (c1, . . . , cr).
Output: The recovered plaintext message m = m1 . . . mr.
Steps:
for i = 1, . . . , r {
    Compute the Legendre symbol (ci/p).
    Set mi := 0 if (ci/p) = 1, else set mi := 1.
}
Return m1 . . . mr.
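Algorithms 5.8 through 5.10 can be sketched together in Python. The primes are hypothetical toy values, and x is found by brute-force search rather than by random trials:

```python
import math, random

# Toy Goldwasser-Micali sketch (insecure toy primes).
p, q = 499, 547
n = p * q

def legendre(a, pr):
    # Euler's criterion: 1 for quadratic residues, pr - 1 for non-residues
    return pow(a, (pr - 1) // 2, pr)

# key generation: x must be a non-residue modulo both p and q
x = next(v for v in range(2, n)
         if legendre(v, p) != 1 and legendre(v, q) != 1)

def encrypt(bits):
    cs = []
    for m in bits:
        a = random.randrange(2, n)
        while math.gcd(a, n) != 1:   # draw a_i from Z_n^*
            a = random.randrange(2, n)
        c = (a * a) % n              # a quadratic residue
        if m:
            c = (c * x) % n          # a non-residue exactly when m_i = 1
        cs.append(c)
    return cs

def decrypt(cs):
    # c_i is a residue modulo p iff the encrypted bit was 0
    return [0 if legendre(c, p) == 1 else 1 for c in cs]

msg = [1, 0, 1, 1, 0]
assert decrypt(encrypt(msg)) == msg
```

Running encrypt twice on the same message gives different ciphertexts, which is exactly the probabilistic behaviour described above.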
The Blum–Goldwasser algorithm is another probabilistic encryption algorithm and is better than the Goldwasser–Micali algorithm in the sense that in this case the message expansion is by only a constant number of bits irrespective of the length of the plaintext message. The Blum–Goldwasser scheme is based on the intractability of the SQRTP (modulo a composite integer).
As in the case of the encryption algorithms discussed so far, the Blum–Goldwasser algorithm works in the ring Z_n, where n = pq is the product of two distinct primes p and q. Now, we additionally demand that p and q both be congruent to 3 modulo 4.
Algorithm 5.11  Blum–Goldwasser key generation
Input: A bit length l.
Output: A random Blum–Goldwasser key pair.
Steps:
Generate two (different) random primes p and q, each of bit length l and each congruent to 3 modulo 4.
n := pq.
Return n as the public key and the pair (p, q) as the private key.
Since p and q are two different primes, there exist integers u and v such that up + vq = 1. In order to speed up decryption, it is often expedient to store u and v along with p and q in the private key. Recall that the solution of the congruences x ≡ a (mod p) and x ≡ b (mod q) is given by x ≡ vqa + upb (mod n).
The Blum–Goldwasser encryption algorithm assumes that the input plaintext message m is in the form of a bit string, and breaks m into substrings of a fixed length t. A typical choice for t is t = ⌊lg lg n⌋, where n is the public key of the recipient. Write m = m1 . . . mr, where each mi is a bit string of length t. The ciphertext consists of r bit strings c1, . . . , cr, each of bit length t, and an element d of Z_n.
Algorithm 5.12  Blum–Goldwasser encryption
Input: The Blum–Goldwasser public key n of the recipient and the plaintext message m = m1 . . . mr, where each mi is a bit string of length t.
Output: The ciphertext message (c1, . . . , cr, d), where each ci is a bit string of length t and d ∈ Z_n.
Steps:
Choose a random element d ∈ Z_n*.
d := d^2 (mod n). /* start from a quadratic residue */
for i = 1, . . . , r {
    d := d^2 (mod n).
    ci := mi ⊕ (the t least significant bits of d).
}
d := d^2 (mod n).
Return (c1, . . . , cr, d).
Blum–Goldwasser encryption involves the computation of r modular squarings in Z_n and is quite fast (for example, faster than RSA encryption with a general encryption exponent). It makes sense to assume that the initial choice of d is from Z_n*, since finding a non-zero non-invertible element of Z_n is as difficult as factoring n.
For an intruder to determine the plaintext message m from the corresponding ciphertext message, the values of d inside the for loop are necessary. These can be obtained by taking repeated square roots modulo n. Since n is composite, this is a difficult problem. On the other hand, since the recipient knows the prime divisors p and q of n, taking square roots modulo n requires only polynomial-time effort.
Recall from Exercise 3.43 that a quadratic residue d ∈ Z_n* (where n is the public key of the recipient) has four distinct square roots, of which exactly one is again a quadratic residue modulo n. This distinguished square root y of d satisfies the congruences y ≡ d^((p+1)/4) (mod p) and y ≡ d^((q+1)/4) (mod q). In the decryption Algorithm 5.13, square roots are always taken in this distinguished sense.
Algorithm 5.13 assumes that each value of d is a quadratic residue modulo n. This can be verified by inserting in the for loop a check of whether d^((p–1)/2) ≡ 1 (mod p) and d^((q–1)/2) ≡ 1 (mod q), before an attempt is made to compute the square root of d modulo n. If (c1, . . . , cr, d) is a valid ciphertext message, this condition necessarily holds, and there is no point in wasting time checking obvious things. However, if there is a possibility that d has been altered by an (active) adversary (or corrupted during transmission), one may insert this check. In that case, the routine should report failure when asked to compute the square root of a quadratic non-residue modulo n.
Algorithm 5.13  Blum–Goldwasser decryption
Input: The Blum–Goldwasser private key (p, q) of the recipient and the ciphertext message (c1, . . . , cr, d), where each ci is a bit string of length t.
Output: The recovered plaintext message m = m1 . . . mr, where each mi is a bit string of length t.
Steps:
for i = r, r – 1, . . . , 1 {
    d := the distinguished square root of d modulo n (computed via d^((p+1)/4) (mod p), d^((q+1)/4) (mod q) and the CRT).
    mi := ci ⊕ (the t least significant bits of d).
}
Return m1 . . . mr.
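A compact sketch of Blum–Goldwasser encryption and decryption follows; the primes are hypothetical toy values, and the indexing of the squaring chain is one illustrative choice consistent with the description above:

```python
import math, random

# Toy Blum-Goldwasser sketch; hypothetical primes p = q = 3 (mod 4).
p, q = 499, 547
n = p * q
t = 4                                  # bits per message block

def encrypt(blocks):                   # blocks: list of t-bit integers
    x = random.randrange(2, n)
    while math.gcd(x, n) != 1:
        x = random.randrange(2, n)
    x = (x * x) % n                    # seed: a quadratic residue
    cs = []
    for m in blocks:
        x = (x * x) % n                # next element of the squaring chain
        cs.append(m ^ (x & ((1 << t) - 1)))   # XOR with the t low bits
    return cs, (x * x) % n             # d: one squaring past the last pad

def sqrt_qr(c, pr):
    # the distinguished (QR) square root modulo a prime pr = 3 (mod 4)
    return pow(c, (pr + 1) // 4, pr)

def decrypt(cs, d):
    u = pow(q, -1, p)
    ms = []
    for c in reversed(cs):
        dp, dq = sqrt_qr(d, p), sqrt_qr(d, q)
        d = dq + ((u * (dp - dq)) % p) * q     # step back one squaring, via CRT
        ms.append(c ^ (d & ((1 << t) - 1)))
    return ms[::-1]

msg = [0b1010, 0b0111, 0b0001]
assert decrypt(*encrypt(msg)) == msg
```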
The ElGamal encryption algorithm works in a group G in which it is difficult to solve the Diffie–Hellman problem (DHP). Typical candidates for G include the multiplicative group F_q* of a finite field F_q (usually q is a prime or a power of 2), the (additive) group of points on an elliptic curve over a finite field, and the (additive) group (called the Jacobian) of reduced divisors on a hyperelliptic curve over a finite field. Here we assume that G is multiplicatively written and has order n. It is not necessary for G to be cyclic, but we should have at our disposal an element g ∈ G with a suitably large (preferably prime) order k. We essentially work in the cyclic subgroup H of G generated by g (but using the arithmetic of G). For the ElGamal scheme, G (together with its representation), g, n and k are made public and can be shared by different entities on a network.
Generating a key pair for the ElGamal scheme (Algorithm 5.14) involves an exponentiation in G. In order to make the exponentiation efficient, the exponent (the private key) is often chosen to have a small number of 1 bits. However, if this number is too small, exhaustive search by an adversary may become feasible.
If the DLP can be solved in G, the private key d can be computed from the public key gd. This amounts to breaking a system based on this key pair. This is why we often say that the security of the ElGamal encryption scheme banks on the intractability of the DLP. But as we see shortly, the DHP is the more fundamental computational problem that dictates the security of ElGamal encryption.
Algorithm 5.14  ElGamal key generation
Input: G, g and k as defined above.
Output: A random ElGamal key pair.
Steps:
Generate a random integer d, 2 ≤ d ≤ k – 1.
Return g^d as the public key and d as the private key.
Given a message m ∈ G, the ElGamal encryption procedure (Algorithm 5.15) generates a pair (r, s) of elements of G as the ciphertext message and thus corresponds to a message expansion by a factor of 2. Clearly, the sender has all the relevant information for computing (r, s). The need for using a different session key for each encryption is explained in Exercise 5.6.
Algorithm 5.15  ElGamal encryption
Input: (G, g, k and) the ElGamal public key g^d of the recipient and the plaintext message m ∈ G.
Output: The ciphertext message (r, s).
Steps:
Generate a (random) session key d′, 2 ≤ d′ ≤ k – 1.
r := g^(d′).
s := m g^(dd′) = m (g^d)^(d′).
Return (r, s).
Notice that ElGamal encryption uses two exponentiations in G to exponents which are O(k). Therefore, the running time of Algorithm 5.15 reduces if smaller values of k are selected. On the other hand, if k is too small, the square-root methods in H = 〈g〉 may become efficient (see Section 4.4.1). In practice, it is recommended that k be a prime of bit length 160 or more.
ElGamal decryption involves an exponentiation in G to an exponent which is O(k). It is easy to verify that Algorithm 5.16 performs decryption correctly and that the recipient has the necessary information to carry out decryption.
Algorithm 5.16  ElGamal decryption
Input: (G, g, k and) the ElGamal private key d of the recipient and the ciphertext message (r, s).
Output: The recovered plaintext message m.
Steps:
m := s r^(–d) = s r^(k–d).
Return m.
An eavesdropper Carol knows the domain parameters G, g, k and n, and also the recipient's public key g^d. Determining the message m from a knowledge of the corresponding ciphertext (r, s) is then equivalent to computing the element g^(dd′). This implies that a (quick) solution of the DHP permits Carol to decrypt a ciphertext. If a (quick) solution of the DLP is available, then the element g^(dd′) can be computed fast. The reverse implication is, however, not clear: it may be easier to solve the DHP than the DLP, though no concrete evidence is available to corroborate this possibility.
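For the concrete choice of G as a prime-order subgroup of Z_p*, the ElGamal routines can be sketched as follows (hypothetical toy parameters):

```python
import random

# Toy ElGamal in a prime-order subgroup of Z_p^* (insecure sizes).
p = 2039                  # a safe prime: p = 2k + 1 with k prime
k = 1019                  # order of the subgroup H = <g>
g = 4                     # a square modulo p, hence an element of order k
assert pow(g, k, p) == 1 and g != 1

d = random.randrange(2, k)          # private key (Algorithm 5.14)
pk = pow(g, d, p)                   # public key g^d

def encrypt(m):                     # m must lie in the subgroup H
    d1 = random.randrange(2, k)     # fresh session key d'
    return pow(g, d1, p), (m * pow(pk, d1, p)) % p   # (r, s)

def decrypt(r, s):
    return (s * pow(r, k - d, p)) % p    # r^{-d} = r^{k-d}, since r^k = 1

m = pow(g, 77, p)                   # a sample message inside H
assert decrypt(*encrypt(m)) == m
```

Encrypting the same m twice yields different pairs (r, s), illustrating the role of the session key d′.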
The Chor–Rivest encryption algorithm is based on a variant of the subset sum problem. It selects a prime p and an integer h ≥ 2, uses a knapsack set A = {a0, . . . , a_(p–1)} with 1 ≤ ai ≤ p^h – 2 for each i, and considers sums of the form

s = ∊0 a0 + ∊1 a1 + · · · + ∊_(p–1) a_(p–1),

with each ∊i ∈ {0, 1}. In order to construct the set A for which the h-fold sum s is uniquely determined by the binary vector (∊0, . . . , ∊_(p–1)) of weight h (that is, with exactly h bits equal to 1), we take the help of the finite field F_(p^h). We represent F_(p^h) as F_p[X]/〈f(X)〉, where f(X) ∈ F_p[X] is irreducible of degree h and where x is the residue class of X in F_p[X]/〈f(X)〉. The parameters p and h must be so chosen that p^h – 1 is reasonably smooth, so that the integer factorization of p^h – 1 can be easily computed. This helps us in two ways. First, a generator g(x) of the multiplicative group F_(p^h)* can be made available quickly using Algorithm 3.25. Second, the Pohlig–Hellman method of Section 4.4.1 becomes efficient for computing discrete logarithms in F_(p^h)*. We can then take ai := ind_(g(x))(x + i), i = 0, 1, . . . , p – 1. If (∊0, . . . , ∊_(p–1)) and (∊′0, . . . , ∊′_(p–1)) are two binary vectors of weight h, then Σi ∊i ai ≡ Σi ∊′i ai (mod p^h – 1) implies g(x)^(Σ ∊i ai) = g(x)^(Σ ∊′i ai), that is, Π_i (x + i)^(∊i) = Π_i (x + i)^(∊′i), that is, ∊i = ∊′i for all i = 0, . . . , p – 1, since otherwise x would satisfy a non-zero polynomial of degree < h.
A randomly permuted version of a0, . . . , ap–1 shifted by a noise (that is, a random bias) d together with p and h constitute the public key of the Chor–Rivest scheme. The private key, on the other hand, comprises the polynomials f(X) and g(x), the permutation just mentioned and the noise d. Algorithm 5.17 elaborates the generation of such a key pair. The same values of p and h can be used by different entities on a network. So we assume that p and h are provided instead of generated by the recipient as a part of his public key. For brevity, we use the notation q := ph.
Key generation may be a long process in the Chor–Rivest scheme depending on how difficult it is to compute all the indexes ind_(g(x))(x + i). Furthermore, the size of the public key is quite large, namely O(ph log p). Typically one may take p ≈ 200 and h ≈ 25. The original paper of Chor and Rivest [54] recommends the possibilities (197, 24), (211, 24), (243, 24) and (256, 25) for (p, h). Note that 243 = 3^5 and 256 = 2^8 are not primes, but the Chor–Rivest algorithm works even when p is a power of a prime. For the sake of simplicity, we here stick to the case that p is a prime.
Algorithm 5.17  Chor–Rivest key generation
Input: A prime p and an integer h ≥ 2 such that p^h – 1 is smooth.
Output: A Chor–Rivest key pair.
Steps:
Choose an irreducible polynomial f(X) ∈ F_p[X] of degree h.
Use the representation F_q = F_p[X]/〈f(X)〉, with x the residue class of X.
Choose a random generator g(x) of F_q*.
Compute the indexes ai := ind_(g(x))(x + i) for i = 0, 1, . . . , p – 1.
Select a random permutation π of {0, 1, . . . , p – 1}.
Select a random noise d in the range 0 ≤ d ≤ q – 2.
Compute αi := a_(π(i)) + d (mod q – 1) for i = 0, 1, . . . , p – 1.
Return (α0, α1, . . . , α_(p–1)) as the public key and (f, g, π, d) as the private key.
The Chor–Rivest encryption procedure (Algorithm 5.18) assumes that the input plaintext message is represented as a binary vector (m0, . . . , m_(p–1)) of weight (that is, number of one-bits) equal to h. Since there are C(p, h) such binary vectors, arbitrary binary strings of bit length ⌊lg C(p, h)⌋ can be encoded into binary vectors of this special form. See Chor and Rivest [54] for an algorithm describing how such an encoding can be done. Chor–Rivest encryption is quite fast, since it computes only h integer additions modulo q – 1.
Algorithm 5.18  Chor–Rivest encryption
Input: The Chor–Rivest public key (α0, . . . , α_(p–1)) (together with p and h) and the plaintext message (m0, . . . , m_(p–1)), a binary vector of weight h.
Output: The ciphertext message c.
Steps:
c := Σ_(i=0)^(p–1) mi αi (mod q – 1).
Return c.
The Chor–Rivest decryption procedure (Algorithm 5.19) generates a monic polynomial v(X) ∈ F_p[X] of degree h, the h (distinct) roots of which give the non-zero bits mi in the original plaintext message.
In order to prove that the decryption works correctly, note that s = c – hd ≡ Σ_(i : mi = 1) a_(π(i)) (mod q – 1), so that u(X) ≡ g(X)^s ≡ Π_(i : mi = 1) (X + π(i)) (mod f(X)). The polynomial u(X) is computed as one of degree < h. Adding f(X) to u(X) gives a monic polynomial v(X) of degree h, which is congruent modulo f(X) to Π_(i : mi = 1) (X + π(i)) and hence equal to it. The roots of v(X) can be obtained either by a root-finding algorithm or by trial divisions of v(X) by X + i, i = 0, 1, . . . , p – 1. Applying the inverse of π to these roots then reconstructs the plaintext message.
Algorithm 5.19  Chor–Rivest decryption
Input: The Chor–Rivest private key (f, g, π, d) (together with p and h) and the ciphertext message c.
Output: The recovered plaintext message (m0, . . . , m_(p–1)), a binary vector of weight h.
Steps:
s := c – hd (mod q – 1).
u(X) := g(X)^s (mod f(X)).
v(X) := f(X) + u(X).
Factorize v(X) as v(X) = (X + i1) · · · (X + ih).
For i = 0, 1, . . . , p – 1, set mi := 1 if π(i) ∈ {i1, . . . , ih}, and mi := 0 otherwise.
Return (m0, . . . , m_(p–1)).
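The entire Chor–Rivest cycle fits into a short sketch for the toy parameters p = 7, h = 2 (hypothetical and utterly insecure; real parameters are p ≈ 200, h ≈ 25). The discrete logarithms are tabulated by brute force instead of Pohlig–Hellman:

```python
import itertools, random

# Toy Chor-Rivest cycle with p = 7, h = 2.
# F_49 is realized as F_7[X]/(X^2 + 1); an element a0 + a1*x is the pair (a0, a1).
p, h = 7, 2
q = p ** h

def fmul(a, b):                       # multiplication in F_49, using x^2 = -1
    a0, a1 = a
    b0, b1 = b
    return ((a0 * b0 - a1 * b1) % p, (a0 * b1 + a1 * b0) % p)

def order(a):                         # multiplicative order, by brute force
    t, k = a, 1
    while t != (1, 0):
        t = fmul(t, a)
        k += 1
    return k

g = next(a for a in itertools.product(range(p), repeat=2)
         if a != (0, 0) and order(a) == q - 1)    # a generator of F_49^*

ind, t = {}, (1, 0)                   # brute-force discrete-log table
for e in range(q - 1):
    ind[t] = e
    t = fmul(t, g)

a = [ind[(i, 1)] for i in range(p)]   # a_i = ind_g(x + i)
pi = list(range(p)); random.shuffle(pi)
d = random.randrange(q - 1)           # the noise
alpha = [(a[pi[i]] + d) % (q - 1) for i in range(p)]   # public key

def encrypt(m):                       # m: binary vector of weight h
    return sum(mi * ai for mi, ai in zip(m, alpha)) % (q - 1)

def decrypt(c):
    s = (c - h * d) % (q - 1)
    u = (1, 0)
    for _ in range(s):                # u = g^s = product of (x + pi(i)) over 1-bits
        u = fmul(u, g)
    u0, u1 = u
    # v(X) = f(X) + u(X) = X^2 + u1*X + (u0 + 1); its roots are -pi(i)
    js = [j for j in range(p) if (j * j - u1 * j + u0 + 1) % p == 0]
    m = [0] * p
    for j in js:
        m[pi.index(j)] = 1            # apply the inverse permutation
    return m

msg = [0, 1, 0, 0, 1, 0, 0]           # weight h = 2
assert decrypt(encrypt(msg)) == msg
```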
An eavesdropper sees only the sum c = Σ_i mi αi (mod q – 1) of the (known) knapsack weights α0, . . . , α_(p–1). In order to recover m0, . . . , m_(p–1), she should solve the SSP. By choosing p and h carefully, the density of the knapsack set can be adjusted to be high, that is, larger than what the cryptanalytic routines described in Section 4.8 can handle. Thus, the Chor–Rivest scheme is assumed to be secure. However, as discussed in Chor and Rivest [54], the security of the system breaks down when certain partial information on the private key is available.
XTR, a phonetic abbreviation derived from ECSTR (efficient and compact subgroup trace representation), is designed by Arjen Lenstra and Eric Verheul as an attractive alternative to RSA (and similar cryptosystems, including the ElGamal scheme over finite fields) and to elliptic curve cryptosystems (ECC). The attractiveness of XTR arises from the following facts:
XTR runs (about three times) faster than RSA or ECC.
XTR has shorter keys (comparable with ECC).
The security of XTR is based on the DLP/DHP over finite fields of sufficiently big sizes and not on a new allegedly difficult computational problem.
The parameter and key generation for XTR is orders of magnitude faster than that for RSA/ECC.
XTR, though not a fundamental breakthrough, deserves treatment in this chapter. The working of XTR is somewhat involved and we plan to present only a conceptual description of the algorithm, hiding the mathematical details.
XTR considers the following tower of field extensions:

F_p ⊂ F_(p^2) ⊂ F_(p^6),

where p ≡ 2 (mod 3) is a prime, sufficiently large that computing discrete logs in F_(p^6)* using known algorithms is infeasible. We have p^6 – 1 = (p – 1)(p + 1)(p^2 – p + 1)(p^2 + p + 1). Let q be a prime divisor of p^2 – p + 1 of bit length 160 or more. There is a unique subgroup G of F_(p^6)* with #G = q. G is called the XTR (sub)group, whereas the entire subgroup of F_(p^6)* of order p^2 – p + 1 is called the XTR supergroup. The XTR group G is cyclic (Lemma 2.1, p 27). Let g be a generator of G, that is, G = 〈g〉 = {1, g, g^2, . . . , g^(q–1)}.
The working of XTR is based on the discrete log problem in G. Since p^2 – p + 1, and hence q, is relatively prime to the orders of the multiplicative groups of all proper subfields of F_(p^6), computing discrete logs in G is (seemingly) as difficult as in F_(p^6)*, that is, one gets the same level of security by the use of G instead of the full XTR supergroup.
The main technical innovation of XTR is the proposal of a compact representation of the elements of G in place of the obvious representation using ⌈6 lg p⌉ bits inherited from that of F_(p^6). This is precisely where the intermediate field F_(p^2) comes into the picture. We require a map G → F_(p^2), so that we can represent elements of G by those of F_(p^2). This map offers two benefits. First, the elements of G can now be represented using ⌈2 lg p⌉ bits, leading to a three-fold reduction in the key size. Second, the arithmetic of F_(p^2) can be exploited to implement the arithmetic in G, thereby improving the efficiency of the encryption and decryption routines (compared to those over the full XTR supergroup).
The map uses the traces of the elements of F_(p^6) over F_(p^2) (Definition 2.59). In this section, we use the shorthand notation Tr to stand for the trace function from F_(p^6) to F_(p^2). The conjugates of an element h ∈ F_(p^6) over F_(p^2) are h, h^(p^2), h^(p^4), and so

Tr(h) = h + h^(p^2) + h^(p^4) ∈ F_(p^2).
Let us now specialize to h = g^n ∈ G. Since p^2 ≡ p – 1 (mod p^2 – p + 1) and p^4 ≡ –p (mod p^2 – p + 1), the conjugates of h are g^n, g^((p–1)n), g^(–pn). Thus, Tr(g^n) = g^n + g^((p–1)n) + g^(–pn). Moreover,

g^n · g^((p–1)n) · g^(–pn) = g^0 = 1  and  g^n g^((p–1)n) + g^n g^(–pn) + g^((p–1)n) g^(–pn) = g^(pn) + g^((1–p)n) + g^(–n) = Tr(g^n)^p,

so the minimal polynomial of h = g^n over F_(p^2) is

X^3 – Tr(g^n) X^2 + Tr(g^n)^p X – 1.

This minimal polynomial is determined uniquely by Tr(g^n), and so we can represent g^n by Tr(g^n). Note, however, that this representation is not unique, that is, the map G → F_(p^2), g^n ↦ Tr(g^n), is not injective. More precisely, the only elements of G that map to Tr(g^n) are the conjugates g^n, g^((p–1)n), g^(–pn) of g^n. This is often not a serious problem, as we see below.
In order to complete the description of the implementation of the arithmetic of the group G, we need to address two further issues, since the trace representation Tr : G → F_(p^2) defined above is not a homomorphism of groups. First, we specify how one can implement the arithmetic of F_(p^2). Since p ≡ 2 (mod 3), X^2 + X + 1 is irreducible over F_p. If α ∈ F_(p^2) is a root of X^2 + X + 1, we have the standard representation F_(p^2) = {y0 + y1 α | y0, y1 ∈ F_p}. Since 1 + α + α^2 = 0, we have y0 + y1 α = (–α – α^2) y0 + y1 α = (y1 – y0) α + (–y0) α^2. This leads to the non-standard representation

F_(p^2) = {x1 α + x2 α^2 | x1, x2 ∈ F_p}.

Since p ≡ 2 (mod 3) and α^3 = 1 + (α – 1)(α^2 + α + 1) = 1, the F_p-basis {α, α^2} of F_(p^2) is the same as the normal basis {α, α^p}. Under this basis, the basic arithmetic operations in F_(p^2) can be implemented using only a few multiplications (and some additions/subtractions) in F_p, as described in Table 5.1. Here, the operands are x = x1 α + x2 α^2, y = y1 α + y2 α^2 and z = z1 α + z2 α^2.
| Operation | Number of multiplications |
| xp | 0 (since xp = x2α + x1α2.) |
| x2 | 2 (since x2 = x2(x2 – 2x1)α + x1(x1 – 2x2)α2.) |
| xy | 3 (since xy = (x2y2–x1y2–x2y1)α + (x1y1–x1y2–x2y1)α2, that is, it suffices to compute x1y1, x2y2, (x1 + x2)(y1 + y2).) |
| xz – yzp | 4 (since xz – yzp = (z1(y1 – x2 – y2) + z2(x2 – x1 + y2))α + (z1(x1 – x2 + y1) + z2(y2 – x1 – y1))α2.) |
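The operations of Table 5.1 can be sketched directly (illustrative Python; the small prime p ≡ 2 (mod 3) is hypothetical):

```python
# Arithmetic in F_{p^2} under the basis {alpha, alpha^2}, following Table 5.1.
p = 1481                                   # hypothetical prime, p = 2 (mod 3)

def frob(x):
    # x^p: costs no multiplications, just a swap of the coordinates
    return (x[1], x[0])

def sqr(x):
    # x^2 with 2 multiplications in F_p
    x1, x2 = x
    return (x2 * (x2 - 2 * x1) % p, x1 * (x1 - 2 * x2) % p)

def mul(x, y):
    # x*y with 3 multiplications: x1*y1, x2*y2 and (x1 + x2)*(y1 + y2)
    x1, x2 = x
    y1, y2 = y
    a, b = x1 * y1, x2 * y2
    m = (x1 + x2) * (y1 + y2) - a - b      # = x1*y2 + x2*y1
    return ((b - m) % p, (a - m) % p)

# consistency checks
x = (123, 456)
assert sqr(x) == mul(x, x)
assert frob(frob(x)) == x                  # applying Frobenius twice is identity
```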
Now, we explain how arithmetic operations in G translate to those in F_(p^2) under the representation of g^n by Tr(g^n). To start with, we show how the knowledge of Tr(h) and n allows one to compute Tr(h^n) for h ∈ G. This corresponds to an exponentiation in G. For c ∈ F_(p^2), define the polynomial

F_c(X) := X^3 – c X^2 + c^p X – 1,

and let h1, h2, h3 be the three roots (not necessarily distinct) of F_c(X). For an integer n, we use the notation

c_n := h1^n + h2^n + h3^n.

Putting c = Tr(g) yields c_n = Tr(g^n) or, more generally, for c = Tr(g^k) we have c_n = Tr(g^(kn)). Algorithm 5.20 computes

S_n(c) := (c_(n–1), c_n, c_(n+1)),

given c ∈ F_(p^2) (for example, Tr(g^k)) and an integer n (typically n ≥ 1). The correctness of the algorithm is based on the following identities, the derivations of which are left to the reader (alternatively, see Lenstra and Verheul [170]).
Equation 5.1

c_0 = 3

Equation 5.2

c_1 = c

Equation 5.3

c_2 = c^2 – 2c^p

Equation 5.4

c_(–n) = c_(np) = c_n^p

Equation 5.5

c_(2n) = c_n^2 – 2c_n^p

Equation 5.6

c_(n+2) = c · c_(n+1) – c^p · c_n + c_(n–1)

Equation 5.7

c_(2n–1) = c_(n–1) · c_n – c^p · c_n^p + c_(n+1)^p

Equation 5.8

c_(2n+1) = c_n · c_(n+1) – c · c_n^p + c_(n–1)^p
Algorithm 5.20  Computing c_n from c
Input: c ∈ F_(p^2) and an integer n.
Output: S_n(c) = (c_(n–1), c_n, c_(n+1)); in particular, c_n.
Steps:
if (n < 0) { compute S_(–n)(c) and convert using c_(–n) = c_n^p. }
Initialize S_1(c) := (3, c, c^2 – 2c^p). /* c_0 = 3, c_1 = c, c_2 = c^2 – 2c^p */
Scan the bits of n from the most significant to the least significant, maintaining S_m(c) for the prefix m scanned so far: use Equations 5.1–5.8 to pass from S_m(c) to S_(2m)(c) or to S_(2m+1)(c), depending on the scanned bit.
Return the middle component c_n of S_n(c).
A careful analysis suggests that the computation of c_n from c requires 8 lg n multiplications in F_p. An exponentiation in F_(p^6), on the other hand, requires an expected number of 23.4 lg n multiplications in F_p (assuming that, in F_(p^6), the time for squaring is 80 per cent of that for multiplication). Thus, the XTR representation provides a speed-up of about 3.
The domain parameters for an XTR cryptosystem include primes p and q satisfying the following requirements:
|q| ≥ 160 (where |a| = ⌈lg a⌉ denotes the bit size of a positive integer a).
|p6| ≥ 1024.
p ≡ 2 (mod 3).
q|(p2 – p + 1).
We require a generator g of the XTR group G. Since we planned to replace working in G by working in F_(p^2), the element g is not needed explicitly. The trace Tr(g) suffices for our purpose. Lenstra and Verheul [170, 172] describe several methods for obtaining the domain parameters p, q, Tr(g). We describe here the naivest strategies. Algorithm 5.21 outputs the primes p, q with |p| = lp and |q| = lq for some given lp and lq.
Algorithm 5.21  Generation of the XTR primes p and q
Input: Bit lengths lp and lq.
Output: Primes p and q satisfying the above requirements.
Steps:
Randomly choose a prime q with |q| = lq and q ≡ 1 (mod 3), and compute a root r of X^2 – X + 1 modulo q.
Randomly choose a prime p with |p| = lp, p ≡ 2 (mod 3) and p ≡ r (mod q). /* Then q | (p^2 – p + 1), since r^2 – r + 1 ≡ 0 (mod q). */
Return (p, q).
Determination of Tr(g) for a suitable g requires some mathematics. First, notice that if the polynomial F_c(X) is irreducible (over F_(p^2)) for some c ∈ F_(p^2), then c = Tr(h) for some h ∈ F_(p^6)* with ord h | (p^2 – p + 1). Moreover, c_((p^2–p+1)/q), if not equal to 3, is the trace of an element (for example, h^((p^2–p+1)/q)) of order q. Thus, we may take Tr(g) = c_((p^2–p+1)/q). Although we do not need it explicitly, the corresponding g can be taken to be any root of the polynomial F_(Tr(g))(X).

What remains to explain is how one can find an irreducible F_c(X). A randomized algorithm results from the fact that for a randomly chosen c ∈ F_(p^2) the polynomial F_c(X) is irreducible with probability ≈ 1/3.
Once the domain parameters of an XTR system are set, the recipient chooses a random d, 2 ≤ d ≤ q – 2, and computes Tr(g^d) using Algorithm 5.20. The tuple (p, q, Tr(g), Tr(g^d)) is the public key and d the private key of the recipient.
XTR encryption (Algorithm 5.22) is very similar to ElGamal encryption. The only difference is that we now work in F_(p^2) under the trace representation of the elements of G, that is, one uses Algorithm 5.20 for computing exponentiations in G.
Algorithm 5.22  XTR encryption
Input: The public key (p, q, Tr(g), Tr(g^d)) of the recipient and the message m ∈ F_(p^2).
Output: The ciphertext message (r, s).
Steps:
Generate a random session key d′, 2 ≤ d′ ≤ q – 2.
Compute r := Tr(g^(d′)) using Algorithm 5.20 with c := Tr(g) and n := d′.
Compute Tr(g^(dd′)) using Algorithm 5.20 with c := Tr(g^d) and n := d′.
Set s := m Tr(g^(dd′)).
Return (r, s).
XTR decryption (Algorithm 5.23) is again analogous to ElGamal decryption except that we have to incorporate the XTR representation of elements of G.
Algorithm 5.23: XTR decryption
Input: The private key d of the recipient and the ciphertext (r, s).
Output: The recovered plaintext message m.
Steps:
Compute Tr(g^(dd′)) using Algorithm 5.20 with c := r = Tr(g^d′) and n := d.
Set m := s · Tr(g^(dd′))^(−1).
Note that XTR encryption and decryption use Algorithm 5.20 for performing exponentiations. Therefore, these routines run about three times faster than the corresponding ElGamal routines based on standard GF(p^6) arithmetic.
Hoffstein et al. [130] have proposed the NTRU encryption scheme, in which encryption involves a mixing system based on the polynomial algebra Z[X]/(X^n − 1) and reductions modulo two relatively prime integers α and β. Decryption involves an unmixing system and can be proved to be correct with high probability. The security of this scheme rests on the interaction of the mixing system with the independence of the reductions modulo α and β. Attacks against NTRU based on the determination of short vectors in certain lattices are known. However, suitable choices of the parameters make NTRU resistant to these attacks. The most attractive feature of the NTRU scheme is that its encryption and decryption are much faster than those of other known schemes (like RSA, ECC and even XTR).
NTRU parameters include three positive integers n, α and β with gcd(α, β) = 1 and with β considerably larger than α (see Table 5.2). Consider the polynomial algebra R := Z[X]/(X^n − 1). An element of R is represented as a polynomial f = f0 + f1·X + ··· + f(n−1)·X^(n−1) or, equivalently, as the vector (f0, f1, . . . , f(n−1)) of its coefficients. Note that X^n − 1 is not irreducible in Z[X] (for n ≥ 2) and so R is not a field, but that does not matter for the NTRU scheme. For two polynomials f, g of degree < n and with integer coefficients, we denote by fg the product of f and g in Z[X], whereas f and g as elements of R multiply to f ⊛ g = h with
h_k = Σ_(i+j ≡ k (mod n)) f_i·g_j   for k = 0, 1, . . . , n − 1.
| Security | n | α | β | νf | νg | νu |
| short-term | 107 | 3 | 64 | 15 | 12 | 5 |
| moderate | 167 | 3 | 128 | 61 | 20 | 18 |
| standard[*] | 263 | 3 | 128 | 50 | 24 | 16 |
| high | 503 | 3 | 256 | 216 | 72 | 55 |
[*] Assumed to be equivalent to 1024-bit RSA
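The cyclic convolution product ⊛ described above is straightforward to implement. The following Python sketch (toy code, not an optimized or secure implementation) accumulates each product f_i·g_j into position (i + j) rem n, optionally reducing the result modulo a given modulus q:

```python
def star(f, g, n, q=None):
    """Cyclic convolution h = f (*) g in Z[X]/(X^n - 1).

    f, g are coefficient lists of length <= n; coefficient h_k
    collects all products f_i * g_j with i + j = k (mod n).
    If q is given, the result is reduced modulo q.
    """
    h = [0] * n
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            h[(i + j) % n] += fi * gj
    if q is not None:
        h = [c % q for c in h]
    return h

# (1 + X^2)(1 + X) = 1 + X + X^2 + X^3 = 2 + X + X^2 in Z[X]/(X^3 - 1)
assert star([1, 0, 1], [1, 1, 0], 3) == [2, 1, 1]
```

This quadratic-time loop matches the O(n^2) cost mentioned later for NTRU encryption; FFT-based multiplication would be asymptotically faster but is unnecessary for the parameter sizes in Table 5.2.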
NTRU works with polynomials having small coefficients. More specifically, we define the following subsets of R. The message space (that is, the set of plaintext messages) M consists of all polynomials of R with coefficients reduced modulo α. Unlike our representation of Z_α so far, we use the integers between −α/2 and +α/2 to represent the coefficients of polynomials in M, that is,
M := { f0 + f1·X + ··· + f(n−1)·X^(n−1) ∈ R | −α/2 < f_i ≤ α/2 for all i }.
For ν1, ν2 ∈ N, we also define the subset
L(ν1, ν2) := { f ∈ R | exactly ν1 coefficients of f are 1, exactly ν2 coefficients are −1, and all remaining coefficients are 0 }
of R. For suitably chosen parameters νf, νg and νu (see Table 5.2), we use the special notations
Lf := L(νf, νf − 1),   Lg := L(νg, νg),   Lu := L(νu, νu).
With these notations we are now ready to describe the NTRU key-generation routine. The subsets M, Lf, Lg and Lu are assumed to be public knowledge (along with the parameters n, α and β).
Algorithm 5.24: NTRU key generation
Input: n, α, β and the subsets Lf, Lg.
Output: A random NTRU key pair.
Steps:
Choose random f ∈ Lf and g ∈ Lg. /* f must be invertible modulo both α and β */
Compute fα and fβ satisfying fα ⊛ f ≡ 1 (mod α) and fβ ⊛ f ≡ 1 (mod β).
h := fβ ⊛ g (mod β).
Return h as the public key and f (along with fα) as the private key.
The polynomial fα can be computed from f during decryption. However, for the sake of efficiency, it is recommended that fα be stored along with f.
The integers α and β are either small primes or small powers of small primes (Table 5.2). The most time-consuming step in the NTRU key-generation procedure is the computation of the inverses fα and fβ. Suppose we want to compute the inverse of f in Z_(p^e)[X]/(X^n − 1), where p is a small prime and e is a small exponent (we may have e = 1). We first compute f(X)^(−1) in the ring Z_p[X]/(X^n − 1). Since p is a prime, Z_p is a field, that is, Z_p[X] is a Euclidean domain (Exercise 2.31). We compute the extended Euclidean gcd of f(X) with X^n − 1. If f(X) and X^n − 1 are not coprime modulo p, then f(X) is not invertible in Z_p[X]/(X^n − 1); else we get polynomials s(X), t(X) with s(X)f(X) + t(X)(X^n − 1) ≡ 1 (mod p), and s(X) is the inverse of f(X) in Z_p[X]/(X^n − 1). A randomly chosen f(X) with gcd(f(1), p) = 1 has a high probability of being invertible modulo p. Recall that we have chosen f ∈ Lf = L(νf, νf − 1), so that f(1) = 1.
If e = 1, we have already computed the desired inverse of f(X). If e > 1, we have to lift the inverse f_p(X) = s(X) of f(X) modulo p to the inverse f_(p^2)(X) of f(X) modulo p^2, then to the inverse f_(p^3)(X) of f(X) modulo p^3, and so on. Eventually, we get the inverse f_(p^e)(X) of f(X) modulo p^e. Here we describe the generic lift procedure from f_(p^k)(X) to f_(p^(k+1))(X). In the ring Z[X]/(X^n − 1), we have f_(p^k) ⊛ f ≡ 1 (mod p^k). We can write f_(p^(k+1))(X) = f_(p^k)(X) + p^k·a(X) for some a(X) with coefficients in Z_p. Substituting this value in f_(p^(k+1)) ⊛ f ≡ 1 (mod p^(k+1)) gives the unknown polynomial a(X) as
a(X) ≡ s(X) ⊛ ((1 − f(X) ⊛ f_(p^k)(X)) / p^k) (mod p),
where s(X) = f_p(X) is the inverse of f modulo p.
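The lifting step translates directly into code. The following Python sketch (illustrative only; the helper star implements the ⊛ product over Z) lifts an inverse of f modulo p to an inverse modulo p^e exactly as described above:

```python
def star(f, g, n):
    # cyclic convolution in Z[X]/(X^n - 1), integer coefficients
    h = [0] * n
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            h[(i + j) % n] += fi * gj
    return h

def lift_inverse(f, s, n, p, e):
    """Lift s = f^(-1) (mod p) in Z[X]/(X^n - 1) to an inverse modulo p^e.

    Implements f_(p^(k+1)) = f_(p^k) + p^k * a with
    a = s (*) ((1 - f (*) f_(p^k)) / p^k) (mod p).
    """
    inv = [c % p for c in s]
    for k in range(1, e):
        pk = p ** k
        prod = star(f, inv, n)                       # f (*) f_(p^k) over Z
        r = [((1 if i == 0 else 0) - prod[i]) // pk  # exact division by p^k
             for i in range(n)]
        a = [c % p for c in star(s, r, n)]
        inv = [(inv[i] + pk * a[i]) % (p ** (k + 1)) for i in range(n)]
    return inv
```

For example, with n = 3 and p = 3, the polynomial 1 + X has inverse 2 + X + 2X^2 modulo 3, and one lifting step yields its inverse modulo 9.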
It is often recommended that f(X) be taken of the form f(X) = 1 + α·f1(X) for some polynomial f1(X) with small coefficients. In this case, fα(X) = 1 is trivially available and need not be computed as mentioned above. Such a choice of f also speeds up NTRU decryption (see Algorithm 5.26) by reducing the number of polynomial multiplications from two to one. The inverse fβ, however, has to be computed (but need not be stored).
For NTRU encryption (Algorithm 5.25), the message is encoded as a polynomial in M. The costliest step in this algorithm is computing the product u ⊛ h, which can be done in time O(n^2). An asymptotically better running time (O(n log n)) is achievable by Algorithm 5.25 if one uses faster polynomial-multiplication routines (like those based on fast Fourier transforms). However, for the cryptographic range of values of n, straightforward quadratic multiplication gives better performance. Most other encryption schemes (like RSA) take time O(n^3), where n is the size of the modulus. This explains why NTRU encryption is much faster than conventional encryption routines.
Algorithm 5.25: NTRU encryption
Input: (n, α, β and) the NTRU public key h of the recipient and the plaintext message m ∈ M.
Output: The ciphertext c, which is a polynomial in R, reduced modulo β.
Steps:
Randomly select u ∈ Lu.
c := αu ⊛ h + m (mod β).
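The single arithmetic step of the encryption routine can be sketched in Python as follows (toy code; star is the ⊛ product with coefficient reduction modulo the given modulus, and all parameter values in the test below are illustrative only):

```python
def star(f, g, n, q):
    # cyclic convolution in Z[X]/(X^n - 1), coefficients reduced modulo q
    h = [0] * n
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            h[(i + j) % n] = (h[(i + j) % n] + fi * gj) % q
    return h

def ntru_encrypt(u, h, m, n, alpha, beta):
    """c := alpha * (u (*) h) + m (mod beta), the encryption step above."""
    uh = star(u, h, n, beta)
    return [(alpha * uh[i] + m[i]) % beta for i in range(n)]
```

A tiny sanity check: with n = 3, h = 1 (the constant polynomial), u = X and m = 1 + X + X^2, the ciphertext is 1 + 4X + X^2 modulo β = 32.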
NTRU decryption (Algorithm 5.26) involves two multiplications in R and runs in time O(n^2). In order to prove the correctness of Algorithm 5.26, one needs to verify that v ≡ αu ⊛ g + f ⊛ m (mod β). With an appropriate choice of the parameters, it can be ensured that almost always the polynomial αu ⊛ g + f ⊛ m has coefficients in the interval between −β/2 and +β/2. In that case, we have the equality v = αu ⊛ g + f ⊛ m in R. Multiplication of v by fα and reduction modulo α now clearly retrieves m.
Algorithm 5.26: NTRU decryption
Input: The NTRU private key f (and fα) of the recipient and the ciphertext message c.
Output: The recovered plaintext message m.
Steps:
v := f ⊛ c (mod β). /* The coefficients of v are chosen to lie between −β/2 and +β/2 */
m := fα ⊛ v (mod α).
If f is chosen to be of the special form f = 1 + αf1 (for some polynomial f1), then v = αu ⊛ g + αf1 ⊛ m + m. Thus, reduction of v modulo α straightaway gives m, that is, there is no need to multiply v by fα. Also fα (having the trivial value 1) need not be stored in the private key. To sum up, taking f to be of the above special form increases the efficiency of the NTRU scheme without (seemingly) affecting its security. But now f is no longer an element of Lf, and some care should be taken to choose suitable values of f.
NTRU decryption occasionally fails, usually when v is not properly centred (around 0). In that case, representing v as a polynomial with coefficients in the range −β/2 + x to +β/2 + x for a small positive or negative value of x may result in correct decryption. If, on the other hand, no value of x works, NTRU decryption cannot recover m easily and is said to suffer from a gap failure. For suitable parameter values, gap failures are very unlikely and can be ignored for all practical purposes.
Now, let us see how the NTRU system can be broken. In order to find out the private key f from the public key h = fβ ⊛ g, one may keep on searching for f′ ∈ Lf exhaustively, until f′ ⊛ h (mod β) ∈ Lg. Alternatively, one may try all g′ ∈ Lg, until g′ ⊛ h^(−1) (mod β) ∈ Lf. In a similar manner, m can be retrieved from c by trying all u′ ∈ Lu, until c − αu′ ⊛ h (mod β) ∈ M. Clearly, such an attack takes expected time proportional to the size of Lf or Lg or Lu.
A baby-step–giant-step strategy reduces the running times to the square roots of the sizes of the above sets. For example, suppose we want to compute f from h. We split f = f1 + f2 into two nearly equal pieces f1 and f2. If n is odd, f1 may contain the (n + 1)/2 most significant terms and f2 the (n – 1)/2 least significant terms of f. Now, we compute (f2, –f2 ⊛ h (mod β)) for all possibilities of f2 and store the pairs sorted by the second component. Next, for each possibility of f1 (baby step) we compute f1 ⊛ h (mod β) and see if there is any f2 (giant step) for which f1 ⊛ h (mod β) and –f2 ⊛ h (mod β) have nearly equal values. If a matching pair (f1, f2) is located, we take f = f1 + f2. A similar method works for guessing m from c.
It is necessary to take the sets Lf, Lg and Lu big enough, so that exhaustive or square-root attacks are not feasible. Typically, choosing the sizes of these sets to be ≥ 2^160 is deemed sufficiently secure.
Another relevant attack is discussed in Exercise 5.11. By far the most sophisticated attack on the NTRU encryption scheme is based on finding short vectors in a lattice. We describe this attack in connection with the computation of the private key f from a knowledge of the public key h. Let L denote the lattice in Z^(2n) generated by the rows of the 2n × 2n matrix
( λI   H )
( O    βI ),
where I is the n × n identity matrix, O the n × n zero matrix, H the n × n matrix whose i-th row consists of the coefficients of X^(i−1)·h rem (X^n − 1), h = h0 + h1·X + ··· + h(n−1)·X^(n−1) = (h0, h1, . . . , h(n−1)), and λ a parameter whose choice is discussed below. Since h ≡ g ⊛ f^(−1) (mod β), multiplying the i-th row by f_(i−1) (i = 1, . . . , n) and adding suitable multiples of the last n rows, we conclude that the vector v := (λf0, λf1, . . . , λf(n−1), g0, g1, . . . , g(n−1)) is in L. By tuning the value λ, the attacker maximizes the chance for v to be a short vector in L. However, if the system parameters are appropriately selected, lattice-reduction algorithms become rather ineffective in finding v. Heuristic evidence suggests that this attack runs in time exponential in n.
| 5.1 | Establish the correctness of Algorithm 5.4. |
| 5.2 |
|
| 5.3 |
|
| 5.4 | Assume that two parties Bob and Barbara share a common RSA modulus n but relatively prime encryption exponents e1 and e2. Alice encrypts the same message by (n, e1) and (n, e2) and sends the ciphertext messages to Bob and Barbara respectively. Suppose also that Carol intercepts both the ciphertexts. Describe a method by which Carol retrieves the (common) plaintext. [H] |
| 5.5 | Let n = pq be a Rabin public key and let c be a quadratic residue modulo n. Show that the knowledge of the four square roots of c modulo n breaks the Rabin system. |
| 5.6 | What is the disadvantage of using the same session key in the ElGamal encryption scheme for encrypting two different messages (for the same recipient)? [H] |
| 5.7 | Let p be an odd prime and g a generator of Z_p^*. |
| 5.8 | Show that if the private-key parameters f(X) and d are known to a cryptanalyst of the Chor–Rivest scheme, she can recover the other parts of the private key and thus break the system completely. [H] |
| 5.9 | Show that if f(X) is only known to a cryptanalyst of the Chor–Rivest scheme, then also she can recover the full private key. [H] |
| 5.10 |
|
| 5.11 | In this exercise, we use the notations of Section 5.2.8. Assume that Alice encrypts the same message m several times using the NTRU public key h of Bob, but with different random polynomials u_i ∈ Lu, i = 1, . . . , r, and sends the corresponding ciphertext messages c_1, . . . , c_r. Describe a strategy by which an eavesdropper Carol can recover a considerable part of u_1. [H] Trying all the possibilities for the (relatively small) unknown part of u_1 allows Carol to retrieve m with little effort. |
Consider the scenario wherein two parties Alice and Bob want to share a secret information (say, a DES key for future correspondence), but it is not possible to communicate this secret by personal contact or by conversing over a secure channel. In other words, Alice and Bob want to arrive at a common secret value by communicating over a public (and hence insecure) channel. A key-exchange or a key-agreement protocol allows Alice and Bob to do so. The protocol should be such that an eavesdropper listening to the conversation between Alice and Bob cannot compute the secret value in feasible time.
Public-key technology is used to design a key-exchange protocol in the following way. Alice generates a key pair (eA, dA) and sends the public key eA to Bob. Similarly, Bob generates a random key pair (eB, dB) and sends the public key eB to Alice. Now, Alice and Bob respectively compute the values sA = f(eB, dA) and sB = f(eA, dB) using their respective knowledge, where f is a suitably chosen function. If sA = sB, then this value can be used as the shared secret between Alice and Bob. The intruder Carol can intercept eA and eB, but f should be such that a knowledge of eA and eB alone does not allow Carol to compute sA = sB. She needs dA or dB for this computation. Since (eA, dA) and (eB, dB) are key pairs, we assume that it is infeasible to compute dA from eA or dB from eB.
In what follows, we describe some key-exchange protocols. The security of these protocols depends on the intractability of the DHP (or the DLP). We provide a generic description, where we work in a finite Abelian multiplicative group G of order n. We write the identity of G as 1. G need not be cyclic, but we assume that an element g ∈ G having suitably large (and preferably prime) multiplicative order m is provided. G, g, n and m may be made publicly available, but G should be a group in which one cannot compute discrete logarithms in feasible time. Typical examples of G are given in Section 5.2.5.
Basic key-exchange protocols provide provable security against passive attacks under the intractability of the DHP. However, several models of active attacks are known for the basic protocols. One requires authentication (validation of the public keys) to eliminate these attacks.
The Diffie–Hellman (DH) key-exchange algorithm [78] is one of the pioneering discoveries leading to the birth of public-key cryptography.
Algorithm 5.27: Diffie–Hellman key exchange
Input: G, g, n and m as defined above.
Output: A secret element s ∈ G shared between Alice and Bob.
Steps:
Alice generates a random dA, 2 ≤ dA ≤ m − 1, and computes eA := g^dA.
Alice sends eA to Bob.
Bob generates a random dB, 2 ≤ dB ≤ m − 1, and computes eB := g^dB.
Bob sends eB to Alice.
Alice computes s := (eB)^dA = g^(dA·dB).
Bob computes s := (eA)^dB = g^(dA·dB).
if (s = 1) { Return “failure”. }
The DH scheme fails, if the shared secret turns out to be a trivial element (like the identity) of G. In that case, Alice and Bob should re-execute the protocol with different key pairs. The probability of such an incident is, however, extremely low.
The intruder Carol learns the group elements g^dA and g^dB by listening to the conversation between Alice and Bob and intends to compute s = g^(dA·dB). Thus, she has to solve an instance of the DHP in the group G. By assumption, this is computationally infeasible. This is how the DH scheme derives its security.
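The protocol is easy to demonstrate in the multiplicative group Z_p^*. The following Python sketch uses toy parameters (the 127-bit Mersenne prime and the base 5 are illustrative choices only; real deployments use standardized groups of at least 2048 bits):

```python
import secrets

# Toy Diffie-Hellman in Z_p^* (illustrative parameters, not secure choices).
p = 2**127 - 1                       # a Mersenne prime; real systems use >= 2048 bits
g = 5                                # illustrative base

d_A = secrets.randbelow(p - 3) + 2   # Alice's private key, 2 <= d_A <= p - 2
d_B = secrets.randbelow(p - 3) + 2   # Bob's private key
e_A = pow(g, d_A, p)                 # Alice -> Bob
e_B = pow(g, d_B, p)                 # Bob -> Alice

s_A = pow(e_B, d_A, p)               # Alice's view of the shared secret
s_B = pow(e_A, d_B, p)               # Bob's view
assert s_A == s_B == pow(g, d_A * d_B, p)
```

Carol sees only e_A and e_B; recovering the shared value from these is exactly an instance of the DHP.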
A small-subgroup attack on the DH protocol can be mounted by an active adversary. Assume that the order m of g in G is composite and has known factorization m = uv with u small. Carol intercepts the messages between Alice and Bob, replaces them by their respective v-th powers and retransmits the modified messages.
Algorithm 5.28: A small-subgroup attack on the DH scheme
Alice generates a random dA, 2 ≤ dA ≤ m − 1, and computes eA := g^dA.
Alice transmits eA for Bob.
Carol intercepts eA, computes (eA)^v and forwards (eA)^v to Bob.
Bob generates a random dB, 2 ≤ dB ≤ m − 1, and computes eB := g^dB.
Bob transmits eB for Alice.
Carol intercepts eB, computes (eB)^v and forwards (eB)^v to Alice.
Alice computes s′ := ((eB)^v)^dA = g^(v·dA·dB).
Bob computes s′ := ((eA)^v)^dB = g^(v·dA·dB).
if (s′ = 1) { Return “failure”. }
But ord g = uv and so (s′)^u = 1, that is, s′ has only u − 1 non-trivial values. Since u is small, the possibilities for s′ can be exhaustively searched by Carol. The best countermeasure against this attack is to take m to be a prime (of bit length ≥ 160).
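The attack can be demonstrated with a toy example in Z_23^*, where the generator 5 has composite order 22 = 2 · 11 (all parameter values are chosen purely for illustration):

```python
# Toy demonstration in Z_23^*: ord(5) = 22 = u * v with u = 2, v = 11.
p, g = 23, 5
u, v = 2, 11
d_A, d_B = 7, 15                       # fixed toy private keys
e_A, e_B = pow(g, d_A, p), pow(g, d_B, p)

# Carol replaces each transmitted public key by its v-th power:
e_A2, e_B2 = pow(e_A, v, p), pow(e_B, v, p)
s_A = pow(e_B2, d_A, p)                # Alice's secret
s_B = pow(e_A2, d_B, p)                # Bob's secret
assert s_A == s_B
assert pow(s_A, u, p) == 1             # the secret lies in the order-u subgroup
```

The agreed value satisfies (s′)^u = 1, so Carol needs at most u − 1 guesses to find it.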
Even when m is prime, it may be the case that the cofactor k := n/m has a small divisor u and it is possible that an active attacker intervenes in such a way that Alice and Bob agree upon a secret value of order (equal to or dividing) u. For example, Carol may replace both the transmitted public keys by an element h of order u. If dA and dB are congruent modulo u, the shared secret has only a few possible values and Carol can obtain the correct value by exhaustive search. On the other hand, if dA ≢ dB (mod u), Alice and Bob do not come up with the same secret. However, if Alice uses her secret to encrypt a message for Bob, it remains easy for Carol to decrypt the intercepted ciphertext by trying only a few choices for Alice’s key. Alice and Bob can prevent this attack by refusing to accept as the shared secret not only the trivial value s = 1 but also elements of small orders.
A small-subgroup attack can also be mounted by one of the communicating parties (say, Bob) in an attempt to gain information about the other party’s (Alice’s) secret dA. Let us continue to assume that the cofactor k := n/m has a small divisor u. Bob finds an element h in G of order u. Instead of eB = g^dB, Bob now sends (g^dB)·h to Alice. Alice computes the shared secret as sA := ((g^dB)·h)^dA = g^(dA·dB)·h^dA. Bob, on the other hand, can normally compute sB := g^(dA·dB). Now, suppose that Alice uses a symmetric cipher with the key sA (or some part of it) and sends the ciphertext to Bob. In order to decrypt, Bob tries all of the u possible keys sB·h^j for j = 0, 1, . . . , u − 1. The value of j for which decryption succeeds equals dA modulo u. A similar attack can be mounted by Bob, when eB is chosen to be an element (like h itself) of order u.
If G is cyclic and H is the subgroup generated by g, then an element a ∈ G is in H if and only if a^m = 1 (Proposition 2.5, p 27). Moreover, if gcd(k, m) = 1, each communicating party can check the validity of the other party’s public key by using an m-th power exponentiation. An element like (g^dB)·h or h of the last paragraph does not pass this test. If so, Alice should abandon the protocol. However, the validation of the public key requires a modular exponentiation and thereby slows down the protocol.
We now present an efficient modification of the basic Diffie–Hellman scheme that prevents small-subgroup attacks (by a communicating party or an eavesdropper) without calculating an extra exponentiation. We continue with the notation k := n/m and assume that k is coprime to m. Now, the shared secret is computed as gdAdB or gkdAdB depending on whether compatibility with the original DH scheme is desired or not. Algorithm 5.29 describes the modified DH algorithm. Solve Exercise 5.12 in order to establish the effectiveness of this algorithm against small-subgroup attacks.
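The effect of cofactor exponentiation can again be seen in the toy group Z_23^* with m = 11 and cofactor k = 2 (illustrative parameters): raising to the exponent k annihilates any order-2 component an attacker may inject.

```python
# Toy cofactor DH in Z_23^*: n = 22, m = 11 (prime), cofactor k = 2.
p, g = 23, 5
k = 2
d_A, d_B = 4, 9                        # fixed toy private keys
e_A, e_B = pow(g, d_A, p), pow(g, d_B, p)

# Variant without compatibility: both sides fold k into the exponent.
s_A = pow(e_B, k * d_A, p)
s_B = pow(e_A, k * d_B, p)
assert s_A == s_B                      # both equal g^(k * d_A * d_B)

# An injected factor h of order 2 (here h = p - 1, i.e. -1 mod p) vanishes:
h = p - 1
assert pow(e_B * h % p, k * d_A, p) == s_A
```

Since h^k = 1 for every element h of order dividing k, the malicious factor contributes nothing to the shared secret.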
Other active attack models on the (basic or modified) DH protocol can be conceived of. One important class of attacks is now described.
An unknown key-share attack on a key-exchange protocol makes a party believe that (s)he shares a secret with another party, whereas the secret is actually shared by a third party. Assume that Carol can monitor and modify every message between Alice and Bob. When Alice and Bob execute Algorithm 5.27 or 5.29, Carol can intervene and pretend to Alice that she is Bob and to Bob that she is Alice. At the end of the protocol, Alice and Carol come up with a shared secret sAC, and Bob and Carol with another shared secret sBC. Alice believes that she shares sAC with Bob, and Bob believes that he shares sBC with Alice.
Algorithm 5.29: Diffie–Hellman key exchange with cofactor exponentiation
Input: G, g, n, m and k as defined above and a flag indicating compatibility with the original DH scheme.
Output: A secret element s ∈ G shared between Alice and Bob.
Steps:
Alice generates a random dA, 2 ≤ dA ≤ m − 1, and computes eA := g^dA.
Alice sends eA to Bob.
Bob generates a random dB, 2 ≤ dB ≤ m − 1, and computes eB := g^dB.
Bob sends eB to Alice.
if (compatibility with the original DH algorithm is desired) {
Alice computes s := ((eB)^k)^(dA·k^(−1) rem m) and Bob computes s := ((eA)^k)^(dB·k^(−1) rem m).
} else {
Alice computes s := ((eB)^k)^dA and Bob computes s := ((eA)^k)^dB.
}
if (s = 1) { Return “failure”. }
Now, when Alice wants to send a secret message m to Bob, she encrypts m by sAC and transmits the ciphertext c. Carol intercepts c, decrypts it by sAC to retrieve m, encrypts m by sBC and sends the new ciphertext c′ to Bob. Bob retrieves m by decrypting c′ with his key sBC. The process raises hardly any suspicion in Alice or Bob about the existence of the mediating third party.
In order to avoid this attack, Alice and Bob should each validate the authenticity of the public key of the other party. Public-key certificates can be used to this effect. Unfortunately, using certificates alone may fail to eliminate unknown key-share attacks, as Algorithm 5.30 shows. At the end of this protocol Alice and Bob share a secret s, but Bob believes that he shares it with (the intruder) Carol. Here Carol herself cannot compute the shared secret s (provided that computing discrete logs in G is infeasible). Still there may be situations where this attack can be exploited (see Law et al. [161] for a hypothetical example).
This attack has two potential problems. Under the assumption of intractability of the DLP in G, Carol cannot compute the private key corresponding to the public key eC, and so her getting the certificate CertC with a knowledge of eC alone may be questioned. Furthermore, replacing (eB, CertB) by ((eB)^d, CertB) may make the certificate invalid. If we assume that a certificate authenticates only the entity and not the public key, then these objections can be overruled. In practice, however, a public-key certificate should bind the public key to an entity (who can prove the knowledge of the corresponding private key), and so the above attack cannot be easily mounted. Nonetheless, the need for stronger authenticated key-exchange protocols is highlighted by the attack.
Algorithm 5.30: An unknown key-share attack on the certified DH scheme
Alice generates a random dA, 2 ≤ dA ≤ m − 1, and computes eA := g^dA.
Alice gets the certificate CertA on eA from the certifying authority.
Alice transmits (eA, CertA) for Bob.
Carol intercepts (eA, CertA).
Carol chooses a random d, 2 ≤ d ≤ m − 1.
Carol gets the certificate CertC on eC := (eA)^d from the certifying authority.
Carol sends (eC, CertC) to Bob.
Bob generates a random dB, 2 ≤ dB ≤ m − 1, and computes eB := g^dB.
Bob gets the certificate CertB on eB from the certifying authority.
Bob sends (eB, CertB) to Carol.
Carol transmits ((eB)^d, CertB) to Alice.
Alice computes s = ((eB)^d)^dA = g^(d·dA·dB).
Bob computes s = (eC)^dB = ((eA)^d)^dB = g^(d·dA·dB).
The Menezes–Qu–Vanstone (MQV) key-exchange protocol is an improved extension of the basic DH scheme that incorporates public-key authentication. Though the achievement of the desired security goals by the MQV protocol does not seem to be provable, heuristic arguments suggest the effectiveness of the protocol against active adversaries.
Once again, let Alice and Bob be the two parties who plan to agree on a secret element s ∈ G, where the domain parameters G, g, n and m are chosen as in the basic DH scheme. In the MQV scheme, each entity uses two key pairs, one of which ((EA, DA) for Alice and (EB, DB) for Bob) is called the static or long-term key pair, whereas the other ((eA, dA) for Alice and (eB, dB) for Bob) is called the ephemeral or short-term key pair. The static key is bound to an entity for a certain period of time and is used in every invocation of the MQV protocol during that period. On the other hand, each entity generates and uses a new ephemeral key pair during each invocation of the protocol. The static key of an entity is assumed to be authentic, say, certified by a trusted authority. The ephemeral key, on the other hand, is validated using the static private key.
Assume that there is a (publicly known) function π : G → Z converting group elements to integers. Let l := ⌊lg m⌋ + 1 denote the bit length of m = ord g. For a ∈ G, let ā denote the integer (π(a) rem 2^⌈l/2⌉) + 2^⌈l/2⌉. The bit size of ā is about half of that of m. In particular, ā ≢ 0 (mod m) for all a ∈ G.
In the MQV protocol, Alice and Bob each computes the shared secret s = g^(σA·σB), where σA := dA + ēA·DA (mod m) and σB := dB + ēB·DB (mod m). Here the exponents σA and σB bear the implicit signatures of Alice and Bob, impressed by their respective static private keys. Alice can compute s = (eB·(EB)^ēB)^σA, since she knows the static public key EB and the ephemeral public key eB of Bob. Similarly, Bob can compute s = (eA·(EA)^ēA)^σB from a knowledge of the public keys EA and eA of Alice. We summarize the steps in Algorithm 5.31.
Algorithm 5.31: MQV key exchange
Input: G, g, n and m as defined above.
Output: A secret element s ∈ G shared between Alice and Bob.
Steps:
Alice obtains Bob’s static public key EB.
Bob obtains Alice’s static public key EA.
Alice generates a random integer dA, 2 ≤ dA ≤ m − 1, and computes eA := g^dA.
Alice sends eA to Bob.
Bob generates a random integer dB, 2 ≤ dB ≤ m − 1, and computes eB := g^dB.
Bob sends eB to Alice.
Alice computes σA := dA + ēA·DA (mod m) and s := (eB·(EB)^ēB)^σA.
Bob computes σB := dB + ēB·DB (mod m) and s := (eA·(EA)^ēA)^σB.
if (s = 1) { Return “failure”. }
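A toy rendering of the MQV computation in Python may help. The group Z_23^*, the fixed keys and the particular half-size map bar are all illustrative assumptions (here π is simply the integer representative of a group element):

```python
# Toy MQV in Z_23^* with ord(g) = m = 22 (real MQV uses large prime-order groups).
p, g, m = 23, 5, 22
l = m.bit_length()
half = 1 << ((l + 1) // 2)
bar = lambda e: (e % half) + half      # half-size exponent derived from e

D_A, D_B = 3, 7                        # static private keys
E_A, E_B = pow(g, D_A, p), pow(g, D_B, p)
d_A, d_B = 4, 9                        # ephemeral private keys
e_A, e_B = pow(g, d_A, p), pow(g, d_B, p)

sig_A = (d_A + bar(e_A) * D_A) % m     # Alice's implicitly signed exponent
sig_B = (d_B + bar(e_B) * D_B) % m
s_A = pow(e_B * pow(E_B, bar(e_B), p) % p, sig_A, p)   # Alice's computation
s_B = pow(e_A * pow(E_A, bar(e_A), p) % p, sig_B, p)   # Bob's computation
assert s_A == s_B == pow(g, (sig_A * sig_B) % m, p)    # both equal g^(sig_A * sig_B)
```

The assertion checks the identity underlying the protocol: e_B · E_B^ēB = g^σB, so Alice's exponentiation by σA yields g^(σA·σB), and symmetrically for Bob.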
Each participating entity using the MQV protocol performs three exponentiations in G. Alice computes g^dA, (EB)^ēB and (eB·(EB)^ēB)^σA, of which the first and the last have exponents O(m). On the other hand, ēB is O(√m), so that the middle exponentiation is about twice as fast as a full exponentiation. This performance benefit justifies the use of ēA and ēB instead of eA and eB themselves. It appears that using these half-sized exponents does not affect security. Also note that ēA ≢ 0 (mod m), which implies a non-zero contribution of the static key DA in the expression σA. Similarly for σB.
In order to guard against small-subgroup attacks, the MQV algorithm can incorporate the cofactor k := n/m, that is, assuming gcd(k, m) = 1, the shared secret would now be g^(σA·σB) or g^(k·σA·σB), depending on whether compatibility with the original MQV method is desired or not.
The MQV algorithm can be used in a situation when only one party, say, Alice, is capable of initiating a transmission to the other party (Bob). In that case, Bob’s static key pair is used also as his ephemeral key pair, that is, the secret element shared between Alice and Bob is (EB·(EB)^ĒB)^σA = g^(σA·DB·(1+ĒB)).
See Raymond and Stiglic [250] to know more about the security issues for the DH key agreement protocol and its variants.
| 5.12 | Let G be a multiplicative Abelian group of order n and with identity 1, H the subgroup of G generated by an element g of order m, k := n/m and gcd(k, m) = 1. Further let a be a non-identity element of G. |
| 5.13 | Write the MQV key-exchange protocol with cofactor exponentiation. |
| 5.14 | Provide the details of the Diffie–Hellman key-exchange algorithm based on the XTR representation (Section 5.2.7). |
Suppose an entity (Alice) is required to be bound to some electronic data (like messages or documents or keys). This binding is achieved by Alice digitally signing the data in such a way that no party other than Alice would be able to generate the signature. The signature should also be such that any entity can easily verify that it was Alice who generated the signature. Digital signatures can be realized using public-key techniques. The entity (Alice) generating a digital signature is called the signer, whereas anybody who wants to verify a signature is called a verifier.
We have seen in Section 5.2 how the encryption and decryption transforms fe, fd achieve confidentiality of sensitive data. If the set of all possible plaintext messages is the same as the set of all ciphertext messages and if fe and fd are bijective maps on that set, then the sequence of encryption and decryption can be reversed in order to realize a digital signature scheme. In order to sign m, Alice uses her private key d and the transform fd to generate s = fd(m, d). Any party who knows the corresponding public key e can recover m as m = fe(s, e). This is broadly how a signature scheme works. Depending on how the representative m is generated from the message M that Alice wants to sign, signature schemes can be classified in two categories.
In this case, one takes m = M. Verification involves getting back the message M. If M is assumed to be (the encoded version of) some human-readable text, then the recovered M = fe(s, e) will also be human-readable. If s is forged, that is, if a private key d′ ≠ d has been used to generate s′ = fd(m, d′), then verification using Alice’s public key yields m′ = fe(s′, e), and typically m′ ≠ m, since d′ and e are not matching keys. The resulting message m′ will, in general, make little or no sense to a human reader. If m is not a human-readable text, one adds some redundancy to it before signing. A forged signature yields m′ during verification, which, with high probability, is expected not to have this redundancy.
Attractive as it looks, this scheme is not suitable if M is a long message. In that case, it is customary to break M into smaller pieces and sign each piece separately. Since public-key operations are slow, signature generation (and also verification) will be time-consuming, if there are too many pieces to sign (and verify). This difficulty is overcome by the second scheme described now.
In this scheme, a short representative m = H(M) of M is first computed.[2] The function H is usually chosen to be a hash function, that is, one which converts bit strings of arbitrary length to bit strings of a fixed length. H is assumed to be public knowledge, that is, anybody who knows M can compute m. We also assume that H(M) can be computed fast for messages M of practical sizes. Alice uses the decryption transform on m to generate s = fd(m, d). The signature now becomes the pair (M, s). A verifier obtains Alice’s public key e and checks if H(M) = fe(s, e). The signature is taken to be valid if and only if equality holds. If a forger uses a private key d′ ≠ d, she generates a signature (M, s′), s′ = fd(m, d′), on M, and a verifier expects with high probability the inequality H(M) ≠ fe(s′, e).
[2] If M is already a short message, one may take m = M. In order to promote uniform treatment, we assume that the function H is always applied for the generation of m. Use of H is also desirable from the point of view of security (Exercise 5.15).
A kind of forgery is possible on signature schemes with appendix. Assume that Alice creates a valid signature (M, s), s = fd(H(M), d), on a message M. The function H is certainly not injective, since its input space is much bigger (infinite) than its output space (finite). Suppose that Carol finds a message M′ ≠ M with H(M′) = H(M). In that case, the pair (M′, s) is a valid signature of Alice on the message M′, though it is not Alice who has generated it. (Indeed it has been generated without the knowledge of the private key d of Alice.) In order to foil such attacks, the function H should have second pre-image resistance. The first pre-image resistance and collision resistance properties of a hash function also turn out to be important in the context of digital signatures. See Sections 1.2.6 and A.4 to know about hash functions.
We now describe some specific algorithms for (generating and verifying) digital signatures. Key pairs used for these algorithms are usually identical to those used for encryption algorithms of Section 5.2 and, therefore, we refrain from a duplicate description of the key-generation procedures. We focus our discussion only on signature schemes with appendix.
As in the RSA encryption scheme of Section 5.2.1, each entity generates an RSA modulus n = pq, which is the product of two distinct large primes p and q. A key pair consists of an encryption exponent e (the public key) and a decryption exponent d (the private key) satisfying ed ≡ 1 (mod φ(n)).
RSA signature generation involves a modular exponentiation in the ring Z_n.
Algorithm 5.32: RSA signature generation
Input: A message M to be signed and the signer’s private key (n, d).
Output: The signature (M, s) on M.
Steps:
m := H(M). /* m is an element of Z_n */
s := m^d (mod n).
Return (M, s).
Signature generation can be speeded up if the parameters p, q, d1 := d rem (p – 1), d2 := d rem (q – 1) and h := q–1 (mod p) are stored (secretly) in the private key. Now, one can use Algorithm 5.4 for signature generation.
The verification routine also involves a modular exponentiation in Z_n.
Algorithm 5.33: RSA signature verification
Input: A signature (M, s) and the signer’s public key (n, e).
Output: Verification status of the signature.
Steps:
m := H(M). /* m is an element of Z_n */
m′ := s^e (mod n).
if (m′ = m) { Return “Signature verified”. } else { Return “Signature not verified”. }
Small values of e speed up RSA signature verification and are not known to make the scheme suffer from any special attacks. So values of e like 3, 257 and 65,537 are commonly recommended.
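The sign/verify pair can be sketched in Python with textbook-sized parameters (the classic toy modulus n = 3233 and the message are illustrative only; there is no padding here, so this is a sketch of the mathematics, not a secure implementation):

```python
import hashlib

# Textbook RSA signature with appendix (toy modulus, no padding).
p, q = 61, 53
n = p * q                         # 3233
phi = (p - 1) * (q - 1)           # 3120
e = 17
d = pow(e, -1, phi)               # 2753; e*d = 1 (mod phi(n))

def H(M: bytes) -> int:
    # hash the message and reduce the digest into Z_n
    return int.from_bytes(hashlib.sha256(M).digest(), 'big') % n

M = b"attack at dawn"
s = pow(H(M), d, n)               # sign:   s = H(M)^d (mod n)
assert pow(s, e, n) == H(M)       # verify: s^e (mod n) == H(M)
```

Verification needs only the single small-exponent exponentiation s^e, which is why small e is attractive.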
As in the Rabin encryption algorithm, we choose two distinct large primes p and q of nearly equal sizes and take n = pq. The public key is n, whereas the private key is the pair (p, q). The Rabin signature scheme is based on the intractability of computing square roots modulo n in absence of the knowledge of the prime factors p and q of n.
Rabin signature generation involves finding a quadratic residue m modulo n as a representative of the message M and computing a square root of m modulo n.
Algorithm 5.34: Rabin signature generation
Input: A message M to be signed and the signer’s private key (p, q).
Output: The signature (M, s) on M.
Steps:
m := H(M). /* modify m, if necessary, so that m is a quadratic residue modulo n */
Compute a square root s of m modulo n (using the knowledge of p and q).
Return (M, s).
Verification (Algorithm 5.35) involves a single squaring operation in Z_n.

Input: A signature (M, s) and the signer’s public key n.
Output: Verification status of the signature.
Steps:
    m := H(M).
    if (s^2 ≡ m (mod n)) { Return “Signature verified”. }
    else { Return “Signature not verified”. }
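A Python sketch follows. The text does not prescribe how to force H(M) to be a quadratic residue; appending a counter and re-hashing (with the counter carried along in the signature) is one common heuristic and an assumption of this sketch, as are SHA-256 and the choice p ≡ q ≡ 3 (mod 4), for which a square root is a single exponentiation:

```python
import hashlib

def rabin_sign(M: bytes, p: int, q: int):
    """Sign with p = q = 3 (mod 4). A counter is appended to M and the
    hash recomputed until it is a quadratic residue mod both p and q."""
    n = p * q
    c = 0
    while True:
        m = int.from_bytes(hashlib.sha256(M + c.to_bytes(4, "big")).digest(), "big") % n
        if pow(m, (p - 1) // 2, p) == 1 and pow(m, (q - 1) // 2, q) == 1:
            break                              # m is a QR modulo n
        c += 1
    sp = pow(m, (p + 1) // 4, p)               # square root of m mod p
    sq = pow(m, (q + 1) // 4, q)               # square root of m mod q
    h = pow(q, -1, p)
    s = (sq + q * ((h * (sp - sq)) % p)) % n   # CRT recombination
    return s, c

def rabin_verify(M: bytes, s: int, c: int, n: int) -> bool:
    m = int.from_bytes(hashlib.sha256(M + c.to_bytes(4, "big")).digest(), "big") % n
    return pow(s, 2, n) == m                   # check s^2 = m (mod n)
```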
The ElGamal signature algorithm is based on the intractability of computing discrete logarithms in certain groups G. For a general description, we consider an arbitrary finite Abelian multiplicative group G of order n. We assume that G is cyclic and that a generator g of G is given. A key pair is obtained by selecting a random integer d (the private key), 2 ≤ d ≤ n – 1, and then computing g^d (the public key). The hash function H is assumed to convert arbitrary bit strings to elements of Z_n. We further assume that the elements of G can be represented as bit strings (to which the hash function H can be directly applied). G (together with its representation), g and n are considered to be public knowledge and are not input to the signature generation and verification routines.
ElGamal signatures are generated as in Algorithm 5.36. The appendix consists of the pair (s, t).

Input: A message M to be signed and the signer’s private key d.
Output: The signature (M, s, t) on M.
Steps:
    Generate a random session key d′, 2 ≤ d′ ≤ n – 1.
    s := g^{d′}.
    t := d′^{–1}(H(M) – dH(s)) (mod n).
    Return (M, s, t).
The costliest step in the ElGamal signature generation algorithm is the exponentiation g^{d′}. Here, G is assumed to be cyclic and the exponent d′ to be O(n). We will shortly see modifications of the ElGamal scheme in which the exponent can be chosen to be much smaller, namely O(r), where r is a suitably large (prime) divisor of n.
In order to forge a signature, Carol can generate a random session key (d′, g^{d′}) and obtain s. For the computation of t, she requires the private key d of the signer. Conversely, if t (and d′) are available to Carol, she can easily compute the private key d. Thus, forging an ElGamal signature is equivalent to solving the DLP in G.
Each invocation of the ElGamal signature generation algorithm must use a new session key (d′, g^{d′}). If the same session key (d′, g^{d′}) is used to generate the signatures (M1, s1, t1) and (M2, s2, t2) on two different messages M1 and M2, then we have (t1 – t2)d′ ≡ H(M1) – H(M2) (mod n), whence d′ can be computed, provided that gcd(t1 – t2, n) = 1. If d′ is known, the private key d can be easily computed (see Exercise 5.6 for a similar situation).
ElGamal signature verification is described in Algorithm 5.37. It is based on the observation that for a (valid) ElGamal signature (M, s, t) on a message M we have g^{H(M)} = (g^d)^{H(s)} s^t. The verification calls for three exponentiations in G with full-size exponents. Working in a suitable (cyclic) subgroup of G makes the algorithm more efficient.
Input: A signature (M, s, t) and the signer’s public key g^d.
Output: Verification status of the signature.
Steps:
    a1 := g^{H(M)}.
    a2 := (g^d)^{H(s)} s^t.
    if (a1 = a2) { Return “Signature verified”. }
    else { Return “Signature not verified”. }
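A toy instantiation in Z_p^* can be sketched as follows (assumptions of this sketch: SHA-256 in place of H, group elements hashed via their decimal strings, and the small prime p = 467 with primitive root 2, so that n = 466):

```python
import hashlib, math, random

# Toy parameters: p = 467 is prime and g = 2 is a primitive root modulo p,
# so G = Z_p^* has order n = p - 1 = 466.  Far too small for real use.
p, g = 467, 2
n = p - 1

def Hn(data: bytes) -> int:
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n

def elgamal_sign(M: bytes, d: int):
    while True:
        k = random.randrange(2, n)          # session key d', invertible mod n
        if math.gcd(k, n) == 1:
            break
    s = pow(g, k, p)                        # s := g^d'
    t = pow(k, -1, n) * (Hn(M) - d * Hn(str(s).encode())) % n
    return s, t

def elgamal_verify(M: bytes, s: int, t: int, y: int) -> bool:
    a1 = pow(g, Hn(M), p)                                   # a1 := g^H(M)
    a2 = pow(y, Hn(str(s).encode()), p) * pow(s, t, p) % p  # a2 := (g^d)^H(s) s^t
    return a1 == a2
```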
ElGamal signatures use a congruence of the form A ≡ dB + d′C (mod n), and verification is done by checking the equality g^A = (g^d)^B s^C. Our choice for A, B and C was A = H(M), B = H(s) and C = t. Indeed, any permutation of H(M), H(s) and t is acceptable as A, B, C. These give rise to several variants of the ElGamal scheme. It is also allowed to take as A, B, C any permutation of H(M)H(s), t, 1 or H(M)H(s), H(M)t, 1 or H(M)H(s), H(s)t, 1 or H(M)t, H(s)t, 1. Permutations of H(M)H(t), H(s), 1 or H(M), H(s)t, 1, on the other hand, are known to have security weaknesses. For any allowed combination of A, B, C, the choices ±A, ±B, ±C are also valid. For some other variants, see Horster et al. [132].
The Schnorr signature scheme is a modification of the ElGamal scheme and is faster, since it works in the subgroup of G generated by an element g of small order. We assume that r := ord g is a prime (though it suffices for ord g to have a suitably large prime divisor). We suppose further that the elements of G are represented as bit strings and that we have a hash function H that maps bit strings to elements of Z_r. A key pair now consists of an integer d (the private key), 2 ≤ d ≤ r – 1, and the element g^d (the public key).
Schnorr signature generation is described in Algorithm 5.38.
Input: A message M to be signed and the signer’s private key d.
Output: The signature (M, s, t) on M.
Steps:
    Generate a random session key pair (d′, g^{d′}), 2 ≤ d′ ≤ r – 1.
    s := H(M ‖ g^{d′}).
    t := d′ – ds (mod r).
    Return (M, s, t).
As in the ElGamal scheme, the most time-consuming step in this routine is the computation of the session public key g^{d′}. But now d′ < r and, therefore, Algorithm 5.38 runs faster than Algorithm 5.36. One can easily check that forging a signature of Alice is computationally equivalent to determining Alice’s private key d from her public key g^d. The importance of using a new session key pair in each run of Algorithm 5.38 is exactly the same as in the case of ElGamal signatures.
The verification of Schnorr signatures (Algorithm 5.39) is based on the fact that g^t = g^{d′}(g^d)^{–s}. Thus, the knowledge of g, s, t and g^d allows one to compute g^{d′} and subsequently H(M ‖ g^{d′}). The algorithm involves two exponentiations, with both exponents (t and s) less than r. Signature verification is therefore also faster in the Schnorr scheme than in the ElGamal scheme.
Input: A signature (M, s, t) and the signer’s public key g^d.
Output: Verification status of the signature.
Steps:
    u := g^t (g^d)^s.
    if (s = H(M ‖ u)) { Return “Signature verified”. }
    else { Return “Signature not verified”. }
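The Schnorr scheme fits in a few lines of Python (a sketch under the same toy assumptions as before: SHA-256 for H, and g = 4 of prime order r = 233 in Z_467^*):

```python
import hashlib, random

# Toy subgroup: p = 467 and r = 233 are prime, and g = 4 has order r in Z_p^*.
p, r, g = 467, 233, 4

def Hr(data: bytes) -> int:
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % r

def schnorr_sign(M: bytes, d: int):
    k = random.randrange(2, r)             # session key d'
    u = pow(g, k, p)                       # g^d'
    s = Hr(M + str(u).encode())            # s := H(M || g^d')
    t = (k - d * s) % r                    # t := d' - d s (mod r)
    return s, t

def schnorr_verify(M: bytes, s: int, t: int, y: int) -> bool:
    u = pow(g, t, p) * pow(y, s, p) % p    # g^t (g^d)^s recovers g^d'
    return s == Hr(M + str(u).encode())
```

Note that both verification exponents are below r = ord g, which is the source of the speed-up over ElGamal.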
The Nyberg–Rueppel (NR) signature algorithm is another adaptation of the ElGamal signature scheme and is based on the intractability of solving the DLP in a group G. We assume that ord G = n has a large prime divisor r and that an element g ∈ G of order r is available. Here, a key pair is of the form (d, g^d), where the private key d is an integer between 2 and r – 1 (both inclusive) and where the public key g^d is an element of 〈g〉. The hash function H converts bit strings to elements of Z_r. We also assume the existence of a (publicly known) function F : G → Z_r.
NR signature generation can be performed as in Algorithm 5.40.
Input: A message M to be signed and the signer’s private key d.
Output: The signature (M, s, t) on M.
Steps:
    Generate a random session key pair (d′, g^{d′}), 2 ≤ d′ ≤ r – 1.
    s := H(M) + F(g^{d′}) (mod r).
    t := d′ – ds (mod r).
    Return (M, s, t).
The only difference between NR signature generation and Schnorr signature generation is the way s is computed. Therefore, our remarks on the security and the efficiency of the Schnorr scheme apply equally well to the NR scheme. Signature verification is also entirely analogous, as Algorithm 5.41 shows.
Input: A signature (M, s, t) and the signer’s public key g^d.
Output: Verification status of the signature.
Steps:
    u := g^t (g^d)^s.
    if (H(M) ≡ s – F(u) (mod r)) { Return “Signature verified”. }
    else { Return “Signature not verified”. }
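The NR variant can be sketched in the same toy subgroup; the choice F(x) := x mod r below is an assumption of this sketch (the text only requires some public function F : G → Z_r):

```python
import hashlib, random

# Toy subgroup as before: g = 4 has prime order r = 233 in Z_467^*.
p, r, g = 467, 233, 4

def Hr(M: bytes) -> int:
    return int.from_bytes(hashlib.sha256(M).digest(), "big") % r

def F(x: int) -> int:
    return x % r                        # an assumed choice of F : G -> Z_r

def nr_sign(M: bytes, d: int):
    k = random.randrange(2, r)          # session key d'
    u = pow(g, k, p)                    # g^d'
    s = (Hr(M) + F(u)) % r              # s := H(M) + F(g^d') (mod r)
    t = (k - d * s) % r                 # t := d' - d s (mod r)
    return s, t

def nr_verify(M: bytes, s: int, t: int, y: int) -> bool:
    u = pow(g, t, p) * pow(y, s, p) % p # recovers g^d'
    return Hr(M) == (s - F(u)) % r      # check H(M) = s - F(u) (mod r)
```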
The digital signature algorithm (DSA) has been proposed as a standard by the US National Institute of Standards and Technology (NIST) and later accepted as a Federal Information Processing Standard (FIPS) by the US government. This standard is also known as the digital signature standard (DSS). See the NIST document [220] for a complete description of this standard.
Input: An integer λ, 0 ≤ λ ≤ 8.
Output: A prime p of bit length l := 512 + 64λ such that p – 1 has a prime divisor r of bit length 160.
Steps:
    Let l – 1 = 160n + b, 0 ≤ b < 160.    /* n = (l – 1) quot 160, b = (l – 1) rem 160 */
DSA is based on the intractability of the DLP in the finite field Z_p, where p is a prime of bit length 512 + 64λ with 0 ≤ λ ≤ 8. The cardinality p – 1 of Z_p^* is required to have a prime divisor r of bit length (exactly) 160. The NIST document [220] specifies a standard method for obtaining such a field Z_p, which we describe in Algorithm 5.42. We denote by H the SHA-1 hash function, which converts bit strings of arbitrary length to bit strings of length 160. We will identify (often without explicit mention) the bit string a1a2 . . . ak of length k with the integer a1 2^{k–1} + a2 2^{k–2} + · · · + a_{k–1} 2 + ak.
The DSA prime generation procedure (Algorithm 5.42) starts by selecting the prime divisor r and then tries to find a prime p such that r|(p–1). The outputs of H are utilized as pseudorandomly generated bit strings of length 160.
Once the DSA parameters p and r are available, an element g ∈ Z_p^* of multiplicative order r can be computed by Algorithm 3.26. Henceforth we assume that p, r and g are public knowledge and need not be supplied as inputs to the signature generation and verification routines. A DSA key pair consists of an integer d (the private key), 2 ≤ d ≤ r – 1, and the element g^d (the public key) of Z_p^*.
The DSA signature-generation procedure is given as Algorithm 5.43. One may additionally check whether s = 0 or t = 0 and, if so, repeat signature generation with another session key. But this, being an extremely rare event, can be ignored for all practical purposes. Both s and t are elements of Z_r and hence are represented as integers between 0 and r – 1.
Input: A message M to be signed and the signer’s private key d.
Output: The signature (M, s, t) on M.
Steps:
    Generate a random session key d′, 2 ≤ d′ ≤ r – 1.
    s := (g^{d′} mod p) (mod r).
    t := d′^{–1}(H(M) + ds) (mod r).
    Return (M, s, t).
DSA signature verification is described in Algorithm 5.44. For a valid signature (M, s, t) on a message M, the algorithm computes w ≡ t^{–1} ≡ d′(H(M) + ds)^{–1} (mod r), w1 ≡ H(M)w (mod r) and w2 ≡ sw (mod r). Therefore, g^{w1}(g^d)^{w2} ≡ g^{w1 + dw2} ≡ g^{w(H(M) + ds)} ≡ g^{d′(H(M) + ds)^{–1}(H(M) + ds)} ≡ g^{d′} (mod p). Reduction modulo r now gives ((g^{w1}(g^d)^{w2}) mod p) (mod r) = s.
Input: A signature (M, s, t) and the signer’s public key g^d.
Output: Verification status of the signature.
Steps:
    if (s or t is not in {1, 2, . . . , r – 1}) { Return “Signature not verified”. }
    w := t^{–1} (mod r).
    w1 := H(M)w (mod r).
    w2 := sw (mod r).
    if (((g^{w1}(g^d)^{w2}) mod p) (mod r) = s) { Return “Signature verified”. }
    else { Return “Signature not verified”. }
DSA signature generation performs a single exponentiation and DSA verification does two exponentiations modulo p. All the exponents are positive and ≤ r. Thus, DSA is essentially as fast as the Schnorr scheme or the NR scheme.
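Putting the pieces together, a toy DSA can be sketched (assumptions: the small parameters p = 467, r = 233, g = 4 stand in for the 512 + 64λ-bit prime and 160-bit r, and SHA-256 stands in for SHA-1):

```python
import hashlib, random

# Toy parameters: p = 467 = 2*233 + 1, r = 233 divides p - 1, g = 4 has order r.
p, r, g = 467, 233, 4

def Hr(M: bytes) -> int:
    return int.from_bytes(hashlib.sha256(M).digest(), "big") % r

def dsa_sign(M: bytes, d: int):
    while True:
        k = random.randrange(2, r)                    # session key d'
        s = pow(g, k, p) % r                          # s := (g^d' mod p) mod r
        if s == 0:
            continue                                  # rare: retry with a new d'
        t = pow(k, -1, r) * (Hr(M) + d * s) % r       # t := d'^-1 (H(M) + d s) mod r
        if t:
            return s, t

def dsa_verify(M: bytes, s: int, t: int, y: int) -> bool:
    if not (0 < s < r and 0 < t < r):
        return False
    w = pow(t, -1, r)
    w1, w2 = Hr(M) * w % r, s * w % r
    return pow(g, w1, p) * pow(y, w2, p) % p % r == s  # (g^w1 y^w2 mod p) mod r = s
```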
The ECDSA is the elliptic curve analog of the DSA. Algorithm 5.45 describes the generation of the domain parameters necessary to set up an ECDSA system. One first selects a suitable finite field F_q and takes a random elliptic curve E over F_q. E must be such that the cardinality n of the group E(F_q) has a suitably large prime divisor r. One generates a random point P ∈ E(F_q) of order r and works in the subgroup 〈P〉 of E(F_q) generated by P. It is assumed that q is either a prime p or a power 2^m of 2.
Input: A finite field F_q.
Output: A set of parameters E, n, r, P for the ECDSA.
Steps:
    while (1) {
The order n = ord E(F_q) can be computed using the SEA algorithm (for q = p) or the Satoh–FGH algorithm (for q = 2^m) described in Section 3.6. The integer n should be factored to check whether it has a prime divisor r > max(2^160, 4√q). The condition n ∤ (q^k – 1) for small values of k is necessary to avoid the MOV attack, whereas the condition n ≠ q ensures that the attack of Semaev, Smart and Satoh–Araki on anomalous curves cannot be mounted. E(F_q) is not necessarily a cyclic group. But, r being a prime, any point P ≠ O satisfying rP = O must be a point of order r.
An ECDSA key pair consists of a private key d (an integer in the range 2 ≤ d ≤ r – 1) and the corresponding public key dP ∈ 〈P〉. H denotes the hash function SHA-1, which converts bit strings of arbitrary length to bit strings of length 160. As discussed in connection with DSA, we identify bit strings with integers. We also associate elements of F_q with integers in the set {0, 1, . . . , q – 1}. ECDSA signatures can be generated as in Algorithm 5.46. It is necessary to check the conditions s ≠ 0 and t ≠ 0; if either condition fails, one should rerun the procedure with a new session key pair.
Input: A message M to be signed and the signer’s private key d.
Output: The signature (M, s, t) on M.
Steps:
    Generate a random session key pair (d′, d′P), 2 ≤ d′ ≤ r – 1.    /* Let us denote d′P = (h, k). */
    s := h (mod r).
    t := d′^{–1}(H(M) + ds) (mod r).
    Return (M, s, t).
ECDSA signature verification is explained in Algorithm 5.47. The correctness of this algorithm can be proved like that of Algorithm 5.44.
Input: A signature (M, s, t) and the signer’s public key dP.
Output: Verification status of the signature.
Steps:
    if (s or t is not in {1, 2, . . . , r – 1}) { Return “Signature not verified”. }
    w := t^{–1} (mod r).
    w1 := H(M)w (mod r).
    w2 := sw (mod r).
    Q := w1P + w2(dP).
    if (Q = O) { Return “Signature not verified”. }    /* Otherwise denote Q = (h, k). */
    if (h ≡ s (mod r)) { Return “Signature verified”. }
    else { Return “Signature not verified”. }
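The whole scheme can be sketched over a toy curve (assumptions: the standard textbook curve y^2 = x^3 + 2x + 2 over F_17 with base point (5, 1) of prime order 19, and SHA-256 in place of SHA-1; such a curve is of course far too small for real use):

```python
import hashlib, random

p_f, a_c = 17, 2                 # curve y^2 = x^3 + 2x + 2 over F_17
P0, r = (5, 1), 19               # base point of prime order r = 19

def ec_add(P, Q):
    """Affine point addition; None represents the point at infinity O."""
    if P is None: return Q
    if Q is None: return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % p_f == 0:
        return None                              # P + (-P) = O
    if P == Q:
        lam = (3 * x1 * x1 + a_c) * pow(2 * y1, -1, p_f) % p_f
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, p_f) % p_f
    x3 = (lam * lam - x1 - x2) % p_f
    return (x3, (lam * (x1 - x3) - y1) % p_f)

def ec_mul(k, P):
    R = None                                     # double-and-add
    while k:
        if k & 1:
            R = ec_add(R, P)
        P = ec_add(P, P)
        k >>= 1
    return R

def Hr(M: bytes) -> int:
    return int.from_bytes(hashlib.sha256(M).digest(), "big") % r

def ecdsa_sign(M: bytes, d: int):
    while True:
        k = random.randrange(2, r)               # session key d'
        s = ec_mul(k, P0)[0] % r                 # x-coordinate of d'P, mod r
        if s == 0:
            continue
        t = pow(k, -1, r) * (Hr(M) + d * s) % r
        if t:
            return s, t

def ecdsa_verify(M: bytes, s: int, t: int, Y) -> bool:
    if not (0 < s < r and 0 < t < r):
        return False
    w = pow(t, -1, r)
    Q = ec_add(ec_mul(Hr(M) * w % r, P0), ec_mul(s * w % r, Y))
    return Q is not None and Q[0] % r == s
```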
As discussed in Section 5.2.7, the XTR family of algorithms is an adaptation of conventional algorithms over finite fields. XTR achieves a speed-up by a factor of about three using a clever representation of elements in certain finite fields. It is no surprise that the DLP-based signature algorithms described so far can be given efficient XTR renderings. We explain here XTR–DSA, the XTR version of the digital signature algorithm.
In order to set up an XTR system, we need a prime p ≡ 2 (mod 3). The XTR group G is a subgroup of the multiplicative group of F_{p^6} and has a prime order q dividing p^2 – p + 1. For compliance with the original version of DSA, one requires q to be of bit length 160. The trace map Tr : F_{p^6} → F_{p^2} taking x ↦ x + x^{p^2} + x^{p^4} is used to represent an element x ∈ G by the element Tr(x) ∈ F_{p^2}. Under this representation, arithmetic in G translates to arithmetic in F_{p^2}. For example, we have seen how exponentiation in G can be efficiently implemented using F_{p^2} arithmetic (Algorithm 5.20). The trace Tr(g) of a generator g of G should also be made available for setting up the XTR domain parameters. In Section 5.2.7, we have discussed how a random set of XTR parameters (p, q, Tr(g)) can be computed.
An XTR key pair comprises a random integer d, 2 ≤ d ≤ q – 1 (the private key), and the trace Tr(g^d) (the public key). Algorithm 5.20 is used to compute Tr(g^d) from Tr(g) and d. This algorithm gives Tr(g^{d–1}) and Tr(g^{d+1}) as by-products. For an implementation of XTR–DSA, we require these two elements of F_{p^2} as well, so we assume that the public key consists of the three traces Sd(Tr(g)) = (Tr(g^{d–1}), Tr(g^d), Tr(g^{d+1})). As explained in Lenstra and Verheul [172], the values Tr(g^{d–1}) and Tr(g^{d+1}) can be computed easily from Tr(g^d) even when d is unknown, so it suffices to store only Tr(g^d) as the public key. But we avoid the details of this computation here and assume that all three traces are available to the signature verifier.
Algorithm 5.20 provides an efficient way of computing exponentiations in G. For DSA-like signature verification (cf. Algorithm 5.44), one computes products of the form g^a(g^d)^b with d unknown. In the XTR world, this amounts to computing the trace Tr(g^a(g^d)^b) from the knowledge of a, b, Tr(g) and Tr(g^d) (or Sd(Tr(g))), but without the knowledge of d. The XTR exponentiation algorithm is not directly applicable in this situation. We should, therefore, prescribe a method for computing traces of products in G. Doing so requires some mathematics, which we now state without proofs. See Lenstra and Verheul [170] for the missing details.
Let e := ab^{–1} (mod q). Then a + bd ≡ b(e + d) (mod q), that is, Tr(g^a(g^d)^b) = Tr(g^{b(e+d)}); it therefore suffices to compute Tr(g^{e+d}) from the knowledge of e, Tr(g) and Tr(g^d). We treat the 3-tuple Sk(Tr(g)) as a row vector (over F_{p^2}). For c ∈ F_{p^2}, let Mc denote the matrix whose rows are the vectors S_{–1}(c), S_0(c) and S_1(c), that is,

Equation 5.9

         ( c^{2p} – 2c    c^p    3           )
    Mc = ( c^p            3      c           )
         ( 3              c      c^2 – 2c^p  )

We take c := Tr(g). It can be shown that det Mc ≠ 0 for c = Tr(g), that is, the matrix M_{Tr(g)} is invertible, and we have:

Equation 5.10

    S_{e+d}(Tr(g)) = Sd(Tr(g)) (M_{Tr(g)})^{–1} Ne,

where Ne denotes the matrix whose rows are the vectors S_{e–1}(Tr(g)), Se(Tr(g)) and S_{e+1}(Tr(g)), all of which can be computed from e and Tr(g) alone (using Algorithm 5.20). With these observations, one can write the procedure for computing Tr(g^a(g^d)^b) as in Algorithm 5.48.
Input: a, b, Tr(g) and Sd(Tr(g)) for some unknown d.
Output: Tr(g^a(g^d)^b).
Steps:
    Compute e := ab^{–1} (mod q).
XTR–DSA signature generation (Algorithm 5.49) is an obvious adaptation of Algorithm 5.43.
Input: A message M to be signed and the signer’s private key d.
Output: The signature (M, s, t) on M with s, t ∈ Z_q.
Steps:
    do {
The bulk of the time taken by Algorithm 5.49 is spent on the computation of Tr(g^{d′}). Since the trace representation of XTR makes this exponentiation three times as efficient as the corresponding DSA exponentiation, XTR–DSA signature generation runs nearly three times as fast as DSA signature generation.
XTR–DSA signature verification can be easily translated from Algorithm 5.44 and is shown in Algorithm 5.50. The most costly step in the XTR–DSA verification routine is the computation of Tr(g^{w1}(g^d)^{w2}). One uses Algorithm 5.48 for this purpose. This algorithm, in turn, invokes the exponentiation Algorithm 5.20 twice. For the original DSA signature verification (Algorithm 5.44), the costliest step is the computation of g^{w1}(g^d)^{w2}, which involves two exponentiations and a (cheap) multiplication. A careful analysis shows that XTR–DSA signature verification runs nearly 1.75 times faster than DSA verification.
Input: An XTR–DSA signature (M, s, t) on a message M and the signer’s public key (Tr(g^{d–1}), Tr(g^d), Tr(g^{d+1})).
Output: Verification status of the signature.
Steps:
    if (s or t is not in {1, 2, . . . , q – 1}) { Return “Signature not verified”. }
    w := t^{–1} (mod q).
    w1 := H(M)w (mod q).
    w2 := sw (mod q).
    Compute v := Tr(g^{w1}(g^d)^{w2}) (using Algorithm 5.48).
    if (the integer derived from v, reduced modulo q, equals s) { Return “Signature verified”. }
    else { Return “Signature not verified”. }
The NTRU Signature Scheme (NSS) (Hoffstein et al. [131]) is an adaptation of the NTRU encryption algorithm discussed in Section 5.2.8. Cryptanalytic studies (Gentry et al. [110]) show that the NSS has security flaws. A newer version of the NSS, referred to as NTRUSign and resistant to these attacks, has been proposed by Hoffstein et al. [128]. In this section, we provide a brief overview of NTRUSign.
In order to set up the domain parameters for NTRUSign, we start with a (prime) positive integer n and consider the ring R := Z[X]/(X^n – 1). Elements of R are polynomials with integer coefficients and of degrees ≤ n – 1. The multiplication of R is denoted by ⊛, which is essentially the multiplication of two polynomials of Z[X] followed by setting X^n = 1. We also fix a positive integer β to be used as a modulus for the coefficients of the polynomials in R. The subsets R(νf) and R(νg) of R are of importance for the NTRUSign algorithm, where for a positive integer ν one defines R(ν) := {a ∈ R | exactly ν coefficients of a are 1 and the rest are 0}, and where νf and νg are suitably chosen parameters. The message space is assumed to consist of pairs of polynomials of R with coefficients reduced modulo β. We further assume that we have at our disposal a hash function H that maps messages (that is, binary strings) to elements of this message space.
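The ring multiplication ⊛ is a cyclic convolution on coefficient vectors, which can be sketched as follows (coefficient lists of length n represent elements of R; the quadratic double loop is only a minimal illustration, not an optimized implementation):

```python
def conv_mult(a, b, beta=None):
    """Multiply two elements of R = Z[X]/(X^n - 1), given as coefficient
    lists of equal length n: ordinary polynomial multiplication followed
    by folding exponents via X^n = 1.  If beta is given, coefficients are
    additionally reduced modulo beta."""
    n = len(a)
    c = [0] * n
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[(i + j) % n] += ai * bj          # X^i * X^j = X^((i+j) mod n)
    if beta is not None:
        c = [ci % beta for ci in c]
    return c
```

For example, with n = 3, (1 + X) ⊛ X = X + X^2, and X^2 ⊛ X^2 = X^4 = X.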
Let a = a0 + a1X + · · · + a_{n–1}X^{n–1} ∈ R. The average of the coefficients of a is denoted by ā := (1/n)(a0 + a1 + · · · + a_{n–1}). The centred norm ‖a‖ of a is defined by

    ‖a‖^2 := (a0 – ā)^2 + (a1 – ā)^2 + · · · + (a_{n–1} – ā)^2.

For two polynomials a, b ∈ R, one also defines

    ‖(a, b)‖^2 := ‖a‖^2 + ‖b‖^2.
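These definitions translate directly into code:

```python
def centred_norm_sq(a):
    """Squared centred norm of a polynomial given as a coefficient list:
    ||a||^2 = sum_i (a_i - abar)^2, where abar is the coefficient average."""
    abar = sum(a) / len(a)
    return sum((ai - abar) ** 2 for ai in a)

def pair_norm_sq(a, b):
    """||(a, b)||^2 = ||a||^2 + ||b||^2 for a pair of polynomials."""
    return centred_norm_sq(a) + centred_norm_sq(b)
```

Note that the centring makes the norm invariant under adding a constant to all coefficients: a polynomial with equal coefficients has centred norm 0.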
The parameters νf and νg should be so chosen that any polynomial f ∈ R(νf) and any polynomial g ∈ R(νg) have (centred) norms of the order O(n). An upper bound B on the norms (of pairs of polynomials) should also be predetermined.
Typical values for NTRUSign parameters are
(n, β, νf, νg, B) = (251, 128, 73, 71, 300).
It is estimated that these choices lead to a security level at least as high as in an RSA scheme with a 1024-bit modulus. For very long-term security, one may go for (n, β) = (503, 256).
In order to set up a key pair, the signer first chooses two random polynomials f ∈ R(νf) and g ∈ R(νg). The polynomial f should be invertible modulo β, and the signer computes fβ ∈ R with the property that fβ ⊛ f ≡ 1 (mod β). The public key of the signer is the polynomial h ≡ fβ ⊛ g (mod β), whereas the private key is the tuple (f, g, F, G), where F and G are two polynomials in R satisfying

    f ⊛ G – g ⊛ F = β    and    ‖F‖, ‖G‖ = O(n).

Hoffstein et al. [128] present an algorithm to compute F and G with ‖F‖, ‖G‖ ≤ cn from polynomials f and g with ‖f‖, ‖g‖ = O(n), where c is a given constant.
Input: A message M to be signed and the signer’s private key (f, g, F, G).
Output: The signature (M, s) on M.
Steps:
    Compute (m1, m2) := H(M).
    Compute polynomials A, B, a, b satisfying
        G ⊛ m1 – F ⊛ m2 = A + βB    and    –g ⊛ m1 + f ⊛ m2 = a + βb,
    where a and A have coefficients in the range between –β/2 and +β/2.
    Compute s ≡ f ⊛ B + F ⊛ b (mod β).
    Return (M, s).
NTRUSign signature generation is described in Algorithm 5.51. It is apparent that the NTRUSign algorithm derives its security from the difficulty of computing a vector v in a certain lattice that is close to the vector defined by the hashed message (m1, m2). For defining the lattice, we first note that a polynomial u = u0 + u1X + · · · + u_{n–1}X^{n–1} ∈ R can be identified with the vector (u0, u1, . . . , u_{n–1}) of dimension n formed by its coefficients. Similarly, two polynomials u, v ∈ R define a vector, denoted by (u, v), of dimension 2n. To the public key h we associate the 2n-dimensional lattice

    Lh := {(u, v) | u, v ∈ R and v ≡ u ⊛ h (mod β)}.
It is clear from the definitions that both (f, g) and (F, G) are in Lh.
If h = (h0, h1, . . . , h_{n–1}), then for each i = 0, 1, . . . , n – 1 we have

    X^i ⊛ h(X) ≡ (h_{n–i}, . . . , h_{n–1}, h0, . . . , h_{n–i–1}) (mod β), and
    βX^i ≡ 0 ≡ 0 ⊛ h(X) (mod β).

It follows immediately that Lh is generated by the rows of the 2n × 2n matrix

    ( I_n    H    )
    ( 0      βI_n ),

where H is the n × n matrix whose i-th row consists of the coefficients of X^i ⊛ h(X), that is, the i-fold cyclic rotation of (h0, h1, . . . , h_{n–1}).
Now, consider the signature generation routine (Algorithm 5.51). The hash function H generates from the message M a random 2n-dimensional vector m := (m1, m2), not necessarily on Lh. We then look at the vector v := (s, t) defined by

    s ≡ f ⊛ B + F ⊛ b (mod β), and
    t ≡ g ⊛ B + G ⊛ b (mod β).
The lattice Lh has the rotational invariance property, namely, if (u, v) ∈ Lh, then (X^i ⊛ u, X^i ⊛ v) is also in Lh for all i = 0, 1, . . . , n – 1. More generally, if (u, v) ∈ Lh, then (w ⊛ u, w ⊛ v) ∈ Lh for any polynomial w ∈ R. In particular, since v = (s, t) ≡ B ⊛ (f, g) + b ⊛ (F, G) (mod β) and since (f, g), (F, G) ∈ Lh, it follows that (s, t) ∈ Lh. Of these two polynomials, only s is needed for the generation of NTRUSign signatures. The other is needed during signature verification and can be computed easily from s using the formula t ≡ h ⊛ s (mod β), the validity of which follows from the definition of the lattice Lh.
The vector v = (s, t) ∈ Lh is close to the message vector m in the sense that the norm ‖(m1 – s, m2 – t)‖ is small, bounded by a quantity determined by the constant c chosen earlier (see Hoffstein et al. [128] for a proof of this relation). The verification routine can, therefore, be designed as in Algorithm 5.52.
Input: A signature (M, s) and the signer’s public key h.
Output: Verification status of the signature.
Steps:
    Compute (m1, m2) := H(M).
    Compute t ≡ h ⊛ s (mod β).
    if (‖(m1 – s, m2 – t)‖ ≤ B) { Return “Signature verified”. }
    else { Return “Signature not verified”. }
For the choice (n, β, c) = (251, 128, 0.45), we have ‖(m1 – s, m2 – t)‖ ≈ 216. Therefore, choosing the norm bound B slightly larger than this value (say, B = 300) allows the verification scheme to work correctly most of the time. The knowledge of the private key (f, g, F, G) allows the legitimate signer to compute the close vector (s, t) easily. On the other hand, for a forger (who is lacking the private information) fast computation of a vector v′ = (s′, t′) with small norm ‖(m1 – s′, m2 – t′)‖ (say ≤ 400 for the above parameter values) seems to be an intractable task. This is precisely why forging an NTRUSign signature is considered infeasible.
An exhaustive search can be mounted for generating a valid signature (s′, t′) on a message M with H(M) = (m1, m2). More precisely, a forger fixes half of the 2n coefficients of the polynomials s′ and t′ and then tries to solve t′ ≡ h ⊛ s′ (mod β) for the remaining half such that the norm ‖(m1 – s′, m2 – t′)‖ is small. It is estimated (see Hoffstein et al. [128] for the details) that the probability that a random guess for the unknown half succeeds is very low (≤ 2^{–178.44} for the given parameter values).
Another attack on the NTRUSign scheme is to determine the polynomials f, g from a knowledge of h. Since (f, g) is a short non-zero vector in the lattice Lh, an algorithm that can find such vectors can determine (f, g) (or a rotated version of it). However, for a proper choice of the parameters such an algorithm is deemed infeasible. (Also see the NTRU encryption scheme in Section 5.2.8.)
Similar to the NTRU encryption scheme, the NTRUSign scheme is fast: both signature generation and verification can be carried out in time O(n^2). This is one of the main reasons why the NTRUSign scheme deserves popularity. Indeed, it may be adopted as an IEEE standard. Unfortunately, however, several attacks on NTRUSign are known. Gentry and Szydlo [111] indicate the possibility of extending the attacks of Gentry et al. [110]. Nguyen [217] proposes a more concrete attack on NTRUSign, which is capable of recovering the private key from only 400 signatures. The future of NTRUSign and its modifications remains uncertain.
Suppose that an entity (Alice), referred to as the sender or the user, wants to get a message M signed by a second entity (Bob), called the signer, without revealing M to Bob. This can be achieved as follows. First, Alice transforms the message M to a blinded message M̃ := f(M) and sends M̃ to Bob. Bob generates the signature (M̃, σ) on M̃ and sends this pair back to Alice. Finally, Alice applies a second transform g to generate the signature s := g(σ) of Bob on M. The transform f hides the actual message M from Bob and thereby prevents Bob from associating Alice with the signed message (M, s). Such a signature scheme is called a blind signature scheme.
Blind signatures are widely used in electronic payment systems in which Alice (a customer) wants the signature of Bob (the bank) on an electronic coin, but does not want the bank to be capable of associating Alice with the coin. In this way, Alice achieves anonymity while spending an electronic coin.
In a blind signature scheme, Bob does not know M, but his signature on M̃ is essential for Alice to reconstruct the signature on M. Furthermore, the blind signature on M should not allow Alice to compute a blind signature on another message M′. More generally, Alice should not be able to generate l + 1 (or more) blind signatures from only l (or fewer) interactions with Bob. A forgery of this kind is called an (l, l + 1) forgery, or a one-more forgery (in case l is bounded above by a polynomial in the security parameter), or a strong one-more forgery (in case l is bounded above by a polylogarithmic function of the security parameter). An (l, l + 1) forgery is mountable on a scheme which is not existentially unforgeable (Exercises 5.15 and 5.19). Usually, existential forgery yields forged signatures on messages over which the forger has no (or little) control (that is, on messages which are likely to be meaningless).
Now, we describe some common blind signature schemes. We provide a brief overview of the algorithms. Detailed analysis of the security of these schemes can be found in the references cited at the end of this chapter.
Chaum’s blind signature protocol is based on the intractability of the RSAP (or the IFP). The signer generates two (distinct) large random primes p and q and computes n := pq. He then chooses a random integer e with gcd(e, φ(n)) = 1 and computes an integer d such that ed ≡ 1 (mod φ(n)). The public key (of the signer) is the pair (n, e), whereas the private key is d. Chaum’s protocol works as in Algorithm 5.53.
Input: A message M generated by Alice.
Output: Bob’s blind RSA signature (M, s) on M.
Steps:
    Alice hashes the message M to m := H(M).
    Alice chooses a random ρ invertible modulo n and computes m̃ := ρ^e m (mod n).
    Alice sends m̃ to Bob.
    Bob generates the signature σ := m̃^d (mod n).
    Bob sends σ to Alice.
    Alice computes Bob’s (blind) signature s := ρ^{–1}σ (mod n) on M.
Since σ ≡ (ρ^e m)^d ≡ ρ m^d (mod n), we have s ≡ ρ^{–1}σ ≡ m^d (mod n), that is, s is indeed the RSA signature of Bob on M. Bob receives only the blinded message ρ^e m (mod n) and gains no information about m, since ρ is randomly and secretly chosen by Alice.
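The whole protocol can be simulated in a few lines (both roles played locally; SHA-256 stands in for H, and the parameters used in the test are toy-sized):

```python
import hashlib, math, random

def blind_rsa_demo(M: bytes, p: int, q: int, e: int) -> int:
    """One run of Chaum's protocol with both roles simulated locally:
    Alice blinds m with a random rho, Bob signs blindly, Alice unblinds."""
    n = p * q
    d = pow(e, -1, (p - 1) * (q - 1))             # Bob's private key
    m = int.from_bytes(hashlib.sha256(M).digest(), "big") % n   # m := H(M)
    while True:                                   # Alice: rho invertible mod n
        rho = random.randrange(2, n)
        if math.gcd(rho, n) == 1:
            break
    m_blind = (pow(rho, e, n) * m) % n            # Alice -> Bob: rho^e m (mod n)
    sigma = pow(m_blind, d, n)                    # Bob signs blindly
    s = (pow(rho, -1, n) * sigma) % n             # Alice unblinds: s = rho^-1 sigma
    assert pow(s, e, n) == m                      # s is Bob's RSA signature on M
    return s
```

Bob never sees m, yet the returned s passes ordinary RSA verification against Bob’s public key (n, e).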
Let G be a finite multiplicative Abelian group and let g ∈ G be of order r (a large prime). We assume that computing discrete logarithms in G is an infeasible task. The key pair of the signer is denoted by (d, g^d), where the integer d, 2 ≤ d ≤ r – 1, is the private key and g^d the public key. The Schnorr blind signature protocol is described in Algorithm 5.54.
Input: A message M generated by Alice.
Output: Bob’s blind Schnorr signature (M, s, t) on M.
Steps:
    Alice asks Bob to initiate a communication.
    Bob chooses a random d″, 2 ≤ d″ ≤ r – 1.
    Bob sends u := g^{d″} to Alice.
    Alice selects α, β ∈ {2, . . . , r – 1}.
    Alice computes u′ := u g^α (g^d)^β.
    Alice computes s := H(M ‖ u′) and s̃ := s – β (mod r).
    Alice sends s̃ to Bob.
    Bob computes t̃ := d″ – d s̃ (mod r).
    Bob sends t̃ to Alice.
    Alice computes t := t̃ + α (mod r).
It is easy to check that the output (M, s, t) of Algorithm 5.54 is a valid Schnorr signature of Bob on the message M. The session key d′ (in the sense of Algorithm 5.38) for this signature is d′ = d″ + α + dβ. Since d and d″ are known only to Bob, Alice must depend on Bob for the computation of t. The message M is never sent to Bob, and its hash is masked by β. This is how the protocol achieves blindness.
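The blinding can be sketched as follows; the particular masking choices used here (u′ := u g^α y^β, masked hash s – β, unblinding t := t̃ + α) are one standard way to realize a blind version of the Schnorr scheme of Algorithm 5.38, assumed for this sketch rather than quoted from a fixed specification:

```python
import hashlib, random

# Toy subgroup: p = 467 = 2*233 + 1, and g = 4 has prime order r = 233.
p, r, g = 467, 233, 4

def Hr(data: bytes) -> int:
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % r

def blind_schnorr_demo(M: bytes, d: int):
    """One run of the blind protocol with both roles simulated locally.
    Returns an ordinary Schnorr signature (s, t) on M, produced without
    Bob ever seeing M or the final pair (s, t)."""
    y = pow(g, d, p)                              # Bob's public key
    k = random.randrange(2, r)                    # Bob's secret session key
    u = pow(g, k, p)                              # Bob -> Alice: u = g^k
    alpha, beta = random.randrange(2, r), random.randrange(2, r)
    u1 = u * pow(g, alpha, p) * pow(y, beta, p) % p   # Alice: u' = u g^a y^b
    s = Hr(M + str(u1).encode())                  # s := H(M || u')
    s_blind = (s - beta) % r                      # Alice -> Bob: masked hash
    t_blind = (k - d * s_blind) % r               # Bob -> Alice
    t = (t_blind + alpha) % r                     # Alice unblinds
    return s, t

def schnorr_verify(M: bytes, s: int, t: int, y: int) -> bool:
    u = pow(g, t, p) * pow(y, s, p) % p           # recovers g^(session key)
    return s == Hr(M + str(u).encode())
```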
Okamoto’s adaptation of the Schnorr scheme is proved to be resistant to attacks by a third entity (Pointcheval and Stern [237]). As in the Schnorr scheme, we fix a (finite multiplicative Abelian) group G in which it is difficult to compute discrete logarithms. We then choose two elements g1, g2 ∈ G of (large prime) order r. The private key of the signer now comprises a pair (d1, d2) of integers in {2, . . . , r – 1}, whereas the public key y is the group element y := g1^{d1} g2^{d2}. We assume that there is a hash function H whose outputs are in Z_r. We identify elements of G with bit strings. The Okamoto–Schnorr blind signature protocol is explained in Algorithm 5.55.
Input: A message M generated by Alice.
Output: Bob’s blind signature (M, s1, s2, s3) on M.
Steps:
    Alice asks Bob to initiate a communication.
    Bob chooses random k1, k2 ∈ {2, . . . , r – 1}.
    Bob sends u := g1^{k1} g2^{k2} to Alice.
    Alice selects α, β, γ ∈ {2, . . . , r – 1}.
    Alice computes u′ := u g1^α g2^β y^γ.
    Alice computes s1 := H(M ‖ u′) and s̃1 := s1 – γ (mod r).
    Alice sends s̃1 to Bob.
    Bob computes t1 := k1 – d1 s̃1 (mod r) and t2 := k2 – d2 s̃1 (mod r).
    Bob sends t1 and t2 to Alice.
    Alice computes s2 := t1 + α (mod r) and s3 := t2 + β (mod r).
An Okamoto–Schnorr signature (M, s1, s2, s3) on a message M can be verified by checking the equality s1 = H(M ‖ u), where u := g1^{s2} g2^{s3} y^{s1}. Each invocation of the protocol uses a session private key (k1, k2). Alice must depend on Bob for generating s2 and s3, because she is unaware of the private values d1, d2, k1 and k2. Alice, in an attempt to forge Bob’s blind signature, may start with random α, β and γ of her choice, but she still needs the integers d1 and d2 in order to complete the protocol. The blindness of Algorithm 5.55 stems from the fact that the message M is never sent to Bob and its hash is masked by γ.
So far we have seen signature schemes for which any entity with a knowledge of the signer’s public key can verify the authenticity of a signature. There are, however, situations where an active participation of the signer is necessary for the verification of a signature. Moreover, during a verification interaction a signer should not be allowed to deny a legitimate signature made by him. A signature meeting these requirements is called an undeniable signature.
Undeniable signatures are typically used for messages that are too confidential or private to be given an unlimited verification facility. In case of a dispute, an entity should be capable of proving a forged signature to be so, and at the same time must accept the binding to his own valid signatures. So, in addition to the signature generation and verification protocols, an undeniable signature scheme comes with a denial or disavowal protocol to guard against a cheating signer who is unwilling to accept his valid signature, either by not taking part in the verification interaction, or by responding incorrectly, or by claiming a valid signature to be forged.
There are applications where undeniable signatures are useful. For example, a software vendor can use undeniable signatures to prove the authenticity of its products only to its (paying) customers (and not to everybody).
Chaum and van Antwerpen gave the first concrete realization of an undeniable signature scheme [52, 51]. It is based on the intractability of computing discrete logarithms in the group Z_p^*, p a prime. Gennaro et al. [109] later adapted the algorithm to design an RSA-based undeniable signature scheme. We now describe these two schemes. Rigorous studies of these schemes can be found in the original papers. See also [53, 186, 187, 102, 202, 230].
For setting up the domain parameters for Chaum–van Antwerpen (CvA) signatures, Bob chooses a (large) prime p of the form p = 2r + 1, where r is also a prime. (Such a prime p is called a safe prime (Definition 3.5).) Bob finds a random element g ∈ Z_p^* of multiplicative order r, selects a random integer d, 2 ≤ d ≤ r – 1, and computes y := g^d (mod p). Bob publishes (p, g, y) as his public key and keeps the integer d secret as his private key. The value d^{–1} (mod r) is needed during verification and can be precomputed and stored (secretly) along with d. We assume that we have a hash function H that maps messages (that is, bit strings) to elements of the subgroup of order r in Z_p^*. In order to generate a CvA signature on a message M, Bob carries out the steps given in Algorithm 5.56. Verification of Bob’s CvA signature by Alice involves the interaction given in Algorithm 5.57.
Algorithm 5.56: CvA signature generation

Input: The message M to be signed and the signer's private key (p, d).
Output: The signature (M, s) on M.
Steps:
  m := H(M).
  s := m^d (mod p).
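The setup and signing steps can be sketched in a few lines of Python. The tiny parameters (r = 11, p = 23, g = 2) and the hash into the order-r subgroup are illustrative assumptions only; a real deployment needs large primes and a proper hash function.

```python
import hashlib

# Toy CvA domain parameters (illustrative only): p = 2r + 1 is a safe prime.
r, p = 11, 23
g = 2                         # element of multiplicative order r modulo p

def H(msg: bytes) -> int:
    """Toy hash into the order-r subgroup: g^(SHA-256(msg) mod r) (mod p)."""
    h = int.from_bytes(hashlib.sha256(msg).digest(), "big") % r
    return pow(g, h if h else 1, p)

# Bob's key material
d = 7                         # private key, 1 <= d <= r - 1
d_inv = pow(d, -1, r)         # d^(-1) (mod r), precomputed for verification
y = pow(g, d, p)              # public key (p, g, y)

def cva_sign(M: bytes) -> int:
    """Algorithm 5.56: s := H(M)^d (mod p)."""
    return pow(H(M), d, p)

s = cva_sign(b"hello")
```

Note that `pow(d, -1, r)` (Python 3.8+) computes the modular inverse d^−1 (mod r) that Bob stores alongside d.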
If (M, s) is a valid CvA signature, then

v ≡ (s^i y^j)^(d^−1 (mod r)) ≡ ((m^d)^i (g^d)^j)^(d^−1 (mod r)) ≡ m^i g^j ≡ v′ (mod p).
On the other hand, if s ≢ m^d (mod p), Bob can guess the element v′ with a probability of only 1/r, even under the assumption that Bob has unbounded computing resources. This means that unless the signature (M, s) is valid, it is extremely unlikely that Bob can make Alice accept it.
The denial protocol for the CvA scheme involves an interaction between the prover Bob and the verifier Alice, as given in Algorithm 5.58. In order to see how this denial protocol works, note that Algorithm 5.58 essentially makes two calls of the verification protocol. First assume that Bob executes the protocol honestly, that is, follows the steps as indicated. If the signature (M, s) is valid, the check v1 ≡ m^i1 g^j1 (mod p) (as well as the check v2 ≡ m^i2 g^j2 (mod p)) should succeed, and Alice's decision to accept the signature as valid is justified. On the other hand, if (M, s) is a forged signature, that is, if s ≢ m^d (mod p), then each of these checks succeeds with probability only 1/r, as discussed before. Thus, it is extremely unlikely that a forged signature is accepted as valid by Alice. So Alice eventually computes both w1 and w2 equal to s^(i1·i2·d^−1 (mod r)) (mod p) and accepts the signature as forged. Finally, suppose that Bob intends to deny the (purported) signature (M, s). If Bob does not take part in the interaction at all, his intention becomes clear. Otherwise, he sends v1 and/or v2 not computed according to the specified formulas. In that case, Bob succeeds in making Alice compute w1 = w2 with a probability of only 1/r. Thus, it is extremely unlikely that Bob, executing this protocol dishonestly, can successfully disavow a valid signature.
Algorithm 5.57: CvA signature verification

Input: A CvA signature (M, s) on a message M.
Output: Verification status of the signature.
Steps:
  Alice computes m := H(M).
  Alice chooses two secret random integers i, j ∈ {1, 2, . . . , r − 1}.
  Alice computes u := s^i y^j (mod p) and sends u to Bob.
  Bob computes v := u^(d^−1 (mod r)) (mod p) and sends v to Alice.
  Alice computes v′ := m^i g^j (mod p).
  Alice accepts the signature (M, s) if and only if v = v′.
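The interactive verification can be exercised with the same toy parameters as in the signing sketch (all values illustrative, not secure):

```python
import hashlib

# Toy CvA parameters (illustrative): p = 2r + 1, g of order r modulo p.
r, p, g, d = 11, 23, 2, 7
y = pow(g, d, p)
d_inv = pow(d, -1, r)

def H(msg: bytes) -> int:
    h = int.from_bytes(hashlib.sha256(msg).digest(), "big") % r
    return pow(g, h if h else 1, p)

def cva_verify(M: bytes, s: int, i: int, j: int) -> bool:
    """One run of the interactive verification (Algorithm 5.57)."""
    m = H(M)
    u = (pow(s, i, p) * pow(y, j, p)) % p        # Alice -> Bob: challenge u
    v = pow(u, d_inv, p)                         # Bob -> Alice: v := u^(d^-1 mod r)
    v_prime = (pow(m, i, p) * pow(g, j, p)) % p
    return v == v_prime

M = b"hello"
s = pow(H(M), d, p)                              # a valid signature
assert cva_verify(M, s, 3, 5)                    # valid signature accepted
assert not cva_verify(M, (s * g) % p, 3, 5)      # tampered signature rejected
```

The second assertion illustrates the congruence above: for the tampered value s·g, Bob's reply differs from v′ by a nonzero power of g.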
Algorithm 5.58: Denial protocol for CvA signatures

Input: A (purported) CvA signature (M, s) of Bob on a message M.
Output: One of the following decisions by Alice: the signature is valid; the signature is forged; Bob is trying to disavow a valid signature.
Steps:
  Alice computes m := H(M).
  Alice chooses two secret random integers i1, j1 ∈ {1, 2, . . . , r − 1}.
  Alice computes u1 := s^i1 y^j1 (mod p) and sends u1 to Bob.
  Bob computes v1 := u1^(d^−1 (mod r)) (mod p) and sends v1 to Alice.
  if (v1 ≡ m^i1 g^j1 (mod p)) { Alice accepts the signature as valid and stops. }
  Alice chooses two other secret random integers i2, j2 ∈ {1, 2, . . . , r − 1}.
  Alice computes u2 := s^i2 y^j2 (mod p) and sends u2 to Bob.
  Bob computes v2 := u2^(d^−1 (mod r)) (mod p) and sends v2 to Alice.
  if (v2 ≡ m^i2 g^j2 (mod p)) { Alice accepts the signature as valid and stops. }
  Alice computes w1 := (v1 g^−j1)^i2 (mod p) and w2 := (v2 g^−j2)^i1 (mod p).
  if (w1 = w2) { Alice accepts the signature as forged. }
  else { Alice concludes that Bob is trying to disavow a valid signature. }
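The denial interaction, with an honest Bob, can be sketched as follows (toy parameters as before; all values illustrative):

```python
import hashlib

# Toy CvA parameters (illustrative), as in the signing sketch.
r, p, g, d = 11, 23, 2, 7
y = pow(g, d, p)
d_inv = pow(d, -1, r)

def H(msg: bytes) -> int:
    h = int.from_bytes(hashlib.sha256(msg).digest(), "big") % r
    return pow(g, h if h else 1, p)

def denial(M: bytes, s: int, i1: int, j1: int, i2: int, j2: int) -> str:
    """One run of the denial protocol (Algorithm 5.58) with an honest Bob."""
    m = H(M)
    u1 = (pow(s, i1, p) * pow(y, j1, p)) % p
    v1 = pow(u1, d_inv, p)                           # Bob's honest reply
    if v1 == (pow(m, i1, p) * pow(g, j1, p)) % p:
        return "valid"
    u2 = (pow(s, i2, p) * pow(y, j2, p)) % p
    v2 = pow(u2, d_inv, p)
    if v2 == (pow(m, i2, p) * pow(g, j2, p)) % p:
        return "valid"
    # For a forgery, both sides equal s^(i1*i2*d^-1 (mod r)) (mod p).
    w1 = pow((v1 * pow(g, -j1, p)) % p, i2, p)
    w2 = pow((v2 * pow(g, -j2, p)) % p, i1, p)
    return "forged" if w1 == w2 else "cheating"

M = b"hello"
s = pow(H(M), d, p)
assert denial(M, s, 3, 5, 4, 7) == "valid"
assert denial(M, (s * g) % p, 3, 5, 4, 7) == "forged"
```

Here `pow(g, -j1, p)` (Python 3.8+) computes g^−j1 (mod p) directly.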
Gennaro, Krawczyk and Rabin’s undeniable signature scheme (the GKR scheme) is based on the (intractability of the) RSA problem.
A GKR key pair differs from a usual RSA key pair. The signer chooses two (large) random primes p and q such that both p′ := (p − 1)/2 and q′ := (q − 1)/2 are also prime, and sets n := pq. Two integers e and d satisfying ed ≡ 1 (mod φ(n)) are then selected. Finally, one requires a random element g ∈ Z_n^*, g ≠ 1, and y ≡ g^d (mod n). The public key of the signer is the tuple (n, g, y), whereas the private key is the pair (e, d). It can be shown that g need not be a random element of Z_n^*. Choosing a (fixed) small value of g (for example, g = 2) does not affect the security of the GKR protocol, but makes certain operations (computing powers of g) efficient.
Algorithm 5.59: GKR signature generation

Input: The message M to be signed and the signer's private key (e, d).
Output: The signature (M, s) on M.
Steps:
  m := H(M).
  s := m^d (mod n).
GKR signature generation (Algorithm 5.59) is the same as in RSA. The verification protocol described in Algorithm 5.60 accepts, in addition to a valid GKR signature (M, s), the signatures (M, αs), where α ∈ Z_n^* has multiplicative order 1 or 2 (there are four such values of α). In view of this, we define the subset

Sig_M := {α H(M)^d (mod n) | α ∈ Z_n^* of multiplicative order ≤ 2}

of Z_n^*. Any element s ∈ Sig_M is considered to be a valid signature on M. Since Bob knows p and q, he can easily find out all the elements α of Z_n^* of order ≤ 2 and can choose to output (M, αH(M)^d) as the GKR signature for any such α. Taking α = 1 (as in Algorithm 5.59) is the canonical choice, but during the execution of the denial protocol Bob will not be allowed to disavow other valid choices.
The interaction between the prover Bob and the verifier Alice during GKR signature verification is given in Algorithm 5.60. It is easy to see that if (M, s) is a valid GKR signature, then v = v′. On the other hand, if (M, s) is a forged signature, that is, if s ∉ Sig_M, then the equality v = v′ occurs with only negligible probability, even in the case that the forger has unbounded computational resources.
Algorithm 5.60: GKR signature verification

Input: A GKR signature (M, s) on a message M.
Output: Verification status of the signature.
Steps:
  Alice computes m := H(M).
  Alice chooses random integers i, j.
  Alice computes u := s^2i y^j (mod n) and sends u to Bob.
  Bob computes v := u^e (mod n) and sends v to Alice.
  Alice computes v′ := m^2i g^j (mod n).
  Alice accepts the signature (M, s) if and only if v = v′.
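A toy run of GKR signing and verification can be sketched as follows. The tiny parameters (p = 11, q = 7, so p′ = 5 and q′ = 3 are prime) and the hash are illustrative assumptions only:

```python
import hashlib, math

# Toy GKR parameters (illustrative only): p = 2p' + 1, q = 2q' + 1, n = pq.
p, q = 11, 7
n = p * q
phi = (p - 1) * (q - 1)
e = 7
d = pow(e, -1, phi)               # ed = 1 (mod phi(n))
g = 2
y = pow(g, d, n)                  # public key (n, g, y); private key (e, d)

def H(msg: bytes) -> int:
    """Toy hash to a unit modulo n."""
    h = int.from_bytes(hashlib.sha256(msg).digest(), "big") % n
    while h < 2 or math.gcd(h, n) != 1:
        h += 1
    return h

def gkr_sign(M: bytes) -> int:
    return pow(H(M), d, n)        # Algorithm 5.59: s := H(M)^d (mod n)

def gkr_verify(M: bytes, s: int, i: int, j: int) -> bool:
    """One run of the interactive verification (Algorithm 5.60)."""
    m = H(M)
    u = (pow(s, 2 * i, n) * pow(y, j, n)) % n   # Alice -> Bob
    v = pow(u, e, n)                            # Bob -> Alice
    return v == (pow(m, 2 * i, n) * pow(g, j, n)) % n

M = b"msg"
s = gkr_sign(M)
assert gkr_verify(M, s, 4, 9)                   # valid signature accepted
assert gkr_verify(M, n - s, 4, 9)               # (M, -s) accepted: -1 has order 2
assert not gkr_verify(M, (3 * s) % n, 4, 9)     # a forgery is rejected
```

The second assertion illustrates why the set Sig_M is needed: the even exponent 2i makes the protocol accept αs for any α of order ≤ 2.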
Algorithm 5.61: Denial protocol for GKR signatures

Input: A (purported) GKR signature (M, s) of Bob on a message M.
Output: One of the following decisions by Alice: the signature is forged; Bob is trying to disavow a valid signature.
Steps:
  Alice computes m := H(M).
  Alice chooses a random integer i ∈ {1, 2, . . . , k} and a random integer j.
  Alice computes w1 := m^i g^j (mod n) and w2 := s^i y^j (mod n).
  Alice sends (w1, w2) to Bob.
  Bob computes m := H(M).
  Bob determines an i′, 1 ≤ i′ ≤ k, satisfying

      w2^e w1^−1 ≡ ±(s^e m^−1)^i′ (mod n).          (Equation 5.11)

  if (no such i′ is found) { /* This may happen, if Alice has cheated */ Bob aborts the protocol. }
  Bob sends i′ to Alice.
  if (i′ = i) { Alice concludes that the signature is forged. }
  else { Alice concludes that Bob is trying to disavow a valid signature. }
The denial protocol for the GKR scheme is described in Algorithm 5.61. This protocol is executed after verification by Algorithm 5.60 fails. In that case, Alice wants to ascertain whether the signature is actually invalid or whether Bob has denied his valid signature by incorrectly executing the verification protocol. A small integer k is predetermined for the denial protocol. The prover needs running time proportional to k, whereas the probability of a successful denial of a valid signature decreases with k. Taking k = O(lg n) gives optimal performance.
In order to see how this protocol prevents Bob from denying a valid signature, first consider the case that (M, s) is a valid GKR signature of Bob, that is, s ≡ αm^d (mod n) for some α ∈ Z_n^* of multiplicative order ≤ 2. Then s^e ≡ α^e m^de ≡ α^e m (mod n). Therefore, for every i′, 1 ≤ i′ ≤ k, Congruence (5.11) is satisfied.
. Thus, Bob can only guess the secret value of i chosen by Alice and the guess is correct with a probability of 1/k. On the other hand, if (M, s) is a forged signature, Congruence (5.11) holds only for a single i′, that is, for i′ = i (Exercise 5.23). Sending i′ will then convince Alice that the signature is really forged. In both these cases, Congruence (5.11) holds for at least one i′. Failure to detect such an i′ implies that the value(s) of w1 and/or w2 have not been correctly sent by Alice. The protocol should then be aborted.
In order to reduce the probability of successful cheating, it is convenient to repeat the protocol a few times instead of increasing k. If k = 1024, Bob can successfully cheat in eight executions of the denial protocol with a probability of only 2^−80.
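The arithmetic behind the quoted figure is easy to check:

```python
from fractions import Fraction

# Each execution of the denial protocol lets a cheating Bob succeed with
# probability 1/k; eight independent executions with k = 1024 = 2^10 give
# (1/2^10)^8 = 2^-80.
k, runs = 1024, 8
assert Fraction(1, k) ** runs == Fraction(1, 2 ** 80)
```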
The conventional way to ensure both authentication and confidentiality of a message is to sign the message first and then encrypt the signed message. Now that we have many signature and encryption algorithms in our bag, there is hardly any difficulty in achieving both goals simultaneously. Zheng proposes signcryption schemes that combine these two operations. A signcryption scheme is better than a sign-and-encrypt scheme in two respects. First, the combined primitive takes less running time than the composite primitive comprising signature generation followed by encryption. Second, a signcrypted message is smaller than a signed-and-encrypted message. When communication overheads need to be minimized, signcryption proves to be useful.
Before describing the signcryption primitive, let us first review the composite sign-and-encrypt scheme. Let M be the message to be sent. Alice the sender generates the signature appendix s on M using one of the signature schemes described earlier. This step can be described as s = fs(M, da), where da is the private key of Alice. Next a symmetric key k is generated by Alice. The message M is encrypted by a symmetric cipher (like DES) under the key k, that is, C := E(M, k). The key k is then encrypted using an asymmetric routine under the public-key eb of Bob the recipient, that is, c = fe(k, eb). The triple (C, c, s) is then transmitted to Bob.
Upon reception of (C, c, s), Bob first retrieves k using his private key db, that is, k = fd(c, db). The message M is then recovered by symmetric decryption: M = D(C, k). Finally, the authenticity of M is verified from the signature using the verification operation fv(M, s, ea), where ea is the public key of Alice. Algorithm 5.62 describes the sign-and-encrypt operation and its inverse.
Algorithm 5.62: Sign-and-encrypt and decrypt-and-verify

Sign-and-encrypt:
  s := fs(M, da).
  Generate a random symmetric key k.
  c := fe(k, eb).
  C := E(M, k).
  Send (C, c, s) to the recipient.
Decrypt-and-verify:
  k := fd(c, db).
  M := D(C, k).
  Verify the signature: fv(M, s, ea).
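The composite scheme can be sketched end to end with stand-in primitives: textbook RSA for fs/fv and fe/fd, and a SHA-256-based XOR stream cipher for E/D. All parameters and helper names here are illustrative assumptions, not secure choices:

```python
import hashlib, random

# Stand-in symmetric cipher E/D: XOR with a SHA-256 keystream (toy only).
def keystream(k: bytes, length: int) -> bytes:
    out, ctr = b"", 0
    while len(out) < length:
        out += hashlib.sha256(k + ctr.to_bytes(4, "big")).digest()
        ctr += 1
    return out[:length]

def E(M: bytes, k: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(M, keystream(k, len(M))))

D = E                                   # XOR stream cipher: D = E

# Toy RSA key pairs: Alice signs, Bob receives.
p_a, q_a, e_a = 61, 53, 17
n_a = p_a * q_a
d_a = pow(e_a, -1, (p_a - 1) * (q_a - 1))
p_b, q_b, e_b = 53, 59, 7
n_b = p_b * q_b
d_b = pow(e_b, -1, (p_b - 1) * (q_b - 1))

def h(M: bytes, n: int) -> int:         # hash-then-sign helper
    return int.from_bytes(hashlib.sha256(M).digest(), "big") % n

def sign_and_encrypt(M: bytes):
    s = pow(h(M, n_a), d_a, n_a)        # s := f_s(M, d_a)
    k = random.randrange(2, n_b)        # random symmetric session key
    c = pow(k, e_b, n_b)                # c := f_e(k, e_b)
    C = E(M, k.to_bytes(4, "big"))      # C := E(M, k)
    return C, c, s

def decrypt_and_verify(C: bytes, c: int, s: int):
    k = pow(c, d_b, n_b)                # k := f_d(c, d_b)
    M = D(C, k.to_bytes(4, "big"))      # M := D(C, k)
    return M, pow(s, e_a, n_a) == h(M, n_a)   # f_v(M, s, e_a)

C, c, s = sign_and_encrypt(b"attack at dawn")
M, ok = decrypt_and_verify(C, c, s)
assert M == b"attack at dawn" and ok
```

Note that the signed-and-encrypted packet carries all three components (C, c, s); signcryption below eliminates c.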
Zheng’s signcryption scheme combines fs and fe into a single operation fse, and also fd and fv into another single operation fdv. Each of these combined operations essentially takes the time of a single public- or private-key operation and hence leads to a performance enhancement by a factor of nearly two. Moreover, the encrypted key c need not be sent with the message, that is, C and s are sufficient for both authentication and confidentiality. This reduces communication overhead.
Signcryption is based on shortened digital signature schemes. Table 5.3 describes the shortened versions of DSA (Section 5.4.6). We use the notations of Algorithms 5.43 and 5.44. Also, ‖ denotes concatenation of strings, and H is a hash function (like SHA-1). The shortened schemes have two advantages over the original DSA. First, a DSA signature is of length 2|r|, whereas an SDSA1 or SDSA2 signature has length |r| + |H(·)|. For the current version of the standard, both r and H(·) are of size 160 bits. However, one may use a potentially bigger r, and in that case the shortened schemes give smaller signatures with equivalent security. Second, DSA requires computing a modular inverse during verification, whereas SDSA does not. So verification is more efficient in the shortened schemes.
| Name | Signature generation | Signature verification |
|---|---|---|
| SDSA1 | s := H(g^d′ (mod p) ‖ M). t := d′(s + d)^−1 (mod r). | w := (ea g^s)^t (mod p). Verify if s = H(w ‖ M). |
| SDSA2 | s := H(g^d′ (mod p) ‖ M). t := d′(1 + ds)^−1 (mod r). | w := (g ea^s)^t (mod p). Verify if s = H(w ‖ M). |
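A toy SDSA1 run can be sketched as follows. The tiny parameters (p = 23, r = 11 dividing p − 1, g = 2 of order r) are illustrative assumptions, and the signer here scans small ephemeral keys instead of picking one at random:

```python
import hashlib

# Toy SDSA1 sketch (Table 5.3) with tiny illustrative parameters.
r, p, g = 11, 23, 2
d = 7                                  # Alice's private key
e_a = pow(g, d, p)                     # Alice's public key e_a = g^d (mod p)

def H(data: bytes) -> int:
    return int.from_bytes(hashlib.sha256(data).digest(), "big")

def sdsa1_sign(M: bytes):
    # A real signer picks the ephemeral key d' at random; we scan small
    # values so that s + d is invertible modulo r.
    for d_eph in range(1, r):
        s = H(pow(g, d_eph, p).to_bytes(1, "big") + M)
        if (s + d) % r:
            t = (d_eph * pow((s + d) % r, -1, r)) % r
            return s, t
    raise RuntimeError("no usable ephemeral key")

def sdsa1_verify(M: bytes, s: int, t: int) -> bool:
    w = pow((e_a * pow(g, s, p)) % p, t, p)    # w := (e_a g^s)^t (mod p)
    return s == H(w.to_bytes(1, "big") + M)

s, t = sdsa1_sign(b"msg")
assert sdsa1_verify(b"msg", s, t)
assert not sdsa1_verify(b"other", s, t)
```

Correctness rests on the exponent arithmetic modulo r: (e_a g^s)^t = g^((d+s)·d′·(s+d)^−1) = g^d′, so the verifier recomputes the hashed value and no modular inverse is needed at verification time.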
Algorithms 5.63 and 5.64 provide the details of the signcryption algorithm and its inverse, called unsigncryption. The algorithms use a keyed hash function KH. One may implement KH(x, k) as H(x ‖ k) using an unkeyed hash function H.
Signcryption differs from the shortened scheme in that eb^d′ (mod p) is used instead of g^d′ for the computation of s. The running time of the signcryption algorithm is dominated by this modular exponentiation. When signature and encryption are used separately, the encryption operation uses one (or more) additional exponentiations. So signcryption significantly improves upon the sign-and-encrypt scheme of Algorithm 5.62.
Algorithm 5.63: Signcryption

Input: Plaintext message M, the sender's private key da, the recipient's public key eb = g^db (mod p).
Output: The signcrypted message (C, s, t).
Steps:
  Select a random d′ ∈ {1, 2, . . . , r − 1}.
  k := H(eb^d′ (mod p)).
  Write k := k1 ‖ k2 with |k2| equal to the length of a symmetric key.
  C := E(M, k2).
  s := KH(M ‖ N, k1).
  t := d′(s + da)^−1 (mod r).
  Send (C, s, t) to the recipient.
Algorithm 5.64: Unsigncryption

Input: The signcrypted message (C, s, t), the sender's public key ea = g^da (mod p) and the recipient's private key db.
Output: The plaintext message M and the verification status of the signature.
Steps:
  k := H((ea g^s)^(t·db) (mod p)).
  Write k := k1 ‖ k2 with |k2| equal to the length of a symmetric key.
  M := D(C, k2).
  if (KH(M ‖ N, k1) = s) { Return "Signature verified". }
  else { Return "Signature not verified". }
The most time-consuming part of unsigncryption is the computation of two modular exponentiations. DSA verification too has this property. However, the additional decryption in the decrypt-and-verify scheme of Algorithm 5.62 calls for one (or more) exponentiations, making it slower than unsigncryption.
5.15

5.16 Assume that Bob uses the same RSA key pair ((n, e), d) for receiving encrypted messages and for signing. Suppose that Carol intercepts the ciphertext c ≡ m^e (mod n) sent by Alice. Also suppose that Bob is willing to sign any random message presented by Carol. Explain how Carol can choose a message to be signed by Bob in order to retrieve the secret m. [H]

5.17 Let G be a finite cyclic group of order n, and g a generator of G. Suppose that Alice’s private and public keys are respectively d and g^d.

5.18 Show that: (Here we call a signature valid, if it passes the verification routine.)

5.19

5.20 Design the XTR version of the Nyberg–Rueppel signature scheme with appendix (Section 5.4.5). What are the speed-ups achieved by the signature generation and verification routines of the XTR version over the original NR routines?

5.21 Repeat Exercise 5.20 with the Schnorr digital signature scheme (Section 5.4.4).

5.22

5.23 Let p, q, p′, q′ be distinct odd primes with p = 2p′ + 1 and q = 2q′ + 1, and let n := pq (as in the RSA-based undeniable signature scheme).

5.24
Entity authentication (also called identification) is a process by means of which an entity Alice, called the claimant, proves her identity to another entity Bob, called the verifier. Alice is assumed to possess some secret piece(s) of information that no intruder is expected to know. During the execution of the identification protocol, an interaction takes place between Alice and Bob. If the interaction allows Bob to conclude (deterministically or with high probability) that the claimant possesses the secret knowledge, he accepts the claimant as Alice. An intruder Carol lacking the secret information is expected (with high probability) to fail to convince Bob of her identity as Alice. This is how entity authentication schemes prevent impersonation attacks by intruders. Typically, identification schemes are used to protect access to some sensitive piece(s) of data, like a user’s (or a group’s) private files in a computer or an account in a bank. Both secret-key and public-key techniques are used for the realization of entity authentication protocols.
A password is a small string to be remembered by an entity and produced verbatim to the verifier at the time of identification. The most common example is a computer password used to protect access to a user’s private working area in a file system. In this case, an alphanumeric string (or a string that can be input using a computer keyboard) of length between 4 and 20 characters is normally used as the secret information associated with an entity. Passwords are also used to prevent misuse of certain physical objects (like an ATM card for withdrawing cash from one’s bank account, a prepaid telephone card) by anybody other than the legitimate owners of the objects. In this case, a password usually consists of a sequence of four to ten digits and is also called a personal identification number or a PIN.
In order that Bob can recognize an entity from her password, one possibility is for Bob to store the (entity, password) pairs corresponding to all the entities that are expected to participate in identification interactions with him. When Alice enters her password, Bob checks whether Alice’s input is the same as what he has stored in the pair for Alice. The file(s) storing these private records should be preserved with high secrecy, and neither read nor write access should be granted to any user. But a privileged user (the superuser) is usually given the capability to inspect any file (even read-protected ones) and can, therefore, misuse the passwords.
This problem can be avoided by storing, instead of the passwords themselves, a one-way transform of the passwords.[3] When Alice enters a password P, Bob computes the transform f(P) and compares f(P) with the record stored for Alice. The identity of Alice is accepted if and only if a match occurs. The password file now need not be read-protected, since any intruder (even the superuser) knowing the value f(P) cannot easily compute P.
[3] Informally speaking, a one-way function is one which is computationally infeasible to invert.
Passwords should be chosen from a space large enough to preclude exhaustive search by an intruder in feasible time. Unfortunately, however, it is a common tendency for human users to choose passwords from limited subsets of the allowed space. For example, use of lower-case characters, dictionary words, popular names, birth dates and so on in passwords makes attacks on passwords much easier. A strategy to foil such dictionary-based attacks is to use a pseudorandom bit sequence S known as the salt and to apply the one-way function f to a combination of the password P and the salt S. That is, the value f(P, S) is now stored against an entity Alice having a password P. The combination (P, S) is often referred to as a key for the password scheme. Since a password now corresponds to many possible keys, the search space for an intruder increases dramatically. For instance, if S is a pseudorandomly chosen bit string of length 64, the intruder has to compute f(P, S) up to 2^64 times in order to find the correct salt for each password P under trial. It is also necessary that the same key is not chosen for two different entities. If the salt S is a 64-bit string, then by the birthday paradox a collision between two keys is expected to occur only after (at least) 2^32 keys are generated.
A second strategy to strengthen the protection of passwords is to increase the so-called iteration count n, that is, instead of storing f(P, S) for each password P, Bob now stores f^n(P, S). An n-fold application of the function f increases both the time for password verification and the time for exhaustive search by a factor of n. For a legitimate user, this is not really a nuisance, since computing f^n(P, S) only once during identification is tolerable (and may even be unnoticeable), whereas for an intruder, breaking a password becomes n times as difficult. In typical applications, values of n ≥ 1000 are recommended.
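The salted, iterated construction can be sketched with the standard library's PBKDF2, which plays the role of the iterated one-way function f^n(P, S); the salt length and iteration count below are illustrative choices:

```python
import hashlib, hmac, os

# Store (salt, iteration count, f^n(P, S)), never the password P itself.
def make_record(password: str, iterations: int = 100_000):
    salt = os.urandom(8)                      # 64-bit pseudorandom salt S
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return salt, iterations, digest

def check_password(attempt: str, record) -> bool:
    salt, iterations, digest = record
    candidate = hashlib.pbkdf2_hmac("sha256", attempt.encode(), salt, iterations)
    return hmac.compare_digest(candidate, digest)   # constant-time comparison

rec = make_record("correct horse")
assert check_password("correct horse", rec)
assert not check_password("wrong guess", rec)
```

An intruder who steals the record must redo the full iterated computation for every (password, salt) guess, which is exactly the n-fold slow-down described above.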
In some situations, it is advisable to lock access to a password-protected area after a predetermined number of (say, three) wrong passwords have been input in succession. This is typically the case with PINs for which the search space is rather small. For unlocking the access (to the legitimate user Alice), a second longer key (again known only to Alice) is used or human intervention is called for.
As a case study, let us briefly describe the password scheme used by the UNIX operating system. During the creation of a password, a user supplies a string P of eight 7-bit ASCII characters as the password. (Longer strings are truncated to the first eight characters.) A 56-bit DES[4] key K is constructed from P. A 12-bit salt S is obtained from the system clock at the time of the creation of the password. The zero message (that is, a block of 64 zero bits) is then iteratively encrypted n = 25 times using K as the key. The encryption algorithm is a variant of DES that depends on the salt S. The output ciphertext and the salt (which account for a total of 64 + 12 = 76 bits) are then packed into eleven 7-bit ASCII characters and stored in the password file (usually /etc/passwd). When UNIX was designed (in 1970), this algorithm, often referred to as the UNIX crypt password algorithm, was considered reasonably safe under the assumption that finding a DES key from a plaintext–ciphertext pair is difficult. With today’s hardware and software speed, a motivated attacker can break UNIX passwords in very little time.
[4] The data encryption standard (DES) is a well-known symmetric-key cipher (Section A.2.1).
Password-based authentication schemes suffer from the disadvantage that the user has to disclose her secret P to the verifier. The verifier may misuse the knowledge of P by storing it secretly and deploying it later. During the computation of f^n(P, S), the string P resides in the machine’s memory. An eavesdropper capable of monitoring the temporary storage holding the string P easily gets its value. In view of these shortcomings, password schemes are referred to as weak authentication schemes.
In a strong authentication scheme, the claimant proves the possession of a secret knowledge to a verifier without disclosing the secret to the verifier. One of the communicating entities generates a random bit string c known as the challenge and sends c (or a function of c) to the other. The latter then reacts to the challenge appropriately, for example, by sending a response string r to the former. Strong authentication schemes are, therefore, also called challenge–response authentication schemes. The communication between the entities depends both on the random challenge and on the secret knowledge of the claimant. An intruder lacking the secret knowledge of a valid claimant cannot take part properly in the interaction. Furthermore, since a random challenge is used during each invocation of the identification protocol, an eavesdropper cannot use the intercepted transcripts of a particular session for a future invocation of the protocol.
Public-key protocols can be used to realize challenge–response schemes. We assume that Alice is the claimant and Bob is the verifier. Without committing to specific algorithms, we denote the public and private keys of Alice by e and d, and the encryption and decryption transforms by fe and fd respectively. Alice proves her identity by demonstrating her knowledge of d (but without revealing d) to Bob. Bob uses the transform fe and Alice the transform fd under the respective keys e and d. If a key d′ other than d is used by Carol in conjunction with e, some step of the interaction detects this and the protocol rejects Carol’s claim to be Alice. We describe two challenge–response schemes that differ in the sequence of applying the transforms fe and fd.
In this scheme, Bob (the verifier) first generates a random string r, encrypts the same by the public key of Alice (the claimant) and sends the ciphertext c (the challenge) to Alice. Alice uses her private key to decrypt c to the message r′ and sends r′ (the response) back to Bob. Identification of Alice succeeds if and only if r = r′. Algorithm 5.65 illustrates the details of this scheme. It employs a one-way function H (like a hash function) for a reason explained later. This scheme checks whether the claimant can recover the random string r correctly. A knowledge of the decryption key d is needed for that.
Algorithm 5.65: Challenge–response identification based on decryption

  Bob generates a random bit string r and computes the witness w := H(r).
  Bob reads Alice's (authentic) public key e and computes c := fe(r, e).
  Bob sends (w, c) to Alice.
  Alice computes r′ := fd(c, d).
  if (H(r′) ≠ w) { Alice quits the protocol. }
  Alice sends r′ to Bob.
  Bob identifies Alice if and only if r′ = r.
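A single run of this scheme can be sketched with textbook RSA standing in for (fe, fd); the tiny modulus and the SHA-256 witness function are illustrative assumptions:

```python
import hashlib, random

# Toy run of the witness-based challenge-response scheme (Algorithm 5.65).
p, q, e = 61, 53, 17
n = p * q
d = pow(e, -1, (p - 1) * (q - 1))          # Alice's private key

def H(x: int) -> str:                      # one-way witness function
    return hashlib.sha256(x.to_bytes(4, "big")).hexdigest()

# Bob: random string r, witness w := H(r), challenge c := f_e(r, e).
r_val = random.randrange(2, n)
w, c = H(r_val), pow(r_val, e, n)

# Alice: decrypt the challenge and check the witness before answering.
r_prime = pow(c, d, n)
assert H(r_prime) == w                     # otherwise Alice quits the protocol

# Bob: identify Alice if and only if the response equals r.
assert r_prime == r_val
```

The witness check is what stops Alice from acting as a decryption oracle for arbitrary ciphertexts.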
The string H(r) = w is called the witness. By sending w to Alice, Bob convinces her of his knowledge about the secret r without disclosing r itself. If Bob (or a third party pretending to be Bob) tries to cheat, Alice has the option to abort the protocol prematurely. In other words, Alice does not have to decrypt an arbitrary ciphertext presented by Bob without confirming that Bob knows the corresponding plaintext.
In the scheme explained in Algorithm 5.66, Alice (the claimant) first performs the private-key operation, that is, Alice sends her digital signature on a message to Bob (the verifier). Bob then verifies the signature of Alice by employing the encryption transform with Alice’s public key.
Algorithm 5.66: Challenge–response identification based on digital signatures

  Bob selects a random string rB and sends rB to Alice.
  Alice selects a random string rA.
  Alice generates the signature s := fd(rA ‖ rB, d).
  Alice sends (rA, s) to Bob.
  Bob reads Alice's (authentic) public key e.
  Bob retrieves the strings r′A and r′B by computing r′A ‖ r′B := fe(s, e).
  Bob identifies Alice if and only if r′A = rA and r′B = rB.
This authentication scheme is based on the assumption that only a person knowing Alice’s private key d can generate a signature s that leads to the equalities r′A = rA and r′B = rB. Using only rA and the signature s = fd(rA, d) would demonstrate to Bob that Alice possesses the requisite knowledge of d. The random string rB is used to prevent the so-called replay attack. If rB were not used, an eavesdropper Carol intercepting the transcripts of a session could later claim her identity as Alice by simply supplying rA and Alice’s signature on rA to Bob. Using a new rB in every session (and incorporating it in the signature) guarantees that the signature varies from session to session, even when rA remains the same.
There is an alternative strategy by which the use of the random string rB can be avoided. All we have to ensure is that a value of rA used once cannot be reused in a subsequent session. This can be achieved by using a timestamp, which is a string reflecting the time when a certain event occurs (in our case, when Alice generates the signature). Thus, if Alice gets the local time tA, computes the signature s := fd(tA, d) and sends (tA, s) to Bob, it is sufficient for Bob to check that the timestamp tA is valid. A possible criterion for the validity of Alice’s timestamp tA is that the difference between tA and the time when Bob is verifying the signature is within an allowed bound (predetermined, based on the approximate time for the communication). But it may be possible for an adversary to provide to Bob the timestamp tA and Alice’s signature on tA, before tA expires. Therefore, Bob should additionally ensure that timestamps from Alice come in a strictly ascending order. Maintaining the timestamp for the last interaction with Alice takes care of this requirement. Algorithm 5.67 describes the modified version of Algorithm 5.66, based on timestamps. A problem with timestamps is that (local) clocks across a network have to be properly synchronized.
Algorithm 5.67: Identification based on timestamps

  Alice reads the local time tA.
  Alice generates the signature s := fd(tA, d).
  Alice sends (tA, s) to Bob.
  Bob reads Alice's (authentic) public key e.
  Bob retrieves the timestamp t′A := fe(s, e).
  Bob identifies Alice if and only if t′A = tA and the timestamp tA is valid.
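A toy run of the timestamp variant, with textbook RSA (hash-then-sign) standing in for (fd, fe); the modulus, the freshness bound, and the `last_seen` bookkeeping are illustrative assumptions:

```python
import hashlib, time

# Toy run of timestamp-based identification (Algorithm 5.67).
p, q, e = 61, 53, 17
n = p * q
d = pow(e, -1, (p - 1) * (q - 1))

def h(t: int) -> int:                     # hash the timestamp into Z_n
    return int.from_bytes(hashlib.sha256(str(t).encode()).digest(), "big") % n

def sign(t: int) -> int:                  # s := f_d(t_A, d)
    return pow(h(t), d, n)

def verify(t: int, s: int, last_seen: int, now: int, bound: int = 5) -> bool:
    # Freshness: timestamps strictly increasing and within the allowed bound.
    fresh = last_seen < t and 0 <= now - t <= bound
    return fresh and pow(s, e, n) == h(t)

t_a = int(time.time())
s = sign(t_a)
assert verify(t_a, s, last_seen=t_a - 10, now=t_a)   # fresh signature accepted
assert not verify(t_a, s, last_seen=t_a, now=t_a)    # replayed timestamp rejected
```

The `last_seen` check implements Bob's requirement that timestamps from Alice arrive in strictly ascending order.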
So far, we have described identification schemes that are unidirectional or unilateral in the sense that only Alice tries to prove her identity to Bob. For mutual authentication between Alice and Bob, the above schemes can be used a second time with the roles of Alice and Bob reversed. Algorithm 5.68 describes an alternative strategy that achieves mutual authentication with reduced communication overhead (compared to two invocations of the unidirectional scheme). Now, the key pairs (eA, dA) and (eB, dB) and the transforms fe,A, fd,A and fe,B, fd,B of both Alice and Bob are used.
The challenge–response schemes described above ensure that the claimant’s secret is not made available to the verifier (or a listener to the communication between the verifier and the claimant). But the claimant uses her private key for generating the response and, therefore, it continues to remain possible that a verifier extracts some partial information on the secret by choosing challenges strategically.
Algorithm 5.68: Mutual authentication

  Bob selects a random string rB and sends rB to Alice.
  Alice selects a random string rA.
  Alice generates the signature sA := fd,A(rA ‖ rB, dA).
  Alice sends (rA, sA) to Bob.
  Bob reads Alice's (authentic) public key eA.
  Bob retrieves the strings r′A and r′B by computing r′A ‖ r′B := fe,A(sA, eA).
  Bob identifies Alice if and only if r′A = rA and r′B = rB.
  Bob generates the signature sB := fd,B(rB ‖ rA, dB).
  Bob sends sB to Alice.
  Alice reads Bob's (authentic) public key eB.
  Alice retrieves the strings r″B and r″A by computing r″B ‖ r″A := fe,B(sB, eB).
  Alice identifies Bob if and only if r″B = rB and r″A = rA.
Using a zero-knowledge (ZK) protocol overcomes this difficulty in the sense that (absolutely) no information on the claimant’s secret is leaked out during the conversation between the claimant and the verifier. The verifier (or a listener) remains as ignorant of the secret as he was before the invocation of the protocol. In other words, the verifier (or a listener) does not learn anything from the conversation that he could not learn by himself in the absence of the claimant. The only thing the verifier gains is confidence about whether the claimant actually knows the secret or not. This is intuitively the defining feature of a ZK protocol.
Similar to other public-key techniques, the security of the ZK protocols is based on the intractability of some difficult computational problems. A repeated use of a public-key scheme with a given set of parameters may degrade the security of the scheme under those parameters. For example, each encryption of a message (or each generation of a signature) makes available a plaintext–ciphertext pair which may eventually help a cryptanalyst. A ZK protocol, on the other hand, does not lead to such a degradation of the security of the protocol, irrespective of how many times it is invoked.
We stick to the usual scenario: Alice is the claimant, Bob is the verifier and Carol is an eavesdropper trying to impersonate Alice. In the jargon of ZK protocols, Alice (and not Bob) is called the prover. In order to avoid confusion, we continue to use the terms claimant and verifier. A ZK protocol is usually a three-pass interactive protocol. To start with, Alice chooses a random commitment and sends a witness of the commitment to Bob. A new commitment should be selected by Alice during each invocation of the protocol in order to guard against an adversarial verifier. Upon receiving the witness, Bob chooses and sends a random challenge to Alice. Finally, Alice replies by sending a response to the challenge. If Alice knows the secret (and performs the protocol steps correctly), her response can easily be verified by Bob to be valid. Carol, in an attempt to impersonate Alice without knowing the secret, can produce a valid response only with a probability P bounded away from 1. If P happens not to be negligibly small, then the protocol can be repeated a sufficient number of times, so that Carol’s probability of giving the correct response on all occasions becomes extremely low.
The parameters and the secrets for a ZK protocol can be set privately by each claimant. Another alternative is that a trusted third party (TTP) generates a set of parameters and makes these parameters available for use by every claimant over a network. A second duty of the TTP is to register a secret against each entity. The secret may be generated either by the TTP or by the respective entity. The knowledge of this (registered) secret by an entity is equivalent to her identity in the network. Finally, the authenticity of the public key of an entity is ensured by the digital signature of the TTP on the public key. For simplicity, however, we will not bother about the existence of the TTP and the way in which the secret (the possession of which by Alice is to be proved) has been created and/or handed over to Alice. We will also assume that each entity’s public key is authentic.
The Feige–Fiat–Shamir (FFS) protocol (Algorithm 5.69) is based on the intractability of computing square roots modulo a composite integer n. We take n = pq with two distinct primes p and q, each congruent to 3 modulo 4.
Algorithm 5.69: The Feige–Fiat–Shamir (FFS) identification protocol

Selection of domain parameters:
  Select two large distinct primes p and q, each congruent to 3 modulo 4.
  n := pq.
Selection of Alice's secret:
  Alice selects t random integers x1, . . . , xt ∈ Z_n^*.
  Alice selects t random bits b1, . . . , bt.
  Alice computes yi := (−1)^bi (xi^2)^−1 (mod n) for i = 1, . . . , t.
  Alice makes (y1, . . . , yt) public and keeps (x1, . . . , xt) secret.
The protocol:
  Alice chooses a random commitment c ∈ Z_n^* and a random bit γ.
  Alice sends the witness w := (−1)^γ c^2 (mod n) to Bob.
  Bob sends a random challenge (∊1, . . . , ∊t) ∈ {0, 1}^t to Alice.
  Alice sends the response r := c · x1^∊1 · · · xt^∊t (mod n) to Bob.
  Bob computes w′ := r^2 · y1^∊1 · · · yt^∊t (mod n).
  Bob accepts Alice's identity if and only if w′ ≠ 0 and w′ ≡ ±w (mod n).
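A toy execution of the protocol can be sketched as follows; the tiny modulus (n = 7 · 11), the value of t, and the seeded random generator are illustrative assumptions only:

```python
import math, random

# Toy run of the FFS protocol (Algorithm 5.69).
p, q = 7, 11                   # both congruent to 3 modulo 4
n = p * q
t = 4
rng = random.Random(1)

def random_unit() -> int:      # random element of Z_n^*
    while True:
        x = rng.randrange(2, n)
        if math.gcd(x, n) == 1:
            return x

# Alice's secret x_i's, random bits b_i, public y_i := (-1)^b_i * x_i^(-2).
xs = [random_unit() for _ in range(t)]
bs = [rng.randrange(2) for _ in range(t)]
ys = [pow(-1, b, n) * pow(x, -2, n) % n for b, x in zip(bs, xs)]

def ffs_round() -> bool:
    c, gamma = random_unit(), rng.randrange(2)        # Alice's commitment
    w = pow(-1, gamma, n) * c * c % n                 # witness -> Bob
    eps = [rng.randrange(2) for _ in range(t)]        # Bob's challenge -> Alice
    r = c                                             # response r := c * prod x_i^eps_i
    for x, ei in zip(xs, eps):
        if ei:
            r = r * x % n
    w_prime = r * r % n                               # Bob: w' := r^2 * prod y_i^eps_i
    for yi, ei in zip(ys, eps):
        if ei:
            w_prime = w_prime * yi % n
    return w_prime != 0 and w_prime in (w, (-w) % n)  # accept iff w' = +-w (mod n)

assert all(ffs_round() for _ in range(20))
```

In each round, the squares of the x_i's cancel against the inverse squares hidden in the y_i's, leaving w′ = ±c^2, so an honest Alice always passes.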
It is clear from Algorithm 5.69 that knowing the secret (x1, . . . , xt) allows Alice to make Bob accept her identity (as Alice). The check w′ ≠ 0 in the last line is necessary to preclude the commitment c = 0, which would make any claimant succeed irrespective of any knowledge of the secret.
Now, let us see how an opponent (Carol), without knowing the secret, can succeed in impersonating Alice by taking part in this protocol. To start with, we consider the simple case t = 1 (which corresponds to Fiat and Shamir’s original scheme). Carol can start the process by generating a random c and γ and computing w := (–1)^γ c^2 (mod n). Now, Carol should send the response c or cx1 depending on whether Bob sends ∊1 = 0 or 1. Her capability of sending both correctly is equivalent to her knowledge of x1. If Bob sends ∊1 = 0, then she can provide the correct response c. Otherwise, Carol can at best select a random response from Z_n*, and the probability that this is correct is overwhelmingly low. On the other hand, let Carol choose a random c and γ and send the (improper) witness w := (–1)^γ c^2 y1 (mod n). In that case, Carol can give the valid response r = c, if Bob’s challenge is ∊1 = 1. Sending the correct response cx1^–1 to the challenge ∊1 = 0 now requires knowledge of x1. Therefore, if ∊1 is randomly chosen by Bob (without the prior knowledge of Carol), Carol can successfully respond with probability (very close to) 1/2. For t ≥ 1, this probability of a cheat by Carol can be easily shown to be (very close to) 1/2^t, which is negligibly small for t ≥ 80.
In practice, however, t is chosen to be O(ln ln n). It is, therefore, necessary to repeat the protocol t′ times, so that the probability of a successful cheat becomes (nearly) 1/2^(tt′). Taking t′ = Θ(ln n) is recommended. It can be shown that these choices for t and t′ offer the FFS protocol the desired ZK property. Without going into a proof of this assertion, let us informally explain the ZK property of the FFS protocol. Neither Bob nor a listener to the conversation between Alice and Bob can get any idea of the secret (x1, . . . , xt). Bob gets as a response the product of c and those xi’s for which ∊i = 1. Since c is randomly chosen by Alice and is not available to Bob, there is no way to choose a strategic challenge. However, if the square root of w (or –w) can be computed by Bob, then the interaction may give away partial information on the secret. For example, if Bob chooses the challenge (∊1, ∊2, . . . , ∊t) = (1, 0, . . . , 0), then Alice’s response would be cx1, from which x1 can be computed by Bob, if he knows c. Thus, the security and the ZK property of the FFS protocol are based on the assumption that computing square roots modulo n is an infeasible computational problem.
The GQ identification protocol is based on the intractability of the RSA problem. The correctness of Algorithm 5.70 (for a legitimate claimant) is easy to establish. The check w′ ≠ 0 is necessary to rule out the commitment c = 0, which makes a claimant always succeed.
A TTP typically selects the domain parameters p, q, n, e and d. It also selects m and gives s to Alice without revealing d. The execution of the protocol does not require the use of the decryption exponent d. In fact, d is a global secret, whereas s is Alice’s personal secret. Alice tries to prove the knowledge of s (and not of d).
In the GQ algorithm, the power s^∊ is blinded by multiplying it with the random commitment c. As a witness for c, Alice presents its encrypted version w. Under the assumption that RSA decryption without the knowledge of the decryption exponent d is infeasible, Bob (or an eavesdropper) cannot compute c and hence cannot separate out the value of s^∊. Thus, no partial information on s is provided. Furthermore, each invocation requires a random ∊. In order to compute a strategic witness, Carol can at best guess ∊. The guess is correct with probability 1/e. If e is reasonably large, the probability of a successful cheat is low. However, larger values of e lead to more expensive generation of the witness from the commitment (and also of the response). So small values of e (say, 2^16 + 1 = 65,537) are usually recommended. In that case, repeating the protocol a suitable number of times makes Carol’s chance of cheating as small as one desires. Taking t′e (where t′ is the number of iterations of the protocol) of the order of (log n)^α for some constant α gives the GQ protocol the desired zero-knowledge property.
|
Selection of domain parameters: Select two distinct large primes p and q and set the modulus n := pq. Select an exponent e with gcd(e, φ(n)) = 1 and compute d := e^–1 (mod φ(n)). The pair (n, e) is made public and d is kept secret.
Selection of Alice’s secret: Alice selects a random m ∈ Z_n* and obtains the secret s := m^–d (mod n). Alice makes m public and keeps s secret.
The protocol: Alice selects a random commitment c ∈ Z_n* and sends the witness w := c^e (mod n) to Bob. Bob sends a random challenge ∊ ∈ {1, 2, . . . , e} to Alice. Alice sends the response r := cs^∊ (mod n) to Bob.
Bob computes w′ := m^∊ r^e (mod n). Bob accepts Alice’s identity if and only if w′ ≠ 0 and w′ = w. |
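An honest round of the GQ protocol can be sketched in Python. Toy parameters only, and all helper names are ours; we assume the standard GQ key setup s := m^–d (mod n), so that s^e m ≡ 1 (mod n).

```python
import secrets
from math import gcd

def rand_unit(n):
    """A random element of Z_n* (hypothetical helper)."""
    while True:
        a = secrets.randbelow(n - 2) + 2
        if gcd(a, n) == 1:
            return a

def gq_setup(p, q, e):
    """TTP side: RSA modulus n = pq, Alice's public identity m and secret s = m^(-d) mod n."""
    n = p * q
    d = pow(e, -1, (p - 1) * (q - 1))
    m = rand_unit(n)                       # Alice's public identity
    s = pow(pow(m, d, n), -1, n)           # s^e * m = 1 (mod n)
    return n, m, s

def gq_round(n, e, m, s):
    """One three-pass GQ round; returns Bob's accept/reject decision."""
    c = rand_unit(n)                       # Alice's random commitment
    w = pow(c, e, n)                       # witness: RSA encryption of c
    eps = secrets.randbelow(e) + 1         # Bob's challenge in {1, ..., e}
    r = c * pow(s, eps, n) % n             # Alice's response
    wp = pow(m, eps, n) * pow(r, e, n) % n # Bob: m^eps * r^e = c^e * (m * s^e)^eps = w
    return wp != 0 and wp == w
```

Note that the verification never uses d: as stated in the text, d is a global secret and s is Alice’s personal secret.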
The Schnorr protocol is based on the intractability of computing discrete logarithms in a large prime field F_p. We assume that a suitably large prime divisor q of p – 1 and an element g ∈ F_p* of multiplicative order q are known. The algorithm works in the subgroup of F_p* generated by g. In order to make the known algorithms for solving the DLP infeasible for the field F_p, one should have q > 2^160.
|
Selection of domain parameters: Select a large prime p such that p – 1 has a large prime divisor q. Select an element g ∈ F_p* of multiplicative order q. Publish (p, q, g).
Selection of Alice’s secret: Alice chooses a random secret integer d ∈ {1, 2, . . . , q – 1}. Alice computes and makes public the integer y := g^–d (mod p).
The protocol: Alice selects a random commitment c ∈ {0, 1, . . . , q – 1} and sends the witness w := g^c (mod p) to Bob. Bob sends a random challenge ∊ ∈ {0, 1, . . . , 2^t – 1} to Alice. Alice sends the response r := c + d∊ (mod q) to Bob.
Bob computes w′ := g^r y^∊ (mod p). Bob accepts Alice’s identity if and only if w′ = w. |
We leave the analysis of the correctness and security of this protocol to the reader. The secret d is masked from Bob and other eavesdroppers by the random additive bias c modulo q. The probability of a successful cheat by an adversary is 2^–t, since ∊ is chosen randomly from a set of cardinality 2^t. Usually the Schnorr protocol is not used iteratively. Therefore, t ≥ 40 is recommended for making the probability of cheating negligible. On the other hand, if t is too large, then the protocol can be shown to lose the ZK property. For the generation of the witness from the commitment, Alice computes a modular exponentiation to an exponent which is O(q). Generating the response, on the other hand, involves a single multiplication (and a single addition) modulo q and hence is very fast.
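The following Python sketch runs one honest Schnorr round with toy parameters (hypothetical and far too small for real use; q > 2^160 is required in practice, and the helper names are ours):

```python
import secrets

# Toy domain parameters: p prime with p - 1 = 110 * 443, q = 443 a prime divisor of p - 1.
p = 48731
q = 443
h = 2
while pow(h, (p - 1) // q, p) == 1:    # find an element of order q
    h += 1
g = pow(h, (p - 1) // q, p)

def schnorr_keygen():
    d = secrets.randbelow(q - 1) + 1   # Alice's secret in {1, ..., q - 1}
    y = pow(g, q - d, p)               # public key y = g^(-d) mod p
    return d, y

def schnorr_round(d, y, t=8):
    c = secrets.randbelow(q)           # Alice's random commitment
    w = pow(g, c, p)                   # witness
    eps = secrets.randbelow(2 ** t)    # Bob's challenge in {0, ..., 2^t - 1}
    r = (c + d * eps) % q              # Alice's response: one mul + one add mod q
    wp = pow(g, r, p) * pow(y, eps, p) % p   # g^r * y^eps = g^(c + d*eps - d*eps) = g^c
    return wp == w
```

The response step illustrates the remark above: it costs a single modular multiplication and addition, whereas the witness costs a full exponentiation.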
All the material studied in earlier chapters culminates in this relatively short chapter which describes some popular cryptographic algorithms. We address most of the problems relevant in cryptography, namely, encryption, key agreement, digital signatures and entity authentication. Against each algorithm we mention the (provable or alleged) source of security of the algorithm.
Encryption algorithms are treated first. We start with the arguably most popular RSA algorithm. This algorithm derives its security from the RSA key inversion problem and the RSA problem. The key inversion problem is probabilistic polynomial-time equivalent to the integer factorization problem. The exact intractability of the RSA problem is unknown; at present, no algorithm other than factoring the RSA modulus is known for solving it. We subsequently describe Rabin encryption (based on the square root problem), Goldwasser–Micali encryption (based on the quadratic residuosity problem), Blum–Goldwasser encryption (based on the square root problem), ElGamal encryption (based on the Diffie–Hellman problem) and Chor–Rivest encryption (based on a variant of the subset sum problem). The XTR encryption algorithm is essentially an efficient implementation of ElGamal encryption and is based on a tricky representation of elements in certain finite fields. The last encryption algorithm we discuss is the NTRU algorithm. It derives its security from a mixing system that uses the algebra of the ring Z[X]/⟨X^n – 1⟩ with reductions modulo two relatively prime integers. Attacks on NTRU based on the shortest vector problem are also known.
The basic key-agreement scheme is the Diffie–Hellman scheme. In order to prevent small-subgroup attacks on this scheme, one employs a technique known as cofactor expansion. We then explain unknown key-share attacks against key-agreement schemes. These attacks necessitate the use of authenticated key agreement schemes. The MQV algorithm is presented as an example of an authenticated key-agreement scheme.
Next come digital signature algorithms. Digital signatures may be classified in two broad categories: signature schemes with appendix and signature schemes with message recovery. In this book, we study only the signature schemes with appendix. As specific examples of signature schemes, we first explain RSA and Rabin signatures. Then, we present several variants of discrete-log-based signature schemes: ElGamal signatures, Schnorr signatures, Nyberg–Rueppel signatures, the digital signature algorithm (DSA) and its elliptic curve variant ECDSA. All the discrete-log (over finite fields)-based signature schemes have efficient XTR implementations. The NTRUSign algorithm is the last general-purpose signature scheme discussed in this section.
We then present a treatment of some special signature schemes. Blind signatures are created on messages unknown to the signer. Three blind signature schemes are described: Chaum, Schnorr and Okamoto–Schnorr schemes. An undeniable signature, on the other hand, requires an active participation of the signer at the time of verification and comes with a denial protocol that prevents a signer from denying a valid signature at a later time. The Chaum–Van Antwerpen undeniable signature scheme is based on the discrete-log problem, whereas the GKR scheme is based on the RSA problem.
A way to guarantee both authentication and confidentiality of a message is to sign the message and then encrypt the signed message. This involves two basic operations (signature generation and encryption). Zheng’s signcryption scheme combines these two primitives with a view to reducing both running time and message expansion.
The final topic we discuss in this chapter is entity authentication, a mechanism by means of which an entity can prove its identity to another. Here, the identity of an entity is considered synonymous with the possession of some secret information by the entity. Passwords are called weak authentication schemes, since the claimant has to disclose the secret straightaway to the verifier. A strong authentication scheme (also called a challenge–response scheme) does not reveal the secret to the verifier. We describe two strong authentication schemes: the first is based on encryption and the second on digital signatures. A way to establish mutual authentication between two entities is also presented. Challenge–response algorithms may be vulnerable to some attacks mounted by the verifier. A zero-knowledge protocol comes with a proof that during the authentication conversation no information is leaked to the verifier. Three zero-knowledge protocols are discussed: the Feige–Fiat–Shamir protocol, the Guillou–Quisquater protocol, and the Schnorr protocol.
Public-key cryptography was born from the seminal works of Diffie and Hellman [78] and Rivest, Shamir and Adleman [252]. Though still young, this area has induced much research in the last three decades. In this chapter, we have made an attempt to summarize some important cryptographic algorithms proposed in the literature. The original papers where these techniques have been introduced are listed below. We don’t plan to be exhaustive, but mention only the most relevant resources.
| Algorithm | Reference(s) |
|---|---|
| RSA encryption | [252] |
| Rabin encryption | [246] |
| Goldwasser–Micali encryption | [117] |
| Blum–Goldwasser encryption | [27] |
| ElGamal encryption | [84] |
| Chor–Rivest encryption | [54] |
| XTR encryption | [170, 172, 171, 173, 289, 297] |
| NTRU encryption | [130] |
| Identity-based encryption | [267, 34, 35] |
| Diffie–Hellman key exchange | [78] |
| Menezes–Qu–Vanstone key exchange | [161] |
| RSA signature | [252] |
| Rabin signature | [246] |
| ElGamal signature | [84] |
| Schnorr signature | [263] |
| Nyberg–Rueppel signature | [223, 224] |
| DSA | [220] |
| ECDSA | [141] |
| XTR signature | [170, 172, 171, 173, 289, 297] |
| NTRUSign | [110, 111, 128, 129, 131, 217] |
| Chaum blind signature | [48, 49, 50] |
| Schnorr blind signature | [263, 202] |
| Okamoto–Schnorr blind signature | [227, 236] |
| Chaum–Van Antwerpen undeniable signature | [51, 52, 53] |
| RSA undeniable signature | [109, 187, 102, 186] |
| Signcryption | [310, 311, 312] |
| Signcryption based on elliptic curves | [313, 314] |
| Identity-based signcryption | [178, 185] |
| Feige–Fiat–Shamir ZK protocol | [90, 91] |
| Guillou–Quisquater ZK protocol | [122] |
| Schnorr ZK protocol | [263] |
The Handbook of Applied Cryptography [194] is a single resource where most of the above algorithms are discussed in good detail. See Chapter 8 of that handbook for encryption algorithms, Chapter 11 for digital signatures and Chapter 10 for identification schemes.
There are several other (allegedly) intractable mathematical problems based on which cryptographic protocols can be built. Some of the promising candidates that we left out in the text are summarized below:
| Algorithm | Intractable problem |
|---|---|
| LUC [284, 285, 286] | RSA and ElGamal-like problems based on Lucas sequences |
| Goldreich–Goldwasser–Halevi [115] | lattice-basis reduction |
| Patarin’s hidden field equations (HFE) [232] | solving multivariate polynomial equations |
| EPOC/ESIGN [97, 228] | factorization of integers of the form p^2 q |
| McEliece encryption [190] | decoding of error-correcting codes |
| Number field cryptography [38, 39] | discrete log problem in class groups of quadratic fields |
| KLCHKP (Braid group cryptosystem) [148] | Braid conjugacy problem |
The Internet site http://www.tcs.hut.fi/~helger/crypto/link/public/index.html is a good place to start for more information on these (and some other) cryptosystems. Also visit http://www.kisa.or.kr/technology/sub1/index-PKC.htm.
The obvious question that crops up now is: given so many different cryptographic schemes, which one should a user go for?[5] There is no clear-cut answer to this question. One has to study the relative merits and demerits of the systems. If computational efficiency is what matters most, we advocate the NTRU schemes. Having said that, we must also add that the NTRU scheme is relatively new and has not yet withstood sufficient cryptanalytic attack. Various attacks on NSS and NTRUSign cast doubt on the practical safety of deploying such young schemes in serious applications.
[5] It is worthwhile to issue a warning to the readers. Many cryptographic algorithms (and also the idea of public-key cryptography) are/were patented. In order to implement these algorithms (in particular, for commercial purposes), one should take care of the relevant legal issues. We summarize here some of the important patents in this area. The list is far from exhaustive.
| Patent No. | Covers | Patent holder | Date of issue |
|---|---|---|---|
| US 4,200,770 | Diffie–Hellman key exchange (includes ElGamal encryption) | Stanford University | Apr 29, 1980 |
| US 4,218,582 | Public-key cryptography | Stanford University | Aug 19, 1980 |
| US 4,405,829 | RSA | MIT | Sep 20, 1983 |
| US 5,231,668 | DSA | USA, Secretary of Commerce | Jul 27, 1993 |
| US 5,351,298 | LUC | P. J. Smith | Sep 27, 1994 |
| US 5,790,675 | HFE | CP8 Transac (France) | Aug 4, 1998 |
| EP 0963635A1 / WO 09836526 | XTR | Citibank (North America) | Dec 15, 1999 / Aug 20, 1998 |
| US 6,081,597 | NTRU | NTRU Cryptosystems, Inc. | Jun 27, 2000 |
| — | EPOC/ESIGN | Nippon Telegraph and Telephone Corporation | Apr 17, 2001 |
Our mathematical trapdoors are not provably secure, and this is where the problems begin. We have to rely on historical evidence, which should not be collected too hastily. Slow as it is, RSA has stood the test of time and has successfully survived more than twenty years of cryptanalytic attacks [29]. The risk that an unforeseen attack will break the system tomorrow appears much lower with RSA than with newer schemes that have enjoyed only a little cryptanalytic study. The hidden monomial system proposed by Imai and Matsumoto [188] was broken by Patarin [231]. As a by-product, Patarin came up with the idea of cryptosystems based on hidden field equations (HFE) [232]. No serious attacks on HFE are known to date, but, as we mentioned earlier, only time will show whether HFE is going to survive.
Bruce Schneier asserts in his Crypto-Gram newsletter (15 March 1999, http://www.counterpane.com/crypto-gram.html): “No one can duplicate the confidence that RSA offers after 20 years of cryptanalytic review. A standard security review, even by competent cryptographers, can only prove insecurity; it can never prove security. By following the pack you can leverage the cryptanalytic expertise of the worldwide community, not just a handful of hours of a consultant’s time.”
Twenty-odd years is definitely not a wide span of time in the history of evolution of our knowledge, but public-key cryptography is only as old as RSA is!
| 6.1 | Introduction |
| 6.2 | IEEE Standards |
| 6.3 | RSA Standards |
| Chapter Summary | |
| Suggestions for Further Reading |
In theory, there is no difference between theory and practice. But, in practice, there is.
—Jan L. A. van de Snepscheut
ECC curves are divided into three groups, weak curves, inefficient curves, and curves patented by Certicom.
—Peter Gutmann
Acceptance of prevailing standards often means we have no standards of our own.
—Jean Toomer (1894 – 1967)
Public-key cryptographic protocols deal with sets like the ring Z_n of integers modulo n, the multiplicative group F_q* of units in a finite field, or the group of points on an elliptic curve over a finite field. Messages that need to be encrypted or signed are, on the other hand, usually human-readable text or numbers or keys of secret-key cryptographic protocols, which are typically represented in computers in the form of sequences of bits (or bytes). It is necessary to convert such bit strings (or byte strings) to mathematical elements before the cryptographic algorithms are applied. This conversion is referred to as encoding. The reverse transition, that is, converting mathematical entities back to bit strings, is called decoding.
If Alice and Bob were the only two parties involved in deploying public-key protocols, they could have agreed upon a set of private (not necessarily secret) encoding and decoding rules. In practice, however, when many entities interact over a public network, it is impractical, if not impossible, to have an individual encoding scheme for every pair of communicating parties. This is also unnecessary, because the security of the protocols comes from the encryption process and not from encoding. On the contrary, poorly designed encoding schemes may endanger the security of the underlying protocols.
We, therefore, need a set of standard ways of converting data between various logical formats. This promotes interoperability, removes ambiguities, facilitates simplicity in handling cryptographic data and thereby enhances the applicability and acceptability of public-key algorithms. IEEE (The Institute of Electrical and Electronics Engineers, Inc., pronounced eye-triple-e) and RSA Laboratories have published extensive documents standardizing data conversion and encoding for many popular public-key cryptosystems. Here we summarize the contents of some of these documents. This exposition is meant mostly for software engineers intending to develop cryptographic toolkits that conform to the accepted standards.
In this section, we outline the first three of the drafts from IEEE, shown in Table 6.1. At the time of writing this book, these are the latest versions of the drafts available from IEEE. In the future, these may be superseded by newer documents. We urge the reader to visit the website http://grouper.ieee.org/groups/1363/ for more up-to-date information. Also see the standard IEEE 1363–2000: Standards Specifications for Public-key Cryptography [134].
| Draft | Date | Description |
|---|---|---|
| P1363 / D13 | 12 November 1999 | Traditional public-key cryptography based on IFP, DLP and ECDLP |
| P1363a/D12 | 16 July 2003 | Additional techniques on traditional public-key cryptography |
| P1363.1/D4 | 7 March 2002 | Lattice-based cryptography |
| P1363.2/D15 | 25 May 2004 | Password-based authentication |
| P1363.3/D1 | May 2008 | Identity-based public-key cryptography |
Public-key protocols operate on data of various types. The IEEE drafts specify only the logical descriptions of these data types. The realizations of these data types should be taken care of by individual implementations and are left unspecified.
A bit string is a finite ordered sequence a0a1 . . . al–1 of bits, where each bit ai can assume the value 0 or 1. The length of the bit string a0a1 . . . al–1 is l. The bit a0 in the bit string a0a1 . . . al–1 is called the leftmost or the first or the leading or the most significant bit, whereas the bit al–1 is called the rightmost or the last or the trailing or the least significant bit.
The order of appearance of the bits in a bit string is important, rather than the way the bits are indexed or named. That is to say, the most and least significant bits in a given bit string are uniquely determined by their positions of occurrence in the string, and not by the way the individual bits in the string are numbered. Thus, for example, if we write the bit string 01101 as a0a1a2a3a4, then the leading and trailing bits are a0 and a4 respectively. If we index the bits in the same bit string as a2a3a5a7a11, the first bit is a2 and the last bit is a11. Finally, for the indexing a5a4a3a2a1, the leftmost and rightmost bits are a5 and a1 respectively.
Though bits are the basic building blocks in computer memory, programs typically access memory in groups of 8 bits, known as octets. Thus, an octet is a bit string of length 8 and can have one of the 256 values 0000 0000 through 1111 1111. It is convenient to write an octet as a concatenation of two hexadecimal digits, the first (resp. second) one corresponding to the first (resp. last) 4 bits of the octet treated as a 4-bit integer in base 2. For example, the octet 0010 1011 is represented by 2b. It is also often customary to treat an octet a0a1 . . . a7 as the integer (between 0 and 255, both inclusive) whose binary representation is a0a1 . . . a7.
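The octet-to-hexadecimal convention can be checked with a one-line Python helper (the name is ours):

```python
def octet_to_hex(bits):
    """Write an octet (a string of 8 '0'/'1' characters) as two hexadecimal digits."""
    assert len(bits) == 8 and set(bits) <= {"0", "1"}
    # Interpret the octet as an 8-bit integer, then print it in base 16.
    return format(int(bits, 2), "02x")
```

For instance, octet_to_hex("00101011") returns "2b", matching the example above.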
An octet string is a finite ordered sequence of octets. The length of an octet string is the number of octets in the string. The leftmost (or first or leading or most significant) and the rightmost (or last or trailing or least significant) octets in an octet string are defined analogously as in the case of bit strings. These octets are dependent solely on their positions in the octet string and are independent of how the individual octets in the octet string are numbered.
Integers are the whole numbers 0, ±1, ±2, . . . . For cryptographic applications, one typically considers only non-negative integers. Integers used in cryptography may have binary representations requiring as many as several thousand bits.
Let p be a prime (typically, odd). The elements of F_p are represented as integers 0, 1, . . . , p – 1 under the standard way of associating the integer a ∈ {0, 1, . . . , p – 1} with the congruence class [a]_p in Z_p. Arithmetic operations in F_p are the corresponding integer operations modulo the prime p.
The elements of the field F_{2^m} are represented as bit strings of length m. In order to provide the mathematical interpretation of these bit strings, we recall that F_{2^m} is an m-dimensional F_2-vector space. Let β0, . . . , βm–1 be an ordered basis of F_{2^m} over F_2. The bit string a0 . . . am–1 is to be identified with the element a0β0 + · · · + am–1βm–1, where the bit ai represents the element [ai]_2 of F_2. Selection of the basis β0, . . . , βm–1 renders a complete meaning to this representation and determines how arithmetic operations on these elements are to be performed. The following two cases are recommended.
For the polynomial-basis representation, one chooses an irreducible polynomial f(X) ∈ F_2[X] of degree m and represents F_{2^m} as F_2[X]/⟨f(X)⟩. Letting x denote the canonical image of X in F_2[X]/⟨f(X)⟩, one chooses the ordered basis β0 = x^{m–1}, β1 = x^{m–2}, . . . , βm–1 = 1. Arithmetic operations in F_{2^m} under this representation are those of F_2[X] followed by reduction modulo the defining polynomial f(X). Choice of the irreducible polynomial f(X) is left unspecified in the IEEE drafts.
For the normal-basis representation, one selects an element θ ∈ F_{2^m} which is normal over F_2 (see Definition 2.60, p 86), and takes the ordered basis β0 = θ = θ^{2^0}, β1 = θ^{2^1}, β2 = θ^{2^2}, . . . , βm–1 = θ^{2^{m–1}}. Arithmetic in F_{2^m} is carried out as explained in Section 2.9.3.
The IEEE draft P1363a also specifies a composite-basis representation of elements of F_{2^m}, provided that m is composite. Let m = ds with 1 < d < m. One chooses an (ordered) polynomial or normal basis γ0, γ1, . . . , γs–1 of F_{2^m} over F_{2^d}. An element of F_{2^m} is of the form a0γ0 + a1γ1 + · · · + as–1γs–1 and is represented by a0a1 . . . as–1, where each ai, being an element of F_{2^d}, is represented by a bit string of length d. The interpretation of the representation of ai is dependent on how F_{2^d} is represented. One can use a polynomial- or normal-basis representation of F_{2^d} (over F_2), or even a composite-basis representation of F_{2^d} over F_{2^{d′}}, if d happens to be composite with a non-trivial divisor d′.
A non-prime finite field of odd characteristic is one with cardinality p^m for some odd prime p and for some integer m > 1. The field F_{p^m} is represented as F_p[X]/⟨f(X)⟩, where f(X) ∈ F_p[X] is an irreducible polynomial of degree m. An element of F_{p^m} is then of the form α = am–1x^{m–1} + · · · + a1x + a0, where x := X + ⟨f(X)⟩ and where each ai is an element of F_p, that is, an integer in the range 0, 1, . . . , p – 1. The element α is represented as an integer by substituting p for x, that is, as the integer am–1p^{m–1} + · · · + a1p + a0 (see the packed representation of Exercise 3.39). In order to interpret an integer between 0 and p^m – 1 as an element of F_{p^m}, one has to expand the integer in base p.
An elliptic curve defined over a finite field F_q is specified by two elements a, b ∈ F_q. Depending on the characteristic of F_q, this pair defines the following curves.
If char F_q ≠ 2, 3, then 4a^3 + 27b^2 must be non-zero in F_q, and the equation of the elliptic curve is taken to be Y^2 = X^3 + aX + b.
For char F_q = 2, we must have b ≠ 0 in F_q, and we use the non-supersingular curve Y^2 + XY = X^3 + aX^2 + b. Because of the MOV attack (Section 4.5.1), supersingular curves are not recommended for cryptographic applications.
Finally, if F_q has characteristic 3, then both a and b must be non-zero in F_q, and the elliptic curve Y^2 = X^3 + aX^2 + b is specified by (a, b).
A point P = (h, k) on an elliptic curve defined over F_q can be represented either in compressed or in uncompressed form. In the uncompressed form, one represents P as the pair (h, k) of elements of F_q. The compressed form can be either lossy or lossless. In the lossy compressed form, P is represented by its X-coordinate h only. Such a representation is not unique in the sense that there can be two points on the elliptic curve with the same X-coordinate h. In applications where Y-coordinates of elliptic curve points are not utilized, such a representation can be used. In the lossless compressed form, one represents P as (h, ỹ) for a single compression bit ỹ. There are two solutions (perhaps repeated) for Y for a given value h of X. The bit ỹ specifies which of these two values is represented. Depending on how the bit ỹ is computed, we have two different lossless compressed forms.
The LSB compressed form is applicable for odd prime fields F_p or fields F_{2^m} of even characteristic. For F_p, the bit ỹ is taken to be the least significant (that is, rightmost) bit of k (treated as an integer). For F_{2^m}, we have ỹ = 0, if h = 0, whereas if h ≠ 0, then ỹ is the least significant bit of the element kh^–1 treated as an integer via the FE2I conversion primitive described in Section 6.2.2.
The SORT compressed form is used for q = p^m, m > 1. Let P′ = (h, k′) be the opposite of P = (h, k), that is, k′ = –k for odd characteristic p. One converts k and k′ to integers using the FE2I primitive and sets ỹ according to which of the two integers is the larger.
One may also go for a hybrid representation of the elliptic curve point P = (h, k), in which information for both the compressed and the uncompressed representations of P is stored, that is, P is stored as (h, k, ỹ) with ỹ computed by one of the methods (LSB or SORT) described above.
For NTRU public-key cryptosystems, we work in the ring R := Z[x]/⟨x^n – 1⟩. An element of R is a polynomial a(x) = a0 + a1x + a2x^2 + · · · + an–1x^{n–1} with ai ∈ Z, and is represented by the ordered n-tuple of integers (a0, a1, . . . , an–1). Addition (resp. subtraction) in R is simply component-wise addition (resp. subtraction), whereas multiplication of a(x) = a0 + a1x + · · · + an–1x^{n–1} and b(x) = b0 + b1x + · · · + bn–1x^{n–1} gives c(x) = c0 + c1x + · · · + cn–1x^{n–1}, where ci is the sum of the products ajbk over all pairs (j, k) with j + k ≡ i (mod n) (see Section 5.2.8). The IEEE draft P1363.1 designates elements of R as ring elements.
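The convolution product in R can be written directly in Python (the function name is ours):

```python
def conv_mul(a, b, n):
    """Multiply a(x) and b(x) in Z[x]/(x^n - 1), both given as lists of n
    integer coefficients: c_i = sum of a_j * b_k over j + k = i (mod n)."""
    c = [0] * n
    for j in range(n):
        for k in range(n):
            c[(j + k) % n] += a[j] * b[k]
    return c
```

For n = 3, multiplying 1 + 2x by x gives x + 2x^2, i.e. conv_mul([1, 2, 0], [0, 1, 0], 3) returns [0, 1, 2]; the exponent wraps around modulo n, so x^2 · x^2 = x^4 reduces to x.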
It is customary to deal with polynomials in R with small coefficients. If all the coefficients of a(x) are known to be from {0, 1}, it is convenient to represent a(x) as the bit string a0a1 . . . an–1 instead of as an n-tuple of integers. In this case, a(x) is called a binary ring element or simply a binary element.
The IEEE drafts P1363 and P1363.1 specify algorithms for converting data among the formats discussed above. The standardized data conversion primitives are summarized in Figure 6.1. Though these drafts support elliptic curve cryptography, it is not specified how data representing elliptic curves can be converted to data of other types (like octet strings and bit strings).

We now provide a brief description of the data conversion primitives at a logical level. The implementation details depend on the representations of the data types and are left out here.
A bit string a0a1 . . . al–1 can be broken up in groups of eight bits and packed into octets. But we run into difficulty if the length of the input bit string is not an integral multiple of 8. We have to add extra bits in order to make the length of the augmented bit string an integral multiple of 8. This can be done in several ways, and in this context a standard convention needs to be adopted. The IEEE drafts prescribe the following rules:
Every extra bit added must be the zero bit.
Add the minimal number of extra bits.
Add the extra bits, if any, to the left.[1]
[1] At the time of writing this book there is a serious conflict between the latest drafts of P1363 and P1363.1 from IEEE. The former asks to add extra bits to the left, the latter to the right. One of the authors of this book raised this issue in the discussion group stds-p1363-discuss maintained by IEEE and was notified that in the next version of the P1363.1 document this conflict would be resolved in favour of P1363.
In order to see what these rules mean, let a0a1 . . . al–1 be a bit string of length l to be converted to the octet string A0A1 . . . Ad–1. The length of the output octet string must be d = ⌈l/8⌉. 8d – l zero bits should be added to the left of the input bit string in order to create the augmented bit string 0 . . . 0a0a1 . . . al–1 whose length is 8d. Now, we start from the left and pack blocks of consecutive eight bits in A0, A1, . . . , Ad–1. Thus, we have A0 = 0 . . . 0a0 . . . ak–1, A1 = ak . . . ak+7, . . . , Ad–1 = ak+8(d–2) . . . ak+8(d–2)+7, where k = 8 – (8d – l). Note that if l is already a multiple of 8, then 8d – l = 0, that is, no extra bits need to be added.
As an example, consider the input bit string 01110 01101011 of length 13. The output octet string should be of length ⌈13/8⌉ = 2. Padding gives the augmented bit string 00001110 01101011. The first octet in the output octet string will then be 00001110, that is, 0e; and the second octet will be 01101011, that is, 6b.
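The BS2OS rules above admit a compact Python sketch (octets shown as two hex digits; the function name is ours):

```python
def bs2os(bits):
    """BS2OS: convert a bit string (a string of '0'/'1' characters) to an octet
    string, padding with the minimal number of zero bits on the left."""
    d = -(-len(bits) // 8)                  # d = ceil(l / 8)
    padded = bits.rjust(8 * d, "0")         # prepend 8d - l zero bits
    # Pack consecutive groups of 8 bits into octets, written as hex pairs.
    return [format(int(padded[8 * i: 8 * i + 8], 2), "02x") for i in range(d)]
```

Running bs2os("0111001101011") on the 13-bit example above reproduces the octet string 0e 6b.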
The OS2BS primitive is designed to ensure that if we convert an octet string generated by BS2OS, we get back the original bit string (that is, the input to BS2OS) with which we started. Suppose that we want to convert an octet string A0A1 . . . Ad–1. Let us write the bits of Ai as ai,0ai,1 . . . ai,7. The desired length l of the output bit string also has to be specified. If d ≠ ⌈l/8⌉, the procedure OS2BS reports error and stops. If d = ⌈l/8⌉, we consider the bit string
a0,0a0,1 . . . a0,7a1,0a1,1 . . . a1,7 . . . ad–1,0ad–1,1 . . . ad–1,7
of length 8d. If the leftmost 8d – l bits of this flattened bit string are not all zero, OS2BS should quit after reporting error. Otherwise, the trailing l bits of the flattened bit string is returned.
The reader can check that when 0e 6b and l = 13 are input to OS2BS, it returns the bit string 01110 01101011. (See the example in connection with BS2OS.) Notice also that for this input octet string, OS2BS reports error if and only if a value l ≥ 17 or l ≤ 11 is supplied as the desired length of the output bit string.
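The two error checks are the essence of OS2BS; a minimal Python sketch (names ours) makes them explicit:

```python
def os2bs(octets: bytes, l: int) -> str:
    """Sketch of the P1363 OS2BS primitive.

    Mirrors the two checks of the primitive: the octet count must
    equal ceil(l/8), and the 8d - l leading pad bits must be zero.
    """
    d = len(octets)
    if d != (l + 7) // 8:
        raise ValueError("error: length mismatch")
    flat = "".join(format(a, "08b") for a in octets)   # the 8d-bit flattened string
    if "1" in flat[:8 * d - l]:
        raise ValueError("error: nonzero padding bits")
    return flat[8 * d - l:]                            # the trailing l bits

print(os2bs(bytes([0x0e, 0x6b]), 13))   # -> 0111001101011
```

Calling it with l = 11 or l = 17 raises the error, matching the observation in the text.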
Let a non-negative integer n be given. The I2BS primitive outputs a bit string of length l representing n. If n ≥ 2^l, this conversion cannot be done, and the primitive reports error and quits. If n < 2^l, we write the binary representation of n as
n = a_{l–1}2^{l–1} + a_{l–2}2^{l–2} + · · · + a_1·2 + a_0 with each a_i ∈ {0, 1}.
Treating each a_i as a bit,[2] I2BS returns the bit string a_{l–1}a_{l–2} . . . a_1a_0. One or more leading bits of the binary representation of n may be zero. There is no limit on the number of leading zero bits allowed during the conversion. In particular, the integer 0 gets converted to a sequence of l zero bits for any value of l supplied.
[2] Each a_i is logically an integer which happens to assume one of two possible values: 0 and 1. A bit, on the other hand, is a quantity that can also assume only two possible values. Traditionally, the values of a bit are also denoted by 0 and 1. But one has the liberty to call these values off and on, or false and true, or black and white, or even armadillo and platypus. To many people, bit is an abbreviation for binary digit, which our a_i's logically are. To others, binit is a safer and more individualistic acronym for binary digit. For I2BS, we identify the two concepts.
A request to I2BS to convert n = 2357 = 2^11 + 2^8 + 2^5 + 2^4 + 2^2 + 2^0 with l = 12 returns 1001 00110101, one with l = 18 returns 00 00001001 00110101, and one with l ≤ 11 reports failure. Note that for a neater look we write bit strings in groups of eight, with grouping starting from the right. This convention reflects the relationship between bit strings and octet strings mentioned above.
The primitive BS2I converts the bit string a_0a_1 . . . a_{l–1} to the integer a_0·2^{l–1} + a_1·2^{l–2} + · · · + a_{l–2}·2 + a_{l–1}, where we again identify a bit with an integer (or a binary digit). As an illustrative example, the bit string 1001 00110101 (or 00 00001001 00110101) gets converted to the integer 2^11 + 2^8 + 2^5 + 2^4 + 2^2 + 2^0 = 2357. The null bit string (that is, the one of zero length) is converted to the integer 0.
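Both primitives are one-liners in practice; the following sketch (function names ours) reproduces the examples:

```python
def i2bs(n: int, l: int) -> str:
    """I2BS sketch: length-l bit string for n; error if n >= 2**l."""
    if n >= 2 ** l:
        raise ValueError("error: integer too large")
    return format(n, "b").zfill(l)         # pad with leading zero bits

def bs2i(bits: str) -> int:
    """BS2I sketch: most-significant-bit-first string to integer."""
    return int(bits, 2) if bits else 0     # the null string maps to 0

print(i2bs(2357, 12))   # -> 100100110101
print(bs2i("100100110101"))   # -> 2357
```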
In order to convert a non-negative integer n to an octet string of length d, we write the base-256 expansion of n as
n = A_{d–1}256^{d–1} + A_{d–2}256^{d–2} + · · · + A_1·256 + A_0,
where each A_i ∈ {0, 1, . . . , 255} and can be naturally identified with an octet. I2OS returns the octet string A_{d–1}A_{d–2} . . . A_1A_0. Note that the above representation of n to the base 256 is possible if and only if n < 256^d. If n ≥ 256^d, I2OS should return failure. As with bit strings, an arbitrary number of leading zero octets is allowed.
Consider the integer 2357 = 9 × 256 + 53. The two-digit hexadecimal representations of 9 and 53 are 09 and 35 respectively. Thus, a call of I2OS on this n with d = 3 (resp. d = 2, resp. d = 1) returns 00 09 35 (resp. 09 35, resp. failure).
Let an octet string A_0A_1 . . . A_{d–1} be given. Each A_i can be identified with a 256-ary digit. OS2I returns the integer A_0·256^{d–1} + A_1·256^{d–2} + · · · + A_{d–2}·256 + A_{d–1}. If d = 0, the integer 0 should be output.
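In Python, I2OS and OS2I coincide with big-endian byte conversion; a sketch (names ours) reproducing the 2357 example:

```python
def i2os(n: int, d: int) -> bytes:
    """I2OS sketch: big-endian base-256 representation of n in exactly d octets."""
    if n >= 256 ** d:
        raise ValueError("error: integer too large")
    return n.to_bytes(d, "big")

def os2i(octets: bytes) -> int:
    """OS2I sketch: big-endian octet string to integer; the empty string gives 0."""
    return int.from_bytes(octets, "big")

# 2357 = 9 * 256 + 53, i.e., the octets 09 35.
print(i2os(2357, 3).hex())   # -> 000935
```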
In the IEEE P1363 jargon, a field element is an element of the finite field F_q, where q is a prime or an integral power of a prime. We want to convert an element β ∈ F_q to an octet string. Depending on the value of q, we have two cases:
If the characteristic of F_q is odd, β is represented as an integer in {0, 1, . . . , q – 1}. FE2OS converts β to an octet string of length ⌈log_256 q⌉ by calling the primitive I2OS.
If q = 2^m, β is represented as a bit string of length m. The primitive BS2OS is called to convert β to an octet string.
Assume that an octet string is to be converted to an element of the finite field F_q. Again, we have two possibilities depending on q.
If F_q is of odd characteristic, the primitive OS2I is called to convert the given octet string to an integer. This integer is returned as the field element.
If q = 2^m, one calls the primitive OS2BS with the given octet string and the length m supplied as inputs. The resulting bit string is returned by OS2FE. If OS2BS reports error, so should OS2FE.
Let β ∈ F_q, and suppose that the integer equivalent of β is sought. If q is odd, then β is already represented as an integer (in {0, 1, . . . , q – 1}) and is itself output. If q = 2^m, one first converts β to an octet string by FE2OS and subsequently converts this octet string to an integer by calling the primitive OS2I.
The point O at infinity (on an elliptic curve over F_q) is represented by an octet string comprising a single zero octet. So let P = (h, k) be a finite point. The EC2OS primitive produces an octet string PO = PC ‖ H ‖ K, which is the concatenation of a single octet PC with octet strings H and K representing h and k respectively. The values of PC and K depend on the type of compression used. One has
PC = 0000 SUCỸ (the four low-order bits of PC are, from left to right, S, U, C and Ỹ), where
S = 1 if and only if the SORT compression is used.
U = 1 if and only if the uncompressed or hybrid form is used.
C = 1 if and only if the compressed or hybrid form is used.
Ỹ = ỹ if compression is used, and Ỹ = 0 otherwise.
The first four bits of PC are reserved for (possible) future use and should be set to 0000 in this version of the standard. H is the octet string of length ⌈log_256 q⌉ obtained by converting h using FE2OS. If the compressed form is used, K is the empty octet string, whereas if the uncompressed or hybrid form is used, we have K = FE2OS(k, ⌈log_256 q⌉). Finally, for the lossy compression we have PC = 0000 0001, H = FE2OS(h, ⌈log_256 q⌉), and K is empty. Table 6.2 summarizes all these possibilities. Here, l := ⌈log_256 q⌉, and p is an odd prime.
| Representation | PC | H | K | q |
|---|---|---|---|---|
| uncompressed | 0000 0100 | FE2OS(h, l) | FE2OS(k, l) | All |
| LSB compressed | 0000 001ỹ | FE2OS(h, l) | Empty | p, 2^m |
| LSB hybrid | 0000 011ỹ | FE2OS(h, l) | FE2OS(k, l) | p, 2^m |
| SORT compressed | 0000 101ỹ | FE2OS(h, l) | Empty | 2^m, p^m |
| SORT hybrid | 0000 111ỹ | FE2OS(h, l) | FE2OS(k, l) | 2^m, p^m |
| lossy compression | 0000 0001 | FE2OS(h, l) | Empty | All |
| point at infinity O | 0000 0000 | Empty | Empty | All |
The OS2EC data conversion primitive takes as input an octet string PO, the length l = ⌈log_256 q⌉ and the method of compression. If PO contains only one octet and that octet is zero, the point O at infinity is output. Otherwise, the elliptic curve point P = (h, k) is computed as follows. OS2EC decomposes PO = PC ‖ H ‖ K, with PC the first octet and with H an octet string of length l. If PC does not match the method of compression, OS2EC returns error. Otherwise, it uses OS2FE to compute the field element h. If the uncompressed or hybrid form is used, the Y-coordinate k is also computed by applying OS2FE to K. If (h, k) is not a point on the elliptic curve, error is reported. For the LSB or SORT compression, the Y-coordinate k is computed using h and the compression bit Ỹ. If the hybrid scheme is used and the bit ỹ computed from k does not agree with Ỹ, OS2EC halts after reporting error. If all computations are successful till now, the point (h, k) is output.
Note that the checks for (h, k) being on the curve and for the equality ỹ = Ỹ are optional and may be omitted. For the lossy compression scheme, the Y-coordinate k is not necessarily uniquely determined from the input octet string PO. In that case, either of the two possibilities is output.
Ring elements are elements of the convolution polynomial ring R = Z[x]/(x^n – 1) and can be identified with polynomials with integer coefficients and of degrees < n. The element a(x) = a_0 + a_1x + · · · + a_{n–1}x^{n–1} (where each a_i ∈ Z) is represented by the n-tuple of integers (a_0, a_1, . . . , a_{n–1}). The IEEE draft P1363.1 assumes that the coefficients a_i are available modulo a positive integer β ≤ 256. But then each a_i is an integer in {0, 1, . . . , β – 1} and can be naturally encoded by a single octet. RE2OS, upon receiving a(x) as input, outputs the octet string a_0a_1 . . . a_{n–1} of length n.
An example: Let n = 7 and β = 128. The ring element a(x) = 2 + 11x + 101x3 + 127x4 + 71x5 = (2, 11, 0, 101, 127, 71, 0) is converted to the octet string 02 0b 00 65 7f 47 00.
Let an octet string a_0a_1 . . . a_{n–1} of length n be given, which we want to convert to an element of R. Once again, a modulus β ≤ 256 is assumed, so that each octet a_i can be viewed as an integer reduced modulo β. Making the natural identification of a_i with an integer, the polynomial a(x) = a_0 + a_1x + · · · + a_{n–1}x^{n–1} is output. Thus, for example, the octet string 02 0b 00 65 7f 47 00 gets converted to the ring element 2 + 11x + 101x^3 + 127x^4 + 71x^5.
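Since each coefficient occupies exactly one octet, both directions are trivial; a Python sketch (names ours) reproducing the example:

```python
def re2os(a, n):
    """RE2OS sketch: coefficients a_0..a_{n-1} (each in {0,...,beta-1},
    beta <= 256) become the n-octet string a_0 a_1 ... a_{n-1}."""
    coeffs = (list(a) + [0] * n)[:n]    # pad missing high coefficients with 0
    return bytes(coeffs)

def os2re(octets):
    """OS2RE sketch: each octet, already reduced mod beta by assumption,
    is read back as a coefficient."""
    return list(octets)

# n = 7, a(x) = 2 + 11x + 101x^3 + 127x^4 + 71x^5
print(re2os([2, 11, 0, 101, 127, 71, 0], 7).hex())   # -> 020b00657f4700
```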
The RE2BS primitive assumes that the modulus β is a power of 2, that is, β = 2^t for some positive integer t ≤ 8. Let a ring element a(x) = a_0 + a_1x + · · · + a_{n–1}x^{n–1} be given, where each a_i ∈ {0, 1, . . . , 2^t – 1}. One applies the I2BS primitive to each a_i to generate the bit string a_{i,0}a_{i,1} . . . a_{i,t–1} of length t. The concatenated bit string
a_{0,0}a_{0,1} . . . a_{0,t–1} a_{1,0}a_{1,1} . . . a_{1,t–1} . . . a_{n–1,0}a_{n–1,1} . . . a_{n–1,t–1}
of length nt is then returned by RE2BS.
As before, take the example of n = 7, β = 128 = 2^7 (so that t = 7) and a(x) = 2 + 11x + 101x^3 + 127x^4 + 71x^5 = (2, 11, 0, 101, 127, 71, 0). The coefficients 2, 11, 0, . . . should first be converted to bit strings of length 7 each, that is, 2 gives 0000010, 11 gives 0001011, and so on. Thus, the bit string output by RE2BS will be 0000010 0001011 0000000 1100101 1111111 1000111 0000000. Note that here we have shown the bits in groups of 7 in order to highlight the intermediate steps (the outputs from I2BS). With the otherwise standard grouping in blocks of 8, the output bit string looks like 0 00001000 01011000 00001100 10111111 11100011 10000000 and hence transforms to the octet string 00 08 58 0c bf e3 80 by an invocation of BS2OS. This example illustrates that RE2BS followed by BS2OS does not necessarily give the same output as the direct conversion RE2OS, even when every underlying parameter (like β) remains unchanged.
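A quick way to check the arithmetic of this example is to script the conversion chain; a sketch (names ours, with BS2OS repeated for self-containment):

```python
def re2bs(a, t):
    """RE2BS sketch for beta = 2**t: each coefficient becomes the t-bit
    output of I2BS (most significant bit first), blocks concatenated in
    order of increasing degree."""
    return "".join(format(c, "b").zfill(t) for c in a)

def bs2re(bits, t):
    """BS2RE sketch: the inverse direction; errors unless t divides the length."""
    if len(bits) % t:
        raise ValueError("error: length not a multiple of t")
    return [int(bits[i:i + t], 2) for i in range(0, len(bits), t)]

def bs2os(bits):
    """BS2OS sketch (repeated here): left-pad to a multiple of 8, pack by 8."""
    d = (len(bits) + 7) // 8
    padded = bits.zfill(8 * d)
    return bytes(int(padded[8 * i:8 * i + 8], 2) for i in range(d))

a = [2, 11, 0, 101, 127, 71, 0]     # 2 + 11x + 101x^3 + 127x^4 + 71x^5, t = 7
print(bs2os(re2bs(a, 7)).hex())     # -> 0008580cbfe380
```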
Once again, we require the modulus β to be a power 2^t of 2. Let a bit string a_0a_1 . . . a_{l–1} of length l be given, from which we want to compute the equivalent ring element a(x). If l is not an integral multiple of t, the algorithm should quit after reporting error. Otherwise, we let l = nt for some positive integer n, and repeatedly call the BS2I primitive on the bit strings a_0a_1 . . . a_{t–1}, a_ta_{t+1} . . . a_{2t–1}, . . . , a_{nt–t}a_{nt–t+1} . . . a_{nt–1} to get the integers α_0, α_1, . . . , α_{n–1} respectively. The polynomial a(x) = α_0 + α_1x + · · · + α_{n–1}x^{n–1} is then output.
We urge the reader to verify that BS2RE with β = 128 and the bit string
0000010 0001011 0000000 1100101 1111111 1000111 0000000
as input produces the ring element 2 + 11x + 101x^3 + 127x^4 + 71x^5.
A binary (ring) element is an element a(x) = a_0 + a_1x + · · · + a_{n–1}x^{n–1} of R with each a_i ∈ {0, 1}. One can convert a(x) to an octet string A_0A_1 . . . A_{l–1} of any desired length l as follows. We denote the bits in the octet A_i as A_{i,7}A_{i,6} . . . A_{i,0}. Here, the index of the bits increases from right to left.
First, we rewrite the polynomial a(x) as one of degree 8l – 1, that is, as a(x) = a_0 + a_1x + · · · + a_{8l–1}x^{8l–1}. If n ≤ 8l, this can be done by setting a_n = a_{n+1} = · · · = a_{8l–1} = 0. On the other hand, if n > 8l and one or more of the coefficients a_{8l}, a_{8l+1}, . . . , a_{n–1} are non-zero (that is, 1), the above rewriting of a(x) cannot be done, and BE2OS terminates after reporting failure.
When the rewriting of a(x) is successful, one sets the bits of the output octets as A_{0,0} := a_0, A_{0,1} := a_1, . . . , A_{0,7} := a_7, A_{1,0} := a_8, A_{1,1} := a_9, . . . , A_{1,7} := a_{15}, A_{2,0} := a_{16}, A_{2,1} := a_{17}, . . . , A_{2,7} := a_{23}, . . . , A_{l–1,0} := a_{8l–8}, A_{l–1,1} := a_{8l–7}, . . . , A_{l–1,7} := a_{8l–1}.
As an example, take n = 20 and consider the binary element a(x) = 1 + x + x^2 + x^{10} + x^{12}. First, let l = 1. Rewriting a(x) as a polynomial of degree 7 is not possible, since the coefficients of x^{10} and x^{12} are 1; so BE2OS outputs error in this case. If l = 2, then the output octet string will be 00000111 00010100, that is, 07 14. For l ≥ 3, the first two octets will be 07 and 14 as before, whereas the third through the l-th octets will be 00.
The BE2OS primitive can be quite effective for reducing storage requirements. For example, the polynomial a(x) of degree 12 of the previous paragraph, viewed as an element of R with n = 200, can be encoded in just two octets. Of course, by specifying l ≥ 3 one may add l – 2 trailing zero octets, if one desires. On the other hand, RE2OS requires exactly 200 octets, whereas RE2BS with β = 128 followed by BS2OS requires exactly ⌈(200 × 7)/8⌉ = 175 octets for storing the same a(x).
Assume that an octet string A_0A_1 . . . A_{l–1} of length l is given and the equivalent binary element of R is to be determined. As in the case of BE2OS, we index the bits in the octet A_i as A_i = A_{i,7}A_{i,6} . . . A_{i,0}. Now, consider the polynomial a(x) = a_0 + a_1x + a_2x^2 + · · · + a_{8l–1}x^{8l–1}, where a_{8i+j} = A_{i,j}. If n ≥ 8l, we set a_{8l} = a_{8l+1} = · · · = a_{n–1} = 0 and output the binary element a(x). On the other hand, if n < 8l and a_n = a_{n+1} = · · · = a_{8l–1} = 0, then a_0 + a_1x + · · · + a_{n–1}x^{n–1} equals the polynomial a(x) and is returned. Finally, if n < 8l and any of the coefficients a_n, a_{n+1}, . . . , a_{8l–1} is non-zero, then OS2BE returns error.[3]
[3] In this case, it still makes full algebraic sense to treat a(x) as an element of R, though not in the canonical representation.
For example, assume that the octet string 07 14 is given as input to OS2BE. If n ≤ 12, the algorithm outputs error, because the polynomial a(x) in this case has degree 12. For any n ≥ 13, the binary element 1 + x + x^2 + x^{10} + x^{12} is returned.
The public-key cryptography standards (PKCS) [254] refer to a set of standard specifications proposed by the RSA Laboratories. A one-line description of each of these documents is given in Table 6.3. In the rest of this section, we concentrate only on the documents PKCS #1 and #3.
| Document | Description |
|---|---|
| PKCS #1 | RSA encryption and signature |
| PKCS #2 | Merged with PKCS #1 |
| PKCS #3 | Diffie–Hellman key exchange |
| PKCS #4 | Merged with PKCS #1 |
| PKCS #5 | Password-based cryptography |
| PKCS #6 | Extension of X.509 public-key certificates |
| PKCS #7 | Syntax of cryptographic messages |
| PKCS #8 | Syntax and encryption of private keys |
| PKCS #9 | Attribute types for use in PKCS #6, #7, #8 and #10 |
| PKCS #10 | Syntax for certification requests |
| PKCS #11 | Cryptoki, an application programming interface (API) |
| PKCS #12 | Syntax of transferring personal information (private keys, certificates and so on) |
| PKCS #13 | Elliptic curve cryptography (under preparation) |
| PKCS #15 | Syntax for cryptographic token (like integrated circuit card) information |
PKCS #1 describes RSA encryption and RSA signatures. In this section, we summarize Version 2.1 (dated 14 June 2002) of the standard. This version specifies cryptographically stronger encoding procedures compared to the older versions. More specifically, the optimal asymmetric encryption procedure (OAEP [18]) for RSA encryption is incorporated in the Version 2.0 of PKCS #1, whereas the new probabilistic signature scheme (PSS [19]) is introduced in Version 2.1. This latest draft also includes encryption and signature schemes compatible with older versions (1.5 and 2.0). However, adoption of the new algorithms is strongly recommended for enhanced security.
PKCS #1 Version 2.1 introduces the concept of multi-prime RSA, in which the RSA modulus n may have more than two prime divisors. For RSA encryption and decryption to work properly, we only need n to be square-free (Exercise 4.1). Using u > 2 prime divisors of n increases efficiency and does not degrade the security of the resulting system much, as long as u is not very large. More specifically, if T is the time for the RSA private-key operation without CRT, then the cost of this operation with CRT is approximately T/u^2 (neglecting the cost of CRT combination).
So an RSA modulus is of the form n = r1r2 . . . ru with u ≥ 2 and with pairwise distinct primes r1, . . . , ru. For the sake of conformity with the older versions of the standard, the first two primes are given the alternate special names p := r1 and q := r2. PKCS #1 does not mention any specific way of choosing the prime divisors ri of n, but encourages use of primes that make factorization of n difficult.
An RSA public exponent is an integer e, 3 ≤ e ≤ n – 1, with gcd(e, λ(n)) = 1, where λ(n) := lcm(r1 – 1, r2 – 1, . . . , ru – 1). An RSA public key is a pair (n, e) with n and e chosen as above.
The RSA private key corresponding to (n, e) can be stored in one of two formats. In the first format, one maintains the pair (n, d) with the private exponent d so chosen that ed ≡ 1 (mod λ(n)). In the second format, one stores the five quantities (p, q, dP, dQ, qInv) and, if u > 2, the triples (r_i, d_i, t_i) for each i = 3, . . . , u. The meanings of these quantities are as follows:
| p | = | r_1 |
| q | = | r_2 |
| dP | ≡ | e^{–1} (mod p – 1) |
| dQ | ≡ | e^{–1} (mod q – 1) |
| qInv | ≡ | q^{–1} (mod p) |
| d_i | ≡ | e^{–1} (mod r_i – 1) |
| t_i | ≡ | (r_1 . . . r_{i–1})^{–1} (mod r_i) |
For the sake of consistency, one should store the CRT coefficient t_2 ≡ (r_1)^{–1} (mod r_2), that is, p^{–1} (mod q). In order to ensure compatibility with older versions of PKCS, q^{–1} (mod p) is stored instead.
The RSA public-key operation is used to encrypt a message or to verify a signature. The PKCS draft calls these primitives RSAEP (encryption primitive) and RSAVP1 (verification primitive). Both are implemented in a straightforward manner, as in Algorithm 6.1.
|
Input: RSA public key (n, e) and message/signature representative x. Output: The ciphertext/message representative y. Steps: if (x < 0) or (x ≥ n) { Return “Error: representative out of range”. } y := x^e (mod n). |
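Algorithm 6.1 is a single modular exponentiation; a Python sketch (the toy key below is ours and far too small for real use):

```python
def rsaep(n: int, e: int, x: int) -> int:
    """RSAEP/RSAVP1 sketch: modular exponentiation with the public key (n, e)."""
    if x < 0 or x >= n:
        raise ValueError("Error: representative out of range")
    return pow(x, e, n)

# Toy parameters: n = 55 = 5 * 11, e = 3; 7^3 mod 55 = 343 mod 55 = 13.
print(rsaep(55, 3, 7))   # -> 13
```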
The RSA decryption or signature-generation primitive is called RSADP or RSASP1 and is given in Algorithm 6.2. The operation depends on the format in which the private key K is stored. The correctness of the primitive is left to the reader as an easy exercise.
|
Input: RSA private key K and the ciphertext/message representative y. Output: The message/signature representative x. Steps: if (y < 0) or (y ≥ n) { Return “Error: representative out of range”. } |
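For the second private-key format with u = 2, the CRT combination in Algorithm 6.2 can be sketched as follows (function name ours; the toy key matches the RSAEP example above, with d = 7, dP = 3, dQ = 7, qInv = 1):

```python
def rsadp_crt(p: int, q: int, dP: int, dQ: int, qInv: int, y: int) -> int:
    """RSADP/RSASP1 sketch for the (p, q, dP, dQ, qInv) key format.

    Computes x = y^d (mod pq) by two half-size exponentiations
    followed by a CRT combination (Garner's method).
    """
    if y < 0 or y >= p * q:
        raise ValueError("Error: representative out of range")
    xp = pow(y, dP, p)             # y^d mod p, since dP = d (mod p - 1)
    xq = pow(y, dQ, q)             # y^d mod q
    h = (qInv * (xp - xq)) % p     # qInv = q^(-1) (mod p)
    return xq + q * h

# Toy key: p = 5, q = 11, e = 3, so dP = 3, dQ = 7, qInv = 1.
print(rsadp_crt(5, 11, 3, 7, 1, 13))   # -> 7, since 7^3 = 13 (mod 55)
```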
The encryption scheme RSAES–OAEP is based on the optimal asymmetric encryption procedure (OAEP) proposed by Bellare and Rogaway [18, 98]. In this procedure, a string of length slightly less than the size of the modulus n is probabilistically encoded using a hash function, and the encoded message is subsequently encrypted. The probabilistic encoding makes the encryption procedure semantically secure and (provably) provides resistance against chosen-ciphertext attacks. Under this scheme, an adversary can produce a valid ciphertext only if she knows the corresponding plaintext. Such an encryption scheme is called plaintext-aware. Given an ideal hash function, Bellare and Rogaway’s OAEP is plaintext-aware.
RSAES–OAEP uses a label L which is hashed by a hash function H. One may take L as the empty string. Other possibilities are not specified in the PKCS draft. SHA-1 (or SHA-256 or SHA-384 or SHA-512) is the recommended hash function. The hash values (in hex) of the empty string under these hash functions are given in Table 6.4.
| Function | Hash of the empty string |
|---|---|
| SHA-1 | da39a3ee 5e6b4b0d 3255bfef 95601890 afd80709 |
| SHA-256 | e3b0c442 98fc1c14 9afbf4c8 996fb924 27ae41e4 649b934c a495991b 7852b855 |
| SHA-384 | 38b060a7 51ac9638 4cd9327e b1b1e36a 21fdb711 14be0743 4c0cc7bf 63f6e1da 274edebf e76f65fb d51ad2f1 4898b95b |
| SHA-512 | cf83e135 7eefb8bd f1542850 d66d8007 d620e405 0b5715dc 83f4a921 d36ce9ce 47d0d13c 5d85f2b0 ff8318d2 877eec2f 63b931bd 47417a81 a538327a f927da3e |
The length of the hash output (in octets) is denoted by hLen. For SHA-1, hLen = 20. The RSA modulus n is assumed to be of octet length k. The octet length mLen of the input message M must be ≤ k–2hLen–2. RSAES–OAEP uses a mask-generation function designated as MGF (see Algorithm 6.11 for a recommended realization).
Algorithm 6.3 describes the RSA–OAEP encryption scheme which employs the EME–OAEP encoding scheme described in Algorithm 6.4. The use of a random seed makes the encryption probabilistic. We use the notation ‖ to denote string concatenation and ⊕ to denote bit-wise XOR.
|
Input: The recipient’s public key (n, e), the message M (an octet string of length mLen) and an optional label L whose default value is the empty string. Output: The ciphertext C of octet length k. Steps: /* Check lengths */ if (L is longer than what H can handle) { Return “Error: label too long”. } /* For example, for SHA-1 the input must be of length ≤ 2^61 – 1 octets. */ if (mLen > k – 2hLen – 2) { Return “Error: message too long”. } /* Encode M to EM (EME–OAEP encoding scheme) */
|
The matching decryption operation is shown in Algorithm 6.5 which calls the EME–OAEP decoding procedure of Algorithm 6.6. The only error message that the decryption and decoding algorithms issue is decryption error. This is to ensure that an adversary cannot distinguish between different kinds of errors, because such an ability of the adversary may lead her to guess partial information about the decryption process and thereby mount a chosen-ciphertext attack.
|
Input: The message M of octet length mLen, the label L. Output: The EME–OAEP encoded message EM. Steps: lHash := H(L). Generate the padding string PS with k – mLen – 2hLen – 2 zero octets. Generate the data block DB := lHash ‖ PS ‖ 01 ‖ M. Let seed := a random string of length hLen octets. Generate the data-block mask dbMask := MGF(seed, k – hLen – 1). Generate the masked data-block maskedDB := DB ⊕ dbMask. Generate mask for seed seedMask := MGF(maskedDB, hLen). Generate the masked seed maskedSeed := seed ⊕ seedMask. Generate the encoded message EM := 00 ‖ maskedSeed ‖ maskedDB. |
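The EME–OAEP encoding steps can be sketched in Python with SHA-1 as H and MGF1 as the mask-generation function. The function names are ours, and the `seed` parameter is exposed only so that the sketch is reproducible; a real implementation must always use a fresh random seed:

```python
import hashlib
import os

def mgf1(seed: bytes, length: int) -> bytes:
    """MGF1 over SHA-1 (hLen = 20); see Algorithm 6.11."""
    t = b""
    for counter in range((length + 19) // 20):
        t += hashlib.sha1(seed + counter.to_bytes(4, "big")).digest()
    return t[:length]

def eme_oaep_encode(m: bytes, k: int, label: bytes = b"", seed=None) -> bytes:
    """EME-OAEP encoding sketch (Algorithm 6.4) for a k-octet modulus."""
    h_len = 20
    if len(m) > k - 2 * h_len - 2:
        raise ValueError("Error: message too long")
    l_hash = hashlib.sha1(label).digest()
    ps = b"\x00" * (k - len(m) - 2 * h_len - 2)       # zero padding string PS
    db = l_hash + ps + b"\x01" + m                    # DB = lHash || PS || 01 || M
    if seed is None:
        seed = os.urandom(h_len)                      # fresh random seed
    masked_db = bytes(a ^ b for a, b in zip(db, mgf1(seed, k - h_len - 1)))
    masked_seed = bytes(a ^ b for a, b in zip(seed, mgf1(masked_db, h_len)))
    return b"\x00" + masked_seed + masked_db          # EM = 00 || maskedSeed || maskedDB
```

Decoding (Algorithm 6.6) simply recomputes the two masks in the opposite order and strips `lHash`, `PS` and the `01` separator.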
|
Input: The recipient’s private key K, the ciphertext C to be decrypted and an optional label L (the default value of which is the null string). Output: The decrypted message M. Steps: if (the length of L is more than the limitation of H) or (the length of C is not k octets)
|
|
Input: The encoded message EM and the label L. Output: The EME–OAEP decoded message M. Steps: lHash := H(L). |
RSASSA–PSS employs the probabilistic signature scheme proposed by Bellare and Rogaway [19]. Under suitable assumptions about the hash function and the mask-generation function, the RSASSA–PSS scheme produces secure signatures which are also tight in the sense that forging RSASSA–PSS signatures is computationally equivalent to inverting RSA.
|
Input: The message M (an octet string) to be signed, the private key K of the signer. Output: The signature S (an octet string of length k). Steps:
|
|
Input: The message M to be encoded (an octet string), the maximum bit length emBits of OS2I(EM). One should have emBits ≥ 8hLen + 8sLen + 9. Output: The encoded message EM, an octet string of length emLen := ⌈emBits/8⌉. Steps: if (M is longer than what H can handle) { Return “Error: message too long”. } Generate the hashed message mHash := H(M). if (emLen < hLen + sLen + 2) { Return “Encoding error”. } Let salt := a random string of length sLen octets. Generate the salted message M′ := 00 00 00 00 00 00 00 00 ‖ mHash ‖ salt. Generate the hashed salted message mHash′ := H(M′). Generate the padding string PS with emLen – sLen – hLen – 2 zero octets. Generate the data block DB := PS ‖ 01 ‖ salt. Generate the data block mask dbMask := MGF(mHash′, emLen – hLen – 1). Generate the masked data block maskedDB := DB ⊕ dbMask. Set to 0 the leftmost 8emLen – emBits bits of the leftmost octet of maskedDB. Compute EM := maskedDB ‖ mHash′ ‖ bc. |
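The EMSA–PSS encoding steps translate to Python as follows (a sketch, not a vetted implementation; SHA-1 is used for H, and the `salt` parameter is exposed only for reproducibility — real signatures need a fresh random salt):

```python
import hashlib
import os

def mgf1(seed: bytes, length: int) -> bytes:
    """MGF1 over SHA-1 (hLen = 20); see Algorithm 6.11."""
    t = b""
    for counter in range((length + 19) // 20):
        t += hashlib.sha1(seed + counter.to_bytes(4, "big")).digest()
    return t[:length]

def emsa_pss_encode(m: bytes, em_bits: int, s_len: int = 20, salt=None) -> bytes:
    """EMSA-PSS encoding sketch (Algorithm 6.8) with SHA-1 (hLen = 20)."""
    h_len = 20
    em_len = (em_bits + 7) // 8
    m_hash = hashlib.sha1(m).digest()                 # mHash
    if em_len < h_len + s_len + 2:
        raise ValueError("Encoding error")
    if salt is None:
        salt = os.urandom(s_len)                      # fresh random salt
    m_prime = b"\x00" * 8 + m_hash + salt             # salted message M'
    h = hashlib.sha1(m_prime).digest()                # mHash'
    ps = b"\x00" * (em_len - s_len - h_len - 2)
    db = ps + b"\x01" + salt                          # DB = PS || 01 || salt
    db_mask = mgf1(h, em_len - h_len - 1)
    masked_db = bytearray(a ^ b for a, b in zip(db, db_mask))
    masked_db[0] &= 0xFF >> (8 * em_len - em_bits)    # clear the leftmost pad bits
    return bytes(masked_db) + h + b"\xbc"             # EM = maskedDB || mHash' || bc
```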
RSASSA–PSS signature generation (Algorithm 6.7) uses the EMSA–PSS encoding method (Algorithm 6.8). Verification (Algorithm 6.9) uses the EMSA–PSS decoding method (Algorithm 6.10). We assume that k is the octet length of the RSA modulus n. Let modBits denote the bit length of n. The encoded message is of length emLen = ⌈(modBits – 1)/8⌉ octets. The probabilistic behaviour of the encoding scheme is incorporated by the use of a random salt, the octet length of which is sLen. A hash function H that produces hash values of octet length hLen is employed.
|
Input: The message M, the signature S to be verified and the signer’s public key (n, e). Output: Verification status of the signature. Steps: if (the length of S is not k octets) { Return “Signature not verified”. }
if (status is “consistent”) { Return “Signature verified”. } else { Return “Signature not verified”. } |
|
Input: The message M (an octet string), the encoded message EM (an octet string of length emLen = ⌈emBits/8⌉) and the maximum bit length emBits of OS2I(EM). One should have emBits ≥ 8hLen + 8sLen + 9. Output: Decoding status: “consistent” or “inconsistent”. Steps: if (M is longer than what H can handle) { Return “inconsistent”. } |
A mask-generation function (MGF1) is specified in the PKCS #1 draft. It is based on a hash function H. The mask-generation function is deterministic in the sense that its output is completely determined by its input. However, the (provable) security of the OAEP and PSS schemes is based on the pseudorandom nature of the output of the mask-generation function. This means that any part of the output should be statistically independent of the other parts. MGF1 derives this pseudorandomness from that of the underlying hash function H.
|
Input: The seed mgfSeed (an octet string) and the desired octet length maskLen of the output mask. One requires maskLen ≤ 2^32 hLen, where hLen is the octet length of the hash function output. Output: An octet string mask of length maskLen. Steps: if (maskLen > 2^32 hLen) { Return “Error: mask too long”. } |
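MGF1 simply iterates the hash over the seed concatenated with a 4-octet counter and truncates the result; a Python sketch (function name ours):

```python
import hashlib

def mgf1(mgf_seed: bytes, mask_len: int, hash_fn=hashlib.sha1) -> bytes:
    """MGF1 sketch (Algorithm 6.11): T := T || H(mgfSeed || I2OS(counter, 4))
    for counter = 0, 1, ..., ceil(maskLen/hLen) - 1; output the first
    maskLen octets of T."""
    h_len = hash_fn(b"").digest_size
    if mask_len > (2 ** 32) * h_len:
        raise ValueError("Error: mask too long")
    t = b""
    for counter in range(-(-mask_len // h_len)):      # ceiling division
        t += hash_fn(mgf_seed + counter.to_bytes(4, "big")).digest()
    return t[:mask_len]
```

Note the prefix property that determinism implies: a shorter mask from the same seed is always a prefix of a longer one.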
The older encryption scheme RSAES–PKCS1–v1_5 is no longer recommended, since this scheme is not plaintext-aware, that is, with high probability, an adversary can generate ciphertexts without knowing the corresponding plaintexts. This allows the adversary to mount chosen-ciphertext attacks. The new drafts of PKCS #1 include this old scheme for backward compatibility. Encryption and decryption for RSAES–PKCS1–v1_5 are given in Algorithms 6.12 and 6.13. Here, k is the octet length of the modulus.
|
Input: The recipient’s public key (n, e) and the message M (an octet string). Output: The ciphertext C which is an octet string of length k. Steps: if (mLen > k – 11) { Return “Error: message too long”. }
|
|
Input: The recipient’s private key K and the ciphertext C (an octet string). Output: The plaintext message M (an octet string of length ≤ k – 11). Steps: if (the length of the ciphertext is not k octets) { Return “decryption error”. }
Try to decompose EM = 00 ‖ 02 ‖ PS ‖ 00 ‖ M, where PS is an octet string of length ≥ 8 and containing only non-zero octets. if (the above decomposition is unsuccessful) { Return “decryption error”. } |
The older RSA signature scheme RSASSA–PKCS1–v1_5 is not known to have security loopholes. (Nevertheless, the provably secure PSS scheme is recommended for future applications.) RSASSA–PKCS1–v1_5 uses EMSA–PKCS1–v1_5 message encoding procedure (Algorithm 6.16). The signature generation and verification procedures are given in Algorithms 6.14 and 6.15. Here, k denotes the octet length of the modulus n.
The EMSA–PKCS1–v1_5 message encoding procedure (Algorithm 6.16) uses a hash function H. Although a member of the SHA family is recommended for future applications, MD2 and MD5 are also supported for compliance with older applications. An octet string hashAlgo is used whose value depends on the underlying hash algorithm and is given in Table 6.5.
| Function | The string hashAlgo |
|---|---|
| MD2 | 30 20 30 0c 06 08 2a 86 48 86 f7 0d 02 02 05 00 04 10 |
| MD5 | 30 20 30 0c 06 08 2a 86 48 86 f7 0d 02 05 05 00 04 10 |
| SHA-1 | 30 21 30 09 06 05 2b 0e 03 02 1a 05 00 04 14 |
| SHA-256 | 30 31 30 0d 06 09 60 86 48 01 65 03 04 02 01 05 00 04 20 |
| SHA-384 | 30 41 30 0d 06 09 60 86 48 01 65 03 04 02 02 05 00 04 30 |
| SHA-512 | 30 51 30 0d 06 09 60 86 48 01 65 03 04 02 03 05 00 04 40 |
|
Input: The signer’s private key K and the message M to be signed (an octet string). Output: The signature S (an octet string of length k). Steps:
|
|
Input: The signer’s public key (n, e), the message M (an octet string) and the signature S to be verified (an octet string of length k). Output: Verification status of the signature. Steps: if (the length of S is not k octets) { Return “Signature not verified”. }
if (EM = EM′) { Return “Signature verified”. } else { Return “Signature not verified”. } |
|
Input: The message M (an octet string), the intended length emLen of the encoded message. One requires emLen ≥ tLen + 11, where tLen is the octet length of hashAlgo plus the octet length of the hash output. Output: The encoded message EM (an octet string of length emLen). Steps: if (M is longer than what H can handle) { Return “Error: message too long”. } |
PKCS #3 describes the Diffie–Hellman key-exchange algorithm. The draft assumes the existence of a central authority which generates the domain parameters: a prime p of octet length k, an integer g satisfying 0 < g < p and, optionally, a positive integer l. The integer g need not be a generator of Z_p^*, but is expected to have sufficiently large multiplicative order modulo p. The integer l denotes the bit length of the private Diffie–Hellman key of an entity. Values of l ≪ 8k can be chosen for efficiency. However, for maintaining a desired level of security, l should not be too small. Since the central authority determines p, g (and l), individual users need not bother about the generation of these parameters.
During a Diffie–Hellman key-exchange interaction of Alice with Bob, Alice performs the steps described in Algorithm 6.17. Bob performs an identical operation which is omitted here.
|
Input: p, g and optionally l. Output: The shared secret SK (an octet string of length k). Steps: Alice generates a random private value x. /* If l is specified, one should have 2^{l–1} ≤ x < 2^l. */ Alice computes y := g^x (mod p). Alice converts y to an octet string PV := I2OS(y, k). Alice sends the public value PV to Bob. Alice receives Bob’s public value PV′. Alice converts PV′ to the integer y′ := OS2I(PV′). Alice computes z := (y′)^x (mod p) (with 0 < z < p). Alice transforms z to the shared secret SK := I2OS(z, k). |
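The whole exchange, including the I2OS/OS2I conversions of the public values, fits in a short Python sketch (names and toy parameters ours; the modulus is far too small for any real security):

```python
def dh_public_value(p: int, g: int, x: int) -> bytes:
    """PV = I2OS(g^x mod p, k), where k is the octet length of p."""
    k = (p.bit_length() + 7) // 8
    return pow(g, x, p).to_bytes(k, "big")

def dh_shared_secret(p: int, x: int, pv_other: bytes) -> bytes:
    """SK = I2OS((y')^x mod p, k) from the peer's public value."""
    k = (p.bit_length() + 7) // 8
    y_other = int.from_bytes(pv_other, "big")   # OS2I
    return pow(y_other, x, p).to_bytes(k, "big")

# Toy parameters (p is a small prime; k = 2 octets here).
p, g = 2027, 2
alice_x, bob_x = 1393, 878
pv_a = dh_public_value(p, g, alice_x)
pv_b = dh_public_value(p, g, bob_x)
# Both sides compute g^(alice_x * bob_x) mod p and therefore agree.
assert dh_shared_secret(p, alice_x, pv_b) == dh_shared_secret(p, bob_x, pv_a)
```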
In this chapter, we describe some standards for representation of cryptographic data in various formats and for conversion of data among different formats. We also present some standard encoding and decoding schemes that are applied before encryption and after decryption. These standards promote easy and unambiguous interfaces with the cryptographic primitives described in the previous chapter.
The IEEE P1363 range of standards defines several data types: bit strings, octet strings, integers, prime finite fields, finite fields of characteristic 2, extension fields of odd characteristic, elliptic curves, elliptic curve points and polynomial rings. The IEEE drafts also prescribe standard ways of converting data among these formats. For example, the primitive BS2OS converts a bit string to an octet string, and the primitive FE2I converts a finite-field element to an integer.
We subsequently mention some of the public-key cryptography standards (PKCS) propounded by RSA Laboratories. Draft PKCS #1 deals with RSA encryption and signatures. In addition to the standard RSA moduli of the form pq, it also suggests the possibility of using multi-prime RSA, that is, moduli which are products of more than two (distinct) primes. The draft recommends use of the optimal asymmetric encryption procedure (OAEP). This probabilistic encryption scheme provides provable security against chosen-ciphertext attacks. A probabilistic signature scheme is also advocated for use. These probabilistic schemes call for a mask-generation function (MGF), and a concrete realization of an MGF is provided. Draft PKCS #3 standardizes the Diffie–Hellman key-exchange algorithm.
The P1363 class of preliminary drafts [134] published by IEEE and the PKCS standards [254] from RSA Security Inc. are available for free download from Internet sites. However, IEEE's published standard 1363-2000 must be purchased for a fee. In addition to the data types and data-conversion primitives described in this chapter, the IEEE drafts (P1363, P1363a, P1363.1 and P1363.2) provide encryption/decryption and signature generation/verification primitives and also several encryption and signature schemes based on these primitives. These schemes are very similar to the algorithms that we described in Chapter 5, so we avoid repeating those descriptions here. Elaborate encoding procedures are described in the PKCS drafts, but only for RSA- and Diffie–Hellman-based systems. We have reproduced the details in this chapter. The remaining PKCS drafts deal with topics that this book does not directly cover. A notable exception is PKCS #13, which addresses elliptic-curve cryptography. This draft is not ready yet; when it is, it may be consulted to learn about RSA Laboratories' standards on elliptic-curve cryptography.
At present, the different families of standards do not seem to have mutually conflicting specifications. The IEEE has a (free) mailing list for promoting the development and improvement of the IEEE P1363 standards, via e-mail discussions.
Other Internet standards include the Federal Information Processing Standards (FIPS) [221] from NIST, and the RFCs (Requests for Comments) from the Internet Engineering Task Force (IETF) [135].
| 7.1 | Introduction |
| 7.2 | Side Channel Attacks |
| 7.3 | Backdoor Attacks |
| Chapter Summary | |
| Suggestions for Further Reading |
A man cannot be too careful in the choice of his enemies.
—Oscar Wilde (1854–1900), The Picture of Dorian Gray, 1891
If you reveal your secrets to the wind you should not blame the wind for revealing them to the trees.
—Kahlil Gibran (1883–1931)
There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult.
—Charles Antony Richard Hoare
The security of public-key cryptographic protocols is based on the apparent intractability of solving certain computational problems. If one can factor large integers efficiently, one breaks RSA. In that sense, seeking good algorithms to solve these problems (like factoring integers) is part of cryptanalysis. Proving that no polynomial-time algorithm can break RSA would elevate the security of the protocol from assumed to provable. On the other hand, developing a polynomial-time algorithm for breaking RSA (or for factoring integers) would render RSA (and many other protocols) unusable. Though a temporary setback to our existing cryptographic tools, such a discovery would enrich our understanding of the computational problems. In short, breaking the trapdoors of public-key cryptosystems is of both theoretical and practical significance.
But research along these mathematical lines is open-ended. A desperate cryptanalyst may not wait indefinitely for a theoretical resolution. She tries to find loopholes in the systems that she can effectively exploit to gain secret information.
A cryptographic protocol must be implemented (in software or hardware) before it can be used. Careless implementations often supply the loopholes that cryptanalysts wait for. For example, a software implementation of a public-key system may allow the private key to be read only from a secure device (a removable medium, like CDROM), but may make copies of the key in the memory of the machine where the decryption routine is executed. If the decryption routine does not lock and eventually flush the memory holding the key, a second user having access to the machine can simply read off the secrets.
Software and hardware implementations often tend to leak out secrets at a level much more subtle than the example just mentioned. A public-key algorithm is a known algorithm and involves a sequence of well-defined steps dictated by the private key. Each step requires its private share of execution time and power consumption. Watching the decrypting device carefully during a private-key operation may reveal information about the exact sequence of basic steps in the algorithm. Random hardware faults during a private-key operation may also compromise security. Such attacks are commonly dubbed as side-channel attacks.
Let us now look at another line of attack. Not every user of cryptography is expected to implement all the routines she uses. On the contrary, most users run precompiled programs available from third parties. How will a user assess the soundness of the products she is using, that is, who will guarantee that there are no (intentional or unintentional) security snags in the products? Key-generation software from a malicious software designer may initiate a clandestine e-mail every time a key pair is generated. It is also possible that a private key supplied by such a program is generated from a small predefined set known to the designer. Even when private keys look random, they need not come with the unpredictability necessary for cryptographic use. Such attacks during key generation are called backdoor attacks.
In short, public-key cryptanalysis at present encompasses trapdoors, backdoors and side channels. The trapdoor methods have already been discussed in Chapter 4. In this chapter, we concentrate on the other attacks on public-key systems.
Side-channel attacks refer to a class of cryptanalytic tools for determining a private key by measuring signals (like timing, power fluctuation, electromagnetic radiation) from or by inducing faults in the device performing operations involving the private key. In this section, we describe three methods of side-channel cryptanalysis: timing attack, power attack and fault attack.
Paul C. Kocher introduced the concept of side-channel cryptanalysis in his seminal paper [155] on timing attacks. Though not unreasonable, timing attacks are somewhat difficult to mount in practice.
The private-key operation in many cryptographic systems (like RSA or discrete-log-based systems) is usually a modular exponentiation of the form
y := x^d (mod n),
where d is the private key. The private-key procedure may involve other overheads (like message decoding), but the running time of the routine is usually dominated by, and so can be approximated by, the time of the modular exponentiation.
Assume that this exponentiation is carried out by a square-and-multiply algorithm known to Carol, the attacker. For example, suppose that Algorithm 3.9 is used. Each iteration of the for loop involves a modular squaring followed conditionally by a modular multiplication. The multiplication is done in an iteration if and only if the corresponding bit e_i in the exponent is 1. Thus, an iteration runs slower if e_i = 1 than if e_i = 0. If Carol could measure the timing of each individual iteration of the for loop, she would correctly guess most (if not all) of the bits in the exponent. But it is unreasonable to assume that an attacker can collect such detailed timing data. Moreover, if Algorithm 3.10 is used, these detailed data do not help much, because in this case the timing of an individual iteration of the for loop can at best differentiate between the two cases e_i = 0 and e_i ≠ 0. There are 2^t − 1 non-zero values for each e_i.
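The data dependence just described can be made concrete with a toy model, in the spirit of the binary square-and-multiply loop, that records how many modular operations each iteration performs (the operation count standing in for time):

```python
def modexp_sqmul(x, d, n, trace=None):
    """Left-to-right binary square-and-multiply.  Every iteration squares;
    a multiplication happens exactly when the current exponent bit is 1,
    so 1-bit iterations do strictly more work."""
    y = 1
    for bit in bin(d)[2:]:
        y = (y * y) % n            # unconditional squaring
        ops = 1
        if bit == "1":
            y = (y * x) % n        # conditional multiplication
            ops = 2
        if trace is not None:
            trace.append(ops)      # operation count for this iteration
    return y

trace = []
assert modexp_sqmul(7, 0b1011, 77, trace) == pow(7, 0b1011, 77)
assert trace == [2, 1, 2, 2]       # the per-iteration "cost" leaks the bits 1011
```

An attacker who could observe the per-iteration costs in `trace` would read off the exponent directly; the attack below works with only the total time, which is why it needs statistics.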
However, it is not difficult to think of a situation where the attacker can measure, to a reasonable accuracy, the total time of the exponentiation. In order to guess d, Carol requires the times of the modular exponentiations for several different values of x, say x_1, . . . , x_k, all known to her. (Note that the x_i may be messages to be signed or intercepted ciphertexts.) The same exponent d is used for all these exponentiations. Let T_i be the time for computing x_i^d (mod n), as measured by Carol. We may assume that all these k exponentiations are carried out on the same machine using the same routine.
Kocher considers the attack on the exponentiation routine of RSAREF, a cryptography toolkit available from the RSA Laboratories. This routine implements Algorithm 3.10 with t = 2. For the sake of convenience, the algorithm is reproduced below. We may assume that the exponent has an even number of bits—if not, pad a leading zero.
Algorithm 7.1: RSAREF-style modular exponentiation (Algorithm 3.10 with t = 2)
Input: x, n and the exponent d = (d_{2l−1}d_{2l−2} · · · d_1d_0)_2 (with an even number 2l of bits).
Output: y := x^d (mod n).
Steps:
(1) z_1 := x.
(2) z_2 := x^2 (mod n).
(3) z_3 := x^3 (mod n).
(4) y := 1.
(5) For j = l − 1, l − 2, . . . , 0, repeat Steps (6)–(9):
(6) y := y^2 (mod n).
(7) y := y^2 (mod n).
(8) If (d_{2j+1}d_{2j})_2 ≠ 0:
(9) y := y · z_{(d_{2j+1}d_{2j})_2} (mod n).
Every step of the above algorithm runs in a time dependent on the operands. For example, the modular multiplication in Step (9) takes time dependent on the operands y and z_{(d_{2j+1}d_{2j})_2}. The variation in the timing depends on the implementation of the modular arithmetic routines and also on the machine's architecture. However, we make the assumption that, for fixed operands, each step requires a constant time on a given machine (or on identical machines). This is actually a loss of generality, since the running time of a complex step (like modular multiplication or squaring) for fixed operands may vary for various reasons like process scheduling, availability of cache, page faults and so on. It may be difficult, perhaps impossible, for an attacker to arrange for herself a verbatim emulation of the victim's machine at the time when the latter performed the private-key operations. Let us still proceed with our assumption, say by conceiving of a not-so-unreasonable situation where the effects of these other factors are not sizable.
We use the subscript i to denote the i-th private-key operation for 1 ≤ i ≤ k. The entire routine takes time T_i for the i-th exponentiation, that is, for the input x_i. This measurement may involve some (unknown) error which we denote by e_i. The first four steps are executed only once during each call and take a total time of p_i (precomputation time). The for loop is executed l times. We ignore the time needed to maintain the loop (like decrementing j) and also the time taken by the if statement in Step (8). Let s_{i,j} and t_{i,j} be the times taken respectively by Steps (6) and (7), when the loop variable assumes the value j. If Step (9) is executed, we denote by m_{i,j} the time taken by this step; otherwise we set m_{i,j} := 0. It follows that
Equation 7.1

T_i = e_i + p_i + Σ_{j=l−1}^{0} (s_{i,j} + t_{i,j} + m_{i,j}),

where the index in the sum decreases from l − 1 to 0 in steps of 1. Carol does not know this break-up (that is, the explicit values of e_i, s_{i,j}, t_{i,j} and m_{i,j}), but she can make an inductive guess in the following way.
Carol manages a machine and a copy of the exponentiation software, both identical to those of the victim. She then successively guesses the secret bit pairs d_{2l−1}d_{2l−2}, d_{2l−3}d_{2l−4}, d_{2l−5}d_{2l−6} and so on. Assume that at some stage Carol has correctly determined the exponent bits d_{2j+1}d_{2j} for j = l−1, l−2, . . . , j′+1. Initially j′ = l−1. Using this information Carol computes d_{2j′+1}d_{2j′} as follows. Carol's knowledge at this stage allows her to measure p_i and s_{i,j}, t_{i,j}, m_{i,j} for j = l−1, . . . , j′+1: she simply runs Algorithm 7.1 on x_i. Carol then enters the loop with j = j′. The squaring operations are unconditional, and Carol has the same operands as the victim for the squaring steps. So Carol also measures s_{i,j′} and t_{i,j′}.
The bit pair d_{2j′+1}d_{2j′} (considered as a binary integer) can take any one of the four values g = 0, 1, 2, 3. Carol measures the time m̂_{i,j′}(g) of Step (9) for each of the four choices of g and adds this time to the time taken by the algorithm so far, in order to obtain:

Equation 7.2

T̂_i(g) = p_i + Σ_{j=l−1}^{j′+1} (s_{i,j} + t_{i,j} + m_{i,j}) + s_{i,j′} + t_{i,j′} + m̂_{i,j′}(g).
Kocher observed that the distribution of T_i, i = 1, . . . , k, is statistically related to that of T̂_i(g) only for the correct guess g. In order to see how, we subtract Equation (7.2) from Equation (7.1) to get:

Equation 7.3

T_i − T̂_i(g) = e_i + (m_{i,j′} − m̂_{i,j′}(g)) + Σ_{j=j′−1}^{0} (s_{i,j} + t_{i,j} + m_{i,j}).
Let us assume that the error term e_i is distributed like a random variable E. Similarly suppose that each multiplication (resp. squaring) has the distribution of a random variable M (resp. S). Taking the variance of Equation (7.3) over the values i = 1, 2, . . . , k and assuming that the sample size k is so large that the sample variances are very close to the variances of the respective random variables, we obtain:

Equation 7.4

Var(T_i − T̂_i(g)) = Var(E) + Var(m_{i,j′} − m̂_{i,j′}(g)) + 2j′ Var(S) + λ Var(M),
where λ denotes the number of times Step (9) is executed for j = j′−1, . . . , 0. Note that λ is dependent on the private key and not on the arguments to the exponentiation routine. For the correct guess g, we have m̂_{i,j′}(g) = m_{i,j′}, and so

Var(T_i − T̂_i(g)) = Var(E) + 2j′ Var(S) + λ Var(M).

On the other hand, for an incorrect guess g we have:

Var(T_i − T̂_i(g)) = Var(E) + 2j′ Var(S) + (λ + 1) Var(M)

if one of m_{i,j′} or m̂_{i,j′}(g) is zero, or

Var(T_i − T̂_i(g)) = Var(E) + 2j′ Var(S) + (λ + 2) Var(M)

if both m_{i,j′} and m̂_{i,j′}(g) are non-zero. (Recall that Var(αX + βY) = α² Var(X) + β² Var(Y) for independent X, Y and any real α, β.)
Calculation of the sample variances of T_i − T̂_i(g) for the four choices of g gives Carol a handle to determine (or guess) the correct choice. Carol simply takes the g for which the variance is minimum. This is the fundamental observation that makes the timing attack work.
Of course, statistical irregularities exist in practice, and the approximation of the actual variances by the sample variances introduces errors in Equation (7.4). These errors are of particular concern for large values of j′, that is, during the beginning of the attack. However, if an incorrect guess is made at a certain stage, this is detected soon with high probability, as Carol proceeds further. Suppose that an erroneous guess of d_{2j″+1}d_{2j″} has been made for some j″ > j′. This means that the values of y are different from the actual values starting from the iteration of the loop with j = j″−1. (We may assume that most, if not all, x_i ≠ 1.) We then do not have a cancellation of the timings for j = j″−1, . . . , j′. More correctly, if the guesses for j = l−1, . . . , j″+1 are correct and the first error occurs at j = j″, then denoting Carol's subsequent emulated timings by ŝ_{i,j}, t̂_{i,j} and m̂_{i,j}, one gets

Equation 7.5

T_i − T̂_i(g) = e_i + Σ_{j=j″}^{j′} ((s_{i,j} − ŝ_{i,j}) + (t_{i,j} − t̂_{i,j}) + (m_{i,j} − m̂_{i,j})) + Σ_{j=j′−1}^{0} (s_{i,j} + t_{i,j} + m_{i,j}),

where the squaring terms cancel only for j = j″ (there the operands are still correct).
Since each of the squaring and multiplication operations takes y as an operand, the original timings and the emulated timings (the ones with hats) behave like independent variables and, therefore, taking the variance of Equation (7.5) yields

Var(T_i − T̂_i(g)) = Var(E) + 2(2j″ − j′) Var(S) + λ′ Var(M)

for some λ′ depending on the private key and on the previous guesses, but independent of the current guess g. In other words, Carol loses a meaningful relation of Var(T_i − T̂_i(g)) with the correctness of the current guess. Once Carol notices this, she backtracks and changes older guesses until the expected behaviour is restored. Thus, the timing attack comes with an error-detection and correction strategy.
An analysis done by Kocher (neglecting E and assuming normal distributions for S and M) shows that Carol needs k = O(l) samples for a good probability of success.
There are several ways in which timing attacks can be prevented.
If every multiplication step takes exactly the same time and so does every squaring step, the above timing attack does not work. Thus, forcing each multiplication and each squaring to take the same respective times, independent of their operands, prevents Carol from mounting the timing attack. Making m_{i,j} constant alone does not suffice, for differences in squaring timings can be exploited in subsequent iterations to correct a guess. Forcing every operation to take exactly the same time as the slowest possibility makes the implementation run slower. Moreover, finding the slowest possibility may be difficult.
Interleaving random delays also makes timing attacks difficult to mount, because the attacker then requires more samples in order to smooth out the effect of the delays. But again, adding delays harms performance and does not completely rule out the possibility of timing attacks.
Perhaps the best strategy to thwart timing attacks is to use a random pair (u, v) with v := u^{−d} (mod n) for each private-key operation. Initially x is multiplied by u, and then the product ux is exponentiated to get u^d x^d ≡ v^{−1} y (mod n). Multiplication by v then yields the desired y. A new random pair (u, v) must be used for every exponentiation. However, the exponentiation v := u^{−d} (mod n) is too costly to be performed during every private-key operation and may itself invite timing attacks. A good trade-off is to choose (u, v) once, keep it secret, and before the next private-key operation update (and replace) the old (u, v) by (u′, v′) with u′ ≡ u^e (mod n) and v′ ≡ v^e (mod n) for some small e (random or deterministic). The choice e = 2 is quite satisfactory in practice: performing two modular squarings is much cheaper than computing the full exponentiation v := u^{−d} (mod n).
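Under the assumptions above, the blinding countermeasure can be sketched as follows. The toy RSA parameters and all names are ours, and real code would additionally use constant-time arithmetic:

```python
import secrets
from math import gcd

def make_blinding_pair(n, d):
    """One-time setup: pick a random unit u and precompute v = u^(-d) mod n."""
    while True:
        u = secrets.randbelow(n - 2) + 2
        if gcd(u, n) == 1:
            break
    v = pow(pow(u, d, n), -1, n)      # v := u^(-d) mod n (Python 3.8+ modular inverse)
    return u, v

def blinded_exp(x, d, n, u, v):
    """Compute x^d mod n without exponentiating x itself:
    (u*x)^d = u^d * x^d = v^(-1) * x^d, so a final multiplication by v unblinds."""
    y = pow((u * x) % n, d, n)
    return (y * v) % n

def update_pair(u, v, n):
    """Cheap refresh between operations: (u, v) -> (u^2, v^2) mod n,
    which preserves the invariant v = u^(-d) mod n."""
    return (u * u) % n, (v * v) % n

n, d = 3233, 2753                     # toy RSA modulus 61*53 and private exponent
u, v = make_blinding_pair(n, d)
assert blinded_exp(42, d, n, u, v) == pow(42, d, n)
u, v = update_pair(u, v, n)           # e = 2 refresh before the next operation
assert blinded_exp(42, d, n, u, v) == pow(42, d, n)
```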
In connection with timing attacks, we mentioned that if an adversary were able to measure the timing of each iteration of the square-and-multiply loop during an RSA (or discrete-log-based) private-key exponentiation, she could guess the bits in the key quite efficiently from only a few timing measurements. But it is questionable whether such detailed timing data can be made available.
Now, think of a situation where Carol can measure patterns of power consumption made by the decrypting (or signing) device during one or more private-key operations with Alice's private key. If Alice carries out the private-key operations in her personal workstation, it is difficult for Carol to conduct such measurements. So assume that Alice is using a smart card in a reader over which Carol has some control. Carol inserts a small resistor in series with the line that drives Alice's smart card. The power consumed by the smart-card circuit is roughly proportional to the current through the resistor. By measuring the voltage across the resistor (and multiplying by a suitable factor), Carol can observe the power consumed by Alice's decryption device. Carol has to use a power-measuring device that takes readings at a high frequency (100 MHz to several GHz, depending on Carol's budget). A set of power measurements obtained during a cryptographic operation is called a power trace. We now study how power traces can reveal Alice's secrets.
The individual steps in a private-key operation may be nakedly exposed in a power trace. This is, in particular, the case when different steps consume different amounts of power and/or take different times. Obtaining information about the operation of the decrypting device and/or the secrets by a direct interpretation of power traces is referred to as simple power analysis or SPA in short.
As an example of SPA, consider an implementation of RSA exponentiation using the naive square-and-multiply Algorithm 3.9. Here, the most power-consuming operations are modular squaring and modular multiplication. Modular multiplication typically runs slower than modular squaring. Also, modular multiplication requires two different operands to be fetched from memory, whereas modular squaring requires only one. Thus, a multiplication operation draws more power, for a longer time, than a squaring operation.
A hypothetical[1] SPA trace during a portion of an RSA private-key operation is shown in Figure 7.1. Each spike in the trace corresponds to either a square or a multiplication operation. Let us assume that the power consumption is measured with sufficient resolution, so that no spike is missed. Since multiplication runs longer (and requires more operands) than squaring, multiplication spikes are wider than squaring spikes.
[1] SPA traces from real-life experiments on smart cards, as reported in several references, look similar to this. We, however, generated the trace using a random number generator. Absolute conformity to reality is not always crucial for the purposes of illustration.
Figure 7.1: A hypothetical SPA trace during a portion of an RSA private-key operation
Let us denote a squaring operation by S and a multiplication operation by M. We observe that Alice’s smart card performs the sequence
SMSMSSMSSSSMSSSMSS
of operations during the measurement interval shown. Since multiplication in an iteration of the loop is skipped if and only if the corresponding bit in the exponent is zero, we can group the operations as
(SM)(SM)(S)(SM)(S)(S)(S)(SM)(S)(S)(SM)(S)(S,

where the trailing S begins a group whose completion falls outside the measurement window. This, in turn, reveals the bit string 110100010010 in Alice's private key.
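The grouping rule just applied is mechanical and can be coded directly. A small sketch, using the operation sequence read off Figure 7.1 (without the trailing S, which starts an incomplete group):

```python
def spa_bits(ops):
    """Recover exponent bits from an SPA operation sequence produced by
    naive square-and-multiply: each 'S' starts a new iteration; an 'M'
    immediately after it means the bit was 1, otherwise 0."""
    bits = []
    i = 0
    while i < len(ops):
        assert ops[i] == "S"
        if i + 1 < len(ops) and ops[i + 1] == "M":
            bits.append("1")
            i += 2
        else:
            bits.append("0")
            i += 1
    return "".join(bits)

assert spa_bits("SMSMSSMSSSSMSSSMS") == "110100010010"
```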
Effective as it appears, SPA, in practice, does not pose a huge threat to the security of conventional cryptographic systems. Using algorithms for which power traces do not bear direct relationships with the bits of the private key largely reduces risks of fruitful SPA. The inefficient repeated square-and-multiply Algorithm 7.2 always performs a multiplication after squaring and thereby eliminates chances of a successful SPA.
Algorithm 7.2: Square-and-multiply-always exponentiation
Input: x, n and the exponent d = (d_{l−1} · · · d_1d_0)_2.
Output: y := x^d (mod n).
Steps:
(1) y := 1.
(2) For j = l − 1, l − 2, . . . , 0, repeat Steps (3)–(5):
(3) y := y^2 (mod n).
(4) z := y · x (mod n).
(5) If d_j = 1, set y := z.
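In Python, the square-and-multiply-always idea looks as follows. This is an illustration only: the sketch still contains a key-dependent branch, which a hardened implementation would replace by a constant-time conditional copy:

```python
def modexp_always(x, d, n):
    """Square-and-multiply-always: every iteration performs one squaring
    AND one multiplication, so an SPA trace shows the same S,M rhythm
    regardless of the key bits.  The product is simply discarded when
    the current bit is 0."""
    y = 1
    for bit in bin(d)[2:]:
        y = (y * y) % n        # squaring, always
        z = (y * x) % n        # multiplication, always
        if bit == "1":
            y = z              # the product is kept only when the bit is 1
    return y

assert modexp_always(7, 2753, 3233) == pow(7, 2753, 3233)
```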
Using the (more efficient) Algorithm 7.1 also frustrates SPA. Some chunks of two successive 0 bits are anyway revealed by power traces collected during the execution of this algorithm. But, for a decently large and random private key, this still leaves Carol with many unknown bits to be guessed. Note, however, that none of the three remedies suggested to thwart the timing attack on Algorithm 7.1 seems to be effective in the context of SPA. Delays normally do not consume much power (unless some power-intensive dummy computations fill up the delays). Also, the masking of (x, y) by (u, v) fails to produce any alteration in the power consumption pattern during exponentiation.
If some private-key algorithm has unavoidable branchings due to individual bits in the private key, SPA can prove to be a notorious botheration.
A carefully designed algorithm (like Algorithm 7.2) does not reveal key information from a simple observation of power traces. Moreover, the observed power traces may be corrupted by noise to an extent where SPA is not feasible. In such cases, differential power analysis (DPA) often helps the cryptanalyst reduce the effects of noise and exploit subtle correlation of power consumption patterns with specific bits in the operands. DPA requires availability of power traces from several private-key operations with the same key.
Consider the SPA-resistant Algorithm 7.2. Suppose that k power traces P_1(t), . . . , P_k(t) for the computations of x_i^d (mod n), i = 1, . . . , k, are available to Carol, that the ciphertexts x_1, . . . , x_k are known to Carol, and that d = (d_{l−1} · · · d_1d_0)_2. Carol successively guesses the bits d_{l−1}, d_{l−2}, d_{l−3}, . . . of the exponent. Suppose that Carol has correctly guessed d_j for j = l−1, . . . , j′+1. She now uses DPA to guess d_{j′}.
Let e := (d_{l−1}d_{l−2} · · · d_{j′+1})_2. At the beginning of the for loop with j = j′, the variable y holds the value x^e modulo n. The loop computes x^{2e} and x^{2e+1} and assigns y the appropriate value. If d_{j′} = 0, then in the next iteration the loop computes x^{4e} and x^{4e+1}, whereas if d_{j′} = 1, then in the next iteration the loop computes x^{4e+2} and x^{4e+3}. It follows that the algorithm handles the value x^{4e} if and only if d_{j′} = 0.
For each i = 1, . . . , k, Carol computes z_i := x_i^{4e} (mod n). Carol then chooses a particular bit position (say, the least significant bit) and considers the bit b_i of z_i at this position. We make the assumption that there is some subsequent step (or substep) in the implementation for which the average power consumption Π_0 for b = 0 is different from the average power consumption Π_1 for b = 1.[2]
[2] The exact step which exhibits differential bias toward an individual bit value is dependent on the implementation. If the implementation does not provide such a step, the attack cannot be mounted in this way. Initially, DPA was proposed for DES, a symmetric encryption algorithm, in which such a dependence is clearly available. With asymmetric-key encryption, such a strong dependence of the power consumed by a step on an individual bit value is not obvious. One may, however, use other dividing criteria, like low versus high Hamming weight (that is, number of one-bits) in the operand, which bear more direct relationships with power consumption.
Carol partitions {1, . . . , k} into two subsets:

I_0 := {i | b_i = 0},
I_1 := {i | b_i = 1}.

Carol computes the average power traces

P̄_0(t) := (1/|I_0|) Σ_{i ∈ I_0} P_i(t) and P̄_1(t) := (1/|I_1|) Σ_{i ∈ I_1} P_i(t),

and subsequently the differential power trace

Δ(t) := P̄_1(t) − P̄_0(t).
First, let d_{j′} = 0. In this case, the routine handles z_i = x_i^{4e} (mod n), and so the power consumption at some time τ is correlated to the bit b_i of x_i^{4e} (mod n). At any other instant, the power consumption is uncorrelated to this particular bit value. Therefore, if the sample size is sufficiently large and if the measurement noise has mean zero, we have:

Δ(τ) ≈ Π_1 − Π_0, and Δ(t) ≈ 0 for t ≠ τ.
On the other hand, if d_{j′} = 1, the value x_i^{4e} (mod n) never appears in the execution of the algorithm, and so at every time t the power consumption is uncorrelated to the particular bit of x_i^{4e} (mod n). We then expect

Δ(t) ≈ 0 for all t.
Figure 7.2 illustrates the two cases.[3] If the differential power trace has a distinct spike, the guess d_{j′} = 0 is correct. So by observing the existence or otherwise of a spike, Carol determines whether d_{j′} = 0 or d_{j′} = 1.
[3] Once again, these are hypothetical traces obtained by random number generators.
Figure 7.2: Hypothetical differential power traces: (a) for the correct guess; (b) for an incorrect guess
The number k of samples required for a good probability of success depends on the bias Π_1 − Π_0 relative to the measurement noise. We assume that |I_0| ≈ |I_1| ≈ k/2. If the noise has a variance of σ², then by the central limit theorem the noise in each average power trace P̄_0(t) or P̄_1(t) has at each t an approximate variance 2σ²/k, and so in the differential power trace Δ(t) the noise has an approximate variance 4σ²/k. In order that the bias Π_1 − Π_0 stands out against the noise, we require |Π_1 − Π_0| to be several times the noise deviation 2σ/√k, say, |Π_1 − Π_0| ≥ 8σ/√k, that is, k ≥ 64σ²/(Π_1 − Π_0)².
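The sample-size estimate can be illustrated with a small Monte-Carlo simulation. The one-sample-per-trace power model, the Gaussian noise, and every parameter below are simplifying assumptions of ours; the point is only that the differential stands out for a correct bit prediction and averages away for an incorrect one:

```python
import random

def simulate_dpa(bias=1.0, sigma=4.0, k=4000, seed=1):
    """Model one power sample per trace: its mean shifts by `bias` when
    the selected bit b_i of the intermediate value is 1, plus Gaussian
    noise of standard deviation `sigma`."""
    rng = random.Random(seed)
    true_bits = [rng.randrange(2) for _ in range(k)]   # bits the device really handled
    power = [b * bias + rng.gauss(0.0, sigma) for b in true_bits]

    def differential(predicted_bits):
        # Partition the traces on the predicted bit and subtract the averages.
        p1 = [p for p, b in zip(power, predicted_bits) if b == 1]
        p0 = [p for p, b in zip(power, predicted_bits) if b == 0]
        return sum(p1) / len(p1) - sum(p0) / len(p0)

    correct = differential(true_bits)                         # correct guess of d_j'
    wrong = differential([rng.randrange(2) for _ in range(k)])  # uncorrelated prediction
    return correct, wrong

correct, wrong = simulate_dpa()
assert correct > 0.5 and abs(wrong) < 0.5   # a spike appears only for the correct partition
```

With bias 1.0, σ = 4 and k = 4000, the noise in the differential has standard deviation about 2σ/√k ≈ 0.13, so the bias is clearly visible, in line with the k ≥ 64σ²/(Π_1 − Π_0)² estimate.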
Several countermeasures can be adopted to prevent DPA, both in the software level and in the hardware level.
Interleaving random delays between instructions destroys the alignment of the time τ in different power traces. Using a clock with randomly varying tick-rate has a similar effect. The delays should be such that they cannot be easily analyzed and subsequently removed. Random delays increase the number of samples required for a successful DPA to an infeasible value.
Suitable implementations of the power-critical steps destroy the power consumption signature of these steps. For example, one may go for an implementation that exhibits a constant power consumption pattern irrespective of the operands. Another possibility is replacement of complex critical instructions by atomic instructions (like assembly instructions) for which the dependence of power consumption on operands is less or difficult to analyze. However, the assumption that one can measure power at any resolution (perhaps at infinite resolution, say, using an analog device) indicates that this countermeasure challenges only the attacker’s budget.
Masking (x, y) by multiplying with (u, v) (as we did to prevent timing attacks) also eliminates chances of mounting a successful DPA. One has to use a fresh mask for each private-key operation. Random unknown masks destroy the correlation of the bit values b_i with power consumption. That is, the chosen bit b_i of x_i^{4e} (mod n) behaves randomly in relation to the same bit of (u_i x_i)^{4e} (mod n), and so the differential power trace no longer leaks the bias Π_1 − Π_0.
Another strategy to foil DPA is to use randomization in the private exponent d. Instead of computing y := x^d (mod n), one chooses a small random integer r (typically of bit size ≤ 20) and computes y := x^{d+rh} (mod n), where h is φ(n) for RSA or the order of the discrete-log (sub)group. Since d = O(h) typically, the performance of the exponentiation routine does not deteriorate much. But random values of r during different private-key operations change the exponent bits in an unpredictable manner.
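A sketch of this exponent randomization for the RSA case, with toy parameters of our own choosing:

```python
import secrets

def randomized_exp(x, d, n, phi):
    """Exponent blinding: compute x^(d + r*phi) mod n for a small random r.
    Since x^phi == 1 (mod n) when gcd(x, n) = 1, the result equals
    x^d mod n, but the exponent's bit pattern changes on every call."""
    r = secrets.randbits(20)           # r of bit size <= 20, as in the text
    return pow(x, d + r * phi, n)

n, phi, d = 3233, 3120, 2753           # toy RSA: n = 61*53, phi = 60*52
for _ in range(5):
    assert randomized_exp(7, d, n, phi) == pow(7, d, n)
```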
Quick changes of the exponent d (that is, of the private key, and hence of the key pair) also prevent the attacker from gathering sufficiently many power traces for mounting a successful DPA. A key-use counter can be employed for this purpose: whenever a given private key has been used a small predetermined number of times, the key pair is updated.
Hardware shielding of the decrypting device also reduces DPA possibilities. For example, in-chip buffers between the external power source and the chip processor have been proposed to mask off the variation of internal power from external measurements. Such hardware countermeasures are, in general, somewhat costlier than software countermeasures.
Paul Kocher asserts: "DPA highlights the need for people who design algorithms, protocols, software, and hardware to work closely together when producing security products."
We finally come to the third genre of side-channel cryptanalysis. We investigate how hardware faults occurring during private-key operations can reveal the secret to an adversary. There are situations where a single fault suffices. Boneh et al. [30] classify hardware faults into three broad categories.
Transient faults These are faults caused by random (unpredictable) hardware malfunctioning. These may be the outcomes of occasional flips of bit values in registers or of temporary erroneous outputs from logic or arithmetic circuits in the processor. These faults are called transient, because they are not repeated. It is rather difficult to detect such (silent) faults.
Latent faults These are faults generated by some permanent malfunctioning and/or bugs inherent in the processor. For example, the floating-point bug in the early releases of the Pentium processor may lead to latent faults. Latent faults are permanent, that is, repeated, but may be difficult to locate in practice.
Induced faults An induced fault is deliberately caused by an adversary. For example, a short surge of electromagnetic radiation may cause a smart card to malfunction temporarily. A malicious adversary can induce such temporary hardware faults to extract secret information from the smart card. It is, however, difficult to induce deliberate faults in a remote workstation.
Although induced faults appear to be the ones to guard against most seriously, the other two types of faults are also of relevance. Consider a certifying authority signing many messages. Transient and/or unknown latent faults may reveal the authority’s private key to a user who can later utilize this knowledge to produce false certificates.
Consider the implementation of the RSA private-key operation based on the CRT combination of the values obtained by exponentiation modulo the prime divisors p and q of the modulus n (Algorithm 5.4). Suppose that m is a message to be signed and s := m^d (mod n) the corresponding signature, where d is the signer's private key. The CRT-based implementation computes s_1 := s (mod p) and s_2 := s (mod q). Assume that due to hardware fault(s) exactly one of s_1 and s_2 is wrongly computed. Say, s_1 is incorrectly computed as s̃_1. The corresponding faulty signature is denoted by s̃. We assume that the CRT combination of s̃_1 and s_2 is correctly computed.

An adversary requires the faulty signature s̃ and the correct signature s on the same message m in order to obtain the factor q of n. To see how, note that s̃ ≡ s̃_1 (mod p), s ≡ s_1 (mod p) and s̃_1 ≢ s_1 (mod p), so that s̃ ≢ s (mod p), that is, p does not divide s̃ − s. On the other hand, s̃ ≡ s_2 ≡ s (mod q), that is, q divides s̃ − s. Therefore,

q = gcd(s̃ − s, n).

This is how the fault analysis of Boneh et al. [30] works.
Arjen K. Lenstra et al. [142] point out that the knowledge of the faulty signature s̃ alone reveals the secret divisor q, that is, one does not require the genuine signature s on m. The verification key e of the signer is publicly known. Since RSA exponentiation is bijective, s̃^e ≢ m (mod n). However, s̃^e ≡ m (mod q), and so s̃^e ≢ m (mod p). It follows that

q = gcd((s̃^e − m) mod n, n).
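Both gcd computations can be checked on toy numbers. The sketch below injects the fault by hand; the Garner-style CRT combination, the toy key and all names are our own illustrative choices:

```python
from math import gcd

p, q = 61, 53                       # toy RSA primes
n = p * q
e, d = 17, 2753                     # e*d == 1 (mod phi(n)), phi(n) = 60*52
m = 1234

def crt_combine(s1, s2):
    """Garner-style CRT: the unique s mod n with s = s1 (mod p), s = s2 (mod q)."""
    return (s1 + p * (((s2 - s1) * pow(p, -1, q)) % q)) % n

s1 = pow(m % p, d, p)               # half-exponentiation mod p
s2 = pow(m % q, d, q)               # half-exponentiation mod q
s = crt_combine(s1, s2)             # genuine signature
assert pow(s, e, n) == m            # it verifies

s_faulty = crt_combine(s1 ^ 1, s2)  # fault: one flipped bit in the mod-p half

# Boneh et al.: with both signatures, q = gcd(s~ - s, n).
assert gcd(s_faulty - s, n) == q

# Lenstra: the faulty signature alone suffices: q = gcd(s~^e - m, n).
assert gcd(pow(s_faulty, e, n) - m, n) == q
```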
Now, consider an implementation of RSA decryption based on a single exponentiation modulo n. For such an implementation, several models of fault attacks have been proposed. These attacks are less practical than the attack on CRT-based RSA just mentioned, because now one requires several faulty signatures in order to deduce the entire private key. Here, we present an attack due to Bao et al. [17].
As usual, the RSA modulus is n = pq and the signer's key pair is (e, d). Consider a valid signature s on a message m. Let d = (d_{l−1} · · · d_1d_0)_2 be the binary representation of the private key. Consider the powers:

s_i ≡ m^{2^i} (mod n) for i = 0, 1, . . . , l − 1.

The signature s can be written as:

s ≡ ∏_{i : d_i = 1} s_i (mod n).
We assume that the attacker knows m and s and hence can compute s_i and s_i^{−1} modulo n for i = 0, . . . , l − 1. There is no harm in assuming that the message m is randomly chosen. (We may assume that randomly chosen integers are invertible modulo n, because encountering a non-invertible non-zero integer by chance is a stroke of unimaginable good luck and is tantamount to knowing the factors of n.)
In order to guess a bit of d, the attacker induces a fault in exactly one of the bits d_j, changing it from d_j to 1 – d_j. The position j is random, that is, not under the control of the attacker. Now, the algorithm outputs the faulty signature

ŝ ≡ m^{d̂} (mod n), where d̂ = d + (1 – 2d_j)2^j is d with its j-th bit flipped,

and so

ŝ s^{–1} ≡ s_j^{1 – 2d_j} (mod n), that is, ŝ s^{–1} ≡ s_j (mod n) if d_j = 0, and ŝ s^{–1} ≡ s_j^{–1} (mod n) if d_j = 1.

A repetition in the values s_{l–1}, . . . , s_0, s_{l–1}^{–1}, . . . , s_0^{–1} modulo n is again an incident of minuscule probability. Hence the attacker can uniquely identify the bit position j and the bit value d_j in d by comparing ŝ s^{–1} with these 2l values.
Statistical analysis implies that the attacker needs to repeat this procedure about l log l times (on the same or on different (m, s) pairs) in order to ensure that the probability of identifying all the bits of d is at least 1/2.
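The single-fault step of this attack can be simulated as follows; the primes and the fault position j are hypothetical, fixed here only to make the run reproducible.

```python
# Sketch of the Bao et al. attack for one induced fault in the exponent d.
p, q = 1000003, 1000033
n = p * q
e = 65537
d = pow(e, -1, (p - 1) * (q - 1))
l = d.bit_length()

m = 987654321
s = pow(m, d, n)                  # valid signature known to the attacker

j = 17                            # fault position (random in a real attack)
s_bad = pow(m, d ^ (1 << j), n)   # signature under d with bit j flipped

# Attacker: compare s_bad * s^(-1) with each s_i and s_i^(-1) modulo n.
ratio = s_bad * pow(s, -1, n) % n
recovered = None
for i in range(l):
    s_i = pow(m, 1 << i, n)
    if ratio == s_i:
        recovered = (i, 0)        # d_i = 0 (the fault set this bit)
    elif ratio == pow(s_i, -1, n):
        recovered = (i, 1)        # d_i = 1 (the fault cleared this bit)

assert recovered == (j, (d >> j) & 1)
print("fault position and bit of d:", recovered)
```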
Recall from Algorithm 5.34 that the Rabin signature algorithm uses CRT to combine s1 (mod p) and s2 (mod q). Thus, the attack on CRT-based RSA, described earlier, is applicable mutatis mutandis to the Rabin signature scheme. The computation of the square roots s1 and s2 demands the major portion of the running time of the routine. Inducing a fault during the execution is, therefore, expected to affect exactly one of s1 and s2, as desired by the attacker.
Bao et al. [17] propose a fault attack on the digital signature algorithm (DSA). We work with the notations of Algorithm 5.43 and Algorithm 5.44, except that, for maintaining uniformity in this section, we use m (instead of M) to denote the message to be signed. The (public) parameters are a prime p, a prime divisor r of p – 1 of length 160 bits, and an element g ∈ Z_p^* of multiplicative order r. The signer’s DSA key pair is (d, g^d (mod p)) with 1 < d < r.
Suppose that during the generation of a DSA signature, an attacker induces a fault in exactly one bit position of d, changing it to d̂. The routine generates the faulty signature (s, t̂), where

s ≡ (g^{d′} (mod p)) (mod r),
t̂ ≡ d′^{–1}(H(m) + d̂s) (mod r),

(d′, g^{d′}) being the session key pair (not mutilated). As in the DSA signature-verification scheme, the attacker computes the following:

w ≡ t̂^{–1} (mod r),  u1 ≡ H(m)w (mod r),  u2 ≡ sw (mod r).

For each i = 0, . . . , l – 1 (where the bit length of d is l), the attacker also computes

v_i^{+} ≡ (g^{u1 + 2^i u2} (g^d)^{u2} (mod p)) (mod r)  and  v_i^{–} ≡ (g^{u1 – 2^i u2} (g^d)^{u2} (mod p)) (mod r).
Assume that the j-th bit d_j of d is altered. If d_j = 0, then d̂ = d + 2^j, and so

v_j^{+} ≡ (g^{(H(m) + ds + 2^j s) t̂^{–1}} (mod p)) (mod r) ≡ (g^{d′} (mod p)) (mod r) = s.

On the other hand, if d_j = 1, then d̂ = d – 2^j, and a similar calculation shows that

v_j^{–} = s.

Thus, the attacker computes v_j^{+} and v_j^{–} for all j = 0, . . . , l – 1 and notices a unique match (with s). This discloses the position j and the corresponding bit d_j.
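A toy simulation of this matching procedure is sketched below. The parameters and the stand-in hash are illustrative (real DSA uses a 160-bit r and a cryptographic hash). The true bit always appears among the matches; with realistic parameters it is the unique one.

```python
# Toy simulation of the Bao et al. fault attack on DSA.
p, r, g = 10007, 5003, 4        # g = 2^2 has multiplicative order r modulo p
H = lambda m: m % r             # stand-in hash function (illustrative)

d = 2025                        # signer's private key, 1 < d < r
y = pow(g, d, p)                # public key g^d mod p
l = d.bit_length()

m, d1 = 31415, 271              # message and (unfaulted) session key d'
j = 6                           # bit of d hit by the induced fault
d_hat = d ^ (1 << j)

s = pow(g, d1, p) % r
t_hat = pow(d1, -1, r) * (H(m) + d_hat * s) % r   # faulty signature part

# Attacker's computation, mimicking DSA verification:
w = pow(t_hat, -1, r)
u1, u2 = H(m) * w % r, s * w % r
base = pow(g, u1, p) * pow(y, u2, p) % p
matches = []
for i in range(l):
    for sign, bit in ((1, 0), (-1, 1)):   # +2^i tests d_i = 0, -2^i tests d_i = 1
        v = base * pow(g, sign * (1 << i) * u2 % r, p) % p % r
        if v == s:
            matches.append((i, bit))

assert (j, (d >> j) & 1) in matches        # the true bit always matches
print("candidate (position, bit) pairs:", matches)
```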
A fault attack similar to that on the DSA scheme can be mounted on the ElGamal signature scheme. Here, we describe an alternative method proposed by Zheng and Matsumoto [315]. The novelty in their approach is that it cryptanalyzes the ElGamal signature scheme by inducing faults in the pseudorandom bit generator of the signer’s smart card.
Algorithms 5.36 and 5.37 describe the ElGamal signature scheme over a general cyclic group G. Here, we restrict our attention to the specific group Z_p^* (though the following exposition works perfectly well for a general G). The parameters are a prime modulus p and a generator g of Z_p^*. The signer’s key pair is (d, g^d (mod p)) for some d, 2 ≤ d ≤ p – 2.
In order to generate a signature (s, t) on a message m, a random session key d′ is generated and subsequently the following computations are carried out:

s ≡ g^{d′} (mod p),
t ≡ d′^{–1}(H(m) – dH(s)) (mod p – 1).
Zheng and Matsumoto attack the generation of the session key d′. They propose the possibility that an abnormal physical stress (like low voltage) forces a constant output d0 for d′ from the pseudorandom bit generator (software or hardware) in the smart card. First, assume that this particular value d0 is known a priori to the attacker. She then lets a message m generate a signature (s, t) with the session secret d0. The private key d is then immediately available from the equation:
d ≡ H(s)^{–1}(H(m) – d0t) (mod p – 1).
Here, we assume that H(s) is invertible modulo p – 1.
If d0 is not known a priori, the attacker generates two signatures (s1, t1) and (s2, t2) on messages m1 and m2 respectively. Since d′ is always d0, we have s1 = s2 = s0, say. One can then easily calculate
d0 ≡ (t1 – t2)^{–1}(H(m1) – H(m2)) (mod p – 1),
which, in turn, yields
d ≡ H(s0)^{–1}(H(m1) – d0t1) (mod p – 1).
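The two formulas can be exercised on a tiny group. Everything here (the prime, generator, keys and the stand-in hash) is made up purely for illustration.

```python
# Toy demonstration of the Zheng-Matsumoto attack: the card's pseudorandom
# generator is stuck at the constant session key d0, and two ElGamal
# signatures leak the permanent private key d.
p, g = 23, 5                   # tiny prime field; g generates Z_23^*
H = lambda m: m % (p - 1)      # stand-in hash function

d = 7                          # Alice's permanent private key
d0 = 13                        # constant session key forced by the fault

def sign(m):
    s = pow(g, d0, p)
    t = pow(d0, -1, p - 1) * (H(m) - d * H(s)) % (p - 1)
    return s, t

(s1, t1), (s2, t2) = sign(8), sign(3)
assert s1 == s2                # constant d0 makes the first parts collide
s0 = s1

# Attacker's computation (all quantities used here are public):
d0_rec = pow(t1 - t2, -1, p - 1) * (H(8) - H(3)) % (p - 1)
d_rec = pow(H(s0), -1, p - 1) * (H(8) - d0_rec * t1) % (p - 1)
assert (d0_rec, d_rec) == (d0, d)
print("recovered session and private keys:", d0_rec, d_rec)
```

The parameters were chosen so that H(s0) and t1 – t2 are invertible modulo p – 1, as the derivation above requires.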
Let us conclude our repertoire of fault attack examples by explaining an attack on the FFS zero-knowledge identification protocol. This attack is again from Boneh et al. [30].
We use the notations of Algorithm 5.69. A modulus n = pq, with primes p, q ≡ 3 (mod 4), is first chosen (by Alice or by a trusted third party). Alice selects random x1, . . . , xt ∈ Z_n^* and random bits δ1, . . . , δt, computes y_i ≡ (–1)^{δ_i} x_i^{–2} (mod n) for i = 1, . . . , t, publishes (y1, . . . , yt) and keeps (x1, . . . , xt) secret.
During an identification session with Bob, Alice generates a random commitment c ∈ Z_n^* and sends to Bob the witness w := c^2 (mod n). (For simplicity, we take γ of Algorithm 5.69 to be 0.) While Alice is waiting for a challenge from Bob, a fault occurs in her smart card, changing the commitment c to c + E. Assume that the fault is at exactly one bit position, that is, E = ±2^j for some j, 0 ≤ j < l, l being the bit length of c (or of n). This fault may be purposely induced by Bob with the malicious intention of guessing Alice’s secret (x1, . . . , xt).
Bob then generates a random challenge (∊1, . . . , ∊t) ∈ {0, 1}^t as usual. Upon reception of this challenge, Alice computes and sends to Bob the faulty response

r̂ ≡ (c + E) x1^{∊1} x2^{∊2} · · · xt^{∊t} (mod n).

The knowledge of r̂ now aids Bob in obtaining the product T ≡ x1^{∊1} x2^{∊2} · · · xt^{∊t} (mod n) as follows. First, note that

r̂^2 ≡ (c + E)^2 x1^{2∊1} x2^{2∊2} · · · xt^{2∊t} (mod n),

so that

r̂^2 y1^{∊1} y2^{∊2} · · · yt^{∊t} ≡ (–1)^δ (c + E)^2 (mod n)  for some δ ∈ {0, 1}.
There are only 4l possible values of (E, δ). Bob tries all these possibilities one by one. To simplify matters, we assume that only one value of (E, δ) with E of the special form ±2^j and with δ ∈ {0, 1} satisfies the last congruence. In practice, the existence of two (or more) solutions for (E, δ) is an extremely improbable phenomenon. For a guess of (E, δ), the commitment c can be computed as

c ≡ (2E)^{–1}((–1)^δ r̂^2 y1^{∊1} y2^{∊2} · · · yt^{∊t} – w – E^2) (mod n).

The correctness of the guess (E, δ) can be verified from the relation w ≡ c^2 (mod n). Bob can now compute the desired product

T ≡ r̂ (c + E)^{–1} (mod n).
In order to strengthen the confidence about the correctness of T, Bob may repeat the protocol once more with the same values of ∊1, . . . , ∊t, but under normal conditions (that is, without faults). This time he obtains w′ ≡ (c′)^2 (mod n) and r′ ≡ c′T (mod n), which together give (r′)^2 ≡ w′T^2 (mod n), a relation that proves the correctness of T.
Bob repeats the above procedure t times in order to generate the system:

Equation 7.6

T_k ≡ x1^{∊k1} x2^{∊k2} · · · xt^{∊kt} (mod n),  k = 1, . . . , t.

Here, ∊ki and T_k are known to Bob. Moreover, the exponents ∊ki can be so selected that the matrix (∊ki) is invertible modulo 2. In order to determine x1, Bob tries to find u1, . . . , ut ∈ {0, 1} satisfying

T_1^{u1} T_2^{u2} · · · T_t^{ut} ≡ x1 (x1^{2v1} x2^{2v2} · · · xt^{2vt}) (mod n)
for some integers v1, . . . , vt. Comparing the exponents gives the linear system

u1∊1i + u2∊2i + · · · + ut∊ti ≡ δ1i (mod 2),  i = 1, . . . , t,

which can be solved for u1, . . . , ut, since the matrix (∊ki) is invertible modulo 2. The solution gives v1, . . . , vt and hence

x1 ≡ ± T_1^{u1} T_2^{u2} · · · T_t^{ut} y1^{v1} y2^{v2} · · · yt^{vt} (mod n).
Similarly, x2, . . . , xt can be determined up to sign. Plugging in these values of xi in System (7.6) and solving another linear system modulo 2 gives the exact signs of all xi.
Notice that Bob could have selected ∊ki = δki (where δki is the Kronecker delta). For this choice, System (7.6) immediately gives x1, . . . , xt. But, in practice, Alice may refuse to respond to such simplistic challenges. Moreover, Bob must not raise any suspicion about a possible malpractice. For a general choice, all Bob has to do additionally is a small amount of simple linear algebra. The parameter t is rather small (typically less than 20), so this extra effort is of little concern to Bob.
Fault analysis could be a serious threat, especially to smart-card users and certification authorities. We mention here some precautions to guard against such attacks. Some of these work for a general kind of fault attack, the others are specific to the algorithms they plan to protect.
One obvious general strategy is to perform the private-key operation twice and compare the results from the two executions. If the two results disagree, a fault must have taken place. It is then necessary to restart the computation from the beginning. This strategy slows down the implementation by a factor of two. Moreover, latent (permanent) faults cannot be detected by this method—the same error creeps in during every run.
It is sometimes easier to verify the correctness of the output by performing the reverse operation. For instance, after an RSA signature s ≡ m^d (mod n) is generated, one can check whether m ≡ s^e (mod n). If so, one can be reasonably confident about the correctness of s. If the RSA encryption exponent e is small (like 3 or 257), this verification is quite efficient.
Ad hoc algorithm-specific tricks often offer effective and efficient checks for errors. Shamir [268] proposes the following check for CRT-based RSA signature generation. One chooses a small random prime r (say, of length ~ 32 bits) and computes s1 ≡ m^d (mod pr) and s2 ≡ m^d (mod qr). If s1 ≢ s2 (mod r), then one or both of the exponentiations went wrong. If, on the other hand, s1 ≡ s2 (mod r), then s1 (mod p) and s2 (mod q) are combined by the CRT.
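Shamir’s check can be sketched as below; the primes and the simulated fault are hypothetical, and r here plays the role of the small auxiliary prime.

```python
# Sketch of Shamir's countermeasure for CRT-RSA: a small auxiliary prime r
# catches a fault in either exponentiation before the CRT combination.
p, q, r = 1000003, 1000033, 2147483647   # toy primes; r is the check prime
n = p * q
e = 65537
d = pow(e, -1, (p - 1) * (q - 1))
m = 55555

def faulty_pow(base, exp, mod, fault):
    v = pow(base, exp, mod)
    return v ^ 4 if fault else v          # optionally flip a bit in the result

def guarded_sign(m, fault=False):
    s1 = faulty_pow(m, d, p * r, fault)   # m^d mod pr  (possibly faulty)
    s2 = pow(m, d, q * r)                 # m^d mod qr
    if s1 % r != s2 % r:
        return None                       # fault detected: refuse to output
    a, b = s1 % p, s2 % q                 # CRT-combine s1 mod p and s2 mod q
    h = (b - a) * pow(p, -1, q) % q
    return a + p * h

assert guarded_sign(m) == pow(m, d, n)    # fault-free run gives the signature
assert guarded_sign(m, fault=True) is None
print("fault detected and signature withheld")
```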
Maintaining extraneous error-checking data can guard against random bit flips. Parity-check bits can detect the existence of single bit flips. Retaining a verbatim copy of the secret information d and comparing the two copies at strategic instants can help detect more serious faults. It appears unlikely that both copies can be affected by faults in exactly the same way. For discrete-log-based systems, maintaining d^{–1} in tandem with d appears to be a sound approach. Since the bits of d^{–1} are not in direct relationship with those of d, an attack on d cannot easily produce the matching changes in d^{–1}. As an example, consider the attack on DSA effected by toggling a bit of the secret key d. The second part of the signature can be generated in two ways: by computing t1 ≡ d′^{–1}(H(m) + ds) (mod r) using d, and by computing t2 ≡ d′^{–1}(d^{–1})^{–1}(d^{–1}H(m) + s) (mod r) using d^{–1}. If t1 ≡ t2 (mod r), we can be pretty confident that this common value is the correct signature.
Appending random strings to the messages being signed also helps prevent fault attacks. Such random strings are not known to the adversary and cannot be easily recovered from a faulty signature by the verification routine. Also, in this case the signer signs different strings on different occasions, even when the message remains the same.
Hardware countermeasures can also be adopted. Adequately shielded cards resist induced faults. In the situation described by Zheng and Matsumoto, the card should refuse to work instead of generating constant random bits. In the scenario of fault analysis, however, it appears that robustness can be implanted more easily at the software level. At any rate, sloppy hardware designs are never advocated.
| 7.1 | Consider the notations of Section 7.2.1. Assume that mi,j is constant for all i, j (and irrespective of d2j+1d2j), but the square times si,j and ti,j vary according to their operands. Devise a timing attack on such a system. |
| 7.2 | Show that under reasonable assumptions the SPA-resistant Algorithm 7.2 can be cryptanalyzed by timing attacks. |
| 7.3 | Recall that SPA of Algorithm 7.1 may leak partial information on the private key (some 00 sequences in the key). Rewrite the algorithm to prevent this leakage. |
| 7.4 | Assume that in Bao et al.’s attack on RSA described in the text, the attacker can induce faults in exactly two bit positions of d. Suggest how the two bits of d at these positions can be revealed from the resulting faulty signature. |
| 7.5 | Consider a variant of Bao et al.’s attack on RSA described in the text, in which the valid signature s on m is unknown to the attacker. Explain how the position j of the erroneous bit and the bit dj at this position can still be identified. [H] |
| 7.6 | Bao et al. [17] propose an alternative fault analysis on RSA with square-and-multiply exponentiation. Use the notations (n, e, d, m, s, si) as in the text. Assume that the attacker knows an (m, s) pair and can induce a fault in exactly one of the values sj (and nowhere else) and generate the corresponding faulty signature. Suggest a strategy by which the position j and the bit dj can be recovered in this case. |
| 7.7 | Propose a fault attack on the ElGamal signature scheme (Algorithms 5.36 and 5.37), similar to the attack on DSA described in the text. |
Backdoor attacks on a public-key cryptosystem refer to attacks embedded in the key generation procedure (hardware or software) by the designer of the procedure. A contaminated cryptosystem is one in which the key generation procedure comes with hidden backdoors. A good backdoor attack should meet the following criteria:
To a user, keys generated by the contaminated system should be indistinguishable from those generated by an honest version of the cryptosystem. For example, the parameters and keys must look sufficiently random.
Keys generated by the contaminated system should satisfy the input/output requirements of an honest system. For example, for the RSA cryptosystem the user should be allowed to opt for small public exponents.
A contaminated key-generation procedure should not run (on average) much slower than the honest procedure.
The designer (and nobody else) should have the exclusive capability of determining the secret information from a contaminated published public key.
A user (other than the designer), detecting or suspecting information leakage from a contaminated system, may reverse-engineer the binaries or the smart card to identify the contaminated key generation procedure. The user may even be given the source code of the contaminated routine. Still the user should not be able to steal keys from other users of the same contaminated system. In this sense, a good backdoor protects the designer universally.
A stronger requirement is that reverse-engineering (or source code) should also not allow a user to distinguish (in poly-time) between keys generated by the contaminated procedure and those generated by a genuine procedure. It is exclusively the designer who should possess the capability to make such distinctions in poly-time.
Young and Yung [307] have proposed using public-key cryptography itself for generating backdoors. In their schemes, the attacker (the designer) embeds the encryption routine and the encryption key of the attacker in the key generation procedure of the contaminated system. The decryption key of the attacker is not embedded in the contaminated system and is known only to the attacker. The attacker’s encryption system is assumed to be honest and unbreakable and, thereby, it gives the attacker the exclusive power to decrypt contaminated keys. Young and Yung call such a backdoor a secretly embedded trapdoor with universal protection (SETUP). They also coined the term kleptography to denote such use of cryptography against cryptography.
In the rest of this section, we denote the attacker’s encryption and decryption functions by fe and fd respectively. We often do not restrict these functions to public-key routines only. Since public-key routines are slow, symmetric-key routines can be employed in practice. Simple XOR-ing with a fixed bit string (known to the designer) may also suffice. However, for these faster alternatives of fe, fd, reverse engineering reveals the symmetric key or the XOR operand to the user who can subsequently mimic the attacker to steal keys generated elsewhere by the same contaminated system.
We use the following shorthand notations. Here, n stands for a positive integer that can be naturally identified with a unique bit string having the most significant (that is, leftmost) bit equal to 1.
| |n| | = | the bit length of n. |
| lsbk(n) | = | the least significant k bits of n. |
| msbk(n) | = | the most significant k bits of n. |
| (a1 ‖ a2 ‖ · · · ‖ ar) | = | the concatenation of the bit strings a1, a2, . . . , ar. |
RSA, (seemingly) being the most popular public-key cryptosystem, has been the target of most cryptanalytic attacks. Backdoor attacks are not an exception. The backdoor attacks on RSA work by cleverly hiding some secret information in the public key (n, e) of a user. As earlier, we denote the corresponding private exponent by d and the prime factors of n by p and q.
The simplest attack is to choose a fixed p known to the designer. The other prime q is generated randomly, and correspondingly n = pq and the key pairs (e, d) are computed. Reverse engineering such a scheme is pretty simple, since two different moduli n1 = pq1 and n2 = pq2 belch out p = gcd(n1, n2) easily.
A better approach is given in Algorithm 7.3. The function fe may be RSA encryption under the designer’s public key. In that case, the RSA modulus of the attacker should be so chosen that the condition e < n is satisfied with good probability. On the other hand, if this modulus is too small, then this scheme will generate values of e much smaller than n.
In order to determine the secret exponent from a public key generated using this scheme, the attacker runs Algorithm 7.4. If fe and fd are RSA functions under the attacker’s keys, nobody other than the attacker can apply fd to generate p from e. This provides the designer with the exclusive capability of stealing keys.
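A minimal sketch of this hide-p-in-e backdoor follows. XOR with a fixed secret pad stands in for the designer’s encryption fe (as noted earlier, Young and Yung would use RSA here, which a shared XOR pad only mimics); all numbers are toy-sized.

```python
# Sketch of the Algorithm 7.3 idea with XOR by a fixed secret pad standing
# in for the designer's encryption f_e.
from math import gcd

PAD = 0x6BCD2E90                 # designer's secret pad (even, so e stays odd)
fe = lambda v: v ^ PAD           # stand-in encryption f_e
fd = fe                          # XOR is its own inverse (f_d)

# Contaminated key generation: the prime p is hidden inside e.
p = 1000003
e = fe(p)
for q in (1000033, 1000037, 1000039, 1000081, 1000099):  # toy prime candidates
    phi = (p - 1) * (q - 1)
    if gcd(e, phi) == 1:
        break
n = p * q
d = pow(e, -1, phi)
assert pow(pow(42, e, n), d, n) == 42   # (e, d) works as an RSA key pair

# Designer's key theft from the public key (n, e) alone:
p_rec = fd(e)
assert p_rec == p and n % p_rec == 0
print("designer recovers p =", p_rec)
```

With a genuine public-key fe, reverse engineering the routine would reveal only the designer’s encryption key, not fd, which is what gives the designer the exclusive theft capability.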
A problem with Algorithm 7.3 is that the attacker has little control over the length of the public exponent e. If the user demands a small exponent (like e = 3 or e = 257), this scheme fails to produce one. Algorithm 7.5 overcomes this difficulty by hiding p in the high-order bits of the modulus n (instead of in the exponent e). Young and Yung [307] proposed this algorithm under the name PAP (pretty awful privacy). The name contrasts with PGP (pretty good privacy), a popular and widely used RSA implementation.
Algorithm 7.3
Input: —
Output: An RSA modulus n = pq with |p| = |q| = k, and exponents (e, d).
Steps: Generate a random k-bit prime q.
Algorithm 7.4
Input: An RSA public key (n, e).
Output: The corresponding secret (p, q, d) or failure.
Steps: p := fd(e).
Algorithm 7.5 works as follows. Following Young and Yung [307], we assume that the attacker uses RSA to realize fe and fd. The RSA modulus of the attacker is denoted by N. The attack requires |N| = k, where |p| = |q| = k. To start with, a random prime p of the desired bit length k is generated. This prime is to be encrypted using fe, and so one requires p < N. Instead of encrypting p directly, the attacker uses a permutation function π keyed by K + i for some fixed K and for i = 1, 2, . . . , B, where B is a small bound (typically B = 16). This permutation helps the attacker in two ways. First, one may now have p > N, so a suspicion regarding bounded values of p does not arise. Second, it is cheaper to apply the permutation than to generate fresh candidates for p. (In an honest RSA key-generation routine, the prime-generation part typically takes most of the running time.)
Algorithm 7.5
Input: —
Output: An RSA modulus n = pq with |p| = |q| = k, and exponents (e, d).
Steps: while (1) {
Once a suitable p and the corresponding p′ = π_{K+i}(p) are generated, the encryption function fe is applied to generate p″ = fe(p′). Now, instead of embedding p″ directly in the modulus n, another keyed permutation π′ is applied on p″ to generate p‴ = π′_{K+j}(p″). This permutation facilitates investigating several choices for q and so is a faster alternative than restarting the entire process afresh every time an unsuitable q is computed. A pseudorandom bit string a of length k is appended to p‴ to obtain an approximation X for n. If q := ⌊X/p⌋ happens to be a prime of bit length k, the exact n = pq is computed; else another j is tried. If all values of j = 1, 2, . . . , B′ (for some small bound B′) fail, the entire procedure is repeated with a new k-bit prime p.
For random choices of a, the quotients q = ⌊X/p⌋ behave like random integers, and so the probability that q is prime is almost the same as that for a random integer of bit length k. Write X = qp + r with r = X rem p. If r > a, then n = X – r has p‴ – 1 embedded in its higher bits, whereas if r ≤ a, then p‴ itself is embedded in the higher bits of n.
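The embedding of p‴ in the top bits of n, and its recovery, can be traced in a much-simplified form: XOR with a secret pad stands in for the composition of fe with the permutations, and k is unrealistically small here.

```python
# Much-simplified sketch of the PAP idea: the top k bits of n carry p'''
# (or p''' - 1); XOR with a secret pad stands in for pi' o f_e o pi.
import random

def is_probable_prime(m, rounds=40):
    """Miller-Rabin primality test."""
    if m < 2:
        return False
    for sp in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37):
        if m % sp == 0:
            return m == sp
    d, s = m - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    rng = random.Random(0)
    for _ in range(rounds):
        a = rng.randrange(2, m - 1)
        v = pow(a, d, m)
        if v in (1, m - 1):
            continue
        for _ in range(s - 1):
            v = pow(v, 2, m)
            if v == m - 1:
                break
        else:
            return False
    return True

k = 32
PAD = 0x5A5A5A5A                        # designer's secret (stand-in for f_e)
rng = random.Random(1)

# Contaminated key generation.
while True:
    p = rng.randrange(1 << (k - 1), 1 << k) | 1
    if is_probable_prime(p):
        break
p3 = p ^ PAD                            # p''' = "encryption" of p, k bits
while True:
    a = rng.randrange(1 << (k - 1), 1 << k)   # pseudorandom low bits
    X = (p3 << k) | a                   # approximation for n
    q = X // p
    if q.bit_length() == k and is_probable_prime(q):
        break
n = p * q                               # top k bits of n are p''' or p''' - 1

# Designer's recovery from n alone: try U and U + 1 as p'''.
U = n >> k
p_rec = next(g ^ PAD for g in (U, U + 1) if n % (g ^ PAD) == 0)
assert p_rec == p
print("designer recovers p =", p_rec)
```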
Once suitable p and q are found, the PAP routine generates (like PGP) a small encryption exponent e relatively prime to φ(n), and its inverse d modulo φ(n). One can anyway opt for bigger values of e; in that case, instead of choosing e successively from the sequence 17, 19, 21, 23, . . ., one writes one’s own customized steps for generating candidate values of e. Small values of e are chosen in Algorithm 7.5 to indicate the resemblance with PGP and the flexibility of doing so.
The authors of PAP compare their implementation of Algorithm 7.5 with that of the honest PGP key-generation procedure. The contaminated routine has been found to run, on average, only 20 per cent slower than the honest routine.
Algorithm 7.6 recovers the prime factor p of n from a public key (n, e) generated by PAP, using the RSA decryption function fd of the attacker. Reverse engineering may make available to the user the permutation functions π and π′, the fixed constants K, B, B′ and the designer’s public key. But this knowledge alone does not empower the user to steal PAP-generated keys.
Algorithm 7.6
Input: An RSA public key (n, e) with n = pq.
Output: The prime divisor p of n or failure.
Steps: Write n = (U ‖ V) with |V| = k.
Another possible backdoor hides an RSA key pair (∊, δ) with small δ inside a key pair (e, d). Crépeau and Slakmon [70] realize this backdoor using a result from Boneh and Durfee [32], which describes a polynomial-time (in |n|) algorithm for computing δ from the public key (n, ∊), provided that δ is less than n^{0.292}. This attack is explained in Algorithm 7.7. Here, the modulus is a genuine random RSA modulus. The mischievous key ∊ is neatly hidden by the attacker’s encryption routine fe. The resulting output key pair (e, d) looks reasonably random. However, this scheme has a drawback similar to that of Algorithm 7.3: it cannot easily generate small values of e.
Algorithm 7.7
Input: —
Output: An RSA modulus n = pq with |n| = k and a key pair (e, d).
Steps: Generate random primes p, q of bit length ~ k/2, such that n := pq has |n| = k.
Algorithm 7.8 retrieves d from a public key (n, e) generated by Algorithm 7.7.
Algorithm 7.8
Input: An RSA public key (n, e) generated by Algorithm 7.7.
Output: The corresponding private key d.
Steps: ∊ := fd(e). /* Recover the hidden exponent */
The correctness of Algorithm 7.8 is evident. In order to see how the knowledge of ∊ and δ reveals φ(n), note that x := ∊δ – 1 is a multiple of φ(n); that is,

Equation 7.7

x = ∊δ – 1 = lφ(n)

for some integer l. Since δ < n^{0.292} and ∊ < n, we have x < n^{1.292}. But φ(n) ≈ n, and so l cannot be much larger than n^{0.292}. Since |p| ≈ k/2 ≈ |q|, we have l(p + q – 1) < n. Now, if we write

x = an + b = (a + 1)n – (n – b)

with a = x quot n and b = x rem n, comparison with Equation (7.7) reveals that l = a + 1. This gives φ(n) = x/l.
Although not needed explicitly here, the factorization of n can be easily obtained by solving the equations pq = n and p + q = n – φ(n) + 1. If ∊ and δ are not small, we may have l(p + q – 1) ≥ n, and φ(n) cannot be calculated so easily as above. A randomized polynomial-time algorithm can still factor n from the knowledge of ∊, δ and n. For the details, solve Exercise 7.9.
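This recovery of φ(n), and the subsequent factoring of n, can be traced on toy numbers; δ = 5 below is an artificially small hidden exponent.

```python
# Sketch of how phi(n) and the factors of n follow from a hidden key pair
# (eps, delta) with small delta, as in Equation (7.7); toy primes.
from math import isqrt

p0, q0 = 1000003, 1000033
n = p0 * q0
phi = (p0 - 1) * (q0 - 1)
delta = 5                        # small hidden decryption exponent (< n^0.292)
eps = pow(delta, -1, phi)        # matching encryption exponent

# Computation from (n, eps, delta) only:
x = eps * delta - 1              # x = l * phi(n) for some small integer l
l = x // n + 1                   # l = a + 1 with a = x quot n
assert x % l == 0
phi_rec = x // l

# Factor n: p + q = n - phi(n) + 1, then solve the quadratic z^2 - (p+q)z + n.
sm = n - phi_rec + 1
df = isqrt(sm * sm - 4 * n)      # |p - q|
p_rec, q_rec = (sm - df) // 2, (sm + df) // 2
assert phi_rec == phi and p_rec * q_rec == n
print("factors:", p_rec, q_rec)
```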
Crépeau and Slakmon propose another backdoor attack based on the following result due to Boneh et al. [33]. Let (∊, δ) be a key pair for an RSA modulus n = pq, and let t be defined by 2^{t–1} ≤ ∊ < 2^t. There exists a polynomial-time algorithm that, given n, ∊, and the t most significant and |n|/4 least significant bits of δ, recovers the full private exponent δ.
Algorithm 7.9
Input: —
Output: An RSA modulus n = pq with |n| = k and a key pair (e, d).
Steps: Generate random primes p, q of bit length ~ k/2, such that n := pq has |n| = k.
Algorithm 7.9 uses fe to hide in e a small ∊, t most significant bits of δ and |n|/4 least significant bits of δ. A string of bit length 2t + k/4 is encrypted by fe. Applying the decryption routine fd on e recovers these hidden values, from which ∊ and δ and hence φ(n) can be obtained. Algorithm 7.10 does this task. This scheme also fails, in general, to produce small public exponents e.
Algorithm 7.10
Input: An RSA public key (n, e) generated by Algorithm 7.9 and the matching
Output: The corresponding private key d.
Steps: Compute fd(e) and retrieve the following:
We now describe a backdoor attack on the ElGamal signature scheme (Algorithm 5.36). This attack does not tamper with the generation of the user’s permanent key pair. Instead, it manipulates the session-key generation in such a way that the user’s permanent private key is revealed to the attacker from two successive signatures.
Let p be a prime, g a generator of Z_p^*, and (d, g^d (mod p)) the permanent key pair of Alice. The attacker uses the same field and a key pair (D, g^D (mod p)), with g^D supplied to the signing device. Suppose that Alice signs two messages m1 and m2 to generate signatures (s1, t1) and (s2, t2) respectively, where

s1 ≡ g^{d1} (mod p),  t1 ≡ d1^{–1}(H(m1) – dH(s1)) (mod p – 1),
s2 ≡ g^{d2} (mod p),  t2 ≡ d2^{–1}(H(m2) – dH(s2)) (mod p – 1),

d1 and d2 being the respective session keys.
The attack proceeds by letting d1 be arbitrary, but by taking

d2 ≡ (g^D)^{d1} (mod p).

Since s1 ≡ g^{d1} (mod p), we have

d2 ≡ s1^D (mod p),

that is, the attacker can compute the second session key d2 from the publicly visible s1 using her private key D. The permanent key of Alice then follows from the second signature:

d ≡ H(s2)^{–1}(H(m2) – d2t2) (mod p – 1).
The private key D of the attacker (or d1) is required for computing d; so nobody other than the designer can retrieve Alice’s secret by observing the contaminated signatures (s1, t1) and (s2, t2).
For ElGamal encryption (Algorithm 5.15) and for Diffie–Hellman key exchange (Algorithm 5.27) over Z_p^*, a party (Alice) generates random session key pairs of the form (d′, g^{d′} (mod p)) and communicates the public session key g^{d′} to another party. The following backdoor manipulates the session-key generation in such a way that two public session keys reveal the second private session key (but not the permanent private key). We assume that the attacker learns the public session keys by eavesdropping. The attacker’s key pair is (D, g^D (mod p)). The contaminated routine contains the public key g^D (mod p), but not the private key D.
Let (d1, r1) and (d2, r2) be two session key pairs used by Alice, where

r1 ≡ g^{d1} (mod p),
r2 ≡ g^{d2} (mod p).

The contaminated routine that generates the session keys uses a fixed odd integer u, a hash function H and a random bit b ∈ {0, 1} to generate d2 from d1 as follows:

z ≡ g^{d1 + ub} (g^D)^{d1} (mod p),
d2 ≡ H(z) (mod p – 1).
The attacker knows r1 and r2 by eavesdropping. She computes d2 by Algorithm 7.11, the correctness of which is established from the congruence z ≡ r1 g^{ub} r1^D (mod p).
Algorithm 7.11
Input: The public session keys r1, r2 and the attacker’s private key D.
Output: Alice’s second private session key d2.
Steps: For b = 0, 1, compute z ≡ r1 g^{ub} r1^D (mod p) and d ≡ H(z) (mod p – 1); if g^d ≡ r2 (mod p), output d2 = d.
Algorithm 7.11 requires the attacker’s private key D (or d1) and can be performed only by the attacker. Now, d2 can be analogously used to generate the third session key d3 and so on, that is, the attacker can steal all the private session keys (except the first).
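The contaminated generator and the recovery of d2 can be sketched as follows; the group parameters, the odd bias u and the hash H are illustrative stand-ins.

```python
# Sketch of the contaminated session-key generator and the Algorithm 7.11
# style recovery; parameters, u and H are illustrative stand-ins.
import hashlib
import random

p, g = 10007, 5                  # toy prime and base
u = 11                           # fixed odd bias
D = 123                          # attacker's private key
gD = pow(g, D, p)                # the only attacker datum inside the routine

def H(z):
    h = hashlib.sha256(str(z).encode()).digest()
    return int.from_bytes(h, "big") % (p - 1)

rng = random.Random(7)
d1 = rng.randrange(2, p - 1)     # first session key (honestly random)
b = rng.randrange(2)             # random bit
z = pow(g, d1 + u * b, p) * pow(gD, d1, p) % p
d2 = H(z)                        # second session key, derived from the first

r1, r2 = pow(g, d1, p), pow(g, d2, p)   # public session keys (eavesdropped)

# Attacker (knows D): z = r1 * g^(ub) * r1^D (mod p); try both bits b.
candidates = []
for bg in (0, 1):
    zg = r1 * pow(g, u * bg, p) % p * pow(r1, D, p) % p
    if pow(g, H(zg), p) == r2:
        candidates.append(H(zg))
assert d2 in candidates
print("stolen second session key candidates:", candidates)
```

As the text notes, the recovered d2 can then seed the recovery of d3, and so on.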
The odd integer u is used for additional safety. In order to see what might happen without it (that is, with b = 0 always), assume that H can be inverted. This gives z and hence y ≡ z r1^{–1} ≡ r1^D ≡ g^{d1 D} (mod p). If D is even, y is always a quadratic residue modulo p. If D is odd, y is a quadratic residue or non-residue modulo p depending on whether d1 is even or odd. The randomly added odd bias u destroys this correlation of z with quadratic residues.
Using trustworthy implementations (hardware or software) of cryptographic routines (in particular, key generation routines) eliminates or reduces the risk of backdoor attacks. Preferences should be given to software applications with source codes (rather than to the more capable ones without source codes). Random number generators should be given specific attention. Cascading products from different independent sources also minimizes the possibility of hidden backdoors.
If the desired grain of trust is missing from the available products, the only safe alternative is to write the code oneself. Placing complete trust in cryptographic devices and packages, and using them as black boxes without bothering about the internals, is often called black-box cryptography. Users should learn to question black-box cryptography. The motto is: Be aware or bring peril.
| 7.8 | Argue that reverse engineering the PAP routine (Algorithm 7.5) can enable a user to distinguish in polynomial time between key pairs generated by PAP and those generated by honest procedures. |
| 7.9 | Let n = pq be an RSA modulus and (e, d) a key pair under this modulus. Write ed – 1 = 2^s t, where s = v2(ed – 1) (so that t is odd). Since ed – 1 is a multiple of φ(n) = (p – 1)(q – 1) with odd primes p, q, we have s ≥ 2. |
In this chapter, we discuss some indirect ways of attacking public-key cryptosystems. These attacks do not attempt to solve the underlying intractable problems, but watch the decryption device and/or use malicious key generation routines in order to gain information about private keys.
The timing attack is based on the availability of the total times of several private-key operations under the same private key. It guesses the bits of the private key one after another by performing some variance calculations.
The power attack requires the availability of the power consumption patterns (also called power traces) of the decrypting (or signing) device during one or more private-key operations. If the measurements are done with good accuracy and resolution, a single power trace may reveal the private key to the attacker; this is called simple power analysis. In practice, however, such power measurements are often contaminated with noise. Differential power analysis requires power traces from several decryption operations under the same private key. The different traces are combined using a technique that reduces the effect of noise.
A fault attack can be mounted by injecting one or more faults in the device performing private-key operations. Fault attacks are discussed in connection with several encryption (RSA), signature (ElGamal, DSA and so on) and authentication (FFS) schemes.
The above three kinds of attacks are collectively called side-channel attacks. Several general and algorithm-specific countermeasures against side-channel attacks are discussed.
Backdoor attacks, on the other hand, are mounted by malicious key generation routines. Young and Yung propose the concept of secretly embedded trapdoor with universal protection (SETUP). In a SETUP-contaminated system, the designer of the key generation routine possesses the exclusive right to steal keys from users. Several examples of backdoor attacks on RSA and ElGamal cryptosystems are described.
Kocher introduces the concept of side-channel attacks in his seminal paper [155]. This paper describes further details about the timing attack (like a derivation of the choice of the sample size k) and some experimental results.
Timing attacks in various forms are applicable to other systems. Kocher [155] himself suggests a chosen-message attack on an RSA implementation based on CRT (Algorithm 5.4). Carol, in an attempt to obtain Alice’s private key d, tries to guess the factor p (or q) of the modulus n using a timing attack. She starts by letting Alice sign a message y (c in Algorithm 5.4) close to an initial guess of p. The CRT-based algorithm first reduces y modulo p and modulo q before performing the modular exponentiations. If y < p already, then the initial reduction modulo p returns (almost) immediately, whereas if y ≥ p, the reduction involves at least one subtraction. This gives a variation in the timings based on the value of p. This fact is exploited by the attack to arrive at better and better approximations of p.
A known-message timing attack (in addition to the chosen message attack mentioned in the last paragraph) on the CRT-based RSA signature scheme is proposed by Kocher in the same paper [155]. Kocher also explains a timing attack on the signature algorithm DSA (Algorithm 5.43), based on the dependence of the modular reduction of H(M) + ds modulo r on the bits of the signer’s private key d.
Large-scale implementations of timing attacks are reported in the technical reports [77, 259] from the Crypto group of the Université catholique de Louvain. These implementations study Montgomery exponentiation.
Kocher [155] mentions the possibility of power attacks. However, a concrete description is first published in Kocher et al. [156], which explains both SPA and DPA. DES is the basic target of this paper, though possibilities for using these techniques against public-key systems are also mentioned.
Several variants of the basic DPA model described in the text have been proposed. Messerges et al. [200] describe attacks against smart-card implementations of exponentiation-based public-key systems. Also consult Aigner and Oswald’s tutorial [9] for a recent survey.
DPA seems to be the most threatening of all side-channel attacks. Many papers suggesting countermeasures against DPA have appeared. Chari et al. [45] propose a masking method. Messerges [199] applies this idea to a form suitable for AES.[4] Messerges’ countermeasure is broken in [63] using a multi-bit DPA. Some other useful papers on DPA include [10, 55, 201].
[4] AES is an abbreviation for the Advanced Encryption Standard, a US government standard that supersedes the older standard DES. AES uses the Rijndael cipher [219].
Boneh et al. [30, 31] from the Bellcore Lab. announce the first systematic study of fault attacks on asymmetric-key cryptosystems. They explain fault attacks on RSA (with and without CRT), the Rabin signature scheme, the Feige–Fiat–Shamir identification protocol and on the Schnorr identification protocol. These attacks are collectively known as Bellcore attacks.
Arjen K. Lenstra points out that the fault attack on CRT-based RSA does not require a valid signature. Joye and Quisquater propose some generalizations of the Bellcore–Lenstra attack; a form of this attack is applicable to elliptic-curve cryptosystems as well. The paper [142] discusses these developments.
Bao et al. [17] propose fault attacks on DSA, ElGamal and Schnorr signatures. They also describe variants of the fault analysis of RSA based on square-and-multiply algorithms. Zheng and Matsumoto [315] indicate the possibilities of attacking the random bit generator in a smart card.
Biham and Shamir [22] investigate fault analysis of symmetric-key ciphers and introduce the concept of differential fault analysis. Anderson and Kuhn [11] also study fault analysis of symmetric-key ciphers. Aumüller et al. [15] publish their practical experiences regarding physical realizations of faults in smart cards. They also suggest countermeasures against such attacks.
James A. Muir’s work [215] is a very readable and extensive survey on side-channel cryptanalysis. Also look at Boneh’s survey [29].
Because of small key sizes, elliptic-curve cryptosystems are very attractive for implementation in smart cards. It is, therefore, necessary to provide effective countermeasures against side-channel attacks (most importantly, against the DPA) for elliptic-curve cryptosystems. Many recent articles discuss this issue. Coron [62] suggests the use of random projective coordinates to avoid the costly (and power-consuming) field inversion operation needed for adding and doubling of points. Möller [206] proposes a non-conventional way of carrying out the double-and-add procedure. Izu and Takagi [138] describe a Montgomery-type point addition scheme resistant against side-channel attacks. An improved version of this algorithm, that works for a more general class of elliptic curves, is presented in Izu et al. [137].
Young and Yung introduce the concept of SETUP in [307]. The PAP SETUP on RSA and the ElGamal signature SETUP are from this paper, which also includes attacks on DSA and the Kerberos authentication protocol. In a later paper [308], Young and Yung categorize SETUPs into three types: regular, weak and strong. Strong SETUPs are proposed for Diffie–Hellman key exchange and for RSA. The third reference [309] from the same authors extends the ideas of kleptography further and provides backdoor routines for several other cryptographic schemes.
Crépeau and Slakmon [70] adopt a more informal approach and discuss several backdoors for RSA key generation. In addition to the trapdoors with hidden small private and public exponents, described in the text, they propose a trapdoor that hides a small prime public exponent. They also present an improved version of the PAP routine. Unlike Young and Yung, they suggest symmetric techniques for designing fe, fd. Symmetric techniques sacrifice the universal protection of the attacker, but continue to make perfect sense in the context of black-box cryptography.
| 8.1 | Introduction |
| 8.2 | Quantum Computation |
| 8.3 | Quantum Cryptography |
| 8.4 | Quantum Cryptanalysis |
| Chapter Summary | |
| Suggestions for Further Reading | |
Our best theories are not only truer than common sense, they make far more sense than common sense does.
—David Deutsch [76]
One can be a masterful practitioner of computer science without having the foggiest notion of what a transistor is, not to mention how it works.
—N. David Mermin [197]
But suppose I could buy a truly powerful quantum computer off the shelf today — what would I do with it? I don’t know, but it appears that I will have plenty of time to think about it!
—John Preskill [243]
So far, we have studied cryptologic algorithms that can be implemented on classical computers (Turing machines or von Neumann’s stored-program computers). Now, we shift our attention to a different paradigm of computation, known as quantum computation. The working of a quantum computer is specified by the laws of quantum mechanics, a branch of physics developed in the 20th century. However counterintuitive, contrived or artificial these laws may initially sound, they have been accepted by the physics community as robust models of certain natural phenomena. A bit, modelled as a quantum mechanical system, appears to be a more powerful unit than a classical bit on which to build a computing device.
This enhanced power of a computing device has many important ramifications in cryptology. On the one hand, we have polynomial-time quantum algorithms to solve the integer factorization and the discrete-log problems. This implies that most of the cryptographic algorithms that we discussed earlier become insecure. On the other hand, there are proposals for a quantum key-exchange method that possesses unconditional (and provable) security.
Unfortunately, it is not clear how one can manufacture a quantum computer. The technological difficulties involved appear enormous, and a section of the crowd even questions the feasibility of building such a machine. However, no laws or proofs rule out the possibility of success in the (near or distant) future. Legend has it that Thomas Alva Edison, after several hundred futile attempts to manufacture an electric light bulb, asserted that he now knew hundreds of ways how one cannot make an electric bulb. Edison succeeded eventually, and the dream turned into reality.
But we will not build quantum computers in this chapter. That is well beyond the scope of this book, or, for that matter, of computer science in general. It is thoroughly unimportant to understand the I-V curves of a transistor (or even to know what a transistor actually is), when one designs and analyses (classical) algorithms. In order to design and analyse quantum algorithms, it is equally unimportant to know how a quantum computer can be realized.
We start with a formal description of quantum computation. Quantum mechanical laws govern this paradigm. We will pay little attention to the physical interpretations of these laws. A mathematical formulation suffices for our purpose.
For defining a quantum mechanical system, we need to enrich our mathematical vocabulary. Let V be a vector space over ℂ (or ℝ). Using Dirac’s ket notation, we denote a vector ψ in V as |ψ〉.

An inner product (also called a dot product or a scalar product) on V is a function 〈·, ·〉 : V × V → ℂ. A vector space V with an inner product is called an inner product space.

For all |ψ〉, |φ〉, |χ〉 ∈ V and all a, b ∈ ℂ, an inner product satisfies:

(1) 〈ψ, ψ〉 is real and non-negative, with 〈ψ, ψ〉 = 0 if and only if |ψ〉 = 0,
(2) 〈ψ, φ〉 equals the complex conjugate of 〈φ, ψ〉, and
(3) 〈ψ, aφ + bχ〉 = a〈ψ, φ〉 + b〈ψ, χ〉.

The inner product on a vector space V induces a norm (Definition 2.115) on V: ‖ψ‖ := √〈ψ, ψ〉.

An inner product space that is complete with respect to this norm is called a Hilbert space. Every finite-dimensional inner product space is complete and is, therefore, a Hilbert space.

We define an equivalence relation ~ on a Hilbert space ℋ as follows: |ψ〉 ~ |φ〉 if and only if |ψ〉 = c|φ〉 for some non-zero c ∈ ℂ. An equivalence class under ~ is called a ray, and is conventionally represented by a normalized vector |ψ〉 in the class, that is, one with ‖ψ‖ = 1.

An orthonormal basis of an n-dimensional Hilbert space ℋ is a basis of ℋ consisting of pairwise orthogonal normalized vectors, that is, vectors |ψ0〉, . . . , |ψn–1〉 with 〈ψi, ψj〉 = 1 for i = j and 0 for i ≠ j. It is customary to denote the n vectors in an orthonormal basis of ℋ as |0〉, |1〉, . . . , |n – 1〉. For example, under the standard inner product the vectors

|0〉 := (1, 0, 0, . . . , 0), |1〉 := (0, 1, 0, . . . , 0), . . . , |n – 1〉 := (0, 0, . . . , 0, 1) form an orthonormal basis of ℂⁿ.

The following axiom describes the model of a quantum mechanical system.

Axiom 8.1 A system is a ray in a (finite-dimensional) Hilbert space (over ℂ). The state of the system is represented by a normalized vector |ψ〉 in the ray.
The simplest non-trivial quantum mechanical system is a ray in a 2-dimensional Hilbert space, and is called a quantum bit or qubit. In order to distinguish a qubit from a classical bit, we call the latter a cbit. The space ℂ² has an orthonormal basis {|0〉, |1〉}. In the classical interpretation, a cbit can assume only the two values |0〉 and |1〉, whereas a qubit can assume any value of the form

a|0〉 + b|1〉 with a, b ∈ ℂ, |a|² + |b|² = 1.
Such a state of the qubit is called a superposition of the classical states.
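As a small sketch in plain Python (amplitude pairs stand in for qubit states; the helper name `is_valid_qubit` is ours), the normalization constraint |a|² + |b|² = 1 can be checked directly:

```python
import math

def is_valid_qubit(a, b, tol=1e-9):
    """A pair of complex amplitudes (a, b) describes a qubit state
    a|0> + b|1> only if |a|^2 + |b|^2 = 1."""
    return abs(abs(a) ** 2 + abs(b) ** 2 - 1.0) < tol

s = 1 / math.sqrt(2)
# |0>, |1> and the equal superposition (|0> + |1>)/sqrt(2) are valid states;
# the unnormalized pair (1, 1) is not.
checks = [is_valid_qubit(1, 0), is_valid_qubit(0, 1),
          is_valid_qubit(s, s), is_valid_qubit(1, 1)]
```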
Though we don’t care much, at least for the moment, here are two promising candidates for realizing a qubit:
Spin of an electron: The spin of a particle (like electron) in a given direction, say, along the Z-axis, is modelled as a quantum mechanical system with an orthonormal basis consisting of spin up and spin down.
Polarization of a photon: Photons constitute another class of quantum systems, where the two independent states are provided by the polarization of a photon.
A conceptual example of a 2-state quantum system is the Schrödinger cat. The two independent states of a cat, as we classically know them, are |alive〉 and |dead〉. However, if we think of the cat confined in a closed room and isolated from our observations, quantum mechanics models the state of the cat as a superposition (that is, a complex-linear combination) of these two states. But then if the quantum model were true, opening the room might reveal the cat in a non-trivial state a|alive〉 + b|dead〉 for some complex numbers a, b with |a|² + |b|² = 1. It would indeed be an exciting experience. But alas, quantum mechanics precludes the possibility of such an observation. Read on to know what we would actually see, if we open the room.
A single qubit is too small a unit to build a useful computer. We need to use several (albeit a finite number of) qubits and hence must have a way to describe the combined system in terms of the individual qubits. As the simplest and most basic case, we first concentrate on combining two quantum systems into one.
Axiom 8.2 Let A and B be two quantum mechanical systems with respective Hilbert spaces ℋA of dimension m and ℋB of dimension n, and with orthonormal bases {|i〉A | i = 0, . . . , m – 1} and {|j〉B | j = 0, . . . , n – 1}. The composite system AB has the mn-dimensional Hilbert space ℋA ⊗ ℋB, where {|i〉A ⊗ |j〉B | i = 0, . . . , m – 1 and j = 0, . . . , n – 1} is an orthonormal basis of ℋA ⊗ ℋB.

It is customary to abbreviate the normalized vector |i〉A ⊗ |j〉B as |i〉A|j〉B or even as |ij〉AB. A general state of AB is of the form

|ψ〉AB = Σi,j ai,j|ij〉AB with Σi,j|ai,j|² = 1.
We can generalize this construction to describe a system having k components A1, . . . , Ak. If ℋi is the Hilbert space of Ai with an orthonormal basis {|j〉i | 0 ≤ j < ni}, the composite system A1 · · · Ak has the n1 · · · nk-dimensional Hilbert space ℋ1 ⊗ · · · ⊗ ℋk with an orthonormal basis comprising the vectors

|j1〉1 ⊗ |j2〉2 ⊗ · · · ⊗ |jk〉k = |j1〉1|j2〉2 · · · |jk〉k = |j1j2 . . . jk〉

with 0 ≤ ji < ni for all i = 1, . . . , k.
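The tensor-product construction above can be sketched in a few lines of Python (a toy model in which a state vector is a list of amplitudes indexed by the canonical basis; the function name `tensor` is ours):

```python
def tensor(u, v):
    """Kronecker product of two state vectors: the amplitude of |i>|j> in
    u (x) v is u[i] * v[j], listed in the canonical order of the |ij>."""
    return [a * b for a in u for b in v]

ket0, ket1 = [1, 0], [0, 1]

# |1> (x) |0> is the basis vector |10>, i.e. |2> of a 2-qubit register.
ket2 = tensor(ket1, ket0)

# Tensoring three qubits gives an 8-dimensional composite state.
ket5 = tensor(tensor(ket1, ket0), ket1)   # |101>, i.e. |5>
```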
An n-bit quantum register is a system having exactly n qubits.
Let A1, . . . , An denote the individual bits in an n-bit quantum register A. Each Ai has the Hilbert space ℂ² with orthonormal basis {|0〉, |1〉}. So A has the 2ⁿ-dimensional Hilbert space ℂ² ⊗ · · · ⊗ ℂ² (n factors) with an orthonormal basis consisting of the vectors

|j1〉 ⊗ |j2〉 ⊗ · · · ⊗ |jn〉 = |j1〉|j2〉 · · · |jn〉 = |j1j2 · · · jn〉

with each ji ∈ {0, 1}. Viewed as an integer in binary notation, j1j2 . . . jn is an integral value between 0 and 2ⁿ – 1. This gives us a canonical numbering |0〉, |1〉, . . . , |2ⁿ – 1〉 of the basis vectors for the register A. These 2ⁿ values are precisely the states that a classical n-bit register can have. The quantum register can, however, be in any state |ψ〉 which is a superposition of the classical states:

|ψ〉 = Σj aj|j〉, the sum being over 0 ≤ j ≤ 2ⁿ – 1, with Σj|aj|² = 1.
Let us once again look at the general composite system A = A1 · · · Ak. In the classical sense, each state of A is composed of the individual states of the subsystems Ai. For example, each of the 2ⁿ classical states of an n-bit register corresponds to a choice between |0〉 and |1〉 for each individual bit. That is, each individual component retains its own state in a classical composite system. This is, however, not the case with a quantum composite system. Just think of a 2-bit quantum register C := AB. A state

|ψ〉C = c0|0〉C + c1|1〉C + c2|2〉C + c3|3〉C

of C equals a tensor product

|ψ1〉A ⊗ |ψ2〉B = (a0|0〉A + a1|1〉A) ⊗ (b0|0〉B + b1|1〉B) = a0b0|0〉C + a0b1|1〉C + a1b0|2〉C + a1b1|3〉C

only when the amplitudes factor in this manner, which in particular forces c0c3 = c1c2. A general state of C need not admit such a factorization.
The state |ψ〉 of a quantum register A = A1 · · · An is called entangled, if |ψ〉 cannot be written as a tensor product of the states of any two parts of A. In other words, |ψ〉 is entangled if and only if no set of fewer than n qubits of A possesses its individual state.
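For a concrete check (a sketch; the necessary condition c0c3 = c1c2 follows by multiplying out the product-state coefficients a0b0, a0b1, a1b0, a1b1), consider the 2-qubit state (1/√2)(|0〉 + |3〉):

```python
import math

s = 1 / math.sqrt(2)

# Amplitudes (c0, c1, c2, c3) of the state (|0> + |3>)/sqrt(2)
# of a 2-bit register, i.e. (|00> + |11>)/sqrt(2).
c = [s, 0.0, 0.0, s]

# A product state (a0|0> + a1|1>) (x) (b0|0> + b1|1>) has amplitudes
# (a0*b0, a0*b1, a1*b0, a1*b1), which always satisfy c0*c3 == c1*c2.
# Here the condition fails, so the state is entangled.
is_product_candidate = math.isclose(c[0] * c[3], c[1] * c[2])
```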
Entanglement essentially implies correlation or interaction between the components. In a composite quantum system, we cannot treat the components individually. A quantum system, as we have defined (axiomatically) earlier, is a completely isolated system. In reality, interactions with the surroundings make a (non-isolated) system change its state and get entangled. This is one of the biggest problems in the realization of a quantum computer. Quantum error correction is an important topic in quantum computation. For our purpose, we stick to the abstract model of an isolated system (quantum register) immune from external disturbances.
Quantum registers give us a way to store quantum information. A computation involves manipulating the information stored in the registers. In quantum mechanics, all such operations must be reversible, that is, it must be possible to invert every operation. The only invertible operations on the classical states |0〉, |1〉, . . . , |2n – 1〉 of an n-bit quantum register A are precisely all the permutations of the classical states. Now that A can be in many more (quantum) states, there are other allowed operations on A. Any such operation must be reversible and of a particular type. This is the third axiom of quantum mechanics, which is detailed shortly.
A classical n-bit register supports many non-invertible operations. For example, erasing the content of the register (that is, resetting all the bits to zero) is a non-invertible process, since the pre-erasure state of the register cannot be uniquely determined after the erase operation is carried out. Classical computation is based on (classical) gates (like NOT, AND, OR, XOR, NOR, NAND), most of which are non-invertible. XOR, as an example, requires two input bits and outputs a single bit. It is impossible to determine the inputs uniquely from the output only. All such non-reversible operations are disallowed in the quantum world. An invertible version of the XOR operation takes two bits x and y as input and outputs the two bits x and x ⊕ y (where ⊕ denotes XOR of bits). Given the output (x, x ⊕ y), the input can be uniquely determined as (x, y) = (x, x ⊕ (x ⊕ y)), that is, by applying the reversible XOR operation once more.
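The reversible XOR just described (known in quantum computing as the controlled-NOT, or CNOT, gate) is easy to sketch in code; applying it twice recovers the input, as stated above:

```python
def reversible_xor(x, y):
    """Map the bit pair (x, y) to (x, x XOR y).  Unlike the plain XOR,
    this operation keeps enough information to be inverted."""
    return x, x ^ y

# The operation is its own inverse: (x, x ^ (x ^ y)) == (x, y).
round_trips = [reversible_xor(*reversible_xor(x, y))
               for x in (0, 1) for y in (0, 1)]
```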
Like XOR, all bit operations that build up a classical computer can be realized using reversible operations only. This gives us the (informal) assurance that quantum computers are at least as powerful as classical computers.
Back to the business—the third axiom of quantum mechanics.
Let U be a square matrix (that is, an m × m matrix for some integer m ≥ 1) with complex entries. U is called unitary if UU* = U*U = Im, where U* denotes the conjugate transpose of U and Im the m × m identity matrix.

Let A be a quantum system (like a quantum register) with Hilbert space ℋ of dimension m. An m × m unitary matrix U defines a unitary linear transformation on ℋ taking a normalized vector |ψ〉 to a normalized vector U|ψ〉. Moreover, the transformation maps an orthonormal basis of ℋ to another orthonormal basis of ℋ (Exercise 8.4).

Axiom 8.3 A quantum system evolves unitarily, that is, any operation on a quantum mechanical system is a unitary transformation.
The Hadamard transform H on one qubit is defined as:

H|0〉 = (1/√2)(|0〉 + |1〉), H|1〉 = (1/√2)(|0〉 – |1〉).

(Recall that a linear transformation is completely specified by its images of the elements of a basis.) In the basis {|0〉, |1〉}, H is represented by the unitary matrix with rows (1/√2)(1, 1) and (1/√2)(1, –1). By linearity, H transforms a general state |ψ〉 = a|0〉 + b|1〉 to the state

H|ψ〉 = ((a + b)/√2)|0〉 + ((a – b)/√2)|1〉.

Some other unitary operators are described in Exercises 8.5 and 8.6.
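As a quick numerical check (plain Python; `apply` is our helper for 2 × 2 matrix–vector products), one can verify the action of H on the basis states and that H is its own inverse:

```python
import math

S = 1 / math.sqrt(2)
H = [[S, S], [S, -S]]   # matrix of the Hadamard transform in the basis {|0>, |1>}

def apply(M, v):
    """Apply a 2x2 matrix M to the state vector v = (a, b) of a|0> + b|1>."""
    return (M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1])

ket0, ket1 = (1.0, 0.0), (0.0, 1.0)
plus = apply(H, ket0)    # (|0> + |1>)/sqrt(2)
minus = apply(H, ket1)   # (|0> - |1>)/sqrt(2)
back = apply(H, plus)    # H^2 = I, so this is |0> again
```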
An important consequence of quantum mechanical dynamics is that cloning of a state of a system is not permissible. In other words, there does not exist an operator that copies an arbitrary state (content) of one quantum register to another.
Theorem 8.1 [No-cloning theorem] For two n-bit registers A and B, there do not exist a unitary transform U of the composite system AB and a state |s〉 of B such that U(|ψ〉A ⊗ |s〉B) = |ψ〉A ⊗ |ψ〉B for all states |ψ〉 of A.

Proof Assume that such a state |s〉 and such a transform U exist. Take two states |ψ〉 and |φ〉 of A. Since a unitary transformation preserves inner products, taking the inner product of U(|ψ〉A|s〉B) = |ψ〉A|ψ〉B with U(|φ〉A|s〉B) = |φ〉A|φ〉B gives 〈ψ, φ〉 = 〈ψ, φ〉². Hence 〈ψ, φ〉 is 0 or 1, that is, |ψ〉 and |φ〉 are either orthogonal or identical. But |ψ〉 and |φ〉 can be chosen arbitrarily (for example, |ψ〉 = |0〉 and |φ〉 = (1/√2)(|0〉 + |1〉) with 〈ψ, φ〉 = 1/√2), a contradiction.
We have seen how to represent a quantum mechanical system and do operations on the system. Now comes the final part of the game, namely observing or measuring or reading the state of a quantum system. In classical computation, reading the value stored in a classical register is a trivial exercise—just read it! In quantum mechanics, this is not the case.
Axiom 8.4 [Born rule] Let A be a quantum mechanical system with an orthonormal basis {|0〉, |1〉, . . . , |m – 1〉}. Assume that A is in a state |ψ〉 = a0|0〉 + a1|1〉 + · · · + am–1|m – 1〉 with Σi|ai|² = 1. A measurement of A yields the value i (for some i in the range 0 ≤ i ≤ m – 1) with probability |ai|², and causes the system to collapse to the post-measurement state |i〉.
This means that whatever the state |ψ〉 of A was before the measurement, the process of measurement can reveal only one of m possible integer values. Moreover, the measurement causes a total loss of information about the pre-measurement amplitudes ai. Thus, it is impossible to measure A repeatedly at the state |ψ〉 to see a statistical pattern in the occurrences of different values of i so as to guess the probabilities |ai|2.
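A simulation of the measurement rule is straightforward (a sketch in plain Python; `measure` is our name). Sampling many fresh copies of a superposed state exhibits the |ai|² statistics, though, as just noted, a real system collapses after the first measurement and cannot be resampled:

```python
import random

def measure(state):
    """Born rule: given a list of amplitudes a_i with sum |a_i|^2 = 1,
    return an outcome i with probability |a_i|^2."""
    r, acc = random.random(), 0.0
    for i, a in enumerate(state):
        acc += abs(a) ** 2
        if r < acc:
            return i
    return len(state) - 1   # guard against floating-point round-off

random.seed(1)
s = 2 ** -0.5
# Measuring fresh copies of (|0> + |1>)/sqrt(2): each outcome about half the time.
counts = [0, 0]
for _ in range(10000):
    counts[measure([s, s])] += 1
```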
If we open the room, we can see the Schrödinger cat in only one of the two possible states: |alive〉 or |dead〉. Well, then, what else can we expect? Quantum mechanics only models the cat in the isolated room as a system evolving under the unitary dynamics.
At first glance, this is rather frustrating. We claim that the system went through a series of classically meaningless states, but the classical states are all we can see. What is the guarantee that the system really evolved in the quantum mechanical way? Well, there is no guarantee actually. The solace is that the axioms of quantum mechanics can explain certain natural phenomena. Also it is perfectly consistent with the classical behaviour in that if the system A evolves classically and is measured at the state |i〉 (so that ai = 1 and aj = 0 for j ≠ i), measuring A reveals i with probability one and causes the system to collapse to the state |i〉, that is, to remain in the state |i〉 itself.
There is a positive side of the quantum mechanical axioms. A quantum mechanical system is inherently parallel. An n-bit classical register at any point of time can hold only one of the classical values |0〉, . . . , |2n – 1〉. An n-bit quantum register, on the other hand, can simultaneously hold all these classical values, with respective probabilities. This inherent parallelism seems to impart a good deal of power to a computing device. Of course, as long as we cannot harness some physical objects to build a real quantum mechanical computing device, quantum computation continues to remain science fiction. But on an algorithmic level, the inherent parallelism of a (hypothetical) quantum computer can be exploited to do miracles, for example, to design a polynomial-time integer factorization algorithm. This is where we win—at least conceptually. Our failure to see a cat in the state
(1/√2)(|alive〉 – |dead〉) should not bother us at all!
Measurement of a quantum register gives us a way to initialize a quantum register A to a state |ψ〉. Suppose that we get the value i upon measuring A. We then apply any unitary transform on A that changes A from the post-measurement state |i〉 to the desired state |ψ〉.
The measurement described in Axiom 8.4 is called measurement in the classical basis. The system A has, in general, many orthonormal bases other than the classical one {|0〉, . . . , |m – 1〉}. If B is any such basis, we can conceive of measuring A in the basis B. All we need to perform is to rewrite the state of A in terms of the new basis B. This can be achieved by applying to A a unitary transformation (the change-of-basis transformation) before the measurement in the classical basis is carried out.
A generalization of the Born rule is also worth mentioning here. Suppose that we have an (m + n)-bit quantum register A and we want to measure not all but some of the bits of A. To be more specific, let us say that we want to measure the leftmost m bits of A, though the generalized Born rule works for any arbitrary choice of m bit positions in the register A. Denoting by |i〉m, i = 0, . . . , 2ᵐ – 1, the canonical basis vectors for the left m bits and by |j〉n, j = 0, . . . , 2ⁿ – 1, those for the right n bits, a general state of A can be written as

|ψ〉 = Σi,j ai,j|i, j〉m+n

with Σi,j|ai,j|² = 1 and with |i, j〉m+n identified as |i〉m|j〉n = |i〉m ⊗ |j〉n. A measurement of the left m bits of A yields an integer i, 0 ≤ i ≤ 2ᵐ – 1, with probability pi = Σj|ai,j|². Also this measurement causes A to collapse to the state

(1/√pi) Σj ai,j|i〉m|j〉n.
Now, if we immediately apply the generalized Born rule once again on the right n bits of A, we get an integer j, 0 ≤ j ≤ 2ⁿ – 1, with probability |ai,j|²/pi, and the system collapses to the state |i〉m|j〉n. The probability of getting |i〉m|j〉n by this two-step process is then pi|ai,j|²/pi = |ai,j|². This is consistent with a single application of the original Born rule.
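The generalized Born rule can also be sketched in code (the helper `measure_left` is ours; the amplitude ai,j is stored at index i·2ⁿ + j). Measuring the left m bits selects an outcome i with probability pi and renormalizes the surviving amplitudes by √pi:

```python
import random

def measure_left(state, m, n):
    """Measure the leftmost m bits of an (m+n)-bit register whose state has
    amplitude state[i * 2**n + j] for |i>_m |j>_n.  Returns the observed
    value i and the collapsed (renormalized) state."""
    N = 2 ** n
    probs = [sum(abs(state[i * N + j]) ** 2 for j in range(N))
             for i in range(2 ** m)]
    r, acc, outcome = random.random(), 0.0, 2 ** m - 1
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            outcome = i
            break
    scale = probs[outcome] ** 0.5
    collapsed = [0.0] * len(state)
    for j in range(N):
        collapsed[outcome * N + j] = state[outcome * N + j] / scale
    return outcome, collapsed

s = 2 ** -0.5
# The entangled 2-bit state (|00> + |11>)/sqrt(2): measuring the left bit
# yields 0 or 1, after which the right bit agrees with it with certainty.
i, post = measure_left([s, 0.0, 0.0, s], 1, 1)
```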
We start with a general framework of doing computations using quantum registers. Suppose we want to compute a function f which requires an m-bit integer as input and which outputs an n-bit integer. A general function f need not be invertible, but we cannot afford non-invertible operations on quantum registers. This is why we work on an m + n-bit quantum register A in which the left m bits represent the input and the right n bits the output. Computing f(x) for a given x is tantamount to designing a unitary transformation Uf that acts on A and converts its state from |x〉m|y〉n to |x〉m|f(x) ⊕ y〉n, where ⊕ is the bitwise XOR operation, and where the subscripts (m and n) indicate the number of bits in the input or output part of A. It is easy to verify that Uf is unitary. Moreover, the inverse of Uf is Uf itself. For y = 0, we, in particular, have Uf (|x〉m|0〉n) = |x〉m|f(x)〉n.
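On classical basis states |x〉m|y〉n, the action of Uf can be sketched as follows (our function names; the point is that Uf is its own inverse even when f itself is not invertible):

```python
def apply_uf(f, x, y):
    """U_f |x>_m |y>_n = |x>_m |f(x) XOR y>_n, on classical basis states."""
    return x, f(x) ^ y

# f is many-to-one (hence non-invertible) ...
f = lambda x: x % 4

# ... yet applying U_f twice restores any input pair (x, y).
once = apply_uf(f, 13, 6)     # (13, (13 % 4) ^ 6) = (13, 7)
twice = apply_uf(f, *once)    # back to (13, 6)
```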
It may still be unclear to the reader what one really gains by using this quantum model. The answer lies in the parallelism inherent in a quantum register. In order to see how this parallelism can be exploited, we describe David Deutsch’s algorithm which, being the first known quantum algorithm, has enough historical importance to be included here in spite of its apparent irrelevance in the context of cryptology.
Assume that f : {0, 1} → {0, 1} is a function that operates on one bit and outputs one bit. There are four such functions: two of these are constant (f(0) = f(1)) and the remaining two are non-constant (f(0) ≠ f(1)). We are given a black box Df representing f. We don’t know which one of the four functions Df actually implements, but we can supply a bit to Df as input and read its output on this bit. Our task is to determine whether Df represents a constant function or not. Classically, we make two invocations of Df on the inputs 0 and 1 and compare the output values f(0) and f(1); it is impossible to solve the problem classically using only one invocation of the black box. The Deutsch algorithm makes this task possible with a single invocation using quantum computational techniques.
Following the general quantum computational model, we assume that Df is a unitary transformation on a 2-bit register A (with m = n = 1) that computes Df|x〉|y〉 = |x〉|f(x) ⊕ y〉, with the left (resp. the right) bit corresponding to the input (resp. the output) of f. Instead of supplying a classical input to Df, we initialize the register A to the state

|ψ0〉 = (H|1〉) ⊗ (H|1〉) = (1/2)(|0〉 – |1〉)(|0〉 – |1〉).

Linearity shows that on this input, Df ends its execution leaving A in the state

|ψ1〉 = (1/2)((–1)^f(0)|0〉 – (–1)^f(1)|1〉)(|0〉 – |1〉).

Here, the signs arise because Df maps |x〉(|0〉 – |1〉) to |x〉(|f(x)〉 – |f(x) ⊕ 1〉) = (–1)^f(x)|x〉(|0〉 – |1〉). We won’t measure A right now, but apply the Hadamard transform on the left bit. This transforms A to the state

|ψ2〉 = ±(1/√2)|1〉(|0〉 – |1〉) if f(0) = f(1), and |ψ2〉 = ±(1/√2)|0〉(|0〉 – |1〉) if f(0) ≠ f(1).
Now, if we measure the input bit, we deterministically get the integer 1 or 0 according as f is constant or not. That’s it!
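The whole algorithm can be simulated classically in a few lines (a sketch; we represent the 2-bit register by the amplitudes of |x〉|y〉 stored at index 2x + y, and all function names are ours):

```python
import math

S = 1 / math.sqrt(2)

def u_f(f, state):
    """D_f on a 2-qubit state: |x>|y> -> |x>|f(x) XOR y>."""
    out = [0.0] * 4
    for x in (0, 1):
        for y in (0, 1):
            out[2 * x + (f(x) ^ y)] += state[2 * x + y]
    return out

def h_left(state):
    """Hadamard transform on the left (input) qubit only."""
    out = [0.0] * 4
    for x in (0, 1):
        for y in (0, 1):
            a = state[2 * x + y]
            out[y] += S * a                   # contribution to |0>|y>
            out[2 + y] += S * a * (-1) ** x   # contribution to |1>|y>
    return out

def deutsch(f):
    """One call to D_f decides whether f: {0,1} -> {0,1} is constant.
    Returns 1 if f is constant and 0 otherwise."""
    state = [0.5, -0.5, -0.5, 0.5]   # (H|1>)(H|1>) = (1/2)(|0>-|1>)(|0>-|1>)
    state = h_left(u_f(f, state))
    p1 = state[2] ** 2 + state[3] ** 2   # probability of reading 1 on the left bit
    return 1 if p1 > 0.5 else 0

results = [deutsch(f) for f in (lambda x: 0, lambda x: 1,
                                lambda x: x, lambda x: 1 - x)]
```

The two constant functions yield 1 and the two non-constant ones yield 0, with a single application of `u_f` in each run.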
Deutsch’s algorithm solved a rather artificial problem, but it opened up the possibilities of exploring a new paradigm of computation. To date, (good) quantum algorithms are known for many interesting computational problems. In the rest of this chapter, we concentrate on some of the quantum algorithms that have an impact on cryptology.
| 8.1 | Let S be a finite set and let l2(S) denote the set of all functions .
| ||||||||
| 8.2 | Show that the vectors (1/√2)(|0〉 + |1〉) and (1/√2)(|0〉 – |1〉) form an orthonormal ℂ-basis of ℂ².
| ||||||||
| 8.3 | Show that is an entangled state of a 2-bit quantum register.
| ||||||||
| 8.4 | Prove the following assertions.
| ||||||||
| 8.5 |
| ||||||||
| 8.6 | Let A be an n-bit quantum register. Let us plan to number the bits of A as 1, . . . , n from left to right. One can apply the operators like X, Z, H of Exercise 8.5 on each individual bit of A. A qubit operation B applied on bit i of A will be denoted by Bi.
| ||||||||
| 8.7 | Suppose that whenever you switch on your quantum computer, every bit in its registers is initialized to the state |0〉. Describe how you can use the operators I, X, Z and H defined in Exercise 8.5, in order to change the state of a qubit from |0〉 to the following:
| ||||||||
| 8.8 | Let A be an n-bit quantum register at the state |0〉n. Show that the application of the Hadamard transform individually to each bit of A transforms A to the state |ψ〉 = (1/√2ⁿ) Σj|j〉, the sum being over 0 ≤ j ≤ 2ⁿ – 1. This is precisely the state of A in which all of the 2ⁿ possible outcomes in a measurement of A are equally likely. What happens if we apply H a second time individually to each bit of A, that is, what is H1H2 · · · Hn|ψ〉, where Hi denotes the Hadamard transform on the i-th bit of A?
| ||||||||
| 8.9 | We know that any arithmetic or Boolean operation can be implemented using AND and NOT gates. This exercise suggests a reversible way to implement these operations. The Toffoli gate is a function T : {0, 1}3 → {0, 1}3 that maps (x, y, z) ↦ (x, y, z ⊕ xy), where ⊕ means XOR, and xy means AND of x and y. Thus, T flips the third bit, if and only if the first two bits are both 1.
|
We now describe the quantum key-exchange algorithm due to Bennett and Brassard. The original paper also talks about a practical implementation of the algorithm using polarization of photons. For the moment, we do not highlight such specific implementation issues, but describe the algorithm in terms of the conceptual computational units called qubits.
The usual actors Alice and Bob want to agree upon a shared secret using communication over an insecure channel. A third party who gave her name as Carol plans to eavesdrop during the transmission. Alice and Bob repeat the following steps. Here, H stands for the Hadamard transform.
Alice generates a random classical bit i ∈ {0, 1}.
Alice makes a random choice x ∈ {0, 1}.
Alice computes the quantum bit A := Hx|i〉.
Alice sends A to Bob.
Bob makes a random choice y ∈ {0, 1}.
Bob computes B := HyA.
Bob measures B to get the classical bit j.
Bob sends y to Alice.
Alice sends x to Bob.
If x = y, Alice retains i and Bob retains j (= i); otherwise, they discard the bit.
The algorithm works as follows. Alice generates a random bit i and a random decision x about whether she is going to use the Hadamard transform H. If x = 0, she sends the quantum bit |0〉 or |1〉 to Bob. If x = 1, she sends either (1/√2)(|0〉 + |1〉) or (1/√2)(|0〉 – |1〉) to Bob. At this point Bob does not know whether Alice applied H before the transmission. So Bob makes a random guess y ∈ {0, 1} and accordingly skips/applies the Hadamard transform on the qubit received. If x = y = 0, then Bob has the qubit B = H0H0|i〉 = |i〉 and a measurement of this qubit reveals i with probability 1. On the other hand, if x = y = 1, then B = H2|i〉 = |i〉, since H2 is the identity transform (Exercise 8.5). In this case also, Bob retrieves Alice’s classical bit i with certainty by measuring B.

If x ≠ y, then B is generated from Alice’s initial choice |i〉 by a single application of H, that is, B = (1/√2)(|0〉 ± |1〉) in this case. A measurement of this bit outputs 0 or 1, each with probability 1/2, that is, Bob gathers no idea about the initial choice of Alice. So after it is established that x ≠ y, they both discard the bit.
If we assume that x and y are uniformly chosen, Bob and Alice succeed in having x = y about half of the time. They eventually set up an n-bit secret after about 2n invocations of the above protocol. Table 8.1 illustrates a sample session between Alice and Bob. After 20 iterations of the above procedure, they agree upon the shared secret 0001110111.
| Iteration | i | x | A | y | B | j | Common bit |
|---|---|---|---|---|---|---|---|
| 1 | 0 | 1 | (|0〉 + |1〉)/√2 | 0 | (|0〉 + |1〉)/√2 | 1 | |
| 2 | 0 | 0 | |0〉 | 1 | (|0〉 + |1〉)/√2 | 1 | |
| 3 | 0 | 1 | (|0〉 + |1〉)/√2 | 1 | |0〉 | 0 | 0 |
| 4 | 0 | 1 | (|0〉 + |1〉)/√2 | 0 | (|0〉 + |1〉)/√2 | 0 | |
| 5 | 1 | 1 | (|0〉 – |1〉)/√2 | 0 | (|0〉 – |1〉)/√2 | 1 | |
| 6 | 0 | 0 | |0〉 | 0 | |0〉 | 0 | 0 |
| 7 | 0 | 0 | |0〉 | 0 | |0〉 | 0 | 0 |
| 8 | 1 | 0 | |1〉 | 0 | |1〉 | 1 | 1 |
| 9 | 0 | 0 | |0〉 | 1 | (|0〉 + |1〉)/√2 | 0 | |
| 10 | 1 | 1 | (|0〉 – |1〉)/√2 | 0 | (|0〉 – |1〉)/√2 | 0 | |
| 11 | 0 | 1 | (|0〉 + |1〉)/√2 | 0 | (|0〉 + |1〉)/√2 | 1 | |
| 12 | 0 | 0 | |0〉 | 1 | (|0〉 + |1〉)/√2 | 0 | |
| 13 | 1 | 0 | |1〉 | 1 | (|0〉 – |1〉)/√2 | 1 | |
| 14 | 1 | 1 | (|0〉 – |1〉)/√2 | 1 | |1〉 | 1 | 1 |
| 15 | 1 | 1 | (|0〉 – |1〉)/√2 | 1 | |1〉 | 1 | 1 |
| 16 | 0 | 1 | (|0〉 + |1〉)/√2 | 1 | |0〉 | 0 | 0 |
| 17 | 1 | 1 | (|0〉 – |1〉)/√2 | 1 | |1〉 | 1 | 1 |
| 18 | 1 | 0 | |1〉 | 0 | |1〉 | 1 | 1 |
| 19 | 0 | 1 | (|0〉 + |1〉)/√2 | 0 | (|0〉 + |1〉)/√2 | 0 | |
| 20 | 1 | 0 | |1〉 | 0 | |1〉 | 1 | 1 |
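The protocol is easy to simulate classically (a sketch in plain Python with real amplitude pairs; all names are ours). Whenever x = y, Bob's measured bit equals Alice's bit i, exactly as argued above:

```python
import math, random

S = 1 / math.sqrt(2)

def hadamard(v):
    """Apply H to the qubit state v = (a, b)."""
    return (S * (v[0] + v[1]), S * (v[0] - v[1]))

def measure(v):
    """Measure a|0> + b|1> in the classical basis (real amplitudes)."""
    return 0 if random.random() < v[0] ** 2 else 1

def bb84_round():
    """One iteration: returns Alice's bit i, the choices x, y, and Bob's bit j."""
    i, x, y = (random.randint(0, 1) for _ in range(3))
    qubit = (1.0, 0.0) if i == 0 else (0.0, 1.0)
    if x:
        qubit = hadamard(qubit)      # Alice's optional H
    if y:
        qubit = hadamard(qubit)      # Bob's optional H
    return i, x, y, measure(qubit)

random.seed(7)
alice_key, bob_key = [], []
while len(alice_key) < 16:           # keep only the rounds with x == y
    i, x, y, j = bb84_round()
    if x == y:
        alice_key.append(i)
        bob_key.append(j)
```

Since about half of the rounds satisfy x = y, roughly 32 rounds are needed here for a 16-bit secret, in line with the 2n estimate above.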
What remains to explain is how this protocol guards against eavesdropping by Carol. Let us model Carol as a passive adversary who intercepts the qubit A transmitted by Alice, investigates the qubit to learn about Alice’s secret i and subsequently transmits the qubit to Bob. In order to guess i, Carol mimics the role of Bob. At this point Carol does not know x, so she makes a guess z about x, accordingly skips/applies the Hadamard transform on the intercepted qubit in order to get a qubit C, measures C to get a bit value k and sends the measured qubit D to Bob. (Recall from Theorem 8.1 that it is impossible for Carol to make a copy of A, work on this copy and transmit the original qubit A to Bob.) Bob receives D, assumes that it is the qubit A transmitted by Alice and carries out his part of the work to generate the bit j. Bob and Alice later reveal x and y. If x ≠ y, they anyway reject the bits obtained from this iteration. Carol should also reject her bit k in this case. So let us concentrate only on the case that x = y. The introduction of Carol in the protocol changes A to D and hence Alice and Bob may eventually agree upon distinct bits. A sample session of the protocol in the presence of Carol is illustrated in Table 8.2. The three parties generate the secrets as:
| Alice | 0110 0111 1000 1011 |
| Bob | 0101 1101 1100 1011 |
| Carol | 0100 0101 0100 1011 |
| Iteration | i | x | A | z | C = HzA | k | D | y | B = HyD | j |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0 | 1 | (|0〉 + |1〉)/√2 | 1 | |0〉 | 0 | |0〉 | 1 | (|0〉 + |1〉)/√2 | 0 |
| 2 | 1 | 0 | |1〉 | 0 | |1〉 | 1 | |1〉 | 0 | |1〉 | 1 |
| 3 | 1 | 0 | |1〉 | 1 | (|0〉 – |1〉)/√2 | 0 | |0〉 | 0 | |0〉 | 0 |
| 4 | 0 | 1 | (|0〉 + |1〉)/√2 | 0 | (|0〉 + |1〉)/√2 | 0 | |0〉 | 1 | (|0〉 + |1〉)/√2 | 1 |
| 5 | 0 | 1 | (|0〉 + |1〉)/√2 | 1 | |0〉 | 0 | |0〉 | 1 | (|0〉 + |1〉)/√2 | 1 |
| 6 | 1 | 1 | (|0〉 – |1〉)/√2 | 1 | |1〉 | 1 | |1〉 | 1 | (|0〉 – |1〉)/√2 | 1 |
| 7 | 1 | 1 | (|0〉 – |1〉)/√2 | 0 | (|0〉 – |1〉)/√2 | 0 | |0〉 | 1 | (|0〉 + |1〉)/√2 | 0 |
| 8 | 1 | 0 | |1〉 | 0 | |1〉 | 1 | |1〉 | 0 | |1〉 | 1 |
| 9 | 1 | 1 | (|0〉 – |1〉)/√2 | 0 | (|0〉 – |1〉)/√2 | 0 | |0〉 | 1 | (|0〉 + |1〉)/√2 | 1 |
| 10 | 0 | 1 | (|0〉 + |1〉)/√2 | 0 | (|0〉 + |1〉)/√2 | 1 | |1〉 | 1 | (|0〉 – |1〉)/√2 | 1 |
| 11 | 0 | 0 | |0〉 | 1 | (|0〉 + |1〉)/√2 | 0 | |0〉 | 0 | |0〉 | 0 |
| 12 | 0 | 0 | |0〉 | 0 | |0〉 | 0 | |0〉 | 0 | |0〉 | 0 |
| 13 | 1 | 1 | (|0〉 – |1〉)/√2 | 1 | |1〉 | 1 | |1〉 | 1 | (|0〉 – |1〉)/√2 | 1 |
| 14 | 0 | 0 | |0〉 | 0 | |0〉 | 0 | |0〉 | 0 | |0〉 | 0 |
| 15 | 1 | 0 | |1〉 | 0 | |1〉 | 1 | |1〉 | 0 | |1〉 | 1 |
| 16 | 1 | 0 | |1〉 | 1 | (|0〉 – |1〉)/√2 | 1 | |1〉 | 0 | |1〉 | 1 |
In this example, Alice and Bob’s shared secrets differ in five bit positions. Carol’s intervention causes each shared bit to differ with a constant probability (Exercise 8.11). Thus, the more Carol eavesdrops, the more mismatched bits she introduces in the secret shared by Alice and Bob.
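One can extend the simulation idea to include Carol and estimate the disturbance empirically (a sketch following the description above, with Carol forwarding the collapsed qubit |k〉; all names are ours). The mismatch rate it reports for the retained bits is the quantity analysed in Exercise 8.11:

```python
import math, random

S = 1 / math.sqrt(2)

def hadamard(v):
    """Apply H to the qubit state v = (a, b)."""
    return (S * (v[0] + v[1]), S * (v[0] - v[1]))

def measure(v):
    """Measure a|0> + b|1> in the classical basis (real amplitudes)."""
    return 0 if random.random() < v[0] ** 2 else 1

def eavesdropped_round():
    """One iteration with Carol in the middle; returns (i, x, y, j)."""
    i, x, z, y = (random.randint(0, 1) for _ in range(4))
    qubit = (1.0, 0.0) if i == 0 else (0.0, 1.0)
    if x:
        qubit = hadamard(qubit)                  # Alice's optional H
    c = hadamard(qubit) if z else qubit          # Carol's guessed transform
    k = measure(c)                               # Carol reads a bit ...
    d = (1.0, 0.0) if k == 0 else (0.0, 1.0)     # ... and forwards |k> to Bob
    b = hadamard(d) if y else d                  # Bob's optional H
    return i, x, y, measure(b)

random.seed(0)
kept = mismatched = 0
for _ in range(20000):
    i, x, y, j = eavesdropped_round()
    if x == y:                                   # only these rounds are retained
        kept += 1
        mismatched += (i != j)
rate = mismatched / kept
```

The non-zero `rate` is exactly the signature Alice and Bob look for when checking their secrets against each other.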
Once Alice and Bob generate a shared secret of the desired bit length, they can check the equality of their secret values without revealing them. For example, if the shared secret is a 64-bit DES key, Alice can send Bob one or more plaintext–ciphertext pairs generated by the DES algorithm using her shared key. Bob also generates the ciphertexts on Alice’s plaintexts using his secret key. If the ciphertexts generated by Bob differ from those generated by Alice, Bob becomes confident that their shared secrets are different, and that this happened because of the presence of some adversary (or because of communication errors). They then repeat the key-exchange protocol.
Another possible way in which Alice and Bob can gain confidence about the equality of their shared secrets is the use of parity checks. Suppose Alice breaks up her secret into blocks of eight bits, computes the parity bit of each block, and sends these bits to Bob. Bob generates the parity bits on the blocks of his secret and compares the two sets of parity bits. If the shared secrets of Alice and Bob differ, this parity check reveals the difference with high probability.
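The parity check can be tried on the sample secrets of Table 8.2 (a toy sketch; `parity_bits` is our own helper name). Note that the first 8-bit blocks differ in an even number of positions, so their parity bits agree — this is the "with high probability" caveat:

```python
def parity_bits(secret, block=8):
    # One parity bit (the XOR of the bits) per 8-bit block of the secret.
    return [sum(secret[i:i + block]) % 2 for i in range(0, len(secret), block)]

# Alice's and Bob's 16-bit secrets from Table 8.2.
alice = [0,1,1,0, 0,1,1,1, 1,0,0,0, 1,0,1,1]
bob   = [0,1,0,1, 1,1,0,1, 1,1,0,0, 1,0,1,1]

print(parity_bits(alice), parity_bits(bob))
# The second parity bits differ, so the mismatch is detected; the first
# blocks differ in four positions and therefore escape the parity check.
```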
A minor variant of the key-exchange algorithm just described comes with an implementation strategy. The polarization of a photon is measured by an angle θ, 0° ≤ θ < 180°.[1] A photon polarized at an angle θ passes through a φ-filter with probability cos²(φ − θ) and gets absorbed in the filter with probability sin²(φ − θ). Therefore, photons polarized at the angles 0°, 90°, 45°, 135° can be used to represent the quantum states |0〉, |1〉, (|0〉 + |1〉)/√2, (|0〉 − |1〉)/√2, respectively. Alice and Bob use 0°- and 45°-filters. Alice makes a random choice (x) between the two filters. If x = 0, she sends a photon polarized at an angle 0° or 90°. If x = 1, a photon polarized at an angle 45° or 135° is sent. When Bob receives the photon transmitted by Alice, he makes a random guess y. If y = 0, he uses the 0°-filter to detect its polarization, and if y = 1, he uses the 45°-filter. Then, Alice and Bob reveal their choices x and y, and if the two choices agree, they share a common secret bit. See Exercise 8.12 for a mathematical formulation of this strategy.
[1] Ask a physicist!
One of the most startling features of this Bennett–Brassard algorithm (often called the BB84 algorithm) is that there have been successful experimental implementations of the strategy. The first prototype was designed by the authors themselves at the T. J. Watson Research Center. They used a quantum channel of length 32 cm. Using longer channels requires many technological barriers to be overcome. For example, fiber optic cables tend to weaken and may even destroy the polarization of photons. Using boosters to strengthen the signal is impossible in the quantum mechanical world, since doing so produces an effect similar to eavesdropping. Interference patterns (instead of polarization) have been proposed and utilized to build longer quantum channels for key exchange. At present, Stucki et al. [293] hold the record for performing quantum key exchange over an (underwater) channel of length 67 km between Geneva and Lausanne.
| 8.10 | We have exploited the property that H² = I in order to prove the correctness of the quantum key-exchange algorithm. Exercise 8.5 lists some other operators (X and Z) which also satisfy the same property (X² = Z² = I). Can one use one of these transforms in place of H in the quantum key-exchange algorithm? |
| 8.11 | Assume that Carol eavesdrops (in the manner described in the text) during the execution of the quantum key-exchange protocol between Alice and Bob. Derive, for the different choices of i, x and z, the probabilities P_ixz of having i ≠ j in the case x = y (you should find P_ixz = 0 for x = z = 0, and P_ixz = 1/2 otherwise). If all these choices of i, x, z are equally likely, show that the probability that Carol introduces a mismatch (that is, i ≠ j) in a shared bit during a random execution of the key-exchange protocol with x = y is 3/8. (Note that if x = y = z = 0, that is, if the execution of the algorithm proceeds entirely in the classical sense, Carol goes unnoticed. It is the application of the classically meaningless Hadamard transform that introduces the desired security in the protocol.) |
| 8.12 | In the key-exchange algorithm described in the text, Bob (and also Carol) always measures qubits in the classical basis {|0〉, |1〉}. Now, consider the following variant of this algorithm. Alice sends, as before, one of the four qubits |0〉, |1〉, H|0〉, H|1〉, depending on her choice of i and x. Bob, upon receiving the qubit A, generates a random guess y ∈ {0, 1}. If y = 0, Bob measures A in the classical basis, whereas if y = 1, Bob measures A in the basis {H|0〉, H|1〉}. After this, they exchange x and y, and retain/discard the bits as in the original algorithm. |
Quantum parallelism has been effectively exploited to design fast (polynomial-time) algorithms for some of the intractable mathematical problems discussed in Chapter 4. With the availability of quantum computers, cryptographic systems that derive their security from the intractability of these problems will become completely insecure. Nobody, however, has a proof that these intractable problems cannot have fast classical algorithms. It is interesting to wait and see which (if either) is invented first: a quantum computer or a polynomial-time classical algorithm.
Let us set up some terminology for the rest of this chapter. Let P be a unitary operator on a qubit. One can apply P individually to the i-th bit of an n-bit register; in this case, we denote the operation by P_i. If P_i is applied for each i = 1, . . . , n (in succession or simultaneously), we abbreviate P_1 · · · P_n by the shorthand notation P^(n). The parentheses distinguish this operation from P^n, the n-fold application of P to a single qubit.

If P and Q are unitary transforms on n_1- and n_2-bit quantum registers respectively, we let P ⊗ Q denote the unitary transform on an (n_1 + n_2)-bit register, with P operating on the left n_1 bits and Q on the right n_2 bits of the register.
Let N := 2^n for some n ∈ ℕ. Let f : ℕ → ℕ be a periodic function with (least) period r, that is, f(x + kr) = f(x) for every x, k ∈ ℕ. Suppose further that 1 ≪ r ≤ 2^(n/2) and also that f(0), f(1), . . . , f(r − 1) are pairwise distinct. Shor proposed an algorithm for an efficient computation of the period r in this case.
Let us first look at the problem classically. If one evaluates f at randomly chosen points, by the birthday paradox (Exercise 2.172) one requires about O(√r) evaluations of f on an average in order to find two distinct integers x and y with f(x) = f(y). But then r | (x − y). If sufficiently many such pairs (x, y) are available, the period can be obtained by computing the gcd of the differences x − y. If r is large, say, r = O(2^(n/2)), this gives us an algorithm for computing r in expected time exponential in n. Shor’s quantum algorithm determines r in expected time polynomial in n.
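The classical collision strategy can be sketched as follows (a toy illustration with an artificial f; with period 60, the gcd of a dozen collision differences is almost always the period itself):

```python
import math, random

def classical_period(f, bound, collisions=12):
    # Sample f at random points; every collision f(x) = f(y) forces r | (x - y),
    # so the gcd of the collision differences converges rapidly to r.
    seen, g, hits = {}, 0, 0
    while hits < collisions:
        x = random.randrange(bound)
        v = f(x)
        if v in seen and seen[v] != x:
            g = math.gcd(g, abs(x - seen[v]))
            hits += 1
        seen[v] = x
    return g

random.seed(0)
f = lambda x: x % 60            # a toy periodic function with (least) period 60
est = classical_period(f, 10_000)
print(est)                      # a multiple of 60, almost always 60 itself
```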
Let us assume that we have an oracle U_f which, on input the 2n-bit value |x〉_n|y〉_n, computes |x〉_n|f(x) ⊕ y〉_n. We prepare a 2n-bit register A in the state |0〉_n|0〉_n. Then, we apply the Hadamard transform H^(n) on the left n bits. By Exercise 8.8, the state of A becomes

(1/√N) Σ_{x=0}^{N−1} |x〉_n |0〉_n.

Supplying this state as the input to the oracle U_f yields the state

(1/√N) Σ_{x=0}^{N−1} |x〉_n |f(x)〉_n.
We then measure the output register (the right n bits). By the generalized Born rule, we get a value f(x₀) for some x₀ ∈ {0, 1, . . . , r − 1}, and the state of the register A collapses to the uniform superposition of all those |x〉|f(x)〉 for which f(x) = f(x₀). By the given periodicity properties of f, the post-measurement state of the input register (the left n bits) can be written as
Equation 8.1

for some M determined by the relations:

x₀ + (M − 1)r < N ≤ x₀ + Mr.
This is an interesting state, for if we were allowed to make copies of this state and measure the different copies, we could collect some values x₀ + j_1 r, . . . , x₀ + j_k r, which in turn would reveal r with high probability. But the no-cloning theorem disallows making copies of quantum states. Shor proposed a trick to work around this difficulty. He considered the following transform:
Equation 8.2

By Exercise 8.13, F is a unitary transform. F is known as the Fourier transform. Applying F to State (8.1) transforms the input register to the state

(1/√(NM)) Σ_{y=0}^{N−1} Σ_{j=0}^{M−1} e^(2πi(x₀ + jr)y/N) |y〉.
A measurement of this state gives an integer y ∈ {0, 1, . . . , N − 1} with the probability

p_y = (1/(NM)) |Σ_{j=0}^{M−1} e^(2πijry/N)|²

(the overall phase e^(2πix₀y/N) has absolute value 1 and does not affect p_y).
Application of the Fourier transform to State (8.1) helps us to concentrate the probabilities of measurement outcomes in strategic states. More precisely, consider a value y_k = kN/r + ∊_k of y, where −1/2 ≤ ∊_k < 1/2, that is, a value of y close to an integral multiple of N/r. In this case, e^(2πijry_k/N) = e^(2πijk) e^(2πijr∊_k/N) = e^(2πijr∊_k/N), so that

p_{y_k} = (1/(NM)) |Σ_{j=0}^{M−1} e^(2πijr∊_k/N)|².

The last summation is that of a geometric series, and we have

p_{y_k} = (1/(NM)) · sin²(πMr∊_k/N) / sin²(πr∊_k/N).
Now, we use the inequalities (2/π)x ≤ sin x ≤ x for 0 ≤ x ≤ π/2 and the facts that rM ≈ N and that |∊_k| ≤ 1/2 to get

p_{y_k} ≥ (1/(NM)) · (2Mr∊_k/N)² / (πr∊_k/N)² = 4M/(π²N) ≈ 4/(π²r).
Since N/r has about r positive integral multiples less than N, and each such multiple has a closest integer y_k for some k, the probability that we obtain one such y_k as the outcome of the measurement is at least 4/π² = 0.40528 . . . , that is, after O(1) iterations of the above procedure we get some y_k. The Fourier transform thus boosts the likelihood of getting some y_k to a level bounded below by a positive constant.
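These probabilities can be verified numerically for small parameters (a sketch; the choice x₀ = 0 is arbitrary). The total probability mass on the integers y_k closest to the multiples kN/r indeed exceeds 4/π² ≈ 0.405:

```python
import cmath

def measurement_probs(N, r, x0=0):
    # p_y: probability of observing y after the Fourier transform is applied
    # to the post-measurement state (1/sqrt(M)) * sum_j |x0 + j*r>.
    M = len(range(x0, N, r))                 # number of superposed basis states
    probs = []
    for y in range(N):
        amp = sum(cmath.exp(2j * cmath.pi * (x0 + j * r) * y / N)
                  for j in range(M)) / (N * M) ** 0.5
        probs.append(abs(amp) ** 2)
    return probs

N, r = 256, 10
probs = measurement_probs(N, r)
good = sorted({round(k * N / r) % N for k in range(r)})  # y_k nearest to kN/r
print(sum(probs[y] for y in good))           # exceeds 4/pi^2 ~ 0.405
```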
What remains is to show that r can be retrieved from such a useful observation y_k. We have |y_k/N − k/r| = |∊_k|/N ≤ 1/(2N). If a/b and c/d are two distinct rationals with b, d ≤ r and with |a/b − y_k/N| ≤ 1/(2N) and |c/d − y_k/N| ≤ 1/(2N), then by the triangle inequality we have |a/b − c/d| ≤ 1/N. On the other hand, since a/b ≠ c/d, we have |a/b − c/d| = |ad − bc|/(bd) ≥ 1/(bd) ≥ 1/r² > 1/N (recall that r² < N in our applications), a contradiction. Therefore, there is a unique rational k/r with denominator at most r satisfying |k/r − y_k/N| ≤ 1/(2N), and this rational k/r can be determined by efficient classical algorithms, for example, using the continued fraction expansion[2] of y_k/N.
[2] Consult Zuckerman et al. [316] to learn about continued fractions and their applications in approximating real numbers.
If gcd(k, r) = 1, we get r. We can verify this by checking whether f(x) = f(x + r) for a few values of x. If gcd(k, r) > 1, we get a factor of r. Repeating the entire procedure gives another k′/r, from which we get (hopefully) another factor of r (if not r itself). After a few (O(1)) iterations, we obtain r as the lcm of the factors obtained.
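The classical post-processing can be sketched with Python’s `Fraction.limit_denominator`, which performs exactly the continued-fraction approximation just described (toy parameters; the samples y_k are simulated rather than measured):

```python
from fractions import Fraction
from math import lcm

def denom_from_sample(y, N, rmax):
    # Continued-fraction step: the best rational approximation k/r to y/N
    # with denominator at most rmax; its denominator is a divisor of r.
    return Fraction(y, N).limit_denominator(rmax).denominator

N, r = 256, 12                                    # toy values with r^2 < N
samples = [round(k * N / r) for k in (5, 8, 9)]   # simulated outcomes y_k
divisors = [denom_from_sample(y, N, 16) for y in samples]
print(divisors, lcm(*divisors))   # [12, 3, 4] 12 -> r recovered as the lcm
```

The sample y_8 illustrates the gcd(k, r) > 1 case: 8/12 reduces to 2/3, so only the factor 3 of r is revealed; the lcm over several runs restores r.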
Much of the quantum magic is obtained by the use of the Fourier transform F on a suitably prepared quantum register. The question is then how easy it is to implement F. We will not go into the details, but only mention that a circuit consisting of basic quantum gates, of size O(n²), can be used to realize the Fourier transform (cf. Exercise 8.14).
To sum up, we have a polynomial-time (in n) randomized quantum algorithm for computing the period r of f. This leads to efficient quantum algorithms for solving many classically intractable problems of cryptographic significance.
Let m = pq with p, q distinct primes. We have φ(m) = (p − 1)(q − 1). Choose an RSA key pair (e, d) with gcd(e, φ(m)) = 1 and ed ≡ 1 (mod φ(m)). Given a message a ∈ ℤ_m, the ciphertext message is b ≡ a^e (mod m). The task of a cryptanalyst is to compute a from the knowledge of m, e and b. If gcd(b, m) > 1, then this gcd is a non-trivial factor of m. So assume that gcd(b, m) = 1. But then gcd(a, m) = 1 also. Since b ≡ a^e (mod m), b is in the subgroup of ℤ_m* generated by a. Similarly, a ≡ b^d (mod m), that is, a is in the subgroup of ℤ_m* generated by b. It follows that these two subgroups are equal and, in particular, the multiplicative orders of a and b modulo m are the same. This order—call it r—divides φ(m) and hence is ≤ (p − 1)(q − 1) < m.
Choose n ∈ ℕ with N := 2^n ≥ m² > r². The function f : ℕ → ℤ_m sending x ↦ b^x (mod m) is periodic of (least) period r. By Shor’s algorithm, one computes r efficiently. Since gcd(e, φ(m)) = 1 and r | φ(m), we have gcd(e, r) = 1, that is, using the extended gcd algorithm one obtains an integer d′ with d′e ≡ 1 (mod r). But then b^d′ ≡ a^(d′e) ≡ a (mod m).
The private key d is the inverse of e modulo φ(m). It is not necessary to compute d in order to decrypt b; the inverse d′ of e modulo r = ord_m(a) = ord_m(b) suffices.
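A toy run of this attack, with the period-finding step done by brute force in place of Shor’s algorithm (real moduli are, of course, far too large for this):

```python
# Toy attack: decrypt an RSA ciphertext knowing only the order r of b mod m.
m, e = 3233, 17              # m = 61 * 53; gcd(e, phi(m)) = gcd(17, 3120) = 1
a = 1234                     # the plaintext the attacker wants to recover
b = pow(a, e, m)             # the observed ciphertext

r = next(k for k in range(1, m) if pow(b, k, m) == 1)   # r = ord_m(b)
d_prime = pow(e, -1, r)      # inverse of e modulo r, not modulo phi(m)
print(pow(b, d_prime, m))    # 1234: the plaintext, recovered without d
```

Since d′e = 1 + tr for some t and a^r ≡ 1 (mod m), we indeed get b^d′ ≡ a^(1+tr) ≡ a (mod m).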
Let m be a composite integer that we want to factor. Choose a non-zero integer a ∈ ℤ_m. If gcd(a, m) > 1, then we already know a non-trivial factor of m. So assume that gcd(a, m) = 1, that is, a ∈ ℤ_m*. Let r := ord_m(a).
As in the case of breaking RSA, choose n ∈ ℕ with N := 2^n ≥ m² > r². The function f : ℕ → ℤ_m, x ↦ a^x (mod m), is periodic of least period r. Shor’s algorithm computes r. If r is even, we can write:

(a^(r/2) − 1)(a^(r/2) + 1) ≡ a^r − 1 ≡ 0 (mod m).
Since ord_m(a) = r, we have a^(r/2) − 1 ≢ 0 (mod m). If we also have a^(r/2) + 1 ≢ 0 (mod m), then gcd(a^(r/2) + 1, m) is a non-trivial factor of m. It can be shown that for a randomly chosen a, the probability of finding an even r with a^(r/2) + 1 ≢ 0 (mod m) is at least 1/2 (cf. Exercise 4.9). Thus, trying a few integers a, one can factor m.
A variant of Shor’s algorithm in Section 8.4.1 can be used to compute discrete logarithms in the finite field 𝔽_q, q = p^s, p a prime. For the sake of simplicity, let us concentrate only on prime fields (s = 1). Let g be a generator of ℤ_p*. Our task is to compute, for a given a ∈ ℤ_p*, an integer r with a ≡ g^r (mod p). We assume that p is a large prime; in particular, p is odd.
Choose n ∈ ℕ with N := 2^n satisfying p < N < 2p. We use a 3n-bit quantum register A in which the left 2n bits constitute the input part and the right n bits the output part. The input part is initialized to the uniform superposition of all pairs (x, y), 0 ≤ x, y ≤ p − 2, that is, A has the initial state:

(1/(p − 1)) Σ_{x=0}^{p−2} Σ_{y=0}^{p−2} |x〉_n |y〉_n |0〉_n

(see Exercise 8.15).
(see Exercise 8.15). Then, we use an oracle
Uf : |x〉n|y〉n|z〉n ↦ |x〉n|y〉n|f(x, y) ⊕ z〉n
to compute the function f(x, y) := gxa–y (mod p) in the output register. Applying Uf transforms A to the state

Measurement of the output register now gives a value z ≡ g^k (mod p) for some k ∈ {0, 1, . . . , p − 2} and causes the input register to jump to the state

(1/√(p − 1)) Σ_{y=0}^{p−2} |(ry + k) rem (p − 1)〉_n |y〉_n.
Note that g^x a^(−y) ≡ g^k (mod p) if and only if x − ry ≡ k (mod p − 1), that is, only those pairs (x, y) that satisfy this congruence contribute to the post-measurement state. For each value of y modulo p − 1, we get a unique x ≡ ry + k (mod p − 1), that is, there are exactly p − 1 such pairs (x, y).
If we were allowed to make copies of this state and observe two copies separately, we would get pairs (x_1, y_1) and (x_2, y_2) with x_1 − ry_1 ≡ x_2 − ry_2 ≡ k (mod p − 1). Now, if gcd(y_1 − y_2, p − 1) = 1, we would get r ≡ (y_1 − y_2)^(−1)(x_1 − x_2) (mod p − 1). But we are not allowed to copy quantum states. So Shor used his old trick, that is, applied the Fourier transforms

F ⊗ F (one copy of F acting on each n-bit half of the input register)
to obtain the state

A measurement of the input register at this state yields a pair (u, v), 0 ≤ u, v ≤ N − 1, with probability:

Equation 8.3

p_{u,v} = |(1/(N√(p − 1))) Σ_{y=0}^{p−2} e^(2πi(xu + yv)/N)|², where x = (ry + k) rem (p − 1).
As in Shor’s period-finding algorithm, we now need to identify a set of useful pairs (u, v) which are sufficiently many in number, so as to make the probability of observing one of them bounded below by a positive constant. We also need to demonstrate how a useful pair reveals the unknown discrete logarithm r of a. The jugglery with inequalities and approximations is much more involved in this case. Let us still make a patient attempt to see the end of the story.
First, we eliminate x from Equation (8.3). Since x ≡ ry + k (mod p − 1) and 0 ≤ x ≤ p − 2, we have x = (ry + k) rem (p − 1) = ry + k − T(p − 1), where T := ⌊(ry + k)/(p − 1)⌋. But then xu + yv = y(ru + v) + ku − Tu(p − 1). Let j be the integer closest to u(p − 1)/N, that is, u(p − 1) = jN + ∊ with j ∈ ℤ, −N/2 < ∊ ≤ N/2. This yields

Equation 8.4

xu + yv = yS + ku − TjN − T∊,

where

Equation 8.5

S := ru + v.

Since Tj is an integer, substituting Equation (8.4) in Equation (8.3) gives

p_{u,v} = |(1/(N√(p − 1))) Σ_{y=0}^{p−2} e^(2πi(yS + ku − T∊)/N)|².

Writing S = lN + σ with l ∈ ℤ, −N/2 < σ ≤ N/2 then gives

p_{u,v} = |(1/(N√(p − 1))) Σ_{y=0}^{p−2} e^(2πi(yσ − T∊)/N)|².
We now impose the usefulness conditions on u, v:
Equation 8.6

Equation 8.7

Involved calculations show that the probability p_{u,v} for a (u, v) satisfying these two conditions is at least
![]()
Let us now see how many pairs (u, v) satisfy the conditions. From Equation (8.5), it follows that for each u there exists a unique v such that Condition (8.6) is satisfied. Condition (8.7), on the other hand, involves only u. Let w := v₂(p − 1) (the exponent of 2 in p − 1); then 2^w must divide ∊. For each multiple of 2^w not exceeding N/12 in absolute value, we get 2^w distinct solutions for u modulo N. (We are solving for u the congruence u(p − 1) ≡ ∊ (mod 2^n).) There is a total of at least N/12 of them. Therefore, the probability of making any one of the useful observations (u, v) is at least
![]()
since N < 2p.
We finally explain the extraction of r from a useful observation (u, v). Equation (8.5) and Condition (8.6) give ru + v = lN + σ with σ small. Multiplying by (p − 1), dividing throughout by N and using the fact that u(p − 1) = jN + ∊, we get

rj + r∊/N + v(p − 1)/N = l(p − 1) + σ(p − 1)/N,

that is, the fractional part of rj/(p − 1) + v/N must lie between −1/(2(p − 1)) and 1/(2(p − 1)). The measurement of the input register gives us v, and we know N. We approximate −v/N to the nearest multiple λ/(p − 1) of 1/(p − 1) and get rj ≡ λ (mod p − 1). Now, j, being the integer closest to u(p − 1)/N, is also known to us. If gcd(j, p − 1) = 1, we have r ≡ j^(−1)λ (mod p − 1). We do not go into the details of determining the likelihood of the invertibility of j modulo p − 1. A careful analysis shows that Shor’s quantum discrete-log algorithm runs in probabilistic polynomial time (in n).
| 8.13 | Let F be the Fourier transform (8.2). For basis vectors |x〉 and |x′〉, show that 〈Fx′|Fx〉 equals 1 if x = x′ and 0 otherwise (so that F is a unitary transform). |
| 8.14 | Let N = 2^n, and let x, y ∈ {0, 1, . . . , N − 1} have the binary expansions (x_(n−1) · · · x_1 x_0)_2 and (y_(n−1) · · · y_1 y_0)_2 respectively. |
| 8.15 | Let n ∈ ℕ, N := 2^n and f : {0, 1, . . . , N − 1} → {0, 1}. Consider an (n + 1)-bit quantum register with the input consisting of the left n bits and the output the rightmost bit. Suppose there is an oracle U_f that takes an n-bit input x and outputs the bit f(x). First prepare the register in the state . . . |
| 8.16 | Recall that the Fourier transform (8.2) is defined for N equal to a power of 2. It turns out that for such values of N the quantum Fourier transform is easy to implement. For this exercise, assume hypothetically that one can efficiently implement F for other values of N too. In particular, take N = p − 1 in Shor’s quantum discrete-log algorithm. Show that in this case, the probability p_{u,v} of Equation (8.3) becomes 1/(p − 1) if ru + v ≡ 0 (mod p − 1), and 0 otherwise. Conclude that an outcome (u, v) of measuring the input register yields r ≡ −u^(−1)v (mod p − 1), provided gcd(u, p − 1) = 1. |
This chapter is a gentle introduction to the recent applications of quantum computation in public-key cryptography. These developments have both good and bad consequences for cryptologists. Whether a quantum computer can ever be manufactured remains a big open question, so at present the study of quantum cryptology is mostly theoretical in nature.
Quantum mechanics is governed by a set of four axioms that define a system and prescribe the properties of a system. A quantum bit (qubit) is a quantum mechanical system that has two orthogonal states |0〉 and |1〉. A quantum register is a collection of qubits of a fixed size.
As an example of what we can gain by using quantum algorithms, we first describe the Deutsch algorithm that determines whether a function f : {0, 1} → {0, 1} is constant by invoking f only once. A classical algorithm requires two invocations.
Next, we present the BB84 algorithm for key exchange over a quantum mechanical channel. Its security rests on quantum mechanics itself rather than on any computational assumption, and eavesdropping can be detected. This algorithm has been implemented in hardware, and key agreement has been carried out over a channel of length 67 km.
Finally, we describe Shor’s polynomial-time quantum algorithms for factoring integers and for computing discrete logarithms in finite fields. These algorithms are based on a technique called quantum Fourier transform.
If quantum computers can ever be realized, RSA and most other popular cryptosystems described and not described in this book will forfeit all security guarantees. And what will happen to this book? If you don’t possess a copy of this wonderful book, just rush to your nearest book store now—they have not yet mastered the quantum technology!
There was a time when the newspapers said that only twelve men understood the theory of relativity. I do not believe there ever was such a time . . . On the other hand, I think I can safely say that nobody understands quantum mechanics.
—Richard Feynman, The Character of Physical Law, BBC, 1965
Quantum mechanics came into existence when Werner Heisenberg, at the age of 25, proposed the uncertainty principle in 1927. It created an immediate stir in the physics community. Eventually Heisenberg and Niels Bohr came up with an interpretation of quantum mechanics, known as the Copenhagen interpretation. While many physicists (like Max Born, Wolfgang Pauli and John von Neumann) subscribed to this interpretation, many other eminent ones (including Albert Einstein, Erwin Schrödinger, Max Planck and Bertrand Russell) did not. Interested readers may consult the textbooks by Sakurai [255] and Schiff [258] to study this fascinating area of fundamental science.[3]
[3] Well! We are not physicists. These books are followed in graduate and advanced undergraduate courses in many institutes and universities.
For a comprehensive treatment of quantum computation (including cryptographic and cryptanalytic quantum algorithms), we refer the reader to the book by Nielsen and Chuang [218]. Mermin’s paper [197] and course notes [198] are also good sources for learning quantum mechanics and computation, and are suitable for computer scientists. Preskill’s course notes [244] are also useful, though a bit more physics-oriented. The very readable article [243] by Preskill on the realizability of quantum computers is also worth mentioning in this context. The first known quantum algorithm is due to Deutsch [75].
Bennett and Brassard’s quantum key-exchange algorithm (BB84) appeared in [20]. The implementation due to Stucki et al. of this algorithm is reported in [293].
Shor’s polynomial-time quantum factorization and discrete-log algorithms are described in [271]. All the details missing in Section 8.4.4 can be found in this paper. No polynomial-time quantum algorithms are known to solve the elliptic curve discrete logarithm problem. Proos and Zalka [245] present an extension of Shor’s algorithm for a special class of elliptic curves. See [146] for an adaptation of this algorithm applicable to fields of characteristic 2.
| A.1 | Introduction |
| A.2 | Block Ciphers |
| A.3 | Stream Ciphers |
| A.4 | Hash Functions |
Sour, sweet, bitter, pungent, all must be tasted.
—Chinese Proverb
Unless we change direction, we are likely to end up where we are going.
—Anonymous
Not everything that can be counted counts, and not everything that counts can be counted.
—Albert Einstein
Cryptography, today, cannot bank solely on public-key (that is, asymmetric) algorithms. Secret-key (that is, symmetric) techniques also have important roles to play. This chapter is an attempt to introduce the reader to some rudimentary notions of symmetric cryptography. The sketchy account that follows lacks both the depth and the breadth of a comprehensive treatment. Given the focus of this book, Appendix A could have been omitted. Nonetheless, some attention to the symmetric technology is never irrelevant for a book on cryptology.
It remains debatable whether hash functions can be treated under the banner of this chapter—a hash function need not even use a key. If the reader is willing to accept symmetric as an abbreviation for not asymmetric, some justifications can perhaps be given. How does it matter anyway?
Block ciphers encrypt plaintext messages in blocks of fixed length and are used even more ubiquitously than public-key encryption routines. In a sense, public-key encryption is also block encryption. Since public-key routines are much slower than (secret-key) block ciphers, it is customary to use public-key algorithms only in specific situations, for example, for encrypting single blocks of data, like keys of symmetric ciphers.
In the rest of this chapter, we use the word bit in the conventional sense, that is, to denote a quantity that can take only two possible values, 0 and 1. It is convenient to use the symbol 𝔹 to refer to the set {0, 1}. We also let 𝔹^m stand for the set of all bit strings of length m. Whenever we plan to refer to the field (or group) structure of 𝔹 (or 𝔹^m), we will use the alternative notation 𝔽_2 (or 𝔽_2^m).
|
A block cipher f of block-size n and of key-size r is a map f : 𝔹^n × 𝔹^r → 𝔹^n that encrypts a plaintext block m of bit length n to a ciphertext block c of bit length n under a key K, a bit string of length r. To ensure unique decryption, the map f_K : 𝔹^n → 𝔹^n, m ↦ f(m, K), for a fixed key K has to be a permutation of (that is, a bijective function on) 𝔹^n. |
A good block cipher has the following desirable properties:
The sizes n and r should be big enough, so that an adversary cannot exhaustively check all possibilities of m or K in feasible time.
For most, if not all, keys K, the permutations f_K should be sufficiently random. In other words, if the key K is not known, it should be computationally infeasible to guess the functions f_K and f_K^(−1). That is, it should be difficult to guess c from m or m from c, unless the key K is provided. The identity map on 𝔹^n, though a permutation of 𝔹^n, is a bad candidate for an encryption function f_K. It is also desirable that the functions f_K for different values of K are unpredictably selected from the set of all permutations of 𝔹^n. Thus, for example, taking f_K to be a fixed permutation for all choices of K leads to a poor design of a block cipher f.
For most, if not all, pairs of distinct keys K_1 and K_2, the composition g_{K_1} ∘ g_{K_2} should not equal g_K for any key K, where g stands for f or f^(−1) (with independent choices in the three uses). A more stringent demand is that the subgroup generated by the permutations f_K for all possible keys K should be a very big subset of the group of all permutations of 𝔹^n. If g_K = g_{K_1} ∘ g_{K_2} ∘ · · · ∘ g_{K_t} for some t ≥ 2, multiple encryption (see Section A.3) forfeits its expected benefits.
A block cipher provably possessing all these good characteristics (in particular, the randomness properties) is difficult to construct in practice. Practical block ciphers are designed for reasonably big n and r and come with the hope of representing reasonably unpredictable permutations. We dub a block cipher good or safe if it stands the test of time. Table A.1 lists some widely used block ciphers.
| Name | n | r |
|---|---|---|
| DES (Data Encryption Standard) | 64 | 56 |
| FEAL (Fast Data Encipherment Algorithm) | 64 | 64 |
| SAFER (Secure And Fast Encryption Routine) | 64 | 64 |
| IDEA (International Data Encryption Algorithm) | 64 | 128 |
| Blowfish | 64 | ≤ 448 |
| Rijndael, accepted as AES (Advanced Encryption Standard) by NIST (National Institute of Standards and Technology, a US government organization) | 128/192/256 | 128/192/256 |
The data encryption standard (DES) was proposed as a federal information processing standard (FIPS) in 1975. DES has been the most popular and the most widely used among all block ciphers ever designed. Although its relatively small key-size offers questionable security under today’s computing power, DES still enjoys large-scale deployment in not-so-serious cryptographic applications.
DES encryption requires a 64-bit plaintext block m and a 56-bit key K.[1] We use the notations DES_K and DES_K^(−1) to stand respectively for the DES encryption and decryption functions under the key K.
[1] A DES key K = k1k2 . . . k64 is actually a 64-bit string. Only 56 bits of K are used for encryption. The remaining 8 bits are used as parity-check bits. Specifically, for each i = 1, . . . , 8 the bit k8i is adjusted so that the i-th byte (k8i – 7k8i – 6 . . . k8i) has an odd number of one-bits.
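The parity-adjustment rule of this footnote can be sketched as follows (`adjust_parity` is our own helper name; the sample key is arbitrary):

```python
def adjust_parity(key8):
    # Set bit k_{8i} of each byte so that the byte has an odd number of one-bits.
    out = bytearray()
    for byte in key8:
        b = byte & 0xFE                           # clear the parity (last) bit
        ones = bin(b).count("1")
        out.append(b | (0 if ones % 2 else 1))    # make the total count odd
    return bytes(out)

key = adjust_parity(bytes.fromhex("133457799BBCDFF0"))
assert all(bin(b).count("1") % 2 == 1 for b in key)
print(key.hex())   # 133457799bbcdff1: only the last parity bit changed
```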
The DES algorithm first computes sixteen 48-bit keys K1, K2, . . . , K16 from K using a procedure known as the DES key schedule described in Algorithm A.1. These 16 keys are used in the 16 rounds of encryption. The key schedule uses two fixed permutations PC1 and PC2 described after Algorithm A.1 and to be read in the row-major order. Here, PC is an abbreviation for permuted choice.
|
Input: A DES key K = k1k2 . . . k64 (containing the parity-check bits). Output: Sixteen 48-bit round keys K1, K2, . . . , K16. Steps: Use PC1 to generate the 56-bit string C0D0 := PC1(K), where C0 and D0 are blocks of 28 bits each. For i = 1, 2, . . . , 16, compute Ci := LSi(Ci – 1), Di := LSi(Di – 1) and Ki := PC2(CiDi), where LSi denotes a cyclic left shift by one position for i = 1, 2, 9, 16 and by two positions for the other values of i. |
| PC1 | ||||||
|---|---|---|---|---|---|---|
| 57 | 49 | 41 | 33 | 25 | 17 | 9 |
| 1 | 58 | 50 | 42 | 34 | 26 | 18 |
| 10 | 2 | 59 | 51 | 43 | 35 | 27 |
| 19 | 11 | 3 | 60 | 52 | 44 | 36 |
| 63 | 55 | 47 | 39 | 31 | 23 | 15 |
| 7 | 62 | 54 | 46 | 38 | 30 | 22 |
| 14 | 6 | 61 | 53 | 45 | 37 | 29 |
| 21 | 13 | 5 | 28 | 20 | 12 | 4 |
| PC2 | |||||
|---|---|---|---|---|---|
| 14 | 17 | 11 | 24 | 1 | 5 |
| 3 | 28 | 15 | 6 | 21 | 10 |
| 23 | 19 | 12 | 4 | 26 | 8 |
| 16 | 7 | 27 | 20 | 13 | 2 |
| 41 | 52 | 31 | 37 | 47 | 55 |
| 30 | 40 | 51 | 45 | 33 | 48 |
| 44 | 49 | 39 | 56 | 34 | 53 |
| 46 | 42 | 50 | 36 | 29 | 32 |
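The key schedule can be sketched in code using the PC1 and PC2 tables above. The per-round shift amounts used below are the standard DES values (single shifts in rounds 1, 2, 9 and 16, double shifts otherwise):

```python
PC1 = [57,49,41,33,25,17,9, 1,58,50,42,34,26,18,
       10,2,59,51,43,35,27, 19,11,3,60,52,44,36,
       63,55,47,39,31,23,15, 7,62,54,46,38,30,22,
       14,6,61,53,45,37,29, 21,13,5,28,20,12,4]
PC2 = [14,17,11,24,1,5, 3,28,15,6,21,10,
       23,19,12,4,26,8, 16,7,27,20,13,2,
       41,52,31,37,47,55, 30,40,51,45,33,48,
       44,49,39,56,34,53, 46,42,50,36,29,32]
SHIFTS = [1,1,2,2,2,2,2,2,1,2,2,2,2,2,2,1]

def key_schedule(key_bits):
    # key_bits: list of the 64 key bits k1..k64 (parity bits included;
    # PC1 simply never selects them).
    cd = [key_bits[i - 1] for i in PC1]          # 56 selected bits
    C, D = cd[:28], cd[28:]
    round_keys = []
    for s in SHIFTS:
        C, D = C[s:] + C[:s], D[s:] + D[:s]      # cyclic left shifts
        cd = C + D
        round_keys.append([cd[i - 1] for i in PC2])
    return round_keys

key = [int(b) for b in bin(0x133457799BBCDFF1)[2:].zfill(64)]
ks = key_schedule(key)
print(len(ks), len(ks[0]))   # 16 round keys of 48 bits each
```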
DES encryption, as described in Algorithm A.2, proceeds in 16 rounds. The i-th round uses the key Ki (obtained from the key schedule) in tandem with the encryption primitive e. A fixed permutation IP and its inverse IP–1 are also used.[2]
[2] A block cipher that executes several encryption rounds with the i-th round computing the two halves as Li := Ri – 1 and Ri := Li – 1 ⊕ e(Ri – 1, Ki) for some round key Ki and for some encryption primitive e, is called a Feistel cipher. Most popular block ciphers mentioned earlier are of this type. Rijndael is an exception, and its acceptance as the new standard has been interpreted as an end of the Feistel dynasty.
It requires a specification of the round encryption function e to complete the description of DES encryption. The function e can be compactly depicted as:
e(X, J) := P(S(E(X) ⊕ J)),
|
Input: Plaintext block m = m1m2 . . . m64 and the round keys K1, . . . , K16. Output: The ciphertext block c = DES_K(m). Steps: Apply the initial permutation on m to get L0R0 := IP(m), where L0 and R0 are blocks of 32 bits each. For i = 1, 2, . . . , 16, compute Li := Ri – 1 and Ri := Li – 1 ⊕ e(Ri – 1, Ki). Output c := IP–1(R16L16) (note the swapping of the two halves). |
| IP | |||||||
|---|---|---|---|---|---|---|---|
| 58 | 50 | 42 | 34 | 26 | 18 | 10 | 2 |
| 60 | 52 | 44 | 36 | 28 | 20 | 12 | 4 |
| 62 | 54 | 46 | 38 | 30 | 22 | 14 | 6 |
| 64 | 56 | 48 | 40 | 32 | 24 | 16 | 8 |
| 57 | 49 | 41 | 33 | 25 | 17 | 9 | 1 |
| 59 | 51 | 43 | 35 | 27 | 19 | 11 | 3 |
| 61 | 53 | 45 | 37 | 29 | 21 | 13 | 5 |
| 63 | 55 | 47 | 39 | 31 | 23 | 15 | 7 |
| IP–1 | |||||||
|---|---|---|---|---|---|---|---|
| 40 | 8 | 48 | 16 | 56 | 24 | 64 | 32 |
| 39 | 7 | 47 | 15 | 55 | 23 | 63 | 31 |
| 38 | 6 | 46 | 14 | 54 | 22 | 62 | 30 |
| 37 | 5 | 45 | 13 | 53 | 21 | 61 | 29 |
| 36 | 4 | 44 | 12 | 52 | 20 | 60 | 28 |
| 35 | 3 | 43 | 11 | 51 | 19 | 59 | 27 |
| 34 | 2 | 42 | 10 | 50 | 18 | 58 | 26 |
| 33 | 1 | 41 | 9 | 49 | 17 | 57 | 25 |
where E : 𝔹^32 → 𝔹^48 is an expansion function, S : 𝔹^48 → 𝔹^32 is a contraction function, and P is a fixed permutation of 𝔹^32 (called the permutation function). S uses eight S-boxes (substitution boxes) S1, S2, . . . , S8. Each S-box Sj is a 4 × 16 matrix with each row a permutation of 0, 1, 2, . . . , 15 and is used to convert a 6-bit string y1y2y3y4y5y6 to a 4-bit string z1z2z3z4 as follows. Let μ denote the integer with binary representation (y1y6)_2 and ν the integer with binary representation (y2y3y4y5)_2. Then, z1z2z3z4 is the 4-bit binary representation of the (μ, ν)-th entry of the matrix Sj. (Here, the numbering of the rows and columns starts from 0.) In this case, we write Sj(y1y2y3y4y5y6) = z1z2z3z4. Algorithm A.3 provides the description of e.
|
Input: X ∈ 𝔹^32 and J ∈ 𝔹^48. Output: e(X, J). Steps: Y := E(X) ⊕ J (where E(x1x2 . . . x32) = x32x1x2 . . . x32x1). Write Y = Y1Y2 . . . Y8, where each Yi is a block of six bits. For i = 1, 2, . . . , 8, compute Zi := Si(Yi). Output e(X, J) := P(Z1Z2 . . . Z8). |
The tables for E and P are as follows.
| E | |||||
|---|---|---|---|---|---|
| 32 | 1 | 2 | 3 | 4 | 5 |
| 4 | 5 | 6 | 7 | 8 | 9 |
| 8 | 9 | 10 | 11 | 12 | 13 |
| 12 | 13 | 14 | 15 | 16 | 17 |
| 16 | 17 | 18 | 19 | 20 | 21 |
| 20 | 21 | 22 | 23 | 24 | 25 |
| 24 | 25 | 26 | 27 | 28 | 29 |
| 28 | 29 | 30 | 31 | 32 | 1 |
| P | |||
|---|---|---|---|
| 16 | 7 | 20 | 21 |
| 29 | 12 | 28 | 17 |
| 1 | 15 | 23 | 26 |
| 5 | 18 | 31 | 10 |
| 2 | 8 | 24 | 14 |
| 32 | 27 | 3 | 9 |
| 19 | 13 | 30 | 6 |
| 22 | 11 | 4 | 25 |
Finally, the eight S-boxes are presented:
| S1 | |||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 14 | 4 | 13 | 1 | 2 | 15 | 11 | 8 | 3 | 10 | 6 | 12 | 5 | 9 | 0 | 7 |
| 0 | 15 | 7 | 4 | 14 | 2 | 13 | 1 | 10 | 6 | 12 | 11 | 9 | 5 | 3 | 8 |
| 4 | 1 | 14 | 8 | 13 | 6 | 2 | 11 | 15 | 12 | 9 | 7 | 3 | 10 | 5 | 0 |
| 15 | 12 | 8 | 2 | 4 | 9 | 1 | 7 | 5 | 11 | 3 | 14 | 10 | 0 | 6 | 13 |
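As an illustration of the indexing rule, a lookup into S1 above (a sketch; `sbox` is our own helper name):

```python
S1 = [[14,4,13,1,2,15,11,8,3,10,6,12,5,9,0,7],
      [0,15,7,4,14,2,13,1,10,6,12,11,9,5,3,8],
      [4,1,14,8,13,6,2,11,15,12,9,7,3,10,5,0],
      [15,12,8,2,4,9,1,7,5,11,3,14,10,0,6,13]]

def sbox(S, y):
    # y is a 6-bit string y1..y6; row = (y1 y6)_2, column = (y2 y3 y4 y5)_2.
    row = int(y[0] + y[5], 2)
    col = int(y[1:5], 2)
    return format(S[row][col], "04b")

print(sbox(S1, "011011"))   # row 01 = 1, column 1101 = 13 -> entry 5 -> 0101
```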
| S2 | |||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 15 | 1 | 8 | 14 | 6 | 11 | 3 | 4 | 9 | 7 | 2 | 13 | 12 | 0 | 5 | 10 |
| 3 | 13 | 4 | 7 | 15 | 2 | 8 | 14 | 12 | 0 | 1 | 10 | 6 | 9 | 11 | 5 |
| 0 | 14 | 7 | 11 | 10 | 4 | 13 | 1 | 5 | 8 | 12 | 6 | 9 | 3 | 2 | 15 |
| 13 | 8 | 10 | 1 | 3 | 15 | 4 | 2 | 11 | 6 | 7 | 12 | 0 | 5 | 14 | 9 |
| S3 | |||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 10 | 0 | 9 | 14 | 6 | 3 | 15 | 5 | 1 | 13 | 12 | 7 | 11 | 4 | 2 | 8 |
| 13 | 7 | 0 | 9 | 3 | 4 | 6 | 10 | 2 | 8 | 5 | 14 | 12 | 11 | 15 | 1 |
| 13 | 6 | 4 | 9 | 8 | 15 | 3 | 0 | 11 | 1 | 2 | 12 | 5 | 10 | 14 | 7 |
| 1 | 10 | 13 | 0 | 6 | 9 | 8 | 7 | 4 | 15 | 14 | 3 | 11 | 5 | 2 | 12 |
| S4 | |||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 7 | 13 | 14 | 3 | 0 | 6 | 9 | 10 | 1 | 2 | 8 | 5 | 11 | 12 | 4 | 15 |
| 13 | 8 | 11 | 5 | 6 | 15 | 0 | 3 | 4 | 7 | 2 | 12 | 1 | 10 | 14 | 9 |
| 10 | 6 | 9 | 0 | 12 | 11 | 7 | 13 | 15 | 1 | 3 | 14 | 5 | 2 | 8 | 4 |
| 3 | 15 | 0 | 6 | 10 | 1 | 13 | 8 | 9 | 4 | 5 | 11 | 12 | 7 | 2 | 14 |
| S5 | |||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | 12 | 4 | 1 | 7 | 10 | 11 | 6 | 8 | 5 | 3 | 15 | 13 | 0 | 14 | 9 |
| 14 | 11 | 2 | 12 | 4 | 7 | 13 | 1 | 5 | 0 | 15 | 10 | 3 | 9 | 8 | 6 |
| 4 | 2 | 1 | 11 | 10 | 13 | 7 | 8 | 15 | 9 | 12 | 5 | 6 | 3 | 0 | 14 |
| 11 | 8 | 12 | 7 | 1 | 14 | 2 | 13 | 6 | 15 | 0 | 9 | 10 | 4 | 5 | 3 |
| S6 | |||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 12 | 1 | 10 | 15 | 9 | 2 | 6 | 8 | 0 | 13 | 3 | 4 | 14 | 7 | 5 | 11 |
| 10 | 15 | 4 | 2 | 7 | 12 | 9 | 5 | 6 | 1 | 13 | 14 | 0 | 11 | 3 | 8 |
| 9 | 14 | 15 | 5 | 2 | 8 | 12 | 3 | 7 | 0 | 4 | 10 | 1 | 13 | 11 | 6 |
| 4 | 3 | 2 | 12 | 9 | 5 | 15 | 10 | 11 | 14 | 1 | 7 | 6 | 0 | 8 | 13 |
| S7 | |||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4 | 11 | 2 | 14 | 15 | 0 | 8 | 13 | 3 | 12 | 9 | 7 | 5 | 10 | 6 | 1 |
| 13 | 0 | 11 | 7 | 4 | 9 | 1 | 10 | 14 | 3 | 5 | 12 | 2 | 15 | 8 | 6 |
| 1 | 4 | 11 | 13 | 12 | 3 | 7 | 14 | 10 | 15 | 6 | 8 | 0 | 5 | 9 | 2 |
| 6 | 11 | 13 | 8 | 1 | 4 | 10 | 7 | 9 | 5 | 0 | 15 | 14 | 2 | 3 | 12 |
| S8 | |||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 13 | 2 | 8 | 4 | 6 | 15 | 11 | 1 | 10 | 9 | 3 | 14 | 5 | 0 | 12 | 7 |
| 1 | 15 | 13 | 8 | 10 | 3 | 7 | 4 | 12 | 5 | 6 | 11 | 0 | 14 | 9 | 2 |
| 7 | 11 | 4 | 1 | 9 | 12 | 14 | 2 | 0 | 6 | 10 | 13 | 15 | 3 | 5 | 8 |
| 2 | 1 | 14 | 7 | 4 | 10 | 8 | 13 | 15 | 12 | 9 | 0 | 3 | 5 | 6 | 11 |
DES decryption is analogous to DES encryption. To obtain m from c,
one first computes the round keys K1, K2, . . . , K16 using Algorithm A.1. One then calls a minor variant of Algorithm A.2. First, the roles of m and c are interchanged. That is, one inputs c instead of m, and obtains m in place of c as output. Moreover, the right half Ri in the i-th round is computed as Ri := Li–1 ⊕ e(Ri–1, K17–i). In other words, DES decryption is the same as DES encryption, only with the sequence of using the keys K1, K2, . . . , K16 reversed. Solve Exercise A.1 in order to establish the correctness of this decryption procedure.
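The fact that reversing the key schedule inverts a Feistel network can be checked on a generic toy Feistel cipher; the round function g below is an arbitrary stand-in (not the DES function e), which is the point: g need not even be invertible.

```python
# Toy Feistel network (an illustration, not DES itself): encryption with
# round keys k1..k16 is undone by running the same loop with the keys reversed.

def g(half, key):                            # arbitrary stand-in round function
    return (half * 31 + key) & 0xFFFFFFFF    # need not be invertible

def feistel(block, keys):
    L, R = block
    for k in keys:
        L, R = R, L ^ g(R, k)
    return (R, L)                            # final swap, as in DES

keys = list(range(1, 17))
m = (0x01234567, 0x89ABCDEF)
c = feistel(m, keys)
assert feistel(c, keys[::-1]) == m  # decryption = encryption, keys reversed
```

Each decryption round cancels the XOR introduced by the corresponding encryption round, because g is evaluated on the half that was passed through unchanged.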
Some test vectors for DES are given in Table A.2.
| Key | Plaintext block | Ciphertext block |
|---|---|---|
| 0101010101010101 | 0000000000000000 | 8ca64de9c1b123a7 |
| fefefefefefefefe | ffffffffffffffff | 7359b2163e4edc58 |
| 3000000000000000 | 1000000000000001 | 958e6e627a05557b |
| 1111111111111111 | 1111111111111111 | f40379ab9e0ec533 |
| 0123456789abcdef | 1111111111111111 | 17668dfc7292532d |
| 1111111111111111 | 0123456789abcdef | 8a5ae1f81ab8f2dd |
| fedcba9876543210 | 0123456789abcdef | ed39d950fa74bcc4 |
DES, being a popular block cipher, has been subjected to extensive cryptanalysis. At present, linear cryptanalysis and differential cryptanalysis are the most sophisticated attacks on DES. But the biggest problem with DES is its relatively small key size (56 bits). An exhaustive key search for a given plaintext–ciphertext pair requires carrying out a maximum of 2^56 encryptions in order to obtain the correct key. But how big is this number 2^56 = 72,057,594,037,927,936 (nearly 72 quadrillion) in a cryptographic sense?
In order to answer this question, RSA Security Inc. posed several challenges for obtaining the DES key from given plaintext–ciphertext pairs. The first challenge, posed in January 1997, was broken by Rocke Verser of Loveland, Colorado, with approximately 96 days of computing. DES Challenge II-1 was broken in February 1998 by distributed.net with 41 days of computing, and DES Challenge II-2 was cracked in July 1998 by the Electronic Frontier Foundation (EFF) in just 56 hours. Finally, DES Challenge III was broken in a record 22 hours 15 minutes in January 1999. The computations were carried out on EFF's special-purpose key-search machine Deep Crack, with collaborative efforts from nearly 10^5 PCs on the Internet coordinated by distributed.net. These figures demonstrate that DES offers hardly any security against a motivated adversary.
Another problem with DES is that its design criteria (most importantly, the objectives behind choosing the particular S-boxes) were never made public. Chances remain that there are hidden backdoors, though none has been discovered to date.
The advanced encryption standard (AES) [219] has superseded the older standard DES. The Rijndael cipher, designed by Daemen and Rijmen, has been accepted as the advanced standard. As mentioned in Footnote 2, Rijndael is not a Feistel cipher. Its working is based on the arithmetic in the finite field F_{2^8} of 256 elements and in the finite ring A := F_{2^8}[Y]/〈Y^4 + 1〉.
AES encrypts data in blocks of 128 bits. Let B = b0b1 . . . b127 be a block of data, where each bi is a bit. Keeping in view typical 32-bit processors, each such block B is represented as a sequence of four 32-bit words, that is, B = B0B1B2B3, where Bi represents the bit string b32i b32i+1 . . . b32i+31. Each word C = c0c1 . . . c31, in turn, is viewed as a sequence of four octets, that is, C = C0C1C2C3, where Ci stores the bit string c8i c8i+1 . . . c8i+7. Each octet is identified with an element of F_{2^8}, whereas an entire 32-bit word is identified with an element of A.
The field F_{2^8} is represented as F_2[X]/〈f(X)〉, where f(X) is the irreducible polynomial X^8 + X^4 + X^3 + X + 1. Let x := X + 〈f(X)〉. The element d7x^7 + d6x^6 + · · · + d1x + d0 is identified with the octet d7d6 . . . d1d0. Thus, the i-th octet c8i c8i+1 . . . c8i+7 in a word is treated as the finite field element c8i x^7 + c8i+1 x^6 + · · · + c8i+6 x + c8i+7.
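The multiplication ⊙ of F_{2^8} amounts to carry-free (XOR-based) polynomial multiplication followed by reduction modulo f(X). A minimal Python sketch of this arithmetic (an illustration, not the book's implementation) encodes an octet as an integer in 0..255:

```python
# Multiplication in F_{2^8} = F_2[X]/<X^8 + X^4 + X^3 + X + 1>, with octets
# encoded as integers 0..255 (bit i of the integer = coefficient of x^i).

def gf_mul(a, b):
    prod = 0
    for _ in range(8):
        if b & 1:
            prod ^= a            # add (XOR) the current shift of a
        b >>= 1
        carry = a & 0x80
        a = (a << 1) & 0xFF      # multiply a by x
        if carry:
            a ^= 0x1B            # reduce modulo x^8 + x^4 + x^3 + x + 1
    return prod

print(hex(gf_mul(0x57, 0x83)))   # 0xc1
```

The example 57 ⊙ 83 = c1 is the widely published test value for this field, not a computation taken from the book.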
Now, let us explain the interpretation of a 32-bit word C = C0C1C2C3. The F_{2^8}-algebra A = F_{2^8}[Y]/〈Y^4 + 1〉 is not a field, since the polynomial Y^4 + 1 is reducible (over F_2 and so over F_{2^8}). However, each element β of A can be uniquely expressed as a polynomial β = α3y^3 + α2y^2 + α1y + α0, where y := Y + 〈Y^4 + 1〉 and where each αi is an element of F_{2^8}. As described in the last paragraph, each αi is represented as an octet. We take Ci to be the octet representing α3–i, that is, the 32-bit word α3α2α1α0 stands for the element α3y^3 + α2y^2 + α1y + α0.
F_{2^8} and A are rings and hence equipped with arithmetic operations (addition and multiplication). These operations are different from the usual addition and multiplication operations defined on octets and words. For example, the addition of two octets or words under the AES interpretation is the same as bit-wise XOR of the octets or words. The AES multiplication of octets and words, on the other hand, involves polynomial arithmetic and reduction modulo the defining polynomials and so cannot be expressed as simply as addition. To avoid ambiguities, let us denote the multiplication of F_{2^8} by ⊙ and that of A by ⊗, whereas regular multiplication symbols (·, × and juxtaposition) stand for the standard multiplication on octets or words. Exercises A.5, A.6 and A.7 discuss efficient implementations of the arithmetic in F_{2^8} and A.
Every non-zero element α of F_{2^8} is invertible; the inverse is denoted by α^–1 and can be computed by the extended gcd algorithm on polynomials over F_2. With an abuse of notation, we take 0^–1 := 0. Not every non-zero element of A is invertible (under the multiplication of A). The AES algorithm uses the following invertible element β := 03010102 (in hex notation); its inverse is β^–1 = 0b0d090e.
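Besides the extended gcd, the inverse in F_{2^8} can also be obtained by exponentiation: since the multiplicative group has order 255, α^255 = 1 for non-zero α, so α^–1 = α^254. A sketch (my illustration, not the book's algorithm), reusing a shift-and-XOR multiplier:

```python
# Inversion in F_{2^8} via alpha^(-1) = alpha^254 (the multiplicative
# group of the field has order 255), using square-and-multiply.

def gf_mul(a, b):
    prod = 0
    for _ in range(8):
        if b & 1:
            prod ^= a
        b >>= 1
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1B            # reduce modulo x^8 + x^4 + x^3 + x + 1
    return prod

def gf_inv(a):
    if a == 0:
        return 0                 # the convention 0^(-1) := 0
    result, base, e = 1, a, 254
    while e:                     # square-and-multiply
        if e & 1:
            result = gf_mul(result, base)
        base = gf_mul(base, base)
        e >>= 1
    return result

print(hex(gf_inv(0x53)))         # 0xca, since {53} . {ca} = {01}
```

The check value (inverse of 53 is ca) is the standard published example for this field, not taken from the book.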
The AES algorithm uses an object called a state, comprising 16 octets arranged in a 4 × 4 array. Each message block also consists of 16 octets. Let M = μ0μ1 . . . μ15 be a message block (of 16 octets). This block is translated to a state as follows:
Equation A.1
s_{r,c} := μ_{4c+r}, 0 ≤ r ≤ 3, 0 ≤ c ≤ 3.
Thus, each word in the block is relocated in a column of the state. At the end of the encryption procedure, AES makes the reverse translation of a state to a block:
Equation A.2
γ_{4c+r} := s_{r,c}, 0 ≤ r ≤ 3, 0 ≤ c ≤ 3.
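The column-major filling of the state described above can be sketched as follows (an illustration, not the book's code):

```python
# Block <-> state translation: octet mu_i goes to row i mod 4, column i div 4,
# so each 32-bit word (4 octets) of the block fills one column of the state.

def block_to_state(block):                  # block: list of 16 octets
    return [[block[4 * c + r] for c in range(4)] for r in range(4)]

def state_to_block(state):
    return [state[r][c] for c in range(4) for r in range(4)]

block = list(range(16))
state = block_to_state(block)
assert state[2][3] == block[4 * 3 + 2]      # s_{r,c} = mu_{4c+r}
assert state_to_block(state) == block       # the reverse translation inverts it
```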
A collection of round keys is generated from the given AES key K. The number of rounds of the AES encryption algorithm depends on the size of the key. Let us denote the number of words in the AES key by Nk and the corresponding number of rounds by Nr. We have:
| | AES-128 | AES-192 | AES-256 |
|---|---|---|---|
| Nk | 4 | 6 | 8 |
| Nr | 10 | 12 | 14 |
One first generates an initial 128-bit key K0K1K2K3. Subsequently, for the i-th round, 1 ≤ i ≤ Nr, a 128-bit key K4iK4i+1K4i+2K4i+3 is required. Here, each Kj is a 32-bit word. The key schedule (also called key expansion) generates a total of 4(Nr + 1) words K0, K1, . . . , K4Nr+3 from the given secret key K using the procedure described in Algorithm A.4. Here, (02)^{j–1} stands for the octet that represents the element x^{j–1} of F_{2^8}. The following table summarizes these values for j = 1, 2, . . . , 15.
| j | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
| x^{j–1} | 01 | 02 | 04 | 08 | 10 | 20 | 40 | 80 | 1b | 36 | 6c | d8 | ab | 4d | 9a |
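This table can be regenerated by repeated doubling in F_{2^8}; a sketch (my illustration):

```python
# Powers of x in F_{2^8}: doubling an octet and reducing whenever the
# degree overflows reproduces the round-constant table above.

def xtime(a):                    # multiply by x in F_{2^8}
    a <<= 1
    if a & 0x100:
        a ^= 0x11B               # subtract x^8 + x^4 + x^3 + x + 1
    return a & 0xFF

powers, a = [], 1
for _ in range(15):
    powers.append(a)
    a = xtime(a)

print([format(p, "02x") for p in powers])
# ['01', '02', '04', '08', '10', '20', '40', '80',
#  '1b', '36', '6c', 'd8', 'ab', '4d', '9a']
```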
The transformation SubWord on a word T = τ0τ1τ2τ3 is the octet-wise application of AES S-box substitution SubOctet, that is,
SubWord(T) = SubOctet(τ0) ‖ SubOctet(τ1) ‖ SubOctet(τ2) ‖ SubOctet(τ3).
Algorithm A.4. AES key expansion
Input: (Nk and) the secret key K = κ0κ1 . . . κ4Nk–1, where each κi is an octet.
Output: The expanded keys K0, K1, . . . , K4Nr+3.
Steps:
/* Initially copy the octets of K */
for j = 0, 1, . . . , Nk – 1 { Kj := κ4j κ4j+1 κ4j+2 κ4j+3. }
/* Generate the remaining words */
for j = Nk, Nk + 1, . . . , 4Nr + 3 {
T := Kj–1.
If j ≡ 0 (mod Nk), set T := SubWord(RotWord(T)) ⊕ ((02)^{j/Nk – 1} ‖ 00 ‖ 00 ‖ 00), where RotWord(τ0τ1τ2τ3) := τ1τ2τ3τ0.
Else if Nk = 8 and j ≡ 4 (mod Nk), set T := SubWord(T).
Kj := Kj–Nk ⊕ T.
}
The transformation SubOctet is also used in each encryption round and is now described. Let A = a0a1 . . . a7 be an octet, identified with an element of F_{2^8} as mentioned earlier. Let B = b0b1 . . . b7 denote the octet representing the inverse of this finite field element. (We take 0^–1 = 0.) One then applies the following affine transformation on B to generate the final value C := SubOctet(A) = c0c1 . . . c7. Here, D = d0d1 . . . d7 is the constant octet 63 = 01100011.
Equation A.3
ci := bi ⊕ b(i+1) mod 8 ⊕ b(i+2) mod 8 ⊕ b(i+3) mod 8 ⊕ b(i+4) mod 8 ⊕ di, for i = 0, 1, . . . , 7.
In order to speed up this octet substitution, one may use table lookup. Since the output octet C depends only on the input octet A, one can precompute a table of values of SubOctet(A) for the 256 possible values of A. This list is given in Table A.3. The table is to be read in the row-major fashion. In other words, if hi and lo respectively represent the most and the least significant four bits of A, then SubOctet(A) can be read off from the entry in the table having row number hi and column number lo. For example, SubOctet(a7) = 5c. In an actual implementation, a one-dimensional array is to be used. We use a two-dimensional format in Table A.3 for the sake of clarity of presentation.
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | a | b | c | d | e | f | |
| 0 | 63 | 7c | 77 | 7b | f2 | 6b | 6f | c5 | 30 | 01 | 67 | 2b | fe | d7 | ab | 76 |
| 1 | ca | 82 | c9 | 7d | fa | 59 | 47 | f0 | ad | d4 | a2 | af | 9c | a4 | 72 | c0 |
| 2 | b7 | fd | 93 | 26 | 36 | 3f | f7 | cc | 34 | a5 | e5 | f1 | 71 | d8 | 31 | 15 |
| 3 | 04 | c7 | 23 | c3 | 18 | 96 | 05 | 9a | 07 | 12 | 80 | e2 | eb | 27 | b2 | 75 |
| 4 | 09 | 83 | 2c | 1a | 1b | 6e | 5a | a0 | 52 | 3b | d6 | b3 | 29 | e3 | 2f | 84 |
| 5 | 53 | d1 | 00 | ed | 20 | fc | b1 | 5b | 6a | cb | be | 39 | 4a | 4c | 58 | cf |
| 6 | d0 | ef | aa | fb | 43 | 4d | 33 | 85 | 45 | f9 | 02 | 7f | 50 | 3c | 9f | a8 |
| 7 | 51 | a3 | 40 | 8f | 92 | 9d | 38 | f5 | bc | b6 | da | 21 | 10 | ff | f3 | d2 |
| 8 | cd | 0c | 13 | ec | 5f | 97 | 44 | 17 | c4 | a7 | 7e | 3d | 64 | 5d | 19 | 73 |
| 9 | 60 | 81 | 4f | dc | 22 | 2a | 90 | 88 | 46 | ee | b8 | 14 | de | 5e | 0b | db |
| a | e0 | 32 | 3a | 0a | 49 | 06 | 24 | 5c | c2 | d3 | ac | 62 | 91 | 95 | e4 | 79 |
| b | e7 | c8 | 37 | 6d | 8d | d5 | 4e | a9 | 6c | 56 | f4 | ea | 65 | 7a | ae | 08 |
| c | ba | 78 | 25 | 2e | 1c | a6 | b4 | c6 | e8 | dd | 74 | 1f | 4b | bd | 8b | 8a |
| d | 70 | 3e | b5 | 66 | 48 | 03 | f6 | 0e | 61 | 35 | 57 | b9 | 86 | c1 | 1d | 9e |
| e | e1 | f8 | 98 | 11 | 69 | d9 | 8e | 94 | 9b | 1e | 87 | e9 | ce | 55 | 28 | df |
| f | 8c | a1 | 89 | 0d | bf | e6 | 42 | 68 | 41 | 99 | 2d | 0f | b0 | 54 | bb | 16 |
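Table A.3 can be regenerated from the definition: invert in F_{2^8}, then apply the affine map. The sketch below (an illustration, not the book's code) restates the affine map in the equivalent byte-rotation form c = b ⊕ rot(b,1) ⊕ rot(b,2) ⊕ rot(b,3) ⊕ rot(b,4) ⊕ 63, with bit i of an integer holding the coefficient of x^i:

```python
# AES S-box from its definition: multiplicative inverse in F_{2^8}
# followed by an affine transformation over F_2.

def gf_mul(a, b):
    prod = 0
    for _ in range(8):
        if b & 1:
            prod ^= a
        b >>= 1
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1B
    return prod

def gf_inv(a):                   # a^254 = a^(-1); gf_inv(0) = 0 by convention
    r = 1
    for _ in range(254):
        r = gf_mul(r, a)
    return r

def rotl(b, n):                  # rotate an octet left by n bit positions
    return ((b << n) | (b >> (8 - n))) & 0xFF

def sub_octet(a):
    b = gf_inv(a)
    return b ^ rotl(b, 1) ^ rotl(b, 2) ^ rotl(b, 3) ^ rotl(b, 4) ^ 0x63

print(hex(sub_octet(0xa7)))      # 0x5c, matching the example in the text
```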
AES encryption is described in Algorithm A.5. The algorithm first converts the input plaintext message block to a state, applies a series of transformations on this state and finally converts the state back to a message (the ciphertext).
The individual state transition transformations are now explained. The transition SubState is an octet-by-octet application of the substitution function SubOctet, that is, SubState maps the state (s_{r,c}) to the state (s′_{r,c}), where
s′_{r,c} := SubOctet(s_{r,c})
for all r, c. The transform ShiftRows cyclically left rotates the r-th row by r byte positions, that is, maps (s_{r,c}) to (s′_{r,c}) with
s′_{r,c} := s_{r,(c+r) mod 4}.
The AddKey operation uses four 32-bit round keys L0, L1, L2, L3. Name the octets of Li as λi0λi1λi2λi3. The i-th key Li is XORed with the i-th column of the state, that is, AddKey transforms
s_{r,c} to s′_{r,c} := s_{r,c} ⊕ λ_{c,r} for all r, c.
Finally, the MixCols transform multiplies each column of the state, regarded as an element of A, by the element [03]y^3 + [01]y^2 + [01]y + [02], where the coefficients (expressions within square brackets) are octet values in hexadecimal that can be identified with elements of F_{2^8}. For the c-th column, this transformation can be represented as:
s′_{0,c} := ([02] ⊙ s_{0,c}) ⊕ ([03] ⊙ s_{1,c}) ⊕ s_{2,c} ⊕ s_{3,c},
s′_{1,c} := s_{0,c} ⊕ ([02] ⊙ s_{1,c}) ⊕ ([03] ⊙ s_{2,c}) ⊕ s_{3,c},
s′_{2,c} := s_{0,c} ⊕ s_{1,c} ⊕ ([02] ⊙ s_{2,c}) ⊕ ([03] ⊙ s_{3,c}),
s′_{3,c} := ([03] ⊙ s_{0,c}) ⊕ s_{1,c} ⊕ s_{2,c} ⊕ ([02] ⊙ s_{3,c}).
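The column transformation of MixCols can be sketched as a matrix-vector product over F_{2^8} (an illustration, not the book's code; the test vector d4 bf 5d 30 → 04 66 81 e5 is the widely published MixColumns example, not taken from this book):

```python
# MixCols on one column: multiply the column, as an element of A, by
# [03]y^3 + [01]y^2 + [01]y + [02]; equivalently, apply a circulant
# matrix with rows (02 03 01 01), (01 02 03 01), ... over F_{2^8}.

def gf_mul(a, b):
    prod = 0
    for _ in range(8):
        if b & 1:
            prod ^= a
        b >>= 1
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1B
    return prod

M = [[2, 3, 1, 1], [1, 2, 3, 1], [1, 1, 2, 3], [3, 1, 1, 2]]

def mix_col(col):                # col = [s0c, s1c, s2c, s3c]
    return [gf_mul(M[r][0], col[0]) ^ gf_mul(M[r][1], col[1]) ^
            gf_mul(M[r][2], col[2]) ^ gf_mul(M[r][3], col[3])
            for r in range(4)]

print([format(v, "02x") for v in mix_col([0xD4, 0xBF, 0x5D, 0x30])])
# ['04', '66', '81', 'e5']
```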
Algorithm A.5. AES encryption
Input: The plaintext message M = μ0μ1 . . . μ15 and the round keys K0, K1, . . . , K4Nr+3.
Output: The ciphertext message C = γ0γ1 . . . γ15.
Steps:
Convert M to the state S. /* Use Transform (A.1) */
AddKey(S, K0K1K2K3).
for i = 1, 2, . . . , Nr – 1 {
SubState(S). ShiftRows(S). MixCols(S). AddKey(S, K4iK4i+1K4i+2K4i+3).
}
/* The final round omits MixCols */
SubState(S). ShiftRows(S). AddKey(S, K4NrK4Nr+1K4Nr+2K4Nr+3).
Convert the state S to the ciphertext C. /* Use Transform (A.2) */
AES decryption involves taking the inverse of each state transition performed during encryption. The key schedule needed for encryption is used during decryption too. The straightforward decryption routine is given in Algorithm A.6.
Algorithm A.6. AES decryption
Input: The ciphertext message C = γ0γ1 . . . γ15 and the round keys K0, K1, . . . , K4Nr+3.
Output: The recovered plaintext message M = μ0μ1 . . . μ15.
Steps:
Convert C to the state S. /* Use Transform (A.1) */
AddKey(S, K4NrK4Nr+1K4Nr+2K4Nr+3).
for i = Nr – 1, Nr – 2, . . . , 1 {
ShiftRows^–1(S). SubState^–1(S). AddKey(S, K4iK4i+1K4i+2K4i+3). MixCols^–1(S).
}
ShiftRows^–1(S). SubState^–1(S). AddKey(S, K0K1K2K3).
Convert the state S to the message M. /* Use Transform (A.2) */
What remains is a description of the inverses of the basic state transformations. AddKey involves octet-by-octet XORing and so is its own inverse. Table A.4 summarizes the inverse of the substitution transition SubOctet (Exercise A.8). For computing SubState^–1(S), one applies SubOctet^–1 on each octet of S. The inverse of ShiftRows is also straightforward and is given by
s′_{r,c} := s_{r,(c–r) mod 4}.
Finally, MixCols^–1 involves multiplication of each column by the inverse of the element [03]y^3 + [01]y^2 + [01]y + [02], that is, by the element [0b]y^3 + [0d]y^2 + [09]y + [0e]. So MixCols^–1 transforms each column of the state as follows:
s′_{0,c} := ([0e] ⊙ s_{0,c}) ⊕ ([0b] ⊙ s_{1,c}) ⊕ ([0d] ⊙ s_{2,c}) ⊕ ([09] ⊙ s_{3,c}),
s′_{1,c} := ([09] ⊙ s_{0,c}) ⊕ ([0e] ⊙ s_{1,c}) ⊕ ([0b] ⊙ s_{2,c}) ⊕ ([0d] ⊙ s_{3,c}),
s′_{2,c} := ([0d] ⊙ s_{0,c}) ⊕ ([09] ⊙ s_{1,c}) ⊕ ([0e] ⊙ s_{2,c}) ⊕ ([0b] ⊙ s_{3,c}),
s′_{3,c} := ([0b] ⊙ s_{0,c}) ⊕ ([0d] ⊙ s_{1,c}) ⊕ ([09] ⊙ s_{2,c}) ⊕ ([0e] ⊙ s_{3,c}).
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | a | b | c | d | e | f | |
| 0 | 52 | 09 | 6a | d5 | 30 | 36 | a5 | 38 | bf | 40 | a3 | 9e | 81 | f3 | d7 | fb |
| 1 | 7c | e3 | 39 | 82 | 9b | 2f | ff | 87 | 34 | 8e | 43 | 44 | c4 | de | e9 | cb |
| 2 | 54 | 7b | 94 | 32 | a6 | c2 | 23 | 3d | ee | 4c | 95 | 0b | 42 | fa | c3 | 4e |
| 3 | 08 | 2e | a1 | 66 | 28 | d9 | 24 | b2 | 76 | 5b | a2 | 49 | 6d | 8b | d1 | 25 |
| 4 | 72 | f8 | f6 | 64 | 86 | 68 | 98 | 16 | d4 | a4 | 5c | cc | 5d | 65 | b6 | 92 |
| 5 | 6c | 70 | 48 | 50 | fd | ed | b9 | da | 5e | 15 | 46 | 57 | a7 | 8d | 9d | 84 |
| 6 | 90 | d8 | ab | 00 | 8c | bc | d3 | 0a | f7 | e4 | 58 | 05 | b8 | b3 | 45 | 06 |
| 7 | d0 | 2c | 1e | 8f | ca | 3f | 0f | 02 | c1 | af | bd | 03 | 01 | 13 | 8a | 6b |
| 8 | 3a | 91 | 11 | 41 | 4f | 67 | dc | ea | 97 | f2 | cf | ce | f0 | b4 | e6 | 73 |
| 9 | 96 | ac | 74 | 22 | e7 | ad | 35 | 85 | e2 | f9 | 37 | e8 | 1c | 75 | df | 6e |
| a | 47 | f1 | 1a | 71 | 1d | 29 | c5 | 89 | 6f | b7 | 62 | 0e | aa | 18 | be | 1b |
| b | fc | 56 | 3e | 4b | c6 | d2 | 79 | 20 | 9a | db | c0 | fe | 78 | cd | 5a | f4 |
| c | 1f | dd | a8 | 33 | 88 | 07 | c7 | 31 | b1 | 12 | 10 | 59 | 27 | 80 | ec | 5f |
| d | 60 | 51 | 7f | a9 | 19 | b5 | 4a | 0d | 2d | e5 | 7a | 9f | 93 | c9 | 9c | ef |
| e | a0 | e0 | 3b | 4d | ae | 2a | f5 | b0 | c8 | eb | bb | 3c | 83 | 53 | 99 | 61 |
| f | 17 | 2b | 04 | 7e | ba | 77 | d6 | 26 | e1 | 69 | 14 | 63 | 55 | 21 | 0c | 7d |
AES decryption is as efficient as AES encryption, since each state transformation primitive has essentially the same structure as its inverse. However, the sequence in which these primitives are applied in the loop (rounds) for decryption differs from that for encryption. For some implementations, mostly in hardware, this may be a problem. Compare this with DES, for which the encryption and decryption algorithms are identical except for the sequence in which the round keys are used (Exercise A.1). With a little additional effort, AES can also be furnished with this useful property of DES. All we have to do is use a different key schedule for decryption. The necessary modifications are explored in Exercise A.9.
Table A.5 provides the ciphertexts for the plaintext block
M = 00112233445566778899aabbccddeeff
under different keys.
| Cipher | Key | Ciphertext block |
|---|---|---|
| AES-128 | 0001020304050607 \ 08090a0b0c0d0e0f | 69c4e0d86a7b0430 \ d8cdb78070b4c55a |
| AES-192 | 0001020304050607 \ 08090a0b0c0d0e0f \ 1011121314151617 | dda97ca4864cdfe0 \ 6eaf70a0ec0d7191 |
| AES-256 | 0001020304050607 \ 08090a0b0c0d0e0f \ 1011121314151617 \ 18191a1b1c1d1e1f | 8ea2b7ca516745bf \ eafc49904b496089 |
AES has been designed so that linear and differential attacks are infeasible. Another attack, known as the square attack, has been proposed by Lucks [184] and Ferguson et al. [93], but at present it can tackle fewer rounds than full Rijndael uses. Also see Gilbert and Minier [112] for the collision attack.
The distinct algebraic structure of AES encryption invites special algebraic attacks. One such potential attack (the XSL attack) has been proposed by Courtois and Pieprzyk [68]. Although this attack has not yet been proved effective, a better understanding of the underlying algebra may, in the foreseeable future, lead to disturbing consequences for the advanced standard.
For more information on AES, read the book [71] from the designers of the cipher. Also visit the following Internet sites:
| http://www.esat.kuleuven.ac.be/~rijmen/rijndael/ | Rijndael home |
| http://csrc.nist.gov/CryptoToolkit/aes/index1.html | NIST site for AES |
| http://www.cryptosystem.net/aes/ | Algebraic attacks |
Multiple encryption presents a way to achieve a desired level of security by using block ciphers of small key sizes. The idea is to cascade several stages of encryption and/or decryption, with different stages working under different keys. Figure A.1 illustrates double and triple encryption for a block cipher f. Each gi or hj represents either the encryption or the decryption function of f under the given key.
Figure A.1: Double and triple encryption
For double encryption, we have K1 ≠ K2, and both g1 and g2 are usually the encryption function. Provided that fK2 ∘ fK1 is not of the form fK for any single key K, and that the permutations of f are reasonably random, it appears at first glance that double encryption increases the effective key size by a factor of two. Unfortunately, this is not the case. The meet-in-the-middle attack on double encryption works as follows.
Suppose that an adversary knows a plaintext–ciphertext pair (m, c) under the unknown keys K1, K2. We assume as before that f has block-size n and key-size r. The adversary computes, for each possible key i ∈ {0, 1}^r, the encrypted message xi := fi(m). She also computes, for each possible key j ∈ {0, 1}^r, the decrypted message yj := fj^–1(c). Now, (i, j) is a possible value of (K1, K2) if and only if xi = yj.
A given pair (m, c) usually gives many such candidates (i, j) for (K1, K2). More precisely, if each fi is assumed to be a random permutation of {0, 1}^n, then for a given i we have the equality xi = yj for an expected number of 2^r/2^n values of j. Considering all possibilities for i gives an expected number of 2^r × 2^r/2^n = 2^{2r–n} candidate pairs (i, j). If f = DES, this number is 2^{2×56–64} = 2^48.
If a second pair (m′, c′) under (K1, K2) is also known to the adversary, then for a given i, the pair (i, j) is consistent with both (m, c) and (m′, c′) for an expected number of 2^r/(2^n × 2^n) values of j. Thus, we get an expected number of (2^r × 2^r)/(2^n × 2^n) = 2^{2r–2n} candidates (i, j). For DES, this number is 2^{–16}. This implies that it is very unlikely that a false candidate (i, j) satisfies both (m, c) and (m′, c′). Thus, with high probability, the adversary uniquely identifies the double-DES key (K1, K2) from two plaintext–ciphertext pairs.
This attack calls for O(2^r) encryptions and O(2^r) decryptions. With the assumption that each encryption takes roughly the same time as each decryption (as in the case of DES), the adversary spends the time of O(2^r) encryptions. Moreover, she can find all the matches xi = yj
in O(r 2^r) time. This implies that double encryption increases the effective key size (over single encryption) by a few bits only. On the other hand, both the actual key size and the encryption time get doubled. In view of these shortcomings, double encryption is rarely used in practice.
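The attack can be demonstrated end-to-end on a toy cipher with 8-bit blocks and 8-bit keys (a stand-in chosen only so that the full 2^r table fits in memory; the cipher itself has no cryptographic merit):

```python
# Meet-in-the-middle on double encryption of a toy 8-bit block cipher.

def enc(m, k):
    return (((m ^ k) * 5) + k) % 256       # invertible: 5 * 205 = 1 (mod 256)

def dec(c, k):
    return (((c - k) * 205) % 256) ^ k

K1, K2 = 0x3A, 0xC5                        # the unknown double-encryption keys
pairs = [(m, enc(enc(m, K1), K2)) for m in (0x11, 0x7E)]

m, c = pairs[0]
table = {}                                 # x_i -> all keys i with f_i(m) = x_i
for i in range(256):
    table.setdefault(enc(m, i), []).append(i)

# Match decryptions of c against the table: candidates (i, j) with x_i = y_j.
cands = [(i, j) for j in range(256) for i in table.get(dec(c, j), [])]

# Filter the candidates with the second known pair (m', c').
m2, c2 = pairs[1]
cands = [(i, j) for (i, j) in cands if enc(enc(m2, i), j) == c2]

assert (K1, K2) in cands                   # the true key pair survives
print(len(cands))                          # very few false candidates remain
```

The work is two sweeps of 2^r trial computations plus the table lookups, far below the 2^{2r} of naive exhaustive search over both keys.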
For the triple encryption scheme of Figure A.1, a meet-in-the-middle attack at x or y demands an effort equivalent to O(2^{2r}) encryptions, that is, the effective key size gets doubled. It is, therefore, customary to take K1 = K3 and K2 different from this common value. The actual key size also only gets doubled with this choice: one does not have to remember K3 separately. It is also a common practice to take h1 and h3 to be the encryption function (under K1 = K3) and h2 the decryption function (under K2). One often calls this particular triple encryption an E-D-E scheme.
In practice, the length of the message m to be encrypted need not equal the block length n of the block cipher f. One then has to break up m into blocks of some fixed length n′ ≤ n and encrypt each block using the block cipher. In order to make the length of m an integral multiple of n′, one may have to pad extra bits to m (say, zero bits at the end). It is often necessary to store the initial size of m in a separate block, say, after the last message block. In what follows, we shall assume that the input message m gives rise to l blocks m1, m2, . . . , ml each of size n′. The corresponding ciphertext blocks c1, c2, . . . , cl will also be of bit length n′ each. The reason for choosing the block size n′ ≤ n will be clear soon.
The easiest way to encrypt multiple blocks m1, . . . , ml is to take n′ = n and encrypt each block mi as ci := fK(mi). Decryption is analogous: mi := fK^–1(ci). This mode of operation of a block cipher is called the electronic code-book or the ECB mode. Algorithms A.7 and A.8 describe this mode.
Algorithm A.7. ECB encryption
Input: The plaintext blocks m1, . . . , ml and the key K.
Output: The ciphertext c = c1 . . . cl.
Steps:
for i = 1, . . . , l { ci := fK(mi). }
Algorithm A.8. ECB decryption
Input: The ciphertext blocks c1, . . . , cl and the key K.
Output: The plaintext m = m1 . . . ml.
Steps:
for i = 1, . . . , l { mi := fK^–1(ci). }
In this mode, identical message blocks encrypt to identical ciphertext blocks (under the same key), that is, partial information about the plaintext may be leaked out. The following three modes overcome this problem.
In the cipher-block chaining or the CBC mode, one takes n′ = n and each plaintext block is first XOR-ed with the previous ciphertext block and then encrypted. In order to XOR the first plaintext block, one needs an n-bit initialization vector (IV). The IV need not be kept secret and may be sent along with the ciphertext blocks.
Algorithm A.9. CBC encryption
Input: The plaintext blocks m1, . . . , ml, the key K and the IV.
Output: The ciphertext c = c1 . . . cl.
Steps:
c0 := IV.
for i = 1, . . . , l { ci := fK(mi ⊕ ci–1). }
Algorithm A.10. CBC decryption
Input: The ciphertext blocks c1, . . . , cl, the key K and the IV.
Output: The plaintext m = m1 . . . ml.
Steps:
c0 := IV.
for i = 1, . . . , l { mi := fK^–1(ci) ⊕ ci–1. }
In the cipher feedback or the CFB mode, one chooses any n′ ≤ n. In this mode, the plaintext blocks are not encrypted directly, but masked by XOR-ing with a stream of keys generated from a (not necessarily secret) n-bit IV. In this sense, the CFB mode works like a stream cipher (see Section A.3).
Algorithm A.11. CFB encryption
Input: The plaintext blocks m1, . . . , ml, the key K and the IV.
Output: The ciphertext c = c1 . . . cl.
Steps:
k0 := IV. /* Initialize the key stream */
for i = 1, . . . , l {
ci := mi ⊕ msbn′(fK(ki–1)).
ki := lsbn–n′(ki–1) ‖ ci.
}
Algorithm A.11 explains CFB encryption. The notation msbk(z) (resp. lsbk(z)) stands for the most (resp. least) significant k bits of a bit string z. For CFB decryption (Algorithm A.12), the identical key stream k0, k1, . . . , kl is generated and used to mask off the message blocks from the ciphertext blocks.
Algorithm A.12. CFB decryption
Input: The ciphertext blocks c1, . . . , cl, the key K and the IV.
Output: The plaintext m = m1 . . . ml.
Steps:
k0 := IV.
for i = 1, . . . , l {
mi := ci ⊕ msbn′(fK(ki–1)).
ki := lsbn–n′(ki–1) ‖ ci.
}
The output feedback or the OFB mode also works like a stream cipher by masking the plaintext blocks using a stream of keys. The key stream in the OFB mode is generated by successively applying the block encryption function on an n-bit (not necessarily secret) IV. Here too, one chooses any n′ ≤ n.
OFB encryption is explained in Algorithm A.13. OFB decryption (Algorithm A.14) is identical, with only the roles of m and c interchanged, and requires the generation of the same key stream k0, k1, . . . , kl used during encryption.
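Because the OFB key stream depends only on K and the IV, the encryption and decryption routines are literally the same function. A sketch with n′ = n on a toy byte cipher (a stand-in, not a real block cipher):

```python
# OFB mode: the key stream k_1, k_2, ... is generated by iterating f_K on
# the IV, so XOR-masking with it both encrypts and decrypts.

def f(x, k):
    return (((x ^ k) * 5) + k) % 256      # toy stand-in block cipher

def ofb(blocks, key, iv):
    out, ks = [], iv
    for b in blocks:
        ks = f(ks, key)                   # k_i := f_K(k_{i-1})
        out.append(b ^ ks)                # mask with the key stream
    return out

msg = [0x10, 0x20, 0x30]
ct = ofb(msg, key=0x55, iv=0x99)
assert ofb(ct, key=0x55, iv=0x99) == msg  # the same routine decrypts
```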
Algorithm A.13. OFB encryption
Input: The plaintext blocks m1, . . . , ml, the key K and the IV.
Output: The ciphertext c = c1 . . . cl.
Steps:
k0 := IV. /* Initialize the key stream */
for i = 1, . . . , l {
ki := fK(ki–1).
ci := mi ⊕ msbn′(ki).
}
Algorithm A.14. OFB decryption
Input: The ciphertext blocks c1, . . . , cl, the key K and the IV.
Output: The plaintext m = m1 . . . ml.
Steps:
k0 := IV. /* Initialize the key stream */
for i = 1, . . . , l {
ki := fK(ki–1).
mi := ci ⊕ msbn′(ki).
}
A.1 Let us use the notations of Algorithm A.2. For a message m and round keys Ki, we have the values V, Li, Ri, W, c. For another message m′ and another set of round keys K′i, let us denote these values by V′, L′i, R′i, W′, c′. Show that if m′ = c and K′i = K17–i for i = 1, . . . , 16, then L′i = R16–i and R′i = L16–i for all i = 0, 1, . . . , 16. Deduce that in this case we have c′ = m. (This shows that DES decryption is the same as DES encryption with the key schedule reversed.)
A.2 For a bit string z, let z̄ denote the bit-wise complement of z. Deduce that DESK̄(m̄) is the bit-wise complement of DESK(m), that is, complementing both the plaintext message and the key complements the ciphertext message. [H]
A.3 A DES key K is said to be weak if the DES key schedule on K gives K1 = K2 = · · · = K16. Show that there are exactly four weak DES keys, which in hexadecimal notation are:
0101 0101 0101 0101
FEFE FEFE FEFE FEFE
1F1F 1F1F 0E0E 0E0E
E0E0 E0E0 F1F1 F1F1
A.4 A DES key K is said to be anti-palindromic if the DES key schedule on K gives Ki equal to the bit-wise complement of K17–i for all i = 1, . . . , 16. Show that the following four DES keys (in hexadecimal notation) are anti-palindromic:
01FE 01FE 01FE 01FE
FE01 FE01 FE01 FE01
1FE0 1FE0 0EF1 0EF1
E01F E01F F10E F10E
A.5 Represent F_{2^8} as F_2[X]/〈f(X)〉, where f(X) = X^8 + X^4 + X^3 + X + 1 (Section A.2.2). Describe how the addition and the multiplication ⊙ of F_{2^8} can be implemented efficiently under this representation.
A.6 The multiplication of F_{2^8} can be made table-driven. Since this field contains 256 elements, a 256 × 256 array suffices to store all the products. That requires a storage of 64 KB. We can considerably reduce the storage by using discrete logarithms with respect to a generator of the cyclic group of non-zero elements of F_{2^8}.
A.7 Denote the multiplication of A by ⊗ (Section A.2.2). Describe how ⊗ can be implemented efficiently in terms of the octet operations ⊕ and ⊙.
A.8 Work out the inverse SubOctet^–1 of the substitution SubOctet, and verify Table A.4.
A.9 With a modified key schedule, AES decryption can be brought to the same structural form as AES encryption (Section A.2.2). Work out the necessary modifications, leading to the following equivalent form of AES decryption.
Algorithm A.15. Equivalent form of AES decryption
| A.10 | Show that a multiple encryption scheme with exactly k stages provides an effective security of ⌈k/2⌉ keys against the meet-in-the-middle attack. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| A.11 | Consider a message m broken into blocks m1, . . . , ml, encrypted to c1, . . . , cl and sent to an entity. |
A block cipher encrypts large blocks of data using a fixed key. A stream cipher, on the other hand, encrypts small blocks of data (typically individual bits or bytes) using a stream of varying keys. The security of a stream cipher stems from the unpredictability of the keys in the key stream. Here, we deal with stream ciphers that encrypt bit-by-bit.
A stream cipher F encrypts a plaintext m = m1m2 . . . ml to a ciphertext c = c1c2 . . . cl using a key stream k = k1k2 . . . kl, where each mi, ci, ki ∈ {0, 1} and where ci := fκ(mi) with κ = ki, for an encryption function fκ indexed by the key bit κ.
An obvious choice for fκ is fκ(μ) := μ ⊕ κ, so that ci = mi ⊕ ki. Suppose that the key bits ki are truly random, that is, Pr(ki = 0) = Pr(ki = 1) = 1/2, independently of the plaintext. If Pr(mi = 0) = p, then Pr(ci = 0) = Pr(mi = 0) Pr(ki = 0) + Pr(mi = 1) Pr(ki = 1) = p/2 + (1 – p)/2 = 1/2. So Pr(ci = 1) is 1/2 too, that is, the two values of ci are equally likely, irrespective of the probability p. This, in turn, implies that the ciphertext bit ci provides absolutely no information about the plaintext bit mi. In this sense, this stream cipher, called Vernam’s one-time pad, offers unconditional security.
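The XOR behaviour above is easy to try out. The following is a minimal Python sketch of Vernam encryption, working byte-wise rather than bit-wise; the helper names are ours, not from any standard library:

```python
import secrets

def otp_encrypt(message: bytes, key: bytes) -> bytes:
    # Vernam encryption: XOR every message unit with a key-stream unit.
    assert len(key) == len(message), "a one-time pad must be as long as the message"
    return bytes(m ^ k for m, k in zip(message, key))

# Decryption is the same operation, since (m XOR k) XOR k = m.
otp_decrypt = otp_encrypt

message = b"attack at dawn"
key = secrets.token_bytes(len(message))   # truly random, used only once
ciphertext = otp_encrypt(message, key)
assert otp_decrypt(ciphertext, key) == message
```

Note that reusing the pad even once destroys the unconditional security: XOR-ing two ciphertexts encrypted under the same pad cancels the key stream and leaks the XOR of the two plaintexts.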
Generating a truly random key stream of arbitrary length is a difficult problem. Moreover, the same key stream is used for decryption and has to be reproduced at the recipient’s end. In view of these difficulties, Vernam’s one-time pad is used only very rarely.
A practical solution is to use a pseudorandom key stream k1, k2, k3, . . . generated from a secret key J of fixed small length. The bits in the pseudorandom stream should be sufficiently unpredictable and the length of J adequately large, so as to preclude the possibility of mounting a successful attack in feasible time.
Depending on how the key stream is generated from J, stream ciphers can be broadly classified into two categories. In a synchronous stream cipher, each key in the key stream is generated independently of the plaintext and ciphertext bits, whereas in a self-synchronizing (or asynchronous) stream cipher each key in the stream is generated based only on J and a fixed number of previous ciphertext bits. Algorithms A.16 and A.17 explain the workings of these two classes of stream ciphers.
Algorithm A.16. Encryption by a synchronous stream cipher

Input: The message m = m1m2 . . . ml, the secret key J and a (not necessarily secret) initial state S of the key stream generator.
Output: The ciphertext c = c1c2 . . . cl.
Steps:
s0 := S. /* Initialize the state of the key stream generator */
Algorithm A.17. Encryption by a self-synchronizing stream cipher

Input: The message m = m1m2 . . . ml, the secret key J and a (not necessarily secret) initial state (c–t+1, c–t+2, . . . , c0).
Output: The ciphertext c = c1c2 . . . cl.
Steps:
for i = 1, . . . , l {
A block cipher in the OFB mode works like a synchronous stream cipher, whereas a block cipher in the CFB mode works like a self-synchronizing (asynchronous) stream cipher.
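To make the two classes concrete, here is a toy Python sketch of both shapes. The keyed bit generator built from SHA-256 is our own stand-in, not a vetted stream cipher; only the structure mirrors Algorithms A.16 and A.17: in the synchronous case the state evolves independently of the data, while in the self-synchronizing case each key-stream bit is derived from J and the last t ciphertext bits.

```python
import hashlib

def prf_bit(key: bytes, data: bytes) -> int:
    # Stand-in keyed bit generator (an assumption, not a vetted design):
    # the low bit of one SHA-256 byte.
    return hashlib.sha256(key + data).digest()[0] & 1

def synchronous_encrypt(bits, key, state=0):
    # Algorithm A.16 shape: the internal state evolves independently
    # of the plaintext and ciphertext.
    out = []
    for m in bits:
        k = prf_bit(key, state.to_bytes(8, "big"))  # key-stream bit from (state, J)
        out.append(m ^ k)
        state += 1                                   # next-state function
    return out

def self_sync_encrypt(bits, key, t=8):
    # Algorithm A.17 shape: each key-stream bit depends on J and the
    # last t ciphertext bits only.
    window, out = [0] * t, []
    for m in bits:
        k = prf_bit(key, bytes(window))
        c = m ^ k
        out.append(c)
        window = window[1:] + [c]
    return out

def self_sync_decrypt(cbits, key, t=8):
    window, out = [0] * t, []
    for c in cbits:
        k = prf_bit(key, bytes(window))
        out.append(c ^ k)
        window = window[1:] + [c]   # feed the received ciphertext back
    return out
```

For the synchronous cipher, decryption is the same function applied to the ciphertext (the same key stream is XOR-ed again). The self-synchronizing decryptor differs only in feeding the received ciphertext bits back into its window, which is what lets it resynchronize after t correct ciphertext bits.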
Linear feedback shift registers (LFSRs), being suitable for hardware implementation and possessing good cryptographic properties, are widely used as basic building blocks for many stream ciphers. Figure A.2 depicts an LFSR L with d stages or delay elements D0, D1, . . . , Dd–1, each capable of storing one bit. The state of the LFSR is described by the d-tuple s := (s0, s1, . . . , sd–1), where si is the bit stored in Di. It is often convenient to treat s as the column vector (s0 s1 . . . sd–1)t.
Figure A.2. An LFSR with d stages
There are d control bits a0, a1, . . . , ad–1. The working of the LFSR is governed by a clock. At every clock pulse, the bits stored in the delay elements are bit-wise AND-ed with the respective control bits, and the AND gate outputs are XOR-ed to obtain the bit sd. The bit s0 stored in D0 is delivered to the output. Finally, for each i, 0 ≤ i ≤ d – 2, the delay element Di sets its stored bit to si+1, that is, the register experiences a right shift by one bit with the feedback bit sd filling up the leftmost delay element Dd–1.
Thus, a clock pulse changes the state of the LFSR from s := (s0, s1, . . . , sd–1) to t := (t0, t1, . . . , td–1), where s and t are related as:
ti = si+1 for 0 ≤ i ≤ d – 2,
td–1 = sd = a0s0 ⊕ a1s1 ⊕ · · · ⊕ ad–1sd–1.
If s and t are treated as column vectors, this can be compactly represented as
Equation A.4
t ≡ ΔLs (mod 2),
where the transition matrix ΔL is given by
Equation A.5
ΔL =
| 0    1    0    · · ·  0     0    |
| 0    0    1    · · ·  0     0    |
| ·    ·    ·           ·     ·    |
| 0    0    0    · · ·  0     1    |
| a0   a1   a2   · · ·  ad–2  ad–1 |
When the LFSR L is initialized to a non-zero state, the bit stream output by it can be used as a pseudorandom bit sequence. For a given set of control bits a0, . . . , ad–1, the next state of L is uniquely determined by its previous state only. Since L has only finitely many (2^d – 1) non-zero states, the output bit sequence of L must be (eventually) periodic. For cryptographic use, the period of the bit sequence should be as large as possible. If the period is the maximum possible, namely 2^d – 1, L is called a maximum-length LFSR.
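The state-transition rule described above is straightforward to simulate. The following Python sketch (our own helper names) clocks an LFSR with control bits (a0, . . . , ad–1) and measures the period of its state sequence; with d = 4 and feedback s4 = s0 ⊕ s1 (a maximum-length configuration), the period is the maximum 2^4 – 1 = 15:

```python
def lfsr_stream(control, state):
    # control = (a0, ..., a_{d-1}); state = (s0, ..., s_{d-1}).
    state = list(state)
    while True:
        yield state[0]                       # s0 is delivered to the output
        fb = 0
        for a, s in zip(control, state):     # s_d = (a0 AND s0) XOR ... XOR (a_{d-1} AND s_{d-1})
            fb ^= a & s
        state = state[1:] + [fb]             # shift; the feedback bit enters at the far end

def state_period(control, state):
    # Clock the register until the initial state recurs.
    start, cur = tuple(state), list(state)
    for n in range(1, 2 ** len(control) + 1):
        fb = 0
        for a, s in zip(control, cur):
            fb ^= a & s
        cur = cur[1:] + [fb]
        if tuple(cur) == start:
            return n
    return None

g = lfsr_stream((1, 1, 0, 0), (1, 0, 0, 0))
print([next(g) for _ in range(15)])               # one full period of the output
print(state_period((1, 1, 0, 0), (1, 0, 0, 0)))   # prints: 15
```

By contrast, the degenerate choice a = (1, 0, 0, 0) merely rotates the register, giving a period of only 4 from the same starting state.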
Many properties of the LFSR L can be explained in terms of its connection polynomial defined as:
Equation A.6
CL(X) := 1 + ad–1X + ad–2X^2 + · · · + a1X^(d–1) + a0X^d.
For example, assume that a0 = 1, so that deg CL(X) = d. Assume further that CL(X) is irreducible over F_2. Consider the extension F_{2^d} of F_2, represented as F_2[X]/⟨CL(X)⟩, where x denotes the class of X. It turns out that if x is a generator of the cyclic group F*_{2^d}, then L is a maximum-length LFSR. In this case, the polynomial CL(X) is called a primitive polynomial of F_{2^d}.[3]
[3] A primitive polynomial defined in this way has nothing to do with a primitive polynomial over a UFD, defined in Exercise 2.54. Mathematicians often go for such multiple definitions of the same terms and phrases.
The bit sequence output by an LFSR L can be used as the key stream k1k2 . . . kl in order to encrypt a plaintext stream m1m2 . . . ml to the ciphertext stream c1c2 . . . cl with ci := mi ⊕ ki. The number d of stages in L should be chosen reasonably large and the control bits a0, . . . , ad–1 should be kept secret. The initial state of L may or may not be a secret. For suitable choices of a0, . . . , ad–1, the output sequences from L possess good statistical properties and hence L appears to be an efficient key stream generator.
Unfortunately, such a key stream generator is vulnerable to a known-plaintext attack as follows. Suppose that mi and ci are known for i = 1, 2, . . . , 2d. One can easily compute ki = mi⊕ci for all these i. Let si := (ki, ki+1, . . . , ki+d–1) denote the state of L while outputting ci. By Congruence (A.4), si+1 ≡ ΔLsi (mod 2) for i = 1, 2, . . . , d. Define the d × d matrices S := (s1 s2 . . . sd) and T := (s2 s3 . . . sd+1), where si are treated as column vectors as before. We then have T ≡ ΔLS (mod 2). If S is invertible modulo 2, then ΔL and hence the secret control bits can be easily computed. In order to avoid this known-plaintext attack, one should introduce some non-linearity in the LFSR outputs.
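A small Python sketch of this attack (our own code, assuming the attacker has already recovered 2d consecutive key-stream bits and that S is invertible modulo 2): build S and T column by column, invert S by Gauss–Jordan elimination over GF(2), and read the control bits off the last row of ΔL = TS^(–1).

```python
def keystream(control, state, n):
    # Clock the LFSR n times and collect its output bits.
    out, cur = [], list(state)
    for _ in range(n):
        out.append(cur[0])
        fb = 0
        for a, s in zip(control, cur):
            fb ^= a & s                      # feedback bit s_d
        cur = cur[1:] + [fb]
    return out

def recover_control_bits(k, d):
    # Column c of S is the state s_{c+1} = (k_{c+1}, ..., k_{c+d})^t,
    # so with 0-indexed lists S[r][c] = k[c + r] and T[r][c] = k[c + 1 + r].
    S = [[k[c + r] for c in range(d)] for r in range(d)]
    T = [[k[c + 1 + r] for c in range(d)] for r in range(d)]
    # Invert S modulo 2 by Gauss-Jordan elimination on the augmented matrix (S | I).
    aug = [S[r] + [int(r == c) for c in range(d)] for r in range(d)]
    for col in range(d):
        piv = next(r for r in range(col, d) if aug[r][col])  # S assumed invertible
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(d):
            if r != col and aug[r][col]:
                aug[r] = [x ^ y for x, y in zip(aug[r], aug[col])]
    Sinv = [row[d:] for row in aug]
    # The last row of Delta_L = T S^{-1} (mod 2) is (a0, a1, ..., a_{d-1}).
    return [sum(T[d - 1][t] & Sinv[t][j] for t in range(d)) % 2 for j in range(d)]

bits = keystream([1, 1, 0, 0], [1, 0, 0, 0], 8)   # 2d = 8 known key-stream bits
print(recover_control_bits(bits, 4))              # prints: [1, 1, 0, 0]
```

For a maximum-length LFSR, any d consecutive states of a non-zero output sequence are linearly independent, so the invertibility assumption on S holds automatically in that case.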
A non-linear combination generator combines the output bits u1, u2, . . . , ur from r LFSRs by a non-linear function f in order to generate the key k := f(u1, u2, . . . , ur). The Geffe generator of Figure A.3 gives a well-known example. It uses the non-linear function f(u1, u2, u3) := u1u2 ⊕ u2u3 ⊕ u3, that is, k = u1u2 ⊕ u2u3 ⊕ u3 (mod 2).
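The correlation weakness of this construction (the subject of Exercise A.17) can be checked by brute force. The sketch below uses the Geffe combining function in its standard form, k = u1u2 ⊕ u2u3 ⊕ u3, and enumerates all eight equally likely input combinations:

```python
from itertools import product

def geffe(u1, u2, u3):
    # k = u1 u2 XOR u2 u3 XOR u3: outputs u1 when u2 = 1, and u3 when u2 = 0.
    return (u1 & u2) ^ (u2 & u3) ^ u3

# Count agreements of k with u1 and with u3 over all 8 input combinations.
matches_u1 = sum(geffe(a, b, c) == a for a, b, c in product((0, 1), repeat=3))
matches_u3 = sum(geffe(a, b, c) == c for a, b, c in product((0, 1), repeat=3))
print(matches_u1 / 8, matches_u3 / 8)   # prints: 0.75 0.75
```

This 3/4 agreement is what correlation attacks exploit: the states of the first and third LFSRs can be searched for independently, by looking for candidate output sequences that agree with the observed key stream about 75% of the time.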
Figure A.3. The Geffe generator
A non-linear filter generator generates the key as k = ψ(s0, s1, . . . , sd–1), where s0, . . . , sd–1 are the bits stored in the delay elements of a single LFSR and where ψ is a non-linear function.
Several other ad hoc schemes can destroy the linearity of an LFSR’s output. The shrinking generator, for example, uses two LFSRs L1 and L2. Both L1 and L2 are simultaneously clocked. If the output of L1 is 1, the output of L2 goes to the key stream, whereas if the output of L1 is 0, the output of L2 is discarded. The resulting key stream is an irregularly (and non-linearly) decimated subsequence of the output sequence of L2.
The non-linear function (f or ψ) eliminates the chance of mounting the straightforward known-plaintext attack described above. However, for polynomial non-linearities certain algebraic attacks are known; see, for example, Courtois and Pieprzyk [67, 66].[4] Solving non-linear polynomial equations is usually more difficult than solving linear equations, but ample care should be taken to avoid accidental encounters with easily solvable systems. Complacency is a word ever excluded from a cryptologer’s world.
[4] Visit the Internet site http://www.cryptosystem.net/ for more papers in related areas.
| A.12 | For each of the two classes of stream ciphers (Algorithms A.16, A.17), discuss the effects on decryption of the corruption or loss of a ciphertext bit during transmission. |
| A.13 | Suppose that the LFSR L of Figure A.4 is initialized to the state (1, 0, 0, 0). Derive the sequence of state transitions of the LFSR, and hence determine the output bit sequence of L. Argue that L is a maximum-length LFSR. Verify (according to the definition) that the connection polynomial CL(X) is primitive.
Figure A.4. An LFSR with four stages
|
| A.14 | Let ΔL and CL(X) be as in Equations (A.5) and (A.6). Show that:
|
| A.15 | Let L be an LFSR with connection polynomial CL(X). Further let S(X) := s0 + s1X + s2X^2 + · · · denote a power series[5] over F_2. Show that L generates the (infinite) bit sequence s0, s1, s2, . . . if and only if the product CL(X)S(X) modulo 2 is a polynomial of degree < d.
|
| A.16 | Let σ = s0s1 . . . sd–1 ≠ 00 . . . 0 be a bit string of length d ≥ 1. The linear complexity L(σ) of σ is defined to be the length of the shortest LFSR that generates σ as the leftmost part of its output (after it is initialized to a suitable state). Prove that:
|
| A.17 | Assume that the three LFSR outputs u1, u2, u3 in the Geffe generator are uniformly distributed. Show that Pr(k = u1) = 3/4 = Pr(k = u3). Thus, partial information about the internal details of the Geffe generator is leaked out in the key stream. |
A hash function maps bit strings of any length to bit strings of a fixed length n. For practical uses, hash functions should be easy to compute, that is, computing the hash of x should be doable in time polynomial in the size of x.
Since a hash function H maps an infinite set to a finite set, there must exist pairs (x1, x2) of distinct strings with H(x1) = H(x2). Such a pair is called a collision for H. For cryptographic applications (for example, for generating digital signatures), it should be computationally infeasible to find collisions for hash functions. To elaborate this topic further we mention the following two desirable properties of hash functions used in cryptography.
|
A hash function H is called second pre-image resistant, if it is computationally infeasible[6] to find, for a given bit string x1, a second bit string x2 with H(x1) = H(x2).
|
|
A hash function H is called collision resistant, if it is computationally infeasible to find any two distinct bit strings x1 and x2 with H(x1) = H(x2). |
In order to prevent existential forgery (Exercise 5.15) of digital signatures, hash functions should also be difficult to invert.
|
An n-bit hash function H is called first pre-image resistant (or simply pre-image resistant), if it is computationally infeasible to find, for almost all bit strings y of length n, a bit string x (of any length) such that y = H(x). The qualification almost all in the last sentence was necessary, since one can compute and store the pairs (xi, H(xi)), i = 1, 2, . . . , k, for some small k and for some xi of one’s choice. If the given y turns out to be one of these hash values H(xi), a pre-image of y is easily available. |
A hash function (provably or believably) satisfying all these three properties is called a cryptographic hash function. A hash function having first and second pre-image resistance is often called a one-way hash function. Some authors require both second pre-image resistance and collision resistance to define a collision-resistant hash function, but here we stick to Definitions A.3 and A.4. In what follows, an unqualified use of the phrase hash function indicates a cryptographic hash function.
Most of the properties of a cryptographic hash function are mutually independent. However, we have the following implication.
|
Proposition A.1 A collision resistant hash function is second pre-image resistant.

Proof Let H be a hash function which is not second pre-image resistant. This means that there is an algorithm A that efficiently computes second pre-images, except perhaps for a vanishingly small fraction of inputs. Choose a random bit string x1. The probability that x1 is not a bad input to A is very high and, in that case, A outputs a second pre-image x2 quickly. This gives us an efficient randomized algorithm to compute collisions (x1, x2) for H. |
The converse of Proposition A.1 is not true: A second pre-image resistant hash function need not be collision resistant (Exercise A.19). Also collision resistance (or second pre-image resistance) does not imply first pre-image resistance (Exercise A.20), and first pre-image resistance does not imply second pre-image resistance (Exercise A.21).
A hash function may or may not be used in conjunction with a secret key. An unkeyed hash function is typically used to check the integrity of a message and is often called a modification detection code (MDC). A keyed hash function, on the other hand, is usually employed to authenticate the origin of a message (in addition to verifying the integrity of the message) and so is often called a message authentication code (MAC).
Let us now describe a generic method of constructing hash functions. We start by defining the following basic building block.
|
Let m, n be positive integers with m > n. A compression function is a function F : {0, 1}^m → {0, 1}^n. |
Since m > n, collisions must exist for F. For cryptographic use, collisions should be difficult to locate. We can define first and second pre-image resistance and collision resistance of compression functions as before.
|
Input: A compression function F : {0, 1}^m → {0, 1}^n and a bit string x.
Output: The hash value H(x).
Steps:
Let λ be the bit length of x. |
Algorithm A.18 demonstrates how a compression function can be used to design an n-bit hash function H. The input message x is first broken into l ≥ 0 blocks each of bit length r, after padding with zero bits, if necessary. The initial bit length λ of x is then stored in a new block. This implies that H cannot handle bit strings of length ≥ 2^r. For a reasonably big r, this is not a practical limitation. Storing λ is necessary for several reasons. First, it ensures that the for loop is executed at least once for any message. This prevents the trivial hash value 0^r (the bit string of length r containing zero bits only) for the null message. Moreover, if hi = 0^r for some i, 1 ≤ i < l, then, without the length block, we would get H(x1 ‖ . . . ‖ xl) = H(xi+1 ‖ . . . ‖ xl), which leads to a collision for H.
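A Python sketch of the construction of Algorithm A.18, with SHA-256 standing in as a toy compression function F; the block and chaining sizes are our own choices, made only for illustration:

```python
import hashlib

R = 64   # block length r (in bytes here, for simplicity)
N = 32   # chaining-value length n, in bytes

def compress(h: bytes, block: bytes) -> bytes:
    # Stand-in compression function F: {0,1}^(n+r) -> {0,1}^n.
    return hashlib.sha256(h + block).digest()

def md_hash(x: bytes) -> bytes:
    length_block = len(x).to_bytes(R, "big")       # the block storing lambda
    if len(x) % R:
        x += b"\x00" * (R - len(x) % R)            # zero padding to a block boundary
    h = b"\x00" * N                                # h_0 := 0^n
    for i in range(0, len(x), R):
        h = compress(h, x[i:i + R])                # chain F over the blocks
    return compress(h, length_block)               # process lambda last
```

Because the length block is always processed, the compression function runs at least once even for the null message, and two messages that differ only in trailing zero padding hash differently.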
We now show that if F possesses the desired properties for use in cryptography, then so does H.
|
If F is first pre-image resistant, then so is H.

Proof Assume that H is not first pre-image resistant, that is, an efficient algorithm A exists to compute x with H(x) = y for most (if not all) y ∈ {0, 1}^n. |
|
If F is collision resistant, then H is collision resistant (and hence also second pre-image resistant).

Proof Given a collision (x, x′) for H, we can find a collision for F with little additional effort. We use the notations of Algorithm A.18, with primed variables for x′. First consider l ≠ l′. But then, in particular, the length blocks xl+1 and x′l′+1 . . . Now, suppose that . . . The only case that remains to be treated is . . . |
In order to design cryptographic hash functions, it suffices to design cryptographic compression functions. Block ciphers can be used for that purpose. Let f be a block cipher with block size n and key size r. Take m := n + r and consider the map F : {0, 1}^m → {0, 1}^n that sends x = L ‖ R, with L of length n and R of length r, to the encrypted bit string fR(L). If the fR are assumed to be random permutations of {0, 1}^n, the resulting compression function F possesses the desirable properties.
Several custom-designed hash functions have been popularly used by the cryptography community. MD4 and MD5 are somewhat older 128-bit hash functions. Soon after its conception, MD4 was found to be vulnerable to several attacks. Also collisions for the compression function of MD5 are known. Therefore, these two hash functions have lost the desired level of confidence for cryptographic uses.
NIST has proposed a family of four hash algorithms. These algorithms are called secure hash algorithms and have the short names SHA-1, SHA-256, SHA-384 and SHA-512, which respectively produce 160-, 256-, 384- and 512-bit hash values. No collisions for SHA are known till date. In the rest of this section, we explain the SHA-1 algorithm. The workings of the other SHA algorithms are very similar and can be found in the FIPS document [222]. RIPEMD-160 is another popular 160-bit hash function.
SHA-1 (like other custom-designed hash functions mentioned above) is suitable for implementation on 32-bit processors. Suppose that we want to compute the hash SHA-1(M) of a message M of bit length λ. First, M is padded to get the bit string M′ := M ‖ 1 ‖ 0^k ‖ Λ, where Λ is the 64-bit representation of λ, and where k is the smallest non-negative integer for which the bit length of M′, that is, λ + 1 + k + 64, is a multiple of 512. M′ is broken into blocks M(1), M(2), . . . , M(l), each of length 512 bits. Each M(i) is represented as a collection of sixteen 32-bit words M(i)_j, j = 0, 1, . . . , 15. SHA-1 uses big-endian packing, that is, M(i)_0 stores the leftmost 32 bits of M(i), M(i)_1 the next 32 bits of M(i), . . . , and M(i)_15 the rightmost 32 bits of M(i).
The SHA-1 computations are given in Algorithm A.19. One starts with a fixed initial 160-bit hash H(0). Successively for i = 1, 2, . . . , l, the i-th message block M(i) is considered and the previous hash value H(i–1) is updated to H(i). At the end of the loop, the 160-bit string H(l) is returned as SHA-1(M). Each H(i) is represented by five 32-bit words H(i)_j, j = 0, 1, 2, 3, 4. Here also, big-endian notation is used, that is, H(i)_0 stores the leftmost 32 bits of H(i), . . . , and H(i)_4 the rightmost 32 bits of H(i).
The updating procedure uses logical functions fj. Here, product (like xy) implies bit-wise AND, bar (as in x̄) denotes bit-wise complementation, and ⊕ denotes bit-wise XOR, each on 32-bit operands. The notation LRk(z) (resp. RRk(z)) stands for a left (resp. right) rotation, that is, a cyclic left (resp. right) shift, of the 32-bit string z by k positions.
The bits of H(i) are well-defined transformations of the bits of H(i–1) under the guidance of the bits of M(i). The substantial non-linearity introduced by the functions fj and the modulo 2^32 additions makes it difficult to invert the transformation H(i–1) ↦ H(i), and thereby makes SHA-1 an (apparently) secure hash function.
Algorithm A.19. Computation of SHA-1(M)

Input: The message M.
Output: The hash SHA-1(M) of M.
Steps:
Generate the message blocks M(i), i = 1, 2, . . . , l. |
A test vector for SHA-1 is the following (here 616263 is the string “abc”):
SHA-1(616263) = a9993e364706816aba3e25717850c26c9cd0d89d.
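The whole computation fits in a short program. The following self-contained Python implementation follows the description above (padding, big-endian word packing, and the round functions and constants from the FIPS document) and reproduces the test vector:

```python
import struct

def rotl(x, k):
    # Cyclic left shift of a 32-bit word by k positions (LR_k).
    return ((x << k) | (x >> (32 - k))) & 0xFFFFFFFF

def sha1(message: bytes) -> str:
    lam = 8 * len(message)
    m = message + b"\x80"                        # the appended 1-bit (plus 7 zero bits)
    m += b"\x00" * ((56 - len(m)) % 64)          # 0^k padding
    m += struct.pack(">Q", lam)                  # Lambda: 64-bit length, big-endian
    # The fixed initial 160-bit hash H(0), as five 32-bit words.
    H = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0]
    for i in range(0, len(m), 64):
        w = list(struct.unpack(">16I", m[i:i + 64]))   # big-endian packing
        for t in range(16, 80):                        # message schedule
            w.append(rotl(w[t-3] ^ w[t-8] ^ w[t-14] ^ w[t-16], 1))
        a, b, c, d, e = H
        for t in range(80):
            if t < 20:
                f, k = (b & c) | (~b & d), 0x5A827999
            elif t < 40:
                f, k = b ^ c ^ d, 0x6ED9EBA1
            elif t < 60:
                f, k = (b & c) | (b & d) | (c & d), 0x8F1BBCDC
            else:
                f, k = b ^ c ^ d, 0xCA62C1D6
            a, b, c, d, e = (rotl(a, 5) + f + e + k + w[t]) & 0xFFFFFFFF, a, rotl(b, 30), c, d
        H = [(x + y) & 0xFFFFFFFF for x, y in zip(H, [a, b, c, d, e])]
    return "".join(f"{h:08x}" for h in H)

print(sha1(b"abc"))   # prints: a9993e364706816aba3e25717850c26c9cd0d89d
```

The interplay of the non-linear functions fj with the modulo 2^32 additions is visible in the round update: replacing either ingredient by plain XOR would linearize the whole transformation.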
| A.18 | Let x be a bit string. Break up x into blocks x1, . . . , xl each of bit size n (after padding, if necessary). Define H1(x) := x1 ⊕ . . . ⊕ xl. Show that H1 possesses none of the desirable properties of a cryptographic hash function. |
| A.19 | Let H be an n-bit cryptographic hash function and S a finite set of strings with #S ≥ 2. Define the function H2(x) := 0^(n+1) if x ∈ S, and H2(x) := 1 ‖ H(x) otherwise. Here, 0^(n+1) refers to a bit string of length n + 1 containing zero-bits only. Show that H2 is second pre-image resistant, but not collision resistant. [H]
|
| A.20 | Let H be an n-bit cryptographic hash function. Show that the function H3, defined as H3(x) := 1 ‖ x if x is of bit length n, and H3(x) := 0 ‖ H(x) otherwise, is collision resistant (and hence second pre-image resistant), but not first pre-image resistant. [H]
|
| A.21 | Let m be a product of two (unknown) big primes and let the binary representation of m (with leading one-bit) have n bits. Assume that it is computationally infeasible to compute square roots modulo m. We can identify bit strings with integers in a natural way. For a bit string x, take y := 1 ‖ x and let H4(x) denote the n-bit binary representation of y2 (mod m). Show that H4 is first pre-image resistant, but not second pre-image resistant (and hence not collision-resistant). [H] |
| A.22 | Let H be an n-bit cryptographic hash function. Assume that H produces random hash values on random input strings. Prove that O(2^(n/2)) hash values need to be computed to detect a collision for H with high probability. [H] Deduce also that nearly 2^(n–1) hash values need to be computed on an average to obtain a second pre-image x′ of a given x with H(x′) = H(x). |
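The birthday estimate in the exercise above is easy to witness on a deliberately weakened hash. The sketch below (our own construction) truncates SHA-1 to 24 bits, so a collision appears after roughly 2^12 ≈ 4000 attempts rather than the 2^80 expected for full SHA-1:

```python
import hashlib

def truncated_hash(x: bytes, nbytes: int = 3) -> bytes:
    # A deliberately weak 24-bit hash: SHA-1 truncated to nbytes bytes.
    return hashlib.sha1(x).digest()[:nbytes]

def birthday_collision(nbytes: int = 3):
    # Hash counter values until some hash value repeats; by the birthday
    # paradox this takes around 2^(n/2) = 2^12 attempts for n = 24 bits.
    seen = {}
    i = 0
    while True:
        x = i.to_bytes(8, "big")
        h = truncated_hash(x, nbytes)
        if h in seen:
            return seen[h], x          # two distinct inputs, same hash
        seen[h] = x
        i += 1

x1, x2 = birthday_collision()
assert x1 != x2 and truncated_hash(x1) == truncated_hash(x2)
```

By contrast, finding a second pre-image of a fixed value with this 24-bit hash would require about 2^23 attempts on average, illustrating the gap between the two attack costs.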
| A.23 | Let F : {0, 1}^m → {0, 1}^n be a collision resistant compression function.
|
| A.24 |
|
| A.25 | Assume that in the SHA-1 algorithm the designers opted for Algorithm A.19 with the following minor modifications: They defined fj as fj(x, y, z) := x ⊕ y ⊕ z for all j, and they replaced all costly mod 2^32 addition operations (+) by cheap bit-wise XOR operations (⊕). Do you sense anything wrong with this design? [H]
|
| B.1 | Introduction |
| B.2 | Security Issues in a Sensor Network |
| B.3 | The Basic Bootstrapping Framework |
| B.4 | The Basic Random Key Predistribution Scheme |
| B.5 | Random Pairwise Scheme |
| B.6 | Polynomial-pool-based Key Predistribution |
| B.7 | Matrix-based Key Predistribution |
| B.8 | Location-aware Key Predistribution |
One of the keys to happiness is a bad memory.
—Rita Mae Brown
That theory is worthless. It isn’t even wrong!
—Wolfgang Pauli
You’re only as sick as your secrets.
—Anonymous
Public-key cryptography is not a solution to every security problem. Asymmetric routines are bulky and slow; in practice, they augment symmetric cryptography by eliminating the need for prior establishment of secret keys between communicating parties. On a workstation of today’s computing technology, this is an interesting and acceptable compromise. A 1 GHz processor runs one public-key encryption or key-exchange primitive in tens to hundreds of milliseconds, using at least hundreds of kilobytes of memory. That is reasonable for most applications, given that the routines are invoked rather infrequently.
Now, imagine a situation, where many tiny computing nodes, called sensor nodes, are scattered in an area for the purpose of sensing some data and transmitting the data to nearby base stations for further processing. This transmission is done by short-range radio communications. The base stations are assumed to be computationally well-equipped, but the sensor nodes are resource-starved. Such networks of sensor nodes are used in many important applications including tracking of objects in an enemy’s area for military purposes and scientific, engineering and medical explorations like wildlife monitoring, distributed seismic measurement, pollution tracking, monitoring fire and nuclear power plants and tracking patients. In some cases, mostly for military and medical applications, data collected by sensor nodes need to be encrypted before transmitting to neighbouring nodes and base stations.
Evidently one has to resort to symmetric-key cryptography in order to meet the security needs in a sensor network. Appendix B provides an overview of some key exchange schemes suitable for sensor networks.
Several issues make secure communication in sensor networks different from that in usual networks:
Each sensor node contains a primitive processor featuring very low computing speed and only a small amount of programmable memory. The popular Atmel ATmega 128L, as an example, is an 8-bit 4 MHz RISC processor with 128 kbytes of programmable flash memory and only a few kbytes of RAM. The processor does not support instructions for multiplying or dividing integers. One requires tens of minutes to several hours to perform a single RSA or Diffie–Hellman exponentiation at cryptographic key sizes.
Each sensor node is battery-powered and is expected to operate for only a few days. Once the deployed sensor nodes die, it becomes necessary to add fresh nodes to the network to continue the data collection operation. This calls for dynamic management of security objects (like keys).
Sensor nodes communicate with each other and the base stations by wireless radio transmission at low bandwidth and over small communication ranges. For the Atmel ATmega 128L processor, the maximum bandwidth is 40 kbps, and the communication range is at most 100 feet (30 m).
Moreover, the deployment area may have irregularities (like physical obstacles) that further limit the communication abilities of the nodes. One, therefore, expects that a deployed sensor node can directly communicate with only a few other nodes in the network.
A sensor network is vulnerable to capture of nodes by the enemy. The captured nodes may be physically destroyed or utilized to send misleading signals and/or disrupt the normal activity of the network. As a result, no node should have full trust in the nodes with which it communicates. The relevant security goal in this context is that the captured nodes should not divulge to the enemy enough secrets to jeopardize the communication among the uncaptured nodes.
In many situations (like scattering of nodes from airplanes or trucks), the post-deployment configuration of the sensor network is not known a priori. It is unreasonable to use security algorithms that have strong dependence on locations of nodes in the network. For example, each sensor node u is expected to have only a few neighbours with which it can directly communicate. This is precisely the set of nodes with which u needs to share keys. However, the list cannot be determined before the actual deployment. An approximate knowledge of the locations of the nodes may strengthen the protocols, but robustness for handling run-time variations must be built in the protocols.
Sensor nodes may be static or mobile. Mobile nodes change the network configurations (like the lists of neighbours) as functions of time and call for time-varying security tools.
Still, sensor nodes need to communicate secretly. The clear impracticality of using public-key routines forces one to use symmetric ciphers. But setting up symmetric keys among communicating nodes is a difficult task. The number n of nodes in a sensor network can range up to several hundred thousands. Storing a symmetric key for each pair of nodes is impossible, since that requires each sensor to have a memory large enough to store n – 1 keys. On the other extreme, every communication may use a single network-wide symmetric key. In that case the capture of a single node makes communication over the entire network completely insecure.
The plot thickens. There are graceful ways out. A host of algorithms has been recently proposed to address key establishment issues in sensor networks. In the rest of this appendix, we provide a quick survey of these tools. For the sake of simplicity, we assume here that our sensor network is static, that is, the nodes have no (or negligibly small) mobility. Though the schemes described below may be adapted to mobile networks, the required modifications are not necessarily easy and the current literature does not seem to be ready to take mobility into account.
We continue to deal with sensor processors of the capability of Atmel ATmega 128L. In practice, better processors (with speed, storage and cost roughly one order of magnitude higher) are available. We assume that the size (number of nodes) n of a sensor network is (usually) not bigger than a million, and also that a sensor node has of the order of 100 neighbours in its communication range.
Key establishment in a sensor network is effected by a three-stage process called bootstrapping. Subsequent node-to-node communication uses the keys established during the bootstrapping phase. The three stages of bootstrapping are as follows:
This step is carried out before the deployment of the sensors. A key set-up server chooses a pool K of randomly generated keys and assigns to each sensor node ui a subset Ki of K. The set Ki is called the key ring of the node ui. The key predistribution algorithms essentially differ in the ways the sets K and Ki are selected. Each key in K is associated with an ID that need not be kept secret and can even be transmitted in plaintext. Similarly, each sensor node is given a unique ID which need not be maintained secretly.
Immediately after deployment, each sensor node tries to determine all other sensor nodes with which it can communicate directly and secretly. Two nodes that are within the communication ranges of one another are called physical neighbours, whereas two nodes sharing one (or more) key(s) in their key rings are called key neighbours. Two nodes can secretly (and directly) communicate with one another if and only if they are both physical and key neighbours; we call such pairs direct neighbours.
In the direct key establishment phase, each sensor node u locates its direct neighbours. To that end u broadcasts its own ID and the IDs of the keys in its key ring. Each physical neighbour v of u responds by mentioning the matching key IDs, if any, stored in the key ring of v. This is how u identifies its direct neighbours.
If sending unencrypted key IDs can be a potential threat to the security of the network, each node u can encrypt some plaintext message m by the keys in its ring and broadcasts the corresponding ciphertexts instead of the key IDs. Those physical neighbours of u that can decrypt one of the transmitted ciphertexts using one of the keys in their respective key rings establish themselves as direct neighbours of u.
This is an optional stage and, if executed, adds to the connectivity of the network. Suppose that two physical neighbours u and v fail to establish a direct link between them in the direct key establishment phase. But there exists a path u = u0, u1, u2, . . . , uh–1, uh = v in the network with each ui a direct neighbour of ui+1 (for i = 0, 1, . . . , h – 1). The node u then generates a random key k, encrypts k with the key shared between u and u1 and sends the encrypted key to u1. Subsequently, u1 retrieves k by decryption, encrypts k by the key shared by u1 and u2 and sends this encrypted version of k to u2. This process is repeated until the key k reaches the desired destination v. Now, u and v can communicate secretly and directly using k and thereby become direct neighbours.
The main difficulty in this process is the discovery of a path between u and v. This can be achieved by u initiating a message reflecting its desire to communicate with v. Let u1 be a direct neighbour of u. If u1 is also a direct neighbour of v, a path between u and v is discovered. Else u1 retransmits u’s request to the direct neighbours u2 of u1. This process is repeated, until a path is established between u and v, or the number of hops exceeds a certain limit. Note that path discovery may incur substantial communication overhead and so the maximum number h of hops allowed needs to be fixed at a not-so-big value. Typically, the values h = 2, 3 are recommended.
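The hop-limited path discovery just described is essentially a breadth-first search over the direct-neighbour graph. A minimal Python sketch, with the network represented (our own choice) as an adjacency dictionary of direct-neighbour links:

```python
from collections import deque

def find_path(direct, u, v, max_hops=3):
    # Breadth-first search over direct-neighbour links: returns a shortest
    # path u = u0, u1, ..., uh = v with h <= max_hops, or None if none exists.
    frontier = deque([(u, [u])])
    visited = {u}
    while frontier:
        node, path = frontier.popleft()
        if node == v:
            return path
        if len(path) > max_hops:          # do not extend beyond the hop limit
            continue
        for nxt in direct.get(node, ()):
            if nxt not in visited:
                visited.add(nxt)
                frontier.append((nxt, path + [nxt]))
    return None

# A toy network: u -- a -- b -- v (each edge a direct-neighbour link).
direct = {"u": ["a"], "a": ["u", "b"], "b": ["a", "v"], "v": ["b"]}
print(find_path(direct, "u", "v", max_hops=3))   # prints: ['u', 'a', 'b', 'v']
print(find_path(direct, "u", "v", max_hops=2))   # prints: None
```

Once such a path is found, the random key k is forwarded hop by hop exactly as described above, each link re-encrypting k under the key shared by its endpoints. The hop limit bounds the flooding cost of the search, which is why small values like h = 2, 3 are preferred.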
A bootstrapping algorithm, or more precisely, a key predistribution algorithm must fulfill the following requirements. These requirements often turn out to be mutually contradictory. A key predistribution scheme attempts to achieve suitable trade-offs among them.
Each key ring should be small enough to fit in a sensor node’s memory. Typically, 50–200 cryptographic keys (say, 128-bit keys of block ciphers) can be stored in each node. This number lies between the two extremes of n – 1 keys per node (a separate key for each pair of nodes) and a single master key for the entire network.
The key rings in different nodes are to be chosen randomly from a big pool, so that the rings of any two nodes do not overlap too much.
The resulting network should be connected in the sense that the undirected graph G = (V, E), with V comprising the nodes in the network and E containing a link (u, v) if and only if u and v are direct neighbours, is connected (or is at least connected with high probability).
Ideally, the capture of any number of nodes must not divulge the secret key(s) between uncaptured direct neighbours. Practically, the fraction of communication links among uncaptured nodes, that are compromised because of node captures, must be small, at least as long as the fraction of nodes that are captured is not too high.
Arbitrarily (but not impractically) big networks should be supported.
One should allow new nodes to join the network at any point of time after the initial deployment, for example, to replenish captured, faulty and dead nodes.
Additional requirements may also be conceived of in order to take curative measures against active attacks and/or faults. However, a study of active attacks and of countermeasures against those is beyond the scope of our treatment here.
There should be a mechanism to detect the presence and identities of dead, malfunctioning and rogue nodes. Here, a rogue node stands for a captured node that is used by the enemy to disrupt the natural working of the network. Active attacks mountable by the enemy include transmission of unauthorized and misleading data across the network, making neighbours always busy and letting them run out of battery sooner than the expected lifetime (sleep deprivation attack), and so on.
Faulty and rogue nodes must be pruned out of the network before they can cause sizeable harm.
Captured nodes can be replicated and the copies deployed by the enemy with the intention that these added nodes outnumber the legitimate nodes and eventually take control of the network. There should be a strategy to detect and cure replication of malicious nodes.
We now concentrate on some concrete realizations of the bootstrapping scheme. The optional third stage (path key establishment) will often be excluded from our discussion, because this stage involves few algorithm-specific issues.
Before we introduce specific algorithms, let us summarize the notations we are going to use in the rest of this chapter:
n = Number of nodes in the sensor network
n′ = (Expected) number of nodes in the physical neighbourhood of each node
d = Degree of connectivity of each node in the key/direct neighbourhood graph
Pc = Global connectivity (a high probability like 0.9999)
p′ = Local connectivity (probability that two physical neighbours share a key)
M = Size of the key pool
m = Size of the key ring of each node (in number of cryptographic keys)
F_q = The underlying field for the poly-pool and the matrix-pool schemes
S = Size of the polynomial (or matrix) pool
s = Number of polynomial (or matrix) shares in the key ring of each node
t = Degree of a polynomial (or dimension of a matrix)
c = Number of nodes captured
Pe = Probability of successful eavesdropping, expressed as a function of c
The paper [88] by Eschenauer and Gligor presents pioneering research on bootstrapping in sensor networks. Their scheme, henceforth referred to as the EG scheme, is essentially the basic bootstrapping method just described.
The key set-up server starts with a pool K of randomly generated keys. The number M of keys in K is taken to be a small multiple of the network size n. For each sensor node u to be deployed, a random subset of m keys from K is selected and given to u as its key ring. Upon deployment, each node discovers its direct neighbours as specified in the generic description. We now explain how the parameters M and m are to be chosen so as to make the resulting network connected with high probability.
Let us first look at the key neighbourhood graph Gkey on the n sensor nodes in which a link exists between two nodes if and only if these nodes are key neighbours. Let p denote the probability that a link exists between two randomly selected nodes of this graph. A result on random graphs due to Erdös and Rényi indicates that if we write p = (ln n + ξ)/n, then, in the limit n → ∞, the probability that Gkey is connected is

Equation B.1

Pc = e^(–e^(–ξ)).

We fix Pc at a high value, say, 0.9999, solve Equation (B.1) for ξ = –ln(–ln Pc), and express the expected degree d = p(n – 1) of each node in Gkey as

Equation B.2

d = ((n – 1)/n) (ln n – ln(–ln Pc)).
In practice, we should also bring physical neighbourhood into consideration and look at the direct neighbourhood graph G = Gdirect on the n deployed sensor nodes. In this graph, two nodes are connected by an edge if and only if they are direct neighbours. G is not random, since it depends on the geographical distribution of the nodes in the deployment area. However, we assume that the above result for random graphs continues to hold for G too. In particular, we fix the degree of direct connectivity of each node to be (at least) d and require

Equation B.3

p′ = d/n′,

where n′ denotes the expected number of physical neighbours of each node, and where p′ is the probability that two physical neighbours share one or more keys in their key rings. (Pc is often called the global connectivity and p′ the local connectivity.)
For the determination of p′, we first note that there is a total of C(M, m) key rings of size m that can be chosen from the pool of size M (here C(a, b) denotes the binomial coefficient). For a fixed key ring Ki, the total number of ways of choosing a key ring Kj such that Kj does not share a key with Ki is equal to the number of ways of choosing m keys from the M – m keys lying outside Ki, that is, C(M – m, m). It then follows that

Equation B.4

p′ = 1 – C(M – m, m)/C(M, m).
Equations (B.2), (B.3) and (B.4) dictate how the key-pool size M is to be chosen, given the values of n, n′ and m.
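Equations (B.2)–(B.4) can be checked numerically. The following Python sketch (an illustration, with helper names of our own choosing) computes the required degree d and the local connectivity p′ delivered by a given pool size M, working with logarithms of binomial coefficients to avoid huge intermediate integers:

```python
from math import log, lgamma, exp

def required_degree(n, Pc):
    # Equation (B.2): d = ((n - 1)/n) * (ln n - ln(-ln Pc))
    return ((n - 1) / n) * (log(n) - log(-log(Pc)))

def log_comb(a, b):
    # natural log of the binomial coefficient C(a, b)
    return lgamma(a + 1) - lgamma(b + 1) - lgamma(a - b + 1)

def local_connectivity(M, m):
    # Equation (B.4): p' = 1 - C(M - m, m)/C(M, m)
    return 1.0 - exp(log_comb(M - m, m) - log_comb(M, m))

# parameters of the surrounding discussion
n, Pc, nprime, m = 10_000, 0.9999, 50, 150
print(required_degree(n, Pc))        # about 18.42; round up to d = 20
print(local_connectivity(40_000, m)) # comfortably above the required 0.4
```

With n = 10,000 and Pc = 0.9999 this reproduces d ≈ 18.42, and a pool of M = 40,000 keys indeed gives p′ > 0.4.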
Example B.1  As a specific numerical example, consider a sensor network with n = 10,000 nodes. For the desired probability Pc = 0.9999 of connectedness of Gkey, we use Equation (B.2) to obtain the desired degree d as d ≥ 18.419. Let us take d = 20. Now, suppose that the expected number of physical neighbours of each deployed node is n′ = 50. By Equation (B.3), we then require p′ = d/n′ = 0.4. Finally, assume that each sensor can hold m = 150 keys in its memory. Equation (B.4) indicates that we should have M ≤ 44,195 in order to ensure p′ ≥ 0.4. In particular, we may take M = 40,000.
Let us now study the resilience of the EG scheme against node captures. Assume that c nodes are captured at random from the network and that u and v are two uncaptured nodes that are direct neighbours. We compute the probability Pe that an eavesdropper can decipher encrypted communication between u and v based on the knowledge of the keys available from the c captured key rings. Clearly, smaller values of Pe indicate higher resilience against node captures.
Suppose that u and v use the key k for communication between them. Then, Pe is equal to the probability that k resides in one of the key rings of the c captured nodes. Since each key ring consists of m keys randomly chosen from a pool of M keys, the probability that a particular key k is not available in a key ring is 1 – m/M, and consequently the probability that k appears in none of the c compromised key rings is (1 – m/M)^c. Thus, the probability of successful eavesdropping is

Pe = 1 – (1 – m/M)^c.
Example B.2  As in Example B.1, take n = 10,000, n′ = 50, m = 150 and M = 40,000. If c = 100 nodes are captured, the fraction of compromised communication is Pe ≈ 0.313. Thus, a capture of only 100 nodes leads to a compromise of about one-third of the traffic. That is not a satisfactory figure. We need better algorithms.
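The eavesdropping probability just derived is a one-line computation; a small illustrative sketch in Python (function name ours):

```python
def eg_eavesdrop_prob(c, m, M):
    # Pe = 1 - (1 - m/M)^c: probability that the shared key of two
    # uncaptured direct neighbours lies in at least one of the c
    # captured key rings
    return 1.0 - (1.0 - m / M) ** c

# parameters of the example: m = 150 keys per ring, pool size M = 40,000
print(eg_eavesdrop_prob(100, 150, 40_000))  # roughly 0.313
```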
Chan et al. [44] propose several modifications of the basic EG scheme in order to improve upon the resilience of the network against node capture. The q-composite scheme, henceforth abbreviated as the qC scheme, is based on the requirement of a bigger overlap of key rings for enabling nodes to communicate.
As in the EG scheme, the key set-up server decides a pool K of M random keys and loads the key ring of each node with a random subset of size m of K. Let the network consist of n nodes.
In the direct key establishment phase, each node u discovers all its physical neighbours that share q or more keys with u, where q is a predetermined system-wide parameter. Those physical neighbours that do so are now called direct neighbours of u. Let v be a direct neighbour of u and let q′ ≥ q be the actual number of keys shared by u and v. Call these keys k1, k2, . . . , kq′. The nodes use the key
k := H(k1‖k2‖ · · · ‖kq′)
for future communication, where ‖ denotes string concatenation and H is a hash function. A pair of physical neighbours that share < q predistributed keys do not communicate directly.
Recall that for the basic EG scheme q = 1 and the key k for communication between direct neighbours is taken to be one shared key instead of a hash value of all shared keys. The motivation behind going for the qC scheme is that requiring a bigger overlap between the key rings of a pair of physical neighbours leads to a smaller probability Pe of successful eavesdropping, since now the eavesdropper has to possess the knowledge of at least q shared keys (not just one). However, the requirement of q (or more) matching keys between communicating nodes restricts the key pool size M more than the EG scheme, and consequently a capture of fewer nodes reveals a bigger fraction of the total key pool to the eavesdropper. Chan et al. [44] report that the best trade-off is achieved for the value q = 2 or 3.
Let us now derive the explicit expressions for M and Pe. Equations (B.1), (B.2) and (B.3) hold for the qC scheme with the sole exception that now the interpretation of the probability p′ of direct neighbourhood is different. There is a total of C(M, m)² ways of choosing an ordered pair of random key rings of size m from a pool of M keys. Let us compute the number of such pairs of key rings sharing exactly r keys. First, these shared r keys can be chosen in C(M, r) ways. Out of the remaining M – r keys, the remaining m – r keys for the first ring can be chosen in C(M – r, m – r) ways. Finally, the remaining m – r keys for the second ring can be chosen in C(M – m, m – r) ways from the M – m keys not present in the first ring. Thus, the probability that two random key rings share exactly r keys is

p(r) = C(M, r) C(M – r, m – r) C(M – m, m – r) / C(M, m)²,

and

p′ = 1 – (p(0) + p(1) + · · · + p(q – 1))

is the equivalent of Equation (B.4) for the qC scheme.
Example B.3  As in Example B.1, consider n = 10,000, n′ = 50, m = 150. For d = 20, we require p′ ≥ 0.4. This, in turn, demands M ≤ 16,387 for q = 2 and M ≤ 9,864 for q = 3. Compare these with the requirement M ≤ 44,195 for the EG scheme.
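For numerical work it is convenient to use the algebraically equivalent form p(r) = C(m, r)C(M – m, m – r)/C(M, m); the identity C(M, r)C(M – r, m – r) = C(M, m)C(m, r) shows the two forms agree. An illustrative Python sketch (helper names ours):

```python
from math import lgamma, exp

def log_comb(a, b):
    # natural log of the binomial coefficient C(a, b)
    return lgamma(a + 1) - lgamma(b + 1) - lgamma(a - b + 1)

def p_exact_overlap(r, M, m):
    # probability that two random m-key rings from an M-key pool share
    # exactly r keys; equal to C(M,r)C(M-r,m-r)C(M-m,m-r)/C(M,m)^2
    return exp(log_comb(m, r) + log_comb(M - m, m - r) - log_comb(M, m))

def qc_local_connectivity(q, M, m):
    # p' = 1 - (p(0) + ... + p(q - 1))
    return 1.0 - sum(p_exact_overlap(r, M, m) for r in range(q))

print(qc_local_connectivity(2, 16_000, 150))  # just above 0.4
print(qc_local_connectivity(3, 9_800, 150))   # just above 0.4
```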
Let us now calculate the probability Pe of successfully deciphering the communication between two uncaptured nodes u and v, given that c nodes are already captured by the eavesdropper. Let q′ ≥ q be the actual number of keys shared by u and v; for a communicating pair this happens with probability p(q′)/p′. Each of these common keys is available to the eavesdropper with probability 1 – (1 – m/M)^c. It follows that

Pe = Σ_{q′=q}^{m} (1 – (1 – m/M)^c)^{q′} p(q′)/p′.
Example B.4  Let us continue with the network of Examples B.1, B.2 and B.3. The following table summarizes the probabilities Pe for various values of c. For the EG scheme, we take M = 40,000, whereas for the qC scheme, we take M = 16,000 for q = 2 and M = 9,800 for q = 3.
This table indicates that when the number of nodes captured is small, the qC scheme outperforms the EG scheme. However, for large values of c, the effects of the smaller key-pool size show up, leading to a poorer performance of the qC schemes compared to the EG scheme.
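The qC eavesdropping probability Pe = Σ_{q′≥q} (1 – (1 – m/M)^c)^{q′} p(q′)/p′ can be sketched in Python as follows (illustrative helper names; the overlap probability p(r) is computed in its equivalent hypergeometric form):

```python
from math import lgamma, exp

def log_comb(a, b):
    return lgamma(a + 1) - lgamma(b + 1) - lgamma(a - b + 1)

def p_exact_overlap(r, M, m):
    # probability that two random m-key rings share exactly r keys
    return exp(log_comb(m, r) + log_comb(M - m, m - r) - log_comb(M, m))

def qc_eavesdrop_prob(c, q, M, m):
    # each shared key is individually compromised with prob. 1-(1-m/M)^c;
    # condition on the exact overlap q' >= q of the communicating pair
    key_compromised = 1.0 - (1.0 - m / M) ** c
    p_prime = 1.0 - sum(p_exact_overlap(r, M, m) for r in range(q))
    return sum(key_compromised ** r * p_exact_overlap(r, M, m) / p_prime
               for r in range(q, m + 1))

print(qc_eavesdrop_prob(50, 2, 16_000, 150))
print(qc_eavesdrop_prob(200, 2, 16_000, 150))  # grows with c
```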
Another way to improve the resilience of the network against node captures is the multi-path key reinforcement scheme proposed again by Chan et al. [44]. As in the EG scheme, sensor nodes are deployed each with m keys in its key ring chosen randomly from a pool of M keys. Let u and v establish themselves as direct neighbours sharing the key k. Instead of using k itself as the key for future communication, the nodes try to locate several pairwise node-disjoint paths between them. Such a path u = v0, v1, . . . , vl = v consists of pairs of direct neighbours (vi, vi+1) for i = 0, . . . , l – 1. A randomly generated key k′ is then routed securely along the path from u to v.
Assume that r node-disjoint paths between u and v are discovered and that random keys k1, k2, . . . , kr are transferred securely along these paths. The nodes u and v then use the key

k′ := k ⊕ k1 ⊕ k2 ⊕ · · · ⊕ kr

for their future communication, where ⊕ denotes bit-wise XOR. The reason why this scheme improves resilience against node captures is that even if the original k resides in the memory of a captured node, the new key k′ is computable by the adversary if and only if she can also obtain all of the r session secrets k1, . . . , kr. The bigger r is, the more difficult it is for the adversary to eavesdrop on all of the r node-disjoint paths. On the other hand, if the lengths of these paths are large, then the probability of eavesdropping at some links of the paths increases. Moreover, increasing the lengths of the paths incurs bigger communication overhead. The proponents of the scheme recommend only 2-hop multi-path key reinforcement.
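The combination of the direct key with the path-delivered session secrets can be sketched as a bit-wise XOR (illustrative Python; the key length and helper names are our own choices):

```python
import secrets

KEY_LEN = 16  # 128-bit keys, a typical block-cipher key size

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def reinforce(k, path_secrets):
    # k' = k XOR k1 XOR ... XOR kr: recoverable only with *every* secret
    out = k
    for s in path_secrets:
        out = xor_bytes(out, s)
    return out

k = secrets.token_bytes(KEY_LEN)                         # direct key
shares = [secrets.token_bytes(KEY_LEN) for _ in range(3)]  # r = 3 paths
k_new = reinforce(k, shares)
assert reinforce(k_new, shares) == k  # XOR is self-inverse
```

An adversary who knows k but misses even one of the r secrets learns nothing about k′.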
We do not go into the details of the analysis of the multi-path key reinforcement scheme, but refer the reader to Chan et al. [44]. We only note that though it is possible to use multi-path key reinforcement for the q-composite scheme, it is not a lucrative option. The smaller size of the key pool for the q-composite scheme tends to nullify the effects of multi-path key reinforcement.
A pairwise key predistribution scheme offers perfect resilience against node captures, that is, the capture of any number c of nodes does not reveal any information about the secrets used by uncaptured nodes. This corresponds to Pe = 0 irrespective of c. This desirable property of the network is achieved by giving each key to the key rings of only two nodes. Moreover, the sharing of a key k between two unique nodes u and v implies that these nodes can authenticate themselves to one another — no other node possesses k and can prove itself as u to v or as v to u.
Pairwise keys can be distributed to nodes in many ways. Now, we deal with random distribution. Let m denote the size of the key ring of each sensor node. For each node u in the network, the key set-up server randomly selects m other nodes v1, . . . , vm and distributes a new random key ki to each of the pairs (u, vi) for i = 1, . . . , m. This distribution mechanism should also ensure that two nodes u, v in the network share at most one key. If k is given to u and v, the set-up server also attaches the ID of v to the copy of k in the key ring of u and the ID of u to the copy of k in the key ring of v.
In the direct key establishment phase, each node u broadcasts its own ID. Each physical neighbour v of u, that finds the ID of u stored against a key in the key ring of v, identifies u as its direct neighbour and also the unique key shared by u and v.
The analysis of the random pairwise scheme is a bit tricky. Here, the global connectivity graph Gkey is m-regular, that is, each node has degree exactly m, and we cannot expect to maintain this degree locally too. On the other hand, it is reasonable to assume under a random deployment model that the fraction of nodes with which a given node shares pairwise keys remains the same both locally and globally. More precisely, we equate p′ with p, that is,

Equation B.5

d/n′ = p′ = p = m/(n – 1).
Here, d denotes the desired local degree of a node. Equation (B.2) gives the formula for d in terms of the global connectivity Pc. For Pc = 0.9999, we have d = 16.11 for n = 1,000, d = 18.42 for n = 10,000, d = 20.72 for n = 100,000, and d = 23.03 for n = 1,000,000. That is, the value of d does not depend heavily on n, as long as n ranges over practical values. In particular, one may fix d = 20 (or d = 25 more conservatively) for all applications.
Equation (B.5) implies

n = (n′m/d) + 1 ≈ n′m/d.

This equation reflects the drawback of the random pairwise scheme. The value m is limited by the memory of a sensor node, n′ is dictated by the density of nodes in the deployment area, and d can be taken as a constant, so the network size n is bounded above by the quantity n′m/d, called the maximum supportable network size. The basic scheme (and its variants) supports networks of arbitrarily large sizes, whereas the random pairwise scheme offers only limited support.
Example B.5  Take m = 150, n′ = 50 and d = 20. The maximum supportable network size is then n′m/d = (50 × 150)/20 = 375.
Since m and d are limited by hard constraints, the only way to increase the maximum supportable network size is to increase the effective size n′ of the physical neighbourhood of a node. The multi-hop range extension strategy accomplishes that. In the direct key establishment phase, each node u broadcasts its ID. Each physical neighbour v of u re-broadcasts the ID of u. Each physical neighbour w of v then re-broadcasts the ID of u again. This process is continued for a predetermined number r of hops. Any node u′ reachable from u in ≤ r hops and sharing a pairwise key with u can now establish a path of secure communication with u. During a future communication between u and u′, the intermediate nodes in the path simply forward a message encrypted by the pairwise key between u and u′. Using r hops thereby increases the effective radius of physical neighbourhood by a factor of r, and consequently the number of effective neighbours of each node gets multiplied by a factor of r². Thus, the maximum supportable network size now becomes
n = r² n′m/d.
For r = 3 and for the parameters of Example B.5, this size now attains a more decent value of 3375.
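The bound and its r-hop extension amount to one line of arithmetic (illustrative Python):

```python
def max_network_size(m, nprime, d, r=1):
    # n_max = r^2 * n' * m / d: r-hop range extension multiplies the
    # effective neighbourhood size, and hence n_max, by r^2
    return r * r * nprime * m // d

print(max_network_size(150, 50, 20))       # 375, as in Example B.5
print(max_network_size(150, 50, 20, r=3))  # 3375 with 3-hop extension
```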
Increasing r incurs some cost. First, the communication overhead increases quadratically with r. Second, since intermediate nodes in a multi-hop path simply retransmit messages without authentication, chances of specific active attacks at these nodes increase. Large values of r are, therefore, discouraged.
Liu and Ning’s polynomial-pool-based key predistribution scheme (abbreviated as the poly-pool scheme) [181, 183] is based on the idea presented by Blundo et al. [28]. Let F_q be a finite field with q just large enough to accommodate a symmetric encryption key. For a 128-bit block cipher, one may take q to be the smallest prime larger than 2^128 (prime field) or 2^128 itself (extension field of characteristic 2). Let f(X, Y) ∈ F_q[X, Y] be a bivariate polynomial that is assumed to be symmetric, that is, f(X, Y) = f(Y, X). Let t be the degree of f in each of X and Y. A polynomial share of f is a univariate polynomial f^(α)(X) := f(X, α) for some element α ∈ F_q. Two shares f^(α) and f^(β) of the same polynomial f satisfy

Equation B.6

f^(α)(β) = f(β, α) = f(α, β) = f^(β)(α).

Thus, if the shares f^(α), f^(β) are given to two nodes, they can come up with the common value f(α, β) as a shared secret between them.
Given t + 1 or more shares of f, one can reconstruct f(X, Y) uniquely using Lagrange’s interpolation formula (Exercise 2.53). On the other hand, if only t or fewer shares are available, there are many (at least q) possibilities for f, and it is impossible to determine f uniquely. So the disclosure of up to t shares does not reveal the polynomial f to an adversary, and uncompromised shared keys based on f remain secure.
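The share-based key agreement of Equation (B.6) can be sketched in a few lines of Python. The field below is a small Mersenne prime chosen purely for illustration; a real deployment would use a field of about 2^128 elements as described above:

```python
import random

P = 2**31 - 1  # small prime field, for illustration only
t = 4          # degree of f in each of X and Y

# random symmetric coefficient matrix: a[i][j] = a[j][i], so that
# f(X, Y) = sum_{i,j} a[i][j] X^i Y^j satisfies f(X, Y) = f(Y, X)
a = [[0] * (t + 1) for _ in range(t + 1)]
for i in range(t + 1):
    for j in range(i, t + 1):
        a[i][j] = a[j][i] = random.randrange(P)

def share(alpha):
    # coefficients of the univariate share f(X, alpha)
    return [sum(a[i][j] * pow(alpha, j, P) for j in range(t + 1)) % P
            for i in range(t + 1)]

def eval_share(coeffs, x):
    return sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P

alpha, beta = 17, 42  # two node IDs
# both nodes compute the same pairwise key f(alpha, beta) = f(beta, alpha)
assert eval_share(share(alpha), beta) == eval_share(share(beta), alpha)
```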
Using a single polynomial for the entire network is not a good proposal, since t is limited by memory constraints in a sensor node. In order to increase resilience against node captures, many bivariate polynomials need to be used, and shares of random subsets of this polynomial pool are assigned to the key rings of individual nodes. This is how the poly-pool scheme works. If the degree t equals 0, this scheme degenerates to the EG scheme.
The key set-up server first selects a random pool of S symmetric bivariate polynomials in F_q[X, Y], each of degree t in X and Y. Some IDs α ∈ F_q are also generated for the nodes in the network. For each node u in the network, s polynomials f1, f2, . . . , fs are randomly picked from the pool, and the polynomial shares f1(X, α), f2(X, α), . . . , fs(X, α) are loaded in the key ring of u, where α is the ID of u. Each key ring now requires space for storing s(t + 1) log q bits, that is, for storing m := s(t + 1) symmetric keys.
Upon deployment, each node u broadcasts the IDs of the polynomials, the shares of which reside in its key ring. Each physical neighbour v of u, that has shares of some common polynomial(s), establishes itself as a direct neighbour of u. The exact pairwise key k between u and v is then calculated using Equation (B.6). If broadcasting polynomial IDs in plaintext is too unsafe, each node u can send some message encrypted by potential pairwise keys based on its polynomial shares. Those physical neighbours that can decrypt one of these encrypted messages have shares of common polynomials.
Like the EG scheme, the poly-pool scheme can be analysed under the framework of random graphs. Equations (B.1), (B.2) and (B.3) continue to hold under the poly-pool scheme. However, in this case the local connection probability p′ is computed as

Equation B.7

p′ = 1 – C(S – s, s)/C(S, s).

Given constraints on the network and the nodes, the desired size S of the polynomial pool can be determined from this formula.
Let us now compute the probability Pe of compromise of communication between two uncaptured nodes u, v as a function of the number c of captured nodes. If c ≤ t, the eavesdropper cannot gather enough polynomial shares to learn anything about any polynomial in the pool, that is, Pe = 0. So assume that c > t, and let pr denote the probability that exactly r shares of a given polynomial f (say, the one whose shares are used by the two uncaptured nodes u, v) are available in the key rings of the c captured nodes. The probability that a share of f is present in a key ring is s/S, and so (by the binomial distribution)

Equation B.8

pr = C(c, r) (s/S)^r (1 – s/S)^(c – r).

Since t + 1 or more shares of f are required for the determination of f, we have

Equation B.9

Pe = pt+1 + pt+2 + · · · + pc = 1 – (p0 + p1 + · · · + pt).
Example B.6  Let n = 10,000 (network size), n′ = 50 (expected size of the physical neighbourhood of a node), m = 150 (key-ring size in number of symmetric keys) and Pc = 0.9999 (global connectivity). Let us plan to choose bivariate polynomials of degree t = 49, so that each key ring can hold s = 3 polynomial shares. For the determination of S, we first compute d = 20 as in Example B.1, so that we require p′ = d/n′ = 0.4. Equation (B.7) then demands S ≤ 20, and we take S = 20. The following table lists the probability Pe for various values of c.
The table shows substantial improvement in resilience against node capture as achieved by the poly-pool scheme over the EG and qC schemes.
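Equations (B.8) and (B.9) can be evaluated with a simple binomial recurrence (illustrative Python; the max(0, ·) merely guards against floating-point rounding):

```python
def poly_pool_eavesdrop_prob(c, t, s, S):
    # Pe = sum_{r=t+1}^{c} C(c, r) (s/S)^r (1 - s/S)^(c-r)
    if c <= t:
        return 0.0
    p = s / S
    term = (1.0 - p) ** c          # the r = 0 term
    below = term
    for r in range(1, t + 1):      # accumulate p0 + ... + pt iteratively,
        term *= (c - r + 1) / r * p / (1.0 - p)  # using C(c,r)/C(c,r-1)
        below += term
    return max(0.0, 1.0 - below)

# parameters of Example B.6: t = 49, s = 3, S = 20
print(poly_pool_eavesdrop_prob(100, 49, 3, 20))  # still negligible
print(poly_pool_eavesdrop_prob(500, 49, 3, 20))  # close to 1
```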
The poly-pool scheme can be made pairwise by allowing no more than t + 1 shares of any polynomial to be distributed among the nodes. The best that the adversary can achieve is a capture of nodes with all these t + 1 shares and a subsequent determination of the corresponding bivariate polynomial. But this knowledge does not help the adversary, since no other node in the network uses a share of this compromised polynomial. That is, two uncaptured nodes continue to communicate with perfect secrecy.
However, like the random pairwise scheme, the pairwise poly-pool scheme suffers from the drawback that the maximum supportable network size is now limited by the quantity S(t + 1)/s, since the pool yields only S(t + 1) distinct shares and each node consumes s of them. For the parameters of Example B.6, this size turns out to be an impractically low 20 × 50/3 ≈ 333.
The grid-based key predistribution considerably enhances the resilience of the network against node captures. To start with, let us play a bit with Example B.6.
Example B.7  Take n = 10,000, n′ = 50 and m = 150. We calculated that the optimal value of S that keeps the network connected with high probability is S = 20. Now, let us instead take a much bigger value of S, say, S = 200. First, let us look at the brighter side of this choice. The probability Pe is listed in the following table as a function of c.
That is a dramatic improvement in the resilience figures. It, however, comes at a cost. The optimal value S = 20 was selected in Example B.6 in order to achieve a desired connectivity in the network. With S = 200, the probability p′ reduces from 0.404 to only about 0.045.
The grid-based key predistribution allocates polynomial shares cleverly to the nodes so as to achieve the resilience figures of the last example with a reasonable guarantee that the resulting network remains connected. Let n be the size of the network and take σ := ⌈√n⌉. For the sake of simplicity, let us assume that n = σ². The n nodes are then placed on a σ × σ square grid. The node at the (i, j)-th grid location (where 1 ≤ i, j ≤ σ) is identified by the pair (i, j). The set-up server generates 2σ random symmetric bivariate polynomials f_1^r, . . . , f_σ^r, f_1^c, . . . , f_σ^c, each of degree t in both X and Y. The i-th polynomial f_i^r corresponds to the i-th row and the j-th polynomial f_j^c to the j-th column in the grid. The key ring of the node at location (i, j) in the grid is given the two polynomial shares f_i^r(X, j) and f_j^c(X, i). The memory required for this is equivalent to the storage for 2(t + 1) symmetric keys.
Now, look at the key establishment phase. Let two nodes u, v with IDs (i, j) and (i′, j′) be physical neighbours after deployment. First, consider the simple case i = i′. Both the nodes have shares of the row polynomial f_i^r and can arrive at the common secret value f_i^r(j, j′) using the column identities of one another. Similarly, if j = j′, the nodes can compute the shared secret f_j^c(i, i′). It follows that each node can establish keys directly with 2(σ – 1) other nodes in the network. That is, however, a truly small fraction of the entire network.
Assume now that i ≠ i′ and j ≠ j′. If the node w with identity either (i, j′) or (i′, j) is in the physical neighbourhood of both u and v, then there is a secure link between u and w, and also one between w and v. The nodes u and v can then establish a path key via the intermediate node w.
So suppose also that neither (i, j′) nor (i′, j) resides in the communication ranges of both u and v. Consider the nodes w1 := (i, k) and w2 := (i′, k) for some k ≠ j, j′. Suppose further that w1 is in the physical neighbourhood of u, w2 in that of w1, and v in that of w2. But then there is a secure u, v-path comprising the links u → w1, w1 → w2 and w2 → v. Similarly, the nodes (k, j) and (k, j′) for each k ≠ i, i′ can help u and v establish a path key. To sum up, there are 2(σ – 2) potential three-hop paths between u and v.
If all these three-hop paths fail, one may go for four-hop, five-hop, . . . paths, but at the cost of increased communication overhead. As argued in Liu and Ning [181, 183], exploring paths with ≤ 3 hops is expected to give the network high connectivity.
For the grid-based scheme, we have S = 2σ (the size of the polynomial pool) and s = 2 (the number of polynomial shares in each node’s key ring). Thus, the probability Pe can now be derived like Equations (B.8) and (B.9) as

Pe = 1 – (p0 + p1 + · · · + pt) = pt+1 + pt+2 + · · · + pc,

where

pr = C(c, r) (s/S)^r (1 – s/S)^(c – r) = C(c, r) (1/σ)^r (1 – 1/σ)^(c – r).
Example B.8  Take n = 10,000 and m = 150. Since each node has to store only two polynomial shares, we now take t = 74. Moreover, σ = 100, that is, the size of the polynomial pool is S = 200. The probability Pe can now be tabulated as a function of c (the number of nodes captured) as follows:
This is a very pretty performance. The capture of even 60 per cent of the nodes leads to a compromise of only 3.34 per cent of the communication among uncaptured nodes.
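The same binomial tail, specialized to s/S = 1/σ, reproduces the grid-scheme figures (illustrative Python; for c = 6000 captured nodes it returns a value near the 3.34 per cent quoted above):

```python
def grid_eavesdrop_prob(c, t, sigma):
    # Pe = P[more than t of the c captured rings hold a share of the
    # polynomial], each ring holding one with probability
    # s/S = 2/(2*sigma) = 1/sigma
    p = 1.0 / sigma
    term = (1.0 - p) ** c          # the r = 0 term
    below = term
    for r in range(1, t + 1):
        term *= (c - r + 1) / r * p / (1.0 - p)
        below += term
    return max(0.0, 1.0 - below)   # guard against rounding

# sigma = 100 (so S = 200) and t = 74, as in the example above
print(grid_eavesdrop_prob(6000, 74, 100))  # about 0.033
```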
This robustness of the grid-based distribution comes at a cost, though. The path key establishment stage is communication-intensive and is mandatory for ensuring good connectivity. Moreover, this stage is based on the assumption that during bootstrapping not many nodes are captured. If this assumption cannot necessarily be enforced, the scheme forfeits much of its expected resilience guarantees.
The matrix-based key predistribution scheme is derived from the idea proposed by Blom [25]. It is similar to the polynomial-based key predistribution and employs symmetric matrices (in place of symmetric polynomials). Let F_q be a finite field with q just large enough to accommodate a symmetric key, and let G be a t × n matrix over F_q, where t is determined by the memory of a sensor node and n is the number of nodes in the network. It is not required to keep G secret. Anybody, even the enemy, may know G. We only require G to have rank t, that is, any t columns of G must be linearly independent. If g is a primitive element of F_q, the following matrix is recommended.
Equation B.10

G(i, j) = g^((i – 1)j) for 1 ≤ i ≤ t and 1 ≤ j ≤ n,

that is, the j-th column of G is (1, g^j, g^(2j), . . . , g^((t – 1)j))^t. Since the elements g, g², . . . , g^n are pairwise distinct, any t columns of this G form a Vandermonde matrix and are hence linearly independent.

In a memory-starved environment, this G has a compact representation, since its j-th column is uniquely identified by the value g^j. The remaining elements in the column can be easily computed by performing a few multiplications.
Let D be a secret t × t symmetric matrix, and A the n × t matrix defined by

A := (DG)^t = G^t D^t = G^t D.

Finally, define the n × n matrix

K := AG.

It follows that K = AG = G^t DG = G^t (G^t D^t)^t = G^t (G^t D)^t = G^t A^t = (AG)^t = K^t, that is, K is a symmetric matrix. If the (i, j)-th element of K is denoted by kij, we have kij = kji, so that this common value can be used as a pairwise key between the i-th and j-th nodes.

Let the (i, j)-th element of A be denoted by aij for 1 ≤ i ≤ n and 1 ≤ j ≤ t. Also let gij, for 1 ≤ i ≤ t and 1 ≤ j ≤ n, denote the (i, j)-th element of G. The pairwise key kij = kji is then expressed as

kij = ai1 g1j + ai2 g2j + · · · + ait gtj.
Thus, the i-th row of A and the j-th column of G suffice for the i-th node to compute kij. Similarly, the j-th row of A and the i-th column of G allow the j-th node to compute kji. In view of this, every node, say the i-th, is required to store the i-th row of A and the i-th column of G. If G is as in Equation (B.10), only g^i needs to be stored instead of the full i-th column of G. Thus, the storage of t + 1 elements of F_q (equivalent to t + 1 symmetric keys) suffices.
During direct key establishment, two physical neighbours exchange their respective columns of G for the computation of the common key. Since G is allowed to be public knowledge, this communication does not reveal secret information to the adversary.
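Blom’s construction can be exercised over a toy field (illustrative Python; q = 101 and t = 3 are far too small for real use, and 2 is a primitive element modulo 101):

```python
import random

q = 101     # small prime field, for illustration only
t, n = 3, 6 # t x n public matrix G, n nodes
g = 2       # a primitive element modulo 101

# G of Equation (B.10): column j (1-based) is (1, g^j, ..., g^((t-1)j))
G = [[pow(g, i * j, q) for j in range(1, n + 1)] for i in range(t)]

# secret random symmetric t x t matrix D
D = [[0] * t for _ in range(t)]
for i in range(t):
    for j in range(i, t):
        D[i][j] = D[j][i] = random.randrange(q)

# A = (DG)^t = G^t D, an n x t matrix; row i of A is node i's secret
A = [[sum(D[i][l] * G[l][j] for l in range(t)) % q for i in range(t)]
     for j in range(n)]

def key(i, j):
    # node i computes k_ij from its private row A[i] and public column j of G
    return sum(A[i][l] * G[l][j] for l in range(t)) % q

# K = AG is symmetric, so both nodes derive the same pairwise key
assert all(key(i, j) == key(j, i) for i in range(n) for j in range(n))
```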
Suppose that the adversary gains knowledge of some t′ ≥ t rows of A (say, by capturing nodes). We also assume that the matrix G is completely known to the adversary. The adversary picks up any t known rows of A and constructs a t × t matrix A′ comprising these rows. But then A′ = G′D, where G′ is a suitable t × t submatrix of G. Since G is assumed to be of rank t, G′ is invertible and so the secret matrix D can be easily computed. Conversely, if D is known to the adversary, she can compute A and, in particular, any t′ ≥ t rows of A.
If only t′ < t rows of A are known to the adversary, then every assignment of values to t – t′ additional rows of A yields some candidate for the matrix D (from which the remaining rows of A could then be constructed). In other words, D cannot be recovered uniquely from a knowledge of fewer than t rows of A. An exhaustive search is infeasible too, since there is an astronomically large number of choices for assigning values to the elements of the t – t′ missing rows.
To sum up, the matrix-based key predistribution scheme is completely secure if fewer than t nodes are captured. On the other hand, if t or more nodes are captured, then the system is completely compromised. Thus, the resilience of this scheme against node capture is determined solely by t and is independent of the size n of the network. The parameter t, in turn, is restricted by the memory of a sensor node (a node has to store t + 1 elements of F_q).
In order to overcome this difficulty, Du et al. [79] propose a matrix-pool-based scheme. Here, S matrices A1, A2, . . . , AS are computed from S pairwise different secret matrices D1, D2, . . . , DS. The same G may be used for all these key spaces. Each node is given shares (that is, rows) of s matrices randomly chosen from the pool {A1, A2, . . . , AS}. The resulting details of the matrix-pool-based scheme are quite analogous to those pertaining to the polynomial-pool-based scheme described in the earlier section, and are omitted here.
The key predistribution algorithms discussed so far are based on a random deployment model. In practice, the deployment model (like the expected location of each node and the overall geometry of the deployment area) may be known a priori. This knowledge can be effectively exploited to tune the key predistribution algorithms so as to achieve better connectivity and higher resilience against node capture. As an example, consider sensor nodes deployed from airplanes in groups or scattered uniformly from trucks. Since the approximate tracks of these vehicles are planned a priori, the key rings of the nodes can be loaded appropriately to achieve the expected performance enhancements.
Two nodes that are in the physical neighbourhoods of one another need only share a pairwise key. Therefore, the basic objective of designing location-aware schemes is to predistribute keys in such a way that two nodes that are expected to remain close in the deployment area are given common pairwise keys, whereas two nodes that are expected to be far away after deployment need not share any pairwise key. The actual deployment locations of the nodes cannot usually be predicted accurately. Nonetheless, an approximate knowledge of the locations can boost the performance of the network considerably. The smaller the errors between the expected and actual locations of the nodes are, the better a location-aware scheme is expected to perform.
Liu and Ning [182] propose a modification of the random pairwise key scheme (Section B.5) based on deployment knowledge. Let there be n sensor nodes in the network with each node capable of storing m cryptographic keys. The expected deployment location of each node is provided to the key set-up server. For each node u in the network, the server determines m other nodes whose expected locations of deployment are closest to that of u and for which pairwise keys with u have not already been established. For every such node v, a new random key kuv is generated. The key-plus-ID combination (kuv, v) is loaded in u’s key ring, whereas the pair (kuv, u) is loaded in v’s key ring.
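A minimal sketch of this set-up step follows. The parameters, the Euclidean-distance ordering, and the dictionary-based key rings are illustrative assumptions, not details fixed by the scheme.

```python
# Sketch of the closest-pairwise-keys set-up step: each node is given
# fresh random keys shared with the m nodes whose expected deployment
# locations are nearest, skipping pairs that are already keyed.
import os, math, random

random.seed(1)
n, m = 30, 5                           # network size; keys storable per node
expected = [(random.random(), random.random()) for _ in range(n)]
key_ring = {u: {} for u in range(n)}   # node u's ring maps peer ID -> key

for u in range(n):
    # candidate peers, sorted by distance between expected locations
    order = sorted((v for v in range(n) if v != u),
                   key=lambda v: math.dist(expected[u], expected[v]))
    for v in order:
        if len(key_ring[u]) >= m:
            break
        if v not in key_ring[u] and len(key_ring[v]) < m:
            k_uv = os.urandom(16)      # new random pairwise key
            key_ring[u][v] = k_uv      # u stores (k_uv, v)
            key_ring[v][u] = k_uv      # v stores (k_uv, u)

# every keyed pair holds the same key, and no ring exceeds m entries
assert all(key_ring[v][u] == k
           for u in key_ring for v, k in key_ring[u].items())
assert all(len(r) <= m for r in key_ring.values())
```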
This natural and simple-minded strategy provides complete security against node capture, since it is a pairwise key distribution scheme. Moreover, there is no limit on the maximum supportable network size (under the reasonable assumption that the network contains far fewer than 2^l nodes, where l is the bit length of a cryptographic key, say, 64 or 128). The incorporation of deployment knowledge also increases the connectivity of the network. In order to analyse this gain, we first introduce some formal notation.
For the sake of simplicity, we assume that the deployment region is two-dimensional, so that every point in that region is expressed by two coordinates x and y. Let u be a sensor node whose expected deployment location is (ux, uy) and whose actual deployment location is
. This corresponds to a deployment error of
. The actual location
(or equivalently the error eu) is modelled as a continuous random variable that can assume values in
. The probability density function fu of
characterizes the pattern of deployment error. One possibility is to assume that
is uniformly distributed within a circle with centre at (ux, uy) and of radius ∊ called the maximum deployment error. We then have:
Equation B.11
fu(x, y) = 1/(π∊2) if (x – ux)2 + (y – uy)2 ≤ ∊2, and fu(x, y) = 0 otherwise.
An arguably more realistic strategy is to model
as a random variable following the two-dimensional normal (Gaussian) distribution with mean (ux, uy) and variance σ2. The corresponding density function is:
fu(x, y) = (1/(2πσ2)) exp( –[(x – ux)2 + (y – uy)2] / (2σ2) ).
Let u and v be two deployed nodes. We assume that each node has a communication range of ρ. We also make the simplifying assumption that the different nodes are deployed independently, that is,
and
are independent random variables. The probability that u and v lie in the communication ranges of one another can be expressed as a function of the expected locations (ux, uy) and (vx, vy) as:

p(u, v) = ∫∫C fu(x1, y1) fv(x2, y2) dx1 dy1 dx2 dy2.

Here, the integral is over the region C of ℝ4 defined by (x1 – x2)2 + (y1 – y2)2 ≤ ρ2.
Let n′ denote the number of physical neighbours of u (or of any sensor node). We know that u shares pairwise keys with exactly m nodes. We assume that these key neighbours of u are distributed uniformly in a circle centred at u and of radius ρ′. The expected value of ρ′ is:

Let v be a key neighbour of u. The probability that v lies in the physical neighbourhood of u is given by
p(u) = (1/(πρ′2)) ∫∫C′ p(u, v) dvx dvy,
where C′ is the region (vx – ux)2 + (vy – uy)2 ≤ ρ′2. Therefore, u is expected to have m × p(u) direct neighbours. Since the size of the physical neighbourhood of u is n′, the local connectivity, that is, the probability that u can establish a pairwise key with a physical neighbour is given by
p′ = (m × p(u)) / n′.
In general, it is difficult to compute the above integrals. Liu and Ning [182] compute the probability p′ for the density function given by Equation (B.11) and establish that p′ ≈ 1 for small deployment errors, namely ∊ ≤ ρ. As ∊ increases, p′ gradually reduces to the corresponding probability for the random pairwise scheme.
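Since the integrals are hard in closed form, a Monte Carlo estimate of the probability that two nodes end up within communication range is a quick numeric substitute. The sketch below uses the uniform-in-disc error model of Equation (B.11); all parameter values are illustrative.

```python
# Monte Carlo estimate of the probability that two nodes, each deployed
# with a uniform-in-disc error of radius eps around its expected
# location, land within communication range rho of one another.
import math, random

random.seed(7)

def deploy(cx, cy, eps):
    # sample the actual location uniformly in the disc of radius eps
    # centred at the expected location (cx, cy), by rejection sampling
    while True:
        dx, dy = random.uniform(-eps, eps), random.uniform(-eps, eps)
        if dx * dx + dy * dy <= eps * eps:
            return cx + dx, cy + dy

def p_in_range(u_exp, v_exp, eps, rho, trials=20000):
    hits = 0
    for _ in range(trials):
        u = deploy(*u_exp, eps)
        v = deploy(*v_exp, eps)
        if math.dist(u, v) <= rho:
            hits += 1
    return hits / trials

# expected separation 0.5, errors at most 0.2 each: always within rho = 1
print(p_in_range((0, 0), (0.5, 0), eps=0.2, rho=1.0))   # → 1.0
# expected separation 3.0: never within rho = 1
print(p_in_range((0, 0), (3.0, 0), eps=0.2, rho=1.0))   # → 0.0
```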
In order to add sensor nodes at a later point in time, the key set-up server again uses deployment knowledge. The key rings of the new nodes are loaded based on the expected deployment locations of these nodes and on the (expected or known) locations of the deployed nodes. Pairwise keys between the new and the deployed nodes are communicated to the deployed nodes over secure channels (routing through uncompromised nodes).
Several variants of the closest pairwise keys scheme have been proposed. Liu and Ning themselves propose an extension based on pseudorandom functions [182]. Du et al. propose a variant of the basic (EG) scheme based on a specific model of deployment [80]. We end this section by briefly outlining a location-aware adaptation of the polynomial-pool-based scheme (Section B.6).
For simplicity, let us assume that the deployment region is a rectangular area. This region is partitioned into a 2-dimensional array of rectangular cells. Let the partition consist of R rows and C columns. The cell located at the i-th row and the j-th column is denoted by Ci,j. The neighbours of the cell Ci,j are taken to be the four adjacent cells: Ci–1,j, Ci+1,j, Ci,j–1, Ci,j+1.
The key set-up server first fixes a finite field
with q just big enough to accommodate a cryptographic key. The server also chooses R×C random symmetric bivariate polynomials
. The polynomial fi,j is meant for the cell Ci,j. The degree t (in both X and Y) of each fi,j is so chosen that each sensor node has sufficient memory to store the shares of five such polynomials.
Let u be a node to be deployed and let the expected deployment location of u lie in the cell Ci,j called the home cell of u. The key ring of u is loaded with the shares (evaluated at u) of the five polynomials corresponding to the home cell and its four neighbouring cells. More precisely, u gets the five shares: fi,j(X, u), fi–1,j(X, u), fi+1,j(X, u), fi,j–1(X, u), and fi,j+1(X, u). The set-up server also stores in u’s memory the ID (i, j) of its home cell.
In the direct key establishment phase, each node u broadcasts the ID (i, j) of its home cell (or some messages encrypted by potential pairwise keys). Those physical neighbours whose home cells are either the same as or neighbouring to that of u can establish pairwise keys with u.
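The cell-based share assignment just described can be sketched as follows. The field size, grid dimensions, and node IDs are illustrative, and the symmetric bivariate polynomials are represented naively by their coefficient matrices.

```python
# Sketch of the location-aware polynomial-pool assignment: each cell
# (i, j) owns a random symmetric bivariate polynomial f_{i,j} mod q, and
# a node receives shares of its home cell's polynomial and of the four
# neighbouring cells' polynomials.
import random

random.seed(3)
q, t = 2**31 - 1, 3            # field size; degree t in both X and Y
R, C = 4, 4                    # grid of R x C cells

def sym_poly():
    # symmetric (t+1) x (t+1) coefficient matrix: f(X, Y) = sum c[i][j] X^i Y^j
    c = [[0] * (t + 1) for _ in range(t + 1)]
    for i in range(t + 1):
        for j in range(i, t + 1):
            c[i][j] = c[j][i] = random.randrange(q)
    return c

pool = {(i, j): sym_poly() for i in range(R) for j in range(C)}

def share(cell, u):
    # univariate share f_cell(X, u): coefficients of X^i after setting Y = u
    c = pool[cell]
    return [sum(c[i][j] * pow(u, j, q) for j in range(t + 1)) % q
            for i in range(t + 1)]

def key_ring(home, u):
    i, j = home                # shares for the home cell and its 4 neighbours
    cells = [(i, j), (i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)]
    return {c: share(c, u) for c in cells if c in pool}

def eval_share(sh, x):
    return sum(sh[i] * pow(x, i, q) for i in range(t + 1)) % q

# nodes 17 and 42 with adjacent home cells both hold shares of f_{1,1},
# so each can compute the common key f_{1,1}(42, 17) = f_{1,1}(17, 42)
ring_u, ring_v = key_ring((1, 1), 17), key_ring((1, 2), 42)
assert eval_share(ring_u[(1, 1)], 42) == eval_share(ring_v[(1, 1)], 17)
```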
An analysis of the performance of this location-aware poly-pool-based scheme can be carried out along similar lines to the closest pairwise scheme. We leave out the details here and refer the reader to Liu and Ning [182].
| C.1 | Introduction |
| C.2 | Provably Difficult Computational Problems Are not Suitable |
| C.3 | One-way Functions and the Complexity Class UP |
. . . complexity turns out to be most elusive precisely where it would be most welcome.
—C. H. Papadimitriou [229]
Real knowledge is to know the extent of one’s ignorance.
—Confucius
The complex develops out of the simple.
—Colin Wilson
It is worthwhile to ask why public-key cryptography must be based on problems that are only believed to be difficult. Complexity theory suggests concrete examples of provably intractable problems. This appendix provides a brief conceptual explanation of why these provably difficult problems cannot be used for building cryptographic protocols. We may consequently conclude that, at present, we cannot prove a public-key cryptosystem to be secure. That is bad news, but we have to live with it.
Here, we make no attempt to furnish definitions of the formal complexity classes. The excellent books by Papadimitriou [229] and by Sipser [280] can be consulted for that purpose. Below is a list of the complexity classes that we require for our discussion. The relationships between these classes are depicted in Figure C.1. All the containments shown in this figure are conjectured to be proper. With an abuse of notation, we identify functional problems with decision problems.
| Class | Brief description |
|---|---|
| P | Languages accepted by deterministic polynomial-time Turing machines |
| NP | Languages accepted by non-deterministic polynomial-time Turing machines |
| coNP | Complements of languages in NP |
| UP | Languages accepted by unambiguous polynomial-time Turing machines |
| PSPACE | Languages accepted by polynomial-space Turing machines |
| EXPTIME | Languages accepted by deterministic exponential-time Turing machines |
| EXPSPACE | Languages accepted by exponential-space Turing machines |

The
problem, arguably the deepest unsolved problem in theoretical computer science, may be suspected to have some bearing on public-key cryptography. Under the assumption that P ≠ NP, one may feel tempted to use NP-complete problems for building secure cryptosystems. Unfortunately, this temptation does not prove fruitful. Several cryptosystems based on NP-complete problems have been broken, and that is not really a surprise.
It may be the case that P = NP, and, if so, all NP-complete problems are solvable in polynomial time. It may, therefore, seem advisable to select problems that lie outside NP, that is, in strictly bigger complexity classes. By the time and space hierarchy theorems, we have
and
. Both EXPTIME and EXPSPACE have complete problems. An EXPTIME-complete problem cannot be solved in polynomial time, whereas an EXPSPACE-complete problem can be solved neither in polynomial space nor in polynomial time. How about using these complete problems for designing cryptosystems? The idea may sound interesting, but these provably exponential problems turn out to be even poorer candidates, perhaps irrelevant, for use in cryptography.
Let fe and fd be the encryption and decryption transforms for a public-key cryptosystem. We assume that the set of plaintext messages and the set of ciphertext messages are both finite. (Public-key cryptosystems are like block ciphers in this respect.) Moreover, since a ciphertext c = fe(m, e) is computable in polynomial time, the length of c is bounded by a polynomial in the length of m. An intruder can non-deterministically guess messages m (from the finite space) and check if c = fe(m, e) to validate the correctness of the guess. It, therefore, follows that deciphering a ciphertext message (with no additional information) is a problem in NP. That is the reason why we should not look beyond NP.
However, the full class NP, in particular, the most difficult (that is, complete) problems of NP, may be irrelevant for cryptography, as we argue in the next section. In other words, for building cryptosystems we expect to effectively exploit problems that are believed to be easier than NP-complete. Both the integer factoring and the discrete log problems are in the class NP ∩ coNP. We have P ⊆ NP ∩ coNP. It is widely believed that this containment is proper. Also NP ∩ coNP is not known (nor expected) to have complete problems. Even if
, both the factoring and the discrete log problems need not be outside P, since we are unlikely to produce completeness proofs for them. Only historical evidence exists in favour of the difficulty of these two problems. The situation may change tomorrow. Complexity theory offers no formal protection.
| C.1 | Prove that the primality testing problem
is in NP ∩ coNP. (Remark: The AKS algorithm is a deterministic poly-time primality testing algorithm and therefore PRIME is in P and so trivially in NP ∩ coNP too. It can, however, be independently proved that primes have succinct certificates.) |
| C.2 | Consider the decision version of the integer factorization problem:
|
| C.3 | Let G be a finite cyclic multiplicative group with a generator g. Assume that one can compute products in G in polynomial time. Consider the decision version of the discrete log problem in G:
Here, indices (indg a) are assumed to lie between 0 and (#G) – 1.
|
Any public-key encryption function behaves like a one-way function: easy to compute but difficult to invert.
|
Let Σ be an alphabet (a finite set of symbols). One may assume, without loss of generality, that Σ = {0, 1}. Let Σ* denote the set of all strings over Σ. A function f : Σ* → Σ* is called a one-way function, if it satisfies the following properties.
1. f is injective.
2. There exists a polynomial p such that |f(α)| ≤ p(|α|) and |α| ≤ p(|f(α)|) for all α ∈ Σ*.
3. f(α) can be computed in time polynomial in |α|.
4. Given β in the image of f, the inverse f–1(β) cannot be computed in polynomial time.
|
Property (1) ensures unique decryption. Property (2) implies that the length of f(α) is polynomially bounded both above and below by the length of α. Property (3) suggests ease of encryption, whereas Property (4) suggests difficulty of decryption.
We do not know whether one-way functions exist at all. The following functions are strongly suspected to be one-way. However, we do not seem to have any clue about how to prove these functions to be one-way.
|
It is evident that if P = NP, there cannot exist one-way functions. The converse of this is not true, that is, even if P ≠ NP, there may exist no one-way functions.
|
A non-deterministic Turing machine which has at most one accepting branch of computation for every input string is called an unambiguous Turing machine. The class of languages accepted by poly-time unambiguous Turing machines is denoted by UP. |
Clearly, P ⊆ UP ⊆ NP. Both the containments are assumed to be proper. The importance of the class UP stems from the following result:
|
There exists a one-way function if and only if P ≠ UP. |
Therefore, the
question is relevant for cryptography and not the
question. The class UP is not known (nor expected) to have complete problems. So locating a one-way function may be a difficult task. But at least we are now on the right track.[2] Complexity theory has helped us shift our attention from NP (or bigger classes) to UP.
[2] Well, hopefully!
In order to use a one-way function f for cryptographic purposes, we require additional properties of f. Computing f–1 must be difficult for an intruder, whereas the same computation ought to be easy for the legitimate recipient. Thus, f must support poly-time inversion, provided that some secret piece of information (the trapdoor) is available during the computation of the inverse. A one-way function with a trapdoor is called a trapdoor one-way function.
The first two functions of Example C.1 do not have obvious trapdoors and so cannot be straightaway used for designing cryptosystems. The third function (RSA encryption) has the requisite trapdoor, namely, the decryption exponent d satisfying ed ≡ 1 (mod φ(n)).
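As a concrete toy illustration of the RSA trapdoor: the forward map is a single modular exponentiation that anyone can compute, while knowledge of the trapdoor d makes inversion equally cheap. The tiny primes below are for illustration only and offer no security.

```python
# Toy RSA trapdoor one-way function: f(m) = m^e mod n is easy for
# everyone; inverting it is easy only with the trapdoor exponent d,
# where e*d ≡ 1 (mod φ(n)).  Tiny primes — illustration only.
p, q = 1009, 1013
n, phi = p * q, (p - 1) * (q - 1)
e = 65537                           # public exponent, coprime to φ(n)
d = pow(e, -1, phi)                 # the trapdoor: e^{-1} mod φ(n)

m = 123456
c = pow(m, e, n)                    # forward direction: easy
assert pow(c, d, n) == m            # inversion: easy *with* the trapdoor
```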
The hunt for a theoretical foundation does not end here; it begins. Most of complexity theory deals with the worst-case complexities of problems, rather than their average or expected complexities. A one-way function, even if it exists, may be difficult to invert for only a few instances, whereas cryptography demands that the inversion problem be difficult for most instances. A function meeting even this cryptographic demand need not be suitable, since there may be reductions that map hard instances to easy instances. Moreover, the trapdoors themselves may inject vulnerabilities and leave room for quick attacks.
There still remains a long way to go!
| C.4 | Let f : Σ* → Σ* be a function with the property that f(f(α)) = f(α) for every . Argue that f is not a one-way function.
|
| C.5 | Design unambiguous polynomial time Turing machines for computing the inverses of the functions described in Example C.1. |
| C.6 | Show that if there exists a bijective one-way function, then NP ∩ coNP ≠ P. [H] |
The greatest thing in family life is to take a hint when a hint is intended and not to take a hint when a hint isn’t intended.
—Robert Frost
Teachers open the door, but you must enter by yourself.
—Chinese Proverb
Imagination grows by exercise, and contrary to common belief, is more powerful in the mature than in the young.
—W. Somerset Maugham
| 2.11 (a) | Apply Theorem 2.3 to the restriction to H of the canonical homomorphism G → G/K. |
| 2.11 (b) | Apply Theorem 2.3 to the canonical homomorphism G/H → G/K, aH ↦ aK, .
|
| 2.14 (c) | Consider the canonical surjection G → G/H. |
| 2.17 (a) | Let i ≠ j and . Then ord g divides both and and so is equal to 1, that is, g = e. Now let hi, and with . But then . Thus #(HiHj) = (#Hi)(#Hj). Generalize this argument to show that #(H1 · · · Hr) = n.
|
| 2.18 | First consider the special case #G = pr for some and . For each , the order ordG g is of the form psg for some sg ≤ r. Let s be the maximum of the values sg, . Take any element with ordG h = ps. Then e, h, . . . , hps–1 are all the elements x that satisfy xps = e. But by the choice of s every element satisfies xps = e. Hence we must have s = r. This proves the assertion for the special case. For the general case, use this special case in conjunction with Exercise 2.17.
|
| 2.19 (b) | Show that , (h1, . . . , hr) ↦ h1 . . . hr, is a group isomorphism.
|
| 2.23 | Use Zorn’s lemma. |
| 2.24 (c) | Let be the intersection of all prime ideals of R. First show that . To prove the reverse inclusion take and consider the set S of all non-unit ideals of R such that for all . If f is a non-unit, the set S is non-empty and by Zorn’s lemma has a maximal element, say . Show that is a prime ideal of R.
|
| 2.25 | For , the map R → R, b ↦ ab, is injective and hence surjective by Exercise 2.4.
|
| 2.30 | Apply the isomorphism theorem to the canonical surjection , .
|
| 2.33 | [(1)⇒(2)] Let be an ascending chain of ideals of R. Consider the ideal which is finitely generated by hypothesis.
[(3)⇒(1)] Let |
| 2.36 | Use the pigeon-hole principle: If there are n + 1 pigeons in n holes, then there exists at least one hole containing more than one pigeon. |
| 2.37 | Consider the integer satisfying 2t ≤ n < 2t+1.
|
| 2.39 (e) | 12 ≡ (n – 1)2 (mod n). |
| 2.39 (f) | Apply Wilson’s theorem. |
| 2.40 | Use Fermat’s little theorem. |
| 2.41 | Use Wilson’s theorem or Euler’s criterion. |
| 2.45 | Reduce to the case y2 ≡ α (mod p). |
| 2.49 (a) | Consider the canonical group homomorphism and the fact that a surjective group homomorphism from a cyclic group G onto G′ implies that G′ is cyclic.
|
| 2.49 (b) | Let be a primitive element modulo p. The residue class of a in has order k(p – 1) for some . Show that the order of b := p + 1 modulo pe is pe–1. So the order of akb modulo pe is pe–1(p – 1) = φ(pe).
|
| 2.50 | Use the Chinese remainder theorem in conjunction with Exercises 2.20 and 2.49. |
| 2.53 | Take . The interpolating polynomial is . Use Exercise 2.52 to establish the uniqueness.
|
| 2.56 (b) | is irreducible in if and only if f(X + 1) is irreducible in .
|
| 2.58 | Use the fundamental theorem of algebra. |
| 2.63 | Consider the set of all linearly independent subsets of V that contain T. Show that every chain in has an upper bound in . By Zorn’s Lemma, there exists a maximal element . Show that S generates V.
|
| 2.64 (b) | Use Exercise 2.63. |
| 2.68 | Let p1, . . . , pn be n distinct primes. Take and ai := a/pi for i = 1, . . . , n.
|
| 2.72 (a) | If N is the -submodule of generated by ai/bi, i = 1, . . . , n, with gcd(ai, bi) = 1, then for any prime p that does not divide b1 · · · bn we have 1/p ∉ N.
|
| 2.72 (b) | Any two distinct elements of are linearly dependent over . Now use Exercise 2.69.
|
| 2.74 (b) | Let the conjugates of over F be α1 = α, α2, . . . , αn. Since is injective, it follows from (a) that makes a permutation of α1, . . . , αn. So is surjective.
|
| 2.75 (a) | Use Exercise 2.61. |
| 2.76 (b) | The if part follows from Exercise 2.61. For proving the only if part, take . If the polynomial f(X) := Xp – a splits over F, we are done. So suppose that there exists an irreducible divisor of f(X) of degree ≥ 2. By the separability of F, there exist two distinct roots α, β of g(X). Let K := F (α, β). Show that the Frobenius map , , is an endomorphism of K. Also there exists a field isomorphism τ : F (α) → F (β) which fixes F element-wise and takes α ↦ β. But then . Since any field homomorphism is injective, α equals β, a contradiction. Thus no g(X) chosen as above can exist.
|
| 2.77 (a) | Let be an irreducible polynomial with g(α) = 0 for some . Let β be another root of g. We show that . By Lemma 2.5, there is an isomorphism μ : F(α) → F(β). Clearly, K is the splitting field of f over F(α). Let K′ be the splitting field of μ*(f) over F (β). By Proposition 2.33, K ≅ K′. If are the roots of f, then K′ ≅ F (β, γ1, . . . , γd) = K(β). But then K ≅ K(β).
|
| 2.78 (a) | Consider transcendental numbers. |
| 2.78 (b) | Let . For , we have , implying that for a, with a ≤ b. Now assume for some . Choose a rational number b with . Then , a contradiction. Thus . Similarly .
|
| 2.80 | Use the binomial theorem and induction on n. |
| 2.82 | Follow the proof of Theorem 2.37. |
| 2.90 | Example 2.18. |
| 2.91 (b) | By the fundamental theorem of Galois theory, # . Now show that are distinct -automorphisms of .
|
| 2.92 (a) | Assume r > 1. We have the extensions , where is the splitting field of f over and hence over . Consider the minimal polynomial of a root of f over . Conversely, let f be reducible over . Choose an irreducible factor of f with deg h = s < d. Now h has one (and hence all) roots in and, therefore, d|sm.
|
| 2.93 | Use Corollary 2.18. |
| 2.98 | In each case, the defining polynomial is quadratic in Y (and with coefficients in K[X]). If this polynomial admits a non-trivial factorization, one can reach a contradiction by considering the degrees of X in the coefficients of Y1 and Y0. |
| 2.103 | For simplicity, consider the case char K ≠ 2, 3. Show that the curves Y2 + Y = X3 and Y2 = X3 + X have j-invariants 0 and 1728 respectively. Finally, if , 1728, then the curve has j-invariant . One must also argue that these are actually elliptic curves, that is, have non-zero discriminants.
|
| 2.111 | Use Theorem 2.51. |
| 2.112 (a) | Pair a point with its opposite. This pairing fails for points of orders 1 and 2. |
| 2.112 (c) | Consider the elliptic curve E : Y2 = X3 + 3 over . We have , whereas X3 + 3 is irreducible modulo 13.
|
| 2.113 (a) | Every element of has a unique square root.
|
| 2.115 (a) | Use Theorem 2.49 or Exercise 2.17. |
| 2.115 (b) | Use Theorem 2.50. |
| 2.115 (c) | The trace of Frobenius at q is 0 in this case. Now, use Theorem 2.50. |
| 2.123 | Factor N(G) in .
|
| 2.127 | Let . For each i, write , . But then det , where , δij being the Kronecker delta.
|
| 2.128 (b) | Use Part (a) and Exercise 2.126(c). |
| 2.128 (c) | Let . By Exercise 2.130, is integral over . Let be the ideal generated by in and let and be the ideals of generated respectively by and . Now, use Part (b).
|
| 2.133 (b) | In a PID, non-zero prime ideals are maximal. |
| 2.137 (a) | Since and are maximal, we have , that is, a1 + a2 = 1 for some and . Now use the fact that (a1 + a2)e1 + e2 = 1.
|
| 2.137 (b) | Use CRT. |
| 2.138 (a) | Since is invertible, for some fractional ideal .
|
| 2.140 (a) | For , let constitute a complete residue system of modulo . Then also form a complete residue system of modulo .
|
| 2.142 (d) | Take in Part (b).
|
| 2.143 (a) | Reduce modulo 4. |
| 2.143 (c) | Let divide this gcd. Then divides 2y and . Take norms.
|
| 2.144 (b) | Look at the expansion of a – 1 in base p. More precisely, let a < pN for some . Then –a = (pN – a) – pN = [(pN – 1) – (a – 1)] – pN.
|
| 2.152 (c) | First show that .
|
| 2.153 | Use unique factorization of rationals. |
| 2.154 | Show by induction on n that pn+1 divides apn+1 – apn in for all .
|
| 2.161 | There exists an irreducible polynomial in of every degree .
|
| 3.7 | The implication is obvious. For the reverse implication, use Proposition 2.5.
|
| 3.18 (b) | Consider the binary expansion of m. |
| 3.19 | If n is a pseudoprime to base a and not a pseudoprime to base b, then n is not a pseudoprime to base ab. |
| 3.20 (a) | If p2|n for some , take with ordn(a) = p. If n is square-free, consider a prime divisor p of n and take with and a ≡ 1 (mod n/p).
|
| 3.20 (b) | If n is an Euler pseudoprime to base a and not an Euler pseudoprime to base b, then n is not an Euler pseudoprime to base ab. |
| 3.21 (a) | Let be the prime factorization of n with r and each αi in . Then, . For odd pi, the group is cyclic of order and hence contains an element of order pi – 1.
|
| 3.21 (b) | ordn(–1) = 2. |
| 3.21 (c) | Let vp(n) ≥ 2 for some odd prime p. Construct an element with ordn(a) = p.
|
| 3.28 | Proceed by induction on i = 1, . . . , r. For 1 ≤ i ≤ r, define νi := n1 · · · ni and let be a solution of the congruences bi ≡ aj (mod nj) for j = 1, . . . , i. If i < r, use the combining formula given in Section 2.5 to find such that bi+1 ≡ bi (mod νi) and bi+1 ≡ ai+1 (mod ni+1).
|
| 3.31 | Apply Newton’s iteration to compute a zero of x2 – n. |
| 3.32 (a) | Apply Newton’s iteration to compute a zero of xk – n. |
| 3.34 (b) | The updating d(X) := d(X) – Xi–sb(X) needs to consider only the non-zero words of b. |
| 3.36 (b) | First consider b = 0 and note that the roots of X(q–1)/2 – 1 (resp. X(q–1)/2 + 1) are all the quadratic residues (resp. non-residues) of .
|
| 3.36 (c) | First consider b = 0. |
| 3.40 | For , we have ord(a)|m and for each i = 1, . . . , r the multiplicity vpi (ord(a)) is the smallest of the non-negative integers k satisfying .
|
| 3.41 (a) | Use the CRT. |
| 3.43 (a) | Use the CRT and the fact that for an odd prime r ≡ 3 (mod 4).
|
| 4.1 (a) | Using the CRT, reduce to the case that n is prime. Then is bijective ⇔ the restriction is bijective. Now, if gcd(a, φ(n)) = 1, the inverse of is given by , where ab ≡ 1 (mod φ(n)). On the other hand, if q is a prime divisor of gcd(a, φ(n)), choose an element with ord(y) = q. But then ya ≡ 1 (mod n), that is, is not injective. This exercise provides the foundation for the RSA cryptosystems.
|
| 4.1 (b) | In view of the CRT, reduce to the case n = pα for and α > 1. Then (pα–1)a ≡ 0 (mod n).
|
| 4.6 | Consider the integral .
|
| 4.9 | Use the CRT and lifting. |
| 4.10 | For proving , let n be an odd composite integer, choose a random and compute a square root x of y2 modulo n. By Exercise 4.9, the probability that x ≡ ±y (mod n) is at most 1/2.
|
| 4.12 (d) | Eliminate a from T (a, b, c) using a + b + c = 0. For each fixed c, allow b to vary and use a sieve to find out all the values of b for which T (a, b, c) is smooth for the fixed c. |
| 4.13 | You may use the prime number theorem and the fact that the sum of the reciprocals of the first t primes asymptotically approaches ln ln t.
|
| 4.15 | If a < a1 or a > am, then no i exists. So assume that a1 ≤ a ≤ am and let d := ⌊(1 + m)/2⌋. If a = ad, return d, else if a < ad, recursively search a among the elements a1, . . . , ad–1, and if a > ad, recursively search a among the elements ad+1, . . . , am. |
| 4.16 (a) | Use Lagrange’s interpolation formula (Exercise 2.53). |
| 4.18 (a) | One may precompute the values σi := p rem qi, i = 1, . . . , t. Note that qi|(gα + kp) if and only if ρk,i = 0. |
| 4.19 (a) | Use the approximation T (c1, c2) ≈ (c1 + c2)H. |
| 4.21 (c) | T (a, b, c) = –b2 – c(x + cy)b + (z – c2x). |
| 4.21 (d) | Imitate the second stage of the LSM. |
| 4.23 | Let the factor base consist of all irreducible polynomials over of degrees ≤ m together with the polynomials of the form Xk + h(X), , deg h ≤ m. The optimal running time of this algorithm corresponds to .
|
| 4.24 (b) | is square-free.
|
| 4.24 (c) | Use the fact Xm – 1 = (Xm/pvp(m) – 1)pvp(m). |
| 4.24 (d) | Theorem 2.39. |
| 4.25 (a) | Look at the roots of the polynomials on the two sides. |
| 4.25 (c) | If ord ω = m, then ord(–ω) = 2m. |
| 4.25 (d) | ω, ωq, . . . , ωql–1 are all the roots of the minimal polynomial of ω over .
|
| 4.26 (b) | Use the Mordell–Weil theorem. |
| 4.26 (c) | Use Theorem 4.2. |
| 5.2 (a) | Solve the simultaneous congruences x ≡ ci (mod ni), i = 1, . . . , e, and then take the integer e-th root of the solution x, 1 ≤ x ≤ n1 · · · ne. |
| 5.2 (b) | Append (different) pseudorandom bit strings to m before encryption. This process is often referred to as salting. |
| 5.3 (a) | In view of the Chinese remainder theorem, reduce to the case n = pr for some and .
|
| 5.4 | ue1 + ve2 = 1 for some u, .
|
| 5.6 | If the same session key is used to generate the ciphertext pairs (r1, s1) and (r2, s2) on two plaintext messages m1 and m2, then m1/m2 = s1/s2. |
| 5.7 (c) | Let x = (xl–1 . . . x1x0)2. Define x′ := (xl–1 . . . x2x1)2 and y′ := gx′ (mod p). Then, y ≡ y′2gx0 (mod p). Since x0 is easily computable, y′ can be obtained by obtaining a square root of y modulo p. Argue that a call of the oracle helps us choose the correct square root y′ of y. Now, use recursion. |
| 5.8 | Let g′ be any randomly chosen generator of , where q := ph. One computes for i = 0, 1, . . . , p – 1. We then have the equality of the sets
modulo q – 1, where l := indg′ g. But then for each i we have a (yet unknown) j such that |
| 5.9 | Let g′, and l be as in Exercise 5.8. Now, we have the equality of the sets
modulo q – 1. |
| 5.11 | (mod β) are polynomials with small coefficients.
|
| 5.15 (a) | If Alice generates the signatures (M1, s1) and (M2, s2) on two messages M1 and M2, then her signature on a message M with H(M) ≡ H(M1)H(M2) (mod n) is s1s2 (mod n). Thus, without knowing the private key of Alice, an intruder can generate a valid signature (M, s1s2) of Alice, provided that such an M can be computed. Of course, here the intruder has little control over the message M. The PKC standards from RSA Laboratories add some redundancy to the hash function output before signing. The product of two hash values with redundancy is, in general, expected not to have the redundancy. This increases the security of the scheme against existential forgeries beyond that provided by the first pre-image resistance of the underlying hash function. |
| 5.15 (b) | For any , a valid signature is (M, s), where H(M) ≡ s2 (mod n).
|
| 5.15 (c) | Choose random integers u, v with gcd(v, n) = 1 and take d′ := u + dv. Of course, d and hence d′ are unknown to Carol, but she can compute s = gd′ = gu(gd)v and t ≡ –H(s)v–1 (mod n). But then (M, s, t) is a valid ElGamal signature on a message M for which H(M) ≡ tu (mod n). |
| 5.16 | Obviously, c itself could be a possible choice, but that is not random and Bob might refuse to sign c. Carol should hide c by cre (mod n) for some randomly chosen r known to her. |
| 5.23 (a) | by the CRT.
|
| 5.25 (a) | Replace the random challenge of the verifier by the hash value of the string obtained by concatenating the message to be signed with the witness. |
| 5.26 (d) | Bob finds a random b′ with and sends a := (b′)2 (mod n) to Alice. But then Alice’s response b yields a non-trivial factor gcd(b – b′, n) of n.
|
| 7.5 | (mod n) and m ≡ se (mod n).
|
| 7.9 (a) | Use Exercise 2.44(b). |
| 7.9 (c) | Again use Exercise 2.44(b). |
| 7.9 (d) | Use Part (c) in conjunction with the CRT, and separately consider the three cases v2(p–1) = v2(q – 1), v2(p – 1) > v2(q – 1) and v2(p – 1) < v2(q – 1). |
| A.2 | for all X, J. One does not have to look at the S-boxes for proving this.
|
| A.9 (c) | For i = 0, 1, 2, 3, 4Nr, 4Nr + 1, 4Nr + 2, 4Nr + 3, take . For other values of i, take .
|
| A.14 (b) | Let DL(X) := XdCL(1/X) = a0 + a1X + a2X2 + · · · + ad–1Xd–1 + Xd. Consider the -algebra , where x := X + 〈DL(X)〉. The -linear transformation λx : A → A defined by g(x) ↦ xg(x) has the matrix ΔL with respect to the polynomial basis (1, x, . . . , xd–1). If is the minimal polynomial of λx, then [f(λx)](1) = f(x) = 0. Now, use the fact that 1, x, . . . , xd–1 are linearly independent over .
|
| A.16 (b) | [only if] Take σ ≠ 00 · · · 01. Since σ is non-zero, si = 1 for some . Construct an LFSR with d – 1 stages initialized to s0s1 · · · sd–2 to generate σ.
|
| A.19 | Suppose that we want to compute a second pre-image for H2(x). If , any is a second pre-image for H2(x). If , computing a second pre-image for H2(x) is equivalent to computing a second pre-image for H(x). The density of the (finite) set S is 0 in the (infinite) set of all bit strings. Thus, H2 is second pre-image resistant. On the other hand, for any two distinct x, we have a collision (x, x′) for H2.
|
| A.20 | Collision resistance of H implies that of H3. On the other hand, for a positive fraction (half) of the (n + 1)-bit strings y, it is easy to compute a pre-image of y under H3. |
| A.21 | If y is a square root of a modulo m, then so is m – y. |
| A.22 | Use the birthday paradox (Exercise 2.172). |
| A.23 (d) | Let L := F1(L′) and R := F1(R′) with both R and R′ non-zero. Then, F1(L ‖ R) = F2(L′ ‖ R′). |
| A.25 | Let h(i) denote the column vector of dimension 160 having the bits of H(i) as its elements and m(i) the column vector of dimension 512 + 160 = 672 having the bits of M(i) and of H(i) as its elements. Show that the modified design of SHA-1 leads to the relation h(i) ≡ Am(i–1) + c (mod 2) for some constant 160 × 672 matrix A over and for some constant vector c. So what then?
|
| C.6 | For α, β ∈ Σ*, call α ≤ β if and only if |α| < |β| or |α| = |β| and α is lexicographically smaller than β. This ≤ produces a well-ordering of Σ*. For a one-way function f, look at the language for some with γ ≤ β}.
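The ordering in the hint (length first, then lexicographic among equal lengths) is easy to realize concretely; a minimal sketch:

```python
def shortlex_le(a, b):
    """a <= b in the ordering of the hint: compare lengths first,
    then lexicographically among strings of equal length."""
    return (len(a), a) <= (len(b), b)

# Every string has only finitely many predecessors under this ordering,
# so every non-empty set of strings has a least element: a well-ordering.
assert shortlex_le("z", "aa")       # |z| < |aa|, so "z" comes first
assert shortlex_le("ab", "ba")      # equal length: lexicographic
assert not shortlex_le("aaa", "zz")
```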
|
If you steal from one author, it’s plagiarism; if you steal from many, it’s research.
—Wilson Mizner
Literature is the question minus the answer.
—Roland Barthes
Everything that can be invented, has been invented.
—Charles H. Duell, 1899
[1] Adkins, W. A. and S. H. Weintraub (1992). Algebra: An Approach via Module Theory. Graduate Texts in Mathematics, 136. New York: Springer.
[2] Adleman, L. M., J. DeMarrais and M.-D. A. Huang (1994). “A Subexponential Algorithm for Discrete Logarithms over the Rational Subgroup of the Jacobians of Large Genus Hyperelliptic Curves over Finite Fields”, Algorithmic Number Theory—ANTS-I, Lecture Notes in Computer Science, 877. pp. 28–40. Berlin/Heidelberg: Springer.
[3] Adleman, L. M. and M.-D. A. Huang (1992). “Primality Testing and Two Dimensional Abelian Varieties over Finite Fields”, Lecture Notes in Mathematics, 1512. Berlin: Springer.
[4] Adleman, L. M., C. Pomerance and R. S. Rumely (1983). “On Distinguishing Prime Numbers from Composite Numbers”, Annals of Mathematics, 117: 173–206.
[5] Agrawal, M., N. Kayal and N. Saxena (2002), “Primes Is in P” [online document]. Available at http://www.cse.iitk.ac.in/users/manindra/algebra/primality_v6.pdf (October 2008).
[6] * Ahlfors, L. V. (1966). Complex Analysis. New York: McGraw-Hill.
[7] * Aho, A. V., J. E. Hopcroft and J. D. Ullman (1974). The Design and Analysis of Computer Algorithms. Reading, Massachusetts: Addison-Wesley.
[8] * Aho, A. V., J. E. Hopcroft and J. D. Ullman (1983). Data Structures and Algorithms. Reading, Massachusetts: Addison-Wesley.
[9] Aigner, M. and E. Oswald (2007), “Power Analysis Tutorial” [online document]. Available at http://www.iaik.tugraz.at/content/research/implementation_attacks/introduction_to_impa/dpa_tutorial.pdf (October 2008).
[10] Akkar, M.-L., R. Bevan, P. Dischamp and D. Moyart (2000). “Power Analysis, What Is Now Possible”, Advances in Cryptology—ASIACRYPT 2000, Lecture Notes in Computer Science, 1976. pp. 489–502. Berlin/Heidelberg: Springer.
[11] Anderson, R. and M. Kuhn (1997). “Low Cost Attacks on Tamper Resistant Devices”, Security Protocols—5th International Workshop, Lecture Notes in Computer Science, 1361. pp. 125–136. Berlin/Heidelberg: Springer.
[12] * Apostol, T. M. (1976). Introduction to Analytic Number Theory. Undergraduate Texts in Mathematics. New York: Springer.
[13] Arnold, V. I. (1999). “Polymathematics: Is Mathematics a Single Science or a Set of Arts?”, in V. Arnold, M. Atiyah, P. Lax and B. Mazur (eds.), Mathematics: Frontiers and Perspectives, pp. 403–416. Providence, Rhode Island: American Mathematical Society.
[14] Atiyah, M. F. and I. G. MacDonald (1969). Introduction to Commutative Algebra. Reading, Massachusetts: Addison-Wesley.
[15] Aumüller, C., P. Bier, W. Fischer, P. Hofreiter and J.-P. Seifert (2002), “Fault Attacks on RSA with CRT: Concrete Results and Practical Countermeasures” [online document]. Available at http://eprint.iacr.org/2002/073 (October 2008).
[16] Balasubramanian, R. and N. Koblitz (1998). “The Improbability that an Elliptic Curve has Subexponential Discrete Log Problem under the Menezes–Okamoto–Vanstone Algorithm”, Journal of Cryptology, 11: 141–145.
[17] Bao, F., R. H. Deng, Y. Han, A. B. Jeng, A. D. Narasimhalu, T.-H. Ngair (1997). “Breaking Public Key Cryptosystems on Tamper Resistant Devices in the Presence of Transient Faults”, Security Protocols—5th International Workshop, Lecture Notes in Computer Science, 1361. pp. 115–124. Berlin/Heidelberg: Springer.
[18] Bellare, M. and P. Rogaway (1995). “Optimal Asymmetric Encryption—How to Encrypt with RSA”, Advances in Cryptology—EUROCRYPT ’94, Lecture Notes in Computer Science, 950. pp. 92–111. Berlin/Heidelberg: Springer. A revised version is available at http://www-cse.ucsd.edu/users/mihir/papers/oaep.html (October 2008).
[19] Bellare, M. and P. Rogaway (1996). “The Exact Security of Digital Signatures: How to Sign with RSA and Rabin”, Advances in Cryptology—EUROCRYPT ’96, Lecture Notes in Computer Science, 1070. pp. 399–416. Berlin/Heidelberg: Springer. A revised version is available at http://www-cse.ucsd.edu/users/mihir/papers/exactsigs.html (October 2008).
[20] Bennett, C. H. and G. Brassard (1984). “Quantum Cryptography: Public Key Distribution and Coin Tossing”, pp. 175–179. Proceedings of the IEEE International Conference on Computers, Systems and Signal Processing, Bangalore, India, December.
[21] Berlekamp, E. R. (1968). Algebraic Coding Theory. New York: McGraw-Hill.
[22] Biham, E. and A. Shamir (1997). “Differential Fault Analysis of Secret Key Cryptosystems”, Advances in Cryptology—CRYPTO ’97, Lecture Notes in Computer Science, 1294. pp. 513–528. Berlin/Heidelberg: Springer.
[23] Blake, I. F., R. Fuji-Hara, R. C. Mullin and S. A. Vanstone (1984). “Computing Logarithms in Finite Fields of Characteristic Two”, SIAM Journal on Algebraic and Discrete Methods, 5: 276–285.
[24] Blake, I. F., G. Seroussi and N. P. Smart (1999). Elliptic Curves in Cryptography. Cambridge: Cambridge University Press.
[25] Blom, R. (1985). “An Optimal Class of Symmetric Key Generation Systems”, Advances in Cryptology—EUROCRYPT ’84, Lecture Notes in Computer Science, 209. pp. 335–338. Berlin/Heidelberg: Springer.
[26] Blum, L., M. Blum, and M. Shub (1986). “A Simple Unpredictable Pseudo-Random Number Generator”, SIAM Journal on Computing, 15: 364–383.
[27] Blum, M. and S. Goldwasser (1985). “An Efficient Probabilistic Public Key Encryption Scheme Which Hides All Partial Information”, Advances in Cryptology—CRYPTO ’84, Lecture Notes in Computer Science, 196. pp. 289–299. Berlin/Heidelberg: Springer.
[28] Blundo, C., A. De Santis, A. Herzberg, S. Kutten, U. Vaccaro and M. Yung (1993). “Perfectly-Secure Key Distribution for Dynamic Conferences”, Advances in Cryptology—CRYPTO ’92, Lecture Notes in Computer Science, 740. pp. 471–486. Berlin/Heidelberg: Springer.
[29] Boneh, D. (1999). “Twenty Years of Attacks on the RSA Cryptosystem”, Notices of the American Mathematical Society, 46 (2): 203–213.
[30] Boneh, D., R. A. DeMillo and R. J. Lipton (1997). “On the Importance of Checking Cryptographic Protocols for Faults”, Advances in Cryptology—EUROCRYPT ’97, Lecture Notes in Computer Science, 1233. pp. 37–51. Berlin/Heidelberg: Springer.
[31] Boneh, D., R. A. DeMillo and R. J. Lipton (2001). “On the Importance of Eliminating Errors in Cryptographic Computations”, Journal of Cryptology, 14 (2): 101–119.
[32] Boneh, D. and G. Durfee (1999). “Cryptanalysis of RSA with Private Key d Less Than N^0.292”, Advances in Cryptology—EUROCRYPT ’99, Lecture Notes in Computer Science, 1592. pp. 1–11. Berlin/Heidelberg: Springer.
[33] Boneh, D., G. Durfee and Y. Frankel (1998). “Exposing an RSA Private Key Given a Small Fraction of Its Bits”, Advances in Cryptology—ASIACRYPT ’98, Lecture Notes in Computer Science, 1514. pp. 25–34. Berlin/Heidelberg: Springer.
[34] Boneh, D. and M. K. Franklin (2001). “Identity-based Encryption from the Weil Pairing”, Advances in Cryptology—CRYPTO 2001, Lecture Notes in Computer Science, 2139. pp. 213–229. Berlin/Heidelberg: Springer.
[35] Boneh, D. and M. K. Franklin (2003). “Identity-based Encryption from the Weil Pairing”, SIAM Journal on Computing, 32 (3): 586–615.
[36] Bressoud, D. M. (1989). Factorization and Primality Testing. Undergraduate Texts in Mathematics. New York: Springer.
[37] * Buchmann, J. A. (2004). Introduction to Cryptography. Undergraduate Texts in Mathematics. New York: Springer.
[38] Buchmann, J. A. et al. (2004), “The Number Field Cryptography Project” [online document]. Available at http://www.informatik.tu-darmstadt.de/TI/Forschung/nfc.html (October 2008).
[39] Buchmann, J. A. and S. Hamdy (2001). “A Survey on IQ Cryptography”. Technical report TI-4/01, TU Darmstadt, Fachbereich Informatik.
[40] Buchmann, J. A. and D. Weber (2000). “Discrete Logarithms: Recent Progress”, in J. Buchmann, T. Hoeholdt, H. Stichtenoth and H. Tapia-Recillas (eds.), Coding Theory, Cryptography and Related Areas, pp. 42–56. Proceedings of an International Conference on Coding Theory, Cryptography and Related Areas, Guanajuato, Mexico, April 1998.
[41] Buhler, J., H. W. Lenstra and C. Pomerance (1993). “Factoring Integers with the Number Field Sieve”, in A. K. Lenstra and H. W. Lenstra (eds.), The Development of the Number Field Sieve, Lecture Notes in Mathematics, 1554. pp. 50–94. Berlin: Springer.
[42] * Burton, D. M. (1998). Elementary Number Theory, 4th ed. New York: McGraw-Hill.
[43] Cantor, D. G. (1994). “On the Analogue of Division Polynomials for Hyperelliptic Curves”, Journal für die reine und angewandte Mathematik, 447: 91–145.
[44] Chan, H., A. Perrig and D. Song (2003). “Random Key Predistribution Schemes for Sensor Networks”, pp. 197–213. Proceedings of the 24th IEEE Symposium on Research in Security and Privacy, Berkeley, California, 11–14 May.
[45] Chari, S., C. S. Jutla, J. R. Rao, and P. Rohatgi (1999). “Towards Sound Approaches to Counteract Power-Analysis Attacks”, Advances in Cryptology—CRYPTO ’99, Lecture Notes in Computer Science, 1666. pp. 398–412. Berlin/Heidelberg: Springer.
[46] Charlap, L. S. and R. Coley (1990). “An Elementary Introduction to Elliptic Curves II”, CCR Expository Report 34.
[47] Charlap, L. S. and D. P. Robbins (1988). “An Elementary Introduction to Elliptic Curves”, CRD Expository Report 31.
[48] Chaum, D. (1983). “Blind Signatures for Untraceable Payments”, Advances in Cryptology—CRYPTO ’82. pp. 199–203. New York: Plenum Press.
[49] Chaum, D. (1985). “Security Without Identification: Transaction System to Make Big Brother Obsolete”, Communications of the ACM, 28 (10): 1030–1044.
[50] Chaum, D. (1989). “Privacy Protected Payments: Unconditional Payer and/or Payee Untraceability”, Smart Card 2000: The Future of IC Cards, pp. 69–93. Amsterdam: North-Holland.
[51] Chaum, D. (1990). “Zero-Knowledge Undeniable Signatures”, Advances in Cryptology—CRYPTO ’90, Lecture Notes in Computer Science, 473. pp. 458–464. Berlin/Heidelberg: Springer.
[52] Chaum, D. and H. van Antwerpen (1989). “Undeniable Signatures”, Advances in Cryptology—CRYPTO ’89, Lecture Notes in Computer Science, 435. pp. 212–217. Berlin/Heidelberg: Springer.
[53] Chaum, D., E. van Heijst and B. Pfitzmann (1991). “Cryptographically Strong Undeniable Signatures, Unconditionally Secure for the Signer”, Advances in Cryptology—CRYPTO ’91, Lecture Notes in Computer Science, 576. pp. 470–484. Berlin/Heidelberg: Springer.
[54] Chor, B. and R. L. Rivest (1988). “A Knapsack Type Cryptosystem Based on Arithmetic in Finite Fields”, IEEE Transactions on Information Theory, 34: 901–909.
[55] Clavier, C., J.-S. Coron and N. Dabbous (2000). “Differential Power Analysis in the Presence of Hardware Countermeasures”, Cryptographic Hardware and Embedded Systems—CHES 2000, Lecture Notes in Computer Science, 1965. pp. 252–263. Berlin/Heidelberg: Springer.
[56] Cohen, H. (1993). A Course in Computational Algebraic Number Theory. Graduate Texts in Mathematics, 138. New York: Springer.
[57] Coppersmith, D. (1984). “Fast Evaluation of Logarithms in Fields of Characteristic Two”, IEEE Transactions on Information Theory, 30: 587–594.
[58] Coppersmith, D. (1994). “Solving Homogeneous Equations over GF[2] via Block Wiedemann Algorithm”, Mathematics of Computation, 62: 333–350.
[59] Coppersmith, D., A. M. Odlyzko and R. Schroeppel (1986). “Discrete Logarithms in GF (p)”, Algorithmica, 1: 1–15.
[60] Coppersmith, D. and S. Winograd (1982). “On the Asymptotic Complexity of Matrix Multiplication”, SIAM Journal on Computing, 11 (3): 472–492.
[61] * Cormen, T. H., C. E. Lieserson, R. L. Rivest and C. Stein (2001). Introduction to Algorithms, 2nd ed. Cambridge, Massachusetts: MIT Press.
[62] Coron, J.-S. (1999). “Resistance Against Differential Power Analysis for Elliptic Curve Cryptosystems”, Cryptographic Hardware and Embedded Systems—CHES 1999, Lecture Notes in Computer Science, 1965. pp. 292–302. Berlin/Heidelberg: Springer.
[63] Coron, J.-S., L. Goubin (2000). “On Boolean and Arithmetic Masking Against Differential Power Analysis”, Cryptographic Hardware and Embedded Systems—CHES 2000, Lecture Notes in Computer Science, 1965. pp. 231–237. Berlin/Heidelberg: Springer.
[64] Coster, M. J., A. Joux, B. A. LaMacchia, A. M. Odlyzko, C. P. Schnorr and J. Stern (1992). “Improved Low-Density Subset Sum Algorithms”, Computational Complexity, 2: 111–128.
[65] Coster, M. J., B. A. LaMacchia, A. M. Odlyzko and C. P. Schnorr (1991). “An Improved Low-Density Subset Sum Algorithm”, Advances in Cryptology—EUROCRYPT ’91, Lecture Notes in Computer Science, 547. pp. 54–67. Berlin/Heidelberg: Springer.
[66] Courtois, N. (2003). “Fast Algebraic Attacks on Stream Ciphers with Linear Feedback”, Advances in Cryptology—CRYPTO 2003, Lecture Notes in Computer Science, 2729. pp. 177–194. Berlin/Heidelberg: Springer.
[67] Courtois, N. and W. Meier (2003). “Algebraic Attacks on Stream Ciphers with Linear Feedback”, Advances in Cryptology—EUROCRYPT 2003, Lecture Notes in Computer Science, 2656. pp. 345–359. Berlin/Heidelberg: Springer.
[68] Courtois, N. and J. Pieprzyk (2003). “Cryptanalysis of Block Ciphers with Overdefined Systems of Equations”, Advances in Cryptology—ASIACRYPT 2002, Lecture Notes in Computer Science, 2501. pp. 267–287. Berlin/Heidelberg: Springer.
[69] Crandall, R. and C. Pomerance (2001). Prime Numbers: A Computational Perspective. New York: Springer.
[70] Crépeau, C. and A. Slakmon (2003). “Simple Backdoors for RSA Key Generation”, Topics in Cryptology—CT-RSA 2003, Lecture Notes in Computer Science, 2612. pp. 403–416. Berlin/Heidelberg: Springer.
[71] Daemen, J. and V. Rijmen (2002). The Design of Rijndael: AES—The Advanced Encryption Standard. New York: Springer.
[72] Das, A. (1999). Galois Field Computations: Implementation of a Library and a Study of the Discrete Logarithm Problem [dissertation]. Bangalore, India: Indian Institute of Science.
[73] Das, A. and C. E. Veni Madhavan (1999). “Performance Comparison of Linear Sieve and Cubic Sieve Algorithms for Discrete Logarithms over Prime Fields”, Algorithms and Computation, ISAAC ’99, Lecture Notes in Computer Science, 1741. pp. 295–306. Berlin/Heidelberg: Springer.
[74] * Delfs, H. and H. Knebl (2007). Introduction to Cryptography: Principles and Applications, 2nd ed. Berlin and New York: Springer.
[75] Deutsch, D. (1985). “Quantum Theory, the Church-Turing Principle and the Universal Quantum Computer”. Proceedings of the Royal Society of London, Series A, 400. pp. 97–117.
[76] Deutsch, D. (1998). The Fabric of Reality: The Science of Parallel Universes—and Its Implications. London: Penguin.
[77] Dhem, J.-F., F. Koeune, P.-A. Leroux, P. Mestré, J.-J. Quisquater and J.-L. Willems (2000). “A Practical Implementation of the Timing Attack”, in J.-J. Quisquater and B. Schneier (eds.), Smart Card: Research and Applications, Lecture Notes in Computer Science, 1820. Proceedings of the Third Working Conference on Smart Card Research and Advanced Applications—CARDIS ’98, Louvain-la-Neuve, Belgium, 14–16 September 1998. Springer.
[78] Diffie, W. and M. Hellman (1976). “New Directions in Cryptography”, IEEE Transactions on Information Theory, 22: 644–654.
[79] Du, W., J. Deng, Y. S. Han and P. K. Varshney (2003). “Establishing Pairwise Keys in Distributed Sensor Networks”, pp. 42–51. Proceedings of the 10th ACM Conference on Computer and Communication Security, Washington D.C., USA, 27–30 October.
[80] Du, W., J. Deng, Y. S. Han, S. Chen and P. K. Varshney (2004). “A Key Management Scheme for Wireless Sensor Networks Using Deployment Knowledge”. Proceedings of IEEE INFOCOM 2004, Hong Kong, 7–11 March.
[81] * Dummit, D. and R. Foote (2004). Abstract Algebra, 3rd ed. Somerset, New Jersey: John Wiley & Sons.
[82] Durfee, G. and P. Q. Nguyen (2000). “Cryptanalysis of the RSA Schemes with Short Secret Exponent from Asiacrypt ’99”, Advances in Cryptology—ASIACRYPT 2000, Lecture Notes in Computer Science, 1976. pp. 30–44. Berlin/Heidelberg: Springer.
[83] Dusart, P. (1999). “The kth Prime Is Greater than k(ln k + ln ln k – 1) for k ≥ 2”, Mathematics of Computation, 68: 411–415.
[84] ElGamal, T. (1985). “A Public-Key Cryptosystem and a Signature Scheme Based on Discrete Logarithms”, IEEE Transactions on Information Theory, 31: 469–472.
[85] Elkies, N. D. (1998). “Elliptic and Modular Curves over Finite Fields and Related Computational Issues”, AMS/IP Studies in Advanced Mathematics, 7: 21–76.
[86] Enge, A. (1999). “Computing Discrete Logarithms in High-Genus Hyperelliptic Jacobians in Provably Subexponential Time”. Technical report CORR 99-04, University of Waterloo, Canada.
[87] Enge, A. and P. Gaudry (2002). “A General Framework for Subexponential Discrete Logarithm Algorithms”, Acta Arithmetica, 102 (1): 83–103.
[88] Eschenauer, L. and V. D. Gligor (2002). “A Key-Management Scheme for Distributed Sensor Networks”. Proceedings of the 9th ACM Conference on Computer and Communication Security, pp. 41–47. Washington D.C., USA, 18–22 November.
[89] * Esmonde, J. and M. Ram Murty (1999). Problems in Algebraic Number Theory. Graduate Texts in Mathematics, 190. New York: Springer.
[90] Fiat, A. and A. Shamir (1987). “How to Prove Yourself: Practical Solutions to Identification and Signature Problems”, Advances in Cryptology—CRYPTO ’86, Lecture Notes in Computer Science, 263. pp. 186–194. Berlin/Heidelberg: Springer.
[91] Feige, U., A. Fiat, and A. Shamir (1988). “Zero-Knowledge Proofs of Identity”, Journal of Cryptology, 1: 77–94.
[92] * Feller, W. (1966). Introduction to Probability Theory and Its Applications, 3rd ed. New York: John Wiley & Sons.
[93] Ferguson, N., J. Kelsey, S. Lucks, B. Schneier, M. Stay, D. Wagner and D. Whiting (2000). “Improved Cryptanalysis of Rijndael”, Fast Software Encryption—FSE 2000, Lecture Notes in Computer Science, 1978. pp. 213–230. Berlin/Heidelberg: Springer.
[94] Fouquet, M., P. Gaudry and R. Harley (2000). “An Extension of Satoh’s Algorithm and Its Implementation”, Journal of the Ramanujan Mathematical Society, 15: 281–318.
[95] Fouquet, M., P. Gaudry and R. Harley (2001). “Finding Secure Curves with the Satoh-FGH Algorithm and an Early-Abort Strategy”, Advances in Cryptology—EUROCRYPT 2001, Lecture Notes in Computer Science, 2045. Berlin/Heidelberg: Springer.
[96] * Fraleigh, J. B. (1998). A First Course in Abstract Algebra, 6th ed. Reading, Massachusetts: Addison-Wesley.
[97] Fujisaki, E., T. Kobayashi, H. Morita, H. Oguro, T. Okamoto, S. Okazaki, D. Pointcheval and S. Uchiyama (1999). “EPOC: Efficient Probabilistic Public-Key Encryption”, contribution to IEEE P1363a.
[98] Fujisaki, E., T. Okamoto, D. Pointcheval, J. Stern (2001). “RSA-OAEP is Secure under the RSA Assumption”, Advances in Cryptology—CRYPTO 2001, Lecture Notes in Computer Science, 2139. pp. 260–274. Berlin/Heidelberg: Springer.
[99] Fulton, W. (1969). Algebraic Curves. Mathematics Lecture Notes Series. New York: W. A. Benjamin.
[100] Galbraith, S. D. (2003). “Weil Descent of Jacobians”, Discrete Applied Mathematics, 128 (1): 165–180.
[101] Galbraith, S. D., F. Hess and N. P. Smart (2002). “Extending the GHS Weil Descent Attack”, Advances in Cryptology—EUROCRYPT 2002, Lecture Notes in Computer Science, 2332. pp. 29–44. Berlin/Heidelberg: Springer.
[102] Galbraith, S. D., W. Mao, and K. G. Paterson (2002). “RSA-based Undeniable Signatures for General Moduli”, Topics in Cryptology—CT-RSA 2002, Lecture Notes in Computer Science, 2271. pp. 200–217. Berlin/Heidelberg: Springer.
[103] Gathen, J. von zur and J. Gerhard (1999). Modern Computer Algebra. Cambridge: Cambridge University Press.
[104] Gathen, J. von zur and V. Shoup (1992). “Computing Frobenius Maps and Factoring Polynomials”, pp. 97–105. Proceedings of the 24th Annual ACM Symposium on Theory of Computing, Victoria, British Columbia, Canada.
[105] Gaudry, P. (2000). “An Algorithm for Solving the Discrete Log Problem on Hyperelliptic Curves”, Advances in Cryptology—EUROCRYPT 2000, Lecture Notes in Computer Science, 1807. pp. 19–34. Berlin/Heidelberg: Springer.
[106] Gaudry, P. and R. Harley (2000). “Counting Points on Hyperelliptic Curves over Finite Fields”, Algorithmic Number Theory—ANTS-IV, Lecture Notes in Computer Science, 1838. pp. 313–332. Berlin/Heidelberg: Springer.
[107] Gaudry, P., F. Hess and N. P. Smart (2002). “Constructive and Destructive Facets of Weil Descent on Elliptic Curves”, Journal of Cryptology, 15 (1): 19–46.
[108] Geddes, K. O., S. R. Czapor and G. Labahn (1992). Algorithms for Computer Algebra. Boston: Kluwer Academic Publishers.
[109] Gennaro, R., H. Krawczyk and T. Rabin (2000). “RSA-based Undeniable Signatures”, Journal of Cryptology, 13 (4): 397–416.
[110] Gentry, C., J. Jonsson, M. Szydlo and J. Stern (2001). “Cryptanalysis of the NTRU Signature Scheme (NSS) from Eurocrypt 2001”, Advances in Cryptology—ASIACRYPT 2001, Lecture Notes in Computer Science, 2248. pp. 1–20. Berlin/Heidelberg: Springer.
[111] Gentry, C. and M. Szydlo (2002). “Cryptanalysis of the NTRU Signature Scheme”, Advances in Cryptology—EUROCRYPT ’02, Lecture Notes in Computer Science, 2332. pp. 299–320. Berlin/Heidelberg: Springer.
[112] Gilbert, H. and M. Minier (2000). “A Collision Attack on Seven Rounds of Rijndael”, pp. 230–241. Proceedings of the 3rd AES Conference, NIST, New York, April 2000.
[113] * Goldreich, O. (2001). Foundations of Cryptography, Volume 1: Basic Tools. Cambridge: Cambridge University Press.
[114] * Goldreich, O. (2004). Foundations of Cryptography, Volume 2: Basic Applications. Cambridge: Cambridge University Press.
[115] Goldreich, O., S. Goldwasser and S. Halevi (1997). “Public-key Cryptosystems from Lattice Reduction Problems”, Advances in Cryptology—CRYPTO ’97, Lecture Notes in Computer Science, 1294. pp. 112–131. Berlin/Heidelberg: Springer.
[116] Goldwasser, S. and J. Kilian (1986). “Almost All Primes Can Be Quickly Certified”, pp. 316–329. Proceedings of the 18th Annual ACM Symposium on Theory of Computing, Berkeley, California.
[117] Goldwasser, S. and S. Micali (1984). “Probabilistic Encryption”, Journal of Computer and Systems Sciences, 28: 270–299.
[118] Gordon, D. M. (1985). “Strong Primes are Easy to Find”, Advances in Cryptology—EUROCRYPT ’84, Lecture Notes in Computer Science, 209. pp. 216–223. Berlin/Heidelberg: Springer.
[119] Gordon, D. M. (1993). “Discrete Logarithms in GF (p) Using the Number Field Sieve”, SIAM Journal on Discrete Mathematics, 6: 124–138.
[120] Gordon, D. M. and K. S. McCurley (1992). “Massively Parallel Computation of Discrete Logarithms”, Advances in Cryptology—CRYPTO ’92, Lecture Notes in Computer Science, 740. pp. 312–323. Berlin/Heidelberg: Springer.
[121] Grinstead, C. M. and J. L. Snell (1997). Introduction to Probability, 2nd revised ed. Providence, Rhode Island: American Mathematical Society. The book is also available at http://www.dartmouth.edu/~chance/book.html (October 2008).
[122] Guillou, L. C. and J.-J. Quisquater (1988). “A Practical Zero-Knowledge Protocol Fitted to Security Microprocessor Minimizing Both Transmission and Memory”, Advances in Cryptology—EUROCRYPT ’88, Lecture Notes in Computer Science, 330. pp. 123–128. Berlin/Heidelberg: Springer.
[123] Hankerson, D., A. J. Menezes and S. Vanstone (2004). Guide to Elliptic Curve Cryptography. New York: Springer.
[124] Hartshorne, R. (1977). Algebraic Geometry. Graduate Texts in Mathematics, 52. New York, Heidelberg and Berlin: Springer.
[125] * Herstein, I. N. (1975). Topics in Algebra. New York: John Wiley & Sons.
[126] Hess, F., G. Seroussi and N. P. Smart (2000). “Two Topics in Hyperelliptic Cryptography”. HP Labs technical report HPL-2000-118.
[127] * Hoffman, K. and R. Kunze (1971). Linear Algebra. Englewood Cliffs, New Jersey: Prentice-Hall.
[128] Hoffstein, J., N. Howgrave-Graham, J. Pipher, J. H. Silverman and W. White (2003). “NTRUSign: Digital Signatures Using the NTRU Lattice”, Topics in Cryptology—CT-RSA 2003, Lecture Notes in Computer Science, 2612. pp. 122–140. Berlin/Heidelberg: Springer.
[129] Hoffstein, J., N. Howgrave-Graham, J. Pipher, J. H. Silverman and W. White (2005). “Performance Improvements and a Baseline Parameter Generation Algorithm for NTRUSign”, Workshop on Mathematical Problems and Techniques in Cryptology, Barcelona, Spain, June 2005. Also available at http://www.ntru.com/cryptolab/articles.htm (October 2008).
[130] Hoffstein, J., J. Pipher and J. H. Silverman (1998). “NTRU: A Ring-Based Public Key Cryptosystem”, Algorithmic Number Theory—ANTS-III, Lecture Notes in Computer Science, 1423. pp. 267–288. Berlin/Heidelberg: Springer.
[131] Hoffstein, J., J. Pipher and J. H. Silverman (2001). “NSS: An NTRU Lattice-Based Signature Scheme”, Advances in Cryptology—EUROCRYPT 2001, Lecture Notes in Computer Science, 2045. pp. 211–228. Berlin/Heidelberg: Springer.
[132] Horster, P., M. Michels and H. Petersen (1994). “Meta-ElGamal Signature Schemes”. Technical report TR-94-5-F, Department of Computer Science, Technische Universität Chemnitz-Zwickau.
[133] * Hungerford, T. W. (1974). Algebra, 5th ed. Graduate Texts in Mathematics, 73. Berlin: Springer.
[134] IEEE (2008), “Standard Specifications for Public-Key Cryptography” [online document]. Available at http://grouper.ieee.org/groups/1363/index.html (October 2008).
[135] IETF (2008), “The Internet Engineering Task Force” [online document]. Available at http://www.ietf.org/ (October 2008).
[136] * Ireland, K. and M. Rosen (1990). A Classical Introduction to Modern Number Theory. Graduate Texts in Mathematics, 84. New York: Springer.
[137] Izu, T., B. Möller and T. Takagi (2002). “Improved Elliptic Curve Multiplication Methods Resistant Against Side Channel Attacks”, Progress in Cryptology—INDOCRYPT 2002, Lecture Notes in Computer Science, 2551. pp. 296–313. Berlin/Heidelberg: Springer.
[138] Izu, T. and T. Takagi (2002). “A Fast Parallel Elliptic Curve Multiplication Resistant Against Side Channel Attacks”, Public Key Cryptography—PKC 2002, Lecture Notes in Computer Science, 2274. pp. 280–296. Berlin/Heidelberg: Springer. An improved version of this paper is published as the technical report CORR 2002-03 of the Centre for Applied Cryptographic Research, University of Waterloo, Canada, and is available at http://www.cacr.math.uwaterloo.ca/ (October 2008).
[139] Jacobson, M. J., N. Koblitz, J. H. Silverman, A. Stein and E. Teske (2000). “Analysis of the Xedni Calculus Attack”, Designs, Codes and Cryptography, 20: 41–64.
[140] Janusz, G. J. (1995). Algebraic Number Fields. Providence, Rhode Island: American Mathematical Society.
[141] Johnson, D. and A. Menezes (1999). “The Elliptic Curve Digital Signature Algorithm (ECDSA)”. Technical report CORR 99-34, Department of Combinatorics and Optimization, University of Waterloo, Canada. Also published in International Journal on Information Security (2001), 1: 36–63.
[142] Joye, M., A. K. Lenstra and J.-J. Quisquater (1999). “Chinese Remaindering Based Cryptosystems in the Presence of Faults”, Journal of Cryptology, 12 (4): 241–246.
[143] Kaltofen, E. and V. Shoup (1995). “Subquadratic-Time Factoring of Polynomials over Finite Fields”, pp. 398–406. Proceedings of the 27th Annual ACM Symposium on Theory of Computing, Las Vegas, Nevada.
[144] Kampkötter, W. (1991). Explizite Gleichungen für Jacobische Varietäten hyperelliptischer Kurven [dissertation]. Essen: Gesamthochschule.
[145] Katz, J. and Y. Lindell (2007). Introduction to Modern Cryptography. Boca Raton, Florida; London and New York: CRC Press.
[146] Kaye, P. and C. Zalka (2004), “Optimized Quantum Implementation of Elliptic Curve Arithmetic over Binary Fields” [online document]. Available at http://arxiv.org/abs/quant-ph/0407095 (October 2008).
[147] * Knuth, D. E. (1997). The Art of Computer Programming, Volume 2: Seminumerical Algorithms. Reading, Massachusetts: Addison-Wesley.
[148] Ko, K. H., S. J. Lee, J. H. Cheon, J. W. Han, J. S. Kang and C. S. Park (2000). “New Public-Key Cryptosystem Using Braid Groups”, Advances in Cryptology—CRYPTO 2000, Lecture Notes in Computer Science, 1880. pp. 166–183. Berlin/Heidelberg: Springer.
[149] Koblitz, N. (1984). p-adic Numbers, p-adic Analysis, and Zeta-Functions, 2nd ed. Graduate Texts in Mathematics, 58. New York, Heidelberg and Berlin: Springer.
[150] Koblitz, N. (1987). “Elliptic Curve Cryptosystems”, Mathematics of Computation, 48: 203–209.
[151] Koblitz, N. (1989). “Hyperelliptic Cryptosystems”, Journal of Cryptology, 1: 139–150.
[152] Koblitz, N. (1993). Introduction to Elliptic Curves and Modular Forms, 2nd ed. Graduate Texts in Mathematics, 97. Berlin: Springer.
[153] * Koblitz, N. (1994). A Course in Number Theory and Cryptography, 2nd ed. New York: Springer.
[154] Koblitz, N. (1998). Algebraic Aspects of Cryptography. New York: Springer.
[155] Kocher, P. C. (1996). “Timing Attacks on Implementations of Diffie-Hellman, RSA, DSS, and Other Systems”, Advances in Cryptology—CRYPTO ’96, Lecture Notes in Computer Science, 1109. pp. 104–113. Berlin/Heidelberg: Springer.
[156] Kocher, P. C., J. Jaffe and B. Jun (1999). “Differential Power Analysis”, Advances in Cryptology—CRYPTO ’99, Lecture Notes in Computer Science, 1666. pp. 388–397. Berlin/Heidelberg: Springer.
[157] Lagarias, J. C. and A. M. Odlyzko (1985). “Solving Low-Density Subset Sum Problems”, Journal of the ACM, 32: 229–246.
[158] LaMacchia, B. A. and A. M. Odlyzko (1991a). “Computation of Discrete Logarithms in Prime Fields”, Designs, Codes and Cryptography, 1: 46–62.
[159] LaMacchia, B. A. and A. M. Odlyzko (1991b). “Solving Large Sparse Linear Systems over Finite Fields”, Advances in Cryptology—CRYPTO ’90, Lecture Notes in Computer Science, 537. pp. 109–133. Berlin/Heidelberg: Springer.
[160] Lang, S. (1994). Algebraic Number Theory. Graduate Texts in Mathematics, 110. New York: Springer.
[161] Law, L., A. Menezes, A. Qu, J. Solinas and S. Vanstone (1998). “An Efficient Protocol for Authenticated Key Agreement”. Technical report CORR 98-05, Department of Combinatorics and Optimization, University of Waterloo, Canada.
[162] Lehmer, D. H. and R. E. Powers (1931). “On Factoring Large Numbers”, Bulletin of the AMS, 37: 770–776.
[163] Lenstra, A. K., E. Tromer, A. Shamir, W. Kortsmit, B. Dodson, J. Hughes and P. Leyland (2003). “Factoring Estimates for a 1024-Bit RSA Modulus”, Advances in Cryptology—ASIACRYPT 2003, Lecture Notes in Computer Science, 2894. pp. 55–74. Berlin/Heidelberg: Springer.
[164] Lenstra, A. K. and H. W. Lenstra (1990). “Algorithms in Number Theory”, in J. van Leeuwen (ed.), Handbook of Theoretical Computer Science, Volume A, pp. 675–715, Amsterdam: Elsevier.
[165] Lenstra, A. K. and H. W. Lenstra (ed.) (1993). The Development of the Number Field Sieve. Lecture Notes in Mathematics, 1554. Berlin: Springer.
[166] Lenstra, A. K., H. W. Lenstra and L. Lovász (1982). “Factoring Polynomials with Rational Coefficients”, Mathematische Annalen, 261: 515–534.
[167] Lenstra, A. K., H. W. Lenstra, M. S. Manasse and J. M. Pollard (1990). “The Number Field Sieve”, pp. 564–572. Proceedings of the 22nd Annual ACM Symposium on Theory of Computing, Baltimore, Maryland, USA, 13–17 May.
[168] Lenstra, A. K. and A. Shamir (2000). “Analysis and Optimization of the TWINKLE Factoring Device”, Advances in Cryptology—EUROCRYPT 2000, Lecture Notes in Computer Science, 1807. pp. 35–52. Berlin/Heidelberg: Springer.
[169] Lenstra, A. K., A. Shamir, J. Tomlinson and E. Tromer (2002). “Analysis of Bernstein’s Factorization Circuit”, Advances in Cryptology—ASIACRYPT 2002, Lecture Notes in Computer Science, 2501. pp. 1–26. Berlin/Heidelberg: Springer.
[170] Lenstra, A. K. and E. R. Verheul (2000a). “The XTR Public Key System”, Advances in Cryptology—CRYPTO 2000, Lecture Notes in Computer Science, 1880. pp. 1–20. Berlin/Heidelberg: Springer.
[171] Lenstra, A. K. and E. R. Verheul (2000b). “Key Improvements to XTR”, Advances in Cryptology—ASIACRYPT 2000, Lecture Notes in Computer Science, 1976. pp. 220–233. Berlin/Heidelberg: Springer.
[172] Lenstra, A. K. and E. R. Verheul (2001a). “An Overview of the XTR Public Key System”, pp. 151–180. Proceedings of the Public Key Cryptography and Computational Number Theory Conference, Warsaw, Poland, 2000. Berlin: Walter de Gruyter.
[173] Lenstra, A. K. and E. R. Verheul (2001b). “Fast Irreducibility and Subgroup Membership Testing in XTR”, Public Key Cryptography—PKC 2001, Lecture Notes in Computer Science, 1992. pp. 73–86. Berlin/Heidelberg: Springer.
[174] Lenstra, H. W. (1987). “Factoring Integers with Elliptic Curves”, Annals of Mathematics, 126: 649–673.
[175] Lenstra, H. W. and C. Pomerance (2005), “Primality Testing with Gaussian Periods” [online document]. Available at http://www.math.dartmouth.edu/~carlp/PDF/complexity12.pdf (October 2008).
[176] Lercier, R. (1997). “Finding Good Random Elliptic Curves for Cryptosystems Defined over GF(2^n)”, Advances in Cryptology—EUROCRYPT ’97, Lecture Notes in Computer Science, 1233. pp. 379–392. Berlin/Heidelberg: Springer.
[177] Lercier, R. and D. Lubicz (2003). “Counting Points on Elliptic Curves over Finite Fields of Small Characteristic in Quasi Quadratic Time”, Advances in Cryptology—EUROCRYPT 2003, Lecture Notes in Computer Science, 2656. pp. 360–373. Berlin/Heidelberg: Springer.
[178] Libert, B. and J.-J. Quisquater (2003), “New Identity Based Signcryption Schemes from Pairings” [online document]. Available at http://eprint.iacr.org/2003/023/ (October 2008).
[179] Lidl, R. and H. Niederreiter (1984). Finite Fields, Encyclopedia of Mathematics and Its Applications, 20. Cambridge: Cambridge University Press.
[180] Lidl, R. and H. Niederreiter (1994). Introduction to Finite Fields and Their Applications. Cambridge: Cambridge University Press.
[181] Liu, D. and P. Ning (2003a). “Establishing Pairwise Keys in Distributed Sensor Networks”, pp. 52–61. Proceedings of the 10th ACM Conference on Computer and Communication Security, Washington D.C., USA, October 2003.
[182] Liu, D. and P. Ning (2003b). “Location-Based Pairwise Key Establishments for Static Sensor Networks”, pp. 72–82. Proceedings of the 1st ACM Workshop on Security in Ad Hoc and Sensor Networks, Fairfax, Virginia, 31 October 2003.
[183] Liu, D., P. Ning and R. Li (2005). “Establishing Pairwise Keys in Distributed Sensor Networks”, ACM Transactions on Information and System Security, (8) 1: 41–77.
[184] Lucks, S. (2000). “Attacking Seven Rounds of Rijndael Under 192-bit and 256-bit Keys”, pp. 215–229. Proceedings of the 3rd Advanced Encryption Standard Candidate conference, New York, April 2000.
[185] Malone-Lee, J. (2002), “Identity-Based Signcryption” [online document]. Available at http://eprint.iacr.org/2002/098/ (October 2008).
[186] Mao, W. (2001). “New Zero-Knowledge Undeniable Signatures—Forgery of Signature Equivalent to Factorisation”. Hewlett-Packard technical report HPL-2001-36.
[187] Mao, W. and K. G. Paterson (2000). “Convertible Undeniable Standard RSA Signatures”. Hewlett-Packard technical report HPL-2000-148.
[188] Matsumoto, T. and H. Imai (1988). “Public Quadratic Polynomial-Tuples for Efficient Signature-Verification and Message-Encryption”, Advances in Cryptology—EUROCRYPT ’88, Lecture Notes in Computer Science, 330. pp. 419–453. Berlin/Heidelberg: Springer.
[189] McCurley, K. S. (1990). “The Discrete Logarithm Problem”, in C. Pomerance and S. Goldwasser (eds.), Cryptology and Computational Number Theory: American Mathematical Society Short Course, Boulder, Colorado, 6–7 August 1989. Proceedings of Symposia in Applied Mathematics, 42. pp. 49–74. Providence, Rhode Island: American Mathematical Society.
[190] McEliece, R. J. (1978). “A Public-Key Cryptosystem Based on Algebraic Coding Theory”. DSN Progress Report 42-44, Jet Propulsion Laboratory, California Institute of Technology, pp. 114–116.
[191] Menezes, A. J. (ed.) (1993). Applications of Finite Fields. Boston: Kluwer Academic Publishers.
[192] Menezes, A. J. (1993). Elliptic Curve Public Key Cryptosystems. The Springer International Series in Engineering and Computer Science, 234. Springer. Available at http://books.google.co.in/books?id=bIb54ShKS68C (October 2008).
[193] Menezes, A. J., T. Okamoto and S. Vanstone (1993). “Reducing Elliptic Curve Logarithms to Logarithms in a Finite Field”, IEEE Transactions on Information Theory, 39: 1639–1646.
[194] Menezes, A. J., P. van Oorschot and S. Vanstone (1997). Handbook of Applied Cryptography. Boca Raton, Florida: CRC Press.
[195] Menezes, A. J., Y. Wu and R. Zuccherato (1996). “An Elementary Introduction to Hyperelliptic Curves”. CACR technical report CORR 96-19, University of Waterloo, Canada.
[196] Merkle, R. C. and M. E. Hellman (1978). “Hiding Information and Signatures in Trapdoor Knapsacks”, IEEE Transactions on Information Theory, 24 (5): 525–530.
[197] Mermin, N. D. (2003). “From Cbits to Qbits: Teaching Computer Scientists Quantum Mechanics”, American Journal of Physics, 71: 23–30.
[198] Mermin, N. D. (2006), “Phys481-681-CS483 Lecture Notes and Homework Assignments” [online document]. Available at http://people.ccmr.cornell.edu/~mermin/qcomp/CS483.html (October 2008).
[199] Messerges, T. S. (2000). “Securing the AES Finalists Against Power Analysis Attacks”, Fast Software Encryption—FSE 2000, Lecture Notes in Computer Science, 1978. pp. 150–164. Berlin/Heidelberg: Springer.
[200] Messerges, T. S., E. A. Dabbish and R. H. Sloan (1999). “Power Analysis Attacks of Modular Exponentiation in Smartcards”, Cryptographic Hardware and Embedded Systems—CHES 1999, Lecture Notes in Computer Science, 1717. pp. 144–157. Berlin/Heidelberg: Springer.
[201] Messerges, T. S., E. A. Dabbish and R. H. Sloan (2002). “Examining Smart-Card Security Under the Threat of Power Analysis Attacks”, IEEE Transactions on Computers, 51 (4): 541–552.
[202] Michels, M. and M. Stadler (1997). “Efficient Convertible Undeniable Signature Schemes”, pp. 231–244. Proceedings of the 4th International Workshop on Selected Areas in Cryptography, Ottawa, Canada.
[203] Mignotte, M. (1992). Mathematics for Computer Algebra. New York: Springer.
[204] Miller, G. L. (1976). “Riemann’s Hypothesis and Tests for Primality”, Journal of Computer and System Sciences, 13: 300–317.
[205] Miller, V. (1986). “Use of Elliptic Curves in Cryptography”, Advances in Cryptology—CRYPTO ’85, Lecture Notes in Computer Science, 218. pp. 417–426. Berlin/Heidelberg: Springer.
[206] Möller, B. (2001). “Securing Elliptic Curve Point Multiplication Against Side-Channel Attacks”, Information Security Conference, Lecture Notes in Computer Science, 2200. pp. 324–334. Berlin/Heidelberg: Springer.
[207] Mollin, R. A. (1998). Fundamental Number Theory with Applications. Boca Raton, Florida: Chapman & Hall/CRC.
[208] Mollin, R. A. (1999). Algebraic Number Theory. Boca Raton, Florida: Chapman & Hall/CRC.
[209] Mollin, R. A. (2001). An Introduction to Cryptography. Boca Raton, Florida: Chapman & Hall/CRC.
[210] Montgomery, P. L. (1985). “Modular Multiplication Without Trial Division”, Mathematics of Computation, 44: 519–521.
[211] Montgomery, P. L. (1994). “A Survey of Modern Integer Factorization Algorithms”, CWI Quarterly, 7 (4): 337–366.
[212] Montgomery, P. L. (1995). “A Block Lanczos Algorithm for Finding Dependencies over GF(2)”, Advances in Cryptology—EUROCRYPT ’95, Lecture Notes in Computer Science, 921. pp. 106–120. Berlin/Heidelberg: Springer.
[213] Morrison, M. A. and J. Brillhart (1975). “A Method of Factoring and the Factorization of F7”, Mathematics of Computation, 29: 183–205.
[214] * Motwani, R. and P. Raghavan (1995). Randomized Algorithms. Cambridge: Cambridge University Press.
[215] Muir, J. A. (2001). Techniques of Side Channel Cryptanalysis [dissertation]. Canada: University of Waterloo. Available at http://www.uwspace.uwaterloo.ca/bitstream/10012/1098/1/jamuir2001.pdf (October 2008).
[216] Neukirch, J. (1999). Algebraic Number Theory. Berlin and Heidelberg: Springer.
[217] Nguyen, P. Q. (2006), “A Note on the Security of NTRUSign” [online document]. Available at http://eprint.iacr.org/2006/387 (October 2008).
[218] * Nielsen, M. A. and I. L. Chuang (2000). Quantum Computation and Quantum Information. Cambridge: Cambridge University Press.
[219] NIST (2001), “Advanced Encryption Standard” [online document]. Available at http://csrc.nist.gov/publications/fips/fips197/fips-197.pdf (October 2008).
[220] NIST (2006), “Digital Signature Standard (DSS)” [online document]. Available at http://csrc.nist.gov/publications/drafts/fips_186-3/Draft-FIPS-186-3%20_March2006.pdf (October 2008).
[221] NIST (2007a), “Federal Information Processing Standards” [online document]. Available at http://csrc.nist.gov/publications/PubsFIPS.html (October 2008).
[222] NIST (2007b), “Secure Hash Standard (SHS)” [online document]. Available at http://csrc.nist.gov/publications/drafts/fips_180-3/draft_fips-180-3_June-08-2007.pdf (October 2008).
[223] Nyberg, K. and R. A. Rueppel (1993). “A New Signature Scheme Based on the DSA Giving Message Recovery”, pp. 58–61. Proceedings of the 1st ACM Conference on Computer and Communications Security, Fairfax, Virginia, 3–5 November.
[224] Nyberg, K. and R. A. Rueppel (1995). “Message Recovery for Signature Schemes Based on the Discrete Logarithm Problem”, Advances in Cryptology—EUROCRYPT ’94, Lecture Notes in Computer Science, 950. pp. 182–193. Berlin/Heidelberg: Springer.
[225] Odlyzko, A. M. (1985). “Discrete Logarithms and Their Cryptographic Significance”, Advances in Cryptology—EUROCRYPT ’84, Lecture Notes in Computer Science, 209. pp. 224–314. Berlin/Heidelberg: Springer.
[226] Odlyzko, A. M. (2000). “Discrete Logarithms: The Past and the Future”, Designs, Codes and Cryptography, 19: 129–145.
[227] Okamoto, T. (1992). “Provably Secure and Practical Identification Schemes and Corresponding Signature Schemes”, Advances in Cryptology—CRYPTO ’92, Lecture Notes in Computer Science, 740. pp. 31–53. Berlin/Heidelberg: Springer.
[228] Okamoto, T., E. Fujisaki and H. Morita (1998). “TSH-ESIGN: Efficient Digital Signature Scheme Using Trisection Size Hash”, submission to IEEE P1363a.
[229] Papadimitriou, C. H. (1994). Computational Complexity. Reading, Massachusetts: Addison-Wesley.
[230] Park, S., T. Kim, Y. An and D. Won (1995). “A Provably Entrusted Undeniable Signature”, pp. 644–648. IEEE Singapore International Conference on Network/International Conference on Information Engineering (SICON/ICIE ’95).
[231] Patarin, J. (1995). “Cryptanalysis of the Matsumoto and Imai Public Key Scheme of Eurocrypt’88”, Advances in Cryptology—CRYPTO ’95, Lecture Notes in Computer Science, 963. pp. 248–261. Berlin/Heidelberg: Springer.
[232] Patarin, J. (1996). “Hidden Fields Equations (HFE) and Isomorphisms of Polynomials (IP): Two New Families of Asymmetric Algorithms”, Advances in Cryptology—EUROCRYPT ’96, Lecture Notes in Computer Science, 1070. pp. 33–48. Berlin/Heidelberg: Springer.
[233] Pirsig, R. M. (1974). Zen and the Art of Motorcycle Maintenance: An Inquiry into Values. London: Bodley Head.
[234] Pohlig, S. and M. Hellman (1978). “An Improved Algorithm for Computing Logarithms over GF(p) and its Cryptographic Significance”, IEEE Transactions on Information Theory, 24: 106–110.
[235] Pohst, M. and H. Zassenhaus (1989). Algorithmic Algebraic Number Theory, Encyclopaedia of Mathematics and Its Applications, 30. Cambridge: Cambridge University Press.
[236] Pointcheval, D. and J. Stern (1996). “Provably Secure Blind Signature Schemes”, Advances in Cryptology—ASIACRYPT ’96, Lecture Notes in Computer Science, 1163. pp. 252–265. Berlin/Heidelberg: Springer.
[237] Pointcheval, D. and J. Stern (2000). “Security Arguments for Digital Signatures and Blind Signatures”, Journal of Cryptology, 13 (3): 361–396.
[238] Pollard, J. M. (1974). “Theorems on Factorization and Primality Testing”, Proceedings of the Cambridge Philosophical Society, 76 (2): 521–528.
[239] Pollard, J. M. (1975). “A Monte Carlo Method for Factorization”, BIT, 15 (3): 331–334.
[240] Pollard, J. M. (1993). “Factoring with Cubic Integers”, in A. K. Lenstra and H. W. Lenstra (eds.), The Development of the Number Field Sieve, Lecture Notes in Mathematics, 1554. pp. 4–10. Berlin: Springer.
[241] Pomerance, C. (1985). “The Quadratic Sieve Factoring Algorithm”, Advances in Cryptology—EUROCRYPT ’84, Lecture Notes in Computer Science, 209. pp. 169–182. Berlin/Heidelberg: Springer.
[242] Pomerance, C. (2008). “Elementary Thoughts on Discrete Logarithms”, pp. 385–396. in J. P. Buhler and P. Stevenhagen (eds.), Surveys in Algorithmic Number Theory, Publications of the Research Institute for Mathematical Sciences, 44. New York: Cambridge University Press.
[243] Preskill, J. (1998). “Quantum Computing: Pro and Con”, Proceedings of the Royal Society of London, Series A, 454: 469–486.
[244] Preskill, J. (2007), “Course Information for Quantum Computation” [online document]. Available at http://theory.caltech.edu/people/preskill/ph219/ (October 2008).
[245] Proos, J. and C. Zalka (2004), “Shor’s Discrete Logarithm Quantum Algorithm for Elliptic Curves” [online document]. Available at http://arxiv.org/abs/quant-ph/0301141 (October 2008).
[246] Rabin, M. O. (1979). “Digitalized Signatures and Public-Key Functions as Intractable as Factorization”. Technical report MIT/LCS/TR-212, MIT Laboratory for Computer Science, Massachusetts.
[247] Rabin, M. O. (1980a). “Probabilistic Algorithms in Finite Fields”, SIAM Journal on Computing, 9: 273–280.
[248] Rabin, M. O. (1980b). “Probabilistic Algorithm for Testing Primality”, Journal of Number Theory, 12: 128–138.
[249] Ram Murty, M. (2001). Problems in Analytic Number Theory. New York: Springer.
[250] Raymond, J.-F. and A. Stiglic (2000), “Security Issues in the Diffie-Hellman Key Agreement Protocol” [online document]. Available at http://crypto.cs.mcgill.ca/~stiglic/Papers/dhfull.pdf (October 2008).
[251] Ribenboim, P. (2001). Classical Theory of Algebraic Numbers. Universitext. New York: Springer.
[252] Rivest, R. L., A. Shamir and L. M. Adleman (1978). “A Method for Obtaining Digital Signatures and Public-Key Cryptosystems”, Communications of the ACM, 21 (2): 120–126.
[253] Rosser, J. B. and L. Schoenfeld (1962). “Approximate Formulas for Some Functions of Prime Numbers”, Illinois Journal of Mathematics, 6: 64–94.
[254] RSA Security Inc. (2008), “Public-Key Cryptography Standards” [online document]. Available at http://www.rsa.com/rsalabs/node.asp?id=2124 (October 2008).
[255] Sakurai, J. J. (1994). Modern Quantum Mechanics. Revised by San-Fu Tuan, Reading, Massachusetts: Addison-Wesley.
[256] Satoh, T. (2000). “The Canonical Lift of an Ordinary Elliptic Curve over a Finite Field and Its Point Counting”, Journal of the Ramanujan Mathematical Society, 15: 247–270.
[257] Satoh, T. and K. Araki (1998). “Fermat Quotients and the Polynomial Time Discrete Log Algorithm for Anomalous Elliptic Curves”, Commentarii Mathematici Universitatis Sancti Pauli, 47: 81–92.
[258] Schiff, L. I. (1968). Quantum Mechanics, 3rd ed. New York: McGraw-Hill.
[259] Schindler, W., F. Koeune and J.-J. Quisquater (2001). “Unleashing the Full Power of Timing Attack”. Technical report CG-2001/3, Université Catholique de Louvain, Belgium. Available at http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.23.6622.
[260] Schirokauer, O. (1993). “Discrete Logarithms and Local Units”, Philosophical Transactions of the Royal Society of London, Series A, 345: 409–423.
[261] Schirokauer, O., D. Weber, and T. Denny (1996). “Discrete Logarithms: The Effectiveness of the Index Calculus Method”, Algorithmic Number Theory—ANTS-II, Lecture Notes in Computer Science, 1122. pp. 337–361. Berlin/Heidelberg: Springer.
[262] * Schneier, B. (1996). Applied Cryptography, 2nd ed. New York: John Wiley & Sons.
[263] Schnorr, C. P. (1991). “Efficient Signature Generation by Smart Cards”, Journal of Cryptology, 4: 161–174.
[264] Schoof, R. (1995). “Counting Points on Elliptic Curves over Finite Fields”, Journal de Théorie des Nombres de Bordeaux, 7: 219–254.
[265] Semaev, I. A. (1998). “Evaluation of Discrete Logarithms on Some Elliptic Curves”, Mathematics of Computation, 67: 353–356.
[266] Shamir, A. (1984). “A Polynomial-Time Algorithm for Breaking the Basic Merkle-Hellman Cryptosystem”, IEEE Transactions on Information Theory, 30: 699–704.
[267] Shamir, A. (1984). “Identity-Based Cryptosystems and Signature Schemes”, Advances in Cryptology—CRYPTO ’84, Lecture Notes in Computer Science, 196. pp. 47–53. Berlin/Heidelberg: Springer.
[268] Shamir, A. (1997). “How to Check Modular Exponentiation”, presented at the rump session of Advances in Cryptology—EUROCRYPT ’97, May.
[269] Shamir, A. (1999). “Factoring Large Numbers with the TWINKLE Device”, Cryptographic Hardware and Embedded Systems—CHES ’99, Lecture Notes in Computer Science, 1717. pp. 2–12. Berlin/Heidelberg: Springer.
[270] Shamir, A. and E. Tromer (2003). “Factoring Large Numbers with the TWIRL Device”, Advances in Cryptology—CRYPTO 2003, Lecture Notes in Computer Science, 2729. pp. 1–26. Berlin/Heidelberg: Springer.
[271] Shor, P. W. (1997). “Polynomial-Time Algorithms for Prime Factorization and Discrete Logarithms on a Quantum Computer”, SIAM Journal on Computing, 26: 1484–1509.
[272] Shoup, V. (1990). “On the Deterministic Complexity of Factoring Polynomials over Finite Fields”, Information Processing Letters, 33: 261–267.
[273] Shparlinski, I. E. (1991). “On Some Problems in the Theory of Finite Fields”, Russian Mathematical Surveys, 46 (1): 199–240.
[274] Shparlinski, I. E. (1992). Computational and Algorithmic Problems in Finite Fields, Mathematics and its Applications, 88. Kluwer Academic Publishers.
[275] * Silverman, J. H. (1986). The Arithmetic of Elliptic Curves. Graduate Texts in Mathematics, 106. Berlin and New York: Springer.
[276] Silverman, J. H. (1994). Advanced Topics in the Arithmetic of Elliptic Curves. Graduate Texts in Mathematics, 151. New York: Springer.
[277] Silverman, J. H. (2000). “The Xedni Calculus and the Elliptic Curve Discrete Logarithm Problem”, Designs, Codes and Cryptography, 20: 5–40.
[278] Silverman, J. H. and J. Suzuki (1998). “Elliptic Curve Discrete Logarithms and the Index Calculus”, Advances in Cryptology—ASIACRYPT ’98, Lecture Notes in Computer Science, 1514. pp. 110–125. Berlin/Heidelberg: Springer.
[279] Silverman, R. D. (1987). “The Multiple Polynomial Quadratic Sieve”, Mathematics of Computation, 48: 329–339.
[280] * Sipser, M. (1997). Introduction to the Theory of Computation, 2nd ed. Boston: PWS Publishing Company.
[281] Skjernaa, B. (2003). “Satoh’s Algorithm in Characteristic 2”, Mathematics of Computation, 72: 477–487.
[282] Smart, N. P. (1999). “The Discrete Logarithm Problem on Elliptic Curves of Trace One”, Journal of Cryptology, 12: 193–196.
[283] Smart, N. P. (2002). Cryptography: An Introduction. New York: McGraw-Hill. The 2nd edition of this book is available online at http://www.cs.bris.ac.uk/~nigel/Crypto_Book/ (October 2008).
[284] Smith, P. J. (1993). “LUC Public-Key Encryption: A Secure Alternative to RSA”, Dr. Dobb’s Journal, 18 (1): 44–49.
[285] Smith, P. J. and M. J. J. Lennon (1993). “LUC: A New Public Key System”, IFIP Transactions, A 37. pp. 103–117. Proceedings of the IFIP TC11, 9th International Conference on Information Security. Computer Security. Amsterdam: North-Holland Co.
[286] Smith, P. J. and C. Skinner (1995). “A Public-Key Cryptosystem and Digital Signature System Based on the Lucas Function Analogue to Discrete Logarithms”, Advances in Cryptology—ASIACRYPT ’94, Lecture Notes in Computer Science, 917. pp. 357–364. Berlin/Heidelberg: Springer.
[287] Solovay, R. and V. Strassen (1977). “A Fast Monte Carlo Test for Primality”, SIAM Journal on Computing, 6: 84–86.
[288] * Stallings, W. (2006). Cryptography and Network Security, 4th ed. Upper Saddle River, New Jersey: Prentice-Hall.
[289] Stam, M. and A. K. Lenstra (2001). “Speeding up XTR”, Advances in Cryptology—ASIACRYPT 2001, Lecture Notes in Computer Science, 2248. pp. 125–143. Berlin/Heidelberg: Springer.
[290] Stein, A. and E. Teske (2005). “Optimized Baby Step-Giant Step Methods”, Journal of the Ramanujan Mathematical Society, 20 (1): 27–58.
[291] * Stinson, D. (2005). Cryptography: Theory and Practice, 3rd ed. Boca Raton, Florida: CRC Press.
[292] Strassen, V. (1969). “Gaussian Elimination Is not Optimal”, Numerische Mathematik, 13: 354–356.
[293] Stucki, D., N. Gisin, O. Guinnard, G. Ribordy and H. Zbinden (2002). “Quantum Key Distribution over 67 km with a Plug & Play System”, New Journal of Physics, 4: 41.1–41.8.
[294] Sun, H.-M., W.-C. Yang and C.-S. Laih (1999). “On the Design of RSA with Short Secret Exponent”, Advances in Cryptology—ASIACRYPT ’99, Lecture Notes in Computer Science, 1716. pp. 150–164. Berlin/Heidelberg: Springer.
[295] Swade, D. (2000). The Cogwheel Brain: Charles Babbage and the Quest to Build the First Computer. London: Little, Brown and Company.
[296] Trappe, W. and L. C. Washington (2006). Introduction to Cryptography with Coding Theory, 2nd ed. Upper Saddle River, New Jersey: Prentice-Hall.
[297] Verheul, E. R. (2001). “Evidence that XTR is More Secure than Supersingular Elliptic Curve Cryptosystems”, Advances in Cryptology—EUROCRYPT 2001, Lecture Notes in Computer Science, 2045. pp. 195–210. Berlin/Heidelberg: Springer.
[298] Washington, L. C. (2003). Elliptic Curves: Number Theory and Cryptography. Boca Raton, Florida: Chapman & Hall/CRC.
[299] Weber, D. (1996). “Computing Discrete Logarithms with the General Number Field Sieve”, Algorithmic Number Theory—ANTS-II, Lecture Notes in Computer Science, 1122. pp. 337–361. Berlin/Heidelberg: Springer.
[300] Weber, D. (1998). “Computing Discrete Logarithms with Quadratic Number Rings”, Advances in Cryptology—EUROCRYPT ’98, Lecture Notes in Computer Science, 1403. pp. 171–183. Berlin/Heidelberg: Springer.
[301] Weber, D. and T. Denny (1998). “The Solution of McCurley’s Discrete Log Challenge”, Advances in Cryptology—CRYPTO ’98, Lecture Notes in Computer Science, 1462. pp. 458–471. Berlin/Heidelberg: Springer.
[302] Western, A. E. and J. C. P. Miller (1968). “Tables of Indices and Primitive Roots”, Royal Society Mathematical Tables, 9, Cambridge: Cambridge University Press.
[303] Wiedemann, D. H. (1986). “Solving Sparse Linear Equations over Finite Fields”, IEEE Transactions on Information Theory, 32: 54–62.
[304] Wiener, M. J. (1990). “Cryptanalysis of Short RSA Secret Exponents”, IEEE Transactions on Information Theory, 36: 553–558.
[305] Williams, H. C. (1982). “A p + 1 Method for Factoring”, Mathematics of Computation, 39 (159): 225–234.
[306] Yang, L. T. and R. P. Brent (2001). “The Parallel Improved Lanczos Method for Integer Factorization over Finite Fields for Public Key Cryptosystems”, pp. 106–114. Proceedings of the ICPP Workshops 2001, Valencia, Spain, 3–7 September.
[307] Young, A. and M. Yung (1996). “The Dark Side of ‘Black-Box’ Cryptography, or: Should We Trust Capstone?”, Advances in Cryptology—CRYPTO ’96, Lecture Notes in Computer Science, 1109. pp. 89–103. Berlin/Heidelberg: Springer.
[308] Young, A. and M. Yung (1997a). “Kleptography: Using Cryptography Against Cryptography”, Advances in Cryptology—EUROCRYPT ’97, Lecture Notes in Computer Science, 1233. pp. 62–74. Berlin/Heidelberg: Springer.
[309] Young, A. and M. Yung (1997b). “The Prevalence of Kleptographic Attacks on Discrete-Log Based Cryptosystems”, Advances in Cryptology—CRYPTO ’97, Lecture Notes in Computer Science, 1294. pp. 264–276. Berlin/Heidelberg: Springer.
[310] Zheng, Y. (1997). “Digital Signcryption or How to Achieve Cost(Signature & Encryption) << Cost(Signature) + Cost(Encryption)”, Advances in Cryptology—CRYPTO ’97, Lecture Notes in Computer Science, 1294. pp. 165–179. Berlin/Heidelberg: Springer.
[311] Zheng, Y. (1998a). “Signcryption and Its Applications in Efficient Public Key Solutions”, 1997 Information Security Workshop ISW ’97, Lecture Notes in Computer Science, 1397. pp. 291–312. Berlin/Heidelberg: Springer.
[312] Zheng, Y. (1998b). “Shortened Digital Signature, Signcryption, and Compact and Unforgeable Key Agreement Schemes”, contribution to IEEE P1363 Standard for Public Key Cryptography.
[313] Zheng, Y. and H. Imai (1998a). “Efficient Signcryption Schemes on Elliptic Curves”. Proceedings of the IFIP 14th International Information Security Conference IFIP/SEC ’98, Vienna, Austria, September 1998. Chapman & Hall.
[314] Zheng, Y. and H. Imai (1998b). “How to Construct Efficient Signcryption Schemes on Elliptic Curves”, Information Processing Letters, 68: 227–233.
[315] Zheng, Y. and T. Matsumoto (1996). “Breaking Smartcard Implementations of ElGamal Signatures and Its Variants”, presented at the rump session of Advances in Cryptology—ASIACRYPT ’96. Available at http://www.sis.uncc.edu/~yzheng/publications/ (October 2008).
[316] * Zuckerman, H. S., H. L. Montgomery and I. M. Niven (1991). An Introduction to the Theory of Numbers. New York: John Wiley & Sons.
Books marked with a star (*) had Asian editions at the time of writing.